AI and ML Meets DevOps – Part II

Putting Data Science in Perspective

Data Science project work can be broken into two major parts: Data Engineering and Data Science.

Data Engineering is about:

  1. Sourcing data from different databases, both inside and outside your enterprise
  2. Performing ETL (extract, transform, load), which includes cleansing the data. At this point you may also sanitize or anonymize the data so that it stays secure while the Data Science work is carried out
  3. Performing feature engineering to extract relevant information from the data. This step may also fall into the Data Science realm, since much of the value a data scientist or analyst brings lies in analyzing the data and making sense of it
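The three data engineering steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production pipeline: the in-memory INTERNAL_ORDERS and EXTERNAL_ORDERS lists stand in for real internal and external databases, and the field names, the salt, and the use of salted SHA-256 hashing as the anonymization scheme (strictly speaking, pseudonymization) are all hypothetical choices made for the example.

```python
import hashlib
from statistics import mean

# Hypothetical sample records standing in for an internal database and an
# external vendor feed; a real pipeline would query actual data sources here.
INTERNAL_ORDERS = [
    {"email": "Alice@Example.com ", "amount": "120.50", "region": "EU"},
    {"email": "bob@example.com", "amount": None, "region": "US"},
    {"email": "alice@example.com", "amount": "79.50", "region": "EU"},
]
EXTERNAL_ORDERS = [
    {"email": "carol@example.com", "amount": "200", "region": "us"},
]

# Assumption: a salted hash is an acceptable way to anonymize identifiers here.
SALT = "replace-with-a-secret-salt"


def extract():
    """Step 1: source data from inside and outside the enterprise."""
    return INTERNAL_ORDERS + EXTERNAL_ORDERS


def clean(rows):
    """Step 2a: cleanse by dropping incomplete rows and normalizing fields."""
    cleaned = []
    for row in rows:
        if not row.get("email") or row.get("amount") is None:
            continue  # drop records missing key fields
        cleaned.append({
            "email": row["email"].strip().lower(),
            "amount": float(row["amount"]),
            "region": row["region"].upper(),
        })
    return cleaned


def anonymize(rows):
    """Step 2b: replace the direct identifier with a salted SHA-256 digest."""
    for row in rows:
        raw = (SALT + row.pop("email")).encode("utf-8")
        row["customer_id"] = hashlib.sha256(raw).hexdigest()[:16]
    return rows


def build_features(rows):
    """Step 3: aggregate orders per customer into model-ready features."""
    per_customer = {}
    for row in rows:
        per_customer.setdefault(row["customer_id"], []).append(row)
    features = []
    for customer_id, orders in per_customer.items():
        amounts = [o["amount"] for o in orders]
        features.append({
            "customer_id": customer_id,
            "region": orders[0]["region"],
            "order_count": len(amounts),
            "avg_order_value": mean(amounts),
        })
    return features


features = build_features(anonymize(clean(extract())))
for f in features:
    print(f)
```

Keeping each step in its own function makes it easy to rerun or test a single stage on its own, which helps once the pipeline is automated.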

Data Science is about understanding and analyzing the data and then modeling it so that we can draw insights about the business. It is about:

  1. Creating hypotheses about business problems
  2. Developing models to test those hypotheses, either proving or disproving them
  3. Discovering new knowledge about the business
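One simple way to make "developing models to test hypotheses" concrete is a statistical test. The sketch below uses a one-sided permutation test, a technique swapped in purely for illustration rather than one the article prescribes, on hypothetical basket sizes for a control group and a treatment group. The hypothesis ("the new checkout page raises basket size"), the data, and the 0.05 significance level are all assumptions.

```python
import random
from statistics import mean


def permutation_test(control, treatment, n_permutations=10_000, seed=42):
    """One-sided permutation test of H1: mean(treatment) > mean(control).

    Returns the observed difference in means and an estimated p-value.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = mean(treatment) - mean(control)
    pooled = list(control) + list(treatment)
    n_treatment = len(treatment)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Under H0 the group labels are exchangeable, so reshuffle them.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_treatment]) - mean(pooled[n_treatment:])
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (at_least_as_extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical data for the hypothesis "the new checkout page raises basket size".
control = [10, 12, 11, 9, 10, 11, 12, 10]
treatment = [14, 15, 13, 16, 14, 15, 13, 14]

observed, p_value = permutation_test(control, treatment)
alpha = 0.05  # assumed significance level
verdict = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"difference={observed:.3f}, p={p_value:.4f}: {verdict}")
```

Strictly speaking, a test like this rejects or fails to reject a null hypothesis rather than proving it outright, which is why the verdict is phrased that way.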

Data scientists develop features and train models. Once they get access to the data, they often spend weeks cleaning it and then turning it into features and labels. They then develop, train, and evaluate models, repeating this cycle several times until they have a model that can be used to test the hypothesis.
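The develop, train, evaluate, repeat cycle can be sketched as a loop over candidate models. In this hypothetical example, each iteration tries a different feature set with a simple nearest-centroid classifier (chosen only because it fits in a few lines of standard-library Python), scores it on held-out validation rows, and stops once accuracy reaches an assumed target of 0.9. The churn dataset and the feature names signup_month and support_calls are invented for illustration.

```python
from statistics import mean

# Hypothetical labeled rows (features + label) produced by earlier cleaning work.
TRAIN = [
    {"signup_month": 1, "support_calls": 0, "churned": 0},
    {"signup_month": 5, "support_calls": 1, "churned": 0},
    {"signup_month": 3, "support_calls": 0, "churned": 0},
    {"signup_month": 2, "support_calls": 8, "churned": 1},
    {"signup_month": 4, "support_calls": 9, "churned": 1},
    {"signup_month": 6, "support_calls": 7, "churned": 1},
]
VALIDATION = [
    {"signup_month": 4, "support_calls": 1, "churned": 0},
    {"signup_month": 2, "support_calls": 8, "churned": 1},
    {"signup_month": 6, "support_calls": 0, "churned": 0},
    {"signup_month": 1, "support_calls": 9, "churned": 1},
]


def train(rows, features):
    """Nearest-centroid model: store the mean feature vector of each class."""
    centroids = {}
    for label in {r["churned"] for r in rows}:
        members = [r for r in rows if r["churned"] == label]
        centroids[label] = [mean(r[f] for r in members) for f in features]
    return centroids


def predict(centroids, row, features):
    """Assign the class whose centroid is closest (squared Euclidean distance)."""
    point = [row[f] for f in features]

    def distance(label):
        return sum((p - c) ** 2 for p, c in zip(point, centroids[label]))

    return min(centroids, key=distance)


def evaluate(centroids, rows, features):
    """Fraction of rows whose predicted label matches the true label."""
    correct = sum(predict(centroids, r, features) == r["churned"] for r in rows)
    return correct / len(rows)


def iterate(candidate_feature_sets, target_accuracy=0.9):
    """Develop, train, and evaluate; move to the next candidate until good enough."""
    history = []
    for features in candidate_feature_sets:
        model = train(TRAIN, features)
        accuracy = evaluate(model, VALIDATION, features)
        history.append((features, accuracy))
        if accuracy >= target_accuracy:
            return model, features, history
    return None, None, history


model, chosen, history = iterate([["signup_month"], ["signup_month", "support_calls"]])
for features, accuracy in history:
    print(features, accuracy)
```

Here the first candidate, using signup_month alone, scores 0.0 on the validation rows, so the loop moves on; adding support_calls separates the two classes and reaches 1.0, at which point iteration stops.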