What does a data scientist do?

Data science is a multidisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data - Wikipedia

The role of a data scientist is to solve real company problems using data. To do this, a data scientist needs to know how to code and to have a good understanding of math and algorithms. These skills make it possible to gain insight from data, and that insight is then used to make predictions and draw conclusions.

Venn diagram of the skills of a data scientist - Drew Conway

However, a data scientist also needs to communicate those conclusions to the rest of the team, most of whom are not data scientists. That means being able to produce clear data visualizations and being a good business communicator.

Below are the steps a data scientist follows to solve a problem from scratch:

  1. problem understanding: understand the company's problem
  2. data collection: collect the data
  3. data preparation: clean and transform the data
  4. data exploration: convert the data into visual insights (data visualization tools, machine learning tools, algorithms)
  5. data modeling: use machine learning (ML) to create models, train and test them on the data, and keep the best-performing models
  6. visualization & communication: communicate the results to the rest of the team in a clear way, which requires good data visualization and good communication skills
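
The steps above can be sketched as a small Python script. This is only an illustration under stated assumptions, not a real project: the data is synthetic, the column names (ad_spend, sales) and all function names are made up for the example, and a hand-written least-squares line stands in for the "ML model" of step 5. A real workflow would normally rely on dedicated libraries for each step.

```python
import random
import statistics


def collect_data(n=200, seed=0):
    """Step 2 (stand-in): generate synthetic (ad_spend, sales) rows."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        ad_spend = rng.uniform(0, 100)
        # Assumed relationship, invented for this sketch: sales = 3 * ad_spend + 20 + noise
        sales = 3.0 * ad_spend + 20.0 + rng.gauss(0, 5)
        # Blank out roughly 10% of the values to imitate messy real-world data.
        if rng.random() < 0.1:
            ad_spend = None
        rows.append((ad_spend, sales))
    return rows


def prepare_data(rows):
    """Step 3: clean the data by dropping rows with missing values."""
    return [(x, y) for x, y in rows if x is not None and y is not None]


def explore_data(rows):
    """Step 4: compute simple summary statistics (a plot would go here)."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    return {
        "n": len(rows),
        "mean_ad_spend": statistics.mean(xs),
        "mean_sales": statistics.mean(ys),
    }


def fit_line(rows):
    """Step 5 (training): ordinary least squares for y = slope * x + intercept."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_absolute_error(rows, slope, intercept):
    """Step 5 (testing): average absolute prediction error on held-out rows."""
    errors = [abs(y - (slope * x + intercept)) for x, y in rows]
    return statistics.mean(errors)


# Steps 2 and 3: collect and clean.
rows = prepare_data(collect_data())

# Step 4: explore.
summary = explore_data(rows)

# Step 5: train on 80% of the rows, test on the remaining 20%.
split = int(0.8 * len(rows))
train, test = rows[:split], rows[split:]
slope, intercept = fit_line(train)
mae = mean_absolute_error(test, slope, intercept)

# Step 6: communicate the result in plain language.
print(f"Rows after cleaning: {summary['n']}")
print(f"Each extra unit of ad spend is associated with about {slope:.2f} more sales.")
print(f"Average prediction error on unseen data: {mae:.2f}")
```

Step 1, understanding the problem, has no code equivalent: it decides which data to collect and which question the model should answer. Step 6 is reduced here to a few printed sentences, whereas in practice it would usually involve charts and a discussion with the team.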

Depending on the company's size and budget, a data scientist's role can vary. For instance, in a startup with little funding, one data scientist will do the entire job, from understanding the problem to communicating conclusions and predictions. On the other hand, in a large company with a huge budget, the work will be shared: a software engineer will collect, clean, and transform the data, while a machine learning engineer will analyze the data and find the best machine learning models.