February 20, 2015
Data science was named Forbes Magazine's sexiest profession of 2014 and the most trending STEM career in Ebony Magazine’s July issue. This has led many to wonder how they, too, can enter the data science profession. So, what is a typical day in the life of a data scientist?
How is data science similar to other jobs?
"A data scientist is someone who is better at statistics than any software engineer and better at software engineering than any statistician."
- Josh Wills of Cloudera
The above quote names two jobs closely related to data science and illustrates the hybrid nature of big data analysis. Generally, statisticians are not adept at trending programming languages, data structures, or the advanced areas of computer science, and many software engineers may not know as much about advanced mathematical and machine learning algorithms for analyzing, clustering and classifying data. The newness of the term "data scientist" has led to many disagreements on a formal definition, but from speaking with many people in the field, a typical work day is generally centered on writing code for mathematical and statistical algorithms, often dealing with "big" data sets.
What kinds of problems does a data scientist solve?
At a basic level, the general task of a data scientist is to search for patterns in large data sets. This description leaves out a lot of context, however. Once we understand the data and the questions we would like to answer, we still have to choose a method for answering them. The best method and the most efficient method do not always agree, and when they do not, the difference can be large. This is a common problem in computer science. For example, when we are searching for the maximal element, do we need to sort all the elements and select the one at the top, or can we simply run a search algorithm? Similarly, in data science, the question may be whether we need to run an algorithm such as clustering (which may be expensive if it calls for multiple iterations through a large data set), or whether a simpler distance calculation can answer the question.
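To make the maximal-element example concrete, here is a minimal Python sketch of the two approaches. The function names are made up for illustration, and it assumes that "run a search algorithm" means a single pass over the data. Both functions return the same answer; the difference is that sorting does roughly n log n work while the scan touches each element once.

```python
def max_by_sorting(values):
    """Sort a copy of the data, then take the last element: O(n log n)."""
    if not values:
        raise ValueError("values must not be empty")
    return sorted(values)[-1]


def max_by_scan(values):
    """Walk the data once, keeping the largest value seen so far: O(n)."""
    if not values:
        raise ValueError("values must not be empty")
    iterator = iter(values)
    largest = next(iterator)
    for value in iterator:
        if value > largest:
            largest = value
    return largest


if __name__ == "__main__":
    sample = [12, 7, 41, 3, 29]  # made-up numbers, for illustration only
    # Both approaches agree with each other and with Python's built-in max().
    assert max_by_sorting(sample) == max_by_scan(sample) == max(sample)
    print(max_by_scan(sample))
```

On a handful of numbers the cost difference is invisible, but on a data set with many millions of records the extra work of sorting becomes noticeable. In everyday Python the built-in max() already performs the single-pass version.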
How much of being a data scientist is sitting in front of a computer vs giving presentations vs working in groups?
Most of my time as a data scientist is spent researching, writing algorithms and writing code to answer questions about the data sets at hand. A fundamental part of data science involves group work - obtaining the data, understanding the data, and understanding and analyzing what is wanted from the data. Whether all these roles are filled by one person (me) or by a group of people depends on how your team is set up, but as the data scientist I generally work with someone who can give me more insight into the data, answer relevant questions and clear up any confusion. In addition, developers and statisticians may ask how the methods we used as data scientists compare to other methods in their fields. In these types of settings, it may be important for a data scientist to give presentations in order to answer such questions.
What is the most stressful/rewarding thing about being a data scientist?
As a mathematician, there is a certain feeling of satisfaction in seeing the need for advanced algorithms to help solve problems in the real world. It’s one thing to read a textbook with example problems. It’s a totally different feeling to hear about a real-world problem and use your knowledge to solve it. Similarly, being able to write the code and see it develop from a problem, to an idea, to an algorithm, to a running program is a great and enjoyable process.
However, in order to reach that last stage of a running program, we often have to go through what's known as debugging. This is the process of searching for errors in code that either prevent the program from running or lead to incorrect solutions. It can be challenging for a number of reasons - some are the same reasons a developer would find debugging stressful, and others arise because data science often involves working in cloud environments, where some of the standard practices for debugging programs in a traditional environment have to be revised.