Thu Vu

All Data Science Careers Explained

Data scientist is not the only sexy job.
Photo by Boitumelo Phetla on Unsplash

Data is the new oil. In 2017, a study from IBM estimated that 90% of all the data in human history had been generated in the previous 2 years. The pace has only increased if we consider the world today in 2021. Together with advances in algorithms and computational power, this has created a greater-than-ever demand for data-related jobs.

In the past few years, we’ve heard a lot about all those job titles like Data Scientist, Data analyst, Data engineer, Data architect, Machine learning engineer… They all sound fancy and sexy, but to many people they bring as much confusion as excitement.

In this article, I want to demystify some of the main data roles and the different kinds of data careers that exist nowadays. At the end of the article, I’m gonna talk about the skill sets you need for each type of career, for those who aspire to break into one of the data-related careers and succeed in the field.


Data science is NOT a single discipline. It is rather an umbrella (generic) term that describes a complex process carried out by a team of data professionals with almost non-overlapping skills.

Let’s look at a data science pipeline. A pipeline means the whole workflow that involves everything you do with data in order to obtain value and actionable insights from it. Examples include an e-commerce platform that uses shopping behavior data to improve its sales, or an organization that wants to improve its retention rate using employee data.

A paper from MIT published last year (2020) defined two distinct but interconnected areas in a data science pipeline: back-end and front-end.

Illustration by Author

Back-end data science

Back-end data science deals with hardware, efficient computing and data storage structures. Example roles are Data engineers and Data architects, who create and develop data models, database systems, data APIs and data warehousing solutions.

Front-end data science

Front-end data science, on the other hand, is geared more towards data analysis, machine learning and building applications that interact with users. This area involves roles such as:

  • Data analysts, who wrangle and explore data, assess its quality, fit models to it, perform statistical inference, and develop prototypes.
  • Data scientists, who are a step above data analysts. They use complex statistical, machine learning and deep learning models to explore patterns in data, detect anomalies and create prediction models.
  • Machine learning engineers, another important role, who work with data scientists to build and assess prediction algorithms and make the solution scalable and robust for many users.
  • Business Intelligence Analysts, or just Business Analysts, a less technically oriented role. They link data insights to actionable business insights to improve business processes. They are also strong communicators who spread the message to the team and persuade management in its decision making.
  • And then we have Data science software developers, who are not directly involved in the data science pipeline but instead develop the software tools that facilitate data science. Examples are the developers of Hadoop, R, RStudio, IPython notebooks, TensorFlow, D3, pandas, tidyverse and all kinds of tooling and packages.

From my experience working in the data science field with a lot of clients and in different kinds of projects, I think in reality there are many more data related roles surrounding this pipeline. Some of them are not talked about nearly as much as roles like Data scientist or Business Analyst. But they are no less important. It would be a pity not to consider these roles if you love working with data.

For example, Data Journalists are those who communicate data insights through visualization or presentation. If you read big newspapers like the New York Times, you may have come across some beautiful and eye-catching data visualizations on their websites. They are entertaining and educational at the same time.

A step further are Data Artists, who are really artists at heart and use data combined with storytelling as their medium. They often have not only strong technical skills like programming, web development and UX design, but also aesthetic insight. One of my favorite data artists is Nadieh Bremer, who creates beautiful and sophisticated data visualizations, mostly using D3.js and canvas.js (data visualization libraries in JavaScript). If you take a look at her website, you can see that her visualizations deserve to be called pieces of art. One of her visualizations became my inspiration for a network visualization project I did not long ago with a client.

Another role I can think of in a data science team is the Data science business developer, who has strong domain expertise and at the same time knows data science concepts at a high level. By “high level” I don’t mean “advanced level”, but rather that they know data science concepts in a more generic sense: they know what’s possible and how to apply it to a real-world problem. They are unlikely to be able to explain the nitty-gritty details of a machine learning algorithm or a programming language. What they are good at is connecting the dots and spotting valuable data science opportunities for the business.

In recent years, data privacy and data security have become serious topics. Here in Europe, the GDPR (General Data Protection Regulation) has been enforced since 2018. That’s why whenever you visit a website, there’s always that annoying cookie pop-up asking whether you accept the use of cookies on the website. Since then, many companies have had to hire a Data Privacy Officer, who helps the company interpret laws and regulations, advises on how to comply with them, and helps prevent the risk of potential violations.


Skills Required in Each Data Career

As you can see, these roles are quite distinct from each other, because they belong to different steps of a data science pipeline and different aspects of a data science project. They therefore require different skill sets, though not completely different ones.

Essentially, the skill set of any data science job can be summarized in three core groups of skills: Computer science/IT, Math and Statistics, and Domain/Business knowledge.

Illustration by Author

You can think of it like this: each data science role that I described earlier requires a different ratio of these skill groups.

For instance, a Data engineer needs a bigger portion of Computer science/IT skills, because they need to understand how computers work, how data and information are stored, what the different data structures are and how to use them, how distributed computing works, etc.

Illustration by Author

A data analyst or data scientist, however, needs stronger Math/Statistics knowledge, because they are going to generate insights from data for the business, and those insights had better be accurate. On the other hand, they don’t need extremely in-depth Computer science/IT skills.

Illustration by Author

There have been quite a few complaints that data scientists don’t know how to write quality, production-grade code. Their code is usually messy, sometimes repetitive and inefficient, because they’re too busy with data cleaning, data preprocessing, exploration, experimentation and modelling. They can’t be bothered to clean up their own code or make it production-ready. This is when Machine Learning Engineers come to the rescue.

Another example is the Data science business developer, who needs very strong domain expertise, whatever the domain is. In a project at work, I worked with a group of doctors and professors from a medical university here in the Netherlands. Our team helped them analyze a huge gene sequencing dataset from lung cancer patients. Our task was to find out which genes or genetic mutations contribute to the particular kind of cancer a patient gets, because there are two types of lung cancer, and one is more aggressive than the other and needs a different kind of treatment. In a sense, these doctors and professors had the role of Data science business developers, as they were the experts in this research domain. They came up with new ideas and hypotheses for us to test using the available data. They also pointed us in the right direction and showed us where to start when we were swimming in this huge dataset.

In short, this skill framework can help guide you when you don’t know what kind of skills or knowledge you need for a particular data career. Of course, we need to look at it in a relative way and put it in the context of regulation, technological development and all the human aspects like behavioral science and aesthetics. There are so many exciting things in this field, and new kinds of careers are surely going to come into existence in the near future.


I hope this article gives you a good overview of the main data-related careers out there and helps you better understand what’s needed in each career if you are considering one. Thank you for reading!
