Common Myths about Data Science
While data science as a field continues to mature, some faulty assumptions persist. Let’s address the most common ones.
Data science is losing some of its mystique as the field continues to mature, which is a good thing in many ways. Business leaders are gaining a more realistic view of the field as more enterprises see returns on their investments in data science capabilities. While some may have a more nuanced understanding of what data science can do, few understand how data science is actually done. As a result, even as its practitioners become indispensable, the field remains a "black box" for many.
Several studies suggest that the practice and profession sometimes remain opaque for the C-suite and other stakeholders. For data science to continue delivering on its great potential, we must correct faulty assumptions held by organizational leaders. In this post, we’ll walk through the most common myths and misconceptions, along with explanations to help dispel them.
Myth 1: More data = greater accuracy
When it comes to data collection, many companies place great emphasis on quantity. This isn't surprising, as research and surveys often teach that large data sets allow for better conclusions. And as more companies use analytics in their decision-making, the demand for ever-larger amounts of data keeps growing. But does gathering huge amounts of data really improve performance?
Data scientists say no. Deep and broad pools of training data have their advantages, for example in reducing variance. But more data doesn't necessarily solve other problems, such as bias, nor can it replace conventional analysis. Companies with the most advanced data science capabilities already know this.
So where should companies focus their data efforts? As the saying goes, quality over quantity. Instead of asking whether they have enough data, companies should be asking whether they're providing their teams with clean, relevant and useful data for what they're trying to model. In fact, huge amounts of low-quality data can lead to noisy results and poor insights, as disappointing early attempts to use AI against COVID-19 show. Companies would do better, and make their data scientists happier, if they instead prioritized stronger data management practices and better communication.
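To make the variance-versus-bias point concrete, here is a minimal sketch using only Python's standard library. It is an illustration under stated assumptions, not a description of any real dataset: the true value, the noise level and the systematic offset (`TRUE_MEAN`, `NOISE_SD`, `BIAS`) are all hypothetical numbers chosen for the example. It repeats the same "measure and average" experiment many times at two sample sizes and reports how far the estimates land from the truth on average, and how much they scatter.

```python
import random
import statistics

# All three constants are hypothetical values chosen for illustration.
TRUE_MEAN = 50.0   # the quantity we are trying to estimate
NOISE_SD = 10.0    # random measurement noise (drives variance)
BIAS = 5.0         # systematic offset shared by every measurement (drives bias)


def estimate_mean(n, bias, rng):
    """Average n noisy measurements that all carry the same systematic offset."""
    samples = [rng.gauss(TRUE_MEAN + bias, NOISE_SD) for _ in range(n)]
    return statistics.fmean(samples)


def summarize(n, bias, trials=100, seed=0):
    """Repeat the experiment and return (mean error, spread) of the estimates."""
    rng = random.Random(seed)
    estimates = [estimate_mean(n, bias, rng) for _ in range(trials)]
    error = statistics.fmean(estimates) - TRUE_MEAN   # what bias leaves behind
    spread = statistics.stdev(estimates)              # what variance contributes
    return error, spread


for n in (100, 5_000):
    error, spread = summarize(n, BIAS)
    print(f"n={n:>5}: mean error={error:+.2f}, spread={spread:.2f}")
```

With the larger sample, the spread of the estimates shrinks, but the mean error stays close to the offset: collecting more of the same skewed measurements makes the answer more consistent, not more correct.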
Myth 2: Data science is next to be automated
After nearly two years of the pandemic crippling factories and disrupting supply chains, and with the labour market tight, executives across industries are turning to the promise of automation. Some may see data science, the basis for much of today's automation, as a natural candidate for the next wave of AI-powered upheaval. But this scenario seems unlikely.
Few data scientists seem worried that machines will replace them. On the contrary, they see the opportunity for AI and automation to help with easily repeatable tasks and free up more resources for work that requires human intervention, interpretation and problem solving. Simply put, automation will allow people to develop more complex models or algorithms and spend less time on routine work.
Myth 3: Data scientists can’t code
Data science is still an emerging field, and many companies are only now hiring dedicated data science talent. Data scientists are often lumped in with other "technical" staff in a company. When comparing them with software engineers, it may be tempting to think that data scientists do not know how to work with code. But make no mistake: the vast majority of data scientists are also programmers, just of a slightly different kind.
The difference between a data scientist and a software developer lies in how, when and why they write code. For data scientists, Python is usually a fundamental skill in their toolbox for gaining insights from datasets. They write code throughout their data pipelines and machine learning work: to query data, develop features, and build and deploy models. In contrast, software engineers use code primarily for product development and often focus on infrastructure, automation, testing and maintenance. Still, given the wide variety of skills a software engineer must have, some will overlap with those of data scientists. The two groups have more in common than many realise.
Myth 4: Data science is all about building models
Machine learning projects require more than just training a model. Data has to be cleaned and preprocessed in a robust pipeline, models have to be placed in production and predictions need to be served to end users.
Most companies can’t afford to have a data scientist work only on modelling and sit idle for the rest of the project. That’s why data professionals are also expected to take on data engineering, model deployment and MLOps tasks. Most data teams, save for maybe R&D departments, spend only a fraction of their time building models.
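The steps named in this section (cleaning, modelling, serving predictions) can be sketched in a few lines of plain Python. This is a toy under stated assumptions rather than a production design: the function names `clean`, `fit_line` and `make_predictor` and the five-row dataset are hypothetical, and a simple least-squares line stands in for whatever model a real project would use. Notice that only one of the three functions is "building a model".

```python
from statistics import fmean


def clean(rows):
    """Data preparation: drop records that have a missing value (None)."""
    return [(x, y) for x, y in rows if x is not None and y is not None]


def fit_line(rows):
    """Modelling: ordinary least squares for y = slope * x + intercept."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in rows)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def make_predictor(model):
    """Serving: wrap the fitted parameters in the function an application calls."""
    slope, intercept = model

    def predict(x):
        # Validate input at the serving boundary, as a deployed model must.
        if x is None:
            raise ValueError("x is required")
        return slope * x + intercept

    return predict


# Hypothetical raw data with two incomplete records.
raw = [(1.0, 3.0), (2.0, 5.0), (None, 4.0), (3.0, 7.0), (4.0, None)]

predict = make_predictor(fit_line(clean(raw)))
print(predict(5.0))
```

Even in this tiny example, most of the code deals with getting data into shape and exposing predictions safely. In a real project those parts grow much faster than the modelling step does.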
Bringing clarity
At Prophecy Labs, we encourage all our data scientists to keep looking for ways to integrate more effectively with business departments. We consider it essential to take the time to educate our peers on common myths like these whenever we can. Raising awareness of how data scientists work can help improve everything from the accuracy of model predictions to the quality of candidates recruited to fill open positions.