Demystifying Data Science and Machine Learning
By Priscila Neves Faria, PhD in Science, Lecturer at Data Science and AI Academy, North Carolina State University
Data Science and Machine Learning are probably among the most sought-after courses today. The impression you get when reading the news is that everyone is talking about them, and those who aren’t are probably looking to find out about them. After all, it doesn’t hurt to know a little more about these topics, whether to continue the search for knowledge or simply to avoid being left out of the topic of the moment. Data science needs everyone, including students, educators, and members of the general public who have never heard of it, to grasp its terms and historical context. That means teaching it well and being able to talk about it with ease. The objective of this article is to present a general overview of data science and machine learning and to demystify some concepts, particularly for students beginning to study data science and teachers hoping to update their understanding. Along the way, this article will help students and educators start their pursuit of knowledge in data, which is a fundamental part of being a citizen in an increasingly data-centric world.
What is the difference between Data Science and Machine Learning?
Data is the fuel needed to drive Machine Learning models, and as we are in the era of Big Data, it is easy to see why so many employers today prize expertise in Data Science. Data Science and Machine Learning are skills, not just technologies. These skills are needed to derive useful insights from data and solve problems by building predictive models.
Data science is the study of data: the formal process of extracting actionable insights from data to address real-world challenges. It is a novel interdisciplinary field that integrates and expands upon statistics, engineering, informatics, computing, communication, management, and sociology to study data and where it comes from. Machine learning, by contrast, involves training machines to solve problems through exposure to large datasets. These fields are intrinsically linked: machine learning is often treated as a subset of data science, one that employs algorithms and statistical methods to automate data analysis for practical applications.
Thinking about Machine Learning means recognizing that we haven’t yet discovered how to process billions of pieces of information at once with the human brain, but we have machines to do the job.
Demystifying Data Science
Data Science’s applicability is extensive: it provides personalized experiences to millions of internet users and helps businesses understand what motivates customers or what slows down production lines. Data is collected every second, for every website you visit and every purchase you make. Later, data science uses this information to improve your experience on each platform.
However, since this article sets out to demystify concepts, it is worth saying plainly what the work involves: data science requires you to run experiments to gather data, check its quality, clean and organize it, and prepare it for subsequent analysis. Data scientists develop algorithms to sort through massive datasets. These algorithms, when designed correctly and rigorously tested, identify trends and insights that may elude human observation. They can also make data analysis much faster.
Now, when we talk about these algorithms, we are talking about machine learning.
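Before any algorithm runs, the preparation work described above (checking quality, cleaning, and organizing) has to happen. The following is only a minimal sketch of that idea in plain Python. The records, the field names, and the clean function are all made up for illustration; real projects usually rely on dedicated data libraries and much richer rules.

```python
# Hypothetical raw records; names and values are invented for this example.
raw_records = [
    {"customer": "Ana", "age": "34", "purchase": "19.90"},
    {"customer": "ana ", "age": "34", "purchase": "19.90"},  # duplicate, messy text
    {"customer": "Bruno", "age": "", "purchase": "5.00"},    # missing age
    {"customer": "Carla", "age": "41", "purchase": "abc"},   # invalid amount
    {"customer": "Davi", "age": "29", "purchase": "42.50"},
]


def clean(records):
    """Return records with tidy names, numeric fields, and no duplicates."""
    cleaned = []
    seen = set()
    for record in records:
        # Clean: remove stray spaces and use consistent capitalization.
        name = record["customer"].strip().title()
        try:
            age = int(record["age"])
            purchase = float(record["purchase"])
        except ValueError:
            # Quality check: skip rows with missing or non-numeric values.
            continue
        key = (name, age, purchase)
        if key in seen:
            continue  # drop rows that are identical after cleaning
        seen.add(key)
        cleaned.append({"customer": name, "age": age, "purchase": purchase})
    # Organize: sort by customer name so the result is easy to inspect.
    return sorted(cleaned, key=lambda row: row["customer"])


tidy = clean(raw_records)
for row in tidy:
    print(row)
```

In this sketch, rows that fail the quality check are simply discarded. That is a design choice, not a rule: depending on the problem, a data scientist might instead fill in missing values or flag the rows for review.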
Demystifying Machine Learning
Machine learning is a branch of artificial intelligence that uses algorithms to extract patterns from data and predict future trends. A machine learning algorithm automatically improves with experience: it gains experience by processing more and more data and then adjusts itself based on the properties of that data.
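To make the idea of an algorithm that modifies itself more concrete, here is a minimal sketch of one classic learning rule, the perceptron, written in plain Python. It is only one of many possible illustrations and was chosen for its simplicity. The task (learning the logical OR of two inputs), the function names, and the limit on the number of passes are assumptions made for this example.

```python
# Each example pairs two binary inputs with the desired answer (logical OR).
training_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]


def predict(weights, bias, inputs):
    """Answer 1 if the weighted sum of the inputs is positive, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


def train(data, max_passes=10):
    """Adjust the weights every time the current answer is wrong."""
    weights, bias = [0, 0], 0
    mistakes_per_pass = []
    for _ in range(max_passes):
        mistakes = 0
        for inputs, target in data:
            error = target - predict(weights, bias, inputs)
            if error != 0:
                mistakes += 1
                # The "experience": nudge the weights toward the right answer.
                weights = [w + error * x for w, x in zip(weights, inputs)]
                bias += error
        mistakes_per_pass.append(mistakes)
        if mistakes == 0:
            break  # a full pass without mistakes: nothing left to adjust
    return weights, bias, mistakes_per_pass


weights, bias, mistakes_per_pass = train(training_data)
print("Mistakes in each pass over the data:", mistakes_per_pass)
```

Nobody tells the program what the weights should be. It starts from zero and changes them only when the data shows that its answer was wrong, until it completes a pass over the data without any mistakes.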
You use machine learning all the time without even realizing it. From the moment your day begins, you are already relying on it. Voice assistants such as Alexa, Siri and Google Assistant are all technologies created and continually improved through machine learning. Later, when you search for films on Netflix or for videos on YouTube, the same type of technology is applied to suggest films, series and videos that may interest you. Then, on the way to work, the Waze app also uses machine learning to guide you through traffic more quickly and safely.
The origins of Machine Learning go back further than most people think, although the field began to take shape only with the rise of the Internet in the late 1990s, and its usage grew in the mid-2010s. Current machine learning models are based, in part, on a model of brain-cell interaction that was developed as early as 1943 (Najafi et al., 2022). In that year, the very first mathematical model of an artificial neuron was developed by McCulloch and Pitts. However, the term “Machine Learning” was not coined until 1959, by engineer and computer scientist Arthur Samuel, who was then working at IBM. At the time, he was studying the creation of an autonomous machine.
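For curious readers, the artificial neuron mentioned above can be approximated in a few lines of Python. This is a simplified sketch of a McCulloch-Pitts style unit, not a reproduction of the 1943 paper: inputs are 0 or 1, the neuron fires when enough excitatory inputs are active, and any active inhibitory input blocks it. The function names and thresholds below are chosen for illustration.

```python
def mcp_neuron(excitatory, inhibitory, threshold):
    """Return 1 if the neuron fires, else 0. All inputs are 0 or 1."""
    if any(inhibitory):
        return 0  # an active inhibitory input prevents firing
    return 1 if sum(excitatory) >= threshold else 0


def logical_and(a, b):
    # Both inputs must be active to reach the threshold of 2.
    return mcp_neuron([a, b], [], threshold=2)


def logical_or(a, b):
    # A single active input is enough to reach the threshold of 1.
    return mcp_neuron([a, b], [], threshold=1)


def logical_not(a):
    # Fires by default (threshold 0) unless the inhibitory input is active.
    return mcp_neuron([], [a], threshold=0)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", logical_and(a, b), "OR:", logical_or(a, b))
print("NOT 0:", logical_not(0), "NOT 1:", logical_not(1))
```

Note that nothing is learned here: the thresholds are fixed by hand. The step from a unit like this to machine learning is letting the data, rather than a person, determine those values.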
In Machine Learning (ML), the objective is to develop and train a model so that it learns and deduces answers from the data it accesses. In the first stage, when the model is first implemented, the process is observed by people; after that, it is automated. Over time, models produce increasingly precise results from the associations they find between data. The ML process therefore has some important steps: data acquisition; data pre-processing; and training, evaluation and improvement of the model.
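The steps listed above can be sketched end to end in plain Python. Everything in this example is an assumption made for illustration: the tiny dataset (hours studied and hours slept, labeled by whether a student passed an exam) is invented, and the model, a nearest-centroid classifier, is just one simple choice among many.

```python
# 1. Data acquisition: an invented dataset of
#    (hours_studied, hours_slept) -> passed the exam (1) or not (0).
data = [
    ((1.0, 4.0), 0), ((2.0, 5.0), 0), ((1.5, 6.0), 0), ((3.0, 4.5), 0),
    ((7.0, 7.0), 1), ((8.0, 6.5), 1), ((6.5, 8.0), 1), ((9.0, 7.5), 1),
    ((2.5, 5.5), 0), ((7.5, 7.0), 1),
]
train_rows, test_rows = data[:8], data[8:]  # hold out two rows for evaluation


# 2. Pre-processing: rescale each feature using the training data's range.
def fit_scaler(rows):
    columns = list(zip(*(features for features, _ in rows)))
    return [(min(column), max(column)) for column in columns]


def scale(features, ranges):
    return tuple((value - low) / (high - low)
                 for value, (low, high) in zip(features, ranges))


# 3. Training: compute the average (centroid) of each class.
def train_centroids(rows, ranges):
    sums, counts = {}, {}
    for features, label in rows:
        scaled = scale(features, ranges)
        running = sums.setdefault(label, [0.0] * len(scaled))
        for i, value in enumerate(scaled):
            running[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in total]
            for label, total in sums.items()}


def predict(features, ranges, centroids):
    scaled = scale(features, ranges)

    def distance(label):
        return sum((a - b) ** 2 for a, b in zip(scaled, centroids[label]))

    return min(centroids, key=distance)  # label of the closest centroid


# 4. Evaluation: share of held-out rows the model labels correctly.
def accuracy(rows, ranges, centroids):
    hits = sum(predict(features, ranges, centroids) == label
               for features, label in rows)
    return hits / len(rows)


ranges = fit_scaler(train_rows)
centroids = train_centroids(train_rows, ranges)
test_accuracy = accuracy(test_rows, ranges, centroids)
print("Accuracy on held-out data:", test_accuracy)
```

The improvement step is what happens after the evaluation: if the accuracy on held-out data is disappointing, the practitioner goes back to gather more data, adjust the pre-processing, or try a different model, and then evaluates again.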
What are the future expectations?
The number of job openings shows that it is an interesting time to consider a career involving data science skills. In fact, the U.S. Bureau of Labor Statistics reports that job opportunities in this sector are projected to increase by 36% from 2021 to 2031, well above the average for all occupations. Meanwhile, almost half of businesses already use machine learning, and almost all the top companies are investing in it. The global machine learning market is projected to reach US$209 billion by 2029, and the need for people with data skills is rising (Source: Forbes).