Use of Data Science Technologies for Healthcare Professionals
By Mohammed Quazi, Ph.D., Assistant Professor-Data Scientist | Statistician, School of Nursing, West Virginia University
As someone who has been teaching Data Science courses to healthcare professionals, I have observed that students often start with great optimism about what they are going to learn. This excitement is understandable, especially since Data Science has become a buzzword for students who began their college education in the 2010s. During this period, universities introduced specialized programs in Data Science, a field that had been virtually nonexistent in formal academic curricula prior to that time. For instance, in 2010, a student pursuing an undergraduate degree had no option to major in Data Science because there were no Bachelor of Science programs in the field. However, there were programs in statistics and mathematics.
The term “data scientist” gained widespread recognition after Thomas H. Davenport and DJ Patil described it as the “sexiest job of the 21st century” in a 2012 Harvard Business Review article [1]. Patil went on to become the first U.S. Chief Data Scientist in 2015. A decade after that article, in 2022, Davenport and Patil reiterated the growing demand for data scientists and emphasized that the role matters to employers more than ever before [2].
However, when students discover that Data Science has deep roots in statistics, and that the course begins with a focus on statistics, their initial enthusiasm often diminishes. This approach is intentional, as a strong foundation in statistics is essential to understanding Data Science. After all, it was John Tukey, a statistician, who first predicted the emergence of a new field stemming from the growth of computing power. In 1962, he referred to this burgeoning field as “data analysis,” which laid the groundwork for what we now call Data Science [3].
The decision to start with statistics is not intended to demotivate students but to ensure they understand the foundational roots of Data Science. When I explain how the evolution of Data Science is tied to the emergence of extensive data and advancements in computer science, and how both intersect with the principles of statistics, professionals begin to appreciate its significance. They come to realize that statistics plays a central role in Data Science. In fact, in my opinion, it is more accurate to say that Data Science exists within the realm of statistics than the other way around.
In educating healthcare professionals in Data Science, recent additions to the course curriculum have included healthcare research publications that use statistical methods and tutorials for tools such as MS Excel, Tableau, R, SPSS, and SAS. However, introducing these technologies presents unique challenges, the most notable being students’ initial lack of interest in learning new tools. This places the responsibility on instructors to convince students of the value of these technologies.
To address this, it is essential to allocate ample time, beginning with basic operations and gradually building students’ skill sets. Beyond that, an effective strategy has been to use these tools as a means to teach statistics rather than as an end in themselves. Students are assessed on their understanding and interpretation of statistical concepts rather than their technical proficiency with the tools. Detailed instructions are provided on executing commands and obtaining statistical results, but evaluations focus solely on interpreting those results. This approach keeps students engaged with the subject matter and reduces the pressure of mastering new software during the learning process.
Working with healthcare professionals as a statistician and data scientist often involves bridging the gap between two distinct domains. Because healthcare professionals and data experts operate on different wavelengths, it is essential to create a shared understanding. Effective collaboration requires translating healthcare professionals’ language into statistical concepts and fostering meaningful dialogue. Once the clinical question has been reframed in statistical terms, statistical methods can be used to analyze and interpret the data. Presenting the results back to the healthcare team in accessible language that resonates with them is equally crucial for ensuring seamless collaboration and maximizing the impact of the research.
Healthcare professionals are generally less concerned with whether results are derived from an extreme gradient boosting algorithm or a deep learning network. What truly matters to them is whether the results are understandable and clinically meaningful. Again, the onus lies on the statistician to present the findings clearly and concisely while also highlighting any limitations of the analysis. Furthermore, it is essential to contextualize the results within the broader scope of the research, helping clinicians understand the practical implications and any potential uncertainties. By doing so, statisticians empower healthcare professionals to make informed decisions and critically assess the data, ensuring that statistical insights contribute effectively to clinical advancements.
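As one illustration of presenting a finding together with its uncertainty and limitations, here is a minimal Python sketch. It is not drawn from any study or course material: the counts, the function names (risk_difference_summary, plain_language), and the choice of a simple Wald interval for a difference in proportions are all assumptions made for the example.

```python
import math


def risk_difference_summary(events_a, n_a, events_b, n_b, z=1.96):
    """Risk difference (group A minus group B) with an approximate 95% Wald interval."""
    if n_a <= 0 or n_b <= 0:
        raise ValueError("group sizes must be positive")
    risk_a = events_a / n_a
    risk_b = events_b / n_b
    diff = risk_a - risk_b
    # Standard error of a difference between two independent proportions.
    se = math.sqrt(risk_a * (1 - risk_a) / n_a + risk_b * (1 - risk_b) / n_b)
    return {
        "risk_a": risk_a,
        "risk_b": risk_b,
        "diff": diff,
        "ci_low": diff - z * se,
        "ci_high": diff + z * se,
    }


def plain_language(summary, outcome, group_a, group_b):
    """Turn the numbers into a sentence a clinical team can read at a glance."""

    def pct(p):
        # Express a proportion as a percentage with one decimal place.
        return f"{100 * p:.1f}"

    excludes_zero = summary["ci_low"] > 0 or summary["ci_high"] < 0
    caveat = (
        "The interval excludes zero."
        if excludes_zero
        else "The interval includes zero, so no difference is also plausible."
    )
    return (
        f"{outcome}: {pct(summary['risk_a'])}% with {group_a} vs "
        f"{pct(summary['risk_b'])}% with {group_b}; difference "
        f"{pct(summary['diff'])} percentage points "
        f"(95% CI {pct(summary['ci_low'])} to {pct(summary['ci_high'])}). "
        f"{caveat} Limitation: unadjusted comparison of hypothetical data."
    )


# Hypothetical counts, invented purely for illustration.
summary = risk_difference_summary(events_a=30, n_a=200, events_b=50, n_b=200)
print(plain_language(summary, "30-day readmission", "new protocol", "usual care"))
```

The point of the sketch is the shape of the output rather than the method: the effect is stated in clinical units, the interval conveys uncertainty, and the limitation is named in the same sentence instead of being left for the reader to infer. A real analysis would likely call for a different interval method and for adjustment for confounders.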
Looking ahead, it is unlikely that healthcare experts will need to master the intricacies of Data Science themselves. Instead, collaboration with data scientists will remain essential. However, it is crucial that clinicians approach the results presented to them with a critical perspective. They must develop the ability to question the underlying science, ensuring the validity and applicability of the findings. As George Box wisely noted, “All models are wrong, but some are useful.” This perspective underscores the importance of critical thinking when interpreting data in a clinical context.
References:
[1] Davenport, Thomas, and D.J. Patil. Data Scientist: The Sexiest Job of the 21st Century-Meet the People Who Can Coax Treasure out of Messy, Unstructured Data. Harvard Business Review, Oct. 2012.
[2] Davenport, Thomas, and D.J. Patil. Is Data Scientist Still the Sexiest Job of the 21st Century? Harvard Business Review, July 2022.
[3] Donoho, David. 50 Years of Data Science. Journal of Computational and Graphical Statistics, Sept. 2015.