Data Science for Healthcare: A Brighter Clinical Future


Elnaz Amanzadeh Jajin 1, *

1 Cellular and Molecular Biology/Microbiology, Faculty of Biological Sciences and Technologies, University of Isfahan, Isfahan, Iran

How to Cite: Amanzadeh Jajin E. Data Science for Healthcare: A Brighter Clinical Future, Precis Med Clin OMICS. In Press(In Press): e114871.


Precision Medicine and Clinical OMICS: In Press (In Press); e114871
Published Online: June 15, 2021
Article Type: Editorial
Received: April 7, 2021
Revised: April 30, 2021
Accepted: May 9, 2021
Uncorrected Proof scheduled for 1 (1)

The last decade has been marked by the exploitation of clinical data and major advances in computational methods, which have generated large amounts of data with detailed information at the molecular level. Health systems can use these data to increase the accuracy of diagnostic techniques and the efficiency of treatments. Clinical data, especially those collected in Electronic Health Records (EHRs), have become a new approach to recording patients' health information (1). Computer science has created a great opportunity to record detailed patient data. EHRs include, for instance, demographic characteristics, medical images, medical signals, genomic data, and pharmaceutical data. EHRs give hospitals 24/7 access to patients' clinical histories. In addition, the information provided through EHRs allows physicians to make better diagnostic and therapeutic decisions. Moreover, access to such information paves the way for better follow-up and lower healthcare costs (2). EHRs also support health research and hypothesis generation in hospitals, where new treatments are needed and applied every day. Despite the significant benefits of collecting clinical data, storing and analyzing these data is becoming a challenge. The first challenge is data storage. Access to patients' data and accurate data analysis require the cooperation of several departments. Healthcare data science draws on data analytics, big data management, programming, classification and prediction algorithms, statistics, web development, and computer science, among other fields. Data scientists have developed the infrastructure required to store high-throughput data, known as cloud computing. Through cloud computing, partners working on the same project can access all the necessary data.
Different users have different responsibilities, including operators who enter data, researchers who analyze it, and programmers who update the web-based platforms. This approach also makes it possible to review and update databases remotely (3). The second challenge, data analysis, has also been eased, since clouds are usually connected to supercomputers. In healthcare, big data is heterogeneous and multi-spectral, because it is collected from various sources, including diagnostic procedures, treatments, and EHRs (4). Therefore, several methods are needed to integrate these data, either before analysis (pre-analysis) or after it (post-analysis), and to analyze them. Data science makes it possible to gather healthcare data and analyze them to produce outputs that healthcare staff can understand. Data analytics is especially important when structured and unstructured data must be linked, analyzed, and interpreted to produce understandable reports for physicians. Currently, advances in OMICs studies in different dimensions, including genomics, proteomics, and metabolomics, have generated huge amounts of data (5).
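
The distinction between integrating heterogeneous data before analysis and integrating it after analysis can be illustrated with a small sketch. This is not a method from the cited works: the patient IDs, feature values, and scores are invented toy data, the function names `early_integration` and `late_integration` are hypothetical, and averaging per-source scores is only one simple way to combine results after analysis.

```python
from statistics import mean

# Hypothetical toy records from two sources, keyed by patient ID (not real data).
lab_features = {"p1": [5.4, 140.0], "p2": [6.1, 132.0], "p3": [4.9, 128.0]}
omics_features = {"p1": [0.8, 1.2, 0.3], "p2": [1.1, 0.9, 0.7]}


def early_integration(*sources):
    """Pre-analysis integration: join sources on patient ID and concatenate
    their feature vectors. Patients missing from any source are dropped."""
    shared = set.intersection(*(set(source) for source in sources))
    return {pid: [value for source in sources for value in source[pid]]
            for pid in sorted(shared)}


def late_integration(*score_tables):
    """Post-analysis integration: each source has already been analyzed into a
    per-patient score; average the scores of patients present in every table."""
    shared = set.intersection(*(set(table) for table in score_tables))
    return {pid: mean(table[pid] for table in score_tables) for pid in sorted(shared)}


if __name__ == "__main__":
    print(early_integration(lab_features, omics_features))
    # Hypothetical risk scores produced separately from each source.
    print(late_integration({"p1": 0.5, "p2": 1.0}, {"p1": 0.25, "p2": 0.5}))
```

Early integration lets a single model see all features jointly but needs every source for each patient (here, p3 is dropped); late integration lets each source be analyzed with the method best suited to it and combines only the results.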

To analyze OMICs data, integrated packages and pipelines have been developed that can be run on remote supercomputers. The results can be used for point mutation detection, gene enrichment analysis, protein interaction studies, and pathway enrichment analysis. In addition, biological researchers have widely analyzed genomic data to detect new markers for complex diseases, including cancers, neurodegenerative diseases, and atherosclerosis. The output of these analyses can be used to design novel and more effective medications. Clinical images such as magnetic resonance imaging (MRI), ultrasound, and computed tomography (CT) scans require enhancement, segmentation, and denoising. Processing clinical signal data, for example integrating the huge volumes of data collected in intensive care units (ICUs) with large amounts of pathophysiological data, requires robust infrastructure. Text mining of available documents, including academic research articles, is considered a source for improving treatment methods.
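
Gene and pathway enrichment analysis is often carried out as an over-representation test: given a list of genes of interest (for example, genes carrying point mutations), one asks whether a pathway contains more of them than chance alone would predict. The sketch below assumes a one-sided hypergeometric test, which is one common choice; established pipelines may use other statistics and must correct for testing many pathways at once. The function name and the gene counts are hypothetical.

```python
from math import comb


def enrichment_p_value(background: int, pathway: int, selected: int, overlap: int) -> float:
    """One-sided hypergeometric p-value P(X >= overlap).

    background: genes measured in the experiment (N)
    pathway:    background genes annotated to the pathway (K)
    selected:   genes in the list of interest, e.g. mutated genes (n)
    overlap:    selected genes that belong to the pathway (k)
    """
    if not 0 <= overlap <= min(pathway, selected):
        raise ValueError("overlap must lie between 0 and min(pathway, selected)")
    # Sum the probabilities of observing `overlap` or more pathway genes by chance.
    tail = sum(
        comb(pathway, i) * comb(background - pathway, selected - i)
        for i in range(overlap, min(pathway, selected) + 1)
    )
    return tail / comb(background, selected)


if __name__ == "__main__":
    # Hypothetical counts: 20,000 background genes, a 100-gene pathway,
    # 200 selected genes, 10 of which fall in the pathway (about 1 expected by chance).
    print(enrichment_p_value(20_000, 100, 200, 10))
```

Exact integer arithmetic with `math.comb` keeps the sketch dependency-free; in practice the same tail probability is available from statistical libraries such as SciPy's `scipy.stats.hypergeom`.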

Another application of healthcare data science is predicting future steps in the diagnosis and treatment of various diseases (6). Because healthcare data lose value over time and predictions lose their validity after a certain period, data scientists need access to up-to-date data. At the same time, as data grow in size, dimensionality, and complexity, analysis becomes more complicated. Machine learning (ML) and artificial intelligence (AI) are key approaches to big data analysis and prediction. The integration of big data analysis into healthcare systems has given rise to a new concept, the clinical decision support system (CDSS) (7). ML and AI play key roles in the development of CDSSs. A CDSS takes a patient's medical history as input, applies ML approaches to it, and returns medical recommendations.

Notably, given the heterogeneity of biological and clinical big data, ML methods can be applied to develop new models, such as naïve Bayes models, and to classify data using classification algorithms such as random forests, support vector machines, and k-nearest neighbors. Meanwhile, deep learning has become very popular and is frequently used in big data projects. It provides end-to-end learning methods for healthcare big data and makes it possible to translate complex clinical data into health reports that humans can understand. For this reason, researchers have widely used deep learning to develop prediction algorithms, for example for the early diagnosis of neurodegenerative diseases from MRI scans, detection of variants of Alzheimer's disease, early diagnosis of breast cancer from ultrasound images, and prediction of patients' future diseases from their clinical histories. Moreover, some research groups have used deep learning to develop applications for the diagnosis and treatment of diseases. For example, DeepCare is a predictive application, built on EHR data, that provides medical recommendations, and DeepPatient uses patients' clinical histories to predict future diseases. However, developing deep learning models requires available data, powerful computers, and accurate, computationally intensive algorithms to produce reliable predictions (8-10).
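
Naïve Bayes, one of the models named above, is simple enough to sketch directly. The example below is a minimal Gaussian naïve Bayes classifier in plain Python, assuming numeric features that are roughly normally distributed within each class and independent of each other given the class (the "naïve" assumption). The function names, features, and values are hypothetical toy data, not a validated clinical model; libraries such as scikit-learn provide tested implementations (`sklearn.naive_bayes.GaussianNB`).

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate each class's prior and per-feature mean and variance."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)
    model = {}
    for label, rows in rows_by_class.items():
        columns = list(zip(*rows))
        means = [sum(col) / len(col) for col in columns]
        # A small constant keeps the variance positive when a feature is constant.
        variances = [sum((v - m) ** 2 for v in col) / len(col) + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (len(rows) / len(X), means, variances)
    return model


def predict(model, row):
    """Return the class with the highest log posterior (up to a constant)."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for x, m, v in zip(row, means, variances):
            # Log of the normal density; summing logs avoids numerical underflow.
            score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    # Invented toy data: [age in years, systolic blood pressure in mmHg].
    X = [[45, 120], [50, 125], [38, 118], [70, 160], [65, 155], [72, 165]]
    y = ["low risk", "low risk", "low risk", "high risk", "high risk", "high risk"]
    model = fit_gaussian_nb(X, y)
    print(predict(model, [68, 158]))  # high risk
    print(predict(model, [42, 121]))  # low risk
```

Working with log probabilities rather than multiplying raw densities keeps the scores numerically stable as the number of features grows.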

Another important issue is data privacy, particularly in health-related big data projects, which should be addressed by various stakeholders, including providers, data scientists, and users of deep learning platforms (11). In this regard, policies should govern data transfer (especially when cloud computing is used), data storage, data usage, and the use of analytics results by hospitals and organizations that cooperate in developing deep learning methods. Apart from the above-mentioned advantages for patient care, healthcare data science also has economic benefits for healthcare systems. It is estimated that, in the United States alone, applying healthcare data science to cancer patients is worth about $300 billion, two-thirds of which (about $200 billion) comes from reduced costs and the remaining third (about $100 billion) from additional benefits (12).

There is no doubt that healthcare data science plays a key role in healthcare advancements. Within a few years, patients' EHRs will be available on a global scale, which will speed up both diagnosis and treatment. Indeed, EHRs are considered a very first step toward precision medicine and telemedicine (3). Currently, 24 countries and 9 consortia are planning precision medicine initiatives; they have developed robust infrastructure and computational platforms and mainly use ML methods.

Altogether, fast and accurate data analysis is a necessity for the success of healthcare data science. To this end, the responsible institutions should provide infrastructure and investment. In addition, dedicated healthcare data science departments and data scientists familiar with biological and medical information are necessary. In conclusion, data science is ushering in a new era in healthcare that can improve all aspects of patient care, provide new job opportunities, and bring economic benefits to health systems. Notably, every day brings a new step toward more developed healthcare data science.



1. Agrawal R, Prabakaran S. Big data in digital healthcare: Lessons learnt and recommendations for general practice. Heredity (Edinb). 2020;124(4):525-34. doi: 10.1038/s41437-020-0303-2. [PubMed: 32139886]. [PubMed Central: PMC7080757].

2. Esposito C, De Santis A, Tortora G, Chang H, Choo KKR. Blockchain: A panacea for healthcare cloud-based data security and privacy? IEEE Cloud Computing. 2018;5(1):31-7. doi: 10.1109/mcc.2018.011791712.

3. Hulsen T, Jamuar SS, Moody AR, Karnes JH, Varga O, Hedensted S, et al. From big data to precision medicine. Front Med (Lausanne). 2019;6:34. doi: 10.3389/fmed.2019.00034. [PubMed: 30881956]. [PubMed Central: PMC6405506].

4. Dinov ID. Volume and value of big healthcare data. J Med Stat Inform. 2016;4. doi: 10.7243/2053-7662-4-3. [PubMed: 26998309]. [PubMed Central: PMC4795481].

5. Issa NT, Byers SW, Dakshanamurthy S. Big data: The next frontier for innovation in therapeutics and healthcare. Expert Rev Clin Pharmacol. 2014;7(3):293-8. doi: 10.1586/17512433.2014.905201. [PubMed: 24702684]. [PubMed Central: PMC4448933].

6. Dinov ID. Methodological challenges and analytic opportunities for modeling and interpreting big healthcare data. Gigascience. 2016;5:12. doi: 10.1186/s13742-016-0117-6. [PubMed: 26918190]. [PubMed Central: PMC4766610].

7. Belle A, Thiagarajan R, Soroushmehr SM, Navidi F, Beard DA, Najarian K. Big data analytics in healthcare. Biomed Res Int. 2015;2015:370194. doi: 10.1155/2015/370194. [PubMed: 26229957]. [PubMed Central: PMC4503556].

8. Fröhlich H, Balling R, Beerenwinkel N, Kohlbacher O, Kumar S, Lengauer T, et al. From hype to reality: Data science enabling personalized medicine. BMC Med. 2018;16(1):150. doi: 10.1186/s12916-018-1122-7. [PubMed: 30145981]. [PubMed Central: PMC6109989].

9. Zhang S, Bamakan SMH, Qu Q, Li S. Learning for personalized medicine: A comprehensive review from a deep learning perspective. IEEE Rev Biomed Eng. 2019;12:194-208. doi: 10.1109/RBME.2018.2864254. [PubMed: 30106692].

10. Miotto R, Wang F, Wang S, Jiang X, Dudley JT. Deep learning for healthcare: Review, opportunities and challenges. Brief Bioinform. 2018;19(6):1236-46. doi: 10.1093/bib/bbx044. [PubMed: 28481991]. [PubMed Central: PMC6455466].

11. Iyengar A, Kundu A, Pallis G. Healthcare informatics and privacy. IEEE Internet Computing. 2018;22(2):29-31. doi: 10.1109/mic.2018.022021660.

12. Harper EM. The economic value of health care data. Nurs Adm Q. 2013;37(2):105-8. doi: 10.1097/NAQ.0b013e318286db0d. [PubMed: 23454988].

Copyright © 2021, Author(s). This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License, which permits copying and redistributing the material for noncommercial use only, provided the original work is properly cited.