Machine Learning in Precision Medicine to Preserve Privacy via Encryption

October 22, 2021

What if we could diagnose and treat patients based on their genetic makeup, medical history, living environment, and lifestyle?

Wouldn’t that be more effective than the current regime of one-size-fits-all medicine? It would, and the approach is known as Precision Medicine. Precision Medicine could improve how health risks are identified, diagnoses are made, and outcomes are predicted; however, concerns around data privacy have held back its wide adoption in clinical practice.

Principles of precision medicine could be applied essentially to all medical areas. Image credit: sasint | Free image via Pixabay

William Briguglio, Parisa Moghaddam, Waleed A. Yousef, Issa Traoré, and Mohammad Mamun discuss these data privacy concerns in their research paper titled “Machine Learning in Precision Medicine to Preserve Privacy via Encryption”, which forms the basis of the following text.

Importance of this Research

The researchers have proposed a generic machine learning with encryption (MLE) framework and used it to build a predictive model that outperforms the most recent predictive model built for the same dataset while preserving the privacy of patients’ genomic data. The researchers have also made the design and implementation of the framework open-source to facilitate the validation, reproduction, and extension of their work.

Understanding Machine Learning with Encryption

The image below presents the block diagram of the proposed machine learning with encryption (MLE) framework.

Image credit: arXiv:2102.03412 [cs.LG]

Key Components of the MLE Framework

Client: The testing data resides at the client. This data is sensitive and confidential, so it needs encryption.
Server: The cloud engine that performs prediction and general analytics.
Database: It contains publicly available genetic datasets and private datasets. The public datasets do not require privacy protection; however, the confidentiality of the private datasets should be protected by encryption.
ML Construction: It uses the datasets in the datasets module to construct models, including transformation, feature selection, resampling, etc.

Datasets

The MSK-IMPACT dataset contains tumor tissue samples taken from 10,336 patients, in which 100,000 mutations were identified.

Building the Model

The researchers evaluated 2,240 different ML configurations, one of which achieved an accuracy of 77.47%. The three best configurations are shown below.

Image credit: arXiv:2102.03412 [cs.LG]
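
To make the idea of trying many ML configurations concrete, here is a minimal Python sketch of how such a search can be enumerated and ranked. It is not the researchers' code: the option names in SEARCH_SPACE, the helper functions, and the placeholder scoring function are all hypothetical, and this small grid yields 81 combinations rather than the 2,240 evaluated in the paper.

```python
from itertools import product

# Hypothetical search space: these option names are illustrative
# assumptions, not the grid used in the paper.
SEARCH_SPACE = {
    "transformation": ["none", "standardize", "binarize"],
    "feature_selection": ["none", "chi2_top_k", "variance_threshold"],
    "resampling": ["none", "oversample", "undersample"],
    "classifier": ["logistic_regression", "svm", "random_forest"],
}


def enumerate_configurations(space):
    """Return every combination of options as a list of dicts."""
    keys = list(space)
    return [dict(zip(keys, values))
            for values in product(*(space[k] for k in keys))]


def top_k(configs, evaluate, k=3):
    """Score each configuration and return the k best as (score, config)."""
    scored = [(evaluate(cfg), index, cfg) for index, cfg in enumerate(configs)]
    # Highest score first; ties are broken by enumeration order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(score, cfg) for score, _, cfg in scored[:k]]


def placeholder_accuracy(config):
    """Stand-in for real training and validation (hypothetical).

    A real evaluation would train the configured pipeline and return its
    measured accuracy; this deterministic dummy only keeps the sketch runnable.
    """
    return (sum(len(value) for value in config.values()) % 10) / 10


if __name__ == "__main__":
    configs = enumerate_configurations(SEARCH_SPACE)
    print(len(configs), "configurations")
    for score, cfg in top_k(configs, placeholder_accuracy):
        print(score, cfg)
```

The number of configurations is the product of the number of options per stage, which is why even a handful of choices at each step quickly leads to thousands of candidate models.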

For encryption, the researchers used Microsoft SEAL, an open-source homomorphic encryption (HE) library developed by the Cryptography and Privacy Research Group at Microsoft.

The researchers made their open-source Python code available at https://github.com/isotlaboratory/Healthcare-Security-Analysis-MLE
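
The core idea of homomorphic encryption (HE) is that a server can compute on ciphertexts without ever seeing the underlying data. The following Python sketch illustrates this with a toy version of the Paillier cryptosystem, which supports addition of encrypted values and multiplication by plaintext constants. This is an assumption made purely for illustration: SEAL implements different, lattice-based schemes, and this is not the researchers' implementation. The function names, the feature vector, the integer weights, and the small fixed primes are all hypothetical, and the parameters are far too small to be secure.

```python
import math
import secrets


def keygen(p=2**31 - 1, q=2**61 - 1):
    """Toy Paillier key generation with fixed, insecure demo primes."""
    n = p * q                                        # public key
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)                             # requires Python 3.8+
    return n, (lam, mu)                              # (public, private)


def encrypt(n, m):
    """Encrypt integer m under public key n, using generator g = n + 1."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1             # random r in [1, n-1]
        if math.gcd(r, n) == 1:
            break
    # (n + 1)^m mod n^2 equals 1 + m*n mod n^2
    return ((1 + (m % n) * n) * pow(r, n, n2)) % n2


def decrypt(n, private_key, c):
    """Decrypt c and map the result back to a signed integer."""
    lam, mu = private_key
    m = ((pow(c, lam, n * n) - 1) // n) * mu % n
    return m - n if m > n // 2 else m


def server_score(n, encrypted_features, weights, bias):
    """Server side: compute Enc(bias + sum(w_i * x_i)) from ciphertexts only.

    Multiplying ciphertexts adds the plaintexts; raising a ciphertext to
    the power w multiplies its plaintext by w.
    """
    n2 = n * n
    acc = encrypt(n, bias)
    for c, w in zip(encrypted_features, weights):
        acc = (acc * pow(c, w % n, n2)) % n2
    return acc


if __name__ == "__main__":
    n, private_key = keygen()                        # done by the client
    features = [1, 0, 1, 1]                          # hypothetical mutation flags
    encrypted = [encrypt(n, x) for x in features]    # sent to the server
    result = server_score(n, encrypted, [3, -2, 5, -7], 4)
    print(decrypt(n, private_key, result))           # client decrypts the score
```

In this sketch the client holds the private key and sends only ciphertexts, while the server needs only the public value n and its own model weights. Real HE libraries such as SEAL additionally support operations like multiplication of two encrypted values, which the Paillier scheme does not.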

Conclusion

In the words of the researchers,

Toward building privacy preserving Machine Learning (ML) models for precision medicine, this article has three contributions. First, we proposed and implemented a machine learning with encryption (MLE) framework that accommodates different scenarios for encrypting the ML training-testing process. Second, and most importantly, we analyzed the recent high-quality clinical sequencing cohort dataset MSK-IMPACT and provided a predictive model that is both secure and outperforming the most recent predictive model built for the same dataset. Third, we offered the ML, software engineering, and precision medicine communities free resources: respectively, the client-server implementation of the framework, the Python code of all the ML experiments, and a cloud service to test genomic cases. These offerings contribute to the evolution of the privacy-preserving analytics of precision medicine

Source: William Briguglio, Parisa Moghaddam, Waleed A. Yousef, Issa Traoré and Mohammad Mamun’s “Machine Learning in Precision Medicine to Preserve Privacy via Encryption”