Novel Informatics Approaches to Clinical Trials Cohort Identification for Precision Medicine

By Ajay Shah, Director, Research Informatics and Systems Division, City of Hope

The ten highest-grossing drugs in the US help between 25 and 40 percent of the people who take them. The Precision Medicine (PM) approach promises better-informed, stratified, and personalized patient care decisions and therapeutic discoveries. PM uses temporal (time-based) collection of a multi-faceted description of individuals that includes “omic” (e.g., genomic, transcriptomic, metabolomic), microbiome, and exposome data, in conjunction with phenotypic and demographic data from electronic medical records (EMR), disease registries, clinical trials, and long-term follow-up, along with real-time monitoring data from sensors and wearable devices.

The vexing problem of finding a sufficient number of patients for clinical trials becomes even more daunting in PM-focused trials. Among cancer-related trials that fail, 39 percent fail due to poor patient accrual. As drug discovery becomes more stratified and personalized, this problem will be exacerbated. New PM-driven paradigms in clinical trials are therefore gaining adoption. These include (i) basket trials, which test the effectiveness of an intervention based on the patient’s genotype regardless of the particular disease; (ii) umbrella trials, which test the effectiveness of several interventions based on the patient’s genotype and disease subtype; (iii) N-of-1 trials, designed for a single patient; and (iv) adaptive trials, in which the trial therapy is evaluated and optimized during the course of the trial.

City of Hope (COH), an NCI-designated Comprehensive Cancer Center, is at the forefront of novel approaches to cancer therapies using PM. Our Center for Informatics & Data Science is developing SPIRIT (Software Platform for Integrated Research Information and Transformation), comprising commercial, open-source, and custom solutions. SPIRIT leverages four major technology platforms for innovative applications, each aimed at solving an individual piece of the complex puzzle that is patient identification and stratification for PM-based clinical trials.

The four major underlying technological platforms:

1. Electronic Medical Record (EMR) System:

Healthcare providers are mandated to adopt “meaningful use” of EMR. COH is currently implementing Epic EMR. Having clinical care data accessible to researchers is essential for PM. It is anticipated that in the coming years, EMR will play a more significant role in research.

2. Enterprise Data Warehouse (EDW):

Integrating a multitude of diverse healthcare data is challenging due to technical and cultural issues. These include the lack of universally followed standards, multiple ontologies, unstructured text, differences in the granularity of data collected, data silos, and lack of transparency. Privacy, security, regulatory, and compliance (e.g., HIPAA) issues further complicate healthcare data integration. The COH EDW combines the aforementioned data sources to aid research. tranSMART, a translational research knowledge management and hypothesis generation platform at COH, leverages data from the EDW.

3. BioMedical Natural Language Processing (NLP): SPIRIT NLP:

It is essential to extract structured, coded information from biomedical texts such as clinical notes. Information such as the “recurrence”, “relapse”, or “metastasis” of a disease is hidden in the clinical notes. The smoking status of a patient is difficult to obtain from the EMR because it may be expressed in multiple ways (smoker, tobacco-user, etc.). Various ontologies also code similar diseases differently: liver cancer and hepatic cancer are identical, whereas hepatocellular carcinoma is a type of liver cancer. Synonyms of critical terms further obfuscate the extraction of usable information. For example, EGFR, a gene that when mutated leads to various cancers, may be referred to as ERBB, ERBB1, mENA, etc. The SPIRIT NLP platform integrates with the Unified Medical Language System (UMLS) to semantically disambiguate biomedical concepts and to extract related metadata, inter-concept relationships, and attribute-value pairs.
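The synonym problem described above can be illustrated with a minimal sketch. It is not the SPIRIT NLP implementation: the synonym table, the function names, and the sample note are all hypothetical, and a real system would draw its vocabulary from UMLS rather than a hand-written dictionary.

```python
import re

# Hypothetical synonym table (assumption); a real system would query UMLS.
SYNONYMS = {
    "EGFR": {"egfr", "erbb", "erbb1", "mena"},
    "liver cancer": {"liver cancer", "hepatic cancer"},
    "smoker": {"smoker", "tobacco-user", "tobacco user"},
}

def build_lookup(synonyms):
    """Invert the table so each surface form maps to its canonical concept."""
    return {form: concept for concept, forms in synonyms.items() for form in forms}

def find_concepts(text, synonyms=SYNONYMS):
    """Return the set of canonical concepts mentioned in free text."""
    lowered = text.lower()
    found = set()
    for form, concept in build_lookup(synonyms).items():
        # Alphanumeric boundaries keep "erbb" from matching inside "erbb12".
        pattern = r"(?<![a-z0-9])" + re.escape(form) + r"(?![a-z0-9])"
        if re.search(pattern, lowered):
            found.add(concept)
    return found

# Hypothetical clinical note fragment.
note = "Pt is a tobacco-user; ERBB1 mutation detected. Hepatic cancer suspected."
concepts = find_concepts(note)
```

Here `concepts` contains the three canonical terms regardless of which synonym the note used. The sketch deliberately ignores negation ("denies smoking"), temporality, and concept hierarchies such as hepatocellular carcinoma being a type of liver cancer, all of which a production NLP pipeline has to handle.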

4. Machine Learning and Scientific Analytics (SA): SPIRIT SA

The Scientific Analytics platform, SPIRIT SA, uses open-source and commercial tools to support a wide array of scientific applications. SPIRIT SA normalizes the input data, performs data cleanup and missing-data analysis, and simultaneously applies several machine learning techniques for analysis. The results are uniformly validated and consensus scored. The platform guides the user at various stages of data-driven analysis.
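One simple way to picture consensus scoring is a majority vote across models that were all validated on the same held-out data. The sketch below assumes that interpretation; the article does not say how SPIRIT SA computes its consensus, and the model outputs shown are made-up illustrative values.

```python
from collections import Counter

def accuracy(predicted, actual):
    """Fraction of samples where the prediction matches the true label."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

def consensus(predictions):
    """Majority vote per sample; returns (label, agreement fraction) pairs."""
    results = []
    for votes in zip(*predictions):
        label, count = Counter(votes).most_common(1)[0]
        results.append((label, count / len(votes)))
    return results

# Hypothetical outputs of three models on the same five held-out patients.
truth = [1, 0, 1, 1, 0]
model_a = [1, 0, 1, 0, 0]
model_b = [1, 1, 1, 1, 0]
model_c = [1, 0, 0, 1, 0]

# Validate every model against the same held-out labels.
scores = {
    name: accuracy(preds, truth)
    for name, preds in [("a", model_a), ("b", model_b), ("c", model_c)]
}
voted = consensus([model_a, model_b, model_c])
consensus_accuracy = accuracy([label for label, _ in voted], truth)
```

In this toy data each model makes one mistake on a different patient, so the vote recovers every true label. The agreement fraction gives a rough per-patient confidence. With an even number of models, ties would need an explicit rule.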

These major technology platforms are augmented with a Clinical Trials Management System (CTMS), biospecimen repositories, imaging informatics systems, etc.

A Portfolio of Applications for Precision Medicine Cohort Identification

We are now building cutting-edge precision medicine cohort identification applications based on the technology platforms described earlier.

1. Deeper Understanding of Patients and their Diseases: Disease Registries

A patient or disease registry contains uniform data elements to evaluate specified outcomes for a population defined by a particular disease, condition, or exposure. Patient registries derive data from multiple sources, including the EDW and EMR. The lung cancer registry at COH, for example, has hundreds of discrete data elements to aid PM. Several other disease-specific registries are currently in use or being developed. Epic EMR’s Healthy Planet patient registry module aims to further PM research.

2. Understand Similarity and Heterogeneity Among Patients: SPIRIT Alike

It is important to understand the degree of heterogeneity among patients in order to target personalized therapies. SPIRIT Alike represents a patient’s attributes as fingerprints. The fingerprints are in turn used to compute similarity metrics to assess the source of heterogeneity among patients. Integration with SPIRIT SA enables deeper, machine-learning-driven analysis.
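The fingerprint idea can be sketched as a bit vector over a fixed attribute vocabulary, compared with the Tanimoto (Jaccard) coefficient. Both the encoding and the choice of metric are assumptions made for illustration; the article does not specify which representation or similarity metrics SPIRIT Alike uses, and the vocabulary and patients below are hypothetical.

```python
def fingerprint(attributes, vocabulary):
    """Encode a patient's attributes as 0/1 bits over a fixed vocabulary."""
    return tuple(1 if term in attributes else 0 for term in vocabulary)

def tanimoto(fp_a, fp_b):
    """Shared bits divided by bits set in either fingerprint."""
    both = sum(a & b for a, b in zip(fp_a, fp_b))
    either = sum(a | b for a, b in zip(fp_a, fp_b))
    # Convention (assumption): two empty fingerprints count as identical.
    return both / either if either else 1.0

# Hypothetical attribute vocabulary and patients.
VOCABULARY = ["lung cancer", "EGFR mutation", "smoker", "stage IV", "prior chemotherapy"]

p1 = fingerprint({"lung cancer", "EGFR mutation", "stage IV"}, VOCABULARY)
p2 = fingerprint({"lung cancer", "EGFR mutation", "smoker"}, VOCABULARY)
p3 = fingerprint({"smoker", "prior chemotherapy"}, VOCABULARY)

sim_12 = tanimoto(p1, p2)  # 2 shared bits out of 4 set in either: 0.5
sim_13 = tanimoto(p1, p3)  # no shared bits: 0.0
sim_23 = tanimoto(p2, p3)  # 1 shared bit out of 4 set in either: 0.25
```

Comparing which bits differ between two fingerprints, rather than only the score, is what points to the source of heterogeneity: p1 and p2 differ only in stage and smoking status.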

3. Discover Patients for Trials from EMR: SPIRIT DT (Discover Trees):

Ideally, clinicians contemplating PM studies should have easy access to potentially eligible patients, and physicians should have information about active clinical trials for their patients. The obstacle, however, is that the eligibility criteria for trials, like the information in the EMR, are not directly usable (computable) due to variations in ontologies and non-standard coding.

SPIRIT DT takes a semantically enriched, graphical approach to authoring computable clinical protocols while simultaneously and interactively searching the EMR to explore eligible patients. SPIRIT DT utilizes the i2b2 (Informatics for Integrating Biology and the Bedside) application to draw data from the EMR and the biospecimen repository via the EDW. Integration of DT with SPIRIT SA provides deeper analytics. A complementary application, eHope, automates the identification of patients from EMR as data.
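To make "computable" concrete, eligibility criteria can be written as a small tree of AND/OR/NOT nodes that is evaluated against each patient record. The following is a minimal sketch under that assumption, not the SPIRIT DT or i2b2 query model; the field names, the protocol, and the patient records are all hypothetical.

```python
def has(field, value):
    """Leaf: the record's field equals the given value."""
    return lambda patient: patient.get(field) == value

def at_least(field, minimum):
    """Leaf: numeric field is present and >= minimum (missing data fails)."""
    def check(patient):
        value = patient.get(field)
        return value is not None and value >= minimum
    return check

def all_of(*criteria):
    return lambda patient: all(c(patient) for c in criteria)

def any_of(*criteria):
    return lambda patient: any(c(patient) for c in criteria)

def none_of(*criteria):
    return lambda patient: not any(c(patient) for c in criteria)

# Hypothetical protocol: adult, late-stage, EGFR-mutated lung cancer,
# excluding patients already treated with a (hypothetical) prior therapy.
protocol = all_of(
    has("diagnosis", "lung cancer"),
    at_least("age", 18),
    any_of(has("stage", "III"), has("stage", "IV")),
    has("egfr_mutated", True),
    none_of(has("prior_tki", True)),
)

# Hypothetical patient records.
patients = [
    {"id": "P1", "diagnosis": "lung cancer", "age": 62, "stage": "IV", "egfr_mutated": True, "prior_tki": False},
    {"id": "P2", "diagnosis": "lung cancer", "age": 55, "stage": "II", "egfr_mutated": True},
    {"id": "P3", "diagnosis": "lung cancer", "age": 70, "stage": "III", "egfr_mutated": True, "prior_tki": True},
    {"id": "P4", "diagnosis": "liver cancer", "age": 48, "stage": "IV", "egfr_mutated": False},
    {"id": "P5", "diagnosis": "lung cancer", "stage": "IV", "egfr_mutated": True},
]

eligible = [p["id"] for p in patients if protocol(p)]
```

Only P1 satisfies every branch. P5 is rejected because its age is missing, which shows a design choice any real system must make explicitly: whether absent data fails a criterion or flags the patient for manual review.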

4. Identify Patients for Trials from Molecular Data: SPIRIT HOPESEQ

PM decisions are guided by genomic information obtained from a patient’s biospecimens. The SPIRIT HOPESEQ application facilitates analysis and annotation of a patient’s genomic profile based on publicly available knowledge about the variants. The variant patterns of an individual patient, and comparisons of variant patterns between multiple patients, provide further insight into the significance of a variant to pathogenesis. Integration of SPIRIT HOPESEQ with SPIRIT NLP presents clinicians with active trials based on the pathogenic gene variants.
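The annotate-then-match flow described above can be sketched as follows. This is an illustration only, not the SPIRIT HOPESEQ implementation: the annotation table, variant labels, gene placeholders (other than EGFR, which the article mentions), and trial identifiers are all hypothetical.

```python
# Hypothetical annotation knowledge base: (gene, variant) -> classification.
ANNOTATIONS = {
    ("EGFR", "var1"): "pathogenic",
    ("EGFR", "var2"): "benign",
    ("GENE_B", "var3"): "pathogenic",
}

# Hypothetical active trials and the genes they target.
TRIALS = {
    "TRIAL-001": {"EGFR"},
    "TRIAL-002": {"GENE_B"},
    "TRIAL-003": {"GENE_C"},
}

def pathogenic_genes(variants, annotations=ANNOTATIONS):
    """Genes carrying at least one variant annotated as pathogenic."""
    return {
        gene
        for gene, change in variants
        if annotations.get((gene, change)) == "pathogenic"
    }

def matching_trials(variants, trials=TRIALS, annotations=ANNOTATIONS):
    """Trials whose target genes overlap the patient's pathogenic genes."""
    genes = pathogenic_genes(variants, annotations)
    return sorted(trial for trial, targets in trials.items() if genes & targets)

# Hypothetical variant calls for two patients.
patient_1 = [("EGFR", "var1"), ("GENE_B", "var3")]
patient_2 = [("EGFR", "var2"), ("GENE_B", "var3")]

trials_1 = matching_trials(patient_1)
trials_2 = matching_trials(patient_2)

# Comparing variant patterns between patients: variants they share.
shared = set(patient_1) & set(patient_2)
```

Both patients carry an EGFR variant, but only patient 1's is annotated as pathogenic, so only patient 1 matches the EGFR-targeted trial. Variants missing from the annotation table are treated as non-matching here; a real pipeline would more likely surface them as variants of unknown significance.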

Finding a Needle in Multiple Haystacks!

A promising new development in cohort identification is the rise of academic and industry consortia. COH is part of ORIEN, a consortium of over a dozen cancer centers that share information and biospecimens from patients who have consented. ORIEN’s partnerships with industry facilitate enrollment in new therapeutic trials. COH is also part of the Los Angeles Data Consortium, which leverages i2b2 at Los Angeles area hospitals to enable cross-institutional identification of patients for trials.

The convergence of EMR, advances in genomics, computational methodologies, and initiatives like Cancer Moonshot has given great impetus to PM. The informatics community is innovating next-generation solutions to ensure that PM programs are successful.