Predicting DNA methylation from genetic data lacking racial diversity using shared classified random effects

J. Sunil Rao; Hang Zhang; Erin Kobetz; Melinda C Aldrich; Douglas Conway

doi:10.1016/j.ygeno.2020.10.036

Journal article

Peer reviewed

Genomics (San Diego, Calif.), Vol.113(1), pp.1018-1028

2021-01

DOI: https://doi.org/10.1016/j.ygeno.2020.10.036

PMID: 33161089

Keywords: DNA methylation; Mixed effects models; Racial diversity; Prediction

Abstract

Public genomic repositories are notoriously lacking in racially and ethnically diverse samples. This limits the scope of exploration and has in fact been one of the driving factors behind the All of Us project. Our focus here is to provide a model-based framework for accurately predicting DNA methylation from genetic data using racially sparse public repository data. Epigenetic alterations are of great interest in cancer research, but public repository data is limited in the information it provides. Genetic data, however, is more plentiful. Our phenotype of interest is cervical cancer in The Cancer Genome Atlas (TCGA) repository. Being able to generate such predictions would complement other work that has generated gene-level predictions of gene expression for normal samples. We develop a new prediction approach that uses shared random effects from a nested error mixed effects regression model. Sharing random effects allows borrowing of strength across racial groups, greatly improving predictive accuracy. Additionally, we show how to borrow further strength by combining data from different cancers in TCGA, even though the focus of our predictions is DNA methylation in cervical cancer. We compare our methodology against other popular approaches, including the elastic net shrinkage estimator and random forest prediction. Results are very encouraging, with the shared classified random effects approach uniformly producing more accurate predictions, both overall and for each racial group.

Highlights

• Public genomic repositories lack racial and ethnic diversity.
• A mixed-model framework is introduced to accurately predict DNA methylation.
• Strength is borrowed across racial groups using shared classified random effects.
• Borrowing strength is improved by integrating data from different cancers.
• Empirical performance is shown for predicting DNA methylation for cervical cancer in TCGA.
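
The abstract refers to random effects from a nested error mixed effects regression model. As a rough illustration only, the sketch below shows the textbook shrinkage estimate of a group-level random intercept in such a model, and how that estimate is added to a fixed-effect prediction. It is not the authors' shared classified random effects procedure: the step that matches a new sample to a group is not shown, the variance components are assumed known, and the names `blup_random_intercepts`, `predict`, and the group labels are hypothetical.

```python
from collections import defaultdict


def blup_random_intercepts(residuals, groups, var_group, var_error):
    """Shrunken group intercepts for a nested error (random-intercept) model.

    residuals : observed value minus the fixed-effect fit, one per sample
    groups    : group label for each sample (hypothetical labels)
    var_group : variance of the group random effect (assumed known here)
    var_error : variance of the within-group error (assumed known here)
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r, g in zip(residuals, groups):
        sums[g] += r
        counts[g] += 1

    effects = {}
    for g, n in counts.items():
        # Groups with few samples are pulled more strongly toward zero.
        shrink = n * var_group / (n * var_group + var_error)
        effects[g] = shrink * (sums[g] / n)
    return effects


def predict(fixed_part, group, effects):
    """Fixed-effect prediction plus the group effect (0 for an unseen group)."""
    return fixed_part + effects.get(group, 0.0)


# Toy residuals for two hypothetical groups.
residuals = [1.0, 3.0, -2.0]
groups = ["g1", "g1", "g2"]
effects = blup_random_intercepts(residuals, groups, var_group=1.0, var_error=1.0)
```

In this toy setting the single-sample group is shrunk by one half and the two-sample group by two thirds, which is the sense in which small groups lean on the rest of the data.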

Metrics

6 Record Views

2 Times Cited - Web of Science

InCites Highlights

These are selected metrics from InCites Benchmarking & Analytics tool, related to this output

Collaboration types: Domestic collaboration
Citation topics: 1 Clinical & Life Sciences; 1.54 Molecular & Cell Biology - Genetics; 1.54.100 DNA Methylation
Web Of Science research areas: Biotechnology & Applied Microbiology; Genetics & Heredity
ESI research areas: Molecular Biology & Genetics

Details

Title: Predicting DNA methylation from genetic data lacking racial diversity using shared classified random effects
Creators: J. Sunil Rao - University of Miami, FL, United States of America
Hang Zhang - University of Miami, FL, United States of America
Erin Kobetz - University of Miami, FL, United States of America
Melinda C Aldrich - Vanderbilt University Medical Center, Nashville, TN, United States of America
Douglas Conway - Vanderbilt University Medical Center, Nashville, TN, United States of America
Publication Details: Genomics (San Diego, Calif.), Vol.113(1), pp.1018-1028
Publisher: Elsevier Inc
Academic Unit: Miller School of Medicine; Jay Weiss Institute for Health Equity at Sylvester; UMMG Department of Public Health Sciences Research
Language: English
Resource Type: Journal article
PMID: 33161089
Record Identifier: 991031578434702976