Abstract
Breast cancer is the most common cancer in women according to the WHO's World Cancer Report 2014. In 2015 alone, 571 000 people died of breast cancer. One of the major causes of cancer formation is epigenetic change, that is, a change in gene function that does not alter the DNA sequence. Among the various epigenetic mechanisms, DNA methylation, the addition of a methyl group to DNA or its removal, is the most studied. The methylation levels of a gene's CpG islands are crucial in determining its expression. Changes in gene expression may silence tumor suppressor genes, which may in turn cause cancer. Here, we develop a hybrid supervised learning framework to estimate gene expression from the methylation levels of CpG islands in the TCGA Breast Cancer dataset from the Xena Browser. The proposed framework comprises multiple supervised learning algorithms, including a crossbreed algorithm for gene selection and gene expression estimation. The crossbreed algorithm is tested against various algorithms known in the literature, including multi-linear regression, support vector machines, and stochastic gradient boosting. For the chosen genes, the selected algorithm-specific parameters outperformed the rest of the parameter space. The competing performances of the algorithms helped identify the significant CpG islands with confidence.