Abstract
Several artificial neural networks (ANNs) have recently been developed as
Cox proportional hazards models for predicting cancer prognosis from the tumor
transcriptome. However, they have not demonstrated significantly better
performance than traditional Cox regression with regularization. Training
an ANN with high predictive power is challenging when the number of data
samples is limited and the feature space is high-dimensional. Recent
advancements in image classification have shown that contrastive learning can
facilitate downstream learning tasks by learning good feature representations
from a limited number of data samples. In this paper, we applied supervised
contrastive learning to tumor gene expression and clinical data to learn
feature representations in a low-dimensional space. We then used these learned
features to train the Cox model for predicting cancer prognosis. Using data
from The Cancer Genome Atlas (TCGA), we demonstrated that our contrastive
learning-based Cox model (CLCox) significantly outperformed existing methods in
predicting the prognosis of 18 types of cancer under consideration. We also
developed contrastive learning-based classifiers to classify tumors into
different risk groups and showed that contrastive learning can significantly
improve classification accuracy.
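
As an illustration of the objective that a Cox model optimizes, the following is a minimal sketch of the standard negative log partial likelihood, written in plain Python. It is not the CLCox implementation. The function name, the argument names, and the choice to ignore tied event times are assumptions made for this example; in the setting described above, the risk scores would be computed from the learned low-dimensional features.

```python
import math

def cox_neg_log_partial_likelihood(risk_scores, times, events):
    """Negative log partial likelihood of a Cox model (no correction for ties).

    risk_scores: predicted log-risk per sample (hypothetical input, e.g. a
                 linear function of learned features)
    times:       observed survival or censoring time per sample
    events:      1 if the event (death) was observed, 0 if censored
    """
    total = 0.0
    for score_i, time_i, event_i in zip(risk_scores, times, events):
        if not event_i:
            # Censored samples contribute only through other samples' risk sets.
            continue
        # Risk set: every sample still under observation at time_i.
        risk_set_sum = sum(
            math.exp(score_j)
            for score_j, time_j in zip(risk_scores, times)
            if time_j >= time_i
        )
        total -= score_i - math.log(risk_set_sum)
    return total
```

A lower value means the predicted risk ordering agrees better with the observed survival times. For example, assigning higher scores to samples with shorter survival yields a smaller loss than the reverse assignment.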