Abstract
Tumor deconvolution enables the identification of the diverse cell types that make up solid tumors. To date, however, both the algorithms developed to deconvolve tumor samples and the gold-standard datasets used to assess them are geared toward the analysis of gene expression (e.g., RNA sequencing) rather than protein levels. Despite the popularity of gene expression datasets, protein levels often provide a more accurate view of rare cell types. To facilitate the use, development, and reproducibility of multiomic deconvolution algorithms, we introduce Decomprolute, a Common Workflow Language framework that leverages containerization to compare tumor deconvolution algorithms across multiomic datasets. Decomprolute incorporates the large-scale multiomic datasets produced by the Clinical Proteomic Tumor Analysis Consortium (CPTAC), which include matched mRNA expression and proteomic data from thousands of tumors across multiple cancer types, to build a fully open-source, containerized proteogenomic tumor deconvolution benchmarking platform. http://pnnl-compbio.github.io/decomprolute
• Decomprolute enables benchmarking of proteomic deconvolution algorithms
• Framework incorporates proteogenomic tumor data from over 1,000 patient samples
• Common Workflow Language (CWL) automates multiple tests across deconvolution algorithms
• Extendable framework is designed to incorporate additional algorithm development
Our goal is to provide a comprehensive platform for algorithm developers and researchers to benchmark and run tumor deconvolution algorithms on multiomic data. We designed Decomprolute to be a modular tool that can be used to evaluate a selection of existing deconvolution algorithms on a cancer dataset of interest and to assess the quality of any new methods that may be developed.
Feng et al. describe Decomprolute, a framework to benchmark algorithms that deconvolve bulk gene and protein expression measurements using cell-specific markers. Decomprolute uses the Common Workflow Language to automate a series of benchmarks that assess the performance of algorithms on proteomic data from over 1,000 cancer samples.