Abstract
It is well documented that different prognostic signatures for early-discovery
breast cancer share very few genes, if any. This apparent discrepancy has been
attributed to the limits of simple machine-learning identification and ranking
techniques, and the biological relevance and meaning of the prognostic gene
lists were questioned.
Subsequently, proponents of the prognostic gene lists claimed that different
lists do capture similar underlying biological processes and pathways. The
present study scrutinizes the validity of this claim for two important gene
lists that are the focus of current large-scale validation efforts. We
performed careful enrichment analysis, controlling the effects of multiple
testing in a manner that takes into account the nested, dependent structure of
gene ontologies. In contradiction to several previous publications,
we find that the only biological process or pathway for which statistically
significant concordance can be claimed is cell proliferation, a process whose
relevance and prognostic value were well known long before gene expression
profiling. The claims reported by others, of wider concordance between the
biological processes captured by the two prognostic signatures studied, either
lacked statistical rigor or were in fact based on addressing some other
question.