Abstract
Bulk sequencing has proven inadequate for capturing the full complexity of cancer heterogeneity, shifting the focus toward cellular resolution. Single-cell DNA sequencing (scDNA-seq) technologies offer an effective approach for analyzing the subclonal molecular architecture of tumors with greater granularity, enabling the identification of somatic mutations (single nucleotide variants, SNVs, and short insertions and deletions, indels) and copy number variants (CNVs) at single-cell resolution. This capability holds considerable promise, for example for characterizing the timing and hierarchy of the molecular events that drive tumor initiation and progression. However, traditional scDNA-seq methods are hindered by substantial artifacts introduced during whole-genome amplification, such as allelic dropout and allelic imbalance, which compromise the accuracy of variant detection. Primary Template-directed Amplification (PTA) addressed these limitations by providing higher and more uniform genome coverage; however, further refinement in filtering amplification-induced artifacts is still needed for accurate variant calling.
Here, we present CAPTATe (Characterize Artifacts in PTA Technology), a novel tool for improved artifact filtering and enhanced detection of SNVs, indels, and CNVs in scDNA-seq data. CAPTATe performs allele-specific CNV calling and accounts for fluctuations in the read-depth ratio (RDR) through a multi-channel joint segmentation approach at the subclonal level. The tool leverages an extension of the variational framework recently developed for single-cell RNA-seq (de Falco et al., Nature Communications 2023) and integrates both B-allele frequency (BAF) and RDR for accurate CNV detection. In addition, SNV and indel artifacts are filtered with a machine learning-based approach that accounts for allelic imbalance by evaluating the ratio between the observed variant allele frequency (VAF) and the expected BAF. Genetic tumor subclones can then be accurately identified by clustering cells according to their genomic alteration profiles.
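The per-cell quantities named above (RDR, BAF, and the VAF-to-expected-BAF ratio) can be illustrated with a minimal Python sketch. It is not CAPTATe's implementation: the function names `read_depth_ratio`, `pooled_baf`, and `vaf_to_baf_ratio` are hypothetical, the reference depth profile is assumed to come from diploid (normal) cells, BAF is assumed to be pooled over heterozygous germline SNPs within a segment, and the joint segmentation, the variational CNV model, and the machine-learning classifier are not reproduced.

```python
import math
from typing import List, Sequence


def read_depth_ratio(cell_counts: Sequence[int], ref_counts: Sequence[int]) -> List[float]:
    """Per-bin read-depth ratio (RDR), hypothetical helper.

    Both profiles are normalized by their total read counts so that library
    size cancels out; the reference is assumed to be a diploid (normal) profile.
    """
    cell_total = sum(cell_counts)
    ref_total = sum(ref_counts)
    rdr = []
    for c, r in zip(cell_counts, ref_counts):
        if r == 0:
            rdr.append(math.nan)  # no reference coverage: ratio undefined
        else:
            rdr.append((c / cell_total) / (r / ref_total))
    return rdr


def pooled_baf(b_allele_reads: Sequence[int], total_reads: Sequence[int]) -> float:
    """B-allele frequency of a segment, pooled over its heterozygous germline SNPs.

    Pooling is an assumption of this sketch, used to compensate for the low
    per-site depth typical of single-cell data.
    """
    depth = sum(total_reads)
    return sum(b_allele_reads) / depth if depth > 0 else math.nan


def vaf_to_baf_ratio(alt_reads: int, depth: int, expected_baf: float) -> float:
    """Observed VAF of a candidate variant divided by the BAF expected for the
    allele assumed to carry it.

    Treated here only as one input feature for a downstream classifier; how a
    given value should be interpreted depends on the copy-number state, which
    this sketch does not model.
    """
    if depth == 0 or expected_baf <= 0:
        return math.nan
    return (alt_reads / depth) / expected_baf


if __name__ == "__main__":
    # Toy data: four genomic bins in one cell versus a diploid reference.
    print(read_depth_ratio([10, 20, 30, 40], [25, 25, 25, 25]))
    baf = pooled_baf([3, 5], [10, 10])
    print(baf)
    print(vaf_to_baf_ratio(4, 10, baf))
```

Normalizing by the observed BAF rather than by a fixed 0.5 is what lets the feature absorb local allelic imbalance: a VAF of 0.4 looks unbalanced against a heterozygous expectation of 0.5, but matches a segment whose pooled BAF is also 0.4.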
The increased precision and accuracy of single-cell variant identification achieved with CAPTATe overcome the main limitations of PTA scDNA-seq profiling, enabling precise characterization of genetic tumor heterogeneity at individual-cell resolution.