Abstract
When case-control studies involve paired samples, tree analyses based on traditional splitting rules are suboptimal because they ignore the paired nature of the data. Paired samples occur in microbiome studies when they are collected from different locations on the same individual or from paired individuals with familial ties. Borrowing concepts from tree splitting, we propose a novel approach that accommodates the paired structure of the data for fast and effective nonparametric variable ranking. Importantly, this method allows disentangling of the different types of associations at play with structured correlated outcomes, such as host genotype and environmental exposure effects. Variable importance measures are another technique for variable selection, and we describe two types of such measures that are useful for paired data analysis. The methodology is illustrated on the microbiota of paired samples from a case-control study of obesity.