Abstract
Understanding interaction effects among features plays a crucial role in many scientific discoveries and contemporary applications. Yet to date, interaction detection in regression or classification models has remained a challenging task. Many existing methods suffer from sensitivity to modeling assumptions or from computational infeasibility. We describe the concept of the conditional residual and a general strategy based on it for detecting interactions between predictors in nonparametric regression models. This prediction-based strategy detects pairwise interactions among all variables automatically, after a variable selection procedure if necessary. Two specific methods for interaction detection are developed using this strategy: the effective model method uses the raw conditional residual, whereas the quantile regression forest method uses the conditional variance of the conditional residual, which is much more robust on data with correlated variables. The new methods evaluate the significance and strength of interactions through variable importance, which is more efficient than methods that check interactions pair by pair. We compare our methods with other approaches on real and simulated data. Our methods perform well across a variety of complicated scenarios and substantially outperform the other approaches in terms of true negative rate (TNR) and false discovery rate (FDR). Real data analysis was conducted on the Ames housing price data and the heart transplant data. Based on the properties of the conditional residual, we also provide a theoretical discussion of the quantile regression forest method.