How to Apply Bayesian Stochastic Search Variable Selection with Multiply Imputed Data

Sierra A. Bainter; Zhixin Mao; J. Sunil Rao

doi:10.1037/met0000837

Journal article

Open access

Peer reviewed

Psychological Methods

2026-04-16

DOI: https://doi.org/10.1037/met0000837

PMID: 41989458

Abstract

Keywords: Bayesian variable selection; SSVS; missing data; multiple imputation; regression

Modern regularization and variable selection methods such as lasso and Bayesian variable selection are important tools for psychological researchers to reduce the risk of overfitting, improve prediction in future samples, and increase model interpretability. Although missing data are common in psychological data, it is not straightforward to combine principled methods for addressing missing data with these modern variable selection methods. This challenge is well illustrated in a recent paper by Gunn and colleagues (2022), which compares three approaches for combining lasso with multiple imputation to address missing data. The surveyed approaches yield markedly different results in terms of the predictors selected. Their findings underscore limitations of the lasso for the purpose of variable selection. In this paper, we show how to implement a Bayesian variable selection method, Stochastic Search Variable Selection (SSVS), with multiply imputed data. SSVS is a principled and consistent method for variable selection, and we demonstrate advantages relative to lasso in an example dataset and simulation study. It is straightforward to apply an impute-then-combine strategy for SSVS using existing software.

Psychological researchers often analyze many potential predictors, which increases the risk of identifying relationships that do not replicate in future samples. Methods such as the lasso are widely used to reduce this risk by selecting a smaller set of important predictors. However, psychological data also frequently contain missing values, and combining variable-selection methods with approaches for handling missing data is challenging. Recent work comparing ways to combine the lasso with multiple imputation, a common method for addressing missing data, shows that different approaches can lead to very different conclusions about which predictors are important. In this paper, we demonstrate how to apply a Bayesian variable-selection method, Stochastic Search Variable Selection (SSVS), with multiply imputed data. Using both an example dataset and simulations, we show that this approach provides a principled and consistent way to identify important predictors and can offer advantages over the lasso, while remaining straightforward to implement with existing software.
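The abstract does not spell out the impute-then-combine procedure, so the following is only a minimal sketch of one plausible reading: run SSVS separately on each imputed dataset, then average the marginal inclusion probabilities across imputations. Everything in it is an assumption made for illustration and is not taken from the article or its software: the function names `ssvs_gibbs` and `crude_impute`, the spike and slab standard deviations `tau0` and `tau1`, the prior inclusion probability, the simulated data, and the averaging rule used for pooling. In particular, `crude_impute` is a deliberately simple stand-in and not a proper multiple imputation model.

```python
import numpy as np


def ssvs_gibbs(X, y, n_iter=600, burn=200, tau0=0.05, tau1=1.0,
               prior_incl=0.5, a=1.0, b=1.0, rng=None):
    """Spike-and-slab Gibbs sampler for linear regression.

    Returns the marginal inclusion probability (MIP) of each predictor.
    Prior settings are illustrative assumptions, not the article's defaults.
    """
    rng = np.random.default_rng() if rng is None else rng
    X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize predictors
    y = y - y.mean()                           # centering removes the intercept
    n, p = X.shape
    XtX, Xty = X.T @ X, X.T @ y
    gamma = np.ones(p, dtype=int)              # start with every predictor included
    sigma2 = 1.0
    incl = np.zeros(p)
    for it in range(n_iter):
        # beta | gamma, sigma2, y is multivariate normal
        prior_prec = np.where(gamma == 1, 1.0 / tau1**2, 1.0 / tau0**2)
        cov = np.linalg.inv(XtX / sigma2 + np.diag(prior_prec))
        cov = (cov + cov.T) / 2                # guard against rounding asymmetry
        beta = rng.multivariate_normal(cov @ Xty / sigma2, cov)
        # sigma2 | beta, y is inverse-gamma(a + n/2, b + SSR/2)
        resid = y - X @ beta
        sigma2 = 1.0 / rng.gamma(a + n / 2, 1.0 / (b + resid @ resid / 2))
        # gamma_j | beta_j is Bernoulli: compare slab and spike densities at beta_j
        log_slab = np.log(prior_incl) - np.log(tau1) - beta**2 / (2 * tau1**2)
        log_spike = np.log(1 - prior_incl) - np.log(tau0) - beta**2 / (2 * tau0**2)
        p_incl = 1.0 / (1.0 + np.exp(log_spike - log_slab))
        gamma = (rng.random(p) < p_incl).astype(int)
        if it >= burn:
            incl += gamma
    return incl / (n_iter - burn)


def crude_impute(X, rng):
    """Stand-in imputation (NOT a proper MI model): draw each missing value
    from a normal with the observed mean and SD of its column."""
    X_imp = X.copy()
    for j in range(X.shape[1]):
        miss = np.isnan(X[:, j])
        obs = X[~miss, j]
        X_imp[miss, j] = rng.normal(obs.mean(), obs.std(ddof=1), miss.sum())
    return X_imp


# Simulated example: predictors 0 and 3 have nonzero effects, the rest are null.
rng = np.random.default_rng(1)
n, p = 200, 6
X_full = rng.normal(size=(n, p))
true_beta = np.array([1.0, 0.0, 0.0, -0.8, 0.0, 0.0])
y = X_full @ true_beta + rng.normal(size=n)
X_miss = X_full.copy()
X_miss[rng.random((n, p)) < 0.10] = np.nan     # about 10% of values set missing

# Impute M times, run SSVS on each completed dataset, then pool the MIPs.
M = 3
mips = np.array([ssvs_gibbs(crude_impute(X_miss, rng), y, rng=rng)
                 for _ in range(M)])
pooled_mip = mips.mean(axis=0)                 # assumed pooling rule: simple average
print(np.round(pooled_mip, 2))
```

In practice the imputed datasets would come from a dedicated multiple imputation routine, and the number of imputations and Gibbs iterations would be considerably larger than in this toy run. Averaging the inclusion probabilities is equivalent to stacking the retained posterior draws of the inclusion indicators when each imputation contributes the same number of draws.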

Files and links (1)

url

https://doi.org/10.1037/met0000837

Published (Version of record) Open

Details

Title: How to Apply Bayesian Stochastic Search Variable Selection with Multiply Imputed Data
Creators: Sierra A. Bainter - University of Miami
Zhixin Mao - Institute of Bioinformatics
J. Sunil Rao - University of Minnesota
Publication Details: Psychological Methods
Publisher: American Psychological Association; Washington
Number of pages: 19
Grant note: National Institute of Mental Health of the National Institutes of Health: K01MH122805
Research reported in this publication was supported by the National Institute of Mental Health of the National Institutes of Health under award number K01MH122805.
Academic Unit: College of A&S; A&S - Psychology
Language: English
Resource Type: Journal article
PMID: 41989458
Record Identifier: 991033052175502976