Variable importance in binary regression trees and forests

Hemant Ishwaran

doi:10.1214/07-EJS039

Journal article

Open access

Peer reviewed

Electronic Journal of Statistics, Vol. 1, pp. 519-537

2007-11-15

DOI: https://doi.org/10.1214/07-EJS039

Abstract

Statistics - Machine Learning

We characterize and study variable importance (VIMP) and pairwise variable associations in binary regression trees. A key component involves the node mean squared error for a quantity we refer to as a maximal subtree. The theory naturally extends from single trees to ensembles of trees and applies to methods like random forests. This is useful because, although importance values from random forests are used to screen variables (for example, to filter high-throughput genomic data in bioinformatics), very little theory exists about their properties.

Files and links (1)

url

https://doi.org/10.1214/07-EJS039

Published (Version of record) Open

Metrics

11 Record Views

266 Times Cited - Web of Science

InCites Highlights

These are selected metrics from InCites Benchmarking & Analytics tool, related to this output

Citation topics: 4 Electrical Engineering, Electronics & Computer Science; 4.61 Artificial Intelligence & Machine Learning; 4.61.145 Feature Selection
Web Of Science research areas: Statistics & Probability
ESI research areas: Mathematics

Details

Title: Variable importance in binary regression trees and forests
Creators: Hemant Ishwaran
Publication Details: Electronic Journal of Statistics, Vol. 1, pp. 519-537
Academic Unit: Miller School of Medicine; UMMG Department of Public Health Sciences Research
Language: English
Resource Type: Journal article
Record Identifier: 991031600035102976