Regression Phalanxes
Ruben Zamar
University of British Columbia
Tomal et al. (2015) introduced the notion of "phalanxes" in the
context of rare-class detection in two-class classification problems. A phalanx
is a subset of features that work well for classification tasks. In this
paper, we propose a different class of phalanxes for application in
regression settings. We define a "regression phalanx" as a subset of features
that work well together for prediction. We propose a novel algorithm which
automatically chooses regression phalanxes from high-dimensional data sets
using hierarchical clustering and builds a prediction model for each
phalanx for further ensembling. Through extensive simulation studies and
several real-life applications in various areas (including drug discovery,
chemical analysis of spectra data, microarray analysis and climate
projections), we show that the ensemble of regression phalanxes yields more
accurate predictions when combined with effective prediction methods such as
Lasso or Random Forests.
Keywords: Regression phalanxes, model ensembling, hierarchical clustering,
Lasso, Random Forests
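
The pipeline outlined in the abstract (group features by hierarchical clustering, fit one model per group, then ensemble the group predictions) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not state the clustering criterion, the base learner settings, or the ensembling rule, so everything below is an assumption made for illustration. Features are grouped by average-linkage clustering on the distance 1 - |correlation|, ordinary least squares stands in for Lasso or Random Forests, the ensemble is a plain average, and the helper names `cluster_features`, `fit_group`, `predict_group` and the synthetic data are hypothetical.

```python
import numpy as np


def cluster_features(X, n_groups):
    """Average-linkage agglomerative clustering of the columns of X.

    Assumption: distance between two features is 1 - |Pearson correlation|.
    Returns a list of sorted column-index lists, one per group.
    """
    dist = 1.0 - np.abs(np.corrcoef(X, rowvar=False))
    groups = [[j] for j in range(X.shape[1])]
    while len(groups) > n_groups:
        best = None
        # Find the pair of groups with the smallest mean pairwise distance.
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                d = dist[np.ix_(groups[a], groups[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        groups[a] = groups[a] + groups[b]
        del groups[b]
    return [sorted(g) for g in groups]


def fit_group(X, y, cols):
    """Least-squares fit with intercept on one feature group (stand-in learner)."""
    A = np.column_stack([np.ones(len(X)), X[:, cols]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_group(X, cols, coef):
    """Predictions of a single group's model."""
    return np.column_stack([np.ones(len(X)), X[:, cols]]) @ coef


# Synthetic data (hypothetical): two latent signals, three noisy copies of each.
rng = np.random.default_rng(0)
n = 200
z = rng.normal(size=(n, 2))
X = np.repeat(z, 3, axis=1) + 0.1 * rng.normal(size=(n, 6))
y = z.sum(axis=1) + 0.1 * rng.normal(size=n)

groups = cluster_features(X, n_groups=2)
coefs = [fit_group(X, y, cols) for cols in groups]
group_preds = [predict_group(X, cols, c) for cols, c in zip(groups, coefs)]

# Ensemble by simple averaging of the per-group predictions (an assumption).
ensemble_pred = np.mean(group_preds, axis=0)

group_mse = [float(np.mean((y - p) ** 2)) for p in group_preds]
ensemble_mse = float(np.mean((y - ensemble_pred) ** 2))
print("groups:", groups)
print("per-group MSE:", group_mse)
print("ensemble MSE:", ensemble_mse)
```

The errors printed here are in-sample and serve only to show the mechanics. A realistic evaluation would use held-out data, and the number of groups would be chosen by the algorithm rather than fixed in advance.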