Statistics Seminar
 


Regression Phalanxes
Ruben Zamar
University of British Columbia

Tomal et al. (2015) introduced the notion of "phalanxes" in the context of rare-class detection in two-class classification problems. A phalanx is a subset of features that work well together for classification tasks. In this paper, we propose a different class of phalanxes for application in regression settings. We define a "regression phalanx" as a subset of features that work well together for prediction. We propose a novel algorithm that automatically selects regression phalanxes from high-dimensional data sets using hierarchical clustering and builds a prediction model for each phalanx; these models are then combined into an ensemble. Through extensive simulation studies and several real-life applications (including drug discovery, chemical analysis of spectral data, microarray analysis and climate projections), we show that an ensemble of regression phalanxes improves prediction accuracy when combined with effective prediction methods such as Lasso or Random Forests.

Keywords: Regression phalanxes, model ensembling, hierarchical clustering, Lasso, Random Forests
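
The abstract does not spell out the algorithm, so the following is only a rough sketch of the general idea (group features by hierarchical clustering, fit one model per group, average the predictions), not the authors' procedure. Several things here are assumptions made for illustration: the dissimilarity 1 - |correlation| between features, average linkage, a fixed number of groups, ordinary least squares as a stand-in for Lasso or Random Forests, and plain averaging as the ensembling rule. All function names are hypothetical.

```python
import numpy as np

def cluster_features(X, n_groups):
    """Average-linkage agglomerative clustering of the columns of X.

    Assumption: dissimilarity between two features is 1 - |correlation|.
    """
    dist = 1.0 - np.abs(np.corrcoef(X, rowvar=False))
    groups = [[j] for j in range(X.shape[1])]
    while len(groups) > n_groups:
        best = None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                # Average pairwise dissimilarity between the two groups.
                d = dist[np.ix_(groups[a], groups[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        groups[a] = groups[a] + groups[b]
        del groups[b]
    return groups

def fit_ols(X, y):
    """Least-squares fit with intercept (stand-in for Lasso / Random Forests)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_ols(coef, X):
    return coef[0] + X @ coef[1:]

def fit_phalanx_ensemble(X, y, n_groups):
    """Fit one model per feature group; returns a list of (columns, coef)."""
    return [(g, fit_ols(X[:, g], y)) for g in cluster_features(X, n_groups)]

def predict_phalanx_ensemble(models, X):
    """Assumption: the ensemble prediction is the plain average over groups."""
    return np.mean([predict_ols(c, X[:, g]) for g, c in models], axis=0)

# Synthetic example: two blocks of three highly correlated features each.
rng = np.random.default_rng(0)
n = 200
z1, z2 = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack(
    [z1 + 0.1 * rng.normal(size=n) for _ in range(3)]
    + [z2 + 0.1 * rng.normal(size=n) for _ in range(3)]
)
y = z1 - z2 + 0.1 * rng.normal(size=n)

models = fit_phalanx_ensemble(X, y, n_groups=2)
groups = [g for g, _ in models]
preds = predict_phalanx_ensemble(models, X)
```

In this toy data set the clustering step recovers the two feature blocks. The sketch makes no attempt to choose the number of groups or to check that each group is predictive on its own, both of which an actual method would need to address.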

 
 
 
Intendente Güiraldes 2160
Ciudad Universitaria
Pabellón II - 2do. piso
(C1428EGA) Buenos Aires
Argentina
 
Direct phone / Fax:
(54)(11) 4576-3375
Switchboard:
(54)(11) 4576-3300 to 3309
ext. 259