Nuevo Seminario de Estadística
We propose an approach for building ensembles of regularized linear models by optimizing an objective function that encourages sparsity within each model and diversity among them. Our procedure works on top of a given penalized linear regression estimator (e.g., Lasso, Elastic Net, SCAD) by fitting it to possibly overlapping subsets of the features, while at the same time encouraging diversity among those subsets so as to reduce the correlation between the predictions of the fitted models. The predictions from the individual models are then aggregated. For the case of an Elastic Net penalty and orthogonal predictors, we derive a closed-form solution for the regression coefficients of each model in the ensemble. We prove the consistency of our method in possibly high-dimensional linear models, where the number of predictors may grow with the sample size. An extensive simulation study and real-data applications show that the proposed method systematically improves the prediction accuracy of the base linear estimators being ensembled. Possible extensions to GLMs and other models are discussed.
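The general pipeline described above (fit a base penalized estimator on possibly overlapping feature subsets, then aggregate the predictions) can be sketched as follows. This is a simplified illustration, not the paper's method: the random subsets below stand in for the subsets the paper obtains by jointly optimizing sparsity and diversity, and scikit-learn's `Lasso` plays the role of the base estimator.

```python
# Simplified sketch: ensemble of Lasso models fit on random, possibly
# overlapping feature subsets, with averaged predictions.
# NOTE: the paper selects the subsets by optimizing a diversity-encouraging
# objective; uniform random subsets here are only a stand-in.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, n_models, subset_size = 200, 50, 10, 20

# Synthetic sparse linear model: only the first 5 coefficients are nonzero.
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = [3.0, -2.0, 1.5, 1.0, -1.0]
y = X @ beta + 0.5 * rng.standard_normal(n)

models, subsets = [], []
for _ in range(n_models):
    S = rng.choice(p, size=subset_size, replace=False)  # overlapping subsets
    models.append(Lasso(alpha=0.1).fit(X[:, S], y))
    subsets.append(S)

# Aggregate: average the predictions of the ensembled models.
X_new = rng.standard_normal((5, p))
y_hat = np.mean(
    [m.predict(X_new[:, S]) for m, S in zip(models, subsets)], axis=0
)
print(y_hat.shape)  # one averaged prediction per new observation: (5,)
```

Averaging is the simplest aggregation rule; weighted combinations of the models' predictions would fit the same template.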
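For context on the orthogonal-design result mentioned above: the plain Elastic Net (without any diversity term, which the paper's closed form would additionally involve) admits the well-known coordinatewise soft-thresholding solution when the predictors are orthonormal, \(X^\top X = I\). For the objective \(\tfrac12\|y - X\beta\|_2^2 + \lambda_1\|\beta\|_1 + \tfrac{\lambda_2}{2}\|\beta\|_2^2\) this reads

\[
\hat\beta_j \;=\; \frac{\operatorname{sign}(z_j)\,\bigl(|z_j| - \lambda_1\bigr)_+}{1 + \lambda_2},
\qquad z_j = x_j^\top y ,
\]

i.e., the \(\ell_1\) penalty soft-thresholds the univariate least-squares coefficients \(z_j\) and the \(\ell_2\) penalty shrinks them by the factor \(1/(1+\lambda_2)\).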