Ranking-based variable selection for high-dimensional data

Baranowski, R., Chen, Y. & Fryzlewicz, P. (2020). Ranking-based variable selection for high-dimensional data. Statistica Sinica, 30(3), 1485-1516. https://doi.org/10.5705/ss.202017.0139

We propose a ranking-based variable selection (RBVS) technique that identifies important variables influencing the response in high-dimensional data. RBVS uses subsampling to identify the covariates that appear nonspuriously at the top of a chosen variable ranking. We study the conditions under which such a set is unique, and show that it can be recovered successfully from the data by our procedure. Unlike many existing high-dimensional variable selection techniques, among all relevant variables, RBVS distinguishes between important and unimportant variables, and aims to recover only the important ones. Moreover, RBVS does not require model restrictions on the relationship between the response and the covariates, and, thus, is widely applicable in both parametric and nonparametric contexts. Lastly, we illustrate the good practical performance of the proposed technique by means of a comparative simulation study. The RBVS algorithm is implemented in rbvs, a publicly available R package.
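The subsampling idea in the abstract can be illustrated with a minimal sketch. This is not the authors' estimator and not the API of the rbvs R package. It assumes absolute sample correlation as the "chosen variable ranking", a fixed top-set size k (the abstract does not say how the size is chosen), and simple random subsamples drawn without replacement. The function names `abs_correlation` and `top_k_set_frequencies` are hypothetical. The sketch records which set of k covariates tops the ranking in each subsample and reports how often each set occurs; a set that keeps reappearing is a candidate for the nonspurious top of the ranking.

```python
import random
from collections import Counter


def abs_correlation(x, y):
    """Absolute Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return abs(sxy) / (sxx * syy) ** 0.5


def top_k_set_frequencies(columns, y, k, subsample_size, n_subsamples, rng):
    """Relative frequency with which each set of k covariates tops the ranking.

    Assumption: the ranking measure is absolute correlation and k is fixed
    in advance; both are illustrative choices, not taken from the paper.
    """
    n = len(y)
    counts = Counter()
    for _ in range(n_subsamples):
        # Draw a subsample of observations without replacement.
        idx = rng.sample(range(n), subsample_size)
        y_sub = [y[i] for i in idx]
        scores = [abs_correlation([col[i] for i in idx], y_sub) for col in columns]
        # Rank covariates from strongest to weakest association.
        ranking = sorted(range(len(columns)), key=lambda j: -scores[j])
        counts[frozenset(ranking[:k])] += 1
    return {s: c / n_subsamples for s, c in counts.items()}


# Synthetic data (made up for illustration): only covariates 0 and 1 matter.
rng = random.Random(0)
n, p = 200, 30
columns = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
y = [3 * columns[0][i] - 3 * columns[1][i] + rng.gauss(0, 0.5) for i in range(n)]

freqs = top_k_set_frequencies(columns, y, k=2, subsample_size=100,
                              n_subsamples=50, rng=rng)
best_set, best_freq = max(freqs.items(), key=lambda kv: kv[1])
print(sorted(best_set), best_freq)
```

With a strong signal, as in this synthetic example, the same top set recurs across almost all subsamples, whereas sets containing spurious covariates change from one subsample to the next. A full procedure would also need a data-driven rule for the set size, which this sketch leaves out.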

Accepted Version