Ranking-based variable selection for high-dimensional data
We propose a ranking-based variable selection (RBVS) technique that identifies the important variables influencing the response in high-dimensional data. RBVS uses subsampling to identify the covariates that appear nonspuriously at the top of a chosen variable ranking. We study the conditions under which such a set is unique, and show that our procedure can recover it successfully from the data. Unlike many existing high-dimensional variable selection techniques, RBVS distinguishes between important and unimportant variables among all the relevant ones, and aims to recover only the important ones. Moreover, RBVS imposes no model restrictions on the relationship between the response and the covariates and is, thus, widely applicable in both parametric and nonparametric contexts. Finally, we illustrate the good practical performance of the proposed technique by means of a comparative simulation study. The RBVS algorithm is implemented in rbvs, a publicly available R package.
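The abstract describes RBVS only at a high level. The Python sketch below gives a rough illustration of the subsampling idea. It does not reproduce the authors' algorithm or the `rbvs` package, and it rests on several assumptions. Variables are ranked by absolute marginal Pearson correlation with the response. The data are split at random into `r` disjoint subsamples of size `n // r`, and this split is repeated `n_splits` times. For each k, the most frequent top-k set is found by direct counting. Finally, the subset size is chosen where the estimated frequency drops most sharply from the top-k set to the top-(k+1) set. This last rule is a simplified stand-in, and the criterion used in the paper may differ. The helper names `top_k_sets` and `select` are hypothetical.

```python
import numpy as np
from collections import Counter


def top_k_sets(X, y, k_max, n_splits=50, r=2, seed=0):
    """Hypothetical helper: for k = 1..k_max, return the most frequent
    top-k set across subsample rankings and its empirical frequency."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    m = n // r  # size of each disjoint subsample
    rankings = []
    for _ in range(n_splits):
        perm = rng.permutation(n)
        for j in range(r):
            idx = perm[j * m:(j + 1) * m]
            Xc = X[idx] - X[idx].mean(axis=0)
            yc = y[idx] - y[idx].mean()
            # Assumed ranking measure: absolute marginal Pearson correlation.
            score = np.abs(Xc.T @ yc) / (
                np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
            )
            rankings.append(np.argsort(-score))  # most relevant first
    result = []
    for k in range(1, k_max + 1):
        # Count how often each k-element set occupies the top k positions.
        counts = Counter(frozenset(rank[:k].tolist()) for rank in rankings)
        best_set, count = counts.most_common(1)[0]
        result.append((best_set, count / len(rankings)))
    return result


def select(X, y, k_max=10, **kwargs):
    """Hypothetical selection rule: choose the size k at which the
    frequency of the most common top-k set drops the most when moving
    to the most common top-(k+1) set."""
    sets = top_k_sets(X, y, k_max, **kwargs)
    freqs = [f for _, f in sets]
    drops = [freqs[k - 1] - freqs[k] for k in range(1, k_max)]
    k_hat = int(np.argmax(drops)) + 1
    return sorted(sets[k_hat - 1][0])


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((400, 500))
    # Toy model (assumption): only the first three covariates matter.
    y = 3 * X[:, 0] + 3 * X[:, 1] + 3 * X[:, 2] + rng.standard_normal(400)
    print(select(X, y))
```

In this toy setting the three signal covariates should occupy the top three positions in nearly every subsample ranking. The most common top-3 set therefore has a frequency close to 1, while any particular top-4 set appears rarely. That sharp drop is what the rule above picks up. Counting exact sets with a `Counter` keeps the sketch short but scales poorly for large k.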
| Field | Value |
|---|---|
| Item type | Article |
| Copyright holder | © 2020 Institute of Statistical Science, Academia Sinica |
| Keywords | variable screening, subset selection, bootstrap, stability selection |
| Department | Statistics |
| DOI | 10.5705/ss.202017.0139 |
| Date deposited | 2018-09-18 10:02 |
| Acceptance date | 2018-09-06 |
| URI | https://researchonline.lse.ac.uk/id/eprint/90233 |