Ranking-based variable selection for high-dimensional data

Baranowski, R., Chen, Y. & Fryzlewicz, P. (2020). Ranking-based variable selection for high-dimensional data. Statistica Sinica, 30(3), 1485-1516. https://doi.org/10.5705/ss.202017.0139

We propose a ranking-based variable selection (RBVS) technique that identifies important variables influencing the response in high-dimensional data. RBVS uses subsampling to identify the covariates that appear nonspuriously at the top of a chosen variable ranking. We study the conditions under which such a set is unique, and show that it can be recovered successfully from the data by our procedure. Unlike many existing high-dimensional variable selection techniques, among all relevant variables, RBVS distinguishes between important and unimportant variables, and aims to recover only the important ones. Moreover, RBVS does not require model restrictions on the relationship between the response and the covariates, and, thus, is widely applicable in both parametric and nonparametric contexts. Lastly, we illustrate the good practical performance of the proposed technique by means of a comparative simulation study. The RBVS algorithm is implemented in rbvs, a publicly available R package.
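The subsampling idea in the abstract can be illustrated with a minimal Python sketch. It is a simplified illustration under stated assumptions, not the RBVS procedure of the paper and not the interface of the rbvs R package: the ranking measure (absolute marginal Pearson correlation), the half-sample size, the fixed cut-off k, and the function name top_k_frequencies are all choices made here for illustration. The sketch draws random subsamples, ranks the covariates on each, and records how often each covariate lands among the top k.

```python
import numpy as np


def top_k_frequencies(X, y, k, n_subsamples=50, subsample_size=None, seed=0):
    """Fraction of subsamples in which each covariate ranks in the top k.

    Hypothetical helper for illustration only. Covariates are ranked by
    absolute marginal Pearson correlation with the response; columns of X
    are assumed to be non-constant so the correlations are well defined.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    m = subsample_size or n // 2  # assumption: half-sample by default
    counts = np.zeros(p)
    for _ in range(n_subsamples):
        # Draw a subsample of rows without replacement.
        idx = rng.choice(n, size=m, replace=False)
        Xs = X[idx] - X[idx].mean(axis=0)
        ys = y[idx] - y[idx].mean()
        # Absolute Pearson correlation of every column with the response.
        score = np.abs(Xs.T @ ys) / (np.linalg.norm(Xs, axis=0) * np.linalg.norm(ys))
        # Indices of the k highest-ranked covariates on this subsample.
        top = np.argsort(score)[::-1][:k]
        counts[top] += 1
    return counts / n_subsamples


# Synthetic example: 500 covariates, only the first two drive the response.
rng = np.random.default_rng(1)
n, p = 200, 500
X = rng.standard_normal((n, p))
y = 3.0 * X[:, 0] - 3.0 * X[:, 1] + rng.standard_normal(n)

freq = top_k_frequencies(X, y, k=2)
print("most frequent covariates:", np.argsort(freq)[::-1][:5])
```

Covariates with a real effect should reach the top of the ranking on almost every subsample, while covariates that rank highly only by chance should do so rarely. The paper's method additionally estimates how many covariates to keep from the data; the fixed k above sidesteps that step.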

Item Type	Article
Copyright holders	© 2020 Institute of Statistical Science, Academia Sinica
Departments	LSE > Academic Departments > Statistics
DOI	10.5705/ss.202017.0139
Date Deposited	18 September 2018
Acceptance Date	6 September 2018
URI	https://researchonline.lse.ac.uk/id/eprint/90233

Explore Further

HA Statistics

http://www.lse.ac.uk/Statistics/People/Professor-Piotr-Fryzlewicz (Author)
https://www.scopus.com/pages/publications/85091898172 (Scopus publication)
http://www3.stat.sinica.edu.tw/statistica/ (Official URL)

Accepted Version (PDF)
