Bayesian nonparametric disclosure risk assessment

Favaro, S., Panero, F. & Rigon, T. (2021). Bayesian nonparametric disclosure risk assessment. Electronic Journal of Statistics, 15(2), 5626-5651. https://doi.org/10.1214/21-EJS1933


Any decision about the release of microdata for public use is supported by the estimation of measures of disclosure risk, the most popular being the number τ1 of sample uniques that are also population uniques. In this context, parametric and nonparametric partition-based models have been shown to have: i) the strength of leading to estimators of τ1 with desirable features, including ease of implementation, computational efficiency and scalability to massive data; ii) the weakness of underestimating τ1 in realistic scenarios, with the underestimation getting worse as the tail of the empirical distribution of the microdata gets heavier. To address this underestimation, we propose a Bayesian nonparametric partition-based model that can be tuned to the tail behaviour of the empirical distribution of the microdata. Our model relies on the Pitman–Yor process prior, and it leads to a novel estimator of τ1 that retains all the desirable features of partition-based estimators and, in addition, allows underestimation to be reduced by tuning a “discount” parameter. We show the effectiveness of our estimator through its application to synthetic data and real data.
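To make the quantity τ1 concrete, the following is a minimal simulation sketch, not the estimator proposed in the article. It assumes a synthetic population whose cells are generated by the standard Pitman-Yor sequential (Chinese restaurant) scheme, draws a simple random sample, and counts the sample uniques that are also population uniques. The function names, the population and sample sizes, and the values of the concentration `theta` and the discount `sigma` are all illustrative assumptions.

```python
import random
from collections import Counter


def pitman_yor_population(n, theta, sigma, rng):
    """Assign n individuals to cells with the Pitman-Yor sequential scheme.

    theta > 0 is the concentration and 0 <= sigma < 1 the discount
    (both values are illustrative assumptions, not taken from the article).
    """
    counts = []  # counts[j] = current size of cell j
    labels = []  # labels[i] = cell of individual i
    for i in range(n):
        k = len(counts)
        # Open a new cell with probability (theta + sigma * k) / (theta + i).
        if rng.random() * (theta + i) < theta + sigma * k:
            counts.append(1)
            labels.append(k)
        else:
            # Join existing cell j with probability proportional to counts[j] - sigma.
            weights = [c - sigma for c in counts]
            j = rng.choices(range(k), weights=weights)[0]
            counts[j] += 1
            labels.append(j)
    return labels


def tau1(labels, sample_idx):
    """Count sample uniques that are also population uniques."""
    pop_freq = Counter(labels)
    sample_freq = Counter(labels[i] for i in sample_idx)
    return sum(
        1
        for cell, f in sample_freq.items()
        if f == 1 and pop_freq[cell] == 1
    )


rng = random.Random(0)
N, n = 2000, 200  # hypothetical population and sample sizes
for sigma in (0.0, 0.5):
    labels = pitman_yor_population(N, theta=5.0, sigma=sigma, rng=rng)
    sample_idx = rng.sample(range(N), n)
    print(f"discount={sigma}: tau1={tau1(labels, sample_idx)}")
```

In this scheme a larger discount produces more small cells, that is, a heavier tail, which is the regime in which the abstract reports that existing partition-based estimators underestimate τ1.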

Item Type	Article
Copyright holders	© 2022 The Authors
Departments	LSE > Academic Departments > Statistics
DOI	10.1214/21-EJS1933
Date Deposited	11 November 2022
Acceptance Date	18 October 2021
URI	https://researchonline.lse.ac.uk/id/eprint/117305

Explore Further

Panero, Francesca

HA Statistics

European Research Council

https://www.lse.ac.uk/Statistics/People/Dr-Francesca-Panero (Author)
https://www.scopus.com/pages/publications/85127111770 (Scopus publication)
https://projecteuclid.org/journals/electronic-jour... (Official URL)


Version	Published Version
Licence	Creative Commons: Attribution 4.0
