Deeply-debiased off-policy interval estimation

Shi, Chengchun; Wan, Runzhe; Chernozhukov, Victor; and Song, Rui (2021) Deeply-debiased off-policy interval estimation. In: International Conference on Machine Learning, 2021-07-18 - 2021-07-24, Online. (In press)

Off-policy evaluation learns a target policy’s value with a historical dataset generated by a different behavior policy. In addition to a point estimate, many applications would benefit significantly from having a confidence interval (CI) that quantifies the uncertainty of the point estimate. In this paper, we propose a novel deeply-debiasing procedure to construct an efficient, robust, and flexible CI on a target policy’s value. Our method is justified by theoretical results and numerical experiments. A Python implementation of the proposed procedure is available at https://github.com/RunzheStat/D2OPE.
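
To make the setting concrete, the following is a minimal sketch of off-policy evaluation with a confidence interval. It is not the deeply-debiased procedure proposed in the paper. It assumes a one-step (bandit-style) problem with known behavior-policy probabilities, and uses a textbook importance-sampling estimator with a normal-approximation (Wald) interval. The function names `is_value_ci` and `simulate`, and the toy reward model, are hypothetical and chosen only for illustration.

```python
import math
import random


def is_value_ci(rewards, target_probs, behavior_probs, z=1.96):
    """Importance-sampling value estimate with a Wald confidence interval.

    rewards[i]        : observed reward for logged action i
    target_probs[i]   : probability the target policy assigns to that action
    behavior_probs[i] : probability the behavior policy assigned to it
    z                 : normal quantile (1.96 gives a nominal 95% interval)
    """
    n = len(rewards)
    # Reweight each logged reward by the ratio of target to behavior probability.
    weighted = [r * pt / pb
                for r, pt, pb in zip(rewards, target_probs, behavior_probs)]
    estimate = sum(weighted) / n
    # Sample variance of the weighted rewards (n - 1 denominator).
    variance = sum((w - estimate) ** 2 for w in weighted) / (n - 1)
    half_width = z * math.sqrt(variance / n)
    return estimate, (estimate - half_width, estimate + half_width)


def simulate(n, seed=0):
    """Toy logged data (an assumption, not from the paper).

    Two actions. The behavior policy picks each with probability 0.5.
    The target policy picks action 1 with probability 0.9.
    Rewards are Bernoulli with mean 0.2 (action 0) or 0.8 (action 1),
    so the true target value is 0.1 * 0.2 + 0.9 * 0.8 = 0.74.
    """
    rng = random.Random(seed)
    reward_mean = {0: 0.2, 1: 0.8}
    target = {0: 0.1, 1: 0.9}
    rewards, target_probs, behavior_probs = [], [], []
    for _ in range(n):
        action = 1 if rng.random() < 0.5 else 0
        reward = 1.0 if rng.random() < reward_mean[action] else 0.0
        rewards.append(reward)
        target_probs.append(target[action])
        behavior_probs.append(0.5)
    return rewards, target_probs, behavior_probs


if __name__ == "__main__":
    estimate, (lower, upper) = is_value_ci(*simulate(5000))
    print(f"estimate={estimate:.3f}, 95% CI=({lower:.3f}, {upper:.3f})")
```

An interval of this kind relies on known behavior probabilities and a single decision step. The paper addresses the more general problem of constructing an efficient, robust, and flexible CI, for which the authors' own implementation is the repository linked above.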

Accepted Version
