An instrumental variable approach to confounded off-policy evaluation

Xu, Yang; Zhu, Jin; Shi, ChengchunORCID logo; Luo, Shikai; and Song, Rui (2023) An instrumental variable approach to confounded off-policy evaluation. Proceedings of Machine Learning Research, 202. 38848 - 38880. ISSN 1938-7228
Copy

Off-policy evaluation (OPE) aims to estimate the return of a target policy using some pre-collected observational data generated by a potentially different behavior policy. In many cases, there exist unmeasured variables that confound the action-reward or action-next-state relationships, rendering many existing OPE approaches ineffective. This paper develops an instrumental variable (IV)-based method for consistent OPE in confounded sequential decision making. Similar to single-stage decision making, we show that IV enables us to correctly identify the target policy’s value in infinite horizon settings as well. Furthermore, we propose a number of policy value estimators and illustrate their effectiveness through extensive simulations and real data analysis from a world-leading short-video platform.

picture_as_pdf

picture_as_pdf
subject
Accepted Version

Download

Atom BibTeX OpenURL ContextObject in Span OpenURL ContextObject Dublin Core MPEG-21 DIDL Data Cite XML EndNote HTML Citation METS MODS RIOXX2 XML Reference Manager Refer ASCII Citation
Export

Downloads