An instrumental variable approach to confounded off-policy evaluation

Xu, Y., Zhu, J., Shi, C., Luo, S. & Song, R. (2023). An instrumental variable approach to confounded off-policy evaluation. Proceedings of Machine Learning Research, 202, 38848-38880.

Abstract

Off-policy evaluation (OPE) aims to estimate the return of a target policy using some pre-collected observational data generated by a potentially different behavior policy. In many cases, there exist unmeasured variables that confound the action-reward or action-next-state relationships, rendering many existing OPE approaches ineffective. This paper develops an instrumental variable (IV)-based method for consistent OPE in confounded sequential decision making. Similar to single-stage decision making, we show that IV enables us to correctly identify the target policy’s value in infinite horizon settings as well. Furthermore, we propose a number of policy value estimators and illustrate their effectiveness through extensive simulations and real data analysis from a world-leading short-video platform.
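As a rough illustration of the single-stage idea the abstract refers to, the sketch below simulates a confounded one-step decision problem and compares a naive difference in mean rewards with the classical Wald (ratio) instrumental variable estimator. This is not one of the paper's estimators and does not cover the sequential, infinite-horizon setting. The data-generating process, the function names (`simulate`, `naive_effect`, `wald_effect`), and every numeric constant are assumptions chosen for illustration; the sketch also assumes a binary instrument that affects the reward only through the action and a constant action effect.

```python
import numpy as np


def simulate(n=200_000, seed=0):
    """Simulate a hypothetical single-stage confounded decision problem."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)          # unmeasured confounder (never given to the estimators)
    z = rng.integers(0, 2, size=n)  # binary instrument, independent of u
    # The action depends on both the instrument and the confounder.
    a = (0.8 * z + 0.8 * u + rng.normal(size=n) > 0.4).astype(float)
    # The reward depends on the action (assumed true effect 1.0) and on the
    # confounder, but on the instrument only through the action.
    r = 1.0 * a + 2.0 * u + rng.normal(size=n)
    return z, a, r


def naive_effect(a, r):
    """Difference in mean reward between actions; biased under confounding."""
    return r[a == 1].mean() - r[a == 0].mean()


def wald_effect(z, a, r):
    """Wald IV estimator: instrument effect on reward over its effect on action."""
    reward_shift = r[z == 1].mean() - r[z == 0].mean()
    action_shift = a[z == 1].mean() - a[z == 0].mean()
    return reward_shift / action_shift


if __name__ == "__main__":
    z, a, r = simulate()
    print("naive estimate:", naive_effect(a, r))
    print("IV (Wald) estimate:", wald_effect(z, a, r))
```

Under these assumptions the naive contrast absorbs the confounder's effect on the reward, while the ratio estimator should land close to the assumed effect of 1.0, because the instrument shifts the action without being related to the confounder.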

Item Type	Article
Copyright holders	© 2023 The Author(s)
Departments	LSE > Academic Departments > Statistics
Date Deposited	23 May 2024
Acceptance Date	14 April 2023
URI	https://researchonline.lse.ac.uk/id/eprint/123599

Explore Further

Shi, Chengchun

QA75 Electronic computers. Computer science

https://www.lse.ac.uk/statistics/people/chengchun-shi (Author)
https://www.scopus.com/pages/publications/85172436136 (Scopus publication)
https://proceedings.mlr.press/v202/xu23x.html
https://proceedings.mlr.press/ (Official URL)

Accepted Version
