Future-dependent value-based off-policy evaluation in POMDPs

Uehara, M., Kiyohara, H., Bennett, A., Chernozhukov, V., Jiang, N., Kallus, N., Shi, C. & Sun, W. (2023). Future-dependent value-based off-policy evaluation in POMDPs. In Oh, A., Naumann, T., Globerson, A., Saenko, K., Hardt, M. & Levine, S. (Eds.), Advances in Neural Information Processing Systems 36 (NeurIPS 2023). Neural Information Processing Systems Foundation.

We study off-policy evaluation (OPE) for partially observable MDPs (POMDPs) with general function approximation. Existing methods such as sequential importance sampling estimators suffer from the curse of horizon in POMDPs. To circumvent this problem, we develop a novel model-free OPE method by introducing future-dependent value functions that take future proxies as inputs and play a role similar to that of classical value functions in fully observable MDPs. We derive a new off-policy Bellman equation for future-dependent value functions as conditional moment equations that use history proxies as instrumental variables. We further propose a minimax learning method to learn future-dependent value functions using the new Bellman equation. We obtain a PAC result, which implies our OPE estimator is close to the true policy value under Bellman completeness, as long as futures and histories contain sufficient information about latent states. Our code is available at https://github.com/aiueola/neurips2023-future-dependent-ope.
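To make the abstract's moment-equation idea concrete, here is a minimal sketch of a linear special case. It is not the authors' implementation, for which see the repository linked above. It assumes the conditional moment equation takes the form E[mu * (R + gamma * b(F')) - b(F) | H] = 0, where mu is the ratio of evaluation-policy to behavior-policy action probabilities, b is linear in future features, and the instrument class is linear in history features. Under those assumptions the minimax problem reduces to an instrumental-variable least-squares solve. All function and variable names (`fit_linear_fdvf`, `phi_h`, `phi_f`, and so on) are hypothetical.

```python
import numpy as np


def fit_linear_fdvf(phi_h, phi_f, phi_f_next, reward, mu, gamma):
    """Fit theta so that b(f) = phi_f(f) @ theta approximately satisfies
    E[mu * (R + gamma * b(F')) - b(F) | H] = 0, using history features
    phi_h as instruments (assumed linear setting, not from the source)."""
    n = phi_h.shape[0]
    # Second-moment matrix of the instruments (history features).
    m_h = phi_h.T @ phi_h / n
    # Cross-moment between instruments and the Bellman-difference features.
    a = phi_h.T @ (phi_f - gamma * mu[:, None] * phi_f_next) / n
    # Cross-moment between instruments and importance-weighted rewards.
    b = phi_h.T @ (mu * reward) / n
    # Two-stage-least-squares style solve; pinv guards against singularity.
    w = np.linalg.pinv(m_h)
    return np.linalg.pinv(a.T @ w @ a) @ (a.T @ w @ b)


def estimate_policy_value(phi_f_init, theta):
    """Average the fitted function over initial future features."""
    return float(np.mean(phi_f_init @ theta))


# Toy sanity check (hypothetical data): a fully observed two-state chain
# 0 -> 1 -> 0 with reward 1 in state 0, so futures and histories both
# reduce to a one-hot encoding of the state and mu is identically 1.
eye = np.eye(2)
phi_h = eye[[0, 1]]
phi_f = eye[[0, 1]]
phi_f_next = eye[[1, 0]]
reward = np.array([1.0, 0.0])
mu = np.ones(2)

theta = fit_linear_fdvf(phi_h, phi_f, phi_f_next, reward, mu, gamma=0.5)
value = estimate_policy_value(eye[[0]], theta)
print(theta, value)
```

In this degenerate, fully observed case the solve coincides with ordinary least-squares temporal-difference learning, which makes it easy to check by hand. In a genuine POMDP the history and future features would differ, and richer function classes would call for the minimax procedure described in the abstract.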

Item Type	Chapter
Copyright holders	© 2023 The Author
Departments	LSE > Academic Departments > Statistics
Date Deposited	23 April 2024
Acceptance Date	22 September 2023
URI	https://researchonline.lse.ac.uk/id/eprint/122752

Explore Further

Shi, Chengchun

HA Statistics

https://www.lse.ac.uk/statistics/people/chengchun-shi (Author)
https://proceedings.neurips.cc/paper_files/paper/2... (Official URL)

Version: Accepted Version
