Forward and backward state abstractions for off-policy evaluation
Off-policy evaluation (OPE) is crucial for assessing a target policy's impact offline before its deployment. However, achieving accurate OPE in large state spaces remains challenging. This paper studies state abstractions, originally designed for policy learning, in the context of OPE. Our contributions are three-fold: (i) we define a set of irrelevance conditions central to learning state abstractions for OPE; (ii) we derive sufficient conditions for achieving irrelevance in Q-functions and marginalized importance sampling ratios, the latter obtained by constructing a time-reversed Markov decision process (MDP) based on the observed MDP; (iii) we propose a novel two-step procedure that sequentially projects the original state space into a smaller space, which substantially reduces the sample complexity of OPE arising from the high cardinality of the state space.
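For context, the marginalized importance sampling (MIS) ratio and the time-reversed transition mentioned above can be sketched using standard OPE definitions; the notation below is illustrative and may differ from the report's exact formulation.

```latex
% Marginalized importance sampling (MIS) ratio: the ratio of the discounted
% state-action visitation under the target policy \pi to that under the
% behaviour policy b (standard OPE definition; notation is illustrative).
w^{\pi}(s, a) \;=\; \frac{d^{\pi}(s, a)}{d^{b}(s, a)},
\qquad
d^{\pi}(s, a) \;=\; (1 - \gamma) \sum_{t \ge 0} \gamma^{t}\,
\Pr\nolimits^{\pi}\!\left(S_t = s,\, A_t = a\right).

% A time-reversed ("backward") transition can be obtained from the observed
% MDP via Bayes' rule; this is a sketch of the kind of construction the
% abstract alludes to, not necessarily the report's exact definition.
P^{\leftarrow}(s \mid s', a) \;=\;
\frac{P(s' \mid s, a)\, d^{b}(s, a)}
     {\sum_{\tilde s} P(s' \mid \tilde s, a)\, d^{b}(\tilde s, a)}.
```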
| Item Type | Report (Technical Report) |
|---|---|
| Departments | Statistics |
| Date Deposited | 02 Jul 2024 07:54 |
| URI | https://researchonline.lse.ac.uk/id/eprint/124074 |
Explore Further
- https://www.lse.ac.uk/statistics/people/zoltan-szabo (Author)
- https://www.lse.ac.uk/statistics/people/pingfan-su (Author)
- https://www.lse.ac.uk/statistics/people/liyuan-hu (Author)
- https://www.lse.ac.uk/statistics/people/chengchun-shi (Author)
- https://github.com/pufffs/state-abstraction
- https://arxiv.org/abs/2406.19531
- Submitted Version (PDF)