Forward and backward state abstractions for off-policy evaluation
Off-policy evaluation (OPE) is crucial for assessing a target policy's impact offline before its deployment. However, achieving accurate OPE in large state spaces remains challenging. This paper studies state abstractions, originally designed for policy learning, in the context of OPE. Our contributions are three-fold: (i) we define a set of irrelevance conditions central to learning state abstractions for OPE; (ii) we derive sufficient conditions for achieving irrelevance in Q-functions and marginalized importance sampling ratios, the latter obtained by constructing a time-reversed Markov decision process (MDP) based on the observed MDP; (iii) we propose a novel two-step procedure that sequentially projects the original state space into a smaller space, which substantially reduces the sample complexity of OPE arising from the high cardinality of the state space.
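For context, the marginalized importance sampling (MIS) ratio and the time-reversed transition mentioned above can be sketched using standard OPE definitions; the notation below is illustrative and may differ from the report's exact formulation.

```latex
% Marginalized importance sampling (MIS) ratio: the ratio of the discounted
% state-action visitation under the target policy \pi to that under the
% behaviour policy b (standard OPE definition; notation is illustrative).
w^{\pi}(s, a) \;=\; \frac{d^{\pi}(s, a)}{d^{b}(s, a)},
\qquad
d^{\pi}(s, a) \;=\; (1 - \gamma) \sum_{t \ge 0} \gamma^{t}\,
\Pr\nolimits^{\pi}\!\left(S_t = s,\, A_t = a\right).

% A time-reversed ("backward") transition can be obtained from the observed
% MDP via Bayes' rule; this is a sketch of the kind of construction the
% abstract alludes to, not necessarily the report's exact definition.
P^{\leftarrow}(s \mid s', a) \;=\;
\frac{P(s' \mid s, a)\, d^{b}(s, a)}
     {\sum_{\tilde s} P(s' \mid \tilde s, a)\, d^{b}(\tilde s, a)}.
```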
| Item Type | Report (Technical Report) |
|---|---|
| Departments | Statistics |
| Date Deposited | 02 Jul 2024 07:54 |
| URI | https://researchonline.lse.ac.uk/id/eprint/124074 |
Explore Further
- https://www.lse.ac.uk/statistics/people/zoltan-szabo (Author)
- https://www.lse.ac.uk/statistics/people/pingfan-su (Author)
- https://www.lse.ac.uk/statistics/people/liyuan-hu (Author)
- https://www.lse.ac.uk/statistics/people/chengchun-shi (Author)
- https://github.com/pufffs/state-abstraction
- https://arxiv.org/abs/2406.19531
- Submitted Version (PDF)