A review of off-policy evaluation in reinforcement learning
Uehara, Masatoshi; Shi, Chengchun; and Kallus, Nathan (2025)
A review of off-policy evaluation in reinforcement learning.
Statistical Science. ISSN 0883-4237. (In press)
Reinforcement learning (RL) is one of the most vibrant research frontiers in machine learning and has recently been applied to a number of challenging problems. In this paper, we focus primarily on off-policy evaluation (OPE), one of the most fundamental topics in RL. In recent years, a number of OPE methods have been developed in the statistics and computer science literature. We discuss the efficiency bound of OPE, several state-of-the-art OPE methods and their statistical properties, and other related research directions that are under active exploration.
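The record itself contains no code, but the core idea behind OPE, estimating the value of a target policy from data logged by a different behavior policy, can be illustrated with an importance-sampling sketch. The following is a minimal toy example in a two-armed contextual-bandit setting; the policies `pi_b` and `pi_e`, the simulated reward model, and all names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-armed contextual bandit (illustrative assumption):
# a behavior policy pi_b logs data; we evaluate a target policy pi_e.
n = 100_000
context = rng.uniform(size=n)  # one-dimensional context in [0, 1]

def pi_b(x):
    """Behavior policy: probability of choosing action 1 given context x."""
    return np.full_like(x, 0.5)  # uniform logging policy

def pi_e(x):
    """Target policy: prefers action 1 when the context is large."""
    return np.clip(x, 0.05, 0.95)

# Log data under the behavior policy.
p_b = pi_b(context)
action = rng.binomial(1, p_b)
# Assumed reward model: action 1 pays off in high contexts, action 0 in low ones.
reward = rng.normal(loc=np.where(action == 1, context, 1 - context), scale=0.1)

# Importance-sampling (IS) estimator of the target policy's value:
# reweight each logged reward by pi_e(a | x) / pi_b(a | x).
p_e_taken = np.where(action == 1, pi_e(context), 1 - pi_e(context))
p_b_taken = np.where(action == 1, p_b, 1 - p_b)
w = p_e_taken / p_b_taken
v_is = np.mean(w * reward)

# Ground truth by direct on-policy simulation under pi_e, for comparison.
action_e = rng.binomial(1, pi_e(context))
reward_e = rng.normal(loc=np.where(action_e == 1, context, 1 - context), scale=0.1)
print(f"IS estimate of V(pi_e): {v_is:.4f}")
print(f"On-policy Monte Carlo:  {reward_e.mean():.4f}")
```

The IS estimate is unbiased whenever the behavior policy puts positive probability on every action the target policy can take; the review surveys refinements (doubly robust, marginalized, and minimax estimators) that reduce its variance in sequential RL settings.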
| Item Type | Article |
|---|---|
| Keywords | off-policy evaluation, semiparametric methods, causal inference, dynamic treatment regime, offline reinforcement learning, contextual bandits |
| Departments | Statistics |
| Date Deposited | 14 Apr 2025 14:18 |
| Acceptance Date | 2025-03-18 |
| URI | https://researchonline.lse.ac.uk/id/eprint/127940 |
| ORCID | https://orcid.org/0000-0001-7773-2099 |
