A review of off-policy evaluation in reinforcement learning
Uehara, Masatoshi; Shi, Chengchun; and Kallus, Nathan (2025)
A review of off-policy evaluation in reinforcement learning.
Statistical Science. ISSN 0883-4237. (In press)
Reinforcement learning (RL) is one of the most vibrant research frontiers in machine learning and has recently been applied to a number of challenging problems. In this paper, we focus primarily on off-policy evaluation (OPE), one of the most fundamental topics in RL. In recent years, a number of OPE methods have been developed in the statistics and computer science literature. We discuss the efficiency bound of OPE, several state-of-the-art OPE methods and their statistical properties, and other related research directions that are under active exploration.
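The record itself contains no code, but the core idea behind OPE, estimating the value of a target policy from data logged by a different behavior policy, can be illustrated with an importance-sampling sketch. The following is a minimal toy example in a two-armed contextual-bandit setting; the policies `pi_b` and `pi_e`, the simulated reward model, and all names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-armed contextual bandit (illustrative assumption):
# a behavior policy pi_b logs data; we evaluate a target policy pi_e.
n = 100_000
context = rng.uniform(size=n)  # one-dimensional context in [0, 1]

def pi_b(x):
    """Behavior policy: probability of choosing action 1 given context x."""
    return np.full_like(x, 0.5)  # uniform logging policy

def pi_e(x):
    """Target policy: prefers action 1 when the context is large."""
    return np.clip(x, 0.05, 0.95)

# Log data under the behavior policy.
p_b = pi_b(context)
action = rng.binomial(1, p_b)
# Assumed reward model: action 1 pays off in high contexts, action 0 in low ones.
reward = rng.normal(loc=np.where(action == 1, context, 1 - context), scale=0.1)

# Importance-sampling (IS) estimator of the target policy's value:
# reweight each logged reward by pi_e(a | x) / pi_b(a | x).
p_e_taken = np.where(action == 1, pi_e(context), 1 - pi_e(context))
p_b_taken = np.where(action == 1, p_b, 1 - p_b)
w = p_e_taken / p_b_taken
v_is = np.mean(w * reward)

# Ground truth by direct on-policy simulation under pi_e, for comparison.
action_e = rng.binomial(1, pi_e(context))
reward_e = rng.normal(loc=np.where(action_e == 1, context, 1 - context), scale=0.1)
print(f"IS estimate of V(pi_e): {v_is:.4f}")
print(f"On-policy Monte Carlo:  {reward_e.mean():.4f}")
```

The IS estimate is unbiased whenever the behavior policy puts positive probability on every action the target policy can take; the review surveys refinements (doubly robust, marginalized, and minimax estimators) that reduce its variance in sequential RL settings.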
| Item Type | Article |
|---|---|
| Keywords | off-policy evaluation, semiparametric methods, causal inference, dynamic treatment regime, offline reinforcement learning, contextual bandits |
| Departments | Statistics |
| Date Deposited | 14 Apr 2025 14:18 |
| Acceptance Date | 2025-03-18 |
| URI | https://researchonline.lse.ac.uk/id/eprint/127940 |
| ORCID | https://orcid.org/0000-0001-7773-2099 |
