A review of off-policy evaluation in reinforcement learning

Uehara, Masatoshi; Shi, Chengchun; and Kallus, Nathan (2025) A review of off-policy evaluation in reinforcement learning. Statistical Science. ISSN 0883-4237 (In press)

Reinforcement learning (RL) is one of the most vibrant research frontiers in machine learning and has recently been applied to a number of challenging problems. In this paper, we focus primarily on off-policy evaluation (OPE), one of the most fundamental topics in RL. In recent years, a number of OPE methods have been developed in the statistics and computer science literature. We discuss the efficiency bound of OPE, several existing state-of-the-art OPE methods and their statistical properties, and other related research directions that are under active exploration.
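The abstract names OPE without defining it concretely. As an illustration only (not taken from the paper), the sketch below shows the canonical importance-sampling OPE estimator on a toy one-step problem: logged data are collected under a behavior policy, and each reward is reweighted by the ratio of target to behavior action probabilities to estimate the target policy's value. All variable names and the toy reward model are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: a two-action, contextless problem (horizon-1 MDP) with known
# behavior and target policies, each given as action probabilities.
behavior_policy = np.array([0.7, 0.3])  # pi_b(a): policy that logged the data
target_policy = np.array([0.2, 0.8])    # pi_e(a): policy we want to evaluate

# True mean reward of each action (used only to simulate logged data).
true_reward_mean = np.array([1.0, 2.0])

# Simulate logged data collected under the behavior policy.
n = 10_000
actions = rng.choice(2, size=n, p=behavior_policy)
rewards = true_reward_mean[actions] + rng.normal(0.0, 0.1, size=n)

# Importance-sampling OPE estimate: reweighting each logged reward by
# pi_e(a) / pi_b(a) makes the sample mean unbiased for the target policy.
weights = target_policy[actions] / behavior_policy[actions]
is_estimate = np.mean(weights * rewards)

print(f"IS estimate of target-policy value: {is_estimate:.3f}")
print(f"True target-policy value:           {target_policy @ true_reward_mean:.3f}")
```

The review surveys refinements of this basic idea, including estimators whose variance attains the efficiency bound; the plain importance-sampling estimator above is shown only as the simplest member of that family.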


Accepted Version
Restricted to Repository staff only until 1 January 2100
Available under Creative Commons: Attribution 4.0
