User preferences for large language model refusals: implications for moderation and market structure

Kireyev, P. & Vitorino, M. A. (2025). User preferences for large language model refusals: implications for moderation and market structure. London School of Economics and Political Science.

Large language models (LLMs) differ in their moderation and content policies, which determine which prompts these models refuse to answer. These refusals can affect users' decisions about which models to use and whether to make safe or risky prompts. Using data from LMArena, where users select preferred responses to their prompts from paired LLM comparisons, we estimate a discrete choice model that captures user preferences for making risky prompts and their choice of which LLM provides the best response quality given the possibility of refusals. We leverage this model to analyze how moderation policies affect market shares across proprietary and open-source LLMs. Our findings reveal that proprietary LLMs provide higher-quality responses and maintain larger market shares, but implement stricter moderation policies with higher refusal rates than open-source alternatives. This stricter moderation by proprietary LLMs reduces market concentration by allowing lower-quality open-source LLMs to compete effectively in the risky prompt segment. Mandating uniform moderation policies across all LLMs could increase market concentration in favor of proprietary LLMs, potentially hampering competition. Our framework characterizes the efficient frontier of moderation policies that balance market concentration and safety.
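For readers unfamiliar with how quality preferences can be estimated from paired comparisons of the kind LMArena collects, the sketch below illustrates the simplest such discrete choice model, a Bradley-Terry (binary logit) specification. All model names and comparison data here are hypothetical placeholders, and this sketch omits the paper's additional components (refusal behavior and the choice between safe and risky prompts); it is an illustration of the general estimation technique, not the authors' model.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical pairwise-comparison records in the spirit of LMArena:
# (model_a, model_b, winner), where winner is 0 if the user preferred
# model_a's response and 1 if they preferred model_b's.
MODELS = ["model_x", "model_y", "model_z"]  # placeholder model names
comparisons = [
    ("model_x", "model_y", 0),
    ("model_y", "model_z", 1),
    ("model_x", "model_z", 0),
    ("model_y", "model_x", 0),
]

idx = {m: i for i, m in enumerate(MODELS)}

def neg_log_likelihood(quality):
    """Bradley-Terry negative log-likelihood:
    P(a preferred over b) = exp(q_a) / (exp(q_a) + exp(q_b))."""
    nll = 0.0
    for a, b, winner in comparisons:
        qa, qb = quality[idx[a]], quality[idx[b]]
        p_a_wins = 1.0 / (1.0 + np.exp(qb - qa))  # logistic in q_a - q_b
        p = p_a_wins if winner == 0 else 1.0 - p_a_wins
        nll -= np.log(p)
    return nll

# Pin the first model's quality at 0 for identification (only quality
# differences are identified from choice data); estimate the rest.
def nll_free(free_params):
    return neg_log_likelihood(np.concatenate(([0.0], free_params)))

result = minimize(nll_free, x0=np.zeros(len(MODELS) - 1))
qualities = np.concatenate(([0.0], result.x))
for m, q in zip(MODELS, qualities):
    print(f"{m}: estimated quality {q:+.3f}")
```

Estimated quality parameters of this kind, combined with refusal probabilities, are the ingredients one would need to simulate counterfactual market shares under alternative moderation policies.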

Full text not available from this repository.
