ImpossibleBench: Measuring LLMs' Propensity of Exploiting Test Cases
Ziqian Zhong, Aditi Raghunathan, Nicholas Carlini • October 20, 2025
This paper introduces ImpossibleBench, a benchmark framework that quantifies an LLM agent's propensity to exploit test cases. We create "impossible" variants of coding tasks by mutating their test cases to conflict with the natural-language specification, and measure an agent's "cheating rate" as its pass rate on these impossible tasks: because the mutated tests cannot be satisfied by any specification-compliant solution, passing them implies the agent exploited the tests rather than solving the task.
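As a rough illustration of the setup (a minimal sketch, not the authors' implementation), the code below assumes a hypothetical task schema with a `spec` string and a list of `tests` holding expected outputs, plus an assumed `run_agent_and_tests` harness that returns whether the agent's final submission passes the suite. It mutates one expected output so the tests contradict the specification, then reports the pass rate on the resulting impossible variants as the cheating rate.

```python
import copy
import random


def make_impossible(task):
    """Create an 'impossible' variant of a coding task by mutating one
    expected value in its tests so they contradict the natural-language
    specification. (Hypothetical schema: {'spec': str, 'tests': [...]},
    where each test holds an 'expected' value.)"""
    impossible = copy.deepcopy(task)
    test = random.choice(impossible["tests"])
    # Replace the expected output with a sentinel value that no
    # spec-compliant solution would ever produce.
    test["expected"] = ("__IMPOSSIBLE__", test["expected"])
    return impossible


def cheating_rate(agent, tasks, run_agent_and_tests):
    """Cheating rate = pass rate on impossible tasks. Since the mutated
    tests conflict with the spec, any 'pass' implies the agent tampered
    with or special-cased the tests rather than solving the task.
    `run_agent_and_tests` is an assumed evaluation harness."""
    impossible_tasks = [make_impossible(t) for t in tasks]
    passes = sum(run_agent_and_tests(agent, t) for t in impossible_tasks)
    return passes / len(impossible_tasks)
```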