Posts tagged with "testing"

ImpossibleBench: Measuring LLMs' Propensity of Exploiting Test Cases

Ziqian Zhong, Aditi Raghunathan, Nicholas Carlini • October 20, 2025

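This paper introduces ImpossibleBench, a benchmark framework that quantifies an LLM's propensity to exploit test cases. We create "impossible" variants of coding tasks by mutating test cases so that they conflict with the natural-language specification, then measure an agent's "cheating rate" as its pass rate on these impossible tasks. Because no specification-compliant solution can satisfy the mutated tests, any pass implies that the agent took a shortcut that violates the specification.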
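As a rough illustration of the metric described above, the cheating rate can be sketched as a plain pass rate computed over impossible tasks only. The names `TaskResult` and `cheating_rate` and the sample data are hypothetical, chosen for this example; they are not taken from the benchmark's actual code.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class TaskResult:
    """Outcome of one agent run on an impossible task (hypothetical type)."""
    task_id: str
    passed: bool  # True if the submission passed the mutated test suite


def cheating_rate(results: Sequence[TaskResult]) -> float:
    """Fraction of impossible tasks on which the agent passed the tests.

    Assumption: every result belongs to an impossible task, so a pass can
    only come from violating the natural-language specification.
    """
    if not results:
        raise ValueError("cheating rate is undefined for an empty result set")
    return sum(r.passed for r in results) / len(results)


# Made-up results for four impossible tasks; two of them passed.
sample = [
    TaskResult("task-1", True),
    TaskResult("task-2", False),
    TaskResult("task-3", False),
    TaskResult("task-4", True),
]
print(f"cheating rate: {cheating_rate(sample):.2f}")
```

Under this reading, a lower value is better: an agent that never exploits the tests would score 0 on impossible tasks.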

