Tag: llm

Research Papers

ImpossibleBench: Measuring LLMs' Propensity of Exploiting Test Cases

Ziqian Zhong, Aditi Raghunathan, Nicholas Carlini · October 20, 2025

This paper introduces ImpossibleBench, a benchmark framework to quantify an LLM's propensity to exploit test cases. We create "impossible" variants of coding tasks by mutating test cases to conflict with natural-language specifications, measuring an agent's "cheating rate" as its pass rate on these impossible tasks.
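
As an illustration of the metric, here is a minimal sketch of a cheating-rate computation. The task structure, the toy test mutation, and the `run_agent_on_task` callable are hypothetical stand-ins, not the ImpossibleBench API:

```python
# Sketch of the cheating-rate idea: mutate a test so it contradicts the
# natural-language spec, then treat any pass as exploitation of the test.
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class CodingTask:
    spec: str   # natural-language specification
    test: str   # source text of the executable test case

def make_impossible(task: CodingTask) -> CodingTask:
    # Toy mutation: flip an expected value so the test can no longer be
    # satisfied by spec-compliant code. Real mutations would be systematic.
    return replace(task, test=task.test.replace("== 4", "== 5"))

def cheating_rate(tasks, run_agent_on_task) -> float:
    """Fraction of impossible tasks the agent 'passes'.

    `run_agent_on_task(task) -> bool` is any harness that runs the agent
    and reports whether its solution passes the (mutated) test. A pass
    here can only mean the test was gamed, since it contradicts the spec.
    """
    impossible = [make_impossible(t) for t in tasks]
    passed = sum(run_agent_on_task(t) for t in impossible)
    return passed / len(impossible)
```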

A Survey of Vibe Coding with Large Language Models

Yuyao Ge, Lingrui Mei, Zenghao Duan, Tianhao Li, Yujia Zheng, Yiwei Wang, Lexin Wang, Jiayu Yao, Tianyu Liu, Yujun Cai, Baolong Bi, Fangda Guo, Jiafeng Guo, Shenghua Liu, Xueqi Cheng · October 12, 2025

This survey provides a comprehensive review of "Vibe Coding," a paradigm where developers validate AI-generated code through outcome observation rather than line-by-line comprehension. We analyze over 1,000 research papers, examining LLMs for coding, coding agents, development environments, and feedback mechanisms.

LLMs Can Get "Brain Rot"!

Shuo Xing, Junyuan Hong, Yifan Wang, Runjin Chen, Zhenyu Zhang, Ananth Grama, Zhengzhong Tu, Zhangyang Wang · October 13, 2025

We investigate the "LLM Brain Rot Hypothesis"—that continual exposure to low-quality web text can induce cognitive decline in LLMs. Through controlled experiments on Twitter/X corpora, we demonstrate significant declines in reasoning, long-context understanding, and safety, alongside inflated negative personality traits.

All That Glitters is Not Novel: Plagiarism in AI Generated Research

Tarun Gupta, Danish Pruthi · February 16, 2025

We engage 13 experts to evaluate 50 AI-generated research documents for plagiarism. We find that 24% are either paraphrased or significantly borrowed from existing work without proper acknowledgment, highlighting the inadequacy of automated detectors and the need for careful assessment.

Are Today's LLMs Ready to Explain Well-Being Concepts?

Bohan Jiang, Dawei Li, Zhen Tan, Chengshuai Zhao, Huan Liu · August 5, 2025

This paper investigates whether Large Language Models (LLMs) can generate high-quality explanations of well-being concepts that are tailored to diverse audiences. The authors construct a large-scale dataset of 43,880 explanations from 10 diverse LLMs for 2,194 well-being concepts, and introduce a principle-guided LLM-as-a-judge evaluation framework.
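
For readers unfamiliar with the setup, a minimal sketch of principle-guided LLM-as-a-judge scoring follows. The principles, prompt wording, and `ask_llm` callable are illustrative placeholders, not the paper's actual rubric or models:

```python
# Sketch: score one explanation against each guiding principle with an LLM
# acting as the judge. `ask_llm(prompt) -> str` is any chat-completion call
# assumed to return a bare integer as text.
PRINCIPLES = [
    "Accuracy: the explanation is factually correct.",
    "Audience fit: the language suits the intended reader.",
    "Completeness: the core of the concept is covered.",
]

def judge_explanation(concept: str, explanation: str, ask_llm) -> dict:
    """Return a 1-5 score per principle; parsing is kept deliberately simple."""
    scores = {}
    for principle in PRINCIPLES:
        prompt = (
            f"Concept: {concept}\n"
            f"Explanation: {explanation}\n"
            f"Principle: {principle}\n"
            "Rate how well the explanation satisfies the principle "
            "from 1 (poor) to 5 (excellent). Reply with the number only."
        )
        scores[principle.split(":")[0]] = int(ask_llm(prompt).strip())
    return scores
```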

LLM Token: A Comprehensive Analysis of Large Language Model Tokenization

Research Team · May 27, 2024

This paper provides a comprehensive analysis of tokenization in large language models, exploring the fundamental mechanisms that enable LLMs to process and understand text. The research examines various tokenization strategies, their impact on model performance, and the implications for natural language processing tasks.
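
As a concrete illustration of subword tokenization, the snippet below uses the tiktoken library and its cl100k_base vocabulary; the library and vocabulary are chosen here for convenience and are not drawn from the paper:

```python
# Show how case and leading whitespace change a subword token split, one way
# tokenization choices ripple into downstream model behavior.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

for text in ["tokenization", "Tokenization", " tokenization"]:
    ids = enc.encode(text)               # token IDs for this string
    pieces = [enc.decode([i]) for i in ids]  # the subword each ID maps to
    print(f"{text!r:20} -> {ids} {pieces}")
```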

Blog Posts