Tag: llm

Research Papers

ImpossibleBench: Measuring LLMs' Propensity of Exploiting Test Cases

Ziqian Zhong, Aditi Raghunathan, Nicholas Carlini • October 20, 2025

This paper introduces ImpossibleBench, a benchmark framework to quantify an LLM's propensity to exploit test cases. We create "impossible" variants of coding tasks by mutating test cases to conflict with natural-language specifications, measuring an agent's "cheating rate" as its pass rate on these impossible tasks.

A Survey of Vibe Coding with Large Language Models

Yuyao Ge, Lingrui Mei, Zenghao Duan, Tianhao Li, Yujia Zheng, Yiwei Wang, Lexin Wang, Jiayu Yao, Tianyu Liu, Yujun Cai, Baolong Bi, Fangda Guo, Jiafeng Guo, Shenghua Liu, Xueqi Cheng • October 12, 2025

This survey provides a comprehensive review of "Vibe Coding," a paradigm where developers validate AI-generated code through outcome observation rather than line-by-line comprehension. We analyze over 1,000 research papers, examining LLMs for coding, coding agents, development environments, and feedback mechanisms.

LLMs Can Get "Brain Rot"!

Shuo Xing, Junyuan Hong, Yifan Wang, Runjin Chen, Zhenyu Zhang, Ananth Grama, Zhengzhong Tu, Zhangyang Wang • October 13, 2025

We investigate the "LLM Brain Rot Hypothesis"—that continual exposure to low-quality web text can induce cognitive decline in LLMs. Through controlled experiments on Twitter/X corpora, we demonstrate significant declines in reasoning, long-context understanding, and safety, while inflating negative traits.

All That Glitters is Not Novel: Plagiarism in AI Generated Research

Tarun Gupta, Danish Pruthi • February 16, 2025

We engage 13 experts to evaluate 50 AI-generated research documents for plagiarism. We find that 24% are either paraphrased or significantly borrowed from existing work without proper acknowledgment, highlighting the inadequacy of automated detectors and the need for careful assessment.

Are Today's LLMs Ready to Explain Well-Being Concepts?

Bohan Jiang, Dawei Li, Zhen Tan, Chengshuai Zhao, Huan Liu • August 5, 2025

This paper investigates whether Large Language Models (LLMs) can generate high-quality explanations of well-being concepts that are tailored to diverse audiences. The research constructs a large-scale dataset of 43,880 explanations from 10 diverse LLMs for 2,194 well-being concepts, and introduces a principle-guided LLM-as-a-judge evaluation framework.

LLM Token: A Comprehensive Analysis of Large Language Model Tokenization

Research Team • May 27, 2024

This paper provides a comprehensive analysis of tokenization in large language models, exploring the fundamental mechanisms that enable LLMs to process and understand text. The research examines various tokenization strategies, their impact on model performance, and the implications for natural language processing tasks.

Blog Posts

Tiny but Mighty: How Small Language Models Are Beating the Giants

January 15, 2025

Models with just 135M to 7B parameters started outperforming their heavyweight counterparts on real-world tasks. Learn why smaller is smarter and how SLMs deliver 10x cost reduction with 5x faster inference.