
OpenAI launches "PaperBench" benchmark, showing the strongest AI agents have not surpassed humans

OpenAI launched a new benchmark, "PaperBench", yesterday to assess the ability of AI agents to replicate cutting-edge AI research. PaperBench requires agents to replicate 20 papers from the ICML 2024 conference from scratch, and the results show that even the most advanced AI models fall short of the human baseline: the best-performing agent achieved a replication score of only 21%. OpenAI has open-sourced the benchmark code to support further research on the engineering capabilities of AI agents.
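
As a rough illustration of how a "replication score" can be produced, PaperBench grades each submission against a hierarchical rubric whose leaf requirements are judged pass/fail and rolled up into a weighted overall score. The sketch below is not OpenAI's implementation; the node names, weights, and structure are hypothetical and only show that kind of weighted aggregation.

```python
from dataclasses import dataclass, field


@dataclass
class RubricNode:
    """A node in a hierarchical grading rubric (hypothetical structure)."""
    name: str
    weight: float = 1.0          # relative weight among siblings
    passed: bool | None = None   # leaf verdict: True/False; None for internal nodes
    children: list["RubricNode"] = field(default_factory=list)

    def score(self) -> float:
        """Leaf: 1.0 if passed, else 0.0; internal node: weighted average of children."""
        if not self.children:
            return 1.0 if self.passed else 0.0
        total_weight = sum(c.weight for c in self.children)
        return sum(c.weight * c.score() for c in self.children) / total_weight


# Toy rubric for one paper, with made-up requirements and weights.
rubric = RubricNode("paper", children=[
    RubricNode("code development", weight=2.0, children=[
        RubricNode("training loop implemented", passed=True),
        RubricNode("ablation scripts implemented", passed=False),
    ]),
    RubricNode("results reproduced", weight=1.0, children=[
        RubricNode("main table matches", passed=False),
    ]),
])

print(f"Replication score: {rubric.score():.1%}")  # 33.3% for this toy rubric
```

In this toy example the "code development" branch scores 50%, the "results reproduced" branch scores 0%, and the weighted roll-up yields roughly 33%; a reported figure like the 21% above would be the analogous aggregate over a real paper's rubric.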

