
OpenAI launches "PaperBench" benchmark, showing the strongest AI agents have not surpassed humans

OpenAI launched a new benchmark, "PaperBench", yesterday to assess the ability of AI agents to replicate cutting-edge AI research. PaperBench requires agents to replicate 20 papers from the ICML 2024 conference from scratch, and the results show that even the most advanced AI models fall short of the human baseline: the best-performing agent achieved a replication score of only 21%. OpenAI has open-sourced the benchmark code to support further research on the engineering capabilities of AI agents.
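
As a rough illustration of how a "replication score" can be produced, PaperBench grades each submission against a hierarchical rubric whose leaf requirements are judged pass/fail and rolled up into a weighted overall score. The sketch below is not OpenAI's implementation; the node names, weights, and structure are hypothetical and only show that kind of weighted aggregation.

```python
from dataclasses import dataclass, field


@dataclass
class RubricNode:
    """A node in a hierarchical grading rubric (hypothetical structure)."""
    name: str
    weight: float = 1.0          # relative weight among siblings
    passed: bool | None = None   # leaf verdict: True/False; None for internal nodes
    children: list["RubricNode"] = field(default_factory=list)

    def score(self) -> float:
        """Leaf: 1.0 if passed, else 0.0; internal node: weighted average of children."""
        if not self.children:
            return 1.0 if self.passed else 0.0
        total_weight = sum(c.weight for c in self.children)
        return sum(c.weight * c.score() for c in self.children) / total_weight


# Toy rubric for one paper, with made-up requirements and weights.
rubric = RubricNode("paper", children=[
    RubricNode("code development", weight=2.0, children=[
        RubricNode("training loop implemented", passed=True),
        RubricNode("ablation scripts implemented", passed=False),
    ]),
    RubricNode("results reproduced", weight=1.0, children=[
        RubricNode("main table matches", passed=False),
    ]),
])

print(f"Replication score: {rubric.score():.1%}")  # 33.3% for this toy rubric
```

In this toy example the "code development" branch scores 50%, the "results reproduced" branch scores 0%, and the weighted roll-up yields roughly 33%; a reported figure like the 21% above would be the analogous aggregate over a real paper's rubric.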

