---
title: "OpenAI's GPT-5 is here with up to 80% fewer hallucinations"
description: "OpenAI launched GPT-5, its most advanced model, claiming it has up to 80% fewer hallucinations and improved performance in coding, writing, math, and visual perception. The model features a routing sy"
type: "news"
locale: "en"
url: "https://longbridge.com/en/news/252266692.md"
published_at: "2025-08-08T15:42:27.000Z"
---

# OpenAI's GPT-5 is here with up to 80% fewer hallucinations

> OpenAI launched GPT-5, its most advanced model, claiming it has up to 80% fewer hallucinations and improved performance in coding, writing, math, and visual perception. The model features a routing system that directs prompts to the appropriate version based on complexity. While GPT-5 shows iterative improvements over its predecessor, it excels in tool use and health-related queries. OpenAI emphasizes that GPT-5 is designed to enhance user experience, particularly in healthcare, despite only marginal gains in benchmark tests.

OpenAI unveiled its most capable model yet on Thursday with the launch of GPT-5.

AI hype man and OpenAI CEO Sam Altman described it as like talking to your own personal expert that can write applications on demand.

"We think this idea of software on demand is going to be one of the defining characteristics of the GPT-5 era," he said, kicking off an over-75-minute presentation packed with code demos.

Compared to earlier models, OpenAI says GPT-5 delivers improvements in coding, writing, math, and visual perception, while also cutting down on hallucinations and deceptive behavior.

To be clear, GPT-5 isn't one model. It's actually a collection of models to which OpenAI will route prompts based on signals like the user's intent or the request's general complexity.
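
OpenAI has not published how its router works, so the following is only a minimal sketch of the general idea: score a prompt on a couple of signals, then pick a model tier. The tier names, cue words, and threshold are all hypothetical stand-ins rather than anything OpenAI has described.

```python
from dataclasses import dataclass

# Hypothetical tier names; OpenAI has not published its router's internals.
FAST_MODEL = "fast-small"
REASONING_MODEL = "deep-reasoning"

# Hypothetical cue words standing in for a learned "intent" signal.
REASONING_CUES = ("prove", "debug", "step by step", "think hard")


@dataclass
class Route:
    model: str
    reasoning: bool


def estimate_complexity(prompt: str) -> float:
    """Crude stand-in for a learned complexity signal, in the range [0, 1]."""
    length_score = min(len(prompt.split()) / 200, 1.0)
    has_cue = any(cue in prompt.lower() for cue in REASONING_CUES)
    return max(length_score, 1.0 if has_cue else 0.0)


def route(prompt: str, threshold: float = 0.5) -> Route:
    """Send prompts scoring at or above the threshold to the reasoning tier."""
    if estimate_complexity(prompt) >= threshold:
        return Route(REASONING_MODEL, True)
    return Route(FAST_MODEL, False)


if __name__ == "__main__":
    for text in ("What's the capital of France?", "Please debug this race condition"):
        print(text, "->", route(text))
```

A production router would presumably learn these signals from data rather than hard-code them; the sketch only shows where such a decision sits in the request path.
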
According to OpenAI, simple prompts might be routed to a small, efficient version of the model that can respond quickly without "thinking", while a larger, deeper reasoning model might be used to handle more complex or nuanced tasks. Reasoning is triggered automatically based on user prompts. Paid users will also have the option of toggling on reasoning functionality permanently if desired.

This routing model is apparently being continuously trained on new input signals to make it smarter about which model it routes the request to and when to trigger reasoning functionality. However, OpenAI says it eventually plans to integrate all of these models into a single one.

OpenAI says this architecture is both faster and more efficient than prior designs.

"GPT-5 gets more value out of less thinking time. In our evaluations, GPT-5, with thinking, performs better than OpenAI o3 with 50-80 percent less output tokens across capabilities, including visual reasoning, agentic coding, and graduate-level scientific problem solving," the company wrote in a blog post.

ChatGPT Free and Plus users will have access to GPT-5 and GPT-5 mini, while Pro and Enterprise users will have access to a Pro variant, which can reason for longer. Those accessing the models via API will also have access to a Nano version at a reduced cost, alongside the standard and mini models.

### Revolutionary upgrade or overhyped iteration

While OpenAI's presentation was packed with hyperbolic claims and demos about GPT-5 being its smartest model ever, the company's benchmark results told a slightly different story, one of mostly iterative improvements.

*Figure: Your eyes aren't deceiving you. GPT-5 shows only iterative improvements in math benchmarks like AIME 2025*

In the AIME 2025 math bench, GPT-5 Pro eked out a 1.6 point lead over the company's previous flagship o3 model when using tools and a 7.8 point advantage without them.
With that said, for free tier users, the new models are a pretty big upgrade over GPT-4o, with GPT-5 (non-Pro) managing a 57.5 point advantage. And it was a similar story with the FrontierMath and the HMMT math benches.

*Figure: GPT-5 showed similarly narrow gains over o3 in the GPQA Diamond bench as well*

Similarly iterative performance gains were observed in GPQA Diamond, a PhD-level science quiz, and Humanity's Last Exam. Across nearly every benchmark suite, GPT-5 managed single-digit leads over last gen's models.

*Figure: Compared to o3, GPT-5 is way more adept at tool use and instruction following*

One of the most obvious standouts was in Tau2-bench, a conversational agent benchmark where GPT-5's improvements in tool calling and instruction following were on full display.

"Benchmarks, they're exciting numbers, but we're starting to saturate them, like when you're moving between 98% and 99% in some benchmark it means you need something else to really capture how great the model is," OpenAI president Greg Brockman admitted.

This is no doubt why so much of the presentation was dedicated to demos and testimonials. Speaking of which, one capability Altman was particularly excited about was GPT-5's performance in health-related queries.

"One of the top use cases of ChatGPT is health. People use it a lot. You've all seen examples of people getting day-to-day care advice or sometimes even a life saving diagnosis," Altman said. "GPT-5 is the best model ever for health. It empowers you to be more in control of your healthcare journey."

Apparently, ChatGPT has usurped WebMD for self-diagnosis.

During one testimonial, the company appeared to suggest that users struggling to make sense of health conditions just upload their medical documents to ChatGPT for GPT-5 to figure out. What was it Altman was just saying about feeding ChatGPT sensitive information?
### OpenAI tunes out the voices

While GPT-5's benchmark gains were marginal at best, the models should be less prone to hallucinating, that is, fabricating often convincing information in order to satisfy a user's request, which has become a major problem.

In our tests just this week, OpenAI's (much smaller and less capable) open-source models hallucinated a fictional presidential candidate whom Donald Trump beat in 2024.

"GPT-5's responses are around 45 percent less likely to contain a factual error than GPT-4o and when thinking GPT-5's responses are around 80 percent less likely to contain a factual error than OpenAI o3," the company said in a blog post.

Along with cutting down on hallucinations, OpenAI also implemented evaluations to test for deceitful behavior on the models' part.

"In order to achieve a high reward during training, reasoning models may learn to lie about successfully completing a task or be overly confident about an uncertain answer," the company explained. "GPT-5 more accurately recognizes when tasks can't be completed and communicates its limits clearly."

In testing on real-world chat data, OpenAI says it was able to reduce deception rates from 4.8 percent on o3 to 2.1 percent in GPT-5's reasoning responses.

Meanwhile, on the topic of safety, OpenAI has implemented new measures to handle potentially dubious prompts on sensitive topics. Rather than relying on guardrails that can be bypassed with clever prompt engineering, the model builder says GPT-5 will now provide the most complete response possible while staying within an acceptable safety margin.

For example, instead of refusing to answer a question about how to ignite a potentially explosive compound, the model might instead direct the user to where they can find the information and issue warnings in response to the request.
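
To put the percentages quoted in this section on a common footing, here is a small sketch of the arithmetic. The deception rates (4.8 and 2.1 percent) come from OpenAI's figures above and work out to a relative drop of roughly 56 percent. The baseline error rate in the second calculation is a made-up number, since OpenAI's "80 percent less likely" claim is relative and the article gives no absolute error rate.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fractional drop from `before` to `after` (0.5 means the rate halved)."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


def rate_after_reduction(baseline: float, reduction: float) -> float:
    """Rate that remains after cutting `baseline` by a fractional `reduction`."""
    return baseline * (1.0 - reduction)


# Deception rates quoted by OpenAI: 4.8 percent on o3, 2.1 percent on GPT-5.
deception_cut = relative_reduction(4.8, 2.1)
print(f"Relative drop in deception rate: {deception_cut:.0%}")

# Hypothetical baseline, NOT from the article: 10 erroneous responses per 100.
HYPOTHETICAL_BASELINE = 10.0
remaining = rate_after_reduction(HYPOTHETICAL_BASELINE, 0.80)
print(f"After an 80 percent relative cut: about {remaining:.0f} per 100")
```

The second calculation shows why a relative claim says little on its own: an 80 percent cut leaves very different absolute error rates depending on where the comparison model started.
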
### ChatGPT gets a personality or four

Alongside the new models, OpenAI is also rolling out four new optional personalities for its chatbot so users can decide exactly how professional or edgy they want their AI assistant to be.

The four available at launch are cynic, robot, listener, and nerd. These personalities, the model builder notes, are opt-in and are, for the moment, limited to text chat, with distinct voice capabilities coming later.

"This lets you interact with ChatGPT in a way that's consistent with your own communication style," Mark Chen, Chief Research Officer at OpenAI, said.

OpenAI was careful to emphasize that these personalities have been specifically tuned to avoid becoming too sycophantic in their praise of user questions and inputs.

### Availability

OpenAI's GPT-5 family of models is available on ChatGPT for free, Plus, and Pro users beginning today and will roll out to enterprise and educational users next week.

Pricing for ChatGPT remains unchanged at $20 a month for the Plus tier and $200 a month for the unlimited Pro tier.

Professionals also have the option of accessing the models via API. OpenAI publishes full pricing, including cost per input, output, and cached tokens.

If the idea of paying for ChatGPT doesn't appeal to you, earlier this week, OpenAI released its first open-weights models since GPT-2.

**Bootnote:** This week also saw the release of Anthropic's Claude Opus 4.1, an updated version of the model which showed similarly iterative improvements in coding benchmarks.