---
title: "GPT-5.2 First Release Review: Expert Deep Experience for Two Weeks, Strong to the Point of Absurdity, but Infuriatingly Slow"
type: "News"
locale: "en"
url: "https://longbridge.com/en/news/269446376.md"
description: "OpenAI launched GPT-5.2 to compete with Google Gemini 3, calling it a significant update. GPT-5.2 has notable improvements in instruction following, code generation, visual capabilities, and long context, but it is slower. OthersideAI CEO Matt Shumer's in-depth review pointed out its excellent performance in deep reasoning, but speed is a major drawback. GPT-5.2 Pro is close to professional level in command line tools, but the advanced reasoning mode requires a long wait"
datetime: "2025-12-12T00:05:53.000Z"
locales:
  - [zh-CN](https://longbridge.com/zh-CN/news/269446376.md)
  - [en](https://longbridge.com/en/news/269446376.md)
  - [zh-HK](https://longbridge.com/zh-HK/news/269446376.md)
---

# GPT-5.2 First Release Review: Two Weeks of Expert Hands-On Testing, Absurdly Strong but Infuriatingly Slow

Rushing to counter Google's Gemini 3, OpenAI has just launched GPT-5.2, which Sam Altman calls the biggest update in a long time.

The official blog post publishes the benchmark scores, which are impressive for programming, though such scores are only a reference point. Those interested can read the post here: https://openai.com/index/introducing-gpt-5-2/

The hallucination rate of GPT-5.2 has decreased by about 30-40%. The price has increased. Another release is expected around Christmas, possibly an update to the image model. The "adult mode" for ChatGPT is currently planned for the first quarter of next year.

Here I share an in-depth review of GPT-5.2 by Matt Shumer, CEO of OthersideAI, who had beta access for two weeks.

Key points summarized:

- **Instruction following and task willingness**: GPT-5.2 Thinking has made significant strides in following instructions and in its willingness to tackle difficult tasks.
- **Code generation has greatly improved**: Much better than GPT-5.1. It is more powerful, more autonomous, more cautious, and willing to write much more code.
- **Visual and long context**: Substantial improvements, especially in understanding positions in images and handling large codebases.
- **Speed is a major drawback**: In the author's experience, the Thinking mode is very slow on most problems (although feedback from other testers varies). He almost never uses Instant mode.
- **GPT-5.2 Pro**: Extremely strong in deep reasoning, but slow, and it occasionally fails after a long period of thinking.
- **Codex CLI**: Of the models the author has used in command-line tools, GPT-5.2 comes closest to Pro-level coding capability, but the advanced reasoning mode needed to reach that level sometimes requires a long wait.

The detailed evaluation follows.

## **GPT-5.2 Thinking: Intuitive Enhancement**

The most striking aspect of GPT-5.2 is the way it follows instructions: not the basic "I say, you do," but "truly completing the entire task I describe."

The author gives an example. When testing creative writing, he asked the model to come up with 50 plot ideas first and then select the best one to write the story. Most models would take shortcuts, perhaps providing only 10 ideas and starting with one of them. GPT-5.2, however, generated all 50 ideas before making a selection.

This may sound trivial, but it is not. In creative work or research, the extra 40 ideas may contain the one truly interesting spark. The model trusts the process rather than optimizing for speed, which is crucial.

The author tested it further by asking it to write a 200-page book.
Although the pages themselves were weak and short, and the model could not write a publishable novel in one go, what was impressive was that it did **attempt** to do so. It built out the entire structure of the book and even typeset it as a PDF. Most models would assume they couldn't do it and wouldn't even try; they would tell you "this is too long," or just give you an outline. GPT-5.2, on the other hand, jumped right in.

This willingness to attempt grand tasks (even if imperfectly) opens up new workflows.

## **Code Generation: Real Progress**

GPT-5.2 has made significant progress in code generation compared to previous models. The quality of the code it writes is higher, and it can handle larger tasks.

For example, the author used Three.js animations to stress-test its spatial reasoning. He asked the model to construct a baseball scene, and the result looked more realistic than what most models produce (the texture and lighting effects were great), but there is still considerable room for improvement in spatial awareness and object placement.

Additionally, the model is willing to write much more code than previous versions and can work continuously for longer periods without interruption. This is a tangible improvement in capability.

## **Visual and Long Context**

The visual capabilities of GPT-5.2 have significantly improved. Its understanding of images, especially of position and spatial relationships, has improved markedly (although its spatial generation capabilities are still developing). This is good news for agents that operate computers.

Its long context capabilities are also excellent. It feels more stable when handling large codebases, vast amounts of data, and lengthy analyses, which is one of the reasons GPT-5.2 performs well in agentic coding workflows.

The author complains a bit here: the model has become so powerful, but OpenAI's ChatGPT interface has completely failed to keep up.
For example, the Canvas interface in ChatGPT still cannot handle large amounts of code. He initially tried a Three.js test in Canvas, but the amount of code the model produced exceeded what Canvas could process.

Additionally, Pro mode can still only be used within ChatGPT and not in the Codex CLI, which continues to frustrate the author. To work around this, he uses a tool called RepoPrompt: it converts the local code repository into a prompt that he pastes into 5.2 Pro, and he then pastes the model's response back into RepoPrompt, which applies the changes to the repository. Although this adds an extra step, it lets him apply Pro-level reasoning to a real codebase.

## **Style**

Anyone who has used OpenAI models knows their obsession with bullet points, and GPT-5.2 continues the tradition. When you ask it to explain something, you often get a bulleted list where a few clear paragraphs would be more effective. Careful prompting (such as explicitly requesting flowing prose) avoids the issue.

Aside from the bullet points, the overall writing style has improved. It is not a huge leap from GPT-5.1, but it is somewhat better.

On the positive side, GPT-5.2 has learned to keep answers concise. When asked simple questions, it occasionally gives straightforward answers. The author notes that while this has not yet become the default behavior, it is progress.

## **Speed Issues**

This affects the author's daily use: standard GPT-5.2 Thinking is slow. In his experience, it is **very, very slow** even on simple, direct questions.

However, he also mentions that other testers have reported different speeds, with some tasks being fast and others slow.

In practical work, this means he rarely uses GPT-5.2 Thinking.
His workflow has become:

- Quick questions → Claude Opus 4.5
- Deep reasoning → GPT-5.2 Pro

The standard Thinking model sits in an awkward middle ground: slower than Opus, yet lacking the full reasoning advantages of Pro.

## **Side-by-Side Model Comparison**

The author uses Claude Opus 4.5, Gemini 3 Pro, and GPT-5.2 side by side, and they have settled into a clear division of labor in his workflow:

**For quick questions**: Questions like "What is the syntax of X" or "Remind me how Y works" go to Claude Opus 4.5. It is faster and more direct.

**For research and complex reasoning**: GPT-5.2 Pro is clearly superior. When a task requires thinking from multiple angles and integrating a large amount of context, Pro performs best.

**For front-end UI generation**: Both GPT-5.2 Thinking and Pro have improved over previous GPT models, but neither is as good as Gemini 3 Pro.

The difference here is subtle: Gemini 3 Pro has the best aesthetic sense, and its UIs look great. However, it is slightly less reliable at layout and front-end engineering. So if a UI needs to function correctly and handle edge cases, the author would still use Opus or GPT. If the goal is just to look good and the user is willing to fix the code themselves, Gemini 3 Pro is currently the best choice.

## **GPT-5.2 Pro: A Slow Genius**

Pro mode is where things get really interesting. It is a separate system and is only available in ChatGPT.

In short: Pro is absurdly smart. The intelligence gap between Thinking and Pro is immediately apparent.

But more important than raw intelligence is Pro's **willingness to think**. It will spend much longer on a problem than previous Pro models did. For research tasks, it will spend an extremely long time gathering information if necessary.

**Recipe Test**

The author provided a specific example. He asked the model for meal planning help and emphasized that he "has no time to cook," needing a 7-day plan (three meals and two snacks each day).
Pro provided an excellent meal plan, but what stood out most was its **ingredient list**, which was much simpler than those suggested by other models.

It understood that "no time" constrained not only cooking time but also the complexity of shopping, the preparation work, and the mental effort involved. It grasped the author's mindset, not just the literal request.

The author said that seeing this level of understanding was quite shocking. He sent the same prompt to all the other leading models, and none of them considered this.

## **Prompt Writing**

GPT-5.2 is very good at writing prompts, which helps both in getting the most out of AI models and in building software that integrates LLMs.

The prompts it writes are thoughtful and anticipate edge cases the author had not considered. In this regard, it is on par with Claude Opus 4.5 and clearly superior to Gemini 3 Pro.

## **Codex CLI Practical Test**

The author tested GPT-5.2 extensively in Codex CLI, and the results grew more impressive the more he used it.

This is the closest thing to a Pro-level model he has experienced on the command line. Its accuracy is far higher than that of other tools. The downside is that he could only use the "ultra-high reasoning mode," which sometimes takes a long time, even longer than Pro.

Its autonomy has significantly improved compared to previous models. But the real difference lies in how it **collects context**.

Claude Opus 4.5 tends to start writing code before fully understanding the problem; it makes assumptions and then hits a wall.

GPT-5.2 does not operate this way. It first asks questions, reads files, and explores the codebase. **It collects context before writing code**.

This has changed the author's workflow for the better. He checks the model's work less frequently. Unless the task is critical (such as production code), he often lets it run without reviewing every change.

## **Some Quirks**

The author also encountered some strange behaviors in Pro mode.
It seems to get stuck between conflicting instructions, hesitating for several minutes before responding to the user on a simple task. Occasionally, it thinks for a long time and still fails, which wastes a lot of time. OpenAI is reportedly aware of this and is investigating.

## **Use Case Summary**

After two weeks of testing, the author gave his practical breakdown:

1. Quick questions and daily tasks: Claude Opus 4.5 remains the first choice. It is fast, accurate, and does not waste time.
2. In-depth research and complex reasoning: GPT-5.2 Pro is currently the best option. In this scenario, correctness matters more than speed.
3. Front-end styling and UI aesthetics: Gemini 3 Pro currently generates the best-looking results, but be prepared to do some engineering cleanup yourself.
4. Serious coding work in Codex CLI: GPT-5.2 is the first choice, as its context collection behavior and reliability make it the default option for agentic coding tasks.

## **Final Summary**

GPT-5.2 is a real improvement. Its ability to follow instructions has increased significantly, and the intelligence and reliability of Pro mode are impressive. For complex tasks that require careful reasoning, this is the best model the author has used.

However, the speed issues with the standard Thinking model mean he rarely uses it for daily tasks. His setup ends up as: quick tasks with Opus 4.5, deep work with Pro.

But for the tasks GPT-5.2 excels at, its performance is outstanding.

source: https://shumer.dev/gpt52review