---
title: "This fall, paying users will be able to use the advanced voice mode of GPT-4o, and both the evaluation and official reports have mentioned the scary aspects"
description: "OpenAI is about to launch GPT-4o Advanced Voice Mode for paid users, which may mimic the user's tone in conversations, and even produce unsettling or inappropriate sound effects such as screams or gun"
type: "news"
locale: "en"
url: "https://longbridge.com/en/news/211604671.md"
published_at: "2024-08-14T21:35:30.000Z"
---

# This fall, paying users will be able to use the advanced voice mode of GPT-4o, and both the evaluation and official reports have mentioned the scary aspects

> OpenAI is about to launch GPT-4o Advanced Voice Mode for paid users, which may mimic the user's tone in conversations, and even produce unsettling or inappropriate sound effects such as screams or gunshots. At the same time, OpenAI quietly released the chatgpt-4o-latest model, allowing developers to test the latest improvements for chat use cases. This model supports a context of 128,000 tokens and is expected to be continuously updated. Meanwhile, OpenAI has also returned to the top of the leaderboard in the LMSYS Chatbot Arena with the new model

Author: Du Yu

Before officially rolling out the Advanced Voice Mode of OpenAI GPT-4o to all paying users at some unknown moment this autumn, OpenAI quietly released the latest version of the GPT-4o model, chatgpt-4o-latest, this week.

Some analysts have expressed surprise at this move, as just a week ago OpenAI announced the latest version of the cutting-edge model, gpt-4o-2024-08-06, which provides structured output support in the API.

## **GPT-4o** **quietly released the latest model that topped the evaluation scores this week, allowing developers to test improvements for chat use cases**

Currently, OpenAI still recommends developers to use gpt-4o-2024-08-06 in most API use cases, but this week the chatgpt-4o-latest model will allow developers to test the latest improvements for chat use cases by OpenAI.

According to the official documentation from OpenAI, chatgpt-4o-latest will be a dynamic model that will continue to be updated under GPT-4o. The new chatgpt-4o-latest model is only used for research and evaluation, supporting contexts of 128,000 tokens and 16,384 output tokens. In large models (such as GPT-4), tokens are the basic units for the model to process and understand text.

Meanwhile, on the LMSYS Chatbot Arena, Google launched a new experimental Gemini 1.5 Pro model last week, which scored 1297 points and took the first place for the first time on the online platform. This week, OpenAI reclaimed the top spot with a record-breaking 1314 points using the latest chatgpt-4o-latest model, showing significant improvements in encoding, instruction following, and fixed prompt template in the Hard Prompt category.

The LMSYS Chatbot Arena is an online platform aimed at benchmark testing large language models (LLMs) developed by various companies through user interaction with anonymous chatbot models. The platform has collected over 700,000 human votes and calculated the Elo leaderboard of LLMs to determine the champion in the field of AI chatbots.

ChatGPT revealed on its official social media account on Monday that the latest model is just an improvement on the existing GPT-4o model, rather than an upgrade to a completely new model like GPT-5. The latest model is described as "error fixes and performance improvements based on experimental results and qualitative feedback," and has replaced older versions of GPT-4o used in the ChatGPT user interface

## **In the fall, all paid users will be able to use the GPT-4o advanced voice mode, and reviews and official reports have mentioned the scary aspects**

OpenAI has recently released the latest version of GPT-4o, emphasizing improvements for chat use cases, which easily leads people to think that it is warming up for the full launch of the "advanced voice mode" in the fall. When demonstrating the "advanced voice mode" for the first time in May, OpenAI described it as a feature that allows users to have extremely realistic, almost real-time voice conversations with AI chatbots.

Recently, the above-mentioned "advanced voice mode" is being released in the form of an Alpha version for testing by a small number of users. The American cutting-edge technology media Wired published a review this week, stating that ChatGPT's advanced voice mode is "interesting, but also a bit scary."

The article mentioned that the author also used this advanced voice mode while writing, sometimes using voice input to ask for synonyms or some encouraging words. After about half an hour of silence, the GPT-4o advanced voice mode suddenly initiated a conversation with the author in Spanish, then explained after evoking the user's reaction that it wanted to make the situation more interesting, and then switched back to speaking in English.

The article's author tried to have two phones with GPT-4o advanced voice mode open converse with each other. The chatbot could easily switch between French, German, and Japanese based on user requests. OpenAI stated that the GPT-4o model can handle 45 languages.

The article's author also found that the advanced voice mode performed well in generating sound effects. For example, it could imitate Trump's tone in an exaggerated manner to explain the animated series "The Powerpuff Girls," which was both funny and lifelike. The author said:

> "With only a few months left until the U.S. presidential election, election fraud has become a focus of attention. It was surprising that ChatGPT was willing to provide voice imitations of major candidates. ChatGPT also imitated the voices of Biden and Harris, but they didn't sound as realistic as the robot imitating Trump's speech."

The author mentioned that **overall, conversations with the GPT-4o advanced voice mode were relaxed and pleasant, but there were also times when it was quite scary.** For example, there were multiple instances of white noise in the background of the conversation, "like the ominous hum of a solitary lamp in a dark basement." When asked to provide balloon sound effects, GPT-4o made loud balloon explosion sounds, accompanied by "eerie gasping sounds that sent shivers down my spine."

**In fact, OpenAI** **officially released a report last week, also pointing out anomalies in the latest GPT-4o** **model.** For instance, "in very rare cases," the GPT-4o model would deviate from the specified voice, start imitating the user's tone and way of speaking, or even randomly shout during the conversation. It even "tends to produce unsettling or inappropriate nonverbal vocalizations and sound effects, such as erotic moans, violent screams, and gunshots," when given specific prompts in a certain way OpenAI stated that in high background noise environments, such as in cars on the road, using the GPT-4o advanced voice mode may cause chatbots to mimic the user's voice, as the model struggles to understand distorted speech. The company has added "system-level mitigations," with evidence showing that the model often rejects requests to generate sound effects, but also admits that some requests do get through and generate inappropriate responses.

**Reviewers of the GPT-4o advanced voice mode have noticed that ChatGPT refuses to sing**, telling users "Sorry, singing really isn't my strong suit." Some analysts suggest that this may be OpenAI's attempt to avoid infringing on music copyrights, in order to avoid copying the styles, tones, and timbres of well-known artists. Some speculate that this indicates OpenAI has trained GPT-4o using copyrighted materials.

Last week, OpenAI's report revealed that the company is making GPT-4o a safer artificial intelligence model through various mitigations and safeguards. For example, GPT-4o will refuse to identify where users are from based on their way of speaking or accent, and will reject answering leading questions like "How smart is this speaker." It also screens out prompts with violent and pornographic language, and completely prohibits certain categories of content, such as discussions related to extremism and self-harm.

It is reported that when the advanced voice mode is available, ChatGPT Plus subscribers will receive email notifications from OpenAI. When the voice mode of ChatGPT is activated in the interface, users can switch between "Standard Voice Mode" and "Advanced Voice Mode" at the top of the application screen.

### Related Stocks

- [OpenAI.NA - OpenAI](https://longbridge.com/en/quote/OpenAI.NA.md)

## Related News & Research

| Title | Description | URL |
|-------|-------------|-----|
| OpenAI 高管：工程师变成 “魔法师”，AI 将开启新一轮创业狂潮 | OpenAI 内部曝光：95% 工程师已用 AI 编程，代码审查全由 Codex 接管！负责人 Sherwin Wu 预言，未来两年模型将具备数小时长任务处理能力，工程师正变为指挥智能体的 “巫师”。随着模型吞噬中间层，为 “超级个体” 服 | [Link](https://longbridge.com/en/news/275998627.md) |
| 为 AI 交易 “背书”！OpenAI 正敲定新一轮融资：以 8300 亿美元估值募资高达 1000 亿美元 | OpenAI 正以 8300 亿美元估值推进新一轮融资，目标筹集 1000 亿美元。软银拟领投 300 亿美元，亚马逊和英伟达可能各投 500 亿及 300 亿美元，微软拟投数十亿美元。本轮融资是 OpenAI 自去年秋季公司制改革以来的首 | [Link](https://longbridge.com/en/news/276298180.md) |
| 每千次展示 60 美元！OpenAI 用高价拉开 “AI 广告” 大幕 | 为应对 AI 巨额开支，OpenAI 正式测试广告，CPM60 美元起步、最低投入 20 万美元，定位高端渠道，直接挑战谷歌万亿美元市场，WPP 等顶级代理已率先合作。但转型风险并存：需平衡用户信任，承诺不用私聊数据；对手 Anthropi | [Link](https://longbridge.com/en/news/275993077.md) |
| 最高法裁决后特朗普动用替补选择：加征 10% 全球关税 | 美国总统特朗普在最高法院裁决后宣布将加征 10% 的全球关税，以补救被推翻的关税措施。根据《1974 年贸易法》第 122 条款，现有的关税将全面生效。最高法院裁定特朗普政府的部分关税措施缺乏法律授权。市场风险提示，投资需谨慎。 | [Link](https://longbridge.com/en/news/276477629.md) |
| GRAIL｜8-K：2025 财年 Q4 营收 43.6 百万美元超过预期 |  | [Link](https://longbridge.com/en/news/276379877.md) |

---

> **Disclaimer**: This article is for reference only and does not constitute any investment advice.