Here’s what OpenAI said about its latest model, GPT-4o:
OpenAI has announced the release of its latest multimodal AI model, GPT-4o, which will be available to all users for free. The model is unusual in that it can accept any combination of text, audio, and image inputs and produce any combination of text, audio, and image outputs. OpenAI says the model offers GPT-4-level intelligence that is “much faster and improves on its capabilities across text, voice, and vision.” Furthermore, OpenAI says its audio response time is comparable to human response time in conversation.
GPT-4o will also be available to developers via the API, where it is said to be twice as fast and half the price of GPT-4 Turbo. While GPT-4o’s features are free for all users, paid users get five times higher capacity limits.
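For developers, calling GPT-4o through the API looks the same as calling earlier chat models. Below is a minimal sketch using the official openai Python SDK; the prompt text is purely illustrative, and the SDK is assumed to read an OPENAI_API_KEY environment variable.

```python
# Minimal sketch: calling GPT-4o through the OpenAI Chat Completions API.
# Assumes the official `openai` Python SDK (v1+) and an OPENAI_API_KEY
# environment variable; the prompt is illustrative only.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the GPT-4o announcement in one sentence."},
    ],
)

print(response.choices[0].message.content)
```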
Text and image capabilities are rolling out in ChatGPT today, while the remaining features will be rolled out incrementally. In the coming weeks, OpenAI intends to release GPT-4o’s additional audio and video capabilities to a “small group of trusted partners in the API”.
What can GPT-4o do?
Text capabilities
Improvements across languages
According to OpenAI, 4o “matches GPT-4 Turbo performance on text in English and code, with significant improvement on text in non-English languages.” ChatGPT supports over 50 languages. Performance in Indian languages such as Gujarati, Telugu, Tamil, Marathi, and Urdu has reportedly improved significantly.
From text inputs, the model can generate a variety of images that depict a visual narrative, as well as caricatures. It can also render textual input in a desired typography.
Audio capabilities
GPT-4o reportedly delivers significant improvements in audio output. Previous versions of ChatGPT did include Voice Mode, but it was substantially slower because it relied on a pipeline of three separate models to generate a response. That pipeline also couldn’t detect tone, multiple speakers, or background noise, and it couldn’t laugh, sing, or express emotion. “This also introduces a lot of delay into the experience, which truly disrupts the immersion in the partnership with ChatGPT. But now, with GPT-4o, this all happens natively,” said Mira Murati, OpenAI’s Chief Technology Officer, during a live demonstration.
In its livestream, OpenAI highlighted that GPT-4o can be interrupted, respond in real time, and pick up on emotions, and demonstrated how 4o’s audio output could “generate voice.”
Visual capabilities
The model is said to have improved visual capabilities, allowing users to interact with it via video. During the live demonstration, OpenAI showed the model guiding a user through solving an equation. It further claimed that 4o can recognize objects, deliver information about them, and converse about what it sees, as shown in this video of GPT-4o detecting objects and providing real-time Spanish translation. OpenAI also demonstrated that 4o can analyze data in the desktop app.
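The same Chat Completions API accepts image inputs for GPT-4o, so a developer could ask the model about a picture. A hedged sketch is below; the image URL and the question are placeholders, not part of OpenAI’s announcement.

```python
# Sketch: sending an image plus a text question to GPT-4o.
# Assumes the `openai` Python SDK (v1+); the URL and prompt are placeholders.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What objects are in this photo?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```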
How safe is GPT-4o?
According to Murati, “GPT-4o presents new challenges for us when it comes to safety, because we’re dealing with real-time audio, real-time vision.” In OpenAI’s Preparedness Framework evaluation, GPT-4o does not score above Medium risk in cybersecurity, chemical, biological, radiological, and nuclear (CBRN) information, persuasion, or model autonomy. OpenAI acknowledged that GPT-4o’s audio capabilities pose particular risks; as a result, audio outputs will be limited to a set of preset voices at launch.
OpenAI has added several new capabilities in the last month, including a ‘memory’ function for ChatGPT Plus users that allows the AI model to recall information provided by users across conversations. The feature can be switched on or off in the personalization options, and recorded memories can be ‘forgotten’ simply by deleting them.