14.05.2024

OpenAI Introduces GPT-4 Omni

Sergej Ostrovskij
Editor in Chief at ApiX-Drive
Reading time: ~2 min

OpenAI has unveiled its new flagship generative AI model, GPT-4o, with the “o” standing for “omni,” indicating its capability to handle text, speech, and video. The company plans to release GPT-4o gradually across its developer and consumer-facing products in the coming weeks. According to OpenAI CTO Mira Murati, GPT-4o offers “GPT-4-level” intelligence while enhancing GPT-4’s capabilities across various modalities and media.

GPT-4o significantly enhances the functionality of OpenAI’s AI-powered chatbot, ChatGPT. While the platform previously offered a voice mode that read out the chatbot’s responses using a text-to-speech model, GPT-4o upgrades this feature, letting users interact with ChatGPT more like an assistant. Users can ask questions and interrupt the GPT-4o-powered ChatGPT mid-response. The model delivers “real-time” responsiveness and can pick up on nuances in a user’s voice, generating responses in a range of emotive styles.

Additionally, GPT-4o improves ChatGPT’s visual capabilities. When given a photo or a screenshot of a desktop, ChatGPT can now answer related questions more effectively. In the future, this model could enable ChatGPT to “watch” videos and provide commentary and explanations. GPT-4o is also more multilingual, with enhanced performance in around 50 languages. Within OpenAI’s API and Microsoft’s Azure OpenAI Service, GPT-4o is reported to be twice as fast and half the price of GPT-4 Turbo, with higher rate limits.
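For developers, a request to GPT-4o through OpenAI’s API follows the same chat-completions shape as GPT-4 Turbo, with only the model name changed. The sketch below builds such a request payload; the model name “gpt-4o” is from the announcement, while the helper function and prompt are illustrative assumptions, and the commented-out SDK call assumes the official `openai` Python package and an `OPENAI_API_KEY` in the environment:

```python
# Sketch of a GPT-4o request body for OpenAI's chat-completions API.
# Only the payload is built here, so the example runs without network
# access or an API key; the helper name is a hypothetical convenience.

def build_chat_request(prompt: str, model: str = "gpt-4o") -> dict:
    """Build a chat-completions request body for the given prompt."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
    }

# With the official SDK (`pip install openai`) and OPENAI_API_KEY set,
# the payload would be sent roughly like this:
#
#   from openai import OpenAI
#   client = OpenAI()
#   resp = client.chat.completions.create(**build_chat_request("Hello!"))
#   print(resp.choices[0].message.content)

if __name__ == "__main__":
    print(build_chat_request("Hello!")["model"])
```

Because the endpoint and message format are unchanged, existing GPT-4 Turbo integrations could in principle switch to GPT-4o by updating this one field.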

GPT-4o is available on ChatGPT’s free tier and to subscribers of OpenAI’s premium ChatGPT Plus and Team plans, who get “5x higher” message limits. OpenAI also notes that once users reach their limit, ChatGPT will automatically switch to GPT-3.5.