By Anna Freeman
OpenAI recently announced the release of its new flagship model, GPT-4o. The rollout of GPT-4o (in which the “o” stands for omni) was accompanied by a few other big announcements, marking a significant change for OpenAI and its users. We have thoroughly examined the new model and the discourse surrounding it to provide you with some key takeaways about the capabilities and implications of GPT-4o.
OpenAI’s latest release came with three significant changes:
GPT-4o was the star of OpenAI’s May 13th presentation. The multimodal model combines GPT-4-level intelligence with a new ability to reason across audio, vision, and text. Users can now have real-time conversations with the generative model, in which 4o produces responses across all three modalities. Unlike its predecessor, GPT-4, GPT-4o is natively multimodal. This means that audio and vision inputs and outputs are not converted to text for processing; rather, an audio input remains in audio waveform throughout the reasoning process. This enables the new model’s quick response time, which is twice as fast as GPT-4 Turbo’s.
OpenAI also announced that GPT-4o will be free for all users. This is an exciting transition for unpaid users, who were previously restricted to GPT-3.5. However, many ChatGPT Plus subscribers are left wondering what exactly they are paying for. While everyone has access to GPT-4o, paid subscribers get 5x the usage capacity of unpaid subscribers. They also get “first dibs” on the model, meaning unpaid subscribers can only access 4o if and when there is capacity and are defaulted back to GPT-3.5 when there is not. Furthermore, paid subscribers who reach their (relatively high) GPT-4o limit are defaulted to GPT-4, which includes DALL-E image generation, guaranteeing them consistent access to DALL-E.
The release of GPT-4o was accompanied by the release of ChatGPT’s new desktop application. While this development might seem relatively underwhelming next to the other announcements, the seamless device-wide integration offered by the new app cannot be overstated. With the desktop app, users can share visual information from their screen directly with the model, rather than copying and pasting content, and can verbally instruct the model to execute on-device actions. The single, integrated interface offered by the desktop app, combined with GPT-4o’s multimodal capabilities, reduces friction in a GPT-assisted workflow.
GPT-4o can translate speech in real time with its new audio chatbot. In the model’s release demo, the chatbot effectively translated between two OpenAI employees as one spoke Italian and the other English. The utility of this feature is reflected in the nearly 5% drop in stock price experienced by the language-learning app Duolingo in the week following GPT-4o’s release. Furthermore, the audio chatbot demonstrated its ability to perceive and generate a variety of emotions and tones in speech, switching voices as directed while telling a bedtime story. This highlights another useful improvement in 4o’s audio capabilities: users can now interrupt and redirect the model as it is speaking, rather than having to wait until it has finished responding to the prompt.
The vision capabilities of GPT-4o are astounding. The model can now read handwriting and provide real-time feedback on visual input in video or image form. As a result of this improved vision, GPT-4o can provide better analysis of, and answers to questions about, images and videos. For example, if shown a live video or image of a potted plant in a room, GPT-4o could tell you the type of plant, give care instructions, estimate its height, and suggest where it would fit best in the visible area.
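For developers, the same vision capability is exposed through OpenAI’s API. Below is a minimal sketch of how an image could be sent to GPT-4o with the OpenAI Python SDK; the image URL and prompt are hypothetical placeholders for illustration, not examples from OpenAI.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            # A single user message can mix text and image content parts.
            "content": [
                {"type": "text", "text": "What plant is this, and how should I care for it?"},
                # Placeholder URL -- swap in a real, publicly reachable image.
                {"type": "image_url", "image_url": {"url": "https://example.com/potted-plant.jpg"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```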
GPT-4o can also conduct basic data analysis. The model’s vision capabilities enable it to read and analyze charts, produce data visualizations, and conduct analyses with provided data. The model can provide descriptive statistics, assist in data cleaning and formatting, create predictive models, and conduct exploratory data analysis, among other tasks. However, it is important to familiarize yourself with OpenAI’s privacy policy and best practices for safe use before inputting sensitive data into any GPT model.
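The same kind of analysis can be scripted against the API. The sketch below passes a small, made-up dataset as plain text and asks GPT-4o for descriptive statistics; the dataset and prompts are illustrative placeholders, and the privacy caveat above still applies to anything you send.

```python
from openai import OpenAI

client = OpenAI()

# A tiny, fabricated dataset used purely for illustration.
csv_data = """month,orders,revenue
Jan,120,8400
Feb,135,9900
Mar,162,12150"""

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a data analyst. Be concise."},
        {
            "role": "user",
            "content": f"Here is quarterly sales data:\n{csv_data}\n\n"
                       "Provide descriptive statistics and note any trend.",
        },
    ],
)

print(response.choices[0].message.content)
```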
As with previous models, the GPT-4o API is available for purchase through OpenAI. However, lucky for developers, the GPT-4o API is 50% cheaper than GPT-4 Turbo, costing only $5 per million input tokens and $15 per million output tokens. Such accessibility is not only great news for users looking to develop custom GPTs, but also largely aligned with OpenAI’s charter mission of fair and accessible development. OpenAI CEO Sam Altman wrote in his blog about the release of GPT-4o, “Our initial conception when we started OpenAI was that we’d create AI and use it to create all sorts of benefits for the world. Instead, it now looks like we’ll create AI and then other people will use it to create all sorts of amazing things that we all benefit from.”
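To put that pricing in concrete terms, the sketch below makes a simple chat completion call and estimates its cost from the token counts the API returns, using the $5-per-million-input and $15-per-million-output figures cited above; the prompt itself is just an illustrative placeholder.

```python
from openai import OpenAI

client = OpenAI()

# Launch pricing cited above: $5 per 1M input tokens, $15 per 1M output tokens.
INPUT_COST_PER_TOKEN = 5 / 1_000_000
OUTPUT_COST_PER_TOKEN = 15 / 1_000_000

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize the benefits of multimodal models in two sentences."}],
)

# The API reports how many tokens the request consumed and generated.
usage = response.usage
cost = (usage.prompt_tokens * INPUT_COST_PER_TOKEN
        + usage.completion_tokens * OUTPUT_COST_PER_TOKEN)

print(response.choices[0].message.content)
print(f"Approximate cost of this call: ${cost:.6f}")
```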
The introduction of GPT-4o is a milestone in AI development that should not be overlooked. With its advanced multimodal capabilities, accessibility for all users, and seamless integration through the new desktop application, the model is poised to revolutionize how we interact with AI. Users should take advantage of the free access to explore GPT-4o and how it functions for their specific needs. However, due to the popularity of the new model, those planning to use it frequently and consistently will likely benefit from the increased capacity of a paid subscription.