Published March 3, 2025

ChatGPT Can Now See and Speak Video

OpenAI gave ChatGPT the ability to process video on Dec. 12, 2024, announcing the feature from San Francisco. Now available to Plus and Pro users, it lets the AI analyze live video feeds, with the goal of making chats more helpful and enjoyable.

ChatGPT’s newest upgrade builds on its existing voice and text capabilities. You can now point your phone camera at anything, a coffee maker for example, and get real-time tips. OpenAI first teased the option in May 2024 alongside its GPT-4o model; after delays, it has finally arrived, letting users share their screens or ask about whatever is in view. The rollout was quick, reaching most subscribers within days.

Until now, ChatGPT handled only text and static images. It can now identify objects and respond in real time; for instance, it can talk you through fixing a bike or explain math problems that appear on your screen. OpenAI says this makes the AI more like a real helper. Competitors such as Google’s Gemini 2.0 are also pursuing video capabilities, intensifying the AI race.

“We’re excited to bring vision to life,” OpenAI’s Chief Product Officer Kevin Weil said in a livestream. The demo showed ChatGPT helping brew coffee and reading messages out loud. Even so, not everyone has it yet: Enterprise users and those in the EU will have to wait until January. OpenAI is taking its time, too, working on safety measures so this powerful tool can’t run amok.

Conclusion

ChatGPT’s leap into video could change how we use AI in daily life, assisting with work, studying, or hands-on projects like cooking. Wider access and accuracy refinements are the next steps. OpenAI’s move could spark smarter, more visual AI tools in the near term as competitors catch up.
