- OpenAI introduces Sora, a groundbreaking generative AI model that creates realistic high-definition videos from simple descriptive sentences.
- Sora extends OpenAI’s reach into video generation, following the success of its chatbot and image generation technologies, offering users creative opportunities but also raising concerns about the spread of misinformation, particularly with the surge in AI-generated deepfakes.
- Sora competes with video-generation AI tools from tech giants like Meta and Google, as well as from startups. Its clips are currently limited to one minute or less, and OpenAI is working to expand its multimodal approach, which combines text, image, and video generation.
OpenAI has recently unveiled groundbreaking software that enables the creation of lifelike videos through simple descriptive sentences.
Following its surge in popularity last year with ChatGPT, OpenAI is now venturing into the realm of video with its latest artificial intelligence technology.
Introduced on Thursday, Sora is the company’s newest generative AI model and works much like OpenAI’s image-generation tool, DALL-E. Users type a description of a desired scene, and Sora returns a high-definition video clip. Sora can also generate video clips from still images, extend existing videos, and fill in missing frames.
As chatbots and image generators have already permeated both consumer and business domains, video emerges as the next frontier for generative AI. While AI enthusiasts eagerly anticipate the creative possibilities, the proliferation of these new technologies raises serious concerns about misinformation, especially with major political elections looming globally. According to data from Clarity, a machine learning firm, the creation of AI-generated deepfakes has surged by 900% year over year.
With Sora, OpenAI aims to rival video-generation AI tools offered by tech giants like Meta and Google, the latter of which announced Lumiere in January. Other competitors include startups such as Stability AI, with its Stable Video Diffusion product, and Amazon, with Create with Alexa.
Currently, Sora can generate videos of one minute or less. Microsoft-backed OpenAI sees multimodality (the integration of text, image, and video generation) as a key goal in broadening its suite of AI models.
Brad Lightcap, OpenAI’s COO, emphasized the significance of multimodality, stating, “The world is multimodal…the world is much bigger than text.” Sora has been tested by a select group of safety testers, or “red teamers,” who probe the model for vulnerabilities in areas such as misinformation and bias. OpenAI has posted only a few sample clips on its website and plans to release a technical paper later on Thursday.
To address concerns about the authenticity of Sora-generated content, OpenAI is developing a “detection classifier” and intends to embed specific metadata in Sora’s output. The metadata is meant to help identify AI-generated content, similar to Meta’s approach to labeling AI-generated images during this election year.
Sora, like ChatGPT, is built on the Transformer architecture introduced by Google researchers in a 2017 paper. OpenAI describes Sora as a diffusion model that it says will serve as a foundation for understanding and simulating the real world.