OpenAI's New Text-to-Video AI Demos: An Explainer

On Thursday, February 15th, the people who gave us DALL-E and ChatGPT unveiled their new text-to-video model, Sora, which can generate videos from user text prompts. Sora can create up to a minute of photorealistic footage with high visual quality and no lag. To be honest, regardless of what you think about AI, the results are pretty impressive.

If you’ve spent any time online in the last day, you’ve no doubt seen some of the footage on X, which includes a scene of a woman walking through a Tokyo street at night, “historical footage” of California during the gold rush, and a Dalmatian walking on window ledges in Burano, Italy, among others.

Of course, this has inspired all the same conversations every new AI innovation brings. There is a crowd of crypto bros and AI true believers who absolutely love it and think it’s the greatest thing on Earth, and then there are a bunch of people who hate the new technology with a passion, fearing it’ll ruin jobs in the creative industry and be used as a tool for spreading disinformation.

OpenAI has released a lengthy blog post about the new tech. We’ll highlight some of the most important points here and discuss a few of the concerns people have raised and the claims they have made.

Is Sora available to the public?

Prompt: “A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. she wears a black leather jacket, a long red dress, and black boots, and carries a black purse. she wears sunglasses and red lipstick. she walks confidently and casually.… pic.twitter.com/cjIdgYFaWq

— OpenAI (@OpenAI) February 15, 2024

As of now, no. OpenAI is not making Sora broadly available to the public just yet. Instead, the company says it is “granting access to a number of visual artists, designers, and filmmakers to gain feedback on how to advance the model to be most helpful for creative professionals.”

The company is sharing its research progress with all of us to give people a sense of what the tech might be capable of.

Where does Sora fall short?

What if I tell u this video is not real, these #wolf #Pups are not real.
This is generated with Open AI's #Sora .
Prompt: Five gray wolf pups frolicking and chasing each other around a remote gravel road, surrounded by grass. The pups run and leap, chasing each other, and nipping… pic.twitter.com/YBgFXCIVLZ

— MedSciInsights 💙 💙 (@Health__Vibes) February 16, 2024

OpenAI acknowledges some of the program’s current weaknesses. Specifically, OpenAI says that Sora “may struggle with accurately simulating the physics of a complex scene, and may not understand specific instances of cause and effect” and can confuse spatial details, such as “mixing up left and right, and may struggle with precise descriptions of events that take place over time, like following a specific camera trajectory.”

That may be why Sora is limited to videos of a minute or less. One thing many people have noticed about AI programs, ChatGPT in particular, is that the longer the output runs, the weirder and more psychedelic the results tend to get.

Is Sora safe?

A big concern over AI video generation is the proliferation of deepfakes and misleading content that will follow in its wake. OpenAI seems aware of this: in its blog post, the company says it has a special team tasked with adversarially testing the model. These “red teamers,” as OpenAI calls them, will be testing Sora and pushing the program in areas like “misinformation, hateful content, and bias.”

The program will also use a “text classifier,” which will check and reject prompts that violate OpenAI’s usage policies, keeping the program from being used to create content that contains “extreme violence, sexual content, hateful imagery, celebrity likeness, or the IP of others.”

OpenAI also claims that it will be engaging with “policymakers, educators, and artists around the world to understand their concerns and to identify positive use cases for this new technology.”

How good does the technology look?

All of these UNBELIEVABLE videos were created using Sora, the new AI model from OpenAI

Watch each one and see how it makes you feel…

I don't think it's crazy for me to say this going to shift Hollywood, social apps and media forever

Video #1
Prompt: The camera directly faces… pic.twitter.com/NJfphtGbWb

— GREG ISENBERG (@gregisenberg) February 15, 2024

That’s going to be entirely subjective. Look, is Sora capable of making photorealistic scenes? Clearly. But the longer you look at these videos, the stranger they appear. With enough scrutiny, it’s still pretty easy to tell if what you’re looking at is AI-generated. What I’ve seen from OpenAI looks like what I imagine the graphics on the next generation of video game consoles will look like.

And yet… at first glance, is it easy to tell real from fake? Maybe not. That’s where the concern lies.

Is this the end for photographers, filmmakers, and the various visual creative industries?

The reason I'm not scared (yet) of the Sora vids as an animator is that animation is an iterative process, especially when working for a client

Here's a bunch of notes to improve one of the anims, which a human could address, but AI would just start over

What client wants that? pic.twitter.com/VGAjGguZIQ

— Owen Fern (@owenferny) February 16, 2024

We highly doubt that. As much as the people who are all-in on AI technology believe that this will empower everyone to be a filmmaker, the technology isn’t there yet. As impressive as Sora is, the difference between the sort of videos it is capable of churning out and what you see on the big screen is significant. For now.

Sora looks realistic, but it’s not. And that’s glaringly obvious. There is a certain unnaturalness to the lighting and a tough-to-pin-down soullessness. Take for instance this prompt, posted on OpenAI’s X account:

“A movie trailer featuring the adventures of the 30 year old space man wearing a red wool knitted motorcycle helmet, blue sky, salt desert, cinematic style, shot on 35mm film, vivid colors.”

Prompt: “A movie trailer featuring the adventures of the 30 year old space man wearing a red wool knitted motorcycle helmet, blue sky, salt desert, cinematic style, shot on 35mm film, vivid colors.” pic.twitter.com/0JzpwPUGPB

— OpenAI (@OpenAI) February 15, 2024

It would appear that Sora was able to follow the prompt well… but not really. The 17-second video doesn’t show any type of “adventure.” It shows the most generic dude I’ve ever seen wandering around a white salt flat, a dorky-looking spaceship, and several different camera angles. This is hardly a movie trailer; it’s a series of random images.

Also, as someone who shoots 35mm regularly, I can say this doesn’t look anything like film. I’m sorry to the people at OpenAI, but it just doesn’t (caveat everything with “not yet”). I’ll give them the vivid colors, though!

So I don’t think Sora is going to take away the need for actual filmmakers and visual artists just yet. That doesn’t mean movie studios won’t think that it can, though — they are notorious for not having compassion for the people behind their films.

So no, we don’t think Sora or programs like it will turn everyone into filmmakers (yet), but it certainly might change the pitching process.

How is Sora trained?

The video on the left was one of ~20 shared by OpenAI in its announcement of Sora, its text-to-video generator. It claims the video was, as this viral tweet notes, "generated by Sora."

The video on the left is from Shutterstock, with whom OpenAI has a partnership. https://t.co/QobhX01FJf pic.twitter.com/wp968R2WLy

— Brian Merchant (@bcmerchant) February 16, 2024

We don’t know, and that’s the problem. Past AI programs like ChatGPT and DALL-E were trained on preexisting art and media, which is controversial because the creators of that work didn’t necessarily give consent. This has led many critics to say that AI isn’t capable of generating anything but remixed content that erases the hard work of actual creatives.

All OpenAI reveals about how Sora was trained is that it learned from “a wide range of video data without adapting or preprocessing the videos.”

But what video data? Where did it come from? Who authorized it? The fact that OpenAI isn’t fully transparent about this is a major problem.

Will this change everything?

🚨 BREAKING: OpenAI just announced their new Text-To-Video model called Sora.

This video was made with the not-yet-released #Sora AI technology just announced from @OpenAi

This changes everything. It's 27 seconds from a text prompt. pic.twitter.com/UBr15x6rEs

— Captain YAR (@SobkoYaroslav) February 15, 2024

While we’re highly skeptical that AI will replace the need for creatives, saying that this sort of technology won’t have a massive effect on the world as we know it is kind of naive.

OpenAI might be taking safety into account in the creation of Sora, but we’re not sure how these tools will be used in the future, or what kinds of workarounds people with ill intentions will find. As of now, there are no significant U.S. laws that specifically address the use of AI, and there are a bunch of ethical questions that need to be considered.

Clearly, we need some sort of AI legislation sooner rather than later, and it seems like some states are starting to take that on. Axios reports that New York Governor Kathy Hochul has just proposed legislation that would criminalize deceptive and abusive uses of AI. Vox reports that last week California state Senator Scott Wiener introduced a piece of AI legislation that attempts to establish “clear, predictable, common-sense safety standards for developers of the largest and most powerful AI systems.”

Whether these bills will go too far, or not far enough, remains to be seen. AI technology is moving fast, and while its capabilities are somewhat overblown by people on both sides of the AI debate, it is something we need to concern ourselves with now. The more we know, the better off we’ll be once the technology is fully indistinguishable from human work.