An Explainer On The Text-To-Video AI Demos That Are Blowing People’s Minds

On Thursday, February 15th, the people who gave us DALL-E and ChatGPT unveiled their new text-to-video model, Sora, which is capable of generating videos based on user text prompts. Sora can create up to a minute of photo-realistic content with high visual quality and no lag. To be honest, regardless of what you think about AI, the results are pretty impressive.

If you’ve spent any time online in the last day, you’ve no doubt seen some of the footage on X, which includes a scene of a woman walking through a Tokyo street at night, “historical footage” of California during the gold rush, and a dalmatian walking on window ledges in Burano, Italy, among others.

Of course, this has inspired all the same conversations every new AI innovation brings. There is a crowd of crypto bros and AI true believers who absolutely love it and think it’s the greatest thing on Earth, and then there are a bunch of people who hate the new technology with a passion, fearing it’ll ruin jobs in the creative industry and be used as a tool for spreading disinformation.

OpenAI has released a lengthy blog post about the new tech. We’ll highlight some of the most important points here and discuss a few of the concerns and claims people have raised.

Is Sora available to the public?

As of now, no. OpenAI is not making Sora broadly available to the public just yet. Instead, the company says it is “granting access to a number of visual artists, designers, and filmmakers to gain feedback on how to advance the model to be most helpful for creative professionals.”

The company is sharing its research progress with all of us to give people a sense of what the tech might be capable of.

Where does Sora fall short?

OpenAI is aware of some of the program’s weaknesses so far. Specifically, OpenAI says that Sora “may struggle with accurately simulating the physics of a complex scene, and may not understand specific instances of cause and effect” and has issues with spatial specifics such as “mixing up left and right, and may struggle with precise descriptions of events that take place over time, like following a specific camera trajectory.”

That may be why Sora is only capable of making videos up to a minute long. One thing many people have noticed about AI programs, specifically ChatGPT, is that the longer the output runs, the weirder and more psychedelic the results get.

Is Sora safe?

A big concern over AI video generation is the proliferation of deepfakes and misleading content that will come in its wake. OpenAI seems to be aware of this: its blog post says a special team is tasked with adversarially testing the model. These “red teamers,” as OpenAI calls them, will be testing Sora and pushing the program in areas like “misinformation, hateful content, and bias.”

The program will have a “text classifier,” which will check and reject prompts that violate OpenAI’s usage policies, keeping the program from being used to create content that contains “extreme violence, sexual content, hateful imagery, celebrity likeness, or the IP of others.”

OpenAI also claims that it will be engaging with “policymakers, educators, and artists around the world to understand their concerns and to identify positive use cases for this new technology.”

How good does the technology look?

That’s going to be entirely subjective. Look, is Sora capable of making photo-realistic scenes? Clearly. But the longer you look at these videos, the stranger they appear. With enough scrutiny, it’s still pretty easy to tell if what you’re looking at is AI-generated. What I’ve seen from OpenAI looks like what I imagine games on the next generation of video game consoles will look like.

And yet… at first glance is it easy to tell real from fake? Maybe not. That’s where the concern lies.

Is this the end for photographers, filmmakers, and the various visual creative industries?

We highly doubt that. As much as the people who are all-in on AI technology believe that this will empower everyone to be filmmakers, the technology isn’t there yet. As impressive as Sora is, the difference between the sort of videos it is capable of churning out and what you see on the big screen is significant. For now.

Sora looks realistic, but it’s not. And that’s glaringly obvious. There is a certain unnaturalness to the lighting and a tough-to-pin-down soullessness. Take for instance this prompt, posted on OpenAI’s X account:

“A movie trailer featuring the adventures of the 30-year-old spaceman wearing a red wool knitted motorcycle helmet. blue sky, salt desert, cinematic style, shot on 35mm film. vivid colors.”

It would appear that Sora was able to follow the prompt well… but not really. The 17-second video doesn’t show any type of “adventure.” It shows the most generic dude I’ve ever seen wandering around a white sand environment, a dorky-looking spaceship, and several different camera angles. This is hardly a movie trailer; it’s a series of random images.

Also, as someone who shoots 35mm regularly, this doesn’t look anything like film. I’m sorry to the people at OpenAI, but it just doesn’t (caveat everything with “not yet”). I’ll give them the vivid colors, though!

So I don’t think Sora is going to take away the need for actual filmmakers and visual artists just yet. That doesn’t mean movie studios won’t think that it can, though — they are notorious for not having compassion for the people behind their films.

So no, we don’t think Sora or programs like it will turn everyone into filmmakers (yet), but it certainly might change the pitching process.

How is Sora trained?

We don’t know, and that’s the problem. Past AI programs like ChatGPT and DALL-E have been trained on preexisting art and media, which is a big problem because the creators of that art didn’t necessarily give consent. This has led many critics to say that AI isn’t capable of generating anything but remixed content that erases the hard work of actual creatives.

All OpenAI reveals about how Sora was trained is that it learned from “a wide range of video data without adapting or preprocessing the videos.”

But what video data? Where did it come from? Who authorized it? The fact that OpenAI isn’t fully transparent about this is a major problem.

Will this change everything?

While we’re highly skeptical that AI will replace the need for creatives, it would be naive to say that this sort of technology isn’t going to have a massive effect on the world as we know it.

OpenAI might be taking safety into account in the creation of Sora, but we’re not sure how these tools will be used in the future, or what kind of workarounds people with ill intentions will find. As of now, there are no significant laws that address the use of AI, and there are a bunch of ethical questions that need to be considered.

Clearly, we need some sort of AI legislation sooner rather than later, and it seems like some states are starting to take that on. Axios reports that New York Governor Kathy Hochul has just proposed legislation that would criminalize deceptive and abusive uses of AI. Vox reports that last week California state Senator Scott Wiener introduced a piece of AI legislation that attempts to establish “clear, predictable, common-sense safety standards for developers of the largest and most powerful AI systems.”

Whether these bills will go too far, or far enough, remains to be seen. AI technology is moving fast, and while what it is capable of is a little overblown by people on both sides of the AI debate, it is something we need to concern ourselves with now. The more we know, the better off we’ll be once the technology is fully indistinguishable from human efforts.