Innovation is moving at a scarily fast pace.
-Bill Gates
Six months ago I wrote about AI creating full-length movies within five years. Maybe you didn't believe me? Let me reiterate:
"Imagine this. You and your loved one sit down to watch a movie. Except the movie you want to watch doesn't yet exist. One of you speaks into your device that you want to watch a combination rom-com and heist movie with four plot twists, has a giant elephant, a washed up chef, a happy ending, and stars George Clooney and Julia Roberts. One of you goes to make the popcorn and five minutes later your new personalized movie is ready to watch. As you watch the movie, you laugh, you cry, your adrenaline pumps, and your content desires are fulfilled. You end the night happy and satisfied.
Does that sound farfetched? It's a future that colleagues of mine who are working on aspects of this technology think will happen within five years. This is a future of intense personalization and customization. Are people ready to continually articulate their desires?"
Well, we just got a lot closer to that vision of the future. If you haven't seen it yet, go check out Sora from OpenAI. It's their latest text-to-video model, and the results are a step up from the current state of the art. The videos, while not perfectly mimicking real video, are scary good and much longer than before. They look like this:
Exponential
The future is fast approaching. We know AI growth and advances are exponential, so let's do some simple math to estimate where we'll be in a year. In six months we went from AI generating lousy 6-second clips to 60-second high-quality clips. Using that data point, every six months we get a 10x increase in length along with some associated change in quality. At this rate, a year from now we'll likely be at 6,000-second clips, or 1 hour 40 minutes (a full-length movie), with near realism. That's a rough estimate, but I wouldn't be surprised if we're near that in a year.
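If you want to see the arithmetic behind that projection, here's a quick back-of-the-envelope sketch. The 60-second starting point and the 10x-every-six-months growth rate are just the assumptions from the paragraph above, not measured benchmarks.

```python
# Rough extrapolation of maximum clip length, assuming a 10x jump every six months.
# The starting length and growth rate come from the estimate above, not from any benchmark.

clip_seconds = 60          # roughly where Sora is today
growth_per_half_year = 10  # 6-second clips -> 60-second clips over the last six months

for step in range(1, 3):   # project one year out, in two 6-month steps
    clip_seconds *= growth_per_half_year
    minutes = clip_seconds // 60
    print(f"+{step * 6} months: ~{clip_seconds} seconds (~{minutes} minutes)")

# +6 months:  ~600 seconds  (~10 minutes)
# +12 months: ~6000 seconds (~100 minutes, i.e. about 1 hr 40 min -- feature length)
```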
The speed we're seeing, particularly around the creation of media, reminds me of Max Tegmark's Life 3.0. Tegmark starts the book with a prelude about a fictional "Omega Team" that creates AGI under a loss-leading group within a large tech company. The technology focuses on self-improvement until it can be applied to different areas. The Omega story is fantastic and I highly recommend reading it. Below is a quote about when they start using it to make films, on a timeline I think we'll see in about a year.
They took this to mean that when Prometheus made its own films, it would know what success meant. The Omegas instructed Prometheus to focus on making animation at first, to avoid embarrassing questions about who the simulated actors were.
On Sunday night, they capped their wild weekend by arming themselves with beer and microwave popcorn, dimming the lights and watching Prometheus’ debut movie. It was an animated fantasy-comedy in the spirit of Disney’s Frozen, and the ray tracing had been performed by boxed Prometheus-built code in the Amazon cloud, using up most of the day’s $1 million MTurk profit.
As the movie began, they found it both fascinating and frightening that it had been created by a machine without human guidance. Before long, however, they were laughing at the gags and holding their breath during the dramatic moments. Some of them even teared up a bit at the emotional ending, so engrossed in this fictional reality that they forgot all about its creator.
Content Creation
Last year Hollywood writers went on strike. Part of the strike's resolution included protections against AI-created content. You can read more about the individual measures here and here. Unfortunately, these measures don't stop people who aren't part of those organizations or unions from creating content.
Think that can't happen? Where could an independent group that isn't part of a typical union create blockbuster content? Look no further than the digitally created kids' show Cocomelon. Cocomelon has no Hollywood writers and no recognizable stars. All of its content is created through digital animation. Cocomelon gained notoriety, fame, and money creating YouTube content and only later signed deals with streaming platforms. How popular is Cocomelon? Wikipedia states:
Cocomelon was ranked #1 on Reelgood's list of Netflix shows for 2020, ahead of The Office and The Queen's Gambit.
At the same time, the Wall Street Journal estimated the ad revenue for Cocomelon in 2019 to be $120M. CashNetUSA estimated the 2022 revenue for Cocomelon to be $282M. Conservatively, the show has earned over $700M from 2019 to today. For reference, the 200th-highest-grossing film, "The Smurfs," came out in 2011 and earned $563M. That puts Cocomelon in the ranks of the highest-grossing films of all time, and it's just a kids' show.
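Here's a rough sketch of how you might arrive at that conservative figure, assuming revenue grew roughly linearly between the two public estimates and then held flat. The interpolation is purely my illustration, not reported data.

```python
# Back-of-the-envelope estimate of Cocomelon's cumulative revenue, 2019-2023.
# Assumes revenue grows linearly between the two public estimates
# ($120M in 2019, $282M in 2022) and stays flat afterward -- purely illustrative.

known = {2019: 120, 2022: 282}  # estimated annual revenue in $M

def estimate(year: int) -> float:
    """Linearly interpolate between the 2019 and 2022 estimates, clamping at the ends."""
    y0, y1 = 2019, 2022
    if year <= y0:
        return known[y0]
    if year >= y1:
        return known[y1]
    slope = (known[y1] - known[y0]) / (y1 - y0)
    return known[y0] + slope * (year - y0)

total = sum(estimate(y) for y in range(2019, 2024))
print(f"Estimated cumulative revenue 2019-2023: ~${total:.0f}M")
# Roughly $1,086M under these assumptions, so "over $700M" is a conservative floor.
```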
Think about that. A digitally created show without Hollywood writers or stars was the best show on Netflix and has made as much as some of the highest-grossing films of all time. What's stopping more content like that from appearing? The limiting factor is no longer the creation but the distribution of content. You can make content quickly, but how can you monetize it without a large channel for viewership? I envision the rise of a "New Hollywood": small studios creating AI-generated content. They'll have their own lineups of digital actors, similar to the virtual influencers you see today. These small studios will find new or digital channels that circumvent traditional avenues like movie theaters until the tide can't be held back. Just as the Innovator's Dilemma describes large companies being overtaken by smaller competitors, I think there's a chance for the New Hollywood to overtake Old Hollywood.
Rays of Hope
There is hope for these content creators. There is a wrinkle with these foundation models: their final products lack variation and creativity relative to one another. Someone compared the prompts used for Sora with what the same prompts produce in Midjourney, solely from an image perspective. See the image and video below. The top part of the image is a clip from Sora and the bottom images are from Midjourney. The results are strikingly similar between the two foundation models. This actually makes sense when you think about it. Each of these foundation models primarily uses the same transformer architecture along with most of the same input data (all the data available on the internet), so the same prompts yield very similar results as the models converge toward the same outputs.
The challenge in content creation with AI will be figuring out how to communicate what you actually want with exactness. If you don't care about exactness or don't have a good vision for what you want, you're likely to simply accept the results. But if you want to craft something that rises above the noise, you will still need to put in a lot of effort to create high-quality content. As I wrote previously:
Instructing AI and agents is going to be a continual challenge that will not be easily resolved in the future. This is a communication issue. Two people can talk and still not be on the same page about a topic on which they both agree. Going from human to machine becomes even harder.
We're living in an exciting time while people figure out how best to apply AI. There will be many challenges along the way as people realize that getting what they want is not as simple as just entering some text. No matter the medium, it always takes a lot of effort to get things right. However, not only can AI let you iterate far more than before, it can also give you abilities you might not have had before.