Meta’s AI can generate video from text

Meta is now showcasing its Make-A-Video artificial intelligence, and the videos it generates are as surprising as they are unsettling.

Like other similar models, Make-A-Video asks you to enter a description of what you want to generate. Type “A dog wearing a red superhero cape and flying in the sky”, for example, and you get a video of that scene. Keep in mind that this technology is still in its infancy, so the generated videos can look strange, to say the least.

Make-A-Video is not yet available to the public, although some people have already been able to try it. Even at this early stage the results are impressive, and we can’t wait to see how this artificial intelligence progresses over the years. Like image-generating AI, it could soon replace some popular parts of the internet, such as stock image and video libraries.

“Hey, Make-A-Video, I want you to draw a couple in the rain.”

Meta has managed to develop a powerful tool, but running it takes very powerful computers. AIs that generate images already demand substantial computing resources; an AI that turns text into video needs considerably more.

Why so much power? A video is essentially a series of images played in sequence, often with sound. Imagine the time it takes an AI to generate a single frame, then multiply that by the number of frames in a minute of video (often well over a thousand). On top of that, all of these generated frames have to be assembled into a single file. The workload adds up quickly.
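To put rough numbers on that multiplication, here is a minimal sketch in Python. The frame rate (24 frames per second) and the cost per frame (2 seconds) are assumptions chosen for illustration, not figures published by Meta, and the function names are hypothetical. It also follows the simplified frame-by-frame picture described above, which is not necessarily how Make-A-Video works internally.

```python
def frames_in_clip(duration_s: float, fps: int) -> int:
    """Number of frames needed for a clip of the given length."""
    return round(duration_s * fps)


def naive_generation_time_s(duration_s: float, fps: int, seconds_per_frame: float) -> float:
    """Total time if every frame were generated independently, one after another."""
    return frames_in_clip(duration_s, fps) * seconds_per_frame


if __name__ == "__main__":
    FPS = 24                 # assumed frame rate, not taken from the article
    SECONDS_PER_FRAME = 2.0  # hypothetical cost of generating one image

    frames = frames_in_clip(60, FPS)
    total_s = naive_generation_time_s(60, FPS, SECONDS_PER_FRAME)
    print(f"{frames} frames, about {total_s / 60:.0f} minutes of compute")
```

Under these assumptions, one minute of video means 1,440 frames and roughly 48 minutes of generation time, before any work is done to keep the frames consistent with each other.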

According to Tanmay Gupta, a computer vision researcher at the Allen Institute for Artificial Intelligence, the results obtained by Meta’s Make-A-Video AI are very promising. They demonstrate the model’s ability to capture 3D objects, with new subject and background detail appearing as the camera moves. They also show that the AI can distinguish depth and light sources.

However, Gupta adds, “The research community still has a long way to go, especially if these systems are to be used for professional video editing and content creation.” He also notes that the technology still struggles to generate interactions between objects in a scene.

“Make-A-Video’s research builds on recent advances in text-to-image generation technology, designed to enable text-to-video generation. The system uses images with descriptions to learn what the world looks like and how it is usually described.

It also uses unlabeled videos to learn how the world moves. Using this data, Make-A-Video lets you bring your imagination to life by generating whimsical and unique videos with just a few words or lines of text.”


One of the most striking aspects of this artificial intelligence is its ability to create videos without the need for paired text and video data. Until now, many generators relied on content collections in which text and video were already matched. Make-A-Video does not need that kind of paired data to work, which turns out to be a significant advantage.

This AI can be used in a variety of ways, whether that means adding motion to a single image or filling in the motion between a sequence of images. It can also create variations of an original video. As with DALL-E or Midjourney, the style is limited only by your imagination.