The Microsoft VASA-1 elevates AI

A group of AI researchers at Microsoft Research Asia has created an AI application that can animate still photos of people using audio files.

The people in the photos are shown speaking or singing in sync with the audio, complete with appropriate facial expressions.

The Microsoft tool, called VASA-1, takes a single static image and a speech audio clip and produces lifelike talking faces for virtual characters, with appealing visual affective skills (VAS).

In a paper outlining the framework, researchers stated, “Our premiere model, VASA-1, is capable of producing lip movements that are exquisitely synchronized with the audio as well as capturing a large spectrum of facial nuances and natural head motions that contribute to the perception of authenticity and liveliness.”

How does Microsoft’s VASA-1 work?

The team says the core design pairs a model that generates holistic facial dynamics and head movements in a face latent space with an expressive, disentangled face latent space learned from videos.
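
To make that architecture concrete, here is a minimal, illustrative sketch of an audio-driven talking-face pipeline built around a disentangled face latent space: an encoder that separates appearance from dynamics, an audio-conditioned motion generator operating in the latent space, and a decoder that renders frames. All module names, layers, and dimensions are assumptions for exposition, not Microsoft's actual VASA-1 implementation.

```python
# Illustrative sketch only: module names, dimensions, and architecture details
# are assumptions for exposition, not Microsoft's actual VASA-1 implementation.
import torch
import torch.nn as nn

class FaceEncoder(nn.Module):
    """Maps a single face image to a disentangled latent: a static
    appearance/identity code plus an initial dynamics (pose/expression) code."""
    def __init__(self, latent_dim=256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.to_appearance = nn.Linear(128, latent_dim)  # identity/appearance
        self.to_dynamics = nn.Linear(128, latent_dim)    # pose/expression

    def forward(self, image):
        h = self.backbone(image)
        return self.to_appearance(h), self.to_dynamics(h)

class MotionGenerator(nn.Module):
    """Audio-conditioned sequence model that produces a trajectory of
    facial-dynamics and head-motion latents, one latent per video frame."""
    def __init__(self, audio_dim=80, latent_dim=256):
        super().__init__()
        self.rnn = nn.GRU(audio_dim + latent_dim, latent_dim, batch_first=True)

    def forward(self, audio_features, init_dynamics):
        # audio_features: (B, T, audio_dim); init_dynamics: (B, latent_dim)
        T = audio_features.size(1)
        cond = init_dynamics.unsqueeze(1).expand(-1, T, -1)
        motion_latents, _ = self.rnn(torch.cat([audio_features, cond], dim=-1))
        return motion_latents  # (B, T, latent_dim)

class FrameDecoder(nn.Module):
    """Renders each frame from the fixed appearance code plus the
    per-frame motion latent."""
    def __init__(self, latent_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * latent_dim, 512 * 4 * 4), nn.ReLU(),
            nn.Unflatten(1, (512, 4, 4)),
            nn.ConvTranspose2d(512, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, appearance, motion_latents):
        B, T, _ = motion_latents.shape
        app = appearance.unsqueeze(1).expand(-1, T, -1)
        z = torch.cat([app, motion_latents], dim=-1).flatten(0, 1)
        frames = self.net(z)               # (B*T, 3, 8, 8) in this toy setup
        return frames.unflatten(0, (B, T))

# Toy end-to-end pass: one 64x64 portrait plus 25 frames of audio features.
encoder, motion, decoder = FaceEncoder(), MotionGenerator(), FrameDecoder()
image = torch.rand(1, 3, 64, 64)
audio = torch.rand(1, 25, 80)              # e.g. mel-spectrogram features
appearance, init_dyn = encoder(image)
motion_latents = motion(audio, init_dyn)
video = decoder(appearance, motion_latents)
print(video.shape)                         # torch.Size([1, 25, 3, 8, 8])
```

The point of the disentangled split is that the appearance code stays fixed for the whole clip while only the compact motion latents change from frame to frame, which is what makes generating in a latent space cheap enough to target real-time playback.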

“Our approach not only produces videos of excellent quality with realistic head and face movements, but it can also generate 512×512 videos online at up to 40 frames per second with very little startup latency. It opens the door for real-time interactions with lifelike avatars that mimic human conversational behavior,” the researchers stated.
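
For context on the real-time claim, the sketch below simply works out the per-frame time budget that 40 frames per second implies (25 ms) and counts missed deadlines in a toy streaming loop; generate_frame is a hypothetical placeholder, not part of VASA-1 or any Microsoft API.

```python
# Purely illustrative frame-budget check; generate_frame() is a hypothetical
# stand-in for a real-time talking-face generator, not VASA-1's actual API.
import time

TARGET_FPS = 40
FRAME_BUDGET = 1.0 / TARGET_FPS          # 25 ms per 512x512 frame

def generate_frame():
    """Placeholder for per-frame generation work."""
    time.sleep(0.01)                     # pretend generation takes 10 ms

def stream(num_frames=100):
    late = 0
    for _ in range(num_frames):
        start = time.perf_counter()
        generate_frame()
        elapsed = time.perf_counter() - start
        if elapsed > FRAME_BUDGET:       # frame missed the real-time deadline
            late += 1
    print(f"{late}/{num_frames} frames exceeded the "
          f"{FRAME_BUDGET * 1000:.0f} ms budget")

stream()
```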