Microsoft’s new VASA-1 AI model can turn photos into ‘talking faces’

Microsoft has provided a glimpse of VASA-1, its new artificial intelligence (AI) model, which can turn still images into ‘talking faces’ to great effect.

The results can be impressive or terrifying, but the lip-sync is very realistic. At present, the model is available only as a research preview to Microsoft researchers, but the demos released to the public have created a stir.

It’s Microsoft’s latest move in the ongoing battle for generative AI supremacy. Earlier this week, the company announced a huge AI investment in the UAE, while rival Meta released its AI assistant across its platforms.

The premise is that anyone can upload a photo and a voice sample to create an apparently live talking head of their own face. VASA-1 takes a single photo and a brief audio file and converts them into a convincing talking-face video.

What makes it stand out is the quality of the lip-sync, the head movements and the recognizable facial features.

There will be genuine uses for such a program, but safeguards will be required, as ever with AI, due to the potential for misinformation and malicious use. Microsoft has acknowledged this, admitting that, “like other related content generation techniques, (VASA-1) could still potentially be misused for impersonating humans.”

The research report continued, “Given such context, we have no plans to release an online demo, API, product, additional implementation details, or any related offerings until we are certain that the technology will be used responsibly and in accordance with proper regulations.”

This is wild.

Microsoft just unveiled their hyper-realistic talking head AI:

VASA is a framework for generating lifelike talking faces of virtual characters with visual affective skills (VAS).

All from a single static image and audio clip.

Their first model, VASA-1, can:


— Alex Banks (@thealexbanks) April 18, 2024

What will VASA-1 be used for?

The lip-sync quality of this program needs to be seen to be believed, as shown by the demo of the Mona Lisa rapping. Word perfect? Pretty much. Researchers were reportedly pleasantly surprised by just how well it performed.

VASA-1 appears to be a great fit for animation, from gaming to social media avatars and AI filmmaking, but, as stated above, there are no current plans for the project to develop beyond a research demonstration.

That could change, as developers will be very keen to start working with the model.

Image credit: Microsoft


Graeme Hanna