It converts portrait photos into avatars capable of speaking, singing, and performing
Alibaba has unveiled Wan2.2-S2V (Speech-to-Video), which converts portrait photos into avatars capable of speaking, singing, and performing.
Wan2.2-S2V is part of Alibaba’s Wan2.2 video generation series and can generate high-quality animated videos from a single image and an audio clip.
It offers character animation from portrait, bust, and full-body perspectives, and can dynamically generate character actions and environmental elements based on prompt instructions.
Alibaba says Wan2.2-S2V is powered by audio-driven animation technology that delivers lifelike character performances, ranging from natural dialogue to singing. It can also handle multiple characters within a scene.
The avatars include cartoon characters, animals, and stylised figures.
Wan2.2-S2V combines text-guided global motion control with audio-driven fine-grained local movements to enable natural and expressive character performances across complex and challenging scenarios, says Alibaba.
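To make that workflow concrete, the sketch below shows the three inputs the model works from: a reference image, a driving audio clip, and a text prompt. The `S2VRequest` structure and `generate_avatar_video` function are hypothetical placeholders for illustration only, not part of any published Wan2.2-S2V interface.

```python
# Hypothetical sketch of a speech-to-video request; the names below are
# illustrative placeholders, not a published Wan2.2-S2V API.
from dataclasses import dataclass

@dataclass
class S2VRequest:
    image_path: str  # single reference photo (portrait, bust, or full body)
    audio_path: str  # speech or singing clip driving lip sync and fine local motion
    prompt: str      # text instructions steering global motion and the environment

def generate_avatar_video(req: S2VRequest, output_path: str = "avatar.mp4") -> str:
    """Placeholder: wire this to the released Wan2.2-S2V inference code."""
    raise NotImplementedError

request = S2VRequest(
    image_path="singer.jpg",
    audio_path="vocals.wav",
    prompt="The character performs on a dimly lit stage, swaying with the music",
)
```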
Alibaba’s Wan2.2 is a family of open-source large video generation models built on a Mixture-of-Experts (MoE) architecture, which the company says significantly improves single-click production of cinematic-style videos.
The series includes a text-to-video model, an image-to-video model, and a hybrid model that supports both text-to-video and image-to-video generation within a single framework.
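For readers who want to experiment, the sketch below shows how a Wan text-to-video checkpoint can be driven from Python via the Hugging Face Diffusers `WanPipeline`. The model ID `Wan-AI/Wan2.2-T2V-A14B-Diffusers`, the resolution, and the sampling settings are assumptions and may need adjusting to match the released checkpoints and their model cards.

```python
# Minimal text-to-video sketch using Hugging Face Diffusers' WanPipeline.
# The checkpoint ID and generation settings below are assumptions; check the
# model card of the released Wan2.2 weights for the recommended values.
import torch
from diffusers import WanPipeline
from diffusers.utils import export_to_video

pipe = WanPipeline.from_pretrained(
    "Wan-AI/Wan2.2-T2V-A14B-Diffusers",  # assumed Diffusers-format repo ID
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

frames = pipe(
    prompt="A cinematic shot of a street musician singing at dusk",
    negative_prompt="blurry, distorted, low quality",
    height=480,
    width=832,
    num_frames=81,  # roughly five seconds at 16 fps
    guidance_scale=5.0,
).frames[0]

export_to_video(frames, "wan_t2v_demo.mp4", fps=16)
```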