Alibaba Has Developed an AI Video Generation Model, “EMO”

Following the groundbreaking release of Sora by OpenAI in the United States, Alibaba Group Holding Limited, a leading Chinese internet technology company, has intensified its efforts to keep pace in the realm of AI innovation.

Alibaba’s Intelligent Computing Research Institute has unveiled EMO, a new AI model described as “an expressive audio-driven portrait video generation framework.” According to the team, a user supplies a single photo and an audio file, and EMO produces a video of the person in the photo speaking or singing, with natural head and facial movement, up to roughly 1 minute and 30 seconds long. The generated expressions are reported to track the audio closely across different voices, speaking speeds, and source images.

For instance, EMO’s demonstrations include videos in which characters from TV dramas deliver dialogue or sing songs, with lip movements that match the audio closely. In a nod to OpenAI’s Sora, EMO has also been shown making AI-generated characters speak and sing, which illustrates its versatility.

Alibaba’s research team highlights EMO’s ability to produce vocal avatar videos with rich facial expressions and diverse head poses, with the duration of the video determined by the input. The team also lists dynamic expression rendering, support for multiple languages and portrait styles, and fast synchronization with the audio among its features.

Technically, the EMO framework uses an audio-to-video diffusion model to generate expressive portrait videos in a multi-stage process. The process covers frame encoding followed by diffusion and denoising, and it relies on two attention mechanisms: reference attention, which preserves the character’s identity from the input photo, and audio attention, which adjusts the character’s movements to follow the sound. In addition, EMO’s temporal module operates along the time dimension and adjusts movement speed, which improves the realism and dynamism of the generated videos.
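To make the two attention mechanisms more concrete, the following is a minimal sketch of scaled dot-product cross-attention in plain Python. It is not EMO's implementation: the function names, the toy two-dimensional vectors, and the omission of learned projection matrices are all assumptions made for illustration. It only shows the general idea of per-frame features drawing on audio features (to drive motion) and on features from the reference photo (to keep identity).

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all keys/values.

    Hypothetical helper for illustration; real models apply learned
    projections to queries, keys, and values first.
    """
    dim = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
                  for k in keys]
        weights = softmax(scores)
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Toy inputs (assumed values, not real model features).
frame_latents = [[1.0, 0.0], [0.0, 1.0]]                  # one row per video frame
audio_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]     # one row per audio step
reference_features = [[0.5, 0.5]]                         # features of the input photo

# "Audio attention": frames pull in information from the audio track.
audio_context, audio_weights = cross_attention(
    frame_latents, audio_features, audio_features)

# "Reference attention": frames pull in identity information from the photo.
identity_context, ref_weights = cross_attention(
    frame_latents, reference_features, reference_features)
```

In this toy setup every frame receives the same identity vector, because there is only one reference feature, while the audio context differs from frame to frame. That mirrors the division of labor described above: identity stays fixed and motion follows the sound.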

Alibaba’s commitment to AI innovation extends beyond EMO. The company has introduced other AI products and technologies, including Qwen-VL and Outfit Anyone, and it continues to invest in AI startups, most recently by taking part in the financing of MoonShot AI, a domestic large-model team, with a reported investment of $1 billion.

Within China’s AI landscape, Alibaba’s efforts fit a wider trend of backing early-stage AI startups through strategic investments, which in turn supports the development of large-scale AI models in the country. Although both the number of AI investment deals and the total amount of financing fell in 2023 compared with the previous year, Alibaba remains one of the most active backers of AI development in China.