LongCat Video Avatar

LongCat-Video-Avatar: Audio-Driven AI Avatar for Long Video Generation

Added on December 24, 2025

What is LongCat Video Avatar?

LongCat-Video-Avatar is a state-of-the-art audio-driven avatar model designed specifically for long-duration video generation. Built on the LongCat-Video architecture, it delivers highly realistic lip synchronization, natural human dynamics, and long-term identity consistency, even across theoretically unbounded sequence lengths.

LongCat Video Avatar's Core Features

Unified Multi-Mode Generation

Supports Audio-Text-to-Video (AT2V), Audio-Text-Image-to-Video (ATI2V), and audio-conditioned video continuation within one pipeline for flexible inputs and workflows.
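The three modes above can be thought of as one pipeline that routes on which inputs are present. The sketch below is a hypothetical interface (the names `GenerationRequest` and `select_mode` are illustrative, not from the actual API), assuming an image input triggers ATI2V and a prior video triggers continuation:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GenerationRequest:
    """Hypothetical request object for the unified pipeline."""
    audio: bytes
    text: str
    image: Optional[bytes] = None        # reference image -> ATI2V
    prior_video: Optional[bytes] = None  # existing clip -> continuation


def select_mode(req: GenerationRequest) -> str:
    """Route a request to one of the three generation modes."""
    if req.prior_video is not None:
        return "continuation"  # audio-conditioned video continuation
    if req.image is not None:
        return "ATI2V"         # audio + text + image to video
    return "AT2V"              # audio + text to video
```

A single entry point like this is what lets one model serve fresh generation, image-anchored generation, and extension of existing footage without separate pipelines.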

Long-Sequence Temporal Stability

Cross-chunk latent stitching prevents degradation and visual noise accumulation, enabling seamless, artifact-free video across very long, theoretically unbounded sequences.
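One common way to realize cross-chunk stitching is to generate the video in overlapping latent chunks and blend the overlap so no seam is visible. The sketch below is a minimal illustration of that idea with a linear cross-fade (the function name, chunk shapes, and blending scheme are assumptions for illustration, not LongCat-Video-Avatar's actual implementation):

```python
import numpy as np


def stitch_chunks(chunks, overlap):
    """Blend consecutive latent chunks over an overlapping window.

    chunks: list of arrays shaped (frames, dim); each chunk's first
    `overlap` frames are assumed to re-generate the previous chunk's
    last `overlap` frames, giving a region to cross-fade.
    """
    out = [chunks[0]]
    for nxt in chunks[1:]:
        prev = out[-1]
        # Linear cross-fade weights ramping from previous to next chunk.
        w = np.linspace(0.0, 1.0, overlap)[:, None]
        blended = (1 - w) * prev[-overlap:] + w * nxt[:overlap]
        out[-1] = prev[:-overlap]
        out.append(blended)
        out.append(nxt[overlap:])
    return np.concatenate(out, axis=0)
```

Because each chunk is conditioned on its predecessor rather than generated independently, errors do not accumulate the way they would if chunks were simply concatenated.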

Natural Human Dynamics & Expressiveness

Disentangled motion guidance decouples speech from motion, producing natural gestures, idle movements, and expressive behavior even during silent segments.
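Decoupling speech from motion can be pictured as applying two independent guidance directions, in the style of classifier-free guidance, so that body motion is steered separately from lip-sync. The snippet below is only a schematic of that idea, with illustrative weights; it is not the model's published formulation:

```python
import numpy as np


def disentangled_guidance(eps_uncond, eps_audio, eps_motion,
                          s_audio=3.0, s_motion=1.5):
    """Combine separate audio and motion guidance directions.

    eps_*: denoiser predictions under no conditioning, audio-only
    conditioning, and motion-only conditioning. Because the two
    guidance terms are independent, the motion term can keep driving
    gestures even when the audio term contributes nothing (silence).
    """
    return (eps_uncond
            + s_audio * (eps_audio - eps_uncond)
            + s_motion * (eps_motion - eps_uncond))
```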

Identity Preservation Without Copy-Paste Artifacts

Reference Skip Attention maintains consistent character identity over long durations while avoiding rigid, pasted-reference artifacts common in other models.

Efficient High-Resolution Inference

Coarse-to-fine generation and block-sparse attention enable practical 720p/30fps inference performance suitable for production pipelines and rapid iteration.
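Block-sparse attention cuts cost by letting each token attend only within a local window of blocks instead of over the full sequence. A minimal sketch of such a mask (block size and window are illustrative parameters, not the model's actual configuration):

```python
import numpy as np


def block_sparse_mask(n_tokens, block, window=1):
    """Boolean attention mask: each token may attend only to tokens
    in its own block and up to `window` neighboring blocks per side,
    reducing the quadratic cost of full attention."""
    idx = np.arange(n_tokens) // block
    return np.abs(idx[:, None] - idx[None, :]) <= window
```

With long 720p sequences, restricting attention this way is what makes 30 fps inference tractable while coarse-to-fine generation keeps the expensive high-resolution pass short.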