AI-Native & Innovation

Video Localisation with AI

Video localisation with AI: a frame from Inchstones for Nestlé Compleat

Video localisation with AI is the use of generative voice and lip-sync tools to adapt a delivered animation for different languages and markets without re-animating each version. It is used in advertising and broadcast, where one master film needs to ship in many regions.

Traditional localisation re-records the voice in each language, then re-animates the lip sync to match, so the cost grows linearly with the number of languages. AI localisation generates the new voice from a script (sometimes cloning the original performer's tone for continuity), then drives a lip-sync model that adjusts the character's mouth shapes to match the new audio. A single animated master can be re-versioned for many language markets without re-animating the original.
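The pipeline shape described above can be sketched as a short orchestration loop. This is a minimal illustration, not a production implementation: the two model calls are stubs standing in for a real TTS engine and a speech-driven lip-sync model (Wav2Lip-style, per the Sources), and every function and file name here is hypothetical.

```python
# Sketch of an AI localisation pipeline: one animated master, many language
# versions. Stubs stand in for the real TTS and lip-sync model calls.
from dataclasses import dataclass


@dataclass
class Deliverable:
    language: str
    audio_file: str
    video_file: str


def synthesise_voice(script: str, language: str) -> str:
    """Stub TTS call: generate a voice track from the translated script."""
    return f"vo_{language}.wav"


def apply_lip_sync(master_video: str, audio_file: str) -> str:
    """Stub lip-sync call: re-drive the mouth shapes from the new audio."""
    return f"{master_video.removesuffix('.mp4')}_{audio_file.removesuffix('.wav')}.mp4"


def localise(master_video: str, scripts: dict[str, str]) -> list[Deliverable]:
    """One master in, one deliverable per language out; no re-animation."""
    deliverables = []
    for language, script in scripts.items():
        audio = synthesise_voice(script, language)
        video = apply_lip_sync(master_video, audio)
        deliverables.append(Deliverable(language, audio, video))
    return deliverables
```

The point of the shape is the cost curve: the master is animated once, and each additional language adds only a voice-generation and lip-sync pass rather than a re-animation.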

On work like Inchstones, where the cast of characters carries the brand across markets, this approach saves significant time and budget compared to re-animating each language. The brand performance stays consistent; the language adapts.

Limits worth naming: rights for voice cloning vary by jurisdiction and union, lip-sync quality is best for short forms and front-on shots, and tonal nuance can drift between languages. We treat AI-driven localisation as a tool for breadth, not for the highest-stakes hero markets where a native voice performance is still preferred.

Myth Labs runs production localisation pipelines for global campaigns where dozens of language versions are needed inside a single master timeline.
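A language version only ships if it fits the master timeline. The speech-aware length control cited in the Sources below reduces, at its simplest, to a per-version acceptance check; the sketch here uses a hypothetical function name and a ±5% tolerance chosen purely for illustration.

```python
def fits_master_timeline(dub_duration_s: float,
                         master_duration_s: float,
                         tolerance: float = 0.05) -> bool:
    """True if the dubbed voice track fits the master shot within the
    given relative tolerance (illustrative ±5% default)."""
    return abs(dub_duration_s - master_duration_s) <= tolerance * master_duration_s
```

A version that fails the check goes back for re-translation or re-timing rather than stretching the animation to fit.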

Sources

Academic papers, recognised industry standards, and canonical industry texts that back up claims in this entry.

  1. VideoDubber: Machine Translation with Speech-Aware Length Control for Video Dubbing. Wu, Y., Tan, X., Qin, T., et al., AAAI Conference on Artificial Intelligence, 2023. Supports: end-to-end AI dubbing constrained by source-video timing.
  2. Whisper: Robust Speech Recognition via Large-Scale Weak Supervision. Radford, A., Kim, J. W., Xu, T., Brockman, G., McLeavey, C., Sutskever, I., arXiv (OpenAI / ICML), 2023. Supports: speech recognition foundation for automatic localisation pipelines.
  3. Wav2Lip: Accurately Lip-syncing Videos In The Wild. Prajwal, K. R., Mukhopadhyay, R., Namboodiri, V., Jawahar, C. V., ACM Multimedia, 2020. Supports: speech-driven lip-sync for cross-language video localisation.

Frequently asked questions

How does this affect voiceover talent?

It changes the contract more than it removes it. We commission the original performance under licence, with localisation rights agreed up front. For premium markets, native voiceover talent is still preferred. AI localisation tends to cover the long tail of language versions where a full bespoke recording would not be commissioned at all.

Is the lip sync convincing?

It is for short forms and front-on shots, which make up the bulk of advertising content. It is less convincing for cinematic close-ups under intense scrutiny. We pick the technique to match the bar of the project and the channel, and we do not over-promise on hero shots.

Who handles the voice rights?

We do, as part of the production. Voice cloning rights vary by region and platform. We track the licence terms and only ship deliverables that are cleanly licensed for the markets the brand intends to use them in.