Video-to-Video

Video-to-video: a hybrid AI and 3D frame from LEGS

Video-to-video is an AI workflow that takes existing footage as input and produces a restyled version. In animation it is used to apply a hand-drawn or painterly look to live-action plates, to convert 3D renders into a different aesthetic, or to refine the output of earlier AI passes. It is a close relative of rotoscoping, with the rendering work done by a model rather than by hand, frame by frame.

Inside a hybrid pipeline, video-to-video sits late, after a controlled plate exists. A 3D animator blocks the action with clean, controllable cameras, the renders go through a video-to-video pass that takes the geometry-true frames into a stylised world, and a compositor lands the result. The same path works for live action: a reference plate carries the performance, the model carries the look. This split keeps the performance under the discipline of acting for animation and hands the styling to a faster pipeline.

On hybrid AI projects like LEGS, video-to-video is used to bridge between clean 3D output and a more painterly look that would be slow to build by hand. The structure of the shot is preserved from the original render, while the texture, brushwork, and lighting respond to a style reference. The style transfer approach is a close cousin, focused on still frames; video-to-video adds the temporal dimension and a much higher cost of getting it wrong, because flicker is brutal at twenty-four frames a second.
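Flicker can be measured before anyone judges it by eye. The sketch below is a minimal illustration, not the tooling used on LEGS: it assumes greyscale frames stored as nested lists of values in [0, 1], and the function names and tolerance are hypothetical. It flags frames where the restyled footage changes more between consecutive frames than the source plate did. It does not compensate for motion; production temporal-consistency tools typically handle that with optical flow.

```python
from typing import List, Sequence

# Assumption: a frame is a greyscale image, rows of luminance values in [0, 1].
Frame = Sequence[Sequence[float]]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average absolute per-pixel difference between two same-sized frames."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count


def flag_flicker(source: List[Frame], restyled: List[Frame],
                 tolerance: float = 0.05) -> List[int]:
    """Return indices of frames whose restyled change from the previous frame
    exceeds the source plate's change by more than `tolerance` (hypothetical value)."""
    flagged = []
    for i in range(1, len(source)):
        source_change = mean_abs_diff(source[i - 1], source[i])
        restyled_change = mean_abs_diff(restyled[i - 1], restyled[i])
        if restyled_change - source_change > tolerance:
            flagged.append(i)
    return flagged


# A static source plate whose restyle jumps in brightness at frame 1.
static = [[0.5, 0.5], [0.5, 0.5]]
brighter = [[0.7, 0.7], [0.7, 0.7]]
source_plate = [static, static, static]
restyled_plate = [static, brighter, brighter]
print(flag_flicker(source_plate, restyled_plate))
```

Comparing against the source plate rather than against a fixed threshold means genuine motion in the shot is not mistaken for flicker, at least to a first approximation.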

The honest limits are flicker and drift. Frame-by-frame restyling can produce small inconsistencies that read as flicker, especially at high motion. Production pipelines pair video-to-video with temporal consistency tooling and a clean-up pass to settle the result. The other limit is faithful brand colour: subtle palette work can shift across the pass, so we lock a reference and check colour at compositing. For hero shots, we still favour hand-keyed work through our hybrid AI animation service.

A planning note worth naming: the input plate determines what video-to-video can do. A clean, well-lit, controllable 3D pass produces a clean, controllable restyle. A messy plate produces a messy restyle. We treat the plate as the contract, sign it off before the AI pass runs, and lock the cameras so a re-pass against an updated style reference does not destabilise the comp. The same discipline used to lock a storyboard before animation begins applies here: the cheapest moment to change your mind is before the slow stage runs, not during it.
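The colour check at compositing mentioned above can be pictured as a comparison between a locked reference colour and a patch sampled from the restyled frame. This is a hedged sketch only: the function names, the sample values, and the `max_distance` threshold are hypothetical, and plain Euclidean distance in 8-bit RGB stands in for the perceptual colour-difference metric a real grade would more likely use.

```python
import math
from typing import Sequence, Tuple

RGB = Tuple[float, float, float]


def mean_colour(pixels: Sequence[RGB]) -> RGB:
    """Average colour of a sampled patch of pixels."""
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))


def colour_distance(a: RGB, b: RGB) -> float:
    """Euclidean distance in RGB. Assumption: a stand-in for a perceptual metric."""
    return math.dist(a, b)


def brand_colour_ok(reference: RGB, patch: Sequence[RGB],
                    max_distance: float = 10.0) -> bool:
    """True if the patch's average colour stays within `max_distance`
    (hypothetical threshold) of the locked reference."""
    return colour_distance(reference, mean_colour(patch)) <= max_distance


# Hypothetical brand red and two patches sampled from a restyled frame.
brand_red = (230, 30, 45)
close_patch = [(228, 33, 47), (228, 33, 47)]
shifted_patch = [(200, 60, 45), (200, 60, 45)]
print(brand_colour_ok(brand_red, close_patch))    # True
print(brand_colour_ok(brand_red, shifted_patch))  # False
```

Running a check like this per shot turns "the palette looks off" into a number that can be tracked across re-passes.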

Myth Labs operates production video-to-video for brand and broadcast work where a controllable plate exists and a faster route to the agreed look is needed. For the wider context, see how artists are using AI without losing the craft.

Frequently asked questions

Is video-to-video the same as a filter or effect?

No. A filter applies a fixed rule to every frame. Video-to-video uses a generative model conditioned on a reference, which means the look can be much further from the source than a filter can take you. A LUT changes colour; video-to-video can take a 3D render to a watercolour world.
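To make the "fixed rule" concrete, here is a minimal sketch of a one-dimensional LUT applied to an 8-bit greyscale frame. The function names and the gain value are illustrative assumptions. The point is that each input value always maps to the same output value, whatever the frame contains, which is why a LUT can shift colour but cannot invent brushwork or texture.

```python
from typing import List


def build_lut(gain: float) -> List[int]:
    """256-entry table mapping each 8-bit value to a scaled value, clamped to 255."""
    return [min(255, round(v * gain)) for v in range(256)]


def apply_lut(frame: List[List[int]], lut: List[int]) -> List[List[int]]:
    """Apply the same table lookup to every pixel of a greyscale frame."""
    return [[lut[v] for v in row] for row in frame]


lut = build_lut(1.5)
frame = [[0, 100], [160, 250]]
print(apply_lut(frame, lut))  # [[0, 150], [240, 255]]
```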

When does it earn its place over hand-painted animation?

When the action is performance-led and the look is style-led. A 3D pass nails the character animation and camera, a video-to-video pass nails the look. Hand-painting reaches further visually, but at a different time and cost.

How do you avoid the look flickering?

Three things: a locked style reference, temporal consistency techniques inside the model, and a compositing clean-up pass. For broadcast delivery, the clean-up is non-negotiable.

Sources (6)

Academic papers, recognised industry standards, and canonical industry texts that back up claims in this entry.

  1. Video Style Transfer: A Unified Algorithm for Global and Local Correspondence-Based Video Editing. Chen, He, Wang, et al., ACM Transactions on Graphics, 2008. Supports: video-to-video restyling workflow
  2. Real-Time User-Guided Image Colorization with Learned Deep Priors. Zhang, Isola, Efros, ACM Transactions on Graphics, 2017. Supports: preserves structure while changing style
  3. Neural Video Portraits. Kim, Zimmermann, Pumarola, et al., ACM Transactions on Graphics, 2018. Supports: portrait video relighting and reenactment
  4. Everybody Dance Now. Chan, Ginosar, Zhou, Efros, arXiv, 2019. Supports: source-video pose transfer pipeline
  5. First Order Motion Model for Image Animation. Siarohin, Lathuilière, Tulyakov, Ricci, Sebe, NeurIPS, 2019. Supports: motion transfer from driving video
  6. StyleBlit: Fast Projection onto Style Surfaces. Fišer, Hanika, Korman, Zwicker, ACM Transactions on Graphics, 2020. Supports: video and render stylization pipeline