
Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise

  • Ryan Burgert
  • Yuancheng Xu
  • Wenqi Xian
  • Oliver Pilarski
  • Pascal Clausen
  • Mingming He
  • Li Ma
  • Yitong Deng
  • Lingxiao Li
  • Mohsen Mousavi
  • Michael Ryoo
  • Paul Debevec
  • Ning Yu
  • Netflix Eyeline Studios
  • Stony Brook University
  • University of Maryland, College Park
  • Netflix, Inc.
  • Stanford University

Research output: Contribution to journal › Conference article › peer-review

10 Scopus citations

Abstract

Generative modeling aims to transform random noise into structured outputs. In this work, we enhance video diffusion models by allowing motion control via structured latent noise sampling. This is achieved by just a change in data: we pre-process training videos to yield structured noise. Consequently, our method is agnostic to diffusion model design, requiring no changes to model architectures or training pipelines. Specifically, we propose a novel noise warping algorithm, fast enough to run in real time, that replaces random temporal Gaussianity with correlated warped noise derived from optical flow fields, while preserving the spatial Gaussianity. The efficiency of our algorithm enables us to fine-tune modern video diffusion base models using warped noise with minimal overhead, and provide a one-stop solution for a wide range of user-friendly motion control: local object motion control, global camera movement control, and motion transfer. The harmonization between temporal coherence and spatial Gaussianity in our warped noise leads to effective motion control while maintaining per-frame pixel quality. Extensive experiments and user studies demonstrate the advantages of our method, making it a robust and scalable approach for controlling motion in video diffusion models. Please see our project webpage; source code and checkpoints are available on GitHub.
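The tension the abstract describes between temporal coherence and spatial Gaussianity can be illustrated with a toy NumPy sketch. This is not the paper's algorithm: it is a simplified nearest-neighbor backward warp, and the function names (`warp_noise_nearest`, `bilinear_shift_half_pixel`), the flow convention, and the uniform 3-pixel flow are assumptions made purely for illustration. The sketch shows that copying noise samples along the flow keeps each frame at unit variance while correlating frames over time, whereas naive bilinear interpolation averages samples and shrinks the variance.

```python
import numpy as np


def warp_noise_nearest(prev_noise, flow, rng):
    """Backward-warp a noise frame with nearest-neighbor lookup (toy sketch).

    prev_noise: (H, W) array of i.i.d. N(0, 1) samples.
    flow: (H, W, 2) array; flow[y, x] = (dx, dy) is the assumed motion that
          carried a pixel of the previous frame to (x, y) in the current one.
    Pixels whose source falls outside the frame receive fresh noise.
    Note: this toy does not handle several pixels sharing one source sample.
    """
    h, w = prev_noise.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.rint(xs - flow[..., 0]).astype(int)
    src_y = np.rint(ys - flow[..., 1]).astype(int)
    inside = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = rng.standard_normal((h, w))  # fresh noise for newly revealed pixels
    out[inside] = prev_noise[src_y[inside], src_x[inside]]
    return out


def bilinear_shift_half_pixel(noise):
    """Half-pixel horizontal shift via bilinear interpolation (averaging)."""
    return 0.5 * (noise + np.roll(noise, 1, axis=1))


rng = np.random.default_rng(0)
frame0 = rng.standard_normal((256, 256))

flow = np.zeros((256, 256, 2))
flow[..., 0] = 3.0  # hypothetical uniform motion: 3 pixels to the right

frame1 = warp_noise_nearest(frame0, flow, rng)
blurred = bilinear_shift_half_pixel(frame0)

print("std of warped frame:      ", frame1.std())
print("variance after bilinear:  ", blurred.var())
print("samples carried along flow:", np.array_equal(frame1[:, 3:], frame0[:, :-3]))
```

In this sketch the warped frame stays close to unit standard deviation and reuses the previous frame's samples along the motion, while the bilinear version has roughly half the variance, which is the kind of degradation a Gaussianity-preserving warp has to avoid.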

Original language: English
Pages (from-to): 13-23
Number of pages: 11
Journal: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
State: Published - 2025
Event: 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2025 - Nashville, United States
Duration: Jun 11 2025 to Jun 15 2025

Keywords

  • diffusion models
  • motion control
  • noise warping
  • video generation
