Text to Video from the comfort of your own home

All you need is a 4090 with 24GB of VRAM (less if you're patient)

AI-generated video of an insect crawling on a dew-covered leaf, created in ComfyUI using Mochi 1

ComfyUI now supports video generation with the Mochi models

As of November 4, 2024, ComfyUI supports video generation with the Mochi models on consumer hardware.

Mochi 1 from Genmo is an open-source model released under the Apache 2.0 license.

How to get started

Update ComfyUI to the latest version to ensure that the new nodes are available. Save the WebP video above and load it into the ComfyUI interface to see the most basic workflow; ComfyUI embeds the workflow in the file's metadata, so dragging the file onto the canvas loads it automatically.
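If you'd rather drive ComfyUI from a script than the canvas, here's a minimal sketch of queueing a workflow against a local server through ComfyUI's HTTP API. It assumes ComfyUI is running on the default port (8188) and that the workflow was exported with "Save (API Format)"; the filename and the node id are placeholders, not part of the workflow above.

```python
# A minimal sketch of queueing a workflow against a local ComfyUI server
# through its HTTP API, using only the standard library. Assumes ComfyUI
# is running on the default port (8188) and that the workflow was
# exported with "Save (API Format)".
import json
import urllib.request

COMFY_URL = "http://127.0.0.1:8188/prompt"

# "mochi_workflow_api.json" is a placeholder for your own exported workflow.
with open("mochi_workflow_api.json") as f:
    workflow = json.load(f)

# Optionally tweak a node input before queueing, e.g. the positive prompt.
# The node id "6" is hypothetical -- check your own JSON for the right one.
workflow["6"]["inputs"]["text"] = "an insect crawling on a dew covered leaf"

payload = json.dumps({"prompt": workflow}).encode("utf-8")
req = urllib.request.Request(
    COMFY_URL, data=payload, headers={"Content-Type": "application/json"}
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp))  # contains a prompt_id you can use to poll progress
```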

Videos are limited in both length and resolution

For now the length and quality of videos are limited, but that doesn't mean you can't upscale them or stitch multiple clips together (one way to stitch is sketched below).
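Here's a minimal sketch of stitching short generated clips into one longer video using ffmpeg's concat demuxer from Python. It assumes ffmpeg is on your PATH, and the clip filenames are placeholders for your own outputs.

```python
# A minimal sketch of stitching short generated clips into one longer
# video with ffmpeg's concat demuxer. Assumes ffmpeg is on your PATH;
# the clip names are placeholders for your own outputs.
import os
import subprocess
import tempfile

clips = ["clip_001.mp4", "clip_002.mp4", "clip_003.mp4"]  # hypothetical files

# The concat demuxer reads a small text file listing the inputs.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    for clip in clips:
        f.write(f"file '{os.path.abspath(clip)}'\n")
    list_path = f.name

# "-c copy" avoids re-encoding, so all clips must share the same codec
# settings; drop it to re-encode mismatched clips instead.
subprocess.run(
    ["ffmpeg", "-f", "concat", "-safe", "0", "-i", list_path,
     "-c", "copy", "stitched.mp4"],
    check=True,
)
```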

Full control

Genmo claims that Mochi is a very hackable architecture. I take that to mean it won't be long until we have everything from IPAdapters to ControlNets to guide the generation process.

Speed

Mochi is fast. With a beefy GPU you can expect to generate about 50 frames, roughly two seconds of video, in about three minutes.

Couldn't you always do video in ComfyUI?

Yes! Video is just a sequence of images with temporal coherence. There are several existing workflows, including video-to-video with ControlNet, Prompt Travel to guide generations, and character sheets to keep characters consistent, but you've always had to fight against artifacts and unexpected surprises on any given frame. There are all sorts of techniques for 'cleaning up' and 'denoising' the outputs of these workflows; one simple example is sketched below. There are also several techniques for combining base images or videos with virtual puppetry to create talking heads.
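As one example of 'cleaning up', here's a minimal sketch that reduces frame-to-frame flicker by blending each frame with an exponential moving average of the frames before it. It assumes you've dumped the video to numbered PNG frames; the paths and the blend weight are placeholders to tune for your own footage.

```python
# A minimal sketch of one simple "cleaning up" pass: reducing
# frame-to-frame flicker by blending each frame with an exponential
# moving average of the frames before it. Assumes the video has been
# dumped to numbered PNG frames; the paths are placeholders.
import glob

import imageio.v3 as iio
import numpy as np

frame_paths = sorted(glob.glob("frames/frame_*.png"))  # hypothetical dump
alpha = 0.6  # weight of the current frame; lower = smoother but blurrier

running = None
for i, path in enumerate(frame_paths):
    frame = iio.imread(path).astype(np.float32)
    # Blend the new frame into the running average.
    running = frame if running is None else alpha * frame + (1 - alpha) * running
    iio.imwrite(f"frames/smoothed_{i:04d}.png", running.astype(np.uint8))
```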

Are those video techniques obsolete?

No! All of these techniques are still valid and useful, and they can all be combined with these new tools to create longer-form storytelling.