Text to Video from the comfort of your own home

All you need is a 4090 with 24GB of VRAM (less if you're patient)

AI-generated video of an insect crawling on a dew-covered leaf, created in ComfyUI using Mochi 1

ComfyUI now supports video generation with the Mochi models

As of November 4, 2024, ComfyUI supports video generation with the Mochi models on consumer hardware.

Mochi 1 from Genmo is an open-source model released under the Apache 2.0 license.

How to get started

Update ComfyUI to the latest version to ensure that the new nodes are available. Save the WebP video above and load it into the ComfyUI interface to see the most basic workflow; ComfyUI embeds the workflow in the file's metadata, so dragging the file onto the canvas loads it automatically.
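If you'd rather drive ComfyUI from a script than the canvas, here's a minimal sketch of queueing a workflow against a local server through ComfyUI's HTTP API. It assumes ComfyUI is running on the default port (8188) and that the workflow was exported with "Save (API Format)"; the filename and the node id are placeholders, not part of the workflow above.

```python
# A minimal sketch of queueing a workflow against a local ComfyUI server
# through its HTTP API, using only the standard library. Assumes ComfyUI
# is running on the default port (8188) and that the workflow was
# exported with "Save (API Format)".
import json
import urllib.request

COMFY_URL = "http://127.0.0.1:8188/prompt"

# "mochi_workflow_api.json" is a placeholder for your own exported workflow.
with open("mochi_workflow_api.json") as f:
    workflow = json.load(f)

# Optionally tweak a node input before queueing, e.g. the positive prompt.
# The node id "6" is hypothetical -- check your own JSON for the right one.
workflow["6"]["inputs"]["text"] = "an insect crawling on a dew covered leaf"

payload = json.dumps({"prompt": workflow}).encode("utf-8")
req = urllib.request.Request(
    COMFY_URL, data=payload, headers={"Content-Type": "application/json"}
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp))  # contains a prompt_id you can use to poll progress
```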

Videos are limited in both length and resolution

For now the length and quality of videos are limited, but that doesn't mean you can't upscale them or stitch multiple clips together (one way to stitch is sketched below).
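Here's a minimal sketch of stitching short generated clips into one longer video using ffmpeg's concat demuxer from Python. It assumes ffmpeg is on your PATH, and the clip filenames are placeholders for your own outputs.

```python
# A minimal sketch of stitching short generated clips into one longer
# video with ffmpeg's concat demuxer. Assumes ffmpeg is on your PATH;
# the clip names are placeholders for your own outputs.
import os
import subprocess
import tempfile

clips = ["clip_001.mp4", "clip_002.mp4", "clip_003.mp4"]  # hypothetical files

# The concat demuxer reads a small text file listing the inputs.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    for clip in clips:
        f.write(f"file '{os.path.abspath(clip)}'\n")
    list_path = f.name

# "-c copy" avoids re-encoding, so all clips must share the same codec
# settings; drop it to re-encode mismatched clips instead.
subprocess.run(
    ["ffmpeg", "-f", "concat", "-safe", "0", "-i", list_path,
     "-c", "copy", "stitched.mp4"],
    check=True,
)
```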

Full control

Genmo claims that Mochi is a very hackable architecture. I take that to mean it won't be long until we have everything from IPAdapters to ControlNets to guide the generation process.

Speed

Mochi is fast. With a beefy GPU you can expect to generate about 50 frames, roughly two seconds of video, in about three minutes.

Couldn't you always do video in ComfyUI?

Yes! Video is just a sequence of images with temporal coherence. There are several existing workflows, including video-to-video with ControlNet, Prompt Travel to guide generations, and character sheets to keep characters consistent, but you've always had to fight against artifacts and unexpected surprises on any given frame. There are all sorts of techniques for 'cleaning up' and 'denoising' the outputs of these workflows; one simple example is sketched below. There are also several techniques for combining base images or videos with virtual puppetry to create talking heads.
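As one example of 'cleaning up', here's a minimal sketch that reduces frame-to-frame flicker by blending each frame with an exponential moving average of the frames before it. It assumes you've dumped the video to numbered PNG frames; the paths and the blend weight are placeholders to tune for your own footage.

```python
# A minimal sketch of one simple "cleaning up" pass: reducing
# frame-to-frame flicker by blending each frame with an exponential
# moving average of the frames before it. Assumes the video has been
# dumped to numbered PNG frames; the paths are placeholders.
import glob

import imageio.v3 as iio
import numpy as np

frame_paths = sorted(glob.glob("frames/frame_*.png"))  # hypothetical dump
alpha = 0.6  # weight of the current frame; lower = smoother but blurrier

running = None
for i, path in enumerate(frame_paths):
    frame = iio.imread(path).astype(np.float32)
    # Blend the new frame into the running average.
    running = frame if running is None else alpha * frame + (1 - alpha) * running
    iio.imwrite(f"frames/smoothed_{i:04d}.png", running.astype(np.uint8))
```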

Are those video techniques obsolete?

No! All of these techniques are still valid and useful, and they can all be combined with these new tools to create longer-form storytelling.