More Text, Image or Video to Video from the comfort of your own home

You still just need a 4090 with 24GB of VRAM (less if you're patient)

AI-generated video of the distracted boyfriend meme, created in ComfyUI using CogVideoX

Another day and another model release

CogVideoX released version 1.5 of its model a couple of days ago.

The authors say:

The CogVideoX model is designed to generate high-quality videos based on detailed and highly descriptive prompts. The model performs best when provided with refined, granular prompts, which enhance the quality of video generation. It can handle both text-to-video (t2v) and image-to-video (i2v) conversions.

They recommend using multimodal LLMs to help formulate the descriptive video prompts. I haven't tried that yet, but I will experiment by creating a workflow with Florence-2.
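Until a captioning model is wired in, a hand-rolled template can get you partway toward the granular prompts the authors recommend. Here's a minimal sketch; the function name, field names, and wording are my own illustration, not anything from the CogVideoX docs:

```python
def build_video_prompt(subject: str, action: str, setting: str,
                       style: str = "cinematic, detailed") -> str:
    """Expand a short idea into the longer, more descriptive prompt
    style that CogVideoX reportedly responds best to.

    All field names and boilerplate phrasing here are illustrative.
    """
    return (
        f"A {style} video of {subject} {action} in {setting}. "
        "The camera holds a steady medium shot; lighting is natural "
        "and motion is smooth and continuous."
    )

prompt = build_video_prompt(
    subject="a man in a plaid shirt",
    action="turning to look over his shoulder",
    setting="a busy city street",
)
print(prompt)
```

A multimodal LLM would fill the same role more flexibly, expanding a one-line idea (or an input image) into this kind of detailed description automatically.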

My early testing

As you can see in the example above, which uses an image-to-video workflow built on the CogvideoXWrapper nodes in ComfyUI, the model does a pretty good job of bringing an image to life.

Resolution

Version 1.5 supports higher resolutions than the first release, but I have stuck with 480 x 720 for my tests.

Speed

CogVideoX is comparable to Mochi 1 on a beefy consumer GPU. It takes about 3 minutes to render 49 frames of video at 25 generation steps, and time increases with both frame count and step count.
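If you assume render time scales roughly linearly with frames times steps (a back-of-the-envelope model of my own, not a benchmark), you can estimate run times from that 49-frame / 25-step baseline:

```python
# Observed baseline from my own tests: ~3 minutes for 49 frames at 25 steps.
BASELINE_SECONDS = 180
BASELINE_FRAMES = 49
BASELINE_STEPS = 25

def estimate_render_seconds(frames: int, steps: int) -> float:
    """Rough estimate assuming cost is linear in frames * steps.
    Real runs will deviate (VRAM pressure, offloading, etc.)."""
    per_unit = BASELINE_SECONDS / (BASELINE_FRAMES * BASELINE_STEPS)
    return frames * steps * per_unit

# Doubling the frame count at the same step count doubles the estimate.
print(round(estimate_render_seconds(98, 25)))  # ~360 seconds
```

Treat the output as a planning aid only; in practice, longer clips can slow down more than linearly once you start offloading to system RAM.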

What about other video generation workflows?

As mentioned in the AI Video generation on consumer hardware post, there are several workflows available: video-to-video with ControlNet, Prompt Travel to guide generations, and character sheets to keep characters consistent. But you've always had to fight against artifacts and unexpected surprises on any given frame. At this stage, it's not an either/or situation; you can combine all of these techniques to create longer-form storytelling with these tools.