AI generated video of the distracted boyfriend meme, created in ComfyUI using CogVideoX
The CogVideoX team dropped version 1.5 of their model a couple of days ago.
The authors say:
“The CogVideoX model is designed to generate high-quality videos based on detailed and highly descriptive prompts. The model performs best when provided with refined, granular prompts, which enhance the quality of video generation. It can handle both text-to-video (t2v) and image-to-video (i2v) conversions.”
They recommend using multimodal LLMs to help formulate the descriptive video prompts. I haven't tried this yet, but I will experiment by creating a workflow with Florence2.
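Here's a minimal sketch of that idea outside ComfyUI: run Florence2's detailed-caption task on a source image to get the kind of granular description CogVideoX wants as a prompt. The model ID and task token follow the Florence-2 model card on Hugging Face; the file name and token budget are placeholders.

```python
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

# Florence-2 ships its own modeling code, hence trust_remote_code=True.
model_id = "microsoft/Florence-2-large"
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, trust_remote_code=True
).to("cuda")

image = Image.open("source_frame.png")  # placeholder input image
task = "<MORE_DETAILED_CAPTION>"  # Florence-2's richest captioning task

inputs = processor(text=task, images=image, return_tensors="pt").to("cuda", torch.float16)
generated_ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=512,
)
raw = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
caption = processor.post_process_generation(
    raw, task=task, image_size=(image.width, image.height)
)[task]

# The caption becomes the starting point for the CogVideoX prompt.
print(caption)
```

From there you could pass the caption to a text LLM for further refinement before feeding it to the video model.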
As you can see in the example above, which uses an Image to Video workflow built on the CogVideoXWrapper nodes in ComfyUI, the model does a pretty good job of bringing an image to life.
Version 1.5 allows higher resolutions than the first release, but I have stuck with 480 x 720 for my tests.
CogVideoX is comparable to Mochi 1 on a beefy consumer GPU. It takes about 3 minutes to render 49 frames of video at 25 sampling steps, and generation time increases with both frame count and step count.
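For reference, here is roughly what the same image-to-video run looks like outside ComfyUI, using the CogVideoX pipeline in Hugging Face diffusers with the settings above (480 x 720, 49 frames, 25 steps). Treat it as a sketch: my tests went through the CogVideoXWrapper nodes, and the checkpoint name, memory-saving calls, and prompt here come from the diffusers docs rather than my own workflow.

```python
import torch
from diffusers import CogVideoXImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

pipe = CogVideoXImageToVideoPipeline.from_pretrained(
    "THUDM/CogVideoX1.5-5B-I2V",  # 1.5 i2v checkpoint on the Hub
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # keeps VRAM use within consumer-GPU limits
pipe.vae.enable_tiling()         # tiled VAE decode for long frame stacks

image = load_image("source_frame.png")  # placeholder input image
prompt = (
    "A man walking down a city street turns his head to look back, "
    "while the woman beside him reacts with an annoyed expression."
)

video = pipe(
    prompt=prompt,
    image=image,
    height=480,
    width=720,
    num_frames=49,            # about 3 minutes on a beefy consumer GPU
    num_inference_steps=25,
    guidance_scale=6.0,
).frames[0]

export_to_video(video, "output.mp4", fps=8)
```

Frame count and step count are the two knobs that dominate render time, so halving either roughly halves the wait.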
As mentioned in the AI Video generation on consumer hardware post, there are several workflows to guide generations, including video-to-video with ControlNet, Prompt Travel, and character sheets for character consistency, but you've always had to fight against artifacts and unexpected surprises on any given frame. At this stage, it's not an either-or situation: you can use all of these techniques together to create longer-form storytelling with these tools.