Even more text to video on consumer hardware

I'm beginning to sound like a broken record

There's yet another way to generate video on consumer hardware: LTX Video.

LTX Video generated video of a monster exploding out of calm tropical waters

The LTX Video model

LTX Video is from Lightricks.

LTX-Video is the first DiT-based video generation model that can generate high-quality videos in real-time. It can generate 24 FPS videos at 768x512 resolution, faster than it takes to watch them. The model is trained on a large-scale dataset of diverse videos and can generate high-resolution videos with realistic and diverse content.

LTX has text to video, image to video, and video to video workflows.

Good quality and faster than Mochi 1 and CogVideoX

My early experiments are super promising, and the model is super fast. I can confirm near-real-time generation.
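To make "real time" concrete, here's a minimal sketch of the arithmetic: a clip is generated in real time when producing it takes no longer than playing it back. The function name and the timings in the example are hypothetical placeholders, not measurements from my runs.

```python
def realtime_factor(num_frames, fps, generation_seconds):
    """Ratio of playback time to generation time.

    A value of 1.0 or more means the clip was generated at least as
    fast as it takes to watch it.
    """
    playback_seconds = num_frames / fps
    return playback_seconds / generation_seconds


# Hypothetical numbers: a 120-frame clip at 24 FPS plays for 5 seconds.
fast_run = realtime_factor(120, 24, 4.0)   # generated in 4 s
slow_run = realtime_factor(120, 24, 10.0)  # generated in 10 s
print(fast_run >= 1.0, slow_run >= 1.0)
```

Anything at or above 1.0 counts as real time; "near real time" would be a factor just under it.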

Extending capabilities

There's already LTX Tricks, which implements, among other things, RF-Inversion. In rectified flows (think Flux), inversion involves mapping the image back into the latent space using the model's rectified stochastic differential equations. The goal of inversion is to ensure the resulting latent representation can faithfully reconstruct the original image and allow for meaningful edits. That sounds like fancy talk for 'better video to video workflows,' but I haven't had great success yet in my experimentation.
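To get a feel for what inversion means, here's a toy sketch of the basic round trip: integrate a velocity field one way to turn an "image" into a latent, then integrate it back to reconstruct the image. This is plain Euler integration of an ODE, not the RF-Inversion method and not what LTX Tricks does. The `velocity` function is a made-up stand-in for a learned model, and the three-number "image" is purely illustrative.

```python
import math


def velocity(x, t):
    """Hypothetical stand-in for a learned velocity field v(x, t)."""
    return [math.sin(xi) + t for xi in x]


def integrate(x, t_start, t_end, steps):
    """Euler-integrate dx/dt = velocity(x, t) from t_start to t_end."""
    dt = (t_end - t_start) / steps
    t = t_start
    for _ in range(steps):
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
        t += dt
    return x


def invert(image, steps=200):
    """Map the image to a latent by integrating from t=0 to t=1."""
    return integrate(image, 0.0, 1.0, steps)


def reconstruct(latent, steps=200):
    """Map the latent back by integrating from t=1 to t=0."""
    return integrate(latent, 1.0, 0.0, steps)


def max_error(a, b):
    """Largest absolute difference between two equal-length lists."""
    return max(abs(ai - bi) for ai, bi in zip(a, b))


image = [0.2, -0.5, 0.9]
latent = invert(image)
restored = reconstruct(latent)
error = max_error(image, restored)
print(f"round-trip error: {error:.4f}")
```

The round trip is close but not exact, because each Euler step introduces a little discretization error, and the error shrinks as the step count goes up. That gap between "invert" and "faithfully reconstruct" is the problem that fancier inversion methods try to close.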

Too many toys, too little time

From the beginning I've said that the most exciting and most frustrating part of AI is the pace of innovation. Every day there's a new tool unlocking new ways to express ourselves, but there's never enough time to explore them all.