
Conversation

onzag commented Aug 7, 2025

This update adds optional_cond_strength to the LTXVBaseSampler; when supplied, it overrides strength. The strength input is kept for backwards compatibility.

It also adds support for optional conditioning frames to the extend sampler.

This makes the functionality more capable than the current LTXVLoopingSampler, since generation can be done step by step and deeply controlled.
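As a rough illustration of the override semantics (a minimal sketch with illustrative names, not the node's actual code), an optional comma-separated per-condition strength could take precedence over the scalar strength like this:

```python
# Minimal sketch, assuming an optional comma-separated string input.
# All names here are illustrative, not the PR's actual signatures.

def resolve_strengths(num_conds: int,
                      strength: float,
                      optional_cond_strength: str | None = None) -> list[float]:
    """Return one strength per conditioning frame.

    If optional_cond_strength is given (e.g. "1.0,0.8,0.5"), it overrides
    the scalar strength; otherwise the scalar is broadcast to every frame,
    which preserves the old behaviour.
    """
    if optional_cond_strength is None:
        return [strength] * num_conds
    values = [float(v) for v in optional_cond_strength.split(",")]
    if len(values) != num_conds:
        raise ValueError("need one strength per conditioning frame")
    return values

# Example: three conditioning frames, per-frame strengths win over the scalar.
print(resolve_strengths(3, 0.9))                 # [0.9, 0.9, 0.9]
print(resolve_strengths(3, 0.9, "1.0,0.8,0.5"))  # [1.0, 0.8, 0.5]
```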

image

I have some video output examples where I put in a source video followed by a frame that goes at the end to extend the video with; this achieves virtually infinite, frame-controlled video.

I am making this PR with the objective of review, since I am puzzled by optional_guiding_latents and why the overlap needed to be deleted; that seemed unnecessary, as I would expect the guiding latents to be the size of the new frames plus the overlap. I had to change the way the calculation was done, because otherwise the output was extremely noisy and chaotic; once I did, it started working. I tested with distilled FP8 due to the limitations of my setup, and guiding latents have never worked very well for me, but I plan to expand the documentation further.

I am also a bit puzzled about the way the overlap is handled; the way it was done before caused lots of distortions once I set up a reference image.
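For reference, the sizing assumption I worked from is sketched below (illustrative names only, not the repository's code): the guiding latents span the overlap region plus the newly generated frames, and get split accordingly.

```python
# Illustrative arithmetic only (an assumption, not the repository's code):
# if an extension adds `new_frames` latent frames and reuses `overlap`
# frames from the previous chunk, the guiding latents span both.

def expected_guiding_length(new_frames: int, overlap: int) -> int:
    # Guide covers the overlap region plus the freshly generated frames.
    return new_frames + overlap

def split_guiding(guiding, new_frames: int, overlap: int):
    """Split guiding latents into the part aligned with the overlap and
    the part aligned with the new frames. `guiding` is any sliceable
    sequence standing in for a latent tensor's temporal axis."""
    assert len(guiding) == expected_guiding_length(new_frames, overlap)
    return guiding[:overlap], guiding[overlap:]

guide = list(range(12))  # e.g. 12 guiding latent frames
kept, fresh = split_guiding(guide, new_frames=8, overlap=4)
print(kept, fresh)       # [0, 1, 2, 3] [4, 5, 6, 7, 8, 9, 10, 11]
```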

Standard workflows should be unaffected.

This work in progress would help toward fixing issue #264.

As the LoopingSampler is virtually a combination of these two samplers, these new optional arguments would let one build a supercharged LoopingSampler able to take multiple conditioning images across the tiles, which enables something very powerful: controlled animation with a storyboard-based approach.

However, it may be more worthwhile to use the base sampler and extend with a meticulous approach, as in the sketch below; this allows finer control than the LoopingSampler does, because you can choose what you like and what you don't in each temporal tile rather than generating everything at once.
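A minimal sketch of that meticulous, storyboard-driven loop (all function names here are hypothetical stand-ins for the Base and Extend samplers):

```python
# Sketch of a storyboard loop: grow the video one temporal tile at a time,
# conditioning each tile on the next keyframe. Every segment is a separate,
# reviewable step, so a bad tile can be regenerated without redoing the rest.

def generate_storyboard(keyframes, frames_per_segment, sample_base, sample_extend):
    latent = sample_base(first_frame=keyframes[0],
                         last_frame=keyframes[1],
                         num_frames=frames_per_segment)
    for key in keyframes[2:]:
        latent = sample_extend(latent,
                               end_frame=key,
                               num_new_frames=frames_per_segment)
    return latent

# Trivial stand-ins so the sketch runs; real samplers return latent tensors.
sample_base = lambda first_frame, last_frame, num_frames: [first_frame, last_frame]
sample_extend = lambda latent, end_frame, num_new_frames: latent + [end_frame]
print(generate_storyboard(["A", "B", "C", "D"], 8, sample_base, sample_extend))
# ['A', 'B', 'C', 'D']
```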

Let's work together; there's an important educational objective to be achieved here: trying to bring this tool to the classroom.

onzag added 2 commits August 7, 2025 19:46
Adds optional_cond_strength to the LTXVBaseSampler, which overrides strength when used; strength is kept for backwards compatibility. Also adds support for optional conditioning frames to the extend sampler.

The hybrid sampler can be used both as a base sampler and as an extend sampler; this enables an infinite-video workflow that is more controllable than the LoopingSampler while still being simpler than having two separate workflows.
onzag (Author) commented Aug 8, 2025

I have added a new sampler. It is not a big deal in itself, merely a combination of the Base and Extend samplers, but it enables a very powerful workflow I am developing: combined with Stable Diffusion, it achieved camera control, character control for large movements, and a degree of fine-tuning that the LoopingSampler does not allow as it stands.
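Conceptually (a hedged sketch; the names and signatures here are assumptions, not the PR's), the hybrid behaves as a base sampler when no latent is wired in and as an extend sampler when one is:

```python
# Hedged sketch of the hybrid idea with toy stand-in samplers.

def base_sample(**kwargs):
    # Stand-in for the Base sampler: creates the first segment.
    return {"frames": kwargs.get("num_frames", 0)}

def extend_sample(latents, **kwargs):
    # Stand-in for the Extend sampler: grows an existing latent.
    latents["frames"] += kwargs.get("num_new_frames", 0)
    return latents

def hybrid_sample(optional_latents=None, **kwargs):
    """Act as a base sampler on the first call, an extend sampler afterwards."""
    if optional_latents is None:
        return base_sample(**kwargs)
    return extend_sample(optional_latents, **kwargs)

clip = hybrid_sample(num_frames=97)            # create
clip = hybrid_sample(clip, num_new_frames=72)  # extend
clip = hybrid_sample(clip, num_new_frames=72)  # extend again
print(clip)                                    # {'frames': 241}
```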

image

It nevertheless ditches the multi prompt and uses a single prompt per fragment; yes, you have to chain them, but trial and error is part of the process.

In the example I was able to turn a deer's head to the left and then back; it also enabled character consistency.

Upscaling is the next challenge to tackle. It is a bit annoying right now, as you have to keep track of the images used and the indexes they were used at in order to pass them through a classic upscale workflow; it's a work in progress.

onzag added 5 commits August 8, 2025 19:12
Added only the denoised newly created latents as an output, so that they can be upsampled for reference purposes
onzag (Author) commented Aug 10, 2025

image

The hybrid sampler now supports guiding as well. With the special workflow I think this is becoming very production-ready for community animators, even though fine movements basically require canny. Pose works with humans, and you can successfully transfer larger movements with it, but canny works with everything and allows for more expressiveness.

image

This was achieved with a combination of a video of a woman doing that pose and hand modifications to the generated canny, which was then fed back into the workflow: basically total control over even something non-human. That's the point of the hybrid workflow over something like the looping sampler. Yes, the looping sampler is easier, but if you want fine control, use this step-by-step method.

Also, for some reason the num_frames used when guiding must be equal to the number of frames to be generated, and you must pass -1 to the generation, or otherwise it doesn't work; it almost makes me pull my hair out.
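In other words (a hedged illustration of the quirk, with assumed parameter names), the guiding video supplies the segment length and the sampler's own frame count is left at -1:

```python
# Hedged illustration of the reported behaviour; parameter names are assumed.

def segment_settings(guiding_frames: int) -> dict:
    """Settings for one guided segment: the guide's frame count must match
    the frames to be generated, and the sampler's own count is set to -1 so
    the length is inferred from the guidance."""
    return {
        "guiding_num_frames": guiding_frames,  # guide length == frames generated
        "num_frames": -1,                      # -1: derive length from the guide
    }

print(segment_settings(97))  # {'guiding_num_frames': 97, 'num_frames': -1}
```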

The reference images seem to work magically on top of the guidance, even though you had disabled that within the node; I enabled it and it worked like a charm. The algorithm prioritizes the image over the pose, but it does so well: I even sent contradicting instructions, and it produces a sort of in-between fast move, which is actually good, because that is roughly what you would expect from the contradiction. When the reference image and the canny/pose match, it does great. This also means repairs done with the consistency-repair method I put in the workflow are much easier, so you work like a director.

It's not the easiest workflow to use, but I think it's cooking something.

onzag (Author) commented Aug 10, 2025

image

Tomorrow I'm working on the upscale. I decided to save relevant data for each generation in chunks that are part of the result: you save your latent (not the video), your chunk info, the prompt, and the images you used as you grow this latent. Pipes are used to join them (just as they already work for the multi prompt), so I simply reused that.
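A sketch of what such a per-chunk record might carry (an assumed structure for illustration, not the PR's exact schema):

```python
# Assumed shape of the per-chunk record described above: everything needed
# to re-run or upscale a segment travels with the latent through a pipe.

from dataclasses import dataclass, field

@dataclass
class ChunkInfo:
    start_idx: int    # first latent frame index of this chunk
    length: int       # latent frames generated in this chunk
    prompt: str       # prompt used for this segment
    cond_images: list = field(default_factory=list)  # images used as conditions

@dataclass
class Pipe:
    latent: object = None                  # the growing video latent
    chunks: list = field(default_factory=list)

pipe = Pipe()
pipe.chunks.append(ChunkInfo(0, 97, "a deer turns its head", ["deer.png"]))
pipe.chunks.append(ChunkInfo(97, 72, "the deer looks back", ["deer_back.png"]))
print([c.prompt for c in pipe.chunks])
```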

image

This means the video can be upscaled with the same method that created it, so if your images are higher quality (they should be), that gets applied as contextual information in the latent space, or at least I think so, resulting in high-quality upscales. I'm not sure the IC-LoRA detailer will have much effect, but I need to test what the chunk upsampler is going to do with it.

btakita commented Aug 11, 2025

This PR is working great. Much cleaner than chaining LTXVAddGuide nodes. Thank you @onzag for your contribution.

onzag (Author) commented Aug 16, 2025

> This PR is working great. Much cleaner than chaining LTXVAddGuide nodes. Thank you @onzag for your contribution.

@btakita I have some more improvements coming, but I've been testing for quite a while; I'd consider this PR outdated now until I clean up the new version, which has an upscaler mechanism based on the chunk_info from the hybrid sampler.

Note that there's a bug in the Hybrid sampler's calculation in this PR: I forgot the first single frame. This doesn't cause errors in this specific PR, but it does in what I was planning to follow up with for upscaling.

I noticed the LoopingSampler was causing quality degradation, but it was truly the only way to generate videos longer than 30 seconds; otherwise I was getting OOM errors. So I created an upscaler sampler that uses the information coming from the hybrid sampler and determines an upscale approach depending on the size of the latents and the chunks.
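The gist of that decision, as a minimal sketch (the threshold and names are assumptions, not the actual implementation): walk the chunk records and upscale in windows small enough to avoid OOM instead of processing the whole latent at once.

```python
# Minimal sketch of chunk-aware upscale planning; threshold is an assumption.

MAX_FRAMES_PER_PASS = 121  # assumed VRAM-bound limit, tune for your setup

def plan_upscale(chunks):
    """Yield (start, length) windows so that no upscale pass exceeds the
    per-pass frame limit, respecting the chunk boundaries from the sampler."""
    for c in chunks:
        start, remaining = c["start_idx"], c["length"]
        while remaining > 0:
            step = min(remaining, MAX_FRAMES_PER_PASS)
            yield start, step
            start += step
            remaining -= step

chunks = [{"start_idx": 0, "length": 97}, {"start_idx": 97, "length": 144}]
print(list(plan_upscale(chunks)))
# [(0, 97), (97, 121), (218, 23)]
```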

That said, the hybrid sampler in this PR works fine for creation and extending; the bug only affects the extra outputs: chunk info, generated indexes, and so on.

The ultimate purpose of fixing and building the workflow this way is that I'm planning a future integration with the open-source GIMP, in order to run a small course in my community. Once I'm done with this (I almost am), I'll move on to creating nodes for external integration in ComfyUI, and then build a plugin for GIMP that allows ComfyUI integration within it.
