
Refactor: Modularize Inpaint Pipeline #6120

Draft
wants to merge 6 commits into main
Conversation

dunkeroni
Contributor

Summary

Keeping this as Draft until tested with more models and settings.

This is a rewrite of the inpaint mask code in the diffusers pipeline. It centralizes all the mask logic to a single object that gets called from multiple points. The calls are based on location in the process as a proof of concept for a more elaborate modular extension system of the denoise process. Baby steps first.

Our previous process had mask-based decisions and code in five locations, across multiple object levels, and the mask was inverted partway through. Dragging it all together was necessary for debugging.
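To make the intent concrete, here is a rough sketch of the shape of that object (class and method names are illustrative, not the actual ones in this PR): a single extension-style object owns the mask state and is invoked at known points in the denoise loop, instead of mask logic being scattered across the pipeline.

```python
import torch


class InpaintMaskExt:
    """Owns the denoise mask and is called at fixed points in the denoise loop (sketch only)."""

    def __init__(self, mask: torch.Tensor, masked_latents: torch.Tensor | None = None):
        # Convention used in this sketch: mask == 1 where the original image is kept,
        # mask == 0 where new content is generated.
        self.mask = mask
        self.masked_latents = masked_latents

    def extend_unet_input(self, latents: torch.Tensor) -> torch.Tensor:
        # Inpaint-trained UNets expect extra input channels: the mask and the masked latents.
        if self.masked_latents is not None:
            return torch.cat([latents, self.mask, self.masked_latents], dim=1)
        return latents

    def apply_mask(self, latents: torch.Tensor, noised_init_latents: torch.Tensor) -> torch.Tensor:
        # Called after each denoising step: re-impose the preserved region so it
        # cannot drift away from the (appropriately noised) original latents.
        return noised_init_latents * self.mask + latents * (1.0 - self.mask)
```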

Related Issues / Discussions

This (mostly) fixes the issues with inpaint models that were introduced with the gradient mask changes. If an inpaint model is used and the provided denoise mask does not include a masked_latents field (normally generated by providing the original image and VAE to the Create Denoise Mask node), the pipeline now synthesizes one by filling the masked area with predetermined values based on the model architecture. Doing so avoids a second VAE encode in all inpainting/canvas processes.
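For illustration, a minimal sketch of that fallback, using a hypothetical synthesize_masked_latents helper and placeholder fill values (the tuned per-architecture values live in the PR itself): the region to be regenerated is overwritten with a constant latent fill, so an inpaint-trained UNet still receives the extra channels it expects without a second VAE encode.

```python
import torch

# Placeholder per-architecture fill values; the actual tuned values are in the PR.
SYNTHETIC_FILL = {
    "sd-1": torch.tensor([0.0, 0.0, 0.0, 0.0]),
    "sdxl": torch.tensor([0.0, 0.0, 0.0, 0.0]),
}


def synthesize_masked_latents(latents: torch.Tensor, mask: torch.Tensor, arch: str) -> torch.Tensor:
    """Replace the region to be regenerated with a constant per-architecture fill (sketch)."""
    # mask == 1 where the original content is kept, 0 where content is regenerated.
    fill = SYNTHETIC_FILL[arch].to(latents).view(1, -1, 1, 1)
    return latents * mask + fill * (1.0 - mask)
```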

This is not one-to-one with the standard setup. The new method has been tuned to reduce artifacts as much as possible, but there can still be some unusual highlighting and occasionally edges of inpainted objects get chopped off.
[image: inpainting comparison, VAE-encoded masked_latents vs. synthetic fill]
The dog on the left was inpainted with a VAE-encoded masked_latents, and the dog on the right was from a synthetic fill. The mask edge of the synthetic dog has lost part of the hindquarters and added an errant sparkle between the bench slats.

Oddly, using pure synthetic fill produced a significantly higher success rate for inpainting objects from the prompt, nearing 100% before adjusting for artifacts. With the fixes in place to avoid edge artifacts, the success rate for synthetic fill is much closer to VAE, but still slightly higher. At the current settings, VAE will ignore the prompt and just fill in the bench a bit more often than synthetic fill does.
[image]

Code is included for synthetic fill of SD-2 models, but it does not work at this time. Something is different about the way SD-2 expects masked_latents to look.

QA Instructions

Since this affects all of the inpainting pipelines, we need to test a variety of settings on all model architectures to make sure nothing strange or unexpected is happening.

Checklist

  • The PR has a short but descriptive title, suitable for a changelog
  • Tests added / updated (if applicable)
  • Documentation added / updated (if applicable)

github-actions bot added the python (PRs that change python files), invocations (PRs that change invocations), and backend (PRs that change backend files) labels on Apr 3, 2024
@psychedelicious
Collaborator

IMO an extra VAE encode and some additional support code for inpainting is an acceptable tradeoff for no possibility of artifacts.

I'm imagining a scenario where you inpaint and it's almost perfect, but there are some artifacts. So you inpaint the artifacts, but this then creates more artifacts, and so on. We were OK with a whole extra denoising step before - reducing that overhead to a single VAE encode seems very reasonable.

@dunkeroni
Contributor Author

IMO an extra VAE encode and some additional support code for inpainting is an acceptable tradeoff for no possibility of artifacts.

I'm imagining a scenario where you inpaint and it's almost perfect, but there are some artifacts. So you inpaint the artifacts, but this then creates more artifacts, and so on. We were OK with a whole extra denoising step before - reducing that overhead to a single VAE encode seems very reasonable.

I was hoping it could get to a perfect result, but I think it will have to just live as a "compatibility fallback" procedure if no encoded image is provided.

The issue is how to avoid the extra encode when we definitely don't need it. It is only necessary when a user is inpainting on canvas with an inpaint-trained model. I would like to avoid what we had before, where every canvas inpaint created the masked image and sent it through an extra VAE encode, and then the overwhelming majority of the time the result went unused because normal models don't need it. We have a few options for that:

  1. Find a way for Canvas to adjust the graph based on inpaint models (preferably not, since if it is possible it's more complexity than it's worth)
  2. Package image and VAE info alongside the Denoise Mask data into the Denoise Latents invocation, and then encode if necessary once everything is together in the pipeline (loses the benefits of caching when you use the same mask multiple times)
  3. Feed image, UNet, and VAE info into the Mask nodes so that they can decide internally whether masked_latents is required (a few added connections in Canvas graphs, plus some minor confusion for most users as to what is going on in the mask node; a rough sketch follows below)
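A rough sketch of what option 3 could look like, with invented field names rather than the actual invocation signatures, and vae.encode standing in for the project's image-to-latents path: the mask node checks whether the UNet is inpaint-trained (9 input channels) and only pays for the extra VAE encode in that case.

```python
def create_denoise_mask(mask, image=None, unet_config=None, vae=None):
    # mask == 1 marks the region to regenerate in this sketch.
    masked_latents = None
    is_inpaint_model = unet_config is not None and unet_config.get("in_channels") == 9
    if is_inpaint_model and image is not None and vae is not None:
        masked_image = image * (1.0 - mask)        # black out the region to regenerate
        masked_latents = vae.encode(masked_image)  # only pay the encode cost when needed
    return {"mask": mask, "masked_latents": masked_latents}
```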
