Regional prompting / Attention couple proof of concept #639
base: regions
Conversation
If you want to check my full ComfyUI Regional Prompting workflow that inspired this PR: https://www.reddit.com/r/StableDiffusion/comments/1c7eaza/comfyui_easy_regional_prompting_workflow_3/ Note: the AttentionCouple node uses a JS trick where you have to right-click then use "add input", so a workflow dumped from krita-ai-diffusion is unusable until you connect the correct mask & cond.
An example for #387:
Sorry if it's too nooby, but how do I implement this? Thanks in advance.
Improved the coherency by subtracting the upper masks from each control layer mask. This allows the order of the control layers to be better respected. It only affects images where two control layers overlap. If you put a control layer below another one that covers it entirely, it will end up with an empty mask and be ignored, as intended. This could be an option, but in my experience it works better this way; otherwise smaller zones are frequently ignored in favor of bigger covering ones. I tried manually computing mask intersections to combine prompts, but the difference in result is marginal, and computing it with nodes would be a nightmare.
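For illustration, a minimal sketch of the subtraction idea, assuming masks are float tensors in [0, 1] ordered from the topmost control layer down; the function name and layout are hypothetical, not the actual plugin code:

```python
import torch

def subtract_upper_masks(masks: list[torch.Tensor]) -> list[torch.Tensor]:
    """Give overlapping areas to the upper layer only.

    `masks` are float tensors of shape (H, W) in [0, 1], index 0 = topmost
    layer. Each mask gets the union of all masks above it subtracted, so a
    layer fully covered by upper layers ends up with an empty mask and is
    ignored, as described above.
    """
    covered = torch.zeros_like(masks[0])
    result = []
    for mask in masks:
        result.append((mask - covered).clamp(min=0.0))
        covered = torch.maximum(covered, mask)
    return result
```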
FYI, the trace of the attention couple bug:

```
  File "F:\ComfyUI\comfy\ldm\modules\diffusionmodules\openaimodel.py", line 44, in forward_timestep_embed
    x = layer(x, context, transformer_options)
  File "F:\ComfyUI\venv\Lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "F:\ComfyUI\venv\Lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "F:\ComfyUI\comfy\ldm\modules\attention.py", line 633, in forward
    x = block(x, context=context[i], transformer_options=transformer_options)
  File "F:\ComfyUI\venv\Lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "F:\ComfyUI\venv\Lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "F:\ComfyUI\custom_nodes\ComfyUI-layerdiffusion\lib_layerdiffusion\attention_sharing.py", line 253, in forward
    return func(self, x, context, transformer_options)
  File "F:\ComfyUI\comfy\ldm\modules\attention.py", line 460, in forward
    return checkpoint(self._forward, (x, context, transformer_options), self.parameters(), self.checkpoint)
  File "F:\ComfyUI\comfy\ldm\modules\diffusionmodules\util.py", line 191, in checkpoint
    return func(*inputs)
  File "F:\ComfyUI\comfy\ldm\modules\attention.py", line 565, in _forward
    n = p(n, extra_options)
  File "F:\ComfyUI\custom_nodes\cgem156-ComfyUI\scripts\attention_couple\node.py", line 98, in attn2_output_patch
    mask_downsample = get_mask(self.mask, self.batch_size, out.shape[1], extra_options["original_shape"])
  File "F:\ComfyUI\custom_nodes\cgem156-ComfyUI\scripts\attention_couple\node.py", line 23, in get_mask
    mask_downsample = mask_downsample.view(num_conds, num_tokens, 1).repeat_interleave(batch_size, dim=0)
RuntimeError: shape '[3, 4161, 1]' is invalid for input of size 756
```
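For what it's worth, the numbers in the trace hint at the cause. The sketch below is reconstructed from the error message alone; the exact node internals and spatial dimensions are guesses:

```python
# get_mask tries to reshape the downsampled mask to (num_conds, num_tokens, 1):
num_conds, num_tokens = 3, 4161      # 4161 could be a 57x73 latent token grid
required = num_conds * num_tokens    # 12483 elements needed for the view
actual = 756                         # = 3 * 252, e.g. a 14x18 downsampled mask
# 756 != 12483 -> RuntimeError. The mask was downsampled to a resolution that
# no longer matches the attention token count, which happens when the image
# dimensions are not the multiples the downsampling math assumes.
```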
Yeah, sadly I got the exact same bug when doing a region refine with attention couple, either on a small region requiring an upscale, or on a bigger region not needing one. Other than this bug, it's working as intended. If I force the selection to an accepted resolution like 1152x896 and set the grow/feather/padding to 0%, the attention couple affects only the part of the image it should, and the masks are correctly cut to size (it was a 50% left/right mask originally).
I altered the code so that attention couple is now used directly in all modes. The difference in aspect ratio caused by the multiple-of-64 requirement goes unnoticed in the majority of cases, with a possible exception for a very narrow or tall selection, and it mostly affects SD 1.5 because of the smaller original size. The modified code checks for the existence of regions before altering the extent, so it should not affect the general behavior of the plugin once regions are automatically excluded when not applicable to a selection, pending your update. In the meantime, it is applied as long as there is an existing region.
I'm wondering if I should hijack the "Focus" checkbox to switch from attention coupling to a single-region prompt finder when refining a region & inpainting. 🤔 This could be useful to prevent the concept bleeding that can occur with a nearby region.
Done. I think this fits nicely under the same concept, but we could still split the setting into two checkboxes if needed, for example "Focus" and "Single region". [edit]: the focus mode is causing a bug with SD 1.5 models, I think because of a wrong image type when computing the valid region on the upscaled mask.
Fixed the various hires passes that were not being updated as they should, and modified the inpaint code to use attention couple on its hires pass too. Fixed the wrong image type bug; it was an old portion of code trying to subtract masks in the region prompt finder. There is still a minor inconvenience: it seems the region mask is the original mask, not the subtracted one, so the prompt finder may choose the wrong prompt if region masks overlap. To be tested.
The focus mode is intended to restrict the prompt to the selected region, in scenarios where it would otherwise apply to a much bigger context. It basically did regional prompting with one region and a background. I was just going to rewire it so it uses the new attention-couple-region mechanism to auto-create a region with the selection as mask. If there are already regions defined, I would consider the prompt "focused" and do nothing. Filtering out grazing regions with small overlap is a good idea anyway, and if one region covers something like 90% of the selection, just use that region without attention couple.

Regarding multiples of 32 (or 64 now?): there is already a requirement for multiples of 8 anyway (at VAE encode, because the latent is a factor-8 smaller resolution). It should be possible to adapt the existing places where this is ensured and use a different value. Still a bit annoying, since it will trigger resizes in more cases now. Might take a look at the node at some point to see if it can't use internal padding or something.
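A rough sketch of the filtering described above, with made-up names; the ~90% solo cutoff comes from the comment, the grazing cutoff is illustrative:

```python
import torch

def select_regions(regions, selection_mask, graze_cutoff=0.05, solo_cutoff=0.9):
    """Drop regions that barely graze the selection; if one region covers
    (almost) all of it, use that region's prompt without attention couple.

    `regions` is a list of (prompt, mask) pairs; masks and `selection_mask`
    are float tensors in [0, 1] of the same shape. Assumes a non-empty
    selection. Returns (kept regions, use_attention_couple).
    """
    area = selection_mask.sum()
    scored = [(p, m, float((m * selection_mask).sum() / area)) for p, m in regions]
    kept = [(p, m, o) for p, m, o in scored if o >= graze_cutoff]
    for prompt, mask, overlap in kept:
        if overlap >= solo_cutoff:
            return [(prompt, mask)], False  # single region, no attention couple
    return [(p, m) for p, m, _ in kept], len(kept) > 1
```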
I thought it was 32, but it was working only something like 75% of the time. Setting it to 64 seems to have entirely solved this issue... while bringing other ones. I saw the code for the multiple of 8 but decided to leave it alone to prevent unwanted effects, particularly on padding and such. It seems that this code looks for the next multiple of 8 above a given dimension; for the multiple of 64, I chose to look for the closest multiple, above or below the dimension, to reduce distortion. Internal padding in the attention couple would be worth a try; the GitHub README is in Japanese, but the code itself is pretty short: https://github.com/laksjdjf/cgem156-ComfyUI/blob/main/scripts/attention_couple/node.py
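The "closest multiple above or below" choice can be shown with a small helper (hypothetical name, not the plugin's actual function):

```python
def closest_multiple(value: int, multiple: int = 64) -> int:
    """Round to the nearest multiple to limit distortion, unlike the existing
    multiple-of-8 code, which always rounds up to the next multiple."""
    lower = (value // multiple) * multiple
    upper = lower + multiple
    return upper if (value - lower) > (upper - value) else max(lower, multiple)

# e.g. closest_multiple(1180) == 1152 and closest_multiple(1185) == 1216,
# while always-round-up behavior would give 1216 in both cases.
```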
I updated the region filtering. It might still be buggy, and I left some prints in there. Also, the numbers probably need tweaking. It now does the following, depending on the generation bounds (full image or selection):
Also added a button to quickly set up initial groups. Apply result now sorts the result image into the affected groups and creates transparency masks if needed. Let's see how useful it is. You can do things like easily mix multiple results by toggling them per region, or adjust the region mask to a result you like before refines. Inpaint results that affect only one region are added to that region only, keeping things organized. I will try to add more ideas/experiments before taking a closer look at corner cases and issues, to get a complete picture of the whole thing before dealing with the details and messy code.
# Conflicts:
#	ai_diffusion/model.py
I merged the latest `regions` branch.
Also, the new way of computing the regions seems to mess with the […]. This is due to […].
Hm, are you sure? The initial result state is returned in cases where there are no regions at all; in that case the […]. The issue might be that […]. Anyway, will definitely revisit that code.

One thing I still want to try out before doing some cleanup is a more explicit way of assigning regions to layers. Maybe allow layers other than groups to be regions too, with destructive updates. But it shouldn't require any extra steps. Having multiple layers linked to the same region prompt could also be useful. Like, I tried to draw/prompt something, then I want it to appear more than once in the image, so intuitively I copy/paste the layer. It would be cool if the copies are their own region following the same prompt. Will be hard to detect automatically, though.
The […]. Otherwise, in […], I also have an unexpected behavior where transparency masks are added to the regions, and strange things then occur during rendering. I had some cases where the paint layer color was used as the basis for the attention couple when refining the entire image. [edit]: I cannot reproduce this bug. But when I update a region paint layer to alter the mask, the generated transparency mask is not updated to correspond to the new mask.
That was the point: for cases where no regions are used, the placeholder makes no sense.

Regarding transparency masks: you can indicate your region directly with a transparency mask. If you use a paint layer, the transparency mask is created automatically on apply (it is required for correct blending). You have to edit the mask to extend or shrink the region. Transparency masks are also the only way to make the region bigger after generating: if the alpha was applied directly to the generated result, you wouldn't be able to "unerase" it. A transparency mask allows that.
…ve prompts clip encoding
I see your point; it is correct if we remove the prompt merging from […]. I simplified the apply attention method by removing prompt computations and the unnecessary negative prompt CLIP encoding. Then I altered the `to_api` method to merge the region positive prompts with the root positive. This should be more robust.
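Conceptually, the merge described here boils down to something like the following hypothetical sketch, not the actual `to_api` code:

```python
def merge_prompts(region_prompt: str, root_prompt: str) -> str:
    """Combine a region's positive prompt with the root positive prompt so
    each attention-couple conditioning keeps the shared global context."""
    parts = [p.strip() for p in (region_prompt, root_prompt) if p.strip()]
    return ", ".join(parts)

# merge_prompts("photography of a dog", "a city street in the background")
# -> "photography of a dog, a city street in the background"
```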
I had a case where my background region was < 10% and was being ignored, even though it had enough of an impact when used previously. I would propose setting the region cutoff at 5% coverage.
The Conditioning input of `apply_attention` no longer needs to be in/out. This simplifies the code a little bit.
Hello,
Using the custom node pack cgem156-ComfyUI, I was able to build a functional regional prompting workflow for SDXL.
Here is a proof of concept in krita-ai-diffusion. Using a new control layer type "Attention", splitting the prompt into lines and parsing for `ZONE` starting lines (or `BREAK`, `PROMPT`, `ATT`, and an optional number... TBD), and new Paint layers in Krita, we are able to greatly influence the rendering.

Here is a step by step:

- Start a line with `ZONE`, then describe the content
- A `ZONE` line […] will be applied only to the image outside the defined zones

An example with a single zone:
The second `ZONE` is automatically assigned the cat prompt. The prompts used for the attention couple are: "photography of a dog, two animals, a city street in the background" and "photography of a cat, two animals, a city street in the background".
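To make the parsing concrete, here is a simplified sketch of the `ZONE` splitting. It assumes only the `ZONE` keyword (no `BREAK`/`PROMPT`/`ATT` variants or optional numbers) and is not the actual parser:

```python
def split_zones(prompt: str) -> tuple[str, list[str]]:
    """Split a multi-line prompt into the shared part and zone prompts.

    Lines starting with ZONE open a new zone; everything else is shared
    context appended to every zone prompt, as in the dog/cat example above.
    """
    shared, zones = [], []
    for line in prompt.splitlines():
        stripped = line.strip()
        if stripped.upper().startswith("ZONE"):
            zones.append(stripped[4:].strip())
        elif stripped:
            shared.append(stripped)
    context = ", ".join(shared)
    return context, [f"{z}, {context}" if context else z for z in zones]

# split_zones("two animals, a city street in the background\n"
#             "ZONE photography of a dog\n"
#             "ZONE photography of a cat")
# -> ("two animals, a city street in the background",
#     ["photography of a dog, two animals, a city street in the background",
#      "photography of a cat, two animals, a city street in the background"])
```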
Another example:
To do:

- […] `AttentionCouple` […]
- […] `cgem156-ComfyUI` […]