
Regional prompting / Attention couple proof of concept #639

Draft · Danamir wants to merge 96 commits into regions

Conversation

@Danamir (Contributor) commented Apr 21, 2024

Hello,

Using the custom node cgem156-ComfyUI, I was able to build a functional regional prompting workflow for SDXL.

Here is a proof of concept in krita-ai-diffusion. Using a new "Attention" control layer type, splitting the prompt into lines and parsing for lines starting with ZONE (or BREAK, PROMPT, ATT, plus an optional number... TBD), and new Paint layers in Krita, we are able to greatly influence the rendering.

Here is a step-by-step:

  • Create a new Paint layer in Krita
  • Draw/fill the region you want to control (any color, any opacity, we use the transparency mask directly)
  • Add a new "Attention" control layer in krita-ai-diffusion interface
  • Assign the new Paint layer to the control layer
  • Repeat for any number of zones
  • Alter the prompt:
    • Add one line per zone, starting with ZONE followed by the description of that zone's content
    • Optionally add an extra ZONE line that will be applied only to the image outside the defined zones
    • The first line of the prompt is copied at the beginning of all prompts
    • The last line of the prompt is copied at the end of all prompts

An example with a single zone :

photography of
ZONE a dog
ZONE a cat
two animals, a city street in the background

[images]

The second ZONE is automatically assigned the cat prompt. The prompts used for the attention couple are: "photography of a dog, two animals, a city street in the background" and "photography of a cat, two animals, a city street in the background".
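
To make the splitting rule concrete, here is a minimal sketch of the logic in Python; split_attention_prompt is a hypothetical name, and the actual parsing in the PR may differ:

```python
def split_attention_prompt(prompt: str) -> list[str]:
    # Hypothetical sketch of the splitting rule described above, not the PR code.
    lines = [line.strip() for line in prompt.splitlines() if line.strip()]
    zones = [line[len("ZONE"):].strip() for line in lines if line.startswith("ZONE")]
    common = [line for line in lines if not line.startswith("ZONE")]
    prefix = common[0] if common else ""            # first line, prepended to every zone
    suffix = common[-1] if len(common) > 1 else ""  # last line, appended to every zone
    result = []
    for zone in zones:
        text = f"{prefix} {zone}".strip()
        if suffix:
            text = f"{text}, {suffix}"
        result.append(text)
    return result

# The example above yields:
# ["photography of a dog, two animals, a city street in the background",
#  "photography of a cat, two animals, a city street in the background"]
```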

Another example:
[images]

To do:

  • Add a control layer mode "Attention"
  • Get the control layer image opacity as mask
  • Split main prompt by attention zones
  • Apply cond and masks to AttentionCouple
  • Describe resource and auto install cgem156-ComfyUI
  • Handle render methods
    • generate
    • refine
    • inpaint
    • refine_region
  • User switch for attention couple / region prompt finder
  • Load LoRA only once
  • Typechecks
  • Lint

@Danamir marked this pull request as draft April 21, 2024 00:48
@Danamir (Contributor, Author) commented Apr 21, 2024

If you want to check the full ComfyUI regional prompting workflow that inspired this PR: https://www.reddit.com/r/StableDiffusion/comments/1c7eaza/comfyui_easy_regional_prompting_workflow_3/

Note: The AttentionCouple node uses a JS trick where you have to right-click and then use "add input", so a workflow dumped from krita-ai-diffusion is unusable until you connect the correct mask & cond inputs.

@Danamir (Contributor, Author) commented Apr 21, 2024

This development may be of interest to: #386 #387 #567

[edit]: Mentioned the PR in those issues to get some potential testers. 😅

@Danamir (Contributor, Author) commented Apr 21, 2024

An example for #387:

[images]

@illmarzini commented:

Sorry if it's too nooby, but how do I implement this? Thanks in advance.

@Danamir (Contributor, Author) commented Apr 22, 2024

Improved the coherency by subtracting the upper masks from each control layer mask. This allows the order of the control layers to be better respected.

This only affects images where two control layers overlap. If you put a control layer below another one that covers it entirely, it will have an empty mask and be ignored, as intended.

It could be made an option, but in my experience it works better this way; otherwise smaller zones are frequently ignored in favor of bigger overlapping ones.

I tried manually computing mask intersections to combine prompts, but the difference in the result is marginal, and doing the computation with nodes would be a nightmare.
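
For illustration, the subtraction could look roughly like this (a sketch with torch tensors; the function name and exact implementation are assumptions, not the PR code):

```python
import torch

def subtract_upper_masks(masks: list[torch.Tensor]) -> list[torch.Tensor]:
    # Masks are float tensors in [0, 1], ordered from the topmost control layer down.
    # Each mask loses the area already claimed by the layers above it, so overlaps are
    # attributed to the upper layer only; a fully covered layer ends up with an empty mask.
    result = []
    covered = torch.zeros_like(masks[0]) if masks else None
    for mask in masks:
        result.append((mask - covered).clamp(min=0.0))
        covered = (covered + mask).clamp(max=1.0)
    return result
```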

@Danamir (Contributor, Author) commented May 6, 2024

FYI, the trace of the attention couple bug:

  File "F:\ComfyUI\comfy\ldm\modules\diffusionmodules\openaimodel.py", line 44, in forward_timestep_embed
    x = layer(x, context, transformer_options)
  File "F:\ComfyUI\venv\Lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "F:\ComfyUI\venv\Lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "F:\ComfyUI\comfy\ldm\modules\attention.py", line 633, in forward
    x = block(x, context=context[i], transformer_options=transformer_options)
  File "F:\ComfyUI\venv\Lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "F:\ComfyUI\venv\Lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "F:\ComfyUI\custom_nodes\ComfyUI-layerdiffusion\lib_layerdiffusion\attention_sharing.py", line 253, in forward
    return func(self, x, context, transformer_options)
  File "F:\ComfyUI\comfy\ldm\modules\attention.py", line 460, in forward
    return checkpoint(self._forward, (x, context, transformer_options), self.parameters(), self.checkpoint)
  File "F:\ComfyUI\comfy\ldm\modules\diffusionmodules\util.py", line 191, in checkpoint
    return func(*inputs)
  File "F:\ComfyUI\comfy\ldm\modules\attention.py", line 565, in _forward
    n = p(n, extra_options)
  File "F:\ComfyUI\custom_nodes\cgem156-ComfyUI\scripts\attention_couple\node.py", line 98, in attn2_output_patch
    mask_downsample = get_mask(self.mask, self.batch_size, out.shape[1], extra_options["original_shape"])
  File "F:\ComfyUI\custom_nodes\cgem156-ComfyUI\scripts\attention_couple\node.py", line 23, in get_mask
    mask_downsample = mask_downsample.view(num_conds, num_tokens, 1).repeat_interleave(batch_size, dim=0)

RuntimeError: shape '[3, 4161, 1]' is invalid for input of size 756

@Danamir (Contributor, Author) commented May 6, 2024

Yeah, sadly I got the exact same bug when doing a region refine with attention couple, either on a small region requiring an upscale, or on a bigger region not needing one:

With upscale to 1024: [image]

Without upscale: [image]

Other than this bug, it's working as intended. If I force the selection to an accepted resolution like 1152x896 and set grow/feather/padding to 0%, the attention couple affects only the part of the image it should:

[image]

And the masks are correctly cut to size (it was a 50% left/right mask originally):

[image]

@Danamir (Contributor, Author) commented May 7, 2024

I altered the code of apply_attention to return a modified extent object if necessary, with dimensions that are multiples of 64 (the closest multiple above or below). It also returns a boolean telling whether the attention couple was applied, which could be useful to branch some code later.

Now the attention couple is used directly in all modes: generate, refine, refine_region, and inpaint. For the latter I had to skip the attention couple for the very last upscale after crop, but it's only at 40% strength and used only if an upscale is necessary, so it should not be detrimental.

The difference in aspect ratio caused by the multiple-of-64 rounding goes unnoticed in the majority of cases, with a possible exception for very narrow or tall selections, and mostly affects SD 1.5 because of the smaller base size.

The modified code checks for the existence of regions before altering the extent, so it should not affect the general behavior of the plugin once regions are automatically excluded when not applicable to a selection, pending your update. In the meantime, it is applied as long as there is an existing region.
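
As a sketch of the rounding described above (names are assumptions, the real apply_attention change may differ), finding the closest multiple of 64 above or below a dimension could look like this:

```python
def closest_multiple(value: int, multiple: int = 64) -> int:
    # Round to the nearest multiple, above or below, but never below one full multiple.
    return max(multiple, multiple * round(value / multiple))

def attention_extent(width: int, height: int) -> tuple[int, int, bool]:
    # Return the adjusted extent and whether the attention couple changed it.
    new_w, new_h = closest_multiple(width), closest_multiple(height)
    return new_w, new_h, (new_w, new_h) != (width, height)

# e.g. attention_extent(1000, 900) -> (1024, 896, True)
```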

@Danamir (Contributor, Author) commented May 7, 2024

I'm wondering if I should hijack the "Focus" checkbox to switch from attention coupling to a single region prompt finder when refining a region & inpainting. 🤔 This could be useful to prevent the concept bleeding that can occur with another nearby region.

@Danamir (Contributor, Author) commented May 7, 2024

Done.

I think this fits nicely under the same concept. But we could still split the setting into two checkboxes if needed, for example "Focus" and "Single region".

[edit]: the focus mode is causing a bug with SD 1.5 models, I think because of a wrong image type when computing the valid region on the upscaled mask.

@Danamir (Contributor, Author) commented May 7, 2024

Fixed the various hires passes that were not being updated as they should be, and modified the inpaint code to use the attention couple on its hires pass too.

Fixed the wrong image type bug; it was an old portion of code trying to subtract a mask in the region prompt finder. There is still a minor inconvenience where the region mask seems to be the original mask, and not the subtracted one, so the prompt finder may choose the wrong prompt if region masks overlap. To be tested.
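
To illustrate why overlapping masks can mislead the finder, here is a hypothetical sketch of a region prompt finder that simply picks the region whose mask overlaps the selection the most (not the actual plugin code):

```python
import torch

def find_region_prompt(selection: torch.Tensor,
                       regions: list[tuple[str, torch.Tensor]]) -> str:
    # selection and region masks are float tensors in [0, 1] with the same shape.
    # If the original (non-subtracted) masks are used, a region hidden underneath
    # another one can still win here and the wrong prompt gets selected.
    best_prompt, best_overlap = "", 0.0
    for prompt, mask in regions:
        overlap = float((selection * mask).sum())
        if overlap > best_overlap:
            best_prompt, best_overlap = prompt, overlap
    return best_prompt
```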

@Acly (Owner) commented May 7, 2024

The focus mode is intended to restrict the prompt to the selected region, in scenarios where it would otherwise apply to a much bigger context. It basically did regional prompting with 1 region and background.

I was just going to rewire it so it uses the new attention-couple-region mechanism to auto-create a region with the selection as mask. If there are already regions defined, I would consider the prompt "focused", and do nothing.

Filtering out grazing regions with small overlap is a good idea anyway, and if one region covers something like 90% of the selection, just use that without the attention couple.


Regarding multiple of 32 (or 64 now?), there is already a requirement for multiple-of-8 anyway (at VAE encode, because latent is factor 8 smaller resolution). It should be possible to adapt the existing places where this is ensured and use a different value. Still a bit annoying since it will trigger resizes in more cases now. Might take a look at the node at some point to see if it can't use an internal padding or something.

@Danamir (Contributor, Author) commented May 7, 2024

Regarding multiple of 32 (or 64 now?), there is already a requirement for multiple-of-8 anyway (at VAE encode, because latent is factor 8 smaller resolution). It should be possible to adapt the existing places where this is ensured and use a different value. Still a bit annoying since it will trigger resizes in more cases now. Might take a look at the node at some point to see if it can't use an internal padding or something.

I thought it was 32, but it was only working something like 75% of the time. Setting it to 64 seems to have entirely solved this issue... while bringing other ones. I saw the code for the multiple of 8, but decided to leave it alone to prevent unwanted effects, particularly on padding and such. It seems that code looks for the next multiple of 8 above a given dimension. For the multiple of 64, I chose to look for the closest multiple above or below the dimension, to reduce distortion.

An internal padding in the attention couple would be worth a try; the GitHub readme is in Japanese, but the code itself is pretty short: https://github.com/laksjdjf/cgem156-ComfyUI/blob/main/scripts/attention_couple/node.py

@Acly (Owner) commented May 9, 2024

I updated the region filtering. It might still be buggy and I left some prints in there. Also the numbers probably need tweaking. It now does the following, depending on the generation bounds (full image or selection); see the sketch after the list:

  • Remove empty regions
  • Remove regions without prompt/control
  • Do a quick bounding-box check and remove regions with <10% overlap
  • If mask pixels of a region cover >90% of the bounds, use that as exclusive prompt (0 regions, no attention couple)
  • If mask pixels of a region cover <10% of the bounds, remove it
  • If no region has >10% coverage, fall back to regular generate (0 regions, no attention couple)
  • If selected regions don't have 100% coverage, create a background region with the remainder
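
A rough, self-contained sketch of these rules (the Region fields, thresholds, and return convention are illustrative assumptions, not the actual implementation):

```python
from dataclasses import dataclass

@dataclass
class Region:
    prompt: str
    bbox_overlap: float  # fraction from the quick bounding-box check against the bounds
    coverage: float      # fraction of the bounds covered by the region's mask pixels

def filter_regions(regions: list[Region]) -> list[Region] | Region | None:
    # Returns a single Region for the "exclusive prompt" case, a list of regions for
    # the attention couple, or None to fall back to a regular generation.
    candidates = [r for r in regions if r.prompt and r.bbox_overlap >= 0.10]
    for r in candidates:
        if r.coverage > 0.90:
            return r  # one region dominates the bounds: use its prompt alone
    candidates = [r for r in candidates if r.coverage >= 0.10]
    if not candidates:
        return None  # nothing usable: regular generation
    total = sum(r.coverage for r in candidates)
    if total < 1.0:
        # Background region covering the remainder of the bounds.
        candidates.append(Region(prompt="", bbox_overlap=1.0, coverage=1.0 - total))
    return candidates
```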

Also added a button to quickly set up initial groups.

Apply result now sorts the result image into the affected groups and creates transparency masks if needed. Let's see how useful it is. You can do things like easily mix multiple results by toggling them per region, or adjust the region mask to a result you like before refining. Inpaint results that only affect one region are added to that region only, keeping things organized.

I will try to add more ideas/experiments before taking a closer look at corner cases and issues, to get a complete picture of the whole thing before dealing with the details and messy code.

@Danamir (Contributor, Author) commented May 12, 2024

I merged the latest regions branch into the PR. It seems to work with almost no modifications. Two notes:

  • The auto background region is created with the "background" prompt. I guess it should be empty, as this token can really alter the generation.
  • When doing a refine with a small enough selection that only one region would be used, and said selection is on the background, it uses the first region instead of refining the background.

@Danamir (Contributor, Author) commented May 12, 2024

Also, the new way of computing the regions seems to mess with the {prompt} token parsing in the common prompt.

This is due to positive=workflow.merge_prompt("", parent.prompt) when initializing the result in RegionTree.to_api. The merge replaces the {prompt} token with "". A simple positive=parent.prompt is sufficient at this point.
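
For context, a simplified sketch of what merge_prompt presumably does with the placeholder (an assumption, the real implementation may differ):

```python
def merge_prompt(prompt: str, template: str) -> str:
    # Substitute the prompt into the {prompt} placeholder of the template;
    # assumed simplification, not the plugin's actual implementation.
    if "{prompt}" in template:
        return template.replace("{prompt}", prompt)
    return f"{template} {prompt}".strip()

# merge_prompt("", "masterpiece, {prompt}, 4k") -> "masterpiece, , 4k"
# The placeholder is consumed, so a later merge can no longer fill it in.
```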

@Acly (Owner) commented May 12, 2024

A simple positive=parent.prompt is sufficient at this point.

Hm, are you sure? The initial result state is returned in cases where there are no regions at all; in that case the {prompt} should already be "merged away" as it won't happen later. Or so I thought, not sure.

Issue might be that result.prompt is reused later, where perhaps parent.prompt should be used.

Anyway, I will definitely revisit that code. One thing I still want to try out before doing some cleanup is a more explicit way of assigning regions to layers. Maybe allow layers other than groups to be regions too, with destructive updates. But it shouldn't require any extra steps.

Having multiple layers linked to the same region prompt could also be useful. Like I tried to draw/prompt something, then I want it to appear more than once in the image, so intuitively I copy/paste the layer. Would be cool if the copies are their own region following the same prompt. Will be hard to detect automatically though.

@Danamir (Contributor, Author) commented May 12, 2024

in that case the {prompt} should already be "merged away" as it won't happen later.

The merge_prompt is used later in encode_attention_text_prompt in my code, and expects the {prompt} to be available. I use it with region.positive and cond.positive variables.

Otherwise in RegionTree.to_api, merge_prompt is used once in the coverage > 0.9 case, and once when initializing the positive prompt. But using it with "" as first parameter is a sure way to replace {prompt} by nothing. 😅

I also see an unexpected behavior where transparency masks are added to the regions, and strange things then occur in the rendering. I had some cases where the paint layer color was used as the basis for the attention couple when refining the entire image. [edit]: I cannot reproduce this bug. But when I update a region paint layer to alter the mask, the generated transparency mask is not updated to match the new mask.

@Acly (Owner) commented May 12, 2024

But using it with "" as first parameter is a sure way to replace {prompt} by nothing.

That was the point. For cases where no regions are used the placeholder makes no sense.

Regarding transparency masks. You can indicate your region directly with a transparency mask. If you use a paint layer, the transparency mask is created automatically on apply (it is required for correct blending). You have to edit the mask to extend or shrink the region.

Transparency masks are also the only way to make the region bigger after generating. If the alpha were applied directly to the generated result, you wouldn't be able to "unerase" it; a transparency mask allows that.

@Danamir (Contributor, Author) commented May 13, 2024

That was the point. For cases where no regions are used the placeholder makes no sense.

I see your point; it is correct if we remove the prompt merging from workflow.apply_attention and do it entirely in the RegionTree.to_api method.

I simplified the apply_attention method by removing the prompt computations and the unnecessary negative prompt CLIP encoding. Then I altered the to_api method to merge the region positive prompts with the root positive. This should be more robust.

@Danamir (Contributor, Author) commented May 13, 2024

I had a case where my background region was < 10% coverage and was being ignored, even though it had a significant impact when it was used previously. I would propose setting the region cutoff at 5% coverage.

@Danamir (Contributor, Author) commented May 13, 2024

The Conditioning input of apply_attention does not need to be in/out anymore. This simplifies the code a little bit.
