
Regional prompting / Attention couple proof of concept #639

Draft · Danamir wants to merge 96 commits into regions

Conversation

@Danamir (Contributor) commented Apr 21, 2024

Hello,

Using the custom node cgem156-ComfyUI, I was able to build a functional regional prompting workflow for SDXL.

Here is a proof of concept in krita-ai-diffusion. Using a new "Attention" control layer type, splitting the prompt into lines and parsing for lines starting with ZONE (or BREAK, PROMPT, ATT, plus an optional number... TBD), and new Paint layers in Krita, we are able to greatly influence the rendering.

Here is a step-by-step:

  • Create a new Paint layer in Krita
  • Draw/fill the region you want to control (any color, any opacity, we use the transparency mask directly)
  • Add a new "Attention" control layer in krita-ai-diffusion interface
  • Assign the new Paint layer to the control layer
  • Repeat for any number of zones
  • Alter the prompt:
    • Add one line per zone, starting with ZONE followed by the description of that zone's content
    • Optionally add an extra ZONE line that will be applied only to the image outside the defined zones
    • The first line of the prompt is copied at the beginning of all prompts
    • The last line of the prompt is copied at the end of all prompts

An example with a single zone :

photography of
ZONE a dog
ZONE a cat
two animals, a city street in the background

[images]

The second ZONE is automatically assigned the cat prompt. The prompts used for the attention couple are: "photography of a dog, two animals, a city street in the background" and "photography of a cat, two animals, a city street in the background".
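
To make the splitting rule concrete, here is a minimal sketch of the logic in Python; split_attention_prompt is a hypothetical name, and the actual parsing in the PR may differ:

```python
def split_attention_prompt(prompt: str) -> list[str]:
    # Hypothetical sketch of the splitting rule described above, not the PR code.
    lines = [line.strip() for line in prompt.splitlines() if line.strip()]
    zones = [line[len("ZONE"):].strip() for line in lines if line.startswith("ZONE")]
    common = [line for line in lines if not line.startswith("ZONE")]
    prefix = common[0] if common else ""            # first line, prepended to every zone
    suffix = common[-1] if len(common) > 1 else ""  # last line, appended to every zone
    result = []
    for zone in zones:
        text = f"{prefix} {zone}".strip()
        if suffix:
            text = f"{text}, {suffix}"
        result.append(text)
    return result

# The example above yields:
# ["photography of a dog, two animals, a city street in the background",
#  "photography of a cat, two animals, a city street in the background"]
```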

Another example:
[images]

To do:

  • Add a control layer mode "Attention"
  • Get the control layer image opacity as mask
  • Split main prompt by attention zones
  • Apply cond and masks to AttentionCouple
  • Describe resource and auto install cgem156-ComfyUI
  • Handle render methods
    • generate
    • refine
    • inpaint
    • refine_region
  • User switch for attention couple / region prompt finder
  • Load LoRA only once
  • Typechecks
  • Lint

@Danamir marked this pull request as draft April 21, 2024 00:48
@Danamir (Contributor, Author) commented Apr 21, 2024

If you want to check the full ComfyUI regional prompting workflow that inspired this PR: https://www.reddit.com/r/StableDiffusion/comments/1c7eaza/comfyui_easy_regional_prompting_workflow_3/

Note: The AttentionCouple node uses a JS trick where you have to right-click and then use "add input", so a workflow dumped from krita-ai-diffusion is unusable until you connect the correct mask & cond inputs.

@Danamir (Contributor, Author) commented Apr 21, 2024

This development may be of interest to: #386 #387 #567

[edit]: Mentioned the PR in those issues to get some potential testers. 😅

@Danamir (Contributor, Author) commented Apr 21, 2024

An example for #387:

[images]

@illmarzini commented:

Sorry if it's too nooby, but how do I implement this? Thanks in advance.

@Danamir (Contributor, Author) commented Apr 22, 2024

Improved the coherency by subtracting the upper masks from each control layer mask. This allows the order of the control layers to be better respected.

This only affects images where two control layers overlap. If you put a control layer below another one that covers it entirely, it will have an empty mask and be ignored, as intended.

It could be made an option, but in my experience it works better this way; otherwise smaller zones are frequently ignored in favor of bigger overlapping ones.

I tried manually computing mask intersections to combine prompts, but the difference in the result is marginal, and doing the computation with nodes would be a nightmare.
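
For illustration, the subtraction could look roughly like this (a sketch with torch tensors; the function name and exact implementation are assumptions, not the PR code):

```python
import torch

def subtract_upper_masks(masks: list[torch.Tensor]) -> list[torch.Tensor]:
    # Masks are float tensors in [0, 1], ordered from the topmost control layer down.
    # Each mask loses the area already claimed by the layers above it, so overlaps are
    # attributed to the upper layer only; a fully covered layer ends up with an empty mask.
    result = []
    covered = torch.zeros_like(masks[0]) if masks else None
    for mask in masks:
        result.append((mask - covered).clamp(min=0.0))
        covered = (covered + mask).clamp(max=1.0)
    return result
```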

@Danamir (Contributor, Author) commented May 6, 2024

FYI, the trace of the attention couple bug:

  File "F:\ComfyUI\comfy\ldm\modules\diffusionmodules\openaimodel.py", line 44, in forward_timestep_embed
    x = layer(x, context, transformer_options)
  File "F:\ComfyUI\venv\Lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "F:\ComfyUI\venv\Lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "F:\ComfyUI\comfy\ldm\modules\attention.py", line 633, in forward
    x = block(x, context=context[i], transformer_options=transformer_options)
  File "F:\ComfyUI\venv\Lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "F:\ComfyUI\venv\Lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "F:\ComfyUI\custom_nodes\ComfyUI-layerdiffusion\lib_layerdiffusion\attention_sharing.py", line 253, in forward
    return func(self, x, context, transformer_options)
  File "F:\ComfyUI\comfy\ldm\modules\attention.py", line 460, in forward
    return checkpoint(self._forward, (x, context, transformer_options), self.parameters(), self.checkpoint)
  File "F:\ComfyUI\comfy\ldm\modules\diffusionmodules\util.py", line 191, in checkpoint
    return func(*inputs)
  File "F:\ComfyUI\comfy\ldm\modules\attention.py", line 565, in _forward
    n = p(n, extra_options)
  File "F:\ComfyUI\custom_nodes\cgem156-ComfyUI\scripts\attention_couple\node.py", line 98, in attn2_output_patch
    mask_downsample = get_mask(self.mask, self.batch_size, out.shape[1], extra_options["original_shape"])
  File "F:\ComfyUI\custom_nodes\cgem156-ComfyUI\scripts\attention_couple\node.py", line 23, in get_mask
    mask_downsample = mask_downsample.view(num_conds, num_tokens, 1).repeat_interleave(batch_size, dim=0)

RuntimeError: shape '[3, 4161, 1]' is invalid for input of size 756

@Danamir (Contributor, Author) commented May 6, 2024

Yeah, sadly I got the exact same bug when doing a region refine with attention couple, either on a small region requiring an upscale, or on a bigger region not needing one:

With upscale to 1024: [image]

Without upscale: [image]

Other than this bug, it's working as intended. If I force the selection to an accepted resolution like 1152x896 and set grow/feather/padding to 0%, the attention couple affects only the part of the image it should:

[image]

And the masks are correctly cut to size (it was a 50% left/right mask originally):

[image]

@Danamir (Contributor, Author) commented May 7, 2024

I altered the code of apply_attention to return a modified extent object if necessary, with dimensions that are multiples of 64 (the closest multiple above or below). It also returns a boolean telling whether the attention couple was applied, which could be useful to branch some code later.

Now the attention couple is used directly in all modes: generate, refine, refine_region, and inpaint. For the latter I had to skip the attention couple for the very last upscale after crop, but it's only at 40% strength and used only if an upscale is necessary, so it should not be detrimental.

The difference in aspect ratio caused by the multiple-of-64 rounding goes unnoticed in the majority of cases, with a possible exception for very narrow or tall selections, and mostly affects SD 1.5 because of the smaller base size.

The modified code checks for the existence of regions before altering the extent, so it should not affect the general behavior of the plugin once regions are automatically excluded when not applicable to a selection, pending your update. In the meantime, it is applied as long as there is an existing region.
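
As a sketch of the rounding described above (names are assumptions, the real apply_attention change may differ), finding the closest multiple of 64 above or below a dimension could look like this:

```python
def closest_multiple(value: int, multiple: int = 64) -> int:
    # Round to the nearest multiple, above or below, but never below one full multiple.
    return max(multiple, multiple * round(value / multiple))

def attention_extent(width: int, height: int) -> tuple[int, int, bool]:
    # Return the adjusted extent and whether the attention couple changed it.
    new_w, new_h = closest_multiple(width), closest_multiple(height)
    return new_w, new_h, (new_w, new_h) != (width, height)

# e.g. attention_extent(1000, 900) -> (1024, 896, True)
```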

@Danamir (Contributor, Author) commented May 7, 2024

I'm wondering if I should hijack the "Focus" checkbox to switch from attention coupling to a single region prompt finder when refining a region & inpainting. 🤔 This could be useful to prevent the concept bleeding that can occur with another nearby region.

@Danamir (Contributor, Author) commented May 7, 2024

Done.

I think this fits nicely under the same concept. But we could still split the setting into two checkboxes if needed, for example "Focus" and "Single region".

[edit]: the focus mode is causing a bug with SD 1.5 models, I think because of a wrong image type when computing the valid region on the upscaled mask.

@Danamir (Contributor, Author) commented May 7, 2024

Fixed the various hires passes that were not being updated as they should be, and modified the inpaint code to use the attention couple on its hires pass too.

Fixed the wrong image type bug; it was an old portion of code trying to subtract a mask in the region prompt finder. There is still a minor inconvenience where the region mask seems to be the original mask, and not the subtracted one, so the prompt finder may choose the wrong prompt if region masks overlap. To be tested.
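
To illustrate why overlapping masks can mislead the finder, here is a hypothetical sketch of a region prompt finder that simply picks the region whose mask overlaps the selection the most (not the actual plugin code):

```python
import torch

def find_region_prompt(selection: torch.Tensor,
                       regions: list[tuple[str, torch.Tensor]]) -> str:
    # selection and region masks are float tensors in [0, 1] with the same shape.
    # If the original (non-subtracted) masks are used, a region hidden underneath
    # another one can still win here and the wrong prompt gets selected.
    best_prompt, best_overlap = "", 0.0
    for prompt, mask in regions:
        overlap = float((selection * mask).sum())
        if overlap > best_overlap:
            best_prompt, best_overlap = prompt, overlap
    return best_prompt
```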

@Acly (Owner) commented May 7, 2024

The focus mode is intended to restrict the prompt to the selected region, in scenarios where it would otherwise apply to a much bigger context. It basically did regional prompting with 1 region and background.

I was just going to rewire it so it uses the new attention-couple-region mechanism to auto-create a region with the selection as mask. If there are already regions defined, I would consider the prompt "focused", and do nothing.

Filtering out grazing regions with small overlap is a good idea anyway, and if one region covers something like 90% of the selection, just use that without the attention couple.


Regarding multiple of 32 (or 64 now?), there is already a requirement for multiple-of-8 anyway (at VAE encode, because latent is factor 8 smaller resolution). It should be possible to adapt the existing places where this is ensured and use a different value. Still a bit annoying since it will trigger resizes in more cases now. Might take a look at the node at some point to see if it can't use an internal padding or something.

@Danamir (Contributor, Author) commented May 7, 2024

Regarding multiple of 32 (or 64 now?), there is already a requirement for multiple-of-8 anyway (at VAE encode, because latent is factor 8 smaller resolution). It should be possible to adapt the existing places where this is ensured and use a different value. Still a bit annoying since it will trigger resizes in more cases now. Might take a look at the node at some point to see if it can't use an internal padding or something.

I thought it was 32, but it was only working something like 75% of the time. Setting it to 64 seems to have entirely solved this issue... while bringing other ones. I saw the code for the multiple of 8, but decided to leave it alone to prevent unwanted effects, particularly on padding and such. It seems that code looks for the next multiple of 8 above a given dimension. For the multiple of 64, I chose to look for the closest multiple above or below the dimension, to reduce distortion.

An internal padding in the attention couple would be worth a try; the GitHub readme is in Japanese, but the code itself is pretty short: https://github.com/laksjdjf/cgem156-ComfyUI/blob/main/scripts/attention_couple/node.py

@Acly (Owner) commented May 9, 2024

I updated the region filtering. It might still be buggy and I left some prints in there. Also the numbers probably need tweaking. It now does the following, depending on the generation bounds (full image or selection); see the sketch after the list:

  • Remove empty regions
  • Remove regions without prompt/control
  • Do a quick bounding-box check and remove regions with <10% overlap
  • If mask pixels of a region cover >90% of the bounds, use that as exclusive prompt (0 regions, no attention couple)
  • If mask pixels of a region cover <10% of the bounds, remove it
  • If no region has >10% coverage, fall back to regular generate (0 regions, no attention couple)
  • If selected regions don't have 100% coverage, create a background region with the remainder
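
A rough, self-contained sketch of these rules (the Region fields, thresholds, and return convention are illustrative assumptions, not the actual implementation):

```python
from dataclasses import dataclass

@dataclass
class Region:
    prompt: str
    bbox_overlap: float  # fraction from the quick bounding-box check against the bounds
    coverage: float      # fraction of the bounds covered by the region's mask pixels

def filter_regions(regions: list[Region]) -> list[Region] | Region | None:
    # Returns a single Region for the "exclusive prompt" case, a list of regions for
    # the attention couple, or None to fall back to a regular generation.
    candidates = [r for r in regions if r.prompt and r.bbox_overlap >= 0.10]
    for r in candidates:
        if r.coverage > 0.90:
            return r  # one region dominates the bounds: use its prompt alone
    candidates = [r for r in candidates if r.coverage >= 0.10]
    if not candidates:
        return None  # nothing usable: regular generation
    total = sum(r.coverage for r in candidates)
    if total < 1.0:
        # Background region covering the remainder of the bounds.
        candidates.append(Region(prompt="", bbox_overlap=1.0, coverage=1.0 - total))
    return candidates
```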

Also added a button to quickly set up initial groups.

Apply result now sorts the result image into the affected groups and creates transparency masks if needed. Let's see how useful it is. You can do things like easily mix multiple results by toggling them per region, or adjust the region mask to a result you like before refining. Inpaint results that only affect one region are added to that region only, keeping things organized.

I will try to add more ideas/experiments before taking a closer look at corner cases and issues, to get a complete picture of the whole thing before dealing with the details and messy code.

@Danamir (Contributor, Author) commented May 12, 2024

I merged the latest regions branch into the PR. It seems to work with almost no modifications. Two notes:

  • The auto background region is created with the "background" prompt. I guess it should be empty, as this token can really alter the generation.
  • When doing a refine with a small enough selection that only one region would be used, and said selection is on the background, it uses the first region instead of refining the background.

@Danamir (Contributor, Author) commented May 12, 2024

Also, the new way of computing the regions seems to mess with the {prompt} token parsing in the common prompt.

This is due to positive=workflow.merge_prompt("", parent.prompt) when initializing the result in RegionTree.to_api. The merge replaces the {prompt} token with "". A simple positive=parent.prompt is sufficient at this point.
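
For context, a simplified sketch of what merge_prompt presumably does with the placeholder (an assumption, the real implementation may differ):

```python
def merge_prompt(prompt: str, template: str) -> str:
    # Substitute the prompt into the {prompt} placeholder of the template;
    # assumed simplification, not the plugin's actual implementation.
    if "{prompt}" in template:
        return template.replace("{prompt}", prompt)
    return f"{template} {prompt}".strip()

# merge_prompt("", "masterpiece, {prompt}, 4k") -> "masterpiece, , 4k"
# The placeholder is consumed, so a later merge can no longer fill it in.
```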

@Acly (Owner) commented May 12, 2024

A simple positive=parent.prompt is sufficient at this point.

Hm, are you sure? The initial result state is returned in cases where there are no regions at all; in that case the {prompt} should already be "merged away" as it won't happen later. Or so I thought, not sure.

Issue might be that result.prompt is reused later, where perhaps parent.prompt should be used.

Anyway, I will definitely revisit that code. One thing I still want to try out before doing some cleanup is a more explicit way of assigning regions to layers. Maybe allow layers other than groups to be regions too, with destructive updates. But it shouldn't require any extra steps.

Having multiple layers linked to the same region prompt could also be useful. Like I tried to draw/prompt something, then I want it to appear more than once in the image, so intuitively I copy/paste the layer. Would be cool if the copies are their own region following the same prompt. Will be hard to detect automatically though.

@Danamir (Contributor, Author) commented May 12, 2024

in that case the {prompt} should already be "merged away" as it won't happen later.

The merge_prompt is used later in encode_attention_text_prompt in my code, and expects the {prompt} to be available. I use it with region.positive and cond.positive variables.

Otherwise in RegionTree.to_api, merge_prompt is used once in the coverage > 0.9 case, and once when initializing the positive prompt. But using it with "" as first parameter is a sure way to replace {prompt} by nothing. 😅

I also see an unexpected behavior where transparency masks are added to the regions, and strange things then occur in the rendering. I had some cases where the paint layer color was used as the basis for the attention couple when refining the entire image. [edit]: I cannot reproduce this bug. But when I update a region paint layer to alter the mask, the generated transparency mask is not updated to match the new mask.

@Acly (Owner) commented May 12, 2024

But using it with "" as first parameter is a sure way to replace {prompt} by nothing.

That was the point. For cases where no regions are used the placeholder makes no sense.

Regarding transparency masks. You can indicate your region directly with a transparency mask. If you use a paint layer, the transparency mask is created automatically on apply (it is required for correct blending). You have to edit the mask to extend or shrink the region.

Transparency masks are also the only way to make the region bigger after generating. If the alpha were applied directly to the generated result, you wouldn't be able to "unerase" it; a transparency mask allows that.

@Danamir (Contributor, Author) commented May 13, 2024

That was the point. For cases where no regions are used the placeholder makes no sense.

I see your point; it is correct if we remove the prompt merging from workflow.apply_attention and do it entirely in the RegionTree.to_api method.

I simplified the apply_attention method by removing the prompt computations and the unnecessary negative prompt CLIP encoding. Then I altered the to_api method to merge the region positive prompts with the root positive. This should be more robust.

@Danamir (Contributor, Author) commented May 13, 2024

I had a case where my background region was < 10% coverage and was being ignored, even though it had a significant impact when it was used previously. I would propose setting the region cutoff at 5% coverage.

@Danamir (Contributor, Author) commented May 13, 2024

The Conditioning input of apply_attention does not need to be in/out anymore. This simplifies the code a little bit.
