
Is memory-efficient attention enabled by default if I don't use flash attention? #48

Open
wac81 opened this issue Apr 7, 2023 · 3 comments

wac81 commented Apr 7, 2023

Or, if I want to use memory-efficient attention, do I have to call scaled_dot_product_attention myself?

PyTorch 2.0 includes an optimized and memory-efficient attention implementation through the torch.nn.functional.scaled_dot_product_attention function.
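
For reference, a minimal sketch of how that function is called (tensor shapes and the causal flag here are illustrative, not taken from this repo):

```python
import torch
import torch.nn.functional as F

# Illustrative shapes: (batch, heads, seq_len, head_dim).
q = torch.randn(2, 8, 1024, 64)
k = torch.randn(2, 8, 1024, 64)
v = torch.randn(2, 8, 1024, 64)

# PyTorch selects a backend (flash / memory-efficient / math) automatically.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```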

conceptofmind (Contributor) commented Apr 9, 2023

PyTorch 2.0 will automatically select the most appropriate attention implementation based on your system specs.

All implementations are enabled by default, and scaled_dot_product_attention attempts to select the most optimal one based on the inputs.

scaled_dot_product_attention is used in this repo. If you have flash set to True but do not have an A100, it should fall back to memory-efficient attention, math, or the CPU implementation.
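
One way to check which backends are usable is to restrict scaled_dot_product_attention to a single backend with the torch.backends.cuda.sdp_kernel context manager and see whether the call succeeds; timing each one the same way tells you which is fastest for your shapes. A minimal sketch (shapes and dtype are illustrative; the flash and memory-efficient kernels require CUDA and fp16/bf16 inputs):

```python
import torch
import torch.nn.functional as F
from torch.backends.cuda import sdp_kernel

# Illustrative inputs: (batch, heads, seq_len, head_dim) in fp16 on the GPU.
q = torch.randn(1, 8, 1024, 64, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

backends = {
    "flash":         dict(enable_flash=True,  enable_mem_efficient=False, enable_math=False),
    "mem_efficient": dict(enable_flash=False, enable_mem_efficient=True,  enable_math=False),
    "math":          dict(enable_flash=False, enable_mem_efficient=False, enable_math=True),
}

for name, flags in backends.items():
    try:
        # Restrict SDPA to one backend; a RuntimeError means that backend
        # cannot handle these inputs on this hardware.
        with sdp_kernel(**flags):
            F.scaled_dot_product_attention(q, k, v)
        print(f"{name}: supported")
    except RuntimeError as err:
        print(f"{name}: not supported ({err})")
```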

wac81 (Author) commented Apr 24, 2023

Thanks, but how do I know which one is being used? How can I check it?

wac81 (Author) commented Apr 24, 2023

Is memory_efficient_attention from xformers a faster implementation than the PyTorch one? Any idea?
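
Speed depends on the GPU, shapes, dtype, and library versions, so the most reliable answer is to benchmark both on your own inputs. A minimal sketch, assuming xformers is installed (shapes and dtype are illustrative; note the two APIs use different tensor layouts):

```python
import torch
import torch.nn.functional as F
import xformers.ops as xops

# Illustrative shapes: batch=2, heads=8, seq_len=1024, head_dim=64.
B, H, S, D = 2, 8, 1024, 64
q = torch.randn(B, H, S, D, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# PyTorch SDPA expects (batch, heads, seq, head_dim).
out_torch = F.scaled_dot_product_attention(q, k, v)

# xformers expects (batch, seq, heads, head_dim), so transpose first.
q_x, k_x, v_x = (t.transpose(1, 2) for t in (q, k, v))
out_xformers = xops.memory_efficient_attention(q_x, k_x, v_x).transpose(1, 2)

# Outputs should match to within fp16 tolerance; wrap each call with
# torch.utils.benchmark.Timer (or CUDA events) to compare speed.
print(torch.allclose(out_torch, out_xformers, atol=1e-2, rtol=1e-2))
```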
