Support Dynamo level Caching #125958

Open
JackCaoG opened this issue May 10, 2024 · 1 comment
Labels
module: startup-tracing-compile Compilation mechanism or time spent in (re)compilation, tracing, startup oncall: pt2 triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Comments

@JackCaoG
Collaborator

JackCaoG commented May 10, 2024

🚀 The feature, motivation and pitch

torch.compile can take on the order of seconds to compile a decent-size model like Llama2 7B with an aot_autograd-enabled backend. Note that this figure only includes the Dynamo + aot_autograd time; it does not include the backend compiler (like Inductor) compilation time. It would be ideal if Dynamo could cache the torch.compile result to speed up development.

We (PyTorch/XLA) are trying to integrate with vLLM. @WoosukKwon reports that during the vLLM warm-up phase, it needs to pre-compile ~30 different input shape combinations. PyTorch/XLA does not support dynamic shapes today, so torch.compile recompiles the model code for each shape combination, which slows down development (@WoosukKwon needs to wait 10 minutes for warm-up to finish). PyTorch/XLA already caches the XLA compilation, but torch.compile itself is pretty expensive.
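To make the request concrete, here is a toy, pure-Python model of the behavior being asked for. It is not Dynamo's or PyTorch/XLA's API: `ShapeKeyedCompileCache`, `warm_up`, and `demo` are hypothetical names, the "compile" step is a stand-in counter, and the 6 x 5 shape grid is an assumed way to arrive at ~30 combinations. The idea it sketches: with static shapes, each shape combination costs one compile, but a cache persisted on disk and keyed by (model source, input shapes) could make a second warm-up skip all of them.

```python
import hashlib
import json
import os
import tempfile


class ShapeKeyedCompileCache:
    """Toy persistent cache keyed by model source and input shapes (hypothetical)."""

    def __init__(self, cache_dir):
        self.cache_dir = cache_dir
        self.compile_count = 0  # number of times the expensive path actually ran
        os.makedirs(cache_dir, exist_ok=True)

    def _key(self, source, shapes):
        # Static shapes are part of the key, so every new shape is a cache miss.
        payload = json.dumps({"source": source, "shapes": shapes}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def _expensive_compile(self, source, shapes):
        # Stand-in for the tracing + aot_autograd work; only counts invocations.
        self.compile_count += 1
        return {"source": source, "shapes": shapes}

    def get_or_compile(self, source, shapes):
        path = os.path.join(self.cache_dir, self._key(source, shapes) + ".json")
        if os.path.exists(path):
            # Cache hit: load the stored artifact instead of recompiling.
            with open(path) as f:
                return json.load(f)
        artifact = self._expensive_compile(source, shapes)
        with open(path, "w") as f:
            json.dump(artifact, f)
        return artifact


def warm_up(cache, source, shape_combinations):
    """Pre-compile every shape combination; return how many real compiles ran."""
    for shapes in shape_combinations:
        cache.get_or_compile(source, shapes)
    return cache.compile_count


def demo():
    # Assumed grid: 6 batch sizes x 5 sequence lengths = 30 combinations.
    shape_combinations = [
        [[batch, seq_len]]
        for batch in (1, 2, 4, 8, 16, 32)
        for seq_len in (16, 32, 64, 128, 256)
    ]
    source = "def forward(x): return x * 2"  # placeholder model source
    cache_dir = tempfile.mkdtemp()

    # First run: cold cache, every shape combination triggers a compile.
    cold = warm_up(ShapeKeyedCompileCache(cache_dir), source, shape_combinations)
    # Second run (a fresh cache object, as in a new process): all hits.
    warm = warm_up(ShapeKeyedCompileCache(cache_dir), source, shape_combinations)
    return cold, warm
```

Calling `demo()` in this sketch runs 30 compiles on the cold pass and none on the warm pass. A real implementation would also have to key on everything else that affects tracing (guards, config, library versions), which this toy ignores.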

Alternatives

Reduce torch.compile time for models where only the batch dimension changes.

Additional context

cc @ezyang @msaroufim @bdhirsh @anijain2305 @chauhang @yanboliang

@msaroufim msaroufim added module: startup-tracing-compile Compilation mechanism or time spent in (re)compilation, tracing, startup triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module labels May 11, 2024
@ezyang
Contributor

ezyang commented May 11, 2024

cc @jamesjwu @oulgen

AOTAutograd caching will help here.

4 participants