Support Dynamo-level caching #125958
Labels
module: startup-tracing-compile
Compilation mechanism or time spent in (re)compilation, tracing, startup
oncall: pt2
triaged
This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
🚀 The feature, motivation and pitch
`torch.compile` can take on the order of seconds to compile a decently sized model such as Llama2 7B with an `aot_autograd`-enabled backend. Note that this only includes the `dynamo` + `aot_autograd` time; it does not include the backend compiler (e.g. inductor) compilation time. It would be ideal if dynamo could cache the `torch.compile` result to speed up development.

We (PyTorch/XLA) are trying to integrate with vLLM. @WoosukKwon reports that in vLLM's warm-up phase, it needs to pre-compile ~30 different input shape combinations. PyTorch/XLA does not support dynamic shapes today, so `torch.compile` keeps recompiling the model code, which slows down development (@WoosukKwon needs to wait 10 minutes before warm-up finishes). PyTorch/XLA already caches the XLA compilation, but `torch.compile` itself is pretty expensive; a minimal repro sketch of the pattern follows.
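A minimal sketch of the recompilation cost being described (the toy `nn.Linear` model, sizes, and batch list here are placeholders, not the actual vLLM/Llama2 setup): with `dynamic=False`, every new input shape triggers a fresh dynamo trace, which is exactly the work a dynamo-level cache could amortize across runs.

```python
# Sketch: each new batch size pays the full torch.compile cost again.
# Assumptions: CPU-only toy model standing in for a real workload.
import time
import torch

model = torch.nn.Linear(1024, 1024)

# dynamic=False mirrors a backend without dynamic-shape support:
# every new input shape specializes the graph and retraces.
compiled = torch.compile(model, dynamic=False)

for batch in (1, 2, 4, 8):  # stand-in for vLLM's ~30 shape combinations
    x = torch.randn(batch, 1024)
    start = time.perf_counter()
    compiled(x)  # first call per shape pays the full compile cost
    print(f"batch={batch}: {time.perf_counter() - start:.3f}s")
```

On a second run of the same script, all of this tracing is paid again from scratch, which is the gap a persistent dynamo-level cache would close.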
Alternatives
Reduce `torch.compile` time for a model where only the batch dimension changes.
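For contrast (not an option for PyTorch/XLA today, since it lacks dynamic-shape support), backends that do handle dynamic shapes can sidestep batch-dimension recompiles by marking that dimension dynamic. A sketch, again with a placeholder toy model:

```python
# Sketch: compile once for a dynamic batch dimension, reuse across shapes.
import torch
import torch._dynamo

model = torch.nn.Linear(1024, 1024)
compiled = torch.compile(model)

for batch in (1, 2, 4, 8):
    x = torch.randn(batch, 1024)
    torch._dynamo.mark_dynamic(x, 0)  # treat dim 0 (batch) as dynamic
    compiled(x)  # traced once; later batch sizes reuse the dynamic graph
```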
Additional context
cc @ezyang @msaroufim @bdhirsh @anijain2305 @chauhang @yanboliang