Activation checkpointing is a technique to reduce the memory footprint on a single GPU by trading computation for memory. When an activation checkpoint is applied to a group of consecutive layers, only the output of the last layer is cached for the backward computation; the other intermediate outputs are not stored during the forward pass. During the backward pass, re-computation is triggered to obtain those intermediate outputs temporarily for gradient computation. As a result, the memory consumed by intermediate activations can be significantly reduced, making more memory available to accommodate larger models. Because this technique does not shard tensors, it stays orthogonal to other parallelization techniques.
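The recompute-in-backward idea can be sketched without any framework. Below is a minimal, self-contained illustration (not the actual PyTorch API) using hypothetical scalar "layers" with hand-written gradients: the checkpointed segment caches only its input and final output, then re-runs the forward pass during backward to recover the discarded intermediates.

```python
# Hypothetical sketch of activation checkpointing with scalar "layers".
# Each layer squares its input; gradients are computed by hand so the
# example is fully self-contained.

def layer_fwd(x):
    return x * x              # f(x) = x^2

def layer_grad(x, upstream):
    return upstream * 2 * x   # chain rule: df/dx = 2x

def checkpointed_forward(x, n_layers):
    # Forward pass through the segment; intermediates are discarded,
    # and only the segment's input is kept as the "checkpoint".
    out = x
    for _ in range(n_layers):
        out = layer_fwd(out)
    return out, x             # (segment output, cached segment input)

def checkpointed_backward(cache, n_layers, upstream):
    # Recompute the intermediates that were not stored in forward.
    x = cache
    intermediates = [x]
    for _ in range(n_layers - 1):
        x = layer_fwd(x)
        intermediates.append(x)
    # Backpropagate through the recomputed activations.
    grad = upstream
    for inp in reversed(intermediates):
        grad = layer_grad(inp, grad)
    return grad

out, cache = checkpointed_forward(3.0, 2)    # (3^2)^2 = 81
grad = checkpointed_backward(cache, 2, 1.0)  # d/dx x^4 = 4x^3 = 108
```

In PyTorch this pattern is provided by `torch.utils.checkpoint.checkpoint`, which wraps a segment of the model and performs the same discard-and-recompute bookkeeping automatically.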
This issue tracks finding the best strategy for deciding where to place activation checkpoints in the training pipeline so as to minimize running time under a given memory budget.