
Issue with Increasing VRAM/Shared GPU Memory Usage During Training on EfficientVIT-M2 and EfficientNet_lite0 #2128

Answered by thoj
thoj asked this question in Q&A

After encountering significant VRAM overflow issues during the training of an EfficientVIT-M2 model, I developed a workaround. It's important to note that my explanation for why this solution works is based on a theory regarding the NVIDIA driver's memory management behavior.

I theorize that the underlying issue arises from the NVIDIA driver's memory manager on Windows, which appears to optimize VRAM usage by preemptively transferring data to shared GPU memory. This seems to happen to prevent complete VRAM saturation: the process starts when VRAM usage is just shy of its maximum capacity (around 9.8GB in my scenario), leaving about 200MB of VRAM "free." PyTorch, recogniz…
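
Since the original workaround text is cut off above, the following is only a minimal sketch of one mitigation consistent with that theory, not necessarily the author's actual fix: capping PyTorch's CUDA allocator below the point where the driver begins spilling VRAM into shared memory. The 0.9 fraction and the single-GPU device index are assumptions to be tuned per setup.

```python
import torch

# Minimal sketch (assumption, not necessarily the author's workaround):
# cap the CUDA caching allocator below the threshold at which the Windows
# driver starts offloading VRAM to shared GPU memory (system RAM).
if torch.cuda.is_available():
    device = 0  # assumed single-GPU setup
    total_gb = torch.cuda.get_device_properties(device).total_memory / 1e9
    # Keep allocations at ~90% of VRAM; tune the fraction for your card
    # (e.g. a cap a little below the ~9.8GB spill point observed above).
    torch.cuda.set_per_process_memory_fraction(0.9, device=device)
    print(f"Total VRAM: {total_gb:.1f} GB; PyTorch allocations capped at 90%")
```

Recent NVIDIA drivers also expose a "CUDA - Sysmem Fallback Policy" setting in the NVIDIA Control Panel that can disable this spill-over per application; whether that matches the author's actual workaround cannot be determined from the truncated text.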

Replies: 1 comment

Answer selected by thoj