Performance improvement to improve training time #11718
Conversation
@Laughing-q interesting training speedup PR. I think we'd like to merge this without the
Codecov Report
Attention: Patch coverage is
Additional details and impacted files:

```
@@            Coverage Diff            @@
##             main   #11718      +/-  ##
=========================================
- Coverage   74.59%   70.41%   -4.19%
=========================================
  Files         124      124
  Lines       15664    15693      +29
=========================================
- Hits        11685    11050     -635
- Misses       3979     4643     +664
```

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
Thanks for the update! It's great to hear that the performance improved even without numba. If you can share the specific metrics or any additional insights from your latest tests, that would be helpful for finalizing the merge. Let's aim for the best balance of dependency minimization and performance enhancement. 🚀
In short my main changes currently are:
I don't have any specific metrics outside of wall-clock time for the epochs. Is there something specific the team would want?
Thanks for detailing the changes! The approach sounds solid, particularly your method to skip transformations when their probability is zero—definitely a smart optimization. 😊 For metrics, if you could provide us with a comparison in wall-clock time between the main branch and your changes (i.e., how much total time each epoch takes on average) across a few runs, that would be ideal. This will help us quantitatively assess the impact of your improvements. Keep up the fantastic work! Looking forward to integrating these enhancements. 🌟
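The per-epoch wall-clock comparison requested above could be collected with a small harness like this sketch; `run_epoch` is a hypothetical stand-in for whatever callable runs one training epoch:

```python
import time

def time_epochs(run_epoch, n_epochs):
    """Return wall-clock seconds for each call to run_epoch.

    run_epoch is a hypothetical placeholder for one training epoch;
    any zero-argument callable works.
    """
    times = []
    for _ in range(n_epochs):
        t0 = time.perf_counter()
        run_epoch()
        times.append(time.perf_counter() - t0)
    return times

# Placeholder workload; swap in the real per-epoch training call.
times = time_epochs(lambda: time.sleep(0.01), 3)
print(f"avg epoch time: {sum(times) / len(times):.3f}s")
```

Running the same harness on both branches and averaging over a few runs gives the comparison the maintainers asked for.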
@edkazcarlson Hi, thanks for the PR!

```
yolo train detect data=coco.yaml model=yolov8m.yaml batch=64 epochs=4 close_mosaic=2 device=0,1,2,3
```
@Laughing-q Thank you for the tests on your hardware. Could you try doing this through Python itself? I'm not sure where exactly the yolo command is pointing; could it be that it's pointing at the install you have through pip and not my branch? (can you confirm through using
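One way to answer the "which install is Python actually resolving" question above is to look up the module's filesystem origin. A minimal sketch using only the standard library (demonstrated on `json` here, since `ultralytics` may not be installed in every environment):

```python
import importlib.util

def module_path(name):
    """Return the filesystem location of an importable module, or None."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec else None

# e.g. module_path("ultralytics") would reveal whether Python resolves
# the pip-installed package or a local checkout of the PR branch.
print(module_path("json"))
```

If the printed path points into site-packages rather than the branch checkout, the benchmark was run against the pip install.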
Also, a side question: I know numba got denied due to hardware compatibility reasons, but does the team accept Cython improvements?
Hi there! Thanks for your continued contributions and for checking in about Cython. Yes, we're open to considering Cython improvements as they can be a great way to enhance performance while maintaining compatibility across various hardware setups. If you have specific optimizations in mind using Cython, feel free to share them or open a PR. We'd love to take a look! 🚀 |
Thank you for your patience. After a number of different tests, I realized that a huge portion of my performance gains were likely caused by the (slightly) better cooling my laptop stand provides: I was getting 6-8% better performance with the stand than without it. Slightly embarrassing, but I'll try to keep this in mind for any future performance work I do; apologies for the initial confusion and the overblown estimate.

When comparing with and without my changes, though, I am still getting ~5% faster with the laptop stand and ~3.5% without it. I used the following code (6 epochs, 3 with mosaic, 3 without) to test. For completeness, with the stand my changes took 9223s and 9311s (avg 9267s) vs main taking 9697s and 9804s (avg 9750s), meaning with my changes training took ~95.0% as long.
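The arithmetic behind the quoted averages and ratio can be double-checked directly from the reported run times:

```python
# Run times (seconds per 6-epoch run) as reported in the comment above.
with_changes = [9223, 9311]  # PR branch, with laptop stand
baseline = [9697, 9804]      # main branch, with laptop stand

def avg(xs):
    return sum(xs) / len(xs)

ratio = avg(with_changes) / avg(baseline)
print(f"{avg(with_changes):.1f}s vs {avg(baseline):.1f}s "
      f"-> {ratio:.2%} of baseline time")
```

The ratio comes out just over 95%, matching the ~5% speedup claimed for the stand configuration.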
Thank you for the detailed update and for your honesty about the cooling factor—it's an interesting observation that highlights how various environmental factors can impact performance testing. 🌡️ Your continued efforts and the results you've shared are valuable. A consistent 3.5% to 5% improvement in training time is still quite significant, especially when scaled across multiple training sessions and models. It's clear that your changes are having a positive impact, even if the initial estimates were influenced by external factors. Let's proceed with integrating your changes into the main branch. This will allow us to benefit from these improvements and also keep the project moving forward efficiently. Great work, and looking forward to more of your contributions! 🚀
In this PR I am cleaning up some existing code to help improve training times.
The main speedups come from not running transforms that are configured with a 0% probability of applying, but I also introduced JIT-compiled methods with the numba library to handle some operations that run frequently.
If the team doesn't want to use numba (code clutter, licensing, etc.), I'll remove it; while the numba methods did introduce some speedup, the majority came from the changes to the image transformations.
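The zero-probability skip could look roughly like the sketch below. `RandomFlip` and `build_pipeline` are hypothetical toy names, not the actual ultralytics augmentation classes; the point is only the filtering idea:

```python
import random

class RandomFlip:
    """Toy stand-in for an augmentation that fires with probability p."""
    def __init__(self, p=0.5):
        self.p = p

    def __call__(self, sample):
        return sample[::-1] if random.random() < self.p else sample

def build_pipeline(transforms):
    # A transform with p == 0 can never fire, so dropping it up front
    # removes a random() call and a branch for every sample in the hot loop.
    return [t for t in transforms if getattr(t, "p", 1.0) > 0]

pipeline = build_pipeline([RandomFlip(p=0.0), RandomFlip(p=0.5)])
print(len(pipeline))  # only the p=0.5 transform remains
```

Filtering once at pipeline construction time means the per-sample cost of a disabled transform drops to zero instead of being paid on every image.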
Tested with the following code
Overall, with my changes it took 9754 seconds to train for 6 epochs (3 with mosaic, 3 without), versus 10531 seconds without my changes (roughly a 7% reduction).
🛠️ PR Summary
Made with ❤️ by Ultralytics Actions
🌟 Summary
Optimizations and improvements in YOLOv8 model processing and data augmentation techniques.
📊 Key Changes
🎯 Purpose & Impact