⚡️ Speed up `load_data()` by 75% in `embedchain/loaders/youtube_channel.py` #1267
Description
📄 `load_data()` in `embedchain/loaders/youtube_channel.py`

📈 Performance went up by 75% (about 1.75x as fast)

⏱️ Runtime went down from 14436710.94μs to 8248466.69μs (about 14.44 s to 8.25 s)
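The headline numbers follow directly from the two runtimes: old runtime divided by new runtime gives the speedup, the "75%" figure is that speedup minus one, and the drop in wall-clock time is a different (smaller) percentage. The snippet below reuses only the runtimes reported above; the variable names are illustrative.

```python
# Runtimes reported in this PR, in microseconds.
before_us = 14436710.94
after_us = 8248466.69

speedup = before_us / after_us                        # times as fast as before
gain_pct = (speedup - 1) * 100                        # "performance went up by"
runtime_cut_pct = (1 - after_us / before_us) * 100    # reduction in wall-clock time

print(f"speedup: {speedup:.2f}x")                     # speedup: 1.75x
print(f"performance gain: {gain_pct:.0f}%")           # performance gain: 75%
print(f"runtime reduction: {runtime_cut_pct:.1f}%")   # runtime reduction: 42.9%
```

In other words, "75% faster" means the code runs about 1.75 times as fast, which corresponds to cutting the runtime by roughly 43%, not by 75%.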
Explanation and details
There is not much unnecessary overhead in the provided code. However, I've made some minor adjustments to remove redundant operations and improve readability. Note that Python is an interpreted language, so it is inherently slower than languages such as C++ or Java.
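The optimization keeps the existing `ThreadPoolExecutor` to load each video's data concurrently and handles errors outside the innermost loop. A minimal sketch of that pattern follows, assuming a per-video loader that returns a list of documents. `load_video()` is a hypothetical stub that sleeps to mimic network I/O and fails on URLs containing "bad"; neither it nor the `load_data()` shown here is embedchain's actual code or the code in this PR.

```python
import concurrent.futures
import logging
import time

logger = logging.getLogger(__name__)


def load_video(url: str) -> list[dict]:
    """Hypothetical per-video loader (stand-in for the real YouTube loader).

    Sleeps to simulate network latency and raises for URLs containing "bad".
    """
    time.sleep(0.1)  # stand-in for network I/O
    if "bad" in url:
        raise ValueError(f"could not load {url}")
    return [{"content": f"transcript of {url}", "meta_data": {"url": url}}]


def load_data(video_urls: list[str], max_workers: int = 8) -> list[dict]:
    """Load all videos concurrently and collect documents from those that succeed."""
    data: list[dict] = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Submit every job up front; keep a map from future back to its URL for logging.
        future_to_url = {executor.submit(load_video, url): url for url in video_urls}
        for future in concurrent.futures.as_completed(future_to_url):
            url = future_to_url[future]
            # A single try/except per video, at the point where results are collected,
            # rather than inside the per-document work.
            try:
                data.extend(future.result())
            except Exception as exc:
                logger.warning("Failed to load %s: %s", url, exc)
    return data


if __name__ == "__main__":
    docs = load_data(["video-1", "video-2", "bad-video"])
    print(len(docs))
```

Threads help here because each load spends most of its time waiting on I/O, and blocking I/O (like `time.sleep` in the stub) releases the GIL, so the three simulated loads overlap instead of running back to back. Because `as_completed` yields futures in completion order, the returned list is not guaranteed to follow the input order.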
The optimized version simplifies the original logic and should behave identically. It still uses `ThreadPoolExecutor` to load the data for each video in the list concurrently across multiple threads. It also reduces the number of try/except blocks by moving them out of the innermost loop, which may reduce exception-handling overhead and improve performance.

Type of change
How Has This Been Tested?
The new optimized code was tested for correctness. The results are listed below.
✅ 4 Passed − 🌀 Generated Regression Tests
Checklist:
Maintainer Checklist