-
Notifications
You must be signed in to change notification settings - Fork 61
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: RND-48: Parallel sync runtime #99
base: master
Are you sure you want to change the base?
Conversation
elif self.concurrency > 1: | ||
# run batch processing each row in a parallel way, using a fixed number of CPUs | ||
logger.info(f"Running batch processing in parallel using {self.concurrency} CPUs") | ||
pandarallel.initialize(nb_workers=self.concurrency, progress_bar=self.verbose) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would still like to get a comment here explaining the doubled memory footprint, but not PR-blocking worthy
nbd if we don't have numbers for this, but would be nice to see what the speedup is here, if you have even some rough/anecdotal tests |
""" | ||
|
||
verbose: bool = False | ||
batch_size: Optional[int] = None | ||
concurrency: Optional[int] = Field(default=-1, alias='concurrent_clients') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a nitpick, but calling this var "concurrency" for both async and sync runtimes is possibly confusing, for async runtimes this is the level of concurrency and for sync runtimes it's the level of parallelism. Technically what we usually call parallelism in python is "concurrency plus parallelism", while what we call concurrency is "concurrency without parallelism". I can't think of a better name though, I guess it's okay 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I'd agree with your descriptions, and while I agree that we typically mean something more specific when we say "concurrency", imo it's fair enough to use "concurrency" as the umbrella term (since it technically is, and I also can't think of a better umbrella term 🙂 )
Nikolai, if you have a better term up your sleeve, fantastic, but I'm personally good with "concurrency" too, fwiw
concurrency
parameter top-level forRuntime
base class