
feat: RND-48: Parallel sync runtime #99

Draft
wants to merge 4 commits into master
Conversation

@niklub (Contributor) commented Apr 25, 2024

  • Added parallel execution by default for sync runtimes
  • Moved the concurrency parameter to the top level of the Runtime base class
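For illustration, a minimal sketch of what a top-level concurrency field on a runtime base class could look like; `Runtime` and `SyncRuntime` below are hypothetical stand-ins, not the actual adala classes:

```python
# Hypothetical sketch only: a top-level `concurrency` setting shared by all
# runtime subclasses, defaulting to -1 ("pick automatically").
from dataclasses import dataclass
from typing import Optional


@dataclass
class Runtime:
    verbose: bool = False
    batch_size: Optional[int] = None
    concurrency: int = -1  # -1: use all available workers


@dataclass
class SyncRuntime(Runtime):
    """Synchronous runtime; inherits `concurrency` from the base class."""


rt = SyncRuntime(concurrency=4)
print(rt.concurrency)  # 4
```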

robot-ci-heartex temporarily deployed to fb-parallel-sync-runtime April 25, 2024 10:27 (destroyed)
robot-ci-heartex temporarily deployed to fb-parallel-sync-runtime April 25, 2024 13:15 (destroyed)
niklub changed the title from "feat: Parallel sync runtime" to "feat: RND-48: Parallel sync runtime" Apr 25, 2024
Review threads on adala/runtimes/_openai.py and adala/runtimes/base.py (three threads) were marked outdated and resolved.
robot-ci-heartex temporarily deployed to fb-parallel-sync-runtime April 25, 2024 15:52 (destroyed)
elif self.concurrency > 1:
    # process each row of the batch in parallel, using a fixed number of CPU workers
    logger.info(f"Running batch processing in parallel using {self.concurrency} CPUs")
    pandarallel.initialize(nb_workers=self.concurrency, progress_bar=self.verbose)
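The snippet above delegates the fan-out to pandarallel, which spawns worker processes once initialized. As a dependency-free sketch of the same branching idea using only the standard library (function names here are illustrative, not adala's API):

```python
# Stdlib stand-in for the pandarallel branch: fan each row of a batch out to
# worker processes when concurrency > 1, otherwise process rows inline.
from concurrent.futures import ProcessPoolExecutor


def process_row(row: str) -> str:
    # stand-in for the per-row model call
    return row.upper()


def run_batch(rows, concurrency: int = -1):
    if concurrency > 1:
        # note: each worker process receives its own copy of the rows it
        # handles, which is one source of the extra memory footprint raised
        # in the review discussion
        with ProcessPoolExecutor(max_workers=concurrency) as pool:
            return list(pool.map(process_row, rows))
    return [process_row(r) for r in rows]
```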
Would still like to see a comment here explaining the doubled memory footprint, but it's not worth blocking the PR over

@pakelley (Contributor) commented Apr 25, 2024

No big deal if we don't have numbers for this, but it would be nice to see what the speedup is here, even from some rough/anecdotal tests
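In the absence of real benchmarks, a rough way to collect anecdotal speedup figures is to time the same batch serially and with a worker pool. This is purely an illustrative sketch: the sleep stands in for request latency, and all names are hypothetical.

```python
# Time the same batch with 1 worker vs N workers to get a rough speedup figure.
import time
from concurrent.futures import ThreadPoolExecutor


def fake_llm_call(record):
    time.sleep(0.05)  # simulate I/O-bound request latency
    return record


def timed_run(records, workers):
    start = time.perf_counter()
    if workers > 1:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            list(pool.map(fake_llm_call, records))
    else:
        for r in records:
            fake_llm_call(r)
    return time.perf_counter() - start


records = list(range(8))
serial = timed_run(records, workers=1)
parallel = timed_run(records, workers=8)
print(f"serial {serial:.2f}s, parallel {parallel:.2f}s, speedup {serial / parallel:.1f}x")
```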

"""

verbose: bool = False
batch_size: Optional[int] = None
concurrency: Optional[int] = Field(default=-1, alias='concurrent_clients')
Maybe a nitpick, but calling this var "concurrency" for both async and sync runtimes is possibly confusing: for async runtimes it is the level of concurrency, while for sync runtimes it is the level of parallelism. Technically, what we usually call parallelism in Python is "concurrency plus parallelism", while what we call concurrency is "concurrency without parallelism". I can't think of a better name though, so I guess it's okay 🙂

yeah, I'd agree with your descriptions, and while we typically mean something more specific when we say "concurrency", imo it's fair enough to use "concurrency" as the umbrella term (since it technically is one, and I also can't think of a better umbrella term 🙂).
Nikolai, if you have a better term up your sleeve, fantastic, but I'm personally good with "concurrency" too, fwiw
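To make the terminology concrete in stdlib terms: an async runtime's "concurrency" is typically a cap on in-flight tasks on a single thread (e.g. a semaphore), whereas a sync runtime's setting controls the number of worker processes. A hedged sketch of the async half, with illustrative names only:

```python
# Concurrency without parallelism: many tasks interleaved on one thread,
# with a semaphore capping how many are in flight at once.
import asyncio


async def fetch(i, sem):
    async with sem:  # at most `concurrency` tasks in flight at a time
        await asyncio.sleep(0.01)  # stand-in for an awaited API call
        return i


async def run_async(n, concurrency):
    sem = asyncio.Semaphore(concurrency)
    # gather preserves argument order, regardless of completion order
    return await asyncio.gather(*(fetch(i, sem) for i in range(n)))


results = asyncio.run(run_async(5, concurrency=2))
print(results)  # [0, 1, 2, 3, 4]
```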

robot-ci-heartex marked this pull request as draft April 26, 2024 08:05

4 participants