Improve passing large _Batch to steps #488

Open
gabrielmbmb opened this issue Mar 27, 2024 · 1 comment
gabrielmbmb commented Mar 27, 2024

The mp.Queue that we're using to pass data between steps is very slow when it has to send a lot of data (for example, when accumulating embeddings from the GenerateEmbeddings step for the DeitaFiltering step).

We need a better way to pass large batches: probably by dumping the content to a file in some storage so the receiving step can read it back, or by using a different queue implementation such as https://github.com/alex-petrenko/faster-fifo
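A minimal sketch of the file-based option, under stated assumptions: the batch is picklable, both steps can see the same directory, and only the file path travels through the queue. The helper names `send_batch` and `receive_batch` and the dict layout of the batch are hypothetical and not part of the existing code.

```python
import multiprocessing as mp
import os
import pickle
import tempfile
import uuid
from typing import Any


def send_batch(queue: Any, batch: Any, directory: str) -> str:
    """Write `batch` to a file in `directory` and enqueue only its path."""
    path = os.path.join(directory, f"batch-{uuid.uuid4().hex}.pkl")
    with open(path, "wb") as f:
        pickle.dump(batch, f, protocol=pickle.HIGHEST_PROTOCOL)
    # The queue carries a short string, whatever the size of the batch.
    queue.put(path)
    return path


def receive_batch(queue: Any, timeout: float = 10.0) -> Any:
    """Dequeue a file path, load the batch from it, then delete the file."""
    path = queue.get(timeout=timeout)
    with open(path, "rb") as f:
        batch = pickle.load(f)
    os.remove(path)
    return batch


if __name__ == "__main__":
    # Stand-in for a large batch of embeddings (hypothetical layout).
    batch = {"seq_no": 0, "embeddings": [[0.1] * 768 for _ in range(1_000)]}
    with tempfile.TemporaryDirectory() as tmp_dir:
        queue = mp.Queue()
        send_batch(queue, batch, tmp_dir)
        assert receive_batch(queue) == batch
```

The trade-off is extra disk I/O and the need to clean up files if the receiving step never reads them; whether it beats sending the data through the queue directly would have to be measured.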

@gabrielmbmb gabrielmbmb self-assigned this Mar 27, 2024
@gabrielmbmb gabrielmbmb added this to the 1.1.0 milestone Mar 27, 2024
@gabrielmbmb (Member, Author) commented:

I've tried faster-fifo, but mmap.mmap objects cannot be pickled :(
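The pickling limitation can be reproduced with the standard library alone. This sketch only shows that an `mmap.mmap` object is rejected by `pickle`; it does not use faster-fifo, and the helper `can_pickle` is a hypothetical name introduced for illustration.

```python
import mmap
import pickle
from typing import Any


def can_pickle(obj: Any) -> bool:
    """Return True if `obj` can be serialized with pickle, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


if __name__ == "__main__":
    shared = mmap.mmap(-1, 1024)  # anonymous memory map, no backing file
    try:
        print("mmap picklable:", can_pickle(shared))
        print("dict picklable:", can_pickle({"seq_no": 0}))
    finally:
        shared.close()
```

Any object that holds a memory map as an attribute hits the same error whenever it has to be pickled, for example when it is passed as an argument to a newly spawned process.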

@gabrielmbmb gabrielmbmb modified the milestones: 1.1.0, 1.2.0 May 13, 2024