Pull requests: huggingface/datatrove

Pull requests list

Migrate dedup to xxhash
#179 opened May 7, 2024 by guipenedo
Speedup json writer
#175 opened May 5, 2024 by its5Q
Summary stats
#158 opened Apr 20, 2024 by hynky1999
[WIP] Multi-Lingual Tokenization
#147 opened Apr 2, 2024 by beme248
Linewise filters
#125 opened Mar 14, 2024 by guipenedo Draft