Unknown PostProcessor type: Sequence #739
Comments
Hi there 👋 Thanks for the report! Luckily, we already support the ByteLevel and TemplateProcessing post-processors, so the only thing needed is to implement the Sequence post-processor. Similarly, we already support sequences of normalizers, decoders, and pre-tokenizers, and a similar pattern can be adapted for post-processors. Is this something you'd be interested in adding? If so, I'd be happy to review a PR.
Sorry, I don't plan to work on this issue — I was just reporting an issue I ran into.
No worries! It's super simple, so I'll add it soon. Thanks again for reporting!
@xenova Any thoughts on this? This is preventing loading llama 3 8b, which is a bummer. |
Here's the rust code for it: https://github.com/huggingface/tokenizers/blob/25aee8b88c8de3c5a52e2f9cb6281d6df00ad516/tokenizers/src/processors/sequence.rs#L18-L36 and it should be easy to translate into JS. |
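A JS translation of that Rust code could look roughly like the sketch below: a Sequence post-processor that simply applies its child post-processors in order, feeding each one's output into the next. The class name, the `post_process` method signature, and the shape of its return value are assumptions for illustration, not the actual transformers.js internals.

```javascript
// Minimal sketch of a Sequence post-processor, loosely mirroring the linked
// Rust implementation. Names and signatures are hypothetical.
class SequencePostProcessor {
  // processors: an array of objects, each exposing
  // post_process(tokens, pair, addSpecialTokens) -> { tokens, pair }
  constructor(processors) {
    this.processors = processors;
  }

  // Run every child post-processor in sequence, chaining the output of
  // one into the input of the next.
  post_process(tokens, pair = null, addSpecialTokens = true) {
    for (const processor of this.processors) {
      ({ tokens, pair } = processor.post_process(tokens, pair, addSpecialTokens));
    }
    return { tokens, pair };
  }
}
```

With two dummy child processors (one prepending a BOS token, one appending an EOS token), `new SequencePostProcessor([prepend, append]).post_process(['hello'])` would yield `['<s>', 'hello', '</s>']`.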
System Info
Using Node.js 20 with transformers.js 2.17.1.
Environment/Platform
Description
It seems that the following post-processor in tokenizer.json is not supported:
Reproduction
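For context, a Sequence post-processor entry in a tokenizer.json typically nests other post-processors under a "processors" array, along the lines of the sketch below. This is an illustrative example of the general shape, not necessarily the exact snippet from the reporter's tokenizer; the child processors and their fields vary by model.

```json
{
  "post_processor": {
    "type": "Sequence",
    "processors": [
      { "type": "ByteLevel", "trim_offsets": false },
      { "type": "TemplateProcessing" }
    ]
  }
}
```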
Throws error: