Feature request
The TokenClassificationPipeline currently sets a hardcoded tokeniser config within its sanitiser method. This prevents users from passing their own config to the tokeniser.
It would be good to support some user input for the tokeniser config, especially for is_split_into_words, as the input data may already be split into words.
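A minimal sketch of the idea, assuming user-supplied options are merged over the pipeline's defaults before the tokeniser is called. Everything here is hypothetical and for illustration only: `SketchPipeline`, `toy_tokeniser`, `DEFAULT_TOKENISER_PARAMS` and the `tokeniser_params` argument are not part of Transformers, and the default values are placeholders rather than the ones the real pipeline sets.

```python
from typing import Any, Dict, List, Optional, Union

# Placeholder defaults standing in for whatever the pipeline hardcodes today (assumption).
DEFAULT_TOKENISER_PARAMS: Dict[str, Any] = {"truncation": True}


def toy_tokeniser(
    text: Union[str, List[str]], is_split_into_words: bool = False, **kwargs: Any
) -> Dict[str, Any]:
    """Stand-in for a real tokeniser: returns the words and the options it received."""
    # Pre-split input is used as-is; raw text is split on whitespace.
    words = list(text) if is_split_into_words else text.split()
    return {"words": words, "params": {"is_split_into_words": is_split_into_words, **kwargs}}


class SketchPipeline:
    """Hypothetical pipeline that forwards user options to its tokeniser."""

    def __init__(self, tokeniser):
        self.tokeniser = tokeniser

    def sanitise_tokeniser_params(
        self, tokeniser_params: Optional[Dict[str, Any]] = None
    ) -> Dict[str, Any]:
        # User-supplied values override the defaults; unspecified defaults are kept.
        return {**DEFAULT_TOKENISER_PARAMS, **(tokeniser_params or {})}

    def preprocess(
        self,
        inputs: Union[str, List[str]],
        tokeniser_params: Optional[Dict[str, Any]] = None,
    ) -> Dict[str, Any]:
        params = self.sanitise_tokeniser_params(tokeniser_params)
        return self.tokeniser(inputs, **params)


pipe = SketchPipeline(toy_tokeniser)

# Input that is already split into words, as in many token classification datasets.
pre_split = pipe.preprocess(
    ["Hugging", "Face", "is", "in", "NYC"],
    tokeniser_params={"is_split_into_words": True},
)

# Raw text still works without any extra options.
raw = pipe.preprocess("Hugging Face is in NYC")
```

Because the defaults are only overridden when the user passes something, existing calls would behave exactly as before.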
Motivation
It is common for token classification datasets to be split into words already so that they match their labels.
Your contribution
I naively anticipate this being a simple change, so I am happy to submit a PR for it, though it would first be nice to see some discussion of the feature and whether it fits with the goals of Transformers.
This makes sense to me, but I'm not super-familiar with that pipeline. I'd support a PR to allow some options to be passed through to the tokenizer, though, since that shouldn't have any backward compatibility issues!