-
-
Notifications
You must be signed in to change notification settings - Fork 606
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change in the default fast field tokenizer manager. #2128
base: main
Are you sure you want to change the base?
Conversation
`{ fast: true }` now results in the use of a the default fast field tokenizer. (instead of no tokenizer) The default tokenizer lowercases. Fast field gets a different default tokenizer manager than the normal tokenizer. The serialization of the fast field options is unchanged.
a0037ac
to
61422d7
Compare
Codecov Report
❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more. @@ Coverage Diff @@
## main #2128 +/- ##
==========================================
- Coverage 94.37% 94.37% -0.01%
==========================================
Files 321 319 -2
Lines 60821 60808 -13
==========================================
- Hits 57401 57386 -15
- Misses 3420 3422 +2
|
} else { | ||
self.fast = FastFieldTextOptions::IsEnabled(true); | ||
} | ||
pub fn set_fast(mut self, tokenizer_name: &str) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: is it possible to set fast to Disabled?
FastFieldTextOptions::IsEnabled(false) | ||
/// Enum used to control the way we serialize fast field text options. | ||
/// | ||
/// For backward compatiblity reasons, we folow the format introduce in tantivy 0.19. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// For backward compatiblity reasons, we folow the format introduce in tantivy 0.19. | |
/// For backward compatibility reasons, we follow the format introduced in tantivy 0.19. |
@@ -153,6 +153,8 @@ pub use self::term::{Term, ValueBytes, JSON_END_OF_PATH}; | |||
pub use self::text_options::{TextFieldIndexing, TextOptions, STRING, TEXT}; | |||
pub use self::value::Value; | |||
|
|||
pub(crate) const DEFAULT_FAST_FIELD_TOKENIZER: &str = "default"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: don't we want to call it by its real name lower
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think TokenizerManager::default_for_fast_fields
registers it both under lower
and under default
. If I understand things correctly, using default
would mean the concrete choice is subject to change whereas lower
should always yield the lower case even in future Tantivy versions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer default_fast_field
, to avoid confusing it with default
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fmassot @adamreichold "default" is the one that will be used when a user is not specifying any tokenizer.
For isntance, when using the flags.
.add_text(my_field, STRING | FAST)
It is then possible in tantivy to pass your own tokenizer manager for indexing and for fast fields.
So not putting the specific tokenizer directly here makes it possible for users to pass a tokenizer manager with the default tokenizer they prefer.
{ fast: true }
now results in the use of a the default fast field tokenizer. (instead of no tokenizer) The default tokenizer lowercases.Fast field gets a different default tokenizer manager than the normal tokenizer.
The serialization of the fast field options is unchanged.