Created an initial pluggable tokenizer with ngram support in order to allow using lunr to drive autocomplete style search boxes. #63

wballard · 2014-01-18T19:24:35Z

I use this library all the time, thanks for making it available. One use case we keep running into is client-side autocomplete, and we have found that ngram indexing on the server, usually Elasticsearch, gives us the best results. I just need that functionality client side and in node.js, and would rather not go out of process to Elasticsearch if I can avoid it.

I tried to follow along with your style and formatting, and hopefully did so to your satisfaction.

This sets up an index-level tokenizer. I didn't dive in as far as #21, since that implies field-level pipelines and tokenizers, which would really call for either an extension to the pipeline so that it 'starts' with a tokenizer and then streams through multiple filters, or some other field object that combines a tokenizer and a pipeline.


pangratz · 2014-01-18T19:31:05Z

lib/ngramtokenizer.js

+}
+
+/**
+ * A tokenizer tha indexes on character bigrams.


s/tha/that/

wballard · 2014-01-18T19:48:05Z

Thanks -- I can see how I totally copy-pasted that same doc error.

olivernn · 2014-01-21T18:15:09Z

Many thanks for taking the time to look into this.

I think that an ngram tokeniser would make a great plugin for lunr. As part of the changes I am making for better i18n support, I am adding a very simple plugin system that I think you could take advantage of. It's great to have another potential use case for a plugin so that I can make sure the API is flexible enough.

Let me take a closer look through your changes and see if I can make some suggestions for how to extract this as a plugin.

Thanks again!

hugovincent · 2014-05-09T13:16:26Z

Any update on this?

rowanoulton · 2014-08-13T23:38:32Z

Hey, is there an ETA for merging this or the plugin system mentioned? Would love to use it!

cvan · 2014-09-24T22:03:36Z

@olivernn can this be merged in or is the plugin system ready yet?

missinglink · 2016-01-19T10:43:25Z

I would also like to contribute ngram analyzers for autocomplete. What is the status of this? It's been open for a year now, so I'm hesitant to do any more work on it.

olivernn · 2016-01-19T15:22:55Z

The means to add plugins to lunr already exists. The main extension point is modifying an index's text processing pipeline. Each index has its own pipeline, so a plugin can safely modify the pipeline of the index it is being applied to.
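A pipeline-modifying plugin might look roughly like the sketch below. The names `dropShortTokens` and `shortTokenPlugin` are made up for illustration, and the sketch assumes the lunr 0.x conventions that `idx.use()` passes the index to the plugin and that a pipeline function receives a token string and returns it, or `undefined` to discard it. Later lunr versions may differ.

```javascript
// Hypothetical pipeline function (not part of lunr): drop tokens shorter
// than two characters. Assumes a pipeline function gets a token string and
// returns the token to keep it, or undefined to discard it.
function dropShortTokens(token) {
  return token.length >= 2 ? token : undefined;
}

// Hypothetical plugin: it receives the index it is applied to and only
// touches that index's own pipeline, so other indexes are unaffected.
function shortTokenPlugin(idx) {
  idx.pipeline.add(dropShortTokens);
}
```

It would then be applied with `idx.use(shortTokenPlugin)`.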

I think, though, that in these cases the tokenizer needs to be modified. This is possible, but for some reason the tokenizer is global, not individual per index. So all indexes will then be forced to use the replacement tokenizer, which may or may not be a problem.

An example:

var myNgramTokenizer = function () {
  lunr.tokenizer = function (obj) {
    // ngram implementation
  }
}

idx.use(myNgramTokenizer)

I'm not sure why the tokenizer is not a property of each lunr index instance; I will take a look at this.
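As a rough sketch of what could go in place of the `// ngram implementation` placeholder above: the helper names `ngramTokens` and `makeNgramPlugin` are hypothetical, and the code assumes the tokenizer contract of taking a value and returning an array of lowercase string tokens, as in lunr 0.x. `lunr` is passed in as a parameter only so the sketch runs without the library loaded.

```javascript
// Hypothetical helper (not part of lunr): split text into lowercase words,
// then emit every character n-gram of each word.
function ngramTokens(text, n) {
  if (text == null) return [];
  var words = String(text).toLowerCase().split(/\s+/).filter(Boolean);
  var tokens = [];
  words.forEach(function (word) {
    if (word.length <= n) {
      tokens.push(word); // words no longer than n are kept whole
      return;
    }
    for (var i = 0; i + n <= word.length; i++) {
      tokens.push(word.slice(i, i + n));
    }
  });
  return tokens;
}

// Hypothetical plugin factory in the shape of the example above. It replaces
// the global tokenizer, so every index is affected, not just the one it is
// applied to.
function makeNgramPlugin(lunr, n) {
  return function () {
    lunr.tokenizer = function (obj) {
      return ngramTokens(obj, n);
    };
  };
}

ngramTokens("lunr", 2); // ["lu", "un", "nr"]
```

With lunr loaded, this would be applied as `idx.use(makeNgramPlugin(lunr, 2))`. For autocomplete, emitting only the n-grams anchored at the start of each word (edge n-grams) is a common variation.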

natcohen · 2020-04-17T13:18:56Z

@olivernn Great work! Any chance this could be merged? ngram and edgengram are must-haves nowadays... I'd love to see it built in or as a plugin.

tienne · 2022-02-16T02:49:59Z

Is there anything we can do?

Created an initial pluggable tokenizer with ngram support in order to allow using lunr to drive autocomplete style search boxes.

f7d8a54


documentation corrections

4df68b4

make on install, works better with npm thus

823eeca

olivernn mentioned this pull request Apr 27, 2015

Find common words, sub-phrases in list of texts? #148

Closed

aguynamedben mentioned this pull request Jun 7, 2018

Feature request: add support for multiple wildcards #342

Open
