Escaping in wordforms and exceptions #2146

Closed
8 tasks done
sanikolaev opened this issue May 7, 2024 · 4 comments

@sanikolaev
Collaborator

sanikolaev commented May 7, 2024

Proposal:

There seems to be no way to add a wordform for a token such as a>, or an exception for a token such as a=>, because > and => are the delimiters in those files.
I suggest we make it possible to escape special characters in wordforms and exceptions.

E.g. this works as expected:

snikolaev@dev2:~$ cat /tmp/wf
a_ > abc
snikolaev@dev2:~$ mysql -P9306 -h0 -e "drop table if exists t; create table t(f text) wordforms='/tmp/wf' charset_table='non_cjk, _'; call keywords('a_', 't'); call keywords('a', 't');"
+------+-----------+------------+
| qpos | tokenized | normalized |
+------+-----------+------------+
| 1    | a_        | abc        |
+------+-----------+------------+
+------+-----------+------------+
| qpos | tokenized | normalized |
+------+-----------+------------+
| 1    | a         | a          |
+------+-----------+------------+

Here I would expect a> normalized to abc:

snikolaev@dev2:~$ cat /tmp/wf
a> > abc

snikolaev@dev2:~$ mysql -P9306 -h0 -e "drop table if exists t; create table t(f text) wordforms='/tmp/wf' charset_table='non_cjk, >'; call keywords('a>', 't'); call keywords('a', 't');"
+------+-----------+------------+
| qpos | tokenized | normalized |
+------+-----------+------------+
| 1    | a>        | a>         |
+------+-----------+------------+
+------+-----------+------------+
| qpos | tokenized | normalized |
+------+-----------+------------+
| 1    | a         | >          |
| 2    | a         | abc        |
+------+-----------+------------+

Escaping with a backslash doesn't help:

snikolaev@dev2:~$ cat /tmp/wf
a\> > abc

snikolaev@dev2:~$ mysql -P9306 -h0 -e "drop table if exists t; create table t(f text) wordforms='/tmp/wf' charset_table='non_cjk, >'; call keywords('a>', 't'); call keywords('a', 't');"
+------+-----------+------------+
| qpos | tokenized | normalized |
+------+-----------+------------+
| 1    | a>        | a>         |
+------+-----------+------------+
+------+-----------+------------+
| qpos | tokenized | normalized |
+------+-----------+------------+
| 1    | a         | >          |
| 2    | a         | abc        |
+------+-----------+------------+

Here I would expect a=> normalized to abc:

snikolaev@dev2:~$ cat /tmp/exc
a=> => abc

snikolaev@dev2:~$ mysql -P9306 -h0 -e "drop table if exists t; create table t(f text) exceptions='/tmp/exc' charset_table='non_cjk, >'; call keywords('a=>', 't');"
+------+-----------+------------+
| qpos | tokenized | normalized |
+------+-----------+------------+
| 1    | a         | a          |
| 2    | >         | >          |
+------+-----------+------------+

Tested in:

Manticore 6.2.13 e12bc4f67@24050620 dev (columnar 2.2.5 0c18998@240424) (secondary 2.2.5 0c18998@240424) (knn 2.2.5 0c18998@240424)

on dev2 (Ubuntu Jammy).

Notes

Checklist:

To be completed by the assignee. Check off tasks that have been completed or are not applicable.

  • Task estimated
  • Specification created, reviewed and approved
  • Implementation completed
  • Tests developed
  • Documentation updated
  • Documentation proofread
  • Changelog updated
  • OpenAPI YAML updated and issue created to rebuild clients
@tomatolog
Contributor

After the fix in 8c3ffa5, escaping the delimiters > and => with a backslash (\) works in the wordforms and exceptions files.
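
As a rough sketch of what the escaped files could look like, the snippet below recreates /tmp/wf and /tmp/exc from the original report with the delimiter characters escaped. It assumes each special character is escaped individually (so a=> becomes a\=\>); the comment above only says that escaping works, so treat the exact form as an assumption and check it against the updated docs.

```shell
# Wordforms file: the first ">" is escaped so it is read as part of the
# source token; the second, unescaped ">" is the delimiter.
printf '%s\n' 'a\> > abc' > /tmp/wf

# Exceptions file: "=" and ">" in the source token are each escaped
# (assumed form); the unescaped "=>" is the delimiter.
printf '%s\n' 'a\=\> => abc' > /tmp/exc

# Show what was written.
cat /tmp/wf /tmp/exc
```

With these files in place, the create table and call keywords commands from the report can be rerun unchanged to check whether a> and a=> are now normalized to abc.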

@sanikolaev
Collaborator Author

@tomatolog what about escaping ~?

sanikolaev reopened this May 30, 2024
@sanikolaev
Collaborator Author

Updated docs in fa05506

@tomatolog
Contributor

Escaping ~ in wordforms was added in 2ff82fe.
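
For context, a leading ~ in a wordforms line normally marks the rule as one applied after morphology, which is why a literal ~ needs escaping. A minimal sketch of an escaped entry follows; the file name /tmp/wf_tilde and the token \~a are hypothetical, chosen only for illustration, and the escaped form is assumed to mirror the backslash escaping of > and =>.

```shell
# Hypothetical wordforms file: the backslash is meant to make "~" a literal
# part of the source token instead of the after-morphology marker.
printf '%s\n' '\~a > abc' > /tmp/wf_tilde

# Show what was written.
cat /tmp/wf_tilde
```

To try it, ~ would also need to be in the table's charset_table (for example charset_table='non_cjk, ~'), in the same way the report adds > there.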
