-
Notifications
You must be signed in to change notification settings - Fork 281
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] first pass at adding in inoas's code #1150
base: main
Are you sure you want to change the base?
Conversation
end | ||
end | ||
|
||
# TODO: Do not call String.graphemes/1 String.reverse/1 multiple times but work on list of char and benchmark with https://hexdocs.pm/benchee/readme.html |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO found
|> String.reverse() | ||
end | ||
|
||
# TODO Same |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO found
end | ||
end | ||
|
||
# TODO: Do not call String.graphemes/1 String.reverse/1 multiple times but work on list of char and benchmark with https://hexdocs.pm/benchee/readme.html |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO found
end | ||
end | ||
|
||
# TODO: Do not call String.graphemes/1 String.reverse/1 multiple times but work on list of char and benchmark with https://hexdocs.pm/benchee/readme.html |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO found
|> String.trim() | ||
end | ||
|
||
# TODO ask inoas what this is for and how to use it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO found
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Caveat: I am out of this for quite some time and very loaded on work, so it will be slow from my side, sorry about that.
take_graphemes_at_max_bytes
was as far as I know meant to limit search string length.
There must be basically 2 or 3 limits:
- limit on the hole string that will hit SQL
- limit on the number of and/or operations that will hit SQL
- limit on the length of each search-stringlet, aka
foo bar | quux
= 3, allor
'edfoo & bar & quux
= 3, alland
'ed - so each of thosefoo
,bar
andquux
should not contain more than x graphemes (where each grapheme can AFAIR be 1 to 4 bytes because of UTF8).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea was to make sure to not overload the database with too long strings and/or too many boolean operations, etc - so let's say a total of 10 and/or and total of 10 strings each being max 100 graphemes (or 400 bytes which would already be 4000 bytes of max string search size in this example).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took another look: https://gist.github.com/inoas/19eedcdd1d0da03cd180d1c4ba29be34w
So take_graphemes_at_max_bytes
was to limit the search query to X bytes totally.
However that could still result into a & b & c & ... 8 & 9
= about 200/2 = 100 logical operations at max graphemes of 200 - all in ts_query (so some weak/fuzzy search that might hit postgres at 100 of them onto 4 columns).
SourceLevel has finished reviewing this Pull Request and has found:
|
|> String.trim() | ||
end | ||
|
||
# TODO ask inoas what this is for and how to use it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Caveat: I am out of this for quite some time and very loaded on work, so it will be slow from my side, sorry about that.
take_graphemes_at_max_bytes
was as far as I know meant to limit search string length.
There must be basically 2 or 3 limits:
- limit on the hole string that will hit SQL
- limit on the number of and/or operations that will hit SQL
- limit on the length of each search-stringlet, aka
foo bar | quux
= 3, allor
'edfoo & bar & quux
= 3, alland
'ed - so each of thosefoo
,bar
andquux
should not contain more than x graphemes (where each grapheme can AFAIR be 1 to 4 bytes because of UTF8).
|> String.trim() | ||
end | ||
|
||
# TODO ask inoas what this is for and how to use it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea was to make sure to not overload the database with too long strings and/or too many boolean operations, etc - so let's say a total of 10 and/or and total of 10 strings each being max 100 graphemes (or 400 bytes which would already be 4000 bytes of max string search size in this example).
@@ -9,6 +9,10 @@ defmodule HexpmWeb.PackageController do | |||
def index(conn, params) do | |||
letter = Hexpm.Utils.parse_search(params["letter"]) | |||
search = Hexpm.Utils.parse_search(params["search"]) | |||
sanitized_search = Hexpm.Search.Sanitizer.sanitize(params["search"]) | |||
IO.inspect(sanitized_search, label: "sanitized_search") | |||
parsed_result = Hexpm.Search.Parser.parse_sanitized_user_input(sanitized_search) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As for this part the idea I had was to be nice to the user:
- the sanitizer kicks in, it tries to guess best what the user might have meant, adding additional parentheses where it is clear that they are lacking or removing extra ones.
- the parser kicks in, it's result is transformed to both: SQL and a new search string, so that the user will see a sanitizer and parsed (aka correct) version, much like a code formatter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In both my original sanitizer.exs and parser.exs where some inline pseudo-tests (just input's and output to stdout), that should be carried over and created as real unit tests so we can verify that both work on their own
@ericmj let's see if I can find some time during the next 4 weeks, sick today. If you close it please let the branch live so I can pull/clone it. |
This is a first pass at integrating the parser and sanitizer. I still need to turn the SQL query into ecto and maybe add some DB migrations.