Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strange bug in numeric sorting (natural sorting) #6192

Open
lapo-luchini opened this issue Apr 26, 2024 · 8 comments · May be fixed by #6256
Open

Strange bug in numeric sorting (natural sorting) #6192

lapo-luchini opened this issue Apr 26, 2024 · 8 comments · May be fixed by #6256
Labels
enhancement New feature or request

Comments

@lapo-luchini
Copy link
Contributor

Describe the bug

When sorting values which have numerical components I'm trying with sort_*_numeric but that appears only to work with proper numbers, it seem not to work for mixed content like it is generally expected from natural sorting.
I'd suggest either replacing it with a natural sorting order* or adding new sort*_natural functions to keep both.

*: the only case I can think of where natural sorting is NOT an extension of numeric sorting is pure numbers with a single period "1.9" vs "1.10", which would be different when interpreted as decimal numbers or as version numbers.

To Reproduce

sort_by_label_numeric(vm_app_version, "short_version")

Version

victoria-metrics-20240116-231234-tags-v1.93.10-0-gd277977

Logs

No response

Screenshots

image

Used command-line flags

No response

Additional information

No response

@lapo-luchini lapo-luchini added the bug Something isn't working label Apr 26, 2024
@lapo-luchini
Copy link
Contributor Author

OTOH #2938 (despite the title) was actually asking for natural sorting already (1:0:2 before 1:0:15) and was closed as solved, is this a regression?

@lapo-luchini
Copy link
Contributor Author

Mhhhhh, something's not right!

Starting from the test example I tried changing ":" with ".", still OK. Then tried adding "v" in front, still OK, but then with same format but putting "real VM versions" it gets wrong:

sort_by_label_numeric((
	label_set(1, "x", "v1.0.15"),
	label_set(2, "x", "v1.0.2"),
	label_set(3, "x", "v1.100.1"),
	label_set(4, "x", "v1.93.10")
), "x")

Gets sorted like this:

v1.0.2
v1.0.15
v1.100.1
v1.93.10

@lapo-luchini lapo-luchini changed the title Use natural sorting instead (in addition) of numeric sorting Strange bug in numeric sorting (natural sorting) Apr 26, 2024
@lapo-luchini
Copy link
Contributor Author

I see that official docs say call it "numeric sort" and links to an article that specifies that numeric sort is not version sort.
But that contradicts the 1:0:2 < 1:0:15 test case, which strongly implies version sort. 🤔
Actual implementation seems to be in between: it separates the string in two parts AFAICT, not in "all numeric and non-numeric sub-ranges" as version sort would require, but not either a single numeric range.
PS: what I would like is specifically "version sort" (maybe as a new function) as described in that page.

@f41gh7
Copy link
Contributor

f41gh7 commented Apr 26, 2024

Thanks for investigation, sort_by_label_semver maybe a good candidate for new function.

@f41gh7 f41gh7 added enhancement New feature or request and removed bug Something isn't working labels Apr 26, 2024
@lapo-luchini
Copy link
Contributor Author

I know little Go, but I could certainly try a PR.
Would it be OK to use something like this dependency?

@lapo-luchini
Copy link
Contributor Author

Basically the problem is that the current "numeric" sort accepts the first (and only the first) dot as a decimal separatore, so:

  • 1:0:2 < 1:0:15 because separators are not ambiguous
  • 1.0.2 < 1.0.15 because it is separated as (1.0, '.', 2) < (1.0, '.', 15) and is correct by chance
  • v1.100.1 < v1.93.10 because it is separated as ('v', 1.1, '.', 1) < ('v', 1.93, '.', 1)

I see the logic that a "numeric" sort should accept decimal values (when alone), I'm not so convinced that a numeric sort should consider the first component with a dot of a string which contains much else (and thus is certainly NOT a decimal value) and I'd resort to natural sort (i.e. not considering the dot as a decimal). This behavior, if kept as is, should probably documented very well, as the current link to "numeric sort is not version sort" doesn't correspond to the current behavior (only describes behavior in the case of simple decimal values with a single dot).

OTOH changing "numeric" behavior would probably carry backwards compatibility problems (?), so my PR #6256 adds a sort_by_label_natural function and keeps the numeric one as is.

@lapo-luchini
Copy link
Contributor Author

the current link to "numeric sort is not version sort" doesn't correspond to the current behavior (only describes behavior in the case of simple decimal values with a single dot).

To be more clear the documentation (both in the link and in current docs) only makes examples with pure numeric values, and in that case the behavior is well-defined. Current "numeric" behavior is an extension of that (by accepting numeric values also with extra non-numerics parts and dots) and that extension might or might not be ideal… IMvHO "1.10.2 > 1.9.2" should be true also in "numeric" sort, as it is not a number, but that's debateable (and never documented anyways), this is why instead of patching the current version to change behavior when more than one dot is present I chose to create the two new "natural" sorting functions.

If you prefer instead a patch to change "numeric" behavior in those cases, I can do that.
(OTOH there is still value in natural/version sort even for single cases which are basic numbers, such as "1.10 > 1.9" so having two sets of functions seems to be necessary anyways)

valyala added a commit that referenced this issue May 13, 2024
Natural sorting is needed for sort_by_label_natural() and sort_by_label_natural_desc()
functions in MetricsQL - see #6192
and #6256

Natural sorting will be also used by `| sort ...` pipe in VictoriaLogs -
see https://docs.victoriametrics.com/victorialogs/logsql/#sort-pipe
@valyala
Copy link
Collaborator

valyala commented May 13, 2024

@lapo-luchini , thanks for thorough investigation of the issue! Let's do the following:

Please update the pull request to lib/stringsutil.LessNatural for natural sorting for sort_by_label_* functions in MetricsQL.

hagen1778 pushed a commit that referenced this issue May 13, 2024
Natural sorting is needed for sort_by_label_natural() and sort_by_label_natural_desc()
functions in MetricsQL - see #6192
and #6256

Natural sorting will be also used by `| sort ...` pipe in VictoriaLogs -
see https://docs.victoriametrics.com/victorialogs/logsql/#sort-pipe

(cherry picked from commit 707f3a6)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants