
Broken collation order for Latin characters with diacritics #25219

Open
1 of 7 tasks
9p6 opened this issue May 15, 2024 · 5 comments
Labels
Ignored rules: issue that does not follow the rules (no template, missing debug log, ...)
Triage: Needed (managed by bot!): issue that was just created and needs someone looking at it

Comments

@9p6

9p6 commented May 15, 2024

Bug report

Describe the bug

Here is a clear and concise description of what the problem is:

The sort order for movie titles containing Latin characters with diacritics appears to be wrong. I'll describe the problem as I found it with Polish, but it very likely applies to other Latin-based scripts as well.

Expected Behavior

Here is a clear and concise description of what was expected to happen:

Expected character order for the Polish language:

a ą b c ć d e ę f g h i j k l ł m n ń o ó p q r s ś t u v w x y z ź ż

Actual Behavior

Actual order in the movie title list (the file manager gives similar results):

ą a b ć c d e ę f g h i j k l m n ń ó o p q r ś s t u v w x y ż ź z ł

Most letters with diacritics sort before their base letter instead of after it, and the letter ł is shifted to the very end.

Possible Fix

Locale-aware sorting of Unicode text has long been a solved problem, so I assume Kodi is using an in-house solution for some reason, and that is what needs fixing.
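For comparison, the system's own collation rules can be exercised from a shell. This is only a sketch of what "locale-aware" means here: it assumes a POSIX-like system where the `pl_PL.UTF-8` locale has been generated, the helper name `locale_order` is made up for illustration, and it says nothing about how Kodi sorts internally.

```shell
# Sort the given strings with the collation rules of the locale named in $1
# and print them space-separated on one line.
# Refuses to run if the locale is not installed, because sort would
# otherwise quietly fall back to plain byte order.
locale_order() {
    local loc="$1"
    shift
    # "locale -a" may report pl_PL.utf8 where the user typed pl_PL.UTF-8,
    # so compare with dashes removed and case ignored.
    local want
    want=$(printf '%s' "$loc" | tr -d '-')
    if ! locale -a 2>/dev/null | tr -d '-' | grep -qixF "$want"; then
        echo "locale $loc is not installed" >&2
        return 1
    fi
    printf '%s\n' "$@" | LC_ALL="$loc" sort | paste -sd ' ' -
}
```

Running `locale_order pl_PL.UTF-8` followed by the 35 letters from the expected list would show what the C library considers the correct Polish order on that machine, which can then be compared against what Kodi displays.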

To Reproduce

Steps to reproduce the behavior:

To generate a test case I used the code:

for x in a ą b c ć d e ę f g h i j k l ł m n ń o ó p q r s ś t u v w x y z ź ż; do touch "$x.mkv"; echo "<movie><title>$x</title></movie>" > "$x.nfo"; done

Scan these files with the local NFO scraper, then check the order in the movie title list.

Debuglog

The debuglog can be found here:
https://paste.kodi.tv/welogokofi.kodi

Screenshots

Here are some links or screenshots to help explain the problem:

[Screenshot: movies]

Additional context or screenshots (if appropriate)

Here is some additional context or explanation that might help:

Your Environment

Used Operating system:

  • Android

  • iOS

  • tvOS

  • Linux

  • macOS

  • Windows

  • Windows UWP

  • Operating system version/name: Ubuntu 22.04, LibreELEC 12

  • Kodi version: 20.2, 21

  • Locale: C.utf8, en_US.utf8, pl_PL.utf8 (same results with different locales)

xbmc-gh-bot added the "Triage: Needed (managed by bot!)" and "Ignored rules" labels on May 15, 2024.
@xbmc-gh-bot

xbmc-gh-bot bot commented May 15, 2024

Thank you for using Kodi and our issue tracker. This is your friendly Kodi GitHub bot 😉

It seems that you have not followed the template we provide and require for all bug reports (or have opened a roadmap item by accident). Please understand that following the template is mandatory and required for the team to be able to handle the volume of open issues efficiently.

Please edit your issue message to follow our template and make sure to fill in all fields appropriately. The issue will be closed after one week has passed without satisfactory follow-up from your side.

This is an automatically generated message. If you believe it was sent in error, please say so and a team member will remove the "Ignored rules" label.

@scott967
Contributor

A log captured with debug logging enabled is required for this.

@9p6
Author

9p6 commented May 16, 2024

I've attached the log from the library scan process.

@scott967
Contributor

From your log:

info <general>: CLangInfo: loading resource.language.en_gb language information...
debug <general>: trying to set locale to en_DE.UTF-8
info <general>: global locale set to C

So you should be getting the default "C" collation.

@9p6
Author

9p6 commented May 17, 2024

The default C collation would have all letters with diacritics shifted to the end:

a b c d e f g h i j k l m n o p q r s t u v w x y z ó ą ć ę ł ń ś ź ż

so it's definitely something else. I don't know what to make of en_DE.UTF-8 in the log; it seems random.
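The C-order claim above can be checked from a shell, since `LC_ALL=C` makes `sort` compare raw bytes and UTF-8 byte order follows code point order. A minimal sketch; the function name `c_order` is made up for illustration:

```shell
# Sort the given strings by raw byte value (the "C" collation)
# and print them space-separated on one line.
c_order() {
    printf '%s\n' "$@" | LC_ALL=C sort | paste -sd ' ' -
}

# Feed in the Polish alphabet from the expected-order list.
c_order a ą b c ć d e ę f g h i j k l ł m n ń o ó p q r s ś t u v w x y z ź ż
```

This prints the plain a to z run followed by ó ą ć ę ł ń ś ź ż, the same sequence as listed above, and it differs from what Kodi shows.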

I forgot to mention that I tried starting Kodi with different locales (en_US.utf8, pl_PL.utf8), but it makes no difference to the reported issue:

LANG=en_US.utf8 kodi --debug
