New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix conversion problems with fs::path
on Windows
#5638
Conversation
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment has been minimized.
This comment has been minimized.
src/core/control/XournalMain.cpp
Outdated
@@ -85,7 +85,7 @@ void initCAndCoutLocales() { | |||
void initLocalisation() { | |||
#ifdef ENABLE_NLS | |||
fs::path localeDir = Util::getGettextFilepath(Util::getLocalePath().u8string().c_str()); | |||
bindtextdomain(GETTEXT_PACKAGE, localeDir.u8string().c_str()); | |||
bindtextdomain(GETTEXT_PACKAGE, localeDir.string().c_str()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How you know that bindtextdomain
requires system narrow encoded strings instead of utf-8?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We must be careful here, vcpkg fixed this for gettext.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How you know that
bindtextdomain
requires system narrow encoded strings instead of utf-8?
I took a quick look at the source code of gettext
, and it seems that gettext
will open those message files by calling _open
or _wopen
. At least on MSVC
, these functions do not accept UTF-8.
Maybe we should use wbindtextdomain
, as suggested by the documentation.
We must be careful here, vcpkg fixed this for gettext.
I wasn't even aware of this patch. Thanks for pointing this out. I'll check it later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should use
wbindtextdomain
, as suggested by the documentation.
I think we are with that on the secure side. The patch does not seem to be applied upstream, only in vcpkg.
I have the bad feeling, that this patch will break old xopp files with paths containing non ASCII characters, since they stored paths as ANSI/system encoded. Now those paths are assumed to be encoded in utf-8. |
This comment was marked as off-topic.
This comment was marked as off-topic.
I made two modifications to
In the original Regarding the background image filename, yes, it may break compatibility with old |
Loading of the system file encoding would be better, since old files are already stored with it.
Maybe, we can find a solution to use utf-8 strings in general, e.g. by advancing the file version. We would like to have them in utf8 anyway. That would improve portability of xopp files, and keep old files compatible. |
@panpaul you can increase the DEV_FILE_FORMAT_VERSION in Line 251 in 8dbb0cf
This would be the best of both versions. |
@@ -472,7 +490,7 @@ void LoadHandler::parseBgPdf() { | |||
attachToDocument = true; | |||
// Handle old format separately | |||
if (this->isGzFile) { | |||
pdfFilename = (fs::path{xournalFilepath} += ".") += pdfFilename; | |||
pdfFilename = (xournalFilepath += ".") += pdfFilename; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think, the copy was intentional here, since it is a field of LoadHandler
.
Changing this might be ok in this scope, since the value is not used after this line, but might cause a bug, when this will change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, you are right. I shouldn't modify this. I will revert it.
@bhennion what do you think of that solution? |
I sounds good to me. I'd add a test unit for version 5 files, and conversion from version 4 to 5 though. |
@panpaul are you willing to do this, or is that too much effort for you? |
Sorry, I was busy in the past few days. I'll add the related tests in the next few days. |
I have some new findings while writing tests. I'm using a Windows ARM64 device, and it's the first time I've tried the
[1]: In short,
In
The conversion should be automatic if the file can be opened, as we have dedicated 'load' and 'save' logic. Based on the table above, the Should we add a logic to check whether the string is already encoded with UTF-8 and load them with Notes on how the test files are create:
|
Regarding [1], when is g_markup_parse_context_parse called?
I think, since we ever only supported mingw, we can really change |
Test files, that test non western utf8 symbols would still be good. |
xournalpp/src/core/control/xojfile/LoadHandler.cpp Lines 227 to 239 in 5b8b85c
To be honest, I don't quite understand the constructor part...
Sorry, I didn't catch you. So should I close this PR and wait for #5641 (and add tests for non western utf8 symbol in #5641) |
Yes
It's just an invariant I assume. Constructing a path from the narrow string representation of another path via the constructor for narrow strings should result in an equal path object.
No, just return to your first version without the version bump. Exporting paths with a clang/msvc compiled binary should also be correct. |
Thank you @panpaul! |
(I'm testing on MS Windows and my system language is set to Chinese)
I have observed these problems:
These problems are mainly caused by the conversion between UTF-16 and UTF-8, fixed by using
.u8string()
andfs::u8path()