Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix UTF-32 BOM #8403

Closed
wants to merge 3 commits into from
Closed

Fix UTF-32 BOM #8403

wants to merge 3 commits into from

Conversation

aSemy
Copy link

@aSemy aSemy commented May 10, 2024

The UTF-32 BOM is `0000feff`, not `0000ffff`
@aSemy aSemy mentioned this pull request May 10, 2024
@swankjesse
Copy link
Member

ResponseBodyJvmTest > stringBomUtf32Le() FAILED
    org.opentest4j.AssertionFailedError: expected:<"[hello]"> but was:<"[hello]">
        at app//okhttp3.ResponseBodyJvmTest.stringBomUtf32Le(ResponseBodyJvmTest.kt:96)

ResponseBodyJvmTest > readerBomUtf32Le() FAILED
    org.opentest4j.AssertionFailedError: expected:<"[hello]"> but was:<"[hello]">
        at app//okhttp3.ResponseBodyJvmTest.readerBomUtf32Le(ResponseBodyJvmTest.kt:178)

it's important to sort the entries so that longer BOMs are first, otherwise a UTF16 BOM will 'hide' a UTF32 BOM (since they start with the same bytes)
"ffff0000".decodeHex(),
)
"fffe0000".decodeHex(),
).sortedByDescending { it.size }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use an explicit order. Arbitrarily sorting breaks the indices used on the matching side, which will also need changed to match the new order.

@Endeavour233
Copy link
Contributor

Endeavour233 commented May 11, 2024

ResponseBodyJvmTest > stringBomUtf32Le() FAILED
    org.opentest4j.AssertionFailedError: expected:<"[hello]"> but was:<"[hello]">
        at app//okhttp3.ResponseBodyJvmTest.stringBomUtf32Le(ResponseBodyJvmTest.kt:96)

ResponseBodyJvmTest > readerBomUtf32Le() FAILED
    org.opentest4j.AssertionFailedError: expected:<"[hello]"> but was:<"[hello]">
        at app//okhttp3.ResponseBodyJvmTest.readerBomUtf32Le(ResponseBodyJvmTest.kt:178)

This happens because UTF-16LE BOM(fffe) is a prefix of UTF-32LE BO(fffe0000) and UTF-32LE follows UTF-16LE in current UNICODE_BOMS, so UTF-32LE will be stripped when building trie for UNICODE_BOMS, consequently the "hello" string mistaken as UTF-16LE encoded. After raising the priority of UTF-32LE as below, it's fixed.
截屏2024-05-11 18 43 17

截屏2024-05-11 18 44 04
@aSemy

@aSemy
Copy link
Author

aSemy commented May 13, 2024

Thank you for explaining, I see the problem and the explanation is clear. I fixed the same problem in my project by sorting the options by length, but that's only because I'm using TypedOptions. I don't really have time to fix it and the indices here.

Would someone else mind picking it up? I've enabled "Allow edits by maintainers", or feel free to open a new PR with this one as inspiration.

@Endeavour233
Copy link
Contributor

Thank you for explaining, I see the problem and the explanation is clear. I fixed the same problem in my project by sorting the options by length, but that's only because I'm using TypedOptions. I don't really have time to fix it and the indices here.

Would someone else mind picking it up? I've enabled "Allow edits by maintainers", or feel free to open a new PR with this one as inspiration.

I'd like to open a new PR. Should I pick up the one in square/wire as well?

@aSemy
Copy link
Author

aSemy commented May 13, 2024

I'd like to open a new PR. Should I pick up the one in square/wire as well?

Be my guest!

@aSemy
Copy link
Author

aSemy commented May 13, 2024

Continued in #8407

@aSemy aSemy closed this May 13, 2024
@aSemy aSemy deleted the patch-1 branch May 13, 2024 11:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants