When reading .msg files the RTF converted to HTML is garbled in some cases where the appropriate charset is not detected properly #526

Closed
huangyz0801 opened this issue May 17, 2024 · 6 comments
Labels
dependencies Pull requests that update a dependency file Priority-Low

@huangyz0801

I am using Simple Java Mail 6.6.1 to read a .msg file exported from Outlook, but the Chinese text in the resulting HTML body is garbled.
Uploading 1.png…

@bbottema
Owner

Thank you for reaching out, but there's not much for me to go on here (your image won't load, either). Please also try the latest version: 6.6.1 is rather old, and the underlying Outlook support has seen further development since then. If it still fails, you could really help me help you by providing a problematic .msg file that demonstrates the issue.

@huangyz0801
Author

msg.zip

The attached msg.zip contains two .msg files. After parsing, the Chinese text in garbled.msg comes out garbled, while un_garbled.msg is parsed correctly.

@huangyz0801
Author

I upgraded to 8.10.1, but the Chinese characters are still garbled. I am on JDK 11 with Spring Boot 2.7.18.

@huangyz0801
Author

msg.zip

Hi Bottema,
Thanks for replying. The .msg file is attached above, and I have upgraded to the new version. When I send the message to users for testing, the Chinese text is displayed as shown below, still garbled. Could you offer any suggestions? Many thanks.
image

@bbottema
Owner

bbottema commented May 25, 2024

OK, I've looked into it and determined that this requires a fix in rtf-to-html, which is used by outlook-message-parser, which in turn is used by Simple Java Mail.

Please refer to bbottema/rtf-to-html#13.
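
For readers unfamiliar with this class of bug, here is a minimal, standard-library-only sketch of how a wrongly chosen charset can garble Chinese text in RTF. It is not the code from rtf-to-html, and it is not the fix that was released; the class `RtfCharsetDemo` and its methods are hypothetical names made up for illustration. The sketch assumes the RTF header declares its code page with `\ansicpgN` and that non-ASCII bytes are written as `\'hh` hex escapes. Real RTF has more mechanisms (per-font charsets, Unicode escapes) that this sketch ignores.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of rtf-to-html or Simple Java Mail.
final class RtfCharsetDemo {

    // Matches the \ansicpgN control word that declares the ANSI code page.
    private static final Pattern ANSI_CPG = Pattern.compile("\\\\ansicpg(\\d+)");

    // Picks a Java charset from the RTF header, or returns the fallback
    // when no code page is declared or the JVM does not support it.
    static Charset charsetFromRtfHeader(String rtf, Charset fallback) {
        Matcher m = ANSI_CPG.matcher(rtf);
        if (!m.find()) {
            return fallback;
        }
        String codePage = m.group(1);
        // Assumption: code page 936 (Simplified Chinese) maps to Java's GBK;
        // other pages are tried under the "windows-N" name.
        String name = codePage.equals("936") ? "GBK" : "windows-" + codePage;
        return Charset.isSupported(name) ? Charset.forName(name) : fallback;
    }

    // Decodes \'hh escapes. Consecutive escapes are collected first so that
    // multi-byte characters are decoded as a whole, not byte by byte.
    static String decodeHexEscapes(String text, Charset charset) {
        StringBuilder out = new StringBuilder();
        ByteArrayOutputStream pending = new ByteArrayOutputStream();
        int i = 0;
        while (i < text.length()) {
            if (text.startsWith("\\'", i) && i + 4 <= text.length()) {
                pending.write(Integer.parseInt(text.substring(i + 2, i + 4), 16));
                i += 4;
            } else {
                flush(pending, out, charset);
                out.append(text.charAt(i));
                i++;
            }
        }
        flush(pending, out, charset);
        return out.toString();
    }

    private static void flush(ByteArrayOutputStream pending, StringBuilder out, Charset charset) {
        if (pending.size() > 0) {
            out.append(new String(pending.toByteArray(), charset));
            pending.reset();
        }
    }

    public static void main(String[] args) {
        String header = "{\\rtf1\\ansi\\ansicpg936}";
        String escaped = "\\'d6\\'d0\\'ce\\'c4"; // the GBK bytes of "中文"

        Charset detected = charsetFromRtfHeader(header, StandardCharsets.ISO_8859_1);
        // Decoded with the declared code page, the string is "中文".
        System.out.println(decodeHexEscapes(escaped, detected));
        // Decoded with a wrong single-byte charset, the string is "ÖÐÎÄ".
        System.out.println(decodeHexEscapes(escaped, StandardCharsets.ISO_8859_1));
    }
}
```

The same four bytes yield readable Chinese or mojibake depending only on which charset is applied, which matches the symptom reported for garbled.msg.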

@bbottema bbottema changed the title Reading msg file, Chinese garbled characters When reading .msg files the RTF converted to HTML is garbled in some cases where the appropriate charset is not detected properly May 25, 2024
@bbottema bbottema added dependencies Pull requests that update a dependency file and removed 3rdparty-problem labels May 25, 2024
bbottema added a commit that referenced this issue May 25, 2024
…s a charset detection issue when converting RTF content to HTML
@bbottema
Owner

Fix released in 8.11.0.

@bbottema bbottema added this to the 8.11.0 milestone May 25, 2024
2 participants