=?UTF-8?Q? in e-mail headers #2923

Open
jeffjohnson11 opened this issue Dec 13, 2023 · 7 comments

Comments

@jeffjohnson11

jeffjohnson11 commented Dec 13, 2023

Describe the bug
When sending e-mails, Espo is adding =?UTF-8?Q? and other non-printable characters into 'From' and 'Subject' fields of the header.

To Reproduce

  1. Send an e-mail via formula or via 'Send Test Email' button
  2. Look at headers of received e-mail

Expected behavior
The 'From' and 'Subject' header fields should contain plain characters as long as the user has not entered any non-ASCII characters.

Actual behavior

  1. The issue only affects the 'from' field when you set a 'from name' in Espo.
  2. The issue seems to affect the subject as well (regardless of whether 'from name' is filled or not)
  3. The behavior does not depend on special characters in the 'from name'. Using a simple one like 'asd' also causes it, please see screenshots.
  4. [not an important issue, but noticed during testing] If you enter a 'from name' in outbound settings, then save, then reload the page, then delete the contents of the 'from name' field, then click 'Send Test Email', the system still sends the 'from name' you had before

Screenshots
E-Mail addresses have been hidden. They are correct though and do not contain faulty characters.

utf8_from
'from name' has been set

simple_from
no 'from name' has been set

EspoCRM version
8.0.4

Additional context
I have replied to another user in the forum who seems to have the same issue but I think my post there might not be very visible. https://forum.espocrm.com/forum/general/78024-cryptic-from-sender-in-e-mails

I think the words have somehow been double or unnecessarily encoded. This can trigger some spam filters, as this technique is sometimes used by spammers. If I understand correctly (but please don't quote me on it), the email currently does not follow strict RFC5322 standards https://datatracker.ietf.org/doc/html/rfc5322#section-3.4
An improvement might be to perform the current encoding only if non-ASCII characters are encountered.
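To make the suggested improvement concrete, here is a minimal Python sketch (an illustration only, not EspoCRM or laminas-mail code). The helper name `encode_if_needed` is hypothetical; it uses Python's standard `email.header.Header` and leaves pure-ASCII values untouched. It does not deal with quoting ASCII special characters in display names, which would be a separate step.

```python
from email.header import Header


def encode_if_needed(value: str) -> str:
    """Return the header value unchanged when it is pure ASCII,
    otherwise return it as an RFC 2047 encoded-word (UTF-8)."""
    if value.isascii():
        # Nothing to encode: plain ASCII can go into the header as-is.
        return value
    # Non-ASCII present: let the standard library build the encoded-word.
    return Header(value, "utf-8").encode()


print(encode_if_needed("asd"))   # stays "asd"
print(encode_if_needed("Jörg"))  # becomes an =?utf-8?...?= encoded-word
```

With this approach a 'from name' like 'asd' would be sent as plain text, and only names with non-ASCII characters would get the `=?UTF-8?...?=` wrapper.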

A few possibly useful links:
https://security.stackexchange.com/a/213477
https://stackoverflow.com/a/55210089
I think the issue might occur here (haven't debugged yet though): https://github.com/espocrm/espocrm/blob/master/application/Espo/Core/Mail/Sender.php#L569

@yurikuzn
Contributor

yurikuzn commented Dec 13, 2023

It seems it determines the encoding here: https://github.com/laminas/laminas-mail/blob/2.25.1/src/Header/Subject.php#L105. As the encoding is set to UTF-8, it will encode the header.

If we remove setEncoding, I'm not sure everything will still work fine. We would need a lot of testing. Can't do it soon. Maybe only after the v8.1 release. Any help appreciated.

> does not follow strict RFC5322 standards

Not sure if it's true.

@yurikuzn
Contributor

image

I think if we remove it, we might get problems. Maybe the added =?UTF-8?Q? should not be considered a problem? If it really can cause issues with spam detectors, it would be good to see some information on it.

@yurikuzn
Contributor

Related: laminas/laminas-mail#54

@yurikuzn
Contributor

yurikuzn commented Dec 13, 2023

I tried without setEncoding and it failed to send when it encountered UTF-8 characters in the From Name; the 'from' header requires UTF-8 to be set explicitly. This should be the same for any address-line header and maybe for some other headers too. The subject header is fine as the implementation detects UTF-8.

This means that to fix it we would need to check the encoding for each header manually rather than relying on laminas-mail. I'm not sure I would like having this logic.

Currently I'm not convinced that this issue should be considered a problem. But I may change my mind.

@jeffjohnson11
Author

jeffjohnson11 commented Dec 13, 2023

We noticed the behavior when quite a few e-mails had not been delivered. Below is one example that led me to believe that the e-mail is not following proper standards. Additionally, my Android with K9-Mail does not properly display the sender in the notification; I can't recreate it now for a screenshot though.
Upon further research I found that we might be following RFC2047 https://ldu2.github.io/rfc2047/ - so I don't understand why GMX didn't like the mail...
Anyways, I'll gladly help debugging. Starting it right now, I'll get back to you.

host mx01.emig.gmx.net [212.227.17.5] said: Transaction failed Reject due to policy restrictions. For explanation visit
https://postmaster.gmx.net/de/case?c=r0710&i=ip&v=85.215.2

@jeffjohnson11
Author

jeffjohnson11 commented Dec 13, 2023

So I found out:

  • It is valid to have =?UTF-8?Q? in a mail header if you want to send non-ASCII characters. Encoding like this is called MIME encoding (encoded-word). Source: https://en.wikipedia.org/wiki/MIME#Encoded-Word
  • Strings that contain only ASCII, like "=?UTF-8?Q?EspoCRM?=", do not need to be MIME encoded, but encoding them does not cause errors
  • Laminas does not use PHP built-in functions like iconv_mime_encode or mb_encode_mimeheader but instead encodes the headers itself, see https://github.com/laminas/laminas-mime/blob/2.13.x/src/Mime.php#L483 However, those PHP functions exhibit the same encoding behavior, which leads me to believe that the laminas implementation was based on the faulty PHP functions.
  • In our 'from name' we had the characters [ and ]. These characters' hex codes are 5B and 5D. Laminas does not have a mapping for these characters, so it leaves them unencoded and sends the header like From: =?UTF-8?Q?TestName=20[TestInBrackets]?=
  • This causes an error with the mail checkers and also with my mail client (SOGo). In contrast, From: =?UTF-8?Q?TestName=20TestWithoutBrackets?= works fine.
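To illustrate the bracket problem from the list above, here is a minimal Python sketch (not the laminas code) of Q-encoding for a display name. It assumes the restricted character set that RFC 2047 section 5 allows inside a 'phrase', so [ and ] get escaped. `q_encode_phrase` and `PHRASE_SAFE` are hypothetical names, and the sketch ignores the 75-character limit per encoded-word.

```python
import string

# Bytes that may appear literally in a Q-encoded word used inside a
# 'phrase' (e.g. a display name), following RFC 2047 section 5, rule (3).
# "=" and "_" are left out on purpose: they are the escape character and
# the space substitute, so literal occurrences must be escaped too.
PHRASE_SAFE = frozenset(
    (string.ascii_letters + string.digits + "!*+-/").encode("ascii")
)


def q_encode_phrase(text: str) -> str:
    """Q-encode text as a single UTF-8 encoded-word for a display name."""
    out = []
    for byte in text.encode("utf-8"):
        if byte in PHRASE_SAFE:
            out.append(chr(byte))        # safe: copy as-is
        elif byte == 0x20:
            out.append("_")              # space is written as underscore
        else:
            out.append(f"={byte:02X}")   # everything else as =XX hex
    return "=?UTF-8?Q?" + "".join(out) + "?="


print(q_encode_phrase("TestName [TestInBrackets]"))
# prints =?UTF-8?Q?TestName_=5BTestInBrackets=5D?=
```

The point is only that anything outside the small safe set, including [ (5B) and ] (5D), ends up as `=XX` instead of being passed through raw.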

Possible solutions:

Recommendation:
Laminas should probably implement better base64 header handling. Their function encodeBase64Header https://github.com/laminas/laminas-mime/blob/2.13.x/src/Mime.php#L593 is not being called anywhere. I have not tested more difficult characters like Chinese ones, but I have the feeling that base64 might be the safest way to handle this complex topic.
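As a rough idea of what base64 ("B") header encoding looks like, here is a small Python sketch. It is an assumption-laden illustration, not what encodeBase64Header in laminas-mime does: `b_encode_header` is a hypothetical helper, it always uses UTF-8, and it joins multiple encoded-words with a single space, leaving real header folding to the mail library.

```python
import base64


def b_encode_header(text: str, max_bytes: int = 45) -> str:
    """Encode text as one or more UTF-8 base64 encoded-words.

    45 raw bytes become 60 base64 characters; with the 12 characters of
    '=?UTF-8?B?' and '?=' each word stays within the 75-character limit.
    Chunks are cut between characters, never inside a multi-byte one.
    """
    chunks, current = [], b""
    for ch in text:
        raw = ch.encode("utf-8")
        if current and len(current) + len(raw) > max_bytes:
            chunks.append(current)
            current = b""
        current += raw
    if current:
        chunks.append(current)
    words = [
        "=?UTF-8?B?" + base64.b64encode(chunk).decode("ascii") + "?="
        for chunk in chunks
    ]
    return " ".join(words)


print(b_encode_header("TestName [TestInBrackets]"))
```

Since base64 output only uses letters, digits, +, / and =, characters like [ and ] never appear raw inside the encoded-word, whatever the input is.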

@yurikuzn
Contributor

So basically we have another, more significant problem: not all characters are properly MIME encoded, which may result in delivery problems with some providers.
