Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect utf8toUnicode transformation for 00xx #3129

Open
marcstern opened this issue Apr 23, 2024 · 0 comments
Open

Incorrect utf8toUnicode transformation for 00xx #3129

marcstern opened this issue Apr 23, 2024 · 0 comments
Labels
2.x Related to ModSecurity version 2.x

Comments

@marcstern
Copy link
Contributor

marcstern commented Apr 23, 2024

In utf8_unicode_inplace_ex(), we have the following code:

c = *utf;
/* If first byte begins with binary 0 it is single byte encoding */
if ((c & 0x80) == 0) {
    /* single byte unicode (7 bit ASCII equivilent) has no validation */
    count++;
    if (count <= len) {
        if (c == 0) *data = x2c(&c);
        else *data++ = c;
    }
}

The code if (c == 0) *data = x2c(&c); is wrong.
utf the input string.
If input is "x", we copy the character => correct.
If input is "\x00\xA0", we call the x2c function which expects an hexadecimal string as input (like "20" for a space).
Furthermore, the output buffer (data) is not increased, so the character is overwritten on the next loop.
The correct code is

if (c == 0) {
    sprintf(data, "%%u%04x", utf[1]);
    count += 4;
    data += 6;
    i++;
    *changed = 1;
}

Remarks:

  1. Using "%04x" replaces a lot of useless code used in other parts of the code
  2. I didn't add a check if (count <= len) after count += 4; because it cannot overflow if we fix the code: len = input_len * 6 + 1; instead of len = input_len * 4 + 1; (4 bytes -> %u1234). The check (and the variable count) could thus be removed everywhere
  3. After filling data, we have several checks (/* invalid UTF-8 character number range (RFC 3629) /, / check for overlong */, ...) with this code count++; *data++ = c; What's the point? This copies the character after the ones we already copied. This looks wrong to me. Any reason to keep this code?
@marcstern marcstern added the 2.x Related to ModSecurity version 2.x label Apr 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2.x Related to ModSecurity version 2.x
Projects
None yet
Development

No branches or pull requests

1 participant