
Why must the urlencoded charset be UTF-8? #237

Open
rodic opened this issue Apr 5, 2017 · 8 comments


rodic commented Apr 5, 2017

It looks like there's an unnecessary charset check in lib/types/urlencoded.js?

// assert charset
var charset = getCharset(req) || 'utf-8'
if (charset !== 'utf-8') {
  debug('invalid charset')
  next(createError(415, 'unsupported charset "' + charset.toUpperCase() + '"', {
    charset: charset
  }))
  return
}

That value is eventually passed to the read function (defined in lib/read.js), which can work with encodings other than utf-8 as long as they are supported by iconv:

// assert charset is supported
if (opts.encoding === null && encoding !== null && !iconv.encodingExists(encoding)) {
  return next(createError(415, 'unsupported charset "' + encoding.toUpperCase() + '"', {
    charset: encoding.toLowerCase()
  }))
}
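For context, a minimal sketch of what that read-side support means in practice (this is not body-parser code): Node's Buffer can already decode latin1/ISO-8859-1 directly, and iconv-lite's iconv.decode(buf, charset) extends the same idea to many more charsets.

```javascript
// Sketch only: the bytes of an ISO-8859-1 body can be decoded with Node's
// built-in 'latin1' Buffer encoding; iconv-lite covers many more charsets
// the same way via iconv.decode(buf, charset).
const raw = Buffer.from([0x46, 0x72, 0x61, 0x6e, 0xe7, 0x6f, 0x69, 0x73]); // "François" in ISO-8859-1
const decoded = raw.toString('latin1');
console.log(decoded); // → François
```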
@dougwilson (Contributor)

It's just not a complete implementation yet. The spec https://www.w3.org/TR/html5/forms.html#url-encoded-form-data spells out the steps for determining the charset of the contents. UTF-8 is the specified default, so it's the only value supported right now, until those steps are implemented.


rodic commented Apr 5, 2017

Thanks for the answer @dougwilson.

So basically, if you remove the check, there are cases where the charset cannot be determined reliably, according to the spec? Do you maybe have an idea when other charsets will be supported?

@dougwilson (Contributor)

Hi @rodic. Normally that type will never have a charset in the header at all; it's only present in some buggy versions of Firefox. The spec says the charset is actually specified in a magic _charset_ parameter value, which applies when you include it as a hidden input element on your form.

I'm not working on implementing it, so you're welcome to. Remember that the charset does not apply to what goes to raw-body; it needs to go to (and thus be supported by) the querystring and qs modules, since it applies to the URL decoding (the percent decoding), not to the raw unencoded characters (which are technically required to be US-ASCII only).


rodic commented Apr 6, 2017

@dougwilson I have an issue with an API that is implemented with callbacks. When they send data, they set the charset in the header to iso-8859-1. So as a workaround, for the callback URL, I use the raw parser and then do the charset checking, decoding, and params parsing outside of the lib (using the same tools: contentType, iconv and qs).

I'll take a look at the issue when I find time. At first glance it seems that, roughly, we would have to decode the body with the charset from the header if it is set, or "utf-8" otherwise. Then, if the _charset_ param exists and differs from the header value (or "utf-8"), we should decode the raw data once again?


intellix commented Mar 10, 2019

@rodic is that Paysafecard payment callbacks by any chance? I'm having the same issue.

The header being sent to me is:

Content-Type: application/x-www-form-urlencoded; charset=ISO-8859-1

I believe you're saying there isn't a spec for how to detect it, but what if the header explicitly states what the charset is, so you don't need detection? Could it be allowed through then?

Using https://github.com/ds300/patch-package you can get around it like so:

patches/body-parser+1.18.3.patch

diff --git a/node_modules/body-parser/lib/types/urlencoded.js b/node_modules/body-parser/lib/types/urlencoded.js
index 5ccda21..02f22e0 100644
--- a/node_modules/body-parser/lib/types/urlencoded.js
+++ b/node_modules/body-parser/lib/types/urlencoded.js
@@ -101,16 +101,8 @@ function urlencoded (options) {
       return
     }
 
-    // assert charset
+    // get charset
     var charset = getCharset(req) || 'utf-8'
-    if (charset !== 'utf-8') {
-      debug('invalid charset')
-      next(createError(415, 'unsupported charset "' + charset.toUpperCase() + '"', {
-        charset: charset,
-        type: 'charset.unsupported'
-      }))
-      return
-    }
 
     // read
     read(req, res, next, parse, debug, {

Hope it helps someone else


snax4a commented Feb 25, 2020

Had the same problem with the Paysafecard API: they were sending requests with charset "ISO-8859-1",
and I got an UnsupportedMediaTypeError: unsupported charset "ISO-8859-1" error.

Solved it by just placing the route handler above the body-parser middleware, so the parser isn't used for that route.
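This fix relies on middleware registration order: Express runs middleware and routes in the order they are mounted. A sketch follows (the Express lines are comments, and the route path and handler names are made up); the toy chain below demonstrates the ordering effect without pulling in Express itself.

```javascript
// In a real Express app the workaround looks roughly like (sketch only):
//   app.post('/paysafe/callback', express.raw({ type: '*/*' }), handleCallback); // mounted first
//   app.use(bodyParser.urlencoded({ extended: false }));                         // mounted after
//
// Toy middleware chain (not Express internals): handlers run in registration
// order, so a route mounted before the urlencoded parser handles the
// ISO-8859-1 request before the parser can reject it with a 415.
function runChain(req, handlers) {
  let i = 0;
  (function next(err) {
    if (err) { req.outcome = 'error: ' + err.message; return; }
    const handler = handlers[i++];
    if (handler) handler(req, next);
  })();
}

// Mounted first: consumes the callback request itself, never calls next().
const rawCallbackRoute = (req, next) => {
  if (req.url === '/paysafe/callback') { req.outcome = 'handled as raw body'; return; }
  next();
};

// Mounted second: stand-in for body-parser's utf-8-only urlencoded handler.
const urlencodedParser = (req, next) => {
  if (req.charset !== 'utf-8') { next(new Error('unsupported charset "ISO-8859-1"')); return; }
  req.outcome = 'parsed';
  next();
};

const req = { url: '/paysafe/callback', charset: 'iso-8859-1' };
runChain(req, [rawCallbackRoute, urlencodedParser]);
console.log(req.outcome); // → handled as raw body
```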


kasvith commented May 29, 2020

I have the same problem with the Paysafecard API.


kasvith commented May 29, 2020

@snax4a

Had the same problem with the Paysafecard API: they were sending requests with charset "ISO-8859-1",
and I got an UnsupportedMediaTypeError: unsupported charset "ISO-8859-1" error.

Solved it by just placing the route handler above the body-parser middleware, so the parser isn't used for that route.

What do you mean by placing it above bodyParser? The request is sent with application/x-www-form-urlencoded and is decoded by bodyParser; how did adding the route before body-parser solve this issue?

choonkeat-govtech added a commit to choonkeat-govtech/singpass-myinfo-oidc-helper that referenced this issue Jul 19, 2023