use non blocking json parser #132

Open
mderazon opened this issue Oct 8, 2015 · 14 comments
Comments

@mderazon

mderazon commented Oct 8, 2015

Is there an option to use a non-blocking (streaming?) JSON parser?

The problem with JSON.parse is that it blocks the event loop, and for large JSON payloads that can be a problem.
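To illustrate the blocking behaviour (the payload size here is arbitrary, just large enough to make the stall visible):

```js
// Build a large JSON string and time how long JSON.parse holds the event loop.
const big = JSON.stringify({ items: Array.from({ length: 1e6 }, (_, i) => ({ i })) });

const start = process.hrtime.bigint();
JSON.parse(big); // nothing else (timers, I/O callbacks) runs during this call
const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
console.log(`event loop blocked for ~${elapsedMs.toFixed(1)} ms`);
```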

@dougwilson
Contributor

I'm not against it at all :) Typically everyone uses the default limit value in the parsers here, and JSON.parse at that limit takes well under 1 ms, so it's typically not an issue.

If you want to put together a PR to implement this (and perhaps get the urlencoded parser to be streaming too in that PR), that would be awesome, and the best way to move this forward :)

@mderazon
Copy link
Author

mderazon commented Oct 8, 2015

I've kind of ditched the idea of parsing big JSON from an API and went with a different approach.
That said, I hope I can find the time to work on this.

In the meantime, maybe we can discuss it a little here for the sake of having some reference:

  1. Any streaming JSON lib you like in particular?
  2. Using streams here for all JSON parsing will probably mean that small parsing jobs take a performance hit, as JSON.parse is faster in most cases.

@mnpenner

  1. If they do take a performance hit, perhaps you could use Content-Length to choose a parser, as sketched below. Small payloads could continue to use JSON.parse.
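A rough sketch of that routing (the 32 KB threshold and the asyncParse callback shape are illustrative assumptions; raw-body is the helper body-parser already uses to read the request body):

```js
const getRawBody = require('raw-body'); // same helper body-parser uses internally

const THRESHOLD = 32 * 1024; // illustrative cut-off, would need benchmarking

// asyncParse is a hypothetical (text, callback) non-blocking JSON parser.
function adaptiveJson(asyncParse) {
  return function (req, res, next) {
    getRawBody(req, {
      length: req.headers['content-length'],
      limit: '1mb',
      encoding: 'utf8'
    }, function (err, body) {
      if (err) return next(err);
      if (Buffer.byteLength(body) < THRESHOLD) {
        // small payload: the blocking parse is cheap
        try { req.body = JSON.parse(body); } catch (e) { return next(e); }
        return next();
      }
      // large payload: hand off to the non-blocking parser
      asyncParse(body, function (err, parsed) {
        if (err) return next(err);
        req.body = parsed;
        next();
      });
    });
  };
}
```

It would be mounted like any other middleware, e.g. `app.use(adaptiveJson(someAsyncParse))`.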

@dougwilson
Contributor

> Any streaming JSON lib you like in particular?

I've never used any, so I have no preference. If you can find one that is well supported, that would be best :)

> Using streams here for all JSON parsing will probably mean that small parsing jobs take a performance hit, as JSON.parse is faster in most cases.

Yeah, it would probably be slower; what we should do would need to be determined by benchmarks. Also, be careful: simply keying a switch off Content-Length can penalize slow senders for being slow, when they could have benefited from a streaming parser even for small bodies.

@ghassanmas

I have done some research on this topic, and it seems it has already been discussed on Node: nodejs/node#2031 (comment). To my understanding, what is being suggested here is not feasible.

Did I miss something?

@wesleytodd
Member

Good find @ghassanmas; that comment gets to the point well, and IMO it is a good indication that this module should probably not take an opinion on this directly. Offering an interface to override the JSON parse implementation (and then supporting async versions of the parse) is a good idea, but directly implementing a specific one (even the one recommended in that comment) is probably not a good long-term solution.
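To make the interface idea concrete, a hypothetical sketch (body-parser has no such `parse` option today; the option name and callback shape are invented for illustration):

```js
// Hypothetical API: let callers swap the parser, sync or async.
app.use(bodyParser.json({
  // the default would remain the built-in JSON.parse
  parse(text, callback) {
    require('yieldable-json').parseAsync(text, callback);
  }
}));
```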

@gireeshpunathil

I would like to see what the recommended direction here is:

  • I agree that implementing a specific async JSON parser would break the unopinionated attribute of the framework. In addition, we would have to deal with a lot of new compatibility issues.

  • Offering an interface and leaving developers to select the implementation is feasible, but brings its own issues: there should be a default, and the default should be async too, to match the async behavior of the middleware.

So the second point necessitates the first, and the first has already been discarded as an option.

So? Should we conclude and close this as wontfix, and revisit later when the nature of workloads shifts drastically and async JSON becomes an imminent requirement?

@fed135

fed135 commented Jan 3, 2020

Just a bit of a left-field solution: I've stumbled upon https://github.com/bjouhier/i-json, which can handle JSON chunks directly from a stream, instead of concatenating the text and then deciding what to do with it. It's not an easy lift, but it could be pretty interesting. I think I'll do a small POC to check feasibility and performance overhead.
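For reference, the incremental API i-json describes in its README looks roughly like this; the middleware wiring around it is an illustrative sketch, not tested code:

```js
const ijson = require('i-json');

// Feed request chunks to the incremental parser as they arrive,
// instead of buffering the whole body and calling JSON.parse once.
function streamingJson(req, res, next) {
  const parser = ijson.createParser();
  let failed = false;
  req.on('data', function (chunk) {
    if (failed) return;
    try {
      parser.update(chunk); // advance the parser with this chunk
    } catch (e) {
      failed = true;
      next(e);
    }
  });
  req.on('end', function () {
    if (failed) return;
    req.body = parser.result();
    next();
  });
  req.on('error', next);
}
```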

@gireeshpunathil

While we are here: if you are going to do a POC, can I request that you test with yieldable-json as well?

  • it does not guarantee reduced latency, but it does improve concurrency
  • I developed it
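For anyone unfamiliar, minimal usage looks like this: yieldable-json exposes callback-based parseAsync/stringifyAsync that yield back to the event loop while working.

```js
const yj = require('yieldable-json');

// parseAsync yields periodically instead of blocking, so other
// requests can be serviced while a large body is being parsed.
yj.parseAsync('{"hello": "world"}', (err, obj) => {
  if (err) throw err;
  console.log(obj.hello); // -> 'world'
});
```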

@fed135

fed135 commented Jan 6, 2020

Update on my POC:

  • Unfortunately, the project I was basing myself on is no longer active (see "Is this project still active?" bjouhier/i-json#13) and does not compile on recent Node versions.
  • The implementation broke other parsers, as the sequence of events is changed.
  • Memory consumption was 5-10% higher with the i-json parser. *
  • Latency percentiles and throughput were roughly the same between runs. *

* Tests were performed using ApacheBench against a basic hello-world API on Node 8 (due to the i-json package), creating 100k requests with JSON bodies of 10 kB and 100 kB at a concurrency of 100. I'm not sure if that's an indication that problems only appear with extremely large JSON bodies, which to me is a rare edge case.

I did notice a possible area for improvement, though: body parsing as middleware typically happens before routing, which means that POST requests which would result in a 404 still go through body parsing (per-route mounting, sketched below, avoids that). @wesleytodd I'm not sure if that's significant.
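Standard Express usage already allows scoping the parser to routes; a small sketch of the difference, with illustrative route names:

```js
const express = require('express');
const bodyParser = require('body-parser');
const app = express();

// Global mounting: every request, even one that will 404, pays for parsing.
// app.use(bodyParser.json());

// Per-route mounting: only matched routes run the parser.
const json = bodyParser.json();
app.post('/items', json, function (req, res) {
  res.status(201).json(req.body);
});
// A POST to an unknown path now 404s without its body being parsed.
```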


@renatoalencar

renatoalencar commented Apr 9, 2020

I was reading the Express docs and found this. I think this is at least a great discussion around an interesting topic, so let me try to add my small contribution.

First of all, we need to decide on an approach or strategy for implementing a non-blocking parser. My ideas are:

Limit the amount of time the parser can use

Start the parser, stop it after a short period of time, and keep the state for the next parsing cycle. Repeat this until the stream is completely parsed. A skeleton of this pattern is sketched below.
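A conceptual skeleton of that time-slicing loop; `step` stands in for a resumable parser that keeps its own state between slices and is purely a placeholder:

```js
// Do at most `budgetMs` of parsing work, then yield the event loop
// with setImmediate and resume from the saved state.
function parseIncrementally(state, budgetMs, done) {
  const start = Date.now();
  while (!state.finished) {
    step(state); // placeholder: advance the resumable parser a little
    if (Date.now() - start >= budgetMs) {
      return setImmediate(() => parseIncrementally(state, budgetMs, done));
    }
  }
  done(null, state.result);
}
```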

Parse it in a different thread or process

Start a different thread or process, parse the stream there, and then resolve the value back to the original caller. A single process with its own event loop could be used in this case, so no one needs to deal with several threads running at the same time. See the worker sketch below.
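A minimal sketch of the thread variant using Node's worker_threads module; one worker per call is shown for brevity, where a real implementation would likely pool and reuse workers:

```js
const { Worker, isMainThread, parentPort } = require('worker_threads');

if (isMainThread) {
  // Offload the blocking JSON.parse to a worker; the main event loop stays free.
  function parseInWorker(text) {
    return new Promise((resolve, reject) => {
      const worker = new Worker(__filename); // re-runs this file as the worker
      worker.once('message', (msg) =>
        'err' in msg ? reject(new Error(msg.err)) : resolve(msg.value));
      worker.once('error', reject);
      worker.postMessage(text);
    });
  }
  module.exports = parseInWorker;
} else {
  parentPort.on('message', (text) => {
    try {
      parentPort.postMessage({ value: JSON.parse(text) }); // structured clone back
    } catch (e) {
      parentPort.postMessage({ err: e.message });
    }
  });
}
```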

The actual parser

The parser could be built by extending the actual Node.js JSON parser; if that is possible, no one needs to build a parser from scratch.

I personally think the incremental option is more viable, since no one needs to deal with a second running thread or process and IPC. The next step should be building PoCs of the different approaches and comparing them against each other, both in performance and maintainability.

@rawpixel-vincent

I've published https://github.com/rawpixel-vincent/yieldable-json-body-parser,
which uses https://github.com/ibmruntimes/yieldable-json
and is based off 2.x at #66.

It can be used as a drop-in replacement for the bodyParser.json() middleware.

This is fresh, so there are possibly issues that haven't been found yet; use with caution. Suggestions welcome.
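Assuming the package's default export mirrors bodyParser.json(), as the drop-in claim suggests, usage would look like this (the require name is inferred from the repo and not verified):

```js
// Hypothetical drop-in, per the package's description:
const yieldableJsonBodyParser = require('yieldable-json-body-parser');
app.use(yieldableJsonBodyParser()); // in place of app.use(bodyParser.json())
```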
