Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Perf] Include json serialisation in the get_blocks blocking task #3253

Merged
merged 1 commit into from
May 23, 2024

Conversation

niklaslong
Copy link
Collaborator

@niklaslong niklaslong commented May 10, 2024

A small follow-up to #3252. Devnet testing of the endpoint revealed serialisation times in the milliseconds (usually hundreds), motivating the change in this PR.

I managed to OOM some of the nodes with this change, the root cause is still tbd but I suspect it's also present in #3252. This is likely because previously blocked workers are no longer artificially throttling execution. Multiple concurrent requests for MBs of data could then lead to issues despite the existing rate limiting.

Drafted until we can figure out why.

@niklaslong niklaslong changed the title [Perf] include json serialisation in the get_blocks blocking task [Perf] Include json serialisation in the get_blocks blocking task May 13, 2024
@niklaslong niklaslong marked this pull request as ready for review May 14, 2024 10:08
@niklaslong
Copy link
Collaborator Author

niklaslong commented May 14, 2024

Opened for review as the node crash was due to devnet running with --rest-rps 90000000 effectively allowing huge "bursts" of requests disabling rate limiting and is unrelated to this change. This feels like stating the obvious but the disabled rate limiting in combination with large response data (in the 10s of MB) and full resource utilisation via rayon is a potent DoS vector. Node operators should not be disabling rate limiting (and ideally should be enforcing it via proxies at the infra level).

Copy link
Contributor

@iamalwaysuncomfortable iamalwaysuncomfortable left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is likely to decrease the number of requests that can be supported by a node before a stack overflow occurs.

However, if a node operator enforces the default REST limits, this should be mitigated. This however should be backed up with explicit testing.

@niklaslong
Copy link
Collaborator Author

niklaslong commented May 16, 2024

+1 on the testing for establishing robust rate limits but I think that tradeoff should be considered separately from this change. Blocking an async worker with hundreds of milliseconds worth of serialisation would impact the rest of the node's resource allocation and execution.

@howardwu howardwu merged commit 9cc67c9 into AleoNet:mainnet-staging May 23, 2024
24 checks passed
zosorock added a commit that referenced this pull request May 24, 2024
This reverts commit 9cc67c9, reversing
changes made to 01ea476.

Signed-off-by: Fabiano <zosorock@users.noreply.github.com>
@niklaslong niklaslong deleted the perf/rest branch May 27, 2024 09:53
vicsn added a commit to AleoHQ/snarkOS that referenced this pull request May 28, 2024
joske pushed a commit to eqlabs/snarkOS that referenced this pull request May 29, 2024
This reverts commit 9cc67c9, reversing
changes made to 01ea476.

Signed-off-by: Fabiano <zosorock@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants