Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Merged by Bors] - Speed up ATX queries #5791

Closed
wants to merge 22 commits into from

Conversation

ivan4th
Copy link
Contributor

@ivan4th ivan4th commented Mar 30, 2024

Motivation

Lots of SQL queries to the state db take long time to execute, e.g. cache warmup is taking a lot of time

Description

This PR splits ATX blobs into a separate table (atx_blobs).
It also adds nonce column to each row in the atxs table, improving the query performance.
The database upgrade procedure had to be modified so that each migration is done in a separate transaction, and vacuuming can be done between migrations if requested by the db-vacuum-state setting. This may seemingly cause "partial" upgrades when switching to a newer go-spacemesh version, rendering the database unusable with older versions, but given that the local DB is updated separately, the data directory can't be expected to be usable with an older version after such an attempt anyway.

Timings on an M3 Max Mac:

  • cache warmup before upgrade on v1.4.4: up to 6m
  • db upgrade
    • splitting ATX blobs 27m
    • vacuuming 14m
    • updating nonces in the atxs table 11m
    • total ~52m
  • cache warmup after upgrade: 15s

Fixes #5701

Test Plan

Verified on a mainnet node.

TODO

  • initial speedup of cache warmup via subquery
  • split ATX blobs from atxs table
  • store nonce in each atxs row
  • do db upgrade steps as separate transactions
  • do not publish nonces as part of ATX when not necessary
  • add previous ATX ID field to atxs table
  • update CHANGELOG.md (after PR update)

Copy link

codecov bot commented Mar 30, 2024

Codecov Report

Attention: Patch coverage is 73.75415% with 79 lines in your changes are missing coverage. Please review.

Project coverage is 80.3%. Comparing base (3fb7bc5) to head (37546bc).

Files Patch % Lines
sql/migrations/state_0017_migration.go 67.8% 38 Missing and 17 partials ⚠️
sql/atxs/atxs.go 84.9% 6 Missing and 8 partials ⚠️
sql/database.go 65.0% 4 Missing and 3 partials ⚠️
activation/handler.go 62.5% 1 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff            @@
##           develop   #5791     +/-   ##
=========================================
- Coverage     80.4%   80.3%   -0.1%     
=========================================
  Files          283     284      +1     
  Lines        29265   29497    +232     
=========================================
+ Hits         23539   23705    +166     
- Misses        4134    4172     +38     
- Partials      1592    1620     +28     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ivan4th
Copy link
Contributor Author

ivan4th commented Mar 30, 2024

bors try

spacemesh-bors bot added a commit that referenced this pull request Mar 30, 2024
sql/atxs/atxs.go Outdated
where a1.pubkey = a.pubkey and a1.epoch < a.epoch and a1.nonce is not null
order by a1.epoch desc
limit 1
)) as nonce,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is also possible to copy nonce from the previous atx on insertion, it will add very little data but completely avoid this subqueries, and also this bug #5701

// func(stmt *sql.Statement) {
// stmt.BindInt64(1, int64(from.Uint32()))
// stmt.BindInt64(2, int64(to.Uint32()))
// },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i guess it means that indexes are not useful for this query at all, and reading whole table is faster.
i tried to decouple blobs from the rest sometime ago #5670

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although splitting the blobs did improve performance of multiple ATX queries substantially, in this particular case it's still more efficient not to use this extra where condition

@fasmat
Copy link
Member

fasmat commented Mar 30, 2024

What is the difference between IterateAtxsData and IterateAtxsOps? They seem to be targeting the same purpose.

@dshulyak
Copy link
Contributor

the second is used in api and allows a bit of programmability, the first one is used in warmup.go (which is a subjet of this change)

@spacemesh-bors
Copy link

try

Build succeeded:

@fasmat
Copy link
Member

fasmat commented Mar 30, 2024

There is also IterateForGrading which looks similar to the other two. Would it make sense to only have IterateAtxsOps and pass the builder.Operations needed for the specific query?

I know IterateAtxsOps decodes the ATXs from their binary representation, but that's gonna change with ATX v2, where aim to not need to do that any more.

cc @poszu

@dshulyak
Copy link
Contributor

dshulyak commented Mar 30, 2024

unlike the other two IterateForGrading selects unique piece of information that is not used in those other two. it would be wasteful to always do a join and select that piece of information in other queries. also in this change there is another part that selects nonce which wouldn't make sense to use in IterateForGrading, and it certainly would be problematic to add it to IterateOps.

it doesn't sound like a good idea to break down atx into multiple fields and store all of them separately. maybe you can try to keep blobs as they are. and save subset of fields that are relevant for consensus from atx with version 2, in a way similar to current schema.

blob is stored in original form so that it can be fetched in sync without re-encoding. if you break down atx into multiple fields it will make it less efficient.

the rest of the fields are decoupled and used only for consensus and sometimes for indexing. which is still imperfect and i think better solution would be something like in #5670

@fasmat
Copy link
Member

fasmat commented Mar 30, 2024

@dshulyak the goal is to keep only the minimum necessary information in individual columns in the table.

The main problem with relying on decoding ATXs when fetching them from the DB is, that we then need to put logic into the SQL package to be able to decode different blobs of ATXs based on their publish epoch. In my opinion this should be the sole responsibility of the activation.Handler and other parts of the codebase should ideally not even be aware about different encoding schemas of ATXs.

For gossip / fetch the handling functions don't need to be aware how the blob data encodes an ATX but just serve / gossip it if needed. As fallback one can use the ATX ID to fetch the binary blob and decode the ATX themselves if needed somewhere, with the caveat that they then need to be aware of possible different encoding schemas of ATXs.

Right now we have multiple ways of representing the same data structure:

  • types.ActivationTx
  • types.VerifiedActivationTx
  • types.ActivationTxHeader
  • atxsdata.ATX

With some of them performing double duties and - at least for the first 3 - tight coupling between them. For me ideally we only had 2 types:

  • types.ATX
  • activation.ATXDto (actually one for every version of ATX encoding scheme we use now or in the future)

Where the first contains all the information relevant for consensus and in the best case maps fields 1:1 to the columns in the atx table in the DB and the second is the representation of the ATX as it is sent over the wire and should ONLY be used for that. The first can be used anywhere and be freely extended or changed without having to fear to break backwards compatibility. Queries would always return an types.ATX (with some fields possibly being nil or zero if that speeds up specific queries) and only the activation package is aware of the actual binary representation of an ATX so it can decode them in the Handler and encode them in the Builder. Possibly checkpoint needs to be aware as well - because it cares about PrevATXID and PositioningATX that are not available as columns in the atx table - besides that it can just (re)store the ATXs as bytes and not care about decoding them.

I'm only advocating for a leaner surface to the sql/atxs package. Ideally there would only be one Iterate method that is flexible enough to handle the 3 use cases we already have and not need 3 individual functions for it. builder.Options could be extended with joins and with selects.

I've seen #5670 but I don't fully understand how this improves performance? Why does it matter if the blob data is in the same or a different table? I'm not advocating against it. It could even help keeping the separation as suggested clear in the code. Would moving it to a different table not cause functions like IterateAtxsOps, Get and GetByEpochAndNodeID to break (also indirectly anything that uses types.ActivationTxHeader)?

@ivan4th
Copy link
Contributor Author

ivan4th commented Apr 1, 2024

Probably simplifying somewhat, but #5670 should improve SQLite performance b/c with blobs taken out of atxs table, the whole table should basically fit in the cache (RAM) and thus all the operations involving a subset of ATXs but not actual blobs should be faster

activation/activation.go Outdated Show resolved Hide resolved
atxsdata/warmup_test.go Outdated Show resolved Hide resolved
@fasmat
Copy link
Member

fasmat commented Apr 5, 2024

It makes no sense to include the VRFNonce in every published ATX. For any given identity it usually never changes. If we wouldn't support PoST resizing we would only ever need to include it in the initial ATX (similar to CommitmentATX). But since an existing nonce can become invalid if the user chooses to increase their PoST size we must have an option to update the VRFNonce of an identity.

If you want to simplify the lookup of the nonce instead we could make the nonce column NOT NULL and set it to the same value as in the previous ATX of the same NodeID if it is not set in the broadcast ATX. I would however also avoid this if possible. Adding the VRFNonce to all existing rows is a lot of redundant data, 8 bytes per ATX, so in the worst case that would be 80 MB of redundant data that is stored and could just be looked up with an existing DB query. Additionally it would be ~ 20 MB of data that needs to be broadcast every epoch - of data that every node already has?

@ivan4th
Copy link
Contributor Author

ivan4th commented Apr 5, 2024

@fasmat ok I'll adjust the patch not to broadcast the nonce, yet we need to include it in the atxs rows.
Even if it's 80Mb increase it's close to nothing compared to the database size (up to ~26 GiB currently) and it speeds up a lot of queries substantially, e.g. warming up the cache is down to ~15s from ~1m 26s on my laptop compared to what can be achieved using a subquery

@ivan4th ivan4th changed the title Speed up cache warmup Speed up ATX queries Apr 5, 2024
@fasmat
Copy link
Member

fasmat commented Apr 5, 2024

@fasmat ok I'll adjust the patch not to broadcast the nonce, yet we need to include it in the atxs rows. Even if it's 80Mb increase it's close to nothing compared to the database size (up to ~26 GiB currently) and it speeds up a lot of queries substantially, e.g. warming up the cache is down to ~15s from ~1m 26s on my laptop compared to what can be achieved using a subquery

I see no issue in adding the value to the DB 🙂, I just want to avoid having to broadcast the nonce with every ATX.

@ivan4th
Copy link
Contributor Author

ivan4th commented Apr 11, 2024

  • reverted the change which was including nonces in each published ATX
  • added prev_id field to the atxs table
  • instead VRFNonce function (which queries nonce by epoch + node id) used NonceByID (which queries atxs.nonce field which is now always populated) where possible
  • made the migration somewhat faster by means of re-creating the atxs table, and also by creating the indices after populating the atxs and atx_blobs tables. This comes at the price of not having PRIMARY KEY column in both tables as SQLite can't add PK to a table after the table has already been created, but the unique indices serve the same purpose and have no differences in performance

sql/atxs/atxs.go Show resolved Hide resolved
sql/atxs/atxs.go Show resolved Hide resolved
sql/atxs/atxs.go Outdated Show resolved Hide resolved
sql/atxs/atxs.go Show resolved Hide resolved
sql/atxs/atxs.go Outdated Show resolved Hide resolved
sql/atxs/atxs_test.go Outdated Show resolved Hide resolved
sql/migrations.go Outdated Show resolved Hide resolved
@@ -0,0 +1,366 @@
package migrations
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should probably be a 0017_xyz.sql empty migration file (with a comment) to avoid somebody creating it by mistake.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments in the scripts appear to cause our database code to crash (probably will need to fix this separately), so added an empty placeholder file

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do have comments in .sql files, see for instance here:

--- sqlite doesn't support just adding a NOT NULL constraint, so we create a new column,
--- copy the data, drop the old table, and rename the new table to the old name

sql/migrations/state/0016_atx_blob.sql Show resolved Hide resolved
sql/migrations/state/0016_atx_blob.sql Outdated Show resolved Hide resolved
@ivan4th
Copy link
Contributor Author

ivan4th commented Apr 15, 2024

bors merge

spacemesh-bors bot pushed a commit that referenced this pull request Apr 15, 2024
## Motivation

Lots of SQL queries to the state db take long time to execute, e.g. cache warmup is taking a lot of time
@spacemesh-bors
Copy link

Pull request successfully merged into develop.

Build succeeded:

@spacemesh-bors spacemesh-bors bot changed the title Speed up ATX queries [Merged by Bors] - Speed up ATX queries Apr 15, 2024
@spacemesh-bors spacemesh-bors bot closed this Apr 15, 2024
@spacemesh-bors spacemesh-bors bot deleted the fix/quicker-cache-warmup branch April 15, 2024 16:40
spacemesh-bors bot pushed a commit that referenced this pull request May 16, 2024
## Motivation

The `nonce` column in the `atxs` table has been populated since migration 17 (#5791). Therefore the related field in the domain type `types.ActivationTx` can be required.

Updating to this version requires going through version 1.5 first.
spacemesh-bors bot pushed a commit that referenced this pull request May 16, 2024
## Motivation

The `nonce` column in the `atxs` table has been populated since migration 17 (#5791). Therefore the related field in the domain type `types.ActivationTx` can be required.

Updating to this version requires going through version 1.5 first.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pick nonce based on atx chain instead of the latest for identity
4 participants