-
Notifications
You must be signed in to change notification settings - Fork 120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(client): Make the throttle skip a snapshot if one is already running #1273
base: main
Are you sure you want to change the base?
Conversation
This seems unintentional and i'm certainly in favor of skipping the periodic snapshot if a snapshot is already in progress. |
It sounds good to skip... in the correct conditions :)... but I think this code might miss changes: by the time you check the lock is taken, the mutex area is already beyond the change-capturing phase, so you could miss the last new changes that were produced that triggered the snapshot you're evaluating. Did I miss something? I haven't looked deeply. |
Looking at this again. If polling misses a tick because of a race, maybe it's okay, but do you know why you're observing a large amount of snapshots being queued? Is the polling time not configured to be large enough? or is the apply function taking the mutex too agressively? |
I agree with @balegas the current change makes it possible for explicit calls to When |
@balegas The large amount of snapshots is mostly because of how some tests are configured. The polling interval is 200ms and some tests fake a snapshot taking longer than usual, 500ms-1000ms. So the polling schedules a few snapshots, which need to be resolved before the process stops. Since there are places that want to ensure that a snapshot task gets queued, there could be an optional parameter when running the snapshot. The polling for instance doesn't really care about the skip. |
thanks for clarifying. In the case of a snpashot taking longer than the polling period they would get queued, but they wouldn't run after calling stop because the throttled function is cancelled. Is the behaviour really that they are executing before returning? |
@@ -196,18 +203,55 @@ export class SatelliteProcess implements Satellite { | |||
this._connectRetryHandler = connectRetryHandler | |||
} | |||
|
|||
_onSnapshotThrottleTick( | |||
{ skipIfRunning } = { skipIfRunning: false } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't the default be true
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whatever you guys prefer. I used "dont skip as a default" and opt in to the skip with the polling interval.
We can instead opt out in the places you consider.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mentioned the other day there could be places where we don't want to miss the snapshot
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would argue that for periodic snapshots the default of skipping if a snapshot is already in progress is fine. Let's see what @balegas thinks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kevin-dp I agree too to the skip by default on periodic snapshots (polling interval), but the snippet you point out is for the throttled snapshot function, which is used across the process.ts file and tests.
The polling already skips here: https://github.com/davidmartos96/electric/blob/b06b941b9c8f701e34e3f24effaffc0c393d5520/clients/typescript/src/satellite/process.ts#L332
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe _onSnapshotThrottleTick is not the best name because of the "Tick". I wasn't sure how to call it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like us to give a step back
The idea of the long polling function was to pick changes from anyone that was interacting with the database directly.
Looking at the code, I see that potentialDataChangeHandler is making use of throttledSnapshot. That is wrong, because this code should take a snapshot regardless of that because new changes were just applied. Any debouncing is introducing delay in capturing changes from the database.
Also, I don't think the queuing up of snapshots by the throttling function is the behaviour we intend (I'm aware I did that code :D). If a snapshot takes longer than the polling period, we don't want to queue multiple requests, we just want to call a next one immediately and get back into schedule.
wrt this PR, I think the idea of skipping the long polling call if a snapshot is already ongoing is valid, but I think we need to fix the parts above first and (that will lead to simpler code for the skipping part).
@davidmartos96, I understand that we're stepping up the complexity of this PR. So, I'm happy we take over. At the same time, I'm happy we continue discussing with you if you're motivated to do the change.
/** | ||
* Only acquire the mutex if it's not already locked. | ||
*/ | ||
private async acquireIfFree( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we think this is a really good abstraction to have, we should provide it as a wrapper to the mutex and not as a function of process.ts. Maintainers of the lib didn't seem to think this was necessary :).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, i would be highly in favor of adding this method as it captures the intent and doesn't make it possible for someone to accidentally slip in an async call in the gap between the check and the acquire.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fwiw I think this would be clearer if rather than being marked as async it is left as synchronous an returns Promise.resolve(null)
if the mutex is locked - this way there's no ambiguity with regards to any async gaps introduced by making the method itself async.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@msfstef @kevin-dp I updated the signature to this so that it is sync. Is Promise.resolve(null) necessary? Wouldn't that be the same as the previous code? async fun
+ return null/undefined
New signature: private acquireIfFree(mutex: Mutex): Promise<MutexInterface.Releaser> | null
@balegas Given there are other things that were discovered, I think it would be easier for communication purposes if you guys take it over then. Feel free to reuse this PR or the unit test provided.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@davidmartos96 you're right the Promise.resolve
is unnecessary - this signature and behaviour is clearer IMO!
Another way to think about this issue would be to always skip the throttled snapshot if the mutex is taken and flag the process with a boolean property if someone requested a snapshot while one was running. |
yeah, that makes sense for me too. The polling a bit of a corner case, so I prefer to have simpler code even if we lose a tick in some situations (against my original comment). |
This is a behavior we've seen when creating the test for the last PR (#1251). If you add a log to the mock snapshot function, you will see that when stopping, the process waits for a large amount of snapshots to finish before stopping completely.
This is because when polling, they are added to a queue due to the snapshot mutex.
Is this intentional? Or should the throttle just skip the call if the lock is taken?
cc @kevin-dp @msfstef