refactor(NODE-6090): Implement CSOT logic for server selection and connection checkout #4095
looks pretty good
src/sdam/server.ts
Outdated
if (options.operationTimeout) {
  conn = await this.pool.checkOut({ timeout: options.operationTimeout });
} else {
  conn = await this.pool.checkOut();
}
- if (options.operationTimeout) {
-   conn = await this.pool.checkOut({ timeout: options.operationTimeout });
- } else {
-   conn = await this.pool.checkOut();
- }
+ conn = await this.pool.checkOut({ timeout: options.operationTimeout });
TS supports just calling this because the timeout is optional
I feel Warren's current code is easier to read (and makes it harder for someone editing the code later to accidentally break CSOT spec compliance), but if we do end up going with this suggestion, can we leave a clarifying comment?
I am surprised, because breaking this up into two calls to checkOut based on a condition that doesn't matter is more to read, with no more meaningful context given. Whether or not timeout exists, there is no practical change to how checkOut is invoked, because TypeScript reports that field as optional.
I would actually take this further:
conn = await this.pool.checkOut(options);
Why do we need to make a new object here? Passing options through should be fine, right? The fewer branching paths there are, the less there is to debug.
Can you elaborate on what may accidentally break the spec without a test warning us?
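To make the two forms discussed in this thread concrete, here is a minimal sketch. It assumes checkOut declares timeout as an optional field; TimeoutLike, CheckOutOptions, and the checkOut function below are hypothetical stand-ins for illustration, not the driver's actual types.

```typescript
// Hypothetical stand-ins for illustration only; not the driver's real API.
interface TimeoutLike {
  duration: number;
}

interface CheckOutOptions {
  // Optional field: omitting it and passing `undefined` are both accepted.
  timeout?: TimeoutLike | undefined;
}

function checkOut(options: CheckOutOptions = {}): string {
  return options.timeout ? `timeout after ${options.timeout.duration}ms` : 'no timeout';
}

// Branching form, as in the original diff.
function checkOutBranching(operationTimeout?: TimeoutLike): string {
  if (operationTimeout) {
    return checkOut({ timeout: operationTimeout });
  }
  return checkOut();
}

// Single-call form, as in the suggestion.
function checkOutDirect(operationTimeout?: TimeoutLike): string {
  return checkOut({ timeout: operationTimeout });
}
```

Under these assumptions both forms reach checkOut with an equivalent timeout value, so the branch only changes how much there is to read.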
src/cmap/connection_pool.ts
Outdated
// Determine if we're using the timeout passed in or a new timeout
if (options.timeout.duration > 0 || serverSelectionTimeoutMS > 0) {
  if (
    csotMin(options.timeout.duration, serverSelectionTimeoutMS) === serverSelectionTimeoutMS
This is still an equals check. Sorry, I think we discussed it but didn't leave a comment: if the duration is the same, then we'll create a new timeout when we could use the existing one.
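A small sketch of the edge case raised here. It assumes that csotMin treats 0 as "no timeout" and that the true branch of the check creates a new timeout, as the comment describes. The function bodies and the names needsNewTimeoutEquals and needsNewTimeoutStrict are illustrative assumptions, not the PR's code.

```typescript
// Assumed behaviour of the CSOT min rule: 0 means "no timeout" (infinite).
function csotMin(a: number, b: number): number {
  if (a === 0) return b;
  if (b === 0) return a;
  return Math.min(a, b);
}

// The check as written in the diff: also true when both values are equal.
function needsNewTimeoutEquals(duration: number, serverSelectionTimeoutMS: number): boolean {
  return csotMin(duration, serverSelectionTimeoutMS) === serverSelectionTimeoutMS;
}

// Hypothetical stricter check: only true when serverSelectionTimeoutMS
// is finite and strictly shorter than the existing timeout's duration.
function needsNewTimeoutStrict(duration: number, serverSelectionTimeoutMS: number): boolean {
  return (
    serverSelectionTimeoutMS !== 0 &&
    (duration === 0 || serverSelectionTimeoutMS < duration)
  );
}
```

With duration and serverSelectionTimeoutMS both at 100, the equality form asks for a new timeout while the strict form reuses the existing one; for unequal non-zero values the two forms agree.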
Co-authored-by: Neal Beeken <neal.beeken@mongodb.com>
@@ -889,8 +922,6 @@ function drainWaitQueue(queue: List<ServerSelectionRequest>, drainError: MongoDr
      continue;
    }
drainWaitQueue is called when the topology is closed. Is it correct not to clear timeouts when the client is closed? I don't think so.
Or do we rely on drainWaitQueue rejecting each request in the wait queue, which would clear the timeout when the catch handler is run in selectServer?
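The second possibility can be sketched as follows. This is a simplified model, assuming selectServer awaits a queued request and clears its timer when that request settles; the waitQueue array, the activeTimers counter, and both function bodies are hypothetical and far simpler than the driver's code.

```typescript
interface WaitQueueRequest {
  resolve: (server: string) => void;
  reject: (error: Error) => void;
}

const waitQueue: WaitQueueRequest[] = [];
let activeTimers = 0; // illustrative counter so the cleanup is observable

async function selectServer(timeoutMS: number): Promise<string> {
  let rejectRequest: (error: Error) => void = () => {};
  const request = new Promise<string>((resolve, reject) => {
    rejectRequest = reject;
    waitQueue.push({ resolve, reject });
  });
  const timer = setTimeout(
    () => rejectRequest(new Error('server selection timed out')),
    timeoutMS
  );
  activeTimers++;
  try {
    return await request;
  } finally {
    // Runs whether the request resolved, timed out, or was drained.
    clearTimeout(timer);
    activeTimers--;
  }
}

// Rejects every queued request; each selectServer call then cleans up its own timer.
function drainWaitQueue(drainError: Error): void {
  for (const request of waitQueue.splice(0)) {
    request.reject(drainError);
  }
}
```

In this model the drain itself never touches a timer; the cleanup depends entirely on each waiting caller handling the rejection, which is the dependency the question points at.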
@@ -61,6 +62,12 @@ export abstract class AbstractOperation<TResult = any> {

  options: OperationOptions;

  /** @internal */
  timeout?: Timeout;
The CSOTContext class sounds somewhat similar to the OperationContext idea that you had explored earlier on in the CSOT design. I'm not against it on principle, but I think we scrapped that idea at some point. Will look for any documents we had on why and report back.
The context would be similar to the timeout factory, not the operation context. We decided not to implement an operation context because we can pass CSOT-related data on the options objects in the driver. I'm proposing a CSOT context to encapsulate CSOT logic, which we would then pass through the driver on the options objects.
I do not consider this thread resolved. I think the current implementation is more complicated than necessary because we only sometimes re-use the timeout.
- This means server selection and connection checkout must be responsible for determining when to reuse the timeout.
- We can't unconditionally clear the timeout because it may be re-used later.
Instead, I propose we either:
- Always create a new timeout for server selection and connection checkout.
- Always reuse the same Timeout object, but just reset the Timeout's interval for server selection and connection checkout.
Regardless of which is chosen, I think the resulting code is simpler because server selection and connection checkout (1) do not need to worry about whether to use a cached timeout or create a new one, and (2) can always clear the timeout.
This works especially nicely with the TimeoutFactory or a TimeoutContext, because we can encapsulate all timeout related logic into a single place that's easily unit testable. I'm partial to the factory approach:
class TimeoutFactory {
  private timeoutMS: number | null;
  private started = now();
  getTimeoutForServerSelection(): Timeout {
    // returns a timeout, handling CSOT vs Legacy timeout logic
  }
}
class Topology {
  selectServer(options: { ..., timeoutFactory: TimeoutFactory }) {
    ...
    const timeout = timeoutFactory.getTimeoutForServerSelection();
    try {
      ....
    } finally {
      timeout.clear();
    }
  }
}
Note that with an approach like this, whether or not we reuse a timeout can easily be encapsulated into the TimeoutFactory by instantiating a timeout when the factory is constructed and returning the cached timeout where needed.
But a context class could suffice too:
class TimeoutContext {
  private timeoutMS: number | null;
  private started = now();
  getTimeoutForServerSelection(): number {}
}
class Topology {
  selectServer(options: { ..., timeoutContext: TimeoutContext }) {
    ...
    const timeout = Timeout.expires(timeoutContext.getTimeoutForServerSelection());
    try {
      ....
    } finally {
      timeout.clear();
    }
  }
}
An approach like this consolidates CSOT logic and can be reused outside of the main code path (i.e., topology connect).
I don't think this work needs to block this PR. But I do want to make sure we discuss this, and I'd like to consider one of these approaches in a future ticket.
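For discussion purposes, here is one way the context variant could be fleshed out into something runnable. Everything in it is an assumption: the class name TimeoutContextSketch, the injected clock, the helper minIgnoringZero, the choice to treat timeoutMS of 0 as "no limit", and the plain Error standing in for MongoOperationTimeoutError.

```typescript
type Clock = () => number;

// Assumed CSOT-style min: 0 means "no limit".
function minIgnoringZero(a: number, b: number): number {
  if (a === 0) return b;
  if (b === 0) return a;
  return Math.min(a, b);
}

class TimeoutContextSketch {
  private readonly started: number;

  constructor(
    private readonly timeoutMS: number | null,
    private readonly serverSelectionTimeoutMS: number,
    private readonly now: Clock = Date.now
  ) {
    this.started = now();
  }

  // Remaining operation budget in ms; 0 means "no limit".
  private remainingTimeMS(): number {
    if (this.timeoutMS == null || this.timeoutMS === 0) return 0;
    const remaining = this.timeoutMS - (this.now() - this.started);
    // A plain Error stands in for the driver's timeout error here.
    if (remaining <= 0) throw new Error('operation timed out');
    return remaining;
  }

  // Legacy path: no timeoutMS, so only serverSelectionTimeoutMS applies.
  // CSOT path: the smaller of the remaining budget and serverSelectionTimeoutMS.
  getTimeoutForServerSelection(): number {
    if (this.timeoutMS == null) return this.serverSelectionTimeoutMS;
    return minIgnoringZero(this.remainingTimeMS(), this.serverSelectionTimeoutMS);
  }
}
```

Injecting the clock is what makes this easy to unit test: a test can advance a fake clock and assert on the returned duration without waiting on real timers.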
@@ -457,8 +458,14 @@ export class Topology extends TypedEventEmitter<TopologyEvents> {
      }
    }

    const timeoutMS = this.client.options.timeoutMS;
    const timeout = timeoutMS != null ? Timeout.expires(timeoutMS) : undefined;
Is this correct if we're auto-connecting the client?
Description
What is changing?
New Error
- MongoOperationTimeoutError class that is thrown when a CSOT timeout is encountered
Changes to Timeout
- Timeout.throwIfExpired() method
- Timeout.remainingTime getter method
Updates to AbstractOperation
- timeout field
- timeout is set at construction if the timeoutMS option is provided
Implementing CSOT behaviour for server selection
- Topology.selectServer to accept a timeout option, which it will use to determine whether it has timed out when defined. Otherwise, constructs a Timeout using the serverSelectionTimeoutMS option as before
- Topology.selectServer to throw a MongoOperationTimeoutError on timeout when options.timeout is provided and retain previous error behaviour otherwise
- Topology._connect to pass down timeout to the Server.command call used to execute ping on first connection
Implementing CSOT behaviour for connection checkout
- Server.command to accept timeout option
- ConnectionPool.checkOut to accept timeout option
- serverSelectionTimeoutMS is greater than the duration on the timeout, otherwise, computes the time elapsed since server selection completed and creates timeout for the serverSelectionTimeoutMS deadline
Test changes
Misc changes
- resolveOptions to handle timeoutMS option propagation
- csotMin helper method that implements the CSOT min algorithm described here
Is there new documentation needed for these changes?
What is the motivation for this change?