Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(NODE-5464): OIDC machine and callback workflow #3912

Open
wants to merge 48 commits into
base: main
Choose a base branch
from
Open

Conversation

durran
Copy link
Member

@durran durran commented Nov 7, 2023

Description

Implements OIDC new machine and human callback workflows.

What is changing?

  • Implements the OIDC callback workflow. Specified with OIDC_CALLBACK auth mech property.
  • Implements the OIDC human callback workflow. Specified with OIDC_HUMAN_CALLBACK auth mech property.
  • Implements the OIDC Test machine workflow. Specified with ENVIRONMENT:test auth mech property.
  • Implements the OIDC Azure machine workflow. Specified with ENVIRONMENT:azure auth mech property.
  • Implements the OIDC GCP machine workflow. Specified with ENVIRONMENT:gcp auth mech property.
  • Uses a new generic TokenCache for all OIDC authentication that sits at the auth provider level.
  • Removes the old complex callback workflow global caching.
  • Removes the old global Azure token cache.
Is there new documentation needed for these changes?

What is the motivation for this change?

mongodb/specifications#1471
mongodb/specifications#1544
mongodb/specifications#1513

Release Highlight

Support for MONGODB-OIDC Authentication

MONGODB-OIDC is now supported as an authentication mechanism for MongoDB server versions 7.0+. The currently supported facets to authenticate with are callback authentication, human interaction callback authentication, Azure machine authentication, and GCP machine authentication.

Azure Machine Authentication

The MongoClient must be instantiated with authMechanism=MONGODB-OIDC in the URI or in the client options. Additional required auth mechanism properties of TOKEN_RESOURCE and ENVIRONMENT are required and another optional username can be provided. Example:

const client = new MongoClient('mongodb+srv://<username>@<host>:<port>/?authMechanism=MONGODB-OIDC&authMechanismProperties=TOKEN_RESOURCE:<azure_token>,ENVIRONMENT=azure');
await client.connect();

GCP Machine Authentication

The MongoClient must be instantiated with authMechanism=MONGODB-OIDC in the URI or in the client options. Additional required auth mechanism properties of TOKEN_RESOURCE and ENVIRONMENT are required. Example:

const client = new MongoClient('mongodb+srv://<host>:<port>/?authMechanism=MONGODB-OIDC&authMechanismProperties=TOKEN_RESOURCE:<gcp_token>,ENVIRONMENT=gcp');
await client.connect();

Callback Authentication

The user can provide a custom callback to the MongoClient that returns a valid response with an access token. The callback is provided as an auth mechanism property an has the signature of:

const oidcCallBack = (params: OIDCCallbackParams): Promise<OIDCResponse> => {
  // params.timeoutContext is an AbortSignal that will abort after 30 seconds for non-human and 5 minutes for human.
  // params.version is the current OIDC API version.
  // params.idpInfo is the IdP info returned from the server.
  // params.username is the optional username.

  // Make a call to get a token.
  const token = ...;
  return {
     accessToken: token,
     expiresInSeconds: 300,
     refreshToken: token
  };
}

const client = new MongoClient('mongodb+srv://<host>:<port>/?authMechanism=MONGODB-OIDC', {
  authMechanismProperties: {
    OIDC_CALLBACK: oidcCallback
  }
});
await client.connect();

For callbacks that require human interaction, set the callback to the OIDC_HUMAN_CALLBACK property:

const client = new MongoClient('mongodb+srv://<host>:<port>/?authMechanism=MONGODB-OIDC', {
  authMechanismProperties: {
    OIDC_HUMAAN_CALLBACK: oidcCallback
  }
});
await client.connect();

Double check the following

  • Ran npm run check:lint script
  • Self-review completed using the steps outlined here
  • PR title follows the correct format: type(NODE-xxxx)[!]: description
    • Example: feat(NODE-1234)!: rewriting everything in coffeescript
  • Changes are covered by tests
  • New TODOs have a related JIRA ticket

@durran durran force-pushed the NODE-5464 branch 3 times, most recently from f6422b2 to 5482d70 Compare November 8, 2023 19:07
@durran durran force-pushed the NODE-5464 branch 3 times, most recently from 8bc8de0 to e67a221 Compare December 21, 2023 14:49
@durran durran force-pushed the NODE-5464 branch 3 times, most recently from ea3d2bc to 88c6eff Compare February 2, 2024 15:56
@durran durran force-pushed the NODE-5464 branch 2 times, most recently from 569255f to 893a15c Compare February 14, 2024 13:31
@durran durran force-pushed the NODE-5464 branch 3 times, most recently from 4b8ca02 to 5ea2fb3 Compare February 21, 2024 10:57
@durran durran force-pushed the NODE-5464 branch 11 times, most recently from ce7642f to 0542a48 Compare February 28, 2024 20:04
@durran durran changed the title feat(NODE-5464): OIDC machine workflow feat(NODE-5464): OIDC machine and callback workflow Feb 28, 2024
@durran durran force-pushed the NODE-5464 branch 2 times, most recently from a40da5a to 51718d8 Compare February 28, 2024 20:36
@nbbeeken nbbeeken self-assigned this May 7, 2024
@nbbeeken nbbeeken added the Primary Review In Review with primary reviewer, not yet ready for team's eyes label May 7, 2024
test/integration/auth/mongodb_oidc.prose.test.ts Outdated Show resolved Hide resolved
test/tools/unified-spec-runner/entities.ts Outdated Show resolved Hide resolved
src/cmap/auth/mongo_credentials.ts Show resolved Hide resolved
test/tools/uri_spec_runner.ts Show resolved Hide resolved
test/unit/index.test.ts Outdated Show resolved Hide resolved
@durran
Copy link
Member Author

durran commented May 14, 2024

#3912 (comment)

I've update to export both TokenCache and Workflow as types.

@durran durran requested a review from nbbeeken May 14, 2024 16:38
@nbbeeken nbbeeken added Team Review Needs review from team and removed Primary Review In Review with primary reviewer, not yet ready for team's eyes labels May 15, 2024
Copy link
Contributor

@addaleax addaleax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Left a few comments but no need to wait for me before merging :)

Comment on lines +91 to +92
// eslint-disable-next-line github/no-then
lock = lock.then(() => callback(credentials));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// eslint-disable-next-line github/no-then
lock = lock.then(() => callback(credentials));
lock = callback(credentials);

If we already await before this line anyway, we shouldn't need to chain the Promises here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There may be a simpler way to implement this, but Durran's implementation is correct. Durran's implementation forces each callback to execute sequentially, even when it is called multiple times concurrently. If you remove the .then chaining, as soon as lock resolves, each invocation will continue and re-assign lock, calling callback for each invocation. Using the .then chain, as soon as lock resolves, each pending invocation will proceed and then chain onto the lock, forcing each callback invocation to happen sequentially.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(you can try it with this simplified script)

import { setTimeout } from 'timers/promises';

function withLock(callback: () => Promise<number>) {
  let lock: Promise<any> = Promise.resolve();
  return async (): Promise<number> => {
    await lock;

    lock = lock.then(() => callback());
    return await lock;
  };
}

const handler = async () => {
  console.log('starting execution');
  await setTimeout(1000);
  console.log('finished execution');
  return 3;
};
const cache = withLock(handler);

Promise.all(Array.from({ length: 10 }).map(() => cache())).then(() => 'completed');

/**
* Ensure the callback is only executed one at a time.
*/
private withLock(callback: OIDCTokenFunction) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
private withLock(callback: OIDCTokenFunction) {
private withLock(callback: OIDCTokenFunction): OIDCTokenFunction {

/**
* Ensure the callback is only executed one at a time.
*/
protected withLock(callback: OIDCCallbackFunction) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
protected withLock(callback: OIDCCallbackFunction) {
protected withLock(callback: OIDCCallbackFunction): OIDCCallbackFunction {

* @public
* @category Error
*/
export class MongoGCPError extends MongoRuntimeError {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be

Suggested change
export class MongoGCPError extends MongoRuntimeError {
export class MongoGCPError extends MongoOIDCError {

?

request.destroy(new MongoNetworkTimeoutError(`request timed out after 10 seconds`));
}, 10000);
});
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably a dumb question, but this seems to duplicate a lot from the request() function below, should this be re-using it?

If not, I'd consider making the error message thrown here a bit more verbose and contain the URL, similar to request() below

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was originally written for azure kms in the libmongocrypt repo. There are also specific requirements for Azure KMS fetching and OIDC that make the other request function incompatible (specifically, we must access the status code, but request only returns the body)

We should probably consolidate these functions at some point but for this PR it probably suffices to use get, which we already had (we just moved it from our FLE code).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: we're using a different endpoint for GCP: computeMetadata/v1/instance/service-accounts/default/identity vs computeMetadata/v1/instance/service-accounts/default/token.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our get and request are HTTP utilities, but Steve's comment is definitely relevant for #3912 (comment)

*/
constructor(cache: TokenCache, callback: OIDCCallbackFunction) {
super(cache, callback);
this.lastInvocationTime = Date.now();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean that we won't invoke the callback if we attempt auth within 100ms of starting the driver?

Should we instead initialize this to Date.now() - 100ms?

request.destroy(new MongoNetworkTimeoutError(`request timed out after 10 seconds`));
}, 10000);
});
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was originally written for azure kms in the libmongocrypt repo. There are also specific requirements for Azure KMS fetching and OIDC that make the other request function incompatible (specifically, we must access the status code, but request only returns the body)

We should probably consolidate these functions at some point but for this PR it probably suffices to use get, which we already had (we just moved it from our FLE code).

Comment on lines +91 to +92
// eslint-disable-next-line github/no-then
lock = lock.then(() => callback(credentials));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There may be a simpler way to implement this, but Durran's implementation is correct. Durran's implementation forces each callback to execute sequentially, even when it is called multiple times concurrently. If you remove the .then chaining, as soon as lock resolves, each invocation will continue and re-assign lock, calling callback for each invocation. Using the .then chain, as soon as lock resolves, each pending invocation will proceed and then chain onto the lock, forcing each callback invocation to happen sequentially.

Comment on lines +91 to +92
// eslint-disable-next-line github/no-then
lock = lock.then(() => callback(credentials));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(you can try it with this simplified script)

import { setTimeout } from 'timers/promises';

function withLock(callback: () => Promise<number>) {
  let lock: Promise<any> = Promise.resolve();
  return async (): Promise<number> => {
    await lock;

    lock = lock.then(() => callback());
    return await lock;
  };
}

const handler = async () => {
  console.log('starting execution');
  await setTimeout(1000);
  console.log('finished execution');
  return 3;
};
const cache = withLock(handler);

Promise.all(Array.from({ length: 10 }).map(() => cache())).then(() => 'completed');

*/
function isEndpointResultValid(
token: unknown
): token is { access_token: unknown; expires_in: unknown } {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Azure KMS credential fetching does assume (and assert) that these properties are of type string and number respectively, is there a reason we're not validating the type of the response's fields for OIDC too?

import { type TokenCache } from './token_cache';

/** Base URL for getting Azure tokens. */
const AZURE_BASE_URL = 'http://169.254.169.254/metadata/identity/oauth2/token?';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(non-blocking) there's are a few duplicated constants between this and the Azure KMS module. Could we consolidate?

We also, with minor modifications, be able to use the prepareRequest method defined in the Azure kms module here too. they're almost identical.

Comment on lines +42 to +43
const url = new URL(GCP_BASE_URL);
url.searchParams.append('audience', tokenAudience);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(not blocking) any reason not to use the GCP SDK here?

I'm a little wary of divergent implementations between authentication and KMS credential fetching in FLE (I recently aligned the AWS implementations, but now Azure OIDC and GCP OIDC diverge).

Perhaps we could either adjust GCP and Azure KMS fetching to match the OIDC implementation (with a drivers ticket probably, not in this work) or the vice-versa?

Comment on lines +76 to +82
const responses = await Promise.all([
setTimeout(CALLBACK_DEBOUNCE_MS - (now - this.lastInvocationTime)),
this.fetchAccessToken(credentials)
]);
response = responses[1];
}
this.lastInvocationTime = now;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this supposed to be:

else {
  // wait until we have satisfied the debounce threshold
  await setTimeout(CALLBACK_DEBOUNCE_MS - (now - this.lastInvocationTime));
  const response = await this.fetchAccessToken();
}

Promise.all is eager, so as soon as we enter this if-block, we will call fetchAccessToken. Which means that if we are attempting to fetch an access token and we determine that is has not been CALLBACK_DEBOUNCE_MS since the last call, we immediately call it again?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also - how does this handle potential concurrent calls to workflow.execute()? It seems like if the debounce threshold hasn't been met, alll concurrent requests will wait until the threshold has been satisfied, but will then they will all proceed in parallel without waiting after each invocation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Team Review Needs review from team
Projects
None yet
5 participants