Stateless Auth Handler: Access Token Refresh in multi-threaded environment #322

pragmaticway · 2018-11-05T19:36:03Z

In https://doc.networknt.com/style/light-spa-4j/stateless-auth/ it's stated:

It is called stateless auth handler because it doesn’t need any stateful session on the BFF server so that the BFF can be scaled freely. The JWT access token and refresh token are sent to the browser with httpOnly

If it's stateless, how does it do token refresh?

Current state

In an SPA, we may see simultaneous asynchronous requests that hit the back-end with the same JWT cookie.
Hence, more than one thread will receive a JWT that contains the same access and refresh tokens.

The Problem

For the sake of simplicity, consider this scenario:
Two server threads receive requests from an SPA (each containing the same JWT cookie).
Both threads detect that the access token needs to be refreshed. Both initiate the refresh token flow, and one of them succeeds: it gets a new access token and a new refresh token, which are wrapped into a JWT and sent back to the browser as an httpOnly cookie that replaces the old cookie value.
BUT the other thread will fail to renew the access token, because the first thread has already used the refresh token and the token service will reject a previously used refresh token. That thread will return an unauthorized response, which will trigger a new login flow.

Net result

If an SPA issues multiple concurrent requests to the back-end and the access token expires and needs to be refreshed, the user may be redirected to the login screen instead of continuing the session with an auto-refreshed access token.

Possible solutions

Synchronization Point

The back-end threads serving the same access/refresh token should share a distributed synchronization point during the access token refresh flow, keyed on a digest of the access/refresh token value.

Example:

A thread detects that a token refresh is required
It attempts to acquire a distributed lock keyed on the digest of the access/refresh token value
If the lock is acquired, call the token service with the refresh token to get a new access token and a new refresh token, store them in distributed storage or an event queue, and send them back to the client as a cookie update
If the lock is not acquired, wait for the other thread to finish and retrieve the new access and refresh tokens from distributed storage or the event queue. Use the retrieved access token to finish the request.

PROS: Less traffic to the token service, since it avoids unnecessary refresh calls made with an already used refresh token
CONS: Added complexity. It requires a distributed lock mechanism and robust handling of lock timeouts. Test scenarios that cover the flow are complex.
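
The steps above could be sketched roughly as follows. This is only a single-JVM illustration: the `ConcurrentHashMap` stands in for both the distributed lock and the distributed storage, and `TokenPair`, `RefreshCoordinator`, and the `tokenService` function are hypothetical names, not light-4j classes. A real deployment would also need entry expiry and lock-timeout handling.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Hypothetical value type, not a light-4j class.
final class TokenPair {
    final String accessToken;
    final String refreshToken;

    TokenPair(String accessToken, String refreshToken) {
        this.accessToken = accessToken;
        this.refreshToken = refreshToken;
    }
}

// In-memory stand-in for the distributed lock and the distributed storage.
final class RefreshCoordinator {
    private final Map<String, CompletableFuture<TokenPair>> inFlight = new ConcurrentHashMap<>();
    private final Function<String, TokenPair> tokenService; // hypothetical token service call

    RefreshCoordinator(Function<String, TokenPair> tokenService) {
        this.tokenService = tokenService;
    }

    // Hex-encoded SHA-256 digest, so the raw token is never used as a key.
    static String digest(String token) {
        try {
            byte[] hash = MessageDigest.getInstance("SHA-256")
                    .digest(token.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    TokenPair refresh(String refreshToken, long waitMillis) {
        String key = digest(refreshToken);
        CompletableFuture<TokenPair> mine = new CompletableFuture<>();
        CompletableFuture<TokenPair> winner = inFlight.putIfAbsent(key, mine);
        if (winner != null) {
            // Lock not acquired: wait for the winning thread's result.
            try {
                return winner.get(waitMillis, TimeUnit.MILLISECONDS);
            } catch (Exception e) {
                throw new IllegalStateException("refresh by the other thread failed or timed out", e);
            }
        }
        try {
            // Lock acquired: only this thread calls the token service.
            TokenPair renewed = tokenService.apply(refreshToken);
            mine.complete(renewed);
            return renewed;
        } catch (RuntimeException e) {
            mine.completeExceptionally(e);
            inFlight.remove(key, mine); // let a later request retry
            throw e;
        }
    }
}
```

In this sketch the completed entry stays in the map, so a late request that still carries the old refresh token also receives the renewed pair.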

Stateless Refresh

Configure the access token and the refresh token with the same timeout/lifespan. Try to refresh the access token while it's still valid and not expired. If the refresh fails because another thread already used the refresh token, continue using the existing access token, since it's still valid.

Example:

A thread detects that a token refresh is required (IMPORTANT: the existing access token is valid and not expired)
Call the token service with the refresh token to get a new access token and a new refresh token. If successful, use the new access token and send both tokens (wrapped in the JWT) back to the client as a cookie update
If the refresh is unsuccessful, use the old access token, since it's still valid and not expired.

PROS: The refresh flow stays stateless
CONS: Additional traffic to the token service from calls made with an already used refresh token
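
A minimal sketch of this fallback, assuming the caller has already verified that the current access token is still valid. `Tokens`, `RefreshResult`, `StatelessRefresh`, and the `tokenService` function (which returns an empty `Optional` when the refresh token is rejected) are hypothetical names used for illustration only.

```java
import java.util.Optional;
import java.util.function.Function;

// Hypothetical value type, not a light-4j class.
final class Tokens {
    final String access;
    final String refresh;

    Tokens(String access, String refresh) {
        this.access = access;
        this.refresh = refresh;
    }
}

// Which tokens to use for this request, and whether the cookie must be rewritten.
final class RefreshResult {
    final Tokens tokens;
    final boolean cookieUpdateNeeded;

    RefreshResult(Tokens tokens, boolean cookieUpdateNeeded) {
        this.tokens = tokens;
        this.cookieUpdateNeeded = cookieUpdateNeeded;
    }
}

final class StatelessRefresh {
    // Precondition (assumed): current.access has not expired yet.
    static RefreshResult refreshOrKeep(Tokens current,
                                       Function<String, Optional<Tokens>> tokenService) {
        Optional<Tokens> renewed = tokenService.apply(current.refresh);
        if (renewed.isPresent()) {
            // This thread won the race: use the new tokens and update the cookie.
            return new RefreshResult(renewed.get(), true);
        }
        // Another thread already used the refresh token: keep the still-valid tokens.
        return new RefreshResult(current, false);
    }
}
```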

stevehu · 2018-11-06T19:36:44Z

@pragmaticway Thanks a lot for detecting this issue and providing solutions. It is a very complicated scenario, and this gap is very hard to spot. Fantastic job!!!

As for solution 1, we need some sort of in-memory data grid to support synch between multiple router instances. We cannot guarantee that both requests go to the same instance when smart DNS is used in a round robin fashion. We have a plan to build a StatefulAuthHandler and it will leverage some sort of IMDG like Hazelcast or Redis. In that case, we can keep a distributed session on the server so that we don't need to send the access token and refresh token to the browser anymore.

The second solution might work, but we need to ask the SPA to send heartbeat requests periodically. Otherwise, there is no trigger to renew the token before it expires. If the SPA is idle for a long period of time, the token might already be expired when new requests are sent to the server simultaneously. It is doable but requires that the SPA has some special logic.

The third solution that I can think of is to update the light-oauth2 service. When the first request renews the token with a refresh token, the service generates a new access token and a new refresh token; however, it also links the old refresh token to the new access token and new refresh token in a distributed map for a short period of time, such as 1 to 5 minutes. When the second request tries to renew the token with the old refresh token, the service gives back the already renewed access token and refresh token. In light-oauth2, we can limit this feature to certain client types so that it can only be used by SPA applications.
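
The grace-period idea could look roughly like the sketch below. It is not light-oauth2 code: `GraceRefreshService` and `IssuedTokens` are hypothetical, plain in-memory collections stand in for the distributed map, tokens are random placeholders, and the restriction to certain client types is left out.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.UUID;
import java.util.function.LongSupplier;

// Hypothetical value type, not a light-oauth2 class.
final class IssuedTokens {
    final String accessToken;
    final String refreshToken;

    IssuedTokens(String accessToken, String refreshToken) {
        this.accessToken = accessToken;
        this.refreshToken = refreshToken;
    }
}

final class GraceRefreshService {
    private static final class GraceEntry {
        final IssuedTokens tokens;
        final long expiresAtMillis;

        GraceEntry(IssuedTokens tokens, long expiresAtMillis) {
            this.tokens = tokens;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Set<String> validRefreshTokens = new HashSet<>();
    // Old refresh token -> tokens issued for it (stand-in for the distributed map).
    private final Map<String, GraceEntry> recentlyRenewed = new HashMap<>();
    private final long graceMillis;
    private final LongSupplier clock; // injected so the grace period can be tested

    GraceRefreshService(long graceMillis, LongSupplier clock) {
        this.graceMillis = graceMillis;
        this.clock = clock;
    }

    synchronized void registerRefreshToken(String refreshToken) {
        validRefreshTokens.add(refreshToken);
    }

    synchronized Optional<IssuedTokens> renew(String refreshToken) {
        long now = clock.getAsLong();
        GraceEntry entry = recentlyRenewed.get(refreshToken);
        if (entry != null) {
            if (entry.expiresAtMillis > now) {
                return Optional.of(entry.tokens); // replay inside the grace period
            }
            recentlyRenewed.remove(refreshToken); // grace period is over
        }
        if (!validRefreshTokens.remove(refreshToken)) {
            return Optional.empty(); // unknown or already consumed
        }
        IssuedTokens issued = new IssuedTokens(
                "access-" + UUID.randomUUID(), "refresh-" + UUID.randomUUID());
        validRefreshTokens.add(issued.refreshToken);
        recentlyRenewed.put(refreshToken, new GraceEntry(issued, now + graceMillis));
        return Optional.of(issued);
    }
}
```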

I think we might end up implementing all three options. What do you think?

pragmaticway · 2018-11-06T21:14:44Z

@stevehu I think implementing all 3 options is a good idea: it keeps the framework generic and satisfies the various security/refresh flow requirements of different customers.
Perhaps the option to use should be configurable via some sort of factory and pluggable implementation.

I suggest defining an interface for "a refresh flow plugin" and providing all 3 implementations "out of the box". Users of the framework can then select the desired implementation and, where required, even provide their own implementation of the interface that satisfies their unique custom logic.
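
One possible shape for such a plugin interface is sketched below. Nothing here exists in light-spa-4j: `RefreshFlow`, `RefreshFlowFactory`, `JwtTokens`, and the flow names are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical value type, not a light-4j class.
final class JwtTokens {
    final String accessToken;
    final String refreshToken;

    JwtTokens(String accessToken, String refreshToken) {
        this.accessToken = accessToken;
        this.refreshToken = refreshToken;
    }
}

// Hypothetical plugin contract for a refresh flow.
interface RefreshFlow {
    // Returns renewed tokens, or empty when the caller should keep the current ones.
    Optional<JwtTokens> refresh(JwtTokens current);
}

// Picks an implementation by the name found in configuration.
final class RefreshFlowFactory {
    private final Map<String, Supplier<RefreshFlow>> registry = new HashMap<>();

    void register(String name, Supplier<RefreshFlow> supplier) {
        registry.put(name, supplier);
    }

    RefreshFlow create(String configuredName) {
        Supplier<RefreshFlow> supplier = registry.get(configuredName);
        if (supplier == null) {
            throw new IllegalArgumentException("Unknown refresh flow: " + configuredName);
        }
        return supplier.get();
    }
}
```

Custom implementations would be registered under their own name in the same way as the three built-in ones.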

Also, regarding the 3rd solution to modify the light-oauth2 service behaviour: this might raise concerns with some customers' IT security teams, who don't want deviations from the standard OAuth2 flow rule that a "refresh token can be used only once".

AlexeiZenin · 2018-11-06T23:51:35Z

@stevehu The second solution @pragmaticway mentions is the solution I am currently implementing in one of my projects (which is how I stumbled upon your implementation and found this bug). From my understanding, all solutions (even the original one) need the SPA to be active, as the refresh algorithm is initiated by HTTP requests and not by a background thread.

I am currently using a distributed token store to look up refresh tokens for incoming access tokens and then attempt refreshes. The cool thing here is that it does not matter if a refresh fails, because the refresh is optimistic. If the optimistic refresh is triggered early enough, it does not matter which thread actually performs it: one of them will send the new token back to the browser through a cookie before the expiry time of the tokens held by the threads that failed to refresh (those threads just use their current valid token). The failed-refresh requests then complete as normal, and once the SPA issues new requests, they will all carry the latest access token.

In the other case, where a user has been inactive for a long period of time, they become effectively logged out: since we only handle optimistic refreshes, we do not refresh once the access token has "actually" expired. This is an implementation detail, but it meets my current project's needs well, as we need to log out users who have been inactive for 5 minutes (which is achieved simply by configuring token expiries).

An example configuration would be to set the access token expiry to 10 minutes, the refresh token expiry to 10 minutes, and the optimistic refresh to happen 1 minute before the access token expires. This way there is enough time for the in-flight requests to use their valid tokens and enough time to propagate the new access token back via the cookie.
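
The timing check for that configuration might be sketched like this. `OptimisticRefreshPolicy` is a hypothetical helper; with a 10 minute token and a 1 minute window, a refresh is attempted only during the last minute before expiry and never after it.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper: decides whether a request should attempt an optimistic refresh.
final class OptimisticRefreshPolicy {
    private final Duration refreshWindow; // e.g. 1 minute before expiry

    OptimisticRefreshPolicy(Duration refreshWindow) {
        this.refreshWindow = refreshWindow;
    }

    boolean shouldRefresh(Instant accessTokenExpiry, Instant now) {
        if (!now.isBefore(accessTokenExpiry)) {
            // Already expired: no optimistic refresh, the user has to log in again.
            return false;
        }
        // Refresh once we are inside the window that ends at the expiry time.
        return !now.isBefore(accessTokenExpiry.minus(refreshWindow));
    }
}
```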

If only cookies are used, this algorithm needs no external distributed cache and becomes truly stateless on the server side: we use the browser as our cache :). This is of course the least secure option, as you expose both tokens to the client (maybe encrypt them?).

It was cool to read your implementation with the cookies, though. I came to the same design, and it was awesome to find we had the same thinking about the problem.

jiachen1120 · 2019-04-12T19:01:12Z

@stevehu Hey! This problem can probably be solved by using the token caching mechanism introduced by @BalloonWen in OauthHelper. We can use the refresh token as the cache key. Then, if two requests with the same cookie trigger the token refresh simultaneously, they will map to the same Jwt object. Because of the lock on the Jwt object, while one of them is renewing the token, the other request waits outside the lock. Once the lock is released, the Jwt has already been updated, so the second request returns directly. What do you think? This way, we only need to change the current structure slightly.
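
The locking pattern described here can be illustrated with the sketch below. It does not use the real OauthHelper or TokenManager API: `SharedJwt`, `InstanceJwtCache`, and the `renewer` callback are hypothetical, the cache only covers a single router instance, and eviction of old entries is omitted.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical mutable token holder shared by all requests with the same cookie.
final class SharedJwt {
    String accessToken;
    String refreshToken;
    long expiresAtMillis;

    SharedJwt(String accessToken, String refreshToken, long expiresAtMillis) {
        this.accessToken = accessToken;
        this.refreshToken = refreshToken;
        this.expiresAtMillis = expiresAtMillis;
    }
}

final class InstanceJwtCache {
    private final ConcurrentHashMap<String, SharedJwt> byRefreshToken = new ConcurrentHashMap<>();
    private final long renewBeforeMillis;
    // Hypothetical callback: calls the token service and updates the jwt in place.
    private final Consumer<SharedJwt> renewer;

    InstanceJwtCache(long renewBeforeMillis, Consumer<SharedJwt> renewer) {
        this.renewBeforeMillis = renewBeforeMillis;
        this.renewer = renewer;
    }

    SharedJwt getFresh(String accessToken, String refreshToken,
                       long expiresAtMillis, long nowMillis) {
        // Requests carrying the same refresh token share one SharedJwt object.
        SharedJwt jwt = byRefreshToken.computeIfAbsent(
                refreshToken, k -> new SharedJwt(accessToken, refreshToken, expiresAtMillis));
        synchronized (jwt) {
            if (jwt.expiresAtMillis - nowMillis > renewBeforeMillis) {
                // Still fresh, or already renewed by the request that held the lock.
                return jwt;
            }
            renewer.accept(jwt);
            // Also index the entry under the new refresh token for later requests.
            byRefreshToken.put(jwt.refreshToken, jwt);
            return jwt;
        }
    }
}
```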

stevehu · 2019-04-12T20:09:13Z

@jiachen1120 Yes. The new OauthHelper can be used to resolve the sync issue within the same light-router instance. We still need to implement something in the light-oauth2 token service to handle the scenario with multiple router instances.

jiachen1120 · 2019-04-12T20:30:10Z

@stevehu Yes. I will look into the light-oauth2 token service then. Currently, the StatelessAuthHandler inside light-spa-4j does not seem to use the cache mechanism. To enable the cache, we need to use TokenManager.getJwt() instead of OauthHelper.getTokenResult(). Should we update it first?

stevehu · 2019-04-13T00:14:12Z

Yes. That is the right direction to move. Thanks.

AlexeiZenin · 2019-04-14T15:51:59Z

A good system to look at where this bug also occurs is Spring's Session implementation with Zuul Gateway, which does not handle this cross-instance race condition either (in fact, the issue has been open for a few years): spring-attic/spring-security-oauth#834

stevehu · 2019-04-15T04:05:36Z

@AlexeiZenin I think we can resolve the cross-instance race condition with something implemented on the light-oauth2. There is no way that we can resolve it between two or more service instances.

pragmaticway changed the title ~~Stateless Auth Handler~~ Stateless Auth Handler: Access Token Refresh in multi-threaded environment Nov 5, 2018

stevehu assigned miklish, pragmaticway, whoamnick, ddobrin and GavinChenYan Nov 6, 2018

stevehu added the bug Issue: Bug label Nov 6, 2018

stevehu added the help wanted Issue: Help Wanted label Feb 16, 2019

whoamnick assigned jiachen1120 Apr 10, 2019
