Infinispan Deadlock issue when file-store persistent enabled #12367
I can't say for sure that the SIFS errors are causing the timeouts. The main reason is that I can't tell from the logs whether both operations are trying to access the same keys. If we could make that link, it could point to a bug hidden in SIFS. For the same reason, I also can't confirm that the issue won't happen again without persistence.

By 15.0, I assume you mean 15.0.0.Final? Since then, we have added a few fixes to SIFS (ISPN-15943, ISPN-15930, ISPN-15894). Are you able to test with a more recent version? The latest is 15.0.3.Final.

Configuration-wise, it looks good to me. However, @ryanemerson knows more here.
A couple of questions.
As @jabolina mentioned, it is difficult to say. One possible case is when the write-behind queue gets full, which blocks further operations until it is flushed to disk. The lock stays held for that whole time. In addition, the external Infinispan sends events to the Keycloak nodes, and Keycloak needs to process them. Once again, the lock stays held during this period. Can you correlate a peak in your load with the time of those 2 incidents?

Off-topic: I'm helping the Keycloak team with their Infinispan deployment in multi-site HA scenarios. Can you share the reasons why you are deploying an external Infinispan if cross-site is not enabled? Basically, what problem does the external Infinispan solve that Keycloak itself can't? An external Infinispan is not required for single-cluster Keycloak deployments. Thanks.
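To illustrate the write-behind scenario described above (a writer blocked on a full queue while still holding its lock), here is a minimal Java sketch. It uses only `java.util.concurrent`. The class `WriteBehindSketch`, the method `secondWriterGetsLock`, and the queue capacity of 1 are hypothetical stand-ins chosen for illustration; this is not Infinispan's actual implementation.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: none of these names are Infinispan classes.
public class WriteBehindSketch {

    /**
     * Returns true if a second writer could acquire the key lock within
     * timeoutMillis while the first writer is stuck on a full queue.
     */
    static boolean secondWriterGetsLock(long timeoutMillis) throws InterruptedException {
        ReentrantLock keyLock = new ReentrantLock();
        // Assumed tiny bounded queue standing in for the write-behind queue.
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1);
        queue.put("pending-write"); // queue is now full; nothing is flushing it yet
        CountDownLatch lockHeld = new CountDownLatch(1);

        Thread firstWriter = new Thread(() -> {
            keyLock.lock();
            try {
                lockHeld.countDown();
                queue.put("new-write"); // blocks until the queue has room
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                keyLock.unlock();
            }
        });
        firstWriter.start();
        lockHeld.await(); // the first writer now owns the lock

        // The second writer gives up after the timeout, like a lock-acquisition timeout.
        boolean acquired = keyLock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
        if (acquired) {
            keyLock.unlock();
        }

        queue.take();       // simulate the flush to disk; unblocks the first writer
        firstWriter.join();
        return acquired;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("second writer acquired lock: " + secondWriterGetsLock(200));
    }
}
```

In this sketch the second writer times out because the lock is only released once the queue drains, which is the kind of behaviour that would show up as lock timeouts under a load peak.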
Hi,
Looking for help here to understand what is wrong with our Keycloak/Infinispan setup.
We upgraded to Keycloak v23.0.7 a few months ago and decoupled Infinispan, running v15.0. We have also enabled file-based persistence for the caches. Everything worked well until May 7, when we started seeing deadlocks (2 incidents).
In the first case there was an error on Pod infinispan-0:
and more than an hour later deadlocks appeared on other pods:
and on pod infinispan-0 at this time:
The second case looks a little different but, again, it started with a SIFS error:
And then, 18 hours later, deadlocks again:
Are both of these cases related to persistence being enabled for the caches? We have disabled persistence and so far have seen no issues. But we are worried that we might be wrong and that the issue could come back at any time. If it is caused by persistence, then why do we have the problem: is it a bug, or did we get the configuration wrong?
Our full infinispan config: