New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for Virtual Threads #2055
base: dev
Are you sure you want to change the base?
Conversation
…ock. This avoids that the carrier thread gets pinned.
Reading this page seems to imply that having the synchronized blocks is supported in Java 19.
This statement in that page implies that the synchronized blocks might not be a problem. Do you have any benchmarks or tests that can show that there is a real benefit to changing to using the locks? Or is there some evidence that the synchronized blocks are in fact not supported? |
…atements can throw an exception and and as a consequence won't release the lock
@lfbayer Virtual threads use a small number of carrier threads (OS Threads) to execute. Normally this will be the same amount as the number of CPUs. If the virtual thread runs in a synchronized block, the carrier thread won't be available to execute other work when the virtual thread performs blocking IO. So it depends. When no IO is being performed, synchronized is fine to use. But when IO is being performed it will be better to use a ReentrantLock as the carrier thread can perform other work in the meantime. I was hoping that this limitation would be gone in JDK 21 but as the draft JEP is being prepared, it currently seems the limitation will still be present in JDK 21. It might still change of course in the coming months. |
There is already confirmed information that in Loom GA in JDK 21 this problem still persists. So I think we should accept this pull request. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lock() call must be in try {} block
I don't think it must be in the try block. This is documented in the ReentrantClass:
|
you're right |
When can this PR be merged and released? JPA |
JDK 21 has released. Virtual threads are no longer an experimental feature. Can we get a new release with this change included? |
This is somewhat of a philosophical question related to this PR. In previous multi-threaded usages involving Hikari, there was likely a threadpool of platform threads which effectively throttled the number of requests to the connection pool. I'm trying to figure out where/how to handle this? Applications can implement their own counting semaphores around DB-bound operations to re-introduce throttling - but such operations could be sprinkled all over an application. It's probably just cleaner to stick with platform threads in a threadpool at that point. Applications could increase the connection pool timeout, but this may mask actual errors creating DB connections and seems iffy regardless. Should Hikari itself put counting Semaphores in getConnection() outside of the timeout checking? Nothing seems quite right. Thoughts? |
Hi, my pull request was not trying to solve philosophical questions or determine what is the best way to use HikariCP with virtual threads. The goal is to make HikariCP virtual thread friendly and avoid pinning of the carrier thread when it is being called from a virtual thread. Pinning the carrier thread is a limitation from virtual threads in JDK 21. The developers from Loom said that they will work on avoiding pinning the carrier thread when synchronize is used. So in future versions of the JDK it is a problem that will go away but I'm not sure if they will backport their fix to JDK 21. |
@XcrigX The Oracle docs on virtual threads specifically address this use case. They recommend "unlearning this pattern" and using a counting semaphore. (Look specifically for the "Use Semaphores to Limit Concurrency" section in the link above) |
@bdeneuter - understood - it's a welcome PR. I'm just trying to spark some discussion/thought around "what's next". @enragedginger - yes, counting semaphores are the likely answer, but from a client perspective this can be messy. If your data access operations are spread out in many places it's pretty ugly to go wrap each in a try/finally block, deal with InterruptedExceptions acquiring the semaphore, etc. getConnection() in Hikari is a potentially clean place to do this but may break existing contracts with the timeout. |
IMHO HikariCP already conceptually provides a (timeout-based) semaphore to guard access to pooled connections. This PR only prevents virtual threads to pin to platform threads while accessing the pool. |
I'm really curious, when will this PR be merged? |
As far as I understand, applications can continue to use virtual threads in their applications while using HikariCP, but it just means that there is a potential for the JVM to spawn non-virtual threads. So that means that the purpose of this pull request presumably isn't specifically to be able to use HikariCP in an environment that uses virtual threads but rather, the intent is to get some performance improvement by having HikariCP itself use the virtual threads. In my very first comment I asked for benchmarks that demonstrated the performance impact of these changes. These changes are not trivial, and the risk to existing users is still there. I would want assurance that these changes are both correct and meaningful. And meaningful beyond just a cargo-cult thought process of "syncrhonized = bad". I don't entirely disagree with the change, but I also think, if semantically a lock and a synchronize block are the same, shouldn't it be on the JVM itself to add support for this? If they are not semantically the same, then clearly there are other risks of change from synchronize to locks. This is the kind of change that I would want to truly understand myself before merging. To be perfectly honest, I don't have any immediate plan to put the work in to understand the change and its impact. I think the "philosophical questions" are actually very much worth having. Part of me thinks that the model for a connection pool in the virtual thread use-case is likely a bit different from HikariCP's implementation. My gut tells me that simply changing the synchronize blocks to locks wouldn't actually solve as much as it might first seem. To answer the question of "when will this be merged": I don't have any specific plan to do so. My gut tells me it would take two days of full time work to build the confidence that this change is safe. And I don't see myself spending those two days. If someone else can provide the evidence and convincing argument for why this specific change is safe, then I would be happy to merge it. And I mean this specific change, not just an argument for virtual-threads in general. I would also like to hear a good explanation of the risks and impact of the change, as well as how conceptually a database connection pool should work in virtual-thread environment. I'm also really curious about how the current implementation actually behaves in a virtual-thread environment. Was the impetus for this PR in response to real experience using HikariCP with virtual threads? I think almost all of the current hot-path synchronize blocks aren't expected to ever really block anyway. For example, the getConnection() method actually uses zero synchronize blocks (it only uses a synchronize if the pool isn't already initialized). So if none of the hot-paths synchronize then is there any performance benefit to these changes? |
@lfbayer Thanks for the thorough explanation. It's helpful to understand where you're at on all of this.
I don't think this assumption is correct. Check out the section from the Oracle docs on virtual threads titled Scheduling Virtual Threads and Pinned Virtual Threads. If a virtual thread hits a Deadlocks due to resource starvation due to pinned virtual threads is the situation we're all trying to avoid. I've already encountered this situation with virtual threads when using pooling with the Apache commons HTTP library and it sucks. Switching to using FWIW, the section of the docs on Avoid Lengthy and Frequent Pinning states:
After reading this statement and re-reviewing this PR, I don't think any of the current usages of
This is also an interesting statement that I hadn't initially considered. That aforementioned section of the docs has this too:
I'd largely skipped over the word "current" there; it implies that this might change in the future. The docs advocate for switching to use explicit In summary, I don't see how the changes from However, before advertising that "HikariCP is virtual thread compatible" it'd be good to add JDK 21 to the testing grid and add some virtual thread executor related test cases. IMO it doesn't make sense to do that until after this lands though. Thoughts? |
@enragedginger thank you for your detailed reply. I also wouldn't go as far as saying HikariCP is specifically "virtual thread compatible" at this point. Although I don't feel confident saying it isn't either. It seems like there is potentially other issues than just the use of synchronize that might mean we don't play well. Since I don't have ANY experience using virtual threads yet it's hard for me to feel comfortable with any judgement on whether this change is sufficient. My main concern with using the ReentrantLock is that even if it doesn't change anything functionally, it might have a negative impact on performance somehow. For context: when I write code these days I always prefer java.util.concurrent over synchronized. So I have nothing against ReentrantLock and no specific preference for synchronized. What I don't like is change without full comprehension of the impact of the change. I would absolutely expect Reentrant locks to not perform worse than synchronize, but I still want to see some benchmark numbers to verify this before merging. Just because junit passes doesn't mean there is no negative impact. I also feel like the issue of pinning a virtual thread still wouldn't be enough to cause a deadlock. It would make me feel more comfortable if I could see a concrete case where this actually is a problem. Also, I'm still not satisfied with the answer related to locks and synchronize blocks being semantically the same. If the JVM doesn't support something with synchronize that it does with Reentrant locks, then that implies there is something about synchronize that is different. So what are the reasons that synchronize support isn't available. Knowing these underlying reasons would be helpful for us to analyze whether those same reasons would have any impact on us. For me to be satisfied I need the following questions to be resolved:
|
…ehaviour when the class is loaded multiple times in different ClassLoaders.
The pinning of the OS carrier thread occurs when calling native code from Java. It seems that Java is calling native code in the implementation for Java monitors and this seems to be the root cause that synchronized pins the carrier thread. That is why in the JEP it is specified that both native calls and synchronized pins the carrier thread. I found this BUG report for OpenJDK where they mention that they are reimplementing Java monitors: So I'm fine to wait for this new implementation and to not make code changes to avoid pinning. Not sure if they would backport this to JDK 21 though. I would expect not. |
@bdeneuter Your answer satisfies me as to how synchronized and ReentrantLock are different. I think looking forward, even if support for synchronized in virtual-threads is added, HikariCP likely needs to be changed somehow. Do you have any thoughts on my questions above? Specifically 1 and 3. My comments up until now might sounds like I do not want to fix this problem. But the opposite is true, I just want to be absolutely sure the answer is the correct one. |
Each time the code enters a synchronized block the underlying carrier thread gets pinned. If this is a bad thing depends on what happens in the synchronized block. If the virtual thread yields and gets parked, the carrier thread stays pinned and won't pick up other virtual threads that want to continue. If all carrier threads are pinned, the system can block and stops working. If the virtual thread doesn't yield and just leaves the synchronized block after it did its work everything is fine. This is a nice example that is not complex and where the pinning can cause a deadlock: This is their question on the loom-dev mailing list: What happens in this example:
To me it starts to look like it is safer to wait until they solved the pinning of the threads. To answer your other question. I don't think it would break the users application if HikariCP creates OS threads. Of course it would be better if the different libraries avoid creating their own OS threads or allow the user to configure the library to use virtual threads if they want them.
I understand and don't worry. Everybody is searching on how to use this technology with its current limitations. I prefer to not rush it in if there are doubts. |
I can add some color to my own experience trying to use virtual threads with Hikari. I have a batch processing app that takes a set of data in one format, and converts to a flattened SQL model. It receives a an array of records, and for each it creates a task and submits to the ExecutorService to schedule and run on a thread, then blocks for them all to complete. I switched to using virtual threads to see how it would work, pretty much by simply replacing the fixed threadpool with: So suppose I receive 10000 records in a batch. Before at most 8 (threadpool size) could ever be executing concurrently. When switching to virtual threads there is no limit on the number of threads that can be running concurrently. Since the DB IO is the main bottleneck, many more than 8 virtual threads will spin up. Eventually they all start to stack up waiting for a connection from Hikari. And eventually, one of the tasks will wait > 30 seconds and timeout. Again, my suggestion to alleviate this problem would be to have a counting semaphore in getConnection() that re-introduces some throttling based on the connection pool size before the timeout timer starts. This may change the existing contract with clients however, so there may be more to it - some additional explicit configuration or something so clients can "opt-in" to this contract change. |
@XcrigX If HikariCP would allow to disable the connection timeout you essentially would end up with a counting semaphore. |
@svendiedrichsen I suppose one could set the connection timeout to a really high value to approximate it. My thought was that I'd still want to know about problems actually getting a connection from the DB vs just waiting for a connection to become available from the pool. |
@XcrigX But this is more a question on how HikariCP differentiates those two cases. i.e. By using different exceptions. |
I'm certainly not familiar with the hikariCP internals, but it sounds like there are two topics here:
Finally, i think the deadlock behaviour (with
Hope my understanding is correct, and that this helps someone 👍 . Thanks for the great work - eagerly awaiting on a resolution... |
Hello, I was just watching the java 22 launch live stream, and apparently, |
@jimpil Sorry, this isn't answering your question. I'd like to let you know that the short-term fix for pinning is available on the Loom-ea build (https://mail.openjdk.org/pipermail/loom-dev/2024-February/006433.html). Hopefully, this will be ready for production use on JDK 23. |
@sureshg given that java 21 is GA, are we taking a stand that users of HikariCP lib should not use virtual thread and have to wait for java 25 (optimistic that issue of pinning gets fixed in java 25). |
Did a minor benchmarking with spring boot 3.2 + mysql-connector-j(8.3.0) + HikariCP (5.1.0) and results are already mentioned in this blog post by another user. https://medium.com/naukri-engineering/virtual-thread-performance-gain-for-microservices-760a08f0b8f3 Stack trace while threads are getting pinned.
Below are the troublesome section, which are coming from mysql-connector-j.
Fix for which is already getting published here https://bugs.mysql.com/bug.php?id=110512 Based on finding, I will second @lfbayer's opinion to wait till connector issue is fixed, as above changes doesn't fall under any of the critical section or long wait time section. |
It seems that
synchronized
will still pinn carrier threads in JDK 21 for the moment. This is the draft JEP for JDK21:https://openjdk.org/jeps/8303683
So I'm reoping the PR for using ReentrantLock.