Project loom: Why are virtual threads not the default? - java

According to the project loom documentation virtual threads behave like normal threads while having almost zero cost and the ability to turn blocking calls into non-blocking ones.
If this is true, then why are they separate things? Why not just make them the default? Is there any reason to not use them?

There are really two questions here: 1. Why are virtual threads not the default? and 2. Is there ever a reason not to use them?
Regarding the default, Java really has no concept of a "default" thread. Once virtual threads arrive, when you create a thread you must specify whether you want a platform thread or a virtual thread. The question then becomes why we have decided not to automatically replace today's threads with virtual threads (i.e. make new Thread() create a virtual thread). The answer to that is quite simple: it would not be helpful at all and might well be quite harmful.
It would not be helpful because the advantage of virtual threads comes from the ability to create a great many of them. If your application creates N threads today, nothing would be gained by turning those N threads into virtual threads. The scaling advantage of virtual threads only kicks in when your application creates, say, 1000N threads, which means it would need to be changed anyway (e.g. by replacing Executors.newFixedThreadPool with Executors.newVirtualThreadPerTaskExecutor). It might be harmful because while the semantics of virtual threads are almost the same as those of platform threads, they are not perfectly backward compatible (see JEP 425 for details).
As to the question about when not to use virtual threads, there are some obvious cases. E.g. when your threads heavily interact with native code, which knows nothing about virtual threads, or when you depend on some detail that has changed for virtual threads, like the ability to subclass Thread. Other cases are not so clear. For example, CPU-bound operations do not benefit from having more threads than CPU cores, so they do not benefit from the multitude of virtual threads, but that doesn't mean that they would be harmed. We're still not ready to say that users should pick virtual threads by default, but we might well get there, as we learn more about how people use them.

Be aware that Project Loom is under active experimental development. Things may change.
No default
You asked:
Why not just make them the default?
In modern Java, we generally do not address threads directly. Instead, we use the Executors framework added years ago in Java 5.
In particular, in most cases a Java programmer uses the Executors utility class to produce an ExecutorService. That executor service is backed by various kinds of thread factories or thread pools.
For example, if you want to run one task after another serially, you would use an executor service backed by a single thread.
ExecutorService executorService = Executors.newSingleThreadExecutor();
If you browse through Executors class Javadoc, you will see a variety of options. 👉 None of them is "default". The programmer chooses one to suit the needs of her particular situation.
With Project Loom, we will have at least one more such option to choose from. In the preview build of Java, call the new Executors.newVirtualThreadPerTaskExecutor() to get an executor service backed by virtual threads. Go nuts, and throw a million tasks at it.
ExecutorService executorService = Executors.newVirtualThreadPerTaskExecutor();
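To make "throw a million tasks at it" concrete, here is a minimal sketch (class, method, and task bodies are mine, not from the Loom docs; it assumes JDK 21+, or JDK 19 with preview features enabled):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class ManyTasks {
    // Submit n trivial blocking tasks; returns how many completed.
    static int run(int n) {
        AtomicInteger completed = new AtomicInteger();
        // try-with-resources: close() waits for all submitted tasks to finish
        try (ExecutorService service = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < n; i++) {
                service.submit(() -> {
                    try {
                        Thread.sleep(1); // blocking call; parks the virtual thread
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    completed.incrementAndGet();
                });
            }
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println(run(100_000) + " tasks completed");
    }
}
```

Each submitted task gets its own fresh virtual thread; the blocking sleep parks the virtual thread rather than tying up a carrier.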
You asked:
why are they separate things?
One of the highest priorities for the Java team is backward-compatibility: Existing apps should be able to run without surprise.
Virtual threads have a very different behavior and performance profile than platform threads. So I do not expect to see the Java team retrofitting virtual threads onto existing features of Java generally. They may choose to do so, but only if absolutely certain no detrimental effects will surface in the behavior of existing apps.
When to choose or avoid virtual threads
You asked:
Is there any reason to not use them?
Yes, certainly. Two reasons:
CPU-bound tasks
Tasks used to indirectly throttle other resources
CPU-bound tasks
The entire point of virtual threads is to keep the "real" thread, the platform host-OS thread, busy. When a virtual thread blocks, such as waiting on storage I/O or network I/O, the virtual thread is "dismounted" from the host thread while another virtual thread is "mounted" on the host thread to get some execution done.
So, if your task’s code does not block, do not bother with virtual threads. But this kind of code is rare. Most tasks in most apps are often waiting for users, storage, networks, attached devices, etc. An example of a rare task that might not block is something that is CPU-bound like video-encoding/decoding, scientific data analysis, or some kind of intense number-crunching. Such tasks should be assigned to platform threads directly rather than virtual threads.
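As a sketch of assigning CPU-bound work to platform threads directly (the sumOfSquares workload and class name are illustrative assumptions, not from the answer):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CpuBound {
    // CPU-bound work: no blocking, so threads beyond the core count add nothing.
    static long sumOfSquares(long n) {
        long sum = 0;
        for (long i = 1; i <= n; i++) sum += i * i;
        return sum;
    }

    public static void main(String[] args) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        // Size the platform-thread pool to the core count for number-crunching.
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        List<Future<Long>> results = new ArrayList<>();
        for (int i = 0; i < cores; i++) {
            results.add(pool.submit(() -> sumOfSquares(1_000_000)));
        }
        for (Future<Long> f : results) {
            System.out.println(f.get());
        }
        pool.shutdown();
    }
}
```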
Throttling
Another reason to avoid virtual threads is existing code that depends on the limits or bottlenecks of platform threads to throttle its app's usage of other resources. For example, if a database server is limited to 10 simultaneous connections, then some apps have been written to use an executor service backed by only 8 or 9 threads. Such existing code should not be blindly switched to virtual threads.
Of course such code is less than optimal. Such a code base would be better, clearer, and easier to comprehend if explicit limiting/throttling mechanisms were used.
Using explicit throttling mechanisms will be needed if a programmer wants the benefit of having thousands, even millions, of simultaneous virtual threads while avoiding exhausting/overloading other limited resources.
Java has long offered such throttling mechanisms. They just were not always used, given the simplicity/ease of relying on the limits/bottlenecks of a limited number of platform threads.
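One such long-offered mechanism is a Semaphore. A hedged sketch (the 10-connection database limit and all names are hypothetical; assumes JDK 21+):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

public class Throttled {
    // Hypothetical limit: the database allows 10 simultaneous connections.
    static final Semaphore DB_PERMITS = new Semaphore(10);

    // Explicit throttle: permits cap the concurrency, not the thread count.
    static String queryDatabase(int id) throws InterruptedException {
        DB_PERMITS.acquire();
        try {
            Thread.sleep(5); // stand-in for a real JDBC call
            return "row-" + id;
        } finally {
            DB_PERMITS.release();
        }
    }

    public static void main(String[] args) {
        try (ExecutorService service = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 1_000; i++) {
                final int id = i;
                service.submit(() -> queryDatabase(id));
            }
        } // waits for all tasks to finish
        System.out.println("all queries done");
    }
}
```

A thousand virtual threads may exist at once, but at most 10 of them touch the "database" at any moment.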
I am no expert on this. So rely on those who are experts. For details and insights, be sure to read the articles and watch the presentations and interviews by Ron Pressler, Alan Bateman, or other members of the Project Loom team.

Let's begin with
Why not just make them the default?
Virtual threads are mounted on top of platform threads, so you may consider them an illusion that the JVM provides. The whole idea is that a virtual thread occupies a platform thread only for the CPU-bound parts of its lifecycle.
Platform threads versus virtual threads: platform threads take OS threads hostage during IO-bound tasks, so operations are limited by the number of applicable threads in the thread pool and of OS threads. By default they are non-daemon threads.
Virtual threads are implemented by the JVM. During CPU-bound operations a virtual thread is mounted on a platform thread; when it starts an IO-bound operation it is unmounted and the platform thread is returned to the thread pool, and once the IO operation finishes the virtual thread is mounted on whatever platform thread the pool provides next. So no OS thread is held hostage in this case.
A four-level architecture helps in understanding this:
CPU
A multi-core CPU: the cores within the CPU execute the operations.
OS
OS threads: the OS scheduler allocates CPU time to the runnable OS threads.
JVM
Platform threads are wrapped one-to-one around OS threads and hold them for both CPU-bound and IO-bound operations.
Virtual threads are mounted on platform threads only for CPU-bound operations; each virtual thread can be associated with multiple platform threads at different times.
Virtual threads with ExecutorService
It is effective to use an executor service because it is associated with a thread pool and limited to the threads applicable to it; with virtual threads and Executors.newVirtualThreadPerTaskExecutor, however, we do not need to handle or manage the associated thread pool ourselves.
try (ExecutorService service = Executors.newVirtualThreadPerTaskExecutor()) {
    service.submit(ExecutorServiceVirtualThread::taskOne);
    service.submit(ExecutorServiceVirtualThread::taskTwo);
}
ExecutorService implements the AutoCloseable interface as of JDK 19, so when it is used in a try-with-resources statement, close() is called once control reaches the end of the try block. The calling thread then waits until all submitted tasks have finished on their dedicated virtual threads and the associated thread pool has been shut down.
ThreadFactory factory = Thread.ofVirtual().name("user thread-", 0).factory();
try (ExecutorService service = Executors.newThreadPerTaskExecutor(factory)) {
    service.submit(ExecutorServiceThreadFactory::taskOne);
    service.submit(ExecutorServiceThreadFactory::taskTwo);
}
An executor service can be created with a virtual thread factory as well, by passing the thread factory as an argument to Executors.newThreadPerTaskExecutor.
You can still benefit from ExecutorService features such as Future and CompletableFuture.
Virtual threads advantages
exhibit nearly the same behavior as platform threads.
disposable and can be scaled to millions.
much more lightweight than platform threads.
fast creation time, comparable to creating a String object.
the JVM performs delimited continuations on IO operations: a virtual thread performing blocking IO does not hold on to its carrier.
you can keep the same sequential code as before, yet it runs far more effectively.
the JVM gives the illusion of virtual threads; underneath, the whole story still plays out on platform threads.
Just by using virtual threads, the CPU cores become much more concurrent; the combination of virtual threads, a multi-core CPU, and CompletableFuture to parallelize code is very powerful.
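An illustrative sketch of that combination (class and method names are mine; assumes JDK 21+):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CfOnVirtual {
    // Run two independent stages on virtual threads and combine their results.
    static int combined() {
        try (ExecutorService vts = Executors.newVirtualThreadPerTaskExecutor()) {
            CompletableFuture<Integer> a = CompletableFuture.supplyAsync(() -> 21, vts);
            CompletableFuture<Integer> b = CompletableFuture.supplyAsync(() -> 2, vts);
            return a.thenCombine(b, (x, y) -> x * y).join();
        }
    }

    public static void main(String[] args) {
        System.out.println(combined()); // 42
    }
}
```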
Virtual threads usage cautions
Do not use monitors, i.e. synchronized blocks: they currently pin the carrier thread. This is expected to be fixed in a coming JDK release; in the meantime an alternative is to use ReentrantLock with a try-finally statement.
Blocking with native frames on the stack (JNI) also pins the carrier, but this is very rare.
Control the memory per stack (reduce thread locals and avoid deep recursion).
Monitoring tools such as debuggers, JConsole, VisualVM, etc. are not fully updated for virtual threads yet.
Find more in JEP 425.
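A minimal sketch of the ReentrantLock alternative to synchronized mentioned above (class name and counter workload are illustrative; assumes JDK 21+):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.locks.ReentrantLock;

public class LockCounter {
    private final ReentrantLock lock = new ReentrantLock();
    private int count = 0;

    // ReentrantLock instead of a synchronized block: a virtual thread
    // waiting on the lock parks without pinning its carrier thread.
    void increment() {
        lock.lock();
        try {
            count++;
        } finally {
            lock.unlock();
        }
    }

    int value() {
        return count;
    }

    public static void main(String[] args) {
        LockCounter counter = new LockCounter();
        try (ExecutorService service = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 10_000; i++) {
                service.submit(counter::increment);
            }
        } // waits for all increments
        System.out.println(counter.value()); // 10000
    }
}
```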

If you made them the default, a good portion of existing Java code would not be able to simply switch to Java 19, because that code is optimized for OS threads.
Java has to be backward compatible.
There are cases where virtual threads don't make much sense, for example:
Applications that do heavy computations
If you make requests to a DB that has a maximum connection pool, the bottleneck is not the threads
Using thread locals is not a good idea with virtual threads
Furthermore, most existing code that deals with threads probably pools them, which again goes against the main idea

Related

Does the decision of how many cores to use stay in the hands of the JVM?

Suppose we have a very complex task. I know that if we use one thread then in practice we will use one core, but if I divide the task into threads equal to the number of processor cores does the program necessarily run on all the cores?
Or there is no correlation between the number of threads and cores used and the JVM 'decides'?
Actually, it is typically the operating system that decides how many cores a Java application gets to use. The scheduling of native threads to cores is handled by the operating system1. The JVM has little (if any) say in thread scheduling.
But yes, a Java application won't necessarily get access to all of the cores. It will depend on what other applications, services, etc. on the system are doing. Indeed, the OS may provide ways for an administrator to externally limit the number of cores that may be used by a given (Java or not) application, or give one application priority over another.
... or there is no correlation between the number of threads and cores used
There is a correlation, but not one that is particularly useful. A JVM won't (cannot) use more cores than there are native threads in existence (including the JVM's internal and GC threads).
It is also worth noting that the OS (typically) doesn't assign a core to a native thread that is not currently runnable.
Basil Bourque notes in his answer that Project Loom will bring significant improvements to threading in Java, if and when it is incorporated into the standard releases. However, Loom won't alter the fact that the number of physical cores assigned an application JVM at any given time is controlled / limited by the OS.
1 - Or the operating system's hypervisor. Things can get a bit complicated in a cloud computing environment, for instance.
The Answer by Stephen C is true today, where Java threads are typically implemented as native threads provided by the host operating systems. But things change with Project Loom technology being developed for a future version of Java.
Project Loom brings virtual threads (fibers) to the concurrency toolbox of Java. Many virtual threads will be mapped to each of the few platform/kernel threads. A virtual thread when blocked (waiting on a call to slow resources such as file I/O, network I/O, database access, etc.) will be “parked” (set aside) allowing some other virtual thread to execute for a while on the “real” platform/kernel thread.
This parked/unparked switching will be very fast, and take little memory (using a flexible growing/shrinking stack). This makes threads “cheap”, so cheap that you might reasonably be able to run millions of threads at a time.
Returning to your question, the management of these virtual threads will be managed within the JVM rather than the host OS. But underneath our virtual threads, we rely on the same platform/kernel threads used today, and those are ultimately controlled by the host OS rather than the JVM.
By default, when setting up an executor service backed by virtual threads via Executors.newVirtualThreadExecutor, we do not specify the number of platform/kernel threads to be used. That's handled by the implementation of your JVM.
Experimental builds of Project Loom are available now, based on early-access Java 17. The Loom team seeks feedback.
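For reference, in the API as later finalized (JDK 21), a virtual thread can also be created directly rather than through an executor service. A sketch using that final API, not the early-access one described above:

```java
public class DirectVirtual {
    public static void main(String[] args) throws InterruptedException {
        // Start a virtual thread directly; the JVM mounts it on a carrier
        // (platform) thread that is ultimately scheduled by the host OS.
        Thread vt = Thread.startVirtualThread(() ->
                System.out.println("running in " + Thread.currentThread()));
        vt.join();
        System.out.println("isVirtual = " + vt.isVirtual());
    }
}
```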

How does multi core performance relate to execution contexts and thread pools in Scala

So I have an existing Spring library that performs some blocking tasks (exposed as services) that I intend to wrap using Scala Futures to showcase multi-processor capabilities. The intention is to get people interested in the Scala/Akka tech stack.
Here is my problem.
Let's say I get two services from the existing Spring library. These services perform different blocking tasks (IO, DB operations).
How do I make sure that these tasks (service calls) are carried out across multiple cores?
For example, how do I make use of custom execution contexts?
Do I need one per service call?
How do the execution context(s) / thread pools relate to multi-core operations?
Appreciate any help in this understanding.
You cannot ensure that tasks will be executed on different cores. The workflow for the sample program is as follows.
Write a program that does two things on two different threads (Futures, Java threads, Actors, you name it).
The JVM sees that you want two threads, so it starts two JVM threads and submits them to the OS process dispatcher (or the other way round; it doesn't matter).
OS decides on which core to execute each thread. Usually, it will try to put threads on different cores to maximize the overall efficiency but it is not guaranteed; you might have a situation that your 10 JVM threads will be executed on one core, although this is extreme.
The rule of thumb for writing concurrent and seemingly parallel applications is: "Here, take my e.g. 10 threads and TRY to split them among the cores."
There are some tricks, like tuning CPU affinity (low-level, very risky) or spawning a plethora of threads to make sure that they are parallelized (a lot of overhead and work for the GC). However, in general, the OS is usually not that overloaded, and if you create e.g. two actors, one for the DB and one for network IO, they should work well in parallel.
UPDATE:
The global ExecutionContext manages the thread pool. However, you can define your own and submit runnables to it myThreadPool.submit(runnable: Runnable). Have a look at the links provided in the comment.

JVM thread management v.s. OS scheduling

As I know, one of the most common JVM concurrency APIs - futures, at least as implemented in Scala - relies on user code to relinquish a thread when it is potentially going to be waiting idle. In Scala this is commonly referred to as "avoiding blocking", and the developer has to implement it everywhere it makes sense.
Not quite efficient.
Is there something very entirely inherent to the JVM, that prevents the JVM switching the context of a thread to new tasks - when the thread is idle - as implemented by operating system process schedulers?
Is there something very entirely inherent to the JVM, that prevents the JVM switching the context of a thread to new tasks - when the thread is idle - as implemented by operating system process schedulers?
Mostly the fact that such a switch has to be done cooperatively. Every single blocking method must be wrapped or re-implemented in a way that allows the task to be resumed once it is done; after all, there is no native thread waiting for completion of the blocking action anymore.
While this can be done in principle for JVM-internal blocking methods, consider arbitrary native code executed via JNI: the JVM wouldn't know how to stack-switch those native threads; they're stuck in native code, after all.
You might want to have a look at Quasar; as I understand it, they implemented such wrappers or equivalents for some JDK-internal methods, such as sleep, park/unpark, channel-based IO and a bunch of others, which allows their fibers (and thus futures running on those fibers) to perform exactly that kind of user-mode context switching while they wait for completion.
Edit: JNI alone already is sufficient to limit user-mode task switching to being an opportunistic optimization that may have to fall back to spinning up additional native threads when native code blocks a thread.
But it is not the only issue, for example on linux truly asynchronous file IO operations need filesystem and kernel support (see this SO question on AIO), which not all of them provide. Where it is not provided it has to be emulated using additional blocking IO threads, thus re-introducing all the overhead we wanted to avoid in the first place. Might as well just block on the thread pool itself and spin up additional threads, at least we'll avoid inter-thread-communication that way.
Memory-mapped files can also block a thread and force the OS-scheduler to suspend the thread due to page faults and I'm not aware of means to cooperate with the virtual memory system to avoid that.
Not to mention that all blocking calls on the VM would have to re-implemented using asynchronous equivalents provided by the OS. Miss even one and you'll have a blocked thread. If you have a blocked thread your thread pools will need an auto-grow feature and we're back to square one.
Last but not least, there may be cases where blocking, one-thread-per-filedescriptor IO may be desirable. The pervasive changes required to guarantee user-mode switching might break those.
So all in all, user-mode switching is possible, sometimes. But the JVM cannot make hard guarantees about it, so it has to implement all the native thread handling anyway, and the programmer will have to code at least somewhat cooperatively, with the assumptions of the thread pools executing those futures in mind. Some of the cases could be eliminated, but not all of them.

java fork-join executor usage for db access

The ForkJoinTask documentation explicitly calls out "Subdividable tasks should also not perform blocking I/O". Its primary aim is "computational tasks calculating pure functions or operating on purely isolated objects". My questions are:
Why design the ForkJoinTask to restrict blocking IO tasks?
What are the gotchas if I do implement a blocking IO task?
How come both the Spring and Play frameworks are full of examples using fork-join executors for DB calls?
In my scenario, a single request does two types of work: one is encryption, which pushes a CPU core to 100% for 200 ms; the other is a few database calls. Any kind of static partitioning, such as 6 threads for encryption and 2 threads for blocking IO, will not provide optimal usage of the CPU. Hence a fork-join executor, with a certain level of over-provisioning of threads over the total CPU count, coupled with work stealing, would ensure better usage of CPU resources.
Is my above assumption and understanding around forkjoin executor correct and if not, please point me towards the gap.
Why design the ForkJoinTask to restrict blocking IO tasks?
Underlying the fork-join pool is a shared, limited set of threads; if some IO work is blocking on those threads, fewer threads are left for CPU-intensive work, and other non-blocking work will starve.
What are the gotchas if i do implement a blocking IO task?
Typically, an FJPool allocates threads roughly matching the number of processors. So if you do have to block on IO in those threads, make sure you allocate enough threads for your other tasks.
You can also isolate your IO work on dedicated threads that are not shared with the FJ pool. But when you call blocking IO, your thread blocks and the CPU is scheduled to other tasks until the thread unblocks.
How come both spring and play frameworks, are full of examples using fork-join executors for DB calls?
Play is no different: they use dedicated pools for IO tasks, so other tasks won't suffer.
The framework does not restrict any type of processing; it is just not recommended to do blocking, etc. I wrote a critique about this framework years ago; here is the point on the recommendations. This was for the Java 7 version but it is still applicable for Java 8.
Blocking is not fatal: Spring and Play block and they work just fine. You need to be careful when using Java 8 since there is a default common fork/join pool, and tying up threads there may have consequences for other users. You could always define your own f/j pool with the additional overhead, but at least you wouldn't interfere with others using the common pool.
Your scenario doesn't look bad. You're not waiting for replies from the internet. Give it a try. If you run into difficulty with stalling threads, look into the ForkJoinPool.ManagedBlocker interface. Using that interface informs the f/j pool that you are doing blocking calls, and the framework will create compensation threads.
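A minimal sketch of the ManagedBlocker approach (the SleepBlocker class and its sleep are my stand-ins for a real blocking DB call):

```java
import java.util.concurrent.ForkJoinPool;

public class CompensatedBlock {
    // Wrapping a blocking call in a ManagedBlocker lets the fork/join pool
    // spin up a compensation thread so overall parallelism is not lost.
    static class SleepBlocker implements ForkJoinPool.ManagedBlocker {
        private volatile boolean done = false;

        @Override
        public boolean block() throws InterruptedException {
            if (!done) {
                Thread.sleep(50); // stand-in for a blocking DB call
                done = true;
            }
            return true;
        }

        @Override
        public boolean isReleasable() {
            return done;
        }
    }

    public static void main(String[] args) {
        ForkJoinPool.commonPool().submit(() -> {
            try {
                ForkJoinPool.managedBlock(new SleepBlocker());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }).join();
        System.out.println("blocking task finished");
    }
}
```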

How does Java makes use of multiple cores?

A JVM runs in a single process and threads in a JVM share the heap belonging to that process. Then how does JVM make use of multiple cores which provide multiple OS threads for high concurrency?
You can make use of multiple cores using multiple threads. But using a higher number of threads than the number of cores present in a machine can simply be a waste of resources. You can use availableProcessors() to get the number of cores.
In Java 7 there is fork/join framework to make use of multiple cores.
Related Questions:
Is Multi-Threaded algorithm required to make use of Multi-core processors ?
Threads per Processor
Correctly multithreaded quicksort or mergesort algo in Java?
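The fork/join framework mentioned above can be sketched as follows (the range-sum task, class name, and threshold are illustrative assumptions):

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Splits a range sum across cores using the fork/join framework.
public class RangeSum extends RecursiveTask<Long> {
    private static final long THRESHOLD = 10_000;
    private final long lo, hi; // sums i over [lo, hi)

    RangeSum(long lo, long hi) {
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected Long compute() {
        if (hi - lo <= THRESHOLD) {
            long sum = 0;
            for (long i = lo; i < hi; i++) sum += i;
            return sum;
        }
        long mid = (lo + hi) >>> 1;
        RangeSum left = new RangeSum(lo, mid);
        RangeSum right = new RangeSum(mid, hi);
        left.fork();                          // run the left half asynchronously
        return right.compute() + left.join(); // compute right half here, then join
    }

    public static void main(String[] args) {
        long sum = ForkJoinPool.commonPool().invoke(new RangeSum(0, 1_000_000));
        System.out.println(sum); // 499999500000
    }
}
```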
A JVM runs in a single process and threads in a JVM share the heap belonging to that process. Then how does JVM make use of multiple cores which provide multiple OS threads for high concurrency?
Java will utilize the underlying OS threads to do the actual job of executing the code on different CPUs, if running on a multi-CPU machine. When each Java thread is started, it creates an associated OS thread and the OS is responsible for scheduling, etc. The JVM certainly does some management and tracking of the thread, and Java language constructs like volatile, synchronized, notify(), wait(), etc. all affect the run status of the OS thread.
A JVM runs in a single process and threads in a JVM share the heap belonging to that process.
The JVM doesn't necessarily "run in a single process" because even the garbage collector and other JVM code run in different threads, and the OS often represents these different threads as different processes. In Linux, for example, the single process you see in the process list often masks a bunch of different thread processes. This is the case even if you are on a single-core machine.
However, you are correct that they all share the same heap space. They actually share the same entire memory space which means code, interned strings, stack space, etc..
Then how does JVM make use of multiple cores which provide multiple OS threads for high concurrency?
Threads get their performance improvements from a couple of reasons. Obviously straight concurrency often makes the program run faster. Being able to do multiple CPU tasks at the same time can (though not always) improve the throughput of the application. You are also able to isolate IO operations to a single thread meaning that other threads can be running while a thread is waiting on IO (read/write to disk/network, etc.).
But in terms of memory, threads get a lot of their performance improvements because of local per-CPU cached memory. When a thread runs on a CPU, the local high speed memory cache for the CPU helps the thread isolate storage requests locally without having to spend the time to read or write to central memory. This is why volatile and synchronized calls include memory synchronization constructs because the cache memory has to be flushed to main memory or invalidated when threads need to coordinate their work or communicate with each other.
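A small sketch of why that memory synchronization matters (class name mine): a volatile flag guarantees the worker thread sees the main thread's write instead of a stale CPU-local cached value.

```java
public class StopFlag {
    // Without volatile, the worker could loop forever on a stale cached value;
    // volatile forces reads and writes to go through main memory.
    static volatile boolean stop = false;

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            while (!stop) {
                Thread.onSpinWait(); // busy-wait until signalled
            }
            System.out.println("worker observed stop");
        });
        worker.start();
        Thread.sleep(10);
        stop = true;   // flushed to main memory, visible to the worker
        worker.join(); // returns because the worker saw the update
    }
}
```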
Java will benefit from multiple cores if the OS distributes threads over the available processors. The JVM itself does not do anything special to get its threads scheduled evenly across multiple cores. A few things to keep in mind:
While implementing parallel algorithms, it might be better to spawn as many threads as there are cores. (Runtime.getRuntime().availableProcessors()). Not more, not less.
Make use of the facilities provided by the java.util.concurrent package.
Make sure that you have Java Concurrency in Practice in your personal library.
Green threads were replaced by native threads in Java 1.2.
