As far as I know, one of the most common JVM concurrency APIs, futures - at least as implemented in Scala - relies on user code to relinquish a thread when it is potentially going to be waiting idle. In Scala this is commonly referred to as "avoiding blocking", and the developer has to implement it everywhere it makes sense.
Not exactly efficient.
Is there anything inherent to the JVM that prevents it from switching a thread over to new tasks when that thread is idle, the way operating-system process schedulers do?
Is there anything inherent to the JVM that prevents it from switching a thread over to new tasks when that thread is idle, the way operating-system process schedulers do?
Mostly the fact that such a switch has to be done cooperatively. Every single blocking method must be wrapped or re-implemented in a way that allows the task to be resumed once it is done; after all, there is no native thread waiting for completion of the blocking action anymore.
While this can be done in principle for JVM-internal blocking methods, consider arbitrary native code executed via JNI: the JVM wouldn't know how to stack-switch those native threads; they're stuck in native code, after all.
You might want to have a look at Quasar. As I understand it, they implemented such wrappers or equivalents for some JDK-internal methods, such as sleep, park/unpark, channel-based IO, and a bunch of others, which allows their fibers (and thus futures running on those fibers) to perform exactly that kind of user-mode context switching while they wait for completion.
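For illustration, here is roughly what that looks like with Quasar's fibers. This is a sketch from memory, so treat the exact API as an assumption; fibers also require Quasar's bytecode instrumentation agent to run.

import co.paralleluniverse.fibers.Fiber;
import co.paralleluniverse.strands.Strand;
import co.paralleluniverse.strands.SuspendableRunnable;

// Sleeping parks the fiber: the underlying native thread is released and
// free to run other fibers until the timeout fires.
new Fiber<Void>((SuspendableRunnable) () -> {
    Strand.sleep(1000);
    System.out.println("resumed without holding a native thread while asleep");
}).start();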
Edit: JNI alone is already sufficient to limit user-mode task switching to being an opportunistic optimization that may have to fall back to spinning up additional native threads when native code blocks a thread.
But it is not the only issue. For example, on Linux, truly asynchronous file IO operations need filesystem and kernel support (see this SO question on AIO), which not all of them provide. Where it is not provided, it has to be emulated using additional blocking IO threads, thus re-introducing all the overhead we wanted to avoid in the first place. Might as well just block on the thread pool itself and spin up additional threads; at least we'll avoid inter-thread communication that way.
Memory-mapped files can also block a thread and force the OS scheduler to suspend it due to page faults, and I'm not aware of any means of cooperating with the virtual memory system to avoid that.
Not to mention that all blocking calls on the VM would have to be re-implemented using asynchronous equivalents provided by the OS. Miss even one and you'll have a blocked thread. If you have a blocked thread, your thread pools will need an auto-grow feature, and we're back to square one.
Last but not least, there may be cases where blocking, one-thread-per-file-descriptor IO is desirable. The pervasive changes required to guarantee user-mode switching might break those.
So, all in all, user-mode switching is possible, sometimes. But the JVM cannot make hard guarantees about it, so it has to implement all the native thread handling anyway, and the programmer will have to code at least somewhat cooperatively, with the assumptions of the thread pools executing those futures in mind. Some of the cases could be eliminated, but not all of them.
Our gRPC service needs to handle 1000 QPS, and each request requires a list of sequential operations to happen, including one that reads data from the DB using JDBC. Handling a single request takes at most 50 ms.
Our application can be written in two ways:
Option 1 - Classic one-blocking-thread-per-request: we can create a large thread pool (~200) and simply assign one thread per request, and have that thread block while it waits for the DB.
Option 2 - Having each request handled in a truly non-blocking fashion: this would require us to use a non-blocking MySQL client, which I don't know exists, but for now let's assume it does.
My understanding is that non-blocking approach has these pros and cons:
Pros: Allows us to reduce the number of threads required, and as such reduces the memory footprint
Pros: Saves some overhead on the OS, since it doesn't need to give CPU time to threads waiting for IO
Cons: For a large application (where each task subscribes a callback to the previous task), it requires splitting a single request across multiple threads, creating a different kind of overhead. And if the same request gets executed on multiple physical cores, it potentially adds overhead, as data might not be available in the cores' L1/L2 caches.
Question 1: Even though non-blocking applications seem to be the new cool thing, my understanding is that for applications that aren't memory bound and where creating more threads isn't a problem, it's not clear that writing a non-blocking application is actually more CPU efficient than writing a blocking application. Is there any reason to believe otherwise?
Question 2: My understanding is also that if we use JDBC, the connection is actually blocking, and even if we make the rest of our application non-blocking, we lose all the benefit because of the JDBC client; in that case Option 1 is most likely better?
For question 1, you are correct -- non-blocking is not inherently better (and with the arrival of Virtual Threads, it's about to become a lot worse in comparison to good old thread-per-request). At best, you could look at the tools you are working with and do some performance testing with a small-scale example. But frankly, that is down to the tool, not the strategy (at least until Virtual Threads get here).
For question 2, I would strongly encourage you to choose the solution that works best with your tool/framework. Staying within your ecosystem will allow you to make more flexible moves when the time comes to optimize.
But all things equal, I would strongly encourage you to stick with thread-per-request, since you are working with Java. Ignoring Virtual Threads, thread-per-request allows you to work with and manage simple, blocking, synchronous code. You don't have to deal with callbacks or tracing the logic through confusing and piecemeal logs. Simply make a thread per request, let it block where it does, and then let your scheduler handle which thread should have the CPU core at any given time.
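For concreteness, a minimal sketch of what option 1 can look like; Request, jdbcQuery, and respond are hypothetical stand-ins for your gRPC types and DB access:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

ExecutorService pool = Executors.newFixedThreadPool(200);   // one thread per in-flight request

void onRequest(Request req) {
    pool.submit(() -> {
        var data = jdbcQuery(req);   // blocks this pool thread while the DB works
        respond(req, data);          // remaining sequential steps, same thread
    });
}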
Pros: Saves some overhead on the OS, since it doesn't need to give CPU time to threads waiting for IO
It's not just the CPU time for waiting threads, but also the overhead of switching between threads competing for the CPU. As you have more threads, more of them will be in a runnable state, and the CPU time must be spread between them. Each switch has its own cost, too: saving and restoring thread state and losing cache locality.
Cons: For a large application (where each task subscribes a callback to the previous task), it requires splitting a single request across multiple threads, creating a different kind of overhead. And if the same request gets executed on multiple physical cores, it potentially adds overhead, as data might not be available in the cores' L1/L2 caches.
This also happens with the "classic" approach, since blocking calls will cause the CPU to switch to a different thread, and, as stated before, the CPU will even have to switch between runnable threads to share the CPU time as their number increases.
Question 1: […] for applications that aren't memory bound and where creating more threads isn't a problem
In the current state of Java, creating more threads is always going to become a problem at some point. With the thread-per-request model, it depends on how many requests you have in parallel. 1000, probably ok, 10000… maybe not.
it's not clear that writing a non-blocking application is actually more CPU efficient than writing a blocking application. Is there any reason to believe otherwise?
It is not just a question of efficiency, but also of scalability. For the performance itself, this would require proper load testing. You may also want to check Is non-blocking I/O really faster than multi-threaded blocking I/O? How?
Question 2: My understanding is also that if we use JDBC, the connection is actually blocking, and even if we make the rest of our application non-blocking, we lose all the benefit because of the JDBC client; in that case Option 1 is most likely better?
JDBC is indeed a synchronous API. Oracle was working on ADBA as an asynchronous equivalent, but they discontinued it, considering that Project Loom will make it irrelevant. R2DBC provides an alternative which supports MySQL. Spring even supports reactive transactions.
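For completeness, here is a rough sketch of what the non-blocking path can look like with R2DBC and Project Reactor; the r2dbc-mysql driver, URL, and query are illustrative assumptions:

import io.r2dbc.spi.Connection;
import io.r2dbc.spi.ConnectionFactories;
import io.r2dbc.spi.ConnectionFactory;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

ConnectionFactory factory =
        ConnectionFactories.get("r2dbc:mysql://user:secret@localhost:3306/shop");

Flux<String> names = Flux.usingWhen(
        Mono.from(factory.create()),                        // open the connection asynchronously
        conn -> Flux.from(conn.createStatement("SELECT name FROM customers").execute())
                    .flatMap(result -> result.map((row, meta) -> row.get("name", String.class))),
        Connection::close);                                 // release it asynchronously

names.subscribe(System.out::println);                       // no thread blocks waiting on the DB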
According to the Project Loom documentation, virtual threads behave like normal threads while having almost zero cost and the ability to turn blocking calls into non-blocking ones.
If this is true, then why are they separate things? Why not just make them the default? Is there any reason to not use them?
There are really two questions here: 1. Why are virtual threads not the default? and 2. Is there ever a reason not to use them?
Regarding the default, Java really has no concept of a "default" thread. Once virtual threads arrive, when you create a thread, you must specify whether you want a platform thread or a virtual thread. The question then becomes why we have decided not to automatically replace today's threads with virtual threads (i.e. make new Thread() create a virtual thread). The answer to that is quite simple: it would not be helpful at all and might well be quite harmful. It would not be helpful because the advantages of virtual threads come from the ability to create a great many of them. If your application creates N threads today, nothing would be gained by turning those N threads into virtual threads. The scaling advantage of virtual threads would only kick in when your application creates, say, 1000N threads, which means it would need to be changed anyway (e.g. by replacing Executors.newFixedThreadPool with Executors.newVirtualThreadPerTaskExecutor). It might be harmful because while virtual threads' semantics are almost the same as platform threads, they are not perfectly backward compatible (see JEP 425 for details).
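To make the opt-in nature concrete, a minimal sketch using the preview APIs described in JEP 425:

Runnable task = () -> System.out.println(Thread.currentThread());

Thread platform = Thread.ofPlatform().start(task);   // a classic OS-backed thread
Thread virtual  = Thread.ofVirtual().start(task);    // a virtual thread
Thread.startVirtualThread(task);                     // shorthand for the virtual case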
As to the question about when not to use virtual threads, there are some obvious cases. E.g. when your threads heavily interact with native code, which knows nothing about virtual threads, or when you depend on some detail that has changed for virtual threads, like the ability to subclass Thread. Other cases are not so clear. For example, CPU-bound operations do not benefit from having more threads than CPU cores, so they do not benefit from the multitude of virtual threads, but that doesn't mean that they would be harmed. We're still not ready to say that users should pick virtual threads by default, but we might well get there, as we learn more about how people use them.
Be aware that Project Loom is under active experimental development. Things may change.
No default
You asked:
Why not just make them the default?
In modern Java, we generally do not address threads directly. Instead, we use the Executors framework added years ago in Java 5.
In particular, in most cases a Java programmer uses the Executors utility class to produce an ExecutorService. That executor service is backed by various kinds of thread factories or thread pools.
For example, if we want to serialize one task after another, we would use an executor service backed by a single thread.
ExecutorService executorService = Executors.newSingleThreadExecutor() ;
If you browse through Executors class Javadoc, you will see a variety of options. 👉 None of them is "default". The programmer chooses one to suit the needs of her particular situation.
With Project Loom, we will have at least one more such option to choose from. In the preview build of Java, call the new Executors.newVirtualThreadPerTaskExecutor() to get an executor service backed by virtual threads. Go nuts, and throw a million tasks at it.
ExecutorService executorService = Executors.newVirtualThreadPerTaskExecutor() ;
You asked:
why are they separate things?
One of the highest priorities for the Java team is backward-compatibility: Existing apps should be able to run without surprise.
Virtual threads have a very different behavior and performance profile than platform threads. So I do not expect to see the Java team retrofitting virtual threads onto existing features of Java generally. They may choose to do so, but only if absolutely certain no detrimental effects will surface in the behavior of existing apps.
When to choose or avoid virtual threads
You asked:
Is there any reason to not use them?
Yes, certainly. Two reasons:
CPU-bound tasks
Tasks used to indirectly throttle other resources
CPU-bound tasks
The entire point of virtual threads is to keep the "real" thread, the platform host-OS thread, busy. When a virtual thread blocks, such as waiting for storage I/O or waiting for network I/O, the virtual thread is "dismounted" from the host thread while another virtual thread is "mounted" on the host thread to get some execution done.
So, if your task's code does not block, do not bother with virtual threads. But this kind of code is rare. Most tasks in most apps are often waiting for users, storage, networks, attached devices, etc. An example of a rare task that might not block is something CPU-bound like video encoding/decoding, scientific data analysis, or some other kind of intense number-crunching. Such tasks should be assigned to platform threads directly rather than virtual threads.
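As a sketch, that routing decision could look like the following; encodeVideo and fetchOrder are hypothetical tasks:

int cores = Runtime.getRuntime().availableProcessors();
ExecutorService cpuPool = Executors.newFixedThreadPool(cores);         // platform threads for CPU-bound work
ExecutorService ioPool  = Executors.newVirtualThreadPerTaskExecutor(); // virtual threads for blocking work

cpuPool.submit(() -> encodeVideo());   // rarely blocks; more threads than cores would not help
ioPool.submit(() -> fetchOrder());     // mostly waits; virtual threads shine here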
Throttling
Another reason to avoid virtual threads is with existing code that depends on the limits or bottlenecks of platform threads to throttle their app's usage of other resources. For example, if a database server is limited to 10 simultaneous connections, then some apps have been written to use an executor service backed by only 8 or 9 threads. Such existing code should not be blindly switched to virtual threads.
Of course such code is less than optimal. Such a code base would be better, clearer, and easier to comprehend if explicit limiting/throttling mechanisms were used.
Explicit throttling mechanisms will be needed if a programmer wants to benefit from having thousands, even millions, of simultaneous virtual threads while avoiding exhausting/overloading other limited resources.
Java has long offered such throttling mechanisms. They just were not always used, given the simplicity/ease of relying on the limits/bottlenecks of a limited number of platform threads.
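For example, a Semaphore is one such long-standing mechanism. A minimal sketch, assuming a hypothetical queryDatabase() call returning a hypothetical Result, with a limit of 10 connections:

import java.util.concurrent.Semaphore;

Semaphore dbPermits = new Semaphore(10);   // explicit limit, independent of thread count

Result throttledQuery() throws InterruptedException {
    dbPermits.acquire();          // a virtual thread parks here cheaply
    try {
        return queryDatabase();   // at most 10 callers are inside at once
    } finally {
        dbPermits.release();
    }
}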
I am no expert on this. So rely on those who are experts. For details and insights, be sure to read the articles and watch the presentations and interviews by Ron Pressler, Alan Bateman, or other members of the Project Loom team.
Let's begin with
Why not just make them the default?
Virtual threads are built on top of platform threads, so you may consider them an illusion that the JVM provides; the whole idea is to tie the lifecycle of platform threads to CPU-bound operations only.
Platform threads versus virtual threads. Platform threads take OS threads hostage in IO-based tasks, and operations are limited by the number of applicable threads in the thread pool and by the OS threads; by default they are non-daemon threads.
Virtual threads are implemented by the JVM: a virtual thread is mounted on a platform thread while it executes CPU-bound work; when it starts a blocking IO operation the platform thread is returned to the thread pool, and once the IO operation finishes, the virtual thread is mounted again on a thread taken from the pool. So no OS thread is held hostage in this case.
A four-level architecture helps in understanding:
CPU
Multicore CPU: the cores within the CPU execute the operations.
OS
OS threads: the OS scheduler allocates CPU time to the engaged OS threads.
JVM
Platform threads are wrapped entirely around OS threads, for both kinds of operations.
Virtual threads are mounted on platform threads for each CPU-bound stretch of work; each virtual thread can be associated with multiple platform threads at different times.
Virtual threads with ExecutorService
It is more effective to use an executor service, because it is associated with a thread pool and limited to the threads available in it; however, with an executor service backed by virtual threads, we do not need to handle or manage the associated thread pool ourselves.
try (ExecutorService service = Executors.newVirtualThreadPerTaskExecutor()) {
    service.submit(ExecutorServiceVirtualThread::taskOne);
    service.submit(ExecutorServiceVirtualThread::taskTwo);
}
ExecutorService implements the AutoCloseable interface as of JDK 19, so when it is used within a try-with-resources block, the close() API is called once execution reaches the end of the try block. Effectively, the calling thread waits until all submitted tasks, each on its dedicated virtual thread, finish their lifecycle, and then the executor is shut down.
ThreadFactory factory = Thread.ofVirtual().name("user thread-", 0).factory();
try (ExecutorService service = Executors.newThreadPerTaskExecutor(factory)) {
    service.submit(ExecutorServiceThreadFactory::taskOne);
    service.submit(ExecutorServiceThreadFactory::taskTwo);
}
An executor service can also be created with a virtual thread factory: just pass the thread factory as its constructor argument.
You can still benefit from ExecutorService features such as Future and CompletableFuture.
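A small sketch of that: submitting a Callable to a virtual-thread executor still yields an ordinary Future; fetchGreeting() is a hypothetical blocking call.

String demo() throws Exception {
    try (ExecutorService service = Executors.newVirtualThreadPerTaskExecutor()) {
        Future<String> greeting = service.submit(() -> fetchGreeting());   // runs on a virtual thread
        return greeting.get();                                             // waits for the result
    }
}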
Virtual threads advantages
They exhibit nearly the same behavior as platform threads.
They are disposable and can be scaled to millions.
They are much more lightweight than platform threads.
Creation time is fast, comparable to creating a String object.
The JVM performs delimited continuations on IO operations: from the virtual thread's point of view, there is no blocking IO.
You keep the same sequential code as before, but it is far more effective.
The JVM provides the illusion of virtual threads; underneath, the whole story still plays out on platform threads.
Just by using virtual threads, CPU cores become much more concurrently utilized; the combination of virtual threads, multi-core CPUs, and CompletableFutures to parallelize code is very powerful.
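A sketch of that combination, with hypothetical loadUser() and encrypt() calls:

ExecutorService vexec = Executors.newVirtualThreadPerTaskExecutor();

CompletableFuture<String> pipeline =
        CompletableFuture.supplyAsync(() -> loadUser(), vexec)    // may block; the carrier thread stays free
                         .thenApplyAsync(user -> encrypt(user), vexec);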
Virtual threads usage cautions
Do not use monitors, i.e. synchronized blocks; this will be fixed in a coming JDK release, but for now the alternative is a ReentrantLock with a try-finally statement, as sketched after this list.
Beware of blocking with native frames on the stack (JNI); this is very rare.
Control the memory per stack (reduce thread locals and avoid deep recursion).
Monitoring tools such as debuggers, JConsole, VisualVM, etc. are not updated yet.
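A minimal sketch of the ReentrantLock alternative mentioned above; a virtual thread blocked in lock() can unmount, whereas (in current JDKs) a synchronized block pins its carrier thread:

import java.util.concurrent.locks.ReentrantLock;

private final ReentrantLock lock = new ReentrantLock();

void updateSharedState() {   // hypothetical critical section
    lock.lock();
    try {
        // ... mutate shared state ...
    } finally {
        lock.unlock();       // always release in finally
    }
}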
Find more in JEP 425.
If you made them the default, a good portion of existing Java code simply couldn't switch to Java 19, because that code is optimized for OS threads.
Java has to be backward compatible.
There are cases where virtual threads don't make much sense, for example:
Applications that do heavy computations
If you make requests to a DB that has a maximum connection pool size, the threads are not the bottleneck
Using thread locals is not a good idea with Virtual Threads
Furthermore, probably most of the existing code that deals with threads pools them, which again goes against the main idea (virtual threads are cheap enough that they are not meant to be pooled)
Back in Java 1.1, all threads were running on a single core (not taking advantage of the machine's multiple cores/CPUs) and were scheduled by the JVM in user space (so-called green threads).
Around Java 1.2 / 1.3 (depending on the underlying OS), a change was made and Java Thread objects were mapped to OS threads (pthreads in the case of Linux). This takes full advantage of multiple cores, but on the other hand creating a thread became very expensive in terms of memory (because of the crazily huge initial stack size of OS threads), which heavily limits the number of concurrent requests a single machine can handle in the thread-per-request model. This required server-side architectures to switch to the asynchronous model (the non-blocking I/O package was introduced, AsyncContext was added to the servlet API, etc.), which has been continuously confusing several generations of Java server-side devs up to this day: at first glance most APIs look like they were intended for the thread-per-request model, and one needs to carefully read the API documentation to find the async capabilities bolted onto them from the side.
Only recently has Project Loom finally aimed to deliver lightweight threads that are backed by a thread pool (a Java thread pool of "old-style" Java threads, which in turn map to OS threads), thus combining the advantages: threads that are cheap to create in large quantities, that utilize multiple cores, and that can be cheaply suspended on blocking operations (such as I/O).
Why is this happening only now, after 20 years, instead of right away in Java 1.3? I.e., why were Java threads made to map 1:1 to OS threads instead of being backed (executed) by a JVM-internal pool of OS threads of fixed size corresponding to the available CPU cores?
Is it perhaps difficult to implement in the JVM?
It seems not much more complex than all the asynchronous programming that Java server-side devs have been forced to do for the last 20 years, or than what C/C++ devs have always been doing, but maybe I'm missing something.
Another possibility is that there is some blocking obstacle in the architectural design of the JVM that prevents it from being implemented this way.
UPDATE:
Project Loom's architecture design info was updated according to comments: many thanks!
After some consideration, it seems to me that JIT compilation of Java bytecode to native code may be the reason:
In the model I proposed, a native OS thread switching between the execution of Java threads would be picking from its work queue a tuple <thread_stack, thread_instruction_pointer>. However, because of JIT compilation, a Java thread's stack is basically the same thing as the backing OS thread's stack, which cannot be swapped out just like that, AFAIK.
So, as I understand it, the implementation I proposed would only be possible if the JVM interpreted the bytecode every time and kept Java threads' stacks on its heap, which is not the case.
So I have an existing Spring library that performs some blocking tasks (exposed as services) that I intend to wrap in Scala Futures to showcase multiprocessor capabilities. The intention is to get people interested in the Scala/Akka tech stack.
Here is my problem.
Let's say I get two services from the existing Spring library. These services perform different blocking tasks (IO, DB operations).
How do I make sure that these tasks (service calls) are carried out across multiple cores?
For example, how do I make use of custom execution contexts?
Do I need one per service call?
How do the execution context(s) / thread pools relate to multi-core operations?
I'd appreciate any help with this understanding.
You cannot ensure that tasks will be executed on different cores. The workflow for the sample program would be as follows:
Write a program that does two things on two different threads (Futures, Java threads, Actors, you name it).
The JVM sees that you want two threads, so it starts two JVM threads and submits them to the OS process dispatcher (or the other way round; it doesn't matter).
The OS decides on which core to execute each thread. Usually it will try to put threads on different cores to maximize overall efficiency, but this is not guaranteed; you might have a situation where your 10 JVM threads all execute on one core, although that is extreme.
The rule of thumb for writing concurrent and seemingly parallel applications is: "Here, take my e.g. 10 threads and TRY to split them among the cores."
There are some tricks, like tuning CPU affinity (low-level, very risky) or spawning a plethora of threads to make sure some are parallelized (a lot of overhead and work for the GC). However, in general, the OS is usually not that overloaded, and if you create two actors, e.g. one for the DB and one for network IO, they should work well in parallel.
UPDATE:
The global ExecutionContext manages the thread pool. However, you can define your own and submit runnables to it with myThreadPool.submit(runnable: Runnable). Have a look at the links provided in the comments.
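In Java terms (since the Spring services are Java anyway), that could look like the sketch below; blockingDbCall() is hypothetical. In Scala you would typically wrap such a pool with ExecutionContext.fromExecutorService(...) and pass it implicitly to your Futures.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

ExecutorService myThreadPool = Executors.newFixedThreadPool(4);   // your own pool, not the global one
myThreadPool.submit(() -> blockingDbCall());                      // the OS decides which core runs it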
The ForkJoinTask documentation explicitly calls out that "Subdividable tasks should also not perform blocking I/O". Its primary aim is "computational tasks calculating pure functions or operating on purely isolated objects". My questions are:
Why design the ForkJoinTask to restrict blocking IO tasks?
What are the gotchas if I do implement a blocking IO task?
How come both the Spring and Play frameworks are full of examples using fork-join executors for DB calls?
In my scenario, a single request does two types of work: one is encryption, which pushes a CPU core to 100% for 200 ms, and the other is a few database calls. Any kind of static partitioning, such as 6 threads for encryption and 2 threads for blocking IO, will not provide optimal usage of the CPU. Hence, a fork-join executor with a certain level of over-provisioning in the number of threads over the total CPU count, coupled with work stealing, would ensure better usage of CPU resources.
Is my assumption and understanding of the fork-join executor correct? If not, please point me toward the gap.
Why design the ForkJoinTask to restrict blocking IO tasks?
Underlying the fork-join pool is a shared set of threads; if some IO work blocks on those threads, fewer threads remain for CPU-intensive work, and the other non-blocking work will starve.
What are the gotchas if I do implement a blocking IO task?
Typically, a FJ pool allocates about as many threads as there are processors. So if you do have to block on IO on its threads, make sure you allocate enough threads for your other tasks.
You can also isolate your IO work on dedicated threads that are not shared with the FJ pool; but when you call blocking IO, the calling thread still blocks and is descheduled until it unblocks, as sketched below.
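A sketch of that isolation, with hypothetical queryDb() and encrypt() calls: blocking IO goes to a dedicated cached pool, while CPU work stays on the common fork-join pool.

ExecutorService ioPool = Executors.newCachedThreadPool();   // dedicated threads for blocking calls

CompletableFuture<String> result =
        CompletableFuture.supplyAsync(() -> queryDb(), ioPool)    // blocks an ioPool thread, not the FJ pool
                         .thenApplyAsync(data -> encrypt(data));  // runs on the common FJ pool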
How come both the Spring and Play frameworks are full of examples using fork-join executors for DB calls?
Play is no different: they use dedicated pools for IO tasks, so other tasks won't suffer.
The framework does not restrict any type of processing; blocking is simply not recommended, etc. I wrote a critique about this framework years ago; here is the point on the recommendations. This was for the Java 7 version, but it is still applicable to Java 8.
Blocking is not fatal: Spring and Play block, and they work just fine. You need to be careful when using Java 8, since there is a default common fork/join pool, and tying up threads there may have consequences for other users. You could always define your own F/J pool, with the additional overhead, but at least you wouldn't interfere with others using the common pool.
Your scenario doesn't look bad; you're not waiting for replies from the internet. Give it a try. If you run into difficulty with stalling threads, look into the ForkJoinPool.ManagedBlocker interface. Using that interface informs the F/J pool that you are doing blocking calls, and the framework will create compensation threads.
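A minimal sketch of that interface, wrapping a hypothetical blocking call; ForkJoinPool.managedBlock() tells the pool that a worker is about to block, so it can add a compensation thread:

import java.util.concurrent.ForkJoinPool;
import java.util.function.Supplier;

class BlockingCall<T> implements ForkJoinPool.ManagedBlocker {
    private final Supplier<T> call;
    private volatile T result;
    private volatile boolean done;

    BlockingCall(Supplier<T> call) { this.call = call; }

    @Override public boolean block() {          // invoked by the pool; may block
        result = call.get();                    // e.g. the JDBC query
        done = true;
        return true;                            // no further blocking needed
    }

    @Override public boolean isReleasable() { return done; }

    T result() { return result; }
}

// inside a ForkJoinTask:
// var call = new BlockingCall<>(() -> runJdbcQuery());   // runJdbcQuery() is hypothetical
// ForkJoinPool.managedBlock(call);                       // throws InterruptedException
// use(call.result());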