I am trying to use an executor service in one of my applications. I created a pool of 8 threads because my machine has 4 cores and, from what I have read, only 2 active threads can work on a core.
When I checked the number of cores from Java, I also got the value 4:
int cores = Runtime.getRuntime().availableProcessors();
ExecutorService executor = Executors.newFixedThreadPool(cores*2);
Please tell me whether I am doing this correctly, because I don't see much value in creating a pool of 500 threads when my CPU can handle only 8.
Hyper-threading
You should read up on hyper-threading technology. That term is specifically the brand-name used by Intel for its proprietary simultaneous multithreading (SMT) implementation, but is also used more generally for SMT.
An over-simplified explanation:
In a conventional CPU, switching between threads is quite expensive. Registers are the holding place for several pieces of data actually being worked on by the CPU core. The values in those registers have to be swapped out when switching threads. Likewise caches (holding places for data, slower than registers but faster than RAM) may also be cleared. All this takes time.
Hyper-threading design in a CPU adds a duplicate set of registers. Each core having a double set of registers means it can switch between threads without swapping out the values in the registers. The switch between threads is much faster, so much so that the CPU lies to the operating system, reporting each core as a pair of (virtual) cores. So a 4-core chip will appear as 8 cores, for example.
I found that only 2 active threads can work on a core
Be aware that switching threads still has some expense, just a much lower expense. A hyper-threaded CPU core is still executing only one thread at a time. Being hyper-threaded means the switching between threads is easier & faster.
For use on a machine where the threads are often in a holding pattern, waiting for some external function to complete such as a call out over the network, hyper-threading makes much sense. For applications where the cores are likely to be doing CPU-bound work such as number-crunching, simulations, or scientific data analysis, hyper-threading may not be as useful. So on machines doing such work, the sysadmin may decide to disable hyper-threading, so that 4 cores are really just 4 cores, for example. Also, because of the recent security vulnerabilities related to hyper-threading technology, some sysadmins may decide to disable hyper-threading.
Thread pools
creating a pool of 500 when my cpu can handle only 8 threads.
The sizing of a thread pool depends on the behavior of your application(s). If you have CPU-bound apps, then you certainly want to limit the number of such CPU-intensive threads to less than the number of actual or virtual cores. If your apps are not CPU-bound, if they are often doing file I/O, network I/O, or other activities where they often do nothing while waiting on other resources, then you can have more threads than cores. If your threads often sit idle doing nothing at all, then you can have even more threads going.
What is the best way to create thread pool in java
There are no specific rules to help you here. You must make an educated guess initially, then monitor your app and host machine in production. And so you may want to have a way to set the number of threads being used in your apps during runtime rather than hard-coding a number. For example, use preference settings or use JMX. Learn to use profiling tools such as Java Flight Recorder and Mission Control; both are now bundled with OpenJDK-based distributions of Java. If you are deploying to a system supporting DTrace (macOS, BSD, etc.), that may help as well.
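As a rough illustration of setting the thread count at runtime rather than hard-coding it, here is a minimal sketch. The property name app.pool.size and the class name ResizablePool are hypothetical, invented for this example. It relies on the setCorePoolSize and setMaximumPoolSize methods of ThreadPoolExecutor, and orders the two calls so the core size never exceeds the maximum.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class ResizablePool {
    /** Builds a pool whose initial size comes from -Dapp.pool.size (hypothetical name), else the core count. */
    static ThreadPoolExecutor create() {
        int defaultSize = Runtime.getRuntime().availableProcessors();
        int size = Math.max(1, Integer.getInteger("app.pool.size", defaultSize));
        return new ThreadPoolExecutor(size, size, 60L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
    }

    /** Changes the pool size while the application is running, e.g. from a JMX operation. */
    static void resize(ThreadPoolExecutor pool, int newSize) {
        if (newSize < 1) {
            throw new IllegalArgumentException("newSize must be at least 1");
        }
        if (newSize > pool.getMaximumPoolSize()) {
            // Growing: raise the maximum first so core <= maximum always holds.
            pool.setMaximumPoolSize(newSize);
            pool.setCorePoolSize(newSize);
        } else {
            // Shrinking (or same size): lower the core size first.
            pool.setCorePoolSize(newSize);
            pool.setMaximumPoolSize(newSize);
        }
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = create();
        System.out.println("initial size: " + pool.getCorePoolSize());
        resize(pool, 2);
        System.out.println("resized to: " + pool.getCorePoolSize());
        pool.shutdown();
    }
}
```

Start it with something like java -Dapp.pool.size=6 ResizablePool to override the default. In a real deployment the resize call would be triggered by whatever management channel you already use.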
Within an app with different kinds of workloads going on in various parts of functionality, it may make sense to maintain multiple thread pools. Use a pool with a very small number of threads for the CPU-intensive work, and a pool with a larger number of threads for CPU-non-intensive work. The Executors framework within modern Java helps make this easy.
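A minimal sketch of the two-pool idea follows. The sizing rules (one thread fewer than the core count for CPU work, four times the core count for waiting work) are assumptions chosen only for illustration; the right numbers come from monitoring your own app.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class WorkloadPools {
    // Assumed rule: leave one core of headroom for the OS and other apps.
    static int cpuPoolSize(int cores) {
        return Math.max(1, cores - 1);
    }

    // Assumed multiplier for threads that spend most of their time waiting.
    static int ioPoolSize(int cores) {
        return cores * 4;
    }

    public static void main(String[] args) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService cpuPool = Executors.newFixedThreadPool(cpuPoolSize(cores));
        ExecutorService ioPool = Executors.newFixedThreadPool(ioPoolSize(cores));
        try {
            // CPU-intensive work goes to the small pool.
            Future<Long> sum = cpuPool.submit(() -> {
                long total = 0;
                for (int i = 1; i <= 1_000_000; i++) {
                    total += i;
                }
                return total;
            });
            // Work that mostly waits (simulated here with sleep) goes to the larger pool.
            Future<String> reply = ioPool.submit(() -> {
                Thread.sleep(50);
                return "done";
            });
            System.out.println(sum.get() + " " + reply.get());
        } finally {
            cpuPool.shutdown();
            ioPool.shutdown();
        }
    }
}
```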
Take into account all your apps you may be deploying to a machine. And take account of all the other apps running on that machine. And take account of the CPU needs of the operating system. After all this, you may find that some of your thread pools should be set to only one or two threads at most.
Tricky stuff
Thread-safety is very tricky, complicated work. When sharing resources between threads (variables, files, etc.) you must educate yourself about the issues involved in protecting those resources from abuse.
Required reading: Java Concurrency in Practice by Brian Goetz et al.
Go to System properties and check how many cores (physical cores) and logical processors (virtual cores) are there.
For example :
If your system has n cores and n logical processors, your processor does not support hyperthreading.
If your system has n cores and n x 2 logical processors, your processor supports hyperthreading. You can execute n * 2 threads in parallel.
Note : Suppose you have hyperthreading support. Now, you have 8 cores and 16 virtual cores.
Then, the processor will give a good throughput up to 16 threads. If you increase the thread pool more than 16 threads, the throughput will become uniform and will not change too much.
Related
Back in Java 1.1 all threads were running on a single core (not taking advantage of the machine's multiple cores/CPUs) and were scheduled by the JVM in user space (so-called green threads).
Around Java 1.2 / 1.3 (depending on the underlying OS), a change was made and Java Thread objects were mapped to OS threads (pthreads in the case of Linux). This takes full advantage of multiple cores, but OTOH creating a thread became very expensive in terms of memory (because of the huge initial stack size of OS threads), which heavily limits the number of concurrent requests that a single machine can handle in the thread-per-request model. This required server-side architectures to switch to the asynchronous model (the non-blocking I/O package was introduced, AsyncContext was added to the servlet API, etc.), which has been confusing several generations of Java server-side devs up to this day: at first most APIs look like they were intended for the thread-per-request model, and one needs to read the API documentation carefully to find the async capabilities bolted onto them on the side.
Only recently has Project Loom finally aimed to deliver lightweight threads that are backed by a thread pool (a Java thread pool of "old-style" Java threads, which in turn map to OS threads), thus combining the advantages: threads that are cheap to create in large quantities, that do utilize multiple cores, and that can be suspended lightheartedly on blocking operations (such as I/O).
Why is this happening only now, after 20 years, instead of right away in Java 1.3? That is, why were Java threads made to map 1-1 to OS threads instead of being backed (executed) by a JVM-internal pool of OS threads, of a fixed size corresponding to the available CPU cores?
Is it difficult to implement in JVM maybe?
It seems not much more complex than all the asynchronous programming that Java server-side devs have been forced to do for the last 20 years and that C/C++ devs have always been doing, but maybe I'm missing something.
Another possibility is that there is some blocking obstacle in architectural design of JVM that prevents it from being implemented this way.
UPDATE:
Project Loom's architecture design info was updated according to comments: many thanks!
After some consideration it seems to me that JIT compilation of Java bytecode to native code may be the reason:
in the model I proposed, a native OS thread switching between the execution of Java threads would pick a tuple <thread_stack, thread_instruction_pointer> from its work queue. However, because of JIT, a Java thread's stack is basically the same thing as the backing OS thread's stack, which cannot be replaced just like that, AFAIK.
So as I understand it, the implementation I proposed would only be possible if the JVM interpreted the bytecode each time and kept the Java threads' stacks on its heap, which is not the case.
Suppose we have a very complex task. I know that if we use one thread then in practice we will use one core, but if I divide the task into threads equal to the number of processor cores does the program necessarily run on all the cores?
Or is there no correlation between the number of threads and the cores used, and the JVM 'decides'?
Actually, it is typically the operating system that decides how many cores a Java application gets to use. The scheduling of native threads to cores is handled by the operating system [1]. The JVM has little (if any) say in thread scheduling.
But yes, a Java application won't necessarily get access to all of the cores. It will depend on what other applications, services, etc. on the system are doing. Indeed, the OS may provide ways for an administrator to externally limit the number of cores that may be used by a given (Java or not) application, or to give one application priority over another.
... or there is no correlation between the number of thread and used core's
There is a correlation, but not one that is particularly useful. A JVM won't (cannot) use more cores than there are native threads in existence (including the JVM's internal and GC threads).
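To see the two numbers being compared, a small sketch using the standard java.lang.management API can print them side by side. Note the assumption about scope: ThreadMXBean counts live Java threads, including JVM service threads such as the reference handler, but it is not a count of every native thread in the process. The class name ThreadCensus is made up for this example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

final class ThreadCensus {
    /** Number of processors the JVM reports as available to it. */
    static int visibleProcessors() {
        return Runtime.getRuntime().availableProcessors();
    }

    /** Number of live Java threads (daemon and non-daemon) as reported by the JVM. */
    static int liveJavaThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        return threads.getThreadCount();
    }

    public static void main(String[] args) {
        System.out.println("processors visible to the JVM: " + visibleProcessors());
        System.out.println("live Java threads: " + liveJavaThreads());
    }
}
```

Even a program that never starts a thread of its own will usually report more than one live thread here.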
It is also worth noting that the OS (typically) doesn't assign a core to a native thread that is not currently runnable.
Basil Bourque notes in his answer that Project Loom will bring significant improvements to threading in Java, if and when it is incorporated into the standard releases. However, Loom won't alter the fact that the number of physical cores assigned to an application's JVM at any given time is controlled / limited by the OS.
1 - Or the operating system's hypervisor. Things can get a bit complicated in a cloud computing environment, for instance.
The Answer by Stephen C is true today, where Java threads are typically implemented as native threads provided by the host operating systems. But things change with Project Loom technology being developed for a future version of Java.
Project Loom brings virtual threads (fibers) to the concurrency toolbox of Java. Many virtual threads will be mapped to each of the few platform/kernel threads. A virtual thread when blocked (waiting on a call to slow resources such as file I/O, network I/O, database access, etc.) will be “parked” (set aside) allowing some other virtual thread to execute for a while on the “real” platform/kernel thread.
This parked/unparked switching will be very fast, and take little memory (using a flexible growing/shrinking stack). This makes threads “cheap”, so cheap that you might reasonably be able to run millions of threads at a time.
Returning to your question, the management of these virtual threads will be managed within the JVM rather than the host OS. But underneath our virtual threads, we rely on the same platform/kernel threads used today, and those are ultimately controlled by the host OS rather than the JVM.
By default, when setting up an executor service backed by virtual threads via Executors.newVirtualThreadExecutor, we do not specify the number of platform/kernel threads to be used. That is handled by the implementation of your JVM.
Experimental builds of Project Loom are available now, based on early-access Java 17. The Loom team seeks feedback.
Concurrency in Java and similar languages is achieved through threads, i.e. task-level parallelism. But under the hood, does the hardware or runtime also use ILP to achieve the best performance?
A little further elaboration: in a multi-core processor (say 4 cores per system) with multiple hardware threads (say 2 per core, i.e. 8 threads per system in total), a Java thread is executed on one of the several (8 in this case) processor threads. But if the system determines that all or several of the other threads are doing nothing but sitting idle, can the hardware or runtime do any legal reorderings, execute them on other threads on the same or other cores, and fetch the results back (or into main memory)?
What I want to know is whether the Java implementation allows this, or whether it is up to the hardware to handle it independently, without the JVM even knowing anything about it.
It's a little unclear what you're asking, but I don't think it has much to do with Java.
I think you're talking about (at least) two different things:
"ILP" is generally used to refer to a set of techniques that occur within a single core (such as pipelining and branch prediction), and has little to do with threading or multi-core. These techniques are transparent implementation details of the CPU, and typically not exposed in a way that you (or the runtime) can interact with directly.
Threads are swapped on and off cores by the kernel scheduler if they become blocked (and even if they're not, to ensure fairness).
A JVM runs in a single process and threads in a JVM share the heap belonging to that process. Then how does JVM make use of multiple cores which provide multiple OS threads for high concurrency?
You can make use of multiple cores using multiple threads. But using a higher number of threads than the number of cores present in a machine can simply be a waste of resources. You can use availableProcessors() to get the number of cores.
In Java 7 there is fork/join framework to make use of multiple cores.
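As an illustration of the fork/join style, here is a minimal sketch that sums an array by splitting it recursively. The class name ForkJoinSum and the THRESHOLD value are assumptions for this example, and it uses ForkJoinPool.commonPool(), which requires Java 8 or later (on Java 7 you would create a ForkJoinPool yourself).

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

final class ForkJoinSum extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10_000; // assumed cutoff below which we sum sequentially

    private final long[] data;
    private final int from; // inclusive
    private final int to;   // exclusive

    ForkJoinSum(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        ForkJoinSum left = new ForkJoinSum(data, from, mid);
        ForkJoinSum right = new ForkJoinSum(data, mid, to);
        left.fork();                          // schedule the left half on the pool
        return right.compute() + left.join(); // do the right half here, then wait for the left
    }

    static long sum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new ForkJoinSum(data, 0, data.length));
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(sum(data));
    }
}
```

The pool's work-stealing threads pick up the forked halves, so the number of tasks can be far larger than the number of cores without creating a thread per task.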
Related Questions:
Is Multi-Threaded algorithm required to make use of Multi-core processors ?
Threads per Processor
Correctly multithreaded quicksort or mergesort algo in Java?
A JVM runs in a single process and threads in a JVM share the heap belonging to that process. Then how does JVM make use of multiple cores which provide multiple OS threads for high concurrency?
Java will utilize the underlying OS threads to do the actual job of executing the code on different CPUs, if running on a multi-CPU machine. When each Java thread is started, it creates an associated OS thread, and the OS is responsible for scheduling, etc. The JVM certainly does some management and tracking of the thread, and Java language constructs like volatile, synchronized, notify(), wait(), etc. all affect the run status of the OS thread.
A JVM runs in a single process and threads in a JVM share the heap belonging to that process.
The JVM doesn't necessarily "run in a single process", because even the garbage collector and other JVM code run in different threads, and the OS often represents these different threads as different processes. In Linux, for example, the single process you see in the process list is often masquerading as a bunch of different thread processes. This is the case even if you are on a single-core machine.
However, you are correct that they all share the same heap space. They actually share the same entire memory space, which means code, interned strings, stack space, etc.
Then how does JVM make use of multiple cores which provide multiple OS threads for high concurrency?
Threads get their performance improvements for a couple of reasons. Obviously, straight concurrency often makes the program run faster. Being able to do multiple CPU tasks at the same time can (though not always) improve the throughput of the application. You are also able to isolate IO operations to a single thread, meaning that other threads can be running while a thread is waiting on IO (read/write to disk/network, etc.).
But in terms of memory, threads get a lot of their performance improvements because of local per-CPU cached memory. When a thread runs on a CPU, the local high speed memory cache for the CPU helps the thread isolate storage requests locally without having to spend the time to read or write to central memory. This is why volatile and synchronized calls include memory synchronization constructs because the cache memory has to be flushed to main memory or invalidated when threads need to coordinate their work or communicate with each other.
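A small sketch of why the memory synchronization matters: the worker below spins on a flag that another thread clears. Because the field is volatile, the write is guaranteed to become visible to the worker; without volatile, the worker would be allowed to keep seeing the stale value and might never stop. The class and method names are made up for this example.

```java
final class VolatileStopFlag {
    // volatile guarantees that a write by one thread becomes visible to readers in other threads.
    private volatile boolean running = true;

    /** Starts a spinning worker, clears the flag, and reports whether the worker noticed and exited. */
    static boolean workerObservesStop() {
        VolatileStopFlag flag = new VolatileStopFlag();
        Thread worker = new Thread(() -> {
            while (flag.running) {
                // busy-wait until another thread clears the flag
            }
        });
        worker.setDaemon(true); // do not keep the JVM alive if something goes wrong
        worker.start();
        try {
            Thread.sleep(50);      // let the worker spin for a moment
            flag.running = false;  // volatile write, visible to the worker
            worker.join(2_000);    // wait up to two seconds for it to exit
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("worker stopped: " + workerObservesStop());
    }
}
```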
Java will benefit from multiple cores if the OS distributes threads over the available processors. The JVM itself does not do anything special to get its threads scheduled evenly across multiple cores. A few things to keep in mind:
While implementing parallel algorithms, it might be better to spawn as many threads as there are cores. (Runtime.getRuntime().availableProcessors()). Not more, not less.
Make use of the facilities provided by the java.util.concurrent package.
Make sure that you have Java Concurrency in Practice in your personal library.
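The first two points can be sketched together: a fixed pool from java.util.concurrent with exactly as many threads as there are cores, each thread taking an equal share of a CPU-bound job. Counting primes by trial division is just a stand-in workload chosen for this example, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class PerCorePrimeCount {
    static boolean isPrime(int n) {
        if (n < 2) {
            return false;
        }
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) {
                return false;
            }
        }
        return true;
    }

    /** Counts the primes in [0, limit] using one worker thread per available core. */
    static int countPrimes(int limit) {
        int threads = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int start = t;
                // Worker t checks start, start + threads, start + 2 * threads, ...
                tasks.add(() -> {
                    int count = 0;
                    for (int n = start; n <= limit; n += threads) {
                        if (isPrime(n)) {
                            count++;
                        }
                    }
                    return count;
                });
            }
            int total = 0;
            for (Future<Integer> part : pool.invokeAll(tasks)) {
                total += part.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(countPrimes(1_000_000));
    }
}
```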
Green threads were replaced by native threads in Java 1.2.
I'm working on an application which interacts with hundreds of devices across a network. The type of work being done requires a lot of concurrent threads (mostly because each of them requires network interaction and does so separately, but for other reasons as well). At the moment, we're in the area of requiring about 20-30 threads per device being interacted with.
A simple calculation puts this at thousands of threads, even up to 10,000 threads. If we put aside the CPU penalty for thread-switching, etc., how many threads can Java 5 running on CentOS 64-bit handle? Is this just a matter of RAM or is there anything else we should consider?
Thanks!
In such a situation it is always recommended to use thread pooling.
Thread pools address two different problems: they usually provide improved performance when executing large numbers of asynchronous tasks, due to reduced per-task invocation overhead, and they provide a means of bounding and managing the resources, including threads, consumed when executing a collection of tasks. Each ThreadPoolExecutor also maintains some basic statistics, such as the number of completed tasks.
ThreadPoolExecutor is the class you should be using.
http://www.javamex.com/tutorials/threads/ThreadPoolExecutor.shtml
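For orientation, here is a minimal sketch of a bounded ThreadPoolExecutor that also reads one of its built-in statistics. The pool sizes, queue capacity, and the sleep that stands in for a device call are all assumptions made for illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class BoundedPoolDemo {
    /** Runs taskCount simulated device calls and returns the pool's completed-task statistic. */
    static long runTasks(int taskCount) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                4, 16,                          // assumed core and maximum thread counts
                30L, TimeUnit.SECONDS,          // idle threads above the core size are reclaimed
                new ArrayBlockingQueue<>(100),  // bounded queue caps the memory used by waiting tasks
                new ThreadPoolExecutor.CallerRunsPolicy()); // when saturated, the submitter runs the task
        for (int i = 0; i < taskCount; i++) {
            pool.execute(() -> {
                try {
                    Thread.sleep(1); // stand-in for a network call to a device
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        // Counts tasks finished by pool threads; tasks run by the caller under
        // CallerRunsPolicy are not included in this statistic.
        return pool.getCompletedTaskCount();
    }

    public static void main(String[] args) {
        System.out.println("completed: " + runTasks(50));
    }
}
```

The bounded queue plus a rejection policy is what keeps a burst of work from turning into thousands of threads or an ever-growing backlog.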
I think up to 65k threads is OK with Java. The only thing you need to consider is stack space: Linux by default allocates 48k per thread/process as stack space, which is wasteful for Java (which doesn't have stack-allocated objects, and hence uses much less stack space). This will easily use 500 megs for 10k threads.
If this is really an absolute requirement, you might want to have a look at a language that's specifically built to deal with this level of concurrent threads, such as Erlang.
Like others are suggesting, you should use NIO. We had an app that used a lot of threads (e.g. 1,000, which is still far fewer than you are planning) and it was already very inefficient. If you have to use that many threads, it's definitely time to consider the use of NIO.
For networking, if your apps are using HTTP, one very easy tool would be Async-HTTP-client, by 2 very famous authors in this field.
If you use a different protocol, using the underlying implementation of Async-HTTP-client (netty) would be recommendable.
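If stack space turns out to be the limiting factor, one knob is the stack size requested per thread. The sketch below uses the four-argument Thread constructor; the requested size is only a hint that the JVM may round up or ignore on some platforms, and the 256 KB figure and thread names are assumptions for this example. The -Xss JVM option sets a default for all threads instead.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class SmallStackThreads {
    /** Starts threadCount threads with the requested stack size and returns how many ran to completion. */
    static int runWithSmallStacks(int threadCount, long stackBytes) {
        AtomicInteger finished = new AtomicInteger();
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            // The last argument is the requested stack size in bytes (a hint to the JVM).
            threads[i] = new Thread(null, () -> finished.incrementAndGet(), "device-" + i, stackBytes);
            threads[i].start();
        }
        try {
            for (Thread thread : threads) {
                thread.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return finished.get();
    }

    public static void main(String[] args) {
        System.out.println(runWithSmallStacks(100, 256 * 1024) + " threads finished");
    }
}
```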