I have a problem with my JVM running on CentOS 6.0 with OpenJDK 1.7.0_51 (64-bit).
My system has 4 cores and 8 GB of RAM.
I'm running a multithreaded Java application that I wrote myself. It's supposed to insert large amounts of data into a NoSQL database.
For that, I'm spawning 4 threads, using a cached thread pool created with "Executors.newCachedThreadPool()" from java.util.concurrent.
I instantiate 4 workers that implement the "Runnable" interface and then execute them via the thread pool. Here's my code:
public void startDataPump(int numberOfWorkers){
    // class "DataPump" implements Runnable
    for (int i = 0; i < numberOfWorkers; i++){
        DataPump pump = new DataPump();
        // "workerList" is a simple ArrayList that keeps track of all workers
        workerList.add(pump);
        // "workers" is the thread pool that has been
        // initialized earlier with Executors.newCachedThreadPool()
        workers.execute(pump);
    }
}
When I run this with a parameter of 4, it spawns 4 threads in the thread pool. I assumed that the JVM or my OS would be smart enough to schedule these threads across all of my cores.
However, only one core of my CPU is working at 100%; the others remain almost idle.
Am I doing anything wrong in my code, or is this a JVM/OS problem? If so, is there anything I can do about it?
Running this application on only one core slows the whole app down dramatically.
Help is greatly appreciated :)
Please bear in mind that it's the OS, not the JVM, that is responsible for CPU affinity - which is why I suggest that you first figure out how many CPUs you have and then perhaps use schedutils to configure processor affinity for a particular process.
CPU info - use one of the three commands below:
/proc/cpuinfo
lscpu
nproc
Install schedutils to configure processor affinity:
yum install schedutils
You can assign CPU affinity via schedutils as follows (2 is the CPU number and 23564 is the process ID):
taskset -c 2 -p 23564
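If you need the process id for taskset from inside the application itself, here is a minimal sketch; it assumes a HotSpot-style "pid@hostname" runtime name, which is common on Java 7/8 but not guaranteed by the spec:

import java.lang.management.ManagementFactory;

public class ShowPid {
    public static void main(String[] args) {
        // On HotSpot the runtime name usually has the form "pid@hostname" (an assumption, not a guarantee).
        String jvmName = ManagementFactory.getRuntimeMXBean().getName();
        String pid = jvmName.split("@")[0];
        System.out.println("JVM pid (pass this to taskset -p): " + pid);
    }
}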
Scheduling threads is not a JVM activity but an OS activity. If the OS finds that threads are independent of each other and can be executed separately, it schedules them on other cores.
I am not sure about schedutils, but I think it works at the application level (it lets you request a CPU affinity, but the final decision is taken by the OS).
One thing about using cores: the OS scheduler tends to place new processes on different cores, as every process has its own address space independent of other processes (so they can be executed in parallel without getting in each other's way).
Try creating a new process for each worker; that can improve your CPU utilization (use of more cores). There is a disadvantage too: every process gets its own address space, so extra memory is required per process (per worker in your case). If you have a good amount of memory available, you can try this approach - see the sketch below.
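A rough sketch of that process-per-worker idea, assuming a hypothetical DataPumpWorker class whose main method does one worker's share of the inserts (the class name and its argument are illustrative, not from the original code):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class ProcessPerWorkerLauncher {
    public static void main(String[] args) throws IOException, InterruptedException {
        int numberOfWorkers = 4;
        List<Process> processes = new ArrayList<>();
        for (int i = 0; i < numberOfWorkers; i++) {
            // Launch a separate JVM per worker; each process gets its own heap,
            // so budget the extra memory this costs.
            ProcessBuilder pb = new ProcessBuilder(
                    "java", "-cp", System.getProperty("java.class.path"),
                    "DataPumpWorker", String.valueOf(i)); // hypothetical worker class
            pb.inheritIO(); // share stdout/stderr with the parent process
            processes.add(pb.start());
        }
        for (Process p : processes) {
            p.waitFor(); // wait for all workers to finish
        }
    }
}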
If it is just a Linux box, the "sar" command is enough for monitoring per-core CPU utilization (sar is part of the base sysstat tooling on Linux, so the overhead on the system will be low).
If your environment is virtualized, or uses special CPU scheduling such as Docker, there is no way for Java to automatically find out how many cores are available and use them all. You have to specify how many cores you want to use explicitly.
On JDK >= 10, use the following JVM option:
-XX:ActiveProcessorCount=2
On JDK >= 8, use the following JVM options:
-XX:+UnlockExperimentalVMOptions -XX:ActiveProcessorCount=2
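To verify that the setting actually took effect, you can check from inside the application; Runtime.availableProcessors() reflects the processor count the JVM ended up with:

public class CoreCheck {
    public static void main(String[] args) {
        // With -XX:ActiveProcessorCount=2 this should print 2.
        System.out.println("Processors visible to the JVM: "
                + Runtime.getRuntime().availableProcessors());
    }
}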
Related
We have a computationally demanding Java program (scientific research) that is designed to be single-threaded. However, when executed, it loads much more than 1 CPU core (we noticed this the hard way - the cluster job scheduler killed our program because it used more cores than requested). We encountered this weird phenomenon both on Linux (Debian, Ubuntu) and Windows (7).
I understand that there are several background threads added by Java/the JVM (garbage collection, for example), so even a single-threaded program can load more than one core, but I doubt that these background threads could load another full core or two.
I ask for any idea what may be causing this. Thanks for any hints. Feel free to ask for any details, though I can't post the code here (first, it's quite a lot of code, second, it is still under research and I cannot publish anything yet).
Let me first give you my condolences for having to run your program in an environment where someone has found it more intellectually fulfilling to kill jobs attempting to use more than one core, than to restrict jobs to using just one core. But let's move on with the question.
When I pause a random single-threaded Java program and look at my debugger's thread listing, there are about half a dozen threads in there. That's just how the JVM works. There is at least one thread for garbage collection, another thread for running finalizers, and various other stuff, most of which I don't even know the purpose of. We lost the game of knowing precisely what is going on in our machines a couple of decades ago.
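If you want to see those extra threads for yourself, here is a small diagnostic sketch using the standard ThreadMXBean; it simply lists every live thread in the JVM, including the GC, finalizer and compiler helpers the JVM starts on its own:

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ListJvmThreads {
    public static void main(String[] args) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        // false, false: skip the (more expensive) lock information, we only want names.
        for (ThreadInfo info : bean.dumpAllThreads(false, false)) {
            System.out.println(info.getThreadId() + "\t" + info.getThreadName());
        }
    }
}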
There may be options that you could use to tell the JVM to reduce its use of threads, for example to run garbage-collection in the same thread as your program, but I don't know them by heart, so you would need to look them up, and frankly, I doubt that it would make much difference. There will always be threads that you have no control over.
So, it seems like you are going to have to configure your own job to not use more than one core. I have done it at work, with some success, but today is Saturday, so I do not have access to the script files that I used, so I am going to try and help with whatever I remember.
The concepts you are looking for are "process thread affinity" and "NUMA".
Under Windows, the start command (built into cmd.exe) allows you to specify which logical CPUs (in other words, cores) your process may run on, via an affinity mask. start /affinity 1 myapp will run myapp restricted to the first core (mask 0x1).
Under Linux there are at least a couple of different commands that allow you to launch a process on a limited subset of cores. One that I know of is taskset and another is numactl.
There is a set of JVM parameters you could play with. For Java 7 and earlier:
-XX:ParallelGCThreads=n Sets the number of threads used during parallel phases of the garbage collectors.
-XX:ConcGCThreads=n Sets the number of threads the concurrent garbage collectors will use.
For Java 8 there are further options, which depend on the OS. You can see the Windows list here. Some you may find helpful:
-XX:CICompilerCount=threads Sets the number of compiler threads to use for compilation.
-XX:ConcGCThreads=threads Sets the number of threads used for concurrent GC. The default value depends on the number of CPUs available to the JVM (a possible cause of your problem!).
-XX:ParallelGCThreads=threads Sets the number of threads used for parallel garbage collection in the young and old generations. The default value depends on the number of CPUs available to the JVM.
-XX:+UseParNewGC Enables the use of parallel threads for collection in the young generation. By default, this option is disabled (but it may be enabled implicitly by other options).
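If you want to see which values the JVM actually picked on a given machine, one option is the HotSpot diagnostic MXBean; this is HotSpot-specific (com.sun.management), so treat it as a sketch rather than a portable API:

import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

public class GcThreadSettings {
    public static void main(String[] args) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // Print the effective values of the threading-related flags discussed above.
        for (String flag : new String[]{"ParallelGCThreads", "ConcGCThreads", "CICompilerCount"}) {
            System.out.println(flag + " = " + bean.getVMOption(flag).getValue());
        }
    }
}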
If you provide additional info, the answers can be more helpful and informative.
Can somebody help me understand how the JVM spreads threads between the available CPU cores? Here is my mental model of how it works, but please correct me.
So from the beginning: when the computer is started, the bootstrap thread (usually thread 0 on core 0 of processor 0) starts fetching code from address 0xFFFFFFF0. All the other CPUs/cores are in a special sleep state called Wait-for-SIPI (WFS).
Then, after the OS is loaded, it starts managing processes and scheduling them across CPUs/cores by sending a special inter-processor interrupt (IPI) over the Advanced Programmable Interrupt Controller (APIC), called a SIPI (Startup IPI), to each thread that is in WFS. The SIPI contains the address from which that thread should start fetching code.
So, for example, the OS starts the JVM by loading the JVM code into memory and pointing one of the CPU cores at its address (using the mechanism described above). After that the JVM, which runs as a separate OS process with its own virtual memory area, can start several threads.
So the question is: how?
Does the JVM use the same mechanism as the OS, i.e. during the time slice that the OS gave to the JVM, can it send a SIPI to other cores and point them to the address of the task that should be executed in a separate thread? If yes, how is the original program that the OS had running on that core restored?
I assume that is not the correct picture, since involving other CPUs/cores should be managed via the OS. Otherwise we could interrupt the execution of OS processes running in parallel on other cores. So if the JVM wants to start a new thread on another CPU/core, it makes some OS call and passes the address of the task to be executed to the OS. The OS schedules the execution as it does for other programs, with the difference that this execution should happen within the same process, so it can access the same address space as the rest of the JVM's threads.
How is it done? Can somebody describe it in more detail?
The OS manages and schedules threads by default. The JVM makes the right calls to the OS to make this happen, but doesn't get involved beyond that.
Does the JVM use the same mechanism as the OS?
The JVM uses the OS; it has no idea what actually happens underneath.
Each process has its own virtual address space, again managed by the OS.
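In code terms, all of this hides behind Thread.start(): the JVM asks the OS for a native thread (on Linux, HotSpot uses pthreads) and the OS scheduler decides which core each thread runs on. A trivial sketch:

public class OsSchedulesThreads {
    public static void main(String[] args) {
        Runnable work = () -> {
            // Burn some CPU so the OS scheduler has something to place on a core.
            long sum = 0;
            for (long i = 0; i < 1_000_000_000L; i++) sum += i;
            System.out.println(Thread.currentThread().getName() + " done: " + sum);
        };
        for (int i = 0; i < 4; i++) {
            // start() creates a native OS thread; the JVM never picks the core itself.
            new Thread(work, "worker-" + i).start();
        }
    }
}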
I have a library which uses JNA to wrap setaffinity on Linux and Windows. You need to do this because thread scheduling is controlled by the OS, not the JVM.
https://github.com/OpenHFT/Java-Thread-Affinity
Note: in most cases, using affinity either a) doesn't help or b) doesn't help as much as you might think.
We use it to reduce jitter of around 40 - 100 microseconds which doesn't happen often, but often enough to impact our performance profile. If you want your 99%ile latencies to be as low as possible, in the micro-second range, thread affinity is essential. If you are ok with 1 in 100 requests taking 1 ms longer, I wouldn't bother.
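For completeness, here is a sketch of the usage pattern from that project's README; the package and class names (net.openhft.affinity.AffinityLock, acquireLock(), release()) are recalled from memory and may differ between versions, so check the repository before relying on them:

import net.openhft.affinity.AffinityLock; // assumed package/class, taken from the project README

public class PinnedWorker {
    public static void main(String[] args) {
        // Reserve a CPU for the current thread for the duration of the critical work.
        AffinityLock lock = AffinityLock.acquireLock();
        try {
            doLatencyCriticalWork();
        } finally {
            lock.release(); // give the CPU back
        }
    }

    private static void doLatencyCriticalWork() {
        // placeholder for the jitter-sensitive code path
    }
}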
I have a simple stress test that has a configurable number of threads based on the server it's running on. On one Windows machine with 16 cores I'm able to start the process, which in turn launches 16 threads and keeps all cores maxed out for the duration of the test.
On another Windows machine with 16 cores I run the test, but it only uses 8 of the 16 available, i.e. it's using one CPU and not the other.
I'm aware this is more of a Windows config question than a Java question as I've shown the test itself behaves as expected on one machine and not another.
Using Task Manager I can see the system has 16 cores but something is preventing access to the other 8.
Is there a setting that is preventing a single process using all the cores?
If StackOverflow isn't the correct home for this question, please suggest another Stack* where I should move it.
Update One
On the problematic machine I was previously attempting to run 1 process with 16 threads. If I run two processes with 8 threads each I am able to consume 100% of the cores.
This turned out to be the same issue that's posted here:
Unable to use more than one processor group for my threads in a C# app
Which in turn links to an HP advisory here:
http://h20566.www2.hpe.com/hpsc/doc/public/display?sp4ts.oid=5379860&docId=emr_na-c04650594&docLocale=en_US
I am developing a web based application.
The computer where I write the code has 4 core Intel i5 4440 3.10 GHz processor.
The computer where I deploy the application has 8 core Intel i7 4790K 4.00 GHz processor.
One of the tasks that needs to be calculated is very heavy so I decided to use the java executor framework.
I have this:
ExecutorService executorService = Executors.newFixedThreadPool(8);
and then I add 30 tasks at once.
On my development machine the result was calculated in 3 seconds (it used to be 20 seconds when I used only one thread), whereas on the server machine it took 16 seconds (which is the same as it used to be when the code used only one thread).
As you can guess I am quite confused and have no idea why on the server machine it got calculated so much slower.
Anyone who knows why the faster processor does not get benefits from the multithreading algorithm?
It is hard to guess the root cause without more evidence. Could you:
profile the running application on the server machine?
connect to the server machine with JConsole and look at the threading info?
My guess is that the server machine is under heavy load (maybe from other applications or background jobs?). Or maybe your server user/Java application is only allowed to use one core?
I would start by using top (on Linux) or Task Manager (Windows) to find out whether the server is under load when you run your application. Profiling/JMX monitoring adds overhead, but you will be able to find out how many threads are actually used.
Final note - is the server using the same architecture (32/64-bit), operating system and major/minor Java version as the development machine?
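One cheap thing to log before full profiling: what the JVM on the server actually reports about its environment, since a restricted core count or a different architecture/Java version would show up immediately. A minimal sketch:

import java.lang.management.ManagementFactory;

public class EnvironmentReport {
    public static void main(String[] args) {
        System.out.println("Available processors: " + Runtime.getRuntime().availableProcessors());
        System.out.println("OS / arch           : " + System.getProperty("os.name")
                + " / " + System.getProperty("os.arch"));
        System.out.println("Java version        : " + System.getProperty("java.version"));
        System.out.println("Live JVM threads    : "
                + ManagementFactory.getThreadMXBean().getThreadCount());
    }
}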
I wrote a very simple single-threaded Java application that simply iterates (a few times) over a list of Integers and calculates the sum. When I run this on my Linux machine (Intel X5677 3.46GHz quad-core), it takes the program about 5 seconds to finish. Same time if I restrict the JVM to two specific cores using taskset (which was quite expected, as the application is single-threaded and the CPU load is < 0.1% on all cores).
However, when I restrict the JVM to a single core, the program suddenly executes extremely slowly and takes 350+ seconds to finish. I could understand it being only marginally slower when restricted to a single core, as the JVM runs a few other threads in addition to the main thread, but I can't understand this extreme difference. I ran the same program on an old laptop with a single core, and it executes in about 15 seconds. Does anyone understand what is going on here, or has anyone successfully restricted a JVM to a single core on a multicore system without experiencing something like this?
Btw, I tried this with both hotspot 1.6.0_26-b03 and 1.7.0-b147 – same problem.
Many thanks
Yes, this seems counter-intuitive, but the simple solution would be to not do it. Let the JVM use 2 cores.
FWIW, my theory is that the JVM is looking at the number of cores that the operating system is reporting, assuming that it will be able to use all of them, and tuning itself based on that assumption. But the fact that you've pinned the JVM to a single core is making that tuning pessimal.
One possibility is that the JVM has turned on spin-locking. That is a strategy where a thread that can't immediately acquire a lock will "spin" (repeatedly testing the lock) for a period, rather than immediately rescheduling. This can work well if you've got multiple cores and the locks are held for a short time, but if there is only one core available then spinlocking is an anti-optimization.
(If this is the real cause of the problem, I believe there is a JVM option you can set to turn off spinlocks.)
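To make the spin-locking idea concrete, here is a toy spin lock (purely illustrative, not the JVM's internal implementation): with two cores the spinner usually only waits a few nanoseconds, but on one core it burns its entire time slice while the lock holder cannot run at all.

import java.util.concurrent.atomic.AtomicBoolean;

public class ToySpinLock {
    private final AtomicBoolean locked = new AtomicBoolean(false);

    public void lock() {
        // Busy-wait until the flag flips. Cheap when the holder runs on another core,
        // terrible on a single core: the holder can't release the lock until the
        // spinning thread's time slice ends.
        while (!locked.compareAndSet(false, true)) {
            // spin
        }
    }

    public void unlock() {
        locked.set(false);
    }
}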
This would be normal behaviour if you have two or more threads with an interdependence on each other. Imagine a program where two threads ping-ponging messages or data between them. When they are both running this can take 10 - 100 ns per ping-pong. When they have to context switch to run they can take 10 - 100 micro-seconds each. A 1000x increase I wouldn't find surprising.
If you want to limit the program to one core, you may have to re-write portions of it so it's designed to run efficiently on one core.
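Here is a minimal sketch of the ping-pong pattern described above, using a SynchronousQueue so each hand-off blocks until the other thread is ready; running it with and without taskset -c 0 shows the jump from nanoseconds to microseconds per round trip:

import java.util.concurrent.SynchronousQueue;

public class PingPong {
    public static void main(String[] args) throws InterruptedException {
        final SynchronousQueue<Integer> ping = new SynchronousQueue<>();
        final SynchronousQueue<Integer> pong = new SynchronousQueue<>();

        Thread responder = new Thread(() -> {
            try {
                while (true) {
                    pong.put(ping.take() + 1); // echo each message straight back
                }
            } catch (InterruptedException e) {
                // exit quietly when the main thread finishes
            }
        });
        responder.setDaemon(true);
        responder.start();

        long start = System.nanoTime();
        int rounds = 100_000;
        for (int i = 0; i < rounds; i++) {
            ping.put(i);
            pong.take();
        }
        System.out.println("avg ns per ping-pong: " + (System.nanoTime() - start) / rounds);
    }
}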