I am developing a web based application.
The computer where I write the code has 4 core Intel i5 4440 3.10 GHz processor.
The computer where I deploy the application has 8 core Intel i7 4790K 4.00 GHz processor.
One of the tasks that needs to be calculated is very heavy, so I decided to use the Java Executor framework.
I have this:
ExecutorService executorService = Executors.newFixedThreadPool(8);
and then I add 30 tasks at once.
On my development machine the result was calculated in 3 seconds (it used to take 20 seconds when I used only one thread), whereas on the server machine it was calculated in 16 seconds (the same as when the code used only one thread).
As you can guess, I am quite confused and have no idea why it was calculated so much more slowly on the server machine.
Does anyone know why the faster processor does not benefit from the multithreaded algorithm?
It is hard to guess the root cause without more evidence. Could you
profile the running application on the server machine?
connect to the server machine with JConsole and look at the threading info?
My guess is that the server machine is under heavy load (maybe from other applications or background threads?). Maybe your server user or Java application is allowed to use only one core?
I would start by using top (on Linux) or Task Manager (on Windows) to find out whether the server is under load when you run your application. Profiling and JMX monitoring add overhead, but they will let you find out how many threads are actually used.
Final note: is the server using the same architecture (32/64-bit), operating system, and major/minor Java version as development?
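A quick way to gather that evidence without a profiler is to run a small diagnostic on both machines. The sketch below is only an illustration: the class name PoolDiagnostics and the busyWork placeholder are made up here and stand in for the real calculation. It prints how many processors the JVM believes it has, then times the same 30 tasks on a pool of 1 and a pool of 8 and reports how many distinct threads actually ran them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PoolDiagnostics {

    // Placeholder CPU-bound work; an assumption standing in for the real task.
    static long busyWork(int iterations) {
        long acc = 0;
        for (int i = 0; i < iterations; i++) {
            acc += (long) i * i % 7;
        }
        return acc;
    }

    // Runs taskCount tasks on a fixed pool and returns the names of the
    // threads that actually executed at least one task.
    static Set<String> runTasks(int poolSize, int taskCount, int iterations) {
        Set<String> threadsUsed = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            for (int i = 0; i < taskCount; i++) {
                tasks.add(() -> {
                    threadsUsed.add(Thread.currentThread().getName());
                    return busyWork(iterations);
                });
            }
            // invokeAll blocks until every task has completed.
            for (Future<Long> f : pool.invokeAll(tasks)) {
                f.get(); // surfaces any exception thrown inside a task
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("task failed", e);
        } finally {
            pool.shutdown();
        }
        return threadsUsed;
    }

    public static void main(String[] args) {
        System.out.println("Processors visible to the JVM: "
                + Runtime.getRuntime().availableProcessors());
        for (int poolSize : new int[] {1, 8}) {
            long start = System.nanoTime();
            Set<String> used = runTasks(poolSize, 30, 50_000_000);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("pool=" + poolSize
                    + " threadsUsed=" + used.size()
                    + " elapsedMs=" + elapsedMs);
        }
    }
}
```

If the server reports a single visible processor, or the two timings are nearly identical there while they differ on the development machine, that points at the environment (affinity, virtualization limits, competing load) rather than at the code.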
I designed an authentication protocol using Java. The average execution time for a single authentication on my desktop computer is 2.87 ms. My computer has the following specification: Windows 10 with a 1.99 GHz Intel Core i7 and 8 GB of RAM.
Suppose a number of users, say 10, perform the authentication simultaneously. What is the total computational time? Can I just say (2.87*10)?
A typical Core i7 CPU has between 4 and 8 cores, and with hyper-threading can execute 8 to 16 threads in parallel (number of cores times two hardware threads per core).
Assuming
you have an i7-8650U (1.9GHz, 4 cores, hyper-threaded),
your Java server is multithreaded (which should be the case if you use any popular implementation like Tomcat, Jetty, or alike) and
there are no other CPU-intensive workloads running
You can say your server can handle 8 authentication requests simultaneously: 4 at full speed + 4 more at ~30% speed because of hyper-threading, or 2.87ms*1.3=3.73ms for every 8 users.
10 users will thus take around ~ 3.73ms+2.87ms = 6.6ms.
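The arithmetic above can be written down as a small function. This is only a sketch of the back-of-the-envelope model in this answer, not a measurement: the class and method names are invented for illustration, two hardware threads per core are assumed, and the rule for leftover requests (full speed if they fit on the physical cores, slowed down otherwise) is an extrapolation of the 8-plus-2 example.

```java
public class AuthTimeEstimate {

    // Rough wall-clock estimate in milliseconds for `users` simultaneous
    // requests, following the model in the answer above.
    static double estimateMillis(int users, int physicalCores,
                                 double singleRequestMs, double htSlowdown) {
        int hardwareThreads = physicalCores * 2;            // assumes 2-way hyper-threading
        double fullBatchMs = singleRequestMs * (1.0 + htSlowdown);
        int fullBatches = users / hardwareThreads;          // batches that occupy every hardware thread
        int remainder = users % hardwareThreads;            // requests left over after the full batches
        double total = fullBatches * fullBatchMs;
        if (remainder > physicalCores) {
            total += fullBatchMs;       // leftovers spill onto hyper-threads
        } else if (remainder > 0) {
            total += singleRequestMs;   // leftovers fit on physical cores
        }
        return total;
    }

    public static void main(String[] args) {
        // 4 cores, 2.87 ms per request, 30% slowdown when all hyper-threads are busy
        System.out.println("10 users: " + estimateMillis(10, 4, 2.87, 0.3) + " ms");
    }
}
```

For 10 users this gives one full batch of 8 (2.87 * 1.3) plus one more pass at 2.87, which is the roughly 6.6 ms quoted above.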
What's important when measuring Java performance, however, is to measure steady state under load, in order to take garbage collection overhead into account. When measuring a single request, you may often miss the garbage collection step entirely.
You can't say that the time will be multiplied by the number of users.
First of all, that would mean your authentication mechanism works on a single thread, which would be terrible.
Second, the time measured on your computer will certainly differ from the time in the production or staging environment. You will have to add network hops; on the other hand, the servers there would be dedicated to this one job. Either way, you cannot compare that time to the one measured on your personal computer.
If you want to test this kind of thing, use performance/load testing tools. I can recommend JMeter, an open source tool from the Apache foundation, or Gatling, another open source tool for this.
They are designed for exactly this use case. With them you could call your authentication API with, for example, 100 users in 10 seconds, and the reports produced at the end will give you your answer.
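Before setting up a full load test, a rough local check is possible in plain Java. The sketch below is not a substitute for JMeter or Gatling: the authenticate method is a hypothetical stand-in for the real protocol call, and the class name is invented here. It releases a given number of callers at the same instant and records each caller's latency.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ConcurrentAuthTiming {

    // Hypothetical stand-in for the real authentication call.
    static void authenticate() {
        long acc = 0;
        for (int i = 0; i < 2_000_000; i++) {
            acc += i % 3;
        }
        if (acc < 0) {
            throw new IllegalStateException(); // uses the result so the loop is not dead code
        }
    }

    // Starts `users` callers at the same moment and returns each caller's
    // latency in nanoseconds.
    static List<Long> measure(int users) {
        ExecutorService pool = Executors.newFixedThreadPool(users);
        CountDownLatch startGate = new CountDownLatch(1);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int i = 0; i < users; i++) {
                futures.add(pool.submit(() -> {
                    startGate.await();              // wait until every caller is queued
                    long t0 = System.nanoTime();
                    authenticate();
                    return System.nanoTime() - t0;
                }));
            }
            startGate.countDown();                  // release all callers together
            List<Long> latencies = new ArrayList<>();
            for (Future<Long> f : futures) {
                latencies.add(f.get());
            }
            return latencies;
        } catch (Exception e) {
            throw new IllegalStateException("measurement failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        measure(10);                                // warm-up run so the JIT has compiled the hot path
        long worstNanos = 0;
        for (long nanos : measure(10)) {
            worstNanos = Math.max(worstNanos, nanos);
        }
        System.out.println("slowest of 10 concurrent calls: " + worstNanos / 1_000 + " us");
    }
}
```

The slowest caller's latency approximates the time for all 10 users to finish, and it includes none of the network or server effects a real load test would capture.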
I have a simple stress test that has a configurable number of threads based on the server it's running on. On one Windows machine with 16 cores I'm able to start the process, which in turn launches 16 threads and keeps all cores maxed out for the duration of the test.
On another Windows machine with 16 cores I run the test but it only uses 8 of the 16 available, i.e. it's using one CPU and not the other.
I'm aware this is more of a Windows config question than a Java question as I've shown the test itself behaves as expected on one machine and not another.
Using Task Manager I can see the system has 16 cores but something is preventing access to the other 8.
Is there a setting that is preventing a single process using all the cores?
If StackOverflow isn't the correct home for this question, please suggest another Stack* where I should move it.
Update One
On the problematic machine I was previously attempting to run 1 process with 16 threads. If I run two processes with 8 threads each I am able to consume 100% of the cores.
This turned out to be the same issue that's posted here:
Unable to use more than one processor group for my threads in a C# app
Which in turn links to an HP advisory here:
http://h20566.www2.hpe.com/hpsc/doc/public/display?sp4ts.oid=5379860&docId=emr_na-c04650594&docLocale=en_US
I have a shared environment where there are 3 VMs with 2 jboss-as 5 instances on each of them (6 instances in total). On these instances we have more than 15 applications deployed, all of them Java based. Lately we are getting high CPU usage on one of the VMs, and when we run 'top' on the VM, it lists all processes, with java being the one with the high CPU utilization %. But as I mentioned, this VM has more than 15 Java applications, and we don't know which application is consuming the CPU cycles.
Can someone please help me on this?
The PID will only give the JBoss server that the bad application is running on.
If you want to know which application is causing the problem, I recommend installing JProfiler or some other profiling tool.
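If installing a profiler is not an option, the JVM's own management API can at least show which threads are burning CPU, and in an application server the thread names often hint at the deployment they serve. The sketch below is a minimal illustration run inside the current JVM; the class and method names are invented here, and for a JBoss instance the same ThreadMXBean data would have to be read remotely over JMX (for example with JConsole).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

public class BusiestThreads {

    // Returns "<cpu ms>  <thread name>" lines for live threads, busiest first.
    static List<String> topThreads(int limit) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> lines = new ArrayList<>();
        if (!mx.isThreadCpuTimeSupported() || !mx.isThreadCpuTimeEnabled()) {
            return lines; // per-thread CPU time is not available on this JVM
        }
        List<long[]> usage = new ArrayList<>(); // each entry: {threadId, cpuNanos}
        for (long id : mx.getAllThreadIds()) {
            long cpuNanos = mx.getThreadCpuTime(id); // -1 if the thread has already died
            if (cpuNanos >= 0) {
                usage.add(new long[] {id, cpuNanos});
            }
        }
        usage.sort((a, b) -> Long.compare(b[1], a[1])); // descending by CPU time
        for (long[] u : usage.subList(0, Math.min(limit, usage.size()))) {
            ThreadInfo info = mx.getThreadInfo(u[0]);
            if (info != null) {
                lines.add((u[1] / 1_000_000) + " ms  " + info.getThreadName());
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : topThreads(5)) {
            System.out.println(line);
        }
    }
}
```

The numbers are cumulative CPU time since each thread started, so taking two samples a few seconds apart and comparing them shows which threads are busy right now.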
I have a problem with my JVM Running on a CentOS 6.0 with openJDK 1.7.0_51 64Bit.
My System is a 4-Core System with 8GB Ram.
I'm running a Java multithread application that I wrote myself. It's supposed to insert tons of Data into a NoSQL Database.
For that, I'm spawning 4 threads, using a cached thread pool from java.util.concurrent.Executors.
I instantiate 4 workers that implement the "Runnable" interface. Afterwards I execute the workers using the thread pool. Here's my code:
public void startDataPump(int numberOfWorkers) {
    // class "DataPump" implements Runnable
    for (int i = 0; i < numberOfWorkers; i++) {
        DataPump pump = new DataPump();
        // "workerList" contains all workers and is a simple ArrayList to keep track of the workers
        workerList.add(pump);
        // "workers" is the thread pool that has been
        // initialized earlier with "Executors.newCachedThreadPool()"
        workers.execute(pump);
    }
}
When running this, using a parameter of 4, it will spawn 4 Threads in the Threadpool. I assumed that the JVM or my OS would be smart enough to schedule these threads on all of my cores.
HOWEVER, only one core of my CPU is working at 100%; the others remain almost idle.
Am I doing anything wrong in my code, or is this a JVM/OS problem? If so, is there anything I can do about that?
Running this application on only 1 core is extremely slow and drags down the whole app.
Help is greatly appreciated :)
Please bear in mind that it's the OS and not the JVM that is responsible for CPU affinity, which is why I suggested that you first figure out how many CPUs you have and then perhaps use schedutils to configure processor affinity for a certain process.
cpu info - use one of the three below
/proc/cpuinfo
lscpu
nproc
install schedutils to configure processor affinity
yum install schedutils
You can assign CPU affinity via schedutils as follows (2 is the CPU index, counted from 0, and 23564 is the process ID):
taskset -cp 2 23564
Scheduling threads is not a JVM activity but an OS activity. If the OS finds that threads are independent of each other and can be executed separately, it schedules them on other cores.
I am not sure about schedutils, but I think it works at the application level (it allows you to set CPU affinity, but the final decision is taken by the OS).
One thing about using cores is that the OS scheduler schedules new processes on new cores, as every process has its own process area independent of other processes (thus they can be executed in parallel without any obstruction).
Try creating a new process for each thread; that will help improve your CPU utilization (use of more cores). There is a disadvantage as well: every process creates its own process area, so extra memory is required for each process (for each thread, in your case). If you have a good amount of memory available, you can try this.
If it is just a Linux OS, then the "sar" command is enough for monitoring per-core CPU utilization (sar is a base package on Linux and almost all utilities use it, so the overhead on the system will be low).
If your environment is virtualized, or uses special CPU scheduling as Docker does, there is no way to get Java to automatically find out how many cores are available and use them all. You have to specify how many cores you want to use:
On JDK >= 10, use the following JDK options:
-XX:ActiveProcessorCount=2
On JDK >= 8, use the following JDK options:
-XX:+UnlockExperimentalVMOptions -XX:ActiveProcessorCount=2
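On the application side, the value set by this option is what Runtime.availableProcessors() reports, so a pool sized from that call follows the flag without code changes. A minimal sketch, with the class and method names invented for illustration:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ProcessorAwarePool {

    // Number of worker threads to use: whatever the JVM reports, at least 1.
    static int workerCount() {
        return Math.max(1, Runtime.getRuntime().availableProcessors());
    }

    // Fixed pool with one thread per processor visible to the JVM.
    static ExecutorService newPool() {
        return Executors.newFixedThreadPool(workerCount());
    }

    public static void main(String[] args) {
        System.out.println("Sizing pool to " + workerCount() + " threads");
        ExecutorService pool = newPool();
        pool.shutdown();
    }
}
```

Running it as java -XX:ActiveProcessorCount=2 ProcessorAwarePool on a JVM that supports the flag should report 2 regardless of how many cores the host has.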
Hi, I'm trying to test my Java app on Solaris SPARC and I'm getting some weird behavior. I'm not looking for flame wars. I'm just curious to know what is happening or what is wrong...
I'm running the same JAR on Intel and on the T1000. On the Windows machine I'm able to get 100% CPU utilisation (Performance Monitor), while on the Solaris machine I can only get 25% (prstat).
The application is a custom server app I wrote that uses netty as the network framework.
On the Windows machine I'm able to reach just above 200 requests/responses a second, including full business logic and access to outside 3rd parties, while on the Solaris machine I get about 150 requests/responses at only 25% CPU.
One can only imagine how many more requests/responses I could get out of the SPARC if I could make it use its full power.
The servers are...
Windows 2003 SP2 x64bit, 8GB, 2.39Ghz Intel 4 core
Solaris 10.5 64bit, 8GB, 1Ghz 6 core
Both are using JDK 1.6u21.
Any ideas?
The T1000 uses a multi-core CPU, which means that the CPU can run multiple threads simultaneously. If the CPU is at 100% utilization, it means that all cores are running at 100%. If your application uses fewer threads than the number of cores, then your application cannot use all the cores, and therefore cannot use 100% of the CPU.
Without any code, it's hard to help out. Some ideas:
Profile the Java app on both systems, and see where the difference is. You might be surprised. Because the T1 CPU lacks out-of-order execution, you might see performance lacking in strange areas.
As Erick Robertson says, try bumping up the number of threads to the number of virtual cores reported via prstat, NOT the number of regular cores. The T1000 uses UltraSparc T1 processors, which make heavy use of thread-level parallelism.
Also, note that you're using the latest-gen Intel processors and old Sun ones. I highly recommend reading Developing and Tuning Applications on UltraSPARC T1 Chip Multithreading Systems and Maximizing Application Performance on Chip Multithreading (CMT) Architectures, both by Sun.
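The point about virtual cores can be made concrete with a little arithmetic. The UltraSPARC T1 runs 4 hardware threads per core, so a 6-core T1000 exposes 24 virtual processors. The sketch below (class and method names invented here) computes the highest utilization a given number of busy threads can reach; the figure of 6 busy threads is an assumption chosen for illustration, not something stated in the question.

```java
public class CmtUtilization {

    // Upper bound on reported CPU utilization, in percent, when `busyThreads`
    // runnable threads share cores * hwThreadsPerCore virtual processors.
    static double maxUtilizationPercent(int busyThreads, int cores, int hwThreadsPerCore) {
        int virtualProcessors = cores * hwThreadsPerCore;
        int running = Math.min(busyThreads, virtualProcessors); // cannot run more threads than processors
        return 100.0 * running / virtualProcessors;
    }

    public static void main(String[] args) {
        // 6 busy threads on a 6-core, 4-threads-per-core T1
        System.out.println(maxUtilizationPercent(6, 6, 4) + "%");
    }
}
```

Six busy threads on 24 virtual processors cap out at 25%, which would be consistent with the prstat figure in the question if the server's worker pool was sized to the 6 physical cores.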
This is quite an old question now, but we ran across similar issues.
An important fact to notice is that the Sun T1000 is based on the UltraSPARC T1 processor, which has only a single FPU shared by 8 cores.
So if your application does a lot of floating-point calculation, or even some, this might become an issue, as the FPU will become the bottleneck.