I have a simple stress test that runs a configurable number of threads based on the server it's running on. On one Windows machine with 16 cores I'm able to start the process, which in turn launches 16 threads and keeps all cores maxed out for the duration of the test.
On another Windows machine with 16 cores I run the test but it only uses 8 of the 16 available, i.e. it's using one CPU and not the other.
I'm aware this is more of a Windows config question than a Java question, as I've shown the test itself behaves as expected on one machine and not on the other.
Using Task Manager I can see the system has 16 cores, but something is preventing access to the other 8.
Is there a setting that prevents a single process from using all the cores?
If StackOverflow isn't the correct home for this question, please suggest another Stack* where I should move it.
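For reference, the test amounts to something like the following (a minimal sketch, assuming one busy-loop thread per reported core; the busy loop is a stand-in for the real workload):

public class StressTest {
    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("Spawning " + cores + " threads");
        for (int i = 0; i < cores; i++) {
            new Thread(() -> {
                long x = 0;
                while (true) { x++; } // burn CPU until the process is killed
            }).start();
        }
    }
}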
Update One
On the problematic machine I was previously attempting to run 1 process with 16 threads. If I run two processes with 8 threads each I am able to consume 100% of the cores.
This turned out to be the same issue that's posted here:
Unable to use more than one processor group for my threads in a C# app
Which in turn links to an HP advisory here:
http://h20566.www2.hpe.com/hpsc/doc/public/display?sp4ts.oid=5379860&docId=emr_na-c04650594&docLocale=en_US
Related
I am running a task via Docker on AWS's ECS. The task does some calculations which are CPU-bound, which I would like to run in parallel. I start a thread pool with the number of threads returned by Runtime.getRuntime().availableProcessors(), which works fine locally on my PC. For some reason, on AWS ECS, this always returns 1, even though there are multiple cores available. Therefore my calculations run serially and do not utilize the multiple cores.
For example, right now, I have a task running on a "t3.medium" instance which should have 2 cores according to the docs.
When I execute the following code:
System.out.println("Java reports " + Runtime.getRuntime().availableProcessors() + " cores");
Then the following gets displayed on the log:
Java reports 1 cores
I do not specify the cpu parameter in ECS's task definition. I see that the list of tasks within the ECS Management Console has a column for "CPU", which reads 0 for my task. I also notice that the list of instances (= VMs) shows "CPU available" as 2048, which presumably has something to do with the fact that the VM has 2 cores.
I would like my Java program to see all cores that the VM has to offer. (As would normally be the case when a Java program runs on a computer without Docker).
How do I go about doing that?
Thanks to @stdunbar in the comments for pointing me in the right direction.
EDIT: Thanks to @Imran in the comments. If you start lots of threads, they will absolutely be scheduled to multiple cores. This answer is only about getting Runtime.getRuntime().availableProcessors() to return the right value. Many "thread pools" start as many threads as that method returns: it should return the number of cores available.
There seem to be two main solutions, neither of which is ideal:
Set the cpu parameter in the task definition. For example, if you have 2 cores and want to use them both you have to set "cpu":2048 in the task's definition. This isn't very convenient for two reasons:
If you choose a bigger instance, you have to make sure to update this parameter.
If you want to have two tasks running simultaneously, both of which can sporadically use all cores for short-term activities, AWS will not schedule two tasks on a 2-core system with "cpu":2048; it considers the VM "full" from a CPU perspective. This goes against the timesharing (Unix etc.) philosophy of every task taking what it needs. (Imagine a dual-core desktop PC where, once Word and Excel were running, Windows refused to start any other program, on the grounds that Word might need all of one core and Excel might too, so no core would be free if a third program needed one at the same time.)
Use the -XX:ActiveProcessorCount=xx JVM option in JDK 10 onwards, as described here. This isn't convenient because:
As above, you have to change the value if you change your instance type.
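For illustration, the sizing pattern that breaks when availableProcessors() reports 1 looks like this (a minimal sketch, not the original application's code):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// If the JVM reports 1 processor, this pool gets a single thread
// and all CPU-bound tasks run serially.
int cores = Runtime.getRuntime().availableProcessors();
ExecutorService pool = Executors.newFixedThreadPool(cores);

With either of the two solutions above in place, availableProcessors() returns 2 on a 2-core instance, so the pool actually gets one thread per core.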
I wrote a longer blog post describing my findings here: https://www.databasesandlife.com/java-docker-aws-ecs-multicore/
I am developing a web based application.
The computer where I write the code has a 4-core Intel i5 4440 3.10 GHz processor.
The computer where I deploy the application has an 8-core Intel i7 4790K 4.00 GHz processor.
One of the tasks that needs to be calculated is very heavy, so I decided to use the Java executor framework.
I have this:
ExecutorService executorService = Executors.newFixedThreadPool(8);
and then I add 30 tasks at once.
On my development machine the result was calculated in 3 seconds (it used to be 20 seconds when I used only one thread), whereas on the server machine it took 16 seconds (which is the same as it used to be when the code used only one thread).
As you can guess, I am quite confused and have no idea why the calculation is so much slower on the server machine.
Does anyone know why the faster processor does not benefit from the multithreaded algorithm?
It is hard to guess the root cause without more evidence. Could you:
profile the running application on the server machine?
connect to the server machine with JConsole and look at the threading info?
My guess is that the server machine is under heavy load (maybe from other applications or background threads?). Or maybe your server user/Java application is only allowed to use one core?
I would start by using top (on Linux) or Task Manager (Windows) to find out whether the server is under load when you run your application. Profiling/JMX monitoring adds overhead, but you will be able to find out how many threads are actually used.
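If JConsole is not an option, the same threading info can be pulled from inside the process (a small sketch using the standard java.lang.management API):

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Print how many threads are live and what state each one is in.
ThreadMXBean mx = ManagementFactory.getThreadMXBean();
System.out.println("Live threads: " + mx.getThreadCount());
for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
    System.out.println(info.getThreadName() + " -> " + info.getThreadState());
}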
Final note: is the server using the same architecture (32/64-bit), operating system, and major/minor Java version as the development machine?
I have a problem with my JVM running on CentOS 6.0 with 64-bit OpenJDK 1.7.0_51.
My system is a 4-core machine with 8 GB RAM.
I'm running a Java multithreaded application that I wrote myself. It's supposed to insert tons of data into a NoSQL database.
For that, I'm spawning 4 threads, using a cached thread pool from java.util.concurrent.Executors.
I instantiate 4 workers that implement the Runnable interface. Afterwards I execute each one using the thread pool. Here's my code:
public void startDataPump(int numberOfWorkers) {
    // class "DataPump" implements Runnable
    for (int i = 0; i < numberOfWorkers; i++) {
        DataPump pump = new DataPump();
        // "workerList" is a simple ArrayList used to keep track of the workers
        workerList.add(pump);
        // "workers" is the thread pool that was initialized earlier
        // with Executors.newCachedThreadPool()
        workers.execute(pump);
    }
}
When running this with a parameter of 4, it will spawn 4 threads in the thread pool. I assumed that the JVM or my OS would be smart enough to schedule these threads across all of my cores.
HOWEVER, only one core of my CPU is working at 100%; the others remain almost idle.
Am I doing anything wrong in my code, or is this a JVM/OS problem? If so, is there anything I can do about it?
Running this application on only 1 core slows the whole app down enormously.
Help is greatly appreciated :)
Please bear in mind that it's the OS, not the JVM, that is responsible for CPU affinity, which is why I suggested that you first figure out how many CPUs you have and then perhaps use schedutils to configure processor affinity for a certain process.
CPU info (use one of the three below):
/proc/cpuinfo
lscpu
nproc
Install schedutils to configure processor affinity:
yum install schedutils
You can assign CPU affinity via schedutils as follows (where 2 is the CPU ID, counted from 0, and 23564 is the process ID):
taskset -c 2 -p 23564
Thread scheduling is not a JVM activity; it is an OS activity. If the OS finds that threads are independent of each other and can be executed separately, it schedules them on different cores.
I am not sure about schedutils, but I think it works at the application level (it allows you to request a CPU affinity, but the final decision is taken by the OS).
One thing about using cores: the OS scheduler schedules new processes on new cores, as every process has its own process area independent of other processes (so they can be executed in parallel without obstruction).
Try creating a new process for each thread; that will help improve your CPU utilization (use of more cores). There is a disadvantage, though: every process creates its own process area, so extra memory is required for each process (for each thread, in your case). If you have a good amount of memory available, you can try this one; a sketch follows below.
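A minimal sketch of that process-per-worker idea using ProcessBuilder ("Worker" and "app.jar" are hypothetical placeholders for your own worker class and classpath):

import java.io.IOException;

public class Launcher {
    public static void main(String[] args) throws IOException {
        // Launch one JVM per worker instead of one thread per worker;
        // each process can then be scheduled on any core by the OS.
        for (int i = 0; i < 4; i++) {
            ProcessBuilder pb = new ProcessBuilder(
                    "java", "-cp", "app.jar", "Worker", String.valueOf(i));
            pb.inheritIO(); // share this process's stdout/stderr
            pb.start();
        }
    }
}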
If it is just a Linux OS, then the "sar" command is enough for monitoring per-core CPU utilization (sar is a base package in Linux and almost all utilities use it, so the overhead on the system will be low).
If your environment is virtualized, or otherwise uses special CPU scheduling (like Docker), there is no way for Java to automatically find out how many cores are available and use them all. You have to specify how many cores you want to use explicitly.
On JDK >= 10, use the following JVM option:
-XX:ActiveProcessorCount=2
On JDK >= 8, use the following JVM options:
-XX:+UnlockExperimentalVMOptions -XX:ActiveProcessorCount=2
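To verify the flag took effect, a one-line check is enough (prints 2 when launched with the options above):

System.out.println(Runtime.getRuntime().availableProcessors()); // 2 with -XX:ActiveProcessorCount=2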
My team built a Java application using the Hadoop libraries to transform a bunch of input files into useful output.
Given the current load a single multicore server will do fine for the coming year or so. We do not (yet) have the need to go for a multiserver Hadoop cluster, yet we chose to start this project "being prepared".
When I run this app on the command line (or in Eclipse or NetBeans) I have not yet been able to convince it to use more than one map and/or reduce thread at a time.
Given the fact that the tool is very CPU intensive this "single threadedness" is my current bottleneck.
When running it in the netbeans profiler I do see that the app starts several threads for various purposes, but only a single map/reduce is running at the same moment.
The input data consists of several input files so Hadoop should at least be able to run 1 thread per input file at the same time for the map phase.
What do I do to at least have 2 or even 4 active threads running (which should be possible for most of the processing time of this application)?
I'm expecting this to be something very silly that I've overlooked.
I just found this: https://issues.apache.org/jira/browse/MAPREDUCE-1367
This implements the feature I was looking for in Hadoop 0.21
It introduces the flag mapreduce.local.map.tasks.maximum to control it.
For now I've also found the solution described here in this question.
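A sketch of setting that flag programmatically (assuming Hadoop >= 0.21, where MAPREDUCE-1367 introduced the property; pass the Configuration to your Job as usual):

import org.apache.hadoop.conf.Configuration;

// Run up to 4 map tasks in parallel when the job executes in local mode.
// On Hadoop < 0.21 this property does not exist and local mode stays single-threaded.
Configuration conf = new Configuration();
conf.setInt("mapreduce.local.map.tasks.maximum", 4);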
I'm not sure if I'm correct, but when you are running tasks in local mode, you can't have multiple mappers/reducers.
Anyway, to set the maximum number of running mappers and reducers, use the configuration options mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum. By default those options are set to 2, so I might be right.
Finally, if you want to be prepared for a multi-node cluster, go straight to running this in fully-distributed mode, but have all the servers (namenode, datanode, tasktracker, jobtracker, ...) run on a single machine.
Just for clarification...
If Hadoop runs in local mode you don't have parallel execution at the task level (unless you're running >= Hadoop 0.21 (MAPREDUCE-1367)). You can, however, submit multiple jobs at once, and those will then be executed in parallel.
All those
mapred.tasktracker.{map|reduce}.tasks.maximum
properties only apply to Hadoop running in distributed mode!
HTH
Joahnnes
According to this thread on the hadoop.core-user email list, you'll want to change the mapred.tasktracker.tasks.maximum setting to the max number of tasks you would like your machine to handle (which would be the number of cores).
This (and other properties you may want to configure) is also documented in the main documentation on how to setup your cluster/daemons.
What you want to do is run Hadoop in "pseudo-distributed" mode. One machine, but running task trackers and name nodes as if it were a real cluster. Then it will (potentially) run several workers.
Note that if your input is small, Hadoop will decide it's not worth parallelizing. You may have to coax it by changing its default split size, as in the sketch below.
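A sketch of capping the split size (assuming the org.apache.hadoop.mapreduce API and an existing Job named job):

import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

// Splits of at most 16 MB, so even a small input yields several map tasks.
FileInputFormat.setMaxInputSplitSize(job, 16 * 1024 * 1024);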
In my experience, "typical" Hadoop jobs are I/O bound, sometimes memory-bound, way before they are CPU-bound. You may find it impossible to fully utilize all the cores on one machine for this reason.
We've developed a Java standalone program and configured a cron schedule on our Linux (RedHat ES 4) machine to execute it every 10 minutes. Each run may take more than 1 hour to complete, or it may complete within 5 minutes.
The problem I need to solve is that the number of Java standalone processes executing at any time should not exceed a limit, for example, 5 processes. So, before a new Java standalone process starts, if there are already 5 processes running, the new one should not be started; otherwise it would indirectly start creating OutOfMemoryError problems. How do I control this? I would also like to make the 5-process limit configurable.
Other Information:
I've also configured -Xms and -Xmx heap size settings.
Is there any tool/mechanism by which we can control this?
I also heard about Java Service Wrapper. What is this all about?
You can create 5 empty files (with names "1.lock", ..., "5.lock") and make the app lock one of them in order to execute (or exit if all the files are already locked).
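A sketch of that idea using java.nio file locks (the slot count of 5 and the lock file names are assumptions to match the example above; FileChannel.tryLock() returns null when another process already holds the lock):

import java.io.File;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;

public class SlotGuard {
    // Try to grab one of maxProcesses lock files; returns null if all are taken.
    static FileLock acquireSlot(int maxProcesses) throws Exception {
        for (int i = 1; i <= maxProcesses; i++) {
            RandomAccessFile raf = new RandomAccessFile(new File(i + ".lock"), "rw");
            FileLock lock = raf.getChannel().tryLock();
            if (lock != null) {
                return lock; // keep raf open; the lock is released when the JVM exits
            }
            raf.close();
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        FileLock lock = acquireSlot(5); // the limit could come from a config file
        if (lock == null) {
            System.out.println("Already 5 instances running, exiting");
            return;
        }
        // ... do the real work here ...
    }
}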
First, I am assuming you are using the words "thread" and "process" interchangeably. Two ideas:
Have the cron job be a script that checks the currently running processes and counts them. If there are fewer than the threshold, spawn a new process; otherwise exit. The threshold can be defined in your script.
Have the main method in your executing Java file check some external resource (a file, database table, etc.) for a count of running processes; if it is below the threshold, increment the count and start the work, otherwise exit (this assumes the simple main method itself will not be enough to cause your OOME problem). You may also need an appropriate locking mechanism on the external resource (though if your job runs every 10 minutes, this may be overkill). The threshold could be defined in a .properties file or some other configuration file for your program.
Java Service Wrapper helps you set up a Java program as a Windows service or a *nix daemon. It doesn't really deal with the concurrency issue you are looking at; the closest thing is a config setting that disallows concurrent instances when it runs as a Windows service.