Performance of ExecutorService in Java

I am trying to understand the ExecutorService in java. There is not much performance difference when I use 1 thread or 4 threads. I have a quad core CPU and I do not have any other process running.
ExecutorService exService = Executors.newFixedThreadPool(4);
exService.execute(new Test().new RunnableThread());
exService.awaitTermination(25, TimeUnit.SECONDS);
class RunnableThread implements Runnable {
@Override
public void run() {
StopWatch stopWatch = new StopWatch();
stopWatch.start();
long cnt = 0;
for (cnt = 0; cnt < 999999999; cnt++) {
try {
for (long j = 0; j < 20; j++){
x += j;
}
} catch (Exception e) {
e.printStackTrace();
}
}
stopWatch.stop();
System.out.println(stopWatch.getTime());
}
}
If my understanding is right, my task should have close to 4x performance improvement when I say newFixedThreadPool(4) right?

Unfortunately, there is no magic in the allocation of workload to threads.
Every task runs on its own thread. It does not somehow automatically get transformed into concurrent execution paths.
If you have only one task, the remaining three threads will be idle.
Multiple threads only speed up things if you can split your workload into multiple tasks that can run concurrently (and you have to do that splitting yourself).

If my understanding is right, my task should have close to 4x performance improvement when I say newFixedThreadPool(4) right?
Yes, if you're actually running 4 concurrent tasks.
Currently, you have a single task that you are submitting to the executor. Let's say that it takes 10 seconds. Even if you have 4 cores and 4 threads, Java will not be able to parallelize a single task. However, if you submit 4 independent tasks (that have no memory or lock contention), then you will see all of them complete in those 10 seconds that it took the 1 task.
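To make the point above concrete, here is a minimal sketch of splitting one long loop into several independent tasks by hand. The class name SplitWorkDemo, the helper sumRange, and the per-iteration work (i % 7) are illustrative assumptions, not the asker's code; the only idea taken from the answers is that the pool runs tasks in parallel only when you give it more than one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SplitWorkDemo {

    // CPU-bound stand-in work over the half-open range [from, to).
    static long sumRange(long from, long to) {
        long sum = 0;
        for (long i = from; i < to; i++) {
            sum += i % 7;
        }
        return sum;
    }

    // Splits [0, n) into `parts` chunks and submits one task per chunk.
    static long parallelSum(long n, int parts) {
        ExecutorService pool = Executors.newFixedThreadPool(parts);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            long chunk = n / parts;
            for (int p = 0; p < parts; p++) {
                final long from = p * chunk;
                // The last chunk also takes the remainder of the division.
                final long to = (p == parts - 1) ? n : from + chunk;
                futures.add(pool.submit(() -> sumRange(from, to)));
            }
            long total = 0;
            for (Future<Long> f : futures) {
                total += f.get(); // blocks until that chunk is done
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long n = 400_000_000L;
        long t0 = System.nanoTime();
        long single = sumRange(0, n);
        long t1 = System.nanoTime();
        long split = parallelSum(n, 4);
        long t2 = System.nanoTime();
        System.out.printf("1 task: %d ms, 4 tasks: %d ms, same result: %b%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, single == split);
    }
}
```

The timings depend on the machine and on JIT warm-up, so treat the printed numbers as a rough indication rather than a benchmark.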

Related

CompletableFuture: several tasks

How can I asynchronously execute 20 Runnable tasks(or 1 task 20 times), using 5 CompletableFutures?
That's what I've got:
Runnable task = () -> {
long startTime = System.currentTimeMillis();
Random random = new Random();
while (System.currentTimeMillis() - startTime < 3000) {
DoubleStream.generate(() -> random.nextDouble())
.limit(random.nextInt(100))
.map(n -> Math.cos(n))
.sum();
}
System.out.println("Done");
};
for (int i = 0; i < 4; i++) {
CompletableFuture<Void> future1 = CompletableFuture.runAsync(task);
CompletableFuture<Void> future2 = CompletableFuture.runAsync(task);
CompletableFuture<Void> future3 = CompletableFuture.runAsync(task);
CompletableFuture<Void> future4 = CompletableFuture.runAsync(task);
CompletableFuture<Void> future5 = CompletableFuture.runAsync(task);
future1.get();
future2.get();
future3.get();
future4.get();
future5.get();
}
If I execute this code, I can see that it only runs 3 future.get() asynchronously:
3 and then 2 that's left during 1 for() iteration
So, I would like to do all 20 tasks, as asynchronously as possible

You can use allOf to wait for several tasks as one combined future. First I combined 5 tasks (the same as in your question), then I combined 10 instead (and only looped twice) and got half the execution time.
for (int i = 0; i < 2; i++) {
CompletableFuture<Void> future1 = CompletableFuture.runAsync(task);
CompletableFuture<Void> future2 = CompletableFuture.runAsync(task);
// and so on until ten
CompletableFuture<Void> future10 = CompletableFuture.runAsync(task);
CompletableFuture<Void> combined = CompletableFuture.allOf(future1, future2, future3, future4, future5, future6, future7, future8, future9, future10);
combined.get();
}

The default executor of CompletableFuture is the common pool of the ForkJoinPool, which has a default target parallelism matching the number of CPU cores minus one. So if you have four cores, at most three jobs will get executed asynchronously. Since you are forcing a wait for completion every 5 jobs, you’ll get three parallel executions, followed by two parallel executions in every loop iteration.
If you want to get a particular execution strategy like parallelism of your choice, the best way is to specify a properly configured executor. Then, you should let the executor manage the parallelism instead of waiting in a loop.
ExecutorService pool = Executors.newFixedThreadPool(5);
for (int i = 0; i < 20; i++) {
CompletableFuture.runAsync(task, pool);
}
pool.shutdown();
pool.awaitTermination(1, TimeUnit.DAYS); // wait for the completion of all tasks
This allows five parallel jobs, but will let each of the five threads pick up a new job immediately after one completed, instead of waiting for the next loop iteration.
But when you say
So, I would like to do all 20 tasks, as asynchronously as possible
it’s not clear why you are enforcing a wait after scheduling five jobs at all. The maximum parallelism can be achieved via
ExecutorService pool = Executors.newCachedThreadPool();
for (int i = 0; i < 20; i++) {
CompletableFuture.runAsync(task, pool);
}
pool.shutdown();
pool.awaitTermination(1, TimeUnit.DAYS); // wait for the completion of all tasks
This may spawn as many threads as jobs, unless one job completes before all have been scheduled, as in this case the worker thread may pick up a new job.
But this logic doesn’t require a CompletableFuture at all. You can also use:
ExecutorService pool = Executors.newCachedThreadPool();
// schedule 20 jobs and return when all completed
pool.invokeAll(Collections.nCopies(20, Executors.callable(task)));
pool.shutdown();
But when your job does not involve I/O or any other kind of waiting that releases the CPU, there is no point in creating more threads than CPU cores. A pool sized to the number of processors is preferable.
ExecutorService pool = Executors.newWorkStealingPool(
Runtime.getRuntime().availableProcessors());
// schedule 20 jobs and return when all completed
pool.invokeAll(Collections.nCopies(20, Executors.callable(task)));
pool.shutdown();
In your particular case this will likely appear slower, because each of your jobs simply loops for three seconds of wall-clock time: with more threads than cores the jobs seem to finish faster, but they are actually doing less work. For ordinary computational tasks, however, this will improve performance.

Set the following system property to the number of threads you want the common fork join pool to use:
java.util.concurrent.ForkJoinPool.common.parallelism
See ForkJoinPool
The reason being that you do not specify your own fork join pool when constructing your completable futures, so it implicitly uses
ForkJoinPool.commonPool()
See CompletableFuture
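A small sketch for checking what parallelism the common pool actually ended up with on your machine. The class name CommonPoolInfo is an assumption made for this example; availableProcessors(), ForkJoinPool.commonPool() and getParallelism() are standard JDK calls.

```java
import java.util.concurrent.ForkJoinPool;

class CommonPoolInfo {

    // Number of processors the JVM reports.
    static int cores() {
        return Runtime.getRuntime().availableProcessors();
    }

    // Target parallelism of the pool that runAsync(task) uses by default.
    static int commonParallelism() {
        return ForkJoinPool.commonPool().getParallelism();
    }

    public static void main(String[] args) {
        System.out.println("cores = " + cores());
        System.out.println("common pool parallelism = " + commonParallelism());
    }
}
```

Running it as `java -Djava.util.concurrent.ForkJoinPool.common.parallelism=5 CommonPoolInfo` should report a parallelism of 5. The property is read when the common pool is initialized, so passing it on the command line is more reliable than calling System.setProperty later in the program.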

How to check threads timing?

I wrote an application which reads all lines in text files and measures the time this takes. I'm wondering what the time of a whole block of threads will be.
For example if I start 2 threads at the same time:
for (int i = 0; i < 2; i++) {
t[i] = new Threads(args[j], 2);
j++;
}
try {
Thread.sleep(500);
} catch (InterruptedException e) {
e.printStackTrace();
}
System.out.println("TIME for block 1 of threads; "
+ (max(new long[]{t[0].getTime(),t[1].getTime()})));
I wait for them to finish processing the files and then read the operation times (via getTime). Is it correct to assume that the time for a block of threads is the maximum of the individual thread times? I think so, because all the other threads will already have finished by the time the slowest thread stops.
Or should I think about it differently?

It's dangerous to reason about execution order when you have multiple threads! For example, if you run your code on a single-core CPU, the threads will not really run in parallel but are time-sliced on one core, so the total run time for both threads is closer to the sum of each thread's run time than to the maximum of both.
Fortunately, there is a very easy way to just measure this if you use an ExecutorService instead of using Threads directly (which is generally good advice anyway):
// 1. init executor
int numberOfThreads = 2; // or any other number
int numberOfTasks = numberOfThreads; // is this true in your case?
ExecutorService executor = Executors.newFixedThreadPool(numberOfThreads);
long startTime = System.currentTimeMillis();
// 2. execute tasks in parallel using executor
for(int i = 0; i < numberOfTasks; i++) {
executor.execute(new Task()); // Task is your implementation of Runnable
}
// 3. initiate shutdown and wait until all tasks are finished
executor.shutdown();
executor.awaitTermination(1, TimeUnit.MINUTES); // we won't wait forever
// 4. measure time
long delta = System.currentTimeMillis() - startTime;
Now, delta holds the total running time of your tasks. You can play around with numberOfThreads to see if more or fewer threads give different results.
Important note: do not share a single Reader or InputStream between threads. Concurrent reads from the same stream interleave unpredictably, so each task should open its own.
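The measurement recipe above can be wrapped in a small helper. This is a sketch only: BatchTimer, timeBatchMillis and sleeper are names invented for the example, and the sleeping tasks merely stand in for the file-reading work. It uses System.nanoTime(), which is intended for measuring elapsed time.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class BatchTimer {

    // Runs all tasks on a fixed pool and returns the wall-clock time in ms.
    static long timeBatchMillis(int threads, List<Runnable> tasks) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        for (Runnable task : tasks) {
            executor.execute(task);
        }
        executor.shutdown();
        try {
            if (!executor.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    // Hypothetical stand-in for real work: just sleeps for the given time.
    static Runnable sleeper(long millis) {
        return () -> {
            try {
                Thread.sleep(millis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
    }

    public static void main(String[] args) {
        List<Runnable> tasks = Arrays.asList(sleeper(200), sleeper(300));
        // With two threads the total is close to the slower task;
        // with one thread it is close to the sum of both.
        System.out.println("2 threads: " + timeBatchMillis(2, tasks) + " ms");
        System.out.println("1 thread:  " + timeBatchMillis(1, tasks) + " ms");
    }
}
```

Comparing the two printed values shows directly whether "maximum of the thread times" or "sum of the thread times" describes your setup better.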

As far as I know, you can use the System class's static methods.
Call one at the start of the block and again at the end, then subtract the earlier value from the later one.
They are:
System.currentTimeMillis(); // The current wall-clock time, in milliseconds.
or
System.nanoTime(); // The current value of a high-resolution timer, in nanoseconds; better suited to measuring elapsed time.

You can use the following.
At the start of the block:
long startTime = System.currentTimeMillis();
At the end of the block:
long elapsedTime = System.currentTimeMillis() - startTime;
elapsedTime then holds the duration of the block in milliseconds.

Java: Sending messages to a JMS queue with multiple threads

I am trying to write a Java class to both send and read messages from a JMS queue using multiple threads to speed things up. I have the below code.
System.out.println("Sending messages");
long startTime = System.nanoTime();
Thread threads[] = new Thread[NumberOfThreads];
for (int i = 0; i < threads.length; i ++) {
threads[i] = new Thread() {
public void run() {
try {
for (int i = 0; i < NumberOfMessagesPerThread; i ++) {
sendMessage("Hello");
}
} catch (Exception e) {
e.printStackTrace();
}
}
};
threads[i].start();
}
//Block until all threads are done so we can get total time
for (Thread thread : threads) {
thread.join();
}
long endTime = System.nanoTime();
long duration = (endTime - startTime) / 1000000;
System.out.println("Done in " + duration + " ms");
This code works and sends however many messages to my JMS queue that I say (via NumberOfThreads and NumberOfMessagesPerThread). However, I am not convinced it is truly working multithreaded. For example, if I set threads to 10 and messages to 100 (so 1000 total messages), it takes the same time as 100 threads and 10 messages each. Even this code below takes the same time.
for (int i = 0; i < 1000; i ++) {
sendMessage("Hello");
}
Am I doing the threading right? I would expect the multithreaded code to be much faster than just a plain for loop.

Are you sharing a single connection (a single Producer) across all threads? If so, you are probably hitting thread contention there, and you are limited by the speed of the socket connection between your producer and your broker. Of course, this depends a lot on the JMS implementation you are using (and on whether you are using asynchronous sends).
I recommend repeating your tests using completely separate producers (you will lose the "queue" semantics in terms of message ordering, but I guess that is expected).
Also, I do not recommend running performance tests with numbers as high as 100 threads. Remember that your multithreading capability is at some point limited by the number of cores your machine has (more or less; you also have a lot of I/O here, so a few more threads than cores might help, but 100 is not really a good number in my opinion).

I would also review some of the comments in the post "Single vs Multi-threaded JMS Producer".
What is the implementation of sendMessage? How are the connections, sessions, and producers being reused?

can the free threads be used in doing the work of other thread simultaneously in java?

I have 10 threads filling unique codes into 10 tables simultaneously. Each thread fills a million records. After some time 7 tables have been filled, but the remaining 3 are still being filled. I want to use the 7 idle threads to help fill those tables alongside the 3 threads that are still running. Can this be done?
String noOfCodes = ""+((Integer.parseInt(totalOfCodes))/10);
ExecutorService executor = Executors.newFixedThreadPool(10);
for (int i = 0; i < 10; i++) {
String threadNo = ""+i;
Runnable worker = new CodeGeneratorDAO(pgmId, digits, points, validity, noOfCodes, product, threadNo);
executor.execute(worker);
resp.setSuccess(true);
}
executor.shutdown();
while (!executor.isTerminated()) {
}
System.out.println("Finished all threads");

A simple solution is to define a Runnable that executes a smaller task than your current Runnable. Breaking down the tasks will smooth out the overall execution time.
You say that each Runnable fills a million records, so define your Runnable as filling 1 record and submit all your 10 * 1,000,000 records to your ExecutorService:
ExecutorService executor = Executors.newFixedThreadPool(10);
for(Runnable oneRecordRunnable : allRunnables) {
executor.submit(oneRecordRunnable);
}
executor.shutdown();
executor.awaitTermination(1, TimeUnit.HOURS);
As a side note, I replaced your CPU-burning busy-wait loop (while (!executor.isTerminated())) with the awaitTermination method.
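A variation on the same idea, sketched under assumptions: instead of one Runnable per record, each task fills a batch of records, which keeps the per-task overhead low while still letting any idle thread pick up remaining work. BatchedFill, fillBatch and the batch size are hypothetical; fillBatch only counts records here and stands in for the real database insert.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

class BatchedFill {

    // Hypothetical stand-in for inserting `count` codes into one table.
    static void fillBatch(int table, long firstRecord, long count, AtomicLong written) {
        written.addAndGet(count);
    }

    // Queues every table's records as batches; returns how many were written.
    static long fillAll(int tables, long recordsPerTable, long batchSize, int threads) {
        AtomicLong written = new AtomicLong();
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < tables; t++) {
            for (long first = 0; first < recordsPerTable; first += batchSize) {
                final int table = t;
                final long start = first;
                // The last batch of a table may be smaller than batchSize.
                final long count = Math.min(batchSize, recordsPerTable - first);
                executor.execute(() -> fillBatch(table, start, count, written));
            }
        }
        executor.shutdown();
        try {
            if (!executor.awaitTermination(1, TimeUnit.HOURS)) {
                throw new IllegalStateException("batches did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return written.get();
    }

    public static void main(String[] args) {
        // 10 tables, a million records each, in batches of 10,000.
        System.out.println(fillAll(10, 1_000_000L, 10_000L, 10) + " records written");
    }
}
```

Because all batches sit in one shared queue, no thread goes idle while work remains, regardless of which table the work belongs to.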

Using parallelism in Java makes program slower (four times slower!!!)

I'm writing conjugate-gradient method realization.
I use Java multi threading for matrix back-substitution.
Synchronization is made using CyclicBarrier, CountDownLatch.
Why does it take so much time to synchronize threads?
Are there other ways to do it?
code snippet
private void syncThreads() {
// barrier.await();
try {
barrier.await();
} catch (InterruptedException e) {
} catch (BrokenBarrierException e) {
}
}

You need to ensure that each thread spends more time doing useful work than it costs in overhead to pass a task to another thread.
Here is an example of where the overhead of passing a task to another thread far outweighs the benefits of using multiple threads.
final double[] results = new double[10*1000*1000];
{
long start = System.nanoTime();
// using a plain loop.
for(int i=0;i<results.length;i++) {
results[i] = (double) i * i;
}
long time = System.nanoTime() - start;
System.out.printf("With one thread it took %.1f ns per square%n", (double) time / results.length);
}
{
ExecutorService ex = Executors.newFixedThreadPool(4);
long start = System.nanoTime();
// submitting one tiny task per element.
for(int i=0;i<results.length;i++) {
final int i2 = i;
ex.execute(new Runnable() {
@Override
public void run() {
results[i2] = (double) i2 * i2; // cast first to avoid int overflow, as in the single-threaded loop
}
});
}
ex.shutdown();
ex.awaitTermination(1, TimeUnit.MINUTES);
long time = System.nanoTime() - start;
System.out.printf("With four threads it took %.1f ns per square%n", (double) time / results.length);
}
prints
With one thread it took 1.4 ns per square
With four threads it took 715.6 ns per square
Using multiple threads is much worse.
However, if you increase the amount of work each task does, the picture changes:
final double[] results = new double[10 * 1000 * 1000];
{
long start = System.nanoTime();
// using a plain loop.
for (int i = 0; i < results.length; i++) {
results[i] = Math.pow(i, 1.5);
}
long time = System.nanoTime() - start;
System.out.printf("With one thread it took %.1f ns per pow 1.5%n", (double) time / results.length);
}
{
int threads = 4;
ExecutorService ex = Executors.newFixedThreadPool(threads);
long start = System.nanoTime();
int blockSize = results.length / threads;
// submitting one task per block of elements.
for (int i = 0; i < threads; i++) {
final int istart = i * blockSize;
final int iend = (i + 1) * blockSize;
ex.execute(new Runnable() {
@Override
public void run() {
for (int i = istart; i < iend; i++)
results[i] = Math.pow(i, 1.5);
}
});
}
ex.shutdown();
ex.awaitTermination(1, TimeUnit.MINUTES);
long time = System.nanoTime() - start;
System.out.printf("With four threads it took %.1f ns per pow 1.5%n", (double) time / results.length);
}
prints
With one thread it took 287.6 ns per pow 1.5
With four threads it took 77.3 ns per pow 1.5
That's an almost 4x improvement.

How many threads are being used in total? That is likely the source of your problem. Using multiple threads will only really give a performance boost if:
Each task does some sort of blocking, for example waiting on I/O. Using multiple threads in this case lets other threads use that blocking time.
Or you have multiple cores. If you have 4 cores or 4 CPUs, you can run 4 tasks (or 4 threads) simultaneously.
It sounds like you are not blocking in the threads, so my guess is that you are using too many threads. If, for example, you use 10 threads to do the work at the same time but only have 2 cores, that can be slower than running all of the tasks in sequence. Generally, start with a number of threads equal to your number of cores/CPUs, then increase it slowly, gauging the performance each time. This will give you the optimal thread count to use.

Perhaps you could try to re-implement your code using fork/join from JDK 7 and see what it does?
By default it creates a thread pool with as many threads as you have cores in your system. If you choose a reasonable threshold for dividing your work into smaller chunks, this will probably execute much more efficiently.
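A minimal fork/join sketch, assuming a simple sum of squares as the workload (it is not the conjugate-gradient code from the question). SquareSumTask and the THRESHOLD value are illustrative choices; RecursiveTask, ForkJoinPool, fork(), join() and invoke() are the standard JDK API.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class SquareSumTask extends RecursiveTask<Long> {

    // Below this many elements the range is summed directly (assumed value).
    static final int THRESHOLD = 100_000;

    private final int from;
    private final int to;

    SquareSumTask(int from, int to) {
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += (long) i * i;
            }
            return sum;
        }
        // Split in half: fork the left part, compute the right part here.
        int mid = from + (to - from) / 2;
        SquareSumTask left = new SquareSumTask(from, mid);
        SquareSumTask right = new SquareSumTask(mid, to);
        left.fork();
        return right.compute() + left.join();
    }

    // Sums i*i for i in [0, n) on a pool sized to the number of processors.
    static long sum(int n) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.invoke(new SquareSumTask(0, n));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // n is kept small enough that the sum fits in a long.
        System.out.println(sum(1_000_000));
    }
}
```

The threshold is the knob the answer refers to: too small and task overhead dominates, too large and some cores run out of work early.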

You are most likely aware of this, but in case you aren't, please read up on Amdahl's Law. It gives the relationship between expected speedup of a program by using parallelism and the sequential segments of the program.
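For reference, the usual statement of Amdahl's Law, where p is the fraction of the run time that can be parallelized and n is the number of threads or cores (both symbols are introduced here for illustration):

```latex
S(n) = \frac{1}{(1 - p) + \frac{p}{n}}
\qquad\text{e.g. } p = 0.9,\ n = 4:\quad
S(4) = \frac{1}{0.1 + 0.225} = \frac{1}{0.325} \approx 3.08
```

Even with unlimited cores the speedup is bounded by 1 / (1 - p), which is 10 in this example, so the sequential part (including time spent waiting at barriers) quickly becomes the limiting factor.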

Synchronizing across cores is much slower than in a single-core environment. See if you can limit the JVM to 1 core (see this blog post),
or use an ExecutorService and invokeAll to run the parallel tasks.
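One way to read the invokeAll suggestion, sketched under assumptions: each phase of the computation is submitted as a list of Callables, and the return of invokeAll acts as the synchronization point between phases, replacing an explicit CyclicBarrier. PhasedWork and the toy update rule are invented for the example and are not the asker's back-substitution.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class PhasedWork {

    // Runs `phases` rounds; in each round every worker updates its own slot
    // from the values of the previous round.
    static long[] run(int workers, int phases) {
        long[] values = new long[workers];
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            for (int phase = 0; phase < phases; phase++) {
                final long[] previous = values.clone(); // snapshot of last round
                List<Callable<Void>> round = new ArrayList<>();
                for (int w = 0; w < workers; w++) {
                    final int slot = w;
                    round.add(() -> {
                        // Toy rule: own value + right neighbour's value + 1.
                        values[slot] = previous[slot] + previous[(slot + 1) % workers] + 1;
                        return null;
                    });
                }
                // Returns only after every task of this round has completed.
                pool.invokeAll(round);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(run(4, 3)));
    }
}
```

Whether this is faster than a barrier depends on how much work each phase contains; with very short phases the hand-off cost per round still dominates.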
