Java parallelstream not using optimal number of threads when using newCachedThreadPool()


I have made two separate implementations of parallel reads from a database.
The first implementation uses an ExecutorService created with Executors.newCachedThreadPool() and Futures: I make a call that returns a Future for each read, and after making all the calls I call get() on each of them. This implementation works fine and is fast enough.
The second implementation uses parallel streams. When I submit the parallel stream call to the same ExecutorService pool, it runs almost 5 times slower, and it seems that it is not using as many threads as I would hope. When I instead submit it to ForkJoinPool pool = new ForkJoinPool(50), it runs as fast as the first implementation.
My question is:
Why do parallel streams under-utilize threads in the newCachedThreadPool version?
Here is the code for the second implementation (I am not posting the first implementation because that one works fine anyway):
private static final ExecutorService pool = Executors.newCachedThreadPool();
final List<AbstractMap.SimpleImmutableEntry<String, String>> simpleImmutableEntryStream =
        personIdList.stream()
                .flatMap(personId -> movieIdList.stream()
                        .map(movieId -> new AbstractMap.SimpleImmutableEntry<>(personId, movieId)))
                .collect(Collectors.toList());
final Future<Map<String, List<Summary>>> futureMovieSummaryForPerson = pool.submit(() -> {
    final Stream<Summary> summaryStream = simpleImmutableEntryStream.parallelStream()
            .map(inputPair -> FeedbackDao.find(inputPair.getKey(), inputPair.getValue()))
            .filter(Objects::nonNull);
    return summaryStream.collect(Collectors.groupingBy(Summary::getPersonId));
});

This is related to how ForkJoinTask.fork is implemented. If the current thread belongs to a ForkJoinPool, fork submits the new tasks to that same pool; otherwise it uses the common pool, whose parallelism defaults to the number of available processors minus one. A thread created by Executors.newCachedThreadPool() is not recognized as belonging to a ForkJoinPool, so the stream's tasks go to the common pool.
Here is how it is implemented, which should help you understand it better:
public final ForkJoinTask<V> fork() {
    Thread t;
    if ((t = Thread.currentThread()) instanceof ForkJoinWorkerThread)
        ((ForkJoinWorkerThread)t).workQueue.push(this);
    else
        ForkJoinPool.common.externalPush(this);
    return this;
}
A thread created by Executors.newCachedThreadPool() is not a ForkJoinWorkerThread, so the new tasks are submitted to the common pool, whose size is smaller than what this workload needs.
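One way to observe this dispatch is to record which threads actually execute the stream's elements. The following is only a sketch: the class name StreamPoolDemo, the helper workerNames, and the pool size of 8 are made up for illustration and are not part of the question's code.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

class StreamPoolDemo {
    /** Starts a parallel stream from inside the given executor and returns the names of the threads that processed elements. */
    static Set<String> workerNames(ExecutorService executor) {
        try {
            return executor.submit(() -> {
                Set<String> names = ConcurrentHashMap.newKeySet();
                IntStream.range(0, 1_000).parallel()
                        .forEach(i -> names.add(Thread.currentThread().getName()));
                return names;
            }).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ExecutorService cached = Executors.newCachedThreadPool();
        ForkJoinPool custom = new ForkJoinPool(8); // arbitrary size for the demo
        try {
            // Started from a plain pool thread: the work goes to the common pool
            System.out.println("cached pool: " + workerNames(cached));
            // Started from a ForkJoinWorkerThread: the work stays in the custom pool
            System.out.println("custom pool: " + workerNames(custom));
        } finally {
            cached.shutdown();
            custom.shutdown();
        }
    }
}
```

With the default thread factories, the first line should list the submitting thread plus common-pool workers, while the second should list only workers of the custom pool.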

Related

How does parallel stream "know" to use the enclosing ForkJoinPool?

In Java 8 one can set a custom forkJoinPool to be used by parallel streams rather than the common pool.
forkJoinPool.submit(() -> list.parallelStream().forEach(x ->{...} ))
My question is how does it technically happen?
The stream is not in any way aware it was submitted to a custom forkJoinpool and has no direct access to it. So how are the correct threads eventually used for processing the stream's tasks?
I tried looking at the source code but to no avail. My best guess is some threadLocal variable set at some point when submitting and then used by the stream later on. If so, why would the language developers choose such a way to implement the behaviour rather than, say, dependency injecting the pool into the stream?
Thanks!

From what I've read in the code, the decision is made solely based on the thread that triggers the computation, inside the method ForkJoinTask::fork, which literally checks what kind of thread called it (this is also described in its documentation):
Thread.currentThread() instanceof ForkJoinWorkerThread
So if an instance of ForkJoinWorkerThread started the computation (which is what you get with a custom ForkJoinPool), the task is pushed to the pool that thread is already running in; otherwise (if the thread is not an instance of ForkJoinWorkerThread) it uses:
ForkJoinPool.common.externalPush(this);
It is also interesting that ForkJoinWorkerThread is actually a public class, so you could start the computation inside an instance of it while still using a different pool, though I have not tried this.

The java.util.stream.ForEachOps.ForEachOp#evaluateParallel method calls invoke():
@Override
public <S> Void evaluateParallel(PipelineHelper<T> helper,
                                 Spliterator<S> spliterator) {
    if (ordered)
        new ForEachOrderedTask<>(helper, spliterator, this).invoke();
    else
        new ForEachTask<>(helper, spliterator, helper.wrapSink(this)).invoke();
    return null;
}
which in turn calls java.util.concurrent.ForkJoinTask#doInvoke:
private int doInvoke() {
    int s; Thread t; ForkJoinWorkerThread wt;
    return (s = doExec()) < 0 ? s :
        ((t = Thread.currentThread()) instanceof ForkJoinWorkerThread) ?
        (wt = (ForkJoinWorkerThread)t).pool.
        awaitJoin(wt.workQueue, this, 0L) :
        externalAwaitDone();
}
As seen in the above method, it finds out the current thread using Thread.currentThread().
It then uses the .pool field as in (wt = (ForkJoinWorkerThread)t).pool, which gives the current pool that this thread is running in:
public class ForkJoinWorkerThread extends Thread {
final ForkJoinPool pool; // the pool this thread works in
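The same lookup is available through the public static method ForkJoinTask.getPool(), which returns the pool hosting the current thread, or null outside any ForkJoinPool. A minimal sketch that uses it to confirm where a stream's elements run; the class name EnclosingPoolDemo and the helper staysInPool are hypothetical, chosen only for this example:

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;
import java.util.stream.IntStream;

class EnclosingPoolDemo {
    /** Returns true if every element of a parallel stream started inside {@code pool} was processed by a worker of that pool. */
    static boolean staysInPool(ForkJoinPool pool) {
        return pool.submit(() -> {
            return IntStream.range(0, 1_000).parallel()
                    .allMatch(i -> ForkJoinTask.getPool() == pool);
        }).join();
    }

    public static void main(String[] args) {
        ForkJoinPool custom = new ForkJoinPool(4); // arbitrary size for the demo
        try {
            // The main thread is not a ForkJoinWorkerThread, so no pool hosts it
            System.out.println("main thread's pool: " + ForkJoinTask.getPool());
            System.out.println("stream stayed in custom pool: " + staysInPool(custom));
        } finally {
            custom.shutdown();
        }
    }
}
```

No thread-local is involved: the worker thread itself carries the reference to its pool, and the stream machinery only asks the current thread.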

Frequent concurrent method calls in Java data-logger

I'm implementing a Java data logger which reads, at precise intervals, some data from different production machines. To avoid having one call block the following ones, I was thinking of creating a new thread for every call to the parser class.
However, this would require creating many threads, and then stopping them, every 10 seconds (which is my reading interval). A non-concurrent approach would cause many delays when the parser gets an exception (due to possible timeouts of the IoT devices I'm using), delaying the subsequent calls.
while(!error){
//JDBC connections and other calls here
//Queryresult is a ResultSet that returns all the machine addresses needing to be read
while(queryresult.next()){
//Parser.ParseSpeedV is the method I need to call concurrently
Double v = Parser.ParseSpeedV(..Params..);
Double s = v*queryresult.getDouble("const");
st = conn.createStatement();
st.executeUpdate("INSERT INTO ...");
}
st.close();
Thread.sleep(10000);
}
What is the best way to make concurrent calls to the method ParseSpeedV without the overhead caused by thousands of threads starting every day?

What you want to use is a ScheduledExecutorService. It allows you to add tasks that are repeated at a fixed rate or with a fixed delay. So you can, for example, add a task that fetches data from a device every 10 seconds. The executor service then makes sure that it is run at that interval with reasonably low deviation.
final ScheduledExecutorService myScheduledExecutor = Executors.newScheduledThreadPool(16);
myScheduledExecutor.scheduleAtFixedRate(myTask, 0L, 10L, TimeUnit.SECONDS);
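One detail worth knowing for a data logger with flaky devices: if a run of a task scheduled with scheduleAtFixedRate throws, subsequent runs of that task are suppressed. A minimal sketch of guarding against that; the class PollerSketch and its methods are made up for illustration, and the failing task merely simulates a device timeout:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class PollerSketch {
    /** Schedules {@code read} at a fixed rate and catches its exceptions so that later runs are not cancelled. */
    static void schedule(ScheduledExecutorService scheduler, Runnable read, long period, TimeUnit unit) {
        scheduler.scheduleAtFixedRate(() -> {
            try {
                read.run();
            } catch (RuntimeException e) {
                System.err.println("read failed, will retry on next tick: " + e.getMessage());
            }
        }, 0L, period, unit);
    }

    /** Schedules a task that fails on every run and returns how many times it still ran. */
    static int countRuns(long periodMillis, int expectedRuns) {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(2);
        AtomicInteger runs = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(expectedRuns);
        schedule(scheduler, () -> {
            runs.incrementAndGet();
            done.countDown();
            throw new IllegalStateException("simulated device timeout");
        }, periodMillis, TimeUnit.MILLISECONDS);
        try {
            done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            scheduler.shutdownNow();
        }
        return runs.get();
    }

    public static void main(String[] args) {
        System.out.println("runs despite failures: " + countRuns(10, 3));
    }
}
```

In the data-logger case, each machine address could get its own scheduled task of this shape, so that one slow device does not delay the others as long as the pool has spare threads.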

Your situation is the perfect use case for a thread pool. This is the part of Java's library that is built on top of plain Threads and allows you to create a fixed-size pool of threads and reuse them over and over:
ExecutorService executor = Executors.newFixedThreadPool(5);
Any time you want to do some work, you add it to the executor:
executor.execute(new Runnable() {
    @Override
    public void run() {
        // Do some work
    }
});
If you call execute more than 5 times, the extra runnables are held in a queue until there's room.
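The queueing can be made visible by casting the pool to ThreadPoolExecutor and inspecting its queue while all workers are busy. This is a sketch under assumptions: the class QueueDemo and the method queuedCount are invented for the example, and the jobs do nothing but wait on a latch.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;

class QueueDemo {
    /** Submits {@code tasks} blocking jobs to a pool of {@code threads} workers and returns how many are waiting in the queue. */
    static int queuedCount(int threads, int tasks) {
        ThreadPoolExecutor executor = (ThreadPoolExecutor) Executors.newFixedThreadPool(threads);
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < tasks; i++) {
                executor.execute(() -> {
                    try {
                        release.await(); // keep the worker busy until released
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            return executor.getQueue().size();
        } finally {
            release.countDown(); // let every job finish
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        // 5 workers, 8 jobs: the first 5 occupy the workers, the rest wait
        System.out.println(queuedCount(5, 8)); // prints 3
    }
}
```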
Now, if you need to receive information from these running tasks, you need to write a class that implements Runnable and accepts some kind of object that wants the information your runnable produces:
public class Worker implements Runnable {
    private final Consumer consumer;
    public Worker(Consumer consumer) {
        this.consumer = consumer;
    }
    @Override
    public void run() {
        // Do work
        Object value = null; // placeholder: obtain the real value here
        consumer.put(value);
    }
}
Now all you have to do is define a Consumer class that operates on the value (it has that put() method, or whatever you need) and create your Workers like this:
Consumer consumer = new Consumer();
Worker worker = new Worker(consumer);
executor.execute(worker);

Difference between Executors.newFixedThreadPool(1) and Executors.newSingleThreadExecutor()

My question is: does it make sense to use Executors.newFixedThreadPool(1)? In a two-thread scenario (main + oneAnotherThread), is it efficient to use an executor service? Is creating a new thread directly by calling new Runnable(){ } better than using ExecutorService? What are the upsides and downsides of using ExecutorService for such scenarios?
PS: The main thread and oneAnotherThread don't access any common resources.
I have gone through "What are the advantages of using an ExecutorService?" and "Only one thread at a time!".

does it make sense to use Executors.newFixedThreadPool(1)?
It is essentially the same thing as an Executors.newSingleThreadExecutor() except that the latter is not reconfigurable, as indicated in the javadoc, whereas the former is if you cast it to a ThreadPoolExecutor.
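A small sketch of that difference; the class ReconfigureDemo and its helper names are made up for this example, and the target size of 4 is arbitrary:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;

class ReconfigureDemo {
    /** True if the executor can be cast to ThreadPoolExecutor and therefore resized. */
    static boolean isReconfigurable(ExecutorService executor) {
        return executor instanceof ThreadPoolExecutor;
    }

    /** Grows a ThreadPoolExecutor-backed executor to {@code newSize} threads and returns the new core size. */
    static int grow(ExecutorService executor, int newSize) {
        ThreadPoolExecutor tpe = (ThreadPoolExecutor) executor;
        tpe.setMaximumPoolSize(newSize); // raise the maximum first so core <= max always holds
        tpe.setCorePoolSize(newSize);
        return tpe.getCorePoolSize();
    }

    public static void main(String[] args) {
        ExecutorService fixed = Executors.newFixedThreadPool(1);
        ExecutorService single = Executors.newSingleThreadExecutor();
        try {
            System.out.println(isReconfigurable(fixed));  // prints true
            System.out.println(isReconfigurable(single)); // prints false
            System.out.println(grow(fixed, 4));           // prints 4
        } finally {
            fixed.shutdown();
            single.shutdown();
        }
    }
}
```

So newSingleThreadExecutor() is the safer choice when code relies on strictly sequential execution, because no caller can widen the pool behind your back.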
In two threads (main + oneAnotherThread) scenarios is it efficient to use executor service?
An executor service is a very thin wrapper around a Thread that significantly facilitates the thread lifecycle management. If the only thing you need is to new Thread(runnable).start(); and move on, then there is no real need for an ExecutorService.
In most real-life cases, the ability to monitor the life cycle of the tasks (through the returned Futures), the fact that the executor will re-create threads as required after uncaught exceptions, the performance gain of recycling threads instead of creating new ones, and so on make the executor service a much more powerful solution at little additional cost.
Bottom line: I don't see any downsides of using an executor service vs. a thread.
The question "The difference between Executors.newSingleThreadExecutor().execute(command) and new Thread(command).start();" goes through the small differences in behaviour between the two options.

Sometimes you need to use Executors.newFixedThreadPool(1) so that you can determine the number of tasks in the queue:
private final ExecutorService executor = Executors.newFixedThreadPool(1);
public int getTaskInQueueCount() {
ThreadPoolExecutor threadPoolExecutor = (ThreadPoolExecutor) executor;
return threadPoolExecutor.getQueue().size();
}

does it make sense to use Executors.newFixedThreadPool(1)?
Yes. It makes sense if you want to process all submitted tasks in order of arrival.
In two threads (main + oneAnotherThread) scenarios is it efficient to use executor service? Is creating a new thread directly by calling new Runnable(){ } better than using ExecutorService?
I prefer ExecutorService or ThreadPoolExecutor even for 1 thread.
Refer to the SE question below for an explanation of the advantages of ThreadPoolExecutor over new Runnable():
ExecutorService vs Casual Thread Spawner
What are the upsides and downsides of using ExecutorService for such scenarios?
Have a look at this related SE question regarding ExecutorService use cases:
Java's Fork/Join vs ExecutorService - when to use which?
Regarding the question in your subject line (source from grepcode), both are backed by the same ThreadPoolExecutor configuration:
The newFixedThreadPool API returns a ThreadPoolExecutor as an ExecutorService:
public static ExecutorService newFixedThreadPool(int nThreads) {
    return new ThreadPoolExecutor(nThreads, nThreads,
                                  0L, TimeUnit.MILLISECONDS,
                                  new LinkedBlockingQueue<Runnable>());
}
and
newSingleThreadExecutor() returns a ThreadPoolExecutor wrapped in a delegating ExecutorService:
public static ExecutorService newSingleThreadExecutor() {
    return new FinalizableDelegatedExecutorService
        (new ThreadPoolExecutor(1, 1,
                                0L, TimeUnit.MILLISECONDS,
                                new LinkedBlockingQueue<Runnable>()));
}
I agree with @assylias's answer regarding similarities/differences.

Is creating a new thread directly by calling new Runnable(){ } better than using ExecutorService?
If you want to compute something with the result returned after the thread completes, you can use the Callable interface, which an ExecutorService accepts directly, unlike new Runnable(){}. The ExecutorService's submit() method, which takes the Callable object as an argument, returns a Future object. On this Future you can check whether the task has completed using the isDone() method, and you can get the result using the get() method.
In this case, ExecutorService is better than the new Runnable(){}.
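A minimal sketch of that flow, assuming a trivial computation as the task; the class CallableDemo and the method sumViaExecutor are hypothetical names used only here:

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class CallableDemo {
    /** Computes a + b on a worker thread and returns the result obtained through a Future. */
    static int sumViaExecutor(int a, int b) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            // The lambda returns a value, so it is treated as a Callable<Integer>
            Future<Integer> future = executor.submit(() -> {
                return a + b;
            });
            return future.get(); // blocks until the task has completed
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sumViaExecutor(2, 3)); // prints 5
    }
}
```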

Weak performance of CyclicBarrier with many threads: Would a tree-like synchronization structure be an alternative?

Our application requires all worker threads to synchronize at a defined point. For this we use a CyclicBarrier, but it does not seem to scale well. With more than eight threads, the synchronization overhead seems to outweigh the benefits of multithreading. (However, I cannot support this with measurement data.)
EDIT: Synchronization happens very frequently, in the order of 100k to 1M times.
If synchronization of many threads is "hard", would it help building a synchronization tree? Thread 1 waits for 2 and 3, which in turn wait for 4+5 and 6+7, respectively, etc.; after finishing, threads 2 and 3 wait for thread 1, thread 4 and 5 wait for thread 2, etc..
      1
     / \
    2   3
   / \ / \
  4  5 6  7
Would such a setup reduce synchronization overhead? I'd appreciate any advice.
See also this featured question: What is the fastest cyclic synchronization in Java (ExecutorService vs. CyclicBarrier vs. X)?

With more than eight threads, the synchronization overhead seems to outweigh the benefits of multithreading. (However, I cannot support this with measurement data.)
Honestly, there's your problem right there. Figure out a performance benchmark and prove that this is the problem, or risk spending hours / days solving the entirely wrong problem.

You are thinking about the problem in a subtly wrong way that tends to lead to very bad coding. You don't want to wait for threads, you want to wait for work to be completed.
Probably the most efficient way is a shared, waitable counter. When you make new work, increment the counter and signal the counter. When you complete work, decrement the counter. If there is no work to do, wait on the counter. If you drop the counter to zero, check if you can make new work.
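A rough sketch of such a counter, built on Java's intrinsic locks. It only covers the coordinator side (waiting until all registered work is done), not the worker side that waits for new work. The class WorkCounter and its method names are assumptions made for this example.

```java
import java.util.concurrent.atomic.AtomicInteger;

class WorkCounter {
    private int pending;

    /** Registers one unit of outstanding work. */
    synchronized void workAdded() {
        pending++;
    }

    /** Marks one unit as finished and wakes waiters when nothing is left. */
    synchronized void workDone() {
        if (--pending == 0) {
            notifyAll();
        }
    }

    /** Blocks until no work is outstanding. */
    synchronized void awaitIdle() throws InterruptedException {
        while (pending > 0) {
            wait();
        }
    }

    /** Demo: runs n trivial work items on n threads, waits for all of them, and returns how many completed. */
    static int runAndWait(int n) {
        WorkCounter counter = new WorkCounter();
        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < n; i++) {
            counter.workAdded(); // register before the work starts
            new Thread(() -> {
                completed.incrementAndGet(); // stands in for the real work
                counter.workDone();
            }).start();
        }
        try {
            counter.awaitIdle();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println(runAndWait(8)); // prints 8
    }
}
```

The waiter depends on outstanding work, not on specific threads, so the number of worker threads can change without touching the synchronization logic.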

If I understand correctly, you're trying to break your solution up into parts and solve them separately, but concurrently, right? Then have your current thread wait for those tasks? You want to use something like a fork/join pattern.
List<CustomThread> threads = new ArrayList<CustomThread>();
for (Something something : somethings) {
threads.add(new CustomThread(something));
}
for (CustomThread thread : threads) {
thread.start();
}
for (CustomThread thread : threads) {
thread.join(); // Blocks until thread is complete
}
List<Result> results = new ArrayList<Result>();
for (CustomThread thread : threads) {
results.add(thread.getResult());
}
// do something with results.
In Java 7, there's even further support via a fork/join pool. See ForkJoinPool and its trail, and use Google to find one of many other tutorials.
You can recurse on this concept to get the tree you want, just have the threads you create generate more threads in the exact same way.
Edit: I was under the impression that you wouldn't be creating that many threads, so this is better for your scenario. The example won't be horribly short, but it goes along the same vein as the discussion you're having in the other answer, that you can wait on jobs, not threads.
First, you need a Callable for your sub-jobs that takes an Input and returns a Result:
public class SubJob implements Callable<Result> {
    private final Input input;
    public SubJob(Input input) {
        this.input = input;
    }
    public Result call() {
        // Actually process input here and return a result
        return JobWorker.processInput(input);
    }
}
Then to use it, create an ExecutorService with a fixed-size thread pool. This will limit the number of jobs you're running concurrently so you don't accidentally thread-bomb your system. Here's your main job:
public class MainJob extends Thread {
    // Adjust the pool to the appropriate number of concurrent
    // threads you want running at the same time
    private static final ExecutorService pool = Executors.newFixedThreadPool(30);
    private final List<Input> inputs;
    public MainJob(List<Input> inputs) {
        super("MainJob");
        this.inputs = new ArrayList<Input>(inputs);
    }
    public void run() {
        CompletionService<Result> compService = new ExecutorCompletionService<Result>(pool);
        List<Result> results = new ArrayList<Result>();
        int submittedJobs = inputs.size();
        for (Input input : inputs) {
            // Starts the job when a thread is available
            compService.submit(new SubJob(input));
        }
        try {
            for (int i = 0; i < submittedJobs; i++) {
                // take() blocks until a job is completed; get() unwraps its result
                results.add(compService.take().get());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt and stop waiting
            return;
        } catch (ExecutionException e) {
            throw new IllegalStateException("Sub-job failed", e.getCause());
        }
        // Do something with results
    }
}
This will allow you to reuse threads instead of generating a bunch of new ones every time you want to run a job. The completion service will do the blocking while it waits for jobs to complete. Also note that the results list will be in order of completion.
You can also use Executors.newCachedThreadPool, which creates a pool with no upper limit (like using Integer.MAX_VALUE). It will reuse a thread if one is available and create a new one if all the threads in the pool are running a job. This may be desirable later if you start encountering deadlocks (because there are so many jobs in the fixed thread pool waiting that sub-jobs can't run and complete). This will at least limit the number of threads you're creating/destroying.
Lastly, you'll need to shut down the ExecutorService manually, perhaps via a shutdown hook, or the threads that it contains will not allow the JVM to terminate.
Hope that helps/makes sense.

If you have a generation task (like the example of processing columns of a matrix) then you may be stuck with a CyclicBarrier. That is to say, if every single piece of work for generation 1 must be done in order to process any work for generation 2, then the best you can do is to wait for that condition to be met.
If there are thousands of tasks in each generation, then it may be better to submit all of those tasks to an ExecutorService (ExecutorService.invokeAll) and simply wait for the results to return before proceeding to the next step. The advantage of doing this is eliminating context switching and wasted time/memory from allocating hundreds of threads when the physical CPU is bounded.
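A minimal sketch of this generational pattern with invokeAll, which returns only once every submitted task has finished. The class GenerationDemo, the helper names, and the two toy generations (square, then add one) are assumptions chosen purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntUnaryOperator;

class GenerationDemo {
    /** Applies {@code step} to every input in parallel and returns the outputs in input order. */
    static List<Integer> runGeneration(ExecutorService pool, List<Integer> inputs, IntUnaryOperator step) {
        List<Callable<Integer>> tasks = new ArrayList<>();
        for (int input : inputs) {
            tasks.add(() -> step.applyAsInt(input));
        }
        try {
            List<Integer> outputs = new ArrayList<>();
            // invokeAll acts as the barrier: it returns when all tasks are done
            for (Future<Integer> future : pool.invokeAll(tasks)) {
                outputs.add(future.get());
            }
            return outputs;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Runs two dependent generations on a pool sized to the available processors. */
    static List<Integer> twoGenerations(List<Integer> inputs) {
        ExecutorService pool = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        try {
            List<Integer> generation1 = runGeneration(pool, inputs, x -> x * x);
            return runGeneration(pool, generation1, x -> x + 1);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(twoGenerations(Arrays.asList(1, 2, 3))); // prints [2, 5, 10]
    }
}
```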
If your tasks are not generational but instead more of a tree-like structure in which only a subset need to be complete before the next step can occur on that subset, then you might want to consider a ForkJoinPool and you don't need Java 7 to do that. You can get a reference implementation for Java 6. This would be found under whatever JSR introduced the ForkJoinPool library code.
I also have another answer which provides a rough implementation in Java 6:
public class Fib implements Callable<Integer> {
int n;
Executor exec;
Fib(final int n, final Executor exec) {
this.n = n;
this.exec = exec;
}
/**
* {@inheritDoc}
*/
@Override
public Integer call() throws Exception {
if (n == 0 || n == 1) {
return n;
}
//Divide the problem
final Fib n1 = new Fib(n - 1, exec);
final Fib n2 = new Fib(n - 2, exec);
//FutureTask only allows run to complete once
final FutureTask<Integer> n2Task = new FutureTask<Integer>(n2);
//Ask the Executor for help
exec.execute(n2Task);
//Do half the work ourselves
final int partialResult = n1.call();
//Do the other half of the work if the Executor hasn't
n2Task.run();
//Return the combined result
return partialResult + n2Task.get();
}
}
Keep in mind that if you have divided the tasks up too much and the unit of work being done by each thread is too small, there will be negative performance impacts. For example, the above code is a terribly slow way to solve Fibonacci.
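For comparison, here is a sketch of the same divide-and-conquer idea on top of ForkJoinPool (Java 7+), with a sequential cutoff so that tasks do not become too small. The class name ForkJoinFib and the threshold of 15 are assumptions; a real cutoff would need to be chosen by measurement.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class ForkJoinFib extends RecursiveTask<Long> {
    private static final int SEQUENTIAL_THRESHOLD = 15; // assumed cutoff, tune by measuring
    private final int n;

    ForkJoinFib(int n) {
        this.n = n;
    }

    /** Plain recursive Fibonacci, used below the threshold. */
    static long sequential(int n) {
        return n < 2 ? n : sequential(n - 1) + sequential(n - 2);
    }

    @Override
    protected Long compute() {
        if (n <= SEQUENTIAL_THRESHOLD) {
            return sequential(n); // small problem: not worth forking
        }
        ForkJoinFib left = new ForkJoinFib(n - 1);
        left.fork();                                   // hand one half to the pool
        long right = new ForkJoinFib(n - 2).compute(); // do the other half ourselves
        return left.join() + right;
    }

    /** Convenience entry point that runs the task in a temporary pool. */
    static long fib(int n) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.invoke(new ForkJoinFib(n));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(fib(25)); // prints 75025
    }
}
```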

How to scale threads according to CPU cores?

I want to solve a mathematical problem with multiple threads in Java. My math problem can be separated into work units that I want to have solved in several threads.
I don't want a fixed number of threads working on it, but instead a number of threads matching the number of CPU cores. My problem is that I couldn't find an easy tutorial on the internet for this. All I found are examples with a fixed number of threads.
How can this be done? Can you provide examples?

You can determine the number of processors available to the Java Virtual Machine by calling the availableProcessors method on the Runtime instance. Once you have determined the number of processors available, create that number of threads and split up your work accordingly.
Update: To further clarify, a Thread is just an Object in Java, so you can create it just like you would create any other object. So, let's say that you call the above method and find that it returns 2 processors. Awesome. Now, you can create a loop that generates a new Thread, and splits the work off for that thread, and fires off the thread. Here's some pseudocode to demonstrate what I mean:
int processors = Runtime.getRuntime().availableProcessors();
for(int i=0; i < processors; i++) {
Thread yourThread = new AThreadYouCreated();
// You may need to pass in parameters depending on what work you are doing and how you setup your thread.
yourThread.start();
}
For more information on creating your own thread, head to this tutorial. Also, you may want to look at Thread Pooling for the creation of the threads.

You probably want to look at the java.util.concurrent framework for this stuff too.
Something like:
ExecutorService e = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
// Do work using something like either
e.execute(new Runnable() {
    public void run() {
        // do one task
    }
});
or
Future<String> future = e.submit(new Callable<String>() {
    public String call() throws Exception {
        return null;
    }
});
future.get(); // Will block till result available
This is a lot nicer than coping with your own thread pools etc.

Option 1:
newWorkStealingPool from Executors
public static ExecutorService newWorkStealingPool()
Creates a work-stealing thread pool using all available processors as its target parallelism level.
With this API, you don't need to pass number of cores to ExecutorService.
Implementation of this API from grepcode
/**
 * Creates a work-stealing thread pool using all
 * {@link Runtime#availableProcessors available processors}
 * as its target parallelism level.
 * @return the newly created thread pool
 * @see #newWorkStealingPool(int)
 * @since 1.8
 */
public static ExecutorService newWorkStealingPool() {
    return new ForkJoinPool
        (Runtime.getRuntime().availableProcessors(),
         ForkJoinPool.defaultForkJoinWorkerThreadFactory,
         null, true);
}
Option 2:
newFixedThreadPool API from Executors or other newXXX constructors, which returns ExecutorService
public static ExecutorService newFixedThreadPool(int nThreads)
replace nThreads with Runtime.getRuntime().availableProcessors()
Option 3:
ThreadPoolExecutor
public ThreadPoolExecutor(int corePoolSize,
int maximumPoolSize,
long keepAliveTime,
TimeUnit unit,
BlockingQueue<Runnable> workQueue)
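Pass Runtime.getRuntime().availableProcessors() as corePoolSize and maximumPoolSize. Note that with an unbounded workQueue such as LinkedBlockingQueue, no more than corePoolSize threads are ever created, so setting maximumPoolSize alone has no effect.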

Doug Lea (author of the concurrent package) has this paper which may be relevant:
http://gee.cs.oswego.edu/dl/papers/fj.pdf
The Fork/Join framework was added in Java SE 7. Below are a few more references:
http://www.ibm.com/developerworks/java/library/j-jtp11137/index.html
Article by Brian Goetz
http://www.oracle.com/technetwork/articles/java/fork-join-422606.html

The standard way is the Runtime.getRuntime().availableProcessors() method.
On most standard CPUs this returns the number of logical processors (which is not necessarily the physical core count, for example with hyper-threading), and that is a reasonable thread count for CPU-bound work. Therefore this is what you are looking for.
Example:
ExecutorService service = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
Do NOT forget to shut down the executor service like this (or your program won't exit):
service.shutdown();
Here is a quick outline of how to set up Future-based multithreaded code (off-topic, for illustration):
CompletionService<YourCallableImplementor> completionService =
new ExecutorCompletionService<YourCallableImplementor>(service);
ArrayList<Future<YourCallableImplementor>> futures = new ArrayList<Future<YourCallableImplementor>>();
for (String computeMe : elementsToCompute) {
futures.add(completionService.submit(new YourCallableImplementor(computeMe)));
}
Then you need to keep track of how many results you expect and retrieve them like this:
try {
int received = 0;
while (received < elementsToCompute.size()) {
Future<YourCallableImplementor> resultFuture = completionService.take();
YourCallableImplementor result = resultFuture.get();
received++;
}
} finally {
service.shutdown();
}

On the Runtime class, there is a method called availableProcessors(). You can use that to figure out how many CPUs you have. Since your program is CPU bound, you would probably want to have (at most) one thread per available CPU.
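As a sketch of what "one thread per available CPU" can look like for a problem that splits into independent work units, the example below sums the integers 1..n in one chunk per processor. The class PerCoreSum and the summation task are stand-ins invented for illustration; the real work units would replace the inner loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PerCoreSum {
    /** Sums 1..n using one chunk of work per available processor. */
    static long sum(long n) {
        int workers = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            long chunk = (n + workers - 1) / workers; // ceiling division
            List<Future<Long>> parts = new ArrayList<>();
            for (int w = 0; w < workers; w++) {
                final long from = w * chunk + 1;
                final long to = Math.min(n, from + chunk - 1);
                parts.add(pool.submit(() -> {
                    long s = 0;
                    for (long i = from; i <= to; i++) {
                        s += i; // an empty range (from > to) contributes 0
                    }
                    return s;
                }));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get(); // blocks until that chunk is done
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sum(1_000)); // prints 500500
    }
}
```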
