Parallelize a for loop in java

Parallelize a for loop in java - java

I have a for loop that is looping over a list of collections. Inside the loop some select/update queries are taking place on collection which are exclusive of the other collections. Since each collection has a lot of data to process on i would like to parallelize it.
The code snippet looks something like this:
//Some variables that are used within the for loop logic
for(String collection : collections) {
//Select queries on collection
//Update queries on collection
}
How can i achieve this in java?

You can use the parallelStream() method (since java 8):
collections.parallelStream().forEach((collection) -> {
//Select queries on collection
//Update queries on collection
});
More informations about streams.
Another way to do it is using Executors :
try
{
final ExecutorService exec = Executors.newFixedThreadPool(collections.size());
for (final String collection : collections)
{
exec.submit(() -> {
// Select queries on collection
// Update queries on collection
});
}
// We want to wait that the jobs are done.
final boolean terminated = exec.awaitTermination(500, TimeUnit.MILLISECONDS);
if (terminated == false)
{
exec.shutdownNow();
}
} catch (final InterruptedException e)
{
e.printStackTrace();
}
This example is more powerfull since you can easily know when the job is done, force termination... and more.

final int numberOfThreads = 32;
final ExecutorService executor = Executors.newFixedThreadPool(numberOfThreads);
// List to store the 'handles' (Futures) for all tasks:
final List<Future<MyResult>> futures = new ArrayList<>();
// Schedule one (parallel) task per String from "collections":
for(final String str : collections) {
futures.add(executor.submit(() -> { return doSomethingWith(str); }));
}
// Wait until all tasks have completed:
for ( Future<MyResult> f : futures ) {
MyResult aResult = f.get(); // Will block until the result of the task is available.
// Optionally do something with the result...
}
executor.shutdown(); // Release the threads held by the executor.
// At this point all tasks have ended and we can continue as if they were all executed sequentially
Adjust the numberOfThreads as needed to achieve the best throughput. More threads will tend to utilize the local CPU better, but may cause more overhead at the remote end. To get good local CPU utilization, you want to have (much) more threads than CPUs (/cores) so that, whenever one thread has to wait, e.g. for a response from the DB, another thread can be switched in to execute on the CPU.

There are a number of question that you need to ask yourself to find the right answer:
If I have as many threads as the number of my CPU cores, would that be enough?
Using parallelStream() will give you as many threads as your CPU cores.
Will parallelizing the loop give me a performance boost or is there a bottleneck on the DB?
You could spin up 100 threads, processing in parallel, but this doesn't mean that you will do things 100 times faster, if your DB or the network cannot handle the volume. DB locking can also be an issue here.
Do I need to process my data in a specific order?
If you have to process your data in a specific order, this may limit your choices. E.g. forEach() doesn't guarantee that the elements of your collection will be processed in a specific order, but forEachOrdered() does (with a performance cost).
Is my datasource capable of fetching data reactively?
There are cases when our datasource can provide data in the form of a stream. In that case, you can always process this stream using a technology such as RxJava or WebFlux. This would enable you to take a different approach on your problem.
Having said all the above, you can choose the approach you want (executors, RxJava etc.) that fit better to your purpose.

Related

Question about potential race condition with ParallelStream of Lists

I came across this piece of code that uses Java streams, specifically parallelStream() in order to collect some data from an oracle database. See below where in this case:
range = some list of input Id
rangeLimit = 1000
rangeLimitedFunction = some function that queries a DB for some content
ForkJoinPool threadPool = new ForkJoinPool(Math.min(Runtime.getRuntime().availableProcessors(), parallelism));
try {
Optional<C> res = threadPool.submit(new Callable<Optional<C>>() {
#Override
public Optional<C> call() throws Exception {
return splitByLimit(range, rangeLimit).parallelStream()
.map(rangeLimitedFunction::apply)
.reduce((list, items) -> {
list.addAll(items);
return list;
});
}
}).get();
From what I understand this is how this is working:
Split range into chunks of 1000 to feed into the function
Process each chunk in a thread to return some results
Aggregate the results to a list of POJOs
My question is around a potential race condition imposed by trying to reduce into a single list. Is it not possible for many of these threads to be trying to add content to the resulting list and potentially corrupt data?

That depends largely on the implementation of List that's used in this case.
That said, this pice of code would be way better using flatMap and a collector to leverage the thread-safety of Java paralell streams and avoid potential pitfalls from non-thread-safe list implementations.
That said, paralellStreams don't offer much benefit for IO-operations. They target processor heavy operations and usually only pay-off if there are more than 15000 (IIRC) operations (that is stream-iterations times cpu-heavy stream operations), which is kind of rare.

Processing sub-streams of a stream in Java using executors

I have a program that processes a huge stream (not in the sense of java.util.stream, but rather InputStream) of data coming in through the network. The stream consists of objects, each having a sort of sub-stream identifier. Right now the whole processing is done in a single thread, but it takes a lot of CPU time and each sub-stream can easily be processed independently, so I'm thinking of multi-threading it.
However, each sub-stream requires to keep a lot of bulky state, including various buffers, hash maps and such. There is no particular reason to make it concurrent or synchronized since sub-streams are independent of each other. Moreover, each sub-stream requires that its objects are processed in the order they arrive, which means that probably there should be a single thread for each sub-stream (but possibly one thread processing multiple sub-streams).
I'm thinking of several approaches to this, but they are not quite elegant.
Create a single ThreadPoolExecutor for all tasks. Each task will contain the next object to process and the reference to a Processor instance which keeps all the state. That would ensure the necessary happens-before relationship thus ensuring that the processing thread will see the up-to-date state for this sub-stream. This approach has no way to make sure that the next object of the same sub-stream will be processed in the same thread, as far as I can see. Moreover, it needs some guarantee that objects will be processed in the order they come in, which will require additional synchronization of the Processor objects, introducing unnecessary delays.
Create multiple single-thread executors manually and a sort of hash-map that maps sub-stream identifiers to executor. This approach requires manual management of executors, creating or shutting down them as new sub-streams begin or end, and distributing the tasks between them accordingly.
Create a custom executor that processes a special subclass of tasks each having a sub-stream ID. This executor would use it as a hint to use the same thread for executing this task as the previous one with the same ID. However, I don't see an easy way to implement such executor. Unfortunately, it doesn't seem possible to extend any of the existing executor classes, and implementing an executor from scratch is kind of overkill.
Create a single ThreadPoolExecutor, but instead of creating a task for each incoming object, create a single long-running task for each sub-stream that would block in a concurrent queue, waiting for the next object. Then put objects in queues according to their sub-stream IDs. This approach needs as many threads as there are sub-streams because the tasks will be blocked. The expected number of sub-streams is about 30-60, so that may be acceptable.
Alternatively, proceed as in 4, but limit the number of threads, assigning multiple sub-streams to a single task. This is sort of a hybrid between 2 and 4. As far as I can see, this is the best approach of these, but it still requires some sort of manual sub-stream distribution between tasks and some way to shut the extra tasks down as sub-streams end.
What would be the best way to ensure that each sub-stream is processed in its own thread without a lot of error-prone code? So that the following pseudo-code will work:
// loop {
Item next = stream.read();
int id = next.getSubstreamID();
Processor processor = getProcessor(id);
SubstreamTask task = new SubstreamTask(processor, next, id);
executor.submit(task); // This makes sure that the task will
// be executed in the same thread as the
// previous task with the same ID.
// } // loop

I suggest having an array of single threaded executors. If you can devise a consistent hashing strategy for sub-streams, you can map sub-streams to individual threads. e.g.
final ExecutorsService[] es = ...
public void submit(int id, Runnable run) {
es[(id & 0x7FFFFFFF) % es.length].submit(run);
}
The key could be an String or long but some way to identify the sub-stream. If you know a particular sub-stream is very expensive, you could assign it a dedicated thread.

The solution I finally chose looks like this:
private final Executor[] streamThreads
= new Executor[Runtime.getRuntime().availableProcessors()];
{
for (int i = 0; i < streamThreads.length; ++i) {
streamThreads[i] = Executors.newSingleThreadExecutor();
}
}
private final ConcurrentHashMap<SubstreamId, Integer>
threadById = new ConcurrentHashMap<>();
This code determines which executor to use:
Message msg = in.readNext();
SubstreamId msgSubstream = msg.getSubstreamId();
int exe = threadById.computeIfAbsent(msgSubstream,
id -> findBestExecutor());
streamThreads[exe].execute(() -> {
// processing goes here
});
And the findBestExecutor() function is this:
private int findBestExecutor() {
// Thread index -> substream count mapping:
final int[] loads = new int[streamThreads.length];
for (int thread : threadById.values()) {
++loads[thread];
}
// return the index of the minimum load
return IntStream.range(0, streamThreads.length)
.reduce((i, j) -> loads[i] <= loads[j] ? i : j)
.orElse(0);
}
This is, of course, not very efficient, but note that this function is only called when a new sub-stream shows up (which happens several times every few hours, so it's not a big deal in my case). My real code looks a bit more complicated because I have a way to determine whether two sub-streams are likely to finish simultaneously, and if they are, I try to assign them to different threads in order to maintain even load after they do finish. But since I never mentioned this detail in the question, I guess it doesn't belong to the answer either.

Design issue: is this doable only with producer/consumer?

I'm trying to increase performance of indexing my lucene files. For this, I created a worker "LuceneWorker" that does the job.
Given the code below, the 'concurrent' execution becomes significantly slow. I think I know why - it's because the futures grows to a limit that there's hardly memory to perform yet another task of the LuceneWorker.
Q: is there a way to limit the amount of 'workers' that goes into the executor? In other words if there are 'n' futures - do not continue and allow the documents to be indexed first?
My intuitive approach is that I should build a consumer/producer with ArrayBlockingQueue. But wonder if I'm right before I redesign it.
ExecutorService executor = Executors.newFixedThreadPool(cores);
List<Future<List<Document>>> futures = new ArrayList<Future<List<Document>>>(3);
for (File file : files)
{
if (isFileIndexingOK(file))
{
System.out.println(file.getName());
Future<List<Document>> future = executor.submit(new LuceneWorker(file, indexSearcher));
futures.add(future);
}
else
{
System.out.println("NOT A VALID FILE FOR INDEXING: "+file.getName());
continue;
}
}
int index=0;
for (Future<List<Document>> future : futures)
{
try{
List<Document> docs = future.get();
for(Document doc : docs)
writer.addDocument(doc);
}catch(Exception exp)
{
//exp code comes here.
}
}

If you want to limit the number of waiting jobs, use a ThreadPoolExecutor with a bounded queue like ArrayBlockingQueue. Also roll your own RejectedExecutionHandler so that the submitting thread waits for capacity in the queue. You cannot use the convenience methods in Executors for that as newFixedThreadPool uses an unbounded LinkedBlockingQueue.

Depending on the standard input size and the complexity of the LuceneWorker class, I could imagine solving this problem at least partially using the Fork/Join framework. When using JDK 8's CountedCompleter implementation (included in jsr166y) I/O operations would not produce any problems.

Weak performance of CyclicBarrier with many threads: Would a tree-like synchronization structure be an alternative?

Our application requires all worker threads to synchronize at a defined point. For this we use a CyclicBarrier, but it does not seem to scale well. With more than eight threads, the synchronization overhead seems to outweigh the benefits of multithreading. (However, I cannot support this with measurement data.)
EDIT: Synchronization happens very frequently, in the order of 100k to 1M times.
If synchronization of many threads is "hard", would it help building a synchronization tree? Thread 1 waits for 2 and 3, which in turn wait for 4+5 and 6+7, respectively, etc.; after finishing, threads 2 and 3 wait for thread 1, thread 4 and 5 wait for thread 2, etc..
1
| \
2 3
|\ |\
4 5 6 7
Would such a setup reduce synchronization overhead? I'd appreciate any advice.
See also this featured question: What is the fastest cyclic synchronization in Java (ExecutorService vs. CyclicBarrier vs. X)?

With more than eight threads, the synchronization overhead seems to outweigh the benefits of multithreading. (However, I cannot support this with measurement data.)
Honestly, there's your problem right there. Figure out a performance benchmark and prove that this is the problem, or risk spending hours / days solving the entirely wrong problem.

You are thinking about the problem in a subtly wrong way that tends to lead to very bad coding. You don't want to wait for threads, you want to wait for work to be completed.
Probably the most efficient way is a shared, waitable counter. When you make new work, increment the counter and signal the counter. When you complete work, decrement the counter. If there is no work to do, wait on the counter. If you drop the counter to zero, check if you can make new work.

If I understand correctly, you're trying to break your solution up into parts and solve them separately, but concurrently, right? Then have your current thread wait for those tasks? You want to use something like a fork/join pattern.
List<CustomThread> threads = new ArrayList<CustomThread>();
for (Something something : somethings) {
threads.add(new CustomThread(something));
}
for (CustomThread thread : threads) {
thread.start();
}
for (CustomThread thread : threads) {
thread.join(); // Blocks until thread is complete
}
List<Result> results = new ArrayList<Result>();
for (CustomThread thread : threads) {
results.add(thread.getResult());
}
// do something with results.
In Java 7, there's even further support via a fork/join pool. See ForkJoinPool and its trail, and use Google to find one of many other tutorials.
You can recurse on this concept to get the tree you want, just have the threads you create generate more threads in the exact same way.
Edit: I was under the impression that you wouldn't be creating that many threads, so this is better for your scenario. The example won't be horribly short, but it goes along the same vein as the discussion you're having in the other answer, that you can wait on jobs, not threads.
First, you need a Callable for your sub-jobs that takes an Input and returns a Result:
public class SubJob implements Callable<Result> {
private final Input input;
public MyCallable(Input input) {
this.input = input;
}
public Result call() {
// Actually process input here and return a result
return JobWorker.processInput(input);
}
}
Then to use it, create an ExecutorService with a fix-sized thread pool. This will limit the number of jobs you're running concurrently so you don't accidentally thread-bomb your system. Here's your main job:
public class MainJob extends Thread {
// Adjust the pool to the appropriate number of concurrent
// threads you want running at the same time
private static final ExecutorService pool = Executors.newFixedThreadPool(30);
private final List<Input> inputs;
public MainJob(List<Input> inputs) {
super("MainJob")
this.inputs = new ArrayList<Input>(inputs);
}
public void run() {
CompletionService<Result> compService = new ExecutorCompletionService(pool);
List<Result> results = new ArrayList<Result>();
int submittedJobs = inputs.size();
for (Input input : inputs) {
// Starts the job when a thread is available
compService.submit(new SubJob(input));
}
for (int i = 0; i < submittedJobs; i++) {
// Blocks until a job is completed
results.add(compService.take())
}
// Do something with results
}
}
This will allow you to reuse threads instead of generating a bunch of new ones every time you want to run a job. The completion service will do the blocking while it waits for jobs to complete. Also note that the results list will be in order of completion.
You can also use Executors.newCachedThreadPool, which creates a pool with no upper limit (like using Integer.MAX_VALUE). It will reuse threads if one is available and create a new one if all the threads in the pool are running a job. This may be desirable later if you start encountering deadlocks (because there's so many jobs in the fixed thread pool waiting that sub jobs can't run and complete). This will at least limit the number of threads you're creating/destroying.
Lastly, you'll need to shutdown the ExecutorService manually, perhaps via a shutdown hook, or the threads that it contains will not allow the JVM to terminate.
Hope that helps/makes sense.

If you have a generation task (like the example of processing columns of a matrix) then you may be stuck with a CyclicBarrier. That is to say, if every single piece of work for generation 1 must be done in order to process any work for generation 2, then the best you can do is to wait for that condition to be met.
If there are thousands of tasks in each generation, then it may be better to submit all of those tasks to an ExecutorService (ExecutorService.invokeAll) and simply wait for the results to return before proceeding to the next step. The advantage of doing this is eliminating context switching and wasted time/memory from allocating hundreds of threads when the physical CPU is bounded.
If your tasks are not generational but instead more of a tree-like structure in which only a subset need to be complete before the next step can occur on that subset, then you might want to consider a ForkJoinPool and you don't need Java 7 to do that. You can get a reference implementation for Java 6. This would be found under whatever JSR introduced the ForkJoinPool library code.
I also have another answer which provides a rough implementation in Java 6:
public class Fib implements Callable<Integer> {
int n;
Executor exec;
Fib(final int n, final Executor exec) {
this.n = n;
this.exec = exec;
}
/**
* {#inheritDoc}
*/
#Override
public Integer call() throws Exception {
if (n == 0 || n == 1) {
return n;
}
//Divide the problem
final Fib n1 = new Fib(n - 1, exec);
final Fib n2 = new Fib(n - 2, exec);
//FutureTask only allows run to complete once
final FutureTask<Integer> n2Task = new FutureTask<Integer>(n2);
//Ask the Executor for help
exec.execute(n2Task);
//Do half the work ourselves
final int partialResult = n1.call();
//Do the other half of the work if the Executor hasn't
n2Task.run();
//Return the combined result
return partialResult + n2Task.get();
}
}
Keep in mind that if you have divided the tasks up too much and the unit of work being done by each thread is too small, there will negative performance impacts. For example, the above code is a terribly slow way to solve Fibonacci.

Java Iterator Concurrency

I'm trying to loop over a Java iterator concurrently, but am having troubles with the best way to do this.
Here is what I have where I don't try to do anything concurrently.
Long l;
Iterator<Long> i = getUserIDs();
while (i.hasNext()) {
l = i.next();
someObject.doSomething(l);
anotheObject.doSomething(l);
}
There should be no race conditions between the things I'm doing on the non iterator objects, so I'm not too worried about that. I'd just like to speed up how long it takes to loop through the iterator by not doing it sequentially.
Thanks in advance.

One solution is to use an executor to parallelise your work.
Simple example:
ExecutorService executor = Executors.newCachedThreadPool();
Iterator<Long> i = getUserIDs();
while (i.hasNext()) {
final Long l = i.next();
Runnable task = new Runnable() {
public void run() {
someObject.doSomething(l);
anotheObject.doSomething(l);
}
}
executor.submit(task);
}
executor.shutdown();
This will create a new thread for each item in the iterator, which will then do the work. You can tune how many threads are used by using a different method on the Executors class, or subdivide the work as you see fit (e.g. a different Runnable for each of the method calls).

A can offer two possible approaches:
Use a thread pool and dispatch the items received from the iterator to a set of processing threads. This will not accelerate the iterator operations themselves, since those would still happen in a single thread, but it will parallelize the actual processing.
Depending on how the iteration is created, you might be able to split the iteration process to multiple segments, each to be processed by a separate thread via a different Iterator object. For an example, have a look at the List.sublist(int fromIndex, int toIndex) and List.listIterator(int index) methods.
This would allow the iterator operations to happen in parallel, but it is not always possible to segment the iteration like this, usually due to the simple fact that the items to be iterated over are not immediately available.
As a bonus trick, if the iteration operations are expensive or slow, such as those required to access a database, you might see a throughput improvement if you separate them out to a separate thread that will use the iterator to fill in a BlockingQueue. The dispatcher thread will then only have to access the queue, without waiting on the iterator object to retrieve the next item.
The most important advice in this case is this: "Use your profiler", usually to be followed by "Do not optimise prematurely". By using a profiler, such as VisualVM, you should be able to ascertain the exact cause of any performance issues, without taking shots in the dark.

If you are using Java 7, you can use the new fork/join; see the tutorial.
Not only does it split automatically the tasks among the threads, but if some thread finishes its tasks earlier than the other threads, it "steals" some tasks from the other threads.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.