Processing sub-streams of a stream in Java using executors

I have a program that processes a huge stream (not in the sense of java.util.stream, but rather an InputStream) of data coming in over the network. The stream consists of objects, each having a sort of sub-stream identifier. Right now the whole processing is done in a single thread, but it takes a lot of CPU time, and each sub-stream can easily be processed independently, so I'm thinking of multi-threading it.
However, each sub-stream requires keeping a lot of bulky state, including various buffers, hash maps and such. There is no particular reason to make it concurrent or synchronized, since the sub-streams are independent of each other. Moreover, each sub-stream requires that its objects be processed in the order they arrive, which means there should probably be a single thread per sub-stream (though one thread could process multiple sub-streams).
I'm considering several approaches, but none of them is quite elegant.
1. Create a single ThreadPoolExecutor for all tasks. Each task would contain the next object to process and a reference to a Processor instance which keeps all the state. That would ensure the necessary happens-before relationship, so the processing thread would see the up-to-date state for the sub-stream. But as far as I can see, this approach has no way to make sure that the next object of the same sub-stream is processed in the same thread. Moreover, it needs some guarantee that objects are processed in the order they come in, which would require additional synchronization of the Processor objects, introducing unnecessary delays.
2. Create multiple single-thread executors manually, plus a sort of hash map that maps sub-stream identifiers to executors. This approach requires manual management of the executors, creating or shutting them down as new sub-streams begin or end, and distributing the tasks between them accordingly.
3. Create a custom executor that processes a special subclass of tasks, each having a sub-stream ID. The executor would use the ID as a hint to execute the task on the same thread as the previous task with the same ID. However, I don't see an easy way to implement such an executor. Unfortunately, it doesn't seem possible to extend any of the existing executor classes, and implementing an executor from scratch is kind of overkill.
4. Create a single ThreadPoolExecutor, but instead of creating a task for each incoming object, create a single long-running task per sub-stream that blocks on a concurrent queue, waiting for the next object, and put incoming objects into queues according to their sub-stream IDs. This approach needs as many threads as there are sub-streams, because the tasks block. The expected number of sub-streams is about 30-60, so that may be acceptable.
5. Alternatively, proceed as in 4, but limit the number of threads, assigning multiple sub-streams to a single task. This is sort of a hybrid between 2 and 4. As far as I can see it is the best of these approaches, but it still requires some manual distribution of sub-streams between tasks and some way to shut the extra tasks down as sub-streams end.
What would be the best way to ensure that each sub-stream is processed in its own thread without a lot of error-prone code? So that the following pseudo-code will work:
while (true) {
    Item next = stream.read();
    int id = next.getSubstreamID();
    Processor processor = getProcessor(id);
    SubstreamTask task = new SubstreamTask(processor, next, id);
    executor.submit(task); // This makes sure that the task will
                           // be executed in the same thread as the
                           // previous task with the same ID.
}

I suggest having an array of single-threaded executors. If you can devise a consistent hashing strategy for sub-streams, you can map each sub-stream to an individual thread, e.g.
final ExecutorService[] es = ...

public void submit(int id, Runnable run) {
    es[(id & 0x7FFFFFFF) % es.length].submit(run);
}
The key could be a String or a long, anything that identifies the sub-stream. If you know a particular sub-stream is very expensive, you could assign it a dedicated thread.
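A minimal, self-contained sketch of this idea; the pool size and the StripedSubmitter class name are assumptions for illustration, not part of the answer above:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class StripedSubmitter {
    // One single-threaded executor per stripe; tasks that hash to the same
    // stripe always run on the same thread, in submission order.
    private final ExecutorService[] es;

    public StripedSubmitter(int stripes) {
        es = new ExecutorService[stripes];
        for (int i = 0; i < stripes; i++) {
            es[i] = Executors.newSingleThreadExecutor();
        }
    }

    public void submit(int id, Runnable run) {
        // Mask the sign bit so the index is never negative.
        es[(id & 0x7FFFFFFF) % es.length].submit(run);
    }

    public void shutdown() {
        for (ExecutorService e : es) {
            e.shutdown();
        }
    }
}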

The solution I finally chose looks like this:
private final Executor[] streamThreads
        = new Executor[Runtime.getRuntime().availableProcessors()];
{
    for (int i = 0; i < streamThreads.length; ++i) {
        streamThreads[i] = Executors.newSingleThreadExecutor();
    }
}

private final ConcurrentHashMap<SubstreamId, Integer>
        threadById = new ConcurrentHashMap<>();
This code determines which executor to use:
Message msg = in.readNext();
SubstreamId msgSubstream = msg.getSubstreamId();
int exe = threadById.computeIfAbsent(msgSubstream,
                                     id -> findBestExecutor());
streamThreads[exe].execute(() -> {
    // processing goes here
});
And the findBestExecutor() function is this:
private int findBestExecutor() {
    // Thread index -> substream count mapping:
    final int[] loads = new int[streamThreads.length];
    for (int thread : threadById.values()) {
        ++loads[thread];
    }
    // Return the index of the minimum load:
    return IntStream.range(0, streamThreads.length)
            .reduce((i, j) -> loads[i] <= loads[j] ? i : j)
            .orElse(0);
}
This is, of course, not very efficient, but note that the function is only called when a new sub-stream shows up (which happens several times every few hours, so it's not a big deal in my case). My real code is a bit more complicated because I have a way to determine whether two sub-streams are likely to finish simultaneously, and if they are, I try to assign them to different threads in order to keep the load even after they finish. But since I never mentioned this detail in the question, I guess it doesn't belong in the answer either.

Related

Java Clear CompletionService Working Queue

I am writing a program which uses a CompletionService to run threaded analyses on a bunch of different objects, where each "analysis" consists of taking in a string and doing some computation to give either true or false as an answer. My code looks essentially like this:
// tasks come from a different method and contain the strings + some other needed info
List<Future<Pair<Pieces, Boolean>>> futures = new ArrayList<>(tasks.size());
for (Task task : tasks) {
    futures.add(executorCompletionService.submit(task));
}
ArrayList<Pair<Pieces, Boolean>> pairs = new ArrayList<>();
int toComplete = tasks.size();
int received = 0;
int failed = 0;
while (received < toComplete) {
    Future<Pair<Pieces, Boolean>> resFuture = executorCompletionService.take();
    received++;
    Pair<Pieces, Boolean> res = resFuture.get();
    if (!res.getValue()) failed++;
    if (failed > 300) {
        // My problem is here
    }
    pairs.add(res);
}
// return pairs and go on to do something else
In the marked section, my goal is to abandon the computation once more than 300 strings have failed, so that I can move on to a new analysis and call this method again with different data. The problem is that since the same CompletionService is reused, if I do not somehow clear the queue, the worker queue keeps growing every time I use it (since after 300 failures there are likely still many unprocessed strings left).
I have tried looping through the futures list and cancelling all unfinished tasks with something like futures.forEach(future -> future.cancel(true)), but when I next call the method I get a java.util.concurrent.CancellationException when I call resFuture.get().
(Edit: it seems that even though I call forEach(future -> future.cancel(true)), this does not guarantee that the worker queue is actually clear afterwards. I do not understand why. It almost seems as if it takes a while to clear the queue, and the code does not wait for this to happen before moving on to the next analysis, so occasionally get() is called on a future which has been cancelled.)
I have also tried to do
while (received < toComplete) {
    executorCompletionService.take();
    received++;
}
to empty the queue, and while this works, it is barely faster than just running all of the analyses anyway, so it does not help much with efficiency.
My question is if there is a better way to empty the worker queue such that when I next call this code it is as if the CompletionService is new again.
Edit: another method I have tried is simply re-creating the ExecutorCompletionService, which is slightly faster than my other solution but still rather slow and definitely not good practice.
P.S.: I'm also happy to accept any other way of doing this; I am not attached to using a CompletionService, it has just been the easiest thing for what I've done so far.
This has since been resolved, but I have seen other similar questions with no good answer so here is my solution:
Previously, I was creating my ExecutorCompletionService from a plain ExecutorService. I switched the ExecutorService to a ThreadPoolExecutor, and since behind the scenes the ExecutorService returned by Executors already is a ThreadPoolExecutor, all method signatures can be fixed with just a cast. Using the ThreadPoolExecutor gives you much more freedom in the backend; specifically, you can call threadPoolExecutor.getQueue().clear(), which removes all tasks still awaiting execution. Finally, I needed to make sure to "drain" the tasks that were already running, so my final cancelling code looked like this:
if (failed > maxFailures) {
    executorService.getQueue().clear();
    while (executorService.getActiveCount() > 0) {
        executorCompletionService.poll();
    }
}
At the end of this code block, the executor will be ready to run again.
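For context, a minimal sketch of the setup this answer relies on (the pool size of 8 is an arbitrary assumption; Pair and Pieces are the question's own types):

import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;

// The Executors factory methods return ThreadPoolExecutor instances,
// so the cast exposes getQueue() and getActiveCount().
ThreadPoolExecutor executorService =
        (ThreadPoolExecutor) Executors.newFixedThreadPool(8);
CompletionService<Pair<Pieces, Boolean>> executorCompletionService =
        new ExecutorCompletionService<>(executorService);

One caveat: getActiveCount() is documented as returning only an approximate count, so the drain loop above is best-effort rather than an exact barrier.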

How to avoid congesting/stalling/deadlocking an ExecutorService with recursive callables

All the threads in an ExecutorService are busy with tasks that are waiting for other tasks, which are stuck in the executor service's queue.
Example code:
ExecutorService es = Executors.newFixedThreadPool(8);
Set<Future<Set<String>>> outerSet = new HashSet<>();
for (int i = 0; i < 8; i++) {
    outerSet.add(es.submit(new Callable<Set<String>>() {
        @Override
        public Set<String> call() throws Exception {
            Thread.sleep(10000); // to simulate work
            Set<Future<String>> innerSet = new HashSet<>();
            for (int j = 0; j < 8; j++) {
                int k = j;
                innerSet.add(es.submit(new Callable<String>() {
                    @Override
                    public String call() throws Exception {
                        return "number " + k + " in inner loop";
                    }
                }));
            }
            Set<String> out = new HashSet<>();
            // We are stuck at this loop because all the callables in innerSet
            // are stuck in the queue of es and can't start, since all the
            // threads in es are busy waiting for them to finish.
            while (!innerSet.isEmpty()) {
                for (Iterator<Future<String>> it = innerSet.iterator(); it.hasNext(); ) {
                    Future<String> f = it.next();
                    if (f.isDone()) {
                        out.add(f.get());
                        it.remove();
                    }
                }
            }
            return out;
        }
    }));
}
Is there any way to avoid this, other than making a separate thread pool for each layer or using a thread pool that is not fixed in size?
A practical example would be if some callables are submitted to ForkJoinPool.commonPool() and then these tasks use objects that also submit to the commonPool in one of their methods.
You should use a ForkJoinPool. It was made for this situation.
Whereas your solution blocks a thread permanently while it waits for its subtasks to finish, the work-stealing ForkJoinPool can perform other work while in join(). This makes it efficient for situations where you have a variable number of small (and often recursive) tasks. With a regular thread pool you would need to oversize it to make sure you don't run out of threads.
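A minimal sketch of the question's nested structure expressed as fork/join tasks (the class names are illustrative, not from the question):

import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class OuterTask extends RecursiveTask<Set<String>> {
    @Override
    protected Set<String> compute() {
        Set<InnerTask> inner = new HashSet<>();
        for (int j = 0; j < 8; j++) {
            InnerTask t = new InnerTask(j);
            t.fork();             // schedule on the same pool
            inner.add(t);
        }
        Set<String> out = new HashSet<>();
        for (InnerTask t : inner) {
            out.add(t.join());    // join() can run queued work instead of blocking
        }
        return out;
    }

    private static final class InnerTask extends RecursiveTask<String> {
        private final int k;
        InnerTask(int k) { this.k = k; }
        @Override
        protected String compute() {
            return "number " + k + " in inner loop";
        }
    }

    public static void main(String[] args) {
        System.out.println(ForkJoinPool.commonPool().invoke(new OuterTask()));
    }
}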
With CompletableFuture you need to handle much more of the planning/scheduling yourself, and it will be more complex to tune if you decide to change things. With the FJP the only thing you need to tune is the number of threads in the pool; with CF you also need to think about then vs. thenAsync.
I would recommend trying to decompose the work into completion stages via CompletableFuture:

CompletableFuture.supplyAsync(outerTask)
        .thenCompose(result -> CompletableFuture.allOf(innerTasks));
That way your outer task doesn’t hog the execution thread while processing inner tasks, but you still get a Future that resolves when the entire job is done. It can be hard to split those stages up if they’re too tightly coupled though.
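A runnable sketch of that shape (the task bodies and the fan-out count of 8 are assumptions for illustration):

import java.util.concurrent.CompletableFuture;

public class ComposeDemo {
    public static void main(String[] args) {
        CompletableFuture<Void> job = CompletableFuture
                .supplyAsync(() -> "outer result")   // outer stage
                .thenCompose(outer -> {
                    // Fan out the inner stages only after the outer one
                    // completes, so no thread sits blocked waiting on them.
                    CompletableFuture<?>[] inner = new CompletableFuture<?>[8];
                    for (int j = 0; j < 8; j++) {
                        int k = j;
                        inner[k] = CompletableFuture
                                .supplyAsync(() -> "number " + k + " in inner loop");
                    }
                    return CompletableFuture.allOf(inner);
                });
        job.join(); // resolves once the outer and all inner stages are done
    }
}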
The approach you are suggesting, which rests on the hypothesis that the problem can be resolved by having more threads than tasks, will not work here if you are allocating a single thread pool; you may try it and see. It's a simple case of deadlock, as you have stated in the comments of your code.
In such a case, use two separate thread pools, one for the outer tasks and another for the inner ones. When a task from the inner pool completes, simply return the value to the outer task.
Or you can simply create a thread on the fly, do the work in it, get the result, and return it to the outer task.
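A minimal sketch of the two-pool idea (the pool sizes are arbitrary assumptions):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class TwoPools {
    public static void main(String[] args) throws Exception {
        ExecutorService outer = Executors.newFixedThreadPool(8);
        ExecutorService inner = Executors.newFixedThreadPool(8);

        Future<List<String>> result = outer.submit(() -> {
            List<Future<String>> fs = new ArrayList<>();
            for (int j = 0; j < 8; j++) {
                int k = j;
                // Inner tasks go to their own pool, so they can never be
                // starved by outer tasks that are waiting on them.
                fs.add(inner.submit(() -> "number " + k + " in inner loop"));
            }
            List<String> out = new ArrayList<>();
            for (Future<String> f : fs) {
                out.add(f.get()); // blocks an outer thread only
            }
            return out;
        });

        System.out.println(result.get());
        outer.shutdown();
        inner.shutdown();
    }
}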

Java - Can CountDownLatch.await() be reordered by the compiler

I have to invoke an operation on a different system. The other system is highly concurrent and distributed, so I integrate with it over a MessageBus. One implementation is required to let the caller wait until the result is received on the bus or the timeout expires.
The implementation consists of three steps. First, I have to pack the future and the CountDownLatch, which is released once the result is received, into a ConcurrentHashMap; this map is shared with the component listening on the MessageBus. Then I have to start the execution, and finally wait on the latch.
public FailSafeFuture execute(Execution execution, long timeout, TimeUnit timeoutUnit) {
    // Step 1
    final WaitingFailSafeFuture future = new WaitingFailSafeFuture();
    final CountDownLatch countDownLatch = new CountDownLatch(1);
    final PendingExecution pendingExecution = new PendingExecution(future, countDownLatch);
    final String id = execution.getId();
    pendingExecutions.put(id, pendingExecution); // ConcurrentHashMap shared with the bus
    // Step 2
    execution.execute();
    // Step 3
    final boolean awaitSuccessful = countDownLatch.await(timeout, timeoutUnit);
    // ...
    return future;
}
So the question is: can those three steps be reordered by the compiler? As far as I understand, only steps 1 and 3 form a happens-before relationship, so in theory step 2 could be moved freely by the compiler. Could step 2 even be moved below the await and thereby invalidate the code?
Follow-up question: would replacing the CountDownLatch with a wait/notify combination on a shared object solve the problem without further synchronization (leaving out the timeout, of course)?
Instruction reordering can happen if the end result remains the same (as far as the compiler thinks), and may require additional synchronization if multiple threads are involved (since the compiler doesn't know that other threads may expect a specific order).
In a single thread, as in your example, all steps have a happens-before relationship with each other. Even if the compiler could reorder those method invocations, the end result would need to be the same (execute() called before await()). As await() involves synchronization, there's no way execute() could somehow slip after it, unless you were using some buggy, crazy implementation.
The happens-before relationship between countDown() and await() makes sure that the PendingExecution code happens before the code executed after await() returns. So there's nothing wrong with the code shown.
You should always prefer the java.util.concurrent classes over wait/notify, as they're easier to use and provide a lot more functionality.
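To make the visibility guarantee concrete, here is a simplified, self-contained variant of the pattern (using a plain result field instead of the question's WaitingFailSafeFuture, whose internals aren't shown):

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class PendingResult {
    private Object result; // plain field: safely published via the latch
    private final CountDownLatch latch = new CountDownLatch(1);

    // Called from the bus listener thread when the result arrives.
    void complete(Object value) {
        result = value;     // write the result first...
        latch.countDown();  // ...countDown() happens-before a successful await()
    }

    // Called by the original caller.
    Object awaitResult(long timeout, TimeUnit unit) throws InterruptedException {
        // If await() returned true, the write to result is guaranteed visible.
        return latch.await(timeout, unit) ? result : null;
    }
}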

Concurrent and scalable data structure in Java to handle tasks?

For my current development I have many threads (producers) that create tasks, and many threads that consume these tasks (consumers).
Each producer is identified by a unique name; a task is made of:
- the name of its producer
- a name
- data
My question concerns the data structure shared by the producers and the consumers.
Concurrent Queue?
Naively, we could imagine that producers populate a concurrent queue with tasks and consumers read/consume the tasks stored in that queue.
I think this solution would scale rather well, but one case is problematic: if a producer very quickly creates two tasks having the same name but not the same data (both tasks T1 and T2 have the same name, but T1 has data D1 and T2 has data D2), it is theoretically possible that they are consumed in the order T2 then T1!
Task Map + Queue?
Now, I imagine creating my own data structure (let's say MyQueue) based on a Map + Queue. Like a queue, it would have pop() and push() methods.
The pop() method would be quite simple.
The push() method would:
- check whether a Task with the same name is already in MyQueue (doing a find() in the Map)
- if found: the data stored in the to-be-inserted Task would be merged with the data stored in the found Task
- if not found: the Task would be inserted in the Map and an entry would be added to the Queue
Of course, I'll have to make it safe for concurrent access... and that will certainly be my problem; I am almost sure that this solution won't scale.
So What?
So my question now is: what is the best data structure to use in order to fulfill my requirements?
You could try Heinz Kabutz's Striped Executor Service as a possible candidate.
This magical thread pool ensures that all Runnables with the same stripeClass are executed in the order they were submitted, but StripedRunnables with different stripeClasses can still execute independently.
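Submission might look roughly like this, assuming the StripedRunnable interface from Kabutz's newsletter, where getStripe() returns the striping key (the exact package and import are omitted here, and the key value is an assumption):

// StripedExecutorService and StripedRunnable come from Kabutz's library.
ExecutorService pool = new StripedExecutorService();

pool.submit(new StripedRunnable() {
    @Override
    public Object getStripe() {
        return "producer-A"; // tasks with equal stripes run sequentially
    }

    @Override
    public void run() {
        // process the task here
    }
});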
Instead of making a data structure safe for concurrent access, why not opt out of concurrency and go parallel?
Functional programming models such as MapReduce are a very scalable way to solve this kind of problem.
I understand that D1 and D2 can be analyzed either together or in isolation, and that the only constraint is that they shouldn't be analyzed in the wrong order (I'm making some assumptions here). But if the real problem is only the way the results are combined, there might be an easy solution.
You could remove the constraint altogether, allowing them to be analyzed separately, and then have a reduce function that is able to recombine them in a sensible way.
In this case you'd have the first step as map and the second as reduce.
Even if the computation is more efficient when done in a single pass, a big part of scaling, especially scaling out, is accomplished by denormalization.
If consumers are running in parallel, I doubt there is a way to make them execute tasks with the same name sequentially.
In your example (from comments):
BlockingQueue can really be a problem (unfortunately) if a producer P1 adds a first task T with data D1 and quickly a second task T with data D2. In this case, the first task can be handled by one thread and the second task by another thread; if the thread handling the first task is interrupted, the thread handling the second one can complete first.
There is no difference if P1 submits D2 not so quickly: consumer 1 could still be too slow, so consumer 2 would be able to finish first. Here is an example of such a scenario:
P1: submit D1
C1: read D1
P2: submit D2
C2: read D2
C2: process D2
C1: process D1
To solve it, you will have to introduce some kind of completion detection, which I believe will overcomplicate things.
If you have enough load and can tolerate tasks with different names being processed out of order, then you can use a queue per consumer and route same-named tasks to the same queue.
public class ParallelQueue {
    private final BlockingQueue<Task<?>>[] queues;
    private final int consumersCount;

    public ParallelQueue(int consumersCount) {
        this.consumersCount = consumersCount;
        queues = new BlockingQueue[consumersCount];
        for (int i = 0; i < consumersCount; i++) {
            queues[i] = new LinkedBlockingQueue<>();
        }
    }

    public void push(Task<?> task) {
        // Mask the sign bit: hashCode() may be negative.
        int index = (task.name.hashCode() & 0x7FFFFFFF) % consumersCount;
        queues[index].add(task);
    }

    public Task<?> pop(int consumerId) throws InterruptedException {
        int index = consumerId % consumersCount;
        return queues[index].take();
    }

    private static final class Task<T> {
        private final String name;
        private final T data;

        private Task(String name, T data) {
            this.name = name;
            this.data = data;
        }
    }
}
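A brief usage sketch (the consumer count and the processing body are assumptions; Task would also need to be made accessible, e.g. by widening its visibility):

// Each consumer thread owns one queue index and drains it sequentially,
// so tasks with the same name are processed in arrival order.
ParallelQueue pq = new ParallelQueue(4);

for (int c = 0; c < 4; c++) {
    int consumerId = c;
    new Thread(() -> {
        try {
            while (true) {
                Task<?> task = pq.pop(consumerId);
                // process the task here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }).start();
}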

Java Iterator Concurrency

I'm trying to loop over a Java iterator concurrently, but am having trouble finding the best way to do this.
Here is what I have, where I don't try to do anything concurrently:
Long l;
Iterator<Long> i = getUserIDs();
while (i.hasNext()) {
    l = i.next();
    someObject.doSomething(l);
    anotheObject.doSomething(l);
}
There should be no race conditions between the things I'm doing on the non-iterator objects, so I'm not too worried about that. I'd just like to speed up how long it takes to loop through the iterator by not doing it sequentially.
Thanks in advance.
One solution is to use an executor to parallelise your work.
Simple example:
ExecutorService executor = Executors.newCachedThreadPool();
Iterator<Long> i = getUserIDs();
while (i.hasNext()) {
    final Long l = i.next();
    Runnable task = new Runnable() {
        public void run() {
            someObject.doSomething(l);
            anotheObject.doSomething(l);
        }
    };
    executor.submit(task);
}
executor.shutdown();
This will submit a task for each item in the iterator; the cached thread pool creates new threads as needed and reuses idle ones. You can tune how many threads are used by choosing a different factory method on the Executors class, or subdivide the work as you see fit (e.g. a different Runnable for each of the method calls).
I can offer two possible approaches:
Use a thread pool and dispatch the items received from the iterator to a set of processing threads. This will not accelerate the iterator operations themselves, since those still happen in a single thread, but it will parallelize the actual processing.
Depending on how the iteration is created, you might be able to split the iteration process into multiple segments, each to be processed by a separate thread via a different Iterator object. For an example, have a look at the List.subList(int fromIndex, int toIndex) and List.listIterator(int index) methods.
This would allow the iterator operations to happen in parallel, but it is not always possible to segment the iteration like this, usually due to the simple fact that the items to be iterated over are not immediately available.
As a bonus trick, if the iteration operations are expensive or slow, such as those required to access a database, you might see a throughput improvement if you separate them out into a dedicated thread that uses the iterator to fill a BlockingQueue. The dispatcher thread then only has to access the queue, without waiting on the iterator object to retrieve the next item.
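A sketch of that prefetching trick (the queue capacity is an arbitrary assumption; getUserIDs() is the question's method):

import java.util.Iterator;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// A reader thread pulls from the (slow) iterator and buffers the items,
// so the worker threads never wait on the iterator itself.
BlockingQueue<Long> buffer = new ArrayBlockingQueue<>(1024);

Thread reader = new Thread(() -> {
    try {
        Iterator<Long> it = getUserIDs();
        while (it.hasNext()) {
            buffer.put(it.next());
        }
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }
});
reader.start();

// Workers then consume items with buffer.take().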
The most important advice in this case is this: "Use your profiler", usually to be followed by "Do not optimise prematurely". By using a profiler, such as VisualVM, you should be able to ascertain the exact cause of any performance issues, without taking shots in the dark.
If you are using Java 7, you can use the new fork/join framework; see the tutorial.
Not only does it automatically split the tasks among the threads, but if a thread finishes its tasks earlier than the others, it "steals" tasks from them.
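A minimal sketch of how that might look here, assuming the IDs are first collected into a list (the threshold of 100 is an arbitrary choice; someObject and anotheObject are the question's objects):

import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

class ProcessIds extends RecursiveAction {
    private final List<Long> ids;

    ProcessIds(List<Long> ids) {
        this.ids = ids;
    }

    @Override
    protected void compute() {
        if (ids.size() <= 100) {
            for (Long l : ids) {
                someObject.doSomething(l);
                anotheObject.doSomething(l);
            }
        } else {
            int mid = ids.size() / 2;
            // Split in half; idle workers steal one of the halves.
            invokeAll(new ProcessIds(ids.subList(0, mid)),
                      new ProcessIds(ids.subList(mid, ids.size())));
        }
    }
}

// Usage: new ForkJoinPool().invoke(new ProcessIds(allIds));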
