I have a single-threaded model job which iterates over a collection of data and customizes each element. I want to divide the collection into small sublists and have each individual sublist processed in parallel. Should I use an array of threads (where the size of the array is the number of sublists created), or a thread pool?
On what basis are you going to divide the collection further? If your jobs/data are all of the same type, keep them in one collection and let the threads of a thread pool pick tasks off the list and run them in parallel.
It is better to use a thread pool in any case, because it frees you from low-level management of arrays of thread objects and gives you more flexibility.
Use an ExecutorService instance in your code and choose the right type for your workload.
For example:
Executors.newCachedThreadPool - if your processing logic is simple and each task is short-lived (an unbounded pool can otherwise spawn so many concurrent threads that they start causing errors, e.g. OutOfMemoryError).
Executors.newFixedThreadPool - if your processing logic is complex enough that you should limit the number of threads.
So, I think that you should:
Create the required ExecutorService in your consumer.
Go through your collection and submit a processing job (an instance of Callable) to the executor for each element. Save the returned futures in a List<Future<?>> instance.
Iterate through the futures (waiting for all tasks to complete), save the results in a new collection, send the results wherever they need to go, and commit the Kafka offset. A minimal sketch of these steps follows.
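To make those steps concrete, here is a minimal sketch. The element type and the process() method are placeholders for your own customization logic, and the Kafka plumbing is elided:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class ParallelCustomizer {

        // Placeholder for your real customization logic.
        static String process(String element) {
            return element.toUpperCase();
        }

        public static void main(String[] args) throws Exception {
            List<String> data = List.of("a", "b", "c", "d");
            ExecutorService executor = Executors.newFixedThreadPool(4);

            // Submit one Callable per element and keep the Futures.
            List<Future<String>> futures = new ArrayList<>();
            for (String element : data) {
                futures.add(executor.submit(() -> process(element)));
            }

            // Wait for all tasks and collect the results in order.
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get()); // blocks until this task is done
            }
            executor.shutdown();

            // ...send results somewhere, then commit the Kafka offset...
            System.out.println(results);
        }
    }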
How do I set up some thread-local storage for each worker thread in the default (common) fork-join pool, and how do I access that TLS from the main thread?
I need to implement the "collector" pattern for Java parallel streams, which is like reduce in MapReduce, but with partial reductions of associative operations grouped within each thread, prior to one final reduction step at the end. (Note that I do not want to use the MapReduce pattern directly, for performance reasons -- collectors cut down on the amount of data sent from mappers to reducers.)
Basically I need each thread in the common fork-join thread pool to have a large array of accumulators associated with it, and as the mappers run using Stream.parallel().forEach(...), each thread should collect the resulting value by updating some bin in its own accumulator in a lock-free way. At the end of the operation, I want the calling thread (the main thread) to have access to the accumulators from each worker thread, so that it can do a final reduce to combine all the accumulators into a single accumulator array.
My idea is to use a ConcurrentSkipListMap indexed by thread name to store each thread-local accumulator (a sketch of this follows), but the per-element map lookup adds quite a lot of overhead, so it's not ideal.
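For reference, here is a minimal sketch of that ConcurrentSkipListMap idea, assuming a hypothetical binFor() mapping from value to bin; each worker thread updates only its own array, and the calling thread merges them after the terminal operation returns:

    import java.util.concurrent.ConcurrentSkipListMap;
    import java.util.stream.IntStream;

    public class PerThreadAccumulators {
        static final int NUM_BINS = 1024;
        static final ConcurrentSkipListMap<String, long[]> accumulators =
                new ConcurrentSkipListMap<>();

        // Placeholder binning function: maps a value to an accumulator bin.
        static int binFor(int value) {
            return Math.floorMod(value, NUM_BINS);
        }

        public static void main(String[] args) {
            IntStream.range(0, 1_000_000).parallel().forEach(value -> {
                // Each worker thread updates only its own array, so no
                // locking is needed on the increments themselves.
                long[] local = accumulators.computeIfAbsent(
                        Thread.currentThread().getName(), k -> new long[NUM_BINS]);
                local[binFor(value)]++;
            });

            // Final reduction on the calling thread, after the stream completes:
            // merge all per-thread accumulators into one array.
            long[] total = new long[NUM_BINS];
            for (long[] local : accumulators.values()) {
                for (int i = 0; i < NUM_BINS; i++) {
                    total[i] += local[i];
                }
            }
            System.out.println("bin 0 count = " + total[0]);
        }
    }

Note that the calling thread may itself execute part of a parallel stream, but since the map is keyed by each thread's name, its accumulator is included in the final merge as well.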
I'm slightly confused by the internal scheduling mechanism of the ExecutorService and the ForkJoinPool.
I understand that ExecutorService scheduling works this way:
A bunch of tasks are queued. Once a thread is available it will handle the first available task and so forth.
Meanwhile, a ForkJoinPool is presented as distinct because it uses a work-stealing algorithm. If I understand correctly it means a thread can steal some tasks from another thread.
Yet, I don't really understand the difference between the mechanism implemented in ExecutorService and in ForkJoinPool. From my understanding, both mechanisms should reduce the idle time of each thread as much as possible.
I would understand it if, in the case of an ExecutorService, each thread had its own queue. Yet that is not the case: the queue is shared by the different threads of the pool...
Any clarification would be more than welcome!
Suppose you have a very big array of ints and you want to add all of them. With an ExecutorService you might say: let's divide that array into chunks, say 4 chunks per thread. So if you have an array of 160 elements (and you have 4 CPUs), each chunk holds 160 / 4 / 4 = 10 ints, and you create 16 such chunks. Create runnables/callables for them and submit those to an executor service (and of course think of a way to merge the results once they are done).
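A hedged sketch of that chunking (merging via invokeAll and Future.get is one choice among several):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class ChunkedSum {
        public static void main(String[] args) throws Exception {
            int[] array = new int[160];
            for (int i = 0; i < array.length; i++) array[i] = i;

            int threads = 4;
            int chunkSize = array.length / (threads * 4); // 10 ints per chunk

            ExecutorService executor = Executors.newFixedThreadPool(threads);
            List<Callable<Long>> chunks = new ArrayList<>();
            for (int start = 0; start < array.length; start += chunkSize) {
                final int from = start;
                final int to = Math.min(start + chunkSize, array.length);
                chunks.add(() -> {
                    long sum = 0;
                    for (int i = from; i < to; i++) sum += array[i];
                    return sum;
                });
            }

            // Merge the partial results once all chunks are done.
            long total = 0;
            for (Future<Long> partial : executor.invokeAll(chunks)) {
                total += partial.get();
            }
            executor.shutdown();
            System.out.println(total); // 12720 for 0..159
        }
    }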
Now your hope is that each of the CPUs will take 4 of those tasks and work on them. Let's also suppose that some of the numbers are very complicated to add (they aren't, of course, but bear with me): it could turn out that 3 threads/CPUs are done with their work while one of them is still busy with its first chunk. No one wants that, of course, but it could happen. The bad thing is that now you can't do anything about it.
What ForkJoinPool does instead is say: tell me how you want to split your task, give me the implementation for the minimal unit of work, and I'll take care of the rest. In the Stream API this is done with Spliterators, mainly via two methods: trySplit (which returns either null, meaning nothing more can be split, or a new Spliterator, meaning a new chunk) and forEachRemaining (which processes elements once the task can't be split anymore). And this is where work stealing helps you.
You say how your chunks are computed (usually by splitting in half) and what to do when you can't split anymore. ForkJoinPool dispatches the first chunks to its threads, and when some of them are free (done with their own work), they query the queues of the other threads to see if there is work left. If they notice chunks in some other thread's queue, they steal them, split them further on their own, and work on those. They may not even finish those chunks entirely themselves: yet another thread can query this thread's queue, notice there is still work to do, and so on. This is far better, because now those 3 free threads can pick up other work, and all of them stay busy.
This example is a bit simplified, but it is not far from reality. You just need far more chunks than CPUs/threads for work stealing to work; thus trySplit usually needs a smart implementation and you need lots of elements in the source of your stream. A minimal RecursiveTask sketch of the split-in-half idea follows.
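In this sketch, a size threshold stands in for "can't split anymore"; forked halves sit in the worker's own deque, where idle threads can steal them:

    import java.util.concurrent.ForkJoinPool;
    import java.util.concurrent.RecursiveTask;

    public class ForkJoinSum extends RecursiveTask<Long> {
        private static final int THRESHOLD = 10; // "can't split anymore" size
        private final int[] array;
        private final int from, to;

        ForkJoinSum(int[] array, int from, int to) {
            this.array = array;
            this.from = from;
            this.to = to;
        }

        @Override
        protected Long compute() {
            if (to - from <= THRESHOLD) {
                // Minimal unit of work: sum the chunk directly.
                long sum = 0;
                for (int i = from; i < to; i++) sum += array[i];
                return sum;
            }
            // Split in half; the forked half may be stolen by an idle worker.
            int mid = (from + to) / 2;
            ForkJoinSum left = new ForkJoinSum(array, from, mid);
            ForkJoinSum right = new ForkJoinSum(array, mid, to);
            left.fork();
            return right.compute() + left.join();
        }

        public static void main(String[] args) {
            int[] array = new int[160];
            for (int i = 0; i < array.length; i++) array[i] = i;
            System.out.println(ForkJoinPool.commonPool()
                    .invoke(new ForkJoinSum(array, 0, array.length))); // 12720
        }
    }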
I have a situation in which I have to run some 10,000 threads. Obviously, one machine cannot run that many threads in parallel. Is there any way to ask the thread pool to run a specific number of threads at the start and, as soon as one finishes, start processing the threads that are left?
Executors.newFixedThreadPool(nThreads) is most likely what you are looking for: only as many threads as the number you specify will be running at one time. And yes, one machine cannot run 10,000 threads in parallel, but it can run them concurrently. Depending on how resource-intensive each task is, it may be more efficient in your case to use Executors.newCachedThreadPool(), wherein as many threads are created as needed and threads that have finished are reused.
Using Executors.newFixedThreadPool(10000) with invokeAll will throw an OutOfMemoryError with that many threads. You can still use a fixed pool by submitting tasks to it one by one instead of invoking all tasks at the same time, which I would say is safer than invokeAll. A sketch of this follows.
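In this sketch, the pool size of 10 and the print statement are arbitrary choices for illustration; queued tasks simply wait until a thread is free:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class TenThousandTasks {
        public static void main(String[] args) throws InterruptedException {
            ExecutorService pool = Executors.newFixedThreadPool(10); // ~10 threads, not 10,000

            for (int i = 0; i < 10_000; i++) {
                final int taskId = i;
                // Each task waits in the pool's queue until a thread is free.
                pool.submit(() -> System.out.println("task " + taskId
                        + " on " + Thread.currentThread().getName()));
            }

            pool.shutdown(); // accept no new tasks; queued ones still run
            pool.awaitTermination(1, TimeUnit.MINUTES);
        }
    }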
For this use case you can have a ThreadPoolExecutor with a BlockingQueue. This tutorial explains it very well: http://howtodoinjava.com/core-java/multi-threading/how-to-use-blockingqueue-and-threadpoolexecutor-in-java/
It sounds like you want to run 10,000 tasks on a group of threads. A relatively simple approach is to create a List and add all the tasks to it, wrapped in Runnables. Then create a class that takes the list in its constructor, pops a Runnable off the list, and runs it. This pop-and-run activity must be synchronized in some manner, and the class exits when the list is empty. Start some number of threads using this class; they'll burn down the list and then stop. Your main thread can monitor the length of the list. A minimal sketch follows.
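The sketch below uses a BlockingQueue rather than a manually synchronized List, so the queue handles the required synchronization; otherwise it is the same pattern:

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    public class WorkListRunner implements Runnable {
        private final BlockingQueue<Runnable> work;

        WorkListRunner(BlockingQueue<Runnable> work) {
            this.work = work;
        }

        @Override
        public void run() {
            Runnable task;
            // poll() returns null once the list is empty, so the thread exits.
            while ((task = work.poll()) != null) {
                task.run();
            }
        }

        public static void main(String[] args) {
            BlockingQueue<Runnable> work = new LinkedBlockingQueue<>();
            for (int i = 0; i < 10_000; i++) {
                final int id = i;
                work.add(() -> System.out.println("task " + id));
            }
            for (int i = 0; i < 10; i++) { // start some number of threads
                new Thread(new WorkListRunner(work)).start();
            }
            // The main thread can monitor progress via work.size().
        }
    }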
When I have hundreds of items to iterate through, and I have to do a computation-heavy operation to each one, I would take a "divide and conquer" approach. Essentially, I would take the processor count + 1, and divide those items into the same number of batches. And then I would execute each batch on a runnable in a cached thread pool. It seems to work well. My GUI task went from 20 seconds to 2 seconds, which is a much better experience for the user.
However, I was reading Brian Goetz's fine book on concurrency, and I noticed that for iterating through a list of items he takes a totally different approach: he kicks off a Runnable for each item! I had always assumed this would be bad, especially on a cached thread pool, which could create tons of threads. However, each runnable would probably finish very quickly in the larger scope, and I understand the cached thread pool is very well suited to short tasks.
So which is the more accepted paradigm to iterate through computation-heavy items? Dividing into a fixed number of batches and giving each batch a runnable? Or kicking each item off in its own runnable? If the latter approach is optimal, is it okay to use a cached thread pool or is it better to use a bounded thread pool?
With batches you always have to wait for the longest-running batch (you are only as fast as the slowest batch). "Divide and conquer" also implies management overhead: doing administration for the dividing and monitoring the conquering.
Creating a task for each item is relatively straightforward (no management), but you are right that it may start hundreds of threads (unlikely, but it could happen), which will only slow things down (context switching) if the tasks do little or no I/O and are mostly CPU-intensive.
If the cached thread pool does not start hundreds of threads (check getLargestPoolSize), then by all means use the cached thread pool. If too many threads are started, one alternative is a bounded thread pool. But a bounded thread pool needs some tuning/decisions: do you use an unbounded task queue, or a bounded task queue with, for example, a CallerRunsPolicy? A sketch of the latter follows.
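This sketch assumes a bounded task queue of 100 and a hypothetical process() method standing in for the computation-heavy operation:

    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    public class BoundedPoolExample {
        public static void main(String[] args) throws InterruptedException {
            int threads = Runtime.getRuntime().availableProcessors() + 1;
            ThreadPoolExecutor pool = new ThreadPoolExecutor(
                    threads, threads,
                    0L, TimeUnit.MILLISECONDS,
                    new LinkedBlockingQueue<>(100),              // bounded queue
                    new ThreadPoolExecutor.CallerRunsPolicy());  // back-pressure

            for (int i = 0; i < 1_000; i++) {
                final int item = i;
                pool.submit(() -> process(item)); // one task per item
            }

            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
            System.out.println("largest pool size: " + pool.getLargestPoolSize());
        }

        static void process(int item) {
            // stand-in for the computation-heavy operation
        }
    }

With CallerRunsPolicy, a full queue makes submit() run the task on the submitting thread itself, which naturally throttles submission.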
On a side note: there is also the ForkJoinPool which is suitable for tasks that start sub-tasks.
I'm working on a project where there is a large input of data elements that need to be processed. The processing of each element is independent of the others, and I need to return a result from each. What I'm doing now is creating a Callable task for each element to do the processing and using an ExecutorCompletionService to collect the Future results as the threads complete.
I then have another thread that pulls the Future objects from the ExecutorCompletionService queue. This thread just spins in an infinite while loop and calls take(), which blocks until a Future shows up in the queue.
What I'm trying to do is avoid the scenario where the queue of Future objects grows faster than I pull them off the queue so I'd like to sleep the process that's creating tasks if I get behind on processing the Future results.
The problem I'm running into is that I'm not able to find a way to see how many Future objects are in the ExecutorCompletionService queue. Is there a way to do this?
I could probably keep an external counter that I increment when a new task is created and decrement when a Future is processed but this only gets me to the number of outstanding tasks, not the number that are actually done. Any thoughts on the best way to tackle this?
You can pass in the queues the executor and the completion service use via their overloaded constructors. Since those queues implement Collection, you can just call size() on them. You will have one queue for completion and another for the executor that the ExecutorCompletionService wraps, so between those two you can tell how many tasks have been submitted and how many have completed.
You'll just need to hold on to those queues after you create them and pass them to whatever is watching their size.
The Javadoc shows the overloaded constructor: http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ExecutorCompletionService.html. A minimal sketch of this approach follows.
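A minimal sketch, assuming both queues are unbounded LinkedBlockingQueues created up front and kept for monitoring:

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.ExecutorCompletionService;
    import java.util.concurrent.Future;
    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    public class CompletionQueueMonitor {
        public static void main(String[] args) throws Exception {
            BlockingQueue<Runnable> taskQueue = new LinkedBlockingQueue<>();
            BlockingQueue<Future<Integer>> doneQueue = new LinkedBlockingQueue<>();

            ThreadPoolExecutor executor = new ThreadPoolExecutor(
                    4, 4, 0L, TimeUnit.MILLISECONDS, taskQueue);
            ExecutorCompletionService<Integer> completion =
                    new ExecutorCompletionService<>(executor, doneQueue);

            for (int i = 0; i < 100; i++) {
                final int n = i;
                completion.submit(() -> n * n);
            }

            // taskQueue.size() = tasks still waiting to run;
            // doneQueue.size() = finished results not yet taken.
            System.out.println("pending: " + taskQueue.size()
                    + ", completed: " + doneQueue.size());

            for (int i = 0; i < 100; i++) {
                completion.take().get(); // consume results as they finish
            }
            executor.shutdown();
        }
    }

One caveat: taskQueue.size() counts only waiting tasks, since workers remove a task from the queue before running it, so tasks currently executing appear in neither queue.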