Get objects from a List using multiple threads - Java

I have a List of 100,000 objects and want to process the List as fast as possible.
I split it into multiple smaller lists of 500 objects each:
List<List<String>> smallerLists = Lists.partition(bigList, 500);
ExecutorService executor = Executors.newFixedThreadPool(smallerLists.size());
for (int i = 0; i < smallerLists.size(); i++) {
    MyXMLConverter xmlList = new MyXMLConverter(smallerLists.get(i));
    executor.execute(xmlList);
}
executor.shutdown();
while (!executor.isTerminated()) {}
MyXMLConverter.java
Inside each task, MyXMLConverter again uses an Executor with 50 threads to process its 500-object list:
public MyXMLConverter(List<String> data) {
    this.data = data;
}

@Override
public void run() {
    try {
        convertLine();
    } catch (Exception ex) {}
}

public void convertLine() {
    ExecutorService executor = Executors.newFixedThreadPool(50);
    for (int i = 0; i < data.size(); i++) {
        MyConverter worker = new MyConverter(data.get(i));
        executor.execute(worker);
    }
    executor.shutdown();
    while (!executor.isTerminated()) {}
}
It's consuming a lot of time fetching the objects from the List. Is there a better way to do this? Please suggest.

Since the processing time of each item may vary, it'd be better to have each worker thread pull the next item to process directly from the main list, in order to keep all threads busy until the end.
Multi-threaded pulling from a shared list is best done using one of the concurrent collections. In your case, ConcurrentLinkedQueue would be a prime candidate.
So, copy your list into a ConcurrentLinkedQueue (or build the "list" directly as a queue), and let your threads call poll() until it returns null.
If building the list of 100,000 elements takes time too, you can even kickstart the process by allowing the worker threads to begin their job while the queue is still being built. For this, you'd use a LinkedBlockingQueue, and the workers would call take().
You'd then add a special element to the queue to mark the end; when a worker gets the end-marker, it puts it back in the queue for the next worker, then exits.
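A minimal sketch of the take()-plus-end-marker variant, assuming the items are Strings and that MyConverter from the question does the per-item work; the pool size and the sentinel object are arbitrary illustrative choices:

int poolSize = 8; // arbitrary; tune to your hardware
BlockingQueue<String> queue = new LinkedBlockingQueue<>();
final String END = new String("END"); // sentinel, compared by identity

ExecutorService executor = Executors.newFixedThreadPool(poolSize);
for (int i = 0; i < poolSize; i++) {
    executor.execute(() -> {
        try {
            while (true) {
                String item = queue.take();  // blocks until an item is available
                if (item == END) {           // end-marker reached:
                    queue.put(item);         // put it back for the next worker
                    return;
                }
                new MyConverter(item).run(); // process the item
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    });
}
// Workers are already consuming while the queue is being filled
for (String s : bigList) queue.add(s);
queue.add(END);
executor.shutdown();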

There are two main problems:
Your code creates 200 * 50 + 200 threads.
Most of them do nothing but busy-wait in an infinite loop: while (!executor.isTerminated()) {}
I suggest using something like this:
ExecutorService executor = Executors.newFixedThreadPool(COUNT_OF_YOUR_PROCESSOR_CORES * 2);
List<Future<?>> futureList = new ArrayList<Future<?>>();
for (String currentString : bigList) {
    MyConverter worker = new MyConverter(currentString);
    Future<?> future = executor.submit(worker);
    futureList.add(future);
}
Collections.reverse(futureList);
for (Future<?> future : futureList) {
    future.get(); // throws InterruptedException and ExecutionException; handle or declare them
}
executor.shutdown(); // no worries: all tasks have already executed at this point
Or, if you're a Java 8 addict:
bigList.parallelStream().forEach(s -> new MyConverter(s).run());

Related

Parallelize a for loop in Java using multi-threading

I am very new to Java and I want to parallelize a nested for loop using ExecutorService or any other method in Java. I want to create some fixed number of threads so that the CPU is not completely taken over by the threads.
for (SellerNames sellerNames : sellerDataList) {
    for (String sellerName : sellerNames) {
        // getSellerAddress(sellerName)
        // parallelize this task
    }
}
size of sellerDataList = 1000 and size of sellerNames = 5000.
Now I want to create 10 threads and assign an equal chunk of the task to each thread. That is, for the i-th element of sellerDataList, the first thread should get addresses for 500 names, the second thread for the next 500 names, and so on.
What is the best way to do this job?
There are two ways to make it run in parallel: Streams and Executors.
Using streams
You can use parallel streams and leave the rest to the JVM. In this case you don't have much control over what happens when; on the other hand, your code will be easy to read and maintain:
sellerDataList.stream().forEach(sellerNames -> {
    Stream<String> stream = StreamSupport.stream(sellerNames.spliterator(), true); // true means parallel stream
    stream.forEach(sellerName -> {
        getSellerAddress(sellerName);
    });
});
Using an ExecutorService
Suppose you want 5 threads and you want to be able to wait until the tasks complete. Then you can use a fixed thread pool with 5 threads and Futures, so you can wait until they are done:
final ExecutorService executor = Executors.newFixedThreadPool(5); // just an arbitrary number
final List<Future<?>> futures = new ArrayList<>();

for (SellerNames sellerNames : sellerDataList) {
    for (final String sellerName : sellerNames) {
        Future<?> future = executor.submit(() -> {
            getSellerAddress(sellerName);
        });
        futures.add(future);
    }
}
try {
    for (Future<?> future : futures) {
        future.get(); // do anything you need, e.g. isDone(), ...
    }
} catch (InterruptedException | ExecutionException e) {
    e.printStackTrace();
}
If you are using a parallel stream you can still control the threads by creating your own ForkJoinPool:
List<Long> aList = LongStream.rangeClosed(firstNum, lastNum).boxed()
        .collect(Collectors.toList());

ForkJoinPool customThreadPool = new ForkJoinPool(4);
long actualTotal = customThreadPool.submit(
        () -> aList.parallelStream().reduce(0L, Long::sum)).get();
This is described very well on this site:
https://www.baeldung.com/java-8-parallel-streams-custom-threadpool

How to wait until all submitted tasks in ExecutorService are completed without shutdown?

Imagine we traverse a collection and submit tasks to be run in the background:
class Processor {
    public void process(Iterable<Item> items, ExecutorService executorService) {
        for (Item item : items) {
            doStandardProcess(item);
            if (needSpecialProcess(item)) {
                executorService.submit(createSpecialTaskFor(item));
            }
        }
    }
}
The program flow looks like:
1. receive items from somewhere
2. create a Processor and process them
3. send the result to somewhere
The result depends on the background processing, so step 3 should wait until all tasks are completed. I know this can be achieved by a combination of shutdown() and awaitTermination(), but I do not want to shut down the service. There is also the possibility of calling invokeAll(List tasks), but as you can see, the tasks are created one by one during the traversal.
How can I achieve waiting for completion with the given restrictions?
P.S. If it was not clear: another restriction is to run the background tasks in parallel with the items traversal, because a background task takes ~100x more time than the basic processing operation.
You can store the futures:
List<Future<?>> futures = new ArrayList<>();
// in the for loop:
futures.add(executorService.submit(createTaskFor(item)));
// after the for loop (plus exception handling for get()):
for (Future<?> f : futures) f.get();
// at this point all tasks have finished
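For the Processor from the question, that looks roughly like this (my wiring of the answer's idea, not code from the answer); the special tasks run in parallel with the traversal and the service is never shut down:

class Processor {
    public void process(Iterable<Item> items, ExecutorService executorService)
            throws InterruptedException, ExecutionException {
        List<Future<?>> futures = new ArrayList<>();
        for (Item item : items) {
            doStandardProcess(item);
            if (needSpecialProcess(item)) {
                // the task starts running in the background immediately
                futures.add(executorService.submit(createSpecialTaskFor(item)));
            }
        }
        for (Future<?> f : futures) {
            f.get(); // blocks until that task completes; the service stays alive
        }
    }
}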
Alternatively, if you can afford to collect the tasks first and only run them afterwards, invokeAll blocks until all tasks complete without shutting the service down:
List<Callable<Foo>> toProcess = new ArrayList<>();
for (Item item : items) {
    if (needProcess(item)) {
        toProcess.add(createTaskFor(item));
    }
}
executorService.invokeAll(toProcess);

Create and add a Runnable only when one or more worker threads are available?

I am executing millions of iterations and I want to parallelize this, so I decided to add each iteration's task to a thread pool.
Now, if I add all the iterations to the thread pool at once, it might throw an OutOfMemoryError. I want to handle that gracefully, so is there any way to know whether a worker thread in the thread pool is available?
Once one is available, add the Runnable to that worker thread.
for (long i = 0; i < 10_000_000_000L; i++) { // long: the count overflows int
    executor.submit(new Task(i));
}
Each of those tasks takes merely 1 second to complete.
Why don't you set a limit on how many tasks can run concurrently? Like:
HashSet<Future<?>> futures = new HashSet<>();
int freeSlots = 1000; // maximum number of tasks in flight

for (long ii = 0; ii < 10_000_000_000L; ) {
    while (freeSlots > 0 && ii < 10_000_000_000L) {
        futures.add(executor.submit(new Task(ii++)));
        freeSlots--;
    }
    // reclaim slots from finished tasks (busy-wait; in real code consider
    // a small sleep or a CompletionService instead)
    Iterator<Future<?>> it = futures.iterator();
    while (it.hasNext()) {
        Future<?> task = it.next();
        if (task.isDone()) {
            freeSlots++;
            it.remove();
        }
    }
}
You'll want to use something like this:
ArrayBlockingQueue<Runnable> queue = new ArrayBlockingQueue<Runnable>(MAX_PENDING_TASKS);
ExecutorService executor = new ThreadPoolExecutor(MIN_THREADS, MAX_THREADS, IDLE_TIMEOUT, TimeUnit.SECONDS,
        queue, new ThreadPoolExecutor.CallerRunsPolicy());

for (long i = 0; i < 10_000_000_000L; i++) {
    executor.submit(new Task(i));
}
Basically you create a thread pool with min/max threads and an array-backed queue. When you hit the limit of pending tasks, the caller-runs policy kicks in and your main thread ends up running the next task itself (giving your other tasks time to complete and open slots in the queue).
Since you've stated that your tasks are short-lived, this seems like an optimal strategy.
The values for MAX_PENDING_TASKS and MIN_THREADS are something you can fiddle with to figure out the optimal values for your workload, but MAX_PENDING_TASKS should be at least twice MIN_THREADS, and probably more like 10 to 100 times.
You should use java.lang.Runtime.
The biggest memory issue is probably going to be your object creation, not adding the tasks to your Executor, so that's where you should be calling Runtime.getRuntime().freeMemory().
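An illustrative sketch of that idea (the 64 MB threshold, the poll interval, and the helper name are my assumptions, not from the answer; Task is assumed to accept a long):

// Hypothetical helper: pause task creation while free heap memory is low.
static void submitWithMemoryCheck(ExecutorService executor, long taskCount)
        throws InterruptedException {
    final long threshold = 64L * 1024 * 1024; // arbitrary: pause below 64 MB free
    for (long i = 0; i < taskCount; i++) {
        while (Runtime.getRuntime().freeMemory() < threshold) {
            Thread.sleep(100); // let running tasks finish and memory be reclaimed
        }
        executor.submit(new Task(i));
    }
}

Note that freeMemory() is only a rough signal, since memory may not be reclaimed until the next GC cycle.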

Recursive concurrency

I have the following function, in pseudo-code:
Result calc(Data data) {
    if (data.isFinal()) {
        return new Result(data); // this is the actual lengthy calculation
    } else {
        List<Result> results = new ArrayList<Result>();
        for (int i = 0; i < data.numOfSubTasks(); ++i) {
            results.add(calc(data.subTask(i)));
        }
        return new Result(results); // merge all results into a single result
    }
}
I want to parallelize it, using a fixed number of threads.
My first attempt was:
ExecutorService executorService = Executors.newFixedThreadPool(numOfThreads);

Result calc(Data data) {
    if (data.isFinal()) {
        return new Result(data); // this is the actual lengthy calculation
    } else {
        List<Result> results = new ArrayList<Result>();
        List<Callable<Void>> callables = new ArrayList<Callable<Void>>();
        for (int i = 0; i < data.numOfSubTasks(); ++i) {
            callables.add(new Callable<Void>() {
                public Void call() {
                    results.add(calc(data.subTask(i)));
                    return null;
                }
            });
        }
        executorService.invokeAll(callables); // wait for all sub-tasks to complete
        return new Result(results); // merge all results into a single result
    }
}
However, this quickly got stuck in a deadlock, because, while the top recursion level waits for all threads to finish, the inner levels also wait for threads to become available...
How can I efficiently parallelize my program without deadlocks?
Your problem is a general design problem when using ThreadPoolExecutor for tasks with dependencies.
I see two options:
1) Make sure to submit tasks in a bottom-up order, so that you never have a running task that depends on a task that hasn't started yet.
2) Use the "direct handoff" strategy (See ThreadPoolExecutor documentation):
ThreadPoolExecutor executor = new ThreadPoolExecutor(poolSize, poolSize, 0, TimeUnit.SECONDS,
        new SynchronousQueue<Runnable>());
executor.setRejectedExecutionHandler(new ThreadPoolExecutor.CallerRunsPolicy());
The idea is to use a synchronous queue so that tasks never wait in a real queue. The rejection handler takes care of tasks which don't have an available thread to run on; with this particular handler, the submitting thread runs the rejected tasks itself.
This executor configuration guarantees that tasks are never dropped, and that you never have deadlocks due to inter-task dependencies.
You should split your approach into two phases:
create the whole tree down until data.isFinal() == true
recursively collect the results (only possible if the merging does not produce other operations/calls)
To do that, you can use Futures to make the results asynchronous, meaning all results of calc will be of type Future<Result>.
Immediately returning a Future frees the current thread and gives room for the processing of others. When collecting the results (new Result(results)) you should wait for all of them to be ready (scatter-gather pattern; you can use a semaphore to wait for all results). The collection itself walks the tree, and the checking (or waiting for results to arrive) happens in a single thread.
Overall you build a tree of Futures that is used to collect the results, and only the "expensive" operations are performed in the thread pool.
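A third option, not mentioned in the answers above, is the fork/join framework, which is designed for exactly this recursive shape: a ForkJoinPool thread that joins its children lends itself to pending subtasks instead of blocking, which avoids the deadlock. A minimal sketch reusing the Data and Result types from the question:

class CalcTask extends RecursiveTask<Result> {
    private final Data data;

    CalcTask(Data data) { this.data = data; }

    @Override
    protected Result compute() {
        if (data.isFinal()) {
            return new Result(data); // the actual lengthy calculation
        }
        List<CalcTask> subTasks = new ArrayList<>();
        for (int i = 0; i < data.numOfSubTasks(); ++i) {
            CalcTask task = new CalcTask(data.subTask(i));
            task.fork(); // schedule the subtask asynchronously
            subTasks.add(task);
        }
        List<Result> results = new ArrayList<>();
        for (CalcTask task : subTasks) {
            results.add(task.join()); // may run queued subtasks while waiting
        }
        return new Result(results); // merge all results into a single result
    }
}

// usage:
Result result = new ForkJoinPool(numOfThreads).invoke(new CalcTask(rootData));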

Java Multithreading: How to use multi-threading in different ArrayList containing record info?

I retrieved 50,000 records from the database and stored them in an ArrayList. I split the ArrayList in half: 25,000 stored in ArrayList1 (even rows) and the other 25,000 in ArrayList2 (odd rows).
Now I need to use multi-threading to process these such that all 50,000 records are processed at a time. The main aim is to speed up the transaction.
The problem is that userList gets too heavy and takes time.
How can I implement ExecutorService to speed this up?
Hoping to receive your suggestions asap.
List<String[]> userList = new ArrayList<String[]>();

void getRecords() {
    String[] props = null;
    while (rs.next()) {
        props = new String[2];
        props[0] = rs.getString("useremail");
        props[1] = rs.getString("active");
        userList.add(props);
        if (userList.size() > 0) sendEmail();
    }
}

void sendEmail() {
    String[] user = null;
    for (int k = 0; k < userList.size(); k++) {
        user = userList.get(k);
        userEmail = user[0];
        // send email code
    }
}
Thanks in advance.
There's a simpler approach: producer-consumer. Leave all items in a single list and define a processing task that encapsulates a data item:
class Task implements Runnable {
    private Object data;

    public Task(Object data) {
        this.data = data;
    }

    public void run() {
        // process data
    }
}
Create a thread pool and feed it the tasks one by one:
ExecutorService exec = Executors.newFixedThreadPool(4); // 4 threads
for (Object obj : itemList) {
    exec.submit(new Task(obj));
}
exec.shutdown();
exec.awaitTermination(Long.MAX_VALUE, TimeUnit.DAYS); // throws InterruptedException; handle or declare it
Now you have parallel execution and, importantly, load balancing: the threads pick up work on demand as they finish previous tasks. By splitting the array into contiguous sections you don't have this guarantee.
I would create an ArrayList for each Thread. That way each thread only reads one list and you won't have a multi-threading issue.
ExecutorService service = ...
List<Work> workList = ...
int threads = ...

int blockSize = (workList.size() + threads - 1) / threads;
for (int i = 0; i < threads; i++) {
    int start = i * blockSize;
    int end = Math.min((i + 1) * blockSize, workList.size());
    final List<Work> someWork = workList.subList(start, end);
    service.submit(new Runnable() {
        public void run() {
            process(someWork);
        }
    });
}
You can use any number of threads, but I suggest using the smallest number which gives you a performance increase.
I don't know why you've split the list into two lists. Why not keep them in one, and run two threads: one processing the even rows, one processing the odd rows?
Regardless, check out the Java Executor framework. It allows you to easily write jobs and submit them for running (using thread pools, scheduling them, etc.). Given that the Executor framework can handle arbitrary numbers of threads, I would split your workload more intelligently (perhaps into sublists of n elements) and determine (by varying the number of jobs/threads) which configuration runs fastest in your particular scenario.
I would use a Queue instead of a List, probably a ConcurrentLinkedQueue. That is thread-safe and thus allows concurrent access from different threads.
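A minimal sketch of that queue-based variant, assuming the String[] records and per-user email logic from the question (the pool size is an arbitrary choice):

// Drain the shared queue from several worker threads; poll() is thread-safe
// and returns null once the queue is empty.
ConcurrentLinkedQueue<String[]> userQueue = new ConcurrentLinkedQueue<>(userList);
ExecutorService executor = Executors.newFixedThreadPool(4); // arbitrary pool size
for (int i = 0; i < 4; i++) {
    executor.execute(() -> {
        String[] user;
        while ((user = userQueue.poll()) != null) {
            String userEmail = user[0];
            // send email code for userEmail
        }
    });
}
executor.shutdown();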
