I have a pre-populated set of strings. I want to iterate over the items and, while iterating, "do work" on each one, which might also remove that item from the set. I want to spawn a new thread for each item's "do work". Note that only some items are removed from the set during "do work".
My question is:
Can I achieve this simply by using Collections.synchronizedSet(new HashSet()); ? I am guessing this will throw ConcurrentModificationException, since I am removing items from the set while iterating over it. How can I achieve the above behavior efficiently and without consistency issues?
Thanks!
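I would use an ExecutorService:
ExecutorService es = Executors.newFixedThreadPool(n);
List<Future<String>> toRemove = new ArrayList<>();
// each Task returns the item if it should be removed, or null otherwise
for (String s : set)
    toRemove.add(es.submit(new Task(s)));
// back on the calling thread: get() blocks until the task has finished
for (Future<String> future : toRemove) {
    String s = future.get();
    if (s != null)
        set.remove(s);
}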
This avoids needing to access the collection in a multi-threaded way.
Use a master producer thread that will remove the elements from the collection and will feed them to consumer threads. The consumer threads have no need to "personally" remove the items.
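Yes, a set created with Collections.synchronizedSet will still throw ConcurrentModificationException if it is modified while you are iterating over it.
Try this:
Set<String> s = Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());
The iterators of a ConcurrentHashMap are weakly consistent, so they never throw ConcurrentModificationException, even when multiple threads are accessing and modifying the map.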
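As a rough illustration of the ConcurrentHashMap-backed set, here is a minimal sketch in which worker threads remove items while the calling thread is still iterating. The class name, the shouldRemove rule and the pool size of 4 are assumptions made up for the example, not part of the question.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class ConcurrentSetRemoval {

    // Hypothetical "do work": an item is removed when it starts with "tmp".
    static boolean shouldRemove(String item) {
        return item.startsWith("tmp");
    }

    static Set<String> process(Collection<String> initial) {
        Set<String> set = ConcurrentHashMap.newKeySet();
        set.addAll(initial);
        ExecutorService pool = Executors.newFixedThreadPool(4);
        // The key-set iterator is weakly consistent: removals made by the
        // workers during this loop do not cause ConcurrentModificationException.
        for (String item : set) {
            pool.execute(() -> {
                if (shouldRemove(item)) {
                    set.remove(item);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return set;
    }

    public static void main(String[] args) {
        // Prints the surviving items; iteration order is unspecified.
        System.out.println(process(Arrays.asList("tmp1", "keep1", "tmp2", "keep2")));
    }
}
```

Each worker only removes the item it was handed, so no two threads compete over the same element.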
The approach depends on the relation between the data in your set and the successful completion of the operation.
Remove from Set is independent of the result of task execution
If you don't care about the actual result of the thread execution, you can just go through the set and remove every item as you dispatch the task (you have some examples of that already)
Remove from Set only if task execution completed successfully
If the deletion from the set should be transactional to the success of the execution, you could use Futures to collect information about the success of the task execution. That way, only successfully executed items will be deleted from the original set. There's no need to access the Set structure concurrently, as you can separate execution from check using Futures and an ExecutorService . eg:
// This task will execute the job and,
// if successful, return the string used as context
class Task implements Callable<String> {
final String target;
Task(String s) {
this.target = s;
}
@Override
public String call() throws Exception {
// do your stuff
// throw an exception if failed
return target;
}
}
And this is how it's used:
ExecutorService executor = Executors.newCachedThreadPool(); // any pool that suits your workload will do
Set<Callable<String>> myTasks = new HashSet<Callable<String>>();
for (String s : set) {
    myTasks.add(new Task(s));
}
// invokeAll blocks until every task has completed (it may throw InterruptedException)
List<Future<String>> results = executor.invokeAll(myTasks);
for (Future<String> result : results) {
    try {
        set.remove(result.get());
    } catch (ExecutionException ee) {
        // the task failed during execution - handle as required
    } catch (CancellationException ce) {
        // the task was cancelled - handle as required
    }
}
I am just learning and trying to apply CompletableFuture to my problem statement. I have a list of items I am iterating over.
Prop is a class with only two attributes prop1 and prop2, respective getters and setters.
List<Prop> result = new ArrayList<>();
for ( Item item : items ) {
item.load();
Prop temp = new Prop();
// once the item is loaded, get its properties
temp.setProp1(item.getProp1());
temp.setProp2(item.getProp2());
result.add(temp);
}
return result;
However, item.load() here is a blocking call. So, I was thinking to use CompletableFuture something like below -
for (Item item : items) {
CompletableFuture<Prop> prop = CompletableFuture.supplyAsync(() -> {
try {
item.load();
return item;
} catch (Exception e) {
logger.error("Error");
return null;
}
}).thenApply(item1 -> {
try {
Prop temp = new Prop();
// once the item is loaded, get its properties
temp.setProp1(item1.getProp1());
temp.setProp2(item1.getProp2());
return temp;
} catch (Exception e) {
return null;
}
});
}
But I am not sure how I can wait for all the items to be loaded and then aggregate and return their result.
I may be completely wrong in the way of implementing CompletableFutures since this is my first attempt. Please pardon any mistake. Thanks in advance for any help.
There are two issues with your approach of using CompletableFuture.
First, you say item.load() is a blocking call, so the CompletableFuture’s default executor is not suitable for it, as it tries to achieve a level of parallelism matching the number of CPU cores. You could solve this by passing a different Executor to CompletableFuture’s asynchronous methods, but your load() method doesn’t return a value that your subsequent operations rely on. So the use of CompletableFuture complicates the design without a benefit.
You can perform the load() invocations asynchronously and wait for their completion just using an ExecutorService, followed by the loop as-is (without the already performed load() operation, of course):
ExecutorService es = Executors.newCachedThreadPool();
es.invokeAll(items.stream()
.map(i -> Executors.callable(i::load))
.collect(Collectors.toList()));
es.shutdown();
List<Prop> result = new ArrayList<>();
for(Item item : items) {
Prop temp = new Prop();
// once the item is loaded, get its properties
temp.setProp1(item.getProp1());
temp.setProp2(item.getProp2());
result.add(temp);
}
return result;
You can control the level of parallelism through the choice of executor, e.g. you could use Executors.newFixedThreadPool(numberOfThreads) instead of the unbounded cached thread pool.
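If you would still like to use CompletableFuture, one possible shape is to pass your own executor to supplyAsync and wait for everything with allOf. The following is only a sketch under assumptions: LoadAllAsync, mapAll and the blockingStep function are hypothetical names standing in for the load-then-extract step, and failures simply propagate as a CompletionException from join().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

final class LoadAllAsync {

    // Runs blockingStep for every input on a dedicated pool and returns the
    // results in input order.
    static <T, R> List<R> mapAll(List<T> inputs, Function<T, R> blockingStep, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<R>> futures = new ArrayList<>();
            for (T input : inputs) {
                // The second argument replaces the default (common pool) executor.
                futures.add(CompletableFuture.supplyAsync(() -> blockingStep.apply(input), pool));
            }
            // Wait until every future has completed.
            CompletableFuture.allOf(futures.toArray(new CompletableFuture<?>[0])).join();
            List<R> results = new ArrayList<>();
            for (CompletableFuture<R> future : futures) {
                results.add(future.join()); // already complete, does not block
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // String::length stands in for "load the item and build a Prop".
        System.out.println(mapAll(Arrays.asList("a", "bb", "ccc"), String::length, 2)); // [1, 2, 3]
    }
}
```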
I have method as
public List<SenderResponse> sendAllFiles(String folderName) {
List<File> allFiles = getListOfFiles();
List<SenderResponse> finalResponse = new ArrayList<SenderResponse>();
for (File file : allFiles) {
finalResponse.getResults().add(sendSingleFile(file));
}
return finalResponse;
}
which runs on a single thread. I want to run sendSingleFile(file) on multiple threads so I can reduce the total time taken to send the files.
How can I run sendSingleFile(file) on multiple threads for the various files and get the final response?
I found a few articles using ThreadPoolExecutor. But how do I handle the response returned by sendSingleFile(file) and add it to one final SenderResponse?
I am fairly new to multithreading. Please suggest the best way to process these files.
Define an executor service
ExecutorService executor = Executors.newFixedThreadPool(MAX_THREAD); //Define integer value of MAX_THREAD
Then for each job you can do something like this:-
Callable<SenderResponse> task = () -> {
try {
return sendSingleFile(file);
}
catch (InterruptedException e) {
throw new IllegalStateException("Interrupted", e);
}
};
Future<SenderResponse> future = executor.submit(task);
future.get(MAX_TIME_TO_WAIT, TimeUnit.SECONDS); //Blocking call. MAX_TIME_TO_WAIT is max time future will wait for the process to execute.
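Note that calling get() right after each submit() would make the files go out one at a time. A minimal sketch of submitting everything first and collecting afterwards is shown here; the String-based sendSingleFile is a hypothetical stand-in for the real method, and turning a failure into a "failed: ..." entry is just one possible policy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class SubmitThenCollect {

    // Hypothetical stand-in for sendSingleFile(file).
    static String sendSingleFile(String file) {
        return "sent " + file;
    }

    static List<String> sendAll(List<String> files, int maxThreads, long maxSecondsPerFile) {
        ExecutorService executor = Executors.newFixedThreadPool(maxThreads);
        try {
            // Phase 1: submit every job so they can run concurrently.
            List<Future<String>> futures = new ArrayList<>();
            for (String file : files) {
                Callable<String> task = () -> sendSingleFile(file);
                futures.add(executor.submit(task));
            }
            // Phase 2: collect the responses in submission order.
            List<String> responses = new ArrayList<>();
            for (Future<String> future : futures) {
                try {
                    responses.add(future.get(maxSecondsPerFile, TimeUnit.SECONDS));
                } catch (ExecutionException | TimeoutException e) {
                    responses.add("failed: " + e);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                }
            }
            return responses;
        } finally {
            executor.shutdownNow(); // also interrupts tasks that timed out
        }
    }

    public static void main(String[] args) {
        System.out.println(sendAll(Arrays.asList("a.txt", "b.txt"), 2, 5));
    }
}
```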
You start by writing code that works for the single-threaded solution. The code you posted wouldn't even compile: the method signature says to return SenderResponse, whereas you use and return a List<SenderResponse> within the method!
When that stuff works, you continue with this:
You create an instance of ExecutorService, based on as many threads as you want.
You submit tasks into that service.
Each task knows about that result list object. The task does its work, and adds the result to that result list.
The one point to be careful about: making sure that add() is synchronized somehow - having multiple threads update an ordinary ArrayList is not safe.
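The steps above could be sketched roughly as follows, using Collections.synchronizedList to make add() safe. The String-based sendSingleFile is a hypothetical stand-in for the real method, and the order of the results depends on which task finishes first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class SharedResultList {

    // Hypothetical stand-in for sendSingleFile(file).
    static String sendSingleFile(String file) {
        return "sent " + file;
    }

    static List<String> sendAllFiles(List<String> files, int threads) {
        // Every add() on this wrapper is synchronized, so tasks may share it.
        List<String> results = Collections.synchronizedList(new ArrayList<>());
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        for (String file : files) {
            executor.execute(() -> results.add(sendSingleFile(file)));
        }
        executor.shutdown(); // no new tasks; already submitted ones still run
        try {
            if (!executor.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return results;
    }

    public static void main(String[] args) {
        // Order of the printed entries may vary from run to run.
        System.out.println(sendAllFiles(Arrays.asList("a", "b", "c"), 3));
    }
}
```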
For your situation, I would use a work stealing pool (ForkJoin executor service) and submit "jobs" to it. If you're using Guava, you can wrap that in a listeningDecorator, which lets you add a listener to the futures it returns.
Example:
// create the executor service
ListeningExecutorService exec = MoreExecutors.listeningDecorator(Executors.newWorkStealingPool());
for(Foo foo : bar) {
// submit can accept Runnable or Callable<T>
final ListenableFuture<T> future = exec.submit(() -> doSomethingWith(foo));
// Run something when it is complete.
future.addListener(() -> doSomeStuff(future), exec);
}
Note that the listener will be called whether the future was successful or not.
I have a List of 100,000 objects and want to read the List as fast as possible.
I split it into multiple smaller Lists of 500 objects each:
List<List<String>> smallerLists = Lists.partition(bigList, 500);
ExecutorService executor = Executors.newFixedThreadPool(smallerLists.size());
for(int i = 0; i < smallerLists.size();i++) {
MyXMLConverter xmlList = new MyXMLConverter(smallerLists.get(i));
executor.execute(xmlList);
}
executor.shutdown();
while (!executor.isTerminated()) {}
MyXMLConverter.java
This again uses an executor, with 50 threads, to process each 500-object List.
public MyXMLConverter(List<String> data){
this.data = data;
}
@Override
public void run() {
try {
convertLine();
} catch (Exception ex) {}
}
public void convertLine(){
ExecutorService executor = Executors.newFixedThreadPool(50);
for(int i = 0; i < data.size();i++) {
MyConverter worker = new MyConverter(data.get(i));
executor.execute(worker);
}
executor.shutdown();
while (!executor.isTerminated()) {}
}
It takes a lot of time to fetch the objects from the List. Is there a better way to do this? Please suggest.
Since the processing time of each item may vary, it would be better to have each worker thread pull the next item to process directly from the main list, so that all threads stay busy until the end.
Multi-threaded pulling from a shared list is best done using one of the concurrent collections. In your case, ConcurrentLinkedQueue would be a prime candidate.
So, copy your list into a ConcurrentLinkedQueue (or build the "list" directly as a queue), and let your threads call poll() until it returns null.
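A minimal sketch of that poll() loop might look like this. The class name, the worker count and the convert method are assumptions for illustration; convert stands in for whatever MyConverter does with one line.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class QueuePollWorkers {

    // Hypothetical per-item work.
    static void convert(String line) {
        line.trim();
    }

    // Returns how many items were processed.
    static int processAll(List<String> bigList, int workers) {
        Queue<String> queue = new ConcurrentLinkedQueue<>(bigList);
        AtomicInteger processed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            pool.execute(() -> {
                String next;
                // poll() returns null once the queue is empty, which ends the worker.
                while ((next = queue.poll()) != null) {
                    convert(next);
                    processed.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return processed.get();
    }

    public static void main(String[] args) {
        System.out.println(processAll(Arrays.asList("a", "b", "c", "d", "e"), 2)); // 5
    }
}
```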
If building the list of 100,000 elements takes time too, you can even kickstart the process by letting worker threads begin their work while the queue is still being built. For this, you'd use a LinkedBlockingQueue, and the workers would call take().
You'd then add a special element to the queue to mark the end, and when a worker gets the end marker, it puts it back in the queue for the next worker, then exits.
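The end-marker idea could be sketched like this, with workers started before the queue is filled. The END sentinel, the class name and the counting of processed items are assumptions made for the example; the sentinel is a distinct String instance compared by identity so that it can never collide with real data.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class EndMarkerWorkers {

    // Unique instance used only as the end marker (compared with ==).
    private static final String END = new String("END");

    static int processAll(List<String> items, int workers) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        AtomicInteger processed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            pool.execute(() -> {
                try {
                    while (true) {
                        String next = queue.take(); // blocks until an item arrives
                        if (next == END) {
                            queue.put(END); // pass the marker on to the next worker
                            return;
                        }
                        processed.incrementAndGet(); // real work would go here
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        // The producer fills the queue while the workers are already running.
        for (String item : items) {
            queue.add(item);
        }
        queue.add(END);
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return processed.get();
    }

    public static void main(String[] args) {
        System.out.println(processAll(Arrays.asList("a", "b", "c"), 4)); // 3
    }
}
```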
There are two main problems:
Your code creates 200 * 50 + 200 threads
Many of them do nothing but spin in a busy-wait loop: while (!executor.isTerminated()) {}
I suggest using something like this:
ExecutorService executor = Executors.newFixedThreadPool(COUNT_OF_YOUR_PROCESSOR_CORESS * 2);
List<Future<?>> futureList = new ArrayList<Future<?>>();
for(String currentString : bigList) {
MyConverter worker = new MyConverter(currentString);
Future<?> future = executor.submit(worker);
futureList.add(future);
}
Collections.reverse(futureList);
for (Future<?> future : futureList){
future.get();
}
executor.shutdown(); // No worries: all tasks have already finished by this point
Or, if you are on Java 8:
bigList.parallelStream().forEach(s -> new MyConverter(s).run());
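Regarding the busy-wait loop mentioned above: one alternative is a single pool sized from the number of cores, followed by awaitTermination, which parks the waiting thread instead of spinning. This is only a sketch; the class name, the one-minute limit and the toUpperCase "conversion" are placeholders.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class AwaitInsteadOfSpin {

    static int convertAll(List<String> bigList) {
        int threads = Runtime.getRuntime().availableProcessors() * 2;
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        AtomicInteger converted = new AtomicInteger();
        for (String line : bigList) {
            executor.execute(() -> {
                line.toUpperCase(); // placeholder for the real conversion
                converted.incrementAndGet();
            });
        }
        executor.shutdown();
        try {
            // Blocks without burning CPU, unlike while (!executor.isTerminated()) {}
            if (!executor.awaitTermination(1, TimeUnit.MINUTES)) {
                executor.shutdownNow();
                throw new IllegalStateException("conversion did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return converted.get();
    }

    public static void main(String[] args) {
        System.out.println(convertAll(Arrays.asList("a", "b", "c"))); // 3
    }
}
```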
Imagine we traverse a collection and submit tasks to be run in background
class Processor {
public void process(Iterable<Item> items, ExecutorService executorService) {
for (Item item : items) {
doStandardProcess(item);
if (needSpecialProcess(item)) {
executorService.submit(createSpecialTaskFor(item));
}
}
}
}
Program flow looks like:
receive items from somewhere
create Processor and process them
send the result to somewhere
The result depends on the background processing, so step 3 should wait until all tasks have completed. I know this can be achieved with a combination of shutdown() and awaitTermination(), but I do not want to shut down the service. It is also possible to call invokeAll(List tasks), but as you can see, tasks are created one by one during the traversal.
How can I wait for completion under these restrictions?
P.S. In case it was not clear, another restriction is that background tasks must run in parallel with the traversal of the items, because a background task takes 100 times longer than the basic processing operation.
You can store the futures:
List<Future<?>> futures = new ArrayList<>();
//in the for loop
futures.add(executorService.submit(createTaskFor(item)));
//after for loop + add exception handling
for (Future<?> f : futures) f.get();
//at this point all tasks have finished
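The "add exception handling" part might look something like the sketch below, which waits for every future without shutting the executor down and counts the tasks that failed. AwaitFutures, awaitAll and demo are hypothetical names; counting failures is just one way of handling them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class AwaitFutures {

    // Blocks until every task is finished; returns how many of them threw.
    static int awaitAll(List<Future<?>> futures) {
        int failures = 0;
        for (Future<?> future : futures) {
            try {
                future.get();
            } catch (ExecutionException e) {
                failures++; // the task itself threw; e.getCause() has the details
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return failures;
    }

    static int demo() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Runnable ok = () -> { };
            Runnable failing = () -> { throw new IllegalStateException("boom"); };
            List<Future<?>> futures = new ArrayList<>();
            futures.add(pool.submit(ok));
            futures.add(pool.submit(failing));
            futures.add(pool.submit(ok));
            // The pool has not been shut down at this point and stays reusable.
            return awaitAll(futures);
        } finally {
            pool.shutdown(); // only so that this demo can exit
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 1
    }
}
```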
List<Callable<Foo>> toProcess = new ArrayList<>();
for (Item item : items) {
if (needProcess(item)) {
toProcess.add(createTaskFor(item));
}
}
executorService.invokeAll(toProcess);
I'm using the ExecutorService in Java to invoke threads with invokeAll(). Afterwards, I get the result set with future.get(). It's really important that I receive the results in the same order in which I created the threads.
Here is a snippet:
try {
final List threads = new ArrayList();
// create threads
for (String name : collection)
{
final CallObject object = new CallObject(name);
threads.add(object);
}
// start all Threads
results = pool.invokeAll(threads, 3, TimeUnit.SECONDS);
for (Future<String> future : results)
{
try
{
// this method blocks until it receives the result, unless there is a
// timeout set.
final String rs = future.get();
if (future.isDone())
{
// if future.isDone() = true, a timeout did not occur.
// do something
}
else
{
// timeout
// log it and do something
break;
}
}
catch (Exception e)
{
}
}
}
catch (InterruptedException ex)
{
}
Is it assured that I receive the results from future.get() in the same order I created new CallObjects and added them to my ArrayList? I know, Documentation says the following:
invokeAll(): returns a list of Futures representing the tasks, in the same sequential order as produced by the iterator for the given task list. If the operation did not time out, each task will have completed. If it did time out, some of these tasks will not have completed.
But I wanted to make sure I understood it correctly.
Thanks for answers! :-)
This is exactly what this piece of the statement is saying:
returns a list of Futures representing the tasks, in the same
sequential order as produced by the iterator for the given task list.
You will get the Futures in the exact order in which you inserted the items in the original list of Callables.
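A small sketch can make the ordering guarantee visible: the tasks below are deliberately given decreasing delays, so the last one finishes first, yet the results come back in list order. The class name and the 20 ms delay step are arbitrary choices for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class InvokeAllOrder {

    static List<String> run(List<String> names) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, names.size()));
        List<Callable<String>> tasks = new ArrayList<>();
        int delay = names.size() * 20;
        for (String name : names) {
            final int sleepMillis = delay; // earlier tasks sleep longer
            tasks.add(() -> {
                Thread.sleep(sleepMillis);
                return name;
            });
            delay -= 20;
        }
        try {
            List<String> results = new ArrayList<>();
            // The futures are in the same order as the tasks list.
            for (Future<String> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("first", "second", "third"))); // [first, second, third]
    }
}
```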
As per the documentation, you will get the futures in the same order.
A Future object is just a reference to the task.
Future#get() is a blocking call.
For example, say we have submitted 4 tasks:
Task 1 --> Completed
Task 2 --> Completed
Task 3 --> Timed Out
Task 4 --> Completed
As per our code:
for (Future future : futures) {
future.get(); }
For tasks 1 and 2, get() returns immediately. On task 3 we wait until it completes. Even if task 4 has already completed, the iteration stays blocked on task 3; only once task 3 completes, or its timed wait expires, does the iteration continue.
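One detail worth adding for the timed variant used in the question: when invokeAll(tasks, timeout, unit) returns, tasks that have not completed are cancelled and isDone() is true for every returned future, so isCancelled() is the check that distinguishes a timeout. Here is a minimal sketch; the task names, sleep time and status strings are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

final class InvokeAllTimeout {

    static List<String> statuses(long timeoutMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        List<Callable<String>> tasks = new ArrayList<>();
        tasks.add(() -> "fast-1");
        tasks.add(() -> {
            Thread.sleep(10_000); // much longer than the timeout
            return "slow";
        });
        tasks.add(() -> "fast-2");
        try {
            List<String> out = new ArrayList<>();
            for (Future<String> future : pool.invokeAll(tasks, timeoutMillis, TimeUnit.MILLISECONDS)) {
                if (future.isCancelled()) {
                    out.add("cancelled"); // timed out; get() would throw CancellationException
                } else {
                    try {
                        out.add(future.get()); // already complete, returns at once
                    } catch (ExecutionException e) {
                        out.add("failed");
                    }
                }
            }
            return out;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(statuses(500)); // [fast-1, cancelled, fast-2]
    }
}
```

The results still line up with the task list, so the position of a "cancelled" entry tells you which task ran out of time.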