I have a List of TicketDTO objects, and every TicketDTO needs to go through a function that converts its data to a TicketDataDTO. What I want here is to reduce the time this code takes to run, because when the list size gets bigger it takes a lot of time to convert, and that is unacceptable for fetching the data through a GET mapping. However, when I try to implement ForkJoinPool along with parallelStream() (code below) to get it done, my returned List is empty. Can someone tell me what I am doing wrong?
@Override
public List<TicketDataDTO> getOtrsTickets(String value, String startDate, String endDate, String product, String user) {
    // TODO Implement threads
    List<TicketDTO> tickets = ticketDao.findOtrsTickets(value, startDate, endDate, product, user);
    Stream<TicketDTO> ticketsStream = tickets.parallelStream();
    List<TicketDataDTO> data = new ArrayList<TicketDataDTO>();
    ForkJoinPool forkJoinPool = new ForkJoinPool(6);
    forkJoinPool.submit(() -> {
        try {
            ticketsStream.forEach(ticket -> data.add(createTicketData(ticket)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    });
    forkJoinPool.shutdown();
    //ticketsStream.forEach(ticket -> data.add(createTicketData(ticket)));
    return data;
}
createTicketData is just a function with two for loops and one switch statement that creates some new columns I need in the output.
In addition to calling shutdown() on the ForkJoinPool, you have to wait for its termination, like
forkJoinPool.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
If you do not wait for the termination, data will be returned before the threads have had a chance to add their results to it.
See "How to wait for all threads to finish, using ExecutorService?" for more details.
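For reference, a minimal sketch of how the corrected method could look, assuming the ticketDao and createTicketData from the question. It also swaps the forEach/data.add pattern for a collector, because adding to an unsynchronized ArrayList from a parallel stream is itself a data race:
@Override
public List<TicketDataDTO> getOtrsTickets(String value, String startDate, String endDate, String product, String user) {
    List<TicketDTO> tickets = ticketDao.findOtrsTickets(value, startDate, endDate, product, user);
    ForkJoinPool forkJoinPool = new ForkJoinPool(6);
    try {
        // a parallel stream started inside the pool uses that pool's parallelism
        return forkJoinPool.submit(() -> tickets.parallelStream()
                    .map(this::createTicketData)
                    .collect(Collectors.toList()))
                .get(); // get() waits for the result, so no empty list is returned
    } catch (InterruptedException | ExecutionException e) {
        throw new RuntimeException(e);
    } finally {
        forkJoinPool.shutdown();
    }
}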
I have an execute method which runs multiple test cases one by one; the test cases are passed in as a list of String arrays.
I am trying to run these test cases in a multi-threaded way, while also writing data to a CSV file in parallel.
Here is what I have done, but it seems that the code is not working in a multi-threaded way: I have passed nThreads values of 2, 5 and 7 to newFixedThreadPool(), but it takes the same time to execute the code.
private void executeTest(List<String[]> inputArray) throws ExecutionException, InterruptedException {
    ThreadPoolExecutor executor = (ThreadPoolExecutor) Executors.newFixedThreadPool(nThreads); // 2, 5, 7
    long start = System.currentTimeMillis();
    for (String[] listOfArray : inputArray) {
        Callable<ApiResponse> c2 = new Callable<ApiResponse>() {
            public ApiResponse call() {
                ApiResponse response = runTestCase(listOfArray);
                try {
                    csvWriter.writeCsv(listOfArray[0], response);
                } catch (IOException e) {
                    e.printStackTrace();
                }
                return response;
            }
        };
        System.out.println("nThreads: " + nThreads);
        Future<ApiResponse> result = executor.submit(c2);
        result.get();
    }
    long stop = System.currentTimeMillis();
    long timeTaken = stop - start;
    System.out.println("Total time taken: " + timeTaken + " ms, no of threads: " + nThreads);
}
The call to result.get() blocks until the task is completed, so you are just executing the tasks one by one inside your loop, even if they are actioned on different threads by the executor service.
// result.get();
Instead, remove the line above and await termination at the end, so that the full number of threads in your pool may receive tasks at the same time, such as:
// All task submitted, mark for shutdown (only call after ALL submits done)
executor.shutdown();
// Wait for the executor service to finish
// You should consider how long this should be:
if (!executor.awaitTermination(whateverTimeIsReasonable, TimeUnit.SECONDS))
throw new RuntimeException("Test failed");
Tests that hide exceptions are no help for testing; changing this:
e.printStackTrace();
to throw new UncheckedIOException(e); will ensure that all errors are reported.
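Putting both changes together, a rough sketch of the reworked method (runTestCase, csvWriter, nThreads and ApiResponse are taken from the question; note that csvWriter must be thread-safe, since several tasks may now write concurrently):
private void executeTest(List<String[]> inputArray) throws ExecutionException, InterruptedException {
    ExecutorService executor = Executors.newFixedThreadPool(nThreads);
    long start = System.currentTimeMillis();
    List<Future<ApiResponse>> results = new ArrayList<>();
    for (String[] listOfArray : inputArray) {
        results.add(executor.submit(() -> {
            ApiResponse response = runTestCase(listOfArray);
            try {
                csvWriter.writeCsv(listOfArray[0], response);
            } catch (IOException e) {
                throw new UncheckedIOException(e); // report instead of hiding
            }
            return response;
        }));
    }
    executor.shutdown(); // all tasks submitted, none waited for yet
    for (Future<ApiResponse> result : results) {
        result.get(); // now wait, after the whole batch is in flight
    }
    System.out.println("Total time taken: " + (System.currentTimeMillis() - start)
        + " ms, no of threads: " + nThreads);
}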
I am just learning and trying to apply CompletableFuture to my problem statement. I have a list of items I am iterating over.
Prop is a class with only two attributes, prop1 and prop2, with their respective getters and setters.
List<Prop> result = new ArrayList<>();
for (Item item : items) {
    item.load();
    Prop temp = new Prop();
    // once the item is loaded, get its properties
    temp.setProp1(item.getProp1());
    temp.setProp2(item.getProp2());
    result.add(temp);
}
return result;
However, item.load() here is a blocking call, so I was thinking of using CompletableFuture, something like below:
for (Item item : items) {
    CompletableFuture<Prop> prop = CompletableFuture.supplyAsync(() -> {
        try {
            item.load();
            return item;
        } catch (Exception e) {
            logger.error("Error");
            return null;
        }
    }).thenApply(item1 -> {
        try {
            Prop temp = new Prop();
            // once the item is loaded, get its properties
            temp.setProp1(item1.getProp1());
            temp.setProp2(item1.getProp2());
            return temp;
        } catch (Exception e) {
            logger.error("Error");
            return null;
        }
    });
}
But I am not sure how I can wait for all the items to be loaded, and then aggregate and return their results.
I may be completely wrong in how I am implementing CompletableFutures, since this is my first attempt. Please pardon any mistake. Thanks in advance for any help.
There are two issues with your approach of using CompletableFuture.
First, you say item.load() is a blocking call, so the CompletableFuture’s default executor is not suitable for it, as it tries to achieve a level of parallelism matching the number of CPU cores. You could solve this by passing a different Executor to CompletableFuture’s asynchronous methods, but your load() method doesn’t return a value that your subsequent operations rely on. So the use of CompletableFuture complicates the design without a benefit.
You can perform the load() invocations asynchronously and wait for their completion just using an ExecutorService, followed by the loop as-is (without the already performed load() operation, of course):
ExecutorService es = Executors.newCachedThreadPool();
es.invokeAll(items.stream()
    .map(i -> Executors.callable(i::load))
    .collect(Collectors.toList()));
es.shutdown();

List<Prop> result = new ArrayList<>();
for (Item item : items) {
    Prop temp = new Prop();
    // once the item is loaded, get its properties
    temp.setProp1(item.getProp1());
    temp.setProp2(item.getProp2());
    result.add(temp);
}
return result;
You can control the level of parallelism through the choice of the executor, e.g. you could use an Executors.newFixedThreadPool(numberOfThreads) instead of the unbounded thread pool.
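If you prefer to stay with CompletableFuture, a sketch along the same reasoning (a dedicated executor for the blocking calls; Item, Prop and items as in the question, assuming load() throws no checked exceptions) would keep one future per item and join them all at the end:
ExecutorService es = Executors.newFixedThreadPool(numberOfThreads);
List<CompletableFuture<Prop>> futures = items.stream()
    .map(item -> CompletableFuture.supplyAsync(() -> {
        item.load(); // the blocking call runs on the dedicated pool, not the common pool
        Prop temp = new Prop();
        temp.setProp1(item.getProp1());
        temp.setProp2(item.getProp2());
        return temp;
    }, es))
    .collect(Collectors.toList());
// join() blocks until each future completes; failures surface as CompletionException
List<Prop> result = futures.stream()
    .map(CompletableFuture::join)
    .collect(Collectors.toList());
es.shutdown();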
I'm using Stream.generate to get data from Instagram. As Instagram limits calls per hour, I want generate to run less frequently than every 2 seconds.
I chose this title because I moved from ScheduledExecutorService.scheduleAtFixedRate and that's what I was searching for. I do realise that stream intermediate operations are lazy and cannot be called on a schedule. If you have a better idea for the title, let me know.
So, again, I want at least a 2-second delay between generations.
My attempt, which doesn't take into consideration the time consumed by the operations after generate, which might take longer than 2 s:
Stream.generate(() -> {
    List<MediaFeedData> feedDataList = null;
    while (feedDataList == null) {
        try {
            Thread.sleep(2000);
            feedDataList = newData();
        } catch (InstagramException e) {
            notifyError(e.getMessage());
            e.printStackTrace();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
    return feedDataList;
})
A solution would be to decouple the generator from the Stream, for example using a BlockingQueue:
final BlockingQueue<Integer> queue = new LinkedBlockingQueue<>(100);

ScheduledExecutorService scheduler = new ScheduledThreadPoolExecutor(1);
scheduler.scheduleAtFixedRate(() -> {
    // Generate new data every 2s, regardless of their processing rate
    ThreadLocalRandom random = ThreadLocalRandom.current();
    queue.offer(random.nextInt(10));
}, 0, 2, TimeUnit.SECONDS);

Stream.generate(() -> {
    try {
        // Accept new data if ready, or wait for some more to be generated
        return queue.take();
    } catch (InterruptedException e) {}
    return -1;
}).forEach(System.out::println);
If the data processing takes more than 2s, new data will be enqueued and wait to be consumed. If it takes less than 2s, the take method in the generator will wait for new data to be produced by the scheduler.
This way, you are guaranteed to make fewer than N calls per hour to Instagram!
As far as I understand, your question is about solving two problems:
waiting at a fixed rate rather than a fixed delay
creating a stream for an unknown number of items that allows processing until some point in time (i.e. one that is not infinite)
You can solve the first task by using a deadline-based waiting and the second by implementing a Spliterator:
Stream<List<MediaFeedData>> stream = StreamSupport.stream(
    new Spliterators.AbstractSpliterator<List<MediaFeedData>>(Long.MAX_VALUE, 0) {
        long lastTime = System.currentTimeMillis();

        @Override
        public boolean tryAdvance(Consumer<? super List<MediaFeedData>> action) {
            if (quitCondition()) return false;
            List<MediaFeedData> feedDataList = null;
            while (feedDataList == null) {
                lastTime += TimeUnit.SECONDS.toMillis(2);
                while (System.currentTimeMillis() < lastTime)
                    LockSupport.parkUntil(lastTime);
                try {
                    feedDataList = newData();
                } catch (InstagramException e) {
                    notifyError(e.getMessage());
                    if (QUIT_ON_EXCEPTION) return false;
                }
            }
            action.accept(feedDataList);
            return true;
        }
    }, false);
Make a Timer and a semaphore. The timer raises the semaphore every 2 seconds, and in the stream you wait on every call for the semaphore.
This keeps the waits to the specified minimum (2 s) and, funnily, would even work with .parallel().
private final Semaphore tickingSemaphore = new Semaphore(1, true);
In its own thread:
Stream.generate(() -> {
    tickingSemaphore.acquire();
    ...
});
In the timer:
tickingSemaphore.release();
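Pieced together, a minimal runnable sketch of this idea (newData() is assumed from the question; InterruptedException has to be handled inside the lambda, because a Supplier cannot throw checked exceptions):
Semaphore tickingSemaphore = new Semaphore(1, true);
ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
// raise the semaphore every 2 seconds
timer.scheduleAtFixedRate(tickingSemaphore::release, 2, 2, TimeUnit.SECONDS);

Stream.generate(() -> {
    try {
        tickingSemaphore.acquire(); // at most one generation per tick
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
    }
    return newData(); // InstagramException handling omitted for brevity
}).forEach(System.out::println);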
I need to store objects in a cache, and these objects take a long time to create. I started with ConcurrentHashMap<id, Future<Object>> and everything was fine, until OutOfMemoryErrors started to happen. I moved to SoftReferences and it was better, but now I need to control eviction. I'm in the process of moving to Ehcache.
I'm sure there is a library for this kind of thing, but I really need to understand the logic of doing the cache storage and calculation in two phases, while keeping everything consistent and not recalculating something that is already calculated or in the process of being calculated. It is a two-level cache: one level for the more persistent results, the other for the results in the process of being calculated.
Any hints on how to improve the following code, which I'm sure has concurrency problems in the Callable.call() method?
public class TwoLevelCache {

    // cache that serializes everything except Futures
    private Cache complexicos = new Cache();
    private ConcurrentMap<Integer, Future<Complexico>> calculations =
        new ConcurrentHashMap<Integer, Future<Complexico>>();

    public Complexico get(final Integer id) {
        // if in cache, return it
        Complexico c = complexicos.get(id);
        if (c != null) { return c; }

        // if in calculation, wait for the future
        Future<Complexico> f = calculations.get(id);
        if (f != null) { return f.get(); } // exceptions obviated

        // if not, set up the calculation
        Callable<Complexico> callable = new Callable<Complexico>() {
            public Complexico call() throws Exception {
                Complexico complexico = compute(id);
                // this might be a problem here
                // but how to synchronize without
                // blocking the whole structure?
                complexicos.put(id, complexico);
                calculations.remove(id);
                return complexico;
            }
        };

        // store calculation
        FutureTask<Complexico> task = new FutureTask<Complexico>(callable);
        Future<Complexico> future = calculations.putIfAbsent(id, task);
        if (future == null) {
            // not previously being run, so start the calculation
            task.run();
            return task.get(); // exceptions obviated
        } else {
            // there was a previous calculation, so use that
            return future.get(); // exceptions obviated
        }
    }

    private Complexico compute(final Integer id) {
        // very long computation of complexico
    }
}
And what do you do with the values once they are calculated?
What is the number of new calculations per second?
If they are used (stored) and then disposed of, then I think a Reactive approach (RxJava and similar) could be a nice solution. You could put your "tasks" (a POJO with all the info needed to perform the calculation) in some off-heap structure (it could be some persistent queue, etc.) and only perform as many calculations as you want (throttle the process with the number of computational threads you want to have).
This way you would avoid OOM and would also gain much more control over the entire process.
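Independent of the reactive suggestion, the "never recalculate something already calculated or being calculated" requirement by itself is the classic Memoizer idiom from Java Concurrency in Practice; here is a single-level sketch of it (eviction would still be delegated to a real cache such as Ehcache):
class Memoizer<K, V> {
    private final ConcurrentMap<K, Future<V>> cache = new ConcurrentHashMap<>();
    private final Function<K, V> compute; // e.g. the compute(id) from the question

    Memoizer(Function<K, V> compute) { this.compute = compute; }

    V get(K key) throws InterruptedException, ExecutionException {
        Future<V> f = cache.get(key);
        if (f == null) {
            FutureTask<V> task = new FutureTask<>(() -> compute.apply(key));
            f = cache.putIfAbsent(key, task); // only one caller wins the race
            if (f == null) {
                f = task;
                task.run(); // the winner computes; everyone else waits on the Future
            }
        }
        return f.get();
    }
}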
I have a pre-populated set of strings. I want to iterate over the items and, while iterating, I need to "do work", which might also remove the item from the set. I want to spawn a new thread for each item's "do work". Please note that only some items are removed from the set during "do work".
Now I have the following question:
Can I achieve this by simply using Collections.synchronizedSet(new HashSet())? I am guessing this will throw a ConcurrentModificationException, since I am removing items from the set while I am iterating over it. How can I achieve the above behavior efficiently and without consistency issues?
Thanks!
I would use an ExecutorService
ExecutorService es = Executors.newFixedThreadPool(n);
List<Future<String>> toRemove = new ArrayList<>();
for (String s : set)
    toRemove.add(es.submit(new Task(s)));
for (Future<String> future : toRemove) {
    String s = future.get();
    if (s != null)
        set.remove(s);
}
es.shutdown();
This avoids needing to access the collection in a multi-threaded way.
Use a master producer thread that will remove the elements from the collection and will feed them to consumer threads. The consumer threads have no need to "personally" remove the items.
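A rough sketch of that layout (doWork is a hypothetical placeholder for the per-item work; only the master thread touches the set, so no synchronized collection is needed):
BlockingQueue<String> queue = new LinkedBlockingQueue<>();
ExecutorService consumers = Executors.newFixedThreadPool(4);
for (int i = 0; i < 4; i++) {
    consumers.submit((Callable<Void>) () -> {
        while (true) {
            String s = queue.take(); // blocks until the master hands over work
            doWork(s);               // hypothetical per-item work
        }
    });
}
// master thread: the only one iterating and mutating the set
for (Iterator<String> it = set.iterator(); it.hasNext(); ) {
    String s = it.next();
    it.remove(); // removal is done here, not in the consumers
    queue.put(s);
}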
Yes, a synchronizedSet will still throw ConcurrentModificationException.
Try this:
Set<String> s = Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());
ConcurrentHashMap never throws ConcurrentModificationException when multiple threads are accessing and modifying it.
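A quick usage sketch: the iterators of such a set are weakly consistent, so removing while iterating just works (needsRemoval is a hypothetical predicate standing in for "do work"):
Set<String> set = Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());
set.add("a");
set.add("b");
for (String s : set) {
    // no ConcurrentModificationException: the iterator is weakly consistent
    if (needsRemoval(s)) // hypothetical predicate
        set.remove(s);
}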
The approach depends on the relation between the data in your set and the successful completion of the operation.
Remove from Set is independent of the result of task execution
If you don't care about the actual result of the thread execution, you can just go through the set and remove every item as you dispatch the task (you have some examples of that already)
Remove from Set only if task execution completed successfully
If the deletion from the set should be transactional with the success of the execution, you could use Futures to collect information about the success of the task execution. That way, only successfully executed items will be deleted from the original set. There's no need to access the Set structure concurrently, as you can separate execution from checking by using Futures and an ExecutorService, e.g.:
// This task will execute the job and,
// if successful, return the string used as context
class Task implements Callable<String> {
    final String target;

    Task(String s) {
        this.target = s;
    }

    @Override
    public String call() throws Exception {
        // do your stuff
        // throw an exception if failed
        return target;
    }
}
And this is how it's used:
ExecutorService executor = Executors.newFixedThreadPool(4); // size the pool as needed
Set<Callable<String>> myTasks = new HashSet<Callable<String>>();
for (String s : set) {
    myTasks.add(new Task(s));
}
List<Future<String>> results = executor.invokeAll(myTasks);
for (Future<String> result : results) {
    try {
        set.remove(result.get());
    } catch (ExecutionException ee) {
        // the task failed during execution - handle as required
    } catch (CancellationException ce) {
        // the task was cancelled - handle as required
    }
}