Reducing operation time by using parallel stream

In my Java 8 Spring Boot application, I have a list of 40,000 records. For each record, I have to call an external API and save the result to the DB. How can I do this as fast as possible? Each API call takes about 20 seconds to complete. I used a parallel stream to reduce the time, but it made no considerable difference.
if (!mainList.isEmpty()) {
    AtomicInteger counter = new AtomicInteger();
    List<List<PolicyAddressDto>> secondList =
        new ArrayList<List<PolicyAddressDto>>(
            mainList.stream()
                .collect(Collectors.groupingBy(it -> counter.getAndIncrement() / subArraySize))
                .values());
    for (List<PolicyAddressDto> listOfList : secondList) {
        listOfList.parallelStream()
            .forEach(t -> {
                // listDomain1 and listDomain2 are declared globally
                callAtheniumData(t, listDomain1, listDomain2);
            });
        if (!listDomain1.isEmpty()) {
            listDomain1Repository.saveAll(listDomain1);
        }
        if (!listDomain2.isEmpty()) {
            listDomain2Repository.saveAll(listDomain2);
        }
    }
}

Solving a problem in parallel always involves performing more actual work than doing it sequentially. Overhead is involved in splitting the work among several threads and joining or merging the results. Problems like converting short strings to lower-case are small enough that they are in danger of being swamped by the parallel splitting overhead.
As far as I can see, the API call response is not being saved.
Also, all API calls are independent of each other.
Can we try creating a new thread for each API call?
for (List<PolicyAddressDto> listOfList : secondList) {
    listOfList.parallelStream()
        .forEach(t -> {
            // one new thread per API call
            new Thread(() -> callAtheniumData(t, listDomain1, listDomain2)).start();
        });
}

That's because a parallel stream divides the work across a pool that usually has one thread per core minus one. If every call to the external API takes 20 seconds and you have 4 cores, this means only 3 concurrent requests, each waiting 20 seconds.
You can increase the concurrency of your calls in this way https://stackoverflow.com/a/21172732/574147 but I think you're just moving the problem.
A 20-second response time is really slow for a "typical" API. If this is a really complex, CPU-bound computation, how could that service answer 10 concurrent requests while keeping the same performance? It probably couldn't.
Otherwise, if the processing is I/O-bound and takes 20 seconds, you probably need a service that can accept (and work with!) a list of elements.

Each of the API calls will take about 20 secs to complete.
Your external API is where you are being bottlenecked. There's really nothing your code can do to speed it up on the client side except to parallelize the process. You've already done that, so if the external API is within your organization, you need to look into performance improvements there. If not, you can do something like offloading the processing via Kafka to Apache NiFi or StreamSets so that your Spring Boot API doesn't have to wait for hours to process the data.
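To make the concurrency argument concrete: 40,000 calls at 20 seconds each is 800,000 seconds of waiting. Spread over 3 threads that is roughly 74 hours; spread over 50 concurrent calls it is roughly 4.4 hours, provided the external API can actually sustain that load. Below is a minimal sketch of running I/O-bound calls on a dedicated, bounded pool instead of the common pool. The names callExternalApi and fetchAll, the 100 ms delay, and the pool size are assumptions for illustration and are not from the question. Results are collected on the calling thread, which also avoids adding to shared lists from worker threads (unsafe if those lists are plain ArrayLists).

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class BoundedApiCalls {

    // Hypothetical stand-in for the slow external API call (20 s in the question, 100 ms here).
    public static String callExternalApi(int recordId) {
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return "result-" + recordId;
    }

    // Runs all calls on a dedicated pool and returns the results in input order.
    public static List<String> fetchAll(List<Integer> recordIds, int concurrency) {
        ExecutorService pool = Executors.newFixedThreadPool(concurrency);
        try {
            // Submit every call first so that up to `concurrency` requests are in flight at once.
            List<CompletableFuture<String>> futures = recordIds.stream()
                    .map(id -> CompletableFuture.supplyAsync(() -> callExternalApi(id), pool))
                    .collect(Collectors.toList());
            // join() waits for each call; the result list is built on the calling thread only.
            return futures.stream().map(CompletableFuture::join).collect(Collectors.toList());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Integer> ids = IntStream.range(0, 40).boxed().collect(Collectors.toList());
        long start = System.nanoTime();
        List<String> results = fetchAll(ids, 20);
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.println(results.size() + " results in about " + millis + " ms");
        // The collected results could now be saved in one batch, e.g. with a saveAll(...) call.
    }
}
```

The right pool size depends on what the external API tolerates, so it is worth making it configurable and measuring.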

Related

Parallel Flux vs Flux in project Reactor

So what I have understood from the docs is that a ParallelFlux essentially divides the flux elements into separate rails (essentially something like grouping). As far as threads are concerned, that is the job of the schedulers. All of this will run on the same scheduler instance provided via the runOn() method.
Let's consider a situation like the one below:
Mono<Response> response = webClientCallApi(..); // function returning a Mono from a WebClient call
Now let's say we make around 100 calls:
Flux.range(0, 100).subscribeOn(Schedulers.boundedElastic()).flatMap(i -> webClientCallApi(i)).collectList() // or subscribe somehow
and if we use ParallelFlux:
Flux.range(0, 100).parallel().runOn(Schedulers.boundedElastic()).flatMap(i -> webClientCallApi(i)).sequential().collectList();
So if my understanding is correct, the two seem pretty similar. What are the advantages of ParallelFlux over Flux, and when should you use ParallelFlux over Flux?

In practice, you'll likely very rarely need to use a parallel flux, including in this example.
In your example, you're firing off 100 web service calls. Bear in mind the actual work needed to do this is very low: you generate and fire off an asynchronous request, and then some time later you receive a response back. In between that request and response you're not doing any work at all; it simply takes a tiny amount of CPU resources when each request is sent, and another tiny amount when each response is received. (This is one of the core advantages of using an asynchronous framework to make your web requests: you're not tying up any threads while the request is in flight.)
If you split this flux and run it in parallel, you're saying that you want these tiny amounts of CPU resources to be split so they can run simultaneously, on different CPU cores. This makes absolutely no sense - the overhead of splitting the flux, running it in parallel and then combining it later is going to be much, much greater than just leaving it to execute on a normal, sequential scheduler.
On the other hand, let's say I had a Flux<Integer> and I wanted to check if each of those integers was a prime for example - or perhaps a Flux<String> of passwords that I wanted to check against a BCrypt hash. Those sorts of operations are genuinely CPU intensive, so in that case a parallel flux, used to split execution across cores, could make a lot of sense. In reality though, those situations occur quite rarely in the normal reactor use cases.
(Also, just as a closing note, you almost always want to use Schedulers.parallel() with a parallel flux, not Schedulers.boundedElastic().)

Java8 stream().map().reduce() is really map reduce

I saw this code somewhere using stream().map().reduce().
Does this map() function really work in parallel? If yes, what is the maximum number of threads it can start for the map() function?
What if I use parallelStream() instead of just stream() for the particular use case below?
Can anyone give me a good example of where NOT to use parallelStream()?
The code below just extracts tName from tCode and returns a comma-separated String.
String ts = atList.stream().map(tCode -> {
    return CacheUtil.getTCache().getTInfo(tCode).getTName();
}).reduce((tName1, tName2) -> {
    return tName1 + ", " + tName2;
}).get();

This stream().map().reduce() is not parallel, so a single thread acts on the stream.
You have to add parallel() or, in other cases, use parallelStream() (it depends on the API, but it's the same thing). With parallel, by default you get the number of available processors minus 1 as pool threads; but the main thread is used too, alongside ForkJoinPool#commonPool, so there will usually be 2, 4, 8 threads, etc. To check how many you will get, use:
Runtime.getRuntime().availableProcessors()
You can use a custom pool and get as many threads as you want, as shown here.
Also notice that the entire pipeline is run in parallel, not just the map operation.
There isn't a golden law about when to use and when not to use parallel streams, the best way is to measure. But there are obvious choices, like a stream of 10 elements - this is way too little to have any real benefit from parallelization.
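As a rough illustration of the points above, the sketch below prints the common pool's parallelism, builds the comma-separated string with Collectors.joining instead of reduce, and runs the same pipeline as a parallel stream from inside a custom ForkJoinPool. The class name and lookupName are hypothetical stand-ins for the cache lookup in the question. Running a parallel stream inside a custom pool relies on how the fork-join framework behaves in practice rather than on a documented guarantee, so treat it as an assumption to verify.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.Collectors;

public class ParallelStreamPools {

    // Hypothetical stand-in for CacheUtil.getTCache().getTInfo(tCode).getTName().
    public static String lookupName(String code) {
        return "name-" + code;
    }

    // Sequential version: joining avoids the repeated string concatenation done by reduce.
    public static String joinNames(List<String> codes) {
        return codes.stream()
                .map(ParallelStreamPools::lookupName)
                .collect(Collectors.joining(", "));
    }

    // Parallel version: the stream is started from a task running in a custom pool.
    public static String joinNamesInPool(List<String> codes, int threads) {
        ForkJoinPool pool = new ForkJoinPool(threads);
        try {
            return pool.submit(() -> codes.parallelStream()
                    .map(ParallelStreamPools::lookupName)
                    .collect(Collectors.joining(", "))).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("available processors: " + Runtime.getRuntime().availableProcessors());
        System.out.println("common pool parallelism: " + ForkJoinPool.getCommonPoolParallelism());
        List<String> codes = Arrays.asList("a", "b", "c");
        System.out.println(joinNames(codes));
        System.out.println(joinNamesInPool(codes, 4));
    }
}
```

Because the source list has a defined encounter order, joining yields the names in list order in both versions.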

All parallel streams use common fork-join thread pool and if you submit a long-running task, you effectively block all threads in the pool. Consequently you block all other tasks that are using parallel streams.
There are only two options how to make sure that such thing will never happen. The first is to ensure that all tasks submitted to the common fork-join pool will not get stuck and will finish in a reasonable time. But it's easier said than done, especially in complex applications. The other option is to not use parallel streams and wait until Oracle allows us to specify the thread pool to be used for parallel streams.
Use case
Let's say you have a collection (List) which gets loaded with values at the start of the application, and no new value is added to it at any later point. In that scenario you can use a parallel stream without any concerns.
Don't worry: the stream is efficient and safe.

does multi threading improve performance? scenario java [duplicate]

I have a List<Object> objectsToProcess. Let's say it contains 1,000,000 items. Each item in the list is then processed like this:
for (Object object : objectsToProcess) {
    // go to the database and retrieve data
    // process
    // save data
}
My question is: would multi-threading improve performance? I would have thought that multiple threads are allocated by default by the processor anyway?

In the described scenario, given that process is a time-consuming task, and given that the CPU has more than one core, multi-threading will indeed improve the performance.
The processor is not the one who allocates the threads. The processor is the one who provides the resources (virtual CPUs / virtual processors) that can be used by threads by providing more than one execution unit / execution context. Programs need to create multiple threads themselves in order to utilize multiple CPU cores at the same time.
The two major reasons for multi-threading are:
Making use of multiple CPU cores which would otherwise be unused or at least not contribute to reducing the time it takes to solve a given problem - if the problem can be divided into subproblems which can be processed independently of each other (parallelization possible).
Making the program act and react on multiple things at the same time (i.e. Event Thread vs. Swing Worker).
There are programming languages and execution environments in which threads are created automatically to process problems that can be parallelized. Java is not (yet) one of them, but since Java 8 it has been well on its way, and Java 9 may bring even more.
Usually you do not want significantly more threads than the CPU has cores, for the simple reason that thread switching and thread synchronization are overhead that slows the program down.
The package java.util.concurrent provides many classes that help with typical problems of multithreading. What you want is an ExecutorService to which you assign the tasks that should be run and completed in parallel. The class Executors provides factory methods for creating popular types of ExecutorServices. If your problem just needs to be solved in parallel, you might want to go for Executors.newCachedThreadPool(). If your problem is urgent, you might want to go for Executors.newWorkStealingPool().
Your code thus could look like this:
final ExecutorService service = Executors.newWorkStealingPool();
for (final Object object : objectsToProcess) {
    service.submit(() -> {
        // go to the database and retrieve data
        // process
        // save data
    });
}
Please note that the sequence in which the objects would be processed is no longer guaranteed if you go for this approach of multithreading.
If your objectsToProcess are something which can provide a parallel stream, you could also do this:
objectsToProcess.parallelStream().forEach(object -> {
    // go to the database and retrieve data
    // process
    // save data
});
This will leave the decisions about how to handle the threads to the VM, which often will be better than implementing the multi-threading ourselves.
Further reading:
http://docs.oracle.com/javase/tutorial/collections/streams/parallelism.html#executing_streams_in_parallel
http://docs.oracle.com/javase/8/docs/api/java/util/concurrent/package-summary.html

Depends on where the time is spent.
If you have a load of calculations to do then allocating work to more threads can help, as you say each thread may execute on a separate CPU. In such a situation there is no value in having more threads than CPUs. As Corbin says you have to figure out how to split the work across the threads and have responsibility for starting the threads, waiting for completion and aggregating the results.
If, as in your case, you are waiting for a database, then there can be additional value in using threads. A database can serve several requests in parallel (the database server itself is multi-threaded), so instead of coding
for (Object object : objectsToProcess) {
    // go to the database and retrieve data
    // process
    // save data
}
where you wait for each response before issuing the next, you want to have several worker threads each performing
    // go to the database and retrieve data
    // process
    // save data
Then you get better throughput. The trick though is not to have too many worker threads. Several reasons for that:
Each thread uses some resources: it has its own stack and its own connection to the database. You would not want 10,000 such threads.
Each request uses resources on the server, each connection uses memory, and each database server will only serve so many requests in parallel. There is no benefit in submitting thousands of simultaneous requests if it can only serve tens of them in parallel. Also, if the database is shared, you probably don't want to saturate it with your requests; you need to be a "good citizen".
Net: you will almost certainly get benefit by having a number of worker threads. The number of threads that helps will be determined by factors such as the number of CPUs you have and the ratio between the amount of processing you do and the response time from the DB. You can only really determine that by experiment, so make the number of threads configurable and investigate. Start with say 5, then 10. Keep your eye on the load on the DB as you increase the number of threads.
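A minimal sketch of the "configurable number of worker threads" idea, under stated assumptions: processItem is a hypothetical stand-in for "go to database, process, save" with a simulated 20 ms round trip, and the system property name workers is made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ConfigurableWorkers {

    // Hypothetical stand-in for the per-item database work; returns the processed value.
    public static int processItem(int item) throws InterruptedException {
        Thread.sleep(20); // simulated database round trip
        return item * 2;
    }

    // Processes all items on a fixed pool of `workerThreads` threads.
    public static List<Integer> processAll(List<Integer> items, int workerThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(workerThreads);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (Integer item : items) {
                tasks.add(() -> processItem(item));
            }
            List<Integer> results = new ArrayList<>();
            // invokeAll blocks until every task has finished and returns futures in task order.
            for (Future<Integer> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Read the thread count from a system property so it can be tuned without recompiling.
        int workers = Integer.getInteger("workers", 5);
        List<Integer> items = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            items.add(i);
        }
        long start = System.nanoTime();
        List<Integer> results = processAll(items, workers);
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.println(workers + " workers, " + results.size() + " items, about " + millis + " ms");
    }
}
```

Running it with, say, -Dworkers=5 and then -Dworkers=10 is one way to carry out the experiment described above.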

How to keep a fixed size pool of ListenableFutures?

I am reading a large file of URLs and making requests to a service. The requests are executed by a client that returns a ListenableFuture. Now I want to keep a pool of ListenableFutures, e.g. have at most N Futures executing concurrently.
The problem I see is that I have no control over the ExecutorService the ListenableFutures are executed in, because it belongs to the third-party library. Otherwise I would just create a fixed-size pool and create my own Callables.
1) A naïve implementation would be to spawn N Futures and then use allAsList, which would satisfy the fixed-size criterion but makes everything wait for the slowest request.
Out-of-order processing is OK.
2) A slightly better naïve option would be to combine the first idea with a rate limiter, setting N and the rate so that the number of concurrent requests is a good approximation of the desired pool size. But I am actually not looking for a way to throttle the calls, e.g. using RateLimiter.
3) A last option would be to spawn N Futures and have a callback that spawns a new one. This satisfies the fixed-size criterion and minimizes idle time, but then I don't know how to detect the end of my program, i.e. when to close the file.
4) A non-ListenableFuture-related approach would be to just .get() the result directly and deal with the embarrassingly parallel tasks by creating a simple thread pool.
To know when the job queue is empty, i.e. when to close the file, I am thinking of using a CountDownLatch, which should work for many of these options.

Hmm. How do you feel about just using a java.util.concurrent.Semaphore?
Semaphore gate = new Semaphore(10);
Runnable release = gate::release; // Java 8 syntax
Iterator<URL> work = ...;
while (work.hasNext()) {
    gate.acquire(); // returns void; blocks until a permit is free (may throw InterruptedException)
    ListenableFuture<?> f = ThirdPartyLibrary.doWork(work.next());
    f.addListener(release, MoreExecutors.directExecutor()); // sameThreadExecutor() in older Guava versions
}
You could add other listeners maybe by using Futures.addCallback(ListenableFuture, FutureCallback) to do something with the results, as long as you're careful to release() on both success and error.
It might work.
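Here is a runnable sketch of the same gating idea using only the JDK, with CompletableFuture standing in for ListenableFuture and a CountDownLatch to detect the end of the work. The class name, the doWork method and its 20 ms delay are hypothetical stand-ins for the third-party client; the in-flight counters exist only to show that the limit holds.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

public class SemaphoreGate {

    private final AtomicInteger inFlight = new AtomicInteger();
    private final AtomicInteger maxInFlight = new AtomicInteger();

    // Hypothetical stand-in for the third-party client call; records how many calls overlap.
    public CompletableFuture<String> doWork(String url, ExecutorService clientPool) {
        return CompletableFuture.supplyAsync(() -> {
            maxInFlight.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
            try {
                Thread.sleep(20); // simulated request latency
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                inFlight.decrementAndGet();
            }
            return "done " + url;
        }, clientPool);
    }

    // Issues all requests with at most `limit` in flight; returns the observed maximum overlap.
    public int runAll(List<String> urls, int limit) {
        ExecutorService clientPool = Executors.newCachedThreadPool(); // unbounded, like a pool we do not control
        Semaphore gate = new Semaphore(limit);
        CountDownLatch allDone = new CountDownLatch(urls.size());
        try {
            for (String url : urls) {
                gate.acquire(); // blocks while `limit` requests are already in flight
                doWork(url, clientPool).whenComplete((result, error) -> {
                    gate.release();      // runs on success and on failure
                    allDone.countDown();
                });
            }
            allDone.await(); // every request has completed; the input could be closed here
            return maxInFlight.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            clientPool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> urls = new ArrayList<>();
        for (int i = 0; i < 30; i++) {
            urls.add("http://example.invalid/" + i);
        }
        System.out.println("max requests in flight: " + new SemaphoreGate().runAll(urls, 10));
    }
}
```

The permit is acquired before each request starts and released only after it completes, so the number of overlapping requests cannot exceed the limit.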

Your option 3 sounds reasonable. If you want to cleanly detect when all requests have completed, one simple approach is to create a new SettableFuture to represent completion.
When your callback tries to take the next request from the queue, and finds it to be empty, you can call set() on the future to notify anything that's waiting for all requests to complete. Propagating exceptions from the individual request futures is left as an exercise for the reader.

Use a fixed-size thread pool for embarrassingly parallel tasks and .get() the future's result immediately.
This simplifies the code and allows each worker to have modifiable context.

Threads processing a batch job in servlet environment

I have a Spring MVC, Hibernate (Postgres 9 DB) web app. An admin user can send in a request to process nearly 200,000 records (each record collected from various tables via joins). Such an operation is requested on a weekly or monthly basis (or whenever the data reaches a limit of around 100,000 to 200,000 records). On the database end, I am correctly implementing batching.
PROBLEM: Such a long-running request holds up the server thread, and that causes the normal users to suffer.
REQUIREMENT: The high response time of this request is not an issue. What's required is to not make other users suffer because of this time-consuming process.
MY SOLUTION:
Implementing a thread pool using Spring's TaskExecutor abstraction. I can initialize my thread pool with, say, 5 or 6 threads and break the 200,000 records into smaller chunks, say of size 1,000 each, and queue these chunks. To further allow the normal users faster DB access, maybe I can make every running thread sleep for 2 or 3 seconds.
The advantage I see in this approach: instead of executing a huge DB-interacting request in one go, we have an asynchronous design spanning a longer time, thus behaving like multiple normal user requests.
Can some experienced people please give their opinion on this?
I have also read about implementing the same behaviour with message-oriented middleware like JMS/AMQP or with Quartz scheduling. But frankly speaking, I think internally they are going to do the same thing, i.e. create a thread pool and queue the jobs. So why not go with the Spring task executors instead of adding a completely new infrastructure in my web app just for this feature?
Please share your views on this and let me know if there are other, better ways to do this.
Once again: the time to completely process all the records is not a concern; what's required is that normal users accessing the web app during that time should not suffer in any way.

You can parallelize the tasks and wait for all of them to finish before returning the call. For this, you want to use ExecutorCompletionService which is available in Java standard since 5.0
In short, you use your container's service locator to create an instance of ExecutorCompletionService
ExecutorCompletionService<List<MyResult>> queue = new ExecutorCompletionService<List<MyResult>>(executor);
// do this in a loop
queue.submit(aCallable);
// after looping, call this once per submitted task
queue.take().get(); // take() blocks until the next task has finished
If you do not want to wait then, you can process the jobs in the background without blocking the current thread but then you will need some mechanism to inform the client when the job has finished. That can be through JMS or if you have an ajax client then, it can poll for updates.
Quartz also has a job scheduling mechanism but, Java provides a standard way.
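A self-contained sketch of the ExecutorCompletionService pattern, assuming the records have already been split into chunks. The class name and processChunk are hypothetical, and a plain fixed thread pool stands in for whatever executor the container provides. Note that take() returns one completed task at a time, so it is called once per submitted chunk.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CompletionServiceDemo {

    // Hypothetical chunk processor: returns how many records the chunk contained.
    public static int processChunk(List<Integer> chunk) {
        return chunk.size();
    }

    // Submits one task per chunk and sums the results as the tasks complete.
    public static int processChunks(List<List<Integer>> chunks, int threads) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        ExecutorCompletionService<Integer> completion = new ExecutorCompletionService<>(executor);
        try {
            for (List<Integer> chunk : chunks) {
                completion.submit(() -> processChunk(chunk));
            }
            int total = 0;
            for (int i = 0; i < chunks.size(); i++) {
                // take() blocks until the next task (in completion order) has finished.
                total += completion.take().get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        List<List<Integer>> chunks = Arrays.asList(
                Arrays.asList(1, 2), Arrays.asList(3), Arrays.asList(4, 5, 6));
        System.out.println("records processed: " + processChunks(chunks, 2));
    }
}
```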
EDIT:
I might have misunderstood the question. If you do not want a faster response but rather you want to throttle the CPU, use this approach
You can make an inner class like this PollingThread where batches containing java.util.UUID for each job and the number of PollingThreads are defined in the outer class. This will keep going forever and can be tuned to keep your CPUs free to handle other requests
class PollingThread implements Runnable {
    @SuppressWarnings("unchecked")
    public void run() {
        Thread.currentThread().setName("MyPollingThread");
        while (!Thread.interrupted()) {
            try {
                synchronized (incomingList) {
                    if (incomingList.size() == 0) {
                        // incoming is empty, wait for some time
                    } else {
                        // copy the incoming jobs, then clear the original
                        list = (LinkedHashSet<UUID>)
                                incomingList.clone();
                        incomingList.clear();
                    }
                }
                if (list != null && list.size() > 0) {
                    processJobs(list);
                }
                // Sleep for some time
                try {
                    Thread.sleep(seconds * 1000);
                } catch (InterruptedException e) {
                    // ignore
                }
            } catch (Throwable e) {
                // ignore
            }
        }
    }
}

Huge DB operations are usually triggered in the wee hours, when user traffic is low (say, something like 1 AM to 2 AM). Once you find that window, you can simply schedule a job to run at that time. Quartz can come in handy here, with time-based triggers. (Note: manually triggering a job is also possible.)
The processed result could then be stored in different table(s), which I'll refer to as result tables. Later, when a user wants this result, the DB operations would run against these result tables, which have minimal records and hardly any joins.
instead of adding a completely new infrastructure in my web app just for this feature?
Quartz.jar is ~350 KB, and adding this dependency shouldn't be a problem. Also note that there's no reason this needs to be part of a web app. The few classes that do the ETL could be placed in a standalone module. The request from the web app then only needs to fetch from the result tables.
All this aside, if you already have a master-slave DB model (discuss that with your DBA), then you could run the huge DB operations against the slave DB rather than the master, which normal users would be pointed to.
