I have a Java EE web application. Now when a particular request comes (say /xyz url patter) I want to do complex procesing as follows
Each of the following 3 steps are very complex and takes time.
Get data from one table from DB.Table has huge data and querying takes time.
Make a web service call to some other webserive A and get its data.
Make another web service call to some otheer webserice B and get its data .
Do some processing by using output of 1, 2, 3
1, 2, and 3 are independent of each other so can be called in parallel.
Now the questions are:
Can I do operations 1, 2, and 3 in three separate threads?
Is it advisable to create 3 threads for each request?
Should I use thread pooling?
To address your first question I go through the 4 steps:
Yes, if the database driver you are using allows concurrent access, respectively is safe to use from different threads.
A web service is normally designed to deal with different requests at the same time so this should work as well, the question here is how many threads you want to use (and how long it takes to process one request) and whether the web service will guard itself against too many requests at once.
The same applies here.
Yes, but you have to do synchronization here, as in: wait until all threads have received their results. You can realize this with a java.util.concurrent.CyclicBarrier
Second question
That depends on your data and especially how fast the web services will answer, you should try it out.
Third question Definitively, that's what they are for. This will also help you to structure your application.
1) Can i do operations 1 ,2 and 3 in three separate threads?
Yes, you can.
2) Is it advisable to create 3 threads for each request?
As long as these things don't depend on each other, and as long as you're not depending on getting these in the same transaction, then it seems like it should be ok. You will have to handle the case where one or more threads don't succeed, of course. You'll need a separate watchdog thread to cancel the threads if they take too long or if one comes back with a failure.
3) Should I use thread pooling?
Regardless of what else you do, whenever you use threads you should use a pool. That way if there's a problem where threads don't complete or go into some bad state or otherwise become unavailable, you protect your application from running out of threads.
Related
Angular 4 application sends a list of records to a Java spring MVC application that has been deployed in Websphere 8 Servlet container. The list is then inserted into to a temp table. After the batch insert, a procedure call is made in order to do some calculations and return results. Depending on the size of the list that was inserted into temp table it may take anywhere between: 3000ms( N ~ 500 ), 6000ms( N ~ 1000 ), 50,000+ms ( N > 2000 ).
My asendach would be to create chunks of data and simultaneously send them to database for processing. After threads (Futures) return results I would aggregate them and return back to the client. To sum up, I would split a synchronous call into multiple asynchronous processes(simultaneously executed) and return back to the client over the same thread that initiated HTTP call - landed into my controller.
Everything would be fine and I would not be asking this questions if a more experienced colleague of mine was not strongly disagreeing with this approach. His reasoning is that using this approach is prone to exceptions due to thread interrupts / timeouts / semaphores and so on. Hi is going as far as saying that multithreading should be avoided within a web container because it can crash the Servlet container in case it runs out of threads.
He proposes that we should have the browser send multiple AJAX requests and aggregates/present data in chunks.
Can you please help me understand which approach is better and why?
I would say that your approach is much better.
Threads created by application logic aren't application container threads and limited only by operating system. While each AJAX request uses a thread from application container. So the second approach reduces throughput and increases the possibility of reaching application container limit while and the first one not. Performance also should be considered because it's much cheaper to create a thread than to send a request over network. Plus each network requests uses additional resources for authentication/authorization/encryption etc.
It's definetely harder to write correct multithread code and it can easily prone to errors. However it shouldn't stop you from doing it because concurrency can significantly increase your performance. It's pretty straightforward to handle interrupts and timeouts using Future and you for sure don't need semaphores here.
Exposing this logic to client looks like breaking of encapsulation. Imagine that you use rest api which forces you to send multiple request by splitting you data in chunks. What chunk size should i use? How to deal with timeouts/interrupts? How many requests should i sent? etc. You will have almost the same challenges in both approaches, but it's much easier to deal with them using specially designed for this libraries like ExecutorService and Future.
I would like to make a question to the comunity and get as many feedbacks as possible about an strategy I have been thinking, oriented to resolve some issues of performance in my project.
The context:
We have an important process that perform 4 steps.
An entity status change and its persistence
If 1 ends OK. Entity is exported into a CSV file.
If 2 ends OK. Entity is exported into another CSV. This one with way more Info.
If 3 ends OK. The last CSV is sent by mail
Steps 1 and 2 are linked and they are critical.
Steps 3 and 4 are not critical. Doesn't even care if they ends successfully.
Performance of 1-2 is fine, but 3-4 in some escenarios are just insanely slow. Mostly cause step 3.
If we execute all the steps as a sequence, some times step 3 causes a timeout. Client do not get any response about steps 1 and 2 (the important ones) and user don't know whats going on.
This case made me think in JMS queues in order to delegate the last 2 steps to another app/process. Deallocate the notification from the business logic. Second export and mailing will be processed when posible and probably in parallel. I could also split it in 2 queues: exports, mail notification.
Our webapp runs into a WebLogic 11 cluster, so I could use its implementation.
What do you think about the strategy? Is WebLogic JMS implementation anything good? Should I check another implementation? ActiveMQ, RabbitMQ,...
I have also thinking on tiketing system implementation with spring-tasks.
At this point I have to point at spring-batch. Its usage is limited. We have already so many jobs focused on important processes of data consolidation and the window of time for allocation of more jobs is limited. Plus the impact of to try to process all items massively at once.
May be we could if we find out a way to use the multithread of spring-batch but we didn't find yet the way to fit oír requirements into such strategy.
Thank you in advance and excuse my english. I promise to keep working hard on it :-).
One problem to consider is data integrity. If step n fails, does step n-1 need to be reversed? Is there any ordering dependencies that you need to be aware of? And are you writing to the same or different CSV? If the same, then might have contention issues.
Now, back to the original problem. I would consider Java executors, using 4 fixed-sized pools and move the task through the pools as successes occur:
Submit step 1 to pool 1, getting a Future back, which will be used to check for completion.
When step 1 completes, you submit step 2 to pool 2.
When step 2 completes, you now can return a result to the caller. The call to this point has been waiting (likely with a timeout so it doesn't hang around forever) but now the critical tasks are done.
After returning to the client, submit step 3 to pool 3.
When step 3 completes, submit step to pool 4.
The pools themselves, while fixed sized, could be larger for pool 1/2 to get maximum throughput (and to get back to your client as quickly as possible) and pool 3/4 could be smaller but still large enough to get the work done.
You could do something similar with JMS, but the issues are similar: you need to have multiple listeners or multiple threads per listener so that you can process at an appropriate speed. You could do steps 1/2 synchronously without a pool, but then you don't get some of the thread management that executors give you. You still need to "schedule" steps 3/4 by putting them on the JMS queue and still have listeners to process them.
The ability to recover from server going down is key here, but Executors/ExecutorService has not persistence, so then I'd definitely be looking at JMS (and then I'd be queuing absolutely everything up, even the first 2 steps) but depending on your use case it might be overkill.
Yes, an event-driven approach where a message bus makes the integration sounds good. They are asynch so you will not have timeout. Of course you will need to use a Topic. WLS has some memory issues when you have too many messages in the server, maybe a different server would work better for separation of concerns and resources.
This question already has answers here:
Does multi-threading improve performance? How?
(2 answers)
Closed 8 years ago.
I have a List<Object> objectsToProcess.Lets say it contains 1000000 item`s. For all items in the array you then process each one like this :
for(Object : objectsToProcess){
Go to database retrieve data.
process
save data
}
My question is : would multi threading improve performance? I would of thought that multi threads are allocated by default by the processor anyways?
In the described scenario, given that process is a time-consuming task, and given that the CPU has more than one core, multi-threading will indeed improve the performance.
The processor is not the one who allocates the threads. The processor is the one who provides the resources (virtual CPUs / virtual processors) that can be used by threads by providing more than one execution unit / execution context. Programs need to create multiple threads themselves in order to utilize multiple CPU cores at the same time.
The two major reasons for multi-threading are:
Making use of multiple CPU cores which would otherwise be unused or at least not contribute to reducing the time it takes to solve a given problem - if the problem can be divided into subproblems which can be processed independently of each other (parallelization possible).
Making the program act and react on multiple things at the same time (i.e. Event Thread vs. Swing Worker).
There are programming languages and execution environments in which threads will be created automatically in order to process problems that can be parallelized. Java is not (yet) one of them, but since Java 8 it's on a good way to that, and Java 9 maybe will bring even more.
Usually you do not want significantly more threads than the CPU provides CPU cores, for the simple reason that thread-switching and thread-synchronization is overhead that slows down.
The package java.util.concurrent provides many classes that help with typical problems of multithreading. What you want is an ExecutorService to which you assign the tasks that should be run and completed in parallel. The class Executors provides factor methods for creating popular types of ExecutorServices. If your problem just needs to be solved in parallel, you might want to go for Executors.newCachedThreadPool(). If your problem is urgent, you might want to go for Executors.newWorkStealingPool().
Your code thus could look like this:
final ExecutorService service = Executors.newWorkStealingPool();
for (final Object object : objectsToProcess) {
service.submit(() -> {
Go to database retrieve data.
process
save data
}
});
}
Please note that the sequence in which the objects would be processed is no longer guaranteed if you go for this approach of multithreading.
If your objectsToProcess are something which can provide a parallel stream, you could also do this:
objectsToProcess.parallelStream().forEach(object -> {
Go to database retrieve data.
process
save data
});
This will leave the decisions about how to handle the threads to the VM, which often will be better than implementing the multi-threading ourselves.
Further reading:
http://docs.oracle.com/javase/tutorial/collections/streams/parallelism.html#executing_streams_in_parallel
http://docs.oracle.com/javase/8/docs/api/java/util/concurrent/package-summary.html
Depends on where the time is spent.
If you have a load of calculations to do then allocating work to more threads can help, as you say each thread may execute on a separate CPU. In such a situation there is no value in having more threads than CPUs. As Corbin says you have to figure out how to split the work across the threads and have responsibility for starting the threads, waiting for completion and aggregating the results.
If, as in your case, you are waiting for a database then there can be additional value in using threads. A database can serve several requests in paraallel (the database server itself is multi-threaded) so instead of coding
for(Object : objectsToProcess){
Go to database retrieve data.
process
save data
}
Where you wait for each response before issuing the next, you want to have several worker threads each performing
Go to database retrieve data.
process
save data
Then you get better throughput. The trick though is not to have too many worker threads. Several reasons for that:
Each thread is uses some resources, it has it's own stack, its own
connection to the database. You would not want 10,000 such threads.
Each request uses resources on the server, each connection uses memory, each database server will only serve so many requests in parallel. You have no benefit in submitting thousands of simultaneous requests if it can only server tens of them in parallel. Also If the database is shared you probably don't want to saturate the database with your requests, you need to be a "good citizen".
Net: you will almost certainly get benefit by having a number of worker threads. The number of threads that helps will be determined by factors such as the number of CPUs you have and the ratio between the amount of processing you do and the response time from the DB. You can only really determine that by experiment, so make the number of threads configurable and investigate. Start with say 5, then 10. Keep your eye on the load on the DB as you increase the number of threads.
My question is "is Google appengine single threaded? .Now when i ask that i know that i cannot start my own threads by using threading in java .But we can start threads using backend.
I am concerned about threading with request to how requests are handled.I read someonewhere that in appengine each request is queued and then served one by one.And i can configure the max time for which a request can be queued.If time to server request exceeds max time then new instance is created.
So what if i want to use single instance (free quota).
If i get multiple requests as r1 , r2 ,r3,r4 (in this order).Then will each of the requests be served one after other (in case of single instance)?
If i create multiple instances when the load increases and new instance is created dynamically will the data that is present in main memory of instance one will it be cloned to instance too?
Will the data in 2 instances in synch all the time?
Agree with what Nick said, but also want to point out that this statement:
"Now when i ask that i know that i cannot start my own threads by using threading in java"
is no longer true. For more details, see the section about threads here:
https://developers.google.com/appengine/docs/java/runtime#The_Sandbox
So, in summary, App Engine is multi-threaded in a couple of ways:
- requests can be handled concurrently by a single instance using a thread per request
- a single request may explicitly start additional threads
As stated in the docs, you can enable concurrent requests on your Java app, in which case multiple threads will be spawned, each of which handles requests independently.
Instances are not cloned off already running instances, nor are they synchronized in any way - you are expected to write your code in a manner that doesn't depend on specific mutable instance state.
I am confused about the applicability of multi threading in general...
I am creating an application which executes some code which has been saved in xml format. The work is to use apache http client and retrieve some data from websites...More than 1 website can be visited by one block of code in xml...
Now I want that if 2 users have created their own respective codes and saved them in XML, then each user's 'job' (ie block of code in xml format) runs in a separate thread.
I have with me code to execute one user's code...Now I want that multiple persons' code can be run in parallel. But I have some doubts--
(1) The Apache HTTP client provides a way of multithreaded communication, currently I am simply using the default HTTP client- this same client can be made to visit multiple websites, one after the other- as per code block in xml. Am I correct in thinking that I do not need to change my code so that it uses the recommended multithreaded communication?
(2) I am thinking of creating a servlet that when invoked, executes one block of xml code. So to execute 2 blocks of code as given by 2 different users, I will have to invoke this servlet twice. I am going to deploy this application using Amazon Elastic Beanstalk, so what I am confused about is, do I need to use multi threading at all in my program? Can I not simply invoke the existing code (which is used to execute one block of code at a time) from the servlet? And I do want to keep processing of the different blocks of XML code separate from each other, so I dont think I should use multi threading here.. Am I correct in my assumption?
Running it one after the other as per your 1st option will not be considered 'concurrent' .
Coming to the servlet method , the way you describe it will work concurrently , but you also need to think about how many users concurrently ? Since for each user , there would be a separate request , there would be some network latency involved for multiple calls. You need to think about all these factors before going ahead with this option
Since you have the code for one user's job , you can define a thread class which has userid as an attribute. In the run() method call the code for a particular user's job.
Now create two threads and set the appropriate userid for each thread and spawn them off.
If the number of users are more , you can look at using Java's Thread Pool Executor .
Since you are going to use a servlet container then it's going to manage multithreading for you. Every servlet request will be executed in a different thread. In that scenario one servlet call would execute on block of code from provided XML in a single threaded manner. If there are several sites declared per block of code they would be visited serially. Other user in the same time may call the same server with other block of code running in parallel with the first one.