EJB Spec says you shouldn't manage threads. I have seen Bean code that sends remote requests and loops with a Thread.sleep waiting for a response to reduce CPU usage. From what I understand this breaks spec. Does simply calling the logic from a separate POJO or library that is instantiated then referenced in the EJB's method fix this? Does simply removing Thread.sleep fix the issue at the cost of additional CPU consumption? How should external synchronous requests be coded in EJBs?
That depends on the business case. The EJB spec provides plenty of resources for async/sync processing without boilerplate code using Thread, Runnable or any other such mechanism.
To execute a piece of code asynchronously (that is, the caller won't wait for the response but carries on), use @Asynchronous, and Future<T> if you want to listen for the response afterwards.
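As a rough sketch (the bean and method names here are invented for illustration, not taken from your code):

import java.util.concurrent.Future;
import javax.ejb.AsyncResult;
import javax.ejb.Asynchronous;
import javax.ejb.Stateless;

@Stateless
public class RemoteRequestBean {

    // The container executes this on one of its managed threads;
    // the caller gets a Future back immediately instead of blocking.
    @Asynchronous
    public Future<String> sendRemoteRequest(String payload) {
        String response = callRemoteSystem(payload); // hypothetical blocking call
        return new AsyncResult<>(response);
    }

    private String callRemoteSystem(String payload) {
        // ... the actual remote invocation ...
        return "response-for-" + payload;
    }
}

The caller can then check future.isDone() or block on future.get() without ever managing a thread itself.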
A synchronous call, as you call it, is a call that waits for the response, so "How should external synchronous requests be coded in EJBs" is something that doesn't need any kind of asynchronous/background execution. You just make the call and the code itself waits for the response (otherwise it would be asynchronous), the typical case being a Web Service (either REST or SOAP).
Web Service calls can actually be synchronous or asynchronous, depending on the business case, but they are usually synchronous: you make the call and receive a response with the data. In cases where the business logic takes a while to execute, the Web Service receives the request, may launch the business logic asynchronously (with an @Asynchronous method, for instance) and respond immediately with a plain HTTP 202 - Accepted, which basically means "Hey! The request you just sent me is going to take a while, so I'll do it in the background".
In that case, maybe you have another web service that you need to check to see how that long-lasting process is going. That is the only case I can think of in which someone would want that Thread.sleep(...) in a loop, checking the Web Service until it tells you that the process has finished.
Luckily, EJB also provides a solution for that business case:
You can use @Schedule methods in case you need to check/do something indefinitely at specific intervals: something to do every day at 02:00, or every first day of the month, or even every 2 seconds.
Or TimerService and @Timeout, in case you want to programmatically schedule a single task. The latter fits the business case we are discussing better.
So you call the TimerService with the timespan you want to wait until the next check. When the time comes, the @Timeout method is fired, in which you can check whatever you need and schedule another execution if you need one, even with a new timespan.
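A rough sketch of that approach (the bean, method names and the 5-second delay are invented for illustration):

import javax.annotation.Resource;
import javax.ejb.Stateless;
import javax.ejb.Timeout;
import javax.ejb.Timer;
import javax.ejb.TimerConfig;
import javax.ejb.TimerService;

@Stateless
public class StatusPoller {

    @Resource
    private TimerService timerService;

    // Schedule the first check, e.g. 5 seconds from now.
    public void scheduleCheck(String requestId) {
        timerService.createSingleActionTimer(5000, new TimerConfig(requestId, false));
    }

    // Fired by the container when the timer expires.
    @Timeout
    public void check(Timer timer) {
        String requestId = (String) timer.getInfo();
        if (!isFinished(requestId)) {   // hypothetical status check
            scheduleCheck(requestId);   // re-schedule, possibly with a new timespan
        }
    }

    private boolean isFinished(String requestId) {
        // call the status web service here
        return false;
    }
}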
Related
I have a method
@Transactional
public void updateSharedStateByCommunity(List<String> idList)
This method is called from the following REST API:
@RequestMapping(method = RequestMethod.POST)
public ret_type updateUser(param) {
    // call updateSharedStateByCommunity
}
Now, the ID list is very large (around 200,000 entries). When I try to process it, it takes a long time and a timeout error occurs on the client side.
So I want to split it into two calls with a list size of 100,000 each.
But the problem is that these are then treated as 2 independent transactions.
NB: the 2 calls are just an example; the list can be divided into many more parts if the number of IDs is larger.
I need the two separate calls to run in a single transaction. If either of the 2 calls fails, all operations should be rolled back.
Also, on the client side we need to show a progress dialog, so I can't just increase the timeout.
The most obvious direct answer to your question IMO is to slightly change the code:
@RequestMapping(method = RequestMethod.POST)
public ret_type updateUser(param) {
    updateSharedStateByCommunityBlocks(resolveIds);
}
...
And in the service introduce a new method (if you can't change the code of the service, provide an intermediate class that you'll call from the controller) with the following functionality:
@Transactional
public void updateSharedStateByCommunityBlocks(List<String> resolveIds) {
    List<List<String>> blocks = split(resolveIds, 100000); // 100000 - bulk size
    for (List<String> block : blocks) {
        updateSharedStateByCommunity(block);
    }
}
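The split(...) helper is not defined in the answer; a possible sketch of it (just an illustration, using java.util.List and java.util.ArrayList) could be:

private List<List<String>> split(List<String> ids, int bulkSize) {
    List<List<String>> blocks = new ArrayList<>();
    for (int i = 0; i < ids.size(); i += bulkSize) {
        // copy the subList view so each block is an independent list
        blocks.add(new ArrayList<>(ids.subList(i, Math.min(i + bulkSize, ids.size()))));
    }
    return blocks;
}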
If this method is in the same service, the @Transactional on the original updateSharedStateByCommunity won't do anything, so it will work. If you put this code into some other class, it will also work, since the default propagation level of Spring transactions is REQUIRED.
So it addresses your strict requirement: you wanted a single transaction - you've got it. All the code now runs in the same transaction, each inner call processes 100,000 IDs instead of all of them, and everything is synchronous :)
However, this design is problematic for many different reasons.
It doesn't allow you to track the progress (and show it to the user), as you've stated yourself in the last sentence of the question. REST is synchronous.
It assumes that the network is reliable and that waiting for 30 minutes is technically not a problem (leaving aside the UX and the 'nervous' user who has to wait :) )
In addition, network equipment can force the connection closed (like load balancers with a pre-configured request timeout).
That's why people suggest some kind of asynchronous flow.
You can still use an async flow: spawn the task, and after each bulk update some shared state (in-memory in the case of a single instance, persistent, like a database, in the case of a cluster).
So the interaction with the client changes (a sketch of this flow follows the list):
Client calls "updateUser" with 200,000 ids.
Service responds "immediately" with something like "I've got your request, here is a request id, ping me once in a while to see what happens."
Service starts an async task and processes the data chunk by chunk in a single transaction.
Client calls the "get" method with that id and the server reads the progress from the shared state.
Once ready, the "get" method responds "done".
If something fails during the transaction execution, the rollback is done, and the process updates the database status to "failure".
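A rough sketch of that flow with Spring (the class names, the IntConsumer progress callback and the in-memory map are assumptions for illustration; updateSharedStateByCommunityBlocks is the @Transactional method from the earlier snippet, here assumed to accept a progress callback):

import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.springframework.scheduling.annotation.Async;
import org.springframework.stereotype.Service;

@Service
public class BulkUpdateOrchestrator {

    // requestId -> percent done; -1 means failed (single instance only;
    // use a database table instead when running in a cluster)
    private final Map<String, Integer> progress = new ConcurrentHashMap<>();

    private final CommunityService communityService; // holds the @Transactional blocks method

    public BulkUpdateOrchestrator(CommunityService communityService) {
        this.communityService = communityService;
    }

    // Requires @EnableAsync on a configuration class.
    @Async
    public void updateAsync(String requestId, List<String> ids) {
        progress.put(requestId, 0);
        try {
            // The whole call runs in one transaction inside CommunityService;
            // the callback only touches the in-memory map, so it survives a rollback.
            communityService.updateSharedStateByCommunityBlocks(
                    ids, percent -> progress.put(requestId, percent));
        } catch (RuntimeException e) {
            progress.put(requestId, -1);
        }
    }

    public int getProgress(String requestId) {
        return progress.getOrDefault(requestId, 0);
    }
}

The controller generates a requestId, calls updateAsync and returns the id immediately; a separate GET endpoint exposes getProgress(requestId) so the client can render its progress dialog.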
You can also use more modern technologies to notify the client (web sockets, for example), but that's kind of out of scope for this question.
Another thing to consider here: from what I know, processing 200,000 objects should take much less than 30 minutes; it's not that much for modern RDBMSs.
Of course, without knowing your use case it's hard to tell what happens there, but maybe you can optimize the flow itself (using bulk operations, reducing the number of requests to the db, caching, and so forth).
My preferred approach in these scenarios is to make the call asynchronous (Spring Boot allows this using the @Async annotation), so the client won't wait for the HTTP response of the full operation. The notification could be done via a WebSocket that pushes a message to the client with the progress after every X items processed.
Surely it will add more complexity to your application, but if you design the mechanism properly, you'll be able to reuse it for any other similar operation you may face in the future.
The @Transactional annotation accepts a timeout (although not all underlying implementations support it). I would argue against trying to split the IDs into two calls, and instead try to fix the timeout (after all, what you really want is a single, all-or-nothing transaction). You can set timeouts for the whole application instead of on a per-method basis.
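For example (the timeout value here is arbitrary; it is specified in seconds):

// Allows the single all-or-nothing transaction up to 30 minutes before Spring aborts it.
@Transactional(timeout = 1800)
public void updateSharedStateByCommunity(List<String> idList) {
    // ... existing update logic ...
}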
From a technical point of view, it can be done with org.springframework.transaction.annotation.Propagation#NESTED propagation. The NESTED behavior makes nested Spring transactions use the same physical transaction but sets savepoints between nested invocations, so inner transactions may roll back independently of outer transactions, or let the rollback propagate. The limitation is that it only works with an org.springframework.jdbc.datasource.DataSourceTransactionManager data source.
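Sketched out, it could look like this (a sketch only; the inner method must live on a different Spring bean for the proxy-based propagation to apply, and the names follow the earlier snippets):

import org.springframework.transaction.annotation.Propagation;
import org.springframework.transaction.annotation.Transactional;

// Outer bean: one physical transaction for the whole run.
@Transactional
public void updateSharedStateByCommunityBlocks(List<String> resolveIds) {
    for (List<String> block : split(resolveIds, 100000)) {
        blockUpdater.updateBlock(block); // call goes through the proxy of another injected bean
    }
}

// Inner bean: each call gets a savepoint in the same physical transaction,
// so a failing block can roll back to its savepoint without losing the rest.
@Transactional(propagation = Propagation.NESTED)
public void updateBlock(List<String> block) {
    updateSharedStateByCommunity(block);
}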
But for a really large dataset it still needs more time to process and makes the client wait, so from a solution point of view an async approach may be better; it depends on your requirements.
Essentially I've written a service in Java that will do initial synchronous processing (a couple of simple calls to other web services). Then, after that processing is done, I return an acknowledgement message to the caller, saying I've verified their request and there is now downstream processing happening in the background asynchronously.
In a nutshell, what I'm concerned about is the complexity of the async processing. The sum of those async calls can take up to 2-3 minutes depending on certain parameters sent. My thought here is: what if there's a lot of traffic at once hitting my service, and there are a bunch of hanging threads in the background, doing a large chunk of processing. Will there be bad data as a result? (like one request getting mixed in with a previous request etc)
The code follows this structure:
1. Validation of headers and params in body
2. Synchronous processing
3. Return acknowledgement message to the caller
4. Asynchronous processing
For #4, I've simply made a new thread that calls a method doing all the async processing within it. Like this:
new Thread()
{
    @Override
    public void run()
    {
        try {
            makeDownstreamCalls(arg1, arg2, arg3, arg4);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}.start();
I'm basically wondering about unintended consequences of lots of traffic hitting my service. An example I'm thinking about: a thread executing downstream calls for request A, and then another request comes in, and a new thread has to be made to execute downstream calls for request B. How is request B handled in this situation, and what happens to request A, which is still in-progress? Will the async calls in request A just terminate in this case? Or can each distinct request, and thread, execute in parallel just fine and complete, without any strange consequences?
Well, the answer depends on your code, of which you posted a small part, so my answer contains some guesswork. I'll assume that we're talking about some sort of multi-threaded server which accepts client requests, and that those request come to some handleRequest() method which performs the 4 steps you've mentioned. I'll also assume that the requests aren't related in any way and don't affect each other (so for instance, the code doesn't do something like "if a thread already exists from a previous request then don't create a new thread" or anything like that).
If that's the case, then your handleRequest() method can be simultaneously invoked by different server threads concurrently. And each will execute the four steps you've outlined. If two requests happen simultaneously, then a server thread will execute your handler for request A, and a different one will execute it for B at the same time. If during the processing of a request, a new thread is created, then one will be created for A, another for B. That way, you'll end up with two threads performing makeDownstreamCalls(), one with A's parameters one with B's.
In practice, that's probably a pretty bad idea. The more threads your program creates, the more context switching the OS has to do. You really don't want the number of requests to increase the number of threads in your application endlessly. Modern OSes are capable of handling hundreds or even thousands of threads (as long as they're bound by IO, not CPU), but it comes at a cost. You might want to consider using a Java executor with a limited number of threads to avoid overwhelming your process or even the OS.
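A minimal sketch of that idea (the pool size and class name are arbitrary, and the argument types are placeholders since the question doesn't show them):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class DownstreamCallDispatcher {

    // One bounded pool created at startup and shared by all requests,
    // instead of an unbounded number of ad-hoc threads.
    private final ExecutorService pool = Executors.newFixedThreadPool(20);

    public void dispatch(final Object arg1, final Object arg2,
                         final Object arg3, final Object arg4) {
        pool.submit(new Runnable() {
            @Override
            public void run() {
                try {
                    makeDownstreamCalls(arg1, arg2, arg3, arg4);
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        });
    }

    private void makeDownstreamCalls(Object a, Object b, Object c, Object d) {
        // ... the existing downstream logic ...
    }
}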
If there's too much load on a server, you can't expect your application to handle it. Process what you can within the limits of the application and reject further requests - this is known as "load shedding". Accepting more requests when you're fully loaded means that your application crashes and none of the requests get processed.
From JavaScript, I am calling a REST method which is computationally intensive. Would it be possible to stop that REST call if I am no longer interested in what it returns?
I understand it is possible to abort a request in JS, but it won't stop the thread which gets triggered by the REST call. This is how I am aborting the ajax call in JS:
Abort Ajax requests using jQuery
The REST interface is written in Java, and internally this thread may create multiple threads as well.
I would like to stop the Java thread, but from the caller - from the JS where I triggered it.
How to properly stop the Thread in Java?
As Chris mentioned in the comments above, REST calls should be quick, definitely not an hour long. If the server needs to do a lot of work that takes a considerable amount of time, you should change your design to be asynchronous: either provide a callback that the server will use once it's done (also called the push approach), or poll every few minutes by sending a new request to the server to see if it's done.
In order to implement this you'll need the server to return a unique id for each request, so that the callback/check call can identify the status of that specific request.
The unique id should be generated on the server side, to avoid two clients sending the same id and overriding each other.
In the link posted above you can see an example of how to implement a "stop thread" mechanism, which can be implemented on the server side and called by the client whenever needed.
You could send a unique identifier along with your request, and then make another request that instructs the server to abort the operation started for that ID.
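On the Java side that could be sketched roughly like this (the registry, class names and the assumption that the work checks for interruption are illustrative, not from the question):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class TaskRegistry {

    private final ExecutorService pool = Executors.newCachedThreadPool();
    private final Map<String, Future<?>> running = new ConcurrentHashMap<>();

    // Called by the REST method that starts the expensive computation.
    public void start(String requestId, Runnable work) {
        running.put(requestId, pool.submit(work));
    }

    // Called by the "abort" REST method triggered from JS.
    // cancel(true) interrupts the worker thread; the task must check
    // Thread.currentThread().isInterrupted() periodically to actually stop.
    public void abort(String requestId) {
        Future<?> task = running.remove(requestId);
        if (task != null) {
            task.cancel(true);
        }
    }
}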
Is there any way to check whether an async ServletRequest has completed, from an AsyncContext? I saw that Spring has some kind of wrapper that supports this, but googling around I couldn't find anything in the standard library, which is what I was hoping for.
I am using Tomcat 7.
Sounds like one of two things - you either need a listener that will be called upon asynchronous request completion, or you don't need an asynchronous call at all.
Your question is a bit too general.
Generally speaking, asynchronous calls are used when the caller is not interested in the immediate result of the call.
If the caller is interested in knowing the result of the call immediately, then synchronous calls should be used.
If the caller is not interested in knowing the result immediately (for example, it has secondary priority, like logging in some business applications), but some action should be performed when the asynchronous call finishes, you should use some sort of listener.
What you need for an asynchronous call is a listener (implementing the javax.servlet.AsyncListener interface).
In the listener you will know for sure that the asynchronous call is over (the onComplete method) and can perform some action to finalize/complement the asynchronous call.
Again, if you see that the caller needs to know the result immediately upon completion, there is probably a mistake in your architecture. You should use a synchronous call - just wait until the call is done and you will have its result. Using an asynchronous call is wrong in this situation.
I have seen people use some sort of loop to check the result of an asynchronous call from time to time, but in 99.99% of cases such an approach is the result of an architectural mistake.
You can register an AsyncListener which implements the onComplete() method.
The AsyncListener needs to be added to the AsyncContext.
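A minimal sketch with the standard Servlet 3.0 API (available on Tomcat 7); the AtomicBoolean is just one way to keep your own "completed" flag, since AsyncContext itself does not expose one:

import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;
import javax.servlet.AsyncContext;
import javax.servlet.AsyncEvent;
import javax.servlet.AsyncListener;

// Inside your servlet's service/doGet method; request is the HttpServletRequest.
final AtomicBoolean completed = new AtomicBoolean(false);

AsyncContext asyncContext = request.startAsync();
asyncContext.addListener(new AsyncListener() {
    @Override public void onComplete(AsyncEvent event) throws IOException { completed.set(true); }
    @Override public void onTimeout(AsyncEvent event) throws IOException { completed.set(true); }
    @Override public void onError(AsyncEvent event) throws IOException { completed.set(true); }
    @Override public void onStartAsync(AsyncEvent event) throws IOException { }
});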
I have a long-running task (a report) which would exceed any TCP connection timeout before it starts returning data. Asynchronous servlets (introduced in Servlet 3.0) are exactly what I need; however, I am limited to Servlet 2.4.
Are there any "roll-your-own" solutions? What I'm doing feels hacky - I kick off the task asynchronously in a thread and return to the client immediately. The client then polls every few seconds (with ajax) and checks for a "ready" status for this task ID (a static list maintains their statuses and some handles to the objects processed by the thread). Once ready, I inject the output stream into the work object so the thread can write the results back to the client.
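For reference, the registry part of that "roll-your-own" approach could look roughly like this (a sketch under the question's own constraints: single JVM, in-memory state, Servlet 2.4 so no async API; all names are invented):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public final class ReportRegistry {

    public enum Status { RUNNING, READY, FAILED }

    // Shared between the servlet that starts the report and the one the client polls.
    private static final Map<String, Status> STATUS = new ConcurrentHashMap<String, Status>();

    public static void start(final String taskId, final Runnable reportJob) {
        STATUS.put(taskId, Status.RUNNING);
        new Thread(new Runnable() {
            public void run() {
                try {
                    reportJob.run();
                    STATUS.put(taskId, Status.READY);
                } catch (RuntimeException e) {
                    STATUS.put(taskId, Status.FAILED);
                }
            }
        }).start();
    }

    // Polled by the client's ajax calls until READY, after which the results are streamed back.
    public static Status status(String taskId) {
        return STATUS.get(taskId);
    }
}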
You can implement the reverse ajax technique, which means that instead of polling many times for the response, you get the response once the task has finished.
There is a quick way to implement the reverse-ajax technique using DWR here. But you should keep the use of the static list. If your background-task business logic is complicated, you can use an ESB or something more sophisticated.