Handle EJB Transaction Timeouts for a loop - java

I have a method that processes a large number of files. The problem is that this method times out as the file sizes increase. I'm using container-managed transactions for the method.
What I did was split the files into lists and hand each list off to another method annotated as REQUIRES_NEW.
I loop through the lists, processing each one in the new method with a new transaction. But when something fails in the middle of the iteration, only that transaction rolls back; the previous iterations are not rolled back. I want to roll back the previous iterations as well.
I can't treat the whole operation as a single transaction because of the timeout issue. Looking for feedback on this.

You might consider the following approach:
When the client requests file processing (startProcessing), the server (EJB) starts a background thread (e.g. using quartz-scheduler.org) to handle the files and returns the id of the operation in progress. If necessary, the client can then cancelProcessing, getProcessingStatus, etc. using that id.

Check whether you can increase the transaction timeout for your EJB method. For example, if you are using a stateless bean, you can simply annotate it:
@StatelessDeployment(transactionTimeout=10)
Another option is to look at EJB asynchronous methods:
https://docs.oracle.com/javaee/6/tutorial/doc/gkkqg.html
Asynchronous methods are typically used for long-running operations, processor-intensive tasks, background tasks, increasing application throughput, or improving application response time. Once the process is over, you can get its status.
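For illustration, a minimal sketch of an EJB asynchronous method (the bean, method, and parameter names are made up for this example):

import java.util.List;
import java.util.concurrent.Future;
import javax.ejb.AsyncResult;
import javax.ejb.Asynchronous;
import javax.ejb.Stateless;

@Stateless
public class FileProcessorBean {

    // The container returns control to the caller immediately and runs this
    // method on a separate thread, with its own transaction context.
    @Asynchronous
    public Future<String> processFiles(List<String> filePaths) {
        for (String path : filePaths) {
            // process each file here
        }
        // The caller can later check Future.isDone() / Future.get() for the outcome.
        return new AsyncResult<>("DONE");
    }
}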

Related

How to make multiple calls to a @Transactional method part of a single transaction

I have a method
@Transactional
public void updateSharedStateByCommunity(List[] idList)
This method is called from the following REST API:
@RequestMapping(method = RequestMethod.POST)
public ret_type updateUser(param) {
    // call updateSharedStateByCommunity
}
Now the ID lists are very large, like 200,000 entries. When I try to process them, it takes a long time and a timeout error occurs on the client side.
So I want to split it into two calls with a list size of 100,000 each.
But the problem is that these are treated as 2 independent transactions.
NB: The 2 calls is just an example; it could be divided into many more calls if the number of ids is larger.
I need the two separate calls to form a single transaction. If either of the 2 calls fails, then all operations should be rolled back.
Also, on the client side we need to show a progress dialog, so I can't just increase the timeout.
The most obvious direct answer to your question, IMO, is to slightly change the code:
@RequestMapping(method = RequestMethod.POST)
public ret_type updateUser(param) {
    updateSharedStateByCommunityBlocks(resolveIds);
}
...
And in the service introduce a new method (if you can't change the code of the service, provide an intermediate class that you call from the controller with the following functionality):
@Transactional
public void updateSharedStateByCommunityBlocks(List<String> resolveIds) {
    List<List<String>> blocks = split(resolveIds, 100000); // 100000 = bulk size
    for (List<String> block : blocks) {
        updateSharedStateByCommunity(block);
    }
}
If this method is in the same service, the @Transactional on the original updateSharedStateByCommunity won't have any additional effect (self-invocation bypasses the Spring proxy), so it will still work. If you put this code into some other class, it will also work, since the default propagation level of Spring transactions is REQUIRED.
So it addresses your stated requirements: you wanted a single transaction, and you've got it; all the code now runs in the same transaction. Each inner call now processes 100,000 ids rather than all of them, and everything stays synchronous :)
However, this design is problematic for many different reasons.
It doesn't let you track the progress (and show it to the user), as you yourself stated in the last sentence of the question. REST is synchronous.
It assumes that the network is reliable and that waiting for 30 minutes is technically not a problem (leaving aside the UX and the 'nervous' user who has to wait :) )
In addition, network equipment can force the connection closed (for example, load balancers with a pre-configured request timeout).
That's why people suggest some kind of asynchronous flow.
You can still use an async flow: spawn the task and, after each bulk, update some shared state (in-memory in the case of a single instance, persistent, such as a database, in the case of a cluster).
So the interaction with the client changes:
Client calls "updateUser" with 200,000 ids.
Service responds "immediately" with something like "I've got your request, here is a request id, ping me once in a while to see what happens."
Service starts an async task and processes the data chunk by chunk in a single transaction.
Client calls the "get" method with that id and the server reads the progress from the shared state.
Once ready, the "get" method responds "done".
If something fails during the transaction, it is rolled back and the process updates the status in the database to "failure".
You can also use more modern technologies to notify the client (web sockets, for example), but that's out of scope for this question.
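As a rough sketch of that polling interaction (all class, endpoint, and method names here are hypothetical, and it assumes @EnableAsync is configured; the in-memory map would become a database table in a cluster):

import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.scheduling.annotation.Async;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;
import org.springframework.web.bind.annotation.*;

@RestController
public class UserUpdateController {

    // shared progress state; in-memory only works for a single instance
    private final Map<String, String> progress = new ConcurrentHashMap<>();

    @Autowired
    private UserUpdateService service;

    @PostMapping("/users/shared-state")
    public String updateUser(@RequestBody List<String> idList) {
        String requestId = UUID.randomUUID().toString();
        progress.put(requestId, "STARTED");
        service.updateAsync(requestId, idList, progress); // returns immediately
        return requestId; // the client polls with this id
    }

    @GetMapping("/users/shared-state/{requestId}")
    public String getStatus(@PathVariable String requestId) {
        return progress.getOrDefault(requestId, "UNKNOWN");
    }
}

@Service
class UserUpdateService {

    @Async
    @Transactional // one transaction for the whole chunked update
    public void updateAsync(String requestId, List<String> idList, Map<String, String> progress) {
        int bulkSize = 100000;
        try {
            for (int i = 0; i < idList.size(); i += bulkSize) {
                int end = Math.min(i + bulkSize, idList.size());
                List<String> block = idList.subList(i, end);
                // updateSharedStateByCommunity(block);
                progress.put(requestId, "PROCESSED " + end);
            }
            progress.put(requestId, "DONE");
        } catch (RuntimeException e) {
            progress.put(requestId, "FAILED"); // the transaction rolls back
            throw e;
        }
    }
}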
Another thing to consider here: from what I know, processing 200,000 objects should take much less than 30 minutes; it's not that much for modern RDBMSs.
Of course, without knowing your use case it's hard to tell what happens there, but maybe you can optimize the flow itself (using bulk operations, reducing the number of requests to the db, caching, and so forth).
My preferred approach in these scenarios is to make the call asynchronous (Spring Boot allows this using the @Async annotation), so the client doesn't wait for an HTTP response covering the whole operation. The notification can be done via a WebSocket that pushes a message to the client with the progress every X items processed.
Surely it will add more complexity to your application, but if you design the mechanism properly, you'll be able to reuse it for any other similar operation you face in the future.
The @Transactional annotation accepts a timeout (although not all underlying implementations support it). I would argue against trying to split the IDs into two calls and would instead try to fix the timeout (after all, what you really want is a single, all-or-nothing transaction). You can also set timeouts for the whole application instead of on a per-method basis.
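For example (a sketch; the timeout value is illustrative and given in seconds, and whether it is honoured depends on the transaction manager in use):

import java.util.List;
import org.springframework.transaction.annotation.Transactional;

public class CommunityService {

    // allow up to 30 minutes for this method instead of the default timeout
    @Transactional(timeout = 1800)
    public void updateSharedStateByCommunity(List<String> idList) {
        // bulk update here
    }
}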
From a technical point of view, it can be done with org.springframework.transaction.annotation.Propagation#NESTED. NESTED propagation makes nested Spring transactions use the same physical transaction but sets savepoints between nested invocations, so inner transactions may roll back independently of the outer transaction, or let their changes propagate. The limitation is that this only works with org.springframework.jdbc.datasource.DataSourceTransactionManager as the transaction manager.
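A sketch of what that could look like (it assumes a DataSourceTransactionManager, and note that the inner method must be invoked through the Spring proxy, e.g. from a separate bean, for the attribute to take effect; names are illustrative):

import java.util.List;
import org.springframework.transaction.annotation.Propagation;
import org.springframework.transaction.annotation.Transactional;

public class CommunityService {

    @Transactional
    public void updateSharedStateByCommunityBlocks(List<List<String>> blocks) {
        for (List<String> block : blocks) {
            // should be called via the Spring proxy for NESTED to apply
            updateSharedStateByCommunity(block);
        }
    }

    // joins the caller's physical transaction but runs inside a savepoint,
    // so a failure in one block can be rolled back to the savepoint
    @Transactional(propagation = Propagation.NESTED)
    public void updateSharedStateByCommunity(List<String> block) {
        // update the shared state for this block
    }
}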
But for a really large dataset it would still need a lot of time to process and keep the client waiting, so from a solution point of view an async approach may work better; it depends on your requirements.

Single transaction across multiple Java JVMs

One Spring service is implemented in one Java deployment unit (JVM). Another Spring service is implemented in another JVM. A service call is made from the 1st JVM to the 2nd JVM. The service interface could be either REST or SOAP over HTTP. I need to keep a single transaction across the JVMs, meaning that if any service fails, everything must be rolled back. How do I do this? Any code examples?
Use global transactions (i.e., JTA),
Use XA resources (RDBMS and JMS connections), do "Full XA with 2PC".
For further reference on the Spring transaction management, including the JTA/XA scenario, read: http://docs.spring.io/spring/docs/current/spring-framework-reference/htmlsingle/#transaction
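As a minimal Spring configuration sketch for the JTA case (it assumes the application server exposes a JTA transaction manager via JNDI; names are illustrative):

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.transaction.PlatformTransactionManager;
import org.springframework.transaction.annotation.EnableTransactionManagement;
import org.springframework.transaction.jta.JtaTransactionManager;

@Configuration
@EnableTransactionManagement
public class JtaConfig {

    @Bean
    public PlatformTransactionManager transactionManager() {
        // looks up the container's UserTransaction / TransactionManager,
        // so @Transactional methods enlist in the global (XA) transaction
        return new JtaTransactionManager();
    }
}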
REST faces the exact same problem as SOAP-based web services with regards to atomic transactions. There is no stateful connection, and every operation is immediately committed; performing a series of operations means other clients can see interim states.
Unless, of course, you take care of this by design. First, ask yourself: do I have a standard set of atomic operations? This is commonly the case. For example, for a banking operation, removing a sum from one account and adding the same sum to a different account is often a required atomic operation. But rather than exporting just the primitive building blocks, the REST API should provide a single "transfer" operation which encapsulates the entire process. This provides the desired atomicity while also making client code much simpler. This approach is known as low-granularity services, or high-level batch operations.
If there is no simple, pre-defined set of desired atomic operation sequences, the problem is more severe. A common solution is the batch command pattern. Define one REST method to demarcate the beginning of a transaction, and another to demarcate its end (a 'commit' request). Anything sent between these sets of operations is queued by the server but not committed, until the commit request is sent.
This pattern complicates the server significantly -- it must maintain a state per client. Normally, the first operation ('begin transaction') returns a transaction ID (TID), and all subsequent operations, up to and including the commit, must include this TID as a parameter.
It is a good idea to enforce a timeout on transactions: if too much time has passed since the initial 'begin transaction' request, or since the last step, the server has the right to abort the transaction. This prevents a potential DoS attack that causes the server to waste resources by keeping too many transactions open. The client design must keep in mind that each operation must be checked for a timeout response.
It is also a good idea to allow the client to abort a transaction, by providing a 'rollback' API.
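A rough sketch of the begin/queue/commit/rollback shape described above (all paths and names are made up; the queued commands would be applied in one database transaction when the commit request arrives):

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping("/tx")
public class BatchCommandController {

    // queued-but-uncommitted operations, keyed by transaction id (TID)
    private final Map<String, List<Map<String, Object>>> pending = new ConcurrentHashMap<>();

    @PostMapping("/begin")
    public String begin() {
        String tid = UUID.randomUUID().toString();
        pending.put(tid, Collections.synchronizedList(new ArrayList<>()));
        return tid; // the client sends this TID with every subsequent call
    }

    @PostMapping("/{tid}/operations")
    public void queue(@PathVariable String tid, @RequestBody Map<String, Object> command) {
        pending.get(tid).add(command); // recorded, but nothing is committed yet
    }

    @PostMapping("/{tid}/commit")
    public void commit(@PathVariable String tid) {
        List<Map<String, Object>> commands = pending.remove(tid);
        // apply all queued commands inside a single database transaction here
    }

    @DeleteMapping("/{tid}")
    public void rollback(@PathVariable String tid) {
        pending.remove(tid); // client-initiated abort; nothing was committed
    }
}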
The usual care required in designing code that uses multiple concurrent transactions applies as usual in this complex design scenario. If at all possible, try to limit the use of transactions, and support high-level batch operations instead.
I take no credit for this information; I'm just relaying it. Credit goes to This article
Also please read Transactions in REST?
You can get some handy code samples here http://www.it-soa.eu/en/resp/atomicrest/userguide/index.html

Put @Transactional annotation on the 'onMessage' method

I found a similar question here but didn't find a clear answer on the transaction management for the back end (database).
My current project is to create a producer/consumer and let the consumer digest JMS messages and persist them in the database. Because the back end of the application is managed by JPA, it is critical to keep the whole process transactional. My question is: what is the downside of placing the @Transactional annotation on the classic onMessage method? Is there any potential performance problem in doing so?
The only problem may be if the whole queue process takes too long and the connection closes in the middle of the operation. Apart from this, if you enable the transaction for the whole queue process rather than for specific service methods, then theoretically the performance should be the same.
It would be better to enable two-phase commit (also known as XA transactions) for each queue process. Then define each specific service method as @Transactional and interact with your database as expected. At the end, the XA transaction will perform all the commits done by the @Transactional service methods. Note that using this approach does affect your performance.
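For reference, a sketch of what the annotated listener could look like (a Spring-managed listener with hypothetical entity and bean names; it assumes the JMS container invokes the listener through the Spring proxy so the annotation is applied):

import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.TextMessage;
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;
import org.springframework.stereotype.Component;
import org.springframework.transaction.annotation.Transactional;

@Component
public class PersistingMessageListener implements MessageListener {

    @PersistenceContext
    private EntityManager entityManager;

    @Override
    @Transactional // the JPA work for one message commits or rolls back as a unit
    public void onMessage(Message message) {
        try {
            String payload = ((TextMessage) message).getText();
            entityManager.persist(new ProcessedMessage(payload)); // hypothetical entity
        } catch (JMSException e) {
            // a runtime exception rolls back the database transaction
            throw new RuntimeException(e);
        }
    }
}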

EJB timer performance

I am trying to decide whether or not to use a Java EE timer in my application. The server I am using is WebLogic 10.3.2.
The need is: after one hour from a call to an async web service from an EJB, if the async callback method has not been called, some actions need to be executed. The information about whether the callback method has been called, and the date the call was executed, is stored in the database.
The two possibilities I see are:
Using a batch process that, every half hour, looks for all the calls that have gone more than one hour without a response and executes the needed actions.
Creating a timer of one hour after every single call to the WS and, in the @Timeout method, checking whether the answer has come; if it has not, executing the required actions.
From a pure programming point of view, the second one looks easier and cleaner, but I am worried about the performance issues I could have if, say, there are 100,000 timers created at a single moment.
Any thoughts?
You would be better off having a more specialized process. The real problem is the 100,000 issue. It would depend on how long your actions take.
It's easy to see that, each second, the EJB timer would fire up roughly 30 threads to process the currently pending jobs (100,000 per hour is about 28 per second), since that's how it works.
Also, timers are persistent, so your EJB-managed timer table will be saving and deleting about 30 rows per second (60 operations total), again assuming 100K timers per hour.
So that's a lot of work happening very quickly. I can easily see the system simply "falling behind" and never catching up.
A specialized process would be much lighter weight and could, for example, batch the action calls (call 5 actions per thread instead of one per thread), etc. It would be nice if you didn't have to persist the timer events, but that is what it is. You could simply append the timer events to a file for safety and keep them in memory. On system restart, you can reload that file, and then roll the file (every hour create a new file, delete the older file after it has all been consumed, etc.). That would save a lot of DB traffic, but you could lose the transactional nature of the DB.
Anyway, I don't think you want to use the EJB timer for this; I don't think it's really designed for this amount of traffic. But you can always test it and see. Make sure you test restarting your container to see how well it works with 100K pending timer jobs in its table.
It all depends on what the container uses. E.g. JBoss uses the Quartz Scheduler to implement the EJB timer functionality, and Quartz is pretty good when you have around 100,000 timer instances.
@Pau: why do you need to create a timer for every call made... instead you can have a single timer thread, created at application start-up, which runs every half hour (configurable) and looks in your database for all web service calls whose response has not been received and whose request time is more than 1 hour in the past. For the selected records, in a for loop, it can execute the required action.
The above design may not be useful if you have a time-critical activity to perform.
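A sketch of that single-timer design (it assumes an EJB 3.1+ container; on older containers the same timer can be created programmatically via TimerService; the DAO and entity names are hypothetical):

import java.util.List;
import javax.ejb.EJB;
import javax.ejb.Schedule;
import javax.ejb.Singleton;

@Singleton
public class PendingCallWatchdog {

    @EJB
    private PendingCallDao pendingCallDao; // hypothetical DAO over the calls table

    // one timer for the whole application, firing every 30 minutes
    @Schedule(minute = "*/30", hour = "*", persistent = false)
    public void checkOverdueCalls() {
        List<PendingCall> overdue = pendingCallDao.findCallsWithoutResponseOlderThanOneHour();
        for (PendingCall call : overdue) {
            // execute the required compensating action for each overdue call
        }
    }
}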
If you have the Spring framework in your application, you may also look at its timer services: http://static.springsource.org/spring/docs/1.2.9/reference/scheduling.html
Maybe you could use some of these ideas:
Where I'm at, we've built a cron-like scheduler which is powered by a single timer. When the timer fires, the system checks which crons need to run using a Quartz CronTrigger. Generally these crons have a lot of work to do, and the way we handle that is that each cron spins its individual tasks off as JMS messages, and MDBs then handle the messages. Currently this runs on a single Glassfish instance, and as our task load increases we should be able to scale this up with a cluster so multiple nodes process the JMS messages. We balance the JMS message-processing load for each type of task by setting max-pool-size in glassfish-ejb-jar.xml (also known as sun-ejb-jar.xml).
Building a system like this and getting all the details right isn't trivial, but it's proving really effective.

What's the best practice around when to start a new session or transaction for batch jobs using Spring/Hibernate?

I have a set of batch/cron jobs in Java that call my service classes. I'm using Hibernate and Spring as well.
Originally the batch layer always created an outer transaction, and the batch job would then call a service to get a list of objects from the DB with the same session, then call a service to process each object separately. There's tx-advice set for my service layer to roll back on any throwable. So if there's an exception on the 5th object, the first 4 objects that were processed get rolled back too, because they were all part of the same transaction.
So I was thinking this outer transaction created in the batch layer was unnecessary. I removed it, and now I call a service to get a list of objects, then call another service to process each object separately; if one of those objects fails, the others still persist because each service call is a new transaction/session. But the problem I have now is that after getting the list of objects, when I pass each object to a service to process it, if I try to get one of its properties I get a lazy initialization error, because the session used to load that object (from the list) is closed.
Some options I thought of were to get only a list of IDs in the batch job and pass each id to a service, so the service retrieves the whole object in that one session and processes it. Another is to set lazy loading to false for that object's attributes, but this would load everything every time, even when the nested attributes aren't needed.
I could always go back to the way it was originally, with the outer transaction around every batch job, and then create another transaction in the batch job before each call to the service that processes each individual object...
What's the best practice for something like this?
Well I would say that you listed every possible option except OpenSessionInView. That would keep your session alive across transactions, but it's difficult to implement properly. So difficult that it's considered an AntiPattern by many.
However, since you're not implementing a web interface and you aren't dealing with a highly threaded environment, I would say that's the way to go. It's not like you're passing entities to views. Your biggest fear is an N+1 call to the database while iterating through a collection, but since this is a cron job, performance may not be a major issue when compared with code cleanliness. If you're really worried about it, just make sure you get all of your collections via a call to a DAO who can do a select *.
Additionally, you were effectively doing an Open Session In View before when you were doing everything in the same transaction. In Spring, Sessions are opened on a per transaction basis, so keeping a transaction open a long period of time is effectively the same as keeping a Session open a long period of time. The only real difference in your case will be the fact that you can commit periodically without fear of a lazy initialization error down the road.
Edit
All that being said, it takes a bit of time to set up an Open Session in View, so unless you have any particular issues against doing everything in the same transaction, you might consider just going back to that.
Also, I just noticed that you mentioned opening a transaction in the batch layer and then opening "mini transactions" in the service layer. This is most emphatically NOT a good idea. Spring's annotation-driven transactions will piggyback on any currently open transaction. This means that transactions that are supposed to be read-only will suddenly become read-write if the currently open transaction is read-write. Additionally, the session won't be flushed until the outermost transaction is finished anyway, so there's no point in marking the service layer with @Transactional. Putting @Transactional on multiple layers only lends a false sense of security.
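To make the read-only pitfall concrete (class and method names are illustrative):

import org.springframework.transaction.annotation.Transactional;

public class BatchJob {

    private final ReferenceDataService referenceDataService; // injected Spring bean

    public BatchJob(ReferenceDataService referenceDataService) {
        this.referenceDataService = referenceDataService;
    }

    @Transactional // outer read-write transaction opened by the batch layer
    public void run() {
        referenceDataService.loadReferenceData();
    }
}

class ReferenceDataService {

    // REQUIRED (the default) joins the batch layer's read-write transaction,
    // so the readOnly hint is effectively ignored here
    @Transactional(readOnly = true)
    public void loadReferenceData() {
        // entities modified here can still be flushed and committed
        // when the outer transaction completes
    }
}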
I actually blogged about this issue some time ago.
