I have a method:
@Transactional
public void updateSharedStateByCommunity(List<String> idList)
This method is called from the following REST API:
@RequestMapping(method = RequestMethod.POST)
public ret_type updateUser(param) {
// call updateSharedStateByCommunity
}
Now the ID lists are very large (e.g. 200,000 IDs). When I try to process them it takes a long time, and on the client side a timeout error occurs.
So I want to split it into two calls with a list size of 100,000 each.
But the problem is that the two calls are then treated as two independent transactions.
NB: The two calls are just an example; the list can be divided into many more parts if the number of IDs is even larger.
I need the separate calls to run in a single transaction: if any one of them fails, everything should be rolled back.
Also, on the client side we need to show a progress dialog, so I can't simply increase the timeout.
The most obvious direct answer to your question IMO is to slightly change the code:
@RequestMapping(method = RequestMethod.POST)
public ret_type updateUser(param) {
    updateSharedStateByCommunityBlocks(resolveIds);
}
...
And in the Service introduce a new method (if you can't change the code of the service, provide an intermediate class that you'll call from the controller) with the following functionality:
@Transactional
public void updateSharedStateByCommunityBlocks(List<String> resolveIds) {
    List<List<String>> blocks = split(resolveIds, 100000); // 100000 - bulk size
    for (List<String> block : blocks) {
        updateSharedStateByCommunity(block);
    }
}
If this method is in the same service, the @Transactional on the original updateSharedStateByCommunity won't do anything, so it will still work. If you put this code into some other class, it will also work, since the default propagation level of Spring transactions is REQUIRED.
So it addresses your hard requirements: you wanted a single transaction and you've got it. All the code now runs in the same transaction, each call processes 100,000 IDs rather than all of them, and everything stays synchronous :)
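By the way, split above is not an existing utility in this sketch; a minimal version (or Guava's Lists.partition, which does the same thing) could look like this:

import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: partitions the id list into sublists of at most bulkSize elements.
private static List<List<String>> split(List<String> ids, int bulkSize) {
    List<List<String>> blocks = new ArrayList<>();
    for (int i = 0; i < ids.size(); i += bulkSize) {
        blocks.add(ids.subList(i, Math.min(i + bulkSize, ids.size())));
    }
    return blocks;
}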
However, this design is problematic for many different reasons.
It doesn't allow you to track progress (and show it to the user), as you've stated yourself in the last sentence of the question. REST is synchronous.
It assumes that the network is reliable and that waiting for 30 minutes is technically not a problem (leaving aside the UX and the 'nervous' user who has to wait :) )
In addition to that, network equipment can force-close the connection (like load balancers with a pre-configured request timeout).
That's why people suggest some kind of asynchronous flow.
I can say that you can still use an async flow: spawn the task and, after each bulk, update some shared state (in-memory in the case of a single instance, or persistent, like a database, in the case of a cluster).
So the interaction with the client will change:
Client calls "updateUser" with 200000 ids
Service responds "immediately" with something like "I've got your request, here is a request id, ping me once in a while to see what happens."
Service starts an async task and processes the data chunk by chunk in a single transaction
Client calls "get" method with that id and server reads the progress from the shared state.
Once ready, the "Get" methods will respond "done".
If something fails during the transaction execution, the rollback is done, and the process updates the database status with "failure".
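A rough sketch of that flow with Spring (very much a sketch: the endpoint names, progress map and the single-threaded executor are assumptions for a single-instance deployment, not anything from your code):

import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.*;

@RestController
public class UpdateController {

    // In-memory progress store; for a cluster this would live in a database instead.
    private final Map<String, String> progress = new ConcurrentHashMap<>();
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    @Autowired
    private UpdateService updateService; // your existing service with the @Transactional "blocks" method

    @PostMapping("/users/update")
    public String updateUser(@RequestBody List<String> ids) {
        String requestId = UUID.randomUUID().toString();
        progress.put(requestId, "IN_PROGRESS");
        executor.submit(() -> {
            try {
                // single transaction, chunked inside; could also update progress per chunk
                updateService.updateSharedStateByCommunityBlocks(ids);
                progress.put(requestId, "DONE");
            } catch (Exception e) {
                progress.put(requestId, "FAILED"); // the transaction was rolled back
            }
        });
        return requestId; // the client polls with this id
    }

    @GetMapping("/users/update/{requestId}")
    public String getStatus(@PathVariable String requestId) {
        return progress.getOrDefault(requestId, "UNKNOWN");
    }
}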
You can also use more modern technologies to notify the client (web sockets for example), but that's kind of out of scope for this question.
Another thing to consider: from what I know, processing 200,000 objects should take much less than 30 minutes; it's not that much for modern RDBMSs.
Of course, without knowing your use case it's hard to tell what happens there, but maybe you can optimize the flow itself (using bulk operations, reducing the number of requests to the db, caching and so forth).
My preferred approach in these scenarios is to make the call asynchronous (Spring Boot allows this using the @Async annotation), so the client doesn't wait for the HTTP response. The notification can be done via a WebSocket that pushes a message to the client with the progress every X items processed.
Surely this adds more complexity to your application, but if you design the mechanism properly, you'll be able to reuse it for any other similar operation you face in the future.
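A hedged sketch of that idea, assuming Spring's @Async is enabled (@EnableAsync) and STOMP over WebSocket is configured (@EnableWebSocketMessageBroker); the service name, destination and chunk size are made up:

import java.util.List;
import com.google.common.collect.Lists;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.messaging.simp.SimpMessagingTemplate;
import org.springframework.scheduling.annotation.Async;
import org.springframework.stereotype.Service;

@Service
public class AsyncUpdateNotifier {

    @Autowired
    private UpdateService updateService;             // your existing @Transactional service

    @Autowired
    private SimpMessagingTemplate messagingTemplate; // pushes STOMP messages to subscribed clients

    // Runs on a background thread, so the HTTP request that triggered it returns immediately.
    @Async
    public void updateInBackground(List<String> ids) {
        int processed = 0;
        for (List<String> chunk : Lists.partition(ids, 10_000)) { // Guava partitioning; chunk size is arbitrary
            updateService.updateSharedStateByCommunity(chunk);     // one transaction per chunk in this sketch
            processed += chunk.size();
            messagingTemplate.convertAndSend("/topic/progress", processed + "/" + ids.size());
        }
    }
}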
The @Transactional annotation accepts a timeout (although not all underlying implementations will support it). I would argue against trying to split the IDs into two calls, and instead try to fix the timeout (after all, what you really want is a single, all-or-nothing transaction). You can set timeouts for the whole application instead of on a per-method basis.
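For illustration, a hedged sketch of both options, assuming a plain DataSourceTransactionManager (the 1800-second value is only an example):

// Per-method: allow this transaction up to 30 minutes.
@Transactional(timeout = 1800)
public void updateSharedStateByCommunity(List<String> idList) {
    // ...
}

// Application-wide default, e.g. in a @Configuration class:
@Bean
public PlatformTransactionManager transactionManager(DataSource dataSource) {
    DataSourceTransactionManager tm = new DataSourceTransactionManager(dataSource);
    tm.setDefaultTimeout(1800); // seconds; applies where no per-method timeout is set
    return tm;
}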
From a technical point of view, it can be done with org.springframework.transaction.annotation.Propagation#NESTED propagation. The NESTED behavior makes nested Spring transactions use the same physical transaction, but sets savepoints between nested invocations, so inner transactions may roll back independently of the outer transaction, or let the rollback propagate. The limitation is that it only works with an org.springframework.jdbc.datasource.DataSourceTransactionManager (i.e. a plain JDBC DataSource).
But for a really large dataset it will still take a long time and keep the client waiting, so from a solution point of view an async approach is probably better; it depends on your requirements.
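A minimal sketch of the NESTED variant (the bean names are made up; the block method sits in a separate bean so the annotation isn't bypassed by self-invocation, and split is a partitioning helper like the one sketched earlier):

@Service
public class CommunityUpdateService {

    @Autowired
    private CommunityBlockUpdater blockUpdater; // separate bean so the Spring proxy applies NESTED

    @Transactional
    public void updateSharedStateByCommunityBlocks(List<String> resolveIds) {
        for (List<String> block : split(resolveIds, 100000)) {
            blockUpdater.updateBlock(block);
        }
    }
}

@Service
public class CommunityBlockUpdater {

    // Joins the same physical transaction, but a savepoint is set before each call, so a failing
    // block can be rolled back to its savepoint (or the whole transaction rolled back) as needed.
    @Transactional(propagation = Propagation.NESTED)
    public void updateBlock(List<String> block) {
        // per-block update logic
    }
}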
Related
I have a service method where I request an entity by ID from the database. If the entity's paid attribute is false, I set it to true and do something; if paid is already true, the method just returns.
@Override
@Transactional(rollbackFor = {ServiceException.class})
public void handleIntentSucceeded(PaymentIntent intent) throws ServiceException {
    LOGGER.trace("handleIntentSucceeded({})", intent);
    CreditCharge charge = transactionRepository.findByPaymentIntentId(intent.getId());
    if (charge.getPaid()) {
        return;
    }
    // do some stuff
    charge.setPaid(true);
    transactionRepository.save(charge);
}
Now if there are multiple requests with the same intent at the same time, this method is no longer consistent: for example, the first request receives the charge with paid == false and starts doing "some stuff", and if a second request reaches this method before the first one has saved the charge with paid == true, it will also do "some stuff", even though the first request is already doing it. Is this a correct conclusion?
To be sure that only one request can process this method at a time, and so avoid "some stuff" being done multiple times, I could annotate the method with @Transactional(isolation = Isolation.SERIALIZABLE). This way a request can only process this method/transaction after the previous request has committed its transaction.
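For clarity, this is what I have in mind (just the isolation attribute added to the method above):

@Override
@Transactional(isolation = Isolation.SERIALIZABLE, rollbackFor = {ServiceException.class})
public void handleIntentSucceeded(PaymentIntent intent) throws ServiceException {
    // same body as above
}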
Is this the best approach or is there a better way?
One solution, as already mentioned above, is to use optimistic locking. However, an OptimisticLockingException will lead to a failed HTTP request. If this is a problem, you can handle the exception.
But if you are sure that you will not run multiple instances of the application and there are no big performance requirements, or you simply want to deal with the problem later and use a "workaround" until then, you can make the method synchronized (https://www.baeldung.com/java-synchronized). That way, the Java runtime ensures that the method cannot run in parallel.
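For example (a sketch of that workaround applied to the method from the question; keep in mind it only serializes calls within one JVM, and the transaction still commits after the monitor is released):

@Override
@Transactional(rollbackFor = {ServiceException.class})
public synchronized void handleIntentSucceeded(PaymentIntent intent) throws ServiceException {
    // same body as in the question: re-read the charge, check paid, do the work, save
}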
I would probably look for a way of optimistically locking the record (e.g. using some kind of update counter), so that only the first of the concurrent transactions changing the paid property completes successfully.
Any subsequent transaction that tried to modify the same entity in the meantime would then fail, and its actions done during "do some stuff" would be rolled back.
Optimistic vs. Pessimistic locking
edit: the REPEATABLE_READ isolation level (as suggested by one of the comments) might also behave similarly to optimistic locking, though this may depend on the implementation
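A sketch of the optimistic-locking variant with JPA, assuming CreditCharge is a JPA entity (field names are illustrative):

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Version;

@Entity
public class CreditCharge {

    @Id
    @GeneratedValue
    private Long id;

    private boolean paid;

    // JPA increments this on every update; an update based on a stale version fails with an
    // optimistic-lock exception, and that transaction (including "some stuff") rolls back.
    @Version
    private Long version;

    // getters and setters ...
}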
I have a method that processes a large number of files. The thing is, this method times out as the file sizes increase. I'm using container-managed transactions for the method.
What I did was split the files into lists and hand each one to another method annotated as REQUIRES_NEW.
I'm looping through the list and calling the new method with a new transaction each time. But when something goes wrong in the middle of the iteration, only that transaction rolls back; it won't roll back the previous iterations. I want to roll back the previous iterations as well.
I can't treat the whole operation as one transaction because of the timeout issue. Looking for feedback on this.
You might consider the following approach:
When the client requests files processing (startProcessing), the server (EJB) starts a background thread (e.g. quartz-scheduler.org) for handling the files and returns the id of the operation in progress. If necessary, the client can then cancelProcessing, getProcessingStatus etc. using that id.
Check about increasing the transaction timeout for your EJB method. For example, if you are using a stateless bean, you can simply annotate it with:
@StatelessDeployment(transactionTimeout=10)
Another option is to check EJB asynchronous methods:
https://docs.oracle.com/javaee/6/tutorial/doc/gkkqg.html
Asynchronous methods are typically used for long-running operations, processor-intensive tasks, background tasks, to increase application throughput, or to improve application response time. Once the process is over, you can get the status of the process.
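A small sketch of an EJB asynchronous method (bean and method names are made up):

import java.util.List;
import java.util.concurrent.Future;
import javax.ejb.AsyncResult;
import javax.ejb.Asynchronous;
import javax.ejb.Stateless;

@Stateless
public class FileProcessingBean {

    // Returns immediately to the caller; the container runs the body on a separate thread.
    // The caller can poll the Future (or a status table) to find out when processing is done.
    @Asynchronous
    public Future<String> processFiles(List<String> files) {
        for (String file : files) {
            // per-file work, each iteration can run in its own transaction if needed
        }
        return new AsyncResult<>("DONE");
    }
}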
We have web service APIs to support clients running on ten million devices. Normally each client calls the server once a day, which works out to about 116 clients seen per second. Each client (each with a unique ID) may make several API calls concurrently, but the server can only process API calls from the same client one by one, because those calls update the same document for that client in the backend MongoDB database. For example: they need to update the last-seen time and other embedded documents in that client's document.
One solution I have is to put a synchronized block on an "interned" object representing the client's unique ID. That allows only one request from the same client to obtain the lock and be processed at a time, while requests from other clients can still be processed concurrently. But this solution requires turning on the load balancer's "stickiness", meaning the load balancer routes all requests from the same IP address to a specific server within a preset time interval (e.g. 15 minutes). I am not sure whether this has any impact on the robustness of the whole system design. One thing I can think of is that some clients may make more requests than others and make the load unbalanced (create hotspots).
Solution #1:
Interner<Key> myIdInterner = Interners.newWeakInterner();

public ResponseType1 processApi1(String clientUniqueId, RequestType1 request) {
    synchronized (myIdInterner.intern(new Key(clientUniqueId))) {
        // code to process request
    }
}

public ResponseType2 processApi2(String clientUniqueId, RequestType2 request) {
    synchronized (myIdInterner.intern(new Key(clientUniqueId))) {
        // code to process request
    }
}
You can see my other question for this solution in detail: Should I use Java String Pool for synchronization based on unique customer id?
The second solution I am thinking of is to somehow lock the document (MongoDB) of that client (I have not found a good example of how to do that yet). Then I don't need to touch the load balancer settings. But I have concerns about this approach, as I think the performance (round trips to the MongoDB server and busy waiting?) will be much worse compared to solution #1.
Solution #2:
public ResponseType1 processApi1(String clientUniqueId, RequestType1 request) {
    try {
        obtainDocumentLock(new Key(clientUniqueId));
        // code to process request
    } finally {
        releaseDocumentLock(new Key(clientUniqueId));
    }
}

public ResponseType2 processApi2(String clientUniqueId, RequestType2 request) {
    try {
        obtainDocumentLock(new Key(clientUniqueId));
        // code to process request
    } finally {
        releaseDocumentLock(new Key(clientUniqueId));
    }
}
I believe this is a very common issue in scalable, highly concurrent systems. How do you solve it? Is there any other option? What I want to achieve is to process only one request at a time for requests from the same client. Please note that just controlling read/write access to the database does not work; the solution needs to guarantee exclusive processing of the whole request.
For example, there are two requests: request #1 and request #2. Request #1 reads the client's document, updates one field of sub-document #5, and saves the whole document back. Request #2 reads the same document, updates one field of sub-document #8, and saves the whole document back. At this point we get an OptimisticLockingFailureException, because we use the @Version annotation from spring-data-mongodb to detect version conflicts. So it is imperative to process only one request from the same client at any time.
P.S. Any suggestion on choosing between solution #1 (lock within a single process/instance, with load balancer stickiness turned on) and solution #2 (distributed lock) for a scalable, highly concurrent system design? The goal is to support tens of millions of clients, with hundreds of clients accessing the system concurrently every second.
In your solution you are splitting the lock based on customer ID, so two different customers can be processed at the same time; the only problem is the sticky session. One solution is to use a distributed lock, so you can dispatch any request to any server and the server that gets the lock processes it. The only consideration is that it involves remote calls. We are using Hazelcast/Ignite and it is working very well for an average number of nodes.
Hazelcast
Why not just create a processing queue in MongoDB: you submit client request documents, and another server process consumes them and produces a result document that the client waits for. Synchronize the data by clientId and avoid that activity in the API submission step. The second part of the client's activity (once submission is finished) just polls MongoDB for consumed records matching its API / clientId and some job tag. That way you can scale out the API submission and, separately, the API consumption activities on different servers.
One obvious approach is simply to implement the full optimistic locking algorithm on your end.
That is, you sometimes get an OptimisticLockingFailureException when there are concurrent modifications, but that's fine: just re-read the document and start the modification that failed over again. You'll get the same effect as if you had used locking. Essentially you are leveraging the concurrency control already built into MongoDB. This also has the advantage of letting several transactions from the same client go through if they don't conflict (e.g., one is a read, or they write to different documents), potentially increasing the concurrency of your system. On the other hand, you have to implement the retry logic.
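A sketch of that retry logic (the document type, repository and retry count are assumptions):

import java.util.function.Consumer;
import org.springframework.dao.OptimisticLockingFailureException;

private static final int MAX_RETRIES = 5;

public void updateClientDocument(String clientUniqueId, Consumer<ClientDocument> change) {
    for (int attempt = 0; attempt < MAX_RETRIES; attempt++) {
        try {
            ClientDocument doc = repository.findByClientId(clientUniqueId); // re-read a fresh copy
            change.accept(doc);                                             // apply this request's modification
            repository.save(doc);                                           // the @Version check happens here
            return;
        } catch (OptimisticLockingFailureException e) {
            // another request updated the document first; loop and retry against the new version
        }
    }
    throw new IllegalStateException("Could not update document after " + MAX_RETRIES + " attempts");
}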
If you do want to lock on a per-client basis (or per-document or whatever else) and your server is a single process (which is implied by your suggested approach), you just need a lock manager that works on arbitrary String keys; that has several reasonable solutions, including the Interner one you mentioned.
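For reference, a per-key lock manager can be as small as this (a ConcurrentHashMap of monitor objects; Guava's Striped locks or the Interner approach from the question are alternatives):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class KeyLockManager {

    // One monitor per key; computeIfAbsent guarantees all threads get the same instance for a key.
    private final Map<String, Object> locks = new ConcurrentHashMap<>();

    public void runExclusively(String key, Runnable action) {
        Object lock = locks.computeIfAbsent(key, k -> new Object());
        synchronized (lock) {
            action.run();
        }
    }
}

Usage would be something like lockManager.runExclusively(clientUniqueId, () -> processRequest(request)); unlike the weak Interner, this map never shrinks, which is the trade-off for its simplicity.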
One Spring service is implemented in one Java deployment unit (JVM). Another Spring service is implemented in another JVM. A service call is made from the first JVM to the second. The service interface could be either REST or SOAP over HTTP. I need to keep a single transaction across the multiple JVMs, meaning if any service fails, everything must be rolled back. How can this be done? Any code examples?
Use global transactions (i.e., JTA),
Use XA resources (RDBMS and JMS connections), do "Full XA with 2PC".
For further reference on the Spring transaction management, including the JTA/XA scenario, read: http://docs.spring.io/spring/docs/current/spring-framework-reference/htmlsingle/#transaction
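As a hedged sketch, with Spring the JTA route boils down to configuring a JtaTransactionManager (backed by your application server's transaction coordinator or a standalone one such as Atomikos) instead of a local DataSourceTransactionManager:

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.transaction.PlatformTransactionManager;
import org.springframework.transaction.annotation.EnableTransactionManagement;
import org.springframework.transaction.jta.JtaTransactionManager;

@Configuration
@EnableTransactionManagement
public class TxConfig {

    // Delegates @Transactional to the JTA coordinator, so all XA resources (databases, JMS
    // connections) enlisted in the same transaction are committed together with two-phase commit.
    @Bean
    public PlatformTransactionManager transactionManager() {
        return new JtaTransactionManager(); // looks up the UserTransaction from JNDI by default
    }
}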
REST faces the exact same problem as SOAP-based web services with regards to atomic transactions. There is no stateful connection, and every operation is immediately committed; performing a series of operations means other clients can see interim states.
Unless, of course, you take care of this by design. First, ask yourself: do I have a standard set of atomic operations? This is commonly the case. For example, for a banking operation, removing a sum from one account and adding the same sum to a different account is often a required atomic operation. But rather than exporting just the primitive building blocks, the REST API should provide a single "transfer" operation, which encapsulates the entire process. This provides the desired atomicity, while also making client code much simpler. This approach is known as low-granularity services, or high-level batch operations.
If there is no simple, pre-defined set of desired atomic operation sequences, the problem is more severe. A common solution is the batch command pattern: define one REST method to demarcate the beginning of a transaction, and another to demarcate its end (a 'commit' request). Anything sent between these two requests is queued by the server but not committed until the commit request arrives.
This pattern complicates the server significantly -- it must maintain a state per client. Normally, the first operation ('begin transaction') returns a transaction ID (TID), and all subsequent operations, up to and including the commit, must include this TID as a parameter.
It is a good idea to enforce a timeout on transactions: if too much time has passed since the initial 'begin transaction' request, or since the last step, the server has the right to abort the transaction. This prevents a potential DoS attack that causes the server to waste resources by keeping too many transactions open. The client design must keep in mind that each operation must be checked for a timeout response.
It is also a good idea to allow the client to abort a transaction, by providing a 'rollback' API.
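A rough sketch of what such an API surface might look like (the paths, the Operation payload type and the in-memory store are all made up for illustration):

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping("/transactions")
public class TransactionController {

    private final Map<String, List<Operation>> pending = new ConcurrentHashMap<>();

    @PostMapping                          // 'begin transaction': returns the TID
    public String begin() {
        String tid = UUID.randomUUID().toString();
        pending.put(tid, new ArrayList<>());
        return tid;
    }

    @PostMapping("/{tid}/operations")     // queue an operation under this TID; nothing is committed yet
    public void addOperation(@PathVariable String tid, @RequestBody Operation op) {
        pending.get(tid).add(op);
    }

    @PostMapping("/{tid}/commit")         // apply everything queued under the TID atomically
    public void commit(@PathVariable String tid) {
        List<Operation> ops = pending.remove(tid);
        // apply ops inside a single database transaction here
    }

    @DeleteMapping("/{tid}")              // rollback: discard the queued operations
    public void rollback(@PathVariable String tid) {
        pending.remove(tid);
    }
}

A production version would also enforce the timeout mentioned above, e.g. by recording a timestamp per TID and evicting stale entries.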
The usual care required in designing code that uses multiple concurrent transactions applies as usual in this complex design scenario. If at all possible, try to limit the use of transactions, and support high-level batch operations instead.
I take no credit for this information; I'm just pointing you to it. Credit goes to This article
Also please read Transactions in REST?
You can get some handy code samples here http://www.it-soa.eu/en/resp/atomicrest/userguide/index.html
After I create a new 'Order' object, I would like to get its generated ID and put it on an AMQP queue, so that a worker can do other stuff with it. The worker takes the generated ID (the message) and looks up the order, but complains that no record exists, even though I just created one. I am trying to figure out either how long to wait after I call .persist() before I put the message (the generated ID) on the queue (which I don't think is a good idea); or have the worker loop over and over until MySQL does return a record (which I don't like either); or find a point at which I can put the message on the queue knowing the data is safely in MySQL (this sounds best). I'm thinking that it needs to be done outside of any @Transactional method.
The worker that is going to read the data back out of mysql is part of a different system on a different server. So when can I tell the worker that the data is in mysql so that it can get started with its task?
Is it true that after the @Transactional method finishes, the data is done being written to MySQL? I am having trouble understanding this.
Thanks a million in advance.
Is it true that after the @Transactional method finishes, the data is done being written to MySQL? I am having trouble understanding this.
So first, as Kayamann and Ralf wrote in the comments, it is guaranteed that the data is stored and available for other processes when the transaction commits (ends).
@Transactional methods are easy to understand. When you have a @Transactional method, it means that the container (the application that is actually going to invoke that method) will begin a transaction before the method is invoked, and automatically commit or roll back that transaction on success or error.
So if we have
@Transactional
public void modify() {
    doSomething();
}
then when you call it somewhere in the code (or it is invoked via the container, e.g. due to some bindings), the actual flow will be as follows:
EntityTransaction tx = entityManager.getTransaction();
tx.begin();
object.modify();
tx.commit();
It is quite simple. This approach means that the transactions are container-controlled.
As for your situation: to let the external system know that the transaction has completed, you either use the message queue you already have (sending a message, after the commit, that the transaction for some id is complete and the worker can start processing), or use a different technology, REST for example.
Remote systems can signal each other about various events via queues and REST services, so there is no real difference.
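If you stay with the queue, the practical trick for the original problem (publishing the generated ID only once the row is safely visible in MySQL) is to publish after the commit rather than inside the @Transactional method. A hedged sketch with Spring 4.2+ application events (the event class, repository and exchange/routing-key names are made up; RabbitTemplate assumes spring-rabbit):

import org.springframework.amqp.rabbit.core.RabbitTemplate;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.ApplicationEventPublisher;
import org.springframework.stereotype.Component;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;
import org.springframework.transaction.event.TransactionPhase;
import org.springframework.transaction.event.TransactionalEventListener;

public class OrderCreatedEvent {
    private final Long orderId;
    public OrderCreatedEvent(Long orderId) { this.orderId = orderId; }
    public Long getOrderId() { return orderId; }
}

@Service
public class OrderService {

    @Autowired private OrderRepository orderRepository;          // assumed Spring Data repository
    @Autowired private ApplicationEventPublisher eventPublisher;

    @Transactional
    public Long createOrder(Order order) {
        orderRepository.save(order);                             // the ID is generated here
        eventPublisher.publishEvent(new OrderCreatedEvent(order.getId()));
        return order.getId();
    }
}

@Component
public class OrderCreatedListener {

    @Autowired private RabbitTemplate rabbitTemplate;            // assumed spring-rabbit setup

    // Runs only after the surrounding transaction has committed, so the worker reading
    // MySQL is guaranteed to find the new row when it receives this message.
    @TransactionalEventListener(phase = TransactionPhase.AFTER_COMMIT)
    public void publishToQueue(OrderCreatedEvent event) {
        rabbitTemplate.convertAndSend("orders", "order.created", event.getOrderId());
    }
}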