We are currently doing a discussion on our architecture.
The idea is that we have a database and multiple processing units. One transaction does include the following steps:
Query an entry based on a flag
Request exclusive lock for this entry
Do the processing
Update the Flag and some other columns
Release lock and commit transaction
But what happens if the second processing unit queries an entry, while the first one does hold a lock?
Updating the flag during transaction does not do the job due to transaction isolation and dirty read
In my opinion the possible results on this situation are:
The second processing unit gets the same entry and an exception is raised during the lockrequest
The second one get the next available entry and everything is fine.
The second processing unit gets the same entry.
However, whether the exception will be thrown or the lock acquisition will be blocked until the lock is released by the first transaction depends on the way you are asking for the lock in the second transaction (for example, with timeout or NO WAIT or something similar).
The second scenario (The second one gets the next available entry) is a bit harder to implement. Concurrent transactions are isolated from each other, so they basically see the same snapshot of data until one of them commits.
You can take a look at some database specific features, like Oracle Advanced Queue, or you could change the approach you read data (for example, read them in batches and then dispatch the processing to multiple threads/transactions). All of this highly depends on what exactly you are solving, are there any processing order constraints, failure/rollback/retry handling, etc.
Related
I'm using a PostUpdateEventListener registered via
registry.appendListeners(EventType.POST_COMMIT_UPDATE, listener)
and a few other listeners in order to track changes made by Hibernate. This works perfectly, however, I see a problem there:
Let's say, for tracking some amount by id, I simply execute
amountByIdConcurrentMap.put(id, amount);
on every POST_COMMIT_UPDATE (let's ignore other operations). The problem is that this call happens some time after the commit. So with two commits writing the same entity shortly one after the other, I can receive the events in the wrong order, ending up with the older amount stored.
Is this really possible or are the operations synchronized somehow?
Is there a way how to prevent or at least detect such situation?
Two questions and a proposal later
Are you sure, that you need this optimization. Why not fetch the amount as it is written to the database by querying there. What gives you reason to work with caching.
How do you make sure, that the calculation of the amount before writing it to the database is properly synchronized, so that multiple threads or probably nodes do not use old data to calculate the amount and therefore overwrite the result of a later calculation?
I suppose you handle question number 2 right. Then you have to options:
Pessimistic locking, that means that immediatly before the commit you can exclusively update your cache without concurrency issues.
Optimistic locking: In that case you have a kind of timestamp or counter in your database-record you can also put into the cache together with the amount. This value you can use to find out, what the more recent value is.
No, there are no ordering guarantees, so you'll have to take care to ensure proper synchronization manually.
If the real problem you are solving is caching of entity state and if it is suitable to use second-level cache for the entity in question, then you would get everything out of the box by enabling the L2 cache.
Otherwise, instead of updating the map from the update listeners directly, you could submit tasks to an Executor or messaging system that would asynchronously start a new transaction and select for update the amount for the given id from the database. Then update the map in the same transaction while holding the corresponding row lock in the db, so that map updates for the same id are done serially.
Right now, I am thinking of implementing multi-threading to take tasks corresponding to records in the DB tables. The tasks will be ordered by created date. Now, I am stuck to handle the case that when one task (record) being taken, other tasks should skip this one and chase the next one.
Is there any way to do this? Many thanks in advance.
One solution is to make a synchronized pickATask() method and free threads can only pick a task by this method.
this will force the other free threads to wait for their order.
synchronized public NeedTask pickATask(){
return task;
}
According to how big is your data insertion you can either use global vectorized variables or use a table in the database itself to record values like (string TASK, boolean Taken, boolean finished, int Owner_PID).
By using the database to check the status you tend to accomplish a faster code in large scale, but if do not have too many threads or this code will run just once the (Synchronized) global variable approach may be a better solution.
In my opinion if you create multiple thread to read from db and every thread involve in I/O operation and some kind of serialization while reading row from same table.In my mind this is not scallable and also some performance impact.
My solution will be one thread will be producer which will read the row in batch and create task and submit the task to execution (will be thread pool of worker to do the actual task.)Now we have two module which can be scallable independently.In producer side if required we can create multiple thread and every thread will read some partition data.For an example Thread 1 will read 0-100 and thread 2 read 101-200.
It depends on how you manage your communication between java and DB. Are you using direct jdbc calls, Hibernate, Spring Data or any other ORM framework. In case you use just JDBC you can manage this whole issue on your DB level. you will need to configure your DB to lock your record upon writing. I.e. once a record was selected for update no-one can read it until the update is finished.
In case that you use some ORM framework (Such as Hibernate for example) the framework allows you to manage concurrency issues. See about Optimistic and Pessimistic locking. Pessimistic locking does approximately what is described above - Once the record is being updated no-one can read it until the update is finished. Optimistic one uses versioning mechanism, and then multiple threads can try to update the record but only the first one succeeds and the rest will get an exception saying that they are now working with stale data and they should read the record again. The versioning mechanism is to add a version column that is usually a number or sometimes timestamp. Each thread reads the record and upon update it checks if the version in DB still the same. If so it means no-ne else updated the record and upon update the version is changed (incremented or current timestamp is set). If the version changed then someone else already updated the record since it was read and so this thread has stale record and should not be allowed to update it. Optimistic locking shows better performance in environment where reading heavily outnumbers writing
I have an application that works with database table like
Id, state, procdate, result
When there is a need to process some data, the app sets state to PROCESSING. After processing the result of processing is being set to result column and the state goes to STANDBY.
To do the first set to PROCESSING I start the transaction, do select for update, then update the state and procdate.
Then I do the work and using selection for update update the state and the result.
The processing may take up to 5 minutes. The state switching is needed to see how many rows are in progress. The problem is that another request for processing may occur and it has to wait until the first processing will end.
So I want to keep row locked. If I will make the select for update for locking just after I commit the processing state the second request may intercept and lock the row.
So how can I both keep the locking and commit the changes?
You'll need to handle this with your design. Here is an idea.
You records initially have a status, say 'READY', and a processing id, null initially.
When you start, update the status to 'PROCESSING', and update the id to a value for the job run, this can come from a sequence within Oracle, such that it is unique for your process run. commit.
the process runs, with the same id, and selects the status 'PROCESSING' and the same as it's defined processing id. Complete processing, update status to 'COMPLETE' (or 'STANDBY' as you have it). Commit.
This allows a second process to select new 'READY' records and set them for its own processing without interference with the already running process.
Here are two approaches I have taken. (I provide a third, but have never had to take that approach.)
1) Why not exit the transaction after committing the changes.
2) If option 1 is not viable, then you could simply:
COMMIT the changes
attempt to re-acquire the lock, if you fail, leave the screen, else just continue.
3) If it is absolutely imperative that no one can ever acquire the lock in the middle of a commit... you could actually lock another object. I will admit, I have never had to take this approach, but it would be as follows:
Initial phase
LOCK GLOBALOBJECT
Attempt to Acquire record lock for table
UNLOCK GLOBALOBJECT
Test to see if record lock was attained
Phase for committing the change
LOCK GLOBALOBJECT
COMMIT change
Acquire record lock for table
UNLOCK GLOBALOBJECT
Test to see if record lock was attained (Should never happen...)
I have never needed this kind of logic, and I really do not like it since it requires a GLOBAL locking object for this table. Again, it depends on your code, and the criticality of someone being able to commit changes while still being in the transaction.
However, just make sure you are not gold-plating your code when simply exiting out of the transaction after commiting a change would be fine for your stakeholders.
We use Spring and Hibernate in our project and has a layered Architechture. Controller -> Service -> Manager -> Dao. Transactions start in the Manager layer. A method in the service layer which updates an object in the db is called by many threads and this is causing to throw a stale object expection. So I made this method Synchronized and still see the stale object exception thrown. What am I doing wrong here? Any better way to handle this case?
Thanks for the help in advance.
The stale object exception is thrown when an entity has been modified between the time it was read and the time it's updated. This can happen inside a single transaction, but may also happen when you read an object in a transaction, modify it (in the controller layer, for example), then start another transaction and merge/update it (in this case, minutes or hours can separate the read and the update).
The exception is thrown to help you avoid conflicts between users.
If you don't care about conflicts (i.e. the last update always wins and replaces what the previous ones have written), then don't use optimistic locking. If you're concerned about conflicts, then StaleObjectExceptions will happen, and you should popup a meaningful message to the end user, asking him to reload the data and try to modify it again. There's no way to avoid them. You must just be optimistic and hope that they won't happen often.
Note that your synchronized trick will work only if
the exception happens only when reading and writing in the same transaction
updates to the entity are only made by this service
your application is not clustered.
It might also reduce the throughput dramatically, because you forbid any concurrent updates, regardless of which entities are updated by the concurrent transactions. It's like if you locked the whole table for the duration of the whole transaction.
My guess is that you would need to configure optimistic locking on the Hibernate side.
I have a problem when I try to persist objects using multiple threads.
Details :
Suppose I have an object PaymentOrder which has a list of PaymentGroup (One to Many relationship) and PaymentGroup contains a list of CreditTransfer(One to Many Relationship again).
Since the number of CreditTransfer is huge (in lakhs), I have grouped it based on PaymentGroup(based on some business logic)
and creating WORKER threads(one thread for each PaymentGroup) to form the PaymentOrder objects and commit in database.
The problem is, each worker thread is creating one each of PaymentOrder(which contains a unique set of PaymentGroups).
The primary key for all the entitties are auto generated.
So there are three tables, 1. PAYMENT_ORDER_MASTER, 2. PAYMENT_GROUPS, 3. CREDIT_TRANSFERS, all are mapped by One to Many relationship.
Because of that when the second thread tries to persist its group in database, the framework tries to persist the same PaymentOrder, which previous thread committed,the transaction fails due to some other unique field constraints(the checksum of PaymentOrder).
Ideally it must be 1..n..m (PaymentOrder ->PaymentGroup-->CreditTransfer`)
What I need to achieve is if there is no entry of PaymentOrder in database make an entry, if its there, dont make entry in PAYMENT_ORDER_MASTER, but only in PAYMENT_GROUPS and CREDIT_TRANSFERS.
How can I ovecome this problem, maintaining the split-master-payment-order-using-groups logic and multiple threads?
You've got options.
1) Primitive but simple, catch the key violation error at the end and retry your insert without the parents. Assuming your parents are truly unique, you know that another thread just did the parents...proceed with children. This may perform poorly compared to other options, but maybe you get the pop you need. If you had a high % parents with one child, it would work nicely.
2) Change your read consistency level. It's vendor specific, but you can sometimes read uncommitted transactions. This would help you see the other threads' work prior to commit. It isn't foolproof, you still have to do #1 as well, since another thread can sneak in after the read. But it might improve your throughput, at a cost of more complexity. Could be impossible, based on RDBMS (or maybe it can happen but only at DB level, messing up other apps!)
3) Implement a work queue with single threaded consumer. If the main expensive work of the program is before the persistence level, you can have your threads "insert" their data into a work queue, where the keys aren't enforced. Then have a single thread pull from the work queue and persist. The work queue can be in memory, in another table, or in a vendor specific place (Weblogic Queue, Oracle AQ, etc). If the main work of the program is before the persistence, you parallelize THAT and go back to a single thread on the inserts. You can even have your consumer work in "batch insert" mode. Sweeeeeeeet.
4) Relax your constraints. Who cares really if there are two parents for the same child holding identical information? I'm just asking. If you don't later need super fast updates on the parent info, and you can change your reading programs to understand it, it can work nicely. It won't get you an "A" in DB design class, but if it works.....
5) Implement a goofy lock table. I hate this solution, but it does work---have your thread write down that it is working on parent "x" and nobody else can as it's first transaction (and commit). Typically leads to the same problem (and others--cleaning the records later, etc), but can work when child inserts are slow and single row insert is fast. You'll still have collisions, but fewer.
Hibernate sessions are not thread-safe. JDBC connections that underlay Hibernate are not thread safe. Consider multithreading your business logic instead so that each thread would use it's own Hibernate session and JDBC connection. By using a thread pool you can further improve your code by adding ability of throttling the number of the simultaneous threads.