Ehcache getting out of sync with database - java

On a Java EE server using CMT, I am using ehcache to implement a caching layer between the business object layer (EJBs) and the Data Access layer (POJOs using JDBC). I seem to be experiencing a race condition between two threads accessing the same record while using a self-populating Ehcache. The cache is keyed on the primary key of the record.
The scenario is:
The first thread updates the record in the database and removes the record from cache (but the database commit doesn't necessarily happen immediately - there may be other queries to follow.)
The second thread reads the record, causing the cache to be re-populated.
The first thread commits transaction.
This is all happening in a fraction of a second. It results in the cache being out of sync with the database, and subsequent reads of the record returning the stale cached data until another update is performed, or the entry expires from the cache. I can handle stale data for short periods (the typical length of a transaction), but not minutes, which is how long I would like to cache objects.
Any suggestions for avoiding this race condition?
UPDATE:
Clearing the cache after the transaction has committed would certainly be ideal. The question is, in a J2EE environment using CMT, when the caching layer is sandwiched between the business layer (stateless session EJBs) and the data access layer, how to do this?
To be clear about the constraints this imposes, the method call in question may or may not be in the same transaction as additional method calls that happen before or after. I can't force a commit (or do this work in a separate transaction) since that would change the transaction boundaries from what the client code expects. Any subsequent exceptions would not roll back the entire transaction (unneseccarily clearing the cache in this case is an acceptable side-effect). I can't control the entry points into the transaction, as it is essentially an API that clients can use. It is not reasonable to push the resonsiblity of clearing the cache to the client application.
I would like to be able to defer any cache clearing operations until the entire transaction is committed by the EJB container, but I have found no way to hook into that logic and run my own code with a stateless session bean.
UPDATE #2:
The most promising solution so far, short of a major design change, is to use ehcache 2.0's JTA support: http://ehcache.org/documentation/apis/jta
This means upgrading to ehcache 2.x and enabling XA transactions for the database as well, which could potentially have negative side-effects. But it seems like the "right" way.

You are using transactions - it makes more sense to remove the cache after the commit, that is when the change really happens.
That way you see the old data only during the length of the transaction, and all reads afterwards have the latest view.
Update: Since this is CMT specific, you should look at the SessionSynchronization interface, and it's afterCompletion() method. This is showed in this tutorial.

Related

entityManager.flush() will clear the second level cache?

A bit confused with the entityManger.flush();
Hibernate doc for batching
https://docs.jboss.org/hibernate/orm/5.0/userguide/html_single/chapters/batch/Batching.html
"When you make new objects persistent, employ methods flush() and clear() to the session regularly, to control the size of the first-level cache."
I am working on spring boot data jpa.
The first doubt is for Restful applications first level cache is enabled or not?
Can the entityManager.flush() clears the second level cache?
Is entityManger.flush() is similar to System.gc();
1) First level cache is created per started transaction, so its always there for each transactional method.
2) entityManager.flush(), does not clear the second level cache. It also does not clear the first level cache, it enforces any changes done in the current transaction to be pushed into the physical database.
3) Is entityManger.flush() is similar to System.gc()? No, all the objects are still on the heap and are even still managed by the current persistence context.

Micro Services and Transaction Manager how to handle concurrency Issue

I am working on building a microservice which is using transaction manager implemented based on Java Transaction API(JTA).
My question is does Trasaction maanger have ability to handle concurrency issue in distributed database scenario's .
Scenario:
Assume there are multiple instance of a service running and we get two requests to update balance amount by 10 in an account. Initially an account can have $100 and the first instance gets that and increments it to $10 but has not been commited yet.
At the same time the second instance also retreive's account which is still 100 and increments it by $10 and then commits it updating balance to $110 and then service one updates account again to $110.
By this time you must have figured that balance was supposed to be incremented by $20 and not 10. Do I have to write some kind of Optimistic lock exception mechanism to prevent the above scenario or will Transaction Manager based on JTA specification already ensure such a thing will not happen ?
does Trasaction maanger have ability to handle concurrency issue in distributed database scenario's .
Transactions and concurrency are two independent concepts and though Transactions become most siginificant in context where we also see concurrency , transactions can be important without concurrency.
To answer your question : No , Transaction Manager generally does not concern itself with handling issues that arise with concurrent updates. It takes a very naive and simple ( and often most meaningful ) approach : if after the start of a transaction , it detects that the state has become inconsistent ( because of concurrent updates ) it would simply raise it as an exception and Rollback the transaction. If only it can establish that all the conditions of the ACID properties of the transaction are still valid will it commit the transaction.
For such type of requests, you can handle through Optimistic Concurrency where you would have a column on the database (Timestamp) as a reference to the version number.
Each time when a change is commited it would modify the timestamp value.
If two requests try to commit the change at the same time, only one of them will succeed as the version (Timestamp) column will change by then negating other request from comitting its changes.
The transaction manager (as implementation of the JTA specification) makes transparent a work above multiple resources. It ensures all the operations happens as a single unit of work. The "work above multiple resources" mean that that the application can insert data to database and meanwhile it sends a message to a JMS broker. Transaction manager guarantees ACID properties to be hold for this two operations. In simplistic form when the transaction finishes successfully the application developer can be sure both operation was processed. When some trouble happens is on the transaction manager to handle it - possibly throw an exception and rollback the data changes. Thus neither operation was processed.
It makes this transparent for the application developer who does not need to care to update first database and then JMS and checks if all data changes were really processed or a failure happens.
In general the JTA specification was not written with microservice architecture in mind. Now it really depends on your system design(!) But if I consider you have two microservices where each one has attached its own transaction manager then the transaction manager can't help you to sort out your concurrency issue. Transaction managers does not work (usually) in some synchronization. You don't work with multiple resources from one microservice (what is the usecase for the transaction manager) but with one resource from multiple microservices.
As there is the one resource it's the synchronization point for all you updates. It depends on it how it manages concurrency. Considering it's a SQL database then it depends on the level of the isolation it uses (ACID - I = isolation, see https://en.wikipedia.org/wiki/ACID_(computer_science)). Your particular example talks about lost update phenomena (https://vladmihalcea.com/a-beginners-guide-to-database-locking-and-the-lost-update-phenomena/). As both microservices tries to update one record. One solution for the avoiding the issue is using optimistic/pesimistic locking (you can implement it on your own by e.g. timestamps as stated above), the other is to use serializable isolation level in your database, or you can design your application for not reading and updating data based on what is read first time but change the sql query having the update atomic (or there are possibly other strategies how to work with your data model to achieve the desired outcome).
In summary - it depends on how your transaction manager is implemented, it can help you in a way but it's not its purpose. Your goal should be to check how the isolation level is set up at the shared storage and consider if your application needs to handle lost update phenomena at application level or your storage cang manage it for you.

Where does the responsibility lie to ensure the ACID properties of a transaction?

I was going through ACID properties regarding Transaction and encountered the statement below across the different sites
ACID is the acronym for the four properties guaranteed by transactions: atomicity, consistency, isolation, and durability.
**My question is specifically about the phrase.
guaranteed by transactions
**. As per my experience these properties are not taken care by
transaction automatically. But as a java developer we need to ensure that these properties criteria are met.
Let's go through for each property:-
Atomicity:- Assume when we create the customer the account should be created too as it is compulsory. So now during transaction
the customer gets created while during account creation some exception oocurs. So the developer can now go two ways: either he rolls back the
complete transaction (atomicity is met in this case) or he commits the transaction so customer will be created but not the
account (which violates the atomicity). So responsibility lies with developer?
Consistency:- Same reason holds valid for consistency too
Isolation :- as per definition isolation makes a transaction execute without interference from another process or transactions.
But this is achieved when we set the isolation level as Serializable. Otherwis in another case like read commited or read uncommited
changes are visible to other transactions. So responsibility lies with the developer to make it really isolated with Serializable?
Durability:- If we commit the transaction, then even if the application crashes, it should be committed on restart of application. Not sure if it needs to be taken care by developer or by database vendor/transaction?
So as per my understanding these ACID properties are not guaranteed automatically; rather we as a developer sjould achieve them. Please let me know
if above understanding regarding each point is correct? Would appreciate if you folks can reply for each point(yes/no will also do.
As per my understanding read committed should be most logical isolation level in most application, though it depends on requirement too.
The transactions guarantees ACID more or less:
1) Atomicity. Transaction guarantees all changes are made or none of them. But you need to manually set the start and end of a transaction and manually perform commit or rollback. Depending on the technology you use (EJB...), transactions are container-managed, setting the start and end to the whole "method" you are creating. You can control by configuration if a method invoked requires a new transaction or an existing one, no transaction...
2) Consistency. Guaranteed by atomicity.
3) Isolation. You must define the isolation level your application needs. Default value is defined depending upon the database, container... The commonest one is READ COMMITTED. Be careful with locks as can cause dead-lock depending on your logic and isolation level.
4) Durability. Managed entirely by the database. If your commit executes without error, nearly all database guarantees durability of changes, but some scenarios can cause to not guarantee that (writes to disk are cached in memory and flushed later...)
In general, you should be aware of transactions and configure it in the container of declare by code the star and end (commit, rollback).
Database transactions are atomic: They either happen in their entirety or not at all. By itself, this says nothing about the atomicity of business transactions. There are various strategies to map business transactions to database transactions. In the simplest case, a business transaction is implemented by one database transaction (where a business transaction is aborted by rolling back the database one). Then, atomicity of database transactions implies atomicity of business transactions. However, things get tricky once business transactions span several database transactions ...
See above.
Your statement is correct. Often, the weaker guarantees are sufficient to prove correctness.
Database transactions are durable (unless there is a hardware failure): if the transaction has committed, its effect will persist until other transactions change the data. However, calling code might not learn whether a transaction has comitted if the database or the network between database and calling code fails. Therefore
If we commit the transaction, then even if application crash, it should be committed on restart of application.
is wrong. If the transaction has committed, there is nothing left to do.
To summarize, the database does give strong guarantees - about the behaviour of the database. Obviously, it can not give guarantees about the behaviour of the entire application.

Synchronizing spring transaction

We use Spring and Hibernate in our project and has a layered Architechture. Controller -> Service -> Manager -> Dao. Transactions start in the Manager layer. A method in the service layer which updates an object in the db is called by many threads and this is causing to throw a stale object expection. So I made this method Synchronized and still see the stale object exception thrown. What am I doing wrong here? Any better way to handle this case?
Thanks for the help in advance.
The stale object exception is thrown when an entity has been modified between the time it was read and the time it's updated. This can happen inside a single transaction, but may also happen when you read an object in a transaction, modify it (in the controller layer, for example), then start another transaction and merge/update it (in this case, minutes or hours can separate the read and the update).
The exception is thrown to help you avoid conflicts between users.
If you don't care about conflicts (i.e. the last update always wins and replaces what the previous ones have written), then don't use optimistic locking. If you're concerned about conflicts, then StaleObjectExceptions will happen, and you should popup a meaningful message to the end user, asking him to reload the data and try to modify it again. There's no way to avoid them. You must just be optimistic and hope that they won't happen often.
Note that your synchronized trick will work only if
the exception happens only when reading and writing in the same transaction
updates to the entity are only made by this service
your application is not clustered.
It might also reduce the throughput dramatically, because you forbid any concurrent updates, regardless of which entities are updated by the concurrent transactions. It's like if you locked the whole table for the duration of the whole transaction.
My guess is that you would need to configure optimistic locking on the Hibernate side.

Whats the best practice around when to start a new session or transaction for batch jobs using spring/hibernate?

I have set of batch/cron jobs in Java that call my service classes. I'm using Hibernate and Spring as well.
Originally the batch layer was always creating an outer transaction, and then the batch job will call a service to get a list of objects from the DB w/ the same session, then call a service to process each object separately. Theres a tx-advice set for my service layer to rollback on any throwable. So if on the 5th object theres an exception, the first 4 objects that were processed gets rolled back too because they were all part of the same transaction.
So i was thinking this outer transaction created in the batch layer was unnecessary. I removed that, and now i call a service to get a list of objects. THen call another service to process each object separately, and if one of those objects fail, the other ones will still persist because its a new transaction/session for each service call. But the problem I have here now is after getting a list of objects, when i pass each object to a service to process, if i try to get one of the properties i get a lazy initialization error because the session used to load that object (from the list) is closed.
Some options i thought of were to just get a list of IDs in the batch job and pass each id to a service and the service will retrieve the whole object in that one session and process it. Another one is to set lazy loading to false for that object's attributes, but this would load everything everytime even if sometimes the nested attributes aren't needed.
I could always go back to the way it was originally w/ the outer transaction around every batch job, and then create another transaction in the batch job before each call to the service for processing each individual object...
What's the best practice for something like this?
Well I would say that you listed every possible option except OpenSessionInView. That would keep your session alive across transactions, but it's difficult to implement properly. So difficult that it's considered an AntiPattern by many.
However, since you're not implementing a web interface and you aren't dealing with a highly threaded environment, I would say that's the way to go. It's not like you're passing entities to views. Your biggest fear is an N+1 call to the database while iterating through a collection, but since this is a cron job, performance may not be a major issue when compared with code cleanliness. If you're really worried about it, just make sure you get all of your collections via a call to a DAO who can do a select *.
Additionally, you were effectively doing an Open Session In View before when you were doing everything in the same transaction. In Spring, Sessions are opened on a per transaction basis, so keeping a transaction open a long period of time is effectively the same as keeping a Session open a long period of time. The only real difference in your case will be the fact that you can commit periodically without fear of a lazy initialization error down the road.
Edit
All that being said, it takes a bit of time to set up an Open Session in View, so unless you have any particular issues against doing everything in the same transaction, you might consider just going back to that.
Also, I just noticed that you mentioned opening a transaction in the batch layer and then opening "mini transactions" in the Service layer. This is most emphatically NOT a good idea. Spring's annotation driven transactions will piggyback on any currently open transaction in the session. This means that transactions that are supposed to be read-only will suddenly become read-write if the currently open transaction is read-write. Additionally, the Session won't be flushed until the outermost transaction is finished anyways, so there's no point in marking the Service layer with #Transactional. Putting #Transactional on multiple layers only lends to a false sense of security.
I actually blogged about this issue some time ago.

Categories