I have an application built from a set of microservices. One service receives data, persists it via Spring Data JPA and EclipseLink, and then sends an alert (AMQP) to a second service.
Based on specific conditions, the second service then calls a RESTful web service against the persisted data to retrieve the saved information.
I have noticed that sometimes the RESTful service returns a null data set even though the data has been previously saved. Looking at the code for the persisting service, save has been used instead of saveAndFlush, so I assume the data is not being flushed fast enough for the downstream service to query.
Is there a cost to saveAndFlush that I should be wary of, or is it reasonable to use it by default?
Would it ensure immediacy of data availability to downstream applications?
I should say that the original persistence function is wrapped in @Transactional.
Possible Diagnosis of the Problem
I believe the issue here has nothing to do with save vs. saveAndFlush. The problem seems related to the nature of Spring @Transactional methods and an incorrect use of these transactions within a distributed environment that involves both your database and an AMQP broker, plus, added to that toxic mix, some fundamental misunderstanding of how the JPA persistence context works.
In your explanation, you seem to imply that you start your JPA transaction within a @Transactional method, and during the transaction (but before it has committed), you send messages to an AMQP broker. Later, on the other side of the queue, a consumer application gets the messages and makes a REST service invocation. At this point, you notice that the transactional changes from the publisher side have not yet been committed to the database and therefore are not visible to the consumer side.
The problem seems to be that you publish those AMQP messages within your JPA transaction, before it has committed. By the time the consumer reads a message and processes it, your transaction on the publishing side may not have finished yet, so those changes are not visible to the consumer application.
If your AMQP implementation is RabbitMQ, then I have seen this problem before: you start a @Transactional method that uses a database transaction manager, and within that method you use a RabbitTemplate to send a corresponding message.
If your RabbitTemplate is not using a transacted channel (i.e., channelTransacted=true), then your message is delivered before the database transaction has committed. I believe that by enabling transacted channels (disabled by default) in your RabbitTemplate, you solve part of the problem.
<rabbit:template id="rabbitTemplate"
                 connection-factory="connectionFactory"
                 channel-transacted="true"/>
When the channel is transacted, then the RabbitTemplate "joins" the current database transaction (which apparently is a JPA transaction). Once your JPA transaction commits, it runs some epilogue code that also commits the changes in your Rabbit channel, which forces the actual "sending" of the message.
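If you use Java configuration instead of XML, the equivalent setting looks roughly like this (a sketch; the configuration class and bean names are illustrative):
import org.springframework.amqp.rabbit.connection.ConnectionFactory;
import org.springframework.amqp.rabbit.core.RabbitTemplate;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class RabbitConfig {

    @Bean
    public RabbitTemplate rabbitTemplate(ConnectionFactory connectionFactory) {
        RabbitTemplate template = new RabbitTemplate(connectionFactory);
        // Make the channel transacted so the actual send is deferred until the
        // surrounding (JPA) transaction commits.
        template.setChannelTransacted(true);
        return template;
    }
}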
About save vs. saveAndFlush
You might think that flushing the changes in your JPA context should have solved the problem, but you'd be wrong. Flushing your JPA context just forces the changes in your entities (up to that point held only in memory) to be written to disk, but they are still written within a corresponding database transaction, which won't commit until your JPA transaction commits. That happens at the end of your @Transactional method (and, if you don't use a transacted channel as explained above, unfortunately some time after you have already sent your AMQP messages).
So, even if you flush your JPA context, your consumer application won't see those changes (as per classical database isolation level rules) until your @Transactional method has finished in your publisher application.
When you invoke save(entity), the EntityManager does not need to synchronize any changes right away. Most JPA implementations simply mark the entities as dirty in memory and wait until the last minute to synchronize all changes with the database and commit those changes at the database level.
Note: there are cases in which you may want some of those changes to go down to disk right away rather than whenever the whimsical EntityManager decides to do so. A classic example is a trigger on a database table that you need to run in order to generate additional records that you will need later during your transaction. In that case you force a flush of the changes to disk so that the trigger is forced to run.
By flushing the context, you’re merely forcing a synchronization of changes in memory to disk, but this does not imply an instant database commit of those modifications. Hence, those changes you flush won't necessarily be visible to other transactions. Most likely, they won't, based on traditional database isolation levels.
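To make the timing concrete, here is a sketch of the publisher side under the assumptions above; the repository, entity, and routing key names are invented for illustration:
import org.springframework.amqp.rabbit.core.RabbitTemplate;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class IngestionService {

    @Autowired
    private ReadingRepository readingRepository; // hypothetical Spring Data JPA repository

    @Autowired
    private RabbitTemplate rabbitTemplate;

    @Transactional
    public void persistAndNotify(Reading reading) {  // Reading is a hypothetical entity
        readingRepository.save(reading);   // entity tracked in memory; the INSERT may be deferred
        readingRepository.flush();         // INSERT issued now, but still NOT committed
        // (readingRepository.saveAndFlush(reading) would do both steps in one call)

        // Without a transacted channel, this message can reach the consumer
        // before the commit below has happened.
        rabbitTemplate.convertAndSend("data.saved", reading.getId());
    }   // <-- the database transaction commits only here, when the method returns
}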
The 2PC Problem
Another classical problem here is that your database and your AMQP broker are two independent systems. If you are using RabbitMQ, then you don't have 2PC (two-phase commit) between them.
So you may want to account for the scenarios this creates, e.g., your database transaction commits successfully, but Rabbit then fails to commit your message, in which case you will have to repeat the entire transaction, possibly skipping the database side effects and just re-attempting to send the message to Rabbit.
You should probably read the article Distributed transactions in Spring, with and without XA; the section about chained transactions in particular is helpful for addressing this problem.
They suggest a more complex transaction manager definition. For example:
<bean id="jdbcTransactionManager" class="org.springframework.jdbc.datasource.DataSourceTransactionManager">
<property name="dataSource" ref="dataSource"/>
</bean>
<bean id="rabbitTransactionManager" class="org.springframework.amqp.rabbit.transaction.RabbitTransactionManager">
<property name="connectionFactory" ref="connectionFactory"/>
</bean>
<bean id="chainedTransactionManager" class="org.springframework.data.transaction.ChainedTransactionManager">
<constructor-arg name="transactionManagers">
<array>
<ref bean="rabbitTransactionManager"/>
<ref bean="jdbcTransactionManager"/>
</array>
</constructor-arg>
</bean>
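If you prefer Java configuration, the equivalent wiring would look roughly like this (a sketch; ChainedTransactionManager comes from spring-data-commons, and the class and bean method names are illustrative):
import javax.sql.DataSource;

import org.springframework.amqp.rabbit.connection.ConnectionFactory;
import org.springframework.amqp.rabbit.transaction.RabbitTransactionManager;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.transaction.ChainedTransactionManager;
import org.springframework.jdbc.datasource.DataSourceTransactionManager;

@Configuration
public class TransactionConfig {

    @Bean
    public DataSourceTransactionManager jdbcTransactionManager(DataSource dataSource) {
        return new DataSourceTransactionManager(dataSource);
    }

    @Bean
    public RabbitTransactionManager rabbitTransactionManager(ConnectionFactory connectionFactory) {
        return new RabbitTransactionManager(connectionFactory);
    }

    @Bean
    public ChainedTransactionManager chainedTransactionManager(
            RabbitTransactionManager rabbitTransactionManager,
            DataSourceTransactionManager jdbcTransactionManager) {
        // Transactions are started in the order given and committed in reverse
        // order, so the database commits before the Rabbit send becomes final.
        return new ChainedTransactionManager(rabbitTransactionManager, jdbcTransactionManager);
    }
}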
And then, in your code, you just use that chained transaction manager to coordinate both your database transactional part and your Rabbit transactional part.
Now, there is still the potential that you commit your database part, but that your Rabbit transaction part fails.
So, imagine something like this:
@Retryable // e.g. Spring Retry, so the whole chained transaction can be re-attempted
@Transactional("chainedTransactionManager")
public void myServiceOperation() {
    if (workNotDone()) {
        doDatabaseTransactionWork();
    }
    sendMessagesToRabbit();
}
In this manner, if your Rabbit transactional part failed for any reason, and you were forced to retry the entire chained transaction, you would avoid repeating the database side effects and simply make sure to send the failed message to Rabbit.
At the same time, if your database part fails, then you never sent the message to Rabbit, and there would be no problems.
Alternatively, if your database side effects are idempotent, then you can skip the check, just reapply the database changes, and just re-attempt to send the message to Rabbit.
The truth is that, initially, what you are trying to do seems deceptively easy, but once you delve into the different problems and understand them, you realize it is a tricky business to do this the right way.
Related
Looking for an architectural pattern to solve the following problem.
In my architecture, I have a Stateless EventDispatcher EJB that implements:
public void dispatchEvent(MyEvent ev)
This method is called by a variety of other EJBs in their business methods. The purpose of my EventDispatcher is to hide the complexity of how events are dispatched (be it JMS or other mechanism).
For now let's assume my bean is using JMS. So it simply looks at the event passed to it, builds JMS messages, and dispatches them to the right topic. It can produce several JMS messages, and they are only sent if the surrounding transaction ends up being committed successfully (XA transaction).
Problem: I may be looking at transactions where I send thousands of individual messages. Some messages might become invalid because of other things that happened in the transaction (an object updated and then later deleted). So I need a good deal of logic to "scrub" messages based on context and make a final decision on whether to send one big JMS batch message or multiple small ones.
Solutions: What I would like to do is use some sort of "TransactionalContext" object in my Stateless EJB to buffer all the events. Then I need a callback of some sort to tell me the transaction is about to commit. This is similar to how we use the EntityManager: I can make changes to entities, and it holds onto those changes and is shared between stateless EJBs. At "flush" time (transaction complete) it does its logic to figure out what SQL to execute. I need a TransactionContext available to my stateless bean that has a unique session per transaction and a callback as the transaction is about to complete.
What would you do?
Note that I am NOT in a valid CDI context; some of these transactions are started by @Schedule timers, and others begin because of JMS MDBs.
I believe the thing I am looking for is the TransactionSynchronizationRegistry.
http://docs.oracle.com/javaee/5/api/javax/transaction/TransactionSynchronizationRegistry.html#putResource(java.lang.Object
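As a sketch of how that could look for the EventDispatcher above (EventBuffer, MyEvent, and the scrubbing logic are placeholder names; TransactionSynchronizationRegistry and Synchronization are the standard javax.transaction APIs):
import javax.annotation.Resource;
import javax.ejb.Stateless;
import javax.transaction.Synchronization;
import javax.transaction.TransactionSynchronizationRegistry;

@Stateless
public class EventDispatcher {

    @Resource
    private TransactionSynchronizationRegistry tsr;

    public void dispatchEvent(MyEvent ev) {
        // One buffer per transaction, stored under an arbitrary key.
        EventBuffer buffer = (EventBuffer) tsr.getResource(EventBuffer.class);
        if (buffer == null) {
            final EventBuffer newBuffer = new EventBuffer();
            tsr.putResource(EventBuffer.class, newBuffer);
            // This callback fires as the surrounding transaction completes.
            tsr.registerInterposedSynchronization(new Synchronization() {
                public void beforeCompletion() {
                    // Scrub the buffered events and send the resulting JMS
                    // message(s) while the transaction is still active.
                    scrubAndSend(newBuffer);
                }
                public void afterCompletion(int status) {
                    // Nothing to do here; could log whether the commit succeeded.
                }
            });
            buffer = newBuffer;
        }
        buffer.add(ev);
    }

    private void scrubAndSend(EventBuffer buffer) {
        // Hypothetical: collapse/filter the events, then publish via JMS.
    }
}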
I found a similar question here but didn't find a clear answer on transaction management for the back end (database).
My current project is to create a producer/consumer and let the consumer digest JMS messages and persist them in a database. Because the back end of the application is managed by JPA, it is critical to keep the whole process transactional. My question is: what is the downside of placing the @Transactional annotation on the classic onMessage method? Is there any potential performance cost in doing so?
The only problem may be if the whole queue process takes too long and the connection closes in the middle of the operation. Apart from this, if you enable the transaction for the whole queue process rather than for specific service methods, then theoretically the performance should be the same.
It would be better to enable two-phase commit (also known as XA transactions) for each queue process. Then define each specific service method as @Transactional and interact with your database as expected. At the end, the XA transaction will perform all the commits done by the @Transactional service methods. Note that this approach does affect your performance.
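For reference, a minimal sketch of the @Transactional onMessage arrangement the question describes, assuming the listener is a Spring-managed bean behind a listener container; OrderRepository, Order, and parseOrder are hypothetical:
import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.TextMessage;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;
import org.springframework.transaction.annotation.Transactional;

@Component
public class OrderMessageListener implements MessageListener {

    @Autowired
    private OrderRepository orderRepository; // hypothetical JPA-backed repository

    @Override
    @Transactional // message handling and persistence run in one database transaction
    public void onMessage(Message message) {
        try {
            String payload = ((TextMessage) message).getText();
            orderRepository.save(parseOrder(payload));
        } catch (JMSException e) {
            // Rethrow so the transaction rolls back and the message can be redelivered.
            throw new RuntimeException(e);
        }
    }

    private Order parseOrder(String payload) {
        // Hypothetical mapping of the message body onto a JPA entity.
        return new Order(payload);
    }
}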
This question follows directly from a previous question of mine on SO. I am still unable to grok the concept of JMS sessions being used as a transactional unit of work.
From the Java Message Service book :
The QueueConnection object is used to create a JMS Session object (specifically, a Queue Session), which is the working thread and transactional unit of work in JMS. Unlike JDBC, which requires a connection for each transactional unit of work, JMS uses a single connection and multiple Session objects. Typically, applications will create single JMS Connection on application startup and maintain a pool of Session objects for use whenever a message needs to be produced or consumed.
I am unable to understand the meaning of the phrase transactional unit of work. A plain and simple explanation with an example is what I am looking for here.
A unit of work is something that must complete entirely or not at all. If it fails to complete, it must be as if it never happened.
In JTA parlance a unit of work consists of interactions with a transactional resource between a transaction.begin() call and a transaction.commit() call.
Let's say you define a unit of work that pulls a message off a source queue, inserts a record into a database, and puts another message on a destination queue. In this scenario the transaction-aware resources are the two JMS queues and the database.
If a failure occurs after the database insert, then a number of things must happen to achieve atomicity. The database insert must be rolled back so you don't have an orphaned record in the data source, and the message that was pulled off the source queue must be put back.
The net result in this contrived scenario is that, regardless of where a failure occurs in the unit of work, you end up in the exact state you started in.
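A rough JTA sketch of that unit of work, assuming userTransaction (javax.transaction.UserTransaction), the JMS consumer/producer, and the helper methods insertRecord and transform already exist and all resources are XA-capable:
void moveOneMessage() throws Exception {
    userTransaction.begin();
    try {
        Message incoming = sourceConsumer.receive();    // pull the message off the source queue
        insertRecord(dataSource, incoming);             // insert the record into the database
        destinationProducer.send(transform(incoming));  // put another message on the destination queue
        userTransaction.commit();                       // all three actions take effect together
    } catch (Exception e) {
        userTransaction.rollback();                     // none of them take effect
        throw e;
    }
}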
The key thing to remember about messaging systems is that a more global transaction can be composed of several smaller atomic transactional hand-offs from queue to queue.
Queue A -> Processing Agent -> Queue B -> Processing Agent -> Queue C
While in this scenario there isn't really a global transactional context (for instance, rolling a failure in B -> C all the way back to A), what you do have is a guarantee that messages will either be delivered down the chain or remain in their source queues. This makes the system consistent at any instant. Exception states can be handled by creating error routes to achieve a more global state of consistency.
A series of messages of which all or none are processed/sent.
A Session may be created as transacted. For a transacted session, on session.commit() all messages that consumers of this session have received are committed; that is, received messages are removed from their destinations (queues or topics), and messages that producers of this session have sent become visible to other clients. On rollback, received messages are returned to their destinations and sent messages are discarded. All messages sent / received until the commit / rollback form one unit of work.
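As a concrete example, a minimal JMS 1.1 fragment using a transacted session (javax.jms imports omitted; the connectionFactory field and queue names are placeholders):
void processOneMessage() throws JMSException {
    Connection connection = connectionFactory.createConnection();
    connection.start();

    // 'true' makes the session transacted; the acknowledge-mode argument is then ignored.
    Session session = connection.createSession(true, Session.SESSION_TRANSACTED);
    MessageConsumer consumer = session.createConsumer(session.createQueue("source.queue"));
    MessageProducer producer = session.createProducer(session.createQueue("destination.queue"));

    try {
        Message incoming = consumer.receive(5000);              // part of the unit of work
        producer.send(session.createTextMessage("processed"));  // also part of the unit of work
        session.commit();    // the receive and the send take effect together
    } catch (JMSException e) {
        session.rollback();  // the received message goes back, the sent message is discarded
    } finally {
        connection.close();
    }
}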
On a Java EE server using CMT, I am using ehcache to implement a caching layer between the business object layer (EJBs) and the Data Access layer (POJOs using JDBC). I seem to be experiencing a race condition between two threads accessing the same record while using a self-populating Ehcache. The cache is keyed on the primary key of the record.
The scenario is:
1. The first thread updates the record in the database and removes the record from the cache (but the database commit doesn't necessarily happen immediately; there may be other queries to follow).
2. The second thread reads the record, causing the cache to be re-populated.
3. The first thread commits the transaction.
This is all happening in a fraction of a second. It results in the cache being out of sync with the database, and subsequent reads of the record returning the stale cached data until another update is performed, or the entry expires from the cache. I can handle stale data for short periods (the typical length of a transaction), but not minutes, which is how long I would like to cache objects.
Any suggestions for avoiding this race condition?
UPDATE:
Clearing the cache after the transaction has committed would certainly be ideal. The question is: in a J2EE environment using CMT, where the caching layer is sandwiched between the business layer (stateless session EJBs) and the data access layer, how do I do this?
To be clear about the constraints this imposes: the method call in question may or may not be in the same transaction as additional method calls that happen before or after. I can't force a commit (or do this work in a separate transaction) since that would change the transaction boundaries from what the client code expects. Any subsequent exceptions would not roll back the entire transaction (unnecessarily clearing the cache in this case is an acceptable side effect). I can't control the entry points into the transaction, as it is essentially an API that clients can use. It is not reasonable to push the responsibility of clearing the cache to the client application.
I would like to be able to defer any cache clearing operations until the entire transaction is committed by the EJB container, but I have found no way to hook into that logic and run my own code with a stateless session bean.
UPDATE #2:
The most promising solution so far, short of a major design change, is to use ehcache 2.0's JTA support: http://ehcache.org/documentation/apis/jta
This means upgrading to ehcache 2.x and enabling XA transactions for the database as well, which could potentially have negative side-effects. But it seems like the "right" way.
You are using transactions, so it makes more sense to remove the cache entry after the commit; that is when the change really happens.
That way you see the old data only during the length of the transaction, and all reads afterwards have the latest view.
Update: Since this is CMT specific, you should look at the SessionSynchronization interface and its afterCompletion() method. This is shown in this tutorial.
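A rough sketch of that suggestion follows; note that javax.ejb.SessionSynchronization can only be implemented by a stateful session bean under CMT, and the cache handle and key bookkeeping here are invented:
import java.util.HashSet;
import java.util.Set;

import javax.ejb.SessionSynchronization;
import javax.ejb.Stateful;

import net.sf.ehcache.Ehcache;

@Stateful
public class RecordService implements SessionSynchronization {

    private Ehcache recordCache;                 // obtained from the CacheManager elsewhere
    private final Set<Long> touchedKeys = new HashSet<Long>();

    public void updateRecord(Long id) {
        // ... update the row via the data access layer ...
        touchedKeys.add(id);                     // remember what to evict after the commit
    }

    public void afterBegin() {
        touchedKeys.clear();
    }

    public void beforeCompletion() {
        // nothing to do before the commit
    }

    public void afterCompletion(boolean committed) {
        if (committed) {
            // Evict only once the database commit is visible, so a concurrent
            // reader cannot repopulate the cache with stale data mid-transaction.
            for (Long id : touchedKeys) {
                recordCache.remove(id);
            }
        }
        touchedKeys.clear();
    }
}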
I have set of batch/cron jobs in Java that call my service classes. I'm using Hibernate and Spring as well.
Originally the batch layer was always creating an outer transaction, and then the batch job would call a service to get a list of objects from the DB with the same session, then call a service to process each object separately. There's tx-advice set for my service layer to roll back on any throwable. So if there's an exception on the 5th object, the first 4 objects that were processed get rolled back too, because they were all part of the same transaction.
So I was thinking this outer transaction created in the batch layer was unnecessary. I removed it, and now I call a service to get a list of objects, then call another service to process each object separately; if one of those objects fails, the others still persist because it's a new transaction/session for each service call. But the problem I have now is that, after getting the list of objects, when I pass each object to a service to process it, if I try to get one of its properties I get a lazy initialization error because the session used to load that object (from the list) is closed.
Some options I thought of were to just get a list of IDs in the batch job and pass each ID to a service, so the service retrieves the whole object in that one session and processes it. Another is to set lazy loading to false for that object's attributes, but this would load everything every time, even when the nested attributes aren't needed.
I could always go back to the way it was originally, with the outer transaction around every batch job, and then create another transaction in the batch job before each call to the service for processing each individual object...
What's the best practice for something like this?
Well I would say that you listed every possible option except OpenSessionInView. That would keep your session alive across transactions, but it's difficult to implement properly. So difficult that it's considered an AntiPattern by many.
However, since you're not implementing a web interface and you aren't dealing with a highly threaded environment, I would say that's the way to go. It's not like you're passing entities to views. Your biggest fear is an N+1 call to the database while iterating through a collection, but since this is a cron job, performance may not be a major issue when compared with code cleanliness. If you're really worried about it, just make sure you get all of your collections via a call to a DAO that can do a select *.
Additionally, you were effectively doing an Open Session In View before when you were doing everything in the same transaction. In Spring, Sessions are opened on a per transaction basis, so keeping a transaction open a long period of time is effectively the same as keeping a Session open a long period of time. The only real difference in your case will be the fact that you can commit periodically without fear of a lazy initialization error down the road.
Edit
All that being said, it takes a bit of time to set up an Open Session in View, so unless you have any particular issues against doing everything in the same transaction, you might consider just going back to that.
Also, I just noticed that you mentioned opening a transaction in the batch layer and then opening "mini transactions" in the Service layer. This is most emphatically NOT a good idea. Spring's annotation-driven transactions will piggyback on any currently open transaction in the session. This means that transactions that are supposed to be read-only will suddenly become read-write if the currently open transaction is read-write. Additionally, the Session won't be flushed until the outermost transaction is finished anyway, so there's no point in marking the Service layer with @Transactional. Putting @Transactional on multiple layers only lends a false sense of security.
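To illustrate that piggybacking (a sketch; the classes and methods are invented), with the default REQUIRED propagation the inner method joins the outer transaction instead of getting its own:
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class BatchService {

    @Autowired
    private ReportingService reportingService;

    @Transactional // read-write transaction opened here
    public void runBatch() {
        // ... write some data ...
        reportingService.loadReferenceData(); // joins THIS read-write transaction
    }
}

@Service
class ReportingService {

    // Called from runBatch(), this does NOT get its own read-only transaction;
    // it simply participates in the outer read-write one, and readOnly is
    // effectively ignored.
    @Transactional(readOnly = true)
    public void loadReferenceData() {
        // ... queries ...
    }
}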
I actually blogged about this issue some time ago.