This question follows directly from a previous question of mine on SO. I am still unable to grok the concept of JMS sessions being used as a transactional unit of work.
From the Java Message Service book:
The QueueConnection object is used to create a JMS Session object (specifically, a Queue Session), which is the working thread and transactional unit of work in JMS. Unlike JDBC, which requires a connection for each transactional unit of work, JMS uses a single connection and multiple Session objects. Typically, applications will create single JMS Connection on application startup and maintain a pool of Session objects for use whenever a message needs to be produced or consumed.
I am unable to understand the meaning of the phrase transactional unit of work. A plain and simple explanation with an example is what I am looking for here.
A unit of work is something that must complete entirely or not at all. If it fails to complete, it must be as if it never happened.
In JTA parlance a unit of work consists of interactions with a transactional resource between a transaction.begin() call and a transaction.commit() call.
Let's say you define a unit of work that pulls a message off a source queue, inserts a record into a database, and puts another message on a destination queue. In this scenario the transaction-aware resources are the two JMS queues and the database.
If a failure occurs after the database insert, then a number of things must happen to achieve atomicity. The database insert must be rolled back so you don't have an orphaned record in the data source, and the message that was pulled off the source queue must be put back.
The net result in this contrived scenario is that, regardless of where a failure occurs in the unit of work, you end up in the exact state you started in.
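To make this concrete, here is a minimal sketch of that unit of work using JTA. This is a hedged illustration, not prescribed by the answer: the UserTransaction wiring, the audit_log table, and the assumption that both the JMS resources and the DataSource are XA-capable and enlisted in the transaction are all made up for the example.

import javax.jms.*;
import javax.transaction.UserTransaction;

// One unit of work: pull from the source queue, insert a DB record, push to the destination queue.
// All three resources must be XA-capable and enlisted in the JTA transaction for this to hold.
void processOne(UserTransaction tx, MessageConsumer fromSource,
                javax.sql.DataSource db, MessageProducer toDestination) throws Exception {
    tx.begin();
    try {
        TextMessage msg = (TextMessage) fromSource.receive(1000);    // 1. pull off the source queue
        if (msg != null) {
            try (java.sql.Connection c = db.getConnection();
                 java.sql.PreparedStatement ps =
                         c.prepareStatement("INSERT INTO audit_log(body) VALUES (?)")) {
                ps.setString(1, msg.getText());                      // 2. insert the record
                ps.executeUpdate();
            }
            toDestination.send(msg);                                 // 3. put on the destination queue
        }
        tx.commit();   // all three actions become permanent together...
    } catch (Exception e) {
        tx.rollback(); // ...or none did: the message returns to the source queue
        throw e;
    }
}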
The key thing to remember about messaging systems is that a more global transaction can be composed of several smaller atomic transactional handoffs from queue to queue.
Queue A -> Processing Agent -> Queue B -> Processing Agent -> Queue C
While in this scenario there isn't really a global transactional context (for instance, rolling a failure in B -> C all the way back to A), what you do have is a guarantee that messages will either be delivered down the chain or remain in their source queues. This keeps the system consistent at any instant. Exception states can be handled by creating error routes to achieve a more global state of consistency.
A series of messages, all or none of which are processed/sent.
A session may be created as transacted. For a transacted session, on session.commit() all messages which consumers of this session have received are committed, that is, received messages are removed from their destinations (queues or topics), and messages that producers of this session have sent become visible to other clients. On rollback, received messages are returned to their destinations and sent messages are discarded. All messages sent/received up to a commit/rollback form one unit of work.
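For illustration, a minimal sketch of such a transacted session (the queue names "IN" and "OUT" are made up, and connection.start() is assumed to have been called):

import javax.jms.*;

void oneUnitOfWork(Connection connection) throws JMSException {
    Session session = connection.createSession(true, Session.SESSION_TRANSACTED);
    MessageConsumer consumer = session.createConsumer(session.createQueue("IN"));
    MessageProducer producer = session.createProducer(session.createQueue("OUT"));

    Message in = consumer.receive();
    producer.send(session.createTextMessage("reply to " + in.getJMSMessageID()));

    session.commit();   // the receive becomes final and the send becomes visible;
                        // session.rollback() would instead redeliver `in` and discard the send
}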
Related
I want to use the SAGA pattern in my Spring Boot microservices. For example, in an order flow, when the order is created, an event like OrderCreatedEvent is produced; then, in the customer microservice, the listener on OrderCreatedEvent updates the customer credit and produces a CreditUpdateEvent, and so on.
I use a session-transacted JmsTemplate for event producing. The javadoc of JmsTemplate says that the JMS transaction is committed after the main transaction:
This has the effect of a local JMS transaction being managed alongside the main transaction (which might be a native JDBC transaction), with the JMS transaction committing right after the main transaction.
Now my question is how I can handle the scenario below:
The main transaction was committed (for example, the order record was committed), but the system was unable to commit the JMS transaction (for any reason).
I want to use SAGA instead of two-phase commit, but I think SAGA just moves the problem from the order and customer services to the order service and the JMS provider.
The SAGA pattern description hints at the issue:
There are also the following issues to address:
...
In order to be reliable, a service must atomically update its database and publish an event. It cannot use the traditional mechanism of a distributed transaction that spans the database and the message broker. Instead, it must use one of the patterns listed below.
...
The following patterns are ways to atomically update state and publish events:
Event sourcing
Application events
Database triggers
Transaction log tailing
Event Sourcing is special in this list, as it brings a radical change to how your system stores and processes data. Usually, systems store only the current state of their entities. Some systems add explicit support for historical states with validity periods and/or bitemporal data.
Systems which are based on Event Sourcing store the sequence of events instead of the entity state, in a way that allows them to reconstruct the state from the events. There is only one transactional resource to maintain - the event store - so there is no need to coordinate transactions.
The other patterns in the list avoid the issue of transaction coordination by requiring the event producer code to commit all changes - both entity state and events (stored as entities) - to a single data store. A dedicated but separate mechanism - the event publisher - is then implemented to fetch the events from the data store and publish them to the event consumers.
The event publisher would need to keep track of published/unpublished events, which usually brings back the problem of coordinated transactions. That's where the idempotency of the event consumers comes into play: the event publisher replays events from the last known position, while consumers ignore duplicates.
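As a hedged sketch of this idea (the outbox table schema and the EventBus interface are assumptions made for illustration): the producer writes events into an outbox table inside its main database transaction, and a separate publisher polls that table, publishes, and marks rows as published. A crash between publishing and marking is exactly why consumers must deduplicate.

import java.sql.*;

interface EventBus { void publish(String payload); }   // hypothetical broker facade

void publishPending(Connection db, EventBus bus) throws SQLException {
    try (PreparedStatement select = db.prepareStatement(
            "SELECT id, payload FROM outbox WHERE published = FALSE ORDER BY id");
         ResultSet rs = select.executeQuery()) {
        while (rs.next()) {
            long id = rs.getLong("id");
            bus.publish(rs.getString("payload"));       // replayed after a crash => duplicates possible
            try (PreparedStatement mark = db.prepareStatement(
                    "UPDATE outbox SET published = TRUE WHERE id = ?")) {
                mark.setLong(1, id);
                mark.executeUpdate();                   // a crash just before this line causes a replay,
            }                                           // which idempotent consumers simply ignore
        }
    }
}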
You may also reverse the active/passive aspects of the event producer and event consumer. The event producer stores entity state and events (as entities) in a single data store and provides an endpoint which allows the event consumer to access event streams. Each event consumer keeps track of processed/unprocessed events - which it needs to do anyway for idempotency reasons - but only for the event streams it is interested in. A really good explanation of this approach is given in the book REST in Practice - chapters 7 and 8.
With SAGA you would like to split or reorder your transaction (tx) steps into 3 phases (see the sketch after this list):
Tx steps for which you can have a compensating action. For each T1..N you have a C1..N.
Tx steps that cannot be compensated. If they fail, you trigger the previously defined C1..N.
Retriable tx steps that always succeed.
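A sketch of those three phases in code (the Step interface and the orchestration below are hypothetical names used purely for illustration):

import java.util.*;

interface Step { void execute(); void compensate(); }   // hypothetical saga step

void runSaga(List<Step> compensatable, Step pivot, List<Step> retriable) {
    Deque<Runnable> compensations = new ArrayDeque<>();
    try {
        for (Step t : compensatable) {          // phase 1: T1..N, each with a C1..N
            t.execute();
            compensations.push(t::compensate);
        }
        pivot.execute();                        // phase 2: the step with no compensation
    } catch (RuntimeException e) {
        compensations.forEach(Runnable::run);   // on failure: run C1..N in reverse order
        throw e;
    }
    for (Step t : retriable) {                  // phase 3: retried until they succeed
        while (true) {
            try { t.execute(); break; } catch (RuntimeException e) { /* back off and retry */ }
        }
    }
}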
SAGAs are not ACID, only ACD. You need to implement isolation yourself to prevent dirty reads, usually with locking.
Why SAGA? To avoid synchronous runtime coupling and poor availability: otherwise you wait for the last participant to commit.
It's quite a hefty price to pay.
The chance is small, but you can still end up with inconsistent events that might be used to source an aggregate.
I am trying to understand the best use of RabbitMQ to satisfy the following problem.
As context, I'm not concerned with performance in this use case (my peak TPS for this flow is 2 TPS), but I am concerned about resilience.
I have RabbitMQ installed in a cluster, and ignoring dead-letter queues, the basic flow is: a service receives a request and, in a transaction, writes a persistent message to a durable queue (at this point I'm happy the request is secured to disk). Another process listens for the message, reads it (not using auto-ack), does a bunch of stuff, and writes a new message to a different exchange/queue in a transaction (again, I'm now happy this message is secured to disk). Assuming the transaction completes successfully, it manually acks the original message.
At this point my only failure scenario is if I have a failure between the commit of the transaction that writes to my second queue and the return of the ack. This could lead to a message being processed twice. Is there anything else I can do to plug this gap, or do I have to figure out a way of handling duplicate messages?
As a final bit of context, the services are written in Java, using the Java client libs.
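For concreteness, a minimal sketch of the consume, process, publish-in-transaction, ack step with the RabbitMQ Java client. The exchange and routing key names and doBunchOfStuff are assumptions, and txSelect would normally be called once per channel rather than per message:

import com.rabbitmq.client.*;

// Stand-in for the real work done between the read and the publish.
byte[] doBunchOfStuff(byte[] in) { return in; }

void handleDelivery(Channel channel, Delivery delivery) throws Exception {
    byte[] result = doBunchOfStuff(delivery.getBody());

    channel.txSelect();                                   // AMQP transaction on this channel
    channel.basicPublish("downstream.exchange", "routing.key",
            MessageProperties.PERSISTENT_BASIC, result);  // persistent message to the second queue
    channel.txCommit();                                   // new message is now secured to disk

    // The gap identified above lives here: a crash after txCommit but before basicAck
    // redelivers the original message, so the consumer must tolerate duplicates.
    channel.basicAck(delivery.getEnvelope().getDeliveryTag(), false);
}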
Paul Fitz.
First of all, I suggest you look at this guide here, which has a lot of valid information on your topic.
From the RabbitMQ guide:
At the Producer
When using confirms, producers recovering from a channel or connection failure should retransmit any messages for which an acknowledgement has not been received from the broker. There is a possibility of message duplication here, because the broker might have sent a confirmation that never reached the producer (due to network failures, etc). Therefore consumer applications will need to perform deduplication or handle incoming messages in an idempotent manner.
At the Consumer
In the event of network failure (or a node crashing), messages can be duplicated, and consumers must be prepared to handle them. If possible, the simplest way to handle this is to ensure that your consumers handle messages in an idempotent way rather than explicitly deal with deduplication.
So the point is that it is not possible to guarantee that this "failure" scenario of yours will never happen. You will always have to deal with network failures, disk failures, insert-something-here failures, etc.
What you have to do here is lean on the messaging architecture and, if possible, make your message handling "idempotent" (which means that even if you process the message twice, nothing wrong will happen; check this).
If you can't, then you should keep some kind of "processed messages" list (for example, you can put a GUID inside every message) and check this list every time you receive a message; duplicates can then simply be discarded.
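A minimal sketch of that idea. In production the set of processed GUIDs would live in a durable store rather than in memory, and the GUID extraction is assumed to happen upstream:

import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class DedupingConsumer {
    private final Set<String> processedIds = ConcurrentHashMap.newKeySet();

    void onMessage(String messageGuid, byte[] body) {
        if (!processedIds.add(messageGuid)) {
            return;            // already seen this GUID: discard the duplicate
        }
        process(body);         // hypothetical business logic
    }

    private void process(byte[] body) { /* ... */ }
}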
To be more "theoretical", this post from Brave New Geek is very interesting:
Within the context of a distributed system, you cannot have exactly-once message delivery.
Hope it helps :)
I think plenty of (Spring in my case) applications using JMS may follow this workflow:
Database A ===> Producer ===> JMS Queue ===> Consumer ===> Database B
then reliability is a concern. Let's say a data record in Database A should only be marked as delivered once the message containing that record has truly been consumed and the data persisted in Database B. Then there are questions:
From my knowledge, the JMS protocol does not define any function for sending an acknowledgement from the consumer to the producer, only to the MOM, so the actual consumer-to-producer acknowledgement methods vary by JMS provider. Does that mean there is no way to build such an acknowledgement mechanism that works generally across JMS products (ActiveMQ, WebSphere MQ and JBoss MQ)?
Consider a blackout scenario: do the messages in the queue just evaporate, so they need to be resent? Or can the various JMS products pick up where they left off, since the messages are serialized, so that a missing message can only be caused by transaction management or async/sync configuration, but not by the application server being down?
JMS guarantees the delivery of the message by nature: if the message is posted, then it will be delivered to a consumer if there is one, whatever happens; the MOM is designed to ensure this. However, delivered does not necessarily mean processed.
Reliability is ensured by various mechanisms:
the first one is the persistence of messages in the queue (the queue AND the message must be flagged as persistent, which is the default), which ensures that messages will not be lost in case of a system interruption.
then you have the acknowledgement and the retry policy: a message will be kept in the queue until the consumer acknowledges it, and in the case of a transacted session it will be redelivered until the consumer has effectively processed it or the max retry count is reached. Failed messages can then be redirected to a dead-letter queue for analysis.
To ensure coherency between the two data sources, you have to use an XA transaction, at least on the producer side (you have at least 2 resources involved in the transaction: database A and the JMS queue), in order to guarantee that the message will not be posted to the queue if the commit in database A fails, and that the database will not be updated if the post to the queue fails. Message consumption should be transacted too, to ensure redelivery in case of rollback.
The transaction boundaries will never include both consumer and producer, because that conflicts with the asynchronous nature of the messaging system: you can't afford to lock resources on the producer side until the consumer processes the message, because you have no guarantee of when that will happen.
NB: in the event that your database does not support XA (or to improve performance), and if you have only 2 resources involved in the transaction (database and JMS queue), you can have a look at the Logging Last Resource transaction optimization.
1) From my experience with queue managers (MQ Series, ActiveMQ and HornetQ), I have never needed this kind of acknowledgement between producer and consumer. In the environments I used to deal with, the traffic was about 50-60 million objects per day across several queues, and the queues were all persistent as well.
2) In my case, using the persistence mechanism of the queue manager was totally sufficient to handle a blackout scenario. I used disk persistence on MQ Series and HornetQ.
However, sometimes, to reconcile the message counts, we developed mechanisms to compare Database A with Database B, to be sure that messages were indeed consumed. I don't know whether the JMS architecture should provide this kind of mechanism, because such a task could decrease performance.
In my view, you have to judge within your system architecture how important it is to reconcile this information, because it's not that easy to keep in sync.
Regards.
If I understand your question, this seems like a case for JTA/XA transactions (as long as your DB/JMS vendors support them). Spring TX managers can help make the tx management (more) vendor-agnostic.
FYI, I use Apache Camel for this type of flow which has pretty good error handling across producers/consumers.
I want to send a batch of 20k JMS messages to the same queue. I'm splitting the task up using 10 threads, so each will be processing 2k messages. I don't need transactions.
I was wondering if having one connection, one session, and 10 producers is the recommended way to go or not?
How about if I had one producer shared by all the threads? Would my messages be corrupted, or would they be sent out synchronized (giving no performance gain)?
What's the general guideline of deciding whether to create a new connection or session if I'm always connecting to the same queue?
Thank you and sorry for asking a lot at once.
(Here's a similar question, but it didn't quite answer what I was looking for: Long lived JMS sessions. Is Keeping JMS connections / JMS sessions allways open a bad pratice?)
Is it OK if some of the messages are duplicated or lost? When the JMS client connects to the JMS broker over the network, there are three phases to any API call.
The API call, including any message data, is transmitted over the wire to the broker.
The API call is executed by the broker.
The result code and any message data are transmitted back to the client.
Consider the producer for a minute. If the connection is broken in the first step, the broker never got the message and the app would need to send it again. If the connection is broken in the third step, the message has been successfully sent, and sending it again would produce a duplicate. The app cannot tell the difference between these cases, so the only safe choice is to resend the message on error. If the session is transacted, the message can be safely resent in all cases, because if the original made it to the broker it will be rolled back.
Consider the consumer. If the connection is lost in the third step, the message is deleted from the queue but never makes it back to the client. But if the session is transacted, the message will be redelivered when the application reconnects.
Outside of transactions there is the possibility of lost or duplicate messages. Inside of a transaction the same window of ambiguity exists, but it is on the COMMIT call rather than the PUT or GET. With transacted sessions it is possible to send or receive a message twice, but not to lose one.
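A hedged sketch of that resend-on-error rule under a transacted session (maxAttempts and the recovery details are simplified; after a broken connection you would in practice recreate the session before retrying):

import javax.jms.*;

void sendWithRetry(Session transactedSession, MessageProducer producer,
                   Message msg, int maxAttempts) throws JMSException {
    for (int attempt = 1; ; attempt++) {
        try {
            producer.send(msg);
            transactedSession.commit();   // the window of ambiguity sits on this call
            return;
        } catch (JMSException e) {
            transactedSession.rollback(); // any uncommitted send is discarded, so the resend
                                          // cannot lose a message (it may duplicate one if
                                          // the broker had actually committed)
            if (attempt >= maxAttempts) throw e;
        }
    }
}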
The JMS spec recognizes this window of ambiguity and provides the following guidance:
If a failure occurs between the time a client commits its work on a Session and the commit method returns, the client cannot determine if the transaction was committed or rolled back. The same ambiguity exists when a failure occurs between the non-transactional send of a PERSISTENT message and the return from the sending method.
It is up to a JMS application to deal with this ambiguity. In some cases, this may cause a client to produce functionally duplicate messages.
A message that is redelivered due to session recovery is not considered a duplicate message.
JMS sessions should always be transacted except for cases where it really is OK to lose messages. If the sessions are transacted, then you'd need a session per thread due to the JMS threading model (connections, by contrast, are thread-safe and can be shared).
Any advice about performance impact would be vendor-specific, but in general persistent messages outside of syncpoint are hardened to disk before the API call returns, whereas a transacted call can return before the persistent message is written to disk, as long as the message is persisted before the COMMIT returns. If the vendor optimizes based on this, it is much more performant to write several messages to disk and then commit them in batches: the broker can then optimize writes and disk flushes per disk block rather than per message. The optimal number of messages per transaction decreases with the size of the message, and beyond a certain message size dwindles back down to one.
If your 20k messages are relatively small (measured in KB, not MB), then you probably want to use a transacted session per thread and tune the commit interval.
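A sketch of that tuning, per thread (the queue name and the BATCH_SIZE of 100 are assumptions to be tuned against your broker):

import javax.jms.*;
import java.util.List;

void sendBatch(Connection connection, List<String> payloads) throws JMSException {
    Session session = connection.createSession(true, Session.SESSION_TRANSACTED);
    try {
        MessageProducer producer = session.createProducer(session.createQueue("TARGET.QUEUE"));
        final int BATCH_SIZE = 100;       // the commit interval to tune
        int inBatch = 0;
        for (String payload : payloads) {
            producer.send(session.createTextMessage(payload));
            if (++inBatch == BATCH_SIZE) {
                session.commit();         // lets the broker flush a whole block of messages at once
                inBatch = 0;
            }
        }
        if (inBatch > 0) session.commit();
    } finally {
        session.close();
    }
}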
In most scenarios it is sufficient to work with one connection and multiple sessions, using one session per thread. In some environments you can gain additional performance by using multiple connections:
Some messaging systems support a cluster mode, where connections get load-balanced to different nodes. With multiple connections you can exploit the performance of multiple nodes in this scenario (which of course only helps when the bottleneck is on the message broker side).
The best solution would be to use a pool of connections and give the administrator some options to configure the behaviour in this specific area.
I was wondering if having one connection, one session, and 10 producers is the recommended way to go or not?
Sure, but the point to note here is that you are effectively using a single thread, i.e. the one tied to the Session object you create. All 10 producers are bound to this Session object and consequently to the same thread.
How about if I had one producer shared by all the threads? Would my messages be corrupt or would it be sent out synchronized (giving no performance gain)?
A very bad idea, I would say. The JMS spec clearly says a Session should not be shared by more than one thread; it is not thread-safe.
What's the general guideline of deciding whether to create a new connection or session if I'm always connecting to the same queue?
If your system supports multithreading, then you can create multiple sessions (each session corresponding to a single thread) from a single connection. Each session can then have multiple producers/consumers, but these must not be shared among threads.
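A sketch of that layout for the 20k-message case (the queue name is an assumption): one shared Connection, with each thread creating its own Session and MessageProducer:

import javax.jms.*;
import java.util.ArrayList;
import java.util.List;

void sendInParallel(ConnectionFactory factory, List<List<String>> chunks) throws Exception {
    Connection connection = factory.createConnection();   // thread-safe, shared by all threads
    connection.start();
    List<Thread> threads = new ArrayList<>();
    for (List<String> chunk : chunks) {                   // e.g. 10 chunks of 2k messages each
        Thread t = new Thread(() -> {
            try {
                // Session and producer are per-thread: never shared across threads
                Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                MessageProducer producer =
                        session.createProducer(session.createQueue("TARGET.QUEUE"));
                for (String payload : chunk) {
                    producer.send(session.createTextMessage(payload));
                }
                session.close();
            } catch (JMSException e) {
                throw new RuntimeException(e);
            }
        });
        threads.add(t);
        t.start();
    }
    for (Thread t : threads) t.join();
    connection.close();
}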
From what I have investigated on this topic, one session means one thread; this is based on the JMS spec. If you want multithreading (multiple producers/consumers), multiple sessions need to be created; one connection is fine.
In theory Connections are thread-safe but all the others are not, so you should create one session per thread.
In reality, it depends on the JMS implementation you are using.
Edited question: I am working on multithreaded JMS receiver and publisher code (a standalone multithreaded Java application). The MOM is SonicMQ.
An XML message is received from a queue, stored procedures (which take 70 seconds to execute) are called, and the response is sent to a topic, all within 90 seconds.
I need to handle the condition when the broker is down or the application is in a scheduled shutdown, i.e. a condition in which messages have been received from the queue and are being processed in Java when both the queue and the topic go down. To handle those messages which are no longer on the queue, not yet sent to the topic, but still in Java memory, I have the following options:
(1) Create a CLIENT_ACKNOWLEDGE session, as in:
connection.createSession(false, javax.jms.Session.CLIENT_ACKNOWLEDGE)
Here I will acknowledge the message only after the successful completion of the transaction (the stored procedures).
(2) Use a transacted session, i.e. connection.createSession(true, -1). In this approach, because of some exception in the transaction (stored procedure), the message is rolled back and redelivered, again and again, until I kill the program. Can I limit the number of redeliveries of JMS messages from the queue?
Also, of the above two approaches, which one is better?
The interface progress.message.jclient.ConnectionFactory has a method setMaxDeliveryCount(java.lang.Integer value) with which you can set the maximum number of times a message will be redelivered to your MessageConsumer. When this count is exhausted, the message is moved to the SonicMQ.deadMessage queue.
You can check this in the book "Sonic MQ Application Programming Guide" on page 210 (in version 7.6).
As to your question about which is better... that depends on whether the stored procedure minds being executed multiple times. If that is a problem, you should use a transaction that spans the JMS queue and the database both (Sonic has support for XA transactions). If you don't mind executing multiple times, then I would go for not acknowledging the message and aborting the processing when you notice that the broker is down (when you attempt to acknowledge the message, most likely). This way, another processor is able to handle the message if the first one is unable to do so after a connection failure.
If the messages take variable time to process, you may also want to look at the SINGLE_MESSAGE_ACKNOWLEDGE mode of the Sonic JMS Session. Normally, calling acknowledge() on a message also acknowledges all messages that came before it. If you're processing them out of order, that's not what you want to happen. In single message acknowledge mode (which isn't in the JMS standard), acknowledge() only acknowledges the message on which it is called.
If you are worried about communicating with a message queue/broker/server/etc that might be down, and how that interrupts the overall flow of the larger process you are trying to design, then you should probably look into a JMS queue that supports clustering of servers so you can still reliably produce/consume messages when individual servers in the cluster go down.
Your question isn't 100% clear, but it seems the issue is that you're throwing an exception while processing a message when you really shouldn't be.
If there is an actual problem with the message (say the XML is malformed, or it's invalid according to your data model), you do not want to roll back your transaction. You might want to log the error, but you have successfully processed that message; it's just that "success" in this case means you've identified the message as problematic.
On the other hand, if there is a problem in processing the message caused by something external to the message (e.g. the database is down, or the destination topic is unavailable), you probably do want to roll the transaction back. However, you also want to make sure you stop consuming messages until the problem is resolved; otherwise you'll end up with the scenario you've described, where you continually process the same message over and over, failing every time you try to access the resource that is currently unavailable.
Without knowing what messaging provider you are using, I don't know whether this will help you.
MQ Series messages have a backout counter, which can be enabled by configuring the harden backout counter option on the queue.
When I have previously had this problem, I did something like the following:
// receive the message from the queue, then inspect its backout/delivery counter
int backoutCount = message.getIntProperty("JMSXDeliveryCount");
if (backoutCount > MAX_BACKOUT) {             // MAX_BACKOUT plays the role of "n"
    moveMessageToAppDeadLetterQueue(message); // park the poison message for analysis
    return;
}
processMessage(message);
The MQ series header fields are accessible as JMS properties.
Using the above approach would also help if you can use XA transactions to roll back or commit the database and the queue manager simultaneously.
However, XA transactions do incur a significant performance penalty, and with stored procs this probably isn't possible.
An alternative approach would be to write the message immediately to a message_table as a blob, and then commit the message from the queue.
Put a trigger on the message_table to invoke the stored proc, and then add the JMS response mechanism into the stored proc.
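A sketch of that alternative (message_table comes from the answer above; the rest of the wiring is assumed, including auto-commit being off): the listener only persists the raw message and acknowledges it, leaving the slow stored procedure to the database trigger:

import java.sql.*;
import javax.jms.TextMessage;

void onMessage(Connection db, TextMessage msg) throws Exception {
    try (PreparedStatement ps =
                 db.prepareStatement("INSERT INTO message_table(body) VALUES (?)")) {
        ps.setString(1, msg.getText());  // store the raw message body as a blob/clob
        ps.executeUpdate();
        db.commit();                     // message is now secured in the database...
    }
    msg.acknowledge();                   // ...so it can safely be removed from the queue
}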