ActiveMQ documentation states that Session and MessageProducer objects are not thread-safe. If I have a set of threads that produce persistent messages, how do I send them to ActiveMQ correctly while knowing whether each send operation succeeded?
1. Have a separate Session/MessageProducer for each worker thread.
2. Create an explicit set of producer threads and pass messages to them through a BlockingQueue (but how do I find out whether a send was successful?).
3. Use Future<> with the previous option to get the success state of persisting the message in ActiveMQ.
4. Simply wrap each MessageProducer.send() call in a synchronized block.
Or are there any best practices for such cases? Thanks.
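The producer-thread and Future options from the list above could be sketched roughly as follows. This is a minimal illustration using only the JDK: `Sender` is a hypothetical stand-in for a JMS MessageProducer, and `QueuedSender` and `QueuedSenderDemo` are made-up names, not ActiveMQ classes. One thread owns the sender, and each caller gets a CompletableFuture that completes normally on success or exceptionally on failure.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical stand-in for a JMS MessageProducer; not part of any ActiveMQ API. */
interface Sender {
    void send(String body) throws Exception;
}

/** Confines the non-thread-safe sender to one thread and reports each outcome through a future. */
final class QueuedSender implements AutoCloseable {
    private static final class Request {
        final String body;
        final CompletableFuture<Void> result;
        Request(String body, CompletableFuture<Void> result) { this.body = body; this.result = result; }
    }

    private static final Request STOP = new Request(null, null);   // sentinel that ends the worker loop
    private final BlockingQueue<Request> requests = new LinkedBlockingQueue<>();
    private final Thread worker;

    QueuedSender(Sender sender) {
        worker = new Thread(() -> {
            try {
                for (Request r = requests.take(); r != STOP; r = requests.take()) {
                    try {
                        sender.send(r.body);                // only this thread ever touches the sender
                        r.result.complete(null);            // success is reported to the caller
                    } catch (Exception e) {
                        r.result.completeExceptionally(e);  // so is failure
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "queued-sender");
        worker.start();
    }

    /** Safe to call from any thread. Requests submitted after close() are never completed. */
    CompletableFuture<Void> submit(String body) {
        CompletableFuture<Void> result = new CompletableFuture<>();
        requests.add(new Request(body, result));
        return result;
    }

    @Override
    public void close() {
        requests.add(STOP);
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

final class QueuedSenderDemo {
    static boolean succeeded(CompletableFuture<Void> future) {
        try {
            future.join();
            return true;
        } catch (CompletionException e) {
            return false;
        }
    }

    /** Sends one accepted and one rejected message and reports both outcomes. */
    static String run() {
        Sender rejectsBad = body -> {
            if (body.equals("bad")) throw new IllegalStateException("rejected");
        };
        try (QueuedSender sender = new QueuedSender(rejectsBad)) {
            CompletableFuture<Void> ok = sender.submit("ok");
            CompletableFuture<Void> bad = sender.submit("bad");
            return "ok=" + succeeded(ok) + " bad=" + succeeded(bad);
        }
    }
}
```

With a real broker, the lambda passed to `QueuedSender` would wrap the actual send call, and any exception it throws would surface through the returned future.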
One problem with sharing a session between threads, as you describe in options 2-4, is transaction management.
If you do things in a JMS session, then you want to make sure that you know when the transaction is committed or rolled back. That happens on the session object. Multiple threads committing on the same session will cause bugs.
What's common (for instance if you look at the JmsTemplate from Spring) is that you open a new Connection/Session/MessageProducer, send a message, then close them all. This is very inefficient, but thread safe. To solve the efficiency problem, you can wrap your ConnectionFactory in a PooledConnectionFactory. That pool will lend sessions/connections to your thread when needed, and when close is invoked on the Session, it will be put back into the Pool. That way, you don't really have to care about thread safety at all. Read more on the topic here.
Of course, if you are up for some manual management, you can go with your approach 1 and keep a Session per thread. That should be the most efficient way if you have a few threads that send a lot of messages.
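A session-per-thread arrangement might look something like the sketch below. `FakeSession` is a hypothetical, deliberately non-thread-safe stand-in for a JMS Session with its MessageProducer, and the class names are invented for illustration. A ThreadLocal lazily creates one session for each worker thread, so no session is ever shared.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a JMS Session plus MessageProducer; deliberately not thread-safe. */
final class FakeSession {
    private final Thread owner = Thread.currentThread();
    private int sent;   // plain field: safe only because a single thread owns this session

    void send(String body) {
        if (Thread.currentThread() != owner) {
            throw new IllegalStateException("session used outside its owning thread");
        }
        sent++;
    }

    int sent() { return sent; }
}

final class SessionPerThreadDemo {
    /** Sends `tasks` messages on `threads` workers; returns {sessionsCreated, messagesSent}. */
    static int[] run(int threads, int tasks) {
        AtomicInteger created = new AtomicInteger();
        AtomicInteger sentTotal = new AtomicInteger();

        // One session per worker thread, created on first use by that thread.
        ThreadLocal<FakeSession> session = ThreadLocal.withInitial(() -> {
            created.incrementAndGet();
            return new FakeSession();
        });

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < tasks; i++) {
            String body = "msg-" + i;
            pool.execute(() -> {
                session.get().send(body);   // always the calling thread's own session
                sentTotal.incrementAndGet();
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] { created.get(), sentTotal.get() };
    }
}
```

Real sessions would also need to be closed when their worker threads shut down, which this sketch leaves out.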
The spec for JMS sessions warns that Session objects/instances must only be used on the thread in which they are created when there are MessageListener instances registered to a Session. However, it doesn't say anything about being thread-un-safe, or perhaps more accurately, "thread-bound", when using MessageConsumer objects (only).
http://docs.oracle.com/javaee/1.3/api/javax/jms/Session.html
(by "thread-bound", I mean that the object must only be used, ever, on a specific thread, not just that it's unsafe to use it on multiple threads without synchronization or other coordination)
The answer to this question also suggests that Sessions are thread-bound: Relationship between JMS connections, sessions, and producers/consumers
However, there may or may not be some assumptions the author is making, and the question is also more about writing messages than about reading them.
Does anybody know if you can read a message in a Session on one thread, and then have another thread deal with the message and do a commit/rollback for the message (with the session) on that other thread? Only commit (or rollback) would be called against the Session from within the processing thread -- no other calls would be made to the Connection / Session / MessageConsumer / Message chain. Also, the Session would not be used to read again until after the commit/rollback occurred.
The following S/O questions seem closely related, but do not satisfactorily address what I am proposing:
How to continuously read JMS Messages in a thread and achnowledge them based on their JMSMessageID in another thread?
Reason for a JMS Session object to be used in a single threaded context
While I would like to use a Session on multiple threads, there will never be overlapping message requests/transactions.
I'm trying to avoid further refactoring of existing code, so I'm considering doing something a little odd, rather than having a Session on each worker thread.
edit (July 26) - - -
This question, Using a JMS Session from different threads, seems to suggest that it is OK to do synchronized operations with a session on different threads, but I am uncertain which version of the spec is referenced.
Maybe you have found a way in the specification.
A quote from the documentation of Session (http://docs.oracle.com/javaee/1.3/api/javax/jms/Session.html):
A Session object is a single-threaded context for producing and consuming messages. Although it may allocate provider resources outside the Java virtual machine (JVM), it is considered a lightweight JMS object.
So it's single-threaded; and it's not expensive to create one.
And you have to pay attention to
The close method is the only session method that can be called while some other session method is being executed in another thread.
So you have to make sure that read and commit do not overlap, for example.
From a technical point of view I would refactor it; the code will be easier to read/maintain. Resource handling (open/close) would be in one thread (one method) only. This would simplify exception handling as well.
[From a legal point of view: You admit that you are doing something "odd" - against the recommendation. I would not deliver such a piece of software.]
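One way to guarantee that receive and commit never overlap, without every worker holding the session, is to route all session calls through a single dedicated thread. The sketch below assumes a hypothetical `RecordingSession` in place of a real transacted Session. Worker threads process the message, but only the single-thread executor ever touches the session.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical stand-in for a transacted JMS Session; it only records what happens to it. */
final class RecordingSession {
    final Set<Thread> callers = new HashSet<>();   // touched only by the session thread
    int commits;

    String receive() {
        callers.add(Thread.currentThread());
        return "message-" + commits;
    }

    void commit() {
        callers.add(Thread.currentThread());
        commits++;
    }
}

final class ConfinedSessionDemo {
    /** Receives, processes and commits `messages` times; reports how many threads touched the session. */
    static String run(int messages) {
        RecordingSession session = new RecordingSession();
        ExecutorService sessionThread = Executors.newSingleThreadExecutor();  // sole caller of the session
        ExecutorService workers = Executors.newFixedThreadPool(4);            // never touch the session
        try {
            for (int i = 0; i < messages; i++) {
                // Receive on the session thread.
                String message = sessionThread.submit(() -> session.receive()).get();
                // Process on a worker thread (upper-casing stands in for real work).
                String processed = workers.submit(() -> message.toUpperCase()).get();
                if (processed.isEmpty()) {
                    throw new IllegalStateException("processing failed");
                }
                // Hand the commit back to the session thread.
                sessionThread.submit(() -> session.commit()).get();
            }
            return "threads=" + session.callers.size() + " commits=" + session.commits;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            sessionThread.shutdown();
            workers.shutdown();
        }
    }
}
```

Because every call is queued on one thread, the calls are serialized by construction, and the session is also confined to a single thread, which sidesteps the question of whether a provider treats sessions as thread-bound.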
This question follows directly from a previous question of mine in SO. I am still unable to grok the concept of JMS sessions being used as a transactional unit of work.
From the Java Message Service book:
The QueueConnection object is used to create a JMS Session object
(specifically, a Queue Session), which is the working thread and
transactional unit of work in JMS. Unlike JDBC, which requires a
connection for each transactional unit of work, JMS uses a single
connection and multiple Session objects. Typically, applications will
create single JMS Connection on application startup and maintain a
pool of Session objects for use whenever a message needs to be
produced or consumed.
I am unable to understand the meaning of the phrase transactional unit of work. A plain and simple explanation with an example is what I am looking for here.
A unit of work is something that must complete all or nothing. If it fails to complete it must be like it never happened.
In JTA parlance a unit of work consists of interactions with a transactional resource between a transaction.begin() call and a transaction.commit() call.
Let's say you define a unit of work that pulls a message off a source queue, inserts a record in a database, and puts another message on a destination queue. In this scenario the transaction-aware resources are the two JMS queues and the database.
If a failure occurs after the database insert, then a number of things must happen to achieve atomicity. The database insert must be rolled back so you don't have an orphaned record in the datasource, and the message that was pulled off the source queue must be put back.
The upshot in this contrived scenario is that, regardless of where a failure occurs in the unit of work, the result is the exact state that you started in.
The key to remember about messaging systems is that a more global transaction can be composed of several smaller atomic transactional handoffs queue to queue.
Queue A -> Processing Agent -> Queue B -> Processing Agent -> Queue C
While in this scenario there isn't really a global transactional context (for instance, rolling a failure in B->C all the way back to A), what you do have is a guarantee that messages will either be delivered down the chain or remain in their source queues. This makes the system consistent at any instant. Exception states can be handled by creating error routes to achieve a more global state of consistency.
A series of messages of which all or none are processed/sent.
A Session may be created as transacted. On session.commit() of a transacted session, all messages that the consumers of this session have received are committed, that is, received messages are removed from their destinations (queues or topics), and messages that the producers of this session have sent become visible to other clients. On rollback, received messages are returned to their destinations and sent messages are discarded. All messages sent or received up to a commit or rollback form one unit of work.
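The commit and rollback behaviour described above can be modelled with a small in-memory toy. `ToyTransactedSession` is a hypothetical class written for this illustration, not the JMS API. Received messages are held until commit, sent messages stay invisible until commit, and rollback restores the starting state.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy in-memory model of a transacted session; a hypothetical class, not the JMS API. */
final class ToyTransactedSession {
    private final Deque<String> source;                         // queue being consumed
    private final List<String> destination;                     // what other clients can see
    private final Deque<String> received = new ArrayDeque<>();  // consumed but not yet committed
    private final List<String> pending = new ArrayList<>();     // sent but not yet committed

    ToyTransactedSession(Deque<String> source, List<String> destination) {
        this.source = source;
        this.destination = destination;
    }

    String receive() {
        String message = source.pollFirst();
        if (message != null) received.addLast(message);
        return message;
    }

    void send(String body) {
        pending.add(body);              // invisible to others until commit
    }

    void commit() {
        received.clear();               // consumed messages are gone for good
        destination.addAll(pending);    // sent messages become visible
        pending.clear();
    }

    void rollback() {
        while (!received.isEmpty()) {
            source.addFirst(received.pollLast());   // put messages back in their original order
        }
        pending.clear();                            // sent messages are discarded
    }
}

final class UnitOfWorkDemo {
    /** One failed attempt followed by one successful attempt at the same unit of work. */
    static String run() {
        Deque<String> source = new ArrayDeque<>(List.of("m1", "m2"));
        List<String> destination = new ArrayList<>();
        ToyTransactedSession session = new ToyTransactedSession(source, destination);

        String first = session.receive();
        session.send("out-" + first);
        session.rollback();                 // for example, the database insert failed

        String retry = session.receive();   // the same message is delivered again
        session.send("out-" + retry);
        session.commit();

        return source + " " + destination;
    }
}
```

After the rollback the source queue again holds both messages and the destination is empty. After the commit, `run()` returns `[m2] [out-m1]`.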
I want to send a batch of 20k JMS messages to the same queue. I'm splitting the task up using 10 threads, so each will be processing 2k messages. I don't need transactions.
I was wondering if having one connection, one session, and 10 producers is the recommended way to go or not?
How about if I had one producer shared by all the threads? Would my messages be corrupt or would it be sent out synchronized (giving no performance gain)?
What's the general guideline of deciding whether to create a new connection or session if I'm always connecting to the same queue?
Thank you and sorry for asking a lot at once.
(Here's a similar question, but it didn't quite answer what I was looking for. Long lived JMS sessions. Is Keeping JMS connections / JMS sessions allways open a bad pratice? )
Is it OK if some of the messages are duplicated or lost? When the JMS client connects to the JMS broker over the network there are three phases to any API call.
1. The API call, including any message data, is transmitted over the wire to the broker.
2. The API call is executed by the broker.
3. The result code and any message data is transmitted back to the client.
Consider the producer for a minute. If the connection is broken in the first step then the broker never got the message and the app would need to send it again. If the connection is broken in the third step then the message has been successfully sent and sending it again would produce a duplicate message. The app cannot tell the difference between these and so the only safe choice is to resend the message on error. If the session is transacted the message can be safely resent in all cases because if the original had made it to the broker, it will be rolled back.
Consider the consumer. If the connection is lost in the third step then the message is deleted from the queue but never made it back to the client. But if the session is transacted the message will be redelivered when the application reconnects.
Outside of transactions there is the possibility of lost or duplicate messages. Inside of a transaction the same window of ambiguity exists, but it is on the COMMIT call rather than the PUT or GET. With transacted sessions it is possible to send or receive a message twice but not to lose one.
The JMS spec recognizes this window of ambiguity and provides the following guidance:
If a failure occurs between
the time a client commits its work on
a Session and the commit method
returns, the client cannot determine
if the transaction was committed or
rolled back. The same ambiguity exists
when a failure occurs between the
non-transactional send of a PERSISTENT
message and the return from the
sending method.
It is up to a JMS application to deal
with this ambiguity. In some cases,
this may cause a client to produce
functionally duplicate messages.
A message that is redelivered due to
session recovery is not considered a
duplicate message.
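Since a resend can produce functional duplicates, receivers often guard against them. The following is a minimal sketch of one such guard, under the assumption that the sender stamps each message with an application-level ID that stays the same across resends (a resent message normally gets a new JMSMessageID, so that header alone would not help). `DuplicateFilter` is a hypothetical name.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Remembers the most recently seen IDs so that a resent message is processed only once. */
final class DuplicateFilter {
    private final Set<String> seen;

    DuplicateFilter(int capacity) {
        // Insertion-ordered map that evicts its oldest entry once it grows past `capacity`.
        seen = Collections.newSetFromMap(new LinkedHashMap<String, Boolean>() {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > capacity;
            }
        });
    }

    /** Returns true the first time an ID is offered, false for a repeat still inside the window. */
    synchronized boolean firstTime(String applicationMessageId) {
        return seen.add(applicationMessageId);
    }
}
```

A consumer would call `firstTime(id)` before processing and skip the message when it returns false. The window is bounded, so a duplicate that arrives after its ID has been evicted is not detected, and the set lives in memory only, so it does not survive a restart.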
JMS sessions should always be transacted except for cases where it really is OK to lose messages. If the sessions are transacted then you'd need session and connection per-thread due to the JMS thread model.
Any advice about performance impacts would be vendor-specific but in general persistent messages outside of syncpoint are hardened to disk before the API call returns. But a transacted call can return before the persistent message is written to disk so long as the message is persisted before the COMMIT returns. If the vendor optimizes based on this, then it is much more performant to write several messages to disk and then commit them in batches. This allows the broker to optimize writes and disk flushes by disk block rather than per-message. The number of messages to put in the transaction decreases with the size of the message and beyond a certain message size dwindles back down to one.
If your 20k messages are relatively small (measured in KB and not MB) then you probably want to use transacted sessions per thread and tune the commit interval.
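A tunable commit interval could be sketched as follows. `BatchTarget` is a hypothetical two-method view of a transacted session (with JMS these would roughly correspond to a producer's send and the session's commit), and `BatchSender` is an invented helper. It commits after every `commitInterval` sends and once more for any remainder.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical minimal view of a transacted session and its producer. */
interface BatchTarget {
    void send(String body);
    void commit();
}

final class BatchSender {
    /** Sends every message, committing in batches; returns the number of commits issued. */
    static int sendAll(List<String> messages, int commitInterval, BatchTarget target) {
        if (commitInterval < 1) {
            throw new IllegalArgumentException("commitInterval must be at least 1");
        }
        int commits = 0;
        int uncommitted = 0;
        for (String message : messages) {
            target.send(message);
            if (++uncommitted == commitInterval) {
                target.commit();        // one commit covers the whole batch
                commits++;
                uncommitted = 0;
            }
        }
        if (uncommitted > 0) {          // commit the final partial batch
            target.commit();
            commits++;
        }
        return commits;
    }

    /** Runs sendAll against a target that does nothing, to show the commit count. */
    static int demo(int messageCount, int commitInterval) {
        BatchTarget noOp = new BatchTarget() {
            public void send(String body) { }
            public void commit() { }
        };
        return sendAll(Collections.nCopies(messageCount, "payload"), commitInterval, noOp);
    }
}
```

For one thread's share of 2,000 messages, an interval of 100 results in 20 commits instead of 2,000 individually hardened sends. The best interval depends on the provider and the message size, so it is worth measuring.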
In most scenarios it is sufficient to work with one connection and multiple sessions, using one session per thread. In some environments you can gain additional performance by using multiple connections:
Some messaging systems support a cluster mode, where connections get load-balanced to different nodes. With multiple connections you can use the performance of multiple nodes in this scenario (which of course only helps when the bottleneck is on the side of the message broker).
The best solution would be to use a pool of connections, and give the administrator some options to configure the behaviour in the specific area.
I was wondering if having one connection, one session, and 10 producers
is the recommended way to go or not?
Sure, but note that you are then effectively single-threaded: all 10 producers belong to the same Session object, and a Session must be used by only one thread at a time.
How about if I had one producer shared by all the threads? Would my messages
be corrupt or would it be sent out synchronized (giving no performance gain)?
Very bad idea I would say. JMS specs clearly say Session should not be shared by more than one thread. It is not thread safe.
What's the general guideline of deciding whether to create a new connection
or session if I'm always connecting to the same queue?
If your system supports multithreading, then you can create multiple sessions (each session corresponding to a single thread) from a single connection. Each session can then have multiple producers/consumers, but none of these may be shared among threads.
From what I have found investigating this topic, one session means one thread. This is based on the JMS specs. If you want multiple threads (multiple producers/consumers), multiple sessions need to be created; one connection is fine.
In theory Connections are thread-safe but all the others are not, so you should create one session per thread.
In reality, it depends on the JMS implementation you are using.
In an application based on the HornetQ engine I intend to create multiple Producers and Consumers.
I have learned, that I should reuse resources as much as possible thanks to this page.
Does that mean that for my application I should create one and exactly one ConnectionFactory, one Connection, one Session and then (using this Session object) create as many Producers/Consumers as I want?
That shouldn't be hard, but I'm not sure if this is the proper approach.
The best rule of thumb for minimum resource usage is to use as few constructs as possible while remaining thread safe. Accordingly:
Connection Factories are thread safe: One per JMS server (or one per JMS server per destination type for topics and queues)
Connections are thread safe: Depending on the application architecture, you may be able to use one connection, but I would not bend over backwards to do this.
Sessions and all constructs below the session are NOT thread safe: You will need one session per concurrent thread (or per transaction if you think about it that way).
Based on that, hopefully you can strike a balance between an elegant architecture and low resource utilization.
We have a JMS queue of job statuses, and two identical processes pulling from the queue to persist the statuses via JDBC. When a job status is pulled from the queue, the database is checked to see if there is already a row for the job. If so, the existing row is updated with new status. If not, a row is created for this initial status.
What we are seeing is that a small percentage of new jobs are being added to the database twice. We are pretty sure this is because the job's initial status is quickly followed by a status update - one process gets one, another process the other. Both processes check to see if the job is new, and since it has not been recorded yet, both create a record for it.
So, my question is, how would you go about preventing this in a vendor-neutral way? Can it be done without locking the entire table?
EDIT: For those saying the "architecture" is unsound - I agree, but am not at liberty to change it.
Create a unique constraint on JOB_ID, and retry to persist the status in the event of a constraint violation exception.
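The constraint-plus-retry idea can be sketched like this. `JobStatusStore` is a hypothetical persistence interface. A real implementation would issue an UPDATE and an INSERT over JDBC against a table with a unique constraint on JOB_ID. The sketch assumes the driver signals the violation as `java.sql.SQLIntegrityConstraintViolationException`. Some drivers report it differently, for example as a plain SQLException with an SQLState in class 23.

```java
import java.sql.SQLIntegrityConstraintViolationException;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical persistence interface; a real one would run UPDATE and INSERT statements. */
interface JobStatusStore {
    boolean update(String jobId, String status);   // true if a row for the job existed
    void insert(String jobId, String status) throws SQLIntegrityConstraintViolationException;
}

final class JobStatusWriter {
    /** Update if the row exists, otherwise insert; if the insert loses a race, update instead. */
    static void persist(JobStatusStore store, String jobId, String status) {
        if (store.update(jobId, status)) {
            return;
        }
        try {
            store.insert(jobId, status);
        } catch (SQLIntegrityConstraintViolationException raceLost) {
            // Another process created the row between our UPDATE and INSERT, so it exists now.
            store.update(jobId, status);
        }
    }
}

/** In-memory store whose map key plays the role of the unique constraint on JOB_ID. */
final class InMemoryStore implements JobStatusStore {
    final ConcurrentHashMap<String, String> rows = new ConcurrentHashMap<>();

    public boolean update(String jobId, String status) {
        return rows.replace(jobId, status) != null;
    }

    public void insert(String jobId, String status) throws SQLIntegrityConstraintViolationException {
        if (rows.putIfAbsent(jobId, status) != null) {
            throw new SQLIntegrityConstraintViolationException("duplicate JOB_ID " + jobId);
        }
    }
}

final class UpsertRaceDemo {
    /** Simulates a competing process that inserts the row right after our UPDATE finds nothing. */
    static String run() {
        InMemoryStore store = new InMemoryStore();
        JobStatusStore racy = new JobStatusStore() {
            public boolean update(String jobId, String status) {
                boolean updated = store.update(jobId, status);
                if (!updated) {
                    store.rows.putIfAbsent(jobId, "CREATED");   // the other process wins the insert
                }
                return updated;
            }

            public void insert(String jobId, String status) throws SQLIntegrityConstraintViolationException {
                store.insert(jobId, status);
            }
        };
        JobStatusWriter.persist(racy, "job-1", "RUNNING");
        return store.rows.toString();
    }
}
```

In the simulated race only one row is ever created, and it ends up holding the status that was being persisted. This avoids the duplicate row but does nothing about ordering between the two statuses.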
That being said, I think your architecture is unsound: If two processes are pulling messages from the queue, it is not guaranteed they will write them to the database in queue order: one consumer might be a bit slower, a packet might be dropped, ..., causing the other consumer to persist the later messages first, causing them to be overwritten with the earlier state.
One way to guard against that is to include sequence numbers in the messages, update the row only if the sequence number is as expected, and delay the update otherwise (this is vulnerable to lost messages, though ...).
Of course, the easiest way would be to have only one consumer ...
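A sequence-number guard along the lines mentioned above could look like the sketch below. It is a simpler variant than the one described: instead of delaying an out-of-order update until the expected number arrives, it just ignores any update whose sequence number is not newer than the stored one. `SequencedStatusTable` is a hypothetical in-memory stand-in for the database table, and it assumes each status message carries a per-job sequence number.

```java
import java.util.concurrent.ConcurrentHashMap;

/** In-memory stand-in for a job status table that keeps the sequence number of the last update. */
final class SequencedStatusTable {
    private static final class Row {
        final long sequence;
        final String status;
        Row(long sequence, String status) { this.sequence = sequence; this.status = status; }
    }

    private final ConcurrentHashMap<String, Row> rows = new ConcurrentHashMap<>();

    /** Atomically inserts the row or replaces it if `sequence` is newer; returns true if applied. */
    boolean apply(String jobId, long sequence, String status) {
        Row candidate = new Row(sequence, status);
        Row winner = rows.merge(jobId, candidate,
                (current, incoming) -> incoming.sequence > current.sequence ? incoming : current);
        return winner == candidate;
    }

    String status(String jobId) {
        Row row = rows.get(jobId);
        return row == null ? null : row.status;
    }
}
```

If the update with sequence 2 arrives before the one with sequence 1, the late arrival is dropped and the newer status is kept. Against a database, the equivalent would be a conditional update that compares the stored sequence number.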
JDBC connections are not thread safe, so there's nothing to be done about that.
"...two identical processes pulling from the queue to persist the statuses via JDBC..."
I don't understand this at all. Why two identical processes? Wouldn't it be better to have a pool of message queue listeners, each of which would handle messages landing on the queue? Each listener would have its own thread; each one would be its own transaction. A Java EE app server allows you to configure the size of the message listener pool to match the load.
I think a design that duplicates a process like this is asking for trouble.
You could also change the isolation level on the JDBC connection. If you make it SERIALIZABLE you'll ensure ACID at the price of slower performance.
Since it's an asynchronous process, performance will only be an issue if you find that the listeners can't keep up with the messages landing on the queue. If that's the case, you can try increasing the size of the listener pool until you have adequate capacity to process the incoming messages.