Generic QoS Message batching and compression in Java

We have a custom messaging system written in Java, and I want to implement a basic batching/compression feature: under heavy load, it should aggregate a bunch of push responses into a single push response.
Essentially:
If we detect that 3 messages were sent in the past second, start batching responses and schedule a timer to fire in 5 seconds.
When the timer fires, it aggregates all the message responses received during those 5 seconds into a single message.
I'm sure this has been implemented before; I'm just looking for the best example of it in Java. I'm not looking for a full-blown messaging layer, just the basics: detect messages per second and schedule some task. (Obviously I can easily write this myself; I just want to compare it with existing algorithms to make sure I'm not missing any edge cases and that I've simplified the problem as much as possible.)
Are there any good open source examples of a basic QoS batching/throttling/compression implementation?

We are using a very similar mechanism for high load.
It will work as you described:
* Aggregate messages over a given interval.
* Send a List instead of a single message after that.
* Start aggregating again.
You should watch out for the following pitfalls:
* If you are using a transacted messaging system like JMS, you can get into trouble: your implementation will not be able to send inside the JMS transaction, so it will keep aggregating. Depending on the size of the data structure holding the messages, it can run out of space. If you have very long transactions that send many messages, this can be a problem.
* Sending becomes asynchronous: the thread calling the send() method only puts the message into the data structure, and a different thread does the actual sending.
* Sticking with the JMS example, keep in mind that the way messages are consumed also changes with this approach, because you receive the list of messages from JMS as a single message. Once you commit this single JMS message, you have committed the entire list of messages. You should check whether this is a problem for your requirements.
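For comparison, here is a minimal sketch of the detect-then-batch scheme from the question and the list above. It is not taken from any library: MessageBatcher, its constructor parameters and the injected clock are hypothetical names chosen for illustration, and it assumes Java 8 or later. Messages go straight to the sink until `threshold` sends fall inside `windowMillis`; from then on they are collected and pushed as one list when the timer fires.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.LongSupplier;

/** Hypothetical batcher: direct sends under light load, one list per interval under heavy load. */
final class MessageBatcher<T> {
    private final int threshold;          // e.g. 3 messages ...
    private final long windowMillis;      // ... within 1000 ms start a batch
    private final long batchMillis;       // e.g. 5000 ms until the batch is pushed
    private final Consumer<List<T>> sink; // receives one list per push
    private final ScheduledExecutorService timer;
    private final LongSupplier clock;     // injected so the logic can be tested
    private final Deque<Long> recent = new ArrayDeque<>(); // send times inside the window
    private List<T> pending;              // non-null while a batch is open

    MessageBatcher(int threshold, long windowMillis, long batchMillis,
                   Consumer<List<T>> sink, ScheduledExecutorService timer, LongSupplier clock) {
        this.threshold = threshold;
        this.windowMillis = windowMillis;
        this.batchMillis = batchMillis;
        this.sink = sink;
        this.timer = timer;
        this.clock = clock;
    }

    synchronized void send(T message) {
        long now = clock.getAsLong();
        // Forget sends that have fallen out of the detection window.
        while (!recent.isEmpty() && now - recent.peekFirst() >= windowMillis) {
            recent.removeFirst();
        }
        recent.addLast(now);
        if (pending != null) {
            pending.add(message);                 // a batch is already open
        } else if (recent.size() >= threshold) {
            pending = new ArrayList<>();          // heavy load detected: open a batch
            pending.add(message);
            timer.schedule(this::flush, batchMillis, TimeUnit.MILLISECONDS);
        } else {
            sink.accept(Collections.singletonList(message)); // light load: send directly
        }
    }

    /** Pushes the open batch, if any. Called by the timer; safe to call manually. */
    synchronized void flush() {
        if (pending == null) {
            return;
        }
        List<T> batch = pending;
        pending = null;
        sink.accept(batch);
    }
}
```

With `new MessageBatcher<>(3, 1000, 5000, sink, timer, System::currentTimeMillis)` this matches the numbers in the question. The sink is called while the lock is held, which keeps ordering simple but means a slow sink blocks senders, and the pending list is unbounded, so the out-of-space pitfall above still applies.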

Related

Transactions with multiple resources (database and JMS broker)

I have an application that inserts into a database and publishes an event to ActiveMQ.
I am facing problems with the transaction. I will explain the issue with the code below:
@Transactional(rollbackFor = Exception.class)
public class ProcessInvoice {
    public boolean insertInvoice(Object obj) {
        /* Some processing logic here */
        /* DB Insert */
        insert(obj);
        /* Some processing logic here again */
        /* Send event to Queue 1 */
        sendEvent(obj);
        /* Send event to Queue 2 */
        sendEvent(obj);
        return true;
    }
}
The class is annotated with @Transactional. In the insertInvoice method I do some processing, insert into the DB, and send events to two queues.
With the above code I am facing two problems:
If the queue is slow, I get a performance issue because the process spends a long time in the sendEvent method.
If for some reason ActiveMQ is down or the consumer is not able to process the message, how do I roll back the transaction?
How should I deal with these issues?

If you need to send your message transactionally (i.e. you need to be sure the broker actually got your message when you send it) and the broker is performing slowly which is impacting your application then you only have two choices:
Accept the performance loss in your application.
Improve the broker's performance so that your application performance improves as well. Improving broker performance is a whole other subject.
In JMS (and most other messaging architectures) producers and consumers are unaware of each other by design. Therefore, you will not know if the consumer of the message you send is unable to process the message for any reason, at least not through any automatic JMS mechanism.
When the broker is down the sendEvent method should fail outright. However, I'm not terribly familiar with how Spring handles transactions so I can't say what should happen in that regard.

I have some questions regarding your issue:
If the sendEvent(Object o) method is that expensive in terms of performance, why call it twice (apparently to process the same object)?
The result of those two calls appears to be the same, except that they go to two different queues. I believe you could send to both queues in a single call so that the same code is not executed twice.
When I think of transactions, the first thing that comes to mind is synchronous operations. Do you want to perform those operations asynchronously or synchronously? For example, do you want to wait until the invoice is inserted in the DB before sending the message to Queue1 and Queue2?
Maybe you should do it asynchronously. If you don't or cannot, you could opt for an "optimistic" strategy: send the message to Queue1 and Queue2 first, and then, while those messages are being processed on the broker side, insert the invoice into the DB. If the database is highly available, the insertion will succeed in most cases, so you will not have to wait until it is persisted to send the messages to Queue1 and Queue2. If the insertion fails (which would be very unlikely), you could send a second message to undo those changes on the broker side. If your business logic makes this "undo" process non-trivial, this alternative might not suit you.
You ask how to roll back if ActiveMQ is down. In that case you may need some monitoring of the queues to find out whether the message reached its destination. I would advise you to take a look at Advisory messages; they may help you monitor that and act accordingly.
Alternatively, what you need could be rethought and solved with durable subscribers: once the subscribers are available again, they receive the messages that were enqueued in the meantime. This performs slightly worse, since the messages must be persisted to files so they can be recovered if the broker goes down.
I hope these suggestions help, but I think you should describe the result you want (the flow) in more detail, since it is not very clear (at least to me).

Queueing a message in JMS, for delayed processing

I have a piece of middleware that sits between two JMS queues. From one it reads, processes some data into the database, and writes to the other.
Here is a small diagram to depict the design:
With that in mind, I have some interesting logic that I would like to integrate into the service.
Scenario 1: Say the middleware service receives a message from Queue 1, and hits the database to store portions of that message. If all goes well, it constructs a new message with some data, and writes it to Queue 2.
Scenario 2: Say that the database complains about something when the service attempts to perform some logic after getting a message from Queue 1. In this case, instead of writing a message to Queue 2, I would retry the database operation at increasing intervals, i.e. try again in 5 seconds, then 30 seconds, then 1 minute if it is still down. The catch, of course, is being able to read other messages independently of this retry, i.e. retry this one request while still listening for other requests.
With that in mind, what is both the correct and most modern way to construct a future proof solution?
After reading some posts on the net, it seems that I have several options.
One, I could spin off a new thread once a new message is received, so that I can both perform the "re-try" functionality and listen to new requests.
Two, I could possibly send the message back to the queue with a delay, i.e. if the process failed to execute in the DB, write the message back to the JMS queue with some amount of delay added to it.
I am more fond of the first solution; however, I wanted to get the opinion of the community on whether there is a newer/better way to solve this in Java 7. Is there something built into JMS to support this sort of "send message back for reprocessing at a specific time"?

The JMS 2.0 specification describes the concept of delayed delivery of messages. See the "What's new" section of https://java.net/projects/jms-spec/pages/JMS20FinalRelease
Many JMS providers have implemented the delayed delivery feature.
But I wonder how delayed delivery will help in your scenario. Since the database writes are failing, processing subsequent messages and attempting to write them to the database might end up in the same situation. It might be better to sort out the issues with the database updates first and then pick up messages from the queue.
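If you do go with the first option from the question (retrying off the listener thread), a plain ScheduledExecutorService may be enough to get the 5 s / 30 s / 1 min schedule without blocking the consumer. A minimal sketch, assuming Java 8 or later; BackoffRetrier and onGiveUp are hypothetical names, and the task stands in for the database work:

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: retries a task on scheduler threads with increasing delays. */
final class BackoffRetrier {
    private final ScheduledExecutorService scheduler;
    private final long[] delaysMillis; // e.g. 5_000, 30_000, 60_000

    BackoffRetrier(ScheduledExecutorService scheduler, long... delaysMillis) {
        this.scheduler = scheduler;
        this.delaysMillis = delaysMillis.clone();
    }

    /** Hands the task to the scheduler and returns at once, so the caller can keep listening. */
    void submit(Runnable task, Runnable onGiveUp) {
        scheduler.execute(() -> attempt(task, onGiveUp, 0));
    }

    private void attempt(Runnable task, Runnable onGiveUp, int failuresSoFar) {
        try {
            task.run();
        } catch (RuntimeException e) {
            if (failuresSoFar >= delaysMillis.length) {
                onGiveUp.run(); // every delay has been used; e.g. park the message somewhere
                return;
            }
            // Try again later without occupying a thread during the wait.
            scheduler.schedule(() -> attempt(task, onGiveUp, failuresSoFar + 1),
                    delaysMillis[failuresSoFar], TimeUnit.MILLISECONDS);
        }
    }
}
```

Usage would look like `new BackoffRetrier(scheduler, 5_000, 30_000, 60_000).submit(dbWork, giveUp)`. Keep in mind that, depending on the acknowledge mode, the original message may already be acknowledged once the listener returns, so a pending retry held only in memory would be lost if the service crashed during the wait.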

How to design a JMS message containing large amounts of data

I am working on designing a system that uses an ETL tool to retrieve batches of data, i.e., inserts/updates/deletes for one or more tables, and puts them on a JMS topic to be processed later by multiple clients. Right now, each message on the topic represents a single record I/U/D and we have a special message to delimit the end of the batch. It's important to process each batch in a single transaction, so having a bunch of messages delimited by a special one is not ideal: both the publishing and the receiving sessions must be designed for multiple messages; the batch delimiter message is a messy solution (each time we receive a message we need to check if it's the last) and very error prone; the system is difficult to debug and maintain; the number of messages on the topic quickly becomes huge (up to millions).
Now, I think that the next natural step to improve the architecture is to pack all the records in a single JMS message so that when a message is received, it encompasses a single transaction, it's easy to detect failures, there are no "orphan" records on the topic, etc. I only see advantages in doing so! Now here are my questions:
What's the best way to create such a packed message? I think my choices are StreamMessage, BytesMessage or ObjectMessage. I excluded text and map messages because the first would require text parsing, which will kill performance, and the second doesn't really seem to fit the scenario. I'm kinda leaning towards StreamMessage because it seems quite compact, although it will require a lot of work writing custom serialization code (even worse for BytesMessage). Not sure about ObjectMessage; how does it perform? Is there an out-of-the-box solution for this?
What's the maximum size allowed per message? Could it be on the order of hundreds of KB or even a few MB?
Thanks for the thoughts!
Giovanni

Instead of using one large message, you could use two (or more) queues, correlation ids and a message selector.
Queueing:
Post a notification message to "notification queue" to indicate that processing should start
Post command messages to "command queue" with the correlation ID set to the notification message's message ID (you can use multiple command queues if the queue depth gets too high)
Commit the transaction
Processing:
Receive the notification message from "notification queue" (e.g. with message driven bean)
Receive and process all the related messages from "command queue" using a message selector
Commit the transaction
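
The selector used in the processing step can be built from the notification message's ID. A small sketch in plain Java: the helper name CorrelationSelectors is made up, and the resulting string is what you would pass as the message selector when creating the consumer on the command queue, for example with Session.createConsumer(destination, selector).

```java
/** Hypothetical helper that builds a JMS message selector for one batch. */
final class CorrelationSelectors {
    private CorrelationSelectors() {
    }

    /**
     * Selects the command messages whose JMSCorrelationID equals the given
     * notification message ID. Selector string literals use single quotes,
     * and an embedded single quote is written as two single quotes.
     */
    static String forCorrelationId(String notificationMessageId) {
        String escaped = notificationMessageId.replace("'", "''");
        return "JMSCorrelationID = '" + escaped + "'";
    }
}
```

For a notification whose ID is ID:batch-42 this yields `JMSCorrelationID = 'ID:batch-42'`, so the consumer only sees the commands that belong to that batch.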

Using bytes (e.g. a BytesMessage) is likely the least memory-intensive option.
If you manipulate Java objects, you can use a fast, space-efficient serialization/deserialization library like Kryo.
We happily use Kryo in production on a messaging system, but there are plenty of alternatives, such as the popular Google Protocol Buffers.
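To illustrate the bytes approach without any third-party library, the sketch below packs a whole batch into one byte[] with DataOutputStream; the array could then be written into a BytesMessage with writeBytes(byte[]). The Change shape (operation, table name, row as a string) is an assumption for illustration, not something from the question, and a library like Kryo would replace the hand-written encode/decode.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical codec: one byte[] per batch of record changes. */
final class BatchCodec {
    /** Assumed record shape: operation ('I', 'U' or 'D'), table name, serialized row. */
    static final class Change {
        final char op;
        final String table;
        final String row;

        Change(char op, String table, String row) {
            this.op = op;
            this.table = table;
            this.row = row;
        }
    }

    static byte[] encode(List<Change> batch) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(batch.size()); // record count first, so no delimiter message is needed
            for (Change c : batch) {
                out.writeChar(c.op);
                out.writeUTF(c.table);  // writeUTF is limited to 65535 encoded bytes per string
                out.writeUTF(c.row);
            }
        }
        return bytes.toByteArray();
    }

    static List<Change> decode(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int count = in.readInt();
            List<Change> batch = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                // Arguments are evaluated left to right, matching the write order.
                batch.add(new Change(in.readChar(), in.readUTF(), in.readUTF()));
            }
            return batch;
        }
    }
}
```

Because the record count travels inside the payload, the receiver can process the whole batch in one transaction as soon as the single message arrives.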

Handling Failed calls on the Consumer end (in a Producer/Consumer Model)

Let me try explaining the situation:
There is a messaging system that we are going to incorporate which could either be a Queue or Topic (JMS terms).
1) Producer/Publisher: There is a service A. A produces messages and writes to a Queue/Topic.
2) Consumer/Subscriber: There is a service B. B asynchronously reads messages from the Queue/Topic. B then calls a web service and passes the message to it. The web service takes a significant amount of time to process the message. (This action need not be processed in real time.)
The Message Broker is Tibco
My intention is : Not to miss out processing any message from A. Re-process it at a later point in time in case the processing failed for the first time (perhaps as a batch).
Question:
I was thinking of writing the message to a DB before making a webservice call. If the call succeeds, I would mark the message processed. Otherwise failed. Later, in a cron job, I would process all the requests that had initially failed.
Is writing to a DB a typical way of doing this?

Since you have a fail callback, you can just requeue your Message and have your Consumer/Subscriber pick it up and try again. If it failed because of some problem in the web service and you want to wait X time before trying again, you can either schedule the web service call for that specific Message at a later time (look into ScheduledExecutorService) or do as you described and use a cron job with some database entries.
If you only want it to try again once per message, then keep an internal counter either with the Message or within a Map<Message, Integer> as a counter for each Message.
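A minimal sketch of such a counter, assuming Java 8 or later. RetryCounter is a hypothetical name, and it is keyed by a message ID rather than by the Message object itself, on the assumption that the ID is the more reliable map key:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical per-message failure counter with a retry limit. */
final class RetryCounter<K> {
    private final ConcurrentMap<K, Integer> failures = new ConcurrentHashMap<>();
    private final int maxRetries;

    RetryCounter(int maxRetries) {
        this.maxRetries = maxRetries;
    }

    /** Records one failure; returns true if the message may be tried again. */
    boolean recordFailure(K messageId) {
        int count = failures.merge(messageId, 1, Integer::sum);
        if (count > maxRetries) {
            failures.remove(messageId); // giving up: stop tracking this message
            return false;
        }
        return true;
    }

    /** Call after a successful attempt so the map does not grow without bound. */
    void recordSuccess(K messageId) {
        failures.remove(messageId);
    }
}
```

With `new RetryCounter<String>(1)`, the first failure of a message returns true (retry once) and the second returns false.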

Crudely put, that is the technique, although there may be out-of-the-box solutions you can use. Typical ESB solutions support reliable messaging. Have a look at MuleESB or Apache ActiveMQ as well.

It might be interesting to take advantage of the EMS platform you already have (example 1) instead of building a custom solution (example 2).
But it all depends on the implementation language:
Example 1 - EMS is the "keeper": If I were to solve such a problem with TIBCO BusinessWorks, I would use the "JMS transaction" feature of BW. By encompassing the EMS read and the WS call within the same "group", you ask for both to be applied, or neither. If the call failed for some reason, the message would be returned to EMS.
Two problems with this solution: you might not have BW, and the first failed operation would block the rest of the batch process (which may be the desired behavior).
FYI, I understand it is possible to use such a feature in "pure Java", but I have never tried it: http://www.javaworld.com/javaworld/jw-02-2002/jw-0315-jms.html
Example 2 - A DB is the "keeper": If you go with your "DB" method, your queue/topic consumer continuously inserts data into a DB, and each record represents a task to be executed. This feels an awful lot like the simple "mapping engine" problem every integration middleware aims to make easier. You could solve this with anything from custom Java code and multiple threads (DB inserter, WS job handlers, etc.) to an EAI middleware (like BW) or even a BPM engine (TIBCO has many solutions for that).
Of course, there are also other vendors... EMS is a standard JMS implementation, as you know.

I would recommend using the built-in EMS (& JMS) features, as "guaranteed delivery" is what it's built for ;) - no DB needed at all...
You need to be aware that the first decisions will be:
Do you need to deliver in order? (Then only 1 JMS Session and Client Ack mode should be used.)
How often and at what intervals do you want to retry? (So that a message the web service cannot process does not loop forever.)
This is independent of whatever kind of client you use (TIBCO BW or e.g. Java onMessage() in an MDB).
For "in order" delivery: make sure only 1 JMS Session processes the messages and that it uses Client acknowledge mode. After you process the message successfully, acknowledge it, either by calling the JMS API "acknowledge()" method or, in TIBCO BW, by executing the "commit" activity.
In case of an error, you don't acknowledge the message, so it is put back in the queue for redelivery (you can see how many times it was redelivered in the JMS header).
EMS's Explicit Client Acknowledge mode also enables you to do the same if order is not important and you need a few client threads to process the messages.
To control how often the message gets processed, use:
max redelivery properties of the EMS queue (e.g. you could put the message in the dead letter queue after x redeliveries so it does not hold up other messages)
redelivery delay, to put a "pause" between redeliveries. This is useful when the Web Service needs to recover after a crash and should not get stormed by the same message again and again at high frequency through redelivery.
Hope that helps
Cheers
Seb

Mule Aggregator - Streaming Aggregation

The collection aggregator used in the Mule 2.0 framework works a bit like this:
An inbound router takes a collection of messages and splits it up into a number of smaller messages - each smaller message gets stamped with a correlation id corresponding to the parent message
These messages flow through various services
Finally these messages arrive at an inbound aggregator that collects up the messages based on the correlation id of the parent message and the number of expected messages. Once all of the expected messages have been received then the aggregation function is called and the result is returned.
Now this works fine when the number of messages in a group is reasonably small. However once the number of messages in a group becomes huge ~100k then a lot of memory is tied up holding onto the group of messages waiting for the later messages to arrive. This is made worse if there are multiple groups being aggregated at the same time.
A way around this issue would be to implement a streaming aggregator. In my use case I am essentially summing up the various messages based on a key and this could be done without having to see all of the messages in the group at the same time. I'd only want to know that all of the messages had been received before forwarding the result onto the endpoint.
Does this sound like a reasonable solution to the problem?
Is this already implemented somewhere in Mule?
Are there better ways of doing this?

This seems like a reasonable approach (though I'm not a Mule expert by any means). I have read all of the Mule documentation and don't think there is anything like this out there: the streaming support is limited to a few connectors and transformers, and it's pretty simple in that it just passes around an InputStream. Only a few things in Mule stream, so you may need modified versions of the other transformers you use so that they stream too. You would just implement an aggregator that provides an InputStream and starts streaming as soon as it has received some consecutive sequence of messages.
However, one phrase in your description, "... all of the messages had been received before forwarding the results to the endpoint", could be troubling. By its very nature this defeats the purpose of streaming, unless you mean that you (presumably in your service component) will keep track of whether you have received everything before forwarding the (presumably much smaller) processed result onwards.
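For the summing use case, the "keep track, forward only the small result" idea could look roughly like the sketch below. It is plain Java (8 or later) and does not use any Mule API; StreamingSumAggregator is a hypothetical name, and it assumes each message carries its group's correlation id, the expected group size, and a numeric value. Only a running sum and a counter are kept per group, not the messages themselves.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical aggregator: folds each message into a running sum per group. */
final class StreamingSumAggregator {
    /** The only per-group state: constant size regardless of group size. */
    private static final class Group {
        long sum;
        int received;
    }

    private final Map<String, Group> groups = new HashMap<>();

    /**
     * Folds one message into its group. Returns the group's total once
     * `expected` messages have arrived, otherwise an empty Optional.
     */
    synchronized Optional<Long> add(String correlationId, int expected, long value) {
        Group group = groups.computeIfAbsent(correlationId, id -> new Group());
        group.sum += value;
        group.received++;
        if (group.received < expected) {
            return Optional.empty();
        }
        groups.remove(correlationId); // group complete: release its state
        return Optional.of(group.sum);
    }
}
```

A group of 100k messages then costs the same memory as a group of three. What the sketch leaves out is a timeout for groups that never complete, which would otherwise stay in the map forever.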
