How can we save Java Message Queues for reference? - java

How can we keep track of every message that goes into our Java Message Queue? We need to save the messages for later reference. We already write them to an application log (log4j), but we need to be able to query them later.

You can store them:
in memory - in a collection or in an in-memory database
in a standalone database

You could create a database logging table for the messages, storing the message as-is in a BLOB column, the timestamp at which it was created / posted to the MQ, and a simple counter as the primary key. You can also add fields like message type etc. if you want to create statistical reports on messages sent.
Cleanup of the table can be done simply by deleting all messages older than the retention period, using the timestamp column.
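As a rough illustration of that approach, here is a minimal JDBC sketch, assuming a hypothetical MESSAGE_LOG table (auto-increment primary key, message type, timestamp, BLOB payload), an already-configured DataSource, and that the payload arrives as a BytesMessage:

```java
// Minimal sketch: persist an incoming JMS message into a logging table.
// Table and column names are illustrative, not prescribed by the answer above.
import javax.jms.BytesMessage;
import javax.sql.DataSource;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.Timestamp;

public class MessageLogDao {
    private final DataSource dataSource;

    public MessageLogDao(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    public void log(BytesMessage message) throws Exception {
        byte[] payload = new byte[(int) message.getBodyLength()];
        message.readBytes(payload);

        String sql = "INSERT INTO MESSAGE_LOG (MSG_TYPE, CREATED_AT, PAYLOAD) VALUES (?, ?, ?)";
        try (Connection con = dataSource.getConnection();
             PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setString(1, message.getJMSType());                        // optional "message type" column
            ps.setTimestamp(2, new Timestamp(message.getJMSTimestamp())); // when it was posted to the MQ
            ps.setBytes(3, payload);                                      // the message as-is
            ps.executeUpdate();
        }
    }
}
```

Cleanup then becomes a single scheduled DELETE FROM MESSAGE_LOG WHERE CREATED_AT < ? statement against the timestamp column.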

I implemented such a solution in the past: we chose to store messages with all their characteristics in a database and developed a search, replay and cancel application on top of it. This is the Message Store pattern:
[Message Store pattern diagram - source: eaipatterns.com]
We also used this application for the Dead Letter Channel.
[Dead Letter Channel diagram - source: eaipatterns.com]
If you don't want to build a custom solution, have a look at the ReplayService for JMS from CodeStreet.

The best way to do this is to use whatever tracing facility your middleware provider offers. Alternatively, you could set up an intermediate listener whose only job is to log messages and forward them on to your existing application.
In most cases, you will find that the middleware provider already has the ability to do this for you, with no changes to, or awareness by, your application.
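If your provider has no such facility, a bare-bones version of that intermediate listener might look like the following; the queue names and the logging call are placeholders for whatever you already use:

```java
// Rough sketch of an intermediate listener: it consumes from an "audit" queue,
// records each message, and forwards it unchanged to the queue the existing
// application already listens on. Queue names are illustrative.
import javax.jms.Connection;
import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageConsumer;
import javax.jms.MessageListener;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;

public class AuditingForwarder implements MessageListener {
    private final MessageProducer forwardProducer;

    public AuditingForwarder(Connection connection) throws JMSException {
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        Queue target = session.createQueue("APP.INBOUND");      // the existing application's queue
        this.forwardProducer = session.createProducer(target);
        MessageConsumer consumer = session.createConsumer(session.createQueue("APP.AUDIT"));
        consumer.setMessageListener(this);
        connection.start();
    }

    @Override
    public void onMessage(Message message) {
        try {
            // Persist or log the message here (database table, log4j appender, ...).
            forwardProducer.send(message);                        // then pass it on unchanged
        } catch (JMSException e) {
            throw new RuntimeException("Failed to forward audited message", e);
        }
    }
}
```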

I would change the queue to a topic, keep the original consumer that processes the messages, and add another consumer that audits the messages to a database.
Some JMS providers cater for topic-to-queue-bridge definitions; the consumers then receive from their own dedicated queues and don't have to read past messages that are left on the queue because other consumers are inactive.
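A sketch of what that auditing consumer might look like, assuming the provider supports durable subscriptions; the topic and subscription names are made up for the example:

```java
// Sketch of the auditing consumer: a second, durable subscriber on the topic that only
// persists messages, while the original consumer keeps processing them as before.
import javax.jms.Connection;
import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.Session;
import javax.jms.Topic;
import javax.jms.TopicSubscriber;
import java.util.function.Consumer;

public class AuditSubscriber {
    public void start(Connection connection, Consumer<Message> auditAction) throws JMSException {
        // The connection needs a client ID (connection.setClientID) for durable subscriptions.
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        Topic topic = session.createTopic("APP.MESSAGES");
        // Durable subscription, so audit records are not lost while this consumer is down.
        TopicSubscriber subscriber = session.createDurableSubscriber(topic, "audit-subscription");
        subscriber.setMessageListener(auditAction::accept);  // e.g. write the message to an audit table
        connection.start();
    }
}
```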
Alternatively, you could write a log4j appender, which writes your logged messages to a database.

Related

Broker disk usage after topic deletion

I'm using Apache Kafka. I dump huge dbs into Kafka, where each database's table is a topic.
I cannot delete a topic before it is completely consumed. I cannot set a time-based retention policy because I don't know when a topic will be consumed. I have limited disk space and too much data, so I have to write code that orchestrates consumption and deletion programmatically. I understand that the problem appears because we're using Kafka for batch processing, but I can't change the technology stack.
What is the correct way to delete a consumed topic from the brokers?
Currently, I'm calling kafka.admin.AdminUtils#deleteTopic, but I can't find clear documentation for it. The method signature doesn't contain the Kafka server URLs. Does that mean I'm deleting only the topic's metadata and the brokers' disk usage isn't reduced? If so, when does the real append-log file deletion happen?
Instead of using a time-based retention policy, are you able to use a size-based policy? log.retention.bytes is a per-partition setting that might help you out here.
I'm not sure how you'd want to determine that a topic is fully consumed, but calling deleteTopic against the topic initially marks it for deletion. As soon as there are no consumers/producers connected to the cluster and accessing those topics, and if delete.topic.enable is set to true in your server.properties file, the controller will then delete the topic from the cluster as soon as it is able to do so. This includes purging the data from disk. It can take anywhere between a few seconds and several minutes to do this.
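For what it's worth, the newer org.apache.kafka.clients.admin.AdminClient connects through bootstrap.servers rather than ZooKeeper (the old kafka.admin.AdminUtils went through ZooKeeper, which is why no broker URLs appear in its signature). A minimal sketch, with the topic name left as a placeholder:

```java
// Sketch using the newer Kafka AdminClient; the deletion itself is still asynchronous:
// the topic is marked for deletion and the controller purges the log segments from disk
// once delete.topic.enable=true and no clients hold the topic open.
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;

import java.util.Collections;
import java.util.Properties;

public class TopicCleaner {
    public static void deleteConsumedTopic(String bootstrapServers, String topic) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        AdminClient admin = AdminClient.create(props);
        try {
            admin.deleteTopics(Collections.singleton(topic)).all().get();  // wait for the request to be accepted
        } finally {
            admin.close();
        }
    }
}
```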

How can I handle large file processing via message queues in a microservices environment?

Many people suggest that a good way to organize IPC (inter-microservice communication) is asynchronous communication via queues like Kafka and JMS.
But what if I need to pass large data files between services?
Suppose I have a Video Microservice and a Publisher Microservice. The first one receives videos from the user, verifies them and sends them to the Publisher for converting and publishing. Obviously, a video can be a very large file, and it can overload the messaging system (Kafka is not suitable for big messages at all). Of course, I could share one database between them and send a video_id via Kafka, but that couples these services and it's not a real microservices architecture anymore.
Do you have similar situations in practice? How do you handle it?
Thanks
There is an Enterprise Integration Pattern from the Hohpe/Woolf book called the Claim Check pattern that addresses these concerns.
Essentially, the big blob is removed from the message and stored somewhere that both sender and receiver can access, whether that be a common file share, an FTP server, an Amazon S3 blob, whatever. It leaves a "claim check" behind: some sort of address that describes how to find the blob again.
The tiny message can then be transmitted over Kafka/JMS, or some other message queue system, most of which are fairly bad at dealing with large data blobs.
Of course, a very simple implementation is to leave the files on a file share and only refer to them by file path.
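A minimal sketch of that simple variant, assuming a shared directory both services can reach and an illustrative Kafka topic name:

```java
// Claim-check sketch: the video bytes go to shared storage and only a small reference
// travels over Kafka. The shared directory, topic name and file extension are assumptions.
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.UUID;

public class VideoClaimCheckProducer {
    private static final Path SHARED_STORE = Paths.get("/mnt/shared/videos"); // common file share

    public static void publish(KafkaProducer<String, String> producer, byte[] videoBytes) throws Exception {
        // 1. Park the large payload somewhere both services can reach.
        String claimCheck = UUID.randomUUID().toString();
        Files.write(SHARED_STORE.resolve(claimCheck + ".mp4"), videoBytes);

        // 2. Send only the claim check over the message queue.
        producer.send(new ProducerRecord<>("videos-to-publish", claimCheck));
        // The Publisher service resolves the claim check back to the file, converts and
        // publishes it, and eventually deletes the file (see the cleanup discussion below).
    }
}
```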
It's more complex when it's preferable to have the blob integrated with the rest of the message, requiring a true Claim Check implementation. This can be handled at an infrastructure level so the message sender and receiver don't need to know any of the details behind how the data is transmitted.
I know that you're in the Java landscape, but in NServiceBus (I work for Particular Software, the makers of NServiceBus) this pattern is implemented with the Data Bus feature in a message pipeline step. All the developer needs to do is identify what type of message properties apply to the data bus, and (in the default file share implementation) configure the location where files are stored. Developers are also free to provide their own data bus implementation.
One thing to keep in mind is that with the blobs disconnected from the messages, you have to provide for cleanup. If the messages are one-way, you could clean up a blob as soon as its message is successfully processed. With Kafka (which I'm not terribly familiar with) there's a possibility of processing messages from a stream multiple times, correct? If so, you'd want to wait until it was no longer possible to process that message. Or, if the Publish/Subscribe pattern is in use, you would not want to clean up the files until you were sure all subscribers had had a chance to process the message. To accomplish that, you'd need to set an SLA (a timespan within which each message must be processed) on the message and clean up the blob storage after that timespan has elapsed.
In any case, lots of things to consider, which make it much more useful to implement at an infrastructure level rather than try to roll your own in each instance.

JMS taking too long to process messages

An application has a JMS queue responsible for delivering audit logs. The application sends logs to a JMS queue, and this queue is consumed by an MDB.
However, the messages sent are big XML files that vary from 20 MB to 100 MB. The problem is that the JMS queue takes too long to consume the messages, leading to an OutOfMemory error.
What should I do to solve this problem?
This answer may or may not help jguilhermemv; I just want to share an idea for those who read this post: a workaround for big messages.
The first thing is to try not to send such big messages. That leaves two options (both require implementation changes, and can be done at the start, or later if changes to the system are still allowed):
Save the logs in the DB and send just the log IDs in the JMS messages. (Saving logs in the DB is not recommended, as size and save time will again become a problem at a later stage.)
Save the logs as files (at a common location), store the file names in the DB, and share those file-name IDs via JMS. The consumer can then read the log file after consuming the message.
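A rough sketch of the consuming side under the second option; the destination name, shared directory and parsing step are placeholders:

```java
// Sketch of an MDB that receives only a file reference and streams the XML from the
// shared location, so the full 20-100 MB document never travels through JMS.
import javax.ejb.ActivationConfigProperty;
import javax.ejb.MessageDriven;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.TextMessage;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

@MessageDriven(activationConfig = {
        @ActivationConfigProperty(propertyName = "destinationLookup", propertyValue = "jms/AuditLogQueue")
})
public class AuditLogMdb implements MessageListener {

    @Override
    public void onMessage(Message message) {
        try {
            String logFileName = ((TextMessage) message).getText();   // e.g. "audit-2024-01-01.xml"
            try (InputStream in = Files.newInputStream(Paths.get("/shared/audit-logs", logFileName))) {
                process(in);   // parse the XML as a stream instead of loading it all into memory
            }
        } catch (Exception e) {
            throw new RuntimeException("Failed to process audit log reference", e);
        }
    }

    private void process(InputStream xml) {
        // SAX/StAX parsing goes here so the full document never has to fit in memory.
    }
}
```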

Is it possible to salvage messages from Weblogic JMS file store?

I have a couple of JMS file stores from a WebLogic 10.3 server, and I would like to retrieve the messages contained in them, if possible without using WebLogic. Is this possible?
Many years ago I was able to read the JMS file store for a previous version of WebLogic using Java serialization (ObjectInputStream), but the files I have are giving me a
java.io.StreamCorruptedException: invalid stream header: C001BEAD
exception when I open them using ObjectInputStream. I'm wondering if there is a file header that I need to skip before I can deserialize the messages, or perhaps this version of WebLogic doesn't use Java serialization at all.
The messages in the file are MapMessages. I can see the strings that correspond to the map keys when I hex-dump the file, but of course the values are not readable this way. Still, the fact that I can see the map keys makes me hopeful that the messages are serialized in the file.
Any ideas on how to salvage the data?
Set aside in a safe place all the *.dat files you wish to salvage.
Start up a WebLogic instance and log in to the Admin console
Go to Home -> Summary of JMS Servers -> XL-JMS-Server
Enable “Insertion Paused At Startup”
Enable “Production Paused At Startup”
Enable “Consumption Paused At Startup”
Save the settings
Shutdown Weblogic
Swap-in a JMS data store you wish to salvage
Start Weblogic
Browse the JMS monitoring page to see which Queues and Topics have messages persisted.
At this point, the datastore is ready to be inspected/dumped using a QueueBrowser or a TopicSubscriber that you write. Alternatively, you could walk the messages ad hoc using Hermes JMS ( http://www.hermesjms.com ). Hermes has message renderers that you can implement for your custom message types.
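A bare-bones version of such a QueueBrowser dump might look like this; the JNDI names are examples and would come from your WebLogic configuration:

```java
// Browse a queue on the paused JMS server and dump each MapMessage's keys and values
// to stdout. Assumes a jndi.properties pointing at the WebLogic server; lookup names
// are illustrative.
import javax.jms.MapMessage;
import javax.jms.Queue;
import javax.jms.QueueBrowser;
import javax.jms.QueueConnection;
import javax.jms.QueueConnectionFactory;
import javax.jms.QueueSession;
import javax.jms.Session;
import javax.naming.InitialContext;
import java.util.Enumeration;

public class QueueDumper {
    public static void main(String[] args) throws Exception {
        InitialContext ctx = new InitialContext();
        QueueConnectionFactory cf = (QueueConnectionFactory) ctx.lookup("jms/ConnectionFactory");
        Queue queue = (Queue) ctx.lookup("jms/SalvagedQueue");

        QueueConnection connection = cf.createQueueConnection();
        try {
            QueueSession session = connection.createQueueSession(false, Session.AUTO_ACKNOWLEDGE);
            QueueBrowser browser = session.createBrowser(queue);
            connection.start();

            Enumeration<?> messages = browser.getEnumeration();
            while (messages.hasMoreElements()) {
                MapMessage msg = (MapMessage) messages.nextElement();
                Enumeration<?> keys = msg.getMapNames();
                while (keys.hasMoreElements()) {
                    String key = (String) keys.nextElement();
                    System.out.println(key + " = " + msg.getObject(key));
                }
                System.out.println("----");
            }
        } finally {
            connection.close();
        }
    }
}
```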
The only way we and Oracle support were able to come up with was to create another WebLogic instance configured the same way, and let that WebLogic instance pick up and process the messages.

Tool to send email from a DB

We are developing a webapp, written in Java/Groovy, that needs to send out emails. Currently, we persist each email to a database before we call the JavaMail API to send the mail to our SMTP server.
I want to send out email asynchronously: persist the email and then have another process pick it up and send it (and send it only once). Ideally, this process runs outside of my webapp.
Are there any tools that do that?
Update: This solution needs to prevent duplicate emails and it needs to handle spikes in email volume. I was hoping someone had already written an offline email processor. (I'd rather not implement this myself.)
The suggestions to use a cron job to read the database are quite workable.
Another good approach here is to use a Java Message Service (JMS) message queue. These are persistent (backed up by a database) and reliable. You can have one or more producer programs enqueue messages with the relevant data in them, and then one or more consumers process the messages and dequeue them. All of this is set up for very high reliability, and you gain the flexibility of asynchronously decoupling the operations, which means during email spikes the message queue can grow larger until the consumers catch up with the spike. Another benefit is that the email goes out as soon as a consumer gets to it instead of on a timer. Plus, if you require high availability, you can have multiple consumers in case one goes down.
Check out Apache's ActiveMQ for a good open source implementation of JMS.
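To make that concrete, a minimal sketch with ActiveMQ; the broker URL and queue name are illustrative, and the consumer would hand each message to JavaMail:

```java
// Sketch of the JMS approach: the webapp enqueues a small "send this email" message,
// and a standalone consumer process picks it up and sends it. Duplicate suppression
// still needs care on the consumer side.
import org.apache.activemq.ActiveMQConnectionFactory;

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.DeliveryMode;
import javax.jms.JMSException;
import javax.jms.MapMessage;
import javax.jms.MessageListener;
import javax.jms.MessageProducer;
import javax.jms.Session;

public class EmailQueue {
    private static final String BROKER_URL = "tcp://localhost:61616";
    private static final String QUEUE_NAME = "mail.outbound";

    // Called from the webapp: enqueue and forget.
    public static void enqueue(String to, String subject, String body) throws JMSException {
        ConnectionFactory factory = new ActiveMQConnectionFactory(BROKER_URL);
        Connection connection = factory.createConnection();
        try {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = session.createProducer(session.createQueue(QUEUE_NAME));
            producer.setDeliveryMode(DeliveryMode.PERSISTENT);   // survives broker restarts
            MapMessage msg = session.createMapMessage();
            msg.setString("to", to);
            msg.setString("subject", subject);
            msg.setString("body", body);
            producer.send(msg);
        } finally {
            connection.close();
        }
    }

    // Run in the separate consumer process: the listener hands each message to JavaMail.
    public static void consume(MessageListener mailSender) throws JMSException {
        Connection connection = new ActiveMQConnectionFactory(BROKER_URL).createConnection();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        session.createConsumer(session.createQueue(QUEUE_NAME)).setMessageListener(mailSender);
        connection.start();
    }
}
```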
If you're using Linux/Unix you could create a cron job to run every few minutes which calls a program to grab the email from the database and send it out. You could also have a field in the database to indicate whether the message has been sent. The downside of this approach is that there may be a delay of a few minutes from when your webapp persists the email and when the cron job is run.
Setup a cron job and use scripts to query the db and send out emails via sendmail.
On the off chance it's an Oracle DB, you can use the UTL_MAIL package to write PL/SQL to send the mail through your SMTP server. Then create a scheduled job to execute on your desired schedule.
http://download.oracle.com/docs/cd/B19306_01/appdev.102/b14258/u_mail.htm
Since you are already using Groovy, this might be an interesting tool for solving your problem:
http://gaq.sourceforge.net/
You could use Quartz, a scheduling library (similar to cron), to schedule a recurring task that reads the DB and sends the emails. If you're using Grails, there's a Quartz plugin that makes working with Quartz a bit more Groovy.
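A small sketch of that Quartz variant; the polling and sending logic is left as a comment since the table layout and mail code are application-specific:

```java
// Sketch of a recurring Quartz job that polls the email table and sends anything not
// yet marked as sent. Table/column names and the schedule are assumptions.
import org.quartz.Job;
import org.quartz.JobBuilder;
import org.quartz.JobDetail;
import org.quartz.JobExecutionContext;
import org.quartz.JobExecutionException;
import org.quartz.Scheduler;
import org.quartz.SchedulerException;
import org.quartz.SimpleScheduleBuilder;
import org.quartz.Trigger;
import org.quartz.TriggerBuilder;
import org.quartz.impl.StdSchedulerFactory;

public class PendingEmailJob implements Job {
    @Override
    public void execute(JobExecutionContext context) throws JobExecutionException {
        // SELECT * FROM EMAIL WHERE SENT = 0, send each via JavaMail,
        // then UPDATE ... SET SENT = 1 so the next run skips it (prevents duplicates).
    }

    public static void schedule() throws SchedulerException {
        Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();
        JobDetail job = JobBuilder.newJob(PendingEmailJob.class)
                .withIdentity("pendingEmailJob").build();
        Trigger trigger = TriggerBuilder.newTrigger()
                .withSchedule(SimpleScheduleBuilder.simpleSchedule()
                        .withIntervalInMinutes(1)
                        .repeatForever())
                .build();
        scheduler.scheduleJob(job, trigger);
        scheduler.start();
    }
}
```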
