JMS taking too long to process messages - java

An application has a JMS queue responsible for delivering audit logs. The application sends logs to the JMS queue, and the queue is consumed by an MDB.
However, the messages are big XML files that vary from 20 MB to 100 MB. The problem is that the queue takes too long to consume the messages, which eventually leads to an OutOfMemoryError.
What should I do to solve this problem?

This answer may or may not help jguilhermemv; I just want to share an idea with anyone reading this post: a workaround for big messages.
The first thing is to avoid sending such big messages. That leaves two options (both require implementation changes, and can be done at the start, or later if changes to the system are still allowed):
Save the log in the DB and send just the log IDs in the JMS messages. (Saving logs in the DB is not ideal, since the size and the time to save will again become a problem later.)
Save the logs as files at a common location, store the file names in the DB, and send those file-name IDs via JMS. After consuming the message, the consumer can read the corresponding log file. A rough sketch of this approach follows.
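For the second option, a minimal sketch of the producer side might look like the following, assuming a shared directory such as /shared/audit-logs and an already configured queue (both are placeholders); the MDB would then open the file named in the message instead of receiving the XML itself:

```java
// Sketch of the "claim check" approach: persist the large XML to shared storage
// and send only a small reference over JMS. Paths and queue wiring are assumptions.
import javax.jms.*;
import java.nio.file.*;
import java.util.UUID;

public class AuditLogProducer {

    private final ConnectionFactory connectionFactory;
    private final Queue auditQueue;

    public AuditLogProducer(ConnectionFactory connectionFactory, Queue auditQueue) {
        this.connectionFactory = connectionFactory;
        this.auditQueue = auditQueue;
    }

    public void sendAuditLog(byte[] xmlPayload) throws Exception {
        // 1. Store the big payload outside the message broker.
        String logId = UUID.randomUUID().toString();
        Path file = Paths.get("/shared/audit-logs", logId + ".xml");
        Files.write(file, xmlPayload);

        // 2. Send only the reference; the MDB loads the file when it consumes the message.
        try (Connection con = connectionFactory.createConnection();
             Session session = con.createSession(false, Session.AUTO_ACKNOWLEDGE)) {
            MessageProducer producer = session.createProducer(auditQueue);
            TextMessage msg = session.createTextMessage(logId);
            msg.setStringProperty("logFile", file.toString());
            producer.send(msg);
        }
    }
}
```

This way the broker only ever holds tiny reference messages, never the 20-100 MB payloads.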

Related

How to send Huge data sets to Client from Server

I have a requirement to send data from a server (Tomcat, a Java process exposing OData APIs) to a client (React based).
The data can range from a few KB to hundreds of MB (say 700 MB); it is retrieved from the DB (Redshift), processed, and sent to the client.
Multiple clients can be accessing the system at the same time, which adds more stress.
We added pagination so that only the data for the current page is loaded, but we also have a feature to export the complete data set in CSV format.
Processing all of the data consumes a lot of memory, and the application's heap sometimes gets exhausted. Increasing the heap is not the solution I'm looking for; I want to know what can be done on the application side to optimize system resources.
Kindly suggest the best way to transfer the data; I would also like to know whether there is any other kind of API (streaming) that could help here.
Can you change the integration between the client and your system?
Something like this: the client sends the request to export a CSV, with a callback URL in the payload.
You put this request in a queue (RabbitMQ). The queue consumer processes the request, generates the CSV, and puts it in a temporary area (S3 or behind an NGINX). Then the consumer notifies the client via the callback URL, passing the new URL from which the client can download the full CSV.
This way, the system that processes the incoming requests doesn't use too much heap. You only need to scale the queue consumers, which is easier because the concurrency is determined by how many consumers you configure, not by the number of incoming client requests.
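If that integration change is possible, the export worker could look roughly like this sketch; the RabbitMQ wiring is omitted, and the query, table, temporary-storage upload and URLs are illustrative assumptions:

```java
// Rough sketch of the export worker: stream rows to a temp file, upload it,
// then notify the client's callback URL. uploadToTempStorage(...) is a placeholder.
import java.io.BufferedWriter;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.sql.*;

public class CsvExportWorker {

    // Called for each export request taken from the queue.
    public void handleExportRequest(String callbackUrl, Connection db) throws Exception {
        Path csv = Files.createTempFile("export-", ".csv");

        // Stream rows straight to disk so the full result set is never held in heap.
        try (BufferedWriter out = Files.newBufferedWriter(csv, StandardCharsets.UTF_8);
             Statement st = db.createStatement()) {
            st.setFetchSize(5_000);   // cursor-style fetching (may need autocommit off, depending on the driver)
            try (ResultSet rs = st.executeQuery("SELECT id, name, amount FROM sales")) {
                out.write("id,name,amount\n");
                while (rs.next()) {
                    out.write(rs.getLong("id") + "," + rs.getString("name") + ","
                            + rs.getBigDecimal("amount") + "\n");
                }
            }
        }

        String downloadUrl = uploadToTempStorage(csv);   // e.g. S3 or an NGINX-served folder

        // Notify the client that the file is ready for download.
        HttpRequest notify = HttpRequest.newBuilder(URI.create(callbackUrl))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString("{\"downloadUrl\":\"" + downloadUrl + "\"}"))
                .build();
        HttpClient.newHttpClient().send(notify, HttpResponse.BodyHandlers.discarding());
    }

    private String uploadToTempStorage(Path csv) {
        // Placeholder: copy to S3 / a web-served directory and return its public URL.
        return "https://files.example.com/" + csv.getFileName();
    }
}
```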

Reading huge file and writing in RDBMS

I have a huge text file that is continuously being appended to from a common place. I need to read it line by line from my Java application and write the lines to a SQL RDBMS, such that if the Java application crashes, it resumes from where it left off rather than starting from the beginning.
It's a plain text file. Each row contains:
<Datatimestamp> <service name> <paymentType> <success/failure> <session ID>
Also, the data retrieved from the database should be near real time, without any performance or availability issues in the web application.
Here is my approach:
Deploy the application on two boxes, each with a heartbeat that pings the other system to check service availability.
When you get a successful heartbeat response, you also get the timestamp of the last successfully read line.
When the next heartbeat response fails, the application on the other system can take over, based on:
1. the failed response
2. the last successful timestamp
Also, since data retrieval needs to be close to real time and the data is huge, can I crawl the database and put the data into Solr or Elasticsearch for faster retrieval, instead of making database calls?
There are various ways to do this; what is the best way?
I would put a messaging system (for example RabbitMQ) between the text file and the DB-writing applications. In this case, the messaging system functions as a queue: one application constantly reads the file and publishes the rows as messages to the broker, and on the other side, multiple "DB-writing applications" read from the queue and write to the DB.
The advantage of the messaging system is its support for multiple clients reading from the queue. The messaging system takes care of synchronizing the clients, dealing with errors, dead letters, etc., and the clients don't need to care which payloads were processed by other instances.
Regarding maintaining multiple instances of the "DB-writing applications": I would go for a ready-made cluster solution, perhaps Docker containers managed by Kubernetes.
Another viable alternative is a streaming platform, like Apache Kafka.
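As an illustration of the consuming side, here is a minimal sketch of one "DB-writing application" using the RabbitMQ Java client; the queue name, table layout and connection details are assumptions for this example:

```java
// One of possibly many consumers reading log lines from RabbitMQ and inserting them
// into the RDBMS. Queue name, table and credentials are placeholders.
import com.rabbitmq.client.*;
import java.nio.charset.StandardCharsets;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class LogLineDbWriter {

    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");

        Connection db = DriverManager.getConnection("jdbc:postgresql://localhost/audit", "app", "secret");
        PreparedStatement insert = db.prepareStatement(
                "INSERT INTO payment_log (ts, service, payment_type, status, session_id) VALUES (?,?,?,?,?)");

        try (com.rabbitmq.client.Connection mq = factory.newConnection();
             Channel channel = mq.createChannel()) {
            channel.queueDeclare("payment-log-lines", true, false, false, null);
            channel.basicQos(100);                      // limit unacknowledged messages per consumer

            DeliverCallback onMessage = (tag, delivery) -> {
                String line = new String(delivery.getBody(), StandardCharsets.UTF_8);
                String[] fields = line.split(" ");      // <ts> <service> <paymentType> <status> <sessionId>
                try {
                    for (int i = 0; i < 5; i++) insert.setString(i + 1, fields[i]);
                    insert.executeUpdate();
                    channel.basicAck(delivery.getEnvelope().getDeliveryTag(), false);
                } catch (Exception e) {
                    // requeue so another instance can retry, or route to a dead-letter queue instead
                    channel.basicNack(delivery.getEnvelope().getDeliveryTag(), false, true);
                }
            };
            channel.basicConsume("payment-log-lines", false, onMessage, consumerTag -> { });

            Thread.currentThread().join();              // keep consuming until the process is stopped
        }
    }
}
```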
You can use software like Filebeat to read the file and direct its output to RabbitMQ or Kafka. From there, a Java program can subscribe to / consume the data and put it into an RDBMS.

Broker disk usage after topic deletion

I'm using Apache Kafka. I dump huge databases into Kafka, where each database table is a topic.
I cannot delete a topic before it's completely consumed. I cannot set a time-based retention policy because I don't know when a topic will be consumed. I have limited disk and too much data, so I have to write code that orchestrates consumption and deletion programmatically. I understand that the problem appears because we're using Kafka for batch processing, but I can't change the technology stack.
What is the correct way to delete a consumed topic from the brokers?
Currently, I'm calling kafka.admin.AdminUtils#deleteTopic, but I can't find clear documentation on it. The method signature doesn't take the Kafka server URLs. Does that mean I'm deleting only the topic's metadata, and the brokers' disk usage isn't reduced? If so, when does the actual deletion of the append-log files happen?
Instead of using a time-based retention policy, are you able to use a size-based policy? log.retention.bytes is a per-partition setting that might help you out here.
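If the size-based route works for you, retention.bytes can also be set per topic programmatically; a hedged sketch with the Kafka AdminClient, where the broker address, topic name and 1 GiB limit are only examples:

```java
// Sketch: apply a size-based retention limit to one topic via the AdminClient.
import org.apache.kafka.clients.admin.*;
import org.apache.kafka.common.config.ConfigResource;
import java.util.*;

public class TopicRetentionConfig {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "big_table_dump");
            AlterConfigOp setRetention = new AlterConfigOp(
                    new ConfigEntry("retention.bytes", String.valueOf(1024L * 1024 * 1024)),  // 1 GiB per partition
                    AlterConfigOp.OpType.SET);
            Map<ConfigResource, Collection<AlterConfigOp>> update =
                    Collections.singletonMap(topic, Collections.singleton(setRetention));
            admin.incrementalAlterConfigs(update).all().get();
        }
    }
}
```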
I'm not sure how you'd want to determine that a topic is fully consumed, but calling deleteTopic against the topic initially marks it for deletion. As soon as there are no consumers/producers connected to the cluster and accessing those topics, and if delete.topic.enable is set to true in your server.properties file, the controller will then delete the topic from the cluster as soon as it is able to do so. This includes purging the data from disk. It can take anywhere between a few seconds and several minutes to do this.
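For the deletion itself, the modern org.apache.kafka.clients.admin.AdminClient (rather than the old kafka.admin.AdminUtils) takes the bootstrap servers explicitly; a possible sketch, with placeholder broker and topic names:

```java
// Sketch: delete a fully consumed topic. Requires delete.topic.enable=true on the brokers;
// the controller then removes the partition logs from disk asynchronously.
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import java.util.Collections;
import java.util.Properties;

public class ConsumedTopicCleaner {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            admin.deleteTopics(Collections.singleton("big_table_dump")).all().get();
        }
    }
}
```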

what is best option for creating log message buffer

I am working on a web application that needs to be deployed to the cloud. There is a cloud service that stores log messages for applications securely. It is exposed via a REST API that accepts a maximum of 25 log messages in JSON format per call. We are currently using log4j (open to any other framework) to log to a file. Now we need to transition the application from file-based logging to the cloud REST API.
I expect it would be expensive to make a REST API call for every log message, and that it would slow down the application.
In this context, I am considering writing a custom appender that writes to a buffer. The buffer can be in-memory or persistent, and it would be read and emptied periodically by a separate thread or process that sends messages to the cloud REST API in batches of 25.
Option 1: in-memory buffer
My custom appender would write each message to an in-memory list and keep filling it. A daemon thread would keep removing 25 messages at a time from the buffer and writing them to the cloud via the REST API. The downside of this approach is that if the application/server/node crashes, we lose critical log messages that could explain why the crash occurred. I am not sure if this is the right way of thinking. (A rough sketch of this buffering approach is shown after the two options.)
Option 2: persistent buffer (database/message queue)
The appender logs the message temporarily to a database table, or posts it to a message queue, and a separate long-running job picks up messages from the DB or queue and posts them to the cloud via the REST API.
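Here is the rough sketch of option 1 mentioned above, assuming a placeholder CloudLogClient for the real REST call; the custom appender would hand each formatted log line to this buffer:

```java
// In-memory buffer drained in batches of 25 by a daemon thread.
// CloudLogClient and postBatch(...) are placeholders for the cloud REST API call.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class CloudLogBuffer {

    private static final int BATCH_SIZE = 25;
    private final BlockingQueue<String> buffer = new LinkedBlockingQueue<>(10_000);
    private final CloudLogClient client;

    public CloudLogBuffer(CloudLogClient client) {
        this.client = client;
        Thread drainer = new Thread(this::drainLoop, "cloud-log-drainer");
        drainer.setDaemon(true);
        drainer.start();
    }

    /** Called from the custom appender for every log event. */
    public void offer(String formattedMessage) {
        // Drop (or fall back to a local file) if the buffer is full rather than blocking the app.
        buffer.offer(formattedMessage);
    }

    private void drainLoop() {
        List<String> batch = new ArrayList<>(BATCH_SIZE);
        while (true) {
            try {
                batch.clear();
                batch.add(buffer.take());                    // wait for at least one message
                buffer.drainTo(batch, BATCH_SIZE - 1);       // then take up to 24 more
                client.postBatch(batch);                     // one REST call for up to 25 messages
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    /** Placeholder interface for the cloud REST client. */
    public interface CloudLogClient {
        void postBatch(List<String> messages);
    }
}
```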
Please advise which option looks best.
There are a lot of built-in appenders in log4j (https://logging.apache.org/log4j/2.x/manual/appenders.html), and if you use a dedicated cloud service, it may provide a specific appender.
If it's available in your environment, maybe try a stack like ELK with the log4j RollingFile appender; with that technique you won't lose log entries.

How can we save Java Message Queues for reference?

How can we keep track of every message that gets into our Java message queue? We need to save the messages for later reference. We already log them to an application log (log4j), but we need to be able to query them later.
You can store them:
in memory, in a collection or in an in-memory database
in a standalone database
You could create a database logging table for the messages, storing the message as-is in a BLOB column, the timestamp at which it was created / posted to the MQ, and a simple counter as the primary key. You could also add fields like message type if you want to create statistical reports on the messages sent.
Cleanup of the table can be done simply by deleting all messages older than the retention period, using the timestamp column.
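A possible sketch of the consuming side of that idea, as a JMS MessageListener writing into such a table; the MESSAGE_AUDIT table and its columns are assumptions, and the primary key is expected to come from an identity/sequence column:

```java
// Persist each incoming JMS message as-is into an audit table.
import javax.jms.BytesMessage;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.TextMessage;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.Timestamp;

public class MessageAuditListener implements MessageListener {

    private final Connection db;

    public MessageAuditListener(Connection db) {
        this.db = db;
    }

    @Override
    public void onMessage(Message message) {
        try (PreparedStatement insert = db.prepareStatement(
                "INSERT INTO MESSAGE_AUDIT (created_at, message_type, payload) VALUES (?,?,?)")) {
            insert.setTimestamp(1, new Timestamp(message.getJMSTimestamp()));
            insert.setString(2, message.getJMSType());
            insert.setBytes(3, extractPayload(message));   // stored as-is in the BLOB column
            insert.executeUpdate();
        } catch (Exception e) {
            throw new RuntimeException("Could not audit message", e);
        }
    }

    private byte[] extractPayload(Message message) throws Exception {
        if (message instanceof TextMessage) {
            return ((TextMessage) message).getText().getBytes();
        }
        BytesMessage bytes = (BytesMessage) message;
        byte[] body = new byte[(int) bytes.getBodyLength()];
        bytes.readBytes(body);
        return body;
    }
}
```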
I implemented such a solution in the past; we chose to store messages with all their characteristics in a database and developed a search, replay and cancel application on top of it. This is the Message Store pattern:
[Message Store pattern diagram; source: eaipatterns.com]
We also used this application for the Dead Letter Channel.
[Dead Letter Channel diagram; source: eaipatterns.com]
If you don't want to build a custom solution, have a look at the ReplayService for JMS from CodeStreet.
The best way to do this is to use whatever tracing facility your middleware provider offers. Or possibly, you could set up an intermediate listener whose only job is to log messages and forward them on to your existing application.
In most cases, you will find that the middleware provider already has the ability to do this for you with no changes or awareness by your application.
I would change the queue to a topic, keep the original consumer that processes the messages, and add another consumer that audits the messages to a database.
Some JMS providers support topic-to-queue-bridge definitions; the consumers then receive from their own dedicated queues, and don't have to read past messages left on the queue because other consumers are inactive.
Alternatively, you could write a log4j appender that writes your logged messages to a database.
