I am working on a web application which needs to be deployed to the cloud. The cloud provider has a service which can store log messages for applications securely, exposed via a REST API that accepts at most 25 log messages per call in JSON format. We are currently using log4j (open to any other framework too) to log to a file. Now we need to transition the application from file-based logging to this cloud REST API.
My concern is that making a REST API call for every log message would be expensive and would slow down the application.
In this context, I am considering writing a custom appender that writes to a buffer. The buffer can be in-memory or persistent, and would be read and emptied periodically by a separate thread or process that sends batches of 25 messages to the cloud REST API.
Option 1: using an in-memory buffer
My custom appender would write messages to an in-memory list and keep filling it.
There would be a daemon thread which keeps removing 25 messages at a time from the buffer and writing them to the cloud using the REST API. The downside of this approach is that if the application/server/node crashes, we lose critical log messages that could help diagnose why the crash occurred. I am not sure whether this is the right way of thinking.
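Roughly, this is what I have in mind for the option 1 appender (just a minimal sketch assuming log4j 2.x's AbstractAppender; the queue size is arbitrary, postToCloud is a placeholder for the actual REST call, and plugin registration via @Plugin/factory method is left out):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    import org.apache.logging.log4j.core.LogEvent;
    import org.apache.logging.log4j.core.appender.AbstractAppender;
    import org.apache.logging.log4j.core.config.Property;

    // Sketch of option 1: events go into an in-memory queue, and a daemon thread
    // drains up to 25 at a time and posts them to the cloud API.
    public class CloudBufferAppender extends AbstractAppender {

        private static final int BATCH_SIZE = 25;   // the cloud API limit
        private final BlockingQueue<String> buffer = new LinkedBlockingQueue<>(10_000);

        protected CloudBufferAppender(String name) {
            super(name, null, null, true, Property.EMPTY_ARRAY);
            Thread flusher = new Thread(this::flushLoop, "cloud-log-flusher");
            flusher.setDaemon(true);
            flusher.start();
        }

        @Override
        public void append(LogEvent event) {
            // Drop the event if the queue is full so logging never blocks the application.
            buffer.offer(event.getMessage().getFormattedMessage());
        }

        private void flushLoop() {
            List<String> batch = new ArrayList<>(BATCH_SIZE);
            while (true) {
                batch.clear();
                buffer.drainTo(batch, BATCH_SIZE);
                if (!batch.isEmpty()) {
                    postToCloud(batch);   // placeholder for the REST call
                } else {
                    try { Thread.sleep(500); } catch (InterruptedException e) { return; }
                }
            }
        }

        private void postToCloud(List<String> batch) {
            // Serialize 'batch' to JSON and POST it to the cloud logging endpoint (omitted here).
        }
    }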
Option 2: using a persistent buffer (database or message queue)
The appender can write the message temporarily to a database table, or post it to a message queue; a separate long-running job would then pick up messages from the DB or queue and post them to the cloud using the REST API.
Please advise which option looks best.
There are a lot of built-in appenders in log4j: https://logging.apache.org/log4j/2.x/manual/appenders.html and if you use a dedicated logging service in the cloud, the provider may offer a specific appender for it.
If it is hosted in your own environment, maybe try a stack like ELK with the log4j RollingFile appender; with that technique you won't lose log entries.
Related
I have a huge text file that is continuously being appended to in a common place. I need to read it line by line from my Java application and write the rows into a SQL RDBMS, such that if the Java application crashes, it should resume from where it left off and not from the beginning.
It's a plain text file. Each row contains:
<Datatimestamp> <service name> <paymentType> <success/failure> <session ID>
Also, the data retrieved from the database should be available in near real time, without causing any performance or availability issues in the web application.
Here is my approach:
Deploy the application on two boxes, each running a heartbeat that pings the other system to check service availability.
When you get a successful heartbeat response, you also get the timestamp of the last successfully read line.
When the next heartbeat response fails, the application on the other system can take over, based on:
1. the failed response
2. the last successful timestamp.
Also, since data retrieval needs to be near real time and the data is huge, can I crawl the database and index it into Solr or Elasticsearch for faster retrieval, instead of making database calls?
There are various ways to do this; what is the best way?
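For the resume-from-offset part, this is roughly what I'm picturing (only a sketch; here the offset is committed to a small side file after every insert, though in practice it could be stored in the same RDBMS transaction as the row; all names are illustrative):

    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.nio.file.Files;
    import java.nio.file.Path;

    // Resume tailing a growing text file from the last committed byte offset.
    // The offset is persisted after each successful DB write, so a crash replays at most one line.
    public class ResumableFileReader {

        private final Path logFile;
        private final Path offsetFile;   // could instead live in the RDBMS, committed with each row

        public ResumableFileReader(Path logFile, Path offsetFile) {
            this.logFile = logFile;
            this.offsetFile = offsetFile;
        }

        public void poll() throws IOException {
            long offset = Files.exists(offsetFile)
                    ? Long.parseLong(Files.readString(offsetFile).trim())
                    : 0L;

            try (RandomAccessFile raf = new RandomAccessFile(logFile.toFile(), "r")) {
                raf.seek(offset);
                String line;
                while ((line = raf.readLine()) != null) {
                    writeRowToDatabase(line);                              // parse and INSERT (omitted)
                    offset = raf.getFilePointer();
                    Files.writeString(offsetFile, Long.toString(offset));  // commit progress
                }
            }
        }

        private void writeRowToDatabase(String row) {
            // Parse "<timestamp> <service> <paymentType> <success/failure> <sessionId>" and insert it.
        }
    }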
I would put a messaging system (for example RabbitMQ) between the text file and the DB-writing applications. In this case the messaging system functions as a queue: one application constantly reads the file and publishes the rows as messages to the broker; on the other side, multiple "DB-writing applications" can read from the queue and write to the DB.
The advantage of the messaging system is its support for multiple clients reading from the queue. It takes care of synchronizing between the clients, dealing with errors, dead letters, etc., and the clients don't need to care about which payloads were processed by other instances.
Regarding maintaining multiple instances of the "DB-writing applications": I would go for a ready-made cluster solution, perhaps Docker containers managed by Kubernetes?
Another viable alternative is a streaming platform such as Apache Kafka.
You can use software like Filebeat to read the file and direct the Filebeat output to RabbitMQ or Kafka. From there a Java program can subscribe to / consume the data and put it into an RDBMS.
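To make the consumer side concrete, a "DB-writing application" reading from RabbitMQ could look roughly like this (queue name, JDBC URL and table are placeholders; it acknowledges a message only after the insert succeeds, so a crashed writer does not lose rows):

    import java.nio.charset.StandardCharsets;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    import com.rabbitmq.client.Channel;
    import com.rabbitmq.client.Connection;
    import com.rabbitmq.client.ConnectionFactory;
    import com.rabbitmq.client.DeliverCallback;

    // Reads one log line per message from a RabbitMQ queue and inserts it into the RDBMS.
    public class DbWritingConsumer {

        public static void main(String[] args) throws Exception {
            java.sql.Connection db =
                    DriverManager.getConnection("jdbc:postgresql://localhost/payments", "app", "secret"); // placeholder

            ConnectionFactory factory = new ConnectionFactory();
            factory.setHost("localhost");
            Connection mq = factory.newConnection();
            Channel channel = mq.createChannel();
            channel.queueDeclare("log-lines", true, false, false, null);

            DeliverCallback onMessage = (consumerTag, delivery) -> {
                String line = new String(delivery.getBody(), StandardCharsets.UTF_8);
                try (PreparedStatement ps =
                             db.prepareStatement("INSERT INTO payment_log(raw_line) VALUES (?)")) { // table assumed
                    ps.setString(1, line);
                    ps.executeUpdate();
                    channel.basicAck(delivery.getEnvelope().getDeliveryTag(), false);   // ack only after the insert
                } catch (Exception e) {
                    channel.basicNack(delivery.getEnvelope().getDeliveryTag(), false, true); // requeue on failure
                }
            };

            channel.basicConsume("log-lines", false, onMessage, consumerTag -> { });
        }
    }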
I have a Spark Streaming application using a Custom Receiver and I want it to be fully fault-tolerant. To do so, I have enabled Write Ahead Logs (WAL) in the configuration file when running spark-submit and have checkpointing set up (using getOrCreate).
A tutorial I saw online says that to make sure the WAL recovers buffered data properly with a custom receiver, I need to make sure the receiver is reliable and that data is acknowledged only after it is saved to the WAL directory. The reference on the Spark website also talks about acknowledging data from the source:
https://spark.apache.org/docs/1.6.1/streaming-custom-receivers.html
However, there is no example code showing how to set up this ordering:
First save data to WAL (by calling store())
Acknowledge the data (??)
Any idea how I can do it?
Currently, in my Spark UI, I see that the application resumes with multiple batches having "0 events".
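For reference, the "store first, then acknowledge" ordering could look roughly like this in a custom receiver (a sketch only; fetchBatch() and acknowledge() are stand-ins for whatever the real source offers, and the blocking store(Iterator) variant is what makes the receiver reliable once the WAL is enabled):

    import java.util.List;

    import org.apache.spark.storage.StorageLevel;
    import org.apache.spark.streaming.receiver.Receiver;

    // Reliable receiver sketch: store a batch (blocking), then acknowledge the source.
    public class ReliableCustomReceiver extends Receiver<String> {

        public ReliableCustomReceiver() {
            super(StorageLevel.MEMORY_AND_DISK_SER_2());
        }

        @Override
        public void onStart() {
            new Thread(this::receive, "custom-receiver").start();
        }

        @Override
        public void onStop() {
            // resources are released by the receiving thread once isStopped() becomes true
        }

        private void receive() {
            while (!isStopped()) {
                List<String> batch = fetchBatch();   // hypothetical pull from the source
                if (batch.isEmpty()) continue;
                // store(Iterator) blocks until Spark has persisted the block, which includes
                // the WAL when spark.streaming.receiver.writeAheadLog.enable=true.
                store(batch.iterator());
                acknowledge(batch);                  // hypothetical ack to the source, only after store()
            }
        }

        private List<String> fetchBatch() { return java.util.Collections.emptyList(); }
        private void acknowledge(List<String> batch) { /* source-specific */ }
    }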
An application has a JMS queue responsible for delivering audit logs. The application sends logs to the JMS queue, and the queue is consumed by an MDB.
However, the messages sent are big XML files varying from 20 MB to 100 MB. The problem is that the JMS queue takes too long to consume the messages, leading to an OutOfMemoryError.
What should I do to solve this problem?
This answer may or may not help jguilhermemv; I just want to share an idea for those reading this post, a workaround for big messages.
The first thing is to try not to send such big messages. We have two options (both require implementation changes, and can be done at the start or later if changes to the system are still allowed):
Save the log in a DB and send just the log IDs in the JMS messages. (Saving logs in the DB is not recommended, as the size and the time to save will again become a problem at a later stage.)
Save logs as files (at a common location), store the file names in the DB, and share those file-name IDs via JMS. After consuming a message, the consumer can then read the corresponding log file.
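A rough sketch of that second option (payload in a shared file, only a reference over JMS) could look like this; the shared directory, queue and id scheme are assumptions, and it relies on JMS 2.0, where connections and sessions are AutoCloseable:

    import java.nio.file.Files;
    import java.nio.file.Path;

    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;
    import javax.jms.MessageProducer;
    import javax.jms.Queue;
    import javax.jms.Session;
    import javax.jms.TextMessage;

    // Writes the big XML to a shared location, then sends only a small reference over JMS.
    // The MDB can load the file by its id after consuming the message.
    public class AuditLogSender {

        private final ConnectionFactory factory;  // typically looked up from JNDI in a container
        private final Queue auditQueue;
        private final Path sharedDir;             // common location both producer and consumer can access

        public AuditLogSender(ConnectionFactory factory, Queue auditQueue, Path sharedDir) {
            this.factory = factory;
            this.auditQueue = auditQueue;
            this.sharedDir = sharedDir;
        }

        public void send(String logId, byte[] bigXml) throws Exception {
            Path file = sharedDir.resolve(logId + ".xml");
            Files.write(file, bigXml);                             // store the payload outside the queue

            try (Connection con = factory.createConnection();
                 Session session = con.createSession(false, Session.AUTO_ACKNOWLEDGE)) {
                MessageProducer producer = session.createProducer(auditQueue);
                TextMessage msg = session.createTextMessage(logId);   // only the reference travels over JMS
                producer.send(msg);
            }
        }
    }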
So I am writing an application that will accept log entries from many other applications. We mostly use log4j.
Since these applications are on different machines, I wanted to have a web service that accepts POSTed data from each application, which we could then search, etc.
I realize there are services like Loggly that handle this, but I want to write my own (mainly for security purposes; the company doesn't like log information sitting with third-party providers).
Anyway, I successfully got my own custom HttpAppender to work. So that each application would send the message to a web service.
But before I break out the champagne, I realize that a direct post over HTTP could be a bad thing because some of these apps generate MILLIONS of rows in the logs. So the last thing I want is my HttpAppender bringing down some T1 line or something.
So my idea was to buffer the HTTP POSTs somehow and then periodically send each buffer as a single POST: fewer large posts instead of many smaller ones.
Of course, buffering in something like Redis/memcached locally on the same machine would help, but I have to assume I can't use an external cache (even on the same server). So I would have to buffer in the appender's own memory/process.
Am I on the right track with buffering these HTTP POSTs? Or should I write the buffers to log files and then periodically post those log files?
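What I'm picturing for the buffering is something along these lines (only a sketch; the endpoint, flush interval and threshold are arbitrary, and a failed batch is simply dropped here rather than retried):

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    // The appender hands lines to add(); a scheduled task flushes them as one larger POST
    // either every few seconds or when the buffer grows big enough.
    public class BatchingHttpSender {

        private static final int FLUSH_THRESHOLD = 500;

        private final List<String> buffer = new ArrayList<>();
        private final HttpClient http = HttpClient.newHttpClient();
        private final URI endpoint;

        public BatchingHttpSender(URI endpoint) {
            this.endpoint = endpoint;
            ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
            scheduler.scheduleAtFixedRate(this::flush, 5, 5, TimeUnit.SECONDS);
        }

        public synchronized void add(String logLine) {
            buffer.add(logLine);
            if (buffer.size() >= FLUSH_THRESHOLD) {
                flush();
            }
        }

        public synchronized void flush() {
            if (buffer.isEmpty()) return;
            String body = String.join("\n", buffer);   // or serialize to a JSON array
            buffer.clear();
            HttpRequest request = HttpRequest.newBuilder(endpoint)
                    .header("Content-Type", "text/plain")
                    .POST(HttpRequest.BodyPublishers.ofString(body))
                    .build();
            try {
                http.send(request, HttpResponse.BodyHandlers.discarding());
            } catch (Exception e) {
                // the batch is lost in this sketch; a retry queue or file fallback could go here
            }
        }
    }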
How can we keep track of every message that gets into our Java message queue? We need to save the messages for later reference. We already write them to an application log (log4j), but we need to query them later.
You can store them:
- in memory (in a collection or an in-memory database)
- in a standalone database
You could create a database logging table for the messages, storing the message as-is in a BLOB column, the timestamp at which it was created / posted to the MQ, and a simple counter as the primary key. You can also add fields like message type etc. if you want to create statistical reports on the messages sent.
Cleanup of the table can be done simply by deleting all messages older than the retention period, using the timestamp column.
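For illustration, inserting into such a table from Java could look roughly like this (table and column names are assumptions):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.Timestamp;

    // Assumed schema along the lines of:
    // MESSAGE_LOG(ID generated counter PK, CREATED_AT timestamp, MSG_TYPE varchar, PAYLOAD blob)
    public class MessageLogDao {

        private final Connection connection;

        public MessageLogDao(Connection connection) {
            this.connection = connection;
        }

        public void save(String messageType, byte[] payload) throws Exception {
            String sql = "INSERT INTO MESSAGE_LOG (CREATED_AT, MSG_TYPE, PAYLOAD) VALUES (?, ?, ?)";
            try (PreparedStatement ps = connection.prepareStatement(sql)) {
                ps.setTimestamp(1, new Timestamp(System.currentTimeMillis()));
                ps.setString(2, messageType);
                ps.setBytes(3, payload);   // stored in the BLOB column
                ps.executeUpdate();
            }
        }
    }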
I implemented such a solution in the past: we chose to store the messages with all their characteristics in a database and developed a search, replay and cancel application on top of it. This is the Message Store pattern:
(Message Store diagram from eaipatterns.com)
We also used this application for the Dead Letter Channel.
(Dead Letter Channel diagram from eaipatterns.com)
If you don't want to build a custom solution, have a look at the ReplayService for JMS from CodeStreet.
The best way to do this is to use whatever tracing facility your middleware provider offers. Or possibly, you could set up an intermediate listener whose only job is to log messages and forward them on to your existing application.
In most cases, you will find that the middleware provider already has the ability to do this for you with no changes or awareness by your application.
I would change the queue to a topic, and then keep the original consumer that processes the messages, and add another consumer for auditing the messages to a database.
Some JMS providers cater for topic-to-queue bridge definitions; the consumers then receive from their own dedicated queues and don't have to read past messages left on the queue because other consumers were inactive.
Alternatively, you could write a log4j appender that writes your logged messages to a database.