I'm using Kafka log compaction, and I was wondering if there is any callback function that I, as a consumer, can have invoked when the Kafka broker performs log compaction of my topic.
So far I cannot see any callback for this, so I was wondering what the standard strategy is to detect that log compaction has taken place.
Regards
The client itself has no communication with the broker for such events. In the past, we used Splunk to capture the compaction events from the LogCleaner process logs, and we could generate webhook events based on that if we needed it for any reason (we only used it for administrative debugging; clients never needed it).
I "monitor" the number of consecutive failures in my Camel processing pipeline with a Camel RoutePolicy.
When a threshold of failures is reached, I want to pause the processing for a configured amount of time because it probably means that the data from another system is not yet ready and therefore every message fails.
Since the source of my pipeline is a Kafka topic, I should not just stop the whole route because the broker would assume my consumer died and rebalance.
The best way to "pause" topic consumption seems to be to pause the [KafkaConsumer][3] (the native, not the one of Camel). Like this, the consumer continues to poll the broker, but it does not fetch any messages. Exactly what I need.
But can I access the native [KafkaConsumer][3] from the RoutePolicy context to call the pause and resume methods?
The spring-kafka listener containers expose these methods, it would be nice to use them from Camel too.
This is not yet supported; the pause and resume methods would have to be added to the camel-kafka consumer first.
There is also an existing issue for it: https://issues.apache.org/jira/browse/CAMEL-15106
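For reference, this is roughly what pause and resume look like on the native consumer, if you can get hold of it (a minimal sketch against the plain kafka-clients API; the broker address, group id, and topic name are placeholders):

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PausableConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder
        props.put("group.id", "my-group");                  // placeholder
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic")); // placeholder topic

            // Partitions are only assigned after the first poll.
            consumer.poll(Duration.ofMillis(100));

            // Pause all assigned partitions: poll() must still be called so the
            // group does not rebalance, but it returns no records while paused.
            consumer.pause(consumer.assignment());
            consumer.poll(Duration.ofMillis(100)); // keeps membership alive, fetches nothing

            // Resume fetching when the downstream system is ready again.
            consumer.resume(consumer.paused());
        }
    }
}
```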
One of my web services sends a log message (size: 10 KB) to Kafka (version: 0.10.2.1) for each online request, and I found that the KafkaProducer consumes a lot of memory, which causes long GC pause times.
There is only one Kafka producer in my service, which is the officially recommended approach.
I am just wondering if anyone has suggestions on how to send messages to Kafka without impacting the online services.
Sounds like the producer isn't able to keep up with the rate at which your service is generating logs. (This is necessarily speculation since you have given minimal details about your setup).
Have you benchmarked your Kafka cluster? Is it able to sustain the kind of load you generate?
Another avenue would be to decouple your kafka producer from your actual service. Since you're dealing with log messages, your application can simply write the logs to disk, and you can have a separate process reading these log files and sending them to kafka. This way the producing of messages won't impact your main service.
You can even have the kafka producer running on a different VM/Container entirely, and read the logs via something like an NFS mount.
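If you do keep the in-process producer, these are the standard producer configs that bound its memory footprint and blocking behaviour (a minimal sketch; the broker address, topic name, and concrete values are placeholders to adapt to your load):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class BoundedLogProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        // buffer.memory caps the total memory held for unsent records
        // (default 32 MB); a smaller cap bounds GC pressure.
        props.put("buffer.memory", "16777216");             // 16 MB
        // batch.size and linger.ms trade a little latency for fewer,
        // larger requests and less per-record overhead.
        props.put("batch.size", "65536");                   // 64 KB
        props.put("linger.ms", "50");
        // Compression shrinks 10 KB log messages considerably on the wire.
        props.put("compression.type", "lz4");
        // If the buffer is full, fail fast instead of blocking request threads.
        props.put("max.block.ms", "100");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("service-logs", "log payload"), // placeholder topic
                    (metadata, exception) -> {
                        if (exception != null) {
                            // Don't let logging failures break the request path.
                            System.err.println("log send failed: " + exception);
                        }
                    });
        }
    }
}
```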
I have a Storm topology that processes messages from Kafka and, depending on the task at hand, makes an HTTP call or saves to Cassandra. I process messages as soon as they arrive. However, a few messages are not processed completely because of the responses from external sources such as an HTTP server. I would like to implement an exponential backoff mechanism to retry in case the HTTP server does not respond or returns an error message. I can think of a few ideas for achieving this, and I would like to know which of them is the better solution, and also whether there is another fault-tolerant solution I could use. Since this implements an exponential backoff, each message will have a different delay time.
Send it to another topic in Kafka which is consumed later. My preferred solution. I know we can use Kafka offsets to consume a message at a later stage, but I could not find documentation or sample code for doing so. It would be really helpful if anyone could help me out with this.
Write the message to Cassandra/Redis and write a scheduler that fetches the messages which are not yet processed and are ready to be consumed, and sends them to Kafka so that my Storm topology can consume them. (Existing solution in another legacy, non-Storm project.)
Send it to Beanstalk with a delay. (Existing solution in another legacy, non-Storm project; I would like to avoid this and use it only if I am out of options.)
While this is pretty much what I would like to do, I am not able to find documentation on implementing delayProcessingUntil as mentioned in Kafka - Delayed Queue implementation using high level consumer.
I have done scheduled jobs from a data store and delays using Beanstalk in the past, but I would prefer to use Kafka.
Kafka spout has exponential backoff message retry built in. You can configure the initial delay, delay multiplier, and maximum delay through the spout configuration. If there is an error in the bolt, you can call collector.fail(input). After that, you just leave it to the spout to do the retry.
https://github.com/apache/storm/blob/v0.10.0/external/storm-kafka/src/jvm/storm/kafka/ExponentialBackoffMsgRetryManager.java
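A sketch of what that configuration looks like on storm-kafka's SpoutConfig (field names as found in the 0.10.x storm-kafka module; verify against your Storm version, and note that the ZooKeeper address, topic, zkRoot, and spout id here are placeholders):

```java
import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.ZkHosts;

public class RetrySpoutFactory {
    public static KafkaSpout build() {
        ZkHosts hosts = new ZkHosts("localhost:2181");        // placeholder
        SpoutConfig cfg = new SpoutConfig(hosts, "my-topic",  // placeholder topic
                "/kafka-spout", "retry-example-id");          // placeholder zkRoot/id

        // Exponential backoff for failed tuples: the delay grows by the
        // multiplier on each retry and is capped at the maximum.
        cfg.retryInitialDelayMs = 1000;      // first retry after 1 s
        cfg.retryDelayMultiplier = 2.0;      // double the delay each time
        cfg.retryDelayMaxMs = 60_000;        // never wait longer than 60 s

        return new KafkaSpout(cfg);
    }
}
```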
I think your use case describes the need for a database rather than a queue. You want to temporarily store records until their time and then remove them so they don't show up in future searches. Trying to do that in a queue would be awkward at best, as your analysis shows.
I suggest you create another column family in Cassandra to hold these delayed requests. You'd store the request itself along with a time to retry. Whether you'd want to also have a time series of failed HTTP attempts and related data is up to you. As a delayed request is finally fulfilled, you'd delete the corresponding row from the CF. The search for delayed requests is straightforward, too.
Of course, any database, even a file on the local drive or in HDFS could work, too.
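To make the column-family idea concrete, here is a sketch using the DataStax Java driver (3.x); the keyspace, table, column names, and bucketing scheme are all hypothetical, and a real schema would need a partition key that supports the "what is due now" query efficiently:

```java
import java.util.Date;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class DelayedRequestStore {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("retries")) {  // hypothetical keyspace

            // Hypothetical schema: partition by a coarse time bucket so the
            // "due now" query reads a bounded slice of a single partition.
            session.execute("CREATE TABLE IF NOT EXISTS delayed_requests ("
                    + " bucket text, retry_at timestamp, request_id uuid, payload blob,"
                    + " PRIMARY KEY (bucket, retry_at, request_id))");

            String bucket = "2016-05-01-10";  // e.g. one bucket per hour (hypothetical)

            // The scheduler periodically reads everything that is due ...
            ResultSet due = session.execute(
                    "SELECT retry_at, request_id, payload FROM delayed_requests"
                    + " WHERE bucket = ? AND retry_at <= ?", bucket, new Date());
            for (Row row : due) {
                // ... re-publishes each request to Kafka for the topology to
                // retry, then deletes the row so it no longer shows up later.
                session.execute("DELETE FROM delayed_requests WHERE bucket = ?"
                        + " AND retry_at = ? AND request_id = ?",
                        bucket, row.getTimestamp("retry_at"), row.getUUID("request_id"));
            }
        }
    }
}
```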
You might be interested in the Kafka Retry project https://github.com/IBM/kafka-retry. It provides a delayed retry queue using a single retry topic.
I have a simple test case where I start a HornetQ server (V2.4.7.Final) as part of a Spring context. This works quite well and I have access to a queue via JMS, the HornetQ API and/or JMX.
Testcase
The test case is supposed to empty the queue at start, check that it is empty and then add 10 messages to the queue. As long as there are no consumers on this queue, this works using either the management queue or JMSQueueControl. Even doing some operation on the queue via JMX is working well.
Problem description
As soon as I add a message listener to this queue using Spring configuration - and the listener consumes the messages as expected - I cannot remove all messages from the queue. Neither method invocation via JMX, nor the management queue, nor JMSQueueControl is working, i.e. the methods are called without exception but they show no effect.
I thought that maybe I have to pause the queue before modifying its content, but pausing does not work either. I can see via JMX that the queue is paused, and the API reports the same, but the consumer still consumes messages from that queue. Thus I think it has not been paused at all.
I know that it is difficult without the source code, but from my point of view this is all a pretty basic setup, as you find it in many, many tutorials. Could anyone give advice on what I am doing wrong? In case any source code is needed, please leave a comment and I will add the relevant parts.
HornetQ supports removal of messages which are in the queue on the broker side. Once the messages are dispatched to the consumer and buffered on the consumer, it is not possible to remove the messages from the consumer buffer using any management API.
One way to solve this (if you must) is to disable consumer buffering by setting the consumer-window-size to 0, but be aware of the potential performance degradation.
Otherwise, you need to handle it programmatically, by adding some validity checks before processing the message.
You can read more about HornetQ Flow control here https://docs.jboss.org/hornetq/2.2.5.Final/user-manual/en/html/flow-control.html
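For the buffering option, a minimal sketch of disabling client-side buffering on a programmatically created connection factory (the host and port are placeholders; the same setting is exposed as consumer-window-size on the connection factory in the XML configuration):

```java
import java.util.HashMap;
import java.util.Map;
import org.hornetq.api.core.TransportConfiguration;
import org.hornetq.api.jms.HornetQJMSClient;
import org.hornetq.api.jms.JMSFactoryType;
import org.hornetq.core.remoting.impl.netty.NettyConnectorFactory;
import org.hornetq.jms.client.HornetQConnectionFactory;

public class NoBufferFactory {
    public static HornetQConnectionFactory create() {
        Map<String, Object> params = new HashMap<>();
        params.put("host", "localhost");   // placeholder
        params.put("port", 5445);          // default Netty acceptor port

        TransportConfiguration transport =
                new TransportConfiguration(NettyConnectorFactory.class.getName(), params);
        HornetQConnectionFactory cf = (HornetQConnectionFactory) HornetQJMSClient
                .createConnectionFactoryWithoutHA(JMSFactoryType.CF, transport);

        // 0 disables the consumer-side buffer entirely: messages stay on the
        // broker until the consumer actually asks for one, so the management
        // API can still remove them. Expect lower throughput as a trade-off.
        cf.setConsumerWindowSize(0);
        return cf;
    }
}
```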
I'm not sure how SocketAppender works. I know that logging events are sent to a particular port, and then we can print the logs to a console or put them into a file.
My question is more about the way logs are sent. Is there e.g. one queue? Is it synchronous or asynchronous? Can using it slow down my program?
I've found some info here, but it isn't clear to me.
From the SocketAppender documentation
Logging events are automatically buffered by the native TCP implementation. This means that if the link to the server is slow but still faster than the rate of (log) event production by the client, the client will not be affected by the slow network connection. However, if the network connection is slower than the rate of event production, then the client can only progress at the network rate. In particular, if the network link to the server is down, the client will be blocked.
On the other hand, if the network link is up, but the server is down, the client will not be blocked when making log requests but the log events will be lost due to server unavailability.
Since the appender uses the TCP protocol, I would say the log events are "sort of synchronous".
Basically, the appender uses TCP to send the first log event to the server. However, if the network latency is so high that the message has still not been sent by the time a second event is generated, then the second log event will have to wait (and thus block), until the first event is consumed. So yes, it would slow down your application, if the app generates log events faster than the network can pass them on.
As mentioned by @Akhil and @Nikita, JMSAppender or AsyncAppender would be better options if you don't want the performance of your application to be impacted by network latency.
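For illustration, wiring an AsyncAppender around a SocketAppender programmatically might look like this (a minimal log4j 1.x sketch; the host, port, and buffer size are placeholders):

```java
import org.apache.log4j.AsyncAppender;
import org.apache.log4j.Logger;
import org.apache.log4j.net.SocketAppender;

public class AsyncSocketLogging {
    public static void main(String[] args) {
        // Blocking network appender: writes events to the log server over TCP.
        SocketAppender socketAppender = new SocketAppender("localhost", 4560); // placeholders

        // AsyncAppender hands events to a background thread via a bounded
        // buffer, so application threads are not blocked by network I/O.
        AsyncAppender asyncAppender = new AsyncAppender();
        asyncAppender.setBufferSize(256);   // events held while the socket catches up
        asyncAppender.setBlocking(false);   // discard (and summarize) instead of blocking when full
        asyncAppender.addAppender(socketAppender);

        Logger.getRootLogger().addAppender(asyncAppender);
        Logger.getRootLogger().info("logged without blocking on the network");
    }
}
```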
SocketAppender sends the logs as a serialized object to a SocketNode or log server. In the appender, the connector thread, with a configured reconnectionDelay, checks the connection's integrity and drops all the logs if the connection is not initialized or is disconnected. Hence there is no blocking of the application flow.
If you need better JMS features for sending log info across JVMs, try JMSAppender.
The Log4j JMS appender can be used to send your log messages to a JMS broker. The events are serialized and transmitted as the JMS message type ObjectMessage.
You can get a sample program HERE.
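As a sketch, programmatic JMSAppender setup looks roughly like this (log4j 1.x; the JNDI names, provider URL, and topic binding are placeholders for whatever your broker uses, here shown with ActiveMQ's context factory):

```java
import org.apache.log4j.Logger;
import org.apache.log4j.net.JMSAppender;

public class JmsLogging {
    public static void main(String[] args) {
        JMSAppender jmsAppender = new JMSAppender();
        // JNDI settings for looking up the connection factory and topic;
        // all of these values are placeholders for your broker's config.
        jmsAppender.setInitialContextFactoryName(
                "org.apache.activemq.jndi.ActiveMQInitialContextFactory");
        jmsAppender.setProviderURL("tcp://localhost:61616");
        jmsAppender.setTopicConnectionFactoryBindingName("ConnectionFactory");
        jmsAppender.setTopicBindingName("logTopic");
        jmsAppender.activateOptions();  // connects and creates the JMS publisher

        Logger.getRootLogger().addAppender(jmsAppender);
        Logger.getRootLogger().info("shipped to the JMS broker as an ObjectMessage");
    }
}
```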
It seems to be synchronous (I checked the sources), but I may be mistaken. You can use AsyncAppender to make it asynchronous. See this.