How to avoid losing messages with Kafka streams

How to avoid losing messages with Kafka streams - java

We have a streams application that consumes messages from a source topic, does some processing and forward the results to a destination topic.
The structure of the messages are controlled by some avro schemas.
When starting consuming messages if the schema is not cached yet the application will try to retrieve it from schema registry. If for whichever reason the schema registry is not available (say a network glitch) then the currently being processed message is lost because the default handler is something called LogAndContinueExceptionHandler.
o.a.k.s.e.LogAndContinueExceptionHandler : Exception caught during Deserialization, taskId: 1_5, topic: my.topic.v1, partition: 5, offset: 142768
org.apache.kafka.common.errors.SerializationException: Error retrieving Avro schema for id 62
Caused by: java.net.SocketTimeoutException: connect timed out
at java.base/java.net.PlainSocketImpl.socketConnect(Native Method) ~[na:na]
...
o.a.k.s.p.internals.RecordDeserializer : stream-thread [my-app-StreamThread-3] task [1_5] Skipping record due to deserialization error. topic=[my.topic.v1] partition=[5] offset=[142768]
...
So my question is what would be the proper way of dealing with situations like described above and make sure you don't lose messages no matter what. Is there an out of the box LogAndRollbackExceptionHandler error handler or a way of implementing your own?
Thank you in advance for your inputs.

I've not worked a lot on Kafka, but when i did, i remember having issues such as the one you are describing in our system.
Let me tell you how we took care of our scenarios, maybe it would help you out too:
Scenario 1: If your messages are being lost at the publishing side (publisher --> kafka), you can configure Kafka acknowledgement setting according to your need, if you use spring cloud stream with kafka, the property is spring.cloud.stream.kafka.binder.required-acks
Possible values:
At most once (Ack=0)
Publisher does not care if Kafka acknowledges or not.
Send and forget
Data loss is possible
At least once (Ack=1)
If Kafka does not acknowledge, publisher resends message.
Possible duplication.
Acknowledgment is sent before message is copied to replicas.
Exactly once (Ack=all)
If Kafka does not acknowledge, publisher resends message.
However, if a message gets sent more than once to Kafka, there is no duplication.
Internal sequence number, used to decide if message has already been written on topic or not.
Min.insync.replicas property needs to be set to ensure what is the minimum number of replices that need to be synced before kafka acknowledges to the producer.
Scenario 2: If your data is being lost at the consumer side (kafka --> consumer), you can change the Auto Commit feature of Kafka according to your usage. This is the property if you are using Spring cloud stream spring.cloud.stream.kafka.bindings.input.consumer.AutoCommitOffset.
By default, AutoCommitOffset is true in kafka, and every message that is sent to the consumer is "committed" at Kafka's end, meaning it wont be sent again. However if you change AutoCommitOffset to false, you will have the power to poll the message from kafka in your code, and once you are done with your work, explicitly set commit to true to let kafka know that now you are done with the message.
If a message is not committed, kafka will keep resending it until it is.
Hope this helps you out, or atleast points you in the right direction.

Related

Message getting redelivered to RabbitMQ consumer setup using spring cloud stream

We have a SpringBoot service implementation in which we are using delayed messaging with the below setup:
Initial queue (Queue 1) that gets the message has a TTL set, the queue also has a dead letter exchange mentioned with a specific dead letter routing key.
Another queue (Queue 2) is bound to the DLX of the previous queue with the routing key which is set as the dead letter routing key
A consumer listens to the messages on Queue 2.
The delayed messaging seems to work as expected but I am seeing an issue with messages getting redelivered in certain scenarios.
If I have a debug point in my consumer and keep the message just after reading it for some time then once the current message has been processed consumer gets another message which has the below properties:
Redelivered property as true.
Property deliveryAttempt as 1
Only the first message has an x-death header and redelivered messages do not seem to have it.
The attempt to deliver the message is done 3 times as many times as I pause the consumer using the debug point each time after reading each redelivered message.
My understanding was that the acknowledgment mode by default is AUTO so once the consumer has read the message then it would not be redelivered?
I have tried using maxAttempts=1 property but does not seem to help.
I am using the spring cloud stream to create the consumers and the queues.

I used to run into this issue when the message processing in the consumer failed (exception thrown). In this case, if you have DLQ configured, make sure to add the following configuration as well so the failed message will be routed to the DLQ not the original listening queue.
"
rabbit:
autoBindDlq: true
"
Otherwise if you don't set up the DLQ, configure "autoBindDlq" to "false".

i.grpc.internal.AbstractClientStream - Received data on closed stream meaning

I have a Spring boot application (v2.2.10.RELEASE) that subscribes to multiple topics in pubSub and pulls async data and sends it to somewhere else. I am not using SpringGCP, just native google libraries
this is my subscriber setting:
// Instantiate an asynchronous message receiver.
MessageReceiver receiver =
(PubsubMessage message, AckReplyConsumer consumer) -> {
messages.add(message);
consumer.ack();
};
Subscriber subscriber = Subscriber.newBuilder(subscriptionName, receiver)
.setParallelPullCount(2)
.setFlowControlSettings(flowControlSettings)
.setCredentialsProvider(credentialsProvider)
.setExecutorProvider(executorProvider)
//.setChannelProvider()
.build();
With high traffic and big messages (2 - 4 kb) I encounter this info message:
[grpc-default-worker-ELG-1-1] INFO i.grpc.internal.AbstractClientStream - Received data on closed stream
first of all, I don't fully understand what that means? all that I noticed was that when this happens the delivered duplicated messages increase. so I assumed it meant that pubSub tried to reach the subscriber with some messages but the subscriber for some reason was not ready so pubSub will try to deliver the messages again. and hence more duplicates, is that right?
would this problem be solved using the TransportChannelProvider in subscribers? my understanding of the poorly written documentation, that this will create a new channel for delivery when the current in-use channel is closed, hence get rid of the previous log message.
if yes, how do I define the channel target string? and where can I find A NameResolver-compliant URI for the mangagedChannel. the snippet I mean is this:
private TransportChannelProvider getChannelProvider() {
ManagedChannel channel = ManagedChannelBuilder.forTarget(target).usePlaintext(true).build();
return FixedTransportChannelProvider.create(GrpcTransportChannel.create(channel));
}
I am pretty new to GCP so sorry if my question is not coherent enough

Using a custom TransportChannelProvider won't solve this type of issue. This is more likely an issue deeper down in the stack, e.g., at the gRPC level. There have been some open issues for this type of error [1, 2].
With regard to why it is causing duplicates, it is possible that the messages are getting delivered via a stream that is already closed (which aligns with the error message) because they were trapped in a lower-level buffer at the gRPC layer and therefore ended up being duplicates of messages that were subsequently delivered and processed via another stream. This could be a version of the issue discussed in the documentation around large backlogs of small messages. There was a fix for this issue in v1.109.0 of the Java client library, so if you are using a version older than that, it is worth updating.
If duplicates continue to be an issue, it would be best to reach out to support with the name of your subscription and the message IDs of some of the duplicate messages so that they can look at the delivery patterns for those messages and further diagnose if these redeliveries are unexpected.

How to delete a message from Queue in RabbitMQ

I am using Rabbit MQ to replicate what Jenkins does.
The only issue I am facing is, lets say, when 10 messages are in queue. And there are some duplicate messages which are in unacknowledged state.
And I need to delete those messages from queue, how do I achieve this?
My rabbitmq configuration is as follows, where each queue only has one consumer. So if I have 10 messages, all will get processed through same consumer's thread.
Queue queue = new Queue(sfdcConnectionDetails.getGitRepoId() + "_" + sfdcConnectionDetails.getBranchConnectedTo(), true);
rabbitMqSenderConfig.amqpAdmin().declareQueue(queue);
rabbitMqSenderConfig.amqpAdmin().declareBinding(BindingBuilder.bind(queue).to(new DirectExchange(byRepositoryRepositoryId.getRepository().getRepositoryId())).withQueueName());
RabbitMqConsumer container = new RabbitMqConsumer();
container.setConnectionFactory(rabbitMqSenderConfig.connectionFactory());
container.setQueueNames(queue.getName());
container.setConcurrentConsumers(1);
container.setMessageListener(new MessageListenerAdapter(new ConsumerHandler(****, ***), new Jackson2JsonMessageConverter()));
container.startConsumers();

You can use any plugin (e.g this) for deduplicating messages on the rabbit side.
Use cache on your consumer for detecting if the same message was processing recently.

As already suggested by #ekiryuhin, One of the approach you could take is assign a request_id tag it to the payload before producing message to RabbitMQ & on your consumer's end cache the request_id. Look out for the request_id if already present ignore payload and delete it.
This request_id might work as deduplication-id for your payloads.

Making sure a message published on a topic exchange is received by at least one consumer

TLDR; In the context of a topic exchange and queues created on the fly by the consumers, how to have a message redelivered / the producer notified when no consumer consumes the message?
I have the following components:
a main service, producing files. Each file has a certain category (e.g. pictures.profile, pictures.gallery)
a set of workers, consuming files and producing a textual output from them (e.g. the size of the file)
I currently have a single RabbitMQ topic exchange.
The producer sends messages to the exchange with routing_key = file_category.
Each consumer creates a queue and binds the exchange to this queue for a set of routing keys (e.g. pictures.* and videos.trending).
When a consumer has processed a file, it pushes the result in a processing_results queue.
Now - this works properly, but it still has a major issue. Currently, if the publisher sends a message with a routing key that no consumer is bound to, the message will be lost. This is because even if the queue created by the consumers is durable, it is destroyed as soon as the consumer disconnects since it is unique to this consumer.
Consumer code (python):
channel.exchange_declare(exchange=exchange_name, type='topic', durable=True)
result = channel.queue_declare(exclusive = True, durable=True)
queue_name = result.method.queue
topics = [ "pictures.*", "videos.trending" ]
for topic in topics:
channel.queue_bind(exchange=exchange_name, queue=queue_name, routing_key=topic)
channel.basic_consume(my_handler, queue=queue_name)
channel.start_consuming()
Loosing a message in this condition is not acceptable in my use case.
Attempted solution
However, "loosing" a message becomes acceptable if the producer is notified that no consumer received the message (in this case it can just resend it later). I figured out the mandatory field could help, since the specification of AMQP states:
This flag tells the server how to react if the message cannot be routed to a queue. If this flag is set, the server will return an unroutable message with a Return method.
This is indeed working - in the producer, I am able to register a ReturnListener :
rabbitMq.confirmSelect();
rabbitMq.addReturnListener( (int replyCode, String replyText, String exchange, String routingKey, AMQP.BasicProperties properties, byte[] body) -> {
log.info("A message was returned by the broker");
});
rabbitMq.basicPublish(exchangeName, "pictures.profile", true /* mandatory */, MessageProperties.PERSISTENT_TEXT_PLAIN, messageBytes);
This will as expected print A message was returned by the broker if a message is sent with a routing key no consumer is bound to.
Now, I also want to know when the message was correctly received by a consumer. So I tried registering a ConfirmListener as well:
rabbitMq.addConfirmListener(new ConfirmListener() {
void handleAck(long deliveryTag, boolean multiple) throws IOException {
log.info("ACK message {}, multiple = ", deliveryTag, multiple);
}
void handleNack(long deliveryTag, boolean multiple) throws IOException {
log.info("NACK message {}, multiple = ", deliveryTag, multiple);
}
});
The issue here is that the ACK is sent by the broker, not by the consumer itself. So when the producer sends a message with a routing key K:
If a consumer is bound to this routing key, the broker just sends an ACK
Otherwise, the broker sends a basic.return followed by a ACK
Cf the docs:
For unroutable messages, the broker will issue a confirm once the exchange verifies a message won't route to any queue (returns an empty list of queues). If the message is also published as mandatory, the basic.return is sent to the client before basic.ack. The same is true for negative acknowledgements (basic.nack).
So while my problem is theoretically solvable using this, it would make the logic of knowing if a message was correctly consumed very complicated (especially in the context of multi threading, persistence in a database, etc.):
send a message
on receive ACK:
if no basic.return was received for this message
the message was correctly consumed
else
the message wasn't correctly consumed
on receive basic.return
the message wasn't correctly consumed
Possible other solutions
Have a queue for each file category, i.e. the queues pictures_profile, pictures_gallery, etc. Not good since it removes a lot of flexibility for the consumers
Have a "response timeout" logic in the producer. The producer sends a message. It expects an "answer" for this message in the processing_results queue. A solution would be to resend the message if it hasn't been answered to after X seconds. I don't like it though, it would create some additional tricky logic in the producer.
Produce the messages with a TTL of 0, and have the producer listen on a dead-letter exchange. This is the official suggested solution to replace the 'immediate' flag removed in RabbitMQ 3.0 (see paragraph Removal of "immediate" flag). According to the docs of the dead letter exchanges, a dead letter exchange can only be configured per-queue. So it wouldn't work here
[edit] A last solution I see is to have every consumer create a durable queue that isn't destroyed when he disconnects, and have it listen on it. Example: consumer1 creates queue-consumer-1 that is bound to the message of myExchange having a routing key abcd. The issue I foresee is that it implies to find an unique identifier for every consumer application instance (e.g. hostname of the machine it runs on).
I would love to have some inputs on that - thanks!
Related to:
RabbitMQ: persistent message with Topic exchange (not applicable here since queues are created "on the fly")
Make sure the broker holds messages until at least one consumer gets it
RabbitMQ Topic Exchange with persisted queue
[edit] Solution
I ended up implementing something that uses a basic.return, as mentioned earlier. It is actually not so tricky to implement, you just have to make sure that your method producing the messages and the method handling the basic returns are synchronized (or have a shared lock if not in the same class), otherwise you can end up with interleaved execution flows that will mess up your business logic.

I believe that an alternate exchange would be the best fit for your use case for the part regarding the identification of not routed messages.
Whenever an exchange with a configured AE cannot route a message to any queue, it publishes the message to the specified AE instead.
Basically upon creation of the "main" exchange, you configure an alternate exchange for it.
For the referenced alternate exchange, I tend to go with a fanout, then create a queue (notroutedq) binded to it.
This means any message that is not published to at least one of the queues bound to your "main" exchange will end up in the notroutedq
Now regarding your statement:
because even if the queue created by the consumers is durable, it is destroyed as soon as the consumer disconnects since it is unique to this consumer.
Seems that you have configured your queues with auto-delete set to true.
If so, in case of disconnect, as you stated, the queue is destroyed and the messages still present on the queue are lost, case not covered by the alternate exchange configuration.
It's not clear from your use case description whether you'd expect in some cases for a message to end up in more than one queue, seemed more a case of one queue per type of processing expected (while keeping the grouping flexible). If indeed the queue split is related to type of processing, I do not see the benefit of setting the queue with auto-delete, expect maybe not having to do any cleanup maintenance when you want to change the bindings.
Assuming you can go with durable queues, then a dead letter exchange (would again go with fanout) with a binding to a dlq would cover the missing cases.
not routed covered by alternate exchange
correct processing already handled by your processing_result queue
problematic processing or too long to be processed covered by the dead letter exchange, in which case the additional headers added upon dead lettering the message can even help to identify the type of actions to take

How can I send large messages with Kafka (over 15MB)?

I send String-messages to Kafka V. 0.8 with the Java Producer API.
If the message size is about 15 MB I get a MessageSizeTooLargeException.
I have tried to set message.max.bytesto 40 MB, but I still get the exception. Small messages worked without problems.
(The exception appear in the producer, I don't have a consumer in this application.)
What can I do to get rid of this exception?
My example producer config
private ProducerConfig kafkaConfig() {
Properties props = new Properties();
props.put("metadata.broker.list", BROKERS);
props.put("serializer.class", "kafka.serializer.StringEncoder");
props.put("request.required.acks", "1");
props.put("message.max.bytes", "" + 1024 * 1024 * 40);
return new ProducerConfig(props);
}
Error-Log:
4709 [main] WARN kafka.producer.async.DefaultEventHandler - Produce request with correlation id 214 failed due to [datasift,0]: kafka.common.MessageSizeTooLargeException
4869 [main] WARN kafka.producer.async.DefaultEventHandler - Produce request with correlation id 217 failed due to [datasift,0]: kafka.common.MessageSizeTooLargeException
5035 [main] WARN kafka.producer.async.DefaultEventHandler - Produce request with correlation id 220 failed due to [datasift,0]: kafka.common.MessageSizeTooLargeException
5198 [main] WARN kafka.producer.async.DefaultEventHandler - Produce request with correlation id 223 failed due to [datasift,0]: kafka.common.MessageSizeTooLargeException
5305 [main] ERROR kafka.producer.async.DefaultEventHandler - Failed to send requests for topics datasift with correlation ids in [213,224]
kafka.common.FailedToSendMessageException: Failed to send messages after 3 tries.
at kafka.producer.async.DefaultEventHandler.handle(Unknown Source)
at kafka.producer.Producer.send(Unknown Source)
at kafka.javaapi.producer.Producer.send(Unknown Source)

You need to adjust three (or four) properties:
Consumer side:fetch.message.max.bytes - this will determine the largest size of a message that can be fetched by the consumer.
Broker side: replica.fetch.max.bytes - this will allow for the replicas in the brokers to send messages within the cluster and make sure the messages are replicated correctly. If this is too small, then the message will never be replicated, and therefore, the consumer will never see the message because the message will never be committed (fully replicated).
Broker side: message.max.bytes - this is the largest size of the message that can be received by the broker from a producer.
Broker side (per topic): max.message.bytes - this is the largest size of the message the broker will allow to be appended to the topic. This size is validated pre-compression. (Defaults to broker's message.max.bytes.)
I found out the hard way about number 2 - you don't get ANY exceptions, messages, or warnings from Kafka, so be sure to consider this when you are sending large messages.

Minor changes required for Kafka 0.10 and the new consumer compared to laughing_man's answer:
Broker: No changes, you still need to increase properties message.max.bytes and replica.fetch.max.bytes. message.max.bytes has to be equal or smaller(*) than replica.fetch.max.bytes.
Producer: Increase max.request.size to send the larger message.
Consumer: Increase max.partition.fetch.bytes to receive larger messages.
(*) Read the comments to learn more about message.max.bytes<=replica.fetch.max.bytes

The answer from #laughing_man is quite accurate. But still, I wanted to give a recommendation which I learned from Kafka expert Stephane Maarek. We actively applied this solution in our live systems.
Kafka isn’t meant to handle large messages.
Your API should use cloud storage (for example, AWS S3) and simply push a reference to S3 to Kafka or any other message broker. You'll need to find a place to save your data, whether it can be a network drive or something else entirely, but it shouldn't be a message broker.
If you don't want to proceed with the recommended and reliable solution above,
The message max size is 1MB (the setting in your brokers is called message.max.bytes) Apache Kafka. If you really needed it badly, you could increase that size and make sure to increase the network buffers for your producers and consumers.
And if you really care about splitting your message, make sure each message split has the exact same key so that it gets pushed to the same partition, and your message content should report a “part id” so that your consumer can fully reconstruct the message.
If the message is text-based try to compress the data, which may reduce the data size, but not magically.
Again, you have to use an external system to store that data and just push an external reference to Kafka. That is a very common architecture and one you should go with and widely accepted.
Keep that in mind Kafka works best only if the messages are huge in amount but not in size.
Source: https://www.quora.com/How-do-I-send-Large-messages-80-MB-in-Kafka

The idea is to have equal size of message being sent from Kafka Producer to Kafka Broker and then received by Kafka Consumer i.e.
Kafka producer --> Kafka Broker --> Kafka Consumer
Suppose if the requirement is to send 15MB of message, then the Producer, the Broker and the Consumer, all three, needs to be in sync.
Kafka Producer sends 15 MB --> Kafka Broker Allows/Stores 15 MB --> Kafka Consumer receives 15 MB
The setting therefore should be:
a) on Broker:
message.max.bytes=15728640
replica.fetch.max.bytes=15728640
b) on Consumer:
fetch.message.max.bytes=15728640

You need to override the following properties:
Broker Configs($KAFKA_HOME/config/server.properties)
replica.fetch.max.bytes
message.max.bytes
Consumer Configs($KAFKA_HOME/config/consumer.properties)
This step didn't work for me. I add it to the consumer app and it was working fine
fetch.message.max.bytes
Restart the server.
look at this documentation for more info:
http://kafka.apache.org/08/configuration.html

I think, most of the answers here are kind of outdated or not entirely complete.
To refer on the answer of Sacha Vetter (with the update for Kafka 0.10), I'd like to provide some additional Information and links to the official documentation.
Producer Configuration:
max.request.size (Link) has to be increased for files bigger than 1 MB, otherwise they are rejected
Broker/Topic configuration:
message.max.bytes (Link) may be set, if one like to increase the message size on broker level. But, from the documentation: "This can be set per topic with the topic level max.message.bytes config."
max.message.bytes (Link) may be increased, if only one topic should be able to accept lager files. The broker configuration must not be changed.
I'd always prefer a topic-restricted configuration, due to the fact, that I can configure the topic by myself as a client for the Kafka cluster (e.g. with the admin client). I may not have any influence on the broker configuration itself.
In the answers from above, some more configurations are mentioned as necessary:
replica.fetch.max.bytes (Link) (Broker config)
From the documentation: "This is not an absolute maximum, if the first record batch in the first non-empty partition of the fetch is larger than this value, the record batch will still be returned to ensure that progress can be made."
max.partition.fetch.bytes (Link) (Consumer config)
From the documentation: "Records are fetched in batches by the consumer. If the first record batch in the first non-empty partition of the fetch is larger than this limit, the batch will still be returned to ensure that the consumer can make progress."
fetch.max.bytes (Link) (Consumer config; not mentioned above, but same category)
From the documentation: "Records are fetched in batches by the consumer, and if the first record batch in the first non-empty partition of the fetch is larger than this value, the record batch will still be returned to ensure that the consumer can make progress."
Conclusion: The configurations regarding fetching messages are not necessary to change for processing messages, lager than the default values of these configuration (had this tested in a small setup). Probably, the consumer may always get batches of size 1. However, two of the configurations from the first block has to be set, as mentioned in the answers before.
This clarification should not tell anything about performance and should not be a recommendation to set or not to set these configuration. The best values has to be evaluated individually depending on the concrete planned throughput and data structure.

One key thing to remember that message.max.bytes attribute must be in sync with the consumer's fetch.message.max.bytes property. the fetch size must be at least as large as the maximum message size otherwise there could be situation where producers can send messages larger than the consumer can consume/fetch. It might worth taking a look at it.
Which version of Kafka you are using? Also provide some more details trace that you are getting. is there some thing like ... payload size of xxxx larger
than 1000000 coming up in the log?

For people using landoop kafka:
You can pass the config values in the environment variables like:
docker run -d --rm -p 2181:2181 -p 3030:3030 -p 8081-8083:8081-8083 -p 9581-9585:9581-9585 -p 9092:9092
-e KAFKA_TOPIC_MAX_MESSAGE_BYTES=15728640 -e KAFKA_REPLICA_FETCH_MAX_BYTES=15728640 landoop/fast-data-dev:latest `
This sets topic.max.message.bytes and replica.fetch.max.bytes on the broker.
And if you're using rdkafka then pass the message.max.bytes in the producer config like:
const producer = new Kafka.Producer({
'metadata.broker.list': 'localhost:9092',
'message.max.bytes': '15728640',
'dr_cb': true
});
Similarly, for the consumer,
const kafkaConf = {
"group.id": "librd-test",
"fetch.message.max.bytes":"15728640",
... .. }

Here is how I achieved successfully sending data up to 100mb using kafka-python==2.0.2:
Broker:
consumer = KafkaConsumer(
...
max_partition_fetch_bytes=max_bytes,
fetch_max_bytes=max_bytes,
)
Producer (See final solution at the end):
producer = KafkaProducer(
...
max_request_size=KafkaSettings.MAX_BYTES,
)
Then:
producer.send(topic, value=data).get()
After sending data like this, the following exception appeared:
MessageSizeTooLargeError: The message is n bytes when serialized which is larger than the total memory buffer you have configured with the buffer_memory configuration.
Finally I increased buffer_memory (default 32mb) to receive the message on the other end.
producer = KafkaProducer(
...
max_request_size=KafkaSettings.MAX_BYTES,
buffer_memory=KafkaSettings.MAX_BYTES * 3,
)

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.