We have a streams application that consumes messages from a source topic, does some processing and forward the results to a destination topic.
The structure of the messages are controlled by some avro schemas.
When starting consuming messages if the schema is not cached yet the application will try to retrieve it from schema registry. If for whichever reason the schema registry is not available (say a network glitch) then the currently being processed message is lost because the default handler is something called LogAndContinueExceptionHandler.
o.a.k.s.e.LogAndContinueExceptionHandler : Exception caught during Deserialization, taskId: 1_5, topic: my.topic.v1, partition: 5, offset: 142768
org.apache.kafka.common.errors.SerializationException: Error retrieving Avro schema for id 62
Caused by: java.net.SocketTimeoutException: connect timed out
at java.base/java.net.PlainSocketImpl.socketConnect(Native Method) ~[na:na]
...
o.a.k.s.p.internals.RecordDeserializer : stream-thread [my-app-StreamThread-3] task [1_5] Skipping record due to deserialization error. topic=[my.topic.v1] partition=[5] offset=[142768]
...
So my question is what would be the proper way of dealing with situations like described above and make sure you don't lose messages no matter what. Is there an out of the box LogAndRollbackExceptionHandler error handler or a way of implementing your own?
Thank you in advance for your inputs.
I've not worked a lot on Kafka, but when i did, i remember having issues such as the one you are describing in our system.
Let me tell you how we took care of our scenarios, maybe it would help you out too:
Scenario 1: If your messages are being lost at the publishing side (publisher --> kafka), you can configure Kafka acknowledgement setting according to your need, if you use spring cloud stream with kafka, the property is spring.cloud.stream.kafka.binder.required-acks
Possible values:
At most once (Ack=0)
Publisher does not care if Kafka acknowledges or not.
Send and forget
Data loss is possible
At least once (Ack=1)
If Kafka does not acknowledge, publisher resends message.
Possible duplication.
Acknowledgment is sent before message is copied to replicas.
Exactly once (Ack=all)
If Kafka does not acknowledge, publisher resends message.
However, if a message gets sent more than once to Kafka, there is no duplication.
Internal sequence number, used to decide if message has already been written on topic or not.
Min.insync.replicas property needs to be set to ensure what is the minimum number of replices that need to be synced before kafka acknowledges to the producer.
Scenario 2: If your data is being lost at the consumer side (kafka --> consumer), you can change the Auto Commit feature of Kafka according to your usage. This is the property if you are using Spring cloud stream spring.cloud.stream.kafka.bindings.input.consumer.AutoCommitOffset.
By default, AutoCommitOffset is true in kafka, and every message that is sent to the consumer is "committed" at Kafka's end, meaning it wont be sent again. However if you change AutoCommitOffset to false, you will have the power to poll the message from kafka in your code, and once you are done with your work, explicitly set commit to true to let kafka know that now you are done with the message.
If a message is not committed, kafka will keep resending it until it is.
Hope this helps you out, or atleast points you in the right direction.
We are using Prometheus and Grafana for monitoring our Kafka cluster.
In our application, we use Kafka streams and there is a chance that Kafka stream getting stopped due to exception. We are logging the event setUnCaughtExceptionHandler but, we also need some kind of alerting when the stream stops.
What we currently have is, jmx_exporter running as a agent and exposes Kafka metrics through an endpoint and prometheus fetches the metrics from the endpoint.
We don't see any kind of metrics which gives the count of active consumers per topic. Are we missing something? Any suggestions on how to get the number of active consumers and send alerts when the consumer stops.
we had similar needs and added Kafka Consumer Lag per partition into Grafana, and also added alerts if lag is more than specified threshold (threshold should be different per each topic, depending on load, e.g. for some topics it could be 10, and for highly loaded - 100000). so if you have more that e.g. 1000 unprocessed messages, you will get alert.
you could add state listener for each kafka stream and in case stream is in error state, log error or send email:
kafkaStream.setStateListener((newState, oldState) -> {
log.info("Kafka stream state changed [{}] >>>>> [{}]", oldState, newState);
if (newState == KafkaStreams.State.ERROR || newState == KafkaStreams.State.PENDING_SHUTDOWN) {
log.error("Kafka Stream is in [{}] state. Application should be restarted", newState);
}
});
also you could add health check indicator (e.g. via REST endpoint or via spring-boot HealthIndicator) that provides info whether stream is running or not:
KafkaStreams.State streamState = kafkaStream.state();
state.isRunning();
I also haven't found any kafka streams metrics which provide info about active consumers or available connected partitions, but as for me it would be nice if kafka streams provide such data (and hope it will be available in future releases).
I have a unique problem which is happening like 50-100 times a day with message volume of ~2 millions per day on the topic.I am using Kafka producer API 0.8.2.1 and I have 12 brokers (v 0.8.2.2) running in prod with replication of 4. I have a topic with 60 partitions and I am calculating partition for all my messages and providing the value in the ProducerRecord itself. Now, the issue -
Application creates 'ProducerRecord' using -
new ProducerRecord<String, String>(topic, 30, null, message1);
providing topic, value message1 and partition 30. Then application call the send method and future is returned -
// null is for callback
Future<RecordMetadata> future = producer.send(producerRecord. null);
Now, app prints the offset and partition value by calling get on Future and then getting values from RecordMetadata - this is what i get -
Kafka Response : partition 30, offset 3416092
Now, the app produce the next message - message2 to same partition -
new ProducerRecord<String, String>(topic, 30, null, message2);
and kafka response -
Kafka Response : partition 30, offset 3416092
I receive the same offset again, and if I pull message from the offset of partition 30 using simple consumer, it ends up being the message2 which essentially mean i lost the message1.
Based on KafkaProducer documentation KafkaProducer, I am using a single producer instance (static instance shared) among 10 threads.
The producer is thread safe and should generally be shared among all threads for best performance.
I am using all default properties for producer (except max.request.size: 10000000), the message (String payload) size can be a few kbs to a 500 kbs. I am using ack value of 1.
What am i doing wrong here? Is there something I can look into or any producer property or server property I can tweak to make sure i don't lose any messages. I need some help here soon as I am losing some critical messages in production which is not good at all coz because of no exception its even hard to find out the message lost unless downstream process reports it.
EDIT:
The servers and client are now updated to kafka version 0.8.2.2. Also, the 10 app threads each use their own instance of kafka producer now. We are seeing better performance but still there is message loss.
Producer Properties:
value.serializer: org.apache.kafka.common.serialization.StringSerializer
key.serializer: org.apache.kafka.common.serialization.StringSerializer
bootstrap.servers: {SERVER VIP ENDPOINT}
acks: 1
batch.size: 204800
linger.ms: 10
send.buffer.bytes: 1048576
max.request.size: 10000000
I send String-messages to Kafka V. 0.8 with the Java Producer API.
If the message size is about 15 MB I get a MessageSizeTooLargeException.
I have tried to set message.max.bytesto 40 MB, but I still get the exception. Small messages worked without problems.
(The exception appear in the producer, I don't have a consumer in this application.)
What can I do to get rid of this exception?
My example producer config
private ProducerConfig kafkaConfig() {
Properties props = new Properties();
props.put("metadata.broker.list", BROKERS);
props.put("serializer.class", "kafka.serializer.StringEncoder");
props.put("request.required.acks", "1");
props.put("message.max.bytes", "" + 1024 * 1024 * 40);
return new ProducerConfig(props);
}
Error-Log:
4709 [main] WARN kafka.producer.async.DefaultEventHandler - Produce request with correlation id 214 failed due to [datasift,0]: kafka.common.MessageSizeTooLargeException
4869 [main] WARN kafka.producer.async.DefaultEventHandler - Produce request with correlation id 217 failed due to [datasift,0]: kafka.common.MessageSizeTooLargeException
5035 [main] WARN kafka.producer.async.DefaultEventHandler - Produce request with correlation id 220 failed due to [datasift,0]: kafka.common.MessageSizeTooLargeException
5198 [main] WARN kafka.producer.async.DefaultEventHandler - Produce request with correlation id 223 failed due to [datasift,0]: kafka.common.MessageSizeTooLargeException
5305 [main] ERROR kafka.producer.async.DefaultEventHandler - Failed to send requests for topics datasift with correlation ids in [213,224]
kafka.common.FailedToSendMessageException: Failed to send messages after 3 tries.
at kafka.producer.async.DefaultEventHandler.handle(Unknown Source)
at kafka.producer.Producer.send(Unknown Source)
at kafka.javaapi.producer.Producer.send(Unknown Source)
You need to adjust three (or four) properties:
Consumer side:fetch.message.max.bytes - this will determine the largest size of a message that can be fetched by the consumer.
Broker side: replica.fetch.max.bytes - this will allow for the replicas in the brokers to send messages within the cluster and make sure the messages are replicated correctly. If this is too small, then the message will never be replicated, and therefore, the consumer will never see the message because the message will never be committed (fully replicated).
Broker side: message.max.bytes - this is the largest size of the message that can be received by the broker from a producer.
Broker side (per topic): max.message.bytes - this is the largest size of the message the broker will allow to be appended to the topic. This size is validated pre-compression. (Defaults to broker's message.max.bytes.)
I found out the hard way about number 2 - you don't get ANY exceptions, messages, or warnings from Kafka, so be sure to consider this when you are sending large messages.
Minor changes required for Kafka 0.10 and the new consumer compared to laughing_man's answer:
Broker: No changes, you still need to increase properties message.max.bytes and replica.fetch.max.bytes. message.max.bytes has to be equal or smaller(*) than replica.fetch.max.bytes.
Producer: Increase max.request.size to send the larger message.
Consumer: Increase max.partition.fetch.bytes to receive larger messages.
(*) Read the comments to learn more about message.max.bytes<=replica.fetch.max.bytes
The answer from #laughing_man is quite accurate. But still, I wanted to give a recommendation which I learned from Kafka expert Stephane Maarek. We actively applied this solution in our live systems.
Kafka isn’t meant to handle large messages.
Your API should use cloud storage (for example, AWS S3) and simply push a reference to S3 to Kafka or any other message broker. You'll need to find a place to save your data, whether it can be a network drive or something else entirely, but it shouldn't be a message broker.
If you don't want to proceed with the recommended and reliable solution above,
The message max size is 1MB (the setting in your brokers is called message.max.bytes) Apache Kafka. If you really needed it badly, you could increase that size and make sure to increase the network buffers for your producers and consumers.
And if you really care about splitting your message, make sure each message split has the exact same key so that it gets pushed to the same partition, and your message content should report a “part id” so that your consumer can fully reconstruct the message.
If the message is text-based try to compress the data, which may reduce the data size, but not magically.
Again, you have to use an external system to store that data and just push an external reference to Kafka. That is a very common architecture and one you should go with and widely accepted.
Keep that in mind Kafka works best only if the messages are huge in amount but not in size.
Source: https://www.quora.com/How-do-I-send-Large-messages-80-MB-in-Kafka
The idea is to have equal size of message being sent from Kafka Producer to Kafka Broker and then received by Kafka Consumer i.e.
Kafka producer --> Kafka Broker --> Kafka Consumer
Suppose if the requirement is to send 15MB of message, then the Producer, the Broker and the Consumer, all three, needs to be in sync.
Kafka Producer sends 15 MB --> Kafka Broker Allows/Stores 15 MB --> Kafka Consumer receives 15 MB
The setting therefore should be:
a) on Broker:
message.max.bytes=15728640
replica.fetch.max.bytes=15728640
b) on Consumer:
fetch.message.max.bytes=15728640
You need to override the following properties:
Broker Configs($KAFKA_HOME/config/server.properties)
replica.fetch.max.bytes
message.max.bytes
Consumer Configs($KAFKA_HOME/config/consumer.properties)
This step didn't work for me. I add it to the consumer app and it was working fine
fetch.message.max.bytes
Restart the server.
look at this documentation for more info:
http://kafka.apache.org/08/configuration.html
I think, most of the answers here are kind of outdated or not entirely complete.
To refer on the answer of Sacha Vetter (with the update for Kafka 0.10), I'd like to provide some additional Information and links to the official documentation.
Producer Configuration:
max.request.size (Link) has to be increased for files bigger than 1 MB, otherwise they are rejected
Broker/Topic configuration:
message.max.bytes (Link) may be set, if one like to increase the message size on broker level. But, from the documentation: "This can be set per topic with the topic level max.message.bytes config."
max.message.bytes (Link) may be increased, if only one topic should be able to accept lager files. The broker configuration must not be changed.
I'd always prefer a topic-restricted configuration, due to the fact, that I can configure the topic by myself as a client for the Kafka cluster (e.g. with the admin client). I may not have any influence on the broker configuration itself.
In the answers from above, some more configurations are mentioned as necessary:
replica.fetch.max.bytes (Link) (Broker config)
From the documentation: "This is not an absolute maximum, if the first record batch in the first non-empty partition of the fetch is larger than this value, the record batch will still be returned to ensure that progress can be made."
max.partition.fetch.bytes (Link) (Consumer config)
From the documentation: "Records are fetched in batches by the consumer. If the first record batch in the first non-empty partition of the fetch is larger than this limit, the batch will still be returned to ensure that the consumer can make progress."
fetch.max.bytes (Link) (Consumer config; not mentioned above, but same category)
From the documentation: "Records are fetched in batches by the consumer, and if the first record batch in the first non-empty partition of the fetch is larger than this value, the record batch will still be returned to ensure that the consumer can make progress."
Conclusion: The configurations regarding fetching messages are not necessary to change for processing messages, lager than the default values of these configuration (had this tested in a small setup). Probably, the consumer may always get batches of size 1. However, two of the configurations from the first block has to be set, as mentioned in the answers before.
This clarification should not tell anything about performance and should not be a recommendation to set or not to set these configuration. The best values has to be evaluated individually depending on the concrete planned throughput and data structure.
One key thing to remember that message.max.bytes attribute must be in sync with the consumer's fetch.message.max.bytes property. the fetch size must be at least as large as the maximum message size otherwise there could be situation where producers can send messages larger than the consumer can consume/fetch. It might worth taking a look at it.
Which version of Kafka you are using? Also provide some more details trace that you are getting. is there some thing like ... payload size of xxxx larger
than 1000000 coming up in the log?
For people using landoop kafka:
You can pass the config values in the environment variables like:
docker run -d --rm -p 2181:2181 -p 3030:3030 -p 8081-8083:8081-8083 -p 9581-9585:9581-9585 -p 9092:9092
-e KAFKA_TOPIC_MAX_MESSAGE_BYTES=15728640 -e KAFKA_REPLICA_FETCH_MAX_BYTES=15728640 landoop/fast-data-dev:latest `
This sets topic.max.message.bytes and replica.fetch.max.bytes on the broker.
And if you're using rdkafka then pass the message.max.bytes in the producer config like:
const producer = new Kafka.Producer({
'metadata.broker.list': 'localhost:9092',
'message.max.bytes': '15728640',
'dr_cb': true
});
Similarly, for the consumer,
const kafkaConf = {
"group.id": "librd-test",
"fetch.message.max.bytes":"15728640",
... .. }
Here is how I achieved successfully sending data up to 100mb using kafka-python==2.0.2:
Broker:
consumer = KafkaConsumer(
...
max_partition_fetch_bytes=max_bytes,
fetch_max_bytes=max_bytes,
)
Producer (See final solution at the end):
producer = KafkaProducer(
...
max_request_size=KafkaSettings.MAX_BYTES,
)
Then:
producer.send(topic, value=data).get()
After sending data like this, the following exception appeared:
MessageSizeTooLargeError: The message is n bytes when serialized which is larger than the total memory buffer you have configured with the buffer_memory configuration.
Finally I increased buffer_memory (default 32mb) to receive the message on the other end.
producer = KafkaProducer(
...
max_request_size=KafkaSettings.MAX_BYTES,
buffer_memory=KafkaSettings.MAX_BYTES * 3,
)
I have a publisher that is pushing messages to a topic. I have multiple subscribers each doing a different task once they consume the message from the topic.
Now I want my system to scale to multiple instances of the same process running on different hosts/same host. e.g. I want to run multiple copies of my application A on different hosts so that if one instance of A is slow, then the other instances can pull in subsequent messages and make forward progress..
I found out that this is possible using virtual destinations. I followed the steps here -
http://activemq.apache.org/virtual-destinations.html
But how do i setup my multiple subscribers to the same topic with the same client id? when i try to do that, i get errors. when i try some other way, it doesn't work. can someone help?
Normally, I start a subscriber by doing the below steps -
ActiveMQConnectionFactory connectionFactory = new ActiveMQConnectionFactory(ActiveMQConnection.DEFAULT_USER, ActiveMQConnection.DEFAULT_PASSWORD, ActiveMQConnection.DEFAULT_BROKER_URL;);
activeMQConnection = connectionFactory.createConnection();
activeMQConnection.setClientID("subscriber1");
activeMQConnection.setExceptionListener(exceptionListener);
activeMQSession = activeMQConnection.createSession(false, Session.CLIENT_ACKNOWLEDGE);
activeMQTopic = activeMQSession.createTopic("myTopic");
activeConsumer = activeMQSession.createDurableSubscriber(activeMQTopic, "myTopic");
activeConsumer.setMessageListener(messageListener);
activeMQConnection.start();
when i try to create a 2nd subscriber and pass the topic name as "VirtualTopic.myTopic", nothing happens.
thanks
The virtual topics feature is very simple and quite powerful once you understand it.
When using virtual topics - there is no need for durable consumers. That is because for each client you will get an instance of regular queue created. If you have 5 clients (application A, B, C, D, E) you will get 5 queues created and populated with the copy of the messages every time message is sent to the virtual topic.
Actually it is a limitation of durable consumer - that only ONE connection is allowed per clientId. Being a regular queue, you can create as many consumers as you like and queue will guarantee that 1 message will be received only by 1 consumer. So if you have application A that takes 1 minute to process a message, you can create 5 instances of it listening to the same queue. When you will post 5 messages within 1 second, each of your application will receive its own message to process.
There are not well documented requirements which are not intuitive. To make virtual topic work you need
Use VirtualTopic. in your topic name, for example VirtualTopic.Orders (this prefix can be configured)
Use Consumer. in the name of the queue you. Like Consumer.ApplicationA.VirtualTopic.Orders where ApplicationA is actually your client id
Use regular subscribers not durable ones for the queue above.
Example:
string activeMqConsumerTopic = "Consumer.AmqTestConsumer.VirtualTopic.Orders";
IQueue queue = SessionUtil.GetQueue(session, activeMqConsumerTopic);
IMessageConsumer consumer = session.CreateConsumer(queue);
Queue is created for automatically whenever the first instance of consumer is subscribed to it. Since that moment all messages that are sent to topic are duplicated/copied into all associated queues.
Hope this helps.
Virtual Topics is the answer for you. However you have to define a naming standard for all virtual topic queues. Here is the answer for this:
Virtual Topics helps with following prospective:
1. Load Balancing of messages
2. Fast Failover of Subscriber
3. Re-using same connection Factory for different Producers and Consumers. (Durable Subscribers needs a Unique JMS Client Id and same cannot be reused for any other Producer or consumer)
here is the way to do it, below example creates prefix VTCON.*. So every queue with this prefix and Topic Name at the end will consumer the message.
<virtualDestinations>
<virtualTopic name="TEST.TP01" prefix="VTCON.*." selectorAware="false"/>
</virtualDestinations>
http://workingwithqueues.blogspot.com/2012/05/activemq-virtual-topics-or-virtual.html