I hit an issue on a production system where one message was left unacked for 30 minutes, which led to the consumer being shut down. I have now added shutdown listeners as described in the RabbitMQ docs:
https://rabbitmq.github.io/rabbitmq-java-client/api/4.x.x/com/rabbitmq/client/ShutdownListener.html
    if (cause.isHardError()) {
        log.error("Connection error with cause : {}", cause);
        Connection conn = (Connection) cause.getReference();
        if (!cause.isInitiatedByApplication()) {
            Method reason = cause.getReason();
            log.error("Rabbit Mq Consumer Connection Shutdown : {} {}", reason, cause);
        }
    } else {
        Channel ch = (Channel) cause.getReference();
        log.error("Channel error details : {}", ch);
    }
});
The problem is that the listener is not invoked at all in testing. I tried triggering it in two ways:
1. Through the unacked delivery timeout: I threw a general exception and never acked the message (these were the original conditions of the bug). This didn't trigger it.
2. I called channel.close() to shut down the consumer, but still didn't receive an event.
I'm looking for a way to reproduce the issue I faced and to test/trigger the shutdown listeners. Thanks.
I'm using Kafka + Redis in my project.
I read a message from Kafka, process it, and save it to Redis, but after the code has been running for some time it fails with the error below:
io.smallrye.mutiny.TimeoutException
at io.smallrye.mutiny.operators.uni.UniBlockingAwait.await(UniBlockingAwait.java:64)
at io.smallrye.mutiny.groups.UniAwait.atMost(UniAwait.java:65)
at io.quarkus.redis.client.runtime.RedisClientImpl.await(RedisClientImpl.java:1046)
at io.quarkus.redis.client.runtime.RedisClientImpl.set(RedisClientImpl.java:687)
at worker.redis.process.implementation.ProductImplementation.refresh(ProductImplementation.java:34)
at worker.redis.Worker.refresh(Worker.java:51)
at kafka.InComingProductKafkaConsume.lambda$consume$0(InComingProductKafkaConsume.java:38)
at business.core.hpithead.ThreadStart.doRun(ThreadStart.java:34)
at business.core.hpithead.core.NotifyingThread.run(NotifyingThread.java:27)
at java.base/java.lang.Thread.run(Thread.java:833)
The record 51761 from topic-partition 'mer-outgoing-master-item-0' has waited for 153 seconds to be acknowledged. This waiting time is greater than the configured threshold (150000 ms). At the moment 2 messages from this partition are awaiting acknowledgement. The last committed offset for this partition was 51760. This error is due to a potential issue in the application which does not acknowledged the records in a timely fashion. The connector cannot commit as a record processing has not completed.
@Incoming("mer_product")
@Blocking
public CompletionStage<Void> consume2(Message<String> payload) {
    var objectDto = configThreadLocal.mapper.readValue(payload.getPayload(), new TypeReference<KafkaPayload<ItemKO>>(){});
    worker.refresh(objectDto.payload.castDto());
    return payload.ack();
}
Can anyone please tell me about this exception.
ERROR [kafka-producer-network-thread | producer-2] c.o.p.a.s.CalculatorAdapter [CalculatorAdapter.java:285]
Cannot send outgoingDto with decision id = 46d1-9491-123ce9c7a916 in kafka:
org.springframework.kafka.core.KafkaProducerException: Failed to send;
nested exception is org.apache.kafka.common.errors.TimeoutException:
Expiring 1 record(s) for save-request-0:604351 ms has passed since batch creation
at org.springframework.kafka.core.KafkaTemplate.lambda$buildCallback$4(KafkaTemplate.java:602)
at org.springframework.kafka.core.DefaultKafkaProducerFactory$CloseSafeProducer$1.onCompletion(DefaultKafkaProducerFactory.java:871)
at org.apache.kafka.clients.producer.KafkaProducer$InterceptorCallback.onCompletion(KafkaProducer.java:1356)
at org.apache.kafka.clients.producer.internals.ProducerBatch.completeFutureAndFireCallbacks(ProducerBatch.java:231)
at org.apache.kafka.clients.producer.internals.ProducerBatch.done(ProducerBatch.java:197)
at org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:676)
at org.apache.kafka.clients.producer.internals.Sender.sendProducerData(Sender.java:380)
at org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:323)
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:239)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.apache.kafka.common.errors.TimeoutException:
Expiring 1 record(s) for save-request-0:604351 ms has passed since batch creation
I have been fighting this for two weeks now.
I have tried a number of suggested fixes, but none of them helped.
My program sends messages of about 60 kilobytes each, but they never reach the Kafka server.
The entire Java application log is filled with exceptions of this kind.
My guess is that filling the batch takes longer than the transaction lasts, so the message is not sent.
// example
Properties props = new Properties();
...
props.put(ProducerConfig.BATCH_SIZE_CONFIG, 60000); // 60kb
...
Producer producer = new KafkaProducer<>(props);
Check out these articles:
Kafka Producer Batch
Kafka Producer batch size
Batch size configuration
http://cloudurable.com/blog/kafka-tutorial-kafka-producer-advanced-java-examples/index.html
https://kafka.apache.org/26/javadoc/org/apache/kafka/clients/producer/ProducerConfig.html
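The exception in the question reports a batch expiring about 604 seconds after it was created, which is consistent with the record waiting in the producer until a delivery deadline passed, so the timeout settings may be worth checking alongside the batch size. Below is a minimal sketch using only java.util.Properties and the standard producer config key strings (batch.size, linger.ms, request.timeout.ms, delivery.timeout.ms). The class name ProducerTimeouts and every value in it are illustrative assumptions, not the poster's configuration. It checks the rule that delivery.timeout.ms must be at least linger.ms + request.timeout.ms.

```java
import java.util.Properties;

public class ProducerTimeouts {

    /** Builds producer settings; every value here is an illustrative assumption. */
    public static Properties build() {
        Properties props = new Properties();
        props.put("batch.size", "65536");           // upper bound on a batch, in bytes
        props.put("linger.ms", "5");                // how long to wait for more records
        props.put("request.timeout.ms", "30000");   // wait for a broker response
        props.put("delivery.timeout.ms", "120000"); // total time allowed for a send
        return props;
    }

    /** Checks delivery.timeout.ms >= linger.ms + request.timeout.ms. */
    public static boolean timeoutsConsistent(Properties props) {
        long linger = Long.parseLong(props.getProperty("linger.ms"));
        long request = Long.parseLong(props.getProperty("request.timeout.ms"));
        long delivery = Long.parseLong(props.getProperty("delivery.timeout.ms"));
        return delivery >= linger + request;
    }

    public static void main(String[] args) {
        Properties props = build();
        System.out.println("timeouts consistent: " + timeoutsConsistent(props));
    }
}
```

The same Properties object could be passed to the KafkaProducer constructor once the serializer and bootstrap settings elided in the example above are added.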
I frequently see Channel shutdown: connection error (logged under the 24.133.241:5671 thread; the name is truncated) in the RabbitMQ Java client (my producer and consumer are far apart). Most of the time the consumer is restarted automatically, as I have enabled heartbeats (15 seconds). However, in some instances there was only Channel shutdown: connection error, with no Consumer raised exception and no Restarting Consumer (under the cTaskExecutor-4 thread).
My current workaround is to restart my application. Can anyone shed some light on this?
2017-03-20 12:42:38.856 ERROR 24245 --- [24.133.241:5671] o.s.a.r.c.CachingConnectionFactory : Channel shutdown: connection error
2017-03-20 12:42:39.642 WARN 24245 --- [cTaskExecutor-4] o.s.a.r.l.SimpleMessageListenerContainer : Consumer raised exception, processing can restart if the connection factory supports it
...
2017-03-20 12:42:39.642 INFO 24245 --- [cTaskExecutor-4] o.s.a.r.l.SimpleMessageListenerContainer : Restarting Consumer: tags=[{amq.ctag-4CqrRsUP8plDpLQdNcOjDw=21-05060179}], channel=Cached Rabbit Channel: AMQChannel(amqp://21-05060179@10.24.133.241:5671/,1), conn: Proxy@7ec31754 Shared Rabbit Connection: SimpleConnection@44bac9ec [delegate=amqp://21-05060179@10.24.133.241:5671/], acknowledgeMode=NONE local queue size=0
Generally, this is due to the consumer thread being "stuck" in user code somewhere, so it can't react to the broken connection.
If you have network issues, perhaps it's stuck reading or writing to a socket; make sure you have timeouts set for any I/O operations.
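As an illustration of bounding socket I/O, here is a minimal sketch with plain java.net sockets. The class name ReadTimeoutDemo and the 200 ms value are assumptions for the demo; real listener code would apply the equivalent timeout setting of whatever client library it calls.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutDemo {

    /**
     * Connects to host:port and tries to read one byte.
     * Returns true if the read timed out instead of blocking forever.
     */
    public static boolean readTimesOut(String host, int port, int timeoutMs) throws IOException {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs); // bounded connect
            socket.setSoTimeout(timeoutMs);                               // bounded read
            InputStream in = socket.getInputStream();
            try {
                in.read();
                return false; // data or end-of-stream arrived in time
            } catch (SocketTimeoutException e) {
                return true;  // the peer stayed silent; the thread is free again
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // A local server socket that never sends data stands in for a hung peer.
        try (ServerSocket silent = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            boolean timedOut = readTimesOut(
                    silent.getInetAddress().getHostAddress(), silent.getLocalPort(), 200);
            System.out.println("read timed out: " + timedOut);
        }
    }
}
```

Without the setSoTimeout call, the read in this demo would block indefinitely, which is the kind of "stuck" consumer thread described above.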
Next time it happens take a thread dump to see what the consumer threads are doing.
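A thread dump is usually taken from outside the process (for example with the JDK's jstack tool), but it can also be produced from inside the application. The following is a minimal sketch using the standard java.lang.management API; the class name ThreadDumper is an assumption, and how the dump gets triggered (an admin endpoint, a scheduled check) is left open.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadDumper {

    /** Returns the name, state and stack trace of every live thread as text. */
    public static String dump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        // false, false: skip monitor and synchronizer details, which not every JVM supports
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(dump());
    }
}
```

In the output, the frames listed under the consumer threads (such as cTaskExecutor-4 in the log above) show where they are blocked.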
My problem is that the Storm KafkaSpout stops consuming messages from the Kafka topic after a period of time. With debug enabled in Storm, I get a log file like this:
2016-07-05 03:58:26.097 o.a.s.d.task [INFO] Emitting: packet_spout __metrics [#object[org.apache.storm.metric.api.IMetricsConsumer$TaskInfo 0x2c35b34f "org.apache.storm.metric.api.IMetricsConsumer$TaskInfo#2c35b34f"] [#object[org.apache.storm.metric.api.IMetricsConsumer$DataPoint 0x798f1e35 "[__ack-count = {default=0}]"] #object[org.apache.storm.metric.api.IMetricsConsumer$DataPoint 0x230867ec "[__sendqueue = {sojourn_time_ms=0.0, write_pos=5411461, read_pos=5411461, overflow=0, arrival_rate_secs=0.0, capacity=1024, population=0}]"] #object[org.apache.storm.metric.api.IMetricsConsumer$DataPoint 0x7cdec8eb "[__complete-latency = {default=0.0}]"] #object[org.apache.storm.metric.api.IMetricsConsumer$DataPoint 0x658fc59 "[__skipped-max-spout = 0]"] #object[org.apache.storm.metric.api.IMetricsConsumer$DataPoint 0x3c1f3a50 "[__receive = {sojourn_time_ms=4790.5, write_pos=2468305, read_pos=2468304, overflow=0, arrival_rate_secs=0.20874647740319383, capacity=1024, population=1}]"] #object[org.apache.storm.metric.api.IMetricsConsumer$DataPoint 0x262d7906 "[__skipped-inactive = 0]"] #object[org.apache.storm.metric.api.IMetricsConsumer$DataPoint 0x73648c7e "[kafkaPartition = {Partition{host=slave103:9092, topic=packet, partition=12}/fetchAPICallCount=0, Partition{host=slave103:9092, topic=packet, partition=12}/fetchAPILatencyMax=null, Partition{host=slave103:9092, topic=packet, partition=12}/lostMessageCount=0, Partition{host=slave103:9092, topic=packet, partition=12}/fetchAPILatencyMean=null, Partition{host=slave103:9092, topic=packet, partition=12}/fetchAPIMessageCount=0}]"] #object[org.apache.storm.metric.api.IMetricsConsumer$DataPoint 0x4e43df61 "[kafkaOffset = {packet/totalLatestCompletedOffset=154305947, packet/partition_12/spoutLag=82472754, packet/totalEarliestTimeOffset=233919465, packet/partition_12/earliestTimeOffset=233919465, packet/partition_12/latestEmittedOffset=154307691, packet/partition_12/latestTimeOffset=236778701, 
packet/totalLatestEmittedOffset=154307691, packet/partition_12/latestCompletedOffset=154305947, packet/totalLatestTimeOffset=236778701, packet/totalSpoutLag=82472754}]"] #object[org.apache.storm.metric.api.IMetricsConsumer$DataPoint 0x49fe816b "[__transfer-count = {__ack_init=0, default=0, __metrics=0}]"] #object[org.apache.storm.metric.api.IMetricsConsumer$DataPoint 0x63e2bdc0 "[__fail-count = {}]"] #object[org.apache.storm.metric.api.IMetricsConsumer$DataPoint 0x3b17bb7b "[__skipped-throttle = 1086120]"] #object[org.apache.storm.metric.api.IMetricsConsumer$DataPoint 0x1315a68c "[__emit-count = {__ack_init=0, default=0, __metrics=0}]"]]]
2016-07-05 03:58:55.042 o.a.s.d.executor [INFO] Processing received message FOR -2 TUPLE: source: __system:-1, stream: __tick, id: {}, [30]
2016-07-05 03:59:25.042 o.a.s.d.executor [INFO] Processing received message FOR -2 TUPLE: source: __system:-1, stream: __tick, id: {}, [30]
2016-07-05 03:59:25.946 o.a.s.d.executor [INFO] Processing received message FOR -2 TUPLE: source: __system:-1, stream: __metrics_tick, id: {}, [60]
My test topology is really simple: one KafkaSpout and one Counter Bolt. When the topology works fine, the value between FOR and TUPLE is a positive number; when the topology stops consuming messages, the value becomes negative. So I'm curious what causes Processing received message FOR -2 TUPLE, and how to fix it.
By the way, my experiment environment is:
OS: Red Hat Enterprise Linux Server release 7.0 (Maipo)
Kafka: 0.10.0.0
Storm: 1.0.1
With help from the Storm mailing list I was able to tune the KafkaSpout and resolve the issue. The following settings work for me.
config.put(Config.TOPOLOGY_MAX_SPOUT_PENDING, 2048);
config.put(Config.TOPOLOGY_BACKPRESSURE_ENABLE, false);
config.put(Config.TOPOLOGY_EXECUTOR_RECEIVE_BUFFER_SIZE, 16384);
config.put(Config.TOPOLOGY_EXECUTOR_SEND_BUFFER_SIZE, 16384);
I tested by sending batches of 20k-50k messages with a 1-second pause between bursts. Each message was 2048 bytes.
I am running a 3-node cluster; my topology has 4 spouts and the topic has 64 partitions.
After 200M messages it's still working.
Check if the producer is actually writing to the topic you expect.
Make sure that the spouts can reach Kafka at the network level. You can check this with the telnet command.
Can the spouts reach ZooKeeper? Check this with telnet as well.
Source: KafkaSpout is not receiving anything from Kafka
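If telnet is not installed on the worker nodes, the same reachability check can be sketched in Java with a plain TCP connect. The class name PortCheck is an assumption, and the host and port are whatever the topology is configured with (the log in the question shows a broker at slave103:9092). A successful connect only shows that the port is open, not that the service behind it is healthy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    public static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // refused, timed out, or unknown host
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: PortCheck <host> <port>");
            return;
        }
        String host = args[0];
        int port = Integer.parseInt(args[1]);
        System.out.println(host + ":" + port + " reachable: " + canConnect(host, port, 3000));
    }
}
```

Run it from the machines where the spouts execute, once against each Kafka broker and once against each ZooKeeper node.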
If all three of the above are true, then:
Kafka has a fixed retention window for topics. Once the retention limit is reached, it drops the oldest messages from the tail of the log.
So here is what might be happening: the rate at which you are pushing data to Kafka is faster than the rate at which the consumers can consume the messages.
Source : Storm-kafka spout not fast enough to process the information
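The kafkaOffset metrics in the question's log can be checked against this explanation. The sketch below is only arithmetic on the numbers copied from that log; the class and method names are assumptions, and reading spoutLag as latestTimeOffset minus latestCompletedOffset is inferred from the fact that the logged values match, not taken from the Storm source.

```java
public class SpoutLagCheck {

    /** Lag between the newest offset on the broker and the last offset the spout completed. */
    public static long lag(long latestTimeOffset, long latestCompletedOffset) {
        return latestTimeOffset - latestCompletedOffset;
    }

    /** Offsets between the spout's position and the oldest offset the broker still holds. */
    public static long missingFromBroker(long earliestTimeOffset, long latestEmittedOffset) {
        return Math.max(0L, earliestTimeOffset - latestEmittedOffset);
    }

    public static void main(String[] args) {
        // Values for partition 12 of topic "packet", copied from the log in the question.
        long latestTimeOffset = 236_778_701L;
        long earliestTimeOffset = 233_919_465L;
        long latestEmittedOffset = 154_307_691L;
        long latestCompletedOffset = 154_305_947L;

        System.out.println("lag: " + lag(latestTimeOffset, latestCompletedOffset));
        System.out.println("missing from broker: "
                + missingFromBroker(earliestTimeOffset, latestEmittedOffset));
    }
}
```

For these values the lag comes out as 82472754, the same number the log reports as spoutLag, and the spout's last emitted offset is 79611774 behind the earliest offset still available. That is what one would expect if retention had already removed the messages the spout was about to read.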