Does anyone know a fix for a hanging Kafka producer? (Java)

Can anyone please tell me about this exception.
ERROR [kafka-producer-network-thread | producer-2] c.o.p.a.s.CalculatorAdapter [CalculatorAdapter.java:285]
Cannot send outgoingDto with decision id = 46d1-9491-123ce9c7a916 in kafka:
org.springframework.kafka.core.KafkaProducerException: Failed to send;
nested exception is org.apache.kafka.common.errors.TimeoutException:
Expiring 1 record(s) for save-request-0:604351 ms has passed since batch creation
at org.springframework.kafka.core.KafkaTemplate.lambda$buildCallback$4(KafkaTemplate.java:602)
at org.springframework.kafka.core.DefaultKafkaProducerFactory$CloseSafeProducer$1.onCompletion(DefaultKafkaProducerFactory.java:871)
at org.apache.kafka.clients.producer.KafkaProducer$InterceptorCallback.onCompletion(KafkaProducer.java:1356)
at org.apache.kafka.clients.producer.internals.ProducerBatch.completeFutureAndFireCallbacks(ProducerBatch.java:231)
at org.apache.kafka.clients.producer.internals.ProducerBatch.done(ProducerBatch.java:197)
at org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:676)
at org.apache.kafka.clients.producer.internals.Sender.sendProducerData(Sender.java:380)
at org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:323)
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:239)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.apache.kafka.common.errors.TimeoutException:
Expiring 1 record(s) for save-request-0:604351 ms has passed since batch creation
I have been fighting with this for two weeks now.
I have gone through a bunch of suggested fixes, but none of them helped.
My program sends messages of about 60 kilobytes, but they never reach the Kafka server.
The entire Java application log is filled with exceptions of this kind.

My guess is that filling a batch of that size takes longer than the transaction timeout, so the message is never sent.
// example
Properties props = new Properties();
...
props.put(ProducerConfig.BATCH_SIZE_CONFIG, 60000); // 60 KB
...
Producer<String, String> producer = new KafkaProducer<>(props);
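If the batch really is expiring before it can be sent, the knobs to experiment with are batch.size, linger.ms, max.request.size, and the request/delivery timeouts. Below is a minimal sketch of that kind of tuning; the topic name is taken from the error above, while the class name, broker address, and values are illustrative placeholders rather than recommendations for your setup:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ProducerTuningSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");      // placeholder address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 16384);           // default 16 KB; a batch is sent even if it is not full
        props.put(ProducerConfig.LINGER_MS_CONFIG, 5);                // how long the producer waits for a batch to fill
        props.put(ProducerConfig.MAX_REQUEST_SIZE_CONFIG, 1048576);   // must comfortably exceed the ~60 KB records
        props.put(ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG, 30000);   // per-request timeout
        props.put(ProducerConfig.DELIVERY_TIMEOUT_MS_CONFIG, 120000); // total time a record may spend after send()
        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("save-request", "key", "value")); // illustrative send
        }
    }
}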
Check out these articles:
Kafka Producer Batch
Kafka Producer batch size
Batch size configuration
http://cloudurable.com/blog/kafka-tutorial-kafka-producer-advanced-java-examples/index.html
https://kafka.apache.org/26/javadoc/org/apache/kafka/clients/producer/ProducerConfig.html

Related

io.smallrye.mutiny.TimeoutException when using Kafka with Redis

I'm using Kafka + Redis in my project.
I consume messages from Kafka, process them, and save the results to Redis, but after the code has been running for a while it fails with the error below:
io.smallrye.mutiny.TimeoutException
at io.smallrye.mutiny.operators.uni.UniBlockingAwait.await(UniBlockingAwait.java:64)
at io.smallrye.mutiny.groups.UniAwait.atMost(UniAwait.java:65)
at io.quarkus.redis.client.runtime.RedisClientImpl.await(RedisClientImpl.java:1046)
at io.quarkus.redis.client.runtime.RedisClientImpl.set(RedisClientImpl.java:687)
at worker.redis.process.implementation.ProductImplementation.refresh(ProductImplementation.java:34)
at worker.redis.Worker.refresh(Worker.java:51)
at kafka.InComingProductKafkaConsume.lambda$consume$0(InComingProductKafkaConsume.java:38)
at business.core.hpithead.ThreadStart.doRun(ThreadStart.java:34)
at business.core.hpithead.core.NotifyingThread.run(NotifyingThread.java:27)
at java.base/java.lang.Thread.run(Thread.java:833)
The record 51761 from topic-partition 'mer-outgoing-master-item-0' has waited for 153 seconds to be acknowledged. This waiting time is greater than the configured threshold (150000 ms). At the moment 2 messages from this partition are awaiting acknowledgement. The last committed offset for this partition was 51760. This error is due to a potential issue in the application which does not acknowledged the records in a timely fashion. The connector cannot commit as a record processing has not completed.
@Incoming("mer_product")
@Blocking
public CompletionStage<Void> consume2(Message<String> payload) {
    var objectDto = configThreadLocal.mapper.readValue(payload.getPayload(), new TypeReference<KafkaPayload<ItemKO>>() {});
    worker.refresh(objectDto.payload.castDto());
    return payload.ack();
}
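The warning is coming from the connector's throttled commit strategy: a record counts as stuck once it has waited longer than the configured threshold without being acknowledged, which happens here because the blocking Redis call delays payload.ack(). Assuming the Quarkus smallrye-kafka connector and the channel name mer_product from the @Incoming annotation above, one stopgap is to raise that threshold while the slow Redis path is investigated (attribute name per the SmallRye Reactive Messaging Kafka docs; verify it against your version):

# application.properties (sketch; the value is illustrative)
mp.messaging.incoming.mer_product.throttled.unprocessed-record-max-age.ms=300000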

Redistribution fails for large messages in ActiveMQ Artemis

I'm using ActiveMQ Artemis version 2.19.1, and I'm facing an issue in a 6-node (3 masters) cluster where redistribution is failing for large messages with the below warning logs:
23:35:05,551 WARN [org.apache.activemq.artemis.core.server] AMQ222303: Redistribution by Redistributor[TEST_QUEUE/2244] of messageID = 196,950,715 failed: java.lang.UnsupportedOperationException: Method not supported with Large Messages
at org.apache.activemq.artemis.protocol.amqp.broker.AMQPLargeMessage.getData(AMQPLargeMessage.java:311) [artemis-amqp-protocol-2.19.1.jar:2.19.1]
at org.apache.activemq.artemis.protocol.amqp.broker.AMQPMessage.anyMessageAnnotations(AMQPMessage.java:1374) [artemis-amqp-protocol-2.19.1.jar:2.19.1]
at org.apache.activemq.artemis.protocol.amqp.broker.AMQPMessage.hasScheduledDeliveryTime(AMQPMessage.java:1352) [artemis-amqp-protocol-2.19.1.jar:2.19.1]
at org.apache.activemq.artemis.core.postoffice.impl.PostOfficeImpl.processRoute(PostOfficeImpl.java:1499) [artemis-server-2.19.1.jar:2.19.1]
at org.apache.activemq.artemis.core.server.cluster.impl.Redistributor$1.run(Redistributor.java:169) [artemis-server-2.19.1.jar:2.19.1]
at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42) [artemis-commons-2.19.1.jar:2.19.1]
at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31) [artemis-commons-2.19.1.jar:2.19.1]
at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:65) [artemis-commons-2.19.1.jar:2.19.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_322]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_322]
at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118) [artemis-commons-2.19.1.jar:2.19.1]
Later I see the broker removing consumers with the warning below, since properties=null:
00:10:28,280 WARN [org.apache.activemq.artemis.core.server] AMQ222151: removing consumer which did not handle a message, consumer=ServerConsumerImpl [id=57, filter=null, binding=LocalQueueBinding [address=TEST_QUEUE, queue=QueueImpl[name=TEST_QUEUE, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::name=localhost], temp=false]#9c000b7, filter=null, name=TEST_QUEUE, clusterName=TEST_QUEUE1e359f55-c92b-11ec-b908-005056a3af3f]], message=Reference[157995028]:RELIABLE:AMQPLargeMessage( [durable=true, messageID=157995028, address=TEST_QUEUE, size=0, scanningStatus=SCANNED, applicationProperties={VER=05, trackingId=62701757c80c3004d037ded6}, messageAnnotations={}, properties=null, extraProperties = TypedProperties[_AMQ_AD=TEST_QUEUE]]: java.lang.IllegalArgumentException: Array must not be empty or null
at org.apache.qpid.proton.codec.CompositeReadableBuffer.append(CompositeReadableBuffer.java:688) [proton-j-0.33.10.jar:]
at org.apache.qpid.proton.engine.impl.DeliveryImpl.send(DeliveryImpl.java:345) [proton-j-0.33.10.jar:]
at org.apache.qpid.proton.engine.impl.SenderImpl.send(SenderImpl.java:74) [proton-j-0.33.10.jar:]
at org.apache.activemq.artemis.protocol.amqp.proton.ProtonServerSenderContext$LargeMessageDeliveryContext.deliverInitialPacket(ProtonServerSenderContext.java:686) [artemis-amqp-protocol-2.19.1.jar:2.19.1]
at org.apache.activemq.artemis.protocol.amqp.proton.ProtonServerSenderContext$LargeMessageDeliveryContext.deliver(ProtonServerSenderContext.java:587) [artemis-amqp-protocol-2.19.1.jar:2.19.1]
I have 6 consumers for this queue. If one message out of many (say 1,000) is large, the other messages should still be processed, but instead processing stops completely and the queue is left with 0 consumers.
When you send a large message, the first block sent is parsed for its properties. Most likely you are using properties so large that the server cannot parse the first few bytes of the message. You should avoid large properties and keep the bulky data in the message body, on the client side.
We followed up with many tests on this JIRA, and the only plausible scenarios are large properties, or an incomplete message generated by your client in a way the server cannot parse.
https://issues.apache.org/jira/browse/ARTEMIS-3837
If you provide a reproducer showing how the property fails to be parsed, we will follow up with a possible fix.
Please move any further discussion to the JIRA. It could be a bug, but one triggered by the anti-pattern described above.
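As a rough illustration of that advice, here is a sketch using the plain JMS API (the class, method, and variable names are assumed for the example, not taken from the question): keep the bulky payload in the message body and restrict properties to small metadata.

import javax.jms.BytesMessage;
import javax.jms.JMSException;
import javax.jms.MessageProducer;
import javax.jms.Session;

public class LargeMessageSketch {
    // Put the large content in the body; keep properties small so the broker can
    // parse the first packet of the large message without trouble.
    static void sendLargePayload(Session session, MessageProducer producer,
                                 byte[] largePayload, String trackingId) throws JMSException {
        BytesMessage msg = session.createBytesMessage();
        msg.writeBytes(largePayload);                    // bulky data goes in the body
        msg.setStringProperty("trackingId", trackingId); // small metadata only
        producer.send(msg);
    }
}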

Kafka Producer error: Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms

One of my Kafka Streams apps gives me the following error:
Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms
It is unlikely to be a permission or connectivity issue, because the app consumes a batch of messages (maybe up to the last message available); then I see "Informed to shut down" in the logs, then "State transition from RUNNING to PENDING_SHUTDOWN", and only then the timeout error.
It is a simple single-threaded app with code as simple as this:
builder.stream("prod.company.scores",
        Consumed.with(ScoreSerde.getGenericKeySerde(), ScoreSerde.getEnvelopeSerde()))
    .filter((key, value) -> isRecordNew(value))
    .filter((key, value) -> isScoreNew(value))
    .filter((key, value) -> isScoreRedeem(value))
    .peek(foreachAction)
    .to("kafka-consumer.score.redeem");
"INFO KafkaProducer: Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms" is an informational message logged during a normal producer close; it does not shut the application down by itself. You need to look for whatever else is triggering the shutdown.
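Since that line is only INFO, the useful signal is whatever drives the state change to PENDING_SHUTDOWN. A sketch of how one might surface it is below; the topology stands in for the question's own setup, the placeholder config values are assumptions, and the two-argument exception handler is the pre-2.8 API (Kafka 2.8+ uses StreamsUncaughtExceptionHandler instead):

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class StreamsDiagnosticsSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "score-redeem-app");   // placeholder
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");     // placeholder
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("prod.company.scores").to("kafka-consumer.score.redeem"); // stand-in for the real topology

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        // log every state transition so the step before PENDING_SHUTDOWN is visible
        streams.setStateListener((newState, oldState) ->
                System.out.printf("State transition %s -> %s%n", oldState, newState));
        // print the exception that killed a stream thread (pre-2.8 handler signature)
        streams.setUncaughtExceptionHandler((thread, throwable) -> throwable.printStackTrace());
        streams.start();
    }
}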

Kafka No broker in ISR for partition

We have a Kafka cluster consisting of 6 nodes. Five of the 6 nodes run ZooKeeper.
A Spark Streaming job reads from a streaming server, does some processing, and sends the result to Kafka.
From time to time the Spark job gets stuck and no data is sent to Kafka, so the job is restarted.
The job keeps getting stuck and restarting until we manually restart the Kafka cluster; after that, everything works smoothly.
Checking the Kafka logs, we found the following exception thrown several times:
2017-03-10 05:12:14,177 ERROR state.change.logger: Controller 133 epoch 616 initiated state change for partition [live_stream_2,52] from OfflinePartition to OnlinePartition failed
kafka.common.NoReplicaOnlineException: No broker in ISR for partition [gnip_live_stream_2,52] is alive. Live brokers are: [Set(133, 137, 134, 135, 143)], ISR brokers are: [142]
at kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:66)
at kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:345)
at kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:205)
at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:120)
at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:117)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:778)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:777)
at kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:117)
at kafka.controller.PartitionStateMachine.startup(PartitionStateMachine.scala:70)
at kafka.controller.KafkaController.onControllerFailover(KafkaController.scala:333)
at kafka.controller.KafkaController$$anonfun$1.apply$mcV$sp(KafkaController.scala:164)
at kafka.server.ZookeeperLeaderElector.elect(ZookeeperLeaderElector.scala:84)
at kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply$mcZ$sp(ZookeeperLeaderElector.scala:146)
at kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply(ZookeeperLeaderElector.scala:141)
at kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply(ZookeeperLeaderElector.scala:141)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:259)
at kafka.server.ZookeeperLeaderElector$LeaderChangeListener.handleDataDeleted(ZookeeperLeaderElector.scala:141)
at org.I0Itec.zkclient.ZkClient$9.run(ZkClient.java:823)
at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
The exception above is thrown for an unused topic (live_stream_2), but it is also thrown for a topic that is in use, with a small difference.
Here is the exception for the used topic:
2017-03-10 12:05:18,535 ERROR state.change.logger: Controller 133 epoch 620 initiated state change for partition [gnip_live_stream,3] from OfflinePartition to OnlinePartition failed
kafka.common.NoReplicaOnlineException: No broker in ISR for partition [live_stream,3] is alive. Live brokers are: [Set(133, 134, 135, 137)], ISR brokers are: [136]
at kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:66)
at kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:345)
at kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:205)
at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:120)
at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:117)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:778)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:777)
at kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:117)
at kafka.controller.PartitionStateMachine.startup(PartitionStateMachine.scala:70)
at kafka.controller.KafkaController.onControllerFailover(KafkaController.scala:333)
at kafka.controller.KafkaController$$anonfun$1.apply$mcV$sp(KafkaController.scala:164)
at kafka.server.ZookeeperLeaderElector.elect(ZookeeperLeaderElector.scala:84)
at kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply$mcZ$sp(ZookeeperLeaderElector.scala:146)
at kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply(ZookeeperLeaderElector.scala:141)
at kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply(ZookeeperLeaderElector.scala:141)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:259)
at kafka.server.ZookeeperLeaderElector$LeaderChangeListener.handleDataDeleted(ZookeeperLeaderElector.scala:141)
at org.I0Itec.zkclient.ZkClient$9.run(ZkClient.java:823)
at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
In the first exception, the ISR list for partition 52 contains only broker 142, which is odd because the cluster has no broker with that ID.
In the second exception, the ISR list for partition 3 contains only broker 136, which is not in the live broker list.
I suspect stale data in ZooKeeper caused the first exception, and that broker 136 being down at some point caused the second.
My questions:
1. Could these exceptions be the reason Kafka (and consequently the Spark job) gets stuck?
2. How can this be solved?

Samza/Kafka Failed to Update Metadata

I am currently writing a Samza script that just takes data from one Kafka topic and outputs it to another Kafka topic. I have written a very basic StreamTask; however, upon execution I run into an error.
The error is below:
Exception in thread "main" org.apache.samza.SamzaException: org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 193 ms.
at org.apache.samza.coordinator.stream.CoordinatorStreamSystemProducer.send(CoordinatorStreamSystemProducer.java:112)
at org.apache.samza.coordinator.stream.CoordinatorStreamSystemProducer.writeConfig(CoordinatorStreamSystemProducer.java:129)
at org.apache.samza.job.JobRunner.run(JobRunner.scala:79)
at org.apache.samza.job.JobRunner$.main(JobRunner.scala:48)
at org.apache.samza.job.JobRunner.main(JobRunner.scala)
Caused by: org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 193 ms
I'm not entirely sure how to configure the script, or have it write, the required Kafka metadata. Below is my code for the StreamTask and the properties file. In the properties file I added the Metadata section to see if that would help, but to no avail. Is that the right direction, or am I missing something entirely?
import org.apache.samza.task.StreamTask;
import org.apache.samza.task.MessageCollector;
import org.apache.samza.task.TaskCoordinator;
import org.apache.samza.system.SystemStream;
import org.apache.samza.system.IncomingMessageEnvelope;
import org.apache.samza.system.OutgoingMessageEnvelope;

/*
 * Take all messages received and send them to
 * a Kafka topic called "words"
 */
public class TestStreamTask implements StreamTask {
    // system stream for the Kafka topic "words"
    private static final SystemStream OUTPUT_STREAM = new SystemStream("kafka", "words");

    @Override
    public void process(IncomingMessageEnvelope envelope, MessageCollector collector, TaskCoordinator coordinator) {
        String message = (String) envelope.getMessage(); // pull message from stream
        for (String word : message.split(" "))
            collector.send(new OutgoingMessageEnvelope(OUTPUT_STREAM, word, 1)); // send each word to the "words" topic
    }
}
# Job
job.factory.class=org.apache.samza.job.yarn.YarnJobFactory
job.name=test-words
# YARN
yarn.package.path=file://${basedir}/target/${project.artifactId}-${pom.version}-dist.tar.gz
# Task
task.class=samza.examples.wikipedia.task.TestStreamTask
task.inputs=kafka.test
task.checkpoint.factory=org.apache.samza.checkpoint.kafka.KafkaCheckpointManagerFactory
task.checkpoint.system=kafka
task.checkpoint.replication.factor=1
# Metrics
metrics.reporters=snapshot,jmx
metrics.reporter.snapshot.class=org.apache.samza.metrics.reporter.MetricsSnapshotReporterFactory
metrics.reporter.snapshot.stream=kafka.metrics
metrics.reporter.jmx.class=org.apache.samza.metrics.reporter.JmxReporterFactory
# Serializers
serializers.registry.string.class=org.apache.samza.serializers.StringSerdeFactory
serializers.registry.metrics.class=org.apache.samza.serializers.MetricsSnapshotSerdeFactory
# Systems
systems.kafka.samza.factory=org.apache.samza.system.kafka.KafkaSystemFactory
systems.kafka.samza.msg.serde=string
systems.kafka.consumer.zookeeper.connect=localhost:2181/
systems.kafka.consumer.auto.offset.reset=largest
systems.kafka.producer.bootstrap.servers=localhost:9092
# Metadata
systems.kafka.metadata.bootstrap.servers=localhost:9092
This question is about Kafka 0.8, which should be out of support if I am not mistaken.
That fact, combined with people only running into this issue occasionally (and nobody seeming to struggle with it in recent years), gives me good confidence that upgrading to a more recent version of Kafka will resolve the problem.
