Kafka consumer receiving same message multiple times - java
I've recently started using Kafka to read documents coming through a web crawler. What I'm noticing is that when I'm dealing with a few million documents, the consumer keeps processing the same messages over and over again. It looks like the data is not getting committed for some reason. This is not the case when I test the consumer with a few hundred messages.
I'm using the Kafka high-level consumer client in Java. I'm running a consumer group with a number of threads equal to the number of partitions, so each thread is dedicated to one partition. Here's a code snippet for polling data:
while (true) {
    try {
        if (consumerDao.canPollTopic()) {
            // Poll a batch of up to KAFKA_POLL_COUNT records.
            ConsumerRecords<String, TextAnalysisRequest> records =
                    consumer.poll(this.config.getPropertyAsInteger(IPreProcessorConstant.KAFKA_POLL_COUNT));
            for (ConsumerRecord<String, TextAnalysisRequest> record : records) {
                TextAnalysisRequest textAnalysisObj = record.value();
                if (textAnalysisObj != null) {
                    PostProcessRequest req = new PostProcessRequest();
                    req.setRequest(this.getRequest(textAnalysisObj));
                    PreProcessorUtil.submitPostProcessRequest(req, config);
                }
            }
        } else {
            // Nothing to poll right now; back off before checking again.
            Thread.sleep(this.config.getPropertyAsInteger(IPreProcessorConstant.KAFKA_POLL_SLEEP));
        }
    } catch (Exception ex) {
        LOGGER.error("Error in Full Consumer group worker", ex);
    }
}
Here are the Kafka consumer configuration parameters I'm setting; the rest are default values:
consumer.auto.commit=true
consumer.auto.commit.interval=1000
consumer.session.timeout=180000
consumer.poll.records=2147483647
consumer.request.timeout=181000
Here's the complete consumer config:
metric.reporters =
metadata.max.age.ms = 300000
partition.assignment.strategy = [org.apache.kafka.clients.consumer.RangeAssignor]
reconnect.backoff.ms = 50
sasl.kerberos.ticket.renew.window.factor = 0.8
max.partition.fetch.bytes = 1048576
bootstrap.servers = [kafkahost1:9092, kafkahost2:9092]
ssl.keystore.type = JKS
enable.auto.commit = true
sasl.mechanism = GSSAPI
interceptor.classes = null
exclude.internal.topics = true
ssl.truststore.password = null
client.id =
ssl.endpoint.identification.algorithm = null
max.poll.records = 2147483647
check.crcs = true
request.timeout.ms = 181000
heartbeat.interval.ms = 3000
auto.commit.interval.ms = 1000
receive.buffer.bytes = 65536
ssl.truststore.type = JKS
ssl.truststore.location = null
ssl.keystore.password = null
fetch.min.bytes = 1
send.buffer.bytes = 131072
value.deserializer = class com.test.preprocessor.consumer.serializer.KryoObjectSerializer
group.id = full_group
retry.backoff.ms = 100
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
ssl.trustmanager.algorithm = PKIX
ssl.key.password = null
fetch.max.wait.ms = 500
sasl.kerberos.min.time.before.relogin = 60000
connections.max.idle.ms = 540000
session.timeout.ms = 180000
metrics.num.samples = 2
key.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
ssl.protocol = TLS
ssl.provider = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
ssl.keystore.location = null
ssl.cipher.suites = null
security.protocol = PLAINTEXT
ssl.keymanager.algorithm = SunX509
metrics.sample.window.ms = 30000
auto.offset.reset = latest
My sample Kafka topic has 8 partitions with a replication factor of 2.
The log retention period in server.properties is set to 168 hours:
log.retention.hours=168
log.roll.hours=168
Not sure what I'm missing here.
I increased auto.commit.interval.ms in my consumer properties from 3000 to 8000. This fixed the duplicate-record issue.
I guess the issue was with the partition assignment.
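If tuning the auto-commit interval only narrows the window, a more robust option is to disable auto-commit and commit offsets manually after each processed batch. A minimal sketch, not the poster's actual code, assuming a recent kafka-clients; the broker, topic, and deserializers are placeholders:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ManualCommitConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafkahost1:9092");   // placeholder broker
        props.put("group.id", "full_group");
        props.put("enable.auto.commit", "false");            // commit manually instead
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic")); // placeholder topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    process(record);
                }
                // Committing only after the whole batch is processed means a crash
                // re-delivers at most the current batch, and a slow batch can no
                // longer race against the auto-commit timer.
                consumer.commitSync();
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        // stand-in for the real post-processing step
    }
}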
Related
Apache Camel Kafka consumer performance problem
We have observed that our Apache Camel Kafka consumer is able to process only 100 Kb/second even though our Kafka producing rate is high. We have 6 Kafka partitions and 6 Apache Camel Kafka consumer instances. We have tried the multithreading options .threads(300, 500), but we could not see any improvement. It would be really helpful if someone could help us understand the correct configuration to improve the Kafka consumer rate. We are using all the default settings for the Apache Camel Kafka consumer.

I can see the below Kafka configuration in the Spring Boot application start:

INFO 1 — [ main] o.a.k.clients.producer.ProducerConfig : ProducerConfig values:
acks = 1
batch.size = 16384
bootstrap.servers = [kafka1.xxx.net:9093, kafka2.xxx.net:9093, kafka3.xxx.net:9093, kafka4.xxx.net:9093, kafka5.xxx.net:9093, kafka6.xxx.net:9093]
buffer.memory = 33554432
client.dns.lookup = use_all_dns_ips
client.id = producer-1
compression.type = none
connections.max.idle.ms = 540000
delivery.timeout.ms = 120000
enable.idempotence = false
interceptor.classes = []
key.serializer = class org.apache.kafka.common.serialization.StringSerializer
linger.ms = 0
max.block.ms = 60000
max.in.flight.requests.per.connection = 5
max.request.size = 104857600
metadata.max.age.ms = 300000
metadata.max.idle.ms = 300000
metric.reporters = [org.apache.kafka.common.metrics.JmxReporter]
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partitioner.class = class org.apache.kafka.clients.producer.internals.DefaultPartitioner
receive.buffer.bytes = 65536000
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 30000
retries = 0
retry.backoff.ms = 100
sasl.client.callback.handler.class = null
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.login.callback.handler.class = null
sasl.login.class = null
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.mechanism = GSSAPI
security.protocol = SSL
security.providers = null
send.buffer.bytes = 131072
socket.connection.setup.timeout.max.ms = 30000
socket.connection.setup.timeout.ms = 10000
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.3]
ssl.endpoint.identification.algorithm = https
ssl.engine.factory.class = null
ssl.key.password = [hidden]
ssl.keymanager.algorithm = SunX509
ssl.keystore.certificate.chain = null
ssl.keystore.key = null
ssl.keystore.location = /tmp/certs/XXX.jks
ssl.keystore.password = [hidden]
ssl.keystore.type = JKS
ssl.protocol = TLSv1.3
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.certificates = null
ssl.truststore.location = /tmp/certsFolder/kafka.truststore.jks
ssl.truststore.password = [hidden]
ssl.truststore.type = JKS
transaction.timeout.ms = 60000
transactional.id = null
value.serializer = class org.apache.kafka.common.serialization.StringSerializer

Is it something related to the Kafka consumer properties?
spring-boot-starter-parent version - 2.6.8
org.apache.camel.springboot version - 3.14.1

Grafana dashboard - producer rate and consumer rates overview:
Kafka Partition 1: the producer rate is a constant line, but the Apache Camel consumer's consumption rate goes up and down.
Kafka Partition 2: constant producer line, fluctuating consumer rate.
Kafka Partition 3: constant producer line, fluctuating consumer rate.
Kafka Partition 4: constant producer line, but the consumer rate is only 30 B/s.
Kafka Partition 5: constant producer line, but the consumer rate is only 30 B/s.
Kafka Partition 6: constant producer line, fluctuating consumer rate.
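No accepted answer is recorded for this question. As a hedged sketch only: with Kafka, one consumer thread per partition is the natural ceiling, so route-level .threads() rarely helps; the usual first step is to parallelise and fetch-tune the endpoint itself. Endpoint option names below are from the camel-kafka component (Camel 3.x); the topic, brokers, group, and values are placeholders, not the poster's configuration:

import org.apache.camel.builder.RouteBuilder;

public class TunedKafkaConsumerRoute extends RouteBuilder {
    @Override
    public void configure() {
        from("kafka:my-topic"                        // placeholder topic
                + "?brokers=kafka1.xxx.net:9093"     // placeholder broker list
                + "&groupId=my-group"                // placeholder group
                + "&consumersCount=6"                // one Kafka consumer per partition
                + "&maxPollRecords=500"              // records fetched per poll
                + "&fetchMinBytes=1"                 // reply as soon as data is available
                + "&fetchMaxWaitMs=500")             // cap broker-side wait per fetch
            // Keep expensive per-record work off the consumer thread; if the poll
            // loop stalls on processing, the consumption rate oscillates exactly
            // as described above.
            .to("log:consumed?groupInterval=10000");
    }
}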
Kafka producer does not throw exception when broker is down
I created a cluster with two brokers using the same ZooKeeper and am trying to produce messages to a topic whose details are below. When the producer sets acks="all" (or -1) and min.insync.replicas="2", it is supposed to receive acknowledgement from the brokers (leader and replicas), but when one broker is shut down manually while producing, it makes no difference to the Kafka producer, even with acks="all". Can someone explain the reason for this weird behavior? The brokers are on 9091 and 9092.

acks = -1
batch.size = 16384
bootstrap.servers = [localhost:9092]
buffer.memory = 33554432
client.dns.lookup = use_all_dns_ips
client.id = producer-1
compression.type = none
connections.max.idle.ms = 540000
delivery.timeout.ms = 120000
enable.idempotence = false
interceptor.classes = []
internal.auto.downgrade.txn.commit = false
key.serializer = class org.apache.kafka.common.serialization.StringSerializer
linger.ms = 0
max.block.ms = 60000
max.in.flight.requests.per.connection = 5
max.request.size = 1048576
metadata.max.age.ms = 300000
metadata.max.idle.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partitioner.class = class org.apache.kafka.clients.producer.internals.DefaultPartitioner
receive.buffer.bytes = 32768
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 30000
retries = 2147483647
retry.backoff.ms = 100
sasl.client.callback.handler.class = null
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.login.callback.handler.class = null
sasl.login.class = null
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.mechanism = GSSAPI
security.protocol = PLAINTEXT
security.providers = null
send.buffer.bytes = 131072
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2]
ssl.endpoint.identification.algorithm = https
ssl.engine.factory.class = null
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLSv1.2
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
transaction.timeout.ms = 60000
transactional.id = null
value.serializer = class org.apache.kafka.common.serialization.StringSerializer

Below is the source code for the Kafka producer:

public static void main(String[] args) {
    Properties prop = new Properties();
    prop.setProperty(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    prop.setProperty(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
    prop.setProperty(ProducerConfig.ACKS_CONFIG, "all");
    // Note: min.insync.replicas is a topic/broker setting; the producer ignores it.
    prop.setProperty("min.insync.replicas", "2");
    prop.setProperty(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

    KafkaProducer<String, String> producer = new KafkaProducer<>(prop);
    ProducerRecord<String, String> rec = new ProducerRecord<>("clust_topic", "123");
    while (true) {
        producer.send(rec, new Callback() {
            @Override
            public void onCompletion(RecordMetadata rm, Exception ex) {
                if (ex != null)
                    System.out.println(ex);
                else
                    System.out.println(rm.topic() + " " + rm.partition() + " " + rm.offset());
            }
        });
    }
}
acks=all means the producer requires acknowledgement from all in-sync replicas, not from all replicas (refer to the documentation). When one broker is shut down it simply drops out of the in-sync replica set, so the remaining replicas can still satisfy an acks=all request without any error.
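Note also that min.insync.replicas is a topic/broker configuration; setting it in the producer Properties, as in the code above, has no effect. A hedged sketch of applying it to the topic with the AdminClient (available since Kafka 2.3; broker address and topic name taken from the question):

import java.util.Collection;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class SetMinInsyncReplicas {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "clust_topic");
            // Require 2 in-sync replicas before an acks=all produce succeeds.
            Collection<AlterConfigOp> ops = Collections.singletonList(new AlterConfigOp(
                    new ConfigEntry("min.insync.replicas", "2"), AlterConfigOp.OpType.SET));
            admin.incrementalAlterConfigs(Collections.singletonMap(topic, ops))
                 .all().get(); // block until the change is applied
        }
    }
}

Once the topic requires 2 in-sync replicas, acks=all sends should start failing (NotEnoughReplicas) when one of the two brokers is down.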
Kafka duplicates some messages hundreds of times
I'm using 3 Kafka brokers in a cluster running Kafka 2.3.0. A streaming application consumes data from another Kafka cluster, transforms it, and pushes it to the 3-broker cluster mentioned before. The streaming application has a producer written in Java using Spring Cloud Stream Greenwich.SR1. This producer uses the following code to push messages:

import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.cloud.stream.annotation.EnableBinding;
import org.springframework.integration.support.MessageBuilder;
import org.springframework.kafka.support.KafkaHeaders;
import org.springframework.messaging.Message;
import org.springframework.stereotype.Component;

@Slf4j
@Component
@EnableBinding(SensorDataBinding.class)
public class SensorDataProducer {

    private final SensorDataBinding sensorDataOut;
    private final long sendTimeoutInMilliseconds;

    public SensorDataProducer(SensorDataBinding binding,
                              @Value("${sendTimeoutInMilliseconds}") long sendTimeoutInMilliseconds) {
        this.sensorDataOut = binding;
        this.sendTimeoutInMilliseconds = sendTimeoutInMilliseconds;
    }

    public void produce(SensorData sensorMeasurement) {
        send(sensorMeasurement);
    }

    private void send(SensorData sensorMeasurement) {
        log.trace("sending message with contents: {}", sensorMeasurement.toString());
        Message<SensorData> message = MessageBuilder
                .withPayload(sensorMeasurement)
                .setHeader(KafkaHeaders.MESSAGE_KEY, getMessageKey(sensorMeasurement))
                .build();
        failSafeMessageSend(message);
    }

    private void failSafeMessageSend(Message<SensorData> message) {
        boolean sendSucceeded = false;
        do {
            try {
                this.sensorDataOut.sensorDataOut().send(message, this.sendTimeoutInMilliseconds);
                sendSucceeded = true;
            } catch (Exception ex) {
                log.error("Exception when sending message: {}", ex.getMessage());
            }
        } while (!sendSucceeded);
    }

    private byte[] getMessageKey(SensorData measurement) {
        return (measurement.getMessageKey()).getBytes();
    }
}

Producer config:

ProducerConfig values:
acks = 1
batch.size = 16384
bootstrap.servers = [localhost:9095]
buffer.memory = 33554432
client.id =
compression.type = none
connections.max.idle.ms = 540000
enable.idempotence = false
interceptor.classes = []
key.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer
linger.ms = 0
max.block.ms = 60000
max.in.flight.requests.per.connection = 5
max.request.size = 1048576
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partitioner.class = class org.apache.kafka.clients.producer.internals.DefaultPartitioner
receive.buffer.bytes = 32768
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 30000
retries = 0
retry.backoff.ms = 100
sasl.client.callback.handler.class = null
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.login.callback.handler.class = null
sasl.login.class = null
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.mechanism = GSSAPI
security.protocol = PLAINTEXT
send.buffer.bytes = 131072
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
ssl.endpoint.identification.algorithm = https
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLS
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
transaction.timeout.ms = 60000
transactional.id = null
value.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer

ProducerConfig values:
acks = 1
batch.size = 16384
bootstrap.servers = [https://....:...]
buffer.memory = 33554432
client.id = client-1ae836b8-9a13-4903-aad6-09ce11a4be08-StreamThread-1-producer
compression.type = none
connections.max.idle.ms = 540000
enable.idempotence = false
interceptor.classes = []
key.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer
linger.ms = 100
max.block.ms = 60000
max.in.flight.requests.per.connection = 5
max.request.size = 1048576
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partitioner.class = class org.apache.kafka.clients.producer.internals.DefaultPartitioner
receive.buffer.bytes = 32768
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 30000
retries = 10
retry.backoff.ms = 100
sasl.client.callback.handler.class = null
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.login.callback.handler.class = null
sasl.login.class = null
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.mechanism = GSSAPI
security.protocol = SSL
send.buffer.bytes = 131072
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
ssl.endpoint.identification.algorithm = https
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.location = keystore/prod/client.keystore.p12
ssl.keystore.password = [hidden]
ssl.keystore.type = PKCS12
ssl.protocol = TLS
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.location = keystore/prod/client.truststore.jks
ssl.truststore.password = [hidden]
ssl.truststore.type = JKS
transaction.timeout.ms = 60000
transactional.id = null
value.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer

In our environment we have 8 instances of this application in a consumer group, consuming from a topic of 60 partitions on the external cluster. As mentioned, this data is transformed and pushed to our own 3-broker Kafka cluster. The data is pushed to a sensor-data-topic which has 30 partitions, a retention time of 7 days, and the delete cleanup policy. I'm fully aware that at-least-once semantics allows duplicate messages, but I'm seeing some messages duplicated sometimes over 300 times, causing the required disk size to grow tremendously, while other messages are duplicated at most 3 or 4 times. The following is an example of metrics showing a message duplicated 233 times, where the timestamps array shows the timestamp of each message and the offsets array shows the offset of each message seen as a duplicate:

Key: b'100083952:300793850'|-|1591011300000.
Value: {'count': 233, 'partition': 3, 'offset': 26637463,
'timestamps': [1594133472060, 1594133472062, 1594133472064, ..., 1594133472675, 1594133472676] (233 timestamps spanning roughly 0.6 seconds),
'offsets': [26637463, 26637464, 26637465, ..., 26637694, 26637695] (233 consecutive offsets)},
Date: 2020-06-01 13:35:00

You can see the offsets monotonically increasing. I'm wondering what could cause this difference between only a few duplicates for some messages and many hundreds for others. I would expect the retries property to have a hand in limiting the number of duplicates, although that doesn't really show.
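No answer is recorded for this question in the thread. One hedged observation: enable.idempotence=true deduplicates only the producer's internal retries; a message re-sent by the application-level failSafeMessageSend loop above (for example, after a send timeout on a request the broker actually completed) is a brand-new produce request and is not deduplicated, which would fit the pattern of long duplicate runs. A minimal sketch of the idempotent settings, with a placeholder broker address:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.ByteArraySerializer;

public class IdempotentProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9095"); // placeholder
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());
        // Idempotence implies acks=all and a positive retries value; the broker
        // then drops duplicates created by the producer's own internal retries.
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");

        try (KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props)) {
            // send(...) as before; internal retries no longer create duplicates
        }
    }
}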
Kafka Consumer reading messages only when two messages stack
We have a Kafka producer that produces some messages once in a while, and I wrote a consumer to consume these messages. The problem is that the messages are consumed only when 2 of them stack up. For example, if a message is produced at 13:00 the consumer doesn't do anything. If another message is produced at 13:01, the consumer consumes both messages. In Kafka Tool, the consumer properties show a column called LAG that is 1 when a message is not consumed. Is there any config for this that I'm missing?

The consumer config:

16:43:04,472 INFO [org.apache.kafka.clients.consumer.ConsumerConfig] (http--0.0.0.0-8180-1) ConsumerConfig values:
request.timeout.ms = 180001
check.crcs = true
retry.backoff.ms = 100
ssl.truststore.password = null
ssl.keymanager.algorithm = SunX509
receive.buffer.bytes = 32768
ssl.cipher.suites = null
ssl.key.password = null
sasl.kerberos.ticket.renew.jitter = 0.05
ssl.provider = null
sasl.kerberos.service.name = null
session.timeout.ms = 180000
sasl.kerberos.ticket.renew.window.factor = 0.8
bootstrap.servers = [mtxbuctra22.prod.orange.intra:9092]
client.id =
fetch.max.wait.ms = 180000
fetch.min.bytes = 1024
key.deserializer = class io.confluent.kafka.serializers.KafkaAvroDeserializer
sasl.kerberos.kinit.cmd = /usr/bin/kinit
auto.offset.reset = earliest
value.deserializer = class io.confluent.kafka.serializers.KafkaAvroDeserializer
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
partition.assignment.strategy = [org.apache.kafka.clients.consumer.RangeAssignor]
ssl.endpoint.identification.algorithm = null
max.partition.fetch.bytes = 1048576
ssl.keystore.location = null
ssl.truststore.location = null
ssl.keystore.password = null
metrics.sample.window.ms = 30000
metadata.max.age.ms = 300000
security.protocol = PLAINTEXT
auto.commit.interval.ms = 1000
ssl.protocol = TLS
sasl.kerberos.min.time.before.relogin = 60000
connections.max.idle.ms = 540000
ssl.trustmanager.algorithm = PKIX
group.id = ifd_006
enable.auto.commit = true
metric.reporters = []
ssl.truststore.type = JKS
send.buffer.bytes = 131072
reconnect.backoff.ms = 50
metrics.num.samples = 2
ssl.keystore.type = JKS
heartbeat.interval.ms = 3000

16:43:04,493 INFO [io.confluent.kafka.serializers.KafkaAvroDeserializerConfig] (http--0.0.0.0-8180-1) KafkaAvroDeserializerConfig values:
max.schemas.per.subject = 1000
specific.avro.reader = true
schema.registry.url = [http://mtxbuctra22.prod.orange.intra:8081]

16:43:04,498 INFO [io.confluent.kafka.serializers.KafkaAvroDeserializerConfig] (http--0.0.0.0-8180-1) KafkaAvroDeserializerConfig values:
max.schemas.per.subject = 1000
specific.avro.reader = true
schema.registry.url = [http://mtxbuctra22.prod.orange.intra:8081]

[Kafka Tool screenshot omitted]
Figured it out. The documentation for Kafka 0.9.0.1 states that fetch.min.bytes defaults to 1, but I have Kafka 0.9.0.0, where the default is 1024. So the broker only answered the fetch once two messages together exceeded that threshold. I changed fetch.min.bytes to 1 and now it works fine.
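For reference, a sketch of that override in consumer properties (constant names from the ConsumerConfig class; the broker address is a placeholder). With fetch.min.bytes=1 the broker answers a fetch as soon as any data is available, instead of holding the request until 1024 bytes accumulate or fetch.max.wait.ms (180000 here, i.e. 3 minutes) expires:

import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class LowLatencyConsumerProps {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");  // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "ifd_006");
        props.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, "1");      // answer fetches immediately
        props.put(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, "500");  // and cap broker-side waiting
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "io.confluent.kafka.serializers.KafkaAvroDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "io.confluent.kafka.serializers.KafkaAvroDeserializer");
        props.put("schema.registry.url", "http://mtxbuctra22.prod.orange.intra:8081");

        KafkaConsumer<Object, Object> consumer = new KafkaConsumer<>(props);
        // subscribe and poll as before
    }
}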
Kafka Java consumer marked as dead for group
I'm using a Java consumer to consume messages from a topic (Kafka version 0.10.0.1), which works fine if I run it outside a Docker container. When I execute it in a Docker container, however, the group is marked as dead with the message:

Marking the coordinator local.kafka.com:9092 (id: 2147483647 rack: null) dead for group my-group

My consumer configuration is as follows:

metadata.max.age.ms = 300000
partition.assignment.strategy = [org.apache.kafka.clients.consumer.RangeAssignor]
reconnect.backoff.ms = 50
sasl.kerberos.ticket.renew.window.factor = 0.8
max.partition.fetch.bytes = 1048576
bootstrap.servers = [192.168.115.128:9092, 192.168.115.128:9093]
ssl.keystore.type = JKS
enable.auto.commit = true
sasl.mechanism = GSSAPI
interceptor.classes = null
exclude.internal.topics = true
ssl.truststore.password = null
client.id = consumer-1
ssl.endpoint.identification.algorithm = null
max.poll.records = 2147483647
check.crcs = true
request.timeout.ms = 40000
heartbeat.interval.ms = 3000
auto.commit.interval.ms = 5000
receive.buffer.bytes = 65536
ssl.truststore.type = JKS
ssl.truststore.location = null
ssl.keystore.password = null
fetch.min.bytes = 1
send.buffer.bytes = 131072
value.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
group.id = my-group
retry.backoff.ms = 100
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
ssl.trustmanager.algorithm = PKIX
ssl.key.password = null
fetch.max.wait.ms = 500
sasl.kerberos.min.time.before.relogin = 60000
connections.max.idle.ms = 540000
session.timeout.ms = 30000
metrics.num.samples = 2
key.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
ssl.protocol = TLS
ssl.provider = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
ssl.keystore.location = null
ssl.cipher.suites = null
security.protocol = PLAINTEXT
ssl.keymanager.algorithm = SunX509
metrics.sample.window.ms = 30000
auto.offset.reset = earliest

The auto.commit property is set to false and the poll.timeout is set to 10000. Can somebody please point out where I am mistaken?
It might be your advertised.listeners broker config, or the lack of one, handing the consumer an incorrect URL after the first discovery call to bootstrap.servers. This can cause the consumer to use an incorrect URL for all of its subsequent RPC calls.
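A hedged sketch of the broker-side fix in server.properties; the host and port are placeholders and must be resolvable and reachable from inside the container:

# Accept connections on all interfaces of the broker host
listeners=PLAINTEXT://0.0.0.0:9092
# Address handed back to clients after the bootstrap call; every
# subsequent RPC from the consumer goes to this host:port.
advertised.listeners=PLAINTEXT://192.168.115.128:9092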
In short, this means inactivity in the communication between broker and consumer: the coordinator connection is terminated. For reference, an actual implementation of mine in Spark Streaming: in that application our batches could last up to five minutes, so we adjusted the Kafka properties to these settings:

"heartbeat.interval.ms" -> "30000"
"session.timeout.ms" -> "90000"
"request.timeout.ms" -> "120000"

The interval is five times the original default value, which the documentation says is suited for half-minute batches; pay attention that you have to account for the extremely long batches (those which lag out). The other two are just bigger than that because Kafka requires it. A related configuration is spark.streaming.kafka.consumer.poll.ms. It might be meaningful to set this one rather small, like ten seconds, on the rationale that if something goes wrong, a setup with a high number of Spark task reattempts (spark.task.maxFailures) will cover for it. I have always found the configuration of Kafka with Spark quite daunting, especially on the Kafka side. The rule of thumb is always: go with the defaults, and override only if strictly needed.
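A sketch of how those settings could be wired into a Spark Streaming job, assuming the spark-streaming-kafka-0-10 integration; the broker, topic, group, retry count, and five-minute batch length are placeholders:

import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class LongBatchKafkaStream {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf()
                .setAppName("long-batch-kafka")
                .set("spark.streaming.kafka.consumer.poll.ms", "10000") // fail fast on a stuck poll...
                .set("spark.task.maxFailures", "8");                    // ...and let task retries cover it
        JavaStreamingContext ssc = new JavaStreamingContext(conf, Durations.minutes(5));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092"); // placeholder
        kafkaParams.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");             // placeholder
        kafkaParams.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        kafkaParams.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        kafkaParams.put("heartbeat.interval.ms", "30000"); // survive five-minute batches
        kafkaParams.put("session.timeout.ms", "90000");
        kafkaParams.put("request.timeout.ms", "120000");

        KafkaUtils.createDirectStream(ssc,
                LocationStrategies.PreferConsistent(),
                ConsumerStrategies.<String, String>Subscribe(Arrays.asList("my-topic"), kafkaParams))
            .foreachRDD(rdd -> { /* batch processing here */ });

        ssc.start();
        ssc.awaitTermination();
    }
}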