"Commit failed for offsets" while committing offset asynchronously - java

I have a Kafka consumer that consumes data from a particular topic, and I am seeing the exception below. I am using Kafka version 0.10.0.0.
LoggingCommitCallback.onComplete: Commit failed for offsets= {....}, eventType= some_type, time taken= 19ms, error= org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured session.timeout.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
I added these two extra consumer properties, but it still didn't help:
session.timeout.ms=20000
max.poll.records=500
I am committing offsets in a different background thread as shown below:
kafkaConsumer.commitAsync(new LoggingCommitCallback(consumerType.name()));
What does that error mean and how can I resolve it? Do I need to add some other consumer properties?

Yes, lower max.poll.records. You'll get smaller batches of data, but the more frequent calls to poll() that result will help keep the session alive.
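A rough sizing sketch for the answer above. In 0.10.0.x, heartbeats are only sent while the consumer is being driven by poll(), so a whole batch has to be processed inside session.timeout.ms. With the question's settings and an assumed 50 ms per record (measure your own), 500 records need 25 s, which is more than the 20 s session timeout. `PollBudget` is a hypothetical helper, not part of the Kafka client; only the two property names are real consumer settings.

```java
import java.util.Properties;

// Hypothetical helper (not a Kafka API): sizes max.poll.records so that one
// batch can be processed well inside the session timeout.
final class PollBudget {
    private PollBudget() {}

    /**
     * @param sessionTimeoutMs the consumer's session.timeout.ms
     * @param perRecordMs      assumed/measured processing time per record
     * @param safetyFactor     fraction of the timeout you are willing to spend, e.g. 0.5
     * @return the largest batch size that fits the budget (at least 1)
     */
    static int maxPollRecords(long sessionTimeoutMs, long perRecordMs, double safetyFactor) {
        if (perRecordMs <= 0) {
            throw new IllegalArgumentException("perRecordMs must be > 0");
        }
        long budgetMs = (long) (sessionTimeoutMs * safetyFactor);
        return (int) Math.max(1, budgetMs / perRecordMs);
    }

    /** Builds the two consumer properties discussed in the question. */
    static Properties consumerProps(long sessionTimeoutMs, int maxPollRecords) {
        Properties props = new Properties();
        props.setProperty("session.timeout.ms", Long.toString(sessionTimeoutMs));
        props.setProperty("max.poll.records", Integer.toString(maxPollRecords));
        return props;
    }
}
```

With the assumed 50 ms per record and a 50% safety margin, `PollBudget.maxPollRecords(20000, 50, 0.5)` yields 200, which would then go into `consumerProps(20000, 200)`.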

Related

What causes duplicate messages to be repulled when using commitSync?

I have a consumer set up that manually commits offsets. Events are in the millions to low billions. I'm committing offsets if and only if processing was successful within the consumer batch being processed. However, we're noticing that even with commitSync being called successfully, we have hundreds of thousands of duplicates. We call commitSync and then re-pull the exact same data from the topic on the next poll. Why would this happen?
@Ryan: make sure that you have set the following consumer property:
props.setProperty("enable.auto.commit", "false");
If that alone does not give you the desired result under heavy load, commit the current offsets explicitly with the method overload below, so that those records are not returned again by the next poll:
public void commitSync(java.util.Map<TopicPartition,OffsetAndMetadata> offsets)
See the KafkaConsumer#commitSync API documentation.
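A frequent cause of re-pulled duplicates is committing the offset of the last processed record instead of the one after it: the KafkaConsumer javadoc states that the committed offset should be the next message the application will consume, i.e. lastProcessedMessageOffset + 1. The sketch below models only that bookkeeping with the JDK; `CommitTracker` is hypothetical, and partitions are plain strings (e.g. "events-0") standing in for Kafka's `TopicPartition`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of commit bookkeeping. The returned map plays the role of
// Map<TopicPartition, OffsetAndMetadata> passed to commitSync(...).
final class CommitTracker {
    private final Map<String, Long> nextOffsets = new HashMap<>();

    /** Call after the record at (partition, offset) was processed successfully. */
    void markProcessed(String partition, long offset) {
        // Commit the offset of the NEXT record to read: last processed + 1.
        // Math::max keeps the highest value if records are marked out of order.
        nextOffsets.merge(partition, offset + 1, Math::max);
    }

    /** Snapshot of what to commit after a successful batch. */
    Map<String, Long> offsetsToCommit() {
        return new HashMap<>(nextOffsets);
    }
}
```

In real client code each entry would become `new TopicPartition(record.topic(), record.partition())` mapped to `new OffsetAndMetadata(record.offset() + 1)`.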

Apache kafka - manual acknowledge(AbstractMessageListenerContainer.AckMode.MANUAL) not working and events replayed on library upgrade

Kafka events are getting replayed to the consumer repeatedly. I can see the following exception:
5-Nov-2019 10:43:25 ERROR [org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.run : 685] :: org.springframework.kafka.KafkaListenerEndpointContainer#2-0-C-1 :: :: Container exception
org.apache.kafka.clients.consumer.CommitFailedException:
Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member.
This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing.
You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records.
But in my case, it's just one message, which takes more than 30 minutes to process, so we acknowledge it on receipt. So I don't think the number of records is the issue. I know it can be solved by increasing max.poll.interval.ms, but it used to work before the upgrade. I'm trying to figure out the optimal workaround.
I tried AbstractMessageListenerContainer.AckMode.MANUAL_IMMEDIATE; it seems to commit the offset immediately and works, but I need to figure out why AbstractMessageListenerContainer.AckMode.MANUAL fails now.
Previous working jar versions:
spring-kafka-1.0.5.RELEASE.jar
kafka-clients-0.9.0.1.jar
Current versions (getting above exception):
spring-kafka-1.3.9.RELEASE.jar
kafka-clients-2.2.1.jar
Yes, you must increase max.poll.interval.ms; you can use MANUAL_IMMEDIATE instead to commit the offset immediately (with MANUAL, the commit is enqueued and is not actually performed until the thread exits the listener).
However, this will still not prevent a rebalance because Kafka requires the consumer to call poll() within max.poll.interval.ms.
So I suggest you switch to MANUAL_IMMEDIATE and increase the interval beyond 30 minutes.
With the old version (before 1.3), there were two threads, one for the consumer and one for the listener, so the queued ack was processed earlier. But that was a very complicated threading model, which was much simplified in 1.3 thanks to KIP-62; this behavior change is a side effect of that simplification.
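A sketch of the client-side settings for the "increase the interval beyond 30 minutes" advice. The 5-minute headroom and max.poll.records=1 are assumptions chosen for a listener that handles one very slow record at a time; the switch to AckMode.MANUAL_IMMEDIATE itself is made on the Spring listener container's properties and is not shown here.

```java
import java.time.Duration;
import java.util.Properties;

// Consumer settings for a listener where a single record can take > 30 minutes.
final class LongProcessingConsumerConfig {
    private LongProcessingConsumerConfig() {}

    static Properties build(Duration worstCaseProcessing, Duration headroom) {
        Properties props = new Properties();
        long intervalMs = worstCaseProcessing.plus(headroom).toMillis();
        // poll() must be called again within this interval, or the group rebalances.
        props.setProperty("max.poll.interval.ms", Long.toString(intervalMs));
        // Assumption: one record per poll, so the interval only has to cover one record.
        props.setProperty("max.poll.records", "1");
        // Offsets are committed by the listener container, not by the client's auto-commit.
        props.setProperty("enable.auto.commit", "false");
        return props;
    }
}
```

For example, `build(Duration.ofMinutes(30), Duration.ofMinutes(5))` sets max.poll.interval.ms to 2100000 (35 minutes).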

Delayed ACK in Spring Kafka

I'm using Spring and Spring Kafka for a batching service that collects data from Kafka until certain conditions are met, and then dumps the data.
I want to acknowledge the commits when the data leaves my service, but it could potentially sit in memory for 5-10 minutes.
Given that the Acknowledgement implementations in Spring Kafka hold on to the original record(s) it seems unreasonable to hold on to them until I dump my data given what that would do to memory utilization.
Is there any other way to acknowledge / commit offsets from Spring Kafka given just the partition/offset information?
You could use AckMode.TIME or AckMode.COUNT with an incredibly large ackTime or ackCount so the container will never do the ack.
Then, pass the Consumer<?, ?> into your listener method and do the offset commit yourself.
Note, however, that the consumer is not thread-safe so you must perform the commit on the listener thread.
Also, bear in mind that records are not individually ack'd, just the offset. You can't ack "out of order".
Also, you would likely need to increase the max.poll.interval.ms beyond its default (5 minutes) to avoid a rebalance of the partitions.
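Because only an offset is committed and acks cannot be "out of order", a service that flushes data out of order has to work out how far it is safe to commit. The sketch below is a hypothetical per-partition helper (not part of Spring Kafka or the Kafka client) that keeps only offsets, not records, which also addresses the memory concern in the question. The value it reports would be committed through the `Consumer<?, ?>` on the listener thread, as the answer describes.

```java
import java.util.TreeSet;

// Hypothetical per-partition helper: remembers only offsets (longs) and
// reports the highest contiguous position that is safe to commit.
final class ContiguousOffsetTracker {
    private long nextToCommit;                               // first offset not yet done
    private final TreeSet<Long> doneAhead = new TreeSet<>(); // done, but after a gap

    ContiguousOffsetTracker(long firstOffset) {
        this.nextToCommit = firstOffset;
    }

    /** Mark a record as having left the service (may be called out of order). */
    void complete(long offset) {
        if (offset < nextToCommit) {
            return; // already covered by the committable position
        }
        doneAhead.add(offset);
        // Advance past every offset that is now contiguous with the position.
        while (!doneAhead.isEmpty() && doneAhead.first() == nextToCommit) {
            doneAhead.pollFirst();
            nextToCommit++;
        }
    }

    /** Offset to commit for this partition: last contiguous done offset + 1. */
    long committableOffset() {
        return nextToCommit;
    }
}
```

If offsets 10 and 12 are done but 11 is not, the committable offset stays at 11; once 11 completes it jumps to 13.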

Keeping consumer alive using Kafka

I'm looking for a "low-cost" method to keep a consumer alive when I'm not actively polling, i.e., while I'm still processing records from the last poll, I don't want the consumer connection to time out.
Some functions that look promising:
wakeup
commitAsync
In each case this would be non-standard usage of the API, so I'm not sure it would be a reasonable / rational approach.
Re: setting the connection timeout higher: I want the consumer to time out if it gets wedged. My question pertains to one section where I've fetched a block of records and separate threads are working through them.
The documentation seems to suggest you should call pause() and then keep actively polling. If you call poll() while paused, nothing will be returned.
For use cases where message processing time varies unpredictably, neither of these options may be sufficient. The recommended way to handle these cases is to move message processing to another thread, which allows the consumer to continue calling poll while the processor is still working. Some care must be taken to ensure that committed offsets do not get ahead of the actual position. Typically, you must disable automatic commits and manually commit processed offsets for records only after the thread has finished handling them (depending on the delivery semantics you need). Note also that you will need to pause the partition so that no new records are received from poll until after thread has finished handling those previously returned.
The documentation for pause() confirms this:
Suspend fetching from the requested partitions. Future calls to poll(long) will not return any records from these partitions until they have been resumed using resume(Collection). Note that this method does not affect partition subscription. In particular, it does not cause a group rebalance when automatic assignment is used.
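The pattern the documentation describes (pause, hand the batch to another thread, keep calling poll(), commit only after the worker finishes, then resume) can be modeled with the JDK alone. This is a sketch, not Kafka client code: `pollPaused()` is a hypothetical stand-in for `consumer.poll(...)` on paused partitions, and the pause, commit and resume calls are only indicated in comments.

```java
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// JDK-only model of "pause, process on another thread, keep polling".
final class PausedPollLoop {
    private int pausedPolls; // touched only by the polling thread

    int processBatch(List<String> batch, long perRecordMillis) {
        AtomicInteger processed = new AtomicInteger();
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            // Real code: consumer.pause(consumer.assignment());
            Future<?> done = worker.submit(() -> {
                for (String item : batch) {
                    sleepQuietly(perRecordMillis); // simulated slow processing
                    processed.incrementAndGet();
                }
            });
            // Keep "polling" while the worker runs; paused partitions return no records.
            while (!done.isDone()) {
                pollPaused();
                sleepQuietly(10); // real code: poll(timeout) blocks up to the timeout
            }
            try {
                done.get(); // surface any exception thrown by the worker
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            } catch (ExecutionException e) {
                throw new IllegalStateException("processing failed", e.getCause());
            }
            // Real code: commit the processed offsets, then consumer.resume(...).
            return processed.get();
        } finally {
            worker.shutdown();
        }
    }

    int pausedPolls() {
        return pausedPolls;
    }

    private void pollPaused() {
        pausedPolls++;
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The design point is that the polling thread never blocks on the worker; it only checks completion between polls, so commits happen strictly after processing.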
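Since Kafka 0.10.1, the consumer no longer sends heartbeats only from poll() calls; it runs the heartbeat in a separate background thread. So if that's your version, there is nothing else to do to keep the session alive, as long as each round of processing finishes within max.poll.interval.ms. See KIP-62.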
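After KIP-62 there are two separate limits: session.timeout.ms, kept alive by the background heartbeat thread, and max.poll.interval.ms, the maximum gap between poll() calls. The second one is what still catches a wedged processing loop, which is the behavior the question wants to keep. A sketch with example values (assumptions, not recommendations); the 1/3 check follows the Kafka documentation's guidance that heartbeat.interval.ms should typically be no higher than one third of session.timeout.ms.

```java
import java.util.Properties;

// Consumer timeouts for Kafka 0.10.1+ (KIP-62), where heartbeats come from a background thread.
final class Kip62Timeouts {
    private Kip62Timeouts() {}

    static Properties build(int sessionTimeoutMs, int heartbeatIntervalMs, int maxPollIntervalMs) {
        if (heartbeatIntervalMs * 3 > sessionTimeoutMs) {
            throw new IllegalArgumentException(
                "heartbeat.interval.ms should be <= session.timeout.ms / 3");
        }
        Properties props = new Properties();
        // Process liveness: detected via the background heartbeat thread.
        props.setProperty("session.timeout.ms", Integer.toString(sessionTimeoutMs));
        props.setProperty("heartbeat.interval.ms", Integer.toString(heartbeatIntervalMs));
        // Poll-loop progress: a wedged loop that stops calling poll() is evicted after this.
        props.setProperty("max.poll.interval.ms", Integer.toString(maxPollIntervalMs));
        return props;
    }
}
```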

Kafka KStreams - processing timeouts

I am attempting to use <KStream>.process() with a TimeWindows.of("name", 30000) to batch up some KTable values and send them on. It seems that 30 seconds exceeds the consumer timeout interval after which Kafka considers said consumer to be defunct and releases the partition.
I've tried upping the frequency of poll and commit interval to avoid this:
config.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, "5000");
config.put(StreamsConfig.POLL_MS_CONFIG, "5000");
Unfortunately these errors are still occurring:
(lots of these)
ERROR o.a.k.s.p.internals.RecordCollector - Error sending record to topic kafka_test1-write_aggregate2-changelog
org.apache.kafka.common.errors.TimeoutException: Batch containing 1 record(s) expired due to timeout while requesting metadata from brokers for kafka_test1-write_aggregate2-changelog-0
Followed by these:
INFO o.a.k.c.c.i.AbstractCoordinator - Marking the coordinator 12.34.56.7:9092 (id: 2147483547 rack: null) dead for group kafka_test1
WARN o.a.k.s.p.internals.StreamThread - Failed to commit StreamTask #0_0 in thread [StreamThread-1]:
org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured session.timeout.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle(ConsumerCoordinator.java:578)
Clearly I need to be sending heartbeats back to the server more often. How?
My topology is:
KStreamBuilder kStreamBuilder = new KStreamBuilder();
KStream<String, String> lines = kStreamBuilder.stream(TOPIC);
KTable<Windowed<String>, String> kt = lines.aggregateByKey(
new DBAggregateInit(),
new DBAggregate(),
TimeWindows.of("write_aggregate2", 30000));
DBProcessorSupplier dbProcessorSupplier = new DBProcessorSupplier();
kt.toStream().process(dbProcessorSupplier);
KafkaStreams kafkaStreams = new KafkaStreams(kStreamBuilder, streamsConfig);
kafkaStreams.start();
The KTable is grouping values by key every 30 seconds. In Processor.init() I call context.schedule(30000).
DBProcessorSupplier provides an instance of DBProcessor. This is an implementation of AbstractProcessor where all the overrides have been provided. All they do is LOG so I know when each is being hit.
It's a pretty simple topology but it's clear I'm missing a step somewhere.
Edit:
I get that I can adjust this on the server side, but I'm hoping there is a client-side solution. I like the notion of partitions being made available quickly when a client exits or dies.
Edit:
In an attempt to simplify the problem, I removed the aggregation step from the graph. It's now just consumer->processor(). (If I send the consumer directly to .print(), it works very quickly, so I know it's OK. Similarly, if I output the aggregation (KTable) via .print(), it seems OK too.)
What I found was that .process(), which should be calling .punctuate() every 30 seconds, is actually blocking for variable lengths of time and producing output somewhat randomly (if at all).
Further:
I set the log level to 'debug' and reran. I'm seeing lots of messages:
DEBUG o.a.k.s.p.internals.StreamTask - Start processing one record [ConsumerRecord <info>
but a breakpoint in the .punctuate() function isn't getting hit. So it's doing lots of work but not giving me a chance to use it.
A few clarifications:
StreamsConfig.COMMIT_INTERVAL_MS_CONFIG is a lower bound on the commit interval, i.e., after a commit, the next commit does not happen before this time has passed. Basically, Kafka Streams tries to commit as soon as possible after this time has passed, but there is no guarantee whatsoever how long it will actually take before the next commit.
StreamsConfig.POLL_MS_CONFIG is used for the internal KafkaConsumer#poll() call, to specify the maximum blocking time of the poll() call.
Thus, neither value helps to heartbeat more often.
Kafka Streams follows a "depth-first" strategy when processing records. This means that after a poll(), all operators of the topology are executed for each record. Let's assume you have three consecutive maps: then all three maps will be called for the first record before the next (second) record is processed.
Thus, the next poll() call will be made after all records of the first poll() have been fully processed. If you want to heartbeat more often, you need to make sure that a single poll() call fetches fewer records, so that processing all of them takes less time and the next poll() is triggered earlier.
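A toy model of the depth-first order described above, in plain Java rather than the Kafka Streams API, using three consecutive map steps. The trace shows each record passing through all three maps before the next record starts, and the next poll() happening only after the whole batch. The operator and trace names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Toy model (not Kafka Streams code) of depth-first execution within one poll().
final class DepthFirstModel {
    private DepthFirstModel() {}

    /**
     * Runs every polled record through all operators before starting the next
     * record; the "next poll()" is recorded only after the whole batch is done.
     */
    static List<String> runBatch(List<String> polledRecords,
                                 List<UnaryOperator<String>> operators,
                                 List<String> trace) {
        List<String> outputs = new ArrayList<>();
        for (String input : polledRecords) {
            String value = input;
            for (int i = 0; i < operators.size(); i++) {
                value = operators.get(i).apply(value);
                trace.add("map" + (i + 1) + ":" + input);
            }
            outputs.add(value);
        }
        // Only now does the consumer poll again (and, before KIP-62, heartbeat).
        trace.add("next poll()");
        return outputs;
    }
}
```

Since the time until the next poll() grows with the number of records in the batch, reducing max.poll.records is the lever the answer points to.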
You can use configuration parameters for KafkaConsumer that you can specify via StreamsConfig to get this done (see https://kafka.apache.org/documentation.html#consumerconfigs):
streamConfig.put(ConsumerConfig.XXX, VALUE);
max.poll.records: if you decrease this value, fewer records will be polled
session.timeout.ms: if you increase this value, there is more time for processing data (adding this for completeness because it is actually a client setting and not a server/broker side configuration -- even if you are aware of this solution and do not like it :))
EDIT
As of Kafka 0.10.1, it is possible (and recommended) to prefix consumer and producer configs within the streams config. This avoids parameter conflicts, as some parameter names are used by both the consumer and the producer and cannot be distinguished otherwise (they would be applied to the consumer and the producer at the same time).
To prefix a parameter you can use StreamsConfig#consumerPrefix() or StreamsConfig#producerPrefix(), respectively. For example:
streamsConfig.put(StreamsConfig.consumerPrefix(ConsumerConfig.PARAMETER), VALUE);
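A runnable sketch of the prefixing idea applied to the two settings discussed in this answer. To avoid depending on the Kafka Streams jar, it mirrors what `StreamsConfig.consumerPrefix()` and `StreamsConfig.producerPrefix()` return ("consumer." and "producer." prepended to the name); in real code you would call the `StreamsConfig` methods directly. The values 100 and 30000 are assumptions for illustration.

```java
import java.util.Properties;

// Mirrors StreamsConfig.consumerPrefix()/producerPrefix() so the example runs
// without Kafka Streams on the classpath; use the real methods in practice.
final class StreamsClientOverrides {
    private StreamsClientOverrides() {}

    static String consumerPrefix(String name) {
        return "consumer." + name;
    }

    static String producerPrefix(String name) {
        return "producer." + name;
    }

    static Properties build() {
        Properties props = new Properties();
        // Fewer records per poll(): the next poll() comes sooner.
        props.setProperty(consumerPrefix("max.poll.records"), "100");
        // More time before the group coordinator considers the consumer dead.
        props.setProperty(consumerPrefix("session.timeout.ms"), "30000");
        return props;
    }
}
```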
One more thing to add: the scenario described in this question is a known issue, and there is already KIP-62, which introduces a background thread for KafkaConsumer that sends heartbeats, thus decoupling heartbeats from poll() calls. Kafka Streams will leverage this new feature in upcoming releases.
