I have a Java 8 application working with Apache Kafka 2.11-0.10.1.0. I need to use the seek feature to poll old messages from partitions. However, I keep hitting a No current assignment for partition exception every time I try to seek by offset. Here's the class responsible for seeking topics to a specified timestamp:
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.TopicPartition;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.util.CollectionUtils;
import java.time.Instant;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
/**
* The main purpose of this class is to move the fetching point for each partition of the {@link KafkaConsumer}
* to some offset which is determined either by timestamp or by offset number.
*/
public class KafkaSeeker {
public static final long APP_STARTUP_TIME = Instant.now().toEpochMilli();
private final Logger LOGGER = LoggerFactory.getLogger(this.getClass());
private final KafkaConsumer<String, String> kafkaConsumer;
private ConsumerRecords<String, String> polledRecords;
public KafkaSeeker(KafkaConsumer<String, String> kafkaConsumer) {
this.kafkaConsumer = kafkaConsumer;
this.polledRecords = new ConsumerRecords<>(Collections.emptyMap());
}
/**
* For each assigned or subscribed topic, {@link org.apache.kafka.clients.consumer.KafkaConsumer#seek(TopicPartition, long)}
* moves the fetching pointer to the offset matching the specified {@code timestamp}.
* If no offset is found for a partition,
* then {@link org.apache.kafka.clients.consumer.KafkaConsumer#seekToEnd(Collection)} will be called for it.
*
* Because {@link KafkaConsumer#subscribe(Pattern)} and {@link KafkaConsumer#assign(Collection)} are lazy,
* the method needs to execute a dummy {@link KafkaConsumer#poll(long)} first. All {@link ConsumerRecords} which were
* polled from the buffer are swallowed and produce warning logs.
*
* @param timestamp is used to find the proper offset to seek to
* @param topics are used to seek only specific topics. If not specified or empty, all subscribed topics are used.
*/
public Map<TopicPartition, OffsetAndTimestamp> seek(long timestamp, Collection<String> topics) {
this.polledRecords = kafkaConsumer.poll(0);
Collection<TopicPartition> topicPartitions;
if (CollectionUtils.isEmpty(topics)) {
topicPartitions = kafkaConsumer.assignment();
} else {
topicPartitions = topics.stream()
.map(it -> {
List<Integer> partitions = kafkaConsumer.partitionsFor(it).stream()
.map(PartitionInfo::partition).collect(Collectors.toList());
return partitions.stream().map(partition -> new TopicPartition(it, partition));
})
.flatMap(it -> it)
.collect(Collectors.toList());
}
if (topicPartitions.isEmpty()) {
throw new IllegalStateException("Kafka consumer doesn't have any subscribed topics.");
}
Map<TopicPartition, Long> timestampsByTopicPartitions = topicPartitions.stream()
.collect(Collectors.toMap(Function.identity(), topicPartition -> timestamp));
Map<TopicPartition, Long> beginningOffsets = kafkaConsumer.beginningOffsets(topicPartitions);
Map<TopicPartition, OffsetAndTimestamp> offsets = kafkaConsumer.offsetsForTimes(timestampsByTopicPartitions);
for (Map.Entry<TopicPartition, OffsetAndTimestamp> entry : offsets.entrySet()) {
TopicPartition topicPartition = entry.getKey();
if (entry.getValue() != null) {
LOGGER.info("Kafka seek topic:partition [{}:{}] from [{} offset] to [{} offset].",
topicPartition.topic(),
topicPartition.partition(),
beginningOffsets.get(topicPartition),
entry.getValue());
kafkaConsumer.seek(topicPartition, entry.getValue().offset());
} else {
LOGGER.info("Kafka seek topic:partition [{}:{}] from [{} offset] to the end of partition.",
topicPartition.topic(),
topicPartition.partition());
kafkaConsumer.seekToEnd(Collections.singleton(topicPartition));
}
}
return offsets;
}
public ConsumerRecords<String, String> getPolledRecords() {
return polledRecords;
}
}
Before calling the method, I have the consumer subscribed to a single topic like this: consumer.subscribe(singletonList(kafkaTopic));. When I call kafkaConsumer.assignment() it returns zero assigned TopicPartitions. But if I specify the topic and fetch its partitions, I get valid TopicPartitions, although they still fail on the seek call with the error in the title. What did I forget?
The correct way to reliably seek and check the current assignment is to wait for the onPartitionsAssigned() callback after subscribing. On a newly created (not yet connected) consumer, calling poll() once does not guarantee it will immediately be connected and assigned partitions.
As a basic example, see the code below: it subscribes to a topic and, in the assignment callback, seeks to the desired position. You'll notice that the poll loop then correctly sees only records from the seek location and not from the previously committed or reset offset.
public static final Map<TopicPartition, Long> offsets = Map.of(new TopicPartition("testtopic", 0), 5L);
public static void main(String args[]) {
Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ConsumerConfig.GROUP_ID_CONFIG, "test");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
try (Consumer<String, String> consumer = new KafkaConsumer<>(props)) {
consumer.subscribe(Collections.singletonList("testtopic"), new ConsumerRebalanceListener() {
@Override
public void onPartitionsRevoked(Collection<TopicPartition> partitions) {}
@Override
public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
System.out.println("Assigned " + partitions);
for (TopicPartition tp : partitions) {
OffsetAndMetadata oam = consumer.committed(tp);
if (oam != null) {
System.out.println("Current offset is " + oam.offset());
} else {
System.out.println("No committed offsets");
}
Long offset = offsets.get(tp);
if (offset != null) {
System.out.println("Seeking to " + offset);
consumer.seek(tp, offset);
}
}
}
});
for (int i = 0; i < 10; i++) {
System.out.println("Calling poll");
ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100L));
for (ConsumerRecord<String, String> r : records) {
System.out.println("record from " + r.topic() + "-" + r.partition() + " at offset " + r.offset());
}
}
}
}
KafkaConsumer<String, byte[]> consumer = new KafkaConsumer<>(props);
// Get topic partitions
List<TopicPartition> partitions = consumer
.partitionsFor(topic)
.stream()
.map(partitionInfo ->
new TopicPartition(topic, partitionInfo.partition()))
.collect(Collectors.toList());
// Explicitly assign the partitions to our consumer
consumer.assign(partitions);
//seek, query offsets, or poll
Please note that this disables consumer group management and rebalancing operations. When possible, use @Mickael Maison's approach.
Related
I'm attempting to send data via an actor to a runnable graph that contains a fan out.
I define the source as:
final Source<Integer, ActorRef> integerSource =
Source.actorRef(
elem -> {
if (elem == Done.done()) return Optional.of(CompletionStrategy.immediately());
else return Optional.empty();
},
elem -> Optional.empty(),
10,
OverflowStrategy.dropHead());
But I'm unsure how to get a handle on an ActorRef so I can send data via an actor to the source, so that the runnable graph will process messages asynchronously as they are received:
RunnableGraph<CompletionStage<Done>> graph = RunnableGraph.fromGraph(
GraphDSL.create(sink, (builder, out) -> {
SourceShape<Integer> sourceShape = builder.add(integerSource);
FlowShape<Integer, Integer> flow1Shape = builder.add(flow1);
FlowShape<Integer, Integer> flow2Shape = builder.add(flow1);
UniformFanOutShape<Integer, Integer> broadcast =
builder.add(Broadcast.create(2));
UniformFanInShape<Integer, Integer> merge =
builder.add(Merge.create(2));
builder.from(sourceShape)
.viaFanOut(broadcast)
.via(flow1Shape);
builder.from(broadcast).via(flow2Shape);
builder.from(flow1Shape)
.viaFanIn(merge)
.to(out);
builder.from(flow2Shape).viaFanIn(merge);
return ClosedShape.getInstance();
} )
);
Entire source:
import akka.Done;
import akka.NotUsed;
import akka.actor.ActorRef;
import akka.actor.typed.ActorSystem;
import akka.actor.typed.javadsl.Behaviors;
import akka.stream.*;
import akka.stream.javadsl.*;
import lombok.extern.slf4j.Slf4j;
import java.util.Optional;
import java.util.concurrent.CompletionStage;
@Slf4j
public class GraphActorSource {
private final static ActorSystem actorSystem = ActorSystem.create(Behaviors.empty(), "flowActorSystem");
public void runFlow() {
final Source<Integer, ActorRef> integerSource =
Source.actorRef(
elem -> {
if (elem == Done.done()) return Optional.of(CompletionStrategy.immediately());
else return Optional.empty();
},
elem -> Optional.empty(),
10,
OverflowStrategy.dropHead());
Flow<Integer, Integer, NotUsed> flow1 = Flow.of(Integer.class)
.map (x -> {
System.out.println("Flow 1 is processing " + x);
return (x * 2);
});
Sink<Integer, CompletionStage<Done>> sink = Sink.foreach(x -> {
System.out.println(x);
});
RunnableGraph<CompletionStage<Done>> graph = RunnableGraph.fromGraph(
GraphDSL.create(sink, (builder, out) -> {
SourceShape<Integer> sourceShape = builder.add(integerSource);
FlowShape<Integer, Integer> flow1Shape = builder.add(flow1);
FlowShape<Integer, Integer> flow2Shape = builder.add(flow1);
UniformFanOutShape<Integer, Integer> broadcast =
builder.add(Broadcast.create(2));
UniformFanInShape<Integer, Integer> merge =
builder.add(Merge.create(2));
builder.from(sourceShape)
.viaFanOut(broadcast)
.via(flow1Shape);
builder.from(broadcast).via(flow2Shape);
builder.from(flow1Shape)
.viaFanIn(merge)
.to(out);
builder.from(flow2Shape).viaFanIn(merge);
return ClosedShape.getInstance();
} )
);
graph.run(actorSystem);
}
public static void main(String args[]){
new GraphActorSource().runFlow();
}
}
How to send data to the Runnable graph via an actor?
Something like this?
integerSource.tell(1)
integerSource.tell(2)
integerSource.tell(3)
ActorRef.tell works. Construct the graph blueprint so the source ActorRef will be returned when the blueprint is materialized and run.
For just one materialized object, use that materialized type for the materialized type parameter of the Graph.
Here the materialized type parameter for integerSource is ActorRef.
The materialized type parameter for Graph is also ActorRef.
Only integerSource is passed to GraphDSL.create.
Source<Integer, ActorRef> integerSource = ...
Graph<ClosedShape, ActorRef> graph =
GraphDSL.create(integerSource, (builder, src) -> {
...
});
RunnableGraph<ActorRef> runnableGraph = RunnableGraph.fromGraph(graph);
ActorRef actorRef = runnableGraph.run(actorSystem);
actorRef.tell(1, ActorRef.noSender());
To access more than one materialized object, a tuple must be constructed to capture them. If two objects from the materialized graph are desired, say src and snk, then Pair<A,B> can capture both types.
Here both integerSource and sink are passed to GraphDSL.create.
The materialized ActorRef and CompletionStage are paired for the result of run with Pair::new.
The type Pair<ActorRef,CompletionStage<Done>> is the materialized type parameter of the Graph.
Source<Integer, ActorRef> integerSource = ...
Sink<Integer, CompletionStage<Done>> sink = ...
Graph<ClosedShape, Pair<ActorRef, CompletionStage<Done>>> graph =
GraphDSL.create(integerSource, sink, Pair::new, (builder, src, snk) -> {
....
});
RunnableGraph<Pair<ActorRef, CompletionStage<Done>>> runnableGraph =
RunnableGraph.fromGraph(graph);
Pair<ActorRef, CompletionStage<Done>> pair =
runnableGraph.run(actorSystem);
ActorRef actorRef = pair.first();
CompletionStage<Done> completionStage = pair.second();
actorRef.tell(1, ActorRef.noSender());
Full example:
(build.gradle)
apply plugin: "java"
apply plugin: "application"
mainClassName = "GraphActorSource"
repositories {
mavenCentral()
}
dependencies {
implementation "com.typesafe.akka:akka-actor-typed_2.13:2.6.19"
implementation "com.typesafe.akka:akka-stream-typed_2.13:2.6.19"
implementation 'org.slf4j:slf4j-jdk14:1.7.36'
}
compileJava {
options.compilerArgs << "-Xlint:unchecked"
}
(src/main/java/GraphActorSource.java)
import akka.Done;
import akka.NotUsed;
import akka.actor.ActorRef;
import akka.actor.Status.Success;
import akka.actor.typed.ActorSystem;
import akka.actor.typed.javadsl.Behaviors;
import akka.japi.Pair;
import akka.stream.*;
import akka.stream.javadsl.*;
import akka.util.Timeout;
import java.util.Optional;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.TimeUnit;
public class GraphActorSource {
private final static ActorSystem actorSystem =
ActorSystem.create(Behaviors.empty(), "flowActorSystem");
public void runFlow() {
// 1. Create graph (blueprint)
// 1a. Define source, flows, and sink
final Source<Integer, ActorRef> integerSource =
Source.actorRef
(
elem -> {
if (elem == Done.done()) return Optional.of(CompletionStrategy.immediately());
else return Optional.empty();
},
elem -> Optional.empty(),
10,
OverflowStrategy.dropHead()
);
Flow<Integer, Integer, NotUsed> flow1 = Flow.of(Integer.class)
.map (x -> {
System.out.println("Flow 1 is processing " + x);
return (100 + x);
});
Flow<Integer, Integer, NotUsed> flow2 = Flow.of(Integer.class)
.map (x -> {
System.out.println("Flow 2 is processing " + x);
return (200 + x);
});
Sink<Integer, CompletionStage<Done>> sink = Sink.foreach(x -> {
System.out.println("Sink received "+x);
});
// 1b. Connect nodes and flows into a graph.
// Inputs and output nodes (source, sink) will be produced at run start.
Graph<ClosedShape, Pair<ActorRef, CompletionStage<Done>>> graph =
GraphDSL.create(integerSource, sink, Pair::new, (builder, src, snk) -> {
UniformFanOutShape<Integer, Integer> broadcast =
builder.add(Broadcast.create(2));
FlowShape<Integer, Integer> flow1Shape = builder.add(flow1);
FlowShape<Integer, Integer> flow2Shape = builder.add(flow2);
UniformFanInShape<Integer, Integer> merge =
builder.add(Merge.create(2));
builder.from(src)
.viaFanOut(broadcast);
builder.from(broadcast.out(0))
.via(flow1Shape)
.toInlet(merge.in(0));
builder.from(broadcast.out(1))
.via(flow2Shape)
.toInlet(merge.in(1));
builder.from(merge)
.to(snk);
return ClosedShape.getInstance();
} );
RunnableGraph<Pair<ActorRef, CompletionStage<Done>>> runnableGraph =
RunnableGraph.fromGraph(graph);
// 2. Start run,
// which produces materialized source ActorRef and sink CompletionStage.
Pair<ActorRef, CompletionStage<Done>> pair =
runnableGraph.run(actorSystem);
ActorRef actorRef = pair.first();
CompletionStage<Done> completionStage = pair.second();
// On completion, terminates actor system (optional).
completionStage.thenRun(() -> {
System.out.println("Done, terminating.");
actorSystem.terminate();
});
// 3. Send messages to source actor
actorRef.tell(1, ActorRef.noSender());
actorRef.tell(2, ActorRef.noSender());
// The stream completes successfully with the following message
actorRef.tell(Done.done(), ActorRef.noSender());
}
public static void main(String args[]){
new GraphActorSource().runFlow();
}
}
Reference: Akka Documentation (version 2.6.19)
Streams / Operators / Source.actorRef
Streams / Streams Cookbook / Working with operators
Kafka consumer not receiving messages produced before the consumer gets started.
public class MyKafkaConsumer {
private final KafkaConsumer<String, String> consumer;
private final String TOPIC="javaapp";
private final String BOOTSTRAP_SERVERS="localhost:9092";
private int receivedCounter=0;
private ExecutorService executorService=Executors.newFixedThreadPool(1);
private BlockingQueue<ConsumerRecords<String, String>> queue=new LinkedBlockingQueue<>(500000);
private MyKafkaConsumer() {
final Properties props=new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, BOOTSTRAP_SERVERS);
props.put(ConsumerConfig.GROUP_ID_CONFIG, "KafkaGroup6");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
consumer=new KafkaConsumer<>(props);
consumer.subscribe(Collections.singletonList(TOPIC));
}
public static void main(String[] args) throws InterruptedException {
MyKafkaConsumer perfKafkaConsumer=new MyKafkaConsumer();
perfKafkaConsumer.consumeMessage();
perfKafkaConsumer.runConsumer();
}
private void runConsumer() throws InterruptedException {
consumer.poll(Duration.ofMillis(1000));
while (true) {
final ConsumerRecords<String, String> consumerRecords=consumer.poll(Duration.ofMillis(10000));
if (!consumerRecords.isEmpty()) {
System.out.println("Adding result in queue " + queue.size());
queue.put(consumerRecords);
}
consumer.commitAsync();
}
}
private void consumeMessage() {
System.out.println("Consumer starts at " + Instant.now());
executorService.submit(() -> {
while (true) {
ConsumerRecords<String, String> poll=queue.take();
poll.forEach(record -> {
System.out.println("Received " + ++receivedCounter + " time " + Instant.now(Clock.systemUTC()));
});
}
});
}
}
The returned ConsumerRecords are always empty.
I checked the offsets using Kafka Tool.
I have also tried with a different group name; it's not working, same issue, i.e. poll returns empty records.
However, if I start my consumer before the producer, then it receives the messages. (Kafka client version 2.4.1)
The auto.offset.reset consumer setting controls where a new consumer group will begin consuming from a topic. By default it is set to 'latest', which sets the consumer group's offset to the latest offset. Set it to 'earliest' if consumer groups without committed offsets should start at the earliest offset in the topic.
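For illustration, a minimal sketch of that change applied to the MyKafkaConsumer constructor from the question (the other properties stay as they are):
// start from the earliest available offset whenever this group has no committed position
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");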
I use the following code to create one producer, which produces around 2000 messages.
public class ProducerDemoWithCallback {
public static void main(String[] args) {
final Logger logger = LoggerFactory.getLogger(ProducerDemoWithCallback.class);
String bootstrapServers = "localhost:9092";
Properties properties = new Properties();
properties.setProperty(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
properties.setProperty(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
properties.setProperty(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
// create the producer
KafkaProducer<String, String> producer = new KafkaProducer<String, String>(properties);
for (int i=0; i<2000; i++ ) {
// create a producer record
ProducerRecord<String, String> record =
new ProducerRecord<String, String>("TwitterProducer", "Hello World " + Integer.toString(i));
// send data - asynchronous
producer.send(record, new Callback() {
public void onCompletion(RecordMetadata recordMetadata, Exception e) {
// executes every time a record is successfully sent or an exception is thrown
if (e == null) {
// the record was successfully sent
logger.info("Received new metadata. \n" +
"Topic:" + recordMetadata.topic() + "\n" +
"Partition: " + recordMetadata.partition() + "\n" +
"Offset: " + recordMetadata.offset() + "\n" +
"Timestamp: " + recordMetadata.timestamp());
} else {
logger.error("Error while producing", e);
}
}
});
}
// flush data
producer.flush();
// flush and close producer
producer.close();
}
}
I want to count those messages and get the total as an int value.
I use this command and it works, but I am trying to get this count using code.
"bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092 --topic TwitterProducer --time -1"
and the result is
- TwitterProducer:0:2000
My code to do the same programmatically looks something like this, but I'm not sure if this is the correct way to get the count:
int valueCount = (int) recordMetadata.offset();
System.out.println("Offset value " + valueCount);
Can someone help me get the count of Kafka messages from the offset values using code?
You can have a look at implementation details of GetOffsetShell.
Here is a simplified code re-written in Java:
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;
import java.util.*;
import java.util.stream.Collectors;
public class GetOffsetCommand {
private static final Set<String> TopicNames = new HashSet<>();
static {
TopicNames.add("my-topic");
TopicNames.add("not-my-topic");
}
public static void main(String[] args) {
TopicNames.forEach(topicName -> {
final Map<TopicPartition, Long> offsets = getOffsets(topicName);
new ArrayList<>(offsets.entrySet()).forEach(System.out::println);
System.out.println(topicName + ":" + offsets.values().stream().reduce(0L, Long::sum));
});
}
private static Map<TopicPartition, Long> getOffsets(String topicName) {
final KafkaConsumer<String, String> consumer = makeKafkaConsumer();
final List<TopicPartition> partitions = listTopicPartitions(consumer, topicName);
return consumer.endOffsets(partitions);
}
private static KafkaConsumer<String, String> makeKafkaConsumer() {
final Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.GROUP_ID_CONFIG, "get-offset-command");
return new KafkaConsumer<>(props);
}
private static List<TopicPartition> listTopicPartitions(KafkaConsumer<String, String> consumer, String topicName) {
return consumer.listTopics().entrySet().stream()
.filter(t -> topicName.equals(t.getKey()))
.flatMap(t -> t.getValue().stream())
.map(p -> new TopicPartition(p.topic(), p.partition()))
.collect(Collectors.toList());
}
}
which prints the end offset of each of the topic's partitions and their sum (the total number of messages), like:
my-topic-0=184
my-topic-2=187
my-topic-4=189
my-topic-1=196
my-topic-3=243
my-topic:999
Why do you want to get that value? If you share more detail about the purpose, I can give you a better tip.
For your last question: that is not the correct way to get the count of messages from the offset value. It only works if your topic has one partition and a single producer; you need to consider that a topic usually has several partitions.
If you want to get the number of messages sent by each producer, you can count them in the onCompletion() callback, as sketched below.
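For example, a minimal sketch of counting in the callback, fitted to the producer loop from the question (the java.util.concurrent.atomic.AtomicLong is declared before the loop so the asynchronous callbacks can safely increment it):
final AtomicLong successCount = new AtomicLong();
for (int i = 0; i < 2000; i++) {
    ProducerRecord<String, String> record =
            new ProducerRecord<>("TwitterProducer", "Hello World " + i);
    producer.send(record, (recordMetadata, e) -> {
        if (e == null) {
            // count only records the broker acknowledged
            successCount.incrementAndGet();
        }
    });
}
// flush() blocks until all previously sent records have completed, so the count is final here
producer.flush();
System.out.println("Successfully produced " + successCount.get() + " records");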
Or you can get the last offset using the Consumer client like this:
Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "your-brokers");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
Consumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Collections.singletonList("topic_name"));
// poll once so the consumer actually gets partitions assigned before reading the assignment
consumer.poll(Duration.ofMillis(100));
Collection<TopicPartition> partitions = consumer.assignment();
consumer.seekToEnd(partitions);
for(TopicPartition tp: partitions) {
long offsetPosition = consumer.position(tp);
}
I'm setting up an application that consumes Kafka messages.
I followed the Spring docs about Deserialization Error Handling in order to catch deserialization exceptions. I've tried the failedDeserializationFunction method.
This is my consumer configuration class:
@Bean
public Map<String, Object> consumerConfigs() {
Map<String, Object> consumerProps = new HashMap<>();
consumerProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
consumerProps.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
consumerProps.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, offsetReset);
consumerProps.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, autoCommit);
/* Error Handling */
consumerProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ErrorHandlingDeserializer2.class);
consumerProps.put(ErrorHandlingDeserializer2.VALUE_DESERIALIZER_CLASS, JsonDeserializer.class.getName());
consumerProps.put(ErrorHandlingDeserializer2.VALUE_FUNCTION, FailedNTCMessageBodyProvider.class);
return consumerProps;
}
@Bean
public ConsumerFactory<String, NTCMessageBody> consumerFactory() {
return new DefaultKafkaConsumerFactory<>(consumerConfigs(), new StringDeserializer(),
new JsonDeserializer<>(NTCMessageBody.class));
}
@Bean
public ConcurrentKafkaListenerContainerFactory<String, NTCMessageBody> kafkaListenerContainerFactory() {
ConcurrentKafkaListenerContainerFactory<String, NTCMessageBody> factory = new ConcurrentKafkaListenerContainerFactory<>();
factory.setConsumerFactory(consumerFactory());
return factory;
}
This is the BiFunction provider:
public class FailedNTCMessageBodyProvider implements BiFunction<byte[], Headers, NTCMessageBody> {
@Override
public NTCMessageBody apply(byte[] t, Headers u) {
return new NTCBadMessageBody(t);
}
}
public class NTCBadMessageBody extends NTCMessageBody{
private final byte[] failedDecode;
public NTCBadMessageBody(byte[] failedDecode) {
this.failedDecode = failedDecode;
}
public byte[] getFailedDecode() {
return this.failedDecode;
}
}
When I send just one corrupted message to the topic, I get this error (in a loop):
org.apache.kafka.common.errors.SerializationException: Error deserializing key/value
I understood that the ErrorHandlingDeserializer2 should delegate to the NTCBadMessageBody type and continue the consumption. I also saw (in debug mode) that it never enters the constructor of the NTCBadMessageBody class.
Use ErrorHandlingDeserializer.
When a deserializer fails to deserialize a message, Spring has no way to handle the problem because it occurs before the poll() returns. To solve this problem, version 2.2 introduced the ErrorHandlingDeserializer. This deserializer delegates to a real deserializer (key or value). If the delegate fails to deserialize the record content, the ErrorHandlingDeserializer returns a DeserializationException instead, containing the cause and raw bytes. When using a record-level MessageListener, if either the key or value contains a DeserializationException, the container’s ErrorHandler is called with the failed ConsumerRecord. When using a BatchMessageListener, the failed record is passed to the application along with the remaining records in the batch, so it is the responsibility of the application listener to check whether the key or value in a particular record is a DeserializationException.
You can use the DefaultKafkaConsumerFactory constructor that takes key and value Deserializer objects and wire in appropriate ErrorHandlingDeserializer configured with the proper delegates. Alternatively, you can use consumer configuration properties which are used by the ErrorHandlingDeserializer to instantiate the delegates. The property names are ErrorHandlingDeserializer.KEY_DESERIALIZER_CLASS and ErrorHandlingDeserializer.VALUE_DESERIALIZER_CLASS; the property value can be a class or class name.
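As a sketch of the constructor-based wiring, using the NTCMessageBody type from the question (this assumes a Spring Kafka version where the wrapper takes its delegate as a constructor argument; on 2.2.x the class is named ErrorHandlingDeserializer2):
@Bean
public ConsumerFactory<String, NTCMessageBody> consumerFactory() {
    // wrap the real deserializers so a bad record surfaces as a DeserializationException
    // instead of repeatedly failing inside poll()
    return new DefaultKafkaConsumerFactory<>(
            consumerConfigs(),
            new ErrorHandlingDeserializer<>(new StringDeserializer()),
            new ErrorHandlingDeserializer<>(new JsonDeserializer<>(NTCMessageBody.class)));
}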
package com.mypackage.app.config;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeoutException;
import com.mypacakage.app.model.kafka.message.KafkaEvent;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.annotation.EnableKafka;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.core.DefaultKafkaConsumerFactory;
import org.springframework.kafka.listener.ListenerExecutionFailedException;
import org.springframework.kafka.support.serializer.ErrorHandlingDeserializer;
import org.springframework.kafka.support.serializer.JsonDeserializer;
import org.springframework.retry.policy.SimpleRetryPolicy;
import org.springframework.retry.support.RetryTemplate;
import lombok.extern.slf4j.Slf4j;
@EnableKafka
@Configuration
@Slf4j
public class KafkaConsumerConfig {
@Value("${kafka.bootstrap-servers}")
private String servers;
@Value("${listener.group-id}")
private String groupId;
@Bean
public ConcurrentKafkaListenerContainerFactory<String, KafkaEvent> ListenerFactory() {
ConcurrentKafkaListenerContainerFactory<String, KafkaEvent> factory = new ConcurrentKafkaListenerContainerFactory<>();
factory.setConsumerFactory(consumerFactory());
factory.setRetryTemplate(retryTemplate());
factory.setErrorHandler(((exception, data) -> {
/*
* here you can do your custom handling; I am just logging it, the same as the default
* error handler does. If you just want to log, you need not configure the error
* handler here, the default handler does it for you. Generally, you will
* persist the failed records to a DB for tracking.
*/
log.error("Error in process with Exception {} and the record is {}", exception, data);
}));
return factory;
}
@Bean
public ConsumerFactory<String, KafkaEvent> consumerFactory() {
Map<String, Object> config = new HashMap<>();
config.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, servers);
config.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
config.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ErrorHandlingDeserializer.class);
config.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, ErrorHandlingDeserializer.class);
config.put(ErrorHandlingDeserializer.KEY_DESERIALIZER_CLASS, StringDeserializer.class);
config.put(ErrorHandlingDeserializer.VALUE_DESERIALIZER_CLASS, JsonDeserializer.class.getName());
config.put(JsonDeserializer.VALUE_DEFAULT_TYPE,
"com.mypackage.app.model.kafka.message.KafkaEvent");
config.put(JsonDeserializer.TRUSTED_PACKAGES, "com.mypackage.app");
return new DefaultKafkaConsumerFactory<>(config);
}
private RetryTemplate retryTemplate() {
RetryTemplate retryTemplate = new RetryTemplate();
/*
* the retry policy sets the number of retry attempts and which exceptions
* should be retried and which should not.
*/
retryTemplate.setRetryPolicy(retryPolicy());
return retryTemplate;
}
private SimpleRetryPolicy retryPolicy() {
Map<Class<? extends Throwable>, Boolean> exceptionMap = new HashMap<>();
// the boolean value in the map determines whether the exception should be retried
exceptionMap.put(IllegalArgumentException.class, false);
exceptionMap.put(TimeoutException.class, true);
exceptionMap.put(ListenerExecutionFailedException.class, true);
return new SimpleRetryPolicy(3, exceptionMap, true);
}
}
ErrorHandlingDeserializer
When a deserializer fails to deserialize a message, Spring has no way to handle the problem because it occurs before the poll() returns. To solve this problem, version 2.2 introduced the ErrorHandlingDeserializer. This deserializer delegates to a real deserializer (key or value). If the delegate fails to deserialize the record content, the ErrorHandlingDeserializer returns a DeserializationException instead, containing the cause and raw bytes. When using a record-level MessageListener, if either the key or value contains a DeserializationException, the container’s ErrorHandler is called with the failed ConsumerRecord. When using a BatchMessageListener, the failed record is passed to the application along with the remaining records in the batch, so it is the responsibility of the application listener to check whether the key or value in a particular record is a DeserializationException.
So, according to your code, you are using a record-level MessageListener; just add an ErrorHandler to the container.
Handling Exceptions
If your error handler implements this interface, you can, for example, adjust the offsets accordingly. For example, to reset the offset to replay the failed message, you could do something like the following; note, however, that these are simplistic implementations and you would probably want more checking in the error handler.
@Bean
public ConsumerAwareListenerErrorHandler listen3ErrorHandler() {
return (m, e, c) -> {
this.listen3Exception = e;
MessageHeaders headers = m.getHeaders();
c.seek(new org.apache.kafka.common.TopicPartition(
headers.get(KafkaHeaders.RECEIVED_TOPIC, String.class),
headers.get(KafkaHeaders.RECEIVED_PARTITION_ID, Integer.class)),
headers.get(KafkaHeaders.OFFSET, Long.class));
return null;
};
}
Or you can do custom implementation like in this example
@Bean
public ConcurrentKafkaListenerContainerFactory<String, GenericRecord>
kafkaListenerContainerFactory() {
ConcurrentKafkaListenerContainerFactory<String, GenericRecord> factory
= new ConcurrentKafkaListenerContainerFactory<>();
factory.setConsumerFactory(consumerFactory());
factory.getContainerProperties().setErrorHandler(new ErrorHandler() {
@Override
public void handle(Exception thrownException, List<ConsumerRecord<?, ?>> records, Consumer<?, ?> consumer, MessageListenerContainer container) {
String s = thrownException.getMessage().split("Error deserializing key/value for partition ")[1].split(". If needed, please seek past the record to continue consumption.")[0];
String topics = s.split("-")[0];
int offset = Integer.valueOf(s.split("offset ")[1]);
int partition = Integer.valueOf(s.split("-")[1].split(" at")[0]);
TopicPartition topicPartition = new TopicPartition(topics, partition);
//log.info("Skipping " + topic + "-" + partition + " offset " + offset);
consumer.seek(topicPartition, offset + 1);
System.out.println("OKKKKK");
}
@Override
public void handle(Exception e, ConsumerRecord<?, ?> consumerRecord) {
}
@Override
public void handle(Exception e, ConsumerRecord<?, ?> consumerRecord, Consumer<?,?> consumer) {
String s = e.getMessage().split("Error deserializing key/value for partition ")[1].split(". If needed, please seek past the record to continue consumption.")[0];
String topics = s.split("-")[0];
int offset = Integer.valueOf(s.split("offset ")[1]);
int partition = Integer.valueOf(s.split("-")[1].split(" at")[0]);
TopicPartition topicPartition = new TopicPartition(topics, partition);
//log.info("Skipping " + topic + "-" + partition + " offset " + offset);
consumer.seek(topicPartition, offset + 1);
System.out.println("OKKKKK");
}
});
return factory;
}
The above answer may have a problem if the topic name contains a character like '-', so I have modified the same logic to use a regex.
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.errors.SerializationException;
import org.springframework.kafka.listener.ErrorHandler;
import org.springframework.kafka.listener.MessageListenerContainer;
import lombok.extern.slf4j.Slf4j;
@Slf4j
public class KafkaErrHandler implements ErrorHandler {
/**
* Prevents the consumer from getting stuck on a serialization error.
*
* @param e the thrown exception
* @param consumer the Kafka consumer used to seek past the bad record
*/
private void seekSerializeException(Exception e, Consumer<?, ?> consumer) {
String p = ".*partition (.*) at offset ([0-9]*).*";
Pattern r = Pattern.compile(p);
Matcher m = r.matcher(e.getMessage());
if (m.find()) {
int idx = m.group(1).lastIndexOf("-");
String topics = m.group(1).substring(0, idx);
int partition = Integer.parseInt(m.group(1).substring(idx));
int offset = Integer.parseInt(m.group(2));
TopicPartition topicPartition = new TopicPartition(topics, partition);
consumer.seek(topicPartition, (offset + 1));
log.info("Skipped message with offset {} from partition {}", offset, partition);
}
}
@Override
public void handle(Exception e, ConsumerRecord<?, ?> record, Consumer<?, ?> consumer) {
log.error("Error in process with Exception {} and the record is {}", e, record);
if (e instanceof SerializationException)
seekSerializeException(e, consumer);
}
@Override
public void handle(Exception e, List<ConsumerRecord<?, ?>> records, Consumer<?, ?> consumer,
MessageListenerContainer container) {
log.error("Error in process with Exception {} and the records are {}", e, records);
if (e instanceof SerializationException)
seekSerializeException(e, consumer);
}
@Override
public void handle(Exception e, ConsumerRecord<?, ?> record) {
log.error("Error in process with Exception {} and the record is {}", e, record);
}
}
Finally, use the error handler in the configuration:
@Bean
public ConcurrentKafkaListenerContainerFactory<String, GenericType> macdStatusListenerFactory() {
ConcurrentKafkaListenerContainerFactory<String, GenericType> factory = new ConcurrentKafkaListenerContainerFactory<>();
factory.setConsumerFactory(macdStatusConsumerFactory());
factory.setRetryTemplate(retryTemplate());
factory.setErrorHandler(new KafkaErrHandler());
return factory;
}
However, parsing the error string to get the partition, topic, and offset is not recommended. If anyone has a better solution, please post it here.
In my factory I've added a common error handler:
factory.setCommonErrorHandler(new KafkaMessageErrorHandler());
and KafkaMessageErrorHandler is created as follows:
class KafkaMessageErrorHandler implements CommonErrorHandler {
@Override
public void handleRecord(Exception thrownException, ConsumerRecord<?, ?> record, Consumer<?, ?> consumer, MessageListenerContainer container) {
manageException(thrownException, consumer);
}
@Override
public void handleOtherException(Exception thrownException, Consumer<?, ?> consumer, MessageListenerContainer container, boolean batchListener) {
manageException(thrownException, consumer);
}
private void manageException(Exception ex, Consumer<?, ?> consumer) {
log.error("Error polling message: " + ex.getMessage());
if (ex instanceof RecordDeserializationException) {
RecordDeserializationException rde = (RecordDeserializationException) ex;
consumer.seek(rde.topicPartition(), rde.offset() + 1L);
consumer.commitSync();
} else {
log.error("Exception not handled");
}
}
}
I need to execute a job at night that gets all the messages in a Kafka queue and runs a process on them. I'm able to get the messages, but the Kafka stream keeps waiting for more messages and I'm not able to continue with my process. I have the following code:
...
private ConsumerConnector consumerConnector;
private final static String TOPIC = "test";
public MessageStreamConsumer() {
Properties properties = new Properties();
properties.put("zookeeper.connect", "localhost:2181");
properties.put("group.id", "test-group");
ConsumerConfig consumerConfig = new ConsumerConfig(properties);
consumerConnector = Consumer.createJavaConsumerConnector(consumerConfig);
}
public List<String> getMessages() {
Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
topicCountMap.put(TOPIC, new Integer(1));
Map<String, List<KafkaStream<byte[], byte[]>>> consumerMap = consumerConnector
.createMessageStreams(topicCountMap);
KafkaStream<byte[], byte[]> stream = consumerMap.get(TOPIC).get(0);
ConsumerIterator<byte[], byte[]> it = stream.iterator();
List<String> messages = new ArrayList<>();
while (it.hasNext())
messages.add(new String(it.next().message()));
return messages;
}
The code is able to get the messages, but when it has processed the last message it blocks on the line:
while (it.hasNext())
The question is: how can I get all the messages from Kafka, stop the stream, and continue with my other tasks?
I hope you can help me
Thanks
It seems that the Kafka stream (old high-level consumer) does not support consuming from the beginning.
You could create a native Kafka consumer and set auto.offset.reset to earliest; it will then consume messages from the beginning.
Something like this may work. Basically, the idea is to use a KafkaConsumer and poll until you get some records, then stop when you get an empty batch.
package kafka.examples;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Collections;
import java.util.Date;
import java.util.Properties;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
public class Consumer1 extends Thread
{
private final KafkaConsumer<Integer, String> consumer;
private final String topic;
private final DateFormat df;
private final String logTag;
private boolean noMoreData = false;
private boolean gotData = false;
private int messagesReceived = 0;
AtomicBoolean isRunning = new AtomicBoolean(true);
CountDownLatch shutdownLatch = new CountDownLatch(1);
public Consumer1(Properties props)
{
logTag = "Consumer1";
consumer = new KafkaConsumer<>(props);
this.topic = props.getProperty("topic");
this.df = new SimpleDateFormat("HH:mm:ss");
consumer.subscribe(Collections.singletonList(this.topic));
}
public void getMessages() {
System.out.println("Getting messages...");
while (noMoreData == false) {
//System.out.println(logTag + ": Doing work...");
ConsumerRecords<Integer, String> records = consumer.poll(1000);
Date now = Calendar.getInstance().getTime();
int recordsCount = records.count();
messagesReceived += recordsCount;
System.out.println("recordsCount: " + recordsCount);
if (recordsCount > 0) {
gotData = true;
}
if (gotData && recordsCount == 0) {
noMoreData = true;
}
for (ConsumerRecord<Integer, String> record : records) {
int kafkaKey = record.key();
String kafkaValue = record.value();
System.out.println(this.df.format(now) + " " + logTag + ":" +
" Received: {" + kafkaKey + ":" + kafkaValue + "}" +
", partition(" + record.partition() + ")" +
", offset(" + record.offset() + ")");
}
}
System.out.println("Received " + messagesReceived + " messages");
}
public void processMessages() {
System.out.println("Processing messages...");
}
public void run() {
getMessages();
processMessages();
}
}
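For reference, the Properties passed to Consumer1 are not shown above; a sketch of what they might contain, including the custom "topic" key the class reads and the auto.offset.reset setting from the previous answer (broker address and group id are placeholders, the topic name "test" comes from the question):
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "nightly-job"); // hypothetical group id
props.put("key.deserializer", "org.apache.kafka.common.serialization.IntegerDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("auto.offset.reset", "earliest"); // read from the beginning for a group with no commits
props.put("topic", "test"); // custom key read by Consumer1's constructor
Consumer1 consumer = new Consumer1(props);
consumer.start(); // Consumer1 extends Thread, so run() executes getMessages() then processMessages()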
I'm currently developing with Kafka 0.10.0.1 and found mixed information regarding the use of the consumer property auto.offset.reset, so I've done some experiments to figure out what actually happens.
Based on those, I now understand it this way: when you set the property
auto.offset.reset=earliest
this positions the consumer EITHER at the first available message in the assigned partitions (when no commits have been made on the partitions) OR at the last committed partition offsets (note that you should always commit the last read offset + 1, or else you'll be re-reading the last committed message on each restart of your consumer).
Alternatively, you do not set auto.offset.reset, which means the default value of 'latest' will be used.
In that case you do not receive any old messages when connecting the consumer; only messages published to the topic after the consumer connects will be received.
In conclusion: if you want to make sure you receive all available messages for a certain topic and its assigned partitions, you'll have to call seekToBeginning().
It seems advisable to call poll(0L) first to ensure your consumer gets partitions assigned (or do the seek inside a ConsumerRebalanceListener, as sketched further below), then seek each of the assigned partitions to the beginning:
kafkaConsumer.poll(0L);
kafkaConsumer.seekToBeginning(kafkaConsumer.assignment());
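And a sketch of the ConsumerRebalanceListener variant (the topic name is a placeholder, and kafkaConsumer is assumed to be a field or effectively final local), which avoids relying on poll(0L) having already completed the assignment:
kafkaConsumer.subscribe(Collections.singletonList("my-topic"), new ConsumerRebalanceListener() {
    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        // nothing to do for this use case
    }

    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        // called from within poll() once partitions are actually assigned,
        // so seeking here cannot hit "No current assignment for partition"
        kafkaConsumer.seekToBeginning(partitions);
    }
});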