I'm trying to implement a Java application with Redis Streams where every consumer consumes exactly one message at a time. It should work like a pipeline/queue: each consumer takes exactly one message, processes it, and after finishing takes the next message in the stream that has not been processed yet.
What works is that every message is consumed by exactly one consumer (with xreadgroup).
I started with this tutorial from redislabs
The code:
RedisClient redisClient = RedisClient.create("redis://pw@host:port");
StatefulRedisConnection<String, String> connection = redisClient.connect();
RedisCommands<String, String> syncCommands = connection.sync();
try {
syncCommands.xgroupCreate(XReadArgs.StreamOffset.from(STREAM_KEY, "0-0"), ID_READ_GROUP);
} catch (RedisBusyException redisBusyException) {
System.out.println(String.format("\t Group '%s' already exists", ID_READ_GROUP));
}
System.out.println("Waiting for new messages ");
while (true) {
List<StreamMessage<String, String>> messages = syncCommands.xreadgroup(
Consumer.from(ID_READ_GROUP, ID_WORKER), XReadArgs.StreamOffset.lastConsumed(STREAM_KEY));
if (!messages.isEmpty()) {
System.out.println(messages.size()); //
for (StreamMessage<String, String> message : messages) {
System.out.println(message.getId());
Thread.sleep(5000);
syncCommands.xack(STREAM_KEY, ID_READ_GROUP, message.getId());
}
}
}
My current problem is that a consumer takes more than one message from the queue. In some situations the other consumers are waiting while one consumer is processing 10 messages at once.
Thanks in advance!
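Note that XREADGROUP accepts a COUNT argument, which limits how many entries a single call returns.
In Lettuce you set it by passing XReadArgs to xreadgroup, for example XReadArgs.Builder.count(1); see the JavaDoc of xreadgroup for the overloads.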
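To see why a per-read limit changes the distribution, here is a minimal sketch that uses only the JDK and does not talk to Redis. The class `CountLimitDemo` and its methods are hypothetical stand-ins: `read` hands out at most `count` undelivered entries per call, which is the effect COUNT has on a group read, and `distribute` lets the workers read in turn. Strict turn-taking is an assumption made to keep the result deterministic.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

class CountLimitDemo {
    /** Hands out up to `count` undelivered entries, like a group read with a COUNT limit. */
    static List<String> read(Deque<String> stream, int count) {
        List<String> batch = new ArrayList<>();
        while (batch.size() < count && !stream.isEmpty()) {
            batch.add(stream.poll());
        }
        return batch;
    }

    /** Returns how many entries each worker received when the workers read in turn. */
    static int[] distribute(int entries, int workers, int count) {
        Deque<String> stream = new ArrayDeque<>();
        for (int i = 0; i < entries; i++) {
            stream.add("msg-" + i);
        }
        int[] received = new int[workers];
        int turn = 0;
        while (!stream.isEmpty()) {
            received[turn % workers] += read(stream, count).size();
            turn++;
        }
        return received;
    }

    public static void main(String[] args) {
        // No limit: the first reader drains everything that is waiting.
        System.out.println(Arrays.toString(distribute(10, 3, Integer.MAX_VALUE))); // [10, 0, 0]
        // Limit of one entry per read: the work is spread across the readers.
        System.out.println(Arrays.toString(distribute(10, 3, 1))); // [4, 3, 3]
    }
}
```

With a limit of one, a worker that is busy processing its entry simply does not read again until it has finished, so idle workers pick up the remaining entries.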
I'm dealing with an Azure Service Bus Subscription and its Dead Letter Queue in Java SpringBoot setup.
I need to process the messages in DLQ when there is a trigger.
I have 12 messages in the DLQ, I need to read 5 messages in one go and submit it to an ExecutorService to process the individual messages.
I created an IMessageReceiver deadLetterReceiver, then did batch receiving with deadLetterReceiver.receiveBatch(5).
The catch is that the next batch is not read until the messages in the 1st batch have been processed, and the 1st batch of messages is not removed from the DLQ; it remains there.
The problem is after I process the 1st batch and read the 2nd batch from the ASB, instead of getting the next 5 messages, I get the same messages again.
For example; if I have messages with messageId 1 to 12 in DLQ, after reading the 1st batch, I get messages with messageId 1,2,3,4,5. After reading the second batch instead of getting 6,7,8,9,10 I'm getting 1,2,3,4,5.
Here is the code:
public void processDeadLetterQueue(){
IMessageReceiver deadLetterReceiver = getDeadLetterMessageReceiver();
Long deadLetterMessageCount = getDeadLetterMessageCount();
Long receivedMessageCount = 0L;
ExecutorService executor = Executors.newFixedThreadPool(2);
while(receivedMessageCount < deadLetterMessageCount) {
Collection<IMessage> messageList = deadLetterReceiver.receiveBatch(5);
receivedMessageCount += messageList.size();
List<Callable<Void>> callableDeadLetterMessages = new ArrayList<>();
messageList.forEach(message ->callableDeadLetterMessages.add(() -> {
handleDeadLetterMessage(message, deadLetterReceiver);
return null;
}));
try {
List<Future<Void>> futureList = executor.invokeAll(callableDeadLetterMessages);
for (Future<Void> future : futureList) {
future.get();
}
} catch (InterruptedException | ExecutionException ex){
log.error("Interrupted during processing callableDeadLetterMessage: ", ex);
Thread.currentThread().interrupt();
}
}
executor.shutdown();
deadLetterReceiver.close();
}
How can I stop it reading the same message again in the next batch and read the next available messages instead?
Note: I'm not abandoning the message from the DLQ (deadLetterReceiver.abandon(message.getLockToken());)
I have a message producer on my local machine and a broker on a remote host (AWS).
After sending a message from the producer, I wait and then run the console consumer on the remote host. It prints excessive logs, but not the value sent by the producer.
The producer flushes the data after calling the send method.
Everything is configured correctly.
How can I check to see that the broker received the message from the producer and to see if the producer received the answer?
The send method asynchronously sends the message to the topic and returns a Future of RecordMetadata.
java.util.concurrent.Future<RecordMetadata> send(ProducerRecord<K,V> record)
Asynchronously sends a record to a topic
After the flush call, check that the Future has completed by calling its isDone method (that is, Future.isDone() == true). The documentation of flush() says:
Invoking this method makes all buffered records immediately available to send (even if linger.ms is greater than 0) and blocks on the completion of the requests associated with these records. The post-condition of flush() is that any previously sent record will have completed (e.g. Future.isDone() == true). A request is considered completed when it is successfully acknowledged according to the acks configuration you have specified or else it results in an error.
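As a small illustration of inspecting such a Future, here is a sketch that uses only the JDK. The helper `status` is hypothetical and accepts any `Future`, so it could also be given the one returned by `send`. In `main` a `CompletableFuture` stands in for the producer's result, and the strings passed to it are made-up sample values, not real broker output.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

class SendStatus {
    /** Reports the state of a send result without blocking while it is still pending. */
    static String status(Future<?> future) {
        if (!future.isDone()) {
            return "pending";
        }
        try {
            // Returns immediately because isDone() is already true.
            Object result = future.get();
            return "acknowledged: " + result;
        } catch (ExecutionException e) {
            // The cause carries the error reported for the request.
            return "failed: " + e.getCause().getMessage();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        }
    }

    public static void main(String[] args) {
        CompletableFuture<String> ack = new CompletableFuture<>();
        System.out.println(status(ack)); // pending
        ack.complete("partition=0, offset=42");
        System.out.println(status(ack)); // acknowledged: partition=0, offset=42

        CompletableFuture<String> error = new CompletableFuture<>();
        error.completeExceptionally(new IllegalStateException("broker unreachable"));
        System.out.println(status(error)); // failed: broker unreachable
    }
}
```

A result that is done but failed is easy to overlook: isDone() returns true in both cases, so get() (or a callback) is what tells success from failure.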
The RecordMetadata contains the offset and the partition
public int partition()
The partition the record was sent to
public long offset()
The offset of the record, or -1 if hasOffset() returns false.
Alternatively, you can use a Callback to find out whether the message was sent to the topic:
Fully non-blocking usage can make use of the Callback parameter to provide a callback that will be invoked when the request is complete.
Here is a clear example from the docs:
ProducerRecord<byte[],byte[]> record = new ProducerRecord<byte[],byte[]>("the-topic", key, value);
producer.send(record,
new Callback() {
public void onCompletion(RecordMetadata metadata, Exception e) {
if(e != null) {
e.printStackTrace();
} else {
System.out.println("The offset of the record we just sent is: " + metadata.offset());
}
}
});
You can also call get() on the Future that send returns. It blocks until the request completes and returns the RecordMetadata, or throws an exception if the send failed:
ProducerRecord<String, String> record =
new ProducerRecord<>("SampleTopic", "SampleKey", "SampleValue");
try {
producer.send(record).get();
} catch (Exception e) {
e.printStackTrace();
}
Use exactly-once delivery and you won't need to worry about whether your message arrived: https://www.baeldung.com/kafka-exactly-once, https://www.confluent.io/blog/exactly-once-semantics-are-possible-heres-how-apache-kafka-does-it/
I am new to Kafka and I have a question that I'm not able to resolve.
I have installed Kafka and Zookeeper on my own computer running Windows (not Linux), and I have created a broker with a topic with several partitions (trying between 6 and 12 partitions).
When I create consumers, they work perfectly and read at good speed. For the producer, I have created the simple producer that one can see on many web sites. The producer is inside a loop and sends many short messages (about 2000 very short messages).
I can see that the consumers read the 2000 messages very quickly, but the producer sends messages to the broker at roughly 140 to 150 messages per second. As I said before, I'm working on my own laptop (only 1 disk), but when I read about millions of messages per second, I think there is something I forgot, because I'm light-years away from that.
If I use more producers, the result is worse.
Is it a question of more brokers on the same node or something like that? This problem has been assigned to me at my job and I do not have the option of a better computer.
The code for creating the producer is
public class Producer {
public void publica(String topic, String strKey, String strValue) {
Properties configProperties = new Properties();
configProperties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
configProperties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, LongSerializer.class.getName());
configProperties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
KafkaProducer<String, String> producer = new KafkaProducer<String, String>(configProperties);
ProducerRecord<String, String> rec = new ProducerRecord<String, String>(topic, strValue);
producer.send(rec);
}
}
and the code for sending messages is (partial):
Producer prod = new Producer();
for (int i = 0; i < 2000; i++)
{
key = String.valueOf(i);
prod.publica("TopicName", key, texto + " - " + key);
// System.out.println(i + " - " + System.currentTimeMillis());
}
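Create your Kafka producer once and reuse it for every message, instead of constructing a new one on each call to publica:
public class Producer {
    private final KafkaProducer<String, String> producer;
    // Build the producer a single time; configProperties is the same Properties object as in the question.
    public Producer(Properties configProperties) {
        this.producer = new KafkaProducer<>(configProperties);
    }
    public void publica(String topic, String strKey, String strValue) {
        ProducerRecord<String, String> rec = new ProducerRecord<String, String>(topic, strValue);
        producer.send(rec);
    }
    // Call once when you are done so that buffered records are flushed.
    public void close() { producer.close(); }
}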
Also take a look at the producer and broker configurations available here. There are several options with which you can tune for your application's needs.
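As a rough starting point for that tuning, here is a sketch of a throughput-oriented producer configuration built with plain `java.util.Properties`. The keys are standard producer setting names; the class `ProducerTuning`, the method name, and every value are assumptions chosen for illustration, not measured recommendations, so treat them as something to experiment with.

```java
import java.util.Properties;

class ProducerTuning {
    /** Builds illustrative producer settings that favour batching over per-message latency. */
    static Properties throughputConfig(String bootstrapServers) {
        Properties config = new Properties();
        config.put("bootstrap.servers", bootstrapServers);
        config.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        config.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // Wait a few milliseconds so that more records fit into one request (assumed value).
        config.put("linger.ms", "5");
        // Upper bound in bytes for one batch per partition (assumed value).
        config.put("batch.size", "65536");
        // Compress whole batches to reduce network and disk traffic (assumed choice).
        config.put("compression.type", "lz4");
        // Wait only for the partition leader to acknowledge (assumed choice).
        config.put("acks", "1");
        return config;
    }

    public static void main(String[] args) {
        Properties config = throughputConfig("localhost:9092");
        System.out.println(config.getProperty("linger.ms")); // 5
    }
}
```

The resulting object can be handed to the KafkaProducer constructor once. Batching settings like these only help if the same producer instance is reused across sends.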
I installed Kafka on DC/OS (Mesos) cluster on AWS. Enabled three brokers and created a topic called "topic1".
dcos kafka topic create topic1 --partitions 3 --replication 3
Then I wrote a Producer class to send messages and a Consumer class to receive them.
public class Producer {
public static void sendMessage(String msg) throws InterruptedException, ExecutionException {
Map<String, Object> producerConfig = new HashMap<>();
System.out.println("setting Producerconfig.");
producerConfig.put("bootstrap.servers",
"172.16.20.207:9946,172.16.20.234:9125,172.16.20.36:9636");
ByteArraySerializer serializer = new ByteArraySerializer();
System.out.println("Creating KafkaProducer");
KafkaProducer<byte[], byte[]> kafkaProducer = new KafkaProducer<>(producerConfig, serializer, serializer);
for (int i = 0; i < 100; i++) {
String msgstr = msg + i;
byte[] message = msgstr.getBytes();
ProducerRecord<byte[], byte[]> record = new ProducerRecord<>("topic1", message);
System.out.println("Sent:" + msgstr);
kafkaProducer.send(record);
}
kafkaProducer.close();
}
public static void main(String[] args) throws InterruptedException, ExecutionException {
sendMessage("Kafka test message 2/27 3:32");
}
}
public class Consumer {
public static String getMessage() {
Map<String, Object> consumerConfig = new HashMap<>();
consumerConfig.put("bootstrap.servers",
"172.16.20.207:9946,172.16.20.234:9125,172.16.20.36:9636");
consumerConfig.put("group.id", "dj-group");
consumerConfig.put("enable.auto.commit", "true");
consumerConfig.put("auto.offset.reset", "earliest");
ByteArrayDeserializer deserializer = new ByteArrayDeserializer();
KafkaConsumer<byte[], byte[]> kafkaConsumer = new KafkaConsumer<>(consumerConfig, deserializer, deserializer);
kafkaConsumer.subscribe(Arrays.asList("topic1"));
while (true) {
ConsumerRecords<byte[], byte[]> records = kafkaConsumer.poll(100);
System.out.println(records.count() + " of records received.");
for (ConsumerRecord<byte[], byte[]> record : records) {
System.out.println(Arrays.toString(record.value()));
}
}
}
public static void main(String[] args) {
getMessage();
}
}
First I ran Producer on the cluster to send messages to topic1. However, when I ran Consumer, it didn't receive anything; it just hung.
Producer is working, since I was able to get all the messages by running the shell script that comes with the Kafka installation:
./bin/kafka-console-consumer.sh --zookeeper master.mesos:2181/dcos-service-kafka --topic topic1 --from-beginning
But why can't I receive them with Consumer? This post suggests that a group.id with an old offset might be a possible cause. I only set group.id in the consumer, not in the producer. How do I configure the offset for this group?
As it turns out, kafkaConsumer.subscribe(Arrays.asList("topic1")); is causing poll() to hang. According to "Kafka Consumer does not receive messages", there are two ways to connect to a topic: assign and subscribe. After I replaced subscribe with the lines below, it started working.
TopicPartition tp = new TopicPartition("topic1", 0);
List<TopicPartition> tps = Arrays.asList(tp);
kafkaConsumer.assign(tps);
However the output shows arrays of numbers which is not expected (Producer sent Strings). But I guess this is a separate issue.
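The arrays of numbers are most likely the raw byte values: the consumer above prints `Arrays.toString(record.value())`, which lists each byte instead of decoding the text. A minimal sketch of the difference follows. The class `DecodeValue` is hypothetical, and UTF-8 is an assumption (the producer calls `getBytes()` without a charset, so it uses the platform default).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class DecodeValue {
    /** Turns a record value back into text, assuming it was encoded as UTF-8. */
    static String decode(byte[] value) {
        return new String(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] value = "Kafka".getBytes(StandardCharsets.UTF_8);
        // Printing the array shows one number per byte.
        System.out.println(Arrays.toString(value)); // [75, 97, 102, 107, 97]
        // Decoding the bytes restores the original string.
        System.out.println(decode(value)); // Kafka
    }
}
```

In the consumer loop this would mean printing `decode(record.value())`, or configuring a String deserializer so that the values arrive as strings.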
Make sure you shut down your consumer gracefully:
consumer.close()
TLDR
When you have two consumers running with the same group id, Kafka won't assign the same partition of your topic to both.
If you repeatedly run an app that spins up a consumer with the same group id and you don't shut the consumers down gracefully, Kafka will take a while to consider a consumer from an earlier run dead and reassign its partitions to a new one.
If new messages come to that partition and it is never assigned to your new consumer, that consumer will never see the messages.
To debug:
How many partitions your topic has:
./kafka-topics --zookeeper <host-port> --describe --topic <topic>
How far your group has consumed from each partition:
./kafka-consumer-groups --bootstrap-server <host-port> --describe --group <group-id>
If you already have your partitions stuck on stale consumers, either wipe the state of your Kafka or use a new group id.
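The graceful-shutdown advice above comes down to a loop that can be asked to stop and that always closes its client on the way out. Here is a sketch of that shape using only the JDK. `GracefulLoop` and its `Client` interface are hypothetical stand-ins for the consumer, not Kafka classes.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class GracefulLoop {
    /** Hypothetical stand-in for a consumer: something that polls and must be closed. */
    interface Client {
        void poll();
        void close();
    }

    private final AtomicBoolean running = new AtomicBoolean(true);

    /** Asks the loop to finish after the current poll; may be called from another thread. */
    void stop() {
        running.set(false);
    }

    /** Polls until stop() is called, then always closes the client, even if poll() throws. */
    void run(Client client) {
        try {
            while (running.get()) {
                client.poll();
            }
        } finally {
            client.close();
        }
    }

    /** Polls three times, stops, and reports whether close() was reached. */
    static boolean demo() {
        GracefulLoop loop = new GracefulLoop();
        int[] polls = {0};
        boolean[] closed = {false};
        loop.run(new Client() {
            public void poll() {
                if (++polls[0] == 3) {
                    loop.stop();
                }
            }
            public void close() {
                closed[0] = true;
            }
        });
        return closed[0] && polls[0] == 3;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // true
    }
}
```

With a real KafkaConsumer, poll can block, so a shutdown hook would typically also call consumer.wakeup() to interrupt it. The try/finally structure around the loop stays the same.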
I am a student researching and playing around with Kafka. After following the examples on the Apache documentation, I'm playing around with the examples portion in the trunk of their current Github repo.
As of right now, the example implements an 'older' version of their Consumer and does not employ the new KafkaConsumer. Following the documentation, I have written my own version of the KafkaConsumer thinking that it would be faster.
This is a vague question, but on each run-through I produce 5000 simple messages such as "Message_CurrentMessageNumber" to a topic "test" and then use my consumer to fetch these messages and print them to stdout. When I run the example code, replacing the provided consumer with the newer KafkaConsumer (v 0.8.2 and up), it works pretty quickly and comparably to the example on its first run-through, but slows down considerably on every run after that.
I notice that my Kafka Server outputs
Rebalancing group group1 generation 3 (kafka.coordinator.ConsumerCoordinator)
or similar messages often which leads me to believe that Kafka has to do some sort of load balancing that slows stuff down but I was wondering if anyone else had insight as to what I am doing wrong.
public class AlternateConsumer extends Thread {
private final KafkaConsumer<Integer, String> consumer;
private final String topic;
private final Boolean isAsync = false;
public AlternateConsumer(String topic) {
Properties properties = new Properties();
properties.put("bootstrap.servers", "localhost:9092");
properties.put("group.id", "newestGroup");
properties.put("partition.assignment.strategy", "roundrobin");
properties.put("enable.auto.commit", "true");
properties.put("auto.commit.interval.ms", "1000");
properties.put("session.timeout.ms", "30000");
properties.put("key.deserializer", "org.apache.kafka.common.serialization.IntegerDeserializer");
properties.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
consumer = new KafkaConsumer<Integer, String>(properties);
consumer.subscribe(topic);
this.topic = topic;
}
public void run() {
while (true) {
ConsumerRecords<Integer, String> records = consumer.poll(100);
for (ConsumerRecord<Integer, String> record : records) {
System.out.println("We received message: " + record.value() + " from topic: " + record.topic());
}
}
// ConsumerRecords<Integer, String> records = consumer.poll(0);
// for (ConsumerRecord<Integer, String> record : records) {
// System.out.println("We received message: " + record.value() + " from topic: " + record.topic());
// }
// consumer.close();
}
}
To start:
package kafka.examples;
public class KafkaConsumerProducerDemo implements KafkaProperties
{
public static void main(String[] args) {
final boolean isAsync = args.length > 0 ? !args[0].trim().toLowerCase().equals("sync") : true;
Producer producerThread = new Producer("test", isAsync);
producerThread.start();
AlternateConsumer consumerThread = new AlternateConsumer("test");
consumerThread.start();
}
}
The producer is the default producer located here: https://github.com/apache/kafka/blob/trunk/examples/src/main/java/kafka/examples/Producer.java
This should not be the case. If the setup is similar for your two consumers, you should expect better results with the new consumer unless there is an issue in the client/consumer implementation, which seems to be the case here.
Can you share your benchmark results, the frequency of the reported rebalancing, and any pattern you are observing (e.g. sluggish once at startup, after a fixed number of consumed messages, after the queue is drained)? It would also help if you shared some details about your consumer implementation.