I am new in Kafka and I have a question that I'm not able to resolve.
I have installed Kafka and Zookeeper in my own computer in Windows (not in Linux) and I have created a broker with a topic with several partitions (playing between 6 and 12 partitions).
When I create consumers, they works perfectly and read at good speed, but referring producer, I have created the simple producer one can see in many web sites. The producer is inside a loop and is sending many short messages (about 2000 very short messages).
I can see that consumers read the 2000 messages very quicly, but producer sends message to the broker at more or less 140 or 150 messages per second. As I said before, I'm working in my own laptop (only 1 disk), but when I read about millions of messages per second, I think there is something I forgot because I'm light-years far from that.
If I use more producers, the result is worse.
Is a question of more brokers in the same node or something like that? This problem have been imposed to me in my job and I have not the possibility of a better computer.
The code for creating the producer is
public class Producer {
public void publica(String topic, String strKey, String strValue) {
Properties configProperties = new Properties();
configProperties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
configProperties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, LongSerializer.class.getName());
configProperties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
KafkaProducer<String, String> producer = new KafkaProducer<String, String>(configProperties);
ProducerRecord<String, String> rec = new ProducerRecord<String, String>(topic, strValue);
producer.send(rec);
}
}
and the code for sending messages is (partial):
Producer prod = new Producer();
for (int i = 0; i < 2000; i++)
{
key = String.valueOf(i);
prod.publica("TopicName", key, texto + " - " + key);
// System.out.println(i + " - " + System.currentTimeMillis());
}
You may create your Kafka producer once and use it every time you need to send a message:
public class Producer {
private final KafkaProducer<String, String> producer; // initialize in constructor
public void publica(String topic, String strKey, String strValue) {
ProducerRecord<String, String> rec = new ProducerRecord<String, String>(topic, strValue);
producer.send(rec);
}
}
Also take a look at the producer and broker configurations available here. There are several options with which you can tune for your application's needs.
Related
I'm actually using #KafkaListener to read events in a topic. I want to read 100 events, and after this call Thread.sleep() with a certain time period.
My problem is, when the thread wakes up from sleep, the listener continues on the last event I read, but I want to discard the events when the thread is sleeping and continues with the last events in topic.
Like:
1-100 - Capture
Thread sleeping
101-500
Thread Returns
501 - 601 - Capture
The 101-500 events can be discarded
Code:
#KafkaListener(topics = "topic")
public void consumeBalance(ConsumerRecord<String, String> payload) throws InterruptedException {
this.valorMaximoDeRequest = this.valorMaximoDeRequest + 1;
if (this.valorMaximoDeRequest <= 100) {
log.info("Encontrou evento )");
log.info("Key: " + payload.key() + ", Value:" + payload.value());
log.info("Partition:" + payload.partition() + ",Offset:" + payload.offset());
JsonObject jsonObject = new Gson().fromJson(payload.value(), JsonObject.class);
String accountId = jsonObject.get("accountId").getAsString();
log.info(">>>>>>>>> accountId: " + accountId);
} else {
this.valorMaximoDeRequest = 0;
Thread.sleep(60*1000);
}
}
Kafka config:
#Bean
public Map<String, Object> kafkaFactory() {
Map<String, Object> props = new HashMap<>();
props.put("specific.avro.reader", Boolean.TRUE);
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "brokers");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.GROUP_ID_CONFIG, "1");
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
props.put("security.protocol", "SASL_PLAINTEXT");
return props;
}
First, you shouldn't force the listening thread to sleep. The consumer may be considered as dead and trigger a consumer rebalance. You'd better use pause and resume on the consumer. See https://docs.spring.io/spring-kafka/docs/current/reference/html/#pause-resume
Then, if you want to skip the records published when the consumer was asleep, you'll have to seek (seekToBeginning) when the consumer awakes. See https://docs.spring.io/spring-kafka/docs/current/reference/html/#seek
However it is not simple: The Kafka Consumer doesn't let you seek when the consumer is active nor when it does not own the partition.
The point of having consumer groups is to keep track of the offsets which have been processed so that the subsequent consumption resumes from there and also distribute load across different consumers.
If your use-case doesn't need any of the above, you can use consumer.assign() which doesn't leverage the group management functionality.
#KafkaListener(topicPartitions = #TopicPartition(topic = "so56114299",
partitions = "#{#finder.partitions('so56114299')}"))
public void listen(#Header(KafkaHeaders.RECEIVED_MESSAGE_KEY) String key, String payload) {
System.out.println(key + ":" + payload);
}
Snippet reference: https://stackoverflow.com/a/56114699/2534090
Alternatively, you can write your own KafkaConsumer and manually call the consumer.assign() for assigning the partitions.
To answer your original question, for seeking, you need to call consumer.seekToEnd() method every time your method wakes from Thread.sleep(). The subsequent poll will fetch the records from the end offset. Looks like you can add Consumer as parameter to your #KafkaListener method.
I have multiple servers from where the messages will be produced, and I need broker and consumer at one server. If I have both producer and consumer running on same server then it works fine, but not sure what changes need to be done to keep producers separate. I don't want any dependency of zookeeper and kafka servers at producer servers as there are many and they will increase. I tried with changing bootstrap server to the broker/consumer server like 192.168.0.1:9092 while setting up KafkaProducer but still not able to generate messages. Not sure what am I missing, please help me out here.
I have followed https://github.com/mapr-demos/kafka-sample-programs for code.
Tried running both producer and consumer on same server, it works fine.
Producer.java
public class Producer {
public static void main(String[] args) throws IOException {
// set up the producer
KafkaProducer<String, String> producer;
try (InputStream props = Resources.getResource("producer.props").openStream()) {
Properties properties = new Properties();
properties.load(props);
producer = new KafkaProducer<>(properties);
}
try {
for (int i = 0; i < 1000000; i++) {
// send lots of messages
producer.send(new ProducerRecord<String, String>(
"fast-messages",
String.format("{\"type\":\"test\", \"t\":%.3f, \"k\":%d}", System.nanoTime() * 1e-9, i)));
// every so often send to a different topic
if (i % 1000 == 0) {
producer.send(new ProducerRecord<String, String>(
"fast-messages",
String.format("{\"type\":\"marker\", \"t\":%.3f, \"k\":%d}", System.nanoTime() * 1e-9, i)));
producer.send(new ProducerRecord<String, String>(
"summary-markers",
String.format("{\"type\":\"other\", \"t\":%.3f, \"k\":%d}", System.nanoTime() * 1e-9, i)));
producer.flush();
System.out.println("Sent msg number " + i);
}
}
} catch (Throwable throwable) {
System.out.printf("%s", throwable.getStackTrace());
} finally {
producer.close();
}
}
prodcuer.props
bootstrap.servers=192.168.0.1:9092
acks=all
retries=0
batch.size=16384
auto.commit.interval.ms=1000
linger.ms=0
key.serializer=org.apache.kafka.common.serialization.StringSerializer
value.serializer=org.apache.kafka.common.serialization.StringSerializer
block.on.buffer.full=true
Consumer.java
public class Consumer {
public static void main(String[] args) throws IOException {
// set up house-keeping
ObjectMapper mapper = new ObjectMapper();
Histogram stats = new Histogram(1, 10000000, 2);
Histogram global = new Histogram(1, 10000000, 2);
// and the consumer
KafkaConsumer<String, String> consumer;
try (InputStream props = Resources.getResource("consumer.props").openStream()) {
Properties properties = new Properties();
properties.load(props);
if (properties.getProperty("group.id") == null) {
properties.setProperty("group.id", "group-" + new Random().nextInt(100000));
}
consumer = new KafkaConsumer<>(properties);
}
consumer.subscribe(Arrays.asList("fast-messages", "summary-markers"));
int timeouts = 0;
//noinspection InfiniteLoopStatement
while (true) {
// read records with a short timeout. If we time out, we don't really care.
ConsumerRecords<String, String> records = consumer.poll(200);
if (records.count() == 0) {
timeouts++;
} else {
System.out.printf("Got %d records after %d timeouts\n", records.count(), timeouts);
timeouts = 0;
}
for (ConsumerRecord<String, String> record : records) {
switch (record.topic()) {
case "fast-messages":
// the send time is encoded inside the message
JsonNode msg = mapper.readTree(record.value());
switch (msg.get("type").asText()) {
case "test":
long latency = (long) ((System.nanoTime() * 1e-9 - msg.get("t").asDouble()) * 1000);
stats.recordValue(latency);
global.recordValue(latency);
break;
case "marker":
// whenever we get a marker message, we should dump out the stats
// note that the number of fast messages won't necessarily be quite constant
System.out.printf("%d messages received in period, latency(min, max, avg, 99%%) = %d, %d, %.1f, %d (ms)\n",
stats.getTotalCount(),
stats.getValueAtPercentile(0), stats.getValueAtPercentile(100),
stats.getMean(), stats.getValueAtPercentile(99));
System.out.printf("%d messages received overall, latency(min, max, avg, 99%%) = %d, %d, %.1f, %d (ms)\n",
global.getTotalCount(),
global.getValueAtPercentile(0), global.getValueAtPercentile(100),
global.getMean(), global.getValueAtPercentile(99));
stats.reset();
break;
default:
throw new IllegalArgumentException("Illegal message type: " + msg.get("type"));
}
break;
case "summary-markers":
break;
default:
throw new IllegalStateException("Shouldn't be possible to get message on topic " + record.topic());
}
}
}
}
}
consumer.props
bootstrap.servers=192.168.0.1:9092
group.id=test
enable.auto.commit=true
key.deserializer=org.apache.kafka.common.serialization.StringDeserializer
value.deserializer=org.apache.kafka.common.serialization.StringDeserializer
# fast session timeout makes it more fun to play with failover
session.timeout.ms=10000
# These buffer sizes seem to be needed to avoid consumer switching to
# a mode where it processes one bufferful every 5 seconds with multiple
# timeouts along the way. No idea why this happens.
fetch.min.bytes=50000
receive.buffer.bytes=262144
max.partition.fetch.bytes=2097152
What exactly happens when the producer is not on the broker machine? Do you see any log- or error-messages? You didn't describe your setup but is the IP 192.168.0.1. (broker machine) reachable from the producer machine and is the port 9092 open to the outside (check iptables)?
Another thing: The above code won't give you meaningful results if the producer and consumer are not on the same machine. You use System.nanoTime() to measure the latency. But according to the official documentation:
This method can only be used to measure elapsed time and is not related to any other notion of system or wall-clock time. The value returned represents nanoseconds since some fixed but arbitrary origin time (perhaps in the future, so values may be negative). The same origin is used by all invocations of this method in an instance of a Java virtual machine; other virtual machine instances are likely to use a different origin.
I have a function in java in which I am trying to fetch messages which are unread. For example, If I have messages with offSet 0,1,2 in broker which are already read by the consumer and If I switch off my consumer for an hour. And at that time I produce messages with offset 3,4,5. After that when my consumer is started it should read message from offset 3 not from 0. But, It either reads all the messages or read those messages which are produced after starting Kafka consumer. I want to read those messages which are unread or uncommited
I tried "auto.offset.reset"= "latest" and "earliest". as well as "enable.auto.commit" = "true" and "false". I also tried commitSync() and commitAsync() before calling close() method but no luck.
public static KafkaConsumer createConsumer() {
Properties properties = new Properties();
properties.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, Constants.KAFKA_BROKER);
properties.put(ConsumerConfig.GROUP_ID_CONFIG, "testGroup");
properties.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
properties.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
properties.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
properties.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "50");
properties.put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, "1");
properties.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(properties);
consumer.subscribe(Collections.singleton(Constants.TOPIC));
return consumer;
}
public static void main(String[] args) {
System.out.println("");
System.out.println("----------------");
System.out.println("");
System.out.println("KAFKA CONSUMER EXAMPLE");
System.out.println("");
System.out.println("----------------");
System.out.println("");
OffsetAndMetadata offsetAndMetadataInitial = createConsumer().committed(new TopicPartition(Constants.TOPIC, 0));
System.out.println("");
System.out.println("Offset And MetaData Initial : ");
System.out.println(offsetAndMetadataInitial);
System.out.println("");
ConsumerRecords<String, String> consumerRecords = createConsumer().poll(Duration.ofSeconds(2L));
System.out.println("");
System.out.println("Count Consumer Records : " + consumerRecords.count());
System.out.println("");
Iterator<ConsumerRecord<String, String>> itr = consumerRecords.iterator();
Map<TopicPartition, OffsetAndMetadata> partationOffsetMap = new HashMap<>(4);
while (itr.hasNext()) {
ConsumerRecord record = itr.next();
System.out.println("OffSet : " + record.offset());
System.out.println("");
System.out.println("Key : " + record.key());
System.out.println("Value : " + record.value());
System.out.println("Partition : " + record.partition());
System.out.println("--------------------------");
System.out.println("");
}
createConsumer().close();
}
I just want to fetch only unread messages in kafka Consumer. Please correct me if I am wrong somewhere. And Thanks in Advance
The main problem in your code is that you are not closing the consumer you used to poll messages; this is because each call to createConsumer() creates a new KafkaConsumer. And as you are not closing the consumer, and are calling poll() only once, you never commit the messages you have read.
(with auto-commit, commit is called within poll() after auto-commit-interval, and within close())
Once you will have corrected that it should work with following settings:
auto-commit=true (otherwise you could also commit manually, but auto-commit is simpler).
offset-reset= earliest (this has only effect the first time you consume for a given group-id, to tell if you want to consume from the begining of the topic or only messages produced after you started to consume. Once you have started to consume with a given group-id, you will always continue to consume from the latest offset you have committed.)
group-id must not change between restarts, or you will start from the begining or from the end again depending on your offset-reset setting.
Hope this helps
I installed Kafka on DC/OS (Mesos) cluster on AWS. Enabled three brokers and created a topic called "topic1".
dcos kafka topic create topic1 --partitions 3 --replication 3
Then I wrote a Producer class to send messages and a Consumer class to receive them.
public class Producer {
public static void sendMessage(String msg) throws InterruptedException, ExecutionException {
Map<String, Object> producerConfig = new HashMap<>();
System.out.println("setting Producerconfig.");
producerConfig.put("bootstrap.servers",
"172.16.20.207:9946,172.16.20.234:9125,172.16.20.36:9636");
ByteArraySerializer serializer = new ByteArraySerializer();
System.out.println("Creating KafkaProcuder");
KafkaProducer<byte[], byte[]> kafkaProducer = new KafkaProducer<>(producerConfig, serializer, serializer);
for (int i = 0; i < 100; i++) {
String msgstr = msg + i;
byte[] message = msgstr.getBytes();
ProducerRecord<byte[], byte[]> record = new ProducerRecord<>("topic1", message);
System.out.println("Sent:" + msgstr);
kafkaProducer.send(record);
}
kafkaProducer.close();
}
public static void main(String[] args) throws InterruptedException, ExecutionException {
sendMessage("Kafka test message 2/27 3:32");
}
}
public class Consumer {
public static String getMessage() {
Map<String, Object> consumerConfig = new HashMap<>();
consumerConfig.put("bootstrap.servers",
"172.16.20.207:9946,172.16.20.234:9125,172.16.20.36:9636");
consumerConfig.put("group.id", "dj-group");
consumerConfig.put("enable.auto.commit", "true");
consumerConfig.put("auto.offset.reset", "earliest");
ByteArrayDeserializer deserializer = new ByteArrayDeserializer();
KafkaConsumer<byte[], byte[]> kafkaConsumer = new KafkaConsumer<>(consumerConfig, deserializer, deserializer);
kafkaConsumer.subscribe(Arrays.asList("topic1"));
while (true) {
ConsumerRecords<byte[], byte[]> records = kafkaConsumer.poll(100);
System.out.println(records.count() + " of records received.");
for (ConsumerRecord<byte[], byte[]> record : records) {
System.out.println(Arrays.toString(record.value()));
}
}
}
public static void main(String[] args) {
getMessage();
}
}
First I ran Producer on the cluster to send messages to topic1. However when I ran Consumer, it couldn't receive anything, just hang.
Producer is working since I was able to get all the messages by running the shell script that came with Kafka install
./bin/kafka-console-consumer.sh --zookeeper master.mesos:2181/dcos-service-kafka --topic topic1 --from-beginning
But why can't I receive with Consumer? This post suggests group.id with old offset might be a possible cause. I only create group.id in the consumer not the producer. How do I config the offset for this group?
As it turns out, kafkaConsumer.subscribe(Arrays.asList("topic1")); is causing poll() to hang. According to Kafka Consumer does not receive messages
, there are two ways to connect to a topic, assign and subscribe. After I replaced subscribe with the lines below, it started working.
TopicPartition tp = new TopicPartition("topic1", 0);
List<TopicPartition> tps = Arrays.asList(tp);
kafkaConsumer.assign(tps);
However the output shows arrays of numbers which is not expected (Producer sent Strings). But I guess this is a separate issue.
Make sure you gracefully shutdown your consumer:
consumer.close()
TLDR
When you have two consumers running with the same group id Kafka won't assign the same partition of your topic to both.
If you repeatedly run an app that spins up a consumer with the same group id and you don't shut them down gracefully, Kafka will take a while to consider a consumer from an earlier run as dead and reassign his partition to a new one.
If new messages come to that partition and it's never assigned to your new consumer, the consumer will never see the messages.
To debug:
How many partition your topic has:
./kafka-topics --zookeeper <host-port> --describe <topic>
How far have your group consumed from each partition:
./kafka-consumer-groups --bootstrap-server <host-port> --describe --group <group-id>
If you already have your partitions stuck on stale consumers, either wipe the state of your Kafka or use a new group id.
I am a student researching and playing around with Kafka. After following the examples on the Apache documentation, I'm playing around with the examples portion in the trunk of their current Github repo.
As of right now, the example implements an 'older' version of their Consumer and does not employ the new KafkaConsumer. Following the documentation, I have written my own version of the KafkaConsumer thinking that it would be faster.
This is a vague question, but on runthrough I produce 5000 simple messages such as "Message_CurrentMessageNumber" to a topic "test" and then use my consumer to fetch these messages and print them to stdout. When I run the example code replacing the provided consumer with the newer KafkaConsumer (v 0.8.2 and up) it works pretty quickly and comparably to the example in its first runthrough, but slows down considerably anytime after that.
I notice that my Kafka Server outputs
Rebalancing group group1 generation 3 (kafka.coordinator.ConsumerCoordinator)
or similar messages often which leads me to believe that Kafka has to do some sort of load balancing that slows stuff down but I was wondering if anyone else had insight as to what I am doing wrong.
public class AlternateConsumer extends Thread {
private final KafkaConsumer<Integer, String> consumer;
private final String topic;
private final Boolean isAsync = false;
public AlternateConsumer(String topic) {
Properties properties = new Properties();
properties.put("bootstrap.servers", "localhost:9092");
properties.put("group.id", "newestGroup");
properties.put("partition.assignment.strategy", "roundrobin");
properties.put("enable.auto.commit", "true");
properties.put("auto.commit.interval.ms", "1000");
properties.put("session.timeout.ms", "30000");
properties.put("key.deserializer", "org.apache.kafka.common.serialization.IntegerDeserializer");
properties.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
consumer = new KafkaConsumer<Integer, String>(properties);
consumer.subscribe(topic);
this.topic = topic;
}
public void run() {
while (true) {
ConsumerRecords<Integer, String> records = consumer.poll(100);
for (ConsumerRecord<Integer, String> record : records) {
System.out.println("We received message: " + record.value() + " from topic: " + record.topic());
}
}
// ConsumerRecords<Integer, String> records = consumer.poll(0);
// for (ConsumerRecord<Integer, String> record : records) {
// System.out.println("We received message: " + record.value() + " from topic: " + record.topic());
// }
// consumer.close();
}
}
To start:
package kafka.examples;
public class KafkaConsumerProducerDemo implements KafkaProperties
{
public static void main(String[] args) {
final boolean isAsync = args.length > 0 ? !args[0].trim().toLowerCase().equals("sync") : true;
Producer producerThread = new Producer("test", isAsync);
producerThread.start();
AlternateConsumer consumerThread = new AlternateConsumer("test");
consumerThread.start();
}
}
The producer is the default producer located here: https://github.com/apache/kafka/blob/trunk/examples/src/main/java/kafka/examples/Producer.java
This should not be the case. If the setup is similar between your two consumers you should expect better result with new consumer unless there is issue in the client/consumer implementation, which seems to be the case here.
Can you share your benchmark results and the frequency of reported rebalancing and/or any pattern (i.e. sluggish once at startup, after fixed message consumption, after the queue is drained, etc) you are observing. Also if you can share some details about your consumer implementation.