Apache Kafka producer and consumer on different servers - Java

I have multiple servers from which messages will be produced, and I need the broker and consumer on one server. If both producer and consumer run on the same server everything works fine, but I am not sure what changes are needed to keep the producers separate. I don't want any dependency on ZooKeeper or Kafka installations on the producer servers, as there are many of them and their number will grow. I tried pointing bootstrap.servers at the broker/consumer server (e.g. 192.168.0.1:9092) when setting up the KafkaProducer, but I am still not able to produce messages. I am not sure what I am missing; please help me out here.
I followed https://github.com/mapr-demos/kafka-sample-programs for the code.
I tried running both producer and consumer on the same server, and it works fine.
Producer.java
import com.google.common.io.Resources;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

public class Producer {
    public static void main(String[] args) throws IOException {
        // set up the producer
        KafkaProducer<String, String> producer;
        try (InputStream props = Resources.getResource("producer.props").openStream()) {
            Properties properties = new Properties();
            properties.load(props);
            producer = new KafkaProducer<>(properties);
        }
        try {
            for (int i = 0; i < 1000000; i++) {
                // send lots of messages
                producer.send(new ProducerRecord<String, String>(
                        "fast-messages",
                        String.format("{\"type\":\"test\", \"t\":%.3f, \"k\":%d}", System.nanoTime() * 1e-9, i)));
                // every so often send to a different topic
                if (i % 1000 == 0) {
                    producer.send(new ProducerRecord<String, String>(
                            "fast-messages",
                            String.format("{\"type\":\"marker\", \"t\":%.3f, \"k\":%d}", System.nanoTime() * 1e-9, i)));
                    producer.send(new ProducerRecord<String, String>(
                            "summary-markers",
                            String.format("{\"type\":\"other\", \"t\":%.3f, \"k\":%d}", System.nanoTime() * 1e-9, i)));
                    producer.flush();
                    System.out.println("Sent msg number " + i);
                }
            }
        } catch (Throwable throwable) {
            throwable.printStackTrace();
        } finally {
            producer.close();
        }
    }
}
producer.props
bootstrap.servers=192.168.0.1:9092
acks=all
retries=0
batch.size=16384
auto.commit.interval.ms=1000
linger.ms=0
key.serializer=org.apache.kafka.common.serialization.StringSerializer
value.serializer=org.apache.kafka.common.serialization.StringSerializer
block.on.buffer.full=true
Consumer.java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.google.common.io.Resources;
import org.HdrHistogram.Histogram;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.Properties;
import java.util.Random;

public class Consumer {
    public static void main(String[] args) throws IOException {
        // set up house-keeping
        ObjectMapper mapper = new ObjectMapper();
        Histogram stats = new Histogram(1, 10000000, 2);
        Histogram global = new Histogram(1, 10000000, 2);
        // and the consumer
        KafkaConsumer<String, String> consumer;
        try (InputStream props = Resources.getResource("consumer.props").openStream()) {
            Properties properties = new Properties();
            properties.load(props);
            if (properties.getProperty("group.id") == null) {
                properties.setProperty("group.id", "group-" + new Random().nextInt(100000));
            }
            consumer = new KafkaConsumer<>(properties);
        }
        consumer.subscribe(Arrays.asList("fast-messages", "summary-markers"));
        int timeouts = 0;
        //noinspection InfiniteLoopStatement
        while (true) {
            // read records with a short timeout. If we time out, we don't really care.
            ConsumerRecords<String, String> records = consumer.poll(200);
            if (records.count() == 0) {
                timeouts++;
            } else {
                System.out.printf("Got %d records after %d timeouts\n", records.count(), timeouts);
                timeouts = 0;
            }
            for (ConsumerRecord<String, String> record : records) {
                switch (record.topic()) {
                    case "fast-messages":
                        // the send time is encoded inside the message
                        JsonNode msg = mapper.readTree(record.value());
                        switch (msg.get("type").asText()) {
                            case "test":
                                long latency = (long) ((System.nanoTime() * 1e-9 - msg.get("t").asDouble()) * 1000);
                                stats.recordValue(latency);
                                global.recordValue(latency);
                                break;
                            case "marker":
                                // whenever we get a marker message, we should dump out the stats
                                // note that the number of fast messages won't necessarily be quite constant
                                System.out.printf("%d messages received in period, latency(min, max, avg, 99%%) = %d, %d, %.1f, %d (ms)\n",
                                        stats.getTotalCount(),
                                        stats.getValueAtPercentile(0), stats.getValueAtPercentile(100),
                                        stats.getMean(), stats.getValueAtPercentile(99));
                                System.out.printf("%d messages received overall, latency(min, max, avg, 99%%) = %d, %d, %.1f, %d (ms)\n",
                                        global.getTotalCount(),
                                        global.getValueAtPercentile(0), global.getValueAtPercentile(100),
                                        global.getMean(), global.getValueAtPercentile(99));
                                stats.reset();
                                break;
                            default:
                                throw new IllegalArgumentException("Illegal message type: " + msg.get("type"));
                        }
                        break;
                    case "summary-markers":
                        break;
                    default:
                        throw new IllegalStateException("Shouldn't be possible to get message on topic " + record.topic());
                }
            }
        }
    }
}
consumer.props
bootstrap.servers=192.168.0.1:9092
group.id=test
enable.auto.commit=true
key.deserializer=org.apache.kafka.common.serialization.StringDeserializer
value.deserializer=org.apache.kafka.common.serialization.StringDeserializer
# fast session timeout makes it more fun to play with failover
session.timeout.ms=10000
# These buffer sizes seem to be needed to avoid consumer switching to
# a mode where it processes one bufferful every 5 seconds with multiple
# timeouts along the way. No idea why this happens.
fetch.min.bytes=50000
receive.buffer.bytes=262144
max.partition.fetch.bytes=2097152

What exactly happens when the producer is not on the broker machine? Do you see any log or error messages? You didn't describe your setup, but is the IP 192.168.0.1 (the broker machine) reachable from the producer machine, and is port 9092 open to the outside (check iptables)?
Another thing: the above code won't give you meaningful latency results if the producer and consumer are not on the same machine. You use System.nanoTime() to measure latency, but according to the official documentation:
This method can only be used to measure elapsed time and is not related to any other notion of system or wall-clock time. The value returned represents nanoseconds since some fixed but arbitrary origin time (perhaps in the future, so values may be negative). The same origin is used by all invocations of this method in an instance of a Java virtual machine; other virtual machine instances are likely to use a different origin.
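If you want a cross-machine latency measurement anyway, one workaround is to embed wall-clock time in the message instead. Below is a minimal sketch reusing the field names from the sample above; it assumes the producer and consumer hosts have reasonably synchronized clocks (e.g. via NTP), since the measurement can never be more accurate than the clock skew between the hosts:

// Producer side: embed the wall-clock send time (milliseconds) in the payload
// instead of the JVM-local nanoTime value.
producer.send(new ProducerRecord<String, String>(
        "fast-messages",
        String.format("{\"type\":\"test\", \"t\":%d, \"k\":%d}", System.currentTimeMillis(), i)));

// Consumer side: latency is wall-clock receive time minus the embedded send time.
long latencyMs = System.currentTimeMillis() - msg.get("t").asLong();
stats.recordValue(latencyMs);
global.recordValue(latencyMs);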

Related

Kafka listener concurrency threads taking time to start in parallel?

I need to process around 50k records (this number can vary from 100 up to a 50k maximum) in the same topic, so I used the concurrency feature of Kafka. Below are my configuration and listener code.
@KafkaListener(topics = {"kafkaTopic"},
        containerFactory = "abcd")
public void consume(
        @Payload List<String> message,
        @Header(KafkaHeaders.RECEIVED_TOPIC) String topic
) throws IOException {
    StopWatch st = new StopWatch();
    DateFormat dateFormat = new SimpleDateFormat("yyyy/MM/dd HH:mm:ss");
    Date date = new Date();
    StringBuilder str = new StringBuilder();
    st.start("threadName-");
    message.forEach(messages -> {
        try {
            Thread.sleep(2500);
            logger.info("message is-{}", messages);
            str.append(messages);
            str.append(",");
        } catch (Exception e) {
            str.append("exception-{}" + e);
        }
    });
    st.stop();
    List data = objectMapper.readValue(getFile(), new TypeReference<List<String>>() {});
    str.append("----thread-" + Thread.currentThread().getName() + "started at time-" + dateFormat.format(date) + " and time taken-" + String.format("%.2f", st.getTotalTimeSeconds()));
    str.append("---");
    data.add(str);
    objectMapper.writeValue(getFile(), data);
}

@Bean("abcd")
public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory() {
    ConcurrentKafkaListenerContainerFactory<String, String> factory = new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(consumerFactory());
    factory.getContainerProperties().setAckMode(ContainerProperties.AckMode.BATCH);
    factory.setConcurrency(5);
    factory.setBatchListener(true);
    return factory;
}

@Bean
public NewTopic syliusDeTopic() {
    return TopicBuilder.name("kafkaTopic").partitions(5).replicas(2).build();
}

@Bean
public ConsumerFactory<String, String> consumerFactory() {
    Map<String, Object> configProps = new HashMap<>();
    configProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, server);
    configProps.put(ConsumerConfig.GROUP_ID_CONFIG, consumerGroupId);
    configProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    configProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    configProps.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG, CustomCooperativeStickyAssignor.class.getName());
    configProps.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "500");
    configProps.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, "1");
    configProps.put(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, "5000");
    return new DefaultKafkaConsumerFactory<>(configProps);
}
But when I checked the result for a sample of 100 records, the threads did not start at the same time. Below is the output:
["test-0,test-1,test-2,----thread-org.springframework.kafka.KafkaListenerEndpointContainer#0-0-C-1started at time-2023/01/06 22:20:19 and time taken-7.51---","test-56,test-57,test-58,test-59,test-60,test-61,----thread-org.springframework.kafka.KafkaListenerEndpointContainer#0-1-C-1started at time-2023/01/06 22:20:26 and time taken-15.02---","test-70,test-71,test-72,test-73,test-74,test-75,test-76,test-77,test-78,----thread-org.springframework.kafka.KafkaListenerEndpointContainer#0-3-C-1started at time-2023/01/06 22:20:34 and time taken-22.53---","test-62,test-63,test-64,test-65,test-66,test-67,test-68,test-69,test-85,test-86,test-87,test-88,test-89,test-90,test-91,----thread-org.springframework.kafka.KafkaListenerEndpointContainer#0-2-C-1started at time-2023/01/06 22:20:49 and time taken-37.55---","test-79,test-80,test-81,test-82,test-83,test-84,test-92,test-93,test-94,test-95,test-96,test-97,test-98,test-99,----thread-org.springframework.kafka.KafkaListenerEndpointContainer#0-1-C-1started at time-2023/01/06 22:21:01 and time taken-35.05---","test-3,test-4,test-5,test-6,test-7,test-8,test-9,test-10,test-11,test-12,test-13,test-14,test-15,test-16,test-17,test-18,test-19,test-20,test-21,test-22,test-23,test-24,test-25,test-26,test-27,test-28,test-29,test-30,test-31,test-32,test-33,test-34,test-35,test-36,test-37,test-38,test-39,test-40,test-41,test-42,test-43,test-44,test-45,test-46,test-47,test-48,test-49,test-50,test-51,test-52,test-53,test-54,test-55,----thread-org.springframework.kafka.KafkaListenerEndpointContainer#0-4-C-1started at time-2023/01/06 22:22:24 and time taken-132.69---"]
The thread start times differ by more than 80 seconds between the first thread and the last.
Any idea how to resolve this? I want the threads to run at almost the same time (the thread count can increase to a maximum of 15), which should improve the ingestion of large numbers of records.
Also, the amount of data added to each partition varies considerably. Can that be resolved as well?
Your "started at time" assumption is not correct. That is really not the time when the concurrent containers are started, but rather the time when they first consume records from the partitions assigned to them. So you may simply have had no data in a partition to consume at that moment. Moreover, there is no guarantee that all partitions are assigned to consumers at the same time: one consumer may already have started consuming while another has not yet been assigned its partitions.
To distribute data evenly between partitions, look into the message key and the Partitioner abstraction on the producer side, as sketched below.
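For illustration, here is a minimal sketch of a custom Partitioner that spreads records evenly in round-robin fashion regardless of key; the class name is made up, and newer Kafka clients also ship a built-in org.apache.kafka.clients.producer.RoundRobinPartitioner that does essentially this:

import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;

// Hypothetical even-spread partitioner: ignores the key and cycles through partitions.
public class EvenSpreadPartitioner implements Partitioner {
    private final AtomicInteger counter = new AtomicInteger(0);

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionsForTopic(topic).size();
        // Mask the sign bit so the index stays non-negative after integer overflow.
        return (counter.getAndIncrement() & 0x7fffffff) % numPartitions;
    }

    @Override
    public void close() {}

    @Override
    public void configure(Map<String, ?> configs) {}
}

It would be registered on the producer with configProps.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, EvenSpreadPartitioner.class.getName()). Alternatively, giving each record a well-distributed message key lets the default partitioner hash records across partitions.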

Kafka consumer seek operation is not returning data

I have a requirement to monitor a consumer group externally and also to inspect the consumer record at a particular offset that has already been consumed by that consumer group. I created an AdminClient to connect to the cluster and perform that operation.
Now, when I try the assign() and seek() operations to position at the particular offset and then poll for data, it always returns an empty set of records.
ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(10));
Below is my code. I logged in to Control Center and I can see data for the topic-partition and offset below. Please help me identify the issue.
Properties properties = new Properties();
properties.put("bootstrap.servers", "server_list");
properties.put("security.protocol", "SASL_SSL");
properties.put("ssl.truststore.location", ".jks file path");
properties.put("ssl.truststore.password", "****");
properties.put("sasl.mechanism", "****");
properties.put("sasl.kerberos.service.name", "****");
properties.put("group.id", grp_id);
properties.put("auto.offset.reset", "earliest");
properties.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
properties.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
properties.put("enable.auto.commit", "false");

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(properties);
try {
    TopicPartition partition0 = new TopicPartition("topic1", 1);
    consumer.assign(Arrays.asList(partition0));
    long offset = 19L;
    consumer.seek(partition0, offset);
    boolean messageend = true;
    while (messageend) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(10));
        if (null != records && !records.isEmpty()) {
            for (ConsumerRecord<String, String> record : records) {
                if (record.offset() == offset) {
                    System.out.println("Match found");
                    messageend = false;
                }
            }
        } else {
            messageend = false;
        }
    }
} catch (Exception e) {
    e.printStackTrace();
}
It is unclear what you mean by "externally monitor", but you should not use the same consumer group name to read from the topic if there are already active consumers in that group.
In other words, your consumer will join the group and could cause a rebalance for the existing consumers, or it will join as idle and consume nothing because the partition is already assigned elsewhere in the group. The latter seems to be the case you're running into.
You should be able to do this more easily on the CLI:
kcat -C -b kafka:9092 -t topic1 -p 1 -o 19 -c 1
How many partitions does topic1 have? There need to be at least two for a seek on TopicPartition("topic1", 1) to work, since partitions are zero-indexed.
The 10 ms timeout in consumer.poll(Duration.ofMillis(10)) is also rather short; the consumer may not have time to fetch anything.
Do you have any other consumers in the same group? If the number of consumers in a group is greater than the number of partitions, some consumers will be idle with no assignment, so seek and poll will return nothing.
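If the goal is only to inspect the record at a given offset, a standalone consumer that uses assign() and no shared group.id sidesteps the group and rebalance issues described above entirely. A minimal sketch, with the connection and security settings reduced to placeholders:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class OffsetInspector {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker:9092"); // placeholder; add SASL_SSL settings as needed
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        // No group.id: assign() does not need group membership,
        // but auto-commit must then stay off.
        props.put("enable.auto.commit", "false");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition("topic1", 1);
            consumer.assign(Collections.singletonList(tp));
            consumer.seek(tp, 19L);
            // Use a generous timeout; 10 ms rarely leaves time for a fetch.
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
                break; // only the record at offset 19 is of interest
            }
        }
    }
}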

How to fetch uncommitted messages in Kafka

I have a Java function in which I am trying to fetch unread messages. For example, if messages with offsets 0, 1, 2 are in the broker and have already been read by the consumer, and I then switch off my consumer for an hour while messages with offsets 3, 4, 5 are produced, then when my consumer starts again it should read from offset 3, not from 0. But it either reads all the messages or only those produced after the consumer was started. I want to read only the messages that are unread, i.e. uncommitted.
I tried auto.offset.reset set to both "latest" and "earliest", as well as enable.auto.commit set to "true" and "false". I also tried commitSync() and commitAsync() before calling close(), but no luck.
public static KafkaConsumer<String, String> createConsumer() {
    Properties properties = new Properties();
    properties.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, Constants.KAFKA_BROKER);
    properties.put(ConsumerConfig.GROUP_ID_CONFIG, "testGroup");
    properties.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
    properties.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
    properties.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
    properties.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "50");
    properties.put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, "1");
    properties.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
    KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(properties);
    consumer.subscribe(Collections.singleton(Constants.TOPIC));
    return consumer;
}

public static void main(String[] args) {
    System.out.println("");
    System.out.println("----------------");
    System.out.println("");
    System.out.println("KAFKA CONSUMER EXAMPLE");
    System.out.println("");
    System.out.println("----------------");
    System.out.println("");
    OffsetAndMetadata offsetAndMetadataInitial = createConsumer().committed(new TopicPartition(Constants.TOPIC, 0));
    System.out.println("");
    System.out.println("Offset And MetaData Initial : ");
    System.out.println(offsetAndMetadataInitial);
    System.out.println("");
    ConsumerRecords<String, String> consumerRecords = createConsumer().poll(Duration.ofSeconds(2L));
    System.out.println("");
    System.out.println("Count Consumer Records : " + consumerRecords.count());
    System.out.println("");
    Iterator<ConsumerRecord<String, String>> itr = consumerRecords.iterator();
    Map<TopicPartition, OffsetAndMetadata> partationOffsetMap = new HashMap<>(4);
    while (itr.hasNext()) {
        ConsumerRecord<String, String> record = itr.next();
        System.out.println("OffSet : " + record.offset());
        System.out.println("");
        System.out.println("Key : " + record.key());
        System.out.println("Value : " + record.value());
        System.out.println("Partition : " + record.partition());
        System.out.println("--------------------------");
        System.out.println("");
    }
    createConsumer().close();
}
I just want to fetch the unread messages in the Kafka consumer. Please correct me if I am wrong somewhere. Thanks in advance.
The main problem in your code is that you are not closing the consumer you used to poll messages, because each call to createConsumer() creates a new KafkaConsumer. And since you never close that consumer and call poll() only once, you never commit the messages you have read.
(With auto-commit, the commit is performed within poll() once auto-commit-interval has elapsed, and within close().)
Once you have corrected that, it should work with the following settings:
auto-commit=true (otherwise you could also commit manually, but auto-commit is simpler).
offset-reset=earliest (this only has an effect the first time you consume with a given group.id; it tells Kafka whether you want to consume from the beginning of the topic or only messages produced after you started consuming. Once you have started consuming with a given group.id, you will always continue from the latest offset you committed.)
group.id must not change between restarts, or you will start from the beginning or from the end again, depending on your offset-reset setting.
Hope this helps. A sketch of the corrected flow is below.
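Putting that together, a minimal sketch of the corrected flow, reusing the question's createConsumer() and with enable.auto.commit switched to true as suggested; the consumer is created once, polled, and that same instance is closed:

public static void main(String[] args) {
    KafkaConsumer<String, String> consumer = createConsumer(); // one instance, reused throughout
    try {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(2));
        for (ConsumerRecord<String, String> record : records) {
            System.out.println("Offset : " + record.offset() + " Value : " + record.value());
        }
        // With enable.auto.commit=true, offsets are committed inside poll()
        // (after auto.commit.interval.ms) and inside close(); no manual commit needed.
    } finally {
        consumer.close(); // closing the SAME consumer commits the final position
    }
}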

Kafka producers are very slow

I am new to Kafka and I have a question that I'm not able to resolve.
I have installed Kafka and ZooKeeper on my own computer on Windows (not Linux), and I have created a broker with a topic with several partitions (between 6 and 12 partitions).
When I create consumers, they work perfectly and read at a good speed. As for the producer, I created the simple producer one can see on many websites. The producer is inside a loop and sends many short messages (about 2000 very short messages).
The consumers read the 2000 messages very quickly, but the producer sends messages to the broker at roughly 140 to 150 messages per second. As I said before, I'm working on my own laptop (only one disk), but when I read about millions of messages per second, I think there is something I've forgotten, because I'm light-years away from that.
If I use more producers, the result is worse.
Is it a question of more brokers on the same node, or something like that? This problem has been imposed on me in my job, and I have no possibility of getting a better computer.
The code for creating the producer is
public class Producer {
    public void publica(String topic, String strKey, String strValue) {
        Properties configProperties = new Properties();
        configProperties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        configProperties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, LongSerializer.class.getName());
        configProperties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        KafkaProducer<String, String> producer = new KafkaProducer<String, String>(configProperties);
        ProducerRecord<String, String> rec = new ProducerRecord<String, String>(topic, strValue);
        producer.send(rec);
    }
}
and the code for sending messages is (partial):
Producer prod = new Producer();
for (int i = 0; i < 2000; i++) {
    key = String.valueOf(i);
    prod.publica("TopicName", key, texto + " - " + key);
    // System.out.println(i + " - " + System.currentTimeMillis());
}
Create your Kafka producer once and reuse it every time you need to send a message; your current code constructs a new KafkaProducer (and opens a new broker connection) for every single message, which dominates the send time:
public class Producer {
    private final KafkaProducer<String, String> producer; // created once, reused for every send

    public Producer() {
        Properties configProperties = new Properties();
        configProperties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        configProperties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        configProperties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        producer = new KafkaProducer<>(configProperties);
    }

    public void publica(String topic, String strKey, String strValue) {
        ProducerRecord<String, String> rec = new ProducerRecord<String, String>(topic, strValue);
        producer.send(rec);
    }
}
Also take a look at the producer and broker configuration options in the official Kafka documentation; there are several settings you can tune for your application's needs. For example:
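Batching-related producer settings tend to matter most for raw throughput. A hypothetical starting point (the exact values depend on your workload and must be tuned):

# Larger batches amortize per-request overhead.
batch.size=65536
# Wait up to 10 ms for a batch to fill before sending.
linger.ms=10
# Compress batches to cut the bytes sent over the wire.
compression.type=lz4

That said, reusing a single producer instance as shown above is the dominant fix here; no amount of config tuning compensates for opening a new connection per message.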

Why does my Kafka consumer consume messages quickly on the first run, but slow down considerably on later runs?

I am a student researching and playing around with Kafka. After following the examples in the Apache documentation, I'm playing around with the examples portion in the trunk of the current GitHub repo.
As of right now, the examples implement an 'older' version of their consumer and do not employ the new KafkaConsumer. Following the documentation, I have written my own version using the KafkaConsumer, thinking that it would be faster.
This is a vague question, but on a run-through I produce 5000 simple messages such as "Message_CurrentMessageNumber" to a topic "test" and then use my consumer to fetch those messages and print them to stdout. When I run the example code with the provided consumer replaced by the newer KafkaConsumer (v0.8.2 and up), it works quickly and comparably to the original example on its first run-through, but slows down considerably on any run after that.
I notice that my Kafka server often outputs messages like
Rebalancing group group1 generation 3 (kafka.coordinator.ConsumerCoordinator)
which leads me to believe that Kafka has to do some sort of rebalancing that slows things down, but I was wondering if anyone else has insight into what I am doing wrong.
public class AlternateConsumer extends Thread {
    private final KafkaConsumer<Integer, String> consumer;
    private final String topic;
    private final Boolean isAsync = false;

    public AlternateConsumer(String topic) {
        Properties properties = new Properties();
        properties.put("bootstrap.servers", "localhost:9092");
        properties.put("group.id", "newestGroup");
        properties.put("partition.assignment.strategy", "roundrobin");
        properties.put("enable.auto.commit", "true");
        properties.put("auto.commit.interval.ms", "1000");
        properties.put("session.timeout.ms", "30000");
        properties.put("key.deserializer", "org.apache.kafka.common.serialization.IntegerDeserializer");
        properties.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumer = new KafkaConsumer<Integer, String>(properties);
        consumer.subscribe(topic);
        this.topic = topic;
    }

    public void run() {
        while (true) {
            ConsumerRecords<Integer, String> records = consumer.poll(100);
            for (ConsumerRecord<Integer, String> record : records) {
                System.out.println("We received message: " + record.value() + " from topic: " + record.topic());
            }
        }
        // ConsumerRecords<Integer, String> records = consumer.poll(0);
        // for (ConsumerRecord<Integer, String> record : records) {
        //     System.out.println("We received message: " + record.value() + " from topic: " + record.topic());
        // }
        // consumer.close();
    }
}
To start it:

package kafka.examples;

public class KafkaConsumerProducerDemo implements KafkaProperties {
    public static void main(String[] args) {
        final boolean isAsync = args.length > 0 ? !args[0].trim().toLowerCase().equals("sync") : true;
        Producer producerThread = new Producer("test", isAsync);
        producerThread.start();
        AlternateConsumer consumerThread = new AlternateConsumer("test");
        consumerThread.start();
    }
}
The producer is the default producer located here: https://github.com/apache/kafka/blob/trunk/examples/src/main/java/kafka/examples/Producer.java
This should not be the case. If the setup is similar between your two consumers, you should expect better results with the new consumer, unless there is an issue in the client/consumer implementation, which seems to be the case here.
Can you share your benchmark results and the frequency of the reported rebalancing, and/or any pattern you observe (i.e. sluggish once at startup, after a fixed amount of message consumption, after the queue is drained, etc.)? Also, please share some details about your consumer implementation.
