I am trying out Kafka Streams. I am reading messages from one topic, doing a groupByKey, and then counting the groups. The problem is that the counts are coming out as unreadable "boxes".
If I run the console consumer, they come through as empty strings.
This is the WordCount code I wrote:
package streams;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import java.util.Arrays;
import java.util.Properties;
public class WordCount {
public static void main(String[] args) {
Properties properties = new Properties();
properties.setProperty(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
properties.setProperty(StreamsConfig.APPLICATION_ID_CONFIG, "streams-demo-2");
properties.setProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
properties.setProperty(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.StringSerde.class.getName());
properties.setProperty(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.StringSerde.class.getName());
// topology
StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> input = builder.stream("temp-in");
KStream<String, Long> fil = input.flatMapValues(val -> Arrays.asList(val.split(" "))) // turn the stream of text lines into a stream of words
.selectKey((k, v) -> v) // changing the key
.groupByKey().count().toStream(); // getting count after groupBy
fil.to("temp-out");
KafkaStreams streams = new KafkaStreams(builder.build(), properties);
streams.start();
System.out.println(streams.toString());
Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
}
}
This is the output I am getting in the consumer (it is on the right side in the attached image).
I tried casting the Long again to see if it works, but it is not working.
I am attaching the consumer code too, in case it helps.
package tutorial;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
public class Consumer {
public static void main(String[] args) {
Properties properties = new Properties();
properties.setProperty(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
properties.setProperty(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
properties.setProperty(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
// Once the consumer group starts running, it keeps its committed offsets even after we stop the consumer in the console.
// We should create a new consumer group to read from the earliest offset, because the previous one has already consumed up to a certain offset.
// When we run the same consumer in two consoles, Kafka detects it and rebalances:
// in that case the consoles split the partitions they consume, forming a consumer group.
properties.setProperty(ConsumerConfig.GROUP_ID_CONFIG, "consumer-application-1"); // -> consumer group id
properties.setProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest"); // -> where the consumer starts reading when it has no committed offset
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(properties);
consumer.subscribe(Collections.singleton("temp-out"));
while (true) {
ConsumerRecords<String, String> consumerRecords = consumer.poll(Duration.ofMillis(1000));
for (ConsumerRecord<String, String> record: consumerRecords) {
System.out.println(record.key() + " " + record.value());
System.out.println(record.partition() + " " + record.offset());
}
}
}
}
Any help is appreciated. Thanks in advance.
The message value you're writing with Kafka Streams is a Long, and you're consuming it as a String.
If you make the following changes to your Consumer class, you'll be able to see the count printed correctly to stdout:
// Change this from StringDeserializer to LongDeserializer.
properties.setProperty(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, LongDeserializer.class.getName());
...
// The value you're consuming here is a Long, not a String.
KafkaConsumer<String, Long> consumer = new KafkaConsumer<>(properties);
consumer.subscribe(Collections.singleton("temp-out"));
while (true) {
ConsumerRecords<String, Long> consumerRecords = consumer.poll(Duration.ofMillis(1000));
for (ConsumerRecord<String, Long> record : consumerRecords) {
System.out.println(record.key() + " " + record.value());
System.out.println(record.partition() + " " + record.offset());
}
}
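(Remember to also import org.apache.kafka.common.serialization.LongDeserializer for that change.) Alternatively, if you would rather keep the consumer and the console consumer on StringDeserializer, a sketch of the other direction, reusing the topology from the question but untested here, is to convert the counts to strings inside the Streams app before writing them out:
// Sketch: map the Long counts to String before producing, so String-based
// consumers can read "temp-out" unchanged. Requires org.apache.kafka.streams.kstream.Produced.
KStream<String, String> counts = input
        .flatMapValues(val -> Arrays.asList(val.split(" ")))
        .selectKey((k, v) -> v)
        .groupByKey()
        .count()
        .toStream()
        .mapValues(count -> Long.toString(count)); // Long -> String

counts.to("temp-out", Produced.with(Serdes.String(), Serdes.String()));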
Related
Currently, when the broker is not available, I see that the Java consumer will try to reconnect to the broker indefinitely (or at least I was not able to wait until it stops).
What I would like is for poll() to throw an exception when the broker is not available, or some other way to make my app crash when the broker is down. It seems like something that should be very easy to configure, but I am unable to find info on it anywhere.
Here is an example. It always returns 0 records from poll and retries the connection to the broker forever.
package org.example;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import java.time.Duration;
import java.util.List;
import java.util.Properties;
public class Consumer {
public static void main(String[] args) {
Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9094");
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
props.put(ConsumerConfig.GROUP_ID_CONFIG, "test-consumer-group");
try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
consumer.subscribe(List.of("game.journal"));
while (true) {
ConsumerRecords<String, String> recs = consumer.poll(Duration.ofMillis(100));
for (ConsumerRecord<String, String> rec : recs) {
System.out.printf("Recieved %s: %s", rec.key(), rec.value());
}
}
}
}
}
So how can I make the consumer crash when the broker is not available?
I'm a former legacy ActiveMQ user learning Kafka, and I have a question.
With ActiveMQ you can do this:
Submit 100 messages into a queue
Wait however long you want
Consume those 100 messages from that queue, with a guaranteed single consumer of each message.
I tried to do the same thing in Kafka:
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.testcontainers.containers.KafkaContainer;
import org.testcontainers.utility.DockerImageName;
public class KafkaTest {
private static final Logger LOG = LoggerFactory.getLogger(KafkaTest.class);
public static final String MY_GROUP_ID = "my-group-id";
public static final String TOPIC = "topic";
KafkaContainer kafka = new KafkaContainer(DockerImageName.parse("confluentinc/cp-kafka:6.2.1"));
@Before
public void before() {
kafka.start();
}
@After
public void after() {
kafka.close();
}
@Test
public void testPipes() throws ExecutionException, InterruptedException {
Properties consumerProps = new Properties();
consumerProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, kafka.getBootstrapServers());
consumerProps.put("group.id", MY_GROUP_ID);
consumerProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
consumerProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
ExecutorService es = Executors.newCachedThreadPool();
Future consumerFuture = es.submit(() -> {
try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {
consumer.subscribe(Collections.singletonList(TOPIC));
while (true) {
ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
for (ConsumerRecord<String, String> record : records) {
LOG.info("Thread: {}, Topic: {}, Partition: {}, Offset: {}, key: {}, value: {}", Thread.currentThread().getName(), record.topic(), record.partition(), record.offset(), record.key(), record.value().toUpperCase());
}
}
} catch (Exception e) {
LOG.error("Consumer error", e);
}
});
Thread.sleep(10000); // NOTICE: if you remove this, the consumer will not receive the messages, because it won't be registered yet before the messages come rolling in.
Properties producerProps = new Properties();
producerProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, kafka.getBootstrapServers());
producerProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
producerProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
Future producerFuture = es.submit(() -> {
try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
int counter = 0;
while (counter <= 100) {
System.out.println("Sent " + counter);
String msg = "Message " + counter;
producer.send(new ProducerRecord<>(TOPIC, msg));
counter++;
}
} catch (Exception e) {
LOG.error("Failed to send message by the producer", e);
}
});
producerFuture.get();
consumerFuture.get();
}
}
This example does not work unless you start the consumer, wait for it to start, and only then run the producer.
Can anyone show me how to alter my example program so that the messages wait to be consumed?
In your consumer config, you need to add auto.offset.reset=earliest or call seekToBeginning after subscribing.
Otherwise, it starts to read from the end of the topic. In other words, if you start the consumer after the producer, it'll begin to read after all the existing data.
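A minimal sketch of both options against the test above (untested, reusing consumerProps, consumer and TOPIC from the question; note that seekToBeginning only affects partitions that are already assigned, so it goes in the rebalance listener):
// Option 1: with no committed offsets for the group, start from the earliest offset.
consumerProps.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

// Option 2: rewind explicitly once partitions have actually been assigned.
// Needs org.apache.kafka.clients.consumer.ConsumerRebalanceListener,
// org.apache.kafka.common.TopicPartition and java.util.Collection.
consumer.subscribe(Collections.singletonList(TOPIC), new ConsumerRebalanceListener() {
    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        // nothing to do in this example
    }

    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        consumer.seekToBeginning(partitions);
    }
});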
I am using the Confluent open source version of Kafka, with Avro for serialization and deserialization. My producer Java code is able to push the record to the broker, but when my Java consumer tries to read the data, I get the error below.
Exception in thread "main" org.apache.kafka.common.errors.RecordDeserializationException: Error deserializing key/value for partition clickRecordsEvents-0 at offset 1. If needed, please seek past the record to continue consumption.
at org.apache.kafka.clients.consumer.internals.Fetcher.parseRecord(Fetcher.java:1429)
at org.apache.kafka.clients.consumer.internals.Fetcher.access$3400(Fetcher.java:134)
at org.apache.kafka.clients.consumer.internals.Fetcher$CompletedFetch.fetchRecords(Fetcher.java:1652)
at org.apache.kafka.clients.consumer.internals.Fetcher$CompletedFetch.access$1800(Fetcher.java:1488)
at org.apache.kafka.clients.consumer.internals.Fetcher.fetchRecords(Fetcher.java:721)
at org.apache.kafka.clients.consumer.internals.Fetcher.fetchedRecords(Fetcher.java:672)
at org.apache.kafka.clients.consumer.KafkaConsumer.pollForFetches(KafkaConsumer.java:1304)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1238)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1211)
at com.ru.kafka.consumer.deserializer.avro.AvroConsumerDemo.main(AvroConsumerDemo.java:31)
Caused by: org.apache.kafka.common.errors.SerializationException: Error deserializing Avro message for id 1
at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer.deserialize(AbstractKafkaAvroDeserializer.java:156)
at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer.deserialize(AbstractKafkaAvroDeserializer.java:79)
at io.confluent.kafka.serializers.KafkaAvroDeserializer.deserialize(KafkaAvroDeserializer.java:55)
at org.apache.kafka.common.serialization.Deserializer.deserialize(Deserializer.java:60)
at org.apache.kafka.clients.consumer.internals.Fetcher.parseRecord(Fetcher.java:1420)
... 9 more
Caused by: org.apache.kafka.common.errors.SerializationException: Could not find class ClickRecord specified in writer's schema whilst finding reader's schema for a SpecificRecord.
I am able to read the records with the console consumer from the command prompt.
Here's my consumer code:
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
public class AvroConsumerDemo {
public static void main(String[] args) {
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "clicksCG");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "io.confluent.kafka.serializers.KafkaAvroDeserializer");
props.put("specific.avro.reader", "true");
props.put("schema.registry.url", "http://localhost:8083");
String topic = "clickRecordsEvents";
KafkaConsumer<String, ClickRecord> consumer = new KafkaConsumer<String, ClickRecord>(props);
consumer.subscribe(Collections.singletonList(topic));
System.out.println("Reading topic:" + topic);
while (true) {
ConsumerRecords<String, ClickRecord> records = consumer.poll(Duration.ofMillis(1000));
for (ConsumerRecord<String, ClickRecord> record : records) {
System.out.println("Current customer name is: " + record.value().getBrowser());
System.out.println("Session ID read " + record.value().getSessionId());
System.out.println("Browser read " + record.value().getBrowser());
System.out.println("Compaign read " + record.value().getCompaign());
System.out.println("IP read " + record.value().getIp());
System.out.println("Channel" + record.value().getChannel());
System.out.println("Refrrer" + record.value().getRefferer());
}
consumer.commitSync();
}
}
}
Note that I generated the ClickRecord Java class from the Avro schema file using the Avro tools plugin; it is not a plain Java POJO.
Here's my producer code:
import java.util.Properties;
import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import com.ru.kafka.avro.pojo.ClickRecord;
public class AvroProducerDemo {
public static void main(String[] args) {
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
props.put("schema.registry.url", "http://localhost:8083"); // URL points to the schema registry.
String topic = "clickRecordsEvents";
Producer<String, ClickRecord> producer = new KafkaProducer<String, ClickRecord>(props);
ClickRecord clickRecord = new ClickRecord();
clickRecord.setSessionId("ABC1245");
clickRecord.setBrowser("Chrome");
clickRecord.setIp("192.168.32.56");
clickRecord.setChannel("HomePage");
System.out.println("Generated clickRecord " + clickRecord.toString());
ProducerRecord<String, ClickRecord> record = new ProducerRecord<String, ClickRecord>(topic,
clickRecord.getSessionId().toString(), clickRecord);
try {
RecordMetadata metdata = producer.send(record).get();
System.out.println("Record written to partition" + metdata.partition());
} catch (InterruptedException e) {
e.printStackTrace();
} catch (ExecutionException e) {
e.printStackTrace();
} finally {
producer.close();
}
}
}
We recently started using Kafka, and I am writing a Kafka consumer application using the native Kafka Java consumer API.
However, most of the examples I have seen use a while loop and call the poll method on a consumer object inside the loop, like below:
while (true) {
final ConsumerRecords<Long, String> consumerRecords =
consumer.poll(1000);
if (consumerRecords.count()==0) {
noRecordsCount++;
if (noRecordsCount > giveUp) break;
else continue;
}
consumerRecords.forEach(record -> {
System.out.printf("Consumer Record:(%d, %s, %d, %d)\n",
record.key(), record.value(),
record.partition(), record.offset());
});
consumer.commitAsync();
}
I am just looking for a better way of doing this without a loop, using the native Java consumer API. I know that with Spring Kafka you don't need to write that loop yourself. What about the native API? Is there a good approach or best practice?
I tried using a scheduler, and the code is working:
package com.kafka;
import java.text.ParseException;
import java.util.Arrays;
import java.util.Date;
import java.util.List;
import java.util.Properties;
import java.util.TimerTask;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
public class ScheduledTask extends TimerTask {
Date now; // to display current time
public void run( ) {
now = new Date();
System.out.println("Time is :" + now);
String AlarmString=null;
Properties props = new Properties();
props.put("bootstrap.servers", "10.*.*.*:9092");
props.put("group.id", "grp-1");
props.put("enable.auto.commit", "true");
props.put("auto.commit.interval.ms", "1000");
props.put("session.timeout.ms", "30000");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList("RAW_MH_RAN_SAM"));
ConsumerRecords<String, String> records = consumer.poll(1000);
//ConsumerRecords<String, String> records =consumer.poll(Duration.ofMillis(100));
for (ConsumerRecord<String, String> record : records)
{
System.out.println("Consumer:=============== partition Id= " + record.partition() + " offset = " + record.offset() + " value = " + record.value() + "=================");
if (AlarmString==null && !(record.value().toString().contains("PR ALARM:")) ){
AlarmString=record.value();
}
else{
if( !(record.value().toString().contains("PR ALARM:")) )
{
//System.out.println("record.value() :::"+ record.value() );
AlarmString=AlarmString+","+record.value();
}
}
}
if (consumer != null) {
System.out.println("Closing Connection");
consumer.close();
}
}
}
//Simple consumer
package com.kafka;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import com.google.gson.Gson;
import java.text.ParseException;
import java.time.Duration;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;
import java.util.Timer;
public class SimpleConsumer{
public static void main(String[] args)
{
Timer time = new Timer();
ScheduledTask st = new ScheduledTask(); // Instantiate the ScheduledTask class
time.schedule(st, 0, 60000); // Run the task repeatedly, every 60 seconds
}
}
Polling continuously is "just" the way the Kafka consumer works, and the way the underlying Kafka protocol works.
Every action is initiated by the clients (pull model), not by the brokers (push model), which in the case of consuming messages translates into polling.
Using the Java Kafka consumer API means having a loop, a scheduler, or whatever other mechanism Java gives you for executing code continuously; you have to deal with it.
Other frameworks like Spring or SmallRye Reactive Messaging just do that for you. They hide the poll loop from your application, but in the end there is always a loop ... it's how Kafka works.
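For illustration only (this is not Spring or SmallRye code, just a minimal hand-rolled sketch with a hypothetical PollingWorker class), hiding the loop simply means the loop lives in one reusable place and the application only supplies a per-record callback:
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;

// Hypothetical helper: owns the poll loop so application code never writes it.
public class PollingWorker {
    private final AtomicBoolean running = new AtomicBoolean(true);
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    public void start(Properties props, String topic,
                      java.util.function.Consumer<ConsumerRecord<String, String>> handler) {
        executor.submit(() -> {
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList(topic));
                while (running.get()) {
                    // The loop is still here, it is just not in the application code anymore.
                    consumer.poll(Duration.ofMillis(500)).forEach(handler);
                    consumer.commitAsync();
                }
            }
        });
    }

    public void stop() {
        running.set(false); // the loop exits after the next poll returns
        executor.shutdown();
    }
}
Application code then looks like new PollingWorker().start(props, "RAW_MH_RAN_SAM", rec -> System.out.println(rec.value())); there is still a while loop, it is just not yours to write.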
Hello, I want to send a message from a producer and receive it in a consumer capitalized, using Confluent and Kafka, with a build.gradle file and a StreamingApp.java containing this code:
StreamingApp.java:
package kafka_stream;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import java.util.Properties;
public class StreamingApp {
public static void main(String[] args) throws Exception {
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG,"streaming_app_id");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG,"localhost:9092");
StreamsConfig config = new StreamsConfig(props);
StreamsBuilder builder = new StreamsBuilder();
Topology topology = builder.build();
KafkaStreams streams = new KafkaStreams(topology,config);
KStream<String, String> simpleFirstStream = builder.stream("src-topic");
KStream<String, String> upperCasedStream = simpleFirstStream.mapValues(String::toUpperCase);
upperCasedStream.to("out-topic");
System.out.println("Streaming App Started");
streams.start();
Thread.sleep(30000);
System.out.println("shutting downl the streaming app");
streams.close();
}
}
So how can I get this working?