I am new to Kafka. I read the documentation to get started, and now I am trying to get hands-on using embedded Kafka mode. I tried the following sample program:
public static void main(String args[]) throws InterruptedException, IOException {
// setup Zookeeper
EmbeddedZookeeper zkServer = new EmbeddedZookeeper();
String zkConnect = ZKHOST + ":" + zkServer.port();
ZkClient zkClient = new ZkClient(zkConnect, 30000, 30000, ZKStringSerializer$.MODULE$);
ZkUtils zkUtils = ZkUtils.apply(zkClient, false);
// setup Broker
Properties brokerProps = new Properties();
brokerProps.setProperty("zookeeper.connect", zkConnect);
brokerProps.setProperty("broker.id", "0");
brokerProps.setProperty("log.dirs", Files.createTempDirectory("kafka-").toAbsolutePath().toString());
brokerProps.setProperty("listeners", "PLAINTEXT://" + BROKERHOST +":" + BROKERPORT);
KafkaConfig config = new KafkaConfig(brokerProps);
Time mock = new MockTime();
KafkaServer kafkaServer = TestUtils.createServer(config, mock);
// create topic
AdminUtils.createTopic(zkUtils, TOPIC, 1, 1, new Properties(), RackAwareMode.Disabled$.MODULE$);
// setup producer
Properties producerProps = new Properties();
producerProps.setProperty("bootstrap.servers", BROKERHOST + ":" + BROKERPORT);
producerProps.setProperty("key.serializer","org.apache.kafka.common.serialization.IntegerSerializer");
producerProps.setProperty("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
KafkaProducer<Integer, byte[]> producer = new KafkaProducer<Integer, byte[]>(producerProps);
List<PartitionInfo> partitionInfo = producer.partitionsFor("test");
System.out.println(partitionInfo);
// setup consumer
Properties consumerProps = new Properties();
consumerProps.setProperty("bootstrap.servers", BROKERHOST + ":" + BROKERPORT);
consumerProps.setProperty("group.id", "group0");
consumerProps.setProperty("client.id", "consumer0");
consumerProps.setProperty("key.deserializer","org.apache.kafka.common.serialization.IntegerDeserializer");
consumerProps.setProperty("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
consumerProps.put("auto.offset.reset", "earliest"); // to make sure the consumer starts from the beginning of the topic
KafkaConsumer<Integer, byte[]> consumer = new KafkaConsumer<>(consumerProps);
consumer.subscribe(Arrays.asList(TOPIC));
// send message
ProducerRecord<Integer, byte[]> data = new ProducerRecord<>(TOPIC, 42, "test-message".getBytes(StandardCharsets.UTF_8));
producer.send(data);
producer.close();
// starting consumer
ConsumerRecords<Integer, byte[]> records = consumer.poll(1000);
Iterator<ConsumerRecord<Integer, byte[]>> recordIterator = records.iterator();
ConsumerRecord<Integer, byte[]> record = recordIterator.next();
System.out.printf("offset = %d, key = %s, value = %s", record.offset(), record.key(), record.value());
kafkaServer.shutdown();
zkClient.close();
zkServer.shutdown();
}
}
but I am not able to fetch data for the topic. I am getting the following exception while executing the program:
java.util.NoSuchElementException
at org.apache.kafka.common.utils.AbstractIterator.next(AbstractIterator.java:52)
at com.nuwaza.evlauation.embedded.kafka.EmbeddedKafka.main(EmbeddedKafka.java:105)
Can anyone guide me?
UPDATE:
WARN [main] (Logging.scala#warn:83) - No meta.properties file under dir C:\Users\bhavanak\AppData\Local\Temp\kafka-1238324273778000675\meta.properties
WARN [main] (Logging.scala#warn:83) - No meta.properties file under dir C:\Users\bhavanak\AppData\Local\Temp\kafka-1238324273778000675\meta.properties
WARN [kafka-producer-network-thread | producer-1] (NetworkClient.java#handleResponse:600) - Error while fetching metadata with correlation id 0 : {test=LEADER_NOT_AVAILABLE}
WARN [kafka-producer-network-thread | producer-1] (NetworkClient.java#handleResponse:600) - Error while fetching metadata with correlation id 1 : {test=LEADER_NOT_AVAILABLE}
WARN [kafka-producer-network-thread | producer-1] (NetworkClient.java#handleResponse:600) - Error while fetching metadata with correlation id 2 : {test=LEADER_NOT_AVAILABLE}
[Partition(topic = test, partition = 0, leader = 0, replicas = [0,], isr = [0,]]
ERROR [main] (NIOServerCnxnFactory.java#uncaughtException:44) - Thread Thread[main,5,main] died
java.util.NoSuchElementException
at org.apache.kafka.common.utils.AbstractIterator.next(AbstractIterator.java:52)
at com.nuwaza.evlauation.embedded.kafka.EmbeddedKafka.main(EmbeddedKafka.java:105)
Try to invoke producer.flush() before reading messages, to ensure the produced messages have indeed been persisted.
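A minimal sketch of the suggested change (not the original program), reusing the producer, consumer, and data variables from the question; it also polls in a short loop instead of calling iterator.next() on a possibly empty result:
producer.send(data);
producer.flush();   // block until buffered records have actually been sent
producer.close();
// poll until something arrives or a deadline passes, instead of assuming the first poll() returns the record
ConsumerRecords<Integer, byte[]> records = ConsumerRecords.empty();
long deadline = System.currentTimeMillis() + 10_000;
while (records.isEmpty() && System.currentTimeMillis() < deadline) {
    records = consumer.poll(1000);
}
for (ConsumerRecord<Integer, byte[]> rec : records) {
    System.out.printf("offset = %d, key = %s, value = %s%n", rec.offset(), rec.key(), new String(rec.value(), StandardCharsets.UTF_8));
}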
This error signifies that your consumer is trying to read the message before it has been persisted to the Kafka log. Ideally you should run the producer and consumer as separate processes. I was facing the same issue, but in my case it was for a different reason: iterator.next() was invoked twice by mistake. Mentioning it just in case someone else faces the same issue.
I am trying to publish some messages to Kafka from an AWS Lambda function. When I test this feature locally, the messages are not getting published and the function times out. I was able to connect to the local Kafka instance from the same Lambda function using a consumer and list the topics. Is there anything I'm missing here?
String bootstrapServer = "host.docker.internal:9092";
String topic = "test-topic";
Properties properties = new Properties();
properties.setProperty(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServer);
properties.setProperty(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
properties.setProperty(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
properties.setProperty(ProducerConfig.ACKS_CONFIG, "all");
properties.put(ProducerConfig.BATCH_SIZE_CONFIG, 16384);
Properties props = new Properties();
props.put("bootstrap.servers", bootstrapServer);
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> simpleConsumer = new KafkaConsumer<>(props);
// THIS IS WORKING
simpleConsumer.listTopics().forEach((t, v) -> logger.log(t + "\n"));
try (KafkaProducer<String, String> kafkaProducer = new KafkaProducer<>(properties)) {
ProducerRecord<String, String> record = new ProducerRecord<>(topic, value);
logger.log("Publishing '" + value + "' to kafka topic: " + topic + "\n");
Future<RecordMetadata> future = kafkaProducer.send(record);
logger.log("Waiting for flushing data\n");
kafkaProducer.flush();
logger.log("Waiting for the response\n");
RecordMetadata recordMetadata = future.get();
logger.log("Published record to kafka topic: " + recordMetadata.topic() + " partition: " + recordMetadata.partition() + " offset: " + recordMetadata.offset() + "\n");
I'm triggering the function locally using the SAM CLI as follows:
sam local invoke "MyLambdaFunction" -e events/event.json
This is the timeout message I get.
Invoking helloworld.App::handleRequest (java11)
Skip pulling image and use local one: public.ecr.aws/sam/emulation-java11:rapid-1.59.0-x86_64.
Mounting .aws-sam/build/MyLambdaFunction as /var/task:ro,delegated inside runtime container
START RequestId: c07609a4-7cab-4b58-97fc-cdddf20afd5c Version: $LATEST
Picked up JAVA_TOOL_OPTIONS: -XX:+TieredCompilation -XX:TieredStopAtLevel=1
test-topic
__consumer_offsets
Publishing 'abc' to kafka topic: test-topic
Function 'MyLambdaFunction' timed out after 20 seconds
No response from invoke container for MyLambdaFunction
I'm experimenting with a Kafka cluster (3 nodes) and intend to run some tests around redundancy and availability (stopping nodes in the cluster, etc.) with a simple Java app using the following kafka-clients dependency:
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-clients</artifactId>
<version>2.8.0</version>
</dependency>
I've configured a replication factor of 3 to ensure topics are replicated across all nodes, and I'm using only 1 partition for the topic. I'm struggling to understand some behaviour I'm seeing with this sample code when specifically seeking to an offset (with one node offline):
String topic = "test-topic";
TopicPartition partition = new TopicPartition(topic, 0);
List<TopicPartition> partitions = Collections.singletonList(partition);
while (true) {
Consumer<String, String> consumer = createConsumer();
consumer.assign(partitions);
consumer.seek(partition, 0);
ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(2000));
if (records.isEmpty())
System.out.println("No Records Found");
else
System.out.println("Records Found: " + records.count());
consumer.close();
Thread.sleep(2000);
}
This code will on occasion return "No Records Found" when one of the nodes in the cluster is offline:
No Records Found
Records Found: 1
No Records Found
Records Found: 1
Records Found: 1
Records Found: 1
No Records Found
Records Found: 1
Records Found: 1
Records Found: 1
Records Found: 1
Records Found: 1
Records Found: 1
No Records Found
Records Found: 1
Records Found: 1
Records Found: 1
Records Found: 1
Records Found: 1
No Records Found
You'll notice that I'm creating the consumer each time inside the while loop. This is to simulate different consumers coming in and connecting, as each consumer has a different consumer group ID. Moving the consumer creation outside of the while loop (and removing consumer.close()) gives mostly the expected results, i.e. all logs show "Records Found: 1". However, sometimes the very first poll returns no records, with all remaining polls showing 1 record found:
String topic = "test-topic";
TopicPartition partition = new TopicPartition(topic, 0);
List<TopicPartition> partitions = Collections.singletonList(partition);
Consumer<String, String> consumer = createConsumer();
while (true) {
consumer.assign(partitions);
consumer.seek(partition, 0);
ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(2000));
if (records.isEmpty())
System.out.println("No Records Found");
else
System.out.println("Records Found: " + records.count());
Thread.sleep(2000);
}
The createConsumer code is defined as follows, for reference:
public static Consumer<String, String> createConsumer() {
Properties config = new Properties();
config.put(ConsumerConfig.GROUP_ID_CONFIG, "test-consumer-" + UUID.randomUUID().toString());
config.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "node-1:9092, node-2:9092, node-3:9092");
config.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
config.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
config.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 500);
config.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
config.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest" );
config.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, 10000);
Consumer<String, String> consumer = new KafkaConsumer<String, String>(config);
return consumer;
}
I'd like to understand this behaviour to be able to reliably run my availability tests.
I was also stuck on this problem and finally solved it like this:
public ConsumerRecord<String, String> seekAndPoll(String topic, int partition, long offset) {
TopicPartition tp = new TopicPartition(topic, partition);
consumer.assign(Collections.singleton(tp));
System.out.println("assignment:" + consumer.assignment()); // 这里是有分配到分区的
// endOffset: the offset of the last successfully replicated message plus one
// if there are 5 messages, valid offsets are [0,1,2,3,4] and endOffset is 4+1=5
Long endOffset = consumer.endOffsets(Collections.singleton(tp)).get(tp);
if (offset < 0 || offset >= endOffset) {
System.out.println("offset is illegal");
return null;
} else {
consumer.seek(tp, offset);
ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
if(records.isEmpty()){
System.out.println("Not Found");
return null;
} else {
System.out.println("Found");
return records.iterator().next();
}
}
}
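For example, it could be called like this (a hypothetical usage; it assumes the surrounding class holds the configured consumer field the method relies on):
ConsumerRecord<String, String> rec = seekAndPoll("test-topic", 0, 0L);
if (rec != null) {
    System.out.println("offset=" + rec.offset() + ", value=" + rec.value());
} else {
    System.out.println("nothing readable at that offset");
}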
I am trying to implement manual offset commits for messages received from Kafka. I have set auto offset commit to false, but the offset value keeps on increasing.
I'm not sure what the reason is; I need help resolving the issue.
Below is the code
application.yml
spring:
  application:
    name: kafka-consumer-sample
  resources:
    cache:
      period: 60m
kafka:
  bootstrapServers: localhost:9092
  options:
    enable:
      auto:
        commit: false
KafkaConfig.java
@Bean
public ConsumerFactory<String, String> consumerFactory() {
Map<String, Object> config = new HashMap<>();
config.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
config.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
config.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
config.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
config.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
return new DefaultKafkaConsumerFactory<>(config);
}
@Bean
public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory() {
ConcurrentKafkaListenerContainerFactory<String, String> factory = new ConcurrentKafkaListenerContainerFactory();
factory.setConsumerFactory(consumerFactory());
return factory;
}
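For reference, the bootstrapServers and groupId fields used in consumerFactory() are not shown in the question; presumably they are injected from configuration along these lines (the property keys are guesses based on the YAML above and the @KafkaListener expression below):
@Value("${kafka.bootstrapServers}")
private String bootstrapServers;
@Value("${kafka-consumer.groupId}")
private String groupId;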
KafkaConsumer.java
@Service
public class KafkaConsumer {
@KafkaListener(topics = "#{'${kafka-consumer.topics}'.split(',')}", groupId = "${kafka-consumer.groupId}")
public void consume(ConsumerRecord<String, String> record) {
System.out.println("Consumed Kafka Record: " + record);
record.timestampType();
System.out.println("record.timestamp() = " + record.timestamp());
System.out.println("***********************************");
System.out.println(record.timestamp());
System.out.println("record.key() = " + record.key());
System.out.println("Consumed String Message : " + record.value());
}
}
The output is as follows:
Consumed Kafka Record: ConsumerRecord(topic = test, partition = 0, offset = 31, CreateTime = 1573570989565, serialized key size = -1, serialized value size = 2, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = 10)
record.timestamp() = 1573570989565
***********************************
1573570989565
record.key() = null
Consumed String Message : 10
Consumed Kafka Record: ConsumerRecord(topic = test, partition = 0, offset = 32, CreateTime = 1573570991535, serialized key size = -1, serialized value size = 2, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = 11)
record.timestamp() = 1573570991535
***********************************
1573570991535
record.key() = null
Consumed String Message : 11
Properties are as follows.
auto.commit.interval.ms = 100000000
auto.offset.reset = earliest
bootstrap.servers = [localhost:9092]
check.crcs = true
connections.max.idle.ms = 540000
enable.auto.commit = false
exclude.internal.topics = true
fetch.max.bytes = 52428800
fetch.max.wait.ms = 500
fetch.min.bytes = 1
group.id = mygroup
heartbeat.interval.ms = 3000
This is after I restart the consumer. I expected the earlier data to be printed as well.
Is my understanding correct?
Please note that I am restarting my Spring Boot app expecting the messages to be consumed from the beginning, and my Kafka server and ZooKeeper are not terminated.
If auto acknowledgement is disabled using the ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG property, then you also have to set the acknowledgement mode at the container level to MANUAL (and not commit the offset), because by default it is set to BATCH.
@Bean
public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory() {
ConcurrentKafkaListenerContainerFactory<String, String> factory = new ConcurrentKafkaListenerContainerFactory();
factory.setConsumerFactory(consumerFactory());
factory.getContainerProperties().setAckMode(AbstractMessageListenerContainer.AckMode.MANUAL);
return factory;
}
This is because, when auto acknowledgement is disabled, the container-level acknowledgement mode defaults to BATCH. From the setAckMode documentation:
public void setAckMode(ContainerProperties.AckMode ackMode)
Set the ack mode to use when auto ack (in the configuration properties) is false.
RECORD: Ack after each record has been passed to the listener.
BATCH: Ack after each batch of records received from the consumer has been passed to the listener
TIME: Ack after this number of milliseconds (should be greater than the pollTimeout set via setPollTimeout(long)).
COUNT: Ack after at least this number of records have been received
MANUAL: Listener is responsible for acking - use a AcknowledgingMessageListener.
Parameters:
ackMode - the ContainerProperties.AckMode; default BATCH.
Committing Offsets
Several options are provided for committing offsets. If the enable.auto.commit consumer property is true, Kafka auto-commits the offsets according to its configuration. If it is false, the containers support several AckMode settings (described in the next list). The default AckMode is BATCH. Starting with version 2.3, the framework sets enable.auto.commit to false unless explicitly set in the configuration. Previously, the Kafka default (true) was used if the property was not set.
And if you always want to read from the beginning, you have to set the auto.offset.reset property to earliest:
config.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
Note: make sure the groupId is a new one that does not have any committed offsets in Kafka.
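With AckMode.MANUAL, the listener can then take an Acknowledgment parameter and commit explicitly when it chooses. A rough sketch (the topic and group names here are illustrative, not from the question):
@KafkaListener(topics = "test", groupId = "mygroup-manual")
public void consume(ConsumerRecord<String, String> record, Acknowledgment ack) {
    System.out.println("Consumed String Message : " + record.value());
    ack.acknowledge();   // marks this record's offset to be committed
}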
I'm using Kafka producer 10.2.1 to create a topic and to write to it. When I create the topic I get the following error, but the topic is created:
java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
at org.apache.kafka.clients.producer.KafkaProducer$FutureFailure.<init>(KafkaProducer.java:774)
at org.apache.kafka.clients.producer.KafkaProducer.doSend(KafkaProducer.java:494)
at org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:440)
at org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:360)
at kafka.AvroProducer.produce(AvroProducer.java:47)
at samples.TestMqttSource.messageReceived(TestMqttSource.java:89)
at mqtt.JsonConsumer.messageArrived(JsonConsumer.java:132)
at org.eclipse.paho.client.mqttv3.internal.CommsCallback.deliverMessage(CommsCallback.java:477)
at org.eclipse.paho.client.mqttv3.internal.CommsCallback.handleMessage(CommsCallback.java:380)
at org.eclipse.paho.client.mqttv3.internal.CommsCallback.run(CommsCallback.java:184)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
msg org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
loc org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
cause org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
excep java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
Any suggestions are highly appreciated.
You can't use KafkaProducer to create a topic (so I'm not quite sure how you managed to create the topic, unless you did it previously via a different method, such as the Kafka admin shell scripts). Instead, you use the AdminUtils supplied by the Kafka library.
I recently achieved both of the requirements you are after, and you'd be surprised how easy it is to achieve. Below is a simple code example showing you how to create a topic via AdminUtils, and how to then write to it.
class Foo {
private String TOPIC = "testingTopic";
private int NUM_OF_PARTITIONS = 10;
private int REPLICATION_FACTOR = 1;
public Foo() {
ZkClient zkClient = new ZkClient( "localhost:2181", 15000, 10000, ZKStringSerializer$.MODULE$ );
ZkUtils zkUtils = new ZkUtils( zkClient, new ZkConnection( "localhost:2181" ), false);
if ( !AdminUtils.topicExists(zkUtils, TOPIC) ) {
try {
AdminUtils.createTopic(zkUtils, TOPIC, NUM_OF_PARTITIONS, REPLICATION_FACTOR, new Properties(), RackAwareMode.Enforced$.MODULE$);
Properties producerConfig = new Properties();
producerConfig.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
producerConfig.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
producerConfig.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
KafkaProducer<String, String> producer = new KafkaProducer<>(producerConfig);
// This is just to show you how to write but you could be more elaborate
int i = 0;
while ( i < 11 ) {
ProducerRecord<String, String> rec = new ProducerRecord<>(TOPIC, ("This is line number " + i));
producer.send(rec);
i++;
}
producer.close();
} catch ( AdminOperationException aoe ) {
aoe.printStackTrace();
}
}
}
}
Remember that if you want to delete topics, this is disabled by default in the broker settings. In the config file you use when starting Kafka (by default ${kafka_home}/config/server.properties), add the following line if it doesn't already exist, or if it is set to false or commented out:
delete.topic.enable=true
You'll then have to restart the broker, after which you can delete topics either via Java or via the command-line tools supplied with Kafka.
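For example, with the same zkUtils handle as in the snippet above, deleting a topic from Java looks roughly like this (a sketch; deleteTopic only marks the topic for deletion, and it only takes effect when delete.topic.enable is true on the broker):
if (AdminUtils.topicExists(zkUtils, TOPIC)) {
    AdminUtils.deleteTopic(zkUtils, TOPIC);   // marks the topic for deletion
}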
NB
It's always a good idea to close producers / consumers when you are finished with them, as shown in the code example.
I'm stuck on this problem and I can't figure out what's going on. I am trying to use Kafka Streams to write a log to a topic. On the other end, I have Kafka Connect inserting each entry into MySQL. So, basically, what I need is a Kafka Streams program that reads a topic as strings, parses each record into Avro format, and then writes it into a different topic.
Here's the code I wrote:
//Define schema
String userSchema = "{"
+ "\"type\":\"record\","
+ "\"name\":\"myrecord\","
+ "\"fields\":["
+ " { \"name\":\"ID\", \"type\":\"int\" },"
+ " { \"name\":\"COL_NAME_1\", \"type\":\"string\" },"
+ " { \"name\":\"COL_NAME_2\", \"type\":\"string\" }"
+ "]}";
String key = "key1";
Schema.Parser parser = new Schema.Parser();
Schema schema = parser.parse(userSchema);
//Settings
System.out.println("Kafka Streams Demonstration");
//Settings
Properties settings = new Properties();
// Set a few key parameters
settings.put(StreamsConfig.APPLICATION_ID_CONFIG, APP_ID);
// Kafka bootstrap server (broker to talk to); ubuntu is the host name for my VM running Kafka, port 9092 is where the (single) broker listens
settings.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
// Apache ZooKeeper instance keeping watch over the Kafka cluster; ubuntu is the host name for my VM running Kafka, port 2181 is where the ZooKeeper listens
settings.put(StreamsConfig.ZOOKEEPER_CONNECT_CONFIG, "localhost:2181");
// default serdes for serialzing and deserializing key and value from and to streams in case no specific Serde is specified
settings.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
settings.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
settings.put(StreamsConfig.STATE_DIR_CONFIG ,"/tmp");
// to work around exception Exception in thread "StreamThread-1" java.lang.IllegalArgumentException: Invalid timestamp -1
// at org.apache.kafka.clients.producer.ProducerRecord.<init>(ProducerRecord.java:60)
// see: https://groups.google.com/forum/#!topic/confluent-platform/5oT0GRztPBo
// Create an instance of StreamsConfig from the Properties instance
StreamsConfig config = new StreamsConfig(getProperties());
final Serde < String > stringSerde = Serdes.String();
final Serde < Long > longSerde = Serdes.Long();
final Serde<byte[]> byteArraySerde = Serdes.ByteArray();
// building Kafka Streams Model
KStreamBuilder kStreamBuilder = new KStreamBuilder();
// the source of the streaming analysis is the topic with country messages
KStream<byte[], String> instream =
kStreamBuilder.stream(byteArraySerde, stringSerde, "sqlin");
final KStream<byte[], GenericRecord> outstream = instream.mapValues(new ValueMapper<String, GenericRecord>() {
@Override
public GenericRecord apply(final String record) {
System.out.println(record);
GenericRecord avroRecord = new GenericData.Record(schema);
String[] array = record.split(" ", -1);
for (int i = 0; i < array.length; i = i + 1) {
if (i == 0)
avroRecord.put("ID", Integer.parseInt(array[0]));
if (i == 1)
avroRecord.put("COL_NAME_1", array[1]);
if (i == 2)
avroRecord.put("COL_NAME_2", array[2]);
}
System.out.println(avroRecord);
return avroRecord;
}
});
outstream.to("sqlout");
Here's the output, where I get a NullPointerException:
java -cp streams-examples-3.2.1-standalone.jar io.confluent.examples.streams.sql
Kafka Streams Demonstration
Start
Now started CountriesStreams Example
5 this is
{"ID": 5, "COL_NAME_1": "this", "COL_NAME_2": "is"}
Exception in thread "StreamThread-1" java.lang.NullPointerException
at org.apache.kafka.streams.processor.internals.SinkNode.process(SinkNode.java:81)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:83)
at org.apache.kafka.streams.kstream.internals.KStreamMapValues$KStreamMapProcessor.process(KStreamMapValues.java:42)
at org.apache.kafka.streams.processor.internals.ProcessorNode$1.run(ProcessorNode.java:48)
at org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.measureLatencyNs(StreamsMetricsImpl.java:188)
at org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:134)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:83)
at org.apache.kafka.streams.processor.internals.SourceNode.process(SourceNode.java:70)
at org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:197)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:627)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:361)
The topic sqlin contains a few messages, each consisting of a digit followed by two words. Note the two print lines: the function gets one message and successfully parses it before hitting the null pointer. The problem is I am new to Java, Kafka, and Avro, so I'm not sure where I'm going wrong. Did I set up the Avro schema right? Or am I using KStream wrong? Any help here would be greatly appreciated.
I think the problem is the following line:
outstream.to("sqlout");
Your application is configured to, by default, use a String serde for record keys and record values:
settings.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
settings.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
Since outstream has type KStream<byte[], GenericRecord>, you must provide explicit serdes when calling to():
// something like this:
outstream.to(Serdes.ByteArray(), yourGenericAvroSerde, "sqlout");
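One way to build such a serde by hand (an assumption on my part, not something spelled out in the original answer) is to wrap Confluent's KafkaAvroSerializer and KafkaAvroDeserializer, which require a running Schema Registry; the URL below is illustrative:
Map<String, String> serdeConfig = Collections.singletonMap("schema.registry.url", "http://localhost:8081");
KafkaAvroSerializer avroSerializer = new KafkaAvroSerializer();
avroSerializer.configure(serdeConfig, false);          // false = used for record values
KafkaAvroDeserializer avroDeserializer = new KafkaAvroDeserializer();
avroDeserializer.configure(serdeConfig, false);
// KafkaAvroSerializer/Deserializer are typed on Object, hence the unchecked cast
@SuppressWarnings("unchecked")
Serde<GenericRecord> genericAvroSerde = (Serde<GenericRecord>) (Serde<?>) Serdes.serdeFrom(avroSerializer, avroDeserializer);
outstream.to(Serdes.ByteArray(), genericAvroSerde, "sqlout");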
FYI: The next version of Confluent Platform (ETA: this month, June 2017) will ship with ready-to-use generic and specific Avro serdes that integrate with Confluent Schema Registry. This should make your life easier.
See my answer at https://stackoverflow.com/a/44433098/1743580 for further details.