Kafka Streams creates avro topic without schema - java

I developed a java application which reads data from an avro topic, using Schema Registry, then makes simple transformations and prints the result in the console. By default I used GenericAvroSerde class for keys and values. Everything worked fine except that I had to define additionally configuration for each serde like
final Map<String, String> serdeConfig = Collections.singletonMap("schema.registry.url", kafkaStreamsConfig.getProperty("schema.registry.url"));
final Serde<GenericRecord> keyGenericAvroSerde = new GenericAvroSerde();
final Serde<GenericRecord> valueGenericAvroSerde = new GenericAvroSerde();
keyGenericAvroSerde.configure(serdeConfig, true);
valueGenericAvroSerde.configure(serdeConfig, false);
Without that I always get an error like:
Exception in thread "NTB27821-StreamThread-1" org.apache.kafka.streams.errors.StreamsException: Failed to deserialize value for record. topic=CH-PGP-LP2_S20-002_agg, partition=0, offset=4482940
at org.apache.kafka.streams.processor.internals.SourceNodeRecordDeserializer.deserialize(SourceNodeRecordDeserializer.java:46)
at org.apache.kafka.streams.processor.internals.RecordQueue.addRawRecords(RecordQueue.java:84)
at org.apache.kafka.streams.processor.internals.PartitionGroup.addRawRecords(PartitionGroup.java:117)
at org.apache.kafka.streams.processor.internals.StreamTask.addRecords(StreamTask.java:474)
at org.apache.kafka.streams.processor.internals.StreamThread.addRecordsToTasks(StreamThread.java:642)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:548)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:519)
Caused by: org.apache.kafka.common.errors.SerializationException: Error deserializing Avro message for id 69
Caused by: java.lang.NullPointerException
at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer.deserialize(AbstractKafkaAvroDeserializer.java:122)
at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer.deserialize(AbstractKafkaAvroDeserializer.java:93)
at io.confluent.kafka.serializers.KafkaAvroDeserializer.deserialize(KafkaAvroDeserializer.java:55)
at io.confluent.kafka.streams.serdes.avro.GenericAvroDeserializer.deserialize(GenericAvroDeserializer.java:63)
at io.confluent.kafka.streams.serdes.avro.GenericAvroDeserializer.deserialize(GenericAvroDeserializer.java:39)
at org.apache.kafka.common.serialization.ExtendedDeserializer$Wrapper.deserialize(ExtendedDeserializer.java:65)
at org.apache.kafka.common.serialization.ExtendedDeserializer$Wrapper.deserialize(ExtendedDeserializer.java:55)
at org.apache.kafka.streams.processor.internals.SourceNode.deserializeValue(SourceNode.java:56)
at org.apache.kafka.streams.processor.internals.SourceNodeRecordDeserializer.deserialize(SourceNodeRecordDeserializer.java:44)
at org.apache.kafka.streams.processor.internals.RecordQueue.addRawRecords(RecordQueue.java:84)
at org.apache.kafka.streams.processor.internals.PartitionGroup.addRawRecords(PartitionGroup.java:117)
at org.apache.kafka.streams.processor.internals.StreamTask.addRecords(StreamTask.java:474)
at org.apache.kafka.streams.processor.internals.StreamThread.addRecordsToTasks(StreamThread.java:642)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:548)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:519)
Well, it was unsual, but fine, after that (when I added configuration call as I posted above) - it worked and my application was able to to all the operations and print out the result.
But!
When I tried to use call through() - just to post data to the new topic - I faced the problem I am asking about: TOPIC WAS CREATED WITHOUT A SCHEMA.
How it can be???
Interesting fact is that the data is being written, but it is:
a) in binary format, so simple consumer cannot read it
b) it has not a schema - so avro consumer can't read it either:
Processed a total of 1 messages
[2017-10-05 11:25:53,241] ERROR Unknown error when running consumer: (kafka.tools.ConsoleConsumer$:105)
org.apache.kafka.common.errors.SerializationException: Error retrieving Avro schema for id 0
Caused by: io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: Schema not found; error code: 40403
at io.confluent.kafka.schemaregistry.client.rest.RestService.sendHttpRequest(RestService.java:182)
at io.confluent.kafka.schemaregistry.client.rest.RestService.httpRequest(RestService.java:203)
at io.confluent.kafka.schemaregistry.client.rest.RestService.getId(RestService.java:379)
at io.confluent.kafka.schemaregistry.client.rest.RestService.getId(RestService.java:372)
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getSchemaByIdFromRegistry(CachedSchemaRegistryClient.java:65)
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getBySubjectAndId(CachedSchemaRegistryClient.java:131)
at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer.deserialize(AbstractKafkaAvroDeserializer.java:122)
at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer.deserialize(AbstractKafkaAvroDeserializer.java:93)
at io.confluent.kafka.formatter.AvroMessageFormatter.writeTo(AvroMessageFormatter.java:122)
at io.confluent.kafka.formatter.AvroMessageFormatter.writeTo(AvroMessageFormatter.java:114)
at kafka.tools.ConsoleConsumer$.process(ConsoleConsumer.scala:140)
at kafka.tools.ConsoleConsumer$.run(ConsoleConsumer.scala:78)
at kafka.tools.ConsoleConsumer$.main(ConsoleConsumer.scala:53)
at kafka.tools.ConsoleConsumer.main(ConsoleConsumer.scala)
[2017-10-05 11:25:53,241] ERROR Unknown error when running consumer: (kafka.tools.ConsoleConsumer$:105)
org.apache.kafka.common.errors.SerializationException: Error retrieving Avro schema for id 0
Caused by: io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: Schema not found; error code: 40403
at io.confluent.kafka.schemaregistry.client.rest.RestService.sendHttpRequest(RestService.java:182)
at io.confluent.kafka.schemaregistry.client.rest.RestService.httpRequest(RestService.java:203)
at io.confluent.kafka.schemaregistry.client.rest.RestService.getId(RestService.java:379)
at io.confluent.kafka.schemaregistry.client.rest.RestService.getId(RestService.java:372)
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getSchemaByIdFromRegistry(CachedSchemaRegistryClient.java:65)
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getBySubjectAndId(CachedSchemaRegistryClient.java:131)
at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer.deserialize(AbstractKafkaAvroDeserializer.java:122)
at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer.deserialize(AbstractKafkaAvroDeserializer.java:93)
at io.confluent.kafka.formatter.AvroMessageFormatter.writeTo(AvroMessageFormatter.java:122)
at io.confluent.kafka.formatter.AvroMessageFormatter.writeTo(AvroMessageFormatter.java:114)
at kafka.tools.ConsoleConsumer$.process(ConsoleConsumer.scala:140)
at kafka.tools.ConsoleConsumer$.run(ConsoleConsumer.scala:78)
at kafka.tools.ConsoleConsumer$.main(ConsoleConsumer.scala:53)
at kafka.tools.ConsoleConsumer.main(ConsoleConsumer.scala)
Of course I checked out the schema registry for the subject:
curl -X GET http://localhost:8081/subjects/agg_value_9-value/versions
{"error_code":40401,"message":"Subject not found."}
But the same call to another topic written by Java App - producer of the initial data shows that schema exist:
curl -X GET http://localhost:8081/subjects/CH-PGP-LP2_S20-002_agg-value/versions
[1]
Both applications use identical "schema.registry.url" configuration
Just to summarize - topic is created, data is written, can be read with simple consumer, but it is binary and the schema doesn't exist.
Also I tried to create a schema with a Landoop, somehow to match the data, but no success - and by the way it is not a proper way to use kafka streams - everything should be done on the fly.
Help, please!

When through is called, the default serde defined via StreamsConfig is used unless users specifically overrides it. Which default serde did you use? To be correct you should be using the AbstractKafkaAvroSerializer which will automatically register the schema for that through topic.

Related

NoClassDefFoundError: Could not initialize class com.yammer.metrics.Metrics on storm supervisors when trying to read topics from kafka

I am running a topology to read from 100 kafka topics using 100 storm spouts and then 50 bolts emitting the data to my own data processing class, consisting of 10 instances/threads for each bolt. I am getting the following errors now and then(not always) only on some of the supervisor hosts(not always the same). I clearly have com.yammer.metrics.Metrics in my classpath.
However the error keeps bugging me:
java.lang.NoClassDefFoundError: Could not initialize class com.yammer.metrics.Metrics
at kafka.metrics.KafkaMetricsGroup$class.newTimer(KafkaMetricsGroup.scala:52)
at kafka.consumer.FetchRequestAndResponseMetrics.newTimer(FetchRequestAndResponseStats.scala:25)
at kafka.consumer.FetchRequestAndResponseMetrics.<init>(FetchRequestAndResponseStats.scala:26)
at kafka.consumer.FetchRequestAndResponseStats.<init>(FetchRequestAndResponseStats.scala:37)
at kafka.consumer.FetchRequestAndResponseStatsRegistry$$anonfun$2.apply(FetchRequestAndResponseStats.scala:50)
at kafka.consumer.FetchRequestAndResponseStatsRegistry$$anonfun$2.apply(FetchRequestAndResponseStats.scala:50)
at kafka.utils.Pool.getAndMaybePut(Pool.scala:61)
at kafka.consumer.FetchRequestAndResponseStatsRegistry$.getFetchRequestAndResponseStats(FetchRequestAndResponseStats.scala:54)
at kafka.consumer.SimpleConsumer.<init>(SimpleConsumer.scala:39)
at kafka.javaapi.consumer.SimpleConsumer.<init>(SimpleConsumer.scala:34)
at storm.kafka.DynamicPartitionConnections.register(DynamicPartitionConnections.java:43)
at storm.kafka.PartitionManager.<init>(PartitionManager.java:58)
at storm.kafka.ZkCoordinator.refresh(ZkCoordinator.java:78)
at storm.kafka.ZkCoordinator.getMyManagedPartitions(ZkCoordinator.java:45)
at storm.kafka.KafkaSpoutWrapper.nextTuple(KafkaSpoutWrapper.java:74)
at backtype.storm.daemon.executor$fn__8840$fn__8855$fn__8886.invoke(executor.clj:577)
at backtype.storm.util$async_loop$fn__551.invoke(util.clj:491)
at clojure.lang.AFn.run(AFn.java:22)
at java.base/java.lang.Thread.run(Thread.java:833)
Any one having any ideas about this would be greatly appreciated! Thanks!
The storm version I am using: 0.10.2
The kafka version I am using: kafka_2.10-0.8.2.1
The error is weird because it is not always happening and also not always happening for the same supervisor host, not for all the supervisor hosts. I dont know where to begin with

How to deserialize old and new records in a topic Kafka that its schema registry changes over time?

I have an issue to read from a topic that its schema registry changes with upgrades tell now it has the fourth version of Avro file. I think that the first records in that topic have the structure of the first version of the Avro structure and the last records in the topic have the newest version.
My question how can I deserialize the old and the new records from this topic?
Please note that I have exported the two schemas with theirs ids from schema registry.
The exception message is shown when I'm using the old avro schema that has the id 10.
org.apache.kafka.common.errors.SerializationException: Error deserializing key/value for partition topicvalues_out_v1-1 at offset 334224. If needed, please seek past the record to continue consumption.
Caused by: org.apache.kafka.common.errors.SerializationException: Error retrieving Avro unknown schema for id 10
Caused by: java.io.IOException: Cannot get schema from schema registry!
and when I'm using the newer avro schema that has the id of 25 it throws the same exception
org.apache.kafka.common.errors.SerializationException: Error deserializing key/value for partition topicvalues_out_v1-1 at offset 334224. If needed, please seek past the record to continue consumption.
Caused by: org.apache.kafka.common.errors.SerializationException: Error retrieving Avro unknown schema for id 25
Caused by: java.io.IOException: Cannot get schema from schema registry!

Kafka Stream Subscription Error - Invalid Version

When attempting to connect to a topic from Java jetty microservice, I’m getting this Kafka internal version mismatch error:
stream-thread [App-94d44dcd-f1d4-49a6-9dd3-8d4eee06f82a-StreamThread-1] Encountered the following error during processing:
java.lang.IllegalArgumentException: version must be between 1 and 3; was: 4
at org.apache.kafka.streams.processor.internals.assignment.SubscriptionInfo.<init>(SubscriptionInfo.java:67)
at org.apache.kafka.streams.processor.internals.StreamsPartitionAssignor.subscription(StreamsPartitionAssignor.java:312)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.metadata(ConsumerCoordinator.java:176)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.sendJoinGroupRequest(AbstractCoordinator.java:515)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.initiateJoinGroup(AbstractCoordinator.java:466)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:412)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:352)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:337)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:333)
at org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1218)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1175)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1154)
at org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:861)
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:814)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:767)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:736)
Any ideas on what could cause such an exception?
I had come across this error myself and it is most likely because you have used non-unique APPLICATION_ID_CONFIG and/or CLIENT_ID_CONFIG
// Give the Streams application a unique name. The name must be unique in the Kafka cluster
// against which the application is run.
streamsConfiguration.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");
streamsConfiguration.put(StreamsConfig.CLIENT_ID_CONFIG, "my-client");

Apache-Kafka-Connect , Confluent-HDFS-Connector , Unknown-magic-byte

I have use Confluent HDFS Connector for moving data from Kafka topics to HDFS log file. But when I run these commands:
./bin/connect-standalone
etc/schema-registry/connect-avro-standalone.properties \
etc/kafka-connect-hdfs/quickstart-hdfs.properties
I am taking follow error. How can i solve this problem. What is the reason of that ?
Caused by: org.apache.kafka.common.errors.SerializationException:
Error deserializing Avro message for id -1 Caused by:
org.apache.kafka.common.errors.SerializationException: Unknown magic
byte! [2017-06-03 13:44:41,895] ERROR Task is being killed and will
not recover until manually restarted
(org.apache.kafka.connect.runtime.WorkerTask:142)
This happens if you are trying to read data read the connector and have set key.converter and value.converter to be the AvroConverter but your input topic has data that was not serialized by the same AvroSerializer that uses the schema registry.
You have to match your converter to the input data. In other words, use a serializer that can deserialize the input data.

Samza/Kafka Failed to Update Metadata

I am currently working on writing a Samza Script that will just take data from a Kafka topic and output the data to another Kafka topic. I have written a very basic StreamTask however upon execution I am running into an error.
The error is below:
Exception in thread "main" org.apache.samza.SamzaException: org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 193 ms.
at org.apache.samza.coordinator.stream.CoordinatorStreamSystemProducer.send(CoordinatorStreamSystemProducer.java:112)
at org.apache.samza.coordinator.stream.CoordinatorStreamSystemProducer.writeConfig(CoordinatorStreamSystemProducer.java:129)
at org.apache.samza.job.JobRunner.run(JobRunner.scala:79)
at org.apache.samza.job.JobRunner$.main(JobRunner.scala:48)
at org.apache.samza.job.JobRunner.main(JobRunner.scala)
Caused by: org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 193 ms
I not entirely sure how to configure or have the script write the required Kafka metadata. Below is my code for the StreamTask and the properties file. In the properties file I added the Metadata section to see if that would assist in the process afterwards but to no avail. Is that the right direction or am I missing something entirely?
import org.apache.samza.task.StreamTask;
import org.apache.samza.task.MessageCollector;
import org.apache.samza.task.TaskCoordinator;
import org.apache.samza.system.SystemStream;
import org.apache.samza.system.IncomingMessageEnvelope;
import org.apache.samza.system.OutgoingMessageEnvelope;
/*
* Take all messages received and send them to
* a Kafka topic called "words"
*/
public class TestStreamTask implements StreamTask{
private static final SystemStream OUTPUT_STREAM = new SystemStream("kafka" , "words"); // create new system stream for kafka topic "words"
#Override
public void process(IncomingMessageEnvelope envelope, MessageCollector collector, TaskCoordinator coordinator){
String message = (String) envelope.getMessage(); // pull message from stream
for(String word : message.split(" "))
collector.send(new OutgoingMessageEnvelope(OUTPUT_STREAM, word, 1)); // output messsage to new system stream for kafka topic "words"
}
}
# Job
job.factory.class=org.apache.samza.job.yarn.YarnJobFactory
job.name=test-words
# YARN
yarn.package.path=file://${basedir}/target/${project.artifactId}-${pom.version}-dist.tar.gz
# Task
task.class=samza.examples.wikipedia.task.TestStreamTask
task.inputs=kafka.test
task.checkpoint.factory=org.apache.samza.checkpoint.kafka.KafkaCheckpointManagerFactory
task.checkpoint.system=kafka
task.checkpoint.replication.factor=1
# Metrics
metrics.reporters=snapshot,jmx
metrics.reporter.snapshot.class=org.apache.samza.metrics.reporter.MetricsSnapshotReporterFactory
metrics.reporter.snapshot.stream=kafka.metrics
metrics.reporter.jmx.class=org.apache.samza.metrics.reporter.JmxReporterFactory
# Serializers
serializers.registry.string.class=org.apache.samza.serializers.StringSerdeFactory
serializers.registry.metrics.class=org.apache.samza.serializers.MetricsSnapshotSerdeFactory
# Systems
systems.kafka.samza.factory=org.apache.samza.system.kafka.KafkaSystemFactory
systems.kafka.samza.msg.serde=string
systems.kafka.consumer.zookeeper.connect=localhost:2181/
systems.kafka.consumer.auto.offset.reset=largest
systems.kafka.producer.bootstrap.servers=localhost:9092
# Metadata
systems.kafka.metadata.bootstrap.servers=localhost:9092
This question is about Kafka 0.8 which should be out of support if I am not mistaken.
This fact, combined with the context of people only running into this issue sometimes, but not all the time (and nobody seems to struggle with this in recent years), gives me very good confidence that upgrading to a more recent version of Kafka will resolve the problem.

Categories