Apache Kafka Connect, Confluent HDFS Connector, Unknown magic byte - java

I am using the Confluent HDFS Connector to move data from Kafka topics to an HDFS log file. But when I run this command:
./bin/connect-standalone \
etc/schema-registry/connect-avro-standalone.properties \
etc/kafka-connect-hdfs/quickstart-hdfs.properties
I get the following error. How can I solve this problem? What is the reason for it?
Caused by: org.apache.kafka.common.errors.SerializationException:
Error deserializing Avro message for id -1 Caused by:
org.apache.kafka.common.errors.SerializationException: Unknown magic
byte! [2017-06-03 13:44:41,895] ERROR Task is being killed and will
not recover until manually restarted
(org.apache.kafka.connect.runtime.WorkerTask:142)

This happens if you are trying to read data with the connector and have set key.converter and value.converter to the AvroConverter, but your input topic has data that was not serialized by the AvroSerializer that uses the Schema Registry.
You have to match your converter to the input data. In other words, use a converter that can deserialize the input data.
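For example, if the topic actually holds plain JSON rather than Confluent-serialized Avro, the worker configuration (e.g. connect-avro-standalone.properties) would need the JSON converter instead; this is only a sketch, and the right converter depends entirely on how the producers serialized the data:
# Sketch: converters for a topic containing schemaless JSON
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter.schemas.enable=false
# If the data really is Confluent Avro, keep the AvroConverter and point it at the Schema Registry:
# key.converter=io.confluent.connect.avro.AvroConverter
# key.converter.schema.registry.url=http://localhost:8081
# value.converter=io.confluent.connect.avro.AvroConverter
# value.converter.schema.registry.url=http://localhost:8081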

Related

NoClassDefFoundError: Could not initialize class com.yammer.metrics.Metrics on storm supervisors when trying to read topics from kafka

I am running a topology that reads from 100 Kafka topics using 100 Storm spouts, with 50 bolts emitting the data to my own data processing class, consisting of 10 instances/threads per bolt. I am getting the following errors now and then (not always), and only on some of the supervisor hosts (not always the same ones). I clearly have com.yammer.metrics.Metrics on my classpath.
However, the error keeps bugging me:
java.lang.NoClassDefFoundError: Could not initialize class com.yammer.metrics.Metrics
at kafka.metrics.KafkaMetricsGroup$class.newTimer(KafkaMetricsGroup.scala:52)
at kafka.consumer.FetchRequestAndResponseMetrics.newTimer(FetchRequestAndResponseStats.scala:25)
at kafka.consumer.FetchRequestAndResponseMetrics.<init>(FetchRequestAndResponseStats.scala:26)
at kafka.consumer.FetchRequestAndResponseStats.<init>(FetchRequestAndResponseStats.scala:37)
at kafka.consumer.FetchRequestAndResponseStatsRegistry$$anonfun$2.apply(FetchRequestAndResponseStats.scala:50)
at kafka.consumer.FetchRequestAndResponseStatsRegistry$$anonfun$2.apply(FetchRequestAndResponseStats.scala:50)
at kafka.utils.Pool.getAndMaybePut(Pool.scala:61)
at kafka.consumer.FetchRequestAndResponseStatsRegistry$.getFetchRequestAndResponseStats(FetchRequestAndResponseStats.scala:54)
at kafka.consumer.SimpleConsumer.<init>(SimpleConsumer.scala:39)
at kafka.javaapi.consumer.SimpleConsumer.<init>(SimpleConsumer.scala:34)
at storm.kafka.DynamicPartitionConnections.register(DynamicPartitionConnections.java:43)
at storm.kafka.PartitionManager.<init>(PartitionManager.java:58)
at storm.kafka.ZkCoordinator.refresh(ZkCoordinator.java:78)
at storm.kafka.ZkCoordinator.getMyManagedPartitions(ZkCoordinator.java:45)
at storm.kafka.KafkaSpoutWrapper.nextTuple(KafkaSpoutWrapper.java:74)
at backtype.storm.daemon.executor$fn__8840$fn__8855$fn__8886.invoke(executor.clj:577)
at backtype.storm.util$async_loop$fn__551.invoke(util.clj:491)
at clojure.lang.AFn.run(AFn.java:22)
at java.base/java.lang.Thread.run(Thread.java:833)
Any ideas about this would be greatly appreciated! Thanks!
The Storm version I am using: 0.10.2
The Kafka version I am using: kafka_2.10-0.8.2.1
The error is weird because it does not always happen, and when it does it is not always on the same supervisor host, nor on all of the supervisor hosts. I don't know where to begin.
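One note on the message itself: "Could not initialize class" usually means the class was found but its static initialization failed earlier (for example because of a conflicting dependency in the shaded topology jar), and later uses only see this NoClassDefFoundError. A hedged diagnostic sketch that forces initialization early can surface the original root cause in the worker logs; the class name is taken from the stack trace above, everything else is hypothetical:
// Hypothetical diagnostic helper: force initialization of the metrics class once per worker so
// any underlying ExceptionInInitializerError is logged instead of a bare NoClassDefFoundError.
public final class MetricsClassCheck {
    public static void probe() {
        try {
            // initialize = true runs the static initializer, reproducing the real failure if any
            Class.forName("com.yammer.metrics.Metrics", true,
                    Thread.currentThread().getContextClassLoader());
        } catch (ClassNotFoundException | ExceptionInInitializerError | NoClassDefFoundError e) {
            e.printStackTrace(); // the cause chain is the interesting part
        }
    }
}
Calling MetricsClassCheck.probe() from the spout's open() on each supervisor host would show whether the jar is actually present there and, if it is, what the static initializer is failing on.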

Failed to set up GCP repository using Elasticsearch operator

I'm launching 3 Elasticsearch nodes using the Elastic operator, and I tried to set up automated snapshots for these instances.
I followed this doc.
I minified the JSON of the service account key, created a file called gcs.client.default.credentials_file with no file extension, and added this file to Kubernetes secrets.
I then added the secureSettings.secretName field to the spec of the Elasticsearch cluster and set it to the secret name, which was gcs-credentials (a sketch of these two steps is shown below).
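For reference, a minimal hedged sketch of those two steps, assuming the service account key was saved as key.json and the cluster is managed by ECK (the secret name gcs-credentials is from the question; the rest is an assumption):
# Create the secret; the key inside the secret must match the secure setting name exactly:
kubectl create secret generic gcs-credentials \
  --from-file=gcs.client.default.credentials_file=key.json
# Then reference it from the Elasticsearch resource so the operator loads it into the keystore:
#   spec:
#     secureSettings:
#     - secretName: gcs-credentials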
But I get this error in the logs:
{"#timestamp":"2022-12-26T18:45:40.037Z", "log.level":"ERROR", "message":"fatal exception while booting Elasticsearch", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"main","log.logger":"org.elasticsearch.bootstrap.Elasticsearch","elasticsearch.node.name":"elasticsearch-cluster-es-node-1","elasticsearch.cluster.name":"elasticsearch-cluster","error.type":"java.lang.IllegalStateException","error.message":"failed to load plugin class [org.elasticsearch.repositories.gcs.GoogleCloudStoragePlugin]","error.stack_trace":"java.lang.IllegalStateException: failed to load plugin class [org.elasticsearch.repositories.gcs.GoogleCloudStoragePlugin]\n\tat org.elasticsearch.server#8.5.0/org.elasticsearch.plugins.PluginsService.loadPlugin(PluginsService.java:607)\n\tat org.elasticsearch.server#8.5.0/org.elasticsearch.plugins.PluginsService.loadBundle(PluginsService.java:482)\n\tat org.elasticsearch.server#8.5.0/org.elasticsearch.plugins.PluginsService.loadBundles(PluginsService.java:290)\n\tat org.elasticsearch.server#8.5.0/org.elasticsearch.plugins.PluginsService.<init>(PluginsService.java:159)\n\tat org.elasticsearch.server#8.5.0/org.elasticsearch.plugins.PluginsService.lambda$getPluginsServiceCtor$14(PluginsService.java:634)\n\tat org.elasticsearch.server#8.5.0/org.elasticsearch.node.Node.<init>(Node.java:406)\n\tat org.elasticsearch.server#8.5.0/org.elasticsearch.node.Node.<init>(Node.java:316)\n\tat org.elasticsearch.server#8.5.0/org.elasticsearch.bootstrap.Elasticsearch$2.<init>(Elasticsearch.java:214)\n\tat org.elasticsearch.server#8.5.0/org.elasticsearch.bootstrap.Elasticsearch.initPhase3(Elasticsearch.java:214)\n\tat org.elasticsearch.server#8.5.0/org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:67)\nCaused by: java.lang.reflect.InvocationTargetException\n\tat java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:79)\n\tat java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:500)\n\tat java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:484)\n\tat org.elasticsearch.server#8.5.0/org.elasticsearch.plugins.PluginsService.loadPlugin(PluginsService.java:600)\n\t... 9 more\nCaused by: java.lang.IllegalArgumentException: failed to load GCS client credentials from [gcs.client.default.credentials_file]\n\tat org.elasticsearch.repositories.gcs.GoogleCloudStorageClientSettings.loadCredential(GoogleCloudStorageClientSettings.java:265)\n\tat org.elasticsearch.repositories.gcs.GoogleCloudStorageClientSettings.getClientSettings(GoogleCloudStorageClientSettings.java:221)\n\tat org.elasticsearch.repositories.gcs.GoogleCloudStorageClientSettings.load(GoogleCloudStorageClientSettings.java:209)\n\tat org.elasticsearch.repositories.gcs.GoogleCloudStoragePlugin.reload(GoogleCloudStoragePlugin.java:88)\n\tat org.elasticsearch.repositories.gcs.GoogleCloudStoragePlugin.<init>(GoogleCloudStoragePlugin.java:36)\n\tat java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:67)\n\t... 
12 more\nCaused by: java.io.IOException: Invalid PKCS#8 data.\n\tat com.google.auth.oauth2.ServiceAccountCredentials.privateKeyFromPkcs8(ServiceAccountCredentials.java:496)\n\tat com.google.auth.oauth2.ServiceAccountCredentials.fromPkcs8(ServiceAccountCredentials.java:474)\n\tat com.google.auth.oauth2.ServiceAccountCredentials.fromJson(ServiceAccountCredentials.java:212)\n\tat com.google.auth.oauth2.ServiceAccountCredentials.fromStream(ServiceAccountCredentials.java:548)\n\tat com.google.auth.oauth2.ServiceAccountCredentials.fromStream(ServiceAccountCredentials.java:520)\n\tat org.elasticsearch.repositories.gcs.GoogleCloudStorageClientSettings.lambda$loadCredential$13(GoogleCloudStorageClientSettings.java:257)\n\tat java.base/java.security.AccessController.doPrivileged(AccessController.java:569)\n\tat org.elasticsearch.repositories.gcs.SocketAccess.doPrivilegedIOException(SocketAccess.java:33)\n\tat org.elasticsearch.repositories.gcs.GoogleCloudStorageClientSettings.loadCredential(GoogleCloudStorageClientSettings.java:256)\n\t... 17 more\n"}
ERROR: Elasticsearch did not exit normally - check the logs at /usr/share/elasticsearch/logs/elasticsearch-cluster.log
Try adding the following lines to your configuration (for each Elasticsearch node):
elasticsearch01:
  image: docker.elastic.co/elasticsearch/elasticsearch:7.6.2
  ...
  ulimits:
    memlock:
      soft: -1
      hard: -1
Also check this link on Elasticsearch for more detailed information.

How to deserialize old and new records in a Kafka topic whose registered Avro schema changes over time?

I have an issue reading from a topic whose schema in the Schema Registry has changed across upgrades; by now it is on the fourth version of the Avro schema. I think the first records in that topic have the structure of the first version of the Avro schema and the last records in the topic have the newest version.
My question is: how can I deserialize both the old and the new records from this topic?
Please note that I have exported the two schemas with their IDs from the Schema Registry.
This exception is thrown when I'm using the old Avro schema, which has ID 10:
org.apache.kafka.common.errors.SerializationException: Error deserializing key/value for partition topicvalues_out_v1-1 at offset 334224. If needed, please seek past the record to continue consumption.
Caused by: org.apache.kafka.common.errors.SerializationException: Error retrieving Avro unknown schema for id 10
Caused by: java.io.IOException: Cannot get schema from schema registry!
and when I'm using the newer Avro schema, which has ID 25, it throws the same exception:
org.apache.kafka.common.errors.SerializationException: Error deserializing key/value for partition topicvalues_out_v1-1 at offset 334224. If needed, please seek past the record to continue consumption.
Caused by: org.apache.kafka.common.errors.SerializationException: Error retrieving Avro unknown schema for id 25
Caused by: java.io.IOException: Cannot get schema from schema registry!
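For context, the Confluent Avro deserializer resolves each record's writer schema from the ID embedded in the record itself, so a consumer pointed at a reachable Schema Registry can read old and new records alike without exporting schemas by hand; the "Cannot get schema from schema registry!" cause above suggests the registry lookup itself is failing rather than the schema version being wrong. A minimal hedged sketch of such a consumer (topic name, key type, and URLs are assumptions):
import io.confluent.kafka.serializers.KafkaAvroDeserializer;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class MixedVersionAvroConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");            // assumption
        props.put("group.id", "mixed-version-reader");                // assumption
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", KafkaAvroDeserializer.class.getName());
        props.put("schema.registry.url", "http://localhost:8081");   // assumption; must be reachable
        props.put("auto.offset.reset", "earliest");

        try (KafkaConsumer<String, GenericRecord> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("topicvalues_out_v1")); // topic from the error
            while (true) {
                ConsumerRecords<String, GenericRecord> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, GenericRecord> record : records) {
                    // Each GenericRecord carries the writer schema looked up by its embedded ID,
                    // so records written with version 1 and version 4 both deserialize here.
                    System.out.println(record.value().getSchema() + " -> " + record.value());
                }
            }
        }
    }
}
No particular exported schema needs to be specified on the consumer side for this to work; what matters is that the registry holding IDs 10 and 25 is reachable from the consumer.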

Kafka Streams Subscription Error - Invalid Version

When attempting to connect to a topic from a Java Jetty microservice, I'm getting this Kafka-internal version mismatch error:
stream-thread [App-94d44dcd-f1d4-49a6-9dd3-8d4eee06f82a-StreamThread-1] Encountered the following error during processing:
java.lang.IllegalArgumentException: version must be between 1 and 3; was: 4
at org.apache.kafka.streams.processor.internals.assignment.SubscriptionInfo.<init>(SubscriptionInfo.java:67)
at org.apache.kafka.streams.processor.internals.StreamsPartitionAssignor.subscription(StreamsPartitionAssignor.java:312)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.metadata(ConsumerCoordinator.java:176)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.sendJoinGroupRequest(AbstractCoordinator.java:515)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.initiateJoinGroup(AbstractCoordinator.java:466)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:412)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:352)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:337)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:333)
at org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1218)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1175)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1154)
at org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:861)
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:814)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:767)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:736)
Any ideas on what could cause such an exception?
I had come across this error myself, and it is most likely because you have used a non-unique APPLICATION_ID_CONFIG and/or CLIENT_ID_CONFIG. Make sure both are unique in the Kafka cluster, for example:
// Give the Streams application a unique name. The name must be unique in the Kafka cluster
// against which the application is run.
streamsConfiguration.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");
streamsConfiguration.put(StreamsConfig.CLIENT_ID_CONFIG, "my-client");

Kafka Streams creates avro topic without schema

I developed a Java application which reads data from an Avro topic, using the Schema Registry, then makes simple transformations and prints the result to the console. By default I used the GenericAvroSerde class for keys and values. Everything worked fine, except that I had to define additional configuration for each serde, like:
final Map<String, String> serdeConfig = Collections.singletonMap("schema.registry.url", kafkaStreamsConfig.getProperty("schema.registry.url"));
final Serde<GenericRecord> keyGenericAvroSerde = new GenericAvroSerde();
final Serde<GenericRecord> valueGenericAvroSerde = new GenericAvroSerde();
keyGenericAvroSerde.configure(serdeConfig, true);
valueGenericAvroSerde.configure(serdeConfig, false);
Without that I always get an error like:
Exception in thread "NTB27821-StreamThread-1" org.apache.kafka.streams.errors.StreamsException: Failed to deserialize value for record. topic=CH-PGP-LP2_S20-002_agg, partition=0, offset=4482940
at org.apache.kafka.streams.processor.internals.SourceNodeRecordDeserializer.deserialize(SourceNodeRecordDeserializer.java:46)
at org.apache.kafka.streams.processor.internals.RecordQueue.addRawRecords(RecordQueue.java:84)
at org.apache.kafka.streams.processor.internals.PartitionGroup.addRawRecords(PartitionGroup.java:117)
at org.apache.kafka.streams.processor.internals.StreamTask.addRecords(StreamTask.java:474)
at org.apache.kafka.streams.processor.internals.StreamThread.addRecordsToTasks(StreamThread.java:642)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:548)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:519)
Caused by: org.apache.kafka.common.errors.SerializationException: Error deserializing Avro message for id 69
Caused by: java.lang.NullPointerException
at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer.deserialize(AbstractKafkaAvroDeserializer.java:122)
at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer.deserialize(AbstractKafkaAvroDeserializer.java:93)
at io.confluent.kafka.serializers.KafkaAvroDeserializer.deserialize(KafkaAvroDeserializer.java:55)
at io.confluent.kafka.streams.serdes.avro.GenericAvroDeserializer.deserialize(GenericAvroDeserializer.java:63)
at io.confluent.kafka.streams.serdes.avro.GenericAvroDeserializer.deserialize(GenericAvroDeserializer.java:39)
at org.apache.kafka.common.serialization.ExtendedDeserializer$Wrapper.deserialize(ExtendedDeserializer.java:65)
at org.apache.kafka.common.serialization.ExtendedDeserializer$Wrapper.deserialize(ExtendedDeserializer.java:55)
at org.apache.kafka.streams.processor.internals.SourceNode.deserializeValue(SourceNode.java:56)
at org.apache.kafka.streams.processor.internals.SourceNodeRecordDeserializer.deserialize(SourceNodeRecordDeserializer.java:44)
at org.apache.kafka.streams.processor.internals.RecordQueue.addRawRecords(RecordQueue.java:84)
at org.apache.kafka.streams.processor.internals.PartitionGroup.addRawRecords(PartitionGroup.java:117)
at org.apache.kafka.streams.processor.internals.StreamTask.addRecords(StreamTask.java:474)
at org.apache.kafka.streams.processor.internals.StreamThread.addRecordsToTasks(StreamThread.java:642)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:548)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:519)
Well, it was unusual, but fine; after that (when I added the configuration call as posted above) it worked, and my application was able to do all the operations and print out the result.
But!
When I tried to use through() - just to post data to a new topic - I faced the problem I am asking about: the topic was created without a schema.
How can that be?
An interesting fact is that the data is being written, but it is:
a) in binary format, so a simple consumer cannot read it;
b) without a schema, so an Avro consumer can't read it either:
Processed a total of 1 messages
[2017-10-05 11:25:53,241] ERROR Unknown error when running consumer: (kafka.tools.ConsoleConsumer$:105)
org.apache.kafka.common.errors.SerializationException: Error retrieving Avro schema for id 0
Caused by: io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: Schema not found; error code: 40403
at io.confluent.kafka.schemaregistry.client.rest.RestService.sendHttpRequest(RestService.java:182)
at io.confluent.kafka.schemaregistry.client.rest.RestService.httpRequest(RestService.java:203)
at io.confluent.kafka.schemaregistry.client.rest.RestService.getId(RestService.java:379)
at io.confluent.kafka.schemaregistry.client.rest.RestService.getId(RestService.java:372)
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getSchemaByIdFromRegistry(CachedSchemaRegistryClient.java:65)
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getBySubjectAndId(CachedSchemaRegistryClient.java:131)
at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer.deserialize(AbstractKafkaAvroDeserializer.java:122)
at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer.deserialize(AbstractKafkaAvroDeserializer.java:93)
at io.confluent.kafka.formatter.AvroMessageFormatter.writeTo(AvroMessageFormatter.java:122)
at io.confluent.kafka.formatter.AvroMessageFormatter.writeTo(AvroMessageFormatter.java:114)
at kafka.tools.ConsoleConsumer$.process(ConsoleConsumer.scala:140)
at kafka.tools.ConsoleConsumer$.run(ConsoleConsumer.scala:78)
at kafka.tools.ConsoleConsumer$.main(ConsoleConsumer.scala:53)
at kafka.tools.ConsoleConsumer.main(ConsoleConsumer.scala)
Of course I checked out the schema registry for the subject:
curl -X GET http://localhost:8081/subjects/agg_value_9-value/versions
{"error_code":40401,"message":"Subject not found."}
But the same call for another topic, written by the Java app that produces the initial data, shows that the schema exists:
curl -X GET http://localhost:8081/subjects/CH-PGP-LP2_S20-002_agg-value/versions
[1]
Both applications use an identical "schema.registry.url" configuration.
Just to summarize: the topic is created, the data is written and can be read with a simple consumer, but it is binary and the schema doesn't exist.
I also tried to create a schema with Landoop to somehow match the data, but with no success - and in any case that is not a proper way to use Kafka Streams; everything should be done on the fly.
Help, please!
When through() is called, the default serde defined via StreamsConfig is used unless the user specifically overrides it. Which default serde did you use? To be correct, you should be using a serde based on the AbstractKafkaAvroSerializer, which will automatically register the schema for that through topic.
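For illustration, a minimal hedged sketch of making the schema-registry-aware GenericAvroSerde the default serde via StreamsConfig (application id, bootstrap servers, and registry URL are assumptions):
import io.confluent.kafka.streams.serdes.avro.GenericAvroSerde;
import org.apache.kafka.streams.StreamsConfig;

import java.util.Properties;

public class AvroStreamsConfig {
    static Properties buildStreamsConfig() {
        Properties config = new Properties();
        config.put(StreamsConfig.APPLICATION_ID_CONFIG, "avro-transform-app");   // assumption
        config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");    // assumption
        // Make the schema-registry-aware Avro serde the default for keys and values, so topics
        // written by through()/to() are produced with the Confluent Avro serializer, which
        // registers the schema under the <topic>-key / <topic>-value subjects.
        config.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, GenericAvroSerde.class);
        config.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, GenericAvroSerde.class);
        // The default serdes are configured from this map, so the registry URL must be in it too.
        config.put("schema.registry.url", "http://localhost:8081");              // assumption
        return config;
    }
}
Alternatively, the explicitly configured serdes from the question can be passed to through() itself (the exact overload depends on the Kafka Streams version, e.g. through(keySerde, valueSerde, topic) in older releases or through(topic, Produced.with(keySerde, valueSerde)) in newer ones), so that the output topic is written with the schema-registering Avro serializer.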
