Kafka consumer in Flink - Java

I am working with Kafka and Apache Flink. I am trying to consume records (in Avro format) from a Kafka topic in Apache Flink. Below is the code I am using.
I use a custom deserialiser to deserialise the Avro records from the topic.
The Avro schema for the data I am sending to the topic "test-topic" is as follows.
{
"namespace": "com.example.flink.avro",
"type": "record",
"name": "UserInfo",
"fields": [
{"name": "name", "type": "string"}
]
}
The custom deserialiser I am using is as below.
public class AvroDeserializationSchema<T> implements DeserializationSchema<T> {
private static final long serialVersionUID = 1L;
private final Class<T> avroType;
private transient DatumReader<T> reader;
private transient BinaryDecoder decoder;
public AvroDeserializationSchema(Class<T> avroType) {
this.avroType = avroType;
}
public T deserialize(byte[] message) {
ensureInitialized();
try {
decoder = DecoderFactory.get().binaryDecoder(message, decoder);
T t = reader.read(null, decoder);
return t;
} catch (Exception ex) {
throw new RuntimeException(ex);
}
}
private void ensureInitialized() {
if (reader == null) {
if (org.apache.avro.specific.SpecificRecordBase.class.isAssignableFrom(avroType)) {
reader = new SpecificDatumReader<T>(avroType);
} else {
reader = new ReflectDatumReader<T>(avroType);
}
}
}
public boolean isEndOfStream(T nextElement) {
return false;
}
public TypeInformation<T> getProducedType() {
return TypeExtractor.getForClass(avroType);
}
}
And this is how my flink app is written.
public class FlinkKafkaApp {
public static void main(String[] args) throws Exception {
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
Properties kafkaProperties = new Properties();
kafkaProperties.put("bootstrap.servers", "localhost:9092");
kafkaProperties.put("group.id", "test");
AvroDeserializationSchema<UserInfo> schema = new AvroDeserializationSchema<UserInfo>(UserInfo.class);
FlinkKafkaConsumer011<UserInfo> consumer = new FlinkKafkaConsumer011<UserInfo>("test-topic", schema, kafkaProperties);
DataStreamSource<UserInfo> userStream = env.addSource(consumer);
userStream.map(new MapFunction<UserInfo, UserInfo>() {
@Override
public UserInfo map(UserInfo userInfo) {
return userInfo;
}
}).print();
env.execute("Test Kafka");
}
}
I am trying to print the record sent to the topic, which is as below.
{"name" :"sumit"}
Output:
The output I am getting is
{"name":""}
Can anyone help me figure out what the issue is here and why I am not getting {"name" : "sumit"} as output?

Flink documentation says :
Flink's Kafka consumer is called FlinkKafkaConsumer08 (or 09 for Kafka 0.9.0.x versions, etc., or just FlinkKafkaConsumer for Kafka >= 1.0.0 versions). It provides access to one or more Kafka topics.
You do not have to write a custom deserializer to consume Avro messages from Kafka.
To read SpecificRecords:
DataStreamSource<UserInfo> stream = streamExecutionEnvironment.addSource(new FlinkKafkaConsumer<>("test_topic", AvroDeserializationSchema.forSpecific(UserInfo.class), properties).setStartFromEarliest());
To read GenericRecords:
Schema schema = new Schema.Parser().parse("{\"namespace\": \"com.example.flink.avro\",\"type\": \"record\",\"name\": \"UserInfo\",\"fields\": [{\"name\": \"name\", \"type\": \"string\"}]}");
DataStreamSource<GenericRecord> stream = streamExecutionEnvironment.addSource(new FlinkKafkaConsumer<>("test_topic", AvroDeserializationSchema.forGeneric(schema), properties).setStartFromEarliest());
For more details : https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/kafka.html#kafka-consumer

Related

Java - Flink sending empty object on kafka sink

In my Flink script I have a stream that I read from one Kafka topic, manipulate, and send back to Kafka using a sink.
public static void main(String[] args) throws Exception {
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setParallelism(1);
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
Properties p = new Properties();
p.setProperty("bootstrap.servers", servers_ip_list);
p.setProperty("group.id", "Flink");
FlinkKafkaConsumer<Event_N> kafkaData_N =
new FlinkKafkaConsumer<>("CorID_0", new Ev_Des_Sch_N(), p);
WatermarkStrategy<Event_N> wmStrategy =
WatermarkStrategy
.<Event_N>forMonotonousTimestamps()
.withIdleness(Duration.ofMinutes(1))
.withTimestampAssigner((Event, timestamp) -> {
return Event.get_Time();
});
DataStream<Event_N> stream_N = env.addSource(
kafkaData_N.assignTimestampsAndWatermarks(wmStrategy));
The part above works fine with no problems at all; the part below is where I'm getting the issue.
String ProducerTopic = "CorID_0_f1";
DataStream<Stream_Blocker_Pojo.block> box_stream_p= stream_N
.keyBy((Event_N CorrID) -> CorrID.get_CorrID())
.map(new Stream_Blocker_Pojo());
FlinkKafkaProducer<Stream_Blocker_Pojo.block> myProducer = new FlinkKafkaProducer<>(
ProducerTopic,
new ObjSerializationSchema(ProducerTopic),
p,
FlinkKafkaProducer.Semantic.EXACTLY_ONCE); // fault-tolerance
box_stream_p.addSink(myProducer);
There are no errors and everything runs fine. This is the Stream_Blocker_Pojo, where I map the stream, manipulate it, and send out a new one. (I have simplified my code, keeping just 4 variables and removing all the math and data processing.)
public class Stream_Blocker_Pojo extends RichMapFunction<Event_N, Stream_Blocker_Pojo.block>
{
public class block {
public Double block_id;
public Double block_var2 ;
public Double block_var3;
public Double block_var4;}
private transient ValueState<block> state_a;
@Override
public void open(Configuration parameters) throws Exception {
state_a = getRuntimeContext().getState(new ValueStateDescriptor<>("BoxState_a", block.class));
}
public block map(Event_N input) throws Exception {
p1.Stream_Blocker_Pojo.block current_a = state_a.value();
if (current_a == null) {
current_a = new p1.Stream_Blocker_Pojo.block();
current_a.block_id = 0.0;
current_a.block_var2 = 0.0;
current_a.block_var3 = 0.0;
current_a.block_var4 = 0.0;}
current_a.block_id = input.f_num_id;
current_a.block_var2 = input.f_num_2;
current_a.block_var3 = input.f_num_3;
current_a.block_var4 = input.f_num_4;
state_a.update(current_a);
return new block();
};
}
This is the implementation of the Kafka Serialization schema.
public class ObjSerializationSchema implements KafkaSerializationSchema<Stream_Blocker_Pojo.block>{
private String topic;
private ObjectMapper mapper;
public ObjSerializationSchema(String topic) {
super();
this.topic = topic;
}
@Override
public ProducerRecord<byte[], byte[]> serialize(Stream_Blocker_Pojo.block obj, Long timestamp) {
byte[] b = null;
if (mapper == null) {
mapper = new ObjectMapper();
}
try {
b= mapper.writeValueAsBytes(obj);
} catch (JsonProcessingException e) {
}
return new ProducerRecord<byte[], byte[]>(topic, b);
}
}
When I open the messages that I sent from my Flink script using Kafka, I find that all the variables are "null":
CorrID b'{"block_id":null,"block_var1":null,"block_var2":null,"block_var3":null,"block_var4":null}
It looks like I'm sending out an empty object with no values, but I'm struggling to understand what I'm doing wrong. I think the problem could be in my implementation of Stream_Blocker_Pojo or maybe in ObjSerializationSchema. Any help would be really appreciated. Thanks.
There are two probable issues here:
1. Are you sure the object of type block that you are passing doesn't have null fields? You may want to debug that part to be sure.
2. The cause may also be in ObjectMapper: you should have getters and setters available for your block class, otherwise Jackson may not be able to access its fields.
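On the first point, one thing worth checking in the code shown in the question is that map() populates current_a but then returns new block(), a fresh instance whose fields were never set. The JSON output already lists the field names, which suggests Jackson can see the fields and that the values themselves are null. A minimal stdlib-only sketch of that effect follows; Block and BlockMapperSketch are hypothetical stand-ins for the question's classes, not the actual Flink code.

```java
// Hypothetical, simplified stand-in for Stream_Blocker_Pojo.block.
final class Block {
    public Double blockId;
    public Double blockVar2;
}

final class BlockMapperSketch {
    // Mirrors the shape of the question's map(): fills one object,
    // then returns a different, freshly created one.
    static Block mapReturningNewBlock(double id, double var2) {
        Block current = new Block();
        current.blockId = id;
        current.blockVar2 = var2;
        return new Block(); // all fields of this instance are still null
    }

    // Returns the instance that was actually populated.
    static Block mapReturningCurrent(double id, double var2) {
        Block current = new Block();
        current.blockId = id;
        current.blockVar2 = var2;
        return current;
    }
}
```

If this is what happens in the real job, returning current_a from map() instead of new block() would be the first thing to try.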

Write custom document to Cosmos DB with Java API

I have a Cosmos DB and want to write different kinds of documents to it. The structure of the documents is dynamic and can change.
I tried the following. Let's say I have the following class:
class CosmosDbItem implements Serializable {
private final String _id;
private final String _payload;
public CosmosDbItem(String id, String payload) {
_id = id;
_payload = payload;
}
public String getId() {
return _id;
}
public String getPayload() {
return _payload;
}
}
I can then create the document with some JSON as follows:
CosmosContainer _cosmosContainer = ...
CosmosDbItem dataToWrite = new CosmosDbItem("what-ever-id-18357", "{\"name\":\"Jane Doe\", \"age\":42}");
item = _cosmosContainer.createItem(dataToWrite, partitionKey, cosmosItemRequestOptions);
This results in a document like this:
{
"id": "what-ever-id-18357",
"payload": "{\"name\":\"Jane Doe\", \"age\":42}",
"_rid": "aaaaaaDaaAAAAAAAAAA==",
"_self": "dbs/aaaaAA==/colls/aaaaAaaaDI=/docs/aaaaapaaaaaAAAAAAAAAA==/",
"_etag": "\"6e00c443-0000-0700-0000-5f8499a70000\"",
"_attachments": "attachments/",
"_ts": 1602525607
}
Is there a way to generate the payload as a real JSON object in that document? What do I need to change in my CosmosDbItem class? I would like the document to look like this:
{
"id": "what-ever-id-18357",
"payload": {
"name":"Jane Doe",
"age":42
},
"_rid": "aaaaaaDaaAAAAAAAAAA==",
"_self": "dbs/aaaaAA==/colls/aaaaAaaaDI=/docs/aaaaapaaaaaAAAAAAAAAA==/",
"_etag": "\"6e00c443-0000-0700-0000-5f8499a70000\"",
"_attachments": "attachments/",
"_ts": 1602525607
}
Here is the solution I ended up with. It is actually pretty simple once I understood it. Instead of using CosmosDbItem, I use a simple HashMap<String, Object>.
public void writeData() {
...
Map<String, Object> stringObjectMap = buildDocumentMap("the-id-", "{\"key\":\"value\"}");
_cosmosContainer.createItem(stringObjectMap, partitionKey, cosmosItemRequestOptions);
...
}
public Map<String, Object> buildDocumentMap(String id, String jsonToUse) {
JSONObject jsonObject = new JSONObject(jsonToUse);
jsonObject.put("id", id);
return jsonObject.toMap();
}
This can produce the following document:
{
"key": "value",
"id": "the-id-",
"_rid": "eaaaaaaaaaaaAAAAAAAAAA==",
"_self": "dbs/eaaaAA==/colls/eaaaaaaaaaM=/docs/eaaaaaaaaaaaaaAAAAAAAA==/",
"_etag": "\"3b0063ea-0000-0700-0000-5f804b3d0000\"",
"_attachments": "attachments/",
"_ts": 1602243389
}
One remark: it is important to set the id key in the HashMap. Otherwise one will get the error
"The input content is invalid because the required properties - 'id; ' - are missing"
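The map above places the JSON keys at the top level of the document. If the nested "payload" object from the question is wanted instead, one option is to put the parsed JSON under a "payload" key. The sketch below uses only java.util and assumes the payload has already been parsed into a Map (for example with JSONObject.toMap() as above); CosmosDocumentSketch and buildDocument are hypothetical names, and it is an assumption that the SDK serializes nested maps the same way it serializes the top-level one.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class CosmosDocumentSketch {
    // Builds a document whose "payload" is a nested map rather than a string.
    // parsedPayload is assumed to come from a JSON parser.
    static Map<String, Object> buildDocument(String id, Map<String, Object> parsedPayload) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", id);                 // Cosmos DB requires the "id" property
        doc.put("payload", parsedPayload); // nested object, not an escaped string
        return doc;
    }
}
```

The resulting map could then be passed to createItem in place of stringObjectMap.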

How to deserialize avro data using Apache Beam (KafkaIO)

I've only seen one thread containing information about this topic, which is:
How to Deserialising Kafka AVRO messages using Apache Beam
However, after trying a few variations of Kafka deserializers, I still cannot deserialize the Kafka messages. Here's my code:
public class Readkafka {
private static final Logger LOG = LoggerFactory.getLogger(Readkafka.class);
public static void main(String[] args) throws IOException {
// Create the Pipeline object with the options we defined above.
Pipeline p = Pipeline.create(
PipelineOptionsFactory.fromArgs(args).withValidation().create());
PTransform<PBegin, PCollection<KV<action_states_pkey, String>>> kafka =
KafkaIO.<action_states_pkey, String>read()
.withBootstrapServers("mybootstrapserver")
.withTopic("action_States")
.withKeyDeserializer(MyClassKafkaAvroDeserializer.class)
.withValueDeserializer(StringDeserializer.class)
.updateConsumerProperties(ImmutableMap.of("schema.registry.url", (Object)"schemaregistryurl"))
.withMaxNumRecords(5)
.withoutMetadata();
p.apply(kafka)
.apply(Keys.<action_states_pkey>create());
}
}
where MyClassKafkaAvroDeserializer is
public class MyClassKafkaAvroDeserializer extends
AbstractKafkaAvroDeserializer implements Deserializer<action_states_pkey> {
@Override
public void configure(Map<String, ?> configs, boolean isKey) {
configure(new KafkaAvroDeserializerConfig(configs));
}
@Override
public action_states_pkey deserialize(String s, byte[] bytes) {
return (action_states_pkey) this.deserialize(bytes);
}
@Override
public void close() {} }
and the class action_states_pkey is code generated from avro tools using
java -jar pathtoavrotools/avro-tools-1.8.1.jar compile schema pathtoschema/action_states_pkey.avsc destination path
where the action_states_pkey.avsc is literally
{"type":"record","name":"action_states_pkey","namespace":"namespace","fields":[{"name":"ad_id","type":["null","int"]},{"name":"action_id","type":["null","int"]},{"name":"state_id","type":["null","int"]}]}
With this code I'm getting the error :
Caused by: java.lang.ClassCastException: org.apache.avro.generic.GenericData$Record cannot be cast to my.mudah.beam.test.action_states_pkey
at my.mudah.beam.test.MyClassKafkaAvroDeserializer.deserialize(MyClassKafkaAvroDeserializer.java:20)
at my.mudah.beam.test.MyClassKafkaAvroDeserializer.deserialize(MyClassKafkaAvroDeserializer.java:1)
at org.apache.beam.sdk.io.kafka.KafkaUnboundedReader.advance(KafkaUnboundedReader.java:221)
at org.apache.beam.sdk.io.BoundedReadFromUnboundedSource$UnboundedToBoundedSourceAdapter$Reader.advanceWithBackoff(BoundedReadFromUnboundedSource.java:279)
at org.apache.beam.sdk.io.BoundedReadFromUnboundedSource$UnboundedToBoundedSourceAdapter$Reader.start(BoundedReadFromUnboundedSource.java:256)
at com.google.cloud.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:592)
... 14 more
It seems there's an error in trying to map the Avro Data to my custom class ?
Alternatively, I've tried the following code :
PTransform<PBegin, PCollection<KV<action_states_pkey, String>>> kafka =
KafkaIO.<action_states_pkey, String>read()
.withBootstrapServers("bootstrapserver")
.withTopic("action_states")
.withKeyDeserializerAndCoder((Class)KafkaAvroDeserializer.class, AvroCoder.of(action_states_pkey.class))
.withValueDeserializer(StringDeserializer.class)
.updateConsumerProperties(ImmutableMap.of("schema.registry.url", (Object)"schemaregistry"))
.withMaxNumRecords(5)
.withoutMetadata();
p.apply(kafka)
.apply(Keys.<action_states_pkey>create())
// .apply("ExtractWords", ParDo.of(new DoFn<action_states_pkey, String>() {
// @ProcessElement
// public void processElement(ProcessContext c) {
// action_states_pkey key = c.element();
// c.output(key.getAdId().toString());
// }
// }));
which does not give me any error until I try to print out the data. I have to verify that I'm successfully reading the data one way or another, so my intent here is to log the data in the console. If I uncomment the commented section, I get the same error once again:
SEVERE: 2019-09-13T07:53:56.168Z: java.lang.ClassCastException: org.apache.avro.generic.GenericData$Record cannot be cast to my.mudah.beam.test.action_states_pkey
at my.mudah.beam.test.Readkafka$1.processElement(Readkafka.java:151)
Another thing to note is that specifying
.updateConsumerProperties(ImmutableMap.of("specific.avro.reader", (Object)"true"))
always gives me this error:
Caused by: org.apache.kafka.common.errors.SerializationException: Error deserializing Avro message for id 443
Caused by: org.apache.kafka.common.errors.SerializationException: Could not find class NAMESPACE.action_states_pkey specified in writer's schema whilst finding reader's schema for a SpecificRecord.
It seems there's something wrong with my approach?
If anyone has any experience reading AVRO data from Kafka Streams using Apache Beam, please do help me out. I greatly appreciate it.
Here's a snapshot of my package with the schema and class in it as well:
package/working path details
Thanks.
public class MyClassKafkaAvroDeserializer extends
AbstractKafkaAvroDeserializer
Your class is extending the AbstractKafkaAvroDeserializer which returns GenericRecord.
You need to convert the GenericRecord to your custom object.
OR
Use SpecificRecord for this as stated in one of the following answers:
/**
* Extends deserializer to support ReflectData.
*
* @param <V>
* value type
*/
public abstract class ReflectKafkaAvroDeserializer<V> extends KafkaAvroDeserializer {
private Schema readerSchema;
private DecoderFactory decoderFactory = DecoderFactory.get();
protected ReflectKafkaAvroDeserializer(Class<V> type) {
readerSchema = ReflectData.get().getSchema(type);
}
@Override
protected Object deserialize(
boolean includeSchemaAndVersion,
String topic,
Boolean isKey,
byte[] payload,
Schema readerSchemaIgnored) throws SerializationException {
if (payload == null) {
return null;
}
int schemaId = -1;
try {
ByteBuffer buffer = ByteBuffer.wrap(payload);
if (buffer.get() != MAGIC_BYTE) {
throw new SerializationException("Unknown magic byte!");
}
schemaId = buffer.getInt();
Schema writerSchema = schemaRegistry.getByID(schemaId);
int start = buffer.position() + buffer.arrayOffset();
int length = buffer.limit() - 1 - idSize;
DatumReader<Object> reader = new ReflectDatumReader(writerSchema, readerSchema);
BinaryDecoder decoder = decoderFactory.binaryDecoder(buffer.array(), start, length, null);
return reader.read(null, decoder);
} catch (IOException e) {
throw new SerializationException("Error deserializing Avro message for id " + schemaId, e);
} catch (RestClientException e) {
throw new SerializationException("Error retrieving Avro schema for id " + schemaId, e);
}
}
}
The above is copied from https://stackoverflow.com/a/39617120/2534090
https://stackoverflow.com/a/42514352/2534090
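The deserializer above first checks a magic byte and then reads a 4-byte schema id before handing the rest of the payload to the Avro decoder. That framing can be sketched with java.nio alone, which may help when checking by hand whether a message on the topic was written by a Confluent serializer. ConfluentWireFormatSketch and its method names are hypothetical; the layout (one zero byte, a big-endian int schema id, then the Avro binary body) is taken from the code above.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

final class ConfluentWireFormatSketch {
    static final byte MAGIC_BYTE = 0x0;
    static final int HEADER_SIZE = 5; // 1 magic byte + 4-byte schema id

    // Validates the header and returns the schema id stored in it.
    static int schemaId(byte[] payload) {
        if (payload == null || payload.length < HEADER_SIZE) {
            throw new IllegalArgumentException("Payload too short for header");
        }
        ByteBuffer buffer = ByteBuffer.wrap(payload); // big-endian by default
        if (buffer.get() != MAGIC_BYTE) {
            throw new IllegalArgumentException("Unknown magic byte!");
        }
        return buffer.getInt();
    }

    // Returns the bytes after the header, i.e. the Avro-encoded record.
    static byte[] avroBody(byte[] payload) {
        schemaId(payload); // reuse the header validation
        return Arrays.copyOfRange(payload, HEADER_SIZE, payload.length);
    }
}
```

For example, a payload starting with the bytes 00 00 00 01 BB carries schema id 443, the id that appears in the error message in the question.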

Kafka consumer unit test with Avro Schema registry failing

I'm writing a consumer which listens to a Kafka topic and consumes a message whenever one is available. I've tested the logic/code by running Kafka locally and it's working fine.
While writing the unit/component test cases, the test fails with an Avro schema registry URL error. I've tried different options available on the internet but could not find anything that works. I am not sure if my approach is even correct. Please help.
Listener Class
@KafkaListener(topics = "positionmgmt.v1", containerFactory = "genericKafkaListenerFactory")
public void receive(ConsumerRecord<String, GenericRecord> consumerRecord) {
try {
GenericRecord generic = consumerRecord.value();
Object obj = generic.get("metadata");
ObjectMapper mapper = new ObjectMapper();
Header headerMetaData = mapper.readValue(obj.toString(), Header.class);
System.out.println("Received payload : " + consumerRecord.value());
//Call backend with details in GenericRecord
} catch (Exception e) {
System.out.println("Exception while reading message from Kafka " + e );
}
}
Kafka config
@Bean
public ConcurrentKafkaListenerContainerFactory<String, GenericRecord> genericKafkaListenerFactory() {
ConcurrentKafkaListenerContainerFactory<String, GenericRecord> factory = new ConcurrentKafkaListenerContainerFactory<>();
factory.setConsumerFactory(genericConsumerFactory());
return factory;
}
public ConsumerFactory<String, GenericRecord> genericConsumerFactory() {
Map<String, Object> config = new HashMap<>();
config.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "127.0.0.1:9092");
config.put(ConsumerConfig.GROUP_ID_CONFIG, "group_id");
config.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
config.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, KafkaAvroDeserializer.class);
config.put(KafkaAvroDeserializerConfig.SCHEMA_REGISTRY_URL_CONFIG,"http://localhost:8081");
return new DefaultKafkaConsumerFactory<>(config);
}
Avro Schema
{
"type":"record",
"name":"KafkaEvent",
"namespace":"com.ms.model.avro",
"fields":[
{
"name":"metadata",
"type":{
"name":"metadata",
"type":"record",
"fields":[
{
"name":"correlationid",
"type":"string",
"doc":"this is corrleation id for transaction"
},
{
"name":"subject",
"type":"string",
"doc":"this is subject for transaction"
},
{
"name":"version",
"type":"string",
"doc":"this is version for transaction"
}
]
}
},
{
"name":"name",
"type":"string"
},
{
"name":"dept",
"type":"string"
},
{
"name":"empnumber",
"type":"string"
}
]
}
This is my test code which I tried...
@ComponentTest
@RunWith(SpringRunner.class)
@EmbeddedKafka(partitions = 1, topics = { "positionmgmt.v1" })
@SpringBootTest(classes={Application.class})
@DirtiesContext
public class ConsumeKafkaMessageTest {
private static final String TEST_TOPIC = "positionmgmt.v1";
@Autowired(required=true)
EmbeddedKafkaBroker embeddedKafkaBroker;
private Schema schema;
private SchemaRegistryClient schemaRegistry;
private KafkaAvroSerializer avroSerializer;
private KafkaAvroDeserializer avroDeserializer;
private MockSchemaRegistryClient mockSchemaRegistryClient = new MockSchemaRegistryClient();
private String registryUrl = "unused";
private String avroSchema = "..."; // string representation of the Avro schema (omitted)
@BeforeEach
public void setUp() throws Exception {
Schema.Parser parser = new Schema.Parser();
schema = parser.parse(avroSchema);
mockSchemaRegistryClient.register("Vendors-value", schema);
}
@Test
public void consumeKafkaMessage_receive_sucess() {
Schema metadataSchema = schema.getField("metadata").schema();
GenericRecord metadata = new GenericData.Record(metadataSchema);
metadata.put("version", "1.0");
metadata.put("correlationid", "correlationid");
metadata.put("subject", "metadata");
GenericRecord record = new GenericData.Record(schema);
record.put("metadata", metadata);
record.put("name", "ABC");
record.put("dept", "XYZ");
Consumer<String, GenericRecord> consumer = configureConsumer();
Producer<String, GenericRecord> producer = configureProducer();
ProducerRecord<String, GenericRecord> prodRecord = new ProducerRecord<String, GenericRecord>(TEST_TOPIC, record);
producer.send(prodRecord);
ConsumerRecord<String, GenericRecord> singleRecord = KafkaTestUtils.getSingleRecord(consumer, TEST_TOPIC);
assertNotNull(singleRecord.value());
consumer.close();
producer.close();
}
private Consumer<String, GenericRecord> configureConsumer() {
Map<String, Object> consumerProps = KafkaTestUtils.consumerProps("groupid", "true", embeddedKafkaBroker);
consumerProps.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
Consumer<String, GenericRecord> consumer = new DefaultKafkaConsumerFactory<String, GenericRecord>(consumerProps).createConsumer();
consumer.subscribe(Collections.singleton(TEST_TOPIC));
return consumer;
}
private Producer<String, GenericRecord> configureProducer() {
Map<String, Object> producerProps = new HashMap<>(KafkaTestUtils.producerProps(embeddedKafkaBroker));
producerProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
producerProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, KafkaAvroSerializer.class.getName());
producerProps.put(KafkaAvroSerializerConfig.SCHEMA_REGISTRY_URL_CONFIG, mockSchemaRegistryClient);
producerProps.put(KafkaAvroSerializerConfig.AUTO_REGISTER_SCHEMAS, "false");
return new DefaultKafkaProducerFactory<String, GenericRecord>(producerProps).createProducer();
}
}
Error
component.com.ms.listener.ConsumeKafkaMessageTest > consumeKafkaMessage_receive_sucess() FAILED
org.apache.kafka.common.KafkaException: Failed to construct kafka producer
at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:457)
at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:289)
at org.springframework.kafka.core.DefaultKafkaProducerFactory.createKafkaProducer(DefaultKafkaProducerFactory.java:318)
at org.springframework.kafka.core.DefaultKafkaProducerFactory.createProducer(DefaultKafkaProducerFactory.java:305)
at component.com.ms.listener.ConsumeKafkaMessageTest.configureProducer(ConsumeKafkaMessageTest.java:125)
at component.com.ms.listener.ConsumeKafkaMessageTest.consumeKafkaMessage_receive_sucess(ConsumeKafkaMessageTest.java:97)
Caused by:
io.confluent.common.config.ConfigException: Invalid value io.confluent.kafka.schemaregistry.client.MockSchemaRegistryClient@20751870 for configuration schema.registry.url: Expected a comma separated list.
at io.confluent.common.config.ConfigDef.parseType(ConfigDef.java:345)
at io.confluent.common.config.ConfigDef.parse(ConfigDef.java:249)
at io.confluent.common.config.AbstractConfig.<init>(AbstractConfig.java:78)
at io.confluent.kafka.serializers.AbstractKafkaAvroSerDeConfig.<init>(AbstractKafkaAvroSerDeConfig.java:105)
at io.confluent.kafka.serializers.KafkaAvroSerializerConfig.<init>(KafkaAvroSerializerConfig.java:32)
at io.confluent.kafka.serializers.KafkaAvroSerializer.configure(KafkaAvroSerializer.java:48)
at org.apache.kafka.common.serialization.ExtendedSerializer$Wrapper.configure(ExtendedSerializer.java:60)
at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:372)
... 5 more
I investigated it a bit and found out that the problem is in the CachedSchemaRegistryClient that is used by the KafkaAvroSerializer/Deserializer. It is used to fetch the schema definitions from the Confluent Schema Registry.
You already have your schema definition locally, so you don't need to go to Schema Registry for it (at least in your tests).
I had a similar problem and I solved it by creating a custom KafkaAvroSerializer/KafkaAvroDeserializer.
This is a sample of the KafkaAvroSerializer. It is rather simple: you just need to extend the provided KafkaAvroSerializer and tell it to use MockSchemaRegistryClient.
public class CustomKafkaAvroSerializer extends KafkaAvroSerializer {
public CustomKafkaAvroSerializer() {
super();
super.schemaRegistry = new MockSchemaRegistryClient();
}
public CustomKafkaAvroSerializer(SchemaRegistryClient client) {
super(new MockSchemaRegistryClient());
}
public CustomKafkaAvroSerializer(SchemaRegistryClient client, Map<String, ?> props) {
super(new MockSchemaRegistryClient(), props);
}
}
This is a sample of the KafkaAvroDeserializer. When the deserialize method is called, you need to tell it which schema to use.
public class CustomKafkaAvroDeserializer extends KafkaAvroDeserializer {
@Override
public Object deserialize(String topic, byte[] bytes) {
this.schemaRegistry = getMockClient(KafkaEvent.SCHEMA$);
return super.deserialize(topic, bytes);
}
private static SchemaRegistryClient getMockClient(final Schema schema$) {
return new MockSchemaRegistryClient() {
@Override
public synchronized Schema getById(int id) {
return schema$;
}
};
}
}
The last step is to tell Spring to use the created serializer/deserializer:
spring.kafka.producer.properties.schema.registry.url= not-used
spring.kafka.producer.value-serializer = CustomKafkaAvroSerializer
spring.kafka.producer.key-serializer = org.apache.kafka.common.serialization.StringSerializer
spring.kafka.producer.group-id = showcase-producer-id
spring.kafka.consumer.properties.schema.registry.url= not-used
spring.kafka.consumer.value-deserializer = CustomKafkaAvroDeserializer
spring.kafka.consumer.key-deserializer = org.apache.kafka.common.serialization.StringDeserializer
spring.kafka.consumer.group-id = showcase-consumer-id
spring.kafka.auto.offset.reset = earliest
spring.kafka.producer.auto.register.schemas= true
spring.kafka.properties.specific.avro.reader= true
I wrote a short blog post about that:
https://medium.com/@igorvlahek1/no-need-for-schema-registry-in-your-spring-kafka-tests-a5b81468a0e1?source=friends_link&sk=e55f73b86504e9f577e259181c8d0e23
Link to the working sample project: https://github.com/ivlahek/kafka-avro-without-registry
The answer from @ivlahek works, but if you are looking at this example 3 years later you might want to make a slight modification to CustomKafkaAvroDeserializer:
private static SchemaRegistryClient getMockClient(final Schema schema) {
return new MockSchemaRegistryClient() {
@Override
public ParsedSchema getSchemaBySubjectAndId(String subject, int id)
throws IOException, RestClientException {
return new AvroSchema(schema);
}
};
}
As the error says, you need to provide a string to the registry in the producer config, not an object.
Since you're using the Mock class, that string could be anything...
However, you'll need to construct the serializers given the registry instance
Serializer serializer = new KafkaAvroSerializer(mockSchemaRegistry);
// make config map with ("schema.registry.url", "unused")
serializer.configure(config, false);
Otherwise, it will try to create a non-mocked client
And put that into the properties
producerProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, serializer);
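To illustrate the point that schema.registry.url must be a String and not a client object, here is a minimal sketch that builds a serializer config map with only java.util. The mock:// prefix is an assumption: newer Confluent serializer versions describe it as a way to get an in-memory mock registry for a named scope, so verify it against the version on your classpath. MockRegistryConfigSketch and the scope name are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class MockRegistryConfigSketch {
    // Builds a config map in which the registry URL is a plain String.
    // "mock://<scope>" is assumed to be understood by the serializer version in use;
    // with older versions any placeholder string (e.g. "unused") plus an explicitly
    // constructed serializer, as shown above, would be needed instead.
    static Map<String, Object> serializerConfig(String scope) {
        Map<String, Object> config = new HashMap<>();
        config.put("schema.registry.url", "mock://" + scope);
        config.put("auto.register.schemas", true);
        return config;
    }
}
```

A map like this could be passed to serializer.configure(config, false) in the snippet above.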
If your @KafkaListener is in the test class, then you can read the value with StringDeserializer and convert it to the desired class manually:
@Autowired
private MyKafkaAvroDeserializer myKafkaAvroDeserializer;
@KafkaListener( topics = "test")
public void inputData(ConsumerRecord<?, ?> consumerRecord) {
log.info("received payload='{}'", consumerRecord.toString(),consumerRecord.value());
GenericRecord genericRecord = (GenericRecord)myKafkaAvroDeserializer.deserialize("test",consumerRecord.value().toString().getBytes(StandardCharsets.UTF_8));
Myclass myclass = (Myclass) SpecificData.get().deepCopy(Myclass.SCHEMA$, genericRecord);
}
@Component
public class MyKafkaAvroDeserializer extends KafkaAvroDeserializer {
@Override
public Object deserialize(String topic, byte[] bytes) {
this.schemaRegistry = getMockClient(Myclass.SCHEMA$);
return super.deserialize(topic, bytes);
}
private static SchemaRegistryClient getMockClient(final Schema schema$) {
return new MockSchemaRegistryClient() {
@Override
public synchronized org.apache.avro.Schema getById(int id) {
return schema$;
}
};
}
}
Remember to add schema registry and key/value serializer in application.yml although it won't be used
consumer:
key-deserializer: org.apache.kafka.common.serialization.StringDeserializer
value-deserializer: org.apache.kafka.common.serialization.StringDeserializer
properties:
schema.registry.url: http://localhost:8080

GSON identifying JSON Object as Primitive

I am writing a relatively simple messaging app that saves its logs in the JSON format, and I am using the GSON library to parse these. I load a JSON file from a server and put it through the Gson.toJsonTree() function. I'm not sure this is expected, but when I test the result of that function with the isJsonSomething() functions (isJsonObject, isJsonArray, isJsonNull, isJsonPrimitive), isJsonPrimitive returns true, and I can't parse it into an object. This is my JSON file's contents:
{
"users": [
{
"picture": "",
"type": "user",
"name": "kroltan"
}
],
"description": "No description",
"messages": [
{
"content": "something",
"time": "2013-08-30 00:38:17.212000",
"type": "message",
"author": "someone"
}
],
"type": "channel",
"name": "default"
}
And here is the class used to parse it into POJOs: (CLEANUP comments is where I've removed irrelevant code from the post)
package com.example.testapp;
//CLEANUP: All needed imports
import com.example.testapp.data.*;
import com.google.gson.*;
public class JSONConverter {
public interface JsonTypeLoadedListener {
public void onSucess(JSONType jsonType);
public void onFailure(Exception e);
}
public static final String DATE_FORMAT = "dd-MM-yyyy HH:mm:ss.SSS";
public static final HashMap<String, Class<?>> JSON_TYPES = new HashMap<String, Class<?>>();
public JSONConverter() {
JSON_TYPES.clear();
JSON_TYPES.put("channel", Channel.class);
JSON_TYPES.put("user", User.class);
JSON_TYPES.put("message", Message.class);
}
public void loadFromURL(final URL url, final JsonTypeLoadedListener listener) {
new Thread(new Runnable() {
@Override
public void run() {
JsonObject result = null;
Gson gson = new GsonBuilder().setDateFormat(DATE_FORMAT).create();
if (url.getProtocol().equals("http")) {
try {
String content = //Loads from a server, omitted for clarity
result = gson.toJsonTree(content).getAsJsonObject();
conn.disconnect();
} catch (Exception e) {
e.printStackTrace();
listener.onFailure(e);
return;
}
} else if (url.getProtocol().equals("file")) {
try {
String content = //Loads from a file, omitted for clarity
result = gson.toJsonTree(content).getAsJsonObject();
br.close();
} catch (Exception e) {
e.printStackTrace();
listener.onFailure(e);
return;
}
}
listener.onSucess((JSONType) gson.fromJson(result, JSON_TYPES.get(result.get("type").getAsString())));
}
}, "URLLoader").start();
}
public JSONType loadFromString(String s) {
Gson gson = new Gson();
JsonObject result = gson.toJsonTree(s).getAsJsonObject();
return (JSONType) gson.fromJson(result, JSON_TYPES.get(result.get("type").getAsString()));
}
}
The classes Message, User and Channel all inherit from JSONType (a custom class with a field called type and some utility methods) and contain all values present in the above mentioned JSON file.
When it reaches gson.toJsonTree(content).getAsJsonObject(), I get this error in Logcat (string omitted for clarity, it's just the full file):
java.lang.IllegalStateException: Not a JSON Object: "String containing all the file with tabs represented as \t"
I'm guessing that the tabs are causing your issue. Try to remove them with:
content = content.replaceAll("\\s","")
This will simply strip all whitespace from your JSON string.
Btw, I suggest you get rid of the Gson library and directly use the JSONObject provided in the Android SDK. You can initialize it directly with the JSON string, as new JSONObject(content). :)
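One caveat with the replaceAll suggestion: the pattern \s matches whitespace everywhere in the string, including inside quoted values, so a value such as "No description" from the file above would be altered. The small stdlib-only sketch below shows this; WhitespaceStripSketch is a hypothetical name. Separately, Gson.toJsonTree(Object) serializes a Java object, so a String argument becomes a JSON string primitive, which matches the "Not a JSON Object" error; parsing JSON text is what JsonParser or gson.fromJson(String, Class) is for.

```java
final class WhitespaceStripSketch {
    // Same call as in the suggestion above: removes every whitespace character,
    // whether it is formatting between tokens or part of a string value.
    static String stripAllWhitespace(String json) {
        return json.replaceAll("\\s", "");
    }
}
```

With the input {"description": "No description"}, the result is {"description":"Nodescription"}, so the space inside the value is lost along with the formatting.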
