Building a new SourceRecord from an Object - java

I am writing a Kafka connector to download data from several sources on GitHub (text and YAML files) and transform it into objects of a certain class, which is automatically generated from an avsc file:
{
  "type": "record",
  "name": "MatomoRecord",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "type", "type": "string"},
    {"name": "timestamp", "type": "long"}
  ]
}
So far everything has been successful. Now I have a Map of objects that I want to persist in a Kafka topic. For that I'm trying to create SourceRecords:
for (Map.Entry<String, MatomoRecord> record : records.entrySet()) {
    sourceRecords.add(new SourceRecord(
            sourcePartition,
            sourceOffset,
            matomoTopic,
            0,
            org.apache.kafka.connect.data.Schema.STRING_SCHEMA,
            record.getKey(),
            matomoSchema,
            record.getValue())
    );
}
How can I define the value schema of type org.apache.kafka.connect.data.Schema based on the Avro schema? For a test I have manually created a schema using the builder:
Schema matomoSchema = SchemaBuilder.struct()
        .name("MatomoRecord")
        .field("name", Schema.STRING_SCHEMA)
        .field("type", Schema.STRING_SCHEMA)
        .field("timestamp", Schema.INT64_SCHEMA)
        .build();
The result was:
org.apache.kafka.connect.errors.DataException: Invalid type for STRUCT: class MatomoRecord
Could somebody help me define the value schema based on the Avro schema?
Best regards
Martin

You can't pass record.getValue() directly, and there is no public API for converting an Avro schema to a Connect Schema (short of using internal methods of Confluent's AvroConverter).
You need to parse that object into a Struct that matches the schema you've defined (which looks fine, assuming none of your object fields can be null); see the sketch below.
Look at the Javadoc for how to define it: https://kafka.apache.org/22/javadoc/org/apache/kafka/connect/data/Struct.html
Note (not relevant here): nested structs should be built from the bottom up, putting child structs/arrays into their parents.
Your connector should not necessarily depend on Avro other than to include your model objects. The Converter interfaces are responsible for converting your Struct with its Schema into other data formats (JSON, Confluent's Avro encoding, Protobuf, etc.).
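For example, a minimal sketch of that conversion for the loop in the question, assuming the generated MatomoRecord exposes getName(), getType(), and getTimestamp():
for (Map.Entry<String, MatomoRecord> record : records.entrySet()) {
    MatomoRecord value = record.getValue();
    // Populate a Struct that mirrors the Connect schema defined above;
    // toString() covers Avro's CharSequence/Utf8 string representation
    Struct struct = new Struct(matomoSchema)
            .put("name", value.getName().toString())
            .put("type", value.getType().toString())
            .put("timestamp", value.getTimestamp());
    sourceRecords.add(new SourceRecord(
            sourcePartition,
            sourceOffset,
            matomoTopic,
            0,
            Schema.STRING_SCHEMA,
            record.getKey(),
            matomoSchema,
            struct));
}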

A KC Schema is a JSON schema that looks awfully like an Avro schema. Try org.apache.kafka.connect.json.JsonConverter#asConnectSchema - you may need to massage the Avro schema to make it work.
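For example, a rough sketch of that route (note the massaging: in Connect's JSON schema format the entries under "fields" use "field" rather than "name", and Avro's "long" becomes "int64"):
JsonConverter converter = new JsonConverter();
converter.configure(java.util.Collections.emptyMap(), false); // initializes the converter's internal caches
JsonNode schemaJson = new ObjectMapper().readTree(
        "{\"type\": \"struct\", \"name\": \"MatomoRecord\", \"fields\": ["
        + "{\"field\": \"name\", \"type\": \"string\"},"
        + "{\"field\": \"type\", \"type\": \"string\"},"
        + "{\"field\": \"timestamp\", \"type\": \"int64\"}]}");
Schema valueSchema = converter.asConnectSchema(schemaJson);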

Related

Missing quotes on UUID when using Apache Avro

I use the Apache Avro Maven Plugin to generate my Java classes from my .avsc definitions. I have some fields that are declared as UUID strings as follows:
{
  "name": "fieldName",
  "type": {
    "type": "string",
    "logicalType": "uuid"
  }
}
In Avro 1.11.0 these fields were mapped as properties of type String by the generator.
Starting with Apache Avro 1.11.1, the generator maps these fields as properties of type java.util.UUID, which is an improvement imo. However, when calling the toString method of a generated class with one of these fields, the output is not valid JSON, since the UUID field is not surrounded by quotes.
I have looked into the generated toString code, and indeed, UUID fields are not serialized with surrounding quotes.
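For illustration, with a hypothetical generated class MyRecord holding such a field:
MyRecord rec = MyRecord.newBuilder()
        .setFieldName(java.util.UUID.randomUUID())
        .build();
// Avro 1.11.0 (String field): {"fieldName": "4a3f1f4f-..."}
// Avro 1.11.1+ (UUID field):  {"fieldName": 4a3f1f4f-...}   <- unquoted, so not valid JSON
System.out.println(rec);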
Am I missing something or is this a bug?

OpenAPI schema for an array of arrays

I'm going to define an OpenAPI schema for an endpoint implemented in Java, which returns JSON like this:
{"id": 1, "data": [[int, int]]}
Any idea how to configure the @Schema annotation?

type-preserving deserialization of GraphQL response using schema

I am trying to write deserialization code for responses of user-defined GraphQL queries. The code has access to the query response in JSON-serialized form and the underlying GraphQL schema (by querying the endpoint's schema.json or making introspection requests).
Assume the following schema:
scalar Date

type User {
  name: String
  birthday: Date
}

type Query {
  allUsers: [User]
}

schema {
  query: Query
}
And the following query:
query {
  allUsers {
    name
    birthday
  }
}
The response may look like this (only the data.allUsers field of the full response is shown, for brevity):
[
  {"name": "John Doe", "birthday": "1983-12-07"}
]
What I am attempting to do is deserialize the above response in a manner that preserves type information, including for any custom scalars. In the above example, I know by convention that the GraphQL scalar Date should be deserialized as LocalDate in Java, but from the response alone I do not know that the birthday field represents the GraphQL scalar type Date, since it is serialized as a regular string in JSON.
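For instance, that convention could be captured in a simple registry; a sketch (names are illustrative):
// Maps GraphQL scalar names to the Java types they should be deserialized to
Map<String, Class<?>> scalarMapping = Map.of(
        "String", String.class,
        "Int", Integer.class,
        "Date", java.time.LocalDate.class); // custom scalar, mapped by convention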
What I can do is try to utilize the GraphQL schema for this. For the above example, the schema may look something like this (shortened for brevity):
...
"types": [
{
"kind": "OBJECT",
"name": "User",
"fields": [
{
"name": "name",
"type": {
"kind": "SCALAR",
"name": "String"
}
},
{
"name": "birthday"
"type": {
"kind": "SCALAR",
"name": "Date"
}
}
...
From this information I can deduce that the response's birthday field is of type Date and deserialize it accordingly. However, things get more complicated if the query uses non-trivial GraphQL features. Take aliasing, for example:
query {
  allUsers {
    name
    dayOfBirth: birthday
  }
}
At this point I would already need to keep track of any aliasing (which I could do, since that information is available if I parse the query) and backtrack it to find the correct type. I fear it might get even more complicated if e.g. fragments are used.
Given that I use graphql-java, and it appears to already need to handle all of these cases for serialization, I wondered if there was an easier way to do this than to manually backtrack the types from the query and schema.
How about generating Java classes from the schema and then using those classes to deserialize? There is a plugin I have used before for this: graphql-java-generator.
You may need to enhance the plugin a bit to support your custom scalars, though.
It basically generates a Java client for invoking your GraphQL queries in a Java way.
I had the same problem deserializing a LocalDate attribute, even using the graphql-java-extended-scalars library.
While researching, I found that this library works well for queries but not so well for mutations.
I fixed my problem by customizing SchemaParserOptions, like this:
@Bean
public SchemaParserOptions schemaParserOptions() {
    return SchemaParserOptions.newOptions().objectMapperConfigurer((mapper, context) -> {
        // Register Jackson's Java 8 date/time module so LocalDate fields can be mapped
        mapper.registerModule(new JavaTimeModule());
    }).build();
}
In the object I didn't use any serialization or deserialization annotations.

How to avoid class name in @type while serializing object to JSON using Jackson

I am using Jackson for serializing POJOs to JSON, but I am getting JSON like this:
{
  "@type": "com.company.services.alert.dto.JungleEventDTO",
  "company": "xyz",
  "enabled": true,
  "support": false,
  ..
}
I do not want to expose my class name to the client.
How can I do this?
You can use @JsonTypeId, @JsonTypeName, or @JsonTypeInfo for type handling.
Reference: https://github.com/FasterXML/jackson-annotations/wiki/Jackson-Annotations
@JsonTypeId: property annotation used to indicate that the property value should be used as the type id for the object, instead of the class name or an external type name.
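For example, a minimal sketch for the DTO from the question, using a logical type name so the serialized marker no longer exposes the class (standard Jackson annotations; the field list is inferred from the JSON above):
import com.fasterxml.jackson.annotation.JsonTypeInfo;
import com.fasterxml.jackson.annotation.JsonTypeName;

@JsonTypeInfo(use = JsonTypeInfo.Id.NAME, include = JsonTypeInfo.As.PROPERTY, property = "@type")
@JsonTypeName("jungleEvent") // serialized as "@type": "jungleEvent" instead of the class name
public class JungleEventDTO {
    public String company;
    public boolean enabled;
    public boolean support;
}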
Also look at How can I prevent Jackson from serializing a polymorphic type's annotation property?

How to serialize/deserialize dynamic JSON types with Avro

For the past week I've been trying to use Avro to map data from a streaming API.
I'm using ReflectData to create my schema from a POJO representing the JSON response.
I'm then using a ReflectDatumReader to convert JSON to Avro bytes, and similarly for the reverse.
The problem I'm facing is related to the JSON responses I get: the response can change depending on what type of message is sent.
i.e.
{
  "id": 001,
  "text": {
    "type": "comment",
    "event": "event",
    "comment": {
      ...
    }
  }
}
but this can also be
{
  "id": 001,
  "text": {
    "type": "status",
    "event": "event",
    "status": {
      ...
    }
  }
}
So, as you can see, the type field determines what the name of the following JSON object will be.
I could not find a way to represent such a schema. I've used Jackson in the past to represent polymorphic types like this, but I can't figure out a way to do this with Avro's Java API.
I'd really appreciate any help/suggestions on this. :)
Many thanks.
You may have to use what in Avro parlance is known as "schema projection": that is, you define a superset of the different schemas you are parsing, and Avro ignores missing schema fields as necessary. It is described under the section "Schema Resolution" here:
http://avro.apache.org/docs/1.7.7/spec.html
That's the theory, at least. In practice I have often had to drop down into the (Java) API code and deal with nulls etc. explicitly.
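For the messages above, a sketch of what a superset schema for the inner text object might look like (Comment and Status are hypothetical record types standing in for the elided payloads); both branches are declared nullable so that either may be absent:
{
  "type": "record",
  "name": "Text",
  "fields": [
    {"name": "type", "type": "string"},
    {"name": "event", "type": "string"},
    {"name": "comment", "type": ["null", "Comment"], "default": null},
    {"name": "status", "type": ["null", "Status"], "default": null}
  ]
}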
