How to insert null values for an Avro map - java

I have a use case where I need to allow null values in an Avro map, but it seems that Avro doesn't allow unions for map values. Basically, I need to model a POJO field defined as Map<String, Optional<String>>.
How can I achieve this?
The following Avro schema fails with a "No type" error:
Error:
org.apache.avro:avro-maven-plugin:1.10.0: schema failed:
No type: {"type":["null","string"]}
{
"namespace": "com.testclass.avro",
"name": "test",
"type": "record",
"fields": [
{
"name": "user",
"type": {
"name": "userdetails",
"type": "record",
"fields": [
{
"name": "isPresent",
"type": "boolean"
},
{
"name": "address",
"type": {
"type": "map",
"name": "address",
"values": {
"type": ["null","string"]
}
}
}
]
}
}
]
}

Wrapping each non-null value in its union branch name (here "string") within the JSON data solved the problem:
"address":{"test":{"string":"a"}, "test2":{"string":"a"}}
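To make the wrapping concrete, here is a minimal sketch that turns a Map<String, Optional<String>> into JSON of that shape. It assumes the map is declared with "values": ["null", "string"] in the schema, and it builds the text by hand with only the JDK; the class and method names (AvroUnionJson, encodeMap, encodeValue, escape) are made up for illustration and are not part of the Avro API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.StringJoiner;

public class AvroUnionJson {
    // Encodes one value of a ["null","string"] union:
    // an empty Optional becomes a bare null, a string is wrapped as {"string":"..."}.
    public static String encodeValue(Optional<String> value) {
        return value.map(s -> "{\"string\":\"" + escape(s) + "\"}").orElse("null");
    }

    // Minimal escaping of backslashes and double quotes only
    // (assumption: the input contains no control characters).
    public static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Encodes the whole map as a JSON object, keeping the map's iteration order.
    public static String encodeMap(Map<String, Optional<String>> map) {
        StringJoiner joiner = new StringJoiner(",", "{", "}");
        map.forEach((k, v) -> joiner.add("\"" + escape(k) + "\":" + encodeValue(v)));
        return joiner.toString();
    }

    public static void main(String[] args) {
        Map<String, Optional<String>> address = new LinkedHashMap<>();
        address.put("test", Optional.of("a"));
        address.put("test2", Optional.empty());
        // prints {"test":{"string":"a"},"test2":null}
        System.out.println(encodeMap(address));
    }
}
```

In a real project a JSON library would normally handle the escaping; the point here is only the shape of the null and non-null entries.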

Related

Get available attributes (possibly recursive) from JSON Schema in Java

Let's say I've got the following JSON Schema:
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$id": "https://example.com/product.schema.json",
"title": "Draft JSON Schema",
"type": "object",
"properties": {
"person": {
"type": "object",
"properties": {
"details": {
"type": "object",
"properties": {
"first_name": {
"type": "string"
},
"last_name": {
"type": "string"
},
"groups": {
"type": "array",
"items": { "$ref": "#/$defs/existing_groups"
}
}
}
}
},
"$defs": {
"existing_groups": [ "Teachers", "Students" ]
},
"book": {
"type": "object",
"properties": {
"title": {
"type": "string"
},
"author": {
"type": "string"
}
}
}
}
}
From this schema, I would like to retrieve the available attributes and values at a defined depth:
For example, if person.details is given, I want first_name, last_name, groups to be returned.
If person.details.groups is given, the possible values Teachers, Students should be returned.
If book.title is given, an empty Array or Set should be returned.
Apparently you can get attribute values from JSON objects with JsonPath, but I would rather get the possible attributes (and their possible values, if any are given) from a com.networknt.schema.JsonSchema.
What is the easiest way to do this in Java?
JSON Schema is for validating data. It has nothing to do with data manipulation or extraction. It's not comparable to JSONPath in any way.
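That said, a schema document is itself JSON, so it can be walked like any other data. The following is only a rough sketch under several assumptions: the schema has already been parsed into nested java.util.Map and List objects (the parsing step is not shown), only the "properties" and "enum" keywords are looked at, and "$ref" is not resolved. The class SchemaWalker and its describe method are hypothetical names, not part of any schema library.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SchemaWalker {
    // Follows a dotted path such as "person.details" through nested "properties"
    // and returns the property names found at that node, or the node's "enum"
    // values if it has no properties. Unknown paths and plain leaves give an empty set.
    @SuppressWarnings("unchecked")
    public static Set<String> describe(Map<String, Object> schema, String path) {
        Map<String, Object> node = schema;
        for (String part : path.split("\\.")) {
            Object props = node.get("properties");
            if (!(props instanceof Map)) {
                return new LinkedHashSet<>();
            }
            Object child = ((Map<String, Object>) props).get(part);
            if (!(child instanceof Map)) {
                return new LinkedHashSet<>();
            }
            node = (Map<String, Object>) child;
        }
        Set<String> result = new LinkedHashSet<>();
        Object props = node.get("properties");
        Object values = node.get("enum");
        if (props instanceof Map) {
            result.addAll(((Map<String, Object>) props).keySet());
        } else if (values instanceof List) {
            for (Object v : (List<Object>) values) {
                result.add(String.valueOf(v));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hand-built stand-in for a parsed schema (illustrative data only).
        Map<String, Object> title = Map.<String, Object>of("type", "string");
        Map<String, Object> book = Map.<String, Object>of(
                "properties", Map.<String, Object>of("title", title));
        Map<String, Object> group = Map.<String, Object>of(
                "enum", List.of("Teachers", "Students"));
        Map<String, Object> schema = Map.<String, Object>of(
                "properties", Map.<String, Object>of("book", book, "group", group));

        System.out.println(describe(schema, "book"));
        System.out.println(describe(schema, "book.title"));
        System.out.println(describe(schema, "group"));
    }
}
```

Handling the groups case from the question would additionally require resolving the "$ref" inside "items", which this sketch deliberately leaves out.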

Empty object validation in json schema

I'm trying to validate a JSON document using a JSON schema.
In the JSON below, "industry" is of type "object" and is not required.
However, I need to find out whether "industry" is provided in the JSON or not.
Here is my JSON schema:
{
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"additionalProperties": false,
"properties": {
"id": {
"enum": ["Russia", "Canada"]
},
"name": {
"type": "string"
},
"industry": {
"$ref": "#/definitions/industry"
}
},
"required": [
"id",
"name"
],
"definitions": {
"industry": {
"type": "object",
"additionalProperties": false,
"properties": {
"type": {
"type": "string"
},
"codes": {
"type": "array",
"items": {
"type": "integer"
}
}
},
"required": [
"codes",
"type"
],
"title": "industry"
}
}
}
Here is my JSON:
{
"id": "Russia",
"price": 10.50
}
I want to know whether the "industry" object is present in the given JSON, because if it is present I need to do something else. Currently, if I send the JSON above and use an "if" clause like the one below, it evaluates to true even though the "industry" object is not present in the JSON. I believe it is treating the "industry" object as {} rather than as null.
"if":{
"properties": {"industry" : { "type": "object" }}
},
Any solution to validate if the "industry" object is present in the json object or not will be helpful. Thank you.
A schema containing "properties" will evaluate to true if the property is not present. What you want to put as the conditional of your "if" is "required":
"if": {"required":["industry"]}, "then": { ... }
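The difference between the two keywords can be shown with a toy model. This is not a JSON Schema validator, only a hand-written imitation of the two checks using plain Java maps; the class PresenceCheck and both method names are invented for this illustration.

```java
import java.util.List;
import java.util.Map;

public class PresenceCheck {
    // Toy model of {"properties": {key: {"type": "object"}}}:
    // an absent key is never inspected, so the check passes vacuously.
    public static boolean propertiesTypeObject(Map<String, Object> instance, String key) {
        if (!instance.containsKey(key)) {
            return true;
        }
        return instance.get(key) instanceof Map;
    }

    // Toy model of {"required": [keys...]}: passes only when every key is present.
    public static boolean required(Map<String, Object> instance, List<String> keys) {
        return instance.keySet().containsAll(keys);
    }

    public static void main(String[] args) {
        // Same shape as the JSON in the question: no "industry" key.
        Map<String, Object> doc = Map.<String, Object>of("id", "Russia", "price", 10.50);
        System.out.println(propertiesTypeObject(doc, "industry")); // true
        System.out.println(required(doc, List.of("industry")));    // false
    }
}
```

This mirrors why the "properties"-based "if" succeeds on a document without "industry", while the "required"-based one does not.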

Merge two avro schemas programmatically

I have two similar schemas where only one nested field changes (it is called onefield in schema1 and anotherfield in schema2).
schema1
{
"type": "record",
"name": "event",
"namespace": "foo",
"fields": [
{
"name": "metadata",
"type": {
"type": "record",
"name": "event",
"namespace": "foo.metadata",
"fields": [
{
"name": "onefield",
"type": [
"null",
"string"
],
"default": null
}
]
},
"default": null
}
]
}
schema2
{
"type": "record",
"name": "event",
"namespace": "foo",
"fields": [
{
"name": "metadata",
"type": {
"type": "record",
"name": "event",
"namespace": "foo.metadata",
"fields": [
{
"name": "anotherfield",
"type": [
"null",
"string"
],
"default": null
}
]
},
"default": null
}
]
}
I am able to programmatically merge both schemas using Avro 1.8.0:
Schema s1 = new Schema.Parser().parse(schema1);
Schema s2 = new Schema.Parser().parse(schema2);
Schema[] schemas = {s1, s2};
Schema mergedSchema = null;
for (Schema schema: schemas) {
mergedSchema = AvroStorageUtils.mergeSchema(mergedSchema, schema);
}
and use it to convert an input json into an avro or json representation:
JsonAvroConverter converter = new JsonAvroConverter();
try {
byte[] example = new String("{}").getBytes("UTF-8");
byte[] avro = converter.convertToAvro(example, mergedSchema);
byte[] json = converter.convertToJson(avro, mergedSchema);
System.out.println(new String(json));
} catch (AvroConversionException e) {
e.printStackTrace();
}
That code shows the expected output: {"metadata":{"onefield":null,"anotherfield":null}}. The issue is that I am not able to see the merged schema. If I do a simple System.out.println(mergedSchema) I get the following exception:
Exception in thread "main" org.apache.avro.SchemaParseException: Can't redefine: merged schema (generated by AvroStorage).merged
at org.apache.avro.Schema$Names.put(Schema.java:1127)
at org.apache.avro.Schema$NamedSchema.writeNameRef(Schema.java:561)
at org.apache.avro.Schema$RecordSchema.toJson(Schema.java:689)
at org.apache.avro.Schema$RecordSchema.fieldsToJson(Schema.java:715)
at org.apache.avro.Schema$RecordSchema.toJson(Schema.java:700)
at org.apache.avro.Schema.toString(Schema.java:323)
at org.apache.avro.Schema.toString(Schema.java:313)
at java.lang.String.valueOf(String.java:2982)
at java.lang.StringBuilder.append(StringBuilder.java:131)
I call it the avro uncertainty principle :). It looks like avro is able to work with the merged schema, but it fails when it tries to serialize the schema to JSON. The merge works with simpler schemas, so it sounds like a bug in avro 1.8.0 to me.
Do you know what could be happening or how to solve it? Any workaround (e.g. alternative Schema serializers) is welcome.
I found the same issue with the Pig util class. There are actually two bugs here:
Avro allows data to be serialized through GenericDatumWriter using an invalid schema.
The piggybank util class generates invalid schemas because it uses the same name/namespace for all the merged fields (instead of keeping the original names).
The merge in Kite's SchemaUtil works properly, even for more complex scenarios: https://github.com/kite-sdk/kite/blob/master/kite-data/kite-data-core/src/main/java/org/kitesdk/data/spi/SchemaUtil.java#L511
Schema mergedSchema = SchemaUtil.merge(s1, s2);
From your example, I am getting the following output
{
"type": "record",
"name": "event",
"namespace": "foo",
"fields": [
{
"name": "metadata",
"type": {
"type": "record",
"name": "event",
"namespace": "foo.metadata",
"fields": [
{
"name": "onefield",
"type": [
"null",
"string"
],
"default": null
},
{
"name": "anotherfield",
"type": [
"null",
"string"
],
"default": null
}
]
},
"default": null
}
]
}
Hopefully this will help others.
Merging schemas is not supported for Avro files yet.
However, if you have a directory (e.g. /demo) containing multiple Avro files with different schemas, you can read it with Spark by providing one master schema file (an .avsc file). Spark will then read all the records from the files, and any column missing from a file will show up as null.
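import java.io.File
import org.apache.avro.Schema
import org.apache.spark.sql.SparkSession

object AvroSchemaEvolution {
  def main(args: Array[String]): Unit = {
    // Master schema covering the columns of all files in the directory
    val schema = new Schema.Parser().parse(new File("C:\\Users\\murtazaz\\Documents\\Avro_Schema_Evolution\\schema\\emp_inserted.avsc"))
    val spark = SparkSession.builder().master("local").getOrCreate()
    // Read every Avro file in the directory using the master schema
    val df = spark.read
      .format("com.databricks.spark.avro").option("avroSchema", schema.toString)
      .load("C:\\Users\\murtazaz\\Documents\\Avro_Schema_Evolution\\demo")
    df.show()
  }
}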

generating POJOs from JSON Schema for non-object types

I am trying to generate POJOs from the JSON Schema of XBMC.
I do this with jsonschema2pojo.
However, nothing gets generated, and it doesn't even report an error.
This is a reduced sample of the JSON schema I am trying to generate from:
{
"description": "JSON-RPC API of XBMC",
"id": "http://xbmc.org/jsonrpc/ServiceDescription.json",
"methods": {
"Addons.ExecuteAddon": {
"description": "Executes the given addon with the given parameters (if possible)",
"params": [
{
"name": "addonid",
"required": true,
"type": "string"
},
{
"default": "",
"name": "params",
"type": [
{
"additionalProperties": {
"default": "",
"type": "string"
},
"type": "object"
},
{
"items": {
"type": "string"
},
"type": "array"
},
{
"description": "URL path (must start with / or ?",
"type": "string"
}
]
},
{
"default": false,
"name": "wait",
"type": "boolean"
}
],
"returns": {
"type": "string"
},
"type": "method"
}
},
"notifications": {
"Application.OnVolumeChanged": {
"description": "The volume of the application has changed.",
"params": [
{
"name": "sender",
"required": true,
"type": "string"
},
{
"name": "data",
"properties": {
"muted": {
"required": true,
"type": "boolean"
},
"volume": {
"maximum": 100,
"minimum": 0,
"required": true,
"type": "integer"
}
},
"required": true,
"type": "object"
}
],
"returns": null,
"type": "notification"
}
},
"types": {
"Addon.Content": {
"default": "unknown",
"enums": [
"unknown",
"video",
"audio",
"image",
"executable"
],
"id": "Addon.Content",
"type": "string"
}
},
"version": "6.14.3"
}
I must admit that my knowledge of JSON is very limited, so maybe this is just a simple mistake of mine. Can anyone help me generate Java objects from such a JSON Schema?
JSON Schema doesn't support methods. It defines the structure of JSON data and is not meant for defining your methods. The most important keyword in JSON Schema is "properties".
Generating POJO data models from a JSON schema works well, but generating business logic does not. You can learn JSON Schema from those examples.

Jackson Parser for recursively parsing unknown input structure

I'm trying to recursively parse a JSON input structure in Java, in the format below, and rewrite the same structure into another JSON document.
While parsing, I also need to validate every JSON key and value.
{"Verbs":[{
"aaaa":"30d", "type":"ed", "rel":1.0, "id":"80", "spoken":"en", "ct":"on", "sps":null
},{
"aaaa":"31", "type":"cc", "rel":3.0, "id":"10", "spoken":"en", "ct":"off", "sps":null
},{
"aaaa":"81", "type":"nn", "rel":3.0, "id":"60", "spoken":"en", "ct":"on", "sps":null
}]}
Please advise how I can use the Jackson parser's JsonToken enums for reading and writing unknown JSON content.
You can use JSON Schema to validate your inputs.
You should check the documentation for the data format, but from what I can read here, the schema would be something like this:
{
"$schema": "http://json-schema.org/schema#",
"type": "object",
"required": [ "Verbs" ],
"properties": {
"Verbs": { "type": "array", "items": { "$ref": "#/definitions/verb" } }
},
"definitions": {
"verb": {
"type": "object",
"required": [ "aaaa", "type", "rel", "id", "spoken", "ct", "sps" ],
"additionalProperties": false,
"properties": {
"aaaa": { "type": "string" },
"type": { "type": "string" },
"rel": { "type": "number" },
"id": { "type": "string", "pattern": "^[0-9]+$" },
"spoken": { "type": "string" },
"ct": { "enum": [ "on", "off" ] },
"sps": { "enum": [ null ] }
}
}
}
}
As you use Jackson, you can use this library which can validate your data for you.
Transforming your JSON after that can be done by creating a new JsonNode, for instance.
