How to create schema containing list of objects using Avro?

Does anyone know how to create an Avro schema that contains a list of objects of some class?
I want my generated classes to look like this:
class Child {
String name;
}
class Parent {
List<Child> children;
}
For this, I have written part of the schema file, but I do not know how to tell Avro to create a list of objects of type Child.
My schema file looks like this:
{
"name": "Parent",
"type":"record",
"fields":[
{
"name":"children",
"type":{
"name":"Child",
"type":"record",
"fields":[
{"name":"name", "type":"string"}
]
}
}
]
}
The problem is that I can mark the field children as either type Child or as an array, but I do not know how to mark it as an array of objects of type Child.
Can anyone please help?

You need to use the array type to create the list, with the Child record as its items.
The following updated schema handles your use case:
{
"name": "Parent",
"type":"record",
"fields":[
{
"name":"children",
"type":{
"type": "array",
"items":{
"name":"Child",
"type":"record",
"fields":[
{"name":"name", "type":"string"}
]
}
}
}
]
}

I had the following classes, and the Avro Maven plugin generated two classes accordingly:
public class Employees{
String accountNumber;
String address;
List<Account> accountList;
}
public class Account {
String accountNumber;
String id;
}
Avro schema file:
{
"type": "record",
"namespace": "com.mypackage",
"name": "AccountEvent",
"fields": [
{
"name": "accountNumber",
"type": "string"
},
{
"name": "address",
"type": "string"
},
{
"name": "accountList",
"type": {
"type": "array",
"items":{
"name": "Account",
"type": "record",
"fields":[
{ "name": "accountNumber",
"type": "string"
},
{ "name": "id",
"type": "string"
}
]
}
}
}
]
}

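Array as type (note that "default": "null" below is the literal string "null", not a null value; a nullable field would need a union type such as ["null", "string"] with default null):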
{
"type": "record",
"name": "jamesMedice",
"fields": [{
"name": "columns",
"type": {
"type": "array",
"items": {
"type": "record",
"name": "columnValues",
"fields": [{
"name": "personId",
"type": "string",
"default": "null"
},
{
"name": "email",
"type": "string",
"default": "null"
}
]
}
}
}]
}

Related

Save Nested JSON using Spring data jpa

I want to save data in a MySQL DB by creating the entity class and repository from scratch. I am able to save plain String and Integer data, but I am struggling to save complex JSON data,
for instance:
[
{
"id": "0001",
"type": "donut",
"name": "Cake",
"ppu": 0.55,
"batters":
{
"batter":
[
{ "id": "1001", "type": "Regular" },
{ "id": "1002", "type": "Chocolate" },
{ "id": "1003", "type": "Blueberry" },
{ "id": "1004", "type": "Devil's Food" }
]
},
"topping":
[
{ "id": "5001", "type": "None" },
{ "id": "5002", "type": "Glazed" },
{ "id": "5005", "type": "Sugar" },
{ "id": "5007", "type": "Powdered Sugar" },
{ "id": "5006", "type": "Chocolate with Sprinkles" },
{ "id": "5003", "type": "Chocolate" },
{ "id": "5004", "type": "Maple" }
]
},
{
"id": "0002",
"type": "donut",
"name": "Raised",
"ppu": 0.55,
"batters":
{
"batter":
[
{ "id": "1001", "type": "Regular" }
]
},
"topping":
[
{ "id": "5001", "type": "None" },
{ "id": "5002", "type": "Glazed" },
{ "id": "5005", "type": "Sugar" },
{ "id": "5003", "type": "Chocolate" },
{ "id": "5004", "type": "Maple" }
]
},
{
"id": "0003",
"type": "donut",
"name": "Old Fashioned",
"ppu": 0.55,
"batters":
{
"batter":
[
{ "id": "1001", "type": "Regular" },
{ "id": "1002", "type": "Chocolate" }
]
},
"topping":
[
{ "id": "5001", "type": "None" },
{ "id": "5002", "type": "Glazed" },
{ "id": "5003", "type": "Chocolate" },
{ "id": "5004", "type": "Maple" }
]
}
]
How can I store such JSON in a MySQL DB?
Should I create a class for every nested element?

(I would consider switching to a NoSQL DB instead of MySQL, but okay...)
1.
create table users_json(
id int auto_increment primary key,
details json);
2.
public interface SomeRepository extends JpaRepository<AnyEntity, Long> {
@Modifying(clearAutomatically = true)
@Query(value = "insert into users_json (details) values (:param) ", nativeQuery = true)
@Transactional
int insertValue(@Param("param") String param);}
3.
anyRepository.insertValue("{ \"page\": \"1\" , \"name\": \"Zafari\", \"os\": \"Mac\", \"spend\": 100, \"resolution\": { \"x\": 1920, \"y\": 1080 } }");
4.
SELECT id, details->'$.name' FROM users_json;

Storing JSON in MySQL is possible. You can use one of these three column types, depending on the size of the data.
For your entity class:
@Entity
@Getter
@Setter
public class Test {
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Integer id;
@Column(columnDefinition = "LONGTEXT") // can store up to 4GB
private String longText;
@Column(columnDefinition = "MEDIUMTEXT") // can store up to 16MB
private String mediumText;
@Column(columnDefinition = "TEXT") // can store up to 64KB
private String text;
}
For your controller method:
@PostMapping(value = "/addData")
public void addData(@RequestBody String payload) {
testRepository.addData(payload);
}
For your repository interface:
@Repository
public interface TestRepository extends JpaRepository<Test,Integer> {
@Modifying
@Transactional
@Query(value = "INSERT INTO test(text,medium_text,long_text) VALUES(?1,?1,?1)" ,nativeQuery = true)
void addData(String payload);
}

It depends on whether you want to store your JSON as a String or convert it into DTO instances that are mapped to your entities and saved to the DB through a repository. If you want to store the JSON as a String, then it shouldn't be any different from any other String. If you want to store it as entities, then you need to de-serialize your JSON into your DTOs and work with them as regular DTOs; it doesn't matter how they were created. I just answered a very similar question. Please see here
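If you take the DTO route, one class per distinct JSON shape is usually enough, rather than one per nested element. The sketch below is only an illustration for the donut payload in the question: the class names (Donut, Batters, IdType) are made up here, the batter and topping entries share one class because they carry the same two fields, and the JPA or JSON-mapping annotations a real project would need are deliberately left out.

```java
import java.util.List;

// Hypothetical DTOs for the donut JSON in the question; all names are illustrative.

/** Shape shared by every entry in "batter" and "topping": { "id": ..., "type": ... }. */
final class IdType {
    final String id;
    final String type;

    IdType(String id, String type) {
        this.id = id;
        this.type = type;
    }
}

/** Wrapper object for "batters": { "batter": [ ... ] }. */
final class Batters {
    final List<IdType> batter;

    Batters(List<IdType> batter) {
        this.batter = batter;
    }
}

/** One element of the top-level JSON array. */
final class Donut {
    final String id;
    final String type;
    final String name;
    final double ppu;
    final Batters batters;
    final List<IdType> topping;

    Donut(String id, String type, String name, double ppu,
          Batters batters, List<IdType> topping) {
        this.id = id;
        this.type = type;
        this.name = name;
        this.ppu = ppu;
        this.batters = batters;
        this.topping = topping;
    }
}
```

Three classes cover the whole payload. With an object mapper on the classpath (an assumption; the question does not say which one is in use), the request body could be de-serialized into a list of Donut instances and then mapped to entities.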

Build JSON from a very complex JSON Schema in Java

I have a complex issue here and some advice or suggestions would be greatly appreciated. Essentially I have a complex JSON schema that looks something like this:
{
"$schema": "http://example.org",
"$id": "http://example.org",
"title": "schema title",
"description": "description",
"properties": {
"name": {
"description": "description",
"type": "string",
"enum": [
"name1",
"name2"
]
},
"storage": {
"description": "description",
"type": "integer",
"minimum": 200,
"maximum": 500,
"default": 200
},
"domain": {
"description": "description",
"type": "string"
}
},
"if": {
"properties": {
"name": {
"const": "name1"
}
}
},
"then": {
"if": {
"properties": {
"version": {
"const": "version1"
}
}
},
"then": {
"properties": {
"cpus": {
"description": "description",
"type": "integer",
"minimum": 1,
"maximum": 8
},
"memory": {
"description": "description",
"type": "integer",
"minimum": 1,
"maximum": 32
}
},
"required": [
"cpus",
"memory"
]
},
"else": {
"if": {
"properties": {
"version": {
"const": "version2"
}
}
},
"then": {
"properties": {
"cpus": {
"description": "description",
"type": "integer",
"minimum": 1,
"maximum": 8
},
"diskSize": {
"description": "description",
"type": "integer",
"minimum": 250,
"maximum": 1000
}
},
"required": [
"cpus",
"diskSize"
]
}
}
},
"else": {
"if": {
"properties": {
"name": {
"const": "name2"
}
}
},
"then": {
"if": {
"properties": {
"version": {
"const": "version3"
}
}
},
"then": {
"properties": {
"diskSize": {
"description": "description",
"type": "integer",
"minimum": 100,
"maximum": 500
},
"memory": {
"description": "description",
"type": "integer",
"minimum": 1,
"maximum": 28
}
},
"required": [
"diskSize",
"memory"
]
},
"else": {
"if": {
"properties": {
"version": {
"const": "version4"
}
}
},
"then": {
"properties": {
"cpus": {
"description": "description",
"type": "integer",
"minimum": 1,
"maximum": 28
},
"memory": {
"description": "description",
"type": "integer",
"minimum": 1,
"maximum": 64
}
},
"required": [
"cpus",
"memory"
]
}
}
}
}
}
I need to build a JSON object using this schema in Java. Every property in the schema is inside a map that I have access to, so I can simply get the property from the map and add it to a JsonNode object that I am building. Every property under the initial "properties" object is easy to retrieve: I can just get a list of them and then get each one from the map.
The complexity lies in the if/then/else part of the JSON schema. The only way I can see to find which properties I need is to first build the initial part of the JSON from the first "properties" object, and then use a fairly complex recursive algorithm that goes into every if/then/else statement, compares the value of the property being evaluated, and returns a list of the properties I need to get from the map. I have looked online for a Java library that can build JSON from a JSON schema, but haven't found anything that can deal with the complex if/then/else statements.
Any suggestions or ideas would be greatly appreciated.
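The recursive walk described above can be fairly small. What follows is a rough sketch, not a full JSON Schema evaluator, and it rests on several assumptions: the schema has already been parsed into nested Maps, an "if" block only ever contains "properties" with "const" (as in the schema shown), and "required" lists are ignored. The class and method names are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Sketch only: the schema is assumed to be already parsed into nested Maps
// (for example by a JSON library); class and method names are hypothetical.
final class ConditionalSchemaWalker {

    /**
     * Returns the property names that apply to the given values: the schema's
     * own "properties" plus those of whichever "then"/"else" branch is
     * selected, followed recursively.
     */
    @SuppressWarnings("unchecked")
    static Set<String> applicableProperties(Map<String, Object> schema,
                                            Map<String, Object> values) {
        Set<String> names = new LinkedHashSet<>();
        Object props = schema.get("properties");
        if (props instanceof Map) {
            names.addAll(((Map<String, Object>) props).keySet());
        }
        Object condition = schema.get("if");
        if (condition instanceof Map) {
            String branch = matches((Map<String, Object>) condition, values) ? "then" : "else";
            Object next = schema.get(branch);
            if (next instanceof Map) {
                names.addAll(applicableProperties((Map<String, Object>) next, values));
            }
        }
        return names;
    }

    /**
     * Simplified "if" check: only "properties" with "const" is understood.
     * Unlike full JSON Schema validation, a missing value counts as a mismatch.
     */
    @SuppressWarnings("unchecked")
    static boolean matches(Map<String, Object> condition, Map<String, Object> values) {
        Object props = condition.get("properties");
        if (!(props instanceof Map)) {
            return true; // nothing this sketch knows how to evaluate
        }
        for (Map.Entry<String, Object> entry : ((Map<String, Object>) props).entrySet()) {
            if (entry.getValue() instanceof Map) {
                Map<String, Object> constraint = (Map<String, Object>) entry.getValue();
                if (constraint.containsKey("const")
                        && !Objects.equals(constraint.get("const"), values.get(entry.getKey()))) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

Calling applicableProperties with the parsed schema and the values chosen so far yields the names to look up in the property map. Because the conditions depend on values such as name and version, those values have to be known before the walk starts.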

AVRO GenericDatumWriter Fails when writing a ComplexType datum

Here is my scenario:
I have a schema for my class that is validated by Avro:
{
"type": "record",
"name": "MyCLass",
"namespace": "com.somepackage",
"fields": [
{
"name": "attributes",
"type": {
"type": "array",
"items": {
"type": "record",
"name": "KeyValuePair",
"fields": [
{
"name": "key",
"type": "string"
},
{
"name": "value",
"type": "string"
}
]
},
"java-class": "java.util.List"
}
},
{
"name": "someString",
"type": "string"
},
{
"name": "myclass1",
"type": {
"type": "record",
"name": "MyClass1",
"fields": [
{
"name": "attributes",
"type": {
"type": "record",
"name": "ArrayOfKeyValuePair",
"fields": [
{
"name": "item",
"type": {
"type": "array",
"items": "KeyValuePair",
"java-class": "java.util.List"
}
}
]
}
},
{
"name": "labels",
"type": {
"type": "record",
"name": "ArrayOfXsdString",
"fields": [
{
"name": "item",
"type": {
"type": "array",
"items": "string",
"java-class": "java.util.List"
}
}
]
}
},
{
"name": "uniqueID",
"type": "string"
}
]
}
},
{
"name": "MyClass2",
"type": {
"type": "record",
"name": "MyClass2",
"fields": [
{
"name": "attributes",
"type": "ArrayOfKeyValuePair"
},
{
"name": "someString",
"type": "string"
},
{
"name": "someLabel",
"type": "ArrayOfXsdString"
},
{
"name": "someId",
"type": "string"
}
]
}
}
]
}
This schema was generated with:
GenericRecord obj = new GenericData.Record(ReflectData.get().getSchema(MyClass.class));
Next I build the list of 4 parameters that obj expects in its values list:
KeyValuePair kv1 = new KeyValuePair();
kv1.setKey("key1");
kv1.setValue("val1");
KeyValuePair kv2 = new KeyValuePair();
kv2.setKey("key2");
kv2.setValue("val2");
List<KeyValuePair> attr = new ArrayList<KeyValuePair>();
attr.add(kv1);
attr.add(kv2);
ArrayOfKeyValuePair arrKV = new ArrayOfKeyValuePair();
arrKV.getItem().add(kv1);
MyClass1 ud = new MyClass1();
ud.setAttributes(arrKV);
ud.setUniqueID("SomeID");
MyCLass2 sd = new MyCLass2();
sd.setAttributes(arrKV);
sd.setExternalCaseID("SomeID");
sd.setUniqueID("SomeId");
obj.put("attributes", attr);
obj.put("workFlowName", "Nume workflow");
obj.put("userDescriptor", ud);
obj.put("subscriberDescriptor", sd);
And when I try:
ByteArrayOutputStream out = new ByteArrayOutputStream();
DatumWriter<GenericRecord> writerS = new GenericDatumWriter<GenericRecord>(schemaMentionedAbove);
Encoder encoder = EncoderFactory.get().binaryEncoder(out, null);
writerS.write(obj, encoder);
encoder.flush();
out.close();
The code fails at line: writerS.write(obj, encoder); with error:
KeyValuePair cannot be cast to org.apache.avro.generic.IndexedRecord
KeyValuePair class is a simple class with 2 fields: String key, String value.
After debugging I can see that DatumWriter fails on iterating through the first "attributes" array of records.
Any help appreciated! Thanks

Setting a dynamic date format in Elastic Search

I am new to Elastic Search.
I have a User mapping and associated with the User is a Nested Object extraDataValues. In this object is the id, a string value and another nested object. For example:
"extraDataValues": [
{
"id": 1,
"value": "01/01/2016 00:00:00",
"extraDataValueObject": {
"id": 10,
"label": "Metadata Date",
"displayable": true
}
},
{
"id": 2,
"value": "aaaa",
"extraDataValueObject": {
"id": 11,
"label": "Metadata TextBox",
"displayable": true
}
}
],
As you can see, the value field can be a date or a normal string. This is where the problem arises: I want to be able to sort on this value, given that it could be either a date or a normal string. Moreover, the date can be in two formats: "dd/MM/yyyy HH:mm:ss" and "dd/MM/yyyy". How can I achieve this, first with Elastic Search (so I can understand the theory) and then in Java?
I have tried adding "dynamic_date_formats" : ["dd/MM/yyyy HH:mm:ss", "dd/MM/yyyy"]
to no avail.
The mapping for the Users is:
User Mapping Document
{
"User": {
"properties": {
"fullName": {
"type": "string",
"index": "not_analyzed",
"fields": {
"raw_lower_case": {
"type": "string",
"analyzer": "case_insensitive"
}
}
},
"username": {
"type": "string",
"index": "not_analyzed",
"fields": {
"raw_lower_case": {
"type": "string",
"analyzer": "case_insensitive"
}
}
},
"email": {
"type": "string",
"index": "not_analyzed",
"fields": {
"raw_lower_case": {
"type": "string",
"analyzer": "case_insensitive"
}
}
},
"firstName": {
"type": "string",
"index": "not_analyzed",
"fields": {
"raw_lower_case": {
"type": "string",
"analyzer": "case_insensitive"
}
}
},
"surname": {
"type": "string",
"index": "not_analyzed",
"fields": {
"raw_lower_case": {
"type": "string",
"analyzer": "case_insensitive"
}
}
},
"id": {
"type": "long"
},
"extraDataValues": {
"type": "nested",
"dynamic_date_formats" : ["dd/MM/yyyy HH:mm:ss", "dd/MM/yyyy"],
"properties": {
"extraDataValueObject": {
"properties": {
"id": {
"type": "long"
},
"label": {
"type": "string"
},
"displayable": {
"type": "boolean"
}
}
},
"value": {
"type": "string",
"index": "not_analyzed",
"fields": {
"raw_lower_case": {
"type": "string",
"analyzer": "case_insensitive"
}
}
}
}
}
}
}
}

You can't do that the way you are trying to do it. dynamic_date_formats are used only for dynamically added date fields, not for date fields that you specify in your mapping (from the documentation).
What I would suggest trying out is this mapping:
"value": {
"type": "string",
"fields": {
"date1": {
"type": "date",
"format": "dd/MM/yyyy HH:mm:ss",
"ignore_malformed": "true"
},
"date2": {
"type": "date",
"format": "dd/MM/yyyy",
"ignore_malformed": "true"
}
}
}
Here you have a field of type string (for the plain-string case of the value), and for it you define two subfields, each with a different date format. It is essential that both have "ignore_malformed": "true", so that a value that is a plain string rather than a date does not cause the document to be rejected.
In this way you can index this:
POST /my_index/user/1
{
"value": "aaa"
}
POST /my_index/user/2
{
"value": "01/01/2016 00:00:00"
}
POST /my_index/user/3
{
"value": "02/02/2016"
}
And you could differentiate between which type of date or string was indexed like this in a query:
"query": {
"filtered": {
"filter": {
"exists": {
"field": "value.date2"
}
}
}
}
If ES was able to index something under value.date2 then you get that document back. The same goes for value.date1, of course.
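On the Java side, the same idea (try each date format in turn and fall back to a plain string) could be sketched with java.time. This is only an illustration under stated assumptions: the class and enum names are hypothetical, the two patterns are the ones given in the question, and the default lenient-ish resolver of DateTimeFormatter is used as-is.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical helper: decides which of the two date formats (if any) a value matches.
final class ExtraValueClassifier {

    enum Kind { DATE_TIME, DATE, TEXT }

    // Patterns taken from the question.
    private static final DateTimeFormatter DATE_TIME_FORMAT =
            DateTimeFormatter.ofPattern("dd/MM/yyyy HH:mm:ss");
    private static final DateTimeFormatter DATE_FORMAT =
            DateTimeFormatter.ofPattern("dd/MM/yyyy");

    /** Tries the longer format first, then the date-only format, then gives up. */
    static Kind classify(String value) {
        if (value == null) {
            return Kind.TEXT;
        }
        try {
            LocalDateTime.parse(value, DATE_TIME_FORMAT);
            return Kind.DATE_TIME;
        } catch (DateTimeParseException ignored) {
            // not in "dd/MM/yyyy HH:mm:ss" form; try the next format
        }
        try {
            LocalDate.parse(value, DATE_FORMAT);
            return Kind.DATE;
        } catch (DateTimeParseException ignored) {
            // not a date in either format
        }
        return Kind.TEXT;
    }
}
```

With the three sample documents above, "01/01/2016 00:00:00" is classified as DATE_TIME, "02/02/2016" as DATE, and "aaa" as TEXT, which mirrors which of value.date1 and value.date2 would exist in the index.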

jsonschema2pojo: referencing objects of the same type

I need to generate Java classes from a JSON schema file and came across jsonschema2pojo. However, I encountered a "problem" when using the $ref keyword.
For example, if I use the following schema from http://spacetelescope.github.io/understanding-json-schema/structuring.html#extending:
{
"$schema": "http://json-schema.org/draft-04/schema#",
"definitions": {
"address": {
"type": "object",
"properties": {
"street_address": { "type": "string" },
"city": { "type": "string" },
"state": { "type": "string" }
},
"required": ["street_address", "city", "state"]
}
},
"type": "object",
"properties": {
"billing_address": { "$ref": "#/definitions/address" },
"shipping_address": { "$ref": "#/definitions/address" }
}
}
As expected, it generated a class, named whatever you choose to call it, containing an attribute billingAddress and an attribute shippingAddress.
However, it also generated two separate classes, BillingAddress and ShippingAddress, even though both attributes reference address. I would rather have both attributes be of type Address.
Is this possible to achieve with jsonschema2pojo?

Update
After getting a better understanding of javaType from here, I get the expected result by just adding a javaType to your address definition.
{
"$schema": "http://json-schema.org/draft-04/schema#",
"definitions": {
"address": {
"type": "object",
"javaType": "Address",
"properties": {
"street_address": { "type": "string" },
"city": { "type": "string" },
"state": { "type": "string" }
},
"required": ["street_address", "city", "state"]
}
},
"type": "object",
"properties": {
"billing_address": { "$ref": "#/definitions/address" },
"shipping_address": { "$ref": "#/definitions/address" }
}
}
Answer with two files
You need to use javaType in your Address.json and use $ref for billing_address and shipping_address. I would suggest moving the address definition into a separate JSON file and then referencing it from billing_address and shipping_address.
Address.json
{
"$schema": "http://json-schema.org/draft-03/hyper-schema",
"additionalProperties": false,
"javaType": "whatever-package-name-you-have.Address",
"type": "object",
"properties": {
"street_address": { "type": "string", "required":true},
"city": { "type": "string", "required":true },
"state": { "type": "string", "required":true }
}
}
MainClass.json
{
"$schema": "http://json-schema.org/draft-03/hyper-schema",
"additionalProperties": false,
"type": "object",
"properties": {
"billing_address": {
"$ref":"Address.json",
"type": "object",
"required": false
},
"shipping_address": {
"$ref":"Address.json",
"type": "object",
"required": false
}
}
}
