Document.parse() constructor not working for nested json array - java

I have an extended JSON string:
{"_id": {"oid": "59a47286cfa9a3a73e51e72c"}, "theaterId": {"numberInt": "101100"}, "location": {"address": {"street1": "340 XDW Market", "city": "Bloomington", "state": "MN", "zipcode": "12427"}, "geo": {"type": "Point", "coordinates": [{"$numberDouble": "-193.24565"}, {"$numberDouble": "144.85466"}]}}}
I'm trying to convert the above JSON string to a Document in order to insert it into MongoDB. For this I am using the org.bson.Document.parse(json_string) method.
But the document I get after parsing doesn't preserve the datatype inside the geo.coordinates array (see the document below), while it preserves the datatype of theaterId.
{
  "_id": {
    "oid": "59a47286cfa9a3a73e51e72c"
  },
  "theaterId": {
    "numberInt": "101100"
  },
  "location": {
    "address": {
      "street1": "340 XDW Market",
      "city": "Bloomington",
      "state": "MN",
      "zipcode": "12427"
    },
    "geo": {
      "type": "Point",
      "coordinates": [-193.24565, 144.85466]
    }
  }
}
Is this a potential issue in the Document.parse() API?

The fields in geo.coordinates start with a dollar sign $, while in theaterId there is no dollar sign: you have numberInt there, but $numberDouble in coordinates.
Check the docs and this question for how to handle it, depending on what you need. Considering that numberInt looks like it satisfies your needs, you might just need to remove the dollar signs from the field names.
Edit: After digging somewhat deeper into those docs, including the one you provided, {"numberInt": "101100"} is not extended JSON with a datatype; it's just a normal JSON object with a property and a value for that property. It would need to be {"$numberInt": "101100"} to be extended JSON. On the other hand, {"$numberDouble": "-193.24565"} is extended. The datatype is not lost: it's parsed into a List<Double>, and since we know each element is of type Double, the datatype can be reconstructed.
If you take a look at Document.toJson(), under the hood it works with the RELAXED output mode, which will output coordinates as you are seeing them: [-193.24565, 144.85466]. If you use the EXTENDED output mode, for example like this:
import org.bson.json.JsonMode;
import org.bson.json.JsonWriterSettings;

JsonWriterSettings settings = JsonWriterSettings.builder().outputMode(JsonMode.EXTENDED).build();
System.out.println(document.toJson(settings));
then the datatype will be reconstructed from the Java type, and coordinates will look like this:
[{"$numberDouble": "-193.24565"}, {"$numberDouble": "144.85466"}]
In conclusion, there is no problem with Document.parse("json"), but there might be a problem with the json you are supplying to it.
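To illustrate the difference, here is a minimal sketch against the org.bson driver, using the values from the question:

import org.bson.Document;
import org.bson.json.JsonMode;
import org.bson.json.JsonWriterSettings;

public class ParseDemo {
    public static void main(String[] args) {
        // Without the dollar sign this parses as a plain nested document...
        Document plain = Document.parse("{\"theaterId\": {\"numberInt\": \"101100\"}}");
        System.out.println(plain.get("theaterId").getClass()); // class org.bson.Document

        // ...while with the dollar sign it is recognized as an extended JSON type wrapper.
        Document extended = Document.parse("{\"theaterId\": {\"$numberInt\": \"101100\"}}");
        System.out.println(extended.get("theaterId").getClass()); // class java.lang.Integer

        // EXTENDED output mode reconstructs the type wrapper from the Java type.
        JsonWriterSettings settings = JsonWriterSettings.builder().outputMode(JsonMode.EXTENDED).build();
        System.out.println(extended.toJson(settings)); // {"theaterId": {"$numberInt": "101100"}}
    }
}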
Edit 2:
As shown in the example, the datatypes can be reconstructed from the Java types. I am not familiar with how collection.insertOne(Document.parse(json_string)) works under the hood, but if you don't explicitly specify the mode, it might be using RELAXED by default instead of EXTENDED. The docs here state: "This format prioritizes type preservation at the loss of human-readability and interoperability with older formats.", so that would make sense. But this is just a wild guess on my part; you would need to dig into the docs to make sure.

Related

JSON Schema - Enum of Objects

I'm new to JSON Schema, so bear with me. My goal is to have a JSON property that is an object. Its keys relate to each other, meaning multiple keys always have the same values together. This will probably help make it clear; here is my attempt to do this with an enum:
{
  "$schema": "https://json-schema.org/draft/2019-09/schema",
  "title": "Part",
  "type": "object",
  "properties": {
    "relationship": {
      "type": "object",
      "enum": [
        { "code": "1", "value": "MEMBER" },
        { "code": "2", "value": "SPOUSE" },
        { "code": "3", "value": "CHILD" },
        { "code": "4", "value": "STUDENT" },
        { "code": "5", "value": "DISABILITY_DEPENDENT" },
        { "code": "6", "value": "ADULT_DEPENDENT" },
        { "code": "8", "value": "DOMESTIC_PARTNER" }
      ]
    }
  }
}
So using an enum like this works, even though I can't find it anywhere in the JSON Schema spec. However, the error message sucks. Normally I get extremely detailed error messages from schema validation; in this case, however, I do not.
$.part.relationship: does not have a value in the enumeration [, , , , , , ]
I'm not sure what I'm doing wrong. I'm using a Java parser for JSON Schema:
<dependency>
    <groupId>com.networknt</groupId>
    <artifactId>json-schema-validator</artifactId>
    <version>1.0.53</version>
</dependency>
Not sure if the error message is the fault of the parser or something I'm doing wrong with the schema. Help would be appreciated.
It was news to me, but according to the spec it does seem that objects are valid enum values. That said, your usage is quite unusual. I've not seen it used before.
the six primitive types ("null", "boolean", "object", "array", "number", or "string")
...
6.1.2. enum
...
Elements in the array might be of any type, including null.
Your problem is fundamentally that the library that you're using doesn't know how to convert those objects to printable strings. Even if it did give it a reasonable go, you might end up with
does not have a value in the enumeration [{"code": "1", "value":"MEMBER"}, {"code": "2" ...
which might be okay, but it's hardly amazing. If the code and value were both valid but didn't match, you might have to look quite closely at the list before you ever saw the problem.
JSON Schema in general is not very good at enforcing constraints between what it considers to be 2 unrelated fields. That's beyond the scope of what it aims to do. It's trying to validate the structure. Dependencies between fields are business constraints, not structural ones.
I think the best thing you could do to achieve readable error messages would be to have 2 sub-properties, each with an enumeration containing the 7 allowed values; one for the codes, one for the values, as sketched below.
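Concretely, the relationship property could be restructured like this (a sketch of that suggestion, reusing the codes and values from the original schema):

"relationship": {
  "type": "object",
  "properties": {
    "code": {
      "enum": ["1", "2", "3", "4", "5", "6", "8"]
    },
    "value": {
      "enum": ["MEMBER", "SPOUSE", "CHILD", "STUDENT", "DISABILITY_DEPENDENT", "ADULT_DEPENDENT", "DOMESTIC_PARTNER"]
    }
  }
}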
Then you'll get
$.part.relationship.code does not have a value in the enumeration [1,2,3,4 ...
or
$.part.relationship.value does not have a value in the enumeration ["MEMBER", "SPOUSE", ...
You can do some additional business validation on top of the schema validation if enforcing that constraint is important to you. Then generate your own error such as
code "1" does not match value "SPOUSE"
If code and value always have the same values relative to each other, why encode both in the JSON? Just encode a single value in the JSON and infer the other in the application.
This will be much easier to validate.
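For instance, the code/value pairing can live in application code instead of the schema; a sketch in Java:

// Each constant carries its wire code; the schema then only needs to
// validate one enumerated field, and the application derives the other.
enum Relationship {
    MEMBER("1"), SPOUSE("2"), CHILD("3"), STUDENT("4"),
    DISABILITY_DEPENDENT("5"), ADULT_DEPENDENT("6"), DOMESTIC_PARTNER("8");

    final String code;
    Relationship(String code) { this.code = code; }
}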

Could avro's logical types be used to validate input data?

I'm trying to understand how Avro's logical types are supposed to be used.
First let me give an example of what I'm trying to achieve: I want to write a new logical type (RegExLogicalType) that validates an input string and either accepts it or raises some exception.
Or let's speak about one of Avro's existing supported logical types (decimal); I was expecting to use it in this way:
If an invalid decimal logical type is specified, an exception should be raised, something like what happens when a mandatory field is expected but nothing has been provided: org.apache.avro.AvroRuntimeException: Field test_decimal type:BYTES pos:2 not set and has no default value.
If a valid decimal logical type is specified, no exception should be raised.
What I have found in the documentation only speaks about reading/deserialization, and I don't know what happens for writing/serialization:
Language implementations must ignore unknown logical types when reading, and should use the underlying Avro type. If a logical type is invalid, for example a decimal with scale greater than its precision, then implementations should ignore the logical type and use the underlying Avro type.
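As far as I can tell, that "invalid" case is only about the schema itself, never the data; a small sketch with the standard org.apache.avro API:

import org.apache.avro.LogicalTypes;
import org.apache.avro.Schema;

// addToSchema() validates the logical type against the schema shape only:
// a scale greater than the precision throws IllegalArgumentException here,
// but nothing ever inspects the bytes that are later written.
Schema bytesSchema = Schema.create(Schema.Type.BYTES);
LogicalTypes.decimal(2, 4).addToSchema(bytesSchema);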
I don't want the above-mentioned behavior for serialization/deserialization; I need something equivalent to the XSD restrictions (patterns) that are used to validate data against a schema.
Here in Avro, if the schema is as follows:
{"namespace": "com.stackoverflow.avro",
"type": "record",
"name": "Request",
"fields": [
{"name": "caller_jwt", "type": "string", "logicalType": "regular-expression", "pattern": "[a-zA-Z0-9]*\\.[a-zA-Z0-9]*\\.[a-zA-Z0-9]*"},
{"name": "test_decimal", "type": "bytes", "logicalType": "decimal", "precision": 4, "scale": 2}
]
}
and I try to build an object and serialize it like this:
DatumWriter<Request> userDatumWriter = new SpecificDatumWriter<>(Request.class);
DataFileWriter<Request> dataFileWriter = new DataFileWriter<>(userDatumWriter);
ByteBuffer badDecimal = ByteBuffer.wrap("bad".getBytes());
Request request = Request.newBuilder()
        .setTestDecimal(badDecimal)         // bad decimal
        .setCallerJwt("qsdsqdqsd").build(); // bad value according to the regex
dataFileWriter.create(request.getSchema(), new File("users.avro"));
dataFileWriter.append(request);
dataFileWriter.close();
then no exception is thrown and the object is serialized to the users.avro file.
So can Avro's logical types be used to validate input data? Or is there something else that could be used to validate input data?

Convert nested arbitrary JSON to CSV in Java

This question has been asked many times but I couldn't find the answer that fixes my issue.
I'm trying to convert nested JSON format to CSV format, like this:
The JSON structure is arbitrary and could be anything, nested or not.
I'm not supposed to know it; it's a database answer, and I need to export this JSON answer into a CSV file.
Here is an example.
Input:
{
  "_id": 1,
  "name": "Aurelia Menendez",
  "scores": [
    { "type": "exam", "score": 60.06045071030959 },
    { "type": "quiz", "score": 52.79790691903873 },
    { "type": "homework", "score": 71.76133439165544 }
  ]
}
The output I'm looking for:
_id,name,scores.type,scores.score,scores.type,scores.score,scores.type,scores.score
1,Aurelia Menendez,exam,60.06...,quiz,52.79...,homework,71.76..
This is an example, it could be any other JSON document.
The idea here is to use dot notation in the CSV column name.
I've already used CDL, but the output is not what I want:
_id scores name
1 "[{score:60.06045071030959,type:exam},{score:52.79790691903873,type:quiz},{score:71.76133439165544,type:homework}]" Aurelia Menendez
So how can I convert nested JSON to CSV with dot notation, in a generic way?
Edit:
Deserialization of the JSON with Jackson:
ObjectMapper mapper = new ObjectMapper();
JsonNode jsonNode = mapper.readValue(new File("C:\\...\\...\\...\\test.json"), JsonNode.class);
Like you said :
The JSON structure is arbitrary and could be anything, nested or not.
The JSON to CSV conversion can't be generalized, as it varies from user to user and also depends on specific requirements.
But still, there's a library, json2flat, which tries to achieve it. It may differ from your requirements, but it's worth a try.
For example, the JSON given above can be interpreted as follows:
/_id,/name,/scores/type,/scores/score
1,"Aurelia Menendez","exam",60.06045071030959
1,"Aurelia Menendez","quiz",52.79790691903873
1,"Aurelia Menendez","homework",71.76133439165544
Converting JSON to XLS/CSV in Java has what you are looking for.
Basically, you need to use org.json.CDL to convert from JSON to CSV format
Comments are not a convenient place to post long answers, so I'm posting my answer here.
1. Analyze your JSON and all the possible JSON structures you can get from your database. It should be a limited number of JSON forms.
2. Once you have analyzed your JSON structure, build a class/class hierarchy that fully reflects this structure.
3. Use a JSON serializer/deserializer library of your choice to deserialize the JSON to Java objects.
4. Employ the StringBuffer/StringBuilder classes, iterate over your object information, and build comma-delimited (or tab-delimited) strings, as sketched below.
5. Write the strings you built in the previous step to the file.
That's it.
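For steps 4 and 5, the row-building is plain StringBuilder work once the data is in typed objects. A compact sketch (the Score record stands in for the hypothetical result of step 2):

import java.util.List;

public class CsvRows {
    // Hypothetical typed result of steps 2-3 (deserialization).
    record Score(String type, double score) {}

    // Step 4: iterate over the object and build a comma-delimited string.
    static String toCsvRow(int id, String name, List<Score> scores) {
        StringBuilder row = new StringBuilder().append(id).append(',').append(name);
        for (Score s : scores) {
            row.append(',').append(s.type()).append(',').append(s.score());
        }
        return row.toString();
    }

    public static void main(String[] args) {
        System.out.println(toCsvRow(1, "Aurelia Menendez",
                List.of(new Score("exam", 60.06), new Score("quiz", 52.79))));
        // -> 1,Aurelia Menendez,exam,60.06,quiz,52.79
    }
}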

Update the json file in android file

I am saving a JSON response inside my app using SharedPreferences (jsonObject.toString()). The response contains a JSONArray; when the user updates the value of any one element, I want to save the updated changes to SharedPreferences. Please help me with this task.
Example:
{
  "locations": {
    "record": [
      {
        "id": 8817,
        "loc": "NEW YORK CITY" // update this to "California" and save the response
      },
      {
        "id": 2873,
        "loc": "UNITED STATES"
      },
      {
        "id": 1501,
        "loc": "NEW YORK STATE"
      }
    ]
  }
}
It seems like you're trying to override the purpose of SharedPreferences.
Its purpose is to save primitive values such as strings, integers, or booleans for simple single-value use; I wouldn't treat a JSON array as a single primitive value.
If I were you, I would go with QuokMoon's suggestion of a local SQLite database. This will allow you simple access for CRUD operations; the setup time is a bit longer, but the benefits you'll find are far beyond SharedPreferences.
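That said, if you do stay with SharedPreferences, the update itself is just parse, modify, re-save. A minimal sketch (the preferences file name "app" and the key "response" are placeholders of mine):

import android.content.Context;
import android.content.SharedPreferences;
import org.json.JSONArray;
import org.json.JSONException;
import org.json.JSONObject;

public class LocationStore {
    // Reads the stored string, changes one element, writes the document back.
    static void updateLocation(Context context) throws JSONException {
        SharedPreferences prefs = context.getSharedPreferences("app", Context.MODE_PRIVATE);
        JSONObject root = new JSONObject(prefs.getString("response", "{}"));
        JSONArray record = root.getJSONObject("locations").getJSONArray("record");
        for (int i = 0; i < record.length(); i++) {
            JSONObject item = record.getJSONObject(i);
            if (item.getInt("id") == 8817) {
                item.put("loc", "California"); // the update from the example
            }
        }
        prefs.edit().putString("response", root.toString()).apply();
    }
}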

How to map Json to Java classes when some variable names begin with a number?

Recently I've been playing with a web service that returns a JSON object like this:
{
  "id": 88319,
  "dt": 1345284000,
  "name": "Benghazi",
  "coord": {
    "lat": 32.12,
    "lon": 20.07
  },
  "main": {
    "temp": 306.15,
    "pressure": 1013,
    "humidity": 44
  },
  "wind": {
    "speed": 1,
    "deg": -7
  },
  "clouds": {
    "all": 90
  },
  "rain": {
    "3h": 3
  }
}
I have automatically generated Java classes mapping to that JSON data. The problem is that I cannot generate a Java class with an attribute named 3h (in Java, as in many other languages, variable identifiers cannot begin with a number). As a workaround I have redefined the attribute 3h as h3, and whenever I receive a JSON response from the web service I replace the string "3h" with "h3".
However, that approach is only appropriate for small projects. I would like to know if there is a more convenient approach to deal with this kind of situation.
Notes: For this particular example I used an online tool that generated the Java classes given a JSON example. In other situations I have used Jackson and other frameworks. Is the answer to this question framework-dependent? To be more concrete, and with the future in mind, I would like to adhere to the json-schema specification.
If using Gson, you can do it with the @SerializedName annotation.
Java Class:
import com.google.gson.annotations.SerializedName;

public class JsonData {
    @SerializedName("3h")
    private int h3;
    private String name;

    public JsonData(int h3, String name) {
        this.h3 = h3;
        this.name = name;
    }
}
Serialization: (same class works for fromJson() as well)
// prints: {"3h": 3,"name": "Benghazi"}
System.out.println(new Gson().toJson(new JsonData(3, "Benghazi")));
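And the reverse direction, as noted (a quick sketch):

// fromJson() maps the "3h" key back onto the h3 field via @SerializedName
JsonData data = new Gson().fromJson("{\"3h\": 3,\"name\": \"Benghazi\"}", JsonData.class);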
Reference:
@SerializedName Annotation
Here is what you're looking for. Simple as it seems, getting the syntax right took me a while.
import java.math.BigDecimal;
import com.fasterxml.jackson.annotation.JsonProperty;

public class Rain {
    @JsonProperty("3h")
    public BigDecimal detail = BigDecimal.valueOf(0);
}
You may not need it, but I set the default to 0.
"3h" is the name of the key.
"detail" is the name I gave the property to hold the value that WAS represented by "3h".
You can prefix the property you are generating with its data type, like arr_, int_, obj_, etc., since during auto-generation you will have to deal with the data type anyway. This becomes a general fix rather than one that looks specifically for strings like "3h". But design-wise, or in terms of good practice, this might not be the most optimal solution.
"I would like to know if there is a more convenient approach to deal with this kind of situation."
Firstly I can not make out what "this kind of situation" is since you have not mentioned the frameworks or approach by which you are making the mapping.
And if you want to have the attribute identifiers mapping to your json keys, to begin with numbers, it is not possible and that too is for good only.
Say, for example, your last subdocument:
"rain": {
    "3h": 3
}
is mapped to a class Rain as:
class Rain {
    int 3h = 3;
}
Then how would the compiler parse the variable declaration 3h = 3? (Refer to this SO post.)
So one way I can think of is to prefix any keys starting with numbers with a legal identifier character (like an underscore "_") and later remove the prefix.
Which means you can map your JSON subdocument rain as:
class Rain {
    int _3h = 3;
}
and add or strip the leading underscore when converting between JSON and the Java class.
Hope that helps!!!
