I'm using a Java, Spring Boot, Hibernate stack, with Protocol Buffers as DTOs for communication among microservices. At the reverse proxy, I convert the protobuf objects to JSON using protobuf's Java support.
I have the following structure
message Item {
  int64 id = 1;
  string name = 2;
  int64 price = 3;
}

message MultipleItems {
  repeated Item items = 1;
}
Converting the MultipleItems DTO to json gives me the following result:
{
  "items": [
    {
      "id": 1,
      "name": "ABC",
      "price": 10
    },
    {
      "id": 2,
      "name": "XYZ",
      "price": 20
    }
  ]
}
In the generated JSON, I get the key items mapping to the JSON array.
I want to remove the key and return only the JSON array as the result. Is there a clean way to achieve this?
I think it's not possible.
repeated must appear as a modifier on a field and fields must be named.
https://developers.google.com/protocol-buffers/docs/proto3#json
There's no obvious reason why Protobuf could not support this¹, but it would require that its grammar be extended to allow repeated at the message level rather than only at the field level. This, of course, makes everything downstream of the proto messages more complex too.
JSON, of course, does permit it.
It's also possible that it complicates en/decoding (an on-the-wire message could be either a message or an array of messages).
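If all you need is for the proxy's HTTP response to be a bare array, one pragmatic workaround is to unwrap the object after conversion rather than at the proto level. A minimal sketch, assuming the proxy already uses protobuf-java-util and has Gson available (multipleItems stands for your MultipleItems message):

import com.google.gson.JsonArray;
import com.google.gson.JsonParser;
import com.google.protobuf.util.JsonFormat;

// Print the wrapper message as JSON: {"items":[...]}
// (JsonFormat.printer().print(...) throws InvalidProtocolBufferException)
String wrapped = JsonFormat.printer().print(multipleItems);

// Extract just the array and use it as the response body: [...]
JsonArray itemsOnly = JsonParser.parseString(wrapped)
        .getAsJsonObject()
        .getAsJsonArray("items");
String responseBody = itemsOnly.toString();

Any JSON library at hand works for the unwrapping step; Gson is only chosen here for brevity.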
¹ Perhaps the concern is that generated code (!) then must necessarily be more complex too? Methods would all need to check whether the message is an array type or a struct type, e.g.:
func (x *X) SomeMethod(ctx context.Context, reqs []*pb.SomeMethodRequest) ...
And, since Go does not permit overloading methods this way, they would need to have distinct names:
func (x *X) SomeMethodArray(ctx context.Context, reqs []*pb.SomeMethodRequest) ...
func (x *X) SomeMethodMessage(ctx context.Context, req *pb.SomeMethodRequest) ...
I am reading data from Kafka in Java to perform some processing in Apache Flink and sink the results.
I have the Kafka topic topic_a, which has some data like {name: "abc", age: 20} and some data like {pin: 111, number: 999999, address: "some place"}.
When I read the data from Kafka using KafkaSource, I deserialize the records into a POJO which has the fields String name, int age with their respective getters, setters, and constructor.
When I run the Flink code, the deserializer works fine for {name: "abc", age: 20}.
KafkaSource<AllDataPOJO> kafkaAllAlertsSource = KafkaSource.<AllDataPOJO>builder()
        .setBootstrapServers(bootstrapServers)
        .setTopics(Arrays.asList("topic_a"))
        .setProperties(properties)
        .setGroupId(allEventsGroupID)
        .setStartingOffsets(OffsetsInitializer.earliest())
        .setValueOnlyDeserializer(new AllDataDeserializationSchema())
        .build();
AllDataPOJO
private String name;
private int age;
The code runs fine for {name: "abc", age: 20}, but as soon as a record like {pin: 111, number: 999999, address: "some place"} arrives, it starts failing.
Two questions:
Is there any way I can read such varying formats of messages and perform the Flink operations? Depending on what kind of message comes in, I wish to route it to a different Kafka topic.
When I get {name: "abc", age: 20}, it should go to topic user_basic, and {pin: 111, number: 999999, address: "some place"} should go to topic user_details.
How can I achieve the above with just one Flink Java job?
You might be interested in specifying your Deserialization Schema as:
.setDeserializer(KafkaRecordDeserializationSchema.of(new JSONKeyValueDeserializationSchema(false)))
Then you would map and filter that source, validating which fields are present:
Key fields can be accessed by calling objectNode.get("key").get(<name>).as(<type>)
Value fields can be accessed by calling objectNode.get("value").get(<name>).as(<type>)
Or cast the objects to existing POJOs inside your map.
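As a rough sketch of that split (env and kafkaSource are placeholders for your StreamExecutionEnvironment and a source rebuilt with the deserializer above; in recent Flink versions ObjectNode is the shaded Jackson type):

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.node.ObjectNode;
import org.apache.flink.streaming.api.datastream.DataStream;

// The deserializer wraps each record as {"key": ..., "value": ...}
DataStream<ObjectNode> records =
        env.fromSource(kafkaSource, WatermarkStrategy.noWatermarks(), "topic_a");

// Route by which fields are present in the value
DataStream<ObjectNode> basicUsers =
        records.filter(node -> node.get("value").has("name") && node.get("value").has("age"));
DataStream<ObjectNode> userDetails =
        records.filter(node -> node.get("value").has("pin"));

Each filtered stream can then be mapped to its own POJO and written to its own Kafka sink.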
You cannot use <AllDataPOJO> if you have other POJO classes with other fields.
Or you need to add all fields from all POJO types and make them nullable when they don't exist in your data. But that may be error-prone, as name and pin could then potentially exist in the same record, for example, when they shouldn't.
Otherwise, as the other answer says, use a more generic String/JSON deserializer, and then you can use filter/map operations to cast your data into more concrete types, depending on the fields that are available.
In situations like this I normally use the SimpleStringSchema, then follow the source with a ProcessFunction where I parse the string and use side outputs (one per message type). The added benefit of this approach is that if the JSON isn't deserializable, or doesn't properly map to any of the target types, you have the opportunity to handle the error flexibly (e.g. send it out to an error sink).
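A minimal sketch of that pattern (the class and tag names are mine, not from the question; the raw string is passed through so each output can feed its own Kafka sink):

import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

// Routes raw JSON strings: records with name/age go to the main output (-> user_basic sink),
// records with pin go to a side output (-> user_details sink), anything else to an error output.
public class RouteByShapeFunction extends ProcessFunction<String, String> {

    public static final OutputTag<String> USER_DETAILS_TAG = new OutputTag<String>("user-details") {};
    public static final OutputTag<String> ERROR_TAG = new OutputTag<String>("errors") {};

    private transient ObjectMapper mapper;

    @Override
    public void processElement(String raw, Context ctx, Collector<String> out) {
        if (mapper == null) {
            mapper = new ObjectMapper(); // created lazily on the task manager
        }
        try {
            JsonNode node = mapper.readTree(raw);
            if (node.has("name") && node.has("age")) {
                out.collect(raw);                  // -> user_basic
            } else if (node.has("pin")) {
                ctx.output(USER_DETAILS_TAG, raw); // -> user_details
            } else {
                ctx.output(ERROR_TAG, raw);        // unknown shape
            }
        } catch (Exception e) {
            ctx.output(ERROR_TAG, raw);            // not valid JSON
        }
    }
}

The stream returned by process(new RouteByShapeFunction()) feeds the user_basic sink, while getSideOutput(USER_DETAILS_TAG) and getSideOutput(ERROR_TAG) feed the others.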
I have one extended json string.
{"_id": {"oid": "59a47286cfa9a3a73e51e72c"}, "theaterId": {"numberInt": "101100"}, "location": {"address": {"street1": "340 XDW Market", "city": "Bloomington", "state": "MN", "zipcode": "12427"}, "geo": {"type": "Point", "coordinates": [{"$numberDouble": "-193.24565"}, {"$numberDouble": "144.85466"}]}}}
I'm trying to convert the above JSON string to a Document in order to insert it into MongoDB. For this I am using the org.bson.Document.parse(json_string) method.
But the document I get after parsing doesn't preserve the datatype inside the geo.coordinates array list (check the Document below), while it preserves the datatype of theaterId.
{
  "_id": {
    "oid": "59a47286cfa9a3a73e51e72c"
  },
  "theaterId": {
    "numberInt": "101100"
  },
  "location": {
    "address": {
      "street1": "340 XDW Market",
      "city": "Bloomington",
      "state": "MN",
      "zipcode": "12427"
    },
    "geo": {
      "type": "Point",
      "coordinates": [-193.24565, 144.85466]
    }
  }
}
Is this a potential issue in the Document.parse() API?
Your fields in geo.coordinates start with a dollar sign $: in theaterId you have numberInt, while in coordinates you have $numberDouble.
Check the docs and this question for how to handle it depending on what you need. Considering that it looks like numberInt satisfies your needs, you might just need to remove the dollars from the field names.
Edit: After digging somewhat deeper into those docs (including the one you provided), {"numberInt": "101100"} is not extended JSON with a datatype; it's just a normal JSON object with a property and a value for that property. It would need to be {"$numberInt": "101100"} to be extended JSON. On the other hand, {"$numberDouble": "-193.24565"} is extended. The datatype is not lost: it's parsed into a List<Double>, and since we know each element is of type Double, the datatype can be reconstructed.
If you take a look at Document.toJson(), under the hood it works with the RELAXED output mode, which will output the coordinates as you are seeing them - [-193.24565, 144.85466]. If you provide the EXTENDED output mode, for example like this:
import org.bson.json.JsonMode;
import org.bson.json.JsonWriterSettings;

JsonWriterSettings settings = JsonWriterSettings.builder().outputMode(JsonMode.EXTENDED).build();
System.out.println(document.toJson(settings));
then the datatype will be reconstructed back from the java type, and coordinates will look like so:
[{"$numberDouble": "-193.24565"}, {"$numberDouble": "144.85466"}]
In conclusion, there is no problem with Document.parse("json"), but there might be a problem with the json you are supplying to it.
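As a quick check (a sketch; the exact behaviour may vary slightly between driver versions), parsing properly extended JSON keeps the Java types:

import org.bson.Document;

Document doc = Document.parse(
        "{\"theaterId\": {\"$numberInt\": \"101100\"}, "
      + "\"coordinates\": [{\"$numberDouble\": \"-193.24565\"}, {\"$numberDouble\": \"144.85466\"}]}");

System.out.println(doc.get("theaterId").getClass());          // class java.lang.Integer
System.out.println(doc.getList("coordinates", Double.class)); // [-193.24565, 144.85466]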
Edit2:
As shown in the example, the datatypes can be reconstructed from the Java types. I am not familiar with how collection.insertOne(Document.parse(json_string)) works under the hood, but if you don't explicitly specify the mode, it might be using RELAXED by default instead of EXTENDED. The docs here state: This format prioritizes type preservation at the loss of human-readability and interoperability with older formats. So that would make sense. But this is just a wild guess on my part; you would need to dig into the docs to make sure.
I need to serialize a map to JSON in a certain order.
This is the map:
HashMap<String, String> dataMap = new HashMap<>();
dataMap.put("CompanyCode", "4");
dataMap.put("EntyyCode", "2002296");
dataMap.put("SubEntityCode", "000");
dataMap.put("ContractNumber", "52504467115");
dataMap.put("Progressive Contract", "0");
dataMap.put("DocumentNumber", "200003333494028");
dataMap.put("LogonUserName", "AR333");
dataMap.put("Progressive Title", "0");
This is the json model I would like:
{
  "Policy": {
    "ContractNumber": "52504467115",
    "ProgressiveContract": "0"
  },
  "Title": {
    "LogonUserName": "AR333",
    "ProgressiveTitle": "0"
  },
  "BusinessChannel": {
    "CompanyCode": "4",
    "EntyyCode": "2002296",
    "SubEntityCode": "000"
  },
  "Document": {
    "DocumentNumber": "200003333494028"
  }
}
I need to convert this map into a JSON string. I know that this can be done using Jackson as below:
new ObjectMapper().writeValueAsString(map);
How do I produce the JSON model above using Jackson? Or is there any other way to do this in Java?
Thank you
First of all, the output you request introduces a second problem: partitioning. Not only must the items appear in a particular order, they must also somehow be divided over different categories. In Java, these categories usually correspond to their own classes or, more recently, records. The top-level class (corresponding to the unnamed outer object of the JSON) then determines the ordering, like so (the name Contract is my choice):
record Contract(
    Policy policy,
    Title title,
    BusinessChannel businessChannel,
    Document document
) { }
with each of the properties of Contract having their own class, e.g.:
record Policy(String contractNumber, int progressiveContract) { }
etc.
Serializing Contract then recursively serializes each of its parameters, with the required outcome as the result.
This would be the 'standard' way.
So, since you start with a HashMap, which by contract offers no guarantee of ordering, let alone an easy way to partition its contents into sub-objects, you could try two things:
Rethink the use of a map. Switching to the class structure takes care of the structure automatically.
Manually convert the values in the required order (or use e.g. a TreeMap with a custom Comparator) and partition them yourself, as sketched below. This probably requires more work than a map saves.
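A rough sketch of that manual route, assuming the dataMap from the question (LinkedHashMap is used because Jackson serializes map entries in insertion order):

import java.util.LinkedHashMap;
import java.util.Map;
import com.fasterxml.jackson.databind.ObjectMapper;

Map<String, Object> root = new LinkedHashMap<>();

Map<String, String> policy = new LinkedHashMap<>();
policy.put("ContractNumber", dataMap.get("ContractNumber"));
policy.put("ProgressiveContract", dataMap.get("Progressive Contract"));
root.put("Policy", policy);

Map<String, String> title = new LinkedHashMap<>();
title.put("LogonUserName", dataMap.get("LogonUserName"));
title.put("ProgressiveTitle", dataMap.get("Progressive Title"));
root.put("Title", title);

Map<String, String> businessChannel = new LinkedHashMap<>();
businessChannel.put("CompanyCode", dataMap.get("CompanyCode"));
businessChannel.put("EntyyCode", dataMap.get("EntyyCode"));
businessChannel.put("SubEntityCode", dataMap.get("SubEntityCode"));
root.put("BusinessChannel", businessChannel);

Map<String, String> document = new LinkedHashMap<>();
document.put("DocumentNumber", dataMap.get("DocumentNumber"));
root.put("Document", document);

String json = new ObjectMapper().writeValueAsString(root); // throws JsonProcessingException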
I'm trying to understand how Avro's logical types are supposed to be used.
First let me give an example of what I'm trying to achieve: I want to write a new logical type (RegExLogicalType) that validates an input string and either accepts it or raises an exception.
Or let's take one of the existing supported Avro logical types (decimal); I was expecting to use it in this way:
If an invalid decimal value is supplied, an exception must be raised, something like what happens when a mandatory field is expected but nothing has been provided: org.apache.avro.AvroRuntimeException: Field test_decimal type:BYTES pos:2 not set and has no default value
If a valid decimal value is supplied, no exception should be raised.
What I have found in the documentation only talks about reading/deserialization, and I don't know what applies to writing/serialization:
Language implementations must ignore unknown logical types when reading, and should use the underlying Avro type. If a logical type is invalid, for example a decimal with scale greater than its precision, then implementations should ignore the logical type and use the underlying Avro type.
I don't want the above-mentioned behavior for serialization/deserialization; I need something equivalent to XSD restrictions (patterns), which are used to validate the data against the schema.
Here in Avro, if the schema is as follows:
{"namespace": "com.stackoverflow.avro",
"type": "record",
"name": "Request",
"fields": [
{"name": "caller_jwt", "type": "string", "logicalType": "regular-expression", "pattern": "[a-zA-Z0-9]*\\.[a-zA-Z0-9]*\\.[a-zA-Z0-9]*"},
{"name": "test_decimal", "type": "bytes", "logicalType": "decimal", "precision": 4, "scale": 2}
]
}
and if I try to build an object and serialize it like:
DatumWriter<Request> userDatumWriter = new SpecificDatumWriter<>(Request.class);
DataFileWriter<Request> dataFileWriter = new DataFileWriter<>(userDatumWriter);

ByteBuffer badDecimal = ByteBuffer.wrap("bad".getBytes());
Request request = Request.newBuilder()
        .setTestDecimal(badDecimal)   // bad decimal
        .setCallerJwt("qsdsqdqsd")    // bad value according to the regEx
        .build();

dataFileWriter.create(request.getSchema(), new File("users.avro"));
dataFileWriter.append(request);
dataFileWriter.close();
no exception is thrown and the object is serialized to the users.avro file.
So can Avro's logical types be used to validate input data? Or is there something else that could be used to validate input data?
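For reference, a minimal sketch of what registering such a custom logical type could look like (the class below is my own, built only on Avro's LogicalType/LogicalTypes API; note that validate() checks the schema, not individual values, so rejecting bad values at write time would additionally require a custom Conversion registered with the data model):

import org.apache.avro.LogicalType;
import org.apache.avro.LogicalTypes;
import org.apache.avro.Schema;

public class RegExLogicalType extends LogicalType {
    public static final String NAME = "regular-expression";

    public RegExLogicalType() {
        super(NAME);
    }

    @Override
    public void validate(Schema schema) {
        super.validate(schema);
        if (schema.getType() != Schema.Type.STRING) {
            throw new IllegalArgumentException("regular-expression can only annotate a string type");
        }
        if (schema.getProp("pattern") == null) {
            throw new IllegalArgumentException("regular-expression requires a 'pattern' property");
        }
    }

    // Call once at startup so schema parsing recognizes the logical type
    public static void register() {
        LogicalTypes.register(NAME, schema -> new RegExLogicalType());
    }
}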
Recently I've been playing with a web service that returns a JSON object like this:
{
  "id": 88319,
  "dt": 1345284000,
  "name": "Benghazi",
  "coord": {
    "lat": 32.12,
    "lon": 20.07
  },
  "main": {
    "temp": 306.15,
    "pressure": 1013,
    "humidity": 44
  },
  "wind": {
    "speed": 1,
    "deg": -7
  },
  "clouds": {
    "all": 90
  },
  "rain": {
    "3h": 3
  }
}
I have automatically generated Java classes mapping to that JSON data. The problem is that I cannot generate a Java class with an attribute named 3h (in Java, as in many other languages, variable identifiers cannot begin with a number). As a workaround I have renamed the attribute 3h to h3, and whenever I receive a JSON response from the web service I replace the string "3h" with "h3".
However, that approach is only appropriate for small projects. I would like to know if there is a more convenient approach to deal with this kind of situation.
Notes: For this particular example I used an online tool that generated the Java classes from a JSON example. In other situations I have used Jackson and other frameworks. Is the answer to this question framework-dependent? To be more concrete, and with the future in mind, I would like to adhere to the json-schema specification.
If you're using Gson, you can do it with the @SerializedName annotation.
Java Class:
import com.google.gson.annotations.SerializedName;

public class JsonData {
    @SerializedName("3h")
    private int h3;
    private String name;

    public JsonData(int h3, String name) {
        this.h3 = h3;
        this.name = name;
    }
}
Serialization: (same class works for fromJson() as well)
// prints: {"3h":3,"name":"Benghazi"}
System.out.println(new Gson().toJson(new JsonData(3, "Benghazi")));
Reference:
@SerializedName Annotation
Here is what you're looking for. Simple as it seems, getting the syntax right took me a while.
import java.math.BigDecimal;
import com.fasterxml.jackson.annotation.JsonProperty;

public class Rain {
    @JsonProperty("3h")
    public BigDecimal detail = BigDecimal.valueOf(0);
}
You may not need it, but I set the default to 0.
"3h" is the name of the key.
"detail" is the name I gave the property to hold the value that WAS represented by "3h".
You can prefix the property name with its data type when generating the classes, like arr_, int_, obj_ etc. for the respective objects, since during autogeneration you have to deal with the datatype anyway. This becomes a general fix rather than specifically looking for strings like "3h". Design-wise, or good-practice-wise, this might not be the most optimal solution, though.
"I would like to know if there is a more convenient approach to deal with this kind of situation."
Firstly, I cannot make out what "this kind of situation" is, since you have not mentioned the framework or approach by which you are doing the mapping.
And if you want the attribute identifiers that map to your JSON keys to begin with numbers, that is not possible, and for good reason.
Say, for example, your last subdocument:
"rain": {
"3h": 3
}
is mapped to a class Rain as:
class Rain{
int 3h=3
}
Then how would you parse the variable declaration "3h=3"? (Refer to this SO post.)
So what I can think of is that you could prefix any keys starting with numbers with a special but legal identifier character (like an underscore "_") and later remove the prefix.
Which means, you can map your json subdocument rain as:
class Rain{
int _3h=3
}
And later remove the leading underscore while deserializing.
Hope that helps!!!