How to serialize proto3 message to a string and back? - java

In Java, I want to convert a proto3 message into a string that:
can be sent over HTTP
can still be deserialized after more fields are added to the proto (i.e. old strings keep working)
If you are wondering why I need to serialize into a string, I have a proto:
message Order {
...
}
and I want to create a string 'order_tag' from it that I can pass around.
I saw com.google.protobuf.TextFormat but it says it's for proto2 and also it doesn't say anything about backward compatibility.

You can serialize the message to bytes and use Base64 encoding to convert the serialized value to a string. This way, even if the fields change, you should still be able to deserialize the string, as long as the schema change stays within the restrictions defined in https://developers.google.com/protocol-buffers/docs/proto3#updating.
Don't use text encoding except for debugging purposes. It doesn't provide the same backward compatibility guarantee that the binary format provides (e.g. changing a field name will break existing serialized data).
To serialize to a string (BaseEncoding is Guava's com.google.common.io.BaseEncoding):
BaseEncoding.base64().encode(order.toByteArray())
and to deserialize:
Order.parseFrom(BaseEncoding.base64().decode(orderStr))
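If Guava is not on the classpath, the JDK's own java.util.Base64 (Java 8+) can do the same job. The sketch below is an assumption rather than part of the answer: it uses the URL-safe alphabet without padding so the tag can travel in a URL or header without escaping, and it works on raw bytes so it compiles without generated proto classes. OrderTags is a hypothetical helper name.

```java
import java.util.Base64;

// Hypothetical helper: turns serialized protobuf bytes into an HTTP-safe tag and back.
final class OrderTags {
    private OrderTags() {}

    // URL-safe alphabet ('-' and '_' instead of '+' and '/'), no '=' padding.
    static String encode(byte[] serializedMessage) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(serializedMessage);
    }

    // The URL decoder accepts input with or without '=' padding.
    static byte[] decode(String tag) {
        return Base64.getUrlDecoder().decode(tag);
    }
}
```

With a generated Order class you would call OrderTags.encode(order.toByteArray()) and Order.parseFrom(OrderTags.decode(tag)); backward compatibility still comes from the binary wire format, not from the Base64 step.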

Related

Jackson: avoid deserialization of some fields but not ignoring them

I need to carry all the JSON data (to store it, log it, return it), but I will never access it from code. Is there any way to avoid deserializing it but still use it during serialization?
class MyObject {
int importantField; // I want this field to be properly deserialized
String notImportantJsonGarbage; // I don't care what's here, but it must be valid JSON
}
So now I'd like to be able to deserialize it from
{"importantField":7, "notImportantJsonGarbage":{"key1":3, "key2":[1,2,3]}}
and later serialize it to the same string
UPDATE
I don't want to ignore this property. I need this data, but as a string, not as a fully deserialized object.
I need to be able to do:
json1 -> object -> json2
json1 == json2
Take a look at: JsonProperty.Access
AUTO - Access setting which means that visibility rules are to be used to automatically determine read- and/or write-access of this property.
READ_ONLY - Access setting that means that the property may only be read for serialization, but not written (set) during deserialization.
READ_WRITE - Access setting that means that the property will be accessed for both serialization (writing out values as external representation) and deserialization (reading values from external representation), regardless of visibility rules.
WRITE_ONLY - Access setting that means that the property may only be written (set) for deserialization, but will not be read (get) on serialization, that is, the value of the property is not included in serialization.
So in your case you could use it like this:
@JsonProperty(access = Access.READ_ONLY)
private String notImportantJsonGarbage;
You can use the @JsonIgnore annotation on the property that you wish to ignore.
You can use the type JSONObject for the Json field.
private JSONObject notImportantJsonGarbage;
When you need to read it, you can convert it to a String (or another type). You can use the Jackson or Gson libraries to do that.
Note that when you convert the JSONObject back to a String, the resulting string might have its quotes escaped.
Using JSONObject also satisfies your requirement that the value 'must be valid JSON'.

Java serialization to string

I have the following declaration of the static type Object:
Integer typeId;
//Obtaining typeId
Object containerObject = ContainerObjectFactory.create(typeId);
The factory can produce different types of container objects, e.g. Date, Integer, BigDecimal and so forth.
Now, after creating the containerObject, I need to serialize it to a String and store it in a database with Hibernate. I'm not going to provide the object-relational mapping because it doesn't relate to the question directly.
What I want is to serialize the containerObject according to its runtime type and later deserialize it back to the type it was serialized with. Is that possible at all? Could I use XML serialization for this?
There are numerous alternatives, and your question is quite broad. You could:
use the native Java serialisation, which is binary, and then Base64 encode it
use an XML serialisation library, such as XStream
use a JSON serialisation library, such as Gson
One key feature you mention is that the object type needs to be embedded in the serialised data. Native Java serialisation embeds the type in the data so this is a good candidate. This is a double-edged sword however, as this makes the data brittle - if at some time in the future you changed the fully qualified class name then you'd no longer be able to deserialise the object.
Gson, on the other hand, doesn't embed the type information, and so you'd have to store both the JSON and the object type in order to deserialise the object.
XML and JSON have advantages that they're a textual format, so even without deserialising it, you can use your human eyes to see what it is. Base64 encoded Java serialisation however, is an unintelligible blob of characters.
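The first option (native Java serialisation plus Base64) can be sketched with only JDK classes. Date, Integer and BigDecimal all implement Serializable, and the class name is written into the stream, so readObject() returns an instance of the original runtime type. SerializedStrings is a hypothetical helper name, and the brittleness noted above (renaming the class breaks old data) still applies.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Base64;

// Sketch: native Java serialization, then Base64, so the result fits in a String column.
final class SerializedStrings {
    private SerializedStrings() {}

    static String toBase64(Serializable value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value); // the stream records the runtime class name
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    // Returns an instance of whatever class was written; inspect getClass() or use instanceof.
    static Object fromBase64(String encoded) throws IOException, ClassNotFoundException {
        byte[] bytes = Base64.getDecoder().decode(encoded);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }
}
```

For the question's code, that would be SerializedStrings.toBase64((Serializable) containerObject), assuming every type the factory produces is Serializable.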
There are multiple ways, but you need a custom serialization scheme, e.g.:
D|25.01.2015
I|12345
BD|123456.123452436
where the first part of the String represents the type and the second part represents the data. You can even use some binary serialization scheme for this.
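A minimal codec for the "TYPE|data" scheme above might look like the following. It is a sketch under assumptions: only the three prefixes from the example (D, I, BD) are handled, TaggedValueCodec is a hypothetical name, and java.time.LocalDate stands in for java.util.Date so the dd.MM.yyyy format can be handled with DateTimeFormatter.

```java
import java.math.BigDecimal;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical "TYPE|value" codec covering the D, I and BD prefixes shown above.
final class TaggedValueCodec {
    private static final DateTimeFormatter DATE = DateTimeFormatter.ofPattern("dd.MM.yyyy");

    private TaggedValueCodec() {}

    static String encode(Object value) {
        if (value instanceof LocalDate) return "D|" + DATE.format((LocalDate) value);
        if (value instanceof Integer) return "I|" + value;
        if (value instanceof BigDecimal) return "BD|" + ((BigDecimal) value).toPlainString();
        throw new IllegalArgumentException("Unsupported type: " + value.getClass().getName());
    }

    static Object decode(String text) {
        int sep = text.indexOf('|');
        if (sep < 0) throw new IllegalArgumentException("Missing type prefix: " + text);
        String type = text.substring(0, sep);
        String data = text.substring(sep + 1);
        switch (type) {
            case "D":  return LocalDate.parse(data, DATE);
            case "I":  return Integer.valueOf(data);
            case "BD": return new BigDecimal(data);
            default:   throw new IllegalArgumentException("Unknown type prefix: " + type);
        }
    }
}
```

Unlike Java serialisation, the prefixes are under your control, so renaming classes does not invalidate stored strings.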

How would I decode this JSON with GSON (multi-d-array)?

I am using GSON to decode JSON strings that are returned from our server. I haven't had any problems, until I ran into this one particular JSON return from the API. The return is in the following format:
{"success":1,"errors":[],"data":{"524":{"id":"524"}, "525":{"id":"525"}}}
For the other returns I had data as an array of a class of my own creation, but for this return it says that it is an object and not an array. So how should I format my class?
Edit: What I am having trouble with is that the '524' and '525' field names are not static. They depend on the user's credentials: there could be fields 323, 324, 325, or a single one, 123. How can I handle this dynamically?
SOLVED
What I had to do was make 'data' a HashMap<String, Object> in my custom class. After the first decoding, I turned the values of 'data' into an array of type Object[]. Then I converted each Object[i] back into a JSON string and used gson.fromJson() to convert it into the type I had originally intended.
If the API is giving inconsistent results and you can't find a reason on your end, one option is to parse the response into a Gson JsonObject (JsonObject o = gson.fromJson(json, JsonObject.class)) and then convert the data to a list if it is not one already, checking with o.get("data").isJsonArray(), etc.
When this is complete, you can create the object via gson.fromJson(JsonElement, Class). The alternative is to have two classes, one for each shape of the response, but that seems sloppy if this is the only reason to have two different classes.
Gson is correct. In the server reply, data is an object with two members that are also objects. For data to be an array, it would need square brackets [] instead of curly braces {}. More about the JSON format here.
Either the server format changed, you are calling a different API version, or someone introduced a bug on the server side.

Java sockets - how to determine data type on the server side?

I'm using Java sockets for a client-server application. Sometimes the client needs to send a byte array (using a ByteArrayOutputStream) and sometimes it needs to send a custom Java object. How can I read from the input stream on the server side and determine what is in it so that I can process it properly?
Usually this is done by sending a "header" in front of the body containing information about the body. Have a look at, for example, the HTTP protocol. An HTTP message consists of a header, which is separated from the body by a blank line (a double newline). The header in turn consists of several fields in name: value format, each on its own line. In this particular case, in HTTP you would have used the Content-Type header to identify the data type of the body.
Since neither Java nor TCP/IP provides standard facilities for this, you need to specify and document in detail the format you're going to send over the wire so that the other side knows how to handle the stream. You can of course also adopt an existing standard, e.g. HTTP or FTP.
There are multiple ways to handle this.
One is object serialization, which sends the object over with Java's ObjectOutputStream/ObjectInputStream. You do run into a small problem knowing when to read the object off the stream, though.
Another is to marshal and unmarshal XML. This uses a bit more traffic but is easier to debug and get running. It helps to have a well-documented XML schema for this. An advantage here is that you can use existing XML libraries.
You could try a custom format if you wanted, but it would probably end up being just a sloppy, less verbose version of XML.
In general, I don't believe there is a feature built into Java that allows you to do this.
Instead, consider sending some more information along with each message that explains what type is coming next.
For example, you might prefix your messages with an integer, such that every time you receive a message, you read the first 4 bytes (an integer is 4 bytes) and interpret its value (e.g. 1=byte array, 2=custom Java object, 3=another custom Java object, ...).
You might also consider adding an integer containing the size of the message so that you know when the current message ends and the next message begins.
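The type-plus-length prefix described above can be sketched with DataOutputStream and DataInputStream. This is an illustration under assumptions: the type codes 1 and 2 mirror the example numbering above, both header fields are 4-byte big-endian ints, and Frames/Frame are hypothetical names.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical framing: [int type][int length][length bytes of payload].
final class Frames {
    static final int TYPE_BYTE_ARRAY = 1;  // example codes, as suggested above
    static final int TYPE_JAVA_OBJECT = 2;

    private Frames() {}

    static void write(DataOutputStream out, int type, byte[] payload) throws IOException {
        out.writeInt(type);           // 4 bytes, big-endian
        out.writeInt(payload.length); // 4 bytes, big-endian
        out.write(payload);
        out.flush();
    }

    // One decoded message: its type code and raw body.
    static final class Frame {
        final int type;
        final byte[] payload;
        Frame(int type, byte[] payload) { this.type = type; this.payload = payload; }
    }

    static Frame read(DataInputStream in) throws IOException {
        int type = in.readInt();
        int length = in.readInt();
        if (length < 0) throw new IOException("Negative frame length: " + length);
        byte[] payload = new byte[length];
        in.readFully(payload);        // blocks until the whole payload has arrived
        return new Frame(type, payload);
    }
}
```

On the server, you would wrap socket.getInputStream() in a DataInputStream, call Frames.read in a loop, and switch on frame.type; a TYPE_JAVA_OBJECT payload can then be handed to an ObjectInputStream over a ByteArrayInputStream.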
I'm going to be accused of overkill for this, but unless you seriously need this protocol to be economical, you might consider marshalling the data. Without peeking at the data, you generally can't tell the difference between a byte array and something else, since you could conceivably represent everything as a byte array.
You can pretty easily use JAXB to marshal the data to and from XML, and JAXB will even turn byte arrays into hex or Base64 strings for you.
First, read the data into a byte array on the server. Write your own parsing routine that does nothing more than identify what is in the byte array.
Second, perform the full object parsing based on the identification from step one. If the parsing requires an InputStream, you can always wrap the byte array you read in step one in a new ByteArrayInputStream.
You need to define a protocol to indicate what type of data follows. For instance, you could start each transfer with a string or enumerated value. The server would first read this, then read the following data based on the 'header' value.
What you could do is prepend any data you send with an integer that identifies its type.
That way, you could read the first 4 bytes, and then determine what type of data it is.
I think the easiest way is to use an object which contains the data that you will send along with its type information. Then you can just send this object and according to this object's data type property you can extract the data.
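A minimal sketch of such a wrapper object, assuming it is sent with Java serialization (ObjectOutputStream/ObjectInputStream). Envelope, Kind and EnvelopeDemo are hypothetical names; the in-memory streams stand in for the socket so the example runs on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical envelope carrying a type tag alongside the data.
final class Envelope implements Serializable {
    private static final long serialVersionUID = 1L;

    enum Kind { BYTES, TEXT } // illustrative kinds; add your own message types

    final Kind kind;
    final Serializable data;  // byte[] and String are both Serializable

    Envelope(Kind kind, Serializable data) {
        this.kind = kind;
        this.data = data;
    }
}

final class EnvelopeDemo {
    private EnvelopeDemo() {}

    // Server-side dispatch on the type property.
    static String describe(Envelope e) {
        switch (e.kind) {
            case BYTES: return "bytes:" + ((byte[]) e.data).length;
            case TEXT:  return "text:" + e.data;
            default:    throw new IllegalStateException("Unhandled kind " + e.kind);
        }
    }

    // Simulates the socket with in-memory streams: the client writes, the server reads.
    static Envelope roundTrip(Envelope sent) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(wire)) {
            out.writeObject(sent);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(wire.toByteArray()))) {
            return (Envelope) in.readObject();
        }
    }
}
```

Because ObjectInputStream reads exactly one object per readObject() call, this also avoids having to frame the messages by hand, at the cost of tying both ends to Java.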

Using GPB, how do I make my wrapper classes stop accepting binary messages that aren't meant for them?

I'm using Google Protocol Buffers to serialize some of my business objects (in a Java app). As recommended in the tutorials, I wrap the message builder in a class of my own that implements getter and setter methods to access the message's properties. Also, I declared all message fields optional, again following their recommendations.
Now I can pass any of the encoded messages to any of the wrapper classes, and they will always parse and accept them. This leads to wrapper objects that claim to represent a message type they don't actually contain, and a lot of bogus behavior follows.
When loading the binary content of a message into a wrapper class, how can I make it throw an error if it has been passed the wrong type?
The solution I'm currently thinking of would have all messages extend a base message with a required type field (and maybe a version field). This would have the generated builder class throw an exception if those fields are missing, and if they are there, I can check in my own code. However, I'm not yet done evaluating what repercussions this has for my code, and I'm not sure this is going to be easy.
If the data you pass to MyMessage.parseFrom() does not represent a message of that type, you will get an InvalidProtocolBufferException. Isn't that enough for you?
PB messages are not self-describing, so you need to know (by some means) which message you are trying to parse. Of course, you can try to parse them and catch InvalidProtocolBufferException, but that isn't very nice. Instead, I think most people use the approach you are describing: a base message with a type field (usually an enum) and a number of optional fields, one for each possible sub-type. This lets you parse the message and then switch on the message type to extract the actual "payload".
This seems to be what other people do, too, and it works fine for me:
message TypedMessage {
required string type = 1;
required bytes payload = 2;
}
The actual message goes into the payload field in serialized form, and the type is used to pick the proper builder and wrapper class. The field could also be an enum; I'm currently using Java class names, which I will likely replace with a different system later, since this means refactoring breaks backward compatibility of the parser.
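On the Java side, the type field can be checked before the payload is handed to a parser, which gives the "throw an error if passed the wrong type" behavior the question asks for. This is a minimal sketch under assumptions: TypedPayload is a hypothetical plain-Java stand-in for the generated TypedMessage class (so the example compiles without protoc), and PayloadParser and TypedMessages are hypothetical names too.

```java
import java.util.Objects;

// Plain-Java stand-in for the generated TypedMessage (hypothetical; real code would use the protobuf builder).
final class TypedPayload {
    final String type;
    final byte[] payload;

    TypedPayload(String type, byte[] payload) {
        this.type = Objects.requireNonNull(type);
        this.payload = Objects.requireNonNull(payload);
    }
}

// Same shape as a generated parseFrom(byte[]), which throws a checked exception.
@FunctionalInterface
interface PayloadParser<T> {
    T parse(byte[] bytes) throws Exception;
}

final class TypedMessages {
    private TypedMessages() {}

    static TypedPayload wrap(String type, byte[] serialized) {
        return new TypedPayload(type, serialized);
    }

    // Refuses to parse a payload whose declared type is not the one the caller expects.
    static <T> T unwrap(TypedPayload message, String expectedType, PayloadParser<T> parser) throws Exception {
        if (!expectedType.equals(message.type)) {
            throw new IllegalArgumentException(
                "Expected " + expectedType + " but message is " + message.type);
        }
        return parser.parse(message.payload);
    }
}
```

With generated code, a call could look like TypedMessages.unwrap(message, "Order", Order::parseFrom); switching the type string to an enum, as suggested above, would decouple the wire format from Java class names.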
