Query regarding Google Protocol Buffers - Java

message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;
}
The above is a snippet from the addrbook.proto file used in the Google Protocol Buffers tutorials.
The requirement is that the application being developed will need to decode binary data received from a socket: for example, name, id, and e-mail represented as binary data.
Now, id can be read as an integer, but I am really not sure how to read name and email, considering that these two can be of variable length. (And unfortunately, I do not get lengths prefixed before these two fields.)
The application is expected to read this kind of data from a variety of sources. Our objective is to build a decoder/adapter for the different types of data originating from these sources. There could also be different types of messages from the same source.
Thanks in advance

But I am really not sure how to read name and email considering that these two can be of variable lengths.
The entire point of a serializer such as protobuf is that you don't need to worry about that. Specifically, in protobuf, strings are always prefixed by their length in bytes (the text is UTF-8 encoded and the length is varint encoded).
And unfortunately, I do not get lengths prefixed before these two fields
Then you aren't processing protobuf data. Protobuf is a specific data format, in the same way that xml or json is a data format. Any conversation involving "protocol buffers" only makes sense if you are actually discussing the protobuf format, or data serialized using that format.
Protocol buffers is not an arbitrary data handling API. It will not allow you to process data in any format other than protobuf.
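As a concrete illustration of that length prefix, here is a hand-rolled sketch (not the official protobuf API) of how a short string field such as `name = 2` from the schema above is laid out on the wire; the value "Ada" is an arbitrary example:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class StringField {
    // Encodes a protobuf string field (wire type 2). Simplified sketch:
    // assumes the field number and the UTF-8 length each fit in one varint byte.
    static byte[] encodeStringField(int fieldNumber, String value) {
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[2 + utf8.length];
        out[0] = (byte) ((fieldNumber << 3) | 2); // tag: field number, wire type 2
        out[1] = (byte) utf8.length;              // varint length prefix
        System.arraycopy(utf8, 0, out, 2, utf8.length);
        return out;
    }

    public static void main(String[] args) {
        // name = 2, value "Ada" -> tag 0x12, length 0x03, then the UTF-8 bytes
        System.out.println(Arrays.toString(encodeStringField(2, "Ada"))); // [18, 3, 65, 100, 97]
    }
}
```

A decoder reads the tag, then the length, then exactly that many bytes, which is why variable-length strings are never a problem in real protobuf data.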

It sounds like you might be trying to re-implement Protocol Buffers by hand; you don't need to do that (though I'm sure it would be fun). Google provides C++, Java, and Python implementations to serialize and de-serialize content in protobuf format as part of the Protocol Buffers project.

Related

How to send efficiently a data array over udp between c++ and java

I am going to send a double array over a UDP socket. I am using the Winsock library in C++ on the client side and Java on the server side. My current idea is to make a string out of this numeric double data and send that; however, I feel it is not appropriate and requires conversion to numeric values on both sides. How can I send this data more efficiently?
I tried the following but received the error "argument of type "double *" is incompatible with parameter of type "const char *"" in the sendto() function:
double arrayToSend[100];
int sendOk = sendto(out, arrayToSend, sizeof(arrayToSend), 0, (sockaddr*)&server, sizeof(server));
I suggest you use Google's Protocol Buffers for handling the transfer of your array:
It is a solid mechanism for serializing/deserializing messages between remote and internal processes.
It is easy to use and learn.
It will generate the required code for sending and receiving the array in both your C++ and Java processes.
If you change your array to a different type, or decide to transfer additional information, the Protocol Buffers message format lets you evolve the transferred data easily, and the compiler will regenerate all the required boilerplate code.
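For instance, a minimal schema for this case might look like the following (the message and field names are illustrative, not from the question):

```proto
// Hypothetical schema for the transferred array; protoc generates
// matching C++ and Java classes from it.
message DoubleArray {
  repeated double values = 1 [packed = true];
}
```

Running protoc over this produces the serialization code for both sides, so neither the C++ client nor the Java server has to hand-roll the byte layout or worry about endianness.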

AS400Text class vs MQC.MQGMO_CONVERT

Why do some people prefer to use the AS400Text object to handle EBCDIC/ASCII conversion (Java code with the IBM MQ jars) if we already have the MQC.MQGMO_CONVERT option to handle this?
My requirement is to convert ASCII->EBCDIC during the PUT operation, which I am doing by setting the character set to 37 and the write format to "STRING", and to use the MQC.MQGMO_CONVERT option to automatically convert EBCDIC->ASCII during the GET operation.
Is there any downside to using the convert option? Could anyone please let me know if this is not a 100 percent safe option?
Best practice is to write the MQ message in your local code page (where the CCSID and Encoding will normally be filled in automatically with the correct values) and to set the Format field. The getter should then use MQGMO_CONVERT to request the message in the CCSID and Encoding they need.
Get with convert is safe, and will be correct as long as you provide the correct CCSID and Encoding describing the message when you put it.
In the description in your question, you convert from ASCII->EBCDIC before putting the message, and the getter then converts from EBCDIC->ASCII on the MQGET. This means you have paid for two data-conversion operations when you could have done none (or, if two different ASCIIs are involved, only one).

Google ProtoBuf serialization / deserialization

I am reading about Google Protocol Buffers. I want to know whether I can serialize a C++ object, send it on the wire to a Java server, deserialize it there in Java, and introspect the fields.
More generally, I want to send objects from any language to a Java server and deserialize them there.
Assume following is my .proto file
message Person {
  required int32 id = 1;
  required string name = 2;
  optional string email = 3;
}
I ran protoc on this and generated the C++ classes.
Basically, I now want to send the serialized stream to the Java server.
On the Java side, can I deserialize the stream so that I can find out that there are three fields in it, and each field's name, type, and value?
On the Java side, can I deserialize the stream so that I can find out that there are three fields in it, and each field's name, type, and value?
You will need to know the schema in advance. Firstly, protobuf does not transmit names; all it uses as an identifier is the numeric key (1, 2, and 3 in your example) of each field. Secondly, it does not explicitly specify the type; there are only a very few wire types in protobuf (varint, 32-bit, 64-bit, length-prefix, group). Actual data types are mapped onto those, but you cannot unambiguously decode data without the schema:
varint is "some form of integer", but could be signed, unsigned, or "zigzag" (which allows negative numbers of small magnitude to be encoded cheaply), and could be intended to represent any width of data (64-bit, 32-bit, etc.)
32-bit could be an integer, but could be signed or unsigned - or it could be a 32-bit floating-point number
64-bit could be an integer, but could be signed or unsigned - or it could be a 64-bit floating-point number
length-prefix could be a UTF-8 string, a sequence of raw bytes (without any particular meaning), a "packed" set of repeated values of some primitive type (integer, floating point, etc.), or a structured sub-message in protobuf format
groups - hoorah! this one is always unambiguous! it can only mean one thing; but that one thing is largely deprecated by Google :(
So fundamentally: you need the schema. The encoded data does not include what you want. It does this to avoid unnecessary space - if the protocol assumes that the encoder and decoder both know what the message is meant to look like, then a lot less information needs to be sent.
Note, however, that the information that is included is enough to safely round-trip a message even if there are fields that are not expected; it is not necessary to know the name or type if you only need to re-encode it to pass it along / back.
What you can do is use the parser API to scan over the data to reveal that there are three fields: field 1 is a varint, field 2 is length-prefixed, field 3 is length-prefixed. You could make educated guesses about the data beyond that (for example, you could check whether a UTF-8 decode produces something that looks roughly text-like, and verify that re-encoding it as UTF-8 gives you back the original bytes; if it does, it may well be a string).
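A hand-rolled sketch of such a scan in plain Java (the real parser API, e.g. `CodedInputStream` in the Java library, does this for you); the input bytes are a hand-encoded message with a varint field 1 and a length-prefixed field 2, matching a hypothetical Person with id = 150 and name = "John":

```java
import java.util.ArrayList;
import java.util.List;

public class WireScan {
    // Reads a base-128 varint starting at pos[0]; advances pos[0] past it.
    static long readVarint(byte[] b, int[] pos) {
        long result = 0;
        int shift = 0;
        while (true) {
            byte cur = b[pos[0]++];
            result |= (long) (cur & 0x7F) << shift;
            if ((cur & 0x80) == 0) return result;
            shift += 7;
        }
    }

    // Returns "fieldNo:wireType" for every top-level field, skipping payloads.
    static List<String> scan(byte[] msg) {
        List<String> fields = new ArrayList<>();
        int[] pos = {0};
        while (pos[0] < msg.length) {
            long tag = readVarint(msg, pos);
            int fieldNo = (int) (tag >>> 3);
            int wireType = (int) (tag & 7);
            switch (wireType) {
                case 0: readVarint(msg, pos); break;                 // varint payload
                case 1: pos[0] += 8; break;                          // 64-bit
                case 2: pos[0] += (int) readVarint(msg, pos); break; // length-prefixed bytes
                case 5: pos[0] += 4; break;                          // 32-bit
                default: throw new IllegalStateException("wire type " + wireType);
            }
            fields.add(fieldNo + ":" + wireType);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Hand-encoded: 0x08 = tag(field 1, varint), 0x96 0x01 = 150,
        // 0x12 = tag(field 2, length-prefixed), 0x04 = length, then UTF-8 "John"
        byte[] wire = {0x08, (byte) 0x96, 0x01, 0x12, 0x04, 'J', 'o', 'h', 'n'};
        System.out.println(scan(wire)); // [1:0, 2:2]
    }
}
```

Note that the scan recovers only field numbers and wire types, exactly as described above: nothing in the bytes says "id" or "name", or whether field 2 is a string rather than raw bytes.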
Can I serialize a C++ object, send it on the wire to a Java server, and deserialize it there in Java and introspect the fields?
Yes, that is the very goal of protobuf.
Serialize data in an application developed in any supported language, and deserialize it in an application developed in any supported language; the two languages can be the same or different.
Keep in mind that protocol buffers are not self-describing, so both sides of your application need serializers/deserializers generated from the .proto file.
In short: yes you can.
You will need to create .proto files which define the data structures that you want to share. By using the Google Protocol Buffers compiler you can then generate interfaces and (de)serialization code for your structures for both Java and C++ (and almost any other language you can think of).
To transfer your data over the wire you can use, for instance, ZeroMQ, an extremely versatile communications framework that also sports a slew of different language APIs, among them Java and C++.
See this question for more details.

Is it possible to serialize an object into a plain text string and send it as a string then convert it back?

As the title says: I'm sending a message from my server to a proxy which is outside of my control, which then sends it on to my application. All I can do is send and receive strings. Is it possible to serialize an object to a plain string and send it this way, without an input/output stream as you would normally have?
TIA
A little more info:
public class MyClass implements java.io.Serializable {
    int h = 3;
    int i = 4;
    String myString = "aaaa";
}
I have this class, for example. Now I want to serialize it, send it as a string inside my HTTP POST, and send it to the proxy (I can't do anything about this stage):
HttpPost post = new HttpPost("http://www.myURL.com/send.php?msg="+msg);
Then I receive the msg as a string on the other side and convert it back.
Is that easily done without too many other libraries?
Yes.
This is done every day using JSON and XML, just to name a couple of string formats that are easily formatted and parsed. (Read about JAX-RS to learn about a way to use JSON-formatted strings for this, including the transfers. Or read about JAXB, which will format as XML but doesn't help with the communication of the strings.)
You can do it in CSV format.
You can do it with fixed-width fields of characters.
Morse code isn't much of a different concept only it starts with strings and converts to short and long beeps.
The way it works is this:
There is some code to which you pass an object and it returns a string in a known format.
You send the string to the other server somehow. Some ways to send strings have limits on the length.
The other server receives the string.
Using its knowledge of the format, that other server parses out the string contents and uses it.
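A minimal sketch of those steps, assuming both servers run Java: built-in Java serialization plus Base64 turns any Serializable object into a plain string and back (URL-encode it before putting it in a query parameter):

```java
import java.io.*;
import java.util.Base64;

public class StringTransport {
    // Step 1: pass an object in, get a string in a known format back.
    static String toString(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    // Step 4: the receiver uses its knowledge of the format to parse it back.
    static Object fromString(String s) throws IOException, ClassNotFoundException {
        byte[] bytes = Base64.getDecoder().decode(s);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        String wire = toString("hello");        // plain ASCII, safe to send as text
        System.out.println(fromString(wire));   // hello
    }
}
```

Both sides must share the class definition, which is exactly the "knowledge of the format" the steps above rely on; a JSON library relaxes that to a shared field layout.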
Some notes:
If both servers use Java (or C# or Python or PHP or whatever), the formatting and parsing become symmetrical. You start with a Java object of some type and end up with a Java object of the same type in the other JVM. But that is not a given: you can store values in a custom POJO on one server and in a Map on the other.
If you write the formatting and parsing code yourself, it seems really easy as long as the contents are simple and you don't run afoul of transmission rules. For example, if you send in the query part of an HTTP GET, you can't have any ampersand characters in the string.
If you use an existing library, you take advantage of everyone else's acquired knowledge of how to do this without error.
If you use a standard format for the string, it is easy to explain what's going on to someone else. If your project works, a third server might want to join the communication loop, and if it's controlled by someone else ...
Formatting is easier than parsing. There are lots of pitfalls that other people have already solved. If you are doing this to learn ways not to do things and to improve your own knowledge base, by all means do it yourself. If you want rock-solid performance, use an existing, standard library and format.
Take a look at XStream. It serializes into XML and is very simple to use.
Take a look at their Two Minute Tutorial.
Yes, it is possible. You can serialize the object to a JSON string and send it back to the server in an ajax.post event (a JavaScript event).

Developing a (file) exchange format for java

I want to come up with a binary format for passing data between application instances in a form of POFs (Plain Old Files ;)).
Prerequisites:
should be cross-platform
information to be persisted includes a single POJO & arbitrary byte[]s (files, actually; the POJO stores their names in a String[])
only sequential access is required
should be a way to check data consistency
should be small and fast
should prevent an average user with archiver + notepad from modifying the data
Currently I'm using DeflaterOutputStream + OutputStreamWriter together with InflaterInputStream + InputStreamReader to save/restore objects serialized with XStream, one object per file. Readers/Writers use UTF8.
Now I need to extend this to support what was described above.
My idea of format:
{serialized to XML object}
{delimiter}
{String file name}{delimiter}{byte[] file data}
{delimiter}
{another String file name}{delimiter}{another byte[] file data}
...
{delimiter}
{delimiter}
{MD5 hash for the entire file}
Does this look sane?
What would you use for a delimiter and how would you determine it?
The right way to calculate MD5 in this case?
What would you suggest to read on the subject?
TIA.
It looks INsane.
Why invent a new file format?
Why try to prevent only stupid users from changing the file?
Why use a binary format (hard to compress)?
Why use a format that cannot be parsed while being received? (The receiver has to receive the entire file before being able to act on it.)
XML is already a serialization format that is compressible. So you are serializing a serialized format.
Would serialization of the model (if you are into MVC) not be another way? I'd prefer to use things in the language (or standard libraries) rather than roll my own if possible. The only issue I can see with that is that the file size may be larger than you want.
1) Does this look sane?
It looks fairly sane. However, if you are going to invent your own format rather than just using Java serialization then you should have a good reason. Do you have any good reasons (they do exist in some cases)? One of the standard reasons for using XStream is to make the result human readable, which a binary format immediately loses. Do you have a good reason for a binary format rather than a human readable one? See this question for why human readable is good (and bad).
Wouldn't it be easier just to put everything in a signed jar. There are already standard Java libraries and tools to do this, and you get compression and verification provided.
2) What would you use for a delimiter and how would you determine it?
Rather than a delimiter I'd explicitly store the length of each block before the block. It's just as easy, and prevents you having to escape the delimiter if it comes up on its own.
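A minimal sketch of that layout in Java, using the standard DataOutputStream/DataInputStream pair (names are illustrative):

```java
import java.io.*;

public class Blocks {
    // Writes a block as a 4-byte big-endian length followed by the payload,
    // so the reader never has to search for (or escape) a delimiter.
    static void writeBlock(DataOutputStream out, byte[] data) throws IOException {
        out.writeInt(data.length);
        out.write(data);
    }

    // Reads exactly one block: length first, then that many payload bytes.
    static byte[] readBlock(DataInputStream in) throws IOException {
        byte[] data = new byte[in.readInt()];
        in.readFully(data);
        return data;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        writeBlock(out, "name.txt".getBytes("UTF-8")); // a file name block
        writeBlock(out, new byte[]{1, 2, 3});          // the raw file data block

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
        System.out.println(new String(readBlock(in), "UTF-8")); // name.txt
        System.out.println(readBlock(in).length);               // 3
    }
}
```

This also keeps the sequential-access requirement trivially satisfied: the reader just alternates length/payload until end of stream.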
3) The right way to calculate MD5 in this case?
There is example code here which looks sensible.
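One sketch (not necessarily the linked example): stream the payload through a `DigestOutputStream` so the hash covers exactly the bytes written, with no second pass over the data:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.security.DigestOutputStream;
import java.security.MessageDigest;

public class Md5Trailer {
    static String hex(byte[] b) {
        StringBuilder sb = new StringBuilder();
        for (byte x : b) sb.append(String.format("%02x", x));
        return sb.toString();
    }

    // Writes the payload to `sink` while the digest shadows every byte;
    // the returned hex string is what you would append as the file trailer.
    static String writeWithDigest(byte[] payload, ByteArrayOutputStream sink) throws Exception {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        try (DigestOutputStream out = new DigestOutputStream(sink, md5)) {
            out.write(payload);
        }
        return hex(md5.digest());
    }

    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        System.out.println(writeWithDigest("hello".getBytes(StandardCharsets.UTF_8), sink));
        // prints 5d41402abc4b2a76b9719d911017c592
    }
}
```

The reader does the mirror image with a `DigestInputStream` and compares its computed digest against the stored trailer.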
4) What would you suggest to read on the subject?
On the subject of serialization? I'd read about the Java serialization, JSON, and XStream serialization so I understood the pros and cons of each, especially the benefits of human readable files. I'd also look at a classic file format, for example from Microsoft, to understand possible design decisions from back in the days that every byte mattered, and how these have been extended. For example: The WAV file format.
Let's see, this should be pretty straightforward.
Prerequisites:
0. should be cross-platform
1. information to be persisted includes a single POJO & arbitrary byte[]s (files actually, the POJO stores their names in a String[])
2. only sequential access is required
3. should be a way to check data consistency
4. should be small and fast
5. should prevent an average user with archiver + notepad from modifying the data
Well, guess what: you pretty much have it already; it's built into the platform: Object Serialization.
If you need to reduce the amount of data sent on the wire and provide custom serialization (for instance, you can send only the values 1, 2, 3 for a given object without the attribute names or anything similar, and read them back in the same sequence), you can use this somewhat hidden feature.
If you really need it in plain text you can also encode it; that takes almost the same number of bytes.
For instance this bean:
import java.io.*;

public class SimpleBean implements Serializable {
    private String website = "http://stackoverflow.com";

    public String toString() {
        return website;
    }
}
Could be represented like this:
rO0ABXNyAApTaW1wbGVCZWFuPB4W2ZRCqRICAAFMAAd3ZWJzaXRldAASTGphdmEvbGFuZy9TdHJpbmc7eHB0ABhodHRwOi8vc3RhY2tvdmVyZmxvdy5jb20=
See this answer
Additionally, if you need a sound protocol you can also check out Protobuf, Google's internal exchange format.
You could use a zip (rar / 7z / tar.gz / ...) library. Many exist, most are well tested, and it'll likely save you some time.
Possibly not as much fun, though.
I agree in that it doesn't really sound like you need a new format, or a binary one.
If you truly want a binary format, why not consider one of these first:
Binary XML (Fast Infoset, Bnux)
Hessian
Google Protocol Buffers
But besides that, many textual formats should work just fine (or perhaps better) too; they are easier to debug, have extensive tool support, and compress to about the same size as binary (binary compresses poorly; information theory suggests that for the same effective information the same compression rate is achieved, and this has been true in my testing).
So perhaps also consider:
JSON works well; binary support via Base64 (with, say, http://jackson.codehaus.org/)
XML is not too bad either; efficient streaming parsers, some with Base64 support (http://woodstox.codehaus.org/, "typed access API" under 'org.codehaus.stax2.typed.TypedXMLStreamReader')
So it kind of sounds like you just want to build something of your own. Nothing wrong with that as a hobby, but if so you need to treat it as such.
It is likely not a requirement for the system you are building.
Perhaps you could explain how this is better than using an existing file format such as JAR.
Most standard file formats of this type just use a CRC, as it's faster to calculate. MD5 is more appropriate if you want to prevent deliberate modification.
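For illustration, the standard library already ships a CRC implementation in `java.util.zip.CRC32`; a sketch of using it as the consistency check:

```java
import java.util.zip.CRC32;

public class Checksum {
    // CRC32 is cheap to compute and catches accidental corruption,
    // though unlike MD5 it offers no resistance to deliberate edits.
    static long crcOf(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] original = "payload".getBytes();
        byte[] tampered = "paylo4d".getBytes();   // single corrupted byte
        System.out.println(crcOf(original) == crcOf(tampered)); // false
    }
}
```

As with the MD5 trailer idea, the writer appends the checksum after the data blocks and the reader recomputes it over everything that precedes the trailer.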
Bencode could be the way to go.
Here's an excellent implementation by Daniel Spiewak.
Unfortunately, the bencode spec doesn't support UTF-8, which is a showstopper for me.
I might come back to this later, but currently XML seems like a better choice (with the blobs serialized as a Map).
