Hi... let me try to explain my question better...
I'm using a Boost serialization text archive before sending data over a TCP connection...
Now I need to pass the received data to a Java application... so I'd like to know whether the serialized stream is composed only of the data, or of the data plus Boost serialization metadata (tags, version codes, etc.)...
In that case, is my only option to filter the metadata out before transferring the data to the Java application? Thanks...
As far as I know, the Boost serialization text archive uses custom formatting.
For instance, it puts the serialization archive version number in the output, so you would have to filter that kind of data out on the Java side.
Even if you had used a Boost binary archive, you would not be able to deserialize it with Java.
So the answer to your question is that the Boost serialization mechanism and Java are not compatible.
If you have to use text-based communication, try JSON as the serialization format; it makes life easier.
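For example, a minimal sketch of the Java receiving side using the Gson library (the payload shape and field names here are made up; the C++ side would emit matching JSON, e.g. with a library such as nlohmann/json, instead of a Boost archive):

import com.google.gson.Gson;

// Hypothetical payload shape shared between the C++ and Java sides.
class SensorReading {
    int id;
    double value;
}

public class JsonReceiver {
    public static void main(String[] args) {
        // In practice this string would be read from the TCP socket.
        String json = "{\"id\": 7, \"value\": 3.14}";
        SensorReading r = new Gson().fromJson(json, SensorReading.class);
        System.out.println(r.id + " -> " + r.value);
    }
}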
I'm working on a proprietary TCP protocol. This protocol sends and receives messages as a specific sequence of bytes.
I must be compliant with this protocol, and I can't change it.
So my input/output looks something like this:
\x01\x08\x00\x01\x00\x00\x01\xFF
\x01 - Message type
\x00\x01 - Length
\x00\x00\x01 - Transaction
\xFF - Body
The sequence of the fields is important, and I want only the values of the fields in my serialization, nothing about the structure of the class.
I'm working on a Java controller that uses this protocol, and I thought I could define the message structures in specific classes and serialize/deserialize them, but I was naive.
First of all I tried ObjectOutputStream, but it outputs the entire structure of the object, when I need only the values in a specific order.
Someone has already faced this problem:
Java - Object to Fixed Byte Array
and solved it with a dedicated Marshaller.
But I was searching for a more flexible solution.
For text serialization and deserialization I've found:
http://jeyben.github.io/fixedformat4j/
which defines the schema of the line via annotations. But it outputs a String, not a byte[], so 1 is output as "1", which is represented differently depending on the encoding and often takes more bytes.
What I was searching for is something that, given the order of my class properties, converts each property into a bunch of bytes (based on its internal representation) and appends them to a byte[].
Do you know of a library used for that purpose?
Or a simple way to do it, without coding a serialization algorithm for each of my entities?
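For illustration, this is the kind of per-class packing I am trying to avoid writing by hand for every entity (a sketch based on the field layout above; the Message class is hypothetical):

import java.nio.ByteBuffer;

// Hypothetical message matching the layout above:
// 1-byte type, 2-byte length, 3-byte transaction, then the body.
class Message {
    byte type;
    short length;
    int transaction; // only the low 3 bytes go on the wire
    byte[] body;

    byte[] toBytes() {
        ByteBuffer buf = ByteBuffer.allocate(1 + 2 + 3 + body.length);
        buf.put(type);                       // \x01
        buf.putShort(length);                // \x00\x01 (big-endian by default)
        buf.put((byte) (transaction >> 16)); // \x00
        buf.put((byte) (transaction >> 8));  // \x00
        buf.put((byte) transaction);         // \x01
        buf.put(body);                       // \xFF
        return buf.array();
    }
}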
Serialization just isn't easy; it sounds from your question like you feel you can just invoke something and out rolls compact, simple, versionable, universal data you can then put on the wire. What you need to fix is to scratch the word 'just' from that sentence. You're going to have to invest some time and care.
As you have already figured out, Java's baked-in serialization has a ton of downsides. Don't use it.
There are various serializers. The popular ones are things like GSON or Jackson, which let you serialize Java objects into JSON. This isn't particularly efficient, and it is string-based. Those sound like crucial downsides, but they really aren't; see below.
You can also spend a little more time specifying the exact format and use protobuf, which lets you write a quite lean and simple data protocol (and protobuf is available for many languages, in case you eventually want to write a participant in this protocol in something other than Java).
So, those are the good options: Go to JSON via Jackson or GSON, or, use protobuf.
But JSON is a string.
You can turn a string into bytes trivially using str.getBytes(StandardCharsets.UTF_8). This cannot fail due to charset encoding differences, as long as you also 'decode' in the same fashion: turn the bytes back into a string with new String(theBytes, StandardCharsets.UTF_8). UTF-8 is guaranteed to be available on all JVMs; if it is not there, your JVM is as broken as a JVM that is missing the String class, not something to worry about.
But JSON is inefficient.
Zip it up, of course. You can trivially wrap an InputStream and an OutputStream so that gzip compression is applied, which is simple, available on just about every platform, and fast (it's not the most efficient cutting-edge compression algorithm, but usually squeezing out the last few bytes is not worth it). Zipped-up JSON can often be more efficient than carefully hand-rolled protobuf, even.
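For instance, a minimal sketch of gzipping JSON onto a socket (the connected Socket is assumed to exist already):

import java.io.IOException;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

class GzipSender {
    // Compresses the JSON payload on its way out over the socket.
    static void sendJson(Socket socket, String json) throws IOException {
        GZIPOutputStream out = new GZIPOutputStream(socket.getOutputStream());
        out.write(json.getBytes(StandardCharsets.UTF_8));
        out.finish(); // writes the gzip trailer without closing the underlying socket
    }
}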
The one downside is that it's 'slow', but on modern hardware, note that the overhead of encrypting and decrypting this data (which you should obviously be doing!) is usually multiple orders of magnitude more involved. A modern CPU is simply very, very fast; creating JSON and zipping it up is going to take 1% of CPU or less even if you are shipping the collected works of Shakespeare every second.
If an Arduino running on batteries needs to process this data, go with uncompressed, unencrypted protobuf-based data. If you are Facebook writing the WhatsApp protocol, the IaaS credits saved by not having to unzip and decode JSON are tiny and pale in comparison to what you spend just running the servers, but at that scale it's worth the development effort.
In just about every other case, just toss gzipped JSON on the line.
The scenario is that I need to print a very large JSON data set. This JSON data is consumed by mobile applications. In the Java service application, I print the JSON as below:
response.getWriter().println(mainjson);
getWriter is taking too much time to print all the data.
I have also heard about getOutputStream. Which is faster for large JSON data?
Any help will be appreciated :-)
It depends on how you retrieve the data and whether your JSON serializer has a streaming api available.
At the moment you are probably operating in three separate steps:
1. Retrieving all your data
2. Serializing it to a JSON string
3. Writing the JSON response
If you are spending a substantial amount of time on the retrieval and serialization parts themselves, then you can potentially speed things up by using streams. However, this requires your data retrieval and JSON serializer to support streams.
When using streams, instead of doing everything in sequential steps you basically set up a pipeline that allows you to start writing the response a bit earlier. This is not guaranteed to be faster though; it depends on where your particular bottleneck occurs. If it's almost entirely an issue with the IO to the client, then you are not going to see a substantial difference.
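For example, if you use Jackson, you can serialize straight to the response stream instead of building one big String first (a sketch, assuming a servlet environment and that data is the object you already retrieved):

import java.io.IOException;
import javax.servlet.http.HttpServletResponse;
import com.fasterxml.jackson.databind.ObjectMapper;

class JsonResponseWriter {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    // Streams the JSON directly to the client rather than
    // materializing the whole document in memory first.
    static void write(HttpServletResponse response, Object data) throws IOException {
        response.setContentType("application/json");
        MAPPER.writeValue(response.getOutputStream(), data);
    }
}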
Also
Something else to look into is whether you are compressing your response to the user. Gzip can have a substantial impact on the size of text data and may reduce your payload enough to make this a non-issue.
I want to parse an X12-format file into a JSON file using Java.
I didn't find any information about this on the internet.
Can someone please tell me how to do this? Any JAR file that can do it would also be fine.
How to do it:
1. Obtain documentation for your X12 file (do you mean HIPAA data exchange?). It will tell you about the different records, their layout, and their sequencing.
2. Define the target schema for the JSON you want to produce. Surely you don't want to produce just any JSON.
3. Define the mapping. Draw spaghetti on a whiteboard, a piece of paper, or something like Altova MapForce, until you have all elements connected.
4. Choose your transformation approach depending on the dataset size: streaming, or object-to-JSON serialization. (A skeleton of the streaming approach follows this list.)
5. Implement.
6. Look for performance bottlenecks and introduce optimizations to speed up processing.
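As a rough illustration of the streaming variant, here is a skeleton that reads X12 segments and emits JSON with Jackson's streaming API. It assumes '~' as the segment terminator and '*' as the element separator; a real parser must take the delimiters from the ISA envelope:

import java.io.*;
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonGenerator;

class X12ToJson {
    static void convert(Reader x12, Writer json) throws IOException {
        JsonGenerator gen = new JsonFactory().createGenerator(json);
        gen.writeStartArray();
        BufferedReader in = new BufferedReader(x12);
        StringBuilder segment = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            if (c == '~') { // assumed segment terminator
                String[] elements = segment.toString().split("\\*");
                gen.writeStartObject();
                gen.writeStringField("segment", elements[0]);
                gen.writeArrayFieldStart("elements");
                for (int i = 1; i < elements.length; i++) {
                    gen.writeString(elements[i]);
                }
                gen.writeEndArray();
                gen.writeEndObject();
                segment.setLength(0);
            } else {
                segment.append((char) c);
            }
        }
        gen.writeEndArray();
        gen.close();
    }
}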
I am just getting into writing networked code using sockets in Java. I'm just making some test programs. Originally I was going to send data as comma-separated values, but I recently discovered ObjectOutputStream. Which method would be faster or more bandwidth-efficient? For example, if I'm making a game where I have to send x and y coordinates very often, should I send them through a PrintWriter separated by a comma, or make a Position class and send an instance over ObjectOutputStream? What if I change my code and need to send a lot more data?
What are the pros and cons of sending data as CSV over PrintWriter vs as fields in an object over ObjectOutputStream?
An ad-hoc binary format has a good chance of being more bandwidth-efficient than the default serialization format, which should be more or less as bandwidth-efficient as a text-based format (but that's a wild guess, and it depends on the nature and amount of data: you should measure it if it matters).
But bandwidth efficiency is not the only thing that matters.
Using serialization, the client and the server must both be written in Java and have the classes of the serialized objects on their classpath. If you intend to have clients written in other languages, you shouldn't consider it.
If serialization is OK, it's of course a really easy way to transform almost any Java object into bytes, which allows you to avoid defining a format.
Note that there are alternatives that provide almost the same flexibility, but don't have the Java-only disadvantage of serialization. For example, JSON, XML, or protobuf.
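As an illustration of such an ad-hoc binary format, a sketch of shipping the two coordinates from the question with DataOutputStream: a fixed 8 bytes per update, versus a variable-length CSV line:

import java.io.*;

class PositionCodec {
    // Fixed layout: x then y, 4 bytes each, big-endian.
    static void write(DataOutputStream out, int x, int y) throws IOException {
        out.writeInt(x);
        out.writeInt(y);
    }

    static int[] read(DataInputStream in) throws IOException {
        return new int[] { in.readInt(), in.readInt() };
    }
}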
I think CSV is smaller.
If you want to check the data size, try writing the output to a file.
I also don't recommend ObjectOutputStream for another reason: you have to maintain compatibility between versions of your objects.
Have you researched serialization and serialVersionUID? Please check java.io.Serializable.
I have a large data structure which I'm serializing. At certain times I need to edit values in it, but just to change a small value I have to re-serialize the whole thing instead of updating the changed value in the file. I've heard of Google Protocol Buffers. Will using them solve my problem of rewriting the file? Are they a better option for me than Java serialization?
Protocol buffers are themselves a serialization format, so they won't fundamentally change the picture (you'll still need to re-serialize after you change a value).
Google's docs claim that protocol buffers are more compact and faster to parse than XML (which seems plausible); I don't know how they compare to native Java serialization.
Advantages of protocol buffers might be portability (if programs written in other languages need to read the file) and upgradability (you can add new fields to the data structure without breaking the file format).
A couple of points
There is an editor for Protocol Buffers binary format (http://code.google.com/p/protobufeditor/)
Protocol Buffers has a text format that looks like this:
# Textual representation of a protocol buffer.
# This is *not* the binary format used on the wire.
person {
  name: "John Doe"
  email: "jdoe@example.com"
}
See:
Discussion: http://groups.google.com/group/protobuf/browse_thread/thread/04fc478088137bf3
Class: http://code.google.com/apis/protocolbuffers/docs/reference/java/com/google/protobuf/TextFormat
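If you need to produce or consume that text form from Java, the TextFormat class handles both directions (a sketch; Person stands in for a message generated from a hypothetical person.proto, and the exact printing API varies by protobuf-java version):

import com.google.protobuf.TextFormat;

class TextFormatDemo {
    // Parses the text representation back into a message.
    static Person parse(String text) throws TextFormat.ParseException {
        Person.Builder builder = Person.newBuilder();
        TextFormat.merge(text, builder);
        return builder.build();
    }

    // Prints a message in the text representation shown above.
    static String print(Person person) {
        return TextFormat.printToString(person);
    }
}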
Having said that, I would use a technology (JSON, XML, etc.) that is already in use, unless one of the following applies:
You need the performance of protocol buffers
You already use, or plan to use, protocol buffers
If you care about performance, don't use a text format for your data. If you want to modify the data without deserializing, you'll want to use a fixed record data format. You'll probably have to invent this manually. Then seek to the correct position in the file and rewrite just the changed field. You might look at DataOutputStream to get started or instead use a database such as HSQLDB to store and edit your data.
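For example, a minimal sketch of an in-place edit with a fixed record size, using RandomAccessFile (the 32-byte record width and 8-byte field offset are assumptions you would define for your own format):

import java.io.IOException;
import java.io.RandomAccessFile;

class FixedRecordStore {
    static final int RECORD_SIZE = 32;   // assumed fixed record width
    static final int SCORE_OFFSET = 8;   // hypothetical field offset within a record

    // Overwrites one field of one record without rewriting the rest of the file.
    static void updateScore(String path, int recordIndex, long newScore) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "rw")) {
            file.seek((long) recordIndex * RECORD_SIZE + SCORE_OFFSET);
            file.writeLong(newScore);
        }
    }
}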
Thinking about this more: unless your objects are very simple, I think a database would be a better way to go.
More info on DataOutputStream:
http://download.oracle.com/javase/tutorial/essential/io/datastreams.html
Java Databases:
http://java-source.net/open-source/database-engines
You need a serialization format that can be modified directly, for example XML or JSON. Google Protocol Buffers is a binary format, as is Java serialization, and thus cannot be modified directly.