Java Socket Programming: Send Object as CSV or Serialized Object?

I am just getting into writing networked code using Sockets in Java. I'm just making some test programs. Originally I was going to send data as comma-separated values, but I recently discovered ObjectOutputStream. Which method would be faster or more bandwidth-efficient? For example, if I'm making a game where I have to send x and y coordinates very often, should I send them through a PrintWriter separated by a comma, or make a Position class and send an instance over ObjectOutputStream? What if I change my code and need to send a lot more data?
What are the pros and cons of sending data as CSV over PrintWriter vs as fields in an object over ObjectOutputStream?

An ad-hoc binary format has a good chance of being more bandwidth-efficient than the default serialization format, which in turn should be more or less as bandwidth-efficient as a text-based format (this is a wild guess, and it depends on the nature and amount of data: you should measure it if it matters).
But bandwidth efficiency is not the only thing that matters.
Using serialization, the client and the server must be written in Java, and have the classes of the serialized objects in their classpath. If you intend to have clients written in any language, you shouldn't consider it.
If serialization is OK, it's of course a really easy way to transform almost any Java object into bytes, which allows you to avoid defining a format.
Note that there are alternatives that provide almost the same flexibility, but don't have the Java-only disadvantage of serialization. For example, JSON, XML, or protobuf.
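Since the advice above is to measure, here is a minimal sketch of such a measurement. The Position class and the three encodings are assumptions made up for illustration (the question does not show any code); each method encodes one position into memory and reports the byte count.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;

class WireSizeComparison {

    // Hypothetical class standing in for the Position mentioned in the question.
    static class Position implements Serializable {
        private static final long serialVersionUID = 1L;
        final int x;
        final int y;

        Position(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Text format: "x,y" followed by a newline, encoded as UTF-8.
    static int csvSize(Position p) {
        return (p.x + "," + p.y + "\n").getBytes(StandardCharsets.UTF_8).length;
    }

    // Ad-hoc binary format: two 4-byte big-endian ints.
    static int binarySize(Position p) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(p.x);
            out.writeInt(p.y);
        }
        return buf.size();
    }

    // Default Java serialization of one object on a fresh stream.
    static int serializedSize(Position p) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(p);
        }
        return buf.size();
    }

    public static void main(String[] args) throws IOException {
        Position p = new Position(120, 345);
        System.out.println("CSV:        " + csvSize(p) + " bytes");
        System.out.println("Binary:     " + binarySize(p) + " bytes");
        System.out.println("Serialized: " + serializedSize(p) + " bytes");
    }
}
```

Note that a single object on a fresh ObjectOutputStream is a worst case for serialization: the stream header and class descriptor are written once, and later objects of the same class on the same stream refer back to that descriptor. A realistic measurement would send a representative number of objects over one stream.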

I think CSV is smaller.
If you want to check the data size, try writing the output to a file.
I also don't recommend ObjectOutputStream for another reason: you have to keep the serialized classes compatible between versions.
Have you read up on serialization and serialVersionUID?
See java.io.Serializable.
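As a rough illustration of the compatibility point, the sketch below declares an explicit serialVersionUID and round-trips an object through a byte array. The class name and fields are hypothetical. If the class does not declare serialVersionUID, Java computes one from the class's structure, so recompiling a changed class on only one side of the connection can make deserialization fail with InvalidClassException.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Hypothetical example class; not taken from the question.
class VersionedPosition implements Serializable {
    // Declared explicitly so sender and receiver agree on the version
    // even after compatible changes to the class.
    private static final long serialVersionUID = 1L;

    final int x;
    final int y;

    VersionedPosition(int x, int y) {
        this.x = x;
        this.y = y;
    }

    static byte[] toBytes(VersionedPosition p) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(p);
        }
        return buf.toByteArray();
    }

    static VersionedPosition fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (VersionedPosition) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        VersionedPosition copy = fromBytes(toBytes(new VersionedPosition(3, 4)));
        System.out.println(copy.x + "," + copy.y);
        // The version id that is written into the stream and checked on read.
        System.out.println(ObjectStreamClass.lookup(VersionedPosition.class).getSerialVersionUID());
    }
}
```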

Related

Object to bytes array in Java

I'm working on a proprietary TCP protocol. This protocol sends and receive messages with a specific sequence of bytes.
I have to comply with this protocol, and I can't change it.
So my input/output looks something like this:
\x01\x08\x00\x01\x00\x00\x01\xFF
\x01 - Message type
\x01 - Message type
\x00\x01 - Length
\x00\x00\x01 - Transaction
\xFF - Body
The order of the fields is important, and I want only the values of the fields in my serialization, with nothing about the structure of the class.
I'm working on a Java controller that uses this protocol, and I thought I could define the message structures in specific classes and serialize/deserialize them, but I was naive.
First of all I tried ObjectOutputStream, but it outputs the entire structure of the object, when I need only the values in a specific order.
Someone already faced this problem:
Java - Object to Fixed Byte Array
and solved it with a dedicated Marshaller.
But I was searching for a more flexible solution.
For text serialization and deserialization I've found:
http://jeyben.github.io/fixedformat4j/
which uses annotations to define the schema of each line. But it outputs a String, not a byte[]. So 1 is output as "1", which is represented differently depending on the encoding, and often with more bytes.
What I was searching for is something that, given the order of my class properties, will convert each property into a bunch of bytes (based on the internal representation) and append them to a byte[].
Do you know some library used for that purpose?
Or a simple way to do that, without coding a serialization algorithm for each of my entities?
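For reference, the hand-written approach the question hopes to avoid can be sketched with java.nio.ByteBuffer. The field names and widths below are assumptions inferred from the sample bytes (1 + 1 + 2 + 3 + 1), and because the question does not say what the second byte means, it is simply called secondByte here.

```java
import java.nio.ByteBuffer;

// Hypothetical message class; layout assumed from the sample bytes in the question.
class FixedLayoutMessage {
    final byte type;        // 1 byte
    final byte secondByte;  // 1 byte, meaning not stated in the question
    final int length;       // 2 bytes, unsigned, big-endian
    final int transaction;  // 3 bytes, unsigned, big-endian
    final byte[] body;      // remaining bytes

    FixedLayoutMessage(byte type, byte secondByte, int length, int transaction, byte[] body) {
        this.type = type;
        this.secondByte = secondByte;
        this.length = length;
        this.transaction = transaction;
        this.body = body;
    }

    // Writes only the field values, in declaration order.
    byte[] toBytes() {
        ByteBuffer buf = ByteBuffer.allocate(1 + 1 + 2 + 3 + body.length); // big-endian by default
        buf.put(type);
        buf.put(secondByte);
        buf.putShort((short) length);
        buf.put((byte) (transaction >>> 16)); // ByteBuffer has no 3-byte put, so write byte by byte
        buf.put((byte) (transaction >>> 8));
        buf.put((byte) transaction);
        buf.put(body);
        return buf.array();
    }

    static FixedLayoutMessage fromBytes(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        byte type = buf.get();
        byte secondByte = buf.get();
        int length = buf.getShort() & 0xFFFF;
        int transaction = ((buf.get() & 0xFF) << 16) | ((buf.get() & 0xFF) << 8) | (buf.get() & 0xFF);
        byte[] body = new byte[buf.remaining()];
        buf.get(body);
        return new FixedLayoutMessage(type, secondByte, length, transaction, body);
    }

    public static void main(String[] args) {
        FixedLayoutMessage msg =
                new FixedLayoutMessage((byte) 0x01, (byte) 0x08, 1, 1, new byte[] {(byte) 0xFF});
        StringBuilder hex = new StringBuilder();
        for (byte b : msg.toBytes()) {
            hex.append(String.format("\\x%02X", b));
        }
        System.out.println(hex); // prints \x01\x08\x00\x01\x00\x00\x01\xFF
    }
}
```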

Serialization just isn't easy; it sounds from your question like you feel you can just invoke something and out rolls compact, simple, versionable, universal data you can then put on the wire. What you need to fix is to scratch the word 'just' from that sentence. You're going to have to invest some time and care.
As you figured out already, Java's built-in serialization has a ton of downsides. Don't use that.
There are various serializers. The popular ones are things like GSON or Jackson, which let you serialize Java objects into JSON. This isn't particularly efficient, and is string-based. These sound like crucial downsides but they really aren't, see below.
You can also spend a little more time specifying the exact format and use protobuf, which lets you write a quite lean and simple data protocol (and protobuf is available for many languages, in case you later want to write a participant in this protocol in something other than Java).
So, those are the good options: Go to JSON via Jackson or GSON, or, use protobuf.
But JSON is a string.
You can turn a string into bytes trivially using str.getBytes(StandardCharsets.UTF_8). This cannot fail due to charset encoding differences, as long as you also decode in the same fashion: turn the bytes back into a string with new String(theBytes, StandardCharsets.UTF_8). UTF-8 is guaranteed to be available on all JVMs; if it is not there, your JVM is as broken as a JVM that is missing the String class - not something to worry about.
But JSON is inefficient.
Zip it up, of course. You can trivially wrap an InputStream and an OutputStream so that gzip compression is applied, which is simple, available on just about every platform, and fast (it's not the most efficient cutting-edge compression algorithm, but usually squeezing the last few bytes out is not worth it) - and zipped-up JSON can often be more efficient than carefully hand-rolled protobuf, even.
The one downside is that it's 'slow', but on modern hardware, note that the overhead of encrypting and decrypting this data (which you should obviously be doing!!) is usually multiple orders of magnitude more involved. A modern CPU is simply very, very fast - creating JSON and zipping it up is going to take 1% of CPU or less even if you are shipping the collected works of Shakespeare every second.
If an Arduino running on batteries needs to process this data, go with uncompressed, unencrypted protobuf-based data. If you are Facebook and writing the WhatsApp protocol, the IaaS credits saved by not having to unzip and decode JSON are tiny and pale in comparison to the credits you spend just running the servers, but at that scale it's worth the development effort.
In just about every other case, just toss gzipped JSON on the line.
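The "gzipped JSON" recipe above could look roughly like the following sketch. It only uses JDK classes; the JSON string is written by hand here as a stand-in for whatever Jackson or GSON would produce, and readAllBytes assumes Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipText {

    // String -> UTF-8 bytes -> gzip.
    static byte[] compress(String text) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buf)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return buf.toByteArray();
    }

    // gzip -> UTF-8 bytes -> String.
    static String decompress(byte[] data) throws IOException {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return new String(gzip.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hand-written JSON, repeated to mimic a larger payload.
        StringBuilder json = new StringBuilder("[");
        for (int i = 0; i < 100; i++) {
            json.append(i == 0 ? "" : ",").append("{\"x\":10,\"y\":20}");
        }
        json.append("]");

        byte[] packed = compress(json.toString());
        System.out.println("raw bytes:     " + json.toString().getBytes(StandardCharsets.UTF_8).length);
        System.out.println("gzipped bytes: " + packed.length);
        System.out.println("round trip ok: " + decompress(packed).equals(json.toString()));
    }
}
```

On a socket, the same wrapping would be applied to the socket's streams instead of byte arrays. Very small payloads can grow under gzip because of its fixed header and trailer, so the gain depends on message size.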

Difference in transferring data between pipes and serialization in Java and C?

I am studying interprocess communication methods in an Operating System Concepts course.
I don't really understand the mechanism for transferring data. With a pipe, a conduit is created between two processes to transfer byte streams, right?
And what about serialization?
I know serialization is a way to convert an object into a byte stream for transfer, so that the object can be rebuilt when it reaches the destination.
So in which cases do we use serialization or a pipe to transfer data?
What are the advantages and disadvantages of each?
Can anyone explain the underlying mechanism these methods use to transfer data? And are these mechanisms different between Java and C, or are they the same?
Thanks in advance.

There are two basic types of pipe in UNIX/Linux: a named pipe and an anonymous one.
An anonymous pipe is created by the "pipe()" system call, which returns 2 file descriptors associated with a newly created pipe, one for writing data, the other for reading from it. The shell uses anonymous pipes to connect the standard output of one process to the standard input of another when you connect two processes with the "|" operator.
A named pipe appears as a file in the file system, and can be opened with the normal "open()" system call.
In blocking mode (the default), the process that reads from the pipe will block until data appears there; the writer can then send data which will appear as a byte stream to the reader.
The important fact here is that the data that is transferred is a byte stream. The sender and receiver of the data must agree on a protocol to determine how to interpret the bytes. One typical method for this is serialization. Consider a 32-bit integer: 4 bytes. Some systems store the most significant byte first (known as big-endian); others store the least significant byte first (little-endian, such as x86). When transmitting such data across a network, serialization is important, since it is entirely possible that each end stores the data in a different order.
But even when transmitting data between two processes on the same host, serialization helps. It can be used to encapsulate objects so that the receiver knows when it has received everything. For example, with our 32 bit integer, if the receiver doesn't know it is expecting an integer, and gets 3 bytes (the 4th having been delayed by some scheduling), it must know that it needs to wait before continuing.
None of this is particularly language-specific, save that some languages have built-in support for serialization. Java is one such language (see ObjectInputStream and ObjectOutputStream). If you are trying to move data between Java and C programs, and on the Java side you want to use these classes, then you'll need to understand the serialization protocol used by them.
Another common serialization technique is JSON (JavaScript Object Notation), for which there exist several good libraries in C and Java.
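The byte-order point above can be illustrated with a small sketch using java.nio.ByteBuffer. The class and method names are made up for this example; it encodes the same 32-bit integer in both orders and shows what happens when the reader assumes the wrong one.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class EndianDemo {

    // Encodes a 32-bit integer as 4 bytes in the given byte order.
    static byte[] encode(int value, ByteOrder order) {
        return ByteBuffer.allocate(Integer.BYTES).order(order).putInt(value).array();
    }

    // Decodes 4 bytes as a 32-bit integer in the given byte order.
    static int decode(byte[] bytes, ByteOrder order) {
        return ByteBuffer.wrap(bytes).order(order).getInt();
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        int value = 0x01020304;
        byte[] big = encode(value, ByteOrder.BIG_ENDIAN);
        byte[] little = encode(value, ByteOrder.LITTLE_ENDIAN);
        System.out.println(hex(big));    // 01 02 03 04
        System.out.println(hex(little)); // 04 03 02 01

        // Reading little-endian bytes as big-endian yields a different number.
        System.out.println(Integer.toHexString(decode(little, ByteOrder.BIG_ENDIAN))); // 4030201
    }
}
```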

I don't really understand the mechanism in transferring data. In the case of pipe method, a conduit will be created between 2 process to transfer byte streams , right?
A named or anonymous pipe is a stream rather like a socket connection over loop back. In fact in some OSes, it is implemented by the same drivers/library.
And how about Serialization?
How serialization is done is not language-specific, and you can serialize data in a manner which can be shared between C and Java.
What are the advantages and disadvantages of each?
There are many forms of serialization and this is too broad a topic to cover in one answer. You could do an entire thesis on it.
Can anyone explain to me a very deep mechanism in transferring data of these methods?
There isn't much to it. A block of data is copied to memory managed by the OS, and this buffered data can be read by another program (or the same one).
And are these mechanisms different between Java and C? , or it is the same?
They both use the same OS calls to do the real work. The Java API hides this fact from you and makes it more Java friendly, but they are the same.

Socket streaming in Java

When continuously writing/reading sets of data through a socket, how do you recognize the end of 1 set, the start of the next set, and if the entire set is even in the stream for retrieval yet, and not just a piece of it?
To make things simple let's say I'm sending JSON strings through the socket. How do I know if the whole object is there, and get that object from start to finish so I can correctly read it? Keep in mind there may be more objects behind this one.

That depends. If you use an ObjectOutputStream, then Java takes care of this for you. Obviously this is Java-specific and requires you to have an ObjectInputStream on the other side. It also expects that you send serializable objects to the other side. String, however, is a serializable object, and I would in general expect any data structure to be serializable.
Otherwise you will have to think of some kind of container format yourself. Nowadays it is also pretty common to use XML structures to serialize the data into. If you go to an even higher level you get to the point of using web-services.
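One simple container format of the kind mentioned above is a length prefix: each message is sent as a 4-byte length followed by that many bytes. This is only one possible framing (delimiters such as newlines are another), and the class and method names below are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

class LengthPrefixedFraming {

    // Writes one message: 4-byte big-endian length, then the UTF-8 payload.
    static void writeMessage(DataOutputStream out, String json) throws IOException {
        byte[] payload = json.getBytes(StandardCharsets.UTF_8);
        out.writeInt(payload.length);
        out.write(payload);
        out.flush();
    }

    // Reads exactly one message. readInt and readFully block until the
    // required bytes have arrived, and throw EOFException if the stream
    // ends first, so a partially received message is never returned.
    static String readMessage(DataInputStream in) throws IOException {
        int length = in.readInt();
        byte[] payload = new byte[length];
        in.readFully(payload);
        return new String(payload, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // In-memory streams stand in for socket.getOutputStream() / getInputStream().
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(wire);
        writeMessage(out, "{\"x\":1}");
        writeMessage(out, "{\"x\":2,\"y\":3}");

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire.toByteArray()));
        System.out.println(readMessage(in)); // {"x":1}
        System.out.println(readMessage(in)); // {"x":2,"y":3}
    }
}
```

A real receiver would also want to reject implausibly large lengths before allocating the payload array.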

What is the fastest way to output a large amount of data?

I have a JAX-RS web service that calls a DB2 z/OS database and returns about 240 MB of data in a result set. I am then creating an OutputStream to send this data to the client by looping through the result set and adding a few XML tags for my output.
I am confused about which to use: PrintWriter, BufferedWriter or OutputStreamWriter. I am looking for the fastest way to deliver the data. I also don't want the JVM to hold onto this data any longer than it needs to, so I don't use up its memory.
Any help is appreciated.

You should:
Use BufferedWriter.
Call .flush() frequently.
Enable gzip for best compression.
Start thinking about a different way of doing this. Can your data be paginated? Do you need all the data in one request?
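The first two suggestions can be sketched as follows. This is an illustration under assumptions, not the asker's code: an Iterator of strings stands in for the ResultSet loop, the row values are assumed to be already XML-escaped, and the flush interval is an arbitrary number. Gzip is left out because in a web service it is usually enabled in the container or a filter rather than in this loop.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Iterator;

class StreamingXmlWriter {
    // Arbitrary interval; tune by measuring.
    static final int FLUSH_EVERY = 1000;

    // Streams rows to the target one at a time so that only the writer's
    // buffer, not the whole document, is held in memory.
    // The caller owns the target stream and is responsible for closing it.
    static void writeRows(Iterator<String> rows, OutputStream target) throws IOException {
        BufferedWriter writer =
                new BufferedWriter(new OutputStreamWriter(target, StandardCharsets.UTF_8));
        writer.write("<rows>\n");
        int count = 0;
        while (rows.hasNext()) {
            writer.write("<row>");
            writer.write(rows.next()); // assumed to be XML-escaped already
            writer.write("</row>\n");
            if (++count % FLUSH_EVERY == 0) {
                writer.flush(); // push buffered data to the client periodically
            }
        }
        writer.write("</rows>\n");
        writer.flush();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        writeRows(Arrays.asList("a", "b", "c").iterator(), sink);
        System.out.print(new String(sink.toByteArray(), StandardCharsets.UTF_8));
    }
}
```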

If you are sending large binary data, you probably don't want to use XML. When XML is used, binary data is usually represented using base64, which is larger than the original binary and uses quite a lot of CPU for the conversion.
If I were you, I'd send the binary separately from the XML. If you are using a web service, an MTOM attachment could help. Otherwise you could send a reference to the binary data in the XML, and let the application download the binary data separately.
As for the fastest way to send binary, if you are using WebLogic, just writing to the response's output stream would be OK. That output stream is most probably buffered, and whatever you do probably won't change the performance anyway.
Turning on gzip could also help, depending on what you are sending: if you are sending JPEGs or other data that is already compressed, it won't help much, but if you are sending raw text it can help a lot.

One solution (which might not work for you) is to spawn a job or thread that creates a file and then notifies the user when the file is ready to download. That way you're not tied to the bandwidth of the client connection (and you can even compress the file properly before the client downloads it).
Some business intelligence and data-crunching applications do this, especially if the process takes some time to generate the data.

The maximum output speed will be limited by network bandwidth, and I am sure any Java OutputStream will be fast enough that you won't notice the difference.
The choice depends on the data to send: for text (lines), PrintWriter is easy; for a byte array, use OutputStream.
To avoid holding too much data in the buffers, you should call flush() every x KB or so.

You should never use PrintWriter to output data over a network. First of all, it creates platform-dependent line breaks. Second, it silently catches all I/O exceptions, which makes it hard for you to deal with those exceptions.
And if you're sending 240 MB as XML, then you're definitely doing something wrong. Before you start worrying about which stream class to use, try to reduce the amount of data.
EDIT:
The advice about PrintWriter (and PrintStream) came from a book by Elliotte Rusty Harold. I can't remember which one, but it was a few years ago. I think that ServletResponse.getWriter() was added to the API after that book was written - so it looks like Sun didn't follow Rusty's advice. I still think it was good advice - for the reasons stated above, and because it can tempt implementation authors to violate the API contract in order to get predictable behavior.
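The point about swallowed exceptions can be demonstrated with a small sketch. FailingStream is a made-up stand-in for a broken connection; the demo shows that println does not throw and that the failure is only visible through checkError(), whereas a BufferedWriter surfaces the IOException.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

class PrintWriterErrorDemo {

    // Hypothetical stand-in for a dead network connection: every write fails.
    static class FailingStream extends OutputStream {
        @Override
        public void write(int b) throws IOException {
            throw new IOException("connection lost");
        }
    }

    // Returns the PrintWriter's error flag after a failed write.
    static boolean printWriterHidesFailure() {
        PrintWriter writer = new PrintWriter(new FailingStream());
        writer.println("x=10,y=20"); // no exception, and the line ending is platform-dependent
        return writer.checkError();  // flushes, and reports the swallowed failure as a flag
    }

    // Returns true if the same write through a BufferedWriter throws.
    static boolean bufferedWriterThrows() {
        try (BufferedWriter writer = new BufferedWriter(
                new OutputStreamWriter(new FailingStream(), StandardCharsets.UTF_8))) {
            writer.write("x=10,y=20\n");
            writer.flush();
            return false;
        } catch (IOException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("PrintWriter error flag:  " + printWriterHidesFailure());
        System.out.println("BufferedWriter throws:   " + bufferedWriterThrows());
    }
}
```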

Protocol Buffer better than serialization?

I have a large data structure which I'm serializing. At certain times I need to edit values in the data structure. But just to change a small value I have to re-serialize the whole thing instead of updating the changed value in the file. I've heard of Google's protocol buffers. Will using them solve my problem of rewriting the file? Are protocol buffers a better option for me than Java serialization?

Protocol buffers are themselves a serialization format, so they won't fundamentally change the picture (you'll still need to re-serialize after you change a value).
Google's docs claim that protocol buffers are more compact and faster to parse than XML (which seems plausible); don't know how they compare to native Java serialization.
Advantages of protocol buffers might be portability (if programs written in other languages need to read the file) and upgradability (you can add new fields to the data structure without breaking the file format).

A couple of points
There is an editor for Protocol Buffers binary format (http://code.google.com/p/protobufeditor/)
Protocol buffers has a text format that looks like:
# Textual representation of a protocol buffer.
# This is *not* the binary format used on the wire.
person {
name: "John Doe"
email: "jdoe@example.com"
}
See:
Discussion: http://groups.google.com/group/protobuf/browse_thread/thread/04fc478088137bf3
Class: http://code.google.com/apis/protocolbuffers/docs/reference/java/com/google/protobuf/TextForm
Having said that, I would use a technology (JSON, XML, etc.) that is already in use unless one of the following applies:
You need the performance of protocol buffers
You already use, or plan to use, protocol buffers

If you care about performance, don't use a text format for your data. If you want to modify the data without deserializing, you'll want to use a fixed record data format. You'll probably have to invent this manually. Then seek to the correct position in the file and rewrite just the changed field. You might look at DataOutputStream to get started or instead use a database such as HSQLDB to store and edit your data.
Thinking about this more, unless your objects are very simple, I think a database would be a better way to go.
More info on DataOutputStream:
http://download.oracle.com/javase/tutorial/essential/io/datastreams.html
Java Databases:
http://java-source.net/open-source/database-engines
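The fixed-record idea described above can be sketched with RandomAccessFile, which offers the same writeInt/writeDouble methods as DataOutputStream plus seek. The record layout here (an int id followed by a double value) is invented for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

class FixedRecordFile {
    // Hypothetical layout: int id (4 bytes) followed by double value (8 bytes).
    static final int RECORD_SIZE = Integer.BYTES + Double.BYTES;
    static final int VALUE_OFFSET = Integer.BYTES;

    // Appends one record at the end of the file.
    static void append(RandomAccessFile file, int id, double value) throws IOException {
        file.seek(file.length());
        file.writeInt(id);
        file.writeDouble(value);
    }

    // Overwrites only the value field of the record at the given index.
    static void updateValue(RandomAccessFile file, int index, double newValue) throws IOException {
        file.seek((long) index * RECORD_SIZE + VALUE_OFFSET);
        file.writeDouble(newValue);
    }

    static double readValue(RandomAccessFile file, int index) throws IOException {
        file.seek((long) index * RECORD_SIZE + VALUE_OFFSET);
        return file.readDouble();
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("records", ".bin");
        tmp.deleteOnExit();
        try (RandomAccessFile file = new RandomAccessFile(tmp, "rw")) {
            append(file, 1, 1.5);
            append(file, 2, 2.5);
            append(file, 3, 3.5);
            updateValue(file, 1, 99.0); // rewrites 8 bytes, not the whole file
            System.out.println(readValue(file, 1));
            System.out.println(file.length()); // still 3 records of 12 bytes
        }
    }
}
```

This only works while every record has the same size; variable-length fields such as strings need padding to a fixed width or an index, which is part of why a database may be the simpler choice.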

You need a serialization format that can be modified directly, for example XML or JSON. Google protocol buffers are a binary format, like Java serialization, and thus cannot be modified directly...
