How to verify that an object transfer is complete - Java

I want to send a message (a serializable object) from one Java instance to another over a network. I would like to verify that the whole object has been sent correctly.
I suppose my first step is to calculate the checksum of the object. Then I include that checksum in the object OR build a container object for the message and its checksum.
Then my second step should be to verify the checksum against the object on the other side.
My third step would be for the receiver to send a confirmation message saying that the object in question was received and that the checksum has passed (or not). If I receive a failed checksum warning, I try to resend it a few times.
After a little while, if I never receive a confirmation, I resend the message a few times as well.
Questions:
Does my protocol sound right for verifying that an object was transferred correctly?
I would also like to know how I am supposed to implement this in Java. Do I use the CRC32 class to generate the checksum?
Bonus question: if I were to compress each message, do I generate the checksum before or after the compression, and how do I compress an object in Java?

If you have a reasonably reliable network with a low error rate, you shouldn't need to add an additional checksum. I would implement your protocol without a checksum first and add one if you are sure you need it.
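If you do end up needing one, here is a minimal sketch using java.util.zip.CRC32 (the payload variable is assumed to hold your serialized object bytes):

import java.util.zip.CRC32;

// Compute a CRC32 over the serialized bytes; send the value alongside
// the payload and have the receiver recompute it and compare.
CRC32 crc = new CRC32();
crc.update(payload);            // payload: byte[] of the serialized object
long checksum = crc.getValue();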
You can compress the data with DeflaterOutputStream and InflaterInputStream. If the compressed data is corrupted, reading the object back is highly likely to throw an exception when unpacking it, i.e. it is very unlikely to fail with a subtle error.
However, unless your objects are large, they may not compress very well and could end up being larger after compression.
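For illustration, a minimal sketch of that serialize-compress round trip (variable names are mine, not from the question; exception handling omitted):

import java.io.*;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Serialize an object through a DeflaterOutputStream...
ByteArrayOutputStream bos = new ByteArrayOutputStream();
try (ObjectOutputStream oos = new ObjectOutputStream(new DeflaterOutputStream(bos))) {
    oos.writeObject(message);   // message: any Serializable
}
byte[] compressed = bos.toByteArray();

// ...and read it back through an InflaterInputStream. Corrupted input
// typically fails here with an IOException rather than a subtle error.
try (ObjectInputStream ois = new ObjectInputStream(
        new InflaterInputStream(new ByteArrayInputStream(compressed)))) {
    Object received = ois.readObject();   // also throws ClassNotFoundException
}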

For compression, I would recommend the Apache Commons Compress zip utilities:
http://commons.apache.org/compress/apidocs/org/apache/commons/compress/archivers/zip/package-summary.html
If you are compressing, then you can skip the checksum. Compression algorithms are quite sensitive to data damage: if the object fails to decompress on the other end, then you lost some information during transmission.

I agree with your steps, with one addition (this is what I do, over ObjectIO streams) -
[1] Read the incoming data as an Object and check its class with instanceof. If it is not the expected class, it is time to debug what is coming in.
In some other situations, I also send out Strings that contain info about the contents of the object to arrive next. Parse this String, read the object, typecast it, make sure it has what the info in the String said, and write out the confirmation :)
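A rough sketch of that receive step (the stream variables and the MyMessage class are placeholders):

// in: ObjectInputStream, out: ObjectOutputStream (both assumed set up)
Object incoming = in.readObject();
if (incoming instanceof MyMessage) {
    MyMessage msg = (MyMessage) incoming;
    out.writeObject("OK");      // confirmation back to the sender
} else {
    // not the expected class - time to debug what is coming in
}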

Related

How do I start reading bytes through an input stream from a specific location in the stream?

I am using the URL class in Java and I want to read bytes through an InputStream from a specific byte location in the stream, instead of using the skip() function, which takes a lot of time to get to that specific location.
I suppose it is not possible, and here is why: when you send a GET request, the remote server does not know that you are only interested in bytes 100 till 200 - it sends you the full document/file. You need to read those bytes, but you don't need to handle them - that is why skip() is slow.
But: I am sure that you can tell the server (some of them support it, some don't) that you want the file from byte 100 onwards.
Also: see this to get in-depth knowledge about skip mechanics: How does the skip() method in InputStream work?
The nature of streams means you will need to read through all the data to get to the specific place you want to start from. You will not get faster than skip(), unfortunately.
The simple answer is that you can't.
If you perform a GET that requests the entire file, you will have to use skip() to get to the part that you want. (In fact, the slowness is most likely because the server has to send all of the skipped data to the client. That is how TCP/IP works...)
However, there is a possible alternative. The HTTP 1.1 specification supports fetching partial documents using the Range header. If your server supports this, then you can ask the server to send you just the range of the document that you are interested in. However, you may need to deal with the case where the server ignores the Range header and sends the entire document anyway.
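A sketch of that Range approach with HttpURLConnection (the URL and byte range are placeholders):

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

URL url = new URL("http://example.com/file.bin");
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setRequestProperty("Range", "bytes=100-199");   // ask for bytes 100..199
if (conn.getResponseCode() == HttpURLConnection.HTTP_PARTIAL) {   // 206
    try (InputStream in = conn.getInputStream()) {
        // read only the requested range
    }
} else {
    // server ignored the Range header; fall back to skip()
}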

How to back up data on battery removal in Android?

I am developing an application that writes data to a file. Let's assume that while it is writing the data, the battery is pulled. What will happen to the file? Will it be half-written (corrupted), empty, or the same as before we wrote to it? My guess is that it will be corrupted. How do I check whether it is corrupted when we restart the phone, given that the file was storing an ArrayList of objects? Will Java throw a corrupted-file exception, say that the read ArrayList is null, or report an unknown object?
PS: maybe create another file that keeps an MD5 checksum of the data file? Whenever I write to the data file, I first produce its checksum; then when I read from the data file, I produce a checksum and compare it with the previous one. That will indicate whether my data is intact, but it won't allow me to roll back to a previous (pre-corruption) state. I would like a method that is as lightweight as possible; I am already using the CPU too much by reading/writing changes to my storage on every attribute change of a set of thousands. Probably a database would have been a better idea.
I can't say how Java will read in a corrupted serialized array, but for safety let's assume that there's no error detection.
In that case, you have two easy options:
1. Store a checksum of your data inside your data structure, before you serialize it.
2. Compute the checksum of the final serialized file.
Either approach will work, though the first option might be a bit faster, since you compute the checksum before anything is written to disk (and therefore avoid an extra round of file I/O).
As you mentioned, MD5 would be fine for this. (Even a basic CRC would probably be fine - you don't need a cryptographic hash for this.)
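A sketch of digesting the data file (the file name and buffer size are arbitrary; exception handling omitted):

import java.io.FileInputStream;
import java.io.InputStream;
import java.security.MessageDigest;

// Throws NoSuchAlgorithmException / IOException; handling omitted here.
MessageDigest md = MessageDigest.getInstance("MD5");
try (InputStream in = new FileInputStream("data.bin")) {
    byte[] buf = new byte[8192];
    for (int n; (n = in.read(buf)) != -1; ) {
        md.update(buf, 0, n);
    }
}
byte[] digest = md.digest();    // store this; recompute and compare on read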
If you want to allow rolling back to a previous version -- I'd just store each version as a separate file and then have a pointer to the most recent one. (If you update the pointer as the last step of your write operation, this will also provide an extra level of protection against corrupt data being input to your app -- though you'll have to prepare for this pointer to be corrupt as well. Since this is essentially a commit step, you could interpret a corrupt pointer as "use the last version".)
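A sketch of that pointer scheme (file names and the writeData() helper are hypothetical; note that whether renameTo() is atomic is platform-dependent):

import java.io.File;
import java.io.FileWriter;
import java.io.Writer;

// Write the new version to its own file first...
File version = new File(dir, "data-" + System.currentTimeMillis() + ".bin");
writeData(version);                         // hypothetical helper
// ...then update the pointer as the final commit step.
File tmp = new File(dir, "latest.txt.tmp");
try (Writer w = new FileWriter(tmp)) {
    w.write(version.getName());
}
// The rename is the commit; check the boolean result in real code.
tmp.renameTo(new File(dir, "latest.txt"));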
And yes, at this point you might want to just use the SQLite functionality built into Android. :)

Unknown AMF type & other errors when transferring files from Flash to Java

I'm using a Flash tool to transfer data to Java. I'm having problems when it comes to sending multiple objects at once. The objects being sent are just generic Object objects, so it's not a case of needing to register a class alias or anything.
Sending one object works fine. Once I start sending multiple objects (putting the same Objects in an Array and sending that), it starts getting weird. Up to three objects in an Array seems to work fine. More than that I get different errors in the readObject() function, such as:
Unknown AMF type '47'
Unknown AMF type '40'
Unknown AMF type '20'
OutOfBoundsExceptions index 23, size 0
NullPointerException
etc.
Sending 3 objects will work, sending 4 gives me the error. If I delete one of the previous 3 that worked (while keeping the fourth that was added), it'll work again. Anyone know what's going on?
Some more info:
Communication goes through a Socket class on the Flash side. This is pure AS3, no flex.
Messages are compressed before being sent and decompressed on the server, so I'm pretty sure it's not a buffer size issue (unless I'm missing something)
BlazeDS version seems to be 4.0.0.14931 on the jar
Flash version is 10.1 (it's an AIR app)
Update with rough code
Examples of the objects being sent:
var o:Object = { };
o._key = this._key.toString();
o.someParam = someString;
o.someParam2 = someInt;
o.someParam3 = [someString1, someString2, someString3];
...
It's added to our event object (which we use to determine the event to call, the data to pass, etc.). The event object has been registered as a class alias.
That object is sent to the server through a Socket like so:
myByteArray.writeObject( eventObj );
myByteArray.compress();
mySocket.writeBytes( myByteArray );
mySocket.flush();
On the server side, we receive the bytes and decompress them. We create an Amf3Input object, set the input stream, then read it:
Amf3Input amf3Input = new Amf3Input( mySerializationContext );
amf3Input.setInputStream( new ByteArrayInputStream( buffer ) ); // buffer is a byte[]
MyEventObj eventObj = (MyEventObj)amf3Input.readObject(); // MyEventObj is the server version of the client event object
If it's going to crash with an "unknown AMF type" error, it does so immediately, i.e. when we try to read the object, not while it's reading a subobject.
In stepping through the read code, it seems that when I pass an array of objects, if the length is <= 4, it reads the length correctly. If the length is bigger than that, it reads the length as 4.
If you're getting AMF deserialization errors, there are several possible issues that could be contributing to the problem. Here are several techniques for doing further diagnostics:
Use a network traffic sniffer to make sure that what you are sending matches what you are receiving. On the Mac I'll use CocoaPacketAnalyzer, or you can try Charles, which can actually decode AMF packets that it notices.
Feed the data to a different AMF library, like PyAMF or RocketAMF to see if it's a problem with BlazeDS or with how you're calling it. It's also possible that you may get a different error message that will give you a better idea of where it's failing.
Check the format of the AMF packet. AMF server calls have some additional wrapping around them that would throw off a deserializer if it's not expecting to encounter that wrapping, and vice versa for purely serialized objects. Server call packets always start off with a 0x00, followed by the AMF version (0x00, 0x03, or in rare cases 0x02).
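For that last check, a quick way to eyeball the start of the received buffer (buffer is the byte[] from the question):

// Dump the first bytes; a server-call packet should start with 0x00
// followed by the AMF version byte (0x00 or 0x03).
for (int i = 0; i < Math.min(16, buffer.length); i++) {
    System.out.printf("%02x ", buffer[i] & 0xff);
}
System.out.println();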
OK, I figured out the problem. Messages are compressed before being sent and decompressed on the server. What I didn't see was that the byte[] buffer that the message was being decompressed into was always 1024 bytes long, which was fine for small arrays of objects. Once that size was exceeded, however, the buffer would be overrun (I'm not quite sure what happens in Java when you try to write more bytes than there is room for - whether it wraps around or shifts the data).
When it came to reading the AMF object, the first thing it does is read an int, which it uses to determine what type of object it's trying to decode. As this int was gibberish (47, 110, -10), it was failing.
Time to start prepending message lengths I think :)
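Something like this length-prefixed framing, perhaps, in Java terms (the Flash side would write the length analogously; stream variables are placeholders):

import java.io.DataInputStream;
import java.io.DataOutputStream;

// Sender: write the payload size first, then the payload.
DataOutputStream out = new DataOutputStream(socketOut);
out.writeInt(payload.length);
out.write(payload);
out.flush();

// Receiver: size the buffer to the message instead of a fixed 1024 bytes.
DataInputStream in = new DataInputStream(socketIn);
byte[] buffer = new byte[in.readInt()];
in.readFully(buffer);           // blocks until the whole message arrives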
Thanks for the help.

Exceptions when reading protobuf messages in Java

I have been using protobuf for some weeks now, but I still keep getting exceptions when parsing protobuf messages in Java.
I use C++ to create my protobuf messages and send them with Boost sockets to a server socket where the Java client is listening. The C++ code for transmitting the message is this:
boost::asio::streambuf b;
std::ostream os(&b);
ZeroCopyOutputStream *raw_output = new OstreamOutputStream(&os);
CodedOutputStream *coded_output = new CodedOutputStream(raw_output);
coded_output->WriteVarint32(agentMessage.ByteSize());
agentMessage.SerializeToCodedStream(coded_output);
delete coded_output;
delete raw_output;
boost::system::error_code ignored_error;
boost::asio::async_write(socket, b.data(), boost::bind(
&MessageService::handle_write, this,
boost::asio::placeholders::error));
As you can see, I write the length of the message with WriteVarint32, so the Java side should know, by using parseDelimitedFrom, how far it should read:
AgentMessage agentMessage = AgentMessageProtos.AgentMessage
.parseDelimitedFrom(socket.getInputStream());
But it's no help; I keep getting these kinds of exceptions:
Protocol message contained an invalid tag (zero).
Message missing required fields: ...
Protocol message tag had invalid wire type.
Protocol message end-group tag did not match expected tag.
While parsing a protocol message, the input ended unexpectedly in the middle of a field. This could mean either that the input has been truncated or that an embedded message misreported its own length.
It is important to know that these exceptions are not thrown for every message. Only a fraction of the messages I receive are affected; most work out just fine. Still, I would like to fix this, since I do not want to drop those messages.
I would be really grateful if someone could help me out or share their ideas.
Another interesting fact is the number of messages I receive. A total of 1,000 messages in 2 seconds is normal for my program; in 20 seconds, about 100,000, and so on. I reduced the number of messages sent artificially, and when only 6-8 messages are transmitted, there are no errors at all. So might this be a buffering problem on the Java client socket side?
Out of, let's say, 60,000 messages, 5 are corrupted on average.
[I'm not really a TCP expert, this may be way off]
The problem is that [Java] TCP Socket's read(byte[] buffer) will return after reading to the end of the TCP frame. If that happens to be mid-message (I mean, mid-protobuf-message), the parser will choke and throw an InvalidProtocolBufferException.
Any protobuf parsing call uses CodedInputStream internally (src here), which, in case the source is an InputStream, relies on read() -- and, consequently, is subject to the TCP socket issue.
So, when you stuff big amounts of data through your socket, some messages are bound to be split in two frames -- and that's where they get corrupted.
I'm guessing that when you lower the message transfer rate (as you said, to 6-8 messages per second), each frame gets sent before the next piece of data is put into the stream, so each message always gets its very own TCP frame, i.e. none get split and you see no errors. (Or maybe the errors are just rare, and the low rate means you need more time to see them.)
As for the solution, your best bet would be to handle the buffer yourself: read a byte[] from the socket (probably using readFully() instead of read(), because the former blocks until either there's enough data to fill the buffer or an EOF is encountered, so it is resistant to a frame ending mid-message), ensure it's got enough data to be parsed into a whole message, and then feed the buffer to the parser.
Also, there's some good read on the subject in this Google Groups topic -- that's where I got the readFully() part.
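Putting those pieces together, a sketch (assuming the C++ sender's WriteVarint32 length prefix from the question; exception handling omitted):

import com.google.protobuf.CodedInputStream;
import java.io.DataInputStream;

DataInputStream in = new DataInputStream(socket.getInputStream());
int firstByte = in.read();                  // first byte of the varint length
if (firstByte != -1) {
    int size = CodedInputStream.readRawVarint32(firstByte, in);
    byte[] buf = new byte[size];
    in.readFully(buf);                      // blocks until the full message is in
    AgentMessage msg = AgentMessageProtos.AgentMessage.parseFrom(buf);
}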
I am not familiar with the Java API, but I wonder how Java deals with a uint32 value denoting the message length, because Java only has signed 32-bit integers. A quick look at the Java API reference told me an unsigned 32-bit value is stored within a signed 32-bit variable. So how is the case handled where an unsigned 32-bit value denotes the message length? Also, there seems to be support for varint signed integers in the Java implementation, called ZigZag32/64. AFAIK, the C++ version doesn't know about such encodings. So maybe the cause of your problem is related to these things?

What is the fastest way to output a large amount of data?

I have a JAX-RS web service that calls a DB2 z/OS database and returns about 240 MB of data in a result set. I am then creating an OutputStream to send this data to the client, looping through the result set and adding a few XML tags for my output.
I am confused about whether to use PrintWriter, BufferedWriter, or OutputStreamWriter. I am looking for the fastest way to deliver the data. I also don't want the JVM to hold onto this data any longer than it needs to, so I don't use up its memory.
Any help is appreciated.
You should:
- use a BufferedWriter (sketched below)
- call .flush() frequently
- enable gzip for best compression
- start thinking about a different way of doing this: can your data be paginated? Do you need all the data in one request?
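A sketch of streaming the result set row by row (the response stream, result set, and XML shape are placeholders):

import java.io.BufferedWriter;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(
        response.getOutputStream(), StandardCharsets.UTF_8));
writer.write("<rows>");
int count = 0;
while (resultSet.next()) {
    writer.write("<row>" + resultSet.getString(1) + "</row>");
    if (++count % 1000 == 0) {
        writer.flush();         // flush periodically, not on every row
    }
}
writer.write("</rows>");
writer.flush();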
If you are sending large binary data, you probably don't want to use XML. When XML is used, binary data is usually represented using base64, which becomes larger than the original binary and uses quite a lot of CPU for the conversion.
If I were you, I'd send the binary separately from the XML. If you are using a web service, an MTOM attachment could help. Otherwise, you could send a reference to the binary data in the XML and let the app download the binary data separately.
As for the fastest way to send binary, if you are using WebLogic, just writing to the response's output stream would be fine. That output stream is almost certainly buffered, and whatever you do probably won't change the performance anyway.
Turning on gzip could also help, depending on what you are sending (e.g. if you are sending JPEGs or other already-compressed data, it won't help much, but if you are sending raw text it can help a lot).
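A sketch of turning gzip on for the response (assumes the client sent Accept-Encoding: gzip; response is a placeholder servlet response):

import java.util.zip.GZIPOutputStream;

response.setHeader("Content-Encoding", "gzip");
GZIPOutputStream gz = new GZIPOutputStream(response.getOutputStream());
gz.write(xmlBytes);             // xmlBytes: the chunk being sent
gz.finish();                    // write the gzip trailer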
One solution (which might not work for you) is to spawn a job/thread that creates a file and then notifies the user when the file is ready to download. This way you're not tied to the bandwidth of the client connection (and you can even compress the file properly before the client downloads it).
Some business intelligence and data-crunching applications do this, especially if the process takes some time to generate the data.
The maximum output speed will be limited by network bandwidth, and I am sure any Java OutputStream will be so much faster that you will not notice the difference.
The choice depends on the data to send: if it is text (lines), PrintWriter is easy; if it is a byte array, use an OutputStream.
To avoid holding too much data in the buffers, you should call flush() every x KB, perhaps.
You should never use PrintWriter to output data over a network. First of all, it creates platform-dependent line breaks. Second, it silently catches all I/O exceptions, which makes it hard for you to deal with those exceptions.
And if you're sending 240 MB as XML, then you're definitely doing something wrong. Before you start worrying about which stream class to use, try to reduce the amount of data.
EDIT:
The advice about PrintWriter (and PrintStream) came from a book by Elliotte Rusty Harold. I can't remember which one, but it was a few years ago. I think that ServletResponse.getWriter() was added to the API after that book was written - so it looks like Sun didn't follow Rusty's advice. I still think it was good advice - for the reasons stated above, and because it can tempt implementation authors to violate the API contract in order to get predictable behavior.
