Read from TcpClient.GetStream() without knowing the length


I'm working on a TCP-based communication protocol. As far as I know, there are several ways to determine when to stop reading:
1. Closing the connection at the end of the message
2. Putting the length of the message before the data itself
3. Using a separator: some value that never occurs in normal data (or is always escaped somehow)
Typically I'm sending a file over a WiFi network (which may be unstable and slow).
Because of the RSA and AES handshake I don't want to close the connection each time (can't use 1).
It's a large file whose length I can't predict, so I can't use method 2 (can't use 2).
Checking for a special value when reading and escaping it when writing needs a lot of processing (can't use 3).
This method should be compatible with both C# and Java.
What do you suggest?
More general problems:
How to identify end of InputStream in java
C# - TcpClient - Detecting end of stream?
More information
I'm coding a TCP client/server communication.
First, the server generates an RSA public key and sends it to the client.
Then the client generates an AES key and IV and sends them back encrypted with RSA.
Up to this point everything works fine.
But I want to send a file over this connection. Here is my current packet: EncryptUsingAES(new AES.IV (16 bytes) + file.content (any size))
On the server I can't capture all the data sent by the client, so I need to know how much data to read with TcpClient.GetStream().Read(buffer, 0, bufferSize).
Current code:
List<byte> message = new List<byte>();
int bytes = -1;
do
{
    byte[] buffer = new byte[bufferrSize];
    bytes = stream.Read(buffer, 0, bufferrSize);
    if (bytes > 0)
    {
        byte[] tmp = new byte[bytes];
        Array.Copy(buffer, tmp, bytes);
        message.AddRange(tmp);
    }
} while (bytes == bufferrSize); // exits on the first short read, even if more data is still on its way

Your second method is the best one. Prefixing each packet with its length creates a reliable message-framing protocol which, if done correctly, ensures that all your data is received in the same pieces you sent it (that is, no partial data and no packets lumped together).
Recommended packet structure:
[Data length (4 bytes)][Header (1 byte)][Data (?? bytes)]
- The header is a single byte you can use to indicate what kind of packet this is, so that the receiving end knows what to do with it.
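A minimal Java sketch of this packet layout, assuming the length field counts only the data bytes (not the header byte) and is written big-endian, which is what DataOutputStream.writeInt does. The class name Packet is made up for illustration. DataInputStream.readFully keeps reading until the whole packet has arrived, which is the step the loop in the question is missing.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** One framed packet: a 1-byte type header plus its payload (illustrative only). */
final class Packet {
    final byte header;
    final byte[] data;

    Packet(byte header, byte[] data) {
        this.header = header;
        this.data = data;
    }

    /** Writes [data length (4 bytes, big-endian)][header (1 byte)][data]. */
    static void write(DataOutputStream out, Packet p) throws IOException {
        out.writeInt(p.data.length); // length of the data only, not the header
        out.writeByte(p.header);
        out.write(p.data);
        out.flush();
    }

    /** Reads exactly one packet, blocking until all of its bytes have arrived. */
    static Packet read(DataInputStream in) throws IOException {
        int length = in.readInt();          // throws EOFException if the stream ends early
        if (length < 0) {
            throw new IOException("Negative packet length: " + length);
        }
        byte header = in.readByte();
        byte[] data = new byte[length];
        in.readFully(data);                 // loops over short reads internally
        return new Packet(header, data);
    }
}
```

If the other end is C#, keep in mind that BinaryWriter.Write(int) writes little-endian, so one side has to convert (for example with IPAddress.HostToNetworkOrder) for the two ends to agree.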
Sending files
In almost all cases the sender of a file knows how much data it is about to send (after all, it usually has the file stored locally), so knowing how much of the file has been sent is not a problem.
The method I use and recommend is to start by sending an "info packet", which tells the endpoint that it is about to receive a file and how many bytes that file consists of. After that you start sending the actual data, preferably in chunks, since it's inefficient to process the entire file at once (at least if it's a large file).
Always keep track of how many bytes of the file you've received so far. That way the receiver can tell automatically when it has received the whole file.
Send the file a few kilobytes at a time (I use 8192 bytes = 8 KiB as a file buffer). That way you neither have to read the entire file into memory nor encrypt it all at once.
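A rough sketch of the info packet followed by 8 KiB data packets, using the [length][header][data] layout recommended above. The header values 1 and 2 and the choice of an 8-byte long for the file size are assumptions, not part of any standard; encryption is left out so the framing stays visible.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;

/** Sends a file as one info packet followed by data packets of at most 8192 bytes. */
final class FileSender {
    // Hypothetical header values; use whatever your protocol defines.
    static final byte INFO_PACKET = 1;
    static final byte DATA_PACKET = 2;
    static final int CHUNK_SIZE = 8192;

    static void sendFile(DataOutputStream out, InputStream file, long fileSize) throws IOException {
        // Info packet: [length = 8][INFO_PACKET][file size as an 8-byte long]
        out.writeInt(Long.BYTES);
        out.writeByte(INFO_PACKET);
        out.writeLong(fileSize);

        byte[] chunk = new byte[CHUNK_SIZE];
        long sent = 0;
        while (sent < fileSize) {
            int n = file.read(chunk, 0, (int) Math.min(CHUNK_SIZE, fileSize - sent));
            if (n < 0) {
                throw new IOException("File ended after " + sent + " of " + fileSize + " bytes");
            }
            // Data packet: [length][DATA_PACKET][chunk bytes]; only one chunk is in memory at a time
            out.writeInt(n);
            out.writeByte(DATA_PACKET);
            out.write(chunk, 0, n);
            sent += n;
        }
        out.flush();
    }
}
```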
Encrypting the data
Dealing with encryption will not be a problem. If you use length-prefixing, just encrypt the data itself and leave the length header unencrypted. The length header must then be computed from the size of the encrypted data, like so:
1. Encrypt the data.
2. Get the length of the encrypted data.
3. Produce the following packet:
[Encrypted data length][Encrypted data]
(Insert a header byte in there if you need to.)
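The three steps could look like this in Java. This is a sketch under some assumptions: AES/CBC/PKCS5Padding, a fresh random 16-byte IV per chunk prepended to the ciphertext (mirroring the IV + content packet in the question), and a 4-byte big-endian length. The class name EncryptedFramer is hypothetical, and no header byte is included.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

final class EncryptedFramer {
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Encrypts one chunk as [IV (16 bytes)][AES/CBC ciphertext]. */
    static byte[] encrypt(SecretKey key, byte[] plain) throws GeneralSecurityException {
        byte[] iv = new byte[16];
        RANDOM.nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
        byte[] body = cipher.doFinal(plain);
        byte[] result = new byte[iv.length + body.length];
        System.arraycopy(iv, 0, result, 0, iv.length);
        System.arraycopy(body, 0, result, iv.length, body.length);
        return result;
    }

    /** Encrypt, measure the ciphertext, then write [encrypted length][encrypted data]. */
    static void writeEncrypted(DataOutputStream out, SecretKey key, byte[] plain)
            throws IOException, GeneralSecurityException {
        byte[] encrypted = encrypt(key, plain);   // 1. encrypt the data
        out.writeInt(encrypted.length);           // 2. length of the *encrypted* data
        out.write(encrypted);                     // 3. the encrypted data itself
        out.flush();
    }

    /** Inverse of encrypt(): splits off the IV and decrypts the rest. */
    static byte[] decrypt(SecretKey key, byte[] packet) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(packet, 0, 16));
        return cipher.doFinal(packet, 16, packet.length - 16);
    }
}
```

Because of padding, the encrypted length differs from the plaintext length (100 plaintext bytes become 16 + 112 bytes here), which is why the header has to be computed after encryption. CBC on its own does not detect tampering; if that matters, an authenticated mode such as AES/GCM/NoPadding is worth considering.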
Receiving an encrypted file
Receiving an encrypted file and knowing when everything has been received is in fact not very hard. Assuming you're using the method described above for sending the file, you would just have to:
1. Receive the encrypted packet and decrypt it.
2. Get the length of the decrypted data.
3. Add it to a variable keeping track of the number of file bytes received.
4. If the received amount equals the expected amount, close the file.
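A sketch of that receiving loop, assuming the sender first sends an info packet [length = 8][header 1][file size as a long] and then data packets [length][header 2][encrypted chunk]. Those header values are hypothetical. The decrypt function is passed in as a UnaryOperator<byte[]> so the sketch does not depend on a particular cipher setup; a real one would have to wrap GeneralSecurityException. The counter uses the decrypted length, since padding makes it differ from the encrypted one.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.function.UnaryOperator;

final class FileReceiver {
    static final byte INFO_PACKET = 1;   // hypothetical header values
    static final byte DATA_PACKET = 2;

    /** Receives one file and returns the number of file bytes written. */
    static long receiveFile(DataInputStream in, OutputStream file,
                            UnaryOperator<byte[]> decrypt) throws IOException {
        // Info packet: [length = 8][INFO_PACKET][expected file size]
        if (in.readInt() != Long.BYTES || in.readByte() != INFO_PACKET) {
            throw new IOException("Expected an info packet");
        }
        long expected = in.readLong();
        long received = 0;
        while (received < expected) {
            int length = in.readInt();
            byte header = in.readByte();
            if (length < 0 || header != DATA_PACKET) {
                throw new IOException("Unexpected packet: header " + header + ", length " + length);
            }
            byte[] packet = new byte[length];
            in.readFully(packet);                  // 1. receive the whole encrypted packet
            byte[] plain = decrypt.apply(packet);  //    and decrypt it
            file.write(plain);
            received += plain.length;              // 2-3. count decrypted file bytes
        }
        if (received != expected) {
            throw new IOException("Received " + received + " bytes, expected " + expected);
        }
        file.close();                              // 4. whole file received: close it
        return received;
    }
}
```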
Additional resources/references
You can refer to two of my previous answers that I wrote about TCP length-prefixed message framing:
C# Deserializing a struct after receiving it through TCP
TCP Client to Server communication

The easiest way would be to use your #2. If you cannot predict the message length, buffer up to a certain number of bytes (say 1 KiB), and insert a length header for every one of those chunks instead of prefixing the whole message once.
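A sketch of that idea in Java. The answer does not say how the receiver learns that the last chunk has arrived; this version assumes a zero-length chunk as the terminator, the same trick HTTP/1.1 chunked transfer coding uses. ChunkedFraming and the 1024-byte chunk size are illustrative choices.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;

final class ChunkedFraming {
    static final int CHUNK_SIZE = 1024;

    /** Sends data of unknown total length as [len][chunk]... followed by a zero-length terminator. */
    static void writeChunked(InputStream source, DataOutputStream out) throws IOException {
        byte[] buffer = new byte[CHUNK_SIZE];
        int n;
        while ((n = source.read(buffer)) != -1) { // read() returns at least 1 byte or -1
            out.writeInt(n);
            out.write(buffer, 0, n);
        }
        out.writeInt(0); // assumed end-of-message marker
        out.flush();
    }

    /** Reassembles chunks until the zero-length terminator arrives. */
    static byte[] readChunked(DataInputStream in) throws IOException {
        ByteArrayOutputStream message = new ByteArrayOutputStream();
        int length;
        while ((length = in.readInt()) != 0) {
            if (length < 0) {
                throw new IOException("Negative chunk length: " + length);
            }
            byte[] chunk = new byte[length];
            in.readFully(chunk);
            message.write(chunk);
        }
        return message.toByteArray();
    }
}
```

readChunked collects everything in memory for brevity; for a large file you would write each chunk to disk as it arrives.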

Related

How to send different datatypes over in a socket connection

I'm trying to make a client/server socket program in which the client sends a string to the server (to identify which key should be used), the server sends back a key in byte form, the client then sends a request for a file as a string, and the server sends the requested file to the user, who can decrypt it with the key.
I understand the cryptography aspects; I'm hung up on how to differentiate between sending bytes, a string or a file to and from the server. I understand how to send a single kind of data (bytes, a string or a file), but cannot find a way to send all of these over one stream, if that makes any sense.
Do I have to create a new stream or socket connection each time I want to send a string, then a new one to send bytes, then a new one to send a file?
Any resources I could perhaps look up? Cheers!

Basically, what's sent over a socket connection is just a sequence of bytes. Those bytes can represent a string, a character, an array of strings, etc.
If you want to send all of this in one packet, you need a designated length for each type of data structure, e.g. the string has a maximum of 1024 bytes, the byte array a maximum of 512 bytes, etc. Doing this lets you decipher the information on the receiving end.
If you don't have maximum sizes and don't want to set them, you can take a different approach and send each data structure in its own packet. If you take this route, you will need to use the first byte of the packet to tell the receiver what type of data it contains, e.g. 1 = bytes, 2 = string, 3 = array, etc.
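A minimal sketch of the second approach. Since strings and files have no fixed size, it assumes a 4-byte length after the type byte, because the type flag alone would not tell the receiver where the payload ends. The flag values and the TypedMessages name are illustrative, and strings are encoded as UTF-8.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

final class TypedMessages {
    // Type flags as suggested in the answer; the values themselves are arbitrary.
    static final byte TYPE_BYTES = 1;
    static final byte TYPE_STRING = 2;

    /** Writes [type (1 byte)][payload length (4 bytes)][payload]. */
    static void write(DataOutputStream out, byte type, byte[] payload) throws IOException {
        out.writeByte(type);
        out.writeInt(payload.length);
        out.write(payload);
        out.flush();
    }

    static void writeBytes(DataOutputStream out, byte[] data) throws IOException {
        write(out, TYPE_BYTES, data);
    }

    static void writeString(DataOutputStream out, String s) throws IOException {
        write(out, TYPE_STRING, s.getBytes(StandardCharsets.UTF_8));
    }

    /** Returns a byte[] or a String, depending on the type flag. */
    static Object read(DataInputStream in) throws IOException {
        byte type = in.readByte();
        int length = in.readInt();
        if (length < 0) {
            throw new IOException("Negative length: " + length);
        }
        byte[] payload = new byte[length];
        in.readFully(payload);
        switch (type) {
            case TYPE_BYTES:
                return payload;
            case TYPE_STRING:
                return new String(payload, StandardCharsets.UTF_8);
            default:
                throw new IOException("Unknown type flag: " + type);
        }
    }
}
```

With something like this, the key request, the key and the file can all travel over one socket; the receiver just switches on the type byte. A large file could be sent as several bytes messages.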

Java Protocol Buffers - Message sizes

For the past few weeks I've been learning very simple network programming and protocol buffers. Right now I have a Java client and a C# server that communicate back and forth using protocol buffers. It's all working fine, but to make it work on the client (Java) side I had to create my byte array with the exact size of the incoming message, or else the parser would throw the error "Protocol message contained an invalid tag (zero)".
After doing some research, I found out that the array I had created (1024 bytes) for my DatagramPacket had lots of trailing zeros (since the incoming data from the server was only 27 bytes long), which is why, as mentioned, I now have to create the array with the exact size of the incoming data.
As for the question: is there any way to find out the size of my proto "messages" in my .proto files? If there isn't some sort of static getSize(), is there a way I can calculate it just from the types of fields within the "message"?
The message I'm using right now contains 3 doubles. Now that I think about it, is it 27 because of 8 bytes per double plus 1 byte for the tag on each message field? I'd like a definite answer from someone who knows what's going on.

The root object in protobuf data is not self-terminating; it is designed to be appendable (with append === merge), so normally the library simply reads until it runs out of data. If you have spare zeros, it will fail to parse the next field header. There are two ways of addressing this:
if you only want to send one message, simply close the outbound socket at the end of your message; the client should detect the end of the stream and handle it accordingly (note that you still don't want to use an oversized buffer unless you are using a length-limited stream wrapper)
use some kind of "framing" protocol; the simplest is to prefix each message with the number of bytes in that message (note that in the general case this size is not fixed, but for 3 doubles, each with a field header for a field number no greater than 15 so that the tag fits in one byte, then yes: it will be 3 × (1 + 8) = 27 bytes); you would then either create a buffer of the right size (noting that repeated array allocations can be expensive) or, more typically, use a length-limited stream wrapper or an in-memory stream
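A length-limited stream wrapper is not complicated to write yourself. The sketch below is a hypothetical LimitedInputStream (not a protobuf class) that exposes only the next limit bytes of the socket stream and then reports end of stream, so a parser that reads "until it runs out of data" stops at the message boundary. It deliberately does not close the underlying stream.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Exposes only the next `limit` bytes of the underlying stream, then reports end of stream. */
final class LimitedInputStream extends FilterInputStream {
    private long remaining;

    LimitedInputStream(InputStream in, long limit) {
        super(in);
        this.remaining = limit;
    }

    @Override
    public int read() throws IOException {
        if (remaining <= 0) return -1;
        int b = super.read();
        if (b >= 0) remaining--;
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (remaining <= 0) return -1;
        int n = super.read(buf, off, (int) Math.min(len, remaining));
        if (n > 0) remaining -= n;
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = super.skip(Math.min(n, remaining));
        remaining -= skipped;
        return skipped;
    }

    @Override
    public int available() throws IOException {
        return (int) Math.min(super.available(), remaining);
    }

    @Override
    public boolean markSupported() {
        return false;
    }

    /** Leaves the underlying (socket) stream open: the next message still has to be read from it. */
    @Override
    public void close() {
    }
}
```

The length itself would come from whatever prefix your framing uses; the parser is then handed new LimitedInputStream(socketStream, length). If Guava is already on the classpath, ByteStreams.limit does a similar job.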

I believe your problem lies in your socket receive code. Having an array with trailing zeroes is not a problem, but when receiving you should check the number of bytes received (the return value of a stream receive call, or DatagramPacket.getLength() after a UDP receive) and only consider the bytes of the buffer from the beginning up to "bytes received".
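For the DatagramPacket case in the question, DatagramSocket.receive() does not return a count; the number of bytes received is available from getLength() afterwards. A small sketch that copies out just those bytes (the helper name is made up):

```java
import java.net.DatagramPacket;
import java.util.Arrays;

final class DatagramBytes {
    /** Returns only the bytes actually received, not the whole (oversized) receive buffer. */
    static byte[] receivedBytes(DatagramPacket packet) {
        int start = packet.getOffset();
        return Arrays.copyOfRange(packet.getData(), start, start + packet.getLength());
    }
}
```

The resulting array can then be handed to the generated message's parseFrom(byte[]) method without any trailing zeros, so the receive buffer can stay at 1024 bytes.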

Unknown AMF type & other errors when transferring files from Flash to Java

I'm using a Flash tool to transfer data to Java. I'm having problems when it comes to sending multiple objects at once. The objects being sent are just generic Object objects, so it's not a case of needing to register a class alias or anything.
Sending one object works fine. Once I start sending multiple objects (putting the same Objects in an Array and sending that), it starts getting weird. Up to three objects in an Array seems to work fine. With more than that I get different errors in the readObject() function, such as:
Unknown AMF type '47'
Unknown AMF type '40'
Unknown AMF type '20'
OutOfBoundsExceptions index 23, size 0
NullPointerException
etc.
Sending 3 objects works; sending 4 gives me the error. If I delete one of the previous 3 that worked (while keeping the fourth that was added), it works again. Does anyone know what's going on?
Some more info:
Communication goes through a Socket class on the Flash side. This is pure AS3, no flex.
Messages are compressed before being sent and decompressed on the server, so I'm pretty sure it's not a buffer size issue (unless I'm missing something)
BlazeDS version seems to be 4.0.0.14931 on the jar
Flash version is 10.1 (it's an AIR app)
Update with rough code
Examples of the objects being sent:
var o:Object = { };
o._key = this._key.toString();
o.someParam = someString;
o.someParam2 = someInt;
o.someParam3 = [someString1, someString2, someString3];
...
It's added to our event object (which we use to determine the event to call, the data to pass, etc.). The event object has been registered as a class alias.
That object is sent to the server through a Socket like so:
myByteArray.writeObject( eventObj );
myByteArray.compress();
mySocket.writeBytes( myByteArray );
mySocket.flush();
On the server side, we receive the bytes and decompress them. We create an Amf3Input object, set the input stream, then read it:
Amf3Input amf3Input = new Amf3Input( mySerializationContext );
amf3Input.setInputStream( new ByteArrayInputStream( buffer ) ); // buffer is a byte[]
MyEventObj eventObj = (MyEventObj)amf3Input.readObject(); // MyEventObj is the server version of the client event object
If it's going to crash with an "unknown AMF type error", it does so immediately i.e. when we try to read the object, and not when it's trying to read a subobject.
Stepping through the read code, it seems that when I pass an array of objects, it reads the length correctly if the length is <= 4. If the length is bigger than that, it reads the length as 4.

If you're getting AMF deserialization errors, there are several possible issues that could be contributing to the problem. Here are several techniques for doing further diagnostics:
Use a network traffic sniffer to make sure that what you are sending matches what you are receiving. On the Mac I'll use CocoaPacketAnalyzer, or you can try Charles, which can actually decode AMF packets that it notices.
Feed the data to a different AMF library, like PyAMF or RocketAMF to see if it's a problem with BlazeDS or with how you're calling it. It's also possible that you may get a different error message that will give you a better idea of where it's failing.
Check the format of the AMF packet. AMF server calls have some additional wrapping around them that would throw off a deserializer if it's not expecting to encounter that wrapping, and vice versa for purely serialized objects. Server call packets always start off with a 0x00, followed by the AMF version (0x00, 0x03, or in rare cases 0x02).

OK, I figured out the problem. Basically, messages are compressed before being sent and decompressed on the server. What I didn't see was that the byte[] buffer the message was decompressed into was always 1024 bytes long, which was fine for small arrays of objects. Once a message grew past that, however, it would overwrite the buffer (I'm not quite sure exactly what happens in Java when you try to write more bytes than the buffer holds: whether it wraps around or shifts the data).
When it came to reading the AMF object, the first thing it does is read a number and use it to determine what type of object it's trying to decode. As this number was gibberish (47, 110, -10), it was failing.
Time to start prepending message lengths I think :)
Thanks for the help.
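For anyone hitting the same problem, a sketch of decompressing into output that grows as needed instead of a fixed 1024-byte array. It assumes the Flash side's ByteArray.compress() output is zlib-format (its default, as far as I know, and what java.util.zip.Inflater expects by default) and that the whole compressed message has already been received, which is what the length prefix is for. The Decompress name is illustrative.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

final class Decompress {
    /** Inflates one complete zlib-compressed message of any decompressed size. */
    static byte[] inflate(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater(); // zlib format (with header), not raw deflate
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream(compressed.length * 2);
            byte[] buffer = new byte[1024];  // scratch space only; the output grows as needed
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("Truncated or unsupported compressed data");
                }
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } finally {
            inflater.end();                  // release native zlib resources
        }
    }
}
```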

Exceptions when reading protobuf messages in Java

I have been using protobuf for some weeks now, but I still keep getting exceptions when parsing protobuf messages in Java.
I use C++ to create my protobuf messages and send them with Boost sockets to a server socket where the Java client is listening. The C++ code for transmitting the message is this:
boost::asio::streambuf b;
std::ostream os(&b);
ZeroCopyOutputStream *raw_output = new OstreamOutputStream(&os);
CodedOutputStream *coded_output = new CodedOutputStream(raw_output);
coded_output->WriteVarint32(agentMessage.ByteSize());   // length prefix
agentMessage.SerializeToCodedStream(coded_output);
delete coded_output;   // must be destroyed before raw_output
delete raw_output;
boost::system::error_code ignored_error;
// async_write does not copy the bytes: b must stay alive until handle_write
// is called, which a local variable like this one does not guarantee
boost::asio::async_write(socket, b.data(), boost::bind(
        &MessageService::handle_write, this,
        boost::asio::placeholders::error));
As you can see, I write the length of the message with WriteVarint32, so the Java side should know how far to read by using parseDelimitedFrom:
AgentMessage agentMessage = AgentMessageProtos.AgentMessage
.parseDelimitedFrom(socket.getInputStream());
But it doesn't help; I keep getting these kinds of exceptions:
Protocol message contained an invalid tag (zero).
Message missing required fields: ...
Protocol message tag had invalid wire type.
Protocol message end-group tag did not match expected tag.
While parsing a protocol message, the input ended unexpectedly in the middle of a field. This could mean either than the input has been truncated or that an embedded message misreported its own length.
It is important to know that these exceptions are not thrown on every message. Only a fraction of the messages fail; most work out just fine. Still, I would like to fix this, since I do not want to drop messages.
I would be really grateful if someone could help me out or share their ideas.
Another interesting fact is the number of messages I receive. A total of 1,000 messages in 2 seconds is normal for my program; in 20 seconds about 100,000, and so on. When I artificially reduced the number of messages sent so that only 6-8 are transmitted, there were no errors at all. So might this be a buffering problem on the Java client socket side?
Out of, say, 60,000 messages, 5 are corrupted on average.

[I'm not really a TCP expert, this may be way off]
The problem is that a [Java] TCP socket's read(byte[] buffer) will return after reading to the end of the TCP frame. If that happens to be mid-message (I mean mid protobuf message), the parser will choke and throw an InvalidProtocolBufferException.
Any protobuf parsing call uses CodedInputStream internally (src here), which, when the source is an InputStream, relies on read() and is consequently subject to the TCP socket issue.
So when you push large amounts of data through your socket, some messages are bound to be split across two frames, and that's where they get corrupted.
I'm guessing that when you lower the message transfer rate (as you said, to 6-8 messages per second), each frame gets sent before the next piece of data is put into the stream, so each message always gets its own TCP frame, i.e. none get split and you don't get errors. (Or maybe the errors are just rare, and a low rate means you need more time to see them.)
As for the solution, your best bet would be to handle the buffer yourself, i.e. read a byte[] from the socket (probably using readFully() instead of read(), because the former blocks until there's enough data to fill the buffer [or EOF is encountered], so it's resistant to the mid-message frame-end problem), make sure it contains enough data to be parsed into a whole message, and then feed the buffer to the parser.
Also, there's some good reading on the subject in this Google Groups topic; that's where I got the readFully() idea.
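A sketch of doing the framing by hand: read the varint32 length that WriteVarint32 writes (each byte carries 7 bits, least significant group first, and the high bit means another byte follows), then readFully exactly that many bytes. DelimitedReader is a hypothetical helper; the resulting byte[] could then be passed to the generated AgentMessage.parseFrom(byte[]).

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

final class DelimitedReader {
    /** Reads a base-128 varint32, as written by CodedOutputStream::WriteVarint32. */
    static int readVarint32(InputStream in) throws IOException {
        int result = 0;
        for (int shift = 0; shift < 35; shift += 7) {   // at most 5 bytes for 32 bits
            int b = in.read();
            if (b < 0) {
                throw new EOFException("Stream ended inside a varint");
            }
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;                          // high bit clear: last byte
            }
        }
        throw new IOException("Malformed varint32");
    }

    /** Reads one length-delimited message body, blocking until all of it has arrived. */
    static byte[] readDelimited(DataInputStream in) throws IOException {
        int length = readVarint32(in);
        if (length < 0) {
            throw new IOException("Negative message length: " + length);
        }
        byte[] body = new byte[length];
        in.readFully(body);                             // never returns a partial message
        return body;
    }
}
```

For example, a length of 300 arrives as the two bytes 0xAC 0x02.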

I am not familiar with the Java API, but I wonder how Java deals with a uint32 value denoting the message length, because Java only has signed 32-bit integers. A quick look at the Java API reference told me that an unsigned 32-bit value is stored in a signed 32-bit variable. So how is the case handled where an unsigned 32-bit value denotes the message length? Also, there seems to be support for signed varint integers in the Java implementation; they are called ZigZag32/64. AFAIK, the C++ version doesn't know about such encodings. So might the cause of your problem be related to these things?

How does SocketChannel know when reading a file is completed?

I am using a SocketChannel and NIO to read data from a client.
How does a SocketChannel know when reading a file is complete?
ByteBuffer byteBuffer = ByteBuffer.allocate(BUFSIZE);
int nbytes = socketChannel.getChannel().read(byteBuffer);
If the data is small I read it all at once, but if the data is large I read it in fragments and eventually get all of it. Now I want to know how the channel recognizes the end of the data.
Is there any way for me to know when reading the file is complete?

There are three basic options:
The protocol could specify that the length of the file should come before the data.
The protocol could specify some "end of file" marker (which would have to be invalid for data within the file, of course)
The server could close the socket when it has finished: your read call will then return -1 to let you know that all the data has been read
Basically, because of the way data is streamed, you can't rely on all the data arriving in a particular number of read calls.
What protocol are you using, and can you modify it appropriately? A length prefix is usually the easiest solution.
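A sketch of the length-prefix option with NIO, assuming a blocking SocketChannel. In non-blocking mode read() can return 0 and the loop below would spin; there you would instead keep the partially filled buffer across selector events. ChannelFraming is a made-up name; the point is that one read() may return only part of the data, so you loop until the buffer is full, and -1 means the peer closed the connection.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

final class ChannelFraming {
    /** Keeps calling read() until the buffer is full; -1 before that means the peer closed early. */
    static void readFully(ReadableByteChannel channel, ByteBuffer buffer) throws IOException {
        while (buffer.hasRemaining()) {
            if (channel.read(buffer) < 0) {
                throw new EOFException("Channel closed with " + buffer.remaining() + " bytes still expected");
            }
        }
    }

    /** Reads a 4-byte big-endian length prefix, then exactly that many bytes. */
    static byte[] readMessage(ReadableByteChannel channel) throws IOException {
        ByteBuffer header = ByteBuffer.allocate(4);
        readFully(channel, header);
        header.flip();
        int length = header.getInt();
        if (length < 0) {
            throw new IOException("Negative length: " + length);
        }
        ByteBuffer body = ByteBuffer.allocate(length);
        readFully(channel, body);
        return body.array();
    }
}
```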
