How does Java NIO break up messages?

How does Java NIO break up messages? - java

I'm writing a toy Java NIO server paired with a normal Java client. The client sends a string message to the server using plain Socket. The server receives the message and dumps the content to terminal.
I've noticed that the same message from client is broken up into bytebuffers differently every single time. I understand this is intended behaviour of NIO, but would like to find out roughly how the NIO decides to chop up a message?
Example: Sending string "this is a test message" to server. The following are excerpts of server loggings (each line represents 1 bytebuffer received).
Run 1:
Server receiving: this is a test message
Run 2:
Server receiving: t
Server receiving: his is a test message
Run 3:
Server receiving: this is
Server receiving: a test message
UPDATE - Issue Resolved
I have installed Wireshark to analyse the packets and it has become apparent that the random "break up" was due to me using DataOutputStream for the writer, which sends the message character by character! So there was a packet for each character...
After changing the writer to BufferedWriter, my short message is now sent as a single packet, as expected. So the truth is Java NIO actually did the clever thing and merged my tiny packets to 1 to 2 bytebuffers!
UPDATE2 - Clarification
Thank you all for your replies. Thank you #StephenC for pointing out that unless I encode the message myself(yes, I did call flush() after writing to BufferedWriter), there's always the possiblity of my message arriving across multiple packets.
So the truth is Java NIO actually did the clever thing and merged my tiny
Actually, no. The merging is happening in the BufferedWriter layer. The buffered writer will only deliver a "bunch" of bytes to the NIO layer when either the application flushes or closes the DataOutputStream or the BufferdWriters buffer fills up.
I was in fact referring to my first attempt with DataOutputStream (I got it from an example online, which obviously is incorrect use of the class now that you've pointed it out). BufferedWriter was not involved. My simple writer in that case went like
DataOutputStream out = new DataOutputStream(socket.getOutputStream());
out.writeBytes("this is a test message");
Wireshark confirmed that this message was sent(server on localhost) 1 character a packet(22 packets in total for the actual message not including all the ACK and etc).
I'm probably wrong, but this behaviour seems to suggest that the NIO server combined these 22 packets into 1-2 bytebuffers?
The end game I'm trying to achieve here is a simple Java NIO server capable of receiving request and data stream using TCP from various clients, some may be written in C++ or C# by third party. It's not time critical so the clients can send all data in one go and the server can process them at its own pace. That's why I've written a toy client in Java using plain Socket rather than a NIO client. Therefore the client in this case can't really manipulate the ByteBuffer directly, so I probably need some sort of message format. Could I make this work?

If you are sending data over a TCP/IP socket, then there are no "messages" as such. What you send and receive is a stream of bytes.
If you are asking if you can send a chunk of N bytes, and have the receiver get exactly N bytes in a single read call, then the answer is that there is no guarantee that will happen. However, it is the TCP/IP stack that is "breaking up" the "messages". Not NIO. Not Java.
Data sent over a TCP/IP connection is ultimately broken into network packets for transmission. This typically erases any "message" structure based on the original write request sizes.
If you want a reliable message structure over the top of the TCP/IP byte stream, you need to encode it in the stream itself; e.g. using an "end-of-message" marker or prefixing each message with a byte count. (If you want to use fancy words, you need to implement a "message protocol" over the top of the TCP/IP stream.)
Concerning your update, I think there are still some misconceptions:
... it became apparent that the random "break up" was due to me using DataOutputStream for the writer, which sends the message character by character! So there was a packet for each character...
Yes, lots of small writes to a socket stream may result in severe fragmentation at the network level. However, it won't always. If there is sufficient "back pressure" due to either network bandwidth constraints or the receiver reading slowly, then this will lead to larger packets.
After changing the writer to BufferedWriter, my short message is now sent as a single packet, as expected.
Yes. Adding buffering to the stack is good. However, you are probably doing something else; e.g. calling flush() after each message. If you didn't then I would expect a network packet to contain a sequence of messages and partial messages.
What is more, if the messages are too large to fit into a single network packet, or if there is severe back-pressure (see above) then you are liable to get multiple / partial messages in a packet anyway. Either way, the receiver should not rely on getting one (whole) message each time it reads.
In short, you may not have really resolved your issue!!
So the truth is Java NIO actually did the clever thing and merged my tiny
Actually, no. The merging is happening in the BufferedWriter layer. The buffered writer will only deliver a "bunch" of bytes to the NIO layer when either the application flushes or closes the DataOutputStream or the BufferdWriters buffer fills up.
FWIW - given your description of what you are doing, it is unlikely using NIO is helping performance. If you wanted to maximize performance, you should stop using BufferedWriter and DataOutputStream. Instead do your message encoding "by hand", putting the bytes or characters directly into the ByteBuffer or CharBuffer.
(Also DataOutputStream is for binary data, not text. Putting one in front of a Writer doesn't seem right ... if that is what you are really doing.)

Related

Do I have to handle message fragmentation for Sockets?

I am currently using java.net.Socket to send messages from the client and reading messages from the server. All my messages are fairly short so far, and I have never had any problems.
One of my friends noticed that I was not handling message fragmentation, where the data could come in pieces, and has advised that I should create a buffer to handle this. I insisted that TCP handles this for me, but I'm not 100% sure.
Who is right?
Also, I plan on creating a client in C as well in the future. Do Berkeley sockets handle message fragmentation?
Details: Currently, in Java, the server creates a socket and reads the first byte from the message with InputStream#read(). That first byte determines the length of the entire message, and creates a byte array of the appropriate length, and calls InputStream#read(byte[]) once and assumes that the entire message has been read.

If you are talking about WebSockets,you may be mixing different concepts.
One thing is TCP/IP message fragmentation.
Other thing is how buffering works. You read buffers of data, and you need a framing protocol that tells you when you have a complete "message" (or frame). Basically you:
Read buffer.
Has complete header? No-> Goto 1, Yes-> continue
Read until having all the bytes that the head indicates as message
length.
Has complete message? No-> Goto 3, Yes -> continue
Yield message.
Goto 1.
Other different thing is WebSocket message fragmentation. WebSocket has already a framing protocol and messages can be split in different data frames, and control frames can be interleaved with data frames: https://developer.mozilla.org/en-US/docs/WebSockets/Writing_WebSocket_servers#Message_Fragmentation
If you are writing a WebSocket client or server you have to be ready for this situation.

Expanding on what nos said, TCP will break up large messages into smaller chunks, if the message is large enough. Often, it isn't. Often, the data you write is already split into parts (by you), into meaningful chunks like discrete messages.
The stuff about the reads/writes taking different amounts of calls comes from how the data is written, how it travels over the wire, and how you read it.
If you write 2 bytes 100 times, and then 20 seconds later go to read, it will say there is 200 bytes to be read, which you can read all at once if you want. If you pass a massive 2mb buffer to be written (I dont even know if thats possible), it would take longer to write out, giving more of a chance to the reading program to get different read calls.

Details: Currently, in Java, the server creates a socket and reads the first byte from the message with InputStream#read(). That first byte determines the length of the entire message, and creates a byte array of the appropriate length, and calls InputStream#read(byte[]) once and assumes that the entire message has been read.
That won't work. Have a look at the contract for InputStream.read(byte[]). It isn't obliged to transfer more than one byte. The correct technique is to read the length byte and then use DataInputStream.readFully(), which has the obligation to fill the buffer.

red5 RTMPClient is not publishing stream if streamname is big enough

I have a Red5 client implementation which publishes streams, loaded from video file to our wowza media server. The problem is that if stream name is to big - approximately more than 90 symbols - the client is not publishing it and fails silently. All others actions expected from client are fulfilled: it connects to server and creates a stream. But never publishes the stream. I dont see a corresponding RTMP message and I dont see a resulting reaction in the logs of wowza.
I tried to debug the client and tracked the execution until it starts to write to SocketChannel. Everything is the same for the cases of execution of shorter named stream (which publishes ok), and stream with the long name, which RTMP command "to publish" is never sent.
A questions are:
whats up?
if I have written some bytes to SocketChannel without any exceptions thrown - does it guarantees that the corresponding message was sent?
if I have written some bytes to SocketChannel without any exceptions thrown - can I check by the means of my OS (MACOS in my case) whether the bytes were really written somewhere? Although I know, by the means of WireShark, that this piece of data was never sent.
UPDATE
...and which is even more strange - after sending "the big" packet sending a smaller one doesn't help. No packets can be sent after a packet of bigger length have been submitted to the socket.

If I have written some bytes to SocketChannel without any exceptions thrown - does it guarantees that the corresponding message was sent?
It guarantees that the data has been buffered locally in the socket send buffer, up to the count returned by write(). Nothing more.
As you can't send further data, it sounds to me as though the receiver isn't reading the large piece of data. Is it possibly failing with an exception and ceasing to read altogether?

JAVA : BufferdInputStream and BufferedOutputStream

I have several questions-
1. I have two computers connected by socket connection. When the program executes
outputStream.writeInt(value);
outputStream.flush();
what actually happens? Does the program wait until the other computer reads the integer value?
2. How can I empty the outputStream or inputStream? Meaning, when emptying
the outputStream or inputStream, whatever is written to that stream gets removed.
(please don't suggest to do it by closing the connection!)
I tried to empty the inputStream this way-
byte[] eatup=new byte[20*1024];
int available=0;
while(true)
{
available=serverInputStream.available();
if(available==0)
break;
serverInputStream.read(eatup,0,available);
}
eatup=null;
String fileName=(String)serverInputStream.readObject();
Program should not process the line as nothing else is being written on the outputStream .
But my program executes it anyway and throws a java.io.OptionalDataException error.
Note: I am working on a client-server file transfer project. The client sends files to
the server. The second code is for server terminal. If 'cancel button' is pressed on server
end then it stops reading bytes from the serverInputStream and sends a signal(I used int -1)
to the client. When client receieves this signal it stops sending data to the server, but I've
noticed that serverInputStream is not empty. So I need to empty this serverInputStream so that
the client computer is able to send the server computer files again(That's why I can't manage a lock
from read method)

1 - No. On the flush() the data will be written to the OS kernel which will likely immediately hand it to the network card driver, which in turn will send it to the receiving end. In a nutshell the send is fire and forget.
2 - As Jeffrey commented available() is not reliable for this sort of operation. If doing blocking IO then as he suggests you should just use read() speculatively. However it should be said that you really need to define a protocol on top of the raw streams, even if it's just using DataInput/DataOutputStream. When using raw write/read the golden rule is one write != one read. For example if you were to write 10 bytes on one side and had a reading loop on the other there is no guarantee that one read will read all 10 bytes. It may be "read" as any combination of chunks. Similarly two writes of 10 bytes might appear as one read of 20 bytes on the receiving side. Put another way there is no concept of a "packet" unless you create a higher level protocol on top of the raw bytes to do packets. An example would be each send is prefixed by a byte length so the receiving side knows how much data to expect in the current packet.
If you do need to do anything more complicated than a basic apps I strongly encourage you to investigate some higher level libraries that have solved many of the gnarly issues of network IO. I would recommend Netty which I use for production apps. However it is quite a big leap in understanding from a simple IO stream to Netty's more event based system. There may be other libraries somewhere in the middle.

What is a TCP window update?

I'm making my own custom server software for a game in Java (the game and original server software were written with Java). There isn't any protocol documentation available, so I am having to read the packets with Wireshark.
While a client is connecting the server sends it the level file in Gzip format. At about 94 packets into sending the level, my server crashes the client with an ArrayIndexOutOfBoundsException. According to the capture file from the original server, it sends a TCP Window Update at about that point. What is a TCP Window Update, and how would I send one using a SocketChannel?

TCP windows are used for flow control between the peers on a connection. With each ACK packet, a host will send a "window size" field. This field says how many bytes of data that host can receive before it's full. The sender is not supposed to send more than that amount of data.
The window might get full if the client isn't receiving data fast enough. In other words, the TCP buffers can fill up while the application is off doing something other than reading from its socket. When that happens, the client would send an ACK packet with the "window full" bit set. At that point, the server is supposed to stop sending data. Any packets sent to a machine with a full window will not be acknowledged. (This will cause a badly behaved sender to retransmit. A well-behaved sender will just buffer the outgoing data. If the buffer on the sending side fills up too, then the sending app will block when it tries to write more data to the socket!)
This is a TCP stall. It can happen for a lot of reasons, but ultimately it just means the sender is transmitting faster than the receiver is reading.
Once the app on the receiving end gets back around to reading from the socket, it will drain some of the buffered data, which frees up some space. The receiver will then send a "window update" packet to tell the sender how much data it can transmit. The sender starts transmitting its buffered data and traffic should flow normally.
Of course, you can get repeated stalls if the receiver is consistently slow.
I've worded this as if the sender and receiver are different, but in reality, both peers are exchanging window updates with every ACK packet, and either side can have its window fill up.
The overall message is that you don't need to send window update packets directly. It would actually be a bad idea to spoof one up.
Regarding the exception you're seeing... it's not likely to be either caused or prevented by the window update packet. However, if the client is not reading fast enough, you might be losing data. In your server, you should check the return value from your Socket.write() calls. It could be less than the number of bytes you're trying to write. This happens if the sender's transmit buffer gets full, which can happen during a TCP stall. You might be losing bytes.
For example, if you're trying to write 8192 bytes with each call to write, but one of the calls returns 5691, then you need to send the remaining 2501 bytes on the next call. Otherwise, the client won't see the remainder of that 8K block and your file will be shorter on the client side than on the server side.

This happens really deep in the TCP/IP stack; in your application (server and client) you don't have to worry about TCP windows. The error must be something else.

TCP WindowUpdate - This indicates that the segment was a pure WindowUpdate segment. A WindowUpdate occurs when the application on the receiving side has consumed already received data from the RX buffer causing the TCP layer to send a WindowUpdate to the other side to indicate that there is now more space available in the buffer. Typically seen after a TCP ZeroWindow condition has occurred. Once the application on the receiver retrieves data from the TCP buffer, thereby freeing up space, the receiver should notify the sender that the TCP ZeroWindow condition no longer exists by sending a TCP WindowUpdate that advertises the current window size.
https://wiki.wireshark.org/TCP_Analyze_Sequence_Numbers

A TCP Window Update has to do with communicating the available buffer size between the sender and the receiver. An ArrayIndexOutOfBoundsException is not the likely cause of this. Most likely is that the code is expecting some kind of data that it is not getting (quite possibly well before this point that it is only now referencing). Without seeing the code and the stack trace, it is really hard to say anything more.

You can dive into this web site http://www.tcpipguide.com/free/index.htm for lots of information on TCP/IP.

Do you get any details with the exception?
It is not likely related to the TCP Window Update packet
(have you seen it repeat exactly for multiple instances?)
More likely related to your processing code that works on the received data.

This is normally just a trigger, not the cause of your problem.
For example, if you use NIO selector, a window update may trigger the wake up of a writing channel. That in turn triggers the faulty logic in your code.
Get a stacktrace and it will show you the root cause.

Using NIO DatagramChannel will I need to handle partially read/written packets?

When using SocketChannel, you need to retain read and write buffers to handle partial writes and reads.
I have a nagging suspicion that it might not be needed when using a DatagramChannel, but info is scarce.
What is the story?
Should I call (non-blocking) receive(ByteBuffer) repeatedly until I get a null back to read all waiting datagrams?
When sending in non-blocking mode, can I rely on send(ByteBuffer, SocketAddress) to either send the the whole buffer or rejecting it entirely, or do I need to possibly keep partially written buffers?

Every read of a Datagram is the entire datagram, nothing more, nothing less. There's a hint that this is the case in the description of java.nio.DatagramChannel.read:
If there are more bytes in the
datagram than remain in the given
buffers then the remainder of the
datagram is silently discarded
When you're dealing with a SocketChannel, it's a message stream; there's no guarantee how much or how little data you'll get on each read, as TCP is reassembling separate packets to recreate the message from the other side. But for UDP (which is what you're reading with the DatagramChannel) each packet is its own atomic message.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.