I have several questions:
1. I have two computers connected by a socket connection. When the program executes
outputStream.writeInt(value);
outputStream.flush();
what actually happens? Does the program wait until the other computer reads the integer value?
2. How can I empty the outputStream or inputStream? Meaning, when emptying
the outputStream or inputStream, whatever has been written to that stream gets discarded.
(Please don't suggest doing it by closing the connection!)
I tried to empty the inputStream this way:
byte[] eatup = new byte[20 * 1024];
int available = 0;
while (true) {
    available = serverInputStream.available();
    if (available == 0)
        break;
    serverInputStream.read(eatup, 0, available);
}
eatup = null;
String fileName = (String) serverInputStream.readObject();
The program should not process this line, as nothing else is being written to the outputStream.
But my program executes it anyway and throws a java.io.OptionalDataException.
Note: I am working on a client-server file transfer project. The client sends files to
the server. The second piece of code is for the server side. If the 'cancel' button is pressed on the server
end, it stops reading bytes from the serverInputStream and sends a signal (I used the int -1)
to the client. When the client receives this signal it stops sending data to the server, but I've
noticed that the serverInputStream is not empty. So I need to empty this serverInputStream so that
the client computer is able to send files to the server again. (That's why I can't just rely on a
blocking read().)
1 - No. On flush() the data will be written to the OS kernel, which will likely hand it straight to the network card driver, which in turn will send it to the receiving end. In a nutshell, the send is fire-and-forget.
2 - As Jeffrey commented, available() is not reliable for this sort of operation. If you are doing blocking IO then, as he suggests, you should just read() speculatively. However, it should be said that you really need to define a protocol on top of the raw streams, even if it's just using DataInputStream/DataOutputStream. When using raw write/read, the golden rule is: one write != one read. For example, if you were to write 10 bytes on one side and had a reading loop on the other, there is no guarantee that one read will return all 10 bytes; they may arrive as any combination of chunks. Similarly, two writes of 10 bytes might appear as one read of 20 bytes on the receiving side. Put another way, there is no concept of a "packet" unless you create a higher-level protocol on top of the raw bytes that defines packets. An example would be prefixing each send with a length field so the receiving side knows how much data to expect in the current packet.
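For illustration, here is a minimal sketch of that kind of length-prefixed framing over DataOutputStream/DataInputStream. The 4-byte int prefix and the class/method names are just one possible convention, not taken from the question's code:

import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// One possible convention: every "packet" is a 4-byte length followed by
// exactly that many payload bytes.
class LengthPrefixedFraming {
    static void sendPacket(DataOutputStream out, byte[] payload) throws IOException {
        out.writeInt(payload.length);   // length prefix
        out.write(payload);             // payload bytes
        out.flush();
    }

    static byte[] readPacket(DataInputStream in) throws IOException {
        int length = in.readInt();      // blocks until all 4 length bytes have arrived
        byte[] payload = new byte[length];
        in.readFully(payload);          // blocks until the whole payload has arrived
        return payload;                 // one logical packet, however TCP chunked it
    }
}

With a scheme like this, the receiver always knows how many bytes belong to the current transfer, so "emptying" the stream becomes a matter of reading and discarding whole packets rather than guessing with available().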
If you need to do anything more complicated than a basic app, I strongly encourage you to investigate some higher-level libraries that have solved many of the gnarly issues of network IO. I would recommend Netty, which I use for production apps. However, it is quite a big leap in understanding from a simple IO stream to Netty's more event-based system. There may be other libraries somewhere in the middle.
I'm writing a toy Java NIO server paired with a normal Java client. The client sends a string message to the server using a plain Socket. The server receives the message and dumps the content to the terminal.
I've noticed that the same message from the client is broken up into ByteBuffers differently every single time. I understand this is the intended behaviour of NIO, but I would like to find out roughly how NIO decides to chop up a message.
Example: sending the string "this is a test message" to the server. The following are excerpts of the server logs (each line represents one ByteBuffer received).
Run 1:
Server receiving: this is a test message
Run 2:
Server receiving: t
Server receiving: his is a test message
Run 3:
Server receiving: this is
Server receiving: a test message
UPDATE - Issue Resolved
I installed Wireshark to analyse the packets, and it became apparent that the random "break up" was due to me using DataOutputStream for the writer, which sends the message character by character! So there was a packet for each character...
After changing the writer to BufferedWriter, my short message is now sent as a single packet, as expected. So the truth is Java NIO actually did the clever thing and merged my tiny packets into 1 to 2 ByteBuffers!
UPDATE2 - Clarification
Thank you all for your replies. Thank you @StephenC for pointing out that unless I encode the message myself (yes, I did call flush() after writing to the BufferedWriter), there's always the possibility of my message arriving across multiple packets.
So the truth is Java NIO actually did the clever thing and merged my tiny
Actually, no. The merging is happening in the BufferedWriter layer. The buffered writer will only deliver a "bunch" of bytes to the NIO layer when either the application flushes or closes the DataOutputStream, or the BufferedWriter's buffer fills up.
I was in fact referring to my first attempt with DataOutputStream (I got it from an example online, which is obviously an incorrect use of the class, now that you've pointed it out). BufferedWriter was not involved. My simple writer in that case looked like this:
DataOutputStream out = new DataOutputStream(socket.getOutputStream());
out.writeBytes("this is a test message");
Wireshark confirmed that this message was sent (server on localhost) one character per packet (22 packets in total for the actual message, not counting all the ACKs etc.).
I'm probably wrong, but this behaviour seems to suggest that the NIO server combined these 22 packets into 1-2 ByteBuffers?
The end game I'm trying to achieve here is a simple Java NIO server capable of receiving requests and data streams over TCP from various clients, some of which may be written in C++ or C# by third parties. It's not time critical, so the clients can send all data in one go and the server can process it at its own pace. That's why I've written a toy client in Java using a plain Socket rather than an NIO client. Therefore the client in this case can't really manipulate a ByteBuffer directly, so I probably need some sort of message format. Could I make this work?
If you are sending data over a TCP/IP socket, then there are no "messages" as such. What you send and receive is a stream of bytes.
If you are asking if you can send a chunk of N bytes, and have the receiver get exactly N bytes in a single read call, then the answer is that there is no guarantee that will happen. However, it is the TCP/IP stack that is "breaking up" the "messages". Not NIO. Not Java.
Data sent over a TCP/IP connection is ultimately broken into network packets for transmission. This typically erases any "message" structure based on the original write request sizes.
If you want a reliable message structure over the top of the TCP/IP byte stream, you need to encode it in the stream itself; e.g. using an "end-of-message" marker or prefixing each message with a byte count. (If you want to use fancy words, you need to implement a "message protocol" over the top of the TCP/IP stream.)
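For example, the "end-of-message marker" variant can be as simple as newline-delimited text. A rough sketch (the class and method names are illustrative, and it assumes messages never contain the delimiter):

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;

// Newline-terminated text messages: the '\n' is the end-of-message marker.
// The reader/writer should be created once per connection and reused.
class LineProtocol {
    static void sendMessage(BufferedWriter out, String message) throws IOException {
        out.write(message);
        out.write('\n');    // end-of-message marker
        out.flush();
    }

    static String readMessage(BufferedReader in) throws IOException {
        // readLine() keeps reading until it sees the marker,
        // however the TCP stack happened to chunk the bytes.
        return in.readLine();
    }
}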
Concerning your update, I think there are still some misconceptions:
... it became apparent that the random "break up" was due to me using DataOutputStream for the writer, which sends the message character by character! So there was a packet for each character...
Yes, lots of small writes to a socket stream may result in severe fragmentation at the network level. However, it won't always. If there is sufficient "back pressure" due to either network bandwidth constraints or the receiver reading slowly, then this will lead to larger packets.
After changing the writer to BufferedWriter, my short message is now sent as a single packet, as expected.
Yes. Adding buffering to the stack is good. However, you are probably doing something else; e.g. calling flush() after each message. If you didn't then I would expect a network packet to contain a sequence of messages and partial messages.
What is more, if the messages are too large to fit into a single network packet, or if there is severe back-pressure (see above) then you are liable to get multiple / partial messages in a packet anyway. Either way, the receiver should not rely on getting one (whole) message each time it reads.
In short, you may not have really resolved your issue!!
So the truth is Java NIO actually did the clever thing and merged my tiny
Actually, no. The merging is happening in the BufferedWriter layer. The buffered writer will only deliver a "bunch" of bytes to the NIO layer when either the application flushes or closes the DataOutputStream, or the BufferedWriter's buffer fills up.
FWIW - given your description of what you are doing, it is unlikely that using NIO is helping performance. If you wanted to maximize performance, you should stop using BufferedWriter and DataOutputStream. Instead, do your message encoding "by hand", putting the bytes or characters directly into the ByteBuffer or CharBuffer.
(Also DataOutputStream is for binary data, not text. Putting one in front of a Writer doesn't seem right ... if that is what you are really doing.)
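As an illustration of that "by hand" encoding, something roughly like this would put a length-prefixed UTF-8 message straight into a ByteBuffer and write it to a SocketChannel. The framing and names are assumptions for the sketch, not from the question:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;

class ByteBufferEncoding {
    // Encode one message directly into a ByteBuffer, bypassing Writer/DataOutputStream.
    static void sendMessage(SocketChannel channel, String message) throws IOException {
        byte[] payload = message.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buffer = ByteBuffer.allocate(4 + payload.length);
        buffer.putInt(payload.length);   // 4-byte length prefix
        buffer.put(payload);
        buffer.flip();                   // switch the buffer from filling to draining
        while (buffer.hasRemaining()) {  // a single write() may not drain the buffer
            channel.write(buffer);
        }
    }
}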
In Java, one can write a couple of bytes (or however many bytes, as long as it is less than the buffer's size) to the socket's output stream and then flush it, in case some bytes need to be sent immediately. In the Win32 API there doesn't seem to be any sort of flush function, so I feel like "flushing" in this case is just padding the rest of the buffer with dummy bytes (probably zeroes) to force the send function to, well, send the data.
My question is: how does Java (or even C#, perhaps) implement such a thing under the hood? That is, if I had a Java socket communicating with a Win32 socket somewhere, how would they communicate, given that there will be some flushing needed? For example, if the Java socket's buffer were to be flushed (calling flush on the socket's OutputStream), what would I get on the Win32 side (by calling recv)? And the reverse? Would I see some padding behaviour? Would the Win32 side need to know the Java side's buffer size? Something unrelated?
Flushing a Java socket output stream does nothing. Flushing a BufferedOutputStream does something, but only to the application-side buffer. In neither case is there a system call involved other than the send() or write() implied by the flush. You're looking for something that doesn't exist.
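A sketch of the layering being described (the host, port and buffer size are placeholders): flush() on the raw socket stream is a no-op, while flush() on a BufferedOutputStream just hands its application-side buffer to the socket in one write, with nothing padded:

import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class FlushSketch {
    static void send(Socket socket, String text) throws IOException {
        OutputStream raw = socket.getOutputStream();            // flush() on this does nothing
        BufferedOutputStream buffered = new BufferedOutputStream(raw, 8192);
        buffered.write(text.getBytes(StandardCharsets.UTF_8));  // sits in the application-side buffer
        buffered.flush();                                        // handed to the OS via one write/send, no padding
    }
}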
so I feel like "flushing" in this case is just padding the rest of the buffer with dummy bytes (probably zeroes) ...
No.
First recall that TCP provides a bidirectional stream of bytes. On one side you write bytes to a socket and the other side can read them from the socket. There is no guarantee that writing 4 bytes will result in one read call that fetches 4 bytes on the other end; it could also be two reads of 2 bytes each. There are no packets or buffers at this level, and therefore "padding the rest of the buffer with dummy bytes" seems like a bad idea, since the other side will eventually receive those dummy bytes (and have to interpret them somehow).
Next on to the question:
At the base of every application are the OS socket APIs, which provide the write/send calls for the socket. When you write to the OS socket, the bytes you write are basically only copied into the OS's socket send buffer, from which they will be sent to the remote side at some point in time. That point in time depends on how full the send buffer is and how things look on the network (TCP windows, network congestion, etc.). However, you normally don't have to care about it: the OS will simply send the data eventually, and there is no need to flush anything.
There is one OS setting which can be used to influence the sending behaviour: the Nagle algorithm (TCP_NODELAY) setting. If the Nagle algorithm is disabled (NODELAY = true), the OS will try to send the data immediately after the write call and won't wait for more data from the application in order to send fewer IP packets. You can use this to reduce latency in the case of small packets, but there is no need for it, since the OS will send the data anyway. So it's not a flush in the sense of a required flush.
For the Java side I'm not exactly sure what the flush is doing. It could be that the Java OutputStream has an internal buffer which is only written to the OS socket with the write system call once either a certain threshold of bytes in the buffer is reached or flush is called. Or flush may exist purely to satisfy the OutputStream base class and do nothing extra. You should be safe by using flush on the Java side (or on other platforms where it exists) and doing nothing special with the native socket APIs.
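If you do want the "send it right away" behaviour mentioned above, the Java-side knob is Socket.setTcpNoDelay(). A small sketch (host and port are placeholders):

import java.io.OutputStream;
import java.net.Socket;

class NoDelaySketch {
    public static void main(String[] args) throws Exception {
        // Disable Nagle's algorithm so small writes are handed to the network
        // immediately instead of being coalesced with later writes.
        Socket socket = new Socket("example.com", 9000);   // placeholder host/port
        socket.setTcpNoDelay(true);                        // TCP_NODELAY

        OutputStream out = socket.getOutputStream();
        out.write(new byte[] {1, 2, 3, 4});                // small write, sent promptly
        out.flush();                                       // no-op on the raw socket stream
        socket.close();
    }
}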
A similar question has been asked before, but I would like to ask it again, hoping that someone can help clear up a couple of things. As an experiment, I tried writing a naive "non-blocking" server in Java without using NIO, where essentially three threads are needed:
Main server thread - accept()s new socket connections, and puts each new socket in a queue
Reading worker thread - goes through each socket in the queue, and reads a little bit out of each socket's input stream, and stores it in an InputQueue
Writing worker thread - depending on when the incoming request gets read out of each socket, this worker would loop over all sockets where a response is needed, and once again, write a few bytes of response on every take.
In the previous question, it was pointed out that Java NIO's select() mechanism is far better than polling on each socket, and sleeping a little after every take through the queue. I know how select works in theory, but the main thing that I struggle to understand is the following: if polling is bad and inefficient, how does select() do it under the hood?
UPDATE: I found this page, which sheds a bit more light on how the native select() works under the hood. What is interesting is that, indeed, my initial speculations seem to be right: select() works in a linear fashion, probing each of the requested file descriptors, similar to what a polling mechanism would do:
They both [select() and poll()] handle file descriptors in a linear way. The more descriptors you ask them to check, the slower they get. As soon as you go beyond perhaps a hundred file descriptors or so - of course depending on your CPU and hardware - you will start noticing that the mere waiting for file descriptor activity and the following checking which file descriptor that it was, takes a significant time and becomes a bottleneck.
It calls the select() method in the operating system, which:
deems a socket to be readable if there is data or a FIN in the socket receive buffer
deems a socket to be writable if there is space in the socket send buffer (i.e. most of the time).
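In Java those readiness semantics surface through a Selector. A minimal sketch of the register-and-select loop (the port number and the handling stubs are placeholders):

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.util.Iterator;

public class SelectSketch {
    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(9000));   // arbitrary port for the sketch
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);

        while (true) {
            selector.select();   // blocks until a registered channel is "ready"
            Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
            while (keys.hasNext()) {
                SelectionKey key = keys.next();
                keys.remove();
                if (key.isAcceptable()) {
                    // accept() and register the new SocketChannel for OP_READ here
                } else if (key.isReadable()) {
                    // the socket receive buffer has data (or a FIN): read() won't block
                }
            }
        }
    }
}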
I am currently using java.net.Socket to send messages from the client and reading messages from the server. All my messages are fairly short so far, and I have never had any problems.
One of my friends noticed that I was not handling message fragmentation, where the data could come in pieces, and has advised that I should create a buffer to handle this. I insisted that TCP handles this for me, but I'm not 100% sure.
Who is right?
Also, I plan on creating a client in C as well in the future. Do Berkeley sockets handle message fragmentation?
Details: Currently, in Java, the server creates a socket and reads the first byte from the message with InputStream#read(). That first byte gives the length of the entire message; the server then creates a byte array of the appropriate length, calls InputStream#read(byte[]) once, and assumes that the entire message has been read.
If you are talking about WebSockets, you may be mixing different concepts.
One thing is TCP/IP message fragmentation.
Another thing is how buffering works. You read buffers of data, and you need a framing protocol that tells you when you have a complete "message" (or frame). Basically you do the following (a sketch follows the list):
1. Read a buffer.
2. Have a complete header? No -> go to 1; yes -> continue.
3. Read until you have all the bytes that the header indicates as the message length.
4. Have a complete message? No -> go to 3; yes -> continue.
5. Yield the message.
6. Go to 1.
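A rough sketch of that loop as a reusable accumulator, assuming the question's own frame format of a 1-byte length header followed by the body (the class name and API are illustrative):

import java.io.ByteArrayOutputStream;

// Feed it whatever bytes each read() returns; it yields a frame only once the
// header and the full body have arrived. (A real implementation would also loop
// in case the leftover bytes already contain a second complete frame.)
class FrameAccumulator {
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();

    /** Returns a complete frame, or null if more data is needed. */
    byte[] feed(byte[] chunk, int offset, int length) {
        pending.write(chunk, offset, length);                 // step 1: buffer what was read
        byte[] buffered = pending.toByteArray();
        if (buffered.length < 1) return null;                 // step 2: no complete header yet
        int bodyLength = buffered[0] & 0xFF;                  // header = 1-byte body length
        if (buffered.length < 1 + bodyLength) return null;    // steps 3-4: body not complete yet
        byte[] frame = new byte[bodyLength];
        System.arraycopy(buffered, 1, frame, 0, bodyLength);
        pending.reset();                                      // keep only the leftover bytes
        pending.write(buffered, 1 + bodyLength, buffered.length - 1 - bodyLength);
        return frame;                                         // step 5: yield the message
    }
}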
Yet another thing is WebSocket message fragmentation. WebSocket already has a framing protocol; messages can be split across different data frames, and control frames can be interleaved with data frames: https://developer.mozilla.org/en-US/docs/WebSockets/Writing_WebSocket_servers#Message_Fragmentation
If you are writing a WebSocket client or server, you have to be ready for this situation.
Expanding on what nos said, TCP will break up messages into smaller chunks if they are large enough. Often they aren't; often the data you write has already been split by you into meaningful chunks such as discrete messages.
The stuff about reads/writes taking different numbers of calls comes from how the data is written, how it travels over the wire, and how you read it.
If you write 2 bytes 100 times and then go to read 20 seconds later, it will say there are 200 bytes to be read, which you can read all at once if you want. If you pass a massive 2 MB buffer to be written (I don't even know if that's possible), it would take longer to write out, giving the reading program more of a chance to pick it up across several read calls.
Details: Currently, in Java, the server creates a socket and reads the first byte from the message with InputStream#read(). That first byte gives the length of the entire message; the server then creates a byte array of the appropriate length, calls InputStream#read(byte[]) once, and assumes that the entire message has been read.
That won't work. Have a look at the contract for InputStream.read(byte[]). It isn't obliged to transfer more than one byte. The correct technique is to read the length byte and then use DataInputStream.readFully(), which has the obligation to fill the buffer.
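Applied to the format in the question, that could look roughly like this (a sketch; the class and method names are illustrative):

import java.io.DataInputStream;
import java.io.IOException;

class MessageReader {
    static byte[] readMessage(DataInputStream in) throws IOException {
        int length = in.readUnsignedByte();   // the single length byte
        byte[] message = new byte[length];
        in.readFully(message);                // blocks until the buffer is full, or throws EOFException
        return message;
    }
}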
I'm programming a server (Java) - client (Android/Java) application. The server runs on Windows 7. All the communication goes well until one read in the client freezes and does not read the data until I send it two more times.
The data not read is a byte array. I repeat that all the communication goes well until this point.
Here's the code that I use to send the data:
Long lLength = new Long(length);
byte[] bLength = this.longToBytes(lLength.longValue());
dos.write(bLength);
dos.flush();
dos.write(bLength);
dos.flush();
dos.write(bLength);
dos.flush();
This code transforms a long value into an 8-byte array. As I said, when the first write is executed (and the client is waiting for data), the read is not done. It is not done until I execute the last write().
And here's the code to read it:
byte[] length = {0, 0, 0, 0, 0, 0, 0, 0};
dis.read(length);
I've used Wireshark to sniff the traffic, and I can see that the byte array is sent, and the client answers with an ACK, but the read is not done.
In the client and the server, the sockets are setup like this:
socket = new Socket(sIP, oiPort.intValue());
dos = new DataOutputStream(socket.getOutputStream());
dis = new DataInputStream(socket.getInputStream());
This is driving me mad... I don't know why, at one moment, the application stops reading the data, when I send it the same way as always.
I suppose that the problem may be in the input buffers of the client socket... But I don't know what to do or what to try...
I should also say that I've tested the server on Windows XP SP3 and it still doesn't work.
The first thing I'd look at is the code for your longToBytes method. Is it really creating a byte array of 8 bytes? If it is generating an array of fewer than 8 bytes, then that explains the problem. (Your client is expecting 8 bytes, and will block until they all arrive.)
Next thing I'd ask myself is why I'm not just using writeLong and readLong. It would simplify your code, and quite possibly cure the problem.
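For reference, a DataOutputStream/DataInputStream pair already agrees on an 8-byte big-endian encoding for a long, so the hand-rolled conversion could go away entirely. A sketch using the dos/dis variables from the question:

// Sender: writeLong() always emits exactly 8 big-endian bytes.
dos.writeLong(length);
dos.flush();

// Receiver: readLong() blocks until all 8 bytes have arrived and reassembles the value.
long length = dis.readLong();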