In Java, one can write a few bytes (or any number of bytes smaller than the buffer's size) to the socket's output stream and then flush it, in case some bytes need to be sent immediately. The Win32 API doesn't seem to have any sort of flush function, so I feel like "flushing" in this case just means padding the rest of the buffer with dummy bytes (probably zeroes) to force the send function to, well, send the data. My question is: how does Java (or C#, perhaps) implement such a thing under the hood? That is, if I had a Java socket communicating with a Win32 socket somewhere, how would they communicate given that some flushing will be needed? For example, if the Java socket's buffer were flushed (by calling flush on the socket's OutputStream), what would I get on the Win32 side (by calling recv)? And the reverse? Would I see some padding behaviour? Would the Win32 side need to know the Java side's buffer size? Something else entirely?
Flushing a Java socket output stream does nothing. Flushing a BufferedOutputStream does something, but only to the application-side buffer. In neither case is there a system call involved other than the send() or write() implied by the flush. You're looking for something that doesn't exist.
so I feel like "flushing" in this case is just padding the rest of the buffer with dummy bytes (probably zeroes) ...
No.
First, recall that TCP provides a bidirectional stream of bytes. On one side you write bytes to the socket, and the other side can read them. There is no guarantee that writing 4 bytes will result in one read call that fetches 4 bytes on the other end; it could just as well arrive as two reads of 2 bytes each. The stream has no notion of packets or buffers, so "padding the rest of the buffer with dummy bytes" is a bad idea: the other side will eventually receive those dummy bytes and try to interpret them as data.
Next on to the question:
At the base of every application sit the OS socket APIs, which provide the write/send calls for the socket. When you write to an OS socket, the bytes are essentially just copied into the OS's socket send buffer, from which they will be sent to the remote side at some point in time. That point in time depends on how full the send buffer is and on how things look on the network (TCP windows, congestion, etc.). Normally you don't have to care: the OS will simply send the data eventually, and there is nothing to flush. There is one OS-level setting that influences sending behaviour: the Nagle algorithm (the TCP_NODELAY option). If the Nagle algorithm is disabled (NODELAY = true), the OS will try to send the data immediately after the write call instead of waiting for more data from the application in order to produce fewer IP packets. You can use this to reduce latency for small writes, but it is never required, since the OS will send the data anyway. So it is not a flush in the sense of a required flush.
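For example, on the Java side disabling Nagle is a one-liner (on Win32 the equivalent would be setsockopt with the TCP_NODELAY option); the endpoint here is made up:

Socket socket = new Socket("example.com", 9000); // hypothetical host/port
socket.setTcpNoDelay(true); // disable Nagle: small writes go out immediately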
For the Java side, I'm not exactly sure what flush does. It could be that the Java OutputStream has an internal buffer that is only handed to the OS socket with the write system call once either a certain threshold of bytes in the buffer is reached or flush is called. Or flush may exist purely to satisfy the OutputStream base class and do nothing extra. Either way, you should be safe calling flush on the Java side (or on other platforms where it exists) and doing nothing special with native socket APIs.
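As a rough sketch of that buffered case (endpoint and payload are made up; with an unbuffered socket stream the flush() would simply be a no-op):

Socket socket = new Socket("example.com", 9000); // hypothetical endpoint
BufferedOutputStream out = new BufferedOutputStream(socket.getOutputStream(), 8192);
out.write("hello".getBytes()); // may sit in the 8 KB application-side buffer
out.flush(); // hands the buffered bytes to the OS via write/send; nothing more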
Related
I'm writing a toy Java NIO server paired with a normal Java client. The client sends a string message to the server using a plain Socket. The server receives the message and dumps the content to the terminal.
I've noticed that the same message from the client is broken up into ByteBuffers differently every single time. I understand this is intended behaviour of NIO, but I would like to find out roughly how NIO decides to chop up a message.
Example: Sending the string "this is a test message" to the server. The following are excerpts of the server logs (each line represents one ByteBuffer received).
Run 1:
Server receiving: this is a test message
Run 2:
Server receiving: t
Server receiving: his is a test message
Run 3:
Server receiving: this is
Server receiving: a test message
UPDATE - Issue Resolved
I have installed Wireshark to analyse the packets and it has become apparent that the random "break up" was due to me using DataOutputStream for the writer, which sends the message character by character! So there was a packet for each character...
After changing the writer to BufferedWriter, my short message is now sent as a single packet, as expected. So the truth is Java NIO actually did the clever thing and merged my tiny packets into one or two ByteBuffers!
UPDATE2 - Clarification
Thank you all for your replies. Thank you @StephenC for pointing out that unless I encode the message myself (yes, I did call flush() after writing to the BufferedWriter), there's always the possibility of my message arriving across multiple packets.
So the truth is Java NIO actually did the clever thing and merged my tiny
Actually, no. The merging is happening in the BufferedWriter layer. The buffered writer will only deliver a "bunch" of bytes to the NIO layer when either the application flushes or closes the DataOutputStream, or the BufferedWriter's buffer fills up.
I was in fact referring to my first attempt with DataOutputStream (I got it from an example online, which is obviously an incorrect use of the class, now that you've pointed it out). BufferedWriter was not involved. My simple writer in that case went like this:
DataOutputStream out = new DataOutputStream(socket.getOutputStream());
out.writeBytes("this is a test message");
Wireshark confirmed that this message was sent (server on localhost) one character per packet (22 packets in total for the actual message, not including all the ACKs etc.).
I'm probably wrong, but this behaviour seems to suggest that the NIO server combined those 22 packets into one or two ByteBuffers?
The end game I'm trying to achieve here is a simple Java NIO server capable of receiving requests and data streams over TCP from various clients, some of which may be written in C++ or C# by third parties. It's not time critical, so the clients can send all the data in one go and the server can process it at its own pace. That's why I've written a toy client in Java using a plain Socket rather than an NIO client. The client in this case can't really manipulate the ByteBuffer directly, so I probably need some sort of message format. Could I make this work?
If you are sending data over a TCP/IP socket, then there are no "messages" as such. What you send and receive is a stream of bytes.
If you are asking if you can send a chunk of N bytes, and have the receiver get exactly N bytes in a single read call, then the answer is that there is no guarantee that will happen. However, it is the TCP/IP stack that is "breaking up" the "messages". Not NIO. Not Java.
Data sent over a TCP/IP connection is ultimately broken into network packets for transmission. This typically erases any "message" structure based on the original write request sizes.
If you want a reliable message structure over the top of the TCP/IP byte stream, you need to encode it in the stream itself; e.g. using an "end-of-message" marker or prefixing each message with a byte count. (If you want to use fancy words, you need to implement a "message protocol" over the top of the TCP/IP stream.)
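For instance, a minimal sketch of the byte-count variant using DataOutputStream/DataInputStream (the method names here are my own, not a standard API):

void sendMessage(DataOutputStream out, byte[] message) throws IOException {
    out.writeInt(message.length); // 4-byte byte-count prefix
    out.write(message);
    out.flush(); // hand the whole frame to the OS
}

byte[] readMessage(DataInputStream in) throws IOException {
    int length = in.readInt(); // blocks until the 4-byte prefix arrives
    byte[] message = new byte[length];
    in.readFully(message); // loops internally until the full body arrives
    return message;
}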
Concerning your update, I think there are still some misconceptions:
... it became apparent that the random "break up" was due to me using DataOutputStream for the writer, which sends the message character by character! So there was a packet for each character...
Yes, lots of small writes to a socket stream may result in severe fragmentation at the network level. However, it won't always. If there is sufficient "back pressure" due to either network bandwidth constraints or the receiver reading slowly, then this will lead to larger packets.
After changing the writer to BufferedWriter, my short message is now sent as a single packet, as expected.
Yes. Adding buffering to the stack is good. However, you are probably doing something else; e.g. calling flush() after each message. If you didn't then I would expect a network packet to contain a sequence of messages and partial messages.
What is more, if the messages are too large to fit into a single network packet, or if there is severe back-pressure (see above) then you are liable to get multiple / partial messages in a packet anyway. Either way, the receiver should not rely on getting one (whole) message each time it reads.
In short, you may not have really resolved your issue!!
So the truth is Java NIO actually did the clever thing and merged my tiny
Actually, no. The merging is happening in the BufferedWriter layer. The buffered writer will only deliver a "bunch" of bytes to the NIO layer when either the application flushes or closes the DataOutputStream, or the BufferedWriter's buffer fills up.
FWIW - given your description of what you are doing, it is unlikely using NIO is helping performance. If you wanted to maximize performance, you should stop using BufferedWriter and DataOutputStream. Instead do your message encoding "by hand", putting the bytes or characters directly into the ByteBuffer or CharBuffer.
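As a rough sketch of that "by hand" encoding (the length prefix is my own addition, so the receiver can re-frame the stream):

ByteBuffer encode(String message) {
    byte[] payload = message.getBytes(StandardCharsets.UTF_8);
    ByteBuffer buf = ByteBuffer.allocate(4 + payload.length);
    buf.putInt(payload.length); // 4-byte length prefix
    buf.put(payload);
    buf.flip(); // ready for channel.write(buf)
    return buf;
}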
(Also DataOutputStream is for binary data, not text. Putting one in front of a Writer doesn't seem right ... if that is what you are really doing.)
I have several questions:
1. I have two computers connected by a socket connection. When the program executes
outputStream.writeInt(value);
outputStream.flush();
what actually happens? Does the program wait until the other computer reads the integer value?
2. How can I empty the outputStream or inputStream? That is, when the stream is emptied, whatever has been written to it gets discarded. (Please don't suggest doing it by closing the connection!)
I tried to empty the inputStream this way:
byte[] eatup = new byte[20 * 1024];
int available = 0;
while (true) {
    available = serverInputStream.available();
    if (available == 0)
        break;
    serverInputStream.read(eatup, 0, available);
}
eatup = null;

String fileName = (String) serverInputStream.readObject();
The program should not proceed past this line, as nothing else is being written to the outputStream. But my program executes it anyway and throws a java.io.OptionalDataException.
Note: I am working on a client-server file transfer project. The client sends files to the server. The second code snippet is for the server end. If the 'cancel' button is pressed on the server end, it stops reading bytes from the serverInputStream and sends a signal (I used int -1) to the client. When the client receives this signal it stops sending data to the server, but I've noticed that the serverInputStream is not empty. So I need to empty this serverInputStream so that the client computer can send files to the server computer again. (That's why I can't manage a lock from the read method.)
1 - No. On the flush(), the data will be written to the OS kernel, which will likely hand it to the network card driver immediately, which in turn will send it to the receiving end. In a nutshell, the send is fire and forget.
2 - As Jeffrey commented, available() is not reliable for this sort of operation. If you're doing blocking IO then, as he suggests, you should just use read() speculatively. However, it should be said that you really need to define a protocol on top of the raw streams, even if it's just DataInput/DataOutputStream. When using raw write/read, the golden rule is: one write != one read. For example, if you write 10 bytes on one side and have a reading loop on the other, there is no guarantee that one read will fetch all 10 bytes; they may arrive as any combination of chunks. Similarly, two writes of 10 bytes might appear as one read of 20 bytes on the receiving side. Put another way, there is no concept of a "packet" unless you create a higher-level protocol on top of the raw bytes. An example would be prefixing each send with a byte length, so the receiving side knows how much data to expect in the current packet.
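To make the one write != one read rule concrete, here is a sketch of what reading exactly n bytes from a raw InputStream has to look like (essentially what DataInputStream.readFully does for you):

byte[] readExactly(InputStream in, int n) throws IOException {
    byte[] buf = new byte[n];
    int off = 0;
    while (off < n) {
        int r = in.read(buf, off, n - off); // may return fewer bytes than requested
        if (r == -1) throw new IOException("stream closed after " + off + " of " + n + " bytes");
        off += r;
    }
    return buf;
}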
If you need to do anything more complicated than a basic app, I strongly encourage you to investigate some higher-level libraries that have solved many of the gnarly issues of network IO. I would recommend Netty, which I use for production apps. However, it is quite a big leap in understanding from simple IO streams to Netty's more event-based system. There may be other libraries somewhere in the middle.
I have a Java socket server, and the connection socket is working just fine. What I need help with is streaming a response back to the client.
I get the output stream with socket.getOutputStream(). How can I make it so that when I write to the output stream it is sent immediately, while still being able to send another chunk of data later on the same connection?
I tried simply using write, and write in conjunction with flush, but I don't really know what I am doing...
Depending on the native implementation, the socket may have a buffer and not send the bytes the second you call write(). flush(), however, will force the bytes to be sent. Typically it is good practice to send larger chunks rather than byte by byte (for streaming, you generally start by building up a buffer on the receiver's side). Optimal network usage generally means sending packets as large as possible (limited by the MTU). To get a local buffer in Java, wrap the socket's OutputStream in a BufferedOutputStream.
flush() will force the data to be sent to the OS. The OS can buffer the data, and so can the OS on the client. If you want the OS to send data earlier, I suggest you try turning Nagle's algorithm off: socket.setTcpNoDelay(true); However, you will find that OS/driver parameters can still introduce some buffering/packet coalescing.
If you look at Sun's JDK 6 java.net.SocketOutputStream, you will see that its flush() method does nothing. This is not guaranteed to be the case on all platforms, and a flush() may be required.
Another solution could be DataOutputStream
DataOutputStream dataOut = new DataOutputStream(socket.getOutputStream());
dataOut.writeInt(1);
If I am only WRITING to a socket on an output stream, will it ever block? Only reads can block, right? Someone told me writes can block, but I only see a timeout feature for the read method of a socket: Socket.setSoTimeout().
It doesn't make sense to me that a write could block.
A write on a Socket can block too, especially if it is a TCP Socket. The OS will only buffer a certain amount of untransmitted (or transmitted but unacknowledged) data. If you write stuff faster than the remote app is able to read it, the socket will eventually back up and your write calls will block.
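If you want to see this for yourself, something along these lines should eventually block in write() once the loopback send/receive buffers fill up (a self-contained sketch; the accepted socket deliberately never reads):

import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;

public class BlockingWriteDemo {
    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(0);
             Socket client = new Socket("localhost", server.getLocalPort());
             Socket accepted = server.accept()) { // this side never reads
            OutputStream out = client.getOutputStream();
            byte[] chunk = new byte[64 * 1024];
            long total = 0;
            while (true) {
                out.write(chunk); // blocks once the OS buffers are full
                total += chunk.length;
                System.out.println("wrote " + total + " bytes so far");
            }
        }
    }
}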
It doesn't make sense to me that a write could block.
An OS kernel is unable to provide an unlimited amount of memory for buffering unsent or unacknowledged data. Blocking in write is the simplest way to deal with that.
Responding to these followup questions:
So is there a mechanism to set a timeout for this? I'm not sure what behavior it'd have... maybe throw away data if buffers are full? Or possibly delete older data in the buffer?
There is no mechanism to set a write timeout on a java.net.Socket. There is a Socket.setSoTimeout() method, but it affects accept() and read() calls ... and not write() calls. Apparently, you can get write timeouts if you use NIO, non-blocking mode, and a Selector, but this is not as useful as you might imagine.
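That said, here is a rough sketch of the NIO route (non-blocking mode plus a Selector; the names and timeout handling are illustrative, not a drop-in solution):

void writeWithTimeout(SocketChannel channel, ByteBuffer data, long timeoutMs) throws IOException {
    channel.configureBlocking(false);
    try (Selector selector = Selector.open()) {
        SelectionKey key = channel.register(selector, SelectionKey.OP_WRITE);
        while (data.hasRemaining()) {
            if (selector.select(timeoutMs) == 0) { // nothing became writable in time
                throw new IOException("write timed out");
            }
            selector.selectedKeys().clear();
            channel.write(data); // non-blocking: writes as much as the buffers allow
        }
        key.cancel();
    }
}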
A properly implemented TCP stack does not discard buffered data unless the connection is closed. However, when you get a write timeout, it is uncertain whether the data that is currently in the OS-level buffers has been received by the other end ... or not. The other problem is that you don't know how much of the data from your last write was actually transferred to OS-level TCP stack buffers. Absent some application level protocol for resyncing the stream*, the only safe thing to do after a timeout on write is to shut down the connection.
By contrast, if you use a UDP socket, write() calls won't block for any significant length of time. But the downside is that if there are network problems or the remote application is not keeping up, messages will be dropped on the floor with no notification to either end. In addition, you may find that messages are sometimes delivered to the remote application out of order. It will be up to you (the developer) to deal with these issues.
* It is theoretically possible to do this, but for most applications it makes no sense to implement an additional resyncing mechanism on top of an already reliable (to a point) TCP/IP stream. And if it did make sense, you would also need to deal with the possibility that the connection closed ... so it would be simpler to assume it closed.
The only way to do this is to use NIO and selectors.
See the writeup from the Sun/Oracle engineer in this bug report:
https://bugs.java.com/bugdatabase/view_bug.do?bug_id=4031100
When using SocketChannel, you need to retain read and write buffers to handle partial writes and reads.
I have a nagging suspicion that it might not be needed when using a DatagramChannel, but info is scarce.
What is the story?
Should I call (non-blocking) receive(ByteBuffer) repeatedly until I get a null back to read all waiting datagrams?
When sending in non-blocking mode, can I rely on send(ByteBuffer, SocketAddress) to either send the whole buffer or reject it entirely, or do I need to keep partially written buffers around?
Every read of a datagram delivers the entire datagram, nothing more, nothing less. There's a hint that this is the case in the description of java.nio.channels.DatagramChannel.read:
If there are more bytes in the datagram than remain in the given buffers then the remainder of the datagram is silently discarded
When you're dealing with a SocketChannel, it's a byte stream: there's no guarantee how much or how little data you'll get on each read, as TCP reassembles separate packets to recreate the stream sent from the other side. But with UDP (which is what you're reading via the DatagramChannel), each datagram is its own atomic message.
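For completeness, a sketch of the receive loop from the question (assuming the channel is already in non-blocking mode; receive() returns null when no datagram is waiting, and each successful call delivers exactly one whole datagram):

void drain(DatagramChannel channel) throws IOException {
    ByteBuffer buf = ByteBuffer.allocate(65507); // max UDP payload, so nothing gets truncated
    SocketAddress sender;
    while ((sender = channel.receive(buf)) != null) {
        buf.flip();
        // ... process one complete datagram from `sender` ...
        buf.clear();
    }
}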