I have a custom NIO server, but I am seeing behavior that I cannot reproduce or explain: the selector keeps triggering on a certain selection key with isReadable() == true, but when I read from the channel, there is no data.
I have:
triple-checked that end-of-stream (EOS) is respected in all scenarios
built custom clients that try all kinds of funky combinations of invalid data to trigger the bug, but I cannot trigger it myself
looked through the Apache MINA code to see whether it does anything special compared to my server
tried different versions of the JDK (8_111 and 8_121)
triple-checked that the selection key is removed from the iterator of the selected-key set in a finally block that wraps everything after iterator.next(), so it should not be a ghost key (roughly as in the sketch below)
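For reference, the removal pattern from the last point looks roughly like this (a simplified sketch, not the actual server code; handleRead() is a placeholder):

while (selector.isOpen()) {
    selector.select();                                   // wait for ready keys
    Iterator<SelectionKey> it = selector.selectedKeys().iterator();
    while (it.hasNext()) {
        SelectionKey key = it.next();
        try {
            if (key.isValid() && key.isReadable()) {
                handleRead(key);                         // hypothetical read handler
            }
        } finally {
            it.remove();                                 // always remove, so the key cannot linger as a ghost
        }
    }
}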
Everything turns up empty. Every weird thing I try on the client side is handled correctly by the server, but nonetheless, approximately every four hours (you can almost set a clock by it) an IP from Russia connects to the server and triggers the bug.
At that point the selector goes into overdrive, continuously triggering the channel and the read process attached to it, which keeps reporting 0 bytes incoming.
So two questions:
apart from actual data and an EOS, what else can trigger a read operation on a selector?
if I cannot find the actual problem, is it OK to simply check for some number of consecutive read triggers that turn up with no data and conclude that I should close the socket? There are read timeouts in place but due to the CPU-intensive nature of the bug they are too long for my comfort
UPDATE:
it is not the server socket channel triggering the reads but one of the accepted client channels; as such, it can't (shouldn't?) be an incoming connection
by default only OP_READ is registered; OP_WRITE is registered sporadically if the internal buffers still contain data, and is unregistered once the data has been sent
the read timeouts mentioned are custom timeouts: the parser keeps track of when the first data for a message comes in, and if the message takes too long to complete, it triggers a read timeout
If I knew where the problem was, I could provide some code for that part, but the whole server is too big to paste here.
UPDATE 2
In the debug output that I have added, I print out the following state:
selectionKey.isReadable() + "/" + selectionKey.isValid() + "/" + selectionKey.channel().isOpen()
All three booleans are always true.
Impossible to answer this properly without some code, but:
apart from actual data and an EOS, what else can trigger a read operation on a selector?
If it's a ServerSocketChannel, an incoming connection.
if I cannot find the actual problem, is it OK to simply check for some number of consecutive read triggers that turn up with no data and conclude that I should close the socket?
No. Find the bug in your code.
There are read timeouts in place
There can't be read timeouts in place on non-blocking sockets, and as you're using a Selector you must be using non-blocking sockets.
but due to the CPU-intensive nature of the bug they are too long for my comfort
I don't know what this means.
The same issue happened to me as well. Check whether you have kept the connection open with the client. If so, the selection key will keep triggering with isReadable() == true, even though the client is not actually sending any data.
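A minimal sketch of that check, assuming a non-blocking SocketChannel (process() is a placeholder): if read() returns -1 the peer has closed its end, and the key will keep selecting as readable until the channel is closed or the key is cancelled.

SocketChannel channel = (SocketChannel) key.channel();
ByteBuffer buffer = ByteBuffer.allocate(8192);
int n = channel.read(buffer);
if (n == -1) {
    // End of stream: the peer closed the connection. Without closing/cancelling here,
    // the key keeps selecting as readable while read() never delivers any data.
    key.cancel();
    channel.close();
} else if (n > 0) {
    buffer.flip();
    process(buffer);                                     // hypothetical handler
}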
Related
This problem has been bugging me for a while now.
In an application that I work on, I use SocketChannel in non-blocking mode to communicate with embedded devices.
Now I sporadically receive corrupted data.
On some PCs it does not happen; currently it happens on mine.
But when I change too much in the program, the problem disappears.
So many things might have an effect: the timing, the network interface hardware, Windows 7, the Java version, the company firewall, ...
The data reading boils down to this code:
byteBuffer.compact();                       // make room for new data, keeping any unconsumed bytes
socketChannel.read(byteBuffer);             // <<< problem here ? (the return value, including a possible -1, is ignored)
byteBuffer.flip();                          // switch back to reading mode
if( byteBuffer.hasRemaining() ){
    handleData( byteBuffer );
}
This is run in the same thread as the writing, when the selector wakes up and the interest op OP_READ is set.
This code is the only place where byteBuffer is referenced. socketChannel is used only from the same thread when writing.
I instrumented the code so I can print out the content of the last few read() calls when the error happens. At the same time, I analyze the network traffic in Wireshark. I added lots of asserts to check the ByteBuffer's integrity.
In Wireshark, the received stream looks good: no DUP-ACKs or anything else suspicious. The last read() calls match the data in Wireshark exactly.
In Wireshark, I see many small TCP frames with 90 bytes of payload arriving at roughly 10 ms intervals. Normally the Java thread reads the data every ~10 ms as well, as soon as it has arrived.
When the problem occurs, the Java thread is a bit delayed: the read happens after about 300 ms and returns roughly 3000 bytes, which is plausible. But the data is corrupted.
The data looks as if it was copied into the buffer and concurrently received data then overwrote the earlier data.
Now I don't know how to proceed. I cannot create a small example, as this happens only rarely and I don't know the exact conditions needed.
Can someone give a hint?
How can I prove whether or not it is the Java library?
What other conditions might be important to look at?
thanks
Frank
29-June-2015:
I was now able to build an example that reproduces the problem.
There is one Sender and a Receiver program.
The Sender uses blocking I/O: it first waits for a connection, then sends 90-byte blocks every 2 ms. The first 4 bytes are a running counter; the rest is not set. The Sender uses setTcpNoDelay(true) (a rough sketch of it follows below).
The Receiver uses non-blocking I/O. First it connects to the Sender, then it reads the channel whenever the selection key is ready. Sometimes the read loop does a Thread.sleep(300).
If they run on the same PC over the loopback, this works for me all the time. If I put the Sender onto another PC, directly connected over LAN, it triggers the error. Checking with Wireshark, the traffic and the data sent look good.
To run, first start the Sender on one PC, then (after editing the host address) start the Receiver.
As long as it works, it prints a line about every 2 seconds. If it fails, it prints information about the last 5 read() calls.
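For orientation, a minimal sketch of the Sender described above (illustrative only, not the exact repro code; the port number is arbitrary):

import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.ByteBuffer;

public class Sender {
    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(9000)) {     // port is arbitrary
            Socket socket = server.accept();                     // wait for the Receiver to connect
            socket.setTcpNoDelay(true);                          // disable Nagle, as in the repro
            OutputStream out = socket.getOutputStream();
            byte[] block = new byte[90];                         // 90-byte blocks
            for (int counter = 0; ; counter++) {
                ByteBuffer.wrap(block).putInt(counter);          // running counter in the first 4 bytes
                out.write(block);
                out.flush();
                Thread.sleep(2);                                 // one block every 2 ms
            }
        }
    }
}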
What I found to be the trigger:
The Sender has configured setTcpNoDelay(true)
The Receiver sometimes does a Thread.sleep(300) before doing the read()
thanks
Frank
buf.order(ByteOrder.BIG_ENDIAN);
This is the default. Remove this.
buf.clear();
The buffer is already empty, because you just allocated it. Remove this.
buf.limit(0);
The limit is already zero after the clear() and also after the initial allocation. Remove this.
while( true ) {
There should be a select() call here.
Iterator<SelectionKey> it = selector.selectedKeys().iterator();
// ...
if( key == keyData && key.isConnectable() ) {
ch.finishConnect();
This method can return false. You're not handling that case.
// ...
if( key == keyData && key.isReadable() ) {
// ...
readPos += ch.read(buf);
Completely incorrect. You are entirely overlooking the case where read() returns -1, meaning the peer has disconnected. In this case you must close the channel.
// without this Thread.sleep, it would not trigger the error
So? Hasn't the penny dropped? Remove the sleep. It is completely and utterly pointless. The select() will block until data arrives. It doesn't need your help. This sleep is literally a waste of time.
if( rnd.nextInt(20) == 0 ) {
Thread.sleep(300);
}
Remove this.
selector.select();
This should be at the top of the loop, not the bottom.
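Putting those review points together, a corrected skeleton of the loop might look roughly like this (a sketch reusing the variable names from the reviewed code, not the actual fix):

while( true ) {
    selector.select();                               // block here until a key is ready
    Iterator<SelectionKey> it = selector.selectedKeys().iterator();
    while( it.hasNext() ) {
        SelectionKey key = it.next();
        it.remove();
        if( key == keyData && key.isConnectable() ) {
            if( !ch.finishConnect() ) {              // may return false: connection not finished yet
                continue;
            }
            key.interestOps(SelectionKey.OP_READ);
        }
        if( key == keyData && key.isReadable() ) {
            int n = ch.read(buf);
            if( n == -1 ) {                          // peer disconnected: close the channel
                key.cancel();
                ch.close();
                continue;
            }
            readPos += n;
        }
    }
}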
It turned out to be a driver problem, or at least it seems so.
I used the USB-to-Ethernet adapter "D-Link E-DUB100 Rev A".
Because Wireshark showed the correct data, I thought I could eliminate the hardware as a possible failure cause.
But in the meantime I tried a "D-Link E-DUB100 Rev C1" and the problem disappeared.
So I assume it is a problem in the drivers D-Link delivers for Rev A; with Rev C1 it might be using a system driver that does not have this problem.
Thanks to everyone who took the time to read my question.
As some background:
I have a connection to a server with a SocketChannel, SelectionKey, etc. On the client end, if I want to send something to the server, I just write my data into a ByteBuffer and send it through the socket channel. If all of it was written, I'm done and can return to OP_READ. If not all of it was written, I take the leftover bytes, store them in a "to send" buffer somewhere, and mark OP_WRITE on the key (is it a good idea to replace OP_READ so it's only OP_WRITE?).
Therefore, the next time I call selectNow(), I'm assuming it will recognize OP_WRITE and attempt to flush more data through (which I will attempt to do by entering another writing loop with the data to write, and repeating the previous if needed).
This leads me to two questions:
Am I supposed to leave it in OP_WRITE until all the data has been flushed through? Or should I change to OP_READ and attempt any reads in between?
If the writing channel is full and I can't write, do I just keep looping until I can start writing stuff through? If the connection suddenly gets choked, I'm unsure if I'm supposed to just write what I can, flip back to OP_READ, attempt to read, then flip back to OP_WRITE. From what I've read, this appears to not be the correct way to do things (and may cause large overhead constantly switching back and forth?).
What is the optimal way to handle reading and writing bulk data when the buffers both may become full?
Reading sounds easy because you just loop until the data is consumed, but with writing... the server may be only writing and not reading. This would leave you with quite a full send buffer, and cycling around forever on OP_WRITE without reading would be bad. How do you avoid that situation? Do you set a timer after which you just stop attempting to write and start reading again if the send buffer is not clearing up? If so, do you remove OP_WRITE and remember it for later?
Side question: Do you even need OP_READ to read from the network? I'm unsure if it's like OP_WRITE, where you only mark it in a specific case (just in case I'm doing it wrong, since I have it on OP_READ 99.9% of the time).
Currently I just set my key to OP_READ and then leave it in that mode, waiting for data, and then go to OP_WRITE if and only if writing fails to send all the data (with a write() value of 0).
Am I supposed to leave it in OP_WRITE until all the data has been flushed through? Or should I change to OP_READ and attempt any reads in between?
There are differing views about that. Mine is that the peer should be reading every part of the response you're sending before he sends a new request, and if he doesn't he is just misbehaving, which you shouldn't encourage by reading ahead. Otherwise you just run out of memory eventually, and you shouldn't let a client do that to you. Of course that assumes you're the server in a request-response protocol. Other situations have their own requirements.
If the writing channel is full and I can't write, do I just keep looping until I can start writing stuff through?
No, you wait for OP_WRITE to fire.
If the connection suddenly gets choked, I'm unsure if I'm supposed to just write what I can, flip back to OP_READ, attempt to read, then flip back to OP_WRITE. From what I've read, this appears to not be the correct way to do things (and may cause large overhead constantly switching back and forth?).
The overhead isn't significant, but it's the wrong thing to do in the situation I described above.
What is the optimal way to handle reading and writing bulk data when the buffers both may become full?
In general, read when OP_READ fires; write whenever you need to; and use OP_WRITE to tell you when an outbound stall has relieved itself.
Do you even need OP_READ to read from the network?
Yes, otherwise you just smoke the CPU.
Whenever you need to write, just set the interest ops to (OP_READ | OP_WRITE). When you finish writing, set the interest ops back to OP_READ.
That's all you have to do.
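A minimal sketch of that pattern, assuming a per-connection pendingWrites queue (the queue and method names are illustrative):

void send(SelectionKey key, ByteBuffer data) throws IOException {
    SocketChannel channel = (SocketChannel) key.channel();
    channel.write(data);                             // try to write immediately
    if (data.hasRemaining()) {
        pendingWrites.add(data);                     // illustrative per-connection Queue<ByteBuffer>
        key.interestOps(SelectionKey.OP_READ | SelectionKey.OP_WRITE);
    }
}

void onWritable(SelectionKey key) throws IOException {
    SocketChannel channel = (SocketChannel) key.channel();
    while (!pendingWrites.isEmpty()) {
        ByteBuffer data = pendingWrites.peek();
        channel.write(data);
        if (data.hasRemaining()) {
            return;                                  // still choked: keep OP_WRITE registered
        }
        pendingWrites.poll();                        // this buffer is fully flushed
    }
    key.interestOps(SelectionKey.OP_READ);           // everything sent: back to OP_READ only
}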
Let's say I have some UDP channels and some TCP channels registered with my selector. Once the selector wakes up, can I just keep looping and reading as much information as I can from ALL keys (not just the selected ones) without looping back and performing another select()?

For TCP this does not make much sense, since I can read as much as possible into my ByteBuffer with a call to channel.read(), but for UDP you can only read one packet at a time with a call to channel.receive(). So how many packets do I read?

Do you see a problem with just keeping on reading (and not just reading, but also writing, connecting and accepting, in other words ALL key operations) until there is nothing else to do, and only then performing the select again? That way a UDP channel would not starve the other channels: you would process all channels as much as you can, reading one packet at a time from the UDP channels. I am particularly concerned about:
1) Performance hit of doing too many selects if I can just keep processing my keys without it.
2) Does the select() do anything fundamental that I cannot bypass in order to keep reading/writing/accepting/connecting?
Again, keep in mind that I will be processing all keys and not just the ones selected. If there is nothing to do for a key (no data) I just do nothing and continue to the next key.
I think you have to try it both ways. You can construct a plausible argument that says you should read every readable channel until read() returns zero, or that you should process one event per channel and do just one read each time. I probably favour the first but I can remember when I didn't.
Again, keep in mind that I will be processing all keys and not just the ones selected.
Why? You should process the events on the selected channels, and you might then want to perform timeout processing on the non-selected channels. I wouldn't conflate the two things, they are quite different. Don't forget to remove keys from the selectedKeys set whichever way you do it.
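For the UDP side of the question, one way to keep a busy DatagramChannel from starving the others is to cap how many datagrams you drain per wakeup. A sketch (the cap of 16 and the handler name are arbitrary):

private static final int MAX_DATAGRAMS_PER_WAKEUP = 16;      // arbitrary cap

void handleUdpRead(SelectionKey key) throws IOException {
    DatagramChannel channel = (DatagramChannel) key.channel();
    ByteBuffer buffer = ByteBuffer.allocate(1500);
    for (int i = 0; i < MAX_DATAGRAMS_PER_WAKEUP; i++) {
        buffer.clear();
        SocketAddress sender = channel.receive(buffer);       // one packet per call
        if (sender == null) {
            break;                                            // nothing left to read right now
        }
        buffer.flip();
        handlePacket(sender, buffer);                         // placeholder handler
    }
    // If more datagrams are queued, the next select() reports the key as readable again.
}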
I have a situation where a thread opens a telnet connection to a target machine and reads the data from a program that spits out all the data in its buffer. After all the data is flushed out, the target program prints a marker. My thread keeps looking for this marker in order to close the connection (a successful read).
Sometimes the target program does not print any marker; it keeps on dumping data and my thread keeps on reading it.
So I want to read the data only for a specific period of time (say 15 minutes, configurable). Is there any way to do this at the Java API level?
Use another thread to close the connection after 15 minutes. Alternatively, you could check after each read whether 15 minutes have passed and then simply stop reading and clean up the connection, but this only works if you're sure the remote server will continue to send data (if it doesn't, the read will block indefinitely).
Generally, no. Input streams don't provide timeout functionality.
However, in your specific case, that is, reading data from a socket, yes. What you need to do is set SO_TIMEOUT on your socket to a non-zero value (the timeout you need, in milliseconds). Any read operation that blocks for longer than that will throw a SocketTimeoutException.
Watch out, though: even though your socket connection is still valid after this, continuing to read from it may bring unexpected results, as you've already half-consumed your data. The easiest way to handle this is to close the connection, but if you keep track of how much you've already read, you can choose to recover and continue reading.
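A minimal sketch of that approach, assuming a blocking Socket and an illustrative 30-second per-read timeout (host, port and process() are placeholders):

Socket socket = new Socket(host, port);              // host/port are placeholders
socket.setSoTimeout(30_000);                         // each read() blocks for at most 30 seconds
InputStream in = socket.getInputStream();
byte[] buffer = new byte[8192];
try {
    int n;
    while ((n = in.read(buffer)) != -1) {
        process(buffer, n);                          // placeholder: scan for the marker here
    }
} catch (SocketTimeoutException e) {
    // No data arrived within the per-read timeout; decide whether to retry or give up.
} finally {
    socket.close();
}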
If you're using a Java Socket for your communication, you should have a look at the setSoTimeout(int) method.
The read() operation on the socket will block only for the specified time. After that, if no information has been received, a java.net.SocketTimeoutException is raised, and if it is handled correctly, execution will continue.
If the server really dumps data forever, the client will never be blocked in a read operation. You might thus regularly check (between reads) if the current time minus the start time has exceeded your configurable delay, and stop reading if it has.
If the client can be blocked in a synchronous read, waiting for the server to output something, then you might use a SocketChannel, and start a timer thread that interrupts the main reading thread, or shuts down its input, or closes the channel.
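A sketch of the elapsed-time check between reads that the first paragraph describes (the 15-minute deadline, in, buffer, process() and foundMarker() are illustrative):

long deadline = System.currentTimeMillis() + TimeUnit.MINUTES.toMillis(15);   // configurable
int n;
while ((n = in.read(buffer)) != -1) {
    process(buffer, n);                              // placeholder handler
    if (foundMarker()) {
        break;                                       // normal, successful end of the dump
    }
    if (System.currentTimeMillis() > deadline) {
        break;                                       // the marker never came: give up
    }
}
socket.close();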
To be more specific, I have written a server with Java NIO and it works quite well. After some testing I have found that, for some reason, a call to the SocketChannel's write method takes 1 ms on average, while the read method takes 0.22 ms on average.
At first I was thinking that setting the send/receive buffer sizes on the Socket might help a bit, but after thinking about it: all the messages are very short (a few bytes) and I send a message about every 2 seconds on a single connection. Both the send and receive buffers are well over 1024 bytes in size, so this can't really be the problem. I do have several thousand clients connected at once, though.
Now I am a bit out of ideas on this: is this normal, and if it is, why?
I would start by using Wireshark to eliminate variables.
@Nuoji I am using non-blocking I/O and yes, I am using a Selector. As for when I write to a channel, I do the following:
Since what I wrote in the second paragraph of my post is true, I assume that the channel is ready for writing in most cases; hence I do not set the key's interest set to write at first, but rather try to write to the channel directly. If, however, I cannot write everything to the channel (or anything at all, for that matter), I set the key's interest set to write (that way, the next time I try to write to the channel, it is ready for writing). In my testing, where I got the results mentioned in the original post, this happens very rarely, though.
And yes, I can give you samples of the code, although I didn't really want to bother anyone with it. Which parts in particular would you like to see, the selector thread or the write thread?