Java NIO: transferFrom until end of stream

I'm playing around with the NIO library. I'm attempting to listen for a connection on port 8888 and once a connection is accepted, dump everything from that channel to somefile.
I know how to do it with ByteBuffers, but I'd like to get it working with the allegedly super efficient FileChannel.transferFrom.
This is what I got:
ServerSocketChannel ssChannel = ServerSocketChannel.open();
ssChannel.socket().bind(new InetSocketAddress(8888));
SocketChannel sChannel = ssChannel.accept();
FileChannel out = new FileOutputStream("somefile").getChannel();
while (... sChannel has not reached the end of the stream ...)   // <-- what to put here?
    out.transferFrom(sChannel, out.position(), BUF_SIZE);
out.close();
So, my question is: How do I express "transferFrom some channel until end-of-stream is reached"?
Edit: Changed 1024 to BUF_SIZE, since the size of the buffer used, is irrelevant for the question.

There are a few ways to handle this case. First, some background on how transferTo/transferFrom is implemented internally and when it can be superior.
First and foremost, you should know how many bytes you have to transfer, i.e. use FileChannel.size() to determine the maximum available and sum the result. This case refers to FileChannel.transferTo(socketChannel):
The method does not return -1
The method is emulated on Windows. Windows doesn't have an API function to transfer from a file descriptor to a socket; it does have one (two, actually) to transfer from a file designated by name, but that's incompatible with the Java API.
On Linux the standard sendfile (or sendfile64) is used, on Solaris it's called sendfilev64.
In short, for (long xferBytes = 0; startPos + xferBytes < fchannel.size();) doXfer() will work for a file -> socket transfer.
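Spelled out, such a file -> socket loop might look like this (a minimal sketch; fchannel and socketChannel are assumed to be open, blocking channels):

long size = fchannel.size();
long position = 0;
while (position < size) {
    // transferTo returns the number of bytes actually transferred by this call
    position += fchannel.transferTo(position, size - position, socketChannel);
}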
There is no OS function that transfers from a socket to a file (which is what the OP is interested in). Since the socket data is not in the OS cache it can't be done so efficiently; it's emulated. The best way to implement the copy is via a standard loop using a pooled direct ByteBuffer sized to the socket read buffer. Since I use only non-blocking IO, that involves a selector as well.
That being said, regarding "I'd like to get it working with the allegedly super efficient FileChannel.transferFrom": it is not efficient, and it's emulated on all OSes, hence it will end the transfer when the socket is closed, gracefully or not. The function will not even throw the inherited IOException, provided there was ANY transfer (i.e. the socket was readable and open).
I hope the answer is clear: the only interesting use of FileChannel.transferFrom happens when the source is a file. The most efficient (and interesting) case is file -> socket, and file -> file is implemented via FileChannel.map/unmap(!!).

Answering your question directly:
while ((count = socketChannel.read(this.readBuffer)) >= 0) {
    // do something
}
But if this is what you do, you don't get any benefit from non-blocking IO, because you are actually using it exactly like blocking IO. The point of non-blocking IO is that one network thread can serve several clients simultaneously: if there is nothing to read from one channel (i.e. count == 0) you can switch to another channel (that belongs to another client connection).
So, the loop should actually iterate different channels instead of reading from one channel until it is over.
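In outline (a minimal sketch; readBufferFor() is a hypothetical per-connection buffer lookup, and each accepted channel is assumed to be configured non-blocking and registered for OP_READ):

Selector selector = Selector.open();
// per accepted connection:
//   channel.configureBlocking(false);
//   channel.register(selector, SelectionKey.OP_READ);
while (true) {
    selector.select();
    Iterator<SelectionKey> it = selector.selectedKeys().iterator();
    while (it.hasNext()) {
        SelectionKey key = it.next();
        it.remove();
        if (key.isReadable()) {
            SocketChannel ch = (SocketChannel) key.channel();
            int count = ch.read(readBufferFor(ch));
            if (count == -1) {       // this client is done
                key.cancel();
                ch.close();
            }
            // otherwise process what arrived and move on to the next client
        }
    }
}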
Take a look at this tutorial: http://rox-xmlrpc.sourceforge.net/niotut/
I believe it will help you to understand the issue.

I'm not sure, but the JavaDoc says:
An attempt is made to read up to count bytes from the source channel
and write them to this channel's file starting at the given position.
An invocation of this method may or may not transfer all of the
requested bytes; whether or not it does so depends upon the natures
and states of the channels. Fewer than the requested number of bytes
will be transferred if the source channel has fewer than count bytes
remaining, or if the source channel is non-blocking and has fewer than
count bytes immediately available in its input buffer.
I think you could say that telling it to copy "infinite" bytes (not in a loop, of course) will do the job:
out.transferFrom(sChannel, out.position(), Integer.MAX_VALUE);
So, I guess when the socket connection is closed, the state will get changed, which will stop the transferFrom method.
But as I already said: I'm not sure.

allegedly super efficient FileChannel.transferFrom.
If you want both the benefits of DMA access and nonblocking IO the best way is to memory-map the file and then just read from the socket into the memory mapped buffers.
But that requires that you preallocate the file.
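For example (a sketch, assuming the final size is known up front; path, expectedSize and sChannel are placeholders):

FileChannel fc = FileChannel.open(path, StandardOpenOption.CREATE,
        StandardOpenOption.READ, StandardOpenOption.WRITE);
MappedByteBuffer map = fc.map(FileChannel.MapMode.READ_WRITE, 0, expectedSize);
// read from the socket straight into the mapped region
while (map.hasRemaining() && sChannel.read(map) != -1) {
    // with a non-blocking socket you would wait on a selector when read() returns 0
}
fc.close();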

This way:
URLConnection connection = new URL("target").openConnection();
File file = new File(connection.getURL().getPath().substring(1));
FileChannel download = new FileOutputStream(file).getChannel();
while (download.transferFrom(Channels.newChannel(connection.getInputStream()),
        file.length(), 1024) > 0) {
    // Some calculations to get the current speed ;)
}

transferFrom() returns a count. Just keep calling it, advancing the position/offset, until it returns zero. But start with a much larger count than 1024, more like a megabyte or two, otherwise you're not getting much benefit from this method.
EDIT To address all the commentary below, the documentation says that "Fewer than the requested number of bytes will be transferred if the source channel has fewer than count bytes remaining, or if the source channel is non-blocking and has fewer than count bytes immediately available in its input buffer." So provided you are in blocking mode it won't return zero until there is nothing left in the source. So looping until it returns zero is valid.
EDIT 2
The transfer methods are certainly mis-designed. They should have been designed to return -1 at end of stream, like all the read() methods.
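Concretely, the loop described in this answer might look like this (a sketch; the 2 MB count is arbitrary):

long position = 0;
long transferred;
while ((transferred = out.transferFrom(sChannel, position, 2 * 1024 * 1024)) > 0) {
    position += transferred;
}
out.close();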

Building on top of what other people here have written, here's a simple helper method which accomplishes the goal:
public static void transferFully(FileChannel fileChannel, ReadableByteChannel sourceChannel, long totalSize) throws IOException {
    for (long bytesWritten = 0; bytesWritten < totalSize;) {
        bytesWritten += fileChannel.transferFrom(sourceChannel, bytesWritten, totalSize - bytesWritten);
    }
}

Related

Java NIO ByteBuffer, write after flip

I'm new to Java ByteBuffers and was wondering what the correct way is to write to a ByteBuffer after it has been flipped.
In my use case, I am writing an outputBuffer to a socket:
outBuffer.flip();
//Non-blocking SocketChannel
int bytesWritten = getSocket().write(outBuffer);
After this, the output buffer has to be written to again. Also not all of the bytes in the outBuffer may have been written to the socket.
Since it is currently flipped, how can I make it writable again, without overwriting any data that is still in the buffer and wasn't written to the socket?
If I am right, outBuffer.position() == bytesWritten and limit should be at how much data there was to write.
So would using the following in order to reuse the output buffer be right? :
int limit = outBuffer.limit();
outBuffer.limit(outBuffer.capacity());
outBuffer.position(limit);
Again from the API spec.:
The following loop copies bytes from one channel to another via the buffer buf:
while (in.read(buf) >= 0 || buf.position() != 0) {
    buf.flip();
    out.write(buf);
    buf.compact();    // In case of partial write
}
since it is currently flipped
It will stay flipped. The write doesn't change that.
how can I make it writable again, without overwriting any data if it is still in the buffer and wasn't written to the socket?
You don't have to do anything, but if you want to read before you write again you should do flip/write/compact. If you just want to repeat the write just call write() again, with the buffer still in its current state.
But I prefer to always keep these buffers ready for reading, so there is no possibility of a slip-up, and to flip/write/compact (or flip/get/compact) when those operations are necessary, atomically as it were.
Note that you should not use clear(), unless you are certain that the write was complete and the buffer is now empty. In that case compact and clear are equivalent. But it is simpler to just always compact.
If you're copying in blocking mode, use the loop quoted by @zlakad.

Last few chars in a string sent over socket sometimes missing in Java network program

Right now, I'm trying to write a GUI based Java tic-tac-toe game that functions over a network connection. It essentially works at this point, however I have an intermittent error in which several chars sent over the network connection are lost during gameplay. One case looked like this, when println statements were added to message sends/reads:
Player 1:
Just sent ROW 14 COLUMN 11 GAMEOVER true
Player 2:
Just received ROW 14 COLUMN 11 GAMEOV
I'm pretty sure the error is happening when I read over the network. The read takes place in its own thread, with a BufferedReader wrapped around the socket's InputStream, and looks like this:
try {
    int input;
    while ((input = dataIn.read()) != -1) {
        char msgChar = (char) input;
        String message = msgChar + "";
        while (dataIn.ready()) {
            msgChar = (char) dataIn.read();
            message += msgChar;
        }
        System.out.println("Just received " + message);
        this.processMessage(message);
    }
    this.sock.close();
}
My sendMessage method is pretty simple, (just a write over a DataOutputStream wrapped around the socket's outputstream) so I don't think the problem is happening there:
try {
    dataOut.writeBytes(message);
    System.out.println("Just sent " + message);
}
Any thoughts would be highly appreciated. Thanks!
As it turns out, the ready() method guarantees only that the next read WON'T block. Consequently, !ready() does not guarantee that the next read WILL block. Just that it could.
I believe that the problem here had to do with the TCP stack itself. Being stream-oriented, when bytes were written to the socket, TCP makes no guarantees as to how it groups the bytes it sends. I suspect that the TCP stack was breaking up the sent string in a way that made sense to it, and that in the process, the ready() method must have detected some sort of underlying break in the stream and returned false, in spite of the fact that more information was available.
I refactored the code to add a newline character to every message send, then simply performed a readLine() instead. This allowed my network protocol to be dependent on the newline character as a message delimiter, rather than the ready() method. I'm happy to say this fixed the problem.
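For reference, the refactored send/receive looks roughly like this (a sketch; reader and processMessage stand in for the actual fields):

// sender
dataOut.writeBytes(message + "\n");
dataOut.flush();

// receiver, with: BufferedReader reader =
//     new BufferedReader(new InputStreamReader(sock.getInputStream()));
String message;
while ((message = reader.readLine()) != null) {
    processMessage(message);
}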
Thanks for all your input!
Try flushing the OutputStream on the sender side. The last bytes might remain in some internal buffers.
It really matters what types of stream objects you use to operate on the data. It seems to me that this trouble is caused by the fact that you use a DataOutputStream for sending info, but something else for receiving. Try to send and receive info via DataOutputStream and DataInputStream respectively.
As a matter of fact, if you send something by calling dataOut.writeBoolean(b)
but try to receive it by calling dataIn.readUTF(), you will eventually get nothing. DataInputStream and DataOutputStream are type-sensitive. Try to refactor your code keeping that in mind.
Moreover, some input streams return a single byte per invocation of read(). Here you try to convert that single byte into a char, while in Java a char consists of two bytes:
msgChar = (char)dataIn.read();
Check whether that is a reason for the data loss.
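A matched, type-symmetric pair would look like this (a sketch):

// sender
dataOut.writeUTF(message);
dataOut.flush();

// receiver
String message = dataIn.readUTF();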

Java HttpURLConnection InputStream.close() hangs (or works too long?)

First, some background. There is a worker which expands/resolves a bunch of short URLs:
http://t.co/example -> http://example.com
So, we just follow redirects. That's it. We don't read any data from the connection. Right after we get a 200 we return the final URL and close the InputStream.
Now, the problem itself. On a production server one of the resolver threads hangs inside the InputStream.close() call:
"ProcessShortUrlTask" prio=10 tid=0x00007f8810119000 nid=0x402b runnable [0x00007f882b044000]
java.lang.Thread.State: RUNNABLE
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.skip(BufferedInputStream.java:352)
- locked <0x0000000561293aa0> (a java.io.BufferedInputStream)
at sun.net.www.MeteredStream.skip(MeteredStream.java:134)
- locked <0x0000000561293a70> (a sun.net.www.http.KeepAliveStream)
at sun.net.www.http.KeepAliveStream.close(KeepAliveStream.java:76)
at java.io.FilterInputStream.close(FilterInputStream.java:155)
at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.close(HttpURLConnection.java:2735)
at ru.twitter.times.http.URLProcessor.resolve(URLProcessor.java:131)
at ru.twitter.times.http.URLProcessor.resolve(URLProcessor.java:55)
at ...
After a brief investigation, I understood that skip() is called to clean up the stream before sending it back to the connection pool (if keep-alive is on?). Still, I don't understand how to avoid this situation. Moreover, I'm unsure whether this is some bad design in our code or a problem in the JDK.
So, the questions are:
Is it possible to avoid hanging on close()? Guarantee some reasonable timeout, for example.
Is it possible to avoid reading data from the connection at all? Remember, I just want the final URL. Actually, I think I don't want skip() to be called at all ...
Update:
KeepAliveStream, line 79, close() method:
// Skip past the data that's left in the Inputstream because
// some sort of error may have occurred.
// Do this ONLY if the skip won't block. The stream may have
// been closed at the beginning of a big file and we don't want
// to hang around for nothing. So if we can't skip without blocking
// we just close the socket and, therefore, terminate the keepAlive
// NOTE: Don't close super class
try {
    if (expected > count) {
        long nskip = (long) (expected - count);
        if (nskip <= available()) {
            long n = 0;
            while (n < nskip) {
                nskip = nskip - n;
                n = skip(nskip);
            } ...
More and more it seems to me that there is a bug in JDK itself. Unfortunately, it's very hard to reproduce this ...
The implementation of KeepAliveStream that you have linked violates the contract under which available() and skip() are guaranteed to be non-blocking, and thus may indeed block.
The contract of available() guarantees a single non-blocking skip():
Returns an estimate of the number of bytes that can be read (or
skipped over) from this input stream without blocking by the next
caller of a method for this input stream. The next caller might be
the same thread or another thread. A single read or skip of this
many bytes will not block, but may read or skip fewer bytes.
Whereas the implementation calls skip() multiple times per single call to available():
if (nskip <= available()) {
    long n = 0;
    // The loop below can iterate several times,
    // only the first call is guaranteed to be non-blocking.
    while (n < nskip) {
        nskip = nskip - n;
        n = skip(nskip);
    }
This doesn't prove that your application blocks because KeepAliveStream incorrectly uses InputStream. Some implementations of InputStream may possibly provide stronger non-blocking guarantees, but I think it is a very likely suspect.
EDIT: After a bit more research, this is a very recently fixed bug in the JDK: https://bugs.openjdk.java.net/browse/JDK-8004863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel. The bug report mentions an infinite loop, but a blocking skip() could also be a result. The fix seems to address both issues (there is only a single skip() per available()).
I guess this skip() on close() is intended for Keep-Alive support.
See http://docs.oracle.com/javase/6/docs/technotes/guides/net/http-keepalive.html.
Prior to Java SE 6, if an application closes a HTTP InputStream when
more than a small amount of data remains to be read, then the
connection had to be closed, rather than being cached. Now in Java SE
6, the behavior is to read up to 512 Kbytes off the connection in a
background thread, thus allowing the connection to be reused. The
exact amount of data which may be read is configurable through the
http.KeepAlive.remainingData system property.
So keep alive can be effectively disabled with http.KeepAlive.remainingData=0 or http.keepAlive=false.
But this can negatively affect performance if you always address to the same http://t.co host.
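For reference, setting these before the first connection is opened looks like:

System.setProperty("http.KeepAlive.remainingData", "0");
// or disable connection reuse entirely:
System.setProperty("http.keepAlive", "false");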
As @artbristol suggested, using HEAD instead of GET seems to be the preferable solution here.
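A HEAD-based resolver might look roughly like this (a sketch, following the redirect chain manually; shortUrl is a placeholder):

HttpURLConnection conn = (HttpURLConnection) new URL(shortUrl).openConnection();
conn.setRequestMethod("HEAD");
conn.setInstanceFollowRedirects(false);
int code = conn.getResponseCode();                  // no body is transferred for HEAD
String location = conn.getHeaderField("Location");  // next hop, if code is a 3xx
conn.disconnect();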
I was facing a similar issue when I was trying to make a "HEAD" request. To fix it, I removed the "HEAD" method because I just wanted to ping the url

Trying to packetize TCP with non-blocking IO is hard! Am I doing something wrong?

Oh how I wish TCP was packet-based like UDP is! [see comments] But alas, that's not the case, so I'm trying to implement my own packet layer. Here's the chain of events so far (ignoring writing packets)
Oh, and my Packets are very simply structured: two unsigned bytes for length, and then byte[length] data. (I can't imagine if they were any more complex, I'd be up to my ears in if statements!)
Server is in an infinite loop, accepting connections and adding them to a list of Connections.
PacketGatherer (another thread) uses a Selector to figure out which Connection.SocketChannels are ready for reading.
It loops over the results and tells each Connection to read().
Each Connection has a partial IncomingPacket and a list of Packets which have been fully read and are waiting to be processed.
On read():
Tell the partial IncomingPacket to read more data. (IncomingPacket.readData below)
If it's done reading (IncomingPacket.complete()), make a Packet from it and stick the Packet into the list waiting to be processed and then replace it with a new IncomingPacket.
There are a couple problems with this. First, only one packet is being read at a time. If the IncomingPacket needs only one more byte, then only one byte is read this pass. This can of course be fixed with a loop but it starts to get sorta complicated and I wonder if there is a better overall way.
Second, the logic in IncomingPacket is a little bit crazy, to be able to read the two bytes for the length and then read the actual data. Here is the code, boiled down for quick & easy reading:
int readBytes; // number of total bytes read so far
byte length1, length2; // each byte in an unsigned short int (see getLength())

public int getLength() { // will be inaccurate if readBytes < 2
    return ((length1 & 0xFF) << 8) | (length2 & 0xFF);
}

public void readData(SocketChannel c) throws IOException {
    if (readBytes < 2) { // we don't yet know the length of the actual data
        ByteBuffer lengthBuffer = ByteBuffer.allocate(2 - readBytes);
        int numBytesRead = c.read(lengthBuffer);
        lengthBuffer.flip(); // switch the buffer from filling to draining
        if (readBytes == 0) {
            if (numBytesRead >= 1)
                length1 = lengthBuffer.get();
            if (numBytesRead == 2)
                length2 = lengthBuffer.get();
        } else if (readBytes == 1) {
            if (numBytesRead == 1)
                length2 = lengthBuffer.get();
        }
        readBytes += numBytesRead;
    }
    if (readBytes >= 2) { // then we know we have the entire length variable
        // lazily-instantiate data buffers based on getLength()
        // read into data buffers, increment readBytes
        // (does not read more than the amount of this packet, so it does not
        // need to handle overflow into the next packet's data)
    }
}

public boolean complete() {
    return (readBytes > 2 && readBytes == getLength() + 2);
}
Basically I need feedback on my code and overall process. Please suggest any improvements. Even overhauling my entire system would be okay, if you have suggestions for how better to implement the whole thing. Book recommendations are welcome too; I love books. I just get the feeling that something isn't quite right.
Here's the general solution I came up with thanks to Juliano's answer: (feel free to comment if you have any questions)
public void fillWriteBuffer() {
    while (!writePackets.isEmpty() && writeBuf.remaining() >= writePackets.peek().size()) {
        Packet p = writePackets.poll();
        assert p != null;
        p.writeTo(writeBuf);
    }
}

public void fillReadPackets() {
    do {
        if (readBuf.position() < 1 + 2) {
            // haven't yet received the type byte and the two length bytes
            break;
        }
        short packetLength = readBuf.getShort(1);
        if (readBuf.position() >= 1 + 2 + packetLength) {
            // we have a complete packet!
            readBuf.flip();
            byte packetType = readBuf.get();
            packetLength = readBuf.getShort();
            byte[] packetData = new byte[packetLength];
            readBuf.get(packetData);
            Packet p = new Packet(packetType, packetData);
            readPackets.add(p);
            readBuf.compact();
        } else {
            // not a complete packet yet
            break;
        }
    } while (true);
}
Probably this is not the answer you are looking for, but someone would have to say it: You are probably overengineering the solution for a very simple problem.
You do not have packets before they arrive completely, not even IncomingPackets. You have just a stream of bytes without defined meaning. The usual, the simple solution is to keep the incoming data in a buffer (it can be a simple byte[] array, but a proper elastic and circular buffer is recommended if performance is an issue). After each read, you check the contents of the buffer to see if you can extract an entire packet from there. If you can, you construct your Packet, discard the correct number of bytes from the beginning of the buffer and repeat. If or when you cannot extract an entire packet, you keep those incoming bytes there until the next time you read something from the socket successfully.
While you are at it, if you are doing datagram-based communication over a stream channel, I would recommend you to include a magic number at the beginning of each "packet" so that you can test that both ends of the connection are still synchronized. They may get out of sync if for some reason (a bug) one of them reads or writes the wrong number of bytes to/from the stream.
Can't you just read whatever number of bytes that are ready to be read, and feed all incoming bytes into a packet parsing state machine? That would mean treating the incoming (TCP) data stream like any other incoming data stream (via serial line, or USB, a pipe, or whatever...)
So you would have some Selector determining from which connection(s) there are incoming bytes to be read, and how many. Then you would (for each connection) read the available bytes, and then feed those bytes into a (connection specific) state machine instance (the reading and feeding could be done from the same class, though). This packet parsing state machine class would then spit out finished packets from time to time, and hand those over to whoever will handle those complete and parsed packets.
For an example packet format like
2 magic header bytes to mark the start
2 bytes of payload size (n)
n bytes of payload data
2 bytes of checksum
the state machine would have states like (try an enum, Java has those now, I gather)
wait_for_magic_byte_0,
wait_for_magic_byte_1,
wait_for_length_byte_0,
wait_for_length_byte_1,
wait_for_payload_byte (with a payload_offset variable counting),
wait_for_chksum_byte_0,
wait_for_chksum_byte_1
and on each incoming byte you can switch the state accordingly. If the incoming byte does not properly advance the state machine, discard the byte by resetting the state machine to wait_for_magic_byte_0.
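Such a parser might look like this (a minimal sketch for the example format above; the magic bytes and checksum algorithm are made up for illustration):

class PacketParser {
    enum State { MAGIC_0, MAGIC_1, LENGTH_0, LENGTH_1, PAYLOAD, CHKSUM_0, CHKSUM_1 }

    private State state = State.MAGIC_0;
    private int length, offset, checksum;
    private byte[] payload;

    // Feed one incoming byte; returns a finished payload, or null if more bytes are needed.
    byte[] feed(byte b) {
        switch (state) {
            case MAGIC_0:
                state = (b == (byte) 0xCA) ? State.MAGIC_1 : State.MAGIC_0;
                break;
            case MAGIC_1:
                state = (b == (byte) 0xFE) ? State.LENGTH_0 : State.MAGIC_0;
                break;
            case LENGTH_0:
                length = (b & 0xFF) << 8;
                state = State.LENGTH_1;
                break;
            case LENGTH_1:
                length |= (b & 0xFF);
                payload = new byte[length];
                offset = 0;
                state = (length > 0) ? State.PAYLOAD : State.CHKSUM_0;
                break;
            case PAYLOAD:
                payload[offset++] = b;
                if (offset == length) state = State.CHKSUM_0;
                break;
            case CHKSUM_0:
                checksum = (b & 0xFF) << 8;
                state = State.CHKSUM_1;
                break;
            case CHKSUM_1:
                checksum |= (b & 0xFF);
                state = State.MAGIC_0; // reset for the next packet either way
                if (checksumOk(payload, checksum)) return payload; // else discard it
                break;
        }
        return null;
    }

    private boolean checksumOk(byte[] data, int expected) {
        int sum = 0;
        for (byte d : data) sum = (sum + (d & 0xFF)) & 0xFFFF;
        return sum == expected;
    }
}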
Ignoring client disconnects and server shutdown for now, here's a more or less traditional structure for a socket server:
Selector, handles sockets:
polls open sockets
if it's the server socket, create new Connection object
for each active client socket find the Connection, call it with event (read or write)
Connection (one per socket), handles I/O on one socket:
Communicates to Protocol via two queues, input and output
keeps two buffers, one for reading, one for writing, and respective offsets
on read event: read all available input bytes, look for message boundaries, put whole messages onto Protocol input queue, call Protocol
on write event: write the buffer, or if it's empty, take a message from the output queue into the buffer and start writing it
Protocol (one per connection), handles application protocol exchange on one connection:
take message from input queue, parse application portion of the message
do the server work (here's where the state machine is - some messages are appropriate in one state, while not in the other), generate response message, put it onto output queue
That's it. Everything could be in a single thread. The key here is separation of responsibilities.
Hope this helps.
I think you're approaching the issue from a slightly wrong direction. Instead of thinking of packets, think of a data structure. That's what you're sending. Effectively, yes, it's an application layer packet, but just think of it as a data object. Then, at the lowest level, write a routine which will read off the wire, and output data objects. That will give you the abstraction layer I think you're looking for.

In my application, why does readInt() always throw an EOFException?

(Forgive me because I do not write in Java very often.)
I'm writing a client-side network application in Java and I'm having an interesting issue. Every call to readInt() throws an EOFException. The variable is of type DataInputStream (initialized as: DataInputStream din = new DataInputStream(new BufferedInputStream(sock.getInputStream())); where sock is of type Socket).
Now, sock.isInputShutdown() returns false and socket.isConnected() returns true. I'm assuming that this means that I have a valid connection to the other machine I'm connecting to. I've also performed other checks to ensure that I'm properly connected to the other machine.
Is it possible that the DataInputStream was not set up correctly? Are there any preconditions that I have missed?
Any help is greatly appreciated.
#tofubeer: I actually wrote 17 bytes to the socket. The socket is connected to another machine and I'm waiting on input from that machine (I'm sorry if this was unclear). I successfully read from the stream (to initiate a handshake) first and this worked just fine. I'm checking now to see if my sent-requests are malformed, but I don't think they are. Also, I tried reading a single byte from the stream (via read()) and it returned -1.
Are you writing 4 bytes to the socket? According to the JavaDoc it will throw an EOFException if this stream reaches the end before reading all the bytes.
Try calling readByte() 4 times in a row instead of readInt() and see what happens (likely not all of them will work).
Edit (given your edit).
Find out how many times you can call read() before you get the -1.
When read() returns -1 it means that it has hit the end of file.
Also find out what each read() returns to make sure what you are reading in is what you actually wrote out.
It sounds like a problem with either the reading code consuming more than you think while doing the handshake, or the other side not writing what you think it is writing.
Some things to check:
Did the handshake consume more than 13 bytes, leaving less than four for the readInt()?
Was the integer you want to read written via DataOutputStream.writeInt()?
Did you flush the stream from the sender?
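If in doubt, the matched pair looks like this (a sketch):

// sender
DataOutputStream dout = new DataOutputStream(sock.getOutputStream());
dout.writeInt(value);
dout.flush();

// receiver
DataInputStream din = new DataInputStream(new BufferedInputStream(sock.getInputStream()));
int value = din.readInt();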
Edit: I took a look at the Java sources (I have the 1.4 sources on my desktop, not sure which version you're using) and the problem might be in BufferedInputStream. DataInputStream.readInt() is just calling BufferedInputStream.read() four times. BufferedInputStream.read() is calling BufferedInputStream.fill() if its buffer is exhausted (e.g., if its first read only got 16 bytes). BufferedInputStream.fill() calls the underlying InputStream's read(byte[], int, int) method, which by contract might not actually read anything! If this happens, BufferedInputStream.read() will return an erroneous EOF.
This is all assuming that I'm reading all of this correctly, which might not be the case. I only took a quick peek at the sources.
I suspect that your BufferedInputStream is only getting the first 16 bytes of the stream in its first read. I'd be curious what your DataInputStream's available() returns right before the readInt. If you're not already, I'd suggest you flush your OutputStream after writing the int you can't read as a possible workaround.
