Java HttpURLConnection InputStream.close() hangs (or takes too long?)

First, some background. There is a worker that expands/resolves a bunch of short URLs:
http://t.co/example -> http://example.com
So we just follow redirects. That's it. We don't read any data from the connection. As soon as we get a 200, we return the final URL and close the InputStream.
Now, the problem itself. On a production server one of the resolver threads hangs inside the InputStream.close() call:
"ProcessShortUrlTask" prio=10 tid=0x00007f8810119000 nid=0x402b runnable [0x00007f882b044000]
java.lang.Thread.State: RUNNABLE
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.skip(BufferedInputStream.java:352)
- locked <0x0000000561293aa0> (a java.io.BufferedInputStream)
at sun.net.www.MeteredStream.skip(MeteredStream.java:134)
- locked <0x0000000561293a70> (a sun.net.www.http.KeepAliveStream)
at sun.net.www.http.KeepAliveStream.close(KeepAliveStream.java:76)
at java.io.FilterInputStream.close(FilterInputStream.java:155)
at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.close(HttpURLConnection.java:2735)
at ru.twitter.times.http.URLProcessor.resolve(URLProcessor.java:131)
at ru.twitter.times.http.URLProcessor.resolve(URLProcessor.java:55)
at ...
After a brief investigation, I understood that skip() is called to clean up the stream before the connection is returned to the connection pool (if keep-alive is enabled?). Still, I don't understand how to avoid this situation. Moreover, I'm not sure whether there is some bad design in our code or a problem in the JDK.
So, the questions are:
Is it possible to avoid hanging on close()? For example, can some reasonable timeout be guaranteed?
Is it possible to avoid reading data from the connection at all? Remember, I just want the final URL. Actually, I think I don't want skip() to be called at all...
Update:
KeepAliveStream, line 79, close() method:
// Skip past the data that's left in the Inputstream because
// some sort of error may have occurred.
// Do this ONLY if the skip won't block. The stream may have
// been closed at the beginning of a big file and we don't want
// to hang around for nothing. So if we can't skip without blocking
// we just close the socket and, therefore, terminate the keepAlive
// NOTE: Don't close super class
try {
if (expected > count) {
long nskip = (long) (expected - count);
if (nskip <= available()) {
long n = 0;
while (n < nskip) {
nskip = nskip - n;
n = skip(nskip);} ...
It increasingly seems to me that there is a bug in the JDK itself. Unfortunately, it's very hard to reproduce...

The implementation of KeepAliveStream that you linked violates the contract under which available() and skip() are guaranteed to be non-blocking, and thus it may indeed block.
The contract of available() guarantees a single non-blocking skip():
Returns an estimate of the number of bytes that can be read (or
skipped over) from this input stream without blocking by the next
caller of a method for this input stream. The next caller might be
the same thread or another thread. A single read or skip of this
many bytes will not block, but may read or skip fewer bytes.
Whereas the implementation calls skip() multiple times per single call to available():
if (nskip <= available()) {
long n = 0;
// The loop below can iterate several times,
// only the first call is guaranteed to be non-blocking.
while (n < nskip) {
nskip = nskip - n;
n = skip(nskip);
}
This doesn't prove that your application blocks because KeepAliveStream incorrectly uses InputStream. Some implementations of InputStream may provide stronger non-blocking guarantees, but I think it is a very likely suspect.
EDIT: After a bit more research, this turns out to be a very recently fixed bug in the JDK: https://bugs.openjdk.java.net/browse/JDK-8004863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel. The bug report describes an infinite loop, but a blocking skip() could also be a result. The fix seems to address both issues (there is only a single skip() per available()).
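To make the contract concrete, here is a minimal sketch (not the actual JDK patch) of draining leftover bytes while calling skip() at most once per available() call. The class name NonBlockingDrain and its drain method are hypothetical; the loop simply gives up as soon as available() reports nothing, so it never waits on the network.

```java
import java.io.IOException;
import java.io.InputStream;

final class NonBlockingDrain {
    private NonBlockingDrain() {}

    /**
     * Skips up to {@code remaining} bytes without blocking, assuming the stream
     * honors the available() contract. Returns the number of bytes skipped.
     * Every skip() is preceded by a fresh available() call, because available()
     * only promises that the *next single* read or skip will not block.
     */
    static long drain(InputStream in, long remaining) throws IOException {
        long skipped = 0;
        while (skipped < remaining) {
            int avail = in.available();
            if (avail <= 0) {
                break; // nothing buffered: give up instead of blocking
            }
            long n = in.skip(Math.min(remaining - skipped, avail));
            if (n <= 0) {
                break; // defensive: avoid spinning if skip() makes no progress
            }
            skipped += n;
        }
        return skipped;
    }
}
```

If drain returns less than requested, the caller would close the socket instead of returning it to the pool, which is what the comment in KeepAliveStream.close() says it intends to do.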

I guess this skip() on close() is intended for Keep-Alive support.
See http://docs.oracle.com/javase/6/docs/technotes/guides/net/http-keepalive.html.
Prior to Java SE 6, if an application closes a HTTP InputStream when
more than a small amount of data remains to be read, then the
connection had to be closed, rather than being cached. Now in Java SE
6, the behavior is to read up to 512 Kbytes off the connection in a
background thread, thus allowing the connection to be reused. The
exact amount of data which may be read is configurable through the
http.KeepAlive.remainingData system property.
So keep alive can be effectively disabled with http.KeepAlive.remainingData=0 or http.keepAlive=false.
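A minimal sketch of setting the two properties mentioned above from code. The property names and values come from the text; the helper class KeepAliveConfig is hypothetical. These properties are generally read once by the JDK's HTTP client, so set them before the first HttpURLConnection is opened, or pass them on the command line as -Dhttp.keepAlive=false or -Dhttp.KeepAlive.remainingData=0.

```java
final class KeepAliveConfig {
    private KeepAliveConfig() {}

    /** Turns off HTTP keep-alive for HttpURLConnection altogether. */
    static void disableKeepAlive() {
        System.setProperty("http.keepAlive", "false");
    }

    /**
     * Per the quoted Java SE 6 notes, this property bounds how much leftover
     * response data may be read after close() to keep a connection reusable.
     */
    static void limitRemainingDataToZero() {
        System.setProperty("http.KeepAlive.remainingData", "0");
    }
}
```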
But this can hurt performance if you always connect to the same http://t.co host.
As @artbristol suggested, using HEAD instead of GET seems to be the preferable solution here.
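A minimal sketch of the HEAD approach, assuming you follow redirects yourself instead of relying on automatic redirect handling. ShortUrlResolver, the hop limit and the timeout parameter are illustrative choices, not part of the original code. Because a HEAD response has no body, there is nothing left for close() to skip, and the connect and read timeouts bound how long each hop can stall. Some servers handle HEAD poorly, so a fallback to GET may still be needed.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

final class ShortUrlResolver {
    private ShortUrlResolver() {}

    /** Follows up to {@code maxHops} redirects using HEAD requests and returns the final URL. */
    static String resolve(String shortUrl, int maxHops, int timeoutMillis) throws IOException {
        URL current = new URL(shortUrl);
        for (int hop = 0; hop <= maxHops; hop++) {
            HttpURLConnection conn = (HttpURLConnection) current.openConnection();
            try {
                conn.setRequestMethod("HEAD");           // no response body to read or skip
                conn.setInstanceFollowRedirects(false);  // inspect each hop ourselves
                conn.setConnectTimeout(timeoutMillis);
                conn.setReadTimeout(timeoutMillis);
                int code = conn.getResponseCode();
                String location = conn.getHeaderField("Location");
                if (code >= 300 && code < 400 && location != null) {
                    // Location may be relative, so resolve it against the current URL.
                    current = new URL(current, location);
                } else {
                    return current.toString();
                }
            } finally {
                conn.disconnect();
            }
        }
        throw new IOException("Too many redirects starting from " + shortUrl);
    }
}
```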

I was facing a similar issue when I was trying to make a HEAD request. To fix it, I removed the HEAD method, because I just wanted to ping the URL.

Related

How to properly implement a blocking, thread-safe write method for Java sockets?

I wrote a WebSocket server in Java.
This is the method that the server uses to send WebSocket packets to its clients:
private void sendFrame(boolean fin, boolean rsv1, boolean rsv2, boolean rsv3, WebSocketOpcode opcode, byte[] payloadData) throws IOException {
if (connection.isClosed() || webSocketConnectionClosing != null) return;
byte[] header = new byte[2];
if (fin) header[0] |= 1 << 7;
if (rsv1) header[0] |= 1 << 6;
if (rsv2) header[0] |= 1 << 5;
if (rsv3) header[0] |= 1 << 4;
header[0] |= opcode.get() & 0b1111;
header[1] |= payloadData.length < 126 ? payloadData.length : (payloadData.length <= 65535 ? 126 : 127);
out.write(header);
if (payloadData.length > 125) {
if (payloadData.length <= 65535) {
out.writeShort(payloadData.length);
} else {
out.writeLong(payloadData.length);
}
}
out.write(payloadData);
out.flush();
}
And this is how I declare the output stream after a client connects:
out = new DataOutputStream(new BufferedOutputStream(connection.getOutputStream()));
And I have some questions regarding this:
Is the above code thread-safe? What I mean by that is, can multiple threads call sendFrame() at the same time without the risk of packet data interleaving? It looks like this code is wrong, but I haven't encountered any interleaving yet.
If it isn't thread-safe, then how would I make it thread-safe in this form without the use of queues? (I want the sendFrame() method to be blocking until the data is actually sent)
If I wouldn't wrap the OutputStream in BufferedOutputStream, but only in DataOutputStream instead, would this make the .write() method atomic? Would it be thread-safe to pack the entire packet data into a single byte array and then call .write() once with that array?
Is the above code thread-safe? What I mean by that is, can multiple threads call sendFrame() at the same time without the risk of packet data interleaving?
It is not thread-safe.
It looks like this code is wrong, but I haven't encountered any interleaving yet.
The time window in which the interleaving could occur is very small. Probably less than a microsecond. That means the probability of it occurring is small. But not zero.
If it isn't thread-safe, then how would I make it thread-safe in this form without the use of queues? (I want the sendFrame() method to be blocking until the data is actually sent)
It depends on how the sendFrame method fits in with the rest of your code.
The approach I would use would be to ensure that all calls to sendFrame for a specific output stream are made on the same target object. Then I would use synchronized to lock on the target object, or on a private lock object belonging to the target object.
An alternative would be to use synchronized and lock on out. However, there is a risk that something else is doing that already, and sendFrame calls would be blocked unnecessarily.
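A minimal sketch of the private-lock approach, combined with the single-array idea from the question. WebSocketSender, encodeFrame and the int opcode parameter are hypothetical stand-ins for the question's code (which uses a WebSocketOpcode type and RSV bits, omitted here); frames are unmasked, as in server-to-client traffic. Each frame is encoded outside the lock and then written and flushed while holding the lock, so frames from different threads cannot interleave, and sendFrame() blocks until the bytes have been handed to the underlying stream (for a socket, that is not the same as the peer having received them).

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

final class WebSocketSender {
    private final Object writeLock = new Object(); // private, so no outside code can hold it
    private final OutputStream out;

    WebSocketSender(OutputStream out) {
        this.out = out;
    }

    /** Encodes one unmasked frame into a single byte array. */
    static byte[] encodeFrame(boolean fin, int opcode, byte[] payload) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream(payload.length + 10);
        DataOutputStream data = new DataOutputStream(buf); // unbuffered: writes go straight to buf
        data.writeByte((fin ? 0x80 : 0) | (opcode & 0x0F));
        if (payload.length < 126) {
            data.writeByte(payload.length);
        } else if (payload.length <= 0xFFFF) {
            data.writeByte(126);
            data.writeShort(payload.length); // 16-bit extended length
        } else {
            data.writeByte(127);
            data.writeLong(payload.length);  // 64-bit extended length
        }
        data.write(payload);
        return buf.toByteArray();
    }

    /** Blocks until the whole frame has been written and flushed to the stream. */
    void sendFrame(boolean fin, int opcode, byte[] payload) throws IOException {
        byte[] frame = encodeFrame(fin, opcode, payload); // encode outside the lock
        synchronized (writeLock) {
            out.write(frame);
            out.flush();
        }
    }
}
```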
If I wouldn't wrap the OutputStream in BufferedOutputStream, but only in DataOutputStream instead, would this make the .write() method atomic?
(That's beside the point. You have three write calls to contend with. However...)
Would it be thread-safe to pack the entire packet data into a single byte array and then call .write() once with that array?
None of those classes are documented [1] as thread-safe, or as guaranteeing that write operations are atomic. However, in OpenJDK Java 11 (at least), the relevant write methods are implemented as synchronized in BufferedOutputStream and DataOutputStream.
[1] If the javadocs don't specify thread-safety (and similar) characteristics, then those characteristics could vary depending on the Java version, etc.

EventLoop#submit() vs #execute() vs Channel#writeAndFlush()

What's the difference between the 3 methods when writing bytes to a channel?
In my case, the thread writing these bytes is not the thread that belongs to the channel's EventLoop, and I understand that IO events always happen on the channel's assigned EventLoop thread.
I am trying to minimize latency with getting these bytes flushed as soon as possible.
To better understand what I can do to optimize this, I need to know the difference between these 3 ways to write data to a channel, and possibly any other way I may have missed?
byte[] data = ...
Channel channel = ...
// 1
channel.eventLoop().submit(() -> channel.writeAndFlush(data));
// 2
channel.eventLoop().execute(() -> channel.writeAndFlush(data));
// 3
channel.writeAndFlush(data);
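So for what you are doing here, there isn't really much difference except in how the return value of writeAndFlush is propagated: submit() hands back a Future that completes with the ChannelFuture once the lambda has run, execute() discards it, and the direct call returns the ChannelFuture immediately. In all three cases the actual write runs on the channel's event loop; when writeAndFlush is called from another thread, Netty schedules the write onto the event loop itself.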

Is CAS a loop like spin?

I came across a question when I read the code of sun.misc.Unsafe.java.
Is CAS a loop like spin?
At first, I thought CAS was just a low-level atomic operation. However, when I tried to find the source code of the function compareAndSwapInt, I found C++ code like this:
jbyte Atomic::cmpxchg(jbyte exchange_value, volatile jbyte* dest, jbyte compare_value) {
assert(sizeof(jbyte) == 1, "assumption.");
uintptr_t dest_addr = (uintptr_t)dest;
uintptr_t offset = dest_addr % sizeof(jint);
volatile jint* dest_int = (volatile jint*)(dest_addr - offset);
jint cur = *dest_int;
jbyte* cur_as_bytes = (jbyte*)(&cur);
jint new_val = cur;
jbyte* new_val_as_bytes = (jbyte*)(&new_val);
new_val_as_bytes[offset] = exchange_value;
while (cur_as_bytes[offset] == compare_value) {
jint res = cmpxchg(new_val, dest_int, cur);
if (res == cur) break;
cur = res;
new_val = cur;
new_val_as_bytes[offset] = exchange_value;
}
return cur_as_bytes[offset];
}
I saw "while" and "break" in this atomic function.
Is it a kind of spin?
related code links:
http://hg.openjdk.java.net/jdk8u/jdk8u20/hotspot/file/190899198332/src/share/vm/prims/unsafe.cpp
http://hg.openjdk.java.net/jdk8u/jdk8u/jdk/file/07011844584f/src/share/classes/sun/misc/Unsafe.java
http://hg.openjdk.java.net/jdk8u/jdk8u20/hotspot/file/55fb97c4c58d/src/share/vm/runtime/atomic.cpp
CAS is a single operation that returns 1 or 0, indicating whether or not the operation succeeded. Since you call compareAndSwapInt because you want the operation to succeed, the operation gets repeated until it works.
I think you are also confusing this with a spin lock, which basically means: keep trying while this value is "1" (for example). All other threads wait until the value becomes zero (via compareAndSwap), which in effect means that some thread is done with its work and has released the lock (this is referred to as release/acquire semantics).
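To illustrate the distinction: each compareAndSet below is one atomic CAS, and the spinning is the loop the caller writes around it. This is a minimal, unfair, non-reentrant spin lock for illustration only; SpinLock is a hypothetical class, and Thread.onSpinWait() requires Java 9+.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** A minimal (unfair, non-reentrant) spin lock built on compareAndSet. */
final class SpinLock {
    private final AtomicBoolean locked = new AtomicBoolean(false);

    void lock() {
        // Each compareAndSet is a single atomic CAS; the *loop* around it is the spin.
        while (!locked.compareAndSet(false, true)) {
            Thread.onSpinWait(); // hint to the CPU that we are busy-waiting
        }
    }

    void unlock() {
        // Volatile write: makes the critical section's writes visible to the
        // next thread whose CAS acquires the lock (release semantics).
        locked.set(false);
    }
}
```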
The CAS operation is not a spin; it's an atomic operation at the hardware level. On x86 and SPARC processors, CAS is a single instruction, and it supports int and long operands.
Indeed the Atomic::cmpxchg int / long overloads are generated on x86 using a single cmpxchgl/cmpxchgq instruction.
What you're looking at is an Atomic::cmpxchg single-byte overload, which works around the CAS instruction's limitation to simulate CAS at byte level. It does so by performing a CAS for an int located at the same address as the byte, then checking just one byte out of it and repeating if CAS fails because of a change in the other 3 bytes. The compare-and-swap is still atomic, it just needs to be re-tried sometimes because it covers more bytes than is necessary.
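A Java sketch of the same technique, assuming four bytes packed into one AtomicInteger, with byte k occupying bits 8k to 8k+7. PackedBytes is a hypothetical class, and AtomicInteger.compareAndExchange requires Java 9+. As in the HotSpot code, the int-wide CAS is atomic; the loop only retries when a neighboring byte changed between the read and the CAS.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Four bytes packed into one int; a byte-sized CAS is emulated with an int-sized CAS. */
final class PackedBytes {
    private final AtomicInteger word = new AtomicInteger();

    private static byte byteAt(int w, int offset) {
        return (byte) (w >>> (offset * 8));
    }

    private static int withByte(int w, int offset, byte value) {
        int shift = offset * 8;
        return (w & ~(0xFF << shift)) | ((value & 0xFF) << shift);
    }

    byte get(int offset) {
        return byteAt(word.get(), offset);
    }

    /**
     * Atomically sets byte {@code offset} to {@code newValue} if it equals {@code expected}.
     * Like Atomic::cmpxchg, returns the byte value observed: success iff result == expected.
     */
    byte compareAndExchange(int offset, byte expected, byte newValue) {
        int cur = word.get();
        while (byteAt(cur, offset) == expected) {
            int next = withByte(cur, offset, newValue);
            int witness = word.compareAndExchange(cur, next);
            if (witness == cur) {
                break; // the int-wide CAS succeeded
            }
            // Some byte changed (possibly a neighbor): re-check our byte and retry.
            cur = witness;
        }
        return byteAt(cur, offset);
    }
}
```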
CAS is typically a hardware instruction just like integer addition or comparison, for example (only slower). The instruction itself may be broken down into several steps of so-called microcode, and might indeed contain a low-level loop or a blocking wait for another processor component. However, these are implementation details of the processor architecture. Remember the saying that any problem in CS can be solved by adding another layer of indirection? This also applies here. An atomic operation in Java may actually involve the following layers:
The Java method signature.
A C(++) JNI method to implement it.
A C(++) "compiler intrinsic" such as GCC's __atomic_compare_exchange
The actual processor instruction.
The microcode that implements this instruction.
Additional layers to be used by said microcode, such as cache coherency protocols and the like.
My recommendation is not to worry about how all of this works unless either case applies:
For some reason, it doesn't work. This is likely due to a platform bug.
It is too slow.
Unit tests can help you identify the former case. Benchmarking can help you identify the latter case. But it should be pointed out that if the CAS provided to you by Java is slow, chances are that you will not be able to write a faster one yourself. Therefore, your best bet in this case would be to change your data structures or data flows so as to further reduce the amount of thread synchronization required.

Java NIO: transferFrom until end of stream

I'm playing around with the NIO library. I'm attempting to listen for a connection on port 8888 and once a connection is accepted, dump everything from that channel to somefile.
I know how to do it with ByteBuffers, but I'd like to get it working with the allegedly super efficient FileChannel.transferFrom.
This is what I got:
ServerSocketChannel ssChannel = ServerSocketChannel.open();
ssChannel.socket().bind(new InetSocketAddress(8888));
SocketChannel sChannel = ssChannel.accept();
FileChannel out = new FileOutputStream("somefile").getChannel();
while (... sChannel has not reached the end of the stream ...) <-- what to put here?
out.transferFrom(sChannel, out.position(), BUF_SIZE);
out.close();
So, my question is: How do I express "transferFrom some channel until end-of-stream is reached"?
Edit: Changed 1024 to BUF_SIZE, since the size of the buffer used is irrelevant to the question.
There are a few ways to handle this case. First, some background on how transferTo/transferFrom is implemented internally and when it can be superior.
First and foremost, you should know how many bytes you have to transfer, i.e., use FileChannel.size() to determine the maximum available and sum the results. This applies to FileChannel.transferTo(socketChannel):
The method does not return -1.
The method is emulated on Windows. Windows doesn't have an API function to transfer from a file descriptor to a socket; it does have one (two) to transfer from a file designated by name, but that's incompatible with the Java API.
On Linux the standard sendfile (or sendfile64) is used; on Solaris it's called sendfilev64.
In short, for (long xferBytes = 0; startPos + xferBytes < fchannel.size();) doXfer() will work for transfers from file to socket.
There is no OS function that transfers from a socket to a file (which is what the OP is interested in). Since the socket data is not in the OS cache, it can't be done as efficiently; it's emulated. The best way to implement the copy is via a standard loop using a pooled direct ByteBuffer sized to the socket read buffer. Since I use only non-blocking IO, that involves a selector as well.
As for "I'd like to get it working with the allegedly super efficient" transferFrom: in this direction it is not efficient, and it's emulated on all OSes, so it will end the transfer when the socket is closed, gracefully or not. The function will not even throw the inherited IOException, provided there was ANY transfer (if the socket was readable and open).
I hope the answer is clear: the only interesting use of FileChannel.transferFrom happens when the source is a file. The most efficient (and interesting) case is file->socket, and file->file is implemented via FileChannel map/unmap(!!).
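A minimal sketch of the "standard loop" described above, in blocking mode. ChannelCopy is a hypothetical helper, and the buffer size is left to the caller (the answer suggests matching the socket's receive buffer, e.g. via socket().getReceiveBufferSize()). This version allocates a fresh direct buffer per call rather than using a pool, and it omits the selector handling.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.ReadableByteChannel;

final class ChannelCopy {
    private ChannelCopy() {}

    /** Copies a blocking source channel into {@code out} until end-of-stream; returns bytes copied. */
    static long copyToFile(ReadableByteChannel in, FileChannel out, int bufferSize) throws IOException {
        ByteBuffer buf = ByteBuffer.allocateDirect(bufferSize);
        long total = 0;
        while (in.read(buf) >= 0) {      // read() returns -1 only at end-of-stream
            buf.flip();
            while (buf.hasRemaining()) { // write() may not drain the buffer in one call
                total += out.write(buf);
            }
            buf.clear();
        }
        return total;
    }
}
```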
Answering your question directly:
while( (count = socketChannel.read(this.readBuffer) ) >= 0) {
/// do something
}
But if this is what you do, you get none of the benefits of non-blocking IO, because you actually use it exactly like blocking IO. The point of non-blocking IO is that one network thread can serve several clients simultaneously: if there is nothing to read from one channel (i.e., count == 0), you can switch to another channel (one that belongs to another client connection).
So, the loop should actually iterate different channels instead of reading from one channel until it is over.
Take a look at this tutorial: http://rox-xmlrpc.sourceforge.net/niotut/
I believe it will help you to understand the issue.
I'm not sure, but the JavaDoc says:
An attempt is made to read up to count bytes from the source channel
and write them to this channel's file starting at the given position.
An invocation of this method may or may not transfer all of the
requested bytes; whether or not it does so depends upon the natures
and states of the channels. Fewer than the requested number of bytes
will be transferred if the source channel has fewer than count bytes
remaining, or if the source channel is non-blocking and has fewer than
count bytes immediately available in its input buffer.
I think telling it to copy an effectively unlimited number of bytes (once, not in a loop) will do the job:
out.transferFrom(sChannel, out.position(), Integer.MAX_VALUE);
So, I guess when the socket connection is closed, the channel's state will change, which will stop the transferFrom method.
But as I already said: I'm not sure.
allegedly super efficient FileChannel.transferFrom.
If you want both the benefits of DMA access and nonblocking IO the best way is to memory-map the file and then just read from the socket into the memory mapped buffers.
But that requires that you preallocate the file.
This way:
URLConnection connection = new URL("target").openConnection();
File file = new File(connection.getURL().getPath().substring(1));
try (ReadableByteChannel source = Channels.newChannel(connection.getInputStream());
     FileChannel download = new FileOutputStream(file).getChannel()) {
    long position = 0;
    long n;
    while ((n = download.transferFrom(source, position, 1024)) > 0) {
        position += n;
        // Some calculations to get current speed ;)
    }
}
transferFrom() returns a count. Just keep calling it, advancing the position/offset, until it returns zero. But start with a much larger count than 1024, more like a megabyte or two, otherwise you're not getting much benefit from this method.
EDIT: To address all the commentary below, the documentation says that "Fewer than the requested number of bytes will be transferred if the source channel has fewer than count bytes remaining, or if the source channel is non-blocking and has fewer than count bytes immediately available in its input buffer." So, provided you are in blocking mode, it won't return zero until there is nothing left in the source. So looping until it returns zero is valid.
EDIT 2
The transfer methods are certainly mis-designed. They should have been designed to return -1 at end of stream, like all the read() methods.
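Looping until transferFrom returns zero can be sketched like this, assuming the source channel is in blocking mode (for a non-blocking source, zero only means nothing is available right now). TransferUntilEof is a hypothetical helper, and the chunk size is the caller's choice, e.g. a megabyte or two as suggested above. transferFrom does not move the file channel's own position, so the loop tracks it explicitly.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.ReadableByteChannel;

final class TransferUntilEof {
    private TransferUntilEof() {}

    /** Transfers from a blocking {@code src} into {@code out} until transferFrom returns 0. */
    static long transferAll(FileChannel out, ReadableByteChannel src, long chunk) throws IOException {
        long start = out.position();
        long position = start;
        long n;
        while ((n = out.transferFrom(src, position, chunk)) > 0) {
            position += n; // transferFrom leaves the channel's position unchanged
        }
        out.position(position); // leave the channel positioned after the copied data
        return position - start;
    }
}
```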
Building on top of what other people here have written, here's a simple helper method which accomplishes the goal:
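public static void transferFully(FileChannel fileChannel, ReadableByteChannel sourceChannel, long totalSize) throws IOException {
    for (long bytesWritten = 0; bytesWritten < totalSize;) {
        long n = fileChannel.transferFrom(sourceChannel, bytesWritten, totalSize - bytesWritten);
        if (n == 0) { // a blocking source hit end-of-stream early: stop instead of looping forever
            throw new EOFException("source ended after " + bytesWritten + " of " + totalSize + " bytes");
        }
        bytesWritten += n;
    }
}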

In my application, why does readInt() always throw an EOFException?

(Forgive me because I do not write in Java very often.)
I'm writing a client-side network application in Java and I'm having an interesting issue. Every call to readInt() throws an EOFException. The variable is of type DataInputStream (initialized as: DataInputStream din = new DataInputStream(new BufferedInputStream(sock.getInputStream())); where sock is of type Socket).
Now, sock.isInputShutdown() returns false and sock.isConnected() returns true. I'm assuming that this means that I have a valid connection to the other machine I'm connecting to. I've also performed other checks to ensure that I'm properly connected to the other machine.
Is it possible that the DataInputStream was not set up correctly? Are there any preconditions that I have missed?
Any help is greatly appreciated.
@TofuBeer: I actually wrote 17 bytes to the socket. The socket is connected to another machine and I'm waiting on input from that machine (I'm sorry if this was unclear). I successfully read from the stream first (to initiate a handshake), and this worked just fine. I'm checking now to see whether my sent requests are malformed, but I don't think they are. Also, I tried reading a single byte from the stream (via read()) and it returned -1.
Are you writing 4 bytes to the socket? According to the JavaDoc it will throw an EOFException if this stream reaches the end before reading all the bytes.
Try calling readByte() 4 times in a row instead of readInt() and see what happens (likely not all of them will work).
Edit (given your edit).
Find out how many times you can call read() before you get the -1.
When read() returns -1 it means that it has hit the end of file.
Also find out what each read() returns to make sure what you are reading in is what you actually wrote out.
It sounds like a problem either with the read code reading more than you think during the handshake, or with the other side not writing what you think it is writing.
Some things to check:
Did the handshake consume more than 13 bytes, leaving fewer than four for the readInt()?
Was the integer you want to read written via DataOutputStream.writeInt()?
Did you flush the stream from the sender?
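The checks above can be exercised with in-memory streams. This sketch (ReadIntCheck is a hypothetical helper) writes an int with DataOutputStream.writeInt() and flushes, reads it back with readInt(), and shows that a stream holding fewer than four bytes produces an EOFException rather than a partial value.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;

final class ReadIntCheck {
    private ReadIntCheck() {}

    /** Encodes an int exactly as DataOutputStream.writeInt() does: 4 bytes, big-endian. */
    static byte[] encode(int value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(value);
            out.flush(); // over a buffered socket stream, forgetting this can strand the bytes
        }
        return bytes.toByteArray();
    }

    /** Returns the decoded int, or null if the stream ends before 4 bytes are available. */
    static Integer tryReadInt(byte[] wire) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire))) {
            return in.readInt();
        } catch (EOFException e) {
            return null; // fewer than 4 bytes were left in the stream
        }
    }
}
```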
Edit: I took a look at the Java sources (I have the 1.4 sources on my desktop, not sure which version you're using) and the problem might be in BufferedInputStream. DataInputStream.readInt() is just calling BufferedInputStream.read() four times. BufferedInputStream.read() is calling BufferedInputStream.fill() if its buffer is exhausted (e.g., if its first read only got 16 bytes). BufferedInputStream.fill() calls the underlying InputStream's read(byte[], int, int) method, which by contract might not actually read anything! If this happens, BufferedInputStream.read() will return an erroneous EOF.
This is all assuming that I'm reading all of this correctly, which might not be the case. I only took a quick peek at the sources.
I suspect that your BufferedInputStream is only getting the first 16 bytes of the stream in its first read. I'd be curious what your DataInputStream's available() returns right before the readInt. If you're not already, I'd suggest you flush your OutputStream after writing the int you can't read as a possible workaround.
