Encountered some code doing this inside a servlet:
while ((read = request.getInputStream().read(bytes)) != -1)
buffer.write(bytes, 0, read);
While in most cases request.getInputStream() just returns a field somewhere, I was thinking there might be dynamic wrappers or such that could get into a bad state.
Is there anything written in the docs about doing such a thing that I can use as a case for pulling the getInputStream() code out of the loop?
It's OK to call getInputStream() multiple times; the Servlet Specification only prohibits using it together with getReader(). As per the ServletRequest#getInputStream() method javadoc:
Retrieves the body of the request as binary data using a ServletInputStream. Either this method or getReader() may be called to read the body, not both.
Returns:
a ServletInputStream object containing the body of the request
Throws:
IllegalStateException - if the getReader() method has already been called for this request
IOException - if an input or output exception occurred
A particular Servlet implementation is free to return a wrapper object, but at the end of the day one should always expect that a ServletInputStream can throw an IOException at some point (e.g. on a connection reset).
If we take Apache Tomcat as an example, the HTTP connection handling logic is in the AbstractProtocol.ConnectionHandler.process() method and is very defensive. The cleanup code for the HTTP connection and the underlying socket runs after catch (Throwable t), so application errors shouldn't interfere with resource cleanup.
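Given the javadoc above, pulling the call out of the loop is harmless and makes the intent clearer. A minimal sketch, assuming the body fits in memory: the hypothetical helper readBody takes a plain InputStream, so in a servlet you would call readBody(request.getInputStream()) once, and it can also be tried outside a container. The 8 KiB buffer is an arbitrary choice, and the stream is deliberately not closed because the container owns it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class RequestBodyReader {
    /**
     * Reads a request body to the end. The caller obtains the stream once,
     * e.g. readBody(request.getInputStream()), instead of calling getInputStream() per iteration.
     * The stream is not closed here: in a servlet the container owns it.
     */
    public static byte[] readBody(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] bytes = new byte[8192];
        int read;
        while ((read = in.read(bytes)) != -1) {
            buffer.write(bytes, 0, read);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for request.getInputStream(); any InputStream is read the same way.
        InputStream body = new ByteArrayInputStream("name=value".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(readBody(body), StandardCharsets.UTF_8));
    }
}
```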
The doc explains that the HttpServletResponse#sendError() method throws an IOException if an input or output exception occurs (DRY ;).
I couldn't find any scenario that makes this method throw that exception. Is there one?
HTTP is sent over TCP so you can safely assume that somewhere in the underlying HttpServletRequest and HttpServletResponse there is a SocketInputStream and a SocketOutputStream.
If a user closes their browser, or the network goes down on the client or server side, then the server won't be able to receive requests or send responses. If the disconnection happens while the server is in the middle of sendError(), an IOException will occur while writing to the SocketOutputStream.
I'm trying to write out to URLConnection#getOutputStream; however, no data is actually sent until I call URLConnection#getInputStream. Even if I set URLConnection#doInput to false, it still won't send. Does anyone know why this is? There's nothing in the API documentation that describes this.
Java API Documentation on URLConnection: http://download.oracle.com/javase/6/docs/api/java/net/URLConnection.html
Java's Tutorial on Reading from and Writing to a URLConnection: http://download.oracle.com/javase/tutorial/networking/urls/readingWriting.html
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.net.URL;
import java.net.URLConnection;
public class UrlConnectionTest {
    private static final String TEST_URL = "http://localhost:3000/test/hitme";
    public static void main(String[] args) throws IOException {
        URLConnection urlCon = null;
        URL url = null;
        OutputStreamWriter osw = null;
        try {
            url = new URL(TEST_URL);
            urlCon = url.openConnection();
            urlCon.setDoOutput(true);
            urlCon.setRequestProperty("Content-Type", "text/plain");
            ////////////////////////////////////////
            // SETTING THIS TO FALSE DOES NOTHING //
            ////////////////////////////////////////
            // urlCon.setDoInput(false);
            osw = new OutputStreamWriter(urlCon.getOutputStream());
            osw.write("HELLO WORLD");
            osw.flush();
            /////////////////////////////////////////////////
            // MUST CALL THIS OTHERWISE WILL NOT WRITE OUT //
            /////////////////////////////////////////////////
            urlCon.getInputStream();
            /////////////////////////////////////////////////////////////////////////////////////////////////////////
            // If getInputStream is called while doInput=false, the following exception is thrown:                //
            // java.net.ProtocolException: Cannot read from URLConnection if doInput=false (call setDoInput(true)) //
            /////////////////////////////////////////////////////////////////////////////////////////////////////////
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            if (osw != null) {
                osw.close();
            }
        }
    }
}
The APIs for URLConnection and HttpURLConnection are (for better or worse) designed for the user to follow a very specific sequence of events:
1. Set request properties
2. (Optional) getOutputStream(), write to the stream, close the stream
3. getInputStream(), read from the stream, close the stream
If your request is a POST or PUT, you need the optional step #2.
To the best of my knowledge, the OutputStream is not like a socket: it is not directly connected to an InputStream on the server. Instead, after you close or flush the stream AND call getInputStream(), your output is built into a request and sent. The semantics are based on the assumption that you will want to read the response. Every example that I've seen shows this order of events. I certainly agree with you and others that this API is counterintuitive compared to the normal stream I/O API.
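The three-step order above might be sketched like this for a text POST. The class and method names are hypothetical, and the sketch assumes an HTTP URL (where setDoOutput(true) turns the default GET into a POST) and a UTF-8 response body.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

public class PostSequence {
    /** POSTs text in the order URLConnection expects and returns the response body (assumed UTF-8). */
    public static String postText(URL url, String text) throws IOException {
        URLConnection con = url.openConnection();
        // 1. Set request properties before any stream is requested.
        con.setDoOutput(true); // for an HTTP URL this makes the request a POST
        con.setRequestProperty("Content-Type", "text/plain; charset=UTF-8");

        // 2. Write the entity and close the output stream.
        try (OutputStream out = con.getOutputStream()) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }

        // 3. In the default (non-streaming) mode, asking for the input stream sends the buffered request.
        try (InputStream in = con.getInputStream()) {
            ByteArrayOutputStream body = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                body.write(chunk, 0, n);
            }
            return new String(body.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```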
The tutorial you link to states that "URLConnection is an HTTP-centric class". I interpret that to mean that the methods are designed around a Request-Response model, and make the assumption that is how they will be used.
For what it's worth, I found this bug report that explains the intended operation of the class better than the javadoc documentation. The evaluation of the report states "The only way to send out the request is by calling getInputStream."
Although the getInputStream() method can certainly cause a URLConnection object to initiate an HTTP request, it is not a requirement to do so.
Consider the actual workflow:
1. Build a request
2. Submit
3. Process the response
Step 1 includes the possibility of including data in the request, by way of an HTTP entity. It just so happens that the URLConnection class provides an OutputStream object as the mechanism for providing this data (and rightfully so for many reasons that aren't particularly relevant here). Suffice to say that the streaming nature of this mechanism provides the programmer an amount of flexibility when supplying the data, including the ability to close the output stream (and any input streams feeding it), before finishing the request.
In other words, step 1 allows for supplying a data entity for the request, then continuing to build it (such as by adding headers).
Step 2 is really a virtual step, and can be automated (like it is in the URLConnection class), since submitting a request is meaningless without a response (at least within the confines of the HTTP protocol).
Which brings us to Step 3. When processing an HTTP response, the response entity (retrieved by calling getInputStream()) is just one of the things we might be interested in. A response consists of a status, headers, and optionally an entity. The first time any one of these is requested, the URLConnection performs virtual step 2 and submits the request.
No matter if an entity is being sent via the connection's output stream or not, and no matter whether a response entity is expected back, a program will ALWAYS want to know the result (as provided by the HTTP status code). Calling getResponseCode() on the URLConnection provides this status, and switching on the result may end the HTTP conversation without ever calling getInputStream().
So, if data is being submitted, and a response entity is not expected, don't do this:
// request is now built, so...
InputStream ignored = urlConnection.getInputStream();
... do this:
// request is now built, so...
int result = urlConnection.getResponseCode();
// act based on this result
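A self-contained version of the getResponseCode() approach could look like the following sketch. The helper name postForStatus is hypothetical, and it assumes the caller has no use for a response entity; the final disconnect() trades keep-alive reuse of the socket for simplicity.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class PostStatusOnly {
    /** Sends a POST and returns only the HTTP status; getInputStream() is never called. */
    public static int postForStatus(URL url, String text) throws IOException {
        HttpURLConnection con = (HttpURLConnection) url.openConnection();
        con.setRequestMethod("POST");
        con.setDoOutput(true);
        con.setRequestProperty("Content-Type", "text/plain; charset=UTF-8");
        try (OutputStream out = con.getOutputStream()) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        // The request is built; asking for the status is what submits it.
        int status = con.getResponseCode();
        // Assumption: no response entity is needed, so the socket is simply closed
        // rather than returned to the keep-alive cache.
        con.disconnect();
        return status;
    }
}
```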
As my experiments (Java 1.7.0_01) have shown, the code:
osw = new OutputStreamWriter(urlCon.getOutputStream());
osw.write("HELLO WORLD");
osw.flush();
doesn't send anything to the server. It just saves what's written to an in-memory buffer. So if you're going to upload a large file via POST, you need to be sure that you have enough memory. On a desktop or server this may not be a big problem, but on Android it may result in an out-of-memory error. Here's an example of what the stack trace looks like when writing to the output stream runs out of memory:
Exception in thread "Thread-488" java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.Arrays.copyOf(Arrays.java:2271)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
at sun.net.www.http.PosterOutputStream.write(PosterOutputStream.java:78)
at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
at sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:282)
at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:125)
at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:135)
at java.io.OutputStreamWriter.write(OutputStreamWriter.java:220)
at java.io.Writer.write(Writer.java:157)
at maxela.tables.weboperations.POSTRequest.makePOST(POSTRequest.java:138)
At the bottom of the trace you can see the makePOST() method, which does the following:
writer = new OutputStreamWriter(conn.getOutputStream());
for (int j = 0 ; j < 3000 * 100 ; j++)
{
    writer.write("&var" + j + "=garbagegarbagegarbage_"+ j);
}
writer.flush();
And writer.write() throws the exception.
My experiments have also shown that any exception related to the actual connection or I/O with the server is thrown only after urlCon.getOutputStream() is called. Even urlCon.connect() seems to be a "dummy" method that doesn't make any physical connection.
However, if you call urlCon.getContentLengthLong(), which returns the Content-Length header field from the server's response headers, then URLConnection.getOutputStream() will be called automatically, and any exception it raises will be thrown at that point.
All exceptions thrown by urlCon.getOutputStream() are IOExceptions, and I have encountered the following ones:
try
{
    urlCon.getOutputStream();
}
catch (UnknownServiceException ex)
{
    System.out.println("UnknownServiceException():" + ex.getMessage());
}
catch (ConnectException ex)
{
    System.out.println("ConnectException()");
    Logger.getLogger(POSTRequest.class.getName()).log(Level.SEVERE, null, ex);
}
catch (IOException ex)
{
    System.out.println("IOException():" + ex.getMessage());
    Logger.getLogger(POSTRequest.class.getName()).log(Level.SEVERE, null, ex);
}
Hopefully my little research helps people: the URLConnection class is a bit counter-intuitive in some cases, so when using it one needs to know what it actually does.
The second reason is that when working with servers, a request may fail for many reasons (connection, DNS, firewall, HTTP responses, the server not being able to accept the connection, the server not being able to process the request in time). So it is important to understand what the raised exceptions tell you about what is actually happening with the connection.
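Since the exceptions are the main clue to what went wrong, one way to act on them is to map the common IOException subclasses to a likely cause, roughly following the list above. This is a hypothetical helper, and the descriptions are rough guesses at the usual cause rather than guarantees; a ConnectException, for instance, can come from a stopped server or a firewall alike.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.SocketTimeoutException;
import java.net.UnknownHostException;
import java.net.UnknownServiceException;

public class ConnectionFailures {
    /** Maps common IOException subclasses seen with URLConnection to a rough, human-readable cause. */
    public static String describe(IOException e) {
        if (e instanceof UnknownHostException) {
            // The message of an UnknownHostException is usually the host name.
            return "DNS lookup failed for " + e.getMessage();
        } else if (e instanceof ConnectException) {
            return "connection refused or unreachable (server down, wrong port, firewall)";
        } else if (e instanceof SocketTimeoutException) {
            return "timed out while connecting or reading";
        } else if (e instanceof UnknownServiceException) {
            return "operation not supported by this connection (e.g. output on a read-only URL)";
        }
        return "other I/O failure: " + e;
    }
}
```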
Calling getInputStream() signals that the client has finished sending its request and is ready to receive the response (per the HTTP spec). It seems that the URLConnection class has this notion built into it, and must be flush()ing the output stream when the input stream is asked for.
As the other responder noted, you should be able to call flush() yourself to trigger the write.
The fundamental reason is that it has to compute a Content-Length header automatically (unless you are using chunked or fixed-length streaming mode). It can't do that until it has seen all the output, and it has to send that header before the output, so it has to buffer the output. It also needs a decisive event to know when the last output has actually been written, and it uses getInputStream() for that. At that point it writes the headers, including the Content-Length, then the output, and then it starts reading the input.
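When the body is large (the Android out-of-memory case above), HttpURLConnection's streaming modes avoid the in-memory buffer: setFixedLengthStreamingMode(long) when the length is known up front, so the Content-Length header can go out first, or setChunkedStreamingMode(int) otherwise, which requires a server that accepts chunked request bodies. A sketch under those assumptions; the upload helper and its convention that a negative length means "unknown" are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class StreamingUpload {
    /**
     * Uploads everything from source without buffering the whole body in memory.
     * length >= 0: fixed-length streaming (exactly that many bytes must be written);
     * length < 0 (hypothetical convention): chunked transfer encoding.
     */
    public static int upload(URL url, InputStream source, long length) throws IOException {
        HttpURLConnection con = (HttpURLConnection) url.openConnection();
        con.setRequestMethod("POST");
        con.setDoOutput(true);
        if (length >= 0) {
            con.setFixedLengthStreamingMode(length); // Content-Length is known before any output
        } else {
            con.setChunkedStreamingMode(0); // a value <= 0 selects the default chunk size
        }
        try (OutputStream out = con.getOutputStream()) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = source.read(buf)) != -1) {
                out.write(buf, 0, n); // streamed to the connection, not collected in a ByteArrayOutputStream
            }
        }
        return con.getResponseCode();
    }
}
```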
(Repost from your first question. Shameless self-plug)
Don't fiddle around with URLConnection yourself, let Resty handle it.
Here's the code you would need to write (I assume you are getting text back):
import static us.monoid.web.Resty.*;
import us.monoid.web.Resty;
...
new Resty().text(TEST_URL, content("HELLO WORLD")).toString();
When using HttpURLConnection does the InputStream need to be closed if we do not 'get' and use it?
i.e. is this safe?
HttpURLConnection conn = (HttpURLConnection) uri.getURI().toURL().openConnection();
conn.connect();
// check for content type I don't care about
if (conn.getContentType().equals("image/gif")) return;
// get stream and read from it
InputStream is = conn.getInputStream();
try {
    // read from is
} finally {
    is.close();
}
Secondly, is it safe to close an InputStream before all of its content has been fully read?
Is there a risk of leaving the underlying socket in ESTABLISHED or even CLOSE_WAIT state?
According to http://docs.oracle.com/javase/6/docs/technotes/guides/net/http-keepalive.html and the OpenJDK source code:
(When keepAlive == true)
If the client calls HttpURLConnection.getInputStream().close(), a later call to HttpURLConnection.disconnect() will NOT close the Socket, i.e. the Socket is reused (cached).
If the client does not call close(), calling disconnect() will close the InputStream and the Socket.
So in order to reuse the Socket, just call InputStream.close(). Do not call HttpURLConnection.disconnect().
is it safe to close an InputStream before all of its content has been read
You need to read all of the data in the input stream before you close it so that the underlying TCP connection gets cached. I have read that this should not be required in the latest Java versions, but reading the whole response has always been mandated for connection reuse.
Check this post: keep-alive in java6
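Putting the keep-alive advice together: read the response to the end, close the stream, and skip disconnect(). A minimal sketch, assuming keep-alive is enabled (the default) and Java 7+ for try-with-resources; the helper name consumeAndRelease is hypothetical. For a 4xx/5xx status getInputStream() throws, so the body is drained through getErrorStream(), which may return null. Whether the socket is then actually reused is up to the JDK's keep-alive cache.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;

public class ConnectionRelease {
    /**
     * Reads and discards whatever body the server sent, then closes the stream so that
     * the JDK can return the socket to its keep-alive cache. disconnect() is deliberately
     * not called, because it would close the socket instead.
     */
    public static int consumeAndRelease(HttpURLConnection con) throws IOException {
        int status = con.getResponseCode();
        // Error responses expose their body only through getErrorStream(), which may be null.
        InputStream body = status >= 400 ? con.getErrorStream() : con.getInputStream();
        if (body != null) {
            try (InputStream in = body) {
                byte[] buf = new byte[4096];
                while (in.read(buf) != -1) {
                    // discard the data; only reaching end-of-stream matters
                }
            }
        }
        return status;
    }
}
```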
Here is some information regarding the keep-alive cache. All of this information pertains to Java 6, but is probably also accurate for many earlier and later versions.
From what I can tell, the code boils down to:
If the remote server sends a "Keep-Alive" header with a "timeout" value that can be parsed as a positive integer, that number of seconds is used for the timeout.
If the remote server sends a "Keep-Alive" header but it doesn't have a "timeout" value that can be parsed as a positive integer and "usingProxy" is true, then the timeout is 60 seconds.
In all other cases, the timeout is 5 seconds.
This logic is split between two places: around line 725 of sun.net.www.http.HttpClient (in the "parseHTTPHeader" method), and around line 120 of sun.net.www.http.KeepAliveCache (in the "put" method).
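The three rules above can be restated as a small pure function. This is a hypothetical re-expression for illustration, not the JDK's code (which lives in the internal classes named above and may differ in details such as header parsing); it takes the raw Keep-Alive header value, or null if the header is absent.

```java
import java.util.Locale;

public class KeepAliveTimeout {
    /** Restates the described rules; keepAliveHeader is e.g. "timeout=15, max=100", or null if absent. */
    public static int timeoutSeconds(String keepAliveHeader, boolean usingProxy) {
        if (keepAliveHeader != null) {
            for (String part : keepAliveHeader.split(",")) {
                String[] kv = part.trim().split("=", 2);
                if (kv.length == 2 && kv[0].trim().toLowerCase(Locale.ROOT).equals("timeout")) {
                    try {
                        int seconds = Integer.parseInt(kv[1].trim());
                        if (seconds > 0) {
                            return seconds; // rule 1: a positive timeout value is used as-is
                        }
                    } catch (NumberFormatException ignored) {
                        // not a parseable integer: fall through to the remaining rules
                    }
                }
            }
            if (usingProxy) {
                return 60; // rule 2: header present without a usable timeout, via a proxy
            }
        }
        return 5; // rule 3: all other cases
    }
}
```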
So, there are two ways to control the timeout period:
Control the remote server and configure it to send a Keep-Alive header with the proper timeout field
Modify the JDK source code and build your own.
One would think that it would be possible to change the apparently arbitrary five-second default without recompiling internal JDK classes, but it isn't. A bug was filed in 2005 requesting this ability, but Sun refused to provide it.
If you really want to make sure that the connection is close you should call conn.disconnect().
The open connections you observed are because of the HTTP 1.1 connection keep alive feature (also known as HTTP Persistent Connections).
If the server supports HTTP 1.1 and does not send Connection: close in the response header, Java does not immediately close the underlying TCP connection when you close the input stream. Instead, it keeps it open and tries to reuse it for the next HTTP request to the same server.
If you don't want this behaviour at all you can set the system property http.keepAlive to false:
System.setProperty("http.keepAlive","false");
When using HttpURLConnection does the InputStream need to be closed if we do not 'get' and use it?
Yes, it always needs to be closed.
i.e. is this safe?
Not 100%; you run the risk of getting an NPE. Safer is:
InputStream is = null;
try {
    is = conn.getInputStream();
    // read from is
} finally {
    if (is != null) {
        is.close();
    }
}
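You also have to close the error stream if the HTTP request fails (for example, a 4xx or 5xx status, where getInputStream() throws an IOException):
try {
    ...
}
catch (IOException e) {
    InputStream errorStream = connection.getErrorStream();
    // getErrorStream() returns null when the server sent no error body
    if (errorStream != null) {
        errorStream.close();
    }
}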
If you don't do it, all requests that don't return 200 (e.g. timeout) will leak one socket.
Since Java 7 the recommended way is
try (InputStream is = conn.getInputStream()) {
    // read from is
    // ...
}
as for all other classes implementing Closeable (strictly, AutoCloseable): close() is called automatically at the end of the try {...} block.
Closing the input stream also means you are done with reading. Otherwise the connection hangs around until the finalizer closes the stream.
Same applies to the output stream, if you are sending data.
There is no need to get and close the ErrorStream. Although it is an InputStream, it just uses the connection's InputStream in combination with a buffer, so closing the InputStream is sufficient.
How can I detect that the client side of a Tomcat servlet request has disconnected? I've read that I should do a response.getOutputStream().print(), then a response.getOutputStream().flush() and catch an IOException, but is there a way I can detect this without writing any data?
EDIT:
The servlet sends out a data stream that doesn't necessarily end, but doesn't necessarily have any data flowing through it (it's a stream of real time events). I need to actually detect when the client disconnects because I have some cleanup I have to do at that point (resources to release, etcetera). If I have the HttpServletRequest available, will trying to read from that throw an IOException if the client disconnects?
is there a way I can detect this without writing any data?
No, because there isn't a way in TCP/IP to detect it without writing any data.
Don't worry about it. Just complete the request actions and write the response. If the client has disappeared, that will cause an IOException: connection reset, which will be thrown into the servlet container. Nothing you have to do about that.
I need to actually detect when the client disconnects because I have some cleanup I have to do at that point (resources to release, etcetera).
That's what the finally block is for. It will be executed regardless of the outcome, e.g.:
OutputStream output = null;
try {
    output = response.getOutputStream();
    // ...
    output.flush();
    // ...
} finally {
    // Do your cleanup here.
}
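Combining the two points (a disconnect only shows up when you write, and cleanup belongs in finally), a long-lived event stream can write a periodic heartbeat so a dead client is noticed even when no events are flowing. A sketch under assumptions: the class and method are hypothetical, the OutputStream stands in for response.getOutputStream(), and an empty line is assumed to be acceptable to the client as a heartbeat. Because of TCP buffering, the IOException may only surface after one or more writes following the disconnect.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class EventStreamer {
    /**
     * Writes one line per event until a write fails (typically because the client went away).
     * If no event arrives within heartbeatMillis, an empty line is written instead (assumed heartbeat).
     */
    public static void stream(OutputStream out, BlockingQueue<String> events,
                              long heartbeatMillis, Runnable cleanup) throws InterruptedException {
        try {
            while (true) {
                String event = events.poll(heartbeatMillis, TimeUnit.MILLISECONDS);
                String line = (event != null ? event : "") + "\n";
                out.write(line.getBytes(StandardCharsets.UTF_8));
                out.flush(); // pushes the bytes to the connection, where a dead peer eventually causes an IOException
            }
        } catch (IOException clientGone) {
            // The write failed: treat it as a disconnect and stop streaming.
        } finally {
            cleanup.run(); // release resources however the loop ended
        }
    }
}
```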
If I have the HttpServletRequest available, will trying to read from that throw an IOException if the client disconnects?
It depends on how you're reading from it and how much of the request body is already in server memory. For normal form-encoded requests, if you have called getParameter() beforehand, the body will usually have been fully parsed and stored in server memory, so calling getInputStream() won't be useful at all. Better do it on the response instead.
Have you tried to flush the buffer of the response:
response.flushBuffer();
It seems to throw an IOException when the client has disconnected.
In the method service(), we use
PrintWriter out = res.getWriter();
Please tell me how it returns the PrintWriter object, and how it then makes a connection to the browser and sends the data to it.
It doesn't make a connection to the browser - the browser has already made a connection to the server. It either buffers what you write in memory, and then transmits the data at the end of the request, or it makes sure all the headers have been written to the network connection and then returns a PrintWriter which writes data directly to that network connection.
In the buffering scenario there may be a fixed buffer size, and if you exceed that the data written so far will be "flushed" to the network connection. The big advantage of having a buffer at all is that if something goes wrong half-way through, you can change your response to an error page. If you've already started writing the response when something goes wrong, there's not a lot you can do to indicate the error cleanly.
(There's also the matter of transmitting the content length before any of the content, for keep-alive connections. If you run out of buffer before completing the response, I'm reliably informed that the response will use a chunked encoding.)
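The buffering behaviour described above can be illustrated with a small hypothetical stream that holds output in memory, lets the caller discard it until the first flush ("commit"), and flushes automatically when the buffer fills. It is only a model of the idea, not any container's actual class, and it leaves out headers and chunked encoding entirely.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical model of a servlet-style response buffer; not a real container class. */
public class BufferedResponseStream extends OutputStream {
    private final OutputStream network; // stands in for the socket's output stream
    private final int capacity;
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private boolean committed = false; // true once the buffer has been flushed to the network

    public BufferedResponseStream(OutputStream network, int capacity) {
        this.network = network;
        this.capacity = capacity;
    }

    @Override
    public void write(int b) throws IOException {
        buffer.write(b);
        if (buffer.size() >= capacity) {
            flush(); // buffer full: everything written so far goes to the network
        }
    }

    @Override
    public void flush() throws IOException {
        buffer.writeTo(network);
        buffer.reset();
        network.flush();
        committed = true;
    }

    /** Discards buffered output (e.g. to send an error page instead); only possible before commit. */
    public void resetBuffer() {
        if (committed) {
            throw new IllegalStateException("Response already committed");
        }
        buffer.reset();
    }

    public boolean isCommitted() {
        return committed;
    }

    @Override
    public void close() throws IOException {
        flush();
        network.close();
    }
}
```

Writing "oops", calling resetBuffer(), and then writing an error page sends only the error page; once the buffer has overflowed, resetBuffer() fails because part of the response is already on the wire.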
One fairly simple implementation:
PrintWriter getWriter() throws java.io.IOException {
    return new PrintWriter(socket.getOutputStream());
}
Also note that several open source implementations of the Servlet API are available, so you can see how it can be done.
I believe the official implementation has been open sourced too, and is included with the Glassfish server.