I am writing an HTTP proxy that is part of a test/verification
system. The proxy filters all requests coming from the client device
and directs them towards various systems under test.
The proxy is implemented as a servlet that forwards each request
to the target system; it handles both GET and POST. Sometimes the
response from the target system is altered to fit various test
conditions, but that is not part of the problem.
When forwarding a request, all headers are copied except for those
that are part of the actual HTTP transfer, such as the Content-Length
and Connection headers.
If the request is an HTTP POST, the entity body of the request is
forwarded as well, and this is where it sometimes fails.
The code that reads the entity body from the servlet request is the following:
URL url = new URL(targetURL);
HttpURLConnection conn = (HttpURLConnection)url.openConnection();
String method = request.getMethod();
java.util.Enumeration headers = request.getHeaderNames();
while(headers.hasMoreElements()) {
    String headerName = (String)headers.nextElement();
    String headerValue = request.getHeader(headerName);
    if (...) { // do various adaptive stuff based on header
    }
    conn.setRequestProperty(headerName, headerValue);
}
// here is the part that fails
char postBody[] = new char[1024];
int len;
if(method.equals("POST")) {
    logger.debug("guiProxy, handle post, read request body");
    conn.setDoOutput(true);
    BufferedReader br = request.getReader();
    BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(conn.getOutputStream()));
    do {
        logger.debug("Read request into buffer of size: " + postBody.length);
        len = br.read(postBody, 0, postBody.length);
        logger.debug("guiProxy, send request body, got " + len + " bytes from request");
        if(len != -1) {
            bw.write(postBody, 0, len);
        }
    } while(len != -1);
    bw.close();
}
So what happens is that the first time a POST is received, the read on
the request reader returns -1. A Wireshark trace shows
that the entity body containing the URL-encoded POST parameters is there,
and it is in a single TCP segment, so there are no network-related
differences.
The second time, br.read successfully returns the 232 bytes in the
POST request entity body, and every subsequent request works as well.
The only difference between the first and subsequent POST requests is
that in the first one no cookies are present, whereas in the second one
a cookie is present that maps to the JSESSIONID.
Could it be a side effect of the entity body not being available because
the request processing in the servlet container has already read the POST
parameters? But then why does it work on subsequent requests?
I believe the solution is to ignore the entity body on
POST requests containing URL-encoded data, fetch all parameters
from the servlet request using getParameter instead, and reinsert them
into the outgoing request.
That is tricky, though, since the POST request could also contain GET
(query string) parameters; not in our application right now, but
implementing it correctly is some work.
So my question is basically: why does the reader from
request.getReader() return -1 when reading while an entity body is
present in the request? If the entity body is not available for
reading, then getReader should throw an IllegalStateException. I
have also tried an InputStream obtained via getInputStream(), with the
same results.
All of this is tested on apache-tomcat-6.0.18.
So my question is basically: why does the reader from request.getReader() return -1 when reading
It will return -1 when there is no body or when it has already been read. You cannot read it twice. Make sure that nothing earlier in the request/response chain has read it.
and an entity body is present in the request? If the entity body is not available for reading, then getReader should throw an IllegalStateException.
It will only throw that when you have already called getInputStream() on the request, not when the body is unavailable.
I have also tried with InputStream using getInputStream() with the same results.
In any case, I'd prefer streaming bytes over characters, because then you don't need to take character encoding into account (which you aren't doing so far; this may lead to problems later, once you get all of this to work).
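To illustrate the byte-streaming suggestion, here is a minimal sketch of copying a body without ever decoding it to characters. The class name BodyForwarder and the method copy are made up for this illustration; in the proxy above the two streams would presumably be request.getInputStream() and conn.getOutputStream(), but that wiring is an assumption and is not shown.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original proxy code.
final class BodyForwarder {

    /** Copies every byte from in to out and returns how many bytes were copied. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int len;
        // read() returns -1 at end of stream; 0 bytes copied means the body
        // was empty or had already been consumed by something else.
        while ((len = in.read(buffer)) != -1) {
            out.write(buffer, 0, len);
            total += len;
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Stand-ins for the servlet request body and the outgoing connection.
        byte[] body = "user=test&city=K%C3%B6ln".getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream forwarded = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(body), forwarded);
        System.out.println(copied + " bytes forwarded");
    }
}
```

Because the bytes are passed through untouched, whatever encoding the client used reaches the target system unchanged.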
It seems that moving
BufferedReader br = request.getReader()
before all other operations that read the request (such as request.getHeader()) works well for me.
I am using HttpURLConnection and I have trouble understanding how client and server stay in sync. Assume a simple file download example. It was copied from somewhere on the web; I am only using the code to show the standard process.
Servlet code is like:
response.setContentType(mimeType);
response.setContentLength((int) downloadFile.length());
String headerKey = "Content-Disposition";
String headerValue = String.format("attachment; filename=\"%s\"", downloadFile.getName());
response.setHeader(headerKey, headerValue);
// obtains response's output stream
OutputStream outStream = response.getOutputStream();
//write to stream
//close the stream
And Client code is like :
HttpURLConnection httpConn = (HttpURLConnection) url.openConnection();
int responseCode = httpConn.getResponseCode();
if (responseCode == HttpURLConnection.HTTP_OK) {
    String disposition = httpConn.getHeaderField("Content-Disposition");
    String contentType = httpConn.getContentType();
    int contentLength = httpConn.getContentLength();
    //parse content-disposition
    ....
    InputStream inputStream = httpConn.getInputStream();
    String saveFilePath = saveDir + File.separator + fileName;
    // opens an output stream to save into file
    FileOutputStream outputStream = new FileOutputStream(saveFilePath);
    //write to stream
    //close stream
} else {
    //if non Ok status
}
My first question: is httpConn.getResponseCode() a blocking call that waits for the servlet to finish processing? Otherwise, if an error occurs or the servlet calls response.sendError() while the client is already inside if (responseCode == HttpURLConnection.HTTP_OK) {, what will happen?
Second question, an extension of the first: if getResponseCode() is not blocking, then when I access disposition, contentType and contentLength, how can I be sure that they are already set?
Third question: if httpConn.getResponseCode() is blocking and I want to send some message to the client, how correct is it to send it in a response header, like response.setHeader("my-message", "some message I want to send");, rather than using response.getWriter() to write to the stream, so that I am sure the client will definitely read it?
Fourth question: if I write two objects to the stream in the servlet, how will the client distinguish them, or can it? Suppose I write a class object using response.getObjectOutputStream() and then write a string using a writer, or a file after that. Can the client distinguish these different items coming in the stream, or do I have to use multiple requests, one request per object, file or string to be read from the stream?
Yes, as the javadoc indicates
Gets the status code from an HTTP response message. For example, in the case of the following status lines:
HTTP/1.0 200 OK
HTTP/1.0 401 Unauthorized
It will return 200 and 401 respectively. Returns -1 if no code can be discerned from the response (i.e., the response is not valid HTTP).
Not applicable, given the answer to the first question.
You can use headers if you want, but headers are text-only and limited in length (as far as I remember). The response body is usually used to contain... the response body, whereas headers are typically used for metadata.
Server and client have to agree on a protocol. If the protocol says that a response contains two objects, then the client should read two objects. I would not do that, though: you'd be better off sending a single container object rather than two in sequence. HTTP can transport anything, but a JSON or XML document is usually used to carry structured data.
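As a rough illustration of what "agreeing on a protocol" could mean if you did want several items in one body, here is a sketch of length-prefixed framing: each item is written as a 4-byte length followed by its bytes, so the reader knows where one item ends and the next begins. FramedMessages, writeFrame and readFrame are invented names, and this framing is only one assumed convention; a single JSON or XML container, as recommended above, is usually simpler.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical framing convention: [4-byte big-endian length][payload bytes], repeated.
final class FramedMessages {

    /** Writes one item preceded by its length. */
    static void writeFrame(DataOutputStream out, byte[] payload) throws IOException {
        out.writeInt(payload.length);
        out.write(payload);
    }

    /** Reads exactly one item; throws EOFException if the stream ends early. */
    static byte[] readFrame(DataInputStream in) throws IOException {
        int length = in.readInt();
        if (length < 0) {
            throw new IOException("negative frame length: " + length);
        }
        byte[] payload = new byte[length];
        in.readFully(payload);
        return payload;
    }

    public static void main(String[] args) throws IOException {
        // The byte array stands in for the HTTP response body.
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(wire);
        writeFrame(out, "first item".getBytes(StandardCharsets.UTF_8));
        writeFrame(out, "second item".getBytes(StandardCharsets.UTF_8));
        out.flush();

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire.toByteArray()));
        System.out.println(new String(readFrame(in), StandardCharsets.UTF_8));
        System.out.println(new String(readFrame(in), StandardCharsets.UTF_8));
    }
}
```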
I'm wondering how to detect an empty HTTP response.
By an empty HTTP response I mean one that only has some headers set and contains an empty HTTP body.
For example: I do an HTTP POST to a web server, but the web server returns only a status code for my HTTP POST and nothing else.
The problem is that I have written a small HTTP framework on top of Apache HttpClient to do automatic JSON parsing etc. The default use case of this framework is to make a request and parse the response. However, if the response does not contain data, as in the example above, I want my framework to skip JSON parsing.
So I do something like this:
HttpResponse response = httpClient.execute(uriRequest);
HttpEntity entity = response.getEntity();
if (entity != null){
    InputStream in = entity.getContent();
    // json parsing
}
However, entity is always != null, and the retrieved InputStream is != null as well. Is there a simple way to determine whether the HTTP body is empty?
The only way I can see is for the server response to contain a Content-Length header field set to 0.
But not every server sets this field.
Any suggestions?
In HttpClient, getEntity() can return null. See the latest samples.
However, there's a difference between an empty entity, and no entity. Sounds like you've got an empty entity. (Sorry to be pedantic -- it's just that HTTP is pedantic. :) With respect to detecting empty entities, have you tried reading from the entity input stream? If the response is an empty entity, you should get an immediate EOF.
Do you need to determine if the entity is empty without reading any bytes from the entity body? Based on the code above, I don't think you do. If that's the case, you can just wrap the entity InputStream with a PushbackInputStream and check:
HttpResponse response = httpClient.execute(uriRequest);
HttpEntity entity = response.getEntity();
if(entity != null) {
    // Declared as PushbackInputStream so that unread() is available
    PushbackInputStream in = new PushbackInputStream(entity.getContent());
    try {
        int firstByte=in.read();
        if(firstByte != -1) {
            in.unread(firstByte);
            // json parsing
        }
        else {
            // empty
        }
    }
    finally {
        // Don't close so we can reuse the connection
        EntityUtils.consumeQuietly(entity);
        // Or, if you're sure you won't re-use the connection, close instead:
        // in.close();
    }
}
It's best not to read the entire response into memory just in case it's large. This solution will test for emptiness using constant memory (4 bytes :).
EDIT: <pedantry> In HTTP, if a request has no Content-Length header, then there should be a Transfer-Encoding: chunked header. If there is no Transfer-Encoding: chunked header either, then you should have no entity as opposed to an empty entity. </pedantry>
I would suggest using the EntityUtils class to get the response as a String. If it returns an empty string, the response is empty.
String resp = EntityUtils.toString(client.execute(uriRequest).getEntity());
if (resp == null || "".equals(resp)) {
    // no entity or empty entity
} else {
    // got something
    JSON.parse(resp);
}
The assumption here is that, for the sake of code simplicity and maintainability, you don't care to distinguish between an empty entity and no entity, and that if there is a response, you need to read it anyway.
I have a home grown protocol which uses HttpURLConnection (from Java 1.6) & Jetty (6.1.26) to POST a block of xml as a request and receive a block of xml as a response. The amounts of xml are approx. 5KB.
When running both sender and receiver on Linux EC2 instances in different parts of the world I'm finding that in about 0.04% of my requests the Jetty handler sees the xml request (the post body) as an empty string. I've checked and the client outputs that it's consistently trying to send the correct (> 0 length) xml request string.
I have also reproduced this by looping my JUnit tests on my local (Win 8) box.
I assume the error must be something like:
Misuse of buffers
An HttpURLConnection bug
A network error
A Jetty bug
A random head slapping stupid thing I've done in the code
The relevant code is below:
CLIENT
connection = (HttpURLConnection) (new URL (url)).openConnection();
connection.setReadTimeout(readTimeoutMS);
connection.setConnectTimeout(connectTimeoutMS);
connection.setRequestMethod("POST");
connection.setAllowUserInteraction(false);
connection.setDoOutput(true);
// Send request
byte[] postBytes = requestXML.getBytes("UTF-8");
connection.setRequestProperty("Content-length", "" + postBytes.length);
OutputStream os = connection.getOutputStream();
os.write(postBytes);
os.flush();
os.close();
// Read response
InputStream is = connection.getInputStream();
StringWriter writer = new StringWriter();
IOUtils.copy(is, writer, "UTF-8");
is.close();
connection.disconnect();
return writer.toString();
SERVER (Jetty handler)
public void handle(java.lang.String target, javax.servlet.http.HttpServletRequest request, javax.servlet.http.HttpServletResponse response, int dispatch) {
    InputStream is = request.getInputStream();
    StringWriter writer = new StringWriter();
    IOUtils.copy(is, writer, "UTF-8");
    is.close();
    String requestXML = writer.toString();
    // requestXML is 0 length string about 0.04% of time
Can anyone think of why I'd randomly get the request as an empty string?
Thanks!
EDIT
I introduced some more trace and getContentLength() returns -1 when the error occurs, but the client output still shows it's sending the right amount of bytes.
I can't think of why you are getting an empty string. The code looks correct. If you update your code to check for an empty string and, when one is found, report the Content-Length and Transfer-Encoding of the request, that would help identify the culprit. A Wireshark trace of the network data would also be good.
But the bad news is that jetty-6 is really end of life, and we are unlikely to be updating it. If you are writing the code today, then you really should be using jetty-7 or 8, perhaps even a jetty-9 milestone release if you are brave. If you find such an error in jetty-9, I'd be all over it like a rash trying to fix it for you!
Make sure you set connection.setRequestProperty("Content-Type", "application/xml");. It's possible that POST data is discarded when no Content-Type is set. This was the case when I replicated your problem locally (against a Grails embedded Tomcat instance), and supplying the header fixed it.
In my code I use HTTP GET requests to download files as a stream. I use the following code:
public String getClassName(String url) throws ClientProtocolException, IOException {
    HttpResponse response = sendGetRequestJsonText(url);
    Header[] all = response.getAllHeaders();
    for (Header h : all) {
        System.out.println(h.getName() + ": " + h.getValue());
    }
    Header[] headers = response.getHeaders("Content-Disposition");
    InputStreamParser.convertStreamToString(response.getEntity().getContent());
    String result = "";
    for (Header header : headers) {
        result = header.getValue();
    }
    return result.substring(result.indexOf("''") + "''".length(), result.length()).trim();
}
But this downloads the full content of the response. I want to retrieve only the HTTP headers without the content. A HEAD request does not seem to work, because then I get status 501, Not Implemented. How can I do that?
Instead of making a GET request, you might consider just making a HEAD request:
The HEAD method is identical to GET except that the server MUST NOT
return a message-body in the response. The metainformation contained
in the HTTP headers in response to a HEAD request SHOULD be identical
to the information sent in response to a GET request. This method can
be used for obtaining metainformation about the entity implied by the
request without transferring the entity-body itself. This method is
often used for testing hypertext links for validity, accessibility,
and recent modification.
You might be able to use the Range header in your request to specify a range of bytes to include in the response entity. Possibly something like:
Range: bytes=0-0
If it does work, you should receive back a 206 Partial Content with the bytes specified in your Range header present in the response entity. However, I've not tried this, and it's also not guaranteed to work:
A server MAY ignore the Range header.
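For reference, a sketch of how the two variants above (HEAD, or GET with a Range header) could be set up. Note that it uses the JDK's HttpURLConnection rather than the Apache HttpClient from the question, simply to stay self-contained; HeaderOnlyRequests and its method names are invented for this illustration. Whether the server honours either request is up to the server, as the quotes above say.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper; swaps in HttpURLConnection for the HttpClient used in the question.
final class HeaderOnlyRequests {

    /** Prepares (but does not send) a HEAD request. */
    static HttpURLConnection head(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("HEAD");
        return conn;
    }

    /** Prepares (but does not send) a GET that asks only for the first byte of the entity. */
    static HttpURLConnection firstByteOnly(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Range", "bytes=0-0");
        return conn;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: HeaderOnlyRequests <url>");
            return;
        }
        HttpURLConnection conn = firstByteOnly(args[0]);
        try {
            // 206 means the server honoured the range; 200 means it ignored it
            // and is sending the whole entity.
            System.out.println("status: " + conn.getResponseCode());
            System.out.println("Content-Disposition: " + conn.getHeaderField("Content-Disposition"));
        } finally {
            conn.disconnect();
        }
    }
}
```

If the status comes back as 200 instead of 206, the only way to avoid downloading the body is to stop reading and drop the connection.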
Let's say I have a Java program that makes an HTTP request to a server using HTTP 1.1 and doesn't close the connection. I make one request and read all data returned from the input stream I have bound to the socket. However, upon making a second request, I get no response from the server (or there's a problem with the stream: it doesn't provide any more input). If I make the requests in the order (request, request, read) it works fine, but (request, read, request, read) doesn't.
Could someone shed some light on why this might be happening? (Code snippets follow.) No matter what I do, the second read loop's isr_reader.read() only ever returns -1.
try{
    connection = new Socket("SomeServer", port);
    con_out = connection.getOutputStream();
    con_in = connection.getInputStream();
    PrintWriter out_writer = new PrintWriter(con_out, false);
    out_writer.print("GET http://somesite HTTP/1.1\r\n");
    out_writer.print("Host: thehost\r\n");
    //out_writer.print("Content-Length: 0\r\n");
    out_writer.print("\r\n");
    out_writer.flush();
    // If we were not interpreting this data as a character stream, we might need to adjust byte ordering here.
    InputStreamReader isr_reader = new InputStreamReader(con_in);
    char[] streamBuf = new char[8192];
    int amountRead;
    StringBuilder receivedData = new StringBuilder();
    while((amountRead = isr_reader.read(streamBuf)) > 0){
        receivedData.append(streamBuf, 0, amountRead);
    }
    // Response is processed here.
    if(connection != null && !connection.isClosed()){
        //System.out.println("Connection Still Open...");
        out_writer.print("GET http://someSite2\r\n");
        out_writer.print("Host: somehost\r\n");
        out_writer.print("Connection: close\r\n");
        out_writer.print("\r\n");
        out_writer.flush();
        streamBuf = new char[8192];
        amountRead = 0;
        receivedData.setLength(0);
        while((amountRead = isr_reader.read(streamBuf)) > 0 || amountRead < 1){
            if (amountRead > 0)
                receivedData.append(streamBuf, 0, amountRead);
        }
    }
    // Process response here
}
Responses to questions:
Yes, I'm receiving chunked responses from the server.
I'm using raw sockets because of an outside restriction.
Apologies for the mess of code - I was rewriting it from memory and seem to have introduced a few bugs.
So the consensus is I have to either do (request, request, read) and let the server close the stream once I hit the end, or, if I do (request, read, request, read) stop before I hit the end of the stream so that the stream isn't closed.
According to your code, the only time you'll even reach the statements dealing with sending the second request is when the server closes the output stream (your input stream) after receiving/responding to the first request.
The reason for that is that your code that is supposed to read only the first response
while((amountRead = isr_reader.read(streamBuf)) > 0) {
receivedData.append(streamBuf, 0, amountRead);
}
will block until the server closes the output stream (i.e., when read returns -1) or until the read timeout on the socket elapses. In the case of the read timeout, an exception will be thrown and you won't even get to sending the second request.
The problem with HTTP responses is that they don't tell you how many bytes to read from the stream until the end of the response. This is not a big deal for HTTP 1.0 responses, because the server simply closes the connection after the response thus enabling you to obtain the response (status line + headers + body) by simply reading everything until the end of the stream.
With HTTP 1.1 persistent connections you can no longer simply read everything until the end of the stream. You first need to read the status line and the headers, line by line, and then, based on the status code and the headers (such as Content-Length) decide how many bytes to read to obtain the response body (if it's present at all). If you do the above properly, your read operations will complete before the connection is closed or a timeout happens, and you will have read exactly the response the server sent. This will enable you to send the next request and then read the second response in exactly the same manner as the first one.
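The procedure just described can be sketched roughly as follows. This is a deliberately simplified illustration, not a complete client: it only handles bodies delimited by Content-Length, and it ignores chunked transfer encoding as well as status codes that never carry a body (such as 204 or 304). SimpleResponseReader and its methods are names invented for this sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: reads exactly one Content-Length-delimited response.
final class SimpleResponseReader {

    /** Reads one line byte by byte, without its line ending; null at end of stream. */
    static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            if (b != '\r') {
                line.write(b);
            }
        }
        if (b == -1 && line.size() == 0) {
            return null;
        }
        return new String(line.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    /** Consumes the status line, the headers and exactly Content-Length body bytes. */
    static byte[] readResponseBody(InputStream in) throws IOException {
        String statusLine = readLine(in);
        if (statusLine == null) {
            throw new EOFException("connection closed before a status line arrived");
        }
        int contentLength = 0; // assumption: a missing header means "no body"
        String header;
        while ((header = readLine(in)) != null && !header.isEmpty()) {
            int colon = header.indexOf(':');
            if (colon > 0 && header.substring(0, colon).trim().equalsIgnoreCase("Content-Length")) {
                contentLength = Integer.parseInt(header.substring(colon + 1).trim());
            }
        }
        byte[] body = new byte[contentLength];
        // readFully blocks until all bytes are there, but never reads past them.
        new DataInputStream(in).readFully(body);
        return body;
    }

    public static void main(String[] args) throws IOException {
        // Two responses back to back, as they would arrive on a persistent connection.
        String wire = "HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nfirst"
                + "HTTP/1.1 200 OK\r\nContent-Length: 6\r\n\r\nsecond";
        InputStream in = new ByteArrayInputStream(wire.getBytes(StandardCharsets.US_ASCII));
        System.out.println(new String(readResponseBody(in), StandardCharsets.US_ASCII));
        System.out.println(new String(readResponseBody(in), StandardCharsets.US_ASCII));
    }
}
```

The header lines are read one byte at a time on purpose: a BufferedReader wrapped around the socket stream could read ahead into the body or the next response.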
P.S. Request, request, read might be "working" in the sense that your server supports request pipelining and thus receives and processes both requests, so that you end up reading both responses into one buffer as your "first" response.
P.P.S Make sure your PrintWriter is using the US-ASCII encoding. Otherwise, depending on your system encoding, the request line and headers of your HTTP requests might be malformed (wrong encoding).
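A small sketch of the encoding point: wrapping the socket's output stream in an OutputStreamWriter with an explicit charset keeps the request line and headers independent of the platform default. AsciiRequestWriter and writeGet are invented names, and the path and host in main are placeholders.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: writes a bodiless GET request head in US-ASCII.
final class AsciiRequestWriter {

    static void writeGet(OutputStream out, String target, String host) throws IOException {
        // Explicit charset instead of new PrintWriter(out, false), which
        // would fall back to the platform default encoding.
        PrintWriter writer = new PrintWriter(
                new OutputStreamWriter(out, StandardCharsets.US_ASCII), false);
        writer.print("GET " + target + " HTTP/1.1\r\n");
        writer.print("Host: " + host + "\r\n");
        writer.print("\r\n");
        writer.flush();
        // PrintWriter swallows IOExceptions, so check its error flag explicitly.
        if (writer.checkError()) {
            throw new IOException("failed to write the request");
        }
    }

    public static void main(String[] args) throws IOException {
        // The byte array stands in for socket.getOutputStream().
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        writeGet(wire, "/index.html", "example.invalid");
        System.out.print(new String(wire.toByteArray(), StandardCharsets.US_ASCII));
    }
}
```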
Writing a simple HTTP/1.1 client that respects the RFC is not such a difficult task.
To solve the problem of blocking I/O when reading from a socket in Java, you must use the java.nio classes.
SocketChannels make it possible to perform non-blocking I/O.
This is necessary to send HTTP requests on a persistent connection.
Furthermore, the nio classes give better performance.
My stress test gave the following results:
HTTP/1.0 (java.io) -> HTTP/1.0 (java.nio) = +20% faster
HTTP/1.0 (java.io) -> HTTP/1.1 (java.nio with persistent connection) = +110% faster
Make sure you have a Connection: keep-alive in your request. This may be a moot point though.
What kind of response is the server returning? Are you using chunked transfer? If the server doesn't know the size of the response body, it can't provide a Content-Length header and has to close the connection at the end of the response body to indicate to the client that the content has ended. In this case, the keep-alive won't work. If you're generating content on-the-fly with PHP, JSP etc., you can enable output buffering, check the size of the accumulated body, push the Content-Length header and flush the output buffer.
Is there a particular reason you're using raw sockets and not Java's URL Connection or Commons HTTPClient?
HTTP isn't easy to get right. I know Commons HTTP Client can re-use connections like you're trying to do.
If there isn't a specific reason for you using Sockets this is what I would recommend :)
Writing your own correct client HTTP/1.1 implementation is nontrivial; historically most people who I've seen attempt it have got it wrong. Their implementation usually ignores the spec and just does what appears to work with one particular test server - in particular, they usually ignore the requirement to be able to handle chunked responses.
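Since chunked responses came up in the question, here is a rough sketch of what decoding a chunked body involves: read a hexadecimal size line, read that many bytes, skip the CRLF that follows, and repeat until a chunk of size 0, after which optional trailer lines run up to an empty line. ChunkedBodyDecoder is a name invented for this sketch, chunk extensions are simply discarded, and it assumes the status line and headers have already been consumed from the stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: decodes one chunked body and stops right after it.
final class ChunkedBodyDecoder {

    /** Reads one line byte by byte, without its line ending; null at end of stream. */
    static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            if (b != '\r') {
                line.write(b);
            }
        }
        if (b == -1 && line.size() == 0) {
            return null;
        }
        return new String(line.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    static byte[] decode(InputStream in) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        DataInputStream data = new DataInputStream(in); // unbuffered, used for readFully only
        while (true) {
            String sizeLine = readLine(in);
            if (sizeLine == null) {
                throw new EOFException("stream ended before the last chunk");
            }
            // Drop any chunk extension after ';' and parse the hexadecimal size.
            int semicolon = sizeLine.indexOf(';');
            String hex = (semicolon >= 0 ? sizeLine.substring(0, semicolon) : sizeLine).trim();
            int size = Integer.parseInt(hex, 16);
            if (size == 0) {
                break;
            }
            byte[] chunk = new byte[size];
            data.readFully(chunk);
            body.write(chunk, 0, size);
            readLine(in); // consume the CRLF that terminates the chunk data
        }
        // Skip optional trailer fields up to and including the final empty line.
        String trailer;
        while ((trailer = readLine(in)) != null && !trailer.isEmpty()) {
            // trailers are ignored in this sketch
        }
        return body.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String wire = "5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n";
        InputStream in = new ByteArrayInputStream(wire.getBytes(StandardCharsets.US_ASCII));
        System.out.println(new String(decode(in), StandardCharsets.US_ASCII));
    }
}
```

Because decoding stops right after the terminating empty line, the connection stays usable for the next request, which is the point of a persistent connection.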
Writing your own HTTP client is probably a bad idea, unless you have some VERY strange requirements.