HTTP 1.1 Persistent Connections using Sockets in Java

Let's say I have a Java program that makes an HTTP request to a server using HTTP 1.1 and doesn't close the connection. I make one request, and read all data returned from the input stream I have bound to the socket. However, upon making a second request, I get no response from the server (or there's a problem with the stream: it doesn't provide any more input). If I make the requests in order (request, request, read) it works fine, but (request, read, request, read) doesn't.
Could someone shed some light on why this might be happening? (Code snippets follow.) No matter what I do, the second read loop's isr_reader.read() only ever returns -1.
try{
connection = new Socket("SomeServer", port);
con_out = connection.getOutputStream();
con_in = connection.getInputStream();
PrintWriter out_writer = new PrintWriter(con_out, false);
out_writer.print("GET http://somesite HTTP/1.1\r\n");
out_writer.print("Host: thehost\r\n");
//out_writer.print("Content-Length: 0\r\n");
out_writer.print("\r\n");
out_writer.flush();
// If we were not interpreting this data as a character stream, we might need to adjust byte ordering here.
InputStreamReader isr_reader = new InputStreamReader(con_in);
char[] streamBuf = new char[8192];
int amountRead;
StringBuilder receivedData = new StringBuilder();
while((amountRead = isr_reader.read(streamBuf)) > 0){
receivedData.append(streamBuf, 0, amountRead);
}
// Response is processed here.
if(connection != null && !connection.isClosed()){
//System.out.println("Connection Still Open...");
out_writer.print("GET http://someSite2\r\n");
out_writer.print("Host: somehost\r\n");
out_writer.print("Connection: close\r\n");
out_writer.print("\r\n");
out_writer.flush();
streamBuf = new char[8192];
amountRead = 0;
receivedData.setLength(0);
while((amountRead = isr_reader.read(streamBuf)) > 0 || amountRead < 1){
if (amountRead > 0)
receivedData.append(streamBuf, 0, amountRead);
}
}
// Process response here
}
Responses to questions:
Yes, I'm receiving chunked responses from the server.
I'm using raw sockets because of an outside restriction.
Apologies for the mess of code - I was rewriting it from memory and seem to have introduced a few bugs.
So the consensus is I have to either do (request, request, read) and let the server close the stream once I hit the end, or, if I do (request, read, request, read), stop before I hit the end of the stream so that the stream isn't closed.

According to your code, the only time you'll even reach the statements dealing with sending the second request is when the server closes the output stream (your input stream) after receiving/responding to the first request.
The reason for that is that your code that is supposed to read only the first response
while((amountRead = isr_reader.read(streamBuf)) > 0) {
receivedData.append(streamBuf, 0, amountRead);
}
will block until the server closes the output stream (i.e., when read returns -1) or until the read timeout on the socket elapses. In the case of the read timeout, an exception will be thrown and you won't even get to sending the second request.
The problem with HTTP responses is that the stream itself doesn't mark where a response ends, so you can't tell up front how many bytes to read. This is not a big deal for HTTP 1.0 responses, because the server simply closes the connection after the response, which lets you obtain the response (status line + headers + body) by reading everything until the end of the stream.
With HTTP 1.1 persistent connections you can no longer simply read everything until the end of the stream. You first need to read the status line and the headers, line by line, and then, based on the status code and the headers (such as Content-Length) decide how many bytes to read to obtain the response body (if it's present at all). If you do the above properly, your read operations will complete before the connection is closed or a timeout happens, and you will have read exactly the response the server sent. This will enable you to send the next request and then read the second response in exactly the same manner as the first one.
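The approach described above could be sketched roughly as follows. This is a minimal sketch, not a complete client: the class and method names (SimpleResponseReader, readResponse, and so on) are made up for illustration, and it only handles responses that carry a Content-Length header (no chunked encoding, no bodiless status codes). It reads byte by byte so that it never consumes data belonging to the next response.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: reads exactly one Content-Length-delimited response. */
final class SimpleResponseReader {

    /** Reads one line up to LF and returns it without CR/LF, decoded as ISO-8859-1. */
    static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            if (b != '\r') {
                line.write(b);
            }
        }
        if (b == -1 && line.size() == 0) {
            throw new EOFException("connection closed before a line was read");
        }
        return new String(line.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    /** Reads header lines up to the blank line; header names are lower-cased. */
    static Map<String, String> readHeaders(InputStream in) throws IOException {
        Map<String, String> headers = new LinkedHashMap<>();
        String line;
        while (!(line = readLine(in)).isEmpty()) {
            int colon = line.indexOf(':');
            if (colon > 0) {
                headers.put(line.substring(0, colon).trim().toLowerCase(Locale.ROOT),
                            line.substring(colon + 1).trim());
            }
        }
        return headers;
    }

    /** Reads exactly length bytes, looping because read() may return fewer. */
    static byte[] readBody(InputStream in, int length) throws IOException {
        byte[] body = new byte[length];
        int off = 0;
        while (off < length) {
            int n = in.read(body, off, length - off);
            if (n == -1) {
                throw new EOFException("connection closed mid-body");
            }
            off += n;
        }
        return body;
    }

    /** Reads status line, headers and body; returns only the body bytes. */
    static byte[] readResponse(InputStream in) throws IOException {
        String statusLine = readLine(in); // e.g. "HTTP/1.1 200 OK"
        if (!statusLine.startsWith("HTTP/")) {
            throw new IOException("not an HTTP status line: " + statusLine);
        }
        Map<String, String> headers = readHeaders(in);
        String length = headers.get("content-length");
        if (length == null) {
            throw new IOException("this sketch only handles Content-Length responses");
        }
        return readBody(in, Integer.parseInt(length));
    }
}
```

With something like this, (request, read, request, read) amounts to calling readResponse once after each request on the same socket input stream. Wrapping the socket's stream in a single BufferedInputStream beforehand keeps the byte-by-byte reads cheap.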
P.S. Request, request, read might be "working" in the sense that your server supports request pipelining and thus receives and processes both requests, and you, as a result, read both responses into one buffer as your "first" response.
P.P.S. Make sure your PrintWriter is using the US-ASCII encoding. Otherwise, depending on your system encoding, the request line and headers of your HTTP requests might be malformed (wrong encoding).
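One way to pin the encoding, as a small sketch: wrap the socket's output stream in an OutputStreamWriter with an explicit charset instead of relying on the platform default. The class name AsciiRequestWriter and the writeGet helper are invented for this example; the path and host are whatever the caller passes in.

```java
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for writing request heads in US-ASCII. */
final class AsciiRequestWriter {

    /** Wraps the stream in a PrintWriter with an explicit US-ASCII charset and no auto-flush. */
    static PrintWriter asciiWriter(OutputStream out) {
        return new PrintWriter(new OutputStreamWriter(out, StandardCharsets.US_ASCII), false);
    }

    /** Writes a minimal GET request head; print() is used so line endings are always CRLF. */
    static void writeGet(PrintWriter writer, String path, String host) {
        writer.print("GET " + path + " HTTP/1.1\r\n");
        writer.print("Host: " + host + "\r\n");
        writer.print("\r\n");
        writer.flush();
    }
}
```

Using print with explicit "\r\n" rather than println matters for the same reason: println emits the platform line separator, which is not CRLF everywhere.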

Writing a simple HTTP/1.1 client that respects the RFC is not such a difficult task.
To solve the problem of blocking I/O when reading from a socket in Java, you must use the java.nio classes.
SocketChannels make it possible to perform non-blocking I/O.
This is necessary to send HTTP requests on a persistent connection.
Furthermore, the nio classes will give better performance.
My stress tests gave the following results:
HTTP/1.0 (java.io) -> HTTP/1.0 (java.nio) = +20% faster
HTTP/1.0 (java.io) -> HTTP/1.1 (java.nio with persistent connection) = +110% faster

Make sure you have a Connection: keep-alive in your request. This may be a moot point though.
What kind of response is the server returning? Are you using chunked transfer? If the server doesn't know the size of the response body and isn't using chunked transfer encoding, it can't provide a Content-Length header and has to close the connection at the end of the response body to indicate to the client that the content has ended. In this case, keep-alive won't work. If you're generating content on the fly with PHP, JSP, etc., you can enable output buffering, check the size of the accumulated body, push the Content-Length header and flush the output buffer.

Is there a particular reason you're using raw sockets and not Java's URLConnection or Commons HttpClient?
HTTP isn't easy to get right. I know Commons HttpClient can reuse connections like you're trying to do.
If there isn't a specific reason for using sockets, this is what I would recommend :)

Writing your own correct HTTP/1.1 client implementation is nontrivial; historically, most people I've seen attempt it have got it wrong. Their implementation usually ignores the spec and just does what appears to work with one particular test server. In particular, they usually ignore the requirement to be able to handle chunked responses.
Writing your own HTTP client is probably a bad idea, unless you have some VERY strange requirements.

Related

An easy way to detect the end of http response (raw socket, java)?

I want to retrieve the server's response as is, with all headers. The first thing that comes to mind is to use raw sockets. As I have learned from the search, there are 3 ways to indicate the end of response:
(1) closing the connection;
(2) examining Content-Length;
(3) getting all chunks in the case of Transfer-Encoding: Chunked.
There is also
(4) the timeout method: assume that a timeout means end of data, although this is not really reliable.
I want a general-case solution and do not want to add a Connection: close line to the request itself.
In addition, it is recommended to use an existing library rather than re-invent the wheel.
Question:
How do I use an existing package, preferably, something already present in Android, to detect the end of HTTP response while having access (without interference) to the raw data stream?
UPD: forgot to mention that the HTTP request is given to me as a sequence of bytes. Yes, it is for testing.
PS
relevant reading:
End of an HTTP Response
Detect the end of an HTTP Request in Java
Detect end of HTTP request body
How HTTP Server inform its clients that the response has ended
Proper handling of chuncked Http Response within Socket
Detect the end of a HTTP packet
Android socket & HTTP response headers
Java HTTP GET response waits until timeout
I suggest using the Apache HTTP client package (http://hc.apache.org/httpclient-3.x/) so you don't need to implement all the finicky details of the HTTP protocol.
The Apache Http Client will give you access to the headers and their content, which may be enough for you.
If you really need access to the actual character sequence sent by the server (e.g. for debugging purposes), you could then intercept the communication by replacing the connection socket factory with your own to create "intercepting" sockets which store all data transferred in a buffer where your code can access it later on. See http://hc.apache.org/httpcomponents-client-4.3.x/tutorial/html/connmgmt.html#d5e418

HTTPClient never leaves socketRead() when executing GET on stream - workaround?

I am using Apache HttpClient (from Apache HTTP Components 4.3) in order to execute a GET against a ShoutCast stream:
CloseableHttpClient client = HttpClients.createDefault();
HttpGet request = new HttpGet("http://relay3.181.fm:8062/");
CloseableHttpResponse response = client.execute(request);
The call to client.execute() never returns, and according to the debugger it is a nested invocation to java.net.SocketInputStream#socketRead0() which is the last node in the call stack. From profiling the code, my only conclusion (based on a steadily rising number of char[] allocations) is that it simply "latches on" to the stream and keeps pulling bytes from the socket indefinitely.
What I would like is for the client to simply work normally and give me a HTTPResponse which I can use to pull what I want from the stream. As a matter of fact, I have been able to do so with other ShoutCast streams, but not this one.
Is there any way to work around this? Could I for example tell the client to break off after a certain number of bytes?
That site is very particular. If you don't specify a supported User-Agent (like Mozilla), the server keeps streaming bytes. I don't know what these bytes are meant to represent, audio perhaps.
If you print out the bytes that you receive, you will see
ICY 200 OK
icy-notice1:<BR>This stream requires Winamp<BR>
icy-notice2:SHOUTcast Distributed Network Audio Server/Linux v1.9.8<BR>
icy-name:181.FM - The Beatles Channel
icy-genre:Oldies
icy-url:http://www.181.fm
content-type:audio/mpeg
icy-pub:1
icy-br:128
which indicates that the response is not a valid HTTP response. It is an ICY response from the ICY protocol.
Now the default HttpClient you are using uses a DefaultHttpResponseParser which is a
Lenient HTTP response parser implementation that can skip malformed
data until a valid HTTP response message head is encountered.
In other words, it keeps reading the bytes the server is sending until it finds a valid HTTP response header, which will never happen, thus the infinite read.
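If you do end up reading the stream yourself, recognising the ICY head is not much work. The following is only a rough sketch based on the output shown above: the class name IcyHead is invented, and it assumes the header block ends with a blank line, as an HTTP head does. It is not part of HttpClient.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for the ICY response head shown above. */
final class IcyHead {

    /** True if the first line looks like an ICY status line rather than an HTTP one. */
    static boolean isIcy(String statusLine) {
        return statusLine != null && statusLine.startsWith("ICY ");
    }

    /** Reads "name:value" lines until a blank line (assumed terminator) or end of input. */
    static Map<String, String> readIcyHeaders(BufferedReader in) throws IOException {
        Map<String, String> headers = new LinkedHashMap<>();
        String line;
        while ((line = in.readLine()) != null && !line.isEmpty()) {
            int colon = line.indexOf(':'); // split on the first colon only, values may contain more
            if (colon > 0) {
                headers.put(line.substring(0, colon).trim(), line.substring(colon + 1).trim());
            }
        }
        return headers;
    }
}
```

Note that a BufferedReader reads ahead, so it is fine for inspecting the head but would swallow part of the audio data that follows; for the stream itself you would want to read bytes directly.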
I don't think you will be able to accomplish what you want with the Http Components library. Either look for an ICY client implementation in Java or roll your own.

Apache HTTPClient Streaming HTTP POST Request?

I'm trying to build a "full-duplex" HTTP streaming request using Apache HTTPClient.
In my first attempt, I tried using the following request code:
URL url=new URL(/* code goes here */);
HttpPost request=new HttpPost(url.toString());
request.addHeader("Connection", "close");
PipedOutputStream requestOutput=new PipedOutputStream();
PipedInputStream requestInput=new PipedInputStream(requestOutput, DEFAULT_PIPE_SIZE);
ContentType requestContentType=getContentType();
InputStreamEntity requestEntity=new InputStreamEntity(requestInput, -1, requestContentType);
request.setEntity(requestEntity);
HttpEntity responseEntity=null;
HttpResponse response=getHttpClient().execute(request); // <-- Hanging here
try {
if(response.getStatusLine().getStatusCode() != 200)
throw new IOException("Unexpected status code: "+response.getStatusLine().getStatusCode());
responseEntity = response.getEntity();
}
finally {
if(responseEntity == null)
request.abort();
}
InputStream responseInput=responseEntity.getContent();
ContentType responseContentType;
if(responseEntity.getContentType() != null)
responseContentType = ContentType.parse(responseEntity.getContentType().getValue());
else
responseContentType = DEFAULT_CONTENT_TYPE;
Reader responseStream=decode(responseInput, responseContentType);
Writer requestStream=encode(requestOutput, getContentType());
The request hangs at the line indicated above. It seems that the code is trying to send the entire request before it gets the response. In retrospect, this makes sense. However, it's not what I was hoping for. :)
Instead, I was hoping to send the request headers with Transfer-Encoding: chunked, receive a response header of HTTP/1.1 200 OK with a Transfer-Encoding: chunked header of its own, and then I'd have a full-duplex streaming HTTP connection to work with.
Happily, HTTPClient also has an NIO-based asynchronous client with good usage examples (like this one). My questions are:
Is my interpretation of the synchronous HTTPClient behavior correct? Or is there something I can do to continue using the (simpler) synchronous HTTPClient in the manner I described?
Does the NIO-based client wait to send the whole request before seeking a response? Or will I be able to send the request incrementally and receive the response incrementally at the same time?
If HTTPClient will not support this modality, is there another HTTP client library that will? Or should I be planning to write a (minimal) HTTP client to support this modality?
Here is my view on skim reading the code:
I cannot completely agree that a non-200 response means failure. All 2xx status codes indicate success. Check the wiki for more details.
For any request over TCP, I would recommend receiving the entire response before treating it as valid. I say this because a partial response will usually be treated as a bad response, since most client implementations cannot make use of it. (Imagine a case where the server is responding with 2MB of data and goes down partway through.)
A separate thread must be writing to the OutputStream for your code to work.
The code above provides the HTTPClient with a PipedInputStream.
PipedInputStream makes bytes available as they are written to the corresponding OutputStream.
The code above does not write to the OutputStream (which must be done by a separate thread).
Therefore the code is hanging exactly where your comment is.
Under the hood, the Apache client says "inputStream.read()" which in the case of piped streams requires that outputStream.write(bytes) was called previously (by a separate thread).
Since you aren't pumping bytes into the associated OutputStream from a separate thread the InputStream just sits and waits for the OutputStream to be written to by "some other thread."
From the JavaDocs:
A piped input stream should be connected to a piped output stream;
the piped input stream then provides whatever data bytes are written
to the piped output stream.
Typically, data is read from a PipedInputStream object by one thread
and data is written to the corresponding PipedOutputStream by some
other thread.
Attempting to use both objects from a single thread is not
recommended, as it may deadlock the thread.
The piped input stream contains a buffer, decoupling read operations
from write operations, within limits. A pipe is said to be "broken"
if a thread that was providing data bytes to the connected piped
output stream is no longer alive.
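To make the two-thread requirement concrete, here is a minimal sketch using only java.io, independent of HttpClient. The class name PipeDemo and the roundTrip method are invented for illustration. One thread writes to the PipedOutputStream and closes it, while the calling thread drains the PipedInputStream until it sees end of stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo: one thread writes to the pipe, the caller reads from it. */
final class PipeDemo {

    /** Writes payload from a second thread and reads it back on the calling thread. */
    static String roundTrip(String payload) throws IOException, InterruptedException {
        PipedOutputStream out = new PipedOutputStream();
        PipedInputStream in = new PipedInputStream(out, 1024);

        Thread writer = new Thread(() -> {
            // Closing the output side is what lets the reader see end of stream (-1).
            try (PipedOutputStream o = out) {
                o.write(payload.getBytes(StandardCharsets.UTF_8));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        writer.start();

        ByteArrayOutputStream received = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        int n;
        while ((n = in.read(buf)) != -1) {
            received.write(buf, 0, n);
        }
        writer.join();
        return new String(received.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

If the write were done on the same thread before reading, any payload larger than the pipe buffer (1024 bytes here) would block forever, which is the same kind of hang as in the question, just from the other side.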
Note: Seems to me, since piped streams and concurrency were not mentioned in your problem statement, that it's not necessary. Try wrapping a ByteArrayInputStream() with the Entity object instead first for a sanity check... that should help you narrow down the issue.
Update
Incidentally, I wrote an inversion of Apache's HTTP Client API [PipedApacheClientOutputStream] which provides an OutputStream interface for HTTP POST using Apache Commons HTTP Client 4.3.4. This may be close to what you are looking for...
Calling-code looks like this:
// Calling-code manages thread-pool
ExecutorService es = Executors.newCachedThreadPool(
new ThreadFactoryBuilder()
.setNameFormat("apache-client-executor-thread-%d")
.build());
// Build configuration
PipedApacheClientOutputStreamConfig config = new
PipedApacheClientOutputStreamConfig();
config.setUrl("http://localhost:3000");
config.setPipeBufferSizeBytes(1024);
config.setThreadPool(es);
config.setHttpClient(HttpClientBuilder.create().build());
// Instantiate OutputStream
PipedApacheClientOutputStream os = new
PipedApacheClientOutputStream(config);
// Write to OutputStream
os.write(...);
try {
os.close();
} catch (IOException e) {
logger.error(e.getLocalizedMessage(), e);
}
// Do stuff with HTTP response
...
// Close the HTTP response
os.getResponse().close();
// Finally, shut down thread pool
// This must occur after retrieving response (after is) if interested
// in POST result
es.shutdown();
Note - In practice the same client, executor service, and config will likely be reused throughout the life of the application, so the outer prep and close code in the above example will likely live in bootstrap/init and finalization code rather than directly inline with the OutputStream instantiation.

how to use readline without an EOF character java

I am trying to implement a simple server application in Java.
All it does is read in a message over TCP/IP and store it as a string. This is my code:
try{
in = new BufferedReader(new InputStreamReader(clientSocket.getInputStream()));
} catch (IOException e) {
System.out.println("cannot open input buffer");
System.exit(-1);
}
clientSocket.setSoTimeout(5000);
//read first bit of message
message = in.readLine();
System.out.println(message);
//as message is an undefined length we need to loop and check for the springer miller
//end mark /Request
while(message.contains("/Request") == false )
{
try {
message = in.readLine();
System.out.println(message);
}
catch (IOException e) {
System.out.println("cannot open input buffer");
System.exit(-1);
}
}
//reply
out.println(outputLine);
The problem I am having is that the message does not appear to have an EOF. It is another company's protocol that I am translating into mine (that's the purpose of the program), so I cannot add an EOF to the message.
The information I get if I run the program is:
POST / HTTP/1.1
Content-Type: text/xml; charset=utf-8
SOAPAction: http://htng.org/1.1/Listener.Wsdl#ReceiveMessageAsync
User-Agent: Java/1.6.0_24
Host: 192.168.0.32:8080
Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2
Connection: keep-alive
Content-Length: 3009
Then it hangs when it should read the message body.
I have never used Java before and do not want to write a binary socket reader to detect my own EOF.
Is there a way to read for x seconds and then return?
Thank you for any help.
P.S. I have already successfully built the program in C++ but need to port it to Java because the destination machine is unknown.
BufferedReader.readLine will return null on EOF rather than throw an exception.
Moreover, the "other company's protocol" seems to be SOAP over HTTP.
Maybe you want to use an HTTP or SOAP library? Others here will be able to give pointers...
Otherwise you can use the following approach:
Call readLine once to check that the method is indeed POST (otherwise the Content-Length header might not be there) and that the path is correct.
Call readLine until it returns an empty line (or null), to read all the HTTP headers.
While doing that, look out for a line starting with Content-Length to determine the length of the following XML data.
Create a char[] of the correct length and use in.read(cbuf, 0, cbuf.length) to read the XML into the created buffer cbuf. A single read call may return fewer characters than requested, so repeat it until the buffer is full.
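The steps above might look something like this. It is a sketch only: PostBodyReader and readPostBody are made-up names, there is no path check, and it treats Content-Length as a character count, which only holds when every character of the body is a single byte in the stream's encoding (Content-Length is really a byte count).

```java
import java.io.BufferedReader;
import java.io.EOFException;
import java.io.IOException;
import java.util.Locale;

/** Hypothetical helper following the readLine-then-read approach. */
final class PostBodyReader {

    /** Reads the request line, the headers, and then Content-Length characters of body. */
    static String readPostBody(BufferedReader in) throws IOException {
        String requestLine = in.readLine();
        if (requestLine == null || !requestLine.startsWith("POST ")) {
            throw new IOException("expected a POST request line, got: " + requestLine);
        }

        int contentLength = -1;
        String line;
        // Headers end at the first empty line (or at end of stream).
        while ((line = in.readLine()) != null && !line.isEmpty()) {
            if (line.toLowerCase(Locale.ROOT).startsWith("content-length:")) {
                contentLength = Integer.parseInt(line.substring("content-length:".length()).trim());
            }
        }
        if (contentLength < 0) {
            throw new IOException("no Content-Length header found");
        }

        // read() may return fewer characters than asked for, so loop until the buffer is full.
        char[] cbuf = new char[contentLength];
        int off = 0;
        while (off < cbuf.length) {
            int n = in.read(cbuf, off, cbuf.length - off);
            if (n == -1) {
                throw new EOFException("stream ended after " + off + " characters");
            }
            off += n;
        }
        return new String(cbuf);
    }
}
```

Because it stops after exactly the announced length, this returns as soon as the body has arrived instead of waiting for an EOF that a keep-alive client never sends.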
Implementing protocols on top of TCP/IP is tricky, and requires quite a lot of understanding of how networking, sockets and your OS's I/O work.
Further, implementing HTTP is surprisingly complex, on top of the network complexity.
I'm politely suggesting that you are probably in deep water, as you have to ask questions at this level, and probably need more help than you can get on SO.
...anyway.
If the server you are reading from is trying to talk HTTP, use an existing component for it. Apache HttpComponents is probably a good choice. I don't really buy that it forges HTTP headers, and I suggest that you skip your "lightweight" approach.
Here are some basic network I/O facts.
Network writes are packet oriented. TCP/IP generally tries to stuff as much as possible into every packet (using some smart algorithms). That means that if you write 4000 bytes, the message is split up into several packets that are arbitrarily sized, but normally less than 1500 bytes, depending on the network equipment. It also means that if you write less than a packet, your writes may be merged into one packet. (Packets may also be split and merged along the way.)
In order to send messages over the stream (which itself is transported in packets...) you need to know in advance how long the messages are, or read a full packet (do a .read() into a large buffer), parse the contents, and extract and construct complete messages in some smart way. That is exactly what HTTP does (among other things).
TCP/IP is certainly NOT line-oriented, so your newlines are totally ignored. HTTP uses the Content-Length (and some other tricks, as it may not always be defined) to send "messages" over a single TCP/IP stream, which may or may not be closed when a message is completely sent.
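As an illustration of "knowing in advance how long the messages are", here is a sketch of the simplest such scheme: each message is preceded by a 4-byte length. This is not HTTP and not the protocol from the question; the class LengthPrefixedFraming and its methods are invented purely to show the idea of framing messages on a byte stream.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical length-prefixed framing on top of a plain byte stream. */
final class LengthPrefixedFraming {

    /** Writes a 4-byte big-endian length followed by the UTF-8 payload. */
    static void writeMessage(OutputStream out, String message) throws IOException {
        byte[] payload = message.getBytes(StandardCharsets.UTF_8);
        DataOutputStream data = new DataOutputStream(out);
        data.writeInt(payload.length);
        data.write(payload);
        data.flush();
    }

    /** Reads the length, then exactly that many bytes, however the stream was split in transit. */
    static String readMessage(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int length = data.readInt();
        byte[] payload = new byte[length];
        data.readFully(payload); // blocks until all bytes arrive or throws EOFException
        return new String(payload, StandardCharsets.UTF_8);
    }
}
```

The reader never depends on where packet boundaries fell; readFully keeps reading until the announced number of bytes has arrived. HTTP's Content-Length plays the same role, except the length travels as a text header instead of a binary prefix.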

Get content from HTTP request even if there is no contentlength header

I am testing with a client that sends me an HTTP request with no Content-Length header but with content.
How do I extract this content without the help of the Content-Length header?
I've kept the original answer for completeness, but I've just been looking in the HTTP RFC (2616) section 4.3:
The presence of a message-body in a request is signaled by the inclusion of a Content-Length or Transfer-Encoding header field in the request's message-headers. A message-body MUST NOT be included in a request if the specification of the request method (section 5.1.1) does not allow sending an entity-body in requests. A server SHOULD read and forward a message-body on any request; if the request method does not include defined semantics for an entity-body, then the message-body SHOULD be ignored when handling the request.
So if you haven't got a content length, you must have a Transfer-Encoding (and if you haven't, you should respond with a 400 status to indicate a bad request or 411 ("length required")). At that point, you do what the Transfer-Encoding tells you :)
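Put as code, the decision might look like the sketch below. RequestFraming and its Kind enum are hypothetical names, and the header map is assumed to have lower-cased keys. Transfer-Encoding is checked first because it takes precedence when both headers are present.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical classifier for how a request body is delimited. */
final class RequestFraming {

    enum Kind { CHUNKED, FIXED_LENGTH, NO_BODY_OR_REJECT }

    /** Expects header names already lower-cased (an assumption of this sketch). */
    static Kind classify(Map<String, String> lowerCasedHeaders) {
        String transferEncoding = lowerCasedHeaders.get("transfer-encoding");
        if (transferEncoding != null
                && transferEncoding.toLowerCase(Locale.ROOT).contains("chunked")) {
            return Kind.CHUNKED;
        }
        if (lowerCasedHeaders.containsKey("content-length")) {
            return Kind.FIXED_LENGTH;
        }
        // Neither header: treat the request as having no body, or answer 400/411
        // if the method requires one.
        return Kind.NO_BODY_OR_REJECT;
    }
}
```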
Now if you're dealing with a servlet API (or a similar HTTP API) it may well handle all this for you, at which point you may be able to use the technique below to read from the stream until it yields no more data, as the API will take care of it (i.e. it won't just be a raw socket stream).
If you could give us more information about your context, that would help.
Original answer
If there's no content length, that means the content continues until the end of the data (when the socket closes).
Keep reading from the input stream (e.g. writing it to a ByteArrayOutputStream to store it, or possibly a file) until InputStream.read returns -1. For example:
byte[] buffer = new byte[8192];
ByteArrayOutputStream output = new ByteArrayOutputStream();
int bytesRead;
while ((bytesRead = inputStream.read(buffer)) != -1)
{
output.write(buffer, 0, bytesRead);
}
// Now use the data in "output"
EDIT: As has been pointed out in comments, the client could be using a chunked encoding. Normally the HTTP API you're using should deal with this for you, but if you're dealing with a raw socket you'd have to handle it yourself.
The point about this being a request (and therefore the client not being able to close the connection) is an interesting one - I thought the client could just shut down the sending part, but I don't see how that maps to anything in TCP at the moment. My low-level networking knowledge isn't what it might be.
If this answer turns out to be "definitely useless" I'll delete it...
If this were a response then the message could be terminated by closing the connection. But that's not an option here because the client still needs to read the response.
Apart from Content-Length:, the other methods of determining content length are:
Transfer-Encoding: chunked
guesswork
Hopefully it's the former, in which case the request should look something like this:
POST /some/path HTTP/1.1
Host: www.example.com
Content-Type: text/plain
Transfer-Encoding: chunked

25
This is the data in the first chunk

1C
and this is the second one

3
con
8
sequence
0
(shamelessly stolen from the Wikipedia article and modified for a request)
each chunk is of the form: hex-encoded length, CRLF, data, CRLF
after the final data-carrying chunk comes a zero-length chunk with no data
after the zero-length chunk comes optional extra HTTP headers
after the optional HTTP headers comes another CRLF
See HTTPbis Part1, Section 3.3.
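For a raw socket, the chunk format listed above can be decoded along these lines. This is a minimal sketch with invented names (ChunkedDecoder, decode): it ignores chunk extensions, discards trailer headers, and does no size limiting, so treat it as an outline and not a hardened parser.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical decoder for a chunked message body. */
final class ChunkedDecoder {

    /** Reads one line up to LF and returns it without CR/LF. */
    private static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            if (b != '\r') {
                line.write(b);
            }
        }
        if (b == -1) {
            throw new EOFException("stream ended inside a line");
        }
        return new String(line.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    /** Reads chunks until the zero-length chunk, then skips trailers up to the final blank line. */
    static byte[] decode(InputStream in) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        while (true) {
            String sizeLine = readLine(in);
            int semicolon = sizeLine.indexOf(';'); // anything after ';' is a chunk extension
            String hex = (semicolon >= 0 ? sizeLine.substring(0, semicolon) : sizeLine).trim();
            int size = Integer.parseInt(hex, 16);
            if (size == 0) {
                break;
            }
            byte[] chunk = new byte[size];
            int off = 0;
            while (off < size) {
                int n = in.read(chunk, off, size - off);
                if (n == -1) {
                    throw new EOFException("stream ended inside a chunk");
                }
                off += n;
            }
            body.write(chunk, 0, size);
            readLine(in); // consume the CRLF that follows the chunk data
        }
        while (!readLine(in).isEmpty()) {
            // optional trailer headers are skipped in this sketch
        }
        return body.toByteArray();
    }
}
```

After decode returns, the stream is positioned right after the final CRLF, so on a persistent connection the next message can be read from the same stream.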
