I am trying to implement a simple server application in Java. All it does is read in a message over TCP/IP and store it as a string. This is my code:
try {
    in = new BufferedReader(new InputStreamReader(clientSocket.getInputStream()));
    clientSocket.setSoTimeout(5000);

    // read the first line of the message
    message = in.readLine();
    System.out.println(message);

    // as the message is of undefined length, loop and check for the
    // Springer Miller end mark /Request
    while (!message.contains("/Request")) {
        message = in.readLine();
        System.out.println(message);
    }
} catch (IOException e) {
    System.out.println("cannot open input buffer");
    System.exit(-1);
}
// reply
out.println(outputLine);
The problem I am having is that the message does not appear to have an EOF. It is another company's protocol I am translating into mine (that is the purpose of the program), so I cannot add an EOF to the message.
The information I get if I run the program is:
POST / HTTP/1.1
Content-Type: text/xml; charset=utf-8
SOAPAction: http://htng.org/1.1/Listener.Wsdl#ReceiveMessageAsync
User-Agent: Java/1.6.0_24
Host: 192.168.0.32:8080
Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2
Connection: keep-alive
Content-Length: 3009
Then it hangs when it should read the message body.
I have never used Java in my life before and do not want to write a binary socket reader to detect my own EOF. Is there a way to read for x seconds and then return?
Thank you for any help.
P.S. I have already successfully built the program in C++, but I need to port it to Java because the destination machine is unknown.
BufferedReader.readLine will return null on EOF rather than throw an exception.
Moreover, the "other company's protocol" appears to be SOAP over HTTP.
Maybe you want to use an HTTP or SOAP library? Others here will be able to give pointers...
Otherwise you can use the following approach:
1. readLine once to check that the method is indeed POST (otherwise the Content-Length header might not be there) and that the path is correct.
2. readLine until it returns an empty line (or null), to read all the HTTP headers. While doing that, look out for a line starting with Content-Length, to determine the length of the following XML data.
3. Create a char[] of the correct length and use in.read(cbuf, 0, cbuf.length) to read the XML into the created buffer cbuf, as in the sketch below.
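A rough sketch of those steps, assuming in is the question's BufferedReader (note that Content-Length counts bytes, so reading into a char[] like this is only exact for single-byte encodings):

String requestLine = in.readLine();                 // e.g. "POST / HTTP/1.1"
if (requestLine == null || !requestLine.startsWith("POST")) {
    throw new IOException("unexpected request: " + requestLine);
}
int contentLength = -1;
String header;
// headers end at the first empty line (readLine returns null on EOF)
while ((header = in.readLine()) != null && !header.isEmpty()) {
    if (header.toLowerCase().startsWith("content-length:")) {
        contentLength = Integer.parseInt(header.substring(15).trim());
    }
}
// read exactly contentLength characters; a single read() may return fewer
char[] cbuf = new char[contentLength];
int off = 0;
while (off < contentLength) {
    int n = in.read(cbuf, off, contentLength - off);
    if (n == -1) break;                             // connection closed early
    off += n;
}
String message = new String(cbuf, 0, off);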
Implementing protocols on top of TCP/IP is tricky, and requires quite a lot of understanding of how networking, sockets, and your OS's I/O work.
Further, implementing HTTP is surprisingly complex, on top of the network complexity.
I'm politely suggesting that you are probably in deep water if you have to ask questions at this level, and probably need more help than you can get on SO.
...anyway.
If the server you are reading from is trying to talk HTTP, use an existing component for it. Apache HttpComponents is probably a good choice. I don't really buy that it forges HTTP headers, and I suggest that you skip your "lightweight" approach.
Here are some basic network-I/O facts.
Network writes are packet-oriented. TCP/IP generally tries to stuff as much as possible into every packet (using some smart algorithms). That means that if you write 4000 bytes, the message is split up into several packets that are arbitrarily sized, but normally less than 1500 bytes each, depending on the network equipment. It also means that if you write less than a packet's worth, your writes may be merged into one packet. (Packets may also be split and merged along the way.)
In order to send messages over the stream (which is itself transported in packets...) you need to know in advance how long the messages are, or read a full packet (do a .read() into a large buffer), parse the contents, and extract and construct complete messages in some smart way; this is exactly what HTTP does (among other things). A sketch of the length-in-advance approach follows below.
TCP/IP is certainly NOT line-oriented, so your newlines are totally ignored. HTTP uses the Content-Length header (and some other tricks, as it may not always be defined) to send "messages" over a single TCP/IP stream, which may or may not be closed when a message is completely sent.
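For illustration only (this is not what HTTP does, and socket and message are hypothetical names), the classic way to know the length in advance is a length prefix:

// Length-prefixed framing over a TCP stream: the 4-byte prefix tells the
// reader exactly how much to read, however TCP splits or merges packets.
DataOutputStream out = new DataOutputStream(socket.getOutputStream());
byte[] payload = message.getBytes(StandardCharsets.UTF_8);
out.writeInt(payload.length);   // sender: length first...
out.write(payload);             // ...then the body
out.flush();

DataInputStream in = new DataInputStream(socket.getInputStream());
int length = in.readInt();      // receiver: learn the size up front
byte[] data = new byte[length];
in.readFully(data);             // readFully loops until all bytes have arrived
String received = new String(data, StandardCharsets.UTF_8);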
Related
I've got a Tomcat instance right now that takes uploads and does some processing work on the data.
I want to replace this with a new servlet that conforms to a similar API. At first, I want this new servlet to just proxy all of the requests to the old one. They're running on separate JVMs, but on the same host.
I've been trying to use the HttpClient to proxy the upload, but it seems that the client waits for the stream to finish before it proxies the request. For large files, this causes the servlet to crash (I think it's buffering everything in memory).
Here's the code I'm currently using:
HttpPost httpPost = new HttpPost("http://localhost:8081/servlet");
String filePartName = request.getHeader("file_part_name");
_logger.info("Attaching file " + filePartName);
try {
    Part filePart = request.getPart(filePartName);
    MultipartEntity mpe = new MultipartEntity();
    mpe.addPart(
        filePartName,
        new InputStreamBody(filePart.getInputStream(), filePartName)
    );
    httpPost.setEntity(mpe);
} catch (ServletException | IOException e) {
    _logger.error("Caught exception trying to cross the streams, thanks Ghostbusters.", e);
    throw new IllegalStateException("Could not proxy the request", e);
}
HttpResponse postResponse;
try {
    postResponse = HTTP_CLIENT.execute(httpPost);
} catch (IOException e) {
    _logger.error("Caught exception trying to cross the streams, thanks Ghostbusters.", e);
    throw new IllegalStateException("Could not proxy the request", e);
}
I can't seem to figure out how to get HttpClient/HttpPost to stream the data as it comes in, instead of blocking until the first upload completes. Has anyone done something similar before? Is there an easier solution?
Thanks!
The issue lies in the way your request is processed by the MIME/multipart framework (the one you use to process your HttpServletRequest and access file parts).
The nature of a MIME/multipart request is simple (at a high level): instead of having traditional key=value content, those requests have a much more complex syntax that allows them to carry arbitrary, unstructured data (files to upload).
It basically looks like this (taken from Wikipedia):
Content-type: multipart/mixed; boundary="'''frontier'''"
This is a multi-part message in MIME format.
--'''frontier'''
Content-type: text/plain
This is the body of the message.
--'''frontier'''
Content-type: application/octet-stream
Content-Disposition: form-data; name="image1"
Content-transfer-encoding: base64
PGh0bWw+CiAgPGhlYWQ+CiAgPC9oZWFkPgogIDxib2R5PgogICAgPHA+VGhpcyBpcyB0aGUg
Ym9keSBvZiB0aGUgbWVzc2FnZS48L3A+CiAgPC9ib2R5Pgo8L2h0bWw+Cg==
--'''frontier'''--
The important part to note is that parts (separated by the boundary '''frontier''' here) have "names" (through the Content-Disposition header), followed by their content. One such request can have any number of parts.
Now, of course, the simplest, most straightforward way to implement the parsing of such a request is to process it to the end, detect the boundaries, and create a temporary file (or in-memory cache) to hold each part, identified by name.
Since the framework cannot know which part you will need first (you may need the second part in your servlet before the first), it parses the whole stream and only then gives you back control.
Therefore your call is blocked at this line
Part filePart = request.getPart(filePartName);
Here, the framework has to wait to parse the whole MIME body before letting you use the result (even a hypothetical, super-optimised parser could not both parse the stream lazily and allow you random access to any part of the message; you'd have to choose between the two options).
So there's not much you can do...
Except not use the multipart parser. I wouldn't recommend this if you're not familiar with MIME (and/or MIME libraries such as Apache James), or not confident that you are in control of your request's structure.
But if you are, then you may bypass the framework's processing and access the raw stream of the request. You'd parse the MIME structure by hand and stop when you hit the start of the request's body, and start building your HTTP POST at that point, being careful to actually take care of MIME-level technicalities (de-base64? de-gzip? ...).
Alternatively, if you think your server crashes because of an out-of-memory error, it may very well be possible that your framework is configured to cache the contents of the multipart in memory. If there is a way to configure it to cache to disk instead, then that is a possible workaround.
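For what it's worth, here is a minimal sketch of that bypass, assuming Servlet 3.1+ and HttpClient 4.x (HTTP_CLIENT and the URL are taken from the question). It only works if nothing has touched the request's input stream yet (no getPart()/getParameter() calls):

// Forward the raw request body without parsing it as multipart; the
// boundary stays inside the body and the Content-Type passes through.
HttpPost httpPost = new HttpPost("http://localhost:8081/servlet");
httpPost.setEntity(new InputStreamEntity(
        request.getInputStream(),          // raw, unparsed body
        request.getContentLengthLong(),    // -1 means unknown length
        ContentType.parse(request.getContentType())));
HttpResponse postResponse = HTTP_CLIENT.execute(httpPost);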
I want to retrieve the server's response as is, with all headers. The first thing that comes to mind is to use raw sockets. As I have learned from the search, there are 3 ways to indicate the end of response:
(1) closing the connection;
(2) examining Content-Length;
(3) getting all chunks in the case of Transfer-Encoding: Chunked.
There is also
(4) the timeout method: assume that the timeout means end of data, but the latter is not really reliable.
I want a general-case solution and do not want to
add a Connection: close line to the request itself.
In addition, it is recommended to use an existing library rather than re-invent the wheel.
Question:
How do I use an existing package, preferably, something already present in Android, to detect the end of HTTP response while having access (without interference) to the raw data stream?
UPD: forgot to mention that the HTTP request is given to me as a sequence of bytes. Yes, it is for testing.
PS
relevant reading:
End of an HTTP Response
Detect the end of an HTTP Request in Java
Detect end of HTTP request body
How HTTP Server inform its clients that the response has ended
Proper handling of chuncked Http Response within Socket
Detect the end of a HTTP packet
Android socket & HTTP response headers
Java HTTP GET response waits until timeout
I suggest using the Apache HttpClient package (http://hc.apache.org/httpclient-3.x/) so you don't need to implement all the finicky details of the HTTP protocol.
The Apache HttpClient will give you access to the headers and their content, which may be enough for you.
If you really need access to the actual character sequence sent by the server (e.g. for debugging purposes), you could intercept the communication by replacing the connection socket factory with your own, creating "intercepting" sockets that store all transferred data in a buffer where your code can access it later on, as sketched below. See http://hc.apache.org/httpcomponents-client-4.3.x/tutorial/html/connmgmt.html#d5e418
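As a rough sketch of the interception idea (plain java.io, independent of the HttpClient internals): wrap the socket's InputStream so that every byte read is also copied into a side buffer.

// Tee stream: everything read from the wrapped stream is also written
// into an in-memory copy that can be inspected later as raw bytes.
class TeeInputStream extends FilterInputStream {
    private final ByteArrayOutputStream copy = new ByteArrayOutputStream();

    TeeInputStream(InputStream in) { super(in); }

    @Override public int read() throws IOException {
        int b = super.read();
        if (b != -1) copy.write(b);
        return b;
    }

    @Override public int read(byte[] b, int off, int len) throws IOException {
        int n = super.read(b, off, len);
        if (n > 0) copy.write(b, off, n);
        return n;
    }

    byte[] rawBytes() { return copy.toByteArray(); }
}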
I'm working on a browser/proxy oriented project where I need to download webpages. After sending a custom HTTP request to a web server I start listening for a server response.
When reading the response, I check the response headers for a Content-Length row. If I get one, it's easy to determine when the server is done sending data, since I always know how many bytes of data I have received.
The problem occurs when the server doesn't include the Content-Length header and also keeps the connection open for further requests. For example, the Google server responds with gzipped content but doesn't include a content length. How do I know when to stop waiting for more data and close the connection?
I have considered using a timeout value to close the connection when no data has been received for a while, but this seems like the wrong way to do it. Chrome, for example, can download the same pages as me and always seems to know exactly when to close the connection.
Have a look at IETF RFC 2616; search for chunked encoding and Content-Range.
HTTP is designed to return content of unknown length, as in:
HTTP/1.1 200 OK
Content-Type: text/plain
Transfer-Encoding: chunked
25
This is the data in the first chunk
1C
and this is the second one
3
con
8
sequence
0
(source: Wikipedia)
I would suggest forcing a Connection: close header, so you can be sure the server closes the connection after the output is finished, whether or not Content-Length is set. Performance will be partially affected by this.
There are two cases you can expect:
1. socket close
2. socket timeout
Usually the socket will be closed; it also makes sense to declare a socket timeout.
Remember that
int read(byte[] buffer, int offset, int size)
returns the number of bytes actually read, which may be less than the size argument; it returns -1 on socket close and throws a SocketTimeoutException when the socket timeout elapses.
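Both cases can be handled in one loop, for example (a sketch with a 5-second timeout):

socket.setSoTimeout(5000);                 // case 2: give up after 5 s of silence
InputStream in = socket.getInputStream();
ByteArrayOutputStream received = new ByteArrayOutputStream();
byte[] buffer = new byte[8192];
try {
    int n;
    while ((n = in.read(buffer)) != -1) {  // case 1: -1 means the peer closed
        received.write(buffer, 0, n);
    }
} catch (SocketTimeoutException e) {
    // no data within the timeout; treat what has arrived as complete
}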
Regards.
I am testing with a client who sends me an HTTP request with no Content-Length header, but the request does have content.
How do I extract this content without the help of the Content-Length header?
I've kept the original answer for completeness, but I've just been looking at the HTTP RFC (2616), section 4.3:
The presence of a message-body in a request is signaled by the inclusion of a Content-Length or Transfer-Encoding header field in the request's message-headers. A message-body MUST NOT be included in a request if the specification of the request method (section 5.1.1) does not allow sending an entity-body in requests. A server SHOULD read and forward a message-body on any request; if the request method does not include defined semantics for an entity-body, then the message-body SHOULD be ignored when handling the request.
So if you haven't got a content length, you must have a Transfer-Encoding (and if you haven't, you should respond with a 400 status to indicate a bad request or 411 ("length required")). At that point, you do what the Transfer-Encoding tells you :)
Now if you're dealing with a servlet API (or a similar HTTP API) it may well handle all this for you, at which point you may be able to use the technique below to read from the stream until it yields no more data, as the API will take care of it (i.e. it won't just be a raw socket stream).
If you could give us more information about your context, that would help.
Original answer
If there's no content length, that means the content continues until the end of the data (when the socket closes).
Keep reading from the input stream (e.g. writing it to a ByteArrayOutputStream to store it, or possibly a file) until InputStream.read returns -1. For example:
byte[] buffer = new byte[8192];
ByteArrayOutputStream output = new ByteArrayOutputStream();
int bytesRead;
while ((bytesRead = inputStream.read(buffer)) != -1)
{
output.write(buffer, 0, bytesRead);
}
// Now use the data in "output"
EDIT: As has been pointed out in comments, the client could be using a chunked encoding. Normally the HTTP API you're using should deal with this for you, but if you're dealing with a raw socket you'd have to handle it yourself.
The point about this being a request (and therefore the client not being able to close the connection) is an interesting one - I thought the client could just shut down the sending part, but I don't see how that maps to anything in TCP at the moment. My low-level networking knowledge isn't what it might be.
If this answer turns out to be "definitely useless" I'll delete it...
If this were a response then the message could be terminated by closing the connection. But that's not an option here because the client still needs to read the response.
Apart from Content-Length:, the other methods of determining content length are:
Transfer-Encoding: chunked
guesswork
Hopefully it's the former, in which case the request should look something like this:
POST /some/path HTTP/1.1
Host: www.example.com
Content-Type: text/plain
Transfer-Encoding: chunked
25
This is the data in the first chunk
1C
and this is the second one
3
con
8
sequence
0
(shamelessly stolen from the Wikipedia article and modified for a request)
each chunk is of the form: hex-encoded length, CRLF, data, CRLF
after the final data-carrying chunk comes a zero-length chunk with no data
after the zero-length chunk comes optional extra HTTP headers
after the optional HTTP headers comes another CRLF
See HTTPbis Part1, Section 3.3.
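If you do have to decode it yourself on a raw stream, here is a minimal sketch (java.io only; chunk extensions are stripped and trailer headers merely skipped):

// Reads a chunked-encoded body from a raw InputStream.
static byte[] readChunkedBody(InputStream in) throws IOException {
    ByteArrayOutputStream body = new ByteArrayOutputStream();
    while (true) {
        String sizeLine = readAsciiLine(in);       // hex length, maybe ";ext"
        int semicolon = sizeLine.indexOf(';');
        if (semicolon != -1) sizeLine = sizeLine.substring(0, semicolon);
        int size = Integer.parseInt(sizeLine.trim(), 16);
        if (size == 0) break;                      // zero-length chunk ends the body
        byte[] chunk = new byte[size];
        int off = 0;
        while (off < size) {                       // read() may return fewer bytes
            int n = in.read(chunk, off, size - off);
            if (n == -1) throw new EOFException("stream ended mid-chunk");
            off += n;
        }
        body.write(chunk);
        readAsciiLine(in);                         // consume the CRLF after the data
    }
    while (!readAsciiLine(in).isEmpty()) { }       // skip optional trailer headers
    return body.toByteArray();
}

// Reads a CRLF-terminated line as ASCII without buffering past it.
static String readAsciiLine(InputStream in) throws IOException {
    StringBuilder sb = new StringBuilder();
    int c;
    while ((c = in.read()) != -1 && c != '\n') {
        if (c != '\r') sb.append((char) c);
    }
    return sb.toString();
}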
Let's say I have a java program that makes an HTTP request on a server using HTTP 1.1 and doesn't close the connection. I make one request, and read all data returned from the input stream I have bound to the socket. However, upon making a second request, I get no response from the server (or there's a problem with the stream - it doesn't provide any more input). If I make the requests in order (Request, request, read) it works fine, but (request, read, request, read) doesn't.
Could someone shed some insight onto why this might be happening? (Code snippets follow). No matter what I do, the second read loop's isr_reader.read() only ever returns -1.
try {
    connection = new Socket("SomeServer", port);
    con_out = connection.getOutputStream();
    con_in = connection.getInputStream();

    PrintWriter out_writer = new PrintWriter(con_out, false);
    out_writer.print("GET http://somesite HTTP/1.1\r\n");
    out_writer.print("Host: thehost\r\n");
    //out_writer.print("Content-Length: 0\r\n");
    out_writer.print("\r\n");
    out_writer.flush();

    // If we were not interpreting this data as a character stream, we might
    // need to adjust byte ordering here.
    InputStreamReader isr_reader = new InputStreamReader(con_in);
    char[] streamBuf = new char[8192];
    int amountRead;
    StringBuilder receivedData = new StringBuilder();
    while ((amountRead = isr_reader.read(streamBuf)) > 0) {
        receivedData.append(streamBuf, 0, amountRead);
    }
    // Response is processed here.

    if (connection != null && !connection.isClosed()) {
        //System.out.println("Connection Still Open...");
        out_writer.print("GET http://someSite2\r\n");
        out_writer.print("Host: somehost\r\n");
        out_writer.print("Connection: close\r\n");
        out_writer.print("\r\n");
        out_writer.flush();

        streamBuf = new char[8192];
        amountRead = 0;
        receivedData.setLength(0);
        while ((amountRead = isr_reader.read(streamBuf)) > 0 || amountRead < 1) {
            if (amountRead > 0)
                receivedData.append(streamBuf, 0, amountRead);
        }
    }
    // Process response here
}
Responses to questions:
Yes, I'm receiving chunked responses from the server.
I'm using raw sockets because of an outside restriction.
Apologies for the mess of code - I was rewriting it from memory and seem to have introduced a few bugs.
So the consensus is I have to either do (request, request, read) and let the server close the stream once I hit the end, or, if I do (request, read, request, read) stop before I hit the end of the stream so that the stream isn't closed.
According to your code, the only time you'll even reach the statements dealing with sending the second request is when the server closes the output stream (your input stream) after receiving/responding to the first request.
The reason for that is that your code that is supposed to read only the first response
while((amountRead = isr_reader.read(streamBuf)) > 0) {
receivedData.append(streamBuf, 0, amountRead);
}
will block until the server closes the output stream (i.e., when read returns -1) or until the read timeout on the socket elapses. In the case of the read timeout, an exception will be thrown and you won't even get to sending the second request.
The problem with HTTP responses is that they don't tell you how many bytes to read from the stream until the end of the response. This is not a big deal for HTTP 1.0 responses, because the server simply closes the connection after the response thus enabling you to obtain the response (status line + headers + body) by simply reading everything until the end of the stream.
With HTTP 1.1 persistent connections you can no longer simply read everything until the end of the stream. You first need to read the status line and the headers, line by line, and then, based on the status code and the headers (such as Content-Length) decide how many bytes to read to obtain the response body (if it's present at all). If you do the above properly, your read operations will complete before the connection is closed or a timeout happens, and you will have read exactly the response the server sent. This will enable you to send the next request and then read the second response in exactly the same manner as the first one.
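A minimal sketch of that framing, assuming in is the raw InputStream and a readAsciiLine helper that reads one CRLF-terminated line without buffering past it (chunked transfer is ignored here for brevity; a real client must also check Transfer-Encoding):

// Frame exactly one HTTP/1.1 response, leaving the stream positioned at
// the start of the next response on the persistent connection.
String statusLine = readAsciiLine(in);
Map<String, String> headers = new HashMap<>();
String line;
while (!(line = readAsciiLine(in)).isEmpty()) {
    int colon = line.indexOf(':');
    headers.put(line.substring(0, colon).trim().toLowerCase(),
                line.substring(colon + 1).trim());
}
int contentLength = Integer.parseInt(headers.getOrDefault("content-length", "0"));
byte[] body = new byte[contentLength];
int off = 0;
while (off < contentLength) {
    int n = in.read(body, off, contentLength - off);
    if (n == -1) throw new EOFException("connection closed mid-body");
    off += n;
}
// It is now safe to write the second request and read its response
// the same way, without ever blocking on a read past the body.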
P.S. Request, request, read might be "working" in the sense that your server supports request pipelining and thus, receives and processes both request, and you, as a result, read both responses into one buffer as your "first" response.
P.P.S Make sure your PrintWriter is using the US-ASCII encoding. Otherwise, depending on your system encoding, the request line and headers of your HTTP requests might be malformed (wrong encoding).
Writing a simple HTTP/1.1 client that respects the RFC is not such a difficult task.
To solve the problem of blocking I/O when reading a socket in Java, you must use the java.nio classes.
SocketChannels give you the ability to perform non-blocking I/O access.
This is necessary to send HTTP requests on a persistent connection.
Furthermore, the nio classes will give better performance; a bare-bones sketch follows the numbers below.
My stress tests give the following results:
HTTP/1.0 (java.io) -> HTTP/1.0 (java.nio) = +20% faster
HTTP/1.0 (java.io) -> HTTP/1.1 (java.nio with persistent connection) = +110% faster
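Here is that bare-bones sketch of the non-blocking read side (the blocking connect is kept for brevity, and example.com is a placeholder):

// Non-blocking read with java.nio: a Selector signals when data is ready,
// so one thread can service many persistent connections.
SocketChannel channel = SocketChannel.open(new InetSocketAddress("example.com", 80));
channel.configureBlocking(false);

ByteBuffer request = ByteBuffer.wrap(
        "GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n"
                .getBytes(StandardCharsets.US_ASCII));
while (request.hasRemaining()) {
    channel.write(request);                // may write 0 bytes; loop until done
}

Selector selector = Selector.open();
channel.register(selector, SelectionKey.OP_READ);

ByteBuffer buffer = ByteBuffer.allocate(8192);
StringBuilder response = new StringBuilder();
boolean open = true;
while (open) {
    selector.select(5000);                 // block at most 5 s waiting for data
    for (SelectionKey key : selector.selectedKeys()) {
        if (key.isReadable()) {
            buffer.clear();
            int n = channel.read(buffer);
            if (n == -1) {                 // server closed the connection
                open = false;
            } else {
                buffer.flip();
                response.append(StandardCharsets.US_ASCII.decode(buffer));
            }
        }
    }
    selector.selectedKeys().clear();
}
channel.close();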
Make sure you have a Connection: keep-alive in your request. This may be a moot point though.
What kind of response is the server returning? Are you using chunked transfer? If the server doesn't know the size of the response body, it can't provide a Content-Length header and has to close the connection at the end of the response body to indicate to the client that the content has ended. In this case, the keep-alive won't work. If you're generating content on-the-fly with PHP, JSP etc., you can enable output buffering, check the size of the accumulated body, push the Content-Length header and flush the output buffer.
Is there a particular reason you're using raw sockets and not Java's URLConnection or Commons HttpClient?
HTTP isn't easy to get right. I know Commons HttpClient can re-use connections the way you're trying to do.
If there isn't a specific reason for you using Sockets this is what I would recommend :)
Writing your own correct client HTTP/1.1 implementation is nontrivial; historically, most people I've seen attempt it have got it wrong. Their implementations usually ignore the spec and just do what appears to work with one particular test server; in particular, they usually ignore the requirement to be able to handle chunked responses.
Writing your own HTTP client is probably a bad idea, unless you have some VERY strange requirements.