Today I'm developing a Java RMI server (and also the client) that gets info from a page and returns what I want. The code is below. The problem is that the URL I pass to the method sometimes throws an IOException saying the given URL produced a 503 HTTP error. It would be easy to handle if it happened every time, but it only appears intermittently.
The method has this structure because the page I parse belongs to a weather company and I want info for many cities, not just one; some cities work perfectly on the first attempt and others fail. Any suggestions?
public ArrayList<Medidas> parse(String url) {
    medidas = new ArrayList<Medidas>();
    int v = 0;
    String sourceLine;
    String content = "";
    try {
        // The URL address of the page to open.
        URL address = new URL(url);
        // Open the address and create a BufferedReader over the source code.
        InputStreamReader pageInput = new InputStreamReader(address.openStream());
        BufferedReader source = new BufferedReader(pageInput);
        // Append each HTML line between <tbody> and </tbody> into one string,
        // separated by newline characters.
        while ((sourceLine = source.readLine()) != null) {
            if (sourceLine.contains("<tbody>")) v = 1;
            else if (sourceLine.contains("</tbody>"))
                break;
            else if (v == 1)
                content += sourceLine + "\n";
        }
        ........................
        ........................ NOW THE PARSING CODE, NOT IMPORTANT
    }
HTTP 5xx errors reflect server-side problems, so this most likely has nothing to do with your client code.
You would get a 400 error if you were passing invalid parameters in your request.
503 is "Service Unavailable" and may be sent by the server when it is overloaded and cannot process your request. For a publicly accessible server, that would explain the erratic behavior.
Edit
Build a retry handler into your code for when you detect a 503. Apache HttpClient can do that automatically for you; a hand-rolled sketch follows below.
List of HTTP Status Codes
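A minimal hand-rolled retry with plain HttpURLConnection (a sketch: the method name, retry count, and back-off are illustrative, not from the original code):
static BufferedReader openWithRetry(String url) throws IOException, InterruptedException {
    for (int attempt = 1; attempt <= 3; attempt++) {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        if (conn.getResponseCode() != 503) {
            return new BufferedReader(new InputStreamReader(conn.getInputStream()));
        }
        conn.disconnect();
        Thread.sleep(2000L * attempt); // simple linear back-off before retrying
    }
    throw new IOException("Still receiving 503 after 3 attempts: " + url);
}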
Check that the IOException is really not a MalformedURLException. Try printing out the URLs to verify a bad URL is not causing the IOException.
How large is the file you are parsing? Perhaps your JVM is running out of memory.
I'm playing around setting up my own Java HTTP server to better understand HTTP servers and what goes on under the hood of the web. I've developed a pretty simple server and have been able to serve both HTML pages and data in JSON form. Then I saw that the browser (I'm using Chrome, but I assume it's the same for others) was sending a request for favicon.ico. I'm able to identify that request on my server, so I'm trying to serve up a random icon I downloaded and resized to 16x16 pixels in PNG format, as that's what the internet says the size needs to be. Here's my code; note it's not supposed to be anything professional, just something that will work for my basic educational purposes:
[set up ServerSocket and listen]
public static String err_header = "HTTP/1.1 500 ERR\nAccess-Control-Allow-Origin: *";
public static String success_header = "HTTP/1.1 200 OK\nAccess-Control-Allow-Origin: *";
public static String end_header = "\r\n\r\n";
while (true) {
    try {
        System.out.println("Listening for new connections");
        clientSocket = server.accept();
        System.out.println("Connection established");
        InputStreamReader isr = new InputStreamReader(clientSocket.getInputStream());
        BufferedReader reader = new BufferedReader(isr);
        String getLine = reader.readLine(); // first line of HTTP request
        handleRequest(getLine, clientSocket);
    } // end of try
    catch (Exception e) {
        [error stuff]
    } // end of catch
} // end of while
HandleRequest method:
public static void handleRequest(String getLine, Socket clientSocket) throws Exception {
    if (getLine.substring(5, 16).equals("favicon.ico")) {
        List<String> iconTag = new ArrayList<String>();
        iconTag.add("\nContent-Type: image/png");
        handleFileRequest("[file]", iconTag, clientSocket);
    } // end of if
    else {
        handleFileRequest("[file]", clientSocket);
    } // end of else
} // end of handleRequest
handleFileRequest for images:
public static void handleFileRequest(String fileName, List<String> headerTags, Socket clientSocket) throws Exception {
    OutputStream out = clientSocket.getOutputStream();
    BufferedReader read = new BufferedReader(new FileReader(fileName));
    out.write(success_header.getBytes("UTF-8"));
    Iterator<String> itr = headerTags.iterator();
    while (itr.hasNext()) {
        out.write(itr.next().getBytes("UTF-8"));
    } // end of while
    out.write(end_header.getBytes("UTF-8"));
    String readLine = "";
    while ((readLine = read.readLine()) != null) {
        out.write(readLine.getBytes("UTF-8"));
    } // end of while
    out.flush();
    out.close();
} // end of handleFileRequest
And it appears to work: the server sends the file and the browser shows the 200 OK response, but there's no favicon. When I filter network requests to just images, the one image requested by the served page is listed, but the favicon request is not (it shows up under the "other" section instead). Similarly, clicking on the other image shows it in the preview, whereas that's not the case with the favicon request. The other image, meanwhile, shows up in the page just fine. (Screenshots omitted.)
I also tried including the Content-Length header, but that didn't seem to make a difference. Am I missing something obvious?
Also just to clarify, I know I can include the favicon in the actual html page, the goal isn't to do it, but to understand how it works.
Reading binary files
It seems the content of the favicon is not served correctly.
I suspect this is most likely due to the way you read its content:
while ((readLine = read.readLine()) != null) {
    out.write(readLine.getBytes("UTF-8"));
}
Reading binary content line by line is inappropriate, because the concepts of lines and UTF-8 encoding don't make sense for binary files.
And you cannot read binary content correctly line by line this way, because BufferedReader's readLine method doesn't return the full line: it strips the line terminator from the end, and you cannot manually add a newline character back because you cannot know what exactly it was.
Here's a simpler and correct way to read the content of a binary file:
byte[] bytes = Files.readAllBytes(Paths.get("/path/to/file"));
Once you have this, it's easy to produce a correct file header with the content length, using the value of bytes.length.
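A binary-safe sketch of your handleFileRequest, reusing the names from your question (success_header, end_header, clientSocket):
public static void handleFileRequest(String fileName, List<String> headerTags,
        Socket clientSocket) throws Exception {
    byte[] bytes = Files.readAllBytes(Paths.get(fileName));
    OutputStream out = clientSocket.getOutputStream();
    out.write(success_header.getBytes("UTF-8"));
    for (String tag : headerTags) {
        out.write(tag.getBytes("UTF-8"));
    }
    out.write(("\nContent-Length: " + bytes.length).getBytes("UTF-8"));
    out.write(end_header.getBytes("UTF-8"));
    out.write(bytes); // raw bytes: no line splitting, no charset conversion
    out.flush();
    out.close();
}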
What happens when you visit a page in a browser
It seems it would help to clarify a few things.
When you open a URL in a browser, the browser sends a GET request to the web server to download the content of the URL you specified. Once it has the page content, it sends further GET requests:
Fetch a favicon if it doesn't have one already. The location of this may be specified in the HTML document, or else the browser will try to fetch SERVERNAME/favicon.ico by default
Fetch the images specified in the src attribute of any (valid) <img/> tags in the document
Fetch the style sheets specified in the href attribute of any (valid) <link rel="stylesheet"/> tags in the document
... and similarly for <script/> tags, and so on...
The favicon is purely cosmetic, shown in browser tab titles; the other resources are essential for rendering the page. None of them are essential in text-based browsers like lynx, and such browsers will simply not fetch these resources.
This is the explanation for why the favicon is requested, and how.
How does a web server serve files?
In the most basic case, serving a file has two important components:
Produce an appropriate HTTP header: each header line is in Name: Value format and ends with a line terminator (\r\n in the HTTP specification).
There must be at least a Content-Type header.
The header must be terminated by a blank line.
After the blank line that terminates the header, the content can be anything, even binary.
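For instance, a minimal response serving a PNG could look like this on the wire (the Content-Length value is of course specific to the file):
HTTP/1.1 200 OK
Content-Type: image/png
Content-Length: 1270

(1270 bytes of raw PNG data follow)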
To illustrate with an example, consider the curl command, which dumps the content of a URL to standard output. If you run curl url-to-some-html-file, you will see the content of the HTML file. If you run curl url-to-some-image-file, you will see the content of the image file; it will be unreadable, and your terminal will probably make funny noises. You can redirect the output to a file with curl url-to-some-image-file > image.png, and that will give you an image file, with binary content, that you can open in any image viewer tool.
In short, serving a file is really just printing a header, then printing a blank line to terminate the header, then printing the content.
Debugging the serving of an image
An easy way to verify that an image is correctly served is to save the URL to a file using curl, and then check that the saved file and the original are identical, for example using the cmp command:
curl -o file url-to-favicon
cmp file /path/to/original
The output of cmp should be empty: it only produces output if it finds a difference between the two files.
Implementing a simple HTTP server
Instead of using a ServerSocket, here's a drastically simpler way to implement an HTTP server:
import com.sun.net.httpserver.HttpServer; // built into the JDK
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

HttpServer server = HttpServer.create(new InetSocketAddress(1234), 0);
server.createContext("/favicon.ico", t -> {
    byte[] bytes = Files.readAllBytes(Paths.get("/path/to/favicon"));
    t.sendResponseHeaders(200, bytes.length);
    try (OutputStream os = t.getResponseBody()) {
        os.write(bytes);
    }
});
server.createContext("/", t -> {
    Charset charset = StandardCharsets.UTF_8;
    List<String> lines = Files.readAllLines(Paths.get("/path/to/index"), charset);
    t.sendResponseHeaders(200, 0);
    try (OutputStream os = t.getResponseBody()) {
        for (String line : lines) {
            os.write((line + "\n").getBytes(charset));
        }
    }
});
server.start();
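You can then verify it end to end with curl -o favicon.png http://localhost:1234/favicon.ico followed by cmp favicon.png /path/to/favicon, as described above.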
When I want to get the source code of a specific web page, I use the following code:
URL url = new URL("https://google.de");
URLConnection urlConnect = url.openConnection();
BufferedReader br = new BufferedReader(new InputStreamReader(urlConnect.getInputStream())); //Here is the error with the amazon url
StringBuffer sb = new StringBuffer();
String line, htmlData;
while((line=br.readLine())!=null){
sb.append(line+"\n");
}
htmlData = sb.toString();
The code above works without problems, but when the following URL is used...
URL url = new URL("https://amazon.de");
...then you sometimes get an IOException: server returned HTTP error code 503. In my opinion, this doesn't make any sense, because I can open the Amazon web page in a browser without any errors.
When accessing https://amazon.de with curl -v https://amazon.de, you get either a 503 or a 301 status code in the response (when following the redirect, you get a 503 from the referenced location https://www.amazon.de/). The body contains the following comment:
To discuss automated access to Amazon data please contact api-services-support@amazon.com.
For information about migrating to our APIs refer to our Marketplace APIs at https://developer.amazonservices.de/ref=rm_5_sv, or our Product Advertising API at https://partnernet.amazon.de/gp/advertising/api/detail/main.html/ref=rm_5_ac for advertising use cases.
I assume Amazon returns this response when your request is detected as coming from a non-browser context (e.g. by inspecting the User-Agent header), to steer you towards using the APIs instead of crawling the site directly.
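If you want to test whether the User-Agent is the trigger, you can set a browser-like header on the connection (a sketch based on the question's code; the header value is just an example, and Amazon may well still block automated requests):
URLConnection urlConnect = new URL("https://amazon.de").openConnection();
// Pretend to be a browser; without this, Java's default user agent is sent.
urlConnect.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)");
BufferedReader br = new BufferedReader(new InputStreamReader(urlConnect.getInputStream()));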
I have an Android application which downloads its information as JSON.
A typical JSON download is about 2,000-3,000 characters. But I wanted to stress it, so I created a larger file (~48,000 characters). As files go this is still small, under 50 KB.
The problem I have is that when downloading I am only getting 16144 characters of data. That is, reader.readLine() returns just one line containing 16144 characters, as does client.execute(request, new BasicResponseHandler());. Obviously with only part of the file, my JSON parsing code fails quickly as it's not a valid JSON object.
There are no exceptions raised, so it's not an out-of-memory error. And the problem is repeatable on an HTC Desire (2.2) and a Galaxy Nexus (4.1.1), so it's not OS-specific either. I've tested the URL in a web browser and it works fine, and all the JSON is available, so it's not a server error.
Question
Can anyone point out why it is downloading only 16144 characters, and how to make it download the whole file?
Method #1
HttpClient client = new DefaultHttpClient();
HttpGet request = new HttpGet(uri);
HttpResponse response = client.execute(request);
InputStream in = response.getEntity().getContent();
BufferedReader reader = new BufferedReader(new InputStreamReader(in));
StringBuilder str = new StringBuilder();
String line = null;
while ((line = reader.readLine()) != null) {
    str.append(line);
}
in.close();
result.setJSONResult(str.toString());
Method #2
HttpClient client = new DefaultHttpClient();
HttpGet request = new HttpGet(uri);
HttpResponse response = client.execute(request);
String json = client.execute(request, new BasicResponseHandler());
result.setJSONResult(json);
Note: the URL is on a LAN (http://192.168.0.99:8080...), so I've not included it as it won't be useful.
Update - Fixed
Fixed the problem. In the end I put it down to a file-transfer issue rather than memory limits of the phone. Whilst it worked on a PC (Chrome), I found it was broken in places other than Android: the website and other browsers (Safari) didn't work with the raw API call either. The underlying problem was the web server's proxy, nginx, which wanted to buffer larger responses (over 32 KB) but never had write permissions on the server folders it used for buffering. This meant it sent part of the file, started to buffer, and hit a critical error due to being unable to write. When it errored, it stopped sending the rest of the file, hence it stopping at an unusual number of bytes. Thanks for all your help!
It's because that's the maximum size a String can hold: either 2147483647 (2^31 - 1, the maximum size of an array by the Java specification; the String class uses an array for internal storage) or half your maximum heap size (since each character is two bytes), whichever is smaller.
And probably less than 40 KB of heap will be available.
You can use a JsonReader instead of storing the data from the web in a String; see http://developer.android.com/reference/android/util/JsonReader.html
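A minimal sketch of that approach, reusing the InputStream in from Method #1 (the "items" field name is made up; the point is that the response is streamed instead of buffered into one String):
JsonReader reader = new JsonReader(new InputStreamReader(in, "UTF-8"));
reader.beginObject();
while (reader.hasNext()) {
    String name = reader.nextName();
    if (name.equals("items")) { // hypothetical field name
        reader.beginArray();
        while (reader.hasNext()) {
            reader.skipValue(); // parse each element as needed
        }
        reader.endArray();
    } else {
        reader.skipValue();
    }
}
reader.endObject();
reader.close();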
You are using a line-based reader to read data that is not line-based. When you call readLine, you are asking it to forcefully convert whatever it read into a line of text. This mangles the data if it's not in fact a line of text.
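If you stay with HttpClient, one way to sidestep line-based reading entirely is EntityUtils from Apache HttpClient (a sketch reusing client and request from the question):
HttpResponse response = client.execute(request);
// Read the whole entity in one go, without splitting it into lines.
String json = EntityUtils.toString(response.getEntity(), "UTF-8");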
I have started a small project in Java.
I have to create a client which will send XML to a URL as an HTTP POST request.
I tried it using the java.net.* package (a snippet of the code follows), but I am getting an error as follows:
java.io.IOException: Server returned HTTP response code: 500 for URL: "target url"
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1441)
at newExample.main(newExample.java:36)
My code is as follows:
try {
    URL url = new URL("target url");
    URLConnection connection = url.openConnection();
    if (connection instanceof HttpURLConnection)
        ((HttpURLConnection) connection).setRequestMethod("POST");
    connection.setRequestProperty("Content-Length", Integer.toString(requestXml.length()));
    connection.setRequestProperty("Content-Type", "text/xml; charset:ISO-8859-1;");
    connection.setDoOutput(true);
    connection.connect();
    // Create a writer to the url
    PrintWriter writer = new PrintWriter(new OutputStreamWriter(connection.getOutputStream()));
    // Get a reader from the url
    BufferedReader reader = new BufferedReader(new InputStreamReader(connection.getInputStream()));
    writer.println();
    writer.println(requestXml);
    writer.println();
    writer.flush();
    String line = reader.readLine();
    while (line != null) {
        System.out.println(line);
        line = reader.readLine();
    }
} catch (MalformedURLException e) {
    e.printStackTrace();
} catch (IOException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}
Please help with suitable examples or any other ways of doing this, and point out errors/mistakes in the above code or other possibilities.
My web service is built on the Spring framework.
The XML to send is in String format: requestXml.
The problem lies in the code below:
// Get a reader from the url
BufferedReader reader = new BufferedReader(new InputStreamReader(connection.getInputStream()));
The service might not always return a proper response: since you are calling the service over HTTP, it is possible that the server itself is not available or the service is down. So you should always check the response code before reading from the streams; based on the response code, you decide whether to read from the input stream for a success response or from the error stream for a failure or exception condition.
BufferedReader reader;
if (((HttpURLConnection) connection).getResponseCode() == 200) {
    reader = new BufferedReader(new InputStreamReader(connection.getInputStream()));
} else {
    // Read the body of the error response instead.
    reader = new BufferedReader(new InputStreamReader(((HttpURLConnection) connection).getErrorStream()));
}
This should resolve the problem.
The problem is inside your server code or the server configuration:
10.5.1 500 Internal Server Error
The server encountered an unexpected condition which prevented it from fulfilling the request.
(w3c.org/Protocols)
If the server is under your control (should be, if I look at the URL [before the edit]), then have a look at the server logs.
Well, you should close your streams and connections. Automatic resource management (try-with-resources) from Java 7, or http://projectlombok.org/, can help. However, this is probably not the main problem.
The main problem is that the server side fails: HTTP code 500 means a server-side error. I can't tell you the reason, because I don't know the server-side part. Maybe you should look at the server's log.
I think that your problem is that you are opening the input stream before you have written and closed the output stream. Certainly, the Sun Tutorial does it that way.
If you open the input stream too soon, it is possible that the output stream will be closed automatically, causing the server to see an empty POST request. This could be sufficient to cause it to get confused and send a 500 response.
Even if this is not what is causing the 500 errors, it is a good idea to do things in the order set out in the tutorial. For a start, if you accidentally read the response before you've finished writing the request, you are likely to (at least temporarily) lock up the connection. (In fact, it looks like your code is doing this because you are not closing the writer before reading from the reader.)
A separate issue is that your code does not close the connection in all circumstances, and is therefore liable to leak network connections. If it does this repeatedly, it is likely to lead to more IOExceptions.
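Concretely, the order the tutorial uses is: write the request body, close the writer, then check the response code and read (a sketch reusing the names from the question):
connection.setDoOutput(true);
PrintWriter writer = new PrintWriter(new OutputStreamWriter(connection.getOutputStream()));
writer.println(requestXml);
writer.flush();
writer.close(); // finish the request before reading the response

int code = ((HttpURLConnection) connection).getResponseCode();
InputStream in = code >= 400
        ? ((HttpURLConnection) connection).getErrorStream()
        : connection.getInputStream();
BufferedReader reader = new BufferedReader(new InputStreamReader(in));
String line;
while ((line = reader.readLine()) != null) {
    System.out.println(line);
}
reader.close();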
If you are calling an External Webservice and passing a JSON in the REST call, check the datatype of the values passed.
Example:
{ "originalReference":"8535064088443985",
"modificationAmount":
{ "amount":"16.0",
"currency":"AUD"
},
"reference":"20170928113425183949",
"merchantAccount":"MOM1"
}
In this example, the value of amount was sent as a string and the web service call failed with Server returned HTTP response code: 500.
But when "amount": 16.0 was sent, i.e. a number was passed, the call went through. Even when you have read the API documentation before calling such external APIs, small details like this can be missed.
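For illustration, with a JSON library such as org.json the difference is only the type passed to put (a sketch; the field name follows the example above, and whether a string is rejected depends on the particular API):
JSONObject bad = new JSONObject().put("amount", "16.0"); // serializes as "amount":"16.0" -> rejected with a 500 here
JSONObject good = new JSONObject().put("amount", 16.0);  // serializes as "amount":16.0 -> accepted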
I was trying to use the Apache Ant Get task to get a list of WSDLs generated by another team in our company. They have them hosted on a WebLogic 9.x server at http://....com:7925/services/. I am able to get to the page through a browser, but the Get task gives me a FileNotFoundException when trying to copy the page to a local file to parse. I was still able to get (using the Ant task) a URL that doesn't use a non-standard port, i.e. plain port 80 HTTP.
I looked through the Ant source code and narrowed the error down to the URLConnection. It seems as though the URLConnection doesn't recognize the data as HTTP traffic, since it isn't on the standard port, even though the protocol is specified as HTTP. I sniffed the traffic using Wireshark and the page loads correctly across the wire, but I still get the FileNotFoundException.
Here's an example where you will see the error (with the URL changed to protect the innocent). The error is thrown on connection.getInputStream();
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

public class TestGet {
    private static URL source;

    public static void main(String[] args) {
        doGet();
    }

    public static void doGet() {
        try {
            source = new URL("http", "test.com", 7925, "/services/index.html");
            URLConnection connection = source.openConnection();
            connection.connect();
            InputStream is = connection.getInputStream();
        } catch (Exception e) {
            System.err.println(e.toString());
        }
    }
}
The response to my HTTP request returned with a status code 404, which resulted in a FileNotFoundException when I called getInputStream(). I still wanted to read the response body, so I had to use a different method: HttpURLConnection#getErrorStream().
Here's a JavaDoc snippet of getErrorStream():
Returns the error stream if the connection failed but the server sent useful data nonetheless. The typical example is when an HTTP server responds with a 404, which will cause a FileNotFoundException to be thrown in connect, but the server sent an HTML help page with suggestions as to what to do.
Usage example:
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import org.apache.commons.io.IOUtils;

public static String httpGet(String url) {
    HttpURLConnection con = null;
    InputStream is = null;
    try {
        con = (HttpURLConnection) new URL(url).openConnection();
        con.connect();
        // 4xx: client error, 5xx: server error. See: http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html.
        boolean isError = con.getResponseCode() >= 400;
        // In HTTP error cases, HttpURLConnection only gives you the input stream via #getErrorStream().
        is = isError ? con.getErrorStream() : con.getInputStream();
        String contentEncoding = con.getContentEncoding() != null ? con.getContentEncoding() : "UTF-8";
        return IOUtils.toString(is, contentEncoding); // Apache Commons IO
    } catch (Exception e) {
        throw new IllegalStateException(e);
    } finally {
        // Note: Closing the InputStream manually may be unnecessary, depending on the implementation of
        // HttpURLConnection#disconnect(). Sun/Oracle's implementation does close it for you in said method.
        if (is != null) {
            try {
                is.close();
            } catch (IOException e) {
                throw new IllegalStateException(e);
            }
        }
        if (con != null) {
            con.disconnect();
        }
    }
}
This is an old thread, but I had a similar problem and found a solution that is not listed here.
I was receiving the page fine in the browser, but got a 404 when I tried to access it via the HttpURLConnection. The URL I was trying to access contained a port number. When I tried it without the port number I successfully got a dummy page through the HttpURLConnection. So it seemed the non-standard port was the problem.
I started thinking the access was restricted, and in a sense it was. My solution was that I needed to tell the server the User-Agent and I also specify the file types I expect. I am trying to read a .json file, so I thought the file type might be a necessary specification as well.
I added these lines and it finally worked:
httpConnection.setRequestProperty("User-Agent","Mozilla/5.0 ( compatible ) ");
httpConnection.setRequestProperty("Accept","*/*");
Check the response code being returned by the server.
I know this is an old thread but I found a solution not listed anywhere here.
I was trying to pull data in json format from a J2EE servlet on port 8080 but was receiving the file not found error. I was able to pull this same json data from a php server running on port 80.
It turns out that in the servlet, I needed to change doGet to doPost.
Hope this helps somebody.
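For reference, a minimal sketch of the servlet side (class name and payload are made up): a browser's address-bar fetch arrives in doGet, while a programmatic POST arrives in doPost, so a servlet that only overrides doGet answers POST requests with an error.
import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class ApiServlet extends HttpServlet {
    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        resp.setContentType("application/json");
        resp.getWriter().write("{\"ok\":true}"); // hypothetical payload
    }
}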
You could use OkHttp:
import java.io.IOException;
import okhttp3.OkHttpClient; // OkHttp 3.x coordinates; older versions use com.squareup.okhttp
import okhttp3.Request;
import okhttp3.Response;

OkHttpClient client = new OkHttpClient();

String run(String url) throws IOException {
    Request request = new Request.Builder()
            .url(url)
            .build();
    Response response = client.newCall(request).execute();
    return response.body().string();
}
I've tried that locally - using the code provided - and I don't get a FileNotFoundException except when the server returns a status 404 response.
Are you sure that you're connecting to the webserver you intend to be connecting to? Is there any chance you're connecting to a different webserver? (I note that the port number in the code doesn't match the port number in the link)
I have run into a similar issue, but the reason seems to be different; here is the exception trace:
java.io.FileNotFoundException: http://myhost1:8081/test/api?wait=1
at sun.reflect.GeneratedConstructorAccessor2.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1491)
at java.security.AccessController.doPrivileged(Native Method)
at sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1485)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1139)
at com.doitnext.loadmonger.HttpExecution.getBody(HttpExecution.java:85)
at com.doitnext.loadmonger.HttpExecution.execute(HttpExecution.java:214)
at com.doitnext.loadmonger.ClientWorker.run(ClientWorker.java:126)
at java.lang.Thread.run(Thread.java:680)
Caused by: java.io.FileNotFoundException: http://myhost1:8081/test/api?wait=1
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1434)
at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:379)
at com.doitnext.loadmonger.HttpExecution.execute(HttpExecution.java:166)
... 2 more
So it would seem that just getting the response code will cause the URL connection to call getInputStream.
I know this is an old thread, but I just noticed something on this one, so I thought I would put it out there.
Like Jessica mentioned, this exception is thrown when using a non-standard port.
It only seems to happen when using a DNS name, though. If I use the IP address, I can specify the port number and everything works fine.