testing url connection java html - java

When I test a list of URLs to see whether each connection is "alive" or "dead", I run into this site: www.abbapregnancy.org. The site loads but is blank, with no information. How can I write an if statement to detect sites like this? Feel free to comment or suggest ideas. My code currently looks like this:
try
{// Test URL Connection
URL url = new URL("http://www."+line);
URLConnection conn = url.openConnection();
conn.setDoOutput(true);
wr = new OutputStreamWriter(conn.getOutputStream());
wr.flush();
rd = new BufferedReader(new InputStreamReader(conn.getInputStream()));
while( !rd.ready() ) {}
if( rd.ready())
{
//write to output file
System.out.println("Good URL: " + line);
urlAlive ++;
//increment alive line to Alive url file
aliveUrl += (line + System.getProperty("line.separator"));
}
else // increment dead url
urlDead++;
}
catch (Exception e)
// increment dead url
{
urlDead++;
}

So you want to check more than just "alive" or "dead". If alive, you also want to know if the content is empty, right?
Since you seem to want to check websites, using http connections makes sense here. For the alive/dead condition, connection attempts will throw an exception if dead. If alive, you can use the status code to check if the request was really successful. Finally, you can use the content-length header to check if the site is up but actually returning empty content. For example like this:
public void checkURL(URL url) throws IOException {
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setRequestMethod("GET");
System.out.println(String.format("Fetching %s ...", url));
try {
int responseCode = conn.getResponseCode();
if (responseCode == 200) {
System.out.println(String.format("Site is up, content length = %s", conn.getHeaderField("content-length")));
} else {
System.out.println(String.format("Site is up, but returns non-ok status = %d", responseCode));
}
} catch (java.net.UnknownHostException e) {
System.out.println("Site is down");
}
}
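One caveat: the Content-Length header can be missing (for example with chunked transfer encoding), so it cannot always tell you whether the page is blank. A minimal sketch, assuming it is acceptable to read the body: the helper below reports whether a stream contains nothing but whitespace. BlankCheck and isBlank are illustrative names, not part of the answer above, and a page consisting only of empty markup would still count as non-blank here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class BlankCheck {
    /** Returns true if the stream holds no bytes other than whitespace. */
    public static boolean isBlank(InputStream in) {
        try {
            int b;
            while ((b = in.read()) != -1) {
                if (!Character.isWhitespace(b)) {
                    return false; // found real content, stop reading early
                }
            }
            return true; // reached end of stream without seeing content
        } catch (IOException e) {
            // Rethrown unchecked to keep the helper easy to call
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        InputStream empty = new ByteArrayInputStream(new byte[0]);
        InputStream page = new ByteArrayInputStream(
                "<html>hi</html>".getBytes(StandardCharsets.UTF_8));
        System.out.println(isBlank(empty)); // true
        System.out.println(isBlank(page));  // false
    }
}
```

In checkURL you could pass conn.getInputStream() to such a helper when the response code is 200.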

public static boolean IsReachable(Context context, String check_url) {
// First, check we have any sort of connectivity
final ConnectivityManager connMgr = (ConnectivityManager) context.getSystemService(Context.CONNECTIVITY_SERVICE);
final NetworkInfo netInfo = connMgr.getActiveNetworkInfo();
boolean isReachable = false;
if (netInfo != null && netInfo.isConnected()) {
// Some sort of connection is open, check if server is reachable
try {
URL url = new URL(check_url);
HttpURLConnection urlc = (HttpURLConnection) url.openConnection();
urlc.setRequestProperty("User-Agent", "Android Application");
urlc.setRequestProperty("Connection", "close");
urlc.setConnectTimeout(10 * 1000);
// A failed connect() throws IOException, which the outer catch treats as unreachable
urlc.connect();
isReachable = (urlc.getResponseCode() == 200);
} catch (IOException e) {
// Any I/O failure means the server is not reachable; isReachable stays false
}
}
return isReachable;
}

Related

Android HttpURLConnection setRequestMethod PUT

I'm trying to connect an Android app to a restful server with HttpURLConnection. The GET requests are successful but the PUT requests aren't. Every PUT request arrives at the server as a GET request.
@Override
protected Boolean doInBackground(Method... params) {
Boolean result = false;
try {
URL url = new URL("http://server.com/api");
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
switch (params[0]) {
case PUT:
connection.setDoOutput(true);
connection.setRequestMethod("PUT");
Log.d(TAG, "Method: " + connection.getRequestMethod()); // Correctly "Method: PUT"
OutputStreamWriter out = new OutputStreamWriter(
connection.getOutputStream());
out.write("Message");
out.close();
break;
case GET:
connection.setRequestMethod("GET");
connection.connect();
break;
default:
connection.setRequestMethod("GET");
connection.connect();
break;
}
if (connection.getResponseCode() == HttpURLConnection.HTTP_OK) {
InputStream in = connection.getInputStream();
JsonReader reader = new JsonReader(new InputStreamReader(in, "UTF-8"));
reader.beginObject();
while (reader.hasNext()) {
String name = reader.nextName();
if (name.equals("state")) {
result = true;
if (reader.nextInt() == 1) {
state = true;
Log.d(TAG, "state: 1");
} else {
state = false;
Log.d(TAG, "state: -1");
}
} else if (name.equals("method")) {
method = reader.nextString(); // Server respone is "Method: GET"
}
}
reader.endObject();
in.close();
} else {
Log.d(TAG, "Connection failed");
}
connection.disconnect();
} catch (Exception e) {
Log.e("Exception", e.toString());
e.printStackTrace();
}
return result;
}
The request method is correctly set to PUT before connection.connect();. What am I missing? I don't want to send data. The PUT request changes a counter, so no data is necessary.
The same function is implemented in JavaScript with jQuery for a web frontend, and it works:
$.ajax({
url: '/api',
method: 'PUT'
});
EDIT:
Maybe it's a problem with the server. Currently I'm using a PHP file:
<?php
echo(var_dump($_SERVER)); // ["REQUEST_METHOD"]=> string(3) "GET"
function get ($db) {
$state = $db->querySingle('SELECT state FROM states WHERE name="main"');
echo('{"state": ' . $state . ', "method": "get"}');
}
function put ($db) {
$state = $db->querySingle('SELECT state FROM states WHERE name="main"');
$db->exec('UPDATE states SET state=' . ($state+1)%2 . ' WHERE name="main"');
$state = $db->querySingle('SELECT state FROM states WHERE name="main"');
echo('{"state": ' . $state . ', "method": "put"}');
}
if ($db = new SQLite3('database.sqlite')) {
switch($_SERVER['REQUEST_METHOD']){
case 'GET':
get($db);
break;
case 'PUT':
put($db);
break;
}
$db->close();
} else {
}
?>
I tried my app with http://httpbin.org/put and it worked.
I found the problem. I have to append a trailing slash to the URL; otherwise the request is redirected and turned into a GET request. I don't fully understand why, but this solution works for me.
I have to change
URL url = new URL("http://server.com/api");
to
URL url = new URL("http://server.com/api/");
And now it works. Maybe someone can explain it to me. When I try to open http://server.com/api with curl I get a 303 redirect to http://server.com/api/.
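One way to see what is happening is to turn off automatic redirect handling and look at the raw response. HttpURLConnection follows redirects by default, and a client that follows a 303 See Other is expected to fetch the new location with GET, which would fit the method change described here. The sketch below is only an illustration: RedirectProbe, isRedirect and probe are hypothetical names, and the URL and method are supplied on the command line.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class RedirectProbe {
    /** True for the common 3xx codes that point to another location. */
    public static boolean isRedirect(int status) {
        return status == HttpURLConnection.HTTP_MOVED_PERM   // 301
            || status == HttpURLConnection.HTTP_MOVED_TEMP   // 302
            || status == HttpURLConnection.HTTP_SEE_OTHER    // 303
            || status == 307 || status == 308;
    }

    /** Sends one request without following redirects and reports what came back. */
    public static String probe(String address, String method) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
        conn.setInstanceFollowRedirects(false); // show the redirect instead of following it
        conn.setRequestMethod(method);
        try {
            int status = conn.getResponseCode();
            return isRedirect(status)
                ? status + " -> " + conn.getHeaderField("Location")
                : String.valueOf(status);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java RedirectProbe <url> <method>
        if (args.length == 2) {
            System.out.println(probe(args[0], args[1]));
        }
    }
}
```

If the output shows a 3xx status with a Location that differs only by the trailing slash, requesting that location directly avoids the redirect.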

HttpUrlConnection BadRequest - Statuscode 400

I have implemented a class using HttpUrlConnection to get some data from the google geocoding api. When I'm using this code on android, it works properly. But as soon as I am using this code in another "normal" java program, I am getting the status-code 400 (BadRequest) sometimes. Here is my code:
HttpURLConnection c = null;
StringBuilder sb = new StringBuilder();
try {
URL u = new URL(url);
c = (HttpURLConnection) u.openConnection();
c.setRequestMethod("GET");
c.setRequestProperty("Content-length", "0");
c.setUseCaches(false);
c.setAllowUserInteraction(false);
c.setConnectTimeout(timeout);
c.setReadTimeout(timeout);
c.connect();
int status = c.getResponseCode();
switch (status) {
case HttpURLConnection.HTTP_OK:
case HttpURLConnection.HTTP_CREATED:
BufferedReader br = new BufferedReader(new InputStreamReader(c.getInputStream()));
String line;
while ((line = br.readLine()) != null) {
sb.append(line + "\n");
}
br.close();
}
} catch (SocketTimeoutException ex){
// Handle ...
} catch (MalformedURLException ex) {
// Handle ...
} catch (IOException ex) {
// Handle ...
} finally {
if (c != null) {
try {
c.disconnect();
} catch (Exception ex) {
}
}
}
I have a reliable internet connection and also the URL I am using to receive the data works, whenever I try it with my web browser.
Thanks in advance!
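Bad Request is often caused by malformed URLs. As you mentioned, not every URL gives this error, only a few of them, so the cause is probably something in those particular URLs, such as spaces or special characters in a query parameter. Encode each parameter value before putting it into the URL (not the whole URL, since that would also escape the ://, ? and & that structure it):
String value = ...; // the raw query parameter value
value = URLEncoder.encode(value, "UTF-8");
// Append 'value' to the query string of your URL ...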
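URLEncoder is intended for individual query or form values, so a helper along these lines keeps the rest of the URL untouched. This is a minimal sketch for a single query parameter; QueryUrl and withParam are hypothetical names, and example.com and the "address" parameter are placeholders.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class QueryUrl {
    /** Appends one query parameter to baseUrl, encoding only the value. */
    public static String withParam(String baseUrl, String name, String value) {
        try {
            return baseUrl + "?" + name + "=" + URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be supported by every Java platform
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Placeholder host and parameter name, for illustration only
        System.out.println(
            withParam("https://example.com/geocode", "address", "10 Main St, Springfield"));
        // prints https://example.com/geocode?address=10+Main+St%2C+Springfield
    }
}
```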

How to close a persistent HTTP Connection without reading

I have a URLConnection which I want to cancel depending on the response code without reading any data. I closely followed the Android training to build the following minimal example, which floods the server with requests since no connection is ever released back to the handle pool for reuse:
private String downloadUrl(String myurl) throws IOException {
InputStream is = null;
try {
URL url = new URL(myurl);
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setReadTimeout(10000 /* milliseconds */);
conn.setConnectTimeout(15000 /* milliseconds */);
conn.setRequestMethod("GET");
conn.setDoInput(true);
// Starts the query
conn.connect();
int response = conn.getResponseCode();
Log.d(TAG, "The response code is: " + response);
is = conn.getInputStream();
// Do not read anything //String contentAsString = readIt(is, len);
String contentAsString = "notReadingAnything";
return contentAsString;
} finally {
if (is != null) {
is.close();
}
}
}
private class DownloadWebpageTask extends AsyncTask<String, Void, String> {
@Override
protected String doInBackground(String... urls) {
try {
String result = new String();
for (int i=0; i<100; i++) {
result += downloadUrl(urls[0]);
}
return result;
} catch (IOException e) {
return "Unable to retrieve web page. URL may be invalid.";
}
}
@Override
protected void onPostExecute(String result) {
Log.d(TAG, "The response is: " + result);
}
}
Despite the docs explicitly stating
But if the response body is long and you are not interested in the rest of it after seeing the beginning, you can close the InputStream
the server quickly reaches its maximum number of connections (50) and goes to 99% workload if I don't read the stream but works fine if I do read it. What is my mistake?
EDIT: Failed solution attempts so far (thanks to @Blackbelt for most of them):
calling conn.disconnect() in the finally block
calling conn.disconnect() instead of is.close() in the finally block
Setting System.setProperty("http.keepAlive", "false"); before the first call
Setting conn.setRequestProperty("Connection", "Close"); before connecting
Setting {"enable_keep_alive", "no"} on the backend server in use (Civetweb)
You should call disconnect() too. According to the documentation:
Disconnect. Once the response body has been read, the
HttpURLConnection should be closed by calling disconnect().
Disconnecting releases the resources held by a connection so they may
be closed or reused.
InputStream is = null;
HttpURLConnection conn = null;
try {
URL url = new URL(myurl);
conn = (HttpURLConnection) url.openConnection();
} finally {
if (is != null) {
is.close();
}
if (conn != null) {
conn.disconnect();
}
}
If you are still experiencing issues, it is also possible that the bug is on the backend side.

HTTP Request to Wikipedia gives no result

I am trying to fetch HTML programmatically. Fetching from "http://www.google.com", for example, works perfectly. When I try to fetch from "http://en.wikipedia.org/w/api.php", I do not get any results.
Does anyone have an idea?
Code:
String sURL="http://en.wikipedia.org/w/api.php?action=query&generator=categorymembers&gcmtitle=Category:Countries&prop=info&gcmlimit=500&format=json";
String sText=readfromURL(sURL);
public static String readfromURL(String sURL){
URL url = null;
try {
url = new URL(sURL);
} catch (MalformedURLException e1) {
e1.printStackTrace();
}
URLConnection urlconnect = null;
try {
urlconnect = url.openConnection();
urlconnect.setRequestProperty("User-Agent","Mozilla/5.0 (Windows NT 5.1; rv:19.0) Gecko/20100101 Firefox/19.0");
} catch (IOException e) {
e.printStackTrace();
}
BufferedReader in = null;
try {
in = new BufferedReader(new InputStreamReader(urlconnect.getInputStream()));
} catch (IOException e) {
e.printStackTrace();
}
String inputLine;
String sEntireContent="";
try {
while ((inputLine = in.readLine()) != null) {
System.out.println(inputLine);
sEntireContent=sEntireContent+inputLine;
}
} catch (IOException e) {
e.printStackTrace();
}
try {
in.close();
} catch (IOException e) {
e.printStackTrace();
}
return sEntireContent;
}
It looks like you are hitting the request limit. Try checking the response code.
From the documentation (https://www.mediawiki.org/wiki/API:Etiquette):
If you make your requests in series rather than in parallel (i.e. wait
for the one request to finish before sending a new request, such that
you're never making more than one request at the same time), then you
should definitely be fine.
Make sure you are not sending several requests at the same time.
Update
I verified your code locally, and you are correct: it does not work. The fix is to use https; then it works:
https://en.wikipedia.org/w/api.php?action=query&generator=categorymembers&gcmtitle=Category:Countries&prop=info&gcmlimit=500&format=json
result:
{"batchcomplete":"","query":{"pages":{"5165":{"pageid":5165,"ns":0,"title":"Country","contentmodel":"wikitext","pagelanguage":"en","touched":"2015-10-20T20:09:05Z","lastrevid":686706429,"length":12695},"5112305":{"pageid":5112305,"ns":14,"title":"Category:Countries by continent","contentmodel":"wikitext","pagelanguage":"en","touched":"2015-10-18T17:31:54Z","lastrevid":681415612,"length":133},"14353213":{"pageid":14353213,"ns":14,"title":"Category:Countries by form of government","contentmodel":"wikitext","pagelanguage":"en","touched":"2015-08-13T23:33:29Z","lastrevid":675984011,"length":261},"5112467":{"pageid":5112467,"ns":14,"title":"Category:Countries by international organization","contentmodel":"wikitext","pagelanguage":"en","touched":"2015-10-18T05:11:12Z","lastrevid":686245148,"length":123},"4696391":{"pageid":4696391,"ns":14,"title":"Category:Countries by language","contentmodel":"wikitext","pagelanguage":"en","touched":"2015-10-20T01:17:18Z","lastrevid":675966601,"length":333},"5112374":{"pageid":5112374,"ns":14,"title":"Category:Countries by status","contentmodel":"wikitext","pagelanguage":"en","touched":"2015-08-13T21:05:47Z","lastrevid":675966630,"length":30},"708617":{"pageid":708617,"ns":14,"title":"Category:Lists of countries","contentmodel":"wikitext","pagelanguage":"en","touched":"2015-10-18T05:08:45Z","lastrevid":681553760,"length":256},"46624537":{"pageid":46624537,"ns":14,"title":"Category:Caspian littoral states","contentmodel":"wikitext","pagelanguage":"en","touched":"2015-09-23T08:40:34Z","lastrevid":663549987,"length":50},"18066512":{"pageid":18066512,"ns":14,"title":"Category:City-states","contentmodel":"wikitext","pagelanguage":"en","touched":"2015-09-29T20:14:14Z","lastrevid":679367764,"length":145},"2019528":{"pageid":2019528,"ns":14,"title":"Category:Country 
classifications","contentmodel":"wikitext","pagelanguage":"en","touched":"2015-09-25T09:09:13Z","lastrevid":675966465,"length":182},"935240":{"pageid":935240,"ns":14,"title":"Category:Country codes","contentmodel":"wikitext","pagelanguage":"en","touched":"2015-10-17T06:05:53Z","lastrevid":546489724,"length":222},"36819536":{"pageid":36819536,"ns":14,"title":"Category:Countries in fiction","contentmodel":"wikitext","pagelanguage":"en","touched":"2015-10-03T06:09:16Z","lastrevid":674147667,"length":169},"699787":{"pageid":699787,"ns":14,"title":"Category:Fictional countries","contentmodel":"wikitext","pagelanguage":"en","touched":"2015-10-17T18:43:25Z","lastrevid":610289877,"length":356},"804303":{"pageid":804303,"ns":14,"title":"Category:Former countries","contentmodel":"wikitext","pagelanguage":"en","touched":"2015-09-21T09:58:52Z","lastrevid":668632882,"length":403},"7213567":{"pageid":7213567,"ns":14,"title":"Category:Island countries","contentmodel":"wikitext","pagelanguage":"en","touched":"2015-10-22T22:10:37Z","lastrevid":648502876,"length":157},"3046541":{"pageid":3046541,"ns":14,"title":"Category:Landlocked countries","contentmodel":"wikitext","pagelanguage":"en","touched":"2015-10-04T00:45:24Z","lastrevid":648502892,"length":54},"743058":{"pageid":743058,"ns":14,"title":"Category:Middle Eastern countries","contentmodel":"wikitext","pagelanguage":"en","touched":"2015-10-12T14:41:59Z","lastrevid":677900732,"length":495},"41711462":{"pageid":41711462,"ns":14,"title":"Category:Mongol states","contentmodel":"wikitext","pagelanguage":"en","touched":"2015-10-23T07:36:21Z","lastrevid":687093637,"length":121},"30645082":{"pageid":30645082,"ns":14,"title":"Category:Country names","contentmodel":"wikitext","pagelanguage":"en","touched":"2015-03-07T06:33:19Z","lastrevid":561256656,"length":94},"21218559":{"pageid":21218559,"ns":14,"title":"Category:Outlines of 
countries","contentmodel":"wikitext","pagelanguage":"en","touched":"2015-10-07T18:04:29Z","lastrevid":645312408,"length":248},"37943702":{"pageid":37943702,"ns":14,"title":"Category:Proposed countries","contentmodel":"wikitext","pagelanguage":"en","touched":"2015-10-20T02:30:25Z","lastrevid":668630396,"length":130},"15086044":{"pageid":15086044,"ns":14,"title":"Category:Turkic states","contentmodel":"wikitext","pagelanguage":"en","touched":"2015-10-20T06:23:35Z","lastrevid":677424552,"length":114},"32809189":{"pageid":32809189,"ns":14,"title":"Category:Works about countries","contentmodel":"wikitext","pagelanguage":"en","touched":"2015-10-17T08:45:32Z","lastrevid":620016516,"length":153},"27539189":{"pageid":27539189,"ns":14,"title":"Category:Wikipedia books on countries","contentmodel":"wikitext","pagelanguage":"en","touched":"2015-04-11T05:12:25Z","lastrevid":546775798,"length":203},"35317198":{"pageid":35317198,"ns":14,"title":"Category:Wikipedia categories named after countries","contentmodel":"wikitext","pagelanguage":"en","touched":"2015-10-17T18:35:14Z","lastrevid":641689352,"length":202}}}}
The reason you didn't receive a response is an HTTP 3xx redirect: Wikipedia redirects your HTTP request. Please try the source code below to fetch the response from the redirected URL. For reference, see "How to send HTTP request GET/POST in Java".
public static String readfromURLwithRedirect(String url) {
String response = "";
try {
URL obj = new URL(url);
HttpURLConnection conn = (HttpURLConnection) obj.openConnection();
conn.setReadTimeout(5000);
conn.addRequestProperty("Accept-Language", "en-US,en;q=0.8");
conn.addRequestProperty("User-Agent", "Mozilla");
conn.addRequestProperty("Referer", "google.com");
System.out.println("Request URL ... " + url);
boolean redirect = false;
// normally, 3xx is redirect
int status = conn.getResponseCode();
if (status != HttpURLConnection.HTTP_OK) {
if (status == HttpURLConnection.HTTP_MOVED_TEMP
|| status == HttpURLConnection.HTTP_MOVED_PERM
|| status == HttpURLConnection.HTTP_SEE_OTHER) {
redirect = true;
}
}
System.out.println("Response Code ... " + status);
if (redirect) {
// get redirect url from "location" header field
String newUrl = conn.getHeaderField("Location");
// get the cookie if need, for login
String cookies = conn.getHeaderField("Set-Cookie");
// open the new connection again
conn = (HttpURLConnection) new URL(newUrl).openConnection();
conn.setRequestProperty("Cookie", cookies);
conn.addRequestProperty("Accept-Language", "en-US,en;q=0.8");
conn.addRequestProperty("User-Agent", "Mozilla");
conn.addRequestProperty("Referer", "google.com");
System.out.println("Redirect to URL : " + newUrl);
}
BufferedReader in = new BufferedReader(
new InputStreamReader(conn.getInputStream()));
String inputLine;
StringBuffer responseBuffer = new StringBuffer();
while ((inputLine = in.readLine()) != null) {
responseBuffer.append(inputLine);
}
in.close();
System.out.println("URL Content... \n" + responseBuffer.toString());
response = responseBuffer.toString();
System.out.println("Done");
} catch (Exception e) {
e.printStackTrace();
}
return response;
}
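Two hedged remarks on this approach. As far as I know, HttpURLConnection does not automatically follow a redirect that switches protocol (http to https), which would fit the https fix mentioned earlier. Also, the Location header is not guaranteed to be an absolute URL, and new URL(newUrl) fails on a relative one. A small sketch that resolves the header against the request URL with java.net.URI; Redirects and resolve are hypothetical names:

```java
import java.net.URI;

public class Redirects {
    /** Resolves a Location header value against the URL that was requested. */
    public static String resolve(String requestUrl, String location) {
        return URI.create(requestUrl).resolve(location).toString();
    }

    public static void main(String[] args) {
        String base = "http://en.wikipedia.org/w/api.php";
        // Absolute Location: returned unchanged
        System.out.println(resolve(base, "https://en.wikipedia.org/w/api.php"));
        // Relative Location: resolved against the request URL
        System.out.println(resolve(base, "/w/index.php"));
    }
}
```

In the code above, the result of resolve(url, newUrl) could be passed to new URL(...) instead of newUrl.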

URLConnection is not allowing me to access data on Http errors (404,500,etc)

I am making a crawler and need to get the data from the stream regardless of whether the status is 200 or not. curl does this, as does any standard browser.
The following code does not return the content of the response, even though there is some; instead, an exception is thrown for the HTTP error status code. I want the output regardless. Is there a way? I would prefer to keep using this library because it supports persistent connections, which is perfect for the type of crawling I am doing.
package test;
import java.net.*;
import java.io.*;
public class Test {
public static void main(String[] args) {
try {
URL url = new URL("http://github.com/XXXXXXXXXXXXXX");
URLConnection connection = url.openConnection();
DataInputStream inStream = new DataInputStream(connection.getInputStream());
String inputLine;
while ((inputLine = inStream.readLine()) != null) {
System.out.println(inputLine);
}
inStream.close();
} catch (MalformedURLException me) {
System.err.println("MalformedURLException: " + me);
} catch (IOException ioe) {
System.err.println("IOException: " + ioe);
}
}
}
Worked, thanks: Here is what I came up with - just as a rough proof of concept:
import java.net.*;
import java.io.*;
public class Test {
public static void main(String[] args) {
//InputStream error = ((HttpURLConnection) connection).getErrorStream();
URL url = null;
URLConnection connection = null;
String inputLine = "";
try {
url = new URL("http://verelo.com/asdfrwdfgdg");
connection = url.openConnection();
DataInputStream inStream = new DataInputStream(connection.getInputStream());
while ((inputLine = inStream.readLine()) != null) {
System.out.println(inputLine);
}
inStream.close();
} catch (MalformedURLException me) {
System.err.println("MalformedURLException: " + me);
} catch (IOException ioe) {
System.err.println("IOException: " + ioe);
InputStream error = ((HttpURLConnection) connection).getErrorStream();
try {
int data = error.read();
while (data != -1) {
//do something with data...
//System.out.println(data);
inputLine = inputLine + (char)data;
data = error.read();
//inputLine = inputLine + (char)data;
}
error.close();
} catch (Exception ex) {
try {
if (error != null) {
error.close();
}
} catch (Exception e) {
}
}
}
System.out.println(inputLine);
}
}
Simple:
URLConnection connection = url.openConnection();
InputStream is = connection.getInputStream();
if (connection instanceof HttpURLConnection) {
HttpURLConnection httpConn = (HttpURLConnection) connection;
int statusCode = httpConn.getResponseCode();
if (statusCode != 200 /* or statusCode >= 200 && statusCode < 300 */) {
is = httpConn.getErrorStream();
}
}
You can refer to Javadoc for explanation. The best way I would handle this is as follows:
URLConnection connection = url.openConnection();
InputStream is = null;
try {
is = connection.getInputStream();
} catch (IOException ioe) {
if (connection instanceof HttpURLConnection) {
HttpURLConnection httpConn = (HttpURLConnection) connection;
int statusCode = httpConn.getResponseCode();
if (statusCode != 200) {
is = httpConn.getErrorStream();
}
}
}
You need to do the following after calling openConnection.
Cast the URLConnection to HttpURLConnection
Call getResponseCode
If the response is a success, use getInputStream, otherwise use getErrorStream
(The test for success should be 200 <= code < 300 because there are valid HTTP success codes other than 200.)
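The steps above can be sketched roughly as follows. BodyStream, isSuccess and open are hypothetical names chosen for illustration, and the caller is assumed to have already cast the result of openConnection() to HttpURLConnection.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;

public class BodyStream {
    /** HTTP success is the whole 2xx range, not only 200. */
    public static boolean isSuccess(int code) {
        return code >= 200 && code < 300;
    }

    /**
     * Returns the stream carrying the response body, whatever the status.
     * May return null: getErrorStream() yields null when the server
     * sent no useful data with the error.
     */
    public static InputStream open(HttpURLConnection conn) throws IOException {
        int code = conn.getResponseCode();
        return isSuccess(code) ? conn.getInputStream() : conn.getErrorStream();
    }
}
```

Callers should check the returned stream for null before reading from it.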
I am making a crawler, and need to get the data from the stream regardless if it is a 200 or not.
Just be aware that if the code is a 4xx or 5xx, then the "data" is likely to be an error page of some kind.
The final point that should be made is that you should always respect the "robots.txt" file ... and read the Terms of Service before crawling / scraping the content of a site whose owners might care. Simply blatting off GET requests is likely to annoy site owners ... unless you've already come to some sort of "arrangement" with them.
