Java - Quickest way to check if URL exists

Hi, I am writing a program that goes through many different URLs and just checks if they exist or not. I am basically checking whether the response code returned is 404 or not. However, as I am checking over 1000 URLs, I want to be able to do this very quickly. The following is my code; I was wondering how I can modify it to work quickly (if possible):
final URL url = new URL("http://www.example.com");
HttpURLConnection huc = (HttpURLConnection) url.openConnection();
int responseCode = huc.getResponseCode();
if (responseCode != 404) {
    System.out.println("GOOD");
} else {
    System.out.println("BAD");
}
Would it be quicker to use JSoup?
I am aware some sites return code 200 and have their own error page; however, I know the links that I am checking don't do this, so this is not needed.

Try sending a "HEAD" request instead of a GET request. That should be faster since the response body is not downloaded.
huc.setRequestMethod("HEAD");
Also, instead of checking that the response status is not 404, check that it is 200. That is, check for the positive case instead of the negative. 404, 403, 402 and the other 40x statuses are all nearly equivalent to an invalid, non-existent URL.
You may make use of multi-threading to make it even faster.
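For example, a rough sketch combining both ideas (HEAD requests run on a thread pool); the URL list, pool size, and 5-second timeouts here are just placeholder values:
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class UrlChecker {
    public static void main(String[] args) {
        List<String> urls = Arrays.asList("http://www.example.com", "http://www.example.org"); // placeholder URLs
        ExecutorService pool = Executors.newFixedThreadPool(20); // pool size is a guess; tune it
        for (String u : urls) {
            pool.submit(() -> {
                try {
                    HttpURLConnection huc = (HttpURLConnection) new URL(u).openConnection();
                    huc.setRequestMethod("HEAD");  // don't download the body
                    huc.setConnectTimeout(5000);   // fail fast on dead hosts
                    huc.setReadTimeout(5000);
                    int code = huc.getResponseCode();
                    System.out.println(u + " -> " + (code == 200 ? "GOOD" : "BAD"));
                } catch (Exception e) {
                    System.out.println(u + " -> BAD (" + e.getMessage() + ")");
                }
            });
        }
        pool.shutdown();
    }
}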

Try asking the DNS server first:
import java.net.InetAddress;
import java.net.UnknownHostException;

class DNSLookup
{
    public static void main(String args[])
    {
        String host = "stackoverflow.com";
        try
        {
            InetAddress inetAddress = InetAddress.getByName(host);
            // show the Internet Address as name/address
            System.out.println(inetAddress.getHostName() + " " + inetAddress.getHostAddress());
        }
        catch (UnknownHostException exception)
        {
            // no DNS record for the host, so the URL cannot exist
            System.err.println("ERROR: No DNS record for '" + host + "'");
            exception.printStackTrace();
        }
    }
}

It seems you can set the timeout property; make sure it is acceptable. And if you have many URLs to test, check them in parallel; it will be much faster. Hope this is helpful.
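For instance, a minimal sketch of setting both timeouts on the connection from the question (the 5-second values are arbitrary and should be tuned):
HttpURLConnection huc = (HttpURLConnection) url.openConnection();
huc.setConnectTimeout(5000); // give up if the TCP connection cannot be established within 5 s
huc.setReadTimeout(5000);    // give up if the server stops responding for 5 s
huc.setRequestMethod("HEAD");
int responseCode = huc.getResponseCode();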

Related

How to check if the URL is there or not in Java?

I've run into an issue where I need help checking a URL: when I search on a particular code, it won't show the URL that I have listed. Is there a way to make it work with my code? I tried creating the method below, generateLink. Thanks for the help.
First, since generateLink will either return a valid URL or null, you need to change this:
jgen.writeStringField("pay_grade_description_link", generateLink(XXX_URL + value.jobClassCd) + ".pdf");
to this:
jgen.writeStringField("pay_grade_description_link", generateLink(value.jobClassCd));
If you concatenate ".pdf" to it, null return values will be meaningless, since null + ".pdf" results in the eight-character string "null.pdf".
Second, you can check the response code of an HttpURLConnection to test a URL’s validity. (In theory, you should be able to use the "OPTIONS" HTTP method to test the URL, but not all servers support it.)
private String generateLink(String jobClassCd) {
    String url = XXX_URL + jobClassCd + ".pdf";
    try {
        HttpURLConnection connection =
                (HttpURLConnection) new URL(url).openConnection();
        if (connection.getResponseCode() < 400) {
            return url;
        }
    } catch (IOException e) {
        Logger.getLogger(JobSerializer.class.getName()).log(Level.FINE,
                "URL \"" + url + "\" is not reachable.", e);
    }
    return null;
}

Looking for an alternate way to validate URLs in Java

I'm using HttpURLConnection to validate URLs coming out of a database. Sometimes with certain URLs I will get an exception; I assume they are timing out, but they are in fact reachable (no 400-range error).
Increasing the timeout doesn't seem to matter; I still get an exception. Is there a second check I could do in the catch block to verify whether the URL really is bad? The relevant code is below. It works with 99.9% of URLs; it's that last 0.1%.
try {
    HttpURLConnection connection = (HttpURLConnection) new URL(url).openConnection();
    connection.setConnectTimeout(timeout);
    connection.setReadTimeout(timeout);
    connection.setRequestMethod("GET");
    connection.setRequestProperty("User-Agent",
            "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.0.13) Gecko/2009073021 Firefox/3.0.13");
    connection.connect();
    int responseCode = connection.getResponseCode();
    if (responseCode >= 401) {
        String prcMessage = "ERROR: URL " + url + " not found, response code was " + responseCode + "\r";
        System.out.println(prcMessage);
        VerifyUrl.writeToFile(prcMessage);
        return (false);
    }
} catch (IOException exception) {
    String errorMessage = ("ERROR: URL " + url + " did not load in the given time of " + timeout + " milliseconds.");
    System.out.println(errorMessage);
    VerifyUrl.writeToFile(errorMessage);
    return false;
}
It depends on what you want to check, but I guess Validating URL in Java has you covered.
You have two possibilities:
Check the syntax ("Is this URL a real URL or just made up?")
There is a large amount of text describing how to do this; basically, search for RFC 3986. I guess someone has implemented a check like this already.
Check the semantics ("Is the URL available?")
There is not really a faster way to do that, though there are different tools available for sending an HTTP request in Java. You may send a HEAD request instead of GET, as HEAD omits the HTTP body and may result in faster requests and fewer timeouts.
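For the syntax check, one possible sketch (not a full RFC 3986 validator) just leans on the JDK's own parsing; a MalformedURLException or URISyntaxException signals a syntactically bad URL:
import java.net.MalformedURLException;
import java.net.URISyntaxException;
import java.net.URL;

public static boolean isSyntacticallyValid(String url) {
    try {
        new URL(url).toURI(); // URL checks the protocol; toURI() applies the stricter RFC parsing
        return true;
    } catch (MalformedURLException | URISyntaxException e) {
        return false;
    }
}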

Jsoup malformed url

I'm having trouble with connecting to a url with JSoup.
The URL I am trying to test is www.xbox.com/en-US/security, which is a 302 (I think) redirect to http://www.xbox.com/en-US/Live/Account-Security. I have set up Jsoup to follow the redirect and get the new URL using .header("location"). The URL returned is /en-US/Live/Account-Security. I'm not sure how to handle it; my code is below:
while (i < retries) {
    try {
        response = Jsoup.connect(checkUrl)
                .userAgent("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.21 (KHTML, like Gecko) Chrome/19.0.1042.0 Safari/535.21")
                .followRedirects(false)
                .timeout(10000)
                .execute();
        success = true;
        break;
    } catch (SocketTimeoutException ex) {
        timeout = true;
    } catch (MalformedURLException ep) {
        malformedUrl = true;
    } catch (IOException e) {
        statusCode = 404;
    }
}
private void getStatus() {
    if (success) {
        statusCode = response.statusCode();
        success = false;
    }
    if (statusCode >= 300 && statusCode <= 399) {
        //System.out.println("redirect: " + statusCode + " " + checkUrl);
        checkUrl = response.header("location");
        //System.out.println(checkUrl);
        connect();
        getStatus();
    }
}
Has anyone got suggestions on how to handle this? Or should I do a check on my checkUrl = response.header("location"); to see if it is a valid url and if not don't test it?
First things first: if you try to access "www.xbox.com/en-US/security", it'll throw a MalformedURLException (there is no protocol on the URL) and thus not redirect you to where you want.
Then there's the issue that I'd use only the boolean variable success and set it to false if any exception is caught. Then again, I don't know if you're using the timeout or malformedUrl variables for anything.
After that, I'd say the line right after the IOException catch is never useful. Again, I couldn't tell, since I can't see the full code.
Now, to your question: the returned string is a path within the domain of the first URL you provided. It'll go simply like this:
// Assuming you won't ever change it, make it a constant.
final String URL = "http://www.xbox.com/en-US/security";
// Scheme and host only; a redirect starting with "/" is relative to the site root.
final String HOST = "http://www.xbox.com";
// Whatever piece of processing here.
// Some tests just to make sure you'll get what you're fetching:
String newUrl = "";
if (checkUrl.startsWith("/"))
    newUrl = HOST + checkUrl;
else if (checkUrl.startsWith("http://"))
    newUrl = checkUrl;
else if (checkUrl.startsWith("www"))
    newUrl = "http://" + checkUrl;
This piece of code will basically make sure you can navigate through URLs without getting a MalformedURLException. I'd suggest putting a manageUrl() method somewhere and testing whether the fetched URL is within the domain you're searching, or else you might end up on e-commerce or advertising websites.
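A hypothetical manageUrl()-style check along those lines could resolve the Location header against the original URL and compare hosts (the method name and variables below are made up for illustration):
import java.net.URI;
import java.net.URISyntaxException;

// Returns true if the redirect target stays on the same host as the original URL.
private static boolean staysOnSameHost(String originalUrl, String redirectUrl) {
    try {
        URI base = new URI(originalUrl);
        URI target = base.resolve(redirectUrl); // also turns "/en-US/..." into an absolute URL
        return base.getHost() != null && base.getHost().equalsIgnoreCase(target.getHost());
    } catch (URISyntaxException | IllegalArgumentException e) {
        return false; // malformed input, don't follow it
    }
}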
Hope it helps =)

IP fallback in android

I'm accessing a server for web service calls. When I'm developing on the same network as the server, I can access the web service by its internal IP address but not its external IP address. However, if I'm not on the network, I can only access it by its external IP address. What's the best way to try one of the IP addresses and then fall back on the other?
Here's a sample of my code for accessing only one or the other:
protected String retrieve() {
    Log.v(TAG, "retrieving data from url: " + getURL());
    HttpPost request = new HttpPost(getURL());
    try {
        StringEntity body = new StringEntity(getBody());
        body.setContentType(APPLICATION_XML_CONTENT_TYPE);
        request.setEntity(body);
        HttpConnectionParams.setConnectionTimeout(client.getParams(), CONNECTION_TIMEOUT);
        HttpConnectionParams.setSoTimeout(client.getParams(), SOCKET_TIMEOUT);
        HttpResponse response = client.execute(request);
        final int statusCode = response.getStatusLine().getStatusCode();
        if (statusCode != HttpStatus.SC_OK) {
            Log.e(TAG, "the URL " + getURL() + " returned the status code: " + statusCode + ".");
            return null;
        }
        HttpEntity getResponseEntity = response.getEntity();
        if (getResponseEntity != null) {
            return EntityUtils.toString(getResponseEntity);
        }
    } catch (IOException e) {
        Log.e(TAG, "error retrieving data.", e);
        request.abort();
    }
    return null;
}

/**
 * @return the URL which should be called.
 */
protected String getURL() {
    return INTERNAL_SERVER_URL + WEB_APP_PATH;
}
Look at your Android device's own IP address; you can get it as described here.
Then you can decide:
if you are in the subnet of your office (e.g. 192.168.0.0/16), use the internal address
if you are in another subnet, use the external address
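A rough sketch of that decision, assuming the office subnet is 192.168.0.0/16 and using placeholder addresses for both servers:
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.Collections;

// Picks the internal or external base URL depending on the device's own address.
static String chooseServerUrl() throws SocketException {
    for (NetworkInterface nif : Collections.list(NetworkInterface.getNetworkInterfaces())) {
        for (InetAddress addr : Collections.list(nif.getInetAddresses())) {
            if (addr.getHostAddress().startsWith("192.168.")) {
                return "http://192.168.0.10/webservice"; // internal address (placeholder)
            }
        }
    }
    return "http://203.0.113.10/webservice"; // external address (placeholder)
}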
Building on the very good comment by harism, I would simply use a static boolean to choose the IP and thus avoid pinging the wrong IP every time:
public static final boolean IS_DEBUG = true; // or false
// all your code here
if (IS_DEBUG)
    ip = "xxx.xxx.xxx.xxx"; // internal address
else
    ip = "yyy.yyy.yyy.yyy"; // external address
This isn't exactly something you can easily fix in software. The right answer I think is fixing the filters/configuration that route traffic to your internal web server or by properly configuring DNS to return the proper IP depending on where you are (inside or outside the network). More information can be found here:
Accessing internal network resource using external IP address
http://www.astaro.org/astaro-gateway-products/network-security-firewall-nat-qos-ips-more/6704-cant-acces-internal-webserver-via-external-ip.html
and by Googling something like "external IP doesn't work on internal network"
You could put retry code in the catch clause for IOException
protected String retrieve(String url) {
    Log.v(TAG, "retrieving data from url: " + url);
    HttpPost request = new HttpPost(url);
    try {
        StringEntity body = new StringEntity(getBody());
        body.setContentType(APPLICATION_XML_CONTENT_TYPE);
        request.setEntity(body);
        HttpConnectionParams.setConnectionTimeout(client.getParams(), CONNECTION_TIMEOUT);
        HttpConnectionParams.setSoTimeout(client.getParams(), SOCKET_TIMEOUT);
        HttpResponse response = client.execute(request);
        final int statusCode = response.getStatusLine().getStatusCode();
        if (statusCode != HttpStatus.SC_OK) {
            Log.e(TAG, "the URL " + url + " returned the status code: " + statusCode + ".");
            return null;
        }
        HttpEntity getResponseEntity = response.getEntity();
        if (getResponseEntity != null) {
            return EntityUtils.toString(getResponseEntity);
        }
    } catch (IOException e) {
        if (url.equals(EXTERNAL_URL)) {
            // the external address failed, so fall back to the internal one
            return retrieve(INTERNAL_URL);
        }
        Log.e(TAG, "error retrieving data.", e);
        request.abort();
    }
    return null;
}
Note: Like most people have said, this probably is not a great solution for a production release, but for testing it would probably work just fine.
You could change your retrieve() call to take the host as a parameter. On startup, try to ping each possible host. Do a simple retrieve call to something that returns very fast (like maybe a test page). Work through each possible host you want to try. Once you found one that works, save that host and use it for all future calls.
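A sketch of that startup probe might look like this; the candidate list, test path, and 2-second timeouts are assumptions:
import java.net.HttpURLConnection;
import java.net.URL;

// Tries each candidate base URL once and returns the first one that answers with 200.
static String findWorkingHost(String[] candidates) {
    for (String base : candidates) {
        try {
            HttpURLConnection c = (HttpURLConnection) new URL(base + "/test.html").openConnection(); // hypothetical test page
            c.setConnectTimeout(2000);
            c.setReadTimeout(2000);
            if (c.getResponseCode() == 200) {
                return base; // remember this host for all future calls
            }
        } catch (Exception ignored) {
            // unreachable, try the next candidate
        }
    }
    return null; // nothing reachable
}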
First, you should be able to handle this case at runtime; I'd very strongly recommend that over maintaining different builds. For example: try the external address, and if that fails, the internal one.
Some heuristics, meanwhile:
An internal IP implies the network is the same. So you can check it: if it's the same, try the internal address first; otherwise the external one has precedence.
Later, save the mapping of the local IP address to the successfully connected one and look it up to alter the precedence.
The resolution itself may be carried out by requesting the root '/' of the server with a timeout (you can use two different threads to carry out the task simultaneously, if you feel like it).
Moreover, if you have access to the router/firewall, it can be made to recognize the external addresses and handle them properly. So you can end up with an external address that works properly.
I would put this kind of environment-specific information in a properties file (or some kind of configuration file anyway). All you need is a file with one line (obviously you would change the IP address to what you need):
serverUrl=192.168.1.1
Java already has a built-in feature for reading and writing these kinds of files (see the link above). You could also keep database connection information etc. in this file. Anything environment-specific really.
In your code it looks like you are using constants to hold the server URL. I would not suggest that. What happens when the server URL changes? You'd need to modify and re-compile your code. With a configuration file, however, no code changes would be necessary.
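A small sketch of loading that value with java.util.Properties; the file name server.properties is an assumption:
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Properties;

static String loadServerUrl() throws IOException {
    Properties props = new Properties();
    try (FileInputStream in = new FileInputStream("server.properties")) { // assumed file name
        props.load(in);
    }
    return props.getProperty("serverUrl"); // "192.168.1.1" from the example above
}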
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpException;
import org.apache.commons.httpclient.HttpMethod;
import org.apache.commons.httpclient.methods.GetMethod;
and:
HttpClient client = new HttpClient();
HttpMethod method = new GetMethod("http://www.myinnersite.com");
int responseCode = client.executeMethod(method);
if (responseCode != 200) {
    method = new GetMethod("http://www.myoutersite.com");
}
etc...

Preferred Java way to ping an HTTP URL for availability

I need a monitor class that regularly checks whether a given HTTP URL is available. I can take care of the "regularly" part using the Spring TaskExecutor abstraction, so that's not the topic here. The question is: What is the preferred way to ping a URL in java?
Here is my current code as a starting point:
try {
    final URLConnection connection = new URL(url).openConnection();
    connection.connect();
    LOG.info("Service " + url + " available, yeah!");
    available = true;
} catch (final MalformedURLException e) {
    throw new IllegalStateException("Bad URL: " + url, e);
} catch (final IOException e) {
    LOG.info("Service " + url + " unavailable, oh no!", e);
    available = false;
}
Is this any good at all (will it do what I want)?
Do I have to somehow close the connection?
I suppose this is a GET request. Is there a way to send HEAD instead?
Is this any good at all (will it do what I want?)
You can do so. Another feasible way is using java.net.Socket.
public static boolean pingHost(String host, int port, int timeout) {
    try (Socket socket = new Socket()) {
        socket.connect(new InetSocketAddress(host, port), timeout);
        return true;
    } catch (IOException e) {
        return false; // Either timeout or unreachable or failed DNS lookup.
    }
}
There's also the InetAddress#isReachable():
boolean reachable = InetAddress.getByName(hostname).isReachable(timeout); // timeout in milliseconds
This however doesn't explicitly test port 80. You risk getting false negatives due to a firewall blocking other ports.
Do I have to somehow close the connection?
No, you don't explicitly need to. It's handled and pooled under the hood.
I suppose this is a GET request. Is there a way to send HEAD instead?
You can cast the obtained URLConnection to HttpURLConnection and then use setRequestMethod() to set the request method. However, you need to take into account that some poor webapps or homegrown servers may return an HTTP 405 error for a HEAD (i.e. not available, not implemented, not allowed) while a GET works perfectly fine. Using GET is more reliable in case you intend to verify links/resources, not domains/hosts.
Testing the server for availability is not enough in my case, I need to test the URL (the webapp may not be deployed)
Indeed, connecting to a host only informs you whether the host is available, not whether the content is available. It can just as well happen that the webserver started without problems, but the webapp failed to deploy during the server's start. This will however usually not cause the entire server to go down. You can determine whether the content is available by checking if the HTTP response code is 200.
HttpURLConnection connection = (HttpURLConnection) new URL(url).openConnection();
connection.setRequestMethod("HEAD");
int responseCode = connection.getResponseCode();
if (responseCode != 200) {
    // Not OK.
}

// < 100 is undetermined.
// 1nn is informal (shouldn't happen on a GET/HEAD)
// 2nn is success
// 3nn is redirect
// 4nn is client error
// 5nn is server error
For more detail about response status codes, see RFC 2616 section 10. Calling connect() is by the way not needed if you're reading the response data; it will implicitly connect.
For future reference, here's a complete example in the flavor of a utility method, also taking timeouts into account:
/**
 * Pings a HTTP URL. This effectively sends a HEAD request and returns <code>true</code> if the response code is in
 * the 200-399 range.
 * @param url The HTTP URL to be pinged.
 * @param timeout The timeout in millis for both the connection timeout and the response read timeout. Note that
 * the total timeout is effectively two times the given timeout.
 * @return <code>true</code> if the given HTTP URL has returned response code 200-399 on a HEAD request within the
 * given timeout, otherwise <code>false</code>.
 */
public static boolean pingURL(String url, int timeout) {
    url = url.replaceFirst("^https", "http"); // Otherwise an exception may be thrown on invalid SSL certificates.
    try {
        HttpURLConnection connection = (HttpURLConnection) new URL(url).openConnection();
        connection.setConnectTimeout(timeout);
        connection.setReadTimeout(timeout);
        connection.setRequestMethod("HEAD");
        int responseCode = connection.getResponseCode();
        return (200 <= responseCode && responseCode <= 399);
    } catch (IOException exception) {
        return false;
    }
}
Instead of using URLConnection, use HttpURLConnection by calling openConnection() on your URL object.
Then getResponseCode() will give you the HTTP response code once you've read from the connection.
Here is the code:
HttpURLConnection connection = null;
try {
    URL u = new URL("http://www.google.com/");
    connection = (HttpURLConnection) u.openConnection();
    connection.setRequestMethod("HEAD");
    int code = connection.getResponseCode();
    System.out.println("" + code);
    // You can decide based on the HTTP return code received. 200 is success.
} catch (MalformedURLException e) {
    // The URL string is syntactically invalid.
    e.printStackTrace();
} catch (IOException e) {
    // The host could not be reached.
    e.printStackTrace();
} finally {
    if (connection != null) {
        connection.disconnect();
    }
}
Also check similar question How to check if a URL exists or returns 404 with Java?
Hope this helps.
You could also use HttpURLConnection, which allows you to set the request method (to HEAD for example). Here's an example that shows how to send a request, read the response, and disconnect.
The following code performs a HEAD request to check whether the website is available or not.
public static boolean isReachable(String targetUrl) throws IOException
{
    HttpURLConnection httpUrlConnection = (HttpURLConnection) new URL(
            targetUrl).openConnection();
    httpUrlConnection.setRequestMethod("HEAD");
    try
    {
        int responseCode = httpUrlConnection.getResponseCode();
        return responseCode == HttpURLConnection.HTTP_OK;
    } catch (UnknownHostException noInternetConnection)
    {
        return false;
    }
}
public boolean isOnline() {
    Runtime runtime = Runtime.getRuntime();
    try {
        Process ipProcess = runtime.exec("/system/bin/ping -c 1 8.8.8.8");
        int exitValue = ipProcess.waitFor();
        return (exitValue == 0);
    } catch (IOException | InterruptedException e) {
        e.printStackTrace();
    }
    return false;
}
Possible questions:
Is this really fast enough? Yes, very fast!
Couldn't I just ping my own page, which I want to request anyway? Sure! You could even check both, if you want to differentiate between "internet connection available" and your own servers being reachable.
What if the DNS is down? Google DNS (e.g. 8.8.8.8) is the largest public DNS service in the world. As of 2013 it serves 130 billion requests a day. Let's just say, your app not responding would probably not be the talk of the day.
Read the link; it seems very good.
EDIT:
In my experience using it, it's not as fast as this method:
public boolean isOnline() {
    NetworkInfo netInfo = connectivityManager.getActiveNetworkInfo();
    return netInfo != null && netInfo.isConnectedOrConnecting();
}
They are a bit different in functionality, but for just checking the connection to the internet, the first method may become slow due to the connection variables.
Consider using the Restlet framework, which has great semantics for this sort of thing. It's powerful and flexible.
The code could be as simple as:
Client client = new Client(Protocol.HTTP);
Response response = client.get(url);
if (response.getStatus().isError()) {
    // uh oh!
}
