How to verify that URL is valid in Java 1.6?

My application processes URLs entered manually by users. I have discovered that some malformed URLs (like 'http:/not-valid') cause a NullPointerException to be thrown when the connection is opened. As I learned from this Java bug report, the issue is known and will not be fixed. The suggestion is to use java.net.URI, which is "more RFC 2396-conformant".
The question is: how do I use URI to work around the problem? The only thing I can do with a URI is parse the string and generate a URL. I have prepared the following program:
import java.net.*;
public class Test
{
    public static void main(String[] args)
    {
        try {
            URI uri = URI.create(args[0]);
            Object o = uri.toURL().getContent(); // try to get content
        }
        catch(Throwable e) {
            e.printStackTrace();
        }
    }
}
Here are the results of my tests (with Java 1.6.0_20), which are not much different from what I get with java.net.URL:
sh-3.2$ java Test url-not-valid
java.lang.IllegalArgumentException: URI is not absolute
at java.net.URI.toURL(URI.java:1080)
at Test.main(Test.java:9)
sh-3.2$ java Test http:/url-not-valid
java.lang.NullPointerException
at sun.net.www.ParseUtil.toURI(ParseUtil.java:261)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:795)
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:726)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1049)
at java.net.URLConnection.getContent(URLConnection.java:688)
at java.net.URL.getContent(URL.java:1024)
at Test.main(Test.java:9)
sh-3.2$ java Test http:///url-not-valid
java.lang.IllegalArgumentException: protocol = http host = null
at sun.net.spi.DefaultProxySelector.select(DefaultProxySelector.java:151)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:796)
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:726)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1049)
at java.net.URLConnection.getContent(URLConnection.java:688)
at java.net.URL.getContent(URL.java:1024)
at Test.main(Test.java:9)
sh-3.2$ java Test http:////url-not-valid
java.lang.NullPointerException
at sun.net.www.ParseUtil.toURI(ParseUtil.java:261)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:795)
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:726)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1049)
at java.net.URLConnection.getContent(URLConnection.java:688)
at java.net.URL.getContent(URL.java:1024)
at Test.main(Test.java:9)

You can use Apache Commons Validator:
UrlValidator urlValidator = new UrlValidator();
urlValidator.isValid("http://google.com");
http://commons.apache.org/validator/
http://commons.apache.org/validator/api-1.3.1/

If I run your code with the type of malformed URI in the bug report, URI.create throws an IllegalArgumentException caused by a URISyntaxException. So the suggested fix does fix the reported error.
$ java -cp bin UriTest http:\\\\www.google.com\\
java.lang.IllegalArgumentException
at java.net.URI.create(URI.java:842)
at UriTest.main(UriTest.java:8)
Caused by: java.net.URISyntaxException: Illegal character in opaque part at index 5: http:\\www.google.com\
at java.net.URI$Parser.fail(URI.java:2809)
at java.net.URI$Parser.checkChars(URI.java:2982)
at java.net.URI$Parser.parse(URI.java:3019)
at java.net.URI.<init>(URI.java:578)
at java.net.URI.create(URI.java:840)
Your type of malformed URI is different, and does not appear to be a syntax error.
Instead, catch the null pointer exception and recover with a suitable message.
You could try and be friendly and check whether the URI starts with a single slash "http:/" and suggest that to the user, or you can check whether the hostname of the URL is non-empty:
import java.net.*;
public class UriTest
{
    public static void main ( String[] args )
    {
        try {
            URI uri = URI.create ( args[0] );
            // avoid null pointer exception
            if ( uri.getHost() == null )
                throw new MalformedURLException ( "no hostname" );
            URL url = uri.toURL();
            URLConnection s = url.openConnection();
            s.getInputStream();
        } catch ( Throwable e ) {
            e.printStackTrace();
        }
    }
}
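The host check can also be pulled into a reusable helper that rejects the malformed inputs from the question before any connection is opened. This is only a minimal sketch: the class and method names (UrlChecks, hasUsableHost) are made up for illustration, and it assumes that only http and https URLs are acceptable. It checks syntax only and fetches nothing.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UrlChecks {

    /**
     * Returns true only if the input parses as a URI, uses http or https
     * (an assumption of this sketch), and carries a host component.
     */
    public static boolean hasUsableHost(String input) {
        if (input == null) {
            return false;
        }
        try {
            URI uri = new URI(input.trim());
            String scheme = uri.getScheme();
            boolean httpLike = "http".equalsIgnoreCase(scheme)
                    || "https".equalsIgnoreCase(scheme);
            // "http:/not-valid" and "http:///not-valid" parse fine but have no host
            return httpLike && uri.getHost() != null;
        } catch (URISyntaxException e) {
            // raised for inputs with illegal characters such as backslashes or spaces
            return false;
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "http://example.com/", "http:/not-valid", "http:///not-valid", "url-not-valid"
        };
        for (String sample : samples) {
            System.out.println(sample + " -> " + hasUsableHost(sample));
        }
    }
}
```

A true result only means the string is well formed enough to attempt a connection; the connection itself can still fail.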

Note that even with the approaches proposed in the other answers, you wouldn't get validation right, since java.net.URI adheres to RFC 2396, which is notably outdated. By using java.net.URI, you'll get exceptions for URLs that today are valid for all web browsers.
In order to solve these issues, I wrote a library for URL parsing in Java: galimatias. It performs URL parsing the same way web browsers do (adhering to the WHATWG URL Specification).
In your case, you can write:
try {
    // urlString is the user-supplied string to validate
    java.net.URL url = io.mola.galimatias.URL.parse(urlString).toJavaURL();
} catch (GalimatiasParseException e) {
    // If this exception is thrown, the given URL contains an unrecoverable error. That is, it's completely invalid.
}
As a nice side-effect, you get a lot of sanitization that you won't get with java.net.URI. For example, http:/example.com will be correctly parsed as http://example.com/.

Related

check for validity of URL in java. so as not to crash on 404 error

Essentially, like a bulletproof tank, I want my program to absorb 404 errors and keep on rolling, crushing the interwebs and leaving corpses dead and bloodied in its wake, or, w/e.
I keep getting this error:
Exception in thread "main" org.jsoup.HttpStatusException: HTTP error fetching URL. Status=404, URL=https://en.wikipedia.org/wiki/Hudson+Township+%28disambiguation%29
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:537)
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:493)
at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:205)
at org.jsoup.helper.HttpConnection.get(HttpConnection.java:194)
at Q.Wikipedia_Disambig_Fetcher.all_possibilities(Wikipedia_Disambig_Fetcher.java:29)
at Q.Wikidata_Q_Reader.getQ(Wikidata_Q_Reader.java:54)
at Q.Wikipedia_Disambig_Fetcher.all_possibilities(Wikipedia_Disambig_Fetcher.java:38)
at Q.Wikidata_Q_Reader.getQ(Wikidata_Q_Reader.java:54)
at Q.Runner.main(Runner.java:35)
But I can't understand why because I am checking to see if I have a valid URL before I navigate to it. What about my checking procedure is incorrect?
I tried to examine the other Stack Overflow questions on this subject, but they're not very authoritative. I also implemented many of the solutions from this one and this one, and so far nothing has worked.
I'm using the Apache Commons URL validator. This is the code I've been using most recently:
//get its normal wiki disambig page
String URL_check = "https://en.wikipedia.org/wiki/" + associated_alias;
UrlValidator urlValidator = new UrlValidator();
if ( urlValidator.isValid( URL_check ) )
{
Document docx = Jsoup.connect( URL_check ).get();
//this can handle the less structured ones.
and
//check the validity of the URL
String URL_czech = "https://www.wikidata.org/wiki/Special:ItemByTitle?site=en&page=" + associated_alias + "&submit=Search";
UrlValidator urlValidator = new UrlValidator();
if ( urlValidator.isValid( URL_czech ) )
{
URL wikidata_page = new URL( URL_czech );
URLConnection wiki_connection = wikidata_page.openConnection();
BufferedReader wiki_data_pagecontent = new BufferedReader(
new InputStreamReader(
wiki_connection.getInputStream()));

HttpURLConnection.getInputStream() throws an IOException when the server answers with an error status (400 or above; for a 404 it is a FileNotFoundException). Instead of passing Jsoup a URL or String to fetch, consider passing it an input stream that contains the page.
Using the HttpURLConnection class, we can try to download the page with getInputStream() inside a try/catch block and, if that fails, fall back to getErrorStream().
Consider this bit of code, which will download your wiki page even if it returns 404:
String URL_czech = "https://en.wikipedia.org/wiki/Hudson+Township+%28disambiguation%29";
URL wikidata_page = new URL(URL_czech);
HttpURLConnection wiki_connection = (HttpURLConnection)wikidata_page.openConnection();
InputStream wikiInputStream = null;
try {
    // try to connect and use the input stream
    wiki_connection.connect();
    wikiInputStream = wiki_connection.getInputStream();
} catch(IOException e) {
    // failed, try using the error stream
    wikiInputStream = wiki_connection.getErrorStream();
}
// parse the input stream using Jsoup
Jsoup.parse(wikiInputStream, null, wikidata_page.getProtocol()+"://"+wikidata_page.getHost()+"/");

The Status=404 error means there's no page at that location. Just because a URL is valid doesn't mean there's anything there. A validator can't tell you that. The only way you can determine that is by fetching it, and seeing if you get an error, as you're doing.
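One way to "fetch it and see" without an exception escaping is to ask HttpURLConnection for the status code first, since getResponseCode() returns 404 rather than throwing. The sketch below is an illustration, not the asker's code: StatusProbe, statusOf and startServerAnswering404 are hypothetical names, and the throwaway local server (the JDK's built-in com.sun.net.httpserver) exists only so the example runs without touching Wikipedia.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;

public class StatusProbe {

    /** Returns the HTTP status code for the URL; a 404 is returned, not thrown. */
    public static int statusOf(URL url) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        try {
            connection.setRequestMethod("GET");
            return connection.getResponseCode();
        } finally {
            connection.disconnect();
        }
    }

    /** Demo-only assumption: a local server on a free port that answers 404 to everything. */
    public static HttpServer startServerAnswering404() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            exchange.sendResponseHeaders(404, -1); // -1 means no response body
            exchange.close();
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = startServerAnswering404();
        try {
            int port = server.getAddress().getPort();
            URL url = new URL("http://127.0.0.1:" + port + "/missing-page");
            int status = statusOf(url);
            if (status == HttpURLConnection.HTTP_NOT_FOUND) {
                System.out.println("Skipping " + url + " (404)");
            } else {
                System.out.println("Status " + status + " for " + url);
            }
        } finally {
            server.stop(0);
        }
    }
}
```

In a crawler, the same check would sit in front of the parsing step so that missing pages are skipped and the loop keeps going. When fetching through Jsoup.connect(...).get() instead, catching the HttpStatusException shown in the stack trace is the other option.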

Encoded URL and java.lang.IllegalArgumentException

I encode some URL parameters and URL becomes correct, but I still get java.lang.IllegalArgumentException. Here is my code:
StringBuilder makeUrlFromWord = new StringBuilder();
List<String> splittedUrl = mParser.splitRequest(urls[0]);
try {
    makeUrlFromWord.append("http://")
        .append(URLEncoder.encode(splittedUrl.get(0), HTTP.UTF_8))
        .append(".jpg.to/");
} catch (UnsupportedEncodingException e) {
    e.printStackTrace();
}
Log.d("Made url", makeUrlFromWord.toString());
Here is part of the log:
D/Made url﹕ http://%D0%BF%D1%80%D0%B8%D0%B2%D0%B5%D1%82.jpg.to/
W/System.err﹕ java.lang.IllegalArgumentException: Host name may not be null
The link is correct. I tried it in a browser: it decodes back into Cyrillic symbols and works.

It looks like the trick is to use IDNA encoding:
Android defines java.net.IDN providing the conversion functions.

That works for me. It converts "привет.jpg.to" to "http://xn--b1agh1afp.jpg.to/", thanks to @18446744073709551615.
makeUrlFromWord.append("http://")
.append(IDN.toASCII(splittedUrl.get(0)))
.append(".jpg.to/");
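The reason this works is that percent-encoding applies to the path and query of a URL, while a non-ASCII host name has to be converted to its ASCII (Punycode) form. A minimal self-contained sketch of that conversion follows; IdnHost and imageUri are illustrative names, and the fixed ".jpg.to" suffix simply mirrors the question. According to the answer above, the Cyrillic label in main should come out as xn--b1agh1afp.

```java
import java.net.IDN;
import java.net.URI;
import java.net.URISyntaxException;

public class IdnHost {

    /** Builds http://<label>.jpg.to/ with the host converted to its ASCII form. */
    public static URI imageUri(String label) throws URISyntaxException {
        // IDN.toASCII leaves plain ASCII labels alone and Punycode-encodes the rest
        String asciiHost = IDN.toASCII(label + ".jpg.to");
        return new URI("http", asciiHost, "/", null);
    }

    public static void main(String[] args) throws URISyntaxException {
        // Unicode escapes for the Cyrillic word used in the question
        String word = "\u043f\u0440\u0438\u0432\u0435\u0442";
        URI uri = imageUri(word);
        System.out.println(uri);
        // toUnicode reverses the conversion, which is handy for display
        System.out.println(IDN.toUnicode(uri.getHost()));
    }
}
```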

How to use ESAPI to fix Resource Injection (URL) issues

I am new to the Stack Overflow forum. I have a question about remediating Fortify scan issues.
The HP Fortify scan reports a Resource Injection issue for the following code.
String testUrl = "http://google.com";
URL url = null;
try {
    url = new URL(testUrl);
} catch (MalformedURLException mue) {
    log.error("MalformedUrlException URL " + testUrl + " Exception : " + mue);
}
In the above code, Fortify flags Resource Injection on the line url = new URL(testUrl);
I made the following code changes to validate the URL with ESAPI and remediate the issue:
String testUrl = "http://google.com";
URL url = null;
String canonURL = null; // declared outside the try block so the catch block can log it
try {
    canonURL = ESAPI.encoder().canonicalize(testUrl, false, false);
    if (ESAPI.validator().isValidInput("URLContext", canonURL, "URL", canonURL.length(), false)) {
        url = new URL(canonURL);
    } else {
        log.error("Invalid script URL passed: " + canonURL);
    }
} catch (MalformedURLException mue) {
    log.error("MalformedUrlException URL " + canonURL + " Exception : " + mue);
}
However, the Fortify scan still reports an error; the change does not remediate the issue. Am I doing anything wrong?
Any solution will help a lot.
Thanks,
Marimuthu.M

I think that the real issue here is not that the URL may be somehow malformed, but, that the URL may not reference a valid site. More specifically, if I, the bad guy, am able to cause your URL to point to my web site, then you obtain data from my location that is not tested and I can return data that may be used to compromise your system. I might use that to say return a record for "bob the bad guy" that makes bob look like a good guy.
I suspect that in your code you do not set a hard coded value in a string, since this is usually described with words such as
When an application permits a user input to define a resource, like a
file name or port number, this data can be manipulated to execute or
access different resources.
(see https://www.owasp.org/index.php/Resource_Injection)
I think that the proper response will be some combination of:
Do not get the result from the user, but, use the input to choose from your own internal list.
Argue that the value came from a trusted source. For example, read from a strictly controlled database or configuration file.
You do not need to remove the warnings, you need to demonstrate that you understand the risk and indicate why it is OK to use the value in your case.
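The first option, choosing from an internal list, could look roughly like the sketch below. Everything in it is hypothetical: the class name EndpointRegistry, the keys, and the example.com addresses are placeholders, and whether a given scanner accepts the pattern without an audit comment is not something this sketch can promise. The point is only that the user supplies a key, never the URL itself.

```java
import java.net.URI;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class EndpointRegistry {

    // Hypothetical allow-list; in practice it would be loaded from trusted configuration.
    private static final Map<String, URI> ALLOWED;
    static {
        Map<String, URI> endpoints = new HashMap<String, URI>();
        endpoints.put("search", URI.create("https://search.example.com/"));
        endpoints.put("status", URI.create("https://status.example.com/health"));
        ALLOWED = Collections.unmodifiableMap(endpoints);
    }

    /** Maps a user-supplied key to a known URI; anything not on the list is rejected. */
    public static URI resolve(String userChoice) {
        URI target = ALLOWED.get(userChoice);
        if (target == null) {
            throw new IllegalArgumentException("Unknown endpoint key: " + userChoice);
        }
        return target;
    }

    public static void main(String[] args) {
        System.out.println(resolve("search"));
    }
}
```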

boolean isValidInput(java.lang.String context,
java.lang.String input,
java.lang.String type,
int maxLength,
boolean allowNull)
throws IntrusionException
The type parameter of isValidInput names the regular expression pattern that your testUrl must match.
Like:
try {
    ESAPI.validator().getValidInput("URI_VALIDATION", requestUri, "URL", 80, false);
} catch (ValidationException e) {
    System.out.println("Validation exception");
    e.printStackTrace();
} catch (IntrusionException e) {
    System.out.println("Intrusion exception");
    e.printStackTrace();
}
It will pass if requestUri matches the pattern defined in validation.properties under Validator.URL and is no longer than 80 characters.
Validator.URL=^(ht|f)tp(s?)\:\/\/0-9a-zA-Z(:(0-9))(\/?)([a-zA-Z0-9\-\.\?\,\:\'\/\\\+=&%\$#_])?$

This is piggybacking on Andrew's answer, but the problem Fortify is warning you of is user control of a URL. If your application later decides to make connections to that website, and it is untrusted, this is an issue.
If this is an application where you care more about sharing public URIs, then you'll have to accept the risk and make sure users are properly trained on it. Also make sure that, if you redisplay those URLs, nobody can use them to embed malicious data.

What is wrong with the given URL validation code in Java?

I know that better methods of URL validation exist, and that worse ones might be more common than this example. But can someone tell me what is probably wrong with the following URL validation code when url = "Some random english sentence"?
I see that the validation fails. I don't know why.
/**
 * Checks if url is ok
 * THIS METHOD DOESN'T SEEM TO WORK WELL
 *
 * @param url
 * @return True if url is ok, False otherwise
 */
static public boolean isUrlOk(String url) {
    try {
        URL urlObject = new URL(url);
        String host = urlObject.getHost();
        return true;
    } catch (Exception e) {
        return false;
    }
}
The problem: It sometimes returns true for random sentences.

Modify the catch block to call e.printStackTrace() so you can see why it fails.
If you try it with "Some random english sentence", it will fail because no protocol is specified.

According to the java.net.URL API doc at http://docs.oracle.com/javase/7/docs/api/java/net/URL.html#URL(java.lang.String):
MalformedURLException - if no protocol is specified, or an unknown protocol is found, or spec is null.
Since no scheme was specified, the exception was thrown.

How to resolve java.net.MalformedURLException: Protocol not found: 9 in android

I am trying to load images in my android application from a url (http://www.elifeshopping.com/images/stories/virtuemart/product/thumbnail (2).jpg) using BitmapFactory the code is below :
try {
    // ImageView i = (ImageView)findViewById(R.id.image);
    bitmap = BitmapFactory.decodeStream((InputStream) new URL(url)
            .getContent());
    i.setImageBitmap(bitmap);
} catch (MalformedURLException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}
Here I get:
05-03 15:57:13.156: W/System.err(1086): java.net.MalformedURLException: Protocol not found: 9
05-03 15:57:13.167: W/System.err(1086): at java.net.URL.<init>(URL.java:273)
05-03 15:57:13.167: W/System.err(1086): at java.net.URL.<init>(URL.java:157)
Please help by telling what I am doing wrong.

I used
productImgUrl = productImgUrl.replaceAll(" ", "%20");
to replace all the spaces with %20, and it is working for me.
Thanks, everybody, for your responses.

Please help by telling what I am doing wrong.
I think that the problem is that you are calling the URL constructor with an invalid URL string. Indeed, the exception message implies that the URL string starts with "9:". (The 'protocol' component is the sequence of characters before the first colon character of the URL.)
This doesn't make a lot of sense if the URL string really is:
"http://www.elifeshopping.com/images/stories/virtuemart/product/thumbnail (2).jpg"
so I'd infer that it is ... in fact ... something else. Print it out before you call the URL constructor to find out what it really is.
(You should also %-escape the space characters in the URL's path ... but I doubt that will fix this particular exception incarnation.)
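For the %-escaping itself, one option that avoids hand-written replaceAll calls is the multi-argument java.net.URI constructor, which quotes characters that are illegal in a path. A minimal sketch, with PathEscaper and escape as made-up names and the scheme fixed to http as an assumption:

```java
import java.net.URI;
import java.net.URISyntaxException;

public class PathEscaper {

    /**
     * Builds an http URL string from a host and a raw, unescaped path.
     * The path must start with '/'. Do not pass an already escaped path:
     * this constructor always quotes '%' itself.
     */
    public static String escape(String host, String rawPath) throws URISyntaxException {
        return new URI("http", host, rawPath, null).toASCIIString();
    }

    public static void main(String[] args) throws URISyntaxException {
        String escaped = escape("www.elifeshopping.com",
                "/images/stories/virtuemart/product/thumbnail (2).jpg");
        // prints http://www.elifeshopping.com/images/stories/virtuemart/product/thumbnail%20(2).jpg
        System.out.println(escaped);
    }
}
```

The space becomes %20, while the parentheses are left alone because they are legal in a URI path.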

Change your url to http://www.elifeshopping.com/images/stories/virtuemart/product/thumbnail%20%282%29.jpg
