Why do I get java.net.MalformedURLException: no protocol? - java

I am trying to build a Java program that downloads files, but I get an exception every time:
java.net.MalformedURLException: no protocol
The code for the URL is:
URL site;
String urlString = "http://www.cs.drexel.edu/~spiros/teaching/CS575/slides/java.pdf";
site = new URL("urlString");
I have also tried:
String urlString = "www.cs.drexel.edu/~spiros/teaching/CS575/slides/java.pdf";
I have tried printing urlString to the console, and it is set correctly to the expected value in each test. What am I missing?

This is wrong:
site = new URL("urlString");
Use the variable:
site = new URL(urlString);

"urlString" is a string literal for the literal value urlString.
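That isn't a valid URL, because it has no protocol such as http:.
You probably want to reference the variable, not write a string literal. Note that the second attempt (www.cs.drexel.edu/...) fails for the same reason even once you pass the variable: without http:// there is still no protocol.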
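A small self-contained check makes the difference visible. The class and the helper name parses are hypothetical; the helper simply reports whether new URL(...) throws MalformedURLException for a given string.

```java
import java.net.MalformedURLException;
import java.net.URL;

class UrlLiteralDemo {
    // Returns true if java.net.URL accepts the string, false if it throws MalformedURLException.
    static boolean parses(String spec) {
        try {
            new URL(spec);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String urlString = "http://www.cs.drexel.edu/~spiros/teaching/CS575/slides/java.pdf";
        System.out.println(parses("urlString")); // prints false: the literal text has no protocol
        System.out.println(parses(urlString));   // prints true: the variable holds a full URL
        // prints false: even through a variable, a string without "http://" has no protocol
        System.out.println(parses("www.cs.drexel.edu/~spiros/teaching/CS575/slides/java.pdf"));
    }
}
```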

Related

Extract a randomly generated ID from URL

I need to extract a randomly generated part of a URL for a Selenium test in Java.
When the browser opens a page, e.g.:
/edit_person.html?id=eb58cea3a3772ff656987792eb0a8c0f
then I can get the URL with:
String url = driver.getCurrentUrl();
but now I need to get only the randomly generated ID after the equals sign.
How do I extract the value of parameter id once I have the entire URL as a string in variable url?
URL.getQuery() gives you the query portion as a String; from there, a simple regular expression match isolates the part you want.
id=(.*) will get you what you want as long as it is the only thing in the query string.
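A minimal sketch of the query-plus-regex approach, assuming the value should stop at the next & so that other parameters in the query do not break it (a small change from id=(.*)). It uses java.net.URI instead of URL because URI.getRawQuery() returns the same undecoded query string and also accepts the relative form shown in the question; the class and method names are hypothetical.

```java
import java.net.URI;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QueryIdExtractor {
    // "id=" at the start of the query or right after '&'; the value runs up to the next '&'.
    private static final Pattern ID_PARAM = Pattern.compile("(?:^|&)id=([^&]*)");

    // Returns the raw value of the id parameter, or null if there is no query or no id.
    static String extractId(String url) {
        String query = URI.create(url).getRawQuery(); // e.g. "id=eb58cea3..."
        if (query == null) {
            return null;
        }
        Matcher m = ID_PARAM.matcher(query);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // In the Selenium test, this string would come from driver.getCurrentUrl().
        String url = "/edit_person.html?id=eb58cea3a3772ff656987792eb0a8c0f";
        System.out.println(extractId(url)); // prints eb58cea3a3772ff656987792eb0a8c0f
    }
}
```

Unlike splitting on "=", this still finds the ID if the page later adds parameters such as ?tab=2&id=....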
This is how I managed to solve the problem:
String url = driver.getCurrentUrl();
URL aURL = new URL(url);
url = aURL.getQuery();
String[] id = url.split("=");
System.out.println(id[1]);
Thanks to Jarrod Roberson!

URL replacement in java

I am trying to replace a URL with another URL.
Below is an example of a source URL:
http://sysserver01.internal.com/web/www/internal/projectwork/resources/injury-prevention-and-recovery/avoiding-injury/overview-of-running-injuries/
This URL should be replaced with the URL below:
http://sysserver01.internal.com/var/www/html/injury-prevention-and-recovery/avoiding-injury/overview-of-running-injuries/
That is, for any source URL, the part after resources/ must be kept and appended to /var/www/html/.
This needs to happen for an arbitrary set of source URLs that contain the string resources.
I don't have enough knowledge of string manipulation, so please help me solve this. Please answer in Java, as that is the platform I have chosen for my work.
String originalUrl = "http://sysserver01.internal.com/web/www/internal/projectwork/resources/injury-prevention-and-recovery/avoiding-injury/overview-of-running-injuries";
String newUrl = originalUrl.replaceAll("web/www/internal/projectwork/resources", "var/www/html");
Alternatively, String.replace() treats the search text literally rather than as a regular expression:
String originalUrl = "http://sysserver01.internal.com/web/www/internal/projectwork/resources/injury-prevention-and-recovery/avoiding-injury/overview-of-running-injuries";
String newUrl = originalUrl.replace("web/www/internal/projectwork/resources", "var/www/html");
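Both answers hard-code the prefix web/www/internal/projectwork/. If other source URLs have a different prefix before resources, a more general variant keeps the scheme and host and swaps everything up to and including /resources/. This is a sketch under assumptions: the marker /resources/ is a full path segment, only its first occurrence counts, and the constant and class names are hypothetical.

```java
import java.net.URI;

class ResourceUrlRewriter {
    // Hypothetical constants based on the example URLs in the question.
    private static final String MARKER = "/resources/";
    private static final String TARGET_PREFIX = "/var/www/html/";

    // Keeps scheme and host, replaces everything up to and including "/resources/"
    // with "/var/www/html/", and keeps the rest of the path and any query string.
    // URLs without the marker are returned unchanged; a fragment (#...) is dropped.
    static String rewrite(String sourceUrl) {
        URI uri = URI.create(sourceUrl);
        String path = uri.getRawPath();
        int i = (path == null) ? -1 : path.indexOf(MARKER);
        if (i < 0) {
            return sourceUrl;
        }
        String rest = path.substring(i + MARKER.length());
        String query = uri.getRawQuery();
        return uri.getScheme() + "://" + uri.getRawAuthority() + TARGET_PREFIX + rest
                + (query == null ? "" : "?" + query);
    }

    public static void main(String[] args) {
        String source = "http://sysserver01.internal.com/web/www/internal/projectwork/resources/"
                + "injury-prevention-and-recovery/avoiding-injury/overview-of-running-injuries/";
        System.out.println(rewrite(source));
        // prints http://sysserver01.internal.com/var/www/html/injury-prevention-and-recovery/avoiding-injury/overview-of-running-injuries/
    }
}
```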

Returned URL as String is not valid in JSF

I'm trying to use the Google API for text-to-speech. I build a String and pass it as a URL to a component to obtain an MP3 with the spoken words.
So, this is my code:
URI uri = new URI("http://translate.google.com/translate_tts?tl=es&q="+ URLEncoder.encode((String)this.text.getValue(), "UTF-8"));
uri.toString() returns a well-formed URL, and if I copy and paste this output into the browser, it works perfectly.
But if I assign this String to the source property of an ice:outputMedia component, it does not work. When I inspect the generated HTML, the String in the src attribute is:
http://translate.google.com/translate_tts?tl=es&amp;q=Bobby+need+peanuts
The & symbol has been replaced by &amp;.
How can I avoid this to make a valid URL?
You need to decode the URL on the client side using JavaScript:
var decoded = decodeURI(URI)

Encoding a URL sent to a server (not in query)

I need to test my server against several URLs daily, since these URLs are updated by my users, and this will be done in Java. However, these URLs contain unusual characters (like the German umlaut). Basically, what I am doing is:
for every URL in the list to check
URL u = new URL(the_url);
u.openConnection(..);
// read the content and handle it
What I've found is that org.apache.commons.codec.net.URLCodec is fine for encoding strings to paste into the query string, but it is not suitable for encoding these URLs into their hex counterparts. Here are some examples of URLs:
http:// www.example com/u/überraum-03/
http:// www.example com/u/são-paulo-dude/
http:// www.example com/u/håkon-hellström/
The desired result for the first would be:
http:// www.example com/u/%c3%bcberraum-03/
Is there any library in Apache Commons, or in Java itself, that converts special characters in the ACTUAL URL (not the query string, and therefore not replacing the same kinds of characters)?
Thank you for your time.
Edited
Firefox translates "yr.no/place/Norway/Nordland/Moskenes/Å/data.html" into "yr.no/place/Norway/Nordland/Moskenes/%C3%85/data.html" (try this by entering the first URL, pressing Enter, and then copying the URL into a document). This is the effect I am looking for, since it is the actual translation. Most likely, Firefox either knows that Å is a bad character, tries multiple versions, or accepts the server's "Location" header; either way, "Å" is transformed into "%C3%85" in only a subset of the URL. This is the function we need.
Edited
I just verified that, sadly, the code given by a commenter does not work. As an example, try this:
try {
    String urlStr = "http://www.yr.no/place/Norway/Nordland/Moskenes/Å/data.html";
    URL u = new URL(urlStr);
    URI uri = new URI(u.getProtocol(),
            u.getUserInfo(), u.getHost(), u.getPort(),
            u.getPath(), u.getQuery(),
            null); // removing ref
    URL urlObj = uri.toURL();
    HttpURLConnection connection = (HttpURLConnection) urlObj.openConnection();
    connection.setInstanceFollowRedirects(false);
    connection.connect();
    for (int i = 0; i < connection.getHeaderFields().size(); i++)
        System.out.println(connection.getHeaderFieldKey(i) + ": " + connection.getHeaderField(i));
    System.exit(0);
} catch (Exception e) {
    e.printStackTrace();
}
This yields a 404 error; strangely enough, the encoded form does not work either.
If you need a URL that is a valid URI (RFC 2396 compliant), you can create one like this in Java:
String urlString = "http://www.example.com/u/håkon-hellström/";
URL url = new URL(urlString);
URI uri = new URI(url.getProtocol(),url.getAuthority(), url.getPath(), url.getQuery(), url.getRef());
url = new URL(uri.toASCIIString());
That being said, all three sample strings you provided are RFC 2396 compliant and do not need to be encoded. I am assuming the spaces in the authority part of the URLs you provided are typos.
EDIT:
I updated the code block above. By using URI.toASCIIString() you can limit the resulting URI to only US-ASCII characters (other characters are encoded). The resulting string can then be used to create a new, valid URL.
http://www.example.com/u/håkon-hellström/
changes to
http://www.example.com/u/h%C3%A5kon-hellstr%C3%B6m/
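The answer's steps can be packaged as a runnable helper. This is a sketch, assuming the input is not already percent-encoded; the class and method names are hypothetical, and checked exceptions are rethrown as IllegalArgumentException for brevity.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

class PathEncoder {
    // Re-encodes a URL whose path may contain spaces or non-ASCII characters.
    static String toAsciiUrl(String urlString) {
        try {
            URL url = new URL(urlString);
            // The five-argument constructor quotes illegal characters (e.g. space -> %20)
            // in each component but keeps non-ASCII letters such as 'å' as they are.
            URI uri = new URI(url.getProtocol(), url.getAuthority(),
                    url.getPath(), url.getQuery(), url.getRef());
            // toASCIIString() then percent-encodes non-ASCII characters as UTF-8 bytes.
            return uri.toASCIIString();
        } catch (MalformedURLException | URISyntaxException e) {
            throw new IllegalArgumentException("Cannot encode " + urlString, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toAsciiUrl("http://www.example.com/u/håkon-hellström/"));
        // prints http://www.example.com/u/h%C3%A5kon-hellstr%C3%B6m/
        System.out.println(toAsciiUrl("http://www.yr.no/place/Norway/Nordland/Moskenes/Å/data.html"));
        // prints http://www.yr.no/place/Norway/Nordland/Moskenes/%C3%85/data.html
    }
}
```

One caveat: the multi-argument URI constructors always quote the % character, so a URL that already contains escapes such as %20 would come out double-encoded as %2520.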

HttpClient and non-ASCII URL characters (á,é,í,ó,ú)

'Long time reader, first time poster' here.
I'm in the process of making a bot for a Spanish wiki I administer. I wanted to write it from scratch, since one of my purposes in making it is to practice Java. However, I ran into trouble when trying to make GET requests with HttpClient to URIs that contain non-ASCII characters such as á, é, í, ó or ú.
String url = "http://es.metroid.wikia.com/api.php?action=query&list=categorymembers&cmtitle=Categoría:Mejoras de las Botas";
method = new GetMethod(url);
client.executeMethod(method);
When I do the above, GetMethod complains about the URI:
Exception in thread "main" java.lang.IllegalArgumentException: Invalid uri 'http://es.pruebaloca.wikia.com/api.php?action=query&list=categorymembers&cmtitle=Categoría:Mejoras%20de%20las%20Botas&cmlimit=500&format=xml': Invalid query
at org.apache.commons.httpclient.HttpMethodBase.<init>(HttpMethodBase.java:222)
at org.apache.commons.httpclient.methods.GetMethod.<init>(GetMethod.java:89)
at net.metroidover.categorybot.http.HttpRequest.request(HttpRequest.java:69)
at net.metroidover.categorybot.http.HttpRequest.request(HttpRequest.java:120)
at net.metroidover.categorybot.http.Action.getCategoryMembers(Action.java:38)
at net.metroidover.categorybot.bot.BotComponent.<init>(BotComponent.java:58)
at net.metroidover.categorybot.bot.BotComponent.main(BotComponent.java:80)
Note that in the URI shown in the stack trace, spaces are encoded as %20 and the í is left as is. That exact same URI works perfectly in a browser, but I can't get GetMethod to accept it.
I've also tried doing the following:
URI uri = new URI(url, false);
method = new GetMethod(uri.getEscapedURI());
client.executeMethod(method);
This way, URI escaped the í, but double-escaped the spaces (%2520):
http://es.metroid.wikia.com/api.php?action=query&list=categorymembers&cmtitle=Categor%C3%ADa:Mejoras%2520de%2520las%2520Botas&cmlimit=500&format=xml
Now, if I don't use any spaces in the query, there's no double escaping and I get the desired output. So if there wasn't any possibility of non-ASCII characters, I wouldn't need to use the URI class and wouldn't get the double escaping. In an attempt to avoid the first escaping of the spaces, I tried this:
URI uri = new URI(url, true);
method = new GetMethod(uri.getEscapedURI());
client.executeMethod(method);
But the URI class didn't like it:
org.apache.commons.httpclient.URIException: Invalid query
at org.apache.commons.httpclient.URI.parseUriReference(URI.java:2049)
at org.apache.commons.httpclient.URI.<init>(URI.java:167)
at net.metroidover.categorybot.http.HttpRequest.request(HttpRequest.java:66)
at net.metroidover.categorybot.http.HttpRequest.request(HttpRequest.java:121)
at net.metroidover.categorybot.http.Action.getCategoryMembers(Action.java:38)
at net.metroidover.categorybot.bot.BotComponent.<init>(BotComponent.java:58)
at net.metroidover.categorybot.bot.BotComponent.main(BotComponent.java:80)
Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 1, Size: 0
at java.util.ArrayList.RangeCheck(ArrayList.java:547)
at java.util.ArrayList.get(ArrayList.java:322)
at net.metroidover.categorybot.http.Action.getCategoryMembers(Action.java:39)
at net.metroidover.categorybot.bot.BotComponent.<init>(BotComponent.java:58)
at net.metroidover.categorybot.bot.BotComponent.main(BotComponent.java:80)
Any input on how to avoid this double escaping would be greatly appreciated. I've lurked all around with absolutely no luck.
Thanks!
Edit: The solution that works best for me is parsifal's, but I'd like to add that setting the path with method.setPath(url) made HttpMethod reject a cookie I needed to save:
Aug 26, 2011 4:07:08 PM org.apache.commons.httpclient.HttpMethodBase processCookieHeaders
WARNING: Cookie rejected: "wikicities_session=900beded4191ff880e09944c7c0aaf5a". Illegal path attribute "/". Path of origin: "http://es.metroid.wikia.com/api.php"
However, if I send the URI to the constructor and forget about the setPath(url), the cookie gets saved without problem.
String url = "http://es.metroid.wikia.com/api.php";
NameValuePair[] query = { new NameValuePair("action", "query"), new NameValuePair("list", "categorymembers"),
new NameValuePair("cmtitle", "Categoría:Mejoras de las Botas"), new NameValuePair("cmlimit", "500"),
new NameValuePair("format", "xml") };
HttpMethod method = null;
...
method = new GetMethod(url); // Or PostMethod(url)
method.getParams().setCookiePolicy(CookiePolicy.BROWSER_COMPATIBILITY); // It had been like this the whole time
method.setQueryString(query);
client.executeMethod(method);
I would recommend using URLEncoder to encode your query string values (not the whole query string).
URLEncoder.encode("Categoría:Mejoras de las Botas", "UTF-8");
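Encoding each value separately, and only then joining with = and &, can be sketched with the JDK alone, without Commons HttpClient. The class and method names are hypothetical, and the sketch assumes Java 10 or later for the URLEncoder.encode(String, Charset) overload. URLEncoder produces application/x-www-form-urlencoded output, so a space becomes + rather than %20, : becomes %3A, and í becomes %C3%AD.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class QueryBuilder {
    // Encodes each name and value separately, then joins them with '=' and '&',
    // so the separators themselves are never escaped.
    static String buildQuery(Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> p : params.entrySet()) {
            query.add(URLEncoder.encode(p.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(p.getValue(), StandardCharsets.UTF_8));
        }
        return query.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("action", "query");
        params.put("list", "categorymembers");
        params.put("cmtitle", "Categoría:Mejoras de las Botas");
        params.put("cmlimit", "500");
        params.put("format", "xml");
        System.out.println("http://es.metroid.wikia.com/api.php?" + buildQuery(params));
    }
}
```

Because every character in the result is already ASCII and escaped exactly once, this plays roughly the same role as passing a NameValuePair[] to setQueryString().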
Looking at the documentation of HttpMethodBase, it appears that all String parameters have to be pre-encoded. The simplest solution is to construct your URL in stages, with setPath() and the variant of setQueryString() that takes an array of name-value parameters.
Why don't you try adding the parameters as NameValuePairs? The problem here is that when you escape the URL, everything in it is escaped, including parts like http://, which is why the system is complaining.
You can also escape just the parameter values using URLEncoder.encode(): pass each GET parameter value to it and append the return value to the URL.
String url = "http://es.metroid.wikia.com/api.php?action=query&list=categorymembers&cmtitle=" + URLEncoder.encode("Categoría:Mejoras de las Botas", "UTF-8");
