I want to retrieve http:// or https:// as per the protocol being used from the given URL.
How can I do it using Pattern and Matcher?
If there is some other way to do it, please suggest a snippet.
Here is my URL:
https://www.google.co.in/#q=retrieving+http:%2F%2F+from+url+using+java+regular+expression+
thanks in advance
Another way:
String urlString = "https://www.google.co.in";
URL url = new URL(urlString);
String protocol = url.getProtocol();
Simply something like
^(https?://)
should match
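Since the question asks for Pattern and Matcher, the regex above could be wired up roughly as follows. This is only a sketch: the class name SchemePrefix and the method find are made up for illustration, and the pattern is case-sensitive, so an upper-case HTTP:// would not match.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class SchemePrefix {
    // Matches "http://" or "https://" only at the very start of the input.
    private static final Pattern PREFIX = Pattern.compile("^(https?://)");

    // Returns the matched prefix, or an empty Optional if the input starts with neither.
    static Optional<String> find(String url) {
        Matcher m = PREFIX.matcher(url);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String url = "https://www.google.co.in/#q=retrieving+http:%2F%2F+from+url+using+java+regular+expression+";
        // The "http:" later in the fragment is ignored because of the ^ anchor.
        System.out.println(find(url).orElse("(no match)"));
    }
}
```

Returning an Optional keeps the "no match" case explicit instead of handing back null.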
I need to extract a randomly generated part of a URL for a Selenium test in Java.
When the browser opens a page, e.g.:
/edit_person.html?id=eb58cea3a3772ff656987792eb0a8c0f
then I'm able to show the url with:
String url = driver.getCurrentUrl();
but now I need to get only the randomly generated ID after the equals sign.
How do I extract the value of parameter id once I have the entire URL as a string in variable url?
URL.getQuery() will give you the query portion as a String. From there, a simple regular expression match isolates the part you want.
id=(.*) will get you what you want as long as it is the only thing in the query string.
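If the query string may contain other parameters as well, a slightly stricter pattern than id=(.*) could be used. The sketch below is an assumption-laden variant, not the answer's exact approach: it uses java.net.URI instead of URL so that the relative path from the question parses, and the names QueryParam and idOf are invented for the example.

```java
import java.net.URI;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Selenium or the JDK.
final class QueryParam {
    // Captures the value of "id" up to the next '&' or the end of the query.
    private static final Pattern ID = Pattern.compile("(?:^|&)id=([^&]*)");

    // Returns the raw (still percent-encoded) id value, or null if there is none.
    static String idOf(String url) {
        String query = URI.create(url).getRawQuery();
        if (query == null) {
            return null;
        }
        Matcher m = ID.matcher(query);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(idOf("/edit_person.html?id=eb58cea3a3772ff656987792eb0a8c0f"));
    }
}
```

The (?:^|&) prefix keeps a parameter such as pid=... from being mistaken for id=....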
This is how I managed to solve the problem:
String url = driver.getCurrentUrl();
URL aURL = new URL(url);
url = aURL.getQuery();
String[] id = url.split("=");
System.out.println(id[1]);
Thanks to Jarrod Roberson!
I need a regex pattern that will find brackets in URLs and replace them with their URL-encoded form.
For example, a base URL like:
http://www.mysite.com/bla/blabla/abc[1].txt
will be turned to:
http://www.mysite.com/bla/blabla/abc%5B1%5D.txt
Can anyone help, please?
EDIT1:
I originally use commons-httpclient to access this kind of URL.
When I use the first URL, I get an "escaped absolute path no valid" exception.
I can't use URLEncoder, because when I use it I get a "host parameter is null" exception.
The following line should do the trick
String s = URLEncoder.encode("http://www.mysite.com/bla/blabla/abc[1].txt", "UTF-8");
Have you tried URLEncoder.encode?
It is in the java.net package.
EDIT:
OK, I see... you cannot pass an entire URL to URLEncoder. URLEncoder is mostly used to encode query parameters.
Try this instead:
URI uri = new URI("http", "www.mysite.com", "/bla/blabla/abc[1].txt",null);
System.out.println(uri.toASCIIString());
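The multi-argument URI constructors quote characters that are not legal in the component they are given, which is why the brackets in the path come out percent-encoded. A self-contained sketch of the same idea follows; the class and method names are invented for the example, and the checked exception is wrapped only to keep the call site short.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper built around the four-argument URI constructor.
final class BracketEncoder {
    // Builds scheme://host/path and returns it with illegal path characters quoted.
    static String encodePath(String scheme, String host, String path) {
        try {
            return new URI(scheme, host, path, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot build URI", e);
        }
    }

    public static void main(String[] args) {
        // prints http://www.mysite.com/bla/blabla/abc%5B1%5D.txt
        System.out.println(encodePath("http", "www.mysite.com", "/bla/blabla/abc[1].txt"));
    }
}
```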
I am trying to replace one URL with another URL.
Below is an example of a source URL:
http://sysserver01.internal.com/web/www/internal/projectwork/resources/injury-prevention-and-recovery/avoiding-injury/overview-of-running-injuries/
This URL should be replaced with the URL below:
http://sysserver01.internal.com/var/www/html/injury-prevention-and-recovery/avoiding-injury/overview-of-running-injuries/
In other words, when a source URL comes in, the part after resources in the source URL must be appended to /var/www/html/.
This needs to happen for an arbitrary set of source URLs that contain the string resources.
I don't have enough knowledge of string manipulation, so please help me solve this. Please solve it in Java, as that is the platform I have chosen for my work.
String originalUrl = "http://sysserver01.internal.com/web/www/internal/projectwork/resources/injury-prevention-and-recovery/avoiding-injury/overview-of-running-injuries";
String newUrl = originalUrl.replaceAll("web/www/internal/projectwork/resources", "var/www/html");
String originalUrl = "http://sysserver01.internal.com/web/www/internal/projectwork/resources/injury-prevention-and-recovery/avoiding-injury/overview-of-running-injuries";
String newUrl = originalUrl.replace("web/www/internal/projectwork/resources", "var/www/html");
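Both snippets hard-code the full prefix. Note that replaceAll treats its first argument as a regular expression while replace treats it as literal text; for this string the result is the same. If the directories in front of resources vary between URLs, one possible approach is to cut the path at the /resources/ marker instead. The following is a sketch under that assumption; ResourcePathRewriter and rewrite are hypothetical names, and any query string or fragment is dropped.

```java
import java.net.URI;

// Hypothetical helper: keeps scheme and host, swaps everything up to "/resources/".
final class ResourcePathRewriter {
    private static final String MARKER = "/resources/";
    private static final String NEW_ROOT = "/var/www/html/";

    // Returns the rewritten URL, or the input unchanged if it has no "/resources/" segment.
    static String rewrite(String sourceUrl) {
        URI uri = URI.create(sourceUrl);
        String path = uri.getRawPath();
        if (path == null) {
            return sourceUrl;
        }
        int idx = path.indexOf(MARKER);
        if (idx < 0) {
            return sourceUrl;
        }
        String tail = path.substring(idx + MARKER.length());
        return uri.getScheme() + "://" + uri.getRawAuthority() + NEW_ROOT + tail;
    }

    public static void main(String[] args) {
        System.out.println(rewrite(
            "http://sysserver01.internal.com/web/www/internal/projectwork/resources/"
            + "injury-prevention-and-recovery/avoiding-injury/overview-of-running-injuries/"));
    }
}
```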
I need to test my server against several URLs daily, since these URLs are updated by my users, and this will be done in Java. However, these URLs contain strange characters (like the German umlaut). Basically, what I am doing is:
// urlsToCheck is a placeholder name for the list of URL strings
for (String theUrl : urlsToCheck) {
    URL u = new URL(theUrl);
    URLConnection connection = u.openConnection();
    // read the content and handle it
}
What I've found is that, while org.apache.commons.codec.net.URLCodec is fine for encoding a string to paste into the query string, it is not as suitable for encoding strange URLs into their hex counterparts. Here are some examples of URLs:
http:// www.example com/u/überraum-03/
http:// www.example com/u/são-paulo-dude/
http:// www.example com/u/håkon-hellström/
The desired result for the first would be:
http:// www.example com/u/%c3%bcberraum-03/
Is there any library in Apache Commons, or in Java itself, to convert special characters in the ACTUAL URL (not the query string, and therefore not replacing the same kinds of characters)?
Thank you for your time.
Edited
Firefox translates "yr.no/place/Norway/Nordland/Moskenes/Å/data.html" into "yr.no/place/Norway/Nordland/Moskenes/%C3%85/data.html" (try this by entering the first URL, pressing Enter, then copying the URL into a document). It is this effect that I am looking for, since this is the actual translation. What is most likely happening is that Firefox knows Å is a bad thing, or it tries multiple versions, or it accepts the server's "Location" header; either way, there is a transformation from "Å" to "%C3%85" on only a subset of the URL. This is the function we need.
Edited
I just verified that the code given by the commenter sadly does not work. As an example, try this:
try{
String urlStr = "http://www.yr.no/place/Norway/Nordland/Moskenes/Å/data.html";
URL u=new URL(urlStr);
URI uri = new URI(u.getProtocol(),
u.getUserInfo(), u.getHost(), u.getPort(),
u.getPath(), u.getQuery(),
null); // removing ref
URL urlObj = uri.toURL();
HttpURLConnection connection = (HttpURLConnection) urlObj.openConnection();
connection.setInstanceFollowRedirects(false);
connection.connect();
for (int i=0;i<connection.getHeaderFields().size();i++)
System.out.println(connection.getHeaderFieldKey(i)+": "+connection.getHeaderField(i));
System.exit(0);
}catch(Exception e){e.printStackTrace();};
This yields a 404 error. Strangely enough, the encoded part does not work either.
If you need a URL that is a valid URI (RFC 2396 compliant) you can create one like this in Java
String urlString = "http://www.example.com/u/håkon-hellström/";
URL url = new URL(urlString);
URI uri = new URI(url.getProtocol(),url.getAuthority(), url.getPath(), url.getQuery(), url.getRef());
url = new URL(uri.toASCIIString());
That being said, all three sample strings you provided are RFC 2396 compliant and do not need to be encoded. I am assuming the spaces in the authority part of the URLs you provided are typos.
EDIT:
I updated the code block above. By using URI.toASCIIString() you can limit the resulting URI to only US-ASCII characters (other characters are encoded). The resulting string can then be used to create a new, valid URL.
http://www.example.com/u/håkon-hellström/
changes to
http://www.example.com/u/h%C3%A5kon-hellstr%C3%B6m/
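The difference between URI.toString() and URI.toASCIIString() is what matters here: the multi-argument constructor leaves non-ASCII letters in place, and only toASCIIString() percent-encodes them as UTF-8. The sketch below makes that visible with the yr.no URL from the question; AsciiUrl and toUri are names invented for the example, and the non-ASCII letters are written as Unicode escapes so the source file encoding does not matter.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

// Hypothetical helper that rebuilds a URL through the five-argument URI constructor.
final class AsciiUrl {
    static URI toUri(String urlString) {
        try {
            URL url = new URL(urlString);
            return new URI(url.getProtocol(), url.getAuthority(), url.getPath(),
                           url.getQuery(), url.getRef());
        } catch (MalformedURLException | URISyntaxException e) {
            throw new IllegalArgumentException("Not a usable URL: " + urlString, e);
        }
    }

    public static void main(String[] args) {
        // \u00c5 is the letter Å
        URI uri = toUri("http://www.yr.no/place/Norway/Nordland/Moskenes/\u00c5/data.html");
        System.out.println(uri.toString());      // still contains the raw Å
        System.out.println(uri.toASCIIString()); // contains %C3%85 instead
    }
}
```

This would also explain why calling uri.toURL() directly, as in the question's test code, sends the unencoded character.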
Having trouble setting up a URL connection with Chinese characters in the URL. It works with Latin characters:
String xstr = "维也纳恩斯特哈佩尔球场" ;
URI uri = new URI("http","ajax.googleapis.com","/ajax/services/language/detect","v=1.0&q="+xstr,null);
URL url = uri.toURL();
URLConnection connection = url.openConnection();
InputStream is = connection.getInputStream() ;
The getInputStream() call results in:
java.lang.IllegalArgumentException: Invalid uri 'http://ajax.googleapis.com/ajax/services/language/detect?v=1.0&q=???????????': Invalid query
The problem is caused by the fact that URI.toURL() doesn't percent-encode non-ASCII characters. Use the following instead:
URL url = new URL(uri.toASCIIString());
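As a quick way to check the fix without opening a connection, the URI can be built from the same components and the result inspected for non-ASCII characters. This is only a sketch: QueryEncoding, asciiUrl and isAscii are hypothetical names. Note also that the constructor does not escape '&' or '=' inside a value, so for arbitrary parameter values URLEncoder.encode(value, "UTF-8") is the usual tool.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper around the five-argument URI constructor.
final class QueryEncoding {
    // Builds the URI from its parts and returns the US-ASCII form.
    static String asciiUrl(String scheme, String authority, String path, String query) {
        try {
            return new URI(scheme, authority, path, query, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot build URI", e);
        }
    }

    // True if every character of s is in the US-ASCII range.
    static boolean isAscii(String s) {
        return s.chars().allMatch(c -> c < 128);
    }

    public static void main(String[] args) {
        String xstr = "维也纳恩斯特哈佩尔球场";
        String encoded = asciiUrl("http", "ajax.googleapis.com",
                "/ajax/services/language/detect", "v=1.0&q=" + xstr);
        System.out.println(encoded);
        System.out.println(isAscii(encoded));
    }
}
```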
axtavt's answer above saved me from insanity, thanks! Just one comment (I could not figure out how to comment below the answer:)
If you start with a URL, you need to encode quotes before you build the URI:
String s = "your_url?with=\"quotes\"";
URI su = new URI(s.replaceAll("\"", "%22"));
URL ur = new URL( su.toASCIIString());
I think it is related to the "UTF-8" charset. Have a look at this topic to learn more and also this chinese in java
Per the URI RFC (see section 2.4), non-US-ASCII characters aren't valid in a URI. You must encode them.