Capture a group containing URL encoded - java

I searched for this but didn't find anything.
I found a regex pattern to extract username from facebook link:
(?:(?:http|https):\/\/)?(?:www.)?facebook.com\/(?:(?:\w)*#!\/)?(?:pages\/)?(?:[?\w\-]*\/)?(?:profile.php\?id=(?=\d.*))?([\w\-]*)?
The problem is that it cannot capture the username if it is percent-encoded. The original username is in Arabic. For example, links like this one:
https://www.facebook.com/%D9%82%D8%B1%D9%8A
The problem is with the percent signs, but how do I fix it? Please help me!

Thanks to @mwp, I resolved the problem by decoding the URL before passing it to the matcher.
link = URLDecoder.decode(link.replaceAll("/$", ""), "UTF-8");
The replaceAll() call removes a single trailing slash, if there is one.
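As a rough sketch of the decode-then-match approach described above: the class and method names here are hypothetical, and the Pattern.UNICODE_CHARACTER_CLASS flag is an addition that is not in the original answer. On a standard JVM, \w only matches ASCII word characters by default, so without that flag the decoded Arabic name would likely not be captured (Android's regex engine may behave differently).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FacebookUsername {
    // The pattern from the question, with the literal dots escaped.
    // UNICODE_CHARACTER_CLASS (an assumption, see above) makes \w match non-ASCII letters.
    private static final Pattern PROFILE = Pattern.compile(
        "(?:(?:http|https)://)?(?:www\\.)?facebook\\.com/(?:(?:\\w)*#!/)?(?:pages/)?"
            + "(?:[?\\w\\-]*/)?(?:profile\\.php\\?id=(?=\\d.*))?([\\w\\-]*)?",
        Pattern.UNICODE_CHARACTER_CLASS);

    // Returns the captured username, or null if the link does not match.
    static String extract(String link) {
        String decoded;
        try {
            // Drop one trailing slash, then turn %XX sequences back into characters.
            decoded = URLDecoder.decode(link.replaceAll("/$", ""), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 should always be available", e);
        }
        Matcher m = PROFILE.matcher(decoded);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("https://www.facebook.com/%D9%82%D8%B1%D9%8A"));
    }
}
```

Note that URLDecoder also turns + into a space, which does not matter for this link but may for others.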

Related

Encode URL with US-ASCII character set

I refer to the following web site:
http://coderstoolbox.net/string/#!encoding=xml&action=encode&charset=us_ascii
Choosing "URL", "Encode", and "US-ASCII", the input is converted to the desired output.
How do I produce the same output with Java code?
Thanks in advance.
I used this and it seems to work fine.
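import java.util.regex.Pattern;
import java.util.stream.Collectors;

private static final Pattern DO_NOT_REPLACE = Pattern.compile("[a-zA-Z0-9]");

public static String encode(String input) {
    return input.chars().mapToObj(c -> {
        String ch = String.valueOf((char) c);
        if (DO_NOT_REPLACE.matcher(ch).matches()) {
            return ch; // letters and digits pass through unchanged, case preserved
        }
        // Below 256: always two uppercase hex digits. Otherwise: the U prefix plus the hex code.
        return c < 256 ? String.format("%%%02X", c) : "%U" + Integer.toHexString(c).toUpperCase();
    }).collect(Collectors.joining());
}
PS: The check against 256 limits the U prefix to characters whose code does not fit in two hex digits. Characters below 256, which includes all of ASCII, need no prefix. Only the hex digits are uppercased, so letters in the input keep their case.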
Alternative option:
There is a built-in Java class (java.net.URLEncoder) that does URL encoding, but it works a little differently. It produces form encoding, so for example it replaces a space with + rather than %20, and a few other characters are treated differently too. See if it helps:
String encoded = URLEncoder.encode(input, "US-ASCII");
Hope this helps!
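If %20 is wanted for spaces, one possible workaround is to post-process the URLEncoder output. This is only a sketch with a hypothetical class and method name; it relies on the fact that URLEncoder encodes a literal + in the input as %2B, so any + left in the output stands for a space.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class FormEncoding {
    // Form-encodes the input, then rewrites the + that stands for a space as %20.
    static String encodeWithPercent20(String input) {
        try {
            return URLEncoder.encode(input, "US-ASCII").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("US-ASCII should always be available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encodeWithPercent20("a b+c&d"));
    }
}
```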
You can use ESAPI.encoder().encodeForURL(linkString) from OWASP ESAPI.
For more details on what this encoding does, see https://en.wikipedia.org/wiki/Percent-encoding
Please comment if that does not satisfy your requirement or if you face any other issue.
Thanks

Extracting Android App ID from URL

I want to extract Android application ID from URL.
Examples of URLs are:
https://play.google.com/store/apps/details?id=com.opera.mini.native
https://play.google.com/store/apps/details?id=com.opera.mini.native&referrer=xxxx
I want to get the com.opera.mini.native substring from both URLs.
I tried to create a regex to parse the ID, but without success:
^.+details\?id=(.+)&?.+
The problem is that the regex returns com.opera.mini.native&referrer=xxxx for the second URL (for the first URL it works fine).
How can I change the regex to achieve my goal?
Thanks
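That's because you made & optional, so the greedy (.+) is free to consume it and everything after it. Capture everything up to the next & instead: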
".+\\bdetails\\?id=([^&]+)"
Regex can be:
(?<=[?&]id=)[^&]+
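A minimal Java sketch of the lookbehind version, using the two URLs from the question. The class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlayStoreId {
    // Matches the value that follows "?id=" or "&id=", up to the next "&" or the end.
    private static final Pattern ID = Pattern.compile("(?<=[?&]id=)[^&]+");

    // Returns the id parameter, or null if the URL has none.
    static String appId(String url) {
        Matcher m = ID.matcher(url);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(appId("https://play.google.com/store/apps/details?id=com.opera.mini.native"));
        System.out.println(appId("https://play.google.com/store/apps/details?id=com.opera.mini.native&referrer=xxxx"));
    }
}
```

Because the lookbehind requires ? or & directly before id=, a parameter such as androidid= would not be picked up by mistake.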

Other website addresses do not match as URL

Hi, I am using this code to match the edit box text (where the user enters web addresses):
(Patterns.WEB_URL.matcher(txt_Editbox).matches())
But when the user enters this URL:
http://website.info?ques==two&t=p
it isn't accepted as a URL; it is treated as plain text. Could anyone help me solve this or suggest another approach?
Thank you.
The URL is incorrect. It's missing a URL path separator /. Try matching with:
http://website.info/?ques=two&t=p
I resolved this problem. Instead of using
(Patterns.WEB_URL.matcher(txt_Editbox).matches())
I used
String urlname = "^(https?|ftp|file)://.+$";
Matcher matcherObj = Pattern.compile(urlname).matcher(txt_Editbox);
This pattern accepts any text that starts with an http, https, ftp, or file scheme followed by ://, so it does not validate the rest of the address. With it, I can now open http://website.info?ques==two&t=p in my WebView.
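To make the trade-off concrete, here is a small sketch of that permissive check in plain Java (no Android classes). The class and method names are hypothetical. It only tests for a known scheme followed by at least one character, so malformed addresses pass as well.

```java
import java.util.regex.Pattern;

class LooseUrlCheck {
    // Same pattern as in the answer: scheme, "://", then anything at all.
    private static final Pattern URL_LIKE = Pattern.compile("^(https?|ftp|file)://.+$");

    static boolean looksLikeUrl(String text) {
        return URL_LIKE.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeUrl("http://website.info?ques==two&t=p")); // accepted
        System.out.println(looksLikeUrl("website.info"));                      // rejected: no scheme
        System.out.println(looksLikeUrl("http://!!!"));                        // accepted, although not a usable address
    }
}
```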

how to replace brackets in url with bracket encoding?

I need a regex pattern that will find brackets in URLs and replace them with their URL encoding.
For example a base url like:
http://www.mysite.com/bla/blabla/abc[1].txt
will be turned to:
http://www.mysite.com/bla/blabla/abc%5B1%5D.txt
Can anyone help, please?
EDIT1:
I originally used commons-httpclient to access this kind of URL.
When I use the first URL, I get an "escaped absolute path not valid" exception.
I can't use URLEncoder because when I do, I get a "host parameter is null" exception.
The following line should do the trick
String s = URLEncoder.encode("http://www.mysite.com/bla/blabla/abc[1].txt", "UTF-8");
Have you tried URLEncoder.encode?
It is in the java.net package.
EDIT:
OK, I see. You cannot pass an entire URL to URLEncoder; it is meant for encoding individual values such as query parameters.
Try this instead:
URI uri = new URI("http", "www.mysite.com", "/bla/blabla/abc[1].txt",null);
System.out.println(uri.toASCIIString());
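Both routes can be sketched side by side. The class and method names below are hypothetical. The plain replace does what the question literally asks for but touches every bracket in the string, including any in an IPv6 host literal, whereas the multi-argument URI constructor quotes illegal characters in the path component only.

```java
import java.net.URI;
import java.net.URISyntaxException;

class BracketEscaping {
    // Literal replacement of both brackets; no regex is needed for this.
    static String viaReplace(String url) {
        return url.replace("[", "%5B").replace("]", "%5D");
    }

    // Lets java.net.URI quote characters that are illegal in the path.
    static String viaUri(String host, String path) {
        try {
            return new URI("http", host, path, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot build URI", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(viaReplace("http://www.mysite.com/bla/blabla/abc[1].txt"));
        System.out.println(viaUri("www.mysite.com", "/bla/blabla/abc[1].txt"));
    }
}
```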

Search and replace "/" at end of url's using regular expressions in java

Below is my regular expression:
\\bhttps?://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]\\b
When the request URL is of the form http://www.example.com/ , the last character is not replaced in my shortener URL and the / is appended at the end.
The regex is not able to find the last /.
Please help with this.
I think that / would be a word boundary, so maybe it works better if you add a ? to the end, so it reads:
\\bhttps?://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]\\b?
What about:
if (url.endsWith("/"))
    url = url.substring(0, url.length() - 1);
Or, if you need to use regular expressions, you can do something like this:
url = url.replaceAll("(\\bhttps?://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*)/(\\b?)","$1$2");
If all you want is to remove the trailing / (which is what your question directly asks), and you already know the URL ends with one, you can simply do:
url = url.substring(0, url.lastIndexOf('/'));
Without that guarantee this cuts at the last / wherever it is, so http://www.example.com would become http:/.
Remember to KISS often.
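You could simply use:
url = url.replaceAll("/+$", "");
A / needs no escaping in a Java regex, and "\/" is not a valid escape in a Java string literal, so the pattern is written without the backslash.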
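A short sketch of the anchored-regex approach, with a hypothetical helper name. It removes one or more slashes at the end of the string and leaves URLs without a trailing slash alone.

```java
class TrailingSlash {
    // "/+$" matches a run of slashes only when it sits at the end of the input.
    static String strip(String url) {
        return url.replaceAll("/+$", "");
    }

    public static void main(String[] args) {
        System.out.println(strip("http://www.example.com/"));
        System.out.println(strip("http://www.example.com/path"));
    }
}
```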
