Java regex for Google Maps URL?

I want to parse all Google Maps links inside a String. The format is as follows:
1st example
https://www.google.com/maps/place/white+house/#38.8976763,-77.0387185,17z/data=!3m1!4b1!4m5!3m4!1s0x89b7b7bcdecbb1df:0x715969d86d0b76bf!8m2!3d38.8976763!4d-77.0365298
https://www.google.com/maps/place/white+house/#38.8976763,-77.0387185,17z
https://www.google.com/maps/place//#38.8976763,-77.0387185,17z
https://maps.google.com/maps/place//#38.8976763,-77.0387185,17z
https://www.google.com/maps/place/#38.8976763,-77.0387185,17z
https://google.com/maps/place/#38.8976763,-77.0387185,17z
http://google.com/maps/place/#38.8976763,-77.0387185,17z
https://www.google.com.tw/maps/place/#38.8976763,-77.0387185,17z
These are all valid Google Maps URLs (linking to the White House).
Here is what I tried:
String gmapLinkRegex = "(http|https)://(www\\.)?google\\.com(\\.\\w*)?/maps/(place/.*)?#(.*z)[^ ]*";
Pattern patternGmapLink = Pattern.compile(gmapLinkRegex, Pattern.CASE_INSENSITIVE);
Matcher m = patternGmapLink.matcher(s);
while (m.find()) {
    logger.info("group0 = {}", m.group(0));
    String place = m.group(4);
    place = StringUtils.stripEnd(place, "/"); // remove trailing '/'
    place = StringUtils.removeStart(place, "place/"); // remove leading 'place/' prefix
    logger.info("place = '{}'", place);
    String latLngZ = m.group(5);
    logger.info("latLngZ = '{}'", latLngZ);
}
It works in simple situations, but it is still buggy. For example:
It needs post-processing to grab the optional place information.
And it cannot extract two URLs from one line, such as:
s = "https://www.google.com/maps/place//#38.8976763,-77.0387185,17z " +
" and http://google.com/maps/place/#38.8976763,-77.0387185,17z";
It should find two URLs, but the regex matches the whole line.
The requirements:
The whole URL should be matched in group(0) (including the trailing data part in the 1st example).
In the 1st example, if the zoom level 17z is removed, it is still a valid Google Maps URL, but my regex cannot match it.
It should be easy to extract the optional place info.
Lat/Lng extraction is a must; the zoom level is optional.
It should be able to parse multiple URLs in one line.
It should be able to process maps.google.com(.xx)/maps; I tried (www|maps\.)? but it still seems buggy.
Any suggestions to improve this regex? Thanks a lot!

The dot-asterisk .* will always match everything up to the end of the last URL.
You need "tighter" regexes, which match a single URL but not several with anything in between.
The "[^ ]*" might swallow the next URL if it is separated by something other than " ", such as a line break, a tab or a non-breaking space.
I propose (sorry, not tested in Java) to use "anything but #", "digit, minus, comma or dot" and "an optional special string followed by a tailored charset, many times".
"(http|https)://(www\.)?google\.com(\.\w*)?/maps/(place/[^#]*)?#([0123456789\.,-]*z)(\/data=[\!:\.\-0123456789abcdefmsx]+)?"
I tested the one above on a Perl-compatible regex engine (Notepad++).
Please adapt it yourself if I guessed anything wrong. The explicit list of digits can probably be replaced by "\d"; I tried to minimise assumptions about the regex flavor.
In order to match "URL" or "URL and URL", please use a variable storing the regex, then build "(URL and )*URL", replacing "URL" with the regex variable. (Assuming this is possible in Java.) If the question is how to then retrieve the multiple matches: that is Java, I cannot help. Let me know and I will delete this answer, so as not to provoke deserved downvotes ;-)
(Edited to catch the data part in, previously not seen, first example, first line; and the multi URLs in one line.)
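In Java, getting several matches out of one line is a matter of calling Matcher.find() in a loop, provided each match is kept tight. The sketch below is an illustration under stated assumptions, not a tested drop-in: the class, method and group names are made up, it only covers the /maps/place/ form with an optional two-letter country suffix such as .tw, and because real Google Maps URLs put @ before the coordinates while the samples above show #, it accepts either character. Named groups (Java 7+) replace the place/ post-processing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GmapLinks {

    // Scheme, optional www./maps. prefix, optional 2-letter country suffix, /maps/place/,
    // optional place name, then lat,lng[,zoom z] and an optional /data= tail.
    static final Pattern GMAP = Pattern.compile(
            "https?://(?:(?:www|maps)\\.)?google\\.com(?:\\.[a-z]{2})?/maps/place/"
            + "(?<place>[^/@#\\s]*)/*"                                    // place name, may be empty
            + "[@#](?<lat>-?\\d+(?:\\.\\d+)?),(?<lng>-?\\d+(?:\\.\\d+)?)" // '@' in real URLs, '#' in the samples
            + "(?:,(?<zoom>\\d+(?:\\.\\d+)?)z)?"                          // optional zoom level
            + "(?:/data=\\S*)?",                                          // optional data part, up to whitespace
            Pattern.CASE_INSENSITIVE);

    /** Parts of one matched URL; zoom is null when the URL has no zoom level. */
    static final class GmapLink {
        final String url, place, lat, lng, zoom;

        GmapLink(String url, String place, String lat, String lng, String zoom) {
            this.url = url;
            this.place = place;
            this.lat = lat;
            this.lng = lng;
            this.zoom = zoom;
        }
    }

    /** Returns every Google Maps place URL found in the text, in order. */
    static List<GmapLink> findAll(String text) {
        List<GmapLink> links = new ArrayList<>();
        Matcher m = GMAP.matcher(text);
        while (m.find()) { // each find() resumes right after the previous match
            links.add(new GmapLink(m.group(), m.group("place"),
                    m.group("lat"), m.group("lng"), m.group("zoom")));
        }
        return links;
    }

    public static void main(String[] args) {
        String s = "https://www.google.com/maps/place//#38.8976763,-77.0387185,17z "
                + " and http://google.com/maps/place/#38.8976763,-77.0387185,17z";
        for (GmapLink link : findAll(s)) {
            System.out.println(link.url + " place='" + link.place + "' lat=" + link.lat
                    + " lng=" + link.lng + " zoom=" + link.zoom);
        }
    }
}
```

Because the place group cannot cross a slash or whitespace and the data part stops at whitespace, two URLs on one line come back as two matches. Punctuation glued directly to the end of a data part would still be swallowed by \S*.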

I wrote this regex to validate Google Maps links:
"(http:|https:)?\\/\\/(www\\.)?(maps.)?google\\.[a-z.]+\\/maps/?([\\?]|place/*[^#]*)?/*#?(ll=)?(q=)?(([\\?=]?[a-zA-Z]*[+]?)*/?#{0,1})?([0-9]{1,3}\\.[0-9]+(,|&[a-zA-Z]+=)-?[0-9]{1,3}\\.[0-9]+(,?[0-9]+(z|m))?)?(\\/?data=[\\!:\\.\\-0123456789abcdefmsx]+)?"
I tested it with the following list of Google Maps links:
String location1 = "http://www.google.com/maps/place/21.01196755,105.86306012";
String location2 = "https://www.google.com.tw/maps/place/#38.8976763,-77.0387185,17z";
String location3 = "http://www.google.com/maps/place/21.01196755,105.86306012";
String location4 = "https://www.google.com/maps/place/white+house/#38.8976763,-77.0387185,17z/data=!3m1!4b1!4m5!3m4!1s0x89b7b7bcdecbb1df:0x715969d86d0b76bf!8m2!3d38.8976763!4d-77.0365298";
String location5 = "https://www.google.com/maps/place/white+house/#38.8976763,-77.0387185,17z";
String location6 = "https://www.google.com/maps/place//#38.8976763,-77.0387185,17z";
String location7 = "https://maps.google.com/maps/place//#38.8976763,-77.0387185,17z";
String location8 = "https://www.google.com/maps/place/#38.8976763,-77.0387185,17z";
String location9 = "https://google.com/maps/place/#38.8976763,-77.0387185,17z";
String location10 = "http://google.com/maps/place/#38.8976763,-77.0387185,17z";
String location11 = "https://www.google.com/maps/place/#/data=!4m2!3m1!1s0x3135abf74b040853:0x6ff9dfeb960ec979";
String location12 = "https://maps.google.com/maps?q=New+York,+NY,+USA&hl=no&sll=19.808054,-63.720703&sspn=54.337928,93.076172&oq=n&hnear=New+York&t=m&z=10";
String location13 = "https://www.google.com/maps";
String location14 = "https://www.google.fr/maps";
String location15 = "https://google.fr/maps";
String location16 = "http://google.fr/maps";
String location17 = "https://www.google.de/maps";
String location18 = "https://www.google.com/maps?ll=37.0625,-95.677068&spn=45.197878,93.076172&t=h&z=4";
String location19 = "https://www.google.de/maps?ll=37.0625,-95.677068&spn=45.197878,93.076172&t=h&z=4";
String location20 = "https://www.google.com/maps?ll=37.0625,-95.677068&spn=45.197878,93.076172&t=h&z=4&layer=t&lci=com.panoramio.all,com.google.webcams,weather";
String location21 = "https://www.google.com/maps?ll=37.370157,0.615234&spn=45.047033,93.076172&t=m&z=4&layer=t";
String location22 = "https://www.google.com/maps?ll=37.0625,-95.677068&spn=45.197878,93.076172&t=h&z=4";
String location23 = "https://www.google.de/maps?ll=37.0625,-95.677068&spn=45.197878,93.076172&t=h&z=4";
String location24 = "https://www.google.com/maps?ll=37.0625,-95.677068&spn=45.197878,93.076172&t=h&z=4&layer=t&lci=com.panoramio.all,com.google.webcams,weather";
String location25 = "https://www.google.com/maps?ll=37.370157,0.615234&spn=45.047033,93.076172&t=m&z=4&layer=t";
String location26 = "http://www.google.com/maps/place/21.01196755,105.86306012";
String location27 = "http://google.com/maps/bylatlng?lat=21.01196022&lng=105.86298748";
String location28 = "https://www.google.com/maps/place/C%C3%B4ng+vi%C3%AAn+Th%E1%BB%91ng+Nh%E1%BA%A5t,+354A+%C4%90%C6%B0%E1%BB%9Dng+L%C3%AA+Du%E1%BA%A9n,+L%C3%AA+%C4%90%E1%BA%A1i+H%C3%A0nh,+%C4%90%E1%BB%91ng+%C4%90a,+H%C3%A0+N%E1%BB%99i+100000,+Vi%E1%BB%87t+Nam/#21.0121535,105.8443773,13z/data=!4m2!3m1!1s0x3135ab8ee6df247f:0xe6183d662696d2e9";


URLEncoder - what character set to use for empty space instead of %20 or +

I am trying to open a new email from my Java app:
String str=String.valueOf(email);
String body="This is body";
String subject="Hello worlds";
String newStr="mailto:"+str.trim()+"?subject="+URLEncoder.encode(subject,"UTF-8")+"&body="+URLEncoder.encode(body, "UTF-8")+"";
Desktop.getDesktop().mail(new URI(newStr));
Here is my URL encoding. As I cannot use the body or subject string in the URL without encoding them, my output has "+" instead of spaces. That is normal, I understand that. I was wondering if there is a way to show the subject and body normally in my message. I tried .replace("+", " ") but it is not working, as it gives an error.
I think there might be a different character set, but I am not sure.
That's the way URLEncoder works: it produces application/x-www-form-urlencoded output, in which a space is encoded as +.
One possible approach would be to replace all + with %20 after URLEncoder.encode(...).
Or you could rely on the URI constructor to encode your parameters correctly:
String scheme = "mailto";
String recipient = "recipient@snakeoil.com";
String subject = "The Meaning of Life";
String content = "..., the universe and all the rest is 42.\n Rly? Just kidding. Special characters: äöü";
String path = "";
String query = "subject=" + subject + "&body=" + content;
Desktop.getDesktop().mail(new URI(scheme, recipient, path, query, null));
Both solutions have issues:
In the first approach, you might replace actual + signs; with the second, you'll have issues with the & character.
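A minimal sketch of the first approach. URLEncoder performs form encoding: a space becomes + and a literal + becomes %2B, so after encoding every remaining + stands for a space and can be swapped for %20. The class and method names are hypothetical, and the Desktop call is left out so the code runs without a display.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class MailtoEncoding {

    /** Encodes a mailto header value so that spaces come out as %20 rather than '+'. */
    static String encodeMailtoValue(String value) {
        try {
            // Form encoding turns ' ' into '+' and a literal '+' into "%2B",
            // so each '+' left in the result came from a space.
            return URLEncoder.encode(value, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    /** Builds a mailto: string in the same shape as the question's code. */
    static String buildMailto(String recipient, String subject, String body) {
        return "mailto:" + recipient.trim()
                + "?subject=" + encodeMailtoValue(subject)
                + "&body=" + encodeMailtoValue(body);
    }

    public static void main(String[] args) {
        // prints mailto:someone@example.com?subject=Hello%20worlds&body=This%20is%20body
        System.out.println(buildMailto("someone@example.com", "Hello worlds", "This is body"));
    }
}
```

The resulting string can then be passed to Desktop.getDesktop().mail(new URI(...)) as in the question.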

GWT - extract text in between two characters

In GWT I have a servlet that returns an image from the database to the client. I need to extract part of the returned string to properly show the image. What is returned in Chrome, Firefox, and IE has a backslash in the src part, e.g. String s = "src=\""; which is not visible in the string below. Maybe the backslash is adding more quotes around the http string. I'm not sure.
What is returned in those 3 browsers is: <img style="-webkit-user-select: none;" src="http://localhost:8080/dashboardmanager/downloadfile?entityId=4886">
The Edge browser doesn't have the backslash in the src, so my method to extract the image doesn't work in Edge.
What edge returns:
String edge = "<img src=”http://localhost:8080/dashboardmanager/downloadfile?entityId=4886”>";
Problem: I need to extract the string below:
http://localhost:8080/dashboardmanager/downloadfile?entityId=4886
whether it is preceded by src= or by src=\
What I tried, which works in the browsers that return src=\":
String s = "src=\"";
int index = returned.indexOf(s) + s.length();
image.setUrl(returned.substring(index, returned.indexOf("\"", index + 1)));
But it fails in Edge because Edge doesn't return a backslash.
I do not have access to Pattern and Matcher in GWT.
How can I extract
http://localhost:8080/dashboardmanager/downloadfile?entityId=4886
out of the returned string above, keeping in mind that the entityId number will change?
EDIT:
I need a generic way to extract out http://localhost:8080/dashboardmanager/downloadfile?entityId=4886
when the string can look either of these two ways:
String edge = "<img src=”http://localhost:8080/dashboardmanager/downloadfile?entityId=4886”>";
What the other 3 browsers return: <img style="-webkit-user-select: none;" src="http://localhost:8080/dashboardmanager/downloadfile?entityId=4886">
public static void main(String[] args) {
String toParse = "<img style=\"-webkit-user-select: none;\" src=\"http://localhost:8080/dashboardmanager/downloadfile?entityId=4886\">";
String delimiter = "src=\"";
int index = toParse.indexOf(delimiter) + delimiter.length();
System.out.println(toParse.substring(index, toParse.length()).split("\"")[0]);
}
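The answer above only handles a straight quote. Since Pattern and Matcher are unavailable, here is a sketch that sticks to indexOf, charAt and substring, treats the straight quote and the curly quotes (U+201C, U+201D) seen in the Edge output as interchangeable, and skips an optional backslash. The class and method names are hypothetical; if a regex is preferred, GWT's own com.google.gwt.regexp.shared.RegExp is an alternative.

```java
public class ImgSrcExtractor {

    /** Straight and curly double quotes are all treated as attribute delimiters. */
    private static boolean isQuote(char c) {
        return c == '"' || c == '\u201C' || c == '\u201D';
    }

    /** Returns the value after "src=", or null when there is no src attribute. */
    static String extractSrc(String html) {
        int start = html.indexOf("src=");
        if (start < 0) {
            return null;
        }
        start += "src=".length();
        // Skip an optional backslash and the opening quote, whichever kind it is.
        while (start < html.length()
                && (html.charAt(start) == '\\' || isQuote(html.charAt(start)))) {
            start++;
        }
        // Read up to the closing quote, falling back to a backslash or the end of the tag.
        int end = start;
        while (end < html.length() && !isQuote(html.charAt(end))
                && html.charAt(end) != '\\' && html.charAt(end) != '>') {
            end++;
        }
        return html.substring(start, end);
    }

    public static void main(String[] args) {
        String edge = "<img src=\u201Dhttp://localhost:8080/dashboardmanager/downloadfile?entityId=4886\u201D>";
        System.out.println(extractSrc(edge));
    }
}
```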

Java String truncate from URL address

I have a URL like: http://myfile.com/File1/beauty.png
I have to remove http://site address/ from the main string.
That means the result should be File1/beauty.png
Note: the site address might be anything (e.g. some.com, some.org).
See here: http://docs.oracle.com/javase/tutorial/networking/urls/urlInfo.html
Just create a URL object out of your string and use URL.getPath() like this:
String s = new URL("http://myfile.com/File1/beauty.png").getPath();
If you don't need the slash at the beginning, you can remove it via s.substring(1, s.length());
Edit, according to comment:
If you are not allowed to use URL, this would be your best bet: Extract main domain name from a given url
See the accepted answer. Basically you have to get a TLD list, find the domain and subtract everything up to the end of the domain name.
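A sketch of the same idea with java.net.URI instead of URL (a swapped-in alternative, not what the answer uses): URI.create throws only the unchecked IllegalArgumentException, and getPath() works for any host, so the domain never has to be parsed by hand. Note that getPath() returns the decoded path, while getRawPath() keeps percent-escapes. The class and method names are hypothetical.

```java
import java.net.URI;

public class UrlPath {

    /** Returns the path after the host without its leading '/'; query and fragment are dropped. */
    static String pathWithoutLeadingSlash(String url) {
        String path = URI.create(url).getPath(); // e.g. "/File1/beauty.png"
        if (path == null) { // opaque URIs such as mailto: have no path
            return "";
        }
        return path.startsWith("/") ? path.substring(1) : path;
    }

    public static void main(String[] args) {
        System.out.println(pathWithoutLeadingSlash("http://myfile.com/File1/beauty.png")); // File1/beauty.png
    }
}
```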
If, as you say, you only want to use the standard String methods then this should do it.
public static String getPath(String url){
if(url.contains("://")){
url = url.substring(url.indexOf("://")+3);
url = url.substring(url.indexOf("/") + 1);
} else {
url = url.substring(url.indexOf("/")+1);
}
return url;
}
If the URL contains :// then we know that the string you are looking for comes after the third /. Otherwise, it comes after the first. If we do the following:
System.out.println(getPath("http://myfile.com/File1/beauty.png"));
System.out.println(getPath("https://myfile.com/File1/beauty.png"));
System.out.println(getPath("www1.myfile.com/File1/beauty.png"));
System.out.println(getPath("myfile.co.uk/File1/beauty.png"));
The output is:
File1/beauty.png
File1/beauty.png
File1/beauty.png
File1/beauty.png
You can use the approach below to fetch the required data.
String url = "http://myfile.org/File1/beauty.png";
URL u = new URL(url);
String[] arr = url.split(u.getAuthority());
System.out.println(arr[1]);
Output - /File1/beauty.png
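String s = "http://www.freegreatpicture.com/files/146/26189-abstract-color-background.jpg";
// skip the "//" after the scheme, then cut at the first '/' that follows the host
s = s.substring(s.indexOf("/", s.indexOf("//") + 2)); // "/files/146/26189-abstract-color-background.jpg"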

Java use regex to extract file name

I need to get a file name from file's absolute path (I am aware of the file.getName() method, but I cannot use it here).
EDIT: I cannot use file.getName() because I don't need only the file name; I need part of the file's path as well (but again, not the entire absolute path). I need the part of the file's path AFTER a certain given path.
Let's say the file is located in the folder:
C:\Users\someUser
On windows machine, if I make a pattern string as follows:
String patternStr = "C:\\Users\\someUser\\(.*+)";
I get an exception: java.util.regex.PatternSyntaxException: Illegal/unsupported escape sequence for backslash.
If I use Pattern.quote(File.pathSeparator):
String patternStr = "C:" + Pattern.quote(File.separator) + "Users" + Pattern.quote(File.separator) + "someUser" + Pattern.quote(File.separator) + "(.*+)";
the resulting pattern string is: C:\Q;\EUsers\Q;\EsomeUser\Q;\E(.*+) which of course doesn't match the actual file name "C:\Users\someUser\myFile.txt".
What am I missing here? What is the proper way to parse file name?
What is the proper way to parse file name?
The proper way to parse a file name is to use File(String). Using a regex for this is going to hard-wire platform dependencies into your code. That's a bad idea.
I know you said you can't use File.getName() ... but that is the proper solution. If you would care to say why you can't use File.getName() perhaps I could suggest an alternative solution.
If you indeed want to use a regular expression, you should use
String patternStr = "C:\\\\Users\\\\someUser\\\\(.*+)";
instead (note that every backslash is doubled once more).
Why? Your string literal
"C:\\Users\\someUser\\(.*+)"
is compiled to
C:\Users\someUser\(.*+)
Since \ is used for escaping in regular expressions too, you'll have to escape them "twice".
Regarding your edit:
You probably want to have a look at URI.relativize(). Example:
File base = new File("C:/Users/someUser");
File file = new File("C:/Users/someUser/someDir/someFile.txt");
String relativePath = base.toURI().relativize(file.toURI()).getPath();
System.out.println(relativePath); // prints "someDir/someFile.txt"
(Note that / works as file-separator on Windows machines too.)
Btw, I don't know what you have as File.separator on your system, but if it's set to \, then
"C:" + Pattern.quote(File.separator) + "Users" + Pattern.quote(File.separator) +
"someUser" + Pattern.quote(File.separator) + "(.*+)";
should yield
C:\Q\\EUsers\Q\\EsomeUser\Q\\E(.*+)
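Instead of quoting each separator, the whole base directory can be quoted at once, which sidesteps the double-escaping problem. A sketch assuming Windows-style input strings and a hypothetical helper name: the character class after the quoted base accepts either \ or /, and matches() yields null for paths outside the base, including look-alikes such as C:\Users\someUserX.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RelativeAfterBase {

    /** Returns the part of path after base (and one separator), or null if path is not under base. */
    static String relativeTo(String base, String path) {
        // Pattern.quote wraps base in \Q...\E, so its backslashes are matched literally;
        // the class [\\/] then accepts either separator right after the base directory.
        Pattern p = Pattern.compile(Pattern.quote(base) + "[\\\\/](.+)");
        Matcher m = p.matcher(path);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(relativeTo("C:\\Users\\someUser", "C:\\Users\\someUser\\myFile.txt")); // myFile.txt
    }
}
```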
String patternStr = "C:\\Users\\someUser\\(.*+)";
Backslashes (\) are escape characters in the Java Language. Your string contains the following after compilation:
C:\Users\someUser\(.*+)
This string is then parsed as a regex, which uses backslashes as an escape character as well. The regex parser tries to understand the escaped \U, \s and \(. One of them is incorrect regarding the regex syntax (hence your exception), and none of them are what you are trying to achieve.
Try
String patternStr = "C:\\\\Users\\\\someUser\\\\(.*+)";
If you want to solve it with a pattern, you need to escape your pattern properly:
String patternStr = "C:\\\\Users\\\\someUser\\\\(.*+)";
Try putting double-double backslashes in your pattern. You need a second backslash to escape one in the pattern, plus you need to double each one to escape them in the string literal. Hence you'll end up with something like:
String patternStr = "C:\\\\Users\\\\someUser\\\\(.*+)";
Scan backwards from the end of the string to the last file path separator, or to the beginning of the string if there is none.
File path separators can be / or \ (the code also stops at the volume separator :).
public static final char ALTERNATIVE_DIRECTORY_SEPARATOR_CHAR = '/';
public static final char DIRECTORY_SEPARATOR_CHAR = '\\';
public static final char VOLUME_SEPARATOR_CHAR = ':';
public static String getFileName(String path) {
if(path == null || path.isEmpty()) {
return path;
}
int length = path.length();
int index = length;
while(--index >= 0) {
char c = path.charAt(index);
if(c == ALTERNATIVE_DIRECTORY_SEPARATOR_CHAR || c == DIRECTORY_SEPARATOR_CHAR || c == VOLUME_SEPARATOR_CHAR) {
return path.substring(index + 1, length);
}
}
return path;
}
Try to keep it simple ;-).
Try this:
String subjectString = "C:\\Users\\someUser\\myFile.txt";
String resultString = null;
// Capture the trailing run of characters that are neither path separators (\ or /)
// nor other characters that Windows forbids in file names
Pattern regex = Pattern.compile("([^\\\\/:*?\"<>|\r\n]+$)");
Matcher regexMatcher = regex.matcher(subjectString);
if (regexMatcher.find()) {
    resultString = regexMatcher.group(1);
}
Output: myFile.txt
Also for the input C:/Users/someUser/myFile.txt
Output: myFile.txt
What am I missing here? What is the proper way to parse file name?
The proper way to parse a file name is to use the APIs that are already provided for the purpose. You've stated that you can't use File.getName(), without explanation. You are almost certainly mistaken about that.
I cannot use file.getName() because I don't need the file name only; I need the part of the file's path as well (but again, not the entire absolute path).
OK. So what you want is something like this.
// Canonicalize paths to deal with ".", "..", symlinks,
// relative files and case sensitivity issues.
String directory = new File(someDirectory).getCanonicalPath(); // throws IOException
String test = new File(somePathname).getCanonicalPath();
if (!directory.endsWith(File.separator)) {
directory += File.separator;
}
if (test.startsWith(directory)) {
String pathInDirectory = test.substring(directory.length());
...
}
Advantages:
No regexes needed.
Doesn't break if the path separator is something other than \.
Doesn't break if there are symbolic links on the path.
Doesn't break due to case sensitivity issues.
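On Java 7 and later, java.nio.file.Path offers the same check without string arithmetic. A sketch with a hypothetical method name: Path.startsWith compares whole name elements, so /tmp/basement is not treated as being inside /tmp/base, and relativize produces the remainder. Unlike getCanonicalPath(), normalize() only removes . and .. syntactically and does not resolve symbolic links; Path.toRealPath() would, but it requires the files to exist.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class PathUnderDirectory {

    /** Returns file's path relative to directory, or null if file is not inside directory. */
    static String pathInDirectory(String directory, String file) {
        // normalize() removes "." and ".." purely syntactically; it does not follow symlinks
        Path dir = Paths.get(directory).toAbsolutePath().normalize();
        Path target = Paths.get(file).toAbsolutePath().normalize();
        // startsWith compares whole name elements, so "/tmp/basement" is not under "/tmp/base"
        if (!target.startsWith(dir)) {
            return null;
        }
        return dir.relativize(target).toString();
    }

    public static void main(String[] args) {
        System.out.println(pathInDirectory("/tmp/base", "/tmp/base/sub/file.txt"));
    }
}
```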
Suppose the file name has special characters, especially when supporting Mac, where special characters are allowed in file names: on the server side, Path.GetFileName(fileName) fails and throws an error because of illegal characters in the path. The following code using a regex comes to the rescue.
The following regex takes care of 2 things:
In IE, when a file is uploaded, the file path contains the folders as well (i.e. c:\samplefolder\subfolder\sample.xls). The expression below replaces all folders with an empty string and retains the file name.
On a Mac, the Safari browser supplies only the file name, which may contain special characters.
var regExpDir = @"(^[\w]:\\)([\w].+\w\\)";
fileName = Regex.Replace(fileName, regExpDir, string.Empty);

Java : replacing text URL with clickable HTML link

I am trying to replace URLs in a String with browser-compatible links.
My initial String looks like this:
"hello, i'm some text with an url like http://www.the-url.com/ and I need to have an hypertext link !"
What I want to get is a String like this:
"hello, i'm some text with an url like <a href="http://www.the-url.com/">http://www.the-url.com/</a> and I need to have an hypertext link !"
I can catch the URL with this line of code:
String withUrlString = myString.replaceAll(".*://[^<>[:space:]]+[[:alnum:]/]", "HereWasAnURL");
Maybe the regex needs some correction, but it seems to work fine; I need to test it further.
So the question is how to keep the text captured by the regex and just add what's needed to create the link: <a href="captured string">captured string</a>
Thanks in advance for your interest and responses !
Try:
myString.replaceAll("(.*://[^<>[:space:]]+[[:alnum:]/])", "<a href=\"$1\">$1</a>");
I didn't check your regex.
By using () you can create groups; $1 refers to the group with index 1.
So $1 will be replaced by the URL.
I asked a similar question: my question
Some examples: Capturing Text in a Group in a regular expression
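A sketch of the whole-match variant in Java. Java regexes have no POSIX bracket expressions such as [:space:]; nested inside a character class, [:space:] is read as a set of the literal characters :, s, p, a, c and e. The pattern below therefore uses \s and \p{Alnum} instead, and $0 in the replacement stands for the entire match. The class name and the accepted schemes are assumptions, and user-supplied input should be HTML-escaped before linking.

```java
import java.util.regex.Pattern;

public class Linkify {

    // An http, https or ftp URL: "scheme://" plus a run of characters that are not whitespace,
    // '<', '>' or '"', ending in a letter, digit or '/' (so a trailing '.' or ',' is left out).
    private static final Pattern URL_PATTERN =
            Pattern.compile("\\b(?:https?|ftp)://[^\\s<>\"]+[\\p{Alnum}/]");

    /** Wraps every URL in an anchor; $0 is the whole match, used for both href and link text. */
    static String linkify(String text) {
        return URL_PATTERN.matcher(text).replaceAll("<a href=\"$0\">$0</a>");
    }

    public static void main(String[] args) {
        System.out.println(linkify("hello, i'm some text with an url like http://www.the-url.com/ and I need to have an hypertext link !"));
    }
}
```

Ending the match on a letter, digit or slash lets the greedy run backtrack past sentence punctuation, so "see http://x.com/a." links only http://x.com/a.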
public static String textToHtmlConvertingURLsToLinks(String text) {
if (text == null) {
return text;
}
String escapedText = HtmlUtils.htmlEscape(text);
return escapedText.replaceAll("(\\A|\\s)((http|https|ftp|mailto):\\S+)(\\s|\\z)",
"$1<a href=\"$2\">$2</a>$4");
}
There may be better regexes out there, but this does the trick as long as there is whitespace after the end of the URL or the URL is at the end of the text. This implementation also uses org.springframework.web.util.HtmlUtils to escape any other HTML that may have been entered.
For anybody who is searching a more robust solution I can suggest the Twitter Text Libraries.
Replacing the URLs with this library works like this:
new Autolink().autolink(plainText)
The code below replaces links starting with "http" or "https", links starting with just "www.", and finally email addresses.
Pattern httpLinkPattern = Pattern.compile("(http[s]?)://(www\\.)?([\\S&&[^.@]]+)(\\.[\\S&&[^@]]+)");
Pattern wwwLinkPattern = Pattern.compile("(?<!http[s]?://)(www\\.+)([\\S&&[^.@]]+)(\\.[\\S&&[^@]]+)");
Pattern mailAddressPattern = Pattern.compile("[\\S&&[^@]]+@([\\S&&[^.@]]+)(\\.[\\S&&[^@]]+)");
String textWithHttpLinksEnabled =
"ajdhkas www.dasda.pl/asdsad?asd=sd www.absda.pl maiandrze@asdsa.pl klajdld http://dsds.pl httpsda http://www.onet.pl https://www.onsdas.plad/dasda";
if (Objects.nonNull(textWithHttpLinksEnabled)) {
Matcher httpLinksMatcher = httpLinkPattern.matcher(textWithHttpLinksEnabled);
textWithHttpLinksEnabled = httpLinksMatcher.replaceAll("$0");
final Matcher wwwLinksMatcher = wwwLinkPattern.matcher(textWithHttpLinksEnabled);
textWithHttpLinksEnabled = wwwLinksMatcher.replaceAll("$0");
final Matcher mailLinksMatcher = mailAddressPattern.matcher(textWithHttpLinksEnabled);
textWithHttpLinksEnabled = mailLinksMatcher.replaceAll("$0");
System.out.println(textWithHttpLinksEnabled);
}
Prints:
ajdhkas www.dasda.pl/asdsad?asd=sd www.absda.pl maiandrze@asdsa.pl klajdld http://dsds.pl httpsda http://www.onet.pl https://www.onsdas.plad/dasda
Assuming your regex captures the correct info, you can use backreferences in your substitution; in a Java replacement string a backreference is written $1. See the Java regex tutorial.
In that case, you'd do
myString.replaceAll(....., "<a href=\"$1\">$1</a>")
In case of multiline text you can use this:
text.replaceAll("(\\s|\\^|\\A)((http|https|ftp|mailto):\\S+)(\\s|\\$|\\z)",
"$1<a href='$2'>$2</a>$4");
And here is full example of my code where I need to show user's posts with urls in it:
private static final Pattern urlPattern = Pattern.compile(
"(\\s|\\^|\\A)((http|https|ftp|mailto):\\S+)(\\s|\\$|\\z)");
String userText = ""; // user content from db
String replacedValue = HtmlUtils.htmlEscape(userText);
replacedValue = urlPattern.matcher(replacedValue).replaceAll("$1<a href='$2'>$2</a>$4");
replacedValue = StringUtils.replace(replacedValue, "\n", "<br>");
System.out.println(replacedValue);
