What is the regular expression to find the path param from the URL?
http://localhost:8080/domain/v1/809pA8
https://localhost:8080/domain/v1/809pA8
I want to retrieve the value (809pA8) from the above URLs using a regular expression; Java is preferable.
I would suggest you do something like
url.substring(url.lastIndexOf('/') + 1);
If you really prefer regexps, you could do
Matcher m = Pattern.compile("/([^/]+)$").matcher(url);
if (m.find())
value = m.group(1);
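A minimal, self-contained sketch of both approaches above, run against the two sample URLs. The class name PathParamDemo and the method names are made up for illustration, and it assumes the URL carries no trailing slash or query string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PathParamDemo {
    // Compiled once; captures everything after the last '/' in the input.
    private static final Pattern LAST_SEGMENT = Pattern.compile("/([^/]+)$");

    // Plain string approach: take whatever follows the last slash.
    static String lastSegmentBySubstring(String url) {
        return url.substring(url.lastIndexOf('/') + 1);
    }

    // Regex approach: returns null when the input ends in '/' or has no slash.
    static String lastSegmentByRegex(String url) {
        Matcher m = LAST_SEGMENT.matcher(url);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://localhost:8080/domain/v1/809pA8",
            "https://localhost:8080/domain/v1/809pA8"
        };
        for (String url : urls) {
            System.out.println(lastSegmentBySubstring(url) + " " + lastSegmentByRegex(url));
        }
    }
}
```

The two differ on edge cases: for an input ending in a slash, the substring version returns an empty string while the regex version returns null.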
I would try:
String url = "http://localhost:8080/domain/v1/809pA8";
String value = String.valueOf(url.subSequence(url.lastIndexOf('/'), url.length()-1));
No need for regex here, I think.
EDIT: I'm sorry I made a mistake:
String url = "http://localhost:8080/domain/v1/809pA8";
String value = String.valueOf(url.subSequence(url.lastIndexOf('/')+1, url.length()));
See this code working here: https://ideone.com/E30ddC
For your simple case, regex is overkill, as others noted. But if you have more cases, and that is why you prefer regex, take a look at Spring's AntPathMatcher#extractUriTemplateVariables, if you're using Spring. It's actually better equipped for extracting path variables than a raw regex. Here are some good examples.
Related
I would like to use Java regex to match a domain of a url, for example,
for www.table.google.com, I would like to get 'google' out of the url, namely, the second last word in this URL string.
Any help will be appreciated !!!
It really depends on the complexity of your inputs...
Here is a pretty simple regex:
.+\\.(.+)\\..+
It captures whatever sits between the last two dots (\\.).
And here are some examples for that pattern: https://regex101.com/r/L52oz6/1.
As you can see, it works for simple inputs but not for complex urls.
But why reinvent the wheel? There are plenty of really good libraries that correctly parse any complex URL. Still, for simple inputs a small regex is easily built. If that does not solve the problem for your inputs, please report back and I will adjust the regex pattern.
Note that you can also just use simple splitting like:
String[] elements = input.split("\\.");
String secondToLastElement = elements[elements.length - 2];
But don't forget the index-bound checking.
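As a rough sketch of the splitting approach with the bounds check included: the class and method names below are hypothetical, and returning an Optional for inputs with fewer than two elements is just one possible choice.

```java
import java.util.Optional;

class SecondToLastLabel {
    // Returns the second-to-last dot-separated element,
    // or an empty Optional when the input has fewer than two elements.
    static Optional<String> secondToLast(String input) {
        String[] elements = input.split("\\.");
        if (elements.length < 2) {
            return Optional.empty();
        }
        return Optional.of(elements[elements.length - 2]);
    }

    public static void main(String[] args) {
        System.out.println(secondToLast("www.table.google.com"));
        System.out.println(secondToLast("localhost"));
    }
}
```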
Or, if you are looking for a very quick solution, then walk through the input starting from the last position. Work your way backwards until you find the first dot, then continue until you find the second dot. Then extract that part with input.substring(index1, index2);.
There is also already a delegate method for exactly that purpose, namely String#lastIndexOf (see the documentation).
Take a look at this code snippet:
String input = ...
int indexLastDot = input.lastIndexOf('.');
// Start one position before the last dot, otherwise the same dot is found again.
int indexSecondToLastDot = input.lastIndexOf('.', indexLastDot - 1);
// substring takes (beginIndex, endIndex); the + 1 skips the dot itself.
String secondToLastWord = input.substring(indexSecondToLastDot + 1, indexLastDot);
Also don't forget bound checking: lastIndexOf returns -1 when no dot is found.
The advantage of this approach is that it is really fast: it scans the string directly, without compiling a pattern or building an intermediate array the way split does.
My attempt:
(?<scheme>https?:\/\/)?(?<subdomain>\S*?)(?<domainword>[^.\s]+)(?<tld>\.[a-z]+|\.[a-z]{2,3}\.[a-z]{2,3})(?=\/|$)
Demo. Works correctly for:
http://www.foo.stackoverflow.com
http://www.stackoverflow.com
http://www.stackoverflow.com/
http://stackoverflow.com
https://www.stackoverflow.com
www.stackoverflow.com
stackoverflow.com
http://www.stackoverflow.com
http://www.stackoverflow.co.uk
foo.www.stackoverflow.com
foo.www.stackoverflow.co.uk
foo.www.stackoverflow.co.uk/a/b/c
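For use from Java, the pattern above could be wired up roughly as follows. This is only a sketch: the class and method names are invented, the \/ escapes are written as plain / because Java needs no escaping there, and find() is used so that a trailing path is tolerated.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DomainWordDemo {
    // Same pattern as above, split over two string literals for readability.
    private static final Pattern DOMAIN = Pattern.compile(
        "(?<scheme>https?://)?(?<subdomain>\\S*?)(?<domainword>[^.\\s]+)"
        + "(?<tld>\\.[a-z]+|\\.[a-z]{2,3}\\.[a-z]{2,3})(?=/|$)");

    // Returns the named group "domainword", or null when nothing matches.
    static String domainWord(String url) {
        Matcher m = DOMAIN.matcher(url);
        return m.find() ? m.group("domainword") : null;
    }

    public static void main(String[] args) {
        System.out.println(domainWord("http://www.foo.stackoverflow.com"));
        System.out.println(domainWord("foo.www.stackoverflow.co.uk/a/b/c"));
        System.out.println(domainWord("www.table.google.com"));
    }
}
```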
private static final Pattern URL_MATCH_GET_SECOND_AND_LAST =
    Pattern.compile("www\\.(.*)\\.google\\.(.*)", Pattern.CASE_INSENSITIVE);
String sURL = "www.table.google.com";
// One matcher is enough; find() positions it on the first match.
Matcher matchURL = URL_MATCH_GET_SECOND_AND_LAST.matcher(sURL);
if (matchURL.find()) {
    String sFirst = matchURL.group(1);  // "table"
    String sSecond = matchURL.group(2); // "com"
}
I have a regular expression in apex that is only grabbing part of the link I need in a string. I need it to grab the entire link.
Here is what I'm working with:
String myvar = 'this is an example http://test.com/testing/123654123%0A%0A%0A%';
String myvar1 = '(?:(?:(?:[a-z0-9]{3,9}:(?://)?)(?:[-;:&=+$,w]+#)?[a-z0-9.-]+|(?:www.|[-;:&=+$??,w]+#)[a-z0-9.-]+)((?:/[+~%/.w-]*)?\\??(?:[-+=&;%#.w]*)#?w*)?)';
Pattern MyPattern = Pattern.compile(myvar1);
Matcher MyMatcher = MyPattern.matcher(myvar);
while (MyMatcher.find()) {
System.debug(MyMatcher.group());
Location = MyMatcher.group();
}
This is only returning http://test.com/
I need http://test.com/testing/123654123
How can I modify the regular expression to provide the complete link?
I just need to modify my existing regex to accomplish this. How can I keep as much of the regular expression I'm using as possible?
(?:(?:(?:[a-z0-9]{3,9}:(?://)?)(?:[-;:&=+$,w]+#)?[a-z0-9.-]+|(?:www.|[-;:&=+$??,w]+#)[a-z0-9.-]+)((?:/[+~%/.w-]*)?\\??(?:[-+=&;%#.w]*)#?w*)?)
Use this regex :
https?:\/\/[a-zA-Z0-9.\/-]*
Online demo http://regexr.com/3d7j7
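To see what that simpler regex picks out of the sample string, here is a small sketch in plain Java. Apex's Pattern and Matcher are modelled on Java's, so it should carry over, but that is an assumption and it has not been run in Apex; the class and method names are made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinkExtractDemo {
    // '/' needs no escaping in Java, so \/ from the answer is written as /.
    private static final Pattern LINK = Pattern.compile("https?://[a-zA-Z0-9./-]*");

    // Returns the first link found in the text, or null if there is none.
    static String firstLink(String text) {
        Matcher m = LINK.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String myvar = "this is an example http://test.com/testing/123654123%0A%0A%0A%";
        System.out.println(firstLink(myvar));
    }
}
```

Because % is not in the character class, the match stops right before the first %0A, which is what the question wants here. The same property means a link with percent-encoded characters inside its path would be cut short.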
I have a very simple problem but I am new to Java Matcher and I am having a hard time figuring out how to use it for my specific problem.
I have a string which is something like this <not needed content>src="url"<not needed content>src="url2"<not needed content>
Where <'not needed content'> are the things I want to ignore in my string. I basically want to extract the URLs from the string.
My code currently looks like this
Pattern MY_PATTERN = Pattern.compile("\\src=\"(.*?)\\\"");
Matcher m = MY_PATTERN.matcher(content);
String s = "something";
while (m.find()) {
s = m.group(1);
}
I apologize for such basic, and possibly duplicate question.
Thank you.
Why not try a simpler pattern? Like this one:
Pattern.compile("src=\"(.*?)\"");
(Not tested, but should be better)
You can use either of the following regexes:
src="([^"]+)
src="(.+?)"
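A short sketch of the full loop, collecting every match instead of keeping only the last one. The class name, method name, and sample input are invented for illustration; it uses the first regex with a closing quote added.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SrcExtractor {
    // Group 1 is everything between src=" and the next double quote.
    private static final Pattern SRC = Pattern.compile("src=\"([^\"]+)\"");

    // Returns all src values in order of appearance.
    static List<String> extractSrcUrls(String content) {
        List<String> urls = new ArrayList<>();
        Matcher m = SRC.matcher(content);
        while (m.find()) {
            urls.add(m.group(1));
        }
        return urls;
    }

    public static void main(String[] args) {
        String content = "<img src=\"a.png\"><p>x</p><script src=\"http://e.com/b.js\">";
        System.out.println(extractSrcUrls(content));
    }
}
```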
Below is my regular expression :-
\\bhttps?://[-a-zA-Z0-9+&##/%?=~_|!:,.;]*[-a-zA-Z0-9+&##/%=~_|]\\b
When the request URL is of the form http://www.example.com/, the last character is not replaced in my shortener URL and a / is appended at the end.
The regex is not able to find the last /.
Please help with this.
I think the trailing \\b is the problem: there is no word boundary after a final /, so maybe it works better if you add a ? to the end, so it reads:
\\bhttps?://[-a-zA-Z0-9+&##/%?=~_|!:,.;]*[-a-zA-Z0-9+&##/%=~_|]\\b?
what about:
if(url.endsWith("/"))
url = url.substring(0,url.length()-1);
or if you need to use regular expressions you can do something like this:
url = url.replaceAll("(\\bhttps?://[-a-zA-Z0-9+&##/%?=~_|!:,.;]*)/$","$1");
If all you want is to remove the trailing / (which is what your question directly asks), and you already know the URL ends with one, you can simply do:
url = url.substring(0, url.lastIndexOf('/'));
Remember to KISS often.
You could simply use:
url = url.replaceAll("/+$", "");
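For comparison, a small sketch that puts the endsWith check and the replaceAll variant side by side. The class and method names are hypothetical; the only behavioural difference shown is that the regex strips a whole run of trailing slashes while the endsWith version strips one.

```java
class TrailingSlash {
    // Removes a single trailing slash, leaving other URLs untouched.
    static String stripOne(String url) {
        return url.endsWith("/") ? url.substring(0, url.length() - 1) : url;
    }

    // Removes every slash at the end of the string.
    static String stripAll(String url) {
        return url.replaceAll("/+$", "");
    }

    public static void main(String[] args) {
        System.out.println(stripOne("http://www.example.com/"));
        System.out.println(stripAll("http://www.example.com///"));
        System.out.println(stripOne("http://www.example.com/a"));
    }
}
```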
I have a very simple regex question. Suppose I have 2 conditions:
url =http://www.abc.com/cde/def
url =https://www.abc.com/sadfl/dsaf
How can I extract the baseUrl using regex?
Sample output:
http://www.abc.com
https://www.abc.com
Like this:
String baseUrl;
Pattern p = Pattern.compile("^(([a-zA-Z]+://)?[a-zA-Z0-9.-]+\\.[a-zA-Z]+(:\\d+)?)/");
Matcher m = p.matcher(str);
// find() rather than matches(): the pattern deliberately stops at the first path slash.
if (m.find())
baseUrl = m.group(1);
However, you should use the URI class instead, like this:
URI uri = new URI(str);
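One way the URI suggestion might be completed, as a sketch only: the class and method names are invented, and it assumes the input is an absolute URL with a scheme and an authority (host plus optional port).

```java
import java.net.URI;
import java.net.URISyntaxException;

class BaseUrlDemo {
    // Rebuilds scheme://authority from a parsed URI; the authority keeps any port.
    static String baseUrl(String str) {
        try {
            URI uri = new URI(str);
            return uri.getScheme() + "://" + uri.getAuthority();
        } catch (URISyntaxException e) {
            // Rethrown unchecked so callers are not forced to handle it.
            throw new IllegalArgumentException("Not a valid URI: " + str, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(baseUrl("http://www.abc.com/cde/def"));
        System.out.println(baseUrl("https://www.abc.com/sadfl/dsaf"));
    }
}
```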
A one liner without regexp:
String baseUrl = url.substring(0, url.indexOf('/', url.indexOf("//")+2));
/^(https?\:\/\/[^\/]+).*/$1/
This will capture ANYTHING that starts with http and $1 will contain everything from the beginning to the first / after the //
Except for write-and-throw-away scripts, you should always refrain from parsing complex syntaxes (e-mail addresses, urls, html pages, etc etc) using regexes.
Believe me, you will get bitten eventually.
I'm pretty sure that there is a Java class that will allow path manipulations, but if it has to be a regex,
https?://[^/]+
would work. (s? included to also handle https:)
Looks like the simplest solution to your two specific examples would be the pattern:
[^/]*//[^/]+
i.e.: non-slash (0 or more times), two slashes, non-slash (1 or more times). You can be stricter than that if you wish, as the two existing answers are doing in different ways -- one will reject e.g. URLs starting with ftp:, the other will reject domains with underscores (but accept URLs without a leading protocol://, thereby being even broader than mine in that respect). This variety of answers (all correct wrt your scant specs;-) should suggest to you that your specs are too vague and should be tightened.
Here's a regex that should satisfy the problem as given.
https?://[^/]*
I'm assuming you're asking this partly to gain more knowledge of regexes. If, however, you're trying to pull the host from a URL, it's arguably much more correct to use Java's more robust parsing methods:
String urlStr = "https://www.abc.com/stuff";
URL url = new URL(urlStr);
String host = url.getHost();
String protocol = url.getProtocol();
URL baseUrl = new URL(protocol, host, "");
This is better, as it should catch more cases if your input URL isn't as strict as described above.
Old post.. thought I might as well put a simple answer to a simple regex Q:
(http|https):\/\/(www.)?(\w+)?\.(\w+)?