Get last part of url using a regex - java

How do I get the last part of a URL using a regex? Here is my URL; I want the segment between the last forward slash and the #:
http://mycompany.com/test/id/1234#this
So I only want to get 1234.
I have the following, but it is not removing the '#this':
".*/(.*)(#|$)",
I need this while indexing data, so I don't want to use the URL class.

Just use URI:
final URI uri = URI.create(yourInput);
final String path = uri.getPath();
path.substring(path.lastIndexOf('/') + 1); // will return what you want
This also takes care of URIs with query strings, fragments, etc. In any event, when you have to extract any part of a URL (which is a URI), a regex is not what you want: URI has a dedicated parser and can handle it all for you, at a much lower cost.
Demo code that additionally uses Guava's Optional to handle the case where the URI has no path component:
import java.net.URI;
import com.google.common.base.Optional;
public class Demo {
public static void main(final String... args) {
final String url = "http://mycompany.com/test/id/1234#this";
final URI uri = URI.create(url);
final String path = Optional.fromNullable(uri.getPath()).or("/");
System.out.println(path.substring(path.lastIndexOf('/') + 1)); // prints 1234
}
}
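Without Guava, a minimal sketch of the same idea using java.util.Optional (Java 8+) could look like this. The class and method names are hypothetical. For an opaque URI such as a mailto: address, getPath() returns null, so the sketch falls back to "/" and returns an empty string; the query and fragment never reach the path.

```java
import java.net.URI;
import java.util.Optional;

public class LastSegment {
    // Hypothetical helper: returns the text after the last '/' of the URI path.
    // getPath() already excludes the query string and the fragment.
    static String lastSegment(String url) {
        URI uri = URI.create(url);
        // Opaque URIs (e.g. "mailto:...") have no path, so getPath() returns null
        String path = Optional.ofNullable(uri.getPath()).orElse("/");
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        System.out.println(lastSegment("http://mycompany.com/test/id/1234#this"));     // 1234
        System.out.println(lastSegment("http://mycompany.com/test/id/1234?x=1#this")); // 1234
        System.out.println(lastSegment("mailto:someone@example.com"));                 // (empty)
    }
}
```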

How about:
".*/([^/#]*)(#.*|$)"

An addition to @jtahlborn's answer, to also handle a query string:
".*/([^/?#]*)([?#].*|$)"
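A minimal sketch of applying such a regex with java.util.regex, assuming a variant that stops the captured segment at either '?' or '#' (a '|' inside a character class would only be a literal pipe). It also assumes the query and fragment contain no '/', because the greedy .*/ would otherwise run into them. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LastSegmentRegex {
    // Greedy ".*/" runs to the last '/'; group 1 takes everything up to an
    // optional query ('?') or fragment ('#'), which the second group consumes.
    private static final Pattern LAST_SEGMENT =
            Pattern.compile(".*/([^/?#]*)([?#].*|$)");

    // Returns the last path segment, or null if the input contains no '/'
    static String lastSegment(String url) {
        Matcher m = LAST_SEGMENT.matcher(url);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(lastSegment("http://mycompany.com/test/id/1234#this"));     // 1234
        System.out.println(lastSegment("http://mycompany.com/test/id/1234?x=1#this")); // 1234
    }
}
```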

Related

Need to get value after Domain name in url using java

We get a URL from a JSON response and open it in Chrome. The page loads, and there is a submit button; when we click it, it redirects to this URL:
https://www.google.com/AB1234
We need to retrieve only the "AB1234" value from the URL.
We tried the following code to get the value "AB1234":
String url = driver.getCurrentUrl();
int index=url.lastIndexOf("/");
String result = url.substring(0,index);
but this returns the initial part of the URL: https://www.google.com
You need to call substring with index + 1 as the start index, so that it returns everything after the last slash.
Try the code below:
String url = driver.getCurrentUrl();
int index = url.lastIndexOf("/");
String result = url.substring(index + 1);
To parse a URI, it's likely a good idea to use a URI parser.
Given http://example.com/bar
String path = URI.create(driver.getCurrentUrl()).getPath();
will get you '/bar'.
Given http://example.com/bar/mumble the same code gets '/bar/mumble'. It's unclear from your question whether this is what you want. Nevertheless, you should at least start the parse as above.
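If the goal is really "everything after the domain", a small sketch can return either the whole path without its leading slash or only the last segment. Both helper names are hypothetical, and the sketch assumes a hierarchical http(s) URL (for opaque URIs getPath() returns null). Unlike lastIndexOf on the raw string, both ignore a query string such as ?x=1.

```java
import java.net.URI;

public class PathAfterDomain {
    // Hypothetical helper: the whole path without its leading '/',
    // e.g. "bar/mumble" for http://example.com/bar/mumble
    static String pathAfterDomain(String url) {
        String path = URI.create(url).getPath();
        return path.startsWith("/") ? path.substring(1) : path;
    }

    // Hypothetical helper: only the last path segment, e.g. "mumble"
    static String lastSegment(String url) {
        String path = URI.create(url).getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        System.out.println(pathAfterDomain("https://www.google.com/AB1234"));     // AB1234
        System.out.println(pathAfterDomain("http://example.com/bar/mumble?x=1")); // bar/mumble
        System.out.println(lastSegment("http://example.com/bar/mumble?x=1"));     // mumble
    }
}
```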

How to build an absolute URL from a relative URL using Java?

I have a relative URL string and know the host and protocol. How can I build an absolute URL string?
It seems easy at first glance, until escaped characters come into play. I have to build an absolute URL from the Location header of a 302 HTTP(S) response.
Let's consider an example:
protocol: http
host: example.com
location: /path/path?param1=param1Data&param2= "
First I tried to build the URL string like this:
String urlString = protocol + "://" + host + location;
The URL class constructor does not escape spaces and double quotes:
new URL(urlString)
The URI class constructors fail with an exception:
new URI(urlString)
The URI.resolve method also fails with an exception.
Then I found that URI can escape parameters in the query string, but only with a few constructors, for example:
URI uri = new URI("http", "example.com",
"/path/path", "param1=param1Data&param2= \"", null);
This constructor needs the path and query as separate arguments, but I have a relative URL that is not split into path and query parts.
I could check whether the relative URL contains a "?" and treat everything before it as the path and everything after it as the query. But what if the relative URL contains no path, only a query, and the query itself contains a "?"? Then this will not work, because part of the query will be treated as the path.
Now I cannot figure out how to build an absolute URL from a relative URL.
These accepted answers seem just wrong:
how to get URL using relative path
Append relative URL to java.net.URL
Building an absolute URL from a relative URL in Java
It would be nice to also consider the scenario where the relative URL is given relative to a URL that has both a host and some path part:
initial URL: http://example.com/...some path...
relative: /home?...query here ...
It would be great to get a core Java solution, though using a good library is still an option.
The first ? indicates where the query string begins:
3.4. Query
[...] The query component is indicated by the first question mark (?) character and terminated by a number sign (#) character or by the end of the URI.
A simple approach (which won't handle fragments and assumes that the query string is always present) is:
String protocol = "http";
String host = "example.com";
String location = "/path/path?key1=value1&key2=value2";
String path = location.substring(0, location.indexOf("?"));
String query = location.substring(location.indexOf("?") + 1);
URI uri = new URI(protocol, host, path, query, null);
A better approach, which can also handle fragments, could be:
String protocol = "http";
String host = "example.com";
String location = "/path/path?key1=value1&key2=value2#fragment";
// Split the location without removing the delimiters
String[] parts = location.split("(?=\\?)|(?=#)");
String path = null;
String query = null;
String fragment = null;
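// Iterate over the parts to find path, query and fragment.
// Only the first ? starts the query and only the first # starts the fragment,
// so later parts are appended back to the component they belong to.
for (String part : parts) {
if (fragment != null) {
// Everything after the first # belongs to the fragment, even a ?
fragment += part;
} else if (part.startsWith("#")) {
// The fragment starts with #
fragment = part.substring(1);
} else if (query != null) {
// A later ? is still part of the query
query += part;
} else if (part.startsWith("?")) {
// The query string starts with ?
query = part.substring(1);
} else {
// Path is what's left
path = part;
}
}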
URI uri = new URI(protocol, host, path, query, fragment);
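For the Location-header case with a base URL that has its own path, a minimal sketch (all names hypothetical) splits the raw location at the first '#' and then at the first '?' before it, lets the five-argument URI constructor quote illegal characters such as spaces and double quotes, and resolves the result against the URL of the original request with URI.resolve. It assumes the location is not already percent-encoded: these constructors always quote '%', so an existing %20 would become %2520.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class RelativeLocation {
    // Hypothetical helper: turns a raw relative reference such as
    // "/path/path?param1=param1Data&param2= \"" into a properly quoted URI.
    static URI toRelativeUri(String location) {
        String rest = location;
        String fragment = null;
        int hash = rest.indexOf('#');
        if (hash >= 0) {
            // Everything after the first '#' is the fragment
            fragment = rest.substring(hash + 1);
            rest = rest.substring(0, hash);
        }
        String query = null;
        int question = rest.indexOf('?');
        if (question >= 0) {
            // Only the first '?' starts the query; later ones belong to it
            query = rest.substring(question + 1);
            rest = rest.substring(0, question);
        }
        try {
            // No scheme or authority: this yields a relative URI with
            // illegal characters (space, '"', ...) percent-encoded
            return new URI(null, null, rest, query, fragment);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Resolves the location against the URL of the original request
    static URI absolute(String base, String location) {
        return URI.create(base).resolve(toRelativeUri(location));
    }

    public static void main(String[] args) {
        // http://example.com/path/path?param1=param1Data&param2=%20%22
        System.out.println(absolute("http://example.com/some/path",
                "/path/path?param1=param1Data&param2= \""));
    }
}
```

Because the relative path starts with '/', resolve keeps the scheme and host of the base and replaces its path entirely; a location without a leading '/' would instead be resolved against the base path's directory.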
The best way seems to be to create a URI object with the multi piece constructors, and then convert it to a URL like so:
URI uri = new URI("https", "sitename.domain.tld", "/path/goes/here", "param1=value&param2=otherValue", null);
URL url = uri.toURL();
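Note that the four-argument constructor URI(scheme, host, path, fragment) treats its last argument as a fragment, so a query string has to go through the five-argument URI(scheme, authority, path, query, fragment) form. A small sketch comparing the two (class and method names are hypothetical):

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;

public class QueryVsFragment {
    // Five arguments: (scheme, authority, path, query, fragment)
    static String withQuery() {
        try {
            URI uri = new URI("https", "sitename.domain.tld", "/path/goes/here",
                    "param1=value&param2=otherValue", null);
            return uri.toURL().toString();
        } catch (URISyntaxException | MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Four arguments: (scheme, host, path, fragment), so the
    // "query" text actually ends up in the fragment after '#'
    static String withFourArguments() {
        try {
            URI uri = new URI("https", "sitename.domain.tld", "/path/goes/here",
                    "param1=value&param2=otherValue");
            return uri.toURL().toString();
        } catch (URISyntaxException | MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(withQuery());         // ...?param1=value&param2=otherValue
        System.out.println(withFourArguments()); // ...#param1=value&param2=otherValue
    }
}
```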

Java String truncate from URL address

I have a URL like: http://myfile.com/File1/beauty.png
I have to remove "http://site address/" from the main string.
That means the result should be File1/beauty.png
Note: the site address might be anything (e.g. some.com, some.org).
See here: http://docs.oracle.com/javase/tutorial/networking/urls/urlInfo.html
Just create a URL object out of your string and use URL.getPath() like this:
String s = new URL("http://myfile.com/File1/beauty.png").getPath();
If you don't need the slash at the beginning, you can remove it via s.substring(1);
Edit, in response to a comment:
If you are not allowed to use URL, this would be your best bet: Extract main domain name from a given url
See the accepted answer there. Basically, you have to get a TLD list, find the domain and subtract everything up to the end of the domain name.
If, as you say, you only want to use the standard String methods then this should do it.
public static String getPath(String url){
if(url.contains("://")){
url = url.substring(url.indexOf("://")+3);
url = url.substring(url.indexOf("/") + 1);
} else {
url = url.substring(url.indexOf("/")+1);
}
return url;
}
If the URL contains :// then we know that the string you are looking for comes after the third /. Otherwise, it comes after the first. If we do the following:
System.out.println(getPath("http://myfile.com/File1/beauty.png"));
System.out.println(getPath("https://myfile.com/File1/beauty.png"));
System.out.println(getPath("www1.myfile.com/File1/beauty.png"));
System.out.println(getPath("myfile.co.uk/File1/beauty.png"));
The output is:
File1/beauty.png
File1/beauty.png
File1/beauty.png
File1/beauty.png
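You can use the approach below to fetch the required data. Since split() takes a regex and the authority contains dots, quote it with Pattern.quote:
import java.net.URL;
import java.util.regex.Pattern;
String url = "http://myfile.org/File1/beauty.png";
URL u = new URL(url);
String[] arr = url.split(Pattern.quote(u.getAuthority()), 2);
System.out.println(arr[1]);
Output - /File1/beauty.png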
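String s = "http://www.freegreatpicture.com/files/146/26189-abstract-color-background.jpg";
// Skip the "//" after the scheme, then cut after the first "/" that follows the host
s = s.substring(s.indexOf("/", s.indexOf("//") + 2) + 1); // files/146/26189-abstract-color-background.jpg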

Extract part of a URL with URL/URI objects

I have a String holding a URL in this format: http://hello.world.com/service/sps/f4c0e810456t
And I would like to extract the last part of the URL, i.e. f4c0e810456t.
I can do it with substrings:
System.out.println(s.substring(s.lastIndexOf("/") + 1, s.length()));
Or with a regexp. However, I'm looking for something more elegant using URL/URI objects, but couldn't find anything.
Any ideas...?
If you can change the URL to "http://hello.world.com/service/sps/?f4c0e810456t" then you could use the getQuery() method (both on URL and URI).
Example with URL and split (which wraps the regular expression for you):
String address = "http://hello.world.com/service/sps/f4c0e810456t";
URL url = new URL(address);
String [] str = url.getPath().split("/");
String result = str[str.length-1];

Java : replacing text URL with clickable HTML link

I am trying to replace a URL contained in a String with a browser-compatible HTML link.
My initial String looks like this :
"hello, i'm some text with an url like http://www.the-url.com/ and I need to have an hypertext link !"
What I want to get is a String looking like :
"hello, i'm some text with an url like <a href="http://www.the-url.com/">http://www.the-url.com/</a> and I need to have an hypertext link !"
I can catch the URL with this line of code:
String withUrlString = myString.replaceAll(".*://[^<>[:space:]]+[[:alnum:]/]", "HereWasAnURL");
Maybe the regexp needs some correction, but it's working fine; I need to test it further.
So the question is: how do I keep the text caught by the regexp and just add what's needed to create the link: <a href="caught string">caught string</a>
Thanks in advance for your interest and responses !
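Try using:
myString.replaceAll("(\\w+://[^<>\\s]+[\\p{Alnum}/])", "<a href=\"$1\">$1</a>");
I didn't check your regex in depth, but note that Java does not support POSIX bracket expressions such as [:space:] (use \s and \p{Alnum} instead), and a leading .* would also swallow the text before the URL.
By using () you can create groups. $1 refers to the group with index 1.
$1 will be replaced by the URL.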
I asked a similar question: my question
Some examples: Capturing Text in a Group in a regular expression
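A minimal, self-contained sketch of the backreference approach, assuming a simplified URL regex: a scheme of word characters, "://", then non-whitespace characters other than < and >, ending in a letter, digit or '/'. Java uses \s and \p{Alnum} rather than POSIX [:space:] and [:alnum:]. The class name is hypothetical, and the sketch does not HTML-escape the surrounding text.

```java
public class LinkifyUrls {
    // Simplified, assumed URL regex: word characters, "://", then characters
    // that are neither whitespace nor '<' / '>', ending in a letter, digit or '/'.
    private static final String URL_REGEX = "(\\w+://[^<>\\s]+[\\p{Alnum}/])";

    static String linkify(String text) {
        // $1 refers to capture group 1 (the whole URL) and is used twice
        return text.replaceAll(URL_REGEX, "<a href=\"$1\">$1</a>");
    }

    public static void main(String[] args) {
        System.out.println(linkify(
                "hello, i'm some text with an url like http://www.the-url.com/ and I need to have an hypertext link !"));
    }
}
```

Requiring the last character to be a letter, digit or '/' keeps sentence punctuation such as a trailing period out of the link.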
public static String textToHtmlConvertingURLsToLinks(String text) {
if (text == null) {
return text;
}
String escapedText = HtmlUtils.htmlEscape(text);
return escapedText.replaceAll("(\\A|\\s)((http|https|ftp|mailto):\\S+)(\\s|\\z)",
"$1<a href=\"$2\">$2</a>$4");
}
There may be better regexes out there, but this does the trick as long as there is whitespace after the end of the URL or the URL is at the end of the text. This particular implementation also uses org.springframework.web.util.HtmlUtils to escape any other HTML that may have been entered.
For anybody looking for a more robust solution, I can suggest the Twitter Text libraries.
Replacing the URLs with this library works like this:
new Autolink().autolink(plainText)
The code below replaces links starting with "http" or "https", links starting with just "www.", and finally email links.
import java.util.Objects;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
Pattern httpLinkPattern = Pattern.compile("(http[s]?)://(www\\.)?([\\S&&[^.@]]+)(\\.[\\S&&[^@]]+)");
Pattern wwwLinkPattern = Pattern.compile("(?<!http[s]?://)(www\\.+)([\\S&&[^.@]]+)(\\.[\\S&&[^@]]+)");
Pattern mailAddressPattern = Pattern.compile("[\\S&&[^@]]+@([\\S&&[^.@]]+)(\\.[\\S&&[^@]]+)");
String textWithHttpLinksEnabled =
"ajdhkas www.dasda.pl/asdsad?asd=sd www.absda.pl maiandrze@asdsa.pl klajdld http://dsds.pl httpsda http://www.onet.pl https://www.onsdas.plad/dasda";
if (Objects.nonNull(textWithHttpLinksEnabled)) {
Matcher httpLinksMatcher = httpLinkPattern.matcher(textWithHttpLinksEnabled);
textWithHttpLinksEnabled = httpLinksMatcher.replaceAll("<a href=\"$0\">$0</a>");
final Matcher wwwLinksMatcher = wwwLinkPattern.matcher(textWithHttpLinksEnabled);
textWithHttpLinksEnabled = wwwLinksMatcher.replaceAll("<a href=\"http://$0\">$0</a>");
final Matcher mailLinksMatcher = mailAddressPattern.matcher(textWithHttpLinksEnabled);
textWithHttpLinksEnabled = mailLinksMatcher.replaceAll("<a href=\"mailto:$0\">$0</a>");
System.out.println(textWithHttpLinksEnabled);
}
Prints:
ajdhkas <a href="http://www.dasda.pl/asdsad?asd=sd">www.dasda.pl/asdsad?asd=sd</a> <a href="http://www.absda.pl">www.absda.pl</a> <a href="mailto:maiandrze@asdsa.pl">maiandrze@asdsa.pl</a> klajdld <a href="http://dsds.pl">http://dsds.pl</a> httpsda <a href="http://www.onet.pl">http://www.onet.pl</a> <a href="https://www.onsdas.plad/dasda">https://www.onsdas.plad/dasda</a>
Assuming your regex works to capture the correct info, you can use backreferences in your substitution. See the Java regexp tutorial.
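In that case, you'd do the following (in a Java replacement string, groups are referenced as $1, not \1):
myString.replaceAll(....., "<a href=\"$1\">$1</a>")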
In case of multiline text you can use this:
text.replaceAll("(?m)(^|\\s)((http|https|ftp|mailto):\\S+)(\\s|$)",
"$1<a href='$2'>$2</a>$4");
And here is a full example from my code, where I need to show users' posts that contain URLs:
private static final Pattern urlPattern = Pattern.compile(
"(?m)(^|\\s)((http|https|ftp|mailto):\\S+)(\\s|$)");
String userText = ""; // user content from db
String replacedValue = HtmlUtils.htmlEscape(userText);
replacedValue = urlPattern.matcher(replacedValue).replaceAll("$1<a href='$2'>$2</a>$4");
replacedValue = StringUtils.replace(replacedValue, "\n", "<br>");
System.out.println(replacedValue);
