How to retrieve text from url using regex in Android - java

I have a URL and I am trying to extract the text before the third slash. I am quite new to regular expressions on Android. I believe the Pattern class is used for this, but I do not know how to apply it.
For instance, take http://name.mywebsite.com/images.... I only need everything before images. Could anyone point me in the right direction?

You can use the URI method getHost in Java. This example should help:
URI uri = new URI("http://name.mywebsite.com/images");
String host = uri.getHost();
// Returns "name.mywebsite.com" (no scheme and no trailing slash)
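Since the question asks for everything before the third slash, the scheme has to be put back in front of the host. A minimal sketch, assuming the input is an absolute URL; the class and method names (HostExtractor, beforeThirdSlash) are made up for illustration, and an explicit port would be dropped because only getHost is used:

```java
import java.net.URI;

final class HostExtractor {
    // Hypothetical helper: returns scheme + "://" + host,
    // i.e. everything before the third slash of an absolute URL.
    static String beforeThirdSlash(String url) {
        URI uri = URI.create(url); // throws IllegalArgumentException on bad syntax
        if (uri.getScheme() == null || uri.getHost() == null) {
            throw new IllegalArgumentException("Expected an absolute URL with a host: " + url);
        }
        return uri.getScheme() + "://" + uri.getHost();
    }

    public static void main(String[] args) {
        // Prints http://name.mywebsite.com
        System.out.println(beforeThirdSlash("http://name.mywebsite.com/images"));
    }
}
```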

Related

Java - Retain only the Rest API base URL of a Rest API

Small question regarding how to use java to retain only the base URL of a rest API please.
As input, many strings, all valid rest APIs.
For instance, the inputs:
https://some-host.com/v1/someapi
https://another-host.fr/api/compute
https://somewhere.host.com/public/api/v3/getsomething
I would like to retain only the scheme and host of each URL, that is, the https, the colon and slashes, and the host name (https://some-host.com for the first input). Everything that comes after the host should be discarded.
Currently, I am trying some kind of string.split based on the / character and then re-concatenating the arrays, but I have a feeling I am not going in the right direction.
What would be the most appropriate way please?
Thank you.
You could just try java.net.URL or java.net.URI. They behave pretty similarly.
For example:
URL url = new URL("http://example.com/a/b/c");
url.getProtocol();
url.getHost();
url.getPath();
or:
URI uri = new URI("http://example.com/a/b/c");
uri.getScheme();
uri.getHost();
uri.getPath();
There are several methods in both classes to extract lots of different parts.
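To get from those getters to the base URL the question asks for, the scheme and authority can be joined back together. A rough sketch, assuming every input is an absolute URL; RestBaseUrl and baseOf are illustrative names, not part of any library. It uses getAuthority rather than getHost so that an explicit port, if present, is kept:

```java
import java.net.URI;

final class RestBaseUrl {
    // Hypothetical helper: keeps scheme and authority, discards path, query and fragment.
    static String baseOf(String api) {
        URI uri = URI.create(api);
        // getAuthority() includes the port (and user info) when the URL has one.
        return uri.getScheme() + "://" + uri.getAuthority();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "https://some-host.com/v1/someapi",
            "https://another-host.fr/api/compute",
            "https://somewhere.host.com/public/api/v3/getsomething"
        };
        for (String input : inputs) {
            System.out.println(baseOf(input));
        }
    }
}
```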

fetch domain name from url

I have an example url:
www.google.com
I would like to fetch only "com" from this url but I completely don't know how to do this :(
Maybe somebody was struggling with this problem and found a solution?
We have to keep in mind that the example can be more advanced, for example
www.mydomain.com.pl
and from this we have to fetch "com.pl"
Maybe there is a library that can deal with it very easily...
Use Guava:
URI uri = URI.create("http://www.mydomain.com.pl");
InternetDomainName domainName = InternetDomainName.from(uri.getHost());
System.out.println(domainName.publicSuffix()); //com.pl
You cannot do this correctly without referencing the Public Suffix List (which Guava does).

Other website addresses do not match as URLs

Hi, I am using this code to check the text of an edit box (where the user enters web addresses):
(Patterns.WEB_URL.matcher(txt_Editbox).matches())
but when the user enters this URL:
http://website.info?ques==two&t=p
it is not accepted as a URL; it is treated as plain text. Could anyone help me solve this or suggest an alternative?
Thank you.
The URL is incorrect. It's missing a URL path separator /. Try matching with:
http://website.info/?ques=two&t=p
I resolved this problem. Instead of using
(Patterns.WEB_URL.matcher(txt_Editbox).matches())
I used
String urlname = "^(https?|ftp|file)://.+$";
Matcher matcherObj = Pattern.compile(urlname).matcher(txt_Editbox);
This pattern accepts any address that starts with http, https, ftp, or file followed by ://, and I can now open http://website.info?ques==two&t=p in my WebView.
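The snippet above creates the Matcher but does not show the call that produces the result. A minimal sketch of the full check using only java.util.regex; UrlCheck and looksLikeUrl are hypothetical names. Note that the pattern only checks the shape of the string (a known scheme followed by at least one character), not whether the address is reachable:

```java
import java.util.regex.Pattern;

final class UrlCheck {
    // Same loose pattern as in the answer: scheme, "://", then anything.
    private static final Pattern LOOSE_URL = Pattern.compile("^(https?|ftp|file)://.+$");

    // Hypothetical helper: true when the whole input has the expected shape.
    static boolean looksLikeUrl(CharSequence input) {
        return LOOSE_URL.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeUrl("http://website.info?ques==two&t=p")); // true
        System.out.println(looksLikeUrl("website.info"));                      // false
    }
}
```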

how to encapsulate a URL site into a Java object?

I have a Java/Grails app that needs to "read" the contents of a given URL to use it as an image, mostly to be dynamically resized.
My app already parses the url into HTML code using an implementation based on this post: http://www.roseindia.net/java/example/java/io/SourceViewer.shtml.
The class I built returns a String object that contains the HTML source code. Now I want to write this String into an object similar to a BufferedImage so I can display the captured URL into my new application.
Any ideas? Thanks in advance!
You can use a service like Bluga.net WebThumb and use Glen Smith's ThumbnailService to interface with it.
Or, if you really want to do this by yourself, you can use his Thumbnail Server (with an older version of the ThumbnailService), that he used to use before migrating to WebThumb ;)
Regards
You can create an Image object from a URL:
URL url = new URL("http://url.to/your/image.jpg");
Image image = Toolkit.getDefaultToolkit().createImage(url);
If by "take a 'print screen' of this site" you mean displaying the page in your app, take a look at this: http://www.java-tips.org/java-se-tips/javax.swing/how-to-display-pages-for-a-web-site-in-your-applic.html

Very Simple Regex Question

I have a very simple regex question. Suppose I have 2 conditions:
url =http://www.abc.com/cde/def
url =https://www.abc.com/sadfl/dsaf
How can I extract the baseUrl using regex?
Sample output:
http://www.abc.com
https://www.abc.com
Like this:
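String baseUrl = null;
// Group 1 captures the optional scheme, the host, and an optional port.
Pattern p = Pattern.compile("^(([a-zA-Z]+://)?[a-zA-Z0-9.-]+\\.[a-zA-Z]+(:\\d+)?)/");
Matcher m = p.matcher(str);
// find() rather than matches(): the pattern only covers the start of the URL.
if (m.find())
    baseUrl = m.group(1);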
However, you should use the URI class instead, like this:
URI uri = new URI(str);
A one liner without regexp:
String baseUrl = url.substring(0, url.indexOf('/', url.indexOf("//")+2));
/^(https?\:\/\/[^\/]+).*/$1/
This will capture ANYTHING that starts with http and $1 will contain everything from the beginning to the first / after the //
Except for write-and-throw-away scripts, you should always refrain from parsing complex syntaxes (e-mail addresses, URLs, HTML pages, etc.) using regexes.
Believe me, you will get bitten eventually.
I'm pretty sure that there is a Java class that will allow path manipulations, but if it has to be a regex,
https?://[^/]+
would work. (s? included to also handle https:)
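For reference, here is one way that pattern might be applied in Java. This is only a sketch: the class and method names (BaseUrlRegex, baseUrl) are invented for the example, and the leading ^ anchor is an addition not present in the pattern above, so that the match must start at the beginning of the string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BaseUrlRegex {
    // "https?://[^/]+" from the answer, anchored at the start (the ^ is an assumption).
    private static final Pattern BASE = Pattern.compile("^https?://[^/]+");

    // Hypothetical helper: returns the matched prefix, or null if the input
    // does not start with http:// or https://.
    static String baseUrl(String url) {
        Matcher m = BASE.matcher(url);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(baseUrl("http://www.abc.com/cde/def"));     // http://www.abc.com
        System.out.println(baseUrl("https://www.abc.com/sadfl/dsaf")); // https://www.abc.com
    }
}
```

Because [^/]+ stops only at a slash, a URL with a query string but no path (for example http://www.abc.com?x=1) would be returned unchanged, which is one reason the URI-based answers are more robust.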
Looks like the simplest solution to your two specific examples would be the pattern:
[^/]*//[^/]+
i.e.: non-slash (0 or more times), two slashes, non-slash (0 or more times). You can be stricter than that if you wish, as the two existing answers are doing in different ways -- one will reject e.g. URLs starting with ftp:, the other will reject domains with underscores (but accept URLs without a leading protocol://, thereby being even broader than mine in that respect). This variety of answers (all correct wrt your scant specs;-) should suggest to you that your specs are too vague and should be tightened.
Here's a regex that should satisfy the problem as given.
https?://[^/]*
I'm assuming you're asking this partly to gain more knowledge of regexes. If, however, you're trying to pull the host from a URL, it's arguably much more correct to use Java's more robust parsing methods:
String urlStr = "https://www.abc.com/stuff";
URL url = new URL(urlStr);
String host = url.getHost();
String protocol = url.getProtocol();
URL baseUrl = new URL(protocol, host, ""); // URL has no (protocol, host) constructor; pass an empty file part
This is better, as it should catch more cases if your input URL isn't as strict as described above.
Old post.. thought I might as well put a simple answer to a simple regex Q:
(http|https):\/\/(www.)?(\w+)?\.(\w+)?
