how to extract this using regex - java

I need to extract the domain from hostnames like these.
Example:
www.google.com
maps.google.com
maps.maps.google.com
I need to extract google.com from each of these.
How can I do this in Java?

Split on . and pick the last two bits.
String s = "maps.google.com";
String[] arr = s.split("\\.");
//should check the size of arr here
System.out.println(arr[arr.length-2] + '.' + arr[arr.length-1]);
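The size check mentioned in the comment above could look roughly like this. It is a minimal sketch: the class and method names are made up for illustration, and it assumes that "the domain" simply means the last two dot-separated labels (which is not true for suffixes such as co.uk).

```java
public class DomainSuffix {
    // Hypothetical helper: returns the last two dot-separated labels of host,
    // or host unchanged when it has fewer than two labels.
    static String lastTwoLabels(String host) {
        String[] labels = host.split("\\.");
        if (labels.length < 2) {
            return host; // nothing to strip, e.g. "localhost"
        }
        return labels[labels.length - 2] + "." + labels[labels.length - 1];
    }

    public static void main(String[] args) {
        String[] hosts = {"www.google.com", "maps.google.com", "maps.maps.google.com", "localhost"};
        for (String host : hosts) {
            System.out.println(host + " -> " + lastTwoLabels(host));
        }
    }
}
```

Returning the input unchanged for short hostnames is only one possible choice; throwing an exception or returning an Optional would work just as well.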

Assuming you want to get the last two labels of the hostname (the second-level domain plus the top-level domain), you could try this:
Pattern pat = Pattern.compile( ".*\\.([^.]+\\.[^.]+)" ) ;
Matcher mat = pat.matcher( "maps.google.com" ) ;
if( mat.find() ) {
System.out.println( mat.group( 1 ) ) ;
}
If it's the other way round, and you want everything except the last two parts of the domain (in your example: www, maps, and maps.maps), then just change the first line to:
Pattern pat = Pattern.compile( "(.*)\\.[^.]+\\.[^.]+" ) ;

Extracting a known substring from a string doesn't make much sense ;) Why would you write
String result = address.replaceAll("^.*(google\\.com)$", "$1");
when this is equivalent:
String result = "google.com";
If you need a test, try:
boolean isGoogle = address.endsWith(".google.com");
If you need the other part of a google address, this may help:
String googleSubDomain = address.replace(".google.com", "");
(hint - the first line of code is a solution for your problem!)

String str = "www.google.com";
// lastIndexOf returns -1 when a dot is missing, so substring then starts at 0 and no exception is thrown
System.out.println(str.substring(str.lastIndexOf(".", str.lastIndexOf(".") - 1) + 1));

Related

How to extract id from url ? Google sheet

I have the following URLs.
https://docs.google.com/spreadsheets/d/1mrsetjgfZI2BIypz7SGHMOfHGv6kTKTzY0xOM5c6TXY/edit#gid=1842172258
https://docs.google.com/a/example.com/spreadsheets/d/1mrsetjgfZI2BIypz7SGHMOfHGv6PTKTzY0xOM5c6TXY/edit#gid=1842172258
https://docs.google.com/spreadsheets/d/1mrsetjgfZI2BIypz7SGHMOfHGv6kTKTzY0xOM5c6TXY
For each URL, I need to extract the sheet id: 1mrsetjgfZI2BIypz7SGHMOfHGv6PTKTzY0xOM5c6TXY into a Java String.
I am thinking of using split, but it does not work for all test cases:
String string = "https://docs.google.com/spreadsheets/d/1mrsetjgfZI2BIypz7SGHMOfHGv6kTKTzY0xOM5c6TXY/edit#gid=1842172258";
String[] parts = string.split("/");
String res = parts[parts.length-2];
Log.d("hello res",res );
How can I make this work?
You can use the regex \/d\/(.*?)(\/|$) to solve your problem. If you look closely, the ID always sits between d/ and either the next / or the end of the line, so you can capture everything in between:
String[] urls = new String[]{
"https://docs.google.com/spreadsheets/d/1mrsetjgfZI2BIypz7SGHMOfHGv6kTKTzY0xOM5c6TXY/edit#gid=1842172258",
"https://docs.google.com/a/example.com/spreadsheets/d/1mrsetjgfZI2BIypz7SGHMOfHGv6PTKTzY0xOM5c6TXY/edit#gid=1842172258",
"https://docs.google.com/spreadsheets/d/1mrsetjgfZI2BIypz7SGHMOfHGv6kTKTzY0xOM5c6TXY"
};
String regex = "\\/d\\/(.*?)(\\/|$)";
Pattern pattern = Pattern.compile(regex);
for (String url : urls) {
Matcher matcher = pattern.matcher(url);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
}
Outputs
1mrsetjgfZI2BIypz7SGHMOfHGv6kTKTzY0xOM5c6TXY
1mrsetjgfZI2BIypz7SGHMOfHGv6PTKTzY0xOM5c6TXY
1mrsetjgfZI2BIypz7SGHMOfHGv6kTKTzY0xOM5c6TXY
It looks like the id you are looking for always follows "/spreadsheets/d/". If that is the case, you can update your code like this:
String string = "https://docs.google.com/spreadsheets/d/1mrsetjgfZI2BIypz7SGHMOfHGv6kTKTzY0xOM5c6TXY/edit#gid=1842172258";
String[] parts = string.split("spreadsheets/d/");
String result;
if(parts[1].contains("/")){
String[] parts2 = parts[1].split("/");
result = parts2[0];
}
else{
result=parts[1];
}
System.out.println("hello "+ result);
Using regex
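Pattern pattern = Pattern.compile("(?<=/d/)[^/]*");
Matcher matcher = pattern.matcher(url);
// find() must succeed before group() is read; this pattern has no capturing group, so use group()
if (matcher.find()) {
System.out.println(matcher.group());
}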
Using plain String methods
String result = url.substring(url.indexOf("/d/") + 3);
int slash = result.indexOf("/");
result = slash == -1 ? result
: result.substring(0, slash);
System.out.println(result);
Google seems to use fixed-length IDs. In your case they are 44 characters long, and the characters used are alphanumerics, - and _, so you can use this regex (shown here with Python's re module):
regex = r"[\w-]{44}"
match = re.search(regex, url)
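Since the question asks for Java, here is a rough Java counterpart to the ideas above. It is only a sketch: the class and method names are invented for illustration, and instead of relying on the 44-character length it assumes the ID is the run of letters, digits, _ and - that follows "/d/".

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SheetIdExtractor {
    // Captures the path segment after "/d/"; the allowed characters
    // (letters, digits, '_' and '-') are an assumption taken from the answer above.
    private static final Pattern ID_AFTER_D = Pattern.compile("/d/([\\w-]+)");

    // Hypothetical helper: returns the ID if the URL contains "/d/<id>", otherwise empty.
    static Optional<String> sheetId(String url) {
        Matcher m = ID_AFTER_D.matcher(url);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String[] urls = {
            "https://docs.google.com/spreadsheets/d/1mrsetjgfZI2BIypz7SGHMOfHGv6kTKTzY0xOM5c6TXY/edit#gid=1842172258",
            "https://docs.google.com/a/example.com/spreadsheets/d/1mrsetjgfZI2BIypz7SGHMOfHGv6PTKTzY0xOM5c6TXY/edit#gid=1842172258",
            "https://docs.google.com/spreadsheets/d/1mrsetjgfZI2BIypz7SGHMOfHGv6kTKTzY0xOM5c6TXY"
        };
        for (String url : urls) {
            System.out.println(sheetId(url).orElse("(no id found)"));
        }
    }
}
```

Returning an Optional keeps the "no /d/ segment" case explicit instead of throwing when the URL has an unexpected shape.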

Extract last number after decimal

I am getting a piece of JSON text from a url connection and saving it to a string currently as such:
...//setting up url and connection
BufferedReader in = new BufferedReader(new InputStreamReader(connection.getInputStream()));
String str = in.readLine();
When I print str, I correctly find the data {"build":{"version_component":"1.0.111"}}
Now I want to extract the 111 from str, but I am having some trouble.
I tried
String afterLastDot = inputLine.substring(inputLine.lastIndexOf(".") + 1);
but I end up with 111"}}
I need a solution that is generic so that if I have String str = {"build":{"version_component":"1.0.111111111"}}; the solution still works and extracts 111111111 (ie, I don't want to hard code extract the last three digits after the decimal point)
If you cannot use a JSON parser, then you can use this regex-based extraction:
String lastNum = str.replaceAll("^.*\\.(\\d+).*", "$1");
^.* is a greedy match that consumes everything up to the last dot that is followed by one or more digits; those digits are captured in group #1 and used as the replacement.
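To see the greedy behaviour in action, the replacement can be wrapped in a small test program. This is just a sketch; the class and method names are hypothetical, and note that replaceAll returns the input unchanged when the pattern does not match at all.

```java
public class LastVersionNumber {
    // Hypothetical helper around the replaceAll call from the answer above.
    // If the input contains no ".<digits>", the input is returned unchanged.
    static String lastNumber(String json) {
        return json.replaceAll("^.*\\.(\\d+).*", "$1");
    }

    public static void main(String[] args) {
        String shortVersion = "{\"build\":{\"version_component\":\"1.0.111\"}}";
        String longVersion = "{\"build\":{\"version_component\":\"1.0.111111111\"}}";
        System.out.println(lastNumber(shortVersion)); // digits after the last dot
        System.out.println(lastNumber(longVersion));  // works for any number of digits
    }
}
```

If the "no match" case matters, checking with a Matcher and find() first would make it possible to tell a real result from an untouched input.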
Find the start and end indexes of the part you need and call substring(start, end):
// String str = "{"build":{"version_component":"1.0.111"}};" cannot compile without escaping
String str = "{\"build\":{\"version_component\":\"1.0.111\"}}";
int start = str.lastIndexOf(".")+1;
int end = str.lastIndexOf("\"");
String substring = str.substring(start,end);
Just use a JSON API (JSONObject here is the org.json class):
JSONObject obj = new JSONObject(str);
String versionComponent= obj.getJSONObject("build").getString("version_component");
Then split on the dots and take the last element:
String[] parts = versionComponent.split("\\.");
String last = parts[parts.length - 1];
You can try the following code (note that it assumes the number is exactly three digits long):
...
int index = inputLine.lastIndexOf(".")+1 ;
String afterLastDot = inputLine.substring(index, index+3);
With regular expressions, you can solve your problem like this (the pattern below searches for the literal text 111):
Pattern pattern = Pattern.compile("111") ;
Matcher matcher = pattern.matcher(str) ;
while(matcher.find()){
System.out.println(matcher.start()+" "+matcher.end());
System.out.println(str.substring(matcher.start(), matcher.end()));
}

How to get multi sub strings from String, Android/Java

I know there are similar questions about this. However, I have tried many solutions and none of them works for me.
I need help to extract multiple substrings from a string:
String content = "Ben Conan General Manager 90010021 benconan#gmail.com";
Note: The content in the String may not be always in this format, it may be all jumbled up.
I want to extract the phone number and email like below:
1. 90010021
2. benconan#gmail.com
In my project, I am trying to get this result and then display it in 2 different EditText fields.
I have tried using the Pattern and Matcher classes, but it did not work.
I can provide my code here if requested, please help me ~
--------------------EDIT---------------------
Below is my current method, which only extracts the email address:
private static final String EMAIL_PATTERN =
"[a-zA-Z0-9\\+\\.\\_\\%\\-\\+]{1,256}" +
"\\#" +
"[a-zA-Z0-9][a-zA-Z0-9\\-]{0,64}" +
"(" +
"\\." +
"[a-zA-Z0-9][a-zA-Z0-9\\-]{0,25}" +
")+";
public String EmailValidator(String email) {
Pattern pattern = Pattern.compile(EMAIL_PATTERN);
Matcher matcher = pattern.matcher(email);
if (matcher.find()) {
return email.substring(matcher.start(), matcher.end());
} else {
// TODO handle condition when input doesn't have an email address
}
return email;
}
You can split your string into a list like this:
String str = "Ben Conan, General Manager, 90010021, benconan#gmail.com";
List<String> list = Arrays.asList(str.split(",\\s*"));
Maybe you should do this instead:
String[] stringNames = new String[2];
stringNames[0] = "your phonenumber";
stringNames[1] = "your email";
System.out.println(Arrays.toString(stringNames));
Or:
String[] stringNames = {"your phonenumber", "your email"};
System.out.println(stringNames[1]);
String.split(...) is a Java method for that.
EXAMPLE:
String content = "Ben Conan, General Manager, 90010021, benconan#gmail.com";
String[] selection = content.split(",\\s*"); // also drops the space after each comma
System.out.println(selection[0]);
System.out.println(selection[3]);
BUT if you want to use a regex, then take a look at this:
https://stackoverflow.com/a/16053961/982161
Try this regex for the phone number:
\d{8} ---> 8 is the number of digits in the phone number
You can use
\d{8,} ---> if the number may have more than 8 digits
Use the appropriate Java classes (Pattern and Matcher) for matching. You can try the results here:
http://regexr.com/
For the email, it depends on whether the format is simple or complicated. There is a good explanation here:
http://www.regular-expressions.info/index.html
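Putting the two ideas together, a sketch along these lines could pull both fields out of the sample string regardless of their order. The class and method names are hypothetical, the phone number is assumed to be exactly 8 digits, and the email pattern is a deliberately simple one that uses # as the separator because that is how the address appears in the question.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ContactFields {
    // Assumption: a phone number is a standalone run of exactly 8 digits.
    private static final Pattern PHONE = Pattern.compile("\\b\\d{8}\\b");
    // Assumption: simplified email shape, with '#' as the separator as in the question.
    private static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+#[A-Za-z0-9-]+(?:\\.[A-Za-z0-9-]+)+");

    // Returns the first substring of text that matches pattern, if any.
    private static Optional<String> firstMatch(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    static Optional<String> phone(String content) {
        return firstMatch(PHONE, content);
    }

    static Optional<String> email(String content) {
        return firstMatch(EMAIL, content);
    }

    public static void main(String[] args) {
        String content = "Ben Conan General Manager 90010021 benconan#gmail.com";
        System.out.println(phone(content).orElse("(no phone)"));
        System.out.println(email(content).orElse("(no email)"));
    }
}
```

Because find() scans the whole input, the position of each field does not matter, which fits the note that the content may be jumbled up.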

regex matcher check in if logic not working

Hi, you can see my code below. I have some strings, country, Rank and Grank, which are initially null; if the regex matches, the value should change. But even when the regex matches, the value does not change and is always null. If I remove all the if statements and assign the strings directly, it works fine, but when no match is found it throws an exception. Please let me know how I can check this with if logic.
System.err.println(content);
Pattern c = Pattern.compile("NAME=\"(.*)\" RANK");
Pattern r = Pattern.compile("\" RANK=\"(.*)\"");
Pattern gr = Pattern.compile("\" TEXT=\"(.*)\" SOURCE");
Matcher co = c.matcher(content);
Matcher ra = r.matcher(content);
Matcher gra = gr.matcher(content);
co.find();
ra.find();
gra.find();
String country = null;
String Rank = null;
String Grank = null;
if (co.matches()) {
country = co.group(1);
}
if (ra.matches()) {
Rank = ra.group(1);
}
if (gra.matches()) {
Grank = gra.group(1);
}
In a Java string literal you have to escape a single \ by writing a double \\; then it should work.
Tried this?
while (co.find()) {
System.out.print("Start index: " + co.start());
System.out.print(" End index: " + co.end() + " ");
System.out.println(co.group());
}
Personally, I can't make your program work with or without the if statements, so it's not a problem of logic: the patterns simply don't match the string for me.
So I changed it to get something working; maybe you can use it :)
String content = "NAME=\"salut\" RANK=\"pouet\" TEXT=\"text\" SOURCE";
System.out.println(content);
System.out.println(content.replaceAll(("NAME=\"(.*)\"\\sRANK=\"(.*)\"\\sTEXT=\"(.*)\" SOURCE"), "$1---$2---$3"));
Output
NAME="salut" RANK="pouet" TEXT="text" SOURCE
salut---pouet---text
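One likely reason the if checks in the question never fire: Matcher.matches() only succeeds when the pattern covers the entire input, whereas find() looks for the pattern anywhere in it. Using the boolean result of find() as the condition keeps the null default when nothing matches. The sketch below reuses the sample content from the answer above; the helper name is made up, and the groups are changed to the lazy (.*?) so that each one stops at the next quote.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindVersusMatches {
    // Hypothetical helper: returns group 1 of the first match, or null when there is none.
    static String firstGroupOrNull(String regex, String content) {
        Matcher m = Pattern.compile(regex).matcher(content);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String content = "NAME=\"salut\" RANK=\"pouet\" TEXT=\"text\" SOURCE";

        // matches() must cover the whole input, so this prints false
        System.out.println(Pattern.compile("NAME=\"(.*?)\" RANK").matcher(content).matches());

        String country = firstGroupOrNull("NAME=\"(.*?)\" RANK", content);
        String rank = firstGroupOrNull("\" RANK=\"(.*?)\"", content);
        String grank = firstGroupOrNull("\" TEXT=\"(.*?)\" SOURCE", content);
        System.out.println(country + " / " + rank + " / " + grank);
    }
}
```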

Replace String in Java with regex and replaceAll

Is there a simple solution to parse a String by using regex in Java?
I have to adapt a HTML page. Therefore I have to parse several strings, e.g.:
href="/browse/PJBUGS-911"
=>
href="PJBUGS-911.html"
The strings differ only in the ID (e.g. 911). My first idea looks like this:
String input = "";
String output = input.replaceAll("href=\"/browse/PJBUGS\\-[0-9]*\"", "href=\"PJBUGS-???.html\"");
I want to replace everything except the ID. How can I do this?
Would be nice if someone can help me :)
You can capture substrings that were matched by your pattern, using parentheses. And then you can use the captured things in the replacement with $n where n is the number of the set of parentheses (counting opening parentheses from left to right). For your example:
String output = input.replaceAll("href=\"/browse/PJBUGS-([0-9]*)\"", "href=\"PJBUGS-$1.html\"");
Or if you want:
String output = input.replaceAll("href=\"/browse/(PJBUGS-[0-9]*)\"", "href=\"$1.html\"");
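As a quick check of the capture-group approach, the second variant can be run against a snippet with several links. This is a sketch with a made-up class name and sample HTML; it uses [0-9]+ rather than [0-9]* so that an empty ID is not rewritten.

```java
public class HrefRewriter {
    // Rewrites every href="/browse/PJBUGS-<digits>" to href="PJBUGS-<digits>.html".
    // $1 in the replacement refers to the text captured by the first pair of parentheses.
    static String rewrite(String html) {
        return html.replaceAll("href=\"/browse/(PJBUGS-[0-9]+)\"", "href=\"$1.html\"");
    }

    public static void main(String[] args) {
        String html = "<a href=\"/browse/PJBUGS-911\">first</a> <a href=\"/browse/PJBUGS-34234\">second</a>";
        System.out.println(rewrite(html));
    }
}
```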
This does not use a regex, but maybe it still solves your problem:
output = "href=\"" + input.substring(input.lastIndexOf("/") + 1, input.lastIndexOf("\"")) + ".html\"";
This is how I would do it:
public static void main(String[] args)
{
String text = "href=\"/browse/PJBUGS-911\" blahblah href=\"/browse/PJBUGS-111\" " +
"blahblah href=\"/browse/PJBUGS-34234\"";
Pattern ptrn = Pattern.compile("href=\"/browse/(PJBUGS-[0-9]+?)\"");
Matcher mtchr = ptrn.matcher(text);
while(mtchr.find())
{
String match = mtchr.group(0);
String insMatch = mtchr.group(1);
String repl = "href=\"" + insMatch + ".html\""; // build the replacement directly from the captured ID
System.out.println("orig = <" + match + "> repl = <" + repl + ">");
}
}
This just shows the regex and replacements, not the final formatted text, which you can get by using Matcher.replaceAll:
String allRepl = mtchr.replaceAll("href=\"$1.html\"");
If you are just interested in replacing everything, you don't need the loop; I used it only for debugging and to show how the regex works.
