Finding out what .* inside a regex matched?


Hey guys, I have a new problem.
I have a dynamically created String that contains a placeholder !fruit!. I then want to find any Strings that match this String. Therefore I do the following:
String s = "A Basket of !fruit! was lost";
String toCompare = "A Basket of Apples was lost";
if (toCompare.matches(s.replace("!fruit!", ".*"))) //Yes, every character is a Fruit :-P
//do something
I now want to know what the .* matched against (in this case "Apples") and I am kinda clueless on how to go about this...

You can make it a capturing group ((.*)), and use the Pattern API to get the group from the found match:
String s = "A Basket of !fruit! was lost";
String toCompare = "A Basket of Apples was lost";
Pattern pattern = Pattern.compile(s.replace("!fruit!", "(.*)"));
Matcher matcher = pattern.matcher(toCompare);
if (matcher.find()) {
System.out.println(matcher.group(1));
}

Firstly, you need the regex to define a capturing group, which is done with parentheses: s.replace("!fruit!", "(.*)").
Then, you need to use a Pattern and Matcher instead of just the plain String.matches.
Pattern pattern = Pattern.compile(s.replace("!fruit!", "(.*)"));
Matcher m = pattern.matcher(toCompare);
if (m.matches()) {
String fruit = m.group(1);
}
To be a bit more robust, you should also watch out for s strings that themselves have "special" regular expression characters, or don't contain "!fruit!" at all.
For instance, what if s = "Was a basket (or something) of !fruit! lost?"? In that case, the first capturing group will be (or something), the parentheses won't be matched literally (since they're special characters in regexes), and the ? will make the t optional rather than matching a question mark. This would match:
toCompare = "Was a basket or something of apples los";
... with matcher.group(1) being "or something", rather than "apples" (which will be in matcher.group(2)).
Solving this problem generally is going to be just a bit harder. You should basically split the string on !fruit!, use Pattern.quote on each side, and then splice in a "(.*)". (Pattern.quote takes a string and returns a string which, when treated as a regular expression, will match the first string literally. For instance, passing in "foo?" returns "\Qfoo?\E", which matches the literal text foo? just as "foo\?" would.)
String[] splits = s.split("!fruit!", -1); // limit -1 keeps a trailing empty string when s ends with !fruit!
if (splits.length != 2) {
    throw new IllegalArgumentException("expected exactly one '!fruit!'");
}
Pattern pattern = Pattern.compile(
    Pattern.quote(splits[0]) + "(.*)" + Pattern.quote(splits[1]));
...
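The steps described above can be put together roughly as follows. This is only a sketch: the class name PlaceholderMatcher and the method extract are made up for illustration, and it assumes the template contains the placeholder exactly once. It locates the placeholder with indexOf instead of split, so the placeholder itself is never interpreted as a regex.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of any library.
class PlaceholderMatcher {

    // Returns the text that stands where the placeholder was,
    // or an empty Optional if the input does not fit the template.
    static Optional<String> extract(String template, String placeholder, String input) {
        int at = template.indexOf(placeholder);
        if (at < 0 || template.indexOf(placeholder, at + placeholder.length()) >= 0) {
            throw new IllegalArgumentException("expected exactly one '" + placeholder + "'");
        }
        String before = template.substring(0, at);
        String after = template.substring(at + placeholder.length());
        // Quote both literal halves so characters like ( ) ? lose their regex meaning.
        Pattern pattern = Pattern.compile(Pattern.quote(before) + "(.*)" + Pattern.quote(after));
        Matcher m = pattern.matcher(input);
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extract("A Basket of !fruit! was lost", "!fruit!",
                "A Basket of Apples was lost"));
        System.out.println(extract("Was a basket (or something) of !fruit! lost?", "!fruit!",
                "Was a basket (or something) of apples lost?"));
    }
}
```

With quoting in place, the second template only accepts input that really contains the parentheses and the question mark.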

Related

how to exclude "<" in regex match

I have a String which looks like "<name><address> and <Phone_1>". I have to get the result like
1) <name>
2) <address>
3) <Phone_1>
I have tried using regex "<(.*)>" but it returns just one result.

The regex you want is
<([^<>]+?)><([^<>]+?)> and <([^<>]+?)>
Which will then spit out the stuff you want in the 3 capture groups. The full code would then look something like this:
Matcher m = Pattern.compile("<([^<>]+?)><([^<>]+?)> and <([^<>]+?)>").matcher(string);
if (m.find()) {
String name = m.group(1);
String address = m.group(2);
String phone = m.group(3);
}

The pattern .* in a regex is greedy. It will match as many characters as possible between the first < it finds and the last possible > it can find. In the case of your string it finds the first <, then looks for as much text as possible until a >, which it will find at the very end of the string.
You want a non-greedy or "lazy" pattern, which will match as few characters as possible: simply <(.+?)>. The question mark after the quantifier is the syntax for non-greedy.
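To see the difference side by side, here is a small sketch that runs a greedy, a lazy, and a negated-character-class pattern over the string from the question. The class GreedyVsLazy and the helper captures are names invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class GreedyVsLazy {

    // Collects group 1 of every match of regex in input.
    static List<String> captures(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "<name><address> and <Phone_1>";
        System.out.println(captures("<(.*)>", input));     // [name><address> and <Phone_1]
        System.out.println(captures("<(.+?)>", input));    // [name, address, Phone_1]
        System.out.println(captures("<([^<>]+)>", input)); // [name, address, Phone_1]
    }
}
```

The negated character class gives the same result as the lazy quantifier here, and it can never run past a closing bracket.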

This will work if you have a dynamic number of groups.
Pattern p = Pattern.compile("(<\\w+>)");
Matcher m = p.matcher("<name><address> and <Phone_1>");
while (m.find()) {
System.out.println(m.group());
}

Java: Need to extract a number from a string

I have a string containing a number. Something like "Incident #492 - The Title Description".
I need to extract the number from this string.
I tried
Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher(theString);
String substring = m.group();
but I get an error
java.lang.IllegalStateException: No match found
What am I doing wrong?
What is the correct expression?
I'm sorry for such a simple question, but I searched a lot and still have not found how to do this (maybe because it's too late here...)

You are getting this exception because you need to call find() on the matcher before accessing groups:
Matcher m = p.matcher(theString);
while (m.find()) {
String substring =m.group();
System.out.println(substring);
}

There are two things wrong here:
The pattern you're using is not ideal for your scenario: used with matches(), it only succeeds if the whole string consists of digits. Also, since it doesn't contain a capturing group, a call to group() is equivalent to calling group(0), which returns the entire matched text.
You need to be certain that the matcher has a match before you go calling a group.
Let's start with the regex. Used with matches(), \\d+ will only ever match a string that consists entirely of digits. What you care about is specifically the number in that string, so you want an expression that:
Doesn't care about what's in front of it
Doesn't care about what's after it
Only matches on one occurrence of numbers, and captures it in a group
To do that, you'd use this expression:
.*?(\\d+).*
The last part is to ensure that the matcher can find a match, and that it gets the correct group. That's accomplished by this:
if (m.matches()) {
String substring = m.group(1);
System.out.println(substring);
}
All together now:
Pattern p = Pattern.compile(".*?(\\d+).*");
final String theString = "Incident #492 - The Title Description";
Matcher m = p.matcher(theString);
if (m.matches()) {
String substring = m.group(1);
System.out.println(substring);
}

You need to invoke one of the Matcher methods, like find, matches or lookingAt to actually run the match.
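As a rough illustration of how the three methods differ on the string from the question, consider the sketch below. The class MatcherMethods and the helper firstNumber are invented names for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class MatcherMethods {
    static final Pattern DIGITS = Pattern.compile("\\d+");

    // First run of digits anywhere in the text, or null when there is none.
    static String firstNumber(String text) {
        Matcher m = DIGITS.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String s = "Incident #492 - The Title Description";
        System.out.println(DIGITS.matcher(s).matches());   // false: the whole string is not digits
        System.out.println(DIGITS.matcher(s).lookingAt()); // false: the string does not start with digits
        System.out.println(DIGITS.matcher(s).find());      // true: digits occur somewhere
        System.out.println(firstNumber(s));                // 492
    }
}
```

Calling group() is only legal after one of these methods has returned true, which is why the helper checks the result of find() first.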

contains() method without prefix and suffix in Java

I'm stuck on a (simple, I think) String validation. I use the following method to get the text from an EditText, which can hold up to 420 chars and must contain a specific word (wherever it appears):
if(edittext.getText().toString().toLowerCase().contains(stringToHave)) { }
// stringToHave = the specific String
However, I want to improve this condition. For example, you have this:
String a = "This not a MetaStackOverflow question";
String b = "stackoverflow";
And you want to know whether a contains b regardless of case, so you do as follows:
if(a.toLowerCase().contains(b)) { }
This condition is true because a does indeed contain StackOverflow. However, a doesn't contain exactly b: there is the prefix Meta, so it is not the same word. I tried to find a way on SO and on other Java websites, without result.
How can I improve contains() method to find only the exact String without prefix or suffix? Should I use another method (as containsOnly(), I already tried it but it seems that is undefined for String, it's to check if it contains numeric/alphabetic/etc. chars)?
EDIT:
Adding a check for a space on each side is attractive and ingenious. However, if the specific text is at the end of the sentence and followed by a ".", this will not work, will it?
a = "This is great. This is on StackOverflow."; // false because we have "."

You can use regex here. Matcher class has a find() method, which searches for a pattern in a string. This will work:
Matcher matcher = Pattern.compile("(?i)\\b" + Pattern.quote(b) + "\\b").matcher(a);
if (matcher.find()) {
// contains
}
(?i) is the embedded case-insensitive flag. Pattern.quote() is used to escape any regex meta-characters in the search string.

Put a space before and after the string you are looking for:
if(a.toLowerCase().contains(" " + b + " ")) { }

You can use a case insensitive Pattern with word boundaries \\b to do this:
String a = "This not a MetaStackOverflow question";
String b = "stackoverflow";
Pattern p = Pattern.compile("\\b" + b + "\\b", Pattern.CASE_INSENSITIVE);
if (p.matcher(a).find()) { }
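The word-boundary approach can be wrapped in a small helper, sketched below. The class WordSearch and the method containsWord are names made up for this example, and the helper assumes the search word starts and ends with a word character (a letter, digit or underscore), since \b only marks the edge between a word character and a non-word character.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of any library.
class WordSearch {

    // True if word occurs in text as a whole word, ignoring case.
    // Assumes word begins and ends with a word character, otherwise \b will not behave as intended.
    static boolean containsWord(String text, String word) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b", Pattern.CASE_INSENSITIVE);
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        String b = "stackoverflow";
        System.out.println(containsWord("This not a MetaStackOverflow question", b));    // false
        System.out.println(containsWord("This is great. This is on StackOverflow.", b)); // true
    }
}
```

Unlike the space-padding check, this also handles a word at the very start or end of the text, or one followed by punctuation.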

Replace string with part of the matching regex

I have a long string. I want to replace all the matches with part of the matching regex (group).
For example:
String str = "This is a great day, is it not? If there is something, THIS IS it. <b>is</b>";
I want to replace all the words "is" with, let's say, "<h1>is</h1>". The case should remain the same as in the original. So the final string I want is:
This <h1>is</h1> a great day, <h1>is</h1> it not? If there <h1>is</h1> something,
THIS <h1>IS</h1> it. <b><h1>is</h1></b>.
The regex I was trying:
Pattern pattern = Pattern.compile("[.>, ](is)[.<, ]", Pattern.CASE_INSENSITIVE);

The Matcher class is commonly used in conjunction with Pattern. Use the Matcher.replaceAll() method to replace all matches in the string:
String str = "This is a great day...";
Pattern p = Pattern.compile("\\bis\\b", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(str);
String result = m.replaceAll("<h1>$0</h1>"); // $0 is the whole match, so the original case is kept
Note: \b matches at a word boundary (for example between a letter and whitespace or punctuation). This ensures that only the word "is" is matched and not words that contain the letters "i" and "s" (like "island").

Like this:
str = str.replaceAll(yourRegex, "<h1>$1</h1>");
The $1 refers to the text captured by group #1 in your regex.
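Here is a minimal sketch of the group-reference idea applied to the sentence from the question. The class WrapWord and the method wrap are invented for this example. Matcher.quoteReplacement is used on the literal parts of the replacement so that a stray $ or backslash in them is not read as a group reference.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of any library.
class WrapWord {

    // Wraps every whole-word, case-insensitive occurrence of word in <tag>...</tag>,
    // keeping the case of the matched text.
    static String wrap(String text, String word, String tag) {
        Pattern p = Pattern.compile("\\b(" + Pattern.quote(word) + ")\\b", Pattern.CASE_INSENSITIVE);
        String replacement = Matcher.quoteReplacement("<" + tag + ">")
                + "$1" // the text captured by group 1, in its original case
                + Matcher.quoteReplacement("</" + tag + ">");
        return p.matcher(text).replaceAll(replacement);
    }

    public static void main(String[] args) {
        String input = "This is a great day, is it not? If there is something, THIS IS it. <b>is</b>";
        System.out.println(wrap(input, "is", "h1"));
    }
}
```

Because the replacement refers back to the captured text, "IS" stays upper case and "is" stays lower case.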

Michael's answer is better, but if you happen to specifically only want [.>, ] and [.<, ] as boundaries, you can do it like this:
String input = "This is a great day, is it not? If there is something, THIS IS it. <b>is</b>";
Pattern p = Pattern.compile("(?<=[.>, ])(is)(?=[.<, ])", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(input);
String result = m.replaceAll("<h1>$1</h1>");

yourStr.replaceAll("(?i)([.>, ])(is)([.<, ])","$1<h1>$2</h1>$3")
(?i) indicates that case should be ignored. Wrap everything you want to reuse in parentheses, refer to the groups with $1, $2 and $3, and concatenate them into what you want.

Simply use a backreference for that.
"This is a great day, is it not? If there is something, THIS IS it. <b>is</b>".replaceAll("(?i)([.>, ])(is)([.<, ])", "$1<h1>$2</h1>$3"); should do.

It may be a late addition, but in case anyone needs this: if you are searching for 'thing' and also want 'Something' to count as a result, capture the rest of the word around it:
Pattern p = Pattern.compile("([^ ]*)thing([^ .]*)");
Matcher m = p.matcher(str);
String result = m.replaceAll("<h1>$1thing$2</h1>");
This turns Something into <h1>Something</h1> too.

String Pattern Matching In Java

I want to search for a given string pattern in an input string.
For example:
String URL = "https://localhost:8080/sbs/01.00/sip/dreamworks/v/01.00/cui/print/$fwVer/{$fwVer}/$lang/en/$model/{$model}/$region/us/$imageBg/{$imageBg}/$imageH/{$imageH}/$imageSz/{$imageSz}/$imageW/{$imageW}/movie/Kung_Fu_Panda_two/categories/3D_Pix/item/{item}/_back/2?$uniqueID={$uniqueID}";
Now I need to check whether the string URL contains "/{item}/". Please help me.
This is an example. What I actually need is to check whether the URL contains a string matching "/{a-zA-Z0-9}/".

You can use the Pattern class for this. If you want to match only word characters inside the {} then you can use the following regex. \w is a shorthand for [a-zA-Z0-9_]. If you are ok with _ then use \w or else use [a-zA-Z0-9].
String URL = "https://localhost:8080/sbs/01.00/sip/dreamworks/v/01.00/cui/print/$fwVer/{$fwVer}/$lang/en/$model/{$model}/$region/us/$imageBg/{$imageBg}/$imageH/{$imageH}/$imageSz/{$imageSz}/$imageW/{$imageW}/movie/Kung_Fu_Panda_two/categories/3D_Pix/item/{item}/_back/2?$uniqueID={$uniqueID}";
Pattern pattern = Pattern.compile("/\\{\\w+\\}/");
Matcher matcher = pattern.matcher(URL);
if (matcher.find()) {
System.out.println(matcher.group(0)); //prints /{item}/
} else {
System.out.println("Match not found");
}

That's just a matter of String.contains:
if (input.contains("{item}"))
If you need to know where it occurs, you can use indexOf:
int index = input.indexOf("{item}");
if (index != -1) // -1 means "not found"
{
...
}
That's fine for matching exact strings - if you need real patterns (e.g. "three digits followed by at most 2 letters A-C") then you should look into regular expressions.
EDIT: Okay, it sounds like you do want regular expressions. You might want something like this:
private static final Pattern URL_PATTERN =
Pattern.compile("/\\{[a-zA-Z0-9]+\\}/");
...
if (URL_PATTERN.matcher(input).find())

If you want to check if some string is present in another string, use something like String.contains.
If you want to check if some pattern is present in a string, append and prepend '.*' to the pattern and use matches(). The result will accept strings that contain the pattern.
Example: Suppose you have some regex a(b|c) that checks if a string matches ab or ac.
.*(a(b|c)).* will check if a string contains ab or ac.
A disadvantage of this method is that it will not give you the location of the match; you can use java.util.regex.Matcher.find() if you need the position of the match.
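For the position-aware variant, a sketch along these lines may help. The class PlaceholderLocator and the method indexOfSegment are names invented for this example, and the pattern assumes the placeholder name consists only of ASCII letters and digits, as in the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of any library.
class PlaceholderLocator {
    // A path segment of the form /{name}/ where name is letters and digits only.
    static final Pattern SEGMENT = Pattern.compile("/\\{[a-zA-Z0-9]+\\}/");

    // Index of the first "/{name}/" segment in url, or -1 if there is none.
    static int indexOfSegment(String url) {
        Matcher m = SEGMENT.matcher(url);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOfSegment("item/{item}/_back/2"));  // 4
        System.out.println(indexOfSegment("$model/{$model}/$lang")); // -1, because of the $ inside the braces
    }
}
```

After a successful find(), m.end() and m.group() are available as well if the end offset or the matched text is needed.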

You can do it using string.indexOf("{item}"). If the result is greater than -1, {item} is in the string.
