My Java regex isn't capturing the group

I'm trying to match the username with a regex. Please don't suggest a split.
USERNAME=geo
Here's my code:
String input = "USERNAME=geo";
Pattern pat = Pattern.compile("USERNAME=(\\w+)");
Matcher mat = pat.matcher(input);
if(mat.find()) {
System.out.println(mat.group());
}
Why doesn't it find geo in the group? I noticed that if I use .group(1), it finds the username. However, group() returns USERNAME=geo. Why?

Because group() is equivalent to group(0), and group 0 denotes the entire pattern.
From the documentation:
public String group(int group)
Group zero denotes the entire pattern, so the expression m.group(0) is equivalent to m.group()
As you've found out, with your pattern, group(1) gives you what you want.
If you insist on using group(), you'd have to modify the pattern to something like "(?<=USERNAME=)\\w+".
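To make the difference concrete, here is a minimal sketch that puts both options side by side. The class and method names (GroupZeroDemo, wholeAndCaptured, viaLookbehind) are made up for illustration; only the two patterns come from the question and the answer above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupZeroDemo {
    // Returns {whole match, group 1}, or null when the pattern is not found.
    static String[] wholeAndCaptured(String input) {
        Matcher mat = Pattern.compile("USERNAME=(\\w+)").matcher(input);
        if (!mat.find()) {
            return null;
        }
        // group() is the same as group(0): the entire match.
        return new String[] { mat.group(), mat.group(1) };
    }

    // Lookbehind variant: the prefix is asserted but not consumed,
    // so the whole match is already just the username.
    static String viaLookbehind(String input) {
        Matcher mat = Pattern.compile("(?<=USERNAME=)\\w+").matcher(input);
        return mat.find() ? mat.group() : null;
    }

    public static void main(String[] args) {
        String[] parts = wholeAndCaptured("USERNAME=geo");
        System.out.println(parts[0]); // USERNAME=geo
        System.out.println(parts[1]); // geo
        System.out.println(viaLookbehind("USERNAME=geo")); // geo
    }
}
```

The capturing-group version is usually easier to read; the lookbehind version is mainly useful when the calling code can only work with the whole match.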

As Matcher.group() javadoc says, it "returns the input subsequence matched by the previous match", and the previous match in your case was "USERNAME=geo" since you've called find().
In contrast, the method group(int) returns a specific capturing group. Capturing groups are numbered by counting their opening parentheses from left to right, so the first group would match "geo" in your case.

So VAR.group(int i) returns the i-th capture group of the regex, with 0 being the full match. You need to call .group(1).

For your solution, here's what works:
public static void main(String[] args) {
String input = "USERNAME=geo";
Pattern pat = Pattern.compile("USERNAME=(\\w+)");
Matcher mat = pat.matcher(input);
if(mat.find()) {
System.out.println(mat.group(1));
}
}
Output
geo
Reason
String java.util.regex.Matcher.group(int group)
Returns the input subsequence captured by the given group during the previous match operation.
For a matcher m, input sequence s, and group index g, the expressions m.group(g) and s.substring(m.start(g), m.end(g)) are equivalent.
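The equivalence quoted from the Javadoc can be checked directly. This is a small sketch, not part of the original answer; GroupOffsetsDemo and sameAsSubstring are hypothetical names, and the check assumes the requested group took part in the match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupOffsetsDemo {
    // Checks that m.group(g) equals s.substring(m.start(g), m.end(g))
    // for the first match of regex in s.
    static boolean sameAsSubstring(String s, String regex, int g) {
        Matcher m = Pattern.compile(regex).matcher(s);
        if (!m.find()) {
            throw new IllegalArgumentException("no match");
        }
        return m.group(g).equals(s.substring(m.start(g), m.end(g)));
    }

    public static void main(String[] args) {
        String s = "USERNAME=geo";
        Matcher m = Pattern.compile("USERNAME=(\\w+)").matcher(s);
        if (m.find()) {
            // Group 1 starts right after the 9-character prefix "USERNAME=".
            System.out.println(m.start(1) + ".." + m.end(1)); // 9..12
        }
        System.out.println(sameAsSubstring(s, "USERNAME=(\\w+)", 1)); // true
    }
}
```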

That's because group is supposed to return the string matching the pattern in its entirety. For getting a group within that string, you need to pass the group number that you want.
See here for details, paraphrased below:
group
public String group()
Returns the input subsequence matched by the previous match.
public String group(int group)
Returns the input subsequence captured by the given group during the previous match operation.
Capturing groups are indexed from left to right, starting at one. Group zero denotes the entire pattern, so the expression m.group(0) is equivalent to m.group().

Related

Regex isn't extracting specific part rather whole string upto the group

This is a follow-up to the question that I asked here.
The given regex, (?:[^\/]*\/){4}([A-Za-z]{3}[0-9]{3}), is perfect. However, when I use it in Java, it matches the string up to and including the matching group rather than giving me just that group.
String defaultRegex = "(?:[^\\/]*\\/){4}([A-Za-z]{3}[0-9]{3})";
String stringToMatch = "unknown/relevant/nonrelevant:2.2.2/random/ABC123:random/morerandom";
Pattern p = Pattern.compile(defaultRegex);
Matcher m = p.matcher (stringToMatch);
if (m.find()){
System.out.println(m.group());
}
The above prints unknown/relevant/nonrelevant:2.2.2/random/ABC123 when I want the regex to give me just ABC123.
matcher.group() as well as matcher.group(0) always return the whole matched string.
To get the first capturing group, use matcher.group(1); the second capturing group is matcher.group(2), and so on.
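Applied to the code from the question, the fix might look like the sketch below. The class name PathGroupDemo and the helper extractCode are invented for this example; the regex and the input string are the ones from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PathGroupDemo {
    private static final Pattern CODE =
            Pattern.compile("(?:[^\\/]*\\/){4}([A-Za-z]{3}[0-9]{3})");

    // Returns the text of capturing group 1, or null if nothing matches.
    static String extractCode(String path) {
        Matcher m = CODE.matcher(path);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String s = "unknown/relevant/nonrelevant:2.2.2/random/ABC123:random/morerandom";
        System.out.println(extractCode(s)); // ABC123
    }
}
```

The non-capturing group (?:...) still consumes the four leading path segments, which is why group() includes them; only the parenthesised part at the end is numbered as group 1.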

Java RegEx negative lookbehind

I have the following Java code:
Pattern pat = Pattern.compile("(?<!function )\\w+");
Matcher mat = pat.matcher("function example");
System.out.println(mat.find());
Why does mat.find() return true? I used negative lookbehind and example is preceded by function. Shouldn't it be discarded?
See what it matches:
public static void main(String[] args) throws Exception {
Pattern pat = Pattern.compile("(?<!function )\\w+");
Matcher mat = pat.matcher("function example");
while (mat.find()) {
System.out.println(mat.group());
}
}
Output:
function
xample
So first it finds function, which isn't preceded by "function ". Then it finds xample, which is preceded by "function e" and therefore not by "function ".
Presumably you want the pattern to match the whole text, not just find matches in the text.
You can either do this with Matcher.matches() or you can change the pattern to add start and end anchors:
^(?<!function )\\w+$
I prefer the second approach, as it means that the pattern itself defines its match region rather than the region being defined by its usage. That's just a matter of preference, however.
Your string has the word "function" that matches \w+, and is not preceded by "function ".
Notice two things here:
You're using find(), which returns true for a substring match as well.
Because of the above, "function" matches, as it is not preceded by "function ".
The whole string would never have matched because your regex didn't include spaces.
Use Matcher#matches() or ^ and $ anchors with a negative lookahead instead:
Pattern pat = Pattern.compile("^(?!function)[\\w\\s]+$"); // added \s for whitespaces
Matcher mat = pat.matcher("function example");
System.out.println(mat.find()); // false
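If the goal is instead to find whole words that do not directly follow "function ", one possible variation (an assumption about the intent, not something the question states) is to add a word boundary so that a match cannot start in the middle of a word. The sketch below compares the original pattern with that variation; LookbehindDemo and allMatches are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookbehindDemo {
    // Collects every substring that find() reports for the given regex.
    static List<String> allMatches(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "function example";
        // Original pattern: a match may start mid-word, hence "xample".
        System.out.println(allMatches("(?<!function )\\w+", input)); // [function, xample]
        // With \b a match can only start at the beginning of a word.
        System.out.println(allMatches("(?<!function )\\b\\w+", input)); // [function]
        // matches() requires the whole input to match; the space prevents that.
        System.out.println(Pattern.matches("(?<!function )\\w+", input)); // false
    }
}
```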

Find pattern in string with regex -> how to improve my solution

I would like to parse a string and get the "stringIAmLookingFor" part of it, which is surrounded by "_" at the beginning and the end. I'm using a regex to match that and then remove the "_" from the found string. This works, but I'm wondering if there is a more elegant approach to this problem.
String test = "xyz_stringIAmLookingFor_zxy";
Pattern p = Pattern.compile("_(\\w)*_");
Matcher m = p.matcher(test);
while (m.find()) { // find next match
String match = m.group();
match = match.replaceAll("_", "");
System.out.println(match);
}
Solution (partial)
Please also check the next section. Don't just read the solution here.
Just modify your code a bit:
String test = "xyz_stringIAmLookingFor_zxy";
// Make the capturing group capture the text in between (\w*)
// A capturing group is enclosed in (pattern), denoting the part of the
// pattern whose text you want to get separately from the main match.
// Note that there is also non-capturing group (?:pattern), whose text
// you don't need to capture.
Pattern p = Pattern.compile("_(\\w*)_");
Matcher m = p.matcher(test);
while (m.find()) { // find next match
// The text is in the capturing group numbered 1
// The numbering is by counting the number of opening
// parentheses that makes up a capturing group, until
// the group that you are interested in.
String match = m.group(1);
System.out.println(match);
}
Matcher.group(), without any argument will return the text matched by the whole regex pattern. Matcher.group(int group) will return the text matched by capturing group with the specified group number.
If you are using Java 7 or later, you can use a named capturing group, which makes the code slightly more readable. The string matched by the capturing group can be accessed with Matcher.group(String name).
String test = "xyz_stringIAmLookingFor_zxy";
// (?<name>pattern) is similar to (pattern), just that you attach
// a name to it
// specialText is not a really good name, please use a more meaningful
// name in your actual code
Pattern p = Pattern.compile("_(?<specialText>\\w*)_");
Matcher m = p.matcher(test);
while (m.find()) { // find next match
// Access the text captured by the named capturing group
// using Matcher.group(String name)
String match = m.group("specialText");
System.out.println(match);
}
Problem in pattern
Note that \w also matches _. The pattern you have is ambiguous, and I don't know what your expected output is for the cases where there are more than 2 _ in the string. And do you want to allow underscore _ to be part of the output?
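The ambiguity shows up as soon as the input has more than two underscores. The sketch below uses a made-up input, "a_b_c_d", and a hypothetical helper firstGroup to compare the pattern above with a variant that excludes underscores from the captured text; which behaviour is wanted depends on the requirements.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnderscoreAmbiguityDemo {
    // Returns group 1 of the first match, or null if there is no match.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "a_b_c_d";
        // \w is greedy and also matches '_', so the group runs up to the last underscore.
        System.out.println(firstGroup("_(\\w*)_", input)); // b_c
        // [^_] cannot cross an underscore, so the group stops at the next one.
        System.out.println(firstGroup("_([^_]*)_", input)); // b
    }
}
```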
You can define the group you actually want, since you're already using parentheses. You just need to tweak your pattern a bit.
String test = "xyz_stringIAmLookingFor_zxy";
Pattern p = Pattern.compile("_(\\w*)_");
Matcher m = p.matcher(test);
while (m.find()) { // find next match
System.out.println(m.group(1));
}
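Use group(1) instead of group(), because group() returns the entire match and not the capturing group.
Reference: http://docs.oracle.com/javase/7/docs/api/java/util/regex/Matcher.html#group(int)
"xyz_stringIAmLookingFor_zxy".replaceAll("_(\\w*)_", "$1");
This replaces each match, underscores included, with the text of the capturing group, giving xyzstringIAmLookingForzxy. The quantifier has to be inside the parentheses: with (\\w)* the group keeps only the last character it matched.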
a simpler regex, no group needed:
"(?<=_)[^_]*"
if you want it more strict:
"(?<=_)[^_]+(?=_)"
try
String s = "xyz_stringIAmLookingFor_zxy".replaceAll(".*_(\\w*)_.*", "$1");
System.out.println(s);
output
stringIAmLookingFor
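For comparison, here is a rough sketch of how the two lookaround patterns suggested earlier behave with find(). The names LookaroundDemo and allMatches are invented for the example. Because lookarounds do not consume the underscores, group() already holds the wanted text and no capturing group is needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundDemo {
    // Collects every non-overlapping match of regex in input.
    static List<String> allMatches(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String test = "xyz_stringIAmLookingFor_zxy";
        // Only a preceding underscore is required, so the trailing "zxy" also matches.
        System.out.println(allMatches("(?<=_)[^_]*", test)); // [stringIAmLookingFor, zxy]
        // The strict form also requires a following underscore.
        System.out.println(allMatches("(?<=_)[^_]+(?=_)", test)); // [stringIAmLookingFor]
    }
}
```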

Java Matching multiple tokens with Regex

I found a regular expression which matches tokens surrounded by {}, but it only seems to find the first item.
How can the following code be changed so that all of the tokens are found rather than just {World}? Would I need to use a loop?
// The search string
String str = "Hello {World} this {is} a {Tokens} test";
// The Regular expression (Finds {word} tokens)
Pattern pt = Pattern.compile("\\{([^}]*)\\}");
// Match the string with the pattern
Matcher m = pt.matcher(str);
// If results are found
if (m.find()) {
System.out.println(m);
System.out.println(m.groupCount()); // 1
System.out.println(m.group(0)); // {World}
System.out.println(m.group(1)); // World (Get without {})
}
The groupCount() method doesn't return the number of matches, it returns the number of capturing groups in this matcher's pattern. You defined one group in your pattern, hence this method returns 1.
You can find a next match to your pattern by calling find() again; it will attempt to find the next subsequence of the input sequence that matches the pattern. When it returns false, you'll know there are no more matches.
Thus, you should iterate through your matches like this:
while (m.find()) {
System.out.println(m.group(0));
}
Yes, in your code you just do one match, and get the groups captured in that single match.
If you want to get the other matches, you have to continue matching in a loop until find() returns false.
So basically all you need is to replace if with while and you're there.
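Put together, the loop could be wrapped in a small helper that collects the captured tokens into a list. This is only a sketch: TokenCollector and tokens are hypothetical names, while the pattern and the sample string come from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenCollector {
    private static final Pattern TOKEN = Pattern.compile("\\{([^}]*)\\}");

    // Returns the text inside every {...} token, in order of appearance.
    static List<String> tokens(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {          // each call advances to the next match
            found.add(m.group(1));  // group(1) is the token without braces
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(tokens("Hello {World} this {is} a {Tokens} test"));
        // prints [World, is, Tokens]
    }
}
```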

java regular expression

Can anyone please help me do the following in a java regular expression?
I need to read 3 characters from the 5th position from a given String ignoring whatever is found before and after.
Example : testXXXtest
Expected result : XXX
You don't need regex at all.
Just use substring: yourString.substring(4,7)
If you do need to use regex, you can do it like this:
Pattern pattern = Pattern.compile(".{4}(.{3}).*");
Matcher matcher = pattern.matcher("testXXXtest");
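// matches() must succeed before group(1) is read; otherwise group(1) throws IllegalStateException
String whatYouNeed = matcher.matches() ? matcher.group(1) : null;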
What does it mean, step by step:
.{4} - any four characters
( - start capturing group, i.e. what you need
.{3} - any three characters
) - end capturing group, you got it now
.* followed by 0 or more arbitrary characters.
matcher.group(1) - get the 1st (only) capturing group.
You should be able to use the substring() method to accomplish this:
String example = "testXXXtest";
String result = example.substring(4, 7);
This might help: Groups and capturing in java.util.regex.Pattern.
Here is an example:
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class Example {
public static void main(String[] args) {
String text = "This is a testWithSomeDataInBetweentest.";
Pattern p = Pattern.compile("test([A-Za-z0-9]*)test");
Matcher m = p.matcher(text);
if (m.find()) {
System.out.println("Matched: " + m.group(1));
} else {
System.out.println("No match.");
}
}
}
This prints:
Matched: WithSomeDataInBetween
If you want to match the entire input string against the pattern (rather than seek a substring that matches), you can use matches() instead of find(). You can continue searching for more matching substrings with subsequent calls to find().
Also, your question did not specify what are admissible characters and length of the string between two "test" strings. I assumed any length is OK including zero and that we seek a substring composed of small and capital letters as well as digits.
You can use substring for this, you don't need a regex.
yourString.substring(4,7);
I'm sure you could use a regex too, but why if you don't need it. Of course you should protect this code against null and strings that are too short.
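A guarded version of the substring call might look like the following sketch. The helper name threeFromFifth and the choice to return null for unusable input are assumptions made for the example; throwing an exception would be an equally reasonable design.

```java
class FixedSliceDemo {
    // Returns the 3 characters starting at the 5th position (index 4),
    // or null if the input is null or shorter than 7 characters.
    static String threeFromFifth(String s) {
        if (s == null || s.length() < 7) {
            return null;
        }
        return s.substring(4, 7);
    }

    public static void main(String[] args) {
        System.out.println(threeFromFifth("testXXXtest")); // XXX
        System.out.println(threeFromFifth("short"));       // null
    }
}
```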
Use the String.replaceAll() method
If performance is not critical, you can try the String.replaceAll() instance method for a more compact option:
String sDataLine = "testXXXtest";
String sWhatYouNeed = sDataLine.replaceAll( ".{4}(.{3}).*", "$1" );
References
https://docs.oracle.com/javase/1.5.0/docs/api/java/lang/String.html
http://www.vogella.com/tutorials/JavaRegularExpressions/article.html#using-regular-expressions-with-string-methods
