How can I get a regular expression to discard a part of the match?
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class main {
public static void main(String[] args) {
Pattern pattern = Pattern.compile("(?<=b)([xyz])(?:a*?)c");
String string = "abyaacbxaaac";
Matcher matcher = pattern.matcher(string);
while(matcher.find()){
System.out.println(matcher.group());
}
}
}
The output here is:
yaac
xaaac
I'd like it to output only y and x when I run System.out.println(matcher.group());
I.e., discarding what is matched by (?:a*?).
P.S.
I know I can use matcher.group(1) to get x and y on their own, but I'd like the entire match to be only x and y, without having to access specific groups.
You can use lookarounds in your regex to get only the part you need in match:
(?<=b)[xyz](?=a*c)
(?=a*c) is a positive lookahead asserting that zero or more a characters followed by a c come next. It is a zero-width assertion, so the match itself is still just one of the [xyz] characters.
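As a minimal sketch of the lookaround approach above, the following runs the suggested pattern against the string from the question. The class and method names (LookaroundMatchDemo, findAll) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundMatchDemo {
    // Lookbehind and lookahead assert context without consuming it,
    // so group() returns only the single [xyz] character.
    private static final Pattern PATTERN = Pattern.compile("(?<=b)[xyz](?=a*c)");

    // Hypothetical helper: collects every whole match found in the input.
    static List<String> findAll(String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = PATTERN.matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(findAll("abyaacbxaaac")); // prints [y, x]
    }
}
```

Because the a characters and the c are only asserted, not consumed, matcher.group() and matcher.group(0) both return the bare x or y.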
I'm using a regex to pull out words of length 5 that have a space before and after them, so all of the following words should match my pattern. But it seems that once the first word has matched, the space after it is consumed, which makes the second word fail to match.
To illustrate, I want to get the printout as:
apple orange pines dorms
Instead, I get:
apple pines
How can I handle this issue?
Code:
public static void main(String[] args) {
String myStr = " apple orange pines dorms ";
regexChecker("(\\s[A-Za-z]{5}\\s)", myStr);
}
public static void regexChecker(String regex, String strToCheckOn){
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(strToCheckOn);
while (m.find()){
if(m.group().length() != 0){
System.out.println(m.group(1));
}
System.out.println();
}
}
You need to use lookahead and lookbehind instead of consuming spaces before/after words:
(?<=\\s|^)[A-Za-z]{5,}(?=\\s|$)
(?<=\\s|^) is a lookbehind that asserts we have either the start of the input or a whitespace character before our match.
(?=\\s|$) is a lookahead that asserts we have either the end of the input or a whitespace character after our match.
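A short sketch of this pattern applied to the string from the question is shown here. The names WordLookaroundDemo and findWords are hypothetical. Note that the answer's {5,} quantifier matches words of five or more letters, which is what lets the six-letter "orange" through.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordLookaroundDemo {
    // The surrounding whitespace is asserted, not consumed, so the space
    // after one word is still available as the "space before" the next word.
    private static final Pattern WORD =
            Pattern.compile("(?<=\\s|^)[A-Za-z]{5,}(?=\\s|$)");

    // Hypothetical helper: returns every word the pattern matches.
    static List<String> findWords(String input) {
        List<String> words = new ArrayList<>();
        Matcher m = WORD.matcher(input);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        // prints [apple, orange, pines, dorms]
        System.out.println(findWords(" apple orange pines dorms "));
    }
}
```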
I have the following program:
public class PatternMatching {
public static void main(String[] args) {
String pattern ="a??";
Pattern pattern1 = Pattern.compile(pattern);
String findAgainst = "a";
Matcher matcher = pattern1.matcher(findAgainst);
int count=0;
while(matcher.find()){
count++;
System.out.println(matcher.group(0)+".start="+ matcher.start()+".end="+matcher.end());
}
System.out.println(count);
}
}
which prints the following output
.start=0.end=0
.start=1.end=1
2
instead of
.start=0.end=0
a.start=0.end=1
.start=1.end=1
3
when I run the program with pattern "b??"
the output is
.start=0.end=0
.start=1.end=1
2
which is correct. What is the reason for the incorrect output with "a??", even though it is a reluctant quantifier?
From what I see, the issue is that the Java regex engine uses the following algorithm when it encounters a zero-length match: it compares the end index of the match to the index where the match started, and if they coincide, the search index is incremented by one before the next attempt.
Thus, when a?? matched the empty string before a, the regex engine found a zero-length match and moved the search index past a, skipping the match you expected.
If you use a greedy version - a? - the output will be different:
a.start=0.end=1
.start=1.end=1
2
It happens because the first a was consumed, the regex engine index is after a, and can now match the end-of-string.
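The difference between the reluctant and greedy versions can be reproduced with a small sketch like the one below. The class name EmptyMatchDemo and the "text@start-end" output format are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmptyMatchDemo {
    // Records each match as "text@start-end" so empty matches stay visible.
    static List<String> describeMatches(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            found.add(matcher.group() + "@" + matcher.start() + "-" + matcher.end());
        }
        return found;
    }

    public static void main(String[] args) {
        // Reluctant: the empty match at 0 pushes the next search to index 1.
        System.out.println(describeMatches("a??", "a")); // prints [@0-0, @1-1]
        // Greedy: "a" is consumed first, then the empty match at the end.
        System.out.println(describeMatches("a?", "a"));  // prints [a@0-1, @1-1]
    }
}
```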
I have a regex pattern that I am using to match screen.
When I test it in Sublime Text, it works just fine,
but when I run it in Java, the code fails:
System.out.println(Pattern.matches("(B+)?|(R+)?", "RRBRR"));//false
System.out.println(Pattern.matches("(B+)?|(R+)?", "RRRRR"));//true
I expected the above code to return true in both cases, but in Java the first call returns false.
my basic requirement is to identify groups of unique character in sequence...
meaning if String is
RRRRBBBRRBBBRBBBRRR
Then it should identify as
RRRR BBB RR BBB R BBB RRR
Please help...Thanks in advance
Try this:
String value = "RRRRBBBRRBBBRBBBRRR";
Pattern pattern = Pattern.compile("B+|R+");
Matcher matcher = pattern.matcher(value);
while (matcher.find()) {
System.out.println(matcher.group());
}
The first expression returns false because there is a B in the middle of several Rs, so the string as a whole is not an exact match: your regular expression accepts either only Rs or only Bs.
matches() behaves as if there were an implicit ^ at the start and $ at the end, which means substring matches won't work. find() looks for matching substrings.
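To make the matches() versus find() distinction concrete, here is a small sketch using the strings from the question. The class and method names (MatchesVsFindDemo, matchesWhole, runs) are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchesVsFindDemo {
    // Pattern.matches succeeds only if the pattern covers the ENTIRE input.
    static boolean matchesWhole(String input) {
        return Pattern.matches("(B+)?|(R+)?", input);
    }

    // find() scans for successive substrings, here runs of one repeated letter.
    static List<String> runs(String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = Pattern.compile("B+|R+").matcher(input);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("RRBRR")); // prints false
        System.out.println(matchesWhole("RRRRR")); // prints true
        // prints [RRRR, BBB, RR, BBB, R, BBB, RRR]
        System.out.println(runs("RRRRBBBRRBBBRBBBRRR"));
    }
}
```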
Matcher is best suited for this:
public static void main (String[] args) throws java.lang.Exception
{
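String regex = "B+|R+"; // no optional groups, so no empty matches are reported
Pattern pat = Pattern.compile(regex);
Matcher matcher = pat.matcher("RRBRR");
// Do not call matcher.find() before the loop: that call would consume the first match.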
int count = 0;
while(matcher.find()){
System.out.println(matcher.group());
count++;
}
System.out.println("Count:"+count);
}
I have the following Java code:
Pattern pat = Pattern.compile("(?<!function )\\w+");
Matcher mat = pat.matcher("function example");
System.out.println(mat.find());
Why does mat.find() return true? I used negative lookbehind and example is preceded by function. Shouldn't it be discarded?
See what it matches:
public static void main(String[] args) throws Exception {
Pattern pat = Pattern.compile("(?<!function )\\w+");
Matcher mat = pat.matcher("function example");
while (mat.find()) {
System.out.println(mat.group());
}
}
Output:
function
xample
So first it finds function, which isn't preceded by "function ". Then it finds xample, which is preceded by "function e" rather than by "function ".
Presumably you want the pattern to match the whole text, not just find matches in the text.
You can either do this with Matcher.matches() or you can change the pattern to add start and end anchors:
^(?<!function )\\w+$
I prefer the second approach, as it means that the pattern itself defines its match region rather than the region being defined by its usage. That's just a matter of preference, however.
Your string has the word "function" that matches \w+, and is not preceded by "function ".
Notice two things here:
You're using find(), which returns true for a substring match as well.
Because of the above, "function" matches, as it is not preceded by "function ".
The whole string would never have matched, because your regex doesn't include spaces.
Use Matcher#matches(), or ^ and $ anchors with a negative lookahead instead:
Pattern pat = Pattern.compile("^(?!function)[\\w\\s]+$"); // added \s for whitespaces
Matcher mat = pat.matcher("function example");
System.out.println(mat.find()); // false
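The behaviours discussed in both answers can be compared side by side with a sketch like this. The helper names are hypothetical, and the \b variant is not from the answers above: it is added on the assumption that the goal was to find whole words not preceded by "function ".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NegativeLookbehindDemo {
    // Hypothetical helper: every substring that find() reports for a regex.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher mat = Pattern.compile(regex).matcher(input);
        while (mat.find()) {
            matches.add(mat.group());
        }
        return matches;
    }

    // The anchored pattern from the answer: rejects input starting with "function".
    static boolean acceptedByAnchoredPattern(String input) {
        return Pattern.compile("^(?!function)[\\w\\s]+$").matcher(input).find();
    }

    public static void main(String[] args) {
        String text = "function example";
        // Original pattern: a match may start in the middle of a word.
        System.out.println(findAll("(?<!function )\\w+", text));    // prints [function, xample]
        // Assumption: \b forces matches to start at a word boundary.
        System.out.println(findAll("\\b(?<!function )\\w+", text)); // prints [function]
        System.out.println(acceptedByAnchoredPattern(text));         // prints false
    }
}
```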
In the following code:
public static void main(String[] args) {
List<String> allMatches = new ArrayList<String>();
Matcher m = Pattern.compile("\\d+\\D+\\d+").matcher("2abc3abc4abc5");
while (m.find()) {
allMatches.add(m.group());
}
String[] res = allMatches.toArray(new String[0]);
System.out.println(Arrays.toString(res));
}
The result is:
[2abc3, 4abc5]
I'd like it to be
[2abc3, 3abc4, 4abc5]
How can it be achieved?
Make the matcher attempt to start its next scan from the latter \d+.
Matcher m = Pattern.compile("\\d+\\D+(\\d+)").matcher("2abc3abc4abc5");
if (m.find()) {
do {
allMatches.add(m.group());
} while (m.find(m.start(1)));
}
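For completeness, here is a self-contained sketch of the restart approach above, wrapped in a hypothetical class (OverlapByRestartDemo) so it can be run directly. It relies on Matcher.find(int), which resets the matcher and searches from the given index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OverlapByRestartDemo {
    // Group 1 captures the trailing number so the next search can start there.
    private static final Pattern PATTERN = Pattern.compile("\\d+\\D+(\\d+)");

    static List<String> overlappingMatches(String input) {
        List<String> allMatches = new ArrayList<>();
        Matcher m = PATTERN.matcher(input);
        if (m.find()) {
            do {
                allMatches.add(m.group());
                // Restart at the beginning of the trailing number, which
                // becomes the leading number of the next match.
            } while (m.find(m.start(1)));
        }
        return allMatches;
    }

    public static void main(String[] args) {
        System.out.println(overlappingMatches("2abc3abc4abc5")); // prints [2abc3, 3abc4, 4abc5]
    }
}
```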
Not sure if this is possible in Java, but in PCRE you could do the following:
(?=(\d+\D+\d+)).
Explanation
The technique is to use a matching group in a lookahead, and then "eat" one character to move forward.
(?= : start of positive lookahead
( : start matching group 1
\d+ : match a digit one or more times
\D+ : match a non-digit character one or more times
\d+ : match a digit one or more times
) : end of group 1
) : end of lookahead
. : match anything, this is to "move forward".
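Thanks to Casimir et Hippolyte, it really seems to work in Java. You just need to double the backslashes for the Java string literal and read the first capturing group instead of the whole match. The pattern becomes (?=(\\d+\\D+\\d+)). where the trailing dot is part of the pattern.
Tested on www.regexplanet.com.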
The above solution of HamZa works perfectly in Java. If you want to find a specific pattern in a text all you have to do is:
String regex = "\\d+\\D+\\d+";
String updatedRegex = "(?=(" + regex + ")).";
Here regex is the pattern you are looking for; to make its matches overlap, wrap it with (?=( at the start and )). at the end, then read each result from group(1) rather than group().
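A runnable sketch of the lookahead wrapping described above might look like this. The class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OverlapByLookaheadDemo {
    // Wraps a pattern so it is captured inside a lookahead; the trailing "."
    // consumes one character so the scan advances by a single position.
    static List<String> overlappingMatches(String regex, String input) {
        String updatedRegex = "(?=(" + regex + ")).";
        List<String> allMatches = new ArrayList<>();
        Matcher m = Pattern.compile(updatedRegex).matcher(input);
        while (m.find()) {
            allMatches.add(m.group(1)); // group() itself is only one character
        }
        return allMatches;
    }

    public static void main(String[] args) {
        // prints [2abc3, 3abc4, 4abc5]
        System.out.println(overlappingMatches("\\d+\\D+\\d+", "2abc3abc4abc5"));
    }
}
```

One caveat: because the scan advances one character at a time, multi-digit numbers produce extra matches that start inside a number. For example, "12abc34" yields both 12abc34 and 2abc34.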