Matching a regular expression on multiline not working - java

I have the following file contents and I'm trying to match the regex explained below:
-- file.txt (doesn't match multi-line) --
test
On blah
more blah wrote:
---------------
If I read the file contents above into a String and try to match the "On...wrote:" part, I cannot get a match:
// String text = <file contents from above>
Pattern PATTERN = Pattern.compile("^(On\\s(.+)wrote:)$", Pattern.MULTILINE);
Matcher m = PATTERN.matcher(text);
if (m.find()) {
    System.out.println("Never gets HERE???");
}
The above regex works fine if the contents of the file are on one line:
-- file2.txt (matches on single line) --
test
On blah more blah wrote: On blah more blah wrote:
---------------
How do I get the multiline to work and the single line all in one regex (or two for that matter)? Thx!

Pattern.MULTILINE only tells Java to let the anchors ^ and $ match at the start and end of each line.
Add the Pattern.DOTALL flag to allow the dot (.) to match newline characters as well. Combine the two flags with the bitwise inclusive OR operator (|):
Pattern PATTERN =
    Pattern.compile("^(On\\s(.+)wrote:)$", Pattern.MULTILINE | Pattern.DOTALL);
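To make the difference between the two flags concrete, here is a minimal sketch that runs the question's pattern against the same text with and without Pattern.DOTALL. The class name DotallDemo and the helper firstCapture are illustrative names, not part of the original code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DotallDemo {
    // Same pattern as in the question; only the flags differ between calls.
    static final String REGEX = "^(On\\s(.+)wrote:)$";

    // Returns group 2 of the first match, or null if nothing matches.
    static String firstCapture(String text, int flags) {
        Matcher m = Pattern.compile(REGEX, flags).matcher(text);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String text = "test\nOn blah\nmore blah wrote:\n";
        // MULTILINE alone: the dot stops at the line break, so there is no match.
        System.out.println(firstCapture(text, Pattern.MULTILINE));
        // MULTILINE | DOTALL: the dot may cross the line break.
        System.out.println(firstCapture(text, Pattern.MULTILINE | Pattern.DOTALL));
    }
}
```

With both flags set, group 2 spans the line break between "blah" and "more blah".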

You could use a character class combining \S (non-whitespace) and \s (whitespace), which matches any character, including newlines:
Pattern PATTERN = Pattern.compile("(On\\s([\\S\\s]*?)wrote:)");
See live regex101 demo
Example:
import java.util.regex.*;
class rTest {
    public static void main (String[] args) {
        String s = "test\n\n"
                 + "On blah\n\n"
                 + "more blah wrote:\n";
        Pattern p = Pattern.compile("(On\\s([\\S\\s]*?)wrote:)");
        Matcher m = p.matcher(s);
        if (m.find()) {
            System.out.println(m.group(2));
        }
    }
}

Related

How to figure out exact reason why Regex is failing in java

I have a regex pattern that I am using to match screen.
When I test it in Sublime Text, it works just fine,
but when executed in Java, it fails:
System.out.println(Pattern.matches("(B+)?|(R+)?", "RRBRR"));//false
System.out.println(Pattern.matches("(B+)?|(R+)?", "RRRRR"));//true
Both calls should return true, but in Java the first one returns false.
My basic requirement is to identify groups of the same character in sequence.
That is, if the string is
RRRRBBBRRBBBRBBBRRR
then it should be identified as
RRRR BBB RR BBB R BBB RRR
Please help. Thanks in advance.
Try this:
String value = "RRRRBBBRRBBBRBBBRRR";
Pattern pattern = Pattern.compile("B+|R+");
Matcher matcher = pattern.matcher(value);
while (matcher.find()) {
    System.out.println(matcher.group());
}
The first expression returns false because there is a B in the middle of several Rs, so the input is not an exact match: your regular expression accepts only a string made up entirely of Rs or entirely of Bs.
matches() behaves as if there were an implicit ^ at the start and $ at the end, which means substring matches won't work. find() looks for a matching substring instead.
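The difference between matches() and find() described above can be sketched as follows. RunSplitter and runs are hypothetical names chosen for this illustration; the pattern B+|R+ is the one suggested in the first answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RunSplitter {
    private static final Pattern RUN = Pattern.compile("B+|R+");

    // Collects every maximal run of B's or R's, in order of appearance.
    static List<String> runs(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = RUN.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // matches() must consume the whole input, so a mixed string fails.
        System.out.println(Pattern.matches("(B+)?|(R+)?", "RRBRR"));
        // find() walks through the input one run at a time.
        System.out.println(String.join(" ", runs("RRRRBBBRRBBBRBBBRRR")));
    }
}
```

Returning a list instead of printing inside the loop keeps the splitting logic reusable; whether that is worth it depends on how the runs are consumed afterwards.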
Matcher is best suited for this:
public static void main (String[] args)
{
    // B+|R+ is used instead of (B+)?|(R+)? because the optional groups also match the empty string
    String regex = "B+|R+";
    Pattern pat = Pattern.compile(regex);
    Matcher matcher = pat.matcher("RRBRR");
    int count = 0;
    // every find() call advances to the next match, so call it only in the loop condition
    while (matcher.find()) {
        System.out.println(matcher.group());
        count++;
    }
    System.out.println("Count:" + count);
}

Regex to match the beginning and the end of a string in Java

I want to extract a certain portion of a string using a regex in Java. I currently have this pattern:
pattern = "^\\a.+\\sed$\n";
It is supposed to match a string that starts with "a" and ends with "sed", but it is not working. Did I miss something?
I removed the \n at the end of the pattern so that it now ends with "$" (see the code below).
It still doesn't get a match, although the regex looks legitimate to me.
What I want to extract is the "a sed" from the temp string.
String temp = "afsgdhgd gfgshfdgadh a sed afdsgdhgdsfgdfagdfhh";
pattern = "(?s)^a.*sed$";
pr = Pattern.compile(pattern);
math = pr.matcher(temp);
UPDATE
You want to match a sed, so you can use a\\s+sed if there is only whitespace between a and sed:
String s = "afsgdhgd gfgshfdgadh a sed afdsgdhgdsfgdfagdfhh";
Pattern pattern = Pattern.compile("a\\s+sed");
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
    System.out.println(matcher.group(0));
}
See IDEONE demo
Now, if there can be anything between a and sed, use a tempered greedy token:
Pattern pattern = Pattern.compile("(?s)a(?:(?!a|sed).)*sed");
                                        ^^^^^^^^^^^^^^^
See another IDEONE demo.
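As a rough illustration of how the tempered greedy token behaves, the sketch below collects every match of the pattern above. TemperedTokenDemo and findAll are names made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TemperedTokenDemo {
    // (?:(?!a|sed).)* consumes a character only while neither "a" nor "sed"
    // starts at the current position, so the match cannot run past either.
    static final Pattern TEMPERED = Pattern.compile("(?s)a(?:(?!a|sed).)*sed");

    // Returns all non-overlapping matches of p in input.
    static List<String> findAll(Pattern p, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String s = "afsgdhgd gfgshfdgadh a sed afdsgdhgdsfgdfagdfhh";
        System.out.println(findAll(TEMPERED, s));
    }
}
```

Because the token refuses to step over another a, each match starts at the a closest to the following sed rather than at the first a in the string.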
ORIGINAL ANSWER
The main problem with your regex is the \n at the end. $ is the end of the string, and you then try to match one more character after it, which fails here because nothing follows the end of the string. Also, \\s matches a whitespace character, but you need a literal s, and \\a matches the bell (alert) character rather than a literal a.
You need to remove the backslashes before a and s, drop the \n, and make . match a newline. It is also advisable to use the * quantifier to allow 0 characters in between:
pattern = "(?s)^a.*sed$";
See the regex demo
The regex matches:
^ - start of string
a - a literal a
.* - 0 or more of any character (since the (?s) modifier makes . match any character, including a newline)
sed - a literal letter sequence sed
$ - end of string
Your temp string cannot match the pattern (?s)^a.*sed$, because this pattern says that your temp string must begin with the character a and end with the sequence sed, which is not the case. Your string has trailing characters after the "sed" sequence.
If you only want to extract that a...sed portion of the whole string, try using the unanchored pattern "a.*sed" and use the find() method of the Matcher class:
Pattern pattern = Pattern.compile("a.*sed");
Matcher m = pattern.matcher(temp);
if (m.find())
{
    System.out.println("Found string "+m.group());
    System.out.println("From "+m.start()+" to "+m.end());
}
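The effect of the anchors can be checked side by side with a small sketch like the one below. AnchorDemo and its two helper methods are illustrative names only; the patterns and the temp string are taken from the question and answers above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchorDemo {
    static final String TEMP = "afsgdhgd gfgshfdgadh a sed afdsgdhgdsfgdfagdfhh";

    // True only if the whole input starts with "a" and ends with "sed".
    static boolean anchoredFind(String input) {
        return Pattern.compile("(?s)^a.*sed$").matcher(input).find();
    }

    // Returns the first "a...sed" stretch found anywhere in the input, or null.
    static String unanchoredFind(String input) {
        Matcher m = Pattern.compile("a.*sed").matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(anchoredFind(TEMP));    // the input does not end with "sed"
        System.out.println(anchoredFind("a sed")); // starts with "a" and ends with "sed"
        System.out.println(unanchoredFind(TEMP));  // greedy: from the first "a" to the last "sed"
    }
}
```

Note that the unanchored greedy pattern returns everything from the first a in the string up to sed, which is more than just "a sed".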

Java regex pattern issue

I have a string:
bundle://24.0:0/com/keop/temp/Activator.class
And from this string I need to get com/keop/temp/Activator but the following pattern:
Pattern p = Pattern.compile("bundle://.*/(.*)\\.class");
returns only Activator. Where is my mistake?
You need to follow the initial token .* with ? for a non-greedy match.
bundle://.*?/(.*)\\.class
           ^
Your regex uses greedy matching with a . that matches any character except a newline. .*/ consumes everything up to the final /, and (.*)\\. then captures everything up to the final period. Instead of lazy matching, you can restrict the characters matched before the part you want to anything other than /. Change to
Pattern p = Pattern.compile("bundle://[^/]*/(.*)\\.class");
Sample code:
String str = "bundle://24.0:0/com/keop/temp/Activator.class";
Pattern ptrn = Pattern.compile("bundle://[^/]*/(.*)\\.class");
Matcher matcher = ptrn.matcher(str);
if (matcher.find()) {
    System.out.println(matcher.group(1));
}
Output of the sample program:
com/keop/temp/Activator
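For comparison, the following sketch runs the greedy pattern from the question next to the lazy and negated-class variants suggested above. BundlePathDemo and capture are names invented for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BundlePathDemo {
    // Returns group 1 of the first match of regex in input, or null.
    static String capture(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String str = "bundle://24.0:0/com/keop/temp/Activator.class";
        // Greedy .* runs to the last slash, leaving only the class name.
        System.out.println(capture("bundle://.*/(.*)\\.class", str));
        // Lazy .*? stops at the first slash after the prefix.
        System.out.println(capture("bundle://.*?/(.*)\\.class", str));
        // [^/]* cannot cross a slash at all, so it also stops at the first one.
        System.out.println(capture("bundle://[^/]*/(.*)\\.class", str));
    }
}
```

The negated class states the intent ("no slashes in this segment") directly, which is one reason to prefer it over the lazy quantifier here.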

java regex , extract a line?

Given 3 lines, how can I extract the 2nd line using a regular expression?
line1
line2
line3
I used
pattern = Pattern.compile("line1.*(.*?).*line3");
But nothing appears
You can use Pattern.DOTALL flag like this:
String str = "line1\nline2\nline3";
Pattern pt = Pattern.compile("line1\n(.+?)\nline3", Pattern.DOTALL);
Matcher m = pt.matcher(str);
while (m.find())
    System.out.printf("Matched - [%s]%n", m.group(1)); // outputs [line2]
Your pattern won't work, since your first .* matches everything up to line3, leaving nothing for the reluctant group or the second .* to match.
Try to specify the line boundaries (^ and $) after line1 and before line3.
Try pattern = Pattern.compile("line1.*?^(.*?)$.*?line3", Pattern.DOTALL | Pattern.MULTILINE);
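The sketch below compares the question's pattern with two working variants: the explicit \n version from the first answer, and one possible way of adding the ^ and $ anchors mentioned above. MiddleLineDemo and capture are hypothetical names, and the anchored pattern is an assumption about what such a fix could look like rather than a quotation from any answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MiddleLineDemo {
    // Returns group 1 of the first match, or null when the pattern does not match.
    static String capture(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String str = "line1\nline2\nline3";
        // No flags: the dot cannot cross a line break, so nothing matches.
        System.out.println(capture("line1.*(.*?).*line3", 0, str));
        // Explicit line breaks around the captured line.
        System.out.println(capture("line1\n(.+?)\nline3", Pattern.DOTALL, str));
        // ^ and $ (MULTILINE) pin the group to one whole line.
        System.out.println(capture("line1.*?^(.*?)$.*?line3",
                Pattern.DOTALL | Pattern.MULTILINE, str));
    }
}
```

Without the anchors, a reluctant group placed between two other reluctant quantifiers is free to match the empty string, which is why pinning it to line boundaries matters.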
You can extract everything between two non-empty lines:
(?<=.+\n).+(?=\n.+)

RegEx - problem with multiline input

I have a String with multiline content and want to select a multiline region, preferably using a regular expression (just because I'm trying to understand Java RegEx at the moment).
Consider the input like:
Line 1
abc START def
Line 2
Line 3
gh END jklm
Line 4
Assuming START and END are unique and the start/end markers for the region, I'd like to create a pattern/matcher to get the result:
def
Line 2
Line 3
gh
My current attempt is
Pattern p = Pattern.compile("START(.*)END");
Matcher m = p.matcher(input);
if (m.find())
    System.out.println(m.group(1));
But the result is
gh
So m.start() seems to point at the beginning of the line that contains the 'end marker'. I tried to add Pattern.MULTILINE to the compile call but that (alone) didn't change anything.
Where is my mistake?
You want Pattern.DOTALL, so . matches newline characters. MULTILINE addresses a different issue, the ^ and $ anchors.
Pattern p = Pattern.compile("START(.*)END", Pattern.DOTALL);
You want to set Pattern.DOTALL (so you can match end of line characters with your . wildcard), see this test:
@Test
public void testMultilineRegex() throws Exception {
    final String input = "Line 1\nabc START def\nLine 2\nLine 3\ngh END jklm\nLine 4";
    final String expected = " def\nLine 2\nLine 3\ngh ";
    final Pattern p = Pattern.compile("START(.*)END", Pattern.DOTALL);
    final Matcher m = p.matcher(input);
    if (m.find()) {
        Assert.assertEquals(expected, m.group(1));
    } else {
        Assert.fail("pattern not found");
    }
}
The regex metacharacter . does not match a newline by default. You can try the regex:
START([\w\W]*)END
which uses [\w\W] in place of the dot.
[\w\W] is a character class that matches either a word character or a non-word character, so it effectively matches any character, including newlines.
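In Java source the backslashes in [\w\W] have to be doubled inside the string literal. A minimal sketch, assuming the input from the question, that checks the character-class variant against the Pattern.DOTALL variant (RegionDemo and region are illustrative names):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegionDemo {
    // Returns the text captured between START and END, or null if there is no match.
    static String region(String input, Pattern p) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "Line 1\nabc START def\nLine 2\nLine 3\ngh END jklm\nLine 4";
        Pattern dotAll = Pattern.compile("START(.*)END", Pattern.DOTALL);
        // In a Java string literal each regex backslash must be doubled.
        Pattern charClass = Pattern.compile("START([\\w\\W]*)END");
        // Both variants should capture the same multi-line region.
        System.out.println(region(input, dotAll).equals(region(input, charClass)));
    }
}
```

Both quantifiers are greedy, so if END could occur more than once the capture would extend to the last occurrence; the question assumes the markers are unique.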
