I have a String with multiline content and want to select a multiline region, preferably using a regular expression (just because I'm trying to understand Java RegEx at the moment).
Consider the input like:
Line 1
abc START def
Line 2
Line 3
gh END jklm
Line 4
Assuming START and END are unique and the start/end markers for the region, I'd like to create a pattern/matcher to get the result:
def
Line 2
Line 3
gh
My current attempt is
Pattern p = Pattern.compile("START(.*)END");
Matcher m = p.matcher(input);
if (m.find())
System.out.println(m.group(1));
But the result is
gh
So m.start() seems to point at the beginning of the line that contains the 'end marker'. I tried to add Pattern.MULTILINE to the compile call but that (alone) didn't change anything.
Where is my mistake?
You want Pattern.DOTALL, so . matches newline characters. MULTILINE addresses a different issue, the ^ and $ anchors.
Pattern p = Pattern.compile("START(.*)END", Pattern.DOTALL);
You want to set Pattern.DOTALL (so you can match end of line characters with your . wildcard), see this test:
#Test
public void testMultilineRegex() throws Exception {
final String input = "Line 1\nabc START def\nLine 2\nLine 3\ngh END jklm\nLine 4";
final String expected = " def\nLine 2\nLine 3\ngh ";
final Pattern p = Pattern.compile("START(.*)END", Pattern.DOTALL);
final Matcher m = p.matcher(input);
if (m.find()) {
Assert.assertEquals(expected, m.group(1));
} else {
Assert.fail("pattern not found");
}
}
The regex metachar . does not match a newline. You can try the regex:
START([\w\W]*)END
which uses [\w\W] in place of ..
[\w\W] is a char class to match a word-char and a non-word-char, so effectively matches everything.
Related
I am trying to get all the matching groups in my string.
My regular expression is "(?<!')/|/(?!')". I am trying to split the string using regular expression pattern and matcher. string needs to be split by using /, but '/'(surrounded by ') this needs to be skipped. for example "One/Two/Three'/'3/Four" needs to be split as ["One", "Two", "Three'/'3", "Four"] but not using .split method.
I am currently the below
// String to be scanned to find the pattern.
String line = "Test1/Test2/Tt";
String pattern = "(?<!')/|/(?!')";
// Create a Pattern object
Pattern r = Pattern.compile(pattern);
// Now create matcher object.
Matcher m = r.matcher(line);
if (m.matches()) {
System.out.println("Found value: " + m.group(0) );
} else {
System.out.println("NO MATCH");
}
But it always saying "NO MATCH". where i am doing wrong? and how to fix that?
Thanks in advance
To get the matches without using split, you might use
[^'/]+(?:'/'[^'/]*)*
Explanation
[^'/]+ Match 1+ times any char except ' or /
(?: Non capture group
'/'[^'/]* Match '/' followed by optionally matching any char except ' or /
)* Close group and optionally repeat it
Regex demo | Java demo
String regex = "[^'/]+(?:'/'[^'/]*)*";
String string = "One/Two/Three'/'3/Four";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println(matcher.group(0));
}
Output
One
Two
Three'/'3
Four
Edit
If you do not want to split don't you might also use a pattern to not match / but only when surrounded by single quotes
[^/]+(?:(?<=')/(?=')[^/]*)*
Regex demo
Try this.
String line = "One/Two/Three'/'3/Four";
Pattern pattern = Pattern.compile("('/'|[^/])+");
Matcher m = pattern.matcher(line);
while (m.find())
System.out.println(m.group());
output:
One
Two
Three'/'3
Four
Here is simple pattern matching all desired /, so you can split by them:
(?<=[^'])\/(?=')|(?<=')\/(?=[^'])|(?<=[^'])\/(?=[^'])
The logic is as follows: we have 4 cases:
/ is sorrounded by ', i.e. `'/'
/ is preceeded by ', i.e. '/
/ is followed by ', i.e. /'
/ is sorrounded by characters other than '
You want only exclude 1. case. So we need to write regex for three cases, so I have written three similair regexes and used alternation.
Explanation of the first part (other two are analogical):
(?<=[^']) - positiva lookbehind, assert what preceeds is differnt frim ' (negated character class [^']
\/ - match / literally
(?=') - positiva lookahead, assert what follows is '\
Demo with some more edge cases
Try something like this:
String line = "One/Two/Three'/'3/Four";
String pattern = "([^/]+'/'\d)|[^/]+";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(line);
boolean found = false;
while(m.find()) {
System.out.println("Found value: " + m.group() );
found = true;
}
if(!found) {
System.out.println("NO MATCH");
}
Output:
Found value: One
Found value: Two
Found value: Three'/'3
Found value: Four
I have the following text
CHAPTER 1
Introduction
CHAPTER OVERVIEW
Which I did create and tested (http://regexr.com/) the following regEx for
(CHAPTER\s{1}\d\n)
However when I use the following code on Java it fails
String text = stripper.getText(document);//The text above
Pattern p = Pattern.compile("(CHAPTER\\s{1}\\d\\n)");
Matcher m = p.matcher(text);
if (m.find()) {
//do action
}
the m.find() returns always false.
Your document may have DOS line feed \r as well. You can use either of these patterns:
Pattern p = Pattern.compile("CHAPTER\\s+\\d+\\R");
\R (requires Java 8) will match any combination of \r and \n after your digits or just use:
Pattern p = Pattern.compile("CHAPTER\\s+\\d+\\s");
since \s also matches any whitespace including newline characters.
Another alternative is to use MULTILINE flag with anchor $:
Pattern p = Pattern.compile("(?m)CHAPTER\\s+\\d+$");
Your problem is in your source text. I think you forget about new lines. Because this:
String text = "CHAPTER 1\n" +
"Introduction\n" +
"CHAPTER OVERVIEW";
Pattern p = Pattern.compile("(CHAPTER\\s{1}\\d\\n)");
Matcher m = p.matcher(text);
System.out.println(m.find());
will write true. String body is copied from here and Intellij add there new lines. Try to debug what you really get in stripper.getText(document).
You can use Pattern as second param for compile. (Pattern.MULTILINE) More info
here
.
I have a regex to match a line and delete it. Everything is below it (and keep everything above it).
Two Part Ask:
1) Why won't this pattern match the given String text below?
2) How can I be sure to just match on a single line and not multiple lines?
- The pattern has to be found on the same single line.
String text = "Keep this.\n\n\nPlease match junkhere this t-h-i-s is missing.\n"
+ "Everything should be deleted here but don't match this on this line" + "\n\n";
Pattern p = Pattern.compile("^(Please(\\s)(match)(\\s)(.*?)\\sthis\\s(.*))$", Pattern.DOTALL );
Matcher m = p.matcher(text);
if (m.find()) {
text = (m.replaceAll("")).replaceAll("[\n]+$", ""); // remove everything below at and below "Please match ... this"
System.out.println(text);
}
Expected Output:
Keep this.
You are complicating your life...
First, as I said in the comment, use Pattern.MULTILINE.
Then, to truncate the string from the beginning of the match, use .substring():
final Pattern p = Pattern.compile("^Please\\s+match\\b.*?this",
Pattern.MULTILINE);
final Matcher m = p.matcher(input);
return m.find() ? input.substring(0, m.start()) : input;
Remove DOTALL to make sure to match on a single line and convert \s to " "
Pattern p = Pattern.compile("^(Please( )(match)( )(.*?) this (.*))$");
DOTALL makes a dot match newlines as well
\s can match any whitespace including new lines.
I have the following file contents and I'm trying to match a reg explained below:
-- file.txt (doesn't match multi-line) --
test
On blah
more blah wrote:
---------------
If I read the file contents from above to a String and try to match the "On...wrote:" part I cannot get a match:
// String text = <file contents from above>
Pattern PATTERN = Pattern.compile("^(On\\s(.+)wrote:)$", Pattern.MULTILINE);
Matcher m = PATTERN.matcher(text);
if (m.find()) {
System.out.println("Never gets HERE???");
}
The above regex works fine if the contents of the file are on one line:
-- file2.txt (matches on single line) --
test
On blah more blah wrote: On blah more blah wrote:
---------------
How do I get the multiline to work and the single line all in one regex (or two for that matter)? Thx!
Pattern.MULTILINE just tells Java to accept the anchors ^ and $ to match at the start and end of each line.
Add the Pattern.DOTALL flag to allow the dot . character to match newline characters. This is done using the bitwise inclusive OR | operator
Pattern PATTERN =
Pattern.compile("^(On\\s(.+)wrote:)$", Pattern.MULTILINE | Pattern.DOTALL );
You could use a combination of matching \S (non-whitespace) and \s (whitespace)
Pattern PATTERN = Pattern.compile("(On\\s([\\S\\s]*?)wrote:)");
See live regex101 demo
Example:
import java.util.regex.*;
class rTest {
public static void main (String[] args) {
String s = "test\n\n"
+ "On blah\n\n"
+ "more blah wrote:\n";
Pattern p = Pattern.compile("(On\\s([\\S\\s]*?)wrote:)");
Matcher m = p.matcher(s);
if (m.find()) {
System.out.println(m.group(2));
}
}
}
given 3 lines , how can I extract 2nd line using regular expression ?
line1
line2
line3
I used
pattern = Pattern.compile("line1.*(.*?).*line3");
But nothing appears
You can use Pattern.DOTALL flag like this:
String str = "line1\nline2\nline3";
Pattern pt = Pattern.compile("line1\n(.+?)\nline3", Pattern.DOTALL);
Matcher m = pt.matcher(str);
while (m.find())
System.out.printf("Matched - [%s]%n", m.group(1)); // outputs [line2]
This won't work, since your first .* matches everything up to line3. Your reluctant match gets lost, as does the second .*.
Try to specify the line breaks (^ and $) after line1 / before line3.
Try pattern = Pattern.compile("line1.*?(.*?).*?line3", Pattern.DOTALL | Pattern.MULTILINE);
You can extract everything between two non-empty lines:
(?<=.+\n).+(?=\n.+)