How to match anything between two characters? - java

How can I write a regex that matches anything between two specific characters?
like:
ignore me [take:me] ignore me?
How can I match [take:me], brackets included?
The word take:me is dynamic, so I'd also like to match [123as d:.-,§""§%]

You can use this regex:
"\\[(.*?)\\]"
It works because \\[ and \\] match the literal brackets, while (.*?) lazily captures as few characters as possible between them.
Pattern pattern = Pattern.compile("\\[(.*?)\\]");
Matcher matcher = pattern.matcher("ignore me [take:me] ignore me");
if (matcher.find()) {
    System.out.println(matcher.group(1));
}
This will print take:me.
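As a rough sketch building on the example above, the same pattern can collect every bracketed token in a text, either with the brackets (group 0) or without them (group 1). The class name BracketExtractor, the method extract and the sample text are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracketExtractor {
    // Literal '[' , lazily captured content, literal ']'.
    private static final Pattern BRACKETED = Pattern.compile("\\[(.*?)\\]");

    // Hypothetical helper: returns every bracketed token in the text.
    // inclusive = true keeps the brackets (group 0), false drops them (group 1).
    static List<String> extract(String text, boolean inclusive) {
        List<String> result = new ArrayList<>();
        Matcher m = BRACKETED.matcher(text);
        while (m.find()) {
            result.add(inclusive ? m.group() : m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "ignore me [take:me] ignore [123as d:.-,] me";
        System.out.println(extract(text, true));   // [[take:me], [123as d:.-,]]
        System.out.println(extract(text, false));  // [take:me, 123as d:.-,]
    }
}
```

Using group() answers the "inclusive" part of the question, while group(1) gives only the dynamic content.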
If you want to match &([take:me]) you should use this:
&\\(\\[(.*?)\\]\\)
Note that you must escape characters that have a special meaning in regex (such as ( and )).
Escaping is done by adding a backslash, but because a backslash inside a Java string literal is itself written as \\, you put \\ before any character that has a special meaning. So by writing \\( you're telling the regex engine:
"Treat ( as a regular character and not as a special character".

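Try (?<=c)(.+)(?=c), where c is the character you're using. Note that .+ is greedy, so if c occurs more than twice the match runs up to the last c; use .+? to stop at the next one.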
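A minimal sketch of this lookaround idea for an arbitrary delimiter character, assuming the delimiter is passed in as a char and quoted with Pattern.quote so that special characters such as | are taken literally. The class BetweenDelimiters and the method between are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BetweenDelimiters {
    // Hypothetical helper: text between two occurrences of the delimiter,
    // or null when there is no such text. lazy = true stops at the next
    // delimiter, lazy = false (greedy) runs to the last one.
    static String between(String text, char delimiter, boolean lazy) {
        String c = Pattern.quote(String.valueOf(delimiter));
        String body = lazy ? ".+?" : ".+";
        Matcher m = Pattern.compile("(?<=" + c + ")" + body + "(?=" + c + ")")
                .matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "a|b|c|d";
        System.out.println(between(text, '|', false)); // b|c
        System.out.println(between(text, '|', true));  // b
    }
}
```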

The java.util.regex.Matcher class is used to search through a text for multiple occurrences of a regular expression. You can also use a Matcher to search for the same regular expression in different texts.
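As an illustration of reusing one Matcher for several texts, the sketch below relies on Matcher.reset(CharSequence), which swaps in a new input. The class MatcherReuse, the method countMatches and the sample strings are assumptions made for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherReuse {
    // Hypothetical helper: points the given matcher at a new text
    // and counts how often its pattern occurs there.
    static int countMatches(Matcher matcher, CharSequence text) {
        matcher.reset(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // One compiled pattern, one matcher, several inputs.
        Matcher matcher = Pattern.compile("http://").matcher("");
        System.out.println(countMatches(matcher, "see http://a and http://b")); // 2
        System.out.println(countMatches(matcher, "no links here"));             // 0
    }
}
```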
The Matcher class has a lot of useful methods. For a full list, see the official JavaDoc for the Matcher class. I will cover the core methods here. Here is a list of the methods covered:
Creating a Matcher
Creating a Matcher is done via the matcher() method in the Pattern class. Here is an example:
String text =
"This is the text to be searched " +
"for occurrences of the http:// pattern.";
String patternString = ".*http://.*";
Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(text);
matches()
The matches() method in the Matcher class matches the regular expression against the whole text passed to the Pattern.matcher() method, when the Matcher was created. Here is an example:
boolean matches = matcher.matches();
If the regular expression matches the whole text, then the matches() method returns true. If not, the matches() method returns false.
You cannot use the matches() method to search for multiple occurrences of a regular expression in a text. For that, you need to use the find(), start() and end() methods.
lookingAt()
The lookingAt() method works like the matches() method with one major difference. The lookingAt() method only matches the regular expression against the beginning of the text, whereas matches() matches the regular expression against the whole text. In other words, if the regular expression matches the beginning of a text but not the whole text, lookingAt() will return true, whereas matches() will return false.
Here is an example:
String text =
"This is the text to be searched " +
"for occurrences of the http:// pattern.";
String patternString = "This is the";
Pattern pattern = Pattern.compile(patternString, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(text);
System.out.println("lookingAt = " + matcher.lookingAt());
System.out.println("matches = " + matcher.matches());
find() + start() + end()
The find() method searches for occurrences of the regular expressions in the text passed to the Pattern.matcher(text) method, when the Matcher was created. If multiple matches can be found in the text, the find() method will find the first, and then for each subsequent call to find() it will move to the next match.
The methods start() and end() will give the indexes into the text where the found match starts and ends.
Here is an example:
String text =
"This is the text which is to be searched " +
"for occurrences of the word 'is'.";
String patternString = "is";
Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(text);
int count = 0;
while (matcher.find()) {
    count++;
    System.out.println("found: " + count + " : "
            + matcher.start() + " - " + matcher.end());
}
This example will find the pattern "is" four times in the searched string. The output printed will be this:
found: 1 : 2 - 4
found: 2 : 5 - 7
found: 3 : 23 - 25
found: 4 : 70 - 72
You can also refer to these tutorials:
Tutorial 1

You can also use lookaround assertions. This way the brackets are not included in the match itself.
(?<=\\[).*?(?=\\])
(?<=\\[) is a positive lookbehind assertion. It is true when the character "[" immediately precedes the match.
(?=\\]) is a positive lookahead assertion. It is true when the character "]" immediately follows the match.
.*? matches any character zero or more times, but as few times as possible, because of the modifier ?. It changes the matching behaviour of the quantifier from "greedy" to "lazy".
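A small sketch of the lookaround pattern in use. Since the brackets are only asserted and never consumed, group() already returns the bare content and no capture group is needed. LookaroundDemo and inside are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundDemo {
    // Lookbehind for '[', lazy content, lookahead for ']'.
    private static final Pattern INSIDE = Pattern.compile("(?<=\\[).*?(?=\\])");

    // Hypothetical helper: first bracketed content without the brackets,
    // or null if the text has no bracketed part.
    static String inside(String text) {
        Matcher m = INSIDE.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(inside("ignore me [take:me] ignore me")); // take:me
    }
}
```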

Related

Java regex to match after start of previous match [duplicate]

How can I extract overlapping matches from an input using String.split()?
For example, if trying to find matches to "aba":
String input = "abababa";
String[] parts = input.split(???);
Expected output:
[aba, aba, aba]
String#split will not give you overlapping matches, because any given part of the string ends up in at most one element of the resulting array, never in two.
You should use the Pattern and Matcher classes here.
You can use this regex:
Pattern pattern = Pattern.compile("(?=(aba))");
Then use the Matcher#find method to get all the overlapping matches, and print group(1) for each.
The above regex matches every empty string that is followed by aba; you then read the first capture group. Because a lookahead is a zero-width assertion, it does not consume the text it matches, so you get all the overlapping matches.
String input = "abababa";
String patternToFind = "aba";
Pattern pattern = Pattern.compile("(?=" + patternToFind + ")");
Matcher matcher = pattern.matcher(input);
while (matcher.find()) {
    System.out.println(patternToFind + " found at index: " + matcher.start());
}
Output:
aba found at index: 0
aba found at index: 2
aba found at index: 4
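The group(1) variant mentioned above can be sketched as follows, collecting the overlapping matches into a list that matches the expected output in the question. The names OverlappingMatches and findOverlapping are invented here, and Pattern.quote is an added assumption so that the search text is treated literally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OverlappingMatches {
    // Hypothetical helper: every (possibly overlapping) occurrence of literal.
    // The lookahead is zero-width, so each find() advances by one position
    // while group(1) still holds the text that was looked at.
    static List<String> findOverlapping(String input, String literal) {
        Pattern p = Pattern.compile("(?=(" + Pattern.quote(literal) + "))");
        Matcher m = p.matcher(input);
        List<String> matches = new ArrayList<>();
        while (m.find()) {
            matches.add(m.group(1));
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(findOverlapping("abababa", "aba")); // [aba, aba, aba]
    }
}
```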
I would use indexOf.
for (int i = text.indexOf(find); i >= 0; i = text.indexOf(find, i + 1))
    System.out.println(find + " found at " + i);
This is not a correct use of split(). From the javadocs:
Splits this string around matches of the given regular expression.
Seems to me that you are not trying to split the string but to find all matches of your regular expression in the string. For this you would have to use a Matcher, and some extra code that loops on the Matcher to find all matches and then creates the array.

Java pattern usage

SCENARIO :
Pattern whitespace = Pattern.compile("^\\s");
matcher = whitespace.matcher("  WhiteSpace");
Pattern whitespace2 = Pattern.compile("^\\s\\s");
matcher2 = whitespace2.matcher("  WhiteSpace");
I am trying to match whitespace at the beginning of a line, and I want each matcher to succeed only for an exact number of whitespace characters. My string is "  WhiteSpace" (two leading spaces).
The problem is that both matcher and matcher2 match this string.
What I want is a pattern that matches exactly one leading whitespace character and does not match a string with two. In this scenario both matcher.find() and matcher2.find() are true, but matcher.find() should be false and matcher2.find() should be true.
I want matcher to be true for " WhiteSpace" (one space) and false for "  WhiteSpace" (two spaces).
I want matcher2 to be true for "  WhiteSpace" (two spaces).
Concretely, I have the string "  two whitespaces".
Both if conditions below are true. matcher should be false and matcher2 should be true.
Pattern whitespace = Pattern.compile("^\\s");
matcher = whitespace.matcher("  two whitespaces");
Pattern whitespace2 = Pattern.compile("^\\s\\s");
matcher2 = whitespace2.matcher("  two whitespaces");
if (matcher.find()) {
    // XXXXXXXXXXX
} else if (matcher2.find()) {
    // YYYYYYYYYYY
}
If you want to ensure that the one whitespace character is not followed by another, without including that following character in the match (whether or not it is whitespace), you can use a negative lookahead, (?!...).
So a pattern that matches a single whitespace character at the start of a line only if no further whitespace follows it may look like
Pattern whitespace = Pattern.compile("^\\s(?!\\s)");
This can be adapted to any number of spaces:
Pattern whitespace = Pattern.compile("^\\s{3}(?!\\s)");
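The negative lookahead above can be wrapped in a small helper that takes the expected count as a parameter. This is only a sketch; LeadingWhitespace and hasExactly are hypothetical names, and the pattern is rebuilt on every call for simplicity.

```java
import java.util.regex.Pattern;

class LeadingWhitespace {
    // Hypothetical helper: true if text starts with exactly n whitespace
    // characters, i.e. n of them followed by something that is not whitespace
    // (or by the end of the text).
    static boolean hasExactly(String text, int n) {
        return Pattern.compile("^\\s{" + n + "}(?!\\s)").matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(hasExactly(" one", 1));   // true
        System.out.println(hasExactly("  two", 1));  // false
        System.out.println(hasExactly("  two", 2));  // true
    }
}
```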
A pattern may be overkill here*. Use Character.isWhitespace and get simpler code:
String in = " your input here";
int wsPrefix=0;
for ( ; wsPrefix < in.length() && Character.isWhitespace(in.charAt(wsPrefix));
        wsPrefix++) {}
System.out.println("wsPrefix = " + wsPrefix);
* For it is said:
"Some people, when confronted with a problem, think
“I know, I'll use regular expressions.” Now they have two problems."
-- Jamie Zawinski, 1997

Java Pattern / Matcher not finding word break

I am having trouble with Java Pattern and Matcher. I've included a very simplified example of what I'm trying to do.
I had expected the pattern ".\b" to find the last character of the first word (or "4" in the example), but as I step through the code, m.find() always returns false. What am I missing here?
Why does the following Java code always print out "Not Found"?
Pattern p = Pattern.compile(".\b");
Matcher m = p.matcher("102939384 is a word");
int ixEndWord = 0;
if (m.find()) {
    ixEndWord = m.end();
    System.out.println("Found: " + ixEndWord);
} else {
    System.out.println("Not Found");
}
You need to escape the backslash in the regex string: ".\\b"
Basically, in a String literal the backslash has to be escaped. So "\\" becomes the character '\'.
So the String literal ".\\b" becomes the literal string .\b, which is what the Pattern receives.
To expand upon AntonH's comment, whenever you want the "\" character to appear in a regular expression, you have to escape it so that it actually appears in the string you are passing in.
As written, ".\b" is a string consisting of a dot followed by the backspace character that \b represents in a Java string literal, whereas ".\\b" is the regex .\b.
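The difference between the two literals can be checked directly. The sketch below compares their lengths and runs both against the text from the question; WordBoundaryEscape and endOfFirstMatch are illustrative names only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryEscape {
    // Hypothetical helper: end index of the first match, or -1 if none.
    static int endOfFirstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.end() : -1;
    }

    public static void main(String[] args) {
        String text = "102939384 is a word";
        // ".\b" is a dot plus the backspace character: two chars.
        System.out.println(".\b".length());                   // 2
        // ".\\b" is a dot, a backslash and the letter b: three chars.
        System.out.println(".\\b".length());                  // 3
        System.out.println(endOfFirstMatch(".\b", text));     // -1
        System.out.println(endOfFirstMatch(".\\b", text));    // 9
    }
}
```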

How to match the pattern against the complete string?

How to specify a regex pattern for "either empty string or 'new entry created' is a substring"?
I tried (new entry created|) but Matcher.find() on the pattern is true on input like:
"
Invalid level
Valid level range is 0-14.
"
It is not an option to do straight-forward programming like
String.isEmpty() || String.contains("new entry created")
because the same method will have different patterns as input. In other words, I need to use regex for this case.
I want to stick with Matcher.find() because there are other error patterns I shall be using - 'invalid', for instance.
You simply need to surround the "empty" section with start and end delimiters in order to limit your searches to exact matches of the input:
(new entry created|^$)
Your pattern is fine (for Java), just use Matcher.matches() instead of Matcher.find().
From the Class Matcher documentation:
The matches method attempts to match the entire input sequence against the pattern.
The lookingAt method attempts to match the input sequence, starting at the beginning, against the pattern.
The find method scans the input sequence looking for the next subsequence that matches the pattern.
So, the problem is, the find method will find your empty string alternative in any string. To find an exact match in Java, use the matches() method.
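To make the difference concrete, here is a short sketch that runs the pattern from the question through both methods. FindVersusMatches, viaFind and viaMatches are made-up names, and the error text is the sample input quoted in the question.

```java
import java.util.regex.Pattern;

class FindVersusMatches {
    private static final Pattern P = Pattern.compile("(new entry created|)");

    // find() succeeds as soon as any subsequence matches, and the empty
    // alternative matches at position 0 of every input.
    static boolean viaFind(String s) {
        return P.matcher(s).find();
    }

    // matches() requires the whole input to match the pattern.
    static boolean viaMatches(String s) {
        return P.matcher(s).matches();
    }

    public static void main(String[] args) {
        String error = "\nInvalid level\nValid level range is 0-14.\n";
        System.out.println(viaFind(error));     // true
        System.out.println(viaMatches(error));  // false
        System.out.println(viaMatches(""));     // true
    }
}
```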
If your requirement actually is to match either
a string that contains new entry created or
an empty string
then you can use
^(?:.*new entry created.*)?$
or (if your string might contain newlines)
(?s)^(?:.*new entry created.*)?$
In Java, where you have the .matches() method, you can remove the anchors:
Pattern regex = Pattern.compile("(?:.*new entry created.*)?", Pattern.DOTALL);
Matcher regexMatcher = regex.matcher(subjectString);
foundMatch = regexMatcher.matches();
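A self-contained version of the snippet above, as a sketch, with a few sample inputs. The class name EntryCreatedCheck, the method accepts and the sample strings are assumptions for illustration.

```java
import java.util.regex.Pattern;

class EntryCreatedCheck {
    // DOTALL lets . cross line breaks; the optional group also allows "".
    private static final Pattern P =
            Pattern.compile("(?:.*new entry created.*)?", Pattern.DOTALL);

    // Hypothetical helper: true for an empty string or for any string
    // that contains "new entry created".
    static boolean accepts(String s) {
        return P.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(accepts(""));                            // true
        System.out.println(accepts("ok\nnew entry created\ndone")); // true
        System.out.println(accepts("Invalid level"));               // false
    }
}
```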
Hello try this pattern...
(new entry created)|(^$)
I have tested this code
public static void main(String[] argv) {
    String pattern = "(new entry created)|(^$)";
    String input = "";
    Pattern p = Pattern.compile(pattern);
    boolean found = p.matcher(input).lookingAt();
    System.out.println("'" + pattern + "'"
            + (found ? " matches '" : " doesn't match '") + input + "'");
}

Regexp grouping and replaceAll with .* in Java duplicates the replacement

I have a problem using regexp in Java. The example code prints ABC_012_suffix_suffix, but I was expecting ABC_012_suffix.
Pattern rexexp = Pattern.compile("(.*)");
Matcher matcher = rexexp.matcher("ABC_012");
String result = matcher.replaceAll("$1_suffix");
System.out.println(result);
I understand that replaceAll replaces all matched groups; the question is why this regexp group (.*) matches twice on my string ABC_012 in Java.
Pattern regexp = Pattern.compile(".*");
Matcher matcher = regexp.matcher("ABC_012");
matcher.matches();
System.out.println(matcher.group(0));
System.out.println(matcher.replaceAll("$0_suffix"));
Same happens here, the output is:
ABC_012
ABC_012_suffix_suffix
The reason is hidden in the replaceAll method: it tries to find all subsequences that match the pattern:
while (matcher.find()) {
    System.out.printf("Start: %s, End: %s%n", matcher.start(), matcher.end());
}
This will result in:
Start: 0, End: 7
Start: 7, End: 7
So, to our first surprise, the matcher finds two subsequences, "ABC_012" and another "". And it appends "_suffix" to both of them:
"ABC_012" + "_suffix" + "" + "_suffix"
.* first matches the whole input, and then matches once more as an empty string at the end of the input (which is still a match). Try (.+) or (^.*$) instead. Both work as expected.
At regexinfo star is defined as follows:
*(star) - Repeats the previous item zero or more times. Greedy, so as many items as possible will be matched before trying permutations with less matches of the preceding item, up to the point where the preceding item is not matched at all.
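The three patterns discussed in this thread can be compared side by side. This is a sketch only; SuffixReplace and addSuffix are hypothetical names, and each regex is expected to contain one capture group.

```java
import java.util.regex.Pattern;

class SuffixReplace {
    // Hypothetical helper: appends "_suffix" after every match of regex,
    // using capture group 1 as the text to keep.
    static String addSuffix(String regex, String input) {
        return Pattern.compile(regex).matcher(input).replaceAll("$1_suffix");
    }

    public static void main(String[] args) {
        String input = "ABC_012";
        // (.*) also matches the empty string at the end, so two replacements.
        System.out.println(addSuffix("(.*)", input));    // ABC_012_suffix_suffix
        // (.+) needs at least one character, so the empty match is gone.
        System.out.println(addSuffix("(.+)", input));    // ABC_012_suffix
        // ^ only matches at the start, so no second match at the end.
        System.out.println(addSuffix("(^.*$)", input));  // ABC_012_suffix
    }
}
```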
If you just want to add "_suffix" to your input why don't you just do:
String result = "ABC_012" + "_suffix";
?
