I have the following Java code:
Pattern pat = Pattern.compile("(?<!function )\\w+");
Matcher mat = pat.matcher("function example");
System.out.println(mat.find());
Why does mat.find() return true? I used a negative lookbehind, and "example" is preceded by "function ". Shouldn't it be discarded?
See what it matches:
public static void main(String[] args) throws Exception {
Pattern pat = Pattern.compile("(?<!function )\\w+");
Matcher mat = pat.matcher("function example");
while (mat.find()) {
System.out.println(mat.group());
}
}
Output:
function
xample
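So first it finds "function", which isn't preceded by "function " (nothing precedes it). At the "e" of "example" the lookbehind does fail, so the engine moves on one character and finds "xample", which is preceded by "unction e" and therefore not by "function ".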
Presumably you want the pattern to match the whole text, not just find matches in the text.
You can either do this with Matcher.matches() or you can change the pattern to add start and end anchors:
^(?<!function )\\w+$
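I prefer the second approach, as it means that the pattern itself defines its match region rather than the region being defined by its usage. That's just a matter of preference, however. Either way, the result for "function example" is false because \w+ cannot match the space; once the match is pinned to the start of the input, nothing precedes it, so the lookbehind no longer has any effect.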
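To see both behaviours side by side, here is a minimal runnable sketch (the class name LookbehindDemo and the helpers findAll and matchesWhole are made up for illustration). It runs the question's pattern with find(), then with matches(), then the anchored version with find():

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookbehindDemo {
    // Collects every substring that successive find() calls report.
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    // True only if the pattern matches the entire input.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String input = "function example";
        System.out.println(findAll("(?<!function )\\w+", input));      // [function, xample]
        System.out.println(matchesWhole("(?<!function )\\w+", input)); // false
        System.out.println(findAll("^(?<!function )\\w+$", input));    // []
    }
}
```

A single word such as "example" still passes matchesWhole(), which confirms that the space, not the lookbehind, is what rejects "function example".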
Your string has the word "function" that matches \w+, and is not preceded by "function ".
Notice a few things here:
You're using find(), which returns true for a substring match as well.
Because of that, "function" matches, since it is not preceded by "function ".
The whole string would never have matched anyway, because your regex doesn't include spaces.
Use Matcher#matches(), or the ^ and $ anchors with a negative lookahead instead:
Pattern pat = Pattern.compile("^(?!function)[\\w\\s]+$"); // added \s for whitespaces
Matcher mat = pat.matcher("function example");
System.out.println(mat.find()); // false
Related
I am new to regular expressions and I want to find a string between two characters.
I tried the code below, but it always returns false. May I know what's wrong with it?
public static void main(String[] args) {
String input = "myFunction(hello ,world, test)";
String patternString = "\\(([^]]+)\\)";
Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(input);
while (matcher.find()) {
System.out.println(matcher.group());
}
}
Input:
myFunction(hello,world,test), where myFunction can be any characters: anything may appear before the opening (.
Output:
hello
world
test
You could make use of the \G anchor, which asserts the position at the end of the previous match, and capture your values in a group:
(?:\bmyFunction\(|\G(?!^))([^,]+)(?:\h*,\h*)?(?=[^)]*\))
In Java:
String regex = "(?:\\bmyFunction\\(|\\G(?!^))([^,]+)(?:\\h*,\\h*)?(?=[^)]*\\))";
Explanation
(?: Non capturing group
\bmyFunction\( Word boundary to prevent the match from being part of a larger word, then match myFunction and an opening parenthesis (
| Or
\G(?!^) Assert position at the end of previous match, not at the start of the string
) Close non capturing group
([^,]+) Capture in a group matching 1+ times not a comma
(?:\h*,\h*)? Optionally match a comma surrounded by 0+ horizontal whitespace chars
(?=[^)]*\)) Positive lookahead, assert that a closing parenthesis ) follows, with only non-) characters in between
For example:
String patternString = "(?:\\bmyFunction\\(|\\G(?!^))([^,]+)(?:\\h*,\\h*)?(?=[^)]*\\))";
String input = "myFunction(hello ,world, test)";
Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(input);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
Result
hello
world
test
I'd suggest doing this in a two-step process:
Step 1: Capture all the content between ( and )
Use the regex: ^\S+\((.*)\)$
The first and the only capturing group will contain the required text.
Step 2: Split the captured string on commas (,), which yields each comma-separated parameter on its own.
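A minimal Java sketch of the two steps, assuming the input has the shape name(...) with nothing after the closing parenthesis. The class and method names are hypothetical, and trimming the whitespace around each parameter is an extra step, not part of the answer above:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TwoStepSplit {
    // Step 1: the regex from the answer; group 1 is the text between
    // the opening ( and the final ).
    private static final Pattern CALL = Pattern.compile("^\\S+\\((.*)\\)$");

    static List<String> parameters(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = CALL.matcher(input);
        if (!m.matches()) {
            return result; // input is not of the form name(...)
        }
        // Step 2: split the captured text on commas; trim() drops the
        // spaces around each parameter (an addition for this sketch).
        for (String part : m.group(1).split(",")) {
            result.add(part.trim());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parameters("myFunction(hello ,world, test)")); // [hello, world, test]
    }
}
```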
This may give you an idea:
([\w]+),([\w]+),([\w]+)
DEMO: https://rubular.com/r/9HDIwBTacxTy2O
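For reference, here is how that pattern could be used from Java (class and method names are hypothetical). Note that it assumes exactly three word-only parameters with no spaces around the commas, so the question's own input "myFunction(hello ,world, test)" does not match:

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ThreeGroups {
    // Returns the three captured words, or null if the pattern is not found.
    static String[] threeParams(String input) {
        Matcher m = Pattern.compile("([\\w]+),([\\w]+),([\\w]+)").matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(threeParams("myFunction(hello,world,test)")));   // [hello, world, test]
        System.out.println(Arrays.toString(threeParams("myFunction(hello ,world, test)"))); // null
    }
}
```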
I have a string with multiple "messages" inside it. Each message starts with a certain character sequence. I've tried:
String str = 'ab message1ab message2ab message3'
Pattern pattern = Pattern.compile('(?<record>ab\\p{ASCII}+(?!ab))');
Matcher matcher = pattern.matcher(str);
while (matcher.find()) {
handleMessage(matcher.group('record'))
}
but \p{ASCII}+ is greedy and eats everything.
The symbols a and b can appear inside a message; only the sequence ab marks the start of the next message.
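\p{ASCII}+ is the greedy regex for one or more ASCII characters, meaning that it will use the longest possible match. But you can use the reluctant quantifier if you want the shortest possible match: \p{ASCII}+?. In that case, you should use a positive lookahead assertion: with the negative lookahead (?!ab), the reluctant match would stop after a single character, whereas a positive lookahead makes it extend until the next ab actually follows.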
The regex could become:
Pattern pattern = Pattern.compile("(?<record>ab\\p{ASCII}+?)(?=(ab)|\\z)");
Note the (ab)|\z alternative: \z (end of input) is what lets the last message match.
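A self-contained version of the suggested pattern, with the question's handleMessage call replaced by collecting the records into a list (the class and method names are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MessageSplitter {
    // Reluctant \p{ASCII}+? stops as soon as the next "ab" or the end of input follows.
    private static final Pattern RECORD =
            Pattern.compile("(?<record>ab\\p{ASCII}+?)(?=(ab)|\\z)");

    static List<String> records(String str) {
        List<String> out = new ArrayList<>();
        Matcher matcher = RECORD.matcher(str);
        while (matcher.find()) {
            out.add(matcher.group("record")); // stands in for handleMessage(...)
        }
        return out;
    }

    public static void main(String[] args) {
        for (String record : records("ab message1ab message2ab message3")) {
            System.out.println(record); // ab message1, ab message2, ab message3
        }
    }
}
```

A lone a or b inside a message does not split it; only the pair ab does.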
How can I write a regex that matches anything between two specific characters?
like:
ignore me [take:me] ignore me?
How can I match [take:me], including the brackets?
The word take:me is dynamic, so I'd also like to match [123as d:.-,§""§%]
You can use this regex:
"\\[(.*?)\\]"
Pattern pattern = Pattern.compile("\\[(.*?)\\]");
Matcher matcher = pattern.matcher("ignore me [take:me] ignore me");
if (matcher.find()) {
System.out.println(matcher.group(1));
}
This will print take:me.
If you want to match &([take:me]) you should use this:
&\\(\\[(.*?)\\]\\)
Note that you should escape characters with special meaning in regex (like ( and )).
Escaping is done by adding a backslash, but because a backslash inside a Java string literal is written as \\, you add \\ before any character that has a special meaning. So by writing \\( you're telling Java:
"Take ( as the regular char and not the special char".
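A minimal sketch of that escaped pattern in use; the sample input containing &([take:me]) and the class name are made up for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AmpersandBrackets {
    // Matches &([...]) and captures the text between the square brackets.
    private static final Pattern P = Pattern.compile("&\\(\\[(.*?)\\]\\)");

    static String extract(String input) {
        Matcher m = P.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("ignore me &([take:me]) ignore me")); // take:me
        System.out.println(extract("ignore me [take:me] ignore me"));    // null
    }
}
```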
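Try (?<=c)(.+)(?=c), where c is the character you're using. Note that .+ is greedy, so if c occurs more than twice the match runs to the last c; use .+? to stop at the nearest one.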
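A minimal sketch of this approach, assuming # as the delimiter. Pattern.quote escapes the delimiter in case it has a special meaning in regex. It also shows that the greedy .+ stretches to the last delimiter while the reluctant .+? stops at the nearest one (class and method names are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BetweenChars {
    // Finds text between occurrences of c; lazy selects .+? instead of .+
    static List<String> between(char c, boolean lazy, String input) {
        String q = Pattern.quote(String.valueOf(c)); // escape c in case it is special
        String regex = "(?<=" + q + ")(.+" + (lazy ? "?" : "") + ")(?=" + q + ")";
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(between('#', false, "x#a#b#y")); // [a#b]
        System.out.println(between('#', true, "x#a#b#y"));  // [a, b]
    }
}
```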
The java.util.regex.Matcher class is used to search through a text for multiple occurrences of a regular expression. You can also use a Matcher to search for the same regular expression in different texts.
The Matcher class has a lot of useful methods. For a full list, see the official JavaDoc for the Matcher class. I will cover the core methods here: matches(), lookingAt(), find(), start() and end().
Creating a Matcher
Creating a Matcher is done via the matcher() method in the Pattern class. Here is an example:
String text =
"This is the text to be searched " +
"for occurrences of the http:// pattern.";
String patternString = ".*http://.*";
Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(text);
matches()
The matches() method in the Matcher class matches the regular expression against the whole text passed to the Pattern.matcher() method, when the Matcher was created. Here is an example:
boolean matches = matcher.matches();
If the regular expression matches the whole text, then the matches() method returns true. If not, the matches() method returns false.
You cannot use the matches() method to search for multiple occurrences of a regular expression in a text. For that, you need to use the find(), start() and end() methods.
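A minimal runnable sketch contrasting matches() with find() on the text above (class and method names are hypothetical). The pattern .*http://.* covers the whole text, so matches() succeeds; the bare http:// does not, even though find() locates it:

```java
import java.util.regex.Pattern;

class MatchesDemo {
    static final String TEXT =
            "This is the text to be searched " +
            "for occurrences of the http:// pattern.";

    // matches(): the pattern must cover the entire text
    static boolean matchesWhole(String patternString) {
        return Pattern.compile(patternString).matcher(TEXT).matches();
    }

    // find(): the pattern may match anywhere in the text
    static boolean findsAnywhere(String patternString) {
        return Pattern.compile(patternString).matcher(TEXT).find();
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole(".*http://.*")); // true
        System.out.println(matchesWhole("http://"));     // false
        System.out.println(findsAnywhere("http://"));    // true
    }
}
```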
lookingAt()
The lookingAt() method works like the matches() method with one major difference. The lookingAt() method only matches the regular expression against the beginning of the text, whereas matches() matches the regular expression against the whole text. In other words, if the regular expression matches the beginning of a text but not the whole text, lookingAt() will return true, whereas matches() will return false.
Here is an example:
String text =
"This is the text to be searched " +
"for occurrences of the http:// pattern.";
String patternString = "This is the";
Pattern pattern = Pattern.compile(patternString, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(text);
System.out.println("lookingAt = " + matcher.lookingAt());
System.out.println("matches = " + matcher.matches());
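Run as a complete program, the example above prints lookingAt = true and matches = false. Here is a runnable sketch (class and helper names are hypothetical) that also shows CASE_INSENSITIVE at work and that lookingAt() only succeeds at the start of the text:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookingAtDemo {
    static final String TEXT =
            "This is the text to be searched " +
            "for occurrences of the http:// pattern.";

    // Fresh case-insensitive matcher over TEXT for the given pattern.
    static Matcher matcherFor(String patternString) {
        return Pattern.compile(patternString, Pattern.CASE_INSENSITIVE).matcher(TEXT);
    }

    public static void main(String[] args) {
        Matcher matcher = matcherFor("This is the");
        System.out.println("lookingAt = " + matcher.lookingAt()); // true
        System.out.println("matches = " + matcher.matches());     // false
        // CASE_INSENSITIVE lets a lowercase pattern match "This is the"
        System.out.println(matcherFor("this is the").lookingAt()); // true
        // "text" occurs in TEXT, but not at the beginning
        System.out.println(matcherFor("text").lookingAt());        // false
    }
}
```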
find() + start() + end()
The find() method searches for occurrences of the regular expression in the text passed to the Pattern.matcher(text) method when the Matcher was created. If multiple matches can be found in the text, the find() method will find the first, and then each subsequent call to find() will move to the next match.
The methods start() and end() give the indexes into the text where the found match starts and ends: start() is the index of the first matched character, and end() is the index just after the last one.
Here is an example:
String text =
"This is the text which is to be searched " +
"for occurrences of the word 'is'.";
String patternString = "is";
Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(text);
int count = 0;
while(matcher.find()) {
count++;
System.out.println("found: " + count + " : "
+ matcher.start() + " - " + matcher.end());
}
This example will find the pattern "is" four times in the searched string. The output printed will be this:
found: 1 : 2 - 4
found: 2 : 5 - 7
found: 3 : 23 - 25
found: 4 : 70 - 72
You can also use lookaround assertions. This way the brackets are not included in the match itself.
(?<=\\[).*?(?=\\])
(?<=\\[) is a positive lookbehind assertion. It is true when the char "[" is before the match.
(?=\\]) is a positive lookahead assertion. It is true when the char "]" is after the match.
.*? matches any character zero or more times, but as few times as possible, because of the modifier ?. It changes the matching behaviour of quantifiers from "greedy" to "lazy".
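A short sketch applying the lookaround pattern with find() (class and method names are hypothetical). Because the brackets are only asserted, not consumed, group() returns the text without them:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundBrackets {
    // The lookbehind/lookahead assert the brackets without consuming them.
    private static final Pattern INSIDE = Pattern.compile("(?<=\\[).*?(?=\\])");

    static List<String> inside(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = INSIDE.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(inside("ignore me [take:me] ignore me")); // [take:me]
        System.out.println(inside("a [x] b [y]"));                   // [x, y]
    }
}
```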
Can anyone please help me do the following in a java regular expression?
I need to read 3 characters from the 5th position from a given String ignoring whatever is found before and after.
Example : testXXXtest
Expected result : XXX
You don't need regex at all.
Just use substring: yourString.substring(4,7)
Since you do need to use regex, you can do it like this:
Pattern pattern = Pattern.compile(".{4}(.{3}).*");
Matcher matcher = pattern.matcher("testXXXtest");
matcher.matches();
String whatYouNeed = matcher.group(1);
What does it mean, step by step:
.{4} - any four characters
( - start capturing group, i.e. what you need
.{3} - any three characters
) - end capturing group, you got it now
.* - followed by 0 or more arbitrary characters
matcher.group(1) - get the 1st (only) capturing group.
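A runnable sketch comparing both answers (class and method names are hypothetical). One difference from the snippet above: it checks the result of matches() before calling group(1), because group(1) throws an IllegalStateException when no match was made:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FifthPosition {
    // Plain String API: characters at indexes 4, 5 and 6
    static String viaSubstring(String s) {
        return s.substring(4, 7);
    }

    // Regex: skip four characters, capture the next three, ignore the rest
    static String viaRegex(String s) {
        Matcher matcher = Pattern.compile(".{4}(.{3}).*").matcher(s);
        return matcher.matches() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(viaSubstring("testXXXtest")); // XXX
        System.out.println(viaRegex("testXXXtest"));     // XXX
        System.out.println(viaRegex("test"));            // null (too short)
    }
}
```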
You should be able to use the substring() method to accomplish this:
String example = "testXXXtest";
String result = example.substring(4,7);
This might help: Groups and capturing in java.util.regex.Pattern.
Here is an example:
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class Example {
public static void main(String[] args) {
String text = "This is a testWithSomeDataInBetweentest.";
Pattern p = Pattern.compile("test([A-Za-z0-9]*)test");
Matcher m = p.matcher(text);
if (m.find()) {
System.out.println("Matched: " + m.group(1));
} else {
System.out.println("No match.");
}
}
}
This prints:
Matched: WithSomeDataInBetween
If you want to match the pattern against the entire input string (rather than search for a substring that matches), you can use matches() instead of find(). With find(), you can keep searching for further matching substrings with subsequent calls.
Also, your question did not specify which characters are admissible or how long the string between the two "test" strings can be. I assumed any length, including zero, is OK and that we seek a substring composed of lowercase and uppercase letters as well as digits.
You can use substring for this, you don't need a regex.
yourString.substring(4,7);
You could use a regex too, but why would you if you don't need one? Of course, you should protect this code against null and against strings that are too short.
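A sketch of such a guard; the method name threeFromFifth and the choice to return null for unusable input are assumptions, and you might prefer to throw an exception instead:

```java
class SafeSubstring {
    // Hypothetical helper: the three characters starting at index 4 (the 5th
    // position), or null if s is null or shorter than seven characters.
    static String threeFromFifth(String s) {
        if (s == null || s.length() < 7) {
            return null;
        }
        return s.substring(4, 7);
    }

    public static void main(String[] args) {
        System.out.println(threeFromFifth("testXXXtest")); // XXX
        System.out.println(threeFromFifth("test"));        // null
        System.out.println(threeFromFifth(null));          // null
    }
}
```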
Use the String.replaceAll() Method
If you don't need optimal performance, you can try the String.replaceAll() method for a cleaner option:
String sDataLine = "testXXXtest";
String sWhatYouNeed = sDataLine.replaceAll( ".{4}(.{3}).*", "$1" );
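A small sketch of the replaceAll() approach (class and method names are hypothetical). String.replaceAll is documented as equivalent to Pattern.compile(regex).matcher(str).replaceAll(replacement), so the pattern is compiled on each call. One caveat: when the input is too short for the pattern to match, replaceAll() returns the input unchanged rather than signalling a failure:

```java
class ReplaceAllDemo {
    // Replaces the whole match with capture group 1.
    static String extract(String s) {
        return s.replaceAll(".{4}(.{3}).*", "$1");
    }

    public static void main(String[] args) {
        System.out.println(extract("testXXXtest")); // XXX
        System.out.println(extract("test"));        // test (no match, returned unchanged)
    }
}
```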
References
https://docs.oracle.com/javase/1.5.0/docs/api/java/lang/String.html
http://www.vogella.com/tutorials/JavaRegularExpressions/article.html#using-regular-expressions-with-string-methods
input1="caused/VBN by/IN thyroid disorder"
Requirement: find the word "caused" followed by a slash and one or more capital letters -- but not followed by a space + "by/IN".
In the example above "caused/VBN" is followed by " by/IN", so 'caused' should not match.
input2="caused/VBN thyroid disorder"
"by/IN" doesn't follow caused, so it should match
regex="caused/[A-Z]+(?![\\s]+by/IN)"
caused/[A-Z]+ -- word 'caused' + / + one or more capital letters
(?![\\s]+by/IN) -- negative lookahead: not followed by whitespace and by/IN
Below is a simple method that I used to test
public static void main(String[] args){
String input = "caused/VBN by/IN thyroid disorder";
String regex = "caused/[A-Z]+(?![\\s]+by/IN)";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(input);
while(matcher.find()){
System.out.println(matcher.group());
}
}
Output: caused/VB
I don't understand why my negative lookahead regex is not working.
You need to include a word boundary in your regular expression:
String regex = "caused/[A-Z]+\\b(?![\\s]+by/IN)";
Without it you can get a match, but not what you were expecting:
"caused/VBN by/IN thyroid disorder";
 ^^^^^^^^^
this matches because "N by" doesn't match "[\\s]+by"
The [A-Z]+ match will be shortened (via backtracking) until the negative lookahead succeeds.
What you have to do is stop the backtracking, so that [A-Z]+ keeps everything it matched.
This can be done a couple of different ways.
A positive lookahead that captures the text, followed by a backreference that consumes it
"caused(?=(/[A-Z]+))\\1(?!\\s+by/IN)"
A standalone sub-expression (atomic group)
"caused(?>/[A-Z]+)(?!\\s+by/IN)"
A possessive quantifier
"caused/[A-Z]++(?!\\s+by/IN)"
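A runnable sketch that checks the variants from both answers against the two inputs from the question (class and method names are hypothetical). The original pattern still matches caused/VB in the first input; the other four reject the first input and match caused/VBN in the second:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadVariants {
    // Returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String[] regexes = {
            "caused/[A-Z]+(?![\\s]+by/IN)",         // original: backtracks to caused/VB
            "caused/[A-Z]+\\b(?![\\s]+by/IN)",      // word boundary
            "caused(?=(/[A-Z]+))\\1(?!\\s+by/IN)", // lookahead + backreference
            "caused(?>/[A-Z]+)(?!\\s+by/IN)",      // atomic group
            "caused/[A-Z]++(?!\\s+by/IN)"          // possessive quantifier
        };
        String[] inputs = {
            "caused/VBN by/IN thyroid disorder",
            "caused/VBN thyroid disorder"
        };
        for (String regex : regexes) {
            for (String input : inputs) {
                System.out.println(regex + "  on  \"" + input + "\"  ->  "
                        + firstMatch(regex, input));
            }
        }
    }
}
```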