How to specify a regex pattern for "either empty string or 'new entry created' is a substring"?
I tried (new entry created|) but Matcher.find() on the pattern is true on input like:
"
Invalid level
Valid level range is 0-14.
"
It is not an option to do straight-forward programming like
String.isEmpty() || String.contains("new entry created")
because the same method will have different patterns as input. In other words, I need to use regex for this case.
I want to stick with Matcher.find() because there are other error patterns I shall be using - 'invalid', for instance.
You simply need to surround the "empty" section with start and end delimiters in order to limit your searches to exact matches of the input:
(new entry created|^$)
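A minimal sketch of how this anchored pattern behaves with Matcher.find(). The class and method names (EmptyOrContains, isSuccess) are made up for illustration, and the sample strings are assumptions based on the question:

```java
import java.util.regex.Pattern;

class EmptyOrContains {
    // The empty alternative is anchored with ^$, so it can no longer match
    // the empty string "between" two characters of a non-empty input.
    private static final Pattern SUCCESS = Pattern.compile("(new entry created|^$)");

    // Hypothetical helper: true for empty input or input containing the phrase.
    static boolean isSuccess(String output) {
        return SUCCESS.matcher(output).find();
    }

    public static void main(String[] args) {
        System.out.println(isSuccess(""));                                          // true
        System.out.println(isSuccess("1 new entry created."));                      // true
        System.out.println(isSuccess("Invalid level\nValid level range is 0-14.")); // false
    }
}
```

One caveat: by default $ also matches just before a final line terminator, so an input consisting of a single newline would still count as "empty". If that matters, \A\z in place of ^$ should be stricter.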
Your pattern is fine (for Java), just use Matcher.matches() instead of Matcher.find().
From the Class Matcher documentation:
The matches method attempts to match the entire input sequence against the pattern.
The lookingAt method attempts to match the input sequence, starting at the beginning, against the pattern.
The find method scans the input sequence looking for the next subsequence that matches the pattern.
So, the problem is, the find method will find your empty string alternative in any string. To find an exact match in Java, use the matches() method.
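To make the difference concrete, here is a small sketch that runs the original, unanchored pattern through all three methods. The class name FindVsMatches and the sample inputs are only illustrative:

```java
import java.util.regex.Pattern;

class FindVsMatches {
    // The pattern from the question, with an unanchored empty alternative.
    private static final Pattern UNANCHORED = Pattern.compile("(new entry created|)");

    static boolean find(String s)      { return UNANCHORED.matcher(s).find(); }
    static boolean lookingAt(String s) { return UNANCHORED.matcher(s).lookingAt(); }
    static boolean matches(String s)   { return UNANCHORED.matcher(s).matches(); }

    public static void main(String[] args) {
        String error = "Invalid level";
        System.out.println(find(error));      // true: the empty alternative matches at index 0
        System.out.println(lookingAt(error)); // true: same reason
        System.out.println(matches(error));   // false: the whole input must match

        // matches() accepts only the exact phrase or the empty string,
        // so it rejects input that merely contains the phrase.
        System.out.println(matches("1 new entry created.")); // false
    }
}
```

Note the last line: with matches(), the unchanged pattern no longer treats "new entry created" as a substring, which may or may not be what you want.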
If your requirement actually is to match either
a string that contains new entry created or
an empty string
then you can use
^(?:.*new entry created.*)?$
or (if your string might contain newlines)
(?s)^(?:.*new entry created.*)?$
In Java, where you have the .matches() method, you can remove the anchors:
Pattern regex = Pattern.compile("(?:.*new entry created.*)?", Pattern.DOTALL);
Matcher regexMatcher = regex.matcher(subjectString);
foundMatch = regexMatcher.matches();
Hello try this pattern...
(new entry created)|(^$)
I have tested this code
public static void main(String[] argv) {
String pattern = "(new entry created)|(^$)";
String input = "";
Pattern p = Pattern.compile(pattern);
boolean found = p.matcher(input).lookingAt();
System.out.println("'" + pattern + "'"
+ (found ? " matches '" : " doesn't match '") + input + "'");
}
I have this code to find this pattern: 201409250200131738007947036000 - 1 ,inside the text
final String patternStr = "(\\d{30} - \\d{1})";
final Pattern p = Pattern.compile(patternStr);
final Matcher m = p.matcher(page);
if (m.matches()) {
System.out.println("SUCCESS");
}
But for some strange reason this didn't work in Java. Can somebody please help me find the error?
The reason is that the matches method checks for the entire given string to match the regex.
For example, if your string is exactly 123456123412345612341234561234 - 8 it will match; if the number is embedded in other text, as in my number 123456123412345612341234561234 - 8 is inside other text, it won't.
Use the find method to accomplish your task:
if (m.find()) {
System.out.println("SUCCESS");
}
It will search inside the given string instead of attempting to match the entire string.
From the documentation for Matcher, matches:
Attempts to match the entire region against the pattern.
As opposed to find which:
Attempts to find the next subsequence of the input sequence that matches the pattern.
So use matches to match an entire String against a pattern, use find to locate a pattern inside a String.
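A short sketch of that rule applied to the pattern from the question. The class and method names (ReferenceScanner, isExactly, findAll) are hypothetical, and the surrounding text in the sample is invented:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReferenceScanner {
    // 30 digits, " - ", then one digit (the \\d{1} of the question is just \\d).
    private static final Pattern REFERENCE = Pattern.compile("\\d{30} - \\d");

    // matches(): true only if the whole string is the reference.
    static boolean isExactly(String s) {
        return REFERENCE.matcher(s).matches();
    }

    // find(): collects every occurrence embedded in a larger text.
    static List<String> findAll(String page) {
        List<String> hits = new ArrayList<>();
        Matcher m = REFERENCE.matcher(page);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String page = "ref 201409250200131738007947036000 - 1 in the text";
        System.out.println(isExactly(page)); // false
        System.out.println(findAll(page));   // [201409250200131738007947036000 - 1]
    }
}
```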
Try:
final String patternStr = "\\d{30}+\\s-\\s\\d";
final Pattern p = Pattern.compile(patternStr);
final Matcher m = p.matcher(page);
while (m.find()) {
System.out.printf("FOUND A MATCH: %s%n", m.group());
}
I edited your pattern slightly to make it more robust. This will print each match that it finds.
I am having trouble with Java Pattern and Matcher. I've included a very simplified example of what I'm trying to do.
I had expected the pattern ".\b" to find the last character of the first word (or "4" in the example), but as I step through the code, m.find() always returns false. What am I missing here?
Why does the following Java code always print out "Not Found"?
Pattern p = Pattern.compile(".\b");
Matcher m = p.matcher("102939384 is a word");
int ixEndWord = 0;
if (m.find()) {
ixEndWord = m.end();
System.out.println("Found: " + ixEndWord);
} else {
System.out.println("Not Found");
}
You need to escape special characters in the regex: ".\\b"
Basically, in a String the backslash has to be escaped. So "\\" becomes the character '\'.
So the String ".\\b" becomes the literal String ".\b", which will be used by the Pattern.
To expand upon AntonH's comment, whenever you want the "\" character to appear in a regex, you have to escape it so that it actually appears in the string you are passing in.
As is, ".\b" is the string of a dot . followed by the special backspace character represented by \b, compared to ".\\b", which is the regex .\b.
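You can check this without a debugger. The sketch below (class and method names are made up) compares the two string literals and then runs the corrected pattern on the input from the question:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // Returns the end index of the first character that is followed by a
    // word boundary, or -1 if there is none.
    static int endOfFirstWord(String text) {
        Matcher m = Pattern.compile(".\\b").matcher(text);
        return m.find() ? m.end() : -1;
    }

    public static void main(String[] args) {
        // ".\b" is two characters: a dot and a backspace (U+0008).
        System.out.println(".\b".length());  // 2
        // ".\\b" is three characters: dot, backslash, 'b', i.e. the regex .\b
        System.out.println(".\\b".length()); // 3

        System.out.println(endOfFirstWord("102939384 is a word")); // 9
    }
}
```

The result 9 is the index just past the "4", which is what m.end() reports for that match.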
I have a string that begins with one or more occurrences of the sequence "Re:". This "Re:" can be of any combinations, for ex. Re<any number of spaces>:, re:, re<any number of spaces>:, RE:, RE<any number of spaces>:, etc.
Sample sequence of string : Re: Re : Re : re : RE: This is a Re: sample string.
I want to define a java regular expression that will identify and strip off all occurrences of Re:, but only the ones at the beginning of the string and not the ones occurring within the string.
So the output should look like This is a Re: sample string.
Here is what I have tried:
String REGEX = "^(Re*\\p{Z}*:?|re*\\p{Z}*:?|\\p{Z}Re*\\p{Z}*:?)";
String INPUT = title;
String REPLACE = "";
Pattern p = Pattern.compile(REGEX);
Matcher m = p.matcher(INPUT);
while(m.find()){
m.appendReplacement(sb,REPLACE);
}
m.appendTail(sb);
I am using \p{Z} to match whitespace (I found this somewhere on this forum, where it was said that Java regex does not recognize \s).
The problem I am facing with this code is that the search stops at the first match, and escapes the while loop.
Try something like this replace statement:
yourString = yourString.replaceAll("(?i)^(\\s*re\\s*:\\s*)+", "");
Explanation of the regex:
(?i) make it case insensitive
^ anchor to start of string
( start a group (this is the "re:")
\\s* any amount of optional whitespace
re "re"
\\s* optional whitespace
: ":"
\\s* optional whitespace
) end the group (the "re:" string)
+ one or more times
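Put together as a small runnable sketch (the class name SubjectCleaner and the method stripReplyPrefixes are just for illustration). Since the pattern is anchored with ^, it can match at most once, so replaceFirst() is enough here:

```java
import java.util.regex.Pattern;

class SubjectCleaner {
    // One or more "re:" prefixes at the very start, any case, optional whitespace.
    private static final Pattern REPLY_PREFIXES =
            Pattern.compile("^(\\s*re\\s*:\\s*)+", Pattern.CASE_INSENSITIVE);

    static String stripReplyPrefixes(String title) {
        return REPLY_PREFIXES.matcher(title).replaceFirst("");
    }

    public static void main(String[] args) {
        String title = "Re: Re : Re : re : RE: This is a Re: sample string.";
        System.out.println(stripReplyPrefixes(title)); // This is a Re: sample string.
    }
}
```

This is also why the while loop in the question runs only once: a pattern starting with ^ cannot match again further into the string, so the repetition has to live inside the regex (the + on the group).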
in your regex:
String regex = "^(Re*\\p{Z}*:?|re*\\p{Z}*:?|\\p{Z}Re*\\p{Z}*:?)"
here is what it does:
see it live here
it matches strings like:
\p{Z}Reee\p{Z: or
R\p{Z}}}
which makes no sense for what you are trying to do.
You'd be better off using a regex like the following:
yourString.replaceAll("(?i)^(\\s*re\\s*:\\s*)+", "");
or to make #Doorknob happy, here's another way to achieve this, using a Matcher:
Pattern p = Pattern.compile("(?i)^(\\s*re\\s*:\\s*)+");
Matcher m = p.matcher(yourString);
if (m.find())
yourString = m.replaceAll("");
(which is as the doc says the exact same thing as yourString.replaceAll())
Look it up here
(I had the same regex as #Doorknob, but thanks to #jlordo for the replaceAll and #Doorknob for thinking about the (?i) case insensitivity part ;-) )
How can I write a regex that matches anything between two specific characters?
like:
ignore me [take:me] ignore me?
How can I match inclusive [take:me]?
The word take:me is dynamic, so I'd also like to match [123as d:.-,§""§%]
You can use this regex:
"\\[(.*?)\\]"
This link should help you to understand why it works.
Pattern pattern = Pattern.compile("\\[(.*?)\\]");
Matcher matcher = pattern.matcher("ignore me [take:me] ignore me");
if (matcher.find()) {
System.out.println(matcher.group(1));
}
This will print take:me.
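If the text can contain several bracketed parts, the same pattern can be used in a loop. A sketch, with a made-up class name and sample input:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracketExtractor {
    // Lazy .*? stops at the first closing bracket after each opening one.
    private static final Pattern BRACKETED = Pattern.compile("\\[(.*?)\\]");

    // Returns the text inside each [...] pair, without the brackets.
    static List<String> contents(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = BRACKETED.matcher(text);
        while (m.find()) {
            result.add(m.group(1)); // m.group() would include the brackets
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(contents("ignore me [take:me] ignore me [and me]"));
        // [take:me, and me]
    }
}
```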
If you want to match &([take:me]) you should use this:
&\\(\\[(.*?)\\]\\)
Note that you should escape chars that have a special meaning in regex (like ( and )).
Escaping them is done by adding a backslash, but because a backslash in a Java string literal is written as \\, you add \\ before any char that has a special meaning. So by writing \\( you're telling Java:
"Take ( as the regular char and not the special char".
Try (?<=c)(.+)(?=c) where c is the character you're using
The java.util.regex.Matcher class is used to search through a text for multiple occurrences of a regular expression. You can also use a Matcher to search for the same regular expression in different texts.
The Matcher class has a lot of useful methods. For a full list, see the official JavaDoc for the Matcher class. I will cover the core methods here. Here is a list of the methods covered:
Creating a Matcher
Creating a Matcher is done via the matcher() method in the Pattern class. Here is an example:
String text =
"This is the text to be searched " +
"for occurrences of the http:// pattern.";
String patternString = ".*http://.*";
Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(text);
matches()
The matches() method in the Matcher class matches the regular expression against the whole text passed to the Pattern.matcher() method, when the Matcher was created. Here is an example:
boolean matches = matcher.matches();
If the regular expression matches the whole text, then the matches() method returns true. If not, the matches() method returns false.
You cannot use the matches() method to search for multiple occurrences of a regular expression in a text. For that, you need to use the find(), start() and end() methods.
lookingAt()
The lookingAt() method works like the matches() method with one major difference. The lookingAt() method only matches the regular expression against the beginning of the text, whereas matches() matches the regular expression against the whole text. In other words, if the regular expression matches the beginning of a text but not the whole text, lookingAt() will return true, whereas matches() will return false.
Here is an example:
String text =
"This is the text to be searched " +
"for occurrences of the http:// pattern.";
String patternString = "This is the";
Pattern pattern = Pattern.compile(patternString, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(text);
System.out.println("lookingAt = " + matcher.lookingAt());
System.out.println("matches = " + matcher.matches());
find() + start() + end()
The find() method searches for occurrences of the regular expressions in the text passed to the Pattern.matcher(text) method, when the Matcher was created. If multiple matches can be found in the text, the find() method will find the first, and then for each subsequent call to find() it will move to the next match.
The methods start() and end() will give the indexes into the text where the found match starts and ends.
Here is an example:
String text =
"This is the text which is to be searched " +
"for occurrences of the word 'is'.";
String patternString = "is";
Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(text);
int count = 0;
while(matcher.find()) {
count++;
System.out.println("found: " + count + " : "
+ matcher.start() + " - " + matcher.end());
}
This example will find the pattern "is" four times in the searched string. The output printed will be this:
found: 1 : 2 - 4
found: 2 : 5 - 7
found: 3 : 23 - 25
found: 4 : 70 - 72
You can also refer to these tutorials:
Tutorial 1
You can also use lookaround assertions. This way the brackets are not included in the match itself.
(?<=\\[).*?(?=\\])
(?<=\\[) is a positive lookbehind assertion. It is true, when the char "[" is before the match
(?=\\]) is a positive lookahead assertion. It is true when the char "]" is after the match
.*? matches any character zero or more times, but as few as possible, because of the modifier ?. It changes the matching behaviour of quantifiers from "greedy" to "lazy".
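A quick sketch of the lookaround version in Java (class and method names are made up). Because the brackets are only asserted, not consumed, group() already gives the inner text:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundDemo {
    // "[" must precede the match and "]" must follow it; neither is part of it.
    private static final Pattern INSIDE = Pattern.compile("(?<=\\[).*?(?=\\])");

    // Returns the first bracketed content, or null if there is none.
    static String firstInside(String text) {
        Matcher m = INSIDE.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstInside("ignore me [take:me] ignore me")); // take:me
    }
}
```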
Can anyone please help me do the following in a java regular expression?
I need to read 3 characters from the 5th position from a given String ignoring whatever is found before and after.
Example : testXXXtest
Expected result : XXX
You don't need regex at all.
Just use substring: yourString.substring(4,7)
Since you do need to use regex, you can do it like this:
Pattern pattern = Pattern.compile(".{4}(.{3}).*");
Matcher matcher = pattern.matcher("testXXXtest");
matcher.matches();
String whatYouNeed = matcher.group(1);
What does it mean, step by step:
.{4} - any four characters
( - start capturing group, i.e. what you need
.{3} - any three characters
) - end capturing group, you got it now
.* followed by 0 or more arbitrary characters.
matcher.group(1) - get the 1st (only) capturing group.
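One thing to watch: group(1) throws an IllegalStateException if matches() returned false, for example when the input is shorter than seven characters. A sketch that checks the result first (the class name and the use of Optional are my own choices, not part of the answer above):

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FixedPositionExtractor {
    // Four arbitrary characters, then the three we want, then the rest.
    private static final Pattern FIFTH_TO_SEVENTH = Pattern.compile(".{4}(.{3}).*");

    static Optional<String> extract(String input) {
        Matcher m = FIFTH_TO_SEVENTH.matcher(input);
        // Only read the group when the whole input actually matched.
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extract("testXXXtest")); // Optional[XXX]
        System.out.println(extract("short"));       // Optional.empty
    }
}
```

Keep in mind that . does not match line terminators by default, so input containing a newline would need Pattern.DOTALL.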
You should be able to use the substring() method to accomplish this:
String example = "testXXXtest";
String result = example.substring(4,7);
This might help: Groups and capturing in java.util.regex.Pattern.
Here is an example:
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class Example {
public static void main(String[] args) {
String text = "This is a testWithSomeDataInBetweentest.";
Pattern p = Pattern.compile("test([A-Za-z0-9]*)test");
Matcher m = p.matcher(text);
if (m.find()) {
System.out.println("Matched: " + m.group(1));
} else {
System.out.println("No match.");
}
}
}
This prints:
Matched: WithSomeDataInBetween
If you want to match the entire input string against the pattern (rather than seek a substring that matches), you can use matches() instead of find(). You can continue searching for more matching substrings with subsequent calls to find().
Also, your question did not specify what are admissible characters and length of the string between two "test" strings. I assumed any length is OK including zero and that we seek a substring composed of small and capital letters as well as digits.
You can use substring for this, you don't need a regex.
yourString.substring(4,7);
I'm sure you could use a regex too, but why if you don't need it. Of course you should protect this code against null and strings that are too short.
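A minimal sketch of such a guard. The class name SafeSubstring and the choice to return an Optional are assumptions; returning a default value or throwing would work just as well:

```java
import java.util.Optional;

class SafeSubstring {
    // Characters 5 to 7 (indexes 4..6), or empty if the input is null or too short.
    static Optional<String> fifthToSeventh(String s) {
        if (s == null || s.length() < 7) {
            return Optional.empty();
        }
        return Optional.of(s.substring(4, 7));
    }

    public static void main(String[] args) {
        System.out.println(fifthToSeventh("testXXXtest")); // Optional[XXX]
        System.out.println(fifthToSeventh("test"));        // Optional.empty
        System.out.println(fifthToSeventh(null));          // Optional.empty
    }
}
```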
Use the String.replaceAll() Method
If performance is not critical (replaceAll() compiles the pattern on every call), you can try the String.replaceAll() method for a cleaner option:
String sDataLine = "testXXXtest";
String sWhatYouNeed = sDataLine.replaceAll( ".{4}(.{3}).*", "$1" );
References
https://docs.oracle.com/javase/1.5.0/docs/api/java/lang/String.html
http://www.vogella.com/tutorials/JavaRegularExpressions/article.html#using-regular-expressions-with-string-methods