No match for Java Regular Expression - java

I am running into an issue where my code is unable to find regex occurrences. Code:
String content = "This\\ is\\ an\\ example.=This is an example\nThis\\ is\\ second\\:=This is second";
String regex = "\"^.*(?=\\=)\"gm";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(content);
List<String> mKeys = new ArrayList<>();
while (m.find()) {
mKeys.add(m.group());
}
mKeys turns out to be empty. I have already validated my regex here https://regex101.com/r/YResRc/3. I am expecting the list to contain two keys from the content.

Your content contains no " quotes, and no text gm, so why would you expect that regex to match?
FYI: Syntaxes like "foo"gm or /foo/gm are something other languages do for regex literals. Java doesn't do that.
The g flag is implied by the find() loop. The m flag corresponds to Java's MULTILINE flag, which makes ^ and $ match at line boundaries. You can set it with an embedded (?m) in the pattern, or by passing a second parameter to compile(), i.e. one of these ways:
Pattern p = Pattern.compile("foo", Pattern.MULTILINE);
Pattern p = Pattern.compile("(?m)foo");
Your regex should simply be:
(?m)^.*(?==)
which means: Match everything from the beginning of a line up to the last = sign on the line.
Test
String content = "This is an example.=This is an example\nThis is second:=This is second";
String regex = "(?m)^.*(?==)";
Matcher m = Pattern.compile(regex).matcher(content);
List<String> mKeys = new ArrayList<>();
while (m.find()) {
mKeys.add(m.group());
}
System.out.println(mKeys);
Output
[This is an example., This is second:]
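Because .* is greedy, the pattern above runs up to the last = on each line. If a value could itself contain an = sign, a reluctant quantifier (.*?) would stop at the first one instead. The following is a minimal sketch of that difference; the class name KeyExtractor, the method extractKeys, and the sample input with a second = are made up for illustration and are not part of the original question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeyExtractor {
    // Greedy: the key runs up to the LAST '=' on each line.
    static final Pattern GREEDY = Pattern.compile("(?m)^.*(?==)");
    // Reluctant: the key runs up to the FIRST '=' on each line.
    static final Pattern RELUCTANT = Pattern.compile("(?m)^.*?(?==)");

    // Collects every match of the given pattern (hypothetical helper).
    static List<String> extractKeys(Pattern pattern, String content) {
        List<String> keys = new ArrayList<>();
        Matcher m = pattern.matcher(content);
        while (m.find()) {
            keys.add(m.group());
        }
        return keys;
    }

    public static void main(String[] args) {
        String content = "key1=a=b\nkey2=c"; // assumed sample input
        System.out.println(extractKeys(GREEDY, content));    // [key1=a, key2]
        System.out.println(extractKeys(RELUCTANT, content)); // [key1, key2]
    }
}
```

For input where each line has exactly one =, as in the question, both patterns give the same keys.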

Related

Regex to get value between two colon excluding the colons

I have a string like this:
something:POST:/some/path
Now I want to take the POST alone from the string. I did this by using this regex
:([a-zA-Z]+):
But this gives me a value along with colons. ie I get this:
:POST:
but I need this
POST
My code to match the same and replace it is as follows:
String ss = "something:POST:/some/path/";
Pattern pattern = Pattern.compile(":([a-zA-Z]+):");
Matcher matcher = pattern.matcher(ss);
if (matcher.find()) {
System.out.println(matcher.group());
ss = ss.replaceFirst(":([a-zA-Z]+):", "*");
}
System.out.println(ss);
EDIT:
I've decided to use the lookahead/lookbehind regex since I did not want to use replace with colons such as :*:. This is my final solution.
String s = "something:POST:/some/path/";
String regex = "(?<=:)[a-zA-Z]+(?=:)";
Matcher matcher = Pattern.compile(regex).matcher(s);
if (matcher.find()) {
s = s.replaceFirst(matcher.group(), "*");
System.out.println("replaced: " + s);
}
else {
System.out.println("not replaced: " + s);
}
There are two approaches:
Keep your Java code, and use lookahead/lookbehind (?<=:)[a-zA-Z]+(?=:), or
Change your Java code to replace the result with ":*:"
Note: You may want to define a String constant for your regex, since you use it in different calls.
As pointed out, the regex captured group can be used to replace.
The following code did it:
String ss = "something:POST:/some/path/";
Pattern pattern = Pattern.compile(":([a-zA-Z]+):");
Matcher matcher = pattern.matcher(ss);
if (matcher.find()) {
ss = ss.replaceFirst(matcher.group(1), "*");
}
System.out.println(ss);
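One caveat: replaceFirst treats its first argument as a regex and replaces the first occurrence anywhere in the string, which is not necessarily the spot the matcher found. A possible alternative, sketched below, replaces the captured group by its offsets. The class name GroupReplacer, the method maskMethod, and the second sample input are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupReplacer {
    private static final Pattern METHOD = Pattern.compile(":([a-zA-Z]+):");

    // Replaces only the characters captured by group 1, using the match offsets.
    static String maskMethod(String input) {
        Matcher m = METHOD.matcher(input);
        if (!m.find()) {
            return input; // nothing to replace
        }
        return new StringBuilder(input)
                .replace(m.start(1), m.end(1), "*")
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(maskMethod("something:POST:/some/path/")); // something:*:/some/path/
        // Here "POST" also occurs earlier, outside the colons; only the captured span changes.
        System.out.println(maskMethod("POSTER:POST:/some/path/"));    // POSTER:*:/some/path/
    }
}
```

With the second input, "POSTER:POST:/some/path/".replaceFirst("POST", "*") would give *ER:POST:/some/path/ instead.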
UPDATE
Looking at your update, you only need replaceFirst:
String result = s.replaceFirst(":[a-zA-Z]+:", ":*:");
See the Java demo
When you use (?<=:)[a-zA-Z]+(?=:), the regex engine checks each location inside the string for a : before it, and once found, tries to match 1+ ASCII letters and then asserts that there is a : after them. With :[A-Za-z]+:, the checking only starts after the regex engine has found a : character. Then, after matching :POST:, the replacement pattern replaces the whole match. It is totally OK to hardcode colons in the replacement pattern since they are hardcoded in the regex pattern.
Original answer
You just need to access Group 1:
if (matcher.find()) {
System.out.println(matcher.group(1));
}
See Java demo
Your :([a-zA-Z]+): regex contains a capturing group (see (....) subpattern). These groups are numbered automatically: the first one has an index of 1, the second has the index of 2, etc.
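To illustrate the numbering, here is a small sketch that lists every group of a match. Group 0 is always the whole match. The two-group pattern ([a-z]+):([A-Z]+): and the names GroupNumbering and groups are assumptions chosen for the demonstration, not taken from the question.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumbering {
    // Returns group 0 (whole match) followed by each numbered capturing group.
    static String[] groups(String input) {
        Matcher m = Pattern.compile("([a-z]+):([A-Z]+):").matcher(input);
        if (!m.find()) {
            return new String[0];
        }
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            result[i] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [something:POST:, something, POST]
        System.out.println(Arrays.toString(groups("something:POST:/some/path/")));
    }
}
```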
To replace it, use Matcher#appendReplacement():
String s = "something:POST:/some/path/";
StringBuffer result = new StringBuffer();
Matcher m = Pattern.compile(":([a-zA-Z]+):").matcher(s);
while (m.find()) {
m.appendReplacement(result, ":*:");
}
m.appendTail(result);
System.out.println(result.toString());
See another demo
This is your solution:
regex = (:)([a-zA-Z]+)(:)
And code is:
String ss = "something:POST:/some/path/";
ss = ss.replaceFirst("(:)([a-zA-Z]+)(:)", "$1*$3");
ss now contains:
something:*:/some/path/
Which I believe is what you are looking for...

RegEx Expression not matching

I have the following text
CHAPTER 1
Introduction
CHAPTER OVERVIEW
for which I created and tested (on http://regexr.com/) the following regex:
(CHAPTER\s{1}\d\n)
However, when I use the following code in Java, it fails:
String text = stripper.getText(document);//The text above
Pattern p = Pattern.compile("(CHAPTER\\s{1}\\d\\n)");
Matcher m = p.matcher(text);
if (m.find()) {
//do action
}
m.find() always returns false.
Your document may contain DOS line endings (\r\n) as well. You can use either of these patterns:
Pattern p = Pattern.compile("CHAPTER\\s+\\d+\\R");
\R (requires Java 8) matches any line break sequence (\r\n, \n, or \r) after your digits. Or just use:
Pattern p = Pattern.compile("CHAPTER\\s+\\d+\\s");
since \s also matches any whitespace including newline characters.
Another alternative is to use MULTILINE flag with anchor $:
Pattern p = Pattern.compile("(?m)CHAPTER\\s+\\d+$");
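The three patterns above can be checked against text with \r\n line endings. The sketch below assumes that stripper.getText(document) returns such text, which the question does not confirm. LineEndingCheck and finds are hypothetical names.

```java
import java.util.regex.Pattern;

class LineEndingCheck {
    // True if the regex is found anywhere in the text.
    static boolean finds(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        // Assumed input: the sample text with DOS line endings.
        String text = "CHAPTER 1\r\nIntroduction\r\nCHAPTER OVERVIEW";

        System.out.println(finds("(CHAPTER\\s{1}\\d\\n)", text));  // false: \r follows the digit
        System.out.println(finds("CHAPTER\\s+\\d+\\R", text));     // true
        System.out.println(finds("CHAPTER\\s+\\d+\\s", text));     // true
        System.out.println(finds("(?m)CHAPTER\\s+\\d+$", text));   // true
    }
}
```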
Your problem is in your source text. I think you forgot about the new lines, because this:
String text = "CHAPTER 1\n" +
"Introduction\n" +
"CHAPTER OVERVIEW";
Pattern p = Pattern.compile("(CHAPTER\\s{1}\\d\\n)");
Matcher m = p.matcher(text);
System.out.println(m.find());
will print true. The string body is copied from your question, and IntelliJ added the new lines there. Try to debug what you really get from stripper.getText(document).
You can pass a flag such as Pattern.MULTILINE as the second parameter to compile().

text wrongly matches with substring of words in group

I want to check whether the text starts with what or who and is a question, so I wrote the following code:
private static void startWithQOrIf(String commentstr){
String urlPattern = "(|who|what).*\\?.*$";
Pattern p = Pattern.compile(urlPattern,Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(commentstr);
if (m.find()) {
System.out.println("yes");
}
}
Everything works well, but for example when I try:
whooooooooo is the follower?
it matches as well, but it should not, because I am looking for who, not whooooooooo.
Any idea?
You can ensure a whole word using a word boundary \b:
(|who|what)\\b.*\\?.*$
            ^^^
If the words in the alternation group are supposed to appear at the start of the string, you can just use matches and remove $ anchor:
String urlPattern = "(|who|what)\\b.*\\?.*";
Pattern p = Pattern.compile(urlPattern,Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(commentstr);
if (m.matches()) { // < - Here, matches is used
System.out.println("yes");
}
Note that (|who|what) matches either an empty string, or who, or what. The empty alternative followed by \b still matches at the start of whooooooooo, so unless you really need to allow an empty string, use just (who|what).
You must use word boundaries.
String urlPattern = "\\b(who|what)\\b.*\\?.*$";
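As a quick check of the word-boundary approach, the sketch below uses (who|what) without the empty alternative, together with matches() so that the word must appear at the start of the string. The class and method names and the third sample sentence are made up for illustration.

```java
import java.util.regex.Pattern;

class QuestionStart {
    // who/what as a whole word at the start, followed somewhere by a '?'.
    private static final Pattern QUESTION =
            Pattern.compile("(who|what)\\b.*\\?.*", Pattern.CASE_INSENSITIVE);

    static boolean startsWithQuestionWord(String comment) {
        return QUESTION.matcher(comment).matches();
    }

    public static void main(String[] args) {
        System.out.println(startsWithQuestionWord("Who is the follower?"));         // true
        System.out.println(startsWithQuestionWord("whooooooooo is the follower?")); // false
        System.out.println(startsWithQuestionWord("what is this"));                 // false: no '?'
    }
}
```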

Java regex matcher always returns false

I have a string expression from which I need to get some values. The string is as follows
#min({(((fields['example6'].value + fields['example5'].value) * ((fields['example1'].value*5)+fields['example2'].value+fields['example3'].value-fields['example4'].value)) * 0.15),15,9.087})
From this string, I need to obtain a string array list which contains the values such as "example1", "example2" and so on.
I have a Java method which looks like this:
String regex = "/fields\\[['\"]([\\w\\s]+)['\"]\\]/g";
ArrayList<String> arL = new ArrayList<String>();
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(expression);
while(m.find()){
arL.add(m.group());
}
But m.find() always returns false. Is there anything I'm missing?
The problem is with the '/' delimiters and the trailing g, which Java treats as literal text. If what you want to extract is only the field name, you should also use m.group(1):
String regex = "fields\\[['\"]([\\w\\s]+)['\"]\\]";
ArrayList<String> arL = new ArrayList<String>();
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(expression);
while(m.find()){
arL.add(m.group(1));
}
The main issue you seem to have is that you are using delimiters (as in PHP or Perl or JavaScript) that cannot be used in a Java regex. Also, you have your matches in the first capturing group, but you are using group() that returns the whole match (including fields[').
Here is a working code:
String str = "#min({(((fields['example6'].value + fields['example5'].value) * ((fields['example1'].value*5)+fields['example2'].value+fields['example3'].value-fields['example4'].value)) * 0.15),15,9.087})";
ArrayList<String> arL = new ArrayList<String>();
String rx = "(?<=fields\\[['\"])[\\w\\s]*(?=['\"]\\])";
Pattern ptrn = Pattern.compile(rx);
Matcher m = ptrn.matcher(str);
while (m.find()) {
arL.add(m.group());
}
Here is a working IDEONE demo
Note that I have added look-arounds to extract just the texts between 's with group().
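For a side-by-side comparison, the following sketch runs the capturing-group pattern and the look-around pattern on a shortened version of the expression, plus the original pattern with JavaScript-style delimiters. FieldNames and extract are hypothetical names, and the shortened input is an assumption made to keep the example small.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FieldNames {
    // Collects the given group (0 = whole match) from every match of the regex.
    static List<String> extract(String regex, int group, String expression) {
        List<String> names = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(expression);
        while (m.find()) {
            names.add(m.group(group));
        }
        return names;
    }

    public static void main(String[] args) {
        String expr = "fields['example6'].value + fields['example5'].value";

        // Capturing group: read group 1.
        System.out.println(extract("fields\\[['\"]([\\w\\s]+)['\"]\\]", 1, expr));         // [example6, example5]
        // Look-arounds: the whole match is already the name.
        System.out.println(extract("(?<=fields\\[['\"])[\\w\\s]*(?=['\"]\\])", 0, expr));  // [example6, example5]
        // JavaScript-style delimiters are matched literally, so nothing is found.
        System.out.println(extract("/fields\\[['\"]([\\w\\s]+)['\"]\\]/g", 0, expr));      // []
    }
}
```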

Pattern matching with string containing dots

Pattern is:
private static Pattern r = Pattern.compile("(.*\\..*\\..*)\\..*");
String is:
sentVersion = "1.1.38.24.7";
I do:
Matcher m = r.matcher(sentVersion);
if (m.find()) {
guessedClientVersion = m.group(1);
}
I expect 1.1.38 but the pattern match fails. If I change to Pattern.compile("(.*\\..*\\..*)\\.*");
// notice I remove the "." before the last *
then 1.1.38.XXX fails
My goal is to find (x.x.x) in any incoming string.
Where am I wrong?
The problem is probably due to the greediness of your regex. Try this negation-based regex pattern:
private static Pattern r = Pattern.compile("([^.]*\\.[^.]*\\.[^.]*)\\..*");
Online Demo: http://regex101.com/r/sJ5rD4
Make your .* matches reluctant with ?:
Pattern r = Pattern.compile("(.*?\\..*?\\..*?)\\..*");
Otherwise each .* matches as much as it can, so the group captures 1.1.38.24 instead of 1.1.38.
See here: http://regex101.com/r/lM2lD5
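The three variants can be compared directly on the version string from the question. This is only a sketch; VersionPrefix and firstGroup are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VersionPrefix {
    // Returns group 1 of the first match, or null if the regex is not found.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String sentVersion = "1.1.38.24.7";

        System.out.println(firstGroup("(.*\\..*\\..*)\\..*", sentVersion));          // 1.1.38.24 (greedy)
        System.out.println(firstGroup("([^.]*\\.[^.]*\\.[^.]*)\\..*", sentVersion)); // 1.1.38 (negated class)
        System.out.println(firstGroup("(.*?\\..*?\\..*?)\\..*", sentVersion));       // 1.1.38 (reluctant)
    }
}
```

The negated character class [^.] cannot cross a dot at all, so it avoids the backtracking that the reluctant version relies on.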
