Matcher returns no match - Java

I've tested my regex on Regex101, and all the groups were captured and matched my string. But when I try to use it in Java, it throws
java.lang.IllegalStateException: No match found on line 9
String subjectCode = "02 credits between ----";
String regex1 = "^(\\d+).*credits between --+.*?$";
Pattern p1 = Pattern.compile(regex1);
Matcher m;
if(subjectCode.matches(regex1)){
m = p1.matcher(regex1);
m.find();
Integer subjectCredits = Integer.valueOf(m.group(1)); // line 9: the exception is thrown here
System.out.println("Subject Credits: " + subjectCredits);
}
How's that possible and what's the problem?

Here is a fix, plus a small optimization (thanks to @cricket_007):
String subjectCode = "02 credits between ----";
String regex1 = "(\\d+).*credits between --+.*";
Pattern p1 = Pattern.compile(regex1);
Matcher m = p1.matcher(subjectCode);
if (m.matches()) {
Integer subjectCredits = Integer.valueOf(m.group(1));
System.out.println("Subject Credits: " + subjectCredits);
}
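The bug is that p1.matcher(regex1) passes the pattern string itself to the matcher instead of the input, subjectCode. The text of regex1 starts with "^", not with a digit, so m.find() returns false, and calling m.group(1) afterwards throws IllegalStateException. You need to pass the input string to the matcher. As a minor enhancement, you can call Matcher#matches() just once and read the captured group only if there is a match. The regex does not need ^ and $, because matches() already requires the whole input to match the pattern.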
See IDEONE demo
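For reuse, here is a minimal sketch of the same fix as a method that returns an empty Optional instead of throwing when a line does not fit. The class and method names (CreditsParser, parseCredits) are hypothetical, and returning Optional is an assumption about how you want to handle non-matching lines.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CreditsParser {
    // Compiled once and reused for every input line.
    static final Pattern CREDITS = Pattern.compile("(\\d+).*credits between --+.*");

    // Returns the leading number, or Optional.empty() if the line does not match.
    // group(1) is read only after matches() returned true, so it cannot throw
    // IllegalStateException("No match found").
    static Optional<Integer> parseCredits(String line) {
        Matcher m = CREDITS.matcher(line); // the INPUT goes here, not the regex
        if (m.matches()) {
            return Optional.of(Integer.valueOf(m.group(1)));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(parseCredits("02 credits between ----")); // Optional[2]
        System.out.println(parseCredits("no credits here"));         // Optional.empty
    }
}
```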

Related

java regex non-capturing groups - extract the whole number portion of a decimal number (digits before the dot)

I am trying to extract the whole-number portion of a decimal number with units.
Conditions:
Ignore (do not match) numbers whose whole portion has more than 4 digits; anything longer is invalid.
There are at most 2 decimal digits.
Whole numbers without decimals must be supported.
The number can appear at the start, in the middle, or at the end of a string containing alphabetic characters, or the string can consist of the number alone.
So far this is what I have in Java. My guess is that the \d{1,4} pattern matches the digits after the decimal point and returns them in the last use case below. The first two asserts pass as expected; the last two fail (see the comments). I would greatly appreciate any help.
@Test
public void testTest(){
Pattern pattern = Pattern.compile("\\b\\d{1,4}(?:(\\.\\d{1,2})?) ml\\b", Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher("5432 ml");
Matcher matcher2 = pattern.matcher("54321 ml");
Matcher matcher3 = pattern.matcher("1234.0 ml");
Matcher matcher4 = pattern.matcher("Start 12345.0 ml end");
String result = matcher.find() ? matcher.group() : "";
String result2 = matcher2.find() ? matcher2.group() : "";
String result3 = matcher3.find() ? matcher3.group() : "";
String result4 = matcher4.find() ? matcher4.group() : "";
Assert.assertEquals("5432 ml", result); // passes, extracts correctly
Assert.assertEquals("", result2); // passes, ignored because the whole number has more than 4 digits
Assert.assertEquals("1234 ml", result3); // fails: result3 = "1234.0 ml"
Assert.assertEquals("", result4); // fails: result4 = "0 ml"
}
What is really driving me nuts is why the non-capturing group is being captured in the last assert. My understanding is that with ?: it should never return the decimal portion. Where am I wrong?
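It returns "0 ml" because \b also matches between the "." and the "0", so the regex finds "0 ml" on its own, and Matcher::group (oracle.com) returns the full match of the regex (group 0). A non-capturing group (?:...) only avoids creating a numbered group; the text it consumes is still part of the overall match, which is also why result3 is "1234.0 ml". To prevent the "0 ml" match, we can use a negative lookbehind (regular-expressions.info) to match only a number that is not preceded by a ".". Furthermore, since we need to extract specific parts of the match (the number and the unit), I suggest using named groups (regular-expressions.info). This results in the following regular expression: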
\b(?<!\.)(?<number>\d{1,4})(?:(\.\d{1,2})?)(?<unit> ml)\b
regex101 example
Translated to java, we end up with the following code:
final Pattern pattern = Pattern.compile(
"\\b(?<!\\.)(?<number>\\d{1,4})(?:(\\.\\d{1,2})?)(?<unit> ml)\\b",
Pattern.CASE_INSENSITIVE);
final Matcher matcher = pattern.matcher("5432 ml");
final Matcher matcher2 = pattern.matcher("54321 ml");
final Matcher matcher3 = pattern.matcher("1234.0 ml");
final Matcher matcher4 = pattern.matcher("Start 12345.0 ml end");
final String result = matcher.find()
? matcher.group("number") + matcher.group("unit")
: "";
final String result2 = matcher2.find()
? matcher2.group("number") + matcher2.group("unit")
: "";
final String result3 = matcher3.find()
? matcher3.group("number") + matcher3.group("unit")
: "";
final String result4 = matcher4.find()
? matcher4.group("number") + matcher4.group("unit")
: "";
Ideone demo
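To see the difference between the whole match and the named groups side by side, here is a small sketch that runs both the question's pattern and the answer's pattern on the four test inputs. The class and method names are hypothetical; the two patterns are copied from above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VolumeExtractor {
    // The question's pattern: group() returns the WHOLE match, including
    // the text consumed by the non-capturing group (?:...).
    static final Pattern ORIGINAL = Pattern.compile(
            "\\b\\d{1,4}(?:(\\.\\d{1,2})?) ml\\b", Pattern.CASE_INSENSITIVE);

    // The answer's pattern: (?<!\.) rejects a number that directly follows a dot.
    static final Pattern FIXED = Pattern.compile(
            "\\b(?<!\\.)(?<number>\\d{1,4})(?:(\\.\\d{1,2})?)(?<unit> ml)\\b",
            Pattern.CASE_INSENSITIVE);

    static String wholeMatch(String input) {
        Matcher m = ORIGINAL.matcher(input);
        return m.find() ? m.group() : "";
    }

    // Concatenates only the named groups, so the decimal part is left out.
    static String extract(String input) {
        Matcher m = FIXED.matcher(input);
        return m.find() ? m.group("number") + m.group("unit") : "";
    }

    public static void main(String[] args) {
        String[] inputs = {"5432 ml", "54321 ml", "1234.0 ml", "Start 12345.0 ml end"};
        for (String s : inputs) {
            System.out.println("'" + wholeMatch(s) + "' -> '" + extract(s) + "'");
        }
    }
}
```

For "1234.0 ml", wholeMatch returns "1234.0 ml" while extract returns "1234 ml"; for "Start 12345.0 ml end", wholeMatch returns "0 ml" while extract returns "".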

how to extract a part of string using regex

I am trying to extract the last three values, i.e. 05,06,07. However, my regex does the opposite: it extracts the first three values. Can someone please help me fix the mistake in my code?
Pattern p = Pattern.compile("^((?:[^,]+,){2}(?:[^,]+)).+$");
String line = "CgIn,f,CgIn.util:srv2,1,11.65,42,42,42,42,04,05,06,07";
Matcher m = p.matcher(line);
String result = ""; // must be initialized, or println(result) does not compile
if (m.matches()) {
result = m.group(1);
}
System.out.println(result);
My current output:
CgIn,f,CgIn.util:srv2
Expected output:
05,06,07
You may fix it as follows:
Pattern p = Pattern.compile("[^,]*(?:,[^,]*){2}$");
String line = "CgIn,f,CgIn.util:srv2,1,11.65,42,42,42,42,04,05,06,07";
Matcher m = p.matcher(line);
String result = "";
if (m.find()) {
result = m.group(0);
}
System.out.println(result);
See the Java demo
The regex is
[^,]*(?:,[^,]*){2}$
See the regex demo.
Pattern details
[^,]* - 0+ chars other than ,
(?:,[^,]*){2} - 2 repetitions of
, - a comma
[^,]* - 0+ chars other than ,
$ - end of string.
Note that you should use Matcher#find() with this regex to find a partial match.
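If the number of trailing values varies, the fixed {2} can be replaced with {n - 1}. A minimal sketch, assuming n >= 1; lastFields is a hypothetical helper name, and recompiling the pattern on every call is a simplification you may want to replace with a cache.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastFields {
    // Returns the last n comma-separated fields (n >= 1), or "" if the line
    // has fewer than n fields. Generalizes [^,]*(?:,[^,]*){2}$ by using {n - 1}.
    static String lastFields(String line, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        Pattern p = Pattern.compile("[^,]*(?:,[^,]*){" + (n - 1) + "}$");
        Matcher m = p.matcher(line);
        // find(): partial match, anchored only at the end of the input by $
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        String line = "CgIn,f,CgIn.util:srv2,1,11.65,42,42,42,42,04,05,06,07";
        System.out.println(lastFields(line, 3)); // 05,06,07
    }
}
```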

Java reg expression capture string

I have the following string:
"(1)name1:content1(2)name2:content2(3)name3:content3...(n)namen:contentn"
What I want to do is capture each name_i and content_i. How can I do this? Note that name_i is not known in advance; for example, name1 could be "abc" and name2 could be "xyz".
What I have tried:
String regex = "\\(\\d\\)(.*):(.*)(?=\\(\\d\\))";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(str);
if (matcher.find()) {
System.out.println(matcher.group(0));
System.out.println(matcher.group(1));
System.out.println(matcher.group(2));
}
But the results are not what I expected. I also tried matcher.matches(), but nothing is returned.
You may use
String s = "(1)name1:content1(2)name2:content2(3)name3:content3...(4)namen:content4";
Pattern pattern = Pattern.compile("\\(\\d+\\)([^:]+):([^(]*(?:\\((?!\\d+\\))[^(]*)*)");
Matcher matcher = pattern.matcher(s);
while (matcher.find()){
System.out.println(matcher.group(1));
System.out.println(matcher.group(2));
}
See the Java demo
Details
\\(\\d+\\) - matches (x) substring where x is 1 or more digits
([^:]+) - Group 1: one or more chars other than :
: - a colon
([^(]*(?:\\((?!\\d+\\))[^(]*)*) - Group 2:
[^(]* - zero or more chars other than (
(?:\\((?!\\d+\\))[^(]*)* - zero or more sequences of:
\\((?!\\d+\\)) - a ( that is not followed with 1+ digits and )
[^(]* - 0+ chars other than (
See the regex demo.
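A short sketch that collects the pairs into a map instead of printing them, using the same pattern. parsePairs is a hypothetical name, and the LinkedHashMap is an assumption: it keeps input order, and a repeated name overwrites the earlier content.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PairParser {
    // Group 1 = name, group 2 = content; the content may contain "(" as long
    // as it does not start a "(digits)" marker.
    static final Pattern PAIR = Pattern.compile(
            "\\(\\d+\\)([^:]+):([^(]*(?:\\((?!\\d+\\))[^(]*)*)");

    // Collects every name/content pair in input order.
    static Map<String, String> parsePairs(String s) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(s);
        while (m.find()) {
            result.put(m.group(1), m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {abc=foo, xyz=bar (baz), k=v}
        System.out.println(parsePairs("(1)abc:foo(2)xyz:bar (baz)(10)k:v"));
    }
}
```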
This works if your names and contents contain only word characters (letters, digits and underscore), since \w+ matches nothing else:
public static void test(String input){
String regexpp = "\\(\\d+\\)(\\w+):(\\w+)";
Pattern p = Pattern.compile(regexpp);
Matcher m = p.matcher(input);
while(m.find()){
System.out.println("Name: " + m.group(1));
System.out.println("Content: " + m.group(2));
}
}
Output:
Name: name1
Content: content1
Name: name2
Content: content2
Name: name3
Content: content3
Name: name99
Content: content99
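Your expression matches greedily: the first (.*) runs to the end of the input and then backtracks only as far as the last colon that still lets the rest of the pattern succeed, so a single match spans several name:content pairs. You can use reluctant (non-greedy) quantifiers, written *?, so that each group stops at the first possible point. Call find() in a while loop to get every pair; note that the lookahead requires a following (n), so the last pair is not matched, and \d matches only single-digit indices.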
String regex = "\\(\\d\\)(.*?):(.*?)(?=\\(\\d\\))";
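The following sketch, with a hypothetical class name and sample input, runs the question's greedy pattern and the reluctant version through the same find() loop so the difference is visible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctant {
    static final String GREEDY = "\\(\\d\\)(.*):(.*)(?=\\(\\d\\))";
    static final String RELUCTANT = "\\(\\d\\)(.*?):(.*?)(?=\\(\\d\\))";

    // Runs find() in a loop and records each match as "name=content".
    static List<String> pairs(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group(1) + "=" + m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "(1)a:x(2)b:y(3)c:z";
        System.out.println(pairs(GREEDY, input));    // [a:x(2)b=y]
        System.out.println(pairs(RELUCTANT, input)); // [a=x, b=y]
    }
}
```

The greedy pattern produces one match whose name group is "a:x(2)b"; the reluctant one yields a=x and b=y. In both cases the last pair is skipped because no (n) follows it.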

Get Substring from a String in Java

I have the following text:
...,Niedersachsen,NOT IN CHARGE SINCE: 03.2009, CATEGORY:...,
Now I want to extract the date after NOT IN CHARGE SINCE:, up to the comma.
So I need only 03.2009 as the result.
How can I do that?
String substr = "not in charge since:";
String before = s.substring(0, s.indexOf(substr));
String after = s.substring(s.indexOf(substr),s.lastIndexOf(","));
EDIT
for (String s : split) {
s = s.toLowerCase();
if (s.contains("ex peps")) {
String substr = "not in charge since:";
String before = s.substring(0, s.indexOf(substr));
String after = s.substring(s.indexOf(substr), s.lastIndexOf(","));
System.out.println(before);
System.out.println(after);
System.out.println("PEP!!!");
} else {
System.out.println("Line ok");
}
}
But that is not the result I want.
You can use a Pattern, for example:
String str = "Niedersachsen,NOT IN CHARGE SINCE: 03.2009, CATEGORY";
Pattern p = Pattern.compile("\\d{2}\\.\\d{4}");
Matcher m = p.matcher(str);
if (m.find()) {
System.out.println(m.group());
}
Output
03.2009
Note: if you want to get all such dates in your string, use while instead of if.
Edit
Or you can use:
String str = "Niedersachsen,NOT IN CHARGE SINCE: 03.03.2009, CATEGORY";
Pattern p = Pattern.compile("SINCE:(.*?)\\,");
Matcher m = p.matcher(str);
if (m.find()) {
System.out.println(m.group(1).trim());
}
You can use the : character to split the string s:
String substr = "NOT IN CHARGE SINCE:";
String before = s.substring(0, s.indexOf(substr));
int colon = s.indexOf(':');
String after = s.substring(colon + 1, s.indexOf(',', colon)).trim();
Of course, regular expressions give you more ways to search and match, but assuming that ":" is the key thing you are looking for (and its first occurrence is the one right after SINCE), then:
s.substring(s.indexOf(':') + 1, s.indexOf(',', s.indexOf(':'))).trim();
is the simplest, lowest-overhead way of fetching that substring. Using s.indexOf(',', from) instead of s.lastIndexOf(',') stops at the comma right after the date rather than at the last comma in the line.
Hint: since you are searching for a single character, pass a char (':') to indexOf rather than a String (":").
If you have a more general use case and you know the structure of the text well, you might benefit from using regular expressions:
Pattern pattern = Pattern.compile("NOT IN CHARGE SINCE: ([0-9.]*),");
Matcher matcher = pattern.matcher(line);
if (matcher.find()) {
System.out.println(matcher.group(1)); // group() without a successful find() throws IllegalStateException
}
A more general way to solve your problem is to use a regex that matches every substring between : and ,:
Pattern pattern = Pattern.compile("(?<=:)(.*?)(?=,)");
Matcher m = pattern.matcher(str);
while (m.find()) {
System.out.println(m.group(1));
}
You have to create a pattern for it. Try this simple regex as a starting point and adapt it as needed:
String s = "...,Niedersachsen,NOT IN CHARGE SINCE: 03.2009, CATEGORY:....,";
Pattern pattern = Pattern.compile(".*NOT IN CHARGE SINCE: ([\\d\\.]*).*");
Matcher matcher = pattern.matcher(s);
if (matcher.find())
{
System.out.println(matcher.group(1));
}
Group 1 captures the run of digits and dots that follows the label, i.e. the date.
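Since the question lower-cases each line before searching, a case-insensitive pattern avoids depending on a particular casing. A sketch, assuming the date always ends at the next comma; the class and method names (ChargeSince, chargeSince) are hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ChargeSince {
    // CASE_INSENSITIVE lets the same pattern work on the original line and on
    // the lower-cased copy produced by s.toLowerCase() in the question.
    // Group 1 captures everything after the label up to (not including) the next comma.
    static final Pattern SINCE = Pattern.compile(
            "NOT IN CHARGE SINCE:\\s*([^,]*)", Pattern.CASE_INSENSITIVE);

    static Optional<String> chargeSince(String line) {
        Matcher m = SINCE.matcher(line);
        return m.find() ? Optional.of(m.group(1).trim()) : Optional.empty();
    }

    public static void main(String[] args) {
        String s = "...,Niedersachsen,NOT IN CHARGE SINCE: 03.2009, CATEGORY:...,";
        System.out.println(chargeSince(s).orElse("no date"));
        System.out.println(chargeSince(s.toLowerCase()).orElse("no date"));
    }
}
```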

Use multiple regex to check blacklisted patterns are there in the string

I have a string value, and I want to check whether it contains any blacklisted patterns.
For example: String myString = "a/b[c=\"1\"=\"1\"]/c^]";
I want to check whether the following patterns are present:
"1"="1"
^
I am using the following code, which always returns false:
String text = "\"1\"=\"1\" ^ for occurrences of the http:// pattern.";
String patternString = "\"1\"=\"1\"|^";
Pattern pattern = Pattern.compile(patternString, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(text);
boolean matches = matcher.matches();
System.out.println("matches = " + matches);
How can I check this with a single regex?
There are a couple of issues with your code:
String patternString = "\"1\"=\"1\"|^";
Here ^ must be escaped, since it is a metacharacter (it anchors at the start of the input), so make it:
String patternString = "\"1\"=\"1\"|\\^";
Then this call:
boolean matches = matcher.matches();
should be changed to:
boolean matches = matcher.find();
because matches() attempts to match the entire input string, whereas find() looks for a match anywhere in it.
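When there are several blacklisted strings, escaping each metacharacter by hand gets error-prone. A sketch that builds the alternation with Pattern.quote, which treats every entry as a literal; containsAny is a hypothetical helper name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class Blacklist {
    // Builds one alternation from literal strings. Pattern.quote wraps each entry
    // in \Q...\E, so metacharacters such as ^ or [ are matched literally.
    static boolean containsAny(String text, List<String> blacklist) {
        if (blacklist.isEmpty()) {
            return false; // an empty pattern would match at every position
        }
        String regex = blacklist.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        // find() searches anywhere in the text; matches() would require the whole text to match.
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        List<String> blacklist = Arrays.asList("\"1\"=\"1\"", "^");
        System.out.println(containsAny("a/b[c=\"1\"=\"1\"]/c", blacklist)); // true
        System.out.println(containsAny("harmless text", blacklist));       // false
    }
}
```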
To check whether BOTH "1"="1" and ^ are in the input string:
Using regex:
String text = "\"1\"=\"1\" ^ for occurrences of the http:// pattern.";
Pattern p = Pattern.compile("\"1\"=\"1\".*\\^|\\^.*\"1\"=\"1\"");
Matcher m = p.matcher(text);
if(m.find())
System.out.println("Correct String");
Using contains method:
String text = "\"1\"=\"1\" ^ for occurrences of the http:// pattern.";
if (text.contains("\"1\"=\"1\"") && text.contains("^"))
System.out.println("Correct String");
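The regex alternation above has to list every ordering of the terms, which grows quickly beyond two. A different technique, not from the original answer, is one lookahead per required term, tried at the start of the input. A minimal sketch; containsAll is a hypothetical helper name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class RequireAll {
    // One lookahead per required literal: (?=.*A)(?=.*B) succeeds at index 0
    // only if A and B each occur somewhere in the input, in any order.
    // (?s) (DOTALL) lets . cross line breaks. An empty list returns true.
    static boolean containsAll(String text, List<String> required) {
        StringBuilder regex = new StringBuilder("(?s)");
        for (String literal : required) {
            regex.append("(?=.*").append(Pattern.quote(literal)).append(')');
        }
        // lookingAt() anchors the attempt at index 0 without requiring the whole input to match.
        return Pattern.compile(regex.toString()).matcher(text).lookingAt();
    }

    public static void main(String[] args) {
        List<String> required = Arrays.asList("\"1\"=\"1\"", "^");
        System.out.println(containsAll("\"1\"=\"1\" ^ for occurrences of the http:// pattern.", required)); // true
        System.out.println(containsAll("only \"1\"=\"1\" here", required)); // false
    }
}
```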
