No match found in simple regex - java

Given a token in the format "word_suffix", I want to match and capture the "suffix" part.
For instance, in "Peter_NNP" I want to capture "NNP". I wrote:
String p="Peter_NNP";
Matcher matcher=Pattern.compile(".+_(.*\\s)").matcher(p);
System.out.println(matcher.group(1));
Instead of printing "NNP" as I would expect, it throws the following exception:
Exception in thread "main" java.lang.IllegalStateException: No match found
at java.util.regex.Matcher.group(Unknown Source)
Note that "word" and "suffix" part can be made of any character.

You need to call find() before you can read a match group. Also, your capture group requires the captured text to end with a whitespace character, and "Peter_NNP" contains none; .* is enough here.
String s = "Peter_NNP";
Pattern p = Pattern.compile(".+_(.*)");
Matcher m = p.matcher(s);
if (m.find()) {
    System.out.println(m.group(1)); //=> "NNP"
}
But, I would think a simple split would be fine here:
String s = "Peter_NNP";
String[] parts = s.split("_");
System.out.println(parts[1]); //=> "NNP"
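A minimal sketch of the two points above, wrapped in a hypothetical helper (`SuffixSketch.suffixOf` is an illustrative name, not something from the original posts). It assumes the suffix is whatever follows the last underscore, and it returns an empty `Optional` instead of throwing when the token does not match.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SuffixSketch {
    // Greedy ".+" backtracks to the last underscore; "(.*)" captures the rest.
    private static final Pattern TOKEN = Pattern.compile(".+_(.*)");

    // Hypothetical helper: the suffix, or empty if the token is not "word_suffix".
    static Optional<String> suffixOf(String token) {
        Matcher m = TOKEN.matcher(token);
        // group(1) may only be read after a successful matches(), find() or lookingAt().
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(suffixOf("Peter_NNP"));    // prints Optional[NNP]
        System.out.println(suffixOf("New_York_NNP")); // prints Optional[NNP]
        System.out.println(suffixOf("Peter"));        // prints Optional.empty
    }
}
```

Note that for a token such as "New_York_NNP", `split("_")[1]` would return "York", so the regex and the split only agree when the token contains exactly one underscore.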

Just to add to hwnd's answer: if you want to capture everything after the first underscore (even if there is no character before the underscore), use a reluctant .*? at the start. Thanks to hwnd for helping me understand this.
String s="_NNP";
Matcher matcher=Pattern.compile(".*?_(.*)").matcher(s);
while (matcher.find()) {
    System.out.println(matcher.group(1));
}
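The difference between the greedy and reluctant prefixes is easiest to see on a token with more than one underscore. The following is a small sketch; the `capture` helper and the sample token "a_b_c" are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnderscoreSketch {
    // Hypothetical helper: group 1 of the first match, or null if nothing matches.
    static String capture(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Reluctant prefix stops at the first underscore.
        System.out.println(capture(".*?_(.*)", "a_b_c")); // prints b_c
        // Greedy prefix backtracks to the last underscore.
        System.out.println(capture(".*_(.*)", "a_b_c"));  // prints c
        // ".*?" may match nothing, so a leading underscore is accepted.
        System.out.println(capture(".*?_(.*)", "_NNP"));  // prints NNP
        // ".+" needs at least one character before the underscore.
        System.out.println(capture(".+_(.*)", "_NNP"));   // prints null
    }
}
```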

Related

Java regex : find the last occurrence of a string using Matcher.matches()

I have following input String:
abc.def.ghi.jkl.mno
The number of dot characters may vary in the input. I want to extract the word after the last . (i.e. mno in the above example). I am using the following regex and it's working perfectly fine:
String input = "abc.def.ghi.jkl.mno";
Pattern pattern = Pattern.compile("([^.]+$)");
Matcher matcher = pattern.matcher(input);
if(matcher.find()) {
    System.out.println(matcher.group(1));
}
However, I am using a third party library which does this matching (Kafka Connect to be precise) and I can just provide the regex pattern to it. The issue is, this library (whose code I can't change) uses matches() instead of find() to do the matching, and when I execute the same code with matches(), it doesn't work e.g.:
String input = "abc.def.ghi.jkl.mno";
Pattern pattern = Pattern.compile("([^.]+$)");
Matcher matcher = pattern.matcher(input);
if(matcher.matches()) {
    System.out.println(matcher.group(1));
}
The above code doesn't print anything. As per the javadoc, matches() tries to match the whole String. Is there any way I can apply similar logic using matches() to extract mno from my input String?
You may use
".*\\.([^.]*)"
It matches
.*\. - any 0+ chars as many as possible up to the last . char
([^.]*) - Capturing group 1: any 0+ chars other than a dot.
To extract the word after the last . as you describe, you could do this without Pattern and Matcher, as follows:
String input = "abc.def.ghi.jkl.mno";
String getMe = input.substring(input.lastIndexOf(".")+1, input.length());
System.out.println(getMe);
This will work. Use .* at the beginning to enable it to match the entire input.
public static void main(String[] argv) {
    String input = "abc.def.ghi.jkl.mno";
    Pattern pattern = Pattern.compile(".*([^.]{3})$");
    Matcher matcher = pattern.matcher(input);
    if(matcher.matches()) {
        System.out.println(matcher.group(0));
        System.out.println(matcher.group(1));
    }
}
OUTPUT:
abc.def.ghi.jkl.mno
mno
This is a better pattern if the last segment is not always three characters long: ".*\\.([^.]+)$"
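As a rough sketch of how a matches()-only caller behaves with such a pattern, the helper below (the name `lastSegment` is hypothetical) anchors the whole input with a leading .* and reads group 1. The edge-case inputs are assumptions added for illustration.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastSegmentSketch {
    // ".*\\." consumes everything up to the last dot; "([^.]+)" captures what follows.
    private static final Pattern LAST = Pattern.compile(".*\\.([^.]+)");

    // Hypothetical helper that only uses matches(), as the third-party library does.
    static Optional<String> lastSegment(String input) {
        Matcher m = LAST.matcher(input);
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(lastSegment("abc.def.ghi.jkl.mno")); // prints Optional[mno]
        System.out.println(lastSegment("nodots"));              // prints Optional.empty
        System.out.println(lastSegment("abc."));                // prints Optional.empty
    }
}
```

By comparison, `input.substring(input.lastIndexOf(".") + 1)` returns the whole input when there is no dot and an empty string when the input ends with a dot, so the two approaches differ on those edge cases.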

Regular Expression to match a string that does not contain specific string in Java

I need a regular expression that matches a substring in string /*exa*/mple*/ ,
the matched string must be /*exa*/ not /*exa*/mple*/.
It also must not contain "*/" in it.
I have tried these regex:
"/\\*[.*&&[^*/]]\\*/" ,
"/\\*.*&&(?!^*/$)\\*/"
but I'm not able to get the exact solution.
I understand you want to pick out comments from a text.
Pattern p = Pattern.compile("/\\*.*?\\*/");
Matcher m = p.matcher("/*ex*a*/mple*/and/*more*/ther*/");
while (m.find()){
    System.out.println(m.group());
}
You can try this:
/\*[^*/]+\*/ --> "/*", then one or more characters that are neither "*" nor "/", then "*/"
Here is a sample:
Pattern p = Pattern.compile("/\\*[^*/]+\\*/");
Matcher m = p.matcher("/*exa*/mple*/");
while (m.find()){
    System.out.println(m.group());
}
OUTPUT:
/*exa*/
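The two suggestions above behave differently when the comment body itself contains a "*" or "/". A small sketch that collects every match makes this visible; the `findAll` helper is a hypothetical name, and the input is the sample string from the first answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommentSketch {
    // Hypothetical helper: every non-overlapping match of regex in text, in order.
    static List<String> findAll(String regex, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "/*ex*a*/mple*/and/*more*/ther*/";
        // Reluctant ".*?" stops at the first closing delimiter after each opener.
        System.out.println(findAll("/\\*.*?\\*/", text));
        // The character class rejects any body containing '*' or '/',
        // so the first comment (body "ex*a") is skipped.
        System.out.println(findAll("/\\*[^*/]+\\*/", text));
    }
}
```

On this input the reluctant pattern finds both comments, while the character-class pattern finds only the second one. Neither pattern spans line breaks as written, since "." and the match as a whole stay within what the default flags allow; multi-line comments would need something like Pattern.DOTALL for the first form.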

Java: Need to extract a number from a string

I have a string containing a number. Something like "Incident #492 - The Title Description".
I need to extract the number from this string.
Tried
Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher(theString);
String substring =m.group();
But I am getting an error:
java.lang.IllegalStateException: No match found
What am I doing wrong?
What is the correct expression?
I'm sorry for such a simple question, but I searched a lot and still have not found how to do this (maybe because it's too late here...)
You are getting this exception because you need to call find() on the matcher before accessing groups:
Matcher m = p.matcher(theString);
while (m.find()) {
    String substring =m.group();
    System.out.println(substring);
}
There are two things wrong here:
The pattern you're using is not ideal for your scenario: used with matches(), it only succeeds if the whole string consists of digits. Also, since it doesn't contain a capturing group, a call to group() is equivalent to calling group(0), which returns the entire match.
You need to be certain that the matcher has a match before you go calling a group.
Let's start with the regex. Here's what it looks like now.
With matches(), that will only ever match a string that consists entirely of digits. What you care about is specifically the number in that string, so you want an expression that:
Doesn't care about what's in front of it
Doesn't care about what's after it
Only matches on one occurrence of numbers, and captures it in a group
To do that, you'd use this expression:
.*?(\\d+).*
The last part is to ensure that the matcher can find a match, and that it gets the correct group. That's accomplished by this:
if (m.matches()) {
    String substring = m.group(1);
    System.out.println(substring);
}
All together now:
Pattern p = Pattern.compile(".*?(\\d+).*");
final String theString = "Incident #492 - The Title Description";
Matcher m = p.matcher(theString);
if (m.matches()) {
    String substring = m.group(1);
    System.out.println(substring);
}
You need to invoke one of the Matcher methods, like find, matches or lookingAt to actually run the match.
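Putting the answers above together, here is a minimal sketch that keeps the original \d+ pattern and simply calls find() before reading the group. The helper name `firstNumber` and the use of `OptionalInt` are illustrative choices, and the sketch assumes the digits fit in an int (a longer run would make `Integer.parseInt` throw NumberFormatException).

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NumberSketch {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Hypothetical helper: the first run of digits in text, or empty if there is none.
    static OptionalInt firstNumber(String text) {
        Matcher m = DIGITS.matcher(text);
        // find() runs the match; only then is group() allowed.
        return m.find()
                ? OptionalInt.of(Integer.parseInt(m.group()))
                : OptionalInt.empty();
    }

    public static void main(String[] args) {
        System.out.println(firstNumber("Incident #492 - The Title Description")); // prints OptionalInt[492]
        System.out.println(firstNumber("no digits here"));                        // prints OptionalInt.empty
    }
}
```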

Remove part of String following regex match in Java

I want to remove a part of a string following what matches my regex.
I am trying to make a TV show organization program and I want to cut off anything in the name following the season and episode marker in the form SXXEXX where X is a digit.
I grasped the regex model fairly easily to create "[Ss]\d\d[Ee]\d\d" which should match properly.
I want to use the Matcher method end() to get the last index in the string of the match but it does not seem to be working as I think it should.
Pattern p = Pattern.compile("[Ss]\\d\\d[Ee]\\d\\d");
Matcher m = p.matcher(name);
if(m.matches())
    return name.substring(0, m.end());
If someone could tell me why this doesn't work and suggest a proper way to do it, that would be great. Thanks.
matches() tries to match the whole string against the pattern. If you want to find your pattern within a string, use find(); it searches for the next match in the string.
Your code could be quite the same:
if(m.find())
    return name.substring(0, m.end());
matches() matches the entire string; try find().
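To make the distinction concrete, the sketch below runs the same episode pattern through the three `Matcher` entry points. The wrapper method names and the sample titles are assumptions for illustration.

```java
import java.util.regex.Pattern;

class MatchModesSketch {
    private static final Pattern EPISODE = Pattern.compile("[Ss]\\d\\d[Ee]\\d\\d");

    // matches(): the pattern must cover the entire input.
    static boolean wholeString(String s) { return EPISODE.matcher(s).matches(); }

    // lookingAt(): the pattern must match at the start, but may stop early.
    static boolean atStart(String s) { return EPISODE.matcher(s).lookingAt(); }

    // find(): the pattern may match anywhere in the input.
    static boolean anywhere(String s) { return EPISODE.matcher(s).find(); }

    public static void main(String[] args) {
        for (String s : new String[] {"S01E02", "S01E02 extra", "Show S01E02 extra"}) {
            System.out.println(s + " -> matches=" + wholeString(s)
                    + " lookingAt=" + atStart(s) + " find=" + anywhere(s));
        }
    }
}
```

Only the first title satisfies matches(); the first two satisfy lookingAt(); all three satisfy find().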
You could capture the name as well:
String name = "a movie S01E02 with some stuff";
Pattern p = Pattern.compile("(.*[Ss]\\d\\d[Ee]\\d\\d)");
Matcher m = p.matcher(name);
if (m.find())
    System.out.println(m.group());
else
    System.out.println("No match");
Will capture and print:
a movie S01E02
This should work
.*[Ss]\d\d[Ee]\d\d
In Java (I'm rusty) this will be
String ResultString = null;
Pattern regex = Pattern.compile(".*[Ss]\\d\\d[Ee]\\d\\d");
Matcher regexMatcher = regex.matcher("Title S11E11Blah");
if (regexMatcher.find()) {
    ResultString = regexMatcher.group();
}
Hope this helps
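A compact sketch of the find() plus end() approach from the answers above, written as a self-contained helper. The name `trimAfterEpisode` is hypothetical, and returning the name unchanged when no marker is found is an assumption about the desired behavior.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EpisodeTrimSketch {
    private static final Pattern EPISODE = Pattern.compile("[Ss]\\d\\d[Ee]\\d\\d");

    // Hypothetical helper: cut everything after the first SxxExx marker.
    static String trimAfterEpisode(String name) {
        Matcher m = EPISODE.matcher(name);
        // end() is the index just past the match, so substring keeps the marker.
        return m.find() ? name.substring(0, m.end()) : name;
    }

    public static void main(String[] args) {
        System.out.println(trimAfterEpisode("a movie S01E02 with some stuff")); // prints a movie S01E02
        System.out.println(trimAfterEpisode("Title S11E11Blah"));               // prints Title S11E11
        System.out.println(trimAfterEpisode("No marker here"));                 // prints No marker here
    }
}
```

This version cuts at the first marker in the name, whereas the patterns that start with a greedy .* match up to the last marker; the results only differ when a name contains more than one.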

how to use Matcher.replaceAll in java?

I have a file which contains "(*" and "*)". I want to remove everything between these two character sequences.
I used the following code, but it didn't do anything to my string.
String regex = "\\(\\*.*\\*\\)";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(input);
matcher.replaceAll("");
The 'input' is:
(* This program prints out a message. *)
program is
begin
write ("Hello, world!");
end;
You need to capture the return value of your matcher: its replaceAll method returns the replaced String and leaves the input unchanged.
Additionally, use a regexp to match what you want to match, this time a parenthesized String. If you don't have some strange inputs, it may look like this:
String regex = "\\(\\*.*\\*\\)";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(input);
String result = matcher.replaceAll("(\\*\\*)");
System.out.println(result);
This regexp in fact captures the whole region from the first comment start to the last comment end, which would usually not be what you want. To let it match non-greedy (reluctantly), use this regexp: \(\*.*?\*\) (with doubled backslashes in Java.)
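The sketch below contrasts the greedy and reluctant forms and shows that the result has to be taken from the return value. The helper names are hypothetical, the comments are removed entirely (delimiters included), and Pattern.DOTALL is an added assumption so that a comment spanning several lines is also matched.

```java
import java.util.regex.Pattern;

class CommentStripSketch {
    // DOTALL lets '.' match line terminators; this flag is an assumption, not from the answer.
    private static final Pattern GREEDY =
            Pattern.compile("\\(\\*.*\\*\\)", Pattern.DOTALL);
    private static final Pattern RELUCTANT =
            Pattern.compile("\\(\\*.*?\\*\\)", Pattern.DOTALL);

    static String stripGreedy(String input) {
        // replaceAll returns a new String; the input itself is never modified.
        return GREEDY.matcher(input).replaceAll("");
    }

    static String stripReluctant(String input) {
        return RELUCTANT.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String input = "(* a *) x (* b *) y";
        // Greedy: one match from the first "(*" to the last "*)".
        System.out.println("[" + stripGreedy(input) + "]");    // prints [ y]
        // Reluctant: each comment is matched and removed separately.
        System.out.println("[" + stripReluctant(input) + "]"); // prints [ x  y]
    }
}
```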
