Find characters that match a regex's set - java

I have a regex w_p[a-z]
It would match input like w_pa, w_pb ... w_pz. I like to find which character exactly was matched i.e. a,b or z for the above input. Is this possible with java regex?

Yes, you need to capture:
final Pattern pattern = Pattern.compile("w_p([a-z])");
final Matcher m = pattern.matcher(input);
if (m.find())
// what is matched is in m.group(1)

Sure, use Regexpr groups. w_p([a-z]) defines a group for the character you are looking for.
Pattern p = Pattern.compile("w_p([a-z])");
Matcher matcher = p.matcher(input);
if (matcher.find()) {
String character = matcher.group(1)
}
matcher.group(0) contains all that was matched (w_pa or w_pb etc.)
matcher.group(1) contains what was found in the first () pair.
See the documentation for more information.

The REGEX will be something like this:
w_p([a-z])
So you will create a group from wich you can get the value

Related

How To Match Repeating Sub-Patterns

Let's say I have a string:
String sentence = "My nieces are Cara:8 Sarah:9 Tara:10";
And I would like to find all their respective names and ages with the following pattern matcher:
String regex = "My\\s+nieces\\s+are((\\s+(\\S+):(\\d+))*)";
Pattern pattern = Pattern.compile;
Matcher matcher = pattern.matcher(sentence);
I understand something like
matcher.find(0); // resets "pointer"
String niece = matcher.group(2);
String nieceName = matcher.group(3);
String nieceAge = matcher.group(4);
would give me my last niece (" Tara:10", "Tara", "10",).
How would I collect all of my nieces instead of only the last, using only one regex/pattern?
I would like to avoid using split string.
Another idea is to use the \G anchor that matches where the previous match ended (or at start).
String regex = "(?:\\G(?!\\A)|My\\s+nieces\\s+are)\\s+(\\S+):(\\d+)";
If My\s+nieces\s+are matches
\G will chain matches from there
(?!\A) neg. lookahead prevents \G from matching at \A start
\s+(\S+):(\d+) using two capturing groups for extraction
See this demo at regex101 or a Java demo at tio.run
Matcher m = Pattern.compile(regex).matcher(sentence);
while (m.find()) {
System.out.println(m.group(1));
System.out.println(m.group(2));
}
You can't iterate over repeating groups, but you can match each group individually, calling find() in a loop to get the details of each one. If they need to be back-to-back, you can iteratively bound your matcher to the last index, like this:
Matcher matcher = Pattern.compile("My\\s+nieces\\s+are").matcher(sentence);
if (matcher.find()) {
int boundary = matcher.end();
matcher = Pattern.compile("^\\s+(\\S+):(\\d+)").matcher(sentence);
while (matcher.region(boundary, sentence.length()).find()) {
System.out.println(matcher.group());
System.out.println(matcher.group(1));
System.out.println(matcher.group(2));
boundary = matcher.end();
}
}

Java regex : find the last occurrence of a string using Matcher.matches()

I have following input String:
abc.def.ghi.jkl.mno
Number of dot characters may vary in the input. I want to extract the word after the last . (i.e. mno in the above example). I am using the following regex and its working perfectly fine:
String input = "abc.def.ghi.jkl.mno";
Pattern pattern = Pattern.compile("([^.]+$)");
Matcher matcher = pattern.matcher(input);
if(matcher.find()) {
System.out.println(matcher.group(1));
}
However, I am using a third party library which does this matching (Kafka Connect to be precise) and I can just provide the regex pattern to it. The issue is, this library (whose code I can't change) uses matches() instead of find() to do the matching, and when I execute the same code with matches(), it doesn't work e.g.:
String input = "abc.def.ghi.jkl.mno";
Pattern pattern = Pattern.compile("([^.]+$)");
Matcher matcher = pattern.matcher(input);
if(matcher.matches()) {
System.out.println(matcher.group(1));
}
The above code doesn't print anything. As per the javadoc, matches() tries to match the whole String. Is there any way I can apply similar logic using matches() to extract mno from my input String?
You may use
".*\\.([^.]*)"
It matches
.*\. - any 0+ chars as many as possible up to the last . char
([^.]*) - Capturing group 1: any 0+ chars other than a dot.
See the regex demo and the Regulex graph:
To extract a word after the last . per your instruction you could do this without Pattern and Matcher as following:
String input = "abc.def.ghi.jkl.mno";
String getMe = input.substring(input.lastIndexOf(".")+1, input.length());
System.out.println(getMe);
This will work. Use .* at the beginning to enable it to match the entire input.
public static void main(String[] argv) {
String input = "abc.def.ghi.jkl.mno";
Pattern pattern = Pattern.compile(".*([^.]{3})$");
Matcher matcher = pattern.matcher(input);
if(matcher.matches()) {
System.out.println(matcher.group(0));
System.out.println(matcher.group(1));
}
}
abc.def.ghi.jkl.mno
mno
This is a better pattern if the dot really is anywhere: ".*\\.([^.]+)$"

Regular Expression to match a string that does not contain specific string in Java

I need a regular expression that matches a substring in string /*exa*/mple*/ ,
the matched string must be /*exa*/ not /*exa*/mple*/.
It also must not contain "*/" in it.
I have tried these regex:
"/\\*[.*&&[^*/]]\\*/" ,
"/\\*.*&&(?!^*/$)\\*/"
but im not able to get the exact solution.
I understand you want to pick out comments from a text.
Pattern p = Pattern.compile("/\\*.*?\\*/");
Matcher m = p.matcher("/*ex*a*/mple*/and/*more*/ther*/");
while (m.find()){
System.out.println(m.group());
}
you can try this:
/\*[^\*\/\*]+\*/ --> anything that is in between (including) "/*" and "*/"
Here is a sample:
Pattern p = Pattern.compile("/\\*[^\\*\\/\\*]+\\*/");
Matcher m = p.matcher("/*exa*/mple*/");
while (m.find()){
System.out.println(m.group());
}
OUTPUT:
/*exa*/

Java regex pattern issue

I have a string:
bundle://24.0:0/com/keop/temp/Activator.class
And from this string I need to get com/keop/temp/Activator but the following pattern:
Pattern p = Pattern.compile("bundle://.*/(.*)\\.class");
returns only Activator. Where is my mistake?
You need to follow the initial token .* with ? for a non-greedy match.
bundle://.*?/(.*)\\.class
^
Your regex uses greedy matching with a . that matches any character (but a newline). .*/ reads everything up to the final /, (.*)\\. matches everything up to the final period. Instead of lazy matching, you can restrict the characters matched to non-/ before the string you want to match. Change to
Pattern p = Pattern.compile("bundle://[^/]*/(.*)\\.class");
Sample code:
String str = "bundle://24.0:0/com/keop/temp/Activator.class";
Pattern ptrn = Pattern.compile("bundle://[^/]*/(.*)\\.class");
Matcher matcher = ptrn.matcher(str);
if (matcher.find()) {
System.out.println(matcher.group(1));
Output of the sample program:
com/keop/temp/Activator

java regex: extract text after delimeter?

i am new to regular expressions in Java. I like to extract a string by using regular expressions.
This is my String: "Hello,World"
I like to extract the text after ",". The result would be "World". I tried this:
final Pattern pattern = Pattern.compile(",(.+?)");
final Matcher matcher = pattern.matcher("Hello,World");
matcher.find();
But what would be the next step?
You don't need Regex for this. You can simply split on comma and get the 2nd element from the array: -
System.out.println("Hello,World".split(",")[1]);
OUTPUT: -
World
But if you want to use Regex, you need to remove ? from your Regex.
? after + is used for Reluctant matching. It will only match W and stop there.
You don't need that here. You need to match until it can match.
So use greedy matching instead.
Here's the code with modified Regex: -
final Pattern pattern = Pattern.compile(",(.+)");
final Matcher matcher = pattern.matcher("Hello,World");
if (matcher.find()) {
System.out.println(matcher.group(1));
}
OUTPUT: -
World
Extending what you have, you need to remove the ? sign from your pattern to use the greedy matching and then process the matched group:
final Pattern pattern = Pattern.compile(",(.+)"); // removed your '?'
final Matcher matcher = pattern.matcher("Hello,World");
while (matcher.find()) {
String result = matcher.group(1);
// work with result
}
Other answers suggest different approaches to your problem and might offer better solution for what you need.
System.out.println( "Hello,World".replaceAll(".*,(.*)","$1") ); // output is "World"
You are using a reluctant expression and will only select a single character W, whereas you can use a greedy one and print your matched group content:
final Pattern pattern = Pattern.compile(",(.+)");
final Matcher matcher = pattern.matcher("Hello,World");
if (matcher.find()) {
System.out.println(matcher.group(1));
}
Output:
World
See Regex Pattern doc

Categories