Java Regex group matches spaces - java

I have this regex and my output seems to be matching each single space but the capturing group is only alpha chars. I must be missing something.
String regexstring = new String("1234567 Mike Peloso ");
Pattern pattern = Pattern.compile("[A-Za-z]*");
Matcher matcher = pattern.matcher(regexstring);
while(matcher.find())
{
System.out.println(Integer.toString(matcher.start()));
String someNumberStr = matcher.group();
System.out.println(someNumberStr);
}

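There is no capturing group in your pattern, but that is not the problem: you need the + quantifier (meaning one or more times). The * quantifier matches the preceding element zero or more times, so [A-Za-z]* also succeeds with an empty match at every position where no letter starts, including each digit and each space.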
Pattern pattern = Pattern.compile("[A-Za-z]+");
And then print the match result:
while (matcher.find()) {
System.out.println(matcher.start());
System.out.println(matcher.group());
}
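To see what the * quantifier does with this input, here is a minimal sketch that records every match together with its start index, so the empty matches become visible. The class and method names (StarVsPlus, findAll) are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StarVsPlus {
    // Collects every match as "start:text" so that empty matches show up as "7:" etc.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.start() + ":" + m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "1234567 Mike Peloso ";
        // With *, positions that hold a digit or a space yield empty matches.
        System.out.println(findAll("[A-Za-z]*", input));
        // With +, only the two words are reported: [8:Mike, 13:Peloso]
        System.out.println(findAll("[A-Za-z]+", input));
    }
}
```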

Related

How To Match Repeating Sub-Patterns

Let's say I have a string:
String sentence = "My nieces are Cara:8 Sarah:9 Tara:10";
And I would like to find all their respective names and ages with the following pattern matcher:
String regex = "My\\s+nieces\\s+are((\\s+(\\S+):(\\d+))*)";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(sentence);
I understand something like
matcher.find(0); // resets "pointer"
String niece = matcher.group(2);
String nieceName = matcher.group(3);
String nieceAge = matcher.group(4);
would give me my last niece (" Tara:10", "Tara", "10").
How would I collect all of my nieces instead of only the last, using only one regex/pattern?
I would like to avoid using split string.
Another idea is to use the \G anchor that matches where the previous match ended (or at start).
String regex = "(?:\\G(?!\\A)|My\\s+nieces\\s+are)\\s+(\\S+):(\\d+)";
My\s+nieces\s+are matches the prefix once; after that, \G chains each match to the end of the previous one
(?!\A) is a negative lookahead that prevents \G from matching at the start of the input (\A)
\s+(\S+):(\d+) uses two capturing groups to extract the name and the age
Matcher m = Pattern.compile(regex).matcher(sentence);
while (m.find()) {
System.out.println(m.group(1));
System.out.println(m.group(2));
}
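As a usage sketch for the \G pattern above, the matches can be gathered into a map instead of being printed. NieceCollector and collect are hypothetical names, and parsing the age with Integer.parseInt assumes the digits fit into an int.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NieceCollector {
    // Same pattern as in the answer: the prefix once, then \G-chained name:age pairs.
    private static final Pattern NIECE =
            Pattern.compile("(?:\\G(?!\\A)|My\\s+nieces\\s+are)\\s+(\\S+):(\\d+)");

    // Returns name -> age in the order the pairs appear in the sentence.
    static Map<String, Integer> collect(String sentence) {
        Map<String, Integer> nieces = new LinkedHashMap<>();
        Matcher m = NIECE.matcher(sentence);
        while (m.find()) {
            nieces.put(m.group(1), Integer.parseInt(m.group(2)));
        }
        return nieces;
    }

    public static void main(String[] args) {
        // Prints {Cara=8, Sarah=9, Tara=10}
        System.out.println(collect("My nieces are Cara:8 Sarah:9 Tara:10"));
    }
}
```

Without the "My nieces are" prefix nothing is collected, because \G is not allowed to start a chain at the beginning of the input.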
You can't iterate over repeating groups, but you can match each group individually, calling find() in a loop to get the details of each one. If they need to be back-to-back, you can iteratively bound your matcher to the last index, like this:
Matcher matcher = Pattern.compile("My\\s+nieces\\s+are").matcher(sentence);
if (matcher.find()) {
int boundary = matcher.end();
matcher = Pattern.compile("^\\s+(\\S+):(\\d+)").matcher(sentence);
while (matcher.region(boundary, sentence.length()).find()) {
System.out.println(matcher.group());
System.out.println(matcher.group(1));
System.out.println(matcher.group(2));
boundary = matcher.end();
}
}
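A possible variant of the region approach uses lookingAt() instead of a leading ^. lookingAt() only succeeds if the match begins exactly at the start of the region, so the anchor can be dropped. This is a sketch under that assumption; RegionScan and collect are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegionScan {
    // Collects "name=age" for each back-to-back pair that follows the prefix.
    static List<String> collect(String sentence) {
        List<String> pairs = new ArrayList<>();
        Matcher prefix = Pattern.compile("My\\s+nieces\\s+are").matcher(sentence);
        if (!prefix.find()) {
            return pairs;
        }
        Matcher item = Pattern.compile("\\s+(\\S+):(\\d+)").matcher(sentence);
        int boundary = prefix.end();
        // region(...) resets the matcher; lookingAt() must match at the region start.
        while (item.region(boundary, sentence.length()).lookingAt()) {
            pairs.add(item.group(1) + "=" + item.group(2));
            boundary = item.end();
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Prints [Cara=8, Sarah=9, Tara=10]
        System.out.println(collect("My nieces are Cara:8 Sarah:9 Tara:10"));
    }
}
```

The loop stops at the first gap, so a sentence such as "My nieces are Cara:8 and Sarah:9" yields only the first pair.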

Regular Expression in Java. Splitting a string using pattern and matcher

I am trying to get all the matching groups in my string.
My regular expression is "(?<!')/|/(?!')". I am trying to split the string using a regular expression with Pattern and Matcher. The string needs to be split on /, but a / surrounded by single quotes ('/') must be skipped. For example, "One/Two/Three'/'3/Four" needs to be split into ["One", "Two", "Three'/'3", "Four"], but without using the .split method.
I am currently using the code below:
// String to be scanned to find the pattern.
String line = "Test1/Test2/Tt";
String pattern = "(?<!')/|/(?!')";
// Create a Pattern object
Pattern r = Pattern.compile(pattern);
// Now create matcher object.
Matcher m = r.matcher(line);
if (m.matches()) {
System.out.println("Found value: " + m.group(0) );
} else {
System.out.println("NO MATCH");
}
But it always says "NO MATCH". What am I doing wrong, and how do I fix it?
Thanks in advance
To get the matches without using split, you might use
[^'/]+(?:'/'[^'/]*)*
Explanation
[^'/]+ Match 1+ times any char except ' or /
(?: Non capture group
'/'[^'/]* Match '/' followed by optionally matching any char except ' or /
)* Close group and optionally repeat it
String regex = "[^'/]+(?:'/'[^'/]*)*";
String string = "One/Two/Three'/'3/Four";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println(matcher.group(0));
}
Output
One
Two
Three'/'3
Four
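As for why the code in the question prints NO MATCH: matches() succeeds only when the pattern matches the whole input, whereas the pattern in the question describes a single / separator. The sketch below contrasts matches() with find() for that pattern; MatchesVsFind and its method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchesVsFind {
    // True only if the regex matches the input from the first to the last character.
    static boolean wholeInputMatches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // Start index of every place where the regex matches somewhere inside the input.
    static List<Integer> separatorPositions(String regex, String input) {
        List<Integer> positions = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            positions.add(m.start());
        }
        return positions;
    }

    public static void main(String[] args) {
        String regex = "(?<!')/|/(?!')";
        String line = "Test1/Test2/Tt";
        System.out.println(wholeInputMatches(regex, line));  // false: the line is more than one slash
        System.out.println(separatorPositions(regex, line)); // [5, 11]
    }
}
```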
Edit
If you do not want to split, you might also use a pattern that matches any char except /, and allows a / only when it is surrounded by single quotes
[^/]+(?:(?<=')/(?=')[^/]*)*
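A quick way to try the lookaround pattern above is to collect its matches into a list. This is only a sketch; QuotedSlashTokens and tokens are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedSlashTokens {
    // A token is a run of non-slash chars, optionally continued across a / that sits between two quotes.
    private static final Pattern TOKEN = Pattern.compile("[^/]+(?:(?<=')/(?=')[^/]*)*");

    static List<String> tokens(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints [One, Two, Three'/'3, Four]
        System.out.println(tokens("One/Two/Three'/'3/Four"));
    }
}
```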
Try this.
String line = "One/Two/Three'/'3/Four";
Pattern pattern = Pattern.compile("('/'|[^/])+");
Matcher m = pattern.matcher(line);
while (m.find())
System.out.println(m.group());
output:
One
Two
Three'/'3
Four
Here is a simple pattern matching all desired /, so you can split by them:
(?<=[^'])\/(?=')|(?<=')\/(?=[^'])|(?<=[^'])\/(?=[^'])
The logic is as follows. There are 4 cases:
1. / is surrounded by ', i.e. '/'
2. / is preceded by ', i.e. '/
3. / is followed by ', i.e. /'
4. / is surrounded by characters other than '
You want to exclude only case 1. So we need a regex for the other three cases; I have written three similar regexes and combined them with alternation.
Explanation of the first part (the other two are analogous):
(?<=[^']) - positive lookbehind, asserts that what precedes is different from ' (negated character class [^'])
\/ - matches / literally
(?=') - positive lookahead, asserts that what follows is '
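Since the question asks for a solution that does not call the .split method, one way to use a separator pattern like the one above is to walk the separators with find() and cut out the text between them. This is a minimal sketch; SeparatorSplit and splitWithMatcher are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SeparatorSplit {
    // The three-way alternation from the answer (the escaped \/ is written as a plain /).
    static final String SEPARATOR =
            "(?<=[^'])/(?=')|(?<=')/(?=[^'])|(?<=[^'])/(?=[^'])";

    // Emulates a split: takes the text between consecutive separator matches.
    static List<String> splitWithMatcher(String separatorRegex, String input) {
        List<String> parts = new ArrayList<>();
        Matcher m = Pattern.compile(separatorRegex).matcher(input);
        int from = 0;
        while (m.find()) {
            parts.add(input.substring(from, m.start()));
            from = m.end();
        }
        parts.add(input.substring(from)); // text after the last separator
        return parts;
    }

    public static void main(String[] args) {
        // Prints [One, Two, Three'/'3, Four]
        System.out.println(splitWithMatcher(SEPARATOR, "One/Two/Three'/'3/Four"));
    }
}
```

Unlike String.split, this sketch keeps a trailing empty string when the input ends with a separator.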
Try something like this:
String line = "One/Two/Three'/'3/Four";
String pattern = "([^/]+'/'\\d)|[^/]+";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(line);
boolean found = false;
while(m.find()) {
System.out.println("Found value: " + m.group() );
found = true;
}
if(!found) {
System.out.println("NO MATCH");
}
Output:
Found value: One
Found value: Two
Found value: Three'/'3
Found value: Four

Java regex : find the last occurrence of a string using Matcher.matches()

I have following input String:
abc.def.ghi.jkl.mno
The number of dot characters may vary in the input. I want to extract the word after the last . (i.e. mno in the above example). I am using the following regex and it is working perfectly fine:
String input = "abc.def.ghi.jkl.mno";
Pattern pattern = Pattern.compile("([^.]+$)");
Matcher matcher = pattern.matcher(input);
if(matcher.find()) {
System.out.println(matcher.group(1));
}
However, I am using a third party library which does this matching (Kafka Connect to be precise) and I can just provide the regex pattern to it. The issue is, this library (whose code I can't change) uses matches() instead of find() to do the matching, and when I execute the same code with matches(), it doesn't work e.g.:
String input = "abc.def.ghi.jkl.mno";
Pattern pattern = Pattern.compile("([^.]+$)");
Matcher matcher = pattern.matcher(input);
if(matcher.matches()) {
System.out.println(matcher.group(1));
}
The above code doesn't print anything. As per the javadoc, matches() tries to match the whole String. Is there any way I can apply similar logic using matches() to extract mno from my input String?
You may use
".*\\.([^.]*)"
It matches
.*\. - any 0+ chars as many as possible up to the last . char
([^.]*) - Capturing group 1: any 0+ chars other than a dot.
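The pattern can be checked with matches(), which is what the third-party library is said to call. The following is a small sketch; LastSegment and lastSegment are hypothetical names, and returning null for input without a dot is a choice made here for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastSegment {
    // .* consumes as much as possible, so \. ends up on the last dot.
    private static final Pattern LAST = Pattern.compile(".*\\.([^.]*)");

    // Returns the text after the last dot, or null if the input contains no dot.
    static String lastSegment(String input) {
        Matcher m = LAST.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(lastSegment("abc.def.ghi.jkl.mno")); // mno
        System.out.println(lastSegment("abc"));                 // null
    }
}
```

Because group 1 uses *, an input that ends with a dot still matches and yields an empty string.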
To extract the word after the last ., as you describe, you could do this without Pattern and Matcher as follows:
String input = "abc.def.ghi.jkl.mno";
String getMe = input.substring(input.lastIndexOf(".")+1, input.length());
System.out.println(getMe);
This will work. Use .* at the beginning to enable it to match the entire input.
public static void main(String[] argv) {
String input = "abc.def.ghi.jkl.mno";
Pattern pattern = Pattern.compile(".*([^.]{3})$");
Matcher matcher = pattern.matcher(input);
if(matcher.matches()) {
System.out.println(matcher.group(0));
System.out.println(matcher.group(1));
}
}
Output:
abc.def.ghi.jkl.mno
mno
This is a better pattern if the last dot can be anywhere, that is, if the final word is not always three characters long: ".*\\.([^.]+)$"
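The difference between the two patterns shows up as soon as the last word is not exactly three characters long. The sketch below runs both through matches(); SuffixPatterns and group1 are names invented for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SuffixPatterns {
    // Returns group 1 if the regex matches the whole input, otherwise null.
    static String group1(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String fixedWidth = ".*([^.]{3})$";   // always captures the last three non-dot chars
        String afterDot = ".*\\.([^.]+)$";    // captures everything after the last dot

        System.out.println(group1(fixedWidth, "abc.defgh")); // fgh
        System.out.println(group1(afterDot, "abc.defgh"));   // defgh
        System.out.println(group1(fixedWidth, "abc.de"));    // null
        System.out.println(group1(afterDot, "abc.de"));      // de
    }
}
```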

Extract numbers from a url's string group using regex in java

I have a url which has this format:
https://address.com/somestring/somestring-2/c100.200.3.4/somestrigx3/somestring.4
I want to obtain the numbers from c100.200.3.4, a segment that starts with c, ends at the next /, and separates the numbers with dots. So in the end I want to have 100, 200, 3, 4.
I was wondering if there is a way to build a regex pattern for this instead of the classic string search and compute.
It is possible with a single regex, plus a bit of code.
String s = "https://address.com/somestring/somestring-2/c100.200.3.4/somestrigx3/somestring.4";
Pattern pattern = Pattern.compile("(?<=/c)(\\d+)|(?!^)\\G\\.(\\d+)");
Matcher matcher = pattern.matcher(s);
while (matcher.find()){
if (matcher.group(1) != null)
System.out.println(matcher.group(1));
if (matcher.group(2) != null)
System.out.println(matcher.group(2));
}
The regex (?<=/c)(\d+)|(?!^)\G\.(\d+) contains two alternatives: (?<=/c)(\d+) matches and captures into Group 1 any sequence of digits after /c, and the (?!^)\G\.(\d+) matches consecutive sequences of a literal . and digits (capturing the latter into Group 2) after the successful previous match (due to (?!^)\G). Since either group can be non-initialized, we have to check it for null.
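If the null checks are unwanted, one possible variant (not from the answer above) moves both alternatives into a non-capturing group so that the digits always land in Group 1. This is a sketch under that assumption; UrlNumbers and extract are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UrlNumbers {
    // Either we are right after "/c", or we continue after the previous match with a dot.
    private static final Pattern NUMBER =
            Pattern.compile("(?:(?<=/c)|(?!^)\\G\\.)(\\d+)");

    static List<String> extract(String url) {
        List<String> numbers = new ArrayList<>();
        Matcher m = NUMBER.matcher(url);
        while (m.find()) {
            numbers.add(m.group(1)); // group 1 is set for every match, so no null check
        }
        return numbers;
    }

    public static void main(String[] args) {
        String s = "https://address.com/somestring/somestring-2/c100.200.3.4/somestrigx3/somestring.4";
        // Prints [100, 200, 3, 4]
        System.out.println(extract(s));
    }
}
```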
UPDATE
Since, as it turns out, the number of digit groups is fixed (4), you can use a simpler regex with capturing groups:
String s = "https://address.com/somestring/somestring-2/c100.200.3.4/somestrigx3/somestring.4";
Pattern pattern = Pattern.compile("(?<=/c)(\\d+)\\.(\\d+)\\.(\\d+)\\.(\\d+)");
Matcher matcher = pattern.matcher(s);
if (matcher.find()){
System.out.println(matcher.group(1));
System.out.println(matcher.group(2));
System.out.println(matcher.group(3));
System.out.println(matcher.group(4));
}
String splits[] = input_url.replaceAll(".*?/c([0-9.]+)/.*", "$1").split("[.]");
This first captures the text between /c and the next / in group 1 and replaces the whole string with that group ($1). After that, it splits the result on dots.
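One caveat with the replaceAll one-liner is that replaceAll returns the input unchanged when the pattern does not match, so the split would then run on the full URL. A guarded sketch of the same idea, with CSegment and numbers as made-up names, could look like this:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CSegment {
    // Same regex as the one-liner, but applied with matches() so a miss can be detected.
    private static final Pattern SEGMENT = Pattern.compile(".*?/c([0-9.]+)/.*");

    // Returns the dot-separated pieces of the /c.../ segment, or an empty list if there is none.
    static List<String> numbers(String url) {
        Matcher m = SEGMENT.matcher(url);
        if (!m.matches()) {
            return Collections.emptyList();
        }
        return Arrays.asList(m.group(1).split("[.]"));
    }

    public static void main(String[] args) {
        String url = "https://address.com/somestring/somestring-2/c100.200.3.4/somestrigx3/somestring.4";
        System.out.println(numbers(url));                        // [100, 200, 3, 4]
        System.out.println(numbers("https://address.com/none")); // []
    }
}
```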

RegEX: how to match string which is not surrounded

I have a String "REC/LESS FEES/CODE/AU013423".
What regular expression would match "REC" and "AU013423" (anything that is not surrounded by slashes /)?
I am using /^>*/, which works and matches the string within the slashes, i.e. with it I am able to find "/LESS FEES/CODE/", but I want to negate this to find the reverse, i.e. REC and AU013423.
I need help with this. Thanks.
If you know that you're only looking for alphanumeric data you can use the regex ([A-Z0-9]+)/.*/([A-Z0-9]+) If this matches you will have the two groups which contain the first & final text strings.
This code prints RECAU013423
final String s = "REC/LESS FEES/CODE/AU013423";
final Pattern regex = Pattern.compile("([A-Z0-9]+)/.*/([A-Z0-9]+)", Pattern.CASE_INSENSITIVE);
final Matcher matcher = regex.matcher(s);
if (matcher.matches()) {
System.out.println(matcher.group(1) + matcher.group(2));
}
You can tweak the regex groups as necessary to cover valid characters
Here's another option:
String s = "REC/LESS FEES/CODE/AU013423";
String[] results = s.split("/.*/");
System.out.println(Arrays.toString(results));
// [REC, AU013423]
^[^/]+|[^/]+$
matches anything that occurs before the first or after the last slash in the string (or the entire string if there is no slash present).
To iterate over all matches in a string in Java:
Pattern regex = Pattern.compile("^[^/]+|[^/]+$");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
// matched text: regexMatcher.group()
// match start: regexMatcher.start()
// match end: regexMatcher.end()
}
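The loop above can be made runnable as follows. This is a minimal sketch; EdgeSegments and edges are hypothetical names, and the input string is the one from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EdgeSegments {
    // Text before the first slash, or text after the last slash.
    private static final Pattern EDGE = Pattern.compile("^[^/]+|[^/]+$");

    static List<String> edges(String subject) {
        List<String> out = new ArrayList<>();
        Matcher m = EDGE.matcher(subject);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(edges("REC/LESS FEES/CODE/AU013423")); // [REC, AU013423]
        System.out.println(edges("REC"));                         // [REC]
    }
}
```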
