Java regex grouping

I have the following entry in a properties file:
some.key = \n
[1:Some value] \n
[14:Some other value] \n
[834:Yet another value] \n
I am trying to parse it using a regular expression, but I can't seem to get the grouping correct. I am trying to print out a key/value for each entry. Example: Key="834", Value="Yet another value"
private static final String REGEX_PATTERN = "[(\\d+)\\:(\\w+(\\s)*)]+";
private void foo(String propValue) {
    final Pattern p = Pattern.compile(REGEX_PATTERN);
    final Matcher m = p.matcher(propValue);
    while (m.find()) {
        final String key = m.group(0).trim();
        final String value = m.group(1).trim();
        System.out.println(String.format("Key[%s] Value[%s]", key, value));
    }
}
The error I get is:
Exception: java.lang.IndexOutOfBoundsException: No group 1
I thought I was grouping correctly in the regex but I guess not. Any help would be appreciated!
Thanks
UPDATE:
Escaping the brackets worked. I changed the pattern to the following. Thanks for the feedback!
private static final String REGEX_PATTERN = "\\[(\\d+)\\:(\\w+(\\w|\\s)*)\\]+";

[ should be escaped (as well as ]):
"\\[(\\d+)....\\]+"
Unescaped, [] is used for character classes: [0-9] is equivalent to (0|1|2|...|9).

Try this:
private static final String REGEX_PATTERN = "\\[(\\d+):([\\w\\s]+)\\]";
final Pattern p = Pattern.compile(REGEX_PATTERN);
final Matcher m = p.matcher(propValue);
while (m.find()) {
    final String key = m.group(1).trim();
    final String value = m.group(2).trim();
    System.out.println(String.format("Key[%s] Value[%s]", key, value));
}
The [ and ] need to be escaped because, unescaped, they mark the start and end of a character class.
group(0) is always the full match, so your own groups start at 1.
Note how I wrote the second group as [\\w\\s]+. This is a character class of word or whitespace characters, repeated one or more times.
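For reference, here is a minimal, self-contained sketch of the same idea that collects the entries into a map. The class name EntryParser and the parse method are made up for illustration, and the sample string assumes the property value arrives with real newline characters between the bracketed entries.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the class and method names are illustrative only.
final class EntryParser {
    // Escaped brackets match literally. Group 1 is the digits, group 2 the text.
    private static final Pattern ENTRY = Pattern.compile("\\[(\\d+):([\\w\\s]+)\\]");

    static Map<String, String> parse(String propValue) {
        Map<String, String> entries = new LinkedHashMap<>();
        Matcher m = ENTRY.matcher(propValue);
        // Each find() call moves on to the next bracketed entry.
        while (m.find()) {
            entries.put(m.group(1).trim(), m.group(2).trim());
        }
        return entries;
    }

    public static void main(String[] args) {
        // Assumed shape of the loaded property value.
        String propValue = "[1:Some value] \n[14:Some other value] \n[834:Yet another value] \n";
        parse(propValue).forEach((k, v) ->
                System.out.println(String.format("Key[%s] Value[%s]", k, v)));
    }
}
```

A LinkedHashMap is used here only so the entries print in the order they appear in the input.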

The problem is your regex: [ and ] are special characters and need to be escaped if you want them interpreted literally.
Try
"\\[(\\d+)\\:(\\w+(\\s)*)\\]"
Note: I removed the '+'. The matcher will keep finding substrings that match the pattern, so the + is not necessary. (Java has no separate global switch; calling find() in a loop already walks through all matches.)
I can't help but feel this might be simpler without regex though, perhaps by splitting on \n or [ and then splitting on : for each of those.
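A rough sketch of that split-based alternative, assuming one bracketed entry per line and that the first colon separates key from value. SplitParser is a made-up name, and lines that do not look like [key:value] are skipped rather than reported.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper showing the approach without a hand-written pattern.
final class SplitParser {
    static Map<String, String> parse(String propValue) {
        Map<String, String> entries = new LinkedHashMap<>();
        for (String line : propValue.split("\n")) {
            String entry = line.trim();
            // Skip blank lines and anything not wrapped in [ ... ].
            if (!entry.startsWith("[") || !entry.endsWith("]")) {
                continue;
            }
            String body = entry.substring(1, entry.length() - 1);
            int colon = body.indexOf(':');
            if (colon < 0) {
                continue; // no key/value separator
            }
            entries.put(body.substring(0, colon).trim(), body.substring(colon + 1).trim());
        }
        return entries;
    }

    public static void main(String[] args) {
        String propValue = "[1:Some value] \n[14:Some other value] \n[834:Yet another value] \n";
        parse(propValue).forEach((k, v) ->
                System.out.println(String.format("Key[%s] Value[%s]", k, v)));
    }
}
```

Unlike the regex with [\\w\\s]+, this version accepts any characters in the value, which may or may not be what you want.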

Since you are using a string that consists of several lines, you can tell Pattern about it:
final Pattern p = Pattern.compile(REGEX_PATTERN, Pattern.MULTILINE);
Although it is not directly relevant for you, I'd recommend adding DOTALL too:
final Pattern p = Pattern.compile(REGEX_PATTERN, Pattern.MULTILINE | Pattern.DOTALL);
Keep in mind that MULTILINE only changes where ^ and $ match and DOTALL only changes what . matches, so neither flag affects a pattern that uses none of them.
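To see what the flags do and do not change, here is a small sketch that counts matches with and without them. FlagDemo and countMatches are illustrative names, and the two-line input is made up for the demonstration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; counts how many times a pattern is found.
final class FlagDemo {
    static int countMatches(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String input = "[1:a]\n[2:b]\n";
        String entry = "\\[(\\d+):([\\w\\s]+)\\]";
        // The bracket pattern has no ^, $ or ., so the flags make no difference.
        System.out.println(countMatches(entry, 0, input));
        System.out.println(countMatches(entry, Pattern.MULTILINE | Pattern.DOTALL, input));
        // A pattern anchored with ^ does depend on MULTILINE.
        System.out.println(countMatches("^\\[", 0, input));
        System.out.println(countMatches("^\\[", Pattern.MULTILINE, input));
    }
}
```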

Related

Replace all occurrences matching given patterns

Having following string:
String value = "/cds/horse/schema1.0.0/day=12321/provider=samsung/run_key=32ee/group_key=222/end_date=2020-04-20/run_key_default=32sas1/somethingElse=else"
I need to replace the values of run_key and run_key_default with %. For example, for the string above the output should be:
"/cds/horse/schema1.0.0/day=12321/provider=samsung/run_key=%/group_key=222/end_date=2020-04-20/run_key_default=%/somethingElse=else"
I would like to avoid mistakenly modifying other values, so in my opinion the best solution is to combine the replaceAll method with a regex:
String output = value.replaceAll("\run_key=[*]\", "%").replaceAll("\run_key_default=[*]\", "%")
I'm not sure how I should construct the regex for it.
Feel free to post if you know a better solution than the one I provided.

You may use this regex for search:
(/run_key(?:_default)?=)[^/]*
and for replacement use:
"$1%"
Java Code:
String output = value.replaceAll("(/run_key(?:_default)?=)[^/]*", "$1%");
RegEx Details:
(: Start capture group #1
/run_key: Match literal text /run_key
(?:_default)?: Match _default optionally
=: Match a literal =
): End capture group #1
[^/]*: Match 0 or more of any character that is not /
"$1%" is replacement that puts our 1st capture group back followed by a literal %

public static void main(String[] args) {
    final String regex = "(run_key_default|run_key)=\\w*"; //regex
    final String string = "/cds/horse/schema1.0.0/day=12321/provider=samsung/run_key=32ee/group_key=222/end_date=2020-04-20/run_key_default=32sas1/somethingElse=else";
    final String subst = "$1=%"; //group1 as it is while remaining part with %
    final Pattern pattern = Pattern.compile(regex);
    final Matcher matcher = pattern.matcher(string);
    final String result = matcher.replaceAll(subst);
    System.out.println("Substitution result: " + result);
}
output
Substitution result:
/cds/horse/schema1.0.0/day=12321/provider=samsung/run_key=%/group_key=222/end_date=2020-04-20/run_key_default=%/somethingElse=else
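Both suggested patterns give the same result on the sample string, but they are not equivalent in general. The sketch below puts them side by side. RunKeyMasker and its method names are invented for illustration, and my_run_key is a hypothetical key used only to show the difference the leading / makes.

```java
// Hypothetical helper comparing the two replacement patterns from the answers.
final class RunKeyMasker {
    // Requires the leading '/', then replaces everything up to the next '/'.
    static String maskAnchored(String path) {
        return path.replaceAll("(/run_key(?:_default)?=)[^/]*", "$1%");
    }

    // Alternation without the leading '/'; the value is limited to word characters.
    static String maskAlternation(String path) {
        return path.replaceAll("(run_key_default|run_key)=\\w*", "$1=%");
    }

    public static void main(String[] args) {
        String value = "/cds/horse/schema1.0.0/day=12321/provider=samsung/run_key=32ee"
                + "/group_key=222/end_date=2020-04-20/run_key_default=32sas1/somethingElse=else";
        System.out.println(maskAnchored(value));
        System.out.println(maskAlternation(value));

        // Hypothetical key that merely ends in "run_key".
        String other = "/my_run_key=abc/run_key=32ee";
        System.out.println(maskAnchored(other));    // my_run_key is left alone
        System.out.println(maskAlternation(other)); // my_run_key is masked as well
    }
}
```

If keys that merely end in run_key can occur in your paths, the version anchored on / is the safer of the two.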

Pattern matching with string containing dots

Pattern is:
private static Pattern r = Pattern.compile("(.*\\..*\\..*)\\..*");
String is:
sentVersion = "1.1.38.24.7";
I do:
Matcher m = r.matcher(sentVersion);
if (m.find()) {
    guessedClientVersion = m.group(1);
}
I expect 1.1.38 but the pattern match fails. If I change the pattern to
Pattern.compile("(.*\\..*\\..*)\\.*"); // notice I removed the "." before the last *
then 1.1.38.XXX fails.
My goal is to find (x.x.x) in any incoming string.
Where am I wrong?

The problem is probably due to the greediness of your regex. Try this negation-based pattern:
private static Pattern r = Pattern.compile("([^.]*\\.[^.]*\\.[^.]*)\\..*");
Online Demo: http://regex101.com/r/sJ5rD4

Make your .* matches reluctant with ?
Pattern r = Pattern.compile("(.*?\\..*?\\..*?)\\..*");
Otherwise each greedy .* matches as much of the String value as it can.
See here: http://regex101.com/r/lM2lD5
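The three variants can be compared on the sample version string with a short sketch like the one below. VersionPrefix and firstGroup are illustrative names, not part of the original code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: returns group 1 of the first match, or null if there is none.
final class VersionPrefix {
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String sentVersion = "1.1.38.24.7";
        // Greedy: each .* grabs as much as it can, so the group runs up to the last dot.
        System.out.println(firstGroup("(.*\\..*\\..*)\\..*", sentVersion));          // 1.1.38.24
        // Reluctant: each .*? stops at the first dot it can.
        System.out.println(firstGroup("(.*?\\..*?\\..*?)\\..*", sentVersion));       // 1.1.38
        // Negated class: [^.]* can never cross a dot.
        System.out.println(firstGroup("([^.]*\\.[^.]*\\.[^.]*)\\..*", sentVersion)); // 1.1.38
    }
}
```

Note that all three patterns require a third dot after the captured part, so an input with only two dots, such as 1.1.38, produces no match.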

Regex to get the string after # sign

I have a string like follows:
#78517700-1f01-11e3-a6b7-3c970e02b4ec, #68517700-1f01-11e3-a6b7-3c970e02b4ec, #98517700-1f01-11e3-a6b7-3c970e02b4ec, #38517700-1f01-11e3-a6b7-3c970e02b4ec ....
I want to extract the string after #.
I have the current code like follows:
private final static Pattern PATTERN_LOGIN = Pattern.compile("#[^\\s]+");
Matcher m = PATTERN_LOGIN.matcher("#78517700-1f01-11e3-a6b7-3c970e02b4ec , #68517700-1f01-11e3-a6b7-3c970e02b4ec, #98517700-1f01-11e3-a6b7-3c970e02b4ec, #38517700-1f01-11e3-a6b7-3c970e02b4ec");
while (m.find()) {
String mentionedLogin = m.group();
.......
}
... but m.group() gives me #78517700-1f01-11e3-a6b7-3c970e02b4ec but I wanted 78517700-1f01-11e3-a6b7-3c970e02b4ec

You should use the regex "#([^\\s]+)" and then m.group(1), which returns what was captured by the capturing parentheses ().
m.group() and m.group(0) return the full string matched by your regex.

I would modify the pattern to omit the # sign from the captured text:
private final static Pattern PATTERN_LOGIN = Pattern.compile("#([^\\s]+)");
So the first group will be the GUID only.

Correct answers are mentioned in other responses. I will add some clarification. Your code is working correctly, as expected.
Your regex means: match a string that starts with # and is followed by one or more characters that are not whitespace. So if you omit the parentheses you get your full string, as expected.
The parentheses, as mentioned in other responses, are used for marking capturing groups. In layman's terms: besides the full match, the regex engine also remembers the text matched by each parenthesized group, and groups are numbered by their opening parenthesis from left to right.
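A small sketch that collects the captured part of every match. One caveat: in the sample input most IDs are followed directly by a comma, and a comma is not whitespace, so [^\\s]+ would capture it too. The version below therefore also excludes commas, which is my assumption about the desired result rather than something stated in the question. HashIds and extract are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; returns the text after each '#'.
final class HashIds {
    // Group 1 excludes the '#'. Commas are excluded as well (an assumption, see above).
    private static final Pattern ID = Pattern.compile("#([^\\s,]+)");

    static List<String> extract(String input) {
        List<String> ids = new ArrayList<>();
        Matcher m = ID.matcher(input);
        while (m.find()) {
            ids.add(m.group(1)); // m.group() would include the leading '#'
        }
        return ids;
    }

    public static void main(String[] args) {
        String input = "#78517700-1f01-11e3-a6b7-3c970e02b4ec , #68517700-1f01-11e3-a6b7-3c970e02b4ec, "
                + "#98517700-1f01-11e3-a6b7-3c970e02b4ec";
        extract(input).forEach(System.out::println);
    }
}
```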

Java and Regex, get a substring which matches

I want to match the following pattern:
[0-9]*-[0-9]*-[BL]
and apply the pattern to this string:
123-456-L-234
which should become
123-456-L.
Here's my code:
HelperRegex{
    ..
    final static Pattern KEY = Pattern.compile("\\d*-\\d*-[BL]");

    public static String matchKey(String key) {
        return KEY.matcher(key).toMatchResult().group(0);
    }
}
JUnit:
@Test
public final void testMatchKey() {
    Assert.assertEquals("453-04430-B", HelperRegex.matchKey("453-04430-B-1"));
}
A "No match found" exception is thrown.
I've checked my regex with "The Regex Coach" and it does not seem to be broken; it matches all the test strings.

Never mind all that complexity. You only need one line:
String match = input.replaceAll(".*?([0-9]*-[0-9]*-[BL])?.*", "$1");
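This will produce a blank string if the pattern is not found. Be aware that it also produces a blank string when the key does not start at the very beginning of the input, because the optional group can simply be skipped and the final .* then consumes everything.
If it were me, I would in-line this and not even have a separate method.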

You need to create the group you want to retrieve with () and make sure your regex matches the whole string, because matches() requires that (note that group 0 is the whole match, so what you want is group 1):
String key = "453-04430-B-1";
Pattern pattern = Pattern.compile("(\\d*-\\d*-[BL]).*");
Matcher m = pattern.matcher(key);
if (m.matches())
    System.out.println(m.group(1)); //prints 453-04430-B
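The original exception comes from asking for a group before any match was attempted: a freshly created Matcher has not matched anything yet, so toMatchResult().group(0) has nothing to return. A minimal sketch of the same method using find(), which keeps the question's pattern unchanged, could look like this. HelperRegexSketch is a made-up name, and returning null when nothing matches is my choice, not something the question specifies.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical variant of the question's helper.
final class HelperRegexSketch {
    private static final Pattern KEY = Pattern.compile("\\d*-\\d*-[BL]");

    static String matchKey(String key) {
        Matcher m = KEY.matcher(key);
        // find() looks for the pattern anywhere in the input, so no trailing .* is needed.
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(matchKey("453-04430-B-1")); // 453-04430-B
        System.out.println(matchKey("123-456-L-234")); // 123-456-L
    }
}
```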

what is wrong with this java regex?

final static private Pattern includePattern = Pattern.compile("^\\s+([^\\s]*)");
...
Matcher mtest = includePattern.matcher(" this.txt");
String ftest = mtest.group(1);
I get an exception No match found at java.util.regex.Matcher.group(Matcher.java:468)
I'm looking for at least 1 space character followed by a captured group of nonspace characters. Where have I gone wrong?

You'll first need to call .find() before you can use group(...).
Note that find() returns a boolean, so it's safe(r) to do something like this:
final static private Pattern includePattern = Pattern.compile("^\\s+([^\\s]*)");
Matcher mtest = includePattern.matcher(" this.txt");
String ftest = mtest.find() ? mtest.group(1) : null;
And [^\\s] could be rewritten as \\S (capital s).
You might have simplified your example a bit in your question, but I assume you're aware of the fact that String.trim() takes care of any leading and trailing white space characters.
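Putting those points together, a self-contained sketch might look like the following. IncludeName and parse are illustrative names, and null is returned when the line has no leading whitespace, which is an assumption about what the caller wants.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; extracts the first run of non-space characters after leading whitespace.
final class IncludeName {
    // \\S is the shorthand for [^\\s].
    private static final Pattern INCLUDE = Pattern.compile("^\\s+(\\S*)");

    static String parse(String line) {
        Matcher m = INCLUDE.matcher(line);
        // group(1) is only valid after find() (or matches()) has returned true.
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(parse(" this.txt")); // this.txt
        System.out.println(parse("this.txt"));  // null, there is no leading whitespace
    }
}
```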
