Java regular expression with Matching text and special character - java

Hi, I am new to Java regex. I have the string below:
String s = "KBC_2022-12-20-2004_IDEAL333_MASTER333_2022-12-20-1804_SUCCESS";
I want to print only the "333" that is appended to MASTER. The output should be 333.
Can someone help me write the regex for this? Basically, the code should print the value between "MASTER" and the next "_". Here it is 333, but the value may be of any length, not limited to 3 characters.

You can use this regex: (?<=MASTER)[0-9]+(?=_)
It matches the digits between MASTER and _ (the underscore does not need to be escaped):
lookbehind: the match must come right after MASTER: (?<=MASTER)
lookahead: the match must be followed by _: (?=_)
Try on regex101.com
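A minimal sketch of how the lookaround pattern could be used from Java. The class and method names (LookaroundDemo, masterId) are made up for illustration. Because the lookbehind and lookahead consume no characters, m.group() holds just the digits and no capture group is needed. If the value may contain non-digits, [^_]+ in place of [0-9]+ would be one possible variation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundDemo {
    // Digits preceded by "MASTER" and followed by "_"; neither marker is part of the match.
    private static final Pattern MASTER_ID = Pattern.compile("(?<=MASTER)[0-9]+(?=_)");

    // Returns the digits after MASTER, or null when the input contains no such run.
    static String masterId(String input) {
        Matcher m = MASTER_ID.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // prints 333
        System.out.println(masterId("KBC_2022-12-20-2004_IDEAL333_MASTER333_2022-12-20-1804_SUCCESS"));
        // prints 98765 (the length of the value does not matter)
        System.out.println(masterId("KBC_IDEAL333_MASTER98765_SUCCESS"));
    }
}
```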

You can do MASTER(\\d+)_
Pattern p = Pattern.compile("MASTER(\\d+)_");
Matcher m = p.matcher(" KBC_2022-12-20-2004_IDEAL333_MASTER333_2022-12-20-1804_SUCCESS");
if (m.find()) {
System.out.println(m.group(1)); // 333
}
m = p.matcher(" KBC_2022-12-20-2004_IDEAL333_MASTER123_2022-12-20-1804_SUCCESS");
if (m.find()) {
System.out.println(m.group(1)); // 123
}

Related

Java regular expression and "greedy" or early search for slash

Text:
123/444_ab/alphanum/alphanum/alphanum.sss
256/333_123/alphanum/alphanum.fff
777/999_abcde/alphanum.ggg
I want two groups.
first group matches: 123, 256, and 777
second group matches: 444_ab, 333_123, and 999_abcde.
The problem is that any regexp I come up with includes extra slashes in the second group, e.g. 333_123/alphanum
ex.
(\\d{3})/(\\d{3}_.+)/.+[.].+
It should give just the first two groups, each ending at the following slash.
As an aside, a requirement like this can also easily be handled by any "split by string" function. Split on '/' to obtain an array of values and go from there ...
I find that this is often much easier to read, and to debug, than "regular-expression chicken scratches," when the data has a format such as what you show here. It will also "obviously" show what should happen when the data contains 5, 4, or 3 groups as you demonstrate in your post, and it will work for any number of groups.
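The split-based approach could look roughly like this. It is only a sketch: the class name SplitDemo and the helper firstTwoFields are hypothetical, and it assumes every line has at least two '/'-separated fields.

```java
class SplitDemo {
    // Splits a path-like line on '/' and returns its first two fields.
    static String[] firstTwoFields(String line) {
        String[] parts = line.split("/");
        if (parts.length < 2) {
            throw new IllegalArgumentException("expected at least two '/'-separated fields: " + line);
        }
        return new String[] { parts[0], parts[1] };
    }

    public static void main(String[] args) {
        String[] lines = {
            "123/444_ab/alphanum/alphanum/alphanum.sss",
            "256/333_123/alphanum/alphanum.fff",
            "777/999_abcde/alphanum.ggg"
        };
        for (String line : lines) {
            String[] fields = firstTwoFields(line);
            // prints "123 444_ab", then "256 333_123", then "777 999_abcde"
            System.out.println(fields[0] + " " + fields[1]);
        }
    }
}
```

Unlike the regex answers, this does not check that the first field is three digits; that validation would have to be added separately if it matters.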
^(.*?)\/(.*?)\/.*
This regular expression should do the trick.
Converting my comment to answer so that solution is easy to find for future visitors.
You may use this regex with MULTILINE mode:
(?m)^(\\d{3})/(\\d{3}_[^/]+)
RegEx Demo
RegEx Details:
(?m): Enable inline MULTILINE mode so that ^ matches start of each line
^: Start of line
(\\d{3}): First capture group to match 3 digits
/: Match a /
(\\d{3}_[^/]+): Second capture group to match 3 digits then _ then 1 or more of any character that is not a /
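As a rough illustration of the MULTILINE pattern above, the following sketch runs it over all three sample lines joined into one string. The class name MultilineDemo and the helper prefixes are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultilineDemo {
    // (?m) lets ^ match at the start of every line, not only at the start of the input.
    private static final Pattern LINE_PREFIX = Pattern.compile("(?m)^(\\d{3})/(\\d{3}_[^/]+)");

    // Collects {group 1, group 2} for every line that starts with the expected prefix.
    static List<String[]> prefixes(String text) {
        List<String[]> result = new ArrayList<>();
        Matcher m = LINE_PREFIX.matcher(text);
        while (m.find()) {
            result.add(new String[] { m.group(1), m.group(2) });
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "123/444_ab/alphanum/alphanum/alphanum.sss\n"
                    + "256/333_123/alphanum/alphanum.fff\n"
                    + "777/999_abcde/alphanum.ggg";
        for (String[] groups : prefixes(text)) {
            // prints "123 444_ab", then "256 333_123", then "777 999_abcde"
            System.out.println(groups[0] + " " + groups[1]);
        }
    }
}
```

The negated class [^/]+ is what keeps the second group from running past the next slash, which is the problem described in the question.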
Use *? for a non-greedy match: ^(.*?)/(.*?)/.*.
.*? will match only as few characters as necessary for the whole expression to match.
import java.util.regex.*;
public class MyClass {
public static void main(String args[]) {
String a = "123/444_ab/alphanum/alphanum/alphanum.sss";
String b = "256/333_123/alphanum/alphanum.fff";
String c = "777/999_abcde/alphanum.ggg";
Pattern p = Pattern.compile("^(.*?)/(.*?)/.*");
Matcher m = p.matcher(a);
if (m.matches()) {
System.out.println("a:");
System.out.println(m.group(1));
System.out.println(m.group(2));
} else {
System.out.println("'a' doesn't match.");
}
m = p.matcher(b);
if (m.matches()) {
System.out.println("b:");
System.out.println(m.group(1));
System.out.println(m.group(2));
} else {
System.out.println("'b' doesn't match.");
}
m = p.matcher(c);
if (m.matches()) {
System.out.println("c:");
System.out.println(m.group(1));
System.out.println(m.group(2));
} else {
System.out.println("'c' doesn't match.");
}
}
}
Output:
a:
123
444_ab
b:
256
333_123
c:
777
999_abcde

Extract string between a set of multiple limiters with groups

As the title says, I have a string and I want to extract some data from it.
This is my String:
text = "|tab_PRO|1|1|#tRecordType#||0|tab_PRO|";
and I want to extract all the data between the pipes: tab_PRO, 1, 1... and so on.
I've tried:
Pattern p = Pattern.compile("\\|(.*?)\\|");
Matcher m = p.matcher(text);
while(m.find())
{
for(int i = 1; i< 10; i++) {
test = m.group(i);
System.out.println(test);
}
}
With this I get the first group, tab_PRO, but I also get an error:
java.lang.IndexOutOfBoundsException: No group 2
I probably didn't understand how groups work, but I thought that with this I could get the remaining data that I need. I can't figure out what I'm missing.
Thanks in advance
Use String.split(). Note that it expects a regex as its argument, and | is a regex metacharacter (alternation), so you need to escape it as \|. In a Java string literal the backslash itself must be escaped as well, so write \\|; otherwise \| would be read as an invalid escape sequence:
String[] parts = text.split("\\|");
See it working here:
https://ideone.com/WibjUm
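One detail worth checking when splitting this particular string: it starts and ends with a pipe and contains an empty field. The sketch below (class name PipeSplitDemo is hypothetical) shows what String.split returns in that case. The leading empty string and the empty field in the middle are kept, while trailing empty strings are dropped.

```java
import java.util.Arrays;

class PipeSplitDemo {
    // Splits on a literal pipe; "\\|" is the regex \| written as a Java string literal.
    static String[] fields(String text) {
        return text.split("\\|");
    }

    public static void main(String[] args) {
        String text = "|tab_PRO|1|1|#tRecordType#||0|tab_PRO|";
        String[] parts = fields(text);
        // prints 8
        System.out.println(parts.length);
        // prints [, tab_PRO, 1, 1, #tRecordType#, , 0, tab_PRO]
        System.out.println(Arrays.toString(parts));
    }
}
```

If the empty entries are unwanted, they can be skipped with an isEmpty() check while iterating over the array.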
If you want to go with your regex approach, you'll need to capture the run of characters after every | and restrict it to anything except |, possibly using a regex like \\|([^\\|]*).
In your loop, iterate with m.find() and use only capture group 1, because it's the only group every match will have.
String text = "|tab_PRO|1|1|#tRecordType#||0|tab_PRO|";
Pattern p = Pattern.compile("\\|([^\\|]*)");
Matcher m = p.matcher(text);
while(m.find()){
System.out.println(m.group(1));
}
https://ideone.com/RNjZRQ
Try using .split() or .substring()
As mentioned in the comments, this is easier done with String.split.
As for your own code, you are unnecessarily using the inner loop, and that's leading to that exception. You only have one group, but the for loop will cause you to query more than one group. Your loop should be as simple as:
Pattern p = Pattern.compile("(?<=\\|)(.*?)\\|");
Matcher m = p.matcher(text);
while (m.find()) {
String test = m.group(1);
System.out.println(test);
}
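And that prints the following (the empty field between #tRecordType# and 0 also produces an empty line, not shown here):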
tab_PRO
1
1
#tRecordType#
0
tab_PRO
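Note that I had to use a look-behind assertion in your regex: each match consumes its closing |, so the next match cannot consume that same | again as its opening delimiter.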

Regex for find data

I have used the regex (?:#\d{7}) to extract only the 7 digits after '#'.
For example, I have a string like "#1234567890". With the above pattern I get the 7 digits after '#'.
Now the problem is: I have a string like "Referenc number #1234567890",
where "Referenc number #" is fixed.
I am looking for a regex that returns the number 1234567 from the above string.
I have a file that contains the above string along with other data.
You can try something like this:
String ref_no = "Referenc number #123456789";
Pattern p = Pattern.compile("Referenc number #([0-9]{7})");
Matcher m = p.matcher(ref_no);
while (m.find())
{
System.out.println(m.group(1));
}
The ?: should make your group "non-capturing", so if you add that separately around the hash sign, it should be used for matching but excluded from capture.
(?:#)(\d{7})
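A small sketch of that pattern in Java, to show which parts end up where. The class name NonCapturingDemo and the helper firstSevenDigits are made up for this example. The non-capturing group takes part in the match but gets no group number, so the digits are group 1.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NonCapturingDemo {
    // (?:#) must match, but only (\d{7}) is numbered as a capture group.
    private static final Pattern REF = Pattern.compile("(?:#)(\\d{7})");

    // Returns the first seven digits after '#', or null if there are none.
    static String firstSevenDigits(String input) {
        Matcher m = REF.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        Matcher m = REF.matcher("Referenc number #1234567890");
        if (m.find()) {
            System.out.println(m.group());      // whole match: #1234567
            System.out.println(m.group(1));     // captured digits: 1234567
            System.out.println(m.groupCount()); // 1
        }
    }
}
```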
If the String always starts with Referenc number # you could just use the following code:
String text = "Referenc number #1234567890";
Pattern pattern = Pattern.compile("\\d{7}");
Matcher matcher = pattern.matcher(text);
while(matcher.find()){
System.out.println(matcher.group());
}

Regular expression for a string starting with some string

I have a string of this form: (notice)Any_other_string (note that the parentheses are part of the string).
I want to split this string into 2 parts: (notice) and the rest. I do it as follows:
private static final Pattern p1 = Pattern.compile("(^\\(notice\\))([a-z_A-Z1-9])+");
String content = "(notice)Stack Over_Flow 123";
Matcher m = p1.matcher(content);
System.out.println("Printing");
if (m.find()) {
System.out.println(m.group(0));
System.out.println(m.group(1));
}
I hope the result will be (notice) and Stack Over_Flow 123, but instead, the result is : (notice)Stack and (notice)
I cannot explain this result. Which regex is suitable for my purpose?
Issue 1: group(0) will always return the entire match - this is specified in the javadoc - and the actual capturing groups start from index 1. Simply replace it with the following:
System.out.println(m.group(1));
System.out.println(m.group(2));
Issue 2: Your character class does not account for spaces or other characters (not even the digit 0). I suggest using the dot, ., for matching unknown characters, or adding \\s (whitespace) and 0 to the character class. Either of the following regexes should work:
(^\\(notice\\))(.+)
(^\\(notice\\))([A-Za-z0-9_\\s]+)
Note that you need the + inside the capturing group, or it will only find the last character of the second part.
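To make the point about the position of + concrete, here is a short sketch comparing the question's pattern with the first suggested one. The class name GroupQuantifierDemo and the helper secondGroup are hypothetical. When a quantifier is applied to a capturing group, the group keeps only the text of its last repetition.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupQuantifierDemo {
    // Returns capture group 2 of the first match, or null if the regex does not match.
    static String secondGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String content = "(notice)Stack Over_Flow 123";
        // + outside the group: the group is repeated and keeps only its last repetition.
        System.out.println(secondGroup("(^\\(notice\\))([a-z_A-Z1-9])+", content)); // k
        // + inside the group: the group captures the whole run.
        System.out.println(secondGroup("(^\\(notice\\))(.+)", content)); // Stack Over_Flow 123
    }
}
```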

RegEx - problem with multiline input

I have a String with multiline content and want to select a multiline region, preferably using a regular expression (just because I'm trying to understand Java RegEx at the moment).
Consider the input like:
Line 1
abc START def
Line 2
Line 3
gh END jklm
Line 4
Assuming START and END are unique and the start/end markers for the region, I'd like to create a pattern/matcher to get the result:
def
Line 2
Line 3
gh
My current attempt is
Pattern p = Pattern.compile("START(.*)END");
Matcher m = p.matcher(input);
if (m.find())
System.out.println(m.group(1));
But the result is
gh
So m.start() seems to point at the beginning of the line that contains the 'end marker'. I tried to add Pattern.MULTILINE to the compile call but that (alone) didn't change anything.
Where is my mistake?
You want Pattern.DOTALL, so . matches newline characters. MULTILINE addresses a different issue, the ^ and $ anchors.
Pattern p = Pattern.compile("START(.*)END", Pattern.DOTALL);
You want to set Pattern.DOTALL (so you can match end of line characters with your . wildcard), see this test:
@Test
public void testMultilineRegex() throws Exception {
final String input = "Line 1\nabc START def\nLine 2\nLine 3\ngh END jklm\nLine 4";
final String expected = " def\nLine 2\nLine 3\ngh ";
final Pattern p = Pattern.compile("START(.*)END", Pattern.DOTALL);
final Matcher m = p.matcher(input);
if (m.find()) {
Assert.assertEquals(expected, m.group(1));
} else {
Assert.fail("pattern not found");
}
}
By default, the regex metachar . does not match a newline. You can try the regex:
START([\w\W]*)END
which uses [\w\W] in place of ..
[\w\W] is a char class to match a word-char and a non-word-char, so effectively matches everything.
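The variants suggested in these answers can be compared side by side. The sketch below is illustrative only: the class name DotAllDemo and the helper between are made up, and the inline flag (?s) is included as the in-pattern equivalent of Pattern.DOTALL.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DotAllDemo {
    // Returns capture group 1 of the first match, or null if there is no match.
    static String between(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "Line 1\nabc START def\nLine 2\nLine 3\ngh END jklm\nLine 4";
        // Flag passed to compile(): . also matches line terminators.
        String viaFlag = between("START(.*)END", Pattern.DOTALL, input);
        // Same flag written inside the pattern.
        String viaInlineFlag = between("(?s)START(.*)END", 0, input);
        // No flag needed: [\w\W] matches any character, including newlines.
        String viaCharClass = between("START([\\w\\W]*)END", 0, input);

        System.out.println(viaFlag);
        System.out.println(viaFlag.equals(viaInlineFlag) && viaFlag.equals(viaCharClass)); // true
    }
}
```

Since START and END are stated to be unique, the greedy .* is fine here; with repeated markers a reluctant .*? would be the usual way to stop at the first END.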
