Java Regex decoding treating multiple delimiters as same not working - java

Thank you for your help.
I am trying to write a regex to decode a string that uses either a comma or a semicolon as the delimiter, but I can't get it to work for commas or for a mix of both. Please tell me what I'm missing or doing wrong. Thanks!
^(?<FADECID>\d{6})?(?<MSG>([a-z A-Z 0-9 ()-:]*[;,]{1}+){8,}+)?(?<ANCH>\w*[;,])?(?<TIME>\d{4})?(?<FM>\d{2})?[;,]?(?<CON>.*)$.*
Inbound strings to decode (I need to treat the comma and the semicolon the same):
383154VSC X1;;;;;;;BOTH WASTE DRAIN VLV NOT CLSD (135MG;35MG);HARD;093502
282151FCMC1 1;;;;;;;FUEL MAIN PUMP1 (121QA1);HARD;093502
732112EEC2B 1;;;;;;;FMU(E2-4071KS)WRG:EEC J12 TO FMV LVDT POS,HARD;
383154VSC X1,,,,,,,BOTH WASTE DRAIN VLV NOT CLSD (135MG,35MG),HARD,093502
282151FCMC1 1,,,,,,,FUEL MAIN PUMP1 (121QA1);HARD;093502
732112EEC2B 1,,,,,,,FMU(E2-4071KS)WRG:EEC J12 TO FMV LVDT POS,HARD,
383154VSC X1,,,,,,,BOTH WASTE DRAIN VLV NOT CLSD (135MG;35MG);HARD;093502
282151FCMC1 1;;;;;;;FUEL MAIN PUMP1 (121QA1),HARD,093502
732112EEC2B 1,,,,,,,FMU(E2-4071KS)WRG:EEC J12 TO FMV LVDT POS;HARD;
This string can contain multiple text messages separated by [;,].
ABC;DEF;;HIJ;NNN;JJJ;XXX;EEX;HARD;
This part handles that: (?<MSG>([a-z A-Z 0-9 ()-:]*[;,]{1}+){8,}+)?
but it doesn't seem to honor commas.
It works for ; but not for commas or a mix of both, and my input can contain either a semicolon or a comma.
If I change the regex to use only a comma, it works for comma-delimited strings. I know I'm missing a quantifier or something like that.
    if ( null != MORE && ! MORE.isEmpty() ) {
        while ( null != MORE && ! MORE.isEmpty() || MORE.trim().equals("EOR")) {
            LOG.info("MORE CONTINUE: " + MORE);
            if ( MORE.trim().equals("EOR") ) {
                break;
            }
            String patternMoreString = "^(?<FADECID>\\d{6})?(?<MSG>([a-z A-Z 0-9 ()-:()]*[;,]{1}+){8,}+)+?(?<ANCH>\\w*[;,])?(?<TIME>\\d{4})?(?<FM>\\d{2})?[;,]?(?<CON>.*)$.*";
            Pattern patternMore = Pattern.compile(patternMoreString, Pattern.DOTALL);
            Matcher matcherMore = patternMore.matcher(MORE);
            while ( matcherMore.find() ) {
                MORE = matcherMore.group("CON");
                summary.setReportId("FLR");
                summary.setAreg(Areg);
                summary.setShip(Ship);
                summary.setOrig(Orig);
                summary.setDest(Dest);
                summary.setTimestamp(Ts);
                summary.setAta(matcherMore.group("FADECID"));
                summary.setTime(matcherMore.group("TIME"));
                summary.setFm(matcherMore.group("FM"));
                summary.setMsg(matcherMore.group("MSG"));
                serviceRecords.add(summary);
                LOG.info("*** A330 MPF MORE Record ***");
                LOG.info(summary.getReportId());
                LOG.info(summary.getAreg());
                LOG.info(summary.getShip());
                LOG.info(summary.getOrig());
                LOG.info(summary.getDest());
                LOG.info(summary.getTimestamp());
                LOG.info(summary.getAta());
                LOG.info(summary.getTime());
                LOG.info(summary.getFm());
                LOG.info(summary.getMsg());
                summary = new A330PostFlightReportRecord();
            }
        }
    }
}
//---
In all cases I need group 2 (MSG), plus TIME and FM if they exist.

You could make use of a capturing group and a backreference to that group's number to get consistent delimiters.
In this case the capturing group is ([;,]), which is the fourth group (referenced as \4) and matches either ; or ,
If you only need group 2, plus TIME and FM when they are present, you can omit the ANCH group:
^(?<FADECID>\d{6})(?<MSG>([a-zA-Z0-9() -]*([;,])){7,})(?<TIME>\d{4})?(?<FM>\d{2})?\4?(?<CON>.*)$
Explanation
^ Start of string
(?<FADECID>\d{6}) Named group FADECID, match 6 digits
(?<MSG> Named group MSG
( Capture group 3
[a-zA-Z0-9() -]* Match 0+ times any of the listed characters
([;,]) Capture group 4, used as backreference to get consistent delimiters
){7,} Close group and repeat 7+ times
) Close group MSG
(?<TIME>\d{4})? Optional named group TIME, match 4 digits
(?<FM>\d{2})? Optional named group FM, match 2 digits
\4? Optional backreference to capture group 4
(?<CON>.*) Named group CON Match any char except a newline 0+ times
$ End of string
Note that capture group 3 is itself repeated, so it only holds the value of the last iteration, which will be HARD together with its trailing delimiter.
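To try the pattern above from Java, here is a minimal sketch. The class name FlrLineDemo and the helper parse() are made up for illustration and are not part of the original code; the sample lines are taken from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FlrLineDemo {
    // Pattern from the answer above; backslashes are doubled for the Java string literal.
    static final Pattern FLR = Pattern.compile(
        "^(?<FADECID>\\d{6})(?<MSG>([a-zA-Z0-9() -]*([;,])){7,})"
        + "(?<TIME>\\d{4})?(?<FM>\\d{2})?\\4?(?<CON>.*)$");

    // Hypothetical helper: returns {FADECID, MSG, TIME, FM}.
    // TIME and FM are null when those optional groups did not participate.
    // Returns null when the line does not match at all.
    static String[] parse(String line) {
        Matcher m = FLR.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] {
            m.group("FADECID"), m.group("MSG"), m.group("TIME"), m.group("FM")
        };
    }

    public static void main(String[] args) {
        String[] samples = {
            // semicolons only, commas only, and a mix of both
            "383154VSC X1;;;;;;;BOTH WASTE DRAIN VLV NOT CLSD (135MG;35MG);HARD;093502",
            "383154VSC X1,,,,,,,BOTH WASTE DRAIN VLV NOT CLSD (135MG,35MG),HARD,093502",
            "282151FCMC1 1,,,,,,,FUEL MAIN PUMP1 (121QA1);HARD;093502"
        };
        for (String line : samples) {
            String[] f = parse(line);
            System.out.println(f == null ? "no match: " + line : String.join(" | ", f));
        }
    }
}
```

Each repetition inside MSG picks ; or , independently, so the mixed line is accepted as well. One thing to watch: the character class in this pattern has no colon, so for the FMU(E2-4071KS)WRG:EEC sample MSG stops after the seven leading delimiters and the rest of the line ends up in CON.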

Related

Use regex to get 2 specific groups of substring

String s = #Section250342,Main,First/HS/12345/Jack/M,2000 10.00,
#Section250322,Main,First/HS/12345/Aaron/N,2000 17.00,
#Section250399,Main,First/HS/12345/Jimmy/N,2000 12.00,
#Section251234,Main,First/HS/12345/Jack/M,2000 11.00
Wherever /Jack/M appears in the string, I want to pull out the section numbers (250342, 251234) and the values (10.00, 11.00) associated with it using regex.
I tried something like this https://regex101.com/r/4te0Lg/1 but it still doesn't work.
.Section(\d+(?:\.\d+)?).*/Jack/M
If the only parts of each section that change are the section number, the name of the person and the last value (like in your example) then you can make a pattern very easily by using one of the sections where Jack appears and replacing the numbers you want by capturing groups.
Example:
#Section250342,Main,First/HS/12345/Jack/M,2000 10.00
becomes,
#Section(\d+),Main,First/HS/12345/Jack/M,2000 (\d+\.\d{2})
If the section substring keeps the format but the other parts of it may change then just replace the rest like this:
#Section(\d+),\w+,(?:\w+/)*Jack/M,\d+ (\d+\.\d{2})
I'm assuming that "Main" is a class, "First/HS/..." is a path and that the last value always has 2 and only 2 decimal places.
\d - A digit: [0-9]
\w - A word character: [a-zA-Z_0-9]
+ - one or more times
* - zero or more times
{2} - exactly 2 times
() - a capturing group
(?:) - a non-capturing group
For reference see: https://docs.oracle.com/en/java/javase/18/docs/api/java.base/java/util/regex/Pattern.html
Simple Java example on how to get the values from the capturing groups using java.util.regex.Pattern and java.util.regex.Matcher
import java.util.regex.*;

public class GetMatch {
    public static void main(String[] args) {
        String s = "#Section250342,Main,First/HS/12345/Jack/M,2000 10.00,#Section250322,Main,First/HS/12345/Aaron/N,2000 17.00,#Section250399,Main,First/HS/12345/Jimmy/N,2000 12.00,#Section251234,Main,First/HS/12345/Jack/M,2000 11.00";
        // The dot before the decimals is escaped so it only matches a literal '.'
        Pattern p = Pattern.compile("#Section(\\d+),\\w+,(?:\\w+/)*Jack/M,\\d+ (\\d+\\.\\d{2})");
        Matcher m;
        String[] tokens = s.split(",(?=#)"); // split the sections into different strings
        for (String t : tokens) { // check every string that we got from the split
            m = p.matcher(t);
            if (m.matches()) { // if the string matches the pattern, print the capturing groups
                System.out.printf("Section: %s, Value: %s\n", m.group(1), m.group(2));
            }
        }
    }
}
You could use 2 capture groups, and use a tempered greedy token approach to not cross #Section followed by a digit.
#Section(\d+)(?:(?!#Section\d).)*\bJack/M,\d+\h+(\d+(?:\.\d+)?)\b
Explanation
#Section(\d+) Match #Section and capture 1+ digits in group 1
(?:(?!#Section\d).)* Match any character if not directly followed by #Section and a digit
\bJack/M, Match the word Jack and /M,
\d+\h+ Match 1+ digits and 1+ horizontal whitespace characters
(\d+(?:\.\d+)?) Capture group 2, match 1+ digits and an optional decimal part
\b A word boundary
In Java:
String regex = "#Section(\\d+)(?:(?!#Section\\d).)*\\bJack/M,\\d+\\h+(\\d+(?:\\.\\d+)?)\\b";
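A minimal sketch of how that regex might be used with Matcher.find() to collect every Jack/M section in one pass. The class name JackSections and the helper extract() are illustrative names, not from the original post; the input is the single-line form of the sample data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class JackSections {
    static final Pattern P = Pattern.compile(
        "#Section(\\d+)(?:(?!#Section\\d).)*\\bJack/M,\\d+\\h+(\\d+(?:\\.\\d+)?)\\b");

    // Hypothetical helper: collects {sectionNumber, value} for each section containing Jack/M.
    static List<String[]> extract(String input) {
        List<String[]> out = new ArrayList<>();
        Matcher m = P.matcher(input);
        while (m.find()) { // each find() resumes after the previous match
            out.add(new String[] { m.group(1), m.group(2) });
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "#Section250342,Main,First/HS/12345/Jack/M,2000 10.00,"
            + "#Section250322,Main,First/HS/12345/Aaron/N,2000 17.00,"
            + "#Section250399,Main,First/HS/12345/Jimmy/N,2000 12.00,"
            + "#Section251234,Main,First/HS/12345/Jack/M,2000 11.00";
        for (String[] r : extract(s)) {
            System.out.printf("Section: %s, Value: %s%n", r[0], r[1]);
        }
    }
}
```

Unlike the split-based approach, this does not need the input to be cut into sections first, because the tempered token stops the match from running into the next #Section.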

Regex matching in OData Filter query

I have to match patterns from a main string using regex in java 8
This is the pattern I have so far.
Email.*?(:parameter[0-9]+[^,])
It works on line 1 and line 2 below but fails on line 3 by matching just this Email IN (:parameter10
Note: I am fine with the closing bracket at the end being matched or not, I can work either way
// should match "Email = :parameter1)"
String line1 = "(Email = :parameter1)";
// should match "Email IN (:parameter1,:parameter2)"
String line2 = "(Email IN (:parameter1,:parameter2) AND (FirstName = :parameter3))";
// should match "Email IN (:parameter10,:parameter11)"
String line3 = "(Email IN (:parameter10,:parameter11) AND (FirstName = :parameter13))";
Thanks in advance
You can use
Email.*?(:parameter[0-9]+)(?![0-9,])\)?
See the regex demo. Details:
Email - a fixed string
.*? - any zero or more chars other than line break chars as few as possible
(:parameter[0-9]+) - Group 1: a : char, then parameter word and then one or more digits
(?![0-9,]) - a negative lookahead that fails the match if there is a digit or a comma immediately to the right of the current location
\)? - an optional ) char.
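As a quick check, the pattern above can be run against the three lines from the question. This is only a sketch; the class name EmailClause and the helper firstMatch() are invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmailClause {
    static final Pattern P =
        Pattern.compile("Email.*?(:parameter[0-9]+)(?![0-9,])\\)?");

    // Hypothetical helper: returns the first match in the line, or null if there is none.
    static String firstMatch(String line) {
        Matcher m = P.matcher(line);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String[] lines = {
            "(Email = :parameter1)",
            "(Email IN (:parameter1,:parameter2) AND (FirstName = :parameter3))",
            "(Email IN (:parameter10,:parameter11) AND (FirstName = :parameter13))"
        };
        for (String line : lines) {
            System.out.println(firstMatch(line));
        }
    }
}
```

The lookahead is what fixes line 3: after :parameter1 the next character is 0, so the engine cannot stop in the middle of :parameter10, and after :parameter10 the next character is a comma, so it keeps scanning to :parameter11.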
Based on your input, technically this is sufficient:
Email[^)]*\)
It takes everything from Email up to the first ) inclusive.
If you want more validation on the parameterX then this is more specific
Email.*?((:parameter\d+,?)+)\)
It matches Email, then anything up to the first parameter, then any further comma-separated parameters, and ends at the ).

Java regex Matcher.find() confusion

I'm an experienced coder but a regex novice running Oracle's JDK 1.8 on Windows 10.
My code:
private static void regex1() {
    Console con = System.console();
    String txt;
    Pattern pat =
        Pattern.compile(con.readLine("Input a regular expression: "));
    while (true) {
        txt = con.readLine("\nInput a string: ");
        if (txt.isEmpty()) {
            break;
        }
        Matcher mch = pat.matcher(txt);
        if (mch.find()) {
            con.printf("That string matches\n");
            for (int grp = 0; grp <= mch.groupCount(); grp++) {
                con.printf(" Group %d matched %s\n",
                    grp, mch.group(grp));
            }
        }
        else {
            con.printf("That string does not match\n");
        }
    }
}
A sample run:
Input a regular expression: ([a-zA-Z]*), ([a-zA-Z]*)
Pattern: '([a-zA-Z]*), ([a-zA-Z]*)'
Input a string: Doe, John
String: 'Doe, John'
That string matches
2 groups
Group 0 matched 'Doe, John'
Group 1 matched 'Doe'
Group 2 matched 'John'
Input a string: Bond, 007
String: 'Bond, 007'
That string matches
2 groups
Group 0 matched 'Bond, '
Group 1 matched 'Bond'
Group 2 matched ''
Input a string: once again, stuff
String: 'once again, stuff'
That string matches
2 groups
Group 0 matched 'again, stuff'
Group 1 matched 'again'
Group 2 matched 'stuff'
Input a string:
The first and third sets seem fine, but the "Bond, 007" response has me stumped.
The expression is a group of one or more alphas followed by a comma and a space followed by another group of one or more alphas.
The find() method seems to be returning true when it stumbles on the "007" and the group that it claims to have matched is a null string.
Am I missing something obvious here or just losing my mind?
TIA
According to the documentation of the find() method, it:
Attempts to find the next subsequence of the input sequence that matches the pattern.
In the case where you input Bond, 007, your regex will match:
Capture group 0 (the whole match): 'Bond, ' (including the trailing space)
Capture group 1 (the first part between ()'s, ([a-zA-Z]*)): Bond
Capture group 2 (the second part between ()'s, ([a-zA-Z]*)): Empty string
I suspect that your confusion comes either from find() not matching the entire input (if you want that, use matches() instead), or from * being able to match zero occurrences of the part it applies to (as opposed to +, which must match at least once).
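Both points can be seen side by side in a small sketch. The class name FindVsMatches and its two helpers are illustrative only; the + variant of the pattern is an assumption about what the asker intended by "one or more alphas".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindVsMatches {
    // The asker's pattern: * allows each group to be empty.
    static final Pattern STAR = Pattern.compile("([a-zA-Z]*), ([a-zA-Z]*)");
    // Assumed intent: + requires at least one letter in each group.
    static final Pattern PLUS = Pattern.compile("([a-zA-Z]+), ([a-zA-Z]+)");

    // True if the pattern matches somewhere inside the text.
    static boolean finds(Pattern p, String text) {
        return p.matcher(text).find();
    }

    // True only if the pattern matches the whole text.
    static boolean matchesWhole(Pattern p, String text) {
        return p.matcher(text).matches();
    }

    public static void main(String[] args) {
        String txt = "Bond, 007";
        Matcher m = STAR.matcher(txt);
        if (m.find()) {
            // Group 2 is empty because * is satisfied by zero letters.
            System.out.printf("find(): '%s' / '%s' / '%s'%n", m.group(0), m.group(1), m.group(2));
        }
        System.out.println("matches() with *: " + matchesWhole(STAR, txt));
        System.out.println("find() with +:    " + finds(PLUS, txt));
    }
}
```

With the * pattern, find() succeeds on the prefix 'Bond, ' while matches() rejects the string because 007 is left over; with the + pattern, find() rejects it as well.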

Regular expression for phone number starting with '00' or '+'

I've got a regex problem: I'm trying to force a phone number beginning with either "00" or "+" but my attempt doesn't work.
String PHONE_PATTERN = "^[(00)|(+)]{1}[0-9\\s.\\/-]{6,20}$";
It still allows for example "0123-45678". What am I doing wrong?
Inside a character class every character is matched literally, which means [(00)|(+)] will match a single 0, +, |, ( or ).
Use this regex:
String PHONE_PATTERN = "^(?:00|\\+)[0-9\\s.\\/-]{6,20}$";
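A small sketch comparing the original pattern with the corrected one. The class name PhonePatternDemo, the helper isValid(), and the sample numbers other than "0123-45678" are made up for illustration.

```java
import java.util.regex.Pattern;

class PhonePatternDemo {
    // Original attempt: the character class matches ONE of 0 + | ( )
    static final Pattern BROKEN = Pattern.compile("^[(00)|(+)]{1}[0-9\\s.\\/-]{6,20}$");
    // Corrected: a non-capturing group with an alternation of "00" or a literal "+"
    static final Pattern FIXED = Pattern.compile("^(?:00|\\+)[0-9\\s.\\/-]{6,20}$");

    // Hypothetical helper: validates a number against the corrected pattern.
    static boolean isValid(String number) {
        return FIXED.matcher(number).matches();
    }

    public static void main(String[] args) {
        String[] numbers = { "0123-45678", "+49 123 456789", "0049-123/4567", "0012345" };
        for (String n : numbers) {
            System.out.printf("%-16s broken=%b fixed=%b%n",
                n, BROKEN.matcher(n).matches(), isValid(n));
        }
    }
}
```

The first number shows the difference: the broken pattern accepts it because a single leading 0 is in the character class, while the corrected pattern requires the full "00" or a "+".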
If you have removed spaces, hyphens, and other separators from the number and want to accept either +xxnnnnnnnn or 00xxnnnnnnnn (where xx is the country code and n is the 9-digit number), or 0nnnnnnnnn (a non-international number: a zero followed by 9 digits), then try this regex:
String PHONE_PATTERN = "^(?:(?:00|\\+)\\d{2}|0)[1-9](?:\\d{8})$";

How to group in regex

I have this input string(oid) : 1.2.3.4.5.66.77.88.99.10.52
I want to group the numbers in threes, like this:
Group 1 : 1.2.3
Group 2 : 4.5.66
Group 3 : 77.88.99
Group 4 : 10.52
It should adapt to the input: if it has 30 numbers, it should return 10 groups.
I have tested using this regex : (\d+.\d+.\d+)
But the result is this
Match 1: 1.2.3
Subgroups:
1: 1.2.3
Match 2: 4.5.66
Subgroups:
1: 4.5.66
Match 3: 77.88.99
Subgroups:
1: 77.88.99
However, this still misses the last match (10.52).
Can anyone help me with the regex? Thank you.
\d+(?:\.\d+){0,2}
This is basically the same as Al's final regex - ((?:\d+\.){0,2}\d+) - but I think it's clearer this way. And there's no need to put parentheses around the whole regex. Assuming you're using Matcher.find() to get the matches, you can use group() or group(0) instead of group(1) to retrieve the matched text.
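A minimal sketch of that approach, looping with Matcher.find() and reading each match with group(). The class name OidGroups and the helper groupsOfThree() are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OidGroups {
    // A number followed by up to two more ".number" parts.
    static final Pattern TRIPLE = Pattern.compile("\\d+(?:\\.\\d+){0,2}");

    // Hypothetical helper: splits a dotted OID into chunks of at most three numbers.
    static List<String> groupsOfThree(String oid) {
        List<String> out = new ArrayList<>();
        Matcher m = TRIPLE.matcher(oid);
        while (m.find()) {
            out.add(m.group()); // group() is the whole match, same as group(0)
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(groupsOfThree("1.2.3.4.5.66.77.88.99.10.52"));
        // prints [1.2.3, 4.5.66, 77.88.99, 10.52]
    }
}
```

The dot between two matches is skipped automatically, because a match has to start with a digit.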
If you want to match up to three numbers, you should try:
((?:\d+\.?){1,3})
The {1,3} part matches 1-3 of the preceding item (which is one or more digits followed by an optional literal .). Note that the dot is escaped so that it doesn't match any character.
Edit
Further explanation: The (?: ) part is a grouping that cannot be used for backreferences (tends to be faster), see section 4.3 here for more information. You could, of course, also just use ((\d+\.?){1,3}) if you prefer. For more information on {1,3}, see here under "Limiting Repetition".
Edit (2)
Fixed error pointed out by dtmunir. An alternative way that is a bit more explicit (and doesn't catch the extra "." at the end of the early groups) is:
((?:\d+\.){0,2}\d+)
Al, that will not capture the 52. But this one in fact will:
((?:\d+\.?){1,3})
The only change is adding the question mark after the .
This allows it to accept the last number without having a period after it
Explanation (EDIT):
The \d+ as you can imagine captures consecutive digits.
The \. captures a period
The \.? captures a period, but allows the inner group to not require a period at the end
The (?:\d+\.?) defines "one group" which in your case you want to be 3 numbers.
The {1,3} sets the limits. It requires a minimum of 1 inner group and at most 3 inner groups. These groups may or may not end with a period.
This is my weird code to do this without regex :-)
public static String[] getTokens(String s) {
    String[] splitted = s.split("\\.");
    //Personally I hate Double.valueOf but I don't know how to avoid it
    String[] result = new String[Double.valueOf(Math.ceil(Double.valueOf(splitted.length) / 3)).intValue()];
    for (int i = 0, j = 0; j < splitted.length; i++, j+=3) {
        //Weird concat
        result[i] = splitted[j] + ( j+1 < splitted.length ? "." + splitted[j+1] : "" ) + ( j+2 < splitted.length ? "." + splitted[j+2] : "" );
    }
    return result;
}
