Regex matching in OData Filter query - java

I need to extract substrings from a larger string using a regex in Java 8.
This is the pattern I have so far:
Email.*?(:parameter[0-9]+[^,])
It works for line1 and line2 below, but for line3 it matches only this: Email IN (:parameter10
Note: I am fine with the closing bracket at the end being matched or not; I can work with either.
// should match "Email = :parameter1)"
String line1 = "(Email = :parameter1)";
// should match "Email IN (:parameter1,:parameter2)"
String line2 = "(Email IN (:parameter1,:parameter2) AND (FirstName = :parameter3))";
// should match "Email IN (:parameter10,:parameter11)"
String line3 = "(Email IN (:parameter10,:parameter11) AND (FirstName = :parameter13))";
Thanks in advance

You can use
Email.*?(:parameter[0-9]+)(?![0-9,])\)?
See the regex demo. Details:
Email - a fixed string
.*? - any zero or more chars other than line break chars as few as possible
(:parameter[0-9]+) - Group 1: a : char, then parameter word and then one or more digits
(?![0-9,]) - a negative lookahead that fails the match if there is a digit or a comma immediately to the right of the current location
\)? - an optional ) char.
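As a quick check, the pattern above can be run against the three lines from the question. This is only a minimal sketch: the class name EmailFilterDemo and the helper firstMatch are made up for illustration and are not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmailFilterDemo {
    // Pattern from the answer; the trailing \)? also consumes an optional closing bracket.
    static final Pattern EMAIL =
            Pattern.compile("Email.*?(:parameter[0-9]+)(?![0-9,])\\)?");

    // Hypothetical helper: returns the first match in the line, or null if there is none.
    static String firstMatch(String line) {
        Matcher m = EMAIL.matcher(line);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String line1 = "(Email = :parameter1)";
        String line2 = "(Email IN (:parameter1,:parameter2) AND (FirstName = :parameter3))";
        String line3 = "(Email IN (:parameter10,:parameter11) AND (FirstName = :parameter13))";
        System.out.println(firstMatch(line1)); // Email = :parameter1)
        System.out.println(firstMatch(line2)); // Email IN (:parameter1,:parameter2)
        System.out.println(firstMatch(line3)); // Email IN (:parameter10,:parameter11)
    }
}
```

The lookahead is what fixes line3: after :parameter10 the next character is a comma, and after :parameter1 it is a digit, so both attempts are rejected and the lazy .*? keeps expanding until :parameter11.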

Based on your input, technically this is sufficient:
Email[^)]*\)
It takes everything from Email up to the first ) inclusive.
If you want more validation on the parameterX part, then this is more specific:
Email.*?((:parameter\d+,?)+)\)
It takes Email, then anything up to the first parameter, then any further comma-separated parameters, and again ends at the ).
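Both patterns from this answer can be compared side by side. The sketch below assumes nothing beyond java.util.regex; the class name and the helper firstMatch are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmailUpToBracketDemo {
    // Everything from Email up to the first closing bracket.
    static final Pattern SIMPLE = Pattern.compile("Email[^)]*\\)");
    // Stricter variant: requires one or more :parameterN items before the bracket.
    static final Pattern STRICT = Pattern.compile("Email.*?((:parameter\\d+,?)+)\\)");

    // Hypothetical helper: first match of the given pattern, or null.
    static String firstMatch(Pattern p, String line) {
        Matcher m = p.matcher(line);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String line3 = "(Email IN (:parameter10,:parameter11) AND (FirstName = :parameter13))";
        System.out.println(firstMatch(SIMPLE, line3)); // Email IN (:parameter10,:parameter11)
        System.out.println(firstMatch(STRICT, line3)); // Email IN (:parameter10,:parameter11)
    }
}
```

The simple variant relies on the parameter list never containing a ) itself, which holds for the sample input.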


Pattern Matching to find trailing spaces outside of text fields in a line

I have to validate the lines from a text file. The line would be something like below.
"Field1" "Field2" "Field3 Field_3.1 Field3.2" 23 3445 "Field5".
The delimiter here is a single space (\s). If more than one space is present outside the text fields, the line should be rejected. For example:
Note: the lines contain literal spaces, not \s. I write the space as \s here only for easy reading.
Invalid:
"Field1"\\s\\s"Field2" "Field3 Field_3.1 Field3.2" 23\\s\\s3445 "Field5". //two or more spaces between "Field1" and "Field2" or numeric fields 23 3445. \s would be present as literal space and not as \s
Valid
"Field1\\s\\s" "\\s\\sField2" "Field3\\s\\sField_3.1\\s\\sField3.2" 23 3445 "Field5". //two or more spaces within third field "Field3 Field_3.1 Field3.2" or at the end/beginning of any field as in first two fields.
I created the pattern below to validate the spaces in between. But it does not work as expected when a field wrapped in double quotes contains more than two strings and a number, like "Field3 Field_3.1 123".
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SpaceValidation
{
    public static void main(String[] args)
    {
        String spacePattern_1 = "[\"^\\n]\\s{2,}?(\".*\")|\\s\\s\\d|\\d\\s\\s";
        String line1 = "Field3 Field_3.1 "; // valid, and the pattern does not flag it - works as expected
        String line2 = "Field3 Field_3.1 123"; // valid, but the pattern flags it as invalid - not working as expected
        Pattern pattern = Pattern.compile(spacePattern_1);
        Matcher matLine1 = pattern.matcher(line1);
        Matcher matLine2 = pattern.matcher(line2);
        if (matLine1.find())
        {
            System.out.println("Invalid Line1");
        }
        if (matLine2.find())
        {
            System.out.println("Invalid Line2");
        }
    }
}
I have tried another pattern, given below. But because of the backtracking issues reported, I have to avoid it. Even this one does not work when a line contains more than two subfields separated by two or more spaces.
(\".*\")\\s{2,}?(\".*\")|\\s\\s\\d|\\d\\s\\s
// * or . shouldn't be present more than once in the same condition to prevent backtracking, hence I have to use negation of \\n in the above code
Kindly let me know how I could resolve this using pattern for fields such as "field3 field3.1 123", which is a valid field. Thanks in advance.
EDIT:
After a little bit of tinkering, I narrowed the issue down to digits. The line becomes invalid only if the third subfield is numeric ("Field 3 Field3.1 123"). For letters it works fine.
In the pattern, \\s\\s\\d seems to be the culprit: that condition flags the numeric third subfield (123) as invalid. But I need it to validate numeric fields that sit outside the double quotes.
You can use
^(?:\"[^\"]*\"|\d+)(?:\s(?:\"[^\"]*\"|\d+))*$
If you are using it to extract lines from a multiline document:
(?m)^(?:\"[^\"\n\r]*\"|\d+)(?:\h(?:\"[^\"\n\r]*\"|\d+))*\r?$
See the regex demo.
Details:
^ - start of a string (line, if you use (?m) or Pattern.MULTILINE)
(?:\"[^\"]*\"|\d+) - either " + zero or more chars other than " + ", or one or more digits
(?:\s(?:\"[^\"]*\"|\d+))* - zero or more sequences of
\s - a single whitespace
(?:\"[^\"]*\"|\d+) - either " + zero or more chars other than " + ", or one or more digits
$ - end of string
The second pattern contains \h instead of \s to only match horizontal whitespaces, [^\"\n\r] matches any char other than ", line feed and carriage return.
In Java:
String pattern = "^(?:\"[^\"]*\"|\\d+)(?:\\s(?:\"[^\"]*\"|\\d+))*$";
String pattern = "(?m)^(?:\"[^\"\n\r]*\"|\\d+)(?:\\h(?:\"[^\"\n\r]*\"|\\d+))*\r?$";
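To see how the first pattern behaves, it can be applied to one line with extra spaces inside the quoted fields and to lines with extra spaces between fields. This is a minimal sketch; the class name FieldSpacingDemo, the helper isValid and the sample lines are assumptions modelled on the question.

```java
import java.util.regex.Pattern;

class FieldSpacingDemo {
    // Single-line pattern from the answer: quoted fields or numbers, separated by exactly one whitespace.
    static final Pattern LINE =
            Pattern.compile("^(?:\"[^\"]*\"|\\d+)(?:\\s(?:\"[^\"]*\"|\\d+))*$");

    // Hypothetical helper: true when the whole line follows the field layout.
    static boolean isValid(String line) {
        return LINE.matcher(line).matches();
    }

    public static void main(String[] args) {
        // Double spaces only inside quoted fields: accepted.
        System.out.println(isValid("\"Field1  \" \"  Field2\" \"Field3  Field_3.1  123\" 23 3445 \"Field5\"")); // true
        // Double space between two quoted fields: rejected.
        System.out.println(isValid("\"Field1\"  \"Field2\" 23 3445")); // false
        // Double space between two numeric fields: rejected.
        System.out.println(isValid("\"Field1\" 23  3445")); // false
    }
}
```

Because [^\"]* consumes everything inside a pair of quotes in one step, spaces and digits inside a field never reach the \s separator check, which is what the question's \\s\\s\\d alternative could not guarantee.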

Java regex Matcher.find() confusion

I'm an experienced coder but a regex novice running Oracle's JDK 1.8 on Windows 10.
My code:
private static void regex1() {
Console con = System.console();
String txt;
Pattern pat =
Pattern.compile(con.readLine("Input a regular expression: "));
while (true) {
txt = con.readLine("\nInput a string: ");
if (txt.isEmpty()) {
break;
}
Matcher mch = pat.matcher(txt);
if (mch.find()) {
con.printf("That string matches\n");
for (int grp = 0; grp <= mch.groupCount(); grp++) {
con.printf(" Group %d matched %s\n",
grp, mch.group(grp));
}
}
else {
con.printf("That string does not match\n");
}
}
}
A sample run:
Input a regular expression: ([a-zA-Z]*), ([a-zA-Z]*)
Pattern: '([a-zA-Z]*), ([a-zA-Z]*)'
Input a string: Doe, John
String: 'Doe, John'
That string matches
2 groups
Group 0 matched 'Doe, John'
Group 1 matched 'Doe'
Group 2 matched 'John'
Input a string: Bond, 007
String: 'Bond, 007'
That string matches
2 groups
Group 0 matched 'Bond, '
Group 1 matched 'Bond'
Group 2 matched ''
Input a string: once again, stuff
String: 'once again, stuff'
That string matches
2 groups
Group 0 matched 'again, stuff'
Group 1 matched 'again'
Group 2 matched 'stuff'
Input a string:
The first and third sets seem fine, but the "Bond, 007" response has me stumped.
The expression is a group of one or more alphas followed by a comma and a space followed by another group of one or more alphas.
The find() method seems to be returning true when it stumbles on the "007" and the group that it claims to have matched is a null string.
Am I missing something obvious here or just losing my mind?
TIA
According to the documentation, the find() method:
Attempts to find the next subsequence of the input sequence that matches the pattern.
When you input Bond, 007, your regex matches:
Capture group 0 (the whole match): "Bond, " (including the trailing space)
Capture group 1 (the first ([a-zA-Z]*)): Bond
Capture group 2 (the second ([a-zA-Z]*)): empty string
I suspect your confusion comes either from find() not having to match the entire input (if you want that, use matches() instead), or from * being able to match zero occurrences of the part it applies to (as opposed to +, which must match at least once).
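The two points above can be checked directly. The sketch below uses the regex from the question plus a + variant; the class and field names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindVersusMatchesDemo {
    // Regex from the question: * allows each group to be empty.
    static final Pattern STAR = Pattern.compile("([a-zA-Z]*), ([a-zA-Z]*)");
    // Variant with +: each group needs at least one letter.
    static final Pattern PLUS = Pattern.compile("([a-zA-Z]+), ([a-zA-Z]+)");

    public static void main(String[] args) {
        Matcher m = STAR.matcher("Bond, 007");
        if (m.find()) {
            // find() only needs some subsequence to match.
            System.out.println("[" + m.group(0) + "]"); // [Bond, ]
            System.out.println("[" + m.group(2) + "]"); // []
        }
        // matches() requires the whole input to match, so the digits make it fail.
        System.out.println(STAR.matcher("Bond, 007").matches()); // false
        // With +, there is no letter after ", " anywhere, so even find() fails.
        System.out.println(PLUS.matcher("Bond, 007").find()); // false
        System.out.println(PLUS.matcher("Doe, John").matches()); // true
    }
}
```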

Java Regex decoding treating multiple delimiters as same not working

Thank you for your help.
I am trying to write a regex that decodes a string using either a comma or a semicolon as the anchor, but I can't get it to work for commas, or for both. Please tell me what I'm missing or doing wrong. Thanks!
^(?<FADECID>\d{6})?(?<MSG>([a-z A-Z 0-9 ()-:]*[;,]{1}+){8,}+)?(?<ANCH>\w*[;,])?(?<TIME>\d{4})?(?<FM>\d{2})?[;,]?(?<CON>.*)$.*
Inbound strings to decode (I need to treat the comma and the semicolon the same way):
383154VSC X1;;;;;;;BOTH WASTE DRAIN VLV NOT CLSD (135MG;35MG);HARD;093502
282151FCMC1 1;;;;;;;FUEL MAIN PUMP1 (121QA1);HARD;093502
732112EEC2B 1;;;;;;;FMU(E2-4071KS)WRG:EEC J12 TO FMV LVDT POS,HARD;
383154VSC X1,,,,,,,BOTH WASTE DRAIN VLV NOT CLSD (135MG,35MG),HARD,093502
282151FCMC1 1,,,,,,,FUEL MAIN PUMP1 (121QA1);HARD;093502
732112EEC2B 1,,,,,,,FMU(E2-4071KS)WRG:EEC J12 TO FMV LVDT POS,HARD,
383154VSC X1,,,,,,,BOTH WASTE DRAIN VLV NOT CLSD (135MG;35MG);HARD;093502
282151FCMC1 1;;;;;;;FUEL MAIN PUMP1 (121QA1),HARD,093502
732112EEC2B 1,,,,,,,FMU(E2-4071KS)WRG:EEC J12 TO FMV LVDT POS;HARD;
This string can contain multiple text messages separated by [;,].
ABC;DEF;;HIJ;NNN;JJJ;XXX;EEX;HARD;
This manages that: (?<MSG>([a-z A-Z 0-9 ()-:]*[;,]{1}+){8,}+)?
but it doesn't recognize commas.
It works for ; but not for commas or for a mix of both, and my problem is that the delimiter can be either a semicolon or a comma.
If I make the regex comma-only, it works for comma strings. I know I'm missing a quantifier or something like that.
if ( null != MORE && ! MORE.isEmpty() ) {
while ( null != MORE && ! MORE.isEmpty() || MORE.trim().equals("EOR")) {
LOG.info("MORE CONTINUE: " + MORE);
if ( MORE.trim().equals("EOR") ) {
break;
}
String patternMoreString = "^(?<FADECID>\\d{6})?(?<MSG>([a-z A-Z 0-9 ()-:()]*[;,]{1}+){8,}+)+?(?<ANCH>\\w*[;,])?(?<TIME>\\d{4})?(?<FM>\\d{2})?[;,]?(?<CON>.*)$.*";
Pattern patternMore = Pattern.compile(patternMoreString, Pattern.DOTALL);
Matcher matcherMore = patternMore.matcher(MORE);
while ( matcherMore.find() ) {
MORE = matcherMore.group("CON");
summary.setReportId("FLR");
summary.setAreg(Areg);
summary.setShip(Ship);
summary.setOrig(Orig);
summary.setDest(Dest);
summary.setTimestamp(Ts);
summary.setAta(matcherMore.group("FADECID"));
summary.setTime(matcherMore.group("TIME"));
summary.setFm(matcherMore.group("FM"));
summary.setMsg(matcherMore.group("MSG"));
serviceRecords.add(summary);
LOG.info("*** A330 MPF MORE Record ***");
LOG.info(summary.getReportId());
LOG.info(summary.getAreg());
LOG.info(summary.getShip());
LOG.info(summary.getOrig());
LOG.info(summary.getDest());
LOG.info(summary.getTimestamp());
LOG.info(summary.getAta());
LOG.info(summary.getTime());
LOG.info(summary.getFm());
LOG.info(summary.getMsg());
summary = new A330PostFlightReportRecord();
}
}
}
}
//---
In all cases I need group 2 (MSG), plus TIME and FM when they exist.
You could make use of a capturing group and a backreference using the number of that group to get consistent delimiters.
In this case the capturing group is ([;,]) which is the fourth group denoted by \4 matching either ; or ,
If you only need group 2, plus TIME and FM when they are present, you can omit the ANCH group:
^(?<FADECID>\d{6})(?<MSG>([a-zA-Z0-9() -]*([;,])){7,})(?<TIME>\d{4})?(?<FM>\d{2})?\4?(?<CON>.*)$
Explanation
^ Start of string
(?<FADECID>\d{6}) Named group FADECID, match 6 digits
(?<MSG> Named group MSG
( Capture group 3
[a-zA-Z0-9() -]* Match 0+ times any of the listed characters
([;,]) Capture group 4, used as backreference to get consistent delimiters
){7,} Close group and repeat 7+ times
) Close group MSG
(?<TIME>\d{4})? Optional named group TIME, match 4 digits
(?<FM>\d{2})? Optional named group FM, match 2 digits
\4? Optional backreference to capture group 4
(?<CON>.*) Named group CON Match any char except a newline 0+ times
$ End of string
Regex demo
Note that group 3 is itself repeated, so it only keeps the value from its last iteration, which for the first sample line is HARD; (the text plus its delimiter).
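The pattern above can be tried on sample lines from the question. This is a sketch only: the class name FaultRecordDemo and the helper parse are hypothetical, and only the semicolon and comma versions of the first sample line are checked. One detail that may explain the original problem: in the question's character class, )-: is read as a range from ) to :, and that range contains the comma but not the semicolon; in the answer's class the hyphen comes last, so it is a literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FaultRecordDemo {
    // Pattern from the answer, written as a Java string literal.
    static final Pattern RECORD = Pattern.compile(
            "^(?<FADECID>\\d{6})(?<MSG>([a-zA-Z0-9() -]*([;,])){7,})"
            + "(?<TIME>\\d{4})?(?<FM>\\d{2})?\\4?(?<CON>.*)$");

    // Hypothetical helper: returns a matcher positioned on the match, or null if the line does not match.
    static Matcher parse(String line) {
        Matcher m = RECORD.matcher(line);
        return m.matches() ? m : null;
    }

    public static void main(String[] args) {
        String semi = "383154VSC X1;;;;;;;BOTH WASTE DRAIN VLV NOT CLSD (135MG;35MG);HARD;093502";
        String comma = "383154VSC X1,,,,,,,BOTH WASTE DRAIN VLV NOT CLSD (135MG,35MG),HARD,093502";
        for (String line : new String[] {semi, comma}) {
            Matcher m = parse(line);
            System.out.println(m.group("FADECID")); // 383154
            System.out.println(m.group("MSG"));     // everything up to and including the delimiter after HARD
            System.out.println(m.group("TIME"));    // 0935
            System.out.println(m.group("FM"));      // 02
        }
    }
}
```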

Match a string in java replace it and get the integer from it

I am trying to find and replace a part of the string which contains an integer.
String str = "I <FS:20>am in trouble.</FS>";
I need to replace <FS:20> and </FS>.
For </FS> I am using
str = str.replace("</FS>", "\\fs0");
I am not sure how to approach the FS:20 part, because the 20 is variable and in some cases might be a different number, which means that I need to somehow capture the int part.
Input :
"I FS:20 am in trouble.";
Output :
"I \fs20 am in trouble.";
but 20 is not a fixed value, so I can't hardcode it.
One way to do it is to make two replacements:
str = str.replaceAll("</FS>", "");
str = str.replaceAll("<FS:(\\d+)>", "\\\\fs$1");
System.out.println(str);
Output:
I \fs20am in trouble.
The first replacement just removes </FS> from the string.
The second replacement makes use of a RegEx pattern <FS:(\d+)>.
The RegEx pattern matches the literal characters <FS: followed by one or more digits, which it stores in group 1 (\d+), finally followed by the character >
The value stored in group 1 can be used in the replacement string using $1, so \\\\fs$1 will be a backslash \ followed by fs followed by the contents of group 1 (\d+), in this case 20.
The numbers matched by \d+ are stored in group 1, accessed using $1
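If the number itself is also needed as an int, the same group can be read through a Matcher. The sketch below is one possible way to do both; the class name FontSizeTagDemo and the helpers fontSize and convert are hypothetical, and convert follows the question in turning </FS> into \fs0 rather than removing it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FontSizeTagDemo {
    // Group 1 captures the digits inside <FS:...>.
    static final Pattern OPEN_TAG = Pattern.compile("<FS:(\\d+)>");

    // Hypothetical helper: the number from the first <FS:n> tag, or -1 when there is no tag.
    static int fontSize(String s) {
        Matcher m = OPEN_TAG.matcher(s);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    // Hypothetical helper: <FS:n> becomes \fsn, and </FS> becomes \fs0 as in the question.
    static String convert(String s) {
        return OPEN_TAG.matcher(s).replaceAll("\\\\fs$1").replace("</FS>", "\\fs0");
    }

    public static void main(String[] args) {
        String str = "I <FS:20>am in trouble.</FS>";
        System.out.println(fontSize(str)); // 20
        System.out.println(convert(str));  // I \fs20am in trouble.\fs0
    }
}
```

Note the different escaping: replaceAll treats backslashes in the replacement specially, so a literal backslash needs four of them in Java source, while String.replace takes the replacement literally and needs only two.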
If the number is already available in a variable (20 in the described case), you can build the search string from it:
Integer yourVariable = 20;
String str = "I <FS:20>am in trouble.</FS>";
str = str.replace("<FS:" + yourVariable + ">", "\\fs" + yourVariable);

How to replace last letter to another letter in java using regular expression

I have seen "," replaced with "." by using ".$"|",$", but this logic does not work with letters.
I need to replace the last letter of every word in the string EXAMPLE_TEST with another letter, using Java.
This is my code:
Pattern replace = Pattern.compile("n$");//here got the real problem
matcher2 = replace.matcher(EXAMPLE_TEST);
EXAMPLE_TEST=matcher2.replaceAll("k");
I also tried "//n$", "\n$", etc.
Please help me find a solution.
input text=>njan ayman
output text=> njak aymak
Instead of the end of string $ anchor, use a word boundary \b
String s = "njan ayman";
s = s.replaceAll("n\\b", "k");
System.out.println(s); //=> "njak aymak"
You can use lookahead and group matching:
String EXAMPLE_TEST = "njan ayman";
String s = EXAMPLE_TEST.replaceAll("(n)(?=\\s|$)", "k");
System.out.println("s = " + s); // prints: s = njak aymak
Explanation:
(n) - the matched word character
(?=\\s|$) - which is followed by a space or at the end of the line (lookahead)
The above is only an example. If you want to replace every comma with a period, the middle line should be changed to:
s = s.replaceAll("(,)(?=\\s|$)", "\\.");
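The word-boundary and lookahead approaches agree on the sample input but differ once punctuation follows a word. A minimal sketch to show the difference; the class and method names, and the second input string, are made up for illustration.

```java
class LastLetterDemo {
    // Replaces an n that sits right before a word boundary.
    static String withWordBoundary(String s) {
        return s.replaceAll("n\\b", "k");
    }

    // Replaces an n only when whitespace or the end of the input follows.
    static String withLookahead(String s) {
        return s.replaceAll("n(?=\\s|$)", "k");
    }

    public static void main(String[] args) {
        System.out.println(withWordBoundary("njan ayman"));   // njak aymak
        System.out.println(withLookahead("njan ayman"));      // njak aymak
        // With punctuation after the words, only the \b version still replaces.
        System.out.println(withWordBoundary("njan, ayman.")); // njak, aymak.
        System.out.println(withLookahead("njan, ayman."));    // njan, ayman.
    }
}
```

Which one fits depends on whether a word followed by a comma or period should count as ending in that letter.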
Here's how I would set it up:
(?=.\b)\w
Which in Java would need to be escaped as follows:
(?=.\\b)\\w
It translates to something like "a word character (\w) for which the lookahead (?=.\b) succeeds", that is, a character immediately followed by a word boundary (\b), so it is the last character of a word.
String s = "njan ayman aowkdwo wdonwan. wadawd,.. wadwdawd;";
s = s.replaceAll("(?=.\\b)\\w", "");
System.out.println(s); //nja ayma aowkdw wdonwa. wadaw,.. wadwdaw;
This removes the last character of all words, but leaves following non-alphanumeric characters. You can specify only specific characters to remove/replace by changing the . to something else.
However, the other answers are perfectly good and might achieve exactly what you are looking for.
if (word.endsWith("n")) { // "n" is the old letter
    word = word.substring(0, word.length() - 1) + "k"; // "k" is the new letter
}
