Get Sub-string from String with specific Pattern in JAVA

I have the following input:
8=FIX.4.2|9=00394|35=8|49=FIRST|8=FIX.4.2|9=00394|35=8|56=MIDDLE|10=245|8=FIX.4.2|9=00394|35=8|49=LAST|56=HEMADTS|10=024|
Now I want the strings that start with "8=???" and end with "10=???|". As you can see above, there are exactly two strings that start with 8 and end with 10 without another "8=???" in between. I have written a program for this.
Below is my code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
static Pattern r = Pattern.compile("(.*?)(8=\\w\\w\\w)[\\s\\S]*?(10=\\w\\w\\w)");
public static void main(String[] args) {
String str = "8=FIX.4.2|9=00394|35=8|49=FIRST|8=FIX.4.2|9=00394|35=8|56=MIDDLE|10=245|8=FIX.4.2|9=00394|35=8|49=LAST|56=HEMADTS|10=024|";
match(str);
}
public static void match(String message) { //send to OMS
Matcher m = r.matcher(message);
while (m.find()) {
System.out.println(m.group());
}
}
}
When I run this, I get the wrong output:
8=FIX.4.2|9=00394|35=8|49=FIRST|8=FIX.4.2|9=00394|35=8|56=MIDDLE|10=245|
8=FIX.4.2|9=00394|35=8|49=LAST|56=HEMADTS|10=024|
The first string in the output contains "8=???" twice. The output I need is:
8=FIX.4.2|9=00394|35=8|56=MIDDLE|10=245|
8=FIX.4.2|9=00394|35=8|49=LAST|56=HEMADTS|10=024|
I also want the unmatched strings separately, since there is further work to do with them. How can I get that? The complete output should look like this:
Matched : 8=FIX.4.2|9=00394|35=8|56=MIDDLE|10=245|
Matched : 8=FIX.4.2|9=00394|35=8|49=LAST|56=HEMADTS|10=024|
UnMatched : 8=FIX.4.2|9=00394|35=8|49=FIRST|

You need to use a tempered greedy token to match the shortest window possible between 2 strings. That will solve the first problem. To get unmatched strings, just split the string with the pattern.
Use
\b8=\w{3}(?:(?!8=\w{3})[\s\S])*?10=\w{3}\|
Details
\b - a word boundary
8= - a literal substring
\w{3} - 3 word chars
(?:(?!8=\w{3})[\s\S])*? - a tempered greedy token: matches any char ([\s\S]), zero or more times, as few as possible, as long as that char does not start another 8= followed by 3 word chars
10= - a literal substring
\w{3} - 3 word chars
\| - a literal |.
Java code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
public static Pattern r = Pattern.compile("\\b8=\\w{3}(?:(?!8=\\w{3})[\\s\\S])*?10=\\w{3}\\|");
public static void main(String[] args) {
String str = "8=FIX.4.2|9=00394|35=8|49=FIRST|8=FIX.4.2|9=00394|35=8|56=MIDDLE|10=245|8=FIX.4.2|9=00394|35=8|49=LAST|56=HEMADTS|10=024|";
match(str);
}
public static void match(String message) { //send to OMS
Matcher m = r.matcher(message);
System.out.println("MATCHED:");
while (m.find()) {
System.out.println(m.group());
}
System.out.println("UNMATCHED:");
// Splitting on the pattern leaves only the text outside the matches
String[] unm = r.split(message);
for (String s: unm) {
System.out.println(s);
}
}
}
Results:
MATCHED:
8=FIX.4.2|9=00394|35=8|56=MIDDLE|10=245|
8=FIX.4.2|9=00394|35=8|49=LAST|56=HEMADTS|10=024|
UNMATCHED:
8=FIX.4.2|9=00394|35=8|49=FIRST|
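If you need the matched and unmatched parts as separate lists rather than printed output, one option is to walk the matches and collect the gaps between them using start() and end(). This is only a sketch: the class name FixSplitter and the methods matched and unmatched are made up for illustration, and it assumes the same pattern as above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FixSplitter {
    // Same tempered greedy token pattern as in the answer above.
    private static final Pattern MESSAGE =
            Pattern.compile("\\b8=\\w{3}(?:(?!8=\\w{3})[\\s\\S])*?10=\\w{3}\\|");

    // Collects every complete "8=... 10=...|" message.
    static List<String> matched(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = MESSAGE.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    // Collects the non-empty text before, between and after the matches.
    static List<String> unmatched(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = MESSAGE.matcher(input);
        int last = 0;
        while (m.find()) {
            if (m.start() > last) {
                out.add(input.substring(last, m.start()));
            }
            last = m.end();
        }
        if (last < input.length()) {
            out.add(input.substring(last));
        }
        return out;
    }

    public static void main(String[] args) {
        String str = "8=FIX.4.2|9=00394|35=8|49=FIRST|8=FIX.4.2|9=00394|35=8|56=MIDDLE|10=245|8=FIX.4.2|9=00394|35=8|49=LAST|56=HEMADTS|10=024|";
        for (String s : matched(str)) {
            System.out.println("Matched : " + s);
        }
        for (String s : unmatched(str)) {
            System.out.println("UnMatched : " + s);
        }
    }
}
```

Unlike split, this keeps the order and position of each unmatched piece available if you need it later.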

Related

Regex match a string till " and not stop at \"

I have a String to be checked for regex :
"field":"Testing, for something \"and something\""
which I want to pattern match and replace with :
"field":"SAFE"
For this, I am trying to pattern match and capture up to the last quotation mark. I have tried the following regex, but it's not matching:
Pattern p = Pattern.compile("\"field\":\".*?(?!\\\")\"");
I'm new to regex. Can anyone suggest what I might be doing wrong? Thanks!
EDIT :
I guess the question was not clear. Apologies. The above is not the end of the string. It can contain more fields in succession :
"field":"Testing, for something \"and something\"", "new_field":"blahblah", ...
output should be :
"field":"SAFE", "new_field":"blahblah", ...

You can do it as follows:
public class Testing {
public static void main(String[] args) {
String str = "\"field\":\"Testing, for something \\\"and something\\\"\"";
str = str.replaceAll("(\"field\":).*", "$1\"SAFE\"");
System.out.println(str);
}
}
Output:
"field":"SAFE"
Explanation:
(\"field\":) is the first capturing group
.* specifies all characters
$1 specifies the first capturing group
Update:
Writing this update based on the clarification from OP.
You can use positive lookahead for comma as shown below:
public class Testing {
public static void main(String[] args) {
String str = "\"field\":\"Testing, for something \\\"and something\\\"\", \"new_field\":\"blahblah\"";
str = str.replaceAll("(\"field\":).*(?=,)", "$1\"SAFE\"");
System.out.println(str);
}
}
Output:
"field":"SAFE", "new_field":"blahblah"
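Note that the greedy .* in the update runs up to the last comma in the string, so it would swallow other fields if more than one followed. A different technique (not the one used above) is to match the quoted value itself, treating a backslash plus the next character as one unit so that \" does not end the value. A minimal sketch, where the class name SafeFieldDemo and the method mask are invented for the example:

```java
public class SafeFieldDemo {
    // "field":" followed by any run of (non-quote, non-backslash chars
    // OR a backslash plus one escaped char), then the closing quote.
    private static final String FIELD_VALUE =
            "\"field\":\"(?:[^\"\\\\]|\\\\.)*\"";

    static String mask(String input) {
        return input.replaceAll(FIELD_VALUE, "\"field\":\"SAFE\"");
    }

    public static void main(String[] args) {
        String str = "\"field\":\"Testing, for something \\\"and something\\\"\", "
                + "\"new_field\":\"blahblah\", \"other\":\"x, y\"";
        System.out.println(mask(str));
    }
}
```

Because the match stops at the first unescaped quote, the later fields are left untouched even when they contain commas.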

Here is an example in PHP.
$str = '"field":"Testing, for something \"and something\""';
echo preg_replace('/(\"field\":\")(.*)(\")/i', "$1SAFE$3", $str);

Replace everything except ONE single char

I am dealing with some cells from which I have to extract certain letters. I want to replace the whole String with "" except for one single stand-alone character. The biggest challenge for me has been telling my regex to look only for a single char and remove everything else.
To elaborate and simplify: I need my regex to replace everything with "" except for a single character that stands ALONE (i.e. with white space to its left and right, or attached to a number).
public class Main {
public static void main(String[] args) {
String test = "22A 302 abc";
String works = test.replaceAll("^\\w[\\s\\S]*", " ");
System.out.println(works);
//Desired result: A
}
}

You could match a digit or space before and after capturing a char [A-Z].
In the replacement use group 1.
^.*[\d ]([A-Z])[ \d].*$
If there can be only a single uppercase char in the string:
^[^A-Z]*[\d ]([A-Z])[ \d][^A-Z]*$
Example code
String test = "22A 302 abc";
String works = test.replaceAll("^.*[\\d ]([A-Z])[ \\d].*$", "$1");
System.out.println(works);
Output
A
To match between digits 0-9, horizontal whitespace chars or punctuation:
String works = test.replaceAll("^.*[\\p{P}0-9\\h]([A-Z])[\\p{P}0-9\\h].*$", "$1");
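As an alternative to replacing everything around the letter, you could search for the letter directly with find(). The sketch below uses lookarounds instead of the character classes above, so it treats "stand-alone" as "no letter directly before or after", which also accepts a letter at the very start or end of the string. That definition, the class name StandaloneLetter and the method extract are assumptions made for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StandaloneLetter {
    // An uppercase letter with no ASCII letter directly before or after it.
    private static final Pattern STANDALONE =
            Pattern.compile("(?<![A-Za-z])[A-Z](?![A-Za-z])");

    // Returns the first stand-alone uppercase letter, or "" if there is none.
    static String extract(String input) {
        Matcher m = STANDALONE.matcher(input);
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        System.out.println(extract("22A 302 abc"));
    }
}
```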

You can do it as follows:
public class Test {
public static void main(String[] args) {
String test = "22A 302 abc";
String works = test.replaceAll("\\d+([A-Z]).*", "$1");
System.out.println(works);
}
}
Output:
A
Explanation: The regex matches one or more digits (\\d+), captures the single letter A-Z that follows them, and then matches the rest of the string (.*). The whole match is replaced with capturing group 1 ($1 in the code above), which holds just that letter.

Split a string using split method

I have tried to split a string using the split method, but I'm running into a problem.
String str="1-DRYBEANS,2-PLAINRICE,3-COLDCEREAL,4-HOTCEREAL,51-ASSORTEDETHNIC,GOURMET&SPECIALTY";
List<String> zoneArray = new ArrayList<>(Arrays.asList(str.split(",")));
Actual output :
zoneArray = {"1-DRYBEANS","2-PLAINRICE","3-COLDCEREAL","4-HOTCEREAL","51-ASSORTEDETHNIC","GOURMET&SPECIALTY"}
Expected output :
zoneArray = {"1-DRYBEANS","2-PLAINRICE","3-COLDCEREAL","4-HOTCEREAL","51-ASSORTEDETHNIC,GOURMET&SPECIALTY"}
Any help would be appreciated.

Use split(",(?=[0-9])")
You are not just splitting by comma, but splitting by comma only if it is followed by a digit from 0-9. This is done with a positive lookahead, (?=...), which checks what follows without consuming it.
Take a look at this code snippet for example:
public class Main {
public static void main(String[] args) {
String str="1-DRYBEANS,2-PLAINRICE,3-COLDCEREAL,4-HOTCEREAL,51-ASSORTEDETHNIC,GOURMET&SPECIALTY";
String[] array1= str.split(",(?=[0-9])");
for (String temp: array1){
System.out.println(temp);
}
}
}

Use a look-ahead within your regex: match a comma (outside the look-ahead) that is followed by a number (inside the look-ahead). \\d+ will suffice for the number. The regex can look like:
String regex = ",(?=\\d+)";
For example:
public class Foo {
public static void main(String[] args) {
String str = "1-DRYBEANS,2-PLAINRICE,3-COLDCEREAL,4-HOTCEREAL,51-ASSORTEDETHNIC,GOURMET&SPECIALTY";
String regex = ",(?=\\d+)";
String[] tokens = str.split(regex);
for (String item : tokens) {
System.out.println(item);
}
}
}
This splits on a comma that is followed by digits, but does not remove the digits from the output, since they are only part of the look-ahead.
For more on look-ahead, look-behind and look-around, please check out this relevant tutorial page.

In java pattern matcher(regex) how to iterate and replace each text with different text

I want to check for pattern matches and, for each match, replace the matched text with the element of the test array at the given index.
public class test {
public static void main(String[] args) {
String[] test={"one","two","three","four"};
Pattern pattern = Pattern.compile("\\$(\\d)+");
String text="{\"test1\":\"$1\",\"test2\":\"$5\",\"test3\":\"$3\",\"test4\":\"$4\"}";
Matcher matcher = pattern.matcher(text);
while(matcher.find()) {
System.out.println(matcher.groupCount());
System.out.println(matcher.replaceAll("test"));
}
System.out.println(text);
}
}
I want the end result text string to be in this format:
{\"test1\":\"one\",\"test2\":\"$two\",\"test3\":\"three\",\"test4\":\"four\"}
but the while loop is exiting after one match and "test" is replaced everywhere like this:
{"test1":"test","test2":"test","test3":"test","test4":"test"}
Using the below code I got the result:
public class test {
public static void main(String[] args) {
String[] test={"one","two","three","four"};
Pattern pattern = Pattern.compile("\\$(\\d)+");
String text="{\"test1\":\"$1\",\"test2\":\"$2\",\"test3\":\"$3\",\"test4\":\"$4\"}";
Matcher m = pattern.matcher(text);
StringBuffer sb = new StringBuffer();
while (m.find()) {
m.appendReplacement(sb, test[Integer.parseInt(m.group(1)) - 1]);
}
m.appendTail(sb);
System.out.println(sb.toString());
}
}
But, if I have a replacement text array like this,
String[] test={"$$one","two","three","four"};
then, because of the $$, I am getting an exception in thread "main":
java.lang.IllegalArgumentException: Illegal group reference
at java.util.regex.Matcher.appendReplacement(Matcher.java:857)

The following line is your problem:
System.out.println(matcher.replaceAll("test"));
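If you remove it, the loop will walk through all matches. replaceAll resets the matcher and scans the whole input itself, so the next call to find() has nothing left to match.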
As a solution for your problem, you could replace the loop with something like this:
For Java 8:
StringBuffer out = new StringBuffer();
while (matcher.find()) {
String r = test[Integer.parseInt(matcher.group(1)) - 1];
matcher.appendReplacement(out, r);
}
matcher.appendTail(out);
System.out.println(out.toString());
For Java 9 and above:
String x = matcher.replaceAll(match -> test[Integer.parseInt(match.group(1)) - 1]);
System.out.println(x);
This only works if you replace the $5 with $2, which I assume is your goal; otherwise index 4 is out of bounds for the four-element array.
Concerning the $ signs in the replacement string, the documentation states:
A dollar sign ($) may be included as a literal in the replacement string by preceding it with a backslash (\$).
In other words, you must write your replacement array as String[] test = { "\\$\\$one", "two", "three", "four" };
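If you would rather not escape the array by hand, Matcher.quoteReplacement does the same escaping for you at the point of use. A minimal sketch follows; the class name QuoteReplacementDemo and the method substitute are made up for the example, the pattern is changed to \\$(\\d+) so that multi-digit indexes are captured whole, and leaving out-of-range placeholders such as $5 untouched is my own assumption about the desired behaviour.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteReplacementDemo {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$(\\d+)");

    static String substitute(String text, String[] values) {
        Matcher m = PLACEHOLDER.matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            int index = Integer.parseInt(m.group(1)) - 1;
            // Assumption: a placeholder with no matching array element stays as it is.
            String replacement = (index >= 0 && index < values.length)
                    ? values[index]
                    : m.group();
            // quoteReplacement escapes '$' and '\' so they are inserted literally.
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] test = {"$$one", "two", "three", "four"};
        String text = "{\"test1\":\"$1\",\"test2\":\"$5\",\"test3\":\"$3\",\"test4\":\"$4\"}";
        System.out.println(substitute(text, test));
    }
}
```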

I can do a regex solution if you like, but this is much easier (assuming this is the desired output).
int count = 1;
for (String s : test) {
text = text.replace("$" + count++, s);
}
System.out.println(text);
It prints:
{"test1":"one","test2":"two","test3":"three","test4":"four"}

Java Regex How to Find if String Contains Characters but order is not a matter

I have a String like this: "abcdefgh"
I want to check whether the string contains all of the following characters: [fcb]
The condition is: the string must contain all of the characters, in any order.
How do I write a regex for this?
I tried the following regex:
.*[fcb].* ---> This does not check for all characters. If any one character matches, it returns true.

Don't use regex. Just use String.contains to test for each of the characters in turn:
in.contains("f") && in.contains("c") && in.contains("b")

You could get the char array and sort it. Afterwards you could check whether the sorted string matches .*b.*c.*f.*.
public static boolean contains(String input) {
char[] inputChars = input.toCharArray();
Arrays.sort(inputChars);
String bufferInput = String.valueOf(inputChars);
// Since it is sorted this will check if it simply contains `b,c and f`.
return bufferInput.matches(".*b.*c.*f.*");
}
public static void main(String[] args) {
System.out.println(contains("abcdefgh"));
System.out.println(contains("abdefgh"));
}
output:
true
false

This will check whether all the letters are present in the string.
public class Example {
public static void main(String args[]) {
String stringA = "abcdefgh";
String opPattern = "(?=[^ ]*f)(?=[^ ]*c)(?=[^ ]*b)[^ ]+";
Pattern opPatternRegex = Pattern.compile(opPattern);
Matcher matcher = opPatternRegex.matcher(stringA);
System.out.println(matcher.find());
}
}

You can use positive lookahead for this
(?=.*b)(?=.*c)(?=.*f)
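In Java, the lookaheads above could be used roughly like this. It is a sketch only: the class name ContainsAllDemo and the method containsAll are invented, and the ^ anchor and DOTALL flag are my additions so that the check runs once from the start and also works across line breaks.

```java
import java.util.regex.Pattern;

public class ContainsAllDemo {
    // Each lookahead scans forward from the start for one required character.
    private static final Pattern ALL_THREE =
            Pattern.compile("^(?=.*b)(?=.*c)(?=.*f)", Pattern.DOTALL);

    static boolean containsAll(String s) {
        // find() is used because the pattern itself consumes no characters.
        return ALL_THREE.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(containsAll("abcdefgh"));
        System.out.println(containsAll("abdefgh"));
    }
}
```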

Not very efficient but easy to understand:
if (s.matches(".*b.*") && s.matches(".*c.*") && s.matches(".*f.*"))
