Java regex failing for multiple groups

The regex below works fine in most regex tools, but it is not working in my Java code. Can anyone please advise?
String text="CHANGE FEE/ADD COLLECT DATA "+
"1.1 COLOR/RED TOMATO "+
"CF USD10.00 "+
" "+
"2.2 COLOR/DARK BLUE PLUM "+
"CF USD11.00 "+
" ";
String patterString = "([0-9]{1,3}\\.[0-9]{1,3})\\s.+\\s*CF\\s+[a-zA-Z]{1,5}([0-9]{1,10}.[0-9]{2})";
Pattern pattern = Pattern.compile(patterString);
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
System.out.println("found: " + matcher.group(1) +">>>"+ matcher.group(2));
}
actual output:
found: 1.1>>>11.00
expected output:
found: 1.1>>>10.00
found: 2.2>>>11.00

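Your regex needs to be:
String patterString = "([0-9]{1,3}\\.[0-9]{1,3}).*?CF\\s+[a-zA-Z]{1,5}([0-9]{1,10}\\.[0-9]{2})";
Which yields:
found: 1.1>>>10.00
found: 2.2>>>11.00
find() does not switch on MULTILINE or any other mode. The difference is in the input: the string built in your Java code contains no line breaks, only spaces, so the greedy .+ in \\s.+\\s* runs to the end of the text and backtracks just far enough to find the last CF. That is why the single match pairs 1.1 with 11.00. In the regex tools the sample was presumably pasted with real line breaks, which . does not match by default, so each match stayed within its own record. Replacing that portion with the reluctant .*? stops at the first CF instead and minimizes the greed ;-) The decimal point in the amount is also escaped (\\.) so that it matches only a literal dot.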
Edit, sample source:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexFind {
    public static void main(String[] args) {
        String text = "CHANGE FEE/ADD COLLECT DATA " +
                "1.1 COLOR/RED TOMATO " +
                "CF USD10.00 " +
                " " +
                "2.2 COLOR/DARK BLUE PLUM " +
                "CF USD11.00 " +
                " ";
        // Original pattern: the greedy .+ runs on to the last CF in the text.
        //String patterString = "([0-9]{1,3}\\.[0-9]{1,3})\\s.+\\s*CF\\s+[a-zA-Z]{1,5}([0-9]{1,10}.[0-9]{2})";
        // Reluctant .*? stops at the first CF; the decimal point is escaped to match a literal '.'.
        String patterString = "([0-9]{1,3}\\.[0-9]{1,3}).*?CF\\s+[a-zA-Z]{1,5}([0-9]{1,10}\\.[0-9]{2})";
        Pattern pattern = Pattern.compile(patterString);
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            System.out.println("found: " + matcher.group(1) + ">>>" + matcher.group(2));
        }
    }
}
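To see the two quantifiers side by side, here is a minimal sketch that runs both patterns against the question's text written as one line. The class name GreedyVsReluctant and the helper pairs() are made up for illustration, and the decimal point is escaped in both variants so that the quantifier is the only difference.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsReluctant {
    // The question's text on a single line: there is no '\n' anywhere in it.
    static final String TEXT = "CHANGE FEE/ADD COLLECT DATA 1.1 COLOR/RED TOMATO CF USD10.00 "
            + " 2.2 COLOR/DARK BLUE PLUM CF USD11.00  ";
    static final String HEAD = "([0-9]{1,3}\\.[0-9]{1,3})";
    static final String TAIL = "CF\\s+[a-zA-Z]{1,5}([0-9]{1,10}\\.[0-9]{2})";
    static final String GREEDY = HEAD + "\\s.+\\s*" + TAIL;
    static final String RELUCTANT = HEAD + ".*?" + TAIL;

    // Collects "group1>>>group2" for every match of the regex in the text.
    static List<String> pairs(String regex, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            out.add(m.group(1) + ">>>" + m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(pairs(GREEDY, TEXT));    // [1.1>>>11.00]
        System.out.println(pairs(RELUCTANT, TEXT)); // [1.1>>>10.00, 2.2>>>11.00]
    }
}
```

The greedy variant produces one match that swallows the first record; the reluctant variant produces one match per record.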

Related

Regex to split a string using java

I am trying to parse a string as I need to pass the map to UI.
Here is my input string :
"2020-02-01T00:00:00Z",1,
"2020-04-01T00:00:00Z",4,
"2020-05-01T00:00:00Z",2,
"2020-06-01T00:00:00Z",31,
"2020-07-01T00:00:00Z",60,
"2020-08-01T00:00:00Z",19,
"2020-09-01T00:00:00Z",10,
"2020-10-01T00:00:00Z",33,
"2020-11-01T00:00:00Z",280,
"2020-12-01T00:00:00Z",61,
"2021-01-01T00:00:00Z",122,
"2021-12-01T00:00:00Z",1
I need to split the string like this :
"2020-02-01T00:00:00Z",1 : split[0]
"2020-04-01T00:00:00Z",4 : split[1]
The issue is that I can't simply split on "," because the comma appears twice in each record.
I need a regex that gives 2020-02-01T00:00:00Z,1 as one token to process further.
I am new to regex. Can someone please provide a regular expression for this?
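If you want the pairs of date-time and ID, you can use the regex ("\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z",\d+)(?=,|$) to get the match results.
The pattern (?=,|$) is a lookahead assertion for a comma or the end of the input. Without the MULTILINE flag, $ does not match at the end of every line.
Demo (Matcher.results() requires Java 9 or later):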
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
public class Main {
public static void main(String[] args) {
String s = "\"2020-02-01T00:00:00Z\",1,\n"
+ " \"2020-04-01T00:00:00Z\",4,\n"
+ " \"2020-05-01T00:00:00Z\",2,\n"
+ " \"2020-06-01T00:00:00Z\",31,\n"
+ " \"2020-07-01T00:00:00Z\",60,\n"
+ " \"2020-08-01T00:00:00Z\",19,\n"
+ " \"2020-09-01T00:00:00Z\",10,\n"
+ " \"2020-10-01T00:00:00Z\",33,\n"
+ " \"2020-11-01T00:00:00Z\",280,\n"
+ " \"2020-12-01T00:00:00Z\",61,\n"
+ " \"2021-01-01T00:00:00Z\",122,\n"
+ " \"2021-12-01T00:00:00Z\",1";
List<String> list = Pattern.compile("(\\\"\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}Z\\\",\\d+)(?=,|$)")
.matcher(s)
.results()
.map(MatchResult::group)
.collect(Collectors.toList());
list.stream()
.forEach(p -> System.out.println(p));
}
}
Output:
"2020-02-01T00:00:00Z",1
"2020-04-01T00:00:00Z",4
"2020-05-01T00:00:00Z",2
"2020-06-01T00:00:00Z",31
"2020-07-01T00:00:00Z",60
"2020-08-01T00:00:00Z",19
"2020-09-01T00:00:00Z",10
"2020-10-01T00:00:00Z",33
"2020-11-01T00:00:00Z",280
"2020-12-01T00:00:00Z",61
"2021-01-01T00:00:00Z",122
"2021-12-01T00:00:00Z",1
Why can't you just split on , and ignore the last value?
Here's your pattern:
final Pattern pattern = Pattern.compile("(\\S+),(\\d+)");
final Matcher matcher = pattern.matcher("Input....");
Here's how to use it:
while (matcher.find()) {
final String date = matcher.group(1);
final String number = matcher.group(2);
}
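Since the question mentions passing a map to the UI, the find() loop above could fill one directly. This is only a sketch under assumptions: the class name DateCountMap, the method parse(), and the choice of a LinkedHashMap with Integer values are not from the question, and the pattern here captures the text between the quotes rather than using (\\S+).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DateCountMap {
    // Group 1: the text between the quotes; group 2: the number after the comma.
    private static final Pattern RECORD = Pattern.compile("\"([^\"]+)\",(\\d+)");

    static Map<String, Integer> parse(String input) {
        Map<String, Integer> result = new LinkedHashMap<>(); // keeps the input order
        Matcher m = RECORD.matcher(input);
        while (m.find()) {
            result.put(m.group(1), Integer.parseInt(m.group(2)));
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "\"2020-02-01T00:00:00Z\",1,\n"
                + "\"2020-04-01T00:00:00Z\",4,\n"
                + "\"2021-12-01T00:00:00Z\",1";
        // prints {2020-02-01T00:00:00Z=1, 2020-04-01T00:00:00Z=4, 2021-12-01T00:00:00Z=1}
        System.out.println(parse(s));
    }
}
```

If the same date can appear twice, put() keeps only the last value, so a different merge strategy would be needed in that case.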

'return' character and new line use with regex in Java 8

I'm facing strange behaviour in Java 8 when using (\r?\n) inside a regex to parse a text file, with the Eclipse IDE running under Java 8.
See the regex101 test demo: https://regex101.com/r/QHSsfQ/4
The regex works fine under Java 7 with the Eclipse IDE,
but with the IDE running under Java 8 it doesn't work (see the code below).
Can someone help me solve this?
String REGEX =
"\\s+NAME.*" + "\\r?\\n"
+ "INFO-\\d{1,2}\\s+(?<name>[$\\w]+).*" + "\\r?\\n"
+ ".*" + "\\r?\\n"
+ ".*VERAT2.*" + "\\r?\\n"
+ "\\s+\\w+\\s+(?<verat2>\\w+).*"
.......
.......
Matcher matcher = Pattern.compile( REGEX ).matcher( data );
if( matcher.find() )
{
System.out.println("LEVELINFO=DATA=" + matcher.group("name") + " &&NAME=" + matcher.group("name") +" &&VERAT2="+ matcher.group("verat2")+"\n");
}
}
sc.close();
the sample text file looks like this :
DATA NAME MAC1
INFO-0 EQUIP Q10
VL VER VERAT2
V22 V22
thanks
Alternative regex:
String regexName = "^DATA\\s+NAME\\s+.*?^\\S+\\s+(?<name>\\S+)";
String regexVerat2 = "\\s+VER\\s+VERAT2\\s+.*?^\\s+\\S+\\s+(?<verat2>\\S+)";
String regex = String.format("%s.*?%s", regexName, regexVerat2);
Matcher matcher = Pattern.compile(regex, Pattern.MULTILINE|Pattern.DOTALL).matcher(input);
Regex in context:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String input =
                "DATA NAME MAC1 MAC2\n"
                + "INFO-0 EQUIP Q10 Q13\n"
                + " \n"
                + " VL VER VERAT2 MAP\n"
                + " V22 V22 SELF100\n"
                + " \n"
                + " CMD1 CMD2 CMD3 CMD4 CMD4 \n"
                + " NO 44 FAL BYTE\n";
        // MULTILINE lets ^ match at each line start; DOTALL lets .*? cross line breaks.
        String regexName = "^DATA\\s+NAME\\s+.*?^\\S+\\s+(?<name>\\S+)";
        String regexVerat2 = "\\s+VER\\s+VERAT2\\s+.*?^\\s+\\S+\\s+(?<verat2>\\S+)";
        String regex = String.format("%s.*?%s", regexName, regexVerat2);
        Matcher matcher = Pattern.compile(regex, Pattern.MULTILINE | Pattern.DOTALL).matcher(input);
        while (matcher.find()) {
            System.out.println("Name: " + matcher.group("name"));
            System.out.println("Verat2 : " + matcher.group("verat2"));
        }
    }
}
Output:
Name: EQUIP
Verat2 : V22
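The alternative regex relies on two flags instead of spelling out \r?\n. As a rough illustration of what each flag changes, here is a small sketch; the class name FlagDemo, the helper first(), and the shortened two-line input are assumptions made for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FlagDemo {
    // Returns the first match of the regex under the given flags, or null if none.
    static String first(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "DATA NAME\nINFO-0 EQUIP";

        // Without MULTILINE, ^ matches only at the very start of the input.
        System.out.println(first("^INFO-\\d+", 0, input));                 // null
        // With MULTILINE, ^ also matches right after a line terminator.
        System.out.println(first("^INFO-\\d+", Pattern.MULTILINE, input)); // INFO-0

        // Without DOTALL, '.' does not match the line terminator.
        System.out.println(first("NAME.*EQUIP", 0, input));                // null
        // With DOTALL, '.' matches '\n' too, so the match spans both lines.
        System.out.println(first("NAME.*EQUIP", Pattern.DOTALL, input) != null); // true
    }
}
```

Because DOTALL lets . match any line terminator, the pattern no longer depends on whether the file uses \n or \r\n.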

Find substring from a complex string using regex

I have a String containing huge script code as follows :
String script = "node {
stage(someString) {
try {
**parameters= [
[someString],
[someString],
[someString],
[someString],
[someString],
[someString],
[someString],
]**
//some more script
}
}";
I want to extract the parameters variable containing array of array values
I tried the following pattern, but it didn't work:
Pattern pattern = Pattern.compile("parameters= [(.*?)]");
How do I extract the parameters variable from script String variable using Regex?
Thanks in advance!
You may try using:
parameters=\s*\[(.*)]
Explanation of the above regex:
parameters= - Matches parameters= literally.
\s* - Matches a white-space character zero or more times.
\[ - Matches [ literally.
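(.*)] - A capturing group followed by a literal ]. Because .* is greedy, the group captures everything up to the last ] that can be reached; with Pattern.DOTALL, as in the Java sample, that includes line breaks.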
You can find the demo of the above regex in here.
Sample Implementation in java:
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class Main
{
private static final Pattern pattern = Pattern.compile("parameters=\\s*\\[(.*)]", Pattern.DOTALL);
public static void main(String[] args) {
String string = "node {\n"
+ " stage(someString) {\n"
+ " try {\n"
+ " **parameters= [\n"
+ " [someString],\n"
+ " [someString],\n"
+ " [someString],\n"
+ " [someString],\n"
+ " [someString],\n"
+ " [someString],\n"
+ " [someString],\n"
+ " ]**\n"
+ " //some more script";
StringBuilder sb = new StringBuilder();
Matcher matcher = pattern.matcher(string);
while(matcher.find()){
// Replaced all the unwanted spaces and commas. You can address that accordingly.
sb.append(matcher.group(1).replaceAll("[\\s,]+", " "));
}
System.out.println(sb.toString());
}
}
Please find the sample run of the above implementation in here.
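If the individual inner arrays are needed rather than one flattened string, a second pattern can be run over the captured group. The following is a sketch only: the class name InnerItems, the method items(), and the shortened script are hypothetical. Note that the greedy (.*) runs to the last ] in the whole script, so any bracketed text after the parameters block would be picked up as well.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InnerItems {
    // Same pattern as the sample above: captures what follows "parameters= [".
    private static final Pattern BLOCK =
            Pattern.compile("parameters=\\s*\\[(.*)]", Pattern.DOTALL);
    // One inner "[...]" that contains no nested brackets.
    private static final Pattern ITEM = Pattern.compile("\\[([^\\[\\]]*)]");

    static List<String> items(String script) {
        List<String> out = new ArrayList<>();
        Matcher block = BLOCK.matcher(script);
        if (block.find()) {
            Matcher item = ITEM.matcher(block.group(1));
            while (item.find()) {
                out.add(item.group(1).trim());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String script = "node {\n"
                + " try {\n"
                + "  parameters= [\n"
                + "   [first],\n"
                + "   [second],\n"
                + "  ]\n"
                + "  //some more script\n"
                + " }\n"
                + "}";
        System.out.println(items(script)); // [first, second]
    }
}
```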

java regex take variable between two tag

I am very new to regex and need your help. I want to take the numbers and letters between two span tags.
<span>454.000 $</span>
I want to take 454.000 $. There are 12 spaces before the <span> tag. Please help me.
This should work.
Regexp:
\s+<.+>(.+)<.+>
Input:
<span>454.000 $</span>
Output:
454.000 $
JAVA CODE:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        final String regex = "\\s+<.+>(.+)<.+>";
        final String string = " <span>454.000 $</span>";
        final Pattern pattern = Pattern.compile(regex);
        final Matcher matcher = pattern.matcher(string);
        while (matcher.find()) {
            System.out.println("Full match: " + matcher.group(0));
            // Group 1 holds the text between the opening and closing tags.
            for (int i = 1; i <= matcher.groupCount(); i++) {
                System.out.println("Group " + i + ": " + matcher.group(i));
            }
        }
    }
}
See: https://regex101.com/r/2zg5Ws/1
Using a capturing group with Pattern and Matcher looks something like this:
String x = " <span>454.000 $</span> ";
Pattern p = Pattern.compile("<span>(.*?)</span>");
Matcher m = p.matcher(x);
if (m.find()) {
System.out.println(">> "+ m.group(1)); // output 454.000 $
}
But for such cases I always prefer replaceAll(), as the code is shorter:
String num = x.replaceAll(".*<span>(.*?)</span>.*", "$1");
// num has 454.000 $
Here replaceAll() matches the whole text, captures the group, and replaces the entire text with that group ($1). Whether this works depends on what your input string looks like.
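One way the input matters: when the pattern does not match at all, replaceAll() returns the string unchanged, so the caller cannot tell "no span found" from a real value without an extra check. A minimal sketch, with the class name ReplaceAllCaveat and the helper extract() invented for the example:

```java
public class ReplaceAllCaveat {
    // Replaces the whole text with the content of the first <span>...</span>.
    static String extract(String x) {
        return x.replaceAll(".*<span>(.*?)</span>.*", "$1");
    }

    public static void main(String[] args) {
        // The pattern matches the whole string, so only group 1 remains.
        System.out.println(extract("   <span>454.000 $</span> ")); // 454.000 $
        // No <span> present: nothing is replaced and the input comes back as-is.
        System.out.println(extract("no span here"));               // no span here
    }
}
```

With the Matcher version, find() returning false makes the "no match" case explicit.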

Java regEx URL matching issue

and as usual thank you in advance.
I am trying to familiarize myself with regEx and I am having an issue matching a URL.
Here is an example URL:
www.examplesite.com/dir/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html
here is what my regex breakdown looks like:
[site]/[dir]*?/[year]/[month]/[day]/[storyTitle]?/[id]/htmlpage.html
the [id] is a string 22 characters in length that can be either uppercase or lowercase letters, as well as numbers. However, I do not want to extract that from the URL. Just clarifying
Now, I need to extract two values from this url.
First,
I need to extract the dir(s). The [dir] is optional, but there can also be as many as wanted. In other words, that parameter might be absent, or it could be dir1/dir2/dir3, etc. So, going off my first example:
www.examplesite.com/dir1/dir2/dir3/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html
Here I would need to extract dir1/dir2/dir3 where a dir is a string that is a single word with all lowercase letters (ie sports/mlb/games). There are no numbers in the dir, only using that as an example.
But in this example of a valid URL:
www.examplesite.com/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html
There is no [dir] so I would not extract anything. thus, the [dir] is optional
Secondly,
I need to extract the [storyTitle], which is also optional just like the [dir] above; however, if there is a storyTitle, there can be only one.
So going off my previous examples
www.examplesite.com/dir/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html
would be valid, and I need to extract 'title-of-some-story'; story titles are dash-separated strings that are always lowercase. The example below is also valid:
www.examplesite.com/dir/2012/06/19/FAQKZjC3veXSalP9zxFgZP/htmlpage.html
In the above example, there is no [storyTitle] thus making it optional
Lastly, just to be thorough, a URL without a [dir] and without a [storyTitle] is also valid. Example:
www.examplesite.com/2012/06/19/FAQKZjC3veXSalP9zxFgZP/htmlpage.html
Is a valid URL. Any input would be helpful I hope I am clear.
Here is one example that will work.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        // Group 1: optional dir part (with its leading /); group 2: optional title (with its leading /).
        Pattern p = Pattern.compile("(?:http://)?.+?(/.+?)?/\\d+/\\d{2}/\\d{2}(/.+?)?/\\w{22}");
        String[] strings = {
            "www.examplesite.com/dir1/dir2/4444/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html",
            "www.examplesite.com/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html",
            "www.examplesite.com/dir/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html",
            "www.examplesite.com/dir/2012/06/19/FAQKZjC3veXSalP9zxFgZP/htmlpage.html",
            "www.examplesite.com/2012/06/19/FAQKZjC3veXSalP9zxFgZP/htmlpage.html"
        };
        for (int idx = 0; idx < strings.length; idx++) {
            Matcher m = p.matcher(strings[idx]);
            if (m.find()) {
                String dir = m.group(1);
                String title = m.group(2);
                if (title != null) {
                    title = title.substring(1); // remove the leading /
                }
                System.out.println(idx + ": Dir: " + dir + ", Title: " + title);
            }
        }
    }
}
Here is an all regex solution.
Edit: Allows for http://
Java source:
import java.util.*;
import java.lang.*;
import java.util.regex.*;
class Main
{
public static void main (String[] args) throws java.lang.Exception
{
String url = "http://www.examplesite.com/dir/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html";
String url2 = "www.examplesite.com/dir/dir2/dir3/2012/06/19/FAQKZjC3veXSalP9zxFgZP/htmlpage.html";
String url3 = "www.examplesite.com/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html";
String patternStr = "(?:http://)?[^/]*[/]?([\\S]*)/[\\d]{4}/[\\d]{2}/[\\d]{2}[/]?([\\S]*)/[\\S]*/[\\S]*";
// Compile regular expression
Pattern pattern = Pattern.compile(patternStr);
// Match 1st url
System.out.println("Match 1st URL:");
Matcher matcher = pattern.matcher(url);
if (matcher.find()) {
System.out.println("URL: " + matcher.group(0));
System.out.println("DIR: " + matcher.group(1));
System.out.println("TITLE: " + matcher.group(2));
}
else{ System.out.println("No match."); }
// Match 2nd url
System.out.println("\nMatch 2nd URL:");
matcher = pattern.matcher(url2);
if (matcher.find()) {
System.out.println("URL: " + matcher.group(0));
System.out.println("DIR: " + matcher.group(1));
System.out.println("TITLE: " + matcher.group(2));
}
else{ System.out.println("No match."); }
// Match 3rd url
System.out.println("\nMatch 3rd URL:");
matcher = pattern.matcher(url3);
if (matcher.find()) {
System.out.println("URL: " + matcher.group(0));
System.out.println("DIR: " + matcher.group(1));
System.out.println("TITLE: " + matcher.group(2));
}
else{ System.out.println("No match."); }
}
}
Output:
Match 1st URL:
URL: http://www.examplesite.com/dir/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html
DIR: dir
TITLE: title-of-some-story
Match 2nd URL:
URL: www.examplesite.com/dir/dir2/dir3/2012/06/19/FAQKZjC3veXSalP9zxFgZP/htmlpage.html
DIR: dir/dir2/dir3
TITLE:
Match 3rd URL:
URL: www.examplesite.com/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html
DIR:
TITLE: title-of-some-story
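Both answers use fairly permissive character classes. If the rules stated in the question are to be enforced (lowercase-only dirs, a lowercase dash-separated title, a 22-character alphanumeric id), a stricter pattern might look like the sketch below. The class name StrictUrlParts and the method parts() are hypothetical, accepting https as well as http is an assumption, and the sample URLs use sports/mlb because the question says real dirs contain no digits.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StrictUrlParts {
    private static final Pattern URL = Pattern.compile(
            "(?:https?://)?[^/]+/"                 // optional scheme, then the host
            + "(?:((?:[a-z]+/)*[a-z]+)/)?"         // group 1: optional dir1/dir2/...
            + "\\d{4}/\\d{2}/\\d{2}/"              // year/month/day
            + "(?:([a-z]+(?:-[a-z]+)*)/)?"         // group 2: optional story title
            + "[A-Za-z0-9]{22}/htmlpage\\.html");  // 22-character id, fixed page name

    // Returns {dir, title}; an element is null when that part is absent.
    // Returns null when the URL does not match at all.
    static String[] parts(String url) {
        Matcher m = URL.matcher(url);
        return m.matches() ? new String[] { m.group(1), m.group(2) } : null;
    }

    public static void main(String[] args) {
        String id = "FAQKZjC3veXSalP9zxFgZP";
        // prints [sports/mlb, title-of-some-story]
        System.out.println(Arrays.toString(parts(
                "www.examplesite.com/sports/mlb/2012/06/19/title-of-some-story/" + id + "/htmlpage.html")));
        // prints [null, null]
        System.out.println(Arrays.toString(parts(
                "www.examplesite.com/2012/06/19/" + id + "/htmlpage.html")));
    }
}
```

Using matches() instead of find() means the whole URL has to fit the pattern, so malformed URLs are rejected rather than partially matched.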
