What should a Java regular expression look like if I want to find two matches
1. NEW D City
2. 1259669
From
Object No: NEW D City | Item ID: 1259669
I tried with
(?<=:\s)\w+
but it only gets
1. NEW
2. 1259669
https://regex101.com/r/j5jwK2/1
Using a single pattern to capture both values is simpler. Here is the regex used:
Object No: ([^|]*)\| Item ID: (\d*)
And here is code generated by regex101, adapted to produce the output you want.
final String regex = "Object No: ([^|]*)\\| Item ID: (\\d*)";
final String string = "Object No: NEW D City | Item ID: 1259669";
final Pattern pattern = Pattern.compile(regex);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
for (int i = 1; i <= matcher.groupCount(); i++) {
System.out.println(i + ": " + matcher.group(i));
}
}
Output:
1: NEW D City
2: 1259669
A similar but more generic solution would be [^:]*[:\s]*([^|]*)\|[^:]*[:\s]*(\d*) (not perfect; I didn't try to make it efficient)
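A minimal sketch of the more generic pattern above, assuming the input always has the shape "label: value | label: digits". The class name GenericPairDemo and the helper extract are made up for illustration. The trim() call removes the trailing space that [^|]* captures before the pipe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GenericPairDemo {
    // Label-agnostic pattern from the answer: skip a label, capture up to '|',
    // skip the next label, capture digits.
    private static final Pattern PAIR =
            Pattern.compile("[^:]*[:\\s]*([^|]*)\\|[^:]*[:\\s]*(\\d*)");

    // Hypothetical helper: returns both captured values, or an empty list
    // when the input does not contain the expected shape.
    static List<String> extract(String input) {
        List<String> values = new ArrayList<>();
        Matcher m = PAIR.matcher(input);
        if (m.find()) {
            values.add(m.group(1).trim()); // group 1 includes the space before '|'
            values.add(m.group(2));
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(extract("Object No: NEW D City | Item ID: 1259669"));
    }
}
```

Because the labels are not spelled out, this variant also accepts other label names, at the cost of being less strict about what it matches.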
You may use a combination of two splits:
String key = "Object No: NEW D City | Item ID: 1259669";
String[] parts = key.split("\\s*\\|\\s*");
List<String> result = new ArrayList<>();
for (String part : parts) {
String[] kvp = part.split(":\\s*");
if (kvp.length == 2) {
result.add(kvp[1]);
System.out.println(kvp[1]); // demo
}
}
See the Java demo
First, you split with \\s*\\|\\s* (a | enclosed with 0+ whitespaces) and then with :\\s*, a colon followed with 0+ whitespaces.
Another approach is to use :\s*([^|]+) pattern and grab and trim Group 1 value:
String s = "Object No: NEW D City | Item ID: 1259669";
List<String> result = new ArrayList<>();
Pattern p = Pattern.compile(":\\s*([^|]+)");
Matcher m = p.matcher(s);
while(m.find()) {
result.add(m.group(1).trim());
System.out.println(m.group(1).trim()); // For demo
}
See the Java demo. In this regex, the ([^|]+) is a capturing group (pushing its contents into matcher.group(1)) that matches one or more (+) chars other than | (with the [^|] negated character class).
Related
String s = #Section250342,Main,First/HS/12345/Jack/M,200010 10.00 200011 -2.00,
#Section250322,Main,First/HS/12345/Aaron/N,200010 17.00,
#Section250399,Main,First/HS/12345/Jimmy/N,200010 12.00,
#Section251234,Main,First/HS/12345/Jack/M,200011 11.00
Wherever there is the word /Jack/M in the string, I want to pull the section numbers (250342, 251234), dates (200010, 200011) and the values (10.00, 11.00, -2.00) associated with it using regex each time. Sometimes a single line can contain either one value or two, which is what makes the regex sort of confusing. So at the end of the day, there will be 3 different groups we want to extract.
I tried
#Section(\d+)(?:(?!#Section\d).)*\bJack/M,(\d+)\h+(\d+(?:\.\d+)?)\s(\d+)\h+([-+]?\d+(?:\.\d+)?)\b
See it in action here: https://regex101.com/r/JaKeGg/1. It brings in 5 groups instead of 3, and when there is only one value it doesn't seem to match, so I need help with this.
You might use a pattern with 2 capture groups, and then post-process the group 2 value to combine the numbers that belong together.
As the dates and the values in the example strings seem to come in pairs, you can split the group 2 value on whitespace and build 2 lists, using the modulo operator to separate the even and odd positions.
#Section(\d+)\b(?:(?!#Section\d).)*\bJack/M,(\d+\h+[-+]?\d+(?:\.\d+)?(?:\s+\d+\h+[-+]?\d+(?:\.\d+)?)*)
Regex demo | Java demo
String regex = "#Section(\\d+)\\b(?:(?!#Section\\d).)*\\bJack/M,(\\d+\\h+[-+]?\\d+(?:\\.\\d+)?(?:\\s+\\d+\\h+[-+]?\\d+(?:\\.\\d+)?)*)";
String string = "#Section250342,Main,First/HS/12345/Jack/M,200010 10.00 200011 -2.00,\n"
+ "#Section250322,Main,First/HS/12345/Aaron/N,200010 17.00,\n"
+ "#Section250399,Main,First/HS/12345/Jimmy/N,200010 12.00,\n"
+ "#Section251234,Main,First/HS/12345/Jack/M,200011 11.00";
Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
List<String> group2 = new ArrayList<>();
List<String> group3 = new ArrayList<>();
System.out.println("Group 1: " + matcher.group(1));
String[] parts = matcher.group(2).split("\\s+");
for (int i = 0; i < parts.length; i++) {
if (i % 2 == 0) {
group2.add(parts[i]);
} else {
group3.add(parts[i]);
}
}
System.out.println("Group 2: " + Arrays.toString(group2.toArray()));
System.out.println("Group 3: " + Arrays.toString(group3.toArray()));
}
Output
Group 1: 250342
Group 2: [200010, 200011]
Group 3: [10.00, -2.00]
Group 1: 251234
Group 2: [200011]
Group 3: [11.00]
If you want to group all values, you can create 3 lists and print all 3 lists after the loop.
List<String> group1 = new ArrayList<>();
List<String> group2 = new ArrayList<>();
List<String> group3 = new ArrayList<>();
while (matcher.find()) {
group1.add(matcher.group(1));
String[] parts = matcher.group(2).split("\\s+");
for (int i = 0; i < parts.length; i++) {
if (i % 2 == 0) {
group2.add(parts[i]);
} else {
group3.add(parts[i]);
}
}
}
System.out.println("Group 1: " + Arrays.toString(group1.toArray()));
System.out.println("Group 2: " + Arrays.toString(group2.toArray()));
System.out.println("Group 3: " + Arrays.toString(group3.toArray()));
Output
Group 1: [250342, 251234]
Group 2: [200010, 200011, 200011]
Group 3: [10.00, -2.00, 11.00]
See this Java demo
I think it is quite difficult to accomplish what you want using solely regex. According to another SO question you can't have multiple matches for the same capturing group in your regex. Instead only the last matching pattern will actually be captured.
My suggestion is to split your string by line in java, iterate through the lines, check if a line contains the substring you search for "Jack/M", and then use regex to extract the different bits by searching for simpler regex pattern instead of trying to match one long regex to the whole string.
A good walk through on how to find matches for a regex in a string: https://www.tutorialspoint.com/getting-the-list-of-all-the-matches-java-regular-expressions
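The line-by-line suggestion above can be sketched as follows. This is only one possible reading of it, assuming that every date/value pair sits after the marker on the same line. The class name JackLinesDemo, the helper extract and the "section date value" row format are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class JackLinesDemo {
    private static final Pattern SECTION = Pattern.compile("#Section(\\d+)");
    // One date/value pair, e.g. "200011 -2.00".
    private static final Pattern PAIR =
            Pattern.compile("(\\d+)\\h+([-+]?\\d+(?:\\.\\d+)?)");

    // Hypothetical helper: one "section date value" row per pair found
    // on lines that contain the marker.
    static List<String> extract(String input, String marker) {
        List<String> rows = new ArrayList<>();
        for (String line : input.split("\\R")) {
            int at = line.indexOf(marker);
            if (at < 0) continue;              // not a line we care about
            Matcher section = SECTION.matcher(line);
            if (!section.find()) continue;     // no section number on this line
            // Scan only the text after the marker, so digits such as "12345" are ignored.
            Matcher pair = PAIR.matcher(line.substring(at + marker.length()));
            while (pair.find()) {
                rows.add(section.group(1) + " " + pair.group(1) + " " + pair.group(2));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        String s = "#Section250342,Main,First/HS/12345/Jack/M,200010 10.00 200011 -2.00,\n"
                + "#Section250322,Main,First/HS/12345/Aaron/N,200010 17.00,\n"
                + "#Section250399,Main,First/HS/12345/Jimmy/N,200010 12.00,\n"
                + "#Section251234,Main,First/HS/12345/Jack/M,200011 11.00";
        extract(s, "/Jack/M,").forEach(System.out::println);
    }
}
```

Each of the two patterns stays short, and a line with one pair or with several pairs is handled by the same while loop.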
Let's imagine I have the following strings:
String one = "123|abc|123abc";
String two = "123|ab12c|abc|456|abc|def";
String three = "123|1abc|1abc1|456|abc|wer";
String four = "123|abc|def|456|ghi|jkl|789|mno|pqr";
If I do a split on them I expect the following output:
one = ["123|abc|123abc"];
two = ["123|ab12c|abc", "456|abc|def"];
three = ["123|1abc|1abc1", "456|abc|wer"];
four = ["123|abc|def", "456|ghi|jkl", "789|mno|pqr"];
The string has the following structure:
Starts with 1 or more digits followed by a random number of (| followed by random number of characters).
When after a | it's only numbers is considered a new value.
More examples:
In - 123456|xxxxxx|zzzzzzz|xa2314|xzxczxc|1234|qwerty
Out - ["123456|xxxxxx|zzzzzzz|xa2314|xzxczxc", "1234|qwerty"]
Tried multiple variations of the following but does not work:
value.split( "\\|\\d+|\\d+" )
You may split on \|(?=\d+(?:\||$)):
List<String> nums = Arrays.asList(new String[] {
"123|abc|123abc",
"123|ab12c|abc|456|abc|def",
"123|1abc|1abc1|456|abc|wer",
"123|abc|def|456|ghi|jkl|789|mno|pqr"
});
for (String num : nums) {
String[] parts = num.split("\\|(?=\\d+(?:\\||$))");
System.out.println(num + " => " + Arrays.toString(parts));
}
This prints:
123|abc|123abc => [123|abc|123abc]
123|ab12c|abc|456|abc|def => [123|ab12c|abc, 456|abc|def]
123|1abc|1abc1|456|abc|wer => [123|1abc|1abc1, 456|abc|wer]
123|abc|def|456|ghi|jkl|789|mno|pqr => [123|abc|def, 456|ghi|jkl, 789|mno|pqr]
Instead of splitting, you can match the parts in the string:
\b\d+(?:\|(?!\d+(?:$|\|))[^|\r\n]+)*
\b A word boundary
\d+ Match 1+ digits
(?: Non capture group
\|(?!\d+(?:$|\|)) Match | and assert not only digits till either the next pipe or the end of the string
[^|\r\n]+ Match 1+ chars other than a pipe or a newline
)* Close the non capture group and optionally repeat (use + to repeat one or more times to match at least one pipe char)
Regex demo | Java demo
String regex = "\\b\\d+(?:\\|(?!\\d+(?:$|\\|))[^|\\r\\n]+)+";
String string = "123|abc|def|456|ghi|jkl|789|mno|pqr";
Pattern pattern = Pattern.compile(regex);
Matcher m = pattern.matcher(string);
List<String> matches = new ArrayList<String>();
while (m.find())
matches.add(m.group());
for (String s : matches)
System.out.println(s);
Output
123|abc|def
456|ghi|jkl
789|mno|pqr
I'm trying to search a string for a set of words contained in an ArrayList (terms_1pers). Since the precondition is that there should be no letters directly before or after the search word, I thought of using a regular expression.
I just don't know what I'm doing wrong with the matches method. In the code below, if the match is not verified, it writes to an external file.
String url = csvRecord.get("url");
String text = csvRecord.get("review");
String var = null;
for(String term : terms_1pers)
{
if(!text.matches("[^a-z]"+term+"[^a-z]"))
{
var="true";
}
}
if(!var.equals("true"))
{
bw.write(url+";"+text+"\n");
}
In order to find regex matches, you should use the regex classes Pattern and Matcher.
String term = "term";
ArrayList<String> a = new ArrayList<String>();
a.add("123term456"); //true
a.add("A123Term5"); //false
a.add("term456"); //true
a.add("123term"); //true
Pattern p = Pattern.compile("^[^A-Za-z]*(" + term + ")[^A-Za-z]*$");
for(String text : a) {
Matcher m = p.matcher(text);
if (m.find()) {
System.out.println("Found: " + m.group(1) );
// the term is wrapped in the only capturing group, so read group(1)
}
else System.out.println("No match for: " + term);
}
In the example above, we create an instance of Pattern (https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html) to find matches in the text you are matching against.
Note that I adjusted the regex a bit. The character class in this code excludes all letters A-Z and their lowercase versions from the parts before and after the term. It also allows for situations where there are no characters at all before or after the match term. If you need to have something there, use + instead of *. I also anchored the regex with ^ and $ so that the whole text must consist of these three parts. If this doesn't fit your use case, you may need to adjust.
To demonstrate using this with a variety of different terms:
ArrayList<String> terms = new ArrayList<String>();
terms.add("term");
terms.add("the book is on the table");
terms.add("1981 was the best year ever!");
ArrayList<String> a = new ArrayList<String>();
a.add("123term456");
a.add("A123Term5");
a.add("the book is on the table456");
a.add("1##!231981 was the best year ever!9#");
for (String term: terms) {
Pattern p = Pattern.compile("^[^A-Za-z]*(" + term + ")[^A-Za-z]*$");
for(String text : a) {
Matcher m = p.matcher(text);
if (m.find()) {
System.out.println("Found: " + m.group(1) + " in " + text);
// the term is wrapped in the only capturing group, so read group(1)
}
else System.out.println("No match for: " + term + " in " + text);
}
}
Output for this is:
Found: term in 123term456
No match for: term in A123Term5
No match for: term in the book is on the table456....
In response to the question about making String term case insensitive, here's a way to build the regex string by using java.lang.Character to add options for upper and lower case letters.
String term = "This iS the teRm.";
String matchText = "123This is the term.";
StringBuilder str = new StringBuilder();
str.append("^[^A-Za-z]*(");
for (int i = 0; i < term.length(); i++) {
char c = term.charAt(i);
if (Character.isLetter(c))
str.append("(" + Character.toLowerCase(c) + "|" + Character.toUpperCase(c) + ")");
else str.append(c);
}
str.append(")[^A-Za-z]*$");
System.out.println(str.toString());
Pattern p = Pattern.compile(str.toString());
Matcher m = p.matcher(matchText);
if (m.find()) System.out.println("Found!");
else System.out.println("Not Found!");
This code outputs two lines, the first line is the regex string that's being compiled in the Pattern. "^[^A-Za-z]*((t|T)(h|H)(i|I)(s|S) (i|I)(s|S) (t|T)(h|H)(e|E) (t|T)(e|E)(r|R)(m|M).)[^A-Za-z]*$" This adjusted regex allows for letters in the term to be matched regardless of case. The second output line is "Found!" because the mixed case term is found within matchText.
There are several things to note:
matches requires a full string match, so [^a-z]term[^a-z] will only match a string like :term.. You need to use .find() to find partial matches.
If you pass a literal string to a regex, you need to Pattern.quote it; otherwise, if it contains special chars, it may not be matched.
To check if a word has some pattern before or after or at the start/end, you should either use alternations with anchors (like (?:^|[^a-z]) or (?:$|[^a-z])) or lookarounds, (?<![a-z]) and (?![a-z]).
To match any letter just use \p{Alpha} or - if you plan to match any Unicode letter - \p{L}.
It is more logical to declare the var variable as a Boolean.
Fixed code:
String url = csvRecord.get("url");
String text = csvRecord.get("review");
Boolean var = false;
for(String term : terms_1pers)
{
Matcher m = Pattern.compile("(?<!\\p{L})" + Pattern.quote(term) + "(?!\\p{L})").matcher(text);
// If the search must be case insensitive use
// Matcher m = Pattern.compile("(?i)(?<!\\p{L})" + Pattern.quote(term) + "(?!\\p{L})").matcher(text);
if(!m.find())
{
var = true;
}
}
if (!var) {
bw.write(url+";"+text+"\n");
}
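The notes above on matches versus find, on Pattern.quote and on lookarounds can be checked with a small sketch. The class name WholeWordDemo, the helper containsTerm and the sample strings are assumptions made for illustration, not part of the original code.

```java
import java.util.regex.Pattern;

class WholeWordDemo {
    // Hypothetical helper: true when `term` occurs in `text` with no Unicode
    // letter directly before or after it.
    static boolean containsTerm(String text, String term) {
        Pattern p = Pattern.compile("(?<!\\p{L})" + Pattern.quote(term) + "(?!\\p{L})");
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        // matches() needs the pattern to cover the whole string ...
        System.out.println("I like it".matches("[^a-z]like[^a-z]"));
        // ... while find() looks for the pattern anywhere in the string.
        System.out.println(Pattern.compile("[^a-z]like[^a-z]").matcher("I like it").find());

        System.out.println(containsTerm("I like it", "like")); // surrounded by spaces
        System.out.println(containsTerm("unlikely", "like"));  // surrounded by letters
        // Pattern.quote makes the dot literal, so "1.5" is not found in "125".
        System.out.println(containsTerm("version 125", "1.5"));
        System.out.println(containsTerm("version 1.5", "1.5"));
    }
}
```

The lookarounds do not consume characters, so the term is also found at the very start or end of the text, which the [^a-z]term[^a-z] form cannot do.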
You did not consider the case where the start and end of the text may contain letters,
so adding .* at the front and end should solve your problem.
for(String term : terms_1pers)
{
if (text.matches(".*[^a-zA-Z]+" + term + "[^a-zA-Z]+.*"))
{
var="true";
break; //exit the loop
}
}
if(!var.equals("true"))
{
bw.write(url+";"+text+"\n");
}
I am pretty new to regular expressions and I need to create a pattern that could be used in matching up different text values(cases). I can use the created pattern but it can only be used in a single case. I would like to maximize the search pattern so that it can be used to different search texts.
By the way, I am using Java 8.
Objective:
Display matcher.find() by group.
Sample Search Texts and Expected output (Group):
Search Text: "employeeName:*borgy*";
Expected Output:
-
(employeeName) (:) (*) (borgy) (*)
-
Search Text: "employeeName:Borgy Manotoy*";
Expected Output:
-
(employeeName) (:) () (Borgy Manotoy) (*)
-
Search Text: "employeeName:*Borgy Manotoy*";
Expected Output:
-
(employeeName) (:) (*) (Borgy Manotoy) (*)
-
Search Text: "employeeEmail:*borgymanotoy#iyotbihagay.com*";
Expected Output:
-
(employeeEmail) (:) (*) (borgymanotoy#iyotbihagay.com) (*)
-
Search Text: "employeeEmail:borgymanotoy#iyotbihagay.com";
Expected Output:
-
(employeeEmail) (:) () (borgymanotoy#iyotbihagay.com) ()
-
Search Text: "employeeName:*Manotoy*, employeeEmail:*#iyotbihagay.*";
Expected Output:
-
(employeeName) (:) (*) (Manotoy) (*)
(employeeEmail) (:) (*) (#iyotbihagay.com) (*)
-
Search Text: "employeeName:*Manotoy*, employeeEmail:*#iyotbihagay.*, employeeRole:*bouncer*";
Expected Output:
-
(employeeName) (:) (*) (Manotoy) (*)
(employeeEmail) (:) (*) (#iyotbihagay.com) (*)
(employeeRole) (:) (*) (bouncer) (*)
-
Search pattern:
String searchPattern = "(\\w+?)(:|!)(\\p{Punct}?)(\\w+?) (.+?)?(\\p{Punct}?),";
Sample search texts:
String text1 = "employeeName:borgy";
String text2 = "employeeName:Borgy*";
String text3 = "employeeName:*borgy*";
String text4 = "employeeName:*Borgy*";
String text5 = "employeeName:*Borgy Manotoy*";
String text6 = "employeeEmail:*borgymanotoy#iyotbihagay.com*";
String text7 = "employeeEmail:borgymanotoy#iyotbihagay.com";
String text8 = "employeeEmail:borgymanotoy#iyotbihagay.*";
String text9 = "employeeEmail:*#iyotbihagay.*";
String text10 = "employeeName:*Manotoy*, employeeEmail:*#iyotbihagay.*";
Search texts using the given pattern:
processUserSearch(text1, searchPattern);
processUserSearch(text2, searchPattern);
processUserSearch(text3, searchPattern);
...
processUserSearch(text10, searchPattern);
Display found
private void processUserSearch(String searchText, String searchPattern) {
if (!Util.isEmptyOrNull(searchText) && !Util.isEmptyOrNull(searchPattern)) {
Pattern pattern = Pattern.compile(searchPattern);
Matcher matcher = pattern.matcher(searchText + ",");
while(matcher.find()) {
System.out.println("[matcher-count]: " + matcher.groupCount());
System.out.print("found: ");
for (int x = 1; x <= matcher.groupCount(); x++) {
System.out.print("(" + matcher.group(x) + ") ");
}
System.out.println("\n");
}
}
}
I suggest using
private static final Pattern pattern = Pattern.compile("(\\w+)([:!])(\\p{Punct}?)(.*?)(\\p{Punct}?)(?=$|,)");
private static void processUserSearch(String searchText) {
if (searchText != null && !searchText.isEmpty()) {
//if (!Util.isEmptyOrNull(searchText) && !Util.isEmptyOrNull(searchPattern)) {
Matcher matcher = pattern.matcher(searchText);
while(matcher.find()) {
System.out.println(searchText + "\n[matcher-count]: " + matcher.groupCount());
System.out.print("found: ");
for (int x = 1; x <= matcher.groupCount(); x++) {
System.out.print("(" + matcher.group(x) + ") ");
}
System.out.println("\n");
}
}
}
Note you can compile it once outside of the matching method for better efficiency.
Use as
String[] texts = new String[] { "employeeName:*borgy*","employeeName:Borgy Manotoy*","employeeName:*Borgy Manotoy*",
"employeeEmail:*borgymanotoy#iyotbihagay.com*","employeeEmail:borgymanotoy#iyotbihagay.com",
"employeeName:*Manotoy*, employeeEmail:*#iyotbihagay.*",
"employeeName:*Manotoy*, employeeEmail:*#iyotbihagay.*, employeeRole:*bouncer*"};
for (String s: texts) {
processUserSearch(s);
}
See the Java demo
Here is the regex demo:
(\w+)([:!])(\p{Punct}?)(.*?)(\p{Punct}?)(?=$|,)
Details
(\w+) - Group 1: one or more word chars
([:!]) - Group 2: a : or !
(\p{Punct}?) - Group 3: an optional punctuation char
(.*?) - Group 4: any 0+ chars other than line break chars
(\p{Punct}?) - Group 5: an optional punctuation char
(?=$|,) - an end of string or , should come immediately to the right of the current location (but they do not get added to the match value since it is a positive lookahead).
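To see how the five groups described above are filled for each criterion, here is a small sketch that collects them into lists instead of printing them. The class name SearchGroupsDemo and the helper parse are made up for illustration; the pattern is the one suggested above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SearchGroupsDemo {
    private static final Pattern CRITERION =
            Pattern.compile("(\\w+)([:!])(\\p{Punct}?)(.*?)(\\p{Punct}?)(?=$|,)");

    // Hypothetical helper: one list of five group values per criterion found.
    static List<List<String>> parse(String searchText) {
        List<List<String>> result = new ArrayList<>();
        Matcher m = CRITERION.matcher(searchText);
        while (m.find()) {
            List<String> groups = new ArrayList<>();
            for (int i = 1; i <= m.groupCount(); i++) {
                groups.add(m.group(i)); // optional groups yield "" when absent
            }
            result.add(groups);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("employeeName:Borgy Manotoy*"));
        System.out.println(parse("employeeName:*Manotoy*, employeeEmail:*#iyotbihagay.*"));
    }
}
```

Note that for the input *#iyotbihagay.* group 4 comes out as #iyotbihagay. (with the trailing dot), since only the final * is taken by group 5.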
I would like to maximize the search pattern so that it can be used to different search texts.
And what are "different search texts"? Be specific!
Your problem doesn't seem specific to Java. Your current pattern contains (:|!), but none of the examples suggest how !s may occur in the input. You use \p{Punct} to match the * surrounding the names and emails, but you have no examples of enclosures other than *. You don't say what the purpose of the *s is; are they enclosures, wildcard patterns, or something else?
The following pattern seems to work for some purposes:
(?:employee(Name|Email)):([\w*#. ]+)
I've tested my regex on Regex101, and all the groups were captured and matched my string. But now, when I try to use it in Java, it returns a
java.lang.IllegalStateException: No match found on line 9
String subjectCode = "02 credits between ----";
String regex1 = "^(\\d+).*credits between --+.*?$";
Pattern p1 = Pattern.compile(regex1);
Matcher m;
if(subjectCode.matches(regex1)){
m = p1.matcher(regex1);
m.find();
[LINE 9]Integer subjectCredits = Integer.valueOf(m.group(1));
System.out.println("Subject Credits: " + subjectCredits);
}
How's that possible and what's the problem?
Here is a fix and optimizations (thanks go to #cricket_007):
String subjectCode = "02 credits between ----";
String regex1 = "(\\d+).*credits between --+.*";
Pattern p1 = Pattern.compile(regex1);
Matcher m = p1.matcher(subjectCode);
if (m.matches()) {
Integer subjectCredits = Integer.valueOf(m.group(1));
System.out.println("Subject Credits: " + subjectCredits);
}
You need to pass the input string to the matcher. As a minor enhancement, you can use just 1 Matcher#matches and then access the captured group if there is a match. The regex does not need ^ and $ since with matches() the whole input should match the pattern.
See IDEONE demo
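A short sketch of the same fix wrapped in a method, which also reproduces the exception from the question. The class name CreditsDemo and the helper parseCredits are assumptions made for illustration. Calling group() on a Matcher that has not matched successfully is what raises IllegalStateException.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CreditsDemo {
    private static final Pattern CREDITS =
            Pattern.compile("(\\d+).*credits between --+.*");

    // Hypothetical helper: the leading number, or empty when the input does not match.
    static OptionalInt parseCredits(String subjectCode) {
        Matcher m = CREDITS.matcher(subjectCode); // match against the input, not the regex
        // group(1) is only read after matches() has succeeded.
        return m.matches()
                ? OptionalInt.of(Integer.parseInt(m.group(1)))
                : OptionalInt.empty();
    }

    public static void main(String[] args) {
        System.out.println(parseCredits("02 credits between ----"));
        System.out.println(parseCredits("no credits here"));

        try {
            // No match has been attempted on this matcher, so group() throws.
            CREDITS.matcher("02 credits between ----").group(1);
        } catch (IllegalStateException e) {
            System.out.println("IllegalStateException: " + e.getMessage());
        }
    }
}
```

Returning OptionalInt is a design choice of this sketch: it forces the caller to handle the no-match case instead of relying on a prior matches() check.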