Question about Java regex - java

I get a string from an array list:
array.get(0).toString()
gives TITLE = "blabla"
I want the string blabla, so I tried this:
Pattern p = Pattern.compile("(\".*\")");
Matcher m = p.matcher(array.get(0).toString());
System.out.println("Title : " + m.group(0));
It doesn't work: java.lang.IllegalStateException: No match found
I also tried:
Pattern p = Pattern.compile("\".*\"");
Pattern p = Pattern.compile("\".*\"");
Pattern p = Pattern.compile("\\\".*\\\"");
Nothing matches in my program but ALL patterns work on http://www.fileformat.info/tool/regex.htm
Any ideas? Thanks in advance.

A couple of points:
The Javadoc for Matcher#group states:
IllegalStateException - If no match has yet been attempted, or if the previous match operation failed
That is, before using group, you must first use m.matches (to match the entire sequence), or m.find (to match a subsequence).
Secondly, you actually want m.group(1), since m.group(0) is the whole match.
Actually, this isn't so important here, since the regexp in question starts and ends with the capture parentheses, so group(0) is the same string as group(1). It would matter if your regexp looked like: "TITLE = (\".*\")"
Example code:
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.junit.Test;
@SuppressWarnings("serial")
public class MatcherTest {
@Test(expected = IllegalStateException.class)
public void testIllegalState() {
List<String> array = new ArrayList<String>() {{ add("Title: \"blah\""); }};
Pattern p = Pattern.compile("(\".*\")");
Matcher m = p.matcher(array.get(0).toString());
System.out.println("Title : " + m.group(0));
}
@Test
public void testLegal() {
List<String> array = new ArrayList<String>() {{ add("Title: \"blah\""); }};
Pattern p = Pattern.compile("(\".*\")");
Matcher m = p.matcher(array.get(0).toString());
if (m.find()) {
System.out.println("Title : " + m.group(1));
}
}
}
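To make the group(0) versus group(1) point concrete, here is a minimal sketch that puts a literal prefix in front of the capture group. The class name GroupDemo and the helper firstGroup are made up for illustration, and the pattern captures only the text between the quotes, which is an assumption about what the asker ultimately wants.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Returns the requested group of the first match, or null if nothing matches.
    static String firstGroup(String regex, String input, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        // find() must succeed before group() may be called.
        return m.find() ? m.group(group) : null;
    }

    public static void main(String[] args) {
        String input = "TITLE = \"blabla\"";
        // The parentheses sit inside the quotes, so group 1 excludes them.
        String regex = "TITLE = \"(.*)\"";
        System.out.println(firstGroup(regex, input, 0)); // the whole match
        System.out.println(firstGroup(regex, input, 1)); // only the captured text
    }
}
```

Here group(0) is the full matched text including the TITLE prefix, while group(1) is just blabla.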

You need to call find() or matches() on the Matcher instance first: these actually execute the regular expression and return whether it matched or not. And then only if it matched you can call the methods to get the match groups.
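As a rough illustration of the difference between the two calls, the sketch below runs the asker's pattern against the sample string. The class and method names are invented for this example.

```java
import java.util.regex.Pattern;

class FindVersusMatches {
    // The asker's pattern: a quote, anything, a quote.
    static final Pattern QUOTED = Pattern.compile("\".*\"");

    // matches() succeeds only if the pattern covers the entire input.
    static boolean wholeInputMatches(String s) {
        return QUOTED.matcher(s).matches();
    }

    // find() succeeds if the pattern occurs anywhere in the input.
    static boolean containsMatch(String s) {
        return QUOTED.matcher(s).find();
    }

    public static void main(String[] args) {
        String input = "TITLE = \"blabla\"";
        System.out.println(wholeInputMatches(input)); // false: "TITLE = " is not covered
        System.out.println(containsMatch(input));     // true: the quoted part is found
    }
}
```

For the string in the question, find() is therefore the call to use unless the pattern is extended to cover the TITLE prefix as well.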

Are you including the double quotes (") in the string?
All your regexes contain escaped double quotes and will only match if the string in the list includes double quote characters.

Related

Looking for A Regular expression to match java regex (punct) pattern

I am looking for help with a regular expression that will match the studentIdMatch2 value in the class below. studentIdMatch1 matches fine. However, studentIdMatch2 has a studentId that can contain any special character other than :, ^ and comma, so the match fails. Thank you for your time; I appreciate your support.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class TestRegEx {
public static void main(String args[]){
String studentIdMatch1 = "studentName:harry,^studentId:Id123";
String studentIdMatch2 = "studentName:harry,^studentId:Id-H/MPU/L&T/OA+_T/(1490)/17#)123";
Pattern pattern = Pattern
.compile("(\\p{Punct}?)(\\w+?)(:)(\\p{Punct}?)(\\w+?)(\\p{Punct}?),");
Matcher matcher = pattern.matcher(studentIdMatch1 + ","); // Works Fine(Matches Student Name and Id)
// No Special Characters in StudentId
//Matcher matcher = pattern.matcher(studentIdMatch2 + ","); //Wont work Special Characters in StudentId. Matches Student Name
while (matcher.find()) {
System.out.println("group1 = "+matcher.group(1)+ "group2 = "+matcher.group(2) +"group3 = "+matcher.group(3) +"group4 = "+matcher.group(4)+"group5 = "+matcher.group(5));
}
System.out.println("match ended");
}
}
You may try:
^SutdentName:(\w+),\^StudenId:([^\s,^:]+)$
Explanation of the above regex:
^, $ - Represents start and end of line respectively.
SutdentName: - Matches SutdentName: literally. I think it should be StudentName, but I didn't change it.
(\w+) - Represents first capturing group matching only word characters i.e. [A-Za-z0-9_] one or more times greedily.
,\^StudenId: - Matches ,^StudenId: literally. Here too, I guess it should be StudentId.
([^\s,^:]+) - Represents second capturing group matching everything other than white-space, ,, ^ and : one or more times greedily. You can add others according to your requirements.
You can find the demo of the above regex in here.
Sample Implementation in java:
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class Main
{
private static final Pattern pattern = Pattern.compile("^SutdentName:(\\w+),\\^StudenId:([^\\s,^:]+)$", Pattern.MULTILINE);
public static void main(String[] args) {
String string = "SutdentName:harry,^StudenId:Id123\n"
+ "SutdentName:harry,^StudenId:Id-H/MNK/U&T/BA+_T/(1490)/17#)123";
Matcher matcher = pattern.matcher(string);
while(matcher.find()){
System.out.println(matcher.group(1) + " " + matcher.group(2));
}
}
}
You can find the sample run of the above code in here.
The second (\\w+?) only captures word characters, so change it to capture what you want, i.e.
allow all the special characters other than : and ^ and comma
like ([^:^,]+?)
^ - Negates the character class (when it is the first character inside the brackets)
:^, - Matches :, ^ and comma
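A minimal sketch of that change applied to the asker's original pattern follows. The class name StudentFields and the method parse are made up for illustration; as in the question, a trailing comma is appended to the input so that the last field is terminated.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StudentFields {
    // Same as the pattern in the question, except that group 5 is now
    // ([^:^,]+?) instead of (\w+?), so values may contain special characters.
    static final Pattern FIELD = Pattern.compile(
            "(\\p{Punct}?)(\\w+?)(:)(\\p{Punct}?)([^:^,]+?)(\\p{Punct}?),");

    // Collects key (group 2) and value (group 5) of every field, in order.
    static Map<String, String> parse(String input) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = FIELD.matcher(input + ",");
        while (m.find()) {
            fields.put(m.group(2), m.group(5));
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parse("studentName:harry,^studentId:Id123"));
        System.out.println(parse(
                "studentName:harry,^studentId:Id-H/MPU/L&T/OA+_T/(1490)/17#)123"));
    }
}
```

Both inputs from the question should now yield a studentName and a studentId entry.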

Need Regex Support for List Object

I have the following program,
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
public class Regex {
public static void main(String[] args) {
String VALID_GUID_REGEX = "[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[89abAB][0-9a-fA-F]{3}-[0-9a-fA-F]{12}";
Pattern NOT_PREFIXED_FILES_REGEX =
Pattern.compile("(^"+VALID_GUID_REGEX+"/\\b(foo|bar)\\b.*)|^[^/]+$");
List<String> list = new ArrayList<>();
list.add("256a5037-9fc1-4e60-95c3-523d5ae1c935/foo/44434038019,2019-05-24T09:02:18.695Z,b4786bf4-157a-4f1b-a030-4c5416e1884a");
list.add("256a5037-9fc1-4e60-95c3-523d5ae1c935/bar/44434038019,2019-05-24T09:02:18.695Z,b4786bf4-157a-4f1b-a030-4c5416e1884a");
list.add("govcorp/123a5037-9fc1-4e60-95c3-523d5ae1c935/foo/text.doc");
list.add("156a5037-9fc1-4e60-95c3-523d5ae1c935/123a5037-9fc1-4e60-95c3-523d5ae1c935/delta/text.doc");
list.add("123a5037-9fc1-4e60-95c3-523d5ae1c935/");
String[] keys = list.stream()
.filter(k -> NOT_PREFIXED_FILES_REGEX.matcher(k).find())
.toArray(String[]::new);
System.out.println(Arrays.toString(keys));
}
}
The code works fine except for the last item in the list. I need the following conditions to be satisfied:
256a5037-9fc1-4e60-95c3-523d5ae1c935/foo/44434038019,2019-05-24T09:02:18.695Z,b4786bf4-157a-4f1b-a030-4c5416e1884a -- Pass
256a5037-9fc1-4e60-95c3-523d5ae1c935/bar/44434038019,2019-05-24T09:02:18.695Z,b4786bf4-157a-4f1b-a030-4c5416e1884a -- Pass
govcorp/123a5037-9fc1-4e60-95c3-523d5ae1c935/foo/text.doc -- Fail
156a5037-9fc1-4e60-95c3-523d5ae1c935/123a5037-9fc1-4e60-95c3-523d5ae1c935/foo/text.doc -- Fail
123a5037-9fc1-4e60-95c3-523d5ae1c935/ - Pass
Let's consider first line,
256a5037-9fc1-4e60-95c3-523d5ae1c935/bar/44434038019,2019-05-24T09:02:18.695Z,b4786bf4-157a-4f1b-a030-4c5416e1884a
If I give the input "256a5037-9fc1-4e60-95c3-523d5ae1c935/" it should pass, and "256a5037-9fc1-4e60-95c3-523d5ae1c935/bar/" should pass too. I am getting the file path from a server.
Now consider the fail cases: "govcorp/" should fail and "govcorp/123a5037-9fc1-4e60-95c3-523d5ae1c935/" should fail.
A path with two GUIDs in sequence should also fail, such as
156a5037-9fc1-4e60-95c3-523d5ae1c935/123a5037-9fc1-4e60-95c3-523d5ae1c935/ - FAIL
A path with only one GUID, such as "123e4567-e89b-12d3-a456-426655440001/", should pass.
Here, we would first fail our undesired strings with a simple expression:
^((?!\.doc).)*$
Demo 1
Then, for the remaining strings, we would design a second expression. In this case your original expression works just fine, and we might just want to wrap it in a capturing group:
([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[89abAB][0-9a-fA-F]{3}-[0-9a-fA-F]{12})
Demo 2
Test
import java.util.regex.Matcher;
import java.util.regex.Pattern;
final String regex = "([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[89abAB][0-9a-fA-F]{3}-[0-9a-fA-F]{12})";
final String string = "256a5037-9fc1-4e60-95c3-523d5ae1c935/foo/44434038019,2019-05-24T09:02:18.695Z,b4786bf4-157a-4f1b-a030-4c5416e1884a\n"
+ "256a5037-9fc1-4e60-95c3-523d5ae1c935/bar/44434038019,2019-05-24T09:02:18.695Z,b4786bf4-157a-4f1b-a030-4c5416e1884a\n"
+ "123a5037-9fc1-4e60-95c3-523d5ae1c935/";
final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println("Full match: " + matcher.group(0));
for (int i = 1; i <= matcher.groupCount(); i++) {
System.out.println("Group " + i + ": " + matcher.group(i));
}
}
Do you want to match every .doc path with the regex, or just the lines that contain a substring matching your existing regex, including the .doc part?
In the latter case, surround your regex like this: .*\b {regex} \b.*
This way, the whole line is matched, and the match is still captured.
^(.*\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[89abAB][0-9a-fA-F]{3}-[0-9a-fA-F]{12})\b.*
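Another way to read the pass/fail table in the question is "exactly one GUID at the start, a slash, then either nothing or a path beginning with foo or bar". The sketch below encodes that reading; it is an interpretation of the stated requirements, not something confirmed in the thread, and the class and method names are invented for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class GuidPrefixFilter {
    static final String GUID =
            "[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[89abAB][0-9a-fA-F]{3}-[0-9a-fA-F]{12}";

    // One GUID, a slash, then optionally a path that starts with foo or bar.
    // The optional group is what lets a bare "GUID/" pass.
    static final Pattern SINGLE_GUID_PREFIX =
            Pattern.compile("^" + GUID + "/(?:(?:foo|bar)\\b.*)?$");

    static boolean passes(String key) {
        return SINGLE_GUID_PREFIX.matcher(key).matches();
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList(
                "256a5037-9fc1-4e60-95c3-523d5ae1c935/foo/44434038019",
                "govcorp/123a5037-9fc1-4e60-95c3-523d5ae1c935/foo/text.doc",
                "156a5037-9fc1-4e60-95c3-523d5ae1c935/123a5037-9fc1-4e60-95c3-523d5ae1c935/delta/text.doc",
                "123a5037-9fc1-4e60-95c3-523d5ae1c935/");
        // Keep only the keys that satisfy the single-GUID-prefix rule.
        List<String> kept = keys.stream()
                .filter(GuidPrefixFilter::passes)
                .collect(Collectors.toList());
        System.out.println(kept);
    }
}
```

Under this reading, a path such as GUID/delta/text.doc would fail as well, because only foo and bar are allowed after the GUID.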

Regex to exclude word from matches java code

Maybe someone can help me. I'm trying to write a regex in Java that matches everything in a string except ZZ78, and I'd like to know what is missing in the regex I have.
The input string is str = "ab57cdZZ78efghZZ7ij#klmZZ78noCODpqrZZ78stuvw27z#xyzZZ78"
and I'm trying this regex: (?:(?![ZZ78]).)* but if you test it at http://regexpal.com/
against the string, you'll see that it does not work completely.
String str = "ab57cdZZ78efghZZ7ij#klmZZ78noCODpqrZZ78stuvw27z#xyzZZ78";
Pattern pattern = Pattern.compile("(?:(?![ZZ78]).)*");
the matched strings should be
ab57cd
efghZZ7ij#klm
noCODpqr
stuvw27z#xyz
Update:
Hello Avinash Raj and Chthonic Project. Thanks so much for your help and the solutions provided.
I originally thought of the split method, but I was trying to avoid getting empty strings in the result
when, for example, the delimiter string is at the beginning or at the end of the main string.
Then I thought that a regex could help me extract everything except "ZZ78", avoiding
empty results in the output that way.
Below I show the code using the split method (Chthonic's) and the regex (Avinash's). Both produce empty
strings if the commented-out "if()" conditions are not used.
Is using those "if()" conditions the only way to avoid printing empty strings, or could the regex
be tweaked a little to match only non-empty strings?
This is the code I have tested so far:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexTest {
public static void main(String[] args) {
System.out.println("########### Matches with Split ###########");
String str = "ZZ78ab57cdZZ78efghZZ7ij#klmZZ78noCODpqrZZ78stuvw27z#xyzZZ78";
for (String s : str.split("ZZ78")) {
//if ( !s.isEmpty() ) {
System.out.println("This is a match <<" + s + ">>");
//}
}
System.out.println("##########################################");
System.out.println("########### Matches with Regex ###########");
String s = "ZZ78ab57cdZZ78efghZZ7ij#klmZZ78noCODpqrZZ78stuvw27z#xyzZZ78";
Pattern regex = Pattern.compile("((?:(?!ZZ78).)*)(ZZ78|$)");
Matcher matcher = regex.matcher(s);
while(matcher.find()){
//if ( !matcher.group(1).isEmpty() ) {
System.out.println("This is a match <<" + matcher.group(1) + ">>");
//}
}
}
}
And the output (without the "if()" conditions):
########### Matches with Split ###########
This is a match <<>>
This is a match <<ab57cd>>
This is a match <<efghZZ7ij#klm>>
This is a match <<noCODpqr>>
This is a match <<stuvw27z#xyz>>
##########################################
########### Matches with Regex ###########
This is a match <<>>
This is a match <<ab57cd>>
This is a match <<efghZZ7ij#klm>>
This is a match <<noCODpqr>>
This is a match <<stuvw27z#xyz>>
This is a match <<>>
Thanks for help so far.
Thanks in advance
Update #2:
Excellent both of your answers and solutions. Now it works very nice. This is the final code I've tested with both solutions.
Many thanks again.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexTest {
public static void main(String[] args) {
System.out.println("########### Matches with Split ###########");
String str = "ZZ78ab57cdZZ78efghZZ7ij#klmZZ78noCODpqrZZ78stuvw27z#xyzZZ78";
Arrays.stream(str.split("ZZ78")).filter(s -> !s.isEmpty()).forEach(System.out::println);
System.out.println("##########################################");
System.out.println("########### Matches with Regex ###########");
String s = "ZZ78ab57cdZZ78efghZZ7ij#klmZZ78noCODpqrZZ78stuvw27z#xyzZZ78";
Pattern regex = Pattern.compile("((?:(?!ZZ78).)*)(ZZ78|$)");
Matcher matcher = regex.matcher(s);
ArrayList<String> allMatches = new ArrayList<String>();
ArrayList<String> list = new ArrayList<String>();
while(matcher.find()){
allMatches.add(matcher.group(1));
}
for (String s1 : allMatches)
if (!s1.equals(""))
list.add(s1);
System.out.println(list);
}
}
And output:
########### Matches with Split ###########
ab57cd
efghZZ7ij#klm
noCODpqr
stuvw27z#xyz
##########################################
########### Matches with Regex ###########
[ab57cd, efghZZ7ij#klm, noCODpqr, stuvw27z#xyz]
The easiest way to do this is as follows:
public static void main(String[] args) {
String str = "ab57cdZZ78efghZZ7ij#klmZZ78noCODpqrZZ78stuvw27z#xyzZZ78";
for (String s : str.split("ZZ78"))
System.out.println(s);
}
The output, as expected, is:
ab57cd
efghZZ7ij#klm
noCODpqr
stuvw27z#xyz
If the pattern used to split the string is at the beginning (i.e. "ZZ78" in your example code), the first element returned will be an empty string, as you have already noted. To avoid that, all you need to do is filter the array. This is essentially the same as adding an if, but it avoids the extra condition line. I would do it as follows (in Java 8):
String str = ...; // whatever string you want to test it with
Arrays.stream(str.split("ZZ78")).filter(s -> !s.isEmpty()).forEach(System.out::println);
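For a self-contained version of the same idea, here is a small sketch. The class name SplitDemo and the helper nonEmptyParts are hypothetical, and Pattern.quote is used only so that the delimiter is treated as literal text.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class SplitDemo {
    // Splits on a literal delimiter and drops any empty pieces.
    static List<String> nonEmptyParts(String input, String delimiter) {
        return Arrays.stream(input.split(Pattern.quote(delimiter)))
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String str = "ZZ78abZZ78cdZZ78";
        // split() keeps the leading empty string but discards trailing ones.
        System.out.println(Arrays.toString(str.split("ZZ78")));
        System.out.println(nonEmptyParts(str, "ZZ78"));
    }
}
```

Collecting into a list instead of printing directly makes the result easier to reuse, at the cost of one extra import.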
You need to remove the character class, since [ZZ78] matches a single character from the given list. (?:(?!ZZ78).)* alone won't give the match you want. Consider ab57cdZZ78 as an input string. At first, (?:(?!ZZ78).)* matches the string ab57cd. Next it tries to match the following Z and checks the condition (?!ZZ78), which means "match any character, as long as ZZ78 does not start here". So it fails to match that Z, and the regex engine moves on to the next character, the second Z, and checks the (?!ZZ78) condition again. Because the second Z isn't followed by Z78, this Z gets matched by the regex engine.
String s = "ab57cdZZ78efghZZ7ij#klmZZ78noCODpqrZZ78stuvw27z#xyzZZ78";
Pattern regex = Pattern.compile("((?:(?!ZZ78).)*)(ZZ78|$)");
Matcher matcher = regex.matcher(s);
while(matcher.find()){
System.out.println(matcher.group(1));
}
Output:
ab57cd
efghZZ7ij#klm
noCODpqr
stuvw27z#xyz
Explanation:
((?:(?!ZZ78).)*) Captures zero or more characters, stopping before any occurrence of ZZ78.
(ZZ78|$) Captures the following ZZ78, or matches the end-of-line anchor, as group 2.
Group 1 contains the run of characters that precedes ZZ78.
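The difference between the character class and the lookahead on the literal can be seen by comparing the first match of each pattern. This is only a sketch; the class name LookaheadDemo and the helper firstMatch are made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadDemo {
    // Returns the text of the first match, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "ab57cdZZ78";
        // [ZZ78] is the set {Z, 7, 8}, so the run stops at the first 7.
        System.out.println(firstMatch("(?:(?![ZZ78]).)*", input));
        // (?!ZZ78) only blocks positions where the whole literal ZZ78 begins.
        System.out.println(firstMatch("(?:(?!ZZ78).)*", input));
    }
}
```

The first pattern stops after ab5, while the second one runs up to the delimiter and yields ab57cd.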
Update:
String s = "ZZ78ab57cdZZ78efghZZ7ij#klmZZ78noCODpqrZZ78stuvw27z#xyzZZ78";
Pattern regex = Pattern.compile("((?:(?!ZZ78).)*)(ZZ78|$)");
Matcher matcher = regex.matcher(s);
ArrayList<String> allMatches = new ArrayList<String>();
ArrayList<String> list = new ArrayList<String>();
while(matcher.find()){
allMatches.add(matcher.group(1));
}
for (String s1 : allMatches)
if (!s1.equals(""))
list.add(s1);
System.out.println(list);
Output:
[ab57cd, efghZZ7ij#klm, noCODpqr, stuvw27z#xyz]
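On the asker's follow-up question of whether the regex itself can be tweaked so that it never produces empty strings: one possible approach, not taken from either answer, is to anchor each match with \G, skip any delimiters first, and then require at least one character with + instead of *. The sketch below assumes that reading; the class name NonEmptyChunks and the method chunks are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NonEmptyChunks {
    // \G        : continue exactly where the previous match ended
    // (?:ZZ78)* : skip any delimiters at that position
    // (...)+    : then capture at least one character that does not start ZZ78
    static final Pattern CHUNK =
            Pattern.compile("\\G(?:ZZ78)*((?:(?!ZZ78).)+)");

    static List<String> chunks(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = CHUNK.matcher(input);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(chunks(
                "ZZ78ab57cdZZ78efghZZ7ij#klmZZ78noCODpqrZZ78stuvw27z#xyzZZ78"));
    }
}
```

Simply changing * to + in the original pattern would not be enough, because after failing at a delimiter the engine would retry one character later and match from the second Z onward; the \G anchor is what prevents that.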

Java how to use pattern matcher using regular expressions to find certain string

I am not familiar with Patterns & matchers, and I am pretty stuck with this problem.
I have a string that I would like to manipulate in Java.
I understand that I have to use
Pattern p = Pattern.compile("\\[(.*?)\\]");
Matcher m = p.matcher(in);
while(m.find()) {
String found = m.group(1).toString();
}
But in my string, let's say I have the following examples:
String in = "test anything goes here [A#1234|567]"; //OR
String in = "test anything goes here [B#1234|567]"; //OR
String in = "test anything goes here [C#1234|567]";
I want to find [A# | ] or [B# | ] or [C# | ] in the string, how do I use the regex to find the expression?
Use [ABC]# in your regex to match your expression.
Pattern p = Pattern.compile("(\\[[ABC]#.*?\\])");
If the fields are digits then you can safely use \d+
Pattern p = Pattern.compile("(\\[[ABC]#\\d+\\|\\d+\\])");
I'd use a simple Pattern as in the following example:
String[] in = { "test anything goes here [A#1234|567]",
"test anything goes here [B#1234|567]",
"test anything goes here [C#1234|567]" };
Pattern p = Pattern.compile("\\[[A-Z]#\\d+\\|\\d+\\]");
for (String s: in) {
Matcher m = p.matcher(s);
while (m.find()) {
System.out.println("Found: " + m.group());
}
}
Output
Found: [A#1234|567]
Found: [B#1234|567]
Found: [C#1234|567]
I'm assuming here that your Pattern has specific restrictions:
Starts with [
Followed by one upper-case non-accented letter
Followed by #
Followed by one or more digits
Followed by |
Followed by one or more digits
Followed by ]
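If the individual parts are needed rather than the whole bracketed token, the same restrictions can be expressed with capture groups. This is a sketch under that assumption; the class name TagParser and the method parse are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagParser {
    // [ letter # digits | digits ] with the three variable parts captured.
    static final Pattern TAG =
            Pattern.compile("\\[([A-Z])#(\\d+)\\|(\\d+)\\]");

    // Returns {letter, firstNumber, secondNumber}, or null if no tag is found.
    static String[] parse(String input) {
        Matcher m = TAG.matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] parts = parse("test anything goes here [A#1234|567]");
        System.out.println(String.join(", ", parts));
    }
}
```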
Try:
Pattern p = Pattern.compile("(\\[[A-Z]#.*\\])");
This matches any capital letter from A through Z. It is unclear, though, whether you want all the data between the brackets.
My solution:
import java.util.*;
import java.lang.*;
import java.io.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
class Some {
public static void main(String[] args) throws java.lang.Exception {
String[] in = {
"test anything goes here [A#1234|567]",
"test anything goes here [B#1234|567]",
"test anything goes here [C#1234|567]"
};
Pattern p = Pattern.compile("\\[(.*?)\\]");
for (String s: in ) {
Matcher m = p.matcher(s);
while (m.find()) {
System.out.println("Found: " + m.group().replaceAll("\\d", ""));
}
}
}
}
This uses your original regex.
Demo: http://ideone.com/4Z5oYD

getting a string with quotes

I have a string "Hello" hello (including the quotes) and I just want to get the Hello that is in quotes, but without the quotes.
I tried using a regular expression, but I'm guessing it never finds the quotes.
String s = new String("string");
Pattern p = Pattern.compile("\"([^\"])\"");
Matcher m = p.matcher(n);
while (m.find()) {
s = m.group(1);
}
The while loop never gets executed. Any suggestions?
-- Moved the star inside the parenthesis for proper grouping ---
"\"([^\"]*)\""
Tested successfully with the code
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Test {
public static void main(String[] args) {
String s = new String("\"Hello\" hello");
Pattern p = Pattern.compile("\"([^\"]*)\"");
Matcher m = p.matcher(s);
while (m.find()) {
System.out.println(m.group(1));
}
}
}
which produced the expected output
Hello
-- Original post follows --
You don't match anything because your regex is written to match only quoted one-character strings.
"\"([^\"])*\""
is closer to what you need. Note the star: it means zero or more of the preceding expression. In this case the preceding expression is "any character that is not a double quote".
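The placement of the star relative to the parentheses matters for what group(1) returns, which is why the edit at the top of this answer moved it inside. A minimal sketch comparing the three variants (class and method names are made up for illustration):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StarPlacement {
    // Returns group 1 of the first match, or null if the pattern does not match.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "\"Hello\" hello";
        // No star: only a single quoted character can match, so nothing is found.
        System.out.println(firstGroup("\"([^\"])\"", input));
        // Star outside the group: the group is repeated and keeps only
        // the last character it matched.
        System.out.println(firstGroup("\"([^\"])*\"", input));
        // Star inside the group: the group captures the whole run.
        System.out.println(firstGroup("\"([^\"]*)\"", input));
    }
}
```

With the star outside, the pattern matches but group(1) holds just the final letter; with the star inside, it holds Hello.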
I suggest you try a String which has quotes in it if you want to find any. ;)
Try
String s = "start \"string\" end";
or
String s = "\"Hello\" hello";
You can simply use indexOf("\"") in this case.
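A rough sketch of the indexOf approach, assuming only the first quoted section is wanted (the class name QuoteByIndex and the method firstQuoted are invented for the example):

```java
class QuoteByIndex {
    // Returns the text between the first pair of double quotes, or null
    // if the input does not contain two double quotes.
    static String firstQuoted(String s) {
        int open = s.indexOf('"');
        if (open < 0) {
            return null;
        }
        // Search for the closing quote after the opening one.
        int close = s.indexOf('"', open + 1);
        if (close < 0) {
            return null;
        }
        return s.substring(open + 1, close);
    }

    public static void main(String[] args) {
        System.out.println(firstQuoted("\"Hello\" hello"));
    }
}
```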
