Looking for A Regular expression to match java regex (punct) pattern - java

I am looking for help with a regex that will match the studentIdMatch2 value in the class below. studentIdMatch1 matches fine. However, the studentId in studentIdMatch2 can contain any special character other than :, ^ and comma, so the pattern does not work. Thank you for your time; I appreciate your support.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class TestRegEx {
public static void main(String args[]){
String studentIdMatch1 = "studentName:harry,^studentId:Id123";
String studentIdMatch2 = "studentName:harry,^studentId:Id-H/MPU/L&T/OA+_T/(1490)/17#)123";
Pattern pattern = Pattern
.compile("(\\p{Punct}?)(\\w+?)(:)(\\p{Punct}?)(\\w+?)(\\p{Punct}?),");
Matcher matcher = pattern.matcher(studentIdMatch1 + ","); // Works Fine(Matches Student Name and Id)
// No Special Characters in StudentId
//Matcher matcher = pattern.matcher(studentIdMatch2 + ","); //Wont work Special Characters in StudentId. Matches Student Name
while (matcher.find()) {
System.out.println("group1 = "+matcher.group(1)+ "group2 = "+matcher.group(2) +"group3 = "+matcher.group(3) +"group4 = "+matcher.group(4)+"group5 = "+matcher.group(5));
}
System.out.println("match ended");
}
}

You may try:
^SutdentName:(\w+),\^StudenId:([^\s,^:]+)$
Explanation of the above regex:
^, $ - Represents start and end of line respectively.
SutdentName: - Matches SutdentName: literally. I suspect it should be StudentName, but I did not change it.
(\w+) - Represents first capturing group matching only word characters i.e. [A-Za-z0-9_] one or more times greedily.
,\^StudenId: - Matches ,^StudenId: literally. Here too, I guess it should be StudentId.
([^\s,^:]+) - Represents second capturing group matching everything other than white-space, ,, ^ and : one or more times greedily. You can add others according to your requirements.
You can find the demo of the above regex in here.
Sample Implementation in java:
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class Main
{
private static final Pattern pattern = Pattern.compile("^SutdentName:(\\w+),\\^StudenId:([^\\s,^:]+)$", Pattern.MULTILINE);
public static void main(String[] args) {
String string = "SutdentName:harry,^StudenId:Id123\n"
+ "SutdentName:harry,^StudenId:Id-H/MNK/U&T/BA+_T/(1490)/17#)123";
Matcher matcher = pattern.matcher(string);
while(matcher.find()){
System.out.println(matcher.group(1) + " " + matcher.group(2));
}
}
}
You can find the sample run of the above code in here.

The second (\\w+?) only captures word characters, so change it to capture what you want, i.e.
allow all the special characters other than : and ^ and comma
with something like ([^:^,]+?)
^ - At the start of the character class, negates the match
:^, - The excluded characters: :, ^ and comma
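A minimal sketch of that change, applied to the pattern from the question. The class name StudentIdDemo and the helper fieldValues are made up for illustration. Only group 5 of the original pattern is swapped for the negated class, written as [^:^,] exactly as suggested above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StudentIdDemo {
    // Same pattern as in the question, except that group 5 now accepts
    // any character other than ':', '^' and ','.
    private static final Pattern FIELD = Pattern.compile(
            "(\\p{Punct}?)(\\w+?)(:)(\\p{Punct}?)([^:^,]+?)(\\p{Punct}?),");

    // Hypothetical helper: returns "name=value" for every field found.
    static List<String> fieldValues(String input) {
        List<String> fields = new ArrayList<>();
        // As in the question, a trailing comma is appended so that the
        // last field is also terminated by ','.
        Matcher matcher = FIELD.matcher(input + ",");
        while (matcher.find()) {
            fields.add(matcher.group(2) + "=" + matcher.group(5));
        }
        return fields;
    }

    public static void main(String[] args) {
        String studentIdMatch2 =
                "studentName:harry,^studentId:Id-H/MPU/L&T/OA+_T/(1490)/17#)123";
        for (String field : fieldValues(studentIdMatch2)) {
            System.out.println(field);
        }
    }
}
```

One thing to keep in mind with this pattern: group 5 is lazy and is followed by an optional \p{Punct}, so if a value ends in a punctuation character, that last character lands in group 6 rather than group 5.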

Related

RegEx for capturing digits from a string

I have this String:
String filename = "20190516.BBARC.GLIND.statistics.xml";
How can I get the first part of the String (the numbers) without using substring?
Here, we might just want to collect our digits using a capturing group, and if we wish, we could later add more boundaries, maybe with an expression as simple as:
([0-9]+)
For instance, if our desired digits are at the start of our inputs, we might want to add a start char as a left boundary:
^([0-9]+)
Or if our digits are always followed by a ., we can bound it with that:
^([0-9]+)\.
and we can also add an uppercase letter after that to strengthen our right boundary, and continue this process if necessary:
^([0-9]+)\.[A-Z]
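As a rough sketch, the variant bounded by the start of the input and a literal dot could be used from Java as follows. The class name LeadingDigits and the helper leadingDigits are assumptions made for this example, not part of the original answer.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LeadingDigits {
    // Digits at the start of the input, followed by a literal dot.
    private static final Pattern LEADING_DIGITS = Pattern.compile("^([0-9]+)\\.");

    // Hypothetical helper: returns the captured digits, or an empty
    // Optional if the input does not start with "<digits>.".
    static Optional<String> leadingDigits(String input) {
        Matcher matcher = LEADING_DIGITS.matcher(input);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String filename = "20190516.BBARC.GLIND.statistics.xml";
        System.out.println(leadingDigits(filename).orElse("no match"));
    }
}
```

Returning an Optional avoids calling group(1) when find() has failed, which would otherwise throw an IllegalStateException.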
RegEx
If this expression wasn't desired, it can be modified or changed in regex101.com.
Test
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexDemo {
    public static void main(String[] args) {
        // Group 1 holds the digits, group 2 the rest of the input
        final String regex = "([0-9]+)(.*)";
        final String string = "20190516.BBARC.GLIND.statistics.xml";
        // In a Java replacement string a group is referenced as $1, not \1
        final String subst = "$1";
        final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
        final Matcher matcher = pattern.matcher(string);
        // The substituted value will be contained in the result variable
        final String result = matcher.replaceAll(subst);
        System.out.println("Substitution result: " + result);
    }
}
Demo
const regex = /([0-9]+)(.*)/gm;
const str = `20190516.BBARC.GLIND.statistics.xml`;
const subst = `$1`;
// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);
console.log('Substitution result: ', result);
To extract a part or parts of string using regex I prefer to define groups.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class B {
public static void main(String[] args) {
String in="20190516.BBARC.GLIND.statistics.xml";
Pattern p=Pattern.compile("(\\w+).*");
Matcher m=p.matcher(in);
if(m.matches())
System.out.println(m.group(1));
else
System.out.println("no match");
}
}

Java - Find Regex to match in String

I need some help because I'm new to Java, and after some research on the web
I can't find a solution.
So my problem:
String str : size="A4"
I would like to extract 'A4' with a regex by giving the word "size" in the regex.
How can I do that?
import java.util.*;
import java.util.regex.*;
import java.lang.*;
import java.io.*;
class Ideone
{
public static void main (String[] args) throws java.lang.Exception
{
Matcher m=Pattern.compile("size\\s*=\\s*\"([^\"]*)\"").matcher("size=\"A4\"");
while(m.find())
System.out.println(m.group(1));
}
}
Output:
A4
http://ideone.com/FqMuTA
Regex breakdown:
size\\s*=\\s*\"([^\"]*)\"
size matches size literally
\\s*=\\s* matches 0 or more whitespace characters before and after the = sign
\" matches a double quote
([^\"]*) matches 0 or more characters that are not a double quote ([^\"]) and remembers the captured text as back-reference 1, i.e. captured group number 1, which is read in the while loop above
\" matches the closing double quote
You can find more info on regex here
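Since the question asks for the value "by giving the word size", here is a small sketch that builds the same pattern from an attribute name passed in as a parameter. The class AttributeValue and its method valueOf are hypothetical names used only for this illustration.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AttributeValue {
    // Hypothetical helper: builds the pattern  name\s*=\s*"([^"]*)"  for the
    // given attribute name and returns the quoted value, if there is one.
    static Optional<String> valueOf(String name, String text) {
        // Pattern.quote keeps characters in the name from being read as regex syntax.
        Pattern pattern = Pattern.compile(Pattern.quote(name) + "\\s*=\\s*\"([^\"]*)\"");
        Matcher matcher = pattern.matcher(text);
        // group(1) is only read after find() has succeeded.
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(valueOf("size", "size=\"A4\"").orElse("no match"));
    }
}
```

Note that this sketch does not check what precedes the name, so "size" would also be found inside a longer word such as "fontsize". A word boundary (\b) in front of the name is one way to tighten that if it matters for your input.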
1. Create a Pattern (java.util.regex.Pattern) that matches your conditions.
2. Generate a Matcher (java.util.regex.Matcher) that handles the input String.
3. Let the Matcher find the desired value (by using Matcher.group(group)).
//1. create Pattern
Pattern p = Pattern.compile("size=\\\"([A-Za-z0-9]{2})\\\"");
//2. generate Matcher
Matcher m = p.matcher(myString);
//3. find value using groups(int)
if(m.find()) {
System.out.println( m.group(1) );
}

Regex to exclude word from matches java code

Maybe someone could help me. I'm trying to include a regex in Java code that matches all strings except ZZ78. I'd like to know what is missing from the regex I have.
The input string is str = "ab57cdZZ78efghZZ7ij#klmZZ78noCODpqrZZ78stuvw27z#xyzZZ78"
and I'm trying this regex: (?:(?![ZZ78]).)* but if you test it at http://regexpal.com/
against the string, you'll see that it does not work completely.
str = new String ("ab57cdZZ78efghZZ7ij#klmZZ78noCODpqrZZ78stuvw27z#xyzZZ78");
Pattern pattern = Pattern.compile("(?:(?![ZZ78]).)*");
the matched strings should be
ab57cd
efghZZ7ij#klm
noCODpqr
stuvw27z#xyz
Update:
Hello Avinash Raj and Chthonic Project. Thanks so much for your help and the solutions provided.
I originally thought of the split method, but I was trying to avoid getting empty strings in the result
when, for example, the delimiter string is at the beginning or at the end of the main string.
Then I thought that a regex could help me extract everything except "ZZ78", avoiding
empty results in the output.
Below I show the code using the split method (Chthonic's) and the regex (Avinash's). Both produce empty
strings if the commented-out "if()" conditions are not used.
Is using those "if()" checks the only way to avoid printing empty strings, or could the regex
be tweaked a little to match only non-empty strings?
This is the code I have tested so far:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexTest {
public static void main(String[] args) {
System.out.println("########### Matches with Split ###########");
String str = "ZZ78ab57cdZZ78efghZZ7ij#klmZZ78noCODpqrZZ78stuvw27z#xyzZZ78";
for (String s : str.split("ZZ78")) {
//if ( !s.isEmpty() ) {
System.out.println("This is a match <<" + s + ">>");
//}
}
System.out.println("##########################################");
System.out.println("########### Matches with Regex ###########");
String s = "ZZ78ab57cdZZ78efghZZ7ij#klmZZ78noCODpqrZZ78stuvw27z#xyzZZ78";
Pattern regex = Pattern.compile("((?:(?!ZZ78).)*)(ZZ78|$)");
Matcher matcher = regex.matcher(s);
while(matcher.find()){
//if ( !matcher.group(1).isEmpty() ) {
System.out.println("This is a match <<" + matcher.group(1) + ">>");
//}
}
}
}
And the output (without using the "if()"s):
########### Matches with Split ###########
This is a match <<>>
This is a match <<ab57cd>>
This is a match <<efghZZ7ij#klm>>
This is a match <<noCODpqr>>
This is a match <<stuvw27z#xyz>>
##########################################
########### Matches with Regex ###########
This is a match <<>>
This is a match <<ab57cd>>
This is a match <<efghZZ7ij#klm>>
This is a match <<noCODpqr>>
This is a match <<stuvw27z#xyz>>
This is a match <<>>
Thanks for help so far.
Thanks in advance
Update #2:
Both of your answers and solutions are excellent. Now it works very nicely. This is the final code I've tested with both solutions.
Many thanks again.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexTest {
public static void main(String[] args) {
System.out.println("########### Matches with Split ###########");
String str = "ZZ78ab57cdZZ78efghZZ7ij#klmZZ78noCODpqrZZ78stuvw27z#xyzZZ78";
Arrays.stream(str.split("ZZ78")).filter(s -> !s.isEmpty()).forEach(System.out::println);
System.out.println("##########################################");
System.out.println("########### Matches with Regex ###########");
String s = "ZZ78ab57cdZZ78efghZZ7ij#klmZZ78noCODpqrZZ78stuvw27z#xyzZZ78";
Pattern regex = Pattern.compile("((?:(?!ZZ78).)*)(ZZ78|$)");
Matcher matcher = regex.matcher(s);
ArrayList<String> allMatches = new ArrayList<String>();
ArrayList<String> list = new ArrayList<String>();
while(matcher.find()){
allMatches.add(matcher.group(1));
}
for (String s1 : allMatches)
if (!s1.equals(""))
list.add(s1);
System.out.println(list);
}
}
And output:
########### Matches with Split ###########
ab57cd
efghZZ7ij#klm
noCODpqr
stuvw27z#xyz
##########################################
########### Matches with Regex ###########
[ab57cd, efghZZ7ij#klm, noCODpqr, stuvw27z#xyz]
The easiest way to do this is as follows:
public static void main(String[] args) {
String str = "ab57cdZZ78efghZZ7ij#klmZZ78noCODpqrZZ78stuvw27z#xyzZZ78";
for (String s : str.split("ZZ78"))
System.out.println(s);
}
The output, as expected, is:
ab57cd
efghZZ7ij#klm
noCODpqr
stuvw27z#xyz
If the pattern used to split the string is at the beginning (i.e. "ZZ78" in your example code), the first element returned will be an empty string, as you have already noted. To avoid that, all you need to do is filter the array. This is essentially the same as putting an if, but you can avoid the extra condition line this way. I would do this as follows (in Java 8):
String str = ...; // whatever string you want to test it with
Arrays.stream(str.split("ZZ78")).filter(s -> !s.isEmpty()).forEach(System.out::println);
You need to remove the character class, since [ZZ78] matches a single character from the given list. However, (?:(?!ZZ78).)* alone won't give the matches you want. Consider ab57cdZZ78 as the input string. First, (?:(?!ZZ78).)* matches ab57cd. It then tries to match the following Z, but the lookahead (?!ZZ78) only lets a character through if ZZ78 does not start at that position, so the match stops in front of the first Z. On the next attempt the regex engine moves on to the second Z and checks (?!ZZ78) again. Because the second Z is followed by 78 rather than Z78, this Z is matched by the regex engine, so part of the delimiter ends up in the results.
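The behaviour described in this paragraph can be observed directly. A small sketch (LonePatternDemo and allMatches are names invented for this example) that collects every match of the pattern without the trailing (ZZ78|$):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LonePatternDemo {
    // Collects every match of the pattern on its own, empty matches included.
    static List<String> allMatches(String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile("(?:(?!ZZ78).)*").matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Each match is wrapped in << >> so that empty matches stay visible.
        for (String match : allMatches("ab57cdZZ78")) {
            System.out.println("<<" + match + ">>");
        }
    }
}
```

For ab57cdZZ78 the list contains ab57cd and, as a separate match, Z78, along with empty matches. That leftover Z78 is why the delimiter has to be consumed by the pattern itself.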
String s = "ab57cdZZ78efghZZ7ij#klmZZ78noCODpqrZZ78stuvw27z#xyzZZ78";
Pattern regex = Pattern.compile("((?:(?!ZZ78).)*)(ZZ78|$)");
Matcher matcher = regex.matcher(s);
while(matcher.find()){
System.out.println(matcher.group(1));
}
Output:
ab57cd
efghZZ7ij#klm
noCODpqr
stuvw27z#xyz
Explanation:
((?:(?!ZZ78).)*) Captures zero or more characters into group 1, stopping before any occurrence of ZZ78.
(ZZ78|$) Captures the following ZZ78, or matches at the end-of-line anchor, as group 2.
Group index 1 therefore contains the run of characters between two ZZ78 delimiters.
Update:
String s = "ZZ78ab57cdZZ78efghZZ7ij#klmZZ78noCODpqrZZ78stuvw27z#xyzZZ78";
Pattern regex = Pattern.compile("((?:(?!ZZ78).)*)(ZZ78|$)");
Matcher matcher = regex.matcher(s);
ArrayList<String> allMatches = new ArrayList<String>();
ArrayList<String> list = new ArrayList<String>();
while(matcher.find()){
allMatches.add(matcher.group(1));
}
for (String s1 : allMatches)
if (!s1.equals(""))
list.add(s1);
System.out.println(list);
Output:
[ab57cd, efghZZ7ij#klm, noCODpqr, stuvw27z#xyz]
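On the follow-up question of whether the regex itself can be tweaked to return only non-empty strings: one possible approach, which is not taken from the answers above, is to skip the delimiters inside the pattern and require at least one character in the capturing group. The sketch below assumes the \G anchor (end of the previous match) is acceptable. NonEmptyPieces and pieces are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NonEmptyPieces {
    // \G anchors each match at the end of the previous one (or at the start
    // of the input), (?:ZZ78)* skips any delimiters, and the + quantifier
    // requires at least one character in group 1.
    private static final Pattern PIECE =
            Pattern.compile("\\G(?:ZZ78)*((?:(?!ZZ78).)+)");

    static List<String> pieces(String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = PIECE.matcher(input);
        while (matcher.find()) {
            result.add(matcher.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "ZZ78ab57cdZZ78efghZZ7ij#klmZZ78noCODpqrZZ78stuvw27z#xyzZZ78";
        System.out.println(pieces(s));
    }
}
```

Whether this is preferable to filtering after split is a matter of taste; the split-and-filter version is arguably easier to read. Also note that . does not match line terminators by default, so this sketch is meant for single-line input.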

pattern matching with regular expression in java

I need to write a program that matches a pattern against a line. The pattern may be a regular expression or a plain string.
Example:
if the pattern is "tiger", then only a line that contains just "tiger" should match
if the pattern is "^t", then lines that start with "t" should match
I have done this with:
the Pattern and Matcher classes
The problem is that when I use Matcher.find(), all regular expressions match, but if I give a full pattern then it does not match.
If I use matches(), then only complete patterns match, not regular expressions.
My code:
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class MatchesLooking
{
private static final String REGEX = "^f";
private static final String INPUT =
"fooooooooooooooooo";
private static Pattern pattern;
private static Matcher matcher;
public static void main(String[] args)
{
// Initialize
pattern = Pattern.compile(REGEX);
matcher = pattern.matcher(INPUT);
System.out.println("Current REGEX is: "
+ REGEX);
System.out.println("Current INPUT is: "
+ INPUT);
System.out.println("find(): "
+ matcher.find());
System.out.println("matches(): "
+ matcher.matches());
}
}
matches() given a regex of ^t would only match when the string consists of nothing but a t.
You need to include the rest of the string as well for it to match. You can do so by appending .*, which means zero or more of any character.
"^t.*"
Also, the ^ (and equivalently $) is optional when using matches(), because matches() always tests the entire input.
I hope that helps, I'm not entirely clear on what you're struggling with. Feel free to clarify.
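To make the difference concrete, here is a minimal sketch comparing the two calls on the inputs from the question. The class FindVersusMatches and its two helper methods are made-up names for illustration.

```java
import java.util.regex.Pattern;

class FindVersusMatches {
    // find() succeeds if the pattern matches anywhere in the line.
    static boolean foundIn(String regex, String line) {
        return Pattern.compile(regex).matcher(line).find();
    }

    // matches() succeeds only if the pattern covers the whole line.
    static boolean matchesWhole(String regex, String line) {
        return Pattern.compile(regex).matcher(line).matches();
    }

    public static void main(String[] args) {
        String input = "fooooooooooooooooo";
        System.out.println(foundIn("^f", input));                    // true
        System.out.println(matchesWhole("^f", input));               // false
        System.out.println(matchesWhole("^f.*", input));             // true
        System.out.println(matchesWhole("tiger", "tiger"));          // true
        System.out.println(foundIn("tiger", "a tiger in the zoo"));  // true
    }
}
```

So a program that should treat "tiger" as a whole-line pattern and "^t" as a prefix pattern has to decide which of the two calls fits each case, or rewrite the pattern (for example by appending .*) before calling matches().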
This is how Matcher works:
while (matcher.find()) {
System.out.println(matcher.group());
}
If you're sure there could be only one match in the input, then you could also use:
System.out.println("find(): " + matcher.find());
System.out.println("group(): " + matcher.group());

Question about Java regex

I get a string from an array list:
array.get(0).toString()
gives TITLE = "blabla"
I want the string blabla, so I tried this:
Pattern p = Pattern.compile("(\".*\")");
Matcher m = p.matcher(array.get(0).toString());
System.out.println("Title : " + m.group(0));
It doesn't work: java.lang.IllegalStateException: No match found
I also tried:
Pattern p = Pattern.compile("\".*\"");
Pattern p = Pattern.compile("\".*\"");
Pattern p = Pattern.compile("\\\".*\\\"");
Nothing matches in my program, but ALL the patterns work on http://www.fileformat.info/tool/regex.htm
Any ideas? Thanks in advance.
A couple of points:
The Javadoc for Matcher#group states:
IllegalStateException - If no match has yet been attempted, or if the previous match operation failed
That is, before using group, you must first use m.matches (to match the entire sequence), or m.find (to match a subsequence).
Secondly, you actually want m.group(1), since m.group(0) is the whole match.
Actually, this isn't so important here since the regexp in question starts and ends with the capture parentheses, so that group(0) is the same string as group(1), but it would matter if your regexp looked like: "TITLE = (\".*\")"
Example code:
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.junit.Test;
@SuppressWarnings("serial")
public class MatcherTest {
@Test(expected = IllegalStateException.class)
public void testIllegalState() {
List<String> array = new ArrayList<String>() {{ add("Title: \"blah\""); }};
Pattern p = Pattern.compile("(\".*\")");
Matcher m = p.matcher(array.get(0).toString());
System.out.println("Title : " + m.group(0));
}
@Test
public void testLegal() {
List<String> array = new ArrayList<String>() {{ add("Title: \"blah\""); }};
Pattern p = Pattern.compile("(\".*\")");
Matcher m = p.matcher(array.get(0).toString());
if (m.find()) {
System.out.println("Title : " + m.group(1));
}
}
}
You need to call find() or matches() on the Matcher instance first: these actually execute the regular expression and return whether it matched or not. And then only if it matched you can call the methods to get the match groups.
Are you including the double quotes (") in the string?
All your regexes have escaped "s and will only match if the string in the list includes double quote characters.
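Putting these points together, a minimal sketch that returns the text between the quotes without the quotes themselves, which is what the question asks for. TitleExtractor and quotedText are hypothetical names, and the input format TITLE = "blabla" is taken from the question.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TitleExtractor {
    // The quotes sit outside the capturing group, so group(1) holds only
    // the text between them.
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");

    static Optional<String> quotedText(String line) {
        Matcher m = QUOTED.matcher(line);
        // group(1) may only be read after find() has returned true.
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println("Title : " + quotedText("TITLE = \"blabla\"").orElse("(none)"));
    }
}
```

If the string in the list contains no double quotes at all, find() returns false and the sketch reports that instead of throwing IllegalStateException.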
