Regex - Match Pattern with list of values - java

I have an input like google.com and a list of values like
1. *.com
2. *go*.com
3. *abc.com
4. *le.com
5. *.*
I need to write a pattern in Java that returns all the matches except *abc.com. I have tried a few approaches, but nothing worked as expected. Kindly help. Thanks in advance.
Update:
public static void main(String[] args) {
List<String> values = new ArrayList<String>();
values.add("*.com");
values.add("*go*.com");
values.add("*abc.com");
values.add("*le.com");
values.add("*.*");
String stringToMatch = "google.com";
for (String pattern : values) {
String regex = Pattern.quote(pattern).replace("*", ".*");
System.out.println(stringToMatch.matches(regex));
}
}
Output:
false
false
false
false
false
I have tried this but the pattern doesn't match.

You could transform the given patterns into regexes, and then use normal regex functions like String.matches():
for (String pattern : patterns) {
final String regex = pattern.replaceAll("[\\.\\[\\](){}?+|\\\\]", "\\\\$0").replace("*", ".*");
System.out.println(stringToMatch.matches(regex));
}
edit: Apparently Pattern.quote() just adds \Q...\E around the string. Edited to use manual quoting.
edit 2: Another possibility is:
final String regex = Pattern.quote(pattern).replace("*", "\\E.*\\Q");
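To see why the Pattern.quote() attempt from the question prints false for every pattern, and why the \E.*\Q variant works, here is a minimal sketch. The class name WildcardQuoteDemo and the helpers toRegex and matchAll are made up for illustration; only Pattern.quote() and String.matches() come from the JDK.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class WildcardQuoteDemo {

    // Hypothetical helper: quote the whole wildcard pattern, then close and
    // reopen the quoted section around each '*' so ".*" is read as regex.
    static String toRegex(String wildcard) {
        return Pattern.quote(wildcard).replace("*", "\\E.*\\Q");
    }

    // Returns one boolean per wildcard pattern, in the order given.
    static List<Boolean> matchAll(String input, List<String> wildcards) {
        List<Boolean> results = new ArrayList<>();
        for (String wildcard : wildcards) {
            results.add(input.matches(toRegex(wildcard)));
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("*.com", "*go*.com", "*abc.com", "*le.com", "*.*");

        // Naive version from the question: ".*" lands inside \Q...\E, so it is literal text.
        String naive = Pattern.quote("*.com").replace("*", ".*");
        System.out.println(naive);                       // \Q.*.com\E
        System.out.println("google.com".matches(naive)); // false

        // Fixed version: only the text between wildcards stays quoted.
        System.out.println(toRegex("*.com"));            // \Q\E.*\Q.com\E
        System.out.println(matchAll("google.com", values)); // [true, true, false, true, true]
    }
}
```

Quoting everything except the wildcard means no regex metacharacter has to be escaped by hand, which is the main appeal of this variant over a manual escape list.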

Based on a previous answer of mine (read the comments of the question, very instructive), here is a wildcardsToRegex method:
public static String wildcardsToRegex(String wildcards) {
String regex = wildcards;
// .matches() auto-anchors, so add [*] (i.e. "containing")
regex = "*" + regex + "*";
// replace any pair of backslashes by [*]
regex = regex.replaceAll("(?<!\\\\)(\\\\\\\\)+(?!\\\\)", "*");
// minimize unescaped redundant wildcards
regex = regex.replaceAll("(?<!\\\\)[?]*[*][*?]+", "*");
// escape unescaped regexps special chars, but [\], [?] and [*]
regex = regex.replaceAll("(?<!\\\\)([|\\[\\]{}(),.^$+-])", "\\\\$1");
// replace unescaped [?] by [.]
regex = regex.replaceAll("(?<!\\\\)[?]", ".");
// replace unescaped [*] by [.*]
regex = regex.replaceAll("(?<!\\\\)[*]", ".*");
// return the resulting regex (the caller passes it to String.matches())
return regex;
}
Then, within your loop, use:
for (String pattern : values) {
System.out.println(stringToMatch.matches(wildcardsToRegex(pattern)));
}

Change this line in your code:
String regex = Pattern.quote(pattern).replace("*", ".*");
To this:
String regex = pattern.replace(".", "\\.").replace("*", ".*");

You can use:
List<String> values = new ArrayList<String>();
values.add("*.com");
values.add("*go*.com");
values.add("*abc.com");
values.add("*le.com");
values.add("*.*");
String stringToMatch = "google.com";
for (String pattern : values) {
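// In a replaceAll replacement string, "\\\\." yields a literal backslash plus dot; ".*" needs no escaping
String regex = pattern.replaceAll("[.]", "\\\\.").replaceAll("[*]", ".*");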
System.out.println(stringToMatch.matches(regex));
}

Related

Replace all occurrences matching given patterns

Having following string:
String value = "/cds/horse/schema1.0.0/day=12321/provider=samsung/run_key=32ee/group_key=222/end_date=2020-04-20/run_key_default=32sas1/somethingElse=else"
I need to replace the values of run_key and run_key_default with %. For example, for the string above the output should be:
"/cds/horse/schema1.0.0/day=12321/provider=samsung/run_key=%/group_key=222/end_date=2020-04-20/run_key_default=%/somethingElse=else"
I would like to avoid mistakenly modifying other values, so in my opinion the best solution is to combine the replaceAll method with a regex:
String output = value.replaceAll("\run_key=[*]\", "%").replaceAll("\run_key_default=[*]\", "%")
I'm not sure how I should construct the regex for it.
Feel free to post a better solution than the one I provided, if you know one.
You may use this regex for search:
(/run_key(?:_default)?=)[^/]*
and for replacement use:
"$1%"
Java Code:
String output = value.replaceAll("(/run_key(?:_default)?=)[^/]*", "$1%");
RegEx Details:
(: Start capture group #1
/run_key: Match literal text /run_key
(?:_default)?: Match _default optionally
=: Match a literal =
): End capture group #1
[^/]*: Match 0 or more characters that are not /
"$1%" is the replacement; it puts the first capture group back, followed by a literal %
public static void main(String[] args) {
final String regex = "(run_key_default|run_key)=\\w*"; //regex
final String string = "/cds/horse/schema1.0.0/day=12321/provider=samsung/run_key=32ee/group_key=222/end_date=2020-04-20/run_key_default=32sas1/somethingElse=else";
final String subst = "$1=%"; // keep group 1 as is and replace the value with %
final Pattern pattern = Pattern.compile(regex);
final Matcher matcher = pattern.matcher(string);
final String result = matcher.replaceAll(subst);
System.out.println("Substitution result: " + result);
}
Output:
Substitution result:
/cds/horse/schema1.0.0/day=12321/provider=samsung/run_key=%/group_key=222/end_date=2020-04-20/run_key_default=%/somethingElse=else
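As a quick check of the capture-group replacement from the first answer, the sketch below runs it on a shortened path and on a key that merely starts with run_key. The class name RunKeyReplaceDemo, the helper maskRunKeys, and the sample paths (including run_key_other) are assumptions made up for this illustration.

```java
class RunKeyReplaceDemo {

    // Hypothetical helper wrapping the replaceAll call from the first answer.
    static String maskRunKeys(String path) {
        // Group 1 keeps "/run_key=" or "/run_key_default="; [^/]* consumes the old value.
        return path.replaceAll("(/run_key(?:_default)?=)[^/]*", "$1%");
    }

    public static void main(String[] args) {
        String value = "/day=12321/run_key=32ee/group_key=222/run_key_default=32sas1/somethingElse=else";

        // Both run_key values are replaced; group_key is left alone.
        System.out.println(maskRunKeys(value));
        // /day=12321/run_key=%/group_key=222/run_key_default=%/somethingElse=else

        // A key that only shares the prefix is not touched, because "=" must
        // follow "/run_key" or "/run_key_default" directly.
        System.out.println(maskRunKeys("/run_key_other=7/run_key=1"));
        // /run_key_other=7/run_key=%
    }
}
```

Including the leading / and the = in the captured group is what keeps neighbouring keys such as group_key from being modified by mistake.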

Java regex: pattern.matcher() says the source does not match, but I expect it to match

When I use Java's pattern.matcher(), the source does not match the regex, but I expect it to match.
String source = "ONE.TWO"
String regex = "^ONE\\.TWO\\..*"
import java.util.regex.Pattern;
public class Test {
public static void main(String[] args) {
test();
}
public static void test() {
Test stringDemo = new Test();
stringDemo.testMatcher();
}
public void testMatcher() {
String source = "ONE.TWO";
String regex = "^ONE\\.TWo\\..*";
// The result is false ("not match"), but the expected result is true ("match")
matcher(source, regex);
}
public void matcher(String source, String regex) {
Pattern pattern = Pattern.compile(regex);
boolean match = pattern.matcher(source).matches();
if (match) {
System.out.println("match");
} else {
System.out.println("not match");
}
}
}
In your code, the regular expression expects the o in TWO to be lower case and expects it to be followed by a literal dot (.).
Try:
String source = "ONE.TWo.";
This will match your regular expression as coded in your question.
The expression \. means match a literal dot (rather than any character). When you code this into a Java String, you have to escape the backslash with another backslash, so it becomes "\\.".
The .* on the end of the expression means "match zero or more of any character (except line-break)".
So this would also match:
String source = "ONE.TWo.blah blah";
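The two points above (case sensitivity and the required literal dot) can be checked separately. This is only a sketch: the class name MatchesDemo and the helper fullMatch are invented for illustration, and the last regex without the trailing \\. is one possible fix, not the only one.

```java
import java.util.regex.Pattern;

class MatchesDemo {

    // Same check as the question's matcher() method, but returned instead of printed.
    // flags is passed straight to Pattern.compile (0 means no flags).
    static boolean fullMatch(String regex, String source, int flags) {
        return Pattern.compile(regex, flags).matcher(source).matches();
    }

    public static void main(String[] args) {
        String regex = "^ONE\\.TWo\\..*"; // regex exactly as coded in the question

        System.out.println(fullMatch(regex, "ONE.TWO", 0));           // false: wrong case, no dot
        System.out.println(fullMatch(regex, "ONE.TWo.", 0));          // true
        System.out.println(fullMatch(regex, "ONE.TWo.blah blah", 0)); // true

        // Ignoring case alone is not enough, the dot after TWO is still required.
        System.out.println(fullMatch(regex, "ONE.TWO", Pattern.CASE_INSENSITIVE));  // false
        System.out.println(fullMatch(regex, "ONE.TWO.", Pattern.CASE_INSENSITIVE)); // true

        // Dropping the trailing \\. and fixing the case matches the original source.
        System.out.println(fullMatch("^ONE\\.TWO.*", "ONE.TWO", 0));  // true
    }
}
```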
It doesn't match for two reasons:
Your regex "^ONE\\.TWo\\..*" is case sensitive, so the lowercase o in TWo cannot match the uppercase O in TWO.
Your regex also expects a . character after TWO, which your string "ONE.TWO" doesn't have.
Use the following regex to match your source string:
String regex = "^ONE\\.TWO\\.*.*";
Pattern matching is case sensitive by default. In your case the source has an uppercase O and the regex a lowercase o.
So you have to add Pattern.CASE_INSENSITIVE or change the case of the o (the regex also still requires a dot after TWO, which the source lacks):
Pattern pattern = Pattern.compile(regex,Pattern.CASE_INSENSITIVE );
or
String regex = "^ONE\\.TWO\\..*";
Your regex is slightly off. You have an extra dot here:
String regex = "^ONE\\.TWO\\.(extra dot).*"
Try this one, without the extra dot:
String regex = "^ONE\\.TWO.*"
String regex = "^ONE\\.TWO\\..*"
The double backslash \\ in the Java string literal produces a single backslash in the regex, so \\. is the regex \. and matches a literal dot in the source string.
The .* at the end matches any character (except line breaks) zero or more times.
To match the regex, your source should look like
String source = "ONE.TWO.three blah ##$ etc" OR
String source = "ONE.TWO.123##$ etc"
Basically, it is any string that starts with ONE.TWO. and contains no line breaks.

Get all matches within a string using compile and regex

I'm trying to get all matches which starts with _ and ends with = from a URL which looks like
?_field1=param1,param2,paramX&_field2=param1,param2,paramX
In that case I'm looking for any instance of _fieldX=
A method which I use to get it looks like
public static List<String> getAllMatches(String url, String regex) {
List<String> matches = new ArrayList<String>();
Matcher m = Pattern.compile("(?=(" + regex + "))").matcher(url);
while(m.find()) {
matches.add(m.group(1));
}
return matches;
}
called as
List<String> fieldsList = getAllMatches(url, "_.=");
but somehow it does not find anything I expected.
Any suggestions as to what I have missed?
A regex like (?=(_.=)) matches all occurrences of overlapping matches that start with _, then have any 1 char (other than a line break char) and then =.
You need no overlapping matches in the context of the string you provided.
You may just use a lazy dot matching pattern, _(.*?)=. Alternatively, you may use a negated character class based regex: _([^=]+)= (it will capture into Group 1 any one or more chars other than = symbol).
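Both suggested patterns can be compared against the original lookahead on the URL from the question. This is a small sketch; the class name FieldNameDemo and the helper group1Matches are hypothetical names introduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FieldNameDemo {

    // Collects capture group 1 of every match of regex in input.
    static List<String> group1Matches(String input, String regex) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group(1));
        }
        return matches;
    }

    public static void main(String[] args) {
        String url = "?_field1=param1,param2,paramX&_field2=param1,param2,paramX";

        // Lazy dot: expand one character at a time until the first '='.
        System.out.println(group1Matches(url, "_(.*?)="));   // [field1, field2]

        // Negated character class: take everything that is not '='.
        System.out.println(group1Matches(url, "_([^=]+)=")); // [field1, field2]

        // The original pattern allows exactly one character between '_' and '='.
        System.out.println(group1Matches(url, "(?=(_.=))")); // []
    }
}
```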
Since you are passing a regex to the method, it seems you want a generic function.
If so, you may use this method:
public static List<String> getAllMatches(String url, String start, String end) {
List<String> matches = new ArrayList<String>();
Matcher m = Pattern.compile(start + "(.*?)" + end).matcher(url);
while(m.find()) {
matches.add(m.group(1));
}
return matches;
}
and call it as:
List<String> fieldsList = getAllMatches(url, "_", "=");

Regex to exclude word from matches java code

Maybe someone could help me. I'm trying to include a regex in Java code that matches all substrings except ZZ78. I'd like to know what is missing in the regex I have.
The input string is str = "ab57cdZZ78efghZZ7ij#klmZZ78noCODpqrZZ78stuvw27z#xyzZZ78"
and I'm trying this regex: (?:(?![ZZ78]).)* but if you test it at http://regexpal.com/
against the string, you'll see that it does not work completely.
str = new String ("ab57cdZZ78efghZZ7ij#klmZZ78noCODpqrZZ78stuvw27z#xyzZZ78");
Pattern pattern = Pattern.compile("(?:(?![ZZ78]).)*");
the matched strings should be
ab57cd
efghZZ7ij#klm
noCODpqr
stuvw27z#xyz
Update:
Hello Avinash Raj and Chthonic Project. Thanks so much for your help and the solutions provided.
I originally thought of the split method, but I was trying to avoid getting empty strings as results
when, for example, the delimiter string is at the beginning or at the end of the main string.
Then I thought that a regex could help me extract everything except "ZZ78", avoiding
empty results in the output.
Below I show the code using the split method (Chthonic's) and the regex (Avinash's); both produce empty
strings if the commented-out "if()" conditions are not used.
Is using those "if()" checks the only way to avoid printing empty strings? Or could the regex
be tweaked a little to match only non-empty strings?
This is the code I have tested so far:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexTest {
public static void main(String[] args) {
System.out.println("########### Matches with Split ###########");
String str = "ZZ78ab57cdZZ78efghZZ7ij#klmZZ78noCODpqrZZ78stuvw27z#xyzZZ78";
for (String s : str.split("ZZ78")) {
//if ( !s.isEmpty() ) {
System.out.println("This is a match <<" + s + ">>");
//}
}
System.out.println("##########################################");
System.out.println("########### Matches with Regex ###########");
String s = "ZZ78ab57cdZZ78efghZZ7ij#klmZZ78noCODpqrZZ78stuvw27z#xyzZZ78";
Pattern regex = Pattern.compile("((?:(?!ZZ78).)*)(ZZ78|$)");
Matcher matcher = regex.matcher(s);
while(matcher.find()){
//if ( !matcher.group(1).isEmpty() ) {
System.out.println("This is a match <<" + matcher.group(1) + ">>");
//}
}
}
}
And the output (without using the "if()" checks):
########### Matches with Split ###########
This is a match <<>>
This is a match <<ab57cd>>
This is a match <<efghZZ7ij#klm>>
This is a match <<noCODpqr>>
This is a match <<stuvw27z#xyz>>
##########################################
########### Matches with Regex ###########
This is a match <<>>
This is a match <<ab57cd>>
This is a match <<efghZZ7ij#klm>>
This is a match <<noCODpqr>>
This is a match <<stuvw27z#xyz>>
This is a match <<>>
Thanks for help so far.
Thanks in advance
Update #2:
Excellent both of your answers and solutions. Now it works very nice. This is the final code I've tested with both solutions.
Many thanks again.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexTest {
public static void main(String[] args) {
System.out.println("########### Matches with Split ###########");
String str = "ZZ78ab57cdZZ78efghZZ7ij#klmZZ78noCODpqrZZ78stuvw27z#xyzZZ78";
Arrays.stream(str.split("ZZ78")).filter(s -> !s.isEmpty()).forEach(System.out::println);
System.out.println("##########################################");
System.out.println("########### Matches with Regex ###########");
String s = "ZZ78ab57cdZZ78efghZZ7ij#klmZZ78noCODpqrZZ78stuvw27z#xyzZZ78";
Pattern regex = Pattern.compile("((?:(?!ZZ78).)*)(ZZ78|$)");
Matcher matcher = regex.matcher(s);
ArrayList<String> allMatches = new ArrayList<String>();
ArrayList<String> list = new ArrayList<String>();
while(matcher.find()){
allMatches.add(matcher.group(1));
}
for (String s1 : allMatches)
if (!s1.equals(""))
list.add(s1);
System.out.println(list);
}
}
And output:
########### Matches with Split ###########
ab57cd
efghZZ7ij#klm
noCODpqr
stuvw27z#xyz
##########################################
########### Matches with Regex ###########
[ab57cd, efghZZ7ij#klm, noCODpqr, stuvw27z#xyz]
The easiest way to do this is as follows:
public static void main(String[] args) {
String str = "ab57cdZZ78efghZZ7ij#klmZZ78noCODpqrZZ78stuvw27z#xyzZZ78";
for (String s : str.split("ZZ78"))
System.out.println(s);
}
The output, as expected, is:
ab57cd
efghZZ7ij#klm
noCODpqr
stuvw27z#xyz
If the pattern used to split the string is at the beginning (i.e. "ZZ78" in your example code), the first element returned will be an empty string, as you have already noted. To avoid that, all you need to do is filter the array. This is essentially the same as putting an if, but you can avoid the extra condition line this way. I would do this as follows (in Java 8):
String str = ...; // whatever string you want to test it with
Arrays.stream(str.split("ZZ78")).filter(s -> !s.isEmpty()).forEach(System.out::println);
You need to remove the character class, since [ZZ78] matches a single character from the given set. However, (?:(?!ZZ78).)* alone won't give the matches you want. Consider ab57cdZZ78 as the input string. First, (?:(?!ZZ78).)* matches ab57cd. At the first Z, the lookahead (?!ZZ78) fails because ZZ78 starts there, so the match stops (and the next attempt at that position yields only an empty match). The regex engine then moves on to the second Z and checks (?!ZZ78) again. Because the text at the second Z is Z78, not ZZ78, the lookahead succeeds and Z78 is matched as a stray result. Anchoring each match to a following ZZ78 or the end of the input avoids this:
String s = "ab57cdZZ78efghZZ7ij#klmZZ78noCODpqrZZ78stuvw27z#xyzZZ78";
Pattern regex = Pattern.compile("((?:(?!ZZ78).)*)(ZZ78|$)");
Matcher matcher = regex.matcher(s);
while(matcher.find()){
System.out.println(matcher.group(1));
}
Output:
ab57cd
efghZZ7ij#klm
noCODpqr
stuvw27z#xyz
Explanation:
((?:(?!ZZ78).)*) Captures zero or more characters into group 1, stopping before the next ZZ78.
(ZZ78|$) Captures the following ZZ78, or matches the end-of-line anchor, into group 2.
Group 1 therefore contains the text between two ZZ78 delimiters (possibly empty).
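To make the difference concrete, the following sketch runs the bare pattern and the anchored pattern on the short input ab57cdZZ78. The class name TemperedDotDemo and the helper findAll are hypothetical; the empty strings in the results come from zero-length matches, which is why the results still need filtering.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TemperedDotDemo {

    // Collects the given group of every match of regex in input, including empty ones.
    static List<String> findAll(String input, String regex, int group) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group(group));
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "ab57cdZZ78";

        // Bare pattern: after ab57cd it resumes at the second Z and picks up Z78.
        System.out.println(findAll(input, "(?:(?!ZZ78).)*", 0));
        // [ab57cd, , Z78, ]

        // Anchored pattern: the delimiter is consumed as group 2, so no stray Z78.
        System.out.println(findAll(input, "((?:(?!ZZ78).)*)(ZZ78|$)", 1));
        // [ab57cd, ]
    }
}
```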
Update:
String s = "ZZ78ab57cdZZ78efghZZ7ij#klmZZ78noCODpqrZZ78stuvw27z#xyzZZ78";
Pattern regex = Pattern.compile("((?:(?!ZZ78).)*)(ZZ78|$)");
Matcher matcher = regex.matcher(s);
ArrayList<String> allMatches = new ArrayList<String>();
ArrayList<String> list = new ArrayList<String>();
while(matcher.find()){
allMatches.add(matcher.group(1));
}
for (String s1 : allMatches)
if (!s1.equals(""))
list.add(s1);
System.out.println(list);
Output:
[ab57cd, efghZZ7ij#klm, noCODpqr, stuvw27z#xyz]

Finding a Match using java.lang.String.matches()

I have a String that contains new line characters say...
str = "Hello\n"+"Batman,\n" + "Joker\n" + "here\n"
I would like to know how to find the existence of a particular word, say Joker, in the string str using java.lang.String.matches().
I find that str.matches(".*Joker.*") returns false, and returns true if I remove the newline characters. So what regex should be passed as the argument to str.matches()?
One way is... str.replaceAll("\\n","").matches(".*Joker.*");
The problem is that the dot in .* does not match newlines by default. If you want newlines to be matched, your regex must have the flag Pattern.DOTALL.
If you want to embed that in a regex used in .matches() the regex would be:
"(?s).*Joker.*"
However, note that this will match Jokers too. A regex does not have the notion of words. Your regex would therefore really need to be:
"(?s).*\\bJoker\\b.*"
However, a regex does not need to match all its input text (which is what .matches() does, counterintuitively), only what is needed. Therefore, this solution is even better, and does not require Pattern.DOTALL:
Pattern p = Pattern.compile("\\bJoker\\b"); // \b is the word anchor
p.matcher(str).find(); // returns true
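The three variants discussed above behave differently on the sample string, which the sketch below shows side by side. The class name JokerDemo and the helper containsWord are assumptions for this example; Pattern.quote() is used only so the helper also works for words containing regex metacharacters.

```java
import java.util.regex.Pattern;

class JokerDemo {

    // Hypothetical helper: true if word occurs in text as a whole word.
    static boolean containsWord(String text, String word) {
        return Pattern.compile("\\b" + Pattern.quote(word) + "\\b").matcher(text).find();
    }

    public static void main(String[] args) {
        String str = "Hello\n" + "Batman,\n" + "Joker\n" + "here\n";

        // Without DOTALL the dot stops at newlines, so the whole input cannot match.
        System.out.println(str.matches(".*Joker.*"));      // false

        // (?s) turns on DOTALL inside the regex itself.
        System.out.println(str.matches("(?s).*Joker.*"));  // true

        // find() only needs the word itself, so no DOTALL is required.
        System.out.println(containsWord(str, "Joker"));    // true

        // The word boundary is what rejects "Jokers".
        System.out.println("Jokers\nhere\n".matches("(?s).*Joker.*")); // true
        System.out.println(containsWord("Jokers\nhere\n", "Joker"));   // false
    }
}
```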
You can do something much simpler; this is a contains. You do not need the power of regex:
public static void main(String[] args) throws Exception {
final String str = "Hello\n" + "Batman,\n" + "Joker\n" + "here\n";
System.out.println(str.contains("Joker"));
}
Alternatively you can use a Pattern and find:
public static void main(String[] args) throws Exception {
final String str = "Hello\n" + "Batman,\n" + "Joker\n" + "here\n";
final Pattern p = Pattern.compile("Joker");
final Matcher m = p.matcher(str);
if (m.find()) {
System.out.println("Found match");
}
}
You want to use a Pattern that uses the DOTALL flag, which says that a dot should also match new lines.
String str = "Hello\n"+"Batman,\n" + "Joker\n" + "here\n";
Pattern regex = Pattern.compile(".*Joker.*", Pattern.DOTALL);
Matcher regexMatcher = regex.matcher(str);
if (regexMatcher.find()) {
// found a match
}
else
{
// no match
}
