Java regex mid-string matching - java

I have a dictionary file containing lines
ns*.abc.com
ns*.xyz.com
I want to match patterns such as ns15.abc.com, nsABC.abc.com with the dictionary file and return true.
E.g. ns15.abc.com is a match, while ns16.ABC.abc.com is not.
Thanks in advance
public class ValidateDemo {
public static void main(String[] args) {
List<String> input = new ArrayList<String>();
input.add("ns14.abc.com");
for (String str : input) {
if (str.matches("ns*.abc.com")) {
System.out.println("Match: " + str);
}
}
}
}

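In each entry from the dictionary, replace each dot with "\." and each asterisk with "[^.]*"; that gives you the pattern for an individual entry. Use String.replace rather than replaceAll for this, because replaceAll treats its first argument as a regex ("." would match every character, and "*" on its own is not a valid pattern):
Pattern p = Pattern.compile("^" + dictEntry.replace(".", "\\.").replace("*", "[^.]*") + "$");
Alternatively, join all the converted dictionary entries with "|" to get a single pattern that matches any of them.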
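As a minimal sketch of that conversion (the class and method names are hypothetical, and it assumes "*" should match any run of characters that contains no dot): each entry is split on "*", the literal pieces are quoted with Pattern.quote, and the pieces are rejoined with "[^.]*". All entries are then combined with "|" into one pattern.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class DictionaryMatcher {
    // Converts one dictionary entry such as "ns*.abc.com" into a regex string.
    static String toRegex(String dictEntry) {
        // Limit -1 keeps trailing empty pieces, so an entry ending in "*" still works.
        String[] literals = dictEntry.split("\\*", -1);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < literals.length; i++) {
            if (i > 0) {
                sb.append("[^.]*"); // the wildcard: anything except a dot
            }
            sb.append(Pattern.quote(literals[i])); // literal text, dots included
        }
        return sb.toString();
    }

    // Joins all converted entries with "|" into a single pattern.
    static Pattern compileAll(List<String> entries) {
        return Pattern.compile(entries.stream()
                .map(DictionaryMatcher::toRegex)
                .collect(Collectors.joining("|")));
    }

    // matches() requires the whole host name to match, so no ^ or $ is needed.
    static boolean isListed(Pattern dictionary, String host) {
        return dictionary.matcher(host).matches();
    }

    public static void main(String[] args) {
        Pattern dictionary = compileAll(Arrays.asList("ns*.abc.com", "ns*.xyz.com"));
        System.out.println(isListed(dictionary, "ns15.abc.com"));     // true
        System.out.println(isListed(dictionary, "nsABC.abc.com"));    // true
        System.out.println(isListed(dictionary, "ns16.ABC.abc.com")); // false
    }
}
```

Quoting the literal pieces means any other regex metacharacter that happens to appear in a dictionary line is also matched literally.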

Related

complex regular expression in Java

I have a rather complex (to me it seems rather complex) problem that I'm using regular expressions in Java for:
I can get any text string that must be of the format:
M:<some text>:D:<either a url or string>:C:<some more text>:Q:<a number>
I started with a regular expression for extracting the text between the M:/:D:/:C:/:Q: as:
String pattern2 = "(M:|:D:|:C:|:Q:.*?)([a-zA-Z_\\.0-9]+)";
And that works fine if the <either a url or string> is just an alphanumeric string. But it all falls apart when the embedded string is a url of the format:
tcp://someurl.something:port
Can anyone help me adjust the above regular expression so that the text extracted after :D: can be either a URL or an alphanumeric string?
Here's an example:
public static void main(String[] args) {
String name = "M:myString1:D:tcp://someurl.com:8989:C:myString2:Q:1";
boolean matchFound = false;
ArrayList<String> values = new ArrayList<>();
String pattern2 = "(M:|:D:|:C:|:Q:.*?)([a-zA-Z_\\.0-9]+)";
Matcher m3 = Pattern.compile(pattern2).matcher(name);
while (m3.find()) {
matchFound = true;
String m = m3.group(2);
System.out.println("regex found match: " + m);
values.add(m);
}
}
In the above example, the results I want would be:
myString1
tcp://someurl.com:8989
myString2
1
Note that the strings can be of variable length and are alphanumeric, but some other characters must be allowed too (such as the :// and . characters of the URL format).
You mention that the format is constant:
M:<some text>:D:<either a url or string>:C:<some more text>:Q:<a number>
Capture groups can do this for you with the pattern:
"M:(.*):D:(.*):C:(.*):Q:(.*)"
Or you can do a String.split() with a pattern of "M:|:D:|:C:|:Q:". However, the split will return an empty element at the first index. Everything else will follow.
public static void main(String[] args) throws Exception {
System.out.println("Regex: ");
String data = "M:<some text>:D:tcp://someurl.something:port:C:<some more text>:Q:<a number>";
Matcher matcher = Pattern.compile("M:(.*):D:(.*):C:(.*):Q:(.*)").matcher(data);
if (matcher.matches()) {
for (int i = 1; i <= matcher.groupCount(); i++) {
System.out.println(matcher.group(i));
}
}
System.out.println();
System.out.println("String.split(): ");
String[] pieces = data.split("M:|:D:|:C:|:Q:");
for (String piece : pieces) {
System.out.println(piece);
}
}
Results:
Regex:
<some text>
tcp://someurl.something:port
<some more text>
<a number>
String.split():
<some text>
tcp://someurl.something:port
<some more text>
<a number>
To extract the URL/text part you don't need a regular expression. Use
int startPos = input.indexOf(":D:")+":D:".length();
int endPos = input.indexOf(":C:", startPos);
String urlOrText = input.substring(startPos, endPos);
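The same idea as a self-contained sketch; the class and method names are hypothetical, and returning null when a marker is missing is an assumption about how malformed input should be handled. Note that this approach breaks if the URL itself can contain ":C:".

```java
final class FieldExtractor {
    // Returns the text between ":D:" and the next ":C:", or null if either marker is missing.
    static String extractD(String input) {
        int dPos = input.indexOf(":D:");
        if (dPos < 0) {
            return null; // no ":D:" marker at all
        }
        int startPos = dPos + ":D:".length();
        int endPos = input.indexOf(":C:", startPos);
        if (endPos < 0) {
            return null; // ":D:" found, but no ":C:" after it
        }
        return input.substring(startPos, endPos);
    }

    public static void main(String[] args) {
        String name = "M:myString1:D:tcp://someurl.com:8989:C:myString2:Q:1";
        System.out.println(extractD(name)); // tcp://someurl.com:8989
    }
}
```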
Assuming you need to do some validation along with the parsing, break the regex into separate parts like this:
String m_regex = "[\\w.]+"; // in Java, a . inside [] is just a plain dot
String url_regex = "."; // placeholder: there are plenty of URL regexes online, pick your favorite
String d_regex = "(?:" + url_regex + "|\\p{Alnum}+)"; // a URL or a sequence of alphanumeric characters
String c_regex = "[\\w.]+"; // I'm assuming you want this to be a bit more restrictive; not sure
String q_regex = "\\d+"; // what sort of number exactly? Assuming any string of digits here
// each named group needs a unique name, otherwise Pattern.compile throws
String regex = "M:(?<M>" + m_regex + "):"
+ "D:(?<D>" + d_regex + "):"
+ "C:(?<C>" + c_regex + "):"
+ "Q:(?<Q>" + q_regex + ")";
Pattern p = Pattern.compile(regex);
It might be a good idea to keep the pattern as a static field somewhere and compile it in a static block, so that the temporary regex strings don't clutter a class with fields that are otherwise useless.
Then you can retrieve each part by its name:
Matcher m = p.matcher( input );
if (m.matches()) {
String m_part = m.group( "M" );
...
String q_part = m.group( "Q" );
}
You can go a step further and create a RegexGroup interface whose implementing objects each represent one part of the regex, with a name and the actual pattern. You do lose simplicity, though, which makes the code harder to understand at a quick glance. (I wouldn't do this; I'm just pointing out that it's possible and has its own benefits.)
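Here is the named-group approach as a runnable sketch. The class name and the URL sub-pattern (scheme://host:port) are assumptions made for illustration; a real URL pattern would need to be considerably more thorough.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MessageParser {
    // Assumed, deliberately simple URL shape: scheme://host:port
    private static final String URL_REGEX = "\\w+://[\\w.]+:\\d+";

    // Compiled once and kept in a static field.
    private static final Pattern MESSAGE = Pattern.compile(
            "M:(?<M>[\\w.]+):"
          + "D:(?<D>" + URL_REGEX + "|\\p{Alnum}+):"
          + "C:(?<C>[\\w.]+):"
          + "Q:(?<Q>\\d+)");

    // Returns the four parts in order M, D, C, Q, or null if the input is not well-formed.
    static String[] parse(String input) {
        Matcher m = MESSAGE.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group("M"), m.group("D"), m.group("C"), m.group("Q") };
    }

    public static void main(String[] args) {
        String[] parts = parse("M:myString1:D:tcp://someurl.com:8989:C:myString2:Q:1");
        for (String part : parts) {
            System.out.println(part);
        }
    }
}
```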

Find words matching a specific REGEX within a sentence using JAVA

I am trying to generate a dynamic message that can be used for processing using Java and Regular Expressions. My incoming value can be just "$bdate$" or be embedded within a sentence like "Your Birthdate : $bdate$". I want to replace these $aaa$ values dynamically at run time and am not able to isolate the regex matched values within a sentence. Here is what I have so far....
package com.test;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
public class TestRegex {
public static String REGEX = "\\$((?:[a-zA-Z0-9_ ]*))\\$";
public static String testString = "Summary : $summary$"
+ "Age : $age$"
+ "Location : $location$";
public static void main(String[] args) {
System.out.println("Matcher : " + Pattern.matches(REGEX, "$ABX_ 11$"));
String [] splitStrings = testString.split("\\W+"); //also tried "\\b+"
List<String> stringList = Arrays.asList(splitStrings);
for(String test : stringList) {
System.out.println("Split Word : " + test);
}
}
}
The output is below - it misses the preceding and succeeding $ symbols:
Matcher : true
Split Word : Summary
Split Word : summary
Split Word : Age
Split Word : age
Split Word : Location
Split Word : location
I know I am very close but not able to figure out the issue - Can anyone please help !!
You can use the following:
String pattern = "\\w+|\\$\\w+\\$";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(testString);
while (m.find( )) {
System.out.println("Found value: " + m.group(0) );
}
Just to extend @Karthik's answer and complete the thread: the code snippet below looks only for the words within the sentence that match the pattern and collects them, which might make it easier to replace them dynamically at run time.
package com.test;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class TestRegex {
public static String testString = "Summary : $summary$"
+ "Age : $age$"
+ "Location : $location$";
public static void main(String[] args) {
String pattern = "\\$\\w+\\$";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(testString);
while (m.find( )) {
System.out.println("Found value: " + m.group(0) );
}
}
}
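To go from finding the placeholders to actually replacing them, one option is Matcher.appendReplacement together with a map of values. This is a sketch under assumptions: the class name and the map of values are hypothetical, and leaving unknown placeholders untouched is one possible policy among several.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PlaceholderFiller {
    // Group 1 captures the name between the two dollar signs.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$(\\w+)\\$");

    static String fill(String template, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Unknown names keep their original "$name$" text (an assumption).
            String replacement = values.getOrDefault(m.group(1), m.group());
            // quoteReplacement stops "$" and "\" in the value from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("bdate", "1990-01-01");
        System.out.println(fill("Your Birthdate : $bdate$", values));
    }
}
```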

Regex to match [[Wikipedia:Manual of Style#Links|]] # in java

I have been trying to match the following string -
String temp = "[[Wikipedia:Manual of Style#Links|]]" ;
with the regex
boolean a = temp.matches("\\[\\[Wikipedia:[a-zA-Z_0-9]*#[a-zA-Z_0-9]*\\|\\]\\]");
"\\[\\[Wikipedia:(.*?)#(.*?)\\|\\]\\]"
"\\[\\[Wikipedia:(.*)*#(.+)*\\|\\]\\]"
"\\[\\[(.*?)#(.*?)\\|\\]\\]"
But none of them are giving any positive matches.
Straight away I can see a problem: you are using a character class without a space to match input with spaces.
Try this:
boolean a = temp.matches("\\[\\[Wikipedia:[\\w ]*#[\\w ]+\\|\\]\\]");
Note that [a-zA-Z_0-9] can be replaced by \w. In Java, \w matches exactly [a-zA-Z_0-9] by default; it only covers letters and digits from other scripts when Pattern.UNICODE_CHARACTER_CLASS is set.
public static void main(String[] args) {
String temp = "[[Wikipedia:Manual of Style#Links|]]";
Pattern pattern = Pattern.compile("\\[\\[Wikipedia:([\\w ]+)#([\\w ]+)\\|\\]\\]");
Matcher matcher = pattern.matcher(temp);
if(matcher.find()) {
System.out.println("Manual of Style: " + matcher.group(1));
System.out.println("links : " + matcher.group(2));
}
}
or
temp.matches("\\[\\[Wikipedia:([\\w ]+)#([\\w ]+)\\|\\]\\]");
Just add a space to your custom character class:
String temp = "[[Wikipedia:Manual of Style#Links|]]" ;
temp.matches("\\[\\[Wikipedia:[a-zA-Z_0-9 ]*#[a-zA-Z_0-9]*\\|\\]\\]"); //true

Matcher can't match

I have the following code. I need to check whether the text contains any word from a list of banned words. But even when such a word exists in the text, the matcher doesn't find it. Here is the code:
final ArrayList<String> regexps = config.getProperty(property);
for (String regexp: regexps){
Pattern pt = Pattern.compile("(" + regexp + ")", Pattern.CASE_INSENSITIVE);
Matcher mt = pt.matcher(plainText);
if (mt.find()){
result = result + "message can't be processed because it doesn't satisfy the rule " + property;
reason = false;
System.out.println("reason" + mt.group() + regexp);
}
}
What is wrong? This code can't find the regexp в[ыy][шs]лит[еe] in plainText = "Вышлите пожалуйста новый счет на оплату на Санг, пока согласовывали, уже прошли его сроки. Лиценз...". I also tried other variants of the regexp, but nothing works.
The trouble is elsewhere.
import java.util.regex.*;
public class HelloWorld {
public static void main(String []args) {
Pattern pt = Pattern.compile("(qwer)");
Matcher mt = pt.matcher("asdf qwer zxcv");
System.out.println(mt.find());
}
}
This prints out true. You may want to use word boundary as delimiter, though:
import java.util.regex.*;
public class HelloWorld {
public static void main(String []args) {
Pattern pt = Pattern.compile("\\bqwer\\b");
Matcher mt = pt.matcher("asdf qwer zxcv");
System.out.println(mt.find());
mt = pt.matcher("asdfqwer zxcv");
System.out.println(mt.find());
}
}
The parentheses are unnecessary unless you need to capture the keyword in a group, and you already have the keyword to begin with.
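One possible cause, assuming the regexps are compiled exactly as in the question: according to the java.util.regex.Pattern documentation, CASE_INSENSITIVE on its own only folds case for US-ASCII characters. A lowercase Cyrillic в in the pattern therefore does not match the uppercase В at the start of the text. Adding UNICODE_CASE enables Unicode-aware case folding. The sketch below (class and method names are hypothetical) uses Unicode escapes so that it does not depend on the source file encoding.

```java
import java.util.regex.Pattern;

final class CyrillicCaseDemo {
    // The regexp from the question, в[ыy][шs]лит[еe], with Cyrillic letters escaped.
    static final String REGEXP = "\u0432[\u044By][\u0448s]\u043B\u0438\u0442[\u0435e]";
    // First word of the question's text ("Вышлите"), starting with an uppercase letter.
    static final String TEXT = "\u0412\u044B\u0448\u043B\u0438\u0442\u0435 ...";

    // CASE_INSENSITIVE alone: case folding applies to US-ASCII only.
    static boolean findAsciiOnly() {
        return Pattern.compile(REGEXP, Pattern.CASE_INSENSITIVE).matcher(TEXT).find();
    }

    // CASE_INSENSITIVE | UNICODE_CASE: case folding applies to Cyrillic as well.
    static boolean findUnicodeAware() {
        return Pattern.compile(REGEXP, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE)
                .matcher(TEXT).find();
    }

    public static void main(String[] args) {
        System.out.println(findAsciiOnly());    // false
        System.out.println(findUnicodeAware()); // true
    }
}
```

The same flags can be written inline at the start of a regexp as (?iu), which may be convenient when the patterns come from a configuration file.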
Use ArrayList's built-in methods contains(Object o) and indexOf(Object o) to check whether a String exists anywhere in the list, and where.
e.g.
ArrayList<String> keywords = new ArrayList<String>();
keywords.add("hello");
System.out.println(keywords.contains("hello"));
System.out.println(keywords.indexOf("hello"));
outputs:
true
0
Try this to filter out messages that contain banned words. It builds a single regex that joins the keywords with the OR operator (|).
private static void findBannedWords() {
final ArrayList<String> keywords = new ArrayList<String>();
keywords.add("f$%k");
keywords.add("s!#t");
keywords.add("a$s");
String input = "what the f$%k";
String bannedRegex = "";
for (String keyword: keywords){
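// Pattern.quote makes characters such as $ match literally instead of acting as regex operators
bannedRegex = bannedRegex + ".*" + Pattern.quote(keyword) + ".*" + "|";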
}
Pattern pt = Pattern.compile(bannedRegex.substring(0, bannedRegex.length()-1));
Matcher mt = pt.matcher(input);
if (mt.matches()) {
System.out.println("message can't be processed because it doesn't satisfy the rule ");
}
}
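A more compact variant of the same OR idea, as a sketch: the class and method names are hypothetical, the keyword list is assumed to be non-empty, and the keywords are assumed to be literal words rather than regexps. Using find() removes the need for the surrounding ".*", and Pattern.quote keeps characters such as $ literal.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class BannedWordFilter {
    // Builds one case-insensitive pattern of the form \Qword1\E|\Qword2\E|...
    static Pattern compile(List<String> keywords) {
        String alternation = keywords.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        return Pattern.compile(alternation, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
    }

    // find() looks for a banned word anywhere in the input.
    static boolean containsBanned(Pattern banned, String input) {
        return banned.matcher(input).find();
    }

    public static void main(String[] args) {
        Pattern banned = compile(Arrays.asList("f$%k", "s!#t", "a$s"));
        System.out.println(containsBanned(banned, "what the f$%k")); // true
        System.out.println(containsBanned(banned, "hello world"));   // false
    }
}
```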

regex pattern to match particular uri from list of urls

I have a list of URLs (lMapValues) with wildcards, as shown in the code below.
I need to match a URI against this list to find the matching URL.
In the code below, I should get the value of d in the map m as the matching URL.
That is, if part of the URI matches an entry in the list of URLs, that particular URL should be picked.
I tried splitting the URI into tokens and then checking each token against the list lMapValues. However, it's not giving me the correct result. Below is the code for that.
public class Matcher
{
public static void main( String[] args )
{
Map m = new HashMap();
m.put("a","https:/abc/eRControl/*");
m.put("b","https://abc/xyz/*");
m.put("c","https://work/Mypage/*");
m.put("d","https://cr/eRControl/*");
m.put("e","https://custom/MyApp/*");
List lMapValues = new ArrayList(m.values());
List tokens = new ArrayList();
String uri = "cr/eRControl/work/custom.jsp";
StringTokenizer st = new StringTokenizer(uri,"/");
while(st.hasMoreTokens()) {
String token = st.nextToken();
tokens.add(token);
}
for(int i=0;i<lMapValues.size();i++) {
String value = (String)lMapValues.get(i);
String patternString = "\\b(" + StringUtils.join(tokens, "|") + ")\\b";
Pattern pattern = Pattern.compile(patternString);
java.util.regex.Matcher matcher = pattern.matcher(value);
while (matcher.find()) {
System.out.println(matcher.group(1));
System.out.println(value);
}
}
}
}
Please help me with regex pattern to achieve above objective.
Any help will be appreciated.
It's much simpler to check whether a string starts with a certain value, for example with String.indexOf() (String.startsWith() expresses the same check more directly).
String[] urls = {
"abc/eRControl",
"abc/xyz",
"work/Mypage",
"cr/eRControl",
"custom/MyApp"
};
String uri = "cr/eRControl/work/custom.jsp";
for (String url : urls) {
if (uri.indexOf(url) == 0) {
System.out.println("Matched: " + url);
}else{
System.out.println("Not matched: " + url);
}
}
Also, there is no need to store the scheme in the map if you are never going to match against it.
If I understand your goal correctly, you might not even need regular expressions here.
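If the map from the question has to stay as it is, the prefix check can be applied to its values directly. The sketch below is one way to do that under assumptions: the class and method names are hypothetical, the trailing "*" is taken to mean "anything after this prefix", and the scheme is stripped because the URI in the question carries none.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class UrlPrefixMatcher {
    // "https://cr/eRControl/*" -> "cr/eRControl/": drops the scheme and the trailing wildcard.
    static String toPrefix(String urlPattern) {
        String s = urlPattern.replaceFirst("^[a-zA-Z][a-zA-Z0-9+.-]*:/+", "");
        if (s.endsWith("*")) {
            s = s.substring(0, s.length() - 1);
        }
        return s;
    }

    // Returns the key of the first entry whose prefix the URI starts with, or null if none does.
    static String findKey(Map<String, String> patterns, String uri) {
        for (Map.Entry<String, String> entry : patterns.entrySet()) {
            if (uri.startsWith(toPrefix(entry.getValue()))) {
                return entry.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("a", "https:/abc/eRControl/*");
        m.put("b", "https://abc/xyz/*");
        m.put("c", "https://work/Mypage/*");
        m.put("d", "https://cr/eRControl/*");
        m.put("e", "https://custom/MyApp/*");
        System.out.println(findKey(m, "cr/eRControl/work/custom.jsp")); // d
    }
}
```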
Try this...
package test;
import java.util.HashSet;
import java.util.Set;
public class PartialURLMapper {
private static final Set<String> PARTIAL_URLS = new HashSet<String>();
static {
PARTIAL_URLS.add("cr/eRControl");
// TODO add more partial Strings to check against input
}
public static String getPartialStringIfMatching(final String input) {
if (input != null && !input.isEmpty()) {
for (String partial: PARTIAL_URLS) {
// this will be case-sensitive
if (input.contains(partial)) {
return partial;
}
}
}
// no partial match found, we return an empty String
return "";
}
// main method just to add example
public static void main(String[] args) {
System.out.println(PartialURLMapper.getPartialStringIfMatching("cr/eRControl/work/custom.jsp"));
}
}
... it will return:
cr/eRControl
The problem is that i is acting as a key not as an index on
String value = (String)lMapValues.get(i);
You will be better served by exchanging the map for a list and using a for-each loop.
List<String> patterns = new ArrayList<String>();
...
for (String pattern : patterns) {
....
}
