RegEx performance issue - Java

I have written a regular expression to validate a name. The name must start with a letter and can be followed by letters, digits, spaces, or underscores.
The regex that I wrote is:
private static final String REGEX = "([a-zA-Z][a-zA-Z0-9 _]*)*";
If the input is: "kasklfhklasdhklghjsdkgsjkdbgjsbdjKg;" the program gets stuck on matcher.matches().
Pattern pattern = Pattern.compile(REGEX);
Matcher matcher = pattern.matcher(input);
if (matcher.matches()) {
System.out.println("Pattern Matches");
} else {
System.out.println("Match Declined");
}
How can I optimize the regex?

Change your regex to:
private static final String REGEX = "[a-zA-Z][a-zA-Z0-9 _]*";
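The outer group with its own * quantifier, (...)*, lets the engine split the same run of letters into groups in exponentially many ways, and it tries all of them before giving up on the trailing ";" (catastrophic backtracking). Without the nested quantifier, matches() returns immediately. Note that, unlike the original, this pattern no longer accepts an empty string.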
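A minimal sketch of the simplified pattern in use. The class and method names (NameValidator, isValidName) and the two extra sample inputs are illustrative assumptions, not part of the original post.

```java
import java.util.regex.Pattern;

public class NameValidator {
    // One letter, then any number of letters, digits, spaces or underscores.
    // There is no quantified group around a quantified token, so a failing
    // input is rejected after a single linear scan.
    private static final Pattern NAME = Pattern.compile("[a-zA-Z][a-zA-Z0-9 _]*");

    public static boolean isValidName(String input) {
        return NAME.matcher(input).matches();
    }

    public static void main(String[] args) {
        // The input from the question: rejected because ';' is not allowed.
        System.out.println(isValidName("kasklfhklasdhklghjsdkgsjkdbgjsbdjKg;")); // false
        System.out.println(isValidName("user_name 42"));                         // true
        System.out.println(isValidName("9lives"));                               // false
    }
}
```

Compiling the pattern once into a static final Pattern also avoids recompiling it on every validation call.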

Related

Regex to find a word between $$ sign

I want a regular expression to find a word between $ signs only. It must start and end with a $ sign. I have tried the expression below:
final String regex = "\\$\\w+\\$";
final Pattern pattern = Pattern.compile(regex);
final Matcher matcher = pattern.matcher("$abc$ cde$efg$hij pqr");
This should give me a count of 1, but my regular expression also matches a second occurrence inside cde$efg$hij, which it should not, because that token does not start and end with a $ sign.

You may use non-word boundaries:
final String regex = "\\B\\$\\w+\\$\\B";
The pattern will only match if $abc$ is neither preceded nor followed by word chars.
Java demo:
String regex = "\\B\\$\\w+\\$\\B";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher("$abc$ cde$efg$hij pqr");
while (matcher.find()){
System.out.println(matcher.group(0));
} // => $abc$
Besides non-word boundaries, you may use whitespace boundaries if you only want to match in between whitespace chars or start/end of string:
String regex = "(?<!\\S)\\$\\w+\\$(?!\\S)";
Or, use unambiguous word boundaries (as I call them):
String regex = "(?<!\\w)\\$\\w+\\$(?!\\w)";
The (?<!\w) negative lookbehind will fail the match if a word char is found immediately to the left of the current location, and the (?!\w) negative lookahead will fail the match if a word char is found immediately to the right of the current location.
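To see how the three boundary styles differ, here is a small comparison sketch. The helper name count and the second test string "($abc$)" are assumptions added for illustration; the first input is the one from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DollarBoundaryDemo {
    // Counts how many times the regex is found in the input.
    public static int count(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String plain      = "\\$\\w+\\$";               // no boundaries (the question's regex)
        String nonWord    = "\\B\\$\\w+\\$\\B";         // non-word boundaries
        String whitespace = "(?<!\\S)\\$\\w+\\$(?!\\S)"; // whitespace boundaries
        String wordLook   = "(?<!\\w)\\$\\w+\\$(?!\\w)"; // lookaround word boundaries

        String input = "$abc$ cde$efg$hij pqr";
        System.out.println(count(plain, input));      // 2
        System.out.println(count(nonWord, input));    // 1
        System.out.println(count(whitespace, input)); // 1
        System.out.println(count(wordLook, input));   // 1

        // Hypothetical input where punctuation touches the token.
        String wrapped = "($abc$)";
        System.out.println(count(nonWord, wrapped));    // 1
        System.out.println(count(whitespace, wrapped)); // 0
        System.out.println(count(wordLook, wrapped));   // 1
    }
}
```

The whitespace variant is the strictest of the three: it rejects a token that touches any non-whitespace character, including punctuation.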

In my case, the problem was extracting the field names between dollar signs.
List<String> getFieldNames(@NotNull String str) {
final String regex = "\\$(\\w+)\\$";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(str);
List<String> fields = new ArrayList<>();
while (matcher.find()) {
fields.add(matcher.group(1));
}
return fields;
}
This returns the list of words found between dollar signs.

matches.find() with replaceAll()

I am new to Java and I found a loop in existing code that seems like it should be an infinite loop (or otherwise have highly undesirable behavior) which actually works.
Can you explain what I'm missing? The reason I think it should be infinite is that according to the documentation here (https://docs.oracle.com/javase/8/docs/api/java/util/regex/Matcher.html#replaceAll-java.lang.String-) a call to replaceAll will reset the matcher (This method first resets this matcher. It then scans the input sequence...). So I thought the below code would do its replacement and then call find() again, which would start over at the beginning. And it would keep finding the same string, since as you can see the string is just getting wrapped in a tag.
In case it's not obvious, Pattern and Matcher are the classes in java.util.regex.
String aTagName = getSomeTagName();
String text = getSomeText();
Pattern pattern = getSomePattern();
Matcher matches = pattern.matcher(text);
while (matches.find()) {
text = matches.replaceAll(String.format("<%1$s> %2$s </%1$s>", aTagName, matches.group()));
}
Why is that not the case?

I share your suspicion that this code is very likely unintended. replaceAll changes the matcher's state: it scans the whole string itself, so find() effectively succeeds only once, and the group captured by that first find() is used as the replacement for every match.
String text = "abcdEfg";
Pattern pattern = Pattern.compile("[a-z]");
Matcher matches = pattern.matcher(text);
while (matches.find()) {
System.out.println(text); // abcdEfg
text = matches.replaceAll(matches.group());
System.out.println(text); // aaaaEaa
}
Because replaceAll makes the matcher scan through the whole string, the matcher's position ends up at the end of the input. The next find() resumes from that position (the end, not the start), so it finds nothing and the loop terminates.
One of the correct ways to iterate and replace for each group appropriately may be to use appendReplacement:
String text = "abcdEfg";
Pattern pattern = Pattern.compile("[a-z]");
Matcher matches = pattern.matcher(text);
StringBuffer sb = new StringBuffer();
while (matches.find()) {
matches.appendReplacement(sb, matches.group().toUpperCase());
System.out.println(sb); // A, AB, ABC, ABCD, ABCDEF, ABCDEFG on successive iterations
}
matches.appendTail(sb);
System.out.println(sb); // ABCDEFG
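If the goal is what the question's loop seems to intend (wrapping each match in a tag), a single replaceAll call with the $0 group reference is probably enough, with no loop at all. This is a sketch under that assumption; the class and method names (TagWrapper, wrapMatches) are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagWrapper {
    // Wraps every match of regex in <tagName> ... </tagName>.
    public static String wrapMatches(String text, String tagName, String regex) {
        // quoteReplacement escapes any '$' or '\' in the tag name so that
        // they are not interpreted as group references.
        String tag = Matcher.quoteReplacement(tagName);
        // $0 refers to the text of each individual match, not just the first one.
        String replacement = "<" + tag + "> $0 </" + tag + ">";
        return Pattern.compile(regex).matcher(text).replaceAll(replacement);
    }

    public static void main(String[] args) {
        System.out.println(wrapMatches("abcdEfg", "i", "[a-z]"));
        // <i> a </i><i> b </i><i> c </i><i> d </i>E<i> f </i><i> g </i>
    }
}
```

Unlike the loop in the question, each match here is replaced with its own text rather than with the text of the first match.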

The example below shows that there is no reason to use the while loop if you are calling replaceAll. In both cases the output is:
<b> is </b> th<b> is </b> a summer ? Th<b> is </b> <b> is </b> very hot summer. <b> is </b>n't it?
import java.util.regex.*;
public class Test {
public static void main(String[] args) {
String text = "is this a summer ? This is very hot summer. isn't it?";
String tag = "b";
String pattern = "is";
System.out.println(question(text,tag,pattern));
System.out.println(alt(text,tag,pattern));
}
public static String question(String text, String tag, String p) {
Pattern pattern = Pattern.compile(p);
Matcher matcher= pattern.matcher(text);
while (matcher.find()) {
text = matcher.replaceAll(
String.format("<%1$s> %2$s </%1$s>",
tag, matcher.group()));
}
return text;
}
public static String alt(String text, String tag, String p) {
Pattern pattern = Pattern.compile(p);
Matcher matcher= pattern.matcher(text);
if(matcher.find())
return matcher.replaceAll(
String.format("<%1$s> %2$s </%1$s>",
tag, matcher.group()));
else
return text;
}
}

Java Regex to match supplied word with special characters

I am trying to match string by using Java Pattern class.
private boolean isMatch(String searchSentence, String matchWord) {
String patternText = ".*\\b";
Pattern pattern = Pattern.compile(patternText + matchWord + "\\b.*",Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(searchSentence);
return matcher.matches();
}
This works if the match string doesn't contain any special characters such as (, {, or [.
I am able to find "RANCH" but not "RANCH (EAGLEFORD)".
A few more examples:
If my input string is "Point [-99.73586,28.38092]", I should be able to search for "-99.73586,28.38092".
If my input string is "Point [-99.73586,28.38092]", I should be able to search for "[-99.73586,28.38092]".
If my input string is "Rench RenchY", I should be able to search for "Rench", but RenchY should not be part of the search result.
How can I handle these cases?

Example of using special characters ({}) in regex
String stringToSearch = "Some lengthy string I am trying to RANCH (EAGLEFORD) and RANCH {EAGLEFORD}";
Pattern p1 = Pattern.compile("RANCH\\s[(){}\\w]+");
Matcher m = p1.matcher(stringToSearch);
while (m.find())
{
System.out.println(m.group());
}
output:
RANCH (EAGLEFORD)
RANCH {EAGLEFORD}

If you plan to match your keywords not enclosed with word chars (letters, digits, or underscores), use (?<!\w) and (?!\w) lookarounds instead of \b.
private boolean isMatch(String searchSentence, String matchWord) {
Pattern pattern = Pattern.compile("(?<!\\w)" + Pattern.quote(matchWord) + "(?!\\w)", Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(searchSentence);
return matcher.find();
}
If you only plan to find matches enclosed with whitespace/start/end of string, use (?<!\S) and (?!\S) lookarounds:
private boolean isMatch(String searchSentence, String matchWord) {
Pattern pattern = Pattern.compile("(?<!\\S)" + Pattern.quote(matchWord) + "(?!\\S)", Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(searchSentence);
return matcher.find();
}
Do not forget to Pattern.quote your literal strings.
Using Matcher#find is preferable, since you then do not need the leading/trailing .* and you avoid the unnecessary backtracking overhead it causes.
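The two lookaround variants can be checked against the inputs from the question with a sketch like the following. The class and method names (KeywordSearch, containsKeyword, containsToken) are illustrative assumptions; the patterns are the ones from this answer.

```java
import java.util.regex.Pattern;

public class KeywordSearch {
    // Keyword must not touch a word char (letter, digit, underscore) on either side.
    public static boolean containsKeyword(String sentence, String keyword) {
        Pattern p = Pattern.compile("(?<!\\w)" + Pattern.quote(keyword) + "(?!\\w)",
                Pattern.CASE_INSENSITIVE);
        return p.matcher(sentence).find();
    }

    // Stricter: keyword must be delimited by whitespace or the string edges.
    public static boolean containsToken(String sentence, String keyword) {
        Pattern p = Pattern.compile("(?<!\\S)" + Pattern.quote(keyword) + "(?!\\S)",
                Pattern.CASE_INSENSITIVE);
        return p.matcher(sentence).find();
    }

    public static void main(String[] args) {
        String ranch = "Some lengthy string I am trying to RANCH (EAGLEFORD) and RANCH {EAGLEFORD}";
        String point = "Point [-99.73586,28.38092]";

        System.out.println(containsKeyword(ranch, "RANCH (EAGLEFORD)"));    // true
        System.out.println(containsKeyword(point, "-99.73586,28.38092"));   // true
        System.out.println(containsKeyword(point, "[-99.73586,28.38092]")); // true
        System.out.println(containsKeyword("RenchY", "Rench"));             // false

        // The whitespace variant rejects a keyword that touches '[' or ']'.
        System.out.println(containsToken(point, "-99.73586,28.38092"));     // false
        System.out.println(containsToken(point, "[-99.73586,28.38092]"));   // true
    }
}
```

Which variant fits depends on whether punctuation next to the keyword should count as a delimiter.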

pattern matching to detect special characters in a word

I am trying to identify any special characters ('?', '.', ',') at the end of a string in java. Here is what I wrote:
public static void main(String[] args) {
Pattern pattern = Pattern.compile("{.,?}$");
Matcher matcher = pattern.matcher("Sure?");
System.out.println("Input String matches regex - "+matcher.matches());
}
This returns a false when it's expected to be true. Please suggest.

Use "sure?".matches(".*[.,?]").
String#matches(...) requires the whole string to match, as if the regex were anchored with ^ and $, so there is no need to add them manually.

This is your code:
Pattern pattern = Pattern.compile("{.,?}$");
Matcher matcher = pattern.matcher("Sure?");
System.out.println("Input String matches regex - "+matcher.matches());
You have 2 problems:
You're using { and } instead of the character class brackets [ and ].
You're using Matcher#matches() instead of Matcher#find(). matches() must match the entire input, while find() searches anywhere in the string.
Change your code to:
Pattern pattern = Pattern.compile("[.,?]$");
Matcher matcher = pattern.matcher("Sure?");
System.out.println("Input String matches regex - " + matcher.find());
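The two approaches from the answers above can be wrapped in small helpers and compared side by side. This is only a sketch; the class and method names (TrailingPunctuation, endsWithPunctuation, endsWithPunctuationViaMatches) and the extra inputs are assumptions for illustration.

```java
import java.util.regex.Pattern;

public class TrailingPunctuation {
    // Character class for '.', ',' or '?' followed by the end of input.
    private static final Pattern ENDS_WITH_PUNCT = Pattern.compile("[.,?]$");

    // find() searches anywhere, so the pattern needs the explicit $ anchor.
    public static boolean endsWithPunctuation(String s) {
        return ENDS_WITH_PUNCT.matcher(s).find();
    }

    // matches() must consume the whole string, so a leading .* is needed instead.
    public static boolean endsWithPunctuationViaMatches(String s) {
        return s.matches(".*[.,?]");
    }

    public static void main(String[] args) {
        System.out.println(endsWithPunctuation("Sure?"));           // true
        System.out.println(endsWithPunctuation("Sure"));            // false
        System.out.println(endsWithPunctuation("Wait, what"));      // false
        System.out.println(endsWithPunctuationViaMatches("Sure?")); // true
    }
}
```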

Try this
Pattern pattern = Pattern.compile(".*[.,?]");
...

RegEX: how to match string which is not surrounded

I have a String "REC/LESS FEES/CODE/AU013423".
What regular expression would match "REC" and "AU013423" (anything that is not surrounded by slashes)?
I am using /^>*/, which works and matches the string within the slashes, i.e. using this I am able to find "/LESS FEES/CODE/", but I want to negate this to find the reverse, i.e. REC and AU013423.
Any help is appreciated. Thanks.

If you know that you're only looking for alphanumeric data you can use the regex ([A-Z0-9]+)/.*/([A-Z0-9]+) If this matches you will have the two groups which contain the first & final text strings.
This code prints RECAU013423
final String s = "REC/LESS FEES/CODE/AU013423";
final Pattern regex = Pattern.compile("([A-Z0-9]+)/.*/([A-Z0-9]+)", Pattern.CASE_INSENSITIVE);
final Matcher matcher = regex.matcher(s);
if (matcher.matches()) {
System.out.println(matcher.group(1) + matcher.group(2));
}
You can tweak the regex groups as necessary to cover valid characters

Here's another option:
String s = "REC/LESS FEES/CODE/AU013423";
String[] results = s.split("/.*/");
System.out.println(Arrays.toString(results));
// [REC, AU013423]

^[^/]+|[^/]+$
matches anything that occurs before the first or after the last slash in the string (or the entire string if there is no slash present).
To iterate over all matches in a string in Java:
Pattern regex = Pattern.compile("^[^/]+|[^/]+$");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
// matched text: regexMatcher.group()
// match start: regexMatcher.start()
// match end: regexMatcher.end()
}
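As a concrete version of the loop above, the matches can be collected into a list. The class and method names (OuterSegments, outerSegments) are hypothetical; the regex and the sample string come from this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OuterSegments {
    // Text before the first slash, or text after the last slash.
    private static final Pattern OUTER = Pattern.compile("^[^/]+|[^/]+$");

    public static List<String> outerSegments(String s) {
        List<String> result = new ArrayList<>();
        Matcher m = OUTER.matcher(s);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(outerSegments("REC/LESS FEES/CODE/AU013423")); // [REC, AU013423]
        System.out.println(outerSegments("NOSLASH"));                     // [NOSLASH]
    }
}
```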
