Java Pattern matching regex - java

I am matching with a Pattern: matcher.matches() returns false, yet matcher.replaceAll() finds the pattern and replaces it. Also, matcher.group(1) throws an exception.
@Test
public void testname() throws Exception {
    Pattern p = Pattern.compile("<DOCUMENT>(.*)</DOCUMENT>");
    Matcher m = p.matcher("<RESPONSE><DOCUMENT>SDFS878SDF87DSF</DOCUMENT></RESPONSE>");
    System.out.println("Why is this false=" + m.matches());
    String s = m.replaceAll("HEY");
    System.out.println("But replaceAll found it=" + s);
}
I need the matcher.matches() to return true, and the matcher.group(1) to return
"<DOCUMENT>SDFS878SDF87DSF</DOCUMENT>"
Thanks in advance for the help.

final Pattern pattern = Pattern.compile("<DOCUMENT>(.+?)</DOCUMENT>");
final Matcher matcher = pattern.matcher("<RESPONSE><DOCUMENT>SDFS878SDF87DSF</DOCUMENT></RESPONSE>");
if (matcher.find())
{
    System.out.println(matcher.group(1));
    // code to replace and inject new value between the <DOCUMENT> tags
}
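The difference is that matches() succeeds only when the pattern covers the entire input, while find() looks for the pattern anywhere inside it. group(1) can only be read after a successful match, which is why it threw in the question. Below is a minimal sketch using the input from the question; the class name MatchesVsFind and its helper methods are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchesVsFind {
    static final String INPUT =
            "<RESPONSE><DOCUMENT>SDFS878SDF87DSF</DOCUMENT></RESPONSE>";

    // matches() succeeds only if the pattern spans the entire input.
    public static boolean wholeInputMatches(String regex) {
        return Pattern.compile(regex).matcher(INPUT).matches();
    }

    // find() locates the first occurrence anywhere; returns group 1, or null if none.
    public static String firstGroup(String regex) {
        Matcher m = Pattern.compile(regex).matcher(INPUT);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // The original pattern does not cover <RESPONSE>...</RESPONSE>.
        System.out.println(wholeInputMatches("<DOCUMENT>(.*)</DOCUMENT>"));     // false
        // Padding with .* lets matches() succeed; the group now includes the tags.
        System.out.println(wholeInputMatches(".*(<DOCUMENT>.*</DOCUMENT>).*")); // true
        System.out.println(firstGroup("<DOCUMENT>(.+?)</DOCUMENT>"));           // SDFS878SDF87DSF
        System.out.println(firstGroup("(<DOCUMENT>.+?</DOCUMENT>)"));
        // <DOCUMENT>SDFS878SDF87DSF</DOCUMENT>
    }
}
```

Moving the parentheses around the tags, as in the last call, is one way to get the group the question asks for.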

Related

matches.find() with replaceAll()

I am new to Java and I found a loop in existing code that seems like it should be an infinite loop (or otherwise have highly undesirable behavior) which actually works.
Can you explain what I'm missing? The reason I think it should be infinite is that according to the documentation here (https://docs.oracle.com/javase/8/docs/api/java/util/regex/Matcher.html#replaceAll-java.lang.String-) a call to replaceAll will reset the matcher (This method first resets this matcher. It then scans the input sequence...). So I thought the below code would do its replacement and then call find() again, which would start over at the beginning. And it would keep finding the same string, since as you can see the string is just getting wrapped in a tag.
In case it's not obvious, Pattern and Matcher are the classes in java.util.regex.
String aTagName = getSomeTagName();
String text = getSomeText();
Pattern pattern = getSomePattern();
Matcher matches = pattern.matcher(text);
while (matches.find()) {
    text = matches.replaceAll(String.format("<%1$s> %2$s </%1$s>", aTagName, matches.group()));
}
Why is that not the case?
I share your suspicion that this code probably does not do what was intended. replaceAll changes the matcher's state, and since it scans the whole string while replacing, only one find() ever succeeds. The group from that first match is then used as the replacement for every match.
String text = "abcdEfg";
Pattern pattern = Pattern.compile("[a-z]");
Matcher matches = pattern.matcher(text);
while (matches.find()) {
    System.out.println(text); // abcdEfg
    text = matches.replaceAll(matches.group());
    System.out.println(text); // aaaaEaa
}
Because replaceAll scans through the whole string, it leaves the matcher positioned at the end of the input. The next find() resumes from that position (the end, not the start), so there is nothing left to find and the loop exits.
One of the correct ways to iterate and replace for each group appropriately may be to use appendReplacement:
String text = "abcdEfg";
Pattern pattern = Pattern.compile("[a-z]");
Matcher matches = pattern.matcher(text);
StringBuffer sb = new StringBuffer();
while (matches.find()) {
    matches.appendReplacement(sb, matches.group().toUpperCase());
    System.out.println(sb); // a growing prefix of ABCDEFG
}
matches.appendTail(sb);
System.out.println(sb); // ABCDEFG
The example below shows there is no reason to use the while loop if you are calling replaceAll. In both cases the result is
<b> is </b> th<b> is </b> a summer ? Th<b> is </b> <b> is </b> very hot summer. <b> is </b>n't it?
import java.util.regex.*;

public class Test {
    public static void main(String[] args) {
        String text = "is this a summer ? This is very hot summer. isn't it?";
        String tag = "b";
        String pattern = "is";
        System.out.println(question(text, tag, pattern));
        System.out.println(alt(text, tag, pattern));
    }

    public static String question(String text, String tag, String p) {
        Pattern pattern = Pattern.compile(p);
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            text = matcher.replaceAll(
                    String.format("<%1$s> %2$s </%1$s>",
                            tag, matcher.group()));
        }
        return text;
    }

    public static String alt(String text, String tag, String p) {
        Pattern pattern = Pattern.compile(p);
        Matcher matcher = pattern.matcher(text);
        if (matcher.find())
            return matcher.replaceAll(
                    String.format("<%1$s> %2$s </%1$s>",
                            tag, matcher.group()));
        else
            return text;
    }
}
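If the intent of the original loop was to wrap each match in a tag around its own text, neither the loop nor the initial find() is needed. A replacement string can refer to the current match as $0. The sketch below assumes that intent; the class name TagWrapper and the method wrapAll are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagWrapper {
    // Wraps every match of regex in "<tag> ... </tag>".
    // "$0" in the replacement stands for the text of each individual match.
    public static String wrapAll(String text, String tag, String regex) {
        // quoteReplacement guards against '$' or '\' in the tag name.
        String safeTag = Matcher.quoteReplacement(tag);
        String replacement = "<" + safeTag + "> $0 </" + safeTag + ">";
        return Pattern.compile(regex).matcher(text).replaceAll(replacement);
    }

    public static void main(String[] args) {
        System.out.println(wrapAll("abcdEfg", "b", "[a-z]+"));
        // <b> abcd </b>E<b> fg </b>
    }
}
```

Unlike the loop in the question, each match keeps its own text here ("abcd" and "fg") instead of every match being replaced by the first one.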

Ignore parameters in url using regex in java

So I have the following route /path1/path2/{value1}/path3/{value2} and I'm trying to figure out if the request route matches path1 path2 and path3 regardless the {value1} and {value2} which change.
This is what I have, but it's not matching:
@Test
public void testURLMatches() {
    String input = "/path1/path2/123/path3/456";
    Pattern pattern = Pattern.compile("\\/path1\\/path2\\/([a-zA-Z0-9]{0,})\\/path3\\/([a-zA-Z0-9]{0,})");
    Matcher matcher = pattern.matcher(input);
    if (matcher.find()) {
        System.out.println("Does match!");
    } else {
        System.out.println("Does not match!");
    }
    assertTrue(matcher.find());
}
Edit 1:
Added in the pattern \/ which was missing originally
I think the Regex you are looking at is
^\/path1\/path2\/([\w]+)\/path3\/([\w]+)$
PS: You have another problem in your test. You call matcher.find() twice, and the second call continues searching after the first match, so it returns false. Call it only once: remove the if condition.
In Java, you get
@Test
public void testURLMatches() {
    String input = "/path1/path2/123/path3/456";
    Pattern pattern = Pattern.compile("^\\/path1\\/path2\\/([\\w]+)\\/path3\\/([\\w]+)$");
    Matcher matcher = pattern.matcher(input);
    assertTrue(matcher.find());
}
Your pattern does not match because you need a / after /path2. Try this (C#) and it will work:
string input = "/path1/path2/123/path3/456";
string pattern = @"\/path1\/path2\/[a-zA-Z0-9]{0,}\/path3\/[a-zA-Z0-9]{0,}";
Match m = Regex.Match(input, pattern, RegexOptions.IgnoreCase);
if (m.Success)
{
    // match
}
else
{
    // not match
}
It is not very clear to me what the accepted values for {value} are, but you can use this as well:
\/path1\/path2\/[\w]*\/path3\/[\w]*
[\w]*: zero or more occurrences of any word character (letter, digit, or underscore)
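For the Java side of the question, here is a minimal sketch that checks the route and pulls out both values in one step. It assumes the values consist of word characters only, and the class name RouteMatcher and method extract are made up for illustration. It uses matches() rather than find(), so no ^ and $ anchors are needed, and / is left unescaped because it has no special meaning in Java regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RouteMatcher {
    // \\w+ requires at least one word character for each value.
    private static final Pattern ROUTE =
            Pattern.compile("/path1/path2/(\\w+)/path3/(\\w+)");

    // Returns {value1, value2} if the whole path matches the route, otherwise null.
    public static String[] extract(String path) {
        Matcher m = ROUTE.matcher(path);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] values = extract("/path1/path2/123/path3/456");
        System.out.println(values[0] + ", " + values[1]);       // 123, 456
        System.out.println(extract("/path1/path2/123/path3/")); // null
    }
}
```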

Regular Expression not contain a word list

I'm trying to create a regular expression that matches a certain word only when it is not preceded by some specific words, like this:
(?<!(state|government|head).*)of
ex:
state of -> not match
government of -> not match
Abc of -> match
But it doesn't work and I don't know why. Please help me understand.
You can use this regex with a negative lookahead. For example:
public static void main(String[] args) {
    Pattern pattern = Pattern.compile("^(?!state|government|head).*$");
    String s = "state of";
    Matcher matcher = pattern.matcher(s);
    boolean bl = matcher.find();
    System.out.println(bl); // false
    s = "government of";
    matcher = pattern.matcher(s);
    bl = matcher.find();
    System.out.println(bl); // false
    s = "Abc of";
    matcher = pattern.matcher(s);
    bl = matcher.find();
    System.out.println(bl); // true
}
Hope this helps!
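The lookahead above only checks how the string starts. If the listed words may appear later in the string, one option is to apply the negative lookahead at every position before "of". The sketch below assumes the goal is "there is an of with none of the listed words anywhere before it"; the class name OfFilter and method hasAllowedOf are hypothetical, and the check is case-sensitive.

```java
import java.util.regex.Pattern;

public class OfFilter {
    // (?:(?!\b(?:...)\b).)*? consumes characters one at a time, refusing to step
    // onto a position where one of the listed words begins, until "of" is reached.
    private static final Pattern ALLOWED_OF = Pattern.compile(
            "^(?:(?!\\b(?:state|government|head)\\b).)*?\\bof\\b");

    // True if some "of" occurs with none of the listed words before it.
    public static boolean hasAllowedOf(String s) {
        return ALLOWED_OF.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(hasAllowedOf("state of"));      // false
        System.out.println(hasAllowedOf("government of")); // false
        System.out.println(hasAllowedOf("Abc of"));        // true
        System.out.println(hasAllowedOf("the state of"));  // false
    }
}
```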

Pattern to match any character until boundary character

I have the string:
String myStr = "Operation=myMethod\nDataIn=A;B;C;D\nDataOut=X;Y;Z\n";
and I want to match DataIn.
I have the following code:
Pattern pattern = Pattern.compile("Operation=myMethod.*DataIn=(.*)?\n", Pattern.DOTALL);
Matcher matcher = pattern.matcher(myStr);
if (matcher.find()) {
    return matcher.group(1);
}
The problem is that it is returning: "A;B;C;D\nDataOut=X;Y;Z\n"
I tried with the pattern: "Operation=myMethod.*DataIn=(.*?\n)"
It then returns "A;B;C;D\n". I don't want the final "\n" to be returned.
Replace (.*) in your regex with ([^\n]*) to match up to the line break. Note that \b does not act as a word boundary inside a character class, so ([^\b]*) is not a reliable way to stop at a boundary.
Pattern pattern = Pattern.compile("Operation=myMethod.*DataIn=([^\\n]*)?\n", Pattern.DOTALL);
Matcher matcher = pattern.matcher(myStr);
if (matcher.find()) {
    return matcher.group(1);
}
The [^...] construct is a negated character class: it matches any character that isn't in the set.
You can use:
Pattern pattern = Pattern.compile("Operation=myMethod.*?DataIn=([^\\n]*)", Pattern.DOTALL);
This captures zero or more characters in group #1, stopping at the first \n.
Try using this:
PATTERN
(?<=DataIn=)(.+?)(?=\\n)
CODE
Pattern pattern = Pattern.compile("(?<=DataIn=)(.+?)(?=\\n)", Pattern.DOTALL);
Matcher matcher = pattern.matcher(myStr);
if (matcher.find())
{
    return matcher.group(1);
}
INPUT
Operation=myMethod\nDataIn=A;B;C;D\nDataOut=X;Y;Z\n
OUTPUT
A;B;C;D
Regex
(?s)(?<=DataIn=)(.+?)(?=\n)
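Another way to stop at the line break, offered only as a sketch, is to leave DOTALL off (so . never matches \n) and turn on MULTILINE so that ^ and $ match at line boundaries. The class name DataInExtractor and method dataIn are made up for illustration. Unlike the patterns above, this one does not check that Operation=myMethod precedes the DataIn line.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DataInExtractor {
    // With MULTILINE, ^ and $ match at the start and end of each line.
    // Without DOTALL, "." stops at the line break, so the group never contains "\n".
    private static final Pattern DATA_IN =
            Pattern.compile("^DataIn=(.*)$", Pattern.MULTILINE);

    // Returns the value of the DataIn line, or null if there is none.
    public static String dataIn(String input) {
        Matcher m = DATA_IN.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String myStr = "Operation=myMethod\nDataIn=A;B;C;D\nDataOut=X;Y;Z\n";
        System.out.println(dataIn(myStr)); // A;B;C;D
    }
}
```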

RegEx performance issue

I have written a regular expression to validate a name. The name must start with a letter and can be followed by letters, digits, spaces or underscores.
The regex that I wrote is:
private static final String REGEX = "([a-zA-Z][a-zA-Z0-9 _]*)*";
If the input is: "kasklfhklasdhklghjsdkgsjkdbgjsbdjKg;" the program gets stuck on matcher.matches().
Pattern pattern = Pattern.compile(REGEX);
Matcher matcher = pattern.matcher(input);
if (matcher.matches()) {
    System.out.println("Pattern Matches");
} else {
    System.out.println("Match Declined");
}
How can I optimize the regex?
Change your regex to:
private static final String REGEX = "[a-zA-Z][a-zA-Z0-9 _]*";
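And it will match the String in an instant. The nested quantifier in ([...]*)* lets the engine split the same input into groups in exponentially many ways, so when the match fails (here because of the trailing ;) it backtracks through all of them before giving up.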
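A small sketch of the simplified pattern in use; the class name NameValidator and method isValid are made up for illustration. One behavioural difference to be aware of: the original pattern, with its outer *, also accepted the empty string, while this one requires at least one letter.

```java
import java.util.regex.Pattern;

public class NameValidator {
    // One leading letter, then any number of letters, digits, spaces or underscores.
    // No nested quantifier, so a failing input is rejected in a single pass.
    private static final Pattern NAME = Pattern.compile("[a-zA-Z][a-zA-Z0-9 _]*");

    public static boolean isValid(String input) {
        return NAME.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("kasklfhklasdhklghjsdkgsjkdbgjsbdjKg;")); // false
        System.out.println(isValid("John_Doe 42"));                          // true
        System.out.println(isValid("9lives"));                               // false
    }
}
```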
