matches.find() with replaceAll() - java

I am new to Java and I found a loop in existing code that seems like it should be an infinite loop (or otherwise have highly undesirable behavior) which actually works.
Can you explain what I'm missing? The reason I think it should be infinite is that according to the documentation here (https://docs.oracle.com/javase/8/docs/api/java/util/regex/Matcher.html#replaceAll-java.lang.String-) a call to replaceAll will reset the matcher (This method first resets this matcher. It then scans the input sequence...). So I thought the below code would do its replacement and then call find() again, which would start over at the beginning. And it would keep finding the same string, since as you can see the string is just getting wrapped in a tag.
In case it's not obvious, Pattern and Matcher are the classes in java.util.regex.
String aTagName = getSomeTagName();
String text = getSomeText();
Pattern pattern = getSomePattern();
Matcher matches = pattern.matcher(text);
while (matches.find()) {
text = matches.replaceAll(String.format("<%1$s> %2$s </%1$s>", aTagName, matches.group()));
}
Why is that not the case?

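I share your suspicion that this code is very likely unintended. The argument to replaceAll is evaluated first, so matches.group() returns the first match. replaceAll then resets the matcher, scans the whole input, and replaces every match with that one string. The net effect is that only one search result is ever used, and every match is replaced with text built from the first group.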
String text = "abcdEfg";
Pattern pattern = Pattern.compile("[a-z]");
Matcher matches = pattern.matcher(text);
while (matches.find()) {
System.out.println(text); // abcdEfg
text = matches.replaceAll(matches.group());
System.out.println(text); // aaaaEaa
}
Because replaceAll scans the entire input, it leaves the matcher positioned past the last match. The next find() resumes from that position (the end, not the start), finds nothing more, and the loop ends. Note also that the matcher keeps working on the original string; reassigning text does not change its input.
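To make the termination visible, here is a minimal sketch that counts how often the loop body runs. The class and method names (ReplaceAllLoopDemo, countIterations) are made up for illustration, and Matcher.quoteReplacement is added only so the match text is treated literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ReplaceAllLoopDemo {
    // Runs the same find()/replaceAll() loop as the question and counts iterations.
    static int countIterations(String text, Pattern pattern) {
        Matcher matches = pattern.matcher(text);
        int iterations = 0;
        while (matches.find()) {
            iterations++;
            // group() is evaluated before replaceAll() resets the matcher,
            // so the replacement is built from the first match.
            text = matches.replaceAll(Matcher.quoteReplacement(matches.group()));
        }
        return iterations;
    }

    public static void main(String[] args) {
        // prints 1: after replaceAll() the next find() has nothing left to scan
        System.out.println(countIterations("abcdEfg", Pattern.compile("[a-z]")));
    }
}
```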
One of the correct ways to iterate and replace for each group appropriately may be to use appendReplacement:
String text = "abcdEfg";
Pattern pattern = Pattern.compile("[a-z]");
Matcher matches = pattern.matcher(text);
StringBuffer sb = new StringBuffer();
while (matches.find()) {
matches.appendReplacement(sb, matches.group().toUpperCase());
System.out.println(sb); // a growing prefix of ABCDEFG
}
matches.appendTail(sb);
System.out.println(sb); // ABCDEFG
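If the goal of the original loop was to wrap each match in a tag, a single replaceAll call with the $0 back-reference (the whole match) is probably enough. This is only a sketch of that guess at the intent; TagWrapper and wrapMatches are hypothetical names, and the tag name is quoted on the assumption that it may contain $ or \.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TagWrapper {
    // Wraps every match of the pattern in <tagName> ... </tagName>, one pass, no loop.
    static String wrapMatches(String text, String tagName, Pattern pattern) {
        String tag = Matcher.quoteReplacement(tagName);
        // $0 is replaced with the text of each individual match.
        return pattern.matcher(text).replaceAll("<" + tag + "> $0 </" + tag + ">");
    }

    public static void main(String[] args) {
        System.out.println(wrapMatches("abcdEfg", "b", Pattern.compile("[A-Z]")));
    }
}
```

Unlike the loop in the question, each match is wrapped with its own text rather than with the text of the first match.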

The examples below show that there is no reason to wrap replaceAll in a while loop. Both methods produce the same result:
<b> is </b> th<b> is </b> a summer ? Th<b> is </b> <b> is </b> very hot summer. <b> is </b>n't it?
import java.util.regex.*;
public class Test {
public static void main(String[] args) {
String text = "is this a summer ? This is very hot summer. isn't it?";
String tag = "b";
String pattern = "is";
System.out.println(question(text,tag,pattern));
System.out.println(alt(text,tag,pattern));
}
public static String question(String text, String tag, String p) {
Pattern pattern = Pattern.compile(p);
Matcher matcher= pattern.matcher(text);
while (matcher.find()) {
text = matcher.replaceAll(
String.format("<%1$s> %2$s </%1$s>",
tag, matcher.group()));
}
return text;
}
public static String alt(String text, String tag, String p) {
Pattern pattern = Pattern.compile(p);
Matcher matcher= pattern.matcher(text);
if(matcher.find())
return matcher.replaceAll(
String.format("<%1$s> %2$s </%1$s>",
tag, matcher.group()));
else
return text;
}
}

Related

Splitting string by new line with a condition

I am trying to split a String on \n only when the \n is not inside one of my "action blocks".
Here is an example text: message\n [testing](hover: actions!\nnew line!) more\nmessage
I want to split whenever the \n is not inside a [](...) block (a \n in there should be ignored). I wrote a regex for it, which you can see at https://regex101.com/r/RpaQ2h/1/. It seems to work correctly there, so I followed up with an implementation in Java:
final List<String> lines = new ArrayList<>();
final Matcher matcher = NEW_LINE_ACTION.matcher(message);
String rest = message;
int start = 0;
while (matcher.find()) {
if (matcher.group("action") != null) continue;
final String before = message.substring(start, matcher.start());
if (!before.isEmpty()) lines.add(before.trim());
start = matcher.end();
rest = message.substring(start);
}
if (!rest.isEmpty()) lines.add(rest.trim());
return lines;
This should ignore any \n inside the pattern shown above. However, the "action" group never matches: once the regex is moved to Java and a \n is present, it stops matching. I am confused as to why, since it worked perfectly on regex101.
Instead of checking whether the group is action, you can simply use regex replacement with the group $1 (the first capture group).
I also changed your regex to (?<action>\[[^\]]*]\([^)]*\))|(?<break>\\n) as [^\]]* doesn't backtrack (.*? backtracks and causes more steps). I did the same with [^)]*.
Working code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
public static void main(String[] args) {
final String regex = "(?<action>\\[[^\\]]*\\]\\([^)]*\\))|(?<break>\\\\n)";
final String string = "message\\n [testing test](hover: actions!\\nnew line!) more\\nmessage";
final Pattern pattern = Pattern.compile(regex);
final Matcher matcher = pattern.matcher(string);
final String result = matcher.replaceAll("$1");
System.out.println(result);
}
}
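If an actual list of lines is needed rather than a replaced string, the same alternation can drive a split loop. The following is a rough sketch under the same assumption as the answer above, namely that the input contains a literal backslash followed by n rather than a real newline character; ActionAwareSplitter and split are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ActionAwareSplitter {
    // Same regex as in the answer: an action block, or a literal "\n" outside one.
    private static final Pattern TOKEN =
            Pattern.compile("(?<action>\\[[^\\]]*\\]\\([^)]*\\))|(?<break>\\\\n)");

    static List<String> split(String message) {
        List<String> lines = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(message);
        int start = 0;
        while (matcher.find()) {
            // An action block was consumed as a whole; any "\n" inside it is skipped.
            if (matcher.group("break") == null) continue;
            String before = message.substring(start, matcher.start()).trim();
            if (!before.isEmpty()) lines.add(before);
            start = matcher.end();
        }
        String rest = message.substring(start).trim();
        if (!rest.isEmpty()) lines.add(rest);
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(split(
                "message\\n [testing test](hover: actions!\\nnew line!) more\\nmessage"));
    }
}
```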

Ignore parameters in url using regex in java

So I have the following route /path1/path2/{value1}/path3/{value2} and I'm trying to figure out if the request route matches path1 path2 and path3 regardless the {value1} and {value2} which change.
This is what I have but its not matching:
@Test
public void testURLMatches() {
String input = "/path1/path2/123/path3/456";
Pattern pattern = Pattern.compile("\\/path1\\/path2\\/([a-zA-Z0-9]{0,})\\/path3\\/([a-zA-Z0-9]{0,})");
Matcher matcher = pattern.matcher(input);
if (matcher.find()) {
System.out.println("Does match!");
} else {
System.out.println("Does not match!");
}
assertTrue(matcher.find());
}
Edit 1:
Added in the pattern \/ which was missing originally
I think the regex you are looking for is
^\/path1\/path2\/([\w]+)\/path3\/([\w]+)$
PS: There is another problem in your test: it calls matcher.find() twice. The second call continues searching after the first match, finds nothing, and returns false, so assertTrue fails. Call find() only once by removing the if block.
In Java, you get
@Test
public void testURLMatches() {
String input = "/path1/path2/123/path3/456";
Pattern pattern = Pattern.compile("^\\/path1\\/path2\\/([\\w]+)\\/path3\\/([\\w]+)$");
Matcher matcher = pattern.matcher(input);
assertTrue(matcher.find());
}
Your pattern does not match because it needs a / after /path2. Try this and it will work:
string input = "/path1/path2/123/path3/456";
string pattern = @"\/path1\/path2\/[a-zA-Z0-9]{0,}\/path3\/[a-zA-Z0-9]{0,}";
Match m = Regex.Match(input, pattern, RegexOptions.IgnoreCase);
if (m.Success)
{
// match
}
else
{
// not match
}
It is not very clear to me what the accepted values for {value} are, but you can use this as well:
\/path1\/path2\/[\w]*\/path3\/[\w]*
[\w]*: zero or more occurrences of any word character (letter, digit, or underscore)
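As a Java-side sketch of the same idea, Matcher.matches() can be used in place of find() plus ^ and $ anchors, since it only succeeds when the whole input matches. RouteMatcher and extractValues are hypothetical names, and \w+ is an assumption about which values are acceptable.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RouteMatcher {
    // "/" needs no escaping in Java regex; the groups capture {value1} and {value2}.
    private static final Pattern ROUTE =
            Pattern.compile("/path1/path2/(\\w+)/path3/(\\w+)");

    // Returns the two path values, or an empty array if the route does not match.
    static String[] extractValues(String path) {
        Matcher matcher = ROUTE.matcher(path);
        if (!matcher.matches()) {
            return new String[0];
        }
        return new String[] { matcher.group(1), matcher.group(2) };
    }

    public static void main(String[] args) {
        String[] values = extractValues("/path1/path2/123/path3/456");
        System.out.println(values[0] + " " + values[1]);
    }
}
```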

How do I take a string with a named group and replace only that named capture group with a value in Java 7

Say for example I have the following string with a named capture group:
/this/(?<capture1>.*)/a/string/(?<capture2>.*)
And I want to replace the capture group with a value like "foo" so that I end up with a string that looks like:
/this/foo/a/string/bar
Limitations are:
Regex must be used as the string is evaluated elsewhere but it doesn't have to be a capture group.
I'd rather not have to regex match the regex.
EDIT: There can be many groups in the string.
You can find the start and end index of each match:
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
int startindex = matcher.start();
int stopindex = matcher.end();
// Your code for replacing that range and generating a new string with foo
// You can use a StringBuffer to delete and insert characters, since you know the indexes
}
Full Implementation:
public static String getnewString(String text, String reg) {
StringBuffer result = new StringBuffer(text);
Pattern pattern = Pattern.compile(reg);
Matcher matcher = pattern.matcher(text);
int shift = 0; // how far earlier replacements have moved later indexes in result
while (matcher.find()) {
int startindex = matcher.start() + shift;
int stopindex = matcher.end() + shift;
System.out.println(startindex + " " + stopindex);
// Replace the matched range in one step instead of delete + insert
result.replace(startindex, stopindex, "foo");
shift += "foo".length() - (stopindex - startindex);
}
return result.toString();
}
Try this,
int lastIndex = s.lastIndexOf("/");
String newString = s.substring(0, lastIndex+1).concat("newString");
System.out.println(newString);
Take the substring up to the last '/' and then append the new string to it, as shown above.
I got it:
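String string = "/this/(?<capture1>.*)/a/string/(?<capture2>.*)";
Pattern pattern = Pattern.compile(string);
Matcher matcher = pattern.matcher(string); // match the regex text against itself
if (matcher.matches()) {
// Each group captures its own "(?<name>.*)" text, which is then replaced
string = string.replace(matcher.group("capture1"), "value 1");
string = string.replace(matcher.group("capture2"), "value 2");
}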
Crazy, but works.
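Since the question mentions that there can be many groups, the self-matching trick above could be generalised roughly as follows. This is a sketch that assumes the group names are known up front and that the template matches itself (true for greedy .* groups like the ones shown); TemplateFiller and fill are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TemplateFiller {
    // Replaces each named group's source text in the template with the given value.
    static String fill(String template, Map<String, String> values) {
        Matcher matcher = Pattern.compile(template).matcher(template);
        if (!matcher.matches()) {
            return template; // the trick only works when the regex matches its own text
        }
        String result = template;
        for (Map.Entry<String, String> entry : values.entrySet()) {
            // group(name) throws IllegalArgumentException for an unknown group name
            result = result.replace(matcher.group(entry.getKey()), entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> values = new LinkedHashMap<>();
        values.put("capture1", "foo");
        values.put("capture2", "bar");
        // prints /this/foo/a/string/bar
        System.out.println(fill("/this/(?<capture1>.*)/a/string/(?<capture2>.*)", values));
    }
}
```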

Print out the last match of a regex

I have this code:
String responseData = "http://xxxxx-f.frehd.net/i/world/open/20150426/1370235-005A/EPISOD-1370235-005A-016f1729028090bf_,892,144,252,360,540,1584,2700,.mp4.csmil/.m3u8";
String pattern = "^(https://.*\\.54325)$";
Pattern pr = Pattern.compile(pattern);
Matcher math = pr.matcher(responseData);
if (math.find()) {
// print the url
}
else {
System.out.println("No Math");
}
I want to print the last string that starts with http and ends with .m3u8. How do I do this? I'm stuck. All help is appreciated.
The problem I have now is that when I find a match and want to print the string, I get everything from responseData.
In case you need to get some substring at the end that is preceded by similar substrings, you need to make sure the regex engine has already consumed as many characters before your required match as possible.
Also, you have a ^ in your pattern that means beginning of a string. Thus, it starts matching from the very beginning.
You can achieve what you want with just lastIndexOf and substring:
System.out.println(str.substring(str.lastIndexOf("http://")));
Or, if you need a regex, you'll need to use
String pattern = ".*(http://.*?\\.m3u8)$";
and use math.group(1) to print the value.
Sample code:
import java.util.regex.*;
public class HelloWorld{
public static void main(String []args){
String str = "http://xxxxx-f.akamaihd.net/i/world/open/20150426/1370235-005A/EPISOD-1370235-005A-016f1729028090bf_,892,144,252,360,540,1584,2700,.mp4.csmil/index_0_av.m3u8" +
"EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=2795000,RESOLUTION=1280x720,CODECS=avc1.64001f, mp4a.40.2" +
"http://xxxxx-f.akamaihd.net/i/world/open/20150426/1370235-005A/EPISOD-1370235-005A-016f1729028090bf_,892,144,252,360,540,1584,2700,.mp4.csmil/index_6_av.m3u8";
String rx = ".*(http://.*?\\.m3u8)$";
Pattern ptrn = Pattern.compile(rx);
Matcher m = ptrn.matcher(str);
while (m.find()) {
System.out.println(m.group(1));
}
}
}
Output:
http://xxxxx-f.akamaihd.net/i/world/open/20150426/1370235-005A/EPISOD-1370235-005A-016f1729028090bf_,892,144,252,360,540,1584,2700,.mp4.csmil/index_6_av.m3u8
Also tested on RegexPlanet
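Another option, sketched here as an alternative to the leading .* approach, is to iterate over all lazy matches and keep the last one. LastMatchFinder and lastMatch are hypothetical names, and the short URLs in main are placeholders rather than the real data.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LastMatchFinder {
    // Lazy quantifier: each match stops at the first ".m3u8" after "http://".
    private static final Pattern URL = Pattern.compile("http://.*?\\.m3u8");

    // Returns the last URL found in the input, or null if there is none.
    static String lastMatch(String input) {
        Matcher matcher = URL.matcher(input);
        String last = null;
        while (matcher.find()) {
            last = matcher.group();
        }
        return last;
    }

    public static void main(String[] args) {
        // prints http://a/2.m3u8
        System.out.println(lastMatch("http://a/1.m3u8EXT-X http://a/2.m3u8"));
    }
}
```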

I am trying to extract text using regex but it is not working

I am trying to extract text using a regex, but it is not working, although my regex works fine in online regex validators.
public class HelloWorld {
public static void main(String []args){
String PATTERN1 = "F\\{([\\w\\s&]*)\\}";
String PATTERN2 = "{([\\w\\s&]*)\\}";
String src = "F{403}#{Title1}";
List<String> fvalues = Arrays.asList(src.split("#"));
System.out.println(fieldExtract(fvalues.get(0), PATTERN1));
System.out.println(fieldExtract(fvalues.get(1), PATTERN2));
}
private static String fieldExtract(String src, String ptrn) {
System.out.println(src);
System.out.println(ptrn);
Pattern pattern = Pattern.compile(ptrn);
Matcher matcher = pattern.matcher(src);
return matcher.group(1);
}
}
Why not use:
Pattern regex = Pattern.compile("F\\{([\\d\\s&]*)\\}#\\{([\\s\\w&]*)\\}");
to get both?
This way the number will be in group 1 and the title in group 2.
Another thing: if you're going to precompile the regex (which can help performance), at least make the Pattern object static so that it isn't recompiled each time you call the function (which would defeat the whole point of precompiling :) ).
First problem:
String PATTERN2 = "\\{([\\w\\s&]*)\\}"; // quote '{'
Second problem:
Matcher matcher = pattern.matcher(src);
if( matcher.matches() ){
return matcher.group(1);
} else ...
The Matcher must be asked to plough the field, otherwise you can't harvest the results.
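Putting both fixes together with the advice about static patterns, a corrected version might look like the sketch below. FieldExtractor and the constant names are made up for illustration, and returning null on a failed match is just one possible choice for the else branch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FieldExtractor {
    // Compiled once; note the escaped '{' at the start of the second pattern.
    static final Pattern F_FIELD = Pattern.compile("F\\{([\\w\\s&]*)\\}");
    static final Pattern BRACED_FIELD = Pattern.compile("\\{([\\w\\s&]*)\\}");

    // Returns the text inside the braces, or null if src does not match.
    static String fieldExtract(String src, Pattern pattern) {
        Matcher matcher = pattern.matcher(src);
        // matches() must run (and succeed) before group(1) can be read.
        return matcher.matches() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String[] parts = "F{403}#{Title1}".split("#");
        System.out.println(fieldExtract(parts[0], F_FIELD));      // prints 403
        System.out.println(fieldExtract(parts[1], BRACED_FIELD)); // prints Title1
    }
}
```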
