I'm trying to replace certain words in a long string. Some words stay the same and some change. The words that don't change seem to get the matcher stuck in an infinite loop, as it keeps trying to do the same action on words that are meant to stay the same. Below is an example similar to mine; I'm afraid I couldn't post my exact code because it's far more detailed and would take up too much space.
public String test() {
String temp = "<p><img src=\"logo.jpg\"/></p>\n<p>CANT TOUCH THIS!</p>";
Pattern pattern = Pattern.compile("(<p(\\s.+)?>(.+)?</p>)");
Matcher matcher = pattern.matcher(temp);
StringBuilder stringBuilder = new StringBuilder(temp);
int start;
int end;
String match;
while (matcher.find()) {
start = matcher.start();
end = matcher.end();
match = temp.substring(start, end);
stringBuilder.replace(start, end, changeWords(match));
temp = stringBuilder.toString();
matcher = pattern.matcher(temp);
System.out.println("This is the word I'm getting stuck on: " + match);
}
return temp;
}
public String changeWords(String words) {
return "<p><img src=\"logo.jpg\"/></p>";
}
Any suggestions as to why this might be happening?
You reinitialize the matcher in the loop.
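Remove the matcher = pattern.matcher(temp); instruction in your while loop and you should not be stuck any more. Be aware that the matcher then keeps scanning the original string, so if changeWords() changes the length of the text, the offsets from start() and end() no longer line up with the StringBuilder; building the result with appendReplacement() and appendTail() avoids that problem.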
You are using Matcher wrong. Your while loop reads:
while (matcher.find()) {
start = matcher.start();
end = matcher.end();
match = temp.substring(start, end);
stringBuilder.replace(start, end, changeWords(match));
temp = stringBuilder.toString();
matcher = pattern.matcher(temp);
}
it should just be:
temp = matcher.replaceAll("new text");
No while loop is needed. A matcher will not replace text it does not match, and it will not match twice at the same place, so there is no need to spoon-feed it.
What is more, your regex can do without the capturing parentheses. And if you only want to replace "words" (regexes have no notion of words), add word-boundary anchors (\b) around the text to be matched:
Pattern pattern = Pattern.compile("\\btext\\b");
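When each match needs its own computed replacement, the loop does not have to be dropped; it just must not reset the matcher. A minimal sketch, not the asker's real code: it walks the original string once with find() and appendReplacement(), so a match that changeWords() leaves unchanged is copied through and never revisited. The pattern is the one from the question; the changeWords() body is a hypothetical stand-in that edits one paragraph and leaves the other alone.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SinglePassReplace {

    // Hypothetical stand-in for the question's changeWords(): it edits the
    // second paragraph and returns the first one unchanged (the case that looped).
    static String changeWords(String words) {
        return words.replace("CANT", "CAN'T");
    }

    static String test(String temp) {
        Pattern pattern = Pattern.compile("(<p(\\s.+)?>(.+)?</p>)");
        Matcher matcher = pattern.matcher(temp);
        StringBuffer out = new StringBuffer();
        // find() always resumes after the previous match and the matcher is
        // never reset, so each paragraph is visited exactly once.
        while (matcher.find()) {
            // quoteReplacement() keeps any '$' or '\' in the new text literal.
            matcher.appendReplacement(out,
                    Matcher.quoteReplacement(changeWords(matcher.group())));
        }
        matcher.appendTail(out); // copy whatever follows the last match
        return out.toString();
    }

    public static void main(String[] args) {
        String temp = "<p><img src=\"logo.jpg\"/></p>\n<p>CANT TOUCH THIS!</p>";
        System.out.println(test(temp));
    }
}
```

Because "." does not match a line break by default, the pattern matches each paragraph line separately here.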
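You are matching the word "text" and then replacing it either with "text" again (the if branch in changeWords()) or with "new text" (the else branch). Both replacements still contain "text", so after you reset the matcher it finds a match again on every pass. That's why it causes an infinite loop.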
Why are you using Matcher at all? You don't need regex to replace words, just use replace():
input = input.replace("oldtext", "newtext"); // replaces all occurrences; Strings are immutable, so keep the result
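One detail worth knowing when choosing between the two: String.replace() treats both arguments literally, while replaceAll() and replaceFirst() compile the first argument as a regex and interpret $ and \ in the second. A small illustration with a made-up input:

```java
import java.util.regex.Pattern;

public class LiteralVsRegex {
    // replace(): "." is just a dot, so only the dots change.
    static String literal(String s) {
        return s.replace(".", "-");
    }

    // replaceAll(): "." is a regex that matches ANY character.
    static String regex(String s) {
        return s.replaceAll(".", "-");
    }

    // Pattern.quote() makes replaceAll() treat the text literally again.
    static String quoted(String s) {
        return s.replaceAll(Pattern.quote("."), "-");
    }

    public static void main(String[] args) {
        String input = "a.b.c";
        System.out.println(literal(input)); // a-b-c
        System.out.println(regex(input));   // -----
        System.out.println(quoted(input));  // a-b-c
    }
}
```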
I fixed it by only resetting the matcher when the text actually changed:
if (!match.equals(changeWords(match))) {
matcher = pattern.matcher(temp);
}
Related
Given a regex, I want to replace the matched part of a string with as many "." characters as it is long.
I tried something like this:
s = s.replaceAll(matcher.group(1),"." * matcher.group(1).length() );
but "." * length gives a compile error. Is there any way I can fix that?
You might have to use a formal pattern matcher here:
String input = "Peas porridge hot, peas porridge cold";
Pattern pattern = Pattern.compile("(?i)\\bpeas\\b");
Matcher m = pattern.matcher(input);
StringBuffer buffer = new StringBuffer();
while(m.find()) {
m.appendReplacement(buffer, m.group().replaceAll(".", "."));
}
m.appendTail(buffer);
System.out.println(buffer.toString());
// .... porridge hot, .... porridge cold
The loop above matches each occurrence of peas (case-insensitively). For each match, appendReplacement() splices in a replacement: the matched text (peas) with every character replaced by a dot.
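Java has no "." * n operator; since Java 11, String.repeat(int) fills that role. A sketch of the same loop using it, assuming Java 11 or later (the mask() helper name is ours, not from the question):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MaskMatches {
    // Replaces every match of the pattern with one '.' per matched character.
    static String mask(String input, Pattern pattern) {
        Matcher m = pattern.matcher(input);
        StringBuffer buffer = new StringBuffer();
        while (m.find()) {
            // ".".repeat(n) is the Java counterpart of "." * n in other languages.
            m.appendReplacement(buffer, ".".repeat(m.group().length()));
        }
        m.appendTail(buffer);
        return buffer.toString();
    }

    public static void main(String[] args) {
        Pattern peas = Pattern.compile("(?i)\\bpeas\\b");
        System.out.println(mask("Peas porridge hot, peas porridge cold", peas));
        // .... porridge hot, .... porridge cold
    }
}
```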
I am new to Java and I found a loop in existing code that seems like it should be an infinite loop (or otherwise have highly undesirable behavior) which actually works.
Can you explain what I'm missing? The reason I think it should be infinite is that according to the documentation here (https://docs.oracle.com/javase/8/docs/api/java/util/regex/Matcher.html#replaceAll-java.lang.String-) a call to replaceAll will reset the matcher (This method first resets this matcher. It then scans the input sequence...). So I thought the below code would do its replacement and then call find() again, which would start over at the beginning. And it would keep finding the same string, since as you can see the string is just getting wrapped in a tag.
In case it's not obvious, Pattern and Matcher are the classes in java.util.regex.
String aTagName = getSomeTagName();
String text = getSomeText();
Pattern pattern = getSomePattern();
Matcher matches = pattern.matcher(text);
while (matches.find()) {
text = matches.replaceAll(String.format("<%1$s> %2$s </%1$s>", aTagName, matches.group()));
}
Why is that not the case?
I share your suspicion that this code very likely does not do what was intended. replaceAll() changes the matcher's state: it scans the whole input itself, so the loop body runs only once, and the group from that first match is used as the replacement for every match.
String text = "abcdEfg";
Pattern pattern = Pattern.compile("[a-z]");
Matcher matches = pattern.matcher(text);
while (matches.find()) {
System.out.println(text); // abcdEfg
text = matches.replaceAll(matches.group());
System.out.println(text); // aaaaEaa
}
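Because replaceAll() makes the matcher scan through the entire string, it leaves the matcher positioned after its last match. find() then resumes from that position (not from the start), and since the rest of the input has already been searched, it returns false. Note also that the matcher is bound to the original string: assigning the result to text does not change what the matcher scans, so it never sees the wrapped text.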
One of the correct ways to iterate and replace for each group appropriately may be to use appendReplacement:
String text = "abcdEfg";
Pattern pattern = Pattern.compile("[a-z]");
Matcher matches = pattern.matcher(text);
StringBuffer sb = new StringBuffer();
while (matches.find()) {
matches.appendReplacement(sb, matches.group().toUpperCase());
System.out.println(sb); // a growing prefix of ABCDEFG
}
matches.appendTail(sb);
System.out.println(sb); // ABCDEFG
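On Java 9 and later, Matcher also has a replaceAll(Function<MatchResult, String>) overload that resets the matcher, finds each match in turn and uses the function's return value as that match's replacement, so every match can get its own replacement without a hand-written loop. A sketch with the same input (the method name is ours):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PerMatchReplace {
    static String upperCaseLowercaseLetters(String text) {
        Pattern pattern = Pattern.compile("[a-z]");
        Matcher matches = pattern.matcher(text);
        // The function is called once per match; its result is treated as a
        // replacement string, so quote it to keep '$' and '\' literal.
        return matches.replaceAll(mr -> Matcher.quoteReplacement(mr.group().toUpperCase()));
    }

    public static void main(String[] args) {
        System.out.println(upperCaseLowercaseLetters("abcdEfg")); // ABCDEFG
    }
}
```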
The example below shows that there is no reason to loop if you are using replaceAll(). In both cases the output is
<b> is </b> th<b> is </b> a summer ? Th<b> is </b> <b> is </b> very hot summer. <b> is </b>n't it?
import java.util.regex.*;
public class Test {
public static void main(String[] args) {
String text = "is this a summer ? This is very hot summer. isn't it?";
String tag = "b";
String pattern = "is";
System.out.println(question(text,tag,pattern));
System.out.println(alt(text,tag,pattern));
}
public static String question(String text, String tag, String p) {
Pattern pattern = Pattern.compile(p);
Matcher matcher= pattern.matcher(text);
while (matcher.find()) {
text = matcher.replaceAll(
String.format("<%1$s> %2$s </%1$s>",
tag, matcher.group()));
}
return text;
}
public static String alt(String text, String tag, String p) {
Pattern pattern = Pattern.compile(p);
Matcher matcher= pattern.matcher(text);
if(matcher.find())
return matcher.replaceAll(
String.format("<%1$s> %2$s </%1$s>",
tag, matcher.group()));
else
return text;
}
}
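If the intent of the original loop was to wrap every match in the tag, a group reference in the replacement string does it in one call: $0 stands for the whole current match, so each occurrence is wrapped with its own text rather than with the text of the first match (the question's loop would turn "ab CD ef" into "<b> ab </b> CD <b> ab </b>"). A sketch; Matcher.quoteReplacement() is applied to the tag name so that a $ or \ in it is not interpreted:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WrapMatches {
    static String wrap(String text, String tag, String p) {
        String t = Matcher.quoteReplacement(tag);
        // $0 is replaced by the text of each individual match.
        return Pattern.compile(p).matcher(text)
                .replaceAll("<" + t + "> $0 </" + t + ">");
    }

    public static void main(String[] args) {
        System.out.println(wrap("ab CD ef", "b", "[a-z]+"));
        // <b> ab </b> CD <b> ef </b>
    }
}
```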
I have a string like this:
something:POST:/some/path
Now I want to take the POST alone from the string. I did this by using this regex
:([a-zA-Z]+):
But this gives me the value along with the colons, i.e. I get this:
:POST:
but I need this
POST
My code to match the same and replace it is as follows:
String ss = "something:POST:/some/path/";
Pattern pattern = Pattern.compile(":([a-zA-Z]+):");
Matcher matcher = pattern.matcher(ss);
if (matcher.find()) {
System.out.println(matcher.group());
ss = ss.replaceFirst(":([a-zA-Z]+):", "*");
}
System.out.println(ss);
EDIT:
I've decided to use the lookahead/lookbehind regex, since I did not want to put the colons back in the replacement (as in :*:). This is my final solution.
String s = "something:POST:/some/path/";
String regex = "(?<=:)[a-zA-Z]+(?=:)";
Matcher matcher = Pattern.compile(regex).matcher(s);
if (matcher.find()) {
s = s.replaceFirst(matcher.group(), "*");
System.out.println("replaced: " + s);
}
else {
System.out.println("not replaced: " + s);
}
There are two approaches:
Keep your Java code, and use lookahead/lookbehind (?<=:)[a-zA-Z]+(?=:), or
Change your Java code to replace the result with ":*:"
Note: You may want to define a String constant for your regex, since you use it in different calls.
As pointed out, the regex's captured group can be used for the replacement.
The following code did it:
String ss = "something:POST:/some/path/";
Pattern pattern = Pattern.compile(":([a-zA-Z]+):");
Matcher matcher = pattern.matcher(ss);
if (matcher.find()) {
ss = ss.replaceFirst(matcher.group(1), "*");
}
System.out.println(ss);
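One caveat with ss.replaceFirst(matcher.group(1), "*"): replaceFirst() compiles "POST" as a new regex and replaces its first occurrence anywhere in the string, which need not be the one between the colons. A sketch that replaces exactly the span Group 1 matched, using start(1) and end(1); the second input, with an extra "POST" earlier on, is made up to show the difference:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceGroupSpan {
    static String replaceGroup1(String s) {
        Matcher matcher = Pattern.compile(":([a-zA-Z]+):").matcher(s);
        if (!matcher.find()) {
            return s; // no letters between colons: leave the input alone
        }
        // start(1)/end(1) are the offsets of the captured letters only,
        // so the surrounding colons are kept.
        return new StringBuilder(s)
                .replace(matcher.start(1), matcher.end(1), "*")
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceGroup1("something:POST:/some/path/")); // something:*:/some/path/
        System.out.println(replaceGroup1("POSTbox:POST:/some/path/"));   // POSTbox:*:/some/path/
    }
}
```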
UPDATE
Looking at your update, you only need replaceFirst():
String result = s.replaceFirst(":[a-zA-Z]+:", ":*:");
When you use (?<=:)[a-zA-Z]+(?=:), the regex engine checks each location in the string for a : before it and, once found, tries to match one or more ASCII letters and then asserts that there is a : after them. With :[A-Za-z]+:, the check only starts once the regex engine has found a : character. Then, after matching :POST:, the replacement pattern replaces the whole match. It is totally OK to hardcode the colons in the replacement pattern, since they are hardcoded in the regex pattern.
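A quick way to check that the two patterns give the same result on this input: with lookarounds the colons are only asserted, so the replacement is just "*"; with :[a-zA-Z]+: the colons are consumed, so they are written back in the replacement.

```java
public class LookaroundVsLiteral {
    static String withLookarounds(String s) {
        // The colons are checked but not consumed, so only the letters are replaced.
        return s.replaceFirst("(?<=:)[a-zA-Z]+(?=:)", "*");
    }

    static String withColons(String s) {
        // The colons are part of the match, so the replacement puts them back.
        return s.replaceFirst(":[a-zA-Z]+:", ":*:");
    }

    public static void main(String[] args) {
        String s = "something:POST:/some/path/";
        System.out.println(withLookarounds(s)); // something:*:/some/path/
        System.out.println(withColons(s));      // something:*:/some/path/
    }
}
```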
Original answer
You just need to access Group 1:
if (matcher.find()) {
System.out.println(matcher.group(1));
}
Your :([a-zA-Z]+): regex contains a capturing group (see (....) subpattern). These groups are numbered automatically: the first one has an index of 1, the second has the index of 2, etc.
To replace it, use Matcher#appendReplacement():
String s = "something:POST:/some/path/";
StringBuffer result = new StringBuffer();
Matcher m = Pattern.compile(":([a-zA-Z]+):").matcher(s);
while (m.find()) {
m.appendReplacement(result, ":*:");
}
m.appendTail(result);
System.out.println(result.toString());
This is your solution:
regex = (:)([a-zA-Z]+)(:)
And code is:
String ss = "something:POST:/some/path/";
ss = ss.replaceFirst("(:)([a-zA-Z]+)(:)", "$1*$3");
ss now contains:
something:*:/some/path/
Which I believe is what you are looking for...
I'm trying to get the indexes for each pattern that I find in a document. So far I have:
String temp = "This is a test to see HelloWorld in a test that sees HelloWorld in a test";
Pattern pattern = Pattern.compile("HelloWorld");
Matcher matcher = pattern.matcher(temp);
int current = 0;
int start;
int end;
while (matcher.find()) {
start = matcher.start(current);
end = matcher.end(current);
System.out.println(temp.substring(start, end));
current++;
}
For some reason it keeps finding only the first instance of HelloWorld in temp though which results in an infinite loop. To be honest, I wasn't sure if you could use matcher.start(current) and matcher.end(current) - it was just a wild guess because matcher.group(current) worked before. This time I need the actual indexes though so matcher.group() wouldn't work for me.
Modify the loop to look like this:
while (matcher.find()) {
start = matcher.start();
end = matcher.end();
System.out.println(temp.substring(start, end));
}
Don't pass the index to start(int) and end(int). The API states that the parameter is the group number, and your pattern has no capturing groups, so only 0 (the whole match) is valid. Use start() and end() instead.
The matcher will move to the next match on each iteration because of your call to find():
This method starts at the beginning of the input sequence or, if a previous invocation of the method was successful and the matcher has not since been reset, at the first character not matched by the previous match.
The problem is this line of code.
start = matcher.start(current);
current is 1 after the first iteration, and your pattern has no group 1, so start(1) throws an IndexOutOfBoundsException.
If you just need the start and end offsets of your matched text, you don't need the current counter; this will be ok:
String temp = "This is a test to see HelloWorld in a test that sees HelloWorld in a test";
Pattern pattern = Pattern.compile("HelloWorld");
Matcher matcher = pattern.matcher(temp);
while (matcher.find()) {
System.out.println(temp.substring(matcher.start(), matcher.end()));
}
while (matcher.find()) {
start = matcher.start();
end = matcher.end();
System.out.println(temp.substring(start, end));
}
Will do what you want.
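Since the question is after the indexes themselves, a sketch that collects the start() and end() of every match instead of printing the text (end() is exclusive, like the second argument of substring()); the indexesOf() name is ours:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchIndexes {
    // Returns one {start, end} pair per match; end is exclusive.
    static List<int[]> indexesOf(String text, Pattern pattern) {
        List<int[]> result = new ArrayList<>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) { // each call moves on to the next match
            result.add(new int[] {matcher.start(), matcher.end()});
        }
        return result;
    }

    public static void main(String[] args) {
        String temp = "This is a test to see HelloWorld in a test that sees HelloWorld in a test";
        for (int[] span : indexesOf(temp, Pattern.compile("HelloWorld"))) {
            System.out.println(span[0] + "-" + span[1]); // 22-32, then 53-63
        }
    }
}
```

On Java 9 and later, matcher.results() returns the same information as a Stream<MatchResult>.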
String temp = "This is a test to see HelloWorld in a test that sees HelloWorld in a test";
Pattern pattern = Pattern.compile("HelloWorld");
Matcher m = pattern.matcher(temp);
while (m.find()) {
System.out.println(temp.substring(m.start(), m.end()));
}
I try to parse a String with a Regexp to get parameters out of it.
As an example:
String: "TestStringpart1 with second test part2"
Result should be: String[] {"part1", "part2"}
Regexp: "TestString(.*?) with second test (.*?)"
My Testcode was:
String regexp = "TestString(.*?) with second test (.*?)";
String res = "TestStringpart1 with second test part2";
Pattern pattern = Pattern.compile(regexp);
Matcher matcher = pattern.matcher(res);
int i = 0;
while(matcher.find()) {
i++;
System.out.println(matcher.group(i));
}
But it only outputs "part1".
Could someone give me a hint?
Thanks
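Maybe fix the regexp: a lazy (.*?) at the very end of the pattern is satisfied by the empty string, so make the last group greedy:
String regexp = "TestString(.*?) with second test (.*)";
and change the printing code: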
if (matcher.find())
for (int i = 1; i <= matcher.groupCount(); ++i)
System.out.println(matcher.group(i));
Well, you only ever ask it to... In your original code, each find() moves the matcher from one match of the entire regular expression to the next, while inside the loop body you only ever pull out one group. If there had been multiple matches of the regexp in your string, you would have got group 1 of the first match, group 2 of the second match, and an IndexOutOfBoundsException for any later match, since there is no group 3. To see every group, read them all within a single iteration:
while(matcher.find()) {
System.out.print("Part 1: ");
System.out.println(matcher.group(1));
System.out.print("Part 2: ");
System.out.println(matcher.group(2));
System.out.print("Entire match: ");
System.out.println(matcher.group(0));
}
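To get the String[] the question asks for, one option is to require the whole input to fit the pattern with matches() and copy out every group. A sketch, assuming the entire string has this shape (the extract() name is ours). Because matches() must consume the whole input, the last group receives everything after "second test "; the greedy (.*) from the answer above is kept for clarity.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtractParts {
    static String[] extract(String input) {
        Pattern pattern = Pattern.compile("TestString(.*?) with second test (.*)");
        Matcher matcher = pattern.matcher(input);
        if (!matcher.matches()) { // the whole input must fit the pattern
            return new String[0];
        }
        String[] parts = new String[matcher.groupCount()];
        for (int i = 1; i <= matcher.groupCount(); i++) {
            parts[i - 1] = matcher.group(i); // capturing groups are numbered from 1
        }
        return parts;
    }

    public static void main(String[] args) {
        String[] parts = extract("TestStringpart1 with second test part2");
        System.out.println(String.join(", ", parts)); // part1, part2
    }
}
```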