How to limit the number of replacements made by Matcher's replace methods? - java

Matcher only has two replace methods: replaceFirst() and replaceAll().
I want to limit the replacement to 3 times. How can I do that?
Example:
String content="aaaaaaaaaa";
The result I want is: "bbbaaaaaaa"
My code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class T1 {
public static void main(String[] args) {
String content="aaaaaaaaaa";
Pattern pattern = Pattern.compile("a");
Matcher m=pattern.matcher(content);
if(m.find()){
String result=m.replaceFirst("b");
System.out.println(result);
}
}
}
thanks :)

On appendReplacement/Tail
You'd have to use appendReplacement and appendTail explicitly. On Java 8 and earlier these methods only accept a StringBuffer; Java 9 added overloads that take a StringBuilder. Here's a snippet (see also on ideone.com):
String content="aaaaaaaaaa";
Pattern pattern = Pattern.compile("a");
Matcher m = pattern.matcher(content);
StringBuffer sb = new StringBuffer();
final int N = 3;
for (int i = 0; i < N; i++) {
if (m.find()) {
m.appendReplacement(sb, "b");
} else {
break;
}
}
m.appendTail(sb);
System.out.println(sb); // bbbaaaaaaa
See also
StringBuilder and StringBuffer in Java
StringBuffer is synchronized and therefore slower than StringBuilder
BugID 5066679: Matcher should make more use of Appendable
If granted, this request for enhancement would allow Matcher to append to any Appendable
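The loop above can be wrapped in a reusable helper. The following is a minimal sketch, assuming Java 9 or later, where appendReplacement and appendTail also accept a StringBuilder. The class name LimitedReplace and the method name replaceFirstN are made up for illustration and are not part of the JDK.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LimitedReplace {
    // Hypothetical helper: replaces at most `limit` matches of `regex` in `input`.
    // Assumes Java 9+ for the StringBuilder overloads of appendReplacement/appendTail.
    static String replaceFirstN(String input, String regex, String replacement, int limit) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuilder sb = new StringBuilder();
        // find() is only called while the limit has not been reached
        for (int i = 0; i < limit && m.find(); i++) {
            m.appendReplacement(sb, replacement);
        }
        // copy whatever follows the last replaced match
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceFirstN("aaaaaaaaaa", "a", "b", 3)); // bbbaaaaaaa
    }
}
```

As with appendReplacement in general, `$` and `\` in the replacement argument keep their special meaning here, so literal replacement text should go through Matcher.quoteReplacement first.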
Another example: N times uppercase replacement
Here's another example that shows how appendReplacement/Tail can give you more control over replacement than replaceFirst/replaceAll:
// replaces up to N times with uppercase of matched text
static String replaceUppercase(int N, Matcher m) {
StringBuffer sb = new StringBuffer();
for (int i = 0; i < N; i++) {
if (m.find()) {
m.appendReplacement(
sb,
Matcher.quoteReplacement(m.group().toUpperCase())
);
} else {
break;
}
}
m.appendTail(sb);
return sb.toString();
}
Then we can have (see also on ideone.com):
Pattern p = Pattern.compile("<[^>]*>");
Matcher m = p.matcher("<a> b c <ddd> e <ff> g <$$$> i <jjj>");
System.out.println(replaceUppercase(4, m));
// <A> b c <DDD> e <FF> g <$$$> i <jjj>
// 1 2 3 4
The pattern <[^>]*> is just a simple example pattern that matches "<tags like this>".
Note that Matcher.quoteReplacement is necessary in this particular case, or else appending "<$$$>" as replacement would trigger IllegalArgumentException about an illegal group reference (because $ unescaped in replacement string is a backreference sigil).
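A small sketch of the difference, using String.replaceAll for brevity (it follows the same replacement-string rules as appendReplacement). The class and method names are illustrative only.

```java
import java.util.regex.Matcher;

final class QuoteReplacementDemo {
    // Returns true if the regex engine rejects `replacement` when it is used unquoted.
    static boolean failsUnquoted(String replacement) {
        try {
            "x".replaceAll("x", replacement);
            return false;
        } catch (IllegalArgumentException | IndexOutOfBoundsException e) {
            // e.g. an illegal group reference, or a reference to a group that does not exist
            return true;
        }
    }

    // Uses the replacement text literally by quoting it first.
    static String replaceQuoted(String replacement) {
        return "x".replaceAll("x", Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(failsUnquoted("<$$$>")); // true
        System.out.println(replaceQuoted("<$$$>")); // <$$$>
    }
}
```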
On replaceFirst and replaceAll
Attached is the java.util.regex.Matcher code for replaceFirst and replaceAll (version 1.64 06/04/07). Note that it's done using essentially the same appendReplacement/Tail logic:
// Excerpt from @(#)Matcher.java 1.64 06/04/07
public String replaceFirst(String replacement) {
if (replacement == null)
throw new NullPointerException("replacement");
StringBuffer sb = new StringBuffer();
reset(); // !!!!
if (find())
appendReplacement(sb, replacement);
appendTail(sb);
return sb.toString();
}
public String replaceAll(String replacement) {
reset(); // !!!!
boolean result = find();
if (result) {
StringBuffer sb = new StringBuffer();
do {
appendReplacement(sb, replacement);
result = find();
} while (result);
appendTail(sb);
return sb.toString();
}
return text.toString();
}
Note that the Matcher is reset() prior to any replaceFirst/All. Thus, simply calling replaceFirst 3 times would always get you the same result (see also on ideone.com):
String content="aaaaaaaaaa";
Pattern pattern = Pattern.compile("a");
Matcher m = pattern.matcher(content);
String result;
result = m.replaceFirst("b"); // once!
result = m.replaceFirst("b"); // twice!
result = m.replaceFirst("b"); // one more for "good" measure!
System.out.println(result);
// baaaaaaaaa
// i.e. THIS DOES NOT WORK!!!
See also
java.util.regex.Matcher source code, OpenJDK version
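On Java 9 and later there is another option that avoids the manual loop: Matcher.replaceAll also accepts a function from MatchResult to a replacement string. The sketch below counts down a budget and, once it is used up, puts the matched text back unchanged. The names CountedReplaceAll and replaceUpTo are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CountedReplaceAll {
    // Hypothetical helper: replaces up to `limit` matches, leaving later matches as they were.
    // Assumes Java 9+ for Matcher.replaceAll(Function<MatchResult, String>).
    static String replaceUpTo(String input, String regex, String replacement, int limit) {
        AtomicInteger remaining = new AtomicInteger(limit);
        return Pattern.compile(regex).matcher(input).replaceAll(match ->
                remaining.getAndDecrement() > 0
                        ? replacement
                        // the function's result is treated as a replacement string,
                        // so the original text must be quoted to survive unchanged
                        : Matcher.quoteReplacement(match.group()));
    }

    public static void main(String[] args) {
        System.out.println(replaceUpTo("aaaaaaaaaa", "a", "b", 3)); // bbbaaaaaaa
    }
}
```

Unlike the appendReplacement loop, this version still visits every match in the input, so the explicit loop is likely the better fit when the input is long and the limit is small.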

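I think you can use StringUtils from Apache Commons Lang. Its replace method takes a literal search string (not a regex) and a maximum number of replacements: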
org.apache.commons.lang3.StringUtils.replace(content,"a","b",3);
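If adding Commons Lang is not an option, a literal (non-regex) limited replacement can be sketched with the plain JDK. This is only an approximation for illustration: the class and method names are invented, and it treats a non-positive max as "replace nothing", which is an assumption of this sketch rather than a copy of the library's behavior.

```java
final class LiteralReplace {
    // Hypothetical helper: replaces the first `max` literal occurrences of `search` in `text`.
    static String replaceLiteral(String text, String search, String replacement, int max) {
        if (search.isEmpty() || max <= 0) {
            return text; // assumption of this sketch: nothing to do in these cases
        }
        StringBuilder sb = new StringBuilder(text.length());
        int from = 0;
        for (int n = 0; n < max; n++) {
            int idx = text.indexOf(search, from);
            if (idx < 0) {
                break; // no more occurrences
            }
            // copy the text before the occurrence, then the replacement
            sb.append(text, from, idx).append(replacement);
            from = idx + search.length();
        }
        // copy the remainder unchanged
        return sb.append(text, from, text.length()).toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceLiteral("aaaaaaaaaa", "a", "b", 3)); // bbbaaaaaaa
    }
}
```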

Related

How to get the string between the last two underscores

I have a string "abcde-abc-db-tada_x12.12_999ZZZ_121121.333"
The result I want should be 999ZZZ
I have tried using:
private static String getValue(String myString) {
Pattern p = Pattern.compile("_(\\d+)_1");
Matcher m = p.matcher(myString);
if (m.matches()) {
System.out.println(m.group(1)); // Should print 999ZZZ
}
else {
System.out.println("not found");
}
}
If you want to continue with a regex based approach, then use the following pattern:
.*_([^_]+)_.*
This will greedily consume everything up to and including the second-to-last underscore. Then it will consume and capture 999ZZZ.
Code sample:
String name = "abcde-abc-db-tada_x12.12_999ZZZ_121121.333";
Pattern p = Pattern.compile(".*_([^_]+)_.*");
Matcher m = p.matcher(name);
if (m.matches()) {
System.out.println(m.group(1)); // Should print 999ZZZ
} else {
System.out.println("not found");
}
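A variant of the same idea, sketched with find() and an end anchor instead of matches(). The pattern `_([^_]+)_[^_]*$` is an alternative chosen for this example, not the pattern from the answer above, and it behaves slightly differently: it returns null when the text between the last two underscores is empty, whereas the greedy pattern may backtrack to an earlier pair of underscores.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LastTwoUnderscoresRegex {
    // One underscore, a non-empty run without underscores, another underscore,
    // then no further underscores until the end of the input.
    private static final Pattern BETWEEN_LAST_TWO = Pattern.compile("_([^_]+)_[^_]*$");

    // Returns the text between the last two underscores, or null if there is none.
    static String extract(String s) {
        Matcher m = BETWEEN_LAST_TWO.matcher(s);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("abcde-abc-db-tada_x12.12_999ZZZ_121121.333")); // 999ZZZ
    }
}
```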
Using String.split?
String given = "abcde-abc-db-tada_x12.12_999ZZZ_121121.333";
String [] splitted = given.split("_");
String result = splitted[splitted.length-2];
System.out.println(result);
Apart from split you can use substring as well:
String s = "abcde-abc-db-tada_x12.12_999ZZZ_121121.333";
String ss = (s.substring(0,s.lastIndexOf("_"))).substring((s.substring(0,s.lastIndexOf("_"))).lastIndexOf("_")+1);
System.out.println(ss);
OR,
String s = "abcde-abc-db-tada_x12.12_999ZZZ_121121.333";
String arr[] = s.split("_");
System.out.println(arr[arr.length-2]);
To get the text between the last two underscore characters, you first need to find the indexes of the last two underscore characters, which is very easy using lastIndexOf:
String s = "abcde-abc-db-tada_x12.12_999ZZZ_121121.333";
String r = null;
int idx1 = s.lastIndexOf('_');
if (idx1 != -1) {
int idx2 = s.lastIndexOf('_', idx1 - 1);
if (idx2 != -1)
r = s.substring(idx2 + 1, idx1);
}
System.out.println(r); // prints: 999ZZZ
This is faster than any solution using regex, including use of split.
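The lastIndexOf approach can be packaged as a small method so the edge cases are handled in one place. This is a sketch; the names BetweenLastTwo and betweenLastTwo are invented for illustration, and returning null when fewer than two delimiters exist is a design choice of this example.

```java
final class BetweenLastTwo {
    // Returns the text between the last two occurrences of `delim`,
    // or null if `s` contains fewer than two of them.
    static String betweenLastTwo(String s, char delim) {
        int last = s.lastIndexOf(delim);
        if (last == -1) {
            return null;
        }
        // search backwards starting just before the last delimiter;
        // a negative fromIndex simply yields -1
        int secondLast = s.lastIndexOf(delim, last - 1);
        if (secondLast == -1) {
            return null;
        }
        return s.substring(secondLast + 1, last);
    }

    public static void main(String[] args) {
        System.out.println(betweenLastTwo("abcde-abc-db-tada_x12.12_999ZZZ_121121.333", '_')); // 999ZZZ
    }
}
```

Note that adjacent delimiters yield an empty string rather than null, for example betweenLastTwo("a__b", '_').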
I misread the logic of the code in the question at first, and in the meantime some great regex-based answers have appeared, so here is my attempt using only methods of the String class. It introduces a few variables purely for readability and could of course be written more compactly:
String s = "abcde-abc-db-ta__dax12.12_999ZZZ_121121.333";
int indexOfLastUnderscore = s.lastIndexOf("_");
int indexOfOneBeforeLastUnderscore = s.lastIndexOf("_", indexOfLastUnderscore - 1);
if(indexOfLastUnderscore != -1 && indexOfOneBeforeLastUnderscore != -1) {
String sub = s.substring(indexOfOneBeforeLastUnderscore + 1, indexOfLastUnderscore);
System.out.println(sub);
}

Finding longest regex match in Java?

I have this:
import java.util.regex.*;
String regex = "(?<m1>(hello|universe))|(?<m2>(hello world))";
String s = "hello world";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(s);
while(matcher.find()) {
MatchResult matchResult = matcher.toMatchResult();
String substring = s.substring(matchResult.start(), matchResult.end());
System.out.println(substring);
}
The above only prints hello whereas I want it to print hello world.
One way to fix this is to re-order the groups in String regex = "(?<m2>(hello world))|(?<m1>(hello|universe))" but I don't have control over the regex I get in my case...
So what is the best way to find the longest match? An obvious way would be to check all possible substrings of s as mentioned here (Efficiently finding all overlapping matches for a regular expression) by length and pick the first but that is O(n^2). Can we do better?
Here is a way of doing it using matcher regions, but with a single loop over the string index:
public static String findLongestMatch(String regex, String s) {
Pattern pattern = Pattern.compile("(" + regex + ")$");
Matcher matcher = pattern.matcher(s);
String longest = null;
int longestLength = -1;
for (int i = s.length(); i > longestLength; i--) {
matcher.region(0, i);
if (matcher.find() && longestLength < matcher.end() - matcher.start()) {
longest = matcher.group();
longestLength = longest.length();
}
}
return longest;
}
I'm forcing the pattern to match until the region's end, and then I move the region's end from the rightmost string index towards the left. For each region's end tried, Java will match the leftmost starting substring that finishes at that region's end, i.e. the longest substring that ends at that place. Finally, it's just a matter of keeping track of the longest match found so far.
As a matter of optimization, and since I start from the longer regions towards the shorter ones, I stop the loop as soon as all regions that would come after are already shorter than the length of longest substring already found.
An advantage of this approach is that it can deal with arbitrary regular expressions and no specific pattern structure is required:
findLongestMatch("(?<m1>(hello|universe))|(?<m2>(hello world))", "hello world")
==> "hello world"
findLongestMatch("hello( universe)?", "hello world")
==> "hello"
findLongestMatch("hello( world)?", "hello world")
==> "hello world"
findLongestMatch("\\w+|\\d+", "12345 abc")
==> "12345"
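For cross-checking results like the ones above on small inputs, a brute-force reference can be handy. The sketch below tries every start position and, for each, the longest candidate end first, using region() plus matches(). It needs on the order of n^2 match attempts, so it is meant as a test oracle rather than a replacement for the region-based method, and unlike that method it ignores empty matches. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LongestMatchBruteForce {
    // Returns the leftmost longest non-empty substring of `s` that `regex` matches entirely,
    // or null if no such substring exists.
    static String longestMatch(String regex, String s) {
        Matcher m = Pattern.compile(regex).matcher(s);
        String longest = null;
        for (int start = 0; start < s.length(); start++) {
            for (int end = s.length(); end > start; end--) {
                // candidates from here on cannot beat the best match found so far
                if (longest != null && end - start <= longest.length()) {
                    break;
                }
                m.region(start, end);
                if (m.matches()) { // the whole region must match
                    longest = s.substring(start, end);
                    break;
                }
            }
        }
        return longest;
    }

    public static void main(String[] args) {
        System.out.println(longestMatch("(?<m1>(hello|universe))|(?<m2>(hello world))", "hello world")); // hello world
    }
}
```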
If you are dealing with just this specific pattern structure:
There are one or more named groups at the top level, joined by |.
The regex inside each group is wrapped in superfluous parentheses.
Inside those parentheses are one or more literals joined by |.
Literals never contain |, ( or ).
Then it is possible to write a solution by extracting the literals, sorting them by their length and then returning the first match:
private static final Pattern g = Pattern.compile("\\(\\?\\<[^>]+\\>\\(([^)]+)\\)\\)");
public static final String findLongestMatch(String s, Pattern p) {
Matcher m = g.matcher(p.pattern());
List<String> literals = new ArrayList<>();
while (m.find())
Collections.addAll(literals, m.group(1).split("\\|"));
Collections.sort(literals, new Comparator<String>() {
public int compare(String a, String b) {
return Integer.compare(b.length(), a.length());
}
});
for (Iterator<String> itr = literals.iterator(); itr.hasNext();) {
String literal = itr.next();
if (s.indexOf(literal) >= 0)
return literal;
}
return null;
}
Test:
System.out.println(findLongestMatch(
"hello world",
Pattern.compile("(?<m1>(hello|universe))|(?<m2>(hello world))")
));
// output: hello world
System.out.println(findLongestMatch(
"hello universe",
Pattern.compile("(?<m1>(hello|universe))|(?<m2>(hello world))")
));
// output: universe
Just add $ (end of input) before the | separator.
The first alternative then matches only if the input ends right after it. If it does, that alternative is returned; otherwise the engine skips it and tries the next alternative.
The code below gives what you want:
import java.util.regex.*;
public class RegTest{
public static void main(String[] arg){
String regex = "(?<m1>(hello|universe))$|(?<m2>(hello world))";
String s = "hello world";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(s);
while(matcher.find()) {
MatchResult matchResult = matcher.toMatchResult();
String substring = s.substring(matchResult.start(), matchResult.end());
System.out.println(substring);
}
}
}
Likewise, the code below skips hello and hello world and matches hello world there.
Note the use of $:
import java.util.regex.*;
public class RegTest{
public static void main(String[] arg){
String regex = "(?<m1>(hello|universe))$|(?<m2>(hello world))$|(?<m3>(hello world there))";
String s = "hello world there";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(s);
while(matcher.find()) {
MatchResult matchResult = matcher.toMatchResult();
String substring = s.substring(matchResult.start(), matchResult.end());
System.out.println(substring);
}
}
}
If the structure of the regex is always the same, this should work:
String regex = "(?<m1>(hello|universe))|(?<m2>(hello world))";
String s = "hello world";
//split the regex into the different groups
String[] allParts = regex.split("\\|\\(\\?\\<");
for (int i=1; i<allParts.length; i++) {
allParts[i] = "(?<" + allParts[i];
}
//find the longest string
int longestSize = -1;
String longestString = null;
for (int i=0; i<allParts.length; i++) {
Pattern pattern = Pattern.compile(allParts[i]);
Matcher matcher = pattern.matcher(s);
while(matcher.find()) {
MatchResult matchResult = matcher.toMatchResult();
String substring = s.substring(matchResult.start(), matchResult.end());
if (substring.length() > longestSize) {
longestSize = substring.length();
longestString = substring;
}
}
}
System.out.println("Longest: " + longestString);

ReplaceFirst with Regular Expression

Let's say I have a String
String link = "www.thisisalink.com/tick1=#tick1#&tick2=#tick2#&tick3=#tick3#&tick4=#tick4#";
Then I can use
link = link.replaceFirst("(.+)=#\\1#", "");
To make it
link = "www.thisisalink.com/&tick2=#tick2#&tick3=#tick3#&tick4=#tick4#";
But I want to loop through the String, get what has been replaced, and save it somewhere else, like a linked list or an array. The result would be:
String[] result = ["tick1=#tick1#", "tick2=#tick2#", "tick3=#tick3#", "tick4=#tick4#"];
String link = "www.thisisalink.com/&&&";
But how can I do this? I tried looping with
while (link.matches("(.+)=#\\1#")){}
Which didn't work.
You can use the Pattern and Matcher classes to iterate over your string and find the substrings that match your regex. To replace each found substring, use appendReplacement and appendTail. To get the matched text, call group() on the Matcher instance.
Here is something similar to what you want
String link = "www.thisisalink.com/tick1=#tick1#&tick2=#tick2#&tick3=#tick3#&tick4=#tick4#";
StringBuffer sb = new StringBuffer();
Pattern p = Pattern.compile("(.+)=#\\1#");
Matcher m = p.matcher(link);
List<String> replaced = new ArrayList<>();
while (m.find()) {
m.appendReplacement(sb, "");
replaced.add(m.group());
}
m.appendTail(sb);
//to replace link with String stored in sb use link=sb.toString();
//otherwise link will be unchanged
System.out.println(sb);
System.out.println(replaced);
output:
www.thisisalink.com/&&&
[tick1=#tick1#, tick2=#tick2#, tick3=#tick3#, tick4=#tick4#]
This produces the Strings you want:
public static void main(String[] args)
{
final String link = "www.thisisalink.com/tick1=#tick1#&tick2=#tick2#&tick3=#tick3#&tick4=#tick4#";
final int index = link.indexOf("/") + 1;
final String[] result = link.substring(index).split("&");
final String newLink = link.substring(0, index) + repeat("&", result.length -1);
System.out.println(newLink);
for(final String tick : result)
{
System.out.println(tick);
}
}
private static String repeat(final String toRepeat, final int repetitions)
{
final StringBuilder sb = new StringBuilder(repetitions);
for(int i = 0; i < repetitions; i++)
{
sb.append(toRepeat);
}
return sb.toString();
}
Produces:
www.thisisalink.com/&&&
tick1=#tick1#
tick2=#tick2#
tick3=#tick3#
tick4=#tick4#

better way to create a string in java

I have a string as follows:
"This is #awesome #dude"
From this string I want to extract awesome and dude and create a string
output==> "awesome,dude"
So my code is like following:
Matcher matcher = Pattern.compile("(?<=#)\\w+").matcher(textStr);
while (matcher.find()){
mergedStr += matcher.group() +",";
}
But this creates an artifact in the end
output==> "awesome,dude," //<-- egghh comma.. in the end
What is a better way to solve this?
Another approach:
boolean needComma = false;
while (matcher.find()) {
    if (needComma) {
        mergedStr += ",";
    }
    mergedStr += matcher.group();
    needComma = true;
}
But there are a dozen different approaches.
This is one option:
Matcher matcher = Pattern.compile("(?<=#)\\w+").matcher(textStr);
while (matcher.find()){
if (!mergedStr.isEmpty())
mergedStr += ",";
mergedStr += matcher.group();
}
Here is another common approach:
Matcher matcher = Pattern.compile("(?<=#)\\w+").matcher(textStr);
StringBuilder sb = new StringBuilder();
while (matcher.find()){
sb.append(matcher.group()).append(",");
}
return sb.toString().replaceAll(",$", "");
If you don't want to use a regex to strip the trailing comma, you could do it like this:
Matcher matcher = Pattern.compile("(?<=#)\\w+").matcher(textStr);
StringBuilder sb = new StringBuilder();
while (matcher.find()){
sb.append(matcher.group()).append(",");
}
if (sb.length() == 0) {
return "";
}
else {
return sb.toString().substring(0, sb.length() - 1);
}
A useful pattern that I often use for this kind of thing is to append the first item, and then append the remainder of the items preceded by the separator. This avoids unnecessary conditionals in loops or postprocessing to remove trailing separators.
I know, microoptimizations blah, blah, sixth circle of hell, blah, blah, but just including here for your amusement:
Matcher matcher = Pattern.compile("(?<=#)\\w+").matcher(textStr);
StringBuilder mergedStr = new StringBuilder();
if (matcher.find()) {
mergedStr.append(matcher.group());
while (matcher.find()) {
mergedStr.append(',').append(matcher.group());
}
}
return mergedStr.toString();
Also, I'm not 100% convinced that replacing a quadratic algorithm (string concatenation) with a linear algorithm (StringBuilder) qualifies as a microoptimization in the bad sense.
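The separator bookkeeping can also be delegated to the JDK. The following is a minimal sketch, assuming Java 8 or later, that uses java.util.StringJoiner, which inserts the separator only between elements. The class name HashtagJoiner and the method name joinTags are made up for this example.

```java
import java.util.StringJoiner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HashtagJoiner {
    // Same pattern as in the question: word characters directly preceded by '#'.
    private static final Pattern HASHTAG = Pattern.compile("(?<=#)\\w+");

    // Joins all hashtag words found in `text` with commas; returns "" if there are none.
    static String joinTags(String text) {
        Matcher matcher = HASHTAG.matcher(text);
        StringJoiner joiner = new StringJoiner(",");
        while (matcher.find()) {
            joiner.add(matcher.group());
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(joinTags("This is #awesome #dude")); // awesome,dude
    }
}
```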
String input = "#awesome#dude";
List<String> strSplit = new ArrayList<String>();
String result = "";
Matcher matcher = Pattern.compile("(?<=#)\\w+").matcher(input);
while (matcher.find()){
strSplit.add(matcher.group());
}
for(int j = 0; j< strSplit.size(); j++){
result = result + strSplit.get(j);
if(j < strSplit.size() -1){
result = result+",";
}
}
System.out.println("Result : " + result);

Java replace string with increasing number

I want to replace each "a" in "abababababababab" with 001, 002, 003, 004, ...
so that the result is "001b002b003b004b005b....."
int n=1;
String test="ababababab";
int lo=test.lastIndexOf("a");
while(n++<=lo) Abstract=Abstract.replaceFirst("a",change(n));
//change is another function to return a string "00"+n;
However, this is very inefficient: when the string is large enough, it takes minutes!
Is there a more efficient way?
Thanks very much!
Use a Matcher to find and replace each a in a single pass:
public static void main(String[] args) {
Matcher m = Pattern.compile("a").matcher("abababababababab");
StringBuffer sb = new StringBuffer();
int i = 1;
while (m.find())
m.appendReplacement(sb, new DecimalFormat("000").format(i++));
m.appendTail(sb);
System.out.println(sb);
}
Outputs:
001b002b003b004b005b006b007b008b
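The reason this is so much faster than the loop in the question: every call to replaceFirst rescans the string from the beginning and copies it, so the total work grows roughly quadratically with the number of matches, while a single appendReplacement pass is linear. Below is a sketch of the same technique as a reusable method. The names are hypothetical, and String.format with Locale.ROOT is used here in place of DecimalFormat so that no formatter object is created per match.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NumberedReplace {
    // Replaces each match of `regex` with a zero-padded counter: 001, 002, 003, ...
    static String numberMatches(String input, String regex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuffer sb = new StringBuffer(input.length() * 2);
        int counter = 1;
        while (m.find()) {
            // the formatted number contains only digits, so it needs no quoting
            m.appendReplacement(sb, String.format(Locale.ROOT, "%03d", counter++));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(numberMatches("ababab", "a")); // 001b002b003b
    }
}
```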
