String.replaceAll variation - java

Is there a quick way to replace every occurrence of a pattern with data derived from the matched text?
For example, I want to replace every number within a string with the same number padded to a fixed length with 0s.
In this case, if the length is 4, then ab3cd5 would become ab0003cd0005.
My idea was to use a StringBuilder and 2 patterns: one would get all numbers and the other would get everything that is not a number, and I would append the matches to the builder in the order of the indexes where they were found.
I think there might be something simpler.

You can probably achieve what you're after using appendReplacement and appendTail, something like this:
import java.text.DecimalFormat;
import java.text.NumberFormat;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String REGEX = "(\\d+)";
String INPUT = "abc3def45";
NumberFormat formatter = new DecimalFormat("0000");
Pattern p = Pattern.compile(REGEX);
Matcher m = p.matcher(INPUT); // get a matcher object
StringBuffer sb = new StringBuffer();
while(m.find()){
m.appendReplacement(sb,formatter.format(Integer.parseInt(m.group(1))));
}
m.appendTail(sb);
String result = sb.toString();
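On Java 9 or later, the same idea can probably be written more compactly with Matcher.replaceAll, which accepts a function from each match to its replacement. This is only a sketch; the class and method names (PadWithLambda, padNumbers) are made up for illustration, and it assumes each number fits in an int.

```java
import java.util.regex.Pattern;

class PadWithLambda {
    // Replaces every run of digits with the same number zero-padded to 4 places.
    // Assumption: each number fits in an int; otherwise parseInt throws.
    static String padNumbers(String input) {
        return Pattern.compile("\\d+")
                .matcher(input)
                .replaceAll(match -> String.format("%04d", Integer.parseInt(match.group())));
    }

    public static void main(String[] args) {
        System.out.println(padNumbers("abc3def45")); // abc0003def0045
    }
}
```

The lambda receives a MatchResult, so the replacement can depend on the matched text without managing a buffer by hand.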

If every number is a single digit and you know exactly how many zeros to put in front of it, then something like this should work:
String text = "ab3cd5";
text = text.replaceAll("\\d", "000$0"); // $0 is the whole match; three zeros pad one digit to length 4
System.out.println(text); // ab0003cd0005
Otherwise:
Pattern pattern = Pattern.compile("\\d+");
Matcher matcher = pattern.matcher(text);
StringBuffer result = new StringBuffer();
while(matcher.find()){
matcher.appendReplacement(result, String.format("%04d", Integer.parseInt(matcher.group())));
}
matcher.appendTail(result);
System.out.println(result);
The format %04d means: an integer, padded with leading zeros to a minimum width of 4.
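If the width has to be a parameter, or the digit runs might be too long for Integer.parseInt, one option is to pad the matched text directly. The following is a rough sketch under those assumptions; ZeroPadder and padNumbers are hypothetical names. Note that, unlike the parseInt version, it keeps any leading zeros already present in the input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ZeroPadder {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Left-pads each run of digits with zeros to at least `width` characters.
    // It pads the matched text itself, so long digit runs cannot overflow an int.
    static String padNumbers(String input, int width) {
        Matcher matcher = DIGITS.matcher(input);
        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            String digits = matcher.group();
            StringBuilder padded = new StringBuilder();
            for (int i = digits.length(); i < width; i++) {
                padded.append('0');
            }
            padded.append(digits);
            // The replacement contains only digits, so no $ or \ needs quoting.
            matcher.appendReplacement(result, padded.toString());
        }
        matcher.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(padNumbers("ab3cd5", 4)); // ab0003cd0005
    }
}
```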

Related

How to replace multiple consecutive occurrences of a character with a maximum allowed number of occurrences?

CharSequence content = new StringBuffer("aaabbbccaaa");
String pattern = "([a-zA-Z])\\1\\1+";
String replace = "-";
Pattern patt = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);
Matcher matcher = patt.matcher(content);
StringBuffer buffer = new StringBuffer();
// Replace every run of three or more identical letters with the placeholder.
// (An extra matcher.find() call before this loop would consume, and so skip, the first match.)
while (matcher.find()) {
matcher.appendReplacement(buffer, replace);
}
matcher.appendTail(buffer);
System.out.println(buffer.toString());
In the above code, content is the input string.
I am trying to find runs of a repeated character in the string and cut each run down to a maximum allowed number of occurrences.
For example:
input - ("abaaadccc", 2)
output - "abaadcc"
Here aaa and ccc are replaced by aa and cc, because the maximum allowed repetition is 2.
In the above code, I find such runs and replace them with -. Can someone help me with how to get the current character and replace the run with the allowed number of occurrences instead?
I.e. if aaa is found, it is replaced by aa.
Or is there an alternative method without using regex?
You can nest a second group inside the first one in the regex and use the first (outer) group as the replacement:
String result = "aaabbbccaaa".replaceAll("(([a-zA-Z])\\2)\\2+", "$1");
Here's how it works:
( first group - a character repeated two times
([a-zA-Z]) second group - a character
\2 a character repeated once
)
\2+ a character repeated at least once more
Thus, the first group captures a replacement string.
It isn't hard to generalize this solution to a different maximum number of allowed repeats:
String input = "aaaaabbcccccaaa";
int maxRepeats = 4;
String pattern = String.format("(([a-zA-Z])\\2{%s})\\2+", maxRepeats-1);
String result = input.replaceAll(pattern, "$1");
System.out.println(result); //aaaabbccccaaa
Since you defined a group in your regex, you can get the matching characters of this group by calling matcher.group(1). In your case it contains the first character from the repeating group so by appending it twice you get your expected result.
CharSequence content = new StringBuffer("aaabbbccaaa");
String pattern = "([a-zA-Z])\\1\\1+";
Pattern patt = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);
Matcher matcher = patt.matcher(content);
StringBuffer buffer = new StringBuffer();
while (matcher.find()) {
System.out.println("found : "+matcher.start()+","+matcher.end()+":"+matcher.group(1));
matcher.appendReplacement(buffer, matcher.group(1)+matcher.group(1));
}
matcher.appendTail(buffer);
System.out.println(buffer.toString());
Output:
found : 0,3:a
found : 3,6:b
found : 8,11:a
aabbccaa
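As for the question about doing this without regex: a single pass that tracks the length of the current run is probably enough. This is a minimal sketch, not taken from the answers above; RepeatLimiter and limitRepeats are invented names. Unlike the [a-zA-Z] patterns, it treats every character the same way and compares case-sensitively.

```java
class RepeatLimiter {
    // Keeps at most `maxRepeats` consecutive copies of any character.
    static String limitRepeats(String input, int maxRepeats) {
        StringBuilder out = new StringBuilder(input.length());
        int runLength = 0;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            // Extend the current run, or start a new run of length 1.
            runLength = (i > 0 && c == input.charAt(i - 1)) ? runLength + 1 : 1;
            if (runLength <= maxRepeats) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(limitRepeats("abaaadccc", 2)); // abaadcc
    }
}
```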

Java Regexp "\\d+" (Digits Only) not working

Input string: 07-000
JAVA Regexp: \\d+ (digits only)
Expected Result: 07000 (digits only from input string)
Then why does this Java code return 07 only?
Pattern pattern = Pattern.compile("\\d+");
Matcher matcher = pattern.matcher("07-000");
String result = null;
if (matcher.find()) {
result = matcher.group();
}
System.out.println(result);
I guess that what you want to achieve is rather this:
Pattern pattern = Pattern.compile("\\d+");
Matcher matcher = pattern.matcher("07-000");
StringBuilder result = new StringBuilder();
// Iterate over all the matches
while (matcher.find()) {
// Append the new match to the current result
result.append(matcher.group());
}
System.out.println(result);
Output:
07000
Indeed, matcher.find() returns the next subsequence of the input that matches the pattern, so if you call it only once, you get only the first match, which is 07 here. If you want everything, you need to loop until it returns false, which indicates that there are no more matches.
However, in this particular case it would be simpler to call myString.replaceAll("\\D+", "") directly, which replaces every run of non-digit characters with an empty String.
Then why does this Java code return 07 only?
It returns only 07 because that is the first match found by your regex. You need a while loop to get all the matches, and then you can concatenate them to get all the numbers in one string.
Pattern pattern = Pattern.compile("\\d+");
Matcher matcher = pattern.matcher("07-000");
StringBuilder sb = new StringBuilder();
while (matcher.find())
{
sb.append( matcher.group() );
}
System.out.println( "All the numbers are : " + sb.toString() );
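For comparison, here is a small sketch that puts both approaches side by side: joining all matches of \d+ (using Matcher.results(), which needs Java 9 or later) and stripping non-digits with replaceAll. The class and method names are hypothetical.

```java
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class DigitJoiner {
    // Collects every match of \d+ and concatenates them (Java 9+).
    static String digitsViaMatches(String input) {
        return Pattern.compile("\\d+")
                .matcher(input)
                .results()
                .map(MatchResult::group)
                .collect(Collectors.joining());
    }

    // Removes everything that is not a digit.
    static String digitsViaReplace(String input) {
        return input.replaceAll("\\D+", "");
    }

    public static void main(String[] args) {
        System.out.println(digitsViaMatches("07-000")); // 07000
        System.out.println(digitsViaReplace("07-000")); // 07000
    }
}
```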

Get what was removed by String.replaceAll()

So, let's say I have my regular expression
String regex = "\\d*";
for finding any digits.
Now I also have an input string, for example
String input = "We got 34 apples and too much to do";
Now I want to replace all digits with "", doing it like that:
input = input.replaceAll(regex, "");
When I print input now, I get "We got apples and too much to do". It works: it replaced the 3 and the 4 with "".
Now my question: is there any way (maybe an existing library?) to get what was actually replaced?
The example here is very simple, just to show how it works. I want to use it for more complex inputs and regexes.
Thanks for your help.
You can use a Matcher with the append-and-replace procedure:
String regex = "\\d*";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(input);
StringBuffer sb = new StringBuffer();
StringBuffer replaced = new StringBuffer();
while(matcher.find()) {
replaced.append(matcher.group());
matcher.appendReplacement(sb, "");
}
matcher.appendTail(sb);
System.out.println(sb.toString()); // prints the replacement result
System.out.println(replaced.toString()); // prints what was replaced
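If the removed pieces should stay separate rather than being concatenated, one possible variation collects them into a list. This is only a sketch; RemovalTracker and removeAndCollect are made-up names. It skips empty matches, which a pattern like \d* produces at every position where there is no digit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RemovalTracker {
    // Removes every match of `regex` from `input` and records each
    // non-empty removed piece in `removed`, in order of appearance.
    static String removeAndCollect(String input, String regex, List<String> removed) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        StringBuffer kept = new StringBuffer();
        while (matcher.find()) {
            if (!matcher.group().isEmpty()) {
                removed.add(matcher.group());
            }
            matcher.appendReplacement(kept, "");
        }
        matcher.appendTail(kept);
        return kept.toString();
    }

    public static void main(String[] args) {
        List<String> removed = new ArrayList<>();
        String result = removeAndCollect("We got 34 apples and too much to do", "\\d+", removed);
        System.out.println(result);  // We got  apples and too much to do
        System.out.println(removed); // [34]
    }
}
```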

Extract only the numbers from String

I need a regex that, given any of the following Strings: "12.123.123/1234-11", "12.123123123411" or "1123123/1234-11",
extracts only the digits (e.g. 12123123123411).
Pattern padrao = Pattern.compile("\\d+");
Matcher matcher = padrao.matcher("12.123.123/1234-11");
while (matcher.find()) {
System.out.println(matcher.group());
}
//output: 12, 123, 123, 1234, 11 (one per line)
//I need: 12123123123411
Can anyone help me?
A better way would be to use the String#replaceAll(regex, replacement) method to remove all characters except digits (as you can see, the method takes a regex as its first argument):
String str = "12.123.123/1234-11";
String digits = str.replaceAll("\\D", "");
\\D matches non-digit characters. Equivalent to [^0-9].
Note that the backslash must be doubled in a Java string literal, so the regex \D is written as "\\D".
If you are restricted to using the Matcher#group() method, then you would have to build the result in a StringBuilder, appending each run of digits as it is found:
String str = "12.123.123/1234-11";
StringBuilder digits = new StringBuilder();
Matcher matcher = Pattern.compile("\\d+").matcher(str);
while (matcher.find()) {
digits.append(matcher.group());
}
System.out.println(digits);
You could simply remove all the non-digit characters through replaceAll:
String out = string.replaceAll("\\D+", "");
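For completeness, the same result can likely be had without any regex by filtering characters. The sketch below checks for ASCII digits only, which matches what \d does in Java by default; DigitFilter and onlyDigits are hypothetical names.

```java
class DigitFilter {
    // Keeps only the ASCII digits 0-9, in their original order.
    static String onlyDigits(String input) {
        StringBuilder digits = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c >= '0' && c <= '9') {
                digits.append(c);
            }
        }
        return digits.toString();
    }

    public static void main(String[] args) {
        System.out.println(onlyDigits("12.123.123/1234-11")); // 12123123123411
    }
}
```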

Replacing Pattern Matches in a String

String output = "";
pattern = Pattern.compile(">Part\\s.");
matcher = pattern.matcher(docToProcess);
while (matcher.find()) {
match = matcher.group();
}
I'm trying to use the above code to find the pattern >Part\s. inside docToProcess (which is a String holding a large XML document). I then want to replace each piece of content that matches the pattern with <ref></ref>.
Any ideas how I can make the output variable equal to docToProcess except with the replacements as indicated above?
EDIT: I need to use the matcher somehow when replacing. I can't just use replaceAll()
You can use String#replaceAll method. It takes a Regex as first parameter: -
String output = docToProcess.replaceAll(">Part\\s\\.", "<ref></ref>");
Note that the dot (.) is a metacharacter in regex that matches any character, not just a literal dot. So you need to escape it, unless you really want to match any character after >Part\s. In a Java string literal, the escaping backslash must itself be doubled.
If you want to use the Matcher class, then you can use the Matcher.appendReplacement method: -
String docToProcess = "XYZ>Part .asdf";
Pattern p = Pattern.compile(">Part\\s\\.");
Matcher m = p.matcher(docToProcess);
StringBuffer sb = new StringBuffer();
while (m.find()) {
m.appendReplacement(sb, "<ref></ref>");
}
m.appendTail(sb);
System.out.println(sb.toString());
OUTPUT : -
"XYZ<ref></ref>asdf"
This is what you need:
String docToProcess = "... your xml here ...";
Pattern pattern = Pattern.compile(">Part\\s.");
Matcher matcher = pattern.matcher(docToProcess);
StringBuffer output = new StringBuffer();
while (matcher.find()) matcher.appendReplacement(output, "<ref></ref>");
matcher.appendTail(output);
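Unfortunately, before Java 9 you can't use a StringBuilder here, because appendReplacement and appendTail only accepted a StringBuffer for historical reasons. Java 9 added overloads that take a StringBuilder.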
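On Java 9 or later, Matcher.appendReplacement and Matcher.appendTail also accept a StringBuilder. Here is a hedged sketch of the Matcher-based replacement written that way; LiteralReplacer and replaceLiteral are invented names. It also passes the replacement through Matcher.quoteReplacement, which matters only if the replacement text could contain $ or \ (it doesn't for <ref></ref>).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LiteralReplacer {
    // Replaces every match of `regex` with `replacement`, taken literally.
    static String replaceLiteral(String input, String regex, String replacement) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        StringBuilder output = new StringBuilder();
        // Escape $ and \ so they are not read as group references.
        String safe = Matcher.quoteReplacement(replacement);
        while (matcher.find()) {
            matcher.appendReplacement(output, safe); // StringBuilder overload, Java 9+
        }
        matcher.appendTail(output);
        return output.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceLiteral("XYZ>Part .asdf", ">Part\\s\\.", "<ref></ref>"));
        // XYZ<ref></ref>asdf
    }
}
```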
docToProcess.replaceAll(">Part\\s[.]", "<ref></ref>");
