Replacing Pattern Matches in a String - java

String output = "";
pattern = Pattern.compile(">Part\s.");
matcher = pattern.matcher(docToProcess);
while (matcher.find()) {
match = matcher.group();
}
I'm trying to use the above code to find the pattern >Part\s. inside docToProcess (which is a string containing a large XML document), and then replace the content that matches the pattern with <ref></ref>.
Any ideas how I can make the output variable equal to docToProcess except with the replacements as indicated above?
EDIT: I need to use the matcher somehow when replacing. I can't just use replaceAll().

You can use the String#replaceAll method. It takes a regex as its first parameter:
String output = docToProcess.replaceAll(">Part\\s\\.", "<ref></ref>");
Note that the dot (.) is a special metacharacter in regex: it matches any character (except line terminators, by default), not just a literal dot. So you need to escape it, unless you really want to match any character after >Part\s. In a Java string literal the backslash itself must be escaped, so the regex \. is written as "\\." (and \s as "\\s").
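To see the difference the escape makes, here is a small sketch; the class name DotEscapeDemo and the sample inputs are made up for illustration. It compiles the asker's unescaped pattern and the escaped one and checks both with Matcher#matches():

```java
import java.util.regex.Pattern;

public class DotEscapeDemo {
    // Unescaped: the trailing "." matches any single character (except a line terminator).
    static final Pattern ANY_CHAR = Pattern.compile(">Part\\s.");
    // Escaped: "\\." in Java source becomes \. in the regex, which matches only a literal dot.
    static final Pattern LITERAL_DOT = Pattern.compile(">Part\\s\\.");

    public static void main(String[] args) {
        System.out.println(ANY_CHAR.matcher(">Part X").matches());    // true
        System.out.println(LITERAL_DOT.matcher(">Part X").matches()); // false
        System.out.println(LITERAL_DOT.matcher(">Part .").matches()); // true
    }
}
```

With the unescaped pattern any character (here X) is accepted after the whitespace, which is usually not what you want when scanning a large XML document.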
If you want to use the Matcher class, then you can use the Matcher#appendReplacement method:
String docToProcess = "XYZ>Part .asdf";
Pattern p = Pattern.compile(">Part\\s\\.");
Matcher m = p.matcher(docToProcess);
StringBuffer sb = new StringBuffer();
while (m.find()) {
m.appendReplacement(sb, "<ref></ref>");
}
m.appendTail(sb);
System.out.println(sb.toString());
Output:
"XYZ<ref></ref>asdf"

This is what you need:
String docToProcess = "... your xml here ...";
Pattern pattern = Pattern.compile(">Part\\s.");
Matcher matcher = pattern.matcher(docToProcess);
StringBuffer output = new StringBuffer();
while (matcher.find()) matcher.appendReplacement(output, "<ref></ref>");
matcher.appendTail(output);
Before Java 9 you couldn't use a StringBuilder here, because appendReplacement and appendTail only accepted a StringBuffer; Java 9 added StringBuilder overloads of both methods.
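On Java 9 or later, Matcher has appendReplacement(StringBuilder, String) and appendTail(StringBuilder), so the same loop can be written with a StringBuilder. A minimal sketch, assuming Java 9+; the class and method names are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StringBuilderReplace {
    // Requires Java 9+: appendReplacement/appendTail have StringBuilder overloads.
    static String replaceParts(String docToProcess) {
        Matcher matcher = Pattern.compile(">Part\\s.").matcher(docToProcess);
        StringBuilder output = new StringBuilder();
        while (matcher.find()) {
            // Copies the text before the match, then appends the replacement.
            matcher.appendReplacement(output, "<ref></ref>");
        }
        // Copies whatever follows the last match.
        matcher.appendTail(output);
        return output.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceParts("XYZ>Part .asdf")); // XYZ<ref></ref>asdf
    }
}
```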

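String output = docToProcess.replaceAll(">Part\\s[.]", "<ref></ref>");
Inside a character class the dot is literal, so [.] matches a dot without a backslash escape.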

Related

Does Java have an equivalent to .NET's Regex.Replace Method (String, MatchEvaluator) [duplicate]

I'm trying to do something like this:
public String evaluateString(String s){
Pattern p = Pattern.compile("someregex");
Matcher m = p.matcher(s);
while(m.find()){
m.replaceCurrent(methodFoo(m.group()));
}
}
The problem is that there is no replaceCurrent method. Maybe there is an equivalent I overlooked. Basically I want to replace each match with the return value of a method called on that match. Any tips would be much appreciated!
Update:
Since Java 9, we can use Matcher#replaceAll(Function<MatchResult, String> replacer), like this:
String result = Pattern.compile("yourRegex")
.matcher(yourString)
.replaceAll(match -> yourMethod(match.group()));
// ^^^- or generate replacement directly
// like `match.group().toUpperCase()`
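A self-contained version of the snippet above, as a sketch: yourMethod is a hypothetical stand-in that upper-cases the match and wraps it in brackets, and [a-z]+ is just an example regex. It requires Java 9+.

```java
import java.util.regex.Pattern;

public class ReplaceWithFunction {
    // Hypothetical stand-in for yourMethod: upper-cases the match and brackets it.
    static String yourMethod(String matched) {
        return "[" + matched.toUpperCase() + "]";
    }

    static String evaluateString(String s) {
        // Java 9+: the function is called once per match with its MatchResult.
        return Pattern.compile("[a-z]+")
                .matcher(s)
                .replaceAll(match -> yourMethod(match.group()));
    }

    public static void main(String[] args) {
        System.out.println(evaluateString("ab 12 cd")); // [AB] 12 [CD]
    }
}
```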
Before Java 9
You may use Matcher#appendReplacement and Matcher#appendTail.
appendReplacement does two things:
it appends to the given buffer the text between the previous match (or the start of the string, for the first match) and the current match,
then it appends the replacement for the current match (which can be computed from that match).
appendTail appends the text after the last match to the buffer.
Pattern p = Pattern.compile("yourRegex");
Matcher m = p.matcher(yourString);
StringBuffer sb = new StringBuffer();
while (m.find()) {
m.appendReplacement(sb, yourMethod(m.group()));
}
m.appendTail(sb);
String result = sb.toString();
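One caveat with both forms: the replacement string is not inserted verbatim, because $ introduces a group reference and \ escapes the next character. If yourMethod can return such characters, wrap its result in Matcher.quoteReplacement. A sketch with a hypothetical toPrice method whose output starts with $:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedReplacement {
    // Hypothetical method whose result contains '$'.
    static String toPrice(String digits) {
        return "$" + digits;
    }

    static String priceAll(String s) {
        Matcher m = Pattern.compile("\\d+").matcher(s);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // quoteReplacement escapes '$' and '\' so the text is appended literally.
            m.appendReplacement(sb, Matcher.quoteReplacement(toPrice(m.group())));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(priceAll("apples 3, pears 12")); // apples $3, pears $12
    }
}
```

Without quoteReplacement, the first replacement "$3" would be read as a reference to group 3, and since this pattern has no such group, appendReplacement would throw an IndexOutOfBoundsException.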

Regex to get value between two colon excluding the colons

I have a string like this:
something:POST:/some/path
Now I want to extract only the POST from the string. I did this by using this regex:
:([a-zA-Z]+):
But this gives me the value along with the colons, i.e. I get this:
:POST:
but I need this
POST
My code to match it and replace it is as follows:
String ss = "something:POST:/some/path/";
Pattern pattern = Pattern.compile(":([a-zA-Z]+):");
Matcher matcher = pattern.matcher(ss);
if (matcher.find()) {
System.out.println(matcher.group());
ss = ss.replaceFirst(":([a-zA-Z]+):", "*");
}
System.out.println(ss);
EDIT:
I've decided to use the lookahead/lookbehind regex, since I did not want to use a replacement containing colons, such as :*:. This is my final solution:
String s = "something:POST:/some/path/";
String regex = "(?<=:)[a-zA-Z]+(?=:)";
Matcher matcher = Pattern.compile(regex).matcher(s);
if (matcher.find()) {
s = s.replaceFirst(matcher.group(), "*");
System.out.println("replaced: " + s);
}
else {
System.out.println("not replaced: " + s);
}
There are two approaches:
Keep your Java code, and use lookahead/lookbehind (?<=:)[a-zA-Z]+(?=:), or
Change your Java code to replace the result with ":*:"
Note: You may want to define a String constant for your regex, since you use it in different calls.
As pointed out, the regex's capturing group can be used for the replacement.
The following code did it:
String ss = "something:POST:/some/path/";
Pattern pattern = Pattern.compile(":([a-zA-Z]+):");
Matcher matcher = pattern.matcher(ss);
if (matcher.find()) {
ss = ss.replaceFirst(matcher.group(1), "*");
}
System.out.println(ss);
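A caveat with the replaceFirst(matcher.group(...), "*") solutions above: replaceFirst searches the string again from the beginning (and treats the matched text as a regex), so if the same letters occur earlier in the string, the wrong occurrence is replaced. A sketch that instead splices the replacement in at the match's own position using Matcher#start(int) and Matcher#end(int); the class name and the input "POSTbox:POST:/x" are made up:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceAtMatchPosition {
    static final Pattern METHOD = Pattern.compile(":([a-zA-Z]+):");

    // Re-searches the whole string for the matched text (the approach above).
    static String viaReplaceFirst(String s) {
        Matcher m = METHOD.matcher(s);
        return m.find() ? s.replaceFirst(m.group(1), "*") : s;
    }

    // Replaces group 1 at the exact position where it was found.
    static String viaPosition(String s) {
        Matcher m = METHOD.matcher(s);
        if (!m.find()) {
            return s;
        }
        return s.substring(0, m.start(1)) + "*" + s.substring(m.end(1));
    }

    public static void main(String[] args) {
        String s = "POSTbox:POST:/x";
        System.out.println(viaReplaceFirst(s)); // *box:POST:/x
        System.out.println(viaPosition(s));     // POSTbox:*:/x
    }
}
```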
UPDATE
Looking at your update, you only need replaceFirst:
String result = s.replaceFirst(":[a-zA-Z]+:", ":*:");
When you use (?<=:)[a-zA-Z]+(?=:), the regex engine checks each location inside the string for a : before it, and once one is found, tries to match 1+ ASCII letters and then asserts that there is a : after them. With :[A-Za-z]+:, the checking only starts after the regex engine finds a : character. Then, after matching :POST:, the replacement pattern replaces the whole match. It is totally OK to hardcode the colons in the replacement pattern, since they are hardcoded in the regex pattern.
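The two regexes can be compared directly. A sketch (class name hypothetical) showing that passing the lookaround regex straight to replaceFirst, and replacing the colon-delimited match with :*:, give the same result for the sample path:

```java
public class LookaroundVsColons {
    // Lookbehind/lookahead assert the colons without consuming them,
    // so only the letters are replaced.
    static String withLookaround(String s) {
        return s.replaceFirst("(?<=:)[a-zA-Z]+(?=:)", "*");
    }

    // Here the colons are part of the match, so the replacement puts them back.
    static String withColons(String s) {
        return s.replaceFirst(":[a-zA-Z]+:", ":*:");
    }

    public static void main(String[] args) {
        String s = "something:POST:/some/path/";
        System.out.println(withLookaround(s)); // something:*:/some/path/
        System.out.println(withColons(s));     // something:*:/some/path/
    }
}
```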
Original answer
You just need to access Group 1:
if (matcher.find()) {
System.out.println(matcher.group(1));
}
Your :([a-zA-Z]+): regex contains a capturing group (see the (...) subpattern). These groups are numbered automatically: the first one has index 1, the second has index 2, and so on.
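To make the numbering concrete, here is a sketch with a made-up two-group pattern, (\w+):([a-zA-Z]+):, applied to the sample string; group 0 is always the whole match:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupNumbering {
    // Returns group n of the first match, or null if there is no match.
    static String group(String input, int n) {
        Matcher m = Pattern.compile("(\\w+):([a-zA-Z]+):").matcher(input);
        return m.find() ? m.group(n) : null;
    }

    public static void main(String[] args) {
        String s = "something:POST:/some/path/";
        System.out.println(group(s, 0)); // something:POST:  (whole match)
        System.out.println(group(s, 1)); // something        (first group)
        System.out.println(group(s, 2)); // POST             (second group)
    }
}
```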
To replace it, use Matcher#appendReplacement():
String s = "something:POST:/some/path/";
StringBuffer result = new StringBuffer();
Matcher m = Pattern.compile(":([a-zA-Z]+):").matcher(s);
while (m.find()) {
m.appendReplacement(result, ":*:");
}
m.appendTail(result);
System.out.println(result.toString());
This is your solution:
regex = (:)([a-zA-Z]+)(:)
And the code is:
String ss = "something:POST:/some/path/";
ss = ss.replaceFirst("(:)([a-zA-Z]+)(:)", "$1*$3");
ss now contains:
something:*:/some/path/
This, I believe, is what you are looking for.

What's the best way to remove all \n \r \t from a string?

What's the best way to remove all \n, \r, \t from a String in java?
Is there some sort of library and method that can do that for me nicely, instead of my having to call string.replaceAll() multiple times?
Please try this:
str = str.replaceAll("[\\n\\r\\t]+", "");
There is no need to do str.replaceAll multiple times.
Just use a regex:
str = str.replaceAll("\\s+", "");
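The two answers are not equivalent: \s also matches the space character (plus form feed and vertical tab), so it removes ordinary spaces too. A sketch comparing them on a made-up input; since Strings are immutable, the result of replaceAll must be used or assigned:

```java
public class RemoveWhitespace {
    // Removes only \n, \r and \t; ordinary spaces survive.
    static String removeControlWhitespace(String str) {
        return str.replaceAll("[\\n\\r\\t]+", "");
    }

    // \s is [ \t\n\x0B\f\r], so spaces are removed as well.
    static String removeAllWhitespace(String str) {
        return str.replaceAll("\\s+", "");
    }

    public static void main(String[] args) {
        String str = "a b\n\tc\r\n";
        System.out.println(removeControlWhitespace(str)); // a bc
        System.out.println(removeAllWhitespace(str));     // abc
    }
}
```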
Using regex in Java: for future reference, if you want to remove a more complex set of strings, you can loop over the matches with a Matcher:
// strings that you want to remove
String regexp = "str1|str2|str3";
StringBuffer sb = new StringBuffer();
Pattern p = Pattern.compile(regexp);
// here input is your input string
Matcher m = p.matcher(input);
while (m.find())
m.appendReplacement(sb, "");
m.appendTail(sb);
System.out.println(sb.toString());
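If the strings you want to remove are literal text rather than regexes, characters such as . or + in them are interpreted as regex syntax once they are joined with |. A sketch (class and method names hypothetical) that quotes each entry with Pattern.quote before building the alternation:

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class RemoveLiterals {
    static String removeAll(String input, List<String> unwanted) {
        // Pattern.quote makes each entry match literally, even if it
        // contains metacharacters such as '.', '+' or '('.
        String regexp = unwanted.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        Matcher m = Pattern.compile(regexp).matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, "");
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Unquoted, "a.b" would also remove "a-b".
        System.out.println(removeAll("a.b c+d a-b", Arrays.asList("a.b", "c+d")));
    }
}
```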

Get what was removed by String.replaceAll()

So, let's say I have this regular expression
String regex = "\\d*";
for finding any digits.
Now I also have an input string, for example
String input = "We got 34 apples and too much to do";
Now I want to replace all digits with "", doing it like this:
input = input.replaceAll(regex, "");
When I now print input, I get "We got apples and too much to do". It works: it replaced the 3 and the 4 with "".
Now my question: is there any way (maybe an existing library?) to get what was actually replaced?
The example here is very simple, just to show how it works. I want to use it for more complex inputs and regexes.
Thanks for your help.
You can use a Matcher with the append-and-replace procedure:
String regex = "\\d*";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(input);
StringBuffer sb = new StringBuffer();
StringBuffer replaced = new StringBuffer();
while(matcher.find()) {
replaced.append(matcher.group());
matcher.appendReplacement(sb, "");
}
matcher.appendTail(sb);
System.out.println(sb.toString()); // prints the replacement result
System.out.println(replaced.toString()); // prints what was replaced
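If you need each removed piece separately rather than concatenated, you can collect the matches into a list. Note that \d* also matches the empty string, so find() reports zero-length matches between characters; with \d+ only real digit runs are reported. A sketch with a hypothetical helper name:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceAndRecord {
    // Replaces every match of regex and adds each matched substring to 'removed',
    // in order. The replacement is interpreted by appendReplacement ($ and \ are special).
    static String replaceAndRecord(String input, String regex,
                                   String replacement, List<String> removed) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        StringBuffer sb = new StringBuffer();
        while (matcher.find()) {
            removed.add(matcher.group());
            matcher.appendReplacement(sb, replacement);
        }
        matcher.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> removed = new ArrayList<>();
        String result = replaceAndRecord("We got 34 apples and 5 pears", "\\d+", "", removed);
        System.out.println(result);  // We got  apples and  pears
        System.out.println(removed); // [34, 5]
    }
}
```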

Extract only the numbers from String

I need a regex that, given the following strings: "12.123.123/1234-11", "12.123123123411" or "1123123/1234-11",
lets me extract only the digits (12123123123411):
Pattern padrao = Pattern.compile("\\d+");
Matcher matcher = padrao.matcher("12.123.123/1234-11");
while (matcher.find()) {
System.out.println(matcher.group());
}
//output:12,123,123,1234,11,
//I need: 12123123123411
Can anyone help me?
A better way would be to use the String#replaceAll(regex, replacement) method to replace all characters except digits (as you can see, the method takes a regex as its first argument):
String str = "12.123.123/1234-11";
String digits = str.replaceAll("\\D", "");
\D matches non-digit characters; it is equivalent to [^0-9].
Note that in a Java string literal the backslash must itself be escaped, so you write "\\D".
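A quick check of that equivalence on the three strings from the question, as a sketch (class name hypothetical):

```java
public class DigitsOnly {
    // \D (written "\\D" in Java source) matches any character that is not 0-9.
    static String withNonDigitClass(String s) {
        return s.replaceAll("\\D", "");
    }

    // The explicit negated class [^0-9] removes exactly the same characters.
    static String withNegatedRange(String s) {
        return s.replaceAll("[^0-9]", "");
    }

    public static void main(String[] args) {
        String[] samples = {"12.123.123/1234-11", "12.123123123411", "1123123/1234-11"};
        for (String s : samples) {
            System.out.println(withNonDigitClass(s) + " " + withNegatedRange(s));
        }
    }
}
```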
If you are restricted to using the Matcher#group() method, then you would have to build a StringBuilder, appending the digits every time they are found:
String str = "12.123.123/1234-11";
StringBuilder digits = new StringBuilder();
Matcher matcher = Pattern.compile("\\d+").matcher(str);
while (matcher.find()) {
digits.append(matcher.group());
}
System.out.println(digits);
You could simply remove all the non-digit characters through replaceAll:
String out = string.replaceAll("\\D+", "");
