Get what was removed by String.replaceAll() - java

Let's say I have this regular expression
String regex = "\\d*";
for finding digits.
I also have an input string, for example
String input = "We got 34 apples and too much to do";
Now I want to replace all digits with "", like this:
input = input.replaceAll(regex, "");
When I print input now, I get "We got  apples and too much to do" (with a double space where the number was). It works: the 3 and the 4 were replaced with "".
My question: is there any way (maybe an existing library?) to get what was actually replaced?
The example here is deliberately simple, just to understand how it works. I want to use it for more complex inputs and regexes.
Thanks for your help.

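You can use a Matcher with the appendReplacement()/appendTail() procedure and record each match before replacing it:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String input = "We got 34 apples and too much to do";
String regex = "\\d*";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(input);
StringBuffer sb = new StringBuffer();       // text after the replacement
StringBuffer replaced = new StringBuffer(); // everything that was removed
while (matcher.find()) {
    replaced.append(matcher.group());      // record the match before replacing it
    matcher.appendReplacement(sb, "");     // copy the text before the match, then the replacement
}
matcher.appendTail(sb);                    // copy the rest of the input after the last match
System.out.println(sb.toString());         // prints the replacement result
System.out.println(replaced.toString());   // prints what was replaced: 34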
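On Java 9 or later, Matcher.replaceAll(Function<MatchResult, String>) lets the replacement step record each match as it goes. The sketch below keeps every removed piece as a separate list entry instead of concatenating them, which matters once the input contains several numbers. It assumes \d+ rather than \d* (so empty matches are not recorded), and the class and method names (RemovedParts, removeAndCollect) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RemovedParts {
    // Removes every match of regex from input and adds each removed piece to `removed`.
    // Uses Matcher.replaceAll(Function<MatchResult, String>), available since Java 9.
    static String removeAndCollect(String input, String regex, List<String> removed) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.replaceAll(match -> {
            removed.add(match.group()); // remember what this match removes
            return "";                  // replace it with nothing
        });
    }

    public static void main(String[] args) {
        List<String> removed = new ArrayList<>();
        String result = removeAndCollect("We got 34 apples and 12 pears", "\\d+", removed);
        System.out.println(result);  // We got  apples and  pears
        System.out.println(removed); // [34, 12]
    }
}
```

Because the returned string is still interpreted as a replacement, a lambda that returns text containing $ or \ would need Matcher.quoteReplacement; returning "" needs no quoting.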

Related

How to capture a regex group for below pattern [duplicate]

This question already has answers here: Regex: match everything but a specific pattern
I am exploring Java regex groups and trying to replace parts of a string.
I have a string str = "abXYabcXYZ"; and I want to remove all characters except those matching the group abc.
I tried str.replaceAll("(^abc)", ""), but it did not work. I understand that (abc) will match a group.
You might find it easier to find the parts you want to keep and build a new string from them. This approach has flaws with overlapping patterns, but it will likely be good enough for your use case. However, if your pattern really is as simple as "abc", you may want to consider just counting the total number of matches instead.
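import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
String str = "abXYabcXYZ";
Pattern patternToKeep = Pattern.compile("abc");
// Matcher.results() (Java 9+) streams one MatchResult per match found by find();
// group() is the whole match, since "abc" has no capturing groups (groupCount() is 0).
String kept = patternToKeep.matcher(str).results()
        .map(MatchResult::group)
        .collect(Collectors.joining());
System.out.println(kept); // abc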
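If the kept pattern is a fixed literal like "abc", the counting idea mentioned above might look like this. A minimal sketch, assuming Java 11+ (for String.repeat) and that every match has identical text; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class CountMatches {
    // Counts non-overlapping occurrences of a literal and rebuilds the kept text
    // by repetition. Only valid when every match is the same text.
    static String keepByCounting(String str, String literal) {
        long count = Pattern.compile(Pattern.quote(literal)).matcher(str).results().count();
        return literal.repeat((int) count); // String.repeat needs Java 11+
    }

    public static void main(String[] args) {
        System.out.println(keepByCounting("abXYabcXYZabcxss", "abc")); // abcabc
    }
}
```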
It is easier to keep the matching parts of the pattern and concatenate them. In the following example, the matcher iterates over str with find(), matching the next occurrence of the pattern each time. Inside the loop, your "abc" pattern is always available as group(0).
String str = "abXYabcXYZabcxss";
Pattern pattern = Pattern.compile("abc");
StringBuilder sb = new StringBuilder();
Matcher matcher = pattern.matcher(str);
while(matcher.find()){
sb.append(matcher.group(0));
}
System.out.println(sb.toString());
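If you want to do it with a replacement alone, the nearest you can get is:
((?!abc).)*
But this has a problem: only the a of each abc is kept. The empty match just before abc makes the engine resume one character later, so the bc is matched and replaced as well.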
Regex101 example
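The behaviour described above can be checked directly. The sketch below runs the answer's pattern on the question's string and, for contrast, an alternation-based replacement, (abc)|. with $1, which is my own suggestion rather than part of the answer: it captures abc when present and otherwise consumes a single character, so only the abc runs survive. It assumes the input has no line terminators, since . does not match them by default; the class and method names are hypothetical.

```java
class TemperedDotDemo {
    // The pattern from the answer: removes runs of characters that do not start "abc".
    static String temperedDot(String str) {
        return str.replaceAll("((?!abc).)*", "");
    }

    // Alternative (not from the original answer): match "abc" into group 1, or any
    // other single character. When group 1 did not take part in a match, $1 expands
    // to the empty string, so everything except the "abc" runs is dropped.
    static String keepAbc(String str) {
        return str.replaceAll("(abc)|.", "$1");
    }

    public static void main(String[] args) {
        System.out.println(temperedDot("abXYabcXYZ")); // a
        System.out.println(keepAbc("abXYabcXYZ"));     // abc
    }
}
```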

Regex to split by special characters with exceptions JAVA

I am very new to regular expressions and I'm having difficulties with this one:
I want to split a String into three kinds of tokens: a URL in angle brackets such as <\herewouldbeurl>, a quoted "text here", and a quoted "text here"^^ followed by a URL (this last one should be treated as a single token in the output).
Note these symbols: ^^
Each of the three cases can occur many times and in any order, and the tokens are always separated by spaces.
Example:
<\herewouldbeurl> "HEY THERE" "Asioc-project.org/."^^<\anotherurl/>
would produce:
1.<\herewouldbeurl>
2."HEY THERE"
3."Asioc-project.org/."^^<\anotherurl/>
I've found this: "\s+(?=(?:(?<=[a-zA-Z])\"(?=[A-Za-z])|\"[^\"]*\"|[^\"])*$)" but it does not work for the third case.
Any ideas?
Don't use split(). Use a find() loop.
String input = "<\\herewouldbeurl> \"HEY THERE\" \"Asioc-project.org/.\"^^<\\anotherurl/>";
Pattern p = Pattern.compile("\".*?\"(?:\\^\\^)?");
Matcher m = p.matcher(input);
int start = 0;
while (m.find()) {
String s = input.substring(start, m.start()).trim();
if (! s.isEmpty())
System.out.println(s);
System.out.println(m.group());
start = m.end();
}
String s = input.substring(start).trim();
if (! s.isEmpty())
System.out.println(s);
Output
<\herewouldbeurl>
"HEY THERE"
"Asioc-project.org/."^^
<\anotherurl/>
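Note that this output prints ^^ and <\anotherurl/> on separate lines, whereas the question wants "Asioc-project.org/."^^<\anotherurl/> as one token. A minimal sketch of a find() loop that matches each token shape directly, assuming (from the question's example only) that URLs are <...> with no nested >, quoted text contains no embedded quotes, and ^^ is always immediately followed by a URL; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenizeTriples {
    // Assumed token shapes, taken from the question's example:
    //   <...>            a URL in angle brackets
    //   "..."            a quoted literal
    //   "..."^^<...>     a quoted literal followed by ^^ and a URL, kept as one token
    private static final Pattern TOKEN =
            Pattern.compile("<[^>]*>|\"[^\"]*\"(?:\\^\\^<[^>]*>)?");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group()); // the spaces between tokens are simply skipped
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "<\\herewouldbeurl> \"HEY THERE\" \"Asioc-project.org/.\"^^<\\anotherurl/>";
        tokenize(input).forEach(System.out::println);
    }
}
```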

Regex to get value between two colon excluding the colons

I have a string like this:
something:POST:/some/path
I want to extract just POST from the string. I tried this regex:
:([a-zA-Z]+):
But this gives me the value along with the colons, i.e. I get:
:POST:
but I need this
POST
My code to match and replace it is as follows:
String ss = "something:POST:/some/path/";
Pattern pattern = Pattern.compile(":([a-zA-Z]+):");
Matcher matcher = pattern.matcher(ss);
if (matcher.find()) {
System.out.println(matcher.group());
ss = ss.replaceFirst(":([a-zA-Z]+):", "*");
}
System.out.println(ss);
EDIT:
I've decided to use the lookahead/lookbehind regex since I did not want to put colons in the replacement, as in :*:. This is my final solution.
String s = "something:POST:/some/path/";
String regex = "(?<=:)[a-zA-Z]+(?=:)";
Matcher matcher = Pattern.compile(regex).matcher(s);
if (matcher.find()) {
s = matcher.replaceFirst("*"); // replaces exactly the match found, not the first "POST" anywhere in s
System.out.println("replaced: " + s);
}
else {
System.out.println("not replaced: " + s);
}
There are two approaches:
Keep your Java code, and use lookahead/lookbehind (?<=:)[a-zA-Z]+(?=:), or
Change your Java code to replace the result with ":*:"
Note: You may want to define a String constant for your regex, since you use it in different calls.
As pointed out, the regex's capturing group can be used for the replacement.
The following code did it:
String ss = "something:POST:/some/path/";
Pattern pattern = Pattern.compile(":([a-zA-Z]+):");
Matcher matcher = pattern.matcher(ss);
if (matcher.find()) {
ss = ss.substring(0, matcher.start(1)) + "*" + ss.substring(matcher.end(1)); // splice "*" over group 1 only
}
System.out.println(ss);
UPDATE
Looking at your update, you only need replaceFirst:
String result = s.replaceFirst(":[a-zA-Z]+:", ":*:");
See the Java demo
When you use (?<=:)[a-zA-Z]+(?=:), the regex engine checks each location inside the string for a : before it, and once found, tries to match one or more ASCII letters and then asserts that there is a : after them. With :[A-Za-z]+:, the checking only starts once the regex engine has found a : character. Then, after matching :POST:, the replacement pattern replaces the whole match. It is totally OK to hardcode the colons in the replacement pattern, since they are hardcoded in the regex pattern.
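Both forms give the same result on the question's string, as the sketch below shows (the class and method names are hypothetical). The difference between asserting and consuming the colons only becomes visible with replaceAll on adjacent items, where a consumed colon cannot start the next match; the second input, x:AB:CD:, is an invented example.

```java
class ColonReplace {
    // Lookaround: only the letters are consumed; the colons are merely asserted.
    static String withLookaround(String s) {
        return s.replaceFirst("(?<=:)[a-zA-Z]+(?=:)", "*");
    }

    // Consuming: the colons are part of the match, so the replacement writes them back.
    static String withColons(String s) {
        return s.replaceFirst(":[a-zA-Z]+:", ":*:");
    }

    public static void main(String[] args) {
        String s = "something:POST:/some/path/";
        System.out.println(withLookaround(s)); // something:*:/some/path/
        System.out.println(withColons(s));     // something:*:/some/path/

        // With replaceAll, a colon consumed by one match cannot start the next one.
        String t = "x:AB:CD:";
        System.out.println(t.replaceAll("(?<=:)[a-zA-Z]+(?=:)", "*")); // x:*:*:
        System.out.println(t.replaceAll(":[a-zA-Z]+:", ":*:"));        // x:*:CD:
    }
}
```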
Original answer
You just need to access Group 1:
if (matcher.find()) {
System.out.println(matcher.group(1));
}
See Java demo
Your :([a-zA-Z]+): regex contains a capturing group (see (....) subpattern). These groups are numbered automatically: the first one has an index of 1, the second has the index of 2, etc.
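A small sketch of the numbering, using the question's string: group 0 (or group() with no argument) is the whole match, and group 1 is the text inside the parentheses. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumbering {
    private static final Pattern PATTERN = Pattern.compile(":([a-zA-Z]+):");

    // Returns group n of the first match, or null if there is no match.
    static String group(String input, int n) {
        Matcher matcher = PATTERN.matcher(input);
        return matcher.find() ? matcher.group(n) : null;
    }

    public static void main(String[] args) {
        String s = "something:POST:/some/path/";
        System.out.println(group(s, 0)); // :POST:  (the whole match)
        System.out.println(group(s, 1)); // POST    (the first capturing group)
    }
}
```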
To replace it, use Matcher#appendReplacement():
String s = "something:POST:/some/path/";
StringBuffer result = new StringBuffer();
Matcher m = Pattern.compile(":([a-zA-Z]+):").matcher(s);
while (m.find()) {
m.appendReplacement(result, ":*:");
}
m.appendTail(result);
System.out.println(result.toString());
See another demo
This is your solution:
regex = (:)([a-zA-Z]+)(:)
And code is:
String ss = "something:POST:/some/path/";
ss = ss.replaceFirst("(:)([a-zA-Z]+)(:)", "$1*$3");
ss now contains:
something:*:/some/path/
Which I believe is what you are looking for...

What's the best way to remove all \n \r \t from a string?

What's the best way to remove all \n, \r, \t from a String in java?
Is there some sort of library method that can do that nicely, instead of having to call string.replaceAll() multiple times?
Please try this:
str = str.replaceAll("[\\n\\r\\t]+", ""); // Strings are immutable, so assign the result
There is no need to do str.replaceAll multiple times.
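Just use a single regex. Note that \s matches any whitespace character, including ordinary spaces, so this removes spaces too:
str = str.replaceAll("\\s+", "");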
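To illustrate the difference between the two answers, here is a sketch on an invented sample string; the class and method names are hypothetical. The first regex leaves ordinary spaces alone, the second removes them too.

```java
class StripWhitespace {
    // Removes only newline, carriage return and tab characters.
    static String stripControl(String str) {
        return str.replaceAll("[\\n\\r\\t]+", "");
    }

    // \s is [ \t\n\x0B\f\r], so ordinary spaces are removed as well.
    static String stripAllWhitespace(String str) {
        return str.replaceAll("\\s+", "");
    }

    public static void main(String[] args) {
        String str = "a b\tc\r\nd";
        System.out.println(stripControl(str));       // a bcd
        System.out.println(stripAllWhitespace(str)); // abcd
    }
}
```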
Using regex in Java: for future reference, if you want to remove a more complex set of strings:
// strings that you want to remove
String regexp = "str1|str2|str3";
StringBuffer sb = new StringBuffer();
Pattern p = Pattern.compile(regexp);
// here input is your input string
Matcher m = p.matcher(input);
while (m.find())
m.appendReplacement(sb, "");
m.appendTail(sb);
System.out.println(sb.toString());
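If the strings to remove are plain text that may contain regex metacharacters (., (, |, and so on), joining them raw into str1|str2|str3 can match the wrong thing. A minimal sketch, assuming the inputs are literal strings, quotes each one with Pattern.quote before joining; the class and method names are hypothetical. When one string is a prefix of another, list the longer one first, because alternation tries the alternatives in order.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class RemoveLiterals {
    // Builds an alternation of quoted literals, so "1.2" matches only "1.2"
    // and not "1x2", then removes every occurrence.
    static String removeAll(String input, String... toRemove) {
        String regexp = Arrays.stream(toRemove)
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        return Pattern.compile(regexp).matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(removeAll("v1.2 (beta) build", "1.2", "(beta)")); // v  build
    }
}
```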

Replacing Pattern Matches in a String

String output = "";
pattern = Pattern.compile(">Part\\s.");
matcher = pattern.matcher(docToProcess);
while (matcher.find()) {
match = matcher.group();
}
I'm trying to use the code above to find the pattern >Part\s. inside docToProcess (which is a string holding a large XML document) and then replace the content that matches the pattern with <ref></ref>.
Any ideas how I can make the output variable equal to docToProcess except with the replacements as indicated above?
EDIT: I need to use the matcher somehow when replacing. I can't just use replaceAll().
You can use the String#replaceAll method. It takes a regex as its first parameter:
String output = docToProcess.replaceAll(">Part\\s\\.", "<ref></ref>");
Note that the dot (.) is a special metacharacter in regex that matches any character (except a line terminator, by default), not just a literal dot. So you need to escape it, unless you really want to match any character after >Part\s. In a Java string literal the escape needs two backslashes, as in \\.
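A quick way to see why the escape matters; the sample document string and the class and method names are invented for illustration.

```java
class EscapeDot {
    // Unescaped dot: any character after ">Part " completes the match.
    static String unescaped(String doc) {
        return doc.replaceAll(">Part\\s.", "<ref></ref>");
    }

    // Escaped dot: only a literal '.' completes the match.
    static String escaped(String doc) {
        return doc.replaceAll(">Part\\s\\.", "<ref></ref>");
    }

    public static void main(String[] args) {
        String doc = "A>Part .B>Part xC";
        System.out.println(unescaped(doc)); // A<ref></ref>B<ref></ref>C
        System.out.println(escaped(doc));   // A<ref></ref>B>Part xC
    }
}
```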
If you want to use the Matcher class, you can use the Matcher.appendReplacement method:
String docToProcess = "XYZ>Part .asdf";
Pattern p = Pattern.compile(">Part\\s\\.");
Matcher m = p.matcher(docToProcess);
StringBuffer sb = new StringBuffer();
while (m.find()) {
m.appendReplacement(sb, "<ref></ref>");
}
m.appendTail(sb);
System.out.println(sb.toString());
Output:
"XYZ<ref></ref>asdf"
This is what you need:
String docToProcess = "... your xml here ...";
Pattern pattern = Pattern.compile(">Part\\s.");
Matcher matcher = pattern.matcher(docToProcess);
StringBuffer output = new StringBuffer();
while (matcher.find()) matcher.appendReplacement(output, "<ref></ref>");
matcher.appendTail(output);
Before Java 9, appendReplacement() and appendTail() only accepted a StringBuffer; Java 9 added overloads that take a StringBuilder.
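On Java 9 or later, the same loop can use a StringBuilder. A minimal sketch using the escaped pattern from the first answer; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceWithBuilder {
    static String replaceParts(String docToProcess) {
        Matcher matcher = Pattern.compile(">Part\\s\\.").matcher(docToProcess);
        StringBuilder output = new StringBuilder(); // accepted by appendReplacement since Java 9
        while (matcher.find()) {
            matcher.appendReplacement(output, "<ref></ref>");
        }
        matcher.appendTail(output); // copy whatever follows the last match
        return output.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceParts("XYZ>Part .asdf")); // XYZ<ref></ref>asdf
    }
}
```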
docToProcess.replaceAll(">Part\\s[.]", "<ref></ref>");
String output = docToProcess.replaceAll(">Part\\s\\.", "<ref></ref>");
