How to capture a regex group for below pattern [duplicate] - java

This question already has answers here:
Regex: match everything but a specific pattern
(6 answers)
Closed 3 years ago.
I am exploring java regex groups and I am trying to replace a string with some characters.
I have a string str = "abXYabcXYZ"; and I am trying to replace all characters except for the pattern group abc in string.
I tried to use str.replaceAll("(^abc)",""), but it did not work. I understand that (abc) will match a group.

You might find it easier to find the parts you want to keep and build a new string from them. This approach has flaws when matches overlap, but it will likely be good enough for your use case. However, if your pattern really is as simple as "abc", you may instead want to just count the total number of matches.
String str = "abXYabcXYZ";
Pattern patternToKeep = Pattern.compile("abc");
Matcher matcher = patternToKeep.matcher(str);
StringBuilder sb = new StringBuilder();
// find() advances to the next match; group() returns the text it matched
while (matcher.find()) {
    sb.append(matcher.group());
}
System.out.println(sb.toString());
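The counting idea can be sketched roughly as follows. This is only an illustration, assuming the text to keep is a fixed literal such as "abc"; the names CountAndRebuild and keepLiteral are made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CountAndRebuild {
    // Hypothetical helper: keeps only the occurrences of a fixed literal.
    static String keepLiteral(String input, String literal) {
        // Pattern.quote ensures the literal is not interpreted as a regex
        Matcher matcher = Pattern.compile(Pattern.quote(literal)).matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        // Rebuild the result by repeating the literal once per match
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            sb.append(literal);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(keepLiteral("abXYabcXYZabcxss", "abc")); // abcabc
    }
}
```

This only makes sense when every match is the same text; for a real pattern with varying matches, appending group() inside the loop is the safer choice.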

It is easier to keep the matching parts of the pattern and concatenate them. In the following example the matcher iterates over str with find(), advancing to the next match on each call. Inside the loop, the matched "abc" is always available as group(0).
String str = "abXYabcXYZabcxss";
Pattern pattern = Pattern.compile("abc");
StringBuilder sb = new StringBuilder();
Matcher matcher = pattern.matcher(str);
while(matcher.find()){
sb.append(matcher.group(0));
}
System.out.println(sb.toString());
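If you only want to use a replacement, the nearest you can get is:
((?!abc).)*
The problem is that only the a of each abc survives the replacement: the negative lookahead stops the match in front of abc, but the b and c that follow are matched and removed.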
Regex101 example
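A replace-only alternative that does keep the whole abc is to capture it and let a second alternative consume every other character. The sketch below is one possible way to do that, not the only one; the names KeepOnlyAbc and keepAbc are made up for this example, and it assumes the input has no line terminators (the dot does not match them unless the (?s) flag is added).

```java
class KeepOnlyAbc {
    // Hypothetical helper: removes everything except occurrences of "abc".
    static String keepAbc(String input) {
        // Alternative 1 captures "abc"; alternative 2 consumes any other single character.
        // In the replacement, $1 is empty whenever group 1 did not take part in the match.
        return input.replaceAll("(abc)|.", "$1");
    }

    public static void main(String[] args) {
        System.out.println(keepAbc("abXYabcXYZ")); // abc
    }
}
```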

Related

Find ALL matches of a regex pattern in Java - even overlapping ones [duplicate]

This question already has answers here:
Matcher not finding overlapping words?
(4 answers)
Closed 4 years ago.
I have a String of the form:
1,2,3,4,5,6,7,8,...
I am trying to find all substrings in this string that contain exactly 4 digits. For this I have the regex [0-9],[0-9],[0-9],[0-9]. Unfortunately when I try to match the regex against my String, I never obtain all the substrings, only a part of all the possible substrings. For instance, in the example above I would only get:
1,2,3,4
5,6,7,8
although I expect to get:
1,2,3,4
2,3,4,5
3,4,5,6
...
How would I go about finding all matches corresponding to my regex?
for info, I am using Pattern and Matcher to find the matches:
Pattern pattern = Pattern.compile("[0-9],[0-9],[0-9],[0-9]");
Matcher matcher = pattern.matcher(myString);
List<String> matches = new ArrayList<String>();
while (matcher.find())
{
matches.add(matcher.group());
}
By default, successive calls to Matcher.find() start at the end of the previous match.
To search from a specific location, pass a start index to find(int); for overlapping matches, use one character past the start of the previous match.
In your case, once a first find() has succeeded (start() throws an IllegalStateException before that), probably something like:
while (matcher.find(matcher.start()+1))
This works fine:
Pattern p = Pattern.compile("[0-9],[0-9],[0-9],[0-9]");
public void test(String[] args) throws Exception {
String test = "0,1,2,3,4,5,6,7,8,9";
Matcher m = p.matcher(test);
if(m.find()) {
do {
System.out.println(m.group());
} while(m.find(m.start()+1));
}
}
printing
0,1,2,3
1,2,3,4
...
If you are looking for a pure regex based solution then you may use this lookahead based regex for overlapping matches:
(?=((?:[0-9],){3}[0-9]))
Note that your matches are available in captured group #1
RegEx Demo
Code:
final String regex = "(?=((?:[0-9],){3}[0-9]))";
final String string = "0,1,2,3,4,5,6,7,8,9";
final Pattern pattern = Pattern.compile(regex);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
Code Demo
output:
0,1,2,3
1,2,3,4
2,3,4,5
3,4,5,6
4,5,6,7
5,6,7,8
6,7,8,9
Here is some sample code without regex (a regex does not seem useful to me here, and I would also assume it to be slower in this case). As written, it only works as long as each number is exactly one character long.
String s = "a,b,c,d,e,f,g,h";
// each window is 7 characters long: 4 items plus 3 commas
for (int i = 0; i + 7 <= s.length(); i += 2) {
    System.out.println(s.substring(i, i + 7));
}
Output for this string:
a,b,c,d
b,c,d,e
c,d,e,f
d,e,f,g
e,f,g,h
As @OldCurmudgeon pointed out, find() by default starts looking from the end of the previous match. To position it right after the first matched element, make the first matched region a capturing group and use its end index:
Pattern pattern = Pattern.compile("(\\d,)\\d,\\d,\\d");
Matcher matcher = pattern.matcher("1,2,3,4,5,6,7,8,9");
List<String> matches = new ArrayList<>();
int start = 0;
while (matcher.find(start)) {
start = matcher.end(1);
matches.add(matcher.group());
}
System.out.println(matches);
results in
[1,2,3,4, 2,3,4,5, 3,4,5,6, 4,5,6,7, 5,6,7,8, 6,7,8,9]
This approach also works if each matching region is longer than one digit, provided the pattern uses \\d+ in place of \\d.
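For numbers with more than one digit, the same loop could look roughly like this. It is a sketch under the assumption that the pattern is widened to \\d+; the names OverlappingWindows and windows are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OverlappingWindows {
    // Hypothetical helper: returns every window of four comma-separated numbers.
    static List<String> windows(String input) {
        // Group 1 is the first number plus its comma; the next search starts right after it
        Pattern pattern = Pattern.compile("(\\d+,)\\d+,\\d+,\\d+");
        Matcher matcher = pattern.matcher(input);
        List<String> matches = new ArrayList<>();
        int start = 0;
        while (matcher.find(start)) {
            start = matcher.end(1);
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // prints [10,20,30,40, 20,30,40,50, 30,40,50,60]
        System.out.println(windows("10,20,30,40,50,60"));
    }
}
```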

java split by bracket and keep the delimiter - RegEx [duplicate]

This question already has answers here:
How do I split a string in Java?
(39 answers)
Closed 6 years ago.
I am trying to split a string using a regex, with the closing bracket as the delimiter, and I have to keep the bracket.
Input string: (GROUP=test1)(GROUP=test2)(GROUP=test3)(GROUP=test4)
Needed output:
(GROUP=test1)
(GROUP=test2)
(GROUP=test3)
(GROUP=test4)
I am using the Java regex "\([^)]*?\)" and it throws an error. Below is the code I am using; the error is thrown when I try to get the group.
Pattern splitDelRegex = Pattern.compile("\\([^)]*?\\)");
Matcher regexMatcher = splitDelRegex.matcher("(GROUP=test1)(GROUP=test2) (GROUP=test3)(GROUP=test4)");
List<String> matcherList = new ArrayList<String>();
while(regexMatcher.find()){
String perm = regexMatcher.group(1);
matcherList.add(perm);
}
Any help is appreciated. Thanks
You simply forgot to put capturing parentheses around the entire regex, so there is no group 1 and nothing is captured at all. Just change the regex to
Pattern splitDelRegex = Pattern.compile("(\\([^)]*?\\))");
// note the added outer ( ... ) around the whole pattern
I tested this in Eclipse and got your desired output.
You could use
str.split("\\)")
That returns an array of strings that you know are lacking the closing parentheses, so you can add them back in afterwards. The parenthesis has to be escaped because split takes a regex. That seems much easier and less error-prone to me.
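If you would rather not add the brackets back by hand, a zero-width lookbehind lets split keep them. The following is a small sketch of that idea; the names SplitKeepBracket and splitAfterClosingBracket are made up for this example.

```java
class SplitKeepBracket {
    // Hypothetical helper: splits after every ')' without consuming it.
    static String[] splitAfterClosingBracket(String input) {
        // (?<=\)) matches the empty position that is preceded by a closing bracket
        return input.split("(?<=\\))");
    }

    public static void main(String[] args) {
        String input = "(GROUP=test1)(GROUP=test2)(GROUP=test3)(GROUP=test4)";
        for (String part : splitAfterClosingBracket(input)) {
            System.out.println(part);
        }
    }
}
```

Because the lookbehind consumes no characters, each piece still ends with its closing bracket.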
You could try changing this line :
String perm = regexMatcher.group(1);
To this :
String perm = regexMatcher.group();
Called without an argument, group() returns the entire match (group 0), so it works even though the pattern has no capturing group.
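The difference between group() and group(1) can be checked with a short experiment. This is just a sketch using the pattern from the question; the class and method names are made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupIndexDemo {
    // The pattern from the question: no capturing group, only escaped literal brackets
    static final Pattern BRACKETED = Pattern.compile("\\([^)]*?\\)");

    // Returns the whole first match (group 0), or null if nothing matches.
    static String firstWholeMatch(String input) {
        Matcher m = BRACKETED.matcher(input);
        return m.find() ? m.group() : null;
    }

    // Returns true if asking for group 1 fails because the pattern defines no such group.
    static boolean groupOneIsMissing(String input) {
        Matcher m = BRACKETED.matcher(input);
        if (!m.find()) {
            return false;
        }
        try {
            m.group(1);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String input = "(GROUP=test1)(GROUP=test2)";
        System.out.println(firstWholeMatch(input));   // (GROUP=test1)
        System.out.println(groupOneIsMissing(input)); // true
    }
}
```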
I'm not sure why you need to split the string at all. You can capture each of the bracketed groups with a regex.
Try this regex: (\\([a-zA-Z0-9=]*\\)). It has a capturing group () that looks for text that starts with a literal \\(, contains characters from [a-zA-Z0-9=] zero or more times (*), and ends with a literal \\). This is a pretty loose regex; you could tighten the match if the text inside the brackets is predictable.
String input = "(GROUP=test1)(GROUP=test2)(GROUP=test3)(GROUP=test4)";
String regex = "(\\([a-zA-Z0-9=]*\\))";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(input);
while(matcher.find()) { // find the next match
System.out.println(matcher.group()); // print the match
}
Output:
(GROUP=test1)
(GROUP=test2)
(GROUP=test3)
(GROUP=test4)

Regex to get value between two colon excluding the colons

I have a string like this:
something:POST:/some/path
Now I want to take the POST alone from the string. I did this by using this regex
:([a-zA-Z]+):
But this gives me the value along with the colons, i.e. I get this:
:POST:
but I need this
POST
My code to match the same and replace it is as follows:
String ss = "something:POST:/some/path/";
Pattern pattern = Pattern.compile(":([a-zA-Z]+):");
Matcher matcher = pattern.matcher(ss);
if (matcher.find()) {
System.out.println(matcher.group());
ss = ss.replaceFirst(":([a-zA-Z]+):", "*");
}
System.out.println(ss);
EDIT:
I've decided to use the lookahead/lookbehind regex since I did not want to use replace with colons such as :*:. This is my final solution.
String s = "something:POST:/some/path/";
String regex = "(?<=:)[a-zA-Z]+(?=:)";
Matcher matcher = Pattern.compile(regex).matcher(s);
if (matcher.find()) {
s = s.replaceFirst(matcher.group(), "*");
System.out.println("replaced: " + s);
}
else {
System.out.println("not replaced: " + s);
}
There are two approaches:
Keep your Java code, and use lookahead/lookbehind (?<=:)[a-zA-Z]+(?=:), or
Change your Java code to replace the result with ":*:"
Note: You may want to define a String constant for your regex, since you use it in different calls.
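As pointed out, the regex's captured group can be used for the replacement.
The following code does it (note that replaceFirst treats the captured text as a regex and replaces its first occurrence anywhere in the string, which happens to be the right place here):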
String ss = "something:POST:/some/path/";
Pattern pattern = Pattern.compile(":([a-zA-Z]+):");
Matcher matcher = pattern.matcher(ss);
if (matcher.find()) {
ss = ss.replaceFirst(matcher.group(1), "*");
}
System.out.println(ss);
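If the captured text could also occur earlier in the string, one way around that is to replace by position instead of by text. The sketch below uses the offsets of group 1; the names ReplaceGroupSpan and maskMethod are made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceGroupSpan {
    // Hypothetical helper: replaces only the text captured by group 1 with "*".
    static String maskMethod(String input) {
        Matcher m = Pattern.compile(":([a-zA-Z]+):").matcher(input);
        if (!m.find()) {
            return input; // nothing between two colons, leave the input unchanged
        }
        // start(1) and end(1) are the offsets of group 1 inside the input
        return new StringBuilder(input).replace(m.start(1), m.end(1), "*").toString();
    }

    public static void main(String[] args) {
        System.out.println(maskMethod("something:POST:/some/path/")); // something:*:/some/path/
        System.out.println(maskMethod("POST:POST:/x"));               // POST:*:/x
    }
}
```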
UPDATE
Looking at your update, you just need replaceFirst:
String result = s.replaceFirst(":[a-zA-Z]+:", ":*:");
See the Java demo
When you use (?<=:)[a-zA-Z]+(?=:), the regex engine checks each location inside the string for a : before it and, once one is found, tries to match 1+ ASCII letters and then asserts that there is a : after them. With :[A-Za-z]+:, the checking only starts once the regex engine has found a : character. Then, after matching :POST:, the replacement pattern replaces the whole match. It is totally OK to hardcode the colons in the replacement pattern since they are hardcoded in the regex pattern.
Original answer
You just need to access Group 1:
if (matcher.find()) {
System.out.println(matcher.group(1));
}
See Java demo
Your :([a-zA-Z]+): regex contains a capturing group (see (....) subpattern). These groups are numbered automatically: the first one has an index of 1, the second has the index of 2, etc.
To replace it, use Matcher#appendReplacement():
String s = "something:POST:/some/path/";
StringBuffer result = new StringBuffer();
Matcher m = Pattern.compile(":([a-zA-Z]+):").matcher(s);
while (m.find()) {
m.appendReplacement(result, ":*:");
}
m.appendTail(result);
System.out.println(result.toString());
See another demo
This is your solution:
regex = (:)([a-zA-Z]+)(:)
And code is:
String ss = "something:POST:/some/path/";
ss = ss.replaceFirst("(:)([a-zA-Z]+)(:)", "$1*$3");
ss now contains:
something:*:/some/path/
Which I believe is what you are looking for...

Get what was removed by String.replaceAll()

So, let's say I got my regular expression
String regex = "\\d*";
for finding any digits.
Now I also have an input string, for example
String input = "We got 34 apples and too much to do";
Now I want to replace all digits with "", doing it like that:
input = input.replaceAll(regex, "");
When now printing input I got "We got apples and too much to do". It works, it replaced the 3 and the 4 with "".
Now my question: Is there any way - maybe an existing lib? - to get what actually was replaced?
The example here is very simple, just to understand how it works. I want to use it for more complex inputs and regexes.
Thanks for your help.
You can use a Matcher with the append-and-replace procedure:
String regex = "\\d*";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(input);
StringBuffer sb = new StringBuffer();
StringBuffer replaced = new StringBuffer();
while(matcher.find()) {
replaced.append(matcher.group());
matcher.appendReplacement(sb, "");
}
matcher.appendTail(sb);
System.out.println(sb.toString()); // prints the replacement result
System.out.println(replaced.toString()); // prints what was replaced
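On Java 9 or later, the same thing can be sketched with the Matcher.replaceAll overload that takes a function. This is an illustration rather than a drop-in solution: the names CollectRemoved and removeAndCollect are made up, and the pattern is assumed to be \\d+ so that empty matches are not collected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class CollectRemoved {
    // Hypothetical helper: removes every match and records what was removed.
    static String removeAndCollect(String input, String regex, List<String> removed) {
        return Pattern.compile(regex).matcher(input).replaceAll(match -> {
            removed.add(match.group()); // remember the text that is about to be removed
            return "";                  // replace it with nothing
        });
    }

    public static void main(String[] args) {
        List<String> removed = new ArrayList<>();
        String result = removeAndCollect("We got 34 apples and too much to do", "\\d+", removed);
        System.out.println(result);  // We got  apples and too much to do
        System.out.println(removed); // [34]
    }
}
```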

Replacing Pattern Matches in a String

String output = "";
pattern = Pattern.compile(">Part\\s.");
matcher = pattern.matcher(docToProcess);
while (matcher.find()) {
match = matcher.group();
}
I'm trying to use the above code to find the pattern >Part\s. inside docToProcess (which is a string containing a large XML document), and then I want to replace the content that matches the pattern with <ref></ref>.
Any ideas how I can make the output variable equal to docToProcess except with the replacements as indicated above?
EDIT: I need to use the matcher somehow when replacing. I can't just use replaceAll()
You can use the String#replaceAll method. It takes a regex as its first parameter:
String output = docToProcess.replaceAll(">Part\\s\\.", "<ref></ref>");
Note that the dot (.) is a special meta-character in regex that matches any single character (line terminators excepted by default), not just a literal dot. So you need to escape it, unless you really want to match any character after >Part\\s. And you need 2 backslashes to escape it in a Java string literal.
If you want to use the Matcher class, then you can use the Matcher.appendReplacement method:
String docToProcess = "XYZ>Part .asdf";
Pattern p = Pattern.compile(">Part\\s\\.");
Matcher m = p.matcher(docToProcess);
StringBuffer sb = new StringBuffer();
while (m.find()) {
m.appendReplacement(sb, "<ref></ref>");
}
m.appendTail(sb);
System.out.println(sb.toString());
OUTPUT : -
"XYZ<ref></ref>asdf"
This is what you need:
String docToProcess = "... your xml here ...";
Pattern pattern = Pattern.compile(">Part\\s.");
Matcher matcher = pattern.matcher(docToProcess);
StringBuffer output = new StringBuffer();
while (matcher.find()) matcher.appendReplacement(output, "<ref></ref>");
matcher.appendTail(output);
Before Java 9 you can't use StringBuilder here, because appendReplacement and appendTail only accepted a StringBuffer; overloads that take a StringBuilder were added in Java 9.
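For what it is worth, Java 9 added appendReplacement and appendTail overloads that accept a StringBuilder. Assuming Java 9 or later, the loop could be sketched like this; the names StringBuilderReplace and replaceParts are made up for this example, and the dot is escaped here on the assumption that a literal dot is intended.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StringBuilderReplace {
    // Hypothetical helper: replaces every ">Part<whitespace>." with "<ref></ref>".
    static String replaceParts(String docToProcess) {
        Matcher matcher = Pattern.compile(">Part\\s\\.").matcher(docToProcess);
        StringBuilder output = new StringBuilder();
        while (matcher.find()) {
            // copies the text before the match, then the replacement
            matcher.appendReplacement(output, "<ref></ref>");
        }
        // copies whatever follows the last match
        matcher.appendTail(output);
        return output.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceParts("XYZ>Part .asdf")); // XYZ<ref></ref>asdf
    }
}
```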
docToProcess.replaceAll(">Part\\s[.]", "<ref></ref>");
String output = docToProcess.replaceAll(">Part\\s\\.", "<ref></ref>");
