java split by bracket and keep the delimiter - RegEx [duplicate]


This question already has answers here:
How do I split a string in Java?
(39 answers)
Closed 6 years ago.
I am trying to split a string using a regex with the closing bracket as the delimiter, and I have to keep the bracket.
Input string: (GROUP=test1)(GROUP=test2)(GROUP=test3)(GROUP=test4)
Needed output:
(GROUP=test1)
(GROUP=test2)
(GROUP=test3)
(GROUP=test4)
I am using the Java regex "\([^)]*?\)", and it throws an error. Below is the code I am using; the error is thrown when I try to get the group.
Pattern splitDelRegex = Pattern.compile("\\([^)]*?\\)");
Matcher regexMatcher = splitDelRegex.matcher("(GROUP=test1)(GROUP=test2) (GROUP=test3)(GROUP=test4)");
List<String> matcherList = new ArrayList<String>();
while (regexMatcher.find()) {
    // throws IndexOutOfBoundsException: this pattern defines no capturing group 1
    String perm = regexMatcher.group(1);
    matcherList.add(perm);
}
Any help is appreciated. Thanks.

You simply forgot to put capturing parentheses around the entire regex. You are not capturing anything at all. Just change the regex to
Pattern splitDelRegex = Pattern.compile("(\\([^)]*?\\))");
(note the added outer parentheses, which form capturing group 1)
I tested this in Eclipse and got your desired output.

You could use
str.split("\\)")
(split() takes a regex, so the parenthesis has to be escaped.) That would return an array of strings that you know are missing the closing parenthesis, so you could add it back afterwards. That seems much easier and less error-prone to me.
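A minimal sketch of splitting while keeping the bracket. Because String.split takes a regex, the closing parenthesis must be escaped as "\\)"; a bare ")" fails with a PatternSyntaxException. The second method swaps in a technique the answer does not mention: splitting on the zero-width lookbehind "(?<=\\))", which breaks the string right after each ")" without consuming it. The class and method names are illustrative, and the trim() call assumes stray spaces between groups (as in the question's code) should be dropped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitKeepBracket {

    // Split on ")" (escaped, because split() takes a regex), then re-append
    // the closing bracket that split() removed from each piece.
    static List<String> splitAndReAdd(String input) {
        List<String> result = new ArrayList<>();
        for (String piece : input.split("\\)")) {
            String trimmed = piece.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed + ")");
            }
        }
        return result;
    }

    // Split at the empty position right after each ")", so the delimiter
    // stays attached to the preceding piece; split() drops the trailing empty string.
    static List<String> splitWithLookbehind(String input) {
        return Arrays.asList(input.split("(?<=\\))"));
    }

    public static void main(String[] args) {
        String input = "(GROUP=test1)(GROUP=test2)(GROUP=test3)(GROUP=test4)";
        System.out.println(splitAndReAdd(input));
        System.out.println(splitWithLookbehind(input));
    }
}
```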

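You could try changing this line:
String perm = regexMatcher.group(1);
To this:
String perm = regexMatcher.group();
group() (the same as group(0)) returns the entire text matched by the last successful find(), so the regex needs no capturing group at all.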
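The two fixes above (wrapping the regex in a capturing group and reading group(1), or keeping the original regex and reading group()) give the same result. A small self-contained comparison on the question's input, stray space included; the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BracketGroups {

    // Fix 1: outer parentheses create capturing group 1, so group(1) is valid.
    static List<String> withCapturingGroup(String input) {
        Matcher m = Pattern.compile("(\\([^)]*?\\))").matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    // Fix 2: original regex, read the whole match with group() (= group(0)).
    static List<String> withWholeMatch(String input) {
        Matcher m = Pattern.compile("\\([^)]*?\\)").matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "(GROUP=test1)(GROUP=test2) (GROUP=test3)(GROUP=test4)";
        System.out.println(withCapturingGroup(input));
        System.out.println(withWholeMatch(input));
    }
}
```

Since the capturing group spans the whole regex, group(1) and group() hold the same text here; the extra parentheses only matter when you want part of a match.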

I'm not sure why you need to split the string at all. You can capture each of the bracketed groups with a regex.
Try this regex: (\\([a-zA-Z0-9=]*\\)). It has a capturing group () that matches text starting with a literal \\(, containing [a-zA-Z0-9=] zero or more times (*), and ending with a literal \\). This is a pretty loose regex; you could tighten the match if the text inside the brackets is predictable.
String input = "(GROUP=test1)(GROUP=test2)(GROUP=test3)(GROUP=test4)";
String regex = "(\\([a-zA-Z0-9=]*\\))";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(input);
while(matcher.find()) { // find the next match
System.out.println(matcher.group()); // print the match
}
Output:
(GROUP=test1)
(GROUP=test2)
(GROUP=test3)
(GROUP=test4)

Related

How to capture a regex group for below pattern [duplicate]

This question already has answers here:
Regex: match everything but a specific pattern
(6 answers)
Closed 3 years ago.
I am exploring Java regex groups, and I am trying to replace parts of a string.
I have a string str = "abXYabcXYZ"; and I am trying to replace all characters except those matching the pattern group abc.
I tried str.replaceAll("(^abc)", ""), but it did not work. I understand that (abc) will match a group.

You might find it easier to find the parts you want to keep and build a new string from them. This approach has flaws when matches overlap, but it will likely be good enough for your use case. However, if your pattern really is as simple as "abc", you may want to consider just counting the total number of matches instead.
String str = "abXYabcXYZ";
Pattern patternToKeep = Pattern.compile("abc");
// Matcher.results() (Java 9+) streams one MatchResult per successful find();
// needs imports java.util.regex.MatchResult and java.util.stream.Collectors
String kept = patternToKeep.matcher(str).results()
        .map(MatchResult::group)
        .collect(Collectors.joining());
System.out.println(kept);
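For a fixed literal like "abc", the counting idea mentioned above could look like the following sketch. It assumes Java 11+ (for String.repeat) and that only non-overlapping occurrences matter; keepLiteral is a hypothetical helper name, and Pattern.quote makes the literal safe to use as a regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeepByCount {

    // Count non-overlapping occurrences of the literal, then rebuild the
    // "kept" string by repeating the literal that many times.
    static String keepLiteral(String str, String literal) {
        Matcher m = Pattern.compile(Pattern.quote(literal)).matcher(str);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return literal.repeat(count); // String.repeat requires Java 11+
    }

    public static void main(String[] args) {
        System.out.println(keepLiteral("abXYabcXYZ", "abc"));
    }
}
```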

It is easier to keep the matching parts of the pattern and concatenate them. In the following example, the matcher iterates over str with find(), matching the next occurrence of the pattern each time. Inside the loop, your "abc" match is always available as group(0).
String str = "abXYabcXYZabcxss";
Pattern pattern = Pattern.compile("abc");
StringBuilder sb = new StringBuilder();
Matcher matcher = pattern.matcher(str);
while(matcher.find()){
sb.append(matcher.group(0));
}
System.out.println(sb.toString());
For a pure replacement, the nearest you can get would be:
((?!abc).)*
The problem is that the a of each abc survives: a match stops right before abc, and the next match starts at its b, so bc is replaced along with everything else.
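One way around the surviving a, swapped in here rather than taken from the answer, is to match either the whole abc (captured as group 1) or any single other character, and replace every match with $1. When the single-character branch matches, group 1 does not participate and Java's replacement inserts an empty string for it. The class and method names are illustrative:

```java
public class KeepOnlyPattern {

    // Either capture "abc" in group 1 or consume one other character;
    // "$1" writes back the captured "abc", or nothing when group 1 is unset.
    // (?s) lets '.' also consume line terminators.
    static String keepAbc(String str) {
        return str.replaceAll("(?s)(abc)|.", "$1");
    }

    public static void main(String[] args) {
        System.out.println(keepAbc("abXYabcXYZabcxss"));
    }
}
```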

How to extract values in between { } through regex? [duplicate]

This question already has answers here:
Java Regex matching between curly braces
(5 answers)
Closed 6 years ago.
I am trying to extract the value between { } using the regex
"\\(\\{[^}]+\\}\\)"
in Java. My input is
String text = "Hi this is {text to be extracted}.";
I want the output to be
"text to be extracted"
but that regex isn't working.

Try this:
"\\{([^}]*)\\}"
Group 1 (referenced as $1 in a replacement string) then contains the text to be extracted.
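A minimal Java sketch of using that pattern with find(), which searches anywhere in the input, so no leading or trailing .* is needed. BraceExtract and extract are hypothetical names, and the method only returns the first {...} section:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BraceExtract {

    // Literal braces are escaped; the unescaped parentheses form capturing group 1.
    private static final Pattern BRACES = Pattern.compile("\\{([^}]*)\\}");

    // Returns the text inside the first {...}, or null if there is none.
    static String extract(String text) {
        Matcher m = BRACES.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("Hi this is {text to be extracted}."));
    }
}
```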

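The regex is malformed: the escaped parentheses \\( and \\) match literal ( and ) characters, and your input has none around the braces. You do not need to escape the parentheses; plain ones form the capturing group.
Because the code below uses matches(), which must match the entire input, you also need to match the extra characters before and after the group (the .* on each side).
Also, you can use a named group (Java 7+) to extract exactly the text you care about.
Here is working code: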
String text = "Hi this is {text to be extracted}.";
Pattern p = Pattern.compile(".*\\{(?<t>[^}]+)\\}.*");
Matcher m = p.matcher(text);
if (m.matches()) {
System.out.println(m.group("t"));
}

Remove occurrences of a given character sequence at the beginning of a string using Java Regex

I have a string that begins with one or more occurrences of the sequence "Re:". This "Re:" can appear in any combination, e.g. Re<any number of spaces>:, re:, re<any number of spaces>:, RE:, RE<any number of spaces>:, etc.
Sample string: Re: Re : Re : re : RE: This is a Re: sample string.
I want to define a Java regular expression that identifies and strips off all occurrences of Re:, but only the ones at the beginning of the string, not the ones occurring within it.
So the output should look like: This is a Re: sample string.
Here is what I have tried:
String REGEX = "^(Re*\\p{Z}*:?|re*\\p{Z}*:?|\\p{Z}Re*\\p{Z}*:?)";
String INPUT = title;
String REPLACE = "";
Pattern p = Pattern.compile(REGEX);
Matcher m = p.matcher(INPUT);
StringBuffer sb = new StringBuffer();
while (m.find()) {
    m.appendReplacement(sb, REPLACE);
}
m.appendTail(sb);
I am using \p{Z} to match whitespace (I found this somewhere on this forum, as I thought Java regex does not recognize \s).
The problem I am facing with this code is that the search stops at the first match and exits the while loop.

Try a replace statement like this:
yourString = yourString.replaceAll("(?i)^(\\s*re\\s*:\\s*)+", "");
Explanation of the regex:
(?i) make it case insensitive
^ anchor to start of string
( start a group (this is the "re:")
\\s* any amount of optional whitespace
re "re"
\\s* optional whitespace
: ":"
\\s* optional whitespace
) end the group (the "re:" string)
+ one or more times
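A runnable sketch of that replaceAll call on the sample string from the question; the class and method names are illustrative:

```java
public class StripReplyPrefix {

    // (?i) = case-insensitive; ^ anchors at the start, so a "Re:" later in the
    // string is left alone; the repeated group removes every leading "re:".
    static String stripRe(String subject) {
        return subject.replaceAll("(?i)^(\\s*re\\s*:\\s*)+", "");
    }

    public static void main(String[] args) {
        System.out.println(stripRe("Re: Re : Re : re : RE: This is a Re: sample string."));
    }
}
```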

In your regex:
String regex = "^(Re*\\p{Z}*:?|re*\\p{Z}*:?|\\p{Z}Re*\\p{Z}*:?)";
here is what it does: e* repeats only the letter e, and :? makes the colon optional, so the pattern matches strings like " Reee :" or even a lone "R",
which make no sense for what you are trying to do.
You'd better use a regex like the following:
yourString.replaceAll("(?i)^(\\s*re\\s*:\\s*)+", "");
Or, to make @Doorknob happy, here's another way to achieve this, using a Matcher:
Pattern p = Pattern.compile("(?i)^(\\s*re\\s*:\\s*)+");
Matcher m = p.matcher(yourString);
if (m.find())
yourString = m.replaceAll("");
(which, as the docs say, is exactly the same as yourString.replaceAll())
(I had the same regex as @Doorknob, but thanks to @jlordo for the replaceAll and to @Doorknob for thinking about the (?i) case-insensitivity part ;-) )

Java regex split string by comma but ignore quotes and also parentheses [duplicate]

This question already has answers here:
Java: splitting a comma-separated string but ignoring commas in quotes
(12 answers)
Closed 9 years ago.
I'm stuck with this regex.
My input is:
"Crane device, (physical object)"(X1,x2,x4), not "Seen by research nurse (finding)", EntirePatellaBodyStructure(X1,X8), "Besnoitia wallacei (organism)", "Catatropis (organism)"(X1,x2,x4), not IntracerebralRouteQualifierValue, "Diospyros virginiana (organism)"(X1,x2,x4), not SuturingOfHandProcedure(X1)
and what I would like to get in the end is:
"Crane device, (physical object)"(X1,x2,x4)
not "Seen by research nurse (finding)"
EntirePatellaBodyStructure(X1,X8)
"Besnoitia wallacei (organism)"
"Catatropis (organism)"(X1,x2,x4)
not IntracerebralRouteQualifierValue
"Diospyros virginiana (organism)"(X1,x2,x4)
not SuturingOfHandProcedure(X1)
I've tried regex
(\'[^\']*\')|(\"[^\"]*\")|([^,]+)|\\s*,\\s*
It works if I don't have a comma inside parentheses.

RegEx
(\w+\s)?("[^"]+"|\w+)(\(\w\d(,\w\d)*\))?
Java Code
String input = ... ;
Matcher m = Pattern.compile(
        "(\\w+\\s)?(\"[^\"]+\"|\\w+)(\\(\\w\\d(,\\w\\d)*\\))?").matcher(input);
while (m.find()) {
    System.out.println(m.group());
}
Output
"Crane device, (physical object)"(X1,x2,x4)
not "Seen by research nurse (finding)"
EntirePatellaBodyStructure(X1,X8)
"Besnoitia wallacei (organism)"
"Catatropis (organism)"(X1,x2,x4)
not IntracerebralRouteQualifierValue
"Diospyros virginiana (organism)"(X1,x2,x4)
not SuturingOfHandProcedure(X1)
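The snippet above leaves the input elided. To check the pattern against the exact string from the question, here is a self-contained version; TermExtractor, SAMPLE, and terms are hypothetical names. The pattern assumes argument lists always look like letter+digit items (X1, x2), so something like (X10) would not stay attached to its term.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TermExtractor {

    // Optional leading word + whitespace (e.g. "not "), then a quoted string or a
    // word, then an optional argument list such as (X1,x2,x4).
    private static final Pattern TERM = Pattern.compile(
            "(\\w+\\s)?(\"[^\"]+\"|\\w+)(\\(\\w\\d(,\\w\\d)*\\))?");

    static final String SAMPLE =
            "\"Crane device, (physical object)\"(X1,x2,x4), not \"Seen by research nurse (finding)\", "
            + "EntirePatellaBodyStructure(X1,X8), \"Besnoitia wallacei (organism)\", "
            + "\"Catatropis (organism)\"(X1,x2,x4), not IntracerebralRouteQualifierValue, "
            + "\"Diospyros virginiana (organism)\"(X1,x2,x4), not SuturingOfHandProcedure(X1)";

    // Collects every successive match; the separating ", " is simply skipped by find().
    static List<String> terms(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = TERM.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        terms(SAMPLE).forEach(System.out::println);
    }
}
```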

Don't use regexes for this. Write a simple parser that keeps track of the parenthesis nesting depth and of whether or not you are inside quotes. For more information, see: RegEx match open tags except XHTML self-contained tags
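A minimal sketch of such a parser: it walks the string once, toggles an in-quotes flag on each double quote, tracks parenthesis depth outside quotes, and splits only on commas at depth 0 outside quotes. It assumes quotes are never escaped and parentheses are balanced; CommaSplitter, split, and SAMPLE are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class CommaSplitter {

    static final String SAMPLE =
            "\"Crane device, (physical object)\"(X1,x2,x4), not \"Seen by research nurse (finding)\", "
            + "EntirePatellaBodyStructure(X1,X8), \"Besnoitia wallacei (organism)\", "
            + "\"Catatropis (organism)\"(X1,x2,x4), not IntracerebralRouteQualifierValue, "
            + "\"Diospyros virginiana (organism)\"(X1,x2,x4), not SuturingOfHandProcedure(X1)";

    // Splits on commas that are outside double quotes and outside parentheses.
    static List<String> split(String input) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        int depth = 0;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;          // parentheses inside quotes are ignored
            } else if (!inQuotes && c == '(') {
                depth++;
            } else if (!inQuotes && c == ')') {
                depth--;
            } else if (!inQuotes && depth == 0 && c == ',') {
                parts.add(current.toString().trim());
                current.setLength(0);
                continue;                      // drop the separating comma
            }
            current.append(c);
        }
        parts.add(current.toString().trim());
        return parts;
    }

    public static void main(String[] args) {
        split(SAMPLE).forEach(System.out::println);
    }
}
```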

Would this do what you need?
System.out.println(yourString.replaceAll(", not", "\nnot"));

Assuming that there is no possibility of nesting () within (), and no possibility of (say) \" within "", you can write something like:
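// needs java.util.ArrayList, java.util.List, java.util.regex.Matcher, java.util.regex.Pattern
// The comma in [^\"(,] ends an unquoted, unbracketed run at each separator;
// without it, that branch would swallow the commas and return the whole input.
private static final Pattern CUSTOM_SPLIT_PATTERN =
        Pattern.compile("\\s*((?:\"[^\"]*\"|[(][^)]*[)]|[^\"(,]+)+)");
private static final String[] customSplit(final String input) {
    final List<String> ret = new ArrayList<String>();
    final Matcher m = CUSTOM_SPLIT_PATTERN.matcher(input);
    while (m.find()) {
        ret.add(m.group(1));
    }
    return ret.toArray(new String[ret.size()]);
}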
(disclaimer: not tested).

Regex needed that matches two sample strings

I have two input strings:
this-is-a-sample-string-%7b3DES%7dFPvKTjGHUA3lD9Us70rfjQ==?Id=113690_2&Index=0&Referrer=IC
this-is-a-sample-string-%7b3DES%7dFPvKTjGHUA3lD9Us70rfjQ==
What I want is only the %7b3DES%7dFPvKTjGHUA3lD9Us70rfjQ== part from both sample strings.
I tried the regex [a-zA-Z-]+-(.*), which works fine for the second input string; for the first one, though, group 1 also captures the query string ?Id=113690_2&Index=0&Referrer=IC.
String inputString = "this-is-a-sample-string-%7b3DES%7dFPvKTjGHUA3lD9Us70rfjQ==";
String regexString = "[a-zA-Z-]+-(.*)";
Pattern pattern = Pattern.compile(regexString);
Matcher matcher = pattern.matcher(inputString);
if(matcher.matches()) {
System.out.println("--->" + matcher.group(1) + "<---");
} else {
System.out.println("nope");
}

The following patterns match the desired group with the limited information and examples provided:
-([^-?]*)(?:\?|$)
.*-(.*?)(?:\?|$)
The first will match a hyphen then group all the characters up to either the ? or the end of the string.
The second matches as many characters and hyphens as possible followed by the smallest string to either the next question mark or the end of the string.
There are dozens of ways of writing something that will match this text though so I'm kinda just guessing if this is what you wanted. If this is not what you're after please elaborate on what exactly you're trying to accomplish.
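A sketch of the first pattern in Java. It needs find() rather than matches(), because the pattern only describes the tail of the input, and the backslash before ? is doubled in the Java string literal. TokenExtractor and token are hypothetical names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenExtractor {

    // A hyphen, then group 1 = characters that are neither '-' nor '?',
    // which must be followed by '?' or the end of the input.
    private static final Pattern TOKEN = Pattern.compile("-([^-?]*)(?:\\?|$)");

    // Returns group 1 of the first match, or null if nothing matches.
    static String token(String input) {
        Matcher m = TOKEN.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(token("this-is-a-sample-string-%7b3DES%7dFPvKTjGHUA3lD9Us70rfjQ==?Id=113690_2&Index=0&Referrer=IC"));
        System.out.println(token("this-is-a-sample-string-%7b3DES%7dFPvKTjGHUA3lD9Us70rfjQ=="));
    }
}
```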
