I want to replace all matches in a String with a dynamic number of characters; let's use [\d]+ as a simple example regex:
Desired results:
1984 -> DDDD
1 -> D
123456789 -> DDDDDDDDD
a 123 b -> a DDD b
The common Java approach for replacing regex matches looks like this:
Pattern p = Pattern.compile("[\\d]+");
String input = [...];
String replacement = ???;
String result = p.matcher(input).replaceAll(replacement);
My question is: do I have to extract the individual matches, measure their lengths, and then generate the replacement string based on that? Or does Java provide any more straightforward approach to that?
Update: I actually oversimplified the problem by giving rather simple examples. Here are some more that should be caught, taking more context into account:
<1984/> (Regex: <[\d]+/>) -> DDDDDDD
123. -> DDDD, whereas: 123 -> 123 (i.e. no match)
Note: I am not trying to parse HTML with a regex.
You're overthinking this. Just use \d as the pattern and D as the replacement. There's no need to measure the length of each match or do any additional processing; a plain replaceAll() does it:
final String[] a = {"1984","1","123456789","a 123 b"};
for (String s: a) {
System.out.println(s.replaceAll("\\d","D"));
}
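For the context-dependent cases in the update (for example <[\d]+/> or digits followed by a dot), replacing digit by digit no longer applies, because the whole match, delimiters included, has to turn into D's. A minimal sketch, assuming Java 11 or later (Matcher.replaceAll with a function argument was added in Java 9, String.repeat in Java 11); the names MaskMatches and mask are made up for illustration:

```java
import java.util.regex.Pattern;

final class MaskMatches {
    // Replaces every match of p with as many 'D' characters as the match is long.
    static String mask(Pattern p, String input) {
        return p.matcher(input).replaceAll(mr -> "D".repeat(mr.group().length()));
    }

    public static void main(String[] args) {
        System.out.println(mask(Pattern.compile("<\\d+/>"), "<1984/>")); // DDDDDDD
        System.out.println(mask(Pattern.compile("\\d+\\."), "123."));    // DDDD
        System.out.println(mask(Pattern.compile("\\d+\\."), "123"));     // 123 (no match)
    }
}
```

The replacement built here contains only the letter D, so it needs no escaping; a replacement that could contain $ or \ would have to go through Matcher.quoteReplacement first.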
I'm pretty new to the regex world.
Given a list of Strings as input, I would like to split them using a regex punctuation pattern: "[!.?\n]".
The catch is that when several punctuation marks appear in a row, only the last one should act as the delimiter, like this:
input: "I want it now!!!"
output: "I want it now!!"
input: "Am I ok? Yeah, I'm fine!!!"
output: ["Am I ok", "Yeah, I'm fine!!"]
You can use
[!.?\n](?![!.?\n])
Here, a !, ., ? or newline is matched only if it is not followed by any of these characters.
Or, if the char must be repeated:
([!.?\n])(?!\1)
Here, a !, ., ? or newline is matched only if it is not followed by exactly the same character.
See the regex demo #1 and the regex demo #2.
See a Java demo:
String p = "[!.?\n](?![!.?\n])";
String p2 = "([!.?\n])(?!\\1)";
String s = "I want it now!!!";
System.out.println(Arrays.toString(s.split(p))); // => [I want it now!!]
System.out.println(Arrays.toString(s.split(p2))); // => [I want it now!!]
s = "Am I ok? Yeah, I'm fine!!!";
System.out.println(Arrays.toString(s.split(p))); // => [Am I ok, Yeah, I'm fine!!]
System.out.println(Arrays.toString(s.split(p2))); // => [Am I ok, Yeah, I'm fine!!]
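The two patterns only behave differently when different punctuation marks sit next to each other. A small sketch to illustrate this; the input "Really?! Yes." and the names LookaheadSplitDemo, splitOnLastOfRun and splitOnLastOfSameChar are my own and not part of the question:

```java
import java.util.Arrays;

final class LookaheadSplitDemo {
    // Splits on a punctuation mark that is not followed by ANY punctuation mark.
    static String[] splitOnLastOfRun(String s) {
        return s.split("[!.?\n](?![!.?\n])");
    }

    // Splits on a punctuation mark that is not followed by the SAME mark.
    static String[] splitOnLastOfSameChar(String s) {
        return s.split("([!.?\n])(?!\\1)");
    }

    public static void main(String[] args) {
        String s = "Really?! Yes.";
        // Elements: "Really?" and " Yes"
        System.out.println(Arrays.toString(splitOnLastOfRun(s)));
        // Elements: "Really", "" and " Yes" (both '?' and '!' are delimiters here)
        System.out.println(Arrays.toString(splitOnLastOfSameChar(s)));
    }
}
```

Note that the pieces keep the space that followed the delimiter; call trim() on each element if that matters.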
The split() method of String accepts a regular expression as the delimiter.
Ref.: https://docs.oracle.com/javase/7/docs/api/java/lang/String.html#split(java.lang.String)
Example:
String str = "Am I ok? Yeah, I'm fine!!!";
String delimiters = "[!.?\n]";
String[] splitted = str.split(delimiters);
for(String part : splitted) {
System.out.print(part + "\n");
}
Output:
Am I ok
Yeah, I'm fine
I would like to resolve this problem.
, comma : split terms
" double quote : String value (ignore special char)
[] array
For instance:
input : a=1,b="1,2,3",c=[d=1,e="1,2,3"]
expected output:
a=1
b="1,2,3"
c=[d=1,e="1,2,3"]
But I could not get above result.
I have written the code below:
String line = "a=1,b=\"1,2,3\",c=[d=1,e=\"1,11\"]";
String[] tokens = line.split(",(?=(([^\"]*\"){2})*[^\"]*$)");
for (String t : tokens)
System.out.println("> " + t);
and my output is:
> a=1
> b="1,2,3"
> c=[d=1
> e="1,11"]
What do I need to change to get the expected output? Should I stick to a regular expression or might another solution be more flexible and easier to maintain?
This regex does the trick:
",(?=(([^\"]*\"){2})*[^\"]*$)(?=([^\\[]*?\\[[^\\]]*\\][^\\[\\]]*?)*$)"
It works by adding a look-ahead for matching pairs of square brackets after the comma - if you're inside a square-bracketed term, of course you won't have balanced brackets following.
Here's some test code:
String line = "a=1,b=\"1,2,3\",c=[d=1,e=\"1,11\"]";
String[] tokens = line.split(",(?=(([^\"]*\"){2})*[^\"]*$)(?=([^\\[]*?\\[[^\\]]*\\][^\\[\\]]*?)*$)");
for (String t : tokens)
System.out.println(t);
Output:
a=1
b="1,2,3"
c=[d=1,e="1,11"]
I know the question is nearly a year old, but... this regex is much simpler:
\[[^]]*\]|"[^"]*"|(,)
The leftmost branch of the | matches [complete brackets]
The middle branch matches "strings like this"
The right side captures commas to Group 1, and we know they are the right commas because they weren't matched by the expressions on the left
All we need to do is split on Group 1
Splitting on Group 1 Captures
You can do it like this:
String subject = "a=1,b=\"1,2,3\",c=[d=1,e=\"1,11\"]";
Pattern regex = Pattern.compile("\\[[^]]*\\]|\".*?\"|(,)");
Matcher m = regex.matcher(subject);
StringBuffer b= new StringBuffer();
while (m.find()) {
if(m.group(1) != null) m.appendReplacement(b, "##SplitHere##");
// quoteReplacement keeps any '$' or '\' in the matched text literal
else m.appendReplacement(b, Matcher.quoteReplacement(m.group(0)));
}
m.appendTail(b);
String replaced = b.toString();
String[] splits = replaced.split("##SplitHere##");
for (String split : splits) System.out.println(split);
This is a two-step split: first, we replace the commas captured in Group 1 with something distinctive, such as ##SplitHere##; then we split the resulting string on that marker.
Pros and Cons
The main benefit of this technique is that it is easy to understand and maintain. If you later decide to also exclude commas {inside, curlies}, you just add another alternative on the left side of the regex: \{[^{}]*\} (the opening brace has to be escaped in Java).
When you are familiar with it, you can use it in many contexts
In this case, the main drawback is that we proceed in two steps as we replace before splitting. In my view, with modern processors that's irrelevant. Maintainable code is much more important.
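If the replace-then-split round trip is a concern, the same Group 1 idea can also be applied in a single pass by cutting the string at each captured comma. This is only a sketch under assumptions: TopLevelCommaSplit and split are hypothetical names, and like the regex above it does not handle nested brackets or escaped quotes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TopLevelCommaSplit {
    // Bracketed terms and quoted strings are matched and skipped; only bare commas land in group 1.
    private static final Pattern TOKEN = Pattern.compile("\\[[^\\]]*\\]|\"[^\"]*\"|(,)");

    static List<String> split(String subject) {
        List<String> parts = new ArrayList<>();
        Matcher m = TOKEN.matcher(subject);
        int start = 0; // start of the piece currently being collected
        while (m.find()) {
            if (m.group(1) != null) { // a comma outside brackets and quotes
                parts.add(subject.substring(start, m.start()));
                start = m.end();
            }
        }
        parts.add(subject.substring(start)); // the piece after the last comma
        return parts;
    }

    public static void main(String[] args) {
        for (String part : split("a=1,b=\"1,2,3\",c=[d=1,e=\"1,11\"]")) {
            System.out.println(part);
        }
    }
}
```

Because the input is never rewritten, this variant also avoids picking a marker string that might already occur in the data.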
Reference
This technique has many applications. It is fully explained in these two links.
How to match (or replace) a pattern except in situations s1, s2, s3...
Article about matching a pattern unless...
I have a question about Java regular expressions. I am new to them.
So I need help to form a regex for the statement below:
Statement: a-alphanumeric&b-digits&c-digits
Possible matching Examples: 1) a-90485jlkerj&b-34534534&c-643546
2) A-RT7456ffgt&B-86763454&C-684241
Use case: First of all I have to validate input string against the regular expression. If the input string matches then I have to extract a value, b value and c value like
90485jlkerj, 34534534 and 643546 respectively.
Could someone please share how I can achieve this in the best possible way?
I really appreciate your help on this.
You can use this pattern:
^(?i)a-([0-9a-z]++)&b-([0-9]++)&c-([0-9]++)$
If what you are trying to match is not the whole string, just remove the anchors:
(?i)a-([0-9a-z]++)&b-([0-9]++)&c-([0-9]++)
explanations:
(?i) make the pattern case-insensitive
[0-9]++ digit one or more times (possessive)
[0-9a-z]++ the same with letters
^ anchor for the string start
$ anchor for the string end
The parentheses in the two patterns are capture groups (they capture the values you want to extract)
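To validate the input and pull out the three values in one step, the capture groups can be read from a Matcher once matches() succeeds. A minimal sketch; the class AbcParser and its parse method are illustrative names, not an existing API. Since matches() always tests the whole input, the ^ and $ anchors are left out here.

```java
import java.util.Arrays;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AbcParser {
    private static final Pattern FORMAT =
            Pattern.compile("(?i)a-([0-9a-z]++)&b-([0-9]++)&c-([0-9]++)");

    // Returns the a, b and c values, or an empty Optional if the input is not valid.
    static Optional<String[]> parse(String input) {
        Matcher m = FORMAT.matcher(input);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new String[] {m.group(1), m.group(2), m.group(3)});
    }

    public static void main(String[] args) {
        String[] inputs = {"a-90485jlkerj&b-34534534&c-643546", "A-RT7456ffgt&B-86763454&C-684241", "a-abc&b-12x&c-1"};
        for (String in : inputs) {
            System.out.println(parse(in).map(v -> Arrays.toString(v)).orElse("invalid"));
        }
    }
}
```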
Given a string with the format a-XXX&b-XXX&c-XXX, you can extract all XXX parts in one simple line:
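String[] parts = str.replaceAll("(?i)[abc]-", "").split("&");
parts will be an array with 3 elements, being the target strings you want. The (?i) flag is needed so that the uppercase prefixes A-, B- and C- from the second example are removed as well.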
The simplest regex that matches your string is:
^(?i)a-([\\da-z]+)&b-(\\d+)&c-(\\d+)
With your target strings in groups 1, 2 and 3, but you need a lot of code around that to get the strings, which, as shown above, is not necessary.
The following code will help you:
String[] texts = new String[]{"a-90485jlkerj&b-34534534&c-643546", "A-RT7456ffgt&B-86763454&C-684241"};
Pattern full = Pattern.compile("^(?i)a-([\\da-z]+)&b-(\\d+)&c-(\\d+)");
Pattern patternA = Pattern.compile("(?i)([\\da-z]+)&[bc]");
Pattern patternB = Pattern.compile("(\\d+)");
for (String text : texts) {
if (full.matcher(text).matches()) {
for (String part : text.split("-")) {
Matcher m = patternA.matcher(part);
if (m.matches()) {
System.out.println(part.substring(m.start(), m.end()).split("&")[0]);
}
m = patternB.matcher(part);
if (m.matches()) {
System.out.println(part.substring(m.start(), m.end()));
}
}
}
}
Suppose I have a string like :
String s = "hellllooooo howwwwwww areeeeeee youuuuuuu";
I want to discard the repeated letters and want to get :
"helloo howw aree youu"
I have done the matching using:
matches(".*([a-z])\\1{3,}.*")
But how can I replace hellllooooo with helloo, and likewise for the other words?
Any of the following produces the result you want:
s = s.replaceAll("([a-z])\\1+", "$1$1");
s = s.replaceAll("(([a-z])\\2)\\2*", "$1");
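Both variants rely on a backreference: \1 (or \2) repeats whatever letter the group just captured, so a run of the same letter is matched as a whole and replaced with two copies. Below is a small sketch wrapping the first variant, plus a hypothetical stricter variant that only touches runs of four or more letters, in line with the {3,} threshold used in the question. The names CollapseRepeats, collapse and collapseLongRuns are illustrative.

```java
final class CollapseRepeats {
    // Any run of 2 or more identical lowercase letters becomes exactly two.
    static String collapse(String s) {
        return s.replaceAll("([a-z])\\1+", "$1$1");
    }

    // Assumption: only runs of 4 or more (one letter plus at least 3 repeats) are shortened.
    static String collapseLongRuns(String s) {
        return s.replaceAll("([a-z])\\1{3,}", "$1$1");
    }

    public static void main(String[] args) {
        String s = "hellllooooo howwwwwww areeeeeee youuuuuuu";
        System.out.println(collapse(s));                // helloo howw aree youu
        System.out.println(collapse("helllo"));         // hello
        System.out.println(collapseLongRuns("helllo")); // helllo (a run of three is left alone)
    }
}
```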
A bit of continuation of Get groups with regex and OR
Sample
AD ABCDEFG HIJKLMN
AB HIJKLMN
AC DJKEJKW SJKLAJL JSHELSJ
Rule: Each line always begins with a 2-character code (AB|AC|AD), followed by one or more 7-character codes. The separator between the groups can be a space or a '.'
With this expression I get it nicely grouped
/^(AB|AC|AD)|((\S{7})+)/
I can access the 2chars code with group[0] and so on.
Can I enforce the rule above at the same time?
With above regex the following lines are also valid (because of the OR | in the regex statement)
AC
dfghjkl
asdfgh hjklpoi
Which is not what I need.
Thanks again to the regex experts
Try that:
^(A[BCD])(([ .])([A-Z]{7}))+$
Personally, I would do this in two separate steps
I'd check the string matches a regular expression
I'd split matching strings based on the separator chars [ .]
This code:
def input = [
'AD ABCDEFG HIJKLMN',
'AB HIJKLMN',
'AC DJKEJKW SJKLAJL JSHELSJ',
'AC',
'dfghjkl',
'asdfgh hjklpoi',
'AC DJKEJKW.SJKLAJL JSHELSJ',
]
def regexp = /^A[BCD]([ .](\S{7}))+$/
def result = input.inject( [] ) { list, inp ->
// Does the line match the regexp?
if( inp ==~ regexp ) {
// If so, split it
list << inp.split( /[ .]/ )
}
list
}
println result
Shows you an example of what I mean, and prints out:
[[AD, ABCDEFG, HIJKLMN], [AB, HIJKLMN], [AC, DJKEJKW, SJKLAJL, JSHELSJ], [AC, DJKEJKW, SJKLAJL, JSHELSJ]]
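The same two-step idea can be written in Java. One reason to split rather than read groups: in a repeated group such as (([ .])([A-Z]{7}))+, Java keeps only the last repetition in the capture groups, so the earlier 7-character codes cannot be read back from the Matcher. A rough sketch, assuming the 7-character codes are uppercase letters as in the sample data; CodeLineParser and parse are made-up names.

```java
import java.util.Arrays;
import java.util.Optional;
import java.util.regex.Pattern;

final class CodeLineParser {
    // 2-char code, then one or more 7-letter codes, each preceded by a space or a dot.
    private static final Pattern LINE = Pattern.compile("A[BCD]([ .][A-Z]{7})+");

    // Step 1: validate the whole line. Step 2: split it on the separators.
    static Optional<String[]> parse(String line) {
        if (!LINE.matcher(line).matches()) {
            return Optional.empty();
        }
        return Optional.of(line.split("[ .]"));
    }

    public static void main(String[] args) {
        String[] inputs = {"AD ABCDEFG HIJKLMN", "AC", "asdfgh hjklpoi", "AC DJKEJKW.SJKLAJL JSHELSJ"};
        for (String in : inputs) {
            System.out.println(in + " -> " + parse(in).map(p -> Arrays.toString(p)).orElse("no match"));
        }
    }
}
```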