Nested regexps and replace

I have strings like this <p0=v0 p1=v1 p2=v2 ....> and I want to swap pX with vX to have something like <v0=p0 v1=p1 v2=p2 ....> using regexps.
I want only pairs in <> to be swapped.
I wrote:
Pattern pattern = Pattern.compile("<(\\w*)=(\\w*)>");
Matcher matcher = pattern.matcher("<p1=v1>");
System.out.println(matcher.replaceAll("$2=$1"));
But it works only with a single pair pX=vX.
Could someone explain how to write a regexp that works for multiple pairs?

Simple, use groups:
String input = "<p0=v0 p1=v1 p2=v2>";
// (p[0-9])  group 1: "p" followed by one digit
// =         ... followed by "="
// (v[0-9])  group 2: "v" followed by one digit
// "$2=$1"   writes group 2, re-writes "=" in the middle, then writes group 1
System.out.println(input.replaceAll("(p[0-9])=(v[0-9])", "$2=$1"));
Output:
<v0=p0 v1=p1 v2=p2>

You can use this pattern:
"((?:<|\\G(?<!\\A))\\s*)(p[0-9]+)(\\s*=\\s*)(v[0-9]+)"
To ensure that the pairs come after an opening angle bracket, the pattern starts with:
(?:<|\\G(?<!\\A))
that means: an opening angle bracket OR the position at the end of the last match
\\G is an anchor for the position immediately after the last match, or the beginning of the string if there has been no match yet (in other words, it is the last position of the regex engine in the string, which is zero at the start of the string). To avoid a match at the start of the string I added a negative lookbehind (?<!\\A) -> not preceded by the start of the string.
This trick forces each pair to be preceded by another pair or by a <.
example:
String subject = "p5=v5 <p0=v0 p1=v1 p2=v2 p3=v3> p4=v4";
String pattern = "((?:<|\\G(?<!\\A))\\s*)(p[0-9]+)(\\s*=\\s*)(v[0-9]+)";
String result = subject.replaceAll(pattern, "$1$4$3$2");
If you need p and v to have the same number you can change it to:
String pattern = "((?:<|\\G(?<!\\A))\\s*)(p([0-9]+))(\\s*=\\s*)(v\\3)";
String result = subject.replaceAll(pattern, "$1$5$4$2");
If parts between angle brackets can contain other things (that are not pairs):
String pattern = "((?:<|\\G(?<!\\A))(?:[^\\s>]+\\s*)*?\\s*)(p([0-9]+))(\\s*=\\s*)(v\\3)";
String result = subject.replaceAll(pattern, "$1$5$4$2");
Note: all these patterns only check that there is an opening angle bracket; they don't check for a closing angle bracket. If a closing angle bracket is missing, the first two patterns replace pairs until there are no more contiguous pairs, and the third pattern replaces pairs until the next closing angle bracket or the end of the string.
You can check for the presence of a closing angle bracket by adding (?=[^<>]*>) at the end of each pattern. However, adding this makes the pattern perform poorly, because the lookahead rescans the rest of the bracketed part for every pair. It is better to search for the parts between angle brackets with (?<=<)[^<>]++(?=>) and to perform the replacement of pairs in a callback function.
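The callback approach suggested above could look roughly like this in Java 9 or later, where Matcher.replaceAll accepts a function. This is only a sketch: the class and method names are made up for illustration, and a pair is assumed to be \w+=\w+ rather than the stricter pN=vN form.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SwapInsideBrackets {
    // Text strictly between "<" and ">" (the pattern suggested in the answer).
    private static final Pattern BRACKETED = Pattern.compile("(?<=<)[^<>]++(?=>)");
    // Assumption for this sketch: a pair is word=word.
    private static final Pattern PAIR = Pattern.compile("(\\w+)=(\\w+)");

    static String swapPairs(String input) {
        // The lambda plays the role of the callback: it receives each bracketed
        // part and returns it with every pair swapped.
        return BRACKETED.matcher(input).replaceAll(part ->
                Matcher.quoteReplacement(PAIR.matcher(part.group()).replaceAll("$2=$1")));
    }

    public static void main(String[] args) {
        System.out.println(swapPairs("p5=v5 <p0=v0 p1=v1 p2=v2 p3=v3> p4=v4"));
        // prints: p5=v5 <v0=p0 v1=p1 v2=p2 v3=p3> p4=v4
    }
}
```

Because the outer pattern requires a closing >, an unclosed bracket is left untouched, and the inner replacement only ever sees the bracketed text.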

Replacing everything between < and > (let's call it a tag) with a single regex is - imho - not possible if the same pattern can occur outside the tag.
Instead of replacing everything at once, I'd go for two regexes:
String str = "<p1=v1 p2=v2> p3=v3 <p4=v4>";
Pattern insideTag = Pattern.compile("<(.+?)>");
Matcher m = insideTag.matcher(str);
while(m.find()) {
str = str.replace(m.group(1), m.group(1).replaceAll("(\\w*)=(\\w*)", "$2=$1"));
}
System.out.println(str);
//prints: <v1=p1 v2=p2> p3=v3 <v4=p4>
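The matcher grabs everything between < and >, and for each match it replaces the content of the first capturing group in the original string with the swapped version, but only the parts that match (\w*)=(\w*), of course. Note that String.replace substitutes every occurrence of the captured text, so identical text outside a tag would be swapped as well.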
Trying it with
<p1=v1 p2=v2 just some trash> p3=v3 <p4=v4>
gives the output
<v1=p1 v2=p2 just some trash> p3=v3 <v4=p4>

This should work to swap only those pairs between < and >:
String string = "<p0=v0 p1=v1 p2=v2> a=b c=d xyz=abc <foo=bar baz=bat>";
Pattern pattern1 = Pattern.compile("<[^>]+>");
Pattern pattern2 = Pattern.compile("(\\w+)=(\\w+)");
Matcher matcher1 = pattern1.matcher(string);
StringBuffer sbuf = new StringBuffer();
while (matcher1.find()) {
Matcher matcher2 = pattern2.matcher(matcher1.group());
matcher1.appendReplacement(sbuf, matcher2.replaceAll("$2=$1"));
}
matcher1.appendTail(sbuf);
System.out.println(sbuf);
OUTPUT:
<v0=p0 v1=p1 v2=p2> a=b c=d xyz=abc <bar=foo bat=baz>

Java supports the \G anchor, so this will work for unnested <>'s
Find: ((?:(?!\A|<)\G|<)[^<>]*?)(\w+)=(\w+)(?=[^<>]*?>)
Replace (globally): $1$3=$2
Regex explained
( # (1 start)
(?:
(?! \A | < )
\G # Start at last match
|
< # Or, <
)
[^<>]*?
) # (1 end)
( \w+ ) # (2)
=
( \w+ ) # (3)
(?= [^<>]*? > ) # There must be a closing > ahead
Perl test case
$/ = undef;
$str = <DATA>;
$str =~ s/((?:(?!\A|<)\G|<)[^<>]*?)(\w+)=(\w+)(?=[^<>]*?>)/$1$3=$2/g;
print $str;
__DATA__
<p0=v0 p1=v1 p2=v2 ....>
Output >>
<v0=p0 v1=p1 v2=p2 ....>
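For comparison, here is a sketch of the same find/replace in Java rather than Perl. The regex and replacement are taken from the answer unchanged (only escaped for a Java string literal); the class and method names are invented for the example.

```java
public class GAnchorSwap {
    // Same pattern as the Perl test case: a pair must follow "<" or the
    // previous match (\G), and a closing ">" must still be ahead.
    private static final String FIND =
            "((?:(?!\\A|<)\\G|<)[^<>]*?)(\\w+)=(\\w+)(?=[^<>]*?>)";

    static String swap(String input) {
        // $1 keeps the prefix ("<" or the text skipped since the last match),
        // then groups 3 and 2 are written in swapped order.
        return input.replaceAll(FIND, "$1$3=$2");
    }

    public static void main(String[] args) {
        System.out.println(swap("a=b <p0=v0 p1=v1 p2=v2 ....> c=d"));
        // prints: a=b <v0=p0 v1=p1 v2=p2 ....> c=d
    }
}
```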

Related

How to match forward slashes or periods at end of String but Not Capture Using Java Regular Expression

I am having trouble understanding how a regular expression can match text without including that text in the result. Perhaps I need to work with groups, which I'm not doing; I usually see the term non-capturing group used for this.
The goal: say I have a ticket in a log file as follows:
TICKET/A/ADMIN/05MAR2020// to return only A/ADMIN/05MAR2020
or if
TICKET/A/ENGINEERING/05MAR2020. to return only A/ENGINEERING/05MAR2020
where the "//" or "." has been removed
Lastly to ignore lines like:
TICKET HAS BEEN COMPLETED
using regex = "(?<=^TICKET\\s{0,2}/).*(?://|\\.)?"
So I am telling the parser to look for TICKET at the start of the string followed by a forward slash, but not to return TICKET, and to look for either a double forward slash "//" or a period "." at the end of the string, but to make this optional.
My Java 1.8.x code follows:
// used in the import statement: import java.util.regex.Matcher;
// import java.util.regex.Pattern;
private static void testRegex() {
String ticket1 = "TICKET/A/ITSUPPORT/05MAR2020//";
String ticket2 = "TICKET /B/ADMIN/06MAR2020.";
String ticket3 = "TICKET/C/GENERAL/07MAR2020";
//https://www.regular-expressions.info/brackets.html
String regex = "(?<=^TICKET\\s{0,2}/).*(?://|\\.)?";
Pattern pat = Pattern.compile(regex);
Matcher mat = pat.matcher(ticket1);
if (mat.find()) {
String myticket = ticket1.substring(mat.start(), mat.end());
System.out.println(myticket+ ", Expect 'A/ITSUPPORT/05MAR2020'");
}
mat = pat.matcher(ticket2);
if (mat.find()) {
String myticket = ticket2.substring(mat.start(), mat.end());
System.out.println(myticket+", Expect 'B/ADMIN/06MAR2020'");
}
mat = pat.matcher(ticket3);
if (mat.find()) {
String myticket = ticket3.substring(mat.start(), mat.end());
System.out.println(myticket+", Expect 'C/GENERAL/07MAR2020'");
}
regex = "(//|\\.)";
pat = Pattern.compile(regex);
mat = pat.matcher(ticket1);
if (mat.find()) {
String myticket = ticket1.substring(mat.start(), mat.end());
System.out.println(myticket+", "+mat.start() + ", " + mat.end() + ", " + mat.groupCount());
}
}
My actual results follow:
A/ITSUPPORT/05MAR2020//, Expect 'A/ITSUPPORT/05MAR2020'
B/ADMIN/06MAR2020., Expect 'B/ADMIN/06MAR2020'
C/GENERAL/07MAR2020, Expect 'C/GENERAL/07MAR2020'
//, 28, 30, 1
Any suggestions would be appreciated. Please note, I have been learning from StackOverflow for a long time but this is my first post; I hope the question is asked appropriately. Thank you.

You could use a positive lookahead at the end of the pattern instead of a match.
The lookahead asserts that what is at the end of the string is an optional // or .
As the dot and the double forward slash are optional, you have to make the .* non-greedy (.*?); a greedy .* would consume them before the lookahead is tried.
(?<=^TICKET\s{0,2}/).*?(?=(?://|\.)?$)
In parts
(?<= Positive lookbehind, assert what is on the left is
^ Start of the string
TICKET\s{0,2}/ Match TICKET and 0-2 whitespace chars followed by /
) Close lookbehind
.*? Match any char except a newline 0+ times, as few as possible (non greedy)
(?= Positive lookahead, assert what is on the right is
(?: Non capture group for the alternation | because both can be followed by $
// Match 2 forward slashes
| Or
\. Match a dot
)? Close the non capture group and make it optional
$ Assert the end of the string
) Close the positive lookahead
In Java
String regex = "(?<=^TICKET\\s{0,2}/).*?(?=(?://|\\.)?$)";
Output of the updated pattern with your code:
A/ITSUPPORT/05MAR2020, Expect 'A/ITSUPPORT/05MAR2020'
B/ADMIN/06MAR2020, Expect 'B/ADMIN/06MAR2020'
C/GENERAL/07MAR2020, Expect 'C/GENERAL/07MAR2020'
//, 28, 30, 1
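Since the question mentions groups, here is a sketch of an alternative that uses a capturing group instead of lookarounds: the whole line is matched, and only group 1 is returned. This is not the answer's method, just another way to get the same result; the class and method names are hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TicketExtractor {
    // Group 1 is the part to keep; the prefix and the optional "//" or "."
    // are matched but stay outside the group.
    private static final Pattern TICKET =
            Pattern.compile("^TICKET\\s{0,2}/(.*?)(?://|\\.)?$");

    static Optional<String> extract(String line) {
        Matcher m = TICKET.matcher(line);
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extract("TICKET/A/ITSUPPORT/05MAR2020//")); // Optional[A/ITSUPPORT/05MAR2020]
        System.out.println(extract("TICKET /B/ADMIN/06MAR2020."));     // Optional[B/ADMIN/06MAR2020]
        System.out.println(extract("TICKET HAS BEEN COMPLETED"));      // Optional.empty
    }
}
```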

How do I get the value between () in a Java string just before the dot?

I have to rename a file from abc(1).jpg to abc(2).jpg. Here is the code:
String example = "my attachements with some name (56).jpg";
Matcher m = Pattern.compile("\\((\\d{1,}).\\)").matcher(example);
int a = 0;
while(m.find()) {
a=Integer.parseInt(m.group(1));
String p = example.replace(String.valueOf(a), String.valueOf(a+1));
}
It works fine for the given use case, but it fails for abc(ab)(1)(ab).jpg: that name gets changed to abc(ab)(2)(ab).jpg, which is not wanted. So how can I verify that the numeric bracket comes just before the dot, i.e. directly before the file extension?

You may use
String example = "my attachements with some name (56).jpg";
Matcher m = Pattern.compile("(?<=\\()\\d+(?=\\)\\.)").matcher(example);
example = m.replaceAll(r -> String.valueOf(Integer.parseInt(r.group()) + 1)); // Java 9+; read the match from r, not m
System.out.println( example );
// => my attachements with some name (57).jpg
The regex used is
(?<=\()\d+(?=\)\.)
It matches
(?<=\() - a location immediately preceded by (
\d+ - then consumes 1+ digits
(?=\)\.) - a location immediately followed by the ). char sequence.
If you need the regex to match only before the last dot in the string (where it is most likely the extension delimiter), replace (?=\)\.) with (?=\)\.[^.]*$).
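As a sketch of the last-dot variant, the following wraps the replacement in a small helper. It assumes Java 9+ for Matcher.replaceAll with a function; the class and method names are illustrative only.

```java
import java.util.regex.Pattern;

public class FileCounter {
    // Digits inside "(...)" that sit directly before the last dot of the name.
    private static final Pattern COUNTER =
            Pattern.compile("(?<=\\()\\d+(?=\\)\\.[^.]*$)");

    static String bump(String fileName) {
        // Increment the matched number; names without such a counter are
        // returned unchanged.
        return COUNTER.matcher(fileName)
                .replaceAll(r -> String.valueOf(Integer.parseInt(r.group()) + 1));
    }

    public static void main(String[] args) {
        System.out.println(bump("my attachements with some name (56).jpg"));
        // prints: my attachements with some name (57).jpg
        System.out.println(bump("abc(ab)(1)(ab).jpg"));
        // prints: abc(ab)(1)(ab).jpg
    }
}
```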

You can use a lookahead regex for this:
"\\((\\d+)\\)(?=\\.)"
(?=\.) is a lookahead condition that asserts presence of dot right after closing )

java regex minimum character not working

^[a-zA-Z1-9][a-zA-Z1-9_\\.-]{2,64}[^\\.-]$
This is the regex that should match the following conditions:
should start only with letters and numbers,
contains letters, numbers, dot and hyphen
should not end with a hyphen
It works for all conditions, but it fails when I try three-character strings like
vu6
111
aaa
With four or more characters the validation works properly. Did I miss anything?

Reason why your regex doesn't work:
Breaking it into smaller pieces should help:
^[a-zA-Z1-9][a-zA-Z1-9_\\.-]{2,64}[^\\.-]$
[a-zA-Z1-9]: matches exactly one alphanumeric character (not including _)
[a-zA-Z1-9_\\.-]{2,64}: matches 2 to 64 characters, each alphanumeric, "_", "." or "-"
[^\\.-]: expects exactly 1 more character, which must not be "." or "-"
Together the three pieces require at least 1 + 2 + 1 = 4 characters, so a three-character string can never match.
Solution:
You can use 2 simple regexes:
This answer assumes that the length of the string you want to match lies between 3 and 65 (both inclusive)
The first one actually validates the string
[a-zA-Z1-9][a-zA-Z1-9_\\.-]{2,64}
The second one checks that the string doesn't end with "." or "-"
[^\\.-]$
In Java
Pattern pattern1 = Pattern.compile("^[a-zA-Z1-9][a-zA-Z1-9_\\.-]{2,64}$");
Pattern pattern2 = Pattern.compile("[^\\.-]$");
Matcher m1 = pattern1.matcher(input);
Matcher m2 = pattern2.matcher(input);
if(m1.find() && m2.find()) {
System.out.println("found");
}
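The two checks can also be folded into one pattern by placing a negative lookbehind just before the end anchor. The sketch below keeps the answer's assumptions (length 3 to 65, same character classes); the class and method names are made up for the example.

```java
import java.util.regex.Pattern;

public class NameValidator {
    // One leading alphanumeric, then 2-64 allowed characters; the lookbehind
    // (?<![.-]) rejects a final "." or "-" without consuming an extra character.
    private static final Pattern VALID =
            Pattern.compile("^[a-zA-Z1-9][a-zA-Z1-9_.-]{2,64}(?<![.-])$");

    static boolean isValid(String input) {
        return VALID.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("vu6")); // true
        System.out.println(isValid("ab-")); // false
        System.out.println(isValid("ab"));  // false (too short)
    }
}
```

Because the lookbehind is zero-width, the minimum length stays at three characters, which is what the question's three-character inputs need.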

regex to remove round brackets from a string

I have a string
String s="[[Identity (philosophy)|unique identity]]";
I need to parse it into:
s1 = Identity_philosophy
s2 = unique identity
I have tried following code
Pattern p = Pattern.compile("(\\[\\[)(\\w*?\\s\\(\\w*?\\))(\\s*[|])\\w*(\\]\\])");
Matcher m = p.matcher(s);
while(m.find())
{
....
}
But the pattern is not matching.
Please help.
Thanks

Use
String s="[[Identity (philosophy)|unique identity]]";
String[] results = s.replaceAll("^\\Q[[\\E|]]$", "") // Delete double brackets at start/end
.replaceAll("\\s+\\(([^()]*)\\)","_$1") // Replace spaces and parens with _
.split("\\Q|\\E"); // Split with pipe
System.out.println(results[0]);
System.out.println(results[1]);
Output:
Identity_philosophy
unique identity

You may use
String s="[[Identity (philosophy)|unique identity]]";
Matcher m = Pattern.compile("\\[{2}(.*)\\|(.*)]]").matcher(s);
if (m.matches()) {
System.out.println(m.group(1).replaceAll("\\W+", " ").trim().replace(" ", "_")); // => Identity_philosophy
System.out.println(m.group(2).trim()); // => unique identity
}
Details
The "\\[{2}(.*)\\|(.*)]]" pattern with matches() is parsed as a ^\[{2}(.*)\|(.*)]]\z pattern that matches a string that starts with [[, then matches and captures any 0 or more chars other than line break chars, as many as possible, into Group 1, then matches a |, then matches and captures any 0 or more chars other than line break chars, as many as possible, into Group 2, and then matches ]].
The contents of Group 2 can be trimmed of whitespace and used as is, but Group 1 should be preprocessed by replacing all chunks of 1+ non-word characters with a space (.replaceAll("\\W+", " ")), then trimming the result (.trim()) and replacing all spaces with _ (.replace(" ", "_")) as the final touch.

split string for returning only the latter part

I have a string like this:
abc:def,ghi,jkl;mno:pqr,stu;vwx:yza,aaa,bbb;
I want to split first on ; and then on :
Finally, the output should be only the part to the right of each :, i.e. my output should be
def, ghi, jkl, pqr, stu, yza,aaa,bbb
This can be done using split twice, i.e. once with ; and then with :, and then pattern matching to find just the part to the right of the :. However, is there a better, more efficient solution?

So basically you want to fetch the content between : and ;, with : on the left and ; on the right.
You can use this regex: -
"(?<=:)(.*?)(?=;)"
This contains a look-behind for : and a look-ahead for ;. And matches the string preceded by a colon(:) and followed by a semi-colon (;).
Regex Explanation: -
(?<= // Look behind assertion.
: // Check for preceding colon (:)
)
( // Capture group 1
. // Any character except newline
* // repeated 0 or more times
? // Reluctant matching. (Match shortest possible string)
)
(?= // Look ahead assertion
; // Check for string followed by `semi-colon (;)`
)
Here's the working code: -
String str = "abc:def,ghi,jkl;mno:pqr,stu;vwx:yza,aaa,bbb;";
Matcher matcher = Pattern.compile("(?<=:)(.*?)(?=;)").matcher(str);
StringBuilder builder = new StringBuilder();
while (matcher.find()) {
builder.append(matcher.group(1)).append(", ");
}
System.out.println(builder.substring(0, builder.lastIndexOf(",")));
OUTPUT: -
def,ghi,jkl, pqr,stu, yza,aaa,bbb

String[] tabS="abc:def,ghi,jkl;mno:pqr,stu;vwx:yza,aaa,bbb;".split(";");
StringBuilder sb = new StringBuilder();
Pattern patt = Pattern.compile("(.*:)(.*)");
String sep = ",";
for (String string : tabS) {
sb.append(patt.matcher(string).replaceAll("$2 ")); // ' ' after $2 == ';' replaced
sb.append(sep);
}
System.out.println(sb.substring(0,sb.lastIndexOf(sep)));
output
def,ghi,jkl ,pqr,stu ,yza,aaa,bbb

Don't pattern match unless you have to in Java; if you can't have the ':' character in the field name (abc in your example), you can use indexOf(":") to figure out the "right part".
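A minimal sketch of that indexOf idea, assuming (as stated above) that the field name never contains ':'. The class and method names are invented for the example, and records without a colon are simply skipped.

```java
import java.util.StringJoiner;

public class RightParts {
    static String rightParts(String input) {
        StringJoiner out = new StringJoiner(", ");
        // split(";") drops the trailing empty string after the last ';'.
        for (String record : input.split(";")) {
            int colon = record.indexOf(':');
            if (colon >= 0) {
                // Keep only what follows the first ':' of each record.
                out.add(record.substring(colon + 1));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(rightParts("abc:def,ghi,jkl;mno:pqr,stu;vwx:yza,aaa,bbb;"));
        // prints: def,ghi,jkl, pqr,stu, yza,aaa,bbb
    }
}
```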
