Regular expression for String.replaceAll - java

I need a regular expression that can be used with the replaceAll method of the String class to replace every instance of * with .* except those preceded by a \
i.e. conversion would be
[any character]*[any character] => [any character].*[any character]
* => .*
\* => \* (i.e. no conversion.)
Can someone please help me?

Use lookbehind.
String resultString = subjectString.replaceAll("(?<!\\\\)\\*", ".*");
Explanation :
"(?<!" + // Assert that it is impossible to match the regex below with the match ending at this position (negative lookbehind)
"\\\\" + // Match the character “\” literally
")" +
"\\*" // Match the character “*” literally

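It may be possible to do this without capture groups, but this should work:
myString.replaceAll("(^|[^\\\\])\\*", "$1.*");
Note that this consumes the character before each *, so the second star of a ** pair is not converted.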
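As a quick check of the two approaches, the sketch below applies the negative-lookbehind pattern and a capture-group variant to a few inputs. The class and method names are made up for illustration, and the capture-group pattern (capture the character before the star, then re-emit it with $1) is an assumption about the intended form rather than a tested recommendation.

```java
import java.util.regex.Pattern;

final class WildcardToRegex {
    // A '*' that is not immediately preceded by a backslash (negative lookbehind).
    private static final Pattern LOOKBEHIND = Pattern.compile("(?<!\\\\)\\*");
    // Start of input or any non-backslash character (group 1), followed by '*'.
    private static final Pattern CAPTURE = Pattern.compile("(^|[^\\\\])\\*");

    static String viaLookbehind(String s) {
        return LOOKBEHIND.matcher(s).replaceAll(".*");
    }

    static String viaCapture(String s) {
        // $1 puts the consumed character back in front of ".*".
        return CAPTURE.matcher(s).replaceAll("$1.*");
    }

    public static void main(String[] args) {
        System.out.println(viaLookbehind("a*b"));   // a.*b
        System.out.println(viaLookbehind("\\*"));   // \* (unchanged)
        System.out.println(viaLookbehind("a**b"));  // a.*.*b
        System.out.println(viaCapture("a**b"));     // a.**b
    }
}
```

The lookbehind version leaves the preceding character out of the match, which is why it also converts the second star in a**b; the capture-group version consumes that character and so skips it.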

Related

Java Regex : Replace characters between two specific characters with equal number of another character

Replacing all characters between starting character "+" and end character "+" with the equal number of "-" characters.
My specific situation is as follows:
Input: +-+-+-+
Output: +-----+
String s = "+-+-+-+";
s = s.replaceAll("-\\+-","---");
This is not working. How can I achieve the output in Java? Thanks in advance.
You can use this replacement using look around assertions:
String repl = s.replaceAll("(?<=-)\\+(?=-)", "-");
//=> +-----+
(?<=-)\\+(?=-) matches a + only if it has a - on both sides. The lookbehind and lookahead are assertions, so they do not consume the surrounding hyphens; only the + itself is replaced.
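A minimal sketch of the lookaround replacement, using the input from the question. The class and method names are hypothetical.

```java
final class PlusBetweenHyphens {
    static String fill(String s) {
        // (?<=-) and (?=-) only assert the neighbouring hyphens;
        // the match itself is the single '+', which becomes '-'.
        return s.replaceAll("(?<=-)\\+(?=-)", "-");
    }

    public static void main(String[] args) {
        System.out.println(fill("+-+-+-+")); // +-----+
    }
}
```

The first and last + are left alone because each has a hyphen on only one side.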
The matches you have are overlapping, look:
+-+-+-+
 ^^^
 Match found and replaced by "---"
+---+-+
    ^
    +-- Next match search continues from here
WARNING: No more matches found!
To make sure there is a hyphen free for the next match, you need to wrap the trailing - with a positive lookahead and use -- as replacement pattern:
String s = "+-+-+-+";
s = s.replaceAll("-\\+(?=-)","--");
See the regex demo.
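The difference between the consuming pattern and the lookahead pattern can be seen side by side in this sketch (class and method names are illustrative only):

```java
final class OverlapDemo {
    // Consumes the trailing '-', so the next "-+-" cannot start on it.
    static String consuming(String s) {
        return s.replaceAll("-\\+-", "---");
    }

    // Only asserts the trailing '-', leaving it free for the next match.
    static String lookahead(String s) {
        return s.replaceAll("-\\+(?=-)", "--");
    }

    public static void main(String[] args) {
        System.out.println(consuming("+-+-+-+")); // +---+-+
        System.out.println(lookahead("+-+-+-+")); // +-----+
    }
}
```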

Regular expression match fails if only whitespace after the - character

I am working on a regular expression where the pattern is:
1.0.0[ - optional description]/1.0.0.0[ - optional description].txt
The [ - optional description] part is of course, optional. So some possible VALID values are
1.0.0/1.0.0.0.txt
1.0.0/1.0.0.0 - xyz.txt
1.0.0 - abc/1.0.0.0 - xyz.txt
1.0.0 - abc/1.0.0.0.txt
To be a little more robust in the pattern matching, I'd like to match zero or more spaces before and after the "-" character. So all these would be valid too.
1.0.0 - abc/1.0.0.0 - xyz.txt
1.0.0-abc/1.0.0.0-xyz.txt
1.0.0 -abc/1.0.0.0- xyz.txt
To do this matching, I have the following regular expression (Java code):
String part1 = "((\\d+.{1}\\d+.{1}\\d+)(\\s*-\\s*(.+))?)";
String part2 = "((\\d+.{1}\\d+.{1}\\d+.{1}\\d+)(\\s*-\\s*(.+))?\\.sql)";
pattern = Pattern.compile(part1+ "/" + part2);
So far this regular expression is working well. But while unit testing I found a case I can't quite figure out yet: the string contains a "-" character surrounded by one or more spaces, but there is no description after the "-" character. This would look like:
1.0.0 - /1.0.0.0.txt
1.0.0- /1.0.0.0-xyz.txt
In these cases, I want the pattern match to FAIL. But with my current regular expression the match succeeds. I think what I want is if there is a "-" character surrounded by any number of spaces like " - " then there must also be at least 1 non-space character following it. But I can't quite figure out the regex for this.
Thanks!
Something like,
^\d+\.\d+\.\d+(?:\s*-\s*\w+)?\/\d+\.\d+\.\d+\.\d+(?:\s*-\s*\w+)?\.txt$
Or you can combine the \.\d+ repetitions as
^\d+(?:\.\d+){2}(?:\s*-\s*\w+)?\/\d+(?:\.\d+){3}(?:\s*-\s*\w+)?\.txt$
Changes
.{1} When you want to match something once, there is no need for {1}; it is implicit.
(?:\s*-\s*\w+) Matches zero or more whitespace characters (\s*), then -, then zero or more whitespace characters again, and then \w+, a description of at least one word character.
The ? at the end of this pattern makes it optional.
The same pattern is repeated at the end to match the second part.
^ Anchors the regex at the start of the string.
$ Anchors the regex at the end of the string. These two are necessary so that the string contains nothing else.
Don't group patterns using () unless you need to capture them, as capturing wastes memory. Use (?:..) if you want to group patterns without capturing them.
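The compact pattern can be tried against the examples from the question with a sketch like the following. The class name is made up, and the dot before txt is written as \. here so that it only matches a literal dot.

```java
import java.util.regex.Pattern;

final class VersionPathMatcher {
    // Three-part version, optional " - word" description, slash,
    // four-part version, optional description, literal ".txt".
    private static final Pattern PATH = Pattern.compile(
        "^\\d+(?:\\.\\d+){2}(?:\\s*-\\s*\\w+)?/\\d+(?:\\.\\d+){3}(?:\\s*-\\s*\\w+)?\\.txt$");

    static boolean isValid(String path) {
        return PATH.matcher(path).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("1.0.0 - abc/1.0.0.0 - xyz.txt")); // true
        System.out.println(isValid("1.0.0 - /1.0.0.0.txt"));          // false
    }
}
```

Because the description is matched with \w+, descriptions containing spaces or punctuation are rejected; whether that is acceptable depends on the real file names.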
In the group that matches the optional part, you need to replace .+ with \\S+, where \S means any non-whitespace character. This forces the optional part to contain non-whitespace characters in order to match:
String part1 = "((\\d+\\.\\d+\\.\\d+)(\\s*-\\s*(\\S+))?)";
String part2 = "((\\d+\\.\\d+\\.\\d+\\.\\d+)(\\s*-\\s*(\\S+))?\\.txt)";
Also note that .{1} (which is the same as just .) matches any character. From the examples, you want to match a dot, so it should be replaced with \.
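To see why .+ lets the empty-description case through while \S+ does not, the sketch below builds the two-part pattern twice, once with each description sub-pattern. The helper build and the class name are hypothetical; the dots are escaped in both variants so that only the description part differs.

```java
import java.util.regex.Pattern;

final class DescriptionCheck {
    private static final String VERSION3 = "\\d+\\.\\d+\\.\\d+";
    private static final String VERSION4 = "\\d+\\.\\d+\\.\\d+\\.\\d+";

    // Builds part1 + "/" + part2 with the given description sub-pattern.
    private static Pattern build(String description) {
        String optional = "(\\s*-\\s*(" + description + "))?";
        return Pattern.compile("((" + VERSION3 + ")" + optional + ")/(("
            + VERSION4 + ")" + optional + "\\.txt)");
    }

    static final Pattern WITH_DOT_PLUS = build(".+");
    static final Pattern WITH_NON_SPACE = build("\\S+");

    static boolean matches(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        String bad = "1.0.0 - /1.0.0.0.txt";
        System.out.println(matches(WITH_DOT_PLUS, bad));  // true: .+ accepts the lone space
        System.out.println(matches(WITH_NON_SPACE, bad)); // false
    }
}
```

With .+, the second \s* can give back the space so that .+ matches it as the "description". \S+ cannot match a space, so the optional group fails and the match is rejected. A side effect is that descriptions containing spaces are rejected as well.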
Something like
^\d+\.\d+\.\d+(?:\s*-\s*[^\/\s]+)?\/\d+\.\d+\.\d+\.\d+?(?:\s*-\s*[^.\s]+)?\.\w+$
Check it out here at regex101.

How this Regex eliminates html?

I saw a code example and didn't understand how it prints only the word Print.
I'd appreciate your help with this.
String str = "<a href=/utility/ReportResult.jsp?reportId=5>Print</a>";
System.out.println(str.replaceAll("\\<.*?\\>", ""));
Output: Print
How can I modify my regex to print Print<>Report instead of PrintReport? Below are my regex and statement.
String str = "Print<>Report";
System.out.println(str.replaceAll("<.*?>", ""));
In order to print Print<>Report instead of PrintReport, replace the * with +:
System.out.println(str.replaceAll("<.+?>", ""));
// here _____________________________^
* means zero or more of the preceding token
+ means one or more of the preceding token
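The effect of the two quantifiers on an empty <> pair can be checked with the sketch below. The class and method names are illustrative, and stripStrict is an extra variant not taken from the answer, added only to show one way to avoid a pitfall of .+?.

```java
final class EmptyTagDemo {
    // Zero or more characters between the brackets: "<>" is removed too.
    static String stripStar(String s) {
        return s.replaceAll("<.*?>", "");
    }

    // At least one character between the brackets.
    static String stripPlus(String s) {
        return s.replaceAll("<.+?>", "");
    }

    // Hypothetical stricter variant: one or more characters that are neither '<' nor '>'.
    static String stripStrict(String s) {
        return s.replaceAll("<[^<>]+>", "");
    }

    public static void main(String[] args) {
        System.out.println(stripStar("Print<>Report"));      // PrintReport
        System.out.println(stripPlus("Print<>Report"));      // Print<>Report
        System.out.println(stripPlus("Print<>Report<b>"));   // Print
        System.out.println(stripStrict("Print<>Report<b>")); // Print<>Report
    }
}
```

The third line shows the pitfall: with .+? the dot may match the > of the empty pair and keep going until a later >, so the empty pair survives only when no further > follows.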
You don't have to escape the < and > (angle brackets). So in Java, str.replaceAll("<.*?>", "") will be sufficient.
How it works:
<.*?> --> Search for the first <, then match everything until the next >. Note that .*? is a lazy (reluctant) quantifier.
The regex says that anything between "<" and ">", including the brackets, must be replaced by "" (an empty string).
So
<a href=/utility/ReportResult.jsp?reportId=5>==> ""(blank)
</a>==>""(blank)
and only "Print" left
First, the leading backslashes are treated as an escape sequence for Java, so the actual regular expression is \<.*?\>
The \< matches the < character (the backslash again is an escape, which indicates that the following character should be interpreted literally and not as a regex operator). This is the beginning of an HTML tag.
The . token matches any character.
The *? is a reluctant quantifier: the * indicates that the preceding token (any character in this case) should be matched zero or more times, and the ? makes it match as few times as possible.
The \> matches the end of a tag. Because the quantifier is reluctant, the . stops at the first > instead of running on to the last one in the string.
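The role of the reluctant quantifier is easiest to see by comparing it with the greedy form on the string from the question. This is a small sketch with made-up class and method names.

```java
final class GreedyVsReluctant {
    // Greedy: .* runs to the last '>' in the input.
    static String greedy(String s) {
        return s.replaceAll("<.*>", "");
    }

    // Reluctant: .*? stops at the first '>' after each '<'.
    static String reluctant(String s) {
        return s.replaceAll("<.*?>", "");
    }

    public static void main(String[] args) {
        String str = "<a href=/utility/ReportResult.jsp?reportId=5>Print</a>";
        System.out.println("[" + greedy(str) + "]");    // []
        System.out.println("[" + reluctant(str) + "]"); // [Print]
    }
}
```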

Regular Expression to Replace All But One Character In String

I need a regular expression to replace all matching characters except the first one in each sequence in a string.
For example;
For matching with 'A' and replacing with 'B'
'AAA' should be replaced with 'ABB'
'AAA AAA' should be replaced with 'ABB ABB'
For matching with ' ' and replacing with 'X'
'[space][space][space]A[space][space][space]' should be replaced with '[space]XXA[space]XX'
You need to use this regex for replacement:
\\BA
\B asserts a position where \b (word boundary) does not match, here a position between two word characters
A matches the character A literally
Java Code:
String repl = input.replaceAll("\\BA", "B");
UPDATE For second part of your question use this regex for replacement:
"(?<!^|\\w) "
Code:
String repl = input.replaceAll("(?<!^|\\w) ", "X");
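Both replacements can be checked against the examples from the question with a sketch like this one (class and method names are hypothetical):

```java
final class KeepFirstOfRun {
    // An 'A' that does not start a word: \B excludes the word boundary before the first 'A'.
    static String replaceA(String s) {
        return s.replaceAll("\\BA", "B");
    }

    // A space that is preceded by neither the start of input nor a word character.
    static String replaceSpaces(String s) {
        return s.replaceAll("(?<!^|\\w) ", "X");
    }

    public static void main(String[] args) {
        System.out.println(replaceA("AAA AAA"));                  // ABB ABB
        System.out.println("[" + replaceSpaces("   A   ") + "]"); // [ XXA XX]
    }
}
```

Note that \BA also matches an A that follows any other word character, so an input such as xAA would become xBB; that is fine for the examples given but may matter for other inputs.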
Negative Lookbehind and Beginning of String Anchor
Use the regex (?<!^| )A like this:
String resultString = subjectString.replaceAll("(?<!^| )A", "B");
Explanation
(?<!^| ) asserts that what immediately precedes the position is neither the beginning of the string nor a space character
A matches A
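A short sketch of the lookbehind version, next to the \B pattern for comparison. The names are illustrative, and the third input is an extra case chosen to show where the two patterns differ.

```java
final class LookbehindKeepFirst {
    // 'A' not preceded by the start of input or by a space.
    static String viaLookbehind(String s) {
        return s.replaceAll("(?<!^| )A", "B");
    }

    // 'A' at a non-word-boundary position.
    static String viaNonBoundary(String s) {
        return s.replaceAll("\\BA", "B");
    }

    public static void main(String[] args) {
        System.out.println(viaLookbehind("AAA AAA")); // ABB ABB
        System.out.println(viaLookbehind("-AA"));     // -BB
        System.out.println(viaNonBoundary("-AA"));    // -AB
    }
}
```

The lookbehind treats only the string start and a space as "before the first A", while \B treats any non-word character that way, which explains the different results for -AA.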

Regex matching quoted string but ignoring escaped quotation mark

What I want to know is how to modify the following regex: \".*?\" so that it ignores an escaped " character (\") and does not end the match at \".
For example:
parameter1 = " fsfsdfsd \" " parameter2 = " fsfsfs "
I want to match:
" fsfsdfsd \" "
and
" fsfsfs "
but not
" fsfsdfsd \" " parameter2 = " fsfsfs "
etc...
Try this one:
"(?:\\"|[^"])*"
Note that it also matches "test \" (you can probably avoid that using a lookbehind). Escape the characters with \ as needed for your string literal.
I usually handle this sort of task by figuring out what are the elements that can appear between quote marks. In this case, each element can be:
any character that is not \ or ";
the two-character sequence \";
a \ that is not followed by ".
You can expand this if desired, by allowing \\ to represent \, for instance, or allowing other escapes; it should be pretty simple to modify the above list.
Then the regular expression just follows the rules in the list (note: this is a regex and not a Java string literal):
"(([^\\"]|\\"|\\(?!"))*)"
which means that, within the quote marks, we match zero or more of: (1) a character other than \ or " (the character class); (2) the sequence \"; (3) \ not followed by " (negative lookahead). Of course, the Java string literal looks pretty ugly:
"\"(([^\\\\\"]|\\\\\"|\\\\(?!\"))*)\""
(Note: not tested.)
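A small harness for trying the string literal above on the input from the question might look like this. The class name and findAll helper are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QuotedStringFinder {
    // Regex: "(([^\\"]|\\"|\\(?!"))*)"
    private static final Pattern QUOTED =
        Pattern.compile("\"(([^\\\\\"]|\\\\\"|\\\\(?!\"))*)\"");

    // Returns every quoted string, including its surrounding quote marks.
    static List<String> findAll(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = QUOTED.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "parameter1 = \" fsfsdfsd \\\" \" parameter2 = \" fsfsfs \"";
        for (String s : findAll(input)) {
            System.out.println(s);
        }
        // " fsfsdfsd \" "
        // " fsfsfs "
    }
}
```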
You will need negative lookbehind in your regex:
(?<!\\\\)\".*?(?<!\\\\)\"
The correct regexp for matching strings between quotes is:
"(?:[^\\"]+|\\.|\\\\)*"
but because backslashes need to be escaped in Java, the resulting expression is:
Pattern.compile("\"(?:[^\\\\\"]+|\\\\.|\\\\\\\\)*\"");
This expression matches both backslash-escaped characters and escaped backslashes themselves, for example:
... "123 \\\" 456 \\" ...
         ^^       ^^   backslash literal
           ^^          escaped quote
The regexes in the other answers fail on this example.
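The sketch below runs the escape-aware pattern and the negative-lookbehind pattern on an input that contains the example string followed by a second quoted string. The class name, the SAMPLE constant, and the findAll helper are assumptions made for the demonstration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EscapedBackslashDemo {
    // Actual text: x = "123 \\\" 456 \\" y = "z"
    static final String SAMPLE = "x = \"123 \\\\\\\" 456 \\\\\" y = \"z\"";

    // Consumes a backslash together with the character it escapes.
    static final Pattern ESCAPE_AWARE =
        Pattern.compile("\"(?:[^\\\\\"]+|\\\\.|\\\\\\\\)*\"");

    // Only checks the single character before each quote.
    static final Pattern LOOKBEHIND =
        Pattern.compile("(?<!\\\\)\".*?(?<!\\\\)\"");

    static List<String> findAll(Pattern p, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(findAll(ESCAPE_AWARE, SAMPLE)); // two matches
        System.out.println(findAll(LOOKBEHIND, SAMPLE));   // one over-long match
    }
}
```

The lookbehind pattern treats the closing quote after \\ as escaped because the character before it is a backslash, so its match runs on to the next quote. The escape-aware pattern pairs each backslash with the character it escapes and stops at the right place.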
