Regex: How to remove a substring that is bounded by certain characters? - java

I'm not sure what the appropriate regex expression would be for this:
String s = "[Don't remove] Don't remove [Remove | Don't remove]";
I want to remove everything between [ and |, but not between [ and ]. So the output should be:
"[Don't remove] Don't remove Don't remove]"
I tried doing this,
s = s.replaceAll("\\[.*?\\|", "");
but I end up getting something like this.
"Don't remove]"
Now I'm at a loss. I'm still new to regular expressions, and any help would be greatly appreciated. Thanks!

Use a negated character class, [^\[|]*, which will not allow matching any other [ or | between [ and |:
String s = "[Don't remove] Don't remove [Remove | Don't remove]";
s = s.replaceAll("\\[[^\\[|]*\\|", "");
System.out.println(s); // => [Don't remove] Don't remove  Don't remove]
Details
\\[ - a literal [
[^\\[|]* - a negated character class matching zero or more characters other than [ and |
\\| - a literal | symbol.
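For comparison, here is a self-contained sketch that runs the question's lazy pattern and the negated-class pattern side by side. The class and method names are hypothetical. The third variant (an extra \s* after the |) is an addition that does not appear in the answer: it also removes the space that follows the |, which gives exactly the output asked for in the question.

```java
// Hypothetical demo class; names are illustrative only.
class BracketPipeDemo {
    // Pattern from the question: .*? is lazy, but the match still starts at the
    // first [ in the string and runs across ] and [ until it reaches a |.
    static String lazy(String s) {
        return s.replaceAll("\\[.*?\\|", "");
    }

    // Pattern from the answer: no [ or | may occur between the opening [ and the |,
    // so only the [ closest to the | can start a match.
    static String negated(String s) {
        return s.replaceAll("\\[[^\\[|]*\\|", "");
    }

    // Hypothetical variant: same as negated(), but also consumes whitespace after the |.
    static String negatedTrimmed(String s) {
        return s.replaceAll("\\[[^\\[|]*\\|\\s*", "");
    }

    public static void main(String[] args) {
        String s = "[Don't remove] Don't remove [Remove | Don't remove]";
        System.out.println("\"" + lazy(s) + "\"");           // " Don't remove]"
        System.out.println("\"" + negated(s) + "\"");        // "[Don't remove] Don't remove  Don't remove]"
        System.out.println("\"" + negatedTrimmed(s) + "\""); // "[Don't remove] Don't remove Don't remove]"
    }
}
```

Note that the negated-class version on its own leaves two spaces where "[Remove |" used to be.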

Related

Replacing quotes in a Java String only on specific places

We have a String as below.
\config\test\[name="sample"]\identifier["2"]\age["3"]
I need to remove the quotes surrounding the numbers. For example, the above string after replacement should look like below.
\config\test\[name="sample"]\identifier[2]\age[3]
Currently I'm trying the regex below:
String.replaceAll("\"\\\\d\"", "");
This replaces the numbers as well. Please help me find a regex for this.
You can use replaceAll with the regex "(\d+)", which captures the digits in a group, and replace each match with that capturing group ($1):
String str = "\\config\\test\\[name=\"sample\"]\\identifier[\"2\"]\\age[\"3\"]";
str = str.replaceAll("\"(\\d+)\"", "$1");
//----------------------^____^------^^
Output
\config\test\[name="sample"]\identifier[2]\age[3]
Take a look at capturing groups to learn more.
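A small sketch (hypothetical class and method names) that makes the capturing group visible: group(1) holds only the digits, and $1 in the replacement string refers to that same group.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class CapturingGroupDemo {
    // Literal " then (\d+) as capturing group 1 then literal "
    static final Pattern QUOTED_DIGITS = Pattern.compile("\"(\\d+)\"");

    // What group 1 holds for each match: the digits without their quotes.
    static List<String> capturedDigits(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = QUOTED_DIGITS.matcher(s);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    // Each whole match (quotes included) is replaced by $1, i.e. the digits alone.
    static String unquote(String s) {
        return QUOTED_DIGITS.matcher(s).replaceAll("$1");
    }

    public static void main(String[] args) {
        String str = "\\config\\test\\[name=\"sample\"]\\identifier[\"2\"]\\age[\"3\"]";
        System.out.println(capturedDigits(str)); // [2, 3]
        System.out.println(unquote(str));        // \config\test\[name="sample"]\identifier[2]\age[3]
    }
}
```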
We can try doing a blanket replacement of the following pattern:
\["(\d+)"\]
And replacing it with this:
\[$1\]
Note that we specifically target quoted numbers only appearing in square brackets. This minimizes the risk of accidentally doing an unintended replacement.
Code:
String input = "\\config\\test\\[name=\"sample\"]\\identifier[\"2\"]\\age[\"3\"]";
input = input.replaceAll("\\[\"(\\d+)\"\\]", "[$1]");
System.out.println(input);
Output:
\config\test\[name="sample"]\identifier[2]\age[3]
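To illustrate the point about unintended replacements, here is a sketch that runs the broad pattern and the bracket-anchored pattern on a hypothetical input in which an attribute value is itself a quoted number (name="42" is an assumption, not taken from the question). Class and method names are hypothetical.

```java
// Hypothetical demo class; names are illustrative only.
class BracketedNumberDemo {
    // Broad pattern: any run of digits wrapped in double quotes.
    static String broad(String s) {
        return s.replaceAll("\"(\\d+)\"", "$1");
    }

    // Targeted pattern from the answer: the quoted digits must sit directly inside [ ].
    static String bracketed(String s) {
        return s.replaceAll("\\[\"(\\d+)\"\\]", "[$1]");
    }

    public static void main(String[] args) {
        // Hypothetical input: name="42" should keep its quotes.
        String input = "\\config\\test\\[name=\"42\"]\\identifier[\"2\"]";
        System.out.println(broad(input));     // \config\test\[name=42]\identifier[2]
        System.out.println(bracketed(input)); // \config\test\[name="42"]\identifier[2]
    }
}
```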
You can use:
(?:"(?=\d)|(?<=\d)")
and replace each match with nothing (an empty string, "").
Quick test:
echo '\config\test\[name="sample"]\identifier["2"]\age["3"]' | perl -lpe 's/(?:"(?=\d)|(?<=\d)")//g'
the output:
\config\test\[name="sample"]\identifier[2]\age[3]
test2:
echo 'identifier["123"]\age["456"]' | perl -lpe 's/(?:"(?=\d)|(?<=\d)")//g'
the output:
identifier[123]\age[456]
NOTE: this works when there is a single double quote on each side of the number; if there can be several, add the + quantifier after both the leading and the trailing " characters:
test3:
echo '"""""1234234"""""' | perl -lpe 's/(?:"+(?=\d)|(?<=\d)"+)//g'
the output:
1234234
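The same lookaround approach works in Java, which supports both lookahead (?=...) and lookbehind (?<=...). A minimal sketch using the + version from the note; the class and method names are hypothetical.

```java
// Hypothetical demo class; names are illustrative only.
class QuoteLookaroundDemo {
    // "+(?=\d)  : one or more quotes immediately followed by a digit (the digit is not consumed)
    // (?<=\d)"+ : one or more quotes immediately preceded by a digit
    static String stripQuotesAroundDigits(String s) {
        return s.replaceAll("\"+(?=\\d)|(?<=\\d)\"+", "");
    }

    public static void main(String[] args) {
        System.out.println(stripQuotesAroundDigits(
                "\\config\\test\\[name=\"sample\"]\\identifier[\"2\"]\\age[\"3\"]"));
        System.out.println(stripQuotesAroundDigits("identifier[\"123\"]\\age[\"456\"]"));
        System.out.println(stripQuotesAroundDigits("\"\"\"\"\"1234234\"\"\"\"\""));
    }
}
```

Because each lookaround only checks the single neighboring character, any quote that touches a digit is removed, not only quotes around a whole number: the quoted input "a1" loses its closing quote but keeps the opening one.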

high-level regular expression with not

Hi regular expression experts,
I have the following text
<[~UNKNOWN:a-z\.]> <[~UNKNOWN:A-Z\-0-9]> <[~UNKNOWN:A-Z\]a-z]
And the following reg expr
\[\~[^\[\~\]]*\]
It works fine for the 1st and 2nd group in the text but not for the 3rd one.
The 1st group is
[~UNKNOWN:a-z\.]
The 2nd is
[~UNKNOWN:A-Z\-0-9]
and the 3rd one is
[~UNKNOWN:A-Z\]a-z]
However the reg exp finds the following text
[~UNKNOWN:A-Z\]
I understand why, and I know that I have to add the following rule to the regular expression: start with '[' and '~' and end with ']', UNLESS there is a '\' in front of that ']'. So I should add a NOT expression, but I'm not sure how.
Could anybody please help?
Thanks,
V.
Why not simply:
<([^>]+)>?
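A quick Java check of this suggestion (class and method names are hypothetical). It assumes every item starts with <; the trailing > is optional, presumably because the third item in the question's text has no closing >.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class AngleBracketDemo {
    // < then one or more characters other than > (group 1) then an optional >
    static final Pattern ITEM = Pattern.compile("<([^>]+)>?");

    // Collects group 1 of every match, i.e. the text between < and >.
    static List<String> contents(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = ITEM.matcher(text);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "<[~UNKNOWN:a-z\\.]> <[~UNKNOWN:A-Z\\-0-9]> <[~UNKNOWN:A-Z\\]a-z]";
        System.out.println(contents(text));
    }
}
```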
This should work (first line: my pattern; second line: your pattern, spaced out to line up with mine; third line: my changes):
\[\~(?:[^\[\~\]]|(?<=\\)\])*(?<!\\)\]
\[\~   [^\[\~\]]           *       \]
    (?:         |(?<=\\)\]) (?<!\\)
Your regex:
\[\~ # Literal characters [~
[^ # Character group, NONE of the following:
\[\~\] # [ or ~ or ]
]* # 0 or more of this character group
\] # Followed by ]
Your pattern in words: [~, everything in between, up to the next ], as long as there is no [ or ~ or ] in there.
My pattern, with only the relevant changes explained:
\[\~
(?: # Non capturing group
[^\[\~\]]
| # OR
(?<=\\)\] # ], preceded by \
)*
(?<!\\)\] # ], not preceded by \
In words: same as yours, plus a ] may appear inside if it is preceded by \, and the closing ] must not be preceded by \.
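A sketch that runs both patterns over the text from the question in Java, so the difference on the third group is visible. The class and helper names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class EscapedBracketDemo {
    // Question's pattern: stops at the first ], whether or not it is escaped.
    static final Pattern ORIGINAL = Pattern.compile("\\[\\~[^\\[\\~\\]]*\\]");

    // Answer's pattern: inside, a ] is allowed only when preceded by \;
    // the closing ] must not be preceded by \.
    static final Pattern ESCAPE_AWARE =
            Pattern.compile("\\[\\~(?:[^\\[\\~\\]]|(?<=\\\\)\\])*(?<!\\\\)\\]");

    // Collects every match of p in text, left to right.
    static List<String> findAll(Pattern p, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "<[~UNKNOWN:a-z\\.]> <[~UNKNOWN:A-Z\\-0-9]> <[~UNKNOWN:A-Z\\]a-z]";
        System.out.println(findAll(ORIGINAL, text));
        System.out.println(findAll(ESCAPE_AWARE, text));
    }
}
```

The first pattern returns [~UNKNOWN:A-Z\] for the third group, while the second returns the whole [~UNKNOWN:A-Z\]a-z]. One limitation: the lookbehind only inspects the single character before ], so an escaped backslash right before the real closing bracket (\\]) would be misread as an escaped ].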

Regex - trying to match / \ | , or newline, error says invalid escape character

I'm actually trying to split a string on any of the following:
/
\
|
,
\n
Here's the regex I'm using, which gives the 'invalid escape character' error:
String delims = "[\\\\\|\\/\\n,]+";
String[] list1 = str1.split(delims);
I've tried a few more versions of this, trying to get the number of \'s right. What's the right way to do this?
"[/\\|\n,\\\\]+"
Some of these need to be double-escaped:
/ matches /
\\| matches |
\n matches new line
, matches ,
\\\\ matches \
To match a literal \ in the regex engine, you need to write it as four \ in the string literal, so you have one \ too many:
"[\\\\\|\\/\\n,]+";
  1234^ here
Also, you don't need to escape / in the Java regex engine, and you don't need to write \n as \\n (a literal newline character is accepted too), so you can try:
String delims = "[\\\\|/\n,]+";
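A runnable check of the suggested delimiter string; the class and method names and the input strings are made up for illustration.

```java
import java.util.Arrays;

// Hypothetical demo class; names are illustrative only.
class SplitDelimsDemo {
    // Inside the character class:
    //   \\\\ in the Java string -> \\ in the regex -> one literal backslash
    //   | and / need no escaping inside [ ]
    //   \n is a real newline character in the string, matched literally
    // The trailing + treats a run of adjacent delimiters as a single separator.
    static final String DELIMS = "[\\\\|/\n,]+";

    static String[] splitOnDelims(String s) {
        return s.split(DELIMS);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnDelims("a/b\\c|d,e\nf"))); // [a, b, c, d, e, f]
        System.out.println(Arrays.toString(splitOnDelims("a//b")));          // [a, b]
    }
}
```

String.split drops trailing empty strings but keeps a leading one, so an input that starts with a delimiter, such as "/a", yields ["", "a"].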

Regular expression for no whitespaces on the first position

Example accepted:
This is a try!
And this is the second line!
Example not accepted:
this is a try with initial spaces
and this the second line
So, I need:
no string made only of whitespace " "
no string where first char is whitespace
new lines are ok; only the first character cannot be a new line
I was using
^(?=\s*\S).*$
but that pattern doesn't allow newlines.
You can try this regex:
^(?!\s*$|\s).*$
\s*$ (first alternative in the lookahead) - no string made only of whitespace
\s (second alternative in the lookahead) - no string where the first char is whitespace
.*$ - matches everything!
You need to use single-line mode (Pattern.DOTALL in Java, or the embedded flag (?s)) so that . also matches newlines, and you need to use the matches method.
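A sketch of the single-line (DOTALL) plus matches() approach. The class and method names are hypothetical, and the sample with a leading space is an assumption, since the leading whitespace of the question's rejected example did not survive.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class NoLeadingWhitespaceDemo {
    // (?!\s*$|\s) rejects all-whitespace input and input that starts with whitespace.
    // DOTALL lets . match line terminators, so .* covers every line.
    static final Pattern VALID = Pattern.compile("^(?!\\s*$|\\s).*$", Pattern.DOTALL);

    // matches() requires the whole input to match, not just a part of it.
    static boolean isValid(String s) {
        return VALID.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("This is a try!\nAnd this is the second line!"));            // true
        System.out.println(isValid(" This is a try with initial spaces\nand this the second line")); // false
        System.out.println(isValid("   "));                                                    // false
        System.out.println(isValid("\nThis is a try!"));                                       // false
    }
}
```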
"no string made only by whitespaces" is the same to "no string where first char is whitespace" as it also begins with white space.
You have to set Pattern.MULTILINE, which makes ^ and $ match at the beginning and end of each line, not only at the beginning and end of the entire string:
"^\\S.+$"
I'm not a Java guy, but a solution in Python could look like this here:
In [1]: import re
In [2]: example_accepted = 'This is a try!\nAnd this is the second line!'
In [3]: example_not_accepted = ' This is a try with initial spaces\nand this the second line'
In [4]: pattern = re.compile(r"""
....: ^ # matches at the beginning of a string
....: \S # matches any non-whitespace character
....: .+ # matches one or more arbitrary characters
....: $ # matches at the end of a string
....: """,
....: flags=re.MULTILINE|re.VERBOSE)
In [5]: pattern.findall(example_accepted)
Out[5]: ['This is a try!', 'And this is the second line!']
In [6]: pattern.findall(example_not_accepted)
Out[6]: ['and this the second line']
The key part here is the flag re.MULTILINE. With this flag enabled, ^ and $ match not only at the beginning and end of the string, but also at the beginning and end of each line (lines being separated by newlines). In Java, the equivalent is the Pattern.MULTILINE flag, and Pattern.COMMENTS corresponds to re.VERBOSE.
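For Java, a sketch of the same idea as the Python session: Pattern.MULTILINE corresponds to re.MULTILINE, and Pattern.COMMENTS to re.VERBOSE (whitespace is ignored and # starts a comment that runs to the end of the line). With the comments stripped, this is the same ^\S.+$ pattern as in the Pattern.MULTILINE answer. Class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class VerbosePatternDemo {
    // COMMENTS: whitespace and # comments inside the pattern are ignored.
    // MULTILINE: ^ and $ match at the start and end of every line.
    static final Pattern LINE = Pattern.compile(
            "^   # matches at the beginning of a line\n"
          + "\\S # matches any non-whitespace character\n"
          + ".+  # matches one or more arbitrary characters\n"
          + "$   # matches at the end of a line\n",
            Pattern.MULTILINE | Pattern.COMMENTS);

    // Rough counterpart of Python's findall for a pattern without groups.
    static List<String> findAll(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = LINE.matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(findAll("This is a try!\nAnd this is the second line!"));
        System.out.println(findAll(" This is a try with initial spaces\nand this the second line"));
    }
}
```

As in the Python version, .+ requires at least one character after the first, so a line consisting of a single character is not matched.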

Error reading log file with reg expression

I am trying to read a log file whose content looks like this:
127.0.0.1 - - [17/OCT/2009:00:02:14 0000] GET xxxxxx xxxx xxx
I tried the following reg exp and I am getting ERROR: Unclosed group near index 90
regex = (\d+\.\d+\.\d+\.\d+)\s-\s-\s\[(\d+)/(\w{3})/(\d{4}):(\d{2}):(\d{2}):(\d{2})\s(\d{4}\)].*
Can someone help me?
You need to fix the escaping of some characters:
^(\d+\.\d+\.\d+\.\d+)\s-\s-\s\[(\d+)\/(\w{3})\/(\d{4}):(\d{2}):(\d{2}):(\d{2})\s(\d{4})\]
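In Java, the "Unclosed group" error comes from the \) near the end of the question's pattern: the backslash turns the ) that should close the last group into a literal character, so that group is never closed. The correction above moves the backslash to the ] instead. A sketch (class and method names are hypothetical) that reproduces the error and then parses the sample line with the corrected pattern written as a Java string; the optional \/ escapes are omitted and a trailing .* accepts the rest of the line.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class; names are illustrative only.
class LogLineDemo {
    // Question's pattern: \) makes the last ) a literal, leaving group 8 unclosed.
    static final String BROKEN =
            "(\\d+\\.\\d+\\.\\d+\\.\\d+)\\s-\\s-\\s\\[(\\d+)/(\\w{3})/(\\d{4}):(\\d{2}):(\\d{2}):(\\d{2})\\s(\\d{4}\\)].*";
    // Corrected: ) closes group 8 and \] matches the literal ].
    static final String FIXED =
            "(\\d+\\.\\d+\\.\\d+\\.\\d+)\\s-\\s-\\s\\[(\\d+)/(\\w{3})/(\\d{4}):(\\d{2}):(\\d{2}):(\\d{2})\\s(\\d{4})\\].*";

    // Returns the compiler's error description, or null if the pattern compiles.
    static String compileError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription();
        }
    }

    // Returns groups 1..8 of a matching log line, or null if the line does not match.
    static String[] parse(String line) {
        Matcher m = Pattern.compile(FIXED).matcher(line);
        if (!m.matches()) {
            return null;
        }
        String[] groups = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            groups[i - 1] = m.group(i);
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(compileError(BROKEN));
        String line = "127.0.0.1 - - [17/OCT/2009:00:02:14 0000] GET xxxxxx xxxx xxx";
        System.out.println(String.join(" | ", parse(line)));
    }
}
```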
I think the "[" and "]" should be escaped: [[] and []] or \[ and \].
For Java:
java.util.regex.Pattern.compile("(\\d+\\.\\d+\\.\\d+\\.\\d+)\\s-\\s-\\s\\[(\\d+)/(\\w{3})/(\\d{4}):(\\d{2}):(\\d{2}):(\\d{2})\\s(\\d{4})\\].*")
First, escape [ and ] with backslashes. They have a special meaning in regexps.
[ and ] are special characters. That's what it means by unclosed group. Depending on your flavor of regex, you'll need to put either 1 \ or 2 \ in front of each bracket.
regex = (\d+\.\d+\.\d+\.\d+)\s-\s-\s\[(\d+)/(\w{3})/(\d{4}):(\d{2}):(\d{2}):(\d{2})\s(\d{4})\].*
^\d+\.\d+\.\d+\.\d+\s-\s-\s\[\d{2}\/[A-Z]{3}\/\d{4}:\d{2}:\d{2}:\d{2}\s\d{4}]\sGET\s(.{6}\s.{4}\s.{3})$
