Error reading log file with reg expression

I am trying to read a log file whose content looks like this:
127.0.0.1 - - [17/OCT/2009:00:02:14 0000] GET xxxxxx xxxx xxx
I tried the following regular expression, but I get ERROR: Unclosed group near index 90
regex = (\d+\.\d+\.\d+\.\d+)\s-\s-\s\[(\d+)/(\w{3})/(\d{4}):(\d{2}):(\d{2}):(\d{2})\s(\d{4}\)].*
Can someone help me?

You forgot to escape some characters:
^(\d+\.\d+\.\d+\.\d+)\s-\s-\s\[(\d+)\/(\w{3})\/(\d{4}):(\d{2}):(\d{2}):(\d{2})\s(\d{4})\]
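As a rough illustration, the corrected expression could be used from Java along these lines. This is only a sketch: the class name LogLineParser and the method parse are made up for the example, and the slashes are left unescaped because Java does not require escaping them.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LogLineParser {
    // The corrected expression as a Java string literal:
    // every regex backslash has to be doubled.
    static final Pattern LOG_LINE = Pattern.compile(
        "^(\\d+\\.\\d+\\.\\d+\\.\\d+)\\s-\\s-\\s"
        + "\\[(\\d+)/(\\w{3})/(\\d{4}):(\\d{2}):(\\d{2}):(\\d{2})\\s(\\d{4})\\]");

    /** Returns the captured fields, or null if the line does not match. */
    static String[] parse(String line) {
        Matcher m = LOG_LINE.matcher(line);
        if (!m.find()) {
            return null;
        }
        String[] fields = new String[m.groupCount()];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = m.group(i + 1); // group 0 is the whole match
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "127.0.0.1 - - [17/OCT/2009:00:02:14 0000] GET xxxxxx xxxx xxx";
        // prints: 127.0.0.1 | 17 | OCT | 2009 | 00 | 02 | 14 | 0000
        System.out.println(String.join(" | ", parse(line)));
    }
}
```

find() is used instead of matches() because the expression does not describe the rest of the line after the closing bracket.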

I think the "[" and "]" should be escaped: [[] and []] or \[ and \].
For Java:
java.util.regex.Pattern.compile("(\\d+\\.\\d+\\.\\d+\\.\\d+)\\s-\\s-\\s\\[(\\d+)/(\\w{3})/(\\d{4}):(\\d{2}):(\\d{2}):(\\d{2})\\s(\\d{4})\\].*")

First, escape [ and ] with backslashes. They have special meaning in regexps.

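[ and ] are special characters, so they have to be escaped to match literally. The "Unclosed group" message itself points at the end of your pattern: in (\d{4}\)] the backslash escapes the closing parenthesis instead of the bracket, so that group is never closed. Depending on your flavor of regex, you'll need to put either one or two backslashes in front of each bracket (two inside a Java string literal).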
regex = (\d+\.\d+\.\d+\.\d+)\s-\s-\s\[(\d+)/(\w{3})/(\d{4}):(\d{2}):(\d{2}):(\d{2})\s(\d{4})\].*
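In the pattern as posted in the question, the tail reads (\d{4}\)], so the backslash sits in front of the parenthesis rather than the bracket. A small sketch (the class and method names are hypothetical) that compares that tail with the corrected one:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class EscapeCheck {
    /** Returns true if the regex compiles, false if Pattern rejects it. */
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            // Report what the regex engine complained about.
            System.out.println(e.getDescription() + " near index " + e.getIndex());
            return false;
        }
    }

    public static void main(String[] args) {
        // As posted: "\\)" is a literal ')', so the group opened by '(' never closes.
        System.out.println(compiles("\\s(\\d{4}\\)].*"));
        // Corrected: the group is closed first, then the literal ']' is escaped.
        System.out.println(compiles("\\s(\\d{4})\\].*"));
    }
}
```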

^\d+\.\d+\.\d+\.\d+\s-\s-\s\[\d{2}\/[A-Z]{3}\/\d{4}:\d{2}:\d{2}:\d{2}\s\d{4}]\sGET\s(.{6}\s.{4}\s.{3})$

Related

Regex: How to remove a substring that is bounded by certain characters?

I'm not sure what the appropriate regular expression would be for this:
String s = "[Don't remove] Don't remove [Remove | Don't remove]";
I want to remove everything in between [ and | but not [ and ]. So the output is:
"[Don't remove] Don't remove Don't remove]"
I tried doing this,
s = s.replaceAll("\\[.*?\\|", "");
but I end up getting something like this.
"Don't remove]"
Now I'm at a loss. I'm still new to regular expressions and any help would be greatly appreciated. Thanks!

Use a negated character class [^\[|]* that will not allow matching any other [ and | in between [ and |:
String s = "[Don't remove] Don't remove [Remove | Don't remove]";
s = s.replaceAll("\\[[^\\[|]*\\|", "");
System.out.println(s); // => [Don't remove] Don't remove  Don't remove] (the spaces on both sides of the removed part remain)
Details
\\[ - a literal [
[^\\[|]* - a negated character class matching any 0+ chars other than a [ and |
\\| - a literal | symbol.
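To make the difference visible, here is a small side-by-side sketch of the lazy pattern from the question and the negated character class. The class and method names are hypothetical, and the third variant, which also swallows whitespace after the |, is an addition of this example rather than part of the answer.

```java
class BracketPipeRemoval {
    // Lazy quantifier: still starts at the FIRST '[' and runs to the first '|'.
    static String lazy(String s) {
        return s.replaceAll("\\[.*?\\|", "");
    }

    // Negated class: the match may not cross another '[' on its way to '|'.
    static String negated(String s) {
        return s.replaceAll("\\[[^\\[|]*\\|", "");
    }

    // Assumption of this sketch: also drop whitespace that follows the '|'.
    static String negatedTrimmed(String s) {
        return s.replaceAll("\\[[^\\[|]*\\|\\s*", "");
    }

    public static void main(String[] args) {
        String s = "[Don't remove] Don't remove [Remove | Don't remove]";
        System.out.println("<" + lazy(s) + ">");           // < Don't remove]>
        System.out.println("<" + negated(s) + ">");        // <[Don't remove] Don't remove  Don't remove]>
        System.out.println("<" + negatedTrimmed(s) + ">"); // <[Don't remove] Don't remove Don't remove]>
    }
}
```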

high-level regular expression with not

Hi regular expression experts,
I have the following text
<[~UNKNOWN:a-z\.]> <[~UNKNOWN:A-Z\-0-9]> <[~UNKNOWN:A-Z\]a-z]
And the following reg expr
\[\~[^\[\~\]]*\]
It works fine for the 1st and 2nd group in the text but not for the 3rd one.
The 1st group is
[~UNKNOWN:a-z\.]
The 2nd is
[~UNKNOWN:A-Z\-0-9]
and the 3rd one is
[~UNKNOWN:A-Z\]a-z]
However the reg exp finds the following text
[~UNKNOWN:A-Z\]
I understand why and I know that I have to add the following rule to the reg exp:
starting with '[' and '~' characters and ending with ']' UNLESS there is a '\' in front of ']'. So I should add a NOT expression but not sure how.
Could anybody please help?
Thanks,
V.

Why not simply:
<([^>]+)>?

This should work (first line pattern, second line your pattern (ignore whitespace), third line my changes):
\[\~(?:[^\[\~\]]|(?<=\\)\])*(?<!\\)\]
\[\~   [^\[\~\]]           *       \]
    (?:         |(?<=\\)\]) (?<!\\)
Your regex:
\[\~        # Literal characters [~
[^          # Character group, NONE of the following:
  \[\~\]    # [ or ~ or ]
]*          # 0 or more of this character group
\]          # Followed by ]
Your pattern in words: [~, everything in between, up to the next ], as long as there is no [ or ~ or ] in there.
My pattern, with only the relevant changes explained:
\[\~
(?:            # Non-capturing group
  [^\[\~\]]
  |            # OR
  (?<=\\)\]    # ], preceded by \
)*
(?<!\\)\]      # ], not preceded by \
In words: Same as yours, plus ] may be contained if it is preceded by \, and the closing ] may not be preceded by \
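For what it's worth, here is how that pattern might be tried out in Java against the text from the question. The class name EscapedBracketFinder and the method findAll are invented for this sketch. Note how each regex backslash is doubled, so a regex-level \\ becomes four backslashes in the string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapedBracketFinder {
    // Regex: \[\~(?:[^\[\~\]]|(?<=\\)\])*(?<!\\)\]
    static final Pattern TOKEN = Pattern.compile(
        "\\[\\~(?:[^\\[\\~\\]]|(?<=\\\\)\\])*(?<!\\\\)\\]");

    /** Collects every [~...] token, allowing \] inside a token. */
    static List<String> findAll(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "<[~UNKNOWN:a-z\\.]> <[~UNKNOWN:A-Z\\-0-9]> <[~UNKNOWN:A-Z\\]a-z]";
        // One token per line; the third is [~UNKNOWN:A-Z\]a-z]
        for (String token : findAll(text)) {
            System.out.println(token);
        }
    }
}
```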

Regex error on Java

I created a regex for extracting PHP exception message fields:
(\w+.*)|\G(?!\A)\s*#\d+\s+(\S+\.php)\((\d+)\):\s(\w+.*)#012|#\d+\s{(\w+)}
Demo link: https://regex101.com/r/xI6cR0/2
Error message:
Illegal repetition near index 66
(\w+.*)|\G(?!\A)\s*#\d+\s+(\S+\.php)\((\d+)\):\s(\w+.*)#012|#\d+\s{(\w+)}
                                                                  ^

You need to escape every \ again as \\ for it to work in a Java string literal, so your \w becomes \\w. You also need to escape { and }. So it would be:
(\\w+.*)|\\G(?!\\A)\\s*#\\d+\\s+(\\S+\\.php)\\((\\d+)\\):\\s(\\w+.*)#012|#\\d+\\s\\{(\\w+)\\}
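A small sketch of the brace problem in isolation, using only the last alternative of the pattern. The class and method names are hypothetical, and the sample input "#5 {main}" is an assumed example of the kind of line that alternative is meant for.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class BraceEscape {
    // Last alternative of the pattern, with '{' and '}' escaped.
    static final Pattern FRAME = Pattern.compile("#\\d+\\s\\{(\\w+)\\}");

    /** Returns true if the regex compiles, false if Pattern rejects it. */
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    /** Returns the word between the braces, or null if there is no match. */
    static String frameName(String line) {
        Matcher m = FRAME.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Unescaped: Java reads "\s{" as the start of a repetition like \s{2}.
        System.out.println(compiles("#\\d+\\s{(\\w+)}"));     // false
        System.out.println(compiles("#\\d+\\s\\{(\\w+)\\}")); // true
        System.out.println(frameName("#5 {main}"));           // main
    }
}
```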

quotes in the java regular expressions

I get a String from the JSP containing [", e.g.
["Bulgaria
I would like to replace all occurrences of [" with [', but I don't know exactly how to do it.
I just tried:
str = str.replaceAll("[\\\"", "['");
with the result
java.util.regex.PatternSyntaxException: Unclosed character class near index 2 [\"
and
html = html.replaceAll("[\"", "['");
with the result
java.util.regex.PatternSyntaxException: Unclosed character class near index 1 [" ^
Any help will be appreciated.

Try this:
str.replaceAll("\\[\"", "['");
You need \\ to escape in a Java regex, and [ is a special character in regexes, hence the \\ in front of it. " is a special character in string literals, so a single \ is enough to escape it.
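Since the text being replaced here is fixed, a regex is not strictly needed. The following sketch compares the regex call with two alternatives that are not part of the answer: String.replace, which treats its arguments literally, and Pattern.quote, which escapes the whole search string. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class QuoteReplace {
    // Regex version: '[' escaped for the regex, '"' escaped for the string literal.
    static String viaRegex(String s) {
        return s.replaceAll("\\[\"", "['");
    }

    // Literal version: String.replace does not interpret regex syntax at all.
    static String viaLiteral(String s) {
        return s.replace("[\"", "['");
    }

    // Pattern.quote turns an arbitrary string into a regex that matches it literally.
    static String viaQuote(String s) {
        return s.replaceAll(Pattern.quote("[\""), "['");
    }

    public static void main(String[] args) {
        String input = "[\"Bulgaria";
        System.out.println(viaRegex(input));   // ['Bulgaria
        System.out.println(viaLiteral(input)); // ['Bulgaria
        System.out.println(viaQuote(input));   // ['Bulgaria
    }
}
```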

"Test[\"".replaceAll("\\[\"", "['"); // Test['

Regular expression for no whitespaces on the first position

Example accepted:
This is a try!
And this is the second line!
Example not accepted:
this is a try with initial spaces
and this the second line
So, I need:
no string made only by whitespaces " "
no string where first char is whitespace
new lines are ok; only the first character cannot be a new line
I was using
^(?=\s*\S).*$
but that pattern can't allow new lines.

You can try this regex
^(?!\s*$|\s).*$
    ---- -- --
     |   |  |-> matches everything!
     |   |-> no string where first char is whitespace
     |-> no string made only by whitespaces
You need to use single-line mode (Pattern.DOTALL in Java) so that . also matches newlines, and you need to use the matches method.
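A minimal sketch of that combination in Java, assuming that "singleline mode" corresponds to Pattern.DOTALL. The class name LeadingWhitespaceCheck and the method isAccepted are invented for the example.

```java
import java.util.regex.Pattern;

class LeadingWhitespaceCheck {
    // DOTALL lets '.' match line terminators, so ".*" can span several lines.
    static final Pattern NO_LEADING_WS =
        Pattern.compile("^(?!\\s*$|\\s).*$", Pattern.DOTALL);

    static boolean isAccepted(String text) {
        // matches() requires the whole input to match, not just a part of it.
        return NO_LEADING_WS.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAccepted("This is a try!\nAnd this is the second line!")); // true
        System.out.println(isAccepted(" this is a try with initial spaces\nand this the second line")); // false
        System.out.println(isAccepted("   "));   // false: only whitespace
        System.out.println(isAccepted("\nabc")); // false: starts with a newline
    }
}
```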

"no string made only by whitespaces" is the same to "no string where first char is whitespace" as it also begins with white space.
You have to set Pattern.MULTILINE, which changes the meaning of ^ and $ so that they also match at the beginning and end of each line, not only of the entire string:
"^\\S.+$"

I'm not a Java guy, but a solution in Python could look like this here:
In [1]: import re
In [2]: example_accepted = 'This is a try!\nAnd this is the second line!'
In [3]: example_not_accepted = ' This is a try with initial spaces\nand this the second line'
In [4]: pattern = re.compile(r"""
   ....:     ^     # matches at the beginning of a string
   ....:     \S    # matches any non-whitespace character
   ....:     .+    # matches one or more arbitrary characters
   ....:     $     # matches at the end of a string
   ....:     """,
   ....:     flags=re.MULTILINE|re.VERBOSE)
In [5]: pattern.findall(example_accepted)
Out[5]: ['This is a try!', 'And this is the second line!']
In [6]: pattern.findall(example_not_accepted)
Out[6]: ['and this the second line']
The key part here is the flag re.MULTILINE. With this flag enabled, ^ and $ do not only match at the beginning and end of a string, but also at the beginning and end of lines which are separated by newlines. I'm sure there is something equivalent for Java as well.
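The Java counterpart of re.MULTILINE is Pattern.MULTILINE. A rough Java equivalent of the session above might look like this, with LineFilter and acceptedLines being made-up names for the sketch:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LineFilter {
    // MULTILINE makes ^ and $ match at the start and end of every line.
    static final Pattern LINE = Pattern.compile("^\\S.+$", Pattern.MULTILINE);

    /** Returns the lines that do not start with whitespace (like re.findall). */
    static List<String> acceptedLines(String text) {
        List<String> lines = new ArrayList<>();
        Matcher m = LINE.matcher(text);
        while (m.find()) {
            lines.add(m.group());
        }
        return lines;
    }

    public static void main(String[] args) {
        // [This is a try!, And this is the second line!]
        System.out.println(acceptedLines("This is a try!\nAnd this is the second line!"));
        // [and this the second line]
        System.out.println(acceptedLines(" This is a try with initial spaces\nand this the second line"));
    }
}
```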
