Regex Unclosed character class in Java; escaping doesn't solve issue [duplicate] - java

This question already has answers here:
Using square brackets inside character class in Java regex
(1 answer)
Regular expression works on regex101.com, but not on prod
(1 answer)
Closed 2 years ago.
I have this regex in JavaScript flavor:
^[\w.-]+(?:\.[\w\.-]+)+[\w\-\._~:/?#[\]#!\$&'\(\)\*\+,;=.]+$
I tried it in JS and it worked for me (I only want to match URLs that do not start with the HTTP/HTTPS protocol):
https://regex101.com/r/6y2Gnd/2
Now I want to use the same regex in my Java backend. At first I got the error
Unclosed character class
Upon reading into it, I realized I have to escape the \ backslash. I basically added three more backslashes to every backslash, so each \ became \\\\. The result is:
^[\\\\w.-]+(?:\\\\.[\\\\w\\\\.-]+)+[\\\\w\\\\-\\\\._~:/?#[\\\\]#!\\\\$&'\\\\(\\\\)\\\\*\\\\+,;=.]+$
Even though the compiler doesn't show any errors anymore, the result was empty, i.e. it didn't match the cases that the JS-flavor regex matched.
I tested the Java regex here and in my code.
www.web.de # I want to match this
web.de # I want to match this
http://web.de # I do NOT want to match this
https://www.web.de # I do NOT want to match this
Anyone know what I'm missing?

The following regex works well in a Java regex tester:
^[\w.-]+(?:\.[\w\.-]+)+[\w\-\._~:/?#\[\]#!\$&'\(\)\*\+,;=.]+$
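Please try it yourself. A backslash should be added in front of the [ character. In Java, an unescaped [ inside a character class starts a nested class, which is why the outer class is reported as unclosed; JavaScript treats it as a literal [. When you write the pattern as a Java string literal, double each regex backslash (\\w, \\[) instead of quadrupling it: four backslashes in the source become \\ in the regex, which matches a literal backslash.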
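A minimal Java sketch of the answer's fix, assuming the four test inputs from the question. The class and method names (UrlWithoutSchemeDemo, matchesWithoutScheme, jsVersionCompiles) are illustrative. It compiles the corrected pattern with each regex backslash written as \\ in the string literal, and also shows that the JavaScript version with only its backslashes doubled (the [ left unescaped) fails to compile in Java.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class UrlWithoutSchemeDemo {
    // Regex as the engine should see it:
    //   ^[\w.-]+(?:\.[\w\.-]+)+[\w\-\._~:/?#\[\]#!\$&'\(\)\*\+,;=.]+$
    // In a Java string literal every regex backslash is written twice: \ -> \\
    static final Pattern NO_SCHEME_URL = Pattern.compile(
            "^[\\w.-]+(?:\\.[\\w\\.-]+)+[\\w\\-\\._~:/?#\\[\\]#!\\$&'\\(\\)\\*\\+,;=.]+$");

    static boolean matchesWithoutScheme(String input) {
        return NO_SCHEME_URL.matcher(input).matches();
    }

    // The JS version with only its backslashes doubled: the unescaped [ opens a
    // nested class in Java, so the outer class is never closed.
    static boolean jsVersionCompiles() {
        try {
            Pattern.compile("^[\\w.-]+(?:\\.[\\w\\.-]+)+[\\w\\-\\._~:/?#[\\]#!\\$&'\\(\\)\\*\\+,;=.]+$");
            return true;
        } catch (PatternSyntaxException e) {
            System.out.println("PatternSyntaxException: " + e.getDescription());
            return false;
        }
    }

    public static void main(String[] args) {
        for (String input : new String[] {"www.web.de", "web.de", "http://web.de", "https://www.web.de"}) {
            System.out.println(input + " -> " + matchesWithoutScheme(input));
        }
        System.out.println("JS version compiles in Java: " + jsVersionCompiles());
    }
}
```

The scheme-prefixed inputs fail because [\w.-]+ stops at the colon and the next part of the pattern requires a literal dot.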

Related

Java regex match part of path [duplicate]

This question already has answers here:
Regular expression to match a backslash followed by a quote
(3 answers)
Escaping special characters in Java Regular Expressions
(7 answers)
How to check if a directory is inside other directory
(1 answer)
Closed 4 months ago.
The community reviewed whether to reopen this question 4 months ago and left it closed:
Original close reason(s) were not resolved
I am trying to check if a chosen path is a valid path for my Java program. In order to be valid, it must match the path E:\test\(someFolderName)\. The chosen folder can be deeper in that directory.
This is what I have tried:
String a = "E:\\test\\anotherFolder";
if (a.matches("E:\\\\btest\\b\\.*")) {
System.out.println("match");
}
I have also tried putting test into [] but it did not work.
\b would mark the beginning of a word boundary, and adding \b again should close it, correct?
.* would match any character 1 to infinite times.
So, is there a problem with the escaping? Or do I need to group it differently?
Possible duplicate of Escaping special characters in Java Regular Expressions.
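You need more backslashes: "The 4 backslashes in the Java string turn into 2 backslashes in the regex pattern. 2 backslashes in a regex pattern match the backslash itself." Also note that in your pattern the 4 backslashes are followed by btest, so the regex looks for a literal backslash followed by the text btest. And \\.* in the string becomes \.* in the regex, which matches zero or more literal dots rather than any characters.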
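A sketch of a corrected check, assuming the goal is "the path starts with E:\test\ followed by anything". The regex the engine needs is E:\\test\\.* (each \\ matches one literal backslash), and each of those backslashes is doubled again in the Java string literal. Note that .* matches zero or more characters; use .+ if a folder name after E:\test\ must be present. isUnderTestFolder is a hypothetical helper name.

```java
public class PathPrefixDemo {
    // Regex as the engine sees it: E:\\test\\.*
    //   \\ matches one literal backslash; .* matches zero or more of any character.
    // In the Java string literal each regex backslash is doubled again,
    // so one literal backslash in the path costs four backslashes in the source.
    static boolean isUnderTestFolder(String path) {
        return path.matches("E:\\\\test\\\\.*");
    }

    public static void main(String[] args) {
        System.out.println(isUnderTestFolder("E:\\test\\anotherFolder"));     // true
        System.out.println(isUnderTestFolder("E:\\test\\a\\deeper\\folder")); // true
        System.out.println(isUnderTestFolder("E:\\testing\\anotherFolder"));  // false
        System.out.println(isUnderTestFolder("E:\\test"));                    // false
    }
}
```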

not able to escape regex using karate framework on IntelliJ [duplicate]

This question already has an answer here:
regex: How to escape backslashes and special characters?
(1 answer)
Closed 4 years ago.
I'm testing a web service with the Karate framework in IntelliJ.
According to the framework's documentation, I should be able to use regex to assert XML responses, and I have been able to use it to some extent.
But the problem arises when I want to assert with a regex that contains a backslash, for example: "\X{20,}"
So I tried (using three backslashes, \\\):
Then match response ...... rawData == '#regex \\\X{20,}'
and this gives me an error:
com.intuit.karate.exception.KarateException: Illegal/unsupported escape sequence near index 1
\X{20,}
The regex \\\X{20,} cannot be valid. A double backslash is required in your feature file.
So \\ represents a single regex backslash (meant to escape what follows it). If you want to match a single literal \ in your regex, you need \\\\ in the feature file.
So your pattern should probably be \\\\X{20,} if the content should contain a backslash.
This is documented here.
Note that regex escaping has to be done with a double backslash, e.g. '#regex a\\.dot' will match 'a.dot'.
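The sketch below does not use Karate. It only illustrates, with plain Java string literals (which follow the same double-backslash convention the answer describes), how a\\.dot matches only a literal dot and how \\\\X{20,} becomes the regex \\X{20,}: one literal backslash followed by 20 or more X characters. Class and field names are illustrative; String.repeat requires Java 11 or later.

```java
import java.util.regex.Pattern;

public class DoubleEscapeDemo {
    // Java literal "a\\.dot" -> regex a\.dot -> the dot is literal
    static final Pattern LITERAL_DOT = Pattern.compile("a\\.dot");

    // Java literal "\\\\X{20,}" -> regex \\X{20,} -> one literal backslash, then 20 or more 'X'
    static final Pattern BACKSLASH_THEN_XS = Pattern.compile("\\\\X{20,}");

    public static void main(String[] args) {
        System.out.println(LITERAL_DOT.matcher("a.dot").matches()); // true
        System.out.println(LITERAL_DOT.matcher("aXdot").matches()); // false

        String input = "\\" + "X".repeat(20); // a backslash followed by 20 X characters
        System.out.println(BACKSLASH_THEN_XS.matcher(input).matches()); // true
    }
}
```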

Why is the beginning of a line not recognized by a regex? [duplicate]

This question already has answers here:
How to use beginning and endline markers in regex for Java String?
(5 answers)
How to use java regex to match a line
(2 answers)
Closed 5 years ago.
Given the following expression:
Pattern.compile("^Test.*\n").matcher("Test 123\nNothing\nTest 2\n").replaceAll("foo\n")
This yields:
"foo\nNothing\nTest 2\n"
for me. I expected the last line to be replaced with foo\n as well, since there is a line break immediately before Test 2 in the input string.
Why doesn't the regex match there?
You have to add the multiline flag to the pattern: Pattern.MULTILINE.
Pattern.compile("^Test.*\n", Pattern.MULTILINE).matcher("Test 123\nNothing\nTest 2\n").replaceAll("foo\n")
By default, ^ only matches at the beginning of the whole input, not at the beginning of each line. For more information, see the Javadoc.
At the beginning of your regex you have a ^ sign, which normally anchors the regex to the beginning of the tested string. You need to specify the multiline regex option (see the Oracle documentation) to make it apply to the start of each line instead.
Try this (I have split the lines for legibility; feel free to put it back on one line):
Pattern.compile("^Test.*\n", Pattern.MULTILINE)
.matcher("Test 123\nNothing\nTest 2\n")
.replaceAll("foo\n")
Unfortunately I do not have a Java environment set up at the moment, so I'm unable to check this myself.
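A small runnable sketch (class and method names are illustrative) that compares the call without and with Pattern.MULTILINE, and also uses the embedded flag (?m), which turns on the same mode from inside the pattern. Newlines are printed as \n for readability.

```java
import java.util.regex.Pattern;

public class MultilineDemo {
    static final String INPUT = "Test 123\nNothing\nTest 2\n";

    static String withoutFlag() {
        // ^ only matches at the start of the whole input
        return Pattern.compile("^Test.*\n").matcher(INPUT).replaceAll("foo\n");
    }

    static String withMultilineFlag() {
        // ^ also matches right after each line terminator
        return Pattern.compile("^Test.*\n", Pattern.MULTILINE).matcher(INPUT).replaceAll("foo\n");
    }

    static String withInlineFlag() {
        // (?m) is the embedded form of Pattern.MULTILINE
        return INPUT.replaceAll("(?m)^Test.*\n", "foo\n");
    }

    public static void main(String[] args) {
        System.out.println(withoutFlag().replace("\n", "\\n"));       // foo\nNothing\nTest 2\n
        System.out.println(withMultilineFlag().replace("\n", "\\n")); // foo\nNothing\nfoo\n
        System.out.println(withInlineFlag().replace("\n", "\\n"));    // foo\nNothing\nfoo\n
    }
}
```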

how to separate a java string that is separated by "$"? [duplicate]

This question already has answers here:
illegal string body character after dollar sign
(5 answers)
Closed 8 years ago.
I am using Spock to test a Java app. It seems "$" is a special character in Groovy: any Java string that is separated by "$" can't be split properly in Groovy. Is there any workaround for this problem?
Update:
The "split" happens in Java code that I can't edit. It turns out that the Java code has the same problem as: Why can't I split a string with the dollar sign?
I don't think $ is a special character in Groovy strings. Edit: Yes, it is, if you use GStrings! But the rest may still be useful: $ is also a special character in the string you give to String#split, because that string is interpreted as a regular expression, and in a regular expression $ means "end of input" (or end of line, depending on flags).
If you're using String#split, to make it split on a literal $, you have to escape it with a backslash. To make the regex engine see a backslash, you have to escape the backslash in a string literal with another backslash.
Example:
'testing$one$two$three'.split('\\$').each {
println it
}
Output:
testing
one
two
three
Better yet, as suggested by Dónal, use tokenize:
Example:
'testing$one$two$three'.tokenize('$').each {
println it
}
(Same output)
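Since the asker's split lives in Java code, here is a minimal Java sketch of the same fix, assuming the same input string. It splits on an escaped \\$, on Pattern.quote("$"), and, for comparison, on an unescaped "$", where the regex only matches the zero-width end of input, so the string comes back whole. Class and method names are illustrative.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitOnDollarDemo {
    static String[] splitEscaped(String s) {
        // Java literal "\\$" -> regex \$ -> a literal dollar sign
        return s.split("\\$");
    }

    static String[] splitQuoted(String s) {
        // Pattern.quote wraps the text in \Q...\E so none of it is treated as regex syntax
        return s.split(Pattern.quote("$"));
    }

    static String[] splitUnescaped(String s) {
        // Regex $ is the end-of-input anchor (zero-width), so nothing is split off
        return s.split("$");
    }

    public static void main(String[] args) {
        String s = "testing$one$two$three";
        System.out.println(Arrays.toString(splitEscaped(s)));   // [testing, one, two, three]
        System.out.println(Arrays.toString(splitQuoted(s)));    // [testing, one, two, three]
        System.out.println(Arrays.toString(splitUnescaped(s))); // [testing$one$two$three]
    }
}
```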

Java RegEx back slash Number [duplicate]

This question already has answers here:
What's the meaning of a number after a backslash in a regular expression?
(2 answers)
Closed 8 years ago.
What does it mean to have a \number in a regex in Java?
Let's say I have something like \1 or \2. What does this mean and how is it used?
An example would be really helpful.
Thanks
Backreferences match the same text as previously matched by a capturing group. Suppose you want to match a pair of opening and closing HTML tags, and the text in between. By putting the opening tag into a backreference, we can reuse the name of the tag for the closing tag. Here's how:
<([A-Z][A-Z0-9]*)\b[^>]*>.*?</\1>
This regex contains only one pair of parentheses, which capture the string matched by [A-Z][A-Z0-9]*. The backreference \1 (backslash one) references the first capturing group. \1 matches the exact same text that was matched by the first capturing group. The / before it is a literal character. It is simply the forward slash in the closing HTML tag that we are trying to match.
For more details and examples check:
http://www.regular-expressions.info/backref.html
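A minimal Java sketch of the backreference from the quoted answer. In a Java string literal \1 is written \\1. The second pattern (a doubled-word finder) is an extra illustration, not from the answer; class and field names are illustrative. The quoted tag regex only matches upper-case tag names unless you add Pattern.CASE_INSENSITIVE.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BackreferenceDemo {
    // Java literal "\\1" -> regex \1: must repeat the exact text captured by group 1
    static final Pattern TAG_PAIR = Pattern.compile("<([A-Z][A-Z0-9]*)\\b[^>]*>.*?</\\1>");

    // Extra example: a word immediately repeated, e.g. "the the"
    static final Pattern DOUBLED_WORD = Pattern.compile("\\b(\\w+)\\s+\\1\\b");

    public static void main(String[] args) {
        System.out.println(TAG_PAIR.matcher("<B>bold</B>").matches()); // true
        System.out.println(TAG_PAIR.matcher("<B>bold</I>").matches()); // false

        Matcher m = DOUBLED_WORD.matcher("this is the the end");
        if (m.find()) {
            System.out.println("doubled: " + m.group(1)); // doubled: the
        }
    }
}
```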
\ is usually used to start a special construct in a pattern, such as \d, \b or a backreference like \1.
It also serves as the escape character.
