Not able to escape regex using Karate framework in IntelliJ [duplicate]

This question already has an answer here:
regex: How to escape backslashes and special characters?
(1 answer)
Closed 4 years ago.
I'm testing a web service with the Karate framework in IntelliJ.
According to the framework's documentation, I should be able to use regex to assert XML responses, and I have been able to do so to some extent.
The problem arises when the regex I want to assert with contains a backslash, for example: "\X{20,}"
So I tried using three backslashes (\\\):
Then match response ...... rawData == '#regex \\\X{20,}'
and this gives me an error:
com.intuit.karate.exception.KarateException: Illegal/unsupported escape sequence near index 1
\X{20,}

The regex \\\X{20,} cannot be valid. Backslashes must be doubled in your feature file.
So \\ represents a single regex backslash (which escapes whatever follows it). If you want to match a single literal \ in the content, you need \\\\ in the feature file.
So your pattern should probably be \\\\X{20,} if the content should contain a backslash.
This is documented here.
Note that regex escaping has to be done with a double back-slash - for e.g: '#regex a\\.dot' will match 'a.dot'
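For comparison, the same layering can be seen in plain Java, whose regex engine produces the error message quoted in the question. This is only a sketch of the Java side, not Karate code: the class name, the xs helper and the sample inputs are made up for illustration, and how Karate itself unescapes the feature-file string is taken from the answer above.

```java
import java.util.regex.Pattern;

public class BackslashLayers {
    // Regex text: \\X{20,}  (a literal backslash, then 20 or more 'X' characters).
    // In Java source each regex backslash is doubled again, hence four in a row.
    static final String PATTERN = "\\\\X{20,}";

    static boolean matches(String input) {
        return Pattern.matches(PATTERN, input);
    }

    // Hypothetical helper: builds a string of n 'X' characters.
    static String xs(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append('X');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(PATTERN);                 // the regex text the engine receives
        System.out.println(matches("\\" + xs(20)));  // one backslash followed by twenty X
        System.out.println(matches(xs(20)));         // twenty X without the backslash
    }
}
```

Only the input that starts with a real backslash matches, because the regex text \\ stands for one literal backslash.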

Related

Find strings with regex expression [closed]

Closed 1 year ago.
I want to search for Java package names using the following expression:
com.company.*
Test example: https://regex101.com/r/tHTQd9/2
But when I use it in Java code, it doesn't find anything. Do I need to add escape characters for the .?
The following expression would work:
\bcom\.company\.\w[\w\.]*\b
Match between word-boundaries
Use literal dot characters by escaping
1 alphanumeric (or underscore) followed by 0 or more alphanumerics or dots
Pattern regex = Pattern.compile("\\bcom\\.company\\.\\w[\\w\\.]*\\b");
If you are looking for one or more word characters in the last segment, you can try (written as it would appear inside a Java string literal):
com\\.company\\.\\w+
Or, even more generic (one or more of any character):
com\\.company\\..+
Please remember that this is quite generic and prone to errors.
If you provide a more detailed explanation or constraints we can help building a better RegEx.
Why double backslash in Java?
We know that the backslash character is an escape character in Java String literals as well. Therefore, we need to double the backslash character when using it to precede any character (including the \ character itself).
Source
In Java, to escape a dot (.) you need to prefix it with a double backslash (\\), so your regex will look like this:
com\\.company\\.*
Why the double backslash is needed:
The dot (.) is a special symbol in regex, so you need to escape it with a backslash (\). But the backslash is also the escape character in Java string literals, so Java consumes it when it processes the string. To preserve it, we need to add another backslash (\).
The regex string as you write it in the source:
com\\.company\\.*
The string after Java has processed it, which is what the regex engine receives:
com\.company\.*
Note that the trailing \.* matches zero or more literal dots, so this pattern only covers the com.company prefix. Add something like [\w.]* after it if the rest of the package name should be matched too.
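The answers above can be tried out with a short sketch like the following. It uses the word-boundary pattern from the first answer; the class name, method name and sample text are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PackageFinder {
    // Regex text: \bcom\.company\.\w[\w.]*\b
    // Every backslash is doubled because this is a Java string literal.
    private static final Pattern PACKAGE =
            Pattern.compile("\\bcom\\.company\\.\\w[\\w.]*\\b");

    // Returns every substring of text that looks like a com.company.* name.
    static List<String> findPackages(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = PACKAGE.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String source = "import com.company.util.Foo; import org.other.Bar;";
        System.out.println(findPackages(source));
    }
}
```

Because the dots are escaped, a string such as comXcompany.foo is not reported, which the unescaped com.company.* would have accepted.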

Regex Unclosed character class in Java; escaping doesn't solve issue [duplicate]

This question already has answers here:
Using square brackets inside character class in Java regex
(1 answer)
Regular expression works on regex101.com, but not on prod
(1 answer)
Closed 2 years ago.
I have this regex in JS flavor
^[\w.-]+(?:\.[\w\.-]+)+[\w\-\._~:/?#[\]#!\$&'\(\)\*\+,;=.]+$
I tried it in JS and it worked for me (only want to match URL that does not start with the HTTP/HTTPS protocol):
https://regex101.com/r/6y2Gnd/2
Now I want to use the same regex in my Java backend. At first I got the error
Unclosed character class
After reading up on it, I realized I have to escape the backslashes, so I added three more backslashes to every backslash. The result is:
^[\\\\w.-]+(?:\\\\.[\\\\w\\\\.-]+)+[\\\\w\\\\-\\\\._~:/?#[\\\\]#!\\\\$&'\\\\(\\\\)\\\\*\\\\+,;=.]+$
Even though the compiler no longer shows any errors, the result was empty, i.e. the regex didn't match the cases it matched in the JS flavor.
I tested the Java regex here and in my code.
www.web.de # I want to match this
web.de # I want to match this
http://web.de # I do NOT want to match this
https://www.web.de # I do NOT want to match this
Anyone know what I'm missing?
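The following regex works in a Java regex tester:
^[\w.-]+(?:\.[\w\.-]+)+[\w\-\._~:/?#\[\]#!\$&'\(\)\*\+,;=.]+$
Please try it yourself. The only change is a backslash added in front of the [ character inside the last character class. Java treats an unescaped [ inside a character class as the start of a nested class, which is what causes the Unclosed character class error. In Java source code, each backslash in this regex is then doubled once, not quadrupled.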
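A minimal sketch of that fix as Java source, assuming the four sample inputs from the question. The class, field and method names are made up, and the second constant exists only to show the bracket left unescaped as in the JS version.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class UrlWithoutProtocol {
    // Regex text from the answer; in Java source every backslash is doubled once.
    static final String FIXED =
            "^[\\w.-]+(?:\\.[\\w\\.-]+)+[\\w\\-\\._~:/?#\\[\\]#!\\$&'\\(\\)\\*\\+,;=.]+$";

    // Same pattern with the '[' inside the last class left unescaped.
    static final String UNESCAPED_BRACKET =
            "^[\\w.-]+(?:\\.[\\w\\.-]+)+[\\w\\-\\._~:/?#[\\]#!\\$&'\\(\\)\\*\\+,;=.]+$";

    // Reports whether the regex engine accepts the pattern at all.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    static boolean matches(String input) {
        return Pattern.matches(FIXED, input);
    }

    public static void main(String[] args) {
        System.out.println(compiles(UNESCAPED_BRACKET));
        for (String s : new String[] {"www.web.de", "web.de", "http://web.de", "https://www.web.de"}) {
            System.out.println(s + " -> " + matches(s));
        }
    }
}
```

The two inputs without a protocol match, and the two starting with http:// or https:// do not, because : is not allowed before the first dot.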

Why is `\.` not a valid escape sequence in java regex [duplicate]

This question already has answers here:
java regex pattern unclosed character class
(2 answers)
Closed 6 years ago.
In the GWT tutorial where you build a stock watcher, there is this regex to check whether an input is valid:
if (!symbol.matches("^[0-9A-Z\\.]{1,10}$"))
Which allows inputs between 1 and 10 chars that are numbers, letters, or dots.
The part that confuses me is the \\.
I interpret this as an escaped backslash \\ followed by a ., which stands for any character. I thought the correct expression would be \. to escape the dot, but writing that results in the error "Invalid escape sequence" in Eclipse.
Am I missing the obvious here?
This is one of the hassles of regular expressions in Java. That \\ is not an escaped backslash at the regex level, just at the string level.
This string:
"^[0-9A-Z\\.]{1,10}$"
Defines this regular expression:
^[0-9A-Z\.]{1,10}$
...because the escape is consumed by the string literal.
\ is the escape symbol in a Java string literal. For instance, the newline character is written as \n. To place a plain \ in a Java string, you write \\.
So your Java string literal (the string in the code) "^[0-9A-Z\\.]{1,10}$" yields the actual string used for the regular expression, "^[0-9A-Z\.]{1,10}$" (with a single backslash). So, as you expected, this is \. in the regular expression.
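The difference between the two levels can be checked directly by printing the string. This is a small sketch; the class name and the sample symbols are assumptions, only the regex comes from the question.

```java
public class SymbolValidator {
    // Java source "^[0-9A-Z\\.]{1,10}$" holds the regex text ^[0-9A-Z\.]{1,10}$
    static final String SYMBOL_REGEX = "^[0-9A-Z\\.]{1,10}$";

    // True for 1 to 10 characters that are digits, upper-case letters or dots.
    static boolean isValid(String symbol) {
        return symbol.matches(SYMBOL_REGEX);
    }

    public static void main(String[] args) {
        System.out.println(SYMBOL_REGEX);    // shows the single backslash the regex engine sees
        System.out.println("\\.".length());  // the literal "\\." is two characters: \ and .
        System.out.println(isValid("BRK.A"));
        System.out.println(isValid("goog"));
    }
}
```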

how to separate a java string that is separated by "$"? [duplicate]

This question already has answers here:
illegal string body character after dollar sign
(5 answers)
Closed 8 years ago.
I am using Spock to test a Java app. It seems "$" is a special character in Groovy: any Java string that is separated by "$" can't be split properly in Groovy. Is there a workaround for this problem?
update
The "split" happens in Java code that I can't edit. It turns out that the Java code has the same problem as: Why can't I split a string with the dollar sign?
$ is a special character in Groovy strings only if you use GStrings (double-quoted strings). Independently of that, it is a special character in the string you give to String#split, because that string is interpreted as a regular expression, and in a regular expression, $ means "end of input" (or end of line, depending on flags).
If you're using String#split, to make it split on a literal $, you have to escape it with a backslash. To make the regex engine see a backslash, you have to escape the backslash in a string literal with another backslash.
Example:
'testing$one$two$three'.split('\\$').each {
println it
}
Output:
testing
one
two
three
Better yet, as suggested by Dónal, use tokenize:
Example:
'testing$one$two$three'.tokenize('$').each {
println it
}
(Same output)
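Since the update says the split actually happens in Java code, here is the same idea as a Java sketch. The class and method names are made up for illustration; Pattern.quote is shown as an alternative to escaping the delimiter by hand.

```java
import java.util.regex.Pattern;

public class DollarSplit {
    // "\\$" in Java source is the regex text \$ : a literal dollar sign.
    static String[] splitEscaped(String s) {
        return s.split("\\$");
    }

    // Pattern.quote wraps the delimiter so the regex engine treats it literally.
    static String[] splitQuoted(String s) {
        return s.split(Pattern.quote("$"));
    }

    public static void main(String[] args) {
        String input = "testing$one$two$three";
        System.out.println(String.join("|", splitEscaped(input)));
        System.out.println(String.join("|", splitQuoted(input)));
    }
}
```

Both variants split on the literal $, so either can replace a plain split("$") call.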

Splitting on "," but not "\," [duplicate]

This question already has answers here:
How to split a comma separated String while ignoring escaped commas?
(6 answers)
Closed 9 years ago.
I'm looking for a regular expression to match , but ignore \, in Java's regex engine. This comes close:
[^\\],
However, it matches the previous character (in addition to the comma), which won't work.
Perhaps the regular expression approach is the wrong one altogether. I was intending to use String.split() to parse a simple CSV file (can't use an external library) with escaped commas.
You need a negative look-behind assertion here:
String[] arr = str.split("(?<![^\\\\]\\\\),");
Note that you need four backslashes for each literal backslash. The regex needs \\ to match one literal backslash, and each of those two backslashes must be escaped again in the Java string literal.
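A short sketch of that split in context, assuming a made-up sample line; the class and method names are hypothetical. Note that the escaping backslash stays in the resulting field, so it would still have to be removed in a later step if the plain comma is wanted.

```java
public class EscapedCommaSplit {
    // Regex text: (?<![^\\]\\),
    // A comma that is not preceded by "non-backslash, backslash".
    static String[] splitUnescaped(String line) {
        return line.split("(?<![^\\\\]\\\\),");
    }

    public static void main(String[] args) {
        String line = "a,b\\,c,d"; // the actual characters are: a,b\,c,d
        for (String field : splitUnescaped(line)) {
            System.out.println(field);
        }
    }
}
```

The line is split at the first and last comma only, giving the three fields a, b\,c and d.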
