Weird behavior - Java regex expression taken from DB - java

I have a regex expression stored in a DB, '\\\\E\\\\'. I use Java to fetch it and match it against Strings.
I thought that since Java reads it from the DB, it knows to escape SQL special characters by itself, and all I need is to escape the regex special characters, so this expression should actually match '\\E\\'.
The problem is that it matches '\E\' rather than '\\E\\'. Why?

If you want to use a regex to match one literal backslash character, you need to write four backslashes in a Java string literal.
The regex \\ matches one literal backslash.
The string literal "\\" denotes a single backslash.
Therefore, in order to build a regex that consists of two backslashes, you need a Java string literal with four backslashes.
So you need "\\\\\\\\E\\\\\\\\" to construct a regex that matches \\E\\.
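A minimal sketch of the escaping levels described above. The class and method names are hypothetical. It assumes that a string read at run time (for example from a database) reaches Pattern.compile character for character, with no Java-literal unescaping, so only the regex level of escaping applies to it.

```java
import java.util.regex.Pattern;

class BackslashLevels {
    // True if the whole input matches the given regex text.
    static boolean matches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // Regex text of 2 backslashes matches ONE backslash.
        System.out.println(matches("\\\\", "\\"));                      // true

        // Regex text of 2 backslashes, E, 2 backslashes matches 1 + E + 1.
        System.out.println(matches("\\\\E\\\\", "\\E\\"));              // true

        // Regex text of 4 backslashes, E, 4 backslashes matches 2 + E + 2.
        System.out.println(matches("\\\\\\\\E\\\\\\\\", "\\\\E\\\\"));  // true

        // The shorter regex does not match the doubled input.
        System.out.println(matches("\\\\E\\\\", "\\\\E\\\\"));          // false
    }
}
```

If the value that actually arrives in Java has only two backslashes on each side of the E (for instance because the SQL layer already consumed one level of escaping when the row was written), it behaves like the second case and matches a single backslash on each side. Printing the fetched string and its length is a quick way to check which case applies.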

Related

How to replace a space exactly with "\\\\s+" [duplicate]

I'm trying to convert the String \something\ into the String \\something\\ using replaceAll, but I keep getting all kinds of errors. I thought this was the solution:
theString.replaceAll("\\", "\\\\");
But this gives the below exception:
java.util.regex.PatternSyntaxException: Unexpected internal error near index 1
The String#replaceAll() interprets the argument as a regular expression. The \ is an escape character in both String and regex. You need to double-escape it for regex:
string.replaceAll("\\\\", "\\\\\\\\");
But you don't necessarily need regex for this, simply because you want an exact character-by-character replacement and you don't need patterns here. So String#replace() should suffice:
string.replace("\\", "\\\\");
Update: as per the comments, you appear to want to use the string in JavaScript context. You'd perhaps better use StringEscapeUtils#escapeEcmaScript() instead to cover more characters.
TLDR: use theString = theString.replace("\\", "\\\\"); instead.
Problem
replaceAll(target, replacement) uses regular expression (regex) syntax for target and partially for replacement.
The problem is that \ is a special character in regex (it can be used like \d to represent a digit) and in String literals (it can be used like "\n" to represent a line separator, or \" to escape the double quote symbol, which would normally end the string literal).
In both cases, to create a \ symbol we can escape it (make it a literal instead of a special character) by placing an additional \ before it (just as we escape " in string literals via \").
So the target regex representing the \ symbol needs to hold \\, and the string literal representing that text needs to look like "\\\\".
So we escaped \ twice:
once in the regex: \\
once in the String literal: "\\\\" (each \ is represented as "\\").
In the replacement, \ is also special. It lets us escape the other special character, $, which via the $x notation lets us use the portion of the data matched by the regex and held by the capturing group with index x. For example, "012".replaceAll("(\\d)", "$1$1") will match each digit, place it in capturing group 1, and $1$1 will replace it with two copies of itself, resulting in "001122".
So again, to let the replacement represent a literal \ we need to escape it with an additional \, which means that:
the replacement must hold two backslash characters, \\, for each backslash it should produce
and the String literal which represents \\ looks like "\\\\".
BUT since we want the replacement to produce two backslashes, we will need "\\\\\\\\" (each \ represented by one "\\\\").
So the version with replaceAll can look like
replaceAll("\\\\", "\\\\\\\\");
Easier way with replaceAll
To make our life easier, Java provides tools to automatically escape text for the target and replacement parts. So now we can focus only on strings and forget about regex syntax:
replaceAll(Pattern.quote(target), Matcher.quoteReplacement(replacement))
which in our case can look like
replaceAll(Pattern.quote("\\"), Matcher.quoteReplacement("\\\\"))
Even better: use replace
If we don't really need regex syntax support, let's not involve replaceAll at all. Instead, let's use replace. Both methods replace all occurrences of the target, but replace doesn't involve regex syntax. So you could simply write
theString = theString.replace("\\", "\\\\");
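The three variants above can be compared side by side. This is only a sketch; the class and method names are made up for illustration, and the input is the \something\ string from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DoubleBackslashes {
    // Hand-escaped: regex of 2 backslashes, replacement of 4 backslashes.
    static String viaReplaceAll(String s) {
        return s.replaceAll("\\\\", "\\\\\\\\");
    }

    // Let the library escape the plain target and replacement strings.
    static String viaQuoting(String s) {
        return s.replaceAll(Pattern.quote("\\"), Matcher.quoteReplacement("\\\\"));
    }

    // No regex involved: only Java-literal escaping is needed.
    static String viaReplace(String s) {
        return s.replace("\\", "\\\\");
    }

    public static void main(String[] args) {
        String input = "\\something\\";           // one backslash on each side
        System.out.println(viaReplaceAll(input)); // two backslashes on each side
        System.out.println(viaQuoting(input));    // same result
        System.out.println(viaReplace(input));    // same result
    }
}
```

All three produce the same string; the last one is the easiest to read because only one level of escaping is involved.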
To avoid this sort of trouble, you can use replace (which takes a plain string) instead of replaceAll (which takes a regular expression). You will still need to escape backslashes, but not in the wild ways required with regular expressions.
You'll need to escape the (escaped) backslash in the first argument, as it is a regular expression. The replacement (2nd argument, see Matcher#replaceAll(String)) also gives backslashes a special meaning, so you'll have to escape those too:
theString.replaceAll("\\\\", "\\\\\\\\");
Yes... by the time the regex compiler sees the pattern you've given it, it sees only a single backslash (since Java's lexer has turned the double backwhack into a single one). You need to replace "\\\\" with "\\\\\\\\", believe it or not! Java really needs a good raw string syntax.
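To see why the original attempt fails, here is a small sketch (hypothetical class and method names) that checks whether a given regex text compiles at all. The literal "\\" produces a pattern made of a single trailing backslash, which the regex compiler rejects.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class LoneBackslashPattern {
    // True if the text is a syntactically valid regex.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(compiles("\\"));   // false: a lone backslash escapes nothing
        System.out.println(compiles("\\\\")); // true: an escaped backslash
    }
}
```

The exact exception message differs between Java versions, so the sketch only checks whether compilation succeeds.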

PatternSyntaxException when splitting string at "*" character

Every time I try to split the string "hello*world" using s.split("*"); I get a PatternSyntaxException.
I have tried using s.split("\*"); but that gives me another error. I'm sure this is something simple.
How do I stop this?
* is a meta-character in regular expressions: a quantifier that matches zero or more occurrences of the preceding element, so a pattern consisting of * alone is invalid.
Try using 2 backslash characters:
s.split("\\*");
The split method takes a regular expression as the argument, not a normal string. The * has special meaning in regular expressions. If you want to split on a literal *, you have to escape it with a backslash. But the backslash is also an escape character in Java string literals, so you have to escape the backslash too by using two backslashes:
s.split("\\*")
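A short sketch of both ways to split on a literal asterisk. The class and method names are hypothetical; Pattern.quote is shown as an alternative for cases where the delimiter is not known in advance.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitOnStar {
    // Escape the asterisk by hand: the regex text is backslash + asterisk.
    static String[] splitEscaped(String s) {
        return s.split("\\*");
    }

    // Let Pattern.quote turn any delimiter into a literal pattern.
    static String[] splitQuoted(String s, String delimiter) {
        return s.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitEscaped("hello*world")));     // [hello, world]
        System.out.println(Arrays.toString(splitQuoted("hello*world", "*"))); // [hello, world]
    }
}
```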

Regex matching groups

I have an issue with the replaceAll function of Java's String:
replaceAll("regex", "replacement");
works fine, but whenever my "replacement" string contains a substring like "$0", "$1", etc., it causes a problem by substituting these $x's with the corresponding matching group.
For instance
input ="NAME";
input.replaceAll("NAME", "HAR$0I");
will result in the string "HARNAMEI", because the replacement string contains "$0", which is substituted by the matching group "NAME". How can I override that behavior? I need the result string to be "HAR$0I".
I escaped the $, i.e. I converted the replacement string to "HAR\\$0I", which worked fine. But I am looking for a method in Java that will do this for me for all characters that have a special meaning in the regex world.
The documentation of java.lang.String.replaceAll() says:
Note that backslashes (\) and dollar signs ($) in the replacement
string may cause the results to be different than if it were being
treated as a literal replacement string; see Matcher.replaceAll. Use
Matcher.quoteReplacement(java.lang.String) to suppress the special
meaning of these characters, if desired.
The documentation of String quoteReplacement(String s) says:
Returns a literal replacement String for the specified String. This
method produces a String that will work as a literal replacement s in
the appendReplacement method of the Matcher class. The String produced
will match the sequence of characters in s treated as a literal
sequence. Slashes ('\') and dollar signs ('$') will be given no
special meaning.
$ in the replacement is a special character that lets you use groups. To make it literal you need to escape it as \$, which must be written as "\\$" in a string literal. The same rule applies to \, since it is the special character used to escape $. If you want a literal \ in the replacement, you also need to escape it with another \, so you would write it as "\\\\".
To simplify this process you can just use Matcher.quoteReplacement("yourReplacement").
In case where you don't need to use regular expression you can simplify it even more and use
replace("NAME", "HAR$0I")
instead of
replaceAll("NAME", Matcher.quoteReplacement("HAR$0I"))
It sounds like you're actually trying to replace raw strings, without using regexes at all.
You should simply call String.replace(), which does literal replacements without using regexes.
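The behaviours discussed in this thread can be put next to each other in a small sketch. The class and method names are hypothetical; the input and replacement strings are the ones from the question.

```java
import java.util.regex.Matcher;

class LiteralReplacement {
    // $0 in the replacement is read as "the whole match".
    static String withGroupReference(String input) {
        return input.replaceAll("NAME", "HAR$0I");
    }

    // quoteReplacement escapes $ and \ so the replacement is taken literally.
    static String withQuoteReplacement(String input) {
        return input.replaceAll("NAME", Matcher.quoteReplacement("HAR$0I"));
    }

    // replace() never interprets regex or replacement syntax.
    static String withReplace(String input) {
        return input.replace("NAME", "HAR$0I");
    }

    public static void main(String[] args) {
        System.out.println(withGroupReference("NAME"));   // HARNAMEI
        System.out.println(withQuoteReplacement("NAME")); // HAR$0I
        System.out.println(withReplace("NAME"));          // HAR$0I
    }
}
```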

Java regular expressions and dollar sign

I have Java string:
String b = "/feedback/com.school.edu.domain.feedback.Review$0/feedbackId";
I also have generated pattern against which I want to match this string:
String pattern = "/feedback/com.school.edu.domain.feedback.Review$0(.)*";
When I call b.matches(pattern) it returns false. I know the dollar sign is part of Java regex syntax, but I don't know what my pattern should look like. I assume the $ in the pattern needs to be preceded by some escape characters, but I don't know how many. The $ sign is important to me because it helps me distinguish elements in a list (the numbers after the dollar), and I can't go without it.
Use
String escapedString = java.util.regex.Pattern.quote(myString)
to automatically escape all special regex characters in a given string.
You need to escape $ in the regex with a backslash (\), but as the backslash is also an escape character in string literals, you need to escape the backslash itself.
You will need to escape any other special regex character the same way, for example ".".
String pattern = "/feedback/com\\.navteq\\.lcms\\.common\\.domain\\.poi\\.feedback\\.Review\\$0(.)*";
In Java regex both . and $ are special. You need to escape them with 2 backslashes, i.e.
"/feedback/com\\.navtag\\.etc\\.Review\\$0(.*)"
(1 backslash is for the Java string, and 1 is for the regex engine.)
Escape the dollar with \ (written as \\ in the string literal):
String pattern =
"/feedback/com.navteq.lcms.common.domain.poi.feedback.Review\\$0(.)*";
I advise you to escape . as well, since an unescaped . represents any character.
String pattern =
"/feedback/com\\.navteq\\.lcms\\.common\\.domain\\.poi\\.feedback\\.Review\\$0(.)*";
The answer by @Colin Hebert, edited by @theon, is correct. The explanation is as follows, @azec-pdx.
It is a regex written as a string literal (within double quotes).
Period (.) and dollar sign ($) are special regex characters (metacharacters).
To make the regex engine interpret them as the ordinary characters period (.) and dollar sign ($), you need to prefix each with a single backslash. The backslash (itself a special regex character) quotes the character following it, thus escaping it.
Since the given regex is a string literal, each of those backslashes needs another backslash in front of it, so that the compiler does not read it as the start of one of the usual string-literal escapes (character, string and Unicode escapes) and reject it as an unknown one.
Even when a regex construct happens to coincide with a valid string-literal escape sequence, it still needs the extra backslash, this time to avoid silently getting the wrong character. For example, the regex construct \b (word boundary) clashes with the string escape \b (backspace). Prefixing another backslash avoids the clash: "\\b" is read by the regex engine as a word boundary.
To be always safe, prefix every single-backslash escape (quote) within a string literal with another backslash. For example, the string literal "\(hello\)" is illegal and leads to a compile-time error; in order to match the string (hello) the string literal "\\(hello\\)" must be used.
The period in the final (.)* is meant to be interpreted as a special regex character, so it needs no quoting backslash, let alone a second one.
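A sketch that combines the answers above, using the package name from the question. The class and method names are hypothetical. It shows the unescaped pattern failing, the hand-escaped pattern matching, and Pattern.quote applied only to the literal prefix so that the trailing .* keeps its regex meaning.

```java
import java.util.regex.Pattern;

class DollarInPattern {
    // The literal part of the path, exactly as it appears in the input.
    static final String PREFIX = "/feedback/com.school.edu.domain.feedback.Review$0";

    // Unescaped: $ is read as an end-of-input anchor, so this cannot match.
    static boolean matchesUnescaped(String s) {
        return s.matches(PREFIX + "(.)*");
    }

    // Every . and $ in the literal part escaped by hand.
    static boolean matchesEscapedByHand(String s) {
        return s.matches("/feedback/com\\.school\\.edu\\.domain\\.feedback\\.Review\\$0(.)*");
    }

    // Quote only the literal prefix; leave the wildcard suffix as regex.
    static boolean matchesQuotedPrefix(String s) {
        return s.matches(Pattern.quote(PREFIX) + ".*");
    }

    public static void main(String[] args) {
        String b = "/feedback/com.school.edu.domain.feedback.Review$0/feedbackId";
        System.out.println(matchesUnescaped(b));     // false
        System.out.println(matchesEscapedByHand(b)); // true
        System.out.println(matchesQuotedPrefix(b));  // true
    }
}
```

Quoting the whole pattern, including the (.)* part, would turn the wildcard into literal text as well, which is why the sketch quotes the prefix alone.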

