Java regular expressions and dollar sign - java

I have Java string:
String b = "/feedback/com.school.edu.domain.feedback.Review$0/feedbackId";
I also have generated pattern against which I want to match this string:
String pattern = "/feedback/com.school.edu.domain.feedback.Review$0(.)*";
When I call b.matches(pattern) it returns false. I know the dollar sign has a special meaning in Java regex, but I don't know what my pattern should look like. I assume the $ in the pattern needs to be escaped, but I don't know with how many escape characters. The $ sign is important to me because it helps me distinguish elements in a list (the number after the dollar), so I can't drop it.

Use
String escapedString = java.util.regex.Pattern.quote(myString);
to automatically escape all special regex characters in a given string.
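As a rough sketch of how that could be applied to the question: quote only the literal prefix and append the unquoted wildcard part afterwards. The class and method names here (QuotePrefixDemo, prefixPattern) are made up for illustration.

```java
import java.util.regex.Pattern;

public class QuotePrefixDemo {
    // Hypothetical helper: builds a regex that matches any input
    // starting with the given prefix, taken literally.
    public static String prefixPattern(String literalPrefix) {
        return Pattern.quote(literalPrefix) + ".*";
    }

    public static void main(String[] args) {
        String b = "/feedback/com.school.edu.domain.feedback.Review$0/feedbackId";
        String prefix = "/feedback/com.school.edu.domain.feedback.Review$0";

        // Quoted prefix: '.' and '$' are treated as plain characters.
        System.out.println(b.matches(prefixPattern(prefix))); // true

        // Unquoted prefix: '$' acts as an end-of-input anchor, so nothing can follow it.
        System.out.println(b.matches(prefix + "(.)*")); // false
    }
}
```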

You need to escape $ in the regex with a back-slash (\), but as a back-slash is an escape character in strings you need to escape the back-slash itself.
You will need to escape any special regex char the same way, for example with ".".
String pattern = "/feedback/com\\.navteq\\.lcms\\.common\\.domain\\.poi\\.feedback\\.Review\\$0(.)*";

In Java regex both . and $ are special. You need to escape them with 2 backslashes, i.e.:
"/feedback/com\\.navtag\\.etc\\.Review\\$0(.*)"
(1 backslash is for the Java string, and 1 is for the regex engine.)

Escape the dollar with \
String pattern =
"/feedback/com.navteq.lcms.common.domain.poi.feedback.Review\\$0(.)*";
I advise you to escape . as well, since an unescaped . matches any character.
String pattern =
"/feedback/com\\.navteq\\.lcms\\.common\\.domain\\.poi\\.feedback\\.Review\\$0(.)*";

The answer by @Colin Hebert, as edited by @theon, is correct. The explanation is as follows, @azec-pdx.
It is a regex as a string literal (within double quotes).
period (.) and dollar-sign ($) are special regex characters (metacharacters).
To make the regex engine interpret them as the ordinary characters period (.) and dollar sign ($), you need to prefix each with a single backslash. The backslash (itself a special regex character) quotes the character that follows it and thus escapes it.
Since the given regex is a string literal, another backslash has to be prefixed to each of those backslashes, so that they are not confused with the usual escapes in string literals (character, string and Unicode escapes) and do not cause a compiler error.
Even if you use a special regex construct that happens to be defined as a string escape sequence, it needs to be prefixed with another backslash within a string literal. For example, the regex construct \b (word boundary) would clash with the character escape \b (backspace). So another backslash is prefixed to avoid the clash, and \\b is then read by the regex engine as a word boundary.
To be always safe, all single backslash escapes (quotes) within string literals are prefixed with another backslash. For example, the string literal "\(hello\)" is illegal and leads to a compile-time error; in order to match the string (hello) the string literal "\\(hello\\)" must be used.
The last period (.)* is supposed to be interpreted as special regex character and thus it needs no quoting by a backslash, let alone prefixing a second one.
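To make the two layers of escaping concrete, here is a small sketch using the string from the question (the answers above use a different package name; the class name DollarEscapeDemo is just for illustration). Printing the pattern shows what the regex engine actually receives.

```java
public class DollarEscapeDemo {
    public static final String INPUT =
            "/feedback/com.school.edu.domain.feedback.Review$0/feedbackId";

    // Unescaped: '$' is an anchor and '.' matches any character.
    public static final String UNESCAPED =
            "/feedback/com.school.edu.domain.feedback.Review$0(.)*";

    // Escaped: each "\\" in the source becomes one backslash in the string,
    // and that single backslash escapes '.' or '$' for the regex engine.
    public static final String ESCAPED =
            "/feedback/com\\.school\\.edu\\.domain\\.feedback\\.Review\\$0(.)*";

    public static void main(String[] args) {
        // What the regex engine sees: single backslashes only.
        System.out.println(ESCAPED);

        System.out.println(INPUT.matches(UNESCAPED)); // false
        System.out.println(INPUT.matches(ESCAPED));   // true
    }
}
```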

Related

How to put [] in my regex [duplicate]

I have comma separated list of regular expressions:
.{8},[0-9],[^0-9A-Za-z ],[A-Z],[a-z]
I have split it on the commas. Now I'm trying to match each regex against a generated password. The problem is that Pattern.compile does not like square brackets that are not escaped.
Can someone please give me a simple function that takes a string like so: [0-9] and returns the escaped string \[0-9\].
For some reason, the above answer didn't work for me. For those like me who come after, here is what I found.
I was expecting a single backslash to escape the bracket, however, you must use two if you have the pattern stored in a string. The first backslash escapes the second one into the string, so that what regex sees is \]. Since regex just sees one backslash, it uses it to escape the square bracket.
\\]
In regex, that will match a single closing square bracket.
If you're trying to match a newline, on the other hand, a single backslash is enough. There you're using the string escape sequence to insert a newline character into the string. Regex doesn't see \n; it sees the newline character and matches that. For the bracket you need two backslashes because \] is not a string escape sequence, it's a regex escape sequence.
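A minimal sketch of that difference, with made-up names (BracketVsNewlineDemo and its helpers):

```java
import java.util.regex.Pattern;

public class BracketVsNewlineDemo {
    // "\\]" in source is the two-character string \] : a regex escape for ']'.
    public static boolean findsClosingBracket(String s) {
        return Pattern.compile("\\]").matcher(s).find();
    }

    // "\n" in source is already a newline character when regex receives it.
    public static boolean findsNewline(String s) {
        return Pattern.compile("\n").matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(findsClosingBracket("[0-9]"));        // true
        System.out.println(findsNewline("line one\nline two"));  // true
        System.out.println(findsNewline("no line break here"));  // false
    }
}
```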
You can use Pattern.quote(String).
From the docs:
public static String quote(String s)
Returns a literal pattern String for the specified String.
This method produces a String that can be used to create a Pattern that would match the string s as if it were a literal pattern.
Metacharacters or escape sequences in the input sequence will be given no special meaning.
You can use the \Q and \E special characters...anything between \Q and \E is automatically escaped.
\Q[0-9]\E
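A quick sketch of what literal quoting does to [0-9], either with \Q...\E written by hand or via Pattern.quote. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class LiteralQuotingDemo {
    // Hypothetical helper: true if input equals the literal text, compared via regex.
    public static boolean matchesLiterally(String literal, String input) {
        return input.matches(Pattern.quote(literal));
    }

    public static void main(String[] args) {
        // Unquoted, [0-9] is a character class matching one digit.
        System.out.println("5".matches("[0-9]"));             // true

        // Quoted, it only matches the five characters [0-9] themselves.
        System.out.println("5".matches("\\Q[0-9]\\E"));       // false
        System.out.println("[0-9]".matches("\\Q[0-9]\\E"));   // true
        System.out.println(matchesLiterally("[0-9]", "[0-9]")); // true
    }
}
```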
Pattern.compile() likes square brackets just fine. If you take the string
".{8},[0-9],[^0-9A-Za-z ],[A-Z],[a-z]"
and split it on commas, you end up with five perfectly valid regexes: the first one matches eight non-line-separator characters, the second matches an ASCII digit, and so on. Unless you really want to match strings like ".{8}" and "[0-9]", I don't see why you would need to escape anything.
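For illustration, a sketch of using the five pieces unescaped as password rules. It assumes each rule only has to be found somewhere in the password (find() rather than matches()), which the question does not actually state; the names are invented.

```java
import java.util.regex.Pattern;

public class PasswordRulesDemo {
    static final String RULES = ".{8},[0-9],[^0-9A-Za-z ],[A-Z],[a-z]";

    // Assumption: a password is acceptable when every rule is found somewhere in it.
    public static boolean satisfiesAll(String password) {
        for (String rule : RULES.split(",")) {
            // Each piece compiles as-is; no escaping of the brackets is needed.
            if (!Pattern.compile(rule).matcher(password).find()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(satisfiesAll("Passw0rd!")); // true
        System.out.println(satisfiesAll("password"));  // false (no digit, symbol or upper case)
    }
}
```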

Weird behavior - Java regex expression taken from DB

I have a regex stored in a DB, '\\\\E\\\\', and I use Java to fetch it and match it against Strings.
I thought that since Java reads it from the DB, it would take care of escaping the SQL special characters by itself, and all I need to do is escape the regex special characters, so that this expression actually matches '\\E\\'.
The problem is that it matches '\E\' rather than '\\E\\'. Why?
If you want to use a regex to match one literal backslash character, you need to use four backslashes in a Java string.
The regex \\ matches one literal backslash.
The string "\\" denotes a single backslash.
Therefore, in order to build a regex that consists of two backslashes, you need a Java string with four backslashes.
So you need "\\\\\\\\E\\\\\\\\" to construct a regex that matches \\E\\...
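Counting backslashes is error-prone, so here is a small sketch that checks the arithmetic for a pattern written as a Java string literal (BackslashCountDemo is a made-up name):

```java
public class BackslashCountDemo {
    // The text \\E\\ : two backslashes, E, two backslashes (5 characters).
    public static final String TARGET = "\\\\E\\\\";

    // Eight backslashes per side in source -> four per side in the regex
    // -> two literal backslashes per side in the matched text.
    public static final String REGEX = "\\\\\\\\E\\\\\\\\";

    public static void main(String[] args) {
        System.out.println("\\".length());          // 1: the literal "\\" is one backslash
        System.out.println("\\".matches("\\\\"));   // true: four in source match one backslash
        System.out.println(TARGET.length());        // 5
        System.out.println(TARGET.matches(REGEX));  // true
    }
}
```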

String replace throws error with $ sign

I'm having an issue with replacing a string in java...
the line is:
subject = subject.replaceAll("\\[calEvent\\]", calSubject);
This line doesn’t work with $ sign in calSubject.
The subject variable is a dynamic subject line read from a file, for example:
Calnot = [calEvent]
What I am trying to do is replace the calEvent placeholder with the calSubject variable, but the way I did it does not work: it crashes when calSubject contains a $ sign.
Any idea how I can do this so it won't break if the subject contains a $ sign, or any other character for that matter?
That's because the dollar sign is a special character in a replacement string, use Matcher.quoteReplacement() to escape this kind of character.
subject = subject.replaceAll("\\[calEvent\\]", Matcher.quoteReplacement(calSubject));
From the doc of String.replaceAll() :
Note that backslashes (\) and dollar signs ($) in the replacement
string may cause the results to be different than if it were being
treated as a literal replacement string; see Matcher.replaceAll. Use
Matcher.quoteReplacement(java.lang.String) to suppress the special
meaning of these characters, if desired.
Note that the dollar sign is used to refer to the corresponding capturing groups in the regular expression ($0, $1, etc.).
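A short sketch of the fix in context. The class and method names and the sample subject text are invented for illustration; the unquoted call is shown failing with a runtime exception.

```java
import java.util.regex.Matcher;

public class QuoteReplacementDemo {
    // Hypothetical helper wrapping the replaceAll call from the answer.
    public static String fillPlaceholder(String subject, String calSubject) {
        return subject.replaceAll("\\[calEvent\\]", Matcher.quoteReplacement(calSubject));
    }

    public static void main(String[] args) {
        String subject = "Calnot = [calEvent]";
        String calSubject = "Lunch for $5";

        System.out.println(fillPlaceholder(subject, calSubject)); // Calnot = Lunch for $5

        try {
            // Unquoted: "$5" is read as a reference to capturing group 5, which does not exist.
            subject.replaceAll("\\[calEvent\\]", calSubject);
        } catch (RuntimeException e) {
            System.out.println("unquoted replacement failed: " + e);
        }
    }
}
```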
EDIT
Matcher.quoteReplacement() was introduced in Java 1.5. If you're stuck on Java 1.4, you have to escape $ manually by replacing it with \$ inside the string. But since String.replaceAll() also treats \ and $ as special characters, you have to escape them once for the regex engine, and then escape every \ once more for Java, because they appear in a string literal.
("$", "\$") /* what we want */
("\$", "\\\$") /* RegExp engine escape */
("\\$", "\\\\\\$") /* Java runtime escape */
So we get :
calSubject = calSubject.replaceAll("\\$", "\\\\\\$");
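Here is a sketch that runs the manual escaping end to end (the names are made up). Note that this line only escapes $; a calSubject that contains backslashes would need those escaped too, which is an extra step not shown in the answer.

```java
public class ManualDollarEscapeDemo {
    // Regex \$ matches a literal '$'; the replacement \\\$ produces the two characters \$.
    public static String escapeDollars(String replacement) {
        return replacement.replaceAll("\\$", "\\\\\\$");
    }

    public static void main(String[] args) {
        String calSubject = "Budget: $100";
        String escaped = escapeDollars(calSubject);
        System.out.println(escaped); // Budget: \$100

        // Used as a replacement, \$ is now read as a literal dollar sign.
        String subject = "Calnot = [calEvent]".replaceAll("\\[calEvent\\]", escaped);
        System.out.println(subject); // Calnot = Budget: $100
    }
}
```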
if you don't need the regex feature, you can consider to use this method of String class:
replace(CharSequence target,CharSequence replacement)
It saves your "escape" backslashes as well.
api doc:
Replaces each substring of this string that matches the literal target
sequence with the specified literal replacement sequence. The
replacement proceeds from the beginning of the string to the end, for
example, replacing "aa" with "b" in the string "aaa" will result in
"ba" rather than "ab".
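Applied to the placeholder from the question, that could look like the sketch below (invented names and sample text). Neither the target nor the replacement needs any escaping beyond the normal string-literal rules.

```java
public class LiteralReplaceDemo {
    // Hypothetical helper: literal replacement, no regex involved.
    public static String fillPlaceholder(String template, String value) {
        return template.replace("[calEvent]", value);
    }

    public static void main(String[] args) {
        // '$' and '\' in the value are copied through unchanged.
        System.out.println(fillPlaceholder("Calnot = [calEvent]", "Pay $5 \\ now"));
        // Calnot = Pay $5 \ now

        // The example from the API doc quoted above.
        System.out.println("aaa".replace("aa", "b")); // ba
    }
}
```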
From the documentation of replaceAll:
Note that backslashes (\) and dollar signs ($) in the replacement
string may cause the results to be different than if it were being
treated as a literal replacement string; see Matcher.replaceAll. Use
java.util.regex.Matcher.quoteReplacement to suppress the special
meaning of these characters, if desired.
And in Matcher.replaceAll
Dollar signs may be treated as references to captured subsequences as
described above, and backslashes are used to escape literal characters
in the replacement string.
Not sure I really understand your question but try
subject = subject.replaceAll("\\[calEvent\\]", Matcher.quoteReplacement(calSubject));
Please use
Matcher.quoteReplacement(calSubject);

Regex matching groups

I have an issue with replaceAll function of Java string
replaceAll("regex", "replacement");
works fine, but whenever my "replacement" string contains a substring like "$0", "$1", etc., it causes a problem: these $x's are substituted with the corresponding matching group.
For instance
input ="NAME";
input.replaceAll("NAME", "HAR$0I");
will result in a string "HARNAMEI" as the replacement string contains "$0" which will be substituted by matching group "NAME". How can I override that nature. I need to get the result string as "HAR$0I" only.
I escaped the $, i.e. I converted the replacement string to "HAR\\$0I", which worked fine. But I am looking for a method in Java that will do this for me for all characters that have a special meaning in the regex world.
The documentation of java.lang.String.replaceAll() says:
Note that backslashes (\) and dollar signs ($) in the replacement
string may cause the results to be different than if it were being
treated as a literal replacement string; see Matcher.replaceAll. Use
Matcher.quoteReplacement(java.lang.String) to suppress the special
meaning of these characters, if desired.
The documentation of String quoteReplacement(String s) says:
Returns a literal replacement String for the specified String. This
method produces a String that will work as a literal replacement s in
the appendReplacement method of the Matcher class. The String produced
will match the sequence of characters in s treated as a literal
sequence. Slashes ('\') and dollar signs ('$') will be given no
special meaning.
$ in the replacement is a special character that lets you refer to groups. To make it literal you need to escape it as \$, which has to be written as "\\$" in a string literal. The same rule applies to \, since it is the special character used to escape $. If you want a literal \ in the replacement, you also need to escape it with another \, so you have to write it as "\\\\".
To simplify this process you can just use Matcher.quoteReplacement("yourReplacement").
In case where you don't need to use regular expression you can simplify it even more and use
replace("NAME", "HAR$0I")
instead of
replaceAll("NAME", Matcher.quoteReplacement("HAR$0I"))
It sounds like you're actually trying to replace raw strings, without using regexes at all.
You should simply call String.replace(), which does literal replacements without using regexes.
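The options discussed in these answers, side by side, as a small sketch with the values from the question (the class name is made up):

```java
import java.util.regex.Matcher;

public class GroupReferenceDemo {
    public static void main(String[] args) {
        String input = "NAME";

        // $0 is read as "the whole match".
        System.out.println(input.replaceAll("NAME", "HAR$0I"));   // HARNAMEI

        // Escaped by hand: \$ is a literal dollar sign.
        System.out.println(input.replaceAll("NAME", "HAR\\$0I")); // HAR$0I

        // Escaped by the library.
        System.out.println(input.replaceAll("NAME", Matcher.quoteReplacement("HAR$0I"))); // HAR$0I

        // No regex at all.
        System.out.println(input.replace("NAME", "HAR$0I"));      // HAR$0I
    }
}
```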

How to replace a special character with single slash

I have a question about strings in Java. Let's say, I have a string like so:
String str = "The . startup trace ?state is info?";
As the string contains the special character like "?" I need the string to be replaced with "\?" as per my requirement. How do I replace special characters with "\"? I tried the following way.
str.replace("?","\?");
But it gives a compilation error. Then I tried the following:
str.replace("?","\\?");
When I do this it replaces the special characters with "\\". But when I print the string, it prints with single slash. I thought it is taking single slash only but when I debugged I found that the variable is taking "\\".
Can anyone suggest how to replace the special characters with single slash ("\")?
On escape sequences
A declaration like:
String s = "\\";
defines a string containing a single backslash. That is, s.length() == 1.
This is because \ is a Java escape character for String and char literals. Here are some other examples:
"\n" is a String of length 1 containing the newline character
"\t" is a String of length 1 containing the tab character
"\"" is a String of length 1 containing the double quote character
"\/" contains an invalid escape sequence, and therefore is not a valid String literal
it causes compilation error
Naturally you can combine escape sequences with normal unescaped characters in a String literal:
System.out.println("\"Hey\\\nHow\tare you?");
The above prints (tab spacing may vary):
"Hey\
How are you?
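The length claims above are easy to check yourself. A minimal sketch, with an invented class and helper name:

```java
public class EscapeLengthDemo {
    // Lengths of the four single-escape literals listed above.
    public static int[] escapeLengths() {
        return new int[] {
            "\\".length(),  // one backslash
            "\n".length(),  // one newline
            "\t".length(),  // one tab
            "\"".length()   // one double quote
        };
    }

    public static void main(String[] args) {
        for (int length : escapeLengths()) {
            System.out.println(length); // 1 each time
        }
        // Two escapes in a row are still just two characters.
        System.out.println("\\\n".length()); // 2
    }
}
```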
References
JLS 3.10.6 Escape Sequences for Character and String Literals
See also
Is the char literal '\"' the same as '"' ?(backslash-doublequote vs only-doublequote)
Back to the problem
Your problem definition is very vague, but the following snippet works as it should:
System.out.println("How are you? Really??? Awesome!".replace("?", "\\?"));
The above snippet replaces ? with \?, and thus prints:
How are you\? Really\?\?\? Awesome!
If instead you want to replace a char with another char, then there's also an overload for that:
System.out.println("How are you? Really??? Awesome!".replace('?', '\\'));
The above snippet replaces ? with \, and thus prints:
How are you\ Really\\\ Awesome!
String API links
replace(CharSequence target, CharSequence replacement)
Replaces each substring of this string that matches the literal target sequence with the specified literal replacement sequence.
replace(char oldChar, char newChar)
Returns a new string resulting from replacing all occurrences of oldChar in this string with newChar.
On how regex complicates things
If you're using replaceAll or any other regex-based method, then things become somewhat more complicated. It can be greatly simplified if you understand some basic rules.
Regex patterns in Java are given as String values
Metacharacters (such as ? and .) have special meanings, and may need to be escaped by preceding with a backslash to be matched literally
The backslash is also a special character in replacement String values
The above factors can lead to the need for numerous backslashes in patterns and replacement strings in a Java source code.
It doesn't look like you need regex for this problem, but here's a simple example to show what it can do:
System.out.println(
"Who you gonna call? GHOSTBUSTERS!!!"
.replaceAll("[?!]+", "<$0>")
);
The above prints:
Who you gonna call<?> GHOSTBUSTERS<!!!>
The pattern [?!]+ matches one-or-more (+) of any characters in the character class [...] definition (which contains a ? and ! in this case). The replacement string <$0> essentially puts the entire match $0 within angled brackets.
Related questions
Having trouble with Splitting text. - discusses common mistakes like split(".") and split("|")
Regular expressions references
regular-expressions.info
Character class and Repetition with Star and Plus
java.util.regex.Pattern and Matcher
In case you want to replace ? with \?, there are 2 possibilities: replace and replaceAll (for regular expressions):
str.replace("?", "\\?")
str.replaceAll("\\?","\\\\?");
The result is "The . startup trace \?state is info\?"
If you want to replace ? with \, just remove the ? character from the second argument.
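Both variants can be checked with a sketch like this one, using the string from the question (the class and method names are hypothetical):

```java
public class QuestionMarkEscapeDemo {
    // Literal replacement: "\\?" in source is the two characters \?
    public static String withReplace(String s) {
        return s.replace("?", "\\?");
    }

    // Regex replacement: \? matches '?', and \\ in the replacement yields one backslash.
    public static String withReplaceAll(String s) {
        return s.replaceAll("\\?", "\\\\?");
    }

    public static void main(String[] args) {
        String str = "The . startup trace ?state is info?";
        System.out.println(withReplace(str));    // The . startup trace \?state is info\?
        System.out.println(withReplaceAll(str)); // The . startup trace \?state is info\?

        // Replacing ? with a lone backslash instead.
        System.out.println(str.replace("?", "\\"));         // The . startup trace \state is info\
        System.out.println(str.replaceAll("\\?", "\\\\"));  // The . startup trace \state is info\
    }
}
```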
But when I print the string, it prints
with single slash.
Good. That's exactly what you want, isn't it?
There are two simple rules:
1. A backslash inside a String literal has to be specified as two to satisfy the compiler, i.e. "\\". Otherwise it is taken as a special-character escape.
2. A backslash in a regular expression has to be specified as two to satisfy regex, otherwise it is taken as a regex escape. Because of (1) this means you have to write 2x2=4 of them: "\\\\" (and because of the forum software I actually had to write 8!).
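A small sketch of the 2x2=4 rule in action, using a Windows-style path as made-up sample data (the class and method names are also invented):

```java
public class FourBackslashesDemo {
    // Regex side: four backslashes in source -> regex \\ -> matches one backslash.
    public static String toForwardSlashes(String path) {
        return path.replaceAll("\\\\", "/");
    }

    // Replacement side follows the same rule: eight in source -> two in the output.
    public static String doubleBackslashes(String path) {
        return path.replaceAll("\\\\", "\\\\\\\\");
    }

    public static void main(String[] args) {
        String path = "C:\\temp\\file.txt";
        System.out.println(path);                    // C:\temp\file.txt
        System.out.println(toForwardSlashes(path));  // C:/temp/file.txt
        System.out.println(doubleBackslashes(path)); // C:\\temp\\file.txt
    }
}
```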
String str="\\";
str=str.replace(str,"\\\\");
System.out.println("New String="+str);
Output: New String=\\
In Java, the literal "\\" is treated as a single "\". So the above code replaces a single backslash with "\\" (two backslashes).
