Escape Java RegExp Metacharacters - java

I'm trying to escape a RegExp metacharacter in Java. Below is what I want:
INPUT STRING: "This is $ test"
OUTPUT STRING: "This is \$ test"
This is what I'm currently doing but it's not working:
String inputStr= "This is $ test";
inputStr = inputStr.replaceAll("$","\\$");
But I'm getting wrong output:
"This is $ test$"

You'll need:
inputStr.replaceAll("\\$", "\\\\\\$");
The String to be replaced needs 2 backslashes because $ has a special meaning in the regexp. So $ must be escaped, to get: \$, and that backslash must itself be escaped within the java String: "\\$".
The replacement string needs 6 backslashes because both \ and $ have special meaning in the replacement strings:
\ can be used to escape characters in the replacement string.
$ can be used to make back-references in the replacement string.
So if your intended replacement string is "\$", you need to escape each of those two characters to get: \\\$, and then each backslash you need to use - 3 of them, 1 literal and 2 for escapes - must also be escaped within the java String: "\\\\\\$".
See: Matcher.replaceAll

As you said, $ is a reserved character for Regex. Then, you need to escape it. You can use a backslash character to do this:
inputStr.replaceAll("\\$", ...);
In the replacement, the $ and \ characters also have a special meaning:
Note that backslashes () and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string; see Matcher.replaceAll
Then, the replacement will be the backslash character and the dollar sign, both of them being escaped by a '\' character (which needs to be doubled toi build the String):
inputStr.replaceAll("\\$", "\\\\\\$");

You have to put 6 backslashes so you escape the backslash and escape the metachar:
inputStr.replaceAll("\\$","\\\\\\$");

The first argument to replaceAll is infact a regexp, and the $ actually means "match the end of the string". You can just use replace instead, which doesn't use regexp, just a normal string replace, to achieve what you want in this case. If you want to use a regexp, just escape the $ in the first argument.

Related

Java - Regular expression to match strings not containing quote and backslash characters [duplicate]

How to write a regular expression to match this \" (a backslash then a quote)? Assume I have a string like this:
click to search
I need to replace all the \" with a ", so the result would look like:
click to search
This one does not work: str.replaceAll("\\\"", "\"") because it only matches the quote. Not sure how to get around with the backslash. I could have removed the backslash first, but there are other backslashes in my string.
If you don't need any of regex mechanisms like predefined character classes \d, quantifiers etc. instead of replaceAll which expects regex use replace which expects literals
str = str.replace("\\\"","\"");
Both methods will replace all occurrences of targets, but replace will treat targets literally.
BUT if you really must use regex you are looking for
str = str.replaceAll("\\\\\"", "\"")
\ is special character in regex (used for instance to create \d - character class representing digits). To make regex treat \ as normal character you need to place another \ before it to turn off its special meaning (you need to escape it). So regex which we are trying to create is \\.
But to create string literal representing text \\ so you could pass it to regex engine you need to write it as four \ ("\\\\"), because \ is also special character in String literals (part of code written using "...") since it can be used for instance as \t to represent tabulator.
That is why you also need to escape \ there.
In short you need to escape \ twice:
in regex \\
and then in String literal "\\\\"
You don't need a regular expression.
str.replace("\\\"", "\"")
should work just fine.
The replace method takes two substrings and replaces all non-overlapping occurrences of the first with the second. Per the javadoc:
public String replace(CharSequence target,
CharSequence replacement)
Replaces each substring of this string that matches the literal target sequence with the specified literal replacement sequence. The replacement proceeds from the beginning of the string to the end, for example, replacing "aa" with "b" in the string "aaa" will result in "ba" rather than "ab".
try this: str.replaceAll("\\\\\"", "\\\"")
because Java will replace \ twice:
(1) \\\\\" --> \\" (for string)
(2) \\" --> \" (for regex)

Regex to match a consecutive occurrence of two backslashes?

I have to replace all occurrences of '\''\' with '\' in a String. I know regex "\\\\" means \ but how do I write a regex for replaceAll() to match '\''\'. I tried:
.replaceAll("\\\\\\\\", "\\")
but I get a java.util.regex.PatternSyntaxException?
If you want to replace literals instead of replaceAll use replace method which doesn't use regex syntax:
replace("\\\\", "\\")
If you absolutely must use replaceAll remember that its second parameter also have some special characters which are
$ (where $x represents match from group x)
and \ to escape that $ and itself,
so code using replaceAll would need to look like:
replaceAll("\\\\\\\\", "\\\\")
since we also need to escape \ twice (once in regex engine \\, once in string \\\\).

difference between '\' and '\\' in while using it as escape characters

I know that we use escape characters like \n for next line and \t for tab.
But today while working on few string I came across \\$.
I had to print "nike$" so to print it I had to modify the string as "nike\\$".
I want to know what is the exact difference between \ and \\.
Inside a string literal, \ is an escape: The next character that follows tells us what it will do, as in your \n example for newline.
This means you can't put \ in a string on its own, since it's half of an escape sequence. Instead, to have a \ actually in a string, you use \\.
I had to print "nike$" so to print it I had to modify the string as "nike\\$"
"nike\\$" will result in a string that outputs (for instance, via System.out.println) as nike\$, not nike$.
Your use of \\$ suggests to me that you were feeding a regular expression pattern into something, e.g.:
p = Pattern.compile("nike\\$");
In that situation, we have two levels of escaping going on: The string literal, and the regular expression. To have a literal $ in a regular expression, it has to be escaped by \ because otherwise it's an end-of-input assertion. To get that \$ actually to the regular expression parser when using a string literal, we have to escape the backslash in the literal so we actually have a backslash in the string for the regular expression engine to see, thus \\$.

How to replace one or more \ in string with just \?

Consider the string,
this\is\\a\new\\string
The output should be:
this\is\a\new\string
So basically one or more \ character should be replaced with just one \.
I tried the following:
str = str.replace("[\\]+","\")
but it was no use. The reason I used two \ in [\\]+ was because internally \ is stored as \\. I know this might be a basic regex question, but I am able to replace one or more normal alphabets but not \ character. Any help is really appreciated.
str.replace("[\\]+", "\") has few problems,
replace doesn't use regex (replaceAll does) so "[\\]" will represent [\] literal, not \ nor \\ (depending on what you think it would represent)
even if it did accept regex "[\\]" would not be correct regex because \\] would escape ] so you would end up with unclosed character class [..
it will not compile (your replacement String is not closed)
It will not compile because \ is start of escape sequence \X where X needs to be either
changed from being String special character to simple literal, like in your case \" will escape " to be literal (so you could print it for instance) instead of being start/end of String,
changed from being normal character to be special one like in case of line separators \n \r or tabulations \t.
Now we know that \ is special and is used to escape other character. So what do you think we need to do to make \ represent literal (when we want to print \). If you guessed that it needs to be escaped with another \ then you are right. To create \ literal we need to write it in String as "\\".
Since you know how to create String containing \ literal (escaped \) you can start thinking about how to create your replacements.
Regex which represents one or more \ can look like
\\+
But that is its native form, and we need to create it using String. I used \\ here because in regex \ is also special character (for instance \d represents digits , not \ literal followed by d) so it also needs to be escaped first to represent \ literal. Just like in String we can escape it with another \.
So String representing this regex will need to be written as
"\\\\+" (we escaped \ twice, once in regex \\+ and once in string)
You can use it as first argument of replaceAll (because replace as mentioned earlier doesn't accept regex).
Now last problem you will face is second argument of replaceAll method. If you write
replaceAll("\\\\+", "\\")
and it will find match for regex you will see exception
java.lang.IllegalArgumentException: character to be escaped is missing
It is because in replacement part (second argument in replaceAll method) we can also use special formula $x which represents current match from group with index x. So to be able to escape $ into literal we need some escape mechanism, and again \ was used here for that purpose. So \ is also special in replacement part of our method.
So again to create \ literal we need to escape it with another \, and string literal representing expression \\ is "\\\\".
But lets get back to earlier exception: message "character to be escaped is missing" refers to X part of \X formula (X is character we want to be escaped). Problem is that earlier your replacement "\\" represented only \ part, so this method expected either $ to create \$, or \\ to create \ literal. So valid replacements would be "\\$ or "\\\\".
To make things work you need to write your replacing method as
str = str.replaceAll("\\\\+", "\\\\")
You can use:
str = str.replace("\\\\", "\\");
Remember that String#replace doesn't take a regex.
try this
str = str.replaceAll("\\\\+", "\\\\");
When writing regular expressions, you typically need to double-escape backslashes. So you would do this:
str = str.replaceAll("\\\\+", "\\\\");
I'd use Matcher.quoteReplacement() and String.replaceAll() here.
Like this:
String s;
[...]
s = s.replaceAll("\\\\+", Matcher.quoteReplacement("\\"));

Java regular expressions and dollar sign

I have Java string:
String b = "/feedback/com.school.edu.domain.feedback.Review$0/feedbackId");
I also have generated pattern against which I want to match this string:
String pattern = "/feedback/com.school.edu.domain.feedback.Review$0(.)*";
When I say b.matches(pattern) it returns false. Now I know dollar sign is part of Java RegEx, but I don't know how should my pattern look like. I am assuming that $ in pattern needs to be replaced by some escape characters, but don't know how many. This $ sign is important to me as it helps me distinguish elements in list (numbers after dollar), and I can't go without it.
Use
String escapedString = java.util.regex.Pattern.quote(myString)
to automatically escape all special regex characters in a given string.
You need to escape $ in the regex with a back-slash (\), but as a back-slash is an escape character in strings you need to escape the back-slash itself.
You will need to escape any special regex char the same way, for example with ".".
String pattern = "/feedback/com\\.navteq\\.lcms\\.common\\.domain\\.poi\\.feedback\\.Review\\$0(.)*";
In Java regex both . and $ are special. You need to escape it with 2 backslashes, i.e..
"/feedback/com\\.navtag\\.etc\\.Review\\$0(.*)"
(1 backslash is for the Java string, and 1 is for the regex engine.)
Escape the dollar with \
String pattern =
"/feedback/com.navteq.lcms.common.domain.poi.feedback.Review\\$0(.)*";
I advise you to escape . as well, . represent any character.
String pattern =
"/feedback/com\\.navteq\\.lcms\\.common\\.domain\\.poi\\.feedback\\.Review\\$0(.)*";
The ans by #Colin Hebert and edited by #theon is correct. The explanation is as follows. #azec-pdx
It is a regex as a string literal (within double quotes).
period (.) and dollar-sign ($) are special regex characters (metacharacters).
To make the regex engine interpret them as normal regex characters period(.) and dollar-sign ($), you need to prefix a single backslash to each. The single backslash ( itself a special regex character) quotes the character following it and thus escaping it.
Since the given regex is a string literal, another backslash is required to be prefixed to each to avoid confusion with the usual visible-ASCII escapes(character, string and Unicode escapes in string literals) and thus avoid compiler error.
Even if you use within a string literal any special regex construct that has been defined as an escape sequence, it needs to be prefixed with another backslash to avoid compiler error.For example, the special regex construct (an escape sequence) \b (word boundary) of regex would clash with \b(backspace) of the usual visible-ASCII escape(character escape). Thus another backslash is prefixed to avoid the clash and then \\b would be read by regex as word boundary.
To be always safe, all single backslash escapes (quotes) within string literals are prefixed with another backslash. For example, the string literal "\(hello\)" is illegal and leads to a compile-time error; in order to match the string (hello) the string literal "\\(hello\\)" must be used.
The last period (.)* is supposed to be interpreted as special regex character and thus it needs no quoting by a backslash, let alone prefixing a second one.

Categories