Matcher.replaceAll() removes backslashes even when I escape them (Java)

I have functionality in my app that replaces some text in JSON (I have simplified it in the example). The replacement may contain escape sequences such as \n, \b, \t, which can break the JSON string when I try to build JSON with Jackson. So I decided to use Apache's solution, StringEscapeUtils.escapeJava(), to escape all escape sequences. But Matcher.replaceAll() removes the backslashes that escapeJava() added.
Here is the code:
public static void main(String[] args) {
    String json = "{\"test2\": \"Hello toReplace \\\"test\\\" world\"}";
    String replacedJson = Pattern.compile("toReplace")
            .matcher(json)
            .replaceAll(StringEscapeUtils.escapeJava("replacement \n \b \t"));
    System.out.println(replacedJson);
}
Expected Output:
{"test2": "Hello replacement \n \b \t \"test\" world"}
Actual Output:
{"test2": "Hello replacement n b t \"test\" world"}
Why does Matcher.replaceAll() remove the backslashes, while System.out.println(StringEscapeUtils.escapeJava("replacement \n \b \t")); prints the correct output: replacement \n \b \t?

StringEscapeUtils.escapeJava("\n") allows you to transform the single newline character \n into two characters: \ and n.
\ is a special character in pattern replacements though, from https://docs.oracle.com/javase/7/docs/api/java/util/regex/Matcher.html#replaceAll(java.lang.String):
Note that backslashes (\) and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string. Dollar signs may be treated as references to captured subsequences as described above, and backslashes are used to escape literal characters in the replacement string.
To have them taken as literal characters, you need to escape them with Matcher.quoteReplacement, from https://docs.oracle.com/javase/7/docs/api/java/util/regex/Matcher.html#quoteReplacement(java.lang.String):
Returns a literal replacement String for the specified String. This method produces a String that will work as a literal replacement s in the appendReplacement method of the Matcher class. The String produced will match the sequence of characters in s treated as a literal sequence. Slashes (\) and dollar signs ($) will be given no special meaning.
So in your case:
.replaceAll(Matcher.quoteReplacement(StringEscapeUtils.escapeJava("replacement \n \b \t")))
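A minimal sketch of the difference, written so that it runs without Apache Commons. The string `escaped` is a hand-written stand-in for what escapeJava("replacement \n \b \t") returns, and the names QuoteReplacementDemo and replaceLiterally are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteReplacementDemo {

    // Replaces every match of regex in input, treating replacement as literal text.
    static String replaceLiterally(String input, String regex, String replacement) {
        return Pattern.compile(regex)
                .matcher(input)
                .replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String json = "{\"test2\": \"Hello toReplace world\"}";
        // Assumption: stand-in for escapeJava("replacement \n \b \t"),
        // i.e. backslash + letter pairs instead of control characters.
        String escaped = "replacement \\n \\b \\t";

        // Unquoted: each backslash escapes the next character and is dropped.
        System.out.println(Pattern.compile("toReplace").matcher(json).replaceAll(escaped));
        // prints {"test2": "Hello replacement n b t world"}

        // Quoted: backslashes are kept as literal characters.
        System.out.println(replaceLiterally(json, "toReplace", escaped));
        // prints {"test2": "Hello replacement \n \b \t world"}
    }
}
```

quoteReplacement also neutralizes dollar signs, so a replacement such as "$1" is inserted as text rather than read as a group reference.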

If you want a literal backslash in the replacement string passed to replaceAll, you need to escape it. The Matcher.replaceAll documentation describes this.
StringEscapeUtils.escapeJava escapes a string so that it is suitable for use in Java source code, but the literal you pass to it is still read by the compiler first, so here it contains real control characters:
"replacement \n \b \t"
             ^ new line
                ^ backspace
                   ^ tab
If you want literal backslashes in a regular Java string, you need:
"replacement \\n \\b \\t"
Because this Java string is the replacement argument of replaceAll, where a backslash is itself an escape character, you need:
"replacement \\\\n \\\\b \\\\t"
Try:
String replacedJson = Pattern.compile("toReplace")
        .matcher(json)
        .replaceAll("replacement \\\\n \\\\b \\\\t");
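To illustrate the two layers of escaping, here is a small sketch comparing the hand-escaped replacement with Matcher.quoteReplacement. The class and method names are hypothetical; both methods should produce the same text.

```java
import java.util.regex.Matcher;

class EscapingLayersDemo {

    // Java literal "\\\\n" -> string \\n -> replacement output \n
    static String viaManualEscaping(String input) {
        return input.replaceAll("toReplace", "replacement \\\\n \\\\b \\\\t");
    }

    // Java literal "\\n" -> string \n -> quoteReplacement turns it into \\n
    static String viaQuoteReplacement(String input) {
        return input.replaceAll("toReplace",
                Matcher.quoteReplacement("replacement \\n \\b \\t"));
    }

    public static void main(String[] args) {
        String input = "Hello toReplace world";
        System.out.println(viaManualEscaping(input));   // prints Hello replacement \n \b \t world
        System.out.println(viaQuoteReplacement(input)); // prints Hello replacement \n \b \t world
    }
}
```

Manual escaping only works for replacements known at compile time. For a replacement computed at runtime, quoteReplacement does the same doubling for you.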

You have to escape \ as well using Matcher.quoteReplacement().
public static String replaceAll(String json, String regex, String replace) {
return Pattern.compile(regex)
.matcher(json)
.replaceAll(Matcher.quoteReplacement(StringEscapeUtils.escapeJava(replace)));
}

Related

Why does this regex match online but not in my environment? [duplicate]

I am trying out the following code and it's printing false.
I expected it to print true.
In addition, the Pattern.compile() statement gives a warning: 'redundant escape character'.
Can someone please explain why this does not return true and why I see the warning?
public static void main(String[] args) {
String s = "\\n";
System.out.println(s);
Pattern p = Pattern.compile("\\\n");
Matcher mm = p.matcher(s);
System.out.println(mm.matches());
}
The s="\\n" means you assign a backslash and n to the variable s, and it contains a sequence of two chars, \ and n.
The Pattern.compile("\\\n") means you define the regex pattern \<LF> (a backslash followed by a newline, i.e. line feed, char). It matches a single newline (LF) char, because an escaped char that is neither a word char nor a special char matches itself: \, matches a comma and \; matches a semicolon. Thus, this pattern won't match the two-char string in variable s.
The redundant escape warning is thrown because \<LF> matches the same newline char that can be matched with mere <LF>.
More examples:
Regex    Regex string literal    Matching text    Matching string literal
<LF>     "\n"                    <LF>             "\n"
\n       "\\n"                   <LF>             "\n"
\\n      "\\\\n"                 \n               "\\n"
Because "\\n" evaluates to a backslash and the letter n, while "\\\n" evaluates to a backslash followed by a newline character.
Backslashes within string literals in Java source code are interpreted as required by The Java™ Language Specification as either Unicode escapes (section 3.3) or other character escapes (section 3.10.6). It is therefore necessary to double backslashes in string literals that represent regular expressions to protect them from interpretation by the Java bytecode compiler. The string literal "\b", for example, matches a single backspace character when interpreted as a regular expression, while "\\b" matches a word boundary.
Reference: https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
Your source s has two characters, '\' and 'n'. If you meant a \ followed by a line break, it should be "\\\n".
The pattern has two characters, '\' and '\n' (line break), and the escape character \ is not needed, hence the warning. If you meant a \ followed by a line break, the pattern should be "\\\\\n" (two \ to escape the backslash for the regex, then \n).
String s = "\\\n";
System.out.println(s);
Pattern p = Pattern.compile("\\\\\n");
Matcher mm = p.matcher(s);
System.out.println(mm.matches());
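The cases discussed in the answers above can be checked with a short sketch. The class name NewlinePatternDemo and the helper matches are made up for illustration. Each call passes a regex string literal and the text it is matched against.

```java
import java.util.regex.Pattern;

class NewlinePatternDemo {

    // True if the whole text matches the regex given as a Java string.
    static boolean matches(String regex, String text) {
        return Pattern.compile(regex).matcher(text).matches();
    }

    public static void main(String[] args) {
        // Regex is a raw line feed; it matches a line feed.
        System.out.println(matches("\n", "\n"));      // true
        // Regex is backslash + n, the regex escape for a line feed.
        System.out.println(matches("\\n", "\n"));     // true
        // Regex is an escaped backslash + n; it matches the two chars \ and n.
        System.out.println(matches("\\\\n", "\\n"));  // true
        // The question's pattern: backslash + line feed, which matches only a line feed.
        System.out.println(matches("\\\n", "\\n"));   // false
        System.out.println(matches("\\\n", "\n"));    // true
    }
}
```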

Java - Regular expression to match strings not containing quote and backslash characters [duplicate]

How to write a regular expression to match this \" (a backslash then a quote)? Assume I have a string like this:
click to search
I need to replace all the \" with a ", so the result would look like:
click to search
This one does not work: str.replaceAll("\\\"", "\"") because it only matches the quote. Not sure how to get around with the backslash. I could have removed the backslash first, but there are other backslashes in my string.
If you don't need any regex mechanisms (predefined character classes such as \d, quantifiers, etc.), use replace, which expects literals, instead of replaceAll, which expects a regex:
str = str.replace("\\\"","\"");
Both methods will replace all occurrences of targets, but replace will treat targets literally.
But if you really must use a regex, you are looking for
str = str.replaceAll("\\\\\"", "\"");
\ is a special character in regex (used, for instance, to create \d, the character class representing digits). To make the regex treat \ as a normal character, you need to place another \ before it to turn off its special meaning (that is, escape it). So the regex we are trying to create is \\.
But to create a string literal representing the text \\, so that you can pass it to the regex engine, you need to write it as four \ ("\\\\"), because \ is also a special character in String literals (the parts of code written using "..."), where it is used, for instance, in \t to represent a tab.
That is why you also need to escape \ there.
In short you need to escape \ twice:
in regex \\
and then in String literal "\\\\"
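A short sketch comparing the two approaches and the failing attempt from the question. The input string and the names BackslashQuoteDemo, withReplace and withReplaceAll are invented for illustration, since the original sample string is not available.

```java
class BackslashQuoteDemo {

    // Literal replacement: the target is the two characters \ and ".
    static String withReplace(String s) {
        return s.replace("\\\"", "\"");
    }

    // Regex replacement: the regex \\" matches one backslash followed by a quote.
    static String withReplaceAll(String s) {
        return s.replaceAll("\\\\\"", "\"");
    }

    public static void main(String[] args) {
        // Hypothetical input text: say \"hi\" in C:\temp
        String input = "say \\\"hi\\\" in C:\\temp";

        System.out.println(withReplace(input));     // prints say "hi" in C:\temp
        System.out.println(withReplaceAll(input));  // prints say "hi" in C:\temp

        // The question's attempt: the regex \" matches only the quote,
        // so quotes are replaced by quotes and the text is unchanged.
        System.out.println(input.replaceAll("\\\"", "\""));  // prints say \"hi\" in C:\temp
    }
}
```

The other backslash, in C:\temp, is left alone in both working versions because only a backslash directly followed by a quote is matched.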
You don't need a regular expression.
str.replace("\\\"", "\"")
should work just fine.
The replace method takes two substrings and replaces all non-overlapping occurrences of the first with the second. Per the javadoc:
public String replace(CharSequence target,
CharSequence replacement)
Replaces each substring of this string that matches the literal target sequence with the specified literal replacement sequence. The replacement proceeds from the beginning of the string to the end, for example, replacing "aa" with "b" in the string "aaa" will result in "ba" rather than "ab".
Try this: str.replaceAll("\\\\\"", "\\\"")
because the backslashes are processed twice:
(1) \\\\\" --> \\" (by the Java compiler, for the string literal)
(2) \\" --> \" (by the regex engine)

Java regular expressions and dollar sign

I have a Java string:
String b = "/feedback/com.school.edu.domain.feedback.Review$0/feedbackId";
I also have a generated pattern against which I want to match this string:
String pattern = "/feedback/com.school.edu.domain.feedback.Review$0(.)*";
When I call b.matches(pattern) it returns false. I know the dollar sign is special in Java regex, but I don't know what my pattern should look like. I assume that the $ in the pattern needs to be escaped, but I don't know with how many escape characters. The $ sign is important to me because it helps me distinguish elements in a list (the numbers after the dollar), and I can't go without it.
Use
String escapedString = java.util.regex.Pattern.quote(myString)
to automatically escape all special regex characters in a given string.
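One caveat: Pattern.quote makes the whole string literal, so the trailing (.)* of the generated pattern would be quoted too. A possible sketch, assuming the literal prefix and the wildcard tail can be kept apart when the pattern is generated (the names PatternQuoteDemo and prefixPattern are hypothetical):

```java
import java.util.regex.Pattern;

class PatternQuoteDemo {

    // Builds a regex that matches the literal prefix followed by anything.
    static String prefixPattern(String literalPrefix) {
        return Pattern.quote(literalPrefix) + ".*";
    }

    public static void main(String[] args) {
        String b = "/feedback/com.school.edu.domain.feedback.Review$0/feedbackId";
        String prefix = "/feedback/com.school.edu.domain.feedback.Review$0";

        // Unquoted: $ is an end-of-input anchor, so nothing can follow it.
        System.out.println(b.matches(prefix + "(.)*"));        // false
        // Quoted prefix: . and $ are matched literally.
        System.out.println(b.matches(prefixPattern(prefix)));  // true
    }
}
```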
You need to escape $ in the regex with a back-slash (\), but as a back-slash is an escape character in strings you need to escape the back-slash itself.
You will need to escape any special regex char the same way, for example with ".".
String pattern = "/feedback/com\\.navteq\\.lcms\\.common\\.domain\\.poi\\.feedback\\.Review\\$0(.)*";
In Java regex both . and $ are special. You need to escape them with two backslashes, i.e.:
"/feedback/com\\.navtag\\.etc\\.Review\\$0(.*)"
(1 backslash is for the Java string, and 1 is for the regex engine.)
Escape the dollar with \
String pattern =
"/feedback/com.navteq.lcms.common.domain.poi.feedback.Review\\$0(.)*";
I advise you to escape . as well, since an unescaped . matches any character.
String pattern =
"/feedback/com\\.navteq\\.lcms\\.common\\.domain\\.poi\\.feedback\\.Review\\$0(.)*";
The answer by @Colin Hebert, edited by @theon, is correct. The explanation is as follows, @azec-pdx.
It is a regex as a string literal (within double quotes).
period (.) and dollar-sign ($) are special regex characters (metacharacters).
To make the regex engine interpret them as the literal characters period (.) and dollar sign ($), you need to prefix a single backslash to each. The single backslash (itself a special regex character) quotes the character following it and thus escapes it.
Since the given regex is a string literal, another backslash must be prefixed to each one, because the compiler would otherwise try to read the sequence as one of the string-literal escapes (character, string and Unicode escapes) and reject \. and \$ with a compile-time error.
Even a regex construct that coincides with a string-literal escape sequence needs another backslash. For example, the regex construct \b (word boundary) clashes with the character escape \b (backspace): "\b" compiles, but passes a backspace character to the regex engine. With another backslash prefixed, "\\b" is read by the regex engine as a word boundary.
To be always safe, all single backslash escapes (quotes) within string literals are prefixed with another backslash. For example, the string literal "\(hello\)" is illegal and leads to a compile-time error; in order to match the string (hello) the string literal "\\(hello\\)" must be used.
The last period (.)* is supposed to be interpreted as special regex character and thus it needs no quoting by a backslash, let alone prefixing a second one.
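The points above can be checked with a small sketch. It uses the package names from the question rather than those in the answers, and the names ManualEscapeDemo and matchesReview are invented for illustration.

```java
class ManualEscapeDemo {

    // After the compiler processes the literal, the regex engine sees:
    // /feedback/com\.school\.edu\.domain\.feedback\.Review\$0(.)*
    static final String PATTERN =
            "/feedback/com\\.school\\.edu\\.domain\\.feedback\\.Review\\$0(.)*";

    static boolean matchesReview(String s) {
        return s.matches(PATTERN);
    }

    public static void main(String[] args) {
        // Literal dots and dollar sign match; the trailing (.)* accepts the rest.
        System.out.println(matchesReview(
                "/feedback/com.school.edu.domain.feedback.Review$0/feedbackId"));  // true
        // An escaped dot no longer matches an arbitrary character.
        System.out.println(matchesReview(
                "/feedback/comXschool.edu.domain.feedback.Review$0/feedbackId"));  // false
        // The parentheses example from the explanation above.
        System.out.println("(hello)".matches("\\(hello\\)"));                      // true
    }
}
```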
