Java: Replace all ' in a string with \' - java

I need to escape all quotes (') in a string, so it becomes \'
I've tried using replaceAll, but it doesn't do anything. For some reason I can't get the regex to work.
I'm trying with
String s = "You'll be totally awesome, I'm really terrible";
String shouldBecome = "You\'ll be totally awesome, I\'m really terrible";
s = s.replaceAll("'","\\'"); // Doesn't do anything
s = s.replaceAll("\'","\\'"); // Doesn't do anything
s = s.replaceAll("\\'","\\'"); // Doesn't do anything
I'm really stuck here, hope somebody can help me here.
Thanks,
Iwan

You have to first escape the backslash because it's a literal (yielding \\), and then escape it again because of the regular expression (yielding \\\\). So, Try:
s.replaceAll("'", "\\\\'");
output:
You\'ll be totally awesome, I\'m really terrible

Use replace()
s = s.replace("'", "\\'");
output:
You\'ll be totally awesome, I\'m really terrible

Let's take a tour of String#repalceAll(String regex, String replacement)
You will see that:
An invocation of this method of the form str.replaceAll(regex, repl) yields exactly the same result as the expression
Pattern.compile(regex).matcher(str).replaceAll(repl)
So lets take a look at Matcher.html#replaceAll(java.lang.String) documentation
Note that backslashes (\) and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string. Dollar signs may be treated as references to captured subsequences as described above, and backslashes are used to escape literal characters in the replacement string.
You can see that in replacement we have special character $ which can be used as reference to captured group like
System.out.println("aHellob,aWorldb".replaceAll("a(\\w+?)b", "$1"));
// result Hello,World
But sometimes we don't want $ to be such special because we want to use it as simple dollar character, so we need a way to escape it.
And here comes \, because since it is used to escape metacharacters in regex, Strings and probably in other places it is good convention to use it here to escape $.
So now \ is also metacharacter in replacing part, so if you want to make it simple \ literal in replacement you need to escape it somehow. And guess what? You escape it the same way as you escape it in regex or String. You just need to place another \ before one you escaping.
So if you want to create \ in replacement part you need to add another \ before it. But remember that to write \ literal in String you need to write it as "\\" so to create two \\ in replacement you need to write it as "\\\\".
So try
s = s.replaceAll("'", "\\\\'");
Or even better
to reduce explicit escaping in replacement part (and also in regex part - forgot to mentioned that earlier) just use replace instead replaceAll which adds regex escaping for us
s = s.replace("'", "\\'");

This doesn't say how to "fix" the problem - that's already been done in other answers; it exists to draw out the details and applicable documentation references.
When using String.replaceAll or any of the applicable Matcher replacers, pay attention to the replacement string and how it is handled:
Note that backslashes (\) and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string. Dollar signs may be treated as references to captured subsequences as described above, and backslashes are used to escape literal characters in the replacement string.
As pointed out by isnot2bad in a comment, Matcher.quoteReplacement may be useful here:
Returns a literal replacement String for the specified String. .. The String produced will match the sequence of characters in s treated as a literal sequence. Slashes (\) and dollar signs ($) will be given no special meaning.

You could also try using something like StringEscapeUtils to make your life even easier: http://commons.apache.org/proper/commons-lang/javadocs/api-2.6/org/apache/commons/lang/StringEscapeUtils.html
s = StringEscapeUtils.escapeJava(s);

You can use apache's commons-text library (instead of commons-lang):
Example code:
org.apache.commons.text.StringEscapeUtils.escapeJava(escapedString);
Dependency:
compile 'org.apache.commons:commons-text:1.8'
OR
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-text</artifactId>
<version>1.8</version>
</dependency>

Related

How to replace a space exactly with "\\\\s+" [duplicate]

I'm trying to convert the String \something\ into the String \\something\\ using replaceAll, but I keep getting all kinds of errors. I thought this was the solution:
theString.replaceAll("\\", "\\\\");
But this gives the below exception:
java.util.regex.PatternSyntaxException: Unexpected internal error near index 1
The String#replaceAll() interprets the argument as a regular expression. The \ is an escape character in both String and regex. You need to double-escape it for regex:
string.replaceAll("\\\\", "\\\\\\\\");
But you don't necessarily need regex for this, simply because you want an exact character-by-character replacement and you don't need patterns here. So String#replace() should suffice:
string.replace("\\", "\\\\");
Update: as per the comments, you appear to want to use the string in JavaScript context. You'd perhaps better use StringEscapeUtils#escapeEcmaScript() instead to cover more characters.
TLDR: use theString = theString.replace("\\", "\\\\"); instead.
Problem
replaceAll(target, replacement) uses regular expression (regex) syntax for target and partially for replacement.
Problem is that \ is special character in regex (it can be used like \d to represents digit) and in String literal (it can be used like "\n" to represent line separator or \" to escape double quote symbol which normally would represent end of string literal).
In both these cases to create \ symbol we can escape it (make it literal instead of special character) by placing additional \ before it (like we escape " in string literals via \").
So to target regex representing \ symbol will need to hold \\, and string literal representing such text will need to look like "\\\\".
So we escaped \ twice:
once in regex \\
once in String literal "\\\\" (each \ is represented as "\\").
In case of replacement \ is also special there. It allows us to escape other special character $ which via $x notation, allows us to use portion of data matched by regex and held by capturing group indexed as x, like "012".replaceAll("(\\d)", "$1$1") will match each digit, place it in capturing group 1 and $1$1 will replace it with its two copies (it will duplicate it) resulting in "001122".
So again, to let replacement represent \ literal we need to escape it with additional \ which means that:
replacement must hold two backslash characters \\
and String literal which represents \\ looks like "\\\\"
BUT since we want replacement to hold two backslashes we will need "\\\\\\\\" (each \ represented by one "\\\\").
So version with replaceAll can look like
replaceAll("\\\\", "\\\\\\\\");
Easier way with replaceAll
To make out life easier Java provides tools to automatically escape text into target and replacement parts. So now we can focus only on strings, and forget about regex syntax:
replaceAll(Pattern.quote(target), Matcher.quoteReplacement(replacement))
which in our case can look like
replaceAll(Pattern.quote("\\"), Matcher.quoteReplacement("\\\\"))
Even better: use replace
If we don't really need regex syntax support lets not involve replaceAll at all. Instead lets use replace. Both methods will replace all targets, but replace doesn't involve regex syntax. So you could simply write
theString = theString.replace("\\", "\\\\");
To avoid this sort of trouble, you can use replace (which takes a plain string) instead of replaceAll (which takes a regular expression). You will still need to escape backslashes, but not in the wild ways required with regular expressions.
You'll need to escape the (escaped) backslash in the first argument as it is a regular expression. Replacement (2nd argument - see Matcher#replaceAll(String)) also has it's special meaning of backslashes, so you'll have to replace those to:
theString.replaceAll("\\\\", "\\\\\\\\");
Yes... by the time the regex compiler sees the pattern you've given it, it sees only a single backslash (since Java's lexer has turned the double backwhack into a single one). You need to replace "\\\\" with "\\\\", believe it or not! Java really needs a good raw string syntax.

Why String.replaceAll() in java requires 4 slashes "\\\\" in regex to actually replace "\"?

I recently noticed that, String.replaceAll(regex,replacement) behaves very weirdly when it comes to the escape-character "\"(slash)
For example consider there is a string with filepath - String text = "E:\\dummypath"
and we want to replace the "\\" with "/".
text.replace("\\","/") gives the output "E:/dummypath" whereas text.replaceAll("\\","/") raises the exception java.util.regex.PatternSyntaxException.
If we want to implement the same functionality with replaceAll() we need to write it as,
text.replaceAll("\\\\","/")
One notable difference is replaceAll() has its arguments as reg-ex whereas replace() has arguments character-sequence!
But text.replaceAll("\n","/") works exactly the same as its char-sequence equivalent text.replace("\n","/")
Digging Deeper:
Even more weird behaviors can be observed when we try some other inputs.
Lets assign text="Hello\nWorld\n"
Now,
text.replaceAll("\n","/"), text.replaceAll("\\n","/"), text.replaceAll("\\\n","/") all these three gives the same output Hello/World/
Java had really messed up with the reg-ex in its best possible way I feel! No other language seems to have these playful behaviors in reg-ex. Any specific reason, why Java messed up like this?
You need to esacpe twice, once for Java, once for the regex.
Java code is
"\\\\"
makes a regex string of
"\\" - two chars
but the regex needs an escape too so it turns into
\ - one symbol
#Peter Lawrey's answer describes the mechanics. The "problem" is that backslash is an escape character in both Java string literals, and in the mini-language of regexes. So when you use a string literal to represent a regex, there are two sets of escaping to consider ... depending on what you want the regex to mean.
But why is it like that?
It is a historical thing. Java originally didn't have regexes at all. The syntax rules for Java String literals were borrowed from C / C++, which also didn't have built-in regex support. Awkwardness of double escaping didn't become apparent in Java until they added regex support in the form of the Pattern class ... in Java 1.4.
So how do other languages manage to avoid this?
They do it by providing direct or indirect syntactic support for regexes in the programming language itself. For instance, in Perl, Ruby, Javascript and many other languages, there is a syntax for patterns / regexs (e.g. '/pattern/') where string literal escaping rules do not apply. In C# and Python, they provide an alternative "raw" string literal syntax in which backslashes are not escapes. (But note that if you use the normal C# / Python string syntax, you have the Java problem of double escaping.)
Why do text.replaceAll("\n","/"), text.replaceAll("\\n","/"), and text.replaceAll("\\\n","/") all give the same output?
The first case is a newline character at the String level. The Java regex language treats all non-special characters as matching themselves.
The second case is a backslash followed by an "n" at the String level. The Java regex language interprets a backslash followed by an "n" as a newline.
The final case is a backslash followed by a newline character at the String level. The Java regex language doesn't recognize this as a specific (regex) escape sequence. However in the regex language, a backslash followed by any non-alphabetic character means the latter character. So, a backslash followed by a newline character ... means the same thing as a newline.
1) Let's say you want to replace a single \ using Java's replaceAll method:
\
˪--- 1) the final backslash
2) Java's replaceAll method takes a regex as first argument. In a regex literal, \ has a special meaning, e.g. in \d which is a shortcut for [0-9] (any digit). The way to escape a metachar in a regex literal is to precede it with a \, which leads to:
\ \
| ˪--- 1) the final backslash
|
˪----- 2) the backslash needed to escape 1) in a regex literal
3) In Java, there is no regex literal: you write a regex in a string literal (unlike JavaScript for example, where you can write /\d+/). But in a string literal, \ also has a special meaning, e.g. in \n (a new line) or \t (a tab). The way to escape a metachar in a string literal is to precede it with a \, which leads to:
\\\\
|||˪--- 1) the final backslash
||˪---- 3) the backslash needed to escape 1) in a string literal
|˪----- 2) the backslash needed to escape 1) in a regex literal
˪------ 3) the backslash needed to escape 2) in a string literal
This is because Java tries to give \ a special meaning in the replacement string, so that \$ will be a literal $ sign, but in the process they seem to have removed the actual special meaning of \
While text.replaceAll("\\\\","/"), at least can be considered to be okay in some sense (though it itself is not absolutely right), all the three executions, text.replaceAll("\n","/"), text.replaceAll("\\n","/"), text.replaceAll("\\\n","/") giving same output seem even more funny. It is just contradicting as to why they have restricted the functioning of text.replaceAll("\\","/") for the same reason.
Java didn't mess up with regular expressions. It is because, Java likes to mess up with coders by trying to do something unique and different, when it is not at all required.
One way around this problem is to replace backslash with another character, use that stand-in character for intermediate replacements, then convert it back into backslash at the end. For example, to convert "\r\n" to "\n":
String out = in.replace('\\','#').replaceAll("#r#n","#n").replace('#','\\');
Of course, that won't work very well if you choose a replacement character that can occur in the input string.
I think java really messed with regular expression in String.replaceAll();
Other than java I have never seen a language parse regular expression this way. You will be confused if you have used regex in some other languages.
In case of using the "\\" in replacement string, you can use java.util.regex.Matcher.quoteReplacement(String)
String.replaceAll("/", Matcher.quoteReplacement("\\"));
By using this Matcher class you can get the expected result.

String replace throws error with $ sign

I'm having an issue with replacing a string in java...
the line is:
subject = subject.replaceAll("\\[calEvent\\]", calSubject);
This line doesn’t work with $ sign in calSubject.
what the subject variable is, a dynamic subject line variable from a file. for example like so:
Calnot = [calEvent]
what i am trying to do is replace the calEvent place holder with the subject variable. but how i did it does not work because it crashes when the subject contains a $ sign.
any idea how I can do this so it won't break if the subject contains a $ sign or any characters for that matter?
That's because the dollar sign is a special character in a replacement string, use Matcher.quoteReplacement() to escape this kind of character.
subject = subject.replaceAll("\\[calEvent\\]", Matcher.quoteReplacement(calSubject));
From the doc of String.replaceAll() :
Note that backslashes (\) and dollar signs ($) in the replacement
string may cause the results to be different than if it were being
treated as a literal replacement string; see Matcher.replaceAll. Use
Matcher.quoteReplacement(java.lang.String) to suppress the special
meaning of these characters, if desired.
Note that the dollar sign is used to refer to the corresponding capturing groups in the regular expression ($0, $1, etc.).
EDIT
Matcher.quoteReplacement() has been introduced in Java 1.5, if you're stuck in Java 1.4 you have to escape $ manually by replacing it with \$ inside the string. But since String.replaceAll() would also take the \ and the $ as special characters you have to escape them once and you also have to escape all \ once more for the Java runtime.
("$", "\$") /* what we want */
("\$", "\\\$") /* RegExp engine escape */
("\\$", "\\\\\\$") /* Java runtime escape */
So we get :
calSubject = calSubject.replaceAll("\\$", "\\\\\\$");
if you don't need the regex feature, you can consider to use this method of String class:
replace(CharSequence target,CharSequence replacement)
It saves your "escape" backslashes as well.
api doc:
Replaces each substring of this string that matches the literal target
sequence with the specified literal replacement sequence. The
replacement proceeds from the beginning of the string to the end, for
example, replacing "aa" with "b" in the string "aaa" will result in
"ba" rather than "ab".
From the documentation of replaceAll:
Note that backslashes () and dollar signs ($) in the replacement
string may cause the results to be different than if it were being
treated as a literal replacement string; see Matcher.replaceAll. Use
java.util.regex.Matcher.quoteReplacement to suppress the special
meaning of these characters, if desired.
And in Matcher.replaceAll
Dollar signs may be treated as references to captured subsequences as
described above, and backslashes are used to escape literal characters
in the replacement string.
Not sure I really understand your question but try
subject = subject.replaceAll("\\[calEvent\\]", Matcher.quoteReplacement(calSubject));
Please use
Matcher.quoteReplacement(calEvent);

Eight backslashes required to replace single backslash with double backslashes?

This is a "what the heck is going on here" question. I don't actually need a solution.
I had to replace all single backslashes in a String with double backslashes . This is what I ended up doing...
strRootDirectory = strRootDirectory.replaceAll("\\\\", "\\\\\\\\");
...where strRootDirectory is a java.lang.String above.
Now, I understand the four backslashes for the first argument: regex expects two backslashes in order to indicate a single literal backslash, and java wants them doubled up. That's fine.
BUT, what the heck is going on with the eight backslashes for the second argument? Isn't the replacement string supposed to be a literal (non-regex, I mean) string? I expected to need four backslashes in the second argument, in order to represent two backslashes.
The second argument isn't a regex-string, but a regex-replacement-string, in which the backslash also has a special meaning (it is used to escape the special character $ used for variable interpolation and is also used to escape itself).
From The API:
Note that backslashes (\) and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string; see Matcher.replaceAll. Use Matcher.quoteReplacement(java.lang.String) to suppress the special meaning of these characters, if desired.
-- http://download.oracle.com/javase/6/docs/api/java/lang/String.html#replaceAll(...)
It's easier if you use replace("\\","\\\\") (String.replace takes literal strings and is more efficient when it's all literal)
or you can ensure correctness through the Pattern.quote and Matcher.quoteReplacement functions
"\\\\\\\\" leads to an in memory representation of a string with 4 backslashes: \\\\. Although the second string isn't a regex string, backslashes and dollar signs are still special characters in it, so they need to be escaped.
According to Java reference material, the replaceAll method interprets backslashes in the replacement string as escape characters too. They could be used to escape the dollar sign character, which could refer to matched expressions to re-use in the replacement string. so naturally, if you want to double the number of backslashes, and both parameters treat backslash as an escape character, you need twice as many backslashes in the replacement string.
Yep, it gets hairy when you need to do this sort of thing, doesn't it.
The reason you need so many backslashes is that you need to take into account that backslash is used for both escaping a string and for escaping a regex.
Take 1 backslash.
Double it for string escaping.
Double it again for regex escaping.
Double it again because you need to match two consecutive backslashes in your original string.
That makes 8.
As a fan of not getting into super detailed explanations of regex... I figured out from the major answer post by Bart Kiers above:
System.out.println( "line1: "+"hello\\\\world" );
System.out.println( "line2: "+"hello\\\\world".replaceAll("\\\\\\\\", Matcher.quoteReplacement("\\") ) );
prints out
line1: hello\\world
line2: hello\world
I hope it helps...

String.replaceAll single backslashes with double backslashes

I'm trying to convert the String \something\ into the String \\something\\ using replaceAll, but I keep getting all kinds of errors. I thought this was the solution:
theString.replaceAll("\\", "\\\\");
But this gives the below exception:
java.util.regex.PatternSyntaxException: Unexpected internal error near index 1
The String#replaceAll() interprets the argument as a regular expression. The \ is an escape character in both String and regex. You need to double-escape it for regex:
string.replaceAll("\\\\", "\\\\\\\\");
But you don't necessarily need regex for this, simply because you want an exact character-by-character replacement and you don't need patterns here. So String#replace() should suffice:
string.replace("\\", "\\\\");
Update: as per the comments, you appear to want to use the string in JavaScript context. You'd perhaps better use StringEscapeUtils#escapeEcmaScript() instead to cover more characters.
TLDR: use theString = theString.replace("\\", "\\\\"); instead.
Problem
replaceAll(target, replacement) uses regular expression (regex) syntax for target and partially for replacement.
Problem is that \ is special character in regex (it can be used like \d to represents digit) and in String literal (it can be used like "\n" to represent line separator or \" to escape double quote symbol which normally would represent end of string literal).
In both these cases to create \ symbol we can escape it (make it literal instead of special character) by placing additional \ before it (like we escape " in string literals via \").
So to target regex representing \ symbol will need to hold \\, and string literal representing such text will need to look like "\\\\".
So we escaped \ twice:
once in regex \\
once in String literal "\\\\" (each \ is represented as "\\").
In case of replacement \ is also special there. It allows us to escape other special character $ which via $x notation, allows us to use portion of data matched by regex and held by capturing group indexed as x, like "012".replaceAll("(\\d)", "$1$1") will match each digit, place it in capturing group 1 and $1$1 will replace it with its two copies (it will duplicate it) resulting in "001122".
So again, to let replacement represent \ literal we need to escape it with additional \ which means that:
replacement must hold two backslash characters \\
and String literal which represents \\ looks like "\\\\"
BUT since we want replacement to hold two backslashes we will need "\\\\\\\\" (each \ represented by one "\\\\").
So version with replaceAll can look like
replaceAll("\\\\", "\\\\\\\\");
Easier way with replaceAll
To make out life easier Java provides tools to automatically escape text into target and replacement parts. So now we can focus only on strings, and forget about regex syntax:
replaceAll(Pattern.quote(target), Matcher.quoteReplacement(replacement))
which in our case can look like
replaceAll(Pattern.quote("\\"), Matcher.quoteReplacement("\\\\"))
Even better: use replace
If we don't really need regex syntax support lets not involve replaceAll at all. Instead lets use replace. Both methods will replace all targets, but replace doesn't involve regex syntax. So you could simply write
theString = theString.replace("\\", "\\\\");
To avoid this sort of trouble, you can use replace (which takes a plain string) instead of replaceAll (which takes a regular expression). You will still need to escape backslashes, but not in the wild ways required with regular expressions.
You'll need to escape the (escaped) backslash in the first argument as it is a regular expression. Replacement (2nd argument - see Matcher#replaceAll(String)) also has it's special meaning of backslashes, so you'll have to replace those to:
theString.replaceAll("\\\\", "\\\\\\\\");
Yes... by the time the regex compiler sees the pattern you've given it, it sees only a single backslash (since Java's lexer has turned the double backwhack into a single one). You need to replace "\\\\" with "\\\\", believe it or not! Java really needs a good raw string syntax.

Categories