If I have string a"b"c", but I want to get a\"b\"c\", I would naturally write
String t = "a\"b\"c\"";
t = t.replaceAll("\"", "\\\"");
However, that results in the same string, a"b"c". The correct way is
t.replaceAll("\"", "\\\\\"");
Why?
replaceAll uses regular expressions for both the pattern and the replacement - both of which require backslashes to be escaped. So the regex replacement pattern you want for the second argument is:
\\"
Now because both \ and " in Java string literals also need escaping, that means each of those characters needs an extra backslash. Add the quotes, and you've got:
"\\\\\""
which is what you've got in your source.
It's simpler if you just use String.replace which doesn't use regular expressions. That way you're only trying to provide this string (not string literal) as the second argument:
\"
After escaping and turning into a string literal, that becomes:
"\\\""
which still isn't great, but it's at least better.
An alternative is to use replaceAll but with Matcher.quoteReplacement:
t = t.replaceAll("\"", Matcher.quoteReplacement("\\\""));
Personally I'd just use replace() though. You don't want regular expression replacements, after all.
Related
I'm trying to convert the String \something\ into the String \\something\\ using replaceAll, but I keep getting all kinds of errors. I thought this was the solution:
theString.replaceAll("\\", "\\\\");
But this gives the below exception:
java.util.regex.PatternSyntaxException: Unexpected internal error near index 1
The String#replaceAll() interprets the argument as a regular expression. The \ is an escape character in both String and regex. You need to double-escape it for regex:
string.replaceAll("\\\\", "\\\\\\\\");
But you don't necessarily need regex for this, simply because you want an exact character-by-character replacement and you don't need patterns here. So String#replace() should suffice:
string.replace("\\", "\\\\");
Update: as per the comments, you appear to want to use the string in JavaScript context. You'd perhaps better use StringEscapeUtils#escapeEcmaScript() instead to cover more characters.
TLDR: use theString = theString.replace("\\", "\\\\"); instead.
Problem
replaceAll(target, replacement) uses regular expression (regex) syntax for target and partially for replacement.
Problem is that \ is special character in regex (it can be used like \d to represents digit) and in String literal (it can be used like "\n" to represent line separator or \" to escape double quote symbol which normally would represent end of string literal).
In both these cases to create \ symbol we can escape it (make it literal instead of special character) by placing additional \ before it (like we escape " in string literals via \").
So to target regex representing \ symbol will need to hold \\, and string literal representing such text will need to look like "\\\\".
So we escaped \ twice:
once in regex \\
once in String literal "\\\\" (each \ is represented as "\\").
In case of replacement \ is also special there. It allows us to escape other special character $ which via $x notation, allows us to use portion of data matched by regex and held by capturing group indexed as x, like "012".replaceAll("(\\d)", "$1$1") will match each digit, place it in capturing group 1 and $1$1 will replace it with its two copies (it will duplicate it) resulting in "001122".
So again, to let replacement represent \ literal we need to escape it with additional \ which means that:
replacement must hold two backslash characters \\
and String literal which represents \\ looks like "\\\\"
BUT since we want replacement to hold two backslashes we will need "\\\\\\\\" (each \ represented by one "\\\\").
So version with replaceAll can look like
replaceAll("\\\\", "\\\\\\\\");
Easier way with replaceAll
To make out life easier Java provides tools to automatically escape text into target and replacement parts. So now we can focus only on strings, and forget about regex syntax:
replaceAll(Pattern.quote(target), Matcher.quoteReplacement(replacement))
which in our case can look like
replaceAll(Pattern.quote("\\"), Matcher.quoteReplacement("\\\\"))
Even better: use replace
If we don't really need regex syntax support lets not involve replaceAll at all. Instead lets use replace. Both methods will replace all targets, but replace doesn't involve regex syntax. So you could simply write
theString = theString.replace("\\", "\\\\");
To avoid this sort of trouble, you can use replace (which takes a plain string) instead of replaceAll (which takes a regular expression). You will still need to escape backslashes, but not in the wild ways required with regular expressions.
You'll need to escape the (escaped) backslash in the first argument as it is a regular expression. Replacement (2nd argument - see Matcher#replaceAll(String)) also has it's special meaning of backslashes, so you'll have to replace those to:
theString.replaceAll("\\\\", "\\\\\\\\");
Yes... by the time the regex compiler sees the pattern you've given it, it sees only a single backslash (since Java's lexer has turned the double backwhack into a single one). You need to replace "\\\\" with "\\\\", believe it or not! Java really needs a good raw string syntax.
I need to escape all quotes (') in a string, so it becomes \'
I've tried using replaceAll, but it doesn't do anything. For some reason I can't get the regex to work.
I'm trying with
String s = "You'll be totally awesome, I'm really terrible";
String shouldBecome = "You\'ll be totally awesome, I\'m really terrible";
s = s.replaceAll("'","\\'"); // Doesn't do anything
s = s.replaceAll("\'","\\'"); // Doesn't do anything
s = s.replaceAll("\\'","\\'"); // Doesn't do anything
I'm really stuck here, hope somebody can help me here.
Thanks,
Iwan
You have to first escape the backslash because it's a literal (yielding \\), and then escape it again because of the regular expression (yielding \\\\). So, Try:
s.replaceAll("'", "\\\\'");
output:
You\'ll be totally awesome, I\'m really terrible
Use replace()
s = s.replace("'", "\\'");
output:
You\'ll be totally awesome, I\'m really terrible
Let's take a tour of String#repalceAll(String regex, String replacement)
You will see that:
An invocation of this method of the form str.replaceAll(regex, repl) yields exactly the same result as the expression
Pattern.compile(regex).matcher(str).replaceAll(repl)
So lets take a look at Matcher.html#replaceAll(java.lang.String) documentation
Note that backslashes (\) and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string. Dollar signs may be treated as references to captured subsequences as described above, and backslashes are used to escape literal characters in the replacement string.
You can see that in replacement we have special character $ which can be used as reference to captured group like
System.out.println("aHellob,aWorldb".replaceAll("a(\\w+?)b", "$1"));
// result Hello,World
But sometimes we don't want $ to be such special because we want to use it as simple dollar character, so we need a way to escape it.
And here comes \, because since it is used to escape metacharacters in regex, Strings and probably in other places it is good convention to use it here to escape $.
So now \ is also metacharacter in replacing part, so if you want to make it simple \ literal in replacement you need to escape it somehow. And guess what? You escape it the same way as you escape it in regex or String. You just need to place another \ before one you escaping.
So if you want to create \ in replacement part you need to add another \ before it. But remember that to write \ literal in String you need to write it as "\\" so to create two \\ in replacement you need to write it as "\\\\".
So try
s = s.replaceAll("'", "\\\\'");
Or even better
to reduce explicit escaping in replacement part (and also in regex part - forgot to mentioned that earlier) just use replace instead replaceAll which adds regex escaping for us
s = s.replace("'", "\\'");
This doesn't say how to "fix" the problem - that's already been done in other answers; it exists to draw out the details and applicable documentation references.
When using String.replaceAll or any of the applicable Matcher replacers, pay attention to the replacement string and how it is handled:
Note that backslashes (\) and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string. Dollar signs may be treated as references to captured subsequences as described above, and backslashes are used to escape literal characters in the replacement string.
As pointed out by isnot2bad in a comment, Matcher.quoteReplacement may be useful here:
Returns a literal replacement String for the specified String. .. The String produced will match the sequence of characters in s treated as a literal sequence. Slashes (\) and dollar signs ($) will be given no special meaning.
You could also try using something like StringEscapeUtils to make your life even easier: http://commons.apache.org/proper/commons-lang/javadocs/api-2.6/org/apache/commons/lang/StringEscapeUtils.html
s = StringEscapeUtils.escapeJava(s);
You can use apache's commons-text library (instead of commons-lang):
Example code:
org.apache.commons.text.StringEscapeUtils.escapeJava(escapedString);
Dependency:
compile 'org.apache.commons:commons-text:1.8'
OR
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-text</artifactId>
<version>1.8</version>
</dependency>
I have an issue with replaceAll function of Java string
replaceAll("regex", "replacement");
works fine but whenever my "replacement" string contains the substring like "$0", "$1" .e.t.c, it will create problem by substituting these $x's with corresponding matching group.
For instance
input ="NAME";
input.replaceAll("NAME", "HAR$0I");
will result in a string "HARNAMEI" as the replacement string contains "$0" which will be substituted by matching group "NAME". How can I override that nature. I need to get the result string as "HAR$0I" only.
I escaped the $ .i.e I converted the replacement string to "HAR\\$0I" which worked fine. But I am looking for any method in java that will do this for me for all such characters which has special meaning in regex world.
The documentation of java.lang.String.replaceAll() says:
Note that backslashes () and dollar signs ($) in the replacement
string may cause the results to be different than if it were being
treated as a literal replacement string; see Matcher.replaceAll. Use
Matcher.quoteReplacement(java.lang.String) to suppress the special
meaning of these characters, if desired.
The documentation of String quoteReplacement(String s) says:
Returns a literal replacement String for the specified String. This
method produces a String that will work as a literal replacement s in
the appendReplacement method of the Matcher class. The String produced
will match the sequence of characters in s treated as a literal
sequence. Slashes ('\') and dollar signs ('$') will be given no
special meaning.
$ in replacement is special character allowing you to use groups. To make it literal you will need to escape it with \$ which needs to be written as "\\$". Same rule apply for \, since it is special character used to escape $. If you would like to use \ literal in replacement you would also need to escape it with another \, so you would need to write it as \\\\.
To simplify this process you can just use Matcher.quoteReplacement("yourReplacement")).
In case where you don't need to use regular expression you can simplify it even more and use
replace("NAME", "HAR$0I")
instead of
replaceAll("NAME", Matcher.quoteReplacement("HAR$0I"))
It sounds like you're actually trying to replace raw strings, without using regexes at all.
You should simply call String.replace(), which does literal replacements without using regexes.
I'm trying to convert the String \something\ into the String \\something\\ using replaceAll, but I keep getting all kinds of errors. I thought this was the solution:
theString.replaceAll("\\", "\\\\");
But this gives the below exception:
java.util.regex.PatternSyntaxException: Unexpected internal error near index 1
The String#replaceAll() interprets the argument as a regular expression. The \ is an escape character in both String and regex. You need to double-escape it for regex:
string.replaceAll("\\\\", "\\\\\\\\");
But you don't necessarily need regex for this, simply because you want an exact character-by-character replacement and you don't need patterns here. So String#replace() should suffice:
string.replace("\\", "\\\\");
Update: as per the comments, you appear to want to use the string in JavaScript context. You'd perhaps better use StringEscapeUtils#escapeEcmaScript() instead to cover more characters.
TLDR: use theString = theString.replace("\\", "\\\\"); instead.
Problem
replaceAll(target, replacement) uses regular expression (regex) syntax for target and partially for replacement.
Problem is that \ is special character in regex (it can be used like \d to represents digit) and in String literal (it can be used like "\n" to represent line separator or \" to escape double quote symbol which normally would represent end of string literal).
In both these cases to create \ symbol we can escape it (make it literal instead of special character) by placing additional \ before it (like we escape " in string literals via \").
So to target regex representing \ symbol will need to hold \\, and string literal representing such text will need to look like "\\\\".
So we escaped \ twice:
once in regex \\
once in String literal "\\\\" (each \ is represented as "\\").
In case of replacement \ is also special there. It allows us to escape other special character $ which via $x notation, allows us to use portion of data matched by regex and held by capturing group indexed as x, like "012".replaceAll("(\\d)", "$1$1") will match each digit, place it in capturing group 1 and $1$1 will replace it with its two copies (it will duplicate it) resulting in "001122".
So again, to let replacement represent \ literal we need to escape it with additional \ which means that:
replacement must hold two backslash characters \\
and String literal which represents \\ looks like "\\\\"
BUT since we want replacement to hold two backslashes we will need "\\\\\\\\" (each \ represented by one "\\\\").
So version with replaceAll can look like
replaceAll("\\\\", "\\\\\\\\");
Easier way with replaceAll
To make out life easier Java provides tools to automatically escape text into target and replacement parts. So now we can focus only on strings, and forget about regex syntax:
replaceAll(Pattern.quote(target), Matcher.quoteReplacement(replacement))
which in our case can look like
replaceAll(Pattern.quote("\\"), Matcher.quoteReplacement("\\\\"))
Even better: use replace
If we don't really need regex syntax support lets not involve replaceAll at all. Instead lets use replace. Both methods will replace all targets, but replace doesn't involve regex syntax. So you could simply write
theString = theString.replace("\\", "\\\\");
To avoid this sort of trouble, you can use replace (which takes a plain string) instead of replaceAll (which takes a regular expression). You will still need to escape backslashes, but not in the wild ways required with regular expressions.
You'll need to escape the (escaped) backslash in the first argument as it is a regular expression. Replacement (2nd argument - see Matcher#replaceAll(String)) also has it's special meaning of backslashes, so you'll have to replace those to:
theString.replaceAll("\\\\", "\\\\\\\\");
Yes... by the time the regex compiler sees the pattern you've given it, it sees only a single backslash (since Java's lexer has turned the double backwhack into a single one). You need to replace "\\\\" with "\\\\", believe it or not! Java really needs a good raw string syntax.
What's the correct regex for a plus character (+) as the first argument (i.e. the string to replace) to Java's replaceAll method in the String class? I can't get the syntax right.
You need to escape the + for the regular expression, using \.
However, Java uses a String parameter to construct regular expressions, which uses \ for its own escape sequences. So you have to escape the \ itself:
"\\+"
when in doubt, let java do the work for you:
myStr.replaceAll(Pattern.quote("+"), replaceStr);
You'll need to escape the + with a \ and because \ is itself a special character in Java strings you'll need to escape it with another \.
So your regex string will be defined as "\\+" in Java code.
I.e. this example:
String test = "ABCD+EFGH";
test = test.replaceAll("\\+", "-");
System.out.println(test);
Others have already stated the correct method of:
Escaping the + as \\+
Using the Pattern.quote method which escapes all the regex meta-characters.
Another method that you can use is to put the + in a character class. Many of the regex meta characters (., *, + among many others) are treated literally in the character class.
So you can also do:
orgStr.replaceAll("[+]",replaceStr);
Ideone Link
If you want a simple string find-and-replace (i.e. you don't need regex), it may be simpler to use the StringUtils from Apache Commons, which would allow you to write:
mystr = StringUtils.replace(mystr, "+", "plus");
Say you want to replace - with \\\-, use:
text.replaceAll("-", "\\\\\\\\-");
String str="Hello+Hello";
str=str.replaceAll("\\+","-");
System.out.println(str);
OR
String str="Hello+Hello";
str=str.replace(Pattern.quote(str),"_");
System.out.println(str);
How about replacing multiple ‘+’ with an undefined amount of repeats?
Example: test+test+test+1234
(+) or [+] seem to pick on a single literal character but on repeats.