Replace a special character in a String with another special character - Java

I have a path that is taken dynamically from my system and stored in a String:
C:\Users\SXR8036\Downloads\LANE-914.xls
I need to pass this path to a function that reads an Excel file, but it needs the backslashes replaced with forward slashes. I want something like
C:/Users/SXR8036/Downloads/LANE-914.xls
i.e. every backslash replaced with a forward slash.
With the String replace methods I am only able to replace a-z characters; I get an error when I try to replace special characters:
something.replaceAll("[^a-zA-Z0-9]", "/");
I have to pass the resulting String to read the file.

It's better in this case to use non-regex replace() instead of regex replaceAll(). You don't need regular expressions for this replacement and it complicates things because it needs extra escapes. Backslash is a special character in Java and also in regular expressions, so in Java if you want a straight backslash you have to double it up \\ and if you want a straight backslash in a regular expression in Java you have to quadruple it \\\\.
something = something.replace("\\", "/");
Behind the scenes, replace(String, String) used regular expression patterns in older Oracle JDK versions (Java 8 and earlier), so it had some overhead there. In your specific case you can use single-character replacement instead, which may be more efficient (not that it probably matters!):
something = something.replace('\\', '/');
If you were to use regular expressions:
something = something.replaceAll("\\\\", "/");
Or:
something = something.replaceAll(Pattern.quote("\\"), "/");
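For comparison, here is a minimal sketch that runs the four variants above side by side on the path from the question. The class and method names are made up for illustration; only the String and Pattern calls come from the JDK.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative, not from the question.
final class PathSlashDemo {

    // Plain single-character replacement: no regex involved.
    static String withCharReplace(String path) {
        return path.replace('\\', '/');
    }

    // Plain text replacement: the Java literal "\\" is one backslash.
    static String withStringReplace(String path) {
        return path.replace("\\", "/");
    }

    // Regex replacement: the literal "\\\\" is the two-character regex \\,
    // which matches exactly one backslash.
    static String withReplaceAll(String path) {
        return path.replaceAll("\\\\", "/");
    }

    // Regex replacement, letting Pattern.quote do the regex escaping.
    static String withQuotedPattern(String path) {
        return path.replaceAll(Pattern.quote("\\"), "/");
    }

    public static void main(String[] args) {
        String path = "C:\\Users\\SXR8036\\Downloads\\LANE-914.xls";
        System.out.println(withCharReplace(path));
        System.out.println(withStringReplace(path));
        System.out.println(withReplaceAll(path));
        System.out.println(withQuotedPattern(path));
    }
}
```

All four calls should give the same forward-slash path, so the choice is mostly about readability.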

To replace backslashes with replaceAll you'll have to escape them properly in the regular expression that you are using.
In your case the correct expression would be:
final String path = "C:\\Users\\SXR8036\\Downloads\\LANE-914.xls";
final String normalizedPath = path.replaceAll("\\\\", "/");
Because the backslash is the escape character both in Java string literals and in regular expressions, it has to be escaped twice to work as desired: once for the regex and once more for the string literal.
In general you can pass very complex regular expressions to String.replaceAll. See the JavaDocs of java.lang.String.replaceAll and especially java.util.regex.Pattern for more information.

Related

Java escaping hyphen "-" character using regex

I am using Java and have a string with the value shown below:
String data = "vale-cx";
data = data.replaceAll("\\-", "\\-\\");
I am trying to replace the "-" inside it, but it is not working. The final value I am looking for is "vale\-cx", meaning the hyphen needs to be escaped.
Hyphen doesn't need to be escaped, but backslash needs to be escaped in the replacement expression, meaning you need an extra two backslashes before the hyphen (and none after):
data = data.replaceAll("-", "\\\\-");
Better yet, don't use regex at all:
data = data.replace("-", "\\-");
Try with \\\\- instead, e.g:
String data = "vale-cx";
System.out.println(data.replaceAll("\\-", "\\\\-"));
The hyphen is only special in regular expressions when used to create ranges in character classes, e.g. [A-Z]. You aren't doing that here, so you don't need any escaping at all.
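To make the answers above concrete, here is a small sketch with hypothetical class and method names. It shows both ways of producing vale\-cx, plus one character-class example where the hyphen does act as a range operator (the "vale-CX" input in main is my own, not from the question).

```java
// Hypothetical demo class; the names are illustrative only.
final class HyphenEscapeDemo {

    // Regex version: "-" is an ordinary character in the pattern. The
    // replacement literal "\\\\-" is the three characters \\- at runtime,
    // which the regex engine turns into a backslash followed by a hyphen.
    static String escapeWithReplaceAll(String data) {
        return data.replaceAll("-", "\\\\-");
    }

    // Plain text version: only the Java literal escaping is needed.
    static String escapeWithReplace(String data) {
        return data.replace("-", "\\-");
    }

    // Inside a character class the hyphen forms a range: [A-Z] matches
    // upper-case ASCII letters, not the hyphen itself.
    static String stripUpperCase(String data) {
        return data.replaceAll("[A-Z]", "");
    }

    public static void main(String[] args) {
        System.out.println(escapeWithReplaceAll("vale-cx"));
        System.out.println(escapeWithReplace("vale-cx"));
        System.out.println(stripUpperCase("vale-CX"));
    }
}
```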

Why does String.replaceAll() need so many escapes for " character?

If I have the string a"b"c" but want to get a\"b\"c\", I would naturally write
String t = "a\"b\"c\"";
t = t.replaceAll("\"", "\\\"");
However, that results in the same string, a"b"c". The correct way is
t.replaceAll("\"", "\\\\\"");
Why?
replaceAll uses regular expressions for both the pattern and the replacement - both of which require backslashes to be escaped. So the regex replacement pattern you want for the second argument is:
\\"
Now because both \ and " in Java string literals also need escaping, that means each of those characters needs an extra backslash. Add the quotes, and you've got:
"\\\\\""
which is what you've got in your source.
It's simpler if you just use String.replace which doesn't use regular expressions. That way you're only trying to provide this string (not string literal) as the second argument:
\"
After escaping and turning into a string literal, that becomes:
"\\\""
which still isn't great, but it's at least better.
An alternative is to use replaceAll but with Matcher.quoteReplacement:
t = t.replaceAll("\"", Matcher.quoteReplacement("\\\""));
Personally I'd just use replace() though. You don't want regular expression replacements, after all.
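Here is a short sketch of the variants discussed in this answer, including the version from the question that silently leaves the string unchanged. The class and method names are assumptions made for the example.

```java
import java.util.regex.Matcher;

// Hypothetical demo class; the names are illustrative only.
final class QuoteEscapeDemo {

    // The question's first attempt: the replacement string is \" at runtime,
    // and the regex engine reads the backslash as escaping the quote,
    // so every " is replaced with just ".
    static String tooFewBackslashes(String t) {
        return t.replaceAll("\"", "\\\"");
    }

    // Replacement string is \\" at runtime: an escaped backslash plus a quote.
    static String withReplaceAll(String t) {
        return t.replaceAll("\"", "\\\\\"");
    }

    // Let Matcher.quoteReplacement add the regex-level escaping.
    static String withQuoteReplacement(String t) {
        return t.replaceAll("\"", Matcher.quoteReplacement("\\\""));
    }

    // No regex at all: only Java literal escaping is needed.
    static String withReplace(String t) {
        return t.replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        String t = "a\"b\"c\"";
        System.out.println(tooFewBackslashes(t));
        System.out.println(withReplaceAll(t));
        System.out.println(withQuoteReplacement(t));
        System.out.println(withReplace(t));
    }
}
```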

A simple scenario in Java where replacing a single character with a back slash requires four back slashes

Let's consider the following code snippet in Java.
package escape;

public final class Main
{
    public static void main(String[] args)
    {
        String s = "abc/xyz";
        System.out.println(s.replaceAll("/", "\\\\"));
    }
}
I just want to replace "/" with "\" in the String abc/xyz above. This works and displays abc\xyz as expected, but I don't understand why it requires four backslashes. It looks like two should be sufficient. Why is that not the case?
The reason is that String.replaceAll uses regular expressions (it actually calls Matcher.replaceAll, which documents this). In the regex replacement string you have to escape the '\', and in string literals you also have to escape the '\'. Your four backslashes are two backslashes in the Java string, and therefore one escaped backslash in the replacement string.
You need to escape the backslash once for the Java String literal and once more for the regex replacement string, which gives "\\\\".
From the JavaDoc:
Note that backslashes (\) and dollar signs ($) in the replacement
string may cause the results to be different than if it were being
treated as a literal replacement string; see Matcher.replaceAll. Use
Matcher.quoteReplacement(java.lang.String) to suppress the special
meaning of these characters, if desired
System.out.println(s.replaceAll("/", Matcher.quoteReplacement("\\")));
I just want to replace "/" with "\"
Then you should not be using a regular expression, which is overkill, and requires backslashes to be escaped (twice). Instead do
string.replace('/', '\\');
(Still need to escape it once)
Referring to the Java documentation: if you use replaceAll, Java treats the first parameter as a regex and also gives backslashes (and dollar signs) a special meaning in the replacement string. For example, $1 refers to the first capturing group of the regex, and a backslash escapes the character that follows it. So you need to escape the backslashes once so that they are literal backslashes in the String, and then a second time so that replaceAll doesn't treat them as special.
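A minimal sketch of the three working options from these answers, plus a check of what happens with only two backslashes in the source. The class and method names are hypothetical. I catch RuntimeException rather than a specific type because, as far as I know, the exact exception has differed between JDK versions.

```java
import java.util.regex.Matcher;

// Hypothetical demo class; the names are illustrative only.
final class SlashToBackslashDemo {

    // Four backslashes in source = two at runtime = one literal backslash
    // after the regex engine has processed the replacement string.
    static String withReplaceAll(String s) {
        return s.replaceAll("/", "\\\\");
    }

    // quoteReplacement adds the regex-level escaping for us.
    static String withQuoteReplacement(String s) {
        return s.replaceAll("/", Matcher.quoteReplacement("\\"));
    }

    // No regex involved: a single char-for-char replacement.
    static String withCharReplace(String s) {
        return s.replace('/', '\\');
    }

    // With only "\\" the replacement is a lone backslash that escapes
    // nothing, which the regex engine rejects once a match is found.
    static boolean loneBackslashIsRejected(String s) {
        try {
            s.replaceAll("/", "\\");
            return false;
        } catch (RuntimeException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String s = "abc/xyz";
        System.out.println(withReplaceAll(s));
        System.out.println(withQuoteReplacement(s));
        System.out.println(withCharReplace(s));
        System.out.println(loneBackslashIsRejected(s));
    }
}
```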

String.replaceAll single backslashes with double backslashes

I'm trying to convert the String \something\ into the String \\something\\ using replaceAll, but I keep getting all kinds of errors. I thought this was the solution:
theString.replaceAll("\\", "\\\\");
But this gives the below exception:
java.util.regex.PatternSyntaxException: Unexpected internal error near index 1
The String#replaceAll() interprets the argument as a regular expression. The \ is an escape character in both String and regex. You need to double-escape it for regex:
string.replaceAll("\\\\", "\\\\\\\\");
But you don't necessarily need regex for this, simply because you want an exact character-by-character replacement and you don't need patterns here. So String#replace() should suffice:
string.replace("\\", "\\\\");
Update: as per the comments, you appear to want to use the string in JavaScript context. You'd perhaps better use StringEscapeUtils#escapeEcmaScript() instead to cover more characters.
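As a quick sketch of the failure and the two fixes described in this answer (class and method names are made up for the example), the following reproduces the PatternSyntaxException from the question and then doubles the backslashes both ways. The exception message may differ between JDK versions, so the sketch only checks the exception type.

```java
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class; the names are illustrative only.
final class DoubleBackslashDemo {

    // The question's attempt: the pattern is a single backslash at runtime,
    // which is an incomplete escape sequence for the regex compiler.
    static boolean singleBackslashPatternFails(String s) {
        try {
            s.replaceAll("\\", "\\\\");
            return false;
        } catch (PatternSyntaxException e) {
            return true;
        }
    }

    // Regex version: pattern \\ matches one backslash, replacement \\\\
    // (at runtime) produces two.
    static String withReplaceAll(String s) {
        return s.replaceAll("\\\\", "\\\\\\\\");
    }

    // Plain text version: one backslash becomes two, no regex escaping.
    static String withReplace(String s) {
        return s.replace("\\", "\\\\");
    }

    public static void main(String[] args) {
        String s = "\\something\\";
        System.out.println(singleBackslashPatternFails(s));
        System.out.println(withReplaceAll(s));
        System.out.println(withReplace(s));
    }
}
```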
TLDR: use theString = theString.replace("\\", "\\\\"); instead.
Problem
replaceAll(target, replacement) uses regular expression (regex) syntax for target and, partially, for replacement.
The problem is that \ is a special character both in regex (where it is used in sequences like \d to represent a digit) and in String literals (where it is used in sequences like "\n" to represent a line separator, or \" to escape the double quote that would otherwise end the literal).
In both cases we can create a literal \ by escaping it, that is, by placing an additional \ before it (just as we escape " in string literals via \").
So the target regex that represents a \ symbol needs to hold \\, and the string literal representing that text needs to look like "\\\\".
So we escaped \ twice:
once in the regex: \\
once in the String literal: "\\\\" (each \ is represented as "\\").
In the replacement, \ is also special. It lets us escape the other special character, $, which via the $x notation lets us use the portion of the data matched by the regex and held by the capturing group with index x. For example, "012".replaceAll("(\\d)", "$1$1") will match each digit, place it in capturing group 1, and $1$1 will replace it with two copies of itself, resulting in "001122".
So again, to let the replacement represent a literal \ we need to escape it with an additional \, which means that:
the replacement must hold two backslash characters \\
and the String literal that represents \\ looks like "\\\\"
BUT since we want the replacement to produce two backslashes, we need "\\\\\\\\" (each \ represented by one "\\\\").
So the version with replaceAll can look like
replaceAll("\\\\", "\\\\\\\\");
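The counting above can be checked with a small sketch. The class and method names are hypothetical. The two length helpers show how many characters each literal really contains at runtime, and the other methods exercise $1, an escaped $, and the escaped backslash in the replacement.

```java
// Hypothetical demo class; the names are illustrative only.
final class ReplacementSyntaxDemo {

    // $1 refers to capturing group 1, so each digit is written twice.
    static String duplicateDigits(String s) {
        return s.replaceAll("(\\d)", "$1$1");
    }

    // "\\$1" is \$1 at runtime: the backslash escapes the dollar sign,
    // so each digit is replaced with the literal text $1.
    static String digitsToLiteralDollarOne(String s) {
        return s.replaceAll("(\\d)", "\\$1");
    }

    // Pattern \\ matches one backslash; the replacement holds four
    // backslashes at runtime, which the engine reduces to two.
    static String doubleBackslashes(String s) {
        return s.replaceAll("\\\\", "\\\\\\\\");
    }

    // Number of characters the pattern literal holds at runtime.
    static int patternLength() {
        return "\\\\".length();
    }

    // Number of characters the replacement literal holds at runtime.
    static int replacementLength() {
        return "\\\\\\\\".length();
    }

    public static void main(String[] args) {
        System.out.println(duplicateDigits("012"));
        System.out.println(digitsToLiteralDollarOne("012"));
        System.out.println(doubleBackslashes("\\something\\"));
        System.out.println(patternLength() + " " + replacementLength());
    }
}
```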
Easier way with replaceAll
To make our life easier, Java provides tools to automatically escape text for the target and replacement parts. So now we can focus only on strings and forget about regex syntax:
replaceAll(Pattern.quote(target), Matcher.quoteReplacement(replacement))
which in our case can look like
replaceAll(Pattern.quote("\\"), Matcher.quoteReplacement("\\\\"))
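If this pattern comes up often, it could be wrapped in a small helper. The sketch below assumes a hypothetical method name, replaceAllLiterally, which is not part of the JDK. It simply combines Pattern.quote and Matcher.quoteReplacement so that neither argument is interpreted as regex syntax.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; not part of the JDK.
final class LiteralReplaceAllDemo {

    // Treats both target and replacement as plain text even though
    // replaceAll itself works with regex syntax.
    static String replaceAllLiterally(String input, String target, String replacement) {
        return input.replaceAll(Pattern.quote(target), Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        // One backslash becomes two.
        System.out.println(replaceAllLiterally("\\something\\", "\\", "\\\\"));
        // "." and "$" would be special in regex syntax; here they are plain text.
        System.out.println(replaceAllLiterally("a.b", ".", "$"));
    }
}
```

In practice this should behave like input.replace(target, replacement), which is why plain replace is usually the simpler choice.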
Even better: use replace
If we don't really need regex syntax support, let's not involve replaceAll at all. Instead let's use replace. Both methods replace all occurrences of the target, but replace doesn't involve regex syntax. So you could simply write
theString = theString.replace("\\", "\\\\");
To avoid this sort of trouble, you can use replace (which takes a plain string) instead of replaceAll (which takes a regular expression). You will still need to escape backslashes, but not in the wild ways required with regular expressions.
You'll need to escape the (escaped) backslash in the first argument as it is a regular expression. The replacement (2nd argument - see Matcher#replaceAll(String)) also gives backslashes a special meaning, so you'll have to escape those too:
theString.replaceAll("\\\\", "\\\\\\\\");
Yes... by the time the regex compiler sees the pattern you've given it, it sees only a single backslash (since Java's lexer has turned the double backwhack into a single one). You need to replace "\\\\" with "\\\\\\\\", believe it or not! Java really needs a good raw string syntax.
