RegEx special char "|" escaping in Java - java

I am trying to split a string like: abc|aa||
When I use the regular string.split I am required to provide a regular expression.
I tried to do the following :
string.split("|")
string.split("\|")
string.split("/|")
string.split("\Q|\E")
None of them work.
Does anyone know how to make it work?

I don't know how you tried, but
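import java.util.Arrays;
import java.util.regex.Pattern;
public class Main {
    public static void main(String[] args) {
        String a = "abc|aa||";
        String split = Pattern.quote("|"); // builds the quoted pattern \Q|\E
        System.out.println(split);
        System.out.println(Arrays.toString(a.split(split)));
    }
}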
prints out
\Q|\E
[abc, aa]
effectively splitting on |. The \Q ... \E is a regex quote. Anything inside it will be matched as a literal pattern.
string.split("\|");  // won't compile: \| is not a valid escape sequence in a Java string literal
string.split("/|");  // compiles, but splits on / or on the empty string, so between every character
string.split("|");   // compiles, but splits on the empty string, so between every character
// a true alternative to the quoted solution above
string.split("\\|"); // the literal \\ yields a single \, so the regex engine sees \| (an escaped |)
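To see the difference between these variants in one place, here is a minimal sketch. The class and method names are made up for illustration; only String.split and Pattern.quote come from the answer above.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class PipeSplitDemo {
    // Splits on a literal '|' using the escaped pattern.
    static String[] splitEscaped(String s) {
        return s.split("\\|");
    }

    // Splits on a literal '|' by letting Pattern.quote build the pattern.
    static String[] splitQuoted(String s) {
        return s.split(Pattern.quote("|"));
    }

    // The unescaped pattern: an alternation of two empty alternatives,
    // which matches the empty string at every position.
    static String[] splitUnescaped(String s) {
        return s.split("|");
    }

    public static void main(String[] args) {
        String a = "abc|aa||";
        System.out.println(Arrays.toString(splitEscaped(a)));   // [abc, aa]
        System.out.println(Arrays.toString(splitQuoted(a)));    // [abc, aa]
        System.out.println(Arrays.toString(splitUnescaped(a))); // one element per character
    }
}
```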

Using a double backslash is required because the backslash is also a special character in Java string literals. So you need to escape the escape character, i.e. \\|

| is a special character in regex, hence you need to escape it using backslashes. Try using
string.split("\\|")

| is a special character in regular expressions, so it must be escaped, e.g. \|
The backslash \ is a special character in Java string literals, so it must also be escaped.
As a result, you must do the following to achieve the desired effect:
string.split("\\|")

All of the following patterns split it correctly: "\\Q|\\E", "\\|" and "[|]". Of course, the latter two are preferable.
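A quick way to check that the three spellings behave the same is to run them side by side. This is only a sketch with made-up names. The negative limit passed to split is an addition that is not part of the answers above: it is the documented way to keep the trailing empty strings that the one-argument split discards, which matters for an input ending in ||.

```java
import java.util.Arrays;

class PipePatterns {
    // The three pattern spellings listed above, as Java string literals.
    static final String[] PATTERNS = { "\\Q|\\E", "\\|", "[|]" };

    // Splits with a negative limit so trailing empty strings are kept.
    static String[] splitKeepingEmpties(String s, String regex) {
        return s.split(regex, -1);
    }

    public static void main(String[] args) {
        String input = "abc|aa||";
        for (String p : PATTERNS) {
            System.out.println(p + " -> " + Arrays.toString(input.split(p))
                    + " / " + Arrays.toString(splitKeepingEmpties(input, p)));
        }
    }
}
```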

Related

Using regex, with reserved characters in Java [duplicate]

How to write a regular expression to match this \" (a backslash then a quote)? Assume I have a string like this:
click to search
I need to replace all the \" with a ", so the result would look like:
click to search
This one does not work: str.replaceAll("\\\"", "\"") because it only matches the quote. Not sure how to get around with the backslash. I could have removed the backslash first, but there are other backslashes in my string.
If you don't need any regex mechanisms such as predefined character classes (\d), quantifiers, etc., then instead of replaceAll, which expects a regex, use replace, which expects literals:
str = str.replace("\\\"","\"");
Both methods replace all occurrences of the target, but replace treats the target literally.
But if you really must use regex, you are looking for
str = str.replaceAll("\\\\\"", "\"")
\ is a special character in regex (used, for instance, to create \d, the character class representing digits). To make regex treat \ as a normal character you need to place another \ before it to turn off its special meaning (you need to escape it). So the regex for a single backslash is \\.
But to create a string literal representing the text \\, so that you can pass it to the regex engine, you need to write it with four backslashes ("\\\\"), because \ is also a special character in String literals (the parts of code written using "..."), where it is used, for instance, in \t to represent a tab.
That is why you also need to escape \ there.
In short you need to escape \ twice:
in regex \\
and then in String literal "\\\\"
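The two approaches can be compared directly. The following is a small sketch; the class name, method names and sample input are invented for the example, and the input deliberately contains another backslash that should survive.

```java
class UnescapeQuotes {
    // Literal replacement: no regex involved, so only the string literal needs escaping.
    static String withReplace(String s) {
        return s.replace("\\\"", "\"");
    }

    // Regex replacement: the backslash is escaped once for the regex
    // and once more for the string literal.
    static String withReplaceAll(String s) {
        return s.replaceAll("\\\\\"", "\"");
    }

    public static void main(String[] args) {
        // Hypothetical input text: say \"hi\" and keep C:\temp
        String input = "say \\\"hi\\\" and keep C:\\temp";
        System.out.println(input);
        System.out.println(withReplace(input));
        System.out.println(withReplaceAll(input));
    }
}
```

Both methods should turn the sample into say "hi" and keep C:\temp, leaving the unrelated backslash alone.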
You don't need a regular expression.
str.replace("\\\"", "\"")
should work just fine.
The replace method takes two substrings and replaces all non-overlapping occurrences of the first with the second. Per the javadoc:
public String replace(CharSequence target,
CharSequence replacement)
Replaces each substring of this string that matches the literal target sequence with the specified literal replacement sequence. The replacement proceeds from the beginning of the string to the end, for example, replacing "aa" with "b" in the string "aaa" will result in "ba" rather than "ab".
Try this: str.replaceAll("\\\\\"", "\\\"")
because the backslashes are unescaped twice:
(1) \\\\\" --> \\" (by the Java compiler, for the string literal)
(2) \\" --> \" (by the regex engine)
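One way to convince yourself of the two stages is to print what the compiler actually hands to the regex engine. This is just an illustrative sketch; the class and constant names are not from the answer.

```java
class EscapeLayers {
    // After compilation this literal holds three characters: backslash, backslash, quote.
    static final String REGEX = "\\\\\"";
    // After compilation this literal holds two characters: backslash, quote.
    static final String REPLACEMENT = "\\\"";

    static String unescape(String s) {
        return s.replaceAll(REGEX, REPLACEMENT);
    }

    public static void main(String[] args) {
        System.out.println(REGEX + " has length " + REGEX.length());
        System.out.println(REPLACEMENT + " has length " + REPLACEMENT.length());
        // The input text a\"b becomes a"b
        System.out.println(unescape("a\\\"b"));
    }
}
```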

Difference between '\' and '\\' when using them as escape characters

I know that we use escape characters like \n for a new line and \t for a tab.
But today, while working on a few strings, I came across \\$.
I had to print "nike$" so to print it I had to modify the string as "nike\\$".
I want to know the exact difference between \ and \\.
Inside a string literal, \ is an escape: The next character that follows tells us what it will do, as in your \n example for newline.
This means you can't put \ in a string on its own, since it's half of an escape sequence. Instead, to have a \ actually in a string, you use \\.
I had to print "nike$" so to print it I had to modify the string as "nike\\$"
"nike\\$" will result in a string that outputs (for instance, via System.out.println) as nike\$, not nike$.
Your use of \\$ suggests to me that you were feeding a regular expression pattern into something, e.g.:
p = Pattern.compile("nike\\$");
In that situation, we have two levels of escaping going on: The string literal, and the regular expression. To have a literal $ in a regular expression, it has to be escaped by \ because otherwise it's an end-of-input assertion. To get that \$ actually to the regular expression parser when using a string literal, we have to escape the backslash in the literal so we actually have a backslash in the string for the regular expression engine to see, thus \\$.
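The two levels can be checked with a short sketch like the one below. The class and constant names are invented; Pattern.compile and Matcher.matches are the standard java.util.regex calls.

```java
import java.util.regex.Pattern;

class DollarEscape {
    // The regex nike\$ : "nike" followed by a literal dollar sign.
    static final Pattern LITERAL_DOLLAR = Pattern.compile("nike\\$");
    // The regex nike$ : "nike" followed by the end of the input.
    static final Pattern END_ANCHOR = Pattern.compile("nike$");

    public static void main(String[] args) {
        System.out.println("nike\\$"); // prints nike\$ because the string contains a backslash
        System.out.println(LITERAL_DOLLAR.matcher("nike$").matches()); // true
        System.out.println(END_ANCHOR.matcher("nike$").matches());     // false
        System.out.println(END_ANCHOR.matcher("nike").matches());      // true
    }
}
```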

How to replace one or more \ in string with just \?

Consider the string,
this\is\\a\new\\string
The output should be:
this\is\a\new\string
So basically one or more \ character should be replaced with just one \.
I tried the following:
str = str.replace("[\\]+","\")
but it was no use. The reason I used two \ in [\\]+ was because internally \ is stored as \\. I know this might be a basic regex question, but I am able to replace one or more ordinary letters, just not the \ character. Any help is really appreciated.
str.replace("[\\]+", "\") has a few problems:
replace doesn't use regex (replaceAll does), so "[\\]" represents the literal text [\], not \ or \\ (depending on what you expected it to represent);
even if it did accept regex, "[\\]" would not be a correct regex, because \] would escape the ], so you would end up with an unclosed character class [..;
it will not compile (your replacement String is not closed).
It will not compile because \ is the start of an escape sequence \X, where X is a character that needs to be either
changed from a String special character into a simple literal, as in your case, where \" escapes " into a literal (so you could print it, for instance) instead of the start or end of the String, or
changed from a normal character into a special one, as with the line separators \n and \r or the tab \t.
Now we know that \ is special and is used to escape other characters. So what do we need to do to make \ represent a literal (when we want to print \)? If you guessed that it needs to be escaped with another \, you are right. To create a \ literal we need to write it in a String as "\\".
Now that you know how to create a String containing a \ literal (an escaped \), you can start thinking about how to create your replacement.
A regex that represents one or more \ can look like
\\+
But that is its native form, and we need to create it using a String. I used \\ here because in regex \ is also a special character (for instance, \d represents a digit, not a \ literal followed by d), so it also needs to be escaped first to represent a \ literal. Just like in a String, we can escape it with another \.
So the String representing this regex needs to be written as
"\\\\+" (we escaped \ twice, once in regex \\+ and once in string)
You can use it as the first argument of replaceAll (because replace, as mentioned earlier, doesn't accept regex).
The last problem you will face is the second argument of the replaceAll method. If you write
replaceAll("\\\\+", "\\")
and it finds a match for the regex, you will see the exception
java.lang.IllegalArgumentException: character to be escaped is missing
This is because in the replacement part (the second argument of replaceAll) we can also use the special form $x, which represents the current match of the group with index x. To be able to turn $ into a literal we need some escape mechanism, and again \ is used for that purpose. So \ is also special in the replacement part of our method.
So again, to create a \ literal we need to escape it with another \, and the string literal representing the expression \\ is "\\\\".
But let's get back to the earlier exception: the message "character to be escaped is missing" refers to the X part of the \X form (X is the character we want to escape). The problem is that your earlier replacement "\\" represented only the \ part, so the method expected either $ to create \$, or another \ to create \\ (a literal \). So valid replacements would be "\\$" or "\\\\".
To make things work you need to write your replacement call as
str = str.replaceAll("\\\\+", "\\\\")
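Here is a small sketch that puts the pieces together: the working call, plus a check that the single-backslash replacement really is rejected. The class and method names are invented for the example, and the exception type is the one quoted above.

```java
class CollapseBackslashes {
    // Replaces every run of one or more backslashes with a single backslash.
    static String collapse(String s) {
        return s.replaceAll("\\\\+", "\\\\");
    }

    // Returns true if the lone-backslash replacement is rejected once a match is found.
    static boolean loneBackslashReplacementFails(String s) {
        try {
            s.replaceAll("\\\\+", "\\");
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // The input text from the question: this\is\\a\new\\string
        String input = "this\\is\\\\a\\new\\\\string";
        System.out.println(input);
        System.out.println(collapse(input));
        System.out.println(loneBackslashReplacementFails(input));
    }
}
```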
You can use:
str = str.replace("\\\\", "\\");
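Remember that String#replace doesn't take a regex, so this replaces each literal pair of backslashes with one; a run of three or more is shortened but not collapsed to a single backslash.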
try this
str = str.replaceAll("\\\\+", "\\\\");
When writing regular expressions, you typically need to double-escape backslashes. So you would do this:
str = str.replaceAll("\\\\+", "\\\\");
I'd use Matcher.quoteReplacement() and String.replaceAll() here.
Like this:
String s;
[...]
s = s.replaceAll("\\\\+", Matcher.quoteReplacement("\\"));
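As a rough illustration of what Matcher.quoteReplacement does here, the sketch below prints the quoted replacement and uses it to collapse a run of three backslashes. The class name and sample input are made up.

```java
import java.util.regex.Matcher;

class QuoteReplacementDemo {
    static String collapse(String s) {
        // quoteReplacement escapes \ and $ so the replacement is taken literally.
        return s.replaceAll("\\\\+", Matcher.quoteReplacement("\\"));
    }

    public static void main(String[] args) {
        // A single backslash is quoted into two, the same text as the literal "\\\\".
        System.out.println(Matcher.quoteReplacement("\\"));
        // The input text a\\\b becomes a\b
        System.out.println(collapse("a\\\\\\b"));
    }
}
```

The advantage of this form is that you only have to think about one level of escaping in the replacement, the one required by the string literal.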
