Invalid escape sequence (valid ones are \b \t \n \f \r \" \' \\ ) - java

I have a problem with a regex in Java.
When I try to use this regex:
^(?:(?:([01]?\d|2[0-3]):)?([0-5]?\d):)?([0-5]?\d)$
I get the following error:
"Invalid escape sequence (valid ones are \b \t \n \f \r \" \' \\ )"
I don't know how to handle that error.
I already tried to double the backslashes, but it didn't work.
I hope someone can help me with this.
Thanks

This should work: ^(?:(?:([01]?\\d|2[0-3]):)?([0-5]?\\d):)?([0-5]?\\d)$
The reason is that the symbols listed in the error message are the only escape sequences a Java string literal accepts, and \d is not one of them. So you have to escape the backslash itself (by adding an extra \ in front of it), which puts a real \ into the string for the regex engine to read as part of \d.

Whenever you're writing regular expressions in Java, remember to escape the \ characters used in the string that defines the regular expression. In other words, if your regular expression contains one \, then you HAVE to write two \\. For example, your code should look like this:
^(?:(?:([01]?\\d|2[0-3]):)?([0-5]?\\d):)?([0-5]?\\d)$
Why, you ask? Because in Java strings, \ is the escape character used to denote special characters (for example tabs, new lines, etc.), and if a string should contain a literal \, then that \ must itself be escaped by prepending another \ in front of it. Hence, \\.
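A minimal sketch of the doubled-backslash regex in use. The class and method names (TimeRegexDemo, isTime) are illustrative only, and the sample inputs are assumptions about what the asker wants to accept.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TimeRegexDemo {
    // Each regex backslash is doubled in the Java literal: "\\d" reaches the regex engine as \d.
    static final Pattern TIME =
            Pattern.compile("^(?:(?:([01]?\\d|2[0-3]):)?([0-5]?\\d):)?([0-5]?\\d)$");

    static boolean isTime(String s) {
        return TIME.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isTime("23:59:59")); // true
        System.out.println(isTime("59"));       // true (hours and minutes are optional)
        System.out.println(isTime("24:00:00")); // false (hours only go up to 23)

        Matcher m = TIME.matcher("12:34:56");
        if (m.matches()) {
            // group(1) = hours, group(2) = minutes, group(3) = seconds
            System.out.println(m.group(1) + " " + m.group(2) + " " + m.group(3)); // 12 34 56
        }
    }
}
```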
For the record, here are the valid escape sequences listed in the Java Language Specification, with their meanings; notice the last one:
\b backspace
\t horizontal tab
\n linefeed
\f form feed
\r carriage return
\" double quote
\' single quote
\\ backslash
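To make the list concrete, this small sketch (the class and field names are made up for illustration) shows that each escape sequence in a literal produces a single character, while "\\d" produces the two characters \ and d that the regex engine needs.

```java
public class EscapeDemo {
    static final String BACKSLASH = "\\";     // one character: \
    static final String REGEX_DIGIT = "\\d";  // two characters: \ and d
    static final String TAB = "\t";           // one character: horizontal tab (code 9)
    static final char SINGLE_QUOTE = '\'';    // one character: '

    public static void main(String[] args) {
        System.out.println(BACKSLASH.length());    // 1
        System.out.println(REGEX_DIGIT.length());  // 2
        System.out.println((int) TAB.charAt(0));   // 9
        System.out.println(SINGLE_QUOTE);          // '
        System.out.println(REGEX_DIGIT);           // \d
    }
}
```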

You can use Notepad++: find \ and replace it with \\.

Related

regex with replaceAll

I have done some searching and would like advice on this problem:
I want to replace "labels":"Webapp" with "labels":["Webapp"]
I found the regex (\"labels\"\:\")+(([a-zA-Z]|\s|\-)+)+(\") with the following substitution "labels":["$2"]
I use the replaceAll method and the Talend editor.
I wrote output_row.json = output_row.json.replaceAll("(\"labels\"\:\")+(([a-zA-Z]|\s|\-)+)+(\")",""labels":["$2"]"); but it doesn't work.
Detailed message: Invalid escape sequence (valid ones are \b \t \n \f \r \" \' \\ )
Then I escaped the characters:
output_row.json = output_row.json.replaceAll("(\\"labels\\"\:\\")+(([a-zA-Z]|\\s|\-)+)+(\\")","\"labels\":[\"$2\"]");
But it still doesn't work.
Please could you help me?
Thanks.
Issues: don't escape - and :. They are not special characters in this regex, and \- and \: are not valid escape sequences in a Java string literal.
Escape \s as \\s, and escape " as you did in your second example: \"labels\":[\"$2\"]
You can also use a more concise regex by combining the whitespace and - inside a character class [] (the version below uses a plain space):
(\"labels\":\")+([a-zA-Z -]+)\"
System.out.println("\"labels\":\"Webapp\""
        .replaceAll("(\"labels\":\")+([a-zA-Z -]+)\""
                , "\"labels\":[\"$2\"]")); // prints "labels":["Webapp"]
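As a sketch (the class and method names are hypothetical, and the sample JSON strings are assumptions), here are both the asker's pattern with only the fixes listed above applied and the concise character-class version, run against a small JSON fragment:

```java
public class LabelsDemo {
    // The asker's pattern with only the necessary fixes:
    // \s becomes \\s, and the unnecessary \: and \- escapes are dropped.
    static String fixVerbose(String json) {
        return json.replaceAll("(\"labels\":\")+(([a-zA-Z]|\\s|-)+)+(\")", "\"labels\":[\"$2\"]");
    }

    // The concise form: letters, space and hyphen in one character class.
    static String fixConcise(String json) {
        return json.replaceAll("(\"labels\":\")+([a-zA-Z -]+)\"", "\"labels\":[\"$2\"]");
    }

    public static void main(String[] args) {
        String json = "{\"labels\":\"Webapp\"}";
        System.out.println(fixVerbose(json)); // {"labels":["Webapp"]}
        System.out.println(fixConcise(json)); // {"labels":["Webapp"]}
    }
}
```

The nested quantifier in the verbose form, (([a-zA-Z]|\\s|-)+)+, works here but can backtrack heavily on long non-matching input, which is one more reason to prefer the single character class.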

difference between '\' and '\\' when using them as escape characters

I know that we use escape characters like \n for a new line and \t for a tab.
But today, while working on a few strings, I came across \\$.
I had to print "nike$" so to print it I had to modify the string as "nike\\$".
I want to know what is the exact difference between \ and \\.
Inside a string literal, \ is an escape: The next character that follows tells us what it will do, as in your \n example for newline.
This means you can't put \ in a string on its own, since it's half of an escape sequence. Instead, to have a \ actually in a string, you use \\.
I had to print "nike$" so to print it I had to modify the string as "nike\\$"
"nike\\$" will result in a string that outputs (for instance, via System.out.println) as nike\$, not nike$.
Your use of \\$ suggests to me that you were feeding a regular expression pattern into something, e.g.:
p = Pattern.compile("nike\\$");
In that situation, we have two levels of escaping going on: The string literal, and the regular expression. To have a literal $ in a regular expression, it has to be escaped by \ because otherwise it's an end-of-input assertion. To get that \$ actually to the regular expression parser when using a string literal, we have to escape the backslash in the literal so we actually have a backslash in the string for the regular expression engine to see, thus \\$.
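A minimal sketch of the two escaping levels (class and method names are illustrative): the literal "nike\\$" holds the characters nike\$, and the regex engine then reads \$ as a literal dollar sign.

```java
import java.util.regex.Pattern;

public class DollarDemo {
    // The literal "nike\\$" holds the 6 characters nike\$; the regex engine
    // then reads \$ as a literal dollar sign rather than the end anchor.
    static final String PATTERN_TEXT = "nike\\$";

    static boolean matchesNikeDollar(String s) {
        return Pattern.compile(PATTERN_TEXT).matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(PATTERN_TEXT);               // nike\$
        System.out.println(matchesNikeDollar("nike$")); // true
        System.out.println(matchesNikeDollar("nike"));  // false
        // Without the escape, $ is an anchor, so the regex "nike$" matches "nike".
        System.out.println(Pattern.matches("nike$", "nike")); // true
    }
}
```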

Why should I use a different number of escape characters in different situations?

With regular expressions in Java, why should I write "\n" to define a newline character but "\\s" to define a whitespace character?
Why does the number of backslashes differ?
Java does its own string parsing: it converts the literal in your code to an internal string in memory before it sends that string to the regex parser.
Java converts the 2 characters \n to a linefeed (ASCII code 0x0A) and the first 2 (!) characters in \\s to a single backslash: \s. Now this string is sent to the regex parser, and since regular expressions recognize their own special escaped characters, it treats the \s as "any whitespace".
At this point, the code \n is already stored as a single character "linefeed", and the regular expression does not process it again.
Since regular expressions also recognize the set \n as "a linefeed", you can also use \\n in your Java string -- Java converts the escaped \\ to a single \, and the regular expression module then finds \n, which (again) gets translated into a linefeed.
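The three cases described above can be checked directly; this is a small sketch with made-up method names, using String.matches so each literal goes straight to the regex engine.

```java
public class NewlineDemo {
    // "\n" is converted by the Java compiler into one linefeed character;
    // the regex engine sees the raw linefeed and matches it literally.
    static boolean rawLinefeed(String s) { return s.matches("a\nb"); }

    // "\\n" reaches the regex engine as the two characters \n,
    // which the regex engine itself interprets as a linefeed.
    static boolean regexLinefeed(String s) { return s.matches("a\\nb"); }

    // "\\s" reaches the regex engine as \s: any whitespace character.
    static boolean whitespace(String s) { return s.matches("a\\sb"); }

    public static void main(String[] args) {
        System.out.println("\n".length());          // 1
        System.out.println("\\n".length());         // 2
        System.out.println(rawLinefeed("a\nb"));    // true
        System.out.println(regexLinefeed("a\nb"));  // true
        System.out.println(whitespace("a b"));      // true
        System.out.println(whitespace("a\tb"));     // true
    }
}
```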
A Java string has a certain set of allowed escape sequences, of which "\n" is one, but "\s" is not. A string doesn't understand the regex shorthand for whitespace. You're probably passing a Java string to Pattern.compile (or to a method such as String.matches or replaceAll), so in order to pass \s to the regex engine, you have to escape the "\" by doubling it.
\ is a special character in many languages (in Java it is special in String and char literals) and in tools like regex.
In a String or char literal it is used to create special characters which you normally couldn't type. By writing \x, where x identifies that special character, you can create
\t tab
\b backspace
\n newline
\r carriage return
\f formfeed
or to escape other special characters
\' single quote - ' is special in char because it marks where a char literal starts and ends, so to actually write the ' character you need to escape it and write it as '\''. Here the first ' starts the char literal, the last ' ends it, and the \' in between creates the literal '.
\" double quote - similarly to \' in char, in a String " marks where the String starts and ends, so to put a " into a string you need to escape it, as in "\"". Here the first " starts the String, the last " ends it, and the \" in between creates the literal ".
\\ backslash - since \ is a special character used to create other special characters, there has to be a way to make it non-special so we can actually print \ as a simple literal.
Problem: how do we write a string representing day\night? If you write it as "day\night", it will be interpreted as day[newline]ight.
So in many languages, to represent a \ literal, another \ is added before it to escape it. A String representing day\night therefore needs to be written as "day\\night" (now the \ in \n is escaped, so it no longer represents \n, a newline, but the \ and n characters one after the other).
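A quick sketch of the day\night example (the class and constant names are made up): the two literals differ by one character because \n collapses to a newline, while \\n stays as a backslash followed by n.

```java
public class DayNightDemo {
    static final String WITH_NEWLINE = "day\night";    // d a y [newline] i g h t
    static final String WITH_BACKSLASH = "day\\night"; // d a y \ n i g h t

    public static void main(String[] args) {
        System.out.println(WITH_NEWLINE.length());   // 8
        System.out.println(WITH_BACKSLASH.length()); // 9
        System.out.println(WITH_BACKSLASH);          // day\night
    }
}
```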
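In the case of regex, to represent the character class that accepts any whitespace, you need to actually pass \s.
But a string representing \s needs to be written as "\\s" because, as mentioned earlier, \ is special in a String and needs escaping.
If you wrote \s as "\s", you would get the "Invalid escape sequence" compilation error. (Since Java 15, \s is itself a string escape meaning a single space, so "\s" compiles there, but the regex engine then receives a plain space instead of the \s whitespace class.)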

How to replace one or more \ in string with just \?

Consider the string,
this\is\\a\new\\string
The output should be:
this\is\a\new\string
So basically one or more \ character should be replaced with just one \.
I tried the following:
str = str.replace("[\\]+","\")
but it was no use. The reason I used two \ in [\\]+ was because internally \ is stored as \\. I know this might be a basic regex question, but I am able to replace one or more normal letters, just not the \ character. Any help is really appreciated.
str.replace("[\\]+", "\") has a few problems:
replace doesn't use regex (replaceAll does), so "[\\]" will represent the literal text [\], not \ nor \\ (depending on what you think it would represent);
even if it did accept a regex, "[\\]" would not be a correct regex: the string holds [\], where \] escapes the ], so you would end up with an unclosed character class [...;
it will not compile (your replacement String is not closed).
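The first two points can be demonstrated with a short sketch. The class and method names are hypothetical, and the replacement "X" is a stand-in for the asker's unclosed "\" so that the code compiles.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class ReplaceVsRegexDemo {
    // String.replace treats its first argument literally: it looks for the
    // four characters [\]+ and, finding none here, leaves the input unchanged.
    static String literalReplace(String s) {
        return s.replace("[\\]+", "X");
    }

    // Returns false when the text is not a valid regex.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(literalReplace("a\\\\b")); // a\\b (unchanged)
        System.out.println(compiles("[\\]+"));        // false: \] escapes ], class never closes
        System.out.println(compiles("[\\\\]+"));      // true: class containing one backslash
    }
}
```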
It will not compile because \ is the start of an escape sequence \X, where X needs to be either
a String special character that should become a simple literal, as in your case, where \" turns " into a literal (so you could print it, for instance) instead of the start/end of the String,
or a normal character that should become a special one, as with the line separators \n and \r or the tab \t.
Now we know that \ is special and is used to escape other characters. So what do we need to do to make \ represent a literal (when we want to print \)? If you guessed that it needs to be escaped with another \, you are right. To create a \ literal we need to write it in a String as "\\".
Since you now know how to create a String containing a \ literal (an escaped \), you can start thinking about how to create your replacement.
A regex which represents one or more \ can look like
\\+
But that is its native form, and we need to create it using a String. I used \\ here because in regex \ is also a special character (for instance \d represents a digit, not a \ literal followed by d), so it also needs to be escaped to represent a \ literal. Just like in a String, we can escape it with another \.
So the String representing this regex needs to be written as
"\\\\+" (we escaped \ twice, once in the regex \\+ and once in the String)
You can use it as the first argument of replaceAll (because replace, as mentioned earlier, doesn't accept a regex).
The last problem you will face is the second argument of the replaceAll method. If you write
replaceAll("\\\\+", "\\")
and the regex finds a match, you will see the exception
java.lang.IllegalArgumentException: character to be escaped is missing
This is because in the replacement part (the second argument of replaceAll) we can also use the special syntax $x, which represents the current match of the group with index x. To be able to turn $ into a literal we need some escape mechanism, and again \ is used for that purpose. So \ is also special in the replacement part of our method.
So again, to create a \ literal we need to escape it with another \, and the string literal representing the expression \\ is "\\\\".
But let's get back to the earlier exception: the message "character to be escaped is missing" refers to the X part of the \X syntax (X is the character we want to escape). The problem is that your replacement "\\" represented only the \ part, so the method expected it to be followed either by $ (forming \$, a literal $) or by \ (forming \\, a literal \). So valid replacements would be "\\$" or "\\\\".
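A small sketch of the replacement syntax described above (class and method names are illustrative): $2$1 refers to capture groups, and \$ in the replacement text, written "\\$" in Java, yields a literal dollar sign.

```java
public class ReplacementDemo {
    // $2$1 refers to capture groups 2 and 1 of the current match.
    static String swap(String s) {
        return s.replaceAll("(a)(b)", "$2$1");
    }

    // In the replacement, \$ produces a literal dollar sign; the Java literal
    // for that replacement text is "\\$".
    static String price(String s) {
        return s.replaceAll("cost", "\\$5");
    }

    public static void main(String[] args) {
        System.out.println(swap("ab"));          // ba
        System.out.println(price("cost: cost")); // $5: $5
    }
}
```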
To make things work, write your replacing call as
str = str.replaceAll("\\\\+", "\\\\")
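Putting it together, here is a minimal sketch (the class and method names are hypothetical) that applies the fix to the question's sample string and also confirms that the single-backslash replacement "\\" throws IllegalArgumentException once a match is found:

```java
public class CollapseBackslashes {
    // Regex \\+ (one or more backslashes) written as a Java literal: "\\\\+".
    // Replacement \\ (one literal backslash in replaceAll's replacement syntax)
    // written as a Java literal: "\\\\".
    static String collapse(String s) {
        return s.replaceAll("\\\\+", "\\\\");
    }

    // With the replacement "\\" (a single backslash), replaceAll throws
    // IllegalArgumentException as soon as the regex finds a match.
    static boolean singleBackslashReplacementFails(String s) {
        try {
            s.replaceAll("\\\\+", "\\");
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String input = "this\\is\\\\a\\new\\\\string";             // holds this\is\\a\new\\string
        System.out.println(collapse(input));                        // this\is\a\new\string
        System.out.println(singleBackslashReplacementFails(input)); // true
    }
}
```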
You can use:
str = str.replace("\\\\", "\\");
Remember that String#replace doesn't take a regex.
Try this:
str = str.replaceAll("\\\\+", "\\\\");
When writing regular expressions, you typically need to double-escape backslashes. So you would do this:
str = str.replaceAll("\\\\+", "\\\\");
I'd use Matcher.quoteReplacement() and String.replaceAll() here.
Like this:
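import java.util.regex.Matcher;

String s;
[...]
// quoteReplacement("\\") escapes the backslash so replaceAll inserts it literally
s = s.replaceAll("\\\\+", Matcher.quoteReplacement("\\"));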
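To compare the suggestions above, here is a sketch (class and method names are made up; the three-backslash input is an assumed extra test case, not from the question). Both give the expected result for the question's string, but the literal String.replace version only halves each pair, so a run of three backslashes becomes two.

```java
import java.util.regex.Matcher;

public class QuoteReplacementDemo {
    // Matcher.quoteReplacement("\\") returns the two characters \\, so the
    // replacement text does not need to be double-escaped by hand.
    static String viaQuoteReplacement(String s) {
        return s.replaceAll("\\\\+", Matcher.quoteReplacement("\\"));
    }

    // String.replace is literal: it turns each pair of backslashes into one,
    // scanning left to right without overlap.
    static String viaLiteralReplace(String s) {
        return s.replace("\\\\", "\\");
    }

    public static void main(String[] args) {
        String input = "this\\is\\\\a\\new\\\\string";
        System.out.println(viaQuoteReplacement(input)); // this\is\a\new\string
        System.out.println(viaLiteralReplace(input));   // this\is\a\new\string

        String triple = "a\\\\\\b"; // holds a\\\b
        System.out.println(viaQuoteReplacement(triple)); // a\b
        System.out.println(viaLiteralReplace(triple));   // a\\b
    }
}
```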

Java regular expression to remove all non alphanumeric characters EXCEPT spaces

I'm trying to write a regular expression in Java which removes all non-alphanumeric characters from a paragraph, except the spaces between the words.
This is the code I've written:
paragraphInformation = paragraphInformation.replaceAll("[^a-zA-Z0-9\s]", "");
However, the compiler gave me an error message pointing to the s saying it's an illegal escape character. The program compiled OK before I added the \s to the end of the regular expression, but the problem with that was that the spaces between words in the paragraph were stripped out.
How can I fix this error?
You need to double-escape the \ character: "[^a-zA-Z0-9\\s]"
Java will interpret \s as a Java String escape character, which is indeed an invalid Java escape. By writing \\, you escape the \ character, essentially sending a single \ character to the regex. This \ then becomes part of the regex escape character \s.
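A minimal runnable version of the corrected call (the class name and sample sentence are illustrative): punctuation is removed while letters, digits and whitespace survive.

```java
public class StripPunctuation {
    // "\\s" reaches the regex engine as \s, so whitespace is kept
    // alongside letters and digits; everything else is removed.
    static String strip(String paragraph) {
        return paragraph.replaceAll("[^a-zA-Z0-9\\s]", "");
    }

    public static void main(String[] args) {
        System.out.println(strip("Hello, World! 123")); // Hello World 123
    }
}
```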
You need to escape the \ so that the regular expression recognizes \s :
paragraphInformation = paragraphInformation.replaceAll("[^a-zA-Z0-9\\s]", "");
Generally whenever you see that error, it means you only have a single backslash where you need two:
paragraphInformation = paragraphInformation.replaceAll("[^a-zA-Z0-9\\s]", "");
Victoria, you must write \\s not \s here.
Please take a look at this site; you can test Java regexes online and get well-formatted regex string patterns back:
http://www.regexplanet.com/advanced/java/index.html
