Pattern Matching failed when "\" is input - java

My pattern is something like this:
"^[a-zA-Z0-9_'^&/+-\\.]{1,}#{1,1}[a-zA-Z0-9_'^&/+-.]{1,}$"
But when I try to match something with a backslash in it, like this:
"abc\\#abc"
...it does not match. Can anyone explain why?

Try the pattern below:
"^[a-zA-Z0-9_'^&/+-\\\\.]{1,}#{1,1}[a-zA-Z0-9_'^&/+-.]{1,}$";
or
"^[a-zA-Z0-9_'^&/+-\\{0,}}.]{1,}#{1,1}[a-zA-Z0-9_'^&/+-.]{1,}$";
The expression \\ matches a single backslash \

Try escaping each backslash of your test string with an additional backslash: e.g.
"abc\\\\#abc" becomes "abc\\\\\\\\#abc"

You need to use "\\\\" if you want the end result to be a single "\".
Why, you ask?
The Java compiler sees the string literal "\\\\" and turns it into "\\", because "\" is an escape character in string literals.
The regular expression engine then sees "\\" and turns it into "\", because "\" is an escape character in regex too.
So to match a single backslash you must put in four.
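The two layers described above can be checked directly. This is only a minimal sketch; the class and member names are made up for illustration.

```java
import java.util.regex.Pattern;

class BackslashLayers {
    // Four backslashes in source -> a String holding two backslash characters
    // -> a regex that matches exactly one backslash.
    static final String REGEX_SOURCE = "\\\\";

    static boolean matchesSingleBackslash(String input) {
        return Pattern.matches(REGEX_SOURCE, input);
    }

    public static void main(String[] args) {
        System.out.println(REGEX_SOURCE.length());          // 2
        System.out.println(matchesSingleBackslash("\\"));   // true: input is one backslash
        System.out.println(matchesSingleBackslash("\\\\")); // false: input is two backslashes
    }
}
```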

I'm assuming you're writing the regex in your Java source code, like this:
Pattern p = Pattern.compile(
"^[a-zA-Z0-9_'^&/+-\\.]{1,}#{1,1}[a-zA-Z0-9_'^&/+-.]{1,}$"
);
I'm also assuming you meant \\. as a backslash followed by a dot, not as an escaped dot.
Because it's in a string literal, you have to escape backslashes one more time. That means you have to use four backslashes in the regex to match one in the target string. You also need to escape the - (hyphen), or move it to the end of the character class as I did below, so the regex compiler doesn't think (for example) that [+-.] is meant to be a range expression like [0-9] or [a-z].
"^[a-zA-Z0-9_'^&/+\\\\.-]+#[a-zA-Z0-9_'^&/+.-]+$"
I also changed your {1,} to + because it means the same thing, and got rid of the {1,1} because it doesn't do anything. And I changed your &amp; to &. I don't know how that got in there, but if you wrote it that way in your source code, it's wrong.
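For anyone who wants to verify the difference, here is a small sketch comparing the pattern from the question with the corrected one. The class and method names are only illustrative.

```java
import java.util.regex.Pattern;

class BackslashInClass {
    // Pattern as posted in the question: "+-\\." is read as the range '+' to '.',
    // so the class never contains a backslash.
    static final Pattern ORIGINAL =
        Pattern.compile("^[a-zA-Z0-9_'^&/+-\\.]{1,}#{1,1}[a-zA-Z0-9_'^&/+-.]{1,}$");

    // Corrected pattern: four backslashes in source = one literal backslash in the class,
    // and the hyphen sits at the end of the class so it cannot form a range.
    static final Pattern FIXED =
        Pattern.compile("^[a-zA-Z0-9_'^&/+\\\\.-]+#[a-zA-Z0-9_'^&/+.-]+$");

    static boolean original(String s) { return ORIGINAL.matcher(s).matches(); }
    static boolean fixed(String s)    { return FIXED.matcher(s).matches(); }

    public static void main(String[] args) {
        String input = "abc\\#abc";          // the characters abc\#abc
        System.out.println(original(input)); // false
        System.out.println(fixed(input));    // true
    }
}
```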

Related

How to escape characters in a regular expression

When I use the following code I've got an error:
Matcher matcher = pattern.matcher("/Date\(\d+\)/");
The error is :
invalid escape sequence (valid ones are \b \t \n \f \r \" \' \\ )
I have also tried changing the value in the brackets to ('/Date\(\d+\)/'); without any success.
How can I avoid this error?
You need to double-escape your \ character, like this: \\.
Otherwise your String is interpreted as if you were trying to escape (.
Same with the other round bracket and the d.
In fact it seems you are trying to initialize a Pattern here, while pattern.matcher references a text you want your Pattern to match.
Finally, note that in a Pattern, escaped characters require a double escape, as such:
\\(\\d+\\)
Also, as Rohit says, Patterns in Java do not need to be surrounded by forward slashes (/).
In fact if you initialize a Pattern like that, it will interpret your Pattern as starting and ending with literal forward slashes.
Here's a small example of what you probably want to do:
// your input text
String myText = "Date(123)";
// your Pattern initialization
Pattern p = Pattern.compile("Date\\(\\d+\\)");
// your matcher initialization
Matcher m = p.matcher(myText);
// printing the output of the match...
System.out.println(m.find());
Output:
true
Your regex is correct by itself, but in Java, the backslash character itself needs to be escaped.
Thus, this regex:
/Date\(\d+\)/
Must turn into this:
/Date\\(\\d+\\)/
One backslash is for escaping the parenthesis or d. The other one is for escaping the backslash itself.
The error message you are getting arises because Java thinks you're trying to use \( as a single escape character, like \n, or any of the other examples. However, \( is not a valid escape sequence, and so Java complains.
In addition, the logic of your code is probably incorrect. The argument to matcher should be the text to search (for example, "/Date(234)/Date(6578)/"), whereas the variable pattern should contain the pattern itself. Try this:
String textToMatch = "/Date(234)/Date(6578)/";
Pattern pattern = Pattern.compile("/Date\\(\\d+\\)/");
Matcher matcher = pattern.matcher(textToMatch);
Finally, the regex character class \d means "one single digit." If you are instead trying to match the literal two-character text \d, the regex has to be \\d, which is written "\\\\d" in a Java string literal. However, in that case your regex would be a constant, and you could use textToMatch.indexOf or textToMatch.contains more easily.
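If the goal is to pull the numbers out of such text, something along these lines might work. It is only a sketch: the capturing group around \d+ and the names DateNumbers and extract are additions for illustration, and the surrounding slashes are left out of the pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DateNumbers {
    // Regex Date\((\d+)\): literal parentheses are escaped, group 1 captures the digits.
    private static final Pattern DATE = Pattern.compile("Date\\((\\d+)\\)");

    static List<String> extract(String text) {
        List<String> numbers = new ArrayList<>();
        Matcher m = DATE.matcher(text);
        while (m.find()) {
            numbers.add(m.group(1)); // digits between the parentheses
        }
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(extract("/Date(234)/Date(6578)/")); // [234, 6578]
    }
}
```

Note that with the slash-delimited pattern, find() would report only the first entry of this particular text, because the first match consumes the slash that the second one would need.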
To escape regex in java, you can also use Pattern.quote()
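A short sketch of how Pattern.quote() could be used here, assuming you want to search for a fixed piece of text rather than a pattern. The helper name containsLiteral is made up for this example.

```java
import java.util.regex.Pattern;

class QuoteDemo {
    // Treats 'literal' as plain text: Pattern.quote wraps it so that
    // characters such as ( and ) lose their regex meaning.
    static boolean containsLiteral(String text, String literal) {
        return Pattern.compile(Pattern.quote(literal)).matcher(text).find();
    }

    // Same search without quoting, so the parentheses form a capturing group.
    static boolean containsUnquoted(String text, String regex) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsLiteral("/Date(123)/", "Date(123)"));  // true
        System.out.println(containsUnquoted("/Date(123)/", "Date(123)")); // false: regex means "Date123"
    }
}
```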

Removing backslash and newline character (occurring together) in Java

I have a stream of data coming from different feeds which I need to clean up.
The data is in a specific format, and if a sentence spans multiple lines, the lines are separated using "\" (backslash), which I want to remove. \ is also present in other parts of the text for escaping quotes etc., and I don't want to remove those backslashes. So eventually I want to remove "\\n".
I have tried the following regex for removing \ and \n, but it didn't work:
singleLine.replaceAll("(\\\\n|\\\\r)", "");
I am not sure what regex would work in this case.
Regex isn't really necessary for this; If I were you, I would use...
singleLine=singleLine.replace("\\\\n", "");
Many people think the replace method only replaces one, but in fact the only difference is that replaceAll uses regex, while replace simply replaces exact matches of the String.
If you do want to use regex though, I believe you have to do \\\\\\\\ (you have to 'nullify' the escape character in Java, and in regex, so x4, not just x2)
The only other issue is that in your example, you never set singleLine equal to anything; I'm not sure if you hid that, or missed that.
Edit:
Explaining the reasoning for \\\\\\\\ some more: Java requires that you write "\\" to represent one \. Regex also has a use for the \ character, and requires you to do the same again for it. If you just write "\\" in Java, the regex parser essentially receives "\", its escape character for certain things. You need to give the regex parser two of them to escape it, so in Java you need to write "\\\\" just to represent a match for a single "\".
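The point about replace versus replaceAll is easy to check with a character that is special in regex. A minimal sketch with illustrative names:

```java
class ReplaceVsReplaceAll {
    // replace: the target is plain text, and every occurrence is replaced.
    static String literal(String s) {
        return s.replace(".", "-");
    }

    // replaceAll: the target is a regex, and "." matches any character.
    static String regex(String s) {
        return s.replaceAll(".", "-");
    }

    public static void main(String[] args) {
        System.out.println(literal("a.b.c")); // a-b-c
        System.out.println(regex("a.b.c"));   // -----
    }
}
```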
You'll need 5 backslash characters for each pattern in that regexp.
Use:
singleLine.replaceAll("(\\\\\n|\\\\\r)", "");
The backslash character is both an escape sequence in your string and an escape sequence in the regexp. So to represent a literal \ in a regexp you'll need to use 4 \ characters - your regexp needs \\ to get an escaped backslash, and each of those needs to be escaped in the java String - and then another to represent either \n or \r.
String str = "string with \\\n newline and \\\n newline ...";
String repl = str.replaceAll("(\\\\\n|\\\\\r)", "");
System.out.println("str: " + str);
System.out.println("repl: " + repl);
Output:
str: string with \
 newline and \
 newline ...
repl: string with  newline and  newline ...
You need to assign the return value to another String object, or the same object, because of String immutability.
singleLine = singleLine.replaceAll("(\\\\n|\\\\r)", "");
More info is here
Remember that Strings are immutable. This means that replaceAll() does not change the String in singleLine. You must use the return value to get the modified String. For example, you can do
singleLine = singleLine.replaceAll("(\\\\n|\\\\r)", "");
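Putting the pieces together, a sketch like the following might do the cleanup. It assumes the feed contains a real backslash character followed by a real line break, and it uses the \R linebreak matcher, which needs Java 8 or later. The class and method names are only illustrative.

```java
class ContinuationCleaner {
    // Source "\\\\\\R" -> regex \\\R: one literal backslash followed by any line break
    // (\n, \r\n, \r, ...). Backslashes not followed by a line break are left alone.
    static String removeContinuations(String s) {
        return s.replaceAll("\\\\\\R", "");
    }

    public static void main(String[] args) {
        String singleLine = "first part \\\nsecond part, with \\\"quotes\\\"";
        // Strings are immutable, so the result has to be assigned.
        singleLine = removeContinuations(singleLine);
        System.out.println(singleLine); // first part second part, with \"quotes\"
    }
}
```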

Java String appending a double quote

I want to replace all brackets in a String with the double quote character.
I thought this would work:
"[foo".replaceAll(Pattern.quote("["), Pattern.quote("""));
but it does not. Can anyone help me understand what I need to do?
You need to escape the quotes
"[foo".replaceAll(Pattern.quote("["), "\"");
replaceAll takes strings
"[foo".replaceAll("\\[", "\"");
Might I also add this as a good place to test your regex strings
"[foo".replaceAll(Pattern.quote("["), "\"") ;
The second argument - replacement - is a common string (doesn't need quotation).
"[foo".replaceAll("\\[", "\"")
To escape special characters in string literals, such as " and \, you prepend a \, so \" becomes ", \\ becomes \, etc.
The following works:
"[foo".replaceAll("\\[", "\"")
Notes:
replaceAll interprets its first argument as a regular expression.
you need to escape (within the regex context) the opening bracket or it will be malformed.
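A runnable sketch of the options mentioned in this thread. The class and method names are made up, and handling the closing bracket as well is an assumption about what "all brackets" means.

```java
class BracketsToQuotes {
    // Regex version: [\[\]] is a character class containing both square brackets.
    static String withRegex(String s) {
        return s.replaceAll("[\\[\\]]", "\"");
    }

    // Non-regex version: char-based replace needs no regex escaping at all.
    static String withoutRegex(String s) {
        return s.replace('[', '"').replace(']', '"');
    }

    public static void main(String[] args) {
        System.out.println(withRegex("[foo"));     // "foo
        System.out.println(withoutRegex("[foo]")); // "foo"
    }
}
```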

java eclipse regex cant "\+"

I need to check a String is "\++?" which will match something like +6014456
But I get this error message invalid escape sequence (valid ones are \b \t \n \f \r \" \' \\) .... why?
It's giving you an error because "\++?" isn't a valid Java literal - you need to escape the backslash. Try this:
Pattern pattern = Pattern.compile("\\++?");
However, I don't think that's actually the regular expression you want. Don't you actually mean something like:
Pattern pattern = Pattern.compile("\\+\\d+");
That corresponds to a regular expression of \+\d+, i.e. a plus followed by at least one digit.
I think you should use two backslashes. One for escaping the second (because it's a java string), the second for escaping the + (because it's a special character for regex).
shouldn't it be more like "\\+?" ?
Pattern pattern = Pattern.compile("\\++?");
System.out.println(pattern.matcher("+9970").find());
works for me
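The difference between the two patterns discussed above shows up once you require the whole string to match instead of using find(). A minimal sketch with illustrative names:

```java
class PlusNumber {
    // Regex \+\d+ : a literal plus followed by at least one digit.
    static boolean isPlusNumber(String s) {
        return s.matches("\\+\\d+");
    }

    // Regex \++? : one or more literal plus signs (reluctant), and no digits.
    static boolean isOnlyPluses(String s) {
        return s.matches("\\++?");
    }

    public static void main(String[] args) {
        System.out.println(isPlusNumber("+6014456")); // true
        System.out.println(isPlusNumber("6014456"));  // false
        System.out.println(isOnlyPluses("+6014456")); // false: the digits are not covered
    }
}
```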

String.replaceAll single backslashes with double backslashes

I'm trying to convert the String \something\ into the String \\something\\ using replaceAll, but I keep getting all kinds of errors. I thought this was the solution:
theString.replaceAll("\\", "\\\\");
But this gives the below exception:
java.util.regex.PatternSyntaxException: Unexpected internal error near index 1
The String#replaceAll() interprets the argument as a regular expression. The \ is an escape character in both String and regex. You need to double-escape it for regex:
string.replaceAll("\\\\", "\\\\\\\\");
But you don't necessarily need regex for this, simply because you want an exact character-by-character replacement and you don't need patterns here. So String#replace() should suffice:
string.replace("\\", "\\\\");
Update: as per the comments, you appear to want to use the string in JavaScript context. You'd perhaps better use StringEscapeUtils#escapeEcmaScript() instead to cover more characters.
TLDR: use theString = theString.replace("\\", "\\\\"); instead.
Problem
replaceAll(target, replacement) uses regular expression (regex) syntax for target and, partially, for replacement.
The problem is that \ is a special character both in regex (where it is used in constructs like \d to represent a digit) and in String literals (where it is used like "\n" to represent a line separator, or \" to escape a double quote that would otherwise end the literal).
In both cases we can create a literal \ by escaping it, that is, by placing an additional \ before it (just as we escape " in string literals via \").
So the target regex representing a \ symbol needs to hold \\, and the string literal representing that text needs to look like "\\\\".
So we escaped \ twice:
once in regex \\
once in String literal "\\\\" (each \ is represented as "\\").
In the replacement, \ is also special. It lets us escape the other special character, $, which via the $x notation refers to the portion of the data matched by the regex and held by the capturing group with index x. For example, "012".replaceAll("(\\d)", "$1$1") matches each digit, places it in capturing group 1, and replaces it with two copies of itself ($1$1), resulting in "001122".
So again, to let the replacement represent a literal \ we need to escape it with an additional \, which means that:
the replacement must hold two backslash characters \\
and the String literal which represents \\ looks like "\\\\"
BUT since we want the replacement to produce two backslashes, it must hold four, and we will need "\\\\\\\\" (each \ represented by one "\\\\").
So version with replaceAll can look like
replaceAll("\\\\", "\\\\\\\\");
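The special role of \ and $ in the replacement argument can be seen in a small sketch like this one (class and method names are only illustrative):

```java
class ReplacementSpecials {
    // $1 refers to capturing group 1, so every digit is doubled.
    static String doubleDigits(String s) {
        return s.replaceAll("(\\d)", "$1$1");
    }

    // Replacement holds the two characters \\ , which produce ONE backslash.
    static String toBackslash(String s) {
        return s.replaceAll("a", "\\\\");
    }

    // Replacement holds \$5 , where the backslash makes the dollar sign literal.
    static String toDollar(String s) {
        return s.replaceAll("price", "\\$5");
    }

    public static void main(String[] args) {
        System.out.println(doubleDigits("012"));  // 001122
        System.out.println(toBackslash("a"));     // \
        System.out.println(toDollar("price"));    // $5
    }
}
```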
Easier way with replaceAll
To make our life easier Java provides tools to automatically escape text into target and replacement parts. So now we can focus only on strings, and forget about regex syntax:
replaceAll(Pattern.quote(target), Matcher.quoteReplacement(replacement))
which in our case can look like
replaceAll(Pattern.quote("\\"), Matcher.quoteReplacement("\\\\"))
Even better: use replace
If we don't really need regex syntax support, let's not involve replaceAll at all. Instead let's use replace. Both methods will replace all targets, but replace doesn't involve regex syntax. So you could simply write
theString = theString.replace("\\", "\\\\");
To avoid this sort of trouble, you can use replace (which takes a plain string) instead of replaceAll (which takes a regular expression). You will still need to escape backslashes, but not in the wild ways required with regular expressions.
You'll need to escape the (escaped) backslash in the first argument as it is a regular expression. The replacement (2nd argument - see Matcher#replaceAll(String)) also gives backslashes a special meaning, so you'll have to escape those too:
theString.replaceAll("\\\\", "\\\\\\\\");
Yes... by the time the regex compiler sees the pattern you've given it, it sees only a single backslash (since Java's lexer has turned the double backwhack into a single one). You need to replace "\\\\" with "\\\\\\\\", believe it or not! Java really needs a good raw string syntax.
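To compare the approaches suggested in this thread side by side, here is a sketch that runs all three on the same input. The class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DoubleBackslashes {
    // Hand-escaped regex and replacement: 4 and 8 backslashes in source.
    static String manualRegex(String s) {
        return s.replaceAll("\\\\", "\\\\\\\\");
    }

    // Let the library do the regex-level escaping for both arguments.
    static String quotedRegex(String s) {
        return s.replaceAll(Pattern.quote("\\"), Matcher.quoteReplacement("\\\\"));
    }

    // No regex involved: only string-literal escaping is needed.
    static String plainReplace(String s) {
        return s.replace("\\", "\\\\");
    }

    public static void main(String[] args) {
        String theString = "\\something\\";          // the characters \something\
        System.out.println(manualRegex(theString));  // \\something\\
        System.out.println(quotedRegex(theString));  // \\something\\
        System.out.println(plainReplace(theString)); // \\something\\
    }
}
```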
