Regex in Java: error for str.replace("\s+", " ")

Why does Java (1.7) give me an error for the following line?
String str2 = str.replace("\s+", " ");
Error:
Invalid escape sequence (valid ones are \b \t \n \f \r \" \' \\ )
As far as I know "\s+" is a valid regex. Isn't it?

String.replace() will only replace literals, that's the first problem.
The second problem is that \s is not a valid escape sequence in a Java string literal, by definition.
Which means what you wanted was probably "\\s+".
But even then, .replace() won't take that as a regex. You have to use .replaceAll() instead:
s.replaceAll("\\s+", "");
BUT there is another problem. You seem to be using it often... Therefore, use a Pattern instead:
private static final Pattern SPACES = Pattern.compile("\\s+");
// In code...
SPACES.matcher(input).replaceAll("");
FURTHER NOTES:
If what you want is to only replace the first occurrence, then use .replaceFirst(); String has it, and so does Pattern
When you .replace{First,All}() on a String, a new Pattern is recompiled for each and every invocation. Use a Pattern if you have to do repetitive matches!

It's a valid regular expression pattern, but \s is not a valid String literal escape sequence. Escape the \.
String str2 = str.replace("\\s+", " ");
As suggested, String#replace(CharSequence, CharSequence) doesn't consider the arguments you provide as regular expressions. So even if you got the program to compile, it wouldn't do what you seem to want it to do. Check out String#replaceAll(String, String).
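The difference between the three calls discussed above can be sketched as follows. This is a minimal illustration, not code from the answers: the class name, method names, and sample input are made up for the example.

```java
import java.util.regex.Pattern;

class WhitespaceDemo {
    // Compiled once and reused, as the first answer recommends.
    private static final Pattern SPACES = Pattern.compile("\\s+");

    // String.replace treats its first argument as literal text:
    // it looks for the three characters \ s + in the input.
    static String literalReplace(String input) {
        return input.replace("\\s+", " ");
    }

    // String.replaceAll treats its first argument as a regex:
    // every run of whitespace becomes a single space.
    static String regexReplace(String input) {
        return input.replaceAll("\\s+", " ");
    }

    // Same result as regexReplace, without recompiling the pattern per call.
    static String precompiledReplace(String input) {
        return SPACES.matcher(input).replaceAll(" ");
    }

    public static void main(String[] args) {
        String input = "a  b\t\tc";
        System.out.println(literalReplace(input).equals(input)); // true: nothing replaced
        System.out.println(regexReplace(input));                 // a b c
        System.out.println(precompiledReplace(input));           // a b c
    }
}
```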

Related

Input with backslashes doesn't match regular expression

I'm trying to create a regular expression matcher, but it doesn't work as expected.
String input = "// source C:\\path\\to\\folder";
System.out.println(Pattern.matches("//\\s*source\\s+[a-zA-Z]:(\\[a-zA-Z0-9_-]+)+", input));
It returns false but it should pass. What is wrong with that regex?
Backslashes. That's what is wrong.
System.out.println(Pattern.matches("//\\s*source\\s+[a-zA-Z]:(\\\\[a-zA-Z0-9_-]+)+", input));
                                                                ^^
In a regex, a backslash must itself be escaped with a backslash, which makes two. Add Java string escaping on top of that, and you must write four backslashes to match one.
You forgot \\ in [a-zA-Z0-9_-]:
String input = "// source C:\\path\\to\\folder";
System.out.println(Pattern.matches("//\\s*source\\s+[a-zA-Z]:(\\\\[a-zA-Z0-9_\\-]+)+", input));
You should use: \\\\ to match a backslash in Java regex:
String input = "// source C:\\path\\to\\folder";
boolean m = Pattern.matches("//\\s*source\\s+[a-zA-Z]:(\\\\[a-zA-Z0-9_-]+)+", input);
//=> true
You need one level of escaping (\\) for the Java String literal and another (\\) for the underlying regex engine to get a literal \.
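The two escaping layers described in these answers can be checked directly. The sketch below is illustrative only; the class, constant, and method names are assumptions made for the example.

```java
import java.util.regex.Pattern;

class BackslashLayers {
    // Four backslashes in source -> a two-character string -> a regex
    // that matches exactly one literal backslash.
    static final String ONE_BACKSLASH_REGEX = "\\\\";

    static boolean isSingleBackslash(String s) {
        return Pattern.matches(ONE_BACKSLASH_REGEX, s);
    }

    // The corrected pattern from the answers above.
    static boolean isSourceComment(String line) {
        return Pattern.matches("//\\s*source\\s+[a-zA-Z]:(\\\\[a-zA-Z0-9_-]+)+", line);
    }

    public static void main(String[] args) {
        System.out.println(ONE_BACKSLASH_REGEX.length());                      // 2
        System.out.println(isSingleBackslash("\\"));                           // true
        System.out.println(isSourceComment("// source C:\\path\\to\\folder")); // true
    }
}
```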

How to escape characters in a regular expression

When I use the following code I've got an error:
Matcher matcher = pattern.matcher("/Date\(\d+\)/");
The error is :
invalid escape sequence (valid ones are \b \t \n \f \r \" \' \\ )
I have also tried changing the value in the brackets to ('/Date\(\d+\)/'); without any success.
How can I avoid this error?
You need to double-escape your \ character, like this: \\.
Otherwise your String is interpreted as if you were trying to escape (.
Same with the other round bracket and the d.
In fact it seems you are trying to initialize a Pattern here, while pattern.matcher references a text you want your Pattern to match.
Finally, note that in a Pattern, escaped characters require a double escape, as such:
\\(\\d+\\)
Also, as Rohit says, Patterns in Java do not need to be surrounded by forward slashes (/).
In fact if you initialize a Pattern like that, it will interpret your Pattern as starting and ending with literal forward slashes.
Here's a small example of what you probably want to do:
// your input text
String myText = "Date(123)";
// your Pattern initialization
Pattern p = Pattern.compile("Date\\(\\d+\\)");
// your matcher initialization
Matcher m = p.matcher(myText);
// printing the output of the match...
System.out.println(m.find());
Output:
true
Your regex is correct by itself, but in Java, the backslash character itself needs to be escaped.
Thus, this regex:
/Date\(\d+\)/
Must turn into this:
/Date\\(\\d+\\)/
One backslash is for escaping the parenthesis or d. The other one is for escaping the backslash itself.
The error message you are getting arises because Java thinks you're trying to use \( as a single escape character, like \n, or any of the other examples. However, \( is not a valid escape sequence, and so Java complains.
In addition, the logic of your code is probably incorrect. The argument to matcher should be the text to search (for example, "/Date(234)/Date(6578)/"), whereas the variable pattern should contain the pattern itself. Try this:
String textToMatch = "/Date(234)/Date(6578)/";
Pattern pattern = Pattern.compile("/Date\\(\\d+\\)/");
Matcher matcher = pattern.matcher(textToMatch);
Finally, the regex character class \d means "one single digit." If you are trying to match the literal text \d, you would have to write "\\\\d" in the Java string. However, in that case your pattern would be a constant, and you could more easily use textToMatch.indexOf or textToMatch.contains.
To escape a literal string for use in a Java regex, you can also use Pattern.quote().
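As a rough sketch of what Pattern.quote() does: it makes the whole argument literal, so it suits fixed text such as "Date(123)" but not a pattern that still needs \d to mean "digit". The class and method names below are hypothetical.

```java
import java.util.regex.Pattern;

class QuoteDemo {
    // Searches for `literal` as plain text; regex metacharacters lose their meaning.
    static boolean containsLiteral(String text, String literal) {
        return Pattern.compile(Pattern.quote(literal)).matcher(text).find();
    }

    // Searches using `regex` as an ordinary, unquoted pattern.
    static boolean containsAsRegex(String text, String regex) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        // Unquoted, the parentheses form a group, so "Date(123)" matches "Date123".
        System.out.println(containsAsRegex("Date123", "Date(123)"));     // true
        // Quoted, the parentheses must appear literally in the text.
        System.out.println(containsLiteral("Date123", "Date(123)"));     // false
        System.out.println(containsLiteral("/Date(123)/", "Date(123)")); // true
    }
}
```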

How to match a string's end using a regex pattern in Java?

I want a regular expression pattern that will match with the end of a string.
I'm implementing a stemming algorithm that will remove suffixes of a word.
E.g. for a word 'Developers' it should match 's'.
I can do it using following code :
Pattern p = Pattern.compile("s");
Matcher m = p.matcher("Developers");
m.replaceAll(" "); // it will replace all 's' with ' '
I want a regular expression that will match only a string's end something like replaceLast().
You need to match "s", but only if it is the last character in a word. This is achieved with the boundary assertion $:
input.replaceAll("s$", " ");
If you enhance the regular expression, you can replace multiple suffixes with one call to replaceAll:
input.replaceAll("(ed|s)$", " ");
Use $:
Pattern p = Pattern.compile("s$");
public static void main(String[] args)
{
    String message = "hi this message is a test message";
    message = message.replaceAll("message$", "email");
    System.out.println(message);
}
Check this,
http://docs.oracle.com/javase/tutorial/essential/regex/bounds.html
When matching a character at the end of string, mind that the $ anchor matches either the very end of string or the position before the final line break char if it is present even when the Pattern.MULTILINE option is not used.
That is why it is safer to use \z as the very end of string anchor in a Java regex.
For example:
Pattern p = Pattern.compile("s\\z");
will match s at the end of string.
See the related post "What's the difference between \z and \Z in a regular expression and when and how do I use it?".
NOTE: Do not put \z or $ after a pattern that can match the empty string, because String.replaceAll() then makes the replacement twice: once for the non-empty match and once more for the empty match at the very end. That is, do not use input.replaceAll("s*\\z", " ");, since for an input ending in s you will get two spaces at the end, not one. Either use "s\\z" to replace one s, or use "s+\\z" to replace one or more.
If you still want to use replaceAll with a zero-length pattern anchored at the end of string to replace with a single occurrence of the replacement, you can use a workaround similar to the one in the How to make a regular expression for this seemingly simple case? post (writing "a regular expression that works with String replaceAll() to remove zero or more spaces from the end of a line and replace them with a single period (.)").
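The anchor behaviour described above can be reproduced with a few one-line helpers. This is a sketch for illustration; the class and method names are made up, and the inputs are chosen to show the trailing-newline and empty-match cases.

```java
class EndAnchorDemo {
    // $ also matches just before a final line terminator.
    static String stripWithDollar(String s) {
        return s.replaceAll("s$", "");
    }

    // \z matches only at the very end of the input.
    static String stripWithZ(String s) {
        return s.replaceAll("s\\z", "");
    }

    // s* can match the empty string, so a second, empty match occurs at the end.
    static String replaceZeroOrMore(String s) {
        return s.replaceAll("s*\\z", " ");
    }

    // s+ needs at least one s, so there is only one match.
    static String replaceOneOrMore(String s) {
        return s.replaceAll("s+\\z", " ");
    }

    public static void main(String[] args) {
        System.out.println(stripWithDollar("Developers\n").equals("Developer\n")); // true
        System.out.println(stripWithZ("Developers\n").equals("Developers\n"));     // true
        System.out.println("[" + replaceZeroOrMore("Developers") + "]");           // [Developer  ]
        System.out.println("[" + replaceOneOrMore("Developers") + "]");            // [Developer ]
    }
}
```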

Removing literal character in regex

I have the following string
\Qpipe,name=office1\E
And I am using a simplified regex library that doesn't support the \Q and \E.
I tried removing them
s.replaceAll("\\Q", "").replaceAll("\\E", "")
However, I get the error Caused by: java.util.regex.PatternSyntaxException: Illegal/unsupported escape sequence near index 1
\E
^
Any ideas?
\ is the special escape character in both Java string and regex engine. To pass a literal \ to the regex engine you need to have \\\\ in the Java string. So try:
s.replaceAll("\\\\Q", "").replaceAll("\\\\E", "")
Alternatively, a simpler way is to use the replace method, which takes a literal string and not a regex:
s.replace("\\Q", "").replace("\\E", "")
Use the Pattern.quote() function to escape special characters in a regex, for example:
s.replaceAll(Pattern.quote("\\Q"), "")
replaceAll takes a regular expression string. Instead, just use replace which takes a literal string. So myRegexString.replace("\\Q", "").replace("\\E", "").
But that still leaves you with the problem of quoting special regex characters for your simplified regex library.
String.replaceAll() takes a regular expression as parameter, so you need to escape your backslash twice:
s.replaceAll("\\\\Q", "").replaceAll("\\\\E", "");
You can also use the code below. I used this because I was matching and replacing text wrapped in \Q and \E, and the \Q and \E would otherwise stay in the pattern. This way they don't.
final int flags = Pattern.LITERAL;
String regex = "My regex";
Pattern pattern = Pattern.compile(regex, flags);
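Applied to the string from the question, the Pattern.LITERAL approach might look like the sketch below. The helper names are hypothetical; only Pattern.compile, Pattern.LITERAL, and Matcher.replaceAll come from the standard library.

```java
import java.util.regex.Pattern;

class LiteralFlagDemo {
    // Removes every occurrence of `literal`, treating it as plain text
    // rather than as a regular expression.
    static String removeLiteral(String input, String literal) {
        return Pattern.compile(literal, Pattern.LITERAL).matcher(input).replaceAll("");
    }

    // Strips the \Q and \E markers from a quoted pattern string.
    static String stripQuoting(String s) {
        return removeLiteral(removeLiteral(s, "\\Q"), "\\E");
    }

    public static void main(String[] args) {
        System.out.println(stripQuoting("\\Qpipe,name=office1\\E")); // pipe,name=office1
    }
}
```

For a fixed string like this, s.replace("\\Q", "").replace("\\E", "") gives the same result with less ceremony; the flag is mainly useful when the same literal pattern is reused many times.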

java eclipse regex cant "\+"

I need to check a String against "\++?", which should match something like +6014456.
But I get this error message: invalid escape sequence (valid ones are \b \t \n \f \r \" \' \\). Why?
It's giving you an error because "\++?" isn't a valid Java literal - you need to escape the backslash. Try this:
Pattern pattern = Pattern.compile("\\++?");
However, I don't think that's actually the regular expression you want. Don't you actually mean something like:
Pattern pattern = Pattern.compile("\\+\\d+");
That corresponds to a regular expression of \+\d+, i.e. a plus followed by at least one digit.
I think you should use two backslashes. One for escaping the second (because it's a java string), the second for escaping the + (because it's a special character for regex).
shouldn't it be more like "\\+?" ?
Pattern pattern = Pattern.compile("\\++?");
System.out.println(pattern.matcher("+9970").find());
works for me
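The two patterns discussed in this thread behave quite differently, which a small comparison makes visible. This is a sketch under the assumption that the goal is to validate a whole string such as +6014456; the class and method names are invented for the example.

```java
import java.util.regex.Pattern;

class PlusDemo {
    // A plus sign followed by at least one digit.
    private static final Pattern PLUS_DIGITS = Pattern.compile("\\+\\d+");

    // One or more plus signs (reluctant quantifier); says nothing about digits.
    private static final Pattern PLUS_ONLY = Pattern.compile("\\++?");

    // matches() requires the entire string to fit the pattern.
    static boolean isPlusNumber(String s) {
        return PLUS_DIGITS.matcher(s).matches();
    }

    // find() succeeds if the pattern occurs anywhere in the string.
    static boolean findsPlus(String s) {
        return PLUS_ONLY.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(isPlusNumber("+6014456")); // true
        System.out.println(isPlusNumber("6014456"));  // false
        System.out.println(findsPlus("+9970"));       // true
        System.out.println(findsPlus("+abc"));        // true: only the plus is checked
    }
}
```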
