Java regular expression - java

I want to replace any one of these chars:
% \ , [ ] # & # ! ^
... with empty string ("").
I used this code:
String line = "[ybi-173]";
Pattern cleanPattern = Pattern.compile("%|\\|,|[|]|#|&|#|!|^");
Matcher matcher = cleanPattern.matcher(line);
line = matcher.replaceAll("");
But it doesn't work.
What do I miss in this regular expression?

Some of the characters are special characters that are being interpreted differently. You can either escape them all with backslashes, or better yet put them in a character class (no need to escape the non-CC characters, eases readability):
Pattern cleanPattern = Pattern.compile("[%\\\\,\\[\\]#&#!^]");

There are several reasons why your solution doesn't work.
Several of the characters you wish to match have special meanings in regular expressions, including ^, [, and ]. These must be escaped with a \ character, but, to make matters worse, the \ itself must be escaped so that the Java compiler will pass the \ through to the regular expression constructor. So, to sum up step one, if you wish to match a ] character, the Java string must look like "\\]".
But, furthermore, this is a case for character classes [], rather than the alternation operator |. If you want to match "any of the characters a, b, c", that looks like [abc]. Your character class would be [%\,[]#&#!^], but, because of the Java string escaping rules and the special meaning of certain characters, your regex will be [%\\\\,\\[\\]#&#!\\^].

You'd define your pattern as a character group enclosed in [ and ] and escape special chars, e.g.
String n = "%\\,[]#&#!^".replaceAll("[%\\\\,\\[\\]#&#!^]", "");
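The answers above can be put side by side in a small sketch. The class and method names here (CleanCharsSketch, clean, cleanWithOriginal) are made up for illustration and are not from the question; the duplicated # is dropped from the character class since listing it once is enough. It also shows why the original alternation compiles but leaves the brackets alone.

```java
import java.util.regex.Pattern;

final class CleanCharsSketch {
    // Regex seen by the engine: [%\\,\[\]#&!^]
    // '^' is literal here because it is not the first character in the class.
    private static final Pattern CLEAN = Pattern.compile("[%\\\\,\\[\\]#&!^]");

    // The pattern from the question. It compiles, but its alternatives are:
    // '%', the two-character text "|,", the class [|] (only a pipe),
    // '#', '&', '!', and '^' as a start-of-input anchor.
    private static final Pattern ORIGINAL = Pattern.compile("%|\\|,|[|]|#|&|#|!|^");

    // Removes every listed character from the input.
    static String clean(String input) {
        return CLEAN.matcher(input).replaceAll("");
    }

    // Applies the question's pattern, for comparison.
    static String cleanWithOriginal(String input) {
        return ORIGINAL.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(clean("[ybi-173]"));             // ybi-173
        System.out.println(cleanWithOriginal("[ybi-173]")); // [ybi-173]
    }
}
```

Compiling the pattern once into a static field is a design choice for the sketch; String.replaceAll with the same regex string gives the same result.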

Related

Java - Regular expression to match strings not containing quote and backslash characters [duplicate]

How to write a regular expression to match this \" (a backslash then a quote)? Assume I have a string like this:
click to search
I need to replace all the \" with a ", so the result would look like:
click to search
This one does not work: str.replaceAll("\\\"", "\"") because it only matches the quote. Not sure how to get around with the backslash. I could have removed the backslash first, but there are other backslashes in my string.
If you don't need any regex mechanisms (predefined character classes like \d, quantifiers, etc.), then instead of replaceAll, which expects a regex, use replace, which expects literals:
str = str.replace("\\\"","\"");
Both methods will replace all occurrences of targets, but replace will treat targets literally.
BUT if you really must use regex you are looking for
str = str.replaceAll("\\\\\"", "\"")
\ is a special character in regex (used, for instance, to create \d, the character class representing digits). To make the regex engine treat \ as a normal character you need to place another \ before it to turn off its special meaning (that is, you need to escape it). So the regex we are trying to create is \\.
But to create a string literal representing the text \\ so that you can pass it to the regex engine, you need to write it as four backslashes ("\\\\"), because \ is also a special character in String literals (the part of the code written between "..."), where it is used, for instance, in \t to represent a tab.
That is why you also need to escape \ there.
In short you need to escape \ twice:
in regex \\
and then in String literal "\\\\"
You don't need a regular expression.
str.replace("\\\"", "\"")
should work just fine.
The replace method takes two substrings and replaces all non-overlapping occurrences of the first with the second. Per the javadoc:
public String replace(CharSequence target,
CharSequence replacement)
Replaces each substring of this string that matches the literal target sequence with the specified literal replacement sequence. The replacement proceeds from the beginning of the string to the end, for example, replacing "aa" with "b" in the string "aaa" will result in "ba" rather than "ab".
try this: str.replaceAll("\\\\\"", "\\\"")
because Java will replace \ twice:
(1) \\\\\" --> \\" (for string)
(2) \\" --> \" (for regex)
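To compare the two approaches from the answers above, here is a minimal sketch. The class name, method names, and the sample input are assumptions chosen for illustration; the input contains one backslash that is not followed by a quote, to check that it survives.

```java
final class UnescapeQuotesSketch {
    // Literal replacement: the target is the two characters \" and no regex is involved.
    static String withReplace(String s) {
        return s.replace("\\\"", "\"");
    }

    // Regex replacement: the engine sees \\" (an escaped backslash, then a quote).
    static String withReplaceAll(String s) {
        return s.replaceAll("\\\\\"", "\"");
    }

    public static void main(String[] args) {
        String input = "say \\\"hi\\\" in C:\\tmp"; // text: say \"hi\" in C:\tmp
        System.out.println(withReplace(input));     // say "hi" in C:\tmp
        System.out.println(withReplaceAll(input));  // say "hi" in C:\tmp
    }
}
```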

How do I properly create a regex for String.matches() with escape characters?

I am trying to check if a string of length one is any of the following characters: "[", "\", "^", "_", single back-tick "`", or "]".
Right now I am trying to accomplish this with the following if statement:
if (character.matches("[[\\]^_`]")){
isValid = false;
}
When I run my program I get the following error for the if statement:
java.util.regex.PatternSyntaxException: null (in
java.util.regex.Pattern)
What is the correct syntax for a regex with escape characters?
Your list has four characters that need special attention:
^ is the inversion character. It must not be the first character in a character class, or it must be escaped.
\ is the escape character. It must be escaped for direct use.
[ starts a character class, so it must be escaped.
] ends a character class, so it must be escaped.
Here is the "raw" regex:
[\[\]_`\\^]
Since you represent your regex as a Java string literal, all backslashes must be additionally escaped for the Java compiler:
if (character.matches("[\\[\\]_`\\\\^]")){
isValid = false;
}
You need to escape the [, ] and \\ - the [ and ] so that the pattern compiler knows that they are not the special character class delimiters, and \\ because it's already being converted to one backslash because it's in a string literal, so to represent an escaped backslash in a pattern, you need to have no less than four consecutive backslashes.
So the resulting regex should be
"[\\[\\]\\\\^_`]"
(Test it on RegexPlanet - click on the "Java" button to test).
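As a quick check of the corrected character class, the following sketch wraps it in a helper. The names BracketClassSketch and isSpecial are hypothetical; the regex string is the one from the answer above.

```java
import java.util.regex.Pattern;

final class BracketClassSketch {
    // Regex seen by the engine: [\[\]\\^_`]
    private static final Pattern SPECIAL = Pattern.compile("[\\[\\]\\\\^_`]");

    // True only when the whole string is exactly one of: [ ] \ ^ _ `
    static boolean isSpecial(String character) {
        return SPECIAL.matcher(character).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"[", "\\", "^", "_", "`", "]", "a"}) {
            System.out.println(s + " -> " + isSpecial(s));
        }
    }
}
```

Because matches() requires the entire input to match, a two-character string such as "[[" is rejected without any extra anchors.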

RegEx special char "|" escaping in Java

I am trying to split a string like: abc|aa||
When I use the regular string.split I am required to provide a regular expression.
I tried to do the following :
string.split("|")
string.split("\|")
string.split("/|")
string.split("\Q|\E")
None of them work.
Does anyone know how to make it work?
I don't know how you tried, but
public static void main(String[] args) {
String a= "abc|aa||";
String split = Pattern.quote("|");
System.out.println(split);
System.out.println(Arrays.toString(a.split(split)));
}
prints out
\Q|\E
[abc, aa]
effectively splitting on |. The \Q ... \E is a regex quote. Anything inside it will be matched as a literal pattern.
string.split("\|"); // won't work because \| is not a valid escape sequence
string.split("/|"); // will compile, but split on / and empty space, so between each character
string.split("|"); // will compile, but split on empty space, so between each character
// true alternative to quoted solution above
string.split("\\|") // escape the second \ which will resolve as an escaped | in the regex pattern
Using a double backslash is required because the backslash is also a special character in Java string literals, so you need to escape the escape character: \\| in the source becomes \| in the regex.
| is a special character, hence you need to escape it using backslashes. Try using
string.split("\\|")
| is a special character for the regular expression, thus it must be escaped e.g. \|
The backslash \ is a special character in Java, thus it must also be escaped
As a result, you must do the following to achieve the desired effect:
string.split("\\|")
All of the following patterns split it correctly: "\\Q|\\E", "\\|", "[|]". Of course, the latter two are preferable.
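The working variants can be compared in one place. This is a sketch with made-up names (PipeSplitSketch and its methods). The third method is an addition not discussed in the answers: passing a negative limit to split keeps the trailing empty strings that the default call drops.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

final class PipeSplitSketch {
    // Escaped pipe: the regex engine sees \|
    static String[] splitEscaped(String s) {
        return s.split("\\|");
    }

    // Pattern.quote wraps the pipe in \Q...\E so it is matched literally.
    static String[] splitQuoted(String s) {
        return s.split(Pattern.quote("|"));
    }

    // A negative limit keeps trailing empty strings.
    static String[] splitKeepingEmpties(String s) {
        return s.split("\\|", -1);
    }

    public static void main(String[] args) {
        String a = "abc|aa||";
        System.out.println(Arrays.toString(splitEscaped(a)));        // [abc, aa]
        System.out.println(Arrays.toString(splitQuoted(a)));         // [abc, aa]
        System.out.println(Arrays.toString(splitKeepingEmpties(a))); // [abc, aa, , ]
    }
}
```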

java regex pattern unclosed character class

I need some help. I'm getting:
Caused by: java.util.regex.PatternSyntaxException: Unclosed character class near index 24
^[a-zA-Z└- 0-9£µ /.'-\]*$
^
at java.util.regex.Pattern.error(Pattern.java:1713)
at java.util.regex.Pattern.clazz(Pattern.java:2254)
at java.util.regex.Pattern.sequence(Pattern.java:1818)
at java.util.regex.Pattern.expr(Pattern.java:1752)
at java.util.regex.Pattern.compile(Pattern.java:1460)
at java.util.regex.Pattern.<init>(Pattern.java:1133)
at java.util.regex.Pattern.compile(Pattern.java:823)
Here is my code:
String testString = value.toString();
Pattern pattern = Pattern.compile("^[a-zA-Z\300-\3770-9\u0153\346 \u002F.'-\\]*$");
Matcher m = pattern.matcher(testString);
I have to use the unicode value for some because I'm working with xhtml.
Any help would be great!
Assuming that you want to match \ and - and not ]:
Pattern pattern = Pattern.compile("^[a-zA-Z\300-\3770-9\u0153\346 \u002F.'\\\\-]*$");
You need to double escape your backslashes, as \ is also an escape character in regex. Thus \\] escapes the backslash for java but not for regex. You need to add another java-escaped \ in order to regex-escape your second java-escaped \.
So \\\\ after java escaping becomes \\ which is then regex escaped to \.
Moving - to the end of the sequence means that it is used as a character, instead of a range operator as pointed out by Pshemo.
It is hard to say what you are trying to achieve, but I can see a few strange things in your regex:
you opened a character class but never closed it. Instead you used \\], which makes ] an ordinary character.
If you want to include ] in your character class then you need an additional ] at the end, like "^[a-zA-Z\300-\3770-9\u0153\346 \u002F.'-\\]]*$"
if you want to include \ in your character class then you need to use the \\\\ version, because you need to escape its special meaning twice: once for the regex engine and once for Java's String literal
you used - in '-\\], and inside a character class - specifies a range of characters, like a-z or A-Z. To escape its special meaning you need to use \\-
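The points above can be checked with a small sketch. It assumes, as the first answer does, that the intent is to allow \ and - but not ]. The class and method names are hypothetical, and the octal escapes from the question are rewritten as Unicode escapes for the same code points, which is a choice made for this sketch only.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

final class NameFieldPatternSketch {
    // Backslash is written as four backslashes; '-' sits last so it is a literal, not a range.
    private static final Pattern ALLOWED =
            Pattern.compile("^[a-zA-Z\u00C0-\u00FF0-9\u0153\u00E6 /.'\\\\-]*$");

    // Reports whether a regex compiles, instead of letting the exception escape.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    static boolean isAllowed(String value) {
        return ALLOWED.matcher(value).matches();
    }

    public static void main(String[] args) {
        // Shortened version of the question's pattern: the escaped ] leaves the class unclosed.
        System.out.println(compiles("^[a-z'-\\]*$"));      // false
        System.out.println(isAllowed("Jean-Luc O'Brien")); // true
        System.out.println(isAllowed("a]b"));              // false
    }
}
```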
