Every time I try to split the string "hello*world" using s.split("*"); I get a PatternSyntaxException.
I have tried using s.split("\*"); but that gives me another error. I'm sure this is something simple.
How do I stop this?
* is a meta-character in regular expressions: a quantifier that matches zero or more occurrences of the preceding element.
Try using two backslash characters:
s.split("\\*");
The split method takes a regular expression as the argument, not a normal string. The * has special meaning in regular expressions. If you want to split on a literal *, you have to escape it with a backslash. But the backslash is also an escape character in Java string literals, so you have to escape the backslash too by using two backslashes:
s.split("\\*")
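A minimal sketch of the fix, assuming a hypothetical class name SplitOnStar. It shows the escaped form from the answers above next to Pattern.quote, which does the escaping for you:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitOnStar {
    // The Java literal "\\*" is the two-character regex \* (a literal asterisk).
    static String[] splitEscaped(String s) {
        return s.split("\\*");
    }

    // Same result, letting Pattern.quote build the literal pattern.
    static String[] splitQuoted(String s) {
        return s.split(Pattern.quote("*"));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitEscaped("hello*world"))); // [hello, world]
        System.out.println(Arrays.toString(splitQuoted("hello*world")));  // [hello, world]
    }
}
```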
Related
I have comma separated list of regular expressions:
.{8},[0-9],[^0-9A-Za-z ],[A-Z],[a-z]
I have done a split on the comma. Now I'm trying to match each regex against a generated password. The problem is that Pattern.compile does not like square brackets that are not escaped.
Can someone please give me a simple function that takes a string like [0-9] and returns the escaped string \[0-9\]?
For some reason, the above answer didn't work for me. For those like me who come after, here is what I found.
I was expecting a single backslash to escape the bracket, however, you must use two if you have the pattern stored in a string. The first backslash escapes the second one into the string, so that what regex sees is \]. Since regex just sees one backslash, it uses it to escape the square bracket.
\\]
In regex, that will match a single closing square bracket.
If you're trying to match a newline, though, you'd use only a single backslash: "\n" is a string escape sequence, so the compiler inserts a newline character into the string. Regex doesn't see \n; it sees the newline character and matches that. The bracket needs two backslashes because \] is not a string escape sequence, it's a regex escape sequence.
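To make the two layers concrete, here is a small sketch (the class and method names are made up for illustration). It compares what the compiler puts into the string with what the regex engine then matches:

```java
import java.util.regex.Pattern;

class EscapeLayers {
    // "\\]" is a two-character string (backslash, bracket); the regex reads it as a literal ].
    static boolean matchesClosingBracket(String input) {
        return Pattern.matches("\\]", input);
    }

    // "\n" is a one-character string (newline); the regex matches that character as is.
    static boolean matchesNewline(String input) {
        return Pattern.matches("\n", input);
    }

    public static void main(String[] args) {
        System.out.println("\\]".length());             // 2
        System.out.println("\n".length());              // 1
        System.out.println(matchesClosingBracket("]")); // true
        System.out.println(matchesNewline("\n"));       // true
    }
}
```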
You can use Pattern.quote(String).
From the docs:
public static String quote(String s)
Returns a literal pattern String for the specified String.
This method produces a String that can be used to create a Pattern that would match the string s as if it were a literal pattern.
Metacharacters or escape sequences in the input sequence will be given no special meaning.
You can use the \Q and \E special sequences: anything between \Q and \E is treated as literal text.
\Q[0-9]\E
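The two suggestions are closely related. For input that contains no \E, Pattern.quote simply wraps it in \Q and \E. A short sketch, with hypothetical class and method names:

```java
import java.util.regex.Pattern;

class QuoteDemo {
    // Returns the literal pattern that Pattern.quote builds for the given text.
    static String quoted(String literal) {
        return Pattern.quote(literal);
    }

    // True only if input is exactly the literal text, with no regex interpretation.
    static boolean matchesLiterally(String literal, String input) {
        return Pattern.matches(Pattern.quote(literal), input);
    }

    public static void main(String[] args) {
        System.out.println(quoted("[0-9]"));                    // \Q[0-9]\E
        System.out.println(matchesLiterally("[0-9]", "[0-9]")); // true
        System.out.println(matchesLiterally("[0-9]", "5"));     // false
    }
}
```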
Pattern.compile() likes square brackets just fine. If you take the string
".{8},[0-9],[^0-9A-Za-z ],[A-Z],[a-z]"
and split it on commas, you end up with five perfectly valid regexes: the first one matches eight non-line-separator characters, the second matches an ASCII digit, and so on. Unless you really want to match strings like ".{8}" and "[0-9]", I don't see why you would need to escape anything.
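As a sketch of that point, the rules can be compiled exactly as they come out of the split. The names here are hypothetical, and using find() (match anywhere in the password) is an assumption about what the asker wants:

```java
import java.util.regex.Pattern;

class PasswordRules {
    // Splits the comma-separated rules and checks that each one is found in the password.
    static boolean satisfiesAll(String rules, String password) {
        for (String rule : rules.split(",")) {
            // No escaping needed: each piece is already a valid regex.
            if (!Pattern.compile(rule).matcher(password).find()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String rules = ".{8},[0-9],[^0-9A-Za-z ],[A-Z],[a-z]";
        System.out.println(satisfiesAll(rules, "Passw0rd!")); // true
        System.out.println(satisfiesAll(rules, "password"));  // false (no digit)
    }
}
```

Note that splitting on commas only works while no rule contains a comma itself, such as .{8,}.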
I have a regex stored in a DB, '\\\\E\\\\'. I use Java to fetch it and match it against Strings.
I thought that since Java reads from the DB, it knows to escape SQL special characters by itself, and all I need is to escape the regex special characters, so this expression should match '\\E\\'.
The problem is that it matches '\E\' rather than '\\E\\'. Why?
If you want to use a regex to match one literal backslash character, you need to use four backslashes in a Java string.
The regex \\ matches one literal backslash.
The string "\\" denotes a single backslash.
Therefore, in order to build a regex that consists of two backslashes, you need a Java string with four backslashes.
So you need "\\\\\\\\E\\\\\\\\" to construct a regex that matches \\E\\...
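Counting backslashes by eye is error-prone, so here is a small check of the claim above (class and method names are illustrative only):

```java
import java.util.regex.Pattern;

class BackslashLayers {
    // Source: 8 backslashes, E, 8 backslashes.
    // String contents: 4 backslashes, E, 4 backslashes (the regex).
    // The regex matches: 2 backslashes, E, 2 backslashes.
    static final String REGEX = "\\\\\\\\E\\\\\\\\";

    static boolean matches(String input) {
        return Pattern.matches(REGEX, input);
    }

    public static void main(String[] args) {
        System.out.println(REGEX.length());       // 9
        System.out.println(matches("\\\\E\\\\")); // true: the input is \\E\\
        System.out.println(matches("\\E\\"));     // false: the input is \E\
    }
}
```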
I want to split a String in Java on * using the split method. Here is the code:
String str = "abc*def";
String temp[] = str.split("*");
System.out.println(temp[0]);
But this program gives me the following error:
Exception in thread "main" java.util.regex.PatternSyntaxException:
Dangling meta character '*' near index 0 *
I tweaked the code a little bit, using "\\*" as the delimiter, which works perfectly. Can anyone explain this behavior (or suggest an alternate solution)?
I don't want to use StringTokenizer.
The split() method actually accepts a regular expression. The * character has special meaning within a regular expression, and cannot appear on its own. In order to tell the regular expression to use the actual * character, you need to escape it with the \ character.
Thus, your code becomes:
String str = "abc*def";
String temp[] = str.split("\\*");
System.out.println(temp[0]); // Prints "abc"
Note the \\: you need to escape the backslash for Java as well.
If you want to avoid this issue in the future, please read up on regular expressions, so you'll have a good idea of both what types of expressions you can use, as well as which characters you'll need to escape.
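To see the failure and the fix side by side, a sketch like the following can help. The class DanglingStar and its method are made up for illustration:

```java
import java.util.regex.PatternSyntaxException;

class DanglingStar {
    // Returns true if String.split rejects the given delimiter as an invalid regex.
    static boolean isRejected(String delimiterRegex) {
        try {
            "abc*def".split(delimiterRegex);
            return false;
        } catch (PatternSyntaxException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(isRejected("*"));   // true: a quantifier with nothing to quantify
        System.out.println(isRejected("\\*")); // false: an escaped, literal asterisk
    }
}
```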
String.split() expects a regular expression. In a regular expression, * has a special meaning (zero or more of the preceding element), so it has to be escaped; \* accomplishes that. Since you are writing a Java string literal, \\ is the escape sequence for \, so the regular expression becomes \*, which behaves correctly.
Split accepts a regular expression to split on, not a string. Regular expressions have * as a reserved character, so you need to escape it with a backslash.
In Java specifically, backslashes in string literals are also special characters. They introduce escapes such as newline (\n), tab (\t), and many other less common characters.
So because you are writing a regex inside a Java string literal, you need to escape twice: once for the regex (\*) and once more for the Java compiler. And thus "\\*".
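If the doubled backslash is hard to read, a character class is another way to make * literal, since most metacharacters lose their special meaning inside [...]. A brief sketch with hypothetical names:

```java
import java.util.Arrays;

class StarLiteral {
    // The literal "\\*" holds two characters: a backslash and an asterisk.
    static int escapedLength() {
        return "\\*".length();
    }

    // Inside a character class, * is an ordinary character, so no backslash is needed.
    static String[] splitWithCharClass(String s) {
        return s.split("[*]");
    }

    public static void main(String[] args) {
        System.out.println(escapedLength());                              // 2
        System.out.println(Arrays.toString(splitWithCharClass("abc*def"))); // [abc, def]
    }
}
```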
I have Java string:
String b = "/feedback/com.school.edu.domain.feedback.Review$0/feedbackId";
I also have generated pattern against which I want to match this string:
String pattern = "/feedback/com.school.edu.domain.feedback.Review$0(.)*";
When I call b.matches(pattern) it returns false. I know the dollar sign is special in Java regex, but I don't know what my pattern should look like. I assume the $ in the pattern needs to be escaped, but I don't know with how many escape characters. The $ sign is important to me because it helps me distinguish elements in a list (the numbers after the dollar), and I can't go without it.
Use
String escapedString = java.util.regex.Pattern.quote(myString)
to automatically escape all special regex characters in a given string.
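Applied to the question, one possible sketch is to quote only the fixed prefix and append the regex part unquoted. The class and method names are assumptions made for illustration:

```java
import java.util.regex.Pattern;

class QuotedPrefix {
    // Treats prefix as literal text and allows any trailing characters after it.
    static boolean startsWithLiteral(String prefix, String input) {
        return input.matches(Pattern.quote(prefix) + "(.)*");
    }

    public static void main(String[] args) {
        String prefix = "/feedback/com.school.edu.domain.feedback.Review$0";
        String b = "/feedback/com.school.edu.domain.feedback.Review$0/feedbackId";
        System.out.println(startsWithLiteral(prefix, b)); // true
    }
}
```

Quoting the whole pattern, including (.)*, would make that part literal too, which is why only the prefix is passed to Pattern.quote here.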
You need to escape $ in the regex with a backslash (\), but as a backslash is an escape character in string literals you need to escape the backslash itself.
You will need to escape any other special regex character the same way, for example ".".
String pattern = "/feedback/com\\.navteq\\.lcms\\.common\\.domain\\.poi\\.feedback\\.Review\\$0(.)*";
In Java regex both . and $ are special. You need to escape them with 2 backslashes, i.e.:
"/feedback/com\\.navtag\\.etc\\.Review\\$0(.*)"
(1 backslash is for the Java string, and 1 is for the regex engine.)
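A small sketch using the strings from the question shows the effect of each level of escaping (the class and constant names are invented for this example):

```java
class DollarPattern {
    static final String PATH = "/feedback/com.school.edu.domain.feedback.Review$0/feedbackId";
    // Unescaped: $ is an end-of-input anchor, so nothing can follow it.
    static final String UNESCAPED = "/feedback/com.school.edu.domain.feedback.Review$0(.)*";
    // Only $ escaped: matches, but each . still matches any character.
    static final String DOLLAR_ONLY = "/feedback/com.school.edu.domain.feedback.Review\\$0(.)*";
    // Both . and $ escaped: matches the literal path prefix only.
    static final String ESCAPED = "/feedback/com\\.school\\.edu\\.domain\\.feedback\\.Review\\$0(.)*";

    public static void main(String[] args) {
        String lookalike = "/feedback/comXschool.edu.domain.feedback.Review$0/x";
        System.out.println(PATH.matches(UNESCAPED));        // false
        System.out.println(PATH.matches(ESCAPED));          // true
        System.out.println(lookalike.matches(DOLLAR_ONLY)); // true
        System.out.println(lookalike.matches(ESCAPED));     // false
    }
}
```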
Escape the dollar sign with a backslash:
String pattern =
"/feedback/com.navteq.lcms.common.domain.poi.feedback.Review\\$0(.)*";
I advise you to escape . as well, since . matches any character.
String pattern =
"/feedback/com\\.navteq\\.lcms\\.common\\.domain\\.poi\\.feedback\\.Review\\$0(.)*";
The answer by @Colin Hebert, edited by @theon, is correct. The explanation is as follows, @azec-pdx.
It is a regex as a string literal (within double quotes).
period (.) and dollar-sign ($) are special regex characters (metacharacters).
To make the regex engine interpret them as the ordinary characters period (.) and dollar sign ($), you need to prefix a single backslash to each. The backslash (itself a special regex character) quotes the character following it, thus escaping it.
Since the given regex is a string literal, each of those backslashes must itself be prefixed with another backslash. Otherwise the compiler reads \. and \$ as string escape sequences (like the character, string and Unicode escapes), and since neither is a valid one, compilation fails.
Even a regex construct that coincides with a valid string escape sequence needs the extra backslash. For example, the regex construct \b (word boundary) clashes with the string escape \b (backspace): "\b" compiles, but it puts a backspace character into the pattern. Written as "\\b", it is read by the regex engine as a word boundary.
To be always safe, prefix every single-backslash regex escape within a string literal with another backslash. For example, the string literal "\(hello\)" is illegal and leads to a compile-time error; in order to match the string (hello) the string literal "\\(hello\\)" must be used.
The last period (.)* is supposed to be interpreted as special regex character and thus it needs no quoting by a backslash, let alone prefixing a second one.
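The \b clash is easy to miss because both spellings compile. A minimal sketch of the difference, with class and method names chosen for illustration:

```java
import java.util.regex.Pattern;

class WordBoundary {
    // "\\bcat\\b" reaches the regex engine as \bcat\b: "cat" as a whole word.
    static boolean hasWordCat(String text) {
        return Pattern.compile("\\bcat\\b").matcher(text).find();
    }

    // "\bcat" reaches the regex engine as a backspace character followed by "cat".
    static boolean hasBackspaceCat(String text) {
        return Pattern.compile("\bcat").matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(hasWordCat("a cat sat"));      // true
        System.out.println(hasWordCat("concatenate"));    // false
        System.out.println(hasBackspaceCat("a cat sat")); // false
        System.out.println(hasBackspaceCat("\bcat"));     // true
    }
}
```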
I want to split a string with "?" as the delimiter.
str.split("?")[0] fails.
The argument to the "split" method must be a regular expression, and the '?' character has special meaning in regular expressions so you have to escape it. That's done by adding a backslash before it in the regexp. However, since the regexp is being supplied by way of a Java String, it requires two backslashes instead so as to get an actual backslash character into the regexp:
str.split( "\\?" )[0];
str.split("\\?")[0]
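If the delimiter is not known in advance, a small helper (hypothetical, not part of the JDK) can quote whatever it is given, so ?, *, | and . all behave as plain text:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class LiteralSplit {
    // Splits s on the delimiter taken literally, whatever regex metacharacters it contains.
    static String[] splitOnLiteral(String s, String delimiter) {
        return s.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        System.out.println(splitOnLiteral("index.html?page=2", "?")[0]);   // index.html
        System.out.println(Arrays.toString(splitOnLiteral("a|b|c", "|"))); // [a, b, c]
    }
}
```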