Java String split with regular expression and escape character [duplicate] - java

This question already has answers here:
How do I express ":" but not preceded by "\" in a Java regular expression?
(2 answers)
Closed 4 years ago.
I need to split a string under the following conditions:
Split on / but not on \/.
Split on = but not on \=.
Basically, I'm looking for TWO regular expressions that split on these delimiters and skip any delimiter preceded by the escape character.

You may try using lookarounds here:
String input = "Hello/World";
String[] parts = input.split("(?<!\\\\)[/=]");
The above single regex covers both splitting cases. It uses a negative lookbehind, which asserts that the character which immediately precedes the / or = is not a backslash.
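Since the question asks for two separate expressions, here is a minimal sketch of how the lookbehind could be applied per delimiter as well as combined. The class name, method names, and sample inputs are made up for illustration and are not from the question.

```java
import java.util.Arrays;

public class EscapedSplitDemo {
    // Split on '/' unless it is immediately preceded by a backslash.
    static String[] splitOnSlash(String input) {
        return input.split("(?<!\\\\)/");
    }

    // Split on '=' unless it is immediately preceded by a backslash.
    static String[] splitOnEquals(String input) {
        return input.split("(?<!\\\\)=");
    }

    // Combined form from the answer: split on either delimiter.
    static String[] splitOnEither(String input) {
        return input.split("(?<!\\\\)[/=]");
    }

    public static void main(String[] args) {
        // In Java source, "\\/" is the two characters \ and /
        System.out.println(Arrays.toString(splitOnSlash("a/b\\/c=d")));      // [a, b\/c=d]
        System.out.println(Arrays.toString(splitOnEquals("k=v\\=w")));       // [k, v\=w]
        System.out.println(Arrays.toString(splitOnEither("a/b\\/c=d\\=e"))); // [a, b\/c, d\=e]
    }
}
```

Note that the backslashes stay in the resulting tokens; removing them would be a separate step.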

You could use a negative lookbehind (?<!\\\\) to assert what is on the left is not a backslash.
Then match 1+ times a forward slash or an equals sign [/=]+ using a character class:
String regex = "(?<!\\\\)[/=]+";

Related

Why is `\.` not a valid escape sequence in java regex [duplicate]

This question already has answers here:
java regex pattern unclosed character class
(2 answers)
Closed 6 years ago.
In the GWT tutorial where you build a stock watcher there is this regex expression to check if an input is valid:
if (!symbol.matches("^[0-9A-Z\\.]{1,10}$"))
Which allows inputs between 1 and 10 chars that are numbers, letters, or dots.
The part that confuses me is the \\.
I interpret this as an escaped backslash \\ followed by a . which stands for any character. I thought the correct expression would be \. to escape the dot, but writing that results in an error in Eclipse: Invalid escape sequence.
Am I missing the obvious here?
This is one of the hassles of regular expressions in Java. That \\ is not an escaped backslash at the regex level, just at the string level.
This string:
"^[0-9A-Z\\.]{1,10}$"
Defines this regular expression:
^[0-9A-Z\.]{1,10}$
...because the escape is consumed by the string literal.
\ is the escape symbol in a Java String Literal. For instance the newline character is written as \n. In order to place a normal \ in a Java string, this is done by using \\.
So your Java string literal (the string in the code) "^[0-9A-Z\\.]{1,10}$" produces the actual string ^[0-9A-Z\.]{1,10}$ (with a single backslash), and that is what the regex engine sees. So, as you expected, the regular expression contains \.
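To see the two levels of escaping, one could print the string and check a few inputs. This is only a sketch: the class name, the isValidSymbol helper, and the sample symbols are assumptions for illustration, not part of the GWT tutorial.

```java
public class StringVsRegexEscapes {
    // The literal from the question; after compilation it contains a single backslash.
    static final String SYMBOL_PATTERN = "^[0-9A-Z\\.]{1,10}$";

    // Hypothetical helper wrapping the check from the question.
    static boolean isValidSymbol(String symbol) {
        return symbol.matches(SYMBOL_PATTERN);
    }

    public static void main(String[] args) {
        System.out.println(SYMBOL_PATTERN);          // ^[0-9A-Z\.]{1,10}$
        System.out.println("\\.".length());          // 2 (a backslash and a dot)
        System.out.println(isValidSymbol("GOOG.L")); // true
        System.out.println(isValidSymbol("goog"));   // false (lowercase not allowed)
        // Inside a character class the dot is already literal, so the escape is optional here.
        System.out.println("GOOG.L".matches("^[0-9A-Z.]{1,10}$")); // true
    }
}
```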

Java doesn't split "|" symbol correctly [duplicate]

This question already has answers here:
Splitting a Java String by the pipe symbol using split("|")
(7 answers)
Closed 7 years ago.
I have a file with content
1|yes|
2|yes|
3|yes|
4|yes|
5|yes|
6|yes|
7|yes|
8|yes|
9|yes|
10|yes|
11|yes|
12|yes|
13|yes|
14|yes|
15|yes|
I use Java's String[] tokens = split("|"); to split each line, but it returns (for example, when splitting "10|yes|") [1,0,|,y,e,s,|]. It seems that instead of splitting on "|", it splits at every character. Does anyone have any idea why? Thanks!
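split accepts a regular expression. | has a specific meaning in regular expressions: it expresses an alternation. A bare | is an alternation between two empty patterns, so it matches the empty string between every pair of characters, which is why each character ends up in its own token. To actually split on |, you have to escape it in the regex with a backslash. Since you specify the regex using a string literal, and backslashes are special in string literals, you have to escape that backslash with another backslash: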
String[] tokens = str.split("\\|");
In the general case, if you want to use the contents of a string literally, you can use Pattern.quote to automatically escape any special characters. You don't really need it here, but it's useful for end-user-entered values:
String[] tokens = str.split(Pattern.quote(stringToSplitOnLiterally));
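Both variants can be tried on a line from the question's file. A minimal sketch, with the class and method names made up for illustration:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class PipeSplitDemo {
    // Escape the pipe by hand: the regex is \| and the Java literal is "\\|".
    static String[] splitOnPipe(String line) {
        return line.split("\\|");
    }

    // Let Pattern.quote do the escaping for an arbitrary literal delimiter.
    static String[] splitLiteral(String line, String delimiter) {
        return line.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnPipe("10|yes|")));       // [10, yes]
        System.out.println(Arrays.toString(splitLiteral("10|yes|", "|"))); // [10, yes]
        // A negative limit keeps the trailing empty token after the last pipe.
        System.out.println("10|yes|".split("\\|", -1).length);             // 3
    }
}
```

By default split drops trailing empty strings, which is why the final | does not produce a third token.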

Is it possible to use split function with a '.' as a delimiting regular expression? [duplicate]

This question already has answers here:
Split string with dot as delimiter
(13 answers)
Closed 8 years ago.
I want to split the content of a string variable using the dot as the delimiter, but my code doesn't work.
public class Test {
    public static void main(String[] a) {
        String ch = "r.a.c.h.i.d";
        String[] tab;
        tab=ch.split(".");
        System.out.println(tab.length);
        for (String e : tab) System.out.println(e);
    }
}
Change tab=ch.split("."); to tab=ch.split("\\.");. You need to escape the dot because otherwise it's treated as a special character in the regex passed to split.
tab = ch.split("\\.");
One backslash is the escape character for the regex. But in Java source you need a second backslash, because the first one has to be escaped inside the string literal.
Yes, it's possible. In a regular expression, . means any character.
Predefined character classes
.       Any character (may or may not match line terminators)
So you must escape it to make it match a literal dot. Escape it with a backslash, and write that as two backslashes in the source, because Java needs the backslash character itself escaped in a string literal.
Use the regular expression "\\.".
In general, to get the literal characters out of an expression that may contain special-meaning characters, you can use the Pattern.quote method.
This method produces a String that can be used to create a Pattern that would match the string s as if it were a literal pattern.
split(Pattern.quote("."))
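For comparison, here is a small sketch of the usual ways to get a literal dot, applied to the string from the question. The class and method names are hypothetical; the character-class form [.] is an extra alternative not mentioned in the answers.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class DotSplitDemo {
    // Escaped dot: regex \. written as the Java literal "\\."
    static String[] splitOnDot(String s) {
        return s.split("\\.");
    }

    public static void main(String[] args) {
        String ch = "r.a.c.h.i.d";
        System.out.println(Arrays.toString(splitOnDot(ch)));                   // [r, a, c, h, i, d]
        // A dot inside a character class is literal too.
        System.out.println(Arrays.toString(ch.split("[.]")));                  // [r, a, c, h, i, d]
        // Pattern.quote wraps the delimiter so it is matched literally.
        System.out.println(Arrays.toString(ch.split(Pattern.quote("."))));     // [r, a, c, h, i, d]
    }
}
```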

Splitting on "," but not "\," [duplicate]

This question already has answers here:
How to split a comma separated String while ignoring escaped commas?
(6 answers)
Closed 9 years ago.
I'm looking for a regular expression to match , but ignore \, in Java's regex engine. This comes close:
[^\\],
However, it matches the previous character (in addition to the comma), which won't work.
Perhaps the regular expression approach is the wrong one altogether. I was intending to use String.split() to parse a simple CSV file (can't use an external library) with escaped commas.
You need a negative look-behind assertion here:
String[] arr = str.split("(?<![^\\\\]\\\\),");
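The lookbehind asserts that the comma is not preceded by a non-backslash character followed by a backslash. Note that each literal backslash in the regex takes 4 backslashes in the Java source: the string literal turns \\\\ into \\, and the regex engine then reads \\ as one literal backslash.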
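A quick way to check the behaviour is to run the expression on a line with one escaped comma. This is a sketch only: the class name, the splitFields helper, and the sample line are assumptions, and it does not handle quoting or other CSV features.

```java
import java.util.Arrays;

public class EscapedCommaSplit {
    // Regex from the answer: split on ',' unless it follows
    // a non-backslash character and then a backslash.
    static String[] splitFields(String line) {
        return line.split("(?<![^\\\\]\\\\),");
    }

    public static void main(String[] args) {
        // The Java literal "a,b\\,c,d" is the text a,b\,c,d
        System.out.println(Arrays.toString(splitFields("a,b\\,c,d"))); // [a, b\,c, d]
    }
}
```

The escaping backslash is still present in the field b\,c, so it would have to be removed afterwards if the plain value is needed.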

String.split won't let me split with periods [duplicate]

This question already has answers here:
How can I use "." as the delimiter with String.split() in java [duplicate]
(8 answers)
Closed 9 years ago.
I am trying to do a String.split on a website address using the "." so that I can find the domain name of the website.
However, when I do this:
String href = "www.google.com";
String split[] = href.split(".");
int splitLength = split.length;
It tells me that the splitLength variable is 0. Why is this, and how can I make this work?
Try using this to split the string:
href.split("\\.");
Explanation: split splits on a regex, not on a plain substring. In regexes, . is the metacharacter for 'match any character', which we don't want. So we have to escape it with a backslash: \. But \ is also the escape character in Java string literals, so the backslash itself has to be doubled in the source.
Split uses a regex so do:
String split[] = href.split("\\.");
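As for why the length is 0: with "." every character is a delimiter, so all the pieces are empty strings, and split discards trailing empty strings by default. A small sketch that shows this next to the fixed call (the class name and the labels helper are made up for illustration):

```java
public class DomainSplitDemo {
    // Split a host name on literal dots.
    static String[] labels(String host) {
        return host.split("\\.");
    }

    public static void main(String[] args) {
        String href = "www.google.com";
        // Unescaped dot: every piece is empty and all of them are dropped.
        System.out.println(href.split(".").length);     // 0
        // A negative limit keeps the empty pieces: 14 delimiters give 15 pieces.
        System.out.println(href.split(".", -1).length); // 15

        String[] parts = labels(href);
        System.out.println(parts.length);               // 3
        System.out.println(parts[1]);                   // google
    }
}
```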
