Splitting a string in Java on ";", but not on "\\;" - java

In Java I am trying to use the String.split() method to split a string on ";", but not on "\\;" (2 backslashes followed by a semicolon).
Ex: "aa;bb;cc\\;dd;ee\\;;ff" should be split into:
aa
bb
cc\\;dd
ee\\;
ff
How do I accomplish this using a regular expression?
Markus

Use
"aa;bb;cc\\;dd;ee\\;;ff".split("(?<!\\\\);");
(?<!...) is called a "zero-width negative lookbehind". In English, you're splitting on all ; characters that are NOT preceded by a backslash, without actually matching the backslash. In the Java source the backslash is written four times because it has to be escaped twice: once for the Java string literal and once for the regex parser. The actual regular expression used in the split would then read:
(?<!\\);
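To check the lookbehind against the string from the question, here is a minimal sketch. The class and method names (EscapedSemicolonSplit, splitUnescaped) are made up for illustration and are not part of the original answer.

```java
import java.util.Arrays;

final class EscapedSemicolonSplit {
    // Splits on every ';' that is not immediately preceded by a backslash.
    static String[] splitUnescaped(String input) {
        // The Java literal "(?<!\\\\);" is the regex (?<!\\);
        return input.split("(?<!\\\\);");
    }

    public static void main(String[] args) {
        // This Java literal is the text aa;bb;cc\;dd;ee\;;ff
        String input = "aa;bb;cc\\;dd;ee\\;;ff";
        // Prints [aa, bb, cc\;dd, ee\;, ff]
        System.out.println(Arrays.toString(splitUnescaped(input)));
    }
}
```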

This is called negative lookbehind and the syntax is like (?<!a)b. This matches any b that isn't preceded by an a. You would want something like:
(?<!\\\\);

Here is a code example with . as the separator:
String p = "hello.regex\\.brain\\.twister";
System.out.println( p );
for (String s : p.split( "(?<!\\\\)\\.", -1 )) {
System.out.println( "-> "+ s );
}
Output:
hello.regex\.brain\.twister
-> hello
-> regex\.brain\.twister
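The -1 passed as the second argument to split in the example above is the limit parameter: a negative limit keeps trailing empty strings, which the one-argument split drops. A small sketch of the difference, with an illustrative class name and input that are not from the original answer:

```java
import java.util.Arrays;

final class SplitLimitDemo {
    // One-argument split: trailing empty strings are removed.
    static String[] splitDefault(String s) {
        return s.split(";");
    }

    // Negative limit: trailing empty strings are kept.
    static String[] splitKeepTrailing(String s) {
        return s.split(";", -1);
    }

    public static void main(String[] args) {
        String input = "a;b;;";
        System.out.println(Arrays.toString(splitDefault(input)));      // [a, b]
        System.out.println(Arrays.toString(splitKeepTrailing(input))); // [a, b, , ]
    }
}
```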

Related

How to split a List of Strings in Java with a regex that uses the last occurrence of given Pattern?

I'm pretty new to the regex world.
Given a list of Strings as input, I would like to split them using a regex with a punctuation pattern: "[!.?\n]".
The thing is, when several punctuation marks appear together, I would like to split only on the last one, like this:
input: "I want it now!!!"
output: "I want it now!!"
input: "Am I ok? Yeah, I'm fine!!!"
output: ["Am I ok", "Yeah, I'm fine!!"]
You can use
[!.?\n](?![!.?\n])
Here, a !, ., ? or newline is matched only if it is not followed by any of these chars.
Or, if the char must be repeated:
([!.?\n])(?!\1)
Here, a !, ., ? or newline is matched only if it is not followed by exactly the same char.
See the regex demo #1 and the regex demo #2.
See a Java demo:
String p = "[!.?\n](?![!.?\n])";
String p2 = "([!.?\n])(?!\\1)";
String s = "I want it now!!!";
System.out.println(Arrays.toString(s.split(p))); // => [I want it now!!]
System.out.println(Arrays.toString(s.split(p2))); // => [I want it now!!]
s = "Am I ok? Yeah, I'm fine!!!";
System.out.println(Arrays.toString(s.split(p))); // => [Am I ok,  Yeah, I'm fine!!] (the second item keeps its leading space)
System.out.println(Arrays.toString(s.split(p2))); // => [Am I ok,  Yeah, I'm fine!!]
The split() method of String accepts a regular expression as the delimiter.
Ref.: https://docs.oracle.com/javase/7/docs/api/java/lang/String.html#split(java.lang.String)
Example:
String str = "Am I ok? Yeah, I'm fine!!!";
String delimiters = "[!.?\n]";
String[] splitted = str.split(delimiters);
for(String part : splitted) {
System.out.print(part + "\n");
}
Output:
Am I ok
Yeah, I'm fine

Scala Pattern Syntax Exception

I'm trying to split a string by the characters "}{". However, I am getting an error:
> val string = "{one}{two}".split("}{")
java.util.regex.PatternSyntaxException: Illegal repetition near index 0
}{
^
I am not trying to use regex or anything. I tried using "\}\{" and it also doesn't work.
Well... the reason is that split treats its parameter string as a regular expression.
Now, both { and } are special characters in regular expressions.
So you will have to escape the special characters in split's argument, like this:
val string = "{one}{two}".split("\\}\\{")
// string: Array[String] = Array({one, two})
It is enough to escape the {:
val string = "{one}{two}".split("}\\{")
There are two ways to force a metacharacter to be treated as an ordinary character:
-> precede the metacharacter with a backslash.
String[] ss1 = "{one}{two}".split("[}\\{]+");
System.out.println(Arrays.toString(ss1));
output (note the leading empty string, because the input starts with a delimiter):
[, one, two]
-> enclose it within \Q (which starts the quote) and \E (which ends it).
When using this technique, the \Q and \E can be placed at any location within the expression, provided that the \Q comes first.
String[] ss2 = "{one}{two}".split("[}\\Q{\\E]+");
System.out.println(Arrays.toString(ss2));
output (again with a leading empty string):
[, one, two]
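If the whole delimiter should be treated literally, java.util.regex.Pattern.quote builds the quoted form for you, so no manual escaping is needed. A minimal sketch, assuming a hypothetical helper splitLiteral (the name is not from the answers above):

```java
import java.util.Arrays;
import java.util.regex.Pattern;

final class LiteralDelimiterSplit {
    // Splits on the delimiter taken literally, not as a regular expression.
    static String[] splitLiteral(String input, String delimiter) {
        return input.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        // Splits only on the two-character sequence }{
        // Prints [{one, two}]
        System.out.println(Arrays.toString(splitLiteral("{one}{two}", "}{")));
    }
}
```

Unlike the character class [}\\{]+, this splits only on the exact sequence }{, so the outer braces stay in the result.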

split string based on text qualifier regex java

I want to split a string based on a text qualifier, for example:
"1","10411721","MikeTison","08/11/2009","21/11/2009","2800.00","002934538","051","New York","10411720-002",".\Images\b.jpg",".\RTF\b.rtf"
Qualifier = "
Splitter = ,
I want to split the string on the splitter , but if the splitter appears inside the qualifier " then ignore it and return the string including the splitter.
The regular expression I am using is (?:|,)(\"(?:[^\"]+|\"\")*\"|[^,]*)
but this regular expression only returns commas. Please help me with this, as I am new to regular expressions.
Please note that if there are newline characters in the string, i.e. \r\n, then the newline characters should be ignored:
"1","10411","Muis","a","21/11/2009","2800.06","0029683778","03005136851","Awan","10411720-001",".\Images\a.jpg",".\RTF\a.rtf"
"2","08/10/2009","07:32","Call","On-Net","030092343242342376543","Monk","00:00","1.500","0.000","10.000","0.200"
"2","08/10/2009","02:50","Call","Off-Net","030092343242342376543","Une","08:00","1.500","2.000","20.000","3.500"
"2","09/10/2009","03:55","SMS","On-Net","030092343242342376543","Mink","00:00","1.500","0.000","5.000","100.500"
"2","09/10/2009","12:30","Call","Off-Net","030092343242342376543","Zog","01:01","3.500","3.000","70.000","6.500"
"2","09/10/2009","09:11","Call","On-Net","030092343242342376543","Monk","02:30","2.00","2.000","90.000","4.000"
Probably the easiest solution is not to search for the places to split, but to find the elements you want to return. In your case these elements
start with "
end with "
have no " inside.
So you can try something like
String data = "\"1\",\"10411721\",\"MikeTison\",\"08/11/2009\",\"21/11/2009\",\"2800.00\",\"002934538\",\"051\",\"New York\",\"10411720-002\",\".\\Images\\b.jpg\",\".\\RTF\\b.rtf\"";
Pattern p = Pattern.compile("\"([^\"]+)\"");
Matcher m = p.matcher(data);
while(m.find()){
System.out.println(m.group(1));
}
Output:
1
10411721
MikeTison
08/11/2009
21/11/2009
2800.00
002934538
051
New York
10411720-002
.\Images\b.jpg
.\RTF\b.rtf
You can split using this regex:
String[] arr = input.split( "(?=(([^\"]*\"){2})*[^\"]*$),+" );
This regex will split on commas that are outside double quotes, by using a lookahead to make sure there is an even number of quotes after the comma.
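To see the even-number-of-quotes lookahead at work, here is a small sketch on a shortened row where one quoted field contains a comma. The class name, method name and sample data are made up for illustration; the regex is the same idea as above with the lookahead placed after the comma.

```java
import java.util.Arrays;

final class QuotedCsvSplit {
    // Splits on commas that are followed by an even number of double quotes,
    // i.e. commas that are outside any "..." field.
    static String[] splitOutsideQuotes(String line) {
        return line.split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");
    }

    public static void main(String[] args) {
        String line = "\"1\",\"New York, NY\",\"2800.00\"";
        // Prints ["1", "New York, NY", "2800.00"]
        System.out.println(Arrays.toString(splitOutsideQuotes(line)));
    }
}
```

The qualifiers stay in the result; strip the first and last character of each field if they are not wanted.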
Remove the first and the last character of the whole string, then split on "," (quote, comma, quote):
String test = "\"1\",\"10411721\",\"MikeTison\",\"08/11/2009\",\"21/11/2009\",\"2800.00\",\"002934538\",\"051\",\"New York\",\"10411720-002\",\".\\Images\\b.jpg\",\".\\RTF\\b.rtf\"";
if (test.length() > 0)
test = test.substring(1, test.length()-1);
System.out.println(Arrays.toString(test.split("\",\"")));
This works even if you have newline characters. Try it out:
String str="\"1\",\"10411721\",\"MikeTison\",\"08/11/2009\",\"21/11/2009\",\"2800.00\",\"002934538\",\"051\",\"New York\",\"10411720-002\",\".\\Images\\b.jpg\",\".\\RTF\\b.rtf\"";
System.out.println(Arrays.toString(str.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)")));

Regex : Split on comma , but exclude commas within parentheses and quotes(Both single & Double)

I have one string
5,(5,5),C'A,B','A,B',',B','A,',"A,B",C"A,B"
I want to split it on commas, but need to exclude commas within parentheses and quotes (both single and double quotes).
Like this:
5 (5,5) C'A,B' 'A,B' ',B' 'A,' "A,B" C"A,B"
How can I achieve this using a Java regular expression?
You can use this regex:
String input = "5,(5,5),C'A,B','A,B',',B','A,',\"A,B\",C\"A,B\"";
String[] toks = input.split(
",(?=(([^']*'){2})*[^']*$)(?=(([^\"]*\"){2})*[^\"]*$)(?![^()]*\\))" );
for (String tok: toks)
System.out.printf("<%s>%n", tok);
Output:
<5>
<(5,5)>
<C'A,B'>
<'A,B'>
<',B'>
<'A,'>
<"A,B">
<C"A,B">
Explanation:
, # Match literal comma
(?=(([^']*'){2})*[^']*$) # Lookahead to ensure comma is followed by even number of '
(?=(([^"]*"){2})*[^"]*$) # Lookahead to ensure comma is followed by even number of "
(?![^()]*\)) # Negative lookahead to ensure the comma is not followed by a )
# with only non-parenthesis characters in between (i.e. it is not inside parentheses)
,(?![^(]*\))(?![^"']*["'](?:[^"']*["'][^"']*["'])*[^"']*$)
Try this.
See demo.
For Java, escape the backslash (and, inside a string literal, the double quotes as well):
,(?![^(]*\\))(?![^"']*["'](?:[^"']*["'][^"']*["'])*[^"']*$)
Instead of splitting the string, consider matching instead.
String s = "5,(5,5),C'A,B','A,B',',B','A,',\"A,B\",C\"A,B\"";
Pattern p = Pattern.compile("(?:[^,]*(['\"])[^'\"]*\\1|\\([^)]*\\))|[^,]+");
Matcher m = p.matcher(s);
while (m.find()) {
System.out.println(m.group());
}
Output
5
(5,5)
C'A,B'
'A,B'
',B'
'A,'
"A,B"
C"A,B"
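If the regex gets hard to maintain, a plain character scan is another option. This is not one of the answers above, just a hedged sketch of a different technique: it tracks the currently open quote character and the parenthesis depth, and splits only on commas seen outside both. The class name TopLevelCommaSplitter is hypothetical, and the sketch assumes quotes are not escaped inside quoted text.

```java
import java.util.ArrayList;
import java.util.List;

final class TopLevelCommaSplitter {
    static List<String> split(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0;   // nesting depth of parentheses
        char quote = 0;  // the open quote character, or 0 when outside quotes
        for (char c : input.toCharArray()) {
            if (quote != 0) {
                if (c == quote) quote = 0;       // closing quote
            } else if (c == '\'' || c == '"') {
                quote = c;                       // opening quote
            } else if (c == '(') {
                depth++;
            } else if (c == ')') {
                if (depth > 0) depth--;
            } else if (c == ',' && depth == 0) {
                tokens.add(current.toString());  // top-level comma: end of token
                current.setLength(0);
                continue;                        // do not keep the comma itself
            }
            current.append(c);
        }
        tokens.add(current.toString());
        return tokens;
    }

    public static void main(String[] args) {
        String s = "5,(5,5),C'A,B','A,B',',B','A,',\"A,B\",C\"A,B\"";
        for (String token : split(s)) {
            System.out.println(token);
        }
    }
}
```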

Java regexp: splitting on "/" that is not at the beginning of a string

I want to split a/bc/de/f as [a, bc, de, f] but /a/bc/de/f as [/a, bc, de, f].
Is there a way to split on / which is not at the beginning of the string? (I'm having a bad Regexp day.)
(?!^)/ seems to work:
public class Funclass{
public static void main(String [] args) {
String s = "/firstWithSlash/second/third/forth/fifth/";
String[] ss = s.split("(?!^)/");
for (String s_ : ss)
System.out.println(s_);
}
}
output:
/firstWithSlash
second
third
forth
fifth
As @user unknown commented, this seems to be the wrong expression; it should be (?<!^)/ to indicate negative lookbehind.
The simplest solution is probably just to split s.substring(1) on /, and then prepend s.charAt(0) to the first result.
Other than that, since the split regex is not anchored, it would be challenging to do. You'd want to split on "something that isn't the start of the line, followed by a slash" - i.e. [^^ ]/ - but this would mean that the character preceding the slash was stripped out too. In order to do this you'd need negative look-behind, but I don't think that syntax is supported in the String.split regexes.
Edit: According to the Pattern javadocs it seems that Java does support negative lookbehind, and the following regex may do the job:
s.split("(?<!^)/");
A quick test indicates that this does indeed do what you want.
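For reference, here is a small sketch that runs both patterns mentioned in this thread on the two inputs from the question. Since ^ is zero-width, looking ahead or behind for it at the same position gives the same answer, so both variants should behave identically here. The class and method names are made up for illustration.

```java
import java.util.Arrays;

final class LeadingSlashSplit {
    // Negative lookahead: the slash must not sit at the start of the input.
    static String[] splitLookahead(String s) {
        return s.split("(?!^)/");
    }

    // Negative lookbehind: the position before the slash must not be the start.
    static String[] splitLookbehind(String s) {
        return s.split("(?<!^)/");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"a/bc/de/f", "/a/bc/de/f"}) {
            System.out.println(Arrays.toString(splitLookahead(s)));
            System.out.println(Arrays.toString(splitLookbehind(s)));
        }
        // Prints [a, bc, de, f] twice, then [/a, bc, de, f] twice
    }
}
```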
Couldn't you just add a check at the beginning to see if there's a slash in the beginning?
if( str.charAt(0) == '/' ) {
arr = str.substring(1).split( "/" );
arr[0] = "/"+arr[0];
} else
arr = str.split( "/" );
Or a little simpler:
arr = str.substring(1).split( "/" );
arr[0] = str.charAt(0) + arr[0];
If the first character is a slash, this just puts the slash back at the beginning of the first token. If there's only one character in the first token (and it isn't a slash), then the first array element is the empty string and it'll still work.
