Java split returns white spaces in result

I'm using the function "split" on this string:
p(80,2)
I would like to obtain just the two numbers, so this is what I do:
String[] split = msg.msgContent().split("[p(,)]");
The regex is correct (or at least, I think so) since it splits the two numbers and puts them in the vector "split", but it turns out that this vector has a length of 4, and the first two positions are occupied by white spaces.
In fact, if I print each vector position, this is the result:
Split:
80
2
I've tried adding \\s to the regex to match with white spaces, but since there are none in my string, it didn't work.

You don't need split here, just use a simple regex to extract the digits from your string:
Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher(msg.msgContent());
while (m.find()) {
String number = m.group();
// add to array
}
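Note that String#split takes a regex, and the regex you passed matches the delimiter characters rather than the numbers, so every delimiter produces a split, including empty strings between adjacent delimiters.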
You might want to read the documentation of Pattern and Matcher for more information about the solution above.
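A self-contained version of the loop above might look like the following sketch. The class name NumberExtractor and the method extractNumbers are made up for illustration, and the input is passed in directly instead of coming from msg.msgContent().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NumberExtractor {
    // Compiled once; matches one or more consecutive digits.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Returns every run of digits found in the input, in order of appearance.
    static List<String> extractNumbers(String input) {
        List<String> numbers = new ArrayList<>();
        Matcher m = DIGITS.matcher(input);
        while (m.find()) {
            numbers.add(m.group());
        }
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(extractNumbers("p(80,2)")); // prints [80, 2]
    }
}
```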

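split accepts a regular expression as its parameter, and [p(,)] is a character class.
Because your code splits on every character in the class, the p at the start and the ( right after it each produce an empty string (the trailing empty string after ) is dropped):
"p(80,2)" will return the array {"", "", "80", "2"}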
I know this is not very beautiful:
List<String> collect = Pattern.compile("[^\\d]+")
    .splitAsStream(s)
    .filter(part -> !part.isEmpty()) // the lambda parameter must not reuse the name s
    .collect(Collectors.toList());
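To see exactly what the character-class split returns, here is a small runnable sketch. The class and method names are invented for illustration; the regex is the one from the question.

```java
import java.util.Arrays;

class SplitEmptyStrings {
    // Splits on each of the characters p, (, comma and ).
    static String[] splitOnClass(String input) {
        return input.split("[p(,)]");
    }

    public static void main(String[] args) {
        String[] parts = splitOnClass("p(80,2)");
        System.out.println(parts.length);           // prints 4
        System.out.println(Arrays.toString(parts)); // prints [, , 80, 2]
    }
}
```

The first two elements are empty strings, not whitespace, which is why adding \\s to the regex changes nothing.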

Since you're splitting on p and (, the first two characters of your string are resulting in splits. I would split on the comma after replacing the p, (, and ). Like this:
String x = "p(80,2)";
String [] y = x.replaceAll("[p()]", "").split(",");

split is not really what you need here, but if you want to use it, you can do something like this:
"p(80,2)".replace("p(", "").replace(")", "").split(",")
Results with
[80, 2]

Related

How can I avoid splitting on a comma in brackets?

I have a string below which I want to split in String array with multiple delimiters.
The delimiters are comma (,), semicolon (;), "OR" and "AND".
But I do not want to split on a comma if it's in brackets.
Example input:
device_name==device503,device_type!=GATEWAY;site_name<site3434 OR country==India AND location==BLR; new_name=in=(Rajesh,Suresh)
I am able to split the String with regex, but it doesn't handle commas in brackets correctly.
How can I fix this?
Pattern ptn = Pattern.compile("(,|;|OR|AND)");
String[] parts = ptn.split(query);
for(String p:parts){
System.out.println(p);
queryParams.add(p.trim());
}
You could use a negative look-ahead:
String[] parts = input.split(",(?![^()]*\\))|;| OR | AND ");
Or an uglier (but perhaps conceptually simpler) way you could do it would be to replace any commas within brackets with a temporary placeholder, then do the split and replace the placeholders with real commas in the results.
String input = "X,Y=((A,B),C) OR Z";
Pattern pattern = Pattern.compile("\\(.*\\)");
Matcher matcher = pattern.matcher(input);
StringBuffer sb = new StringBuffer();
while (matcher.find()) {
matcher.appendReplacement(sb, matcher.group().replaceAll(",", "_COMMA_"));
}
matcher.appendTail(sb);
String[] parts = sb.toString().split("(,|;| OR | AND )");
for (String part : parts) {
System.out.println(part.replace("_COMMA_", ","));
}
Prints:
X
Y=((A,B),C)
Z
Alternatively, you could write your own little tokenizer that reads the input character-by-character using charAt(index) or define a grammar for an off-the-shelf parser.
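A character-by-character tokenizer of the kind just mentioned could be sketched roughly as follows. This is only an illustration under simplifying assumptions: the class name DepthSplitter is invented, and it splits on commas and semicolons only (the OR and AND keywords are left out to keep it short). It tracks the parenthesis depth, so nested brackets are handled.

```java
import java.util.ArrayList;
import java.util.List;

class DepthSplitter {
    // Splits on ',' and ';' only when the current position is outside all parentheses.
    static List<String> split(String input) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0; // number of currently open parentheses
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')' && depth > 0) {
                depth--;
            }
            if ((c == ',' || c == ';') && depth == 0) {
                // Delimiter at top level: finish the current token.
                parts.add(current.toString().trim());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        parts.add(current.toString().trim()); // last token
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("X,Y=((A,B),C);Z")); // prints [X, Y=((A,B),C), Z]
    }
}
```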
You can use negative look-ahead (?!...), which looks at the following characters, and if those characters match the pattern in brackets, the overall match will fail.
String query = "device_name==device503,device_type!=GATEWAY;site_name<site3434 OR country==India AND location==BLR; new_name=in=(Rajesh,Suresh)";
String[] parts = query.split("\\s*(,(?![^()]*\\))|;|OR|AND)\\s*");
for(String part: parts)
System.out.println(part);
Output:
device_name==device503
device_type!=GATEWAY
site_name<site3434
country==India
location==BLR
new_name=in=(Rajesh,Suresh)
So in this case we check whether the characters following the , are 0 or more characters which aren't either ( or ), followed by a ), and if this is true, the , match fails.
This won't work if you can have nested brackets.
Note:
String also has a split method (as used above), which is useful for simplicity's sake (but would be slower than reusing the same Pattern over and over again for multiple Strings).
You can add \\s* (0 or more whitespace characters) to your regex to remove any spaces before or after a delimiter.
If you're using | without anything before or after (e.g. "a|b|c"), you don't need to put it in brackets.
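The note about reusing a Pattern can be sketched like this. The class name QuerySplitter and the second sample query are assumptions for illustration; the regex is the same one used in the answer.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class QuerySplitter {
    // Same regex as in the answer, compiled once and reused for every query.
    private static final Pattern DELIMITER =
            Pattern.compile("\\s*(,(?![^()]*\\))|;|OR|AND)\\s*");

    static String[] split(String query) {
        return DELIMITER.split(query);
    }

    public static void main(String[] args) {
        String first = "device_name==device503,device_type!=GATEWAY;site_name<site3434 OR "
                + "country==India AND location==BLR; new_name=in=(Rajesh,Suresh)";
        String second = "a==1, b=in=(x,y) AND c==2"; // hypothetical second query
        System.out.println(Arrays.toString(split(first)));
        System.out.println(Arrays.toString(split(second)));
    }
}
```

One caveat worth keeping in mind: because OR and AND are matched without surrounding spaces or word boundaries, a value that happens to contain those letters in upper case (for example ORDER) would also be split.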

Java replace strings between two commas

String = "9,3,5,*****,1,2,3"
I'd like to simply access "5", which is between two commas, and right before "*****"; then only replace this "5" to other value.
How could I do this in Java?
You can try using the following regex replacement:
String input = "9,3,5,*****,1,2,3";
input = input.replaceAll("[^,]*,\\*{5}", "X,*****");
Here is an explanation of the regex:
[^,]*, match any number of non-comma characters, followed by one comma
\\*{5} followed by five asterisks
This means to match whatever CSV term plus a comma comes before the five asterisks in your string. We then replace this with what you want, along with the five stars in the original pattern.
Demo here:
Rextester
I'd use a regular expression with a lookahead, to find a string of digits that precedes ",*****", and replace it with the new value. The regular expression you're looking for would be \d+(?=,\*{5}) - that is, one or more digits, with a lookahead consisting of a comma and five asterisks. So you'd write
newString = oldString.replaceAll("\\d+(?=,\\*{5})", "replacement");
Here is an explanation of the regex pattern used in the replacement:
\\d+ match one or more digits, but only when
(?=,\\*{5}) we can lookahead and assert that what follows immediately
is a single comma followed by five asterisks
It is important to note that the lookahead (?=,\\*{5}) asserts but does not consume. Hence, we can ignore it with regards to the replacement.
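The difference between consuming the asterisks and merely asserting them can be seen side by side in the sketch below. The class and method names are invented for illustration, and Matcher.quoteReplacement is added as a precaution so that a replacement value containing $ or \ is treated literally.

```java
import java.util.regex.Matcher;

class ReplaceBeforeStars {
    // Consumes the term, the comma and the asterisks, so the tail must be written back.
    static String consumeAndRestore(String input, String value) {
        return input.replaceAll("[^,]*,\\*{5}", Matcher.quoteReplacement(value + ",*****"));
    }

    // Consumes only the digits; the lookahead checks for ",*****" without matching it.
    static String withLookahead(String input, String value) {
        return input.replaceAll("\\d+(?=,\\*{5})", Matcher.quoteReplacement(value));
    }

    public static void main(String[] args) {
        String input = "9,3,5,*****,1,2,3";
        System.out.println(consumeAndRestore(input, "X")); // prints 9,3,X,*****,1,2,3
        System.out.println(withLookahead(input, "X"));     // prints 9,3,X,*****,1,2,3
    }
}
```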
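I assumed the new value, newstr, to be '6':
String str = "9,3,5,*****,1,2,3";
char newstr = '6';
int idx = str.indexOf(",*") - 1; // position of the character just before ",*"
// Rebuild the string around that one position; replace(char, char) would change every '5'
str = str.substring(0, idx) + newstr + str.substring(idx + 1);
Also, if ",*" may be missing from str, check the result of indexOf first: it returns -1 in that case, and substring then throws StringIndexOutOfBoundsException.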
You could split on , and then join with a , (after replacing 5 with the desired value - say X). Like,
String[] arr = "9,3,5,*****,1,2,3".split(",");
arr[2] = "X";
System.out.println(String.join(",", arr));
Which outputs
9,3,X,*****,1,2,3
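If the position of the value is not always index 2, the same split-and-join idea can locate the marker first. The following is a sketch under that assumption; the class name ReplaceBeforeMarker and its method are hypothetical.

```java
class ReplaceBeforeMarker {
    // Replaces the field that comes directly before the first occurrence of marker.
    static String replaceBefore(String csv, String marker, String value) {
        String[] fields = csv.split(",", -1); // limit -1 keeps trailing empty fields
        for (int i = 1; i < fields.length; i++) {
            if (fields[i].equals(marker)) {
                fields[i - 1] = value;
                break; // only the first marker is handled
            }
        }
        return String.join(",", fields);
    }

    public static void main(String[] args) {
        System.out.println(replaceBefore("9,3,5,*****,1,2,3", "*****", "X")); // prints 9,3,X,*****,1,2,3
    }
}
```

If the marker is not present, the input comes back unchanged.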
You can use split() to get at the string you want to replace:
String str = "9,3,5,*****,1,2,3";
String[] myStrings = str.split(",");
String str1 = myStrings[2];

How to split comma-separated string but exclude some words containing comma in Java

Assume that we have below string:
"test01,test02,test03,exceptional,case,test04"
What I want is to split the string into string array, like below:
["test01","test02","test03","exceptional,case","test04"]
How can I do that in Java?
This negative lookaround regex should work for you:
(?<!exceptional),|,(?!case)
Working Demo
Java Code:
String[] arr = str.split("(?<!exceptional),|,(?!case)");
Explanation:
This regex matches a comma if either of these two conditions is met:
comma is not preceded by word exceptional using negative lookbehind (?<!exceptional)
comma is not followed by word case using negative lookahead (?!case)
That effectively disallows splitting on comma when it is surrounded by exceptional and case on either side.
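A quick way to check this behaviour, including the cases where only one of the two words is present, is a sketch like the following. The class name and the second input are made up for illustration.

```java
import java.util.Arrays;

class ExceptionalSplit {
    // Splits on every comma except the one between "exceptional" and "case".
    static String[] split(String input) {
        return input.split("(?<!exceptional),|,(?!case)");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(
                split("test01,test02,test03,exceptional,case,test04")));
        // prints [test01, test02, test03, exceptional,case, test04]
        System.out.println(Arrays.toString(
                split("exceptional,notcase,somethingelse,case")));
        // prints [exceptional, notcase, somethingelse, case]
    }
}
```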
@anubhava's answer is great, so use it. For completeness, here's a general solution that is applicable to many situations and uses a beautifully simple regex:
exceptional,case|(,)
The left side of the alternation | matches complete exceptional,case. We will ignore these matches. The right side matches and captures commas to Group 1, and we know they are the right ones because they were not matched by the expression on the left. We then replace these commas by something distinctive, and split on that string.
This program shows how to use the regex (see the results at the bottom of the online demo):
String subject = "somethingelse,case,test02,test03,exceptional,case,test04,exceptional,notcase";
Pattern regex = Pattern.compile("exceptional,case|(,)");
Matcher m = regex.matcher(subject);
StringBuffer b= new StringBuffer();
while (m.find()) {
if(m.group(1) != null) m.appendReplacement(b, "##SplitHere##");
else m.appendReplacement(b, m.group(0));
}
m.appendTail(b);
String replaced = b.toString();
String[] splits = replaced.split("##SplitHere##");
for (String split : splits) System.out.println(split);
Reference
How to match (or replace) a pattern except in situations s1, s2, s3...
Article about matching a pattern unless...
How could Java know that exceptional,case is a single word that should not be split?
If the values were wrapped in some other recurring character, such as double quotes, you could split on that.
For example, if the input were
"test01","test02","test03","exceptional,case","test04"
you could split it on ","
So in your case it is not possible unless you use a regular expression.
Here's a dead-simple answer, don't know why I didn't think of it yesterday:
(?<!exceptional(?=,case)),
Explanation
A comma (the last character of the regex) that is not preceded by exceptional followed by ,case
String s1 = "test01.test02.test03.{i}.case.test04.test03.{i}.test03.{i}.test03.{i}";
String[] arr1 = s1.split("(?<!)\\.|\\.(?!\\{i})");
Output:
test01
test02
test03.{i}
case
test04
test03.{i}
test03.{i}
test03.{i}
You probably want to use split()
Like this:
String[] array = "test01,test02,test03,exceptional,case,test04".split(",");

Splitting a string using Regex in Java

Would anyone be able to assist me with some regex?
I want to split the following string into a number, a string, and a number:
"810LN15"
One method requires 810 to be returned, another requires LN, and another should return 15.
The only real solution to this is using regex, as the numbers will grow in length.
What regex can I use to accommodate this?
String.split won't give you the desired result, which I guess would be "810", "LN", "15", since it would have to look for a token to split at and would strip that token.
Try Pattern and Matcher instead, using this regex: (\d+)|([a-zA-Z]+), which would match any sequence of numbers and letters and get distinct number/text groups (i.e. "AA810LN15QQ12345" would result in the groups "AA", "810", "LN", "15", "QQ" and "12345").
Example:
Pattern p = Pattern.compile("(\\d+)|([a-zA-Z]+)");
Matcher m = p.matcher("810LN15");
List<String> tokens = new LinkedList<String>();
while(m.find())
{
String token = m.group(); // group 0 is always the entire match; group 1 would be null for letter runs
tokens.add(token);
}
//now iterate through 'tokens' and check whether you have a number or text
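The last step, iterating through the tokens and checking whether each one is a number or text, could be sketched as below. The class name TokenClassifier and the helper isNumber are hypothetical; the check relies on the fact that every token produced by this regex is either all digits or all letters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenClassifier {
    private static final Pattern TOKEN = Pattern.compile("\\d+|[a-zA-Z]+");

    // Collects every run of digits or letters, in order.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    // A token from tokenize() is homogeneous, so its first character decides the type.
    static boolean isNumber(String token) {
        return Character.isDigit(token.charAt(0));
    }

    public static void main(String[] args) {
        for (String token : tokenize("810LN15")) {
            System.out.println(token + " -> " + (isNumber(token) ? "number" : "text"));
        }
    }
}
```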
In Java, as in most regex flavors (Python before 3.7 being a notable exception), the split() regex isn't required to consume any characters when it finds a match. Here I've used lookaheads and lookbehinds to match any position that has a digit on one side of it and a non-digit on the other:
String source = "810LN15";
String[] parts = source.split("(?<=\\d)(?=\\D)|(?<=\\D)(?=\\d)");
System.out.println(Arrays.toString(parts));
output:
[810, LN, 15]
(\\d+)([a-zA-Z]+)(\\d+) should do the trick. The first capture group will be the first number, the second capture group will be the letters in between and the third capture group will be the second number. The double backslashes are for java.
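Since the question asks for three separate methods, the capture groups could be wired up roughly like this. It is a sketch only: the class and method names are invented, and it assumes the whole input has exactly the number, letters, number shape.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CodeParts {
    private static final Pattern CODE = Pattern.compile("(\\d+)([a-zA-Z]+)(\\d+)");

    // Matches the whole input against the pattern, or fails loudly.
    private static Matcher match(String code) {
        Matcher m = CODE.matcher(code);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected format: " + code);
        }
        return m;
    }

    static String leadingNumber(String code)  { return match(code).group(1); }
    static String letters(String code)        { return match(code).group(2); }
    static String trailingNumber(String code) { return match(code).group(3); }

    public static void main(String[] args) {
        System.out.println(leadingNumber("810LN15"));  // prints 810
        System.out.println(letters("810LN15"));        // prints LN
        System.out.println(trailingNumber("810LN15")); // prints 15
    }
}
```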
This gives you the exact thing you guys are looking for
Pattern p = Pattern.compile("(([a-zA-Z]+)|(\\d+))|((\\d+)|([a-zA-Z]+))");
Matcher m = p.matcher("810LN15");
List<Object> tokens = new LinkedList<Object>();
while(m.find())
{
String token = m.group( 1 );
tokens.add(token);
}
System.out.println(tokens);

Java replaceAll() & split() irregularities

I know, I know, now I have two problems 'n all that, but regex here means I don't have to write two complicated loops. Instead, I have a regex that only I understand, and I'll be employed for yonks.
I have a string, say stack.overflow.questions[0].answer[1].postDate, and I need to get the [0] and the [1], preferably in an array. "Easy!" my neurons exclaimed, just use regex and the split method on your input string; so I came up with this:
String[] tokens = input.split("[^\\[\\d\\]]");
which produced the following:
[, , , , , , , , , , , , , , , , [0], , , , , , , [1]]
Oh dear. So, I thought, "what would replaceAll do in this instance?":
String onlyArrayIndexes = input.replaceAll("[^\\[\\d\\]]", "");
which produced:
[0][1]
Hmm. Why so? I'm looking for a two-element string array that contains "[0]" as the first element and "[1]" as the second. Why does split not work here, when the Javadocs declare they both use the Pattern class as per the Javadoc?
To summarise, I have two questions: why does the split() call produce that large array with seemingly random space characters and am I right in thinking the replaceAll works because the regex replaces all characters not matching "[", a number and "]"? What am I missing that means I expect them to produce similar output (OK that's three, and please don't answer "a clue?" to this one!).
Well, from what I can see, the split does work: it splits the string at every character that matches your regex, that is, at every character that is not a bracket or a digit.
As for the replaceAll, I think your assumption is right. It removes (replaces with "") everything that is not what you want.
From the API documentation:
Splits this string around matches of the given regular expression.
This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.
The string "boo:and:foo", for example, yields the following results with these expressions:
Regex    Result
:        { "boo", "and", "foo" }
o        { "b", "", ":and:f" }
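The effect of the limit argument described in the documentation can be checked with a short sketch. The class and method names are made up; the string and regex come from the documentation example.

```java
import java.util.Arrays;

class SplitLimitDemo {
    // Limit 0 (the default): trailing empty strings are removed.
    static String[] splitDefault(String s, String regex) {
        return s.split(regex);
    }

    // Negative limit: trailing empty strings are kept.
    static String[] splitKeepingTrailing(String s, String regex) {
        return s.split(regex, -1);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitDefault("boo:and:foo", "o")));
        // prints [b, , :and:f]
        System.out.println(Arrays.toString(splitKeepingTrailing("boo:and:foo", "o")));
        // prints [b, , :and:f, , ]
    }
}
```

Empty strings in the middle or at the start of the result are kept either way, which is why the array in the question is full of them.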
This is not a direct answer to your question, however I want to show you a great API that will suit your need.
Check out Splitter from Google Guava.
So for your example, you would use it like this:
Iterable<String> tokens = Splitter.onPattern("[^\\[\\d\\]]").omitEmptyStrings().trimResults().split(input);
//Now you get back an Iterable which you can iterate over. Much better than an Array.
for(String s : tokens) {
System.out.println(s);
}
This prints:
[0]
[1]
split splits on boundaries defined by the regex you provide, so it's no great surprise you're getting lots of entries — nearly all of the characters in the string match your regex and so, by definition, are boundaries on which a split should occur.
replaceAll replaces matches for your regex with the replacement you give it, which in your case is a blank string.
If you're trying to grab the 0 and the 1, it's a trivial loop:
String text = "stack.overflow.questions[0].answer[1].postDate";
Pattern pat = Pattern.compile("\\[(\\d+)\\]");
Matcher m = pat.matcher(text);
List<String> results = new ArrayList<String>();
while (m.find()) {
results.add(m.group(1)); // Or just .group() if you want the [] as well
}
String[] tokens = results.toArray(new String[0]);
Or if it's always exactly two of them:
String text = "stack.overflow.questions[0].answer[1].postDate";
Pattern pat = Pattern.compile(".*\\[(\\d+)\\].*\\[(\\d+)\\].*");
Matcher m = pat.matcher(text);
m.find();
String[] tokens = new String[2];
tokens[0] = m.group(1);
tokens[1] = m.group(2);
The problem is that split is the wrong operation here.
In Ruby, I'd tell you to string.scan(/\[\d+\]/), which would give you the array ["[0]","[1]"]
Java doesn't have a single-method equivalent, but we can write a scan method as follows:
public List<String> scan(String string, String regex){
List<String> list = new ArrayList<String>();
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(string);
while(matcher.find()) {
list.add(matcher.group());
}
return list;
}
and we can call it as scan(string,"\\[\\d+\\]")
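On Java 9 or later, the same scan helper can be sketched more compactly with Matcher.results(), which streams the successive matches. The class name RegexScan is invented for illustration, and the Java 9 requirement is an assumption about your environment.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class RegexScan {
    // Returns every substring of input that matches regex, in order.
    static List<String> scan(String input, String regex) {
        return Pattern.compile(regex)
                .matcher(input)
                .results()               // Stream<MatchResult>, available since Java 9
                .map(MatchResult::group)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String text = "stack.overflow.questions[0].answer[1].postDate";
        System.out.println(scan(text, "\\[\\d+\\]")); // prints [[0], [1]]
    }
}
```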
The equivalent Scala code is:
"""\[\d+\]""".r findAllIn string
