Java Splitting Strings with several conditions

I want to split a string along several different conditions.
I understand there is a Java String method called String.split(element), which splits the String into an array based on the element specified.
However, splitting on several different elements seems to be very complex, especially if the split must occur on any of a range of elements.
Specifically, I want Java to split the string
"a>=b" into {"a",">=","b"}
"a>b" into {"a", ">", "b"}
"a==b" into {"a","==","b"}
I have been fiddling around with regex too, just to see how to split it exactly based on these parameters, but the closest I've gotten is splitting on a single character.
EDIT: a and b are arbitrary Strings that can be of any length. I simply want to split along the different kinds of comparators ">", ">=", "==".
For example, a could be "Apple" and b could be "Orange".
So in the end I want to turn the String "Apple>=Orange" into
{"Apple", ">=", "Orange"}

You can use regular expressions. Whether your variables are called a, b or abc, you get the first variable in group 1, the comparator in group 2 and the second variable in group 3.
Pattern pattern = Pattern.compile("(\\w+)([<=>]+)(\\w+)");
Matcher matcher = pattern.matcher("var1>=ar2b");
if (matcher.find()) {
    System.out.println(matcher.group(1)); // var1
    System.out.println(matcher.group(2)); // >=
    System.out.println(matcher.group(3)); // ar2b
}
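The snippet above can be wrapped into a small helper that returns the three parts as an array. This is only a sketch: the names ComparisonSplitter and splitComparison are made up for illustration, and it assumes both operands consist of word characters only, as the answer's pattern does.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ComparisonSplitter {
    // Same pattern as in the answer: operand, run of comparison characters, operand.
    private static final Pattern COMPARISON = Pattern.compile("(\\w+)([<=>]+)(\\w+)");

    // Hypothetical helper: returns {left, operator, right}, or null if the input has another shape.
    static String[] splitComparison(String input) {
        Matcher matcher = COMPARISON.matcher(input);
        if (!matcher.matches()) {
            return null;
        }
        return new String[] { matcher.group(1), matcher.group(2), matcher.group(3) };
    }

    public static void main(String[] args) {
        for (String s : new String[] { "a>=b", "a>b", "a==b", "Apple>=Orange" }) {
            System.out.println(Arrays.toString(splitComparison(s)));
        }
    }
}
```

Using matches() instead of find() means the whole input has to fit the pattern, so malformed input is reported as null rather than partially matched.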

The following code works for your examples:
System.out.println(Arrays.asList("a<=b".split("\\b")));
It splits the string on word boundaries.
If you need more elaborate splitting, you have to provide more examples.
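Splitting on \\b treats every word boundary as a split point, so other punctuation inside an operand (a dot, for instance) also causes a split. One possible alternative, sketched below, is to split only at the edges of a run of comparison characters using lookarounds. The class name and the constant are illustrative, and the sketch assumes the operators are built from <, = and > only.

```java
import java.util.Arrays;

class OperatorBoundarySplit {
    // Zero-width split points: right after an operator run, or right before one.
    private static final String OPERATOR_EDGE =
            "(?<=[<=>])(?![<=>])|(?<![<=>])(?=[<=>])";

    static String[] split(String input) {
        return input.split(OPERATOR_EDGE);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("Apple>=Orange")));
        System.out.println(Arrays.toString(split("my.var==10")));
    }
}
```

Because the split points are zero-width, the operator itself is kept as its own array element.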

You could code it by hand and use whichever tokens you want to split on, like so:
public String[] splitString(String word)
{
    // Longer tokens come first so that ">=" is not mistaken for ">".
    String[] tokens = {"==", ">=", "<=", "<", ">"};
    for (int i = 0; i < tokens.length; i++)
    {
        int index = word.indexOf(tokens[i]);
        if (index >= 0)
        {
            return new String[] {
                word.substring(0, index),
                tokens[i],
                word.substring(index + tokens[i].length())};
        }
    }
    // No token found: return the input unchanged as a single element.
    return new String[] {word};
}
This returns an array with whatever is before the token found, the token itself, and whatever is left.

Related

How can I split a string without knowing the split characters a-priori?

For my project I have to read various input graphs. Unfortunately, the input edges do not all have the same format. Some of them are comma-separated, others are tab-separated, etc. For example:
File 1:
123,45
67,89
...
File 2:
123 45
67 89
...
Rather than handling each case separately, I would like to automatically detect the split characters. Currently I have developed the following solution:
String str = "123,45";
String splitChars = "";
for(int i=0; i < str.length(); i++) {
if(!Character.isDigit(str.charAt(i))) {
splitChars += str.charAt(i);
}
}
String[] endpoints = str.split(splitChars);
Basically I pick the first row and select all the non-numeric characters, then I use the generated substring as split characters. Is there a cleaner way to perform this?

split takes a regexp, so your code would fail for several reasons. If the separator has a meaning in a regexp (say, +), it'll fail. If there is more than one non-digit character, your code will also fail. If your input contains more than exactly two numbers, it will also fail. Imagine it contains hello, world - then your splitChars string becomes " , " - and your split would do nothing (that would split the string "test , abc" into two, nothing else).
Why not make a regexp to fetch digits, and then find all sequences of digits, instead of focussing on the separators?
You're using regexps whether you want to or not, so let's make it official and use Pattern, while we are at it.
private static final Pattern ALL_DIGITS = Pattern.compile("\\d+");
// then in your split method..
Matcher m = ALL_DIGITS.matcher(str);
List<Integer> numbers = new ArrayList<Integer>();
// don't use arrays, generally. List is better.
while (m.find()) {
    numbers.add(Integer.parseInt(m.group(0)));
}
\d+ means one or more digits (written "\\d+" in a Java string literal).
m.find() finds the next match (so, the next block of digits), returning false if there aren't any more.
m.group(0) retrieves the entire matched string.

Split the string on \\D+ which means one or more non-digit characters.
Demo:
import java.util.Arrays;
public class Main {
public static void main(String[] args) {
// Test strings
String[] arr = { "123,45", "67,89", "125 89", "678 129" };
for (String s : arr) {
System.out.println(Arrays.toString(s.split("\\D+")));
}
}
}
Output:
[123, 45]
[67, 89]
[125, 89]
[678, 129]

Why not split with [^\d]+ (every group of non-digits):
for (String n : "123,456 789".split("[^\\d]+")) {
System.out.println(n);
}
Result:
123
456
789
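One caveat with the split-based approaches: if a line starts with a separator (leading whitespace, say), the first array element is an empty string, because split also reports the text in front of the first match. A minimal sketch of one way around that follows; the class and method names are chosen only for illustration.

```java
import java.util.Arrays;

class DigitFieldSplitter {
    // Hypothetical helper: drops leading non-digits before splitting on runs of non-digits.
    static String[] digitFields(String line) {
        return line.replaceFirst("^\\D+", "").split("\\D+");
    }

    public static void main(String[] args) {
        // Plain split: the first element is "" because the line starts with a separator.
        System.out.println(Arrays.toString(" 123,45".split("\\D+")));
        // With the leading separator removed first, only the two numbers remain.
        System.out.println(Arrays.toString(digitFields(" 123,45")));
    }
}
```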

Java split returns white spaces in result

I'm using the function "split" on this string:
p(80,2)
I would like to obtain just the two numbers, so this is what I do:
String[] split = msg.msgContent().split("[p(,)]");
The regex is correct (or at least, I think so) since it splits the two numbers and puts them in the vector "split", but it turns out that this vector has a length of 4, and the first two positions are occupied by white spaces.
In fact, if I print each vector position, this is the result:
Split:
80
2
I've tried adding \\s to the regex to match with white spaces, but since there are none in my string, it didn't work.

You don't need split here, just use a simple regex to extract the digits from your string:
Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher(msg.msgContent());
while (m.find()) {
String number = m.group();
// add to array
}
Note that String#split takes a regex, and the regex you passed treats each of p, (, the comma and ) as a separate delimiter, which is why empty strings show up in the result.
You might want to read the documentation of Pattern and Matcher for more information about the solution above.

split accepts a regular expression as parameter, and this is a character class: [p(,)].
Given that your code is splitting on every character in the class:
"p(80,2)" will return an array {"", "", "80", "2"}
I know it is not very pretty, but this works:
List<String> collect = Pattern.compile("[^\\d]+")
        .splitAsStream(s)
        .filter(part -> part.length() > 0)
        .collect(Collectors.toList());
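To see where the extra entries come from, it can help to print the array with visible quotes. The sketch below (SplitInspector and quoted are illustrative names, not part of any library) compares the original character class with a variant that treats a whole run of delimiter characters as one separator.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

class SplitInspector {
    // Hypothetical helper: renders each element in quotes so empty strings are visible.
    static String quoted(String[] parts) {
        return Arrays.stream(parts)
                .map(p -> "\"" + p + "\"")
                .collect(Collectors.joining(", ", "{", "}"));
    }

    public static void main(String[] args) {
        // 'p' and '(' each match the class on their own, so two empty strings come first.
        System.out.println(quoted("p(80,2)".split("[p(,)]")));
        // With '+', "p(" is a single delimiter and only one leading empty string remains.
        System.out.println(quoted("p(80,2)".split("[p(,)]+")));
    }
}
```

The trailing ")" does not add an entry in either case because split drops trailing empty strings by default.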

Since you're splitting on p and (, the first two characters of your string are resulting in splits. I would split on the comma after replacing the p, (, and ). Like this:
String x = "p(80,2)";
String [] y = x.replaceAll("[p()]", "").split(",");

Split is not really what you need here, but if you want to use it you can do something like this:
"p(80,2)".replace("p(", "").replace(")", "").split(",")
Results with
[80, 2]

Java split with certain pattern

String abc ="abc_123,low,101.111.111.111,100.254.132.156,abc,1";
String[] ab = abc.split("(\\d+),[a-z]");
System.out.println(ab[0]);
Expected Output:
abc_123
low
101.111.111.111,100.254.132.156
abc
1
The problem is that I am not able to find an appropriate regex for this pattern.

I would suggest not trying to solve every problem with one regular expression.
It seems that your initial string contains values that are separated by ",". So split those values on ",".
Then iterate over the output of that step and "join" those elements that are IP addresses (as it seems that this is what you are looking for).
And just for the sake of it: keep in mind that IP addresses are actually pretty complicated; a pattern "to match em all" can be found here
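The two steps described above could look roughly like this. It is a sketch under assumptions: the class and method names are invented for illustration, and the IP check is deliberately loose (four dot-separated groups of one to three digits, with no range validation).

```java
import java.util.ArrayList;
import java.util.List;

class IpAwareSplitter {
    // Loose check: four dot-separated groups of 1-3 digits (does not validate 0-255).
    private static final String IP_LIKE = "\\d{1,3}(\\.\\d{1,3}){3}";

    // Hypothetical helper: splits on commas, then re-joins neighbouring IP-like fields.
    static List<String> splitKeepingIpRuns(String input) {
        List<String> result = new ArrayList<>();
        boolean previousWasIp = false;
        for (String field : input.split(",")) {
            boolean isIp = field.matches(IP_LIKE);
            if (isIp && previousWasIp) {
                // Append to the previous element instead of starting a new one.
                int last = result.size() - 1;
                result.set(last, result.get(last) + "," + field);
            } else {
                result.add(field);
            }
            previousWasIp = isIp;
        }
        return result;
    }

    public static void main(String[] args) {
        String abc = "abc_123,low,101.111.111.111,100.254.132.156,abc,1";
        for (String part : splitKeepingIpRuns(abc)) {
            System.out.println(part);
        }
    }
}
```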

You could use lookbehind and lookahead to skip the commas that sit between two IP addresses, i.e. those that are preceded by a . and three digits and followed by three digits and a .:
String[] ab = abc.split("(?<!\\.\\d{3}),|,(?!\\d{3}\\.)");
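A small demo of this split, using the string from the question. The class name is illustrative. Note that the pattern assumes three-digit groups around the comma, so it is tied to addresses shaped like the ones in the example.

```java
import java.util.Arrays;

class LookaroundSplitDemo {
    static String[] split(String input) {
        // A comma is a split point unless ".ddd" precedes it AND "ddd." follows it.
        return input.split("(?<!\\.\\d{3}),|,(?!\\d{3}\\.)");
    }

    public static void main(String[] args) {
        String abc = "abc_123,low,101.111.111.111,100.254.132.156,abc,1";
        for (String part : split(abc)) {
            System.out.println(part);
        }
        // Addresses with shorter groups are still split apart:
        System.out.println(Arrays.toString(split("10.0.0.1,10.0.0.2")));
    }
}
```

If the addresses can contain one- or two-digit groups, splitting on every comma and re-joining the IP-like fields afterwards is likely the more robust route.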

String[] ab = abc.split(",");
System.out.println(ab[0]);
System.out.println(ab[1]);
int i = 2;
// Print consecutive IP addresses on one line, separated by commas.
while (i < ab.length && ab[i].matches("[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}")) {
    if (i > 2) System.out.print(",");
    System.out.print(ab[i++]);
}
System.out.println();
System.out.println(ab[i++]);
System.out.println(ab[i++]);

First split the string into an array on ",", then apply a regex to each element to check whether it has the desired format. If it does, concatenate those elements again, separated by ",":
String abc = "abc_123,low,101.111.111.111,100.254.132.156,abc,1"; // or something else
String[] split = abc.split(",");
String concat = "";
for (String data : split) {
    boolean matched = data.matches("[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}");
    if (matched) {
        concat = concat + "," + data;
    } else {
        // Print any pending run of IP addresses first so the original order is kept.
        if (concat.length() > 0) {
            System.out.println(concat.substring(1));
            concat = "";
        }
        System.out.println(data);
    }
}
if (concat.length() > 0)
    System.out.println(concat.substring(1));

Java String Regex Divide - Always the Same Pattern

I never understood how to properly write a regex to split my Strings.
I have Strings of this type: example = "on[?a, ?b, ?c]";
Sometimes I have this instead: example2 = "not clear[?c]";
For the first Example I would like to divide into this:
[on, a, b, c]
or
String name = "on";
String [] vars = [a,b,c];
And for the second example I would like to divide into this type:
[not clear, c]
or
String name = "not clear";
String [] vars = [c];
Thanks a lot in advance, guys ;)

If you know the character set of your identifiers, you can simply do a split on all of the text that isn't in that set. For example, if your identifiers only consist of word characters ([a-zA-Z_0-9]) you can use:
String[] parts = "on[?a, ?b, ?c]".split("[\\W]+");
String name = parts[0];
String[] vars = Arrays.copyOfRange(parts, 1, parts.length);
If your identifiers only have A-Z (upper and lower) you could replace \\W above with ^A-Za-z.
I feel that this is more elegant than using a complex regular expression.
Edit: I realize that this will have issues with your second example "not clear". If you have no option of using something like an underscore instead of a space there, you could do one split on [? (or substring) to get the "name", and another split on the remainder, like so:
String s = "not clear[?a, ?b, ?c]";
String[] parts = s.split("\\[\\?"); //need the '?' so we don't get an extra empty array element in the next split
String name = parts[0];
String[] vars = parts[1].split("[\\W]+");
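Put together, the two-step split could be wrapped as shown below. This is a sketch only: PredicateParser and parse are hypothetical names, and it assumes the input always contains "[?" followed by at least one variable.

```java
import java.util.Arrays;

class PredicateParser {
    // Hypothetical helper: element 0 is the name, the remaining elements are the variables.
    static String[] parse(String s) {
        // First split: everything before "[?" is the name; the limit of 2 keeps the rest intact.
        String[] halves = s.split("\\[\\?", 2);
        // Second split: runs of non-word characters (", ?" and "]") separate the variables.
        String[] vars = halves[1].split("\\W+");
        String[] result = new String[vars.length + 1];
        result[0] = halves[0];
        System.arraycopy(vars, 0, result, 1, vars.length);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("on[?a, ?b, ?c]")));
        System.out.println(Arrays.toString(parse("not clear[?c]")));
    }
}
```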

This comes close, but the problem is the third remembered group is actually repeated so it only captures the last match.
(.*?)\[(?:\s*(?:\?(.*?)(?:\s*,\s*\?(.*?))*)\s*)?]
For example, the first one you list, on[?a, ?b, ?c], would give group 1 as on, 2 as a, and 3 as c. If you are using Perl, you could use the g flag to apply a regex to a line multiple times and use this:
my @tokens;
while ( $line =~ /\s*(.*?)\s*[[,\]]/g ) {
push( @tokens, $1 );
}
Note, I did not actually test the Perl code; it is just off the top of my head. It should give you the idea, though.

String[] parts = example.split("[^\\w ]");
List<String> x = new ArrayList<String>();
for (int i = 0; i < parts.length; i++) {
if (!"".equals(parts[i]) && !" ".equals(parts[i])) {
x.add(parts[i]);
}
}
This will work as long as you don't have more than one space separating your non-space characters. There's probably a cleverer way of filtering out the "" and " " strings.

Java replaceAll() & split() irregularities

I know, I know, now I have two problems 'n all that, but regex here means I don't have to write two complicated loops. Instead, I have a regex that only I understand, and I'll be employed for yonks.
I have a string, say stack.overflow.questions[0].answer[1].postDate, and I need to get the [0] and the [1], preferably in an array. "Easy!" my neurons exclaimed, just use regex and the split method on your input string; so I came up with this:
String[] tokens = input.split("[^\\[\\d\\]]");
which produced the following:
[, , , , , , , , , , , , , , , , [0], , , , , , , [1]]
Oh dear. So, I thought, "what would replaceAll do in this instance?":
String onlyArrayIndexes = input.replaceAll("[^\\[\\d\\]]", "");
which produced:
[0][1]
Hmm. Why so? I'm looking for a two-element string array that contains "[0]" as the first element and "[1]" as the second. Why does split not work here, when the Javadocs declare they both use the Pattern class as per the Javadoc?
To summarise, I have two questions: why does the split() call produce that large array with seemingly random space characters and am I right in thinking the replaceAll works because the regex replaces all characters not matching "[", a number and "]"? What am I missing that means I expect them to produce similar output (OK that's three, and please don't answer "a clue?" to this one!).

Well, from what I can see the split does work: it gives you an array that holds the pieces of the string between each match, where a match is any character that is not a bracket or a digit.
As for the replaceAll, I think your assumption is right. It removes everything (replaces the match with "") that is not what you want.
From the API documentation:
Splits this string around matches of the given regular expression.
This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.
The string "boo:and:foo", for example, yields the following results with these expressions:
Regex    Result
:        { "boo", "and", "foo" }
o        { "b", "", ":and:f" }
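The effect of the limit argument mentioned in the documentation can be seen with the same "boo:and:foo" string. The class and method names in this short sketch are illustrative.

```java
import java.util.Arrays;

class SplitLimitDemo {
    // Hypothetical helper: a negative limit keeps trailing empty strings.
    static String[] splitKeepingTrailing(String s, String regex) {
        return s.split(regex, -1);
    }

    public static void main(String[] args) {
        String s = "boo:and:foo";
        // Limit 0 (the default): trailing empty strings are dropped.
        System.out.println(Arrays.toString(s.split("o")));                    // [b, , :and:f]
        // Negative limit: the two empty strings after the final "oo" are kept.
        System.out.println(Arrays.toString(splitKeepingTrailing(s, "o")));    // [b, , :and:f, , ]
    }
}
```

Empty strings in the middle of the array are kept in both cases; only the trailing ones depend on the limit.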

This is not a direct answer to your question, however I want to show you a great API that will suit your need.
Check out Splitter from Google Guava.
So for your example, you would use it like this:
Iterable<String> tokens = Splitter.onPattern("[^\\[\\d\\]]").omitEmptyStrings().trimResults().split(input);
//Now you get back an Iterable which you can iterate over. Much better than an Array.
for(String s : tokens) {
System.out.println(s);
}
This prints:
[0]
[1]

split splits on boundaries defined by the regex you provide, so it's no great surprise you're getting lots of entries — nearly all of the characters in the string match your regex and so, by definition, are boundaries on which a split should occur.
replaceAll replaces matches for your regex with the replacement you give it, which in your case is a blank string.
If you're trying to grab the 0 and the 1, it's a trivial loop:
String text = "stack.overflow.questions[0].answer[1].postDate";
Pattern pat = Pattern.compile("\\[(\\d+)\\]");
Matcher m = pat.matcher(text);
List<String> results = new ArrayList<String>();
while (m.find()) {
results.add(m.group(1)); // Or just .group() if you want the [] as well
}
String[] tokens = results.toArray(new String[0]);
Or if it's always exactly two of them:
String text = "stack.overflow.questions[0].answer[1].postDate";
Pattern pat = Pattern.compile(".*\\[(\\d+)\\].*\\[(\\d+)\\].*");
Matcher m = pat.matcher(text);
m.find();
String[] tokens = new String[2];
tokens[0] = m.group(1);
tokens[1] = m.group(2);

The problem is that split is the wrong operation here.
In ruby, I'd tell you to string.scan(/\[\d+\]/), which would give you the array ["[0]","[1]"]
Java doesn't have a single-method equivalent, but we can write a scan method as follows:
public List<String> scan(String string, String regex){
    List<String> list = new ArrayList<String>();
    Pattern pattern = Pattern.compile(regex);
    Matcher matcher = pattern.matcher(string);
    // Collect every match of the regex, in order of appearance.
    while(matcher.find()) {
        list.add(matcher.group());
    }
    return list;
}
and we can call it as scan(string,"\\[\\d+\\]")
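On newer JDKs the loop can be replaced by a stream. The sketch below assumes Java 9 or later, where Matcher.results() is available; the class name RegexScan is made up for illustration.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class RegexScan {
    // Returns every match of the regex in the input, in order of appearance.
    static List<String> scan(String string, String regex) {
        return Pattern.compile(regex)
                .matcher(string)
                .results()                  // Stream<MatchResult>, Java 9+
                .map(MatchResult::group)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String input = "stack.overflow.questions[0].answer[1].postDate";
        System.out.println(scan(input, "\\[\\d+\\]"));
    }
}
```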
The equivalent Scala code is:
"""\[\d+\]""".r findAllIn string
