String splitting and wrapping - java

Suppose I am given this string:
String A_Old = "010.011.100.000.111";
On every occurrence of "000" I am expected to break the string and wrap the end around to the front of the string. The result should look like this:
String A_New = "111.010.011.100";
Any help on how to tackle this problem would be appreciated.
Also, what do I do when there are multiple occurrences of "000"?
String A_Old = "010.011.100.000.111.001.011.000.101.110";
should convert to:
String A_New = "101.110.111.001.011.010.011.100";
Code in java or c++ is understood and appreciated.
EDIT
This was what I thought would work-
String b[]=A_Old.split(".000.",2); //should split the string in 2 parts.
A_New=b[1].concat(b[0]); // concatenation with a loss of a period
I was told to avoid losing the period and to avoid dummy variables; this attempt fails on both counts. Any idea how to tackle this?

To rotate the list, split the string and then concatenate the pieces in the order you want, re-inserting the period between them. You probably also want to check for blanks and for "000" being the last element.
public static void main(String[] args)
{
    String inString = "010.011.100.000.111";
    String outString = "";
    String[] arr = inString.split("\\.000\\.");
    // Start with the last piece so the tail wraps around to the front.
    outString = arr[arr.length - 1];
    for(int i = arr.length - 2; i >= 0; i--)
    {
        // Put back the period that the split removed.
        outString = outString + "." + arr[i];
    }
    System.out.println(outString);
}

Using split and a reverse for loop in Java:
public static void main(String[] args) {
    String A_Old="010.011.100.000.111.001.011.000.101.110";
    String A_new ="";
    // Split on the whole ".000." delimiter so no piece keeps a stray period.
    String newe[]=A_Old.split("\\.000\\.");
    for(int i=newe.length-1;i>=0;i--){
        A_new+="."+newe[i]; // prefix every piece with a period so none is lost
    }
    System.out.print(A_new.replaceFirst("\\.", "")); // drop the extra leading period
}

I tried it in java:
public static String convert(String in) {
String[] strs = in.split("\\.*000\\.*");
StringBuilder sb = new StringBuilder();
for (int i = strs.length - 1; i >= 0; --i) {
sb.append(strs[i]);
if (i > 0 && strs[i - 1].length() > 0) {
sb.append(".");
}
}
return sb.toString();
}
public static void main(String[] args) {
System.out.println(convert("010.011.100.000.111"));
System.out.println(convert("010.011.100.000.111.001.011.000.101.110"));
System.out.println(convert("010.011.100.111"));
System.out.println(convert("000.010.011.100.111"));
System.out.println(convert("010.011.100.111.000"));
System.out.println(convert("000.010.011.100.111.000"));
}
Output:
111.010.011.100
101.110.111.001.011.010.011.100
010.011.100.111
010.011.100.111
010.011.100.111
010.011.100.111

"No dummy variables" sounds like a recursive solution is sought. Is this a Scheme class?
Here's a Perl solution. C++11 is sufficiently high-level to provide most of the facilities demonstrated here:
my @tests = (
'010.011.100.000.111',
'010.011.100.000.111.001.011.000.101.110'
);
foreach ( @tests ) {
print "$_ => ", rotate($_), "\n"; # Call our rotate routine, and show results.
}
sub rotate {
# Split the string into components, reverse their order, join them with '.'
return join('.', reverse split /(?:^|\.)000(?:\.|$)/, shift);
}
No dummy variables. In fact, no explicit variables either.
Here's how the split regex works:
(?:^|\.) # Match either start of string, or a dot.
000 # Match three zeros.
(?:\.|$) # Match either a dot, or the end of string.
This permits the 000 to appear anywhere in the string (including beginning or end). If it appears at the beginning, there will not need to be a leading dot. If it appears at the end, a trailing dot will not be required. Anywhere else, dots must surround the 000.
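For readers working in Java, the same split, reverse, join idea might be sketched roughly as follows. The class and method names (RotateOnZeros, rotate) are made up for illustration, and the empty-piece check is an added assumption to cope with a "000" at the very start of the input.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class RotateOnZeros {
    // Splits on each "000" group (together with its adjacent periods),
    // reverses the order of the pieces, and rejoins them with '.'.
    static String rotate(String in) {
        List<String> parts = new ArrayList<>();
        for (String part : in.split("(?:^|\\.)000(?:\\.|$)")) {
            if (!part.isEmpty()) { // a leading "000." produces an empty first piece
                parts.add(part);
            }
        }
        Collections.reverse(parts);
        return String.join(".", parts);
    }

    public static void main(String[] args) {
        System.out.println(rotate("010.011.100.000.111"));
        System.out.println(rotate("010.011.100.000.111.001.011.000.101.110"));
    }
}
```

This still uses local variables, so it does not satisfy the "no dummy variables" constraint; it only shows that the regex carries over to Java unchanged apart from string escaping.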
Update:
Here is a recursive solution using Perl. From what I've read of the Java solutions (I'm not a Java guy), it seems like translation should be straightforward for someone with some Java background.
my @tests = (
'010.011.100.000.111',
'010.011.100.000.111.001.011.000.101.110'
);
foreach ( @tests ) {
print "($_) => (", rotate($_), ")\n";
}
sub rotate {
my( $ok, $rest ) = split /(?:^|\.)000(?:\.|$)/, $_[0], 2;
return $ok unless defined $rest;
return join( '.', rotate($rest), $ok);
}
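A possible Java rendering of this recursive version is sketched below. It is an untested-by-the-author translation, not the answerer's code: the names RotateRecursive and rotate are hypothetical, and the two empty-string checks are added assumptions to handle "000" at the start or end of the input.

```java
public class RotateRecursive {
    // Splits off everything before the first "000", rotates the remainder
    // recursively, and appends the first piece at the end.
    static String rotate(String in) {
        String[] parts = in.split("(?:^|\\.)000(?:\\.|$)", 2);
        if (parts.length < 2 || parts[1].isEmpty()) {
            return parts[0];          // no "000" found, or it was the last element
        }
        if (parts[0].isEmpty()) {
            return rotate(parts[1]);  // "000" was the first element
        }
        return rotate(parts[1]) + "." + parts[0];
    }

    public static void main(String[] args) {
        System.out.println(rotate("010.011.100.000.111"));
        System.out.println(rotate("010.011.100.000.111.001.011.000.101.110"));
    }
}
```

The limit argument of 2 plays the same role as the third argument to Perl's split: only the first delimiter is consumed per call.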
Update2:
And here's another Perl solution that eliminates any explicit variables, using only subroutine parameters:
my @tests = (
'010.011.100.000.111',
'010.011.100.000.111.001.011.000.101.110'
);
foreach ( @tests ) {
print "($_) => (", rotate($_), ")\n";
}
sub rotate {
return sub {
return $_[0] unless defined $_[1];
return join( '.', rotate($_[1]), $_[0]);
}->( split /(?:^|\.)000(?:\.|$)/, shift, 2 );
}
I have no idea how one would translate that to Java. I suspect translation would become a little more difficult, though I'm confident there must exist a semantically equivalent C++11 solution.

Related

Any way to prevent that last char of a string from replacing in java

Let's say that I want to add ? after each letter in a string.
String letters = "A#B#C#D"; //Split by #
String splitLetters[]=letters.split("#");
for(String ltr: splitLetters)
System.out.println(ltr+"?");
the output will be like:
A? B? C? D?
What I want is to prevent the last letter from being changed.
I want only the letters before it to be changed.
Note:
replacing # with ? in a direct way like (...replace("#","?")) won't work. The code above is only an example.
You're almost there! Just use an indexed for loop and check whether the current letter is the last one.
String letters = "A#B#C#D"; //Split by #
String splitLetters[] = letters.split("#");
for (int i = 0; i < splitLetters.length; i++) {
System.out.print(splitLetters[i]);
if (i + 1 < splitLetters.length)
System.out.println('?');
}
There are so many ways to do it (as already described in the existing answers). The following solution is based on your own solution with the required change:
public class Main {
public static void main(String[] args) {
String letters = "A#B#C#D"; // Split by #
String splitLetters[] = letters.split("#");
boolean firstStrPrinted = false;
for (String ltr : splitLetters) {
if (firstStrPrinted) {
System.out.print("?" + ltr);
} else {
System.out.print(ltr);
firstStrPrinted = true;
}
}
}
}
Output:
A?B?C?D
Here, the boolean firstStrPrinted tracks whether the first string has already been printed. If it has not, the string is printed without a leading ? and the flag is set to true; every later string is printed with a ? in front of it.
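If the real goal is simply "put a ? between the pieces but not after the last one", String.join (available since Java 8) may be enough. A minimal sketch, where the class and method names are invented for the example:

```java
public class JoinLetters {
    // Splits on '#' and re-joins with '?', so the separator only ever
    // appears between pieces and never after the last one.
    static String addBetween(String letters) {
        return String.join("?", letters.split("#"));
    }

    public static void main(String[] args) {
        System.out.println(addBetween("A#B#C#D")); // prints A?B?C?D
    }
}
```

This only fits if each piece needs no other per-item processing; otherwise one of the loop-based answers is easier to adapt.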
Nothing is actually getting changed in your example, so it's difficult to figure out what you want to do.
If it's just that you don't want to print out the question mark after the last substring, then:
int k;
for (k=0; k<splitLetters.length-1; k++)
System.out.println(splitLetters[k] + "?");
System.out.println(splitLetters[k]);
You can apply similar reasoning to your actual code.
let's say that want to add ? after each letter in a string.
How about this?
This works by matching either a word character followed by the # sign, or the end of the string, and replacing it with the captured character (if any) followed by ? and a space. It uses a back reference ($1) to re-insert the captured character.
String[] strs = { "A#B#C#D", "ABC#BBB#CCC#DDDD" };
for (String text : strs) {
String rep = text.replaceAll("(\\w)#|$", "$1? ");
System.out.println(text + " -> " + rep);
}
Prints
A#B#C#D -> A? B? C? D?
ABC#BBB#CCC#DDDD -> ABC? BBB? CCC? DDDD?
If this does not meet your requirements, please provide more specific guidelines.

Regex for splitting word / (slash) word [duplicate]

I really need a regex expert:
I need a regular expression (in Java) for splitting these examples:
Hello/World (word/word) => Hello,World
Hello/12 (word/number) => Hello,12
15/Hello (number/word) => 15,Hello
12/17 (number/number) => 12/17 (Do not split)
Update:
This is what I tried, but it also matches the number/number case:
https://regex101.com/r/zZ9nO5/2
Thanks
It might not be the most elegant solution but for your requirement you can do it like that:
(([a-zA-Z]+?)/([a-zA-Z]+))|(([a-zA-Z]+?)/([\d]+))|(([\d]+?)/([a-zA-Z]+))
It checks for word/word, word/number and number/word.
Replace each match with the corresponding groups: \2\5\8,\3\6\9
A simple java program for that would be:
public static void main(String[] args) {
String[] stringArray=new String[]{"Hello/World","Hello/12","15/Hello","12/17"};
for(String s:stringArray) {
System.out.println(s.replaceAll("(([a-zA-Z]+?)/([a-zA-Z]+))|(([a-zA-Z]+?)/([\\d]+))|(([\\d]+?)/([a-zA-Z]+))", "$2$5$8,$3$6$9"));
}
}
Result is:
Hello,World
Hello,12
15,Hello
12/17
Slightly different approach, but you could check the characters in the String to see that they all are either a number or a forward slash, and then split if necessary:
public static void main(String[] args) {
String[] strArray = new String[]{"Hello/World", "Hello/12", "15/Hello", "12/17"};
for(String str: strArray){
if(checkIfValid(str)){
System.out.println(str);
}
else{
System.out.println(str.replace("/", ","));
}
}
}
public static boolean checkIfValid(String str) {
for (int i = 0; i < str.length(); i++) {
if (!Character.isDigit(str.charAt(i)) && str.charAt(i) != '/') {
return false;
}
}
return true;
}
Output:
Hello,World
Hello,12
15,Hello
12/17
This might help if Hello12/15 is not supposed to be split.
A little more context would be nice, but as I understand it, you get a string with a single '/' in the middle, and you either replace the '/' with ',' or you don't if it has numbers on both sides.
So I would do something like this:
public class MyClass {
public static void main(String args[]) {
String mystring = "12/25";
if(!mystring.matches("^\\d+\\/\\d+$"))
mystring = mystring.replace("/", ",");
System.out.println(mystring);
}
}
If that is what you wanted to do here, then I believe it's less complicated and also quicker than a big regex distinguishing between all 4 cases.
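If an actual split (rather than a replace) is wanted, one option is a lookaround-based delimiter. The sketch below is an assumption about what "number/number" should mean: it refuses to split only when the characters immediately before and after the slash are both digits. The names SlashSplit and splitOnSlash are made up for the example.

```java
import java.util.Arrays;

public class SlashSplit {
    // Splits on '/' unless it has a digit directly on both sides.
    // (?<!\d)/  : slash not preceded by a digit
    // /(?!\d)   : slash not followed by a digit
    static String[] splitOnSlash(String s) {
        return s.split("(?<!\\d)/|/(?!\\d)");
    }

    public static void main(String[] args) {
        for (String s : new String[]{"Hello/World", "Hello/12", "15/Hello", "12/17"}) {
            System.out.println(s + " => " + Arrays.toString(splitOnSlash(s)));
        }
    }
}
```

Because only the neighbouring characters are inspected, an input such as "a1/2b" would not be split; whether that is acceptable depends on the real data.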

Java efficiently replace unless matches complex regular expression

I have over a gigabyte of text that I need to go through and surround punctuation with spaces (tokenizing). I have a long regular expression (1818 characters, though that's mostly lists) that defines when punctuation should not be separated. Being long and complicated makes it hard to use groups with it, though I wouldn't leave that out as an option since I could make most groups non-capturing (?:).
Question: How can I efficiently replace certain characters that don't match a particular regular expression?
I've looked into using lookaheads or similar, and I haven't quite figured it out, but it seems to be terribly inefficient anyway. It would likely be better than using placeholders though.
I can't seem to find a good "replace with a bunch of different regular expressions for both finding and replacing in one pass" function.
Should I do this line by line instead of operating on the whole text?
String completeRegex = "[^\\w](("+protectedPrefixes+")|(("+protectedNumericOnly+")\\s*\\p{N}))|"+protectedRegex;
Matcher protectedM = Pattern.compile(completeRegex).matcher(s);
ArrayList<String> protectedStrs = new ArrayList<String>();
//Take note of the protected matches.
while (protectedM.find()) {
protectedStrs.add(protectedM.group());
}
//Replace protected matches.
String replaceStr = "<PROTECTED>";
s = protectedM.replaceAll(replaceStr);
//Now that it's safe, separate punctuation.
s = s.replaceAll("([^\\p{L}\\p{N}\\p{Mn}_\\-<>'])"," $1 ");
// These are for apostrophes. Can these be combined with either the protecting regular expression or the one above?
s = s.replaceAll("([\\p{N}\\p{L}])'(\\p{L})", "$1 '$2");
s = s.replaceAll("([^\\p{L}])'([^\\p{L}])", "$1 ' $2");
Note the two additional replacements for apostrophes. Using placeholders protects against those replacements as well, but I'm not really concerned with apostrophes or single quotes in my protecting regex anyway, so it's not a real concern.
I'm rewriting what I considered very inefficient Perl code with my own in Java, keeping track of speed, and things were going fine until I started replacing the placeholders with the original strings. With that addition it's too slow to be reasonable (I've never seen it get even close to finishing).
//Replace placeholders with original text.
String resultStr = "";
String currentStr = "";
int currentPos = 0;
int[] protectedArray = replaceStr.codePoints().toArray();
int protectedLen = protectedArray.length;
int[] strArray = s.codePoints().toArray();
int protectedCount = 0;
for (int i=0; i<strArray.length; i++) {
int pt = strArray[i];
// System.out.println("pt: "+pt+" symbol: "+String.valueOf(Character.toChars(pt)));
if (protectedArray[currentPos]==pt) {
if (currentPos == protectedLen - 1) {
resultStr += protectedStrs.get(protectedCount);
protectedCount++;
currentPos = 0;
} else {
currentPos++;
}
} else {
if (currentPos > 0) {
resultStr += replaceStr.substring(0, currentPos);
currentPos = 0;
currentStr = "";
}
resultStr += ParseUtils.getSymbol(pt);
}
}
s = resultStr;
This code may not be the most efficient way to return the protected matches. What is a better way? Or better yet, how can I replace punctuation without having to use placeholders?
I don't know exactly how big your in-between strings are, but I suspect that you can do somewhat better than using Matcher.replaceAll, speed-wise.
You're doing 3 passes across the string, each time creating a new Matcher instance, and then creating a new String; and because you're using + to concatenate the strings, you're creating a new string which is the concatenation of the in-between string and the protected group, and then another string when you concatenate this to the current result. You don't really need all of these extra instances.
Firstly, you should accumulate the resultStr in a StringBuilder, rather than via direct string concatenation. Then you can proceed something like:
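StringBuilder resultStr = new StringBuilder();
int currIndex = 0;
while (protectedM.find()) {
    protectedStrs.add(protectedM.group());
    // Process the unprotected text between the previous match and this one.
    appendInBetween(resultStr, str, currIndex, protectedM.start());
    // Copy the protected match through unchanged.
    resultStr.append(protectedM.group());
    currIndex = protectedM.end();
}
// Process the unprotected text after the last match.
appendInBetween(resultStr, str, currIndex, str.length());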
where appendInBetween is a method implementing the equivalent to the replacements, just in a single pass:
void appendInBetween(StringBuilder resultStr, String s, int start, int end) {
// Pass the whole input string and the bounds, rather than taking a substring.
// Allocate roughly enough space up-front.
resultStr.ensureCapacity(resultStr.length() + end - start);
for (int i = start; i < end; ++i) {
char c = s.charAt(i);
// Check if c matches "([^\\p{L}\\p{N}\\p{Mn}_\\-<>'])".
if (!(Character.isLetter(c)
|| Character.isDigit(c)
|| Character.getType(c) == Character.NON_SPACING_MARK
|| "_-<>'".indexOf(c) != -1)) { // the \- in the regex is just an escaped hyphen
resultStr.append(' ');
resultStr.append(c);
resultStr.append(' ');
} else if (c == '\'' && i > 0 && i + 1 < s.length()) {
// We have a quote that's not at the beginning or end.
// Call these 3 characters bcd, where c is the quote.
char b = s.charAt(i - 1);
char d = s.charAt(i + 1);
if ((Character.isDigit(b) || Character.isLetter(b)) && Character.isLetter(d)) {
// If the 3 chars match "([\\p{N}\\p{L}])'(\\p{L})"
resultStr.append(' ');
resultStr.append(c);
} else if (!Character.isLetter(b) && !Character.isLetter(d)) {
// If the 3 chars match "([^\\p{L}])'([^\\p{L}])"
resultStr.append(' ');
resultStr.append(c);
resultStr.append(' ');
} else {
resultStr.append(c);
}
} else {
// Everything else, just append.
resultStr.append(c);
}
}
}
Obviously, there is a maintenance cost associated with this code - it is undeniably more verbose. But the advantage of doing it explicitly like this (aside from the fact it is just a single pass) is that you can debug the code like any other - rather than it just being the black box that regexes are.
I'd be interested to know if this works any faster for you!
At first I thought that appendReplacement wasn't what I was looking for, but indeed it was. Since it's replacing the placeholders at the end that slowed things down, all I really needed was a way to dynamically replace matches:
StringBuffer replacedBuff = new StringBuffer();
Matcher replaceM = Pattern.compile(replaceStr).matcher(s);
int index = 0;
while (replaceM.find()) {
replaceM.appendReplacement(replacedBuff, "");
replacedBuff.append(protectedStrs.get(index));
index++;
}
replaceM.appendTail(replacedBuff);
s = replacedBuff.toString();
Reference: Second answer at this question.
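A self-contained version of that restore step might look like the sketch below. The class name, the method name and the sample text are invented for the demo. Two details differ from the snippet above and are assumptions on my part: the placeholder is wrapped in Pattern.quote so it is matched literally, and the saved text is passed through Matcher.quoteReplacement so that a '$' or '\' in it is not interpreted as a group reference.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RestoreProtected {
    // Replaces the n-th occurrence of the placeholder with the n-th saved string.
    static String restore(String text, String placeholder, List<String> saved) {
        Matcher m = Pattern.compile(Pattern.quote(placeholder)).matcher(text);
        StringBuffer out = new StringBuffer();
        int index = 0;
        while (m.find()) {
            // Copies the text before the match, then the literal saved string.
            m.appendReplacement(out, Matcher.quoteReplacement(saved.get(index)));
            index++;
        }
        m.appendTail(out); // copies whatever follows the last placeholder
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> saved = Arrays.asList("e.g.", "$5.00");
        System.out.println(restore("see <PROTECTED> , cost <PROTECTED> .", "<PROTECTED>", saved));
    }
}
```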
Another option to consider:
During the first pass through the String, to find the protected Strings, take the start and end indices of each match, replace the punctuation for everything outside of the match, add the matched String, and then keep going. This takes away the need to write a String with placeholders, and requires only one pass through the entire String. It does, however, require many separate small replacement operations. (By the way, be sure to compile the patterns before the loop, as opposed to using String.replaceAll()). A similar alternative is to add the unprotected substrings together, and then replace them all at the same time. However, the protected strings would then have to be added to the replaced string at the end, so I doubt this would save time.
int currIndex = 0;
while (protectedM.find()) {
protectedStrs.add(protectedM.group());
String substr = s.substring(currIndex,protectedM.start());
substr = p1.matcher(substr).replaceAll(" $1 ");
substr = p2.matcher(substr).replaceAll("$1 '$2");
substr = p3.matcher(substr).replaceAll("$1 ' $2");
resultStr += substr+protectedM.group();
currIndex = protectedM.end();
}
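For completeness, here is a runnable sketch of that one-pass idea, using a StringBuilder instead of string concatenation and also processing the text after the last protected match. The two patterns in main are deliberately tiny stand-ins (a decimal number as the "protected" pattern, four punctuation marks as the separable set), not the real 1818-character expression, and the names OnePassTokenizer and tokenize are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OnePassTokenizer {
    // Surrounds punctuation with spaces everywhere except inside protected matches.
    static String tokenize(String s, Pattern protectedP, Pattern punct) {
        Matcher m = protectedP.matcher(s);
        StringBuilder out = new StringBuilder();
        int curr = 0;
        while (m.find()) {
            // Unprotected stretch before this match: separate its punctuation.
            out.append(punct.matcher(s.substring(curr, m.start())).replaceAll(" $1 "));
            // Protected match: copy through untouched.
            out.append(m.group());
            curr = m.end();
        }
        // Unprotected tail after the last match.
        out.append(punct.matcher(s.substring(curr)).replaceAll(" $1 "));
        return out.toString();
    }

    public static void main(String[] args) {
        Pattern protectedP = Pattern.compile("\\d+\\.\\d+"); // stand-in for the real protected regex
        Pattern punct = Pattern.compile("([.,!?])");         // stand-in for the real punctuation class
        System.out.println(tokenize("Pi is 3.14,ok.", protectedP, punct));
    }
}
```

Both patterns are compiled once and reused, which is the point made above about avoiding String.replaceAll inside the loop.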
Speed comparison for 100,000 lines of text:
Original Perl script: 272.960579875 seconds
My first attempt: Too long to finish.
With appendReplacement(): 14.245160866 seconds
Replacing while finding protected: 68.691842962 seconds
Thank you, Java, for not letting me down.

Java String splitter

I have a string of concatenated country abbreviations, and I want to split them apart so I can identify the country for each abbreviation. For example, once I have String c = "USA", I will output the country name.
Currently c never equals "USA"; it only ever holds a single character such as A:
public class Example {
public static void main(String[] args) {
String x = "USAIND";
String c = "";
System.out.print("Country: ");
for (int i = 0; i < 3; i++) {
c = Character.toString(x.charAt(i));
System.out.print(c);
if (c.equals("USA")) {
System.out.println("United State of America");
}
}
System.out.println("");
System.out.print("Country: ");
for (int i = 3; i < 6; i++) {
c = Character.toString(x.charAt(i));
System.out.print(c);
if (c.equals("IND")) {
System.out.println("India");
}
}
System.out.println("");
}
}
You need to append each character to your String and then compare it, otherwise, it'll just keep replacing your String with the last character always.
for (int i = 0; i < 3; i++) {
c += Character.toString(x.charAt(i)); // Appending all the characters one by one
}
System.out.print(c); // Printing the String c after all the characters have been appended
if (c.equals("USA")) { // checking if it's equal to USA
System.out.println("United States of America");
}
And the same goes with the other half of the process.
c = ""; // re-initialize it to blank
for (int i = 3; i < 6; i++) {
c += Character.toString(x.charAt(i));
}
System.out.print(c);
if (c.equals("IND")) {
System.out.println("India");
}
But the easiest way would be to use String.substring(startIndex, endIndex) for this.
String c = x.substring(0,3); // For USA
String c1 = x.substring(3,6); // For IND
This happens because when you do this
c = Character.toString(x.charAt(i));
only the character at the ith position is stored in c, and since that runs inside a for loop, the only thing left in c at the end is "A".
Use substring instead of the for loop and charAt:
c = x.substring(0,3); // which would give you "USA"
You should probably use String.substring(...) for this.
You are iterating through the string, but you only retain the last character because of this snippet:
c = Character.toString(x.charAt(i));
This should be:
c += Character.toString(x.charAt(i));
This appends the current character to the string built so far. Apply this fix to both loops. The c variable will then build up the country code and will satisfy this condition:
if (c.equals("USA")) {
After the first loop and before the second loop, you will need to re-initialize the c variable:
c = "";
Once done, you can put that logic in a method of its own, so you avoid duplicate code within the loops.
This logic could be simplified by using String.substring instead, as others pointed out, since working character by character with String.charAt is more tedious. I thought it was worth pointing out your logic error first, though, before giving you other pointers.
As for other approaches, you could handle the country code and name output differently. Try a HashMap where the keys are the country codes and the values are the country names. You can then look up each code in the map (or iterate through it) and print both keys and values. That would be higher-level than your current solution and much shorter.
EDIT1: I offered the code to the last suggestion but I removed it, as I realized that giving code to assignment related questions is not encouraged.
I would go another way: try to add a separator symbol such as ";" between the codes. If that is not possible, you could check with String.contains("USA") whether a certain country is present. But be aware that you may then find a country that is not actually listed, because its code can appear across the boundary of two others.
With your logic, I think it would be better to use
public String substring(int beginIndex, int endIndex)
Get the substring between a start index and an end index and compare it. Currently you are only getting a single character. With your current implementation you would have to convert each character to a String and append it.
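Putting the substring idea into a loop, and swapping the chain of equals checks for a lookup table, could look something like this. It is only a sketch: the class name, the decode method, the two-entry map and the "Unknown" fallback are all assumptions made for the example, and it presumes every code is exactly three characters long.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CountryCodes {
    // Hypothetical lookup table; extend with whatever codes are needed.
    static final Map<String, String> NAMES = new HashMap<>();
    static {
        NAMES.put("USA", "United States of America");
        NAMES.put("IND", "India");
    }

    // Cuts the input into 3-character codes and maps each one to its name.
    static List<String> decode(String codes) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i + 3 <= codes.length(); i += 3) {
            String code = codes.substring(i, i + 3);
            names.add(NAMES.getOrDefault(code, "Unknown (" + code + ")"));
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : decode("USAIND")) {
            System.out.println("Country: " + name);
        }
    }
}
```

The loop condition i + 3 <= codes.length() silently ignores a trailing fragment shorter than three characters; whether that should be an error instead is a design choice.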
For the part of splitting, I would use Guava's Splitter, like so:
Splitter.fixedLength(3).split(x)
And for the part of matching abbreviation with full name, I would use an enum instead of equals comparing, it seems a little cleaner. So, a possible full result would be:
public class Example {
public static enum Country {
USA("United States of America"), IND("India");
String fullName;
Country(String fullName) {
this.fullName = fullName;
}
String getFullName() {
return fullName;
}
};
public static void main(String[] args) {
String x = "USAINDUSAIND";
for (String s : Splitter.fixedLength(3).split(x)) {
System.out.println(Country.valueOf(s).getFullName());
}
}
}
EDIT: Sorry about using a third-party library; it seemed clean to me. I agree with the other answers about using substring, but I prefer to treat everything as one process rather than two, so you can easily handle a string of more than 2 abbreviations, like this (example assuming the enum as above):
public static void main(String[] args) {
String x = "USAINDUSAIND";
for (int i = 0;i < x.length();i += 3) {
System.out.println(Country.valueOf(x.substring(i, i + 3)).getFullName());
}
}

String.split() Not Acting on Semicolon or Space Delimiters

This may be a simple question, but I have been Googling for over an hour and haven't found an answer yet.
I'm trying to simply use the String.split() method with a small Android application to split an input string. The input string will be something along the lines of: "Launch ip:192.168.1.101;port:5900". I'm doing this in two iterations to ensure that all of the required parameters are there. I'm first trying to do a split on spaces and semicolons to get the individual tokens sorted out. Next, I'm trying to split on colons in order to strip off the identification tags of each piece of information.
So, for example, I would expect the first round of split to give me the following data from the above example string:
(1) Launch
(2) ip:192.168.1.101
(3) port:5900
Then the second round would give me the following:
(1) 192.168.1.101
(2) 5900
However, the following code that I wrote doesn't give me what's expected:
private String[] splitString(String inputString)
{
String[] parsedString;
String[] orderedString = new String[SOSLauncherConstants.SOCKET_INPUT_STRING_PARSE_VALUE];
parsedString = inputString.trim().split("; ");
Log.i("info", "The parsed data is as follows for the initially parsed string of size " + parsedString.length + ": ");
for (int i = 0; i < parsedString.length; ++i)
{
Log.i("info", parsedString[i]);
}
for (int i = 0; i < parsedString.length; ++i )
{
if (parsedString[i].toLowerCase().contains(SOSLauncherConstants.PARSED_LAUNCH_COMMAND_VALUE))
{
orderedString[SOSLauncherConstants.PARSED_COMMAND_WORD] = parsedString[i];
}
if (parsedString[i].toLowerCase().contains("ip"))
{
orderedString[SOSLauncherConstants.PARSED_IP_VALUE] = parsedString[i].split(":")[1];
}
else if (parsedString[i].toLowerCase().contains("port"))
{
orderedString[SOSLauncherConstants.PARSED_PORT_VALUE] = parsedString[i].split(":")[1];
}
else if (parsedString[i].toLowerCase().contains("username"))
{
orderedString[SOSLauncherConstants.PARSED_USERNAME_VALUE] = parsedString[i].split(":")[1];
}
else if (parsedString[i].toLowerCase().contains("password"))
{
orderedString[SOSLauncherConstants.PARSED_PASSWORD_VALUE] = parsedString[i].split(":")[1];
}
else if (parsedString[i].toLowerCase().contains("color"))
{
orderedString[SOSLauncherConstants.PARSED_COLOR_VALUE] = parsedString[i].split(":")[1];
}
}
Log.i("info", "The parsed data is as follows for the second parsed string of size " + orderedString.length + ": ");
for (int i = 0; i < orderedString.length; ++i)
{
Log.i("info", orderedString[i]);
}
return orderedString;
}
For a result, I'm getting the following:
The parsed data is as follows for the parsed string of size 1:
launch ip:192.168.1.106;port:5900
The parsed data is as follows for the second parsed string of size 6:
launch ip:192.168.1.106;port:5900
192.168.1.106;port
And then, of course, it crashes because the for loop runs into a null string.
Side Note:
The following snippet is from the constants class that defines all of the string indexes --
public static final int SOCKET_INPUT_STRING_PARSE_VALUE = 6;
public static final int PARSED_COMMAND_WORD = 0;
public static final String PARSED_LAUNCH_COMMAND_VALUE = "launch";
public static final int PARSED_IP_VALUE = 1;
public static final int PARSED_PORT_VALUE = 2;
public static final int PARSED_USERNAME_VALUE = 3;
public static final int PARSED_PASSWORD_VALUE = 4;
public static final int PARSED_COLOR_VALUE = 5;
I looked into needing a possible escape (by inserting a \\ before the semicolon) on the semicolon delimiter, and even tried using it, but that didn't work. The odd part is that neither the space nor the semicolon function as a delimiter, yet the colon works on the second time around. Does anybody have any ideas what would cause this?
Thanks for your time!
EDIT: I should also add that I'm receiving the string over a WiFi socket connection. I don't think this should make a difference, but I'd like you to have all of the information that you need.
String.split(String) takes a regex. Use "[; ]". eg:
"foo;bar baz".split("[; ]")
will return an array containing "foo", "bar" and "baz".
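Applied to the input from the question, the character-class split followed by a second split on the colon might be sketched like this. The class name, the parse method and the choice of a map keyed by the lowercased tag are assumptions for illustration, not the asker's SOSLauncherConstants layout.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ParseCommand {
    // First split on ';' or ' ', then split each token on the first ':'.
    static Map<String, String> parse(String input) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String token : input.trim().split("[; ]")) {
            if (token.isEmpty()) {
                continue; // consecutive delimiters produce empty tokens
            }
            String[] kv = token.split(":", 2);
            if (kv.length == 2) {
                result.put(kv[0].toLowerCase(), kv[1]);
            } else {
                result.put("command", kv[0]); // a token without a tag, e.g. "Launch"
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("Launch ip:192.168.1.101;port:5900"));
    }
}
```

Looking values up by tag avoids indexing into a fixed-size array that may still contain nulls for parameters that were not sent.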
If you need groups of spaces to work as a single delimiter, you can use something like:
"foo;bar baz".split("(;| +)")
String.split() treats its argument as a regex that has to match as a whole, not as a set of individual delimiter characters. That is, split("; ") only splits where a semicolon is immediately followed by a space: it would not split "a;b c" at all, but would split "a; b".
You may have better luck with Guava's Splitter, which is meant to be slightly less unpredictable than java.lang.String.split.
I would write something like
Iterable<String> splits = Splitter.on(CharMatcher.anyOf("; ")).split(string);
but Splitter also provides fluent-style customization like "trim results" or "skip over empty strings."
Is there a reason why you are using String.split() rather than regular expressions with Pattern and Matcher? This is a perfect candidate for regexes, especially if the string format is consistent.
I'm not sure if your format is fixed, but if it is, then the following regex should break it down for you (I'm sure that someone can come up with an even more elegant regex). If you have several command strings that follow, then you can use a more flexible regex and loop over all the groups:
Pattern p = Pattern.compile("([\\w]*)[ ;](([\\w]*):([^ ;]*))*");
Matcher m = p.matcher(inputString);
if( m.find() ) {
    // The first match carries the command word in group 1.
    command = m.group(1);
    do{
        // Each match carries one id:value pair in groups 3 and 4.
        id = m.group(3);
        value = m.group(4);
    } while( m.find() );
}
A great place to test out regexes online is http://www.regexplanet.com/simple/index.html. It allows you to play with the regex without having to compile and launch your app every time if you just want to get the regex correct.
