Handling delimiter with escape characters in Java String.split() method - java

I have searched the web for my query, but didn't get the answer which fits my requirement exactly. I have my string like below:
A|B|C|The Steading\|Keir Allan\|Braco|E
My Output should look like below:
A
B
C
The Steading|Keir Allan|Braco
E
My requirement is to skip the delimiter if it is preceded by the escape sequence. I have tried the following using negative lookbehinds in String.split():
(?<!\\)\|
But, my problem is the delimiter will be defined by the end user dynamically and it need not be always |. It can be any character on the keyboard (no restrictions). Hence, my doubt is that the above regex might fail for some of the special characters which are not allowed in regex.
I just wanted to know if this is the perfect way to do it.

You can use Pattern.quote():
String regex = "(?<!\\\\)" + Pattern.quote(delim);
Using your example:
String delim = "|";
String regex = "(?<!\\\\)" + Pattern.quote(delim);
for (String s : "A|B|C|The Steading\\|Keir Allan\\|Braco|E".split(regex))
System.out.println(s);
A
B
C
The Steading\|Keir Allan\|Braco
E
You can extend this to use a custom escape sequence as well:
String delim = "|";
String esc = "+";
String regex = "(?<!" + Pattern.quote(esc) + ")" + Pattern.quote(delim);
for (String s : "A|B|C|The Steading+|Keir Allan+|Braco|E".split(regex))
System.out.println(s);
A
B
C
The Steading+|Keir Allan+|Braco
E
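To illustrate why Pattern.quote() matters when the delimiter is a regex metacharacter, here is a minimal sketch that wraps the approach above in a helper. The class and method names (QuotedDelimiterSplit, splitUnescaped) are made up for this example, and the limit of -1 (which keeps trailing empty fields) is an assumption rather than part of the original answer.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class QuotedDelimiterSplit {

    // Splits on delim unless it is directly preceded by esc.
    // Both values are quoted, so characters such as '.', '*' or '|' are taken literally.
    static String[] splitUnescaped(String input, String esc, String delim) {
        String regex = "(?<!" + Pattern.quote(esc) + ")" + Pattern.quote(delim);
        return input.split(regex, -1); // assumption: -1 keeps trailing empty fields
    }

    public static void main(String[] args) {
        // '.' would match any character if it were not quoted
        System.out.println(Arrays.toString(splitUnescaped("a.b\\.c.d", "\\", ".")));
        // '*' on its own would not even compile as a regex
        System.out.println(Arrays.toString(splitUnescaped("x*y\\*z", "\\", "*")));
    }
}
```

As with the answer above, the escape characters stay in the resulting fields; removing them is a separate step.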

I know this is an old thread, but the lookbehind solution has an issue: it doesn't allow escaping of the escape character (the split would not occur after the escaped backslash in A|B|C|The Steading\\|Keir Allan\|Braco|E).
The positive matching solution in the thread "Regex and escaped and unescaped delimiter" works better (with a modification using Pattern.quote() if the delimiter is dynamic).
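As an alternative to a regex, the same rule (an escape character makes the next character literal, including another escape character) can be sketched with a plain character loop. This is not the pattern from the linked thread but a swapped-in hand-written scanner; the names EscapedSplit and split are hypothetical, and the sketch assumes the escape and delimiter characters differ. Unlike String.split(), it also removes the escape characters from the result.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, assuming escape != delimiter.
class EscapedSplit {

    static List<String> split(String input, char escape, char delimiter) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == escape && i + 1 < input.length()) {
                // Escape: take the next character literally and skip it
                current.append(input.charAt(++i));
            } else if (c == delimiter) {
                // Unescaped delimiter: close the current field
                parts.add(current.toString());
                current.setLength(0);
            } else {
                // Ordinary character (or a lone escape at the very end)
                current.append(c);
            }
        }
        parts.add(current.toString());
        return parts;
    }

    public static void main(String[] args) {
        // prints [A, B, C, The Steading|Keir Allan|Braco, E]
        System.out.println(split("A|B|C|The Steading\\|Keir Allan\\|Braco|E", '\\', '|'));
        // an escaped escape followed by a real delimiter: prints [A\, B]
        System.out.println(split("A\\\\|B", '\\', '|'));
    }
}
```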

private static void splitString(String str, char escapeCharacter, char delimiter, Consumer<String> resultConsumer) {
final StringBuilder sb = new StringBuilder();
boolean isEscaped = false;
for (int i = 0; i < str.length(); i++) {
char c = str.charAt(i);
if (c == escapeCharacter) {
isEscaped = ! isEscaped;
sb.append(c);
} else if (c == delimiter) {
if (isEscaped) {
sb.append(c);
isEscaped = false;
} else {
resultConsumer.accept(sb.toString());
sb.setLength(0);
}
} else {
isEscaped = false;
sb.append(c);
}
}
resultConsumer.accept(sb.toString());
}

Related

Replace special characters (non ASCII) in String by corresponding unicode [duplicate]

I have strings "A função", "Ãugent" in which I need to replace characters like ç, ã, and à with empty strings.
How can I remove those non-ASCII characters from my string?
I have attempted to implement this using the following function, but it is not working properly. One problem is that the unwanted characters are getting replaced by the space character.
public static String matchAndReplaceNonEnglishChar(String tmpsrcdta) {
String newsrcdta = null;
char array[] = Arrays.stringToCharArray(tmpsrcdta);
if (array == null)
return newsrcdta;
for (int i = 0; i < array.length; i++) {
int nVal = (int) array[i];
boolean bISO =
// Is character ISO control
Character.isISOControl(array[i]);
boolean bIgnorable =
// Is Ignorable identifier
Character.isIdentifierIgnorable(array[i]);
// Remove tab and other unwanted characters..
if (nVal == 9 || bISO || bIgnorable)
array[i] = ' ';
else if (nVal > 255)
array[i] = ' ';
}
newsrcdta = Arrays.charArrayToString(array);
return newsrcdta;
}
This will search and replace all non ASCII letters:
String resultString = subjectString.replaceAll("[^\\x00-\\x7F]", "");
FailedDev's answer is good, but can be improved. If you want to preserve the ascii equivalents, you need to normalize first:
String subjectString = "öäü";
subjectString = Normalizer.normalize(subjectString, Normalizer.Form.NFD);
String resultString = subjectString.replaceAll("[^\\x00-\\x7F]", "");
=> will produce "oau"
That way, characters like "öäü" will be mapped to "oau", which at least preserves some information. Without normalization, the resulting String will be blank.
This would be the Unicode solution
String s = "A função, Ãugent";
String r = s.replaceAll("\\P{InBasic_Latin}", "");
\p{InBasic_Latin} is the Unicode block that contains all characters in the Unicode range U+0000..U+007F (see regular-expression.info)
\P{InBasic_Latin} is the negated \p{InBasic_Latin}
You can try something like this. The accented letters start at code 192, so you can drop every character at or above that value from the result.
String name = "A função";
StringBuilder result = new StringBuilder();
for(char val : name.toCharArray()) {
if(val < 192) result.append(val);
}
System.out.println("Result "+result.toString());
[Updated solution]
You can combine Normalizer (canonical decomposition) with replaceAll to strip the diacritical marks and keep the corresponding base characters.
import java.text.Normalizer;
import java.text.Normalizer.Form;
import java.util.regex.Pattern;
public final class NormalizeUtils {
public static String normalizeASCII(final String string) {
final String normalize = Normalizer.normalize(string, Form.NFD);
return Pattern.compile("\\p{InCombiningDiacriticalMarks}+")
.matcher(normalize)
.replaceAll("");
} ...
Or you can use the function below to remove non-ASCII characters from the string.
It also shows how the filtering works internally.
private static String removeNonASCIIChar(String str) {
StringBuffer buff = new StringBuffer();
char chars[] = str.toCharArray();
for (int i = 0; i < chars.length; i++) {
if (0 < chars[i] && chars[i] < 127) {
buff.append(chars[i]);
}
}
return buff.toString();
}
The ASCII table contains 128 codes, with a total of 95 printable characters, of which only 52 characters are letters:
[0-127] ASCII codes
[32-126] printable characters
[48-57] digits [0-9]
[65-90] uppercase letters [A-Z]
[97-122] lowercase letters [a-z]
You can use String.codePoints method to get a stream over int values of characters of this string and filter out non-ASCII characters:
String str1 = "A função, Ãugent";
String str2 = str1.codePoints()
.filter(ch -> ch < 128)
.mapToObj(Character::toString)
.collect(Collectors.joining());
System.out.println(str2); // A funo, ugent
Or you can explicitly specify character ranges. For example filter out everything except letters:
String str3 = str1.codePoints()
.filter(ch -> ch >= 'A' && ch <= 'Z'
|| ch >= 'a' && ch <= 'z')
.mapToObj(Character::toString)
.collect(Collectors.joining());
System.out.println(str3); // Afunougent
See also: How do I not take Special Characters in my Password Validation (without Regex)?
String s = "A função";
String stripped = s.replaceAll("\\P{ASCII}", "");
System.out.println(stripped); // Prints "A funo"
or
private static final Pattern NON_ASCII_PATTERN = Pattern.compile("\\P{ASCII}");
public static String matchAndReplaceNonEnglishChar(String tmpsrcdta) {
return NON_ASCII_PATTERN.matcher(tmpsrcdta).replaceAll("");
}
public static void main(String[] args) {
System.out.println(matchAndReplaceNonEnglishChar("A função")); // Prints "A funo"
}
Explanation
The method String.replaceAll(String regex, String replacement) replaces all instances of a given regular expression (regex) with a given replacement string.
Replaces each substring of this string that matches the given regular expression with the given replacement.
Java has the "\p{ASCII}" regular expression construct which matches any ASCII character, and its inverse, "\P{ASCII}", which matches any non-ASCII character. The matched characters can then be replaced with the empty string, effectively removing them from the resulting string.
String s = "A função";
String stripped = s.replaceAll("\\P{ASCII}", "");
System.out.println(stripped); // Prints "A funo"
The full list of valid regex constructs is documented in the Pattern class.
Note: If you are going to be calling this pattern multiple times within a run, it will be more efficient to use a compiled Pattern directly, rather than String.replaceAll. This way the pattern is compiled only once and reused, rather than each time replaceAll is called:
public class AsciiStripper {
private static final Pattern NON_ASCII_PATTERN = Pattern.compile("\\P{ASCII}");
public static String stripNonAscii(String s) {
return NON_ASCII_PATTERN.matcher(s).replaceAll("");
}
}
An easily-readable, ascii-printable, streams solution:
String result = str.chars()
.filter(c -> isAsciiPrintable((char) c))
.mapToObj(c -> String.valueOf((char) c))
.collect(Collectors.joining());
private static boolean isAsciiPrintable(char ch) {
return ch >= 32 && ch < 127;
}
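To replace the filtered characters with "_" instead of removing them, use .map(c -> isAsciiPrintable((char) c) ? c : '_') in place of the filter step.
Keeping the characters 32 to 126 is equivalent to removing everything that matches the regex [^\\x20-\\x7E] (from a comment on the regex solution).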
Source for isAsciiPrintable: http://www.java2s.com/Code/Java/Data-Type/ChecksifthestringcontainsonlyASCIIprintablecharacters.htm
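For completeness, here is a small self-contained sketch of both variants (removing and replacing with "_"). The class name AsciiPrintableDemo and the method names strip and mask are invented for this example, and the input uses Unicode escapes only to avoid source-encoding issues.

```java
import java.util.stream.Collectors;

// Hypothetical demo class wrapping the stream solution above.
class AsciiPrintableDemo {

    static boolean isAsciiPrintable(char ch) {
        return ch >= 32 && ch < 127;
    }

    // Drops every character outside the printable ASCII range
    static String strip(String s) {
        return s.chars()
                .filter(c -> isAsciiPrintable((char) c))
                .mapToObj(c -> String.valueOf((char) c))
                .collect(Collectors.joining());
    }

    // Replaces every character outside the printable ASCII range with '_'
    static String mask(String s) {
        return s.chars()
                .map(c -> isAsciiPrintable((char) c) ? c : '_')
                .mapToObj(c -> String.valueOf((char) c))
                .collect(Collectors.joining());
    }

    public static void main(String[] args) {
        String input = "A fun\u00e7\u00e3o"; // "A função" with precomposed characters
        System.out.println(strip(input)); // A funo
        System.out.println(mask(input));  // A fun__o
        // Same result as strip(), using the regex mentioned above
        System.out.println(input.replaceAll("[^\\x20-\\x7E]", ""));
    }
}
```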
CharMatcher.retainFrom can be used, if you're using the Google Guava library:
String s = "A função";
String stripped = CharMatcher.ascii().retainFrom(s);
System.out.println(stripped); // Prints "A funo"

How can I remove punctuation from input text in Java?

I am trying to get a sentence using input from the user in Java, and I need to make it lowercase and remove all punctuation. Here is my code:
String[] words = instring.split("\\s+");
for (int i = 0; i < words.length; i++) {
words[i] = words[i].toLowerCase();
}
String[] wordsout = new String[50];
Arrays.fill(wordsout,"");
int e = 0;
for (int i = 0; i < words.length; i++) {
if (words[i] != "") {
wordsout[e] = words[e];
wordsout[e] = wordsout[e].replaceAll(" ", "");
e++;
}
}
return wordsout;
I can't seem to find any way to remove all non-letter characters. I have tried using regexes and iterators with no luck. Thanks for any help.
This first removes all non-letter characters, folds to lowercase, then splits the input, doing all the work in a single line:
String[] words = instring.replaceAll("[^a-zA-Z ]", "").toLowerCase().split("\\s+");
Spaces are initially left in the input so the split will still work.
By removing the rubbish characters before splitting, you avoid having to loop through the elements.
You can use following regular expression construct
Punctuation: One of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
inputString.replaceAll("\\p{Punct}", "");
You may try this:-
Scanner scan = new Scanner(System.in);
System.out.println("Type a sentence and press enter.");
String input = scan.nextLine();
String strippedInput = input.replaceAll("\\W", "");
System.out.println("Your string: " + strippedInput);
[^\w] matches a non-word character, so the above regular expression will match and remove all non-word characters.
If you don't want to use RegEx (which seems highly unnecessary given your problem), perhaps you should try something like this:
public String modified(final String input){
final StringBuilder builder = new StringBuilder();
for(final char c : input.toCharArray())
if(Character.isLetterOrDigit(c))
builder.append(Character.isLowerCase(c) ? c : Character.toLowerCase(c));
return builder.toString();
}
It loops through the underlying char[] in the String and only appends the char if it is a letter or digit (filtering out all symbols, which I am assuming is what you are trying to accomplish) and then appends the lower case version of the char.
I don't like to use regex, so here is another simple solution.
public String removePunctuations(String s) {
String res = "";
for (Character c : s.toCharArray()) {
if(Character.isLetterOrDigit(c))
res += c;
}
return res;
}
Note: This will include both Letters and Digits
If your goal is to REMOVE punctuation, then refer to the above. If the goal is to find words, none of the above solutions does that.
INPUT: "This. and:that. with'the-other".
OUTPUT: ["This", "and", "that", "with", "the", "other"]
but what most of these "replaceAll" solutions actually give you is:
OUTPUT: ["This", "andthat", "withtheother"]
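If the goal is to find words, one option is to split on runs of non-letters instead of deleting them. The sketch below assumes that "word" means a run of Unicode letters and that lowercase output is wanted, as the question asks; the names WordSplitter and words are hypothetical.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper: a "word" is assumed to be a run of letters.
class WordSplitter {

    static String[] words(String input) {
        // Lowercase first, then drop leading non-letters so split() yields no empty first element
        String cleaned = input.toLowerCase(Locale.ROOT).replaceAll("^\\P{L}+", "");
        if (cleaned.isEmpty()) {
            return new String[0];
        }
        // Every run of non-letters (punctuation, spaces, digits) acts as a separator
        return cleaned.split("\\P{L}+");
    }

    public static void main(String[] args) {
        // prints [this, and, that, with, the, other]
        System.out.println(Arrays.toString(words("This. and:that. with'the-other")));
    }
}
```

Note that this also splits contractions such as "don't" into two words; whether that is acceptable depends on the use case.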

How to distinguish two strings in a String?(How to prevent plain text injection)

Say I have two randomly generated Strings.
What can I do to combine the two generated Strings into a single String, while still being able to split it later to get the original two Strings back?
For example, I have "[aweiroj\3aoierjvg0_3409" and " 4093 w_/e9 ". How can I attach those two words into one variable while being able to split them to original two Strings?
My problem is, I can't seem to find a regex for .split() because those two strings can contain any characters (letters, digits, \, /, spaces...).
EDIT
I just thought of a real-life case where this could be used. Sometimes, sending plain text over the network (HTTP) is better than XML or JSON. Slow server with fast broadband: use XML or JSON; fast server with slow broadband: use plain text. The answers below could prevent plain text injection. However, these methods are not benchmarked or tested, so I would test them before actually using them.
The short answer is: Don't do that. Use an array, or a class with two data members, but combining the strings together into one string is probably a bad idea.
But if you have some truly obscure use case, you can:
Create a sufficiently-unique delimiter, like "<<Jee Seok Yoon's Unique Delimiter>>".
final static String DELIM = "<<Jee Seok Yoon's Unique Delimiter>>";
String a = /*...*/;
String b = /*...*/;
String combined = a + DELIM + b;
int breakAt = combined.indexOf(DELIM);
String a1 = combined.substring(0, breakAt);
String b1 = combined.substring(breakAt + DELIM.length());
Have a simpler delimiter that you escape if present in the string.
Remember the length of the first string and store it in your unified string followed by an "end of length" delimiter.
String a = /*...*/;
String b = /*...*/;
String combined = String.valueOf(a.length()) + "|" + a + b;
int breakAt = combined.indexOf("|");
int len = Integer.parseInt(combined.substring(0, breakAt), 10);
String a1 = combined.substring(breakAt + 1, breakAt + 1 + len);
String b1 = combined.substring(breakAt + 1 + len);
(Both code examples are completely off-the-cuff and untested.)
I would create a class that holds both Strings and is able to print them separately and combined.
This one simply extends ArrayList so you don't need to reimplement add, get and so on:
public class ConcatedString extends ArrayList<String>
{
public String concated() {
StringBuilder b = new StringBuilder();
for (String string : this)
{
b.append(string);
}
return b.toString();
}
}
If this is a matter of serialization of some (obscure) kind, then there is at least one obvious way to do this.
Encode the strings using some encoding (HTML encoding is an easy and readable choice). Pick a character that the encoded strings cannot possibly contain, use that as a separator and concatenate them all.
Then, to retrieve, separate the strings by that character and decode the substrings using your initial method in reverse.
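A minimal sketch of this idea follows. It swaps in Base64 (java.util.Base64) instead of HTML encoding, because the standard Base64 alphabet never contains ':', which makes ':' a safe separator. The names EncodedJoin, join and split are hypothetical, and the strings are assumed to be valid Unicode text.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: Base64 is a swapped-in encoding, not the HTML encoding mentioned above.
class EncodedJoin {

    private static final char SEPARATOR = ':'; // never appears in Base64 output

    static String join(String a, String b) {
        return encode(a) + SEPARATOR + encode(b);
    }

    static String[] split(String combined) {
        int at = combined.indexOf(SEPARATOR);
        return new String[] {
            decode(combined.substring(0, at)),
            decode(combined.substring(at + 1))
        };
    }

    private static String encode(String s) {
        return Base64.getEncoder().encodeToString(s.getBytes(StandardCharsets.UTF_8));
    }

    private static String decode(String s) {
        return new String(Base64.getDecoder().decode(s), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String combined = join("[aweiroj\\3aoierjvg0_3409", " 4093 w_/e9 ");
        String[] parts = split(combined);
        System.out.println(parts[0]);
        System.out.println("'" + parts[1] + "'");
    }
}
```

The trade-off is size: Base64 makes the payload roughly a third larger, which matters if bandwidth is the reason for avoiding XML or JSON.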
If you want it to work in every case, you need to define two special characters:
A delimiter character
An escape character
1. Encoding: when you concatenate the two Strings:
In both Strings,
replace every escape character with two escape characters
replace every delimiter character with escape + delimiter
then concatenate both Strings with the delimiter character between them.
2. Decoding: when you decode the String:
If the current character is an escape character and the next one is also an escape character, replace the pair with a single escape character and skip one character.
If the current character is an escape character and the next one is a delimiter character, replace the pair with a single delimiter character and skip one character.
If the current character is an unescaped delimiter character, then you are at the boundary between the two original Strings.
Here is a working example :
// Deliberately bad choices for the escape/delimiter characters, to stress the encoding
private static final char DELIMITER = '1';
private static final char ESCAPE = '2';
public static String encode(String s1, String s2){
StringBuilder sb = new StringBuilder();
subEncode(s1, sb);
sb.append(DELIMITER);
subEncode(s2, sb);
return sb.toString();
}
private static void subEncode(String s, StringBuilder sb) {
for(char c : s.toCharArray()) {
if(c == ESCAPE) {
sb.append(ESCAPE);
sb.append(ESCAPE);
}else if(c == DELIMITER) {
sb.append(ESCAPE);
sb.append(DELIMITER);
}else {
sb.append(c);
}
}
}
public static String[] decode(String encoded) {
StringBuilder sb1 = new StringBuilder();
StringBuilder sb2 = new StringBuilder();
StringBuilder currentSb = sb1;
char[] chars = encoded.toCharArray();
for(int i = 0; i< chars.length ; i++) {
if(chars[i] == ESCAPE) {
if(chars.length < i+2) {
throw new IllegalArgumentException("Malformed encoded String");
}
if(chars[i+1] == ESCAPE) {
currentSb.append(ESCAPE);
}else if(chars[i+1] == DELIMITER) {
currentSb.append(DELIMITER);
}
i++;
}else if(chars[i] == DELIMITER) {
currentSb=sb2;
}else {
currentSb.append(chars[i]);
}
}
return new String[]{sb1.toString(), sb2.toString()};
}
Test :
public static void main(String[] args) {
//Nominal case :
{
String s1 = "aaa";
String s2 = "bbb";
System.out.println("Encoded : " + encode(s1, s2));
System.out.println("Decoded" + Arrays.asList(decode(encode(s1,s2))));
}
//with bad characters :
{
String s1 = "111";
String s2 = "222";
System.out.println("Encoded : " + encode(s1, s2));
System.out.println("Decoded" + Arrays.asList(decode(encode(s1,s2))));
}
//with random characters :
{
String s1 = "a11a1";
String s2 = "1112bb22";
System.out.println("Encoded : " + encode(s1, s2));
System.out.println("Decoded" + Arrays.asList(decode(encode(s1,s2))));
}
}
Output :
Encoded : aaa1bbb
Decoded[aaa, bbb]
Encoded : 2121211222222
Decoded[111, 222]
Encoded : a2121a21121212122bb2222
Decoded[a11a1, 1112bb22]
Another way to do this, format the encoded String using the following format :
size_of_str_1:str1|size_of_str2:str2
Example : if string1 is 'aa' and string2 is 'bbbb', the encoded String is : '2:aa|4:bbbb'.
You decode it via String#substring(). The "hard" part is to parse the string until you have finished reading the size of the next String.
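The length-prefixed format can be sketched as follows. The class and method names (LengthPrefixed, encode, decode) are made up for this example, and the sketch assumes well-formed input produced by encode. Because the decoder reads exactly as many characters as the prefix says, the payload may itself contain ':' or '|'.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for the "size:string|size:string" format described above.
class LengthPrefixed {

    static String encode(String... parts) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                sb.append('|');
            }
            sb.append(parts[i].length()).append(':').append(parts[i]);
        }
        return sb.toString();
    }

    static List<String> decode(String encoded) {
        List<String> result = new ArrayList<>();
        int pos = 0;
        while (pos < encoded.length()) {
            int colon = encoded.indexOf(':', pos);          // end of the size field
            int len = Integer.parseInt(encoded.substring(pos, colon));
            int start = colon + 1;
            result.add(encoded.substring(start, start + len));
            pos = start + len + 1;                          // skip the '|' separator
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(encode("aa", "bbbb"));           // 2:aa|4:bbbb
        System.out.println(decode("2:aa|4:bbbb"));          // [aa, bbbb]
        System.out.println(decode(encode("a|b:", "3:x")));  // [a|b:, 3:x]
    }
}
```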

Regex (java) help

How do I split this comma+quote delimited String into a set of strings:
String test = "[\"String 1\",\"String, two\"]";
String[] embeddedStrings = test.split("<insert magic regex here>");
//note: It should also work for this string, with a space after the separating comma: "[\"String 1\", \"String, two\"]";
assertEquals("String 1", embeddedStrings[0]);
assertEquals("String, two", embeddedStrings[1]);
I'm fine with trimming the square brackets as a first step. But the catch is, even if I do that, I can't just split on a comma because embedded strings can have commas in them.
Using Apache StringUtils is also acceptable.
You could also use one of the many open source small libraries for parsing CSVs, e.g. opencsv or Commons CSV.
If you can remove [\" from the start of the outer string and \"] from the end of it
to become:
String test = "String 1\",\"String, two";
You can use:
test.split("\",\"");
This is extremely fragile and should be avoided, but you could match the string literals.
Pattern p = Pattern.compile("\"((?:[^\"]+|\\\\\")*)\"");
String test = "[\"String 1\",\"String, two\"]";
Matcher m = p.matcher(test);
ArrayList<String> embeddedStrings = new ArrayList<String>();
while (m.find()) {
embeddedStrings.add(m.group(1));
}
The regular expression assumes that double quotes in the input are escaped using \" and not "". The pattern would break if the input had an odd number of (unescaped) double quotes.
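A slightly different pattern, sketched below, excludes the backslash from the "ordinary character" class so that an escaped quote (\") inside a string is consumed as a pair. This is an assumed variation on the pattern above, not the original answer's regex; the names QuotedStringExtractor and extract are hypothetical, and the extracted strings still contain their escape sequences.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: matches "..." literals, allowing backslash escapes inside.
class QuotedStringExtractor {

    // Regex (unescaped): "((?:[^"\\]|\\.)*)"
    private static final Pattern QUOTED =
            Pattern.compile("\"((?:[^\"\\\\]|\\\\.)*)\"");

    static List<String> extract(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = QUOTED.matcher(input);
        while (m.find()) {
            result.add(m.group(1)); // content between the quotes, escapes left as-is
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [String 1, String, two]
        System.out.println(extract("[\"String 1\", \"String, two\"]"));
    }
}
```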
Brute-force method. This assumes that the brackets are already removed; the resulting elements still include their surrounding quotes.
boolean inQuote = false;
List<String> strings = new ArrayList<>();
int currStart = 0;
for (int i = 0; i < test.length(); i++) {
    char c = test.charAt(i);
    if (c == ',' && !inQuote) {
        // A comma outside quotes ends the current element
        strings.add(test.substring(currStart, i));
        currStart = i + 1; // skip the comma itself
    } else if (c == ' ' && !inQuote && currStart == i) {
        currStart = i + 1; // strip off spaces after a comma
    } else if (c == '"') {
        inQuote = !inQuote; // toggle on every quote character
    }
}
// Add the last element, which has no trailing comma
strings.add(test.substring(currStart));
String[] embeddedStrings = strings.toArray(new String[0]);
