StringTokenizer vs. String.split? - java

Someone just asked a question on String.split() (String split comma and parenthesis - JAVA), and the solution was to use StringTokenizer. Why doesn't String.split() split on parentheses?
import java.util.StringTokenizer;

public class SplitQuestion {
    public static void main(String[] args) {
        String a = "(id,created,employee(id,firstname," +
                "employeeType(id), lastname),location)";

        // Each character of "(), " is treated as a separate delimiter
        StringTokenizer tok = new StringTokenizer(a, "(), ");
        System.out.println("StringTokenizer example");
        while (tok.hasMoreTokens()) {
            System.out.println(tok.nextToken());
        }

        // split() takes a regular expression, not a set of characters
        System.out.println("Split example");
        String[] array = a.split("(),");
        for (String ii : array) {
            System.out.println(ii);
        }
    }
}
Outputs:
StringTokenizer example
id
created
employee
id
firstname
employeeType
id
lastname
location
Split example
(id
created
employee(id
firstname
employeeType(id)
lastname)
location)
There was a discussion of String.split() vs. StringTokenizer at "Scanner vs. StringTokenizer vs. String.Split", but it doesn't explain the parentheses. Is this by design? What's going on here?

If you want split to split on the characters '(', ')', ',', and ' ', you need to pass a regex that matches any of those. The easiest is to use a character class:
String[] array = a.split("[(), ]");
Normally, parentheses in a regex are a grouping operator and would have to be escaped if you intended them to be used as literals. However, inside the character class delimiters, the parenthesis characters do not have to be escaped.
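One difference worth knowing: unlike StringTokenizer, split() returns an empty string between two adjacent delimiters and a leading empty string when the input starts with a delimiter. On the question's input, "[(), ]" therefore yields empty entries (for example between ")" and "," in "employeeType(id), lastname"). A minimal sketch, assuming you want exactly the tokens StringTokenizer produces: add a + quantifier so runs of delimiters count as one, then drop the leading empty element. The class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;
import java.util.stream.Collectors;

public class SplitVsTokenizer {

    // StringTokenizer: every character of "(), " is a separate delimiter,
    // and empty tokens are never returned.
    static List<String> withTokenizer(String s) {
        List<String> out = new ArrayList<>();
        StringTokenizer tok = new StringTokenizer(s, "(), ");
        while (tok.hasMoreTokens()) {
            out.add(tok.nextToken());
        }
        return out;
    }

    // split(): "[(), ]+" matches a run of one or more of the four characters.
    // A leading delimiter still yields an empty first element, so filter it out.
    static List<String> withSplit(String s) {
        return Arrays.stream(s.split("[(), ]+"))
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String a = "(id,created,employee(id,firstname,"
                + "employeeType(id), lastname),location)";
        System.out.println(withTokenizer(a));
        System.out.println(withSplit(a));
        System.out.println(withTokenizer(a).equals(withSplit(a))); // true
    }
}
```

Trailing empty strings need no filtering, because split(String) already discards them.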

StringTokenizer does not support regular expressions. Its second argument, "(), ", is treated as a set of individual delimiter characters, so the StringTokenizer code splits the input whenever it encounters any one of (, ), a comma, or a space.
String.split takes a regular expression, and in a regex parentheses group sub-expressions. "()" is an empty group that matches the empty string, so the pattern "()," matches exactly what "," matches: only the commas act as delimiters.
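To see this directly, a small sketch (the class name is illustrative) that inspects what "()," matches and compares the two splits. It also shows Pattern.quote, which turns the three characters into a literal sequence instead of a group followed by a comma.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmptyGroupDemo {
    public static void main(String[] args) {
        String a = "(id,created,employee(id,firstname,"
                + "employeeType(id), lastname),location)";

        // "()" is an empty capturing group: it matches the empty string,
        // so "()," matches exactly what "," matches.
        Matcher m = Pattern.compile("(),").matcher(a);
        if (m.find()) {
            System.out.println("groupCount = " + m.groupCount());  // 1
            System.out.println("group(1) = '" + m.group(1) + "'"); // ''
            System.out.println("match = '" + m.group() + "'");     // ','
        }

        // Therefore the two splits produce identical arrays.
        System.out.println(Arrays.equals(a.split("(),"), a.split(","))); // true

        // Quoted, the pattern means the literal text "()," which never
        // occurs in the input, so nothing is split.
        System.out.println(a.split(Pattern.quote("(),")).length); // 1
    }
}
```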

Related

How can I avoid splitting on a comma in brackets?

I have a string below which I want to split in String array with multiple delimiters.
The delimiters are comma (,), semicolon (;), "OR" and "AND".
But I do not want to split on a comma if it's in brackets.
Example input:
device_name==device503,device_type!=GATEWAY;site_name<site3434 OR country==India AND location==BLR; new_name=in=(Rajesh,Suresh)
I am able to split the String with regex, but it doesn't handle commas in brackets correctly.
How can I fix this?
Pattern ptn = Pattern.compile("(,|;|OR|AND)");
String[] parts = ptn.split(query);
for(String p:parts){
System.out.println(p);
queryParams.add(p.trim());
}
You could use a negative look-ahead:
String[] parts = input.split(",(?![^()]*\\))|;| OR | AND ");
Or an uglier (but perhaps conceptually simpler) way you could do it would be to replace any commas within brackets with a temporary placeholder, then do the split and replace the placeholders with real commas in the results.
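import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlaceholderSplit {
    public static void main(String[] args) {
        String input = "X,Y=((A,B),C) OR Z";
        // Greedy match from the first '(' to the last ')'
        Pattern pattern = Pattern.compile("\\(.*\\)");
        Matcher matcher = pattern.matcher(input);
        StringBuffer sb = new StringBuffer();
        while (matcher.find()) {
            // Hide the commas inside the brackets behind a placeholder
            String masked = matcher.group().replaceAll(",", "_COMMA_");
            matcher.appendReplacement(sb, Matcher.quoteReplacement(masked));
        }
        matcher.appendTail(sb);
        String[] parts = sb.toString().split("(,|;| OR | AND )");
        for (String part : parts) {
            // Restore the real commas
            System.out.println(part.replace("_COMMA_", ","));
        }
    }
}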
Prints:
X
Y=((A,B),C)
Z
Alternatively, you could write your own little tokenizer that reads the input character-by-character using charAt(index) or define a grammar for an off-the-shelf parser.
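The hand-written tokenizer mentioned above could look like the following minimal sketch. It is an illustration rather than code from the answer: the class and method names are made up, it assumes brackets are balanced, and it treats " OR " and " AND " (with surrounding spaces) as delimiters. Because it counts bracket depth with charAt, it also copes with nested brackets, which a single look-ahead regex cannot.

```java
import java.util.ArrayList;
import java.util.List;

public class BracketAwareSplitter {

    // Word delimiters, matched only outside brackets (assumed format).
    private static final String[] WORD_DELIMS = {" OR ", " AND "};

    // Splits on ',', ';', " OR " and " AND " only when bracket depth is 0.
    static List<String> split(String input) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0;
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (depth == 0) {
                String word = wordDelimAt(input, i);
                if (word != null) {
                    parts.add(current.toString().trim());
                    current.setLength(0);
                    i += word.length();
                    continue;
                }
                if (c == ',' || c == ';') {
                    parts.add(current.toString().trim());
                    current.setLength(0);
                    i++;
                    continue;
                }
            }
            // Track nesting so commas inside (...) are kept
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
            }
            current.append(c);
            i++;
        }
        parts.add(current.toString().trim());
        return parts;
    }

    // Returns the word delimiter starting at index i, or null if none does.
    private static String wordDelimAt(String s, int i) {
        for (String w : WORD_DELIMS) {
            if (s.startsWith(w, i)) {
                return w;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(split("X,Y=((A,B),C) OR Z")); // [X, Y=((A,B),C), Z]
        System.out.println(split("device_name==device503,device_type!=GATEWAY;"
                + "site_name<site3434 OR country==India AND location==BLR;"
                + " new_name=in=(Rajesh,Suresh)"));
    }
}
```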
You can use negative look-ahead (?!...), which looks at the following characters, and if those characters match the pattern in brackets, the overall match will fail.
String query = "device_name==device503,device_type!=GATEWAY;site_name<site3434 OR country==India AND location==BLR; new_name=in=(Rajesh,Suresh)";
String[] parts = query.split("\\s*(,(?![^()]*\\))|;|OR|AND)\\s*");
for(String part: parts)
System.out.println(part);
Output:
device_name==device503
device_type!=GATEWAY
site_name<site3434
country==India
location==BLR
new_name=in=(Rajesh,Suresh)
So in this case we check whether the characters following the , are 0 or more characters which aren't either ( or ), followed by a ), and if this is true, the , match fails.
This won't work if you can have nested brackets.
Note:
String also has a split method (as used above), which is useful for simplicity's sake, but it compiles the regex on every call, so it can be slower than reusing the same compiled Pattern for many Strings.
You can add \\s* (0 or more whitespace characters) to your regex to remove any spaces before or after a delimiter.
If you're using | without anything before or after (e.g. "a|b|c"), you don't need to put it in brackets.
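Following the notes above, a minimal sketch (class and method names are illustrative) that compiles the look-ahead regex once in a static field and reuses it for every query. It also shows why the alternation keeps its brackets once \s* is added: without the group, \s* would attach only to the first and last alternatives. The \b word boundaries around OR and AND are an assumed tweak not in the answer, so that words such as ORDER or BRAND are not split.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class QuerySplitter {

    // Compiled once and reused. The alternation stays in a group so the
    // surrounding \s* applies to every delimiter; \b keeps OR/AND whole words.
    private static final Pattern DELIMS =
            Pattern.compile("\\s*(,(?![^()]*\\))|;|\\bOR\\b|\\bAND\\b)\\s*");

    static List<String> parse(String query) {
        return Arrays.asList(DELIMS.split(query));
    }

    public static void main(String[] args) {
        String[] queries = {
            "device_name==device503,device_type!=GATEWAY;site_name<site3434"
                    + " OR country==India AND location==BLR; new_name=in=(Rajesh,Suresh)",
            "ORDER=1 OR BRAND=x"
        };
        for (String q : queries) {
            System.out.println(parse(q));
        }
    }
}
```

As with the original regex, this still fails on nested brackets.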

String.split not working with combination of delimiters {^

I am trying to split a string on the two-character combination {^.
How can I use a combination of characters as the delimiter when splitting the string?
The sample data is:
String str = "0002{^000000000000001157{^000006206210015461{^PR{^ID{^62499{^";
The delimiter passed to String.split() is a regex. As { and ^ are characters with special meaning within a regex, you need to escape them if you want to use them as literals:
String[] tokens = str.split("\\{\\^");
The split method in Java takes a regex as its input,
so if you want to split the string on the sequence {^, you need to do the following:
String str = "0002{^000000000000001157{^000006206210015461{^PR{^ID{^62499{^";
String[] splitted = str.split("\\{\\^"); //note \\ before { and ^
You have to escape { and ^ in your split statement, because both are special characters in regex:
s.split("\\{\\^");
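If you would rather not work out which characters need escaping, Pattern.quote (in java.util.regex) turns any literal string into a regex that matches it exactly. A small sketch (the class name is illustrative) comparing it with the hand-escaped version; note that the trailing {^ does not produce an empty last element, because split(String) discards trailing empty strings.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class QuoteDelimiter {
    public static void main(String[] args) {
        String str = "0002{^000000000000001157{^000006206210015461{^PR{^ID{^62499{^";

        // Pattern.quote wraps the text in \Q...\E, so every character is literal.
        String[] quoted = str.split(Pattern.quote("{^"));
        // Equivalent hand-escaped regex.
        String[] escaped = str.split("\\{\\^");

        System.out.println(Arrays.toString(quoted));
        // [0002, 000000000000001157, 000006206210015461, PR, ID, 62499]
        System.out.println(Arrays.equals(quoted, escaped)); // true
    }
}
```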

split string based on comma delimiter

What is wrong with the following code?
String selectedCountriesStr = countries.replaceAll("[", "").replaceAll("]", "").trim();
String[] selectedCountriesArr = selectedCountriesStr.split(",");
Input string: [10000,20000,304050,766666]
I get the error java.util.regex.PatternSyntaxException: Unclosed character class near index 0
You have to escape square brackets because replaceAll() interprets the first argument as a regular expression:
replaceAll("\\[", "")
The backslashes are needed because, as the error message tells you, square brackets are used for character classes in a regex. They must be doubled because "\[" would be an invalid escape sequence in a Java string literal; since the backslash is itself escaped, the regex engine receives only one backslash.
Alternatively, you can use
replace("[", "")
which also replaces all occurrences, but treats the given CharSequence literally rather than as a regex.
You can read more about it in the Javadoc.
Brackets are regex metacharacters, you need to prefix them with a backslash:
.replaceAll("\\[", "").replaceAll("\\]", "")
Also, since this is a simple string substitution, you'd better use .replace():
.replace("[", "").replace("]", "")
The split function will easily split your string, like this:
String str = "hi,hello,abc,example,problems";
String[] splits = str.split(",");
System.out.println("splits.size: " + splits.length);
for(String asset: splits){
System.out.println(asset);
}
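Putting the answers in this thread together, a minimal sketch (class and method names are illustrative) that strips the brackets with the literal replace() and then splits. It guards the empty case, because "".split(",") returns a one-element array containing the empty string, and the "\\s*,\\s*" pattern (an assumption beyond the question) tolerates spaces around the commas.

```java
import java.util.Arrays;

public class SplitBracketedList {

    // Strips the surrounding brackets and splits on commas.
    static String[] toArray(String countries) {
        // replace() is literal, so '[' and ']' need no escaping here.
        String inner = countries.replace("[", "").replace("]", "").trim();
        // "".split(",") would return [""] (one empty element), so guard it.
        if (inner.isEmpty()) {
            return new String[0];
        }
        // "\\s*,\\s*" also drops any spaces around the commas.
        return inner.split("\\s*,\\s*");
    }

    public static void main(String[] args) {
        String[] arr = toArray("[10000,20000,304050,766666]");
        System.out.println(arr.length);             // 4
        System.out.println(Arrays.toString(arr));   // [10000, 20000, 304050, 766666]
        System.out.println(toArray("[]").length);   // 0
    }
}
```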

Java - Split string

I have a string that is separated by "." characters. When I try to split it on the dot, it does not get split.
Here is the exact code I have. Please let me know what could be causing this.
public class TestStringSplit {
public static void main(String[] args) {
String testStr = "[Lcom.hexgen.ro.request.CreateRequisitionRO;";
String test[] = testStr.split(".");
for (String string : test) {
System.out.println("test : " + string);
}
System.out.println("Str Length : " + test.length);
}
}
I need to split the above string and get only the last part. In the above case that is CreateRequisitionRO (without the trailing ;). Please help me get this.
You can split this string with StringTokenizer and get each word between the dots:
StringTokenizer tokenizer = new StringTokenizer(string, ".");
String firstToken = tokenizer.nextToken();
String secondToken = tokenizer.nextToken();
Since you are looking for the last word, CreateRequisitionRO, you can also use:
String testStr = "[Lcom.hexgen.ro.request.CreateRequisitionRO;";
String yourString = testStr.substring(testStr.lastIndexOf('.')+1, testStr.length()-1);
String testStr = "[Lcom.hexgen.ro.request.CreateRequisitionRO;";
String test[] = testStr.split("\\.");
for (String string : test) {
System.out.println("test : " + string);
}
System.out.println("Str Length : " + test.length);
The "." is a regular-expression wildcard, so you need to escape it.
Change String test[] = testStr.split("."); to String test[] = testStr.split("\\.");.
Since String.split takes a regex argument, you need to escape the dot character (which is a wildcard in regex).
Note that String.split takes in a regular expression, and . has special meaning in regular expressions (it matches any character except a line terminator), so you need to escape it:
String test[] = testStr.split("\\.");
Note that you escape the . at the level of regular expression once: \., and to specify \. in a string literal, \ needs to be escaped again. So the string to pass to String.split is "\\.".
Another way is to put it inside a character class, where . loses its special meaning:
String test[] = testStr.split("[.]");
You need to escape the . because it is a special character; a full list of these is available. Your split line needs to be:
String test[] = testStr.split("\\.");
Split takes a regular expression as a parameter. If you want to split by the literal ".", you need to escape the dot because that is a special character in a regular expression. Try putting 2 backslashes before your dot ("\\.") - hopefully that does what you are looking for.
String test[] = testStr.split("\\.");
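This also explains why the original loop printed nothing: with split("."), every character matches, all the pieces between matches are empty, and split() drops trailing empty strings, so the array has length 0. A minimal sketch (class and method names are illustrative) that uses the escaped split to get the last part and strips the trailing ; as the question asks:

```java
import java.util.Arrays;

public class LastSegment {

    // Returns the text after the last '.', minus a trailing ';' if present.
    static String lastSegment(String s) {
        String[] parts = s.split("\\.");   // or s.split("[.]")
        if (parts.length == 0) {
            return "";                    // e.g. input consisting only of dots
        }
        String last = parts[parts.length - 1];
        return last.endsWith(";") ? last.substring(0, last.length() - 1) : last;
    }

    public static void main(String[] args) {
        String testStr = "[Lcom.hexgen.ro.request.CreateRequisitionRO;";

        // Unescaped: every piece is empty and all of them are dropped.
        System.out.println(testStr.split(".").length);  // 0

        System.out.println(Arrays.toString(testStr.split("\\.")));
        // [[Lcom, hexgen, ro, request, CreateRequisitionRO;]
        System.out.println(lastSegment(testStr));      // CreateRequisitionRO
    }
}
```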

String Tokenizing in java

I need to tokenize a string using a delimiter.
StringTokenizer is capable of tokenizing the string with a given delimiter. However, when there are two consecutive delimiters in the string, it does not return an empty token between them.
The second parameter to the constructor of StringTokenizer object is just a string containing all delimiters that you require.
StringTokenizer st = new StringTokenizer(str, "#!");
In this case there are two delimiters, # and !.
Consider this example :
String s = "Hello, i am using Stack Overflow;";
System.out.println("s = " + s);
String delims = " ,;";
StringTokenizer tokens = new StringTokenizer(s, delims);
while(tokens.hasMoreTokens())
System.out.println(tokens.nextToken());
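Here the output (after the "s = ..." line) is the following; with 3 delimiters, the delimiters themselves are not returned:
Hello
i
am
using
Stack
Overflow
To get the delimiters back as tokens as well (spaces included), use the three-argument constructor new StringTokenizer(s, delims, true).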
Look into String.split()
This should do what you are looking for.
http://docs.oracle.com/javase/6/docs/api/java/lang/String.html
Use the split() method of java.lang.String and pass it a regular expression which matches your one or more delimiter condition.
For example, "a||b|||c||||d" could be tokenised with split("\\|{2,}"), giving the array [a, b, c, d].
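If the goal in the question is to get an empty token between two consecutive delimiters, that is exactly where split() differs from StringTokenizer. A minimal sketch (class and method names are illustrative) comparing the two on a comma-separated string with an empty field, plus the negative limit argument that also keeps trailing empty fields:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

public class EmptyTokens {

    static List<String> tokenizer(String s, String delims) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(s, delims);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "a,,b,";

        // StringTokenizer silently skips the empty field between ",,".
        System.out.println(tokenizer(s, ","));                  // [a, b]
        // split(",") keeps inner empty strings but drops trailing ones.
        System.out.println(Arrays.toString(s.split(",")));     // [a, , b]
        // A negative limit keeps trailing empty strings too.
        System.out.println(Arrays.toString(s.split(",", -1))); // [a, , b, ]
        // A run of two or more pipes treated as one delimiter.
        System.out.println(Arrays.toString("a||b|||c||||d".split("\\|{2,}"))); // [a, b, c, d]
    }
}
```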
