String.split() Not Acting on Semicolon or Space Delimiters - java

This may be a simple question, but I have been Googling for over an hour and haven't found an answer yet.
I'm trying to simply use the String.split() method with a small Android application to split an input string. The input string will be something along the lines of: "Launch ip:192.168.1.101;port:5900". I'm doing this in two iterations to ensure that all of the required parameters are there. I'm first trying to do a split on spaces and semicolons to get the individual tokens sorted out. Next, I'm trying to split on colons in order to strip off the identification tags of each piece of information.
So, for example, I would expect the first round of split to give me the following data from the above example string:
(1) Launch
(2) ip:192.168.1.101
(3) port:5900
Then the second round would give me the following:
(1) 192.168.1.101
(2) 5900
However, the following code that I wrote doesn't give me what's expected:
private String[] splitString(String inputString)
{
String[] parsedString;
String[] orderedString = new String[SOSLauncherConstants.SOCKET_INPUT_STRING_PARSE_VALUE];
parsedString = inputString.trim().split("; ");
Log.i("info", "The parsed data is as follows for the initially parsed string of size " + parsedString.length + ": ");
for (int i = 0; i < parsedString.length; ++i)
{
Log.i("info", parsedString[i]);
}
for (int i = 0; i < parsedString.length; ++i )
{
if (parsedString[i].toLowerCase().contains(SOSLauncherConstants.PARSED_LAUNCH_COMMAND_VALUE))
{
orderedString[SOSLauncherConstants.PARSED_COMMAND_WORD] = parsedString[i];
}
if (parsedString[i].toLowerCase().contains("ip"))
{
orderedString[SOSLauncherConstants.PARSED_IP_VALUE] = parsedString[i].split(":")[1];
}
else if (parsedString[i].toLowerCase().contains("port"))
{
orderedString[SOSLauncherConstants.PARSED_PORT_VALUE] = parsedString[i].split(":")[1];
}
else if (parsedString[i].toLowerCase().contains("username"))
{
orderedString[SOSLauncherConstants.PARSED_USERNAME_VALUE] = parsedString[i].split(":")[1];
}
else if (parsedString[i].toLowerCase().contains("password"))
{
orderedString[SOSLauncherConstants.PARSED_PASSWORD_VALUE] = parsedString[i].split(":")[1];
}
else if (parsedString[i].toLowerCase().contains("color"))
{
orderedString[SOSLauncherConstants.PARSED_COLOR_VALUE] = parsedString[i].split(":")[1];
}
}
Log.i("info", "The parsed data is as follows for the second parsed string of size " + orderedString.length + ": ");
for (int i = 0; i < orderedString.length; ++i)
{
Log.i("info", orderedString[i]);
}
return orderedString;
}
For a result, I'm getting the following:
The parsed data is as follows for the parsed string of size 1:
launch ip:192.168.1.106;port:5900
The parsed data is as follows for the second parsed string of size 6:
launch ip:192.168.1.106;port:5900
192.168.1.106;port
And then, of course, it crashes because the for loop runs into a null string.
Side Note:
The following snippet is from the constants class that defines all of the string indexes --
public static final int SOCKET_INPUT_STRING_PARSE_VALUE = 6;
public static final int PARSED_COMMAND_WORD = 0;
public static final String PARSED_LAUNCH_COMMAND_VALUE = "launch";
public static final int PARSED_IP_VALUE = 1;
public static final int PARSED_PORT_VALUE = 2;
public static final int PARSED_USERNAME_VALUE = 3;
public static final int PARSED_PASSWORD_VALUE = 4;
public static final int PARSED_COLOR_VALUE = 5;
I looked into needing a possible escape (by inserting a \\ before the semicolon) on the semicolon delimiter, and even tried using it, but that didn't work. The odd part is that neither the space nor the semicolon function as a delimiter, yet the colon works on the second time around. Does anybody have any ideas what would cause this?
Thanks for your time!
EDIT: I should also add that I'm receiving the string over a WiFi socket connection. I don't think this should make a difference, but I'd like you to have all of the information that you need.

String.split(String) takes a regex. Use "[; ]". eg:
"foo;bar baz".split("[; ]")
will return an array containing "foo", "bar" and "baz".
If you need groups of spaces to work as a single delimiter, you can use something like:
"foo;bar baz".split("(;| +)")

String.split() treats its argument as a regular expression and splits wherever the whole pattern matches, not on each character individually. That is, split("; ") would not split "a;b c" at all, but would split "a; b" (a semicolon followed by a space).
You may have better luck with Guava's Splitter, which is meant to be slightly less unpredictable than java.lang.String.split.
I would write something like
Iterable<String> splits = Splitter.on(CharMatcher.anyOf("; ")).split(string);
but Splitter also provides fluent-style customization like "trim results" or "skip over empty strings."

Is there a reason why you are using String.split(), but not using Regular Expressions? This is a perfect candidate for regex'es, esp if the string format is consistent.
I'm not sure if your format is fixed, and if it is, then the following regex should break it down for you (am sure that someone can come up with an even more elegant regex). If you have several command strings that follow, then you can use a more flexible regex and loop over all the groups:
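// input is the received string, e.g. "Launch ip:192.168.1.101;port:5900"
Pattern p = Pattern.compile("(\\w+):([^ ;]+)"); // backslashes must be doubled inside a Java string literal
Matcher m = p.matcher(input); // Pattern provides matcher(), not match()
String command = input.trim().split("[ ;]")[0]; // first token, e.g. "Launch"
while (m.find()) {
    String id = m.group(1); // e.g. "ip"
    String value = m.group(2); // e.g. "192.168.1.101"
}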
A great place to test out regexes online is http://www.regexplanet.com/simple/index.html. It lets you play with the regex without having to compile and launch your app every time if you just want to get the regex correct.
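Putting the answers above together, here is a minimal sketch that uses only the JDK: split on runs of semicolons or spaces with a character class, then split each token at its first colon. The class name LaunchCommandParser and the "command" key are hypothetical names chosen for this illustration, not part of the original code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LaunchCommandParser {
    // Splits on runs of semicolons/spaces, then splits each token at its first colon.
    // Tokens without a colon (such as "Launch") go under the hypothetical key "command".
    static Map<String, String> parse(String input) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String token : input.trim().split("[; ]+")) {
            if (token.isEmpty()) {
                continue; // a leading delimiter produces one empty token
            }
            String[] pair = token.split(":", 2);
            if (pair.length == 2) {
                fields.put(pair[0].toLowerCase(), pair[1]);
            } else {
                fields.put("command", token);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // prints {command=Launch, ip=192.168.1.101, port=5900}
        System.out.println(parse("Launch ip:192.168.1.101;port:5900"));
    }
}
```

Looking values up by key also avoids the null entries that made the original loop crash when a parameter was missing.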


String splitting and wrapping

Suppose I am given this string-
String A_Old[]="010.011.100.000.111";
On every occurrence of "000" I am expected to break the string and wrap the end towards the front of the string. The resultant should look like this-
String A_New[]="111.010.011.100";
Any help on how to tackle this problem would be appreciated.
Also, what do I do when there are multiple occurrences of "000"?
String A_Old[]="010.011.100.000.111.001.011.000.101.110";
should convert to :
String A_New[]="101.110.111.001.011.010.011.100";
Code in java or c++ is understood and appreciated.
EDIT
This was what I thought would work-
String b[]=A_Old.split(".000.",2); //should split the string in 2 parts.
A_New=b[1].concat(b[0]); // concatenation with a loss of a period
I was told to avoid the loss of the period as well as not using dummy variables. Both of which fails here. Any idea on how to tackle this?
To rotate the list, split it and then concatenate the pieces in the order you want, putting back the period that the split removes. You probably also want to check for blanks and for whether you are at the last element.
public static void main(String[] args)
{
    String inString = "010.011.100.000.111";
    String[] arr = inString.split("\\.000\\.");
    // Start with the last piece, then append the earlier pieces in reverse order.
    String outString = arr[arr.length - 1];
    for(int i = arr.length - 2; i >= 0; i--)
    {
        outString = outString + "." + arr[i];
    }
    System.out.println(outString); // 111.010.011.100
}
Using the split method and a reverse for loop in Java:
public static void main(String[] args) {
    String A_Old="010.011.100.000.111.001.011.000.101.110";
    String A_new ="";
    // Split on ".000." so that no piece keeps a stray leading period.
    String newe[]=A_Old.split("\\.000\\.");
    for(int i=newe.length-1;i>=0;i--){
        A_new+=newe[i];
        if(i>0){
            A_new+="."; // restore the period between pieces
        }
    }
    System.out.print(A_new); // 101.110.111.001.011.010.011.100
}
I tried it in java:
public static String convert(String in) {
String[] strs = in.split("\\.*000\\.*");
StringBuilder sb = new StringBuilder();
for (int i = strs.length - 1; i >= 0; --i) {
sb.append(strs[i]);
if (i > 0 && strs[i - 1].length() > 0) {
sb.append(".");
}
}
return sb.toString();
}
public static void main(String[] args) {
System.out.println(convert("010.011.100.000.111"));
System.out.println(convert("010.011.100.000.111.001.011.000.101.110"));
System.out.println(convert("010.011.100.111"));
System.out.println(convert("000.010.011.100.111"));
System.out.println(convert("010.011.100.111.000"));
System.out.println(convert("000.010.011.100.111.000"));
}
Output:
111.010.011.100
101.110.111.001.011.010.011.100
010.011.100.111
010.011.100.111
010.011.100.111
010.011.100.111
"No dummy variables" sounds like a recursive solution is sought. Is this a Scheme class?
Here's a Perl solution. C++11 is sufficiently high-level to provide most of the facilities demonstrated here:
my @tests = (
'010.011.100.000.111',
'010.011.100.000.111.001.011.000.101.110'
);
foreach ( @tests ) {
print "$_ => ", rotate($_), "\n"; # Call our rotate routine, and show results.
}
sub rotate {
# Split the string into components, reverse their order, join them with '.'
return join('.', reverse split /(?:^|\.)000(?:\.|$)/, shift);
}
No dummy variables. In fact, no explicit variables either.
Here's how the split regex works:
(?:^|\.) # Match either start of string, or a dot.
000 # Match three zeros.
(?:\.|$) # Match either a dot, or the end of string.
This permits the 000 to appear anywhere in the string (including beginning or end). If it appears at the beginning, there will not need to be a leading dot. If it appears at the end, a trailing dot will not be required. Anywhere else, dots must surround the 000.
Update:
Here is a recursive solution using Perl. From what I've read of the Java solutions (I'm not a Java guy), it seems like translation should be straightforward for someone with some Java background.
my @tests = (
'010.011.100.000.111',
'010.011.100.000.111.001.011.000.101.110'
);
foreach ( @tests ) {
print "($_) => (", rotate($_), ")\n";
}
sub rotate {
my( $ok, $rest ) = split /(?:^|\.)000(?:\.|$)/, $_[0], 2;
return $ok unless defined $rest;
return join( '.', rotate($rest), $ok);
}
Update2:
And here's another Perl solution that eliminates any explicit variables, using only subroutine parameters:
my @tests = (
'010.011.100.000.111',
'010.011.100.000.111.001.011.000.101.110'
);
foreach ( @tests ) {
print "($_) => (", rotate($_), ")\n";
}
sub rotate {
return sub {
return $_[0] unless defined $_[1];
return join( '.', rotate($_[1]), $_[0]);
}->( split /(?:^|\.)000(?:\.|$)/, shift, 2 );
}
I have no idea how one would translate that to Java. I suspect translation would become a little more difficult, though I'm confident there must exist a semantically equivalent C++11 solution.
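For what it's worth, the first recursive Perl version translates to Java fairly directly. The following is a rough sketch under stated assumptions, not a faithful port: it reuses the same separator regex, the class name RecursiveRotate is made up for this example, and it uses two local variables plus explicit checks for empty pieces, so it does not meet the "no dummy variables" constraint.

```java
import java.util.regex.Pattern;

public class RecursiveRotate {
    // Same separator as the Perl regex: "000" bounded by a dot or by either end of the string.
    private static final Pattern SEPARATOR = Pattern.compile("(?:^|\\.)000(?:\\.|$)");

    static String rotate(String s) {
        String[] parts = SEPARATOR.split(s, 2); // at most two pieces: head and the rest
        if (parts.length < 2) {
            return parts[0]; // no separator left, nothing to rotate
        }
        String head = parts[0];
        String tail = rotate(parts[1]);
        if (head.isEmpty()) {
            return tail; // "000" was at the very start
        }
        if (tail.isEmpty()) {
            return head; // "000" was at the very end
        }
        return tail + "." + head;
    }

    public static void main(String[] args) {
        System.out.println(rotate("010.011.100.000.111"));
        System.out.println(rotate("010.011.100.000.111.001.011.000.101.110"));
    }
}
```

The limit of 2 passed to split plays the same role as the third argument of Perl's split: only the first separator is consumed, and the recursion handles the rest.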

How to remove case sensitivity in this particular situation?

I would like to ask for your help with this code. I am trying to do a kind of encoding of particular words, such as "Microsoft" etc. (random ones, just to learn the technique). I've succeeded in doing everything except making this kind of word search case insensitive. Here is the code:
public class BannedWords {
public static String returnStars(int length){
String stars = "";
String addStar = "*";
for (int i = 1; i<=length; i++){
stars += addStar;
}
return stars;
}
public static void main(String[] args) {
String textString = "Microsoft announced its next generation Java compiler today."
+ " It uses advanced parser and special optimizer for the Microsoft JVM.";
StringBuilder text = new StringBuilder(textString);
String bannedWords = "Java, JVM, Microsoft";
String [] bWordsArr = bannedWords.split("[, ]+");
for(int i = 0; i<bWordsArr.length; i++){
int index = textString.indexOf(bWordsArr[i]);
while(index != -1){
text = text.replace(index, index+bWordsArr[i].length(), returnStars(bWordsArr[i].length()));
index = textString.indexOf(bWordsArr[i], index +1);
}
}
System.out.println(text.toString());
}
}
I need to search for "Java", "JVM" and "Microsoft" words regardless of their case, even if we try "MiCrosoFt" it should work, but after a few hours thinking and trying to do it with using toUpperCase(), toLowerCase(), I couldn't find out how to do that. Do you have any ideas ?
Thank you beforehand ! :)
When using indexOf(), toLowerCase() will convert checked text to lowercase. Then, you must put your search terms in lowercase.
String text = "Java is a good programming language.";
int index = text.toLowerCase().indexOf("java");
You can also use toUpperCase(), simply put your search terms in uppercase.
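If you are comparing whole strings (for example, checking each word of the text against a banned word), use equalsIgnoreCase. It compares two strings irrespective of case, and you don't have to convert the original string to upper or lower case first. It does not search inside a larger string, though, so for indexOf-style searching you still need the lowercase approach above. I hope it helps :)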
yourString.equalsIgnoreCase(anotherString)
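Another option, sketched here as an assumption rather than as what either answer proposes, is to let the regex engine do the case-insensitive search with Pattern.CASE_INSENSITIVE. Pattern.quote keeps the banned word from being interpreted as a regex. The class and method names are invented for this example.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class BannedWordMasker {
    // Replaces every case-insensitive occurrence of each banned word with stars of the same length.
    static String mask(String text, String... bannedWords) {
        String result = text;
        for (String word : bannedWords) {
            char[] stars = new char[word.length()];
            Arrays.fill(stars, '*');
            result = Pattern.compile(Pattern.quote(word), Pattern.CASE_INSENSITIVE)
                    .matcher(result)
                    .replaceAll(new String(stars));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(mask("MiCrosoFt ships a java compiler for the JVM.",
                "Java", "JVM", "Microsoft"));
    }
}
```

Like the indexOf loop in the question, this matches substrings too, so "JavaScript" would become "****Script"; wrapping the quoted word in \b word boundaries would restrict it to whole words.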

How can i extract specific terms from string lines in Java?

I have a serious problem with extracting terms from each string line. To be more specific, I have one csv formatted file which is actually not csv format (it saves all terms into line[0] only)
So, here's just example string line among thousands of string lines:
(split() doesn't work.!!! )
test.csv
"31451 CID005319044   15939353   C8H14O3S2    beta-lipoic acid   C1C[S@](=O)S[C@@H]1CCCCC(=O)O "
"12232 COD05374044 23439353  C924O3S2    saponin   CCCC(=O)O "
"9048   CTD042032 23241  C3HO4O3S2 Berberine  [C@@H]1CCCCC(=O)O "
I want to extract "beta-lipoic acid" ,"saponin" and "Berberine" only which is located in 5th position.
You can see there are big spaces between terms, so that's why I said 5th position.
In this case, how can I extract terms located in 5th position for each line?
One more thing: the length of whitespace between each of the six terms is not always equal. the length could be one, two, three, four, or five, or something like that.
Because the length of whitespace is random, I can not use the .split() function.
For example, in the first line I would get "beta-lipoic" instead of "beta-lipoic acid".
Here is a solution for your problem using the string split and index of,
import java.util.ArrayList;
public class StringSplit {
public static void main(String[] args) {
String[] seperatedStr = null;
int fourthStrIndex = 0;
String modifiedStr = null, finalStr = null;
ArrayList<String> strList = new ArrayList<String>();
strList.add("31451 CID005319044   15939353   C8H14O3S2 beta-lipoic acid C1C[S@](=O)S[C@@H]1CCCCC(=O)O ");
strList.add("12232 COD05374044 23439353 C924O3S2 saponin CCCC(=O)O ");
strList.add("9048 CTD042032 23241 C3HO4O3S2 Berberine [C@@H]1CCCCC(=O)O ");
for (String item: strList) {
seperatedStr = item.split("\\s+");
fourthStrIndex = item.indexOf(seperatedStr[3]) + seperatedStr[3].length();
modifiedStr = item.substring(fourthStrIndex, item.length());
finalStr = modifiedStr.substring(0, modifiedStr.indexOf(seperatedStr[seperatedStr.length - 1]));
System.out.println(finalStr.trim());
}
}
}
Output:
beta-lipoic acid
saponin
Berberine
Option 1 : Use String.split and check for multiple consecutive spaces. Like the code below:
String s[] = str.split("\\s\\s+");
for (String string : s) {
System.out.println(string);
}
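Option 2 : Implement your own string split logic by browsing through all the characters. Sample code below (This code is just to give an idea. I did not test this code.)
public static List<String> getData(String str) {
    List<String> list = new ArrayList<>();
    StringBuilder s = new StringBuilder();
    int count = 0; // number of consecutive spaces seen so far
    for (char c : str.toCharArray()) {
        if (c == ' ') {
            count++;
            continue;
        }
        if (count > 1 && s.length() > 0) { // two or more spaces end the current term
            list.add(s.toString());
            s.setLength(0);
        } else if (count == 1 && s.length() > 0) { // a single space belongs to the term
            s.append(' ');
        }
        count = 0;
        s.append(c);
    }
    if (s.length() > 0) { // don't forget the last term
        list.add(s.toString());
    }
    return list;
}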
This would be a relatively easy fix if it weren't for beta-lipoic acid...
Assuming that only spaces/tabs/other whitespace separate terms, you could split on whitespace.
Pattern whitespace = Pattern.compile("\\s+");
String[] terms = whitespace.split(line); // Not 100% sure of syntax here...
// Your desired term should be index 4 of the terms array
While this would work for the majority of your terms, this would also result in you losing the "acid" in "beta-lipoic acid"...
Another hacky solution would be to add in a check for the 6th spot in the array produced by the above code and see if it matches English letters. If so, you can be reasonably confident that the 6th spot is actually part of the same term as the 5th spot, so you can then concatenate those together. This falls apart pretty quickly though if you have terms with >= 3 words. So something like
Pattern possibleEnglishWord = Pattern.compile("[a-zA-Z]+"); // Can add dashes and such as needed
if (terms.length > 5 && possibleEnglishWord.matcher(terms[5]).matches()) {
// terms[4] + " " + terms[5] is then the full term
}
Another thing you can try is to replace all groups of spaces with a single space, and then remove every token that isn't made up of just English letters/dashes
line = whitespace.matcher(line).replaceAll(" ");
Pattern notEnglishWord = Pattern.compile("(?<!\\S)(?![a-zA-Z-]+(?!\\S))\\S+ ?"); // A whole token containing anything other than letters/dashes
line = notEnglishWord.matcher(line).replaceAll("").trim();
Then hopefully the only thing that is left would be the term you're looking for.
Hopefully this helps, but I do admit it's rather convoluted. One of the issues is that it appears that non-term words may have only one space between them, which would fool Option 1 as presented by Hirak... If that weren't the case that option should work.
Oh by the way, if you do end up doing this, put the Pattern declarations outside of any loops. They only need to be created once.
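If it is safe to assume that the first four fields and the last field never contain whitespace (this holds for the three sample lines, but it is an assumption about the rest of the file), the term can be taken as everything between the fourth token and the last one. A minimal sketch, with hypothetical class and method names:

```java
import java.util.Arrays;

public class FifthTermExtractor {
    // Assumes fields 1-4 and the final field contain no whitespace,
    // so the term is whatever tokens lie between them.
    static String fifthTerm(String line) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens.length < 6) {
            throw new IllegalArgumentException("Expected at least six fields: " + line);
        }
        return String.join(" ", Arrays.copyOfRange(tokens, 4, tokens.length - 1));
    }

    public static void main(String[] args) {
        System.out.println(fifthTerm("31451 CID005319044   15939353   C8H14O3S2    beta-lipoic acid   CCCC(=O)O "));
        System.out.println(fifthTerm("12232 COD05374044 23439353  C924O3S2    saponin   CCCC(=O)O "));
    }
}
```

Unlike splitting on two or more spaces, this does not depend on how many spaces separate the fields, and it keeps terms with any number of words.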

Reading a string with multiple options

I have String like ",yes,,,,,,,,,,,,," which says option2 is selected out of 15 options. Here, a comma , represents a option; if it is selected then some data will be there in place of option. I need to read this string and get the exact option selected value. In above it should be option2. How shall I do this?
I have 15 options in database from which selected data is replaced here and , in place none selected.
Or, looked at another way, there are 15 fields separated by commas. One field — in the example, the second field — has a non-empty value; the others are all empty. How can I determine the first field that is not empty?
Try String.split(",") - this will return String[]
http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#split%28java.lang.String%29
public class Split {
public static void main(String [] args) {
String [] options = args[0].split(",",15);
for(int i = 0; i < options.length; i++) {
System.out.printf("option %d = [%s]\n", i, options[i]);
}
}
}
If I understand you correctly, you get a string of commas and between two commas there is some word like "Yes". Your task is to retrieve the index of that word, i.e., the number of commas (plus 1) before the word.
First of all, that encoding for an option is quite stupid, so if it lies in your responsibility, change it.
The simple solution is to count the commas before the word.
Something like this will do:
String s = ",yes,,,,";
for ( int i = 0 , len = s.length() ; i < len ; i++ )
if ( s.charAt(i) != ',' )
return i+1;
throw new Exception ("No option found");
Well, I assume you want to split the string on , like this:
String[] options = ",yes,,,,,,,,,,,,,".split(",");
String option2 = options[1]; //yields "yes"
However, why don't you use some more understandable markup, like option2=yes etc.?
Have a look at Apache Commons CLI for some library to support better options.
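If your String always follows the format you specify (exactly one option selected), you can use String.split(). With no limit argument, split() drops trailing empty strings, so the selected value is the last element of the array and the array length is its option number. For example,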
String[] split = ",yes,,,,,,,,,,,,,".split(",");
System.out.format("The chosen data is in Option %d and is '%s'%n", split.length, split[split.length - 1]);
prints
The chosen data is in Option 2 and is 'yes'
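To answer the "first field that is not empty" form of the question without relying on trailing empty strings being dropped, one can pass a negative limit so that split keeps every field, then scan for the first non-empty one. This is a small sketch; the class name and the -1 "nothing selected" return value are choices made for this example.

```java
public class SelectedOption {
    // Returns the 1-based position of the first non-empty field, or -1 if every field is empty.
    static int firstSelected(String encoded) {
        String[] fields = encoded.split(",", -1); // negative limit keeps trailing empty fields
        for (int i = 0; i < fields.length; i++) {
            if (!fields[i].trim().isEmpty()) {
                return i + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstSelected(",yes,,,,,,,,,,,,,")); // prints 2
    }
}
```

The value itself is then fields[position - 1], and the same loop extends naturally to collecting every selected option if more than one can be set.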

How do I concatenate input in java?

I am trying to concatenate and parse at the same time. I am making an Excel-like program where I can say a1 = "Hello" + "World" and have cell A1 show HelloWorld. I just need to know how to parse the plus sign and connect those two words. Please tell me if you need more code to understand this, like the runner.
This is my parseInput class :
public class ParseInput {
private static String inputs;
static int col;
private static int row;
private static String operation;
private static Value field;
public static void parseInput(String input){
//splits the input at each regular expression match. \w is used for letters and \d && \D for integers
inputs = input;
Scanner tokens = new Scanner(inputs);
String none0 = tokens.next();
@SuppressWarnings("unused")
String none1 = tokens.next();
operation = tokens.nextLine().substring(1);
String[] holder = new String[2];
String regex = "(?<=[\\w&&\\D])(?=\\d)";
holder = none0.split(regex);
row = Integer.parseInt(holder[1]);
col = 0;
int counter = -1;
char temp = holder[0].charAt(0);
char check = 'a';
while(check <= temp){
if(check == temp){
col = counter +1;
}
counter++;
check = (char) (check + 1);
}
System.out.println(col);
System.out.println(row);
System.out.println(operation);
setField(Value.parseValue(operation));
Spreadsheet.changeCell(row, col, field);
}
public static Value getField() {
return field;
}
public static void setField(Value field) {
ParseInput.field = field;
}
}
This is actually a pretty complicated problem unless you can constrain input to a very small subset of what Excel accepts. If not then you'll probably want to look into something like ANTLR. However, assuming the above input then you'll want to do something like:
Split the string on the equal sign into s1 and s2
Split s2 on the plus sign into s3 and s4.
Trim all the strings, remove the quotes around s3 and s4.
Concatenate s3 and s4 and assign to your datastore indexed by s1.
Depending on how complex your concatenation needs are you can either use string concatenation or a StringBuilder:
result = "" + s3 + s4; // string concatenation
result = new StringBuilder().append(s3).append(s4).toString(); // StringBuilder
Let me know if you have any questions about any of the steps detailed above.
Details on (1) above, assuming input is a1 = "Hello" + "World":
String[] strings = input.split("=");
String s1 = strings[0].trim(); // a1
String s2 = strings[1].trim(); // "Hello" + "World"
strings = s2.split("\\+"); // + is a regex metacharacter and must be escaped
String s3 = strings[0].trim().replaceAll("^\"", "").replaceAll("\"$", ""); // Hello
String s4 = strings[1].trim().replaceAll("^\"", "").replaceAll("\"$", ""); // World
String field = s3 + s4;
String colString = s1.replaceAll("[\\d]", ""); // a
String rowString = s1.replaceAll("[\\D]", ""); // 1
int col = colString.charAt(0) - 'a'; // 0
int row = Integer.parseInt(rowString);
Spreadsheet.changeCell(row, col, field);
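The same steps generalize to any number of operands. The sketch below is only an illustration under two assumptions: the quoted strings never contain "=" or "+", and the caller takes care of storing the result (the Spreadsheet and Value classes from the question are not used). The class name ConcatParser is hypothetical.

```java
public class ConcatParser {
    // Returns { cellName, concatenatedValue } for input like: a1 = "Hello" + "World"
    // Assumes no '=' or '+' appears inside the quoted strings.
    static String[] parse(String input) {
        String[] sides = input.split("=", 2);
        if (sides.length < 2) {
            throw new IllegalArgumentException("Missing '=' in: " + input);
        }
        String cell = sides[0].trim();
        StringBuilder value = new StringBuilder();
        for (String operand : sides[1].split("\\+")) {
            // Trim whitespace, then strip one leading and one trailing quote.
            value.append(operand.trim().replaceAll("^\"|\"$", ""));
        }
        return new String[] { cell, value.toString() };
    }

    public static void main(String[] args) {
        String[] result = parse("a1 = \"Hello\" + \"World\"");
        System.out.println(result[0] + " -> " + result[1]); // prints a1 -> HelloWorld
    }
}
```

Anything beyond this, such as operators inside strings, cell references, or nesting, is where a real tokenizer or parser generator starts to pay off.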
I suggest you to implement your custom grammar using a parser generator like JavaCC.
Here you can find a simple tutorial.
I believe this is the better solution because in this way you can handle every expression you need.
Are you sure you want to use all the classes you are using? To parse something like "a=b+c+d.." (assuming you are not trying to validate), easiest and possibly the most efficient way is to use split API in Java lang String
Then join whatever is required using StringBuilder
You need to design and implement a parser and an evaluator. And before that, you need to design the language that your parser/evaluator is going to evaluate.
How to do it.
If your language is really simple, you can get away with parsing it by hand, using something like StringTokenizer to do the tokenization,
Otherwise, you are probably best off learning to use a Java "parser generator" such as JavaCC or ANTLR.
Either way, you need to do some background reading to understand all of the terminology. You could start with Wikipedia and/or the tutorial material from one of the parser generators. Alternatively, there are good textbooks on this topic.
In addition to what Abdullah said, if you really want to save every ounce of memory you can, use a StringBuilder instead of String concatenation. I believe I read somewhere that String concatenation creates a new String object for each concatenation, while a StringBuilder appends everything into a single buffer. It shouldn't matter too much, though.
Early on I wrote an equation evaluator in your style. It cost me a huge amount of code and complexity because I didn't know about expression trees. With expression trees you will be able to add more capabilities to your parser easily, in plain Java code, and you will find plenty of examples of how to use them.
