Split a string by "," and not spaces - java

Every time I try to split a string e.g. foo,bar,foo bar,bar it skips the string after the space.
How do I stop Java from doing this?
String[] transactionItem = transactionItems[i].split(",");
If transactionItems[0] = Y685,Blue Tie,2,34.79,2
it would output
transactionItem[0] = Y685
transactionItem[1] = Blue
transactionItem[3] = out of bounds

This code is working correctly:
String[] split = myString.split(",");
Basic demo : http://www.ideone.com/kLchx
With your new example, it works too : http://www.ideone.com/hWWzd
I think we need to see more of your code to find the problem.
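One way to reproduce the exact output from the question is sketched below. If the line is read with Scanner.next(), which stops at the first whitespace, only "Y685,Blue" ever reaches split(","). This is only a guess at the cause, since the question does not show how transactionItems is filled. The class and method names are made up for the illustration.

```java
import java.util.Arrays;
import java.util.Scanner;

// Hypothetical demo class; not taken from the asker's code.
class TransactionSplitDemo {

    // Reads only the first whitespace-delimited token, then splits it on commas.
    static String[] splitFirstToken(String input) {
        try (Scanner in = new Scanner(input)) {
            return in.next().split(",");
        }
    }

    // Reads the whole line, then splits it on commas.
    static String[] splitWholeLine(String input) {
        try (Scanner in = new Scanner(input)) {
            return in.nextLine().split(",");
        }
    }

    public static void main(String[] args) {
        String line = "Y685,Blue Tie,2,34.79,2";
        System.out.println(Arrays.toString(splitFirstToken(line))); // [Y685, Blue]
        System.out.println(Arrays.toString(splitWholeLine(line)));  // [Y685, Blue Tie, 2, 34.79, 2]
    }
}
```

If the input does come from a Scanner, switching from next() to nextLine() would be the fix; split(",") itself needs no change.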

This:
transactionItem[3]
Should be 2 instead of 3. Arrays are 0 indexed.

Related

How can i extract specific terms from string lines in Java?

I have a serious problem with extracting terms from each string line. To be more specific, I have one csv formatted file which is actually not csv format (it saves all terms into line[0] only)
So, here's just example string line among thousands of string lines:
(split() doesn't work.!!! )
test.csv
"31451 CID005319044   15939353   C8H14O3S2    beta-lipoic acid   C1C[S#](=O)S[C##H]1CCCCC(=O)O "
"12232 COD05374044 23439353  C924O3S2    saponin   CCCC(=O)O "
"9048   CTD042032 23241  C3HO4O3S2 Berberine  [C##H]1CCCCC(=O)O "
I want to extract "beta-lipoic acid" ,"saponin" and "Berberine" only which is located in 5th position.
You can see there are big spaces between terms, so that's why I said 5th position.
In this case, how can I extract terms located in 5th position for each line?
One more thing: the length of whitespace between each of the six terms is not always equal. the length could be one, two, three, four, or five, or something like that.
Because the length of whitespace is random, I can not use the .split() function.
For example, in the first line I would get "beta-lipoic" instead of "beta-lipoic acid".
Here is a solution for your problem using String.split and indexOf:
import java.util.ArrayList;

public class StringSplit {
    public static void main(String[] args) {
        String[] seperatedStr = null;
        int fourthStrIndex = 0;
        String modifiedStr = null, finalStr = null;
        ArrayList<String> strList = new ArrayList<String>();
        strList.add("31451 CID005319044   15939353   C8H14O3S2 beta-lipoic acid C1C[S#](=O)S[C##H]1CCCCC(=O)O ");
        strList.add("12232 COD05374044 23439353 C924O3S2 saponin CCCC(=O)O ");
        strList.add("9048 CTD042032 23241 C3HO4O3S2 Berberine [C##H]1CCCCC(=O)O ");
        for (String item : strList) {
            seperatedStr = item.split("\\s+");
            fourthStrIndex = item.indexOf(seperatedStr[3]) + seperatedStr[3].length();
            modifiedStr = item.substring(fourthStrIndex, item.length());
            finalStr = modifiedStr.substring(0, modifiedStr.indexOf(seperatedStr[seperatedStr.length - 1]));
            System.out.println(finalStr.trim());
        }
    }
}
Output:
beta-lipoic acid
saponin
Berberine
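The same idea can also be written as a single regular expression. This is a sketch that assumes, as the sample lines suggest, that the first four fields and the last field never contain whitespace, so whatever sits between them is the fifth field. FifthFieldExtractor and extractFifthField are illustrative names, not part of the answer above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FifthFieldExtractor {

    // Four whitespace-free fields, then the wanted field (captured lazily),
    // then one final whitespace-free field. Compiled once and reused.
    private static final Pattern LINE =
            Pattern.compile("^\\s*(?:\\S+\\s+){4}(.+?)\\s+\\S+\\s*$");

    // Returns the fifth field, or null if the line does not have that shape.
    static String extractFifthField(String line) {
        Matcher m = LINE.matcher(line);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] lines = {
            "31451 CID005319044   15939353   C8H14O3S2    beta-lipoic acid   C1C[S#](=O)S[C##H]1CCCCC(=O)O ",
            "12232 COD05374044 23439353  C924O3S2    saponin   CCCC(=O)O ",
            "9048   CTD042032 23241  C3HO4O3S2 Berberine  [C##H]1CCCCC(=O)O "
        };
        for (String line : lines) {
            System.out.println(extractFifthField(line));
        }
    }
}
```

Because the fifth field is captured as "everything between field four and the last field", terms with any number of words are kept whole, as long as the assumption about the other fields holds.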
Option 1 : Use String.split and split on runs of two or more whitespace characters, like the code below:
String[] s = str.split("\\s\\s+");
for (String string : s) {
    System.out.println(string);
}
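Option 2 : Implement your own split logic by walking through all the characters. Sample code below (this code is only meant to give the idea; I did not test it).
public static List<String> getData(String str) {
    List<String> list = new ArrayList<>();
    StringBuilder term = new StringBuilder();
    int spaces = 0; // length of the current run of spaces
    for (char c : str.toCharArray()) {
        if (c == ' ') {
            spaces++;
            continue;
        }
        if (spaces > 1 && term.length() > 0) {
            // two or more spaces end the current term
            list.add(term.toString());
            term.setLength(0);
        } else if (spaces == 1 && term.length() > 0) {
            // a single space belongs to the current term
            term.append(' ');
        }
        spaces = 0;
        term.append(c);
    }
    if (term.length() > 0) {
        list.add(term.toString()); // do not lose the last term
    }
    return list;
}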
This would be a relatively easy fix if it weren't for beta-lipoic acid...
Assuming that only spaces/tabs/other whitespace separate terms, you could split on whitespace.
Pattern whitespace = Pattern.compile("\\s+");
String[] terms = whitespace.split(line); // Not 100% sure of syntax here...
// Your desired term should be index 4 of the terms array
While this would work for the majority of your terms, this would also result in you losing the "acid" in "beta-lipoic acid"...
Another hacky solution would be to add in a check for the 6th spot in the array produced by the above code and see if it matches English letters. If so, you can be reasonably confident that the 6th spot is actually part of the same term as the 5th spot, so you can then concatenate those together. This falls apart pretty quickly though if you have terms with >= 3 words. So something like
Pattern possibleEnglishWord = Pattern.compile("[a-zA-Z]*"); // Can add dashes and such as needed
if (terms.length > 5 && possibleEnglishWord.matcher(terms[5]).matches()) {
    // return terms[4] + " " + terms[5] or something like that
}
Another thing you can try is to replace all groups of spaces with a single space, and then remove everything that isn't made up of just english letters/dashes
line = whitespace.matcher(line).replaceAll(" ");
Pattern notEnglishWord = Pattern.compile("\\S*[^a-zA-Z\\s\\-]\\S*"); // any token containing something other than letters/dashes
line = notEnglishWord.matcher(line).replaceAll("").trim();
Then hopefully the only thing that is left would be the term you're looking for.
Hopefully this helps, but I do admit it's rather convoluted. One of the issues is that it appears that non-term words may have only one space between them, which would fool Option 1 as presented by Hirak... If that weren't the case that option should work.
Oh by the way, if you do end up doing this, put the Pattern declarations outside of any loops. They only need to be created once.

Java - split string into an array

I have this code
String speed_string = "baka baka saka laka";
String[] string_array = speed_string.split(" ");
System.out.println(string_array.length);
and it returns the value of 1 when I run it. Why is that? It seems as if only the first word of the string gets saved.
Use \\s and update the code as below
String speed_string = "baka baka saka laka";
String[] string_array = speed_string.split("\\s");
System.out.println(string_array.length);
Most probably what you think is a space (ASCII decimal 32) in your input string is actually some other character.
That would explain perfectly the behavior you're seeing.
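A minimal sketch of how to check this, assuming (hypothetically) that the separators are no-break spaces (U+00A0), a common look-alike when text is pasted from a web page. In that case split(" ") finds nothing to split on, and by default \\s does not match U+00A0 either, so the character has to be named in the pattern. SpaceLookalikeDemo and splitOnAnySpace are illustrative names.

```java
class SpaceLookalikeDemo {

    // Splits on ordinary whitespace and on the no-break space U+00A0.
    static String[] splitOnAnySpace(String text) {
        return text.split("[\\s\\u00A0]+");
    }

    public static void main(String[] args) {
        // Hypothetical input: the gaps are U+00A0, not ASCII 32.
        String speedString = "baka\u00A0baka\u00A0saka\u00A0laka";

        System.out.println(speedString.split(" ").length);       // 1
        System.out.println(speedString.split("\\s").length);     // 1
        System.out.println(splitOnAnySpace(speedString).length); // 4

        // Print each code point to see what the separators really are.
        speedString.codePoints()
                   .forEach(cp -> System.out.printf("U+%04X ", cp));
        System.out.println();
    }
}
```

Printing the code points is the quickest way to confirm or rule out this explanation for a real input string.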

Android - how to delete last 2 lines from String?

I have a String which always looks like this:
data
data
data
data
non-data
non-data
And I need to delete the last 2 lines from it. The length of these lines can vary. How can I do that quickly (the String is ~1000 lines)?
I'd say something along the lines of:
String[] lines = input.split("\n");
String[] dataLines = Arrays.copyOfRange(lines, 0, lines.length - 2);
int lastNewLineAt = string.lastIndexOf("\n");
// look for the second-to-last newline, strictly before the last one
String result = string.substring(0, string.lastIndexOf("\n", lastNewLineAt - 1));
Instead of hard-coding "\n", you can read the newline sequence from the line.separator system property.
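The lastIndexOf idea can be generalized to drop any number of trailing lines without building an array. The following is a sketch under the assumption that lines are separated by "\n"; LastLinesTrimmer and dropLastLines are hypothetical names.

```java
class LastLinesTrimmer {

    // Returns text without its last n lines. A single trailing newline is ignored.
    static String dropLastLines(String text, int n) {
        int end = text.length();
        if (end > 0 && text.charAt(end - 1) == '\n') {
            end--; // do not count the empty "line" after a trailing newline
        }
        for (int i = 0; i < n; i++) {
            int newline = text.lastIndexOf('\n', end - 1);
            if (newline < 0) {
                return ""; // n or fewer lines: nothing is left
            }
            end = newline;
        }
        return text.substring(0, end);
    }

    public static void main(String[] args) {
        String input = "data\ndata\ndata\ndata\nnon-data\nnon-data";
        System.out.println(dropLastLines(input, 2));
    }
}
```

Each call to lastIndexOf only scans backwards from the current end, so for a String of about 1000 lines only the last few lines are ever inspected.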
This code splits your text on "\n", which puts your lines into a String array.
Then you get that array's length.
In a for loop you then append the lines back to your text, stopping before the ones you want to drop.
This may be a long approach, but I searched for this and couldn't find anything.
This was the easiest way for me.
String[] lines = YourTextView.getText().toString().split("\n");
YourTextView.setText(""); // clear your TextView
int arrayLength = lines.length - 1; // drops the last line; use lines.length - 2 to drop the last two
for (int i = 0; i < arrayLength; i++) {
    YourTextView.append(lines[i] + "\n");
}

Cannot get a Substring of a substring

I'm trying to parse a String from a file that looks something like this:
Mark Henry, Tiger Woods, James the Golfer, Bob,
3, 4, 5, 1,
1, 2, 3, 5,
6, 2, 1, 4,
For ease of use, I'd like to split off the first line of the String, because it is the only one that cannot be converted into integer values (the rest will be stored in a two-dimensional array of integers[line][value]).
I tried to use String.split("\\\n") to divide out each line into its own String, which works. However, I am unable to divide the new strings into substrings with String.split("\\,"). I am not sure what is going on:
String[] firstsplit = fileOne.split("\\\n");
System.out.println("file split into " + firstsplit.length + " parts");
for (int i = 0; i < firstsplit.length; i++){
System.out.println(firstsplit[i]); // prints values to verify it works
}
String firstLine = firstsplit[0];
String[] secondSplit = firstLine.split("\\,");
System.out.println(secondSplit[0]); // prints nothing for some reason
I've tried a variety of different things with this, and nothing seems to work (copying over to a new String is an attempt to get it to work even). Any suggestions?
EDIT: I have changed it to String.split(",") and also tried String.split(", ") but I still get nothing to print afterwards.
It occurs to me now that maybe the first location is a blank one....after testing I found this to be true and everything works for firstsplit[1];
You're splitting on "\\,", which reaches the regex engine as \,. The comma is not a regex metacharacter, so there is nothing to escape; split(",") is enough.
Comma , doesn't need \ before it as it isn't a special character. Try using "," instead of "\\,". (The regex \, still matches a plain comma, so the extra escape is redundant rather than the cause of the empty output.)
Not only do you not need to escape a comma, but you also don't need three backslashes for the newline character:
String[] firstSplit = fileOne.split("\n");
That is all you need. I tested your code with the string you specified and it worked, both with and without the extraneous escapes.
Have you actually tested it with the String data you provided in the question, or is the actual data something else? I was worried about carriage returns (\r\n in e.g. Windows files), but that didn't matter in my test either. If you can scrub the String data you're actually parsing and post a sample of the original String (fileOne), that would help significantly.
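The asker's own follow-up suggests the real culprit was a blank first element, for example because the file contents start with a newline. Below is a sketch of a split that tolerates this, assuming fileOne holds the whole file as one String. BlankLineTolerantSplit and splitRows are illustrative names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BlankLineTolerantSplit {

    // Splits the text into lines, skips blank ones, and splits each
    // remaining line on commas with optional surrounding whitespace.
    static List<String[]> splitRows(String fileContents) {
        List<String[]> rows = new ArrayList<>();
        for (String line : fileContents.split("\\r?\\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // e.g. the empty string before a leading newline
            }
            rows.add(trimmed.split("\\s*,\\s*"));
        }
        return rows;
    }

    public static void main(String[] args) {
        // The leading newline reproduces the empty firstsplit[0] from the question.
        String fileOne = "\nMark Henry, Tiger Woods, James the Golfer, Bob,\n3, 4, 5, 1,\n";
        for (String[] row : splitRows(fileOne)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Splitting on "\\s*,\\s*" also removes the space after each comma, so the names come out without leading blanks.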
You could just load the file into a list of lines:
FileInputStream fin = new FileInputStream(filename);
BufferedReader bin = new BufferedReader(new InputStreamReader(fin));
String line;
List<String> lines = new ArrayList<>();
while ((line = bin.readLine()) != null) {
    lines.add(line);
}
bin.close(); // closing the reader also closes fin
Of course you have to wrap this in a try/catch block that fits your exception handling. Then parse the lines starting with the second one like this:
for ( int i = 1; i < lines.size(); i++ ) {
String[] values = lines.get( i ).split( "," );
}
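To go one step further and turn those values into integers, something like the following could work. It is a sketch that assumes every line after the header contains only comma-separated integers, as in the sample from the question. ScoreRowParser and parseScoreRows are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ScoreRowParser {

    // Converts every line after the header into an int[]; blank lines are skipped.
    static List<int[]> parseScoreRows(List<String> lines) {
        List<int[]> rows = new ArrayList<>();
        for (int i = 1; i < lines.size(); i++) { // index 0 is the header with the names
            String line = lines.get(i).trim();
            if (line.isEmpty()) {
                continue;
            }
            String[] values = line.split(","); // the trailing comma adds no extra element
            int[] row = new int[values.length];
            for (int j = 0; j < values.length; j++) {
                row[j] = Integer.parseInt(values[j].trim());
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Mark Henry, Tiger Woods, James the Golfer, Bob,",
                "3, 4, 5, 1,",
                "1, 2, 3, 5,",
                "6, 2, 1, 4,");
        for (int[] row : parseScoreRows(lines)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

The trim() before Integer.parseInt matters because the values are separated by a comma and a space, and parseInt rejects leading whitespace.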

How to split Java String using REGEX into array

I have got a Java String as follows:
C|51199120|36937872|14261248|0.73|I|102398308|6240560|96157748|0.07|J|90598564|1920184|88678380|0.03
I want to split this using a regex into String arrays:
Array1 = C,51199120,36937872,14261248,0.73
Array2 = I,102398308,6240560,96157748,0.07
Array3 = J,90598564,1920184,88678380,0.03
Can anybody help with the Java code?
I don't think it's that simple. You have two things you need to do:
Break up the input string when you encounter a letter
Break up each substring by the pipe
I'm no regex expert, but I don't think it can be a single pattern. You need two and a loop over the substrings.
You can easily split your string on subcomponents using String.split("\\|"), but regexes won't help you to group them up in different arrays, nor will it help you to convert substrings to appropriate type. You'll need a separate logic for that.
Use String.split() method.
String[] ar = str.split("(?=([A-Z]))");
for (String s : ar)
    System.out.println(s.replace("|", ","));
Simpler to just split then loop.
More or less:
String input = ...
String[] splitted = input.split("\\|"); // '|' is a regex metacharacter and must be escaped
List<String[]> resultArrays = new ArrayList<String[]>();
String[] currentArray = null;
for (int i = 0; i < splitted.length; i++) {
    if (i % 5 == 0) {
        currentArray = new String[5];
        resultArrays.add(currentArray); // List has add(), not put()
    }
    currentArray[i % 5] = splitted[i];
}
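The two-step approach described in the answers (break the input at each letter, then break each piece at the pipes) can be sketched as follows. It assumes that upper-case letters occur only as the first field of a record, which holds for the sample input but is not guaranteed in general. RecordSplitter and splitRecords are illustrative names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RecordSplitter {

    // Splits before every upper-case letter, then splits each record on '|'.
    static List<String[]> splitRecords(String input) {
        List<String[]> records = new ArrayList<>();
        for (String record : input.split("(?=[A-Z])")) {
            if (record.isEmpty()) {
                continue; // Java 7 and earlier yield a leading empty string here
            }
            records.add(record.split("\\|")); // '|' must be escaped in a regex
        }
        return records;
    }

    public static void main(String[] args) {
        String input = "C|51199120|36937872|14261248|0.73|I|102398308|6240560|96157748|0.07"
                + "|J|90598564|1920184|88678380|0.03";
        for (String[] record : splitRecords(input)) {
            System.out.println(Arrays.toString(record));
        }
    }
}
```

Unlike the fixed "five fields per array" loop, this version does not depend on every record having the same number of fields, at the cost of relying on the letter marker.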
