Separating elements in a string by white space into two dimensional array - java

I am trying to store the following strings from a file in a two-dimensional array. The code I have written works, except that when an element contains a space, it is split into an additional element. Here is my file:
Student1 New York
Student 2 Miami
Student3 Chicago
So I would want my output to look like this:
[Student1] [New York]
[Student 2] [Miami]
[Student3] [Chicago]
This is my actual output:
[Student1] [New] [York]
[Student] [2] [Miami]
[Student3] [Chicago]
Here is what I've written so far:
String file= "file.txt";
BufferedReader br = new BufferedReader(new FileReader(file));
while ((file = br.readLine()) != null) {
    if (!file.isEmpty()) {
        String strSingleSpace = file.trim().replaceAll("\\s+", " ");
        String[] obj = strSingleSpace.trim().split("\\s+");
        int i=0;
        String[][] newString = new String[obj.length][];
        for(String temp : obj){
            newString[i++]=temp.trim().split("\\s+");
        }
        List<String[]> yourList = Arrays.asList(newString);
        System.out.println(yourList.get(0)[0] + " " + yourList.get(1)[0]);
    }
}

Just giving you some "food for thought": your code treats all lines the same way, as if they all had exactly the same format, although you already made it very clear that some lines are formatted differently.
In other words: there is no point in blindly splitting on spaces if spaces sometimes belong to the first or the second column.
Instead:
Determine the index of the last digit in a line; everything up to and including that index makes up the first column.
The remainder of the line (after that last digit) goes into the second column; call trim() on that remaining string to get rid of any leading spaces.
You could put all of that into a single regular expression too, but as this is probably some kind of homework, I leave that exercise to the reader.
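The two steps above can be sketched as follows. This is only an illustration: the class and method names (`LastDigitSplit`, `splitAtLastDigit`) are made up for the example, and it assumes that the first column always ends with a digit and the second column never contains one, as in the sample file.

```java
import java.util.Arrays;

class LastDigitSplit {
    // Splits a line into {first column, second column} right after the last digit.
    // Assumption: the first column ends with a digit and the second column has none.
    static String[] splitAtLastDigit(String line) {
        int lastDigit = -1;
        for (int i = 0; i < line.length(); i++) {
            if (Character.isDigit(line.charAt(i))) {
                lastDigit = i;
            }
        }
        if (lastDigit < 0) {
            // No digit found: keep the whole line as a single column.
            return new String[] { line.trim() };
        }
        String first = line.substring(0, lastDigit + 1).trim();
        String second = line.substring(lastDigit + 1).trim();
        return new String[] { first, second };
    }

    public static void main(String[] args) {
        String[] lines = { "Student1 New York", "Student 2 Miami", "Student3 Chicago" };
        String[][] table = new String[lines.length][];
        for (int i = 0; i < lines.length; i++) {
            table[i] = splitAtLastDigit(lines[i]);
            System.out.println(Arrays.toString(table[i]));
        }
    }
}
```

A city name that contains a digit would end up partly in the first column, so this only fits data shaped like the example.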

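I think it will partly work for your test case if you pass a limit to split, changing this line:
String[] obj = strSingleSpace.trim().split("\\s+");
to this:
String[] obj = strSingleSpace.trim().split("\\s+", 2);
A limit of 2 splits only at the first run of whitespace, so "Student1 New York" becomes [Student1] [New York]. (A limit of 1 would return the whole line as a single element.) You then have to use obj[0] and obj[1] directly as the two columns instead of splitting each element again. It still breaks "Student 2 Miami" into [Student] [2 Miami], because there the space belongs to the first column.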

Related

searching from txt file for a specific characters (Java)

I have a big txt. (a dictionary) file which contains about 100k + words ordered like that:
tree trees asderi 12
car cars asdfei 123
mouse mouses dasrkfi 333
plate plates asdegvi 333
......
(ps. there are no empty rows in between)
What I want to do is check the 3rd column (asderi in the first row) and, if this word contains the letters "i" and "e", copy the first word in that row (tree in this case) to a new txt file. I don't need a whole solution, but maybe an example of how to read the 3rd word, check it for those letters, and if they are present print out the first word in that line.
When it comes to big data files, you want to process them line by line rather than reading everything into memory. You may want to start with this:
BufferedReader br = new BufferedReader(new FileReader(new File("C:/sample/sample.txt")));
String line;
while ((line = br.readLine()) != null) {
// process the line.
}
br.close();
Once you have the line, I bet you will be able to use the common String methods such as .indexOf(...), .substring(...) and .split(...) to acquire the data you want (especially since the source file seems to have well-structured data).
So, assuming your "columns" are always separated by a space, no column ever contains a space, and no column is ever missing, you could get the columns using .split like this:
// this will be the current line of the file
String s = "tree trees asderi 12";
String[] fragments = s.split(" ");
String thirdColumn = fragments[2];
boolean hasI = thirdColumn.contains("i");
String firstColumn = fragments[0];
System.out.println("Fragment: "+thirdColumn+" contains i: "+hasI+" thats why i want the first fragment: "+firstColumn);
But in the end you will have to experiment a bit with the String methods to put it together, especially for all the special cases this file will probably bring up ;)
You may update your "question" with some source you managed to write using these hints and then ask again if you get stuck.
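Putting the pieces together, a minimal sketch of the whole check could look like this. The class and method names are invented for the example, and it works on a list of lines so it can be run without a file; in practice each line would come from the BufferedReader loop shown earlier, and each match would be written to the output file instead of collected in a list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ThirdColumnFilter {
    // Returns the first column of every line whose third column
    // contains both the letter 'i' and the letter 'e'.
    static List<String> firstWordsOfMatches(List<String> lines) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            String[] fragments = line.trim().split("\\s+");
            if (fragments.length < 3) {
                continue; // skip lines that do not have a third column
            }
            String third = fragments[2];
            if (third.contains("i") && third.contains("e")) {
                result.add(fragments[0]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "tree trees asderi 12",
                "car cars asdfei 123",
                "mouse mouses dasrkfi 333",
                "plate plates asdegvi 333");
        System.out.println(firstWordsOfMatches(lines)); // prints [tree, car, plate]
    }
}
```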

How can i extract specific terms from string lines in Java?

I have a serious problem with extracting terms from each string line. To be more specific, I have one csv formatted file which is actually not csv format (it saves all terms into line[0] only)
So, here's just example string line among thousands of string lines:
(split() doesn't work.!!! )
test.csv
"31451 CID005319044   15939353   C8H14O3S2    beta-lipoic acid   C1C[S#](=O)S[C##H]1CCCCC(=O)O "
"12232 COD05374044 23439353  C924O3S2    saponin   CCCC(=O)O "
"9048   CTD042032 23241  C3HO4O3S2 Berberine  [C##H]1CCCCC(=O)O "
I want to extract "beta-lipoic acid" ,"saponin" and "Berberine" only which is located in 5th position.
You can see there are big spaces between terms, so that's why I said 5th position.
In this case, how can I extract terms located in 5th position for each line?
One more thing: the length of whitespace between each of the six terms is not always equal. the length could be one, two, three, four, or five, or something like that.
Because the length of whitespace is random, I can not use the .split() function.
For example, in the first line I would get "beta-lipoic" instead of "beta-lipoic acid".
Here is a solution for your problem using String.split and indexOf:
import java.util.ArrayList;
public class StringSplit {
public static void main(String[] args) {
String[] seperatedStr = null;
int fourthStrIndex = 0;
String modifiedStr = null, finalStr = null;
ArrayList<String> strList = new ArrayList<String>();
strList.add("31451 CID005319044   15939353   C8H14O3S2 beta-lipoic acid C1C[S#](=O)S[C##H]1CCCCC(=O)O ");
strList.add("12232 COD05374044 23439353 C924O3S2 saponin CCCC(=O)O ");
strList.add("9048 CTD042032 23241 C3HO4O3S2 Berberine [C##H]1CCCCC(=O)O ");
for (String item: strList) {
seperatedStr = item.split("\\s+");
fourthStrIndex = item.indexOf(seperatedStr[3]) + seperatedStr[3].length();
modifiedStr = item.substring(fourthStrIndex, item.length());
finalStr = modifiedStr.substring(0, modifiedStr.indexOf(seperatedStr[seperatedStr.length - 1]));
System.out.println(finalStr.trim());
}
}
}
Output:
beta-lipoic acid
saponin
Berberine
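The same idea (skip the first four tokens, drop the last token, keep what lies between) can also be written as one regular expression. This is a sketch under the assumption that every line has exactly four single-token columns before the term and one single-token column after it, and that the surrounding quotes have already been removed; the names `FifthTermExtractor` and `extractTerm` are made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FifthTermExtractor {
    // Four leading tokens, then the term (which may contain spaces),
    // then exactly one trailing token. Compiled once and reused.
    private static final Pattern LINE =
            Pattern.compile("^\\s*(?:\\S+\\s+){4}(.*\\S)\\s+\\S+\\s*$");

    // Returns the term in the 5th position, or null if the line has a different shape.
    static String extractTerm(String line) {
        Matcher m = LINE.matcher(line);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] lines = {
            "31451 CID005319044   15939353   C8H14O3S2    beta-lipoic acid   C1C[S#](=O)S[C##H]1CCCCC(=O)O ",
            "12232 COD05374044 23439353  C924O3S2    saponin   CCCC(=O)O ",
            "9048   CTD042032 23241  C3HO4O3S2 Berberine  [C##H]1CCCCC(=O)O "
        };
        for (String line : lines) {
            System.out.println(extractTerm(line));
        }
    }
}
```

Because the pattern does not care how many spaces separate the columns, it also handles terms with more than two words, as long as the last column never contains whitespace.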
Option 1: Use String.split and split only on two or more consecutive whitespace characters, like the code below:
String s[] = str.split("\\s\\s+");
for (String string : s) {
System.out.println(string);
}
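Option 2: Implement your own split logic by walking through all the characters. The sample below is only meant to give the idea, so treat it as a sketch: a single space is kept as part of the current term, and two or more consecutive spaces end it.
public static List<String> getData(String str) {
    List<String> list = new ArrayList<>();
    StringBuilder s = new StringBuilder();
    int count = 0; // number of consecutive spaces seen so far
    for (char c : str.toCharArray()) {
        if (c == ' ') {
            count++;
        } else {
            if (count == 1 && s.length() > 0) {
                s.append(' '); // a single space belongs to the term
            }
            count = 0;
            s.append(c);
        }
        if (count > 1 && s.length() > 0) {
            // two or more spaces: the current term is complete
            list.add(s.toString());
            s.setLength(0);
        }
    }
    if (s.length() > 0) {
        list.add(s.toString()); // do not lose the last term
    }
    return list;
}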
This would be a relatively easy fix if it weren't for beta-lipoic acid...
Assuming that only spaces/tabs/other whitespace separate terms, you could split on whitespace.
Pattern whitespace = Pattern.compile("\\s+");
String[] terms = whitespace.split(line); // Not 100% sure of syntax here...
// Your desired term should be index 4 of the terms array
While this would work for the majority of your terms, this would also result in you losing the "acid" in "beta-lipoic acid"...
Another hacky solution would be to add in a check for the 6th spot in the array produced by the above code and see if it matches English letters. If so, you can be reasonably confident that the 6th spot is actually part of the same term as the 5th spot, so you can then concatenate those together. This falls apart pretty quickly though if you have terms with >= 3 words. So something like
Pattern possibleEnglishWord = Pattern.compile("[a-zA-Z]+"); // Can add dashes and such as needed
if (terms.length > 5 && possibleEnglishWord.matcher(terms[5]).matches()) {
    // return terms[4] + " " + terms[5] or something like that
}
Another thing you can try is to replace all groups of spaces with a single space, and then remove every token that isn't made up of just English letters/dashes
line = whitespace.matcher(line).replaceAll(" ");
Pattern notEnglishWord = Pattern.compile("\\S*[^a-zA-Z\\s-]\\S*"); // any token containing something other than a letter or dash
line = notEnglishWord.matcher(line).replaceAll("").trim();
Then hopefully the only thing that is left would be the term you're looking for.
Hopefully this helps, but I do admit it's rather convoluted. One of the issues is that it appears that non-term words may have only one space between them, which would fool Option 1 as presented by Hirak... If that weren't the case that option should work.
Oh by the way, if you do end up doing this, put the Pattern declarations outside of any loops. They only need to be created once.

Appending text from array list to a String takes a lot of time

I am reading a plain text file (created in Notepad) that is about 3 MB in size, so you can imagine how many words it contains. I read this file into a String and then split the String so that I can hold each single word in an ArrayList<String>. That works fine, but the actual problem is that after processing this ArrayList I have to put all of its words back into the String.
so that the steps are:
I read a text file into a String (alltext)
Split all words into an arraylist
process that array list (suppose I removed all the stop words like is, am, are)
after processing on array list I want to put all the words of array list back to the string (alltext)
then I have to work with that string (alltext)
(alltext is the string that must contains the text after all processing)
The problem is that step number 4 takes a lot of time to append all the words back to the string. My code is:
BufferedReader br = new BufferedReader(new FileReader(file));
String line = "";
while ((line = br.readLine()) != null) {
alltext += line.trim().replaceAll("\\s+", " ") + " ";
}
br.close();
//Adding All elements from all text to temp list
ArrayList<String> tempList = new ArrayList<String>();
String[] array = alltext.split(" ");
for (String a : array) {
tempList.add(a);
}
//remove stop words here from the temp list
//Adding File Words from List in One String
alltext = "";
for (String removed1 : tempList) {
System.out.println("appending the text");
alltext += removed1.toLowerCase() + " ";
//here it is taking a lot of time suppose 5-10 minutes for a simple text file of even 1.4mb
}
So I just want any idea so that I can reduce the time for an efficient processing and relax the machine! I will be thankful for any suggestions and ideas...
Thanks
Use a StringBuffer instead of a String.
A String is immutable, so you create a new object every time you append, which takes more and more time the longer your String becomes. A StringBuffer is mutable and made for cases like yours.
I would recommend StringBuilder
According to the question "StringBuilder and StringBuffer in Java", it is faster than a StringBuffer because its methods are not synchronized. Also check whether you need the ArrayList at all, because you can iterate through the array directly.
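As a rough sketch of what that change looks like for step 4, the loop below appends to a StringBuilder and converts it to a String once at the end. The class and method names (`JoinWords`, `joinLowerCase`) are invented for the example; like the code in the question, it leaves a trailing space after the last word.

```java
import java.util.Arrays;
import java.util.List;

class JoinWords {
    // Joins the words into one lower-case string, each word followed by a space.
    // The StringBuilder grows in place, so no intermediate String is created per word.
    static String joinLowerCase(List<String> words) {
        StringBuilder sb = new StringBuilder();
        for (String word : words) {
            sb.append(word.toLowerCase()).append(' ');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> tempList = Arrays.asList("The", "Quick", "Brown", "Fox");
        String alltext = joinLowerCase(tempList);
        System.out.println(alltext);
    }
}
```

The same applies to the reading loop in the question: `alltext += ...` inside the `while` copies the whole text on every line, so appending to a StringBuilder there helps as well. Removing the `System.out.println` call inside the loop should also save noticeable time.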

Reading integers from a file separated by space in java

Input file containing integers will be like this:
5 2 3 5
2 4 23 4 5 6 4
So how would I read the first line, separate it by space and add these numbers to Arraylist1. Then read the second line, separate it by space and add the numbers to ArrayList2 and so on. (So Arraylist1 will contain [5,2,3,5] etc)
FileInputStream fstream = new FileInputStream("file.txt");
DataInputStream in = new DataInputStream(fstream);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
String data;
while ((data = br.readLine()) != null) {
//How can I do what I described above here?
}
Homework?
You can use this:
String[] tmp = data.split(" "); //Split space
for(String s: tmp)
myArrayList.add(s);
Or you have a look at the Scanner class.
Consider using a StringTokenizer. Its default delimiters include the space, so each call to nextToken() returns one number as a String:
StringTokenizer st = new StringTokenizer(data);
while (st.hasMoreTokens()) {
    int value = Integer.parseInt(st.nextToken());
    System.out.println(value);
}
You can get a standard array out of data.split("\\s+");, which will give you a String[]; convert each element with Integer.parseInt to get int values. You'll need something extra to throw different lines into different lists.
After I tried the answer provided by HectorLector, it didn't work in some specific situation. So, here is mine:
String[] tmp = data.split("\\s+");
This uses Regular Expression
What you would require is something like an ArrayList of ArrayLists. You can use data.split("\\s+") to get all the elements of a single line in a String array and then put these elements into an inner ArrayList of the outer ArrayList.
For the next line you move on to the next element of the outer ArrayList, and so on.
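A minimal sketch of that list-of-lists approach follows. The names `IntegerLines` and `parseLine` are made up for the example, and the two sample lines stand in for the lines that `br.readLine()` would return in the question's loop.

```java
import java.util.ArrayList;
import java.util.List;

class IntegerLines {
    // Parses one line of whitespace-separated integers into a list.
    static List<Integer> parseLine(String data) {
        List<Integer> numbers = new ArrayList<>();
        String trimmed = data.trim();
        if (trimmed.isEmpty()) {
            return numbers; // blank line: empty list
        }
        for (String token : trimmed.split("\\s+")) {
            numbers.add(Integer.parseInt(token));
        }
        return numbers;
    }

    public static void main(String[] args) {
        String[] lines = { "5 2 3 5", "2 4 23 4 5 6 4" };
        List<List<Integer>> allLists = new ArrayList<>(); // one inner list per line
        for (String line : lines) {
            allLists.add(parseLine(line));
        }
        System.out.println(allLists); // prints [[5, 2, 3, 5], [2, 4, 23, 4, 5, 6, 4]]
    }
}
```

`Integer.parseInt` throws a NumberFormatException on a token that is not a number, so a real reader may want to catch that per line.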

Get csv and compare lines. ArrayList? Java

I don't use Java very often and now I have a problem.
I want to read a CSV file like this one:
A,B,C,D
A,B,F,K
E,F,S,A
A,B,C,S
A,C,C,S
Java arrays can't grow dynamically, so I chose an ArrayList. This works so far. The problem is:
How can I store the ArrayList? I think another ArrayList would help.
This is what I got:
BufferedReader reader = new BufferedReader(
new InputStreamReader(this.getClass().getResourceAsStream(
"../data/" + filename + ".csv")));
List rows = new ArrayList();
String line;
while ((line = reader.readLine()) != null) {
rows.add(Arrays.asList(line.split(",")));
}
Now I get an ArrayList with a size of 5 for rows.size().
How do I get row[0][0] for example?
What do I want to do? The problem is that I want to find rows that are the same except for the last column.
For example i want to find row 0 and row 3.
thank you very much
Thank you all! You helped me a lot. =) Maybe Java and I will become friends =) THANKS!
You don't need to know the row size in advance, String.split() returns a String array:
List<String[]> rows = new ArrayList<String[]>();
String line = null;
while((line = reader.readLine()) != null)
rows.add(line.split(",", -1));
To access a specific row:
int len = rows.get(0).length;
String val = rows.get(0)[0];
Also, are you always comparing by the entire row except the last column? You could just take off the last value (line.replaceFirst(",[^,]*$", "")) and compare the rows as strings (have to be careful of whitespace and other formatting, of course).
A slightly different way:
Set<String> rows = new HashSet<String>();
String line = null;
while((line = reader.readLine()) != null){
if(!rows.add(line.substring(0, line.lastIndexOf(','))))
System.out.println("duplicate found: " + line);
}
Of course, modify as necessary if you actually need to capture the matching lines.
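If you do need to capture which rows match, one possible variation is to map each row prefix to the indices of the rows that share it. This is only a sketch: `RowGrouper` and `groupByPrefix` are names invented for the example, and it assumes no value contains a quoted comma.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RowGrouper {
    // Maps each row prefix (everything before the last comma)
    // to the indices of all rows that share that prefix.
    static Map<String, List<Integer>> groupByPrefix(List<String> lines) {
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            int cut = line.lastIndexOf(',');
            String key = cut >= 0 ? line.substring(0, cut) : line;
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(i);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "A,B,C,D", "A,B,F,K", "E,F,S,A", "A,B,C,S", "A,C,C,S");
        for (Map.Entry<String, List<Integer>> e : groupByPrefix(lines).entrySet()) {
            if (e.getValue().size() > 1) {
                System.out.println(e.getKey() + " -> rows " + e.getValue()); // prints A,B,C -> rows [0, 3]
            }
        }
    }
}
```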
You'll need to declare an ArrayList of arrays. Assuming that the CSV file has a known number of columns, the only dynamic list needed here is the "rows" of your "table": an ArrayList (rows) of char[] arrays (columns). (If not, then an ArrayList of ArrayLists is fine.)
It's just like a 2D table in any other language: an array of arrays. Just that in this case one of the arrays needs to be dynamic.
To read the file you'll need two loops. One that reads each line, just as you're doing, and another one that reads char per char.
Just a quick note: if you are going to declare an array like this:
char[] row = new char[5];
and then going to add each row to the ArrayList like this:
yourList.add(row);
If you reuse that one array for every line, you will have a list full of references to the same array. You'll need to use the .clone() method like this:
yourList.add(row.clone());
To access it like table[1][2], you'll need to use arraylist.get(1)[2] for an ArrayList of arrays, or arraylist.get(1).get(2) for an ArrayList of ArrayLists.
