JAVA - import CSV to ArrayList - java

I'm trying to import a CSV file into an ArrayList using StringTokenizer:
public class Test
{
public static void main(String [] args)
{
List<ImportedXls> datalist = new ArrayList<ImportedXls>();
try
{
FileReader fr = new FileReader("c:\\temp.csv");
BufferedReader br = new BufferedReader(fr);
String stringRead = br.readLine();
while( stringRead != null )
{
StringTokenizer st = new StringTokenizer(stringRead, ",");
String docNumber = st.nextToken( );
String note = st.nextToken( ); /** PROBLEM */
String index = st.nextToken( ); /** PROBLEM */
ImportedXls temp = new ImportedXls(docNumber, note, index);
datalist.add(temp);
// read the next line
stringRead = br.readLine();
}
br.close( );
}
catch(IOException ioe){...}
for (ImportedXls item : datalist) {
System.out.println(item.getDocNumber());
}
}
}
I don't understand how nextToken works: if I initialize all three variables (docNumber, note and index) with nextToken(), it fails with:
Exception in thread "main" java.util.NoSuchElementException
at java.util.StringTokenizer.nextToken(Unknown Source)
at _test.Test.main(Test.java:32)
If I keep docNumber only, it works. Could you help me?

It seems that some rows of your input file have fewer than 3 comma-separated fields. You should always check whether the tokenizer has more tokens (StringTokenizer.hasMoreTokens) unless you are 100% sure your input is correct.
Parsing CSV files CORRECTLY is not a trivial task. Why not use a library that does it very well, such as http://opencsv.sourceforge.net/ ?
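The check suggested above can be sketched as follows. This is a minimal illustration, not the asker's code: the class name TokenizerCheck, the helper parseLine and the sample rows are made up for the example. It also shows why a row can come up short even when the commas are all there: StringTokenizer never returns empty tokens, so a row with an empty middle field yields only two tokens.

```java
import java.util.StringTokenizer;

class TokenizerCheck {
    /** Returns the three fields of a line, or null if the line does not yield exactly three tokens. */
    static String[] parseLine(String line) {
        StringTokenizer st = new StringTokenizer(line, ",");
        // countTokens() reports how many more times nextToken() can be called safely.
        if (st.countTokens() != 3) {
            return null;
        }
        return new String[] { st.nextToken(), st.nextToken(), st.nextToken() };
    }

    public static void main(String[] args) {
        // Hypothetical sample rows: a complete row, a row with an empty middle field, a blank row.
        String[] rows = { "D001,first note,7", "D002,,8", "" };
        for (String row : rows) {
            String[] fields = parseLine(row);
            if (fields == null) {
                System.out.println("skipped: \"" + row + "\"");
            } else {
                System.out.println("ok: " + fields[0]);
            }
        }
    }
}
```

Returning null for short rows is only one option; throwing an exception or logging the line number would work just as well, depending on how strict the import needs to be.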

Seems like your code is getting to a line that the Tokenizer is only breaking up into 1 part instead of 3. Is it possible to have lines with missing data? If so, you need to handle this.

Most probably your input file doesn't contain another element delimited by , in at least one line. Please show us your input - if possible the line that fails.
However, you don't need to use StringTokenizer. Using String#split() might be easier:
...
while( stringRead != null )
{
String[] elements = stringRead.split(",");
if(elements.length < 3) {
throw new RuntimeException("line too short"); //handle missing entries
}
String docNumber = elements[0];
String note = elements[1];
String index = elements[2];
ImportedXls temp = new ImportedXls(docNumber, note, index);
datalist.add(temp);
// read the next line
stringRead = br.readLine();
}
...

You should be able to check your tokens using the hasMoreTokens() method. If this returns false, then it's possible that the line you've read does not contain anything (i.e., an empty string).
It would be better, though, to use the String.split() method: StringTokenizer is not deprecated, but its Javadoc describes it as a legacy class whose use is discouraged in new code.
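One detail worth knowing when switching to String.split(): by default it drops trailing empty strings, so a row whose last fields are empty still comes back too short. Passing a negative limit keeps them. The sketch below uses a made-up class name (SplitDemo) and sample row, and it does not handle quoted fields that contain commas.

```java
import java.util.Arrays;

class SplitDemo {
    /** Splits a row on commas and keeps empty fields, including trailing ones. */
    static String[] fields(String line) {
        // A negative limit tells split() not to discard trailing empty strings.
        return line.split(",", -1);
    }

    public static void main(String[] args) {
        String row = "D002,,"; // hypothetical row with two empty trailing fields
        System.out.println(row.split(",").length);        // trailing empties dropped
        System.out.println(fields(row).length);           // trailing empties kept
        System.out.println(Arrays.toString(fields(row)));
    }
}
```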

Related

Putting a text file into an ArrayList, but if word exist it skips it

I'm struggling a bit here. I'm trying to add each word from a text file to an ArrayList, and every time the reader comes across a word it has already seen, it should skip it. (Does that make sense?)
I don't even know where to start. I think I need one loop that adds the text file to the ArrayList and one that checks whether the word is already in the list. Any ideas?
PS: Just started with Java
This is what I've done so far, don't even know if I'm on the right path..
public String findWord(){
int text = 0;
int i = 0;
while sc.hasNextLine()){
wordArray[i] = sc.nextLine();
}
if wordArray[i].contains() {
}
i++;
}
A List (an ArrayList or otherwise) is not the best data structure to use; a Set is better. In pseudo code:
define a Set
for each word
if adding to the set returns false, skip it
else do whatever you want to do with the (first time encountered) word
The add() method of Set returns true if the set changed as a result of the call, which only happens if the word isn't already in the set, because sets disallow duplicates.
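The pseudocode above could look like this in Java. It is a sketch under a few assumptions: the class name UniqueWords and the method firstOccurrences are invented for the example, and the words are passed in as a list rather than read from a file. A HashSet tracks what has been seen, while a separate list keeps the words in the order they first appeared.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class UniqueWords {
    /** Returns each word once, in order of first appearance. */
    static List<String> firstOccurrences(Iterable<String> words) {
        Set<String> seen = new HashSet<>();
        List<String> result = new ArrayList<>();
        for (String word : words) {
            // add() returns false when the word is already in the set, so duplicates are skipped.
            if (seen.add(word)) {
                result.add(word);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Hi", "how", "are", "you", "Who", "are", "you");
        System.out.println(firstOccurrences(words));
    }
}
```

If the order of the words does not matter, the separate list is unnecessary and the set itself is the result.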
I once made a similar program; it read through a text file and counted how many times each word came up.
I'd start by importing a Scanner and the file classes (these imports need to be at the top of the Java class):
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.File;
import java.io.PrintStream;
import java.util.Scanner;
Then you can create the File, as well as a Scanner reading from it; make sure to adjust the path to the file accordingly. The new PrintStream is not necessary, but when dealing with a large amount of data I don't like to flood the console.
public static void main(String[] args) throws FileNotFoundException {
File file=new File("E:/Youtube analytics/input/input.txt");
Scanner scanner = new Scanner(file); //will read from the file above
PrintStream out = new PrintStream(new FileOutputStream("E:/Youtube analytics/output/output.txt"));
System.setOut(out);
}
After this you can use scanner.next() to get the next word, so you would write something like this:
String[] array = new String[MaxAmountOfWords]; // holds each distinct word once
int numberOfWords = 0;
String currentWord = "";
while (scanner.hasNext()) {
    currentWord = scanner.next();
    // isNotInArray is a helper you still have to write yourself
    if (isNotInArray(currentWord))
    {
        array[numberOfWords] = currentWord;
        numberOfWords++; // only advance when a word was actually stored
    }
}
If you don't understand any of this or need further guidance to progress, let me know. It is hard to help you if we don't know exactly where you are at...
You can try this:
public List<String> getAllWords(String filePath) throws IOException {
    List<String> allWords = new ArrayList<String>();
    // try-with-resources closes the reader even if reading fails
    try (BufferedReader reader = new BufferedReader(new FileReader(new File(filePath)))) {
        String line;
        // read each line of the file
        while ((line = reader.readLine()) != null) {
            // split on runs of non-word characters to get each word in the line
            for (String word : line.split("\\W+")) {
                // skip empty strings and words that are already in the list
                if (!word.isEmpty() && !allWords.contains(word)) {
                    allWords.add(word);
                }
            }
        }
    }
    return allWords;
}
Best solution is to use a Set. But if you still want to use a List, here goes:
Suppose the file has the following data:
Hi how are you
I am Hardi
Who are you
Code will be:
List<String> list = new ArrayList<>();
// Get the file.
FileInputStream fis = new FileInputStream("C:/Users/hdinesh/Desktop/samples.txt");
//Construct BufferedReader from InputStreamReader
BufferedReader br = new BufferedReader(new InputStreamReader(fis));
String line = null;
// Loop through each line in the file
while ((line = br.readLine()) != null) {
// Regex for finding just the words
String[] strArray = line.split("[ ]");
for (int i = 0; i< strArray.length; i++) {
if (!list.contains(strArray[i])) {
list.add(strArray[i]);
}
}
}
br.close();
System.out.println(list.toString());
If your text file has sentences with special characters, you will have to write a regex for that.

how can i use split() with a big number of elements, java

I need to process a big text file. There are almost 400 columns in each line and almost 800000 lines in the file. Each line has a format like this:
340,9,2,3........5,2,LA
What I want to do is, for each line, print the first column if the last column is LA.
I wrote a simple program to do it:
BufferedReader bufr = new BufferedReader(new FileReader ("A.txt"));
BufferedWriter bufw = new BufferedWriter(new FileWriter ("LA.txt"));
String line = null;
while ((line = bufr.readLine()) != null) {
String [] text = new String [388];
text = line.split(",");
if (text [387] == args[2]) {
bufw.write(text[0]);
bufw.newLine();
bufw.flush();
}
}
bufw.close();
bufr.close();
But it seems the length of an array can't be that big: I received a java.lang.ArrayIndexOutOfBoundsException.
I'm using split(",") to get the last column of a line, and the index ends up out of bounds. How can I handle this? Thanks.
text does not need to be initialized, String.split will create a correctly sized array:
String[] text = line.split(",");
You're also comparing Strings using reference equality (==). You should be using .equals():
if (text[387].equals(args[2])) { ... }
You're probably getting java.lang.ArrayIndexOutOfBoundsException because index 387 is out of range for lines that split into fewer than 388 elements. If you want to get the last element, use this:
text[text.length - 1]
Modify and try this
String [] text = line.split(",");
if (text [text.length - 1].equals(args[2])) {
bufw.write(text[0]);
bufw.newLine();
bufw.flush();
}
Assuming args[2] is LA.
String [] text;
Change your code to this. You don't need to initialize a size. When the String.split method executes it will automatically initialize the correct size for your array.
If you just need the first and the last column, then there is no need to create an array out of the current line.
You could do something like this:
final String test = "340,9,2,354,63,5,5,45,634,5,5,2,LA";
final char delimiter = ',';
final String lastColumn = test.substring(test.lastIndexOf(delimiter) + 1);
if (lastColumn.equals("LA")) {
final String firstColumn = test.substring(0, test.indexOf(delimiter));
System.out.println(firstColumn);
}
This code extracts the last column first and tests it. If it matches "LA", it then extracts the first column. The remaining content of the line is ignored.
Your code would be:
BufferedReader bufr = new BufferedReader(new FileReader ("A.txt"));
BufferedWriter bufw = new BufferedWriter(new FileWriter ("LA.txt"));
final char delimiter = ','; // same delimiter as in the snippet above
String line = null;
while ((line = bufr.readLine()) != null) {
final String lastColumn = line.substring(line.lastIndexOf(delimiter) + 1);
if (lastColumn.equals(args[2])) {
bufw.write(line.substring(0, line.indexOf(delimiter)));
bufw.newLine();
bufw.flush();
}
}
bufw.close();
bufr.close();
(this code is not tested yet, but you get the idea :))

Find a string in a very large formatted text file in java

Here is the thing:
I have a really big text file and it has a format like this:
0007476|000011434982|00249626000|R|2008-01-11 00:00:00|9999-12-31 23:59:59|000019.99
0007476|000014017887|00313865000|R|2011-04-19 00:00:00|9999-12-31 23:59:59|000599.99
...
...
And I need to find if a particular pattern exists in the file, say
0007476|whatever|00313865000|whatever
All I need is a boolean saying yes or no.
Now what I have done is to read the file line by line and do a regular expression matching:
Pattern pattern = Pattern.compile(regex);
Scanner scanner = new Scanner(new File(fileName));
String line;
while (scanner.hasNextLine()) {
line = scanner.nextLine();
if (pattern.matcher(line).matches()) {
scanner.close();
return true;
}
}
and the regex has a form of
"0007476\|\d{12}\|0031386500.*"
This method works, but it takes usually 15 seconds to search for a string that is far from the start line. Is there a faster way to achieve that? Thanks
The java String class has a contains method which returns a boolean. If your string is fixed, this is a lot faster than a regular expression:
if (string.contains("0007476|") && string.contains("|00313865000|")) {
// whatever
}
Hope that helped, if not, leave a comment.
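Note that contains() matches the two fragments anywhere in the line, not necessarily in the first and third fields. If the position matters, a slightly stricter check is still possible without a regular expression. The following is only a sketch: the class name FieldMatch and the method matchesFields are invented here, and it assumes the first and third '|'-separated fields must match exactly.

```java
class FieldMatch {
    /** Returns true if the first and third '|'-separated fields equal the given values. */
    static boolean matchesFields(String line, String first, String third) {
        // The line must begin with the first field followed by a separator.
        if (!line.startsWith(first + "|")) {
            return false;
        }
        // Skip over the second field, whatever it contains.
        int secondStart = first.length() + 1;
        int secondEnd = line.indexOf('|', secondStart);
        if (secondEnd < 0) {
            return false;
        }
        // The third field must start right after the second separator.
        return line.startsWith(third + "|", secondEnd + 1);
    }

    public static void main(String[] args) {
        String line = "0007476|000014017887|00313865000|R|2011-04-19 00:00:00|9999-12-31 23:59:59|000599.99";
        System.out.println(matchesFields(line, "0007476", "00313865000"));
    }
}
```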
I assume that you need the Scanner because the file is too big to read into a single String instead?
If that is not the case, you can probably use a regular expression that finds the match directly. Depending on whether or not you care about the specific text at the start of the line, you can use something along the lines of:
"(?m)^0007476\|\d{12}\|0031386500.*$"
If you do need to break it up into smaller chunks because of memory usage I would suggest not reading on a per line basis, (since the lines are rather short), but process bigger chunks using something like a BufferedReader instead?
I fiddled around a bit with a 1.25GB file and the following is about 2.5 times faster than your implementation:
private static boolean matches() throws IOException {
String regex = "(?m)^0007476\\|\\d{12}\\|0031386500.*$"; // backslashes must be doubled inside a Java string literal
Pattern pattern = Pattern.compile(regex);
try(BufferedReader br = new BufferedReader(new FileReader(FILENAME))) {
for(String lines; (lines = readLines(br, 10000)) != null; ) {
if (pattern.matcher(lines).find()) {
return true;
}
}
}
return false;
}
private static String readLines(BufferedReader br, int amount) throws IOException {
StringBuilder builder = new StringBuilder();
int lineCounter = 0;
// check the counter first, so that no line is read and then thrown away when the chunk is full
for(String line; lineCounter < amount && (line = br.readLine()) != null; lineCounter++ ) {
builder.append(line).append(System.lineSeparator());
}
return lineCounter > 0 ? builder.toString() : null;
}

Trying to parse a string to int from a file. Get NumberFormatException: For input string: "", even though the string appears to be a good int string. Java

I have been trying to figure this out for a couple of hours now and I hope one of you can help me. I have a file (actually two, but that's not important) that has some rows and columns of numbers with blank spaces between them. I'm reading it with BufferedReader, and that works great: I can print out the strings and chars however I want. But when I try to parse those strings and chars I get the following error:
Exception in thread "main" java.lang.NumberFormatException: For input string: ""
at java.lang.NumberFormatException.forInputString(Unknown Source)
at java.lang.Integer.parseInt(Unknown Source)
at java.lang.Integer.parseInt(Unknown Source)
at FileProcess.processed(FileProcess.java:30)
at DecisionTree.main(DecisionTree.java:16)
From what I have found with google I think the error is located in how I read my file.
public class ReadFiles {
private BufferedReader read;
public ReadFiles(BufferedReader startRead) {
read = startRead;
}
public String readFiles() throws IOException {
try {
String readLine = read.readLine().trim();
String readStuff = "";
while(readLine != null) {
readStuff += (readLine + "\n");
readLine = read.readLine();
}
return readStuff;
}
catch(NumberFormatException e) {
return null;
}
}
And for the parsing bit
public class FileProcess {
public String processed() throws IOException {
fileSelect fs = new fileSelect();
ReadFiles tr = new ReadFiles(fs.traning());
String training = tr.readFiles();
ReadFiles ts = new ReadFiles(fs.test());
String test = ts.readFiles();
List liste = new List(14,test.length());
String[] test2 = test.split("\n");
for(int i = 0; i<test2[0].length(); i++) {
char tmp = test.charAt(i);
String S = Character.toString(tmp).trim();
//int i1 = Integer.parseInt(S);
System.out.print(S);
}
This isn't the actual code for what I'm planning to do with the output, but the error appears at the line that is commented out. My string output is as follows:
12112211
Which seems good to parse to integer. But it does not work. I tried to manually see what's in the char position 0 and 1, for 0 I get 1, but for 1 I get nothing aka "". So how can I remove the ""? I hope you guys can help me out, and let me know if you need more info. But I think I have covered what's needed.
Thanks in advance :)
Yeah, and another thing: If I replace "" with "0" it works, but then I get all those zeros which I can't find a clever way to remove. But is it possible to maybe skip them while parsing or something? My files only hold 1 and 2, so it wouldn't interfere with anything if it is possible.
The string "" is returned if two of the splitting characters are next to each other (e.g. \n\n), or if a whitespace character is passed to the trim() call. So ignore empty strings and carry on.
You could use the Scanner class to parse ints, skipping whitespace:
java.util.Scanner sc = new java.util.Scanner (line);
int value = sc.nextInt ();
Another idea is to trim the line, split it, and parse the parts:
String lin = line.trim ();
String [] words = lin.split (" +");
for (String si : words)
    Integer.parseInt (si);
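Putting these suggestions together, a line of space-separated numbers can be parsed while skipping anything empty. This is a sketch, with a made-up class name (ParseDigits) and the assumption that values are separated by whitespace, as the question describes.

```java
import java.util.ArrayList;
import java.util.List;

class ParseDigits {
    /** Parses all whitespace-separated integers on a line, ignoring blanks. */
    static List<Integer> parseLine(String line) {
        List<Integer> values = new ArrayList<>();
        // trim() removes leading whitespace, which would otherwise produce an empty first token.
        for (String token : line.trim().split("\\s+")) {
            // A blank line still yields one empty token, so skip it instead of parsing "".
            if (token.isEmpty()) {
                continue;
            }
            values.add(Integer.parseInt(token));
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(parseLine("1 2 1 1 2 2 1 1"));
        System.out.println(parseLine(""));
    }
}
```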

BufferedReader: read multiple lines into a single string

I'm reading numbers from a txt file using BufferedReader for analysis. The way I'm going about this now is to read a line using .readLine() and then split that string into an array of strings using .split():
public InputFile () {
fileIn = null;
//stuff here
fileIn = new FileReader((filename + ".txt"));
buffIn = new BufferedReader(fileIn);
return;
//stuff here
}
public String ReadBigStringIn() {
String line = null;
try { line = buffIn.readLine(); }
catch(IOException e){};
return line;
}
public ProcessMain() {
initComponents();
String[] stringArray;
String line;
try {
InputFile stringIn = new InputFile();
line = stringIn.ReadBigStringIn();
stringArray = line.split("[^0-9.+Ee-]+");
// analysis etc.
}
}
This works fine, but what if the txt file has multiple lines of text? Is there a way to output a single long string, or perhaps another way of doing it? Maybe use while(buffIn.readline != null) {}? Not sure how to implement this.
Ideas appreciated,
thanks.
You are right, a loop would be needed here.
The usual idiom (using only plain Java) is something like this:
public String ReadBigStringIn(BufferedReader buffIn) throws IOException {
StringBuilder everything = new StringBuilder();
String line;
while( (line = buffIn.readLine()) != null) {
everything.append(line);
}
return everything.toString();
}
This removes the line breaks - if you want to retain them, don't use the readLine() method, but simply read into a char[] instead (and append this to your StringBuilder).
Please note that this loop will run until the stream ends (and will block if it doesn't end), so if you need a different condition to finish the loop, implement it in there.
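The char[] variant mentioned above, which keeps the line breaks, might look like the following. It is a sketch rather than part of the original answer: the class name ReadAll, the method readAll and the buffer size of 8192 are arbitrary choices for the example.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class ReadAll {
    /** Reads everything from the reader into one string, line breaks included. */
    static String readAll(Reader reader) throws IOException {
        StringBuilder everything = new StringBuilder();
        char[] buffer = new char[8192];
        int read;
        // read() returns the number of chars placed in the buffer, or -1 at end of stream.
        while ((read = reader.read(buffer)) != -1) {
            everything.append(buffer, 0, read);
        }
        return everything.toString();
    }

    public static void main(String[] args) throws IOException {
        // A StringReader stands in for the file reader here.
        System.out.print(readAll(new StringReader("1.5 2.5\n3e2\n")));
    }
}
```

Because the method takes any Reader, it works the same way with a BufferedReader wrapped around a FileReader.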
I would strongly advise using a library here, but since Java 8 you can also do this with streams.
try (InputStreamReader in = new InputStreamReader(System.in);
BufferedReader buffer = new BufferedReader(in)) {
final String fileAsText = buffer.lines().collect(Collectors.joining());
System.out.println(fileAsText);
} catch (Exception e) {
e.printStackTrace();
}
Note that this is also fairly efficient, since Collectors.joining() uses a StringBuilder internally.
If you just want to read the entirety of a file into a string, I suggest you use Guava's Files class:
String text = Files.toString(new File("filename.txt"), Charsets.UTF_8);
Of course, that's assuming you want to maintain the linebreaks. If you want to remove the linebreaks, you could either load it that way and then use String.replace, or you could use Guava again:
List<String> lines = Files.readLines(new File("filename.txt"), Charsets.UTF_8);
String joined = Joiner.on("").join(lines);
Sounds like you want FileUtils from Apache Commons IO:
String text = FileUtils.readFileToString(new File(filename + ".txt"), StandardCharsets.UTF_8);
String[] stringArray = text.split("[^0-9.+Ee-]+");
If you create a StringBuilder, then you can append every line to it, and return the String using toString() at the end.
You can replace your ReadBigStringIn() with
public String ReadBigStringIn() {
StringBuilder b = new StringBuilder();
try {
String line = buffIn.readLine();
while (line != null) {
b.append(line);
line = buffIn.readLine();
}
}
catch(IOException e){};
return b.toString();
}
You have a file containing doubles. Looks like you have more than one number per line, and may have multiple lines.
Simplest thing to do is read lines in a while loop.
You could return null from your ReadBigStringIn method when last line is reached and terminate your loop there.
But more normal would be to create and use the reader in one method. Perhaps you could change to a method which reads the file and returns an array or list of doubles.
BTW, could you simply split your strings by whitespace?
Reading a whole file into a single String may suit your particular case, but be aware that it could exhaust memory if the file is very large. A streaming approach is generally safer for this kind of I/O.
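A method that reads the file line by line and returns a list of doubles, as suggested above, could be sketched like this. The names ReadDoubles and readDoubles are invented for the example, and it assumes the numbers are separated by whitespace, which may not match the asker's actual file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class ReadDoubles {
    /** Reads whitespace-separated doubles from all lines of the reader. */
    static List<Double> readDoubles(BufferedReader in) throws IOException {
        List<Double> values = new ArrayList<>();
        String line;
        // Only one line is held in memory at a time, however large the file is.
        while ((line = in.readLine()) != null) {
            for (String token : line.trim().split("\\s+")) {
                if (!token.isEmpty()) { // blank lines produce a single empty token
                    values.add(Double.parseDouble(token));
                }
            }
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        // A StringReader stands in for the FileReader used in the question.
        BufferedReader in = new BufferedReader(new StringReader("1.5 2\n\n3e2 -4\n"));
        System.out.println(readDoubles(in));
    }
}
```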
This creates one long string in which each line is preceded by a single space (" "):
public String ReadBigStringIn() {
    StringBuffer line = new StringBuffer();
    try {
        while (buffIn.ready()) {
            line.append(" " + buffIn.readLine());
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    return line.toString();
}
