How to delimit new line when reading CSV file?

I am trying to read a file in which each line holds comma-separated values that are meant to populate an object's data members. I used the regex "|" operator to combine "," with "\n" and "\r" so that the scanner also moves on to the next line. However, after the first line has been read, the first data member of the second line is not read right away; an empty string "" is read first. Am I using the wrong regex symbols, or is this the wrong approach altogether? I read that there are many ways to tackle this and chose Scanner because it seemed the simplest. BufferedReader seemed confusing, since it appears to return arrays rather than the individual strings and ints I am trying to get.
The CSV file looks something like this
stringA,stringB,stringC,1,2,3
stringD,stringE,stringF,4,5,6
stringG,stringH,stringI,7,8,9
My code looks something like this
//In list class
public void load() throws FileNotFoundException
{
Scanner input = new Scanner(new FileReader("a_file.csv"));
object to_add; //To be added to the list
input.useDelimiter(",|\\n|\\r");
while (input.hasNext())
{
String n = input.next(); //After the first loop run, this data gets the value ""
String l = input.next(); //During this second run, this member gets the data that n was supposed to get, "stringD"
String d = input.next(); //This one gets "stringE"
int a = input.nextInt(); //And this one tries to get "stringF", which makes it crash
int c = input.nextInt();
to_add = new object(n, l, d, a, b, c); //Calling copy constructor to populate data members
insert(to_add); //Inserting object to the list
}
input.close();
}

Use Apache Commons CSV. Here is the user guide https://commons.apache.org/proper/commons-csv/user-guide.html

You can do this with OpenCSV, and here is a tutorial on how to use the library. You can download the library from the Maven Repository.
The following code does what you need:
Reader reader = Files.newBufferedReader(Paths.get("path/to/csvfile.csv"));
CSVReader csvReader = new CSVReader(reader);
List<String[]> dataList = new ArrayList<>();
dataList = csvReader.readAll();
reader.close();
csvReader.close();
Object to_add;
for (String[] rowData : dataList) {
String textOne = rowData[0];
String textTwo = rowData[1];
String textThree = rowData[2];
int numberOne = Integer.parseInt(rowData[3]);
int numberTwo = Integer.parseInt(rowData[4]);
int numberThree = Integer.parseInt(rowData[5]);
to_add = new Object(textOne, textTwo, textThree, numberOne, numberTwo, numberThree);
insert(to_add);
}
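If you would rather stay with Scanner, the empty token most likely comes from Windows line endings: each line ends in \r\n, and the delimiter ",|\\n|\\r" treats \r and \n as two separate delimiters with an empty string between them. A minimal sketch, assuming the file uses \r\n or \n line endings and contains no quoted fields (the class and method names are made up for illustration), matches the whole line break as a single delimiter:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class CsvScannerSketch {
    // Reads rows of three strings followed by three ints, as in the sample file.
    // Each row is returned as one space-separated string to keep the sketch short.
    static List<String> readRows(String csvText) {
        List<String> rows = new ArrayList<>();
        try (Scanner input = new Scanner(csvText)) {
            // A comma, or a complete line break (\r\n or \n), counts as ONE delimiter.
            input.useDelimiter(",|\\r?\\n");
            while (input.hasNext()) {
                String n = input.next();
                String l = input.next();
                String d = input.next();
                int a = input.nextInt();
                int b = input.nextInt();
                int c = input.nextInt();
                rows.add(n + " " + l + " " + d + " " + a + " " + b + " " + c);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // In-memory stand-in for a_file.csv with Windows line endings.
        String csv = "stringA,stringB,stringC,1,2,3\r\n"
                   + "stringD,stringE,stringF,4,5,6\r\n";
        for (String row : readRows(csv)) {
            System.out.println(row);
        }
    }
}
```

For files with quoted fields or embedded commas, a CSV library such as the ones suggested above is still the safer choice.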

Related

Java, Reading two different types of variables from a file and using them as objects later

I am working on a project that is based on reading text from a file and turning it into objects in my code.
My file has the following elements:
(ignore the bullet points)
4
Christmas Party
20
Valentine
12
Easter
5
Halloween
8
The first line declares how many "parties" I have in my text file (it's 4 in this case).
Every party has two lines - the first line is the name and the second one is the number of places available.
So for example, Christmas Party has 20 places available
Here's my code for saving the information from the file as objects.
public class Parties
{
static Scanner input = new Scanner(System.in);
public static void main(String[] args) throws FileNotFoundException
{
Scanner inFile = new Scanner(new FileReader ("C:\\desktop\\file.txt"));
int first = inFile.nextInt();
inFile.nextLine();
for(int i=0; i < first ; i++)
{
String str = inFile.nextLine();
String[] e = str.split("\\n");
String name = e[0];
int tickets= Integer.parseInt(e[1]); //this is where it throws an ArrayIndexOutOfBoundsException; I read about it and I still don't understand
Party newParty = new Party(name, tickets);
System.out.println(name+ " " + tickets);
}
This is my SingleParty Class:
public class SingleParty
{
private String name;
private int tickets;
public Party(String newName, int newTickets)
{
newName = name;
newTickets = tickets;
}
Can someone explain to me how I could approach this error?
Thank you

str only contains the party name and splitting it won't work, as it won't have '\n' there.
It should be like this within the loop:
String name = inFile.nextLine();
int tickets = inFile.nextInt();
Party party = new Party(name, tickets);
// Print it here.
if (inFile.hasNextLine()) inFile.nextLine(); // consume the rest of the line after the number
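Put together, the loop could look like the sketch below. It assumes the file layout from the question (a count, then a name line and a ticket line per party) and reads every value with nextLine(), so no extra call is needed to skip the line break after a number. The Party class here is a hypothetical stand-in for the asker's class; note that its constructor assigns the parameters to the fields, whereas the constructor in the question assigns in the opposite direction (newName = name), which would leave the fields unset.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class PartyReaderSketch {
    // Hypothetical stand-in for the asker's Party class.
    static class Party {
        final String name;
        final int tickets;

        Party(String name, int tickets) {
            this.name = name;       // parameter -> field, not the other way round
            this.tickets = tickets;
        }
    }

    // Reads the count on the first line, then one name line and one number line per party.
    static List<Party> readParties(Scanner in) {
        int count = Integer.parseInt(in.nextLine().trim());
        List<Party> parties = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            String name = in.nextLine();
            int tickets = Integer.parseInt(in.nextLine().trim());
            parties.add(new Party(name, tickets));
        }
        return parties;
    }

    public static void main(String[] args) {
        // In-memory stand-in for the text file.
        String text = "2\nChristmas Party\n20\nValentine\n12\n";
        for (Party p : readParties(new Scanner(text))) {
            System.out.println(p.name + " " + p.tickets);
        }
    }
}
```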

You could create a HashMap and put all the options into that during your iteration.
HashMap<String, Integer> hmap = new HashMap<>();
while (sc.hasNext()) {
String name = sc.nextLine();
int tickets = Integer.parseInt(sc.nextLine());
hmap.put(name, tickets);
}
You can now do what you need with each entry in the HashMap.
Note: this assumes you've done something with the first line of the text file, the 4 in your example.

nextLine() returns a single string.
Consider the first iteration, for example, "Christmas Party".
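If you split this string by \n, all you get is an array of length 1 containing "Christmas Party", so e[1] does not exist and the exception is thrown. Splitting cannot recover the ticket count because it is on the next line of the file; read it with another nextLine() call instead.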

Java input/output and Scanner object

Below are two functions in my class, I want to first read the number of lines from a text file, then store the contents in an array. The problem I am having is that if I do not comment out int aNumber = numOfObjects(newInput); the array does not get stored and printed, it's as if numOfObjects function got to the end of the text file, and I can no longer access it. If I comment it out it works fine. I tried adding a second Scanner object but it didn't help. What can I do to make it work?
public void correctListItems(FileInputStream inputFile,FileOutputStream outputFile){
newInput = new Scanner(inputFile);
forCapturing = new Scanner(inputFile);
int aNumber = numOfObjects(newInput);
System.out.println(aNumber);
for(int i=0; forCapturing.hasNextLine(); i++){
publicationArray[i] = new Publication();
publicationArray[i].publication_code = forCapturing.nextLong();
publicationArray[i].publication_name = forCapturing.next();
publicationArray[i].publication_year = forCapturing.nextInt();
publicationArray[i].publication_authorname = forCapturing.next();
publicationArray[i].publication_cost = forCapturing.nextDouble();
publicationArray[i].publication_nbpages = forCapturing.nextInt();
System.out.println(publicationArray[i]);
System.out.println("-----------------------------------\n");
}
}
private int numOfObjects(Scanner aScanner){
int count = 0;
while (aScanner.hasNextLine()){
count++;
aScanner.nextLine(); //if this isn't included you'll experience an infinite loop
}
System.out.println(count);
return count;
}
}

There is a way to do this the way you want, i.e. by reading through the file twice: first to count, then to capture.
Just add the lines below after the line int aNumber = numOfObjects(newInput); in the correctListItems function.
public void correctListItems(FileInputStream inputFile,FileOutputStream outputFile){
newInput = new Scanner(inputFile);
int aNumber = numOfObjects(newInput);
newInput.close();
inputFile.close();
inputFile = new FileInputStream(
new File(
"inputfile.txt"));
System.out.println(aNumber);
forCapturing = new Scanner(inputFile);
for(int i=0; forCapturing.hasNextLine(); i++){
....
....
So closing both the scanner and the file is important. Creating the FileInputStream again resets the file pointer to the beginning of the file. As you might already know, if the input file is not in the project folder, you have to give the complete path.
As good practice, it is always better to close both the scanner object and the file object once you are done with them, and then re-create them to start working on the file again.
Hope this helps.

It looks like the Scanner class uses an iterator internally. This means that it needs to be closed at some point, which I can't find in your code. Therefore I would (1) add the following line to the numOfObjects function before the return: aScanner.close().
(2) I would create the second Scanner instance after you have called the function, just to be sure. Hope it works.
Cheers!

The scanner doesn't move to the next line unless you call nextLine. So the loop is infinite since you're always on the first line.
But why do you need to know the number of objects in advance? Why not simply use a list instead of publicationArray?
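The list suggestion can be sketched roughly as follows. This is an illustration under assumptions, not the asker's actual code: the Publication class and its field names are hypothetical stand-ins, and the input is assumed to be whitespace-separated with '.' as the decimal separator. The file is read once, and the list's size() replaces the separate counting pass.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Scanner;

public class PublicationListSketch {
    // Hypothetical stand-in for the asker's Publication class.
    static class Publication {
        long code;
        String name;
        int year;
        String authorName;
        double cost;
        int pages;
    }

    // Reads publications in a single pass; no line count is needed up front.
    static List<Publication> readAll(Scanner in) {
        in.useLocale(Locale.US); // nextDouble() then expects '.' as the decimal separator
        List<Publication> publications = new ArrayList<>();
        // hasNextLong() stops cleanly at trailing blank lines, unlike hasNextLine().
        while (in.hasNextLong()) {
            Publication p = new Publication();
            p.code = in.nextLong();
            p.name = in.next();
            p.year = in.nextInt();
            p.authorName = in.next();
            p.cost = in.nextDouble();
            p.pages = in.nextInt();
            publications.add(p);
        }
        return publications;
    }

    public static void main(String[] args) {
        // In-memory stand-in for the input file.
        String text = "1001 Title1 1999 Smith 12.5 300\n"
                    + "1002 Title2 2005 Jones 7.25 120\n";
        List<Publication> publications = readAll(new Scanner(text));
        System.out.println("Number of publications: " + publications.size());
    }
}
```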

I am not exactly sure, but I am fairly certain that once the Scanner has read the bytes from the FileInputStream, the stream stays at the end of the file, so a second Scanner on the same stream has nothing left to read.
How about changing your code to:
public void correctListItems(FileInputStream inputFile,FileOutputStream outputFile){
forCapturing = new Scanner(inputFile);
int i = 0; // declared outside the loop so it is still in scope for the final println
for(; forCapturing.hasNextLine(); i++){
publicationArray[i] = new Publication();
publicationArray[i].publication_code = forCapturing.nextLong();
publicationArray[i].publication_name = forCapturing.next();
publicationArray[i].publication_year = forCapturing.nextInt();
publicationArray[i].publication_authorname = forCapturing.next();
publicationArray[i].publication_cost = forCapturing.nextDouble();
publicationArray[i].publication_nbpages = forCapturing.nextInt();
System.out.println(publicationArray[i]);
System.out.println("-----------------------------------\n");
}
System.out.println("Number of lines: "+ i);
}
At least this way you do not have to loop over the same data twice. It also performs better, since everything you need is done in a single pass.

Java: Read a text file into an array

I have a txt file made up of two columns, like this:
Name1 _ Opt1
Name2 _ Opt2
Name3 _ Opt3
In each row there's a name, a tab delimiter, a _ and then another name. There are a great many rows (about 150000), and I'm not sure which data structure is best to use. I'm thinking of a two-dimensional array, but it could be something else if that is a better choice. It is important to me that I can access the elements with something like a[x][y].
I've done this, but I only know how to count the number of lines or how to put each line in a different position of an array.
Here's the code:
int countLine = 0;
BufferedReader reader = new BufferedReader(new FileReader(filename));
while (true) {
String line = reader.readLine();
if (line == null) {
reader.close();
break;
} else {
countLine++;
}
}

Since you don't know the number of lines ahead of time, I would use an ArrayList instead of an array. The splitting of lines into String values can easily be done with a regular expression.
Pattern pattern = Pattern.compile("(.*)\t_\t(.*)");
List<String[]> list = new ArrayList<>();
int countLine = 0;
BufferedReader reader = new BufferedReader(new FileReader(filename));
while (true) {
String line = reader.readLine();
if (line == null) {
reader.close();
break;
} else {
Matcher matcher = pattern.matcher(line);
if (matcher.matches()) {
list.add(new String[] { matcher.group(1), matcher.group(2) });
}
countLine++;
}
}

The first thing you should do is to write a class that represents an entry in your file. It could be quite sophisticated but a really simple design will probably also do.
class Record {
final String name;
final String option;
Record(final String name, final String option) {
this.name = name;
this.option = option;
}
}
Using this class is much better than messing with arrays of strings.
The second thing to do is to use a more abstract data structure than an array structure to put your records into. This will free you from the burden of having to know the number of elements in advance. I recommend that you use an ArrayList for this. Then, you can read in one record at a time and add it to your collection.
List<Record> records = new ArrayList<Record>();
records.add(new Record("NameX", "OptionX"));
System.out.printf("There are %d records in the list.%n", records.size());
Of course, the second line in the above example should be done over and over again in your loop that reads the lines of the file.
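A sketch of that loop is shown below. It assumes each line has the form name, tab, underscore, tab, option (the question does not say for certain whether a tab follows the underscore), and it silently skips lines that do not fit. The class name RecordReaderSketch and the method readRecords are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class RecordReaderSketch {
    // Same shape as the Record class described above.
    static class Record {
        final String name;
        final String option;

        Record(String name, String option) {
            this.name = name;
            this.option = option;
        }
    }

    // Reads lines of the assumed form name<TAB>_<TAB>option.
    static List<Record> readRecords(BufferedReader reader) throws IOException {
        List<Record> records = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            // Limit 2: split only at the first separator.
            String[] parts = line.split("\t_\t", 2);
            if (parts.length == 2) {
                records.add(new Record(parts[0], parts[1]));
            }
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for the file; use new FileReader(filename) for a real one.
        String sample = "Name1\t_\tOpt1\nName2\t_\tOpt2\nName3\t_\tOpt3\n";
        try (BufferedReader reader = new BufferedReader(new StringReader(sample))) {
            List<Record> records = readRecords(reader);
            System.out.printf("There are %d records in the list.%n", records.size());
            System.out.println(records.get(1).name + " -> " + records.get(1).option);
        }
    }
}
```

Access is then by index and field, for example records.get(x).name, rather than a[x][y].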

Use an ArrayList instead of an array because the size is unknown. Use Scanner to read the file, and use the hasNextLine() method to check whether the file has another line:
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Scanner;
class Test {
static Scanner userInput = new Scanner(System.in);
public static void main(String args[]) throws FileNotFoundException {
int countline = 0;
Scanner inp=new Scanner(new File("/home/nasir/Desktop/abc.txt"));
ArrayList<String> list=new ArrayList<String>();
while(inp.hasNextLine()){
list.add(inp.nextLine());// adding a row in ArrayList
countline++;// counting every line/row
}
System.out.println(countline+" "+list.get(2));
}// Main
}// Class

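You can save the data in a nested map such as
HashMap<String, HashMap<String, Result>>
(where Result stands for whatever type you want to store).
The first String (key) is your name.
The second String (key) is your opt, and its value is the result object.
You can use it as:
result = yourHashMap.get(name).get(opt);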

Java - How to read integers separated by a space into an array

I am having trouble with my project because I can't get the beginning correct, which is to read a line of integers separated by a space from the user and place the values into an array.
System.out.println("Enter the elements separated by spaces: ");
String input = sc.next();
StringTokenizer strToken = new StringTokenizer(input);
int count = strToken.countTokens();
//Reads in the numbers to the array
System.out.println("Count: " + count);
int[] arr = new int[count];
for(int x = 0;x < count;x++){
arr[x] = Integer.parseInt((String)strToken.nextElement());
}
This is what I have, and it only seems to read the first element in the array because when count is initialized, it is set to 1 for some reason.
Can anyone help me? Would it be better to do this a different way?

There is only a tiny change necessary to make your code work. The error is in this line:
String input = sc.next();
As pointed out in my comment under the question, it only reads the next token of input. See the documentation.
If you replace it with
String input = sc.nextLine();
it will do what you want it to do, because nextLine() consumes the whole line of input.
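As a rough illustration of the difference, the sketch below reads one whole line and converts it to an int array. The Scanner is built over a fixed string so the example is self-contained; in the real program it would wrap System.in. The class and method names are made up for this example.

```java
import java.util.Arrays;
import java.util.Scanner;

public class IntLineSketch {
    // Splits a line on runs of whitespace and parses each piece as an int.
    static int[] parseInts(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return new int[0]; // an empty line yields an empty array
        }
        return Arrays.stream(trimmed.split("\\s+"))
                     .mapToInt(Integer::parseInt)
                     .toArray();
    }

    public static void main(String[] args) {
        // Stand-in for new Scanner(System.in).
        Scanner sc = new Scanner("54 65 74\n");
        String input = sc.nextLine(); // the whole line; sc.next() would return only "54"
        int[] arr = parseInts(input);
        System.out.println("Count: " + arr.length);
        System.out.println(Arrays.toString(arr));
    }
}
```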

String integers = "54 65 74";
List<Integer> list = new ArrayList<Integer>();
for (String s : integers.split("\\s"))
{
list.add(Integer.parseInt(s));
}
list.toArray();

This would be an easier way to do the same:
System.out.println("Enter the elements separated by spaces: ");
String input = sc.nextLine();
String[] split = input.split("\\s+");
int[] desiredOP = new int[split.length];
int i=0;
for (String string : split) {
desiredOP[i++] = Integer.parseInt(string);
}

There are alternative ways to achieve the same thing, but when I tried your code, it seemed to work properly.
StringTokenizer strToken = new StringTokenizer("a b c");
int count = strToken.countTokens();
System.out.println(count);
It prints count as 3. The default delimiters are whitespace characters.
I don't know how you are getting your input. Maybe it is not returning the complete input as a string.
I think you are using java.util.Scanner to read your input.
From the Scanner Javadoc:
A Scanner breaks its input into tokens using a delimiter pattern,
which by default matches whitespace. The resulting tokens may then be
converted into values of different types using the various next
methods.
Hence the input is returning just one integer and leaving the rest unread.
Read the documentation for Scanner#next(). You should use Scanner#nextLine() instead.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.StringTokenizer;
BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
.
.
.
StringTokenizer st = new StringTokenizer(br.readLine());
int K = Integer.parseInt(st.nextToken());
int N= Integer.parseInt(st.nextToken());

loading large matrix from text file into Java arrays

My data is stored in large matrices stored in txt files with millions of rows and 4 columns of comma-separated values. (Each column stores a different variable, and each row stores a different millisecond's data for all four variables.) There is also some irrelevant header data in the first dozen or so lines. I need to write Java code to load this data into four arrays, with one array for each column in the txt matrix. The Java code also needs to be able to tell when the header is done, so that the first data row can be split into entries for the 4 arrays. Finally, the java code needs to iterate through the millions of data rows, repeating the process of decomposing each row into four numbers which are each entered into the appropriate array for the column in which the number was located.
Can anyone show me how to alter the code below in order to accomplish this?
I want to find the fastest way to accomplish this processing of millions of rows. Here is my code:
MainClass2.java
package packages;
public class MainClass2{
public static void main(String[] args){
readfile2 r = new readfile2();
r.openFile();
int x1Count = r.readFile();
r.populateArray(x1Count);
r.closeFile();
}
}
readfile2.java
package packages;
import java.io.*;
import java.util.*;
public class readfile2 {
private Scanner scan1;
private Scanner scan2;
public void openFile(){
try{
scan1 = new Scanner(new File("C:\\test\\samedatafile.txt"));
scan2 = new Scanner(new File("C:\\test\\samedatafile.txt"));
}
catch(Exception e){
System.out.println("could not find file");
}
}
public int readFile(){
int scan1Count = 0;
while(scan1.hasNext()){
scan1.next();
scan1Count += 1;
}
return scan1Count;
}
public double[] populateArray(int scan1Count){
double[] outputArray1 = new double[scan1Count];
double[] outputArray2 = new double[scan1Count];
double[] outputArray3 = new double[scan1Count];
double[] outputArray4 = new double[scan1Count];
int i = 0;
while(scan2.hasNext()){
//what code do I write here to:
// 1.) identify the start of my time series rows after the end of the header rows (e.g. row starts with a number AT LEAST 4 digits in length.)
// 2.) split each time series row's data into a separate new entry for each of the 4 output arrays
i++;
}
return outputArray1, outputArray2, outputArray3, outputArray4;
}
public void closeFile(){
scan1.close();
scan2.close();
}
}
Here are the first 19 lines of a typical data file:
text and numbers on first line
1 msec/sample
3 channels
ECG
Volts
Z_Hamming_0_05_LPF
Ohms
dz/dt
Volts
min,CH2,CH4,CH41,
,3087747,3087747,3087747,
0,-0.0518799,17.0624,0,
1.66667E-05,-0.0509644,17.0624,-0.00288295,
3.33333E-05,-0.0497437,17.0624,-0.00983428,
5E-05,-0.0482178,17.0624,-0.0161573,
6.66667E-05,-0.0466919,17.0624,-0.0204402,
8.33333E-05,-0.0448608,17.0624,-0.0213986,
0.0001,-0.0427246,17.0624,-0.0207532,
0.000116667,-0.0405884,17.0624,-0.0229672,
EDIT
I tested Shilaghae's code suggestion. It seems to work. However, the length of all the resulting arrays is the same as x1Count, so that zeros remain in the places where Shilaghae's pattern matching code is not able to place a number. (This is a result of how I wrote the code originally.)
I was having trouble finding the indices where zeros remain, but there seemed to be a lot more zeros besides the ones expected where the header was. When I graphed the derivative of the temp[1] output, I saw a number of sharp spikes where false zeros in temp[1] might be. If I can tell where the zeros in temp[1], temp[2], and temp[3] are, I might be able to modify the pattern matching to better retain all the data.
Also, it would be nice to simply shorten the output array to no longer include the rows where the header was in the input file. However, the tutorials I have found regarding variable length arrays only show oversimplified examples like:
int[] anArray = {100, 200, 300, 400};
The code might run faster if it no longer uses scan1 to produce scan1Count. I do not want to slow the code down by using an inefficient method to produce a variable-length array. And I also do not want to skip data in my time series in the cases where the pattern matching is not able to split the input row into 4 numbers. I would rather keep the in-time-series zeros so that I can find them and use them to debug the pattern matching.
Can anyone show how to do these things in fast-running code?
SECOND EDIT
So
"-{0,1}\\d+.\\d+,"
repeats four times in the expression:
"-{0,1}\\d+.\\d+,-{0,1}\\d+.\\d+,-{0,1}\\d+.\\d+,-{0,1}\\d+.\\d+,"
Does
"-{0,1}\\d+.\\d+,"
decompose into the following three statements:
"-{0,1}" means that a minus sign occurs zero or one times, while
"\\d+." means that the minus sign(or lack of minus sign) is followed by several digits of any value followed by a decimal point, so that finally
"\\d+," means that the decimal point is followed by several digits of any value?
If so, what about numbers in my data like "1.66667E-05," or "-8.06131E-05," ? I just scanned one of the input files, and (out of 3+ million 4-column rows) it contains 638 numbers that contain E, of which 5 were in the first column, and 633 were in the last column.
FINAL EDIT
The final code was very simple, and simply involved using string.split() with "," as the regular expression. To do that, I had to manually delete the headers from the input file so that the data only contained rows with 4 comma separated numbers.
In case anyone is curious, the final working code for this is:
public double[][] populateArray(int scan1Count){
double[] outputArray1 = new double[scan1Count];
double[] outputArray2 = new double[scan1Count];
double[] outputArray3 = new double[scan1Count];
double[] outputArray4 = new double[scan1Count];
try {
File tempfile = new File("C:\\test\\mydatafile.txt");
FileInputStream fis = new FileInputStream(tempfile);
DataInputStream in = new DataInputStream(fis);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
String strLine;
int i = 0;
while ((strLine = br.readLine()) != null) {
String[] split = strLine.split(",");
outputArray1[i] = Double.parseDouble(split[0]);
outputArray2[i] = Double.parseDouble(split[1]);
outputArray3[i] = Double.parseDouble(split[2]);
outputArray4[i] = Double.parseDouble(split[3]);
i++;
}
} catch (IOException e) {
System.out.println("e for exception is:"+e);
e.printStackTrace();
}
double[][] temp = new double[4][];
temp[0]= outputArray1;
temp[1]= outputArray2;
temp[2]= outputArray3;
temp[3]= outputArray4;
return temp;
}
Thank you for everyone's help. I am going to close this thread now because the question has been answered.

You could read the file line by line and, for every line, check with a regular expression (http://www.vogella.de/articles/JavaRegularExpressions/article.html) whether the line contains exactly four comma-terminated numbers.
If it does, you can split the line with String.split and fill the four arrays; otherwise you move on to the next line.
public double[][] populateArray(int scan1Count){
double[] outputArray1 = new double[scan1Count];
double[] outputArray2 = new double[scan1Count];
double[] outputArray3 = new double[scan1Count];
double[] outputArray4 = new double[scan1Count];
//Read File Line By Line
try {
File tempfile = new File("samedatafile.txt");
FileInputStream fis = new FileInputStream(tempfile);
DataInputStream in = new DataInputStream(fis);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
String strLine;
int i = 0;
while ((strLine = br.readLine()) != null) {
Pattern pattern = Pattern.compile("-{0,1}\\d+.\\d+,-{0,1}\\d+.\\d+,-{0,1}\\d+.\\d+,-{0,1}\\d+.\\d+,");
Matcher matcher = pattern.matcher(strLine);
if (matcher.matches()){
String[] split = strLine.split(",");
outputArray1[i] = Double.parseDouble(split[0]);
outputArray2[i] = Double.parseDouble(split[1]);
outputArray3[i] = Double.parseDouble(split[2]);
outputArray4[i] = Double.parseDouble(split[3]);
}
i++;
}
} catch (IOException e) {
e.printStackTrace();
}
double[][] temp = new double[4][];
temp[0]= outputArray1;
temp[1]= outputArray2;
temp[2]= outputArray3;
temp[3]= outputArray4;
return temp;
}
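One caveat with the pattern above: each field must be digits, then one arbitrary character (the unescaped "."), then more digits. Rows from the sample data such as 0,-0.0518799,17.0624,0, and fields such as 1.66667E-05 do not fit, which would explain rows left at zero. A possible alternative pattern, offered only as a sketch (the names NUMBER, ROW and isDataRow are made up here), allows an optional fraction and an optional exponent and is compiled once rather than on every line:

```java
import java.util.regex.Pattern;

public class RowPatternSketch {
    // One numeric field: optional sign, digits, optional fraction, optional exponent.
    static final String NUMBER = "[-+]?\\d+(?:\\.\\d+)?(?:[eE][-+]?\\d+)?";

    // Exactly four such fields, each followed by a comma, as in the sample data rows.
    static final Pattern ROW = Pattern.compile("(?:" + NUMBER + ",){4}");

    static boolean isDataRow(String line) {
        return ROW.matcher(line).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "min,CH2,CH4,CH41,",
            ",3087747,3087747,3087747,",
            "0,-0.0518799,17.0624,0,",
            "1.66667E-05,-0.0509644,17.0624,-0.00288295,"
        };
        for (String line : samples) {
            System.out.println(isDataRow(line) + "  " + line);
        }
    }
}
```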

You can split up each line using String.split().
To skip the headers, you can either read the first N lines and discard them (if you know how many there are) or you will need to look for a specific marker - difficult to advise without seeing your data.
You may also need to change your approach a little because you currently seem to be sizing the arrays according to the total number of lines (assuming your Scanner returns lines?) rather than omitting the count of header lines.

I'd deal with the problem of the headers by simply attempting to parse every line as four numbers, and throwing away any lines where the parsing doesn't work. If there is a possibility of unparseable lines after the header lines, then you can set a flag the first time you get a "good" line, and then report any subsequent "bad" lines.
Split the lines with String.split(...). It is not the absolute fastest way to do it, but the CPU time of your program will be spent elsewhere ... so it probably doesn't matter.
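A minimal sketch of this parse-and-discard approach follows. It assumes every data row splits into exactly four numeric fields, as in the sample file, and it collects rows in a list so that no counting pass is needed and the result contains no header rows. The class and method names are illustrative only, and reporting of bad lines after the first good one is left out for brevity.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class MatrixLoaderSketch {
    // Returns a 4 x N array: one inner array per column, N = number of parseable rows.
    static double[][] load(BufferedReader reader) throws IOException {
        List<double[]> rows = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            String[] parts = line.split(","); // trailing empty fields are dropped
            if (parts.length != 4) {
                continue; // header text or a malformed row
            }
            try {
                double[] row = new double[4];
                for (int c = 0; c < 4; c++) {
                    // parseDouble accepts forms like 1.66667E-05 and rejects "" or "min"
                    row[c] = Double.parseDouble(parts[c].trim());
                }
                rows.add(row);
            } catch (NumberFormatException e) {
                // not a data row (e.g. "min,CH2,CH4,CH41,"), so skip it
            }
        }
        // Transpose the collected rows into four column arrays.
        double[][] columns = new double[4][rows.size()];
        for (int r = 0; r < rows.size(); r++) {
            for (int c = 0; c < 4; c++) {
                columns[c][r] = rows.get(r)[c];
            }
        }
        return columns;
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for the data file; use new FileReader(path) for a real one.
        String sample = "1 msec/sample\nmin,CH2,CH4,CH41,\n,3087747,3087747,3087747,\n"
                      + "0,-0.0518799,17.0624,0,\n1.66667E-05,-0.0509644,17.0624,-0.00288295,\n";
        double[][] columns = load(new BufferedReader(new StringReader(sample)));
        System.out.println("Data rows kept: " + columns[0].length);
    }
}
```

Holding millions of small row arrays costs extra memory compared with four preallocated arrays, so for very large files a counting pass or manually grown double[] buffers may be preferable.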
