So, I have a 2D array in a file and I am trying to find the number of rows and columns in it. I succeeded in finding the number of rows, but I am not able to find the number of columns. My data is separated by tabs in the file.
array.txt
4 5 1 9 0
3 4 5 0 5
2 7 7 4 5
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class readFile
{
    private static Scanner infile;

    public static void main(String[] args) {
        // Opening file for reading data
        try
        {
            infile = new Scanner(new File("C:\\Users\\array.txt"));
        }
        catch (FileNotFoundException fnfe)
        {
            System.out.println("Error Creating File");
            System.exit(1);
        }

        // COUNTING THE NUMBER OF ROWS AND COLUMNS IN FILE
        int rows = 0;
        while (infile.hasNextLine())
        {
            rows++;
            infile.nextLine();
        }

        int columns = 0;
        if (infile.hasNextLine())
        {
            columns = infile.nextLine().split("\t").length;
        }

        System.out.println(rows);
        System.out.println(columns);
    }
}
This code loops until the end of the file:
while(infile.hasNextLine())
{
rows++;
infile.nextLine();
}
So your subsequent check if (infile.hasNextLine()) always returns false, and columns stays 0.
You should either recreate the Scanner and read the file again, or first work out the column count from the first line and then count the remaining lines.
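A minimal sketch of the second approach, assuming the same file path and tab delimiter as in the question:

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class CountRowsAndColumns {
    public static void main(String[] args) throws FileNotFoundException {
        Scanner infile = new Scanner(new File("C:\\Users\\array.txt"));

        int rows = 0;
        int columns = 0;
        if (infile.hasNextLine()) {
            // The first line gives the column count and also counts as the first row.
            columns = infile.nextLine().split("\t").length;
            rows = 1;
        }
        while (infile.hasNextLine()) {
            rows++;
            infile.nextLine();
        }
        infile.close();

        System.out.println(rows);
        System.out.println(columns);
    }
}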
What is the current output of this when you run your code?
columns = infile.nextLine().split("\t").length;
Is it 0 or 4? If it is 4, then you just have to add 1, I guess, since there are only 4 tabs in each row. Also, it is brittle to use spaces or tabs as the delimiter; I would recommend a comma instead.
If the output is 0, that could mean that the separators in your text file are not actually tab characters and so do not match "\t".
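If the separators might turn out to be spaces rather than tabs, one defensive option (my suggestion, not something from the original answer) is to split the first line on any run of whitespace:

String firstLine = infile.nextLine();                    // read before the Scanner is exhausted
int columns = firstLine.trim().split("\\s+").length;     // handles tabs, spaces, or a mix of both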
I want to print a newline character at certain positions instead of after every one of my numbers.
I tried printing the newline with System.out.print("\n"); outside of the loop, but I can't figure out how to put it at a certain position. I want my chart to be a simple 10 x 10 chart, with 10 numbers per row and 10 rows in total. My numbers.txt has 100 random numbers.
import java.io.*;
import java.util.Scanner;

public class numbersChart {
    public static void main(String[] args) throws IOException {
        File myFile = new File("numbers.txt");
        Scanner text = new Scanner(myFile);
        while (text.hasNextInt()) {
            System.out.print("\n");
            System.out.printf("%6s", " ");
            System.out.print(text.nextInt());
        }
    }
}
You don't really need the while loop; the idea is to print \n after every 10th number.
Try the code below:
public static void main(String[] args) throws IOException {
    File myFile = new File("numbers.txt");
    Scanner scanner = new Scanner(myFile);
    int d = 10;
    for (int i = 0; i < d * d; i++) {
        if (scanner.hasNextInt()) {
            System.out.print(scanner.nextInt());
        } else {
            break;
        }
        if ((i + 1) % d == 0) {
            System.out.println();
        } else {
            System.out.print(" ");
        }
    }
}
Build your rows as you read the file, pretty much in the same fashion as you are doing, and print each row as it is developed. Using a while loop to read the file is the usual practice for this sort of thing, since you may not necessarily know how many numbers are actually contained within the file used to generate the chart (table). In your particular case you happen to know there are going to be 100 numbers, but this may not always be the case in other real-world situations. Your code should be able to create table rows of ten (or whatever you desire) whether there is 1 number or 1 million (or more) numbers in the file.
A slightly different take on the task is to consider a numerical data file that contains any number of values, be they signed or unsigned integers, floating-point numbers, or both. We want to read this file and display a numerical table with a desired number of rows, each row holding a desired number of columns, except perhaps the last row, for which there may not be enough values left in the data file to fill the required number of columns. The columns are also to be spaced to our desired width when the table is created.
With the supplied code below, this can all be accomplished. Maximum rows, columns, and table spacing are all configurable. I suggest you read the comments in the code for further insight:
/* Create a numbers.txt file for testing. Save a copy
of your existing one somewhere else for safekeeping
if you use this portion of code! */
int quantityOfNumbers = 100;
// 'Try With Resources' used here to auto-close the writer.
try (java.io.PrintWriter writer = new java.io.PrintWriter(new java.io.File("numbers.txt"))) {
for (int i = 1; i <= quantityOfNumbers; i++) {
writer.append(String.valueOf(i));
if (i < quantityOfNumbers) {
writer.write(System.lineSeparator());
}
}
writer.flush();
}
catch (FileNotFoundException ex) {
System.err.println(ex.getMessage());
System.exit(0);
}
// ---------------------------------------------------------------------
String fileName = "numbers.txt"; // The numerical data file to read.
int desiredColumns = 10; // The number of columns per row you want.
int formatSpacing = 8; // Each column will be this many spaces wide.
int maxRows = 0; // Max rows we might want. If 0 then unlimited.
java.io.File myFile = new java.io.File(fileName); // File Object
// Read & process the numerical data file...
// 'Try With Resources' used here to auto-close the reader.
try (Scanner reader = new Scanner(myFile)) {
String num;
StringBuilder sb = new StringBuilder(""); // Used for building each numerical row
int columnCounter = 0; // Used to keep track of columns added to 'sb'.
int rowCounter = 0; // Used to keep track of the number of rows created.
while (reader.hasNextLine()) { // Read file until there is nothing left to read.
num = reader.nextLine().trim(); // Retrieve data line on each iteration.
/* Make sure the line in the file is actually a number
and not something alphanumeric or a blank line. Carry
out some form of validation. The regular expression
(regex) within the matches() method allows for signed
or unsigned integer or floating point string numerical
values. If it's invalid we skip the line. Remove this
`if` block if you want everything regardless: */
if (!num.matches("-?\\d+(\\.\\d+)?")) {
continue;
}
columnCounter++; // Valid line so increment Column Counter by 1
/* Format the string value as we append to sb. If you want the
table values right justified then remove the '-' from the format. */
sb.append(String.format("%-" + String.valueOf(formatSpacing) + "s", num));
if (columnCounter == desiredColumns) { // Have we reached our desired number of columns?
System.out.println(sb.toString()); // Print the row to console.
sb.setLength(0); // Clear the StringBuilder object (sb).
columnCounter = 0; // Reset the Column Counter to 0.
rowCounter++; // Increment Row Counter by 1
if (rowCounter == maxRows) { // If we've reached our max rows then stop reading.
break; // Break out of 'while' loop.
}
}
}
/* Reading has finished but is there anything that didn't
make 10 columns worth of data? If the StringBuilder
object (sb) contains anything at this point then yes
there is something so print it to console window. */
if (!sb.toString().isEmpty()) {
System.out.println(sb.toString());
}
}
catch (FileNotFoundException ex) {
// Whoops...can't find the numerical data file.
System.err.println("Can not locate the numerical data file!");
System.err.println("Data File: --> " + myFile.getAbsolutePath());
}
How do I read a .csv Excel file with x rows and y columns, ignore irrelevant cells (things like names), and then compute an average of the numbers in each column?
The Excel file I have looks something like this (a comma indicates a new cell):
ID, week 1, week 2, week 3, .... , week 7
0 , 1 , 0.5 , 0 , , 1.2
1 , 0.5 , 1 , 0.5 , , 0.5
y , ......
So, how do I read that kind of .csv file and then compute the averages in the format Week 1 = (week 1 average), Week 2 = (week 2 average), and so on for all weeks?
Also am I correct in assuming I need to use a 2D Array for this?
Edit
Here's my code so far, it's very crude and I'm not sure if it does things properly yet:
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
public class ClassAverage {
public static void main(String[] args){
readFile2Array("attendance.csv");
}
public static double[][] readFile2Array(String fileName){
try {
int rowCount = 0;
int colCount = 0;
Scanner rc = new Scanner(new File("attendance.csv"));
while (rc.hasNextLine()) {
rowCount++;
rc.nextLine();
}
rc.close();
System.out.println(rowCount);
Scanner cc = new Scanner(new File("attendance.csv"));
while (cc.hasNext()) {
colCount++;
cc.next();
}
cc.close();
colCount = colCount/rowCount;
System.out.println(colCount);
Scanner sc = new Scanner(new File("attendance.csv"));
double[][] spreadSheet = new double[rowCount][colCount];
while (sc.hasNext()) {
for (int i=0; i<spreadSheet.length; ++i){
for (int j=0; j<spreadSheet[i].length; ++j){
spreadSheet[i][j] = Double.parseDouble(sc.next());
}
}
}
sc.close();
return spreadSheet;
} catch (FileNotFoundException e) {
System.out.println("File cannot be opened");
e.printStackTrace();
}
return null;
}
public static double weeklyAvg(double[][] a){
}
}
So a summary of what it's intended to do
readFile2Array: read the csv file and count the number of rows, then count the total number of cells, divide total number of cells by number of rows to find number of columns. Read again and put each cell into the correct place in a 2D array.
weeklyAvg: I haven't thought up a way to do this yet, but it's supposed to read the array column by column and compute an average for each column, then print out the result.
PS: I'm very new to Java, so I have no idea what some suggestions mean; I'd really appreciate suggestions that are pure Java, without add-ons (I'm not sure if that's what some people are suggesting). I hope it's not too much to ask for (if it's even possible).
You can use a Java library to handle your CSV file, for example opencsv (you can find the latest Maven version here: http://mvnrepository.com/artifact/com.opencsv/opencsv/3.5).
Then you can parse your file like this:
CSVReader reader = new CSVReader(new FileReader("PATH_TO_YOUR_FILE"));
String[] nextLine;
int counter = 0;
while ((nextLine = reader.readNext()) != null) {
    // nextLine[] is an array of values from the line
    if (counter > 0) {                 // counter == 0 is the header row, so skip it
        System.out.println(nextLine[0] + nextLine[1]);
    }
    counter++;
}
You have to ignore the header line; you can simply do this by incrementing a counter and skipping the row where it is zero, as shown above.
To compute the averages you can use a HashMap where the key is the column header name (for example "week 1"). Then you add the current line's value to the running total, and after the loop completes you divide by the number of lines (don't forget to subtract the ignored lines, such as the header line).
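A minimal sketch of that idea, assuming the opencsv reader from above and that every column after the ID holds a parseable number (the file name comes from your code; blank cells would need extra handling):

import com.opencsv.CSVReader;
import java.io.FileReader;
import java.util.LinkedHashMap;
import java.util.Map;

public class WeeklyAverages {
    public static void main(String[] args) throws Exception {
        try (CSVReader reader = new CSVReader(new FileReader("attendance.csv"))) {
            String[] header = reader.readNext();                // e.g. ID, week 1, ..., week 7
            Map<String, Double> totals = new LinkedHashMap<>();
            String[] row;
            int dataRows = 0;
            while ((row = reader.readNext()) != null) {
                dataRows++;
                for (int col = 1; col < header.length; col++) { // skip the ID column
                    double value = Double.parseDouble(row[col].trim());
                    totals.merge(header[col].trim(), value, Double::sum);
                }
            }
            for (Map.Entry<String, Double> e : totals.entrySet()) {
                System.out.println(e.getKey() + " = " + (e.getValue() / dataRows));
            }
        }
    }
}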
For simple CSV files, it's easy enough to parse them manually, as long as you know the format is the same throughout the file and it does not contain errors:
Create a storage data structure for each column you wish to compute (use a LinkedList<String>)
Read through the CSV file line by line with a BufferedReader
Use String.split(",") on each line and add the specific columns in the returned array to the correct LinkedList
Loop through the LinkedLists at the end and compute your averages, using Double.parseDouble() to convert the Strings to doubles (a sketch of these steps follows below)
To make sure that the String you're attempting to parse is a double, you can either use a try-catch statement or use a regex. Check Java: how to check that a string is parsable to a double? for more information
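Here is a rough sketch of those steps, assuming the attendance.csv layout from the question (a header row, the ID in the first column, and weekly values after it); the column count is an assumption and the parsing is left unguarded for brevity:

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class ManualCsvAverages {
    public static void main(String[] args) throws Exception {
        int weekColumns = 7;                                 // week 1 .. week 7 (assumed)
        List<LinkedList<String>> columns = new ArrayList<>();
        for (int i = 0; i < weekColumns; i++) {
            columns.add(new LinkedList<String>());
        }

        try (BufferedReader br = new BufferedReader(new FileReader("attendance.csv"))) {
            String line = br.readLine();                     // read and discard the header row
            while ((line = br.readLine()) != null) {
                String[] cells = line.split(",");
                for (int i = 0; i < weekColumns && i + 1 < cells.length; i++) {
                    columns.get(i).add(cells[i + 1].trim()); // cells[0] is the ID, so skip it
                }
            }
        }

        for (int i = 0; i < weekColumns; i++) {
            double sum = 0;
            for (String s : columns.get(i)) {
                sum += Double.parseDouble(s);                // add a try-catch or regex check here if needed
            }
            System.out.println("Week " + (i + 1) + " = " + sum / columns.get(i).size());
        }
    }
}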
So I'm reading in a two-column data .txt file of the following form:
20 0.15
30 0.10
40 0.05
50 0.20
60 0.10
70 0.10
80 0.30
and I want to put the second column into an array ({0.15, 0.10, 0.05, 0.2, 0.1, 0.1, 0.3}), but I don't know how to parse the floats that are greater than 1. I've tried reading the file in with a Scanner and using delimiters, but I don't know how to get rid of the integer that precedes the token. Please help me.
here is my code for reference:
import java.io.PrintWriter;
import java.util.Scanner;
import java.io.*;
class OneStandard {
public static void main(String[] args) throws IOException {
Scanner input1 = new Scanner(new File("ClaimProportion.txt"));//reads in claim dataset txt file
Scanner input2 = new Scanner(new File("ClaimProportion.txt"));
Scanner input3 = new Scanner(new File("ClaimProportion.txt"));
int NumClaim = 0;
//this while loop counts the number of lines in the file
while (input1.hasNextLine()) {
NumClaim++;
input1.nextLine();
}
System.out.println("There are "+NumClaim+" different claim sizes in this dataset.");
int[] ClaimSize = new int[NumClaim];
System.out.println(" ");
System.out.println("The different Claim sizes are:");
//This for loop put the first column into an array
for (int i=0; i<NumClaim;i++){
ClaimSize[i] = input2.nextInt();
System.out.println(ClaimSize[i]);
input2.nextLine();
}
double[] ProportionSize = new double[NumClaim];
//this for loop is trying to put the second column into an array
for(int j=0; j<NumClaim; j++){
input3.skip("20");
ProportionSize[j] = input3.nextDouble();
System.out.println(ProportionSize[j]);
input3.nextLine();
}
}
}
You can use "YourString".split("regex");
Example:
String input = "20 0.15";
String[] items = input.split(" "); // split the string whose delimiter is a " "
float floatNum = Float.parseFloat(items[1]); // get the float column and parse
if (floatNum > 1){
// number is greater than 1
} else {
// number is less than 1
}
Hope this helps.
You only need one Scanner. If you know that each line always contains one int and one double, you can read the numbers directly instead of reading lines.
You also don't need to read the file once to get the number of lines, again to get the numbers etc. - you can do it in one go. If you use ArrayList instead of array, you won't have to specify the size - it will grow as needed.
Scanner scanner = new Scanner(new File("ClaimProportion.txt"));  // one Scanner is enough
List<Integer> claimSizes = new ArrayList<>();
List<Double> proportionSizes = new ArrayList<>();
while (scanner.hasNext()) {
    claimSizes.add(scanner.nextInt());
    proportionSizes.add(scanner.nextDouble());
}
Now number of lines is claimSizes.size() (also proportionSizes.size()). The elements are accessed by claimSizes.get(i) etc.
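If you specifically need a plain double[] at the end, as in your expected output, one simple way to convert (my addition, not part of the original answer) is:

double[] proportions = new double[proportionSizes.size()];
for (int i = 0; i < proportions.length; i++) {
    proportions[i] = proportionSizes.get(i);   // auto-unboxing Double -> double
}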
I am working on a lab for school so any help would be appreciated, but I do not want this solved for me. I am working in NetBeans and my main goal is to create a "two-dimensional" array by scanning in integers from a text file. So far, my program runs with no errors, but I am missing the first column of my array. My input looks like:
6
3
0 0 45
1 1 9
2 2 569
3 2 17
2 3 -17
5 3 9999
-1
where 6 is the number of rows, 3 is the number of columns, and -1 is the sentinel. My output looks like:
0 45
1 9
2 569
2 17
3 -17
3 9999
End of file detected.
BUILD SUCCESSFUL (total time: 0 seconds)
As you can see, everything prints correctly except for the missing first column.
Here is my program:
import java.io.*;
import java.util.Scanner;
public class Lab3
{
public static void main(String[] arg) throws IOException
{
File inputFile = new File("C:\\Users\\weebaby\\Documents\\NetBeansProjects\\Lab3\\src\\input.txt");
Scanner scan = new Scanner (inputFile);
final int SENT = -1;
int R=0, C=0;
int [][] rcArray;
//Reads in two values R and C, providing dimensions for the rows and columns.
R = scan.nextInt();
C = scan.nextInt();
//Creates two-dimensional array of size R and C.
rcArray = new int [R][C];
while (scan.nextInt() != SENT)
{
String line = scan.nextLine();
String[] numbers = line.split(" ");
int newArray[] = new int[numbers.length];
for (int i = 1; i < numbers.length; i++)
{
newArray[i] = Integer.parseInt(numbers[i]);
System.out.print(newArray[i]+" ");
}
System.out.println();
}
System.out.println("End of file detected.");
}
}
Clearly, there is a logical error here. Could someone please explain why the first column is missing? Is there a way I can use only my rcArray, or do I have to keep both rcArray and newArray? Also, how can I get my file path down to just "input.txt" so that it isn't so long? The file "input.txt" is located in my Lab3 src folder (the same folder as my program), so I thought I could just use File inputFile = new File("input.txt"); to locate the file, but I can't.
//Edit
Okay I have changed this part of my code:
for (int i = 0; i < numbers[0].length(); i++)
{
newArray[i] = Integer.parseInt(numbers[i]);
if (newArray[i]==SENT)
break;
System.out.print(newArray[i]+" ");
}
System.out.println();
Running the program (starting at 0 instead of 1) now gives the output:
0
1
2
3
2
5
which happens to be the first column. :) I'm getting somewhere!
//Edit 2
In case anyone cares, I figured everything out. :) Thanks for all of your help and feedback.
Since you do not want this solved for you, I will leave you with a hint:
Arrays in Java are 0 based, not 1 based.
As well as Jeffrey's point around the 0-based nature of arrays, look at this:
while (scan.nextInt() != SENT)
{
String line = scan.nextLine();
...
You're consuming an integer (using nextInt()) but all you're doing with that value is checking that it's not SENT. You probably want something like:
int firstNumber;
while ((firstNumber = scan.nextInt()) != SENT)
{
String line = scan.nextLine();
...
// Use line *and* firstNumber here
Or alternatively (and more cleanly IMO):
while (scan.hasNextLine())
{
String line = scan.nextLine();
// Now split the line... and use a break statement if the parsed first
// value is SENT.
My data is stored in large matrices in .txt files with millions of rows and 4 columns of comma-separated values. (Each column stores a different variable, and each row stores a different millisecond's data for all four variables.) There is also some irrelevant header data in the first dozen or so lines. I need to write Java code to load this data into four arrays, with one array for each column in the .txt matrix. The Java code also needs to be able to tell when the header is done, so that the first data row can be split into entries for the 4 arrays. Finally, the Java code needs to iterate through the millions of data rows, repeating the process of decomposing each row into four numbers which are each entered into the appropriate array for the column in which the number was located.
Can anyone show me how to alter the code below in order to accomplish this?
I want to find the fastest way to accomplish this processing of millions of rows. Here is my code:
MainClass2.java
package packages;
public class MainClass2{
public static void main(String[] args){
readfile2 r = new readfile2();
r.openFile();
int x1Count = r.readFile();
r.populateArray(x1Count);
r.closeFile();
}
}
readfile2.java
package packages;
import java.io.*;
import java.util.*;
public class readfile2 {
private Scanner scan1;
private Scanner scan2;
public void openFile(){
try{
scan1 = new Scanner(new File("C:\\test\\samedatafile.txt"));
scan2 = new Scanner(new File("C:\\test\\samedatafile.txt"));
}
catch(Exception e){
System.out.println("could not find file");
}
}
public int readFile(){
int scan1Count = 0;
while(scan1.hasNext()){
scan1.next();
scan1Count += 1;
}
return scan1Count;
}
public double[] populateArray(int scan1Count){
double[] outputArray1 = new double[scan1Count];
double[] outputArray2 = new double[scan1Count];
double[] outputArray3 = new double[scan1Count];
double[] outputArray4 = new double[scan1Count];
int i = 0;
while(scan2.hasNext()){
//what code do I write here to:
// 1.) identify the start of my time series rows after the end of the header rows (e.g. row starts with a number AT LEAST 4 digits in length.)
// 2.) split each time series row's data into a separate new entry for each of the 4 output arrays
i++;
}
return outputArray1, outputArray2, outputArray3, outputArray4;
}
public void closeFile(){
scan1.close();
scan2.close();
}
}
Here are the first 19 lines of a typical data file:
text and numbers on first line
1 msec/sample
3 channels
ECG
Volts
Z_Hamming_0_05_LPF
Ohms
dz/dt
Volts
min,CH2,CH4,CH41,
,3087747,3087747,3087747,
0,-0.0518799,17.0624,0,
1.66667E-05,-0.0509644,17.0624,-0.00288295,
3.33333E-05,-0.0497437,17.0624,-0.00983428,
5E-05,-0.0482178,17.0624,-0.0161573,
6.66667E-05,-0.0466919,17.0624,-0.0204402,
8.33333E-05,-0.0448608,17.0624,-0.0213986,
0.0001,-0.0427246,17.0624,-0.0207532,
0.000116667,-0.0405884,17.0624,-0.0229672,
EDIT
I tested Shilaghae's code suggestion. It seems to work. However, the length of all the resulting arrays is the same as x1Count, so that zeros remain in the places where Shilaghae's pattern matching code is not able to place a number. (This is a result of how I wrote the code originally.)
I was having trouble finding the indices where zeros remain, but there seemed to be a lot more zeros besides the ones expected where the header was. When I graphed the derivative of the temp[1] output, I saw a number of sharp spikes where false zeros in temp[1] might be. If I can tell where the zeros in temp[1], temp[2], and temp[3] are, I might be able to modify the pattern matching to better retain all the data.
Also, it would be nice to simply shorten the output array to no longer include the rows where the header was in the input file. However, the tutorials I have found regarding variable length arrays only show oversimplified examples like:
int[] anArray = {100, 200, 300, 400};
The code might run faster if it no longer uses scan1 to produce scan1Count. I do not want to slow the code down by using an inefficient method to produce a variable-length array. And I also do not want to skip data in my time series in the cases where the pattern matching is not able to split the input row into 4 numbers. I would rather keep the in-time-series zeros so that I can find them and use them to debug the pattern matching.
Can anyone show how to do these things in fast-running code?
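One way to avoid both the pre-counting pass with scan1 and the fixed-size arrays, offered only as a sketch and not as the approach this thread finally settled on, is to collect each successfully parsed row into an ArrayList and convert to arrays once the real row count is known (note that, unlike the request above, this drops unparseable rows rather than keeping zeros in their place):

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.List;

public class SinglePassLoader {
    public static void main(String[] args) throws Exception {
        List<double[]> rows = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(new FileReader("C:\\test\\samedatafile.txt"))) {
            String line;
            while ((line = br.readLine()) != null) {
                String[] parts = line.split(",");
                if (parts.length < 4) {
                    continue;                          // short header lines: skip
                }
                try {
                    rows.add(new double[] {
                        Double.parseDouble(parts[0]),
                        Double.parseDouble(parts[1]),
                        Double.parseDouble(parts[2]),
                        Double.parseDouble(parts[3])
                    });
                } catch (NumberFormatException e) {
                    // header lines such as "min,CH2,CH4,CH41," fail to parse and are skipped
                }
            }
        }
        // Convert to four column arrays only once the number of data rows is known.
        double[][] columns = new double[4][rows.size()];
        for (int i = 0; i < rows.size(); i++) {
            for (int c = 0; c < 4; c++) {
                columns[c][i] = rows.get(i)[c];
            }
        }
        System.out.println("Loaded " + rows.size() + " data rows.");
    }
}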
SECOND EDIT
So
"-{0,1}\\d+.\\d+,"
repeats four times in the expression:
"-{0,1}\\d+.\\d+,-{0,1}\\d+.\\d+,-{0,1}\\d+.\\d+,-{0,1}\\d+.\\d+,"
Does
"-{0,1}\\d+.\\d+,"
decompose into the following three statements:
"-{0,1}" means that a minus sign occurs zero or one times, while
"\\d+." means that the minus sign(or lack of minus sign) is followed by several digits of any value followed by a decimal point, so that finally
"\\d+," means that the decimal point is followed by several digits of any value?
If so, what about numbers in my data like "1.66667E-05," or "-8.06131E-05," ? I just scanned one of the input files, and (out of 3+ million 4-column rows) it contains 638 numbers that contain E, of which 5 were in the first column, and 633 were in the last column.
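For what it's worth, a pattern that also accepts an optional exponent, so that values such as "1.66667E-05" or "-8.06131E-05" would match, could look like the fragment below; this is my own guess at what would be needed, not something from the thread:

// Optional sign, digits, optional fraction, optional exponent, e.g. -8.06131E-05
String number = "-?\\d+(\\.\\d+)?([eE][-+]?\\d+)?";
Pattern rowPattern = Pattern.compile(number + "," + number + "," + number + "," + number + ",");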
FINAL EDIT
The final code was very simple and just involved using String.split() with "," as the regular expression. To do that, I had to manually delete the headers from the input file so that the data only contained rows with 4 comma-separated numbers.
In case anyone is curious, the final working code for this is:
public double[][] populateArray(int scan1Count){
double[] outputArray1 = new double[scan1Count];
double[] outputArray2 = new double[scan1Count];
double[] outputArray3 = new double[scan1Count];
double[] outputArray4 = new double[scan1Count];
try {
File tempfile = new File("C:\\test\\mydatafile.txt");
FileInputStream fis = new FileInputStream(tempfile);
DataInputStream in = new DataInputStream(fis);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
String strLine;
int i = 0;
while ((strLine = br.readLine()) != null) {
String[] split = strLine.split(",");
outputArray1[i] = Double.parseDouble(split[0]);
outputArray2[i] = Double.parseDouble(split[1]);
outputArray3[i] = Double.parseDouble(split[2]);
outputArray4[i] = Double.parseDouble(split[3]);
i++;
}
} catch (IOException e) {
System.out.println("e for exception is:"+e);
e.printStackTrace();
}
double[][] temp = new double[4][];
temp[0]= outputArray1;
temp[1]= outputArray2;
temp[2]= outputArray3;
temp[3]= outputArray4;
return temp;
}
Thank you for everyone's help. I am going to close this thread now because the question has been answered.
You could read the file line by line and, for every line, check with a regular expression (http://www.vogella.de/articles/JavaRegularExpressions/article.html) whether the line contains exactly 4 comma-terminated numbers.
If it does, you can split the line with String.split and fill the 4 arrays; otherwise you move on to the next line.
public double[][] populateArray(int scan1Count){
double[] outputArray1 = new double[scan1Count];
double[] outputArray2 = new double[scan1Count];
double[] outputArray3 = new double[scan1Count];
double[] outputArray4 = new double[scan1Count];
//Read File Line By Line
try {
File tempfile = new File("samedatafile.txt");
FileInputStream fis = new FileInputStream(tempfile);
DataInputStream in = new DataInputStream(fis);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
String strLine;
int i = 0;
Pattern pattern = Pattern.compile("-{0,1}\\d+.\\d+,-{0,1}\\d+.\\d+,-{0,1}\\d+.\\d+,-{0,1}\\d+.\\d+,"); // compile once, outside the loop
while ((strLine = br.readLine()) != null) {
Matcher matcher = pattern.matcher(strLine);
if (matcher.matches()){
String[] split = strLine.split(",");
outputArray1[i] = Double.parseDouble(split[0]);
outputArray2[i] = Double.parseDouble(split[1]);
outputArray3[i] = Double.parseDouble(split[2]);
outputArray4[i] = Double.parseDouble(split[3]);
}
i++;
}
} catch (IOException e) {
e.printStackTrace();
}
double[][] temp = new double[4][];
temp[0]= outputArray1;
temp[1]= outputArray2;
temp[2]= outputArray3;
temp[3]= outputArray4;
return temp;
}
You can split up each line using String.split().
To skip the headers, you can either read the first N lines and discard them (if you know how many there are), or look for a specific marker; it's difficult to advise without seeing your data.
You may also need to change your approach a little because you currently seem to be sizing the arrays according to the total number of lines (assuming your Scanner returns lines?) rather than omitting the count of header lines.
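A small sketch of the first option; the header count of 11 is just read off the sample shown above, so treat it as a placeholder:

import java.io.BufferedReader;
import java.io.FileReader;

public class SkipHeaderExample {
    public static void main(String[] args) throws Exception {
        int headerLines = 11;   // placeholder: however many header lines your file actually has
        try (BufferedReader br = new BufferedReader(new FileReader("C:\\test\\samedatafile.txt"))) {
            for (int i = 0; i < headerLines; i++) {
                br.readLine();                       // read and discard the header lines
            }
            String line;
            while ((line = br.readLine()) != null) {
                String[] split = line.split(",");
                // split[0] .. split[3] now hold the four column values for this row
            }
        }
    }
}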
I'd deal with the problem of the headers by simply attempting to parse every line as four numbers, and throwing away any lines where the parsing doesn't work. If there is a possibility of unparseable lines after the header lines, then you can set a flag the first time you get a "good" line, and then report any subsequent "bad" lines.
Split the lines with String.split(...). It is not the absolute fastest way to do it, but the CPU time of your program will be spent elsewhere ... so it probably doesn't matter.
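A rough fragment of the parse-and-discard idea described above, assuming a BufferedReader br like the one in the final working code (the flag and the error reporting are my own illustration, not code from this thread):

boolean seenGoodLine = false;
String strLine;
while ((strLine = br.readLine()) != null) {
    String[] split = strLine.split(",");
    try {
        double t  = Double.parseDouble(split[0]);
        double v1 = Double.parseDouble(split[1]);
        double v2 = Double.parseDouble(split[2]);
        double v3 = Double.parseDouble(split[3]);
        seenGoodLine = true;
        // ... store t, v1, v2 and v3 in the four output arrays here ...
    } catch (RuntimeException e) {
        // NumberFormatException or ArrayIndexOutOfBoundsException: this is not a data row
        if (seenGoodLine) {
            System.err.println("Bad line after the data started: " + strLine);
        }
    }
}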