How To Compare Values Between Two Different Sized Csv Files? - java

What is the most appropriate way to loop through two CSV files and compare their columns? Specifically, I want to compare the first column of CSV file1 against column 20 of every row in CSV file2 and check whether there is a match. CSV file1 is considerably smaller than CSV file2. Here is what I have so far:
public class ClassifyData {
public static void main(String[]args) throws IOException{
File file1 = new File("file1.csv");
File file2 = new File("file2.csv");
FileWriter writer = new FileWriter("/Users/home/Work.csv");
PrintWriter pw = new PrintWriter(writer);
Scanner in = new Scanner(file1);
Scanner in2 = new Scanner(file2);
boolean firstLine = true;
String[] temp = null;
String [] temp2 = null;
String line = null;
String line2 = null;
while((line = in.nextLine())!=null){
temp= line.split(",");
while(line2 = in2.nextLine() !=null){
temp2 = line2.split(",");
if(temp[0] == temp[20]){
System.out.println("match");
pw.append("0");
continue;
}
pw.append("\n");
}
}
pw.flush();
pw.close();
writer.close();
}
}

In the line if(temp[0] == temp[20]) you probably mean if(temp[0].equals(temp2[20])), since == compares object references rather than string contents. This will give you the comparison you want. However, your inner while loop still won't start over at the beginning of the second file as you seem to want. I don't think a Scanner can start over on a file, and even if it could, you'd waste a lot of file reads by reading the same file over and over. Something like this will be more efficient for your disk:
ArrayList<String> list1 = new ArrayList<String>();
// nextLine() throws at end of input instead of returning null, so test hasNextLine()
while (in.hasNextLine()) {
    temp = in.nextLine().split(",");
    list1.add(temp[0]);
}
// ... (fill list2 from the second file in the same way)
for (int i = 0; i < list1.size(); i++) {
    for (int j = 0; j < list2.size(); j++) {
        if (list1.get(i).equals(list2.get(j))) {
            System.out.println("Match found");
        }
    }
}
Warning: untested code

I don't think your solution is going to work, because you go through both files just once (you advance through both files in step with each other). Given that the first file is small, I suggest going through that file completely once and storing the values of the first column in a hashtable. Then cycle through the second file and check whether the value in the 20th column appears in the hashtable.
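A minimal sketch of the hashtable idea above, assuming plain comma-separated lines with no quoted fields. The class and method names (CsvColumnMatcher, readColumn, findMatches) are made up for illustration, and inline strings stand in for file1.csv and file2.csv. With real files you would pass a FileReader and, for the 20th column counted from 1, column index 19.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CsvColumnMatcher {
    /** Collects the values of one 0-based column from every line of a CSV source. */
    static Set<String> readColumn(Reader source, int column) throws IOException {
        Set<String> values = new HashSet<>();
        BufferedReader reader = new BufferedReader(source);
        String line;
        while ((line = reader.readLine()) != null) {
            String[] fields = line.split(",", -1); // -1 keeps trailing empty fields
            if (column < fields.length) {
                values.add(fields[column].trim());
            }
        }
        return values;
    }

    /** Returns, in file order, the values of the given column that also occur in keys. */
    static List<String> findMatches(Set<String> keys, Reader largeSource, int column) throws IOException {
        List<String> matches = new ArrayList<>();
        BufferedReader reader = new BufferedReader(largeSource);
        String line;
        while ((line = reader.readLine()) != null) {
            String[] fields = line.split(",", -1);
            if (column < fields.length && keys.contains(fields[column].trim())) {
                matches.add(fields[column].trim());
            }
        }
        return matches;
    }

    public static void main(String[] args) throws IOException {
        // Inline sample data stands in for the small and the large file
        Set<String> keys = readColumn(new StringReader("a,1\nb,2\n"), 0);
        System.out.println(findMatches(keys, new StringReader("x,b\ny,c\nz,a\n"), 1)); // prints [b, a]
    }
}
```

Each file is read exactly once, and each lookup in the set takes roughly constant time, so the work grows with the combined size of the two files rather than with their product.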

Related

JavaFX: Reading from a File and Splitting the result with .split method

I want to read the data from a file and split the results with .split(","). In other words, for this particular example I want two indexes, each containing up to five pieces of information, which I would also like to access with [0] and [1].
The file with the data.
The file-reading method:
public void fileReading(ActionEvent event) throws IOException {
File file = new File("src/DateSpeicher/datenSpeicher.txt");
BufferedReader br = new BufferedReader(new FileReader(file));
String st;
while ((st = br.readLine()) != null) {
System.out.println(st);
}
}
The method works well; however, I wonder how I can split the data into two indexes or String arrays that can each be accessed through their respective indices [0] and [1]. For example, the first value in the first array, 655464, would be [0][0], and the last value in the second array would be [1][4].
My approach:
1. Make an ArrayList for every ","
2. Add data until the next ","
Issue: even though the approach above works, you can't write things like array1[0] (it gives an error), and index access is crucial.
How can I solve this problem?
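Path path = Paths.get("src/DateSpeicher/datenSpeicher.txt");
// Or, if the file is a classpath resource (new URL(...) without a protocol throws MalformedURLException):
// Path path = Paths.get(getClass().getResource("/DateSpeicher/datenSpeicher.txt").toURI());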
Either two Strings, and then handling them:
String content = new String(Files.readAllBytes(path), Charset.defaultCharset());
String[] data = content.split(",\\R");
or a list of lists:
List<String> lines = Files.readAllLines(path, Charset.defaultCharset());
// Result:
List<List<String>> lists = new ArrayList<>();
List<String> newList = null;
boolean addNewList = true;
for (int i = 0; i < lines.size(); ++i) {
if (addNewList) {
newList = new ArrayList<>();
lists.add(newList);
addNewList = false;
}
String line = lines.get(i);
if (line.endsWith(",")) {
line = line.substring(0, line.length() - 1);
addNewList = true;
}
newList.add(line);
}
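Since the question asks for [0][0]-style access, here is a rough sketch that turns the same input into a String[][]. It reuses the split(",\\R") idea from above and assumes the file holds one value per line, with a trailing comma closing each group. Only the value 655464 comes from the question; the other sample values and the GroupedData class name are made up for illustration.

```java
import java.util.Arrays;

class GroupedData {
    /** Splits text into groups at a comma followed by a line break, then each group into its lines. */
    static String[][] parse(String content) {
        String[] groups = content.trim().split(",\\R");
        String[][] data = new String[groups.length][];
        for (int i = 0; i < groups.length; i++) {
            data[i] = groups[i].split("\\R");
        }
        return data;
    }

    public static void main(String[] args) {
        // Hypothetical file content: two groups of five values, the first group closed by a comma
        String content = "655464\nA\nB\nC\nD,\n123\nE\nF\nG\nH\n";
        String[][] data = parse(content);
        System.out.println(data[0][0]); // first value of the first group: 655464
        System.out.println(data[1][4]); // last value of the second group: H
        System.out.println(Arrays.deepToString(data));
    }
}
```

With a real file, the content string would come from Files.readAllBytes as shown earlier. The \R line-break matcher needs Java 8 or later.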

Reading a File without line breaks using Buffered reader

I am reading a file of comma-separated values which, when split into an array, will have 10 values for each line. I expected the file to have line breaks so that
line = bReader.readLine()
will give me each line. But my file doesn't have line breaks. Instead, after the first set of values there are lots of spaces (465, to be precise) and then the next line begins.
So readLine() reads the entire file in one go, as there are no line breaks. Please suggest how best to tackle this scenario efficiently.
One way is to replace each run of 465 spaces in your text with the newline character "\n" before iterating over it.
I second Ninan's answer: replace the 465 spaces with a newline, then run the function you were planning to run earlier.
For aesthetics and readability I would suggest using a regex Pattern to replace the spaces instead of a long, unreadable String.replace(" ").
Your code could look like the example below, but replace 6 with 465:
public static void main(String[] args)
{
    // six spaces separate CAT and MOUSE
    String content = "DOG,CAT      MOUSE,CHEESE";
    Pattern p = Pattern.compile("[ ]{6}",
        Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
    String newString = p.matcher(content).replaceAll("\n");
    System.out.println(newString);
}
My suggestion is to read file f1.txt and write to another file, f2.txt, removing all empty lines and surrounding spaces, then read f2.txt. Something like:
try (BufferedReader br = new BufferedReader(new FileReader("f1.txt"));
     FileWriter fw = new FileWriter("f2.txt")) {
    String line;
    while ((line = br.readLine()) != null)
    {
        line = line.trim(); // remove leading and trailing whitespace
        if (!line.equals("")) // don't write out blank lines
        {
            fw.write(line);
            fw.write(System.lineSeparator()); // keep one record per line
        }
    }
} // both files are closed (and f2.txt flushed) here
Then try using your code.
You might create your own subclass of FilterInputStream or PushbackInputStream and pass that to an InputStreamReader. You override int read().
Such a class unfortunately needs a bit of typing. (A nice exercise, so to speak.)
private static final int NO_CHAR = -2;
private boolean fromCache;
private int cachedSpaces;
private int cachedNonSpaceChar = NO_CHAR;
@Override
public int read() throws IOException {
    if (fromCache) {
        if (cachedSpaces > 0) ...
        if (cachedNonSpaceChar != NO_CHAR) ...
        ...
    }
    int ch = super.read();
    if (ch != -1) {
        ...
    }
    return ch;
}
The idea is to cache spaces until a non-space character arrives. In read(), either serve from the cache (returning \n in place of the run of spaces) or, when the cache is empty, call super.read(); when that returns a space, start counting and read again.
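One possible way to fill in that skeleton is sketched below. Note that it swaps in a FilterReader (characters) instead of the FilterInputStream suggested above, and that the class name SpaceRunReader, the minRun parameter, and the readLines helper are assumptions made for this example. Runs of at least minRun spaces are returned as a single '\n'; shorter runs are passed through unchanged.

```java
import java.io.BufferedReader;
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class SpaceRunReader extends FilterReader {
    private static final int NO_CHAR = -2;
    private final int minRun;          // runs of at least this many spaces become '\n'
    private int cachedSpaces = 0;      // spaces of a short run that still have to be returned
    private int cachedChar = NO_CHAR;  // the character that ended the last run of spaces

    SpaceRunReader(Reader in, int minRun) {
        super(in);
        this.minRun = minRun;
    }

    @Override
    public int read() throws IOException {
        if (cachedSpaces > 0) {        // still replaying a short run
            cachedSpaces--;
            return ' ';
        }
        int ch;
        if (cachedChar != NO_CHAR) {   // replay the character that ended the run
            ch = cachedChar;
            cachedChar = NO_CHAR;
        } else {
            ch = in.read();
        }
        if (ch != ' ') {
            return ch;
        }
        // Count the whole run of spaces and remember the character that ends it
        int run = 1;
        int next;
        while ((next = in.read()) == ' ') {
            run++;
        }
        cachedChar = next;
        if (run >= minRun) {
            return '\n';               // a long run stands in for a line break
        }
        cachedSpaces = run - 1;        // short run: hand the spaces back unchanged
        return ' ';
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        // BufferedReader uses this bulk method, so route it through read()
        int n = 0;
        while (n < len) {
            int ch = read();
            if (ch == -1) {
                break;
            }
            buf[off + n++] = (char) ch;
        }
        return (n == 0 && len > 0) ? -1 : n;
    }

    /** Convenience helper: reads all lines of content through the filter. */
    static List<String> readLines(String content, int minRun) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(
                new SpaceRunReader(new StringReader(content), minRun))) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        String gap = new String(new char[6]).replace('\0', ' '); // six spaces
        System.out.println(readLines("DOG,CAT" + gap + "MOUSE,CHEESE", 6)); // prints [DOG,CAT, MOUSE,CHEESE]
    }
}
```

For the file in the question, minRun would be 465 and the StringReader would be replaced by a FileReader, so the existing readLine() loop can stay as it is.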
My understanding is that you have a flat CSV file without proper line breaks, which is supposed to have 10 values on each line.
Updated:
1. (Recommended) You can use the Scanner class with useDelimiter to parse the CSV effectively, assuming you are trying to store 10 values from a line:
public static void parseCsvWithScanner() throws IOException {
Scanner scanner = new Scanner(new File("test.csv"));
// set your delimiter for scanner: commas, plus the run of whitespace that stands in for a line break
scanner.useDelimiter("[,\\s]+");
// storing 10 values as a "line"
int LINE_LIMIT = 10;
// implement your own data structure to store each value of CSV
int[] tempLineArray = new int[LINE_LIMIT];
int lineBreakCount = 0;
while(scanner.hasNext()) {
// trim start and end spaces if there is any
String temp = scanner.next().trim();
tempLineArray[lineBreakCount++] = Integer.parseInt(temp);
if (lineBreakCount == LINE_LIMIT) {
// replace your own logic for handling the full array
for(int i=0; i<tempLineArray.length; i++) {
System.out.print(tempLineArray[i]);
} // end replace
// resetting array and counter
tempLineArray = new int[LINE_LIMIT];
lineBreakCount = 0;
}
}
scanner.close();
}
2. Alternatively, use a BufferedReader.
If memory is a concern, you might not need the ArrayList to store all values; replace it with your own logic.
public static List<int[]> parseCsv(File file) throws IOException {
    // your requirement of storing 10 values for each "line"
    final int LINE_LIMIT = 10;
    // completed "lines", assuming the values of the CSV are integers
    List<int[]> lineList = new ArrayList<>();
    // array for storing the 10 values of the current "line"
    int[] tempArray = new int[LINE_LIMIT];
    // number of values stored in tempArray so far
    int count = 0;
    // buffer for the characters of the current value
    StringBuilder sb = new StringBuilder();
    try (BufferedReader br = new BufferedReader(new FileReader(file))) {
        int tmp;
        do {
            tmp = br.read();
            // a comma, any whitespace, or end of input terminates the current value
            boolean separator = tmp == -1 || tmp == ',' || Character.isWhitespace(tmp);
            if (!separator) {
                sb.append((char) tmp);
            } else if (sb.length() > 0) {
                tempArray[count++] = Integer.parseInt(sb.toString());
                sb.setLength(0); // clear the buffer
                if (count == LINE_LIMIT) {
                    // your logic to handle the current "line" here
                    lineList.add(tempArray);
                    // new "line"
                    tempArray = new int[LINE_LIMIT];
                    count = 0;
                }
            }
        } while (tmp != -1);
    }
    return lineList;
}

How can I read lines from a inputted file and then store the most recently read lines in an array?

I am trying to create a program that takes an inputted text file and reads the lines one by one. It then needs to store the most recently read lines (the number of lines depends on the parameter lines) in an array and then I need to print the lines using PrintWriter.
I started the first part but I'm not sure if I have the right idea. If anyone can help me on the second part as well that would be very appreciated!
public void RecentLines(Reader in, Writer out, int lines) throws IOException {
BufferedReader r3ader = new BufferedReader(in);
String str;
while((str = r3ader.readLine()) != null){
String[] arr = str.split(" ");
for( int i =0; i < lines; i++){
arr[i] = r3ader.readLine();
}
}
EDIT
the full question is this:
Create a program which reads lines from IN, one line at the time until the end. Your method must maintain an internal buffer that stores the most recently read lines (this might be best done using an array). Once the method reaches the end of the file, it should print the lines stored in the internal buffer into out, probably best done by creating a PrintWriter to decorate this Writer. (Except for your debugging purposes during the development stage, this method should not print anything to System.out.)
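Try this one:
public void RecentLines(Reader in, Writer out, int lines) throws IOException {
    BufferedReader r3ader = new BufferedReader(in);
    String str;
    int i = 0;
    // circular buffer: slot i % lines always receives the newest line
    String[] lineArray = new String[lines];
    while ((str = r3ader.readLine()) != null) {
        lineArray[i % lines] = str;
        i++;
    }
    // print the buffered lines from oldest to newest
    PrintWriter pw = new PrintWriter(out);
    for (int j = Math.max(0, i - lines); j < i; j++) {
        pw.println(lineArray[j % lines]);
    }
    pw.flush();
}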
Sounds like a task for a data structure. A Queue seems to be the best fit for the given task.
public void RecentLines(Reader in, Writer out, int lines) throws IOException {
    BufferedReader r3ader = new BufferedReader(in);
    BufferedWriter wout = new BufferedWriter(out);
    String str;
    Queue<String> content = new LinkedList<String>();
    int i = 0;
    while ((str = r3ader.readLine()) != null) {
        if (i >= lines) {
            content.remove(); // drop the oldest line once the buffer is full
        }
        content.add(str);
        i++;
    }
    // write the buffered lines one per line, oldest first
    for (String recent : content) {
        wout.write(recent);
        wout.newLine();
    }
    wout.flush();
}

Reading a specific set of lines in a file [duplicate]

In Java, is there any method to read a particular line from a file? For example, read line 32 or any other line number.
For small files:
String line32 = Files.readAllLines(Paths.get("file.txt")).get(31); // the list is 0-indexed
For large files:
String line32;
try (Stream<String> lines = Files.lines(Paths.get("file.txt"))) {
line32 = lines.skip(31).findFirst().get();
}
Unless you have previous knowledge about the lines in the file, there's no way to directly access the 32nd line without reading the 31 previous lines.
That's true for all languages and all modern file systems.
So effectively you'll simply read lines until you've found the 32nd one.
Not that I know of, but what you could do is loop through the first 31 lines, doing nothing, using the readLine() method of BufferedReader:
FileInputStream fs= new FileInputStream("someFile.txt");
BufferedReader br = new BufferedReader(new InputStreamReader(fs));
for(int i = 0; i < 31; ++i)
br.readLine();
String lineIWant = br.readLine();
Joachim is right, of course. An alternative to Chris's implementation (for small files only, because it loads the entire file) is to use commons-io from Apache. Arguably you might not want to introduce a new dependency just for this, but if you find it useful for other things too, it could make sense.
For example:
String line32 = (String) FileUtils.readLines(file).get(31);
http://commons.apache.org/io/api-release/org/apache/commons/io/FileUtils.html#readLines(java.io.File, java.lang.String)
You may try indexed-file-reader (Apache License 2.0). The class IndexedFileReader has a method called readLines(int from, int to) which returns a SortedMap whose key is the line number and the value is the line that was read.
Example:
File file = new File("src/test/resources/file.txt");
reader = new IndexedFileReader(file);
lines = reader.readLines(6, 10);
assertNotNull("Null result.", lines);
assertEquals("Incorrect length.", 5, lines.size());
assertTrue("Incorrect value.", lines.get(6).startsWith("[6]"));
assertTrue("Incorrect value.", lines.get(7).startsWith("[7]"));
assertTrue("Incorrect value.", lines.get(8).startsWith("[8]"));
assertTrue("Incorrect value.", lines.get(9).startsWith("[9]"));
assertTrue("Incorrect value.", lines.get(10).startsWith("[10]"));
The above example reads a text file composed of 50 lines in the following format:
[1] The quick brown fox jumped over the lazy dog ODD
[2] The quick brown fox jumped over the lazy dog EVEN
Disclaimer: I wrote this library.
As other answers have said, it is not possible to jump to an exact line without knowing its offset (pointer) beforehand. I've achieved this by creating a temporary index file that stores the offset of every line. If the file is small enough, you can just keep the index (the offsets) in memory without a separate file.
The offsets can be calculated by using RandomAccessFile:
RandomAccessFile raf = new RandomAccessFile("myFile.txt", "r");
// above, "r" means open in read-only mode
ArrayList<Long> arrayList = new ArrayList<Long>();
arrayList.add(0L); // line 1 starts at offset 0
while (raf.readLine() != null)
{
    // after reading a line, the file pointer is at the start of the next line
    arrayList.add(raf.getFilePointer());
}
// Print line 32
// Seek to the offset where line 32 starts (index 31 in the 0-indexed list)
raf.seek(arrayList.get(31));
System.out.println(raf.readLine());
raf.close();
Also see the Java docs on RandomAccessFile for more information.
Complexity: This is O(n), as it reads the entire file once. Be aware of the memory requirements: if the index is too big to fit in memory, write the offsets to a temporary file instead of the ArrayList shown above.
Note: If all you want is line 32, just call readLine() (also available through other classes) 32 times. The above approach is useful if you want to read specific lines (by line number) multiple times.
Another way.
try (BufferedReader reader = Files.newBufferedReader(
Paths.get("file.txt"), StandardCharsets.UTF_8)) {
List<String> line = reader.lines()
.skip(31)
.limit(1)
.collect(Collectors.toList());
line.stream().forEach(System.out::println);
}
No, unless in that file format the line lengths are pre-determined (e.g. all lines with a fixed length), you'll have to iterate line by line to count them.
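To illustrate the fixed-length exception mentioned above: when every line occupies the same number of bytes, the start of line n can be computed and reached with a single seek. The sketch below assumes single-byte (ASCII) text and a record length that includes the newline; the class name FixedWidthLines and the sample data are made up for this example.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

class FixedWidthLines {
    /** Reads 1-based line n from a file whose lines all occupy recordLength bytes, newline included. */
    static String readLine(File file, int recordLength, int n) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            raf.seek((long) (n - 1) * recordLength); // jump straight to the start of line n
            byte[] record = new byte[recordLength];
            raf.readFully(record);                   // throws EOFException if line n does not exist
            return new String(record, StandardCharsets.US_ASCII).trim();
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample: 40 lines of exactly 8 bytes each ("line" + 3 digits + '\n')
        File file = File.createTempFile("fixed", ".txt");
        file.deleteOnExit();
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i <= 40; i++) {
            sb.append(String.format("line%03d\n", i));
        }
        Files.write(file.toPath(), sb.toString().getBytes(StandardCharsets.US_ASCII));
        System.out.println(readLine(file, 8, 32)); // prints line032
    }
}
```

With variable-length lines this shortcut is not available, which is why the other answers read and discard the preceding lines.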
In Java 8,
For small files:
String line = Files.readAllLines(Paths.get("file.txt")).get(n);
For large files:
String line;
try (Stream<String> lines = Files.lines(Paths.get("file.txt"))) {
line = lines.skip(n).findFirst().get();
}
In Java 7
String line;
try (BufferedReader br = new BufferedReader(new FileReader("file.txt"))) {
for (int i = 0; i < n; i++)
br.readLine();
line = br.readLine();
}
Source: Reading nth line from file
If you are talking about a text file, then there is really no way to do this without reading all the lines that precede it - After all, lines are determined by the presence of a newline, so it has to be read.
Use a stream that supports readline, and just read the first X-1 lines and dump the results, then process the next one.
It works for me: I combined the answers from "Reading a simple text file", but instead of returning a String I return a LinkedList of Strings. Then I can select the line that I want.
public static LinkedList<String> readFromAssets(Context context, String filename) throws IOException {
BufferedReader reader = new BufferedReader(new InputStreamReader(context.getAssets().open(filename)));
LinkedList<String>linkedList = new LinkedList<>();
// do reading, usually loop until end of file reading
StringBuilder sb = new StringBuilder();
String mLine = reader.readLine();
while (mLine != null) {
linkedList.add(mLine);
sb.append(mLine); // process line
mLine = reader.readLine();
}
reader.close();
return linkedList;
}
Use this code:
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
public class FileWork
{
public static void main(String[] args) throws IOException {
String line = Files.readAllLines(Paths.get("D:/abc.txt")).get(1);
System.out.println(line);
}
}
You can use LineNumberReader instead of BufferedReader. Go through the API; you will find the setLineNumber and getLineNumber methods. Note that setLineNumber only changes the counter; it does not move the read position.
You can also take a look at LineNumberReader, a subclass of BufferedReader. Along with the readLine method, it has setter/getter methods for the line number, which is very useful for keeping track of the number of lines read while reading data from a file.
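A small sketch of how LineNumberReader could be used for this, with an inline string standing in for the file. The class and method names (LineNumberReaderExample, readLine) are made up for the example. The preceding lines still have to be read; the reader only does the counting for you.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.Reader;
import java.io.StringReader;

class LineNumberReaderExample {
    /** Returns the line with the given 1-based number, or null if the input has fewer lines. */
    static String readLine(Reader source, int target) throws IOException {
        try (LineNumberReader reader = new LineNumberReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // getLineNumber() reports how many lines have been read so far
                if (reader.getLineNumber() == target) {
                    return line;
                }
            }
            return null;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readLine(new StringReader("first\nsecond\nthird\n"), 2)); // prints second
    }
}
```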
public String readLine(int line){
FileReader tempFileReader = null;
BufferedReader tempBufferedReader = null;
try { tempFileReader = new FileReader(textFile);
tempBufferedReader = new BufferedReader(tempFileReader);
} catch (Exception e) { }
String returnStr = "ERROR";
for(int i = 0; i < line - 1; i++){
try { tempBufferedReader.readLine(); } catch (Exception e) { }
}
try { returnStr = tempBufferedReader.readLine(); } catch (Exception e) { }
return returnStr;
}
You can use the skip() function to skip lines from the beginning.
public static void readFile(String filePath, long lineNum) {
List<String> list = new ArrayList<>();
long totalLines, startLine = 0;
try (Stream<String> lines = Files.lines(Paths.get(filePath))) {
totalLines = Files.lines(Paths.get(filePath)).count();
startLine = totalLines - lineNum;
// Stream<String> line32 = lines.skip(((startLine)+1));
list = lines.skip(startLine).collect(Collectors.toList());
// lines.forEach(list::add);
} catch (IOException e1) {
// TODO Auto-generated catch block
e1.printStackTrace();
}
list.forEach(System.out::println);
}
EASY WAY - Reading a line using its line number.
Let's say line numbers start from 1 and run until readLine() returns null.
public class TextFileAssignmentOct {
private void readData(int rowNum, BufferedReader br) throws IOException {
int n=1; //Line number starts from 1
String row;
while((row=br.readLine()) != null) { // Reads every line
if (n == rowNum) { // When Line number matches with which you want to read
System.out.println(row);
}
n++; //This increments Line number
}
}
public static void main(String[] args) throws IOException {
File f = new File("../JavaPractice/FileRead.txt");
FileReader fr = new FileReader(f);
BufferedReader br = new BufferedReader(fr);
TextFileAssignmentOct txf = new TextFileAssignmentOct();
txf.readData(4, br); //Read a Specific Line using Line number and Passing buffered reader
}
}
For a text file you can use an integer counter with a loop to track the line number. Don't forget to import the classes used in this example.
File myObj = new File("C:\\Users\\LENOVO\\Desktop\\test.txt");//path of the file
FileReader fr = new FileReader(myObj);
BufferedReader bf = new BufferedReader(fr); //BufferedReader of the FileReader fr
String line = bf.readLine();
int lineNumber = 0;
while (line != null) {
lineNumber = lineNumber + 1;
if(lineNumber == 7)
{
//show line
System.out.println("line: " + lineNumber + " has :" + line);
break;
}
//read the next line
line = bf.readLine();
}
They are all wrong. I just wrote this in about 10 seconds.
With this I can just call object.getQuestion(lineNumber) in the main method to return whatever line I want.
public class Questions {
File file = new File("Question2Files/triviagame1.txt");
public Questions() {
}
public String getQuestion(int numLine) throws IOException {
BufferedReader br = new BufferedReader(new FileReader(file));
String line = "";
for(int i = 0; i < numLine; i++) {
line = br.readLine();
}
return line; }}

Why last value in a csv keeps repeating itself when reading from

Here is a look at the function I have been messing with for a day now. For some reason it writes the last value in the csv over and over as opposed to parsing through the rows. I threw in some print statements and it appears the row contents are correctly writing to the array but are being over written by the last value. Any help would be amazing, thanks.
public int csvCombine(ArrayList <Indexstruct> todaysCSV, int totalconvo, String date) throws IOException{
String rows = null;
Indexstruct templist=new Indexstruct();
String [] rowArray= new String [2];
FileReader fr = new FileReader(date + ".csv");
BufferedReader br = new BufferedReader(fr);
rows= br.readLine();
rowArray=rows.split(",");
totalconvo+=Integer.parseInt(rowArray[0]); //Reads in total amount of words spoken and adds it to the CSV value of words spoken
final int csvSize=Integer.parseInt(rowArray[1]); //Read in size of csvList
for(int count=0; count<csvSize-1; count++){
rows = br.readLine();
rowArray = rows.split(","); // Reads lines into an array, takes array values and places them into an object of type indexStruct to write into ArrayList
templist.numOfUses=Integer.parseInt(rowArray[1]); //sets object num of uses
templist.Word=rowArray[0]; //sets object word
todaysCSV.add(count, templist); //adds object to csv ArrayList
}
br.close();
return totalconvo;
}
All you're currently doing is adding the same templist object over and over again, and so it makes sense that all data is the same. You need to create a new templist object (whatever type it is) with each iteration of the for loop.
i.e.,
for(int count=0; count < csvSize-1; count++) {
rows = br.readLine();
rowArray = rows.split(",");
int useCount = Integer.parseInt(rowArray[1]);
String word = rowArray[0];
// assuming a type called TempList with a constructor that looks like this
todaysCSV.add(count, new TempList(useCount, word));
}
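To see the effect in isolation, here is a self-contained sketch with a hypothetical Entry class standing in for Indexstruct (the field names follow the question; everything else is made up for the demonstration). Reusing one object makes every list slot point at the same instance, so all slots show the last row; creating a new object per row keeps the rows distinct.

```java
import java.util.ArrayList;
import java.util.List;

class SharedReferenceDemo {
    /** Hypothetical stand-in for the question's Indexstruct type. */
    static class Entry {
        String word;
        int numOfUses;
    }

    /** Reuses one object for every row: all list slots refer to the same instance. */
    static List<Entry> fillReusingOneObject(String[] rows) {
        List<Entry> list = new ArrayList<>();
        Entry shared = new Entry();
        for (String row : rows) {
            String[] parts = row.split(",");
            shared.word = parts[0];
            shared.numOfUses = Integer.parseInt(parts[1]);
            list.add(shared); // adds another reference to the same object
        }
        return list;
    }

    /** Creates a fresh object per row, so each list slot keeps its own values. */
    static List<Entry> fillWithNewObjects(String[] rows) {
        List<Entry> list = new ArrayList<>();
        for (String row : rows) {
            String[] parts = row.split(",");
            Entry entry = new Entry();
            entry.word = parts[0];
            entry.numOfUses = Integer.parseInt(parts[1]);
            list.add(entry);
        }
        return list;
    }

    public static void main(String[] args) {
        String[] rows = {"hello,3", "world,5"};
        System.out.println(fillReusingOneObject(rows).get(0).word); // prints world
        System.out.println(fillWithNewObjects(rows).get(0).word);   // prints hello
    }
}
```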
