Unexpected number of lines when writing to a CSV file - Java

A part of my application writes data to a .csv file in the following way:
import java.io.*;
import java.nio.charset.StandardCharsets;

public class ExampleWriter {
public static final int COUNT = 10_000;
public static final String FILE = "test.csv";
public static void main(String[] args) throws Exception {
try (OutputStream os = new FileOutputStream(FILE)){
os.write(239);
os.write(187);
os.write(191);
BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(os, StandardCharsets.UTF_8));
for (int i = 0; i < COUNT; i++) {
writer.write(Integer.toString(i));
writer.newLine();
}
} catch (IOException e) {
e.printStackTrace();
}
System.out.println(checkLineCount(COUNT, new File(FILE)));
}
public static String checkLineCount(int expectedLineCount, File file) throws Exception {
BufferedReader expectedReader = new BufferedReader(new FileReader(file));
try {
int lineCount = 0;
while (expectedReader.readLine() != null) {
lineCount++;
}
if (expectedLineCount == lineCount) {
return "correct";
} else {
return "incorrect";
}
}
finally {
expectedReader.close();
}
}
}
The file will be opened in Excel, and the data contains all kinds of languages. The os.write calls prefix the file with a UTF-8 byte order mark (BOM) so that all those characters are displayed correctly.
Somehow the number of lines in the file does not match the count in the loop, and I cannot figure out why. Any help on what I am doing wrong here would be greatly appreciated.

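You simply need to flush and close your BufferedWriter before opening the file for input and counting. Your try-with-resources statement only closes the FileOutputStream, so whatever is still in the BufferedWriter's buffer never reaches the file. Try adding:
writer.flush();
writer.close();
inside your try block, after the for loop in the main method.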
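Here is a minimal sketch of the program with that fix applied. It writes to a temporary file instead of test.csv. The class name FixedExampleWriter and the helpers write/countLines are illustrative and do not come from the question. Closing the BufferedWriter flushes its buffer and also closes the FileOutputStream it wraps.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

class FixedExampleWriter {
    static final int COUNT = 10_000;

    // Writes the UTF-8 BOM and `count` numbered lines, then flushes and closes the writer.
    static void write(File file, int count) throws IOException {
        try (OutputStream os = new FileOutputStream(file)) {
            os.write(239); // UTF-8 byte order mark: EF BB BF
            os.write(187);
            os.write(191);
            BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(os, StandardCharsets.UTF_8));
            for (int i = 0; i < count; i++) {
                writer.write(Integer.toString(i));
                writer.newLine();
            }
            writer.flush(); // hand the buffered characters down to the FileOutputStream
            writer.close(); // also closes os; the second close by try-with-resources has no effect
        }
    }

    // Counts lines the same way as checkLineCount() in the question.
    static int countLines(File file) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8))) {
            int lineCount = 0;
            while (reader.readLine() != null) {
                lineCount++;
            }
            return lineCount;
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("test", ".csv"); // temp file instead of "test.csv"
        file.deleteOnExit();
        write(file, COUNT);
        System.out.println(countLines(file) == COUNT ? "correct" : "incorrect");
    }
}
```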

(As a side note.)
Note that using a BOM is optional and in many cases reduces the portability of your files, because not all consuming apps handle it well. It also does not guarantee that the file actually has the advertised character encoding. So I would recommend removing the BOM. When using Excel, just select the file and choose UTF-8 as the encoding.
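If you do keep the BOM, one option is to write the character U+FEFF through the UTF-8 writer itself instead of using the raw os.write calls. This is an alternative, not something the answer prescribes, and it sends every byte through the same encoder. Below is a small sketch. The hypothetical helper BomDemo.encode writes into memory so the resulting bytes can be inspected.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class BomDemo {
    // Encodes the lines as UTF-8, each followed by a newline, optionally starting with a BOM.
    static byte[] encode(boolean withBom, String... lines) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, StandardCharsets.UTF_8)) {
            if (withBom) {
                writer.write('\uFEFF'); // the BOM character; UTF-8 encodes it as EF BB BF
            }
            for (String line : lines) {
                writer.write(line);
                writer.write('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

Passing false produces plain UTF-8, which is what the side note recommends. Excel then has to be told the encoding when the file is opened.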

You are not flushing the stream. The Oracle docs for OutputStream.flush() say:
Flushes this output stream and forces any buffered output bytes to be written out. The general contract of flush is that calling it is an indication that, if any bytes previously written have been buffered by the implementation of the output stream, such bytes should immediately be written to their intended destination. If the intended destination of this stream is an abstraction provided by the underlying operating system, for example a file, then flushing the stream guarantees only that bytes previously written to the stream are passed to the operating system for writing; it does not guarantee that they are actually written to a physical device such as a disk drive.
The flush method of OutputStream does nothing.
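The sketch below shows why that matters for the question. It uses a ByteArrayOutputStream as a stand-in for the file, an assumption made so the effect can be measured: a short write stays inside the BufferedWriter until flush() is called. As the quoted text says, even flush() only hands the bytes to the operating system. For a FileOutputStream, getFD().sync() is the separate call that asks for them to reach the device.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

class FlushDemo {
    // Returns {bytes in the sink before flush(), bytes in the sink after flush()}.
    static int[] sizesAroundFlush(String text) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stands in for the FileOutputStream
        BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(sink, StandardCharsets.UTF_8));
        writer.write(text);      // short write: stays in BufferedWriter's char buffer
        int before = sink.size();
        writer.flush();          // BufferedWriter -> OutputStreamWriter -> sink
        int after = sink.size();
        writer.close();
        return new int[] {before, after};
    }

    public static void main(String[] args) throws IOException {
        int[] sizes = sizesAroundFlush("0\n1\n2\n");
        System.out.println("before flush: " + sizes[0] + ", after flush: " + sizes[1]);
    }
}
```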
You need to flush as well as close the stream. There are two ways:
1) Manually call flush() and close().
2) Use try-with-resources.
I can see from your code that you already use try-with-resources. The BufferedWriter class implements Closeable and Flushable, so you can declare it as a resource too, as in the code below:
public static void main(String[] args) throws Exception {
try (OutputStream os = new FileOutputStream(FILE); BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(os, StandardCharsets.UTF_8))){
os.write(239);
os.write(187);
os.write(191);
for (int i = 0; i < COUNT; i++) {
writer.write(Integer.toString(i));
writer.newLine();
}
} catch (IOException e) {
e.printStackTrace();
}
System.out.println(checkLineCount(COUNT, new File(FILE)));
}

When COUNT is 1, the code in main() writes "0" followed by a line separator, so the file ends with a line terminator that some viewers display as a second, empty line. However, BufferedReader.readLine(), which checkLineCount() uses, does not return an extra empty line for a trailing terminator, so it still counts 1.
Therefore, if you want the file to end without that empty line, you must not write a new line after the last line.
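One way to follow that advice is to write the separator before every record except the first. Below is a minimal sketch. The class NoTrailingSeparator and its separator parameter are hypothetical; the question itself uses writer.newLine(), which writes the platform's line separator.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.List;

class NoTrailingSeparator {
    // Writes the records separated by `separator`, with no separator after the last one.
    static String join(List<String> records, String separator) {
        StringWriter out = new StringWriter();
        try (BufferedWriter writer = new BufferedWriter(out)) {
            for (int i = 0; i < records.size(); i++) {
                if (i > 0) {
                    writer.write(separator); // separator goes between records only
                }
                writer.write(records.get(i));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString(); // the writer was closed (and thus flushed) above
    }
}
```

For data that is already in memory, String.join(separator, records) gives the same result.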

(As another side note.)
Note that writing CSV files by hand the way you are doing is really bad practice. CSV is not as simple as it may look at first sight: for example, fields that contain the separator, double quotes, or line breaks must be quoted. So, unless you really know what you are doing (and are aware of all the CSV quirks), use a library!
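To illustrate the kind of quirk meant here, below is a minimal sketch of RFC 4180-style field quoting. A field that contains a comma, a double quote, or a line break is wrapped in double quotes, and any embedded double quotes are doubled. The sketch assumes a comma delimiter, although Excel may expect a semicolon in some locales. It is no substitute for a library such as Apache Commons CSV or OpenCSV.

```java
import java.util.List;
import java.util.stream.Collectors;

class CsvEscape {
    // Quotes a field if it contains a comma, a double quote, CR or LF; doubles embedded quotes.
    static String escape(String field) {
        boolean needsQuotes = field.indexOf(',') >= 0 || field.indexOf('"') >= 0
                || field.indexOf('\n') >= 0 || field.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Builds one CSV record (without a line terminator) from its fields.
    static String record(List<String> fields) {
        return fields.stream().map(CsvEscape::escape).collect(Collectors.joining(","));
    }
}
```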

Related

Writing to File while Read condition is true [duplicate]

I have the following code:
CSVmaker(LinkedList data) {
String [] myLines = makeStrings(data);
// for (int k = 0; k<myLines.length; k++)
// System.out.println(myLines[k]);
this.file = new File("rawdata.csv");
try {
BufferedWriter buff = new BufferedWriter(new FileWriter(file));
for (int i = 0; i<myLines.length; i++){
buff.write(myLines[i]);
buff.newLine();
System.out.println("done");
}
} catch (IOException ex) {
System.out.println("except");
}
}
No, I checked the contents of myLines; they are correct.
Also, "done" is printed exactly as many times as it should be.
The csv is created.
However, if I open it manually, it is empty.
What can be the reason for this?
You never flush the buffer, or close the BufferedWriter.
After the for loop, make the following calls:
buff.flush();
buff.close();
Even with other resources, closing them when done is a good idea.
You have to close() the stream after use.
Call buff.close() after the write loop; BufferedWriter will flush the data to the file when it is closed.
Although the question has been answered, I would like to add how the buffer works.
Whenever you write to a file through a buffer, whatever you write is added to the buffer. When the buffer is full, its contents are written to the file. This reduces the number of writes to the disk and hence improves efficiency.
If we want to force the data out to the file before the buffer is full, we use the flush() method.
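The sketch below demonstrates that behaviour. It uses the BufferedWriter(Writer, int) constructor with a deliberately tiny 16-character buffer over a StringWriter. Both choices exist only for this demonstration and let you see when characters reach the underlying writer.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;

class BufferDemo {
    // Records how many chars the underlying writer has received after each step.
    static int[] observe() {
        StringWriter target = new StringWriter();
        BufferedWriter buff = new BufferedWriter(target, 16); // 16-char buffer, for demonstration only
        int[] seen = new int[3];
        try {
            buff.write("0123456789");   // 10 chars: fit in the buffer, nothing written yet
            seen[0] = target.getBuffer().length();
            buff.write("ABCDEFGHIJ");   // buffer fills up, its 16 chars go out, 4 chars remain buffered
            seen[1] = target.getBuffer().length();
            buff.flush();               // force out the rest without waiting for a full buffer
            seen[2] = target.getBuffer().length();
            buff.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return seen;
    }
}
```

With the default buffer size, a short file like the one in the question can sit entirely in the buffer. That is why the file stays empty when the writer is never flushed or closed.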
Starting with Java 7, you can simply use a try-with-resources statement, which automatically closes the BufferedWriter. Also see the Files class (java.nio.file), for example:
try (BufferedWriter writer = Files.newBufferedWriter(somePath, yourCharset)){
writer.write(output);
}
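Here is a minimal sketch applied to the question, using the hypothetical class CsvMakerSketch. It assumes the lines have already been built, as makeStrings does in the question, and that UTF-8 is the desired charset.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

class CsvMakerSketch {
    // Writes one line per element; try-with-resources closes (and therefore flushes) the writer.
    static void writeLines(Path path, List<String> lines) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(path, StandardCharsets.UTF_8)) {
            for (String line : lines) {
                writer.write(line);
                writer.newLine();
            }
        }
    }
}
```

If all the lines are in a list anyway, Files.write(path, lines, StandardCharsets.UTF_8) writes them in one call and also closes the file.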

What is the memory efficient way to read a very large csv file of say 3GB in Java?

I have written 2 methods to read the file
public static void parseCsvFile(String path) throws IOException {
FileInputStream inputStream = null;
Scanner sc = null;
try {
inputStream = new FileInputStream(path);
sc = new Scanner(inputStream, "UTF-8");
while (sc.hasNextLine()) {
String line = sc.nextLine();
//logger.info(line);
}
// note that Scanner suppresses exceptions
if (sc.ioException() != null) {
throw sc.ioException();
}
} finally {
if (inputStream != null) {
inputStream.close();
}
if (sc != null) {
sc.close();
}
}
}
public static void parseCsvUsingJavaStream(String path) {
try (Stream<String> stream = Files.lines(Paths.get(path))) {
stream.forEach(System.out :: println);
} catch (IOException ioe) {
ioe.printStackTrace();
}
}
My understanding of the first approach is that the method does not load all the lines from the file into memory at once, which is memory-efficient. I want to achieve the same using lambda expressions. My question is: does my second approach load all the lines into memory? If so, how can I make it memory-efficient?
The answer to your question is in the Files.lines javadoc :
Read all lines from a file as a Stream. Unlike readAllLines, this method does not read all lines into a List, but instead populates lazily as the stream is consumed.
Your second code sample should be roughly as memory-efficient as your first code sample.
Using the Streams API should result in about the same memory usage as the other approach, unless you parallelize the stream.
From the Javadoc:
Read all lines from a file as a Stream. Unlike readAllLines, this method does not read all lines into a List, but instead populates lazily as the stream is consumed.
Bytes from the file are decoded into characters using the specified charset and the same line terminators as specified by readAllLines are supported.
After this method returns, then any subsequent I/O exception that occurs while reading from the file or when a malformed or unmappable byte sequence is read, is wrapped in an UncheckedIOException that will be thrown from the Stream method that caused the read to take place. In case an IOException is thrown when closing the file, it is also wrapped as an UncheckedIOException.
The returned stream encapsulates a Reader. If timely disposal of file system resources is required, the try-with-resources construct should be used to ensure that the stream's close method is invoked after the stream operations are completed.
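Below is a sketch of the lazy approach. It assumes a UTF-8 file and uses a filter on empty lines as a stand-in for real per-line processing; LazyLines and countNonEmpty are hypothetical names. Only the line currently being processed, plus a read buffer, needs to be in memory. Collecting the stream into a List would bring back the memory cost of readAllLines.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

class LazyLines {
    // Counts non-empty lines; lines are read on demand as the stream is consumed.
    static long countNonEmpty(Path path) throws IOException {
        // try-with-resources closes the underlying file, as the Javadoc above advises
        try (Stream<String> lines = Files.lines(path, StandardCharsets.UTF_8)) {
            return lines.filter(line -> !line.isEmpty()).count();
        }
    }
}
```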

How to write multiple times in external console application using java?

I need to communicate with an external C++ console program (read its output and write its input). I read from the application in a Thread, and that works. When the program needs input, however, it works only the first time. After that the stream probably remains empty, the program doesn't receive the second input, and the external program closes.
The application I'm using is a simple .exe written in C++ that does the following:
print "Insert first input"
scan input1
print input1
print "Insert second input"
scan input2
print input2
Main class:
import java.io.*;
import java.util.Scanner;
public class ExampleCom {
public static Communication com = new Communication();
public static void main(String[] args)
{
Scanner in = new Scanner(System.in);
String s;
com.read();
while(true)
{
s = in.nextLine();
com.write(s);
}
}
}
Communication class:
public class Communication
{
Process p;
OutputStream writer;
public InputStream reader = null;
Read r; // class that reads all of the exe's output in a loop
Communication()
{
try{
p = Runtime.getRuntime ().exec ("C:\\esempio.exe");
writer = p.getOutputStream();
reader = p.getInputStream();
}catch(Exception e){}
}
public void read()
{
r = new Read();
Thread threadRead = new Thread(r);
threadRead.start();
}
public void write(String s)
{
try{
writer.write(s.getBytes());
writer.flush();
writer.close();
}catch(Exception e){}
}
}
How can I send my string (like "writer.write('hello')") when the external application needs it?
The problem is that in your write() method, you have the line
writer.close();
which means that after calling it the first time, you have closed the input stream of your C++ program. As far as the program is concerned, it sees the end of file right after your first input.
What you should do is put the close() in a separate method, and call that method only when you are done working with that process.
Now, as your target program expects text input and will only interpret the input if it gets an end-of-line (as per your answer to the question in my comment), you should supply that end-of-line to it.
Instead of doing raw byte writes, I think a better approach is to wrap that output stream in a PrintWriter and use it as naturally as you use System.out.println(). It can also save you the flush() call.
You are misreading what happens when your program seems not to get the input until you call close(). Java is not holding the data back: it sends it as soon as you call flush(). The C++ program, however, waits for either an end-of-file or an end-of-line. Since you are not sending an end-of-line, only close(), which sends end-of-file, makes it accept the input. But then you can no longer send any further data.
So the solution is, first, to define your writer as a PrintWriter. Instead of
OutputStream writer;
Use
PrintWriter writer;
And instead of
writer = p.getOutputStream();
Use
writer = new PrintWriter(p.getOutputStream(), true);
The true argument enables auto-flush whenever you call println().
Now, your write method should be:
public void write(String s)
{
writer.println(s);
}
Note that PrintWriter methods never throw IOException, so if you care about errors, you have to check for them using checkError().
And of course, have the close() in a separate method, as I mentioned before.
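Below is a small sketch of what the auto-flush flag changes. It uses a ByteArrayOutputStream in place of p.getOutputStream() so it runs without the C++ program. AutoFlushDemo is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

class AutoFlushDemo {
    // Returns what the underlying stream has received right after a single println().
    static String afterPrintln(String s, boolean autoFlush) {
        ByteArrayOutputStream toProcess = new ByteArrayOutputStream(); // stand-in for p.getOutputStream()
        PrintWriter writer = new PrintWriter(toProcess, autoFlush);
        writer.println(s); // writes s plus the platform line separator; flushes only if autoFlush
        String received = new String(toProcess.toByteArray(), StandardCharsets.UTF_8);
        writer.close();
        return received;
    }
}
```

With a real process, calling writer.checkError() after println() lets you notice that the program has exited and the pipe is broken. PrintWriter reports such failures only through that flag.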
Because the write() method might throw an IOException, it is advisable to call close() inside a finally block. Place the writer.close() call outside the try clause:
finally {
if(writer != null) {
writer.close();
}
}

How can I print .exe printf() messages from java program

I have an application that prints messages from Test.exe to the console. My Java program creates a process by executing Test.exe.
The application prints the messages by reading from the input stream of that process.
This is the problem I am facing.
I have two scenarios:
1) When I double-click Test.exe, the messages ("Printing : %d") are printed every second.
2) When I run my Java application, however, all the messages are printed at the end (not every second), just before Test.exe terminates. If the .exe prints a very large number of messages, they appear in chunks (I think whenever the buffer becomes full and is flushed).
How can I get the messages printed as in the first case?
Any help would be appreciated. :)
Here is the code for this Test.exe.
#include <stdio.h>
#include <windows.h>
int main(void)
{
int i=0;
while (1)
{
Sleep(500);
printf("\nPrinting : %d",i);
i++;
if (i==10)
//if(i==100)
{
return 0;
}
}
}
And my Java application is below:
public class MainClass {
public static void main(String[] args) {
String str = "G:\\Charan\\Test\\Debug\\Test.exe";
try {
Process testProcess = Runtime.getRuntime().exec(str);
InputStream inputStream = new BufferedInputStream(
testProcess.getInputStream());
int read = 0;
byte[] bytes = new byte[1000];
String text;
while (read >= 0) {
if (inputStream.available() > 0 ) {
read = inputStream.read(bytes);
if (read > 0) {
text = new String(bytes, 0, read);
System.out.println(text);
}
}
}
} catch (IOException e) {
e.printStackTrace();
}
}
}
Is the reverse also possible? If I type some text in the console, Java should read it and pass that string to the .exe (testProcess). How can the .exe read something sent from the Java program?
Could anyone help me?
Given that you're trying to print stdout from that process line by line, I would create a BufferedReader from the process's input stream and call readLine() on it. You can get a BufferedReader object using the following chain of constructors:
BufferedReader testProcessReader = new BufferedReader(new InputStreamReader(testProcess.getInputStream()));
And to read line by line:
String line;
while ((line = testProcessReader.readLine()) != null) {
System.out.println(line);
}
The assumption here is that Test.exe is flushing its output, which is required by any read from the Java side. You can flush the output from C by calling fflush(stdout) after every call to printf().
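Below is a minimal sketch of that loop as a reusable helper; LinePump and pump are hypothetical names. It is tested here on an in-memory stream. With the process, you would call pump(testProcess.getInputStream(), Charset.defaultCharset(), System.out::println).

One caveat applies if the C program stays unchanged. printf("\nPrinting : %d", i) writes the newline before the text, so readLine() yields an empty first line. It also returns each message only once the next newline arrives, even with fflush(stdout). Moving the \n to the end of the format string avoids that lag.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.util.function.Consumer;

class LinePump {
    // Reads the stream line by line and passes each line to the sink as soon as readLine() returns it.
    static void pump(InputStream in, Charset charset, Consumer<String> sink) throws IOException {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) { // null only at end of stream, e.g. when the process exits
                sink.accept(line);
            }
        }
    }
}
```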
If you don't flush, the data only lives in a buffer. When considering performance, it's a trade-off, how often you want the data to be written vs. how many writes / flush operations you want to save. If performance is critical, you can consider looking into a more efficient inter-process communication mechanism to pass data between the processes instead of stdout. Since you are on Windows, the first step might be to take a look at the Microsoft IPC help page.
Seems to have something to do with not flushing. I guess it's on both sides - The C library you use seems to only automatically flush output when writing to a terminal. Flush manually after calling printf.
On the Java side, try reading from a non-buffered stream.
