Add information to a BufferedWriter before flushing it - Java

I understand that a BufferedWriter stores the information before it is written to the file, and only performs the actual write when flush(), append(), etc. are executed.
I gather information from multiple sources. Currently I loop over each source and append its data directly to the file each time, but what I'm trying to accomplish is to add all the information to the BufferedWriter and, after finishing the loop, write it to the file. How could that be done?
I am trying to improve performance by not flushing the data to the file so many times. Performance is an issue because this might loop a million times.
Here is what I'm currently doing:
1. Open the BufferedWriter
2. Read data from a different source and store it in the buffer
3. Append the stored data to a text file (here the buffer is emptied)
4. Repeat steps 2 and 3 fifty times
5. Close the text file
Here is what I'm trying to do:
1. Open the BufferedWriter
2. Read data from a different source and store it in the buffer
3. Repeat step 2 fifty times
4. Append all the data collected (the data gathered over the 50 loops)
5. Close the file
Here is the code:
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;
// ...
for (int mainLoop = 0; mainLoop < 50; mainLoop++) {
    try {
        BufferedWriter writer = new BufferedWriter(
                new FileWriter("path to file in computer" + mainLoop + ".txt", true));
        for (int forloop = 0; forloop < 50; forloop++) {
            final Document pageHtml = Jsoup.connect("link to a page").get();
            Elements body = pageHtml.select("p");
            writer.append(System.getProperty("line.separator"));
            writer.append(System.getProperty("line.separator"));
            writer.append(body.text());
            System.out.println(forloop);
        }
        writer.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
}

I am trying to improve performance by not flushing the data into the file so many times
Are you flushing the data manually after each write? Don't do that.
Otherwise, specify a larger size when you instantiate your BufferedWriter.
You have the option of using a StringBuilder to aggregate the output first. However, I assume you have more output than you want to store in memory.
Finally, is there really a performance cost?
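For the second suggestion, a minimal sketch of constructing the writer with a larger buffer (the 64 KB size and the file path are assumptions, not values from the question):

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

public class LargeBufferSketch {
    public static void main(String[] args) throws IOException {
        // The second constructor argument is the buffer size in chars (64 K here).
        try (BufferedWriter writer = new BufferedWriter(new FileWriter("output.txt", true), 64 * 1024)) {
            writer.append("some data");
            // No manual flush: the writer flushes when its buffer fills,
            // and close() (called by try-with-resources) flushes the rest.
        }
    }
}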
===
The BufferedWriter will optimize the actual writes it performs. As long as you specify a large buffer size, e.g., 10,000, multiple small writes to the buffer will not cause a write until the buffer is full. I see a comment that you are "clearing" the buffer. Don't do that. Leave the BufferedWriter alone and let it do its thing.
If you are accumulating information and then, for some reason, discarding it, use a StringBuilder to accumulate and then write the StringBuilder content to the Writer.
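A minimal sketch of that accumulate-then-write pattern (the loop body and the file path are placeholders, not code from the question):

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

public class AccumulateSketch {
    public static void main(String[] args) throws IOException {
        StringBuilder batch = new StringBuilder();
        for (int i = 0; i < 50; i++) {
            // Gather everything in memory first; nothing touches the file yet.
            batch.append("data from source ").append(i).append(System.lineSeparator());
        }
        try (BufferedWriter writer = new BufferedWriter(new FileWriter("output.txt", true))) {
            writer.write(batch.toString()); // one write of the accumulated content
        }
    }
}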

A buffered writer will flush when you instruct it to, and also any time its buffer becomes full. It can be tricky to determine when the buffer becomes full, and really, you should not care: a buffered writer will improve performance regardless of precisely when it flushes. Your output code should simply use the BufferedWriter just like any other Writer.
I also see in your code that you repeatedly open and close the output file. You almost certainly don't need to do that. Instead, open and close the file at a higher level in your program, so it remains open across iterations.
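If the goal is one combined output file, here is a minimal sketch of the question's loops with the writer opened once at the top (the path and URL placeholders are taken from the question; exception handling is left to the surrounding method):

try (BufferedWriter writer = new BufferedWriter(
        new FileWriter("path to file in computer.txt", true), 64 * 1024)) {
    for (int mainLoop = 0; mainLoop < 50; mainLoop++) {
        for (int forloop = 0; forloop < 50; forloop++) {
            Document pageHtml = Jsoup.connect("link to a page").get();
            Elements body = pageHtml.select("p");
            writer.append(System.lineSeparator());
            writer.append(System.lineSeparator());
            writer.append(body.text());
        }
    }
} // close() flushes whatever is still buffered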

Related

The consequences of not closing a PrintWriter object

I wrote a really basic program which takes two equally sized lists of temperatures and creates a .CSV file from them. The weird part is that when I forgot to close the PrintWriter object and ran the program, the output file was missing around 300 results (I don't know which results were missing), and depending on whether the comma in the println was written as ", " or as ",", it would be missing a different number of results. When I closed the PrintWriter, regardless of how the comma was written, the output file had the correct number of results. I was just wondering if anyone could explain why this was happening; I thought closing the PrintWriter would just close it in the same way as closing a Scanner object would?
Don't have access to the code right now, but it was just a for loop which would print the value at the current index of the two lists in this format:
printWriter.println(list1.get(i) + "," + list2.get(i));
Typically, output is collected in memory and only written to disk from time to time. This is done since larger disk writes are much more efficient.
When you don't close the writer you lose the last buffer-full of output. There are other negative consequences as well: the file stays open until the program exits, and if you do this repeatedly it will lead to resource leaks.
Aside from the writer content not being properly flushed and thus getting partially lost, every open writer hogs resources (RAM and OS file handles), and also blocks the file from being accessed by other processes.
Always close every writer and reader once you're done with it.
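A minimal sketch of the question's loop using try-with-resources, which guarantees the close (and therefore the final flush) even if an exception is thrown; the list types and file name are assumptions:

import java.io.IOException;
import java.io.PrintWriter;
import java.util.List;

public class CsvSketch {
    static void writeCsv(List<Double> list1, List<Double> list2) throws IOException {
        try (PrintWriter out = new PrintWriter("temperatures.csv")) {
            for (int i = 0; i < list1.size(); i++) {
                out.println(list1.get(i) + "," + list2.get(i));
            }
        } // out.close() runs here, flushing the last buffer-full of output
    }
}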

In Java how to read the latest string of constantly generated stream fast?

In Java I have a process constantly generating output. Of course it's placed into some buffer of the output stream (FIFO) until it's processed. But what I sometimes need is to read the latest, most recent string of the stream, as if it were LIFO. The problem is that when I need it, I have to read all the previous output generated between my reads, because streams don't have random access, and that is very slow.
I use BufferedReader(InputStreamReader(process.getInputStream()))
The buffer of BufferedReader also poses a little problem.
How can I discard all the output I don't need, fast?
If possible I wouldn't like to create separate reader-discarder thread.
I tried:
InputStream iS = process.getInputStream();
stdInput = new BufferedReader(new InputStreamReader(iS), 1000);
then when I need to read the output:
stdInput.skip(iS.available() + 1000); // get the length of the sequence generated up till now and discard it
stdInput.readLine();                  // to 'flush' the BufferedReader buffer
s = stdInput.readLine();              // to read the latest string
This way is very slow and takes an undetermined amount of time.
Since you haven't posted the full code, it may be useful for you to try a few things and see what performs best. Overall, I'm not sure how much improvement you will see.
Remove the BufferedReader wrapper and benchmark. It seems that your InputStream is memory-backed, so reads may be cheap. If they are, the BufferedReader may slow you down. If reads are expensive, then you should keep it.
Try Channels.newChannel(InputStream in), use ReadableByteChannel.read(ByteBuffer dst), and benchmark. Again, it depends on your InputStream, but a Channel may show performance benefits.
Overall I'd recommend going the multithreaded approach with an ExecutorService and Callable class doing the reading and processing into a memory buffer.
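A minimal sketch of the channel-based suggestion (the buffer size is an assumption; what to do with the bytes is left open):

import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;

public class ChannelReadSketch {
    static void drain(InputStream in) throws IOException {
        ReadableByteChannel channel = Channels.newChannel(in);
        ByteBuffer buffer = ByteBuffer.allocate(64 * 1024);
        while (channel.read(buffer) != -1) {
            buffer.flip();
            // ... inspect or discard the bytes read so far ...
            buffer.clear();
        }
    }
}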
My suggestion (I don't know your implementation): https://docs.oracle.com/javase/7/docs/api/java/io/BufferedReader.html#mark(int)
BufferedReader has a mark method:
public void mark(int readAheadLimit)
throws IOException
Marks the present position in the stream. Subsequent calls to reset()
will attempt to reposition the stream to this point.
If you know the size of your data, you can use that size to set the position to the part of the stream you care about.
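A minimal sketch of mark()/reset() on a BufferedReader (the read-ahead limit is an assumption; reset() fails if more than that many characters are read after the mark):

import java.io.BufferedReader;
import java.io.IOException;

public class MarkResetSketch {
    static String peekLine(BufferedReader reader) throws IOException {
        reader.mark(8192);              // remember the current position
        String lookahead = reader.readLine();
        reader.reset();                 // jump back to the marked position
        return lookahead;
    }
}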

How to clone or copy a BufferedReader?

I'm creating an Android app which is going to read very large files, with about 1,000 to 40,000 lines, which takes quite a bit of time with a single loop. Therefore, I'm trying to create a multithreaded reader, which creates multiple threads, each of which reads a specific part of the file, and then puts all the small parts together into one big array or String.
I'm using a BufferedReader which loops through each line in the file and stores the line count.
Each time the loop runs, I check whether lineNumber % LINES_PER_READER == 0 is true. If it is, I create a new reader thread, which should read the next LINES_PER_THREAD lines in the file.
I wonder (because the files can be huge) whether I can copy or clone the BufferedReader in any way, so that the new reader thread can just start reading from the line where it was created (I already have a loop reading that line), instead of creating a new BufferedReader, reading each line until I get to the specified line, and only then starting to read the actual values.
Don't clone the BufferedReader. It will create trouble. Just send batches of lines to the individual processing threads.
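A hedged sketch of that suggestion: one thread reads the file line by line and hands fixed-size batches to a small thread pool (the file name, batch size, and pool size are assumptions):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class BatchReaderSketch {
    static final int LINES_PER_BATCH = 1000;

    public static void main(String[] args) throws IOException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try (BufferedReader reader = new BufferedReader(new FileReader("big.txt"))) {
            List<String> batch = new ArrayList<>();
            String line;
            while ((line = reader.readLine()) != null) {
                batch.add(line);
                if (batch.size() == LINES_PER_BATCH) {
                    List<String> toProcess = batch;     // hand the full batch to a worker
                    pool.submit(() -> process(toProcess));
                    batch = new ArrayList<>();
                }
            }
            if (!batch.isEmpty()) {
                List<String> leftover = batch;
                pool.submit(() -> process(leftover));
            }
        }
        pool.shutdown();
    }

    static void process(List<String> lines) {
        // ... parse the lines here ...
    }
}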

Splitting text file without reading it

Is there any method by which I can split a text file in Java without reading it?
I want to process a large text file, several GB in size, so I want to split it into small parts, run a thread over each part, and combine the results.
Since I will be reading it in small parts anyway, splitting the file by reading it won't make any sense, as I would have to read the same file twice, which would degrade performance.
Your threading approach is ill-formed. If you have to do significant processing of your file data, consider the following threading structure:
1 reader thread (reads the file and feeds the workers)
a queue with the read chunks
1..n worker threads (n depends on your CPU cores; they process the data chunks from the reader thread)
a queue or dictionary with the processed chunks
1 writer thread (writes the results to some file)
Maybe you could combine the reader and writer threads into one thread, because it doesn't make much sense to parallelize I/O on the same physical hard disk.
You clearly need some synchronization between the threads; for the queues in particular, think about semaphores.
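A hedged sketch of that structure, with a bounded BlockingQueue as the hand-off between one reader thread and several workers (the file name and line-based chunks are assumptions; a real version would add a writer thread for the results):

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class PipelineSketch {
    static final String POISON = new String("EOF"); // sentinel telling a worker to stop

    public static void main(String[] args) throws Exception {
        BlockingQueue<String> chunks = new ArrayBlockingQueue<>(100); // bounded, so the reader blocks when workers fall behind
        int workers = Runtime.getRuntime().availableProcessors();

        for (int i = 0; i < workers; i++) {
            new Thread(() -> {
                try {
                    String chunk;
                    while ((chunk = chunks.take()) != POISON) {
                        // ... process the chunk here ...
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }).start();
        }

        // The reader: feed the queue, then send one sentinel per worker.
        try (BufferedReader reader = new BufferedReader(new FileReader("big.txt"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                chunks.put(line);
            }
        }
        for (int i = 0; i < workers; i++) {
            chunks.put(POISON);
        }
    }
}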
Without reading the content of the file you can't do that. It is not possible.
I don't think this is possible for the following reasons:
How do you write a file without "reading" it?
You'll need to read in the text to know where a character boundary is (the encoding is not necessarily 1 byte). This means that you cannot treat the file as binary.
Is it really not possible to read line-by-line and process it like that? That also saves the additional space that the split files would take up alongside the original. For your reference, reading a text file is simply:
public static void loadFileFromInputStream(InputStream in) throws IOException {
    BufferedReader inputStream = new BufferedReader(new InputStreamReader(in));
    String record = inputStream.readLine();
    while (record != null) {
        // do something with the record
        // ...
        record = inputStream.readLine();
    }
}
You're only reading one line at a time... so the size of the file does not impact performance at all. You can also stop anytime you have to. If you're adventurous you can also add the lines to separate threads to speed up processing. That way, IO can continue churning along while you process your data.
Good luck! If, for some reason, you do find a solution, please post it here. Thanks!
Technically speaking, it can't be done without reading the file. But you also don't need to keep the entire file contents in memory to do the splitting. Just open a stream to the file and write out to other files, redirecting the output to a new file after a certain number of bytes have been written to the current one. This way you are not required to keep more than one byte of file data in memory at any given time. Using a larger buffer, about 8 or 16 KB, will dramatically increase performance, though.
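A hedged sketch of that byte-level split (the input name, part size, and buffer size are assumptions; note that splitting at byte boundaries can cut a line or a multi-byte character in half, as the other answers point out):

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class ByteSplitSketch {
    public static void main(String[] args) throws IOException {
        long bytesPerPart = 100L * 1024 * 1024;  // roughly 100 MB per part
        byte[] buffer = new byte[16 * 1024];     // 16 KB copy buffer, as suggested above

        try (InputStream in = new BufferedInputStream(new FileInputStream("big.txt"))) {
            int part = 0;
            long writtenInPart = 0;
            OutputStream out = new BufferedOutputStream(new FileOutputStream("part-" + part + ".txt"));
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
                writtenInPart += read;
                if (writtenInPart >= bytesPerPart) {
                    out.close();                 // finish this part and start the next one
                    part++;
                    writtenInPart = 0;
                    out = new BufferedOutputStream(new FileOutputStream("part-" + part + ".txt"));
                }
            }
            out.close();
        }
    }
}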
Something has to read your file to split it (and you probably want to split it at line boundaries, not at some multiple of kilobytes).
If running on a Linux machine, you could delegate the splitting to an external command like csplit. Your Java program would then simply run a csplit yourbigfile.txt command.
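For illustration only, a sketch of launching such a command from Java; the csplit arguments shown (split at every 1000th line, repeating as long as possible) are an assumption, not something taken from the answer:

import java.io.IOException;

public class CsplitSketch {
    public static void main(String[] args) throws IOException, InterruptedException {
        Process p = new ProcessBuilder("csplit", "yourbigfile.txt", "1000", "{*}")
                .inheritIO()   // let csplit print directly to this program's console
                .start();
        System.out.println("csplit exited with " + p.waitFor());
    }
}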
In the literal sense no. To literally split a file into smaller files, you have to read the large one and write the smaller ones.
However, I think you really want to know if you can have different threads sequentially reading different "parts" of a file at the same time. And the answer is that you can do that. Just have each thread create its own RandomAccessFile object for the file, seek to the relevant place, and start reading.
(A FileInputStream would probably work too, though I don't think the Java API spec guarantees that skip is implemented using an OS-level "seek" operation on the file.)
There are a couple of possible complications:
If the file is text, you presumably want each thread to start processing at the start of some line in the file. So each thread has to start by finding the end of a line, and make sure that it reads to the end of the last line in its "part".
If the file uses a variable width character encoding (e.g. UTF-8), then you need to deal with the case where your partition boundaries fall in the middle of a character.
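Returning to the seek-per-thread idea, a minimal sketch of one worker (the start and end offsets are assumed to be computed by the caller; it skips the partial line it lands in, and it uses RandomAccessFile.readLine(), which does not handle multi-byte encodings, so the second complication above still applies):

import java.io.IOException;
import java.io.RandomAccessFile;

public class SeekingWorker implements Runnable {
    private final String path;
    private final long startOffset;
    private final long endOffset;

    SeekingWorker(String path, long startOffset, long endOffset) {
        this.path = path;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }

    @Override
    public void run() {
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            file.seek(startOffset);
            if (startOffset > 0) {
                file.readLine();           // skip the partial line we landed in
            }
            String line;
            // Read until the line that straddles endOffset has been consumed.
            while (file.getFilePointer() <= endOffset && (line = file.readLine()) != null) {
                // ... process the line ...
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}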

Does DataOutputStream flush automatically when its buffer is full?

I'm writing information to a file, through a DataOutputStream (RandomAccessFile->FileOutputStream->BufferedOutputStream->DataOutputStream).
I assume that if the buffer used for data output fills up, then the DataOutputStream will automatically flush?
The reason I ask is that I'm writing the data in a for loop, and flushing after the loop (I'm guessing that flushing after every iteration of the loop would destroy the point of using buffers), and when the data gets too big (4MB atm) my file isn't coming out correctly.
DataOutputStream doesn't have a buffer, so there is nothing to flush. Everything is written within the write()/writeXXX() methods. However the BufferedOutputStream has a buffer of course, so you certainly need to flush or close to get that data written to the file. You need to close the outermost stream, i.e. in this case the DataOutputStream, not any of the nested streams.
when the data gets too big (4MB atm) my file isn't coming out correctly.
You'll have to post your code. BufferedOutputStream's buffer is 8 KB by default, nothing to do with 4 MB.
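A minimal sketch of that stream chain with try-with-resources, where closing the outermost DataOutputStream also flushes and closes the BufferedOutputStream and FileOutputStream underneath (the file name and values written are placeholders, and the RandomAccessFile from the question is left out for simplicity):

import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class DataWriteSketch {
    public static void main(String[] args) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream("data.bin")))) {
            for (int i = 0; i < 1_000_000; i++) {
                out.writeInt(i);            // buffered by the BufferedOutputStream underneath
                out.writeDouble(i * 0.5);
            }
        } // no manual flush needed; close() at the end of the block handles it
    }
}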
