I wrote a really basic program which takes two equally sized lists of temperatures and creates a .CSV file from them. The weird part is that when I forgot to close the PrintWriter object and ran the program, the output file was missing around 300 results (I don't know which ones), and depending on whether the comma in the println was written as ", " or as ",", it would be missing a different number of results. When I closed the PrintWriter, the output file had the correct number of results regardless of how the comma was written. Could anyone explain why this was happening? I thought closing the PrintWriter would just close it in the same way as closing a Scanner object would.
Don't have access to the code right now, but it was just a for loop which would print the values at the current index of the two lists in this format:
printWriter.println(list1.get(i) + "," + list2.get(i));
Typically, output is collected in memory and only written to disk from time to time. This is done since larger disk writes are much more efficient.
When you don't close the writer, you lose the last buffer's worth of output. There are other negative consequences as well: the file stays open until the program exits, and if you do this repeatedly it leads to resource leaks.
Aside from the writer content not being properly flushed and thus getting partially lost, every open writer hogs resources (RAM and OS file handles), and also blocks the file from being accessed by other processes.
Always close every writer and reader once you're done with it.
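For example, a minimal sketch of the same CSV loop using try-with-resources, so the writer is always flushed and closed even if an exception occurs (the list names and file path are placeholders, not the original code):

import java.io.IOException;
import java.io.PrintWriter;
import java.util.List;

public class CsvExample {

    // Writes each pair of temperatures as one CSV line.
    static void writeCsv(List<Double> list1, List<Double> list2, String path) throws IOException {
        // try-with-resources closes (and therefore flushes) the writer automatically
        try (PrintWriter out = new PrintWriter(path)) {
            for (int i = 0; i < list1.size(); i++) {
                out.println(list1.get(i) + "," + list2.get(i));
            }
        }
    }
}

Closing the writer is what forces the last buffered chunk out to disk, which is why rows go missing only when close() is skipped.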
Related
I understand that the BufferedWriter stores the information in a buffer before actually writing it to the file, which happens when flush(), append(), etc. are executed.
I gather information from multiple sources, so currently I loop over each source and append it directly to the file each time. What I'm trying to accomplish is to collect all the information in the BufferedWriter and, after finishing the loop, write it to the file in one go. How could that be done?
I am trying to improve performance by not flushing the data to the file so many times. Performance is an issue because this might loop a million times.
Here is what I'm currently doing:
1. Open the BufferedWriter
2. Read data from a different source and store it in the buffer
3. Append the stored data to a text file (here the buffer is emptied)
4. Repeat steps 2 and 3 fifty times
5. Close the text file
Here is what I'm trying to do:
1. Open the BufferedWriter
2. Read data from a different source and store it in the buffer
3. Repeat step 2 fifty times
4. Append all the data collected (the data gathered over the 50 loops)
5. Close the file
Here is the code:
for (int mainLoop = 0; mainLoop < 50; mainLoop++) {
    try {
        BufferedWriter writer = new BufferedWriter(
                new FileWriter("path to file in computer" + mainLoop + ".txt", true));
        for (int forloop = 0; forloop < 50; forloop++) {
            final Document pageHtml = Jsoup.connect("link to a page").get();
            Elements body = pageHtml.select("p");
            writer.append(System.getProperty("line.separator"));
            writer.append(System.getProperty("line.separator"));
            writer.append(body.text());
            System.out.println(forloop);
        }
        writer.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
"I am trying to improve performance by not flushing the data into the file so many times"
Are you flushing the data manually after each write? Don't do that.
Otherwise, specify a larger size when you instantiate your BufferedWriter.
You have the option of using a StringBuilder to aggregate the output first. However, I assume you have more output than you want to store in memory.
Finally, is there really a performance cost?
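If it does turn out to matter, here is a rough sketch of both options above; the file name, buffer size, and loop body are placeholders, not taken from the original code:

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

public class BufferingOptions {
    public static void main(String[] args) throws IOException {
        // Option 1: give the BufferedWriter a larger buffer (here 64 KB),
        // so many small appends only hit the disk when the buffer fills.
        try (BufferedWriter writer = new BufferedWriter(new FileWriter("out.txt", true), 64 * 1024)) {
            for (int i = 0; i < 50; i++) {
                writer.append("some chunk of data").append(System.lineSeparator());
            }
        }

        // Option 2: aggregate everything in a StringBuilder first, then write once.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 50; i++) {
            sb.append("some chunk of data").append(System.lineSeparator());
        }
        try (BufferedWriter writer = new BufferedWriter(new FileWriter("out.txt", true))) {
            writer.append(sb);
        }
    }
}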
===
The BufferedWriter will optimize the actual writes it performs. As long as you specify a large buffer size, e.g., 10,000, multiple small writes to the buffer will not cause a write until the buffer is full. I see a comment that you are "clearing" the buffer. Don't do that. Leave the BufferedWriter alone and let it do its thing.
If you are accumulating information and then, for some reason, discarding it, use a StringBuilder to accumulate and then write the StringBuilder content to the Writer.
A buffered writer will flush when you instruct it to, and also any time the buffer becomes full. It can be tricky to determine exactly when the buffer becomes full, and really, you should not care: a buffered writer will improve performance regardless of precisely when it flushes. Your output code should simply use the BufferedWriter just like any other Writer.
I also see in your code that you repeatedly open and close the output file. You almost certainly don't need to do that. Instead open and close the file at a higher level in your program, so it remains open for each iteration.
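A sketch of that restructuring, assuming the output really is meant to go into a single file (the posted code actually writes one file per outer iteration, so adapt as needed):

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

public class SingleOpen {
    public static void main(String[] args) throws IOException {
        // Open once, write across all iterations, close once at the end.
        try (BufferedWriter writer = new BufferedWriter(new FileWriter("output.txt", true))) {
            for (int mainLoop = 0; mainLoop < 50; mainLoop++) {
                for (int innerLoop = 0; innerLoop < 50; innerLoop++) {
                    writer.append("data for iteration " + mainLoop + "/" + innerLoop);
                    writer.newLine();
                }
            }
        } // closing the writer flushes whatever is still buffered
    }
}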
My post got a little too long, sorry. Here is a summary:
A file on disk cannot be deleted ("the JVM holds the file" error), both when deleting from the Java code and when trying to manually delete the file from Windows.
All streams to that file are closed and set to null. All File objects are set to null.
The program does nothing at that point, but waiting 30 minutes allows me to delete the file from Windows. Weird. Is the file no longer used by Java? Also, since nothing happens in the program, it cannot be some stream I forgot (and I triple-checked that nothing is open).
Invoking System.gc() seemed to work when the files were small. It did not help when they got to about 20 MB.
[EDIT2] - I tried writing some basic code to explain, but it's tricky. I am sorry, I know it's difficult to answer like that. I can however write how I open and close streams, of course:
BufferedWriter bw = new BufferedWriter(new FileWriter(new File("C:\\folder\\myFile.txt")));
for (int i = 0; i < 10; i++) {
    bw.write("line " + i);
    bw.newLine();
}
bw.close();
bw = null;
If I've used a file object:
File f = new File("C:\\folder\\myFile.txt");
// use it...
f = null;
Basic code, I know. But this is essentially what I do.
I know for a fact I've closed all streams in this exact way.
I know for a fact that nothing happens in the program in that 30-minutes interval in which I cannot delete the file, until I somehow magically can.
Thank you for your input even without coherent code to go on.
I appreciate that.
Sorry for not providing any specific code here, since I can't pinpoint the problem (not exactly specific-code related). In any case, here is the thing:
I have written a program which reads, writes and modifies files on disk. For several reasons, the handling of the read/write is done in a different thread, which is constantly operating.
At some point, I am terminating the "read/write" thread, keeping only the main thread - it waits for input from a socket, totally unrelated to the file, and does nothing. Then, I try to delete the file (using File.delete(); I even tried the java.nio.file.Files delete option).
The thing is - and it's very weird - sometimes it works, sometimes it doesn't. Even manually, going to the folder and trying to delete the file via Windows gives me the "The file is open by the JVM" message.
Now, I am well aware that keeping references from all kinds of streams to the file prevents me from deleting it. Well past that by now :)
I have made sure that all streams are closed. I even set their values to null, including any "File" objects I have used (even though it shouldn't make any difference). All set to null, all closed. And the thread which generates all of them - the "read/write" thread - well, it's terminated since it got to the end of its run() method.
Usually, if I wait about 30 minutes while the JVM still operates, I can delete the file manually from Windows. The error magically disappears. When the JVM is closed, I can always delete the file right away.
I am lost here. Tried specifically invoking System.gc() before trying to delete the file, even called it like 10 times (not that it should matter). Sometimes it helped, but on other occasions, for example, when the file got larger (say 20MB), that didn't help.
What am I missing here?
Obviously, this couldn't be my implicit fault (not closing some stream), since the read/write thread is dead, the main thread awaits something unrelated (so the program is at a "standstill"), I have explicitly closed all streams, even nullified the references (inStream = null), invoked the garbage collector.
What am I missing? Why is the file "deletable" after 30 minutes (nothing happens at that time - not something in my code). Am I missing some gentle reference/garbage collection thingy?
What you're doing just calls for problems. You say that "if an IOException occurred, it is printed immediately" and it may be true, but given that something inexplicable is happening, we'd better doubt it.
I'd first ensure that everything gets always closed, and then I'd care about related logic (logging, exiting, ...).
What you did is not how resources should be managed, and the answer above is not exactly correct either. Try-with-resources is (besides Lombok's @Cleanup) about the only way to clearly show that nothing ever gets left open. Anything else is more complicated and more error-prone. I'd strongly recommend using it everywhere. This may be quite some work, but it also forces you to re-inspect all the critical code pieces.
Things like nullifying references and calling the GC should not help... and if they seem to, it may just be chance.
Some ideas:
Are you using memory mapped files?
Are you sure System.exit is not disabled by a security manager?
Are you running an antivirus? They love to scan files just after they get written.
Btw., file locking is one reason why WoW never started for me. Sometimes the locks persisted long after the culprit was gone, at least according to the tools I could use.
Are you closing your streams in a try...finally or try(A a = new A()) block? If not, the streams may not be closed.
I would strongly recommend using either Automatic Resource Block Management ( try(A a = new A()) ) or a try...finally block for all external resources.
try (BufferedWriter br = new BufferedWriter(new FileWriter(new File("C:\\folder\\myFile.txt")))) {
    for (int i = 0; i < 10; i++) {
        br.write("line " + i);
        br.newLine();
    }
}
In Java I have a process constantly generating output. Of course it's placed into some buffer of the output stream (FIFO) until it's processed. But what I sometimes need is to read the latest, most recent string of the stream, as if it were LIFO. The problem is that when I need it, I have to read all the previous output generated between my reads, because streams don't have random access - which is very slow.
I use BufferedReader(InputStreamReader(process.getInputStream()))
The buffer of BufferedReader also poses a little problem.
How can I discard all the output I don't need, fast?
If possible, I wouldn't like to create a separate reader-discarder thread.
I tried:
stdInput = new BufferedReader(new InputStreamReader(process.getInputStream()), 1000);
Then, when I need to read the output:
stdInput.skip(iS.available() + 1000); // get the length of the sequence generated up till now and discard it
stdInput.readLine(); // to 'flush' the BufferedReader buffer
s = stdInput.readLine(); // to read the latest string
This approach is very slow and takes an unpredictable amount of time.
Since you haven't posted the full code, it may be useful for you to try a few things and see what performs best. Overall, I'm not sure how much improvement you will see.
Remove the BufferedReader wrapper and benchmark. It seems that your InputStream is memory-backed, hence reads may be cheap. If that is the case, the buffered reader may slow you down. If reads are expensive, then you should keep it.
Try Channels.newChannel(InputStream in), use ReadableByteChannel.read(ByteBuffer dst), and benchmark. Again, it depends on your InputStream, but a Channel may show performance benefits.
Overall I'd recommend going the multithreaded approach with an ExecutorService and Callable class doing the reading and processing into a memory buffer.
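A rough sketch of that multithreaded idea: a single background task drains the stream continuously and keeps only the most recent line in an AtomicReference, so the caller can grab the latest value at any time. The class and method names here are made up for illustration, not an existing API:

import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicReference;

public class LatestLineReader {
    private final AtomicReference<String> latest = new AtomicReference<>();
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    // Continuously drains the stream so the pipe never fills up,
    // keeping only the most recent line in memory.
    public void start(InputStream in) {
        executor.submit(() -> {
            try (BufferedReader reader = new BufferedReader(new InputStreamReader(in))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    latest.set(line);
                }
            }
            return null;
        });
    }

    // Called whenever the latest output seen so far is needed.
    public String latestLine() {
        return latest.get();
    }

    public void stop() {
        executor.shutdownNow();
    }
}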
My suggestion (I don't know your implementation): https://docs.oracle.com/javase/7/docs/api/java/io/BufferedReader.html#mark(int)
BufferedReader has a mark method:
public void mark(int readAheadLimit)
throws IOException
Marks the present position in the stream. Subsequent calls to reset()
will attempt to reposition the stream to this point.
If you know the size of your data, you can use that size to set the mark and reposition the stream to the part you care about.
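A small sketch of mark()/reset() in isolation (the input here is a StringReader just to keep the example self-contained):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class MarkResetExample {
    public static void main(String[] args) throws IOException {
        BufferedReader reader = new BufferedReader(new StringReader("first line\nsecond line\n"));

        reader.mark(1024);                     // remember this position (valid for up to 1024 chars read ahead)
        System.out.println(reader.readLine()); // "first line"
        reader.reset();                        // jump back to the marked position
        System.out.println(reader.readLine()); // "first line" again
        reader.close();
    }
}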
I want to append a line of text to a file - however, I also want to get the position of the string in the file, so that I can later access the string directly using a RandomAccessFile and file.seek() (or similar).
The issue is that a lot of file I/O operations are asynchronous, and the write operations can happen within very short time intervals - suggesting asynchronous writes, since everything else is inefficient. How do I make sure the file pointer is calculated correctly? I am a newcomer to Java and don't yet understand the details of the different methods of file I/O, so excuse my question if using a BufferedWriter is exactly what I am looking for - but how do you get the current length of that?
EDIT: Reading the entire file is NOT an option. The file is large and, as I said, the write operations happen often, several hundred every second at peak times.
Refer to the FileChannel class: http://docs.oracle.com/javase/7/docs/api/java/nio/channels/FileChannel.html
Relevant snippets from the link:
The file itself contains a variable-length sequence of bytes that can be read and written and whose current size can be queried. The size of the file increases when bytes are written beyond its current size;
...
File channels are safe for use by multiple concurrent threads. The close method may be invoked at any time, as specified by the Channel interface. Only one operation that involves the channel's position or can change its file's size may be in progress at any given time; attempts to initiate a second such operation while the first is still in progress will block until the first operation completes. Other operations, in particular those that take an explicit position, may proceed concurrently; whether they in fact do so is dependent upon the underlying implementation and is therefore unspecified.
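As a sketch of how that could be used here, assuming a single writer (with several concurrent appenders, the size()/write pair below would need to be synchronized); the file name and helper method are placeholders:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class AppendWithOffset {

    // Appends one line and returns the byte offset at which it was written,
    // so it can later be located again with a positional read or seek().
    static long appendLine(FileChannel channel, String line) throws IOException {
        long offset = channel.size();                       // current end of file
        ByteBuffer buf = ByteBuffer.wrap((line + "\n").getBytes(StandardCharsets.UTF_8));
        channel.write(buf, offset);                         // positional write at the recorded offset
        return offset;
    }

    public static void main(String[] args) throws IOException {
        try (FileChannel channel = FileChannel.open(Paths.get("log.txt"),
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            long pos = appendLine(channel, "hello");
            System.out.println("line starts at byte " + pos);
        }
    }
}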
Is there any method so that I can split a text file in java without reading it?
I want to process a large text file of several GB, so I want to split the file into small parts, apply a thread to each part, and combine the results.
Since I will be reading it in small parts anyway, splitting the file by reading it won't make any sense, as I would have to read the same file twice, which will degrade performance.
Your threading approach is ill-formed. If you have to do significant processing on your file data, consider the following threading structure:
1 reader thread (reads the file and feeds the workers)
a queue with read chunks
1..n worker threads (n depends on your CPU cores; they process the data chunks from the reader thread)
a queue or dictionary with processed chunks
1 writer thread (writes results to some file)
Maybe you could combine the reader and writer threads into one thread, because it doesn't make much sense to parallelize I/O on the same physical hard disk.
It's clear that you need some synchronization between the threads. For the queues in particular, think about semaphores.
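A rough sketch of that structure, with one reader feeding several workers through a BlockingQueue (the file name and the process() body are placeholders, and the writer thread is omitted for brevity):

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ReaderWorkerSketch {
    private static final String POISON_PILL = "<<EOF>>"; // marks end of input

    public static void main(String[] args) throws Exception {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1024);
        int workers = Runtime.getRuntime().availableProcessors();

        // Worker threads: take lines off the queue and process them.
        Thread[] pool = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            pool[i] = new Thread(() -> {
                try {
                    String line;
                    while (!(line = queue.take()).equals(POISON_PILL)) {
                        process(line);
                    }
                    queue.put(POISON_PILL); // pass the pill on so every worker stops
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            pool[i].start();
        }

        // A single reader (here simply the main thread) feeds the workers.
        try (BufferedReader reader = new BufferedReader(new FileReader("big.txt"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                queue.put(line);
            }
        }
        queue.put(POISON_PILL);

        for (Thread t : pool) {
            t.join();
        }
    }

    static void process(String line) {
        // placeholder for the actual per-line work
    }
}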
Without reading the content of the file you can't do that; it is simply not possible.
I don't think this is possible for the following reasons:
How do you write a file without "reading" it?
You'll need to read in the text to know where a character boundary is (the encoding is not necessarily 1 byte). This means that you cannot treat the file as binary.
Is it really not possible to read line-by-line and process it like that? That also saves the additional space that the split files would take up alongside the original. For your reference, reading a text file is simply:
public static void loadFileFromInputStream(InputStream in) throws IOException {
    BufferedReader inputStream = new BufferedReader(new InputStreamReader(in));
    String record = inputStream.readLine();
    while (record != null) {
        // do something with the record
        // ...
        record = inputStream.readLine();
    }
}
You're only reading one line at a time... so the size of the file does not impact performance at all. You can also stop anytime you have to. If you're adventurous you can also add the lines to separate threads to speed up processing. That way, IO can continue churning along while you process your data.
Good luck! If, for some reason, you do find a solution, please post it here. Thanks!
Technically speaking, it can't be done without reading the file. But you also don't need to keep the entire file's contents in memory to do the splitting. Just open a stream to the file and write out to other files, redirecting output to another file after a certain number of bytes have been written to one file. This way you are not required to keep more than one byte of file data in memory at any given time, though having a larger buffer, about 8 or 16 KB, will dramatically increase performance.
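A sketch of that streaming approach, splitting at line boundaries rather than at an exact byte count (the file names and the lines-per-part limit are placeholders):

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

public class StreamingSplit {
    public static void main(String[] args) throws IOException {
        int maxLinesPerPart = 100_000;
        int part = 0;
        int linesInPart = 0;

        // Only one line is held in memory at a time; the buffered streams
        // provide the larger read/write buffers mentioned above.
        BufferedWriter out = new BufferedWriter(new FileWriter("part" + part + ".txt"));
        try (BufferedReader in = new BufferedReader(new FileReader("input.txt"))) {
            String line;
            while ((line = in.readLine()) != null) {
                if (linesInPart == maxLinesPerPart) {
                    out.close();                 // finish the current part
                    part++;
                    linesInPart = 0;
                    out = new BufferedWriter(new FileWriter("part" + part + ".txt"));
                }
                out.write(line);
                out.newLine();
                linesInPart++;
            }
        } finally {
            out.close();
        }
    }
}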
Something has to read your file to split it (and you probably want to split it at line boundaries, not at some multiple of kilobytes).
If running on Linux machine, you could delegate the splitting to an external command like csplit. So your Java program would simply run a csplit yourbigfile.txt command.
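A sketch of delegating to the external command with ProcessBuilder; the csplit pattern arguments shown here ("1000" and "{*}", i.e. split before every 1000th line, repeated) are just one possibility and depend on how you actually want to split:

import java.io.IOException;

public class CsplitRunner {
    public static void main(String[] args) throws IOException, InterruptedException {
        // Run the Linux csplit command on the big file, inheriting stdout/stderr.
        Process p = new ProcessBuilder("csplit", "yourbigfile.txt", "1000", "{*}")
                .inheritIO()
                .start();
        int exitCode = p.waitFor();
        System.out.println("csplit exited with " + exitCode);
    }
}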
In the literal sense no. To literally split a file into smaller files, you have to read the large one and write the smaller ones.
However, I think you really want to know if you can have different threads sequentially reading different "parts" of a file at the same time. And the answer is that you can do that. Just have each thread create its own RandomAccessFile object for the file, seek to the relevant place, and start reading.
(A FileInputStream would probably work too, though I don't think the Java API spec guarantees that skip is implemented using an OS-level "seek" operation on the file.)
There are a couple of possible complications:
If the file is text, you presumably want each thread to start processing at the start of some line in the file. So each thread has to start by finding the end of a line, and make sure that it reads to the end of the last line in its "part".
If the file uses a variable width character encoding (e.g. UTF-8), then you need to deal with the case where your partition boundaries fall in the middle of a character.
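A sketch of that per-thread approach, including the line-boundary handling from the first complication. Each thread is given a byte range of the file; note that RandomAccessFile.readLine() assumes a single-byte encoding, so the variable-width caveat above still applies:

import java.io.IOException;
import java.io.RandomAccessFile;

public class PartReader implements Runnable {
    private final String path;
    private final long start;
    private final long end;

    PartReader(String path, long start, long end) {
        this.path = path;
        this.start = start;
        this.end = end;
    }

    @Override
    public void run() {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            raf.seek(start);
            // Unless we start at the very beginning, skip the (possibly partial)
            // line we landed in; the thread owning the previous range reads it.
            if (start > 0) {
                raf.readLine();
            }
            String line;
            // Read lines until the file pointer has passed this thread's end boundary.
            while (raf.getFilePointer() <= end && (line = raf.readLine()) != null) {
                process(line);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    private void process(String line) {
        // placeholder for the per-line work
    }
}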