I have the following code that opens a zip file containing several files and extracts information from each file:
public static void unzipFile(InputStream zippedFile) throws IOException {
    try (ZipInputStream zipInputStream = new ZipInputStream(zippedFile)) {
        for (ZipEntry zipEntry = zipInputStream.getNextEntry(); zipEntry != null; zipEntry = zipInputStream.getNextEntry()) {
            BufferedReader reader = new BufferedReader(new InputStreamReader(new BoundedInputStream(zipInputStream, 1024)));
            //Extract info procedure...
        }
    }
}
In summary, I pick each file from the zip and open it with a BufferedReader to read its contents. I'm also using BoundedInputStream (org.apache.commons.io.input.BoundedInputStream) to limit how many bytes are read and guard against unexpectedly huge lines in the files.
It works as expected, however I'm getting this warning on Sonar:
Use try-with-resources or close this "BufferedReader" in a "finally" clause.
I just can't close (or use try-with-resources on, like I did at the beginning of the method) the BufferedReaders I create: if I call their close method, the ZipInputStream closes too. And the ZipInputStream is already under try-with-resources...
This Sonar notification is marked as critical, but I believe it is a false positive. I wonder if you could clarify this for me - am I correct, or should I handle this in a different way? I don't want to leave resource leaks in the code, since this method will be called several times and a leak could cause serious damage.
The Sonar notification is correct in that there is technically a resource leak that could eat up resources over time (see garbage collection and IO classes). To avoid closing the underlying ZipInputStream, consider wrapping each entry in a BoundedInputStream inside the for loop, as per this SO question: reading files in a zip file. Thus, when the BufferedReader is closed, the BoundedInputStream is closed rather than the ZipInputStream.
Thanks to the answers here, I could address my issue this way:
BoundedInputStream boundedInputStream = new BoundedInputStream(zipInputStream, MAX_LINE_SIZE_BYTES);
boundedInputStream.setPropagateClose(false);
try (BufferedReader reader = new BufferedReader(new InputStreamReader(boundedInputStream))) { ...
With boundedInputStream.setPropagateClose(false); I can close the BufferedReader without closing the zipInputStream.
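Putting it all together, here is a minimal sketch of the full loop (assuming Commons IO's BoundedInputStream and a hypothetical MAX_LINE_SIZE_BYTES constant):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import org.apache.commons.io.input.BoundedInputStream;

public static void unzipFile(InputStream zippedFile) throws IOException {
    final int MAX_LINE_SIZE_BYTES = 1024; // hypothetical limit
    try (ZipInputStream zipInputStream = new ZipInputStream(zippedFile)) {
        for (ZipEntry zipEntry = zipInputStream.getNextEntry(); zipEntry != null; zipEntry = zipInputStream.getNextEntry()) {
            BoundedInputStream boundedInputStream = new BoundedInputStream(zipInputStream, MAX_LINE_SIZE_BYTES);
            // closing the reader must not close the shared ZipInputStream
            boundedInputStream.setPropagateClose(false);
            try (BufferedReader reader = new BufferedReader(new InputStreamReader(boundedInputStream))) {
                //Extract info procedure...
            }
        }
    }
}

This satisfies Sonar while keeping the ZipInputStream open for the next entry.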
Related
I have a question regarding reading files in java.
Here is the sample code
File file = new File("myfile.txt");
InputStream inputStream = new FileInputStream(file);
BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream));
file.delete();
String line;
while ((line = reader.readLine()) != null) {
    System.out.println(line);
}
I create an input stream and try to read it. As they say, it's like a pipe: you read values byte by byte.
To speed things up, we can use a BufferedReader, which reads chunk by chunk.
So, I delete the file before reading it.
Now, when I read it, it still reads the complete file, even though the file is no longer there.
If inputStream is a pipe, why is it not failing? Any ideas?
I'm pretty sure it's because the txt file you are loading is so small that it is fully read into the buffer by the BufferedReader.
In my opinion, the file content is still available through the reader and the inputStream. Deleting the source file won't change anything until you rerun your application.
The BufferedReader and InputStream classes are probably prepared for situations like this. It can be useful if the source of the file goes offline during the run.
Correct me if I'm wrong, I'm not a professional programmer.
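For what it's worth, on POSIX systems the OS keeps a deleted file's data available to any process that still holds it open: the delete removes the directory entry, but the underlying data survives until the last open descriptor is closed. Here is a minimal sketch demonstrating this (assuming a Unix-like OS; on Windows the delete itself typically fails while the file is open):

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class DeletedFileRead {
    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("demo", ".txt");
        Files.write(path, List.of("hello", "world"));
        File file = path.toFile();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file)))) {
            // remove the directory entry while the stream is still open
            System.out.println("deleted: " + file.delete());
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line); // still prints "hello" and "world"
            }
        }
    }
}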
I see some posts on StackOverflow that contradict each other, and I would like to get a definite answer.
I started with the assumption that using a Java InputStream would allow me to stream bytes out of a file, and thus save on memory, as I would not have to consume the whole file at once. And that is exactly what I read here:
Loading all bytes to memory is not a good practice. Consider returning the file and opening an input stream to read it, so your application won't crash when handling large files. – andrucz
Download file to stream instead of File
But then I used an InputStream to read a very large Microsoft Excel file (using the Apache POI library) and I ran into this error:
java.lang.outofmemory exception while reading excel file (xlsx) using POI
I got an OutOfMemory error.
And this crucial bit of advice saved me:
One thing that'll make a small difference is when opening the file to start with. If you have a file, then pass that in! Using an InputStream requires buffering of everything into memory, which eats up space. Since you don't need to do that buffering, don't!
I got rid of the InputStream and just used a bare java.io.File, and then the OutOfMemory error went away.
So using java.io.File is better than an InputStream, when it comes to memory use? That doesn't make any sense.
What is the real answer?
So you are saying that an InputStream would typically help?
It entirely depends on how the application (or library) >>uses<< the InputStream
With what kind of follow up code? Could you offer an example of memory efficient Java?
For example:
// Efficient use of memory
try (InputStream is = new FileInputStream(largeFileName);
     BufferedReader br = new BufferedReader(new InputStreamReader(is))) {
    String line;
    while ((line = br.readLine()) != null) {
        // process one line
    }
}

// Inefficient use of memory
try (InputStream is = new FileInputStream(largeFileName);
     BufferedReader br = new BufferedReader(new InputStreamReader(is))) {
    StringBuilder sb = new StringBuilder();
    String line;
    while ((line = br.readLine()) != null) {
        sb.append(line).append("\n");
    }
    String everything = sb.toString();
    // process the entire string
}

// Very inefficient use of memory
try (InputStream is = new FileInputStream(largeFileName);
     BufferedReader br = new BufferedReader(new InputStreamReader(is))) {
    String everything = "";
    String line;
    while ((line = br.readLine()) != null) {
        everything += line + "\n";
    }
    // process the entire string
}
(Note that there are more efficient ways of reading a file into memory. The above examples are purely to illustrate the principles.)
The general principles here are:
avoid holding the entire file in memory, all at the same time
if you have to hold the entire file in memory, then be careful about how you "accumulate" the characters.
The posts that you linked to above:
The first one is not really about memory efficiency. Rather, it is talking about a limitation of the AWS client-side library. Apparently, the API doesn't provide an easy way to stream an object while reading it. You have to save the object to a file, then open the file as a stream. Whether that is memory efficient or not depends on what the application does with the stream; see above.
The second one is specific to the POI APIs. Apparently, the POI library itself reads the stream contents into memory if you use a stream. That would be an implementation limitation of that particular library. (But there could be a good reason; e.g. maybe because POI needs to be able to "seek" or "rewind" the stream.)
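To make the POI point concrete, here is a hedged sketch (assuming a recent Apache POI version and a hypothetical file name): opening from a File lets POI use a lower-memory code path, whereas opening from an InputStream forces it to buffer the whole contents.

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;

public class PoiOpenModes {
    public static void main(String[] args) throws IOException {
        // preferred for large files: POI can work from the file directly
        try (Workbook wb = WorkbookFactory.create(new File("large.xlsx"))) {
            System.out.println("sheets: " + wb.getNumberOfSheets());
        }

        // works, but POI has to buffer the stream contents in memory
        try (InputStream in = new FileInputStream("large.xlsx");
             Workbook wb = WorkbookFactory.create(in)) {
            System.out.println("sheets: " + wb.getNumberOfSheets());
        }
    }
}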
In my API (Spring Boot) I have an endpoint where users can upload multiple files at once. The endpoint takes as input a list of MultipartFile.
I don't want to pass the MultipartFile objects directly to the service, so I loop through each MultipartFile and create a simple map that stores the filename and its InputStream.
Like this:
for (MultipartFile file : files) {
    try (InputStream is = file.getInputStream()) {
        filesMap.put(file.getOriginalFilename(), is);
    }
}
service.uploadFiles(filesMap);
My understanding for Java streams and streams closing is quite limited.
I thought that try-with-resources automatically closes the InputStream once the code reached the end of the try block.
In the above code, when exactly does the multipartFile.getInputStream() get closed?
Will the fact that I'm storing the streams in a map cause a memory leak?
The stream closes as soon as execution reaches the closing bracket of the try block.
It is okay to store an InputStream anywhere after you have closed it.
But be aware that you can't read anything from the stream after you close it.
Thanks to the comments:
Also, be aware that some streams have special behavior on close(), and it always depends on the stream implementation.
For example:
If you try to read from closed FileInputStream you will get
java.io.IOException: Stream Closed
If you try to read from a closed ByteArrayInputStream it will be okay, because of its special close() implementation: public void close() throws IOException {}
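A quick sketch of both behaviors (the file name is hypothetical):

import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ClosedStreamDemo {
    public static void main(String[] args) throws IOException {
        InputStream fileIn = new FileInputStream("some-file.txt"); // hypothetical file
        fileIn.close();
        try {
            fileIn.read();
        } catch (IOException e) {
            System.out.println("FileInputStream: " + e.getMessage()); // "Stream Closed"
        }

        InputStream bytesIn = new ByteArrayInputStream(new byte[] {1, 2, 3});
        bytesIn.close(); // a no-op, see ByteArrayInputStream.close()
        System.out.println("ByteArrayInputStream: " + bytesIn.read()); // prints 1
    }
}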
When exactly does the multipartFile.getInputStream() get closed?
try (InputStream is = file.getInputStream()) {
    filesMap.put(file.getOriginalFilename(), is);
} // <-- here
The try-with-resources statement ensures that each resource is closed at the end of the statement.
Will the fact that I'm storing the streams in a map cause a memory leak?
No, your collection just keeps closed InputStreams, and you won't be able to read from them (you will get an IOException instead).
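If the service actually needs to read the uploads, a hedged alternative (a sketch assuming Java 9+ for readAllBytes() and a service that accepts byte arrays) is to copy the contents out before the stream closes and store bytes instead:

Map<String, byte[]> filesMap = new HashMap<>();
for (MultipartFile file : files) {
    try (InputStream is = file.getInputStream()) {
        // copy the content while the stream is still open
        filesMap.put(file.getOriginalFilename(), is.readAllBytes());
    }
}
service.uploadFiles(filesMap); // the service now receives data it can actually read

Note that readAllBytes() buffers each file fully in memory, so this only suits reasonably small uploads.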
I have Java code that reads through an input file using a BufferedReader until readLine() returns null. I need to use the contents of the file again an indefinite number of times. How can I read the file from the beginning again?
You can close and reopen it again. Another option: if it is not too large, put its content into, say, a List.
BufferedReader supports reset() only to a position within the buffered data. It can't go back to the beginning of the file (assuming the file is larger than the buffer).
Solutions:
1. Reopen the file
2. Use RandomAccessFile (see the sketch below)
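A minimal sketch of the RandomAccessFile option (the file name is hypothetical): seek(0) rewinds to the start so the file can be re-read without reopening.

import java.io.IOException;
import java.io.RandomAccessFile;

public class RewindDemo {
    public static void main(String[] args) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile("myfile.txt", "r")) {
            for (int pass = 0; pass < 3; pass++) {
                raf.seek(0); // back to the beginning of the file
                String line;
                // note: RandomAccessFile.readLine() does no charset decoding,
                // so this is only suitable for ASCII/Latin-1 content
                while ((line = raf.readLine()) != null) {
                    System.out.println(line);
                }
            }
        }
    }
}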
A single Reader should be used once to read the file. If you want to read the file again, create a new Reader based on it.
Using Guava's IO utilities, you can create a nice abstraction that lets you read the file as many times as you want using Files.newReaderSupplier(File, Charset). This gives you an InputSupplier<InputStreamReader> that you can retrieve a new Reader from by calling getInput() at any time.
Even better, Guava has many utility methods that make use of InputSuppliers directly... this saves you from having to worry about closing the supplied Reader yourself. The CharStreams class contains most of the text-related IO utilities. A simple example:
public void doSomeStuff(InputSupplier<? extends Reader> readerSupplier) throws IOException {
    boolean needToDoMoreStuff = true;
    while (needToDoMoreStuff) {
        // this handles creating, reading, and closing the Reader!
        List<String> lines = CharStreams.readLines(readerSupplier);
        // do some stuff with the lines you read
    }
}
Given a File, you could call this method like:
File file = ...;
doSomeStuff(Files.newReaderSupplier(file, Charsets.UTF_8)); // or whatever charset
If you want to do some processing for each line without reading every line into memory first, you could alternatively use the readLines overload that takes a LineProcessor.
You can do this by calling the run() function recursively after checking that no more lines can be read - here's a sample:
// Reload the file when you reach the end (i.e. when you can't read any more strings)
if ((sCurrentLine = br.readLine()) == null) {
    run();
}
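Note that deep recursion like this can eventually overflow the stack if the file is re-read many times. A loop that reopens the reader is a safer variant (a sketch; moreWorkToDo and processLine are hypothetical):

while (moreWorkToDo) {
    // reopen the file on each pass instead of recursing into run()
    try (BufferedReader br = new BufferedReader(new FileReader("input.txt"))) {
        String sCurrentLine;
        while ((sCurrentLine = br.readLine()) != null) {
            processLine(sCurrentLine); // hypothetical per-line processing
        }
    }
}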
If you want to do this, you may want to consider a random access file. With that you can explicitly set the position back to the beginning and start reading again from there.
I would suggest using the Commons libraries:
http://commons.apache.org/io/api-release/org/apache/commons/io/FileUtils.html
I think there is a call (FileUtils.readFileToByteArray) to just read the file into a byte array, which might be an alternate approach.
Not sure if you have considered the mark() and reset() methods on the BufferedReader
That can be an option if your files are only a few MB in size: set the mark at the beginning of the file and reset() once you hit the end. It also appears that subsequent reads of the same file will be served entirely from the buffer, without having to go to disk.
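A sketch of that approach (the read-ahead limit must be at least the file size for the mark to stay valid; the file name and the 8 MB limit are hypothetical):

try (BufferedReader br = new BufferedReader(new FileReader("small.txt"), 8 * 1024 * 1024)) {
    br.mark(8 * 1024 * 1024); // the mark is invalidated if we read past this limit
    for (int pass = 0; pass < 3; pass++) {
        String line;
        while ((line = br.readLine()) != null) {
            System.out.println(line);
        }
        br.reset(); // jump back to the mark at the start of the file
    }
}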
I was facing the same issue and came wandering to this question.
1. Using mark() and reset() methods:
A BufferedReader can be created from a FileReader and also from a FileInputStream. FileReader doesn't support the mark and reset methods; I got an exception when I tried to use them. Even when I tried with FileInputStream I wasn't able to do it, because my file was large (even yours is, I guess). If the file length is larger than the buffer, the mark and reset methods won't work, neither with FileReader nor with FileInputStream. More on this in this answer by @jtahlborn.
2. Closing and reopening the file
When I closed and reopened the file and created a new BufferedReader, it worked well.
The ideal way, I guess, is to reopen the file and construct a new BufferedReader, since a FileReader or FileInputStream should be used only once to read the file.
try {
    BufferedReader br = new BufferedReader(new FileReader(input));
    String line;
    while ((line = br.readLine()) != null)
    {
        //do something
    }
    br.close();
}
catch (IOException e)
{
    System.err.println("Error: " + e.getMessage());
}
I was handed some data in a file with a .dat extension. I need to read this data in a Java program and build the data into some objects we defined. I tried the following, but it did not work:
FileInputStream fstream = new FileInputStream("news.dat");
BufferedReader br = new BufferedReader(new InputStreamReader(fstream));
Could someone tell me how to do this in Java?
What kind of file is it? Is it a binary file which contains serialized Java objects? If so, then you rather need ObjectInputStream instead of DataInputStream to read it.
FileInputStream fis = new FileInputStream("news.dat");
ObjectInputStream ois = new ObjectInputStream(fis);
Object object = ois.readObject();
// ...
(don't forget to properly handle resources using close() in finally, but that's beyond the scope of this question)
See also:
Basic serialization tutorial
A .dat file is usually a binary file, without any specific associated format. You can read the raw bytes of the file in a manner similar to what you posted - but you will need to interpret these bytes according to the underlying format. In particular, when you say "open" the file, what exactly do you want to happen in Java? What kind of objects do you want to be created? How should the stream of bytes map to these objects?
Once you know this, you can either write this layer yourself or use an existing API (assuming it's a standard format).
For reference, your example doesn't work because it assumes that the binary format is a character representation in the platform's default charset (as per the InputStreamReader constructor). Since you say it's binary, the attempt to convert the bytes to a stream of characters will fail (because, after all, they aren't characters).
// BufferedInputStream is not strictly needed, but it is much more efficient
// than reading one byte at a time
BufferedInputStream in = new BufferedInputStream(new FileInputStream("news.dat"));
This will give you a buffered stream which will return the raw bytes of the file; you can now either read and process them yourself, or pass this input stream to some library API that will create appropriate objects for you (if such a library exists).
That entirely depends on what sort of file the .dat is. Unfortunately, .dat is often used as a generic extension for a data file. It could be binary, in which case you could use FileInputStream fstream = new FileInputStream(new File("news.dat")); and call read() to get bytes from the file, or text, in which case you could use BufferedReader buff = new BufferedReader(new InputStreamReader(new FileInputStream(new File("news.dat")))); and call readLine() to get each line of text. [edit]Or it could be Java objects, in which case see what BalusC said.[/edit]
In both cases, you'd then need to know what format the file was in to divide things up and get meaning from it, although this would be much easier if it was text as it could be done by inspection.
Please try the code below:
FileReader file = new FileReader(new File("File.dat"));
BufferedReader br = new BufferedReader(file);
String temp;
while ((temp = br.readLine()) != null) {
    System.out.println(temp);
}
A better way would be to use try-with-resources so that you would not have to worry about closing the resources.
Here is the code.
try (FileInputStream fis = new FileInputStream("news.dat");
     ObjectInputStream objectstream = new ObjectInputStream(fis)) {
    objectstream.readObject();
}
catch (IOException | ClassNotFoundException e) {
    //
}