Reading all object files in directory with single stream - java

If I had a directory filled with different object files, is there a way I could input them into my application without opening a new stream every time? I am currently using ObjectInputStream, but I don't mind using another form of IO.
For example, if I stored my users directly on my hard drive as objects (each having their own file: name.user), is there a way I could load them all back in using the same stream? Or would that be impossible, seeing as a new File object would be needed for each individual file? Is there a way around this?

Each file will need its own stream behind the scenes; there's no way round that. But that doesn't stop you creating your own InputStream that manages this for you and then lets you read everything off one stream.
The idea would be that when you try to read from your CompoundObjectInputStream or whatever, it looks to see whether there are any more files it hasn't yet processed, opens the next one with its own underlying stream if so, and passes the data through. When there are no more files in the directory, the CompoundObjectInputStream indicates end-of-stream.
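Here is a minimal sketch of that idea (the class name comes from the answer above; everything else is illustrative). One wrinkle with serialized objects: each file written by its own ObjectOutputStream begins with a serialization header, so rather than concatenating raw bytes, the wrapper opens a fresh ObjectInputStream per file behind the scenes:
import java.io.*;
import java.util.*;

public class CompoundObjectInputStream implements Closeable {
    private final Iterator<File> files;
    private ObjectInputStream current;

    public CompoundObjectInputStream(File dir) {
        File[] listed = dir.listFiles((d, name) -> name.endsWith(".user"));
        files = Arrays.asList(listed == null ? new File[0] : listed).iterator();
    }

    // Returns the next object from the directory, or null when all files are read.
    public Object readNextObject() throws IOException, ClassNotFoundException {
        while (true) {
            if (current == null) {
                if (!files.hasNext()) return null;   // end-of-stream
                current = new ObjectInputStream(new FileInputStream(files.next()));
            }
            try {
                return current.readObject();
            } catch (EOFException eof) {
                current.close();                     // this file is exhausted;
                current = null;                      // move on to the next one
            }
        }
    }

    @Override
    public void close() throws IOException {
        if (current != null) current.close();
    }
}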

No, there is not. Each physical file requires its own FileInputStream, FileChannel, or other corresponding native accessor.
Note that a File object has no direct link to a physical file; it is just an abstract pathname.

Related

How to create an InputStream of files that have a certain extension in Java?

I have a lot of files in a directory but I only want to read the ones with a certain extension (say .txt). I want these files added to the same BufferedInputStream so that I can read them in one go. When I call read() at the end of a file, the next one should begin.
It really feels like there should be an obvious answer to this, but I've had no luck finding it.
You might want to take a look at SequenceInputStream:
A SequenceInputStream represents the logical concatenation of other input streams. It starts out with an ordered collection of input streams and reads from the first one until end of file is reached, whereupon it reads from the second one, and so on, until end of file is reached on the last of the contained input streams.
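For instance, a sketch along those lines (the .txt filter is from the question; the method name and structure are illustrative):
import java.io.*;
import java.util.*;

public static InputStream openAllTxtFiles(File dir) throws IOException {
    File[] txtFiles = dir.listFiles((d, name) -> name.endsWith(".txt"));
    if (txtFiles == null) throw new FileNotFoundException(dir.getPath());

    Vector<InputStream> streams = new Vector<>();
    for (File f : txtFiles) {
        streams.add(new FileInputStream(f));   // one underlying stream per file
    }
    // read() runs through each stream to EOF, then silently moves to the next
    return new BufferedInputStream(new SequenceInputStream(streams.elements()));
}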
To me the "obvious answer" is:
Just iterate through all the files in the directory using a proper filter. For each file, create a FileInputStream, read it, and close it.
I don't think there is an obvious answer to this question.
You probably need to create a wrapper InputStream holding the list of files you want to read from. Internally it would open and close streams as needed, namely whenever a file has been read completely.
It is not obvious, but it should not be difficult either, and this way you can work with only one InputStream for all the files.

How to manage the creation and deletion of temporary files

I'm adding code to a large JSP web application, integrating functionality to convert CGM files to PDFs (or PDFs to CGMs) to display to the user.
It looks like I can create the converted files and store them in the directory designated by System.getProperty("java.io.tmpdir"). How do I manage their deletion, though? The program resides on a Linux-based server. Will the OS automatically delete from /tmp or will I need to come up with functionality myself? If it's the latter scenario, what are good ways to go about doing it?
EDIT: I see I can use deleteOnExit() (relevant answer elsewhere), but I think the JVM runs more or less continuously in the background, so I'm not sure the exits would be frequent enough.
I don't think I need to cache any converted files; I can just convert a file anew every time it's needed.
You can do this:
File file = File.createTempFile("base_name", ".tmp", new File(temporaryFolderPath));
file.deleteOnExit();
The file will be deleted when the virtual machine terminates.
Edit:
If you want to delete it after the job is done, just do it:
File file = null;
try {
    file = File.createTempFile("webdav", ".tmp", new File(temporaryFolderPath));
    // do something with the file
} finally {
    if (file != null) {   // createTempFile may have thrown before assigning
        file.delete();
    }
}
There are ways to have the JVM delete files when it exits using deleteOnExit(), but I think that method has a known memory leak: every registered path is held in memory until the JVM terminates, which adds up in a long-running server. Here is a blog explaining the leak: http://www.pongasoft.com/blog/yan/java/2011/05/17/file-dot-deleteOnExit-is-evil/
A better solution would be either to delete old files using a cron job or, if you know you aren't going to use the file again, to simply delete it after processing.
From your comment :
Also, could I just create something that checks to see if the size of my files exceeds a certain amount, and then deletes the oldest ones if that's true? Or am I overthinking it?
You could create a class that keeps track of the created files with a size limit. When the total size of the created files goes over the limit after creating a new one, it deletes the oldest one. Beware that the oldest file may still be needed; you might want a way to mark which files must be kept and delete only those that are no longer in use.
You could have a timer in the class that checks periodically instead of after each creation. This solution is tied to your application, whereas using a cron job isn't.
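As a rough sketch of such a class (all names hypothetical, not thread-safe, and subject to the still-in-use caveat above):
import java.io.File;
import java.util.ArrayDeque;
import java.util.Deque;

public class TempFileTracker {
    private final Deque<File> created = new ArrayDeque<>();
    private final long maxTotalBytes;
    private long totalBytes;

    public TempFileTracker(long maxTotalBytes) {
        this.maxTotalBytes = maxTotalBytes;
    }

    // Call this right after creating a temporary file.
    public void register(File f) {
        created.addLast(f);
        totalBytes += f.length();
        // Evict the oldest files until we are back under the limit.
        while (totalBytes > maxTotalBytes && created.size() > 1) {
            File oldest = created.removeFirst();
            totalBytes -= oldest.length();   // length() is 0 once deleted,
            oldest.delete();                 // so subtract before deleting
        }
    }
}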

How to persist large strings in a POJO?

If I have a property of an object which is a large String (say the contents of a file ~ 50KB to 1 MB, maybe larger), what is the practice around declaring such a property in a POJO? All I need to do is to be able to set a value from one layer of my application and transfer it to another without making the object itself "heavy".
I was considering if it makes sense to associate an InputStream or OutputStream to get / set the value, rather than reference the String itself - which means when I attempt to read the value of the contents, I read it as a stream of bytes, rather than a whole huge string loaded into memory... thoughts?
What you're describing depends largely on your anticipated use of the data. If you're delivering the contents in raw form, then there may be more efficient ways to manage it.
For example, if your app has a web interface, your app may just provide a URL for a web server to stream the contents to the requester. If it's a CLI-based app, you may be able to get away with a simple file copy. If your app is processing the file, however, then perhaps your POJO could retain only the results of that processing rather than the raw data itself.
If you wish to provide a general pattern along the lines of using POJO's with references to external streams, I would suggest storing in your POJO something akin to a URI that tells where to find the stream (like a row ID in a database or a filename or a URI) rather than storing an instance of the stream itself. In doing so, you'll reduce the number of open file handles, prevent potential concurrency issues, and will be able to serialize those objects locally if needed without having to duplicate the raw data persisted elsewhere.
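A minimal sketch of that pattern (class and method names are illustrative); the POJO holds only a plain string locator, so it stays light and serializable:
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Document {
    private final String contentLocation;   // path or URI pointing at the content

    public Document(String contentLocation) {
        this.contentLocation = contentLocation;
    }

    // Opens a fresh stream on each call; the caller is responsible for closing it.
    public InputStream openContent() throws IOException {
        return Files.newInputStream(Paths.get(contentLocation));
    }
}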
You could have an object that supplies a stream or an iterator every time you access it. Note that the content has to live on some storage, such as a file; i.e., your object stores a pointer (e.g. a file path) to the storage, and every time someone accesses the content, you open a stream or create an iterator and let that party read. Note also that to actually save memory, whoever consumes the content must take care not to hold all of it in memory at once.
However, 50KB or 1MB is really tiny. Unless you have like gigabytes (or maybe hundred megabytes), I wouldn't try to do something like that.
Also, even if you have large data, it's often simpler to just use files or whatever storage you're using anyway.
tl;dr: Just use String.

Where is data stored during streaming file upload via Apache Commons?

I saw this nifty guide on how to do streaming file uploads via Apache Commons. This got me thinking: where is the data stored? And is it necessary to "close" or "clean" that location?
Thanks!
where is the data stored?
I don't think it is stored.
The Streaming API doesn't use DiskFileItemFactory. But it does use a buffer for copying data, as BalusC has posted.
Once you have the stream of the upload, you can use
long bytesCopied = Streams.copy(yourInputStream, yourOutputStream, true);
Look at the API
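For context, here is a sketch of the streaming API along the lines of the FileUpload user guide; the destination stream is a placeholder for wherever you want the bytes to go, so nothing lands on disk unless you write it there yourself:
import java.io.OutputStream;
import javax.servlet.http.HttpServletRequest;
import org.apache.commons.fileupload.FileItemIterator;
import org.apache.commons.fileupload.FileItemStream;
import org.apache.commons.fileupload.servlet.ServletFileUpload;
import org.apache.commons.fileupload.util.Streams;

public void handleUpload(HttpServletRequest request, OutputStream destination) throws Exception {
    ServletFileUpload upload = new ServletFileUpload();   // note: no DiskFileItemFactory
    FileItemIterator iter = upload.getItemIterator(request);
    while (iter.hasNext()) {
        FileItemStream item = iter.next();
        if (!item.isFormField()) {
            // Copies through a small in-memory buffer; closes the item stream
            // and leaves the destination open (third argument false).
            Streams.copy(item.openStream(), destination, false);
        }
    }
}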
Here is the javadoc for DiskFileItemFactory.
The default FileItemFactory implementation. This implementation creates FileItem instances which keep their content either in memory, for smaller items, or in a temporary file on disk, for larger items. The size threshold, above which content will be stored on disk, is configurable, as is the directory in which temporary files will be created.
If not otherwise configured, the default configuration values are as follows:
Size threshold is 10KB.
Repository is the system default temp directory, as returned by System.getProperty("java.io.tmpdir").
Temporary files, which are created for file items, should be deleted later on. The best way to do this is using a FileCleaningTracker, which you can set on the DiskFileItemFactory. However, if you do use such a tracker, then you must consider the following: Temporary files are automatically deleted as soon as they are no longer needed. (More precisely, when the corresponding instance of File is garbage collected.) This is done by the so-called reaper thread, which is started automatically when the class FileCleaner is loaded. It might make sense to terminate that thread, for example, if your web application ends. See the section on "Resource cleanup" in the user's guide of commons-fileupload.
So yes, close and cleanup are necessary, as a FileItem may denote a real file on disk.
It's stored as a byte[] in Java memory.

How to create a file that streams to an HTTP response

I'm writing a web application and want the user to be able click a link and get a file download.
I have an interface in a third-party library that I can't alter:
writeFancyData(File file, Data data);
Is there an easy way to create a File object that I can pass to this method, such that anything written to it is streamed to the HTTP response?
Notes:
Obviously I could just write a temporary file, read it back in, and then write it to the output stream of the HTTP response. However, what I'm looking for is a way to avoid the file system I/O, ideally by creating a fake file that, when written to, will instead write to the output stream of the HTTP response.
e.g.
writeFancyData(new OutputStreamBackedFile(response.getOutputStream()), data);
I need to use the writeFancyData method as it writes a file in a very specific format that I can't reproduce.
Assuming writeFancyData is a black box, it's not possible. As a thought experiment, consider an implementation of writeFancyData that did something like this:
public void writeFancyData(File file, Data data) {
    File localFile = new File(file.getPath());
    ...
    // process data from file
    ...
}
Given that the only thing any extended version of File can usefully hand back is a path name, you're just not going to be able to get your data into that method. If the signature included some sort of stream, you would be in a much better position, but since all you can pass in is a File, this can't be done.
In practice, the implementation probably uses one of the FileInputStream or FileReader classes, which really use the File object just for its name and then call out to native methods to get a file descriptor and handle the actual I/O.
As dlawrence writes, from the API alone it is impossible to determine what it is doing with the File.
A non-Java approach is to create a named pipe. You could establish a reader for the pipe in your program, create a File on that path, and pass it to the API.
Before doing anything so fancy, I would recommend analyzing performance and verifying that disk I/O is indeed a bottleneck.
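If you do go that route, here is a rough sketch (Unix-like systems only; mkfifo, the path, and the surrounding method are all illustrative, and writeFancyData is the third-party call from the question):
import java.io.*;

public void streamViaFifo(Data data, OutputStream httpOut) throws Exception {
    File fifo = new File("/tmp/fancy-" + System.nanoTime() + ".fifo");
    new ProcessBuilder("mkfifo", fifo.getPath()).start().waitFor();
    try {
        Thread drainer = new Thread(() -> {
            // Opening the FIFO for reading blocks until the writer connects.
            try (InputStream in = new FileInputStream(fifo)) {
                byte[] buf = new byte[8192];
                for (int n; (n = in.read(buf)) != -1; ) {
                    httpOut.write(buf, 0, n);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        drainer.start();
        writeFancyData(fifo, data);   // the third-party API writes "to a file"
        drainer.join();
    } finally {
        fifo.delete();
    }
}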
Given that API, the best you can do is to give it the File for a file in a RAM disk filesystem.
And lodge a bug / defect report against the API asking for an overload that takes a Writer or OutputStream argument.
