I saw this nifty guide on how to do streaming file uploads via Apache Commons. It got me thinking: where is the data stored? And is it necessary to "close" or "clean" that location?
Thanks!
where is the data stored?
I don't think it is stored.
The Streaming API doesn't use DiskFileItemFactory. But it does use a buffer for copying data as BalusC has posted.
Once you have the stream of the upload, you can use
long bytesCopied = Streams.copy(yourInputStream, yourOutputStream, true);
Look at the API.
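For context, here is a minimal sketch of the Streaming API in use, assuming a servlet environment; openOutputStreamFor(...) is a hypothetical helper standing in for wherever you want each uploaded item to land:

import java.io.*;
import javax.servlet.http.HttpServletRequest;
import org.apache.commons.fileupload.FileItemIterator;
import org.apache.commons.fileupload.FileItemStream;
import org.apache.commons.fileupload.servlet.ServletFileUpload;
import org.apache.commons.fileupload.util.Streams;

public class StreamingUploadHandler {

    public void handle(HttpServletRequest request) throws Exception {
        // No FileItemFactory is configured, so nothing is buffered to disk;
        // each item is consumed directly from the request stream.
        ServletFileUpload upload = new ServletFileUpload();
        FileItemIterator iter = upload.getItemIterator(request);
        while (iter.hasNext()) {
            FileItemStream item = iter.next();
            if (!item.isFormField()) {
                // true = close the output stream once the copy completes;
                // the input stream is closed by Streams.copy in any case
                Streams.copy(item.openStream(), openOutputStreamFor(item.getName()), true);
            }
        }
    }

    // Hypothetical destination; adjust to wherever uploads should be written.
    private OutputStream openOutputStreamFor(String fileName) throws IOException {
        return new FileOutputStream(new File("/path/to/uploads", fileName));
    }
}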
Here is the javadoc for DiskFileItemFactory.
The default FileItemFactory implementation. This implementation
creates FileItem instances which keep their content either in memory,
for smaller items, or in a temporary file on disk, for larger items.
The size threshold, above which content will be stored on disk, is
configurable, as is the directory in which temporary files will be
created.
If not otherwise configured, the default configuration values are as
follows:
- Size threshold is 10KB.
- Repository is the system default temp directory, as returned by System.getProperty("java.io.tmpdir").
Temporary files, which are created for file items, should be deleted
later on. The best way to do this is using a FileCleaningTracker,
which you can set on the DiskFileItemFactory. However, if you do use
such a tracker, then you must consider the following: Temporary files
are automatically deleted as soon as they are no longer needed. (More
precisely, when the corresponding instance of File is garbage
collected.) This is done by the so-called reaper thread, which is
started automatically when the class FileCleaner is loaded. It might
make sense to terminate that thread, for example, if your web
application ends. See the section on "Resource cleanup" in the users
guide of commons-fileupload.
So yes, close and cleanup are necessary, as a FileItem may denote a real file on disk.
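As a sketch, the cleanup setup that javadoc describes looks roughly like this (the constructor arguments simply restate the defaults, and FileCleanerCleanup must also be registered as a listener in web.xml, per the user guide):

import java.io.File;
import javax.servlet.ServletContext;
import org.apache.commons.fileupload.disk.DiskFileItemFactory;
import org.apache.commons.fileupload.servlet.FileCleanerCleanup;
import org.apache.commons.io.FileCleaningTracker;

public DiskFileItemFactory newFactory(ServletContext context) {
    // The tracker's reaper thread deletes each temp file once the
    // corresponding FileItem instance is garbage collected.
    FileCleaningTracker tracker = FileCleanerCleanup.getFileCleaningTracker(context);
    DiskFileItemFactory factory = new DiskFileItemFactory(
            10 * 1024,                                       // size threshold: 10KB
            new File(System.getProperty("java.io.tmpdir"))); // repository
    factory.setFileCleaningTracker(tracker);
    return factory;
}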
It's stored as a byte[] in Java memory.
If I had a directory filled with different object files, is there a way I could read them into my application without opening a new stream every time? I am currently using ObjectInputStream, but I don't mind using another form of I/O.
For example, if I stored my users directly on my hard drive as objects (each having its own file: name.user), is there a way I could load them all back in using the same stream? Or would it be impossible, seeing as a new File object would be needed for each individual file? Is there a way around this?
Each file will need its own stream behind the scenes; there's no way around that. But that doesn't stop you from creating your own InputStream that manages this for you and then lets you read everything off one stream.
The idea would be that when you try to read from your CompoundObjectInputStream (or whatever you call it), it checks whether there are any files it hasn't yet processed, opens the next one with another stream if so, and passes the data through. When there are no more files in the directory, the CompoundObjectInputStream indicates end-of-stream.
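Here is a minimal sketch of that idea, assuming one directory of name.user files, each written by its own ObjectOutputStream. Since every such file starts with its own serialization header, a single ObjectInputStream cannot span them, so the wrapper opens a fresh one per file behind the scenes while the caller sees one logical sequence:

import java.io.*;
import java.util.ArrayDeque;
import java.util.Deque;

public class CompoundObjectReader implements Closeable {
    private final Deque<File> files = new ArrayDeque<>();
    private ObjectInputStream current;

    public CompoundObjectReader(File dir) {
        // assumes the directory exists and contains the serialized users
        for (File f : dir.listFiles((d, name) -> name.endsWith(".user"))) {
            files.add(f);
        }
    }

    /** Returns the next object, or null once every file has been read. */
    public Object readObject() throws IOException, ClassNotFoundException {
        while (true) {
            if (current == null) {
                if (files.isEmpty()) {
                    return null;                  // end of the whole sequence
                }
                current = new ObjectInputStream(new FileInputStream(files.poll()));
            }
            try {
                return current.readObject();
            } catch (EOFException endOfThisFile) {
                current.close();                  // this file is exhausted
                current = null;                   // move on to the next one
            }
        }
    }

    @Override
    public void close() throws IOException {
        if (current != null) {
            current.close();
        }
    }
}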
No, there is not. Each physical file requires its own FileInputStream, FileChannel, or other corresponding native accessor.
Note that File has no direct link to a physical file; it is just an abstract pathname.
I am using Spring-MVC and I need to send a MP4 file back to the user. The MP4 files are, of course, very large in size (> 2 GB).
I found this SO thread Downloading a file from spring controllers, which shows how to stream back a binary file, which should theoretically work for my case. However, what I am concerned about is efficiency.
In one case, an answer suggests loading all the bytes into memory:
byte[] data = SomeFileUtil.loadBytes(new File("somefile.mp4"));
In another case, an answer suggests using IOUtils:
InputStream is = new FileInputStream(new File("somefile.mp4"));
OutputStream os = response.getOutputStream();
IOUtils.copy(is, os);
I wonder if either of these is more memory-efficient than simply defining a resource mapping.
<resources mapping="/videos/**" location="/path/to/videos/"/>
The resource mapping may work, except that I need to protect all requests to these videos, and I do not think resource mapping will lend itself to logic that protects the content.
Is there another way to stream back binary data (namely, MP4)? I'd like something that's memory efficient.
I would think that defining a resource mapping would be the cleanest way of handling it. As for protecting access, you can simply add /videos/** to your security configuration and define what access you allow for it via something like
<security:intercept-url pattern="/videos/**" access="ROLE_USER, ROLE_ADMIN"/>
or whatever access you desire.
Also, you might consider saving these large MP4s to cloud storage and/or a CDN such as Amazon S3 (with or without CloudFront).
Then you can generate unique URLs that last as long as you want them to. The download is then handled by Amazon rather than consuming the computing power, disk space, and memory of your web server to serve up the large resource files. Also, if you use something like CloudFront, you can configure it for streaming rather than download.
Loading the entire file into memory is worse: it uses more memory, it doesn't scale, and you don't transmit any data until you've loaded it all, which adds all that latency.
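For what it's worth, here is a minimal sketch of a streaming controller, assuming Servlet-based Spring MVC and a hypothetical access check; because the copy goes through a small fixed buffer, heap use stays constant no matter how large the MP4 is:

import java.io.*;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.apache.commons.io.IOUtils;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;

@Controller
public class VideoController {

    @RequestMapping(value = "/videos/{name}.mp4", method = RequestMethod.GET)
    public void video(@PathVariable("name") String name,
                      HttpServletRequest request,
                      HttpServletResponse response) throws IOException {
        // Stand-in for whatever protection logic the videos need.
        if (request.getUserPrincipal() == null) {
            response.sendError(HttpServletResponse.SC_FORBIDDEN);
            return;
        }
        File file = new File("/path/to/videos", name + ".mp4"); // assumed location
        response.setContentType("video/mp4");
        response.setHeader("Content-Length", String.valueOf(file.length()));
        // copyLarge streams through a small fixed buffer and handles files
        // over 2GB, which plain copy's int return value cannot report.
        try (InputStream in = new FileInputStream(file)) {
            IOUtils.copyLarge(in, response.getOutputStream());
        }
    }
}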
I'm adding code to a large JSP web application, integrating functionality to convert CGM files to PDFs (or PDFs to CGMs) to display to the user.
It looks like I can create the converted files and store them in the directory designated by System.getProperty("java.io.tmpdir"). How do I manage their deletion, though? The program resides on a Linux-based server. Will the OS automatically delete from /tmp or will I need to come up with functionality myself? If it's the latter scenario, what are good ways to go about doing it?
EDIT: I see I can use deleteOnExit() (relevant answer elsewhere), but I think the JVM runs more or less continuously in the background so I'm not sure if the exits would be frequent enough.
I don't think I need to cache any converted files; I can just convert a file anew every time it's needed.
You can do this:
File file = File.createTempFile("base_name", ".tmp", new File(temporaryFolderPath));
file.deleteOnExit();
The file will be deleted when the virtual machine terminates.
Edit:
If you want to delete it after the job is done, just do it:
File file = null;
try {
    file = File.createTempFile("webdav", ".tmp", new File(temporaryFolderPath));
    // do something with the file
} finally {
    if (file != null) {   // guard against createTempFile having thrown
        file.delete();
    }
}
There are ways to have the JVM delete files when it exits using deleteOnExit(), but I think there are known memory leaks with that method. Here is a blog post explaining the leak: http://www.pongasoft.com/blog/yan/java/2011/05/17/file-dot-deleteOnExit-is-evil/
A better solution would be either to delete old files using a cron job or, if you know you aren't going to use the file again, simply to delete it after processing.
From your comment :
Also, could I just create something that checks to see if the size of my files exceeds a certain amount, and then deletes the oldest ones if that's true? Or am I overthinking it?
You could create a class that keeps track of the created files with a size limit. When the size of the created files, after creating a new one, goes over the limit, it deletes the oldest one. Beware that this may delete a file that still needs to exist even if it is the oldest one. You might need a way to know which files still need to be kept and delete only those that are not needed anymore.
You could have a timer in the class to check periodically instead of after each creation. This solution is tied to your application while using a cron isn't.
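A bare-bones sketch of such a tracker, assuming the oldest file is always safe to delete (which, as noted above, you may need to verify):

import java.io.File;
import java.util.ArrayDeque;
import java.util.Deque;

public class TempFileTracker {
    private final Deque<File> files = new ArrayDeque<>();
    private final long maxTotalBytes;
    private long totalBytes;

    public TempFileTracker(long maxTotalBytes) {
        this.maxTotalBytes = maxTotalBytes;
    }

    public synchronized void register(File file) {
        files.addLast(file);
        totalBytes += file.length();
        // Evict oldest files until we are back under the limit,
        // always keeping at least the file just registered.
        while (totalBytes > maxTotalBytes && files.size() > 1) {
            File oldest = files.removeFirst();
            totalBytes -= oldest.length();
            oldest.delete();
        }
    }
}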
If I have a property of an object that is a large String (say the contents of a file, ~50KB to 1MB, maybe larger), what is the practice for declaring such a property in a POJO? All I need is to be able to set the value in one layer of my application and transfer it to another without making the object itself "heavy".
I was considering whether it makes sense to expose an InputStream or OutputStream to get/set the value, rather than referencing the String itself, so that when I read the contents I read them as a stream of bytes rather than one huge String loaded into memory. Thoughts?
What you're describing depends largely on your anticipated use of the data. If you're delivering the contents in raw form, then there may be more efficient ways to manage it.
For example, if your app has a web interface, your app may just provide a URL for a web server to stream the contents to the requester. If it's a CLI-based app, you may be able to get away with a simple file copy. If your app is processing the file, however, then perhaps your POJO could retain only the results of that processing rather than the raw data itself.
If you wish to provide a general pattern along the lines of using POJO's with references to external streams, I would suggest storing in your POJO something akin to a URI that tells where to find the stream (like a row ID in a database or a filename or a URI) rather than storing an instance of the stream itself. In doing so, you'll reduce the number of open file handles, prevent potential concurrency issues, and will be able to serialize those objects locally if needed without having to duplicate the raw data persisted elsewhere.
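A minimal sketch of that pattern, assuming file-backed storage; the hypothetical DocumentRef holds only a path, and callers open a fresh stream when they actually need the bytes:

import java.io.*;
import java.nio.file.Files;
import java.nio.file.Paths;

public class DocumentRef implements Serializable {
    private final String path; // pointer to the external storage, not the data

    public DocumentRef(String path) {
        this.path = path;
    }

    // A new stream per call; the caller is responsible for closing it.
    public InputStream openContent() throws IOException {
        return Files.newInputStream(Paths.get(path));
    }
}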
You could have an object that supplies a stream or an iterator every time you access it. Note that the content has to live in some storage, such as a file; your object stores a pointer (e.g. a file path) to that storage, and every time someone accesses it you open a stream or create an iterator and let that party read. Note also that, to save memory, whoever consumes it has to make sure not to hold the whole content in memory.
However, 50KB or 1MB is really tiny. Unless you have gigabytes (or maybe hundreds of megabytes), I wouldn't try to do something like that.
Also, even if you have large data, it's often simpler to just use files or whatever storage you'll use.
tl;dr: Just use String.
I have a java project that uses java.io.RandomAccessFile to manage data loading. It seeks through the file creating a map of key points which can then be loaded as needed later. This works great.
I want to make it run as an applet, but it requires security permissions to create a temp file in which the downloaded file can be stored, and that's a huge barrier for its intended usage.
I think I can spare the memory (a few MB) to store the contents in a memory buffer of some sort and then random access it in the same way I treat local files...
Is there a way to create a temp file without requiring security permissions (I assume not)?
What is the best buffering option? How would I get the contents of a URL-based input stream into the buffer, read bytes from it, and be able to record and change the current seek position?
Check this discussion first: RandomAccessFile-like API for in-memory byte array?
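For a rough idea, here is a sketch of the in-memory approach, assuming the whole download fits comfortably in the heap; a ByteBuffer over the downloaded bytes gives you seekable, RandomAccessFile-like reads:

import java.io.*;
import java.net.URL;
import java.nio.ByteBuffer;

public class InMemoryRandomAccess {
    private final ByteBuffer buffer;

    public InMemoryRandomAccess(URL url) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = url.openStream()) {
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);           // accumulate the whole download
            }
        }
        buffer = ByteBuffer.wrap(out.toByteArray());
    }

    public void seek(int pos)       { buffer.position(pos); }  // like RandomAccessFile.seek
    public long getFilePointer()    { return buffer.position(); }
    public byte readByte()          { return buffer.get(); }
    public int readInt()            { return buffer.getInt(); }
    public void readFully(byte[] b) { buffer.get(b); }
}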