Servlet File Uploading - java

I'm using a servlet to do a multiple file upload (using Apache Commons FileUpload). A portion of my code is posted below. My problem is that if I upload files again and again, the memory consumption of the app server jumps rather drastically. Apache Tomcat seems to hang on to the memory and never return it, and eventually the heap runs out of space and the application throws a java.lang.OutOfMemoryError: Java heap space.
I closed all the input streams. I think the problem is in the ServletFileUpload; could anyone tell me how to close or release it?
ServletContext context = this.getServletConfig().getServletContext();
DiskFileItemFactory factory = new DiskFileItemFactory();
FileCleaningTracker fileCleaningTracker = FileCleanerCleanup.getFileCleaningTracker(context);
factory.setFileCleaningTracker(fileCleaningTracker);
if (isMultiPart) {
    upload = new ServletFileUpload(factory);
    try {
        itr = upload.getItemIterator(request);
        while (itr.hasNext()) {
            item = itr.next();
            if (item.isFormField()) {
                ...

You're using FileCleaningTracker, and some versions of Apache Commons FileUpload have a severe memory leak in that component (see http://blog.novoj.net/2012/09/19/commons-file-upload-contains-a-severe-memory-leak/).
It seems it has already been fixed: https://issues.apache.org/jira/browse/FILEUPLOAD-189
So try using the latest available version.
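Independently of the tracker bug, the streaming API can keep uploads off the heap as long as every item stream is closed after use. Below is a minimal, hedged sketch (not the asker's full code; the target directory and the use of Streams.copy are illustrative):
// Hedged sketch: stream each uploaded part straight to disk and close the
// streams explicitly, so no item data is retained on the heap.
ServletFileUpload upload = new ServletFileUpload();
FileItemIterator itr = upload.getItemIterator(request);
while (itr.hasNext()) {
    FileItemStream item = itr.next();
    try (InputStream in = item.openStream()) {
        if (!item.isFormField()) {
            OutputStream out = new FileOutputStream(new File("/tmp/uploads", item.getName())); // illustrative path
            Streams.copy(in, out, true); // true: close the output stream when done
        }
    }
}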

Related

Use Apache Commons VFS RAM file to avoid using file system with API requiring a file

There is a highly upvoted comment on this post:
how to create new java.io.File in memory?
where Sorin Postelnicu mentions using an Apache Commons VFS RAM file as a way to have an in-memory file to pass to an API that requires a java.io.File (I am paraphrasing... I hope I haven't missed the point).
Based on reading related posts I have come up with this sample code:
@Test
public void working() throws IOException {
    DefaultFileSystemManager manager = new DefaultFileSystemManager();
    manager.addProvider("ram", new RamFileProvider());
    manager.init();
    final String rootPath = "ram://virtual";
    manager.createVirtualFileSystem(rootPath);
    String hello = "Hello, World!";
    FileObject testFile = manager.resolveFile(rootPath + "/test.txt");
    testFile.createFile();
    OutputStream os = testFile.getContent().getOutputStream();
    os.write(hello.getBytes());
    //FileContent test = testFile.getContent();
    testFile.close();
    manager.close();
}
So, I think that I have an in-memory file called ram://virtual/test.txt with the contents "Hello, World!".
My question is: how could I use this file with an API that requires a java.io.File?
Java's File API always works with the native file system, so there is no way to convert VFS's FileObject to a File without the file being present on the native file system.
There is a way, however, if your API can also work with an InputStream. Most libraries have overloaded methods that accept InputStreams. In that case, the following should work:
InputStream is = testFile.getContent().getInputStream();
SampleAPI api = new SampleAPI(is);
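If the API strictly requires a java.io.File and has no InputStream overload, the only option is to materialise the RAM file on the native file system first, for example as a temporary file. A hedged sketch (SampleAPI is the same hypothetical API as above):
// Hedged sketch: copy the VFS RAM file to a temporary java.io.File for APIs
// that only accept File. The temp file is removed when the JVM exits.
File tempFile = File.createTempFile("vfs-ram-", ".txt");
tempFile.deleteOnExit();
try (InputStream in = testFile.getContent().getInputStream();
     OutputStream out = new FileOutputStream(tempFile)) {
    byte[] buf = new byte[4096];
    int n;
    while ((n = in.read(buf)) != -1) {
        out.write(buf, 0, n);
    }
}
SampleAPI api = new SampleAPI(tempFile); // hypothetical API taking a File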

Check real file size while streaming files with FileItemStream

I'm writing an API using Spring + Apache Commons FileUpload.
https://commons.apache.org/proper/commons-fileupload/
There is a problem that I faced: I need to validate the file size. If it's bigger than the limit that I configure, the user should get an error.
For now, I implemented the upload without this check and it looks like this:
public ResponseEntity insertFile(@PathVariable Long profileId, HttpServletRequest request) throws Exception {
    ServletFileUpload upload = new ServletFileUpload();
    FileItemIterator uploadItemIterator = upload.getItemIterator(request);
    if (!uploadItemIterator.hasNext()) {
        throw new FileUploadException("FileItemIterator was empty");
    }
    while (uploadItemIterator.hasNext()) {
        FileItemStream fileItemStream = uploadItemIterator.next();
        if (fileItemStream.isFormField()) {
            continue;
        }
        //do stuff
    }
    return new ResponseEntity(HttpStatus.OK);
}
It does exactly what I need: it doesn't require the file to be loaded completely into memory. I use the InputStream I get to transfer the data on to another service, so at no point in time is the whole file held in memory.
However, that prevents me from getting the total number of bytes that were uploaded.
Is there a way to handle such validation without downloading the file completely or saving it somewhere?
I tried FileItem, but it requires the file to be loaded completely.
ServletFileUpload has a method setSizeMax that controls the maximum size accepted for each request. To mitigate memory consumption issues you can use a DiskFileItemFactory so that larger files are stored on disk. You must always read the files anyway, because trusting the headers alone is not reliable, but I think this will do the job :)
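As a hedged sketch of that suggestion (the limit constants and the downstream transfer call are illustrative), the limits are set on the ServletFileUpload before iterating; with the streaming API an exceeded limit surfaces as an exception while reading, so the file is never buffered in memory:
// Hedged sketch: MAX_REQUEST_SIZE, MAX_FILE_SIZE and transferToOtherService are illustrative names.
ServletFileUpload upload = new ServletFileUpload();
upload.setSizeMax(MAX_REQUEST_SIZE);   // cap for the whole multipart request
upload.setFileSizeMax(MAX_FILE_SIZE);  // cap for each individual file
FileItemIterator it = upload.getItemIterator(request);
while (it.hasNext()) {
    FileItemStream item = it.next();
    if (item.isFormField()) {
        continue;
    }
    try (InputStream in = item.openStream()) {
        // if the file grows past the limit, reading this stream aborts with a
        // size-limit exception instead of loading the rest into memory
        transferToOtherService(in);
    }
}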

OutOfMemoryError while attending multiple download requests with Spring

I'm getting an OutOfMemoryError while trying to download several files.
All of them are being downloaded simultaneously, and each of them is around 200 MB or more.
I'm using Spring 3.2.3 and Java 7. This is called from a REST request.
This is the code:
@RequestMapping(value = "/app/download", method = RequestMethod.GET, produces = MediaType.MULTIPART_FORM_DATA_VALUE)
public void getFile(@PathVariable String param, HttpServletResponse response) {
    byte[] fileBytes = null;
    String fileLength = null;
    try {
        // Firstly looking for the file from disk
        Path fileFromDisk = getFileFromDisk(param);
        InputStream is = null;
        long fileLengthL = Files.size(fileFromDisk);
        fileLength = String.valueOf(fileLengthL);
        // Preparing data for response
        String fileName = "Some file name.zip";
        response.setHeader("Content-Disposition", "attachment; filename=\"" + fileName + "\"");
        response.setHeader("Content-Length", fileLength);
        is = Files.newInputStream(fileFromDisk);
        IOUtils.copy(is, response.getOutputStream());
        response.flushBuffer();
    } catch (Exception e) {
        // Exception treatment
    }
}
IOUtils is the Apache Commons IO utility class for working with streams.
The code works perfectly until we have several requests at the same time.
I think the problem is that the response is filled with all the data from the file and is not freed from the JVM until the download is completed.
I would like to know if there is a way to chunk the response, or something similar, to avoid filling the heap space with all the data at once.
Any ideas?
Thank you very much in advance.
Have you given your dev environment enough memory?
I use Eclipse, and its default memory allocation is 512m, which has caused me issues when using Spring.
If you are using Eclipse, go into Eclipse's main folder and open the file called eclipse.ini.
There will be a line in there that says -Xmx512m.
Change that to whatever memory you would like to allocate to your dev environment; I would normally go with at least -Xmx1024m.
I hope this helps.
The content type set with the 'produces' attribute looks to be incorrect. Set the proper content type directly on the response object with the setContentType method. Also try using the setContentLength method to set the content length.
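A hedged sketch of that suggestion, applied to the code above (the MIME type is illustrative; for files above 2 GB a mechanism other than setContentLength would be needed):
// Hedged sketch: set the type and length directly on the response object.
response.setContentType("application/zip");                 // illustrative MIME type
response.setContentLength((int) Files.size(fileFromDisk));  // declared length lets the container stream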
After reading and reading I've reached this conclusion: the output stream of the response object has to be completely filled; it can't be returned to the browser or client as small blocks of data. So the whole file gets loaded, whatever its size.
My personal solution is to let a third party do the hard work. My requirements call for multiple downloads of big files at the same time; since my memory is not enough, I'm using an external entity that provides those files to me as a temporary URL.
I don't know if it is the best way, but it is working for me.
Thank you anyway for your responses.

JSP Writing to server directory

So I am using resumable.js to upload files to a server.
The directory that I want to save to is something like
/dir/files/upload/
Obviously just made up, but this directory has user permissions to write to it.
I am using JSP to listen to the POST request that resumable.js makes, and writing the .part files to that directory.
Sample listener:
<% if (request.getMethod().equals("POST") && request.getParameter("resumableFilename") != null) {
    long chunkSize = StringUtils.isEmpty(request.getParameter("resumableChunkSize")) ? 0 : Long.parseLong(request.getParameter("resumableChunkSize"));
    String fileName = request.getParameter("resumableFilename");
    long totalSize = StringUtils.isEmpty(request.getParameter("resumableTotalSize")) ? 0 : Long.parseLong(request.getParameter("resumableTotalSize"));
    String temp_dir = "/dir/files/upload/" + request.getParameter("resumableIdentifier"); // Add in user_id
    String dest_dir = temp_dir + fileName + ".part" + request.getParameter("resumableChunkNumber");
    File fDir = new File(temp_dir);
    fDir.mkdirs();
    if (ServletFileUpload.isMultipartContent(request)) {
        DiskFileItemFactory factory = new DiskFileItemFactory();
        factory.setRepository(new File(temp_dir));
        ServletFileUpload upload = new ServletFileUpload(factory);
        List items = upload.parseRequest(request);
        ArrayListIterator iter = (ArrayListIterator) items.iterator();
        FileItem item = (FileItem) iter.next();
        File fileWithNewDir = new File(dest_dir);
        item.write(fileWithNewDir); // write file to dest_dir (fileName.part*CHUNK_NUM*)
    }
}
%>
The script is hosted on
www.site.com/pubs/res.jsp
According to resumable.js itself, the upload process completes; however, the new directory is not created at all. I know it's not the write permissions, so it must be something else.
Here is my call in javascript for a new resumable object
var resume = new Resumable({
    target: "res.jsp",
    resumableChunkSize: 1 * 1024 * 1024,
    simultaneousUploads: 3,
    testChunks: false,
    throttleProgressCallbacks: 1
});
It seems to be hitting the jsp file, but nothing is happening.
I followed Apache's fileupload page in order to implement that listener, but maybe I went wrong at some point.
Apache's FileUpload
Resumable.js
The location of the directory matters. It has to be within the context of the WAR; you cannot write to a location outside the context of the container. If you look at the log you may be able to see an error message that explains this.
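If that is the cause, one option is to resolve the directory inside the deployed webapp instead of using an absolute path. A hedged sketch (the "upload" folder name is illustrative, and getRealPath can return null for unexpanded WARs):
// Hedged sketch: resolve a writable directory under the webapp's context root,
// using the JSP implicit 'application' (ServletContext) object.
String temp_dir = application.getRealPath("/upload") + File.separator
        + request.getParameter("resumableIdentifier");
new File(temp_dir).mkdirs();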

Java Heap Space (CMS with huge files)

EDIT:
Got the directory to work. Now there's another issue in sight:
the files in the storage are stored with their DB id as a prefix to their file names. Of course I don't want the users to see those.
Is there a way to combine the response.redirect and the header settings for filename and size?
best,
A
Hi again,
new approach:
Is it possible to create an IIS-like virtual directory within Tomcat in order to avoid streaming and only make use of a header redirect? I played around with contexts but couldn't get it going...
any ideas?
thx
A
Hi %,
I'm facing a weird issue with the Java heap space which is close to bringing me to the ropes.
The short version is:
I've written a content management system which needs to handle huge files (>600 MB) too. Tomcat heap settings:
-Xmx700m
-Xms400m
The issue is that uploading huge files works, even though it's slow. Downloading files results in a Java heap space exception.
Trying to download a 370 MB file makes Tomcat jump to 500 MB of heap (which should be OK) and end in a Java heap space exception.
I don't get it; why does upload work and download not?
Here's my download code:
byte[] byt = new byte[1024 * 1024 * 2];
int read; // bytes read per iteration (declaration not shown in the original excerpt)
response.setHeader("Content-Disposition", "attachment;filename=\"" + fileName + "\"");
FileInputStream fis = null;
OutputStream os = null;
fis = new FileInputStream(new File(filePath));
os = response.getOutputStream();
BufferedInputStream buffRead = new BufferedInputStream(fis);
while ((read = buffRead.read(byt)) > 0) {
    os.write(byt, 0, read);
    os.flush();
}
buffRead.close();
os.close();
If I'm getting it right, the buffered reader should take care of any memory issues, right?
Any help would be highly appreciated since I've run out of ideas.
Best regards,
W
If I'm getting it right the buffered reader should take care of any memory issue, right?
No, that has nothing to do with memory issues, it's actually unnecessary since you're already using a buffer to read the file. Your problem is with writing, not with reading.
I can't see anything immediately wrong with your code. It looks as though Tomcat is buffering the entire response instead of streaming it. I'm not sure what could cause that.
What does response.getBufferSize() return? And you should try setting response.setContentLength() to the file's size; I vaguely remember that a web container under certain circumstances buffers the entire response in order to determine the content length, so maybe that's what's happening. It's good practice to do it anyway since it enables clients to display the download size and give an ETA for the download.
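A hedged sketch of that suggestion, placed before the copy loop in the code above:
// Hedged sketch: declare the length before writing the body so the container
// does not have to buffer the response to determine it.
response.setContentLength((int) new File(filePath).length()); // a ~370 MB file fits in an int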
Try using the setBufferSize and flushBuffer methods of the ServletResponse.
You'd better use java.nio for that, so you can read resources partially and free resources that have already been streamed!
Otherwise, you end up with memory problems despite the settings you've applied to the JVM environment.
My suggestions:
The Quick-n-easy: Use a smaller array! Yes, it loops more, but this will not be a problem. 5 kilobytes is just fine. You'll know if this works adequately for you in minutes.
byte[] byt = new byte[1024*5];
A little bit harder: If you have access to sendfile (like in Tomcat with the Http11NioProtocol -- documentation here), then use it
A little bit harder, again: Switch your code to Java NIO's FileChannel. I have very, very similar code running on equally large files with hundreds of concurrent connections and similar memory settings with no problem. NIO is faster than plain old Java streams in these situations. It uses the magic of DMA (Direct Memory Access) allowing the data to go from disk to NIC without ever going through RAM or the CPU. Here is a code snippet for my own code base...I've ripped out much to show the basics. FileChannel.transferTo() is not guaranteed to send every byte, so it is in this loop.
WritableByteChannel destination = Channels.newChannel(response.getOutputStream());
FileChannel source = file.getFileInputStream().getChannel();
// start, total and length (declared elsewhere in the stripped-down original) are the
// starting offset in the file, the bytes sent so far, and the total bytes to send.
while (total < length) {
    long sent = source.transferTo(start + total, length - total, destination);
    total += sent;
}
The following code is able to stream data to the client while allocating only a small buffer (BUFFER_SIZE is a soft point; you may want to adjust it):
private static final int OUTPUT_SIZE = 1024 * 1024 * 50; // 50 MB
private static final int BUFFER_SIZE = 4096;

@Override
protected void doGet(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException {
    String fileName = "42.txt";
    // build response headers
    response.setStatus(200);
    response.setContentLength(OUTPUT_SIZE);
    response.setContentType("text/plain");
    response.setHeader("Content-Disposition",
            "attachment;filename=\"" + fileName + "\"");
    response.flushBuffer(); // write HTTP headers to the client

    // streaming result
    InputStream fileInputStream = new InputStream() { // fake input stream
        int i = 0;

        @Override
        public int read() throws IOException {
            if (i++ < OUTPUT_SIZE) {
                return 42;
            } else {
                return -1;
            }
        }
    };
    ReadableByteChannel input = Channels.newChannel(fileInputStream);
    WritableByteChannel output = Channels.newChannel(response.getOutputStream());
    ByteBuffer buffer = ByteBuffer.allocate(BUFFER_SIZE);
    while (input.read(buffer) != -1) {
        buffer.flip();
        output.write(buffer);
        buffer.clear();
    }
    input.close();
    output.close();
}
Are you required to serve files using Tomcat? For this kind of task we have used a separate download mechanism. We chained Apache -> Tomcat -> storage and then added rewrite rules for downloads. That way you bypass Tomcat and Apache serves the file to the client (Apache -> storage). But it works only if you have the files stored as files; if you read from a DB or another type of non-file storage, this solution cannot be used. The overall scenario is that you generate download links for files as e.g. domain/binaries/xyz... and write a redirect rule for domain/files using Apache mod_rewrite.
Do you have any filters in the application, or do you use the tcnative library? You could try to profile it with jvisualvm.
Edit: Small remark: note that you have an HTTP response splitting attack possibility in the setHeader call if you do not sanitize fileName.
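A hedged sketch of such sanitisation (the replacement character is arbitrary):
// Hedged sketch: strip CR/LF before placing the name in a header to prevent
// HTTP response splitting.
String safeName = fileName.replaceAll("[\\r\\n]", "_");
response.setHeader("Content-Disposition", "attachment;filename=\"" + safeName + "\"");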
Why don't you use Tomcat's own FileServlet?
It can surely serve files much better than you could possibly imagine.
A 2 MB buffer is way too large! A few KB should be ample. Megabyte-sized objects are a real issue for the garbage collector, since they often need to be treated separately from "normal" objects (normal == much smaller than a heap generation). To optimize I/O, your buffer only needs to be slightly larger than your I/O buffer size, i.e. at least as large as a disk block or network packet.
