I'm writing a Java desktop client which will send multiple files over the wire to a servlet using a post request. In the servlet I'm getting the input stream from the request to receive the files. The servlet will write the files to disk, one by one as they're read from the stream.
The implementation has a couple of requirements:
Only one HTTP request must be used to the server (so only a single stream)
The servlet must use a reasonable fixed amount of memory, no matter what the size of the files.
I had considered inserting markers into the stream so I know when one file ends and the next one begins. I'd then write some code to parse the stream in the servlet, and start writing the next file as appropriate.
Here's the thing... surely there's a library to do that. I've looked through apache commons and found nothing. Commons File Upload is interesting but since the upload comes from a Java app, not a browser it only solves the receiving end, not the sending.
Any ideas for a library which easily allows multiple file transfers across a single stream with fixed memory expectations even for very large files?
Thanks.
Just use HTTP multipart/form-data encoding on the POST request body. It's described in RFC 2388 and is the standard way of uploading (multiple) files over HTTP.
You can do it with just java.net.URLConnection as described in this mini-tutorial, although it would generate a lot of boilerplate code. A more convenient approach would be using Apache Commons HttpClient.
On the servlet side you can then just use Apache Commons FileUpload to process the uploaded files the usual HTTP way (or, when you're already on Servlet 3.0, HttpServletRequest#getParts(); see also this answer for examples).
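For the sending side, here is a minimal sketch of what the multipart/form-data body looks like when written by hand with only the JDK. The class name MultipartSketch, the field names and the boundary string are made up for illustration; a real client should pick a random boundary and escape quotes in file names. Each file is copied through a fixed 8 KB buffer, so memory use does not grow with file size.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library: writes multipart/form-data
// parts to an arbitrary OutputStream.
final class MultipartSketch {
    private static final String CRLF = "\r\n";

    /** Writes one file part; the content is copied through a fixed-size buffer. */
    static void writeFilePart(OutputStream out, String boundary, String fieldName,
                              String fileName, InputStream content) throws IOException {
        // Assumption: fieldName and fileName contain no quotes or line breaks.
        String header = "--" + boundary + CRLF
                + "Content-Disposition: form-data; name=\"" + fieldName
                + "\"; filename=\"" + fileName + "\"" + CRLF
                + "Content-Type: application/octet-stream" + CRLF
                + CRLF;
        out.write(header.getBytes(StandardCharsets.UTF_8));

        byte[] buffer = new byte[8192];
        int n;
        while ((n = content.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        // A CRLF terminates the part body before the next boundary.
        out.write(CRLF.getBytes(StandardCharsets.UTF_8));
    }

    /** Writes the closing boundary that ends the multipart body. */
    static void finish(OutputStream out, String boundary) throws IOException {
        out.write(("--" + boundary + "--" + CRLF).getBytes(StandardCharsets.UTF_8));
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        String boundary = "sketch-boundary-1234";
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        writeFilePart(body, boundary, "file1", "a.txt",
                new ByteArrayInputStream("first file".getBytes(StandardCharsets.UTF_8)));
        writeFilePart(body, boundary, "file2", "b.txt",
                new ByteArrayInputStream("second file".getBytes(StandardCharsets.UTF_8)));
        finish(body, boundary);
        System.out.print(new String(body.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

To send this over the wire instead of into a byte array, you would open an HttpURLConnection, call setDoOutput(true) and setChunkedStreamingMode(0) so the JDK does not buffer the whole body, set the Content-Type header to "multipart/form-data; boundary=" plus your boundary, and pass getOutputStream() to the methods above.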
Related
I want to download a very large file using camel, but I don't want to hold the entire file in memory and THEN save it to file.
I want to stream the file in and save or write to a file in chunks.
Is this possible with Camel, and if so, how do I do this?
Note: Is it possible that the endpoint I am downloading the file from does not support streaming/chunking? If yes, how can I verify this?
Camel's HTTP component uses Netty to make the request. Netty reads the entire response into memory, so there is no way to do what you are asking for.
You would need to implement your own endpoint for Camel that utilizes another HTTP library which has support for HTTP response streaming.
More documentation is available here:
https://cwiki.apache.org/confluence/display/CAMEL/Netty4+HTTP
You have three options for downloading the file, using:
ftp://[username@]hostname[:port]/directoryname[?options]
sftp://[username@]hostname[:port]/directoryname[?options]
ftps://[username@]hostname[:port]/directoryname[?options]
These endpoints have a streamDownload option.
For more check out http://camel.apache.org/ftp.html
I need to retrieve from an Application Server (JBoss) a large file (gigabytes) and to avoid loading it in memory, I want to stream it through EJB.
Is it possible to take data out of an Application Server as a stream?
Create an HttpServlet and stream the file.
update
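Be careful with your headers. You cannot set the Content-Length header via setContentLength(), because it only accepts an int.
You will have to set it with: setHeader("Content-Length", String.valueOf(length)), where length is a long; setHeader() takes the value as a String. (Servlet 3.1 adds setContentLengthLong(long) for this.)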
Maybe this will be helpful: Using ServletOutputStream to write very large files in a Java servlet without memory issues
There is a limit, but it depends on the client side: if the client holds the whole file in memory, it will not work.
By EJB do you mean remote bean? These beans are typically based on RMI which in turn uses Java serialization. You cannot stream data using RMI.
However with servlets and HTTP this will be dead simple. Just open FileInputStream to your large file and copy it byte-by-byte to servlet output.
To remember:
Use input file buffering
At the very beginning set Content-Length header so that the client knows how much data is left
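A minimal sketch of that copy loop, using only the JDK. The class and method names are hypothetical. In a servlet, the `out` argument would be response.getOutputStream(), and you would set the Content-Length header (as a String, e.g. from Files.size(file)) before calling it. Only one 8 KB buffer is held in memory at a time, regardless of file size.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: streams a file to any OutputStream in fixed-size chunks.
final class LargeFileStreamer {

    /** Copies the file to out and returns the number of bytes written. */
    static long stream(Path file, OutputStream out) throws IOException {
        long total = 0;
        byte[] buffer = new byte[8192];
        // Buffered input, as suggested above; closed automatically.
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file))) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
                total += n;
            }
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a large file and for the servlet output stream.
        Path tmp = Files.createTempFile("large", ".bin");
        Files.write(tmp, new byte[20_000]);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        System.out.println(stream(tmp, sink) + " bytes copied");
        Files.delete(tmp);
    }
}
```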
I have a webapp with an architecture I'm not thrilled with. In particular, I have a servlet that handles a very large file upload (via commons-fileupload), then processes the file, passing it to a service/repository layer.
What has been suggested to me is that I simply have my servlet upload the file, and a service on the backend do the processing. I like the idea, but I have no idea how to go about it. I do not know JMS.
Other details:
- App is a GWT app split into the recommended client/server/shared subpackages, using an MVP architecture.
- Currently, I am only running in GWT hosted mode, but am planning to move to Tomcat in the very near future.
I'm perfectly willing to learn whatever I need to in order to get this working (in fact, that's the point of writing the app). I'm not expecting anyone to write code for me, but can someone point me in the right direction to get started?
There are many options for this scenario, but the simplest may be just copying the uploaded file to a known location on the file system, and have a background daemon monitor the location and process when it finds it.
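As a rough sketch of that idea with only the JDK: the method below does one polling pass over an "incoming" directory, hands each file to a processor, and moves it to a "done" directory. The class name, directory layout and processor callback are all assumptions made for illustration, not anything from the original setup.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical poller: one pass over the upload directory.
final class UploadDirectoryPoller {

    /** Processes every regular file in incoming, moves it to done, returns the count. */
    static int pollOnce(Path incoming, Path done, Consumer<Path> processor) throws IOException {
        // Snapshot the listing first so files are not moved while iterating.
        List<Path> pending = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(incoming)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    pending.add(file);
                }
            }
        }
        for (Path file : pending) {
            processor.accept(file);
            Files.move(file, done.resolve(file.getFileName()),
                    StandardCopyOption.REPLACE_EXISTING);
        }
        return pending.size();
    }

    public static void main(String[] args) throws IOException {
        Path incoming = Files.createTempDirectory("incoming");
        Path done = Files.createTempDirectory("done");
        Files.write(incoming.resolve("upload-1.dat"), new byte[] {1, 2, 3});
        int handled = pollOnce(incoming, done,
                file -> System.out.println("processing " + file.getFileName()));
        System.out.println(handled + " file(s) handled");
    }
}
```

A background thread could call pollOnce every few seconds, for example from a ScheduledExecutorService. One thing to watch: the servlet should write the upload under a temporary name or in a separate directory and rename it into "incoming" only when complete, so the poller never picks up a half-written file.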
@Jason, there are many ways to solve your problem.
i) Dump your file data into a database column of type BLOB, and have a DB polling thread check the table for newly inserted files at a fixed interval.
ii) Dump the file onto the file system and have a file monitoring process.
The benefit of i) over ii) is that the DB is a centralized and fast resource, whereas file systems are generally slow and non-centralized in nature.
So basically the servlet would dump either to the DB or to the file system. As for who will process the dumped file: a) it could be the monitoring process discussed above, or b) you can use JMS, which is asynchronous in nature: the servlet would put a trigger event on a queue, which would asynchronously trigger a new processing thread.
Don't introduce JMS into your system unnecessarily if you are OK with a monitoring process.
This sounds interesting and familiar to me :). We do it in a similar way.
We have four projects, all of which include file upload and file processing (image/video/PDF/docs, etc.). So we created a single project to handle all file processing; it works something like this:
All four projects and the File Processor use Amazon S3/our file storage, so file storage is shared among all five projects.
We make an HTTP request to the File Processor, providing details in XML that include the file path on S3/storage, AWS authentication details, and file conversion/processing parameters. The File Processor does the processing, puts the processed files on S3/storage, constructs XML with the processed file details, and sends that XML back in the response.
We use Spring Framework and Tomcat.
Since this is foremost a learning exercise, you need to pick an easy to use JMS provider. This discussion suggested FFMQ just one year ago.
Since you are starting with a simple processor, you can keep it simple and use a JMS Queue.
In the simplest form, each message sent by the servlet has to correspond to a single job. You can either put the entire payload of the upload in the message, or just send a filename as a reference to the content. These are details you can refactor later.
On the processor side, if you are using Java EE, you can use a MessageBean. If you are not, then I would suggest a 3 JVM solution -- one each for Tomcat, the JMS server, and the message processor. This article includes the basics of a message consuming client.
The new Servlet 3.0 API provides a convenient way to parse multipart form data, but it stores the content of uploaded files on the file system or in memory.
Is there a streaming API for Servlet 3.0?
Something like Commons FileUpload. I have to read content directly from the InputStream and write it to another OutputStream, and I don't want to store temporary file content on disk or in memory.
I used this once for something similar, though not with servlets. It doesn't fill up your memory with data. Hope it helps:
http://code.google.com/p/io-tools/wiki/Tutorial_EasyStream
Looking at the Servlet 3.0 spec, it may not be possible to have a streaming implementation:
For parts with form-data as the Content-Disposition, but without a
filename, the string value of the part will also be available via the
getParameter /getParameterValues methods on HttpServletRequest, using
the name of the part.
So the request must be parsed up front so that all the non-file parts can be exposed as HttpServletRequest parameters.
You have to use third party libraries if you need streaming.
I'm looking for a way to get the form parameters of a HTTP multi-part request in a Servlet-filter without uploading files (yet).
request.getParameterMap() returns empty. I understand this is because of the request being multi-part.
I've looked at commons.HttpFileUpload but this seems to be overkill for my situation. In this filter I'm only interested in the normal parameters, and don't want to handle the file-upload yet.
Edit: the main problem is that I need to have an intact HttpRequestObject further down the filter stack. The HttpFileUpload seems to consume part of the request data (probably by using the data stream object and closing it again.)
It's certainly not overkill, it's the right way and always better than writing the parser yourself. Apache Commons FileUpload has been developed and maintained for years and has proven its robustness in handling multipart/form-data requests. You don't want to reinvent the wheel. If you really want to do it (I don't recommend it), then read up on the multipart/form-data specification and start with reading the HttpServletRequest#getInputStream() (warning: this is a mix of binary and character data!).
If necessary, you can also write a Filter that uses Apache Commons FileUpload under the hood: it checks whether each request is multipart/form-data and, if so, puts the parameters back into the request parameter map and stores the uploaded files (or exceptions) as request attributes, so that your servlet code becomes a bit more transparent. You can find here a basic example to get the idea.
Hope this helps.
Just to add to the answers already provided - I had a very similar problem in that I was trying to add some CSRF validation to our existing web app. We decided to include a special token in each form using some JS and add a servlet filter to check that the token existed (therefore a generic, isolated solution).
The servlet would check if the token was present but broke for every form that provided a file upload option. Hence I landed at this page frequently while doing some googling.
The workaround we used (while attempting to avoid any dealings with the uploaded files) was to get some JavaScript to add the token as a GET parameter, i.e. we modified the form's action URL to include the token and could therefore use the HttpServletRequest.getParameter() method for the token (and only the token).
I have tested this in IE, FF and Chrome and all seem to be happy.
Hope this helps anyone who also finds themselves in a similar situation.
The O'Reilly Servlets website has some sample code which you can download, customise and use. This includes MultipartRequest, which sounds like it does what you require: it unpacks a multipart request and allows access to the parameters and the files separately.
Commons FileUpload provides a mechanism to read request params from a multipart form upload.
There's a really great example of how to grab the request parameters here:
How to upload files to server using JSP/Servlet?