I'm using Play 2.1.0 and want to implement a file upload with several parameters, i.e. the multipart/form-data form has some small fields plus the file itself.
If I upload the file without using the annotation
@BodyParser.Of(value = BodyParser.MultipartFormData.class, maxLength = MAX_FILE_SIZE_B)
and check the file size myself (e.g. uploadedFile.length() > MAX_SIZE), I can access the request body, and it is never null.
If I use the annotation, then when the maximum size is exceeded, ctx.request().body().asMultipartFormData() is null, even though my small parameters come first in the request sent by the browser. Is this correct behaviour, and is there any way to get the small parameters even when the file is too large?
Is it true that the first way is bad, because large files will actually be uploaded to the server?
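For reference, here is a minimal sketch of the annotated action as I have it (the part name, the limit, and the surrounding controller are made up):

    import play.mvc.BodyParser;
    import play.mvc.Controller;
    import play.mvc.Http;
    import play.mvc.Result;

    public class Upload extends Controller {
        static final int MAX_FILE_SIZE_B = 1024 * 1024; // made-up limit

        @BodyParser.Of(value = BodyParser.MultipartFormData.class, maxLength = MAX_FILE_SIZE_B)
        public static Result upload() {
            Http.MultipartFormData body = ctx().request().body().asMultipartFormData();
            if (body == null) {
                // This is the case described above: once maxLength is exceeded,
                // the whole multipart body is gone, small fields included.
                return badRequest("file too large");
            }
            Http.MultipartFormData.FilePart file = body.getFile("file"); // made-up part name
            return ok("got " + file.getFilename());
        }
    }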
The behaviour is expected: the request's Content-Length header announces the payload size up front, and if it exceeds the max-length limit, the server will not receive the file and the connection will be closed. So you can't access any of the form fields. Instead, try adding those fields as request headers, if that helps.
There is no documentation that explains this, but that is how it is handled in the HTTP layer. A look at the framework code explains it a bit: when the payload exceeds the limit, it wraps the request object with body = null.
To answer your question: yes, the second approach is better, and it protects your server from accepting large files unnecessarily.
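If it helps, here is a client-side sketch of that header workaround, using a plain HttpURLConnection (the URL, header names, and file are all made up):

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    public class HeaderUpload {
        public static void main(String[] args) throws Exception {
            HttpURLConnection conn = (HttpURLConnection)
                    new URL("http://example.com/upload").openConnection(); // made-up URL
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            // Carry the small fields as headers so the server can inspect them
            // even when it rejects the body for being too large.
            conn.setRequestProperty("X-Filename", "photo.jpg"); // made-up header names
            conn.setRequestProperty("X-User-Id", "1234");
            conn.setRequestProperty("Content-Type", "application/octet-stream");
            try (OutputStream out = conn.getOutputStream()) {
                Files.copy(Paths.get("photo.jpg"), out); // made-up file
            }
            System.out.println("HTTP " + conn.getResponseCode());
        }
    }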
Jersey 2.4.1 gives us the ability to enable fixed length streaming. This is very useful when uploading large files. The new client property for enabling this is: HTTP_URL_CONNECTOR_FIX_LENGTH_STREAMING.
By default, when doing uploads, the whole entity content is buffered by the connector before the bytes are sent to their destination. This means that the client will likely run out of memory when uploading large files. Enabling fixed length streaming solves this problem.
Unfortunately, this property is not honored when the Content-Length header is not specified (or is set to 0) in the request. My question is: why? What problem is the Jersey runtime trying to prevent with this restriction? Is the content length necessary to stream the data?
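For context, this is roughly how I enable it; I am assuming the constant lives on ClientProperties, and the URL and file are made up:

    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import javax.ws.rs.client.Client;
    import javax.ws.rs.client.ClientBuilder;
    import javax.ws.rs.client.Entity;
    import javax.ws.rs.core.HttpHeaders;
    import javax.ws.rs.core.Response;
    import org.glassfish.jersey.client.ClientProperties;

    public class StreamingClient {
        public static void main(String[] args) throws Exception {
            Path file = Paths.get("big.bin"); // made-up file
            Client client = ClientBuilder.newClient()
                    .property(ClientProperties.HTTP_URL_CONNECTOR_FIX_LENGTH_STREAMING, true);
            try (InputStream in = Files.newInputStream(file)) {
                Response response = client.target("http://example.com/upload") // made-up URL
                        .request()
                        // Without an explicit Content-Length, the property is ignored.
                        .header(HttpHeaders.CONTENT_LENGTH, Files.size(file))
                        .post(Entity.entity(in, "application/octet-stream"));
                System.out.println(response.getStatus());
            }
        }
    }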
Thanks,
Habib
Whether fixed-length streaming is activated or not, the client should set the header anyway. With fixed-length streaming you know the size without needing to buffer the content, but that only makes sense if you actually set the header. The server doesn't care whether the client buffered the content to determine the length or not.
In HTTP, [the Content-Length field] SHOULD be sent whenever the message's length can be determined prior to being transferred, unless this is prohibited by the rules in section 4.4.
RFC 2616, section 14.13 Content-Length
Without setting the length header, the client could start streaming indefinitely without a buffer. I guess this is what Jersey tries to prevent, because then the server wouldn't know when the content ends (except for some cases listed in RFC 2616, section 4.4 Message Length).
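As far as I can tell, Jersey's HttpUrlConnector ultimately delegates to HttpURLConnection's fixed-length streaming mode, which itself needs the length up front. A plain-Java sketch of the same idea (the URL and file name are made up):

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    public class FixedLengthUpload {
        public static void main(String[] args) throws Exception {
            Path file = Paths.get("big.bin"); // made-up file
            long length = Files.size(file);   // the length must be known up front
            HttpURLConnection conn = (HttpURLConnection)
                    new URL("http://example.com/upload").openConnection(); // made-up URL
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            // No internal buffering; the Content-Length header is set for us.
            conn.setFixedLengthStreamingMode(length);
            try (OutputStream out = conn.getOutputStream()) {
                Files.copy(file, out); // bytes go straight to the socket
            }
            System.out.println("HTTP " + conn.getResponseCode());
        }
    }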
I forward upload requests I receive from clients to another endpoint. I do not control the presence of the Content-Length header in the requests I receive, and therefore may not always have a content length to send to the endpoint.
That said, I can see that we need to protect against the malicious case you mention above, although I initially thought this would be the backend's responsibility.
Thanks for the clarification.
I am using the URL class in Java, and I want to read bytes through an InputStream starting from a specific byte position in the stream, instead of using the skip() method, which takes a lot of time to reach that position.
I suppose it is not possible, and here is why: when you send a GET request, the remote server does not know that you are only interested in bytes 100 to 200; it sends you the full document/file. So you still have to read the skipped bytes, even though you don't handle them, and that is why skip() is slow.
But I am sure you can tell the server (some support this, some don't) that you want the file starting from byte 100.
Also, see this question for in-depth knowledge of the skip() mechanics: How does the skip() method in InputStream work?
The nature of streams means you will need to read through all the data to get to the specific place you want to start from. You will not get faster than skip(), unfortunately.
The simple answer is that you can't.
If you perform a GET that requests the entire file, you will have to use skip() to get to the part that you want. (And in fact, the slowness is most likely because the server has to send all of the data that is being skipped to the client. That is how TCP/IP works ...)
However, there is a possible alternative. The HTTP 1.1 specification supports partially fetching documents using the Range header. If your server supports this, you can ask it to send just the range of the document that you are interested in. However, you may need to handle the case where the server ignores the Range header and sends the entire document anyway.
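Here is a sketch of a ranged fetch with HttpURLConnection, including a fallback for servers that ignore Range (the URL and offsets are made up):

    import java.io.InputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class RangeFetch {
        public static void main(String[] args) throws Exception {
            HttpURLConnection conn = (HttpURLConnection)
                    new URL("http://example.com/big.bin").openConnection(); // made-up URL
            conn.setRequestProperty("Range", "bytes=100-199"); // ask for bytes 100..199 only
            try (InputStream in = conn.getInputStream()) {
                if (conn.getResponseCode() != HttpURLConnection.HTTP_PARTIAL) {
                    // Plain 200: the server ignored Range, so fall back to skipping.
                    long toSkip = 100;
                    while (toSkip > 0) {
                        long n = in.skip(toSkip);
                        if (n <= 0) break;
                        toSkip -= n;
                    }
                }
                byte[] buf = new byte[100];
                int off = 0;
                while (off < buf.length) { // read exactly the 100 bytes we asked for
                    int n = in.read(buf, off, buf.length - off);
                    if (n < 0) break;
                    off += n;
                }
                System.out.println("got " + off + " bytes");
            }
        }
    }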
I am trying to send some very large files (>200MB) through an HTTP output stream from a Java client to a servlet running in Tomcat.
My protocol currently packages the file contents in a byte[], which is placed in a Map&lt;String, Object&gt; along with some metadata (filename, etc.), each part under a "standard" key ("FILENAME" -> "Foo", "CONTENTS" -> byte[], "USERID" -> 1234, etc.). The Map is written to the URL connection's output stream (urlConnection.getOutputStream()). This works well when the file contents are small (<25MB), but I am running into Tomcat memory issues (OutOfMemoryError) when the file size is very large.
I thought of sending the metadata Map first, followed by the file contents, and finally by a checksum on the file data. The receiver servlet can then read the metadata from its input stream, then read bytes until the entire file is finished, finally followed by reading the checksum.
Would it be better to send the metadata in connection headers? If so, how? If I send the metadata down the socket first, followed by the file contents, is there some kind of standard protocol for doing this?
You will almost certainly want to use a multipart POST to send the data to the server. Then on the server you can use something like commons-fileupload to process the upload.
The good thing about commons-fileupload is that it understands that the server may not have enough memory to buffer large files and will automatically stream the uploaded data to disk once it exceeds a certain size, which is quite helpful in avoiding OutOfMemoryError type problems.
Otherwise you are going to have to implement something comparable yourself. It doesn't really make much difference how you package and send your data, as long as the server can 1) parse the upload and 2) redirect the data to a file so that it never has to buffer the entire request in memory at once. As mentioned, both of these come for free if you use commons-fileupload, so that's definitely what I'd recommend.
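Here is a server-side sketch using commons-fileupload's streaming API (the part names and target directory are made up):

    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import javax.servlet.ServletException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;
    import org.apache.commons.fileupload.FileItemIterator;
    import org.apache.commons.fileupload.FileItemStream;
    import org.apache.commons.fileupload.FileUploadException;
    import org.apache.commons.fileupload.servlet.ServletFileUpload;
    import org.apache.commons.fileupload.util.Streams;

    public class UploadServlet extends HttpServlet {
        @Override
        protected void doPost(HttpServletRequest req, HttpServletResponse resp)
                throws ServletException, IOException {
            try {
                FileItemIterator iter = new ServletFileUpload().getItemIterator(req);
                while (iter.hasNext()) {
                    FileItemStream item = iter.next();
                    InputStream in = item.openStream();
                    if (item.isFormField()) {
                        // Small metadata parts (FILENAME, USERID, ...) are read into memory.
                        String value = Streams.asString(in);
                        resp.getWriter().println(item.getFieldName() + "=" + value);
                    } else {
                        // The file part is streamed straight to disk; the request
                        // body is never buffered in memory as a whole.
                        Files.copy(in, Paths.get("/tmp", item.getName())); // made-up target dir
                    }
                }
            } catch (FileUploadException e) {
                throw new ServletException(e);
            }
        }
    }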
I don't have a direct answer for you, but you might consider using FTP instead. Apache MINA provides Ftplets, essentially servlets that respond to FTP events (see http://mina.apache.org/ftpserver/ftplet.html for details).
This would allow you to push your data in any format without requiring the receiving end to accommodate the entire data in memory.
Regards.
I have a filter which processes generated HTML and rewrites certain elements. For example, it adds class attributes to some anchors. Finally, it writes the processed HTML to the response (a subclass of HttpServletResponseWrapper). Naturally, this means that the processed HTML is a different length after it has passed through the filter.
I can see two ways of approaching this.
One is to iterate over the HTML, building up the processed output in a StringBuilder, and to write it to the response once all filtering is complete.
The other is to iterate over the HTML but to write it to the response as soon as each element has been processed.
Which is the better way for this operation, or is there another option which would be preferable? I am looking to minimise temporary memory usage primarily.
The complexity of streaming the response (i.e. writing it on the fly) lies in the code structure: your processing must produce the response bytes in order. But if you assemble the response in a StringBuilder, then your code is already good for streaming. Simply replace the StringBuilder with the PrintWriter that the ServletResponse.getWriter() method returns.
Note that in HTTP 1.0, the HTTP server must either provide the content length in the response headers, or close the connection at the end of the response. HTTP 1.1 includes the "chunked transfer encoding" which allows data streaming without knowing the content length beforehand, and without preventing the connection from being reused for subsequent HTTP requests. This should be handled automatically, so you do not have to worry about it unless you are trying to support really old HTTP clients.
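Here is a sketch of that swap, with the rewriting logic stubbed out (the class and method names are mine):

    import java.io.IOException;
    import java.io.PrintWriter;
    import javax.servlet.ServletResponse;

    class StreamingRewriter {
        // Instead of accumulating the whole page in a StringBuilder, write each
        // processed element to the response writer as soon as it is ready.
        void writeProcessed(ServletResponse response, Iterable<String> elements) throws IOException {
            PrintWriter out = response.getWriter();
            for (String element : elements) {
                out.write(rewrite(element)); // hypothetical per-element rewrite
            }
            out.flush();
        }

        private String rewrite(String element) {
            return element; // placeholder: e.g. add class attributes to anchors here
        }
    }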
Obviously the second approach would need less memory and would increase responsiveness, but it is often more difficult to implement.
First off, my Java is beyond rusty and I've never done JSPs or servlets, but I'm trying to help someone else solve a problem.
A form rendered by JavaScript is posting back to a JSP.
Some of the fields in this form are over 100KB in size.
However, when the form field is retrieved on the JSP side, its value is truncated to 100KB.
Now, I know there is a similar problem in ASP with Request.Form, which can be worked around by using Request.BinaryRead.
Is there an equivalent in Java?
Or, alternatively, is there a setting in WebSphere/Apache/IBM HTTP Server that gets around the same problem?
Since the posted request must be kept in memory by the servlet container to provide the functionality required by the ServletRequest API, most servlet containers have a configurable size limit to prevent DoS attacks; otherwise a small number of bogus clients could make the server run out of memory.
It's a little bit strange if WebSphere is silently truncating the request instead of failing properly, but if this is the cause of your problem, you may find the configuration options here in the WebSphere documentation.
We have resolved the issue.
Nothing to do with web server settings as it turned out and nothing was being truncated in the post.
Prior to posting, the form field was being split into 102399-byte chunks by JavaScript, and each chunk was added to the form field as a separate value, so it ended up as an array of values.
ASP's Request.Form appears to automatically concatenate these values to reproduce the single giant string, but Java's getParameter() does not.
Using getParameterValues() and rebuilding the string from the returned values, however, did the trick.
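In case it helps anyone else, the rebuild is just this (inside the JSP, where request is the implicit HttpServletRequest; the field name is made up):

    // Rebuild the single large value from the chunked parameter values.
    String[] chunks = request.getParameterValues("bigField"); // made-up field name
    StringBuilder whole = new StringBuilder();
    if (chunks != null) {
        for (String chunk : chunks) {
            whole.append(chunk);
        }
    }
    String fieldValue = whole.toString();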
You can use getInputStream() (raw bytes) or getReader() (decoded character data) to read the data from the request. Note how this interacts with reading the parameters. If you don't want to use a servlet, have a look at using a Filter to wrap the request.
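A minimal sketch of reading the raw body, in case parameter parsing itself is the problem (the helper class is mine; the buffer size is arbitrary):

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import javax.servlet.http.HttpServletRequest;

    class RawBodyReader {
        // Read the raw request body. Note: once getInputStream() has been
        // called, getParameter() may no longer be able to parse the form data.
        static byte[] readBody(HttpServletRequest request) throws IOException {
            ByteArrayOutputStream body = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            InputStream in = request.getInputStream();
            int n;
            while ((n = in.read(buf)) != -1) {
                body.write(buf, 0, n);
            }
            return body.toByteArray();
        }
    }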
I would expect WebSphere to reject the request rather than arbitrarily truncate data. I suspect a bug elsewhere.