I know that in BIO, the web container receives the request and wraps it in an HttpServletRequest object, and we can get the headers and other data from it.
I assume the HTTP message has already been copied into user space, so why is the request body still an InputStream? Can anybody explain that? Thanks a lot!
HttpServletRequest is an interface from the servlet specification. There are many servlet container implementations, including Jetty, Tomcat and WebSphere to name a few. Each has its own implementation of HttpServletRequest.
Using an InputStream gives the servlet implementation the freedom to source the request body from wherever it likes. One implementation might use a local file and FileInputStream, another may have a byte array in memory and use ByteArrayInputStream, another might source it from a cache or database, etc.
Another benefit of InputStream over a byte array is that you can stream over it while only holding a small chunk in memory at a time. There's no requirement to have a large byte array (e.g. gigabytes) in memory all at once.
Imagine a video sharing site where each user can upload 1GB videos. If the servlet spec imposed byte arrays instead of InputStream, then a server with 8GB of RAM could only support 8 concurrent uploads. With InputStream you can have a small RAM buffer for each upload, so you could support hundreds or thousands of concurrent 1GB uploads.
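As a minimal sketch of that idea, assuming a Servlet 3.0+ container and a made-up mapping and target path, only a small fixed-size buffer is ever held in memory no matter how large the body is:

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@WebServlet("/upload")
public class UploadServlet extends HttpServlet {
    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        byte[] buffer = new byte[8192]; // 8 KB in memory per request, regardless of body size
        try (InputStream in = req.getInputStream();
             OutputStream out = Files.newOutputStream(Paths.get("/tmp/upload.bin"))) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
        resp.setStatus(HttpServletResponse.SC_OK);
    }
}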
I create PDF docs in memory as OutputStreams. These should be uploaded to S3. My problem is that it's not possible to create a PutObjectRequest from an OutputStream directly (according to this thread in the AWS dev forum). I use aws-java-sdk-s3 v1.10.8 in a Dropwizard app.
The two workarounds I can see so far are:
1. Copy the OutputStream into an InputStream and accept that twice the amount of RAM is used.
2. Pipe the OutputStream to an InputStream and accept the overhead of an extra thread (see this answer).
If I don't find a better solution I'll go with #1, because it looks as if I can afford the extra memory more easily than threads/CPU in my setup.
Is there any other, possibly more efficient way to achieve this that I have overlooked so far?
Edit:
My OutputStreams are ByteArrayOutputStreams
I solved this by subclassing ByteArrayOutputStream as a ConvertibleOutputStream:
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;

public class ConvertibleOutputStream extends ByteArrayOutputStream {
    // Creates an InputStream without actually copying the buffer and using up memory for that.
    public InputStream toInputStream() {
        return new ByteArrayInputStream(buf, 0, count);
    }
}
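For reference, usage then looks something like the following sketch; the client variable and bucket/key names are placeholders, and setting the content length up front keeps the SDK from buffering the whole stream just to determine it:

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.ObjectMetadata;
import com.amazonaws.services.s3.model.PutObjectRequest;

public class PdfUploader {
    // 's3Client' and the bucket/key names are assumptions for illustration.
    public void upload(AmazonS3 s3Client, ConvertibleOutputStream out) {
        ObjectMetadata meta = new ObjectMetadata();
        meta.setContentLength(out.size()); // length is known up front, so the SDK doesn't buffer

        s3Client.putObject(new PutObjectRequest("my-bucket", "docs/report.pdf",
                out.toInputStream(), meta));
    }
}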
What's the actual type of your OutputStream? Since it's an abstract class, there's no saying where the data actually goes (or if it even goes anywhere).
But let's assume that you're talking about a ByteArrayOutputStream, since it at least keeps the data in memory (unlike many others).
If you create a ByteArrayInputStream out of its buffer, there's no duplicated memory. That's the whole idea of streaming.
Another workaround is to use the presigned URL feature of S3.
Since a presigned URL allows you to upload files to S3 with an HTTP PUT or POST, it is possible to send your output stream through an HttpURLConnection.
There is sample code from Amazon.
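A rough sketch of that approach with the v1 SDK; the bucket, key, and expiry are placeholders, not anything from the original question:

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Date;

import com.amazonaws.HttpMethod;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.GeneratePresignedUrlRequest;

public class PresignedUpload {
    public static void upload(AmazonS3 s3, byte[] pdfBytes) throws Exception {
        Date expiry = new Date(System.currentTimeMillis() + 60 * 60 * 1000); // one hour
        GeneratePresignedUrlRequest request =
                new GeneratePresignedUrlRequest("my-bucket", "docs/report.pdf")
                        .withMethod(HttpMethod.PUT)
                        .withExpiration(expiry);
        URL url = s3.generatePresignedUrl(request);

        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setDoOutput(true);
        conn.setRequestMethod("PUT");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(pdfBytes); // or copy from your stream in small chunks
        }
        if (conn.getResponseCode() != 200) {
            throw new IllegalStateException("Upload failed: " + conn.getResponseCode());
        }
    }
}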
I am using Spring-MVC and I need to send a MP4 file back to the user. The MP4 files are, of course, very large in size (> 2 GB).
I found this SO thread Downloading a file from spring controllers, which shows how to stream back a binary file, which should theoretically work for my case. However, what I am concerned about is efficiency.
In one case, an answer suggests loading all the bytes into memory:
byte[] data = SomeFileUtil.loadBytes(new File("somefile.mp4"));
In another case, an answer suggests using IOUtils:
InputStream is = new FileInputStream(new File("somefile.mp4"));
OutputStream os = response.getOutputStream();
IOUtils.copy(is, os);
I wonder if either of these is more memory-efficient than simply defining a resource mapping?
<resources mapping="/videos/**" location="/path/to/videos/"/>
The resource mapping may work, except that I need to protect all requests to these videos, and I do not think resource mapping will lend itself to logic that protects the content.
Is there another way to stream back binary data (namely, MP4)? I'd like something that's memory efficient.
I would think that defining a resource mapping would be the cleanest way of handling it. With regard to protecting access, you can simply add /videos/** to your security configuration and define what access you allow for it via something like
<security:intercept-url pattern="/videos/**" access="ROLE_USER, ROLE_ADMIN"/>
or whatever access you desire.
Also, you might consider saving these large MP4s to cloud storage and/or a CDN such as Amazon S3 (with or without CloudFront).
Then you can generate unique URLs which will last as long as you want them to. The download is then handled by Amazon rather than using the computing power, disk space, and memory of your web server to serve up the large resource files. Also, if you use something like CloudFront, you can configure it for streaming rather than download.
Loading the entire file into memory is worse: it uses more memory, it doesn't scale, and you don't transmit any data until you've loaded it all, which adds latency.
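For completeness, a minimal sketch of the streaming approach in a controller; the mapping, content type, and directory are illustrative, and real code would validate the requested name:

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

import javax.servlet.http.HttpServletResponse;

import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;

@Controller
public class VideoController {

    @RequestMapping("/videos/{name:.+}")
    public void streamVideo(@PathVariable String name, HttpServletResponse response)
            throws IOException {
        // NOTE: check 'name' against path traversal before using it in real code.
        response.setContentType("video/mp4");
        try (InputStream in = Files.newInputStream(Paths.get("/path/to/videos", name))) {
            byte[] buffer = new byte[8192]; // fixed-size buffer: memory use stays constant
            int n;
            while ((n = in.read(buffer)) != -1) {
                response.getOutputStream().write(buffer, 0, n);
            }
        }
    }
}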
I am trying to send some very large files (>200MB) through an Http output stream from a Java client to a servlet running in Tomcat.
My protocol currently packages the file contents in a byte[], and that is placed in a Map<String, Object> along with some metadata (filename, etc.), each part under a "standard" key ("FILENAME" -> "Foo", "CONTENTS" -> byte[], "USERID" -> 1234, etc.). The Map is written to the URL connection output stream (urlConnection.getOutputStream()). This works well when the file contents are small (<25MB), but I am running into Tomcat memory issues (OutOfMemoryError) when the file size is very large.
I thought of sending the metadata Map first, followed by the file contents, and finally a checksum of the file data. The receiver servlet can then read the metadata from its input stream, read bytes until the entire file is finished, and finally read the checksum.
Would it be better to send the metadata in connection headers? If so, how? If I send the metadata down the socket first, followed by the file contents, is there some kind of standard protocol for doing this?
You will almost certainly want to use a multipart POST to send the data to the server. Then on the server you can use something like commons-fileupload to process the upload.
The good thing about commons-fileupload is that it understands that the server may not have enough memory to buffer large files and will automatically stream the uploaded data to disk once it exceeds a certain size, which is quite helpful in avoiding OutOfMemoryError type problems.
Otherwise you are going to have to implement something comparable yourself. It doesn't really make much difference how you package and send your data, so long as the server can 1) parse the upload and 2) redirect data to a file so that it doesn't ever have to buffer the entire request in memory at once. As mentioned, both of these come free if you use commons-fileupload, so that's definitely what I'd recommend.
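To make that concrete, a minimal sketch of the receiving side with commons-fileupload; the size threshold, directories, and field handling are illustrative:

import java.io.File;
import java.util.List;

import javax.servlet.http.HttpServletRequest;

import org.apache.commons.fileupload.FileItem;
import org.apache.commons.fileupload.disk.DiskFileItemFactory;
import org.apache.commons.fileupload.servlet.ServletFileUpload;

public class UploadHandler {
    public void handle(HttpServletRequest request) throws Exception {
        DiskFileItemFactory factory = new DiskFileItemFactory();
        factory.setSizeThreshold(256 * 1024);            // keep items up to 256 KB in memory
        factory.setRepository(new File("/tmp/uploads")); // larger items spill to disk here

        ServletFileUpload upload = new ServletFileUpload(factory);
        List<FileItem> items = upload.parseRequest(request);
        for (FileItem item : items) {
            if (item.isFormField()) {
                // metadata fields such as FILENAME or USERID
                System.out.println(item.getFieldName() + " = " + item.getString());
            } else {
                // the file contents, written out without buffering the whole body in memory
                item.write(new File("/data/incoming", item.getName()));
            }
        }
    }
}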
I don't have a direct answer for you but you might consider using FTP instead. Apache Mina provides FTPLets, essentially servlets that respond to FTP events (see http://mina.apache.org/ftpserver/ftplet.html for details).
This would allow you to push your data in any format without requiring the receiving end to accommodate the entire data in memory.
I have a JAX-RS web service that calls a DB2 z/OS database and returns about 240MB of data in a result set. I am then creating an OutputStream to send this data to the client by looping through the result set and adding a few XML tags for my output.
I am confused about whether to use PrintWriter, BufferedWriter or OutputStreamWriter. I am looking for the fastest way to deliver the data. I also don't want the JVM to hold onto this data any longer than it needs to, so I don't use up its memory.
Any help is appreciated.
You should:
- use BufferedWriter
- call .flush() frequently
- enable gzip for the best compression
Also start thinking about a different way of doing this. Can your data be paginated? Do you need all the data in one request?
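As a sketch of what the first two points can look like with JAX-RS StreamingOutput (the path is made up, and the result-set loop is only indicated in comments):

import java.io.BufferedWriter;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.core.Response;
import javax.ws.rs.core.StreamingOutput;

@Path("/report")
public class ReportResource {

    @GET
    @Produces("application/xml")
    public Response get() {
        StreamingOutput stream = (OutputStream out) -> {
            BufferedWriter writer =
                    new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
            writer.write("<rows>");
            // for each row in the result set (assumed):
            //     writer.write("<row>...</row>");
            //     writer.flush(); // flush periodically so the JVM never holds the whole payload
            writer.write("</rows>");
            writer.flush();
        };
        return Response.ok(stream).build();
    }
}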
If you are sending large binary data, you probably don't want to use XML. When XML is used, binary data is usually represented in base64, which is larger than the original binary and uses quite a lot of CPU for the conversion to base64.
If I were you, I'd send the binary separately from the XML. If you are using a web service, an MTOM attachment could help. Otherwise you could put a reference to the binary data in the XML, and let the app download the binary data separately.
As for the fastest way to send binary data, if you are using WebLogic, just writing to the response's output stream would be fine. That output stream is most probably buffered, and whatever you do probably won't change the performance anyway.
Turning on gzip could also help, depending on what you are sending (e.g. for JPEGs and other already-compressed data it won't help a lot, but for raw text it can help a lot).
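Containers can usually be configured to gzip responses for you, but if you need to do it by hand, a rough sketch (the helper class is made up) looks like this; note the caller must close the returned stream so the gzip trailer gets written:

import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;

import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public final class GzipUtil {
    private GzipUtil() {}

    // Wraps the response stream in a GZIPOutputStream when the client supports it.
    public static OutputStream maybeGzip(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        String accept = req.getHeader("Accept-Encoding");
        if (accept != null && accept.contains("gzip")) {
            resp.setHeader("Content-Encoding", "gzip");
            return new GZIPOutputStream(resp.getOutputStream());
        }
        return resp.getOutputStream();
    }
}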
One solution (which might not work for you) is to spawn a job/thread that creates a file and then notifies the user when the file is ready to download. This way you're not tied to the bandwidth of the client connection (and you can even compress the file properly before the client downloads it).
Some business intelligence and data-crunching applications do this, especially if the process takes some time to generate the data.
The maximum output speed will be limited by network bandwidth, and I am sure any Java OutputStream will be so much faster that you won't notice the difference.
The choice depends on the data to send: if it's text (lines), PrintWriter is easy; if it's a byte array, use an OutputStream.
To avoid holding too much data in the buffers, you should call flush() every x KB or so.
You should never use PrintWriter to output data over a network. First of all, it creates platform-dependent line breaks. Second, it silently catches all I/O exceptions, which makes it hard for you to deal with those exceptions.
And if you're sending 240 MB as XML, then you're definitely doing something wrong. Before you start worrying about which stream class to use, try to reduce the amount of data.
EDIT:
The advice about PrintWriter (and PrintStream) came from a book by Elliotte Rusty Harold. I can't remember which one, but it was a few years ago. I think that ServletResponse.getWriter() was added to the API after that book was written, so it looks like Sun didn't follow Rusty's advice. I still think it was good advice, for the reasons stated above, and because it can tempt implementation authors to violate the API contract in order to get predictable behavior.
I am developing a web application which reads very large files (> 50MB) and displays them. The Java Spring backend reads these files and their content with CXF. My problem is that after it reads a 50 megabyte file, the used heap grows by 500 megabytes. I read the file as a String, and this String is sent to the frontend. Are there any ideas or tricks for how I can reduce the Java heap usage? I tried NIO and Spring's Resource class, but nothing helped.
A dirty way to do this is to have the Spring @Controller method accept an OutputStream or Writer argument; the framework will supply the raw output stream of the HTTP response and you can write directly into it. This sidesteps all the nice logic of content type management, etc.
A better way is to define a custom type which will be returned from the controller method and a matching HttpMessageConverter which will (for example) use the information in that object to open the appropriate file and write its contents into the output stream.
In both cases you will not read the file into RAM; you'll use a single, fixed-size buffer to transfer the data directly from the disk to the HTTP response.
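A sketch of the first, "dirty" approach (the mapping and file path are made up); Spring MVC can resolve an OutputStream parameter to the response's raw stream, and Files.copy streams through a small internal buffer:

import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestMapping;

@Controller
public class LargeFileController {

    // Spring injects the response's raw OutputStream; the file is copied
    // through a fixed-size internal buffer instead of being read into a String.
    @RequestMapping("/bigfile")
    public void bigFile(OutputStream out) throws IOException {
        Files.copy(Paths.get("/data/big-file.txt"), out);
    }
}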