How to download (stream) large (generated) file in Micronaut - java

In my app I'm generating large PDF/CSV files. I'm wondering: is there any way to stream large files in Micronaut without keeping them fully in memory before sending them to the client?

You can use StreamedFile, e.g.:
@Get
public StreamedFile download() {
    InputStream inputStream = ...
    return new StreamedFile(inputStream, "large.csv");
}
Be sure to check the official documentation about file transfers.
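Since the files in the question are generated on the fly, one way to keep memory usage flat is to generate into a pipe and hand the read end to StreamedFile. A minimal sketch, assuming the same StreamedFile(InputStream, ...) constructor used in the answer above (constructor signatures vary between Micronaut versions) and a hypothetical /reports endpoint:
import io.micronaut.http.annotation.Controller;
import io.micronaut.http.annotation.Get;
import io.micronaut.http.server.types.files.StreamedFile;

import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.PrintWriter;

@Controller("/reports")
public class ReportController {

    @Get("/large.csv")
    public StreamedFile download() throws IOException {
        PipedInputStream in = new PipedInputStream();
        PipedOutputStream out = new PipedOutputStream(in);

        // Generate the CSV on a separate thread; rows are written to the pipe
        // and streamed to the client as they are produced, so the full file is
        // never held in memory.
        new Thread(() -> {
            try (PrintWriter writer = new PrintWriter(out)) {
                writer.println("id,name");
                for (int i = 0; i < 1_000_000; i++) {
                    writer.println(i + ",row-" + i);
                }
            }
        }).start();

        return new StreamedFile(in, "large.csv");
    }
}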

Related

Check real file size while streaming files with FileItemStream

I'm writing an API using Spring + Apache Commons FileUpload (https://commons.apache.org/proper/commons-fileupload/).
There is a problem that I faced: I need to validate the file size. If it's bigger than the limit I configure, the user should get an error.
For now, I implemented the upload without this check and it looks like this:
public ResponseEntity insertFile(@PathVariable Long profileId, HttpServletRequest request) throws Exception {
    ServletFileUpload upload = new ServletFileUpload();
    FileItemIterator uploadItemIterator = upload.getItemIterator(request);
    if (!uploadItemIterator.hasNext()) {
        throw new FileUploadException("FileItemIterator was empty");
    }
    while (uploadItemIterator.hasNext()) {
        FileItemStream fileItemStream = uploadItemIterator.next();
        if (fileItemStream.isFormField()) {
            continue;
        }
        // do stuff
    }
    return new ResponseEntity(HttpStatus.OK);
}
It does exactly what I need: it doesn't require loading the file completely into memory. I use the InputStream that I get to transfer the data on to another service, so the file is never fully in memory at any point in time.
However, that prevents me from getting the total number of bytes that were uploaded.
Is there a way to handle such validation without downloading the file completely or saving it somewhere?
I tried FileItem, but it does require completely loading the file.
ServletFileUpload has a setSizeMax method that controls the maximum size accepted for each request. To mitigate memory consumption issues you can use a DiskFileItemFactory so that larger files are stored on disk. You should always check the actual bytes received, because trusting the headers alone is not reliable, but I think this will do the job :)
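With the streaming API from the question, setSizeMax/setFileSizeMax are still enforced while the request is being read, so the limits work without buffering anything. A minimal sketch (commons-fileupload 1.3+ assumed; transferToAnotherService is a hypothetical stand-in for the question's "do stuff"):
ServletFileUpload upload = new ServletFileUpload();
upload.setSizeMax(10 * 1024 * 1024);      // limit for the whole multipart request
upload.setFileSizeMax(10 * 1024 * 1024);  // limit for each individual file

FileItemIterator it = upload.getItemIterator(request);
while (it.hasNext()) {
    FileItemStream item = it.next();
    if (item.isFormField()) {
        continue;
    }
    try (InputStream in = item.openStream()) {
        // If either limit is exceeded while streaming, commons-fileupload aborts
        // with a SizeLimitExceededException / FileSizeLimitExceededException.
        transferToAnotherService(in);
    }
}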

Download file to stream instead of File

I'm implementing a helper class to handle transfers to and from AWS S3 in my web application.
In a first version of my class I was using an AmazonS3Client directly to handle upload and download, but now I've discovered TransferManager and I'd like to refactor my code to use it.
The problem is that my download method returns the stored file as a byte[]. TransferManager, however, only has methods that use a File as the download destination (for example download(GetObjectRequest getObjectRequest, File file)).
My previous code was like this:
GetObjectRequest getObjectRequest = new GetObjectRequest(bucket, key);
S3Object s3Object = amazonS3Client.getObject(getObjectRequest);
S3ObjectInputStream objectInputStream = s3Object.getObjectContent();
byte[] bytes = IOUtils.toByteArray(objectInputStream);
Is there a way to use TransferManager the same way or should I simply continue using an AmazonS3Client instance?
The TransferManager uses File objects to support things like file locking when downloading pieces in parallel. It's not possible to use an OutputStream directly. If your requirements are simple, like downloading small files from S3 one at a time, stick with getObject.
Otherwise, you can create a temporary file with File.createTempFile and read the contents into a byte array when the download is done.
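A rough sketch of that temp-file variant, assuming an already-configured TransferManager instance (names are illustrative):
import com.amazonaws.services.s3.model.GetObjectRequest;
import com.amazonaws.services.s3.transfer.Download;
import com.amazonaws.services.s3.transfer.TransferManager;

import java.io.File;
import java.nio.file.Files;

public byte[] downloadAsBytes(TransferManager transferManager, String bucket, String key) throws Exception {
    File tempFile = File.createTempFile("s3-download-", ".tmp");
    try {
        Download download = transferManager.download(new GetObjectRequest(bucket, key), tempFile);
        download.waitForCompletion(); // blocks until all (possibly parallel) parts are written
        return Files.readAllBytes(tempFile.toPath());
    } finally {
        tempFile.delete();
    }
}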

OutOfMemoryError while serving multiple download requests with Spring

I'm getting an OutOfMemoryError while trying to download several files.
All of them are being downloaded simultaneously and each is roughly 200 MB or more.
I'm using Spring 3.2.3 and Java 7. This is called from a REST request.
This is the code:
@RequestMapping(value = "/app/download", method = RequestMethod.GET, produces = MediaType.MULTIPART_FORM_DATA_VALUE)
public void getFile(@PathVariable String param, HttpServletResponse response) {
    byte[] fileBytes = null;
    String fileLength = null;
    try {
        // First, look for the file on disk
        Path fileFromDisk = getFileFromDisk(param);
        InputStream is = null;
        long fileLengthL = Files.size(fileFromDisk);
        fileLength = String.valueOf(fileLengthL);
        // Prepare data for the response
        String fileName = "Some file name.zip";
        response.setHeader("Content-Disposition", "attachment; filename=\"" + fileName + "\"");
        response.setHeader("Content-Length", fileLength);
        is = Files.newInputStream(fileFromDisk);
        IOUtils.copy(is, response.getOutputStream());
        response.flushBuffer();
    } catch (Exception e) {
        // Exception treatment
    }
}
IOUtils is from the Apache Commons IO library.
The code works perfectly until we have several requests at a time.
I think the problem is that the response is filled with all the data from the file and is not freed from the JVM until the download is completed.
I would like to know if there is a way to chunk the response, or something similar, to avoid filling the heap space with all the data at once.
Any ideas?
Thank you very much in advance.
Have you given your dev environment enough memory?
I use Eclipse and its default memory allocation is 512m, which has caused me issues when using Spring.
If you are using Eclipse, go into Eclipse's main folder and open the file called eclipse.ini.
There will be a line in there that says -Xmx512m.
Change that to however much memory you would like to allocate to your dev environment; I would normally go with at least -Xmx1024m.
I hope this helps.
The content type set with the 'produces' attribute looks to be incorrect. Set the proper content type directly on the response object with the setContentType method. Also try using the setContentLength method to set the content length.
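A sketch of the handler with both changes applied (getFileFromDisk is the question's own lookup; setContentLengthLong needs Servlet 3.1+, so fall back to setContentLength on older containers):
@RequestMapping(value = "/app/download", method = RequestMethod.GET)
public void getFile(@PathVariable String param, HttpServletResponse response) throws IOException {
    Path fileFromDisk = getFileFromDisk(param);
    // Set type and length on the response itself rather than via 'produces'
    // and a hand-built Content-Length header.
    response.setContentType("application/zip");
    response.setContentLengthLong(Files.size(fileFromDisk));
    response.setHeader("Content-Disposition", "attachment; filename=\"Some file name.zip\"");
    try (InputStream is = Files.newInputStream(fileFromDisk)) {
        IOUtils.copy(is, response.getOutputStream()); // copies in small fixed-size chunks
    }
}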
After reading and reading I've reached this conclusion: the output stream of the response object has to be completely filled; it can't be returned as little blocks of data to the browser or client. So the whole file is loaded, whatever its size.
My personal solution is to let a third party do the hard work. My requirements call for multiple downloads of big files at the same time; since my memory is not enough, I'm using an external entity that provides those files to me as temporary URLs.
I don't know if it is the best way, but it is working for me.
Thank you anyway for your responses.

Random-access Zip file without writing it to disk

I have a 1-2 GB zip file with 500-1000k entries. I need to get files by name in a fraction of a second, without fully unpacking the archive. If the file is stored on an HDD, this works fine:
public class ZipMapper {

    private HashMap<String, ZipEntry> map;
    private ZipFile zf;

    public ZipMapper(File file) throws IOException {
        map = new HashMap<>();
        zf = new ZipFile(file);
        Enumeration<? extends ZipEntry> en = zf.entries();
        while (en.hasMoreElements()) {
            ZipEntry ze = en.nextElement();
            map.put(ze.getName(), ze);
        }
    }

    public Node getNode(String key) throws IOException {
        return Node.loadFromStream(zf.getInputStream(map.get(key)));
    }
}
But what can I do if the program has downloaded the zip file from Amazon S3 and only has its InputStream (or a byte array)? While downloading 1 GB takes ~1 second, writing it to the HDD may take some time, and it is slightly harder to handle multiple files since there is no garbage collector for the HDD.
ZipInputStream does not allow random access to entries.
It would be nice to create a virtual File in memory backed by the byte array, but I couldn't find a way to do that.
You could mark the file to be deleted on exit.
If you want to go for an in-memory approach: have a look at the new NIO.2 file API. Oracle provides a filesystem provider for zip/jar, and AFAIK ShrinkWrap provides an in-memory filesystem. You could try a combination of the two.
I've written some utility methods to copy directories and files to/from a Zip file using the NIO.2 File API (the library is Open Source):
Maven:
<dependency>
    <groupId>org.softsmithy.lib</groupId>
    <artifactId>softsmithy-lib-core</artifactId>
    <version>0.3</version>
</dependency>
Tutorial:
http://softsmithy.sourceforge.net/lib/current/docs/tutorial/nio-file/index.html
API: CopyFileVisitor.copy
Especially PathUtils.resolve helps with resolving paths across filesystems.
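For reference, a minimal sketch of reading a single entry through the built-in zip filesystem provider mentioned above (the archive just needs to be reachable as a Path, whether on disk or on an in-memory filesystem implementation):
import java.io.IOException;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;

public class ZipFsReader {

    // Opens the archive with the zip filesystem provider and reads one entry by name.
    public static byte[] readEntry(Path archive, String entryName) throws IOException {
        try (FileSystem zipFs = FileSystems.newFileSystem(archive, (ClassLoader) null)) {
            return Files.readAllBytes(zipFs.getPath(entryName));
        }
    }
}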
You can use the SecureBlackbox library; it allows ZIP operations on any seekable stream.
I think you should consider using your OS to create an "in-memory" file system (i.e. a RAM drive).
In addition, take a look at the FileSystems API.
A completely different approach: if the server has the file on disk (and possibly already cached in RAM), have it give you the file(s) directly. In other words, submit which files you need and let the server extract and deliver them.
The Blackbox library only has an Extract(String name, String outputPath) method. It seems it can indeed randomly access any file in a seekable zip stream, but it can't write the result to a byte array or return a stream.
I couldn't find any documentation for ShrinkWrap, and I couldn't find any suitable implementations of FileSystem/FileSystemProvider etc.
However, it turned out that the Amazon EC2 instance I'm running (Large) somehow writes a 1 GB file to disk in ~1 second, so I just write the file to disk and use ZipFile.
If the HDD were slow, I think a RAM disk would be the easiest solution.

Java - using an InputStream as a File

I'm trying to generate a PDF document from an uploaded ".docx" file using JODConverter.
The call to the method that generates the PDF looks something like this:
File inputFile = new File("document.doc");
File outputFile = new File("document.pdf");
// connect to an OpenOffice.org instance running on port 8100
OpenOfficeConnection connection = new SocketOpenOfficeConnection(8100);
connection.connect();
// convert
DocumentConverter converter = new OpenOfficeDocumentConverter(connection);
converter.convert(inputFile, outputFile);
// close the connection
connection.disconnect();
I'm using Apache Commons FileUpload to handle uploading the docx file, from which I can get an InputStream object. I'm aware that java.io.File is just an abstract reference to a file in the system.
I want to avoid the disk write (saving the InputStream to disk) and the disk read (reading the saved file in JODConverter).
Is there any way I can get a File object referring to an InputStream? Any other way to avoid the disk I/O would also do!
EDIT: I don't care if this will end up using a lot of system memory. The application is going to be hosted on a LAN with very few to zero parallel users.
File-based conversions are faster than stream-based ones (provided by StreamOpenOfficeDocumentConverter), but they require the OpenOffice.org service to be running locally and to have the correct permissions on the files.
Try this overload to avoid writing to disk:
convert(java.io.InputStream inputStream, DocumentFormat inputFormat, java.io.OutputStream outputStream, DocumentFormat outputFormat)
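A hedged sketch of that stream-to-stream call with the JODConverter 2.x classes already used in the question (the DocumentFormat lookups via DefaultDocumentFormatRegistry are an assumption; check that your registry knows the docx and pdf formats):
// uploadedStream comes from commons-fileupload's FileItemStream.openStream()
DocumentFormatRegistry registry = new DefaultDocumentFormatRegistry();
DocumentFormat docx = registry.getFormatByFileExtension("docx"); // assumed to be registered
DocumentFormat pdf = registry.getFormatByFileExtension("pdf");

OpenOfficeConnection connection = new SocketOpenOfficeConnection(8100);
connection.connect();
try {
    DocumentConverter converter = new StreamOpenOfficeDocumentConverter(connection);
    ByteArrayOutputStream pdfBytes = new ByteArrayOutputStream();
    converter.convert(uploadedStream, docx, pdfBytes, pdf);
    // pdfBytes.toByteArray() now holds the generated PDF, with nothing written to disk
} finally {
    connection.disconnect();
}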
There is no way to do it and keep the code solid. For one, the .convert() method only takes two Files as arguments.
So this would mean you'd have to extend File, which is possible in theory but very fragile, as you would have to delve into the library code, which can change at any time and break your extended class.
(Well, there is a way to avoid disk writes if you use a RAM-backed filesystem and read/write from that filesystem, of course.)
Chances are that Commons FileUpload has written the upload to the filesystem anyway.
Check whether your FileItem is an instance of DiskFileItem. If it is, the write implementation of DiskFileItem will try to move the already-written temp file to the File object you pass, so you are not causing any extra disk I/O; the write has already happened.
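A small sketch of that check (fileItem comes from commons-fileupload; converter and outputFile are the JODConverter objects from the question):
if (fileItem instanceof DiskFileItem) {
    DiskFileItem diskItem = (DiskFileItem) fileItem;
    File inputFile;
    if (diskItem.isInMemory()) {
        // Small uploads stay in memory; write them out once.
        inputFile = File.createTempFile("upload-", ".docx");
        diskItem.write(inputFile);
    } else {
        // Large uploads were already written to a temp file by commons-fileupload.
        inputFile = diskItem.getStoreLocation();
    }
    converter.convert(inputFile, outputFile);
}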
