Streaming large files with the Play Framework and a third-party API - Java

I'm writing a Play 2 application and I'm struggling with a file streaming problem.
I retrieve my files using a third-party API with a method that has the following signature:
FileMetadata getFile(OutputStream destination, String fileId)
In a traditional Servlet application, if I wanted to send the content to my client I would have done something like:
HttpServletResponse resp;
myService.getFile(resp.getOutputStream(), fileId);
My problem is that in my Play 2 Controller class I don't have access to the underlying OutputStream, so the simplest implementation of my controller method would be:
public static Result downloadFile(String id) {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    myApi.getFile(baos, id); // load the whole content into a temporary array
    ByteArrayInputStream bais = new ByteArrayInputStream(baos.toByteArray());
    return ok(bais);
}
It works, but it requires loading the whole content into memory before serving it, so it's not an option (the files can be huge).
I was thinking of a solution consisting of:
- defining a ByteArrayOutputStream (baos) inside my controller,
- calling the third-party API with this baos as a parameter,
- using the chunked responses of the Play Framework to send the content of the baos as soon as something is written into it by the third-party API.
The problem is that I don't know whether it is possible (the call to getFile is blocking, so it would require multiple threads sharing an OutputStream) nor whether it is overkill.
Has anyone ever faced this kind of problem and found a solution?
Could my proposed solution solve my problem?
Any insights will be appreciated.
Thanks
EDIT 1
Based on kheraud's suggestion I managed to get a working, but still not perfect, solution (code below).
Unfortunately, if a problem occurs during the call to the getFile method, the error is not sent back to the client (because I already returned Ok) and the browser waits indefinitely for a file that will never come.
Is there a way to handle this case?
public static Result downloadFile(String fileId) {
    Thread readerThread = null;
    try {
        PipedOutputStream pos = new PipedOutputStream();
        PipedInputStream pis = new PipedInputStream(pos);
        // Reading must be done in another thread
        readerThread = new DownloadFileWorker(fileId, pos);
        readerThread.start();
        return ok(pis);
    } catch (Exception ex) {
        ex.printStackTrace();
        return internalServerError(ex.toString());
    }
}
static class DownloadFileWorker extends Thread {
    String fileId;
    PipedOutputStream pos;

    public DownloadFileWorker(String fileId, PipedOutputStream pos) {
        super();
        this.fileId = fileId;
        this.pos = pos;
    }

    public void run() {
        try {
            myApi.getFile(pos, fileId);
            pos.close();
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}
EDIT 2
I found a way to avoid the page loading indefinitely by simply adding a pos.close() call in the catch block of the worker thread. The client ends up with a zero-KB file, but I guess that's better than waiting forever.
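For reference, a minimal sketch of that adjustment to the worker's run() method (same hypothetical DownloadFileWorker as above); putting the close in a finally block ends the response whether getFile succeeds or fails:
public void run() {
    try {
        myApi.getFile(pos, fileId);
    } catch (Exception ex) {
        ex.printStackTrace();
    } finally {
        try {
            pos.close(); // closing the pipe unblocks the reading side even on failure
        } catch (IOException ioe) {
            ioe.printStackTrace();
        }
    }
}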

There is something in the Play 2 Scala framework made for exactly that: Enumerators. This is very close to what you are thinking about.
You should have a look at this doc page for details.
I didn't find anything similar in the Play 2 Java API, but looking at the framework source code, there is a
public static Results.Status ok(java.io.InputStream content, int chunkSize)
method which seems to be what you are looking for. The implementation can be found in the play.mvc.Results and play.core.j.JavaResults classes.
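Combined with the piped-stream approach from the question, a minimal sketch of a controller using that overload might look like this (the chunk size of 8192 is an arbitrary choice; myApi and DownloadFileWorker are the same hypothetical classes as above):
public static Result downloadFile(String fileId) {
    try {
        PipedOutputStream pos = new PipedOutputStream();
        PipedInputStream pis = new PipedInputStream(pos);
        new DownloadFileWorker(fileId, pos).start(); // the blocking getFile call runs in its own thread
        return ok(pis, 8192); // Play reads from the pipe and sends the bytes as a chunked response
    } catch (Exception ex) {
        return internalServerError(ex.toString());
    }
}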

On the Play! mailing list, there recently was a discussion on the same topic:
https://groups.google.com/forum/#!topic/play-framework/YunJzgxPKsU/discussion
It includes a small snippet that allows non-Scala-literates (like myself) to use the Scala streaming interface of Play!.

Related

How do I handle storage in a Java console app that cannot use a DB?

I was given an assignment where we are not allowed to use a DB or libraries, only a text file, for data storage.
But it has rather complex requirements, e.g. many validations; because of that, we need to "access the DB" (i.e. read the text file) many times.
My question is: should I create a class like this:
class SomeRepository {
    static ArrayList<Users> users = new ArrayList<>();

    public SomeRepository() {
        // instantiate this class on program load
        // In the constructor, we read the text file, instantiate and store everything inside the ArrayList.
    }

    // public getOneUser() { // for get methods, we don't read from the text file at all }
    // public save() { // text file saving code over here }
}
Is this a good approach to the above problem? Currently what we are doing is reading and writing the text file every time we want to retrieve some data or write something new.
Wouldn't keeping everything in the ArrayList be too expensive in terms of heap memory? Or should I just read/write the text file in every method?
public class IOManager {
    public static void writeObjToTxtFile(String fileName, Object object) {
        File file = new File(fileName + ".txt"); // the file will be created in the directory the program runs from
        try (FileOutputStream fos = new FileOutputStream(file);
             ObjectOutputStream oos = new ObjectOutputStream(fos)) {
            oos.writeObject(object);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static Object readObjFromTxtFile(String fileName) {
        Object obj = null;
        File file = new File(fileName + ".txt");
        try (FileInputStream fis = new FileInputStream(file);
             ObjectInputStream ois = new ObjectInputStream(fis)) {
            obj = ois.readObject();
        } catch (ClassNotFoundException | IOException e) {
            e.printStackTrace();
        }
        return obj;
    }
}
Add this class to your project. Since it is generic over Objects, you can pass and receive objects such as ArrayList<Users> as well. Play around and tinker with it to fit whatever your specific purpose is. Hint: you can write other custom methods that call these methods, e.g.:
public static void writeUsersToFile(ArrayList<Users> usersArrayList) {
    writeObjToTxtFile("users", usersArrayList);
}
PS: Make sure your objects implement Serializable, e.g.:
public class Users implements Serializable {
}
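A matching read helper would be just as short (the cast is unchecked, so this sketch assumes the file was written by writeUsersToFile above):
@SuppressWarnings("unchecked")
public static ArrayList<Users> readUsersFromFile() {
    // returns null if the file could not be read
    return (ArrayList<Users>) readObjFromTxtFile("users");
}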
I would suggest reading the contents of your file into a dynamic list such as an ArrayList at the start of your program. Make the required queries/changes to your ArrayList and then write that ArrayList back to your file when the program is about to close. This will save significant time over repeated file reads/writes.
This isn't without its drawbacks, though. You don't want to hog memory in the case of very large files, but considering this is an assignment, that may not be an issue. Additionally, should your program terminate before the write at the end, all changes made to your database during the current execution will be lost.
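A minimal sketch of that pattern, reusing the IOManager helpers above (the class and method names are illustrative only):
public class UserRepository {
    private final ArrayList<Users> users;

    @SuppressWarnings("unchecked")
    public UserRepository() {
        // load everything once at startup; fall back to an empty list on the first run
        Object stored = IOManager.readObjFromTxtFile("users");
        this.users = stored != null ? (ArrayList<Users>) stored : new ArrayList<Users>();
    }

    public ArrayList<Users> findAll() {
        return users; // in-memory reads, no file access
    }

    public void add(Users user) {
        users.add(user); // mutate the in-memory list only
    }

    public void saveOnExit() {
        // write the whole list back once, e.g. from a shutdown hook or just before the program ends
        IOManager.writeObjToTxtFile("users", users);
    }
}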

Write multiple files with same string without hanging the UI

I am working on an Android app that changes the CPU frequency when the foreground app changes. The frequencies for the foreground app are defined in my application itself. But while changing the frequencies my app has to open multiple system files and replace the frequency with my text. This makes my UI slow, and when I switch apps continuously it makes the SystemUI crash. What can I do to write these multiple files all together at the same time?
I have tried using AsyncTaskLoader but that too crashes the SystemUI later.
public static boolean setFreq(String max_freq, String min_freq) {
    ByteArrayInputStream inputStream = new ByteArrayInputStream(max_freq.getBytes(Charset.forName("UTF-8")));
    ByteArrayInputStream inputStream1 = new ByteArrayInputStream(min_freq.getBytes(Charset.forName("UTF-8")));
    SuFileOutputStream outputStream;
    SuFileOutputStream outputStream1;
    try {
        if (max_freq != null) {
            int cpus = 0;
            while (true) {
                SuFile f = new SuFile(CPUActivity.MAX_FREQ_PATH.replace("cpu0", "cpu" + cpus));
                SuFile f1 = new SuFile(CPUActivity.MIN_FREQ_PATH.replace("cpu0", "cpu" + cpus));
                outputStream = new SuFileOutputStream(f);
                outputStream1 = new SuFileOutputStream(f1);
                ShellUtils.pump(inputStream, outputStream);
                ShellUtils.pump(inputStream1, outputStream1);
                if (!f.exists()) {
                    break;
                }
                cpus++;
            }
        }
    } catch (Exception ex) {
    }
    return true;
}
I assume SuFile and SuFileOutputStream are your custom implementations extending the Java File and FileOutputStream classes.
A couple of points need to be fixed first (a sketch applying these fixes follows below):
The f.exists() check should come before initializing the OutputStream; otherwise opening the stream creates the file before you check whether it exists, which turns your while loop into an infinite loop.
As @Daryll suggested, determine the number of CPUs and use it in a while/for loop. I suggest a for loop.
Close your streams after the pump(..) calls.
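A minimal sketch of the loop with those fixes applied (SuFile, SuFileOutputStream, ShellUtils and CPUActivity are the classes from the question; the way the CPU count is obtained is only illustrative):
public static boolean setFreq(String max_freq, String min_freq) {
    int cpus = Runtime.getRuntime().availableProcessors(); // determine the CPU count up front
    try {
        for (int cpu = 0; cpu < cpus; cpu++) {
            SuFile max = new SuFile(CPUActivity.MAX_FREQ_PATH.replace("cpu0", "cpu" + cpu));
            SuFile min = new SuFile(CPUActivity.MIN_FREQ_PATH.replace("cpu0", "cpu" + cpu));
            if (!max.exists() || !min.exists()) {
                break; // check existence before opening any stream
            }
            try (OutputStream maxOut = new SuFileOutputStream(max);
                 OutputStream minOut = new SuFileOutputStream(min)) {
                // fresh input streams per file; the outputs are closed by try-with-resources
                ShellUtils.pump(new ByteArrayInputStream(max_freq.getBytes(Charset.forName("UTF-8"))), maxOut);
                ShellUtils.pump(new ByteArrayInputStream(min_freq.getBytes(Charset.forName("UTF-8"))), minOut);
            }
        }
        return true;
    } catch (Exception ex) {
        ex.printStackTrace();
        return false;
    }
}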
If you want to keep the main thread free, you can do something like this:
public static void setFreq(final String max_freq, final String min_freq) {
    new Thread(new Runnable() {
        @Override
        public void run() {
            // put all the file-writing work here
        }
    }).start();
}
This should solve your problem.
Determine the number of CPUs beforehand and use that number in your loop, rather than using while (true) and having to call SuFile.exists() every cycle.
I don't know what SuFileOutputStream is, but you may need to close those file output streams, or find a faster way to write the files if that implementation is too slow.

Java create tar archive with entries of unknown size

I have a web app where I need to be able to serve the user an archive of multiple files. I've set up a generic ArchiveExporter, and made a ZipArchiveExporter. Works beautifully! I can stream my data to my server, and archive the data and stream it to the user all without using much memory, and without needing a filesystem (I'm on Google App Engine).
Then I remembered about the whole zip64 thing with 4gb zip files. My archives can get potentially very large (high res images), so I'd like to have an option to avoid zip files for my larger input.
I checked out org.apache.commons.compress.archivers.tar.TarArchiveOutputStream and thought I had found what I needed! Sadly, when I checked the docs and ran into some errors, I quickly found out you MUST pass the size of each entry as you stream it. This is a problem because the data is being streamed to me with no way of knowing the size beforehand.
I tried counting and returning the written bytes from export(), but TarArchiveOutputStream expects a size in TarArchiveEntry before anything is written to it, so that obviously doesn't work.
I can use a ByteArrayOutputStream and read each entry entirely before writing its content so I know its size, but my entries can potentially get very large, and this is not very polite to the other processes running on the instance.
I could use some form of persistence, upload the entry, and query the data size. However, that would be a waste of my google storage api calls, bandwidth, storage, and runtime.
I am aware of this SO question asking almost the same thing, but he settled for using zip files and there is no more relevant information.
What is the ideal solution to creating a tar archive with entries of unknown size?
public abstract class ArchiveExporter<T extends OutputStream> extends Exporter { // base class
    public abstract void export(OutputStream out) throws IOException; // from Exporter interface
    protected abstract void archiveItems(T t) throws IOException;
}

public class ZipArchiveExporter extends ArchiveExporter<ZipOutputStream> { // zip class, works as intended
    @Override
    public void export(OutputStream out) throws IOException {
        try (ZipOutputStream zos = new ZipOutputStream(out, Charsets.UTF_8)) {
            zos.setLevel(0);
            archiveItems(zos);
        }
    }

    @Override
    protected void archiveItems(ZipOutputStream zos) throws IOException {
        zos.putNextEntry(new ZipEntry(exporter.getFileName()));
        exporter.export(zos);
        // chained call to export from another exporter, like a JSON exporter for instance
        zos.closeEntry();
    }
}

public class TarArchiveExporter extends ArchiveExporter<TarArchiveOutputStream> {
    @Override
    public void export(OutputStream out) throws IOException {
        try (TarArchiveOutputStream taos = new TarArchiveOutputStream(out, "UTF-8")) {
            archiveItems(taos);
        }
    }

    @Override
    protected void archiveItems(TarArchiveOutputStream taos) throws IOException {
        TarArchiveEntry entry = new TarArchiveEntry(exporter.getFileName());
        //entry.setSize(?);
        taos.putArchiveEntry(entry);
        exporter.export(taos);
        taos.closeArchiveEntry();
    }
}
EDIT This is what I was thinking of with the ByteArrayOutputStream. It works, but I cannot guarantee I will always have enough memory to store the whole entry at once, hence my streaming efforts. There has to be a more elegant way of streaming a tarball! Maybe this is a question better suited for Code Review?
protected void byteArrayOutputStreamApproach(TarArchiveOutputStream taos) throws IOException {
    TarArchiveEntry entry = new TarArchiveEntry(exporter.getFileName());
    try (ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
        exporter.export(baos);
        byte[] data = baos.toByteArray();
        // holding the ENTIRE entry in memory. What if it's huge? What if it has more than Integer.MAX_VALUE bytes? :[
        int len = data.length;
        entry.setSize(len);
        taos.putArchiveEntry(entry);
        taos.write(data);
        taos.closeArchiveEntry();
    }
}
EDIT This is what I meant by uploading the entry to a medium (Google Cloud Storage in this case) to accurately query the whole size. It seems like major overkill for what looks like a simple problem, but it doesn't suffer from the same RAM problems as the solution above, just at the cost of bandwidth and time. I hope someone smarter than me comes by and makes me feel stupid soon :D
protected void googleCloudStorageTempFileApproach(TarArchiveOutputStream taos) throws IOException {
    TarArchiveEntry entry = new TarArchiveEntry(exporter.getFileName());
    String name = NameHelper.getRandomName(); // get random name for temp storage
    BlobInfo blobInfo = BlobInfo.newBuilder(StorageHelper.OUTPUT_BUCKET, name).build(); // prepare upload of temp file
    WritableByteChannel wbc = ApiContainer.storage.writer(blobInfo); // get WriteChannel for temp file
    try (OutputStream out = Channels.newOutputStream(wbc)) {
        exporter.export(out); // stream items to remote temp file
    } finally {
        wbc.close();
    }
    Blob blob = ApiContainer.storage.get(blobInfo.getBlobId());
    long size = blob.getSize(); // accurately query the size after upload
    entry.setSize(size);
    taos.putArchiveEntry(entry);
    ReadableByteChannel rbc = blob.reader(); // get ReadChannel for temp file
    try (InputStream in = Channels.newInputStream(rbc)) {
        IOUtils.copy(in, taos); // stream back to local tar stream from remote temp file
    } finally {
        rbc.close();
    }
    blob.delete(); // delete remote temp file
    taos.closeArchiveEntry();
}
I've been looking at a similar issue, and as far as I can tell this is a constraint of the tar file format.
Tar files are written as a stream, and the metadata (file names, permissions, etc.) is written between the file data (i.e. metadata 1, file data 1, metadata 2, file data 2, etc.). The program that extracts the data reads metadata 1, then starts extracting file data 1, but it has to have a way of knowing when it's done. This could be done a number of ways; tar does it by having the length in the metadata.
Depending on your needs, and on what the recipient expects, there are a few options that I can see (not all of them apply to your situation):
As you mentioned, load an entire file, work out the length, then send it.
Divide the file into blocks of a predefined length (which fits into memory), then tar them up as file1-part1, file1-part2, etc.; the last block would be short (see the sketch after this list).
Divide the file into blocks of a predefined length (which don't need to fit into memory), then pad the last block to that size with something appropriate.
Work out the maximum possible size of the file, and pad to that size.
Use a different archive format.
Make your own archive format, which does not have this limitation.
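A minimal sketch of option 2, splitting one logical entry into fixed-size part entries whose sizes are known before each putArchiveEntry call; it assumes the entry data is available as an InputStream rather than pushed through export(...), and the 8 MB block size and part-naming scheme are arbitrary choices:
protected void chunkedEntriesApproach(TarArchiveOutputStream taos, InputStream source, String baseName)
        throws IOException {
    final int blockSize = 8 * 1024 * 1024; // small enough to fit comfortably in memory
    byte[] buffer = new byte[blockSize];
    int part = 1;
    int read;
    while ((read = readFully(source, buffer)) > 0) {
        TarArchiveEntry entry = new TarArchiveEntry(baseName + "-part" + part++);
        entry.setSize(read); // the size of this block is known, so tar is satisfied
        taos.putArchiveEntry(entry);
        taos.write(buffer, 0, read);
        taos.closeArchiveEntry();
    }
}

// fill the buffer as far as possible; returns the number of bytes actually read
private static int readFully(InputStream in, byte[] buffer) throws IOException {
    int total = 0;
    while (total < buffer.length) {
        int n = in.read(buffer, total, buffer.length - total);
        if (n < 0) {
            break;
        }
        total += n;
    }
    return total;
}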
Interestingly, gzip does not have predefined limits, and multiple gzip streams can be concatenated together, each with its own "original filename". Unfortunately, standard gunzip extracts all the resulting data into one file, using (I believe) the first filename.

Disabling Multipart Caching in CXF jax-rs

I posted this question to the CXF list without any luck, so here we go. I am trying to upload large files to a remote server (think of them as virtual machine disks). So I have a RESTful service that accepts upload requests. The handler for the upload looks like:
@POST
@Consumes(MediaType.MULTIPART_FORM_DATA)
@Path("/doupload")
public Response receiveStream(MultipartBody multipart) {
    List<Attachment> allAttachments = multipart.getAllAttachments();
    Attachment att = null;
    for (Attachment b : allAttachments) {
        if (UPLOAD_FILE_DESCRIPTOR.equals(b.getContentId())) {
            att = b;
        }
    }
    Assert.notNull(att);
    DataHandler dh = att.getDataHandler();
    if (dh == null) {
        throw new WebApplicationException(HTTP_BAD_REQUEST);
    }
    try {
        InputStream is = dh.getInputStream();
        byte[] buf = new byte[65536];
        int n;
        OutputStream os = getOutputStream();
        while ((n = is.read(buf)) > 0) {
            os.write(buf, 0, n);
        }
        ResponseBuilder rb = Response.status(HTTP_CREATED);
        return rb.build();
    } catch (IOException e) {
        log.error("Got exception=", e);
        throw new WebApplicationException(HTTP_INTERNAL_ERROR);
    } finally {}
}
The client for this code is fairly simple:
public void sendLargeFile(String filename) throws IOException {
    WebClient wc = WebClient.create(targetUrl);
    InputStream is = new FileInputStream(new File(filename));
    Response r = wc.post(new Attachment(Constants.UPLOAD_FILE_DESCRIPTOR,
            MediaType.APPLICATION_OCTET_STREAM, is));
}
The code works fine in terms of functionality. In terms of performance, I noticed that before my handler (receiveStream() method) gets the first byte out of the stream, the whole stream actually gets persisted into a temporary file (using a CachedOutputStream). Unfortunately, this is not acceptable for my purposes.
My handler simply passes the incoming bytes to a backend storage system (virtual machine disk repository), and waiting for the whole disk to be written to a cache only to be read again takes a lot of time, tying up a lot of resources, and reducing throughput.
There is a cost associated with writing the blocks and reading them again, since the app is running in the cloud, and the cloud provider charges per block read/written.
Since every byte is written to the local disk, my service VM must have enough disk space to accommodate the total sizes of all the streams being uploaded (i.e., if I have 10 uploads of 100GB each, I must have 1TB of disk just to cache the content). That again is extra money, as the size of the service VM grows dramatically, and the cloud provider charges for the provisioned disk size as well.
Given all of this, I am looking for a way to use the HTTP InputStream (or as close to it as possible) to read the attachment directly from there and handle it afterwards. I guess the question translates into one of:
- Is there a way to tell CXF not to do caching?
- OR - is there a way to pass CXF an output stream (one that I write) to use, rather than its CachedOutputStream?
I found a similar question here. The resolution says to use CXF 2.2.3 or later; I am using 2.4.4 (and tried 2.7.0) with no luck.
Thanks.
I think it's logically not possible (neither in CXF nor anywhere else). You're calling getAllAttachments(), which means that the server has to collect information about them from the HTTP input stream. That means the entire stream has to be read and buffered for MIME parsing before your method is invoked.
In your case you should work directly with the stream and do the MIME parsing yourself:
public Response receiveStream(InputStream input) {
Now you have full control of the input and can consume it byte by byte.
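A minimal sketch of a handler with that shape (the multipart framing still has to be parsed or skipped by hand, which is only hinted at in the comments; getOutputStream() is the backend stream from the question):
@POST
@Consumes(MediaType.MULTIPART_FORM_DATA)
@Path("/doupload")
public Response receiveStream(InputStream input) throws IOException {
    // 'input' is the raw multipart body: the boundary lines and part headers
    // arrive first and must be consumed before the payload bytes start.
    OutputStream os = getOutputStream();
    byte[] buf = new byte[65536];
    int n;
    while ((n = input.read(buf)) > 0) {
        os.write(buf, 0, n); // forward bytes to backend storage as they arrive
    }
    return Response.status(HTTP_CREATED).build();
}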
I ended up fixing the problem in an inelegant way, but it works, so I wanted to share my experience. Please do let me know if there is a more standard or better way.
Since I am writing the server side, I knew I was accessing all the attachments in the order they were sent and processing them as they are streamed in. So, to reflect that behavior of the handler method (the receiveStream() method above), I created a new annotation on the server side called @SequentialAttachmentProcessing and annotated my method with it.
I also wrote a subclass of Attachment, called SequentialAttachment, that acts like a linked list. It has a skip() method that skips over the current attachment, and when an attachment ends, a hasMore() method tells you whether there is another one.
Then I wrote a custom multipart/form-data provider which behaves as follows: if the target method is annotated as above, handle the attachment; otherwise call the default provider to do the handling. When it is handled by my provider, it always returns at most one attachment. Hence it could be misleading to an unsuspecting handling method. However, I think that is acceptable, since the writer of the server must have annotated the method with @SequentialAttachmentProcessing and therefore must know what that entails.
As a result the implementation of the receiveStream() method is now something like:
@POST
@SequentialAttachmentProcessing
@Consumes(MediaType.MULTIPART_FORM_DATA)
@Path("/doupload")
public Response receiveStream(MultipartBody multipart) throws IOException {
    List<Attachment> allAttachments = multipart.getAllAttachments();
    Assert.isTrue(allAttachments.size() <= 1);
    if (allAttachments.size() > 0) {
        Attachment head = allAttachments.get(0);
        Assert.isTrue(head instanceof SequentialAttachment);
        SequentialAttachment att = (SequentialAttachment) head;
        while (att != null) {
            DataHandler dh = att.getDataHandler();
            InputStream is = dh.getInputStream();
            byte[] buf = new byte[65536];
            int n;
            OutputStream os = getOutputStream();
            while ((n = is.read(buf)) > 0) {
                os.write(buf, 0, n);
            }
            if (att.hasMore()) {
                att = att.next();
            } else {
                att = null;
            }
        }
    }
    return Response.status(HTTP_CREATED).build();
}
While this solved my immediate problem, I still believe there has to be a standard way of doing this. I hope this helps someone.

Creating a List of BufferedReaders

I would like to have a method that would return a list of BufferedReader objects (for example for all files in a directory):
private List<BufferedReader> getInputReaders(List<String> filenames) throws IOException {
    List<BufferedReader> result = new ArrayList<BufferedReader>();
    for (String filename : filenames) {
        result.add(new BufferedReader(new InputStreamReader(new FileInputStream(filename), "UTF-8")));
    }
    return result;
}
Will this be a major waste of resources?
Will all those streams be opened at the moment of creation and remain open, therefore holding system resources?
If yes, can I create those readers in "passive" mode without actually opening streams, or is there any other workaround (so I can build a List with thousands of readers safely)?
Yes, the constructor for FileInputStream invokes open(), a native method which will most likely reserve a file descriptor for the file.
Instead of immediately returning a list of BufferedReaders, why not return a list of something that will open the underlying stream as needed? You can create a class that holds onto a filename and simply open the resource when called.
I'm pretty sure it's a bad idea. You risk consuming all the available file descriptors, and there is no point in opening a reader to a file if you don't want to read from it.
If you want to read from the file, then open a reader, read from the file, and close the reader. Then, do the same for the next file to read from.
If you want a single abstraction to read from various sources (URLs, files, etc.), then create your own Source interface and multiple implementations which wrap the resource to read from (URLSource, FileSource, etc.). Only open the actual reader on the wrapped resource when reading from your Source instance.
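A minimal sketch of that idea (the Source and FileSource names are just illustrations of what such an abstraction could look like):
interface Source {
    BufferedReader openReader() throws IOException; // opens the underlying resource lazily
}

class FileSource implements Source {
    private final String filename;

    FileSource(String filename) {
        this.filename = filename;
    }

    @Override
    public BufferedReader openReader() throws IOException {
        // nothing is opened until a caller actually wants to read
        return new BufferedReader(new InputStreamReader(new FileInputStream(filename), "UTF-8"));
    }
}
Building a List<Source> for thousands of files is then cheap, because no file descriptor is used until openReader() is called.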
Yes, those streams will be opened as soon as they are created.
A good way to avoid this is to create a LazyReader class that only initializes the underlying Reader on the first read:
public class LazyReader extends Reader {
    String fileName;
    Reader reader = null;

    public LazyReader(String fileName) {
        super();
        this.fileName = fileName;
    }

    private void init() throws IOException {
        if (reader == null) {
            reader = new BufferedReader(new InputStreamReader(new FileInputStream(fileName), "UTF-8"));
        }
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        init();
        return reader.read(cbuf, off, len);
    }

    @Override
    public void close() throws IOException {
        if (reader != null) {
            reader.close();
        }
    }

    // if you want marking you should also implement mark(int), reset() and markSupported()
}
