I posted this question to the CXF list without any luck, so here we go. I am trying to upload large files to a remote server (think of them as virtual machine disks). So I have a RESTful service that accepts upload requests. The handler for the upload looks like:
@POST
@Consumes(MediaType.MULTIPART_FORM_DATA)
@Path("/doupload")
public Response receiveStream(MultipartBody multipart) {
    List<Attachment> allAttachments = multipart.getAllAttachments();
    Attachment att = null;
    for (Attachment b : allAttachments) {
        if (UPLOAD_FILE_DESCRIPTOR.equals(b.getContentId())) {
            att = b;
        }
    }
    Assert.notNull(att);
    DataHandler dh = att.getDataHandler();
    if (dh == null) {
        throw new WebApplicationException(HTTP_BAD_REQUEST);
    }
    try {
        InputStream is = dh.getInputStream();
        byte[] buf = new byte[65536];
        int n;
        OutputStream os = getOutputStream();
        while ((n = is.read(buf)) > 0) {
            os.write(buf, 0, n);
        }
        ResponseBuilder rb = Response.status(HTTP_CREATED);
        return rb.build();
    } catch (IOException e) {
        log.error("Got exception=", e);
        throw new WebApplicationException(HTTP_INTERNAL_ERROR);
    }
}
The client for this code is fairly simple:
public void sendLargeFile(String filename) throws IOException {
    WebClient wc = WebClient.create(targetUrl);
    InputStream is = new FileInputStream(new File(filename));
    Response r = wc.post(new Attachment(Constants.UPLOAD_FILE_DESCRIPTOR,
            MediaType.APPLICATION_OCTET_STREAM, is));
}
The code works fine in terms of functionality. In terms of performance, I noticed that before my handler (receiveStream() method) gets the first byte out of the stream, the whole stream actually gets persisted into a temporary file (using a CachedOutputStream). Unfortunately, this is not acceptable for my purposes.
My handler simply passes the incoming bytes to a backend storage system (a virtual machine disk repository), and waiting for the whole disk to be written to a cache only to be read again takes a lot of time, ties up a lot of resources, and reduces throughput.
There is a cost associated with writing the blocks and reading them again, since the app is running in the cloud, and the cloud provider charges per block read/written.
Since every byte is written to the local disk, my service VM must have enough disk space to accommodate the total sizes of all the streams being uploaded (i.e., if I have 10 uploads of 100GB each, I must have 1TB of disk just to cache the content). That again is extra money, as the size of the service VM grows dramatically, and the cloud provider charges for the provisioned disk size as well.
Given all of this, I am looking for a way to use the HTTP InputStream (or as close to it as possible) to read the attachment directly from there and handle it afterwards. I guess the question translates into one of:
- Is there a way to tell CXF not to do caching?
- OR - is there a way to pass CXF an output stream (one that I write) to use, rather than the CachedOutputStream?
I found a similar question here. The resolution says to use CXF 2.2.3 or later; I am using 2.4.4 (and tried 2.7.0) with no luck.
Thanks.
I think it's logically not possible (neither in CXF nor anywhere else). You're calling getAllAttachments(), which means that the server must collect information about all of them from the HTTP input stream. That means the entire stream has to be read and cached for MIME parsing before your method runs.
In your case you should work directly with the stream, and do the MIME parsing yourself:
public Response receiveStream(InputStream input) {
Now you have full control of the input and can consume it byte by byte, without anything being cached first.
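For illustration, a handler along those lines might look roughly like this (a sketch only, reusing the getOutputStream() helper and status constants from the question; note that the raw body still contains the multipart boundary and part headers, which your own MIME parsing would have to recognise and strip before forwarding the payload bytes):
@POST
@Consumes(MediaType.MULTIPART_FORM_DATA)
@Path("/doupload")
public Response receiveStream(InputStream input) throws IOException {
    // The stream is the raw request body: multipart boundaries and part
    // headers are still in it and must be handled by your own parsing.
    byte[] buf = new byte[65536];
    int n;
    OutputStream os = getOutputStream(); // hypothetical helper, as in the question
    while ((n = input.read(buf)) != -1) {
        os.write(buf, 0, n);
    }
    return Response.status(HTTP_CREATED).build();
}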
I ended up fixing the problem in an inelegant way, but it works, so I wanted to share my experience. Please do let me know if there are some "standard" or better ways.
Since I am writing the server side, I know that I access all the attachments in the order they were sent and process them as they are streamed in. To reflect that behavior of the handler method (the receiveStream() method above), I created a new annotation on the server side called "@SequentialAttachmentProcessing" and annotated the above method with it.
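For reference, such a marker annotation could be declared like this (my sketch of the idea, not code from the original implementation):
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Marker annotation telling the custom provider that the method
// processes attachments strictly in the order they arrive.
@Target(ElementType.METHOD)
@Retention(RetentionPolicy.RUNTIME)
public @interface SequentialAttachmentProcessing {
}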
I also wrote a subclass of Attachment, called SequentialAttachment, that acts like a linked list. It has a skip() method that skips over the current attachment, and when an attachment ends, a hasMore() method tells you whether there is another one.
Then I wrote a custom multipart/form-data provider which behaves as follows: if the target method is annotated as above, it handles the attachment itself; otherwise it calls the default provider to do the handling. When an attachment is handled by my provider, it always returns at most one attachment. That could be misleading to an unsuspecting handling method, but I think it is acceptable since the writer of the server must have annotated the method with "@SequentialAttachmentProcessing" and therefore must know what that entails.
As a result the implementation of the receiveStream() method is now something like:
@POST
@SequentialAttachmentProcessing
@Consumes(MediaType.MULTIPART_FORM_DATA)
@Path("/doupload")
public Response receiveStream(MultipartBody multipart) throws IOException {
    List<Attachment> allAttachments = multipart.getAllAttachments();
    Assert.isTrue(allAttachments.size() <= 1);
    if (allAttachments.size() > 0) {
        Attachment head = allAttachments.get(0);
        Assert.isTrue(head instanceof SequentialAttachment);
        SequentialAttachment att = (SequentialAttachment) head;
        while (att != null) {
            DataHandler dh = att.getDataHandler();
            InputStream is = dh.getInputStream();
            byte[] buf = new byte[65536];
            int n;
            OutputStream os = getOutputStream();
            while ((n = is.read(buf)) > 0) {
                os.write(buf, 0, n);
            }
            // Move to the next attachment, or stop when there are no more
            att = att.hasMore() ? att.next() : null;
        }
    }
    return Response.status(HTTP_CREATED).build();
}
While this solved my immediate problem, I still believe there has to be a standard way of doing this. I hope this helps someone.
Related
Jakarta Mail could not read email photo
I don't know how to handle it: InputStream x.available() is 0, so I could not download the image data. This is my writePart code:
public static void writePart(Part p) throws MessagingException, IOException {
    if (p.isMimeType("multipart/*")) {
        Multipart content = (Multipart) p.getContent();
        for (int i = 0; i < content.getCount(); i++) {
            writePart(content.getBodyPart(i));
        }
    } else if (p.isMimeType("text/*")) {
        System.out.println(p.getContent());
    } else if (p.isMimeType("image/jpeg")) {
        MimeBodyPart iPart = (MimeBodyPart) p;
        String imgName = iPart.getFileName();
        InputStream x = (InputStream) iPart.getContent();
        // here!!! x.available() is zero, why?
        System.out.println("x.length = " + x.available());
        int i = 0;
        byte[] bArray = new byte[x.available()];
        while ((i = x.available()) > 0) {
            int result = x.read(bArray);
            if (result == -1)
                break;
        }
        // todo temp directory
        File file = new File("tmp/" + imgName);
        boolean b = file.getParentFile().mkdirs();
        FileOutputStream f2 = new FileOutputStream(file);
        f2.write(bArray);
    }
}
This is not the correct way to read an InputStream. The documentation for InputStream.available states:
Returns an estimate of the number of bytes that can be read (or skipped over) from this input stream without blocking by the next invocation of a method for this input stream. The next invocation might be the same thread or another thread. A single read or skip of this many bytes will not block, but may read or skip fewer bytes.
Note that while some implementations of InputStream will return the total number of bytes in the stream, many will not. It is never correct to use the return value of this method to allocate a buffer intended to hold all data in this stream.
The key phrase there is "without blocking". This means there may be more data available on the remote side, so it's incorrect to use this to detect the end of the stream. Instead, you should check result (the return value from InputStream.read(bArray)).
It's also incorrect to use the result from available to allocate bArray. It is not possible in general to determine the size of all data that will be produced by an InputStream. In fact, regardless of the size of the buffer, it is not guaranteed that all of the data will be returned from the InputStream in a single call to read, so there needs to be a loop that either appends each read to a dynamically-sized buffer, or processes the read data in a streaming fashion. The latter is preferable, as it doesn't require retaining all of the data in memory at once, which could cause heap space exhaustion for large attachments in this case.
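For example, the streaming approach could look roughly like this (a sketch only; the tmp/ directory and buffer size are arbitrary choices):
// Stream the decoded part content straight to a file, relying on read()
// returning -1 to signal end of stream instead of calling available().
new File("tmp").mkdirs();
try (InputStream in = iPart.getInputStream();
     OutputStream out = new FileOutputStream(new File("tmp", imgName))) {
    byte[] buffer = new byte[8192];
    int bytesRead;
    while ((bytesRead = in.read(buffer)) != -1) {
        out.write(buffer, 0, bytesRead);
    }
}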
Implementing low-level I/O operations correctly can be tricky, so it's better to use a high-level method when available. In the case of MimeBodyPart, you can use the writeTo method to automatically write to the FileOutputStream:
iPart.writeTo(f2);
This eliminates the need to explicitly get and read from the InputStream.
I'm attempting to copy / duplicate a DocumentFile in an Android application, but upon inspecting the created duplicate, it does not appear to be exactly the same as the original (which is causing a problem, because I need to do an MD5 check on both files the next time a copy is called, so as to avoid overwriting the same files).
The process is as follows:
User selects a file from an ACTION_OPEN_DOCUMENT_TREE
Source file's type is obtained
New DocumentFile in the target location is initialised
Contents of the first file are duplicated into the second file
The initial stages are done with the following code:
// Get the source file's type
String sourceFileType = MimeTypeMap.getSingleton().getExtensionFromMimeType(
        contextRef.getContentResolver().getType(file.getUri()));

// Create the new (empty) file
DocumentFile newFile = targetLocation.createFile(sourceFileType, file.getName());

// Copy the file
CopyBufferedFile(
        new BufferedInputStream(contextRef.getContentResolver().openInputStream(file.getUri())),
        new BufferedOutputStream(contextRef.getContentResolver().openOutputStream(newFile.getUri())));
The main copy process is done using the following snippet:
void CopyBufferedFile(BufferedInputStream bufferedInputStream, BufferedOutputStream bufferedOutputStream)
{
    // Duplicate the contents of the temporary local File to the DocumentFile
    try
    {
        byte[] buf = new byte[1024];
        bufferedInputStream.read(buf);
        do
        {
            bufferedOutputStream.write(buf);
        }
        while (bufferedInputStream.read(buf) != -1);
    }
    catch (IOException e)
    {
        e.printStackTrace();
    }
    finally
    {
        try
        {
            if (bufferedInputStream != null) bufferedInputStream.close();
            if (bufferedOutputStream != null) bufferedOutputStream.close();
        }
        catch (IOException e)
        {
            e.printStackTrace();
        }
    }
}
The problem that I'm facing, is that although the file copies successfully and is usable (it's a picture of a cat, and it's still a picture of a cat in the destination), it is slightly different.
The file size has changed from 2261840 to 2262016 (+176)
The MD5 hash has changed completely
Is there something wrong with my copying code that is causing the file to change slightly?
Thanks in advance.
Your copying code is incorrect. It is assuming (incorrectly) that each call to read will either return buffer.length bytes or return -1.
What you should do is capture the number of bytes read in a variable each time, and then write exactly that number of bytes. Your code for closing the streams is verbose and (in theory¹) buggy as well.
Here is a rewrite that addresses both of those issues, and some others as well.
void copyBufferedFile(BufferedInputStream bufferedInputStream,
                      BufferedOutputStream bufferedOutputStream)
        throws IOException
{
    try (BufferedInputStream in = bufferedInputStream;
         BufferedOutputStream out = bufferedOutputStream)
    {
        byte[] buf = new byte[1024];
        int nosRead;
        while ((nosRead = in.read(buf)) != -1)  // read this carefully ...
        {
            out.write(buf, 0, nosRead);
        }
    }
}
As you can see, I have gotten rid of the bogus "catch and squash exception" handlers, and fixed the resource leak using Java 7+ try-with-resources.
There are still a couple of issues:
It is better for the copy function to take file name strings (or File or Path objects) as parameters and be responsible for opening the streams.
Given that you are doing block reads and writes, there is little value in using buffered streams. (Indeed, it might conceivably be making the I/O slower.) It would be better to use plain streams and make the buffer the same size as the default buffer size used by the Buffered* classes ... or larger.
If you are really concerned about performance, try using transferFrom as described here:
https://www.journaldev.com/861/java-copy-file
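For instance, a channel-to-channel copy might look roughly like this (the file paths are placeholders):
// transferTo may move fewer bytes than requested, so it is called in a
// loop until the whole file has been transferred.
try (FileChannel in = FileChannel.open(Paths.get("source.jpg"), StandardOpenOption.READ);
     FileChannel out = FileChannel.open(Paths.get("copy.jpg"), StandardOpenOption.CREATE,
             StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
    long position = 0;
    long size = in.size();
    while (position < size) {
        position += in.transferTo(position, size - position, out);
    }
}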
¹ In theory, if the bufferedInputStream.close() call throws an exception, the bufferedOutputStream.close() call will be skipped. In practice, it is unlikely that closing an input stream will throw an exception. Either way, the try-with-resources approach deals with this correctly, and far more concisely.
I want to download large files from Google Cloud Storage using the google provided Java library com.google.cloud.storage. I have working code, but I still have one question and one major concern:
My major concern is: when is the file content actually downloaded? During (referring to the code below) storage.get(blobId), during blob.reader(), or during reader.read(bytes)? This becomes very important when it comes to handling an invalid checksum: what do I need to do in order to trigger the file actually being fetched over the network again?
The simpler question is: is there built-in functionality in the Google library to do an MD5 (or CRC32C) check on the received file? Maybe I don't need to implement it on my own.
Here is my method trying to download big files from Google Cloud Storage:
private static final int MAX_NUMBER_OF_TRIES = 3;

public Path downloadFile(String storageFileName, String bucketName) throws IOException {
    // In my real code, this is a field populated in the constructor.
    Storage storage = Objects.requireNonNull(StorageOptions.getDefaultInstance().getService());

    BlobId blobId = BlobId.of(bucketName, storageFileName);
    Path outputFile = Paths.get(storageFileName.replaceAll("/", "-"));
    int retryCounter = 1;
    Blob blob;
    boolean checksumOk;
    MessageDigest messageDigest;
    try {
        messageDigest = MessageDigest.getInstance("MD5");
    } catch (NoSuchAlgorithmException ex) {
        throw new RuntimeException(ex);
    }
    do {
        LOGGER.debug("Start download file {} from bucket {} to Content Store (try {})", storageFileName, bucketName, retryCounter);
        blob = storage.get(blobId);
        if (null == blob) {
            throw new CloudStorageCommunicationException("Failed to download file after " + retryCounter + " tries.");
        }
        if (Files.exists(outputFile)) {
            Files.delete(outputFile);
        }
        try (ReadChannel reader = blob.reader();
             FileChannel channel = new FileOutputStream(outputFile.toFile(), true).getChannel()) {
            ByteBuffer bytes = ByteBuffer.allocate(128 * 1024);
            int bytesRead = reader.read(bytes);
            while (bytesRead > 0) {
                bytes.flip();
                messageDigest.update(bytes.array(), 0, bytesRead);
                channel.write(bytes);
                bytes.clear();
                bytesRead = reader.read(bytes);
            }
        }
        String checksum = Base64.encodeBase64String(messageDigest.digest());
        checksumOk = checksum.equals(blob.getMd5());
        if (!checksumOk) {
            Files.delete(outputFile);
            messageDigest.reset();
        }
    } while (++retryCounter <= MAX_NUMBER_OF_TRIES && !checksumOk);
    if (!checksumOk) {
        throw new CloudStorageCommunicationException("Failed to download file after " + MAX_NUMBER_OF_TRIES + " tries.");
    }
    return outputFile;
}
The google-cloud-java storage library does not validate checksums on its own when reading data beyond normal HTTPS/TCP correctness checking. If it compared the MD5 of the received data to the known MD5, it would need to download the entire file before it could return any results from read(), which for very large files would be infeasible.
What you're doing is a good idea if you need the additional protection of comparing MD5s. If this is a one-off task, you could use the gsutil command-line tool, which does this same sort of additional check.
As the JavaDoc of ReadChannel says:
Implementations of this class may buffer data internally to reduce remote calls.
So the implementation you get from blob.reader() could cache the whole file, a few bytes, or nothing, and just fetch byte by byte when you call read(). You will never know, and you shouldn't care.
As only read() throws an IOException and the other methods you used do not, I'd say that only calling read() will actually download anything. You can also see this in the sources of the library.
By the way, despite the example in the JavaDocs of the library, you should check for >= 0, not > 0. 0 just means nothing was read, not that the end of the stream was reached; end of stream is signaled by returning -1.
For retrying after a failed checksum check, get a new reader from the blob. If anything caches the downloaded data, it is the reader itself, so if you get a new reader from the blob, the file will be re-downloaded from the remote side.
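Put together, the body of each retry could look roughly like this (a sketch; opening the output file with TRUNCATE_EXISTING is my simplification of the delete-and-append logic in the question):
// A fresh ReadChannel per attempt forces the data to be fetched again;
// 0 from read() means "nothing read this time", only -1 means end of stream.
try (ReadChannel reader = blob.reader();
     FileChannel channel = FileChannel.open(outputFile, StandardOpenOption.CREATE,
             StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
    ByteBuffer bytes = ByteBuffer.allocate(128 * 1024);
    int bytesRead;
    while ((bytesRead = reader.read(bytes)) >= 0) {
        bytes.flip();
        messageDigest.update(bytes.array(), 0, bytesRead);
        channel.write(bytes);
        bytes.clear();
    }
}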
I am trying to create a simple demo of the ImageLoader functionality in the Android Volley framework. Its constructor is the following:
public ImageLoader(RequestQueue queue, ImageCache imageCache)
The problem is with the ImageCache. Its JavaDoc states:
Simple cache adapter interface. If provided to the ImageLoader, it
will be used as an L1 cache before dispatch to Volley. Implementations
must not block. Implementation with an LruCache is recommended.
What exactly does 'Implementations must not block' mean in this context?
Is there an example of a non-blocking file cache (even non-Android but "pure" Java) which I can use to educate myself on how to convert my existing file cache to be non-blocking?
If no such example exists, what might be the negative implications of using my existing implementation, which is (just the reading from the file):
public byte[] get(String filename) {
    byte[] ret = null;
    if (filesCache.containsKey(filename)) {
        FileInfo fi = filesCache.get(filename);
        BufferedInputStream input;
        String path = cacheDir + "/" + fi.getStorageFilename();
        try {
            File file = new File(path);
            if (file.exists()) {
                input = new BufferedInputStream(new FileInputStream(file));
                ret = IOUtils.toByteArray(input);
                input.close();
            } else {
                KhandroidLog.e("Cannot find file " + path);
            }
        } catch (FileNotFoundException e) {
            filesCache.remove(filename);
            KhandroidLog.e("Cannot find file: " + path);
        } catch (IOException e) {
            KhandroidLog.e(e.getMessage());
        }
    }
    return ret;
}
What exactly does 'Implementations must not block' mean in this context?
In your case, you cannot do disk I/O.
This is a Level One (L1) cache, meaning it is designed to return in a matter of microseconds, not milliseconds or seconds. That's why they advocate LruCache, which is a memory cache.
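As an illustration, a minimal memory-only cache built on android.util.LruCache might look like this (a sketch; the class name and the kilobyte sizing are my own choices):
import android.graphics.Bitmap;
import android.util.LruCache;
import com.android.volley.toolbox.ImageLoader;

public class BitmapLruCache extends LruCache<String, Bitmap> implements ImageLoader.ImageCache {

    public BitmapLruCache(int maxSizeKb) {
        super(maxSizeKb);
    }

    @Override
    protected int sizeOf(String key, Bitmap value) {
        // Account for entries by their byte size, expressed in KB
        return value.getByteCount() / 1024;
    }

    @Override
    public Bitmap getBitmap(String url) {
        return get(url); // pure in-memory lookup, never touches the disk
    }

    @Override
    public void putBitmap(String url, Bitmap bitmap) {
        put(url, bitmap);
    }
}
You would then pass it as new ImageLoader(queue, new BitmapLruCache(10 * 1024)) for a 10 MB cache, for example.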
Is there an example of a non-blocking file cache (even non-Android but "pure" Java) which I can use to educate myself on how to convert my existing file cache to be non-blocking?
An L1 cache should not be a file cache.
What might be the negative implications of using my existing implementation, which is (just the reading from the file)?
An L1 cache should not be a file cache.
Volley already has an integrated L2 file cache, named DiskBasedCache, used for caching HTTP responses. You can substitute your own implementation of Cache for DiskBasedCache if you wish, and supply that when you create your RequestQueue.
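For example, a queue with an explicit disk cache could be assembled roughly like this (the directory name and sizes are arbitrary; BitmapLruCache is the L1 cache sketched above):
// Build the RequestQueue by hand instead of using Volley.newRequestQueue(context),
// so the L2 HTTP response cache (DiskBasedCache) can be configured explicitly.
File cacheDir = new File(context.getCacheDir(), "volley");
Cache diskCache = new DiskBasedCache(cacheDir, 20 * 1024 * 1024); // 20 MB on disk
Network network = new BasicNetwork(new HurlStack());
RequestQueue queue = new RequestQueue(diskCache, network);
queue.start();
ImageLoader imageLoader = new ImageLoader(queue, new BitmapLruCache(10 * 1024));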
I'm writing a Play 2 application and I am struggling with a file streaming problem.
I retrieve my files using a third party API with a method having the following signature:
FileMetadata getFile(OutputStream destination, String fileId)
In a traditional Servlet application, if I wanted to send the content to my client, I would have done something like:
HttpServletResponse resp;
myService.getFile(resp.getOutputStream(), fileId);
My problem is that in my Play 2 Controller class I don't have access to the underlying OutputStream, so the simplest implementation of my controller method would be:
public static Result downloadFile(String id) {
    ByteArrayOutputStream baos = new BAOS(...);
    myApi.getFile(baos, id); // Load inside temp array
    ByteArrayInputStream bais = new BAIS(baos.toByteArray());
    return ok(bais);
}
It will work, but it requires loading the whole content into memory before serving it, so it's not an option (files can be huge).
I was thinking of a solution consisting of:
Defining a ByteArrayOutputStream (baos) inside my controller
Calling the third party API with this baos as a parameter
Using the chunked response support of the Play framework to send the content of the baos as soon as something is written into it by the 3rd party API
The problem is that I don't know whether it is possible (the call to getFile is blocking, so it would require multiple threads with a shared OutputStream), nor whether it's overkill.
Has someone ever faced this kind of problem and found a solution?
Could my proposed solution solve my problem?
Any insights will be appreciated.
Thanks
EDIT 1
Based on kheraud's suggestion, I have managed to get a working, but still not perfect, solution (code below).
Unfortunately, if a problem occurs during the call to the getFile method, the error is not sent back to the client (because I have already returned ok) and the browser waits indefinitely for a file that will never come.
Is there a way to handle this case?
public static Result downloadFile(String fileId) {
    Thread readerThread = null;
    try {
        PipedOutputStream pos = new PipedOutputStream();
        PipedInputStream pis = new PipedInputStream(pos);
        // Reading must be done in another thread
        readerThread = new DownloadFileWorker(fileId, pos);
        readerThread.start();
        return ok(pis);
    } catch (Exception ex) {
        ex.printStackTrace();
        return internalServerError(ex.toString());
    }
}
static class DownloadFileWorker extends Thread {
    String fileId;
    PipedOutputStream pos;

    public DownloadFileWorker(String fileId, PipedOutputStream pos) {
        super();
        this.fileId = fileId;
        this.pos = pos;
    }

    public void run() {
        try {
            myApi.getFile(pos, fileId);
            pos.close();
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}
EDIT 2
I found a way to avoid the infinite loading of the page by simply adding a pos.close() call in the catch block of the worker thread. The client ends up with a zero-KB file, but I guess that's better than waiting forever.
There is something in the Play 2 Scala framework made for that: Enumerators. This is very close to what you are thinking about.
You should have a look at this doc page for details.
I didn't find anything similar in the Play 2 Java API, but looking at the framework source code, you have a:
public static Results.Status ok(java.io.InputStream content, int chunkSize)
method which seems to be what you are looking for. The implementation can be found in the play.mvc.Results and play.core.j.JavaResults classes.
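Applied to the EDIT 1 code above, that would just mean passing a chunk size along with the piped stream, for example:
// Stream the piped input back to the client in 64 KB chunks (an arbitrary size).
return ok(pis, 64 * 1024);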
On the Play! mailing list, there was recently a discussion on the same topic:
https://groups.google.com/forum/#!topic/play-framework/YunJzgxPKsU/discussion
It includes a small snippet that allows non-Scala-literates (like myself) to use the Scala streaming interface of Play!.