Parsing multipart/form-data using Apache Commons File Upload - java

Does the Apache Commons FileUpload package provide a generic interface for stream-parsing multipart/form-data chunks via an InputStream, by appending to an Array<Byte>, or through any other generic streaming interface?
I know it has a streaming API, but the example only shows how to use it via ServletFileUpload, which I reckon must be specific to Servlets.
If not, are there any other JVM frameworks that let you do exactly this? Sadly, the framework I am using, Spray.io, doesn't seem to provide a way to do this.

bayou.io has a generic MultipartParser.
You might need some adapters to work with it, since it has its own Async and ByteSource interfaces.
The following example shows how to use the parser synchronously with an InputStream:
String msg = ""
    //+ "preamble\r\n"
    + "--boundary\r\n"
    + "Head1: Value1\r\n"
    + "Head2: Value2\r\n"
    + "\r\n"
    + "body.body.body.body."
    + "\r\n--boundary\r\n"
    + "Head1: Value1\r\n"
    + "Head2: Value2\r\n"
    + "\r\n"
    + "body.body.body.body."
    + "\r\n--boundary--"
    + "epilogue";

InputStream is = new ByteArrayInputStream(msg.getBytes("ISO-8859-1"));
ByteSource byteSource = new InputStream2ByteSource(is, 1024);
MultipartParser parser = new MultipartParser(byteSource, "boundary");

while (true)
{
    try
    {
        MultipartPart part = parser.getNextPart().sync(); // async -> sync
        System.out.println("== part ==");
        System.out.println(part.headers());

        ByteSource body = part.body();
        InputStream stream = new ByteSource2InputStream(body, Duration.ofSeconds(1));
        drain(stream);
    }
    catch (End end) // control exception from getNextPart()
    {
        System.out.println("== end of parts ==");
        break;
    }
}
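The example calls a drain(...) helper that is not part of bayou.io; a minimal stand-in (my assumption: it simply reads the part body to exhaustion) could be:
// Hypothetical helper, not from bayou.io: copy the part body to stdout
// until the stream reports end-of-input.
static void drain(InputStream stream) throws IOException
{
    byte[] buf = new byte[1024];
    int n;
    while ((n = stream.read(buf)) != -1)
        System.out.write(buf, 0, n);
    System.out.flush();
}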

Related

Auto-Detect File Extension with APACHE JENA

I want to convert files with any RDF extension to .ttl (TURTLE), and I need to use Apache Jena. I know how this can be accomplished with RDF4J, but the output isn't as accurate as it is with Jena. I want to know how I can auto-detect the extension, or rather the file type, when I don't know the extension of a file I'm reading from a directory. My code works when I hardcode the file name; I just need help auto-detecting the file type. My code is as follows:
public class Converter {
    public static void main(String[] args) throws FileNotFoundException {
        String fileName = "./abc.rdf";
        Model model = ModelFactory.createDefaultModel();
        // I know this is how it is done with RDF4J but I need to use Apache Jena.
        /* RDFParser rdfParser = Rio.createParser(Rio.getWriterFormatForFileName(fileName).orElse(RDFFormat.RDFXML));
           RDFWriter rdfWriter = Rio.createWriter(RDFFormat.TURTLE,
                   new FileOutputStream("./" + stripExtension(fileName) + ".ttl")); */
        InputStream is = FileManager.get().open(fileName);
        if (is != null) {
            model.read(is, null, "RDF/XML");
            model.write(new FileOutputStream("./converted.ttl"), "TURTLE");
        } else {
            System.err.println("cannot read " + fileName);
        }
    }
}
All help and advice will be highly appreciated.
There is functionality that handles reading from a file using the extension to determine the syntax:
RDFDataMgr.read(model, fileName);
It also handles compressed files e.g. "file.ttl.gz".
There is a registry of languages:
RDFLanguages.fileExtToLang(...)
RDFLanguages.filenameToLang(...)
For more control see RDFParser:
RDFParser.create()
    .source(fileName)
    // ... many options, including forcing the language ...
    .parse(model);
https://jena.apache.org/documentation/io/rdf-input.html
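Applied to the Converter from the question, a minimal sketch (my adaptation, assuming Jena 3.x package names and the converted.ttl output name from the question) could look like:
import java.io.FileOutputStream;
import java.io.OutputStream;

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.riot.RDFDataMgr;
import org.apache.jena.riot.RDFFormat;

public class Converter {
    public static void main(String[] args) throws Exception {
        String fileName = "./abc.rdf"; // any extension Jena recognizes: .rdf, .ttl, .nt, .jsonld, ...
        Model model = ModelFactory.createDefaultModel();
        RDFDataMgr.read(model, fileName); // syntax chosen from the file extension
        try (OutputStream out = new FileOutputStream("./converted.ttl")) {
            RDFDataMgr.write(out, model, RDFFormat.TURTLE);
        }
    }
}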

Sending in-memory generated .docx files from server to client with Spark

I am creating a web application using the Spark Java framework. The front-end is developed using AngularJS.
I want to generate a .docx file on the server (in-memory) and send this to the client for download.
To achieve this, I created an Angular service with the following function, which is called after the user clicks a download button:
functions.generateWord = function () {
    $http.post('/api/v1/surveys/genword', data.currentSurvey).success(function (response) {
        var element = angular.element('<a/>');
        element.attr({
            href: 'data:attachment;charset=utf-8;application/vnd.openxmlformats-officedocument.wordprocessingml.document' + response,
            target: '_blank',
            download: 'test.docx'
        })[0].click();
    });
};
On the server, this api call gets forwarded to the following method:
public Response exportToWord(Response response) {
    try {
        File file = new File("src/main/resources/template.docx");
        FileInputStream inputStream = new FileInputStream(file);
        byte[] byteStream = new byte[(int) file.length()];
        inputStream.read(byteStream);
        response.raw().setContentType("data:attachment;charset=utf-8;application/vnd.openxmlformats-officedocument.wordprocessingml.document");
        response.raw().setContentLength((int) file.length());
        response.raw().getOutputStream().write(byteStream);
        response.raw().getOutputStream().flush();
        response.raw().getOutputStream().close();
        return response;
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
    return null;
}
I have tried to solve this in MANY different ways and I always end up with a corrupted 'test.docx'.
Solved it by using blobs and specifying the response type as 'arraybuffer' in the $http.post API call. The only bad thing with this solution (as far as I know) is that it doesn't play well with IE, but that's a problem for another day.
functions.generateWord = function () {
    $http.post('/api/v1/surveys/genword', data.currentSurvey, {responseType: 'arraybuffer'})
        .success(function (response) {
            var blob = new Blob([response], {type: 'application/vnd.openxmlformats-officedocument.wordprocessingml.document'});
            var url = (window.URL || window.webkitURL).createObjectURL(blob);
            var element = angular.element('<a/>');
            element.attr({
                href: url,
                target: '_blank',
                download: 'survey.docx'
            })[0].click();
        });
};
I think what went wrong was that the byte stream got encoded as plain text when I tried to create a URL with:
href: 'data:attachment;charset=utf-8;application/vnd.openxmlformats-officedocument.wordprocessingml.document' + response
thus corrupting it.
When using blobs instead, I get a "direct" link to the generated byte stream and no encoding is done on it since the response type is set to 'arraybuffer'.
Note that this is just my own reasoning of why things went wrong with the original code. I might be terribly wrong, so feel free to correct me if that's the case.
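As a side note, the data: URI syntax in the server's setContentType(...) call is not a valid Content-Type header either. With the plain Servlet API underneath Spark, a conventional header pair would look something like this (a sketch; the filename is an arbitrary choice):
// A plain MIME type, not a data: URI; Content-Disposition triggers the download dialog.
response.raw().setContentType("application/vnd.openxmlformats-officedocument.wordprocessingml.document");
response.raw().setHeader("Content-Disposition", "attachment; filename=\"survey.docx\"");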

Disabling Multipart Caching in CXF jax-rs

I posted this question to the CXF list, without any luck. So here we go. I am trying to upload large files to a remote server (think of them as virtual machine disks). So I have a RESTful service that accepts upload requests. The handler for the upload looks like:
@POST
@Consumes(MediaType.MULTIPART_FORM_DATA)
@Path("/doupload")
public Response receiveStream(MultipartBody multipart) {
    List<Attachment> allAttachments = multipart.getAllAttachments();
    Attachment att = null;
    for (Attachment b : allAttachments) {
        if (UPLOAD_FILE_DESCRIPTOR.equals(b.getContentId())) {
            att = b;
        }
    }
    Assert.notNull(att);
    DataHandler dh = att.getDataHandler();
    if (dh == null) {
        throw new WebApplicationException(HTTP_BAD_REQUEST);
    }
    try {
        InputStream is = dh.getInputStream();
        byte[] buf = new byte[65536];
        int n;
        OutputStream os = getOutputStream();
        while ((n = is.read(buf)) > 0) {
            os.write(buf, 0, n);
        }
        ResponseBuilder rb = Response.status(HTTP_CREATED);
        return rb.build();
    } catch (IOException e) {
        log.error("Got exception=", e);
        throw new WebApplicationException(HTTP_INTERNAL_ERROR);
    }
}
The client for this code is fairly simple:
public void sendLargeFile(String filename) throws IOException {
    WebClient wc = WebClient.create(targetUrl);
    InputStream is = new FileInputStream(new File(filename));
    Response r = wc.post(new Attachment(Constants.UPLOAD_FILE_DESCRIPTOR,
            MediaType.APPLICATION_OCTET_STREAM, is));
}
The code works fine in terms of functionality. In terms of performance, I noticed that before my handler (the receiveStream() method) gets the first byte out of the stream, the whole stream actually gets persisted into a temporary file (using a CachedOutputStream). Unfortunately, that is not acceptable for my purposes.
My handler simply passes the incoming bytes to a backend storage system (virtual machine disk repository), and waiting for the whole disk to be written to a cache only to be read again takes a lot of time, tying up a lot of resources, and reducing throughput.
There is a cost associated with writing the blocks and reading them again, since the app is running in the cloud, and the cloud provider charges per block read/written.
Since every byte is written to the local disk, my service VM must have enough disk space to accommodate the total sizes of all the streams being uploaded (i.e., if I have 10 uploads of 100GB each, I must have 1TB of disk just to cache the content). That again is extra money, as the size of the service VM grows dramatically, and the cloud provider charges for the provisioned disk size as well.
Given all of this, I am looking for a way to use the HTTP InputStream (or as close to it as possible) to read the attachment directly and handle it afterwards. I guess the question translates into one of:
- Is there a way to tell CXF not to do caching?
- OR: is there a way to pass CXF an output stream (one I write) to use, rather than the CachedOutputStream?
I found a similar question here. The resolution says to use CXF 2.2.3 or later; I am using 2.4.4 (and tried 2.7.0) with no luck.
Thanks.
I think it's logically not possible (neither in CXF nor anywhere else). You're calling getAllAttachments(), which means that the server has to collect information about them from the HTTP input stream, so the entire stream has to be read and buffered for MIME parsing before the attachments can be handed to you.
In your case, you should work directly with the stream and do the MIME parsing yourself:
public Response receiveStream(InputStream input) {
Now you have full control of the input and can consume it byte-by-byte.
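To illustrate, here is a minimal sketch of that manual parsing using the low-level MultipartStream class from Commons FileUpload (an added assumption on my part, not something CXF gives you). The boundary extraction is naive, and getOutputStream() stands for the backend-storage helper from the question:
@POST
@Consumes(MediaType.MULTIPART_FORM_DATA)
@Path("/doupload")
public Response receiveStream(@HeaderParam("Content-Type") String contentType,
                              InputStream input) throws IOException {
    // Naively extract the boundary from e.g. "multipart/form-data; boundary=XYZ".
    String boundary = contentType.substring(contentType.indexOf("boundary=") + "boundary=".length());
    MultipartStream multipart =
            new MultipartStream(input, boundary.getBytes("ISO-8859-1"), 65536, null);

    boolean hasNext = multipart.skipPreamble();
    while (hasNext) {
        String partHeaders = multipart.readHeaders(); // raw MIME headers of this part
        multipart.readBodyData(getOutputStream());    // stream the part body straight to storage
        hasNext = multipart.readBoundary();           // advance to the next part, if any
    }
    return Response.status(HTTP_CREATED).build();
}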
I ended up fixing the problem in an inelegant way, but it works, so I wanted to share my experience. Please do let me know if there are standard or better ways.
Since I am writing the server side, I knew I would access all the attachments in the order they were sent and process them as they stream in. So, to reflect that behavior of the handler method (the receiveStream() method above), I created a new annotation on the server side called "@SequentialAttachmentProcessing" and annotated the method with it.
I also wrote a subclass of Attachment, called SequentialAttachment, that acts like a linked list. It has a skip() method that skips over the current attachment, and when an attachment ends, a hasMore() method tells you whether there is another one.
Then I wrote a custom multipart/form-data provider that behaves as follows: if the target method is annotated as above, handle the attachment; otherwise, call the default provider. When it is handled by my provider, it always returns at most one attachment. Hence, it could be misleading to an unsuspecting handler method. However, I think that is acceptable, since the writer of the server must have annotated the method with "@SequentialAttachmentProcessing" and therefore must know what that entails.
As a result, the implementation of the receiveStream() method is now something like:
@POST
@SequentialAttachmentProcessing
@Consumes(MediaType.MULTIPART_FORM_DATA)
@Path("/doupload")
public Response receiveStream(MultipartBody multipart) throws IOException {
    List<Attachment> allAttachments = multipart.getAllAttachments();
    Assert.isTrue(allAttachments.size() <= 1);
    if (allAttachments.size() > 0) {
        Attachment head = allAttachments.get(0);
        Assert.isTrue(head instanceof SequentialAttachment);
        SequentialAttachment att = (SequentialAttachment) head;
        while (att != null) {
            DataHandler dh = att.getDataHandler();
            InputStream is = dh.getInputStream();
            byte[] buf = new byte[65536];
            int n;
            OutputStream os = getOutputStream();
            while ((n = is.read(buf)) > 0) {
                os.write(buf, 0, n);
            }
            // Advance to the next attachment, or stop at the end of the stream.
            att = att.hasMore() ? att.next() : null;
        }
    }
    return Response.status(HTTP_CREATED).build();
}
While this solved my immediate problem, I still believe there has to be a standard way of doing this. I hope this helps someone.

Streaming large files with play framework and third party API

I'm writing a Play 2 application and I am struggling with a file-streaming problem.
I retrieve my files using a third party API with a method having the following signature:
FileMetadata getFile(OutputStream destination, String fileId)
In a traditional Servlet application, if I wanted to send the content to my client I would have done something like:
HttpServletResponse resp;
myService.getFile(resp.getOutputStream(), fileId);
My problem is that in my Play 2 controller class I don't have access to the underlying OutputStream, so the simplest implementation of my controller method would be:
public static Result downloadFile(String id) {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    myApi.getFile(baos, id); // load everything into a temporary in-memory array
    ByteArrayInputStream bais = new ByteArrayInputStream(baos.toByteArray());
    return ok(bais);
}
It will work, but it requires loading the whole content into memory before serving it, so it's not an option (files can be huge).
I was thinking of a solution consisting of:
- defining a ByteArrayOutputStream (baos) inside my controller,
- calling the third-party API with this baos as a parameter,
- using the chunked response support of the Play framework to send the content of the baos as soon as something is written to it by the 3rd-party API.
The problem is that I don't know whether this is possible (the call to getFile is blocking, so it would require multiple threads with a shared OutputStream), nor whether it's overkill.
Has anyone ever faced this kind of problem and found a solution?
Could my proposed solution solve my problem?
Any insights will be appreciated.
Thanks
EDIT 1
Based on kheraud's suggestion, I have managed to get a working, but still not perfect, solution (code below).
Unfortunately, if a problem occurs during the call to the getFile method, the error is not sent back to the client (because I already returned Ok), and the browser waits indefinitely for a file that will never come.
Is there a way to handle this case?
public static Result downloadFile(String fileId) {
    Thread readerThread = null;
    try {
        PipedOutputStream pos = new PipedOutputStream();
        PipedInputStream pis = new PipedInputStream(pos);
        // Reading must be done in another thread.
        readerThread = new DownloadFileWorker(fileId, pos);
        readerThread.start();
        return ok(pis);
    } catch (Exception ex) {
        ex.printStackTrace();
        return internalServerError(ex.toString());
    }
}
static class DownloadFileWorker extends Thread {
    String fileId;
    PipedOutputStream pos;

    public DownloadFileWorker(String fileId, PipedOutputStream pos) {
        super();
        this.fileId = fileId;
        this.pos = pos;
    }

    public void run() {
        try {
            myApi.getFile(pos, fileId);
            pos.close();
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}
EDIT 2
I found a way to avoid infinite loading of the page by simply adding a pos.close() in the catch() block of the worker thread. The client ends up with a zero-KB file, but I guess that's better than waiting forever.
There is something in the Play 2 Scala framework made for exactly that: Enumerators. This is very close to what you are thinking about.
You should have a look at this doc page for details.
I didn't find anything similar in the Play 2 Java API, but looking at the framework's source code, you have a:
public static Results.Status ok(java.io.InputStream content, int chunkSize)
method, which seems to be what you are looking for. The implementation can be found in the play.mvc.Results and play.core.j.JavaResults classes.
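Combined with the piped streams from EDIT 1 above, that overload could be dropped in like this (a sketch; the 4096-byte chunk size is an arbitrary choice):
return ok(pis, 4096); // serve the piped stream to the client in 4 KB chunks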
On the Play! mailing list, there recently was a discussion on the same topic:
https://groups.google.com/forum/#!topic/play-framework/YunJzgxPKsU/discussion
It includes a small snippet that allows non-Scala-literates (like myself) to use the Scala streaming interface of Play!.

Java Proxy Servlet for submitting files

I'm attempting to use Panda with my GWT application. I can upload videos directly to my Panda server using
POST MY_PANDA_SERVER/videos/MY_VIDEO_ID/upload
However, I would like to hide my Panda server behind my J2EE (GlassFish) server. I would like to achieve this:
- Start the upload to some servlet on my J2EE server
- Authenticate the user
- POST the file to my Panda server while it is still being uploaded to the servlet
Ideally, I would like to never store the file on the J2EE server, but just use it as a proxy to get to the Panda server.
Commons FileUpload is nice, but not sufficient in your case. It will parse the entire body in memory before providing the file items (and streams). You're not interested in the individual items; you basically just want to stream the request body from the one side to the other transparently, without altering it or storing it in memory in any way. FileUpload would only parse the request body into some "useable" Java objects, and HttpClient would only create the very same request body again based on those Java objects. Those Java objects consume memory as well.
You don't need a library for this (or it would have to be Commons IO, to replace the for loop with a one-liner using IOUtils#copy()). Just the basic Java NET and IO APIs suffice. Here's a kickoff example:
protected void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
    URLConnection connection = new URL("http://your.url.to.panda").openConnection();
    connection.setDoOutput(true); // POST.
    connection.setRequestProperty("Content-Type", request.getHeader("Content-Type")); // This one is important! You may want to check other request headers and copy them as well.

    // Set streaming mode, else HttpURLConnection will buffer everything.
    int contentLength = request.getContentLength();
    if (contentLength > -1) {
        // Content length is known beforehand, so no buffering will take place.
        ((HttpURLConnection) connection).setFixedLengthStreamingMode(contentLength);
    } else {
        // Content length is unknown, so send in 1KB chunks (which will also be the internal buffer size).
        ((HttpURLConnection) connection).setChunkedStreamingMode(1024);
    }

    InputStream input = request.getInputStream();
    OutputStream output = connection.getOutputStream();
    byte[] buffer = new byte[1024]; // Uses only 1KB of memory!

    for (int length = 0; (length = input.read(buffer)) > 0;) {
        output.write(buffer, 0, length);
        output.flush();
    }

    output.close();
    connection.getInputStream(); // Important! It's lazily executed.
}
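The kickoff example only fires the backend request (via the lazy getInputStream() call) and discards Panda's reply. To relay the status and body back to the original client, a follow-up along these lines could be appended inside doPost (a sketch using the same plain-IO style):
// Relay the backend's status code and response body to the original client.
HttpURLConnection http = (HttpURLConnection) connection;
response.setStatus(http.getResponseCode());

InputStream back = http.getInputStream();
OutputStream out = response.getOutputStream();
byte[] buf = new byte[1024];
for (int n; (n = back.read(buf)) > 0;) {
    out.write(buf, 0, n);
}
out.close();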
You can use Apache Commons FileUpload to receive the file. Then you can use HttpClient to upload the file to your Panda server with POST. With Apache Commons FileUpload you can process the upload as a stream, so you don't have to store the file first.
Building upon Enrique's answer, I also recommend using FileUpload and HttpClient. FileUpload can give you a stream of the uploaded file:
// Create a new file upload handler
ServletFileUpload upload = new ServletFileUpload();

// Parse the request
FileItemIterator iter = upload.getItemIterator(request);
while (iter.hasNext()) {
    FileItemStream item = iter.next();
    String name = item.getFieldName();
    InputStream stream = item.openStream();
    if (item.isFormField()) {
        System.out.println("Form field " + name + " with value "
                + Streams.asString(stream) + " detected.");
    } else {
        System.out.println("File field " + name + " with file name "
                + item.getName() + " detected.");
        // Process the input stream
        ...
    }
}
You could then use HttpClient or HttpComponents to do the POST. You can find an example here.
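For instance, here is a sketch of that POST using Apache HttpClient 4.x, forwarding the part's InputStream (the stream variable from the loop above) straight through; the Panda URL is the placeholder from the question:
CloseableHttpClient client = HttpClients.createDefault();
HttpPost post = new HttpPost("http://MY_PANDA_SERVER/videos/MY_VIDEO_ID/upload"); // placeholder
// InputStreamEntity streams the body: no temp file, no full in-memory copy.
post.setEntity(new InputStreamEntity(stream, ContentType.APPLICATION_OCTET_STREAM));
client.execute(post).close();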
The best solution is to use apache-camel servlet component:
http://camel.apache.org/servlet.html
