Java Streaming Objects via HTTP: some Best Practices for Exception Handling? - java

For larger data sets we use HTTP streaming of objects in our Java/Spring Boot application. For that we bypass Spring MVC a bit, like this:
@GetMapping("/report")
public void generateReport(HttpServletResponse response) throws TransformEntryException {
    response.addHeader("Content-Disposition", "attachment; filename=report.json");
    response.addHeader("Content-Type", "application/stream+json");
    response.setCharacterEncoding("UTF-8");
    OutputStream out = response.getOutputStream();
    long count = reportService.findReportData()
            .map(entry -> transformEntry(entry))
            .map(entry -> om.writeValueAsBytes(entry))
            .peek(bytes -> out.write(bytes))
            .count();
    LOGGER.info("Generated report with {} entries.", count);
}
(...I know this code won't compile - just for illustration purposes...)
This works great so far - except when something goes wrong: say that after streaming 12 entries successfully, the 13th entry triggers a TransformEntryException during transformEntry().
The stream stops there, and the client is told the download finished successfully, even though it received only part of the file.
We can log this server-side and even attach a warning or stack trace to the downloaded file, but the client still sees a successful download of what is in fact a partial or even corrupt file.
I know that the HTTP status code is sent with the headers - which have already gone out. Is there any other way to indicate a failed download to the client?
("Client" in most cases means a web browser.)
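One pragmatic workaround, since the status line is already gone, is to stream a line-delimited format and finish it with an explicit trailer record that a client can check for. Below is a minimal sketch of the idea in plain Java (no Spring; the writer stands in for the response output stream, and the failing transform stands in for transformEntry()):

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.List;
import java.util.function.Function;

public class TrailerStream {
    // Writes each entry as one JSON line; the last line is a trailer that
    // tells the client whether the stream is complete or was truncated.
    public static String streamReport(List<String> entries, Function<String, String> transform) {
        StringWriter sw = new StringWriter();
        PrintWriter out = new PrintWriter(sw);
        long count = 0;
        try {
            for (String entry : entries) {
                out.println(transform.apply(entry)); // may throw mid-stream
                count++;
            }
            out.println("{\"status\":\"complete\",\"count\":" + count + "}");
        } catch (RuntimeException e) {
            // Headers (and the 200 status) are long gone; a trailer record is
            // the only in-band way left to signal the failure.
            out.println("{\"status\":\"error\",\"count\":" + count + "}");
        }
        out.flush();
        return sw.toString();
    }
}
```

A client that parses the NDJSON can treat any download whose last line is not a "complete" trailer as truncated. A plain browser "Save as" download will not do this check, though; for browsers the more robust signals are sending a Content-Length up front (so a short body is flagged as a failed download) or aborting the TCP connection instead of finishing the chunked body cleanly.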

Related

Prevent client timing out while a servlet generates a large download

I have a Java servlet that generates some arbitrary report file and returns it as a download to the user's browser. The file is written directly to the servlet's output stream, so if it is very large then it can successfully download in chunks. However, sometimes the resulting data is not large enough to get split into chunks, and either the client connection times out, or it runs successfully but the download doesn't appear in the browser's UI until it's 100% done.
This is what my servlet code looks like:
@Override
protected void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
    response.setContentType("application/pdf");
    response.setHeader("Content-Disposition", "attachment; filename=\"" + report.getFileName(params) + "\"");
    try (OutputStream outputStream = response.getOutputStream()) {
        // Send the first response ASAP to suppress client timeouts
        response.flushBuffer(); // This doesn't seem to change anything??
        // This calls some arbitrary function that writes data directly into the given stream
        generateFile(outputStream);
    }
}
I ran a particularly bad test where the file generation took 110,826ms. Once it got to the end, the client had downloaded a 0 byte file - I assume this is the result of a timeout. I am expecting this specific result to be somewhere between 10 and 30 KB - smaller than the servlet's buffer. When I ran a different test, it generated a lot of data quickly (up to 80MB total), so the download appeared in my browser after the first chunk was filled up.
Is there a way to force a downloaded file to appear in the browser (and prevent a timeout from occurring) before any actual data has been generated? Am I on the right track with that flushBuffer() call?
Well, it looks like shrinking the size of my output buffer with response.setBufferSize(1000); allowed my stress test file to download successfully. I still don't know why response.flushBuffer() didn't seem to do anything, but at least as long as I generate data quickly enough to fill that buffer size before timing out, the download will complete.
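The behaviour described is consistent with how buffered streams work in general: bytes below the buffer size stay invisible downstream until the buffer fills or is explicitly flushed. A small stdlib-only illustration of the same mechanism (not servlet code; the 8192 mirrors a typical default response buffer):

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class BufferDemo {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream downstream = new ByteArrayOutputStream();
        // 8 KB buffer, like a typical servlet response buffer
        BufferedOutputStream out = new BufferedOutputStream(downstream, 8192);

        out.write(new byte[100]); // well below the buffer size
        System.out.println("before flush: " + downstream.size()); // prints 0

        out.flush(); // the servlet analogue is response.flushBuffer()
        System.out.println("after flush: " + downstream.size()); // prints 100
    }
}
```

This is why setBufferSize(1000) "works": 1000 bytes of generated data now force an early network write. flushBuffer() should achieve the same thing, so if it appears to do nothing, something between the servlet and the client (a compression filter, a reverse proxy) may be re-buffering the response.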

Vert.x Web (Scala): Read Multipart File Upload as Stream from Router without Saving to Disk

I know that it is possible to handle file uploads through a Vert.x Web Router simply using something like so:
myRouter.post("/upload")
  .handler(BodyHandler.create("myUploadDirectory"))
  .handler { context =>
    // do something with the uploaded files and respond to the request
  }
However, this saves the file on the local server (or maybe even a network share). It might be perfectly fine to buffer small files on disk temporarily and move them to another store in batches, but the same cannot be said for very large files (multiple gigabytes, for example).
What is a good way to read the file upload as a stream of bytes, for example, and write it directly to a final store, and then be able to handle failures and successes gracefully, all from a Router?
Proxying the upload this way would avoid making the store publicly accessible to clients, and would possibly allow more fine-grained control of the upload process than just creating a local file on the server or exposing the object/blob store.
EDIT:
I know that an alternative is to do something like this, handling the file upload as a special case before handling requests with my Router:
val myHttpServer = myVertx.createHttpServer()
myHttpServer.requestHandler(request => {
  if (request.absoluteURI().contains("/upload")) {
    request.setExpectMultipart(true)
    request.handler { buffer =>
      // upload part of the file
    }
    request.endHandler { end =>
      // perform some action when the upload is done
    }
  } else
    myRouter.handle(request)
})
However, as you can see it looks pretty messy. It would be much cleaner to handle it with a Router.post() or something similar.
Am I going about this wrong or something?
I've tried doing the following to no avail (I only get an HTTP 500 and no useful errors in the log). Not even the exceptionHandler is fired.
myRouter.post("/upload")
  .handler { context =>
    context.request.setExpectMultipart(true)
    context.request.uploadHandler { upload =>
      upload.handler { buffer =>
        // send chunk to backend
      }
      upload.endHandler { _ =>
        // upload successful
      }
      upload.exceptionHandler { e =>
        // handle the exception
      }
    }
  }
SOLUTION:
So it turns out that I had added a BodyHandler to the Router before adding my routes, because I wanted to be able to receive a JSON body in other POST requests and didn't want to type .handler(BodyHandler.create()) before every route that received JSON.
However, as the name of the class states, the body was then already handled, which meant I was unable to add the uploadHandler to the request.
I'm not knowledgeable about Scala, but here's a solution in Java:
router.post("/upload").handler(rc -> {
    HttpServerRequest req = rc.request();
    req.setExpectMultipart(true);
    req.uploadHandler(upload -> {
        upload.exceptionHandler(cause -> {
            req.response().setChunked(true).end("Upload failed");
        });
        upload.endHandler(v -> {
            req.response().setChunked(true).end("Successfully uploaded file");
        });
        upload.handler(buffer -> {
            // Send to backend
        });
    });
});
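The fix above boils down to an ordering rule: the upload handlers must be registered before anything (like a BodyHandler) consumes the request body. The handler/endHandler/exceptionHandler callback shape can be modelled in plain Java to show how the three interact; this is a toy stand-in, not the Vert.x API:

```java
import java.util.List;
import java.util.function.Consumer;

public class ToyUpload {
    private Consumer<byte[]> handler = b -> {};
    private Runnable endHandler = () -> {};
    private Consumer<Exception> exceptionHandler = e -> {};

    public ToyUpload handler(Consumer<byte[]> h) { this.handler = h; return this; }
    public ToyUpload endHandler(Runnable h) { this.endHandler = h; return this; }
    public ToyUpload exceptionHandler(Consumer<Exception> h) { this.exceptionHandler = h; return this; }

    // Drives the callbacks the way a server would: one call per chunk,
    // then either endHandler (success) or exceptionHandler (failure).
    public void feed(List<byte[]> chunks, boolean fail) {
        try {
            for (byte[] chunk : chunks) {
                handler.accept(chunk);
            }
            if (fail) throw new RuntimeException("connection reset");
            endHandler.run();
        } catch (RuntimeException e) {
            exceptionHandler.accept(e);
        }
    }
}
```

If a BodyHandler has already drained the request, the equivalent of feed() is simply never driven with chunks, which matches the silent failure described above: no exception, no data, just an HTTP 500.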

passing an Akka stream to an upstream service to populate

I need to call an upstream service (Azure Blob Service) that pushes data to an OutputStream, which I then need to turn around and push back to the client through Akka. Without Akka (just servlet code), I'd simply get the ServletOutputStream and pass it to the Azure service's method.
The closest I can stumble upon, and clearly this is wrong, is something like this:
Source<ByteString, OutputStream> source = StreamConverters.asOutputStream().mapMaterializedValue(os -> {
    blobClient.download(os);
    return os;
});
ResponseEntity responseEntity = HttpEntities.create(ContentTypes.APPLICATION_OCTET_STREAM, preAuthData.getFileSize(), source);
sender().tell(new RequestResult(responseEntity, StatusCodes.OK), self());
The idea is that I'm calling an upstream service to get an OutputStream populated by calling
blobClient.download(os);
It seems like the lambda function gets called and returns, but then it fails afterwards, because there's no data or something. As if I'm not supposed to have that lambda do the work, but perhaps return some object that does the work? Not sure.
How does one do this?
The real issue here is that the Azure API is not designed for back-pressuring. There is no way for the output stream to signal back to Azure that it is not ready for more data. To put it another way: if Azure pushes data faster than you are able to consume it, there will have to be some ugly buffer overflow failure somewhere.
Accepting this fact, the next best thing we can do is:
Use Source.lazySource to only start downloading data when there is downstream demand (aka. the source is being run and data is being requested).
Put the download call in some other thread so that it continues executing without blocking the source from being returned. One way to do this is with a Future (I'm not sure what the Java best practices are, but it should work fine either way). Although it won't matter initially, you may need to choose an execution context other than system.dispatcher - it all depends on whether download is blocking or not.
I apologize in advance if this Java code is malformed - I use Akka with Scala, so this is all from looking at the Akka Java API and Java syntax reference.
ResponseEntity responseEntity = HttpEntities.create(
    ContentTypes.APPLICATION_OCTET_STREAM,
    preAuthData.getFileSize(),
    // Wait until there is downstream demand to initialize the source...
    Source.lazySource(() -> {
        // Pre-materialize the OutputStream before the source starts running
        Pair<OutputStream, Source<ByteString, NotUsed>> pair =
            StreamConverters.asOutputStream().preMaterialize(system);
        // Start writing into the download stream in a separate thread
        Futures.future(() -> { blobClient.download(pair.first()); return pair.first(); }, system.getDispatcher());
        // Return the source - it should start running since `lazySource` indicated demand
        return pair.second();
    })
);
sender().tell(new RequestResult(responseEntity, StatusCodes.OK), self());
The OutputStream in this case is the "materialized value" of the Source and it will only be created once the stream is run (or "materialized" into a running stream). Running it is out of your control since you hand the Source to Akka HTTP and that will later actually run your source.
.mapMaterializedValue(matval -> ...) is usually used to transform the materialized value, but since it is invoked as part of materialization you can use it for side effects, such as sending the matval in a message, just as you figured out; there isn't necessarily anything wrong with that, even if it looks funky. It is important to understand that the stream will not complete its materialization and become running until that lambda completes. That means trouble if download() blocks, rather than forking the work off to a different thread and returning immediately.
There is however another solution: Source.preMaterialize(), it materializes the source and gives you a Pair of the materialized value and a new Source that can be used to consume the already started source:
Pair<OutputStream, Source<ByteString, NotUsed>> pair =
    StreamConverters.asOutputStream().preMaterialize(system);
OutputStream os = pair.first();
Source<ByteString, NotUsed> source = pair.second();
Note that there are a few additional things to consider in your code, most importantly whether the blobClient.download(os) call blocks until it is done. If you call a blocking method from the actor, you must make sure that your actor does not starve the dispatcher and stop other actors in your application from executing (see the Akka docs: https://doc.akka.io/docs/akka/current/typed/dispatchers.html#blocking-needs-careful-management ).
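The shape of the preMaterialize solution - create the OutputStream first, let a separate thread fill it, consume from the paired read side - can be seen with plain java.io pipes, where the producer thread stands in for blobClient.download:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

public class PipeDemo {
    public static byte[] produceAndConsume(byte[] payload) throws IOException, InterruptedException {
        PipedOutputStream os = new PipedOutputStream();        // like the pre-materialized OutputStream
        PipedInputStream is = new PipedInputStream(os, 8192);  // like the Source handed to Akka HTTP

        // The "download" runs on its own thread so that creating the pipe never
        // blocks, mirroring Futures.future(() -> blobClient.download(os), ...)
        Thread producer = new Thread(() -> {
            try (os) {
                os.write(payload);
            } catch (IOException ignored) {
            }
        });
        producer.start();

        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        is.transferTo(sink); // reads until the producer closes the pipe
        producer.join();
        return sink.toByteArray();
    }
}
```

The same caveat applies as in the Akka case: if the consumer stops reading, the producer blocks once the pipe's 8 KB buffer fills. That blocking write is the only "back-pressure" the Azure client would ever see.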

What is the delay between an HttpPost being sent to the server and the server responding?

I'm uploading a zip file from a Java desktop application to an HTTP server (running Tomcat 7). I'm using Apache HttpClient 4.5.3 and I display a progress bar showing upload progress, using this wrapper solution: https://github.com/x2on/gradle-hockeyapp-plugin/blob/master/src/main/groovy/de/felixschulze/gradle/util/ProgressHttpEntityWrapper.groovy
So in my code I'm updating the progress bar every time the callback gets called:
HttpEntity reqEntity = MultipartEntityBuilder.create()
        .addPart("email", comment)
        .addPart("bin", binaryFile)
        .build();
ProgressHttpEntityWrapper.ProgressCallback progressCallback = new ProgressHttpEntityWrapper.ProgressCallback() {
    @Override
    public void progress(final float progress) {
        SwingUtilities.invokeLater(new Runnable() {
            public void run() {
                MainWindow.logger.severe("progress:" + progress);
                Counters.getUploadSupport().set((int) progress);
                SongKong.refreshProgress(CreateAndSendSupportFilesCounters.UPLOAD_SUPPORT_FILES);
            }
        });
    }
};
httpPost.setEntity(new ProgressHttpEntityWrapper(reqEntity, progressCallback));
HttpResponse response = httpclient.execute(httpPost);
HttpEntity resEntity = response.getEntity();
MainWindow.logger.severe("HttpResponse:" + response.getStatusLine());
This reports the upload as a percentage, but there is a sizeable delay between it reporting 100% and the actual receipt of the HTTP status from the server.
07/07/2017 14.23.54:BST:CreateSupportFile$4$1:run:SEVERE: progress:99.19408
07/07/2017 14.23.54:BST:CreateSupportFile$4$1:run:SEVERE: progress:99.40069
07/07/2017 14.23.54:BST:CreateSupportFile$4$1:run:SEVERE: progress:99.6073
07/07/2017 14.23.54:BST:CreateSupportFile$4$1:run:SEVERE: progress:99.81391
07/07/2017 14.23.54:BST:CreateSupportFile$4$1:run:SEVERE: progress:99.99768
07/07/2017 14.23.54:BST:CreateSupportFile$4$1:run:SEVERE: progress:99.99778
07/07/2017 14.23.54:BST:CreateSupportFile$4$1:run:SEVERE: progress:99.99789
07/07/2017 14.23.54:BST:CreateSupportFile$4$1:run:SEVERE: progress:99.999794
07/07/2017 14.23.54:BST:CreateSupportFile$4$1:run:SEVERE: progress:99.9999
07/07/2017 14.23.54:BST:CreateSupportFile$4$1:run:SEVERE: progress:100.0
07/07/2017 14.24.11:BST:CreateSupportFile:sendAsHttpPost:SEVERE: HttpResponse:HTTP/1.1 200 OK
07/07/2017 14.24.11:BST:CreateSupportFile:sendAsHttpPost:SEVERE: Unknown Request
Note this is not due to my Tomcat code doing much, since I haven't yet implemented the Tomcat code for this function, so it just falls through to the "Unknown Request" branch.
protected void doPost(javax.servlet.http.HttpServletRequest request,
                      javax.servlet.http.HttpServletResponse response)
        throws javax.servlet.ServletException, java.io.IOException
{
    String createMacUpdateLicense = request.getParameter(RequestParameter.CREATEMACUPDATELICENSE.getName());
    if (createMacUpdateLicense != null)
    {
        createMacUpdateLicense(response, createMacUpdateLicense);
    }
    else
    {
        response.setCharacterEncoding("UTF-8");
        response.setContentType("text/plain; charset=UTF-8");
        response.getWriter().println("Unknown Request");
        response.getWriter().close();
    }
}
How can I more accurately report to the user when the upload will complete?
Update
I have now fully implemented the server side; this has increased the discrepancy.
@Override
protected void doPost(javax.servlet.http.HttpServletRequest request, javax.servlet.http.HttpServletResponse response)
        throws javax.servlet.ServletException, java.io.IOException
{
    String uploadSupportFiles = request.getParameter(RequestParameter.UPLOADSUPPORTFILES.getName());
    if (uploadSupportFiles != null)
    {
        uploadSupportFiles(request, response, uploadSupportFiles);
    }
    else
    {
        response.setCharacterEncoding("UTF-8");
        response.setContentType("text/plain; charset=UTF-8");
        response.getWriter().println("Unknown Request");
        response.getWriter().close();
    }
}
private void uploadSupportFiles(HttpServletRequest request, HttpServletResponse response, String email) throws IOException
{
    Part filePart;
    response.setCharacterEncoding("UTF-8");
    response.setContentType("text/plain; charset=UTF-8");
    try
    {
        filePart = request.getPart("bin");
        String fileName = getSubmittedFileName(filePart);
        response.getWriter().println(email + ":File:" + fileName);
        // Okay, now save the zip file somewhere and email a notification
        File uploads = new File("/home/jthink/songkongsupport");
        File supportFile = new File(uploads, email + ".zip");
        int count = 0;
        while (supportFile.exists())
        {
            supportFile = new File(uploads, email + "(" + count + ").zip");
            count++;
        }
        InputStream input = filePart.getInputStream();
        Files.copy(input, supportFile.toPath());
        Email.sendAlert("SongKongSupportUploaded:" + supportFile.getName(), "SongKongSupportUploaded:" + supportFile.getName());
        response.getWriter().close();
    }
    catch (ServletException se)
    {
        response.getWriter().println(email + ":" + se.getMessage());
        response.getWriter().close();
    }
}
Assuming your server-side code just writes the uploaded file somewhere and responds something like "DONE" at the end, here is a rough timeline of what happens:
Bytes written to socket OutputStream
============================|
<--> Buffering |
Bytes sent by TCP stack |
============================
<------> Network latency|
Bytes received by Tomcat
============================
| (Tomcat waits for all data to finish uploading
| before handing it out as "parts" for your code)
| File written to local file on server
| =====
|
| Response "DONE" written by servlet to socket output
| ==
| <---> Network latency
| == Response "DONE" received by client
| |
| |
"100%" for entity wrapper ^ Actual 100% ^
Discrepancy
<----------------------->
"Twilight Zone" : part of discrepancy you cannot do much about.
(progress feedback impossible without using much lower level APIs)
<--------------------->
The scales are of course completely arbitrary, but it shows that there are several factors that can participate into the discrepancy.
Your server writes the file after receiving all bytes, but it does not make a big difference here.
So, the factors:
(client side) Buffering (possibly at several levels) between the Java I/O layer and the OS network stack
Network latency
(server-side) Buffering (possibly at several levels) between the OS network stack and the Java I/O layer
Time to write (or finish writing) zip file on disk
Time to print response (negligible)
Network latency
(client side) Time to read response (negligible)
So you could take that discrepancy into account and adjust the "upload complete" step to 90% of the total progress, and jump from 90 to 100 when you get the final response. From 0% to 90% the user would see "Uploading", with a nice progress bar moving, then you show "Processing...", perhaps with a throbber, and when done, jump to 100%.
That's what many other tools do. Even when I download a file with my browser, there is a small lag towards the end, the download seems stuck at "almost" 100% for a second (or more on my old computer) before the file is actually usable.
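The scaling described above is a few lines of code; a sketch, with the 90/10 split being an arbitrary assumption you would tune for your own upload-vs-processing ratio:

```java
public class ProgressScaler {
    private static final double UPLOAD_SHARE = 0.9; // reserve the last 10% for server-side work

    // Maps the entity wrapper's 0-100 upload progress into the 0-90 band.
    public static int displayed(float uploadPercent) {
        return (int) (uploadPercent * UPLOAD_SHARE);
    }

    // Once the final HTTP response arrives, jump straight to 100.
    public static int done() {
        return 100;
    }
}
```

So when the wrapper reports 50%, the UI shows 45%; when it reports 100%, the UI shows 90% with a "Processing..." label, and only the server's response moves it to 100%.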
If the "twilight zone" time is much higher than the upload time as perceived by your progress wrapper, you might have a problem, and your question would thus be "where does this delay come from?" (for now I don't know). In that case, please provide complete timings (& make sure client & server machines have their clocks synchronized).
If you really need a more accurate/smooth progress report towards the end, you will need a much more involved setup. You will probably need lower-level APIs on the server side (e.g. not using @MultipartConfig etc.), so that your server can write to disk as data is received (which makes error handling much more difficult) and print a dot to the output, with a flush, for every 1% of the file written to disk (or any other kind of progress you want, provided it's actual server-side progress). Your client would then read that response progressively and get an accurate progress report. You can avoid threading on the client side; it's fine to do this sequentially:
POST data, report progress but scaled to 90% (ie if wrapper says 50%, you report 45%)
when done, start reading output from server, and report 91%, 95%, whatever, up until 100%.
Even with that I'm not sure it's possible to display progress info for all the steps (especially between 100% sent and first byte the server can possibly send), so maybe even that extremely complex setup would be useless (it could very well stall at 90% for a moment, then go 91/92/...99/100 in an instant).
So really at this point it's probably not worth it. If you really have a 17s step between last byte sent by client, and response received, something else is off. Initially I was assuming it was for humongous files, but since then you said your files were up to 50MB, so you might have something else to look at.
Some of the server-side code might change depending on how the chunk data is represented, but the concept is roughly the same. Say you are uploading a 10MB file and your chunk size is set to 1MB: you will send 10 requests to the server with 1MB of data each. The client is responsible for breaking all of this up; that is what you will do in JavaScript. Each request is then sent up via HttpRequest along with some other data about the file, the chunk number, and the number of chunks. Again, I use the plupload plugin, which handles this for me, so some of the request data may differ between implementations.
The method I am showing you is part of a web service which outputs JSON data back to the client. Your JavaScript can then parse the JSON, look for an error or success message, and act appropriately. Depending on your implementation, the data you send back might be different. The JavaScript will ultimately handle the progress bar or percentage or whatever, increasing it as it gets successful chunk uploads. My implementation for my project lets plupload deal with all that, but maybe the article I gave you will give you more control over the client side.
protected void Upload()
{
    HttpPostedFile file = Request.Files[0];
    String relativeFilePath = "uploads/";
    try
    {
        if (file == null)
            throw new Exception("Invalid Request.");
        //plupload uses "chunk" to indicate which chunk number is being sent
        int chunk = int.Parse(Request.Form["chunk"]);
        //plupload uses "chunks" to indicate how many total chunks are being sent
        int chunks = int.Parse(Request.Form["chunks"]);
        //plupload uses "name" to indicate the original filename for the file being uploaded
        String filename = Request.Form["name"];
        relativeFilePath += filename;
        //Create a FileStream to manage the uploaded chunk using the original filename
        //Note that if chunk == 0, we use FileMode.Create because it is the first chunk;
        //otherwise, we use FileMode.Append to add to the bytes that were previously saved
        using (FileStream fs = new FileStream(Server.MapPath(relativeFilePath), chunk == 0 ? FileMode.Create : FileMode.Append))
        {
            //create a byte array from the uploaded data and save it to the FileStream
            var buffer = new byte[file.InputStream.Length];
            file.InputStream.Read(buffer, 0, buffer.Length);
            fs.Write(buffer, 0, buffer.Length);
        }
        if ((chunks == 0) || ((chunks > 0) && (chunk == (chunks - 1))))
        {
            //This is final cleanup. Either there is only 1 chunk because the file size
            //is less than the chunk size, or there are multiple chunks and this is the final one.
            //At this point the file is already saved and complete, but maybe the path is only
            //temporary and you want to move it to a final location.
            //In my code I rename the file to a GUID so that there is never a duplicate file name,
            //but that is based on my application's needs.
            Response.Write("{\"success\":\"File Upload Complete.\"}");
        }
        else
            Response.Write("{\"success\":\"Chunk " + chunk + " of " + chunks + " uploaded.\"}");
    }
    catch (Exception ex)
    {
        //write a JSON object to the page and HtmlEncode any quotation marks/HTML tags
        Response.Write("{\"error\":\"" + HttpContext.Current.Server.HtmlEncode(ex.Message) + "\"}");
    }
}
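The create-then-append idea above translates directly to Java (this document's main language). A hedged sketch, with the chunk numbering following the plupload convention used above:

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Path;

public class ChunkAssembler {
    // Chunk 0 truncates/creates the file; later chunks append,
    // mirroring the FileMode.Create / FileMode.Append choice above.
    public static void writeChunk(Path target, int chunk, byte[] data) throws IOException {
        boolean append = chunk > 0;
        try (FileOutputStream fs = new FileOutputStream(target.toFile(), append)) {
            fs.write(data);
        }
    }
}
```

As in the C# version, the server trusts the client's chunk numbering, so a real implementation would also validate that chunks arrive in order (or write to per-chunk temp files and join them at the end).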

Why is this URL not opened from Play! Framework 1.2.4?

I have a URL in my Play! app that routes to either HTML or XLSX depending on the extension passed in the URL, with a routes line like:
# Calls
GET /calls.{format} Call.index
so calls.html renders the page, calls.xlsx downloads an Excel file (using Play Excel module). All works fine from the browser, a cURL request, etc.
I now want to be able to create an email with the Excel file attached to it, but I cannot pull the attachment. Here's the basic version of what I tried first:
public static void sendReport(List<Object[]> invoicelines, String emailaddress) throws MalformedURLException, URISyntaxException
{
    setFrom("Telco Analysis <test@test.com>");
    addRecipient(emailaddress);
    setSubject("Telco Analysis report");
    EmailAttachment emailAttachment = new EmailAttachment();
    URL url = new URL("http://localhost:9001/calls.xlsx");
    emailAttachment.setURL(url);
    emailAttachment.setName(url.getFile());
    emailAttachment.setDescription("Test file");
    addAttachment(emailAttachment);
    send(invoicelines);
}
but it just doesn't pull the URL content. It sits there without any error message, with Chrome's page spinner going, and ties up the web server (to the point that requests from another browser/machine don't appear to get serviced). If I send the email without the attachment, all is fine, so it's just the pulling of the file that appears to be the problem.
So far I've tried the above method, I've tried Play's WS webservice library, I've tried manually-crafted HttpRequests, etc. If I specify another URL (such as http://www.google.com) it works just fine.
Anyone able to assist?
I am making an assumption that you are running in Dev mode.
In Dev mode, you will likely have a single-request execution pool, but in your controller that sends the email you are firing off a second request, which will block until your first request has completed (which it won't, because it is waiting for the second request to respond)... so: deadlock!
The reason external requests work fine is that they do not cause a deadlock on your Play request pool.
The simple answer to your problem is to increase the value of play.pool in application.conf. Make sure that it is uncommented, and choose a value greater than 1!
# Execution pool
# ~~~~~
# Default to 1 thread in DEV mode or (nb processors + 1) threads in PROD mode.
# Try to keep as low as possible. 1 thread will serialize all requests (very useful for debugging purposes)
play.pool=3
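The deadlock can be reproduced with a single-thread executor standing in for the size-1 Play pool: a "request" that synchronously waits on a second task submitted to the same pool can never finish. This is a toy model of the scheduling problem, not Play itself:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class PoolDeadlockDemo {
    // Returns true if the outer "request" deadlocked (timed out waiting on itself).
    public static boolean deadlocks(int poolSize) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            Future<String> outer = pool.submit(() -> {
                // The controller fires a second request into the same pool
                // (fetching /calls.xlsx) and blocks waiting for its result.
                Future<String> inner = pool.submit(() -> "attachment bytes");
                return inner.get(); // with poolSize == 1, inner can never run
            });
            try {
                outer.get(1, TimeUnit.SECONDS);
                return false;
            } catch (TimeoutException e) {
                return true;
            }
        } finally {
            pool.shutdownNow();
        }
    }
}
```

With a pool of 1 the outer task holds the only thread while waiting for the inner one, exactly like the controller waiting on its own server; with 2 or more threads the inner request gets a thread and everything completes.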
