How to completely abort the output stream download? - java

We're currently working on a service that archives data and returns it to the user as a ZipOutputStream. What we're looking for is a way to terminate the operation completely if something goes wrong on the server side. With our current implementation (just closing the response output stream), errors result in a malformed zip on the user's side, and there is no way to tell whether the archive is malformed before attempting to unzip it. The desired behavior would be something like download termination: from a browser's perspective, it would result in an unsuccessful-download indication (a red cross icon or similar, depending on the browser) explicitly telling the user that something went wrong. We're using Spring Boot, so any Java code examples would be appreciated, but if you know the underlying HTTP mechanism responsible for this kind of behavior and can point us in the right direction, that would be much appreciated too.
Here's what we have as of now (output being a response output stream of a Spring REST controller (HttpServletResponse.getOutputStream()) :
try (ZipOutputStream zipOutputStream = new ZipOutputStream(outputStream)) {
    try {
        for (ZipRecordFile fileInfo : zipRecord.listZipFileOverride()) {
            ZipEntry zipEntry = new ZipEntry(fileInfo.fileName());
            zipOutputStream.putNextEntry(zipEntry);
            // close each S3 stream once it has been copied into the archive
            try (InputStream fileStream = getFileStream(fileInfo.s3region(), fileInfo.s3bucket(),
                    fileInfo.s3key())) {
                fileStream.transferTo(zipOutputStream);
            }
            zipOutputStream.closeEntry();
        }
    } catch (Exception e) {
        // abort by closing the raw response stream; the client still ends up
        // with a truncated (malformed) zip rather than an explicit failure
        outputStream.close();
    }
}

There isn't a (clean) way to do what you want:
Once you have started writing the ZIP file to the output stream, it is too late to change the HTTP response code. The response code is sent at the start of the response.
Therefore, there is no proper way for the HTTP server to tell the HTTP client: "Hey ... ignore that ZIP file I sent you 'cos it is corrupt".
So what are the alternatives?
1. On the server side, create the entire ZIP as an in-memory object or write it to a temporary file. If you succeed, send a 2xx response followed by the ZIP data. If you fail, send a 4xx or 5xx response. The main problem is that you need enough memory or file system space to hold the ZIP file. (See the sketch after this list.)
2. Redesign your HTTP API so that the client can send a second request to check whether the first request's response contained a complete ZIP file.
3. You might be able to exploit MIME multipart encoding; see RFC 1341. Each part of a well-formed MIME multipart has a start marker and an end marker. What you could try is to have your web-app construct the multipart stream containing the ZIP "by hand". If it decides it must abort the ZIP, it could just close the output stream without adding the required end marker. The main problem with this is that you are depending on the HTTP stack on the client side to tell the browser (or whatever) that the multipart is corrupted. Furthermore, the browser (or whatever) must not pass the partial (i.e. corrupt) ZIP file on to the user. I'm not sure you can rely on (particular) web browsers to do that.
4. If you are running the download via custom code on the client side, you could conceivably implement your own encapsulation protocol. The effect would be the same as for 3 ... but you wouldn't be abusing the MIME spec.
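For option 1, here is a minimal sketch of the temporary-file variant in a Spring controller, reusing the zipRecord / getFileStream names from the question; the method name and mapping path are made up:

@GetMapping("/archive")
public void downloadArchive(HttpServletResponse response) throws IOException {
    Path tempZip = Files.createTempFile("archive", ".zip");
    try {
        // Build the whole archive first: any failure happens before a
        // single response byte has been committed.
        try (ZipOutputStream zip = new ZipOutputStream(Files.newOutputStream(tempZip))) {
            for (ZipRecordFile fileInfo : zipRecord.listZipFileOverride()) {
                zip.putNextEntry(new ZipEntry(fileInfo.fileName()));
                try (InputStream in = getFileStream(fileInfo.s3region(),
                        fileInfo.s3bucket(), fileInfo.s3key())) {
                    in.transferTo(zip);
                }
                zip.closeEntry();
            }
        } catch (Exception e) {
            // Nothing has been committed yet, so a real error status still works.
            response.sendError(HttpServletResponse.SC_INTERNAL_SERVER_ERROR);
            return;
        }
        // Only now commit the response: status and headers, then the bytes.
        response.setContentType("application/zip");
        response.setContentLengthLong(Files.size(tempZip));
        Files.copy(tempZip, response.getOutputStream());
    } finally {
        Files.deleteIfExists(tempZip);
    }
}

A side benefit: because the Content-Length is known before any byte is sent, a connection dropped mid-transfer shows up on the client as a failed download rather than a silently truncated file.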

Related

HttpServletRequest.getInputStream() does not unwrap chunked HTTP request

I am in the process of sending an HTTP chunked request to an internal system. I've confirmed other factors are not at play by ensuring that I can send small messages without chunked encoding.
My process was basically to change the Transfer-Encoding header to chunked and remove the Content-Length header. Additionally, I am utilising an in-house ChunkedOutputStream which has been around for quite some time.
I am able to connect, obtain an output stream and send the data. The recipient then returns a 200 response so it seems the request was received and successfully handled. The endpoint receives the HTTP Request, and streams the data straight into a table (using HttpServletRequest.getInputStream()).
On inspecting the streamed data, I can see that the chunk-encoding information in the stream has not been unwrapped/decoded by the Tomcat container automatically. I've been trawling the Tomcat HTTP Connector documentation and can't find anything about how a chunk-encoded message should be handled within an HttpServlet. I can't see other StackOverflow questions asking this, so I suspect I am missing something basic.
My question boils down to:
Should Tomcat automatically decode the chunked encoding from my request and give me a "clean" InputStream when I call HttpServletRequest.getInputStream()?
If yes, is there configuration that needs to be updated to enable this functionality? Am I sending something wrong in the headers that is causing it to return the non-decoded stream?
If no, is it common practice to wrap the input stream in a ChunkedInputStream or something similar when the Transfer-Encoding header is present?
This is solved. As expected it was basic in my case.
The legacy system I was using provided hand-rolled methods to simplify the process of opening an HTTP connection, sending headers, and then using an OutputStream to send the content via a POST. I didn't realise it (the logic was in a rather obscure location), but the behind-the-scenes helpers were detecting that I was not specifying a Content-Length, so they added the Transfer-Encoding: chunked header and wrapped the OutputStream in a ChunkedOutputStream. Since my code had already chunk-encoded the content, this resulted in double encoding, hence my endpoint's (seeming) inability to decode it.
Case closed.
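The general rule illustrated here is that chunked framing must be applied exactly once, by whichever layer owns the connection. For example, with plain java.net.HttpURLConnection you ask the stack to do the chunking and write unencoded bytes yourself; a minimal sketch (URL and payload are made up):

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class ChunkedPostExample {
    public static void main(String[] args) throws Exception {
        HttpURLConnection conn =
                (HttpURLConnection) new URL("http://localhost:8080/ingest").openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        // Let the HTTP stack add Transfer-Encoding: chunked and do the chunk
        // framing; do NOT wrap the stream in your own chunk encoder as well.
        conn.setChunkedStreamingMode(4096);
        try (OutputStream out = conn.getOutputStream()) {
            out.write("payload goes here".getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("Response: " + conn.getResponseCode());
    }
}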

How to keep API Restful when GET request requires sizable JSON payload?

I'm building a Java REST API using JAX-RS, and to complete a GET request for a zip file I need a rather sizeable chunk of JSON. I'm not terribly experienced with REST, but I do know that GET requests shouldn't have a request body and a POST shouldn't be returning a resource. So I guess my question is: how do I complete a request that contains JSON (currently in the message body) and expects a zip file in the response, while keeping the application RESTful? It may be worth noting that the JSON could also contain a password.
I have used POST for similar scenarios. This is a common scenario for SEARCH operations where there is a need to send JSON data in the request. Though using POST to fetch an object is not strictly per REST conventions, I found it to be the most suitable of the options available.
You can send a body in a GET request, but that is not supported by all frameworks/tools/servers.
If you use POST for the operation, you can use HTTPS so that the confidential information in the body is encrypted in transit.
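A rough JAX-RS sketch of that POST approach; the resource path, the SearchCriteria bean, and the buildZip helper are all hypothetical:

@Path("/archives")
public class ArchiveResource {

    @POST
    @Consumes(MediaType.APPLICATION_JSON)
    @Produces("application/zip")
    public Response fetchArchive(SearchCriteria criteria) {
        // buildZip is a hypothetical helper that assembles the archive
        // from the criteria carried in the JSON body
        byte[] zipBytes = buildZip(criteria);
        return Response.ok(zipBytes)
                .header("Content-Disposition", "attachment; filename=\"archive.zip\"")
                .build();
    }
}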
You can think of your REST API as exposing a virtual file system, where the zip file you mentioned is just one resource in that VFS and a dedicated directory holds queries against that file system. Then you can create a new query object by sending a POST request to the queries directory, specifying all the query parameters you need, such as chunk size and the path of the zip file in the VFS.
The virtual file system I am referring to is actually a directory containing other directories and files, which can represent real files on disk or metadata records in a database.
For example, say you start with the following directory layout in the VFS:
/myvfs
/files
/archive.zip
/queries
To download the archive.zip file you can send a simple GET request:
// Request:
GET /myvfs/files/archive.zip
But this will stream the entire file at once. In order to break it into parts, you can create a query specifying that you want to download chunks of 1MB:
// Request:
POST /myvfs/queries/archive.zip
{
    "chunk_size": 1048576
}
// Response:
{
    "query_id": 42,
    "chunks": 139
}
The new query lives at the address /myvfs/queries/archive.zip/42 and can be deleted by sending a DELETE request to that URL.
Now you can download the zip file in parts. Note that the creation of the query does not actually create smaller files for each part; it only provides information about the offsets and sizes of the chunks, which can be persisted anywhere, from RAM to a database or plain text files.
To download the first 1MB chunk of the zip file, you can send a GET request:
GET /myvfs/queries/archive.zip/42/0
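A possible JAX-RS sketch of that chunk endpoint; QueryInfo and the lookupQuery/loadArchiveBytes helpers are hypothetical stand-ins for however the query metadata and file bytes are persisted:

@GET
@Path("/myvfs/queries/archive.zip/{queryId}/{chunkIndex}")
@Produces(MediaType.APPLICATION_OCTET_STREAM)
public Response getChunk(@PathParam("queryId") int queryId,
                         @PathParam("chunkIndex") int chunkIndex) {
    QueryInfo query = lookupQuery(queryId);   // hypothetical persistence lookup
    byte[] archive = loadArchiveBytes(query); // hypothetical file access
    int offset = chunkIndex * query.chunkSize();
    if (offset >= archive.length) {
        return Response.status(Response.Status.NOT_FOUND).build();
    }
    // clamp the last chunk to whatever bytes remain
    int length = Math.min(query.chunkSize(), archive.length - offset);
    return Response.ok(Arrays.copyOfRange(archive, offset, offset + length)).build();
}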
As a final note, you should also be aware that the query resource can be modeled to accommodate other scenarios, such as dynamic ranges of a certain file.
P.S. I am aware that the answer is not as clear as it should be, and I apologize for that. I will try to come back and refine it, as time permits.

How can I read a file from an FTP server using a servlet and then send it as a downloadable file to the user?

I have developed a servlet that offers some services.
I am using the Apache Commons Net FTPClient to log into an FTP server and read a file.
I want to make this file downloadable (i.e. send it to the output stream, maybe?), but the only ways of reading a file that I know of are:
FTPClient.retrieveFileStream(String remote) and FTPClient.retrieveFile(String remote, OutputStream local).
I tried the first one and then wrote the InputStream I got to the output stream of the servlet:
// retrieveFileStream must be called on the FTPClient instance (myClient),
// not statically on the class
InputStream myFileStream = myClient.retrieveFileStream(fileName);
byte[] buffer = new byte[4096];
int length;
resp.reset();
resp.setContentType("text/csv");
resp.setHeader("Content-disposition", "attachment; filename=\"" + fileName + "\"");
OutputStream out = resp.getOutputStream();
while ((length = myFileStream.read(buffer)) > 0) {
    out.write(buffer, 0, length);
}
myFileStream.close();
// commons-net requires this call after retrieveFileStream() to finalize
// the transfer and verify that it completed successfully
myClient.completePendingCommand();
out.flush();
The Second One:
myClient.retrieveFile(fileName, resp.getOutputStream());
In both cases I get the text content of the file as a response and not the file itself.
Is there any way I can fix this?
P.S. This code belongs to a method that is called by doPost(), with the HTTP request and response as parameters.
If you want to download the file instead of just showing it, you have to change the content type you're sending to the browser (because it's the browser's business to either display the data or save it as a file). Thus, do e.g.
resp.setContentType("application/octet-stream");
(instead of text/csv) to "hide" the real nature of the data from the browser and force it to save the data.
The problem was that I was using a Chrome extension (DHC) to test my web service, and it displayed the file content instead of initiating the download.
I was making the file download in a doPost() method.
Solution:
I moved it to a doGet() method, and when accessed directly via the browser everything works OK.
So I think it was only the extension's problem: it wrote the content of the response back to me instead of downloading the file attachment.
Thanks for the feedback to @Jozef.

Streaming an upload with HttpClient/MultipartEntity

I've got a Tomcat instance right now that takes uploads and does some processing work on the data.
I want to replace this with a new servlet that conforms to a similar API. At first, I want this new servlet to just proxy all of the requests to the old one. They're running on separate JVMs, but on the same host.
I've been trying to use the HttpClient to proxy the upload, but it seems that the client waits for the stream to finish before it proxies the request. For large files, this causes the servlet to crash (I think it's buffering everything in memory).
Here's the code I'm currently using:
HttpPost httpPost = new HttpPost("http://localhost:8081/servlet");
String filePartName = request.getHeader("file_part_name");
_logger.info("Attaching file " + filePartName);
try {
Part filePart = request.getPart(filePartName);
MultipartEntity mpe = new MultipartEntity();
mpe.addPart(
filePartName,
new InputStreamBody(filePart.getInputStream(), filePartName)
);
httpPost.setEntity(mpe);
} catch (ServletException | IOException e) {
_logger.error("Caught exception trying to cross the streams, thanks Ghostbusters.", e);
throw new IllegalStateException("Could not proxy the request", e);
}
HttpResponse postResponse;
try {
postResponse = HTTP_CLIENT.execute(httpPost);
} catch (IOException e) {
_logger.error("Caught exception trying to cross the streams, thanks Ghostbusters.", e);
throw new IllegalStateException("Could not proxy the request", e);
}
I can't seem to figure out how to get HttpClient/HttpPost to stream the data as it comes in, instead of blocking until the first upload completes. Has anyone done something similar before? Is there an easier solution?
Thanks!
The issue lies in the way your request is processed by the MIME/Multipart framework (the one you use to process your HttpServletRequest and access file parts).
The nature of a MIME/Multipart request is simple (at a high level): instead of having a traditional key=value content, those requests have a much more complex syntax that allows them to carry arbitrary, unstructured data (files to upload).
It basically looks like this (taken from Wikipedia):
Content-type: multipart/mixed; boundary="frontier"
This is a multi-part message in MIME format.
--frontier
Content-type: text/plain
This is the body of the message.
--frontier
Content-type: application/octet-stream
Content-Disposition: form-data; name="image1"
Content-transfer-encoding: base64
PGh0bWw+CiAgPGhlYWQ+CiAgPC9oZWFkPgogIDxib2R5PgogICAgPHA+VGhpcyBpcyB0aGUg
Ym9keSBvZiB0aGUgbWVzc2FnZS48L3A+CiAgPC9ib2R5Pgo8L2h0bWw+Cg==
--frontier--
The important part to note is that the parts (separated here by the boundary "frontier") have "names" (through the Content-Disposition header), followed by their content. One such request can have any number of parts.
Now of course, the simplest, most straightforward way to implement the parsing of such a request is to process it to the end, detect the boundaries, and create a temporary file (or in-memory cache) to hold each part, identified by name.
Since the framework cannot know which part you will need first (you may need the second part in your servlet code before the first), it parses the whole stream and only then gives you back control.
Therefore your call is blocked at this line
Part filePart = request.getPart(filePartName);
Here, the framework has to parse the whole MIME message before letting you use the result (even a hypothetical, super-optimised parser could not both lazily parse the stream and allow you random access to any part of the message; you'd have to choose between the two).
So there's not much you can do...
Except not use the multipart parser. I wouldn't recommend this if you're not familiar with MIME (and/or MIME libraries such as Apache James), or not confident that you are in control of your request's structure.
But if you are, then you may bypass the framework's processing and access the raw stream of the request. You'd parse the MIME structure by hand, stop when you hit the start of the part's body, and start building your HTTP POST at that point, being careful to take care of MIME-level technicalities (de-base64? de-gzip? ...).
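For the proxy use case specifically, there is a middle ground that avoids hand-parsing MIME: since the servlet only relays bytes, it can forward the raw request body untouched. A rough sketch with Apache HttpClient 4.x, assuming nothing in the proxy needs the individual parts:

// Inside the proxy servlet's doPost(); request is the HttpServletRequest.
HttpPost httpPost = new HttpPost("http://localhost:8081/servlet");
// Stream the body straight through without parsing it; -1 means unknown
// length, so the client will use chunked transfer encoding.
InputStreamEntity entity = new InputStreamEntity(request.getInputStream(), -1);
// Preserve the original Content-Type, including the multipart boundary,
// so the downstream servlet can still parse the parts itself.
entity.setContentType(request.getContentType());
httpPost.setEntity(entity);
HttpResponse postResponse = HTTP_CLIENT.execute(httpPost);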
Alternatively, if you think your server crashes because of an out-of-memory error, it may very well be that your framework is configured to cache the contents of the multipart in memory. If there is a way to configure it to cache to disk instead, that is a possible workaround.

The best way to send a file over a Network

I want to send a file to the browser via the REST interface.
Can you suggest the most efficient way to do it, keeping in mind the following?
There is not much traffic.
I am fetching the file from HBase, which means I get it as a byte array.
The files are not in any folder on the server; they can only be fetched from the HBase table.
The front end is PHP, and I do not know PHP.
In a JAX-RS REST API you can just pass the byte array to Response and it takes care of the rest.
Using the following code:
@GET
@Produces("image/jpg")
public Response getImage() {
    // fetch it from wherever you have it (e.g. the HBase byte array)
    return Response.ok(byteArrayOfTheFile).build();
}
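If the file should be downloaded rather than displayed inline, the same Response can also carry a Content-Disposition header, as discussed in the earlier answers (the filename here is made up):

return Response.ok(byteArrayOfTheFile)
        .header("Content-Disposition", "attachment; filename=\"image.jpg\"")
        .build();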
I am giving a case study of a web service by which I send a file:
It is always good to encode the file content and send it to the destination, where it will be decoded and the content read.
Sending it as a plain attachment is open to the world because it is not encrypted, and if the network has high traffic the chance of failure is high.
