How to decompress a Flux&lt;DataBuffer&gt; (and how to write one)?

I have a requirement to read and write compressed (GZIP) streams without intermediate storage. Currently, I'm using Spring RestTemplate to do the writing, and Apache HTTP client to do the reading (see my answer here for an explanation of why RestTemplate can't be used for reading large streams). The implementation is fairly straightforward, where I slap a GZIPInputStream on the response InputStream and move on.
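For context, that read path amounts to something like this sketch (the URL and the consumption step are placeholders, not from the original):
try (CloseableHttpClient client = HttpClients.createDefault();
     CloseableHttpResponse response = client.execute(new HttpGet("https://example.com/data.gz"));
     InputStream decompressed = new GZIPInputStream(response.getEntity().getContent())) {
    // stream the decompressed bytes onward without buffering the whole payload
}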
Now, I'd like to switch to using Spring 5 WebClient (just because I'm not a fan of the status quo). However, WebClient is reactive in nature and deals with Flux&lt;Stuff&gt;; I believe it's possible to get a Flux&lt;DataBuffer&gt;, where DataBuffer is an abstraction over ByteBuffer. Question is, how do I decompress it on the fly without having to store the full stream in memory (OutOfMemoryError, I'm looking at you) or write it to local disk? It's worth mentioning that WebClient uses Netty under the hood.
Also see Reactor Netty issue-251.
Also related to Spring integration issue-2300.
I'll admit to not knowing much about (de)compression. I did my research, but none of the material available online seemed particularly helpful:
compression on java nio direct buffers
Writing GZIP file with nio
Reading a GZIP file from a FileChannel (Java NIO)
(de)compressing files using NIO
Iterable gzip deflate/inflate in Java

First, a Netty handler that overwrites the response headers so that decompression kicks in downstream:
public class HttpResponseHeadersHandler extends ChannelInboundHandlerAdapter {

    private final HttpHeaders httpHeaders;

    @Override
    public void channelRead(ChannelHandlerContext ctx, Object msg) {
        if (msg instanceof HttpResponse &&
                !HttpStatus.resolve(((HttpResponse) msg).status().code()).is1xxInformational()) {
            HttpHeaders headers = ((HttpResponse) msg).headers();
            httpHeaders.forEach(e -> {
                log.warn("Modifying {} from: {} to: {}.", e.getKey(), headers.get(e.getKey()), e.getValue());
                headers.set(e.getKey(), e.getValue());
            });
        }
        ctx.fireChannelRead(msg);
    }
}
Then I create a ClientHttpConnector to use with WebClient and in afterNettyContextInit add the handler:
ctx.addHandlerLast(new ReadTimeoutHandler(readTimeoutMillis, TimeUnit.MILLISECONDS));
ctx.addHandlerLast(new Slf4JLoggingHandler());
if (forceDecompression) {
    io.netty.handler.codec.http.HttpHeaders httpHeaders = new ReadOnlyHttpHeaders(
            true,
            CONTENT_ENCODING, GZIP,
            CONTENT_TYPE, APPLICATION_JSON
    );
    HttpResponseHeadersHandler headersModifier = new HttpResponseHeadersHandler(httpHeaders);
    ctx.addHandlerFirst(headersModifier);
}
ctx.addHandlerLast(new HttpContentDecompressor());
This, of course, would fail for responses that are not GZIP compressed, so I use this instance of WebClient for a particular use case only, where I know for sure that the response is compressed.
Writing is easy: Spring has a ResourceEncoder, so InputStream can simply be converted to InputStreamResource, and voila!
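For illustration, a minimal sketch of that write path, assuming Spring WebFlux's WebClient (the URL and the blocking call are placeholders/simplifications, not from the original):
void upload(WebClient webClient, InputStream body) {
    webClient.post()
            .uri("https://example.com/upload")
            .header(HttpHeaders.CONTENT_ENCODING, "gzip")
            .body(BodyInserters.fromResource(new InputStreamResource(body)))
            .retrieve()
            .bodyToMono(Void.class)
            .block(); // or subscribe() in a fully reactive pipeline
}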

Noting this here as it confused me a bit: the API has changed as of 5.1.
I have a similar setup to the accepted answer for the ChannelInboundHandler:
public class GzipJsonHeadersHandler extends ChannelInboundHandlerAdapter {

    @Override
    public void channelRead(ChannelHandlerContext ctx, Object msg) {
        if (msg instanceof HttpResponse
                && !HttpStatus.resolve(((HttpResponse) msg).status().code()).is1xxInformational()) {
            HttpHeaders headers = ((HttpResponse) msg).headers();
            headers.clear();
            headers.set(HttpHeaderNames.CONTENT_ENCODING, HttpHeaderValues.GZIP);
            headers.set(HttpHeaderNames.CONTENT_TYPE, HttpHeaderValues.APPLICATION_JSON);
        }
        ctx.fireChannelRead(msg);
    }
}
(The header values I needed are just hard-coded there for simplicity, otherwise it's identical.)
Registering it, however, is different:
WebClient.builder()
        .clientConnector(
                new ReactorClientHttpConnector(
                        HttpClient.from(
                                TcpClient.create()
                                        .doOnConnected(c -> {
                                            c.addHandlerFirst(new HttpContentDecompressor());
                                            c.addHandlerFirst(new GzipJsonHeadersHandler());
                                        })
                        ).compress(true)
                )
        )
        .build();
It seems Netty now maintains a user list of handlers separate from (and after) the system list, and addHandlerFirst() only puts your handler at the front of the user list. You therefore need to add HttpContentDecompressor explicitly, to ensure it's executed after your handler that inserts the correct headers.

Related

Sending Inputstream in spring integration

I have a project where I want to send a PDF file to an FTP server.
I am creating the file using PDFBox and converting it to an InputStream, and then I want to pass this input stream to a remote FTP server and save it as a .pdf.
I have the code below, but I'm not sure how I can pass the data to the outbound adapter.
@Bean
public IntegrationFlow localToFtpFlow() {
    return IntegrationFlows.from("toFtpChannel")
            .handle(Ftp.outboundAdapter(sf())
                    .remoteDirectory("/ftp/forklift_checklist"))
            .get();
}

@MessagingGateway
public interface MyGateway {

    @Gateway(requestChannel = "toFtpChannel")
    void sendToFtp(InputStream file);
}
I'm not sure what the question is. What you have so far is OK:
You call the gateway's sendToFtp() method with an InputStream for a local file.
The Ftp.outboundAdapter(sf()) is based on the this.remoteFileTemplate.send(message, this.mode) operation, which really does support an InputStream request payload:
else if (payload instanceof InputStream) {
    return new StreamHolder((InputStream) payload, "InputStream payload");
}
So, please share with us what problem you are observing with your configuration.
Perhaps you are looking for a fileName to give that data when saving it to FTP. Consider adding another gateway argument: a @Header(FileHeaders.FILENAME) String fileName. The RemoteFileTemplate relies on a DefaultFileNameGenerator, which looks at that header by default.
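A sketch of the gateway with that extra header argument (names reused from the question):
@MessagingGateway
public interface MyGateway {

    @Gateway(requestChannel = "toFtpChannel")
    void sendToFtp(InputStream file, @Header(FileHeaders.FILENAME) String fileName);
}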

Does Java HTTP Client handle compression

I tried to find any mention of handling of compression in the new Java HTTP Client but failed. Is there a built-in configuration to handle e.g. gzip or deflate compression?
I would expect a BodyHandler for it, e.g. something like this:
HttpResponse.BodyHandlers.ofGzipped(HttpResponse.BodyHandlers.ofString())
but I don't see any. I don't see any configuration in HttpClient either. Am I looking in the wrong place or was this intentionally not implemented and deferred to support libraries?
I was also surprised that the new java.net.http framework doesn't handle this automatically, but the following works for me to handle HTTP responses which are received as an InputStream and are either uncompressed or compressed with gzip:
public static InputStream getDecodedInputStream(
        HttpResponse<InputStream> httpResponse) {
    String encoding = determineContentEncoding(httpResponse);
    try {
        switch (encoding) {
            case "":
                return httpResponse.body();
            case "gzip":
                return new GZIPInputStream(httpResponse.body());
            default:
                throw new UnsupportedOperationException(
                        "Unexpected Content-Encoding: " + encoding);
        }
    } catch (IOException ioe) {
        throw new UncheckedIOException(ioe);
    }
}

public static String determineContentEncoding(
        HttpResponse<?> httpResponse) {
    return httpResponse.headers().firstValue("Content-Encoding").orElse("");
}
Note that I've not added support for the "deflate" type (because I don't currently need it, and the more I read about "deflate" the more of a mess it sounded). But I believe you can easily support "deflate" by adding a check to the above switch block and wrapping the httpResponse.body() in an InflaterInputStream.
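If deflate support is added later, the extra case might look like this sketch (an assumption on my part: servers disagree on whether "deflate" means zlib-wrapped or raw deflate streams, and the raw variant would need new Inflater(true)):
case "deflate":
    // assumes zlib-wrapped data, the common interpretation; for raw deflate use
    // new InflaterInputStream(httpResponse.body(), new Inflater(true))
    return new InflaterInputStream(httpResponse.body());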
You can use Methanol. It has decompressing BodyHandler implementations, with out-of-the-box support for gzip & deflate. There's also a module for brotli.
var response = client.send(request, MoreBodyHandlers.decoding(BodyHandlers.ofString()));
Note that you can use any BodyHandler you want. MoreBodyHandlers::decoding makes it seem to your handler like the response was never compressed! It takes care of the Content-Encoding header and all.
Better yet, you can use Methanol's own HttpClient, which does transparent decompression after adding the appropriate Accept-Encoding to your requests.
var client = Methanol.create();
var request = MutableRequest.GET("https://example.com");
var response = client.send(request, BodyHandlers.ofString()); // The response is transparently decompressed
No, gzip/deflate compression is not handled by default. You would have to implement that in your application code if you need it - e.g. by providing a customized BodySubscriber to handle it. Alternatively, you may want to have a look at whether some of the reactive stream libraries out there offer such a feature, in which case you might be able to pipe that in by using one of the BodyHandlers.fromSubscriber(Flow.Subscriber&lt;? super List&lt;ByteBuffer&gt;&gt; subscriber) or BodyHandlers.ofPublisher() methods.
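To tie that together with the getDecodedInputStream() helper from the answer above, a minimal usage sketch (the URL is a placeholder; note that this client does not add Accept-Encoding for you):
HttpClient client = HttpClient.newHttpClient();
HttpRequest request = HttpRequest.newBuilder(URI.create("https://example.com/data"))
        .header("Accept-Encoding", "gzip") // must be set manually
        .build();
HttpResponse<InputStream> response =
        client.send(request, HttpResponse.BodyHandlers.ofInputStream());
try (InputStream in = getDecodedInputStream(response)) {
    // consume the decompressed stream
}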

Spring Boot: efficiently get data from REST API

I have a Spring Boot application that (among other things) gets some data from a third party JSON API (secured with OAuth), processes the result and presents it to the user. The application receives approx. 1 request each second.
Unfortunately this process is very slow at the moment (and in many cases even ends with a 503 error), and I am looking for ideas to improve the implementation. (By the way, the third-party API itself does not seem to be the bottleneck: an instance of my app running on my local machine, using the exact same API, responds very fast at the same time that the deployed instance takes very long.)
For the API call I use the Apache HTTP library - or more specifically the Async HTTP Client:
this.httpClientAsync = HttpAsyncClients.custom()
        .setDefaultCredentialsProvider(credsProvider) // for forward proxy
        .build();
And the actual call to the API is this:
updateToken(); //get or update OAuth Token
HttpGet httpget = new HttpGet(URL);
httpget.addHeader("Authorization", "Bearer " + accessToken);
Future<HttpResponse> f = this.httpClientAsync.execute(httpget, callback);
Do you have any suggestion on how to improve the implementation?
To be honest, I don't even have an idea where the bottleneck is at the moment. Any idea on how to find out about that?
Thanks for your hints!
One more thing/update:
the Spring Controller looks something like this:
@RequestMapping(value = "/api/v1/api_data")
public DeferredResult<ResponseEntity<Map>> getAPIData() throws IOException, InterruptedException {
    DeferredResult<ResponseEntity<Map>> res = new DeferredResult<>();
    triggerAPICall(new FutureCallback() {
        public void completed(Object o) {
            (...)
            res.setResult(...);
        }
        (...)
    });
    return res;
}
Furthermore, I was originally not using the async version of the HTTP client, but the blocking version. This then even slowed down the rest of the application...
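One thing worth checking (my assumption, not something established in the question): the async client's connection pool. Its per-route default is small (2 connections), which can serialize requests once responses slow down. A sketch with hypothetical limits:
this.httpClientAsync = HttpAsyncClients.custom()
        .setMaxConnTotal(100)   // hypothetical value
        .setMaxConnPerRoute(50) // hypothetical value; the default is 2 per route
        .setDefaultCredentialsProvider(credsProvider)
        .build();
this.httpClientAsync.start(); // the async client must be started before executing requests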

Streaming HTTP responses with Jetty AsyncProxyServlet

I have a server that streams various things such as log output over long-lived HTTP responses. However, when using Jetty's proxy servlets, I haven't been able to get it to stream the response (it buffers the whole response before sending).
When overriding a plain ProxyServlet class, the following appears to work:
@Override
protected void onResponseContent(HttpServletRequest request, HttpServletResponse response, Response proxyResponse,
        byte[] buffer, int offset, int length, Callback callback) {
    super.onResponseContent(request, response, proxyResponse, buffer, offset, length, callback);
    try {
        response.getOutputStream().flush();
    } catch (IOException e) {
        log.warn("Error flushing", e);
    }
}
However, doing that when overriding an AsyncProxyServlet doesn't work. (Full source code here.)
So, two questions:
When using ProxyServlet, is flushing after each bit of content received the way to go?
Is there a way to make it work with AsyncProxyServlet?
Got it working. The proper approach, which works whether async is used or not, is to set the output buffer size when creating the Jetty server connectors:
HttpConfiguration httpConfig = new HttpConfiguration();
httpConfig.setOutputBufferSize(1024);
ServerConnector httpConnector = new ServerConnector(jettyServer,
        new HttpConnectionFactory(httpConfig));
The default is 32768.
(Note: there is no need to override the onResponseContent method.)

Streaming large files with spring mvc

I'm trying to create an application that downloads and uploads large files, so I don't want the file contents to be stored in memory.
On the MVC controller side I'm using an HTTP message converter that converts to/from InputStream:
@Override
public InputStream read(Class<? extends InputStream> clazz, HttpInputMessage inputMessage)
        throws IOException, HttpMessageNotReadableException {
    return inputMessage.getBody();
}

@Override
public void write(InputStream t, MediaType contentType, HttpOutputMessage outputMessage)
        throws IOException, HttpMessageNotWritableException {
    try {
        IOUtils.copy(t, outputMessage.getBody());
    } finally {
        IOUtils.closeQuietly(t);
    }
}
This works well on the server side.
On the client (RestTemplate) side I tried to use the same converter, but I got an exception that the stream has been closed (probably closed when the request was completed).
Client side code:
ResponseEntity<InputStream> res = rest.getForEntity(url, InputStream.class);
// res.getBody() is closed
I've also tried copying the input stream into a buffer, creating a new ByteArrayInputStream, and returning that to the RestTemplate client; it worked well, but it requires reading all the data into memory, which doesn't suit my requirements.
My question is: how do I keep the stream open until I've processed it, without reading it all into memory or writing it to a file?
Any idea will be appreciated.
Regards, Shay
As far as I am aware, RestTemplate's getForEntity() is not an appropriate way to get an InputStream. It's a convenience for converting to and from entity classes, so presumably that's where your problem lies.
Since you are used to HttpInputMessage, why don't you use HttpInputMessage.getBody() on the client side as well? It gets you a nice InputStream, which would be ready for passing straight to an OutputStream such as HttpServletResponse.getOutputStream().
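Not from the answer above, but one concrete way to stream on the client is RestTemplate's execute() with a ResponseExtractor, which exposes the raw response before the connection is closed (the URL and target path are placeholders):
restTemplate.execute(url, HttpMethod.GET, null, clientHttpResponse -> {
    try (InputStream body = clientHttpResponse.getBody();
         OutputStream out = new FileOutputStream("download.bin")) {
        StreamUtils.copy(body, out); // streams without buffering the whole body in memory
    }
    return null;
});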
Check how Spring MVC handles large file uploads with org.springframework.web.multipart.commons.CommonsMultipartResolver. It has a maxInMemorySize property that can help control the memory requirements. See this thread for using a multipart resolver with the REST template: Sending Multipart File as POST parameters with RestTemplate requests
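A sketch of that resolver bean (the threshold value is hypothetical):
@Bean
public CommonsMultipartResolver multipartResolver() {
    CommonsMultipartResolver resolver = new CommonsMultipartResolver();
    resolver.setMaxInMemorySize(4096); // bytes held in memory before spooling to disk
    return resolver;
}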
