I tried to find any mention of compression handling in the new Java HTTP Client but failed. Is there built-in support for e.g. gzip or deflate compression?
I would expect a BodyHandler for it, e.g. something like this:
HttpResponse.BodyHandlers.ofGzipped(HttpResponse.BodyHandlers.ofString())
but I don't see any. I don't see any configuration in HttpClient either. Am I looking in the wrong place or was this intentionally not implemented and deferred to support libraries?
I was also surprised that the new java.net.http framework doesn't handle this automatically, but the following works for me to handle HTTP responses which are received as an InputStream and are either uncompressed or compressed with gzip:
public static InputStream getDecodedInputStream(
        HttpResponse<InputStream> httpResponse) {
    String encoding = determineContentEncoding(httpResponse);
    try {
        switch (encoding) {
            case "":
                return httpResponse.body();
            case "gzip":
                return new GZIPInputStream(httpResponse.body());
            default:
                throw new UnsupportedOperationException(
                        "Unexpected Content-Encoding: " + encoding);
        }
    } catch (IOException ioe) {
        throw new UncheckedIOException(ioe);
    }
}

public static String determineContentEncoding(
        HttpResponse<?> httpResponse) {
    return httpResponse.headers().firstValue("Content-Encoding").orElse("");
}
Note that I've not added support for the "deflate" type (because I don't currently need it, and the more I read about "deflate" the more of a mess it sounded). But I believe you can easily support "deflate" by adding a check to the above switch block and wrapping the httpResponse.body() in an InflaterInputStream.
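For illustration, a hedged, untested sketch of that extra case, to be slotted into the switch above (uses java.util.zip.InflaterInputStream and java.util.zip.Inflater):

case "deflate":
    // Assumes zlib-wrapped deflate data, as the HTTP spec prescribes.
    // Some servers send raw deflate streams instead, which would need
    // new InflaterInputStream(httpResponse.body(), new Inflater(true)).
    return new InflaterInputStream(httpResponse.body());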
You can use Methanol. It has decompressing BodyHandler implementations, with out-of-the-box support for gzip & deflate. There's also a module for brotli.
var response = client.send(request, MoreBodyHandlers.decoding(BodyHandlers.ofString()));
Note that you can use any BodyHandler you want. MoreBodyHandlers::decoding makes it seem to your handler like the response was never compressed! It takes care of the Content-Encoding header and all.
Better yet, you can use Methanol's own HttpClient, which does transparent decompression after adding the appropriate Accept-Encoding to your requests.
var client = Methanol.create();
var request = MutableRequest.GET("https://example.com");
var response = client.send(request, BodyHandlers.ofString()); // The response is transparently decompressed
No, gzip/deflate compression is not handled by default. If you need it, you would have to implement it in your application code, e.g. by providing a customized BodySubscriber. Alternatively, you may want to look at whether some of the reactive stream libraries out there offer such a feature, in which case you might be able to pipe it in by using the BodyHandlers.fromSubscriber(Flow.Subscriber<? super List<ByteBuffer>> subscriber) or BodyHandlers.ofPublisher() methods.
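To illustrate the customized-BodySubscriber route, here is a minimal sketch (the method name decompressingInputStream is my own) built on BodySubscribers.mapping. It maps to a Supplier<InputStream> rather than wrapping the stream directly, because the GZIPInputStream constructor reads from the stream, and blocking inside the mapping function risks a deadlock (the JDK javadoc warns about exactly this):

static HttpResponse.BodyHandler<Supplier<InputStream>> decompressingInputStream() {
    return responseInfo -> {
        boolean gzipped = responseInfo.headers()
                .firstValue("Content-Encoding")
                .map("gzip"::equalsIgnoreCase)
                .orElse(false);
        return HttpResponse.BodySubscribers.mapping(
                HttpResponse.BodySubscribers.ofInputStream(),
                body -> () -> {
                    try {
                        // Wrapping happens lazily, when the caller invokes get(),
                        // so nothing blocks inside the mapping function itself.
                        return gzipped ? new GZIPInputStream(body) : body;
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                });
    };
}

// usage:
// try (InputStream in = client.send(request, decompressingInputStream()).body().get()) { ... }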
Related
I have a requirement to read and write compressed (GZIP) streams without intermediate storage. Currently, I'm using Spring RestTemplate to do the writing, and Apache HTTP client to do the reading (see my answer here for an explanation of why RestTemplate can't be used for reading large streams). The implementation is fairly straightforward, where I slap a GZIPInputStream on the response InputStream and move on.
Now, I'd like to switch to using Spring 5 WebClient (just because I'm not a fan of the status quo). However, WebClient is reactive in nature and deals with Flux<Stuff>; I believe it's possible to get a Flux<DataBuffer>, where DataBuffer is an abstraction over ByteBuffer. Question is, how do I decompress it on the fly without having to store the full stream in memory (OutOfMemoryError, I'm looking at you) or write to local disk? It's worth mentioning that WebClient uses Netty under the hood.
Also see Reactor Netty issue-251.
Also related to Spring integration issue-2300.
I'll admit to not knowing much about (de)compression; I did my research, but none of the material available online seemed particularly helpful:
compression on java nio direct buffers
Writing GZIP file with nio
Reading a GZIP file from a FileChannel (Java NIO)
(de)compressing files using NIO
Iterable gzip deflate/inflate in Java
I ended up solving it with a custom Netty handler that rewrites the response headers:
public class HttpResponseHeadersHandler extends ChannelInboundHandlerAdapter {
    private final HttpHeaders httpHeaders;

    public HttpResponseHeadersHandler(HttpHeaders httpHeaders) {
        this.httpHeaders = httpHeaders;
    }

    @Override
    public void channelRead(ChannelHandlerContext ctx, Object msg) {
        if (msg instanceof HttpResponse &&
                !HttpStatus.resolve(((HttpResponse) msg).status().code()).is1xxInformational()) {
            HttpHeaders headers = ((HttpResponse) msg).headers();
            httpHeaders.forEach(e -> {
                log.warn("Modifying {} from: {} to: {}.", e.getKey(), headers.get(e.getKey()), e.getValue());
                headers.set(e.getKey(), e.getValue());
            });
        }
        ctx.fireChannelRead(msg);
    }
}
Then I create a ClientHttpConnector to use with WebClient and in afterNettyContextInit add the handler:
ctx.addHandlerLast(new ReadTimeoutHandler(readTimeoutMillis, TimeUnit.MILLISECONDS));
ctx.addHandlerLast(new Slf4JLoggingHandler());
if (forceDecompression) {
    io.netty.handler.codec.http.HttpHeaders httpHeaders = new ReadOnlyHttpHeaders(
            true,
            CONTENT_ENCODING, GZIP,
            CONTENT_TYPE, APPLICATION_JSON
    );
    HttpResponseHeadersHandler headersModifier = new HttpResponseHeadersHandler(httpHeaders);
    ctx.addHandlerFirst(headersModifier);
}
ctx.addHandlerLast(new HttpContentDecompressor());
This, of course, would fail for responses that are not GZIP compressed, so I use this instance of WebClient for a particular use case only, where I know for sure that the response is compressed.
Writing is easy: Spring has a ResourceEncoder, so InputStream can simply be converted to InputStreamResource, and voila!
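For what it's worth, a hypothetical sketch of that writing side (webClient, the /upload URI, and inputStream are placeholders):

// Wrap the InputStream in a Resource and let Spring's ResourceEncoder stream it.
webClient.post()
        .uri("/upload")
        .body(BodyInserters.fromResource(new InputStreamResource(inputStream)))
        .retrieve()
        .bodyToMono(Void.class)
        .block(); // or subscribe(); nothing happens until then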
Noting this here as it confused me a bit: the API has changed as of Spring 5.1.
I have a similar setup to the accepted answer for the ChannelInboundHandler:
public class GzipJsonHeadersHandler extends ChannelInboundHandlerAdapter {
    @Override
    public void channelRead(ChannelHandlerContext ctx, Object msg) {
        if (msg instanceof HttpResponse
                && !HttpStatus.resolve(((HttpResponse) msg).status().code()).is1xxInformational()) {
            HttpHeaders headers = ((HttpResponse) msg).headers();
            headers.clear();
            headers.set(HttpHeaderNames.CONTENT_ENCODING, HttpHeaderValues.GZIP);
            headers.set(HttpHeaderNames.CONTENT_TYPE, HttpHeaderValues.APPLICATION_JSON);
        }
        ctx.fireChannelRead(msg);
    }
}
(The header values I needed are just hard-coded there for simplicity, otherwise it's identical.)
To register it however is different:
WebClient.builder()
        .clientConnector(
                new ReactorClientHttpConnector(
                        HttpClient.from(
                                TcpClient.create()
                                        .doOnConnected(c -> {
                                            c.addHandlerFirst(new HttpContentDecompressor());
                                            c.addHandlerFirst(new GzipJsonHeadersHandler());
                                        })
                        ).compress(true)
                )
        )
        .build();
It seems Netty now maintains a user list of handlers separate from (and after) the system list, and addHandlerFirst() only puts your handler at the front of the user list. You therefore have to add HttpContentDecompressor explicitly, to ensure it executes after the handler that inserts the correct headers.
I am using Jersey and Jackson to access a REST web service, which is correctly returning well formed JSON data but has the response header:
Content-Type: text/html; charset=UTF-8
This is despite my specifying Accept: application/json in the request header, and the mismatch causes Jersey to throw:
org.glassfish.jersey.message.internal.MessageBodyProviderNotFoundException: MessageBodyReader not found for media type=text/html;charset=UTF-8
I am consuming other web services fine with my code, but I am wondering if there is a way to create my own MessageBodyReader to deal with the mismatch; however, I have yet to figure out how to implement one correctly. I plan to ask the owner of the web service to fix the mismatch, but I don't hold out much hope.
Okay so I managed to figure it out by essentially following Stephen C's advice but thought I'd post a few more details in case anyone else is in the same boat. First I actually started from the Jersey guide a few sections back, specifically this one:
https://jersey.java.net/documentation/latest/user-guide.html#d0e6825
Obviously I am using Jersey for the javax.ws.rs.client API, and I am using Genson to do the JSON deserialisation. I have implemented the following MessageBodyReader:
public class BTCEURTradeMessageBodyReader
        implements MessageBodyReader<BTCEURTrades> {

    final org.slf4j.Logger logger =
            LoggerFactory.getLogger(BTCEURTradeMessageBodyReader.class);

    @Override
    public boolean isReadable(Class<?> type, Type genericType,
            Annotation[] annotations, MediaType mediaType) {
        logger.info("isReadable being checked for: {} and media type: {}", type, mediaType);
        return type == BTCEURTrades.class;
    }

    @Override
    public BTCEURTrades readFrom(Class<BTCEURTrades> type, Type genericType,
            Annotation[] annotations, MediaType mediaType,
            MultivaluedMap<String, String> httpHeaders, InputStream entityStream)
            throws IOException, WebApplicationException {
        logger.info("readFrom being called for: {}", type);
        BTCEURTrades btceurTrades;
        try {
            btceurTrades = new Genson().deserialize(entityStream, type);
        } catch (Exception e) {
            logger.error("Error processing JSON response.", e);
            throw new ProcessingException("Error processing JSON response.");
        }
        return btceurTrades;
    }
}
This then gets registered with the client after it is created as follows:
client = ClientBuilder.newClient();
client.register(BTCEURTradeMessageBodyReader.class);
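For completeness, a hypothetical call that exercises the registered reader (the URL and path are placeholders):

BTCEURTrades trades = client
        .target("https://example.com/api/trades/btc_eur")
        .request(MediaType.APPLICATION_JSON)
        .get(BTCEURTrades.class);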
I am wondering if there is a way to create my own MessageBodyReader to deal with the mismatch.
This page in the Jersey documentation explains how to create a custom MessageBodyReader:
https://jersey.java.net/documentation/latest/message-body-workers.html#d0e7151
In your case, you may be able to find the source code for the reader that is ordinarily used for your JSON, and "tweak" it. In theory.
However, after a bit more reading, I found this:
https://jersey.java.net/documentation/latest/media.html#json
which tells me that Jersey already has extensive support for JSON. There is a good chance that you could fix your problem by simply tweaking the configs so that Jersey knows what to do with the unusual content type. But that will depend on which of the many possible ways you are currently using to parse JSON response bodies.
Someone commented thus:
I think however easier just to retrieve the data, ignore the header and just parse it into your json object.
That is a bad idea. The header is telling you that the JSON could contain multi-byte characters. If you simply ignored that and decoded the bytes to characters in the default character set, you would get "mojibake" if there were multibyte characters present.
If you are parsing the JSON yourself, then it should be a simple matter to configure the parser's input stream to use UTF-8, or whatever else the content-type header says the character encoding is.
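As a hypothetical sketch inside a custom reader's readFrom method, you could honour the charset parameter of the Content-Type like this, falling back to UTF-8 when none is given (mediaType and entityStream are the readFrom(...) parameters):

// Pick the charset declared in the Content-Type header, defaulting to UTF-8.
String charsetName = mediaType.getParameters()
        .getOrDefault(MediaType.CHARSET_PARAMETER, StandardCharsets.UTF_8.name());
Reader reader = new InputStreamReader(entityStream, Charset.forName(charsetName));
// Hand the Reader (not the raw stream) to your JSON parser.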
Finally, there is the issue of "who is wrong".
I actually think it is your fault. If you send just an "Accept: application/json" header, you are telling the server that you don't care what the character set is. The server is then free to pick any charset for the response that it knows will correctly represent the response text. (In this case the text content of the JSON.)
If you specifically want (say) ASCII or Latin-1 then you should add an "Accept-charset:" header.
If >>THAT<< doesn't work, then maybe it is the server's fault. But bear in mind that if the response does (or could) contain characters that cannot be encoded in your preferred charset, then the server could/should send you a 406 error.
I've got a Tomcat instance right now that takes uploads and does some processing work on the data.
I want to replace this with a new servlet that conforms to a similar API. At first, I want this new servlet to just proxy all of the requests to the old one. They're running on separate JVMs, but on the same host.
I've been trying to use the HttpClient to proxy the upload, but it seems that the client waits for the stream to finish before it proxies the request. For large files, this causes the servlet to crash (I think it's buffering everything in memory).
Here's the code I'm currently using:
HttpPost httpPost = new HttpPost("http://localhost:8081/servlet");
String filePartName = request.getHeader("file_part_name");
_logger.info("Attaching file " + filePartName);
try {
    Part filePart = request.getPart(filePartName);
    MultipartEntity mpe = new MultipartEntity();
    mpe.addPart(
            filePartName,
            new InputStreamBody(filePart.getInputStream(), filePartName)
    );
    httpPost.setEntity(mpe);
} catch (ServletException | IOException e) {
    _logger.error("Caught exception trying to cross the streams, thanks Ghostbusters.", e);
    throw new IllegalStateException("Could not proxy the request", e);
}
HttpResponse postResponse;
try {
    postResponse = HTTP_CLIENT.execute(httpPost);
} catch (IOException e) {
    _logger.error("Caught exception trying to cross the streams, thanks Ghostbusters.", e);
    throw new IllegalStateException("Could not proxy the request", e);
}
I can't seem to figure out how to get HttpClient/HttpPost to stream the data as it comes in, instead of blocking until the first upload completes. Has anyone done something similar before? Is there an easier solution?
Thanks!
The issue lies in the way your request is processed by the MIME/multipart framework (the one you use to process your HttpServletRequest and access file parts).
The nature of a MIME/multipart request is simple (at a high level): instead of having traditional key=value content, those requests have a much more complex syntax that allows them to carry arbitrary, unstructured data (files to upload).
It basically looks like this (taken from Wikipedia):
Content-type: multipart/mixed; boundary="frontier"

This is a multi-part message in MIME format.

--frontier
Content-type: text/plain

This is the body of the message.

--frontier
Content-type: application/octet-stream
Content-Disposition: form-data; name="image1"
Content-transfer-encoding: base64

PGh0bWw+CiAgPGhlYWQ+CiAgPC9oZWFkPgogIDxib2R5PgogICAgPHA+VGhpcyBpcyB0aGUg
Ym9keSBvZiB0aGUgbWVzc2FnZS48L3A+CiAgPC9ib2R5Pgo8L2h0bWw+Cg==

--frontier--
The important part to note is that the parts (separated here by the boundary frontier) have names (through the Content-Disposition header), followed by their content. One such request can have any number of parts.
Now of course, the simplest, most straightforward way to implement the parsing of such a request is to process it to the end, detect the boundaries, and create a temporary file (or in-memory cache) to hold each part, identified by name.
Since the framework cannot know which part you will need first (you may need the second part in your servlet before the first), it parses the whole stream, and only then gives you back control.
Therefore your call is blocked at this line
Part filePart = request.getPart(filePartName);
Here, the framework has to wait to parse the whole MIME message before letting you use the result (even a theoretical, super-optimised parser could not both parse the stream lazily and allow you random access to any part of the message; you'd have to choose between the two options).
So there's not much you can do...
Except not use the multipart parser at all. I wouldn't recommend this unless you're familiar with MIME (and/or MIME libraries such as Apache James) and confident that you are in control of your request's structure.
But if you are, then you may bypass the framework's processing and access the raw stream of the request. You'd parse the MIME structure by hand, stop when you hit the start of the part's body, and start building your HTTP POST at that point, being careful to actually take care of MIME-level technicalities (de-base64? de-gzip? ...).
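One simple variant of that idea, if the downstream servlet accepts the same multipart layout, is to skip parsing entirely and forward the raw body verbatim. A hypothetical sketch (assumes Servlet 3.1+ for getContentLengthLong, and that no part needs to be inspected or modified):

// Forward the multipart body untouched; the original Content-Type header
// carries the boundary, so the downstream servlet can parse it itself.
HttpPost httpPost = new HttpPost("http://localhost:8081/servlet");
httpPost.setHeader("Content-Type", request.getContentType());
httpPost.setEntity(new InputStreamEntity(
        request.getInputStream(), request.getContentLengthLong()));
HttpResponse postResponse = HTTP_CLIENT.execute(httpPost);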
Alternatively, if you think your server crashes because of an out-of-memory condition, it may very well be that your framework is configured to cache the contents of the multipart in memory. If there is a way to configure it to cache to disk instead, that is a possible workaround.
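For instance, with the standard Servlet 3.0 multipart support, a hypothetical configuration that spills parts to disk early might look like this (the location and threshold values are placeholders):

// Parts larger than fileSizeThreshold are written to 'location'
// instead of being buffered in memory.
@MultipartConfig(
        location = "/tmp",
        fileSizeThreshold = 1024 * 64
)
public class ProxyServlet extends HttpServlet {
    // ... doPost(...) as before ...
}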
I am calling a RESTful service that returns JSON, using the Apache HttpClient.
The problem is I am getting different results in the encoding of the response when I run the code on different platforms.
Here is my code:
GetMethod get = new GetMethod("http://urltomyrestservice");
get.addRequestHeader("Content-Type", "text/html; charset=UTF-8");
...
HttpResponse response = httpexecutor.execute(request, conn, context);
response.setParams(params);
httpexecutor.postProcess(response, httpproc, context);
StringWriter writer = new StringWriter();
IOUtils.copy(response.getEntity().getContent(), writer);
When I run this on OS X, Asian characters (e.g. 張惠妹) come back fine in the response. But when I run the same code on a Linux server, the characters are displayed as ???
The Linux server is an Amazon EC2 instance running Java 1.6.0_26-b03.
My local OS X machine is running 1.6.0_29-b11.
Any ideas really appreciated!
If you look at the javadoc of org.apache.commons.io.IOUtils.copy(InputStream, Writer):
Copy bytes from an InputStream to chars on a Writer using the default
character encoding of the platform.
So that will give different answers depending on the client platform (which is what you're seeing).
Also, Content-Type is usually a response header (unless you're using POST or PUT). The server is likely to ignore it (though you might have more luck with the Accept-Charset request header).
You need to parse the charset parameter of the response's Content-Type header, and use that to convert the response into a String (if it's a String you're actually after). I expect Commons HTTP has code that will do that automatically for you. If it doesn't, Spring's RestTemplate definitely does.
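For what it's worth, a minimal sketch with Apache HttpClient's EntityUtils, which does read the charset from the response's Content-Type (the second argument is only the fallback):

// Decodes using the charset declared in Content-Type; UTF-8 if none declared.
String body = EntityUtils.toString(response.getEntity(), StandardCharsets.UTF_8);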
I believe that the problem is not in the HTTP encoding but elsewhere (e.g. while reading or forming the answer). Where does the content come from, and how? Is it stored in a DB or a file?
I am decoding HTTP packets, and I've run into a problem with chunked transfer encoding.
When I get an HTTP packet it has a header and a body.
When Transfer-Encoding is chunked, I don't know what to do with the body.
Is there a useful API or class for de-chunking the data in Java?
And if someone experienced with HTTP decoding could show me a way to do this, I'd appreciate it.
Use a full-fledged HTTP client like Apache HttpComponents Client, or just the Java SE provided java.net.URLConnection (mini tutorial here). Both handle it fully transparently and give you a "normal" InputStream back. HttpClient in turn also comes with a ChunkedInputStream which you just have to decorate your InputStream with.
If you really insist on homegrowing a library for this, then I'd suggest creating a class like ChunkedInputStream extends InputStream and writing the logic accordingly. You can find more detail on how to parse it in this Wikipedia article.
Oh, and if we are talking about the client side, HttpURLConnection does this as well.
If you are looking for a simple API, try the Jodd HTTP library (http://jodd.org/doc/http.html).
It handles chunked transfer encoding for you, and you get the whole body back as a string.
From the docs:
HttpRequest httpRequest = HttpRequest.get("http://jodd.org");
HttpResponse response = httpRequest.send();
System.out.println(response);
Here is a quick-and-dirty alternative that requires no dependency except the Oracle JRE:
private static byte[] unchunk(byte[] content) throws IOException {
    ByteArrayInputStream bais = new ByteArrayInputStream(content);
    ChunkedInputStream cis = new ChunkedInputStream(bais, new HttpClient() {}, null);
    return readFully(cis);
}
It uses the same sun.net.www.http.ChunkedInputStream as java.net.HttpURLConnection does behind the scene.
This implementation doesn't provide detailed exceptions (line numbers) on wrong content format.
It works with Java 8 but could fail with the next release. You've been warned.
Could be useful for prototyping though.
You can choose any readFully implementation from Convert InputStream to byte array in Java.
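For instance, one straightforward readFully in the spirit of the implementations linked above:

private static byte[] readFully(InputStream in) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buffer = new byte[8192];
    int n;
    while ((n = in.read(buffer)) != -1) {
        out.write(buffer, 0, n);
    }
    return out.toByteArray();
}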