I work on a REST-like API that will support bulk operations on some resources. As such a request may take some time to finish, I would like to return the statuses of the operations in a chunked response. The media type should be JSON. How can I do that with JAX-RS?
(I know that there is StreamingOutput, but it requires serializing the data manually.)
Chunked Transfer encoding is usually used in cases where the content length is unknown when the sender starts transmitting the data. The receiver can handle each chunk while the server is still producing new ones.
This implies that the server is sending data the whole time. I don't think it makes much sense to send "I'm still working|I'm still working|I'm still working|" in chunks, and as far as I know chunked transfer encoding is handled transparently by most application servers: they switch to it automatically when the response is bigger than a certain size.
A common pattern for your use case looks like this:
The client triggers a bulk operation:
POST /batch-jobs HTTP/1.1
The server creates a resource which describes the status of the job and returns the URI in the Location header:
HTTP/1.1 202 Accepted
Location: /batch-jobs/stats/4711
The client checks this resource and receives a 200:
GET /batch-jobs/stats/4711 HTTP/1.1
This example uses JSON, but you could also return plain text or add caching headers that tell the client how long to wait before the next poll.
HTTP/1.1 200 OK
Content-Type: application/json
{ "status" : "running", "nextAttempt" : "3000ms" }
If the job is done, the server should answer with a 303 and the URI of the resource it has created:
HTTP/1.1 303 See Other
Location: /batch-jobs/4711
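A minimal JAX-RS sketch of this pattern could look like the following (class, path and field names are illustrative, and the job bookkeeping is left out):

import java.net.URI;
import javax.ws.rs.GET;
import javax.ws.rs.POST;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.Context;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;
import javax.ws.rs.core.UriInfo;

@Path("/batch-jobs")
public class BatchJobResource {

    @POST
    public Response startJob(@Context UriInfo uriInfo) {
        String jobId = "4711"; // assume some job registry creates and tracks the job
        URI statusUri = uriInfo.getBaseUriBuilder().path("batch-jobs/stats/{id}").build(jobId);
        return Response.accepted().location(statusUri).build(); // 202 + Location
    }

    @GET
    @Path("/stats/{id}")
    @Produces(MediaType.APPLICATION_JSON)
    public Response status(@PathParam("id") String id, @Context UriInfo uriInfo) {
        boolean done = false; // assume this is looked up in the job registry
        if (done) {
            URI resultUri = uriInfo.getBaseUriBuilder().path("batch-jobs/{id}").build(id);
            return Response.seeOther(resultUri).build(); // 303 See Other
        }
        return Response.ok("{\"status\":\"running\",\"nextAttempt\":\"3000ms\"}").build();
    }
}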
TL;DR
My Spring Boot REST endpoint returns HTTP status immediately but sometimes waits 60 seconds before returning the result.
This is caused by a Feign-proxied request's headers being altered (Transfer-Encoding being replaced with Content-Length)
The context and results of the investigation are detailed below.
Facts
A (Spring Boot + Feign + Tomcat) -> Traefik -> B (Rest resource, Spring Boot + Tomcat)
Traefik & B always respond immediately, A always returns the 200 status code immediately.
Based on unknown criteria, A uses a KeepAlive stream and returns only after precisely 60 seconds (max idle thread?).
B uses the Transfer-Encoding: chunked header, but Traefik replaces it with Content-Length depending on unknown criteria.
The pause comes from a KeepAliveStream not being closed. I found several workarounds/solutions, but I'd love to have an explanation also.
Removing the Content-Length header solves the issue. Debugging sun.net.www.http.HttpClient confirms that having a Content-Length header triggers the use of a KeepAlive stream.
Calling A with the Connection: close header seems to solve the issue (same reason: this prevents the use of the KeepAliveStream).
Replacing Tomcat with Jetty in A seems to solve the issue, as it seems to rely on other HTTP libs.
Replacing Feign's Apache HttpClient with OkHttp in A solves the problem (a configuration sketch follows below).
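A sketch of that last workaround, assuming Spring Cloud OpenFeign with the feign-okhttp module on the classpath and a plain (non-load-balanced) client; the bean-based variant shown here is one option, the feign.okhttp.enabled=true property being the usual shortcut:

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
class FeignOkHttpConfig {

    // Makes Feign send its requests through OkHttp instead of the Apache/JDK
    // client, which avoids sun.net.www.http.HttpClient's KeepAliveStream entirely.
    @Bean
    feign.Client feignClient() {
        return new feign.okhttp.OkHttpClient(new okhttp3.OkHttpClient());
    }
}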
Remaining questions
Why doesn't Feign/Tomcat/HttpClient close once the whole body is available (which is immediately)?
Bonus question: Why/When/Based on what does Traefik alter the headers, and what are the rules?
The mystery of the lost bytes
One of our latest tests was to use -v with curl and we saw this while A was pausing:
$ curl -i -v http://localhost:8100/a/proxied-endpoint
#...
< Content-Length: 1843
<
{ [1793 bytes data]
So, the service hangs and waits for 50 missing bytes.
When interrupted though, it returns the whole response.
I'm thinking of an encoding issue there, but don't understand where it could happen.
Replacing the Content-Length with 1793 (or lower) makes the endpoint return immediately.
There is a discrepancy between the Content-Length announced in the header and the number of bytes the client actually receives in the body.
Detailed context
The situation
I'm having an issue with a Spring Boot REST controller (A) that acts as a proxy to another service (B). The endpoint basically looks like this:
@GetMapping("/{resource}")
public ResponseEntity<List<Something>> getAll(@PathVariable String resource) {
    return myFeignProxy.getAll(resource);
}
There's a Traefik reverse proxy between A and B.
In summary: A -> Traefik -> B.
In all cases, the Feign proxy answers in less than 100 ms and the endpoint returns the HTTP status (200) immediately. However, in most cases, the body is not returned immediately: A's Spring Boot waits for precisely 60 seconds (this is really not random).
Whether the body is returned immediately or after 60 s seems to depend on the resource: some resources are always available immediately, the others always have the wait. Once again, this does not seem random.
[Edit]: Investigation has shown that, in the cases where A pauses, Traefik replaced B's original Transfer-Encoding header with Content-Length.
Based on this header, sun.net.www.http.HttpClient would decide to use a KeepAliveStream.
The problem is that this stream then doesn't close.
Versions
Spring Boot: 2.2.6
Tomcat: 9.0.33
Feign: (determined by Spring Cloud 2.1.2)
Traefik: 2.2.11
What it's not
It is not an issue with the proxied service (B) being slow. In all cases, myFeignProxy responds in a few ms and the endpoint returns the 200 HTTP status immediately.
I've tried changing Feign client timeouts, without any impact.
I also see no correlation between the pause, the size of the body and the time of response of the feign proxy.
Resource | Size (KB) | Feign time (ms) | 60s pause
1        | 1.87      | 34              | yes
2        | 3.29      | 35              | no
3        | 1.55      | 34              | yes
4        | 10.05     | 81              | yes
The problem is not related to Spring Security either, as entirely removing it (configuration and dependencies) does not change the symptoms.
Updated investigations
Technical layer causing the pause
The pause seems to come from Tomcat. Replacing the Tomcat starter with the Jetty starter (in A) eliminates the issue (all requests answer immediately).
That being said, it doesn't explain the problem.
Trace log analysis
It appears that, for an endpoint where the pause occurs, there is one additional line in the logs during the calls. See below for examples. The parameters of the HttpURLConnection also seem to be different, though I do not understand why.
Case without pause
DEBUG [nio-8100-exec-9] s.n.www.protocol.http.HttpURLConnection : sun.net.www.MessageHeader#784b4a945 pairs: {GET /xxx HTTP/1.1: null}{Accept: application/json}{User-Agent: Java/11.0.7}{Host: xxx}{Connection: keep-alive}
DEBUG [nio-8100-exec-9] s.n.www.protocol.http.HttpURLConnection : sun.net.www.MessageHeader#2a3818a612 pairs: {null: HTTP/1.1 200 OK}{Cache-Control: no-cache, no-store, max-age=0, must-revalidate}{Content-Type: application/json}{Date: Tue, 20 Apr 2021 07:47:47 GMT}{Expires: 0}{Pragma: no-cache}{Strict-Transport-Security: max-age=31536000 ; includeSubDomains}{Vary: accept-encoding}{X-Content-Type-Options: nosniff}{X-Frame-Options: DENY}{X-Xss-Protection: 1; mode=block}{Transfer-Encoding: chunked}
Case with pause
DEBUG [nio-8100-exec-6] s.n.www.protocol.http.HttpURLConnection : sun.net.www.MessageHeader#7bff99e75 pairs: {GET /xxx HTTP/1.1: null}{Accept: application/json}{User-Agent: Java/11.0.7}{Host: xxx}{Connection: keep-alive}
TRACE [nio-8100-exec-6] s.n.www.protocol.http.HttpURLConnection : KeepAlive stream used: https://xxx/xxx
DEBUG [nio-8100-exec-6] s.n.www.protocol.http.HttpURLConnection : sun.net.www.MessageHeader#5aed6c9312 pairs: {null: HTTP/1.1 200 OK}{Cache-Control: no-cache, no-store, max-age=0, must-revalidate}{Content-Type: application/json}{Date: Tue, 20 Apr 2021 07:57:42 GMT}{Expires: 0}{Pragma: no-cache}{Strict-Transport-Security: max-age=31536000 ; includeSubDomains}{Vary: accept-encoding}{X-Content-Type-Options: nosniff}{X-Frame-Options: DENY}{X-Xss-Protection: 1; mode=block}{Content-Length: 803}
When finally responding after the pause (not present when responding immediately)
DEBUG [nio-8100-exec-7] o.apache.tomcat.util.threads.LimitLatch : Counting down[http-nio-8100-exec-7] latch=1
DEBUG [nio-8100-exec-7] org.apache.tomcat.util.net.NioEndpoint : Calling [org.apache.tomcat.util.net.NioEndpoint#63668501].closeSocket([org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper#cfdc708:org.apache.tomcat.util.net.NioChannel#6e7c15b6:java.nio.channels.SocketChannel[connected local=/0:0:0:0:0:0:0:1:8100 remote=/0:0:0:0:0:0:0:1:52501]])
The additional log line ("KeepAlive stream used") comes from sun.net.www.http.HttpClient. As far as I understand, the decision to use this KeepAlive stream depends on the proxied response's headers.
Headers analysis and Traefik meddling
Traefik changes the headers between A and B.
B always returns its response with Transfer-Encoding: chunked.
Traefik sometimes replaces it with Content-Length and the correct size for the payload.
Nothing is configured in our Traefik instance concerning headers.
The rules used to decide between Transfer-Encoding and Content-Length seem hard to grasp:
It seems to depend on the endpoint being called or its payload.
It also seems to depend on something from the caller, as I don't always get the same header depending on whether I'm calling from A or from curl.
This explains why the problem is not reproducible when both applications are on the local machine, since there is no Traefik between them.
About the rules that Traefik applies, it appears the HTTP version plays a role.
$ curl -s -o /dev/null -D - --http1.1 https://traefik/b/endpoint
HTTP/1.1 200 OK
# ...
Transfer-Encoding: chunked
$ curl -s -o /dev/null -D - --http2 https://traefik/b/endpoint
HTTP/2 200
# ...
content-length: 2875
Traefik always returns the same headers for a given endpoint, so the headers presumably also depend on the address or, more likely, on the payload (a given endpoint always returns the same payload for this service).
First version of B that doesn't work
Performing a git bisect, I discovered that the 60-second pause appeared when the proxied service (B) started using ZonedDateTime instead of LocalDateTime in its DTOs. The only change is that date-time fields now have an offset in the response body; there is no impact on the headers. Yet the Feign client works fine for LocalDateTimes and pauses for ZonedDateTimes.
Forcing the connection to close
Passing the Connection: close header makes the pause disappear in A.
The response body is returned immediately.
HttpClient does not use the KeepAliveStream in this case.
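One way to force that header from A, assuming Spring Cloud OpenFeign picks up RequestInterceptor beans as usual (the configuration class name is illustrative):

import feign.RequestInterceptor;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
class FeignConnectionCloseConfig {

    // Adds "Connection: close" to every Feign request, which prevents
    // sun.net.www.http.HttpClient from using a KeepAliveStream for the response.
    @Bean
    RequestInterceptor connectionCloseInterceptor() {
        return template -> template.header("Connection", "close");
    }
}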
Trying to reproduce with a mock B
I wrote a quick mock service B.
It returns the Content-Type header and the content.
What's interesting is that:
If mock-B returns the Content-Length header, then A has a 60-second pause.
If mock-B does not return the Content-Length header, then A returns immediately.
This is consistent with previous tests indicating that the Content-Length header plays a role, but I still cannot tell which one exactly, since the header is also present in some responses from Traefik for which A returns immediately.
mock-B
const port = 8080;
const http = require('http');
const path = require('path');
const fs = require('fs');
// counts UTF-8 bytes (not characters), so the Content-Length matches the bytes actually sent
const countBytesInString = str => encodeURI(str).split(/%..|./).length - 1
const requestListener = (req, res) => {
console.log(`\n>> Mock call to endpoint ${req.url}`);
fs.readFile(path.join(__dirname, `endpoints${req.url}.json`), 'utf8' , (err, data) => {
if (err) {
console.error(err);
res.writeHead(404); // answer instead of leaving the client hanging
res.end();
return;
}
const contentLength = countBytesInString(data);
console.log(`Content-Length: ${contentLength}`);
res.setHeader('Content-Type', 'application/json');
res.setHeader('content-length', contentLength);
res.writeHead(200);
res.end(data);
})
};
const server = http.createServer(requestListener);
server.listen(port);
console.log(`Mock server listening on port ${port}`);
Explaining the causes
We've finally understood the mechanism that leads to the issue.
A -> Traefik -> B
B returns a list of objects with a ZonedDateTime field ("validFrom":"2021-12-24 23:59:57+01:00") and the header Transfer-Encoding: chunked.
Traefik replaces the Transfer-Encoding: chunked with a Content-Length, computed from the body of B's response.
A receives the response, deserializes the objects, then reserializes them but in the UTC timezone ("validFrom":"2021-12-24 22:59:57Z"), but it reuses the Content-Length from Traefik without recalculating it.
As a consequence, the body A sends is shorter than the announced Content-Length (each ZonedDateTime takes five bytes less when A sends it than when Traefik computed the content length).
The client, however, was promised that Content-Length and keeps waiting for the missing bytes.
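A quick check of that arithmetic (the field count of ten is an assumption; it would match the 50 missing bytes, 1843 announced vs 1793 received, observed with curl above):

public class ContentLengthDiff {
    public static void main(String[] args) {
        String fromTraefik = "\"validFrom\":\"2021-12-24 23:59:57+01:00\""; // as Traefik measured it
        String fromA       = "\"validFrom\":\"2021-12-24 22:59:57Z\"";      // as A re-serializes it
        int diffPerField = fromTraefik.length() - fromA.length();
        System.out.println(diffPerField);      // 5 bytes per ZonedDateTime field
        System.out.println(diffPerField * 10); // 50 bytes for ten such fields
    }
}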
Possible solution
The solution we have in mind right now is to have Feign and its calling controller return a ResponseEntity<String> instead of a ResponseEntity<List<MyObject>> (see the sketch after the pros and cons).
Pros:
B's response is returned as-is, so there is no longer a problem with a changing content length.
A does not spend CPU-time deserializing then immediately reserializing the response.
Cons:
A's OpenAPI doc won't show the return type (unless the OpenAPI annotations allow specifying the return model). That's what I'll test later today.
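A sketch of that pass-through variant (interface, class and property names are illustrative; it assumes Spring Cloud OpenFeign with the Spring MVC contract):

import org.springframework.cloud.openfeign.FeignClient;
import org.springframework.http.MediaType;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

// The Feign client returns the raw JSON body as a String ...
@FeignClient(name = "service-b", url = "${service-b.url}")
interface MyFeignProxy {
    @GetMapping("/{resource}")
    ResponseEntity<String> getAll(@PathVariable("resource") String resource);
}

// ... and the controller forwards it untouched, so A never deserializes
// and re-serializes the payload.
@RestController
class ProxyController {

    private final MyFeignProxy myFeignProxy;

    ProxyController(MyFeignProxy myFeignProxy) {
        this.myFeignProxy = myFeignProxy;
    }

    @GetMapping(value = "/{resource}", produces = MediaType.APPLICATION_JSON_VALUE)
    public ResponseEntity<String> getAll(@PathVariable String resource) {
        return myFeignProxy.getAll(resource);
    }
}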
I am in the process of sending a HTTP chunked request to an internal system. I've confirmed other factors are not at play by ensuring that I can send small messages without chunk encoding.
My process was basically to change the Transfer-Encoding header to be chunked and I've removed the Content-Length header. Additionally, I am utilising an in-house ChunkedOutputStream which has been around for quite some time.
I am able to connect, obtain an output stream and send the data. The recipient then returns a 200 response so it seems the request was received and successfully handled. The endpoint receives the HTTP Request, and streams the data straight into a table (using HttpServletRequest.getInputStream()).
On inspecting the streamed data I can see that the chunk encoding information in the stream has not been unwrapped/decoded by the Tomcat container automatically. I've been trawling the Tomcat HTTPConnector documentation and can't find anything that alludes to the chunked encoding w.r.t how a chunk encoded message should be handled within a HttpServlet. I can't see other StackOverflow questions querying this so I suspect I am missing something basic.
My question boils down to:
Should Tomcat automatically decode the chunked encoding from my request and give me a "clean" InputStream when I call HttpServletRequest.getInputStream()?
If yes, is there configuration that needs to be updated to enable this functionality? Am I sending something wrong in the headers that is causing it to return the non-decoded stream?
If no, is it common practice to wrap the input stream in a ChunkedInputStream or something similar when the Transfer-Encoding header is present?
This is solved. As expected it was basic in my case.
The legacy system I was using provided hand-rolled methods to simplify the process of opening an HTTP connection, sending headers and then using an OutputStream to send the content via a POST. I didn't realise it, as it was in a rather obscure location, but the behind-the-scenes helpers were detecting that I was not specifying a Content-Length, so they added the Transfer-Encoding: chunked header and wrapped the OutputStream in a ChunkedOutputStream. This resulted in me double-encoding the contents, hence my endpoint's (seeming) inability to decode it.
Case closed.
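For reference, once the request is chunk-encoded only once, the receiving side needs nothing special; a minimal servlet sketch (class name and the destination of the bytes are illustrative), where getInputStream() already yields the de-chunked payload on a compliant container:

import java.io.IOException;
import java.io.InputStream;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class IngestServlet extends HttpServlet {

    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        byte[] buffer = new byte[8192];
        int read;
        try (InputStream in = req.getInputStream()) {
            while ((read = in.read(buffer)) != -1) {
                // stream the decoded bytes to wherever they need to go (e.g. the table)
            }
        }
        resp.setStatus(HttpServletResponse.SC_OK);
    }
}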
I need to check the response header of an HTTP request using the OkHttp library. Before loading the data, I need to check its last update time. The problem is that the response body is about 2 MB, so I want to get only the Last-Modified header. Is it possible to load only the response headers without the response body, to increase the speed of the program's RESTful actions?
You can send a HTTP HEAD request which only retrieves the headers. You only need to check if your server application supports HEAD requests.
The HEAD method is identical to GET except that the server MUST NOT
return a message-body in the response. The metainformation contained
in the HTTP headers in response to a HEAD request SHOULD be identical
to the information sent in response to a GET request. This method can
be used for obtaining metainformation about the entity implied by the
request without transferring the entity-body itself. This method is
often used for testing hypertext links for validity, accessibility,
and recent modification. (http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html)
Example for OkHttp:
String url = ...
Request request = new Request.Builder().url(url).head().build();
Response response = new OkHttpClient().newCall(request).execute();
String lastModified = response.header("Last-Modified");
The response body is streamed, so you can make the regular request, read the headers, and then decide whether or not to consume the body. If you don’t want the body, you can close() it without much waste.
There is a slight cost to the server in serving a response that might be abandoned. But the overall cost will be lower than making a HEAD and then a GET request unless you expect to abandon a significant fraction (say > 90%) of requests.
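A sketch of that approach, building on the snippet above (url and the needsRefresh check are placeholders):

OkHttpClient client = new OkHttpClient();
Request request = new Request.Builder().url(url).build();
try (Response response = client.newCall(request).execute()) {
    String lastModified = response.header("Last-Modified");
    if (needsRefresh(lastModified)) {              // hypothetical freshness check
        String body = response.body().string();    // only read the 2 MB body when it is needed
        // ... parse and use the body ...
    }
    // closing the response (end of the try block) discards any unread body cheaply
}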
I can't find information anywhere regarding this HTTP Header: PAGE_CACHE_FILTER_STATE.
When I try to access my RSS feed from a browser, this header has the value of NoCacheRequest, but when I access it from my Java application (URL.openConnection()), I've noticed that it gets set to FromCacheRequest and my RSS doesn't appear to update.
So I have two questions:
What is this HTTP header?
How can I make PAGE_CACHE_FILTER_STATE: NoCacheRequest for all requests?
I've never heard about nor seen PAGE_CACHE_FILTER_STATE before either, so I can't help you out with the actual specifications for it. It looks like a custom header telling you whether a cached version of the content was used or not.
To avoid caching, you could try programmatically adding something different to the URL each time. For example, you might add a random number:
http://www.example.com/feed.rss?no_cache=564482
http://www.example.com/feed.rss?no_cache=984637
You should also try sending the Pragma: no-cache and Cache-Control: no-cache HTTP headers when you request the RSS feed.
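A sketch combining both suggestions with URL.openConnection() (the feed URL is illustrative):

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.concurrent.ThreadLocalRandom;

public class FeedFetch {
    public static void main(String[] args) throws Exception {
        // a different query string on every request defeats URL-based caches
        String feed = "http://www.example.com/feed.rss?no_cache="
                + ThreadLocalRandom.current().nextInt(1_000_000);
        HttpURLConnection conn = (HttpURLConnection) new URL(feed).openConnection();
        conn.setRequestProperty("Pragma", "no-cache");
        conn.setRequestProperty("Cache-Control", "no-cache");
        conn.setUseCaches(false); // also bypasses the local URLConnection cache
        try (InputStream in = conn.getInputStream()) {
            // read and parse the RSS feed ...
        }
    }
}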
I'm working on a browser/proxy oriented project where I need to download webpages. After sending a custom HTTP request to a web server I start listening for a server response.
When reading the response, I check the response headers for a Content-Length:-row. If I get one of those, it's easy to determine when the server is done sending data since I always know how many bytes of data I have received.
The problem occurs when the server doesn't include the Content-Length header and also keeps the connection open for further requests. For example, the google server responds with gzipped-content, but doesn't include content length. How do I know when to stop waiting for more data and close the connection?
I have considered using a timeout value to close the connection when no data has been received for a while, but this seems like the wrong way to do it. Chrome for example, can download the same pages as me and always seem to know exactly when to close the connection.
Have a look at IETF RFC 2616 and search for chunked encoding and Content-Range.
HTTP is designed to return content of unknown length, as in:
HTTP/1.1 200 OK
Content-Type: text/plain
Transfer-Encoding: chunked
25
This is the data in the first chunk
1C
and this is the second one
3
con
8
sequence
0
source Wikipedia
I would suggest forcing the Connection: close header so you are sure that the server closes the connection once the output is finished, no matter whether Content-Length is set or not. Performance will be somewhat affected by this.
There are two cases you can expect:
1. socket-close
2. socket-timeout
Usually the socket will be closed; it also makes sense to declare a socket timeout.
Remember
int bytesRead = stream.read(buffer, 0, size);
returns the number of bytes actually read into the buffer (at most size); it returns -1 once the server has closed the socket, and throws a SocketTimeoutException when the socket timeout elapses without data.
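A sketch of that read-until-close-or-timeout loop (the 15-second timeout is an arbitrary example value):

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.Socket;
import java.net.SocketTimeoutException;

class ResponseReader {

    static byte[] readUntilCloseOrTimeout(Socket socket) throws IOException {
        socket.setSoTimeout(15_000);                    // case 2: socket timeout
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        InputStream in = socket.getInputStream();
        int read;
        try {
            while ((read = in.read(buffer)) != -1) {    // case 1: -1 once the server closes
                body.write(buffer, 0, read);
            }
        } catch (SocketTimeoutException e) {
            // no data for 15 s: assume the response is complete (or treat it as an error)
        }
        return body.toByteArray();
    }
}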
Regards.