Traefik breaking headers and Tomcat keeping stream alive until timeout - java

TL;DR
My Spring Boot REST endpoint returns HTTP status immediately but sometimes waits 60 seconds before returning the result.
This is caused by a Feign-proxied response's headers being altered (Transfer-Encoding being replaced with Content-Length).
The context and results of the investigation are detailed below.
Facts
A (Spring Boot + Feign + Tomcat) -> Traefik -> B (Rest resource, Spring Boot + Tomcat)
Traefik & B always respond immediately, A always returns the 200 status code immediately.
Based on unknown criteria, A uses a KeepAlive stream and returns only after precisely 60 seconds (max idle thread?).
B uses the Transfer-Encoding: chunked header, but Traefik replaces it with Content-Length depending on unknown criteria.
The pause comes from a KeepAliveStream not being closed. I found several workarounds/solutions, but I'd love to have an explanation also.
Removing the Content-Length header solves the issue. Debugging sun.net.www.http.HttpClient confirms that having a Content-Length header triggers the use of a KeepAlive stream.
Calling A with the Connection: close header seems to solve the issue (same reason: this prevents the use of the KeepAliveStream).
Replacing Tomcat with Jetty in A seems to solve the issue, as it seems to rely on other HTTP libs.
Replacing A's Feign's Apache HttpClient with OkHttp solves the problem.
Remaining questions
Why doesn't Feign/Tomcat/HttpClient close once the whole body is available (which is immediately)?
Bonus question: Why/When/Based on what does Traefik alter the headers, and what are the rules?
The mystery of the lost bytes
One of our latest tests was to use -v with curl and we saw this while A was pausing:
$ curl -i -v http://localhost:8100/a/proxied-endpoint
#...
< Content-Length: 1843
<
{ [1793 bytes data]
So, the service hangs and waits for 50 missing bytes.
When interrupted though, it returns the whole response.
I'm thinking of an encoding issue there, but don't understand where it could happen.
Replacing the content length with 1793 (or lower) makes the endpoint return immediately.
There is a discrepancy between the way the Content-Length header is computed and the way our client calculates it upon receiving the body.
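The hang can be reproduced outside the real stack. The sketch below (a hypothetical raw-socket server, not the actual Traefik/B setup) announces a Content-Length of 100 but sends only 50 bytes and keeps the connection open; the JDK client then blocks on the read until its timeout, just as A blocks for 60 seconds:

```java
import java.io.*;
import java.net.*;

public class ContentLengthHangDemo {
    public static void main(String[] args) throws Exception {
        ServerSocket server = new ServerSocket(0);
        Thread misbehaving = new Thread(() -> {
            try (Socket socket = server.accept()) {
                // Drain the request headers.
                BufferedReader in = new BufferedReader(
                        new InputStreamReader(socket.getInputStream()));
                String line;
                while ((line = in.readLine()) != null && !line.isEmpty()) { }
                // Announce 100 bytes but send only 50, then keep the socket open.
                String body = "x".repeat(50);
                OutputStream out = socket.getOutputStream();
                out.write(("HTTP/1.1 200 OK\r\nContent-Length: 100\r\n\r\n" + body)
                        .getBytes());
                out.flush();
                Thread.sleep(3_000); // simulate the connection staying alive
            } catch (Exception ignored) { }
        });
        misbehaving.setDaemon(true);
        misbehaving.start();

        HttpURLConnection conn = (HttpURLConnection) new URL(
                "http://localhost:" + server.getLocalPort() + "/").openConnection();
        conn.setReadTimeout(500); // without a timeout, the read blocks much longer
        boolean timedOut = false;
        try (InputStream in = conn.getInputStream()) {
            in.readAllBytes(); // blocks waiting for the 50 missing bytes
        } catch (SocketTimeoutException e) {
            timedOut = true;
        }
        System.out.println("Read timed out waiting for missing bytes: " + timedOut);
        server.close();
    }
}
```

Like curl above, the client receives the status and the partial body immediately; it is only the end of the body it keeps waiting for.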
Detailed context
The situation
I'm having an issue with a Spring Boot REST controller (A) that acts as a proxy to another service (B). The endpoint basically looks like this:
@GetMapping("/{resource}")
public ResponseEntity<List<Something>> getAll(@PathVariable String resource) {
    return myFeignProxy.getAll(resource);
}
There's a Traefik reverse proxy between A and B.
In summary: A -> Traefik -> B.
In all cases, the Feign proxy answers in less than 100 ms and the endpoint returns the HTTP status (200) immediately. However, in most cases, the body is not returned immediately: A's Spring Boot waits for precisely 60 seconds (this is really not random).
Whether the body will be returned immediately or after 60 s seems to depend upon resource: some resources are always available immediately, the others have the wait. Once again, this does not seem random.
[Edit]: Investigation has shown that, in the cases where A pauses, Traefik replaced B's original Transfer-Encoding header with Content-Length.
Based on this header, sun.net.www.http.HttpClient would decide to use a KeepAliveStream.
The problem is that this stream then doesn't close.
Versions
Spring Boot: 2.2.6
Tomcat: 9.0.33
Feign: (determined by Spring Cloud 2.1.2)
Traefik: 2.2.11
What it's not
It is not an issue with the proxied service (B) being slow. In all cases, myFeignProxy responds in a few ms and the endpoint returns the 200 HTTP status immediately.
I've tried changing Feign client timeouts, without any impact.
I also see no correlation between the pause, the size of the body and the time of response of the feign proxy.
| Resource | Size (KB) | Feign time (ms) | 60s pause |
|----------|-----------|-----------------|-----------|
| 1        | 1.87      | 34              | yes       |
| 2        | 3.29      | 35              | no        |
| 3        | 1.55      | 34              | yes       |
| 4        | 10.05     | 81              | yes       |
The problem is not related to Spring Security either, as entirely removing it (configuration and dependencies) does not change the symptoms.
Updated investigations
Technical layer causing the pause
The pause seems to come from Tomcat. Replacing the Tomcat starter with the Jetty starter (in A) eliminates the issue (all requests answer immediately).
That being said, it doesn't explain the problem.
Trace log analysis
It appears that, for an endpoint where the pause appears, there is one additional line in logs during the calls. See below for examples. The parameters for the HttpURLConnection also seem to be different, though I do not understand why.
Case without pause
DEBUG [nio-8100-exec-9] s.n.www.protocol.http.HttpURLConnection : sun.net.www.MessageHeader@784b4a94 5 pairs: {GET /xxx HTTP/1.1: null}{Accept: application/json}{User-Agent: Java/11.0.7}{Host: xxx}{Connection: keep-alive}
DEBUG [nio-8100-exec-9] s.n.www.protocol.http.HttpURLConnection : sun.net.www.MessageHeader@2a3818a6 12 pairs: {null: HTTP/1.1 200 OK}{Cache-Control: no-cache, no-store, max-age=0, must-revalidate}{Content-Type: application/json}{Date: Tue, 20 Apr 2021 07:47:47 GMT}{Expires: 0}{Pragma: no-cache}{Strict-Transport-Security: max-age=31536000 ; includeSubDomains}{Vary: accept-encoding}{X-Content-Type-Options: nosniff}{X-Frame-Options: DENY}{X-Xss-Protection: 1; mode=block}{Transfer-Encoding: chunked}
Case with pause
DEBUG [nio-8100-exec-6] s.n.www.protocol.http.HttpURLConnection : sun.net.www.MessageHeader@7bff99e7 5 pairs: {GET /xxx HTTP/1.1: null}{Accept: application/json}{User-Agent: Java/11.0.7}{Host: xxx}{Connection: keep-alive}
TRACE [nio-8100-exec-6] s.n.www.protocol.http.HttpURLConnection : KeepAlive stream used: https://xxx/xxx
DEBUG [nio-8100-exec-6] s.n.www.protocol.http.HttpURLConnection : sun.net.www.MessageHeader@5aed6c93 12 pairs: {null: HTTP/1.1 200 OK}{Cache-Control: no-cache, no-store, max-age=0, must-revalidate}{Content-Type: application/json}{Date: Tue, 20 Apr 2021 07:57:42 GMT}{Expires: 0}{Pragma: no-cache}{Strict-Transport-Security: max-age=31536000 ; includeSubDomains}{Vary: accept-encoding}{X-Content-Type-Options: nosniff}{X-Frame-Options: DENY}{X-Xss-Protection: 1; mode=block}{Content-Length: 803}
When finally responding after the pause (not present when responding immediately)
DEBUG [nio-8100-exec-7] o.apache.tomcat.util.threads.LimitLatch : Counting down[http-nio-8100-exec-7] latch=1
DEBUG [nio-8100-exec-7] org.apache.tomcat.util.net.NioEndpoint : Calling [org.apache.tomcat.util.net.NioEndpoint@63668501].closeSocket([org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper@cfdc708:org.apache.tomcat.util.net.NioChannel@6e7c15b6:java.nio.channels.SocketChannel[connected local=/0:0:0:0:0:0:0:1:8100 remote=/0:0:0:0:0:0:0:1:52501]])
The additional log line ("KeepAlive stream used") occurs in sun.net.www.http.HttpClient. From what I understand, the decision to use this KeepAliveStream depends on the proxied response's headers.
Headers analysis and Traefik meddling
Traefik changes the headers between A and B.
B always returns its response with Transfer-Encoding: chunked.
Traefik sometimes replaces it with Content-Length and the correct size for the payload.
Nothing is configured in our Traefik instance concerning headers.
The rules used to decide between Transfer-Encoding and Content-Length seem hard to grasp:
It seems to depend on the endpoint being called or its payload.
It also seems to depend on something from the caller, as I don't always get the same header depending on whether I'm calling from A or from curl.
This explains why the problem is not reproducible when both applications are on the local machine, since there is no Traefik between them.
About the rules that Traefik applies, it appears the HTTP version plays a role.
$ curl -s -o /dev/null -D - --http1.1 https://traefik/b/endpoint
HTTP/1.1 200 OK
# ...
Transfer-Encoding: chunked
$ curl -s -o /dev/null -D - --http2 https://traefik/b/endpoint
HTTP/2 200
# ...
content-length: 2875
Traefik always returns the same headers for a given endpoint, so the headers presumably also depend on the address or, more likely, on the payload (a given endpoint always returns the same payload for this service).
First version of B that doesn't work
Performing a git bisect, I discovered that the 60-second pause appeared when the proxied service (B) started using ZonedDateTime instead of LocalDateTime in its DTO. The only change is that date-time fields now have an offset in the response body; there is no impact on the headers. Yet the Feign client works fine for LocalDateTimes and pauses for ZonedDateTimes.
Forcing the connection to close
Passing the Connection: close header makes the pause disappear in A.
The response body is returned immediately.
HttpClient does not use the KeepAliveStream in this case.
Trying to reproduce with a mock B
I wrote a quick mock service B.
It returns the Content-Type header and the content.
What's interesting is that:
If mock-B returns the Content-Length header, then A has a 60-second pause.
If mock-B does not return the Content-Length header, then A returns immediately.
This is consistent with previous tests indicating that the Content-Length header plays a role, but I am still unable to understand which one, since Content-Length is present in some Traefik responses for which A still returns immediately.
mock-B
const port = 8080;
const http = require('http');
const path = require('path');
const fs = require('fs');

// Count the UTF-8 bytes of a string (equivalent to Buffer.byteLength(str))
const countBytesInString = str => encodeURI(str).split(/%..|./).length - 1;

const requestListener = (req, res) => {
  console.log(`\n>> Mock call to endpoint ${req.url}`);
  fs.readFile(path.join(__dirname, `endpoints${req.url}.json`), 'utf8', (err, data) => {
    if (err) {
      console.error(err);
      res.writeHead(500);
      res.end();
      return;
    }
    const contentLength = countBytesInString(data);
    console.log(`Content-Length: ${contentLength}`);
    res.setHeader('Content-Type', 'application/json');
    // Commenting out the next line makes A return immediately
    res.setHeader('Content-Length', contentLength);
    res.writeHead(200);
    res.end(data);
  });
};

const server = http.createServer(requestListener);
server.listen(port);
console.log(`Mock server listening on port ${port}`);

Explaining the causes
We've finally understood the mechanism that leads to the issue.
A -> Traefik -> B
B returns a list of objects with a ZonedDateTime field ("validFrom":"2021-12-24 23:59:57+01:00") and the header Transfer-Encoding: chunked.
Traefik replaces the Transfer-Encoding: chunked with a Content-Length computed from the body of the response.
A receives the response, deserializes the objects, then reserializes them in the UTC timezone ("validFrom":"2021-12-24 22:59:57Z"), reusing the Content-Length from Traefik without recalculating it.
As a consequence, the body A sends is shorter than the announced Content-Length (each ZonedDateTime takes five bytes less when A sends it than when Traefik computed the content length).
The client however has been announced a Content-Length and is waiting for the missing bytes.
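The byte arithmetic can be checked directly with the date strings from the example above. Note that the 10-field count derived below is only an illustration consistent with the 50 missing bytes observed with curl; the actual number of date fields in the payload is not confirmed in the source:

```java
public class LostBytesDemo {
    public static void main(String[] args) {
        // What B serializes (offset +01:00) vs what A re-serializes (UTC, 'Z')
        String fromB = "2021-12-24 23:59:57+01:00";
        String fromA = "2021-12-24 22:59:57Z";
        int diff = fromB.length() - fromA.length();
        System.out.println("Bytes lost per ZonedDateTime field: " + diff);
        // 50 missing bytes / 5 bytes per field => 10 date fields in the payload
        System.out.println("Fields explaining the 50 missing bytes: " + (50 / diff));
    }
}
```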
Possible solution
The solution we have in mind right now is to tell Feign and its calling controller that it returns a ResponseEntity<String> instead of a ResponseEntity<List<MyObject>>.
Pros:
B's response is returned as-is, so no more problem due to a varying content length.
A does not spend CPU-time deserializing then immediately reserializing the response.
Cons:
The OpenAPI doc of A won't show the return type (unless the OpenAPI annotations allow specifying the return model). That's what I'll test later today.

Related

How to make REST call asynchronous in Java

I have REST calls between two microservices; one of the calls takes more than 15 minutes to complete. Our company's private cloud implementation terminates any connection kept open for more than 15 minutes.
We are looking for an asynchronous REST call implementation, where service A triggers the call to service B and forgets about it, and service B notifies A when the response is ready to be served.
Is there any widely used technique/API for such a scenario? I was not able to find anything concrete on this front.
You could use polling. Something like this:
Service A triggers a REST call to service B, which returns an OK response. Then, every minute, service A makes another API request to an endpoint in service B which returns the status of the previous request, until the process is completed (or until a certain amount of time has elapsed). When the status request reports success, you can mark the process as completed.
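A minimal sketch of this polling pattern, with a hypothetical in-process status endpoint standing in for service B (the URL, path, and status strings are made up; real code would also poll far less aggressively than every 100 ms):

```java
import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.atomic.AtomicInteger;

public class PollingDemo {
    public static void main(String[] args) throws Exception {
        // Hypothetical service B: reports "running" twice, then "completed".
        AtomicInteger polls = new AtomicInteger();
        HttpServer serviceB = HttpServer.create(new InetSocketAddress(0), 0);
        serviceB.createContext("/jobs/42/status", exchange -> {
            String status = polls.incrementAndGet() < 3 ? "running" : "completed";
            byte[] body = status.getBytes();
            exchange.sendResponseHeaders(200, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        serviceB.start();

        // Service A polls the status endpoint until the job is done.
        HttpClient client = HttpClient.newHttpClient();
        URI statusUri = URI.create("http://localhost:"
                + serviceB.getAddress().getPort() + "/jobs/42/status");
        String status;
        do {
            status = client.send(HttpRequest.newBuilder(statusUri).build(),
                    HttpResponse.BodyHandlers.ofString()).body();
            System.out.println("Polled status: " + status);
            Thread.sleep(100); // stand-in for the 1-minute polling interval
        } while (!status.equals("completed"));
        serviceB.stop(0);
    }
}
```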
Instead of creating the actual resource, create a temporary one. Instead of returning a 201 (Created) HTTP response, issue a 202 (Accepted) response code. This informs the client that the request has been accepted and understood by the server, but the resource is not (yet) created. Send the temporary resource's URI inside the Location header.
Request:
POST /blogs HTTP/1.1
<xml>
blogdata
</xml>
Response:
HTTP/1.1 202 Accepted
Location: /queue/12345
This location can store information about the status of the actual resource: an ETA on when it will be created, what is currently being done or processed.
When the actual resource has been created, the temporary resource can return a 303 (See Other) response. The Location header returns the URI of the definitive resource. A client can either DELETE the temporary resource, or the server can expire it and return a 410 (Gone) later on.
Source: https://restcookbook.com/Resources/asynchroneous-operations/

How to get all effective HTTP request headers?

I want to use the new java.net.HttpClient to do some requests to another system.
For debug purposes I want to log (and later store in our db) the request that I send and the response that I receive.
How can I retrieve the effective http headers, that java is sending?
I tried to get the headers like this:
HttpRequest request = HttpRequest.newBuilder()
.uri(URI.create("http://localhost:54113"))
.build();
System.out.println("HTTP-Headers:\n---");
request.headers().map()
.forEach((key, values) ->
values.forEach(value ->
System.out.println(key + ": " + value)
)
);
System.out.println("---");
HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
But it outputs:
HTTP-Headers:
---
---
My server, however, tells me, that it receives these Http headers:
HTTP-Headers:
---
Connection: Upgrade, HTTP2-Settings
User-Agent: Java-http-client/11
Host: localhost:54113
HTTP2-Settings: AAEAAEAAAAIAAAABAAMAAABkAAQBAAAAAAUAAEAA
Content-Length: 0
Upgrade: h2c
---
I have a multithreaded application and simultaneous requests might occur. Using a log framework with custom appenders is therefore probably not reliable.
I have an unfortunate answer to your question: Regrettably, impossible.
Some background on why this is the case:
The actual implementation of HttpRequest used by your average OpenJDK-based java-core-library implementation is not java.net.http.HttpRequest - that is merely an interface. It's jdk.internal.net.http.HttpRequestImpl.
This code has 2 separate lists of headers to send; one is the 'user headers' and the other is the 'system headers'. Your .headers() call retrieves solely the user headers, which are headers you explicitly asked to send, and, naturally, as you asked for none to send, it is empty.
The system headers are where those 6 headers come from. I don't think there is a way to get at these in a supported fashion. If you want to dip into unsupported strategies (where you write code that queries internal state and thus has no guarantee to work on other JVM implementations, or a future version of a basic JVM implementation), it's still quite difficult, unfortunately! Some basic reflection isn't going to get the job done here. It's the worst news imaginable:
These 6 headers just aren't set, at all, until send is invoked. For example, the three headers that are HTTP2 related are set in the package-private setH2Upgrade method, and this method is passed the HttpClient object, which proves that this cannot possibly be called except in the chain of events started when you invoke send. An HttpClient object doesn't exist in the chain of code that makes HttpRequest objects, which proves this.
To make matters considerably worse, the default HttpClient impl will first clone your HttpRequest, then does a bunch of ops on this clone (including adding those system headers), and then sends the clone, which means the HttpRequest object you have doesn't have any of these headers. Not even after the send call completes. So even if you are okay with fetching these headers after the send and are okay with using reflecting to dig into internal state to get em, it won't work.
You also can't reflect into the client because the relevant state (the clone of your httprequest object) isn't in a field, it's in a local variable, and reflection can't get you those.
An HttpRequest can be configured with custom proxies, which isn't much of a solution either: those are TCP/IP-level proxies, not HTTP proxies, and the headers are sent encrypted with HTTPS. Thus, writing code that (ab)uses the proxy settings so that you can make a 'proxy' that just bounces the connection around your own code first before sending it out, in order to see the headers in transit, is decidedly non-trivial.
The only solution I can offer you is to ditch java.net.http.HttpClient entirely and use a non-java-lib-core library that does do what you want. perhaps OkHttp. (Before you sing hallelujah, I don't actually know if OkHttp can provide you with all the headers it intends to send, or give you a way to register a hook that is duly notified, so investigate that first!)
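The user/system split described above is easy to observe: before send(), headers() only returns what you set yourself. The X-Trace-Id header below is a made-up example; none of the system headers (Host, User-Agent, ...) appear, because they are only added during send():

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class UserHeadersDemo {
    public static void main(String[] args) {
        // Nothing is sent here; we only build the request and inspect it.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:54113"))
                .header("X-Trace-Id", "abc-123") // hypothetical user header
                .build();
        // Only the single user header is present; system headers are absent.
        System.out.println("Header count: " + request.headers().map().size());
        request.headers().map().forEach((key, values) ->
                values.forEach(value -> System.out.println(key + ": " + value)));
    }
}
```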

RESTful API - chunked response for bulk operation

I work on a REST-like API that will support bulk operations on some resources. As it may take some time to finish such a request, I would like to return statuses of the operations in a chunked response. The media type should be JSON. How to do it with JAX-RS?
(I know that there is StreamingOutput, but it needs to manually serialize the data.)
Chunked Transfer encoding is usually used in cases where the content length is unknown when the sender starts transmitting the data. The receiver can handle each chunk while the server is still producing new ones.
This implies that the server is sending the whole time. I don't think it makes much sense to send I'm still working|I'm still working|I'm still working| in chunks, and as far as I know, chunked transfer encoding is handled transparently by most application servers: they switch automatically when the response is bigger than a certain size.
A common pattern for your use case looks like this:
The client triggers a bulk operation:
POST /batch-jobs HTTP/1.1
The server creates a resource which describes the status of the job and returns the URI in the Location header:
HTTP/1.1 202 Accepted
Location: /batch-jobs/stats/4711
The client checks this resource and receives a 200:
GET /batch-jobs/stats/4711 HTTP/1.1
This example uses JSON but you could also return plain text or add caching headers which tell the client how long he should wait for the next poll.
HTTP/1.1 200 OK
Content-Type: application/json
{ "status" : "running", "nextAttempt" : "3000ms" }
If the job is done, the server should answer with a 303 and the URI of the resource it has created:
HTTP/1.1 303 See other
Location: /batch-jobs/4711

How to know when HTTP-server is done sending data

I'm working on a browser/proxy oriented project where I need to download webpages. After sending a custom HTTP request to a web server I start listening for a server response.
When reading the response, I check the response headers for a Content-Length: row. If I get one, it's easy to determine when the server is done sending data, since I always know how many bytes I have received.
The problem occurs when the server doesn't include the Content-Length header and also keeps the connection open for further requests. For example, the google server responds with gzipped-content, but doesn't include content length. How do I know when to stop waiting for more data and close the connection?
I have considered using a timeout value to close the connection when no data has been received for a while, but this seems like the wrong way to do it. Chrome for example, can download the same pages as me and always seem to know exactly when to close the connection.
Have a look at IETF RFC 2616; search for chunked encoding and Content-Range.
HTTP is designed to return content of unknown length, as in:
HTTP/1.1 200 OK
Content-Type: text/plain
Transfer-Encoding: chunked
25
This is the data in the first chunk
1C
and this is the second one
3
con
8
sequence
0
source Wikipedia
I would suggest forcing the Connection: close header, so you are sure that the server closes the connection after the output is finished, no matter whether Content-Length is set. Performance will be partially affected by this.
There are two cases you can expect:
1. socket close
2. socket timeout
Usually the socket will be closed; it also makes sense to declare a socket timeout.
Remember that
int n = stream.read(buffer);
returns the number of bytes actually read, up to the buffer's size, before socket close or socket timeout.
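In code, the no-Content-Length case boils down to reading until end-of-stream, where read() returning -1 signals that the server has closed the connection. A sketch against an in-memory stream standing in for the socket:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadFullyDemo {
    public static void main(String[] args) throws IOException {
        // Stand-in for the socket's input stream.
        InputStream in = new ByteArrayInputStream("response body".getBytes());
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) { // -1 == EOF / socket closed
            buf.write(chunk, 0, n);
        }
        System.out.println("Read " + buf.size() + " bytes: " + buf);
    }
}
```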
Regards.

jax-ws change Content-type to Content-Type because server is hyper sensitive

I have to connect to a poorly implemented server that only understands Content-Type (capital-T) and not Content-type. How can I ask my JAX-WS client to send Content-Type?
I've tried:
Map<String, List<String>> headers = (Map<String, List<String>>)
((BindingProvider)port).getRequestContext().get(MessageContext.HTTP_REQUEST_HEADERS);
But headers is null. What am I doing wrong?
I have to connect to a poorly implemented server that only understands Content-Type(capital-T) and not Content-type. How can I ask my jax-ws client to send Content-Type?
I've dug this question a bit more and, sadly, I'm afraid the answer is: you can't. Let me share my findings.
First, the code that you'll find in https://jax-ws.dev.java.net/guide/HTTP_headers.html does not give you access to the HTTP headers of the future HTTP request (that hasn't been created at this point), it allows you to set additional HTTP headers for making a request (that will be added to the HTTP request later).
So, don't expect the following code to return anything but null if you haven't put headers in there before (and actually, you'll only ever get back what you put in yourself):
((BindingProvider)port).getRequestContext().get(MessageContext.HTTP_REQUEST_HEADERS);
Then, I did a little test based on the code provided in the same link:
AddNumbersImplService service = new AddNumbersImplService();
AddNumbersImpl port = service.getAddNumbersImplPort();
((BindingProvider)port).getRequestContext().put(MessageContext.HTTP_REQUEST_HEADERS,
Collections.singletonMap("X-Client-Version",Collections.singletonList("1.0-RC")));
port.addNumbers(3, 5);
And this is what I see in the HTTP request when running the client code:
POST /q2372336/addnumbers HTTP/1.1
Content-type: text/xml;charset="utf-8"
X-client-version: 1.0-RC
Soapaction: ""
Accept: text/xml, multipart/related, text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2
User-Agent: JAX-WS RI 2.1.6 in JDK 6
Host: localhost:8080
Connection: keep-alive
Content-Length: 249
Do you notice the difference? Only the first char of the X-Client-Version header is kept upper case; the rest is lowered!
And indeed, if you check the class c.s.x.w.t.Headers that is used to represent HTTP request (and response) headers, you'll see that it "normalizes" keys when they are added (in normalize(String)):
/* Normalize the key by converting to following form.
* First char upper case, rest lower case.
* key is presumed to be ASCII
*/
private String normalize (String key) {
...
}
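For illustration, here is a standalone reimplementation of that normalization rule (not the actual JAX-WS RI code, whose body is elided above); it reproduces exactly the header casing seen in the captured request:

```java
public class NormalizeDemo {
    // Illustrative reimplementation of the rule documented in the RI's
    // Headers class: first char upper case, rest lower case.
    static String normalize(String key) {
        if (key == null || key.isEmpty()) return key;
        return Character.toUpperCase(key.charAt(0))
                + key.substring(1).toLowerCase();
    }

    public static void main(String[] args) {
        System.out.println(normalize("Content-Type"));     // Content-type
        System.out.println(normalize("X-Client-Version")); // X-client-version
    }
}
```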
So, while the c.s.x.w.t.h.c.HttpTransportPipe class (my understanding is that this is where the HTTP request is created, this is also where previously added headers will be added to the HTTP request headers) actually adds "Content-Type" as key in a c.s.x.w.t.Headers instance, the key will be modified because of the previously mentioned implementation detail.
I may be wrong, but I don't see how this could be changed without patching the code. And the odd part is that I don't think this "normalizing" stuff is really RFC-compliant (I didn't check what the RFCs say about header case, though). I'm surprised. Actually, you should raise an issue.
So I see three options here (since waiting for a fix might not be an option):
Patch the code yourself and rebuild JAX-WS RI (with all the drawbacks of this approach).
Try another JAX-WS implementation like CXF for your client.
Let the request go through some kind of custom proxy to modify the header on the fly.
You can modify the HTTP headers from the RequestContext. If you have access to the port object you can cast it to a javax.xml.ws.BindingProvider, which will give you access to the RequestContext.
You might also want to remove the unaccepted "Content-type" header.
This page shows how to do it in a bit more detail: https://jax-ws.dev.java.net/guide/HTTP_headers.html
Let me know if you need more code samples, or if you paste some of your code I can show you how to modify it.
