High net IO time in weblogic.net.http.MessageHeader.isHttp

I am using Jersey as a REST client running in WebLogic Server, and it looks like the HTTP client is spending a lot of time in network I/O. The call stack is below:
java.io.BufferedInputStream.read
weblogic.net.http.MessageHeader.isHttp
weblogic.net.http.MessageHeader.parseHeader
weblogic.net.http.HttpClient.parseHTTP
com.sun.jersey.api.client.WebResource$Builder.get
The performance profile shows java.io.BufferedInputStream.read spending 60% of total request time waiting on network I/O. This can be seen even under a small load of two concurrent HTTP clients.
What are the possible causes of a network I/O problem like this?
My environment:
WebLogic Server 10.3
OS: Linux

Spending most of your application threads' time in network I/O is normal when using a blocking web framework. Moving bits over a network is orders of magnitude slower than, say, moving bits in and out of memory in a single computer.
Low-level networking protocols are designed to guarantee that a message gets where it's going without being changed en route, not to get it there particularly fast.
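If the remote endpoint is genuinely slow, you can at least bound how long each call blocks. A minimal sketch using the Jersey 1.x client API already in your stack (the URL and timeout values are illustrative, not from the original post):

import com.sun.jersey.api.client.Client;
import com.sun.jersey.api.client.ClientResponse;
import com.sun.jersey.api.client.WebResource;

public class TimeoutExample {
    public static void main(String[] args) {
        Client client = Client.create();
        // Bound the time spent blocked on the socket; values are illustrative.
        client.setConnectTimeout(2000); // ms to establish the TCP connection
        client.setReadTimeout(5000);    // ms to wait for response bytes

        WebResource resource = client.resource("http://example.com/api/items");
        ClientResponse response = resource.get(ClientResponse.class);
        System.out.println("Status: " + response.getStatus());
    }
}

This does not make the remote service any faster, but it keeps one slow dependency from pinning your worker threads indefinitely.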

Related

Tactics to explore performance degradation of java based web app over time

I am working on an enterprise Java application that already contains a lot of tools/frameworks, such as Struts, JAX-RS and Spring MVC. It contains UIs and REST endpoints bundled together in a .war file.
The project is evolving, and we are getting rid of the older tools, aiming to end up with only Spring MVC/WebFlux.
The application performs searches over millions of XML/JSON records, and the search engine was recently switched from MarkLogic to Elasticsearch.
What we have noticed is that in production, under fairly light usage (up to 1.7k rpm on 2-4 application nodes), response times on some of the endpoints increase over time.
Elasticsearch has room to grow and does not show any signs of heavy load.
So currently we have to restart/replace production instances every week or two, once the average response time exceeds 3 seconds instead of the usual 200-300 ms.
I tried to get CPU and heap flame graphs using async-profiler, but the load profile changes with every measurement, since a bunch of features are in use, so I cannot really compare how the graphs change over time.
Can you advise me on some tactics/approaches for finding the right place in the code?
Found the issue. It is related to connection pooling.
What we noticed is that over time the number of active Tomcat threads grew together with the response times:
On the image you can also see that the server was restarted on May 9th.
I was able to get a heap dump before the server was restarted, and after some digging found an interesting repeated piece in the thread stacks:
Thread xxx
at sun.misc.Unsafe.park(ZJ)V (Native Method)
at java.util.concurrent.locks.LockSupport.park(Ljava/lang/Object;)V (LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await()V (AbstractQueuedSynchronizer.java:2039)
at org.apache.http.pool.AbstractConnPool.getPoolEntryBlocking(Ljava/lang/Object;Ljava/lang/Object;JLjava/util/concurrent/TimeUnit;Ljava/util/concurrent/Future;)Lorg/apache/http/pool/PoolEntry; (AbstractConnPool.java:377)
at org.apache.http.pool.AbstractConnPool.access$200(Lorg/apache/http/pool/AbstractConnPool;Ljava/lang/Object;Ljava/lang/Object;JLjava/util/concurrent/TimeUnit;Ljava/util/concurrent/Future;)Lorg/apache/http/pool/PoolEntry; (AbstractConnPool.java:67)
at org.apache.http.pool.AbstractConnPool$2.get(JLjava/util/concurrent/TimeUnit;)Lorg/apache/http/pool/PoolEntry; (AbstractConnPool.java:243)
at org.apache.http.pool.AbstractConnPool$2.get(JLjava/util/concurrent/TimeUnit;)Ljava/lang/Object; (AbstractConnPool.java:191)
at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.leaseConnection(Ljava/util/concurrent/Future;JLjava/util/concurrent/TimeUnit;)Lorg/apache/http/HttpClientConnection; (PoolingHttpClientConnectionManager.java:282)
at org.apache.http.impl.conn.PoolingHttpClientConnectionManager$1.get(JLjava/util/concurrent/TimeUnit;)Lorg/apache/http/HttpClientConnection; (PoolingHttpClientConnectionManager.java:269)
at org.apache.http.impl.execchain.MainClientExec.execute(Lorg/apache/http/conn/routing/HttpRoute;Lorg/apache/http/client/methods/HttpRequestWrapper;Lorg/apache/http/client/protocol/HttpClientContext;Lorg/apache/http/client/methods/HttpExecutionAware;)Lorg/apache/http/client/methods/CloseableHttpResponse; (MainClientExec.java:191)
at org.apache.http.impl.execchain.ProtocolExec.execute(Lorg/apache/http/conn/routing/HttpRoute;Lorg/apache/http/client/methods/HttpRequestWrapper;Lorg/apache/http/client/protocol/HttpClientContext;Lorg/apache/http/client/methods/HttpExecutionAware;)Lorg/apache/http/client/methods/CloseableHttpResponse; (ProtocolExec.java:185)
at org.apache.http.impl.execchain.RetryExec.execute(Lorg/apache/http/conn/routing/HttpRoute;Lorg/apache/http/client/methods/HttpRequestWrapper;Lorg/apache/http/client/protocol/HttpClientContext;Lorg/apache/http/client/methods/HttpExecutionAware;)Lorg/apache/http/client/methods/CloseableHttpResponse; (RetryExec.java:89)
at org.apache.http.impl.execchain.RedirectExec.execute(Lorg/apache/http/conn/routing/HttpRoute;Lorg/apache/http/client/methods/HttpRequestWrapper;Lorg/apache/http/client/protocol/HttpClientContext;Lorg/apache/http/client/methods/HttpExecutionAware;)Lorg/apache/http/client/methods/CloseableHttpResponse; (RedirectExec.java:111)
at org.apache.http.impl.client.InternalHttpClient.doExecute(Lorg/apache/http/HttpHost;Lorg/apache/http/HttpRequest;Lorg/apache/http/protocol/HttpContext;)Lorg/apache/http/client/methods/CloseableHttpResponse; (InternalHttpClient.java:185)
at org.apache.http.impl.client.CloseableHttpClient.execute(Lorg/apache/http/client/methods/HttpUriRequest;Lorg/apache/http/protocol/HttpContext;)Lorg/apache/http/client/methods/CloseableHttpResponse; (CloseableHttpClient.java:83)
at org.apache.http.impl.client.CloseableHttpClient.execute(Lorg/apache/http/client/methods/HttpUriRequest;)Lorg/apache/http/client/methods/CloseableHttpResponse; (CloseableHttpClient.java:108)
at io.searchbox.client.http.JestHttpClient.executeRequest(Lorg/apache/http/client/methods/HttpUriRequest;)Lorg/apache/http/client/methods/CloseableHttpResponse; (JestHttpClient.java:136)
at io.searchbox.client.http.JestHttpClient.execute(Lio/searchbox/action/Action;Lorg/apache/http/client/config/RequestConfig;)Lio/searchbox/client/JestResult; (JestHttpClient.java:70)
at io.searchbox.client.http.JestHttpClient.execute(Lio/searchbox/action/Action;)Lio/searchbox/client/JestResult; (JestHttpClient.java:63)
...
In our case we are using the Jest library to talk to Elasticsearch.
Internally it uses the Apache HTTP Client and Apache HTTP Async Client.
As the thread dump shows, this thread was waiting for an available connection from the HTTP client's connection pool, and there were more threads with exactly the same stack.
What I also discovered is that we had set maxTotal (maximum total number of connections) to 20 and defaultMaxPerRoute (maximum connections per route) to 2:
By default the pool allows only 20 concurrent connections in total and two concurrent connections per a unique route. The two connection limit is due to the requirements of the HTTP specification. However, in practical terms this can often be too restrictive.
See Connection pools description.
So the fix I applied was to increase those values to 50 and 40 respectively.
I would still prefer to have these parameters unbounded so they can grow with usage, but for now we will stick with these values.
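For reference, with the plain Apache HttpClient 4.x API the same limits are set on the pooling connection manager. A minimal sketch (Jest exposes the same limits through its own client config; see the last question below):

import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;

public class PoolConfig {
    public static void main(String[] args) {
        PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();
        cm.setMaxTotal(50);           // total connections across all routes
        cm.setDefaultMaxPerRoute(40); // connections per route (i.e. per target host)

        CloseableHttpClient client = HttpClients.custom()
                .setConnectionManager(cm)
                .build();
        // Requests through this client now block only when all 50 connections,
        // or all 40 for a single route, are already leased.
    }
}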

Java Service: What Causes Latency Response Time Spike?

I have a very simple Java REST service. At lower traffic volumes, the service runs perfectly, with response times of ~1 ms and zero server backlog.
When traffic rises past a certain threshold, the response times skyrocket from 1 ms to 2.0 seconds, the HTTP active session queue and open file counts spike, and the server performs unacceptably. I posted a metrics graph of a typical six-hour window where traffic starts low and climbs above the problem threshold.
Any ideas on what could be causing this or how to diagnose further?
Your webapp uses a thread (borrowed from a thread pool) to serve each request.
Under heavy load, many threads are busy; if the number of requests exceeds the capacity of the pool, requests have to queue and wait until a thread becomes available again.
If your service does not run fast enough (especially if it does I/O, such as opening files), the wait time grows, which leads to slow responses.
The CPU has to switch between many threads, so CPU usage will also spike under load.
That is why you need load balancing across many webapp instances when serving at scale: the load is distributed across the instances, which improves the end-user experience. The queueing effect itself is easy to reproduce in isolation, as sketched below.
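A minimal sketch of the queueing effect (pool size and timings are illustrative):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class QueueingDemo {
    public static void main(String[] args) throws InterruptedException {
        // A pool of 2 "request handler" threads, like a tiny servlet container.
        ExecutorService pool = Executors.newFixedThreadPool(2);

        // Submit 10 "requests" that each block for 100 ms of simulated I/O.
        for (int i = 0; i < 10; i++) {
            final int id = i;
            final long submitted = System.nanoTime();
            pool.submit(() -> {
                long queuedMs = (System.nanoTime() - submitted) / 1_000_000;
                try {
                    Thread.sleep(100); // simulated blocking I/O
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                System.out.println("request " + id + " queued for " + queuedMs + " ms");
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        // Later requests report ever longer queue times: once the pool is
        // saturated, queueing, not the work itself, dominates latency.
    }
}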
The usual approach to diagnosis is to generate load with JMeter and investigate the results with Java VisualVM, the Eclipse Memory Analyzer, and so on. I don't know whether you have tried that.

Tomcat 8 - POST and PUT requests slow when deployed on RHEL

I have developed a REST API using the Spring Framework. When I deploy it in Tomcat 8 on RHEL, the response times for POST and PUT requests are very high compared to deployment on my local machine (Windows 8.1). On the RHEL server requests take 7-9 seconds, whereas on the local machine they take less than 200 milliseconds.
The RHEL server has 4 times the RAM and CPU of the local machine. Default Tomcat configurations are used on both Windows and RHEL. Network latency is ruled out because GET requests take more or less the same time as on the local machine, whereas the time to first byte is much higher for POST and PUT requests.
I even tried profiling the remote JVM using VisualVM. There are no major hotspots in my custom code.
I was able to reproduce the same issue on other RHEL servers. Is there any Tomcat setting that could help fix this performance issue?
The profiling log you have posted means, more or less, nothing. It shows the following:
The blocking queue is blocking. This is normal, because that is its purpose: to block when there is nothing to take from it.
It is waiting for a connection on the socket. This is also normal.
You do not specify the RHEL server's physical/hardware setup; the operating system might not be the only factor here, and you still cannot rule out network latency. If there is a SAN, the SAN may have latency of its own: even with SSD drives, a RHEL box using a SAN with replication can see network latency at that layer.
I am more inclined to first check the disk I/O than to focus on the operating system. If the server is shared, other processes might be occupying the disk.
You say that latency is ruled out because the GET requests take the same time. That is not enough to rule it out: as I said, it only covers the latency between the client and the application server; it says nothing about the latency between your app server machine and your SAN, disk, or whatever storage is there.
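A quick way to sanity-check storage latency from the JVM itself is to time a small fsynced write. A rough sketch (the file path and sizes are illustrative; OS tools such as iostat give a fuller picture):

import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class DiskLatencyProbe {
    public static void main(String[] args) throws Exception {
        try (RandomAccessFile raf = new RandomAccessFile("/tmp/probe.dat", "rw");
             FileChannel ch = raf.getChannel()) {
            ByteBuffer buf = ByteBuffer.allocate(4096); // one 4 KiB block
            for (int i = 0; i < 10; i++) {
                buf.rewind();
                long start = System.nanoTime();
                ch.write(buf, 0);
                ch.force(true); // flush the write to the physical device
                long micros = (System.nanoTime() - start) / 1_000;
                System.out.println("fsynced 4 KiB write took " + micros + " us");
            }
        }
    }
}

On a healthy local SSD each fsynced write usually completes within a few milliseconds at most; consistently higher numbers point at the storage path rather than at Tomcat.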

How to increase WebSocket throughput

I need to pull data from a lot of clients connecting to a Java server through a WebSocket.
There are a lot of WebSocket implementations, and I picked Vert.x.
I made a simple demo where I listen for text frames of JSON, parse them with Jackson, and send a response back. The JSON parser does not significantly influence the throughput.
I am getting an overall speed of 2.5k messages per second with 2 or 10 clients.
Then I tried buffering: clients don't wait for every single response but send a batch of messages (30k-90k) after a confirmation from the server. The speed increased up to 8k per second.
I see that the Java process has a CPU bottleneck: one core is at 100% usage.
Meanwhile, the Node.js client's CPU consumption is only 5%.
Even one client causes the server to eat almost a whole core.
Do you think it's worth trying other WebSocket implementations like Jetty?
Is there a way to scale Vert.x across multiple cores?
After I changed the log level from debug to info, I got 70k. The debug level causes Vert.x to print a message for every frame.
You can specify the number of verticle (thread) instances by configuring DeploymentOptions, as sketched below: http://vertx.io/docs/vertx-core/java/#_specifying_number_of_verticle_instances
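A minimal sketch of that option (the verticle class name is a placeholder):

import io.vertx.core.DeploymentOptions;
import io.vertx.core.Vertx;

public class Deploy {
    public static void main(String[] args) {
        Vertx vertx = Vertx.vertx();
        // Deploy one verticle instance per core so the event loops
        // are spread across all available CPUs.
        DeploymentOptions options = new DeploymentOptions()
                .setInstances(Runtime.getRuntime().availableProcessors());
        vertx.deployVerticle("com.example.WebSocketVerticle", options);
    }
}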
You were able to create more than 60k connections on a single machine, so I assume the average lifetime of a connection was less than a second. Is that what you expect in production? To compare other solutions you can try running https://github.com/smallnest/C1000K-Servers
Something doesn't sound right; that's very low performance. It sounds like Vert.x is not configured properly for parallelization. Are you using only one verticle (thread)?
The Kaazing Gateway is one of the first WS implementations. By default it uses multiple cores, and it is further configurable with threads, etc. We have users running it for massive IoT deployments, so your issue is not the JVM.
In case you're interested, here's the github repo: https://github.com/kaazing/gateway
Full disclosure: I work for Kaazing

Service using Jest is blocking on threadpool, why?

I have a Java + Spring app that queries Elasticsearch using the Jest client (a poor choice, because it is poorly documented). Elasticsearch has response times of about 8-20 ms with 150 concurrent connections, but my app's times go up to 900-1500 ms. A quick look at VisualVM tells me that processor usage is below 10%, and profiling tells me that 98% of the time the app does nothing but wait on the following method:
org.apache.http.pool.PoolEntryFuture.await()
which is part of Apache HttpCore, a dependency of Jest. I am not limited by the number of threads that can run on Tomcat (the max is 200, and VisualVM says the maximum number of threads during the experiment was 174), so it is not waiting for free threads.
I think the latency increase is excessive, and I suspect that Jest is using an internal thread pool that does not have enough threads to cope with all the requests, but I don't know.
Thoughts?
I think the latency increase is excessive, and I suspect that Jest is using an internal thread pool that does not have enough threads to cope with all the requests...
Poking around the source quickly, I see that you should be able to inject a ClientConfig into the Jest client factory.
The ClientConfig has the following setters, which seem to affect the internal Apache HTTP client connection manager:
clientConfig.maxTotalConnection(...);
clientConfig.defaultMaxTotalConnectionPerRoute(...);
clientConfig.maxTotalConnectionPerRoute(...);
Maybe tweaking some of those will give you more connections? Take a look at the JestClientFactory source to see what it is doing; a sketch of the wiring is below. We have definitely had to tweak those values in the past when making a large number of connections to the same server with HttpClient.
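A rough sketch of that wiring (the Elasticsearch URL and limits are illustrative, and the exact builder/setter names vary between Jest versions; newer releases use HttpClientConfig):

import io.searchbox.client.JestClient;
import io.searchbox.client.JestClientFactory;
import io.searchbox.client.config.HttpClientConfig;

public class JestSetup {
    public static void main(String[] args) {
        JestClientFactory factory = new JestClientFactory();
        factory.setHttpClientConfig(new HttpClientConfig.Builder("http://localhost:9200")
                .multiThreaded(true)                   // use the pooling connection manager
                .maxTotalConnection(50)                // total connections in the pool
                .defaultMaxTotalConnectionPerRoute(40) // connections per route (host)
                .build());
        JestClient client = factory.getObject();
        // Requests now block only when the whole pool is leased out, instead
        // of queueing behind a 2-connections-per-route default.
    }
}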
I would test this with just one connection and see what the average response time is. With just one thread you should have more than enough threads and resources. Most likely the process is waiting on an external resource such as a database or a network service.
