Tomcat 8 - POST and PUT requests slow when deployed on RHEL

I have developed a REST API using the Spring Framework. When I deploy it in Tomcat 8 on RHEL, the response times for POST and PUT requests are much higher than on my local machine (Windows 8.1): 7-9 seconds on the RHEL server versus less than 200 milliseconds locally.
The RHEL server has 4 times the RAM and CPU of the local machine. Default Tomcat configurations are used on both Windows and RHEL. I have ruled out network latency because GET requests take roughly the same time as on the local machine, whereas the time to first byte is higher for POST and PUT requests.
I also tried profiling the remote JVM using VisualVM. There are no major hotspots in my custom code.
I was able to reproduce the same issue on other RHEL servers. Is there any Tomcat setting that could help fix this performance issue?

The profiling log you have posted tells us more or less nothing. It shows the following:
The blocking queue is blocking. That is normal, because blocking is its purpose. It means there is nothing to take from it.
It is waiting for a connection on the socket, which is also normal.
You do not specify the physical/hardware setup of your RHEL server. The operating system might not be the only factor here, and you cannot rule out network latency yet. If you have a SAN, the SAN itself may have latency. If you are using an SSD drive and the RHEL server is using a SAN with replication, you may experience network latency there.
I would check the disk I/O first rather than focus on the operating system. If the server is shared, other processes might be occupying the disk.
You say that latency is ruled out because the GET requests take the same time. That is not enough to rule it out: as I said, this is the latency between the client and the application server. It does not measure the latency between your app server machine and your SAN, disk, or whatever storage is there.
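One rough way to compare storage latency between the two machines is to time small synced writes from Java on each of them. The sketch below is only an illustration: the class name `DiskSyncProbe`, the 4 KiB block size, and the number of rounds are assumptions, and it measures whichever directory you point it at, not necessarily the storage Tomcat actually uses.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of Tomcat or Spring.
class DiskSyncProbe {

    /** Average time in microseconds for one 4 KiB write followed by a forced flush to the device. */
    static long averageSyncedWriteMicros(Path dir, int rounds) throws IOException {
        Path file = Files.createTempFile(dir, "probe", ".bin");
        ByteBuffer block = ByteBuffer.allocate(4096);
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.WRITE)) {
            long start = System.nanoTime();
            for (int i = 0; i < rounds; i++) {
                block.clear();                      // rewind so the full block is written again
                while (block.hasRemaining()) {
                    channel.write(block);
                }
                channel.force(true);                // ask the OS to flush data and metadata
            }
            return (System.nanoTime() - start) / 1_000 / rounds;
        } finally {
            Files.deleteIfExists(file);             // the channel is already closed here
        }
    }
}
```

Running `DiskSyncProbe.averageSyncedWriteMicros(someDir, 100)` on both machines, with `someDir` on the same volume the application writes to, gives a first indication of whether storage latency differs by orders of magnitude.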

Related

Failed to connect to Tomcat server on ec2 instance

UPDATE:
My goal is to learn what factors could overwhelm my little Tomcat server, and what I could do to resolve or mitigate an exception when it happens, without switching my server to a better machine. This is not a real app in a production environment, just my own experiment. (Besides changes on the server side, I may also change something on my client side.)
Both my client and server are very simple: the server only checks the URL format and sends a 201 code if it is correct. Each request sent from my client includes only a simple JSON body. There is no database involved. The two machines (t2-micro) run only the client and the server respectively.
My client is OkHttpClient(). To avoid timeout exceptions, I already set the timeouts to 1,000,000 milliseconds via setConnectTimeout, setReadTimeout, and setWriteTimeout. I also went to $CATALINA/conf/server.xml on my server and set connectionTimeout = "-1" (infinite).
ORIGINAL POST:
I'm trying to stress my server by having a client launch 3000+ threads that send HTTP requests to it. The client and server reside on different ec2 instances.
Initially, I encountered some timeout issues, but after I set the connection, read and write timeouts to a bigger value, that exception was resolved. However, with the same specification, I'm now getting a java.net.ConnectException: Failed to connect to my_host_ip:8080 exception, and I do not know its root cause. I'm new to multithreading and distributed systems. Can anyone please give me some insight into this exception?
Below are some screenshots from my ec2 instances:
1. Client:
2. Server:

Having gone through a similar exercise in the past, I can say that there is no definitive answer to the problem of scaling.
Here are some general troubleshooting steps that may lead to more specific information. I would suggest running tests that each tweak a few parameters and measuring the changes in CPU, logs, etc.
Please state what value you have set for the timeout. Increasing the timeout could cause your server (or client) to run out of threads quickly (because each thread can stay busy for longer). Question the need for increasing the timeout. Is there any processing that slows your server?
Check application logs, JVM usage, and memory usage on the client and the server. There will be some hints there.
Your client seems to hit 99%+ and then come down. This implies that there could be a problem on the client side, in that it maxes out during the test. You might want to resize your client so it can do more.
Look at open file handles. The limit on them should be sufficiently high.
Tomcat limits the number of threads it uses to handle load. You can check this in server.xml and raise it if required. However, the CPU doesn't actually max out on the server side, so this is unlikely to be the problem.
If you use a database, check its performance. Also check the JDBC connection settings. There is thread and timeout configuration at the JDBC level as well.
Is response compression set up on Tomcat? It will give much better throughput on the server, especially if the data sent back by each request is more than a few KB.
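For the open file handle check in the list above, the JVM can report its own descriptor usage on Unix-like systems. This is a minimal sketch, assuming a HotSpot/OpenJDK runtime where `com.sun.management.UnixOperatingSystemMXBean` is available. The class name `FileHandleCheck` is made up for illustration, and the numbers describe the JVM that runs the code, so it would have to run inside the client or server process (or be queried over JMX) to be meaningful.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper for illustration only.
class FileHandleCheck {

    /** Returns {open, max} file descriptor counts for this JVM, or {-1, -1} if the platform does not expose them. */
    static long[] descriptorCounts() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        return new long[] { -1, -1 };   // e.g. on Windows
    }
}
```

If the open count approaches the maximum while the test is running, sockets can no longer be opened, which would be one possible explanation for connection failures under load.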
--------Update----------
Based on the update to the question, a few more thoughts.
Since the application is fairly simple, the way to stress the server should be to start low and increase the load in increments while monitoring various things (CPU, memory, JVM usage, file handle count, network I/O).
The increments of load should be spread over several runs.
Start with something as low as 100 parallel threads.
Record as much information as you can after each run, and if the server holds up well, increase the load.
Suggested increments: 100, 200, 500, 1000, 1500, 2000, 2500, 3000.
At some level you will see that the server can no longer take it. That would be your breaking point.
As you increase the load and monitor, you will likely discover patterns that suggest tuning specific parameters. Each tuning attempt should then be tested again at the same level of multithreading. Any improvement will be obvious from the monitoring.
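The incremental approach above could be sketched roughly as follows. The class `SteppedLoad` and its methods are hypothetical, and the `request` callable is a placeholder for whatever sends one HTTP request (for example via OkHttp) and returns whether it succeeded. Monitoring of CPU, memory and file handles is assumed to happen separately between steps.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical load driver for illustration only.
class SteppedLoad {

    /** Runs `threads` copies of `request` in parallel and returns how many of them failed. */
    static int runStep(int threads, Callable<Boolean> request) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Boolean>> tasks = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                tasks.add(request);
            }
            int failures = 0;
            for (Future<Boolean> result : pool.invokeAll(tasks)) {   // waits for all tasks
                try {
                    if (!result.get()) {
                        failures++;
                    }
                } catch (ExecutionException e) {
                    failures++;                                      // an exception counts as a failure
                }
            }
            return failures;
        } finally {
            pool.shutdown();
        }
    }

    /** Walks the load levels in order; returns the first level with failures, or -1 if all pass. */
    static int findBreakingPoint(int[] levels, Callable<Boolean> request) throws InterruptedException {
        for (int level : levels) {
            int failures = runStep(level, request);
            System.out.printf("threads=%d failures=%d%n", level, failures);
            if (failures > 0) {
                return level;
            }
        }
        return -1;
    }
}
```

Calling `findBreakingPoint(new int[] {100, 200, 500, 1000, 1500, 2000, 2500, 3000}, request)` follows the suggested increments. In practice you would pause between levels to record the monitoring data before moving on.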

How to get rid of tcp-ip send delay in interprocess communication between a java and a php process in KVM VM's

I have a web application that consists of a Java part and a PHP part. When a user makes a request, the PHP process opens a TCP/IP connection to the Java process. It keeps this connection open for the duration of the request, and the connection is used to send a lot of information back and forth. This application runs very well as long as it's hosted on either a dedicated server or a VM that uses OpenVZ.
As soon as I try to host it on a KVM VM, it becomes extremely slow. The reason is that within a single user request the PHP process can easily do 1 or 2 thousand TCP/IP sends to the Java process. Since this is all done over the same connection, it really should not be a problem, but on KVM VMs each send seems to get about 20 milliseconds of delay, so a request that would normally take 0.1 seconds takes 20 seconds instead.
I'm not 100% sure KVM is to blame, but I have tested this on 3 different hosting providers using OpenVZ and another 3 different hosting providers using KVM. It runs perfectly fine on all the OpenVZ hosts, and the send delay problem is present on all the KVM hosts.
Oh, and I have TCP_NODELAY set on both the Java and the PHP side.
Any idea what I could try to make this work on KVM?

So, to answer my own question: it seems you won't be able to avoid that send latency, since even though it's on localhost, the traffic still has to go from the virtualization layer down to the network layer and back up.
However, instead of creating TCP sockets on localhost, the solution was to use Unix sockets, since Unix sockets do not access the network layer in any way.
As a bonus, using Unix sockets instead of TCP sockets gave my application a nice across-the-board performance boost, including on setups where it worked fine before.
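On the Java side, a Unix socket can be opened with the standard library on Java 16 or later (older JVMs need a third-party library). The following is only a sketch of the idea, not the code used in the application above: the class name `UnixSocketEcho` and the echo exchange are assumptions, and the echo thread stands in for the Java service that the PHP process would connect to.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.StandardProtocolFamily;
import java.net.UnixDomainSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical example; requires Java 16+ for Unix domain socket channels.
class UnixSocketEcho {

    /** Sends `message` over a Unix domain socket at `socketPath` and returns the echoed reply. */
    static String roundTrip(Path socketPath, String message) throws Exception {
        Files.deleteIfExists(socketPath);                   // a stale socket file would make bind fail
        UnixDomainSocketAddress address = UnixDomainSocketAddress.of(socketPath);
        try (ServerSocketChannel server = ServerSocketChannel.open(StandardProtocolFamily.UNIX)) {
            server.bind(address);
            Thread echo = new Thread(() -> {
                try (SocketChannel peer = server.accept()) {
                    ByteBuffer buf = ByteBuffer.allocate(1024);
                    while (peer.read(buf) != -1) {          // echo chunks until the client stops sending
                        buf.flip();
                        while (buf.hasRemaining()) {
                            peer.write(buf);
                        }
                        buf.clear();
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            echo.start();

            byte[] bytes = message.getBytes(StandardCharsets.UTF_8);
            try (SocketChannel client = SocketChannel.open(StandardProtocolFamily.UNIX)) {
                client.connect(address);
                ByteBuffer out = ByteBuffer.wrap(bytes);
                while (out.hasRemaining()) {
                    client.write(out);
                }
                client.shutdownOutput();                    // signals end of input to the echo side
                ByteBuffer reply = ByteBuffer.allocate(bytes.length);
                while (reply.hasRemaining() && client.read(reply) != -1) {
                    // keep reading until the whole reply has arrived
                }
                reply.flip();
                echo.join();
                return StandardCharsets.UTF_8.decode(reply).toString();
            }
        } finally {
            Files.deleteIfExists(socketPath);
        }
    }
}
```

The PHP side would connect to the same socket file path instead of a host and port. The path has to be short, since operating systems limit Unix socket paths to roughly a hundred bytes.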

Netty based application performance issues

I have a producer-consumer application based on Netty. The basic requirement was to build a message-oriented middleware (MOM).
So the MOM is based on the concept of queuing (Queuing makes systems loosely coupled and that was the basic requirement of the application).
The broker understands the MQTT protocol. We performed stress testing of the application on our local machine. These are the specs of the local machine.
We were getting great results. However, our production server runs Ubuntu on AWS, so we stress tested the same application on an AWS Ubuntu server. The performance was 10 times worse than on the local system. This is the configuration of the AWS server.
We have tried the following options to figure out where the issue is.
Initially we checked for bugs in our business logic. Did not find any.
Made the broker, client and all other dependencies the same on the Mac and on AWS, i.e. we installed the same versions on AWS as on the Mac.
Increased the ulimit on AWS.
Played with sysctl settings.
We were using Netty 4.1 and suspected a Netty bug, as there is no stable release of Netty 4.1 yet. So we even built the entire application using Netty 3.9.8 Final (stable), and we still faced the same issue.
Substantially increased the hardware configuration of the AWS machine.
Now we have literally run out of options. The java version is the same on both machines.
So the last resort for us is to rebuild the entire application using NodeJS, but that would require far more effort than tweaking something in Netty itself. We are not looking for Java-based alternatives to Netty, as we think this might even be a bug in the JVM's native NIO implementation on Mac and Ubuntu.
What other options can we try to solve this? Is this an inherent Netty issue, or does it have to do with internal implementations that differ between Mac and Ubuntu and lead to the performance differences we see?
EDIT
The stress testing parameters are as follows.
We had 1000 clients sending 1000 messages per second (Global rate).
We ran the test for about 10 minutes to note the latency.
On the server side we have 10 consumer threads handling the messages.
We have a new instance of ChannelHandler per client.
For boss pool and worker pool required by Netty, we used the Cached Thread pool.
We have tried tuning the consumer threads but to no avail.
Edit 2
These are the profiler results provided by jvmtop for one phase of load testing.

high net IO time in weblogic.net.http.MessageHeader.isHttp

I am using Jersey as a REST client running in WebLogic Server, and it looks like the HTTP client is spending a lot of time on network I/O. The call stack is below:
java.io.BufferedInputStream.read
weblogic.net.http.MessageHeader.isHttp
weblogic.net.http.MessageHeader.parseHeader
weblogic.net.http.HttpClient.parseHTTP
com.sun.jersey.api.client.WebResource$Builder.get
The performance profile shows that java.io.BufferedInputStream.read took 60% of the total request time waiting on network I/O. This can be seen even under a small load of 2 concurrent HTTP clients.
What could cause this network I/O problem?
My environment:
WebLogic Server 10.3
OS: Linux

Spending most of your application threads' time in network I/O is normal when using a blocking web framework. Moving bits over a network is orders of magnitude slower than, say, moving bits in and out of memory in a single computer.
Low level networking protocols are designed to guarantee a message gets where it's going without being changed en route, not to do that particularly fast.
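To see why a profiler attributes this time to a read call, consider the following sketch. It is a made-up example, not WebLogic or Jersey code: a local peer waits before answering, and the client thread spends that whole wait inside a blocking `read`. The class name `BlockingReadTimer` and the single-byte reply are assumptions for illustration.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demonstration of time spent inside a blocking read.
class BlockingReadTimer {

    /** Returns the milliseconds a blocking read waits for a peer that replies after `delayMillis`. */
    static long timeBlockingRead(long delayMillis) throws Exception {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread slowPeer = new Thread(() -> {
                try (Socket socket = server.accept();
                     OutputStream out = socket.getOutputStream()) {
                    Thread.sleep(delayMillis);      // stands in for remote processing time
                    out.write('x');
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
            slowPeer.start();

            try (Socket client = new Socket(InetAddress.getLoopbackAddress(), server.getLocalPort());
                 InputStream in = client.getInputStream()) {
                long start = System.nanoTime();
                in.read();                          // blocks until the peer finally writes
                long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
                slowPeer.join();
                return elapsedMillis;
            }
        }
    }
}
```

With a delay of 200 ms, almost all of the measured time is spent in `read`, even though the client itself does no work. A large share of request time in `BufferedInputStream.read` therefore usually reflects how long the remote side takes to respond rather than a problem in the reading code.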

current server status information for tomcat7

I am currently load testing my web application (Spring + Hibernate based) on a standalone Tomcat server (v7.0.27) on a Windows Server 2008 machine. I need to know how Tomcat behaves as bulk requests come in, e.g.
300 requests received - current heap size, whether the server is hung or unable to process requests, size of objects, number of objects, and so on.
Is there a way to see this already? (The info from the manager app is insufficient: current active threads and memory occupied do not meet my requirement.)
P.S. The maxThreads property of the Connector element is 350.
Update: Another issue I faced while load testing: in some cases Tomcat hangs when I send 300 requests.
Any help would be highly and greatly appreciated.

You can use jconsole, which ships with the JDK.
http://docs.oracle.com/javase/6/docs/technotes/guides/management/jconsole.html
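jconsole reads its numbers from the JVM's platform MXBeans, and the same values can be read in code if you want to log them during a load test. This is a minimal sketch with a made-up class name, `JvmSnapshot`. It reports on the JVM it runs in, so for Tomcat it would have to run inside the web application or connect remotely over JMX, which is not shown here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.lang.management.ThreadMXBean;

// Hypothetical helper for illustration only.
class JvmSnapshot {

    /** Returns a one-line summary of heap usage and thread counts for the current JVM. */
    static String snapshot() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        return String.format(
                "heapUsedMB=%d heapCommittedMB=%d liveThreads=%d peakThreads=%d",
                heap.getUsed() >> 20,               // bytes to MiB
                heap.getCommitted() >> 20,
                threads.getThreadCount(),
                threads.getPeakThreadCount());
    }
}
```

Logging `JvmSnapshot.snapshot()` every few seconds while the requests arrive gives a simple timeline of heap and thread growth. Object counts and sizes are not available this way and need a heap dump or a profiler.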

If the server hangs, there might be a deadlock.
You can try to attach with JProfiler, the monitoring section will show you the current locking situation and a possible deadlock.
Disclaimer: My company develops JProfiler.
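Independently of any particular profiler, the JDK itself can report lock cycles through `ThreadMXBean.findDeadlockedThreads()`. The sketch below uses a made-up class name, `DeadlockCheck`, and only sees threads of the JVM it runs in, so it would have to run inside Tomcat, for example from a diagnostic servlet (an assumption, not something described above).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only.
class DeadlockCheck {

    /** Returns one description per thread that is stuck in a lock cycle; empty if there is none. */
    static List<String> deadlockedThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findDeadlockedThreads();    // null when no deadlock is detected
        List<String> result = new ArrayList<>();
        if (ids == null) {
            return result;
        }
        for (ThreadInfo info : mx.getThreadInfo(ids)) {
            if (info != null) {                     // a thread may have ended in the meantime
                result.add(info.getThreadName()
                        + " waits for " + info.getLockName()
                        + " held by " + info.getLockOwnerName());
            }
        }
        return result;
    }
}
```

An empty result while the server appears hung suggests looking elsewhere, for example at exhausted thread or connection pools, since a hang is not always a deadlock.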
