First off, has anyone done a performance comparison for throughput/latency between a gRPC client-server implementation vs. a websocket+protobuf client-server implementation? Or at least something similar.
In order to reach this goal, I am trying out the example Java helloworld gRPC client-server and comparing the latency of the response with a similar websocket client-server. Currently I am trying this out with both client and server on my local machine.
The websocket client-server has a simple while loop on the server side. The gRPC server, I notice, uses an asynchronous execution model. I suspect it creates a new thread for each client request, resulting in additional processing overhead. For instance, the websocket response latency I am measuring is on the order of 6-7 ms, while the gRPC example is showing a latency of about 600-700 ms, even accounting for protobuf overhead.
In order to do a similar comparison for grpc, is there a way to run the grpc server synchronously? I want to be able to eliminate the overhead of the thread creation/dispatch and other such internal overhead introduced by the asynchronous handling.
I do understand that there is protobuf overhead involved in gRPC that is not there in my websocket client-server example. However, I can account for that by measuring the overhead introduced by the protobuf processing itself.
Also, if I cannot run the gRPC server synchronously, can I at least measure the thread dispatch/asynchronous processing overhead?
I am relatively new to Java, so pardon my ignorance.
Benchmarking in Java is easy to get wrong. You need to do many seconds' worth of warm-up for multiple levels of JIT to kick in. You also need time for the heap size to level off. In a simplistic one-shot benchmark, it's easy to see that the code that runs last is fastest (independent of what that code is), due to class loading. 600 ms is an insanely large number for gRPC latency; we see around 300 µs median latency on Google Compute Engine between two machines, with TLS. I expect you have no warm-ups, so you are counting the time it takes for Java to load gRPC and are measuring Java using its interpreter with gRPC.
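A minimal warm-up sketch along those lines, assuming the grpc-java helloworld generated classes and a plaintext server on localhost:50051 (on older grpc-java versions usePlaintext() takes a boolean argument; the iteration counts are arbitrary):

```java
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;
import io.grpc.examples.helloworld.GreeterGrpc;
import io.grpc.examples.helloworld.HelloReply;
import io.grpc.examples.helloworld.HelloRequest;

public class WarmupBenchmark {
    public static void main(String[] args) throws Exception {
        ManagedChannel channel = ManagedChannelBuilder.forAddress("localhost", 50051)
                .usePlaintext()
                .build();
        GreeterGrpc.GreeterBlockingStub stub = GreeterGrpc.newBlockingStub(channel);

        // Warm up: let class loading, JIT compilation and connection setup happen
        // before any measurement is taken.
        for (int i = 0; i < 10_000; i++) {
            stub.sayHello(HelloRequest.newBuilder().setName("warmup").build());
        }

        // Measure only after warm-up; a real benchmark would record many samples
        // and report percentiles, not a single call.
        long start = System.nanoTime();
        HelloReply reply = stub.sayHello(HelloRequest.newBuilder().setName("test").build());
        long micros = (System.nanoTime() - start) / 1_000;
        System.out.println("Reply: " + reply.getMessage() + " in " + micros + " µs");

        channel.shutdownNow();
    }
}
```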
There is not a synchronous version of the gRPC server, and even if there was it would still run with a separate thread by default. grpc-java uses a cached thread pool, so after an initial request gRPC should be able to re-use a thread for calling the service.
The cost of jumping between threads is generally low, although it can add tail latency. In some in-process NOOP benchmarks we see RPC completion in 8 µs using the extra threads and 4 µs without. If you really want though, you can use serverBuilder.directExecutor() to avoid the thread hop. Note that most services would get slower with that option and have really poor tail latency, as service processing can delay I/O.
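For illustration, a sketch of wiring the helloworld server with directExecutor(), assuming the GreeterImpl service class from the helloworld example is on the classpath:

```java
import io.grpc.Server;
import io.grpc.ServerBuilder;

public class DirectExecutorServer {
    public static void main(String[] args) throws Exception {
        // directExecutor() runs service handlers on the transport (Netty) threads,
        // avoiding the hop to the default cached thread pool. Only safe if the
        // handlers never block, otherwise they stall I/O for other calls.
        Server server = ServerBuilder.forPort(50051)
                .directExecutor()
                .addService(new GreeterImpl()) // GreeterImpl: the helloworld service impl
                .build()
                .start();
        server.awaitTermination();
    }
}
```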
In order to do a similar comparison for grpc, is there a way to run the grpc server synchronously? I want to be able to eliminate the overhead of the thread creation/dispatch and other such internal overhead introduced by the asynchronous handling.
You can create a synchronous client, but generally the asynchronous one is way faster (tested in Scala): you can simply use all the resources you have in a non-blocking way. I would create a test of how many requests from how many clients the server can handle per second. You can then limit the incoming requests per client to make sure that your service will not crash. Asynchronous is also a better fit for HTTP/2, which provides multiplexing.
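In grpc-java terms the choice is which generated stub you call through; a sketch in Java (the answerer tested in Scala), using the helloworld generated stubs and an existing channel:

```java
import com.google.common.util.concurrent.ListenableFuture;
import io.grpc.ManagedChannel;
import io.grpc.examples.helloworld.GreeterGrpc;
import io.grpc.examples.helloworld.HelloReply;
import io.grpc.examples.helloworld.HelloRequest;

public class StubStyles {
    static void demo(ManagedChannel channel) {
        HelloRequest request = HelloRequest.newBuilder().setName("world").build();

        // Blocking (synchronous) stub: the calling thread waits for each response.
        GreeterGrpc.GreeterBlockingStub blocking = GreeterGrpc.newBlockingStub(channel);
        HelloReply reply = blocking.sayHello(request);

        // Future (asynchronous) stub: calls return immediately, so one thread can keep
        // many requests in flight over the multiplexed HTTP/2 connection.
        GreeterGrpc.GreeterFutureStub async = GreeterGrpc.newFutureStub(channel);
        ListenableFuture<HelloReply> future = async.sayHello(request);
    }
}
```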
For a benchmark I can recommend Metrics. You can expose the metrics via a log or an HTTP endpoint.
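Assuming "Metrics" refers to the Dropwizard Metrics library, a minimal timer sketch might look like this (registry name, reporting interval, and callService() are placeholders):

```java
import com.codahale.metrics.ConsoleReporter;
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.Timer;
import java.util.concurrent.TimeUnit;

public class LatencyMetrics {
    private static final MetricRegistry registry = new MetricRegistry();
    private static final Timer requests = registry.timer("rpc-requests");

    public static void main(String[] args) {
        // Print percentiles (median, p95, p99, ...) every 10 seconds.
        ConsoleReporter reporter = ConsoleReporter.forRegistry(registry)
                .convertDurationsTo(TimeUnit.MICROSECONDS)
                .build();
        reporter.start(10, TimeUnit.SECONDS);

        for (int i = 0; i < 100_000; i++) {
            try (Timer.Context ignored = requests.time()) {
                callService(); // the RPC / websocket round trip being measured
            }
        }
    }

    private static void callService() { /* placeholder for the actual call */ }
}
```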
I am new to the RESTful web services world and I have a question regarding how a WS works.
Context:
I am developing a RESTful WS that will have a high load; at any given time I can have, let's say, up to 10 clients sending multiple requests. All the requests will be sent to port 80.
I am developing the WS with Jersey (Java) and deploying it on a Tomcat web server.
Question:
Let's say we have 5 clients that send requests at the same time; each one sends 2 requests to port 80; will they be treated in FIFO order? Can we have some sort of multi-threading if let's say we don't care about the order?
It all depends on what server you use and how it is configured. The standard configuration (you have to work hard to make it non-standard) is to have multiple threads. In other words, the server usually automatically creates or uses another thread for each new request, and it is almost certain that requests will be processed in parallel.
You can actually see it inside your running code by using java.lang.Thread.currentThread() - print the name of the current thread along with the REST request and you will see.
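For example, a throwaway Jersey resource along these lines (the path and class name are arbitrary) will show a different worker thread name for parallel requests:

```java
import javax.ws.rs.GET;
import javax.ws.rs.Path;

@Path("/whoami")
public class ThreadInfoResource {

    @GET
    public String whoAmI() {
        // Each concurrent request is typically handled by a different container thread,
        // e.g. "http-nio-8080-exec-3" on Tomcat.
        return "Handled by thread: " + Thread.currentThread().getName();
    }
}
```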
To answer your question: a thread will be fetched from the thread pool to serve every request you send. The server does not care about the order; the request that comes first will be served first.
More about the servers:
I suggest you use Nginx or Apache as a reverse proxy in front to enable high performance; a thread will be fetched from the thread pool to serve each request. To improve performance, you can increase the thread pool size. However, too many threads will, on the other hand, reduce your performance, because the frequency of switching from thread to thread increases. You don't want to have a very large thread pool.
If you are using Apache + Tomcat, you basically have the same situation as when using Tomcat alone, but Apache is more suitable than Tomcat as the front-end web server. In real life, companies use Apache as a reverse proxy that dispatches requests to Tomcat.
Apache and Tomcat are multithread-based servers; their performance drops when there are too many requests. If you have to handle a lot of requests, you can use Nginx.
Nginx is an event-based server; it uses a queue to store requests and dispatches them FIFO. It can handle a lot of requests with far fewer threads, so its performance is more stable even with a larger number of requests. However, with an extremely large number of requests, Nginx will also be overwhelmed, as its event loop has no room for extra requests.
Companies deal with this situation using distributed-systems concepts, for example a load balancer. But to answer your question, that's a little too much. Check this article and this article to gain a better idea about Nginx and Apache.
I need to pull data from a lot of clients connecting to a java server through a web socket.
There are a lot of web socket implementations, and I picked vert.x.
I made a simple demo where I listen to text frames of JSON, parse them with Jackson and send a response back. The JSON parser doesn't significantly influence the throughput.
I am getting an overall rate of about 2.5k messages per second with 2 or 10 clients.
Then I tried to use buffering, so that clients don't wait for every single response but send a batch of messages (30k - 90k) after a confirmation from the server - the rate increased up to 8k per second.
I see that the java process has a CPU bottleneck - 1 core is used at 100%.
Meanwhile the Node.js client's CPU consumption is only 5%.
Even 1 client causes the server to eat almost a whole core.
Do you think it's worth trying other websocket implementations like Jetty?
Is there a way to scale vert.x across multiple cores?
After I changed the log level from debug to info I got 70k. The debug level causes vert.x to print messages for every frame.
It's possible to specify the number of verticle (thread) instances by e.g. configuring DeploymentOptions http://vertx.io/docs/vertx-core/java/#_specifying_number_of_verticle_instances
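For example, something along these lines deploys one verticle instance per core (Vert.x 3.x API assumed; the verticle class name is a placeholder):

```java
import io.vertx.core.DeploymentOptions;
import io.vertx.core.Vertx;

public class Main {
    public static void main(String[] args) {
        Vertx vertx = Vertx.vertx();
        // One instance per core; each instance gets its own event loop,
        // so the websocket load is spread across cores.
        DeploymentOptions options = new DeploymentOptions()
                .setInstances(Runtime.getRuntime().availableProcessors());
        vertx.deployVerticle("com.example.WebSocketServerVerticle", options);
    }
}
```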
You were able to create more than 60k connections on a single machine, so I assume the average duration of a connection was less than a second. Is that what you expect in production? To compare other solutions you can try to run https://github.com/smallnest/C1000K-Servers
Something doesn't sound right. That's very low performance. It sounds like vert.x is not configured properly for parallelization. Are you using only one verticle (thread)?
The Kaazing Gateway is one of the first WS implementations. By default, it uses multiple cores and is further configurable with threads, etc. We have users using it for massive IoT deployments so your issue is not the JVM.
In case you're interested, here's the github repo: https://github.com/kaazing/gateway
Full disclosure: I work for Kaazing
I have a Java + Spring app that queries Elasticsearch using the Jest client (a poor choice because it is poorly documented). Elasticsearch has response times of about 8-20 ms with 150 concurrent connections, but my app goes up to 900-1500 ms. A quick look at VisualVM tells me that processor usage is below 10%, and profiling tells me that 98% of the time all the app does is wait on the following method
org.apache.http.pool.PoolEntryFuture.await()
which is part of Apache HttpCore and a dependency of Jest. I don't have a limitation in terms of threads that can run on Tomcat (the max is 200, and VisualVM says that the maximum number of threads during the experiment was 174). So it's not waiting for free threads.
I think that the latency increase is excessive, and I suspect that Jest is using an internal thread pool that does not have enough threads to cope with all the requests, but I don't know.
Thoughts?
I think that the latency increase is excessive and I suspect that Jest is using an internal thread pool that does not have enough threads to cope with all the requests...
Poking around the source real fast, I see that you should be able to inject a ClientConfig into the Jest client factory.
The ClientConfig has the following setters which seem to impact the internal Apache http client connection manager:
clientConfig.maxTotalConnection(...);
clientConfig.defaultMaxTotalConnectionPerRoute(...);
clientConfig.maxTotalConnectionPerRoute(...);
Maybe tweaking some of those will give you more connections? Take a look at the JestClientFactory source to see what it is doing. We've definitely had to tweak those values in the past when making a large number of connections to the same server using HttpClient.
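A rough sketch of what that could look like (in newer Jest versions the config class is HttpClientConfig rather than ClientConfig; the server URL and pool sizes here are just placeholders):

```java
import io.searchbox.client.JestClient;
import io.searchbox.client.JestClientFactory;
import io.searchbox.client.config.HttpClientConfig;

public class JestSetup {
    static JestClient buildClient() {
        JestClientFactory factory = new JestClientFactory();
        factory.setHttpClientConfig(new HttpClientConfig.Builder("http://localhost:9200")
                .multiThreaded(true)
                .maxTotalConnection(200)                // total pooled connections
                .defaultMaxTotalConnectionPerRoute(200) // pooled connections per ES host
                .build());
        return factory.getObject();
    }
}
```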
I would test this with just one connection and see what the average response time is. With just one thread you should have more than enough threads and resources, etc. Most likely the process is waiting on an external resource like a database or a network service.
I am writing a benchmarking tool to run against a web application. The problem that I am facing is that the first request to the server always takes significantly longer than subsequent requests.
I have experienced this problem with the Apache HTTP client 3.x, 4.x and the Google HTTP client. The Apache HTTP client 4.x shows the biggest difference (the first request takes about seven times longer than subsequent ones). For Google and 3.x it is about 3 times longer.
My tool has to be able to benchmark simultaneous requests with threads. I cannot use one instance of e.g. HttpClient and call it from all the threads, since this throws a concurrency exception. Therefore, I have to use an individual instance in each thread, which will only execute a single request. This changes the overall results dramatically.
I do not understand this behavior. I do not think it is due to a caching mechanism on the server, because a) the webapp under consideration does not employ any caching (to my knowledge) and b) this effect is also visible when first requesting www.hostxy.com and afterwards www.hostxy.com/my/webapp.
I use System.nanoTime() immediately before and after calling client.execute(get) or get.execute(), respectively.
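A minimal sketch of that kind of measurement with HttpClient 4.3+ (not necessarily my exact code; the URL is taken from the example hosts above):

```java
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class SingleRequestTiming {
    public static void main(String[] args) throws Exception {
        try (CloseableHttpClient client = HttpClients.createDefault()) {
            HttpGet get = new HttpGet("http://www.hostxy.com/my/webapp");

            // Time only the execute() call, mirroring the measurement described above.
            long start = System.nanoTime();
            CloseableHttpResponse response = client.execute(get);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;

            EntityUtils.consume(response.getEntity()); // read the body, release the connection
            response.close();
            System.out.println("Request took " + elapsedMs + " ms");
        }
    }
}
```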
Does anyone have an idea where this behavior stems from? Do these HTTP clients themselves do any caching? I would be very grateful for any hints.
Read this about connection pooling: http://hc.apache.org/httpcomponents-client-ga/tutorial/html/connmgmt.html
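On that note, with HttpClient 4.3+ a single pooled client can be shared across all benchmark threads instead of one instance per thread; a sketch (pool sizes are arbitrary):

```java
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;

public class SharedClient {
    static CloseableHttpClient build() {
        PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();
        cm.setMaxTotal(100);           // connections across all hosts
        cm.setDefaultMaxPerRoute(50);  // connections per host
        // The resulting client is thread-safe and reuses established (keep-alive)
        // connections, so only the first request per route pays the setup cost.
        return HttpClients.custom().setConnectionManager(cm).build();
    }
}
```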
Your first connection probably takes the longest because it is a keep-alive connection (Connection: keep-alive); the following requests can reuse that connection once it has been established. This is just a guess.
Are you hitting a JSP for the first time after a server start? If the server flushes its working directory on each start, then on the first hit the JSPs compile, which takes a long time.
Also done on the first transaction: if the transaction uses a CA cert trust store, it will be loaded and cached.
You may also want to read about HttpClient caching:
http://hc.apache.org/httpcomponents-client-ga/tutorial/html/caching.html
If your problem is that the "first HTTP request to a specific host is significantly slower", maybe the cause of this symptom is on the server, while you are concerned about the client.
If the "specific host" you are calling is a Google App Engine application (or one on any other cloud platform), it is normal for the first call to that application to make you wait a little longer. That is because Google puts it in a dormant state after a period of inactivity.
After a recent call (which usually takes longer to respond), subsequent calls get faster responses, because the server instances are all awake.
Take a look at this:
Google App Engine Application Extremely slow
I hope it helps you, good luck!
I have a relatively simple Java service that fetches information from various SOAP web services, using Apache CXF 2.5.2 under the hood. The service launches 20 worker threads to churn through 1000-8000 requests every hour, and each request could make 2-5 web service calls depending on the nature of the request.
Setup
I am using connection pooling on the webservice connections
Connection Timeout is set to 2 seconds in order to realistically tackle the volume of requests efficiently.
All connections are going out through a http proxy.
20 Worker Threads
Grunty 16 cpu box
The problem is that I am starting to see 'connect time out' errors in the logs, quite a large number of them, and it seems that the application is also affecting the machine's network performance: curl from the command line takes >5 seconds just to establish a connection to the same web services. However, when I stop the service application, curl performance improves drastically to <5 ms.
How have other people tackled this situation using CXF? Did it work, or did they switch to a different library? If you were to start from scratch, how would you design for 'small payload, high frequency' transactions?
We once had a similar problem where requests took a very long time to complete. It is not a CXF issue; any web service stack will take a long time under very frequent requests.
To solve this issue we implemented a JMS EJB message-driven bean. The flow was as follows: when users sent their requests to the web service, all requests were put into a JMS queue, so the response to the user came back very quickly and the request was left to be processed in the background. Later the users were able to see the status of their operations: whether they were still queued, currently processing, completed successfully, or had failed for some reason.
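A minimal sketch of that kind of message-driven bean (JMS 2.0 / EJB 3.2 activation properties assumed; the queue name, class name, and payload handling are illustrative):

```java
import javax.ejb.ActivationConfigProperty;
import javax.ejb.MessageDriven;
import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.TextMessage;

@MessageDriven(activationConfig = {
    @ActivationConfigProperty(propertyName = "destinationType", propertyValue = "javax.jms.Queue"),
    @ActivationConfigProperty(propertyName = "destinationLookup", propertyValue = "jms/RequestQueue")
})
public class SoapRequestProcessorBean implements MessageListener {

    @Override
    public void onMessage(Message message) {
        try {
            String payload = ((TextMessage) message).getText();
            // Call the downstream SOAP service(s) here and persist the outcome
            // so the user can later see queued / processing / done / failed.
            process(payload);
        } catch (JMSException e) {
            throw new RuntimeException("Failed to read request message", e);
        }
    }

    private void process(String payload) { /* placeholder for the actual SOAP calls */ }
}
```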
If I had to design a frequent-transactions application, I would definitely use JMS for that.
Hope this helps.