I tried to set up a sample Play Framework (version 2.2.2) Java application to test its performance on some simple use-case scenarios I had in mind. This is what I did:
Play controller
I wrote a basic Application controller to test the performance of a custom library I wanted to use in both sync and async scenarios:
import org.codehaus.jackson.JsonNode; // Jackson 1.x in Play 2.2; later Play versions use com.fasterxml.jackson.databind.JsonNode
import play.libs.F.Function;
import play.libs.F.Function0;
import play.libs.F.Promise;
import play.mvc.Controller;
import play.mvc.Http.Request;
import play.mvc.Result;

public class Application extends Controller {

    // Reads JSON from the request, applies some transformation and returns a new JsonNode
    public static JsonNode transform(Request request) {
        // ...
    }

    public static Result syncTest() {
        JsonNode node = transform(request());
        if (node.has("error")) {
            return badRequest(node);
        } else {
            return ok(node);
        }
    }

    public static Promise<Result> asyncTest() {
        final Request request = request();
        Promise<JsonNode> promise = Promise.promise(
                new Function0<JsonNode>() {
                    public JsonNode apply() {
                        return transform(request);
                    }
                });
        return promise.map(new Function<JsonNode, Result>() {
            public Result apply(JsonNode node) {
                if (node.has("error")) {
                    return badRequest(node);
                } else {
                    return ok(node);
                }
            }
        });
    }
}
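For completeness, the corresponding conf/routes entries would look roughly like this (the POST verb is an assumption, since the wrk runs below drive the endpoints with a post.lua script):

POST    /syncTest     controllers.Application.syncTest()
POST    /asyncTest    controllers.Application.asyncTest()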
I ran this service on a virtual machine on Azure with two 2.0 GHz cores and 3.4 GB of RAM.
Testing
I used wrk from a different machine to run tests against both the sync and async routes. These are the commands I ran and the results I got:
./wrk -s post.lua -d30s -c100 -t10 --latency http://my.proxy.net:8080/syncTest
Running 30s test @ http://my.proxy.net:8080/syncTest
  10 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    84.98ms   48.13ms  410.73ms   68.95%
    Req/Sec   121.23     18.90    181.00     73.67%
  Latency Distribution
     50%   81.36ms
     75%  112.51ms
     90%  144.44ms
     99%  231.99ms
  36362 requests in 30.03s, 10.99MB read
Requests/sec:   1210.80
Transfer/sec:    374.83KB
./wrk -s post.lua -d30s -c100 -t10 --latency http://my.proxy.net:8080/asyncTest
Running 30s test @ http://my.proxy.net:8080/asyncTest
  10 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    82.07ms   36.55ms  257.93ms   70.53%
    Req/Sec   122.44     15.39    161.00     73.24%
  Latency Distribution
     50%   80.26ms
     75%  102.37ms
     90%  127.14ms
     99%  187.17ms
  36668 requests in 30.02s, 11.09MB read
Requests/sec:   1221.62
Transfer/sec:    378.18KB
./wrk -s post.lua -d30s -c1000 -t10 --latency http://my.proxy.net:8080/syncTest
Running 30s test @ http://my.proxy.net:8080/syncTest
  10 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   842.98ms  617.40ms    4.18s    59.56%
    Req/Sec   118.02     16.82    174.00     77.50%
  Latency Distribution
     50%  837.67ms
     75%    1.14s
     90%    1.71s
     99%    2.51s
  35326 requests in 30.01s, 10.68MB read
  Socket errors: connect 0, read 27, write 0, timeout 181
Requests/sec:   1176.97
Transfer/sec:    364.35KB
./wrk -s post.lua -d30s -c1000 -t10 --latency http://my.proxy.net:8080/asyncTest
Running 30s test @ http://my.proxy.net:8080/asyncTest
  10 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     5.98s     4.53s    17.97s    72.66%
    Req/Sec    21.32     10.45     37.00     59.74%
  Latency Distribution
     50%    4.86s
     75%    8.30s
     90%   12.89s
     99%   17.10s
  6361 requests in 30.08s, 1.92MB read
  Socket errors: connect 0, read 0, write 0, timeout 8410
Requests/sec:    211.47
Transfer/sec:     65.46KB
During all tests, both of the server's CPUs were at 100%. Later, I repeated these experiments, but modified the Promises I was creating so they ran on an execution context other than the default one. In that case, the sync and async methods performed very similarly.
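A minimal sketch of that variation (the dispatcher name contexts.transform-context and its settings in application.conf are illustrative, not my exact configuration):

// In application.conf (illustrative):
//   contexts.transform-context {
//     fork-join-executor {
//       parallelism-factor = 2.0
//       parallelism-max = 24
//     }
//   }

import akka.dispatch.Futures;
import play.libs.Akka;
import play.libs.F.Promise;
import scala.concurrent.ExecutionContext;
import java.util.concurrent.Callable;

// Look up the custom dispatcher and run transform() on it, wrapping
// the resulting Scala Future back into a Play Promise.
ExecutionContext transformContext =
        Akka.system().dispatchers().lookup("contexts.transform-context");

Promise<JsonNode> promise = Promise.wrap(
        Futures.future(new Callable<JsonNode>() {
            public JsonNode call() {
                return transform(request);
            }
        }, transformContext));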
Questions
Why is it that, with 10 threads and 100 connections, both methods show similar latency and requests per second?
Why is it that, with 1000 connections, the async method seems to perform much worse than the sync one, or, when given a separate execution context, about the same as the sync method?
Is this because the transform method is not really CPU-intensive, because I implemented the async version incorrectly, or because I have completely misunderstood how this is supposed to work?
Thanks in advance!
Related
I have a gRPC (1.13.x) server in Java that performs no computation- or I/O-intensive work. The intention is to check how many requests per second this server can support on an 80-core machine.
Server:
ExecutorService executor = new ThreadPoolExecutor(160, Integer.MAX_VALUE,
        60L, TimeUnit.SECONDS,
        new SynchronousQueue<Runnable>(),
        new ThreadFactoryBuilder()
                .setDaemon(true)
                .setNameFormat("Glowroot-IT-Harness-GRPC-Executor-%d")
                .build());

Server server = NettyServerBuilder.forPort(50051)
        .addService(new MyService())
        .executor(executor)
        .build()
        .start();
Service:
@Override
public void verify(Request request, StreamObserver<Result> responseObserver) {
    Result result = Result.newBuilder()
            .setMessage("hello")
            .build();
    responseObserver.onNext(result);
    responseObserver.onCompleted();
}
I am using the ghz client to perform a load test. The server is able to handle 40k requests per second, but the RPS count will not exceed 40k even when I increase the number of concurrent clients, with an incoming request rate of 100k. The gRPC server handles just 40k requests per second and queues all the other requests. The CPU is underutilized (7%), and about 90% of the gRPC threads (those prefixed grpc-default-executor) were in a waiting state, despite there being no I/O operations; more than 25k threads were waiting.
Stack trace of a waiting thread:
grpc-default-executor-4605
PRIORITY : 5
THREAD ID : 0X00007F15A4440D80
NATIVE ID :
stackTrace:
java.lang.Thread.State: TIMED_WAITING (parking)
    at jdk.internal.misc.Unsafe.park(java.base@15.0.1/Native Method)
    - parking to wait for <0x00007f1df161ae20> (a java.util.concurrent.SynchronousQueue$TransferStack)
    at java.util.concurrent.locks.LockSupport.parkNanos(java.base@15.0.1/LockSupport.java:252)
    at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(java.base@15.0.1/SynchronousQueue.java:462)
    at java.util.concurrent.SynchronousQueue$TransferStack.transfer(java.base@15.0.1/SynchronousQueue.java:361)
    at java.util.concurrent.SynchronousQueue.poll(java.base@15.0.1/SynchronousQueue.java:937)
    at java.util.concurrent.ThreadPoolExecutor.getTask(java.base@15.0.1/ThreadPoolExecutor.java:1055)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@15.0.1/ThreadPoolExecutor.java:1116)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@15.0.1/ThreadPoolExecutor.java:630)
    at java.lang.Thread.run(java.base@15.0.1/Thread.java:832)
Locked ownable synchronizers:
    - None
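The parked frames in SynchronousQueue.poll line up with how the executor is constructed: a SynchronousQueue has zero capacity, so whenever no worker is free at submission time, a ThreadPoolExecutor with an unbounded maximum pool spawns a brand-new thread instead of queueing. A small standalone demonstration of that effect (the pool sizes here are illustrative, not the server's):

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class SynchronousQueueDemo {
    public static void main(String[] args) throws Exception {
        // Same shape as the server's executor: zero queueing capacity,
        // unbounded maximum pool size.
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                2, Integer.MAX_VALUE,
                60L, TimeUnit.SECONDS,
                new SynchronousQueue<Runnable>());

        CountDownLatch block = new CountDownLatch(1);
        for (int i = 0; i < 100; i++) {
            // Every task blocks, so no worker is ever free to take the
            // hand-off; the pool must create a fresh thread each time.
            executor.execute(() -> {
                try {
                    block.await();
                } catch (InterruptedException ignored) {
                }
            });
        }
        // Prints ~100 even though the core pool size is only 2.
        System.out.println("pool size: " + executor.getPoolSize());
        block.countDown();
        executor.shutdown();
    }
}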
How can I configure the server to support 100K+ requests?
Nothing in the gRPC stack seems to cause this limit. What's the average response time on the server side? It looks like you are limited by ephemeral ports or the TCP connection limit, and you may want to tweak your kernel as described here: https://www.metabrew.com/article/a-million-user-comet-application-with-mochiweb-part-1 or here: https://blog.box.com/ephemeral-port-exhaustion-and-web-services-at-scale
I'm writing a scheduled task in Thorntail that will run for a long time (approx. 30 minutes). However, it appears that Thorntail limits the execution time to 30 seconds.
My code looks like this (I've removed code that I believe is irrelevant):
@Singleton
public class ReportJobProcessor {

    @Schedule(hour = "*", minute = "*/30", persistent = false)
    public void processJobs() {
        // Acquire a list of jobs
        jobs.forEach(this::processJob);
    }

    private void processJob(ReportJob job) {
        // A long-running process
    }
}
After 30 seconds, I see the following in my logs:
2019-10-01 16:15:14,097 INFO [org.jboss.as.ejb3.timer] (EJB default - 2) WFLYEJB0021: Timer: [id=... timedObjectId=... auto-timer?:true persistent?:false timerService=org.jboss.as.ejb3.timerservice.TimerServiceImpl#42478b98 initialExpiration=null intervalDuration(in milli sec)=0 nextExpiration=Tue Oct 01 16:20:00 CEST 2019 timerState=IN_TIMEOUT info=null] will be retried
Another 30 seconds later, an exception is thrown because the job still hasn't completed.
I have no idea how to increase the timeout, and googling my issue returns nothing helpful.
How can I increase the timeout beyond 30 seconds?
I suggest you take a slightly different approach.
Have the scheduled task distribute jobs to asynchronously running stateless session beans (SLSBs), called ReportJobExecutor below, and return immediately after distributing them, so it never times out. The number of simultaneously running SLSBs can be adjusted in project-defaults.yml; the default count is 16, IIRC. This is a very basic example, but it demonstrates Java EE execution with a predefined bean pool invoked from an EJB timer. A more elaborate approach would be manual pooling of executors, which would let you control their lifecycle (e.g. killing them after a specified time).
@Singleton
public class ReportJobProcessor {

    @Inject
    ReportJobExecutor reportJobExecutor;

    @Schedule(hour = "*", minute = "*/30", persistent = false)
    public void processJobs() {
        // Acquire a list of jobs
        jobs.forEach(job -> reportJobExecutor.run(job));
    }
}

@Stateless
@Asynchronous
public class ReportJobExecutor {

    public void run(ReportJob job) {
        // do whatever with the job
    }
}
Idea #2: another approach would be the Java Batch Processing API (JSR 352); unfortunately, I am not familiar with that API.
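For the record, a minimal JSR 352 sketch of that idea might look like the following (the Batchlet class and the report-job name are made up for illustration; the job would also need a META-INF/batch-jobs/report-job.xml descriptor referencing the batchlet):

import java.util.Properties;
import javax.batch.api.AbstractBatchlet;
import javax.batch.runtime.BatchRuntime;
import javax.inject.Named;

@Named
public class ReportJobBatchlet extends AbstractBatchlet {

    @Override
    public String process() throws Exception {
        // The long-running work goes here; it executes on the batch
        // runtime's own threads rather than inside the EJB timer callback.
        return "COMPLETED";
    }
}

// Started from the @Schedule method instead of running the work inline:
// BatchRuntime.getJobOperator().start("report-job", new Properties());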
I have a cluster with the following configuration:
Number of nodes: 6
Machine: m3.2xlarge
Cores per node: 8
Memory per node: 30 GB
I am running a Spark application on it which reads data from HDFS and sends it to SNS/SQS.
I am using the following command to run this job:
spark-submit --class com.message.processor.HDFSMessageReader \
    --master yarn-client --num-executors 17 --executor-cores 30 --executor-memory 6G \
    /home/hadoop/panther-0.0.1-SNAPSHOT-jar-with-dependencies.jar \
    /user/data/jsonData/* arn:aws:sns:us-east-1:618673372431:pantherSNS \
    https://sns.us-east-1.amazonaws.com true
Here I kept the number of executors at its maximum and varied the number of executor cores; the results I got are below.
I referred to the blog post by Cloudera on how to calculate the number of executors and the executor memory.
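For scale, it helps to spell out the arithmetic: the cluster has 6 nodes x 8 cores = 48 physical cores in total, while Scenario 4 below requests 17 executors x 30 cores = 510 executor cores, i.e. roughly ten times the hardware's core count.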
Scenario 1:
--num-executors 17, --executor-cores 3
Result: total messages sent to SQS via SNS = 1.2 million

Scenario 2:
--num-executors 17, --executor-cores 10
Result: total messages sent to SQS via SNS = 4.4 million

Scenario 3:
--num-executors 17, --executor-cores 20
Result: total messages sent to SQS via SNS = 8.5 million

Scenario 4:
--num-executors 17, --executor-cores 30
Result: total messages sent to SQS via SNS = 12.7 million
How can these results be explained?
Could you please tell me why the application's performance increases as the number of executor cores goes up?
There are some nice libraries, like this one from Apache, but that's a little too complex for my purpose. All I need is the best estimate of HTTP latency (the time it takes to connect to the server, regardless of transfer speed).
I tried the HTTP connection code from this answer:
private void doPing() {
    // Remember the time before connecting
    long millis = System.currentTimeMillis();
    // try-with-resources closes the stream for us; we don't read any data
    try (BufferedReader reader = new BufferedReader(
            new InputStreamReader(url.openStream(), "UTF-8"))) {
        // 'times' is the list where we store our results
        times.add((int) (System.currentTimeMillis() - millis));
        // I see lots of these in the console, so it's probably working
        System.out.println("Request done...");
    }
    // If the internet is dead, does it throw an exception?
    catch (Exception e) {
        times.add(-1);
    }
}
The thing is that I am not so sure what I am measuring. Looping through the values gave me these results:
Testing connection to http://www.seznam.cz
Threads: 5
Loops per thread: 50
Given up waiting for results.
Average time to connection: 53.8 [ms]
Failures: 0.0%
Testing connection to http://www.seznam.cz
Threads: 5
Loops per thread: 100
Average time to connection: 43.58 [ms]
Failures: 0.0%
Testing connection to http://www.seznam.cz
Threads: 5
Loops per thread: 400
Average time to connection: 30.145 [ms]
Failures: 0.0%
Testing connection to http://www.stackoverflow.com
Threads: 5
Loops per thread: 30
Given up waiting for results.
Average time to connection: 4006.1111111111113 [ms]
Failures: 0.0%
Testing connection to http://www.stackoverflow.com
Threads: 5
Loops per thread: 80
Given up waiting for results.
Average time to connection: 2098.695652173913 [ms]
Failures: 0.0%
Testing connection to http://www.stackoverflow.com
Threads: 5
Loops per thread: 200
Given up waiting for results.
Average time to connection: 0.0 [ms]
//Whoops, connection dropped again
Failures: 100.0%
//Some random invalid url
Testing connection to http://www.sdsfdser.tk/
Threads: 4
Loops per thread: 20
Average time to connection: 0.0 [ms]
Failures: 100.0%
Not only am I unsure whether I calculated what I wanted (though it reflects my experience), I am also not sure what happens in non-standard cases:
Does the URL handle timeouts?
Will it always throw an exception on timeout?
While keeping in mind that this project is supposed to be simple and lightweight, could you tell me if I'm doing it right?
As hailin suggested, you can create a raw Socket and connect it to the server instead of using a URLConnection. I tried both, and I get much higher latency with your version; opening a URLConnection must be doing some additional work in the background, though I'm not sure what.
Anyway, here's the version using a Socket (add exception handling as needed):
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketAddress;
import java.net.SocketTimeoutException;

Socket s = new Socket();
SocketAddress a = new InetSocketAddress("www.google.com", 80);
int timeoutMillis = 2000;
long start = System.currentTimeMillis();
try {
    s.connect(a, timeoutMillis);
} catch (SocketTimeoutException e) {
    // timeout
} catch (IOException e) {
    // some other exception
}
long stop = System.currentTimeMillis();
times.add(stop - start);
try {
    s.close();
} catch (IOException e) {
    // closing failed
}
This resolves the hostname (www.google.com in the example), establishes a TCP connection on port 80 and adds the milliseconds it took to times. If you don't want the time for the DNS resolution in there, you can create an InetAddress with InetAddress.getByName("hostname") before you start the timer and pass that to the InetSocketAddress constructor.
Edit: InetSocketAddress's constructor also resolves the host name right away, so constructing it from a pre-resolved IP address shouldn't make a difference.
We use Solr as our full-text search engine. We use multiple cores with Solr, and we have approximately five hundred million documents. We batch-update our index whenever 1000 new records accumulate, without committing them, because we use autocommit in Solr. Recently, when we batch-post new records, the CPU usage climbs close to 100%.
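The indexing pattern described above corresponds roughly to this SolrJ sketch (the URL, core name, field names, and the Record type are illustrative, not the actual code):

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

// Buffer documents and post them in batches of 1000 with no explicit
// commit; durability is left to Solr's autoCommit in solrconfig.xml.
SolrServer server = new HttpSolrServer("http://localhost:8983/solr/core0");
List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
for (Record record : newRecords) {
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", record.getId());
    doc.addField("text", record.getText());
    batch.add(doc);
    if (batch.size() == 1000) {
        server.add(batch); // no server.commit() here
        batch.clear();
    }
}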
I dumped the high-CPU threads; most of them look like the following:
java.lang.Thread.State: RUNNABLE
    at org.apache.lucene.search.ConjunctionTermScorer.doNext(ConjunctionTermScorer.java:64)
    at org.apache.lucene.search.ConjunctionTermScorer.nextDoc(ConjunctionTermScorer.java:95)
    at org.apache.lucene.search.Scorer.score(Scorer.java:63)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:605)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297)
    at org.apache.solr.search.SolrIndexSearcher.getDocSetNC(SolrIndexSearcher.java:1060)
    at org.apache.solr.search.SolrIndexSearcher.getPositiveDocSet(SolrIndexSearcher.java:763)
    at org.apache.solr.search.SolrIndexSearcher.getProcessedFilter(SolrIndexSearcher.java:880)
    at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1337)
    at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1304)
    at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:395)
    at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:410)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)