Elasticsearch client on AWS / Lambda / Java - 2.5 seconds client startup time - java

We're using AWS Lambda (Java) and the Elasticsearch client to connect to a hosted Elasticsearch instance on AWS. On the first request I encounter a long wait of about 2.5 seconds (on top of the cold start). After that it is very quick. I can't really figure out where this delay is coming from, and I'm trying to optimize it.
private void testPerformanceElasticSearch() throws Exception {
    log.info("1. Before testing elasticsearch client");
    AWS4Signer signer = new AWS4Signer();
    signer.setServiceName("es");
    signer.setRegionName("eu-west-1");
    HttpRequestInterceptor interceptor = new AWSRequestSigningApacheInterceptor(
            "es", signer, new DefaultAWSCredentialsProviderChain());
    String endpoint = "https://" + Utils.getEnvironmentVariable("ELASTIC_SEARCH_ENDPOINT");
    RestHighLevelClient restHighLevelClient = new RestHighLevelClient(
            RestClient.builder(HttpHost.create(endpoint))
                    .setHttpClientConfigCallback(hacb -> hacb.addInterceptorLast(interceptor)));
    log.info("2. After getting elasticsearch client");
    log.info("3. Before doing a elasticsearch query");
    log.info("4");
    BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
    log.info("5");
    TermsQueryBuilder termsQueryBuilder = QueryBuilders.termsQuery("userId", "abc");
    log.info("6");
    boolQueryBuilder.must(termsQueryBuilder);
    log.info("7");
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    log.info("8");
    searchSourceBuilder.query(boolQueryBuilder);
    log.info("9");
    SearchRequest searchRequest = new SearchRequest("users");
    log.info("10");
    searchRequest.source(searchSourceBuilder);
    log.info("11");
    restHighLevelClient.search(searchRequest);
    log.info("12");
    log.info("13. After testing elasticsearch");
}
And then I get logging like this; you can see that between '5' and '6' there is a delay of more than 2 seconds which I can't really place:
17:16:06.871 INFO [PlacesPerformance] 1. Before testing elasticsearch client
17:16:06.932 INFO [PlacesPerformance] 2. After getting elasticsearch client
17:16:06.933 INFO [PlacesPerformance] 3. Before doing a elasticsearch query
17:16:06.935 INFO [PlacesPerformance] 4
17:16:06.942 INFO [PlacesPerformance] 5
17:16:09.179 INFO [PlacesPerformance] 6
17:16:09.179 INFO [PlacesPerformance] 7
17:16:09.181 INFO [PlacesPerformance] 8
17:16:09.181 INFO [PlacesPerformance] 9
17:16:09.183 INFO [PlacesPerformance] 10
17:16:09.183 INFO [PlacesPerformance] 11
17:16:09.362 INFO [PlacesPerformance] 12
17:16:09.362 INFO [PlacesPerformance] 13. After testing elasticsearch
Any suggestions on how to improve this?
UPDATE:
Strange. Whenever I run the code in a Lambda, I experience the 2.5-second delay when constructing the request (not even executing it). Locally, it works fine though. I tried the following:
1. Local against local elasticsearch. No delay.
2. Local against AWS elasticsearch. No delay.
3. Lambda with signing request. DELAY.
4. Lambda without signing request. DELAY.
5. Lambda with a 'match all' query. DELAY.
6. Lambda with an http address. DELAY.
7. Lambda with a custom runtime. DELAY.
8. Lambda with the standard Java 8 runtime. DELAY.

The problem is that on the first real request (not a warmup request; warmup requests don't go through your application code, so they don't trigger loading of the classes used on the actual request path) the JVM loads (reads, parses, verifies, etc.) the related classes, initializes security components (ciphers, etc.), and performs the TLS handshake (which requires multiple round trips; with Java 9 and TLS 1.3 this should be reduced).
Similar long durations are also seen on the first calls to other AWS services (DynamoDB, SQS, etc.).
As the author of the Thundra warmup plugin, I am thinking of introducing hook points for warmup messages, so that custom actions such as initializing security components and loading classes can be executed.
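One way to move some of this cost into the cold start itself is to force class loading and security initialization in a static initializer, which the Lambda runtime executes during container init rather than on the first real request. A minimal sketch (the specific cipher name here is illustrative, not part of any plugin API):

```java
import javax.crypto.Cipher;
import javax.net.ssl.SSLContext;

public class WarmupInit {
    // Runs once per container during initialization, so the cost of
    // loading/verifying the JCE and TLS classes is paid before the
    // first real request reaches the handler.
    static {
        try {
            // Forces the JCE provider and cipher classes to load.
            Cipher.getInstance("AES/GCM/NoPadding");
            // Forces the TLS machinery (SSLContext, SSLEngine) to initialize.
            SSLContext.getDefault().createSSLEngine();
        } catch (Exception e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("security components initialized");
    }
}
```

This won't eliminate the TLS handshake to the Elasticsearch endpoint itself, but it removes the one-time JCE/TLS class-loading cost from the first request path.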

Lambda functions inside VPCs have a significant impact on startup time. You said your ES is a hosted instance, so I assume it's backed by a VPC.
Even if it's not in a VPC, Java cold starts are usually, by nature, longer than those of runtimes like Node or Python, because the JVM needs to start up first. This is mainly where your 2.5 seconds come from.
OK. How to fix the issue?
It depends on how many concurrent connections you need to ElasticSearch. If one function is able to handle all the incoming requests, you can then limit the concurrent execution of your Lambda function to 1, so you make sure you are always hitting the same container (as long as these requests are made in a ±5 min time frame).
Now, if you don't know upfront how many concurrent Lambda functions will execute, you kind of have no way out. You could try warming up your Lambda functions beforehand, but then you'd need to fire like 100 requests at the same time to warm up 100 different containers.
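That fan-out warmup can be sketched as follows; the calls must overlap, otherwise Lambda will route them all to one already-warm container. Here invokeFunction is a placeholder for however you actually invoke your function (AWS SDK, HTTP, etc.):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class WarmupFanout {
    // Fires N concurrent warmup calls so that N separate containers start.
    static void warmUp(int containers, Runnable invokeFunction) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(containers);
        CountDownLatch done = new CountDownLatch(containers);
        for (int i = 0; i < containers; i++) {
            pool.submit(() -> {
                invokeFunction.run(); // must run concurrently to hit distinct containers
                done.countDown();
            });
        }
        done.await();
        pool.shutdown();
    }

    public static void main(String[] args) throws Exception {
        warmUp(5, () -> System.out.println("warm"));
    }
}
```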
Please check this answer as I go through the concurrent model of Lambda functions and how the cold/warm starts work.
I am happy to edit my answer if you have more info to share or if I wasn't clear enough.

Related

Webflux Webclient - increase my Webclient time out (wait a bit more a flaky service)

A small question regarding Spring Webflux WebClient: how do I increase the client-side timeout, please?
Setup, I am the client, I need to consume a third party API over the web.
This third party API is known to be flaky. For the same request and same response, sometimes it takes 1-2 secs, sometimes more than 4-5 secs (mostly 4-5 secs 😛).
But it is a good and important API, the payload response is very important.
Hence, I believe it is worth waiting longer for them.
May I ask how to do that please?
After investigation, this third party API does see some 499 responses from their VIP.
I believe this means "I did not want to wait". But I do want to wait longer!
After looking at the Webclient API, I am having a hard time finding how to do this.
Currently, I am constructing my WebClient as such:
WebClient.create().mutate()
    .defaultHeader(HttpHeaders.CONTENT_TYPE, MediaType.APPLICATION_JSON_VALUE,
            HttpHeaders.ACCEPT, MediaType.APPLICATION_JSON_VALUE)
    .clientConnector(new ReactorClientHttpConnector(HttpClient.create()
            .metrics(true, () -> new MicrometerChannelMetricsRecorder(SERVICE, HTTP))
            .wiretap(true)
            .secure(sslContextSpec -> sslContextSpec.sslContext(getSslContextBuilder()))))
    .build()
What is the current default timeout?
How can I increase it, via property or code?
Something like :
.setDefaultClientSideWaitTime() ?
Many thanks for your help
Per the answer from the Netty team, it needs:
.responseTimeout(Duration.ofSeconds(10))
.option(ChannelOption.CONNECT_TIMEOUT_MILLIS, 10 * 1000)
(for example, to wait 10 seconds), and the
conn.addHandlerLast(...) / .timeout(Duration.ofSeconds(10L))
handlers are not needed.
Try to set a responseTimeout as done for WebTestClient here:
https://stackoverflow.com/a/48655749/4725964
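Putting the two Netty settings into the asker's setup looks roughly like this; a minimal sketch assuming reactor-netty's HttpClient and Spring's ReactorClientHttpConnector, with the other builder options from the question omitted for brevity:

```java
import java.time.Duration;

import io.netty.channel.ChannelOption;
import org.springframework.http.client.reactive.ReactorClientHttpConnector;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.netty.http.client.HttpClient;

public class FlakyApiClient {
    static WebClient buildClient() {
        HttpClient httpClient = HttpClient.create()
                // Wait up to 10 seconds for the flaky API's response.
                .responseTimeout(Duration.ofSeconds(10))
                // Wait up to 10 seconds to establish the TCP connection.
                .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, 10 * 1000);
        return WebClient.builder()
                .clientConnector(new ReactorClientHttpConnector(httpClient))
                .build();
    }
}
```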

How to start CloudFoundry app using ReactorCloudFoundryClient?

I used StartApplicationRequest to create a sample request to start the application as given below:
StartApplicationRequest request = StartApplicationRequest.builder()
.applicationId("test-app-name")
.build();
Then, I used the ReactorCloudFoundryClient to start the application as shown below:
cloudFoundryClient.applicationsV3().start(request);
But my test application test-app-name is not getting started. I'm using the latest Java CF client version (v4.5.0.RELEASE), but I'm not seeing a way to start the application.
Quite surprisingly, the outdated version seems to be working with the below code:
cfstatus = cfClient.startApplication("test-app-name"); //start app
cfstatus = cfClient.stopApplication("test-app-name"); //stop app
cfstatus = cfClient.restartApplication("test-app-name"); //restart app
I want to do the same with the latest CF client library, but I don't see any useful reference. I referred to the test cases in the official CloudFoundry GitHub repo and arrived at the code below after checking a lot of docs:
StartApplicationRequest request = StartApplicationRequest.builder()
.applicationId("test-app-name")
.build();
cloudFoundryClient.applicationsV3().start(request);
Note that cloudFoundryClient is a ReactorCloudFoundryClient instance, as the latest library doesn't support the client class used in the outdated code. I would like to do all operations (start/stop/restart) with the latest library. The above code isn't working.
A couple of things here...
Using the reactor based client, your call to cloudFoundryClient.applicationsV3().start(request) returns a Mono<StartApplicationResponse>. That's not the actual response, it's the possibility of one. You need to do something to get the response. See here for more details.
If you would like similar behavior to the original cf-java-client, you can call .block() on the Mono<StartApplicationResponse> and it will wait and turn into a response.
Ex:
client.applicationsV3()
    .start(StartApplicationRequest.builder()
        .applicationId("test-app-name")
        .build())
    .block();
The second thing is that it's .applicationId not applicationName. You need to pass in an application guid, not the name. As it is, you're going to get a 404 saying the application doesn't exist. You can use the client to fetch the guid, or you can use CloudFoundryOperations instead (see #3).
The CloudFoundryOperations interface is a higher-level API. It's easier to use, in general, and supports things like starting an app based on the name instead of the guid.
Ex:
ops.applications()
    .start(StartApplicationRequest.builder()
        .name("test-app-name")
        .build())
    .block();

AWS ElasticSearch 2.3 Java HTTP bulk API

I'm attempting to use the bulk HTTP API from Java on AWS ElasticSearch 2.3.
When I use a REST client for the bulk load, I get the following error:
504 GATEWAY_TIMEOUT
When I run it as a Java Lambda, for HTTP POSTs, I get:
{
"errorMessage": "2017-01-09T19:05:32.925Z 8e8164a7-d69e-11e6-8954-f3ac8e70b5be Task timed out after 15.00 seconds"
}
Through testing I noticed the bulk API doesn't work with these settings:
"number_of_shards" : 5,
"number_of_replicas" : 5
When shards and replicas are set to 1, I can do a bulk load no problem.
I have tried using this setting to allow for my bulk load as well:
"refresh_interval" : -1
but so far it has made no impact at all. In the Java Lambda, I load my data as an InputStream from an S3 location.
What are my options at this point for Java HTTP?
Is there anything else in index settings I could try?
Is there anything else in AWS access policy I could try?
Thank you for your time.
Edit 1:
I have also tried these params: _bulk?action.write_consistency=one&refresh, but it makes no difference so far.
Edit 2:
Here is what made my bulk load work: setting the consistency param (I did NOT need to set refresh_interval):
URIBuilder uriBuilder = new URIBuilder(myuri);
uriBuilder = uriBuilder.addParameter("consistency", "one");
HttpPost post = new HttpPost(uriBuilder.build());
HttpEntity entity = new InputStreamEntity(myInputStream);
post.setEntity(entity);
From my experience, this issue can occur when your index replication settings cannot be satisfied by your cluster. This happens either during a network partition, or if you simply set a replication requirement that cannot be satisfied by your physical cluster.
In my case, it happens when I apply my production settings (number_of_replicas: 3) to my development cluster (which is a single-node cluster).
Your two solutions (setting the replicas to 1, or setting the consistency to "one") resolve the issue because they allow Elastic to continue the bulk index without waiting for additional replicas to come online.
Elasticsearch could probably have a more intuitive message on failure; maybe it does in Elastic 5.
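For reference, a sketch of dropping the replica count before a bulk load, using the same Apache HttpClient stack as the question; the endpoint and index name are placeholders, and the original value should be restored after the load:

```java
import org.apache.http.client.methods.HttpPut;
import org.apache.http.entity.ContentType;
import org.apache.http.entity.StringEntity;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

public class ReplicaSettings {
    public static void main(String[] args) throws Exception {
        try (CloseableHttpClient http = HttpClients.createDefault()) {
            // Reduce replicas so the bulk index doesn't wait on replica shards.
            HttpPut put = new HttpPut("https://my-es-endpoint/myindex/_settings");
            put.setEntity(new StringEntity(
                    "{\"index\":{\"number_of_replicas\":0}}",
                    ContentType.APPLICATION_JSON));
            http.execute(put).close();
        }
    }
}
```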

Is there a Java TThreadPoolServer with NonBlockingSocket in Apache Thrift?

I have tried TSimpleServer, THsHaServer and TThreadedSelectorServer, but none of them worked for my use case, where I need to scale. My server needs to receive lots of data, but right now with the servers listed above the receive rate is 600KB, which is very low compared to other servers which can receive 6MB over one socket. I am bound to Thrift and I need to find a way through!
Basically I have the same service written in both C++ and Java, and we are trying to see which one has higher throughput. With the C++ service we don't use any Thrift server; we created our own non-blocking server, which uses one thread to accept requests and passes them on to a thread pool where the methods get executed. With the Java service I tried TSimpleServer, THsHaServer and TThreadedSelectorServer, and the performance was not great. When I run the profiler, the hotspot that shows up is the following method:
org.apache.thrift.server.TNonblockingServer$SelectAcceptThread.run() 456
I have no idea what is going on underneath the thrift source code.
Benchmark results: the C++ service handles 35K requests per second and the Java service 2500 requests per second. Again, they are executing the same methods; it's the same service written in two different languages.
My code for one of the servers is as follows. I have tried the others as well, as mentioned above, but with no performance gain:
// Framed transport is required by the non-blocking servers.
TTransportFactory tTransportFactory = new TFramedTransport.Factory();
TNonblockingServerTransport tNonblockingServerTransport = new TNonblockingServerSocket(7911);
PersistenceService.AsyncProcessor<?> processor = getAsyncProcessor(serviceName);
Factory protocolFactory = new TBinaryProtocol.Factory(true, true);

THsHaServer.Args serverArgs = new THsHaServer.Args(tNonblockingServerTransport);
serverArgs.processor(processor);
serverArgs.transportFactory(tTransportFactory);
serverArgs.protocolFactory(protocolFactory);

TNonblockingServer tNonblockingServer = new THsHaServer(serverArgs);
System.out.println("Starting persistence server on port 7911 ...");
tNonblockingServer.serve();
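One knob worth checking with THsHaServer is the size of its worker pool, since the selector thread only reads frames and the actual service methods run on the workers. A sketch under the assumption that the transport/processor/protocol variables are set up as in the snippet above (whether this helps depends on where the bottleneck actually is):

```java
// Assumes tNonblockingServerTransport, processor, tTransportFactory and
// protocolFactory are created as in the question's snippet.
THsHaServer.Args serverArgs = new THsHaServer.Args(tNonblockingServerTransport)
        .minWorkerThreads(16)   // workers that execute the service methods
        .maxWorkerThreads(64);  // raise if requests queue behind slow handlers
serverArgs.processor(processor);
serverArgs.transportFactory(tTransportFactory);
serverArgs.protocolFactory(protocolFactory);

TServer server = new THsHaServer(serverArgs);
server.serve();
```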

Read timeout on a web service, but the operation still completes?

I have a Java web service client running on Linux (using Axis 1.4) that invokes a series of web service operations against a Windows server. There are times when some transactional operations fail with this Exception:
java.net.SocketTimeoutException: Read timed out
However, the operation on the server completes (even though the client gets no useful response). Is this a bug in the web service server/client? Or is it expected behavior for a TCP socket?
This is the expected behavior, rather than a bug. The operation behind the web service doesn't know anything about your read timing out so continues processing the operation.
You could increase the timeout of the connection: if you are manually manipulating the socket itself, the socket.connect() method can take a timeout (in milliseconds). A value of zero should prevent your side from timing out - see the API docs.
If the operation is going to take a long time in each case, you may want to look at making this asynchronous - a first request submits the operations, then a second request to get back the results, possibly with some polling to see when the results are ready.
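The submit-then-poll pattern described above can be sketched with standard library primitives; the operation and its ID scheme here are purely illustrative, in a real service the two calls would be separate web service operations:

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class AsyncOperations {
    private final ExecutorService pool = Executors.newFixedThreadPool(4);
    private final Map<String, Future<String>> pending = new ConcurrentHashMap<>();

    // First request: start the long-running operation, return a token immediately.
    public String submit(Callable<String> operation) {
        String id = UUID.randomUUID().toString();
        pending.put(id, pool.submit(operation));
        return id;
    }

    // Second request: poll with the token; null means "not ready yet".
    public String poll(String id) throws Exception {
        Future<String> f = pending.get(id);
        return (f != null && f.isDone()) ? f.get() : null;
    }

    public static void main(String[] args) throws Exception {
        AsyncOperations ops = new AsyncOperations();
        String id = ops.submit(() -> "done");
        String result;
        while ((result = ops.poll(id)) == null) {
            Thread.sleep(10); // client-side polling interval
        }
        System.out.println(result);
        ops.pool.shutdown();
    }
}
```

This way the client never holds a socket open for the duration of the operation, so the read timeout stops mattering.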
If you think the operation should be completing in this time, have you access to the server to see why it is taking so long?
I had a similar issue. We have a JAX-WS SOAP web service running on JBoss EAP 6 (JBoss AS 7). The default HTTP socket timeout is 60 seconds unless otherwise overridden on the server or by the client. To fix this issue I changed our Java client to something like the code below. I had to use three different combinations of properties here.
This combination seems to work whether the client runs standalone or as a web service client inside another application on another web server.
//Set timeout on the client
String edxWsUrl ="http://www.example.com/service?wsdl";
URL WsURL = new URL(edxWsUrl);
EdxWebServiceImplService edxService = new EdxWebServiceImplService(WsURL);
EdxWebServiceImpl edxServicePort = edxService.getEdxWebServiceImplPort();
//Set timeout on the client
BindingProvider edxWebserviceBindingProvider = (BindingProvider)edxServicePort;
int connectionTimeoutInMilliSeconds = 60 * 1000; // pick a value that suits the service
edxWebserviceBindingProvider.getRequestContext().put("com.sun.xml.internal.ws.request.timeout", connectionTimeoutInMilliSeconds);
edxWebserviceBindingProvider.getRequestContext().put("com.sun.xml.internal.ws.connect.timeout", connectionTimeoutInMilliSeconds);
edxWebserviceBindingProvider.getRequestContext().put("com.sun.xml.ws.request.timeout", connectionTimeoutInMilliSeconds);
edxWebserviceBindingProvider.getRequestContext().put("com.sun.xml.ws.connect.timeout", connectionTimeoutInMilliSeconds);
edxWebserviceBindingProvider.getRequestContext().put("javax.xml.ws.client.receiveTimeout", connectionTimeoutInMilliSeconds);
edxWebserviceBindingProvider.getRequestContext().put("javax.xml.ws.client.connectionTimeout", connectionTimeoutInMilliSeconds);