AWS ElasticSearch 2.3 Java HTTP bulk API

I'm attempting to use the bulk HTTP API in Java on AWS ElasticSearch 2.3.
When I use a REST client for the bulk load, I get the following error:
504 GATEWAY_TIMEOUT
When I run it as a Java Lambda doing HTTP POSTs, I get:
{
"errorMessage": "2017-01-09T19:05:32.925Z 8e8164a7-d69e-11e6-8954-f3ac8e70b5be Task timed out after 15.00 seconds"
}
Through testing I noticed the bulk API doesn't work with these settings:
"number_of_shards" : 5,
"number_of_replicas" : 5
When shards and replicas are set to 1, I can do a bulk load no problem.
I have tried using this setting to allow for my bulk load as well:
"refresh_interval" : -1
but so far it has made no impact at all. In the Java Lambda, I load my data as an InputStream from an S3 location.
What are my options at this point for Java HTTP?
Is there anything else in index settings I could try?
Is there anything else in AWS access policy I could try?
Thank you for your time.
Edit 1:
I have also tried these params: _bulk?action.write_consistency=one&refresh, but so far it makes no difference.
Edit 2:
Here is what made my bulk load work: setting the consistency param (I did NOT need to set refresh_interval):
// Add consistency=one so the bulk request is not blocked waiting on replicas.
URIBuilder uriBuilder = new URIBuilder(myuri);
uriBuilder = uriBuilder.addParameter("consistency", "one");
HttpPost post = new HttpPost(uriBuilder.build());
// Stream the bulk payload (in my case, from S3) straight into the request body.
HttpEntity entity = new InputStreamEntity(myInputStream);
post.setEntity(entity);

In my experience, this issue can occur when your index replication settings cannot be satisfied by your cluster. That happens either during a network partition, or if you simply set a replication requirement that your physical cluster cannot satisfy.
In my case, it happens when I apply my production settings (number_of_replicas: 3) to my development cluster (which is a single-node cluster).
Your two solutions (setting the replicas to 1, or setting consistency to 'one') resolve the issue because they allow Elastic to continue the bulk index without waiting for additional replicas to come online.
Elasticsearch arguably could report this failure with a more intuitive message; maybe it does in Elastic 5.
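An easy way to confirm this diagnosis is to look at cluster health while the settings are applied: with number_of_replicas greater than your data node count minus one, the cluster stays yellow and reports unassigned shards. A minimal sketch, assuming an Apache CloseableHttpClient (httpClient) and the domain endpoint (esEndpoint) as illustrative names:

HttpGet health = new HttpGet(esEndpoint + "/_cluster/health?pretty");
try (CloseableHttpResponse response = httpClient.execute(health)) {
    // Look for "status" : "yellow" and a non-zero "unassigned_shards" count.
    System.out.println(EntityUtils.toString(response.getEntity()));
}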

Related

Lettuce Redis client: Differentiating between connectTimeout in SocketOptions and defaultTimeout in RedisClusterClient

My application uses the Lettuce Redis client to connect to AWS ElastiCache. I am trying to follow this guide to increase my service's resiliency. One of the points being suggested is regarding the socket timeout:
Ensure that the socket timeout of the client is set to at least one second (vs. the typical “none” default in several clients). Setting the timeout too low can lead to numerous timeouts when the server load is high. Setting it too high can result in your application taking a long time to detect connection issues.
The pseudo-code for how I am creating connections is:
RedisClusterClient redisClusterClient = RedisClusterClient.create(clientResources, redisUrl);
// Topology refresh and periodic refresh
ClusterTopologyRefreshOptions topologyRefreshOptions = ClusterTopologyRefreshOptions.builder()
        .enablePeriodicRefresh(true)
        .enableAllAdaptiveRefreshTriggers()
        .build();
// Update cluster topology periodically
redisClusterClient.setOptions(ClusterClientOptions.builder()
        .topologyRefreshOptions(topologyRefreshOptions)
        .build());
StatefulRedisClusterConnection<byte[], byte[]> connection = redisClusterClient.connect(new ByteArrayCodec());
I was going through the lettuce docs and saw there are two timeout options available for this:
Use connectTimeout field in SocketOptions
Use defaultTimeout field in RedisClusterClient
I would really appreciate it if someone could help me understand the difference between the two and which one works better for my use case.
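For reference, here is a minimal sketch of where each of the two timeouts is configured, assuming Lettuce 5.x APIs and a placeholder URL. My understanding is that connectTimeout bounds how long establishing (or re-establishing) the TCP connection may take, while defaultTimeout bounds how long synchronous command invocations wait for a response on an already-established connection:

import io.lettuce.core.SocketOptions;
import io.lettuce.core.cluster.ClusterClientOptions;
import io.lettuce.core.cluster.RedisClusterClient;
import java.time.Duration;

RedisClusterClient client = RedisClusterClient.create("redis://localhost:6379");

// connectTimeout: applies while a connection (or reconnection) is being established.
SocketOptions socketOptions = SocketOptions.builder()
        .connectTimeout(Duration.ofSeconds(1))
        .build();
client.setOptions(ClusterClientOptions.builder()
        .socketOptions(socketOptions)
        .build());

// defaultTimeout: applies to synchronous command execution on live connections.
client.setDefaultTimeout(Duration.ofSeconds(1));

Given that the guide's advice targets both slow connections and slow responses, it may be that both settings are needed rather than one or the other.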
EDIT: Here is what I have tried so far:
I tried using both SocketOptions and defaultTimeout(), one at a time, and ran some tests.
Here is what I did:
Test Case 1
Set connectTimeout in SocketOptions to 1s and updated the redisClusterClient object using the setOptions() method.
Use LitmusChaos to add latency of >1s to the calls made to AWS ElastiCache.
Use the ElastiCache failover API to bring down one of the nodes in the Redis cluster.
Test Case 2
Set defaultTimeout in redisClusterClient to 1s.
Use LitmusChaos to add latency of >1s to the calls made to AWS ElastiCache.
Use the ElastiCache failover API to bring down one of the nodes in the Redis cluster.
Observations (for both test cases):
The Lettuce logs indicated that it was not able to connect to the node that was brought down (expected, as AWS was still in the process of replacing it).
Once the Redis node was back up in AWS ElastiCache, the Lettuce logs showed that it successfully reconnected to that node (unexpected, as I was still adding latency to the calls made to AWS ElastiCache).
Am I missing some config here?

Java request for marking elasticsearch index block as null

I am facing a cluster block exception due to lack of space on the Elasticsearch node. After sufficient space is available again, I need to manually set the "index.blocks.read_only_allow_delete": null value using the Java High Level REST Client. I am able to do this with Kibana, but I need the Java equivalent of it.
Kibana request:
PUT /_settings
{
"index.blocks.read_only_allow_delete": null
}
Elastic version : 6.8.4
Java : 8
As the High Level REST Client has no support for this feature, you need to use the Low Level REST Client or issue the HTTP request yourself.
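A minimal sketch with the Low Level REST Client (host and port are placeholders; you can also obtain the low-level client from an existing high-level one via restHighLevelClient.getLowLevelClient()):

import org.apache.http.HttpHost;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;

RestClient restClient = RestClient.builder(new HttpHost("localhost", 9200, "http")).build();

// The equivalent of the Kibana request above: PUT /_settings with a null value.
Request request = new Request("PUT", "/_settings");
request.setJsonEntity("{\"index.blocks.read_only_allow_delete\": null}");
Response response = restClient.performRequest(request);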

Elasticsearch client on AWS / Lambda / Java - 2.5 seconds client startup time

We're using AWS Lambda (Java) and the Elasticsearch client to connect to a hosted Elasticsearch instance on AWS. I encounter a long wait of about 2.5 seconds on the first request (on top of the cold start). After that it is very quick. I can't really figure out where this delay is coming from, and I'm trying to optimize it.
private void testPerformanceElasticSearch() throws Exception {
log.info("1. Before testing elasticsearch client");
AWS4Signer signer = new AWS4Signer();
signer.setServiceName("es");
signer.setRegionName("eu-west-1");
HttpRequestInterceptor interceptor = new AWSRequestSigningApacheInterceptor("es", signer, new DefaultAWSCredentialsProviderChain());
String endpoint = "https://" + Utils.getEnvironmentVariable("ELASTIC_SEARCH_ENDPOINT");
RestHighLevelClient restHighLevelClient = new RestHighLevelClient(RestClient.builder(HttpHost.create(endpoint)).setHttpClientConfigCallback(hacb -> hacb.addInterceptorLast(interceptor)));
log.info("2. After getting elasticsearch client");
log.info("3. Before doing a elasticsearch query");
log.info("4");
BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
log.info("5");
TermsQueryBuilder termsQueryBuilder = QueryBuilders.termsQuery("userId", "abc");
log.info("6");
boolQueryBuilder.must(termsQueryBuilder);
log.info("7");
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
log.info("8");
searchSourceBuilder.query(boolQueryBuilder);
log.info("9");
SearchRequest searchRequest = new SearchRequest("users");
log.info("10");
searchRequest.source(searchSourceBuilder);
log.info("11");
restHighLevelClient.search(searchRequest);
log.info("12");
log.info("13. After testing elasticsearch");
}
And then I get logging like this; you can see that between '5' and '6' there is a more than 2-second delay which I can't really place:
17:16:06.871 INFO [PlacesPerformance] 1. Before testing elasticsearch client
17:16:06.932 INFO [PlacesPerformance] 2. After getting elasticsearch client
17:16:06.933 INFO [PlacesPerformance] 3. Before doing a elasticsearch query
17:16:06.935 INFO [PlacesPerformance] 4
17:16:06.942 INFO [PlacesPerformance] 5
17:16:09.179 INFO [PlacesPerformance] 6
17:16:09.179 INFO [PlacesPerformance] 7
17:16:09.181 INFO [PlacesPerformance] 8
17:16:09.181 INFO [PlacesPerformance] 9
17:16:09.183 INFO [PlacesPerformance] 10
17:16:09.183 INFO [PlacesPerformance] 11
17:16:09.362 INFO [PlacesPerformance] 12
17:16:09.362 INFO [PlacesPerformance] 13. After testing elasticsearch
Any suggestions on how to improve this?
UPDATE:
Strange. Whenever I run the code in a Lambda, I experience the 2.5-second delay when constructing the request (not even executing it). Locally it works fine, though. I tried the following:
1. Local against local elasticsearch. No delay.
2. Local against AWS elasticsearch. No delay.
3. Lambda with signing request. DELAY.
4. Lambda without signing request. DELAY.
5. Lambda with a 'match all' query. DELAY
6. Lambda with a http address. DELAY.
7. Lambda with a custom runtime. DELAY.
8. Lambda with standard Java 8 runtime. DELAY.
The problem is that on the first real request (not a warmup request: warmup requests don't go through your application code, so they don't trigger loading of the classes used on the actual request path) the JVM loads (reads, parses, verifies, etc.) the related classes, initializes security components (ciphers, etc.), and performs the TLS handshake (which requires multiple round trips; with Java 9 and TLS 1.3 this should be reduced).
Similar long durations are also seen for the first calls to other AWS services (DynamoDB, SQS, etc.).
As the author of the Thundra warmup plugin, I am thinking of introducing hook points for warmup messages, so that custom actions such as initializing security components and loading classes can be executed.
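Until such hook points exist, one workaround along these lines is to pay those one-time costs during container initialization instead of on the first real request. A minimal sketch, assuming the 6.x RestHighLevelClient constructor; the handler class and warmup query are illustrative, not taken from the question:

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import org.apache.http.HttpHost;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.builder.SearchSourceBuilder;

public class WarmSearchHandler implements RequestHandler<String, String> {

    // Created once per container, not per invocation.
    private static final RestHighLevelClient CLIENT = new RestHighLevelClient(
            RestClient.builder(HttpHost.create(
                    "https://" + System.getenv("ELASTIC_SEARCH_ENDPOINT"))));

    static {
        try {
            // A cheap match-all search forces class loading, cipher initialization
            // and the TLS handshake during cold start, not on the first user request.
            CLIENT.search(new SearchRequest("users").source(
                    new SearchSourceBuilder().query(QueryBuilders.matchAllQuery()).size(0)));
        } catch (Exception e) {
            // Warmup is best effort; ignore failures here.
        }
    }

    @Override
    public String handleRequest(String input, Context context) {
        // Real queries reuse the already-initialized, already-connected client.
        return "ok";
    }
}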
Lambda functions inside VPCs have a great impact on the startup time. You said your ES is a hosted instance, so I assume it's backed by a VPC.
Even if it's not in a VPC, Java cold starts are usually, by nature, longer than runtimes like Node or Python, because the JVM needs to be started up first. This is mainly where your 2.5 seconds come from.
OK. How to fix the issue?
It depends on how many concurrent connections you need to ElasticSearch. If one function is able to handle all the incoming requests, you can then limit the concurrent execution of your Lambda function to 1, so you make sure you are always hitting the same container (as long as these requests are made in a ±5 min time frame).
Now, if you don't know upfront how many concurrent Lambda functions will execute, you kind of have no way out. You could try warming up your Lambda functions beforehand, but then you'd need to fire like 100 requests at the same time to warm up 100 different containers.
Please check this answer, where I go through the concurrency model of Lambda functions and how cold/warm starts work.
I am happy to edit my answer if you have more info to share or if I wasn't clear enough.

Java Elastic limit post execution time

I do post like this:
Settings settings = Settings.settingsBuilder()
.put("cluster.name", "cluster-name")
.build();
client = TransportClient.builder()
.settings(settings)
.build();
client.addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("my.elastic.server"), 9300));
IndexResponse response = client
.prepareIndex("myindex", "info")
.setSource(data) //here data is stored in a Map
.get();
But the data could be about 2 MB or more, and I care about how quickly it gets posted to Elastic. What is the best way to limit that time? Is there an Elastic Java API feature for this, should I run the posting in a separate thread, or something else? Thanks
You could utilize Spring Data Elasticsearch in Java and Spring Batch to create an index batch job. This way you can break the data up into smaller chunks, for more frequent but smaller writes to your index.
If your job is big enough (millions of records), you can utilize a multi-threaded batch job and significantly reduce the time it takes to generate your index. This may be overkill for a smaller index, though.
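If you would rather stay on the plain transport client instead of adding Spring, the native BulkProcessor provides similar chunking behaviour. A minimal sketch, assuming the ES 2.x API and the client and data objects from the question; the thresholds are illustrative:

import org.elasticsearch.action.bulk.BulkProcessor;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.common.unit.ByteSizeUnit;
import org.elasticsearch.common.unit.ByteSizeValue;
import org.elasticsearch.common.unit.TimeValue;

BulkProcessor bulkProcessor = BulkProcessor.builder(client, new BulkProcessor.Listener() {
    @Override public void beforeBulk(long executionId, BulkRequest request) { }
    @Override public void afterBulk(long executionId, BulkRequest request, BulkResponse response) { }
    @Override public void afterBulk(long executionId, BulkRequest request, Throwable failure) { }
})
        .setBulkActions(1000)                               // flush every 1000 docs
        .setBulkSize(new ByteSizeValue(5, ByteSizeUnit.MB)) // or every 5 MB
        .setFlushInterval(TimeValue.timeValueSeconds(5))    // or every 5 seconds
        .setConcurrentRequests(2)                           // bulks in flight in parallel
        .build();

// Writes are buffered and sent in bounded chunks instead of one large request.
bulkProcessor.add(new IndexRequest("myindex", "info").source(data));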

AWS Error Message: InvalidInstanceID.NotFound

I'm trying to start an Amazon EC2 cloud machine with the startInstances method using the AWS SDK for Java. My code is as follows.
public String startInstance(String instanceId) throws Exception {
    List<String> instanceIds = new ArrayList<String>();
    instanceIds.add(instanceId);

    StartInstancesRequest startRequest = new StartInstancesRequest(instanceIds);
    startRequest.setRequestCredentials(getCredentials());
    StartInstancesResult startResult = ec2.startInstances(startRequest);
    List<InstanceStateChange> stateChangeList = startResult.getStartingInstances();

    log.trace("Starting instance '{}':", instanceId);

    // Wait for the instance to be started
    return waitForTransitionCompletion(stateChangeList, "running", instanceId);
}
When I run the above code, I'm getting the following AWS error:
Status Code: 400, AWS Request ID: e1bd4795-a609-44d1-9e80-43611e80006b, AWS Error Code: InvalidInstanceID.NotFound, AWS Error Message: The instance ID 'i-2b97ac2f' does not exist
    at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:538)
    at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:283)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:168)
    at com.amazonaws.services.ec2.AmazonEC2Client.invoke(AmazonEC2Client.java:5208)
    at com.amazonaws.services.ec2.AmazonEC2Client.startInstances(AmazonEC2Client.java:2426)
AWS Error Message: The instance ID 'i-2b97ac2f' does not exist
You'll have to take the AWS response for granted here, i.e. the instance does not exist ;)
But seriously: presumably you have already verified that you are actually running an instance with this ID in your account? Then this is most likely caused by targeting the wrong API endpoint, insofar as an instance ID is only valid within a specific region (if not specified, the region defaults to 'us-east-1', see below).
In this case you need to specify the actual instance region via the setEndpoint() method of the AmazonEC2Client object within the apparently global ec2 variable before calling startInstances().
There are some examples regarding Using Regions with the AWS SDKs and all currently available AWS regional endpoint URLs are listed in Regions and Endpoints, specifically the Amazon Elastic Compute Cloud (EC2) defaults to 'us-east-1':
If you just specify the general endpoint (ec2.amazonaws.com), Amazon
EC2 directs your request to the us-east-1 endpoint.
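For example, a sketch of pointing the client at the instance's actual home region before the call (the region here is illustrative):

// Instance IDs are region-scoped; target the region the instance lives in,
// otherwise the default us-east-1 endpoint will report InvalidInstanceID.NotFound.
ec2.setEndpoint("ec2.eu-west-1.amazonaws.com");
StartInstancesResult result = ec2.startInstances(new StartInstancesRequest(instanceIds));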
We run a service (Qubole) that frequently spawns, then tags (and in some cases terminates) AWS instances immediately.
We have found that Amazon will, every once in a while, claim an instance ID is invalid, even though it has just created it. Retrying a few times with some sleep time thrown in usually solves the problem; even a total retry interval of 15s proved insufficient in rare cases.
This experience comes from the us-east-1 region. We do not make API calls to different regions, so that is not an explanation. More likely, this is the infamous eventual consistency at work, where AWS is unable to provide read-after-write consistency for these API calls.
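A sketch of the kind of retry loop described above (attempt count, sleep, and the error-code check are illustrative):

// Retry only the eventual-consistency failure, with a growing back-off.
int attempts = 0;
while (true) {
    try {
        ec2.startInstances(new StartInstancesRequest(instanceIds));
        break;
    } catch (AmazonServiceException e) {
        if (!"InvalidInstanceID.NotFound".equals(e.getErrorCode()) || ++attempts >= 5) {
            throw e;
        }
        Thread.sleep(3000L * attempts);
    }
}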
I am using the AWS Ruby API and I noticed the same issue when creating an AMI image: its status is 'pending' when I look in the AWS console, but after a while the image becomes available for use.
Here is my script
image = ec2.images.create(:name => image_name, :instance_id => ami_id, :description => desc)
sleep 5 while image.state != :available
I sleep for about 5 seconds at a time until the image is available, but I get the error "AWS Error Message: InvalidInstanceID.NotFound". During my testing this is fine, but most of the time it seems to fail during continuous integration builds.
InvalidInstanceID.NotFound means the specified instance does not exist.
Ensure that you have indicated the region in which the instance is located, if it's not in the default region.
This error may occur because the ID of a recently created instance has not propagated through the system. For more information, see Eventual Consistency.