I want to dynamically configure my API servers depending on the name of the "cluster".
So I'm using AmazonElastiCacheClient to discover the cluster names, and I need to extract the endpoint of the one that has a specific name.
The problem is that I can find it but there doesn't seem to be a way to get an endpoint.
foundCluster.getCacheNodes() returns an empty list, even though there is 1 Redis instance showing in the AWS console as in-sync and running.
foundCluster.getConfigurationEndpoint() returns null.
Any idea?
Try adding
DescribeCacheClustersRequest.setShowCacheNodeInfo(true);
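For reference, a minimal sketch of how that might look end to end with the AWS SDK for Java v1 (the cluster id "my-redis" is a placeholder); without showCacheNodeInfo the node list comes back empty:

import com.amazonaws.services.elasticache.AmazonElastiCache;
import com.amazonaws.services.elasticache.AmazonElastiCacheClientBuilder;
import com.amazonaws.services.elasticache.model.*;

AmazonElastiCache elastiCache = AmazonElastiCacheClientBuilder.defaultClient();

// Ask ElastiCache to include node details; otherwise getCacheNodes() stays empty.
DescribeCacheClustersRequest request = new DescribeCacheClustersRequest()
        .withCacheClusterId("my-redis")      // placeholder cluster id
        .withShowCacheNodeInfo(true);

DescribeCacheClustersResult result = elastiCache.describeCacheClusters(request);

for (CacheCluster cluster : result.getCacheClusters()) {
    for (CacheNode node : cluster.getCacheNodes()) {
        Endpoint endpoint = node.getEndpoint();  // host/port of the Redis node
        System.out.println(endpoint.getAddress() + ":" + endpoint.getPort());
    }
}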
I am making a guess:
AWS ElastiCache with Redis currently supports only single-node clusters (so no auto discovery, etc.). I am not sure whether this is the cause, but it may be. Memcached-based clusters are different.
"At this time, ElastiCache supports single-node Redis cache clusters." http://docs.aws.amazon.com/AmazonElastiCache/latest/UserGuide/CacheNode.Redis.html
I am new to aws.
I have a MySQL RDS instance and I just created 2 read replicas. My application is written in Java, and up until now I have connected to the single AWS instance using JDBC, but how do I now distribute the work across the 3 servers?
You can set up an internal Elastic Load Balancer to round robin requests to the slaves. Then configure two connections in your code: one that points directly to the master for writes and one that points to the ELB endpoint for reads.
Or if you're adventurous, you could set up your own internal load balancer using Nginx, HAProxy, or something similar. In either case, your LB will listen on port 3306.
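If it helps, here's a minimal sketch of the two-connection idea with plain JDBC (the hostnames, database name and credentials are placeholders; the read endpoint would be whatever your ELB or DNS record resolves to):

import java.sql.Connection;
import java.sql.DriverManager;

// Writes go straight to the master instance endpoint.
Connection writeConn = DriverManager.getConnection(
        "jdbc:mysql://master.example.rds.amazonaws.com:3306/mydb", "user", "password");

// Reads go to the load-balanced endpoint that fronts the read replicas.
Connection readConn = DriverManager.getConnection(
        "jdbc:mysql://reads.internal-elb.example.com:3306/mydb", "user", "password");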
AWS suggests setting up route 53. Here is the official article on the subject https://aws.amazon.com/premiumsupport/knowledge-center/requests-rds-read-replicas/
If you have the option to use Spring Boot and spring-cloud-aws-jdbc, you can take a look at the working example and explanation in this post.
I need information on best practices for the AWS-specific use case below.
Our Java web application is deployed in us-east-1 and us-west-2 regions.
It communicates with DynamoDB, and a Memcached-based ElastiCache sits on top of DynamoDB in both regions.
We have DynamoDB replication enabled between us-east-1 and us-west-2.
Route 53 directs API calls to the appropriate region.
Now, the issue is that when we create or update a record in DynamoDB, it gets inserted into DynamoDB and cached in that particular region. The record gets replicated to the other region's DynamoDB as well, but the cache doesn't stay in sync since there is no replication between the ElastiCache clusters.
How do we address this issue in a best possible way?
DAX is not the answer here. I'm surprised nobody is pointing this out.
Ivan is wrong about this one.
DAX is indeed a read/write-through cache, but it doesn't capture data changes that happen through direct access to DynamoDB (obviously), nor data changes that happened through another DAX cluster.
So in this scenario, you have 2 DAX clusters, one in west-1 and one in east-1, and those two clusters are totally independent. Thus, if you make a data change in west-1 through DAX, that change does get propagated to east-1 at the DynamoDB table level, but not at the cache (DAX) level. In other words, if you update a record that had been accessed from both regions (and is therefore cached in both DAX clusters), you'll still get the same problem.
Fundamentally, this is the problem of syncing a cache layer across different regions, which is pretty hard. If you really need it, there are ways of doing it by building your own Kafka or Kinesis stream that carries the cache entry changes and is consumed by the cache clusters in multiple regions. That is not possible with ElastiCache on its own today (but it is possible if you set up Lambda or EC2 dedicated to this task).
Check out the case studies from Netflix.
One option, as others suggested, is to use DynamoDB DAX. It's a write-through cache, meaning that you do not need to synchronise database data with your cache yourself. It happens behind the curtains when you write data using the normal DynamoDB API.
But if you still want to use ElastiCache, you can use DynamoDB Streams. You can implement a Lambda function that will be triggered on every DynamoDB update in a table and write the new data into ElastiCache.
This article may give you some ideas about how to do this.
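To give a rough idea, here's a sketch of that Lambda approach in Java using the aws-lambda-java-events DynamodbEvent and the spymemcached client; the cache endpoint, the "id" key attribute and the TTL are assumptions for illustration only:

import java.net.InetSocketAddress;
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.DynamodbEvent;
import net.spy.memcached.MemcachedClient;

public class StreamToCacheHandler implements RequestHandler<DynamodbEvent, Void> {

    @Override
    public Void handleRequest(DynamodbEvent event, Context context) {
        try {
            // Hypothetical ElastiCache Memcached endpoint for this region.
            MemcachedClient cache = new MemcachedClient(
                    new InetSocketAddress("my-cache.example.cfg.use1.cache.amazonaws.com", 11211));

            for (DynamodbEvent.DynamodbStreamRecord record : event.getRecords()) {
                if (record.getDynamodb().getNewImage() == null) {
                    continue; // deletes could be handled with cache.delete(key) instead
                }
                // Assumes a table whose primary key is a string attribute named "id".
                String key = record.getDynamodb().getKeys().get("id").getS();
                String value = record.getDynamodb().getNewImage().toString();
                cache.set(key, 3600, value); // refresh the cached copy with a 1h TTL
            }
            cache.shutdown();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return null;
    }
}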
I'm a Java developer, familiar with AWS and comfortable with Hazelcast, each on their own.
I have 2 AWS EC2 instances running and would like to run Hazelcast as an in-memory cluster between the nodes. I followed the link to make the required changes, except for the taskdef.json configuration in the Task Definition.
I read some documentation but couldn't understand what exactly a task definition is and why it's needed.
How do I know if one has already been created? And if I create one now, would my production environment get disrupted?
The whole reason for EC2 discovery is to resolve the issue of non-static IP addresses. The EC2 plugin performs a DescribeInstances call and pulls the IP addresses from the JSON response.
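For what it's worth, a minimal sketch of that in Hazelcast's programmatic config (Hazelcast 3.x style; the region and tag key/value are placeholders, and it assumes the hazelcast-aws plugin is on the classpath and the instance role is allowed to call ec2:DescribeInstances):

import com.hazelcast.config.AwsConfig;
import com.hazelcast.config.Config;
import com.hazelcast.config.JoinConfig;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

Config config = new Config();
JoinConfig join = config.getNetworkConfig().getJoin();

// Disable multicast (not available on EC2) and enable AWS discovery instead.
join.getMulticastConfig().setEnabled(false);

AwsConfig aws = join.getAwsConfig();
aws.setEnabled(true);
aws.setRegion("us-east-1");            // placeholder region
aws.setTagKey("hazelcast-cluster");    // placeholder tag used to find members
aws.setTagValue("my-cluster");

HazelcastInstance instance = Hazelcast.newHazelcastInstance(config);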
As shown in the diagram, the pet project that I am working on has the following two components.
a) The "RestAPI layer" (set of micro-services)
b) "Scalable Parallelized Algorithm" component.
I am planning on running this on AWS. I realized that I can use Elastic Beanstalk to deploy my RestAPI module (a Spring Boot JAR with embedded Tomcat).
I am thinking about how to architect the "Scalable Parallelized Algorithm" component. Here are some design details about it:
This consists of a couple of Nodes which share the same data stored on S3.
Each node performs the "algorithm" on a chunk of the S3 data. One node works as the master node and the rest of the nodes send their partial results to it (embarrassingly parallel, master-slave paradigm). The master node gets invoked by the RestAPI layer.
A "Node" is a Spring Boot application which communicates with other nodes through HTTP.
The number of "Nodes" is dynamic, which means I should be able to manually add a new Node depending on the increasing size of the S3 data.
There is a "Node Registry" on Redis which contains the IPs of all the nodes. Each node registers itself and uses the list of IPs in the registry to communicate with the others.
My questions:
1) Shall I use EC2 to deploy the "Nodes", or can I use Elastic Beanstalk to deploy these nodes as well? I know with EC2 I can manage the number of nodes depending on the size of the S3 data, but is it possible to do this with Elastic Beanstalk?
2) Can I use
Inet4Address.getLocalHost().getHostAddress()
to get the IP of each Node? Do EC2 instances have more than one IP? This IP should allow the RestAPI layer to communicate with the "master" Node.
3) What component should I use to expose my RestAPI layer to the external world? I don't want to expose my "Nodes".
Update:
I can't use MapReduce since the nodes have state. I.e., during initialization, each Node reads its chunk of data from S3 and creates the "vector space" in memory. This is a time-consuming process, which is why it should be kept in memory. Also, this system needs near-real-time responses and cannot use a "batch" system like MR.
1) I would look into CloudFormation to help you automate and orchestrate the Scalable Parallelized Algorithm. Read this FAQ
https://aws.amazon.com/cloudformation/faqs/
2) With regards to question #2, EC2 instances can have a private and a public IP, depending on how you configure them. You can query the EC2 instance metadata service from the instance to obtain this information, like this:
curl http://169.254.169.254/latest/meta-data/public-ipv4
or
curl http://169.254.169.254/latest/meta-data/local-ipv4
Full reference to EC2 instance metadata:
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html
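If you'd rather do it from Java than shell out to curl (and avoid Inet4Address.getLocalHost(), which can pick the wrong interface), a small sketch that reads the same metadata endpoint:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

// Query the instance metadata service for this instance's private IP.
URL url = new URL("http://169.254.169.254/latest/meta-data/local-ipv4");
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
try (BufferedReader reader = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
    String privateIp = reader.readLine();
    System.out.println("private ip: " + privateIp);
}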
3) Check out the API Gateway service, it might be what you are looking for:
https://aws.amazon.com/api-gateway/faqs/
Some general principles
Use infrastructure automation: CloudFormation, or Troposphere on top of CloudFormation. This will keep your system clean and easy to maintain.
Use Tagging: this keeps your AWS account nice and tidy. Also you can do funky scripts like describe all instances based on Tags, which can be a one-liner CLI/SDK call returning all the IPs of your "slave" instances.
Use more Tags, it can be really powerful.
Elastic Beanstalk vs. "manual" setup
Elastic Beanstalk sounds like a good choice to me, but it's important to see that it uses the same components I would recommend anyway:
Create an AMI which contains your Slave Instance ready to go, or
Create an AMI and use UserData to configure your Slave, or
Create an AMI and/or use an orchestration tool like Chef or Puppet to configure your slave instance.
Use this AMI in an Auto Scaling launch configuration.
Create an AutoScalingGroup which can run a fixed number of instances or can scale based on a metric.
Pro setup: if you can somehow count the jobs waiting for execution, that can be a metric for scaling up or down automatically
Pro+ tip: use the Master node to create the jobs and put the jobs into an SQS queue (see the sketch below). The length of the queue is a good metric for scaling. Failed jobs go back into the queue and will be re-executed. (The SQS message contains only a reference, not the full data of the job.)
Using a queue would decouple your environment, which is highly recommended.
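A rough sketch of that master/worker exchange with the SQS SDK for Java v1 (the queue URL and the key scheme are placeholders; each message carries only a reference to the job, e.g. an S3 key, not the data itself):

import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.AmazonSQSClientBuilder;
import com.amazonaws.services.sqs.model.Message;

AmazonSQS sqs = AmazonSQSClientBuilder.defaultClient();
String queueUrl = "https://sqs.us-east-1.amazonaws.com/123456789012/algo-jobs"; // placeholder

// Master: enqueue a job reference (here, the S3 key of the chunk to process).
sqs.sendMessage(queueUrl, "chunks/chunk-0042");

// Worker/slave: poll for jobs, process them, then delete them from the queue.
for (Message message : sqs.receiveMessage(queueUrl).getMessages()) {
    String chunkRef = message.getBody();
    // ... run the algorithm on the referenced S3 chunk ...
    sqs.deleteMessage(queueUrl, message.getReceiptHandle());
}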
To be clear, Elastic Beanstalk does something similar. Actually, if you create a multi-node Beanstalk stack, it will run a CloudFormation template and create an ELB, an ASG, a launch configuration, and instances. You just have a bit less control but also less management overhead.
If you go with Beanstalk, you need a Worker Environment, which also creates the SQS queue for you. For Worker Environments you can find tutorials and working examples, which makes your start easier.
Further reading:
Background Task Handling for AWS Elastic Beanstalk
Architectural Overview
2) You can use the CLI, which has some filtering capabilities, or you can pipe the output through other tools like jq for filtering/formatting.
Here is a similar example.
Note: use tags and then you can easily filter the instances (a sample one-liner follows). Or you can query based on the ELB/ASG.
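For example, a one-liner along these lines (the tag key/value are whatever you tagged your slave instances with):

aws ec2 describe-instances --filters "Name=tag:Role,Values=algo-slave" --query "Reservations[].Instances[].PrivateIpAddress" --output text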
3) Exposing your API via API Gateway sounds like a good solution. I assume you want to expose only the Master node(s), since that's what manages the tasks.
I am trying to access a local instance of Elasticsearch through the Java API.
According to elastic search doc, I can use the "cluster.name" property to specify the name of the cluster to use. Perfect.
Sadly, it seems I can't specify the node name to use? I can see that this is also configurable in the configuration.
Maybe it would be a bad practice?
Also, I can see here that I can define a custom service ID, which I did, but how do I specify it to my Java TransportClient?
Thank you so much for your help.
The whole point of Elasticsearch is to create a cluster of highly available data. Not all nodes contain all the data and not all nodes might be up all the time. If you want to connect to a single node by specifying its name and for some reason that node is down (it was killed, it is being upgraded, it is being re-provisioned, data was wiped to be reindexed, etc), then your client wouldn't be able to run queries and get results.
Instead, if you connect to the cluster, ES will make sure to route your queries to the cluster nodes that are up, whatever the state of certain nodes that might be down. So the best practice is to always connect via the cluster.name to get the best out of your ES cluster.
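For example, a minimal sketch of a transport client that connects by cluster name (assuming an Elasticsearch 6.x-style PreBuiltTransportClient; the cluster name and address are placeholders):

import java.net.InetAddress;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.TransportAddress;
import org.elasticsearch.transport.client.PreBuiltTransportClient;

// Connect by cluster name; the client then routes requests to whichever nodes are up.
Settings settings = Settings.builder()
        .put("cluster.name", "my-local-cluster")   // placeholder cluster name
        .build();

TransportClient client = new PreBuiltTransportClient(settings)
        .addTransportAddress(new TransportAddress(InetAddress.getByName("127.0.0.1"), 9300));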
As for the SERVICE_ID, it's not something you specify in your code; it's simply the name you want to give to the Elasticsearch service when running it on Windows.