Need information on best practices for the following AWS-specific use case.
Our Java web application is deployed in us-east-1 and us-west-2 regions.
It communicates with DynamoDB, and a Memcached-based ElastiCache layer sits on top of DynamoDB in both regions.
We have DynamoDB replication enabled between us-east-1 and us-west-2.
Route 53 directs API calls to the appropriate region.
Now, the issue is that when we create or update a record in DynamoDB, it gets inserted into DynamoDB and cached in that particular region. The record gets replicated to the other region's DynamoDB as well, but the caches don't stay in sync since there is no replication between the ElastiCache clusters.
How do we address this issue in a best possible way?
DAX is not an answer here. I'm surprised nobody is pointing this out.
Ivan is wrong about this one.
DAX is indeed a read/write-through cache, but it doesn't capture data changes made through direct access to DynamoDB (obviously), nor data changes made through the "other" DAX cluster.
So in this scenario,
you have two DAX clusters, one in us-west-2 and one in us-east-1, and those two clusters are totally independent.
Thus, if you make a data change in us-west-2 through DAX, that change does get propagated to us-east-1 at the DynamoDB table level, not at the cache (DAX) level.
In other words, if you update a record that has been accessed from both regions (and is therefore cached in both DAX clusters), you'll still get the same problem.
Fundamentally, this is the problem of syncing a cache layer across regions, which is pretty hard. If you really need it, there are ways of doing it by building your own Kafka or Kinesis stream that carries the cache-entry changes and is consumed by the cache clusters in each region. ElastiCache can't do this on its own today, but it is possible if you set up a Lambda function or an EC2 instance dedicated to this task.
Check out the case studies from Netflix.
One option, as others suggested, is to use DynamoDB DAX. It's a write-through cache, meaning that you do not need to synchronise database data with your cache. It happens behind the curtains when you write data using the normal DynamoDB API.
But if you still want to use ElastiCache, you can use DynamoDB Streams. You can implement a Lambda function that is triggered on every update to a DynamoDB table and writes the new data into ElastiCache.
This article may give you some ideas about how to do this.
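As a rough illustration of that approach, here is a minimal sketch of such a Lambda function in Java. The cache endpoint, the "id" key, and the "payload" attribute are placeholders for this example, and it assumes a Memcached-based ElastiCache cluster accessed with spymemcached:

import java.io.IOException;
import java.net.InetSocketAddress;

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.DynamodbEvent;
import com.amazonaws.services.lambda.runtime.events.DynamodbEvent.DynamodbStreamRecord;

import net.spy.memcached.MemcachedClient;

public class CacheSyncHandler implements RequestHandler<DynamodbEvent, Void> {

    private MemcachedClient cache;

    public CacheSyncHandler() {
        try {
            // ElastiCache configuration endpoint (placeholder host and port)
            cache = new MemcachedClient(
                    new InetSocketAddress("my-cache.xxxxxx.cfg.use1.cache.amazonaws.com", 11211));
        } catch (IOException e) {
            throw new RuntimeException("Unable to connect to ElastiCache", e);
        }
    }

    @Override
    public Void handleRequest(DynamodbEvent event, Context context) {
        for (DynamodbStreamRecord record : event.getRecords()) {
            String key = record.getDynamodb().getKeys().get("id").getS();

            if ("REMOVE".equals(record.getEventName())) {
                cache.delete(key);               // item deleted -> evict the cache entry
            } else {
                // INSERT or MODIFY -> overwrite the cached value (1-hour TTL as an example)
                String value = record.getDynamodb().getNewImage().get("payload").getS();
                cache.set(key, 3600, value);
            }
        }
        return null;
    }
}

The table's stream would need to include new images, and the function would be deployed in each region, pointing at that region's cache cluster.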
Related
I would like to use Apache Ignite as a failover read-only storage so my application will be able to access the most sensitive data if the main storage (Oracle) is down.
So I need to
Start nodes
Create schema (execute DDL queries)
Load data from Oracle to Ignite
It seems like this is not the same as database caching, and I don't need to use a cache. However, this page says that I need to implement a store to load a large amount of data from 3rd parties.
So, my questions are:
How do I efficiently transfer data from Oracle to Ignite? Data streamers?
Who should initiate this transfer? The first started node? How do I do that? (Tutorials explain how to achieve this via clients; should I follow that advice?)
Actually, I think using a cache store without read/write-through would be a suitable option here. You can configure a CacheJdbcPojoStore, for example, and call IgniteCache#loadCache(...) on your cache once the cluster is up. More on this topic: https://apacheignite.readme.io/docs/3rd-party-store
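A minimal sketch of that option, assuming a PERSON table in Oracle, a matching Person POJO, and a DataSource bean named "oracleDataSource" (all placeholders for this example):

import java.io.Serializable;
import java.sql.Types;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.store.jdbc.CacheJdbcPojoStoreFactory;
import org.apache.ignite.cache.store.jdbc.JdbcType;
import org.apache.ignite.cache.store.jdbc.JdbcTypeField;
import org.apache.ignite.configuration.CacheConfiguration;

public class LoadFromOracle {

    public static void main(String[] args) {
        // Describe how PERSON rows map onto the Person POJO.
        JdbcType personType = new JdbcType();
        personType.setCacheName("personCache");
        personType.setDatabaseTable("PERSON");
        personType.setKeyType(Long.class);
        personType.setValueType(Person.class);
        personType.setKeyFields(new JdbcTypeField(Types.BIGINT, "ID", Long.class, "id"));
        personType.setValueFields(new JdbcTypeField(Types.VARCHAR, "NAME", String.class, "name"));

        CacheJdbcPojoStoreFactory<Long, Person> storeFactory = new CacheJdbcPojoStoreFactory<>();
        storeFactory.setDataSourceBean("oracleDataSource"); // DataSource defined in the node config
        storeFactory.setTypes(personType);

        CacheConfiguration<Long, Person> cfg = new CacheConfiguration<>("personCache");
        cfg.setCacheStoreFactory(storeFactory);
        // Read-through/write-through stay disabled; the store is only used for the initial load.

        Ignite ignite = Ignition.start();
        IgniteCache<Long, Person> cache = ignite.getOrCreateCache(cfg);

        // Pull all rows from Oracle into the cache, distributed across the cluster.
        cache.loadCache(null);
    }

    /** Placeholder POJO matching the PERSON table. */
    public static class Person implements Serializable {
        private Long id;
        private String name;
        // getters and setters omitted for brevity
    }
}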
If you don't want to use a cache store, then IgniteDataStreamer could be a good choice. This is the fastest way to upload a big amount of data to the cluster. Data loading is usually performed from a client node, once all server nodes are up and running.
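And a minimal sketch of the data streamer approach, run from a client node after the server nodes are started (the JDBC URL, table, and cache name are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.Ignition;

public class StreamFromOracle {

    public static void main(String[] args) throws Exception {
        Ignition.setClientMode(true); // join the running cluster as a client node

        try (Ignite ignite = Ignition.start()) {
            ignite.getOrCreateCache("personCache"); // the target cache must exist before streaming

            try (IgniteDataStreamer<Long, String> streamer = ignite.dataStreamer("personCache");
                 Connection conn = DriverManager.getConnection(
                         "jdbc:oracle:thin:@//db-host:1521/ORCL", "user", "password");
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT ID, NAME FROM PERSON")) {

                while (rs.next()) {
                    // Entries are batched and routed directly to the nodes that own them.
                    streamer.addData(rs.getLong("ID"), rs.getString("NAME"));
                }
            } // the streamer flushes its remaining batches when closed
        }
    }
}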
The problem statement:
Example: I have a table called "STUDENT" with 10 rows, and one of the rows has the name "Jack". When my server starts up and is running, I load the database table into cache memory, so my application has the value "Jack" and uses it all over the application.
Now an external source changes my "STUDENT" table and renames "Jack" to "Prabhu Jack". I want the updated information in my application as soon as possible, without reloading/refreshing the application. I don't want to run a constant thread to monitor and update my application. Is there a way to achieve this as part of Hibernate, or any other feasible solution?
What you describe is the classic case of whether to pull or push updates.
Pull
This approach relies on the application using some background thread or task system that periodically polls a resource and requests the desired information. It's the responsibility of the application to perform this task.
In order to use a pull mechanism in conjunction with a cache implementation in Hibernate, you'd want your Hibernate query results to be stored in an L2 cache implementation, such as ehcache.
Your ehcache configuration would specify the storage capacity and expiration details, and you simply query for the student data at each point you require it. The L2 cache, which lives on the application-server side, would be consulted first, and the database would only be hit if the L2 cache entry had expired.
The downside is that you would need to specify a reasonable time-to-live setting for the L2 cache so that the cache gets refreshed by a query within a reasonable window after the rows are updated. Depending on the frequency of change and usage, maybe a 5-minute window is sufficient.
Using the L2 cache prevents the need for a useless background poll thread and allows you to specify a reasonable poll time all within the Hibernate framework backed by a cache implementation.
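For example, with ehcache as the L2 provider, an entity can be made cacheable roughly like this (the region name is illustrative; its time-to-live, e.g. the 5-minute window mentioned above, is configured in ehcache.xml):

import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Table;

import org.hibernate.annotations.Cache;
import org.hibernate.annotations.CacheConcurrencyStrategy;

@Entity
@Table(name = "STUDENT")
@Cache(usage = CacheConcurrencyStrategy.READ_WRITE, region = "studentCache")
public class Student {

    @Id
    private Long id;

    private String name;

    // getters and setters omitted for brevity
}

Query results can be cached as well by calling setCacheable(true) on the Hibernate query; they then expire along with the region's TTL.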
Push
This approach relies on the point where a change occurs being capable of notifying interested parties that something changed, allowing each interested party to perform some action.
In order to use a push mechanism, your application would need to expose a way to be told that a change occurred, and preferably what the change actually was. Then, when your external source modifies the table in question, that operation would need to raise an event and notify interested parties.
One way to architect this would be to use a JMS broker: have the external source submit a JMS message to a queue and have your application subscribe to the JMS queue to read the message when it's sent.
Another solution would be to couple the place where the external source manipulates the data tightly with your application such that the external source doesn't just manipulate the data in question, but also sends a JSON request to your application, allowing it to update its internal cache immediately.
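A minimal sketch of the JMS variant, assuming a Spring application with a configured broker; the queue name "student.changes" and the id-only payload are made up for this example:

import javax.persistence.EntityManagerFactory;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.jms.annotation.JmsListener;
import org.springframework.stereotype.Component;

@Component
public class StudentChangeListener {

    @Autowired
    private EntityManagerFactory entityManagerFactory;

    // The external source publishes the changed student's id to this queue.
    @JmsListener(destination = "student.changes")
    public void onStudentChanged(String studentId) {
        // Evict the stale entry from the L2 cache; the next read reloads it from the database.
        entityManagerFactory.getCache().evict(Student.class, Long.valueOf(studentId));
    }
}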
Conclusion
Using a push approach could require the introduction of additional middleware components, should you want to efficiently decouple the external source from your application. But it comes with the added benefit that the eventual consistency between the database and your application's cache happens in near real time. This solution also has no need to query the database after startup for those rows.
Using a pull approach doesn't require anything more than what you're likely already using in your application, other than perhaps a supported L2 cache provider rather than some homegrown solution. However, the eventual consistency between the database and your application's cache is completely dependent on your TTL configuration for that entity's cache. Be aware that this solution will continue to query the database to refresh the cache once your TTL has expired.
I want to dynamically configure my API servers depending on the name of the "cluster".
So I'm using AmazonElastiCacheClient to discover the cluster names and need to extract the endpoint of the one that has a specific name.
The problem is that I can find it but there doesn't seem to be a way to get an endpoint.
foundCluster.getCacheNodes() returns an empty list, even if there is 1 Redis instance appearing in the AWS console, in-sync and running.
foundCluster.getConfigurationEndpoint() returns null.
Any idea?
Try adding
DescribeCacheClustersRequest.setShowCacheNodeInfo(true);
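With the v1 AWS SDK for Java, the lookup then looks roughly like this (the cluster id is a placeholder):

import com.amazonaws.services.elasticache.AmazonElastiCache;
import com.amazonaws.services.elasticache.AmazonElastiCacheClientBuilder;
import com.amazonaws.services.elasticache.model.CacheCluster;
import com.amazonaws.services.elasticache.model.DescribeCacheClustersRequest;
import com.amazonaws.services.elasticache.model.DescribeCacheClustersResult;
import com.amazonaws.services.elasticache.model.Endpoint;

public class FindClusterEndpoint {
    public static void main(String[] args) {
        AmazonElastiCache client = AmazonElastiCacheClientBuilder.defaultClient();

        DescribeCacheClustersRequest request = new DescribeCacheClustersRequest()
                .withCacheClusterId("my-redis-cluster")
                .withShowCacheNodeInfo(true); // without this flag, getCacheNodes() comes back empty

        DescribeCacheClustersResult result = client.describeCacheClusters(request);
        CacheCluster cluster = result.getCacheClusters().get(0);

        // For Redis there is no configuration endpoint; read the node endpoint instead.
        Endpoint endpoint = cluster.getCacheNodes().get(0).getEndpoint();
        System.out.println(endpoint.getAddress() + ":" + endpoint.getPort());
    }
}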
I am making a guess:
AWS ElastiCache with Redis currently supports only single-node clusters (so no auto-discovery, etc.). I am not sure whether this is the cause. Memcached-based clusters are different.
"At this time, ElastiCache supports single-node Redis cache clusters." http://docs.aws.amazon.com/AmazonElastiCache/latest/UserGuide/CacheNode.Redis.html
We are running a MongoDB instance to store data in collections; no problems with it, and Mongo is our main data storage.
Today we are going to develop OAuth2 support for the product and have to store user sessions (security key, access token, etc.). The access token has to be validated against the authentication server only after a defined timeout, so that not every request waits for validation by the authentication server.
The first request for a secured resource (create) shall always be authenticated against the authentication server. Any subsequent request will be validated internally (against the cache) and checked against the internal timeout; only if it has expired will another request to the authentication server be issued.
To meet those requirements, we have to introduce some kind of distributed cache to store the user sessions (with TTL support) and expire them based on a TTL, as I wrote above.
Two options here:
Store user sessions in Hazelcast and share them across all app servers - a nice choice, persisting all user sessions in an eviction map.
Store user sessions in MongoDB and do the same.
Do you see any benefits of using Hazelcast instead of storing the temporary data inside Mongo? Any significant performance improvements you're aware of?
I'm new to Hazelcast, so I'm not aware of all its killer features.
Disclaimer: I am the founder of Hazelcast...
Hazelcast is much simpler and simplicity matters a lot.
You can embed Hazelcast into your application (if your application is written in Java). No need to deploy and maintain a remote NoSQL cluster.
Hazelcast works directly with your application objects. No JSON or any other format; write and read Java objects.
You can execute Java code on your in-memory data. No need to fetch and process data; send your code over to the data.
You can listen for updates on your data: "Notify me when this map or key is updated".
Hazelcast has a rich set of data structures like queues, topics, semaphores, locks, multimaps, etc. Imagine sharing a queue across multiple nodes and being able to do a blocking queue poll/take operation... this is really cool :)
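As a small illustration of the embedded usage (the map name, key, and TTL below are only examples; import paths assume Hazelcast 3.x):

import java.util.concurrent.TimeUnit;

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;

public class SessionCacheExample {
    public static void main(String[] args) {
        // Starts an embedded member; every app server doing the same joins the cluster.
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();

        IMap<String, String> sessions = hz.getMap("oauth-sessions");

        // Store a session entry that expires 30 minutes after it is written.
        sessions.put("access-token-123", "user-42", 30, TimeUnit.MINUTES);

        // Any node in the cluster can read it until the TTL elapses.
        System.out.println("session owner: " + sessions.get("access-token-123"));
    }
}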
Hazelcast is an in-memory grid so it should be significantly faster than MongoDB for that kind of usage. They also have pre-made session clustering code for Java servlets if you do not want to create that yourself.
Code for the session clustering here on github. Or here for Maven artifact.
Question:-
Does java client need to worry about multiple servers ?
Meaning:-
I have given two servers to the memcached client, but when I set or get a key from the cache, do I need to provide any server-related info, or does memcached itself take care of it?
My knowledge:-
Memcached itself takes care of it, due to consistent hashing.
But does spymemcached 2.8.0 provide consistent hashing?
Memcached servers are pooled servers, meaning that you define a pool (a list) of servers, and when the Java client attempts a write it writes to the pool.
It's the client's job to decide which server from the pool will receive and store the value and how it will retrieve the value from that pool.
Basically this allows you to start with one Memcached server (possibly on the same machine) and if push comes to shove you can add a few dozen more servers to the pool without touching the application code.
Since the client is responsible for distributing data across the pool of servers (the client has to choose the right memcached server to store/fetch data), there are a few distribution algorithms.
One of the simplest is modulo. This algorithm distributes keys depending on the number of memcached servers in the pool. If the number of servers in the pool changes, the client won't be able to find the stored data and there will be cache misses. In such cases it's better to use consistent hashing.
The most popular Java memcached clients, spymemcached and xmemcached, support consistent hashing.
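For example, spymemcached can be switched to Ketama-based consistent hashing when the client is created (the server addresses are placeholders):

import net.spy.memcached.AddrUtil;
import net.spy.memcached.KetamaConnectionFactory;
import net.spy.memcached.MemcachedClient;

public class ConsistentHashingExample {
    public static void main(String[] args) throws Exception {
        MemcachedClient client = new MemcachedClient(
                new KetamaConnectionFactory(),
                AddrUtil.getAddresses("cache-1.example.com:11211 cache-2.example.com:11211"));

        // The client hashes the key and picks the server; the caller never names one.
        client.set("user:42", 3600, "some value");
        System.out.println(client.get("user:42"));

        client.shutdown();
    }
}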
In some use cases, instead of directly using the memcached client, caching can be added to a Spring application through AOP (interceptors) using simple-spring-memcached or the Spring 3.1 Cache Abstraction. Spring Cache currently doesn't support memcached, but simple-spring-memcached provides such integration in a snapshot build and the upcoming 3.0.0 release.
Memcached will manage storing and retrieving key/value pairs by itself.
While storing, it hashes the key and stores the value on the corresponding server.
While retrieving, it hashes the given key again to find which server the value was stored on and then fetches it; this takes some time.
Instead, there is one approach that can be used for storing and retrieving.
Create a HashMap and store each key with its server address as the value. The next time the same key needs to be fetched, instead of searching you get the server address directly from the HashMap and fetch the value from that server only. Hence you can save the lookup time on the memcached side.
Hope you understand what I mean.