Question:-
Does java client need to worry about multiple servers ?
Meaning:-
I have given two servers in memcached client, but when i set or get a key from cache, do i need to provide any server related info to it or memcache itself takes care of it??
My knowledge:-
Memcache itself takes care due to consistent hashing.
but does spymemcached 2.8.0 provides consistent hashing???
Memcached servers are pooling servers. Meaning that you define a pool (a list) of servers and when the Java client attempts a write it writes towards the pool.
It's the client's job to decide which server from the pool will receive and store the value and how it will retrieve the value from that pool.
Basically this allows you to start with one Memcached server (possibly on the same machine) and if push comes to shove you can add a few dozen more servers to the pool without touching the application code.
Since the client is responsible of distributing data across the pool of servers (the client has to choose right memcached server to store/fetch data) there are few distribution algorithms.
One of the simplest is modula. This algorithm distributes keys depending on the amount of memcached servers in pool. If the number of servers in the pool changes, the client won't be able to the find stored data, there will be cache misses. In such case it's better to use consistent hashing.
Most popular java memcached clients spymemached and xmemcached support consistent hashing.
In some use cases instead of directly using the memcached client, caching can be added to a spring application through AOP (interceptors) using simple-spring-memcached or Spring 3.1 Cache Abstraction. Spring Cache currently doesn't support memcached but simple-spring-memcached provides such integration in snapshot build and upcoming 3.0.0 release.
MemCached server will manage to store and retrieve key/value by itself.
While storing using hash generate the key and store it.
While retrieve again hash the given key and find on which server it has been stored and then fetch it, this will take some time.
Instead, there is one approach which can be used for storing and retrieving.
Create one HashMap and store the key with server address as value. Now next time if the same key needs to get then instead of searching, you will get the server address directly from the HashMap and you needs to fetch value from there only. Hence you can save the searching time of MemCahce server.
Hope you understand what i mean.
Related
Requirement : I have 4 servers : A,B,C,D. They all connect to data provider, get the data and persist it into mongodb for N mins. So that if, next time, same request arrives to another server, it takes data from mongodb only instead of making a call to data provider.
|A|
|B| |data provider|
|C|
|D|
But if, |data provider| response slow, there is a possibility that 2 different requests for same resource arrive to A, B. I want one request waiting until the response of first request is received. I am using queue for this which is fine for single server. But now I need need distributed cache due to multiple servers.
Implementation : After reading few articles over the net, I got to know that Distributed Cache in Java can be implemented using ehcache RMI replication. But I have few doubts before going ahead with ehchache. (Although there are more solutions like JCS etc, I decided to pick ehcache on the basis other answers on StackOverflow)
Doubts
What if one of the servers gets down? Does ehcache handles this automatically?
Interesting situation, but I fail too see how an extra cache (of any kind) would help solve the problem. Ultimately your problem boils down to one of coordination between servers and a cache has little to add there.
Instead I would either use a queue shared between the four servers where only one request for a resource is allowed on at a time. Another possibility is a shared Map where each server will lock a resource name while retrieving it. Other servers can then wait on this lock and once it is released try to retrieve the resource from MongoDB.
I haven't tried using it, but a combination of redis and redisson looks like a good fit for such a task.
Ehcache with RMI replication is NOT a distributed cache and will not help in your situation because there will not be any shared state on which to queue/isolate your accesses.
Distributed Ehcache - that is backed by Terracotta - can help as you could combine strong consistency with a CacheLoader to obtain that only one thread across servers does load a given resource. But unless you are ready to stick with Terracotta 3.7.x, this is no longer an open source option.
But as Martin said, this may not be the best answer to your use case, as I feel you already use MongoDB as your fast access storage, which makes the cache redundant.
I have recently started taking a look into Infinispan as our caching layer. After reading through the operation modes in Infinispan as mentioned below.
Embedded mode: This is when you start Infinispan within the same JVM as your applications.
Client-server mode: This is when you start a remote Infinispan instance and connect to it using a variety of different protocols.
Firstly, I am confuse now which will be best suited to my application from the above two modes.
I have a very simple use case, we have a client side code that will make a call to our REST Service using the main VIP of the service and then it will get load balanced to individual Service Server where we have deployed our service and then it will interact with the Cassandra database to retrieve the data basis on the user id. Below picture will make everything clear.
Suppose for example, if client is looking for some data for userId = 123 then it will call our REST Service using the main VIP and then it will get load balanced to any of our four service server, suppose it gets load balanced to Service1, and then service1 will call Cassandra database to get the record for userId = 123 and then return back to Client.
Now we are planning to cache the data using Infinispan as compaction is killing our performance so that our read performance can get some boost. So I started taking a look into Infinispan and stumble upon two modes as I mentioned below. I am not sure what will be the best way to use Infinispan in our case.
Secondly, As from the Infinispan cache what I will be expecting is suppose if I am going with Embedded Mode, then it should look like something like this.
If yes, then how Infinispan cache will interact with each other? It might be possible that at some time, we will be looking for data for those userId's that will be on another Service Instance Infinispan cache? Right? So what will happen in that scenario? Will infinispan take care of those things as well? if yes, then what configuration setup I need to have to make sure this thing is working fine.
Pardon my ignorance if I am missing anything. Any clear information will make things more clear to me to my above two questions.
With regards to your second image, yes, architecture will exactly look like this.
If yes, then how Infinispan cache will interact with each other?
Please, take a look here: https://docs.jboss.org/author/display/ISPN/Getting+Started+Guide#GettingStartedGuide-UsingInfinispanasanembeddeddatagridinJavaSE
Infinispan will manage it using JGroups protocol and sending messages between nodes. The cluster will be formed and nodes will be clustered. After that you can experience expected behaviour of entries replication across particular nodes.
And here we go to your next question:
It might be possible that at some time, we will be looking for data for those userId's that will be on another Service Instance Infinispan cache? Right? So what will happen in that scenario? Will infinispan take care of those things as well?
Infinispan was developed for this scenario so you don't need to worry about it at all. If you have for example 4 nodes and setting distribution mode with numberOfOwners=2, your cached data will live on exactly 2 nodes in every moment. When you issue GET command on NON owner node, entry will be fetched from the owner.
You can also set clustering mode to replication, where all nodes contain all entries. Please, read more about modes here: https://docs.jboss.org/author/display/ISPN/Clustering+modes and choose what is the best for your use case.
Additionally, when you add new node to the cluster there will StateTransfer take place and synchronize/rebalance entries across cluster. NonBlockingStateTransfer is implemented already so your cluster will be still capable of serving responses during that joining phase. See: https://community.jboss.org/wiki/Non-BlockingStateTransferV2
Similarly for removing/crashing nodes in your cluster. There will be automatic rebalancing process so for example some entries (numOwners=2) which after crash live only at one node will be replicated respectively to live on 2 nodes according to numberOfOwners property in distribution mode.
To sum it up, your cluster will be still up to date and this does not matter which node you are asking for particular entry. If it does not contain it, entry will be fetched from the owner.
if yes, then what configuration setup I need to have to make sure this thing is working fine.
Aforementioned getting started guide is full of examples plus you can find some configuration file examples in the Infinispan distribution: ispn/etc/config-samples/*
I would suggest you to take a look at this source too: http://refcardz.dzone.com/refcardz/getting-started-infinispan where you can find even more basic and very quick configuration examples.
This source also provides decision related information for your first question: "Should I use embedded mode or remote client-server mode?" From my point of view, using remote cluster is more enterprise-ready solution (see: http://howtojboss.com/2012/11/07/data-grid-why/). Your caching layer is very easily scalable, high-available and fault tolerant then and is independent of your database layer and application layer because it simply sits between them.
And you could be interested about this feature as well: https://docs.jboss.org/author/display/ISPN/Cache+Loaders+and+Stores
I think in newest Infinispan release supports to work in a special, compatibility mode for those users interested in accessing Infinispan in multiple ways .
follow below link to configure your cache environment to support either embedded or remotely.
Interoperability between Embedded and Remote Server Endpoints
We are running mongodb instance to store data in a collections, no problems with it and mongo is our main data storage.
Today, we are going to develop Oauth2 support for the product and have to store the user sessions (security key, access token and etc.. ) and the access token have to be validated against the authentication server only after the defined timeout so that not every request will wait for validation by authentication server.
First request for secured resource (create) shall always be authenticated against the authentication server. Any subsequent request will be validated internally (cache) and check the internal timeout and only if expired, another request to the authentication server will be issued.
To solve that requirements, we have to introduce some kind of a distributed cache, to store (with TTL support) the user sessions and etc, expire it based on a ttl.. .i wrote about that above.
Two options here:
store user session in the hazelcast and share it across all App servers - nice choice, to persists all user session in eviction map.
store user sessions in MongoDb - and do the same.
Do you see any benefits of using Hazelcast instead of storing the temp data inside Mongo? Any significant performance improvements you're aware of ?
I'm new to Hazelcast, so don't aware about all killer features.
Disclaimer: I am the founder of Hazelcast...
Hazelcast is much simpler and simplicity matters a lot.
You can embed Hazelcast into your application (if your application is
written in Java). No need to deploy and maintain remote nosql
cluster.
Hazelcast works directly with your application objects. No
JSON or any other format. Write and read java objects.
You can
execute Java code on your in-memory data. No need to fetch and
process data; send your code over to the data.
You can listen for
the updates on your data. "Notify me when this map or key is
updated".
Hazelcast has rich set of data structures like queue,
topic, semaphores, locks, multimap etc. Imagine sharing a queue
across multiple nodes and be able to do blocking queue poll/take
operation... this is really cool :)
Hazelcast is an in-memory grid so it should be significantly faster than MongoDB for that kind of usage. They also have pre-made session clustering code for Java servlets if you do not want to create that yourself.
Code for the session clustering here on github. Or here for Maven artifact.
I have a requirement where i need to create a third party application which will test any object's presence in Oracle Coherence.
Scenario: Our main application uses Oracle Coherence to store some data, now i have to create a separate application (which will be running on a different server - out of the coherence cluster node). This particular application will detect whether some particular object is present in the coherence or not. We have no plans to run coherence on this machine too.
Can any third party application (which is not part of the coherence cluster) connect to coherence and fetch data? If yes then how? can i get some pointers to do the same?
There are multiple ways you can do it.
1) Use Coherence Extend - This allows any application to interact with Coherence without being part of Coherence Cluster.
Refer to http://docs.oracle.com/cd/E14526_01/coh.350/e14509/configextend.htm
This option is supported only if the third part application is in Java, .Net or C++
http://coherence.oracle.com/display/COH35UG/Coherence+Extend#CoherenceExtend-Typesofclients
2) Use REST API - The newer/latest versions of Coherence exposes cache data management using REST API's. Refer to http://docs.oracle.com/cd/E24290_01/coh.371/e22839/rest_intro.htm
This option does not have any restriction on client/third part technology as it is based on XML/JSON over HTTP.
Using REST you can check presence of cache key as below.
GET Operation
GET http://{host}:{port}/cacheName/key
Returns a single object from the cache based on a key. A 404 (Not Found) message is returned if the object with the specified key does not exist.
I created such a tool some time back using the C++ API.
https://github.com/actsasflinn/coherence-tool
I also wrapped the C++ API in a Ruby binding for scripting purposes.
https://github.com/actsasflinn/ruby-coherence
Either of these can run standalone outside of the cluster and rely on the TCP proxy method of communicating with a cluster.
I am using xmemcached-1.3.5 jar version of MemCache server.
Now, I have two instances of this server on different machine. I am
using xmemcached-1.3.5 jar to make entry in memcache-server through my
java application.
When both servers are up, the entry is made on only one of the MemCache instances.
Is there any configuration that needs to be made to get the entries duplicated onto both instances?
Memcached does not provide replication. You will need to investigate repcached if you want your cached data replicated.