Overview for Persistence in Apache Ignite - java

I don't think I fully understand Apache Ignite cache persistence yet; I'm probably missing an overview.
What I would like to achieve is something like this: three data nodes that store the cache data persistently and replicated, either on their own separate disks or in a single 3rd-party DB. As long as one of these nodes is available, all data shall be available to the cluster nodes. Configs for these three nodes must have the persistence configuration, I guess? What about the backups setting? Must it be set to 2? What is the correct setting so that all data remains available as long as one of the three nodes is available?
Do all data nodes have to be available for write operations to the cache? Or is one enough, and will the other two replicate once they reconnect?
Other worker nodes should use the cache but not store anything on disk. Configs for these nodes should not have persistence set, I guess?
Sorry for all these questions. As you can see, I may need some background information on the data store.
Thanks for any help!

Ignite native persistence can solve your problem. You can enable it by adding PersistentStoreConfiguration to IgniteConfiguration. Here is documentation on how to use it: https://apacheignite.readme.io/docs/distributed-persistent-store#section-usage
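For instance, a minimal configuration sketch for a persistent server node, using the PersistentStoreConfiguration API from the linked docs (Ignite 2.1/2.2; later releases replaced it with DataStorageConfiguration):

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.configuration.PersistentStoreConfiguration;

public class PersistentServerNode {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // Turn on Ignite native persistence for this node.
        cfg.setPersistentStoreConfiguration(new PersistentStoreConfiguration());

        Ignite ignite = Ignition.start(cfg);

        // With persistence enabled the cluster starts deactivated and has to
        // be activated once all the server nodes are up.
        ignite.active(true);
    }
}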
Every node that has persistence enabled will write its primary and backup partitions to disk, so when restarted, it will have this data available locally. If other nodes connect to the cluster after that, they will see the data, and it will be replicated to new nodes if needed.
Judging by your needs, you should use a replicated cache. All data in the cache is stored on all data nodes at the same time. When a node that has data persisted on disk starts up, it will have all the data available, just like you need. A replicated cache is effectively equivalent to having all data backed up on every node, so you don't have to configure backups additionally. Here is documentation on cache modes: https://apacheignite.readme.io/docs/cache-modes
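A cache configuration sketch for that case (the cache name is just an example):

import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.configuration.CacheConfiguration;

// Every data node keeps a full copy of the cache, so no explicit
// backups setting is needed.
CacheConfiguration<Long, String> cacheCfg = new CacheConfiguration<>("myCache");
cacheCfg.setCacheMode(CacheMode.REPLICATED);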
To restrict cache data to particular nodes only, you can create three server nodes that store the data and start the other nodes as clients. You can find the difference here: https://apacheignite.readme.io/docs/clients-vs-servers
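A worker node that uses the caches but never stores partitions would simply be started as a client:

import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

IgniteConfiguration clientCfg = new IgniteConfiguration();
clientCfg.setClientMode(true);   // joins the cluster without storing cache data
Ignition.start(clientCfg);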
If you need more than three server nodes, then you can use a cache node filter. It is a predicate that specifies which nodes should store the data of a particular cache. Here is the JavaDoc for the CacheConfiguration.setNodeFilter method: https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/configuration/CacheConfiguration.html#setNodeFilter(org.apache.ignite.lang.IgnitePredicate)
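A sketch of such a filter, assuming the data nodes are marked with a user attribute (the "role"/"data" attribute names are purely illustrative):

import java.util.Collections;

import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

// On the designated data nodes, set a marker attribute:
IgniteConfiguration dataNodeCfg = new IgniteConfiguration();
dataNodeCfg.setUserAttributes(Collections.singletonMap("role", "data"));

// The cache then keeps its partitions only on nodes carrying that attribute:
CacheConfiguration<Long, String> cacheCfg = new CacheConfiguration<>("myCache");
cacheCfg.setNodeFilter(node -> "data".equals(node.attribute("role")));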
Another option to enable persistence is to use a CacheStore. It lets you replicate your data in any external database, but it has lower performance and fewer features available, so I would recommend going with the native persistence. Here is documentation on 3rd-party persistence: https://apacheignite.readme.io/v2.2/docs/3rd-party-store
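A hedged sketch of that wiring (CacheJdbcPojoStoreFactory is one of the bundled implementations; it still needs a data source and type mappings configured before it can be used):

import org.apache.ignite.cache.store.jdbc.CacheJdbcPojoStoreFactory;
import org.apache.ignite.configuration.CacheConfiguration;

CacheConfiguration<Long, String> cacheCfg = new CacheConfiguration<>("backedCache");
cacheCfg.setCacheStoreFactory(new CacheJdbcPojoStoreFactory<Long, String>());
cacheCfg.setReadThrough(true);   // cache misses are loaded from the external DB
cacheCfg.setWriteThrough(true);  // cache updates are written through to it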

Related

Does Hazelcast store MultiMap values in the local instance when backup is disabled?

I am configuring a Hazelcast Multimap without backups (on purpose):
// MultiMap whose entries live only in their primary partition (no sync or
// async backups), with values held as a Set per key.
config.getMultiMapConfig(SESSIONS_MAP)
      .setBackupCount(0)
      .setAsyncBackupCount(0)
      .setValueCollectionType(MultiMapConfig.ValueCollectionType.SET);
My goal is that each instance stores its own values in the MultiMap, so that when a server disappears, those values are lost. Is the above configuration correct?
Example: Server instances in a cluster host user sessions. I want to store users in a MultiMap, so that each user is physically stored on the local instance, but other instances can look up where a user session exists. When a server crashes, the user sessions disappear, and so should the entries in the MultiMap. [Users are actually stored in rooms, like MultiMap<roomId, Set<userId>>, where a room may span multiple instances. If one instance goes down, the room may survive, but I want the users on the current instance to become unavailable in the MultiMap as well.]
Only if the above is guaranteed: in a controlled shutdown, is it worth cleaning up the local entries before shutting down, or is it cheaper to just make the instance disappear?
The manual at https://docs.hazelcast.org/docs/latest-dev/manual/html-single/index.html#configuring-multimap doesn't clearly spell out what actually happens (or I am too blind to find it).
If you set backup counts to zero, it means that each entry will only be stored in one partition (the primary). But it doesn't mean that partition will be hosted on the "local" cluster node.
The partition where any entry is stored is determined by a hashing algorithm, but the mapping of partitions to cluster nodes will change as cluster membership changes (nodes are added or removed). So I don't think trying to manipulate the hashcode is a good way to go.
Since you mention the "local instance", I'm guessing you're using Hazelcast in embedded mode, and the Hazelcast cluster nodes are on the same servers that host the "rooms". You might want to configure a MembershipListener; this listener would be notified whenever a node leaves the cluster, and the listener could then remove map entries related to user sessions hosted in rooms on that node.
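A rough sketch of such a listener (Hazelcast 3.x package names; 4.x moved MembershipListener to com.hazelcast.cluster and dropped memberAttributeChanged). The "hostedByMember" bookkeeping MultiMap and the "roomId|userId" encoding are assumptions: the application has to record itself which member hosts which sessions, since the data structure does not expose that.

import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.MemberAttributeEvent;
import com.hazelcast.core.MembershipEvent;
import com.hazelcast.core.MembershipListener;
import com.hazelcast.core.MultiMap;

public class SessionCleanupListener implements MembershipListener {
    private final HazelcastInstance hz;

    public SessionCleanupListener(HazelcastInstance hz) {
        this.hz = hz;
    }

    @Override
    public void memberRemoved(MembershipEvent event) {
        String leftUuid = event.getMember().getUuid();
        MultiMap<String, String> hostedBy = hz.getMultiMap("hostedByMember");
        MultiMap<String, String> sessions = hz.getMultiMap("sessions");

        // Drop every (roomId, userId) entry that was hosted by the departed member.
        // This runs on all surviving members, but the removals are idempotent.
        for (String entry : hostedBy.get(leftUuid)) {
            String[] parts = entry.split("\\|", 2);   // recorded as "roomId|userId"
            sessions.remove(parts[0], parts[1]);
        }
        hostedBy.remove(leftUuid);
    }

    @Override
    public void memberAdded(MembershipEvent event) {
        // Nothing to clean up when a member joins.
    }

    @Override
    public void memberAttributeChanged(MemberAttributeEvent event) {
        // Not relevant here.
    }
}

// Registered once per member:
// hz.getCluster().addMembershipListener(new SessionCleanupListener(hz));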
That's the wrong use case for a partition-based distributed system. When you store data in a partitioned distributed data structure such as a Map or MultiMap, you do not control which partition hosts your key-value data. The host partition is determined by a consistent hashing algorithm applied to the key, and this applies to both write and read operations. With backups enabled, the data is replicated to backup partitions on other nodes so that it can be recovered in case of a node failure.
So in your case, you don't even know whether a particular entry is actually local to your instance (unless you manually record the key-to-partition mapping using the Hazelcast APIs). You may look up an entry hoping it is local because you executed the write operation from that same node, but in reality the entry may be stored in a partition on some other node in the cluster.
I believe what you want is a Near Cache, which can also be thought of as an L1 cache local to your application. If you lose the app instance, you lose the Near Cache; note, however, that Near Cache is not available for MultiMap. And even with a Near Cache you will never receive "null" or "data not found", because a Near Cache loads the data from the partition owner (cluster node) whenever the data is not found locally.
You can also turn off backups, but that means losing the data on the lost node, which may not be the data local to your application.
Hope that helps.

Hazelcast data affinity with preferred member as primary

I have a clustered system set up with Hazelcast to store my data. Each node in the cluster is responsible for connecting to a service on localhost and piping data from this service into the Hazelcast cluster.
I would like this data to be stored primarily on the node that received it, and also processed on that node. I'd like the data to be readable and writable on other nodes with moderately less performance requirements.
I started with a naive implementation that does exactly as I described with no special considerations. I noticed performance suffered quite a bit (we had a separate implementation using Infinispan to compare it with). Generally speaking, there is little logical intersection between the data I'm processing from each individual service. It's stored in a Hazelcast cluster so it can be read and occasionally written from all nodes and for failover scenarios. I still need to read the last good state of the failed node if either the Hazelcast member fails on that node or the local service fails on that node.
So my first attempt at co-locating the data and reducing network chatter was to key much of the data with a serverId (number from 1 to 3 on, say, a 3-node system) and include this in the key. The key then implements PartitionAware. I didn't notice an improvement in performance so I decided to execute the logic itself on the cluster and key it the same way (with a PartitionAware/Runnable submitted to a DurableExecutorService). I figured if I couldn't select which member the logic could be processed on, I could at least execute it on the same member consistently and co-located with the data.
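For reference, the kind of key described here might look roughly like this (a sketch with illustrative names; PartitionAware lives in com.hazelcast.core in 3.x and com.hazelcast.partition in 4.x):

import java.io.Serializable;

import com.hazelcast.core.PartitionAware;

public class ServerScopedKey implements PartitionAware<Integer>, Serializable {
    private final int serverId;   // 1..3 on a 3-node system
    private final String entryId;

    public ServerScopedKey(int serverId, String entryId) {
        this.serverId = serverId;
        this.entryId = entryId;
    }

    // All keys sharing a serverId hash to the same partition, which is also
    // why only three distinct serverIds concentrate everything on few members.
    @Override
    public Integer getPartitionKey() {
        return serverId;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ServerScopedKey)) return false;
        ServerScopedKey other = (ServerScopedKey) o;
        return serverId == other.serverId && entryId.equals(other.entryId);
    }

    @Override
    public int hashCode() {
        return 31 * serverId + entryId.hashCode();
    }
}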
That made performance even worse as all data and all execution tasks were being stored and run on a single node. I figured this meant node #1 was getting partitions 1 to 90, node #2 was getting 91 to 180, and node #3 was getting 181 to 271 (or some variant of this, without complete knowledge of the key hash algorithm and exactly how my int serverId translates to a partition number). So hashing serverIds 1, 2, and 3 resulted in, e.g., the oldest member getting all the data and execution tasks.
My next attempt was to set backup count to (member count) - 1 and enable backup reads. That improved things a little.
I then looked into ReplicatedMap but it doesn't support indexing or predicates. One of my motivations to moving to Hazelcast was its more comprehensive support (and, from what I've seen, better performance) for indexing and querying map data.
I'm not convinced any of these are the right approaches (especially since mapping 3 node numbers to partition numbers doesn't match up to how partitions were intended to be used). Is there anything else I can look at that would provide this kind of layout, with one member being a preferred primary for data and still having readable backups on 1 or more other members after failure?
Thanks!
Data grids provide scalability: you can add or remove storage nodes to adjust capacity, and for this to work the grid needs to be able to rebalance the data load. Rebalancing means moving some of the data from one place to another. So as a general rule, the placement of data is out of your control and may change while the grid runs.
Partition awareness will keep related items together; if they move, they move together. A runnable/callable accessing both can therefore be satisfied from a single JVM, so it will be more efficient.
There are two possible improvements if you really need data local to a particular node: read-backup-data or near-cache. See this answer.
Both or either will help reads, but not writes.
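For completeness, a configuration sketch of both options (the map name and counts are placeholders, with the backup count mirroring the member-count-minus-one approach from the question):

import com.hazelcast.config.Config;
import com.hazelcast.config.MapConfig;
import com.hazelcast.config.NearCacheConfig;

Config config = new Config();
MapConfig mapConfig = config.getMapConfig("data");

mapConfig.setBackupCount(2);         // backups on the other two members
mapConfig.setReadBackupData(true);   // allow reads from a locally held backup

mapConfig.setNearCacheConfig(new NearCacheConfig());  // per-member near cache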

How to shard across a cluster of nodes based on the value of String?

I would like to create a distributed system where the data is sharded across all the nodes. I know there are libraries like Hazelcast or Apache Ignite that do the work for you. In my case, for each sharding key I need to create a socket subscription to another system so it's not just about how data is distributed but also how to actually create these subscriptions in a distributed way.
The idea is to, for each sharding key, create a subscription to the other system. Each subscription would keep a list of entries with data to check for every update coming from the socket connection.
What I had in mind was to send, for each new entry to keep, a message with the sharding key and the data to a topic. Then each node would apply the sharding algorithm to decide which of them is responsible to process the message and then create the subscription to the socket connection if it's not there already and add the data to it.
The complexity here is handling cluster topology changes. I would need to rebalance these connections manually by letting one node act as a leader, reloading the data from the database and resending it. Nodes would also need to react to these changes by clearing their subscriptions. For that I thought of using a version number that travels alongside the data, increases with every change, and lets nodes identify those changes. Another option would be to make every node aware of topology changes through events, but these are asynchronous, so I could run into race conditions when clearing the subscriptions.
Is there any other way or a better one of doing this? Maybe with some of the features Ignite provides? (I'm using Ignite for a cache in this case)
Thanks.
This sounds like a use case for continuous queries: https://apacheignite.readme.io/docs/continuous-queries
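A minimal sketch of what that could look like on each node (cache name, key/value types, and the listener body are assumptions):

import javax.cache.event.CacheEntryEvent;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.ContinuousQuery;

Ignite ignite = Ignition.ignite();
IgniteCache<String, String> cache = ignite.cache("subscriptions");

ContinuousQuery<String, String> qry = new ContinuousQuery<>();

// Called with every insert/update; this is where a node would open or close
// its socket subscription for the affected sharding key.
qry.setLocalListener(events -> {
    for (CacheEntryEvent<? extends String, ? extends String> e : events) {
        System.out.println("Sharding key changed: " + e.getKey());
    }
});

// The query keeps delivering events until the returned cursor is closed.
cache.query(qry);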

How to handle recovery memcached nodes when using spymemcached & HashAlgorithm.KETAMA_HASH

I am using spymemcached & HashAlgorithm.KETAMA_HASH to connect to a pool of memcached of 5 nodes.
My understanding is that when we use a consistent hashing algorithm like Ketama, we don't need to worry when a node goes down, as the keys will be redistributed with minimal impact.
But what about when the downed node rejoins the pool? What do I need to do?
Should I make sure the stale data is removed, or does my program need special handling for this case?
Given that this document is accurate: http://info.couchbase.com/rs/northscale/images/Couchbase_WP_Dealing_with_Memcached_Challenges.pdf
If there is any network disruption, and one or more clients decide that a particular memcached server is not available anymore, they will automatically rehash some data into the rest of the nodes even if the original one is still available. If the node eventually returns to service (for example after the network outage is resolved), the data on that node will be out of date and the clients without updated key-server remapping info will read stale data.
Assuming this is still up to date: http://lists.danga.com/pipermail/memcached/2007-April/003852.html
It would be safe to refresh/flush the node before adding it back, forcing the downed node to clear out any stale entries.
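A sketch of what that might look like with spymemcached (host names are placeholders; the flush is issued through a client pointed only at the returning node, and newer spymemcached releases name the constant DefaultHashAlgorithm.KETAMA_HASH):

import net.spy.memcached.AddrUtil;
import net.spy.memcached.ConnectionFactoryBuilder;
import net.spy.memcached.HashAlgorithm;
import net.spy.memcached.MemcachedClient;

public class NodeRejoin {
    public static void main(String[] args) throws Exception {
        // 1. Flush the recovering node directly, before it rejoins the pool.
        MemcachedClient single =
            new MemcachedClient(AddrUtil.getAddresses("node5.example.com:11211"));
        single.flush().get();   // clears every entry held on that node
        single.shutdown();

        // 2. Re-create the Ketama-hashed client over the full pool.
        MemcachedClient pool = new MemcachedClient(
            new ConnectionFactoryBuilder()
                .setLocatorType(ConnectionFactoryBuilder.Locator.CONSISTENT)
                .setHashAlg(HashAlgorithm.KETAMA_HASH)
                .build(),
            AddrUtil.getAddresses(
                "node1.example.com:11211 node2.example.com:11211 "
                + "node3.example.com:11211 node4.example.com:11211 node5.example.com:11211"));

        // ... use `pool` as usual ...
        pool.shutdown();
    }
}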

Why use an application-level cache if the database already provides caching?

Modern databases provide caching support, and most ORM frameworks cache retrieved data too. Why is this duplication necessary?
Because to get the data from the database's cache, you still have to:
Generate the SQL from the ORM's "native" query format
Do a network round-trip to the database server
Parse the SQL
Fetch the data from the cache
Serialize the data into the database's over-the-wire format
Deserialize the data into the database client library's format
Convert the database client library's format into language-level objects (i.e. a collection of whatevers)
By caching at the application level, you don't have to do any of that. Typically, it's a simple lookup of an in-memory hashtable. Sometimes (if caching with memcache) there's still a network round-trip, but all of the other stuff no longer happens.
Here are a couple of reasons why you may want this:
An application caches just what it needs so you should get a better cache hit ratio
Accessing a local cache will probably be a couple of orders of magnitude faster than accessing the database due to network latency - even with a fast network
Scaling read-write transactions using a strongly consistent cache
Scaling read-only transactions can be done fairly easily by adding more Replica nodes.
However, that does not work for the Primary node, since it can only be scaled vertically.
And that's where a cache comes into play. For read-write database transactions that need to be executed on the Primary node, the cache can help you reduce the query load by directing it to a strongly consistent cache, like the Hibernate second-level cache.
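For example, putting an entity into the Hibernate second-level cache looks roughly like this (the entity is illustrative, and the region factory property depends on the cache provider you pick, e.g. Ehcache):

// In the Hibernate/JPA configuration:
//   hibernate.cache.use_second_level_cache=true
//   hibernate.cache.region.factory_class=<your cache provider's region factory>

import javax.persistence.Entity;
import javax.persistence.Id;

import org.hibernate.annotations.Cache;
import org.hibernate.annotations.CacheConcurrencyStrategy;

@Entity
@Cache(usage = CacheConcurrencyStrategy.READ_WRITE)
public class Country {
    @Id
    private Long id;

    private String name;   // rarely changing reference data is a good fit
}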
Using a distributed cache
Storing an application-level cache in the memory of the application is problematic for several reasons.
First, the application memory is limited, so the volume of data that can be cached is limited as well.
Second, when traffic increases and we want to start new application nodes to handle the extra traffic, the new nodes would start with a cold cache, making the problem even worse as they incur a spike in database load until the cache is populated with data.
To address this issue, it's better to have the cache running as a distributed system, like Redis. This way, the amount of cached data is not limited by the memory size on a single node since sharding can be used to split the data among multiple nodes.
And, when a new application node is added by the auto-scaler, the new node will load data from the same distributed cache. Hence, there's no cold cache issue anymore.
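A cache-aside sketch against such a shared cache, using the Jedis client for Redis (host, key format, and TTL are placeholders):

import redis.clients.jedis.Jedis;

public class ProductCache {
    private final Jedis jedis = new Jedis("redis.internal", 6379);

    public String getProductJson(long productId) {
        String key = "product:" + productId;

        String cached = jedis.get(key);
        if (cached != null) {
            return cached;                 // hit: no database round-trip at all
        }

        String fromDb = loadFromDatabase(productId);
        jedis.setex(key, 300, fromDb);     // cache for 5 minutes, shared by all app nodes
        return fromDb;
    }

    private String loadFromDatabase(long productId) {
        // placeholder for the real database query
        return "{\"id\":" + productId + "}";
    }
}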
Even if a database engine caches data, indexes, or query result sets, it still takes a round-trip to the database for your application to benefit from that cache.
An ORM framework runs in the same space as your application. So there's no round-trip. It's just a memory access, which is generally a lot faster.
The framework can also decide to keep data in cache as long as it needs it. The database may decide to expire cached data at unpredictable times, when other concurrent clients make requests that utilize the cache.
Your application-side ORM framework may also cache data in a form that the database can't return, e.g. as a collection of Java objects instead of a stream of raw data. If you rely on database caching, your ORM has to repeat that transformation into objects, which adds overhead and decreases the benefit of the cache.
Also, the database's cache might not be as practical as one might think. I copied this from http://highscalability.com/bunch-great-strategies-using-memcached-and-mysql-better-together -- it's MySQL-specific, though.
Given that MySQL has a cache, why is memcached needed at all?
The MySQL cache is associated with just one instance. This limits the cache to the maximum address space of one server. If your system is larger than the memory of one server, then using the MySQL cache won't work. And if the same object is read from another instance, it's not cached.
The query cache invalidates on writes. You build up all that cache and it goes away when someone writes to it. Your cache may not be much of a cache at all depending on usage patterns.
The query cache is row based. Memcached can cache any type of data you want and it isn't limited to caching database rows. Memcached can cache complex objects that are directly usable without a join.
The performance considerations related to the network roundtrips have correctly been pointed out.
To that, it must be added that caching data anywhere other than in the DBMS (not the "database") creates the problem of potentially outdated data still being presented as up to date.
Giving in to the temptation of performance improvements comes at the expense of losing the (watertight, or at least close to watertight) guarantee of absolutely reliable, correct, and consistent data.
Consider this whenever accuracy and consistency are crucial.
A lot of good answers here. I'll add one other point: I know my access pattern, the database doesn't.
Depending on what I'm doing, I know that if the data ends up stale, that's not really a problem. The DB doesn't know that, and would have to reload the cache with the new data.
I know that I'll come back to a piece of data a few times over the next while, so it's important to keep it around. The DB has to guess at what to keep in the cache; it doesn't have the information I do. So if I fetch it from the DB over and over, it may not be in the cache if the server is busy, and I could get a cache miss. With my own cache, I can be sure I get a hit. This is especially true for data that is non-trivial to get (i.e. a few joins, some group functions) as opposed to a single row. Getting a row with primary key 7 is easy for the DB, but if it has to do some real work, the cost of the cache miss is much higher.
No doubt modern databases provide caching facilities, but when your site gets more traffic and you need to perform many database transactions, you will not get high performance from the database cache alone. In that case a Hibernate cache helps by optimizing database access: the cache stores data already loaded from the database, so the traffic between the application and the database is reduced when the application wants to access that data again, and access times drop accordingly.
That said, caches can sometimes become a burden and actually slow down the server. Under high load, the algorithm deciding what is cached and what is not may not fit the incoming requests well; what you get is a cache that, over time, starts to behave like a FIFO queue. This becomes noticeable when the table behind the cache has significantly more records than will ever fit in memory.
A good trade-off is to cluster the data you want to cache: have a main server that pushes updates to the clusters, with the update schedule tailored per table depending on TTL (time-to-live) settings.
Your logic and data on the user node can then sit on the same server, which opens up the option of in-memory databases; if data does have to be fetched, you could set it up to use a pipe instead of a network call.
This takes some thought about how you want to use the data, and if you cluster you have to be aware of distributed transactions (transactions spanning more than one database). But if the cached data is updated on its own, without links into other DB spaces, you can get away with this.
The problem with ORM caching is that if the database is updated independently by another application, the ORM cache can become out of date. It can also get tricky when you update a set: the update might touch something in your cache, so there has to be some algorithm to identify which records need to be removed or updated in memory (slowing down the update), and that algorithm quickly becomes tricky and bug-prone.
If you use ORM caching, keep to a simple rule: cache small, simple objects that hardly ever change (user/role details, for example) and that are hit many times per request. Outside of that, I suggest clustering the data for performance.
