Ignite - full sync configuration

I have two Ignite server nodes in the cluster (each node is started inside a Spring Boot application).
I also have two caches:
// Persistent cache
configuration.setReadThrough(true);
configuration.setWriteThrough(true);
configuration.setCacheStoreFactory(storeFactory);
configuration.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);
configuration.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC);
configuration.setCacheMode(CacheMode.REPLICATED);
// In-memory cache
configuration.setIndexedTypes(String.class, SequenceReserve.class);
configuration.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);
configuration.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC);
configuration.setCacheMode(CacheMode.REPLICATED);
Update requests for either cache can arrive at both nodes in parallel.
Every update is an atomic operation:
cache.invoke(...);
My main goal is to avoid inconsistent data at any cost. The in-memory cache may lose data, but it must never be inconsistent.
Any node should throw an exception if the transaction was not committed on all nodes.
Can I write a configuration that guarantees this behavior with 100% probability?
UPDATED
I ran the test and got the following behavior:
Each request is always performed on the same node (invoke method). I believe this is correct behavior. When will the query be executed on the second node?

IgniteCache#invoke(...) is a transactional operation. The easiest way to confirm this is to check whether it can throw a TransactionException.
Your configuration seems to be enough to guarantee data consistency between nodes.
If you mean consistency between these two caches, then you can start explicit transactions and run the invoke calls within them.
UPD
Note that, as mentioned in the JavaDoc for the invoke(..) method, your EntryProcessor should be stateless. It may be called multiple times on different nodes, so it should return the same value each time.
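To illustrate why a stateful processor is a problem, here is a minimal sketch in plain Java with no Ignite dependency. It is not how Ignite works internally: two HashMaps stand in for the copies held by two nodes, Map.compute stands in for invoke(...), and the names StatelessProcessorSketch, replicas and invokeOnAll are made up for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiFunction;

// Hypothetical stand-in for a replicated cache: each Map plays the role of one node's copy.
class StatelessProcessorSketch {

    /** Creates the given number of empty replicas. */
    static List<Map<String, Integer>> replicas(int count) {
        List<Map<String, Integer>> result = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            result.add(new HashMap<>());
        }
        return result;
    }

    /** Runs the same processor once per replica, the way a processor may run on several nodes. */
    static void invokeOnAll(List<Map<String, Integer>> replicas, String key,
                            BiFunction<String, Integer, Integer> processor) {
        for (Map<String, Integer> replica : replicas) {
            replica.compute(key, processor);
        }
    }

    /** Stateless: the result depends only on the entry's current value. */
    static BiFunction<String, Integer, Integer> statelessIncrement() {
        return (key, current) -> current == null ? 1 : current + 1;
    }

    /** Stateful: the result depends on how many times the processor has already run. */
    static BiFunction<String, Integer, Integer> statefulCounter() {
        AtomicInteger calls = new AtomicInteger();
        return (key, current) -> calls.incrementAndGet();
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> good = replicas(2);
        invokeOnAll(good, "seq", statelessIncrement());
        System.out.println("stateless: " + good);

        List<Map<String, Integer>> bad = replicas(2);
        invokeOnAll(bad, "seq", statefulCounter());
        System.out.println("stateful:  " + bad);
    }
}
```

With the stateless processor both copies end up with the same value; with the stateful one the second copy sees a different result than the first, which is exactly the kind of divergence the JavaDoc warns about.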
UPD 2
If you call IgniteCache#invoke() on a transactional cache, the provided EntryProcessor is called on every node that holds the relevant partition of this cache. If the cache is atomic, the EntryProcessor is called on the primary node only.
But you shouldn't rely on this behaviour. It isn't specified anywhere, so it may change in future versions. Ignite is free to make as many calls to EntryProcessor#process() as necessary to guarantee data consistency.
You can use the following code to verify my words:
public static void main(String[] args) throws IgniteException {
    Ignite ignite = Ignition.start("examples/config/example-ignite.xml");

    IgniteCache<Integer, String> atomicCache = ignite.getOrCreateCache(
        cacheConfiguration("atomic", CacheAtomicityMode.ATOMIC));
    IgniteCache<Integer, String> txCache = ignite.getOrCreateCache(
        cacheConfiguration("transactional", CacheAtomicityMode.TRANSACTIONAL));

    atomicCache.invoke(1, (entry, arguments) -> {
        System.out.println("Atomic invoke");
        return null;
    });

    txCache.invoke(1, (entry, arguments) -> {
        System.out.println("Transactional invoke");
        return null;
    });
}

private static <K, V> CacheConfiguration<K, V> cacheConfiguration(String name, CacheAtomicityMode atomicity) {
    CacheConfiguration<K, V> cacheCfg = new CacheConfiguration<>(name);
    cacheCfg.setAtomicityMode(atomicity);
    cacheCfg.setCacheMode(CacheMode.REPLICATED);
    cacheCfg.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC);
    return cacheCfg;
}
"Transactional invoke" will be printed on every node, but "Atomic invoke" only on a single one.

Related

How to create a reusable Map

Is there a way to populate a Map once from the DB (through a Mongo repository) and reuse it from multiple classes when required, instead of hitting the database through the repository each time?

As per your comment, what you are looking for is a caching mechanism. Caches are components that keep data in memory, as opposed to files, databases or other media, to allow fast retrieval of information (at the cost of a higher memory footprint).
There are probably various tutorials online, but usually caches all have the following behaviour:
1. They are key-value pair structures.
2. Each entity living in the cache also has a Time To Live (TTL), that is, how long it is considered valid.
You can implement this in the repository layer, so the cache mechanism will be transparent to the rest of your application (but you might want to consider exposing functionality that allows clearing or invalidating part or all of the cache).
So basically, when a query comes to your repository layer, check the cache. If the key exists there, check its time to live. If it is still valid, return that value.
If the key does not exist or the TTL has expired, you add or overwrite the data in the cache. Keep in mind that when you update the data model yourself, you must also invalidate the cache accordingly so that fresh data is pulled from the DB on the next call.
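The lookup steps described above could be sketched roughly as follows. This is a minimal illustration rather than a production cache: the class name TtlCache, the injectable clock and the loader function are all assumptions made for the example, and the loader stands in for whatever repository call fetches the data.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Minimal read-through cache with a per-entry time to live (TTL).
class TtlCache<K, V> {

    /** A cached value together with the time at which it stops being valid. */
    private static final class Entry<T> {
        final T value;
        final long expiresAtMillis;

        Entry(T value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final ConcurrentHashMap<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock;    // injectable so that expiry can be tested
    private final Function<K, V> loader; // stands in for the repository call

    TtlCache(long ttlMillis, LongSupplier clock, Function<K, V> loader) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
        this.loader = loader;
    }

    /** Returns the cached value, loading it again if it is missing or its TTL has expired. */
    V get(K key) {
        long now = clock.getAsLong();
        Entry<V> entry = entries.compute(key, (k, current) ->
                current == null || now >= current.expiresAtMillis
                        ? new Entry<V>(loader.apply(k), now + ttlMillis)
                        : current);
        return entry.value;
    }

    /** Call this after updating the underlying data so the next get() reloads it. */
    void invalidate(K key) {
        entries.remove(key);
    }

    /** Drops every cached entry. */
    void invalidateAll() {
        entries.clear();
    }
}
```

In an application the cache might be created as new TtlCache<>(30_000, System::currentTimeMillis, id -> repository.findById(id)), where repository is a hypothetical data-access object.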

You can declare the map field as public static, which allows application-wide access via ClassLoadingData.mapField.
I think a better solution, if I understood the problem correctly, would be a memoized function, that is, a function that stores the result of its call. Here is a sketch of how this could be done (note that it does not handle possible synchronization problems in a multi-threaded environment):
class ClassLoadingData {
    private static Map<KeyType, ValueType> memoizedData = new HashMap<>();

    public Map<KeyType, ValueType> getMyData() {
        if (memoizedData.isEmpty()) { // use a more complex condition to handle data refresh
            populateData();
        }
        return memoizedData;
    }

    private void populateData() {
        // run your query and put the results into memoizedData
    }
}
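The sketch above leaves synchronization open. One possible way to close that gap is a small memoizing wrapper with double-checked locking, shown below. The class name Memoized and the reset() method are assumptions for illustration; the supplier passed in would be the code that queries the database.

```java
import java.util.function.Supplier;

// Thread-safe memoization of a single value using double-checked locking.
class Memoized<T> implements Supplier<T> {

    private final Supplier<T> loader;
    private volatile T value; // volatile so other threads see the fully built value

    Memoized(Supplier<T> loader) {
        this.loader = loader;
    }

    /** Returns the stored value, calling the loader at most once until reset() is called. */
    @Override
    public T get() {
        T result = value;
        if (result == null) {
            synchronized (this) {
                result = value;
                if (result == null) {
                    // Note: a loader that returns null is simply called again on the next get().
                    result = loader.get();
                    value = result;
                }
            }
        }
        return result;
    }

    /** Forgets the stored value so the next get() reloads it. */
    synchronized void reset() {
        value = null;
    }
}
```

Several classes can then share one instance, for example a Memoized<Map<KeyType, ValueType>> whose loader runs the repository query, and call reset() whenever the data needs to be refreshed.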

Premise: I suggest using an object-relational mapping tool such as Hibernate in your Java project to map the object-oriented domain model to a relational database and let the tool handle the caching implicitly. Hibernate implements a multi-level caching scheme (see the following link for more information: https://www.tutorialspoint.com/hibernate/hibernate_caching.htm).
Regardless of the suggestion above, you can also manually create a singleton class that is used by every class in the project that interacts with the DB:
public class MongoDBConnector {
    private static final Logger LOGGER = LoggerFactory.getLogger(MongoDBConnector.class);
    private static MongoDBConnector instance;

    //Cache period in seconds
    public static int DB_ELEMENTS_CACHE_PERIOD = 30;
    //Latest cache update time
    private DateTime latestUpdateTime;
    //The cache data layer from DB
    private Map<KType, VType> elements;

    private MongoDBConnector() {
    }

    public static synchronized MongoDBConnector getInstance() {
        if (instance == null) {
            instance = new MongoDBConnector();
        }
        return instance;
    }
}
You can then define a load method that refreshes the map with the values stored in the DB, and a write method that writes values to the DB, with the following characteristics:
1- These methods should be synchronized to avoid issues when multiple calls are made concurrently.
2- The load method should apply cache-period logic (ideally with a configurable period) to avoid loading the data from the DB on every call.
Example: Suppose your cache period is 30s. If 10 reads are performed from different points in the code within 30s, the data is loaded from the DB only on the first call, while the other calls read from the cached map, which improves performance.
Note: The longer the cache period, the better the performance of your code, but if the DB is also modified externally (by another tool or manually), the cache will be inconsistent with it until the next reload. So choose the best value for your case.
public synchronized Map<KType, VType> getElements() throws ConnectorException {
    final DateTime currentTime = new DateTime();
    if (latestUpdateTime == null || (Seconds.secondsBetween(latestUpdateTime, currentTime).getSeconds() > DB_ELEMENTS_CACHE_PERIOD)) {
        LOGGER.debug("Cache is expired. Reading values from DB");
        //Read from DB and update cache
        //....
        latestUpdateTime = currentTime;
    }
    return elements;
}
3- The store method should automatically update the cache if the insert succeeds, regardless of whether the cache period has expired:
public synchronized void storeElement(final VType object) throws ConnectorException {
    //Insert object on DB ( throws a ConnectorException if insert fails )
    //...
    //Update cache regardless the cache period
    loadElementsIgnoreCachePeriod();
}
Then you can get the elements from anywhere in your code as follows:
Map<KType, VType> liveElements = MongoDBConnector.getInstance().getElements();

How to use interactive query within kafka process topology in spring-cloud-stream?

Is it possible to use interactive queries (InteractiveQueryService) in Spring Cloud Stream inside the class annotated with @EnableBinding or inside the method annotated with @StreamListener? I tried instantiating a ReadOnlyKeyValueStore within the provided KStreamMusicSampleApplication class and its process method, but it's always null.
My @StreamListener method listens to a bunch of KTables and KStreams, and during the process topology (e.g. filtering) I have to check whether the key from a KStream already exists in a particular KTable.
I tried to figure out how to scan an incoming KTable to check if a key already exists, but had no luck. Then I came across InteractiveQueryService, whose get() method could be used to check if a key exists inside a state store materialized from a KTable. The problem is that I can't access it from within the process topology (@EnableBinding or @StreamListener). It can only be accessed from outside these annotations, e.g. from a RestController.
Is there a way to scan an incoming KTable to check for the existence of a key or value? If not, can we access InteractiveQueryService within the process topology?

InteractiveQueryService in Spring Cloud Stream is not available for use within the actual topology in your StreamListener. As you mentioned, it is meant to be used outside of your main topology. However, for the use case you described, you can still use the state store from your main flow. For example, if you have an incoming KStream and a KTable that is materialized as a state store, you can call process on the KStream and access the state store that way. Here is some rough code to achieve that. You need to adapt it to your specific use case, but it shows the idea.
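input.process(() -> new Processor<Object, Product>() {

    // Kept as a field of the processor: a local variable captured by the lambda could not be assigned in init().
    private ReadOnlyKeyValueStore<Object, String> store;

    @Override
    @SuppressWarnings("unchecked")
    public void init(ProcessorContext processorContext) {
        store = (ReadOnlyKeyValueStore<Object, String>) processorContext.getStateStore("my-store");
    }

    @Override
    public void process(Object key, Product value) {
        //find the key
        store.get(key);
    }

    @Override
    public void close() {
        // Nothing to close here: Kafka Streams manages the lifecycle of the state store.
    }
}, "my-store");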

Caffeine: Can't provide CacheWriter to AsyncLoadingCache

I'm trying to write an AsyncLoadingCache that accepts a CacheWriter and I'm getting an IllegalStateException.
Here's my code:
CacheWriter<String, UUID> cacheWriter = new CacheWriter<String, UUID>() {
    @Override
    public void write(String key, UUID value) {
    }

    @Override
    public void delete(String key, UUID value, RemovalCause cause) {
    }
};
AsyncLoadingCache<String, UUID> asyncCache = Caffeine.newBuilder()
    .expireAfterWrite(60, TimeUnit.SECONDS)
    .writer(cacheWriter)
    .maximumSize(100L)
    .buildAsync((String s) -> { /* <== line 41, exception occurs here */
        return UUID.randomUUID();
    });
And I'm getting this trace
Exception in thread "main" java.lang.IllegalStateException
at com.github.benmanes.caffeine.cache.Caffeine.requireState(Caffeine.java:174)
at com.github.benmanes.caffeine.cache.Caffeine.buildAsync(Caffeine.java:854)
at com.mycompany.caffeinetest.Main.main(Main.java:41)
If I change the cache to a LoadingCache or remove .writer(cacheWriter), the code runs properly. What am I doing wrong? It seems I'm providing the right types to both objects.

Unfortunately these two features are incompatible. While the documentation states this, I have updated the exception to communicate this better. In Caffeine.writer it states,
This feature cannot be used in conjunction with {@link #weakKeys()} or {@link #buildAsync}.
A CacheWriter is a synchronous interceptor for a mutation of an entry. For example, it might be used to evict into a disk cache as a secondary layer, whereas a RemovalListener is asynchronous and using it would leave a race where the entry is not present in either cache. The mechanism is to use ConcurrentHashMap's compute methods to perform the write or removal, and call into the CacheWriter within that block.
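The mechanism described here can be illustrated with a small sketch in plain Java. This is not Caffeine's code: WriteThroughMap and its nested Writer interface are hypothetical, simplified stand-ins, and only the use of ConcurrentHashMap's compute methods is taken from the description above.

```java
import java.util.concurrent.ConcurrentHashMap;

// Calls a synchronous writer from inside ConcurrentHashMap's compute block,
// so the external write and the map update happen atomically for that key.
class WriteThroughMap<K, V> {

    /** Hypothetical, simplified writer interface; not Caffeine's CacheWriter. */
    interface Writer<K, V> {
        void write(K key, V value);

        void delete(K key, V value);
    }

    private final ConcurrentHashMap<K, V> data = new ConcurrentHashMap<>();
    private final Writer<K, V> writer;

    WriteThroughMap(Writer<K, V> writer) {
        this.writer = writer;
    }

    /** Writes through first; if the writer throws, the mapping is left unchanged. */
    void put(K key, V value) {
        data.compute(key, (k, old) -> {
            writer.write(k, value);
            return value;
        });
    }

    /** Notifies the writer and removes the entry in one atomic step. */
    void remove(K key) {
        data.computeIfPresent(key, (k, old) -> {
            writer.delete(k, old);
            return null; // returning null removes the mapping
        });
    }

    V get(K key) {
        return data.get(key);
    }
}
```

Note that the writer needs the actual value at the moment of the mutation, which is what an in-flight future cannot provide.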
In AsyncLoadingCache, the value materializes later when the CompletableFuture is successful, or is automatically removed if null or an error. When the entry is modified within the hash table, this future may be in-flight. This would mean that the CacheWriter would often be called without the materialized value and likely cannot do very intelligent things.
From an API perspective, unfortunately telescoping builders (which use the type system to disallow incompatible chains) become more confusing than using runtime exceptions. Sorry for not making the error clear, which should now be fixed.
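As a rough illustration of the runtime-check approach, here is a hypothetical miniature builder. It only mimics the idea of rejecting an incompatible chain when the terminal method is called; the class MiniCacheBuilder, its methods and the message text are invented for this sketch and are not Caffeine's actual implementation.

```java
// Hypothetical miniature builder; it mimics a runtime state check, not Caffeine's real code.
class MiniCacheBuilder {

    private boolean writerConfigured;

    /** Records that a writer was requested; the callback itself is ignored in this sketch. */
    MiniCacheBuilder writer(Runnable onWrite) {
        writerConfigured = true;
        return this;
    }

    /** The synchronous variant accepts any configuration. */
    String build() {
        return writerConfigured ? "sync cache with writer" : "sync cache";
    }

    /** The asynchronous variant rejects the incompatible option at runtime. */
    String buildAsync() {
        if (writerConfigured) {
            throw new IllegalStateException("writer(...) cannot be combined with buildAsync()");
        }
        return "async cache";
    }
}
```

Every chain compiles, and the incompatible one fails only when buildAsync() is called, which matches the stack trace in the question.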

Understanding clients and servers

I'm pretty new to Ignite and have a question about the responsibilities of client and server nodes. As far as I understand from the documentation, client nodes are very small machines, so it's not their purpose to perform heavy cache operations. For instance, I need to load data from some persistence store, perform some heavy cache-related computations and put the resulting data into a cache. It looks like this:
I.
//This is on a client node
public class Loader {
    private DataSource dataSource;

    @IgniteInstanceResource
    private Ignite ignite;

    public void load() {
        String key;
        String value;
        //retrieve key and value from the dataSource
        IgniteDataStreamer<String, String> streamer = ignite.dataStreamer("cache");
        String result;
        //process value
        streamer.addData(key, result); //<---------1
    }
}
The question is about //1. Is it the client node's responsibility to process the loaded data and put it into the cache? I actually intend to do the following: create a task for each loaded String key and String value and perform all evaluation and cache-related operations on a server node, like this:
II.
public class LoaderJob extends ComputeJobAdapter {
    private String key;
    private String value;

    @Override
    public Object execute() {
        //perform all computation and putting into cache here
        //and return Tuple2(key, result);
    }
}

public class LoaderTask extends ComputeTaskSplitAdapter<Void, Void> {
    //...
    public Void reduce(List<ComputeJobResult> results) throws IgniteException {
        results.stream().forEach(result -> {
            Tuple2<String, String> jobResult = result.getData();
            ignite.dataStreamer("cache").addData(jobResult._1, jobResult._2);
        });
        return null;
    }
}
In the second case, all the client does is load data from the persistence store and then publish tasks to the servers.
What is the common way of doing things like that?

It depends on the amount of data and the computational complexity. With a large amount of data, you can load the data directly from a server node, without using a client.
Here is the simplest example for DataStreamer; you only need to add loading the data from your persistence store and do the calculations before using the DataStreamer.
It also depends on other things, such as the client configuration (CPU, RAM, network) and the connection between the client and server nodes. If the client has a good configuration, for example the same as a server, and it's in the same network as the server nodes, then it's not a problem to do the loading and computations on the client and only then stream the data to the cache.

Creating a dedicated job for each piece of data yourself is a bad idea. The streamer already does something like this: data is buffered and sent to the specific node where it will be stored.
client nodes are very small machines, so it's not their purpose to perform some heavy cache operations
This statement is not true. You can give the client JVM enough resources to load the data.
You should create one data streamer on the client side and load the data from that machine. The streamer instance is also thread-safe, so you can load data from several threads simultaneously.

IgniteDataStreamer is the fastest way to load data into a cache. So the first case is valid.
I think the second case makes sense if the data is gathered from the persistence store on the server nodes and the client sends only the loading parameters.

Google Guava's CacheLoader loadAll() vs reload() semantics

Two things I really like about Guava 11's CacheLoader (thanks, Google!) are loadAll(), which allows me to load multiple keys at once, and reload(), which allows me to reload a key asynchronously when it's "stale" but an old value exists. I'm curious as to how they play together, since reload() operates on but a single key.
Concretely, extending the example from CachesExplained:
LoadingCache<Key, Graph> graphs = CacheBuilder.newBuilder()
    .maximumSize(1000)
    .refreshAfterWrite(1, TimeUnit.MINUTES)
    .build(
        new CacheLoader<Key, Graph>() {
            public Graph load(Key key) { // no checked exception
                return getGraphFromDatabase(key);
            }

            public Map<Key, Graph> loadAll(Iterable<? extends Key> keys) {
                return getAllGraphsFromDatabase(keys);
            }

            public ListenableFuture<Graph> reload(final Key key, Graph prevGraph) {
                if (neverNeedsRefresh(key)) {
                    return Futures.immediateFuture(prevGraph);
                } else {
                    // asynchronous!
                    ListenableFutureTask<Graph> task = ListenableFutureTask.create(new Callable<Graph>() {
                        public Graph call() {
                            return getGraphFromDatabase(key);
                        }
                    });
                    // executor is assumed to be an Executor defined elsewhere; without this call the task never runs
                    executor.execute(task);
                    return task;
                }
            }
        });
...where "getAllGraphsFromDatabase()" does an aggregate database query rather than length(keys) individual queries.
How do these two components of a LoadingCache play together? If some keys in my request to getAll() aren't present in the cache, they are loaded as a group with loadAll(), but if some need refreshing, do they get reloaded individually with load()? If so, are there plans to support a reloadAll()?

Here's how refreshing works.
Refreshing on a cache entry can be triggered in two ways:
Explicitly, with cache.refresh(key).
Implicitly, if the cache is configured with refreshAfterWrite and the entry is queried after the specified amount of time after it was written.
If an entry that is eligible for reload is queried, then the old value is returned, and a (possibly asynchronous) refresh is triggered. The cache will continue to return the old value for the key while the refresh is in progress. (So if some keys in a getAll request are eligible for refresh, their old values will be returned, but the values for those keys will be (possibly asynchronously) reloaded.)
The default implementation of CacheLoader.reload(key, oldValue) just returns Futures.immediateFuture(load(key)), which (synchronously) recomputes the value. More sophisticated, asynchronous implementations are recommended if you expect to be doing cache refreshes.
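The refresh behaviour described above (readers keep getting the old value while a reload runs in the background) can be sketched in plain Java without Guava. This is an analogue for illustration only, not Guava's implementation: the class StaleWhileRefreshCache, the loader function and the injected Executor are assumptions, and the explicit refresh(key) method plays the role of cache.refresh(key).

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.function.Function;

// Readers always get the currently cached value; a refresh replaces it once the reload completes.
class StaleWhileRefreshCache<K, V> {

    private final ConcurrentHashMap<K, V> values = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // must not return null, ConcurrentHashMap rejects null values
    private final Executor executor;     // decides where and when reloads run

    StaleWhileRefreshCache(Function<K, V> loader, Executor executor) {
        this.loader = loader;
        this.executor = executor;
    }

    /** Loads the value on first access; afterwards returns whatever is cached, even during a refresh. */
    V get(K key) {
        return values.computeIfAbsent(key, loader);
    }

    /** Schedules a reload; the old value stays visible until the reload has finished. */
    void refresh(K key) {
        executor.execute(() -> values.put(key, loader.apply(key)));
    }
}
```

Passing Runnable::run as the executor makes the reload synchronous, which corresponds to the default behaviour mentioned above, while a thread pool makes it asynchronous.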
I don't think we're inclined to provide reloadAll at the moment. I suspect it's possible, but things are complicated enough as it is, and I think we're inclined to wait until we see specific demand for such a thing.
