Isolation within the same process with Infinispan

I am planning to implement a cache solution in an existing web app. Nothing complicated: basically a concurrent map that supports overflowing to disk and automatic eviction. Clustering the cache could be a requirement in the future, but not now.
I like ehcache's copyOnRead and copyOnWrite features, because it means that I don't have to manually clone things before modifying something I take out of the cache. Now I have started to look at Infinispan, but I have not found anything equivalent there. Does it exist?
I.e., the following unit tests should pass:
@Test
public void testCopyOnWrite() {
    Date date = new Date(0);
    cache.put(0, date);
    date.setTime(1000);
    date = cache.get(0);
    assertEquals(0, date.getTime());
}

@Test
public void testCopyOnRead() {
    Date date = new Date(0);
    cache.put(0, date);
    assertNotSame(cache.get(0), cache.get(0));
}

Infinispan does support copyOnRead/copyOnWrite, albeit the actual format isn't pluggable. The configuration element is lazyDeserialization in Infinispan 4.x and storeAsBinary in Infinispan 5.x. Objects are serialized using the pluggable Marshaller framework, which is used for all forms of marshalling including for RPC calls over a network and storage to disk.
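For illustration, enabling this programmatically might look roughly like the following sketch against the 5.x fluent configuration API (the cache name is arbitrary, and the exact builder methods may differ between 5.x point releases):

import org.infinispan.Cache;
import org.infinispan.configuration.cache.Configuration;
import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.manager.DefaultCacheManager;

// storeAsBinary keeps values in serialized form so reads deserialize
// a fresh copy (but see the caveat below about purely local reads).
Configuration cfg = new ConfigurationBuilder()
        .storeAsBinary().enable()
        .build();
DefaultCacheManager manager = new DefaultCacheManager();
manager.defineConfiguration("copyCache", cfg);
Cache<Integer, Date> cache = manager.getCache("copyCache");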

According to a JBoss developer, Infinispan does not yet support such a feature. You should log a request for enhancement in the Infinispan issue tracker, so that others may vote on it (I will).
That being said, if you need this feature now, a workaround would be to extend AbstractDelegatingCache, and override the get and put methods to add this functionality. You could use your own copy strategy or look at how EHCache did it for inspiration.
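A rough sketch of that workaround, using Java serialization as the copy strategy (the CopyingCache name and the serialization round-trip are illustrative choices, not Infinispan API; values must be Serializable):

import java.io.*;
import org.infinispan.AbstractDelegatingCache;
import org.infinispan.Cache;

public class CopyingCache<K, V> extends AbstractDelegatingCache<K, V> {

    public CopyingCache(Cache<K, V> delegate) {
        super(delegate);
    }

    @Override
    public V get(Object key) {
        return copy(super.get(key)); // copy on read
    }

    @Override
    public V put(K key, V value) {
        return copy(super.put(key, copy(value))); // copy on write
    }

    @SuppressWarnings("unchecked")
    private V copy(V value) {
        if (value == null) return null;
        try {
            // Serialize and immediately deserialize to produce a deep copy
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            ObjectOutputStream oos = new ObjectOutputStream(bos);
            oos.writeObject(value);
            oos.flush();
            return (V) new ObjectInputStream(
                    new ByteArrayInputStream(bos.toByteArray())).readObject();
        } catch (Exception e) {
            throw new IllegalStateException("Copy failed", e);
        }
    }
}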
Also, you may consider the Infinispan forum if you have further questions, since you will have more views from the Infinispan community.

I believe storeAsBinary only takes effect when objects actually need to be serialized, that is, when a put is called and the owner of the key is not the current node.
This also means the test cases in the question could pass if the owner of key 0 is not the current node, but they would still fail in a single-node environment.

Related

Spring Boot Cache TTL

I want to use Spring Boot Cache Abstraction to cache some data (https://docs.spring.io/spring/docs/current/spring-framework-reference/html/cache.html).
I'm open to using any of the providers that are available.
The main thing I need is this: I want to be able to set object level TTL, not just global cache level TTL.
E.g. for each object I store in my cache, I want to specify a custom TTL for the object based on some property of that object.
I know that to set up something like this, it must be done directly through the cache provider; but I have not been able to find examples of my use case - only found use cases where global TTL was being set. Can anyone help?
If you are working with Redis, you can take a look at JetCache:
@Cached(expire = 10, timeUnit = TimeUnit.MINUTES)
User getUserById(long userId);
You need to check out the features of the different cache implementations available for Spring Boot.
Supporting a variable expiry based on the entry value has implications for the internals of the cache implementation and its performance. With variable expiry you typically need an O(log n) data structure. For example, Guava and Caffeine do not support it. EHCache does support it; see its documentation on expiry.
The requested functionality is "beyond" the Spring abstraction, which means, you need to produce code for one specific cache implementation.
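To illustrate what that provider-specific code can look like, here is a sketch of per-entry TTL with Ehcache 3's ExpiryPolicy (3.5+); the CachedValue type and its ttlSeconds field are hypothetical stand-ins for your own object:

import java.time.Duration;
import java.util.function.Supplier;
import org.ehcache.expiry.ExpiryPolicy;

// Hypothetical value type: each cached object carries its own TTL.
class CachedValue {
    final long ttlSeconds;
    CachedValue(long ttlSeconds) { this.ttlSeconds = ttlSeconds; }
}

public class PerEntryTtlPolicy implements ExpiryPolicy<String, CachedValue> {

    @Override
    public Duration getExpiryForCreation(String key, CachedValue value) {
        // TTL derived from a property of the object being stored
        return Duration.ofSeconds(value.ttlSeconds);
    }

    @Override
    public Duration getExpiryForAccess(String key, Supplier<? extends CachedValue> value) {
        return null; // null means: leave the current expiry unchanged
    }

    @Override
    public Duration getExpiryForUpdate(String key, Supplier<? extends CachedValue> oldValue,
                                       CachedValue newValue) {
        return Duration.ofSeconds(newValue.ttlSeconds);
    }
}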

Coherence EntryProcessor query

I'm trying to implement a business functionality which uses Coherence transient caches.
One of the features I was planning to depend upon is auto-eviction of cache entries, when providing a (configurable) time-to-live at the time of putting an item in the cache. The interface NamedCache provides an API to achieve this (http://download.oracle.com/otn_hosted_doc/coherence/330/com/tangosol/net/NamedCache.html#put(java.lang.Object, java.lang.Object, long)).
However, I'm also planning to use Entry-Processors to ensure effective concurrency across the cluster. I'm stuck at a point now where, within the scope of the processor, I'm supposed to work with InvocableMap.Entry to get/set values with a key in the cache. Unfortunately, there is no setValue method which lets me specify the time-to-live value.
I'm assuming here that interfacing directly with the NamedCache reference inside the EntryProcessor's process method will not be a good idea, and will compromise the concurrency guarantees which EntryProcessor provides.
Can you please share your thoughts on what could be the best way to get an entry evicted after a certain amount of time (which is dynamically decided), while ensuring optimal concurrency across a cluster of nodes?
I'm not completely hung up on using the auto-eviction functionality. However, if I were to abandon that, I may have to rely upon a timer-based programmatic removal of the entry, which works reliably across a cluster. Again, I'm falling short of ideas on this one. Ideally, I would want Coherence to deal with this.
Many thanks in advance.
Best regards,
- Aditya
You can try the following: cast the entry in the EntryProcessor to BinaryEntry and set the expiration time.
For example:
public class MyEntryProcessor extends AbstractProcessor implements PortableObject {
    @Override
    public Object process(InvocableMap.Entry myEntry) {
        // Expire this entry 100 ms after this call
        ((BinaryEntry) myEntry).expire(100);
        return myEntry;
    }
}
http://docs.oracle.com/middleware/1212/coherence/COHJR/com/tangosol/util/BinaryEntry.html

What is the real delivered value of JTA?

I'm trying to wrap my head around the value underneath the Java Transactions API (JTA) and one of its implementations, Bitronix. But as I dig deeper and deeper into the documentation, I just can't help but think of the following, simple example:
public interface Transactional {
    void commit(Object obj);
    void rollback();
}
public class TransactionalFileWriter extends FileWriter implements Transactional {
    @Override
    public void commit(Object obj) {
        String str = (String) obj;
        // Write the String to a file.
        write(str);
    }

    @Override
    public void rollback() {
        // Obtain a handle to the File we are writing to, and delete the file.
        // This returns the file system to the state it was in before we created
        // the file and started writing to it.
        File f = getFile();
        // This is just pseudo-code for the sake of this example.
        f.delete();
    }
}
// Some method in a class somewhere...
public void doSomething(File someFile) {
    TransactionalFileWriter txFileWriter = getTxFW(someFile);
    try {
        txFileWriter.commit("Create the file and write this message to it.");
    } catch (Throwable t) {
        txFileWriter.rollback();
    }
}
Don't get too caught up in the actual code above. The idea is simple: a transactional file writer that creates a file and writes to it. Its rollback() method deletes the file, thus returning the file system to the same state it was in before the commit(Object).
Am I missing something here? Is this all the JTA offers? Or is there a whole different set of dimensionality/aspects to transactionality that isn't represented by my simple example above? I'm guessing the latter but have yet to see anything concrete in the JTA docs. If I am missing something, then what is it, and can someone show me concrete examples? I can see transactionality being a huge component of JDBC but would hopefully like to get an example of JTA in action with something other than databases.
As everyone else has mentioned, the primary benefit of JTA is not the single-transaction case, but the orchestration of multiple transactions.
Your "Transactional File" is an excellent, conceptual, example when used in the proper context.
Consider a contrived use case.
You're uploading a picture that has associated metadata, and you want to then alert the infrastructure that the file has arrived.
This "simple" task is fraught with reliability issues.
For example, this workflow:
String pathName = saveUploadedFile(myFile);
saveMetaData(myFile.size(), myFile.type(), currentUser, pathName);
queueMessageToJMS(new FileArrivalEvent(user, pathName));
That bit of code involves the file system and 2 different servers (DB and JMS).
If the saveUploadedFile succeeds, but the saveMetaData does not, you now have an orphaned file on the file system, a "file leak" so to speak. If the saveMetaData succeeds, but the queue does not, you have saved the file, but "nobody knows about it". The success of the transaction relies upon all 3 components successfully performing their tasks.
Now, throw in JTA (not real code):
beginWork();
try {
    String pathName = saveUploadedFile(myFile);
    saveMetaData(myFile.size(), myFile.type(), currentUser, pathName);
    queueMessageToJMS(new FileArrivalEvent(user, pathName));
    commitWork();
} catch (Exception e) {
    rollbackWork();
}
Now it "all works", or "none of it works".
Normally folks jump through hoops to make this kind of thing work safely, since most systems do not have transaction managers. But with a transaction manager (i.e. JTA), the TM manages all of the hoops for you, and you get to keep your code clean.
If you survey the industry you will find very few transaction managers. Originally they were proprietary programs used by "Enterprise" grade systems. TIBCO is a famous one, IBM has one, Microsoft has one. Tuxedo used to be popular.
But with Java, and JTA, and the ubiquitous Java EE (etc) servers "everyone" has a transaction manager. We in the Java world get this orchestration for "free". And it's handy to have.
Java EE made transaction managers ubiquitous, and transaction handling a background consideration. Java EE means "never having to write commit() again". (Obviously Spring offers similar facilities).
For most systems, it's not necessary. That's why most people don't know much about it, or simply don't miss it. Most systems populate a single database, or simply don't worry about the issues surrounding orchestration of multiple systems. The process can be lossy, they have built in their own clean up mechanisms, whatever.
But when you need it, it's very nice. Committing to multiple systems simultaneously cleans up a lot of headaches.
The biggest feature of JTA is that you can compose several transactional stores in one application and run transactions that span across these independent stores.
For instance, you can have a DB, a distributed transactional key-value store and your simple FileWriter and have a transaction that performs operations on all of these and commit all the changes in all the stores at once.
Take a look at Infinispan. That's a transactional data grid; it uses JTA and can be used in combination with other JTA transactional services.
Edit:
Basically JTA is connected to the X/Open XA standard and it provides means to interact with X/Open XA resources directly in Java code. You can use already existing data-stores which hold X/Open XA compliant resources, such as databases, distributed data grids and so on. Or you can define your own resources by implementing javax.transaction.xa.XAResource. Then, when your user transaction uses these resources, the transaction manager will orchestrate everything for you, no matter where the resources are located or in which data-store.
The whole business is managed by the transaction manager, which is responsible for synchronizing independent data-stores. JTA doesn't come with a transaction manager; JTA is just an API. You could write your own if you wish to (javax.transaction.TransactionManager), but obviously that's a difficult task. Instead, what you want is to use an already implemented JTA service/library which features a transaction manager. For instance, if you use Infinispan in your application you can use its transaction manager to allow your transactions to interact with different data-stores as well. It's best to seek further information on how to accomplish this from the implementors of the JTA interface.
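To give a feel for the API, here is a minimal sketch of transaction demarcation through UserTransaction, assuming a container (or a standalone manager such as Bitronix) has bound it in JNDI; the two operations are placeholders for work against enlisted XA resources:

import javax.naming.InitialContext;
import javax.transaction.UserTransaction;

public void transferAcrossStores() throws Exception {
    // Look up the container-provided transaction facade
    UserTransaction utx = (UserTransaction) new InitialContext()
            .lookup("java:comp/UserTransaction");
    utx.begin();
    try {
        updateDatabase();   // placeholder: work against an XA DataSource
        sendJmsMessage();   // placeholder: work against an XA JMS session
        utx.commit();       // transaction manager runs two-phase commit
    } catch (Exception e) {
        utx.rollback();     // undo the work in every enlisted resource
        throw e;
    }
}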
You can find full JTA API documentation here, though it's pretty long. There are also some tutorials available that talk about how to use Java EE Transaction Manager and update multiple data-stores but it is pretty obscure and doesn't provide any code samples.
You can check out Infinispan's documentation and tutorials, though I can't see any example that would combine Infinispan with other data-store.
Edit 2:
To answer your question from the comment: your understanding is more or less correct, but I'll try to clarify it further.
It'll be easier to explain the architecture and answer your question with a picture. The figures below are taken from the JTA 1.1 spec (not reproduced here).
This is the X/Open XA architecture:
Each data-store (a database, message queue, SAP ERP system, etc.) has its own resource manager. In the case of a relational database, the JDBC driver is a resource adapter that represents the Resource Manager of the database in Java. Each resource has to be available through the XAResource interface (so that the Transaction Manager can manage them even without knowing the implementation details of a specific data-store).
Your application communicates with the Resource Managers (to get access to the specific resources) through the resource adapters, as well as with the Transaction Manager (to start/finish a transaction) through the UserTransaction interface. Each Resource Manager needs to be initialized first, and it has to be configured for global transactions (i.e. transactions spanning several data-stores).
So basically, yes, data-stores are independent logical units that group some resources. They also expose an interface that allows performing local transactions (confined to that specific data-store). This interface might be better-performing or might expose some additional functionality specific to that data-store which is not available through the JTA interface.
This is the architecture of JTA environment:
The small half-circle represents the JTA interface. In your case you're mostly interested in the JTA UserTransaction interface. You could also use EJB (transactional beans) and the Application Server would manage transactions for you, but that's a different way to go.
From the transaction manager's perspective, the actual implementation of the transaction services does not need to be exposed; only high-level interfaces need to be defined to allow transaction demarcation, resource enlistment, synchronization and recovery process to be driven from the users of the transaction services.
So the Transaction Manager can be understood as an interface which only represents the actual mechanism used to manage transactions, such as a JTS implementation, but thinking about it as a whole is not an error either.
From what I understand, if you run for instance a JBoss application server, you're already equipped with a Transaction Manager with the underlying transaction service implementation.

Struts2 static data storage / access

I am trying to find the usual design/approach for "static/global" data access/storage in a web app; I'm using Struts 2. Background: I have a number of tables I want to display in my web app.
Problem 1.
The tables will only change and be updated once a day on the server, and I don't want to access a database or load a file for every request to view a table.
I would prefer to load the tables into some global memory/cache once (a day), and have each request get the table from there, rather than access a database.
I imagine this is a common scenario and there is an established approach, but I can't find it at the moment.
For Struts 2, is the ActionContext the right place for this data?
If so, any link to a tutorial would be really appreciated.
Problem 2.
The tables were stored in an XML file I unmarshalled with JAXB to get the table objects, and hence the lists for the tables.
For a small application this was OK, but I think for the web app it's hacky to store the XML as resources and read in the file via the servlet context and parse it. Or is it?
I realise I may be told to store the tables in a database, access them with a DAO, and use Hibernate to get the objects.
I am just curious as to what the usual approach is with data already stored in an XML file, given I will have new XML files daily.
Apologies if the questions are basic; I have a large amount of books/reference material, but it's just taking me time to get the higher-level design answers.
Not having really looked at the caching options, I would fetch the data from the DB myself, but only after an interval has passed.
Usually you work within the Action scope; the next level up is the Session, and the most global is the Application. A simple way to test this is to create an Action class which implements ApplicationAware. Then you can get the values put there from any jsp/action... anywhere you can get to the ActionContext (which is most anyplace); see: http://struts.apache.org/2.0.14/docs/what-is-the-actioncontext.html
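A minimal sketch of that approach, assuming the tables have been stashed in application scope under a hypothetical "tables" key:

import java.util.Map;
import org.apache.struts2.interceptor.ApplicationAware;
import com.opensymphony.xwork2.ActionSupport;

public class ShowTablesAction extends ActionSupport implements ApplicationAware {

    private Map<String, Object> application;

    // Struts injects the application-scoped map before execute() runs
    public void setApplication(Map<String, Object> application) {
        this.application = application;
    }

    @Override
    public String execute() {
        Object tables = application.get("tables"); // loaded once a day elsewhere
        // ... expose the tables to the JSP via a getter
        return SUCCESS;
    }
}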
Anyway, I would implement a basic interceptor which would check whether new data is available and has not already been loaded, and then load the new data (the user triggering this interceptor may not need the new data, so doing the load in a new thread would be a good idea).
This method increases the complexity, as you are responsible for managing some data structures and making them co-operate with the ORM.
I've done this to load data from tables which will never need to be loaded again, where that data stands on its own (I don't need to find relationships between it and other tables). This is quick and dirty; Steven's solution is far more robust and would probably pay you back at a later date when further performance is a requirement.
This isn't really specific to Struts2 at all. You definitely do not want to try storing this information in the ActionContext -- that's a per-request object.
You should look into a caching framework like EHCache or something similar. If you use Hibernate for your persistence, Hibernate has options for caching data so that it does not need to hit the database on every request. (Hibernate can also use EHCache for its second-level cache).
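For illustration, basic usage of the classic net.sf.ehcache API is roughly as follows; the cache name, TTL values, and loadTablesFromXml() are placeholders, and in practice you would usually configure the cache in ehcache.xml instead:

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;

CacheManager manager = CacheManager.create(); // picks up ehcache.xml if present
// name, maxElementsInMemory, overflowToDisk, eternal, timeToLiveSeconds, timeToIdleSeconds
Cache cache = new Cache("tables", 1000, false, false, 86400, 86400);
manager.addCache(cache);

// Load once a day; every request then reads from memory
cache.put(new Element("dailyTables", loadTablesFromXml()));
Element hit = cache.get("dailyTables");
Object tables = (hit == null) ? null : hit.getObjectValue();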
As mentioned earlier, the best approach would be using EHCache or some other trusted cache manager.
Another approach is to use a factory to access the information. For instance, something to the effect of:
public class MyCache {
    private static final MyCache cache = new MyCache();

    // (data members)

    public static MyCache getCache() {
        return cache;
    }

    private MyCache() {
        // (update data members)
    }

    public synchronized SomeType getXXX() {
        ...
    }

    public synchronized void setXXX(SomeType data) {
        ...
    }
}
You need to make sure you synchronize all your reads and writes to make sure you don't have race conditions while updating the cache.
synchronized (MyCache.getCache()) {
    MyCache.getCache().getXXX();
    MyCache.getCache().getTwo();
    ...
}
etc
Again, it's better to use EHCache or something else turn-key, since this is likely to be fickle without a good understanding of the mechanisms. This sort of cache also has performance issues, since it only allows ONE thread to read or write to the cache at a time. (Possible ways to speed it up are to use thread locals and read/write locks, as sketched below; but that sort of thing is already built into many of the established cache managers.)
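For completeness, the read/write-lock variant might look like this sketch (same hypothetical MyCache shape as above), which lets many readers proceed concurrently while writers get exclusive access:

import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class MyCache {
    private static final MyCache cache = new MyCache();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private Object data; // placeholder for the cached tables

    public static MyCache getCache() { return cache; }

    public Object getData() {
        lock.readLock().lock(); // many readers may hold this concurrently
        try {
            return data;
        } finally {
            lock.readLock().unlock();
        }
    }

    public void setData(Object newData) {
        lock.writeLock().lock(); // writers block all readers and other writers
        try {
            data = newData;
        } finally {
            lock.writeLock().unlock();
        }
    }
}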

Is Memcache (Java) for Google App Engine a global cache?

I'm new to Google App Engine, and I've spent the last few days building an app using GAE's Memcache to store data. Based on my initial findings, it appears as though GAE's Memcache is NOT global?
Let me explain further. I'm aware that different requests to GAE can potentially be served by different instances (in fact this appears to happen quite often). It is for this reason, that I'm using Memcache to store some shared data, as opposed to a static Map. I thought (perhaps incorrectly) that this was the point of using a distributed cache so that data could be accessed by any node.
Another definite possibility is that I'm doing something wrong. I've tried both JCache and the low-level Memcache API (I'm writing Java, not Python). This is what I'm doing to retrieve the cache:
MemcacheService cache = MemcacheServiceFactory.getMemcacheService();
After deployment, this is what I examine (via my application logs):
The initial request is served by a particular node, and data is stored into the cache retrieved above.
The next few requests retrieve this same cache and the data is there.
When a new node gets spawned to serve a request (from the logs I know when this happens because GAE logs the fact that "This request caused a new process to be started for your application .."), the cache is retrieved and is EMPTY!!
Now, I also know that there is no guarantee of how long data will stay in Memcache, but from my findings it appears the data is gone the moment a different instance tries to access the cache. This seems to go against the whole concept of a distributed global cache, no?
Hopefully someone can clarify exactly how this SHOULD behave. If Memcache is NOT supposed to be global and every server instance has its own copy, then why even use Memcache? I could simply use a static HashMap (which I initially did until I realized it wouldn't be global due to different instances serving my requests).
Help?
Yes, Memcache is shared across all instances of your app.
I found the issue and got it working. I was initially using the JCache API and couldn't get it to work, so I switched over to the low-level Memcache API but forgot to remove the old JCache code. So the two implementations were stepping on each other.
I'm not sure why the JCache implementation didn't work so I'll share the code:
try {
    if (CacheManager.getInstance().getCache(CACHE_GEO_CLIENTS) == null) {
        Cache cache = CacheManager.getInstance().getCacheFactory()
                .createCache(Collections.emptyMap());
        cache.put(CACHE_GEO_CLIENTS, new HashMap<String, String>());
        CacheManager.getInstance().registerCache(CACHE_GEO_CLIENTS, cache);
    }
} catch (CacheException e) {
    log.severe("Exception while creating cache: " + e);
}
This block of code is inside a private constructor for a singleton called CacheService. This singleton serves as a cache facade. Note that since requests can be served by different nodes, each node will have its own instance of this singleton. So when the singleton is constructed for the first (and only) time on a node, it'll check whether my cache is available; if not, it'll create it. This should technically happen only once, since Memcache is global, yeah? The other somewhat odd thing I'm doing here is creating a single cache entry of type HashMap to store my actual values. I'm doing this because I need to enumerate through all keys, and that's something I can't do with Memcache natively.
What am I doing wrong here?
Jerry, there are two issues I see with the code you posted above:
1) You are using the javax.cache version of the API. According to Google, this has been deprecated:
http://groups.google.com/group/google-appengine-java/browse_thread/thread/5820852b63a7e673/9b47f475b81fb40e?pli=1
Instead, it is intended that we use the net.sf.jsr107 library until the JSR is finalized.
I don't know that using the old API will cause a specific issue, but it still could be trouble.
2) I don't see how you are putting and getting from the cache, but the put statement you have is a bit strange:
cache.put(CACHE_GEO_CLIENTS, new HashMap<String, String>());
It looks like you are putting a second cache inside the main cache.
I have very similar code, but I'm putting and getting individual objects into the cache, not Maps, keyed by a unique ID. And it is working fine for me across multiple instances on GAE.
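For reference, a sketch of that individual-object pattern with the low-level API; the key prefix, value type, and expiration here are arbitrary choices:

import com.google.appengine.api.memcache.Expiration;
import com.google.appengine.api.memcache.MemcacheService;
import com.google.appengine.api.memcache.MemcacheServiceFactory;

MemcacheService cache = MemcacheServiceFactory.getMemcacheService();

// Store each client under its own key instead of one big shared HashMap
String clientId = "client-42"; // hypothetical unique ID
cache.put("geoClient:" + clientId, "some client data",
        Expiration.byDeltaSeconds(3600)); // optional per-entry expiry

String data = (String) cache.get("geoClient:" + clientId);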
-John
