Is Terracotta a distributed cache? - java

Is Terracotta a distributed cache?

Although you don't specify which product you are talking about, I'm going to assume you mean the open source platform itself. The short answer is no, but it can be used to write a distributed cache, and it has been in one of their own products (Ehcache).
You can see an overview of what the core engine is about here (it seems that they are hiding the information on their open source platform behind a registration wall now). It is a clustering engine that does not use J2EE technology, and its main purpose is to simplify distributed computing development. Besides caching, obvious use cases involve high availability and scalability needs. Think of it as enabling relatively plain java code to run "in the cloud" without having to worry about a lot of the details that that might involve.

Terracotta has nothing to do with 'caching' although most implementations use it for caching purpose.
Terracotta is about clustering and the terracotta itself is implemented using java (to my knowledge).
How Terracotta achieves clustering:
1) JVM1 running APP
2) JVM2 running APP (same)
3) JVM3 running APP (same)
Without Terracotta all JVMs are running independently with out knowing about each other performing some redundant tasks and maintaining their independent heaps
When you enable Terracotta (a Terracotta server running) across these 3 JVMs (configured to use Terracotta server)
Terracotta gives a logical view of all 3 JVMs as a single JVM. Any object graph that you designate to be stored at Server(Root ) is available to all 3 JVMs just like any local object but each JVM can can read/write to that object, whose changes are immediately(~) available to the other JVMs.
For this very reason Terracotta is used mainly for caching and distributed computing as idle JVMs with no work can process the work of the heavily loaded lagging JVM if the unit of work object is designated to be shareable.

Your question is unclear (Terracotta has several products) but yes, the Terracotta Platform does offer a solution for Distributed Caching.

L2 cache is the one that is external to a processor (a JVM, in our case) and shared among them. Serving as a transparent L2 cache, Terracotta combines your multicomputer into a multiporcessor. Thus, it is a distributed cache. But, you seem not to get it because you are SW guys who have never imagined that it can be transparent. You expect that a cache is a thing that has get/set methods and coherence problem that you need to resolve at application level.
Read the "Definite Guide to Terracotta". The authors are literal saying that Terracotta is a distributed cache. I think they understand this better than anybody who says "no" replying here.

Related

Memory Caching architecture

I am implementing a memory caching system for a web application. This memory caching system will have to handle objects sized from small scale to large scale and more and more hits to caches (reads and writes) . The system will have to handle multiple cachinng services such as JCS, ehCach, Memcach, SQL caching etc based on the configuration.
For learning and studying purposes and to implement a better architecture for my system, any one please let me have some resources. (example: sample class diagrams with project source files ).
The question is totally unspecific! The best thing you can do is to work through the tutorials, examples and manuals of the caching solutions.
You should also consider distributed caching solutions like infinispan and hazelcast.
For in-memory only caching Guava Cache and cache2k (I work on cache2k) might be sufficient.
If you start a new architecture around caching I strongly suggest that you look into the JSR107/JCache spec, because this is the new standard way to access caching services.

Cache implementation

I've been researching this for a week now, but I'd like some thoughts on my particular situation...
2 physical servers:
Server A - public WAR, admin WAR
Server B - public WAR
Requirements:
Both WARs need to view the same data.
admin WAR modifies / adds data to the cache.
public WARs modify other parts of the cache / add data to it.
entire cache needs to reside in memory on each physical server (if I add something on Server A admin WAR or public WAR, it needs to show up on Server B public WAR) so in the event of a failure, we aren't waiting for half the cache to be populated
1,500 active users/server, vast majority of traffic is read, very little write
Additional hardware is out of the question.
Is there a good third party caching solution for this scenario? It seems most distributed caching systems want to leave half the data on Server A and half on Server B, which wouldn't meet our failover performance needs.
Thanks for any ideas!
You should look at Redis
http://www.gigaspaces.com/ has a solution for that, it allows you to create "Space" that serves as cache in replicated mode, so each node will have exact copy of data.
They also have solution for fail-over or hot stand by.
Edit:
Gigaspace is far more than just a shared cache, but you can use just the caching solution. It's called In memory data grid. They have dramaticaly changed they web pages so I can't find exact page. But if you search through the documentation yo'll find it.
You can start here
http://www.gigaspaces.com/datagrid
But the technology is not free.
Take a look at the replicated options for EhCache.
Sounds like you've been searching for information on "distributed caches", which has a different defintion than "replicated cache". A distributed cache is a larger cache system spread out among many machines, so that the loss of anyone machine in the cluster does not bring down the entire cache, but just a portion. In this scenario the total size of your cache can reach (number of machines times memory of each machine).
In a replicated cache, the cached data is replicated across each machine, limiting you to a total cache size of max(memory of any one machine).
It seems most distributed caching systems want to leave half the data
on Server A and half on Server B, which wouldn't meet our failover
performance needs.
No, you can tweak it easy. Otherwise you need sticky seesion (you have to know exactly, which cache stores your data). You can choose any solution on the market EhCache, GigaSpace, GridGain etc. I would recommend to use JBoss Cache, imho the simplest and exactly what you need
There are many solutions in this space.
Memcached
EhCache
Infinispan
All of them can be configured as distributed caches. AFAIK Infinispan works best when left an an embedded cache in JBoss AS, last I checked it was difficult to integrate into other app servers. If you have money I would recommend BigMemory from Terracotta. Its the commercial derivative of EhCache and provides alot of additional nice-to-have features.
We use Apache Commons JCS and have been very pleased with it. It claims to be almost twice as fast as EHCache. For the situation you have described, you would probably configure a Lateral TCP Cache.

memcached tomcat mysql on 1GB RAM

I am new to memcached and caching in general. I have a java web application running on Ubuntu + Tomcat + MySQL on a VPS Server with 1GB of memory.
Does it make sense to add a memcached layer with about 256MB for caching? Will this be too much load on the server? Which is more appropriate caching rendered html pages or database objects?
Please advise.
If you're going to cache pages, don't use memcached, use Varnish. However, there's a good chance that's not a great use of memory. Cacheing pages trades memory for computation and database work, but it does cost quite a lot of memory per page, so it's best for cases where the computation and database work needed to produce a single page amounts to a lot (or the pages are very small!). Also, consider that page cacheing won't be effective, or even possible, if you want to use per-user customisation on your pages (eg showing the number of items in a shopping cart). At least not without getting into some truly hairy shenanigans (edge-side includes, anyone?).
If you're not going to cache pages, and your app is on a single machine, then there's no point using memcached or similar. The point of cache servers like that is to make the memory on one machine work as a cache for another - like how a file server shares a disk, they're essentially memory servers. On a single machine, you might as well give all the memory to Java and cache objects on the heap.
Are you using an object-relational mapper? If so, see if it has any support for a second-level cache. The big three implementations (Hibernate, OpenJPA, and EclipseLink) all support in-memory caches. They're likely to do a much better job than you would if you did the cacheing yourself.
But, if you're not using a mapper, you have no choice but to do the cacheing yourself. There are extension points in LinkedHashMap for building LRU caches, and then of course there's the people's favourite, SoftReference, in combination with a HashMap. Plus, there are probably cache implementations out there you could download and use - i'd be shocked if there wasn't something in the Apache Commons libraries.
memcached won't add any noticeable load on your server, but it will be memory your app can't use. If you only plan to have a single app server for a while, you're better off using an in-JVM cache.
As far what to cache, the answer falls somewhere in the middle of the above. You don't want to cache exactly what's in your database and you certainly don't want to cache the final output. You have a data model representation in your application that isn't exactly what's in the DB (e.g. a User object might be made up of multiple queries from a few different tables). Cache that kind of thing as it's most reusable.
There's lots of info in the memcached site that should help you understand and get going with caching in general and memcached specifically.
It might make sense to do that, why don't try a smaller size like 64 MB and see how that goes. When you use more resources for the memcache, there is less for everything else. You should try it and see what will give you the best performance.

Choosing a distributed shared memory solution

I have a task to build a prototype for a massively scalable distributed shared memory (DSM) app. The prototype would only serve as a proof-of-concept, but I want to spend my time most effectively by picking the components which would be used in the real solution later on.
The aim of this solution is to take data input from an external source, churn it and make the result available for a number of frontends. Those "frontends" would just take the data from the cache and serve it without extra processing. The amount of frontend hits on this data can literally be millions per second.
The data itself is very volatile; it can (and does) change quite rapidly. However the frontends should see "old" data until the newest has been processed and cached. The processing and writing is done by a single (redundant) node while other nodes only read the data. In other words: no read-through behaviour.
I was looking into solutions like memcached however this particular one doesn't fulfil all our requirements which are listed below:
The solution must at least have Java client API which is reasonably well maintained as the rest of app is written in Java and we are seasoned Java developers;
The solution must be totally elastic: it should be possible to add new nodes without restarting other nodes in the cluster;
The solution must be able to handle failover. Yes, I realize this means some overhead, but the overall served data size isn't big (1G max) so this shouldn't be a problem. By "failover" I mean seamless execution without hardcoding/changing server IP address(es) like in memcached clients when a node goes down;
Ideally it should be possible to specify the degree of data overlapping (e.g. how many copies of the same data should be stored in the DSM cluster);
There is no need to permanently store all the data but there might be a need of post-processing of some of the data (e.g. serialization to the DB).
Price. Obviously we prefer free/open source but we're happy to pay a reasonable amount if a solution is worth it. In any way, paid 24hr/day support contract is a must.
The whole thing has to be hosted in our data centers so SaaS offerings like Amazon SimpleDB are out of scope. We would only consider this if no other options would be available.
Ideally the solution would be strictly consistent (as in CAP); however, eventual consistence can be considered as an option.
Thanks in advance for any ideas.
Have a look at Hazelcast. It is pure Java, open source (Apache license) highly scalable in-memory data grid product. It does offer 7X24 support. And it does solve all of your problems I tried to explain each of them below:
It has a native Java Client.
It is 100% dynamic. Add and remove nodes dynamically. No need to change anything.
Again everything is dynamic.
You can configure number of backup nodes.
Hazelcast support persistency.
Everything that Hazelcast offers is free(open source) and it does offer enterprise level support.
Hazelcast is single jar file. super easy to use. Just add jar to your classpath. Have a look at screen cast in main page.
Hazelcast is strictly consistent. You can never read stale data.
I suggest you to use Redisson - Redis based In-memory Data Grid for Java. Implements (BitSet, BloomFilter, Set, SortedSet, Map, ConcurrentMap, List, Queue, Deque, BlockingQueue, BlockingDeque, ReadWriteLock, Semaphore, Lock, AtomicLong, CountDownLatch, Publish / Subscribe, RemoteService, ExecutorService, LiveObjectService, SchedulerService) on top of Redis server! It supports master/slave, sentinel and cluster server modes. Automatic cluster/sentinel servers topology discovery supported also. This lib is free and open-source.
Perfectly works in cloud thanks to AWS Elasticache support
Depending of what you prefer, i would surely follow the others by suggesting Hazelcast if you're towards AP from the CAP Theorem but if you need CP, i would choose Redis
Have a look at Terracotta's JVM clustering, it's OpenSource ;)
It has no API while it works efficent at JVM level, when you store the value in a replicated object it is sent to all other nodes.
Even locking and all those things work transparent and without adding any new code.
You may want to checkout Java-specific solutions like Coherence: http://www.oracle.com/global/ru/products/middleware/coherence/index.html
However, I consider such solutions to be too complex and prefer to use solutions like memcached. Big disadvantage of memcached for your purpose is lack of record lock it seems and there is no built in way to replicate data for failover. That is why I would look into the key-value data stores. Many of them would satisfy your need completely.
Here is a list of key-value data stores that may help you with your task:
http://www.metabrew.com/article/anti-rdbms-a-list-of-distributed-key-value-stores
Just pick one that you fill comfortable with.
I am doing a similar project, but instead targeting the .NET platform. Apart from the already mentioned solutions, I think you should take a look at ScaleOut StateServer and Alachisoft NCache. I am afraid neither of these alternatives are cheap, but they are a safer bet than open source for commercial solutions according to my judgement.
Both provide Java client APIs, even though I have only played around with the .NET APIs.
StateServer features self-discovery of new cache nodes, and NCache has a management console where new cache nodes can be added.
Both should be able to handle failovers seamlessly.
StateServer can have 1 or 2 passive copies of the data. NCache features more caching topologies to choose between.
If you mean write-through/write-behind to a database that is available in both.
I have no idea how many cache servers you plan to use, but here are the full price specs:
ScaleOut StateServer
Alachisoft NCache
Both are installed and configured locally on your server and they both have GUI Management.
I am not sure exactly what strictly consistent involves, so I'll leave that for you to investigate..
Overall, StateServer is the best option if you want to skip configuring every little detail in the cache cluster, while NCache features very many features and caching topologies to choose from.
Depending on the behaviour of data towards the clients (if the data is read many times from the same client) it might be a good idea to mix local caching on the clients with the distributed caching in the cluster (available for both NCache and StateServer), just a thought.
The specified use case seems to fit into Netflix's Hollow. This is a read-only replicated cache with a single producer and multiple consumers.
Have you tought about using a standard messaging solution like rabbitmq ?
RabbitMQ is an open source implementation of the AMQP protocol.
Your application seems more or less like a Publish/subscribe system.
The Publisher node is the one that does the processing and puts messages (processed data) in a queue in the servers.
Subscribers can get messages from the server in various ways. AMQP decouples the producer and the consumer of messages and is very flexible in how you can combine the two sides.

Any experience using Terracotta open source?

Does anybody have experience using the open source offering from Terracotta as opposed to their enterprise offering? Specifically, I'm interested if it is worth the effort to use terracotta without the enterprise tools to manage your cluster?
Over-simplified usage summary: we're a small startup with limited budget that needs to process millions of records and scale for hundreds-of-thousands of page views per day.
I am in a process of integrating Terracotta with my project (a sensor node network simulator). About three weeks ago I found out about Terracotta from one of my colleagues. And now my application takes advantage of grid computing using Terracotta. Below I summarized some essential points of my experience with Terracotta.
The Terracotta site contains pretty detailed documentation. This article probably a good starting point for a developer Concept and Architecture Guide
When you are stuck with a problem and found no answer in the documentation, the Terracotta community forum is a good place to ask questions. It seems that Terracotta developers check it on a regular basis and pretty responsive.
Even though Terracotta is running under JVM and it is advertised that it is only a matter of configuration to make you application running in a cluster, you should be ready that it may require to introduce some serious changes in you application to make it perform reasonably well. E.g. I had to completely rewrite synchronization logic of my application.
Good integration with Eclipse.
Admin Console is a great tool and it helped me a lot in tweaking my application to perform decently under Terracotta. It collects all performance metrics from servers and clients you can only think of. It certainly has some GUI related issues, but who does not :-)
Prefer standard Java Synchronization primitives (synchronized/wait/notify) over java.util.concurrent.* citizens. I found that standard primitives provide higher flexibility (can be configured to be a read or write cluster lock or even not-a-lock at all), easier to track in the Admin Console (you see the class name of the object being locked rather then e.g. some ReentrantLock).
Hope that helps.
At the moment, the Terracotta enterprise tools provide only a few features beyond the open source version around things like visualization and management (like the ability to kick a client out of the cluster). That will continue to diverge and the enterprise tools are likely to boast more operator-level functionality around things like managing and monitoring, but you can certainly manage and tune an app even with the open source tools.
The enterprise license also gives you things like support, indemnification, etc which may or may not be as important to you as the tooling.
I would urge you to try it for yourself. If you'd like to see an example of a real app using Terracotta, you should check out this reference web app that was just released:
The Examinator
You may want to take a look at JBossCache/PojoCache which is an in-memory distributed caching solution. The difference is it uses a simple API to propagate objects across your 'cluster' of caches, where as Terracotta works at the classloading/jvm level.
(They don't actually have their own JVM, but they modify classes as they are loaded to allow them to be 'clusterable')
Our company had a lot of luck with JBossCache, I'd recommend checking it out.
Update
What I see in the OP message is "well, I don't really know what we need (thus the lack of detailed requirements), but may be some enterprizey tool will magically solve all our problems, known and unforeseen? That would be awesome!"
With an architectural approach like this it's not gonna fly. No success stories from Teracotta would change that.
OSS is beneficial when the community around it can replace the commercial support. Suppose the guy have a problem in production. Community cannot help -- it's too small for the obscure product like this. Servers are down, business is in danger. You see? You need a commercial license up-front. No money? Well, then you're not a business, and probably not gonna become one (if nobody's willing to invest into it).
Sorry for interrupting your day-dreaming.
IMHO:
Terracotta is a clustering solution. Clustering is required for large, enterprise-grade applications. Large applications mean big budgets. Big budgets mean you can afford commercial license from Terracotta.
To put it in another way: if you don't have budget to buy it, it's probably not beneficial for your project.

Categories