Infinispan : Local cache only

I'm using Infinispan (7.2.3.Final) to store data in multiple caches.
The thing is: I only want to store data locally, in files. I don't want to keep data in memory, to avoid memory issues.
I get this error:
java.lang.OutOfMemoryError: Java heap space

Infinispan is a distributed in-memory data store for Java, so I don't think it is a relevant choice if you don't want to use RAM. In my opinion, using Infinispan well means tuning memory (size and eviction) to find a good tradeoff between running cost, complexity, and performance.
You can configure Infinispan to persist data (doc), and you can configure it to evict data from RAM (doc). But I cannot advise a concrete configuration unless you describe your use case, and in particular why you think you need Infinispan (why not a database?).
One possible usage is to keep everything in memory. Obviously your data has to be small enough (I won't give a number: some people are willing to pay for several machines to reduce latency; it depends on your business).
Ten years ago, we could use such caches simply to batch insertions into a database. Now we use Kafka for that use case.
A frequent usage is to keep hot data in memory. In this case we configure eviction and persistence. I think you are looking for eviction strategies here. There are several of them, but as far as I know none lets you avoid RAM entirely: objects pass through memory, at least while they are being persisted.
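To make the "hot data in memory, everything persisted" idea concrete, here is a minimal sketch in plain Java. It is not Infinispan's API: the class name HotDataStore, the one-file-per-key layout, and the maxInMemory parameter are all assumptions made up for illustration. Every value is written to disk, and only a bounded number of recently used entries stay on the heap.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; this is NOT how Infinispan is implemented or configured.
final class HotDataStore {
    private final Path dir;
    private final Map<String, String> hot; // bounded, access-ordered in-memory tier

    HotDataStore(Path dir, final int maxInMemory) {
        try {
            this.dir = Files.createDirectories(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // accessOrder = true makes iteration order least-recently-used first
        this.hot = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxInMemory; // evict from RAM only; the file stays
            }
        };
    }

    // Write-through: the value is persisted first, then kept as a hot entry.
    void put(String key, String value) {
        try {
            Files.write(fileFor(key), value.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        hot.put(key, value);
    }

    // Serve from memory when possible, otherwise reload from disk.
    String get(String key) {
        String value = hot.get(key);
        if (value != null) {
            return value;
        }
        Path file = fileFor(key);
        if (!Files.exists(file)) {
            return null;
        }
        try {
            value = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        hot.put(key, value);
        return value;
    }

    int inMemoryCount() {
        return hot.size();
    }

    private Path fileFor(String key) {
        // Assumption: keys are safe file names; a real store would encode them.
        return dir.resolve(key);
    }

    public static void main(String[] args) throws IOException {
        HotDataStore store = new HotDataStore(Files.createTempDirectory("hot-data"), 2);
        for (int i = 0; i < 5; i++) {
            store.put("k" + i, "v" + i);
        }
        System.out.println("entries in memory: " + store.inMemoryCount());
        System.out.println("k0 reloaded from disk: " + store.get("k0"));
    }
}
```

Note that even here each value sits on the heap while it is being written or read, which is the point made above: you can bound memory use, but not remove it from the path.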

Related

In memory distributed caching vs In memory data grid

What is the difference between "in-memory distributed caching" and an "in-memory data grid"?
When do we use one over the other, i.e., what are the practical use cases for an "in-memory data grid"?
Can you name a few popular "in-memory data grid" frameworks that are compatible with Java applications?

In-memory distributed caching is just that: a cache that is distributed across different nodes. It makes data highly available to the application(s) using it. Such caches are usually key/value stores and support the standard put/get operations, along with the ability to partition, replicate, or back up data.
An in-memory data grid is a distributed cache with a bit of computing power thrown in. On top of the abilities of a distributed cache, it lets you run distributed SQL queries, co-located processing, etc.
plamb has given a good list of in-memory data grids.

With an in-memory data grid you can build an in-memory distributed cache. For example, Oracle uses Oracle Coherence to implement that kind of cache in WebLogic. That example also answers the last part of your question.
But it is the most expensive way to do plain caching (money, memory, network, CPU): an in-memory data grid is more reliable than a cache needs to be. It can hold the real data, so you don't need another backend.
If you need a distributed in-memory cache, solutions like EHCache, Memcached, or Infinispan can do it. In fact, almost every Java EE application server provides a solution for it.

Big Redis setup

I need to store 32M records in Redis 3.0.1; each record needs around 422KB, making a total of around 13GB of information.
The information is stored on disk in a zipped hash list, serialized with Jackson Smile. I'm using Java 6, Jedis and AIX.
I have a few questions:
Does that mean that the Redis process needs 13GB of RAM?
Is this a manageable size for a single instance, or would you go for a cluster setup? I think we can have up to 4 servers. This would mean revisiting the whole project and its dates, so please consider other management impacts of this question.
Is there a better way of storing this amount of data?
Thanks
Carlos

Even if you use a Redis Cluster, all of your data should fit in memory.
With 13TB of data, as pointed out by Alex, and limited to 4 servers as you said, it means each server should have more than 3TB of RAM...
Moreover, Redis stores the data in memory in a format that is optimised for speed, and so does not try very hard to reduce its size. So it may take more than 13TB in practice.
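For reference, the raw arithmetic behind the terabyte figure can be checked with a few lines of Java. This is only a back-of-the-envelope sketch: it assumes 1 KB = 1000 bytes, takes the record count and size straight from the question, and ignores Redis's own per-key overhead. The class name RedisSizing is made up for the example.

```java
import java.util.Locale;

// Back-of-the-envelope sizing for the numbers quoted in the question.
final class RedisSizing {
    // Raw payload in bytes, assuming decimal units (1 KB = 1000 bytes).
    static long totalBytes(long records, long kbPerRecord) {
        return records * kbPerRecord * 1000L;
    }

    public static void main(String[] args) {
        long total = totalBytes(32_000_000L, 422L);
        double terabytes = total / 1e12;
        System.out.println(String.format(Locale.ROOT, "total payload: %.1f TB", terabytes));
        System.out.println(String.format(Locale.ROOT, "per server (4 servers): %.1f TB", terabytes / 4));
    }
}
```

So the payload alone is in the terabyte range, three orders of magnitude above the 13GB estimated in the question.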
That's why I would not recommend Redis in this case, or at least not Redis only.
Maybe you should consider an alternative NoSQL database that offers fast response times even though it stores data on disk, like Couchbase (it uses a memcached-based caching layer internally).
Or, if your use case allows it, an easier solution would be to add a Redis cache to your current architecture, without changing the database you currently use. It will dramatically improve access speed for data that is in the cache (but won't speed up the first access). Whether it helps depends on whether the data is likely to be requested more than once in a short period of time.
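The "add a cache in front of the existing database" option is usually called cache-aside. Here is a minimal sketch of the read path, with assumptions stated up front: a ConcurrentHashMap stands in for Redis, the loader function stands in for the existing database lookup, and the CacheAside class itself is hypothetical. With Jedis you would replace the map calls with the client's get/set calls.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Cache-aside sketch: the map is a stand-in for Redis, `loader` for the slow database.
final class CacheAside<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader;
    private final AtomicInteger loads = new AtomicInteger(); // trips to the backing store

    CacheAside(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        V cached = cache.get(key);
        if (cached != null) {
            return cached;            // cache hit: the database is not touched
        }
        V loaded = loader.apply(key); // cache miss: the first access still pays full cost
        loads.incrementAndGet();
        if (loaded != null) {
            cache.put(key, loaded);
        }
        return loaded;
    }

    int loads() {
        return loads.get();
    }

    public static void main(String[] args) {
        CacheAside<String, String> records = new CacheAside<>(id -> "record-" + id);
        records.get("42");
        records.get("42");
        System.out.println("database loads: " + records.loads());
    }
}
```

Only repeated reads of the same key benefit, which is why this helps only if records are requested more than once within a short period.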

Can Terracotta's BigMemory Go be used without EHCache?

For an upcoming project I will keep a large amount of data (up to 10GB) in RAM, but not as a cache. Is it possible to use BigMemory (in particular Go, i.e. the free edition) without Ehcache, simply as non-garbage-collected memory storage? I have not found a clear answer in the docs, which mostly talk about the typical integration with Ehcache.
Thank you.

Not without Ehcache: it is the API for BigMemory:
BigMemory Go currently uses Ehcache as its user-facing data access API.

Basically, BigMemory is designed as another storage tier. You store things on the heap; once the heap tier is full, they go off-heap (which is BigMemory); and once that is full, they go to disk. This fits the NoSQL paradigm for storing big data, where things work well in key-value form. You can store any kind of value just by making it serializable.
As for your constraint of "not as a cache", it's very much possible to configure the cache so that values never get evicted from memory. In any case, BigMemory Go gives you a limit of 32GB, so storing 10GB won't trigger any eviction algorithms even without any configuration.

Caching approach for a cluster of servers

I have a Java application deployed on a cluster of JBoss AS 5.1 which requires a lot (> 3 GB) of data to be cached.
Right now the server cluster has just 2 nodes (separate machines).
Here are specific requirements:
Both nodes should not require data to be loaded into cache (i.e., there should either be replication or cache should reside on a separate server)
The data should never expire.
Both of the above requirements are REALLY important for the application. I'd be thankful if the suggestion would be made keeping both of these in mind.
I should also add a third requirement:
ease of use
The application was initially using a HashMap. I tried replacing the HashMap with JBoss Cache 3.2.1 for its replication and thread-safety features, but I'm not really happy with JBoss Cache's performance. Also, when I load the data into the cache, the 8 GB of RAM is almost entirely used (most of it by the cache entries).
I'd like to hear the experience of people who have handled such kind of caching scenario themselves. Thanks for your time in advance.

You can try GigaSpaces XAP; its data grid is a replicated cache and performs very well.
http://www.gigaspaces.com/datagrid

If you want a cache that provides a Java HashMap interface and can easily support gigabytes of cache data, with no expiry, then check out Oracle Coherence. This would use the Coherence "distributed cache" option (which is the default configuration). For more info, see: http://coherence.oracle.com/
Elastic. Just add nodes. Auto-discovery. Auto-load-balancing. No data loss. No interruption. Every time you add a node, you get more data capacity and more throughput.
Use both RAM and flash. Transparently. Easily handle 10s or even 100s of gigabytes per Coherence node (e.g. up to a TB or more per physical server).
Automatic high availability (HA). Kill a process, no data loss. Kill a server, no data loss.
Datacenter continuous availability (CA). Kill a data center, no data loss.
For the sake of full disclosure, I work at Oracle. The opinions and views expressed in this post are my own, and do not necessarily reflect the opinions or views of my employer.

I have 'solved' this problem before (work code, can't show you)... but, I can tell you this much:
with large volumes, a large amount of memory is used in overhead in HashMaps.
you can save a lot of memory by replacing java.util.* classes with smart uses of arrays.
every time you have memory allocations you also have to scan/collect that memory in the GC, so saving memory also improves performance.
Wherever you can, use arrays....
Edit: Apparently the concept of Hash Maps has been forgotten.... Has the Java implementation of HashMap made people believe it is the only way? A structured set of arrays, with a hash function, and a binary sort.... all basic structures... http://en.wikipedia.org/wiki/Hash_table
One array to add keys to, a parallel array to store the values in, and an int-based hash table for fast lookup into the key array...
Computer Science - maybe second year ;-)
Edit again: I used the core concepts I have described in the JDOM project here: https://github.com/hunterhacker/jdom/blob/master/core/src/java/org/jdom2/StringBin.java
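For readers who want to see the shape of that idea, here is a rough sketch of a map built from a key array, a parallel value array, and an int index table. It is not the work code mentioned above and not JDOM's StringBin: the class name ArrayBackedMap, the linear probing, and the fixed String-to-String types are assumptions chosen to keep the example short. It also supports inserts and overwrites only, no removal.

```java
import java.util.Arrays;

// Insert-only String-to-String map: parallel arrays plus an int index table.
final class ArrayBackedMap {
    private String[] keys = new String[16];
    private String[] values = new String[16];
    // Each slot holds an index into keys/values, or -1 when empty.
    // The table is always twice the key capacity, so it is never more than half full.
    private int[] table = newTable(32);
    private int size;

    private static int[] newTable(int length) {
        int[] t = new int[length];
        Arrays.fill(t, -1);
        return t;
    }

    // Linear probing: returns the slot holding `key`, or the first empty slot.
    private int findSlot(String key) {
        int mask = table.length - 1; // table length is always a power of two
        int h = key.hashCode();
        int slot = (h ^ (h >>> 16)) & mask;
        while (table[slot] != -1 && !keys[table[slot]].equals(key)) {
            slot = (slot + 1) & mask;
        }
        return slot;
    }

    String get(String key) {
        int idx = table[findSlot(key)];
        return idx == -1 ? null : values[idx];
    }

    void put(String key, String value) {
        int slot = findSlot(key);
        if (table[slot] != -1) {      // existing key: overwrite in place
            values[table[slot]] = value;
            return;
        }
        if (size == keys.length) {    // arrays full: double them and rebuild the index
            grow();
            slot = findSlot(key);
        }
        keys[size] = key;
        values[size] = value;
        table[slot] = size++;
    }

    private void grow() {
        keys = Arrays.copyOf(keys, size * 2);
        values = Arrays.copyOf(values, size * 2);
        table = newTable(table.length * 2);
        for (int i = 0; i < size; i++) {
            table[findSlot(keys[i])] = i;
        }
    }

    int size() {
        return size;
    }

    public static void main(String[] args) {
        ArrayBackedMap map = new ArrayBackedMap();
        map.put("alpha", "1");
        map.put("beta", "2");
        System.out.println(map.get("alpha") + " " + map.get("beta") + " " + map.size());
    }
}
```

The saving over java.util.HashMap comes from not allocating one entry object per mapping: there are three arrays in total, however many entries are stored.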

memcached tomcat mysql on 1GB RAM

I am new to memcached and caching in general. I have a java web application running on Ubuntu + Tomcat + MySQL on a VPS Server with 1GB of memory.
Does it make sense to add a memcached layer with about 256MB for caching? Will this be too much load on the server? Which is more appropriate: caching rendered HTML pages or database objects?
Please advise.

If you're going to cache pages, don't use memcached, use Varnish. However, there's a good chance that's not a great use of memory. Caching pages trades memory for computation and database work, but it costs quite a lot of memory per page, so it's best for cases where the computation and database work needed to produce a single page amounts to a lot (or the pages are very small!). Also, consider that page caching won't be effective, or even possible, if you want per-user customisation on your pages (e.g. showing the number of items in a shopping cart). At least not without getting into some truly hairy shenanigans (edge-side includes, anyone?).
If you're not going to cache pages, and your app is on a single machine, then there's no point using memcached or similar. The point of cache servers like that is to make the memory on one machine work as a cache for another - like how a file server shares a disk, they're essentially memory servers. On a single machine, you might as well give all the memory to Java and cache objects on the heap.
Are you using an object-relational mapper? If so, see if it has any support for a second-level cache. The big three implementations (Hibernate, OpenJPA, and EclipseLink) all support in-memory caches. They're likely to do a much better job than you would if you did the caching yourself.
But if you're not using a mapper, you have no choice but to do the caching yourself. There are extension points in LinkedHashMap for building LRU caches, and then of course there's the people's favourite, SoftReference, in combination with a HashMap. Plus, there are probably cache implementations out there you could download and use; I'd be shocked if there wasn't something in the Apache Commons libraries.
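The LinkedHashMap extension point mentioned above is removeEldestEntry, combined with the access-order constructor. A minimal sketch follows; the class name LruCache and the maxEntries parameter are just illustrative choices.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Small LRU cache built on LinkedHashMap's removeEldestEntry hook.
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxEntries;

    LruCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true: get() moves an entry to the "young" end
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // called after each put; true drops the least recently used entry
    }

    public static void main(String[] args) {
        LruCache<String, String> cache = new LruCache<>(2);
        cache.put("a", "1");
        cache.put("b", "2");
        cache.get("a");      // "a" is now more recently used than "b"
        cache.put("c", "3"); // exceeds the limit, so "b" is evicted
        System.out.println(cache.keySet()); // prints [a, c]
    }
}
```

LinkedHashMap is not thread-safe, so in a web application you would wrap it, for example with Collections.synchronizedMap, before sharing it between request threads.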

memcached won't add any noticeable load on your server, but it will be memory your app can't use. If you only plan to have a single app server for a while, you're better off using an in-JVM cache.
As for what to cache, the answer falls somewhere in the middle of the above. You don't want to cache exactly what's in your database, and you certainly don't want to cache the final output. You have a data model representation in your application that isn't exactly what's in the DB (e.g. a User object might be made up of multiple queries against a few different tables). Cache that kind of thing, as it's the most reusable.
There's lots of info on the memcached site that should help you understand and get going with caching in general and memcached specifically.

It might make sense to do that. Why not try a smaller size, like 64 MB, and see how that goes? The more resources you give to memcached, the less there is for everything else. You should try it and see what gives you the best performance.
