Does BigMemory complement EhCache & Terracotta server? - java

I am using EhCache as the second-level cache for my application's Hibernate DAO layer. To implement distributed caching I am planning to include the Terracotta server.
Recently I came to know about another Terracotta product, BigMemory.
A few questions regarding that:
How will BigMemory help on top of Terracotta/EhCache?
Will it complement the Terracotta/EhCache implementation?
Will it be worth giving a try?
I work on a Java EE application with a Flex UI, a Hibernate ORM layer, SQL Server 2008, and Tomcat as the application server.

How will BigMemory help on top of Terracotta/EhCache?
The way I've understood it, the point of BigMemory is that it stores large amounts of cached data in memory outside the JVM heap. This helps if you have so much you want cached that GC times start impacting your performance, as explained here.
If your data fits in your cache just fine and you don't experience such slowdowns, I would imagine BigMemory could even slow you down compared to plain Terracotta, since heap inside the JVM is faster to access than memory outside it. At the very least, it would not improve much.
Will it complement the Terracotta/EhCache implementation?
Based on the documentation, integration with EhCache/Terracotta should be quite seamless. So, yes.
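For example, with BigMemory on the classpath the off-heap tier is (as far as I understand it) just additional Ehcache configuration. A rough sketch, assuming the Ehcache 2.x fluent CacheConfiguration API; the cache name and sizes are made up, and you'd also need -XX:MaxDirectMemorySize set at least as large as the off-heap store:

    import net.sf.ehcache.Cache;
    import net.sf.ehcache.CacheManager;
    import net.sf.ehcache.config.CacheConfiguration;
    import net.sf.ehcache.config.MemoryUnit;

    public class OffHeapCacheSketch {
        public static void main(String[] args) {
            CacheManager manager = CacheManager.getInstance();
            // Keep a small hot set on-heap and spill the rest to the off-heap (BigMemory) store
            CacheConfiguration config = new CacheConfiguration("hibernateEntities", 10000)
                    .overflowToOffHeap(true)                        // assumption: BigMemory jars present
                    .maxBytesLocalOffHeap(4, MemoryUnit.GIGABYTES); // off-heap tier, outside the GC'd heap
            manager.addCache(new Cache(config));
        }
    }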
Will it be worth giving a try?
I'd go with Terracotta first, measuring memory usage, GC times and their impact, and if it then looks like BigMemory could help some more, sure, give it a go. If things look OK, there's no reason to add extra moving parts.

BigMemory Go offers up to 32 GB of free usage. I would suggest giving BigMemory a try.
BigMemory Go lets you keep all of your application's data instantly available in your server's memory, so I don't think it will slow down your app compared to Terracotta.

Related

How do you optimize an external java library's resource usage?

I have a utility that I built, and for such a small purpose-built utility I was very surprised to notice during testing that it was using 150 MB of memory. I ran it with a heap setting of 1 MB and it still took up more than 50 MB.
After profiling and spending a day trying to figure out where I went wrong, I decided to test a theory. My utility connects to a proprietary application. That connection requires an external library provided by the application vendor.
I wrote a small Hello World using the library and noticed the following:
1) declaring a new object from the vendor's library immediately bumps the memory usage to 50 MB (mostly permgen space);
2) actually trying to connect to an application server bumps the memory usage to 150 MB.
That's just plain silly as far as I'm concerned.
I'm wondering if there is any potential way to tame the beast. Maybe somehow unload classes that I know aren't necessary or won't ever be referenced. The vendor isn't going to change things any time soon.
Or what about loading the vendor's library only when necessary? That way it only eats up memory while actually communicating with the app server.
The JVM only loads classes you actually use. In fact, it only loads methods you actually use; e.g. if you have a bytecode bug in a method you never call, you won't know about it. ;)
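A quick way to see that for yourself (class names here are made up; run with -verbose:class to watch when classes actually get loaded):

    public class LazyLoadDemo {
        static class Heavy {
            // Runs only when Heavy is first used, not at JVM startup
            static { System.out.println("Heavy loaded"); }
        }

        public static void main(String[] args) {
            System.out.println("before first use");
            new Heavy(); // Heavy (and whatever it references) is loaded here
        }
    }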
The best way to reduce the size of the vendor's library is to ask them to improve it, or to write your own.
Unfortunately, a lot of proprietary applications are not very resource friendly. Their first priority is correctness. :j
BTW: 150 MB of memory costs about $6, so I wouldn't spend much more than that worth of time trying to fix it.

Java Memory aware cache

I am looking for some ideas, and maybe an existing implementation if somebody knows of one, but I am willing to code the cache I want on my own.
I want a cache that caches only as many gigabytes as I configure. In comparison to the rest of the app, the cache part will use nearly 100% of memory, so we can generalize the app's memory usage as being the cache size (+ garbage).
Are there methods for getting an estimate of how much memory is used? Or is it better to rely on soft references? Soft references and always running at the top of the JVM memory limit might be very inefficient, with lots of CPU cycles spent on memory cleaning. Can I do some analysis on existing objects, like a myObject.getMemoryUsage()?
LinkedHashMap has enough cache hits for my purposes, so I don't have to code some strategic caching monster, but I don't know how to solve this memory issue properly. Any ideas? I don't want OOMEs flying anywhere.
What is best practice?
SoftReferences are not a great idea, as they tend to be cleared all at once. This means that when you take a performance hit from a GC, you also take a hit from having to rebuild your cache.
You can use Instrumentation.getObjectSize() to get the shallow size of an object, and use reflection to obtain a deep size. However, doing this is relatively expensive and not something you want to be doing very often.
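For reference, getObjectSize() is only available through an instrumentation agent. A minimal sketch of the plumbing, assuming you package it in a jar whose manifest declares Premain-Class and start the JVM with -javaagent (names are made up):

    import java.lang.instrument.Instrumentation;

    public class SizeOfAgent {
        private static volatile Instrumentation instrumentation;

        // Called by the JVM before main() when started with -javaagent:sizeof-agent.jar
        public static void premain(String agentArgs, Instrumentation inst) {
            instrumentation = inst;
        }

        // Shallow size only: objects referenced by obj are not included
        public static long shallowSizeOf(Object obj) {
            if (instrumentation == null) {
                throw new IllegalStateException("Agent not loaded via -javaagent");
            }
            return instrumentation.getObjectSize(obj);
        }
    }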
Why can't you limit the size to a number of objects? In fact, I would start with the simplest cache you can and only add what you really need.
LRU cache in Java.
EDIT: One way to track how much memory you are using is to serialize the value and store it as a byte[]. This can give you fairly precise control, but it can also slow down your solution by up to 1000x. (Nothing comes for free. ;)
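A rough sketch of that idea, combining LinkedHashMap's access-order LRU with a byte budget; class and field names are made up, and the bound is approximate because only one entry is evicted per put:

    import java.io.*;
    import java.util.LinkedHashMap;
    import java.util.Map;

    // Sketch: an LRU cache bounded by the serialized size of its values.
    public class SerializedLruCache {
        private final long maxBytes;
        private long currentBytes;
        private final LinkedHashMap<String, byte[]> map =
                new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
                    @Override
                    protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                        if (currentBytes > maxBytes) {
                            currentBytes -= eldest.getValue().length;
                            return true; // drop the least recently used entry
                        }
                        return false;
                    }
                };

        public SerializedLruCache(long maxBytes) {
            this.maxBytes = maxBytes;
        }

        public synchronized void put(String key, Serializable value) throws IOException {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(value); // serialization is the (expensive) price of exact sizing
            }
            byte[] bytes = bos.toByteArray();
            byte[] old = map.put(key, bytes);
            if (old != null) currentBytes -= old.length;
            currentBytes += bytes.length;
        }

        public synchronized Object get(String key) throws IOException, ClassNotFoundException {
            byte[] bytes = map.get(key);
            if (bytes == null) return null;
            try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
                return ois.readObject();
            }
        }
    }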
I would recommend using the Java Caching System (JCS). If you wanted to roll your own, I'm not aware of any way to get an object's size in memory; your best bet would be to extend AbstractMap and wrap the values in SoftReferences, then set the Java heap size to the maximum size you wanted. Your implementation would also have to find and clean out stale data, though. It's probably easier just to use JCS.
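If you go the JCS route, usage is roughly like this; the region name is an assumption, and its element limits would live in a cache.ccf file on the classpath:

    import org.apache.jcs.JCS;
    import org.apache.jcs.access.exception.CacheException;

    public class JcsSketch {
        public static void main(String[] args) throws CacheException {
            JCS cache = JCS.getInstance("myRegion"); // region configured in cache.ccf
            cache.put("user:42", "some cached value");
            Object value = cache.get("user:42");
            System.out.println(value);
        }
    }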
The problem with SoftReferences is that they give more work to the garbage collector. Although it doesn't meet your requirements, HBase has a very interesting strategy to prevent the cache from contributing to garbage collection pauses: it stores the cache in native memory:
https://issues.apache.org/jira/browse/HBASE-4027
https://issues.apache.org/jira/secure/attachment/12488272/HBase-4027+%281%29.pdf
A good start for your use case would be to store all your data on disk. It might seem naive, but thanks to the I/O cache, frequently accessed data will reside in memory. I highly recommend reading these architecture notes from the Varnish caching system:
https://www.varnish-cache.org/trac/wiki/ArchitectNotes
The best practice, I find, is to delegate the caching functionality outside of Java if possible. Java may be good at managing memory, but a dedicated caching system should be used for anything more than a simple LRU cache.
There is a large cost with GC when it kicks in.
EhCache is one of the more popular ones I know of. The Java Caching System from another answer is good as well.
However, I generally offload that work to an underlying layer (usually the JPA persistence layer provided by the application server); I let it be handled there so I don't have to deal with it on the application tier.
If you are caching other data such as web requests, http://hc.apache.org/httpclient-3.x/ is also a good candidate.
However, just remember that you also have a file system: there's absolutely nothing wrong with writing the data you have retrieved out to the file system. I've used that technique several times to fix out-of-memory errors caused by improper use of ByteArrayOutputStreams.
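As an illustration of that last point, here is a minimal sketch of spooling a large stream to a temp file instead of buffering it in a ByteArrayOutputStream (method and file names are made up):

    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.StandardCopyOption;

    public class SpoolToDisk {
        // The payload never has to fit on the heap; read it back later with Files.newInputStream(tmp)
        public static Path spool(InputStream in) throws IOException {
            Path tmp = Files.createTempFile("payload-", ".bin");
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            return tmp;
        }
    }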

memcached tomcat mysql on 1GB RAM

I am new to memcached and caching in general. I have a java web application running on Ubuntu + Tomcat + MySQL on a VPS Server with 1GB of memory.
Does it make sense to add a memcached layer with about 256 MB for caching? Will this be too much load on the server? Which is more appropriate to cache: rendered HTML pages or database objects?
Please advise.
If you're going to cache pages, don't use memcached, use Varnish. However, there's a good chance that's not a great use of memory. Caching pages trades memory for computation and database work, but it costs quite a lot of memory per page, so it's best for cases where the computation and database work needed to produce a single page is substantial (or the pages are very small!). Also, consider that page caching won't be effective, or even possible, if you want per-user customisation on your pages (e.g. showing the number of items in a shopping cart). At least not without getting into some truly hairy shenanigans (edge-side includes, anyone?).
If you're not going to cache pages, and your app is on a single machine, then there's no point using memcached or similar. The point of cache servers like that is to make the memory on one machine work as a cache for another - like how a file server shares a disk, they're essentially memory servers. On a single machine, you might as well give all the memory to Java and cache objects on the heap.
Are you using an object-relational mapper? If so, see if it has any support for a second-level cache. The big three implementations (Hibernate, OpenJPA, and EclipseLink) all support in-memory caches. They're likely to do a much better job than you would if you did the caching yourself.
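With Hibernate, for instance, enabling the second-level cache is mostly configuration plus an annotation on the entities you want cached. A sketch; the exact property names and region factory class depend on your Hibernate/Ehcache versions:

    import javax.persistence.Entity;
    import javax.persistence.Id;
    import org.hibernate.annotations.Cache;
    import org.hibernate.annotations.CacheConcurrencyStrategy;

    // hibernate.cfg.xml / persistence.xml would also need something like:
    //   hibernate.cache.use_second_level_cache = true
    //   hibernate.cache.region.factory_class = net.sf.ehcache.hibernate.EhCacheRegionFactory
    @Entity
    @Cache(usage = CacheConcurrencyStrategy.READ_WRITE)
    public class User {
        @Id
        private Long id;
        private String name;
    }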
But if you're not using a mapper, you have no choice but to do the caching yourself. There are extension points in LinkedHashMap for building LRU caches, and then of course there's the people's favourite, SoftReference, in combination with a HashMap. Plus, there are probably cache implementations out there you could download and use; I'd be shocked if there wasn't something in the Apache Commons libraries.
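A minimal sketch of the SoftReference-plus-HashMap approach (class names are made up); entries stay cached until the JVM needs the memory back, at which point the GC clears them:

    import java.lang.ref.SoftReference;
    import java.util.HashMap;
    import java.util.Map;

    public class SoftCache<K, V> {
        private final Map<K, SoftReference<V>> map = new HashMap<K, SoftReference<V>>();

        public synchronized void put(K key, V value) {
            map.put(key, new SoftReference<V>(value));
        }

        public synchronized V get(K key) {
            SoftReference<V> ref = map.get(key);
            if (ref == null) return null;
            V value = ref.get();
            if (value == null) {
                map.remove(key); // the GC cleared it; drop the stale entry
            }
            return value;
        }
    }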
memcached won't add any noticeable load on your server, but it will be memory your app can't use. If you only plan to have a single app server for a while, you're better off using an in-JVM cache.
As for what to cache, the answer falls somewhere in the middle of the above. You don't want to cache exactly what's in your database, and you certainly don't want to cache the final output. You have a data-model representation in your application that isn't exactly what's in the DB (e.g. a User object might be assembled from multiple queries across a few different tables). Cache that kind of thing, as it's the most reusable.
There's lots of info on the memcached site that should help you understand and get going with caching in general, and with memcached specifically.
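If you do end up putting memcached in front of an app server, the client side is straightforward. A sketch assuming the spymemcached client; host, port, key and TTL are made up, and the cached object must be Serializable:

    import java.net.InetSocketAddress;
    import net.spy.memcached.MemcachedClient;

    public class MemcachedSketch {
        public static void main(String[] args) throws Exception {
            MemcachedClient client = new MemcachedClient(new InetSocketAddress("localhost", 11211));
            // Cache an assembled domain object for one hour
            client.set("user:42", 3600, new java.util.HashMap<String, String>());
            Object cached = client.get("user:42");
            System.out.println(cached);
            client.shutdown();
        }
    }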
It might make sense to do that; why not try a smaller size like 64 MB and see how it goes? When you use more resources for memcached, there is less for everything else. You should try it and see what gives you the best performance.

Choosing between Berkeley DB Core and Berkeley DB JE

I'm designing a Java-based web app and I need a key-value store. Berkeley DB seems fitting enough for me, but there appear to be TWO Berkeley DBs to choose from: Berkeley DB Core, which is implemented in C, and Berkeley DB Java Edition, which is implemented in pure Java.
The question is how to choose which one to use. With web apps, scalability and performance are quite important (who knows, maybe my idea will become the next YouTube), and I couldn't easily find any meaningful benchmarks between the two. I have yet to familiarize myself with Core's Java API, but I find it hard to believe it could be much worse than Java Edition's, which seems quite nice.
If some other key-value store would be much better, feel free to recommend that too. I'm storing smallish binary blobs, and the keys will probably be hashes of the data, or some other unique id.
I have quite a bit of experience using both BDB-JE and BDB-core with Java. Deciding which one to use is quite simple: If you want concurrency, use BDB-JE. If you want scalability, use BDB-core.
BDB-JE breaks down performance-wise with large databases due to its file format and its reliance on Java garbage collection to clean up evicted cache entries. Expect long garbage collection pauses or spend a lot of time tuning magic GC settings. The file format has issues too, because the background cleaner threads have to spend a lot of time cleaning up garbage created by early cache evictions. If your database fits in RAM, BDB-JE works quite well.
BDB-core relies on a page-locking strategy, and highly concurrent applications experience a lot of deadlocks. If you can randomly order operations it reduces the deadlock potential, but it never eliminates it. Because BDB-core stores data in a more traditional way, it scales to super large sizes with predictable and expected performance degradation. Because its cache is not managed by a garbage collector, it can be quite large and not cause any pauses.
If you put a common interface in front of the two, and have a suitable set of unit tests, you should be able to swap between them trivially at a later date (perhaps when you really need to make the decision based on hard facts that are not available right now).
I faced the same problem and decided to go with the Java Edition, mainly because of its portability (I need something that would run even on mobile devices). There is also the Direct Persistence Layer (DPL) API, and the fact that the whole DB is a single jar makes deployment fairly simple.
The recent version 4 brought in high availability and performance improvements. There is also the fact that long-running Java applications can achieve such optimization that they surpass the performance of native C applications in some scenarios.
It's a natural fit for any Java application - desktop or web.
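For reference, storing and fetching a small binary blob with the JE base API looks roughly like this; the environment path, database name and error handling are simplified:

    import java.io.File;
    import com.sleepycat.je.Database;
    import com.sleepycat.je.DatabaseConfig;
    import com.sleepycat.je.DatabaseEntry;
    import com.sleepycat.je.Environment;
    import com.sleepycat.je.EnvironmentConfig;
    import com.sleepycat.je.LockMode;
    import com.sleepycat.je.OperationStatus;

    public class BdbJeSketch {
        public static void main(String[] args) throws Exception {
            EnvironmentConfig envConfig = new EnvironmentConfig();
            envConfig.setAllowCreate(true);
            Environment env = new Environment(new File("/tmp/bdb-env"), envConfig);

            DatabaseConfig dbConfig = new DatabaseConfig();
            dbConfig.setAllowCreate(true);
            Database db = env.openDatabase(null, "blobs", dbConfig);

            // Key is the hash of the data, value is the blob itself
            DatabaseEntry key = new DatabaseEntry("some-hash".getBytes());
            DatabaseEntry value = new DatabaseEntry(new byte[] { 1, 2, 3 });
            db.put(null, key, value);

            DatabaseEntry found = new DatabaseEntry();
            if (db.get(null, key, found, LockMode.DEFAULT) == OperationStatus.SUCCESS) {
                System.out.println("blob length: " + found.getData().length);
            }

            db.close();
            env.close();
        }
    }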
A while ago I had the same question; after doing some benchmarks I found that hash mode in the native edition is much faster and more storage efficient than anything the Java edition has to offer, so I decided to go with the native implementation.
I suggest you do your own benchmarks for the storage capacities you expect and decide if the Java edition is fast enough.
If it is, or if performance is not a big issue for you (it's critical for me), just go with the Java Edition. Otherwise go with the native one (assuming you see the same performance boost for your own use case).
BTW:
My benchmark tested the speed of querying random keys out of 20,000,000 records, where the key is a string and the value is an int (4 bytes).
I saw that inserts (populating the benchmark) were much faster with the native version, and queries were twice as fast.
(This is not due to a Java shortcoming, but because the Java version tested was not the same release as the native version - 4.0 vs 4.8 IIRC.)
I decided to go with the Java Edition, simply because it's possible to embed the database runtime within the same deployable. This was an important feature for my setup. I haven't benchmarked Core against JE, but I have seen great performance compared with other key-value stores that I tested when first evaluating database stores.
If you're creating a web-application though, then concurrency might be very important to you in the long run.

Tools to map threads to their memory usage?

Having some performance issues in a Weblogic 11g production system.
As part of the debugging effort, I’m interested in finding a way to map threads to their memory usage, then seeing the stack to determine what part of the application is consuming so much.
Anyone know of a tool or method to do what I want to do?
I'm not interested in JProbe memory profiling, as it requires too much overhead (taking snapshots of everything). Also, I've read about the HeapWalker in NetBeans, which seems promising.
Eclipse has a memory analyzer (or heap walker, if you will) called MAT - http://eclipse.org/mat/.
I've used it in the past and it was pretty helpful. I don't remember all the features off-hand, but I do remember being able to identify "heavy" threads, query for the largest objects, and such.
The home page links to several tutorials and a blog that are useful as well.
