Java based memcached client, optimization of putting data inside memcache

I have a list of, say, 1000 beans that I need to share among different projects, and I use memcache for this. Currently, a loop runs over the complete list and each bean is stored in memcache under its own unique key. I was wondering whether, instead of putting each bean into memcache independently, I should put all the beans into a HashMap, keyed by the same keys used for the individual memcache entries, and then store that HashMap in memcache as a single entry.
Will this give me any significant improvement over putting each bean into memcached individually? Or will it cause trouble because of the large size of the object?
Any help is appreciated.

It won't get you any particular benefit; it will probably be slower on the load. Serialization is serialization, and adding a HashMap wrapper around the beans just increases the amount of data that needs to be deserialized and populated. For retrievals, assuming most lookups are for a discrete key (the one you would use as the HashMap key), you'll have a much slower retrieval time, because you'll be pulling down the whole graph just to get to one of its discrete members.
Of course, if the data is entirely static and you're only using memcached to populate values in various JVMs, you can do it that way and just hold onto the HashMap in a static field. But then you're multiplying your memory consumption by the number of nodes in the cluster.
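To make the size argument concrete, here is a minimal sketch that compares the serialized size of one bean with the serialized size of the whole map, using plain Java serialization rather than any memcached client. The Bean class, the "bean:N" key scheme and the helper names are made up for illustration. Keep in mind that memcached's default item size limit is 1 MB, so a single large map entry can also hit that ceiling.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;

class BeanPayloadDemo {
    // Hypothetical bean standing in for the question's objects.
    static class Bean implements Serializable {
        private static final long serialVersionUID = 1L;
        final int id;
        final String name;
        Bean(int id, String name) { this.id = id; this.name = name; }
    }

    // Serializes any object to a byte array, as a cache client would before sending it.
    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    // Reverses serialize(); the whole payload must be read to get anything out of it.
    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    // Builds the map variant: all beans under the keys they would have in the cache.
    static HashMap<String, Bean> buildBeans(int count) {
        HashMap<String, Bean> beans = new HashMap<>();
        for (int i = 0; i < count; i++) {
            beans.put("bean:" + i, new Bean(i, "name-" + i));
        }
        return beans;
    }

    public static void main(String[] args) throws Exception {
        HashMap<String, Bean> beans = buildBeans(1000);
        System.out.println("one bean:  " + serialize(beans.get("bean:42")).length + " bytes");
        System.out.println("whole map: " + serialize(beans).length + " bytes");
    }
}
```

With per-bean entries, a lookup transfers and deserializes only the first payload; with the map variant, every lookup pays for the second one.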

I did some optimization work in spymemcached that helps it do the right thing when doing the wire encoding.
This may or may not help with your application. In general, just measure when you have performance questions about your app.

Related

Java: use concurrent hashmap or memcached cache

I have a simple country/states map, held as a static final unmodifiable concurrent hashmap.
We have now introduced a memcached cache in our application.
My question is: is it beneficial to get the values from the cache instead of from such a simple map?
What benefits would I gain or lose if I moved this map to the cache?

This really depends on the size of the data and how much memory you've allocated for your JVM.
For simple data like the states of a country, which amounts to a few hundred entries, a simple HashMap suffices; using memcache is overkill and in fact slower.
If it's a large amount of data that grows (typically tens or hundreds of MB, or larger) and requires frequent access, memcache (or another external cache) would be better than in-JVM storage.

It will be much faster as a HashMap, because the map lives in the JVM's own memory and the lookup is done directly through an object reference. A lookup from memcache requires extra work: a network round trip plus deserialization of the result.
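As a rough illustration of the in-JVM option, the sketch below holds the data in a static final unmodifiable map (Map.of requires Java 9 or later). The class name, the method name and the sample entries are invented for the example.

```java
import java.util.List;
import java.util.Map;

class CountryStates {
    // Illustrative sample data only; a real table would be loaded once at startup.
    private static final Map<String, List<String>> STATES = Map.of(
            "US", List.of("California", "Texas"),
            "IN", List.of("Kerala", "Punjab"));

    // Plain in-memory lookup: no network hop and no deserialization.
    static List<String> statesOf(String countryCode) {
        return STATES.getOrDefault(countryCode, List.of());
    }

    public static void main(String[] args) {
        System.out.println(statesOf("US"));
    }
}
```

Because the map is immutable after class initialization, it can be read from any number of threads without locking.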

If your application is hosted on only one server, then you don't need the distributed features of memcache, and a HashMap will be damn fast.
But this is usually not the case for web applications. In ~99% of cases a web application is hosted on multiple servers and needs distributed caching, and memcache is a good fit in such cases.

Which NoSQL Implementation is Most Appropriate?

I'm new to NoSQL, and I'm scratching my head trying to figure out the most appropriate NoSQL implementation for the application I'm trying to build.
My Java application needs to have an in-memory hashmap containing millions to billions of entries as it models a single-layer neural network. Right now we're using Trove in order to be able to use primitives as keys and values to reduce the size of the map and increase the access speed. The map is a map of maps where the outer map's keys are longs and the inner maps have long/float key/values.
We need to be able to read the saved state from disk to the map of maps when the application starts up. The changes to the map of maps need also to be saved to disk either continuously or according to some scheduled interval.
I was at first drawn towards OrientDB because of their document and object DBs, although I'm still not sure at this point what would be better. Then I came across Redis, which is a key value store and works with an in-memory dataset that can be dumped to disk, including master-slave replication. However, it doesn't look like the values of the map can be anything other than Strings.
Am I looking in the right places for a solution to my needs? Right now, I like the in-memory and master-slave aspect of Redis, but I like the object/document capabilities of OrientDB as my data structures are more complicated than simple Strings and being able to use Trove with the primitive key/value types is very advantageous. It would be better if reading was cheap and writing was expensive rather than the other way around.
Thoughts?

Why not just serialize the Trove data structures directly to disk? There appears to be some sort of support for that judging by the documentation (http://trove4j.sourceforge.net/javadocs/serialized-form.html), but it's hard to tell because it's all auto-generated cruft instead of lovingly-made tutorials. Still, for your use case it's not obvious why you need a proper database, so perhaps KISS applies.
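A minimal sketch of that "just write it to disk" idea follows. It uses plain java.util maps as a stand-in for the Trove collections and a made-up binary layout (entry count, then key/value pairs), so it does not reflect Trove's own serialization support. The int counts cap each map at roughly two billion entries.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

class MapOfMapsStore {
    // Writes: outer count, then per outer entry its key, inner count, and (long, float) pairs.
    static void save(Map<Long, Map<Long, Float>> data, File file) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            out.writeInt(data.size());
            for (Map.Entry<Long, Map<Long, Float>> outer : data.entrySet()) {
                out.writeLong(outer.getKey());
                out.writeInt(outer.getValue().size());
                for (Map.Entry<Long, Float> inner : outer.getValue().entrySet()) {
                    out.writeLong(inner.getKey());
                    out.writeFloat(inner.getValue());
                }
            }
        }
    }

    // Reads the layout written by save() back into a map of maps.
    static Map<Long, Map<Long, Float>> load(File file) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            int outerCount = in.readInt();
            Map<Long, Map<Long, Float>> data = new HashMap<>();
            for (int i = 0; i < outerCount; i++) {
                long outerKey = in.readLong();
                int innerCount = in.readInt();
                Map<Long, Float> inner = new HashMap<>();
                for (int j = 0; j < innerCount; j++) {
                    long innerKey = in.readLong();
                    inner.put(innerKey, in.readFloat());
                }
                data.put(outerKey, inner);
            }
            return data;
        }
    }

    public static void main(String[] args) throws IOException {
        Map<Long, Map<Long, Float>> weights = new HashMap<>();
        weights.computeIfAbsent(1L, k -> new HashMap<>()).put(7L, 0.5f);
        File file = File.createTempFile("weights", ".bin");
        save(weights, file);
        System.out.println(load(file));
        file.delete();
    }
}
```

This covers the load-at-startup and scheduled-dump requirements; continuous saving of individual changes would need something more like an append-only log.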

OrientDB has the most flexible engine with index, graph, transactions and complex documents as JSON. Why not?

Check out Java-Chronicle. It's a low latency persistence library. I think you may find it offers excellent performance for this type of data.

If you'd like to use Redis for this, you'd likely be best served by using either ZSETs or HASHes as the underlying structures (Redis supports structures, not just string values). Unless you need to fetch parts of your maps based on the values or the sorted order of the values, HASHes would probably be best (in terms of memory and speed).
So you would probably want to use a long -> {long:float, ...} layout, that is, longs mapping to long/float maps. You can then fetch individual entries in a map with HGET, multiple entries with HMGET, or the full map with HGETALL. The command reference is at http://redis.io/commands
On the space saving side of things, depending on the expected size of your HASHes, you may be able to tune them to use less space with limited/no negative effects on performance.
On the persistence side of things, you can either run Redis with snapshots or using incremental saving with append-only files. You can see the persistence documentation here: http://redis.io/topics/persistence
If you'd like to ask more pointed questions, you should head over to the mailing list https://groups.google.com/forum/?fromgroups=#!topic/redis-db/33ZYReULius

Redis supports more complex data structures than simple strings, such as lists, (sorted) sets and hashes, which might come in handy for your domain model. On the other hand, your neural network could benefit from the rich graph capabilities of OrientDB, depending on its structure.

Retrieve all essential data on startup

In a Java web application, I would like to know whether it is a proper (or "standard"?) approach to load all the essential data, such as config data, message data, code maintenance data, dropdown option data, etc. (assuming none of it is updated frequently), from the database into "static" variables when the server starts up. Or is it preferable to retrieve the data by querying the DB per request?
Thanks for all your advice here.

It is perfectly valid to pull all the data that is not going to be modified during the application life-cycle out of the database and keep it in memory, as a singleton or something similar.
This is a good idea because it saves DB hits and retrieval is faster. A lot of environment-specific settings and other data can also be pulled once and kept in an immutable hashmap for any future request.
In a common web app you generally do not have so many config/option objects that they eat up a lot of memory and cause an OOM. But if you have a table with hundreds of thousands of config entries, it is better to pull objects as and when they are requested. And if you do want to keep them in memory, think about putting them in a key-value store like memcached.

We used the DB to store config values and ehcache to avoid a lot of DB hits. This way you don't need to worry about memory consumption (it will use whatever memory you have).
EhCache is one of many available DB cache solutions and can be configured on top of JPA etc.
You can configure ehcache (or many other cache providers) to treat the tables as read-only, in which case it will only go to the DB if it is explicitly told to invalidate the cache. This performs pretty well. The overhead does become visible when reads occur very frequently (like 100/sec), but storing the config value in a local variable, avoiding reads inside loops, and passing the value down the call stack usually mitigates this well enough.
Storing values in a Singleton as Java objects performs best, but if you want to modify them without restarting the app, it becomes a little bit involved.
Here is a simple way to achieve dynamic configuration with Java objects: keep the values in an immutable map behind a volatile field and replace the whole map when the configuration changes:
private volatile ImmutableMap<String, Object> param_value;
Basically, you'll have to start thinking about multi-threaded access and memory issues (although it's quite unlikely that you'll run out of memory because of configuration values, unless you have binary data as config values etc.).
In essence, I'd recommend using the DB and some cache provider unless that part of code really needs high-performance.
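The volatile-immutable-map idea can be sketched as follows. This is an illustration rather than the answerer's actual code: it uses Map.copyOf from the JDK (Java 10 or later) in place of Guava's ImmutableMap, and the ConfigHolder class and its reload method are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

class ConfigHolder {
    // Readers always see either the old map or the new one, never a half-updated map.
    private volatile Map<String, Object> paramValue = Map.of();

    // Lock-free read: a single volatile read followed by a lookup in an immutable map.
    Object get(String key) {
        return paramValue.get(key);
    }

    // Swap in a fresh immutable snapshot, e.g. after re-reading the config table.
    void reload(Map<String, Object> fresh) {
        paramValue = Map.copyOf(fresh);
    }

    public static void main(String[] args) {
        ConfigHolder config = new ConfigHolder();
        Map<String, Object> fromDb = new HashMap<>();
        fromDb.put("timeoutSeconds", 30);
        config.reload(fromDb);
        System.out.println(config.get("timeoutSeconds"));
    }
}
```

Because each reload publishes a new immutable map, no locking is needed on the read path; the cost is that every change copies the whole map, which is acceptable for small, rarely changing configuration.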

The best place to store large data retrieved by a java servlet (Tomcat)

I have a Java servlet that retrieves data from a MySQL database. In order to minimize round trips to the database, the data is retrieved only once, in the init() method, and placed into a HashMap<> (i.e. cached in memory).
For now, this HashMap is a member of the servlet class. I need not only to store this data but also to update some values (counters, in fact) in the cached objects of the underlying hashmap value class. There is also a Timer (or cron task) that schedules dumping these counters to the DB.
So, after googling, I found 3 options for storing the cached data:
1) as now, as a member of the servlet class (but servlets can be taken out of service and put back into service by the container at will, and then the data will be lost)
2) in the ServletContext (am I right that it is recommended to store only small amounts of data here?)
3) in a JNDI resource.
Which is the preferred way?

Put it in the ServletContext, but use a ConcurrentHashMap to avoid concurrency issues.
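For the counters mentioned in the question, a thread-safe holder could look roughly like the sketch below. It only covers the ConcurrentHashMap part; storing the object in the ServletContext and wiring drain() to the Timer are left out. The HitCounters class and its method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

class HitCounters {
    private final ConcurrentHashMap<String, AtomicLong> counters = new ConcurrentHashMap<>();

    // Safe to call from many request threads at once.
    void increment(String key) {
        counters.computeIfAbsent(key, k -> new AtomicLong()).incrementAndGet();
    }

    // Called by the periodic task: returns the counts accumulated since the last call
    // and resets them, so each increment is reported exactly once.
    Map<String, Long> drain() {
        Map<String, Long> snapshot = new HashMap<>();
        counters.forEach((key, counter) -> {
            long count = counter.getAndSet(0);
            if (count != 0) {
                snapshot.put(key, count);
            }
        });
        return snapshot;
    }

    public static void main(String[] args) {
        HitCounters hits = new HitCounters();
        hits.increment("item-1");
        hits.increment("item-1");
        System.out.println(hits.drain());
    }
}
```

The scheduled job would then write the drained values to the database as increments rather than absolute values.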

From those 3 options, the best is to store it in the application scope. I.e. use ServletContext#setAttribute(). You'd like to use a ServletContextListener for this. In normal servlets you can access the ServletContext by the inherited getServletContext() method. In JSP you can access it by ${attributename}.
If the data grows so large that it eats too much of Java's memory, then you should consider a 4th option: use a cache manager.

The most obvious way would be to use something like ehcache and store the data in that. ehcache is a cache manager that works much like a hash map, except that it can be tweaked to hold things in memory, move them to disk, flush them, even write them into a database via a plugin, etc. Whether it fits depends on whether the objects are serializable and whether your app can cope without the data (i.e. make another round trip if necessary), but I would trust a cache manager to do a better job of it than a hand-rolled solution.

If your cache can become large and you access it often, it is reasonable to use a caching solution. For example, ehcache is a good candidate and integrates easily with Spring applications, too. Documentation is here.
Also check this overview of open-source caching solutions for Java.

Update cached data in a hashtable

In order to minimize the number of database queries I need some sort of cache to store pairs of data. My approach now is a hashtable (with Strings as keys, Integers as value). But I want to be able to detect updates in the database and replace the values in my "cache". What I'm looking for is something that makes my stored pairs invalid after a preset timespan, perhaps 10-15 minutes. How would I implement that? Is there something in the standard Java package I can use?

I would use an existing solution (there are many cache frameworks).
ehcache is great: it can reset values after a given timespan, and I bet it can do much more (that is the only feature I have used).

You can either use an existing solution (see the previous reply),
or, if you want a challenge, write your own simple cache class (not recommended for a production project, but a great learning experience).
You will need at least 3 members:
1) the cached data, stored as a hashtable object;
2) the next cache expiration date;
3) the cache expiration interval, set via the constructor.
Then simply provide public data getter methods that check the cache expiration status:
if not expired, call the hashtable's accessors;
if expired, first call the "data load" method (the same one the constructor calls to pre-populate the cache) and then call the hashtable's accessors.
For an even cooler cache class (I have implemented one in Perl at my job), you can add further functionality:
Individual per-key cache expiration (coupled with overall total cache expiration).
Auto, semi-auto, and single-shot data reload (e.g., reload the entire cache at once; reload a batch of data defined by some predefined query; or reload individual data elements piecemeal). The last approach is very useful when your cache has many hits on the same exact keys: that way you don't need to reload the universe every time the 3 keys that are always accessed expire.
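The basic three-member design described above might be sketched like this in Java. It is a learning example with whole-cache expiry only (no per-key expiration or partial reloads); the ExpiringCache name and the Supplier-based "data load" hook are choices made for this illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

class ExpiringCache<K, V> {
    private final Supplier<Map<K, V>> loader; // the "data load" method, e.g. a DB query
    private final long ttlMillis;             // cache expiration interval
    private Map<K, V> data;                   // the cached data
    private long expiresAt;                   // next cache expiration time

    ExpiringCache(Supplier<Map<K, V>> loader, long ttlMillis) {
        this.loader = loader;
        this.ttlMillis = ttlMillis;
        reload(); // pre-populate on construction
    }

    // Getter that checks the expiration status before touching the data.
    synchronized V get(K key) {
        if (System.currentTimeMillis() >= expiresAt) {
            reload();
        }
        return data.get(key);
    }

    // Replaces the whole data set and schedules the next expiration.
    private void reload() {
        data = new HashMap<>(loader.get());
        expiresAt = System.currentTimeMillis() + ttlMillis;
    }

    public static void main(String[] args) {
        // The lambda stands in for a real database query; 15 minutes as in the question.
        ExpiringCache<String, Integer> cache =
                new ExpiringCache<>(() -> Map.of("answer", 42), 15 * 60 * 1000L);
        System.out.println(cache.get("answer"));
    }
}
```

The synchronized getter keeps the sketch simple and correct under concurrent access, at the cost of serializing readers; a production cache framework handles this more efficiently.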

You could use a caching framework like OSCache, EHCache, JBoss Cache, JCS... If you're looking for something that follows a "standard", choose a framework that supports the JCache standard interface (javax.cache) aka JSR-107.
For simple needs like what you are describing, I'd look at EHCache or OSCache (I'm not saying they are basic, but they are simple to start with), they both support expiration based on time.
If I had to choose one solution, I'd recommend Ehcache, which has my preference, especially now that it has joined Terracotta. And just for the record, Ehcache provides a preview implementation of JSR-107 via the net.sf.ehcache.jcache package.
