I am trying to optimize a game server for learning purposes. I am using MongoDB as a backend datastore with their Java driver. I am storing player data (level, name, current quest), quest data, and a range of other gameplay data in the database. Each document type has its own class with the appropriate fields (e.g. User.class holds a document from the users collection, Quest.class holds a document from the quests collection, etc.).
Right now, when a player performs an action, I am using the player's username to find a document from the users collection and update it accordingly. This is extremely costly as it means that every single time a user performs an action, a database query is needed to fetch the data for the current player.
Of course, my next thought was to load the player's user document when they connect to the server and store this separately, then remove it when they disconnect and save their updated data from memory to MongoDB.
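As a rough sketch of that lifecycle (loadUser() and saveUser() below are hypothetical stand-ins for the MongoDB find and update calls, not driver API):
import java.util.HashMap;
import java.util.Map;

class UserCache {
    private final Map<String, User> cache = new HashMap<>();

    void onConnect(String username) {
        cache.put(username, loadUser(username)); // one query per connect
    }

    User get(String username) {
        return cache.get(username);              // no query per action
    }

    void onDisconnect(String username) {
        User user = cache.remove(username);
        if (user != null) {
            saveUser(user);                      // flush accumulated changes back to MongoDB
        }
    }

    // Hypothetical helpers wrapping the actual MongoDB driver calls.
    private User loadUser(String username) { /* find in the users collection */ return null; }
    private void saveUser(User user)       { /* update/replace the user document */ }
}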
The problem is that I would like to do something similar for all the other collections as well, and the only foreseeable way of doing this (as each cache has a different key type for lookups, usually Strings and UUIDs) is something like the following:
// Create a bunch of separate caches (faster than Guava Table, but ugly)
// For example, after finding a user: userCache.put("TheirUsername", user);
private HashMap<String, User> userCache = new HashMap<>();
private HashMap<UUID, Group> groupCache = new HashMap<>();
private HashMap<Integer, Quest> questCache = new HashMap<>();
// Or use a Guava Table to store all (this is slower than individual maps)
// For example, after finding a user: cache.put(User.class, "TheirUsername", user);
private Table<Class, Object, Object> cache = HashBasedTable.create();
Are there any alternatives to having a large number of maps and storing the result of the find in these maps (one per cached collection)?
I would love to somehow abstract this without causing a loss in performance. I have attempted to use Guava to implement a Table<Class, Object, Object> so that the cache is essentially dynamic and lets me cache any class. The problem is that Tables are a lot slower, especially if there are hundreds of lookups per second...
I am unsure as to how I can make this as optimal performance-wise as possible without compromising the clean nature of my code. A Table is essentially what I would love to do as it is very versatile, but it's just not fast enough.
Basically, you could just use a single map from Object to Object. If your keys all have correct equals() and hashCode() methods (all the basic runtime classes do), you should not have any problems.
Thus, the basic answer to your question is:
HashMap<Object, Object> megaCache = new HashMap<>();
megaCache.put("someUser", someUserObject);
...
User cachedUser = (User) megaCache.get("someUser");
However, I strongly recommend not to do this!
You lose all the beauty and type safety of generics and load up a single map with all kinds of stuff. (Usually this is not a major problem at runtime, but the probability of hash collisions between unrelated key types rises.)
Rather, go for separate caches as in your original post and stay type-safe and clear.
In my Spring Boot application, I need to implement an in-process multi-level cache.
Here is an example of the data, which needs to be cached:
customer name (key, string)
--data entity name (key, string)
--config-1 (value, JSONObject)
--config-2 (value, JSONObject)
I'm planning on having a few hundred customer entries, each having up to a hundred "config" JSONObjects.
I'm currently looking at ehcache:
Cache cache = manager.getCache("sampleCache1");
Element element = new Element("key1", "value1");
cache.put(element);
In this context, I would use "Customer_Name" in place of "key1", and "My Customer" in place of "value1", but then I would need to build a hierarchy:
customer
-data entity
-config
I'm not sure how to do it with ehcache.
I'm also not sure whether there are better choices for what I'm trying to do.
Has anyone implemented such a multi-level hierarchical cache with ehcache or any other library?
For notation I use a map-like cache, value = Cache.get(key), which is more common than the EHCache2 Element.
Option 1: Construct a composite key object
class CustomerConfigurationKey {
String customerKey;
String dataEntityKey;
// equals() and hashCode()
}
This is pretty standard for key/value stores, including plain maps. I address this in the cache2k Quick Start.
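For illustration, here is a sketch of the composite key with the placeholder filled in; the Objects.hash() call is the only addition beyond the snippet above, and any map or cache keyed by this object works the same way:
import java.util.Objects;

final class CustomerConfigurationKey {
    final String customerKey;
    final String dataEntityKey;

    CustomerConfigurationKey(String customerKey, String dataEntityKey) {
        this.customerKey = customerKey;
        this.dataEntityKey = dataEntityKey;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof CustomerConfigurationKey)) return false;
        CustomerConfigurationKey other = (CustomerConfigurationKey) o;
        return customerKey.equals(other.customerKey)
            && dataEntityKey.equals(other.dataEntityKey);
    }

    @Override
    public int hashCode() {
        return Objects.hash(customerKey, dataEntityKey);
    }
}
A lookup then becomes something like cache.get(new CustomerConfigurationKey(customerName, dataEntityName)).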
Option 2: Use multiple levels of caches
Put a cache inside a cache and access like: data.get(customerKey).get(dataEntityKey).
You can find examples of "Composite Key" vs. "Multi Level Caches" in the cache2k benchmark DateFormattingBenchmark.
This only works nicely if you have a small set at the first level. In your case you would end up with a separate cache per customer, which is too costly. So this is mentioned only for completeness; it is not a real option in your scenario.
Option 3: Use a map for the second level
Construct a single cache of type Cache<String, Map<String, JSONObject>>.
If all of a customer's data is typically used within a short interval, it does not make sense to cache at a finer level, since all of that customer's data will typically be in memory anyway. Another benefit: when a customer is no longer active, the cache entry expires and all of that customer's data can be removed from memory at once.
Updating single entries of the map will have concurrency issues that you need to properly address, e.g. by copying and putting only an immutable map in the cache or by using a ConcurrentHashMap.
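A rough sketch of what Option 3's update path could look like, assuming a plain ConcurrentHashMap as a stand-in for whatever cache library is used; customerName, dataEntityName and config are placeholder variables, and JSONObject is whatever JSON type is already in use:
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class CustomerConfigCache {
    // The outer ConcurrentHashMap stands in for the real cache (ehcache, cache2k, ...).
    private final ConcurrentHashMap<String, Map<String, JSONObject>> cache = new ConcurrentHashMap<>();

    // Variant A: keep a thread-safe mutable inner map per customer.
    void putMutable(String customerName, String dataEntityName, JSONObject config) {
        cache.computeIfAbsent(customerName, k -> new ConcurrentHashMap<>())
             .put(dataEntityName, config);
    }

    // Variant B: copy and swap an immutable inner map, as described above.
    void putImmutable(String customerName, String dataEntityName, JSONObject config) {
        cache.compute(customerName, (k, old) -> {
            Map<String, JSONObject> copy = (old == null) ? new HashMap<>() : new HashMap<>(old);
            copy.put(dataEntityName, config);
            return Collections.unmodifiableMap(copy);
        });
    }

    Map<String, JSONObject> get(String customerName) {
        return cache.get(customerName);
    }
}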
I'm storing some information inside a MbGlobalMap (embedded Global Cache) of the IBM Integration Bus. If the map is called EXAMPLE.MAP I can access the values as follows:
MbGlobalMap map = MbGlobalMap.getGlobalMap("EXAMPLE.MAP");
Object value = map.get(key);
But I want to get all values of the EXAMPLE.MAP, even if I don't know all keys of the map. I can't iterate over the MbGlobalMap and a cast to java.util.Map won't work at all.
This is the documentation of the class: https://www.ibm.com/support/knowledgecenter/SSMKHH_9.0.0/com.ibm.etools.mft.plugin.doc/com/ibm/broker/plugin/MbGlobalMap.html. There is no method provided to return all elements inside the map.
A workaround could be to keep a list of all current keys, so that you can fetch this list and use it to get all values inside the map. But I don't think this is a clean solution.
After some research, I want to answer this question myself:
The solution is the workaround I mentioned in my question. You can put a Java HashMap into the Global Cache and write all your objects into this map. An example would look something like the following:
MbGlobalMap globalMap = MbGlobalMap.getGlobalMap("EXAMPLE.MAP");
HashMap<String,Object> map = new HashMap<String,Object>();
// Put some objects into the map
globalMap.put("ALL", map);
Now you have a Java HashMap inside the MbGlobalMap of the Global Cache and you can access the data, without knowing the keys as follows:
MbGlobalMap globalMap = MbGlobalMap.getGlobalMap("EXAMPLE.MAP");
HashMap<String,Object> map = (HashMap<String,Object>)globalMap.get("ALL");
Set<String> allKeys = map.keySet();
Iterator<String> iter = allKeys.iterator();
while(iter.hasNext()) {
// Do something with map.get(iter.next());
}
At first I thought this solution would not be a clean one, because now the map has to be locked for every write operation. But it seems that the Global Cache will lock the map for every write operation anyway:
As JAMESHART mentioned in his contribution on IBM developerWorks, the Extreme Scale grid underneath the Global Cache is configured with a pessimistic locking strategy. According to the entry in the IBM Knowledge Center, this means the following:
Pessimistic locking: Acquires locks on entries and then holds the locks until commit time. This locking strategy provides good consistency at the expense of throughput.
So the use of the described workaround won't have such a big impact on write access and performance.
There's now an enhancement request on IBM's Community RFE website in order to get this feature:
http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=94875
Please give your vote for this request if you are interested in this feature, because IBM considers ERs based on their votes.
Arcdic, the best way with the API at hand will be to use the putAll method that takes a java.util.Map, then use an entrySet to pick out the values that you are interested in.
public void putAll(Map m)
throws MbException
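For example, a minimal sketch of that approach (the "key1"/"value1" entries are placeholders; only getGlobalMap(), put/get and the putAll(Map) signature quoted above are taken from the MbGlobalMap documentation):
import java.util.HashMap;
import java.util.Map;

import com.ibm.broker.plugin.MbException;
import com.ibm.broker.plugin.MbGlobalMap;

// Collect the entries in an ordinary java.util.Map first, walk its entrySet()
// for the values you care about, then push everything into the global map
// with a single putAll() call.
void storeAll() throws MbException {
    Map<String, Object> entries = new HashMap<>();
    entries.put("key1", "value1"); // placeholder entries
    entries.put("key2", "value2");

    for (Map.Entry<String, Object> entry : entries.entrySet()) {
        // inspect entry.getKey() / entry.getValue() here if needed
    }

    MbGlobalMap globalMap = MbGlobalMap.getGlobalMap("EXAMPLE.MAP");
    globalMap.putAll(entries);
}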
I have a list of keys and corresponding values with about 150 entries. I have a method that will access it multiple times per session, but will never edit it. It will remain unchanged from session to session. Because I want to access the values quickly, I figure I would use a HashMap, but I am not sure how to store it for the long term. What is the best way of storing it, and then accessing it at the beginning of my program? Thanks!
If it's actually never going to change, you may as well just store it in a class with static initialization: public static final. If we assume it's never going to change, then there's no reason to make it easy to change through other techniques such as loading from a file or a database.
It's up to you whether you store it as a HashMap with 150 entries or as 150 fields in a class. I think it depends on how you want to access the data. If you store it as a HashMap, then you'll likely be accessing the values by a String key, e.g. Constants.getData("max.search.results"). If you store it as 150 fields then just access the fields directly, e.g. Constants.MAX_SEARCH_RESULTS.
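A small sketch of the two access styles side by side (the class name, field name, key and values here are just the examples used in this answer, not anything prescribed):
import java.util.HashMap;
import java.util.Map;

public final class Constants {
    // Style 1: individual fields, accessed as Constants.MAX_SEARCH_RESULTS
    public static final int MAX_SEARCH_RESULTS = 50; // placeholder value

    // Style 2: one map, accessed as Constants.getData("max.search.results")
    private static final Map<String, String> DATA = new HashMap<>();
    static {
        DATA.put("max.search.results", "50"); // placeholder value
        // ... remaining entries
    }

    public static String getData(String key) {
        return DATA.get(key);
    }

    private Constants() { }
}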
As for performance, with 150 entries you have nothing to worry about unless your data is large. People routinely use HashMaps for thousands or millions of entries.
As for when it will be initialized, static fields in Java are initialized once, when the class is first loaded. For 150 entries this should take less than a millisecond (ballpark estimate).
See also
Class for constants
HashMap performance
Never never? In that case, you should consider building a properties (or JSON, etc.) file into the jar of your program. You can then use
Properties props = new Properties();
props.load(getClass().getResourceAsStream("data.properties")); // file name is just an example
from any class at the same package level as the properties file to get the set of key-value pairs.
If in fact it may change occasionally, you might want to think about some sort of external data store as I see has already been mentioned in comments responding to your question.
It seems like you could hard-code it, e.g. with a map String -> String:
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public final class CONST
{
public static final Map<String, String> DATA;
static
{
final Map<String, String> data = new HashMap<>(200);
data.put("key001", "value001");
// ...
data.put("key150", "value150");
DATA = Collections.unmodifiableMap(data);
}
}
This creates the hash table at the time the class is loaded. If you want to avoid this for some reason, you could also use the initialization-on-demand holder idiom (see the sketch below).
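A rough sketch of that idiom with the same data (the Holder class name is arbitrary): the map is only built the first time get() is called, because the nested class is not initialized until it is first used.
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public final class CONST
{
    private static final class Holder
    {
        static final Map<String, String> DATA;
        static
        {
            final Map<String, String> data = new HashMap<>(200);
            data.put("key001", "value001");
            // ...
            data.put("key150", "value150");
            DATA = Collections.unmodifiableMap(data);
        }
    }

    public static String get(String key)
    {
        return Holder.DATA.get(key);
    }
}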
I have a TestDTO class which holds two input values from the user.
The next step is to fetch several values from the database; let's say I am fetching ten String values from the database, which are required to execute further business logic.
I want to know the best way to hold this data (in terms of memory usage and performance):
1. Add 10 more fields in the existing TestDTO class and set the database values at run time
2. Use a java.util.Collection (List/Map/...)
3. Create another DTO/bean class for the 10 String values
If you want modularity in your code, the third option is better, but for simplicity you could use a HashMap, like:
Map<String, String> map = new HashMap<>();
map.put("string1", value);
// ...
and so on.
This post can be useful for you : https://forums.oracle.com/thread/1153857
If TestDTO and the new values fetched are coming from the same table in the database, then they should be in the same class. Otherwise, the new values should ideally be in another DTO. I do not know the exact scenario that you have, but given these constraints, the second option goes out of the window. Options 1 and 3 will depend on your scenario. Preferably, always hold values from a single table in one object.
I have to write a bulk operation version of something our webapp
lets you do on a more limited basis from the UI. The desired
operation is to assign objects to a category. A category can have
multiple objects but a given object can only be in one category.
The workflow for the task is:
1) Using the browser, a file of the following form is uploaded:
# ObjectID, CategoryID
Oid1, Cid1
Oid2, Cid1
Oid3, Cid2
Oid4, Cid2
[etc.]
The file will most likely have tens to hundreds of lines, but
definitely could have thousands of lines.
In an ideal world a given object id would only occur once in the file
(reflecting the fact that an object can only be assigned to one category)
But since the file is created outside of our control, there's no guarantee
that's actually true and the processing has to deal with that possibility.
2) The server will receive the file, parse it, pre-process it
and show a page something like:
723 objects to be assigned to 126 categories
142 objects not found
42 categories not found
Do you want to continue?
[Yes] [No]
3) If the user clicks the Yes button, the server will
actually do the work.
Since I don't want to parse the file in both steps (2) and (3), as
part of (2), I need to build a container that will live across
requests and hold a useful representation of the data that will let me
easily provide the data to populate the "preview" page and will let me
efficiently do the actual work. (While obviously we have sessions, we
normally keep very little in-memory session state.)
There is an existing
assignObjectsToCategory(Set<ObjectId> objectIds, CategoryId categoryId)
function that is used when assignment is done through the UI. It is
highly desirable for the bulk operation to also use this API since it
does a bunch of other business logic in addition to the simple
assignment and we need that same business logic to run when this bulk
assign is done.
Initially it was going to be OK if the file "illegally" specified
multiple categories for a given object -- it would be acceptable to
assign the object arbitrarily to one of the categories the file
associated it with.
So I was initially thinking that in step (2) as I went through the
file I would build up and put into the cross-request container a
Map<CategoryId, Set<ObjectId>> (specifically a HashMap for quick
lookup and insertion) and then when it was time to do the work I could
just iterate on the map and for each CategoryId pull out the
associated Set<ObjectId> and pass them into assignObjectsToCategory().
However, the requirement on how to handle duplicate ObjectIds changed.
And they are now to be handled as follows:
If an ObjectId appears multiple times in the file and
all times is associated with the same CategoryId, assign
the object to that category.
If an ObjectId appears multiple times in the file and
is associated with different CategoryIds, consider that
an error and make mention of it on the "preview" page.
That seems to mess up my Map<CategoryId, Set<ObjectId>> strategy
since it doesn't provide a good way to detect that the ObjectId I
just read out of the file is already associated with a CategoryId.
So my question is how to most efficiently detect and track these
duplicate ObjectIds?
What came to mind is to use both "forward" and "reverse" maps:
public class CrossRequestContainer
{
...
Map<CategoryId, Set<ObjectId>> objectsByCategory; // HashMap
Map<ObjectId, List<CategoryId>> categoriesByObject; // HashMap
Set<ObjectId> illegalDuplicates;
...
}
Then as each (ObjectId, CategoryId) pair was read in, it would
get put into both maps. Once the file was completely read in, I
could do:
for (Map.Entry<ObjectId, List<CategoryId>> entry : categoriesByObject.entrySet()) {
List<CategoryId> categories = entry.getValue();
if (categories.size() > 1) {
ObjectId object = entry.getKey();
if (!all_categories_are_equal(categories)) {
illegalDuplicates.add(object);
// Since this is an "illegal" duplicate I need to remove it
// from every category that it appeared with in the file.
for (CategoryId category : categories) {
objectsByCategory.get(category).remove(object);
}
}
}
}
When this loop finishes, objectsByCategory will no longer contain any "illegal"
duplicates, and illegalDuplicates will contain all the "illegal" duplicates to
be reported back as needed. I can then iterate over objectsByCategory, get the Set<ObjectId> for each category, and call assignObjectsToCategory() to do the assignments.
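That last step might look something like the sketch below, using the assignObjectsToCategory() signature quoted earlier (only non-empty sets are passed along):
// Final step: hand each category's surviving object set to the existing API.
for (Map.Entry<CategoryId, Set<ObjectId>> entry : objectsByCategory.entrySet()) {
    Set<ObjectId> objectIds = entry.getValue();
    if (!objectIds.isEmpty()) {
        assignObjectsToCategory(objectIds, entry.getKey());
    }
}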
But while I think this will work, I'm worried about storing the data twice, especially
when the input file is huge. And I'm also worried that I'm missing something re: efficiency
and this will go very slowly.
Are there ways to do this that won't use double memory but can still run quickly?
Am I missing something that even with the double memory use will still run a lot
slower than I'm expecting?
Given the constraints you've described, I don't think there's a way to do this using a lot less memory.
One possible optimization, though, is to maintain sets of categories only for objects which are listed in multiple categories, and otherwise just map each object to its single category, i.e.:
Map<CategoryId, Set<ObjectId>> objectsByCategory; // HashMap
Map<ObjectId, CategoryId> categoryByObject; // HashMap
Map<ObjectId, Set<CategoryId>> illegalDuplicates; // HashMap
Yes, this adds yet another container, but it will (hopefully) contain only a few entries; also, the memory requirements of the categoryByObject map are reduced (cutting out one list overhead per entry).
The logic is a little more complicated of course. When a duplicate is initially discovered, the object should be removed from the categoryByObject map and added into the illegalDuplicates map. Before adding any object into the categoryByObject map, you will need to first check the illegalDuplicates map.
Finally, it probably won't hurt performance to build the objectsByCategory map in a separate loop after building the other two maps, and it will simplify the code a bit.
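That logic might look roughly like the sketch below; ObjectId, CategoryId and the map names come from the question, and the Iterable of pairs stands in for however the uploaded file is actually parsed:
// Sketch of the logic described above.
void process(Iterable<Map.Entry<ObjectId, CategoryId>> pairs) {
    Map<ObjectId, CategoryId> categoryByObject = new HashMap<>();
    Map<ObjectId, Set<CategoryId>> illegalDuplicates = new HashMap<>();

    for (Map.Entry<ObjectId, CategoryId> pair : pairs) {
        ObjectId object = pair.getKey();
        CategoryId category = pair.getValue();

        Set<CategoryId> illegal = illegalDuplicates.get(object);
        if (illegal != null) {
            illegal.add(category);                  // already known to be illegal
            continue;
        }
        CategoryId existing = categoryByObject.get(object);
        if (existing == null || existing.equals(category)) {
            categoryByObject.put(object, category); // first sighting, or same category again
        } else {
            // Conflicting categories: move the object into the illegal map.
            categoryByObject.remove(object);
            Set<CategoryId> conflict = new HashSet<>();
            conflict.add(existing);
            conflict.add(category);
            illegalDuplicates.put(object, conflict);
        }
    }

    // Separate loop afterwards: build objectsByCategory from the legal assignments only.
    Map<CategoryId, Set<ObjectId>> objectsByCategory = new HashMap<>();
    for (Map.Entry<ObjectId, CategoryId> e : categoryByObject.entrySet()) {
        objectsByCategory.computeIfAbsent(e.getValue(), k -> new HashSet<>()).add(e.getKey());
    }
}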