I've got a large graph which is processed in a Java servlet for routing purposes. The graph has 100k+ nodes, so I can't reload it for every new call. At the moment the graph is loaded once from my database into RAM and referenced in a HashMap.
When the servlet starts (creating a new instance) I need to find the start node in the graph by its id. For that I use the HashMap.
That all works fine.
My problem is that within my routing task I need to change certain attributes in the graph, e.g. the travelled distance. These attributes of course need to be individual for each created instance. At the moment I handle that by resetting all "non-static" attributes when creating a new instance.
That creates two problems.
A) the instances are not thread-safe
B) the resetting is very time-consuming, taking up to 10 times longer than the actual routing.
So what I need is a static HashMap shared by all instances of my servlet. This HashMap needs to contain all nodes of my network. These nodes need static attributes like id, coordinates and neighbour nodes, but also non-static attributes like travelled distance.
How can I do that?
Thanks for reading and sharing ideas
Your problem can be described as a model built at runtime and instantiated for every execution of your service.
When you say "static", I presume you mean "constant". The variable attributes are really specific to each execution, not to each Servlet instance. During an execution you should build a separate structure with variable attributes that parallels the constant one. Each node in the variable structure references a single node in the constant structure. The variable structure is built gradually and on demand, as a need for each node arises. The structure is discarded at the end of the execution.
I'd advise to keep the "main graph" in RAM in a singleton manner - as Marko Topolnik advised, but I'd keep a Map of only the changed nodes per each session, without the hierarchy, just storing them by ID's (if applicable, per se)
when a session ends, you only have to discard the map in the session, and that's all.
when a new session begins, just create a new Map instance...
You could also pool these maps, if that is critical - but avoid premature optimalization, as it causes far more problems than what it avoids.
if you need to access a node, fetch it from the original Map, then look it up if it exists in the "session local" map, then merge data in the two if found. (or, if you store the full node, not just the changed atributes in the "session local" map, use the changed node from that map)
also, be careful, this has many places that can introduce memory leaks...
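A hedged sketch of that session-local overlay (NodeOverlay is a made-up name and the node type is left generic; the shared map is assumed to be loaded once per JVM):

import java.util.HashMap;
import java.util.Map;

class NodeOverlay<N> {
    private final Map<Long, N> sharedGraph;                      // read-only, one per JVM
    private final Map<Long, N> changed = new HashMap<Long, N>(); // per session

    NodeOverlay(Map<Long, N> sharedGraph) {
        this.sharedGraph = sharedGraph;
    }

    // Look in the session-local map first; fall back to the shared graph.
    N lookup(long id) {
        N local = changed.get(id);
        return local != null ? local : sharedGraph.get(id);
    }

    // Store a full modified copy of the node under its id.
    void putChanged(long id, N modifiedCopy) {
        changed.put(id, modifiedCopy);
    }

    // When the session ends, just drop this object; nothing else to clean up.
}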
I'm storing a few properties (KV pairs) in a hierarchical DB (JCR). As part of the business logic, I have to look up these key-value pairs very frequently, and each time I have to call a method which goes and retrieves the persisted value.
I'm working on a CMS called AEM, and all these key-value pairs are authored via a component and stored as JCR properties. Presently I've written an OSGi service which goes to that node and retrieves the value corresponding to the key, and this method gets invoked many, many times. Instead of making repeated calls to the service method to retrieve these values, can you suggest a more efficient way to do this? OSGi auto-wiring?
First of all, I would suggest you think twice about whether you really need to get rid of (or reduce) these property reads. Do you have performance issues because of this reading, or do you have another important reason?
If you still want to go ahead, I would suggest the following configuration:
You have a cache component, which contains the map with all key-value pairs.
You have a listener, which listens for changes to the node that contains this data and invalidates the cache on such an event (so the cache will be rebuilt the next time it is accessed).
There is a great variety of cache implementations, or you can use a simple map for this.
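A minimal sketch of that configuration, not tied to a specific AEM or JCR API (loadFromRepository() is a hypothetical stand-in for the existing OSGi service call, and invalidate() would be called from whatever change listener you register):

import java.util.HashMap;
import java.util.Map;

class KeyValueCache {
    private volatile Map<String, String> cache;        // null means "needs rebuild"

    String get(String key) {
        Map<String, String> local = cache;
        if (local == null) {
            local = loadFromRepository();               // one repository read
            cache = local;
        }
        return local.get(key);
    }

    // Call this from the change listener when the authored node changes.
    void invalidate() {
        cache = null;
    }

    private Map<String, String> loadFromRepository() {
        // Read the node's properties once and copy them into a plain map.
        return new HashMap<String, String>();           // placeholder for the real read
    }
}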
I'm developing a service that monitors computers. Computers can be added to or removed from monitoring by a web GUI. I keep reported data basically in various maps like Map<Computer, Temperature>. Now that the collected data grows and the data structures become more sophisticated (including computers referencing each other) I need a concept for what happens when removing computers from monitoring. Basically I need to delete all data reported by the removed computer. The most KISS-like approach would be removing the data manually from memory, like
public void onRemove(Computer computer) {
    temperatures.remove(computer);
    // ...
}
This method has to be changed whenever I add features :-( I know Java has a WeakHashMap, so I could store reported data like so:
Map<Computer, Temperature> temperatures = new WeakHashMap<>();
I could call System.gc() whenever a computer is removed from monitoring in order to have all associated data eagerly removed from these maps.
While the first approach seems a bit like primitive MyISAM tables, the second one resembles DELETE cascades in InnoDB tables. But still it feels a bit uncomfortable and is probably the wrong approach. Could you point out advantages or disadvantages of WeakHashMaps or propose other solutions to this problem?
Not sure if it is possible in your case, but couldn't your Computer class hold all the attributes, and then you keep a list of monitoredComputers (or a wrapper class called MonitoredComputers, where you can wrap any logic needed, like getTemperatures())? That way computers can be removed from that list without looking through all the attribute maps. If a computer is referenced from another computer, then you have to loop through that list and remove the references from those that hold it.
I'm not sure using a WeakHashMap is a good idea. As you say you may reference Computer objects from several places, so you'll need to make sure all references except one go through weak references, and to remove the hard reference when the Computer is deleted. As you have no control over when weak references are deleted, you may not get consistent results.
If you don't want to have to maintain manually the removal, you could have a flag on Computer objects, like isAlive(). Then you store Computers in special subclasses of Maps and Collections that at read time check if the Computer is alive and if not silently remove it. For example, on a Map<Computer, ?>, the get method would check if the computer is alive, and if not will remove it and return null.
Or the subclasses of Maps and Collections could just register themselves to a single computerRemoved() event, and automatically know how to remove the deleted computers, and you wouldn't have to manually code the removal. Just make sure you keep references to Computer only inside your special maps and collections.
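A hedged sketch of that read-time check, assuming the proposed isAlive() flag on Computer (LiveComputerMap is a made-up name, and only a couple of Map operations are shown):

import java.util.HashMap;
import java.util.Map;

// Computer is the asker's class; only the proposed isAlive() flag is assumed here.
interface Computer {
    boolean isAlive();
}

class LiveComputerMap<V> {
    private final Map<Computer, V> delegate = new HashMap<Computer, V>();

    V get(Computer c) {
        if (!c.isAlive()) {
            delegate.remove(c);        // silently purge the dead entry
            return null;
        }
        return delegate.get(c);
    }

    void put(Computer c, V value) {
        if (c.isAlive()) {             // never store data for removed computers
            delegate.put(c, value);
        }
    }
}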
Why not use an actual SQL database? You could use an embedded database engine such as H2, Apache Derby / Java DB, HSQLDB, or SQLite. Using an embedded database engine brings added benefits:
You could inspect the live contents of the monitoring data at any time using the corresponding DB engine's command line client.
You could build a new tool to access and manipulate the data by connecting to a shared database instance.
The schema itself is a form of documentation as to the structure of the monitoring data and the relationships between entities.
You could store different types of data for different types of computers by way of schema normalization.
You can back up the monitoring data.
If you need to restart the monitoring server, you won't lose all of the monitoring data.
Your Web UI could use a JPA implementation such as Hibernate to access the monitoring data and add new records. Or, for a more lightweight solution, you might consider using Spring Framework's JdbcTemplate and SimpleJdbcInsert classes. There is also OrmLite, ActiveJDBC, and jOOQ which each aim to offer simpler access to databases than JDBC.
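For instance, here is a minimal sketch with an in-memory H2 database over plain JDBC (assuming the H2 driver is on the classpath; the table and column names are purely illustrative):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;

public class MonitoringStore {
    public static void main(String[] args) throws SQLException {
        try (Connection con = DriverManager.getConnection("jdbc:h2:mem:monitoring")) {
            try (Statement st = con.createStatement()) {
                st.execute("CREATE TABLE temperature (computer_id BIGINT, reading DOUBLE, measured_at TIMESTAMP)");
            }
            try (PreparedStatement insert = con.prepareStatement(
                    "INSERT INTO temperature (computer_id, reading, measured_at) VALUES (?, ?, CURRENT_TIMESTAMP)")) {
                insert.setLong(1, 42L);
                insert.setDouble(2, 55.5);
                insert.executeUpdate();
            }
            // Removing a computer from monitoring becomes a single DELETE,
            // no matter how many kinds of data were recorded for it.
            try (PreparedStatement delete = con.prepareStatement(
                    "DELETE FROM temperature WHERE computer_id = ?")) {
                delete.setLong(1, 42L);
                delete.executeUpdate();
            }
        }
    }
}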
The problem with WeakHashMap is that managing the references to Computer objects seems difficult and easily breakable.
Hash table based implementation of the Map interface, with weak keys. An entry in a WeakHashMap will automatically be removed when its key is no longer in ordinary use. More precisely, the presence of a mapping for a given key will not prevent the key from being discarded by the garbage collector, that is, made finalizable, finalized, and then reclaimed. When a key has been discarded its entry is effectively removed from the map, so this class behaves somewhat differently from other Map implementations.
It could be that a reference to a Computer object still exists somewhere, so the object will never be removed from the WeakHashMaps. I would prefer a more deterministic approach.
But if you decide to go down this route, you can mitigate the problem I point out by wrapping all these Computer object keys in a class that has strict controls. This wrapper object will create and store the keys and will take care never to let references to those keys leak out.
Novice coder here, so maybe this is too clunky:
Why not keep the monitored computers in a HashMap, and move removed computers to a WeakHashMap? That way all removed computers are separate and easy to work with, with the GC cleaning up the oldest entries.
I am trying to implement a servlet for GPS monitoring and want to create a simple cache, because I think it will be faster than an SQL request for every HTTP request. The simple scheme:
In the init() method, I read one point for each vehicle into a HashMap (vehicle id = key, location in JSON = value). After that, some requests try to read these points and some requests try to update them (one vehicle update changes one item). Of course I want to minimize synchronization, so I read the javadoc:
http://docs.oracle.com/javase/6/docs/api/java/util/HashMap.html
Note that this implementation is not synchronized. If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that an instance already contains is not a structural modification.)
If I am right, no synchronization is needed in my task, because I only perform what is "not a structural modification", i.e. merely changing the value associated with a key that the instance already contains. Is that a correct statement?
Use a ConcurrentHashMap. It achieves thread safety with atomic operations and fine-grained locking rather than a single lock around the whole map.
Wrong. Adding an item to the hash map is a structural modification (and to implement a cache you must add items at some point).
Use java.util.concurrent.ConcurrentHashMap.
If all the entries are read into the HashMap in init() and afterwards only read or have their values modified, then yes, in theory the other threads do not need to synchronize; however, problems can still arise because threads may cache values (visibility), so a ConcurrentHashMap would be better.
Perhaps, rather than implementing the cache yourself, use a ready-made implementation from the Guava library.
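A hedged sketch of what that could look like with Guava's CacheBuilder (assuming Guava is on the classpath; loadPointFromDatabase() is a hypothetical stand-in for the SQL query that fetches a vehicle's last point):

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import java.util.concurrent.TimeUnit;

public class PointCache {
    private final LoadingCache<Long, String> lastPointJson = CacheBuilder.newBuilder()
            .maximumSize(10000)                                // bound the cache size
            .expireAfterWrite(5, TimeUnit.MINUTES)             // drop stale points
            .build(new CacheLoader<Long, String>() {
                @Override
                public String load(Long vehicleId) {
                    return loadPointFromDatabase(vehicleId);   // loaded on a cache miss
                }
            });

    public String getLastPoint(long vehicleId) {
        return lastPointJson.getUnchecked(vehicleId);
    }

    public void updateLastPoint(long vehicleId, String json) {
        lastPointJson.put(vehicleId, json);                    // overwrite the cached value
    }

    private String loadPointFromDatabase(Long vehicleId) {
        return "{}";                                           // placeholder for the real SQL read
    }
}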
Caching is not an easy problem, but it is a known one. Before starting, I would carefully measure whether you really have a performance problem and whether caching actually solves it. You may think it should, and you may be right. You may also be horrendously wrong depending on the situation ("Premature optimization is the root of all evil"), so measure.
That said, do not implement a cache yourself; use a library that does it for you. I personally have had good experience with Ehcache.
If I understand correctly, you have two types of request:
Read from cache
Write to cache (to update the value)
In this case, you may potentially try to write to the same map twice at the same time, which is what the docs are referring to.
If all requests go through the same piece of code (e.g. an update method which can only be called from one thread) you will not need synchronisation.
If your system is multi-threaded and you have more than one thread or piece of code that writes to the map, you will need to externally synchronise your map or use a ConcurrentHashMap.
For clarity, the reason you need synchronisation is that if you have two threads both trying to update the JSON value for the same key, who wins? This is either left up to chance or causes exceptions or, worse, buggy behaviour.
Any time you modify the same element from two threads, you need to synchronise on that code or, better still, use a thread-safe version of the data structure if that is applicable.
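A minimal sketch of the thread-safe variant for the cache described in the question, using ConcurrentHashMap so reads and single-key updates need no external synchronisation (class and method names are illustrative):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class LocationCache {
    private final Map<Long, String> lastLocation = new ConcurrentHashMap<Long, String>();

    // Called from init(): preload one point per vehicle.
    public void preload(Map<Long, String> initialPoints) {
        lastLocation.putAll(initialPoints);
    }

    // Called by update requests; the last writer for a given vehicle wins.
    public void update(long vehicleId, String json) {
        lastLocation.put(vehicleId, json);
    }

    // Called by read requests.
    public String get(long vehicleId) {
        return lastLocation.get(vehicleId);
    }
}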
Can I use the same aggregate class as a member in other classes?
And if yes, would the class that contains the aggregate enforce access etc. on it?
Let's say you have a User class, then a class named LogBook, and finally a class named Log/Post (something along those lines). The LogBook would be an aggregate root for the Log/Post class, and the User would be the overall aggregate in my example. Now, would the User class contain methods for adding log posts etc.? You would make one method in the User class that invokes the LogBook class, which has a method that does all the logic for actually adding a log.
Or is an aggregate ALWAYS at the top of the hierarchy? No nesting?
Here is a nice definition of an Aggregate:
Definition: A cluster of associated objects that are treated as a unit for the purpose of data changes. External references are restricted to one member of the Aggregate, designated as the root. A set of consistency rules applies within the Aggregate's boundaries.

Problem: It is difficult to guarantee the consistency of changes to objects in a model with complex associations. Invariants need to be maintained that apply to closely related groups of objects, not just discrete objects. Yet cautious locking schemes cause multiple users to interfere pointlessly with each other and make a system unusable. [DDD, p. 126]

Solution: Cluster the Entities and Value Objects into Aggregates and define boundaries around each. Choose one Entity to be the root of each Aggregate, and control all access to the objects inside the boundary through the root. Allow external objects to hold references to the root only. Transient references to the internal members can be passed out for use within a single operation only. Because the root controls access, it cannot be blindsided by changes to the internals. This arrangement makes it practical to enforce all invariants for objects in the Aggregate and for the Aggregate as a whole in any state change. [DDD, p. 129]
I don't think you want the User class reaching into the LogBook's aggregated objects without going through the LogBook class. However, accessing the LogBook from User seems OK.
I think the internals of an aggregate are allowed to hold references to the root of other aggregates. But each aggregate is responsible for enforcing its own boundaries. There is nothing stopping other objects from accessing the "referenced" aggregate completely outside of the first one - i.e. I don't think that nesting or ownership is implied just because one aggregate references another.
In your example, it seems like LogBook would fit better as an aggregate, controlling access to posts. Trying to shoehorn this into a larger User aggregate seems to be an awkward factoring of responsibilities.
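A hedged sketch of that factoring (all class and method names are illustrative): LogBook is the aggregate root and owns the posts; User only holds a reference to the LogBook and delegates, never touching Post objects directly.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class Post {
    final String text;
    Post(String text) { this.text = text; }
}

class LogBook {                        // aggregate root for posts
    private final List<Post> posts = new ArrayList<Post>();

    void addPost(String text) {        // invariants are enforced here, at the root
        if (text == null || text.isEmpty()) {
            throw new IllegalArgumentException("post must not be empty");
        }
        posts.add(new Post(text));
    }

    List<Post> posts() {
        return Collections.unmodifiableList(posts);   // no external mutation of internals
    }
}

class User {
    private final LogBook logBook = new LogBook();

    void writeLogEntry(String text) {
        logBook.addPost(text);         // delegate; User does not reach into the posts
    }
}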
I have, say, a list of 1000 beans which I need to share among different projects. I use memcache for this purpose. Currently, a loop runs over the complete list and each bean is stored in memcache under a unique memcache id. I was wondering: instead of putting each and every bean into memcache independently, what about putting all the beans into a HashMap, keyed by the same keys used for storing the beans in memcache, and then putting this HashMap into memcache?
Will this give me any significant improvement over putting each and every bean into memcached individually? Or will it cause me trouble because of the large size of the object?
Any help is appreciated.
It won't get you any particular benefit -- it'll actually probably be slower on the load: serialization is serialization, and adding a HashMap wrapper around it just increases the amount of data that needs to be deserialized and populated. For retrievals, assuming that most lookups are discrete by the key you want to use for your HashMap, you'll have a much, much slower retrieval time, because you'll be pulling down the whole graph just to get at one of its discrete members.
Of course, if the data is entirely static and you're only using memcached to populate values in various JVMs, you can do it that way and just hold onto the HashMap in a static... but then you're multiplying your memory consumption by the number of nodes in the cluster...
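To make the contrast concrete, here is a hedged sketch with spymemcached (assuming it is on the classpath, a memcached server at localhost:11211, and Serializable beans; MyBean and loadBeans() are made up for illustration):

import java.io.Serializable;
import java.net.InetSocketAddress;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import net.spy.memcached.MemcachedClient;

public class BeanCacheExample {
    public static void main(String[] args) throws Exception {
        MemcachedClient client = new MemcachedClient(new InetSocketAddress("localhost", 11211));
        List<MyBean> beans = loadBeans();                       // hypothetical loader

        // Option 1: one entry per bean - single lookups stay cheap.
        for (MyBean bean : beans) {
            client.set("bean:" + bean.getId(), 3600, bean);
        }
        MyBean one = (MyBean) client.get("bean:" + 42);

        // Option 2: one big map under one key - every lookup deserialises everything.
        Map<String, MyBean> all = new HashMap<String, MyBean>();
        for (MyBean bean : beans) {
            all.put("bean:" + bean.getId(), bean);
        }
        client.set("allBeans", 3600, all);

        client.shutdown();
    }

    static List<MyBean> loadBeans() {
        return Collections.emptyList();                          // placeholder
    }
}

class MyBean implements Serializable {
    private final int id;
    MyBean(int id) { this.id = id; }
    int getId() { return id; }
}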
I did some optimization work in spymemcached that helps it do the right thing when doing the wire encoding.
This may, or may not help you with your application. In general, just measure when you have performance questions about your app.