I'm reviewing the capabilities of Google's Guava API and I ran into a data structure that I haven't seen used in my 'real world programming' experience, namely the BiMap. Is the only benefit of this construct the ability to quickly retrieve a key for a given value? Are there any problems where the solution is best expressed using a BiMap?
Any time you want to be able to do a reverse lookup without having to populate two maps. For instance, a phone directory where you would like to look up the phone number by name, but would also like to do a reverse lookup to get the name from the number.
Louis mentioned the memory savings possible in a BiMap implementation. That's the only thing that you can't get by wrapping two Map instances. Still, if you let us wrap the Map instances for you, we can take care of a few edge cases. (You could handle all these yourself, but why bother? :))
If you call put(newKey, existingValue), we'll error out immediately to keep the two maps in sync, rather than adding the entry to one map before realizing that it conflicts with an existing mapping in the other. (We provide forcePut if you do want to override the existing value.) We provide similar safeguards for inserting null or other invalid values.
BiMap views keep the two maps in sync: If you remove an element from the entrySet of the original BiMap, its corresponding entry is also removed from the inverse. We do the same kind of thing in Entry.setValue.
We handle serialization: A BiMap and its inverse stay "connected," and the entries are serialized only once.
We provide a smart implementation of inverse() so that foo.inverse().inverse() returns foo, rather than a wrapper of a wrapper.
We override values() to return a Set. This set is identical to what you'd get from inverse().keySet() except that it maintains the same iteration order as the original BiMap.
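To make the first and fourth points concrete, here is a hand-rolled wrapper around two HashMaps. It is only a minimal sketch and not Guava's implementation: the class name TwoMapBiMap and its methods are made up for illustration, and null keys and values are not handled.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal bidirectional map; this is NOT Guava's HashBiMap.
final class TwoMapBiMap<K, V> {
    private final Map<K, V> forward;
    private final Map<V, K> backward;
    private final TwoMapBiMap<V, K> inverse;

    TwoMapBiMap() {
        this.forward = new HashMap<>();
        this.backward = new HashMap<>();
        // The inverse shares both maps with this instance, with roles swapped.
        this.inverse = new TwoMapBiMap<>(backward, forward, this);
    }

    // Used only to build the inverse view.
    private TwoMapBiMap(Map<K, V> forward, Map<V, K> backward, TwoMapBiMap<V, K> inverse) {
        this.forward = forward;
        this.backward = backward;
        this.inverse = inverse;
    }

    // Rejects a value that is already bound to a different key before
    // touching either map, so the two maps cannot get out of sync.
    V put(K key, V value) {
        K owner = backward.get(value);
        if (owner != null && !owner.equals(key)) {
            throw new IllegalArgumentException("value already present: " + value);
        }
        V old = forward.put(key, value);
        if (old != null) {
            backward.remove(old);
        }
        backward.put(value, key);
        return old;
    }

    V get(K key) {
        return forward.get(key);
    }

    // Returns the stored inverse, so inverse().inverse() is the original object.
    TwoMapBiMap<V, K> inverse() {
        return inverse;
    }
}
```

With this sketch, a phone directory can be queried by name via get(name) and by number via inverse().get(number), and a second put of an existing number under a new name fails without modifying either map.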
Related
I have a question about mapping Long value to another Long value and the best way to achieve that.
I have to map left values to right values just right before writing data to database.
3 => 70
8 => 12
1 => 45
Is there any "best way"? I was thinking about static map where the left Long will be a key and a right would be a value, and I have just to get a value corresponding to a given key.
Is it good approach?
You have two main options: an associative container, or an array. If the input values are all within a small range and performance is very important, you could use an array. Otherwise you may as well use a map as you said.
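As a rough illustration of the two options, using the three sample pairs from the question. The class name LongMapping and the -1 sentinel are assumptions made for this sketch, and Map.of requires Java 9 or later.

```java
import java.util.Arrays;
import java.util.Map;

final class LongMapping {
    // Option 1: an immutable lookup table keyed by the left value.
    static final Map<Long, Long> BY_MAP = Map.of(3L, 70L, 8L, 12L, 1L, 45L);

    // Option 2: an array indexed by the left value. Only workable when
    // the left values are small and non-negative.
    static final long NO_MAPPING = -1L; // sentinel chosen for this sketch
    static final long[] BY_ARRAY = new long[9];
    static {
        Arrays.fill(BY_ARRAY, NO_MAPPING);
        BY_ARRAY[3] = 70L;
        BY_ARRAY[8] = 12L;
        BY_ARRAY[1] = 45L;
    }

    // Returns null when there is no mapping for the given left value.
    static Long viaMap(long left) {
        return BY_MAP.get(left);
    }

    // Returns NO_MAPPING when the index is out of range or unmapped.
    static long viaArray(int left) {
        return (left >= 0 && left < BY_ARRAY.length) ? BY_ARRAY[left] : NO_MAPPING;
    }
}
```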
As @John Zwinck points out, a map is generally fine for this type of thing. The cost of mapping from one Long to another is trivial and will be dwarfed by the network latency of writing to a database (so don't use a primitive array :).
Open for extension, but closed for modification
I think it's probably more important for you to consider what happens if the mappings change or you need to add another one. In line with SOLID principles (and in particular open-closed), it should be possible to modify the mappings without changing the class.
In practice you should make sure you can read the mappings (initially, on demand or periodically) from an external source (e.g. a property file, db, NoSQL cache).
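For the property file case, something along these lines might do. This is only a sketch: the class name MappingLoader and the "left=right" line format are assumptions, and a real loader would also need to decide how to report malformed entries.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

final class MappingLoader {
    // Parses lines such as "3=70" into a Long-to-Long map.
    // Throws NumberFormatException if a key or value is not a valid long.
    static Map<Long, Long> load(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<Long, Long> result = new HashMap<>();
        for (String name : props.stringPropertyNames()) {
            result.put(Long.valueOf(name.trim()),
                       Long.valueOf(props.getProperty(name).trim()));
        }
        return result;
    }
}
```

The caller can pass a FileReader for a file on disk, and calling load again later picks up changed mappings without touching the class that uses them.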
Use a map and pass the initial capacity to the constructor if you know the number of key-value pairs. Choose the map implementation carefully, depending on your concurrency and ordering requirements.
Here is a tricky data structure and data organization case.
I have an application that reads data from large files and produces objects of various types (e.g., Boolean, Integer, String) that are categorized in a few (less than a dozen) groups and then stored in a database.
Each object is currently stored in a single HashMap<String, Object> data structure. Each such HashMap corresponds to a single category (group). Each database record is built from the information in all the objects contained in all categories (HashMap data structures).
A requirement has appeared for checking whether subsequent records are "equivalent" in the number and type of columns, where equivalence must be verified across all maps by comparing the name (HashMap key) and the type (actual class) of each stored object.
I am looking for an efficient way of implementing this functionality, while maintaining the original object categorization, because listing objects by category in the fastest possible way is also a requirement.
An idea would be to just sort the keys (e.g., by replacing each HashMap with a TreeMap) and then walk over all maps. An alternative would be to just copy everything in a TreeMap for comparison purposes only.
What would be the most efficient way of implementing this functionality?
Also, how would you go about finding the difference (i.e., the fields added and those removed) between successive records?
Create a meta SortedSet in which you store all the created maps.
That is, a SortedSet<Map<String,Object>>, e.g. a TreeSet which has a custom Comparator<Map<String,Object>> that checks exactly your requirements: the same number and names of keys and the same object type per value.
You can then use the contains() method of this meta set structure to find out if a similar record does already exist.
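A minimal sketch of such a meta set. The names SchemaSet, signature and BY_SCHEMA are made up here, the values are assumed to be non-null, and the string signature assumes keys contain neither ':' nor ','.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.stream.Collectors;

final class SchemaSet {
    // Canonical description of a map's "shape": sorted "key:class" pairs.
    static String signature(Map<String, Object> map) {
        return new TreeMap<>(map).entrySet().stream()
                .map(e -> e.getKey() + ":" + e.getValue().getClass().getName())
                .collect(Collectors.joining(","));
    }

    // Two maps compare as equal when they have the same keys and value types,
    // regardless of the actual values.
    static final Comparator<Map<String, Object>> BY_SCHEMA =
            Comparator.comparing(SchemaSet::signature);

    static SortedSet<Map<String, Object>> newMetaSet() {
        return new TreeSet<>(BY_SCHEMA);
    }
}
```

Note that a TreeSet built this way keeps only one map per shape, since it treats maps with equal signatures as duplicates.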
==== EDIT ====
Since I misunderstood the relation between database records and the maps in the first place, I have to change the semantics of my answer a little.
I would still use the mentioned SortedSet<Map<String,Object>>, but the Map<String,Object> would now point to the Map that you and havexy suggested.
On the other hand could it be a step forward to use a Set<Set<KeyAndType>> or SortedSet<Set<KeyAndType>> where your KeyAndType will only contain the key and the type with appropriate Comparable implementation or equals with hashcode.
Why? You asked how to find the differences between two records? If each record relates to one of those inner Set<KeyAndType> you can easily use retainAll() to form the intersection of two successive Sets.
Compare this to the idea of a SortedSet<Map<String,Object>>: in both cases the logic that differentiates between the fields lives in the comparator, once comparing inner sets and once comparing inner maps. Since this information is lost when the surrounding set is constructed, it will be hard to get the differences between two records later on unless you have another, reduced structure that makes such differences easy to find. And since such a Set<KeyAndType> could act both as a key and as an easy basis for comparison between two records, it could be a good candidate for both purposes.
If, furthermore, you want to keep the relation between such a Set<KeyAndType> and your record or the group of Map<String,Object>, your meta structure could be something like:
Map<Set<KeyAndType>,DatabaseRecord> or Map<Set<KeyAndType>,GroupOfMaps> implemented by a simple LinkedHashMap which allows simple iteration in original order.
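A possible shape for KeyAndType and the record comparison, as a sketch only. The helper names shapeOf, added and common are invented for this example, and values are assumed to be non-null.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical value class: a field name plus the runtime type of its value.
final class KeyAndType {
    final String key;
    final Class<?> type;

    KeyAndType(String key, Class<?> type) {
        this.key = key;
        this.type = type;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof KeyAndType)) return false;
        KeyAndType other = (KeyAndType) o;
        return key.equals(other.key) && type.equals(other.type);
    }

    @Override
    public int hashCode() {
        return Objects.hash(key, type);
    }

    // Reduces one map to its set of (key, type) pairs.
    static Set<KeyAndType> shapeOf(Map<String, Object> map) {
        Set<KeyAndType> shape = new HashSet<>();
        for (Map.Entry<String, Object> e : map.entrySet()) {
            shape.add(new KeyAndType(e.getKey(), e.getValue().getClass()));
        }
        return shape;
    }

    // Fields present in 'to' but not in 'from'; swap the arguments for removals.
    static Set<KeyAndType> added(Set<KeyAndType> from, Set<KeyAndType> to) {
        Set<KeyAndType> result = new HashSet<>(to);
        result.removeAll(from);
        return result;
    }

    // Fields common to both records, via retainAll on a copy.
    static Set<KeyAndType> common(Set<KeyAndType> a, Set<KeyAndType> b) {
        Set<KeyAndType> result = new HashSet<>(a);
        result.retainAll(b);
        return result;
    }
}
```

Two records are equivalent exactly when their shape sets are equal, and because the class defines equals and hashCode, a Set<KeyAndType> can also serve as a key in the LinkedHashMap mentioned above.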
One solution is to keep both the category-based HashMap and a combined TreeMap. This will need slightly more memory, though not much, as you'll just keep the same references in both of them.
So whenever you are adding/removing to HashMap you will do the same operation in the TreeMap too. This way both will always be in sync.
You can then use TreeMap for comparison, whether you want comparison of type of object or actual content comparison.
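Roughly like this, as a sketch. The class name CategorizedRecord is made up, and it assumes field names are unique across categories, since otherwise the combined TreeMap would need a composite key.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

final class CategorizedRecord {
    private final Map<String, Map<String, Object>> byCategory = new HashMap<>();
    private final SortedMap<String, Object> combined = new TreeMap<>();

    // Every write goes to both structures, so they stay in sync.
    void put(String category, String name, Object value) {
        byCategory.computeIfAbsent(category, c -> new HashMap<>()).put(name, value);
        combined.put(name, value);
    }

    void remove(String category, String name) {
        Map<String, Object> group = byCategory.get(category);
        if (group != null && group.containsKey(name)) {
            group.remove(name);
            combined.remove(name);
        }
    }

    // Fast listing by category (read-only view).
    Map<String, Object> category(String category) {
        return Collections.unmodifiableMap(
                byCategory.getOrDefault(category, Collections.emptyMap()));
    }

    // Sorted view over all fields, for comparing records (read-only view).
    SortedMap<String, Object> combined() {
        return Collections.unmodifiableSortedMap(combined);
    }
}
```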
I have a value as input. I need to retrieve the key for that value in a Hashtable.
Please help me sort this out. I appreciate your help.
Basically that's not how a hash table works - you're expected to look up by key. You can iterate over all the entries and find the - potentially multiple - keys which map to the particular value, but it won't be fast.
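For completeness, the iteration approach could look like the sketch below (the names ReverseLookup and keysFor are just for illustration). It works for Hashtable and any other Map, and it costs a full scan on every lookup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

final class ReverseLookup {
    // Linear scan: O(n) per lookup, and it may return several keys.
    static <K, V> List<K> keysFor(Map<K, V> map, V value) {
        List<K> keys = new ArrayList<>();
        for (Map.Entry<K, V> e : map.entrySet()) {
            if (Objects.equals(e.getValue(), value)) {
                keys.add(e.getKey());
            }
        }
        return keys;
    }
}
```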
Instead, you should consider using a bidirectional map (bimap) such as the ones provided by Guava, assuming your situation really calls for a single-key-to-single-value solution. (There are lots of options around collections in Guava; if you give us more information about your situation, we may be able to help more.)
What you are looking for is a BiMap data structure. Google's Guava provides an implementation of it. Apache Commons Collections offers something similar through its BidiMap interface.
I have two sets both containing the same object types. I would like to be able to access the following:
the intersection of the 2 sets
the objects contained in set 1 and not in set 2
the objects contained in set 2 and not in set 1
My question relates to how best to compare the two sets to acquire the desired views. The class in question has numerous id properties which can be used to uniquely identify that entity. However, there are also numerous properties in the class that describe the current status of the object. The two sets can contain objects that match according to the ids, but which are in a different state (and as such, not all properties are equal between the two objects).
So, how do I best implement my solution? Implementing an equals() method for the class that ignores the status properties and only looks at the id properties would not seem very true to the name 'equals' and could prove confusing later on. Is there some way I can provide a method through which the comparisons are done for the set methods?
Also, I would like to be able to access the 3 views described above without modifying the original sets.
All help is much appreciated!
(Edit: My first suggestion has been removed because of an unfortunate implementation detail in TreeSet, as pointed out by Martin Konecny. Some collection classes (e.g. TreeSet) allow you to supply a Comparator that is to be used to compare elements, so you might want to use one of those classes - at least, if there is some natural way of ordering your objects.)
If not (i.e. if it would be difficult to implement compareTo(), while it would be simpler to implement hashCode() and equals()), you could create a wrapper class which implements those two methods by looking at the relevant fields of the objects it wraps, and create a regular HashSet of these wrapper objects.
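A sketch of the wrapper idea. Item, ItemById and the single id field are placeholders for your real class and its identifying properties. The three views are built as copies, so the original sets are left alone.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical entity: 'id' identifies it, 'status' is its current state.
final class Item {
    final long id;
    final String status;

    Item(long id, String status) {
        this.id = id;
        this.status = status;
    }
}

// Wrapper whose equality looks only at the identifying field.
final class ItemById {
    final Item item;

    ItemById(Item item) {
        this.item = item;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof ItemById && ((ItemById) o).item.id == item.id;
    }

    @Override
    public int hashCode() {
        return Long.hashCode(item.id);
    }

    static Set<ItemById> wrap(Set<Item> items) {
        Set<ItemById> wrapped = new HashSet<>();
        for (Item item : items) {
            wrapped.add(new ItemById(item));
        }
        return wrapped;
    }

    // Elements of 'a' whose id also occurs in 'b'.
    static Set<ItemById> intersection(Set<ItemById> a, Set<ItemById> b) {
        Set<ItemById> result = new HashSet<>(a);
        result.retainAll(b);
        return result;
    }

    // Elements of 'a' whose id does not occur in 'b'.
    static Set<ItemById> difference(Set<ItemById> a, Set<ItemById> b) {
        Set<ItemById> result = new HashSet<>(a);
        result.removeAll(b);
        return result;
    }
}
```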
Short version: implement equals based on the entity's key, not state.
Slightly longer version: What the equals method should check depends on the type of object. For something that's considered a "value" object (say, an Integer or String or an Address), equality is typically based on all fields being the same. For an object with a set of fields that uniquely identify it (its primary key), equality is typically based on the fields of the primary key only. Equality doesn't necessarily need to (and often shouldn't) take into consideration the state of an object. It needs to determine whether two objects are representations of the same thing. Also, for objects that are used in a Set or as keys in a Map, the fields that are used to determine equality should generally not be mutable, since changing them could cause a Set/Map to stop working as expected.
Once you've implemented equals like this, you can use Guava to view the differences between the two sets:
Set<Foo> notInSet2 = Sets.difference(set1, set2);
Set<Foo> notInSet1 = Sets.difference(set2, set1);
Both difference sets will be live views of the original sets, so changes to the original sets will automatically be reflected in them.
This is a requirement for which the Standard C++ Library fares better with its set type, which accepts a comparator for this purpose. In the Java library, your need is modeled better by a Map: one mapping from your candidate key to either the rest of the status-related fields, or to the complete object that happens to also contain the candidate key. (Note that the C++ set type is mandated to be some sort of balanced tree, usually implemented as a red-black tree, which means it's equivalent to Java's TreeSet, which does accept a custom Comparator.) It's ugly to duplicate the data, but it's also ugly to try to work around it, as you've already found.
If you have control over the type in question and can split it up into separate candidate key and status parts, you can eliminate the duplication. If you can't go that far, consider combining the candidate key fields into a single object held within your larger, complete object; that way, the Map key type will be the same as that candidate key type, and the only storage overhead will be the map keys' object references. The candidate key data would not be duplicated.
Note that most set types are implemented as maps under the covers; they map from the would-be set element type to something like a Boolean flag. Apparently there's too much code that would be duplicated in wholly disjoint set and map types. Once you realize that, backing up from using a set in an awkward way to using a map no longer seems to impose the storage overhead you thought it would.
It's a somewhat depressing realization, having chosen the mathematically correct idealized data structure, only to find it's a false choice down a layer or two, but even in your case your problem sounds better suited to a map representation than a set. Think of it as an index.
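A sketch of the index idea, with the candidate key reduced to a single String id for brevity. KeyIndex and Device are invented names; with several key fields you would use a small key class with equals and hashCode instead of String.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class KeyIndex {
    // Hypothetical entity; 'id' stands in for the candidate key.
    static final class Device {
        final String id;
        final String status;

        Device(String id, String status) {
            this.id = id;
            this.status = status;
        }
    }

    // Builds the index: candidate key -> complete object.
    static Map<String, Device> index(Iterable<Device> devices) {
        Map<String, Device> byId = new HashMap<>();
        for (Device d : devices) {
            byId.put(d.id, d);
        }
        return byId;
    }

    // Keys present in both indexes; the key set is copied so neither map changes.
    static Set<String> inBoth(Map<String, Device> a, Map<String, Device> b) {
        Set<String> keys = new HashSet<>(a.keySet());
        keys.retainAll(b.keySet());
        return keys;
    }

    // Keys present in 'a' but not in 'b'.
    static Set<String> onlyIn(Map<String, Device> a, Map<String, Device> b) {
        Set<String> keys = new HashSet<>(a.keySet());
        keys.removeAll(b.keySet());
        return keys;
    }
}
```

Given a key from any of these results, a.get(key) and b.get(key) return the two versions of the object, so their status fields can be compared directly.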
I'm looking for an implementation of java.util.Map that has a method that will return all the keys mapped to a given value, that is, for the case where multiple keys map to the same value. I've looked at Google Collections and Apache Commons and didn't notice anything. Of course, I could iterate through the keyset and check each corresponding value or use two maps, but I was hoping there was something available already built.
I don't know if this solution is good for you, but you can easily implement that by using a standard map from keys to values and a MultiMap from values to keys.
Of course you'll have to take care of the synchronization of the two structures, i.e. when you remove a key from the map, you also have to remove that key from the set of keys mapped to the value in the multimap.
It doesn't seem difficult to implement, though it may be a bit heavy in terms of memory overhead.
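Here is roughly what that could look like using only java.util, with a Map<V, Set<K>> standing in for the MultiMap. The class name ReverseIndexedMap is made up, and null keys and values are assumed not to be used.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class ReverseIndexedMap<K, V> {
    private final Map<K, V> forward = new HashMap<>();
    private final Map<V, Set<K>> reverse = new HashMap<>();

    // Updates both structures; a replaced value loses this key in the reverse index.
    V put(K key, V value) {
        V old = forward.put(key, value);
        if (old != null) {
            unlink(key, old);
        }
        reverse.computeIfAbsent(value, v -> new HashSet<>()).add(key);
        return old;
    }

    V remove(K key) {
        V old = forward.remove(key);
        if (old != null) {
            unlink(key, old);
        }
        return old;
    }

    V get(K key) {
        return forward.get(key);
    }

    // All keys currently mapped to the given value (read-only, possibly empty).
    Set<K> keysFor(V value) {
        return Collections.unmodifiableSet(
                reverse.getOrDefault(value, Collections.emptySet()));
    }

    // Removes the key from the value's key set and drops empty sets.
    private void unlink(K key, V value) {
        Set<K> keys = reverse.get(value);
        if (keys != null) {
            keys.remove(key);
            if (keys.isEmpty()) {
                reverse.remove(value);
            }
        }
    }
}
```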
What you're looking for here is a bidirectional map, for which there is an implementation in commons collections.
Your value objects could have a property (of type ArrayList maybe) that holds all the keys.
Then you extend HashMap (or whatever Map impl you use) and override put so that when you put an object for a key you also add the key to your object's list of keys.
I can't find a ready-made class that supports values with multiple keys. However, you could re-implement the Apache Commons DualHashBidiMap using a MultiHashMap in place of one of the HashMaps.