Java Map with container object keys, lookup by container object field value?


Let's say I have a simple Java object, let's call it DefinedData. It will contain a number of final fields of varying types, such as strings, integers, enums, and even perhaps a set or two of strings. All in all, it's just a relatively simple data container. There will be potentially 1k to 2k of these, all static final objects. Most of these fields will be unique in that no other DefinedData object will have the same value for that field.
These will be placed into a Map of (DefinedData, Object). Now, you could easily get that Object out of the Map if you have the DefinedData object, but what if you only have one of the unique field values? You can't just pass that to the Map. You'd have to iterate over the keys and check, and that would mean wrapping the map with a lookup method for each field in DefinedData. Doable, but not the prettiest thing out there, especially if there are a lot of values in the Map and a lot of lookups, which is possible. Either that or there would need to be a lookup for DefinedData objects, which would again be a bunch of Maps...
This almost sounds like a job for a database (look up based on any column), but that's not a good solution for this particular problem. I'd also rather avoid having a dozen different Maps, each mapping a single field from DefinedData to the Object. The multikey maps I've seen wouldn't be applicable as they require all key values, not just one. Is there a Map, Collections, or other implementation that can handle this particular problem?

The only way to avoid multiple maps is to iterate over all your DefinedData objects in some way. The reason is that you cannot know how to partition or sort them until a request arrives.
Think of a bucket of apples. At any moment someone may ask for a certain color, a certain variety, or a certain size. You can sort the bucket by only one of those categories; a request on any other category means searching through all the apples.
If only you could have three identical sets of apples, one for each category.
Multiple maps give faster lookups at the cost of more memory; iterating is easier to implement and uses less memory, but is slower.
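The trade-off above can be sketched roughly as follows. This is only an illustration: the DefinedData fields (code, id) and the DefinedDataRegistry class are hypothetical names, not taken from the question.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical stand-in for the question's DefinedData; the field names are assumptions.
final class DefinedData {
    final String code;
    final int id;

    DefinedData(String code, int id) {
        this.code = code;
        this.id = id;
    }
}

final class DefinedDataRegistry {
    private final List<DefinedData> all = new ArrayList<>();
    // One index per unique field: faster lookups, more memory.
    private final Map<String, DefinedData> byCode = new HashMap<>();
    private final Map<Integer, DefinedData> byId = new HashMap<>();

    void add(DefinedData data) {
        all.add(data);
        byCode.put(data.code, data);
        byId.put(data.id, data);
    }

    // Indexed lookups: one hash lookup each, null when nothing matches.
    DefinedData findByCode(String code) { return byCode.get(code); }
    DefinedData findById(int id) { return byId.get(id); }

    // Linear scan: no extra memory, but every element may be visited.
    Optional<DefinedData> scan(Predicate<DefinedData> condition) {
        return all.stream().filter(condition).findFirst();
    }
}
```

With 1k to 2k objects either approach is likely to be workable; the indexes mainly pay off when lookups are frequent.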

I hesitate to propose this, but you could encapsulate your lookups behind some sort of Indexer class that auto-generates a single map via reflection using the fields of supplied objects.
By single map, I mean just one single map for the whole indexer which creates a key based on both the field name and data (say concatenating the string representing the field name with a string representation of the data).
Lookups against the indexer would supply both a field name and data value, which would then be looked up in the single map encapsulated by the indexer.
I do not think this necessarily has any advantage over a similar solution where the indexer is instead backed by a map of maps (map of field name to map of data to object).
The indexer could also be designed to use annotations so that not all fields are indexed, only those suitably annotated (or vice-versa, with annotations to exclude fields).
Overall, a map-of-maps solution strikes me as easier, since it cuts out the step of assembling a composite key (which could be complicated for certain field data types). In either case, encapsulating it all in an Indexer that auto-generates its maps seems to be the way to go.
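For comparison, here is a minimal sketch of the single-map variant. It swaps string concatenation for a two-element list as the composite key, so that List supplies equals() and hashCode(). The class name SingleMapIndex is hypothetical, the caller passes the field name and value explicitly instead of using reflection, and each (field, value) pair is assumed to be unique.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SingleMapIndex<T> {
    // Key is the pair (field name, field value); List supplies equals() and hashCode().
    private final Map<List<Object>, T> index = new HashMap<>();

    private static List<Object> key(String fieldName, Object value) {
        return Arrays.<Object>asList(fieldName, value);
    }

    // Registers target under one field value; returns the object previously
    // stored under the same pair, or null if there was none.
    T put(String fieldName, Object value, T target) {
        return index.put(key(fieldName, value), target);
    }

    // Returns the object registered under (fieldName, value), or null.
    T get(String fieldName, Object value) {
        return index.get(key(fieldName, value));
    }
}
```

A pair key keeps the Integer 7 and the String "7" distinct, which naive concatenation of name and value would not.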
Update:
I made a quick non-generified proof of concept for an Indexer-type class (using the map-of-maps approach). It is in no way finished work, but it illustrates the concept above. One major deficiency is its reliance on bean introspection: fields without accessor methods, whether public or private, are invisible to this indexer.
import java.beans.BeanInfo;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.lang.reflect.Method;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class Indexer
{
    // Property name -> (property value -> objects having that value).
    private Map<String,Map<Object,Set<Object>>> index = new HashMap<String,Map<Object,Set<Object>>>();

    // Add an object to the index; every readable bean property is indexed.
    public void add(Object object) throws Exception
    {
        BeanInfo info = Introspector.getBeanInfo(object.getClass());
        PropertyDescriptor[] propertyDescriptors = info.getPropertyDescriptors();
        for (PropertyDescriptor descriptor : propertyDescriptors)
        {
            Method method = descriptor.getReadMethod();
            if (method == null)
            {
                continue; // write-only property: there is nothing to read
            }
            String fieldName = descriptor.getName();
            Map<Object,Set<Object>> map = index.get(fieldName);
            if (map == null)
            {
                map = new HashMap<Object,Set<Object>>();
                index.put(fieldName, map);
            }
            Object data = method.invoke(object);
            Set<Object> set = map.get(data);
            if (set == null)
            {
                set = new HashSet<Object>();
                map.put(data, set);
            }
            set.add(object);
        }
    }

    // Retrieve the set of all indexed objects whose property matches the supplied value,
    // or null if there is no match.
    public Set<Object> get(String fieldName, Object value)
    {
        Map<Object,Set<Object>> map = index.get(fieldName);
        if (map != null)
        {
            Set<Object> set = map.get(value);
            if (set != null)
            {
                return Collections.unmodifiableSet(set);
            }
        }
        return null;
    }
}

Related

Why does HBase use NavigableMap<Cell, Cell> to store Cell?

What's the point that makes key and value the same? Will the JVM optimize the memory and make them only one copy in heap?

Map<T, T> is often used to implement a Set<T> with the same properties as a backing map. E.g. if a map is thread-safe, the corresponding set will be thread-safe, too. If a map is navigable, the set will be also navigable, etc.
Keeping an element in both key and value parts provides a way to get an exact instance stored in the set. Here are some typical use cases for this pattern.
Obtaining a canonical object. Think of something like String.intern() but for arbitrary objects. Interning can be easily implemented with Map<T, T>:
T existing = map.putIfAbsent(obj, obj);
return existing != null ? existing : obj;
Storing mutable objects in a set. If you want to modify an existing object, a set backed by Map<T, T> will come to the rescue again:
T existing = map.get(key);
if (existing != null) {
    existing.mutate();
}
As far as I understand, a concurrent NavigableMap<Cell, Cell> is used in HBase to implement a concurrent navigable set of Cells with the above properties.
Note that key and value in such map are just two references to the same object. The object itself is not copied.
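The canonical-object use case can be wrapped into a small helper. This is a minimal sketch, assuming a ConcurrentHashMap as the backing map; the class name Interner is hypothetical and has nothing to do with the HBase code.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class Interner<T> {
    // Key and value are two references to the same canonical instance.
    private final ConcurrentMap<T, T> pool = new ConcurrentHashMap<>();

    // Returns the canonical instance equal to candidate,
    // registering candidate itself if no equal instance exists yet.
    T intern(T candidate) {
        T existing = pool.putIfAbsent(candidate, candidate);
        return existing != null ? existing : candidate;
    }
}
```

A plain Set<T> could not hand back the stored instance, because Set has no method that returns the element equal to a given one.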

Java Hash Map Performance

protected static final Map<String, Integer> m = new HashMap<>();
I have a question about the performance of the declaration above. I am creating a 2D tile engine for a simple RPG. I use this hash map to store the name of each tile along with its respective color code (e.g. 0xff00ff21). Since this is a game, m.get("name") is called an enormous number of times to check whether a tile has been updated (for example, the render method runs about 850 times per second on my computer). Please also note that I declare the HashMap outside of any loops, and that it is populated by a method call (also static) from the constructor, which uses m.put("name", value) to input all the information.
1) Is using a HashMap in this way a good idea? Is there perhaps a more efficient way to go about this?
2) Is using a static final HashMap good practice? The values will never change, and they are needed within the super class and its subclasses (hence the "protected"). Can I make the key and value variables final as well?
3) I understand that HashMap doesn't allow duplicate keys, but from tinkering with it I see that putting the same key twice simply replaces the older value with the newest .put("name", value). Is there a way to throw an error if you try to .put("water", 0xff00ff21) and then .put("water", 0xff221133) and/or .put("water", 0xff00ff21)?
Thank you for your time. New to this community and looking forward to helping/getting helped.

Please note that it is bad to ask three questions in one post.
1) IMO, yes. I usually use a HashMap for this kind of thing. It makes the code clearer and more readable. Just imagine if you used only hex color values for this; I think a lot of people would ask you what 0xff221133 and 0xff00ff21 mean.
2) Yes it is! static final is used when you want to declare some kind of constant. However, declaring a hash map as static final doesn't mean that its content cannot be changed. To prevent this, encapsulate the map in a class and only provide get methods:
import java.util.HashMap;

final class TileColorMap {
    private static final HashMap<String, Integer> tileColorMap = new HashMap<String, Integer>();
    static {
        // Add things to your map here
    }
    public static int get(String key) {
        // Unboxing throws NullPointerException if the key is missing
        return tileColorMap.get(key);
    }
}
3) If you look at the docs for HashMap.put, you will see that:
Returns: the previous value associated with key, or null if there was no mapping for key. (A null return can also indicate that the map previously associated null with key.)
So you can add a method that puts something into the map and throws an exception if the key is a duplicate, by checking whether the returned value is non-null. Note that by the time the exception is thrown, the old value has already been replaced.
private static void putStuffInMap (String key, int value) {
    Integer returnedValue = tileColorMap.put(key, value);
    if (returnedValue != null) {
        throw new RuntimeException("Duplicate Keys!");
    }
}

1) I'm not sure I understand what you're doing here, but how many different kinds of tiles could you be using here? You might be better off just defining a Tile object with a few constant Tiles that you can just reuse again and again by referring to Tile.WATER, etc instead of doing a hashtable lookup. If water has multiple colors just put them all in the water Tile object and pick from amongst them.
public class Tile
{
    public static final Tile WATER = new Tile(...);
    public static final Tile ROCK = new Tile(...);
}
2) Making a hashmap instance static and final doesn't make it immutable. The contents can still be updated. There's no performance benefit anyway. A read only hashmap wouldn't be any faster than a writable one. If you don't want it updated, just don't update it. It's your code, it's not like it's going to write to the hashmap when you aren't looking.
3) You could subclass HashMap and make it reject duplicate keys, but again, I'm not sure what the purpose of this is. Why aren't you sure what colors your tiles will be at run time? This strikes me as the kind of thing that is decided before compile time.

Using a HashMap should be efficient enough. Is there a more efficient way? There always will be, but whether it is appropriate depends on your design. For example, if tiles are statically defined, you may use enum or integer constants to represent a tile (instead of using "name"), and your tile-to-XXX mapping can then be expressed as an ArrayList or even an array. (Again, it may not be appropriate for your design.)
Again, it depends on the design. Is the class containing the map going to be instantiated multiple times while every instance should share the same mapping? Are you going to give child classes the flexibility to set up the mapping? Making it static is only meaningful if the first answer is YES and the second is NO.
To prevent changes to the content of the map, you can wrap it in an unmodifiable map:
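A minimal sketch of the enum idea, assuming the set of tiles is fixed at compile time. The constants and the TileColors helper are hypothetical, and the color values are reused from the question purely as placeholders.

```java
enum Tile {
    WATER(0xff00ff21),
    ROCK(0xff221133); // placeholder color values, not real tile data

    private final int color;

    Tile(int color) {
        this.color = color;
    }

    // Direct field read: no hashing and no string comparison.
    int color() {
        return color;
    }
}

final class TileColors {
    // Array indexed by ordinal(): a tile-to-value mapping kept outside the enum.
    private static final int[] COLOR_BY_ORDINAL = new int[Tile.values().length];

    static {
        for (Tile tile : Tile.values()) {
            COLOR_BY_ORDINAL[tile.ordinal()] = tile.color();
        }
    }

    static int colorOf(Tile tile) {
        return COLOR_BY_ORDINAL[tile.ordinal()];
    }
}
```

A misspelled tile name then becomes a compile error instead of a failed lookup at run time.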
// Access your data through this, so you won't mistakenly modify it
protected final Map<...> tileColorMap = Collections.unmodifiableMap(getTileColorMap());
// your super class or sub-class is providing the actual map
protected Map<...> getTileColorMap() {
    Map<...> tileColorMap = new HashMap<>();
    // do your setup
    return tileColorMap;
}
If you are using Java 8+, it may be better to use the Map#merge() method and have the remapping function throw the exception you want. Compared with the approach given in the other answers, merge() is safer because the original value is not replaced by mistake. You can also throw the exception selectively, only when the new value differs from the existing value.
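A minimal sketch of the merge() idea; the helper name putUnique and the choice of IllegalStateException are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class StrictPut {
    // Inserts key -> value, or throws if key is already mapped.
    // The remapping function only runs when a value is already present;
    // because it throws, the existing value stays in the map.
    // Note: Map.merge rejects a null value with NullPointerException.
    static <K, V> void putUnique(Map<K, V> map, K key, V value) {
        map.merge(key, value, (oldValue, newValue) -> {
            throw new IllegalStateException(
                "Duplicate key " + key + ": " + oldValue + " vs " + newValue);
        });
    }
}
```

To allow re-inserting an identical value, the remapping function could return oldValue when oldValue.equals(newValue) and throw only otherwise.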

How to structure data that can be both indexed and sorted on different keys?

I'd like to maintain a set of data that has two main attributes: 1. I can quickly look up the existence of an object by a numerical ID, and 2. I want to sort the data, but avoid needlessly sorting it since it can be slow. For a more concrete example, I have a set of user data, where each user has a unique ID (an int) and a unique username (a String). I'll be adding and removing users, and occasionally I want to generate a human-readable, alphabetically-sorted list to the user, but as my number of users increases, so does the time needed to sort the data.
How would you structure this? The only reasonable approach I can think of involves creating two separate data structures, and redundantly add/remove items to BOTH structures at the same time. As my data grows, it will be using more data than a single structure would. I might also introduce more bugs this way, as I have to remind myself to duplicate the operations to both structures when I come back to add to the code later. In other words, I could have:
TreeMap<String,Integer> nameSortedMap = new TreeMap<String,Integer>(String.CASE_INSENSITIVE_ORDER);
and
Map<Integer,String> idMap = new HashMap<Integer,String>();
Whenever I add or remove data, I do it on both maps. If I am retrieving a username by ID, I'd call idMap.get(id), or idMap.containsKey(id) to see if a user exists. On the other hand, if I need to display a sorted list, I would use nameSortedMap.keySet(), which I gather should already be in name order, avoiding the need for additional work each time a sorted list is needed.
How's my thought process? Is there a better or simpler way to accomplish this? Thank you!

There are two ways I can think of:
Use a database and index both columns. Databases are fast, and can be very small (see: SQLite), but they're probably overkill if you don't need to save the data, or if this is the only thing you would use it for.
Create a class containing both of your maps above, which handles all inserting and deleting. That way, you only have one place where you have to remember to do operations on both. This is one of the major selling points for object oriented programming.
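The second option might look roughly like this. It is a sketch using composition rather than a definitive design: the class name UserDirectory and its method names are hypothetical, and usernames are assumed to be unique, as stated in the question.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class UserDirectory {
    private final Map<Integer, String> nameById = new HashMap<>();
    private final TreeMap<String, Integer> idByName =
            new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

    // Adds or renames a user; both maps are updated in this one place.
    void put(int id, String name) {
        String oldName = nameById.put(id, name);
        if (oldName != null) {
            idByName.remove(oldName); // drop the stale name on rename
        }
        idByName.put(name, id);
    }

    void remove(int id) {
        String name = nameById.remove(id);
        if (name != null) {
            idByName.remove(name);
        }
    }

    boolean contains(int id) {
        return nameById.containsKey(id);
    }

    String nameOf(int id) {
        return nameById.get(id);
    }

    // Copy of the names in case-insensitive alphabetical order;
    // the TreeMap keeps them ordered, so nothing is sorted here.
    List<String> sortedNames() {
        return new ArrayList<>(idByName.keySet());
    }
}
```

Because neither map is exposed, callers cannot update one without the other.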

If you don't need the ordered list very often, I don't think it is necessary to keep two maps. If you prefer to do that anyway, I would suggest creating a class that extends HashMap and implements the methods to handle both maps. Example:
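import java.util.HashMap;
import java.util.Set;
import java.util.TreeMap;

public class UserMap extends HashMap<Integer, String> {
    // Mirror of this map keyed by name, kept in case-insensitive alphabetical order.
    private final TreeMap<String, Integer> nameSortedMap = new TreeMap<String, Integer>(String.CASE_INSENSITIVE_ORDER);

    @Override
    public String put(Integer key, String value) {
        String previous = super.put(key, value);
        if (previous != null) {
            nameSortedMap.remove(previous); // drop the old name when an ID is re-mapped
        }
        nameSortedMap.put(value, key);
        return previous;
    }

    @Override
    public String remove(Object key) {
        String removed = super.remove(key); // super avoids calling this method recursively
        if (removed != null) {
            nameSortedMap.remove(removed);
        }
        return removed;
    }

    public Set<String> getSortedNames() {
        return nameSortedMap.keySet();
    }
}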

Which Map provides moving objects to different index?

I need a Map that takes Key-Value pair (probably HashMap<String, Object>) whereas the Key will be a property of the Object itself, like:
class Person {
    String name; // I know a string is not a good unique key, but it is OK to illustrate my example
    Person(String name) { this.name = name; }
    String getName() { return name; }
}
Person person = new Person("John");
map.put(person.getName(), person);
Further, the map must provide an accessor similar to ArrayList.add(idx, object). It should thereby also be possible to reorder an object to a different position and adjust the rest accordingly.
Which Map/List is suitable for this?
(By the way: it should be runnable with GWT, so external libraries might be problematic.)

There's no single standard container that does all of this.
However, a combination of a map and an ArrayList would satisfy all of your requirements.
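One way to combine the two is sketched below, using only java.util classes. The class name KeyedList and its methods are hypothetical, and duplicate keys are assumed to be an error.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class KeyedList<K, V> {
    private final Map<K, V> byKey = new HashMap<>();
    private final List<K> order = new ArrayList<>();

    // Inserts at a position, like ArrayList.add(index, element).
    void add(int index, K key, V value) {
        if (byKey.containsKey(key)) {
            throw new IllegalArgumentException("Duplicate key: " + key);
        }
        order.add(index, key); // throws IndexOutOfBoundsException before the map changes
        byKey.put(key, value);
    }

    V get(K key) {
        return byKey.get(key);
    }

    V getAt(int index) {
        return byKey.get(order.get(index));
    }

    // Moves an existing key to newIndex; the entries in between shift by one.
    void move(K key, int newIndex) {
        if (newIndex < 0 || newIndex >= order.size()) {
            throw new IndexOutOfBoundsException("Index: " + newIndex);
        }
        int oldIndex = order.indexOf(key); // linear scan
        if (oldIndex < 0) {
            throw new IllegalArgumentException("Unknown key: " + key);
        }
        order.remove(oldIndex);
        order.add(newIndex, key);
    }

    // Copy of the keys in their current order.
    List<K> keys() {
        return new ArrayList<>(order);
    }
}
```

Lookup by key is a hash lookup, while inserting or moving costs time proportional to the list size, the same as for ArrayList itself.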

Accessing hidden getEntry(Object key) in HashMap

I have similar problem to one discussed here, but with stronger practical usage.
For example, I have a Map<String, Integer>, and I have some function, which is given a key and in case the mapped integer value is negative, puts NULL to the map:
Map<String, Integer> map = new HashMap<String, Integer>();
public void nullifyIfNegative(String key) {
    Integer value = map.get(key);
    if (value != null && value.intValue() < 0) {
        map.put(key, null);
    }
}
In this case, the lookup (and hence the hashCode calculation for the key) is done twice: once for the lookup and once for the replacement. It would be nice to have access to another method (which already exists in HashMap) that allows this to be done more efficiently:
public void nullifyIfNegative(String key) {
    Map.Entry<String, Integer> entry = map.getEntry(key);
    if (entry != null && entry.getValue().intValue() < 0) {
        entry.setValue(null);
    }
}
The same applies when the map values are immutable objects (or fixed-size arrays) that you want to update:
Map<String, String>: I want to append something to the string value.
Map<String, int[]>: I want to insert a number into the array.
So the case is quite common. Solutions, which might work, but not for me:
Reflection. Is good, but I cannot sacrifice performance just for this nice feature.
Use org.apache.commons.collections.map.AbstractHashedMap (it has at least protected getEntry() method), but unfortunately, commons-collections do not support generics.
Use generic commons-collections, but this library (AFAIK) is out-of-date (not in sync with latest library version from Apache), and (what is critical) is not available in central maven repository.
Use value wrappers, which means "making values mutable" (e.g. use mutable integers [e.g. org.apache.commons.lang.mutable.MutableInt], or collections instead of arrays). This solution costs extra memory, which I would like to avoid.
Try to extend java.util.HashMap with custom class implementation (which should be in java.util package) and put it to endorsed folder (as java.lang.ClassLoader will refuse to load it in Class<?> defineClass(String name, byte[] b, int off, int len), see sources), but I don't want to patch JDK and it seems like the list of packages that can be endorsed, does not include java.util.
A similar request has already been raised on the sun.com bug tracker, but I would like to know the opinion of the community and what the way out could be, keeping maximum memory and performance efficiency in mind.
If you agree that this is nice and beneficial functionality, please vote for this bug!

As a logical matter, you're right that a single getEntry would save you a hash lookup. As a practical matter, unless you have a specific use case where you have reason to be concerned about the performance hit (which seems pretty unlikely: hash lookup is common, O(1), and well optimized), what you're worrying about is probably negligible.
Why don't you write a test? Create a hashtable with a few 10's of millions of objects, or whatever's an order of magnitude greater than what your application is likely to create, and average the time of a get() over a million or so iterations (hint: it's going to be a very small number).
A bigger issue with what you're doing is synchronization. Be aware that conditional alterations on a map can run into race conditions even with a synchronized map, because you would have to hold a lock covering both the get() and the put() operations.
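On Java 8+, one way to make such a check-then-act update atomic is ConcurrentHashMap.computeIfPresent. This is a sketch rather than a drop-in replacement for the question's code: ConcurrentHashMap cannot store null values, so the hypothetical helper below removes the mapping instead of setting it to null.

```java
import java.util.concurrent.ConcurrentHashMap;

final class NegativePruner {
    // Atomically removes the mapping for key when its value is negative.
    // Returning null from the remapping function removes the entry;
    // returning v leaves it unchanged. Absent keys are ignored.
    static void removeIfNegative(ConcurrentHashMap<String, Integer> map, String key) {
        map.computeIfPresent(key, (k, v) -> v < 0 ? null : v);
    }
}
```

The read and the conditional write happen inside one call, so no other thread can modify that key in between.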

Not pretty, but you could use a lightweight object that holds a reference to the actual value, which avoids the second lookup.
HashMap<String, String[]> map = ...;
// append value to the current value of key
String key = "key";
String value = "value";
// I use an array to hold a reference - even uglier than the whole idea itself ;)
String[] ref = new String[1]; // lightweight object
String[] prev = map.put(key, ref);
ref[0] = (prev != null) ? prev[0] + value : value;
I wouldn't worry about hash lookup performance too much though (Steve B's answer is pretty good in pointing out why). Especially with String keys, I wouldn't worry too much about hashCode() as its result is cached. You could worry about equals() though as it might be called more than once per lookup. But for short strings (which are often used as keys) this is negligible too.
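If Java 8+ is available, the append case can also be written with Map.merge, which takes the key once. Whether that saves a second hash lookup depends on the Map implementation, so treat it as an assumption to measure rather than a guarantee. The helper name AppendDemo is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class AppendDemo {
    // Appends suffix to the current value of key,
    // or stores suffix itself if key has no value yet.
    static void append(Map<String, String> map, String key, String suffix) {
        map.merge(key, suffix, String::concat);
    }
}
```

The remapping function receives the old value and the supplied suffix, so String::concat yields oldValue + suffix.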

There is no performance gain from this proposal, because the average-case performance of Map lookups is already O(1). Moreover, exposing the raw Entry would raise another problem: it would become possible to change the key inside the entry (even if only via reflection) and thereby break the order of the internal array.
