I need to store keys case-insensitively, with all values for keys like STATE/state/State merged into one Set. The catch is that I need the case-sensitive version of the original key back at some point, so a generic CaseInsensitiveMap doesn't work. I only need back the first capitalization of 'state' added, so in this case I keep STATE and discard state/State.
I've looked at a few options for implementing this data structure, like Guava HashMultimap and Tuples, but none seem quite right.
<CaseInsensitiveOriginalKey, OriginalKey, Set<Values>>
So for example if I add a key 'State' with values {Texas, Oklahoma} it will be stored as:
<state, State, {Texas, Oklahoma}>
The idea being if I create some kind of .add(StATe, {Nebraska}) then the map, seeing a case-insensitive entry for 'state' already exists, becomes:
<state, State, {Texas, Oklahoma, Nebraska}>
and for a new key, .add(COLOR, {blue, red})
The overall map becomes:
<state, State, {Texas, Oklahoma, Nebraska}>
<color, COLOR, {blue, red}>
.get(ColoR) returns {red, blue}
.getKey(coLOR) returns COLOR
Any ideas on how to best accomplish this?
You can maintain two maps:
One is a Map<String, Set<String>> that maps the case-insensitive key to the corresponding set of strings (e.g. "state" → {"Texas", "Oklahoma"}).
The other is a Map<String, String> that maps the case-insensitive key to its corresponding case-sensitive key (e.g. "state" → "State").
You can create your own class that has these two maps as private fields and ensures that they are kept in sync whenever a pairing is added/removed/updated.
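A minimal sketch of the two-map idea, assuming String keys and values. The class name CaseInsensitiveSetMap and its method names are made up for illustration, and lower-casing with Locale.ROOT is one possible way to normalise keys:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical wrapper class; not part of the JDK or Guava.
class CaseInsensitiveSetMap {
    // lower-cased key -> merged values
    private final Map<String, Set<String>> values = new HashMap<>();
    // lower-cased key -> first spelling that was added
    private final Map<String, String> originalKeys = new HashMap<>();

    private static String normalize(String key) {
        return key.toLowerCase(Locale.ROOT);
    }

    void add(String key, Set<String> newValues) {
        String k = normalize(key);
        originalKeys.putIfAbsent(k, key); // later spellings are ignored
        values.computeIfAbsent(k, x -> new LinkedHashSet<>()).addAll(newValues);
    }

    Set<String> get(String key) {
        Set<String> found = values.get(normalize(key));
        return found == null ? Collections.emptySet() : Collections.unmodifiableSet(found);
    }

    String getKey(String key) {
        return originalKeys.get(normalize(key));
    }

    // Both maps are updated together so they cannot drift apart.
    void remove(String key) {
        String k = normalize(key);
        values.remove(k);
        originalKeys.remove(k);
    }
}
```

With this sketch, add("State", {Texas, Oklahoma}) followed by add("StATe", {Nebraska}) leaves getKey("state") returning "State" and get("STATE") returning all three values.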
What you need is something like Map<CaseInsensitiveOriginalKey, Record> where Record is a custom class with the original (case-sensitive) key and the set of values as attributes.
You could get away with using a generic Pair class instead of a custom Record class, but (IMO) that would be poor design.
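As a rough sketch of that single-map variant: KeyRecord and RecordBackedMap are hypothetical names, String keys and values are assumed, and the case-insensitive key is simulated by lower-casing with Locale.ROOT rather than by a dedicated key class.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical record type: the original spelling plus the merged values.
class KeyRecord {
    final String originalKey;
    final Set<String> values = new LinkedHashSet<>();

    KeyRecord(String originalKey) {
        this.originalKey = originalKey;
    }
}

class RecordBackedMap {
    // lower-cased key -> record
    private final Map<String, KeyRecord> records = new HashMap<>();

    void add(String key, Set<String> newValues) {
        // The record is created only for the first spelling seen.
        records.computeIfAbsent(key.toLowerCase(Locale.ROOT), k -> new KeyRecord(key))
               .values.addAll(newValues);
    }

    // Returns null when no spelling of the key has been added.
    KeyRecord lookup(String key) {
        return records.get(key.toLowerCase(Locale.ROOT));
    }
}
```

Because the original key and the values live in one object, there is nothing to keep in sync.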
However, there is a problem with your requirements:
However the catch is I need the case sensitive version of the original key back ...
Your examples indicate that you could have multiple case sensitive versions of the original key; i.e. the one that you saw first (e.g. "State") and subsequent ones (e.g. "STate", "state", etc). So which is the correct original key to use? And what about the case where the first one you saw was ... erm ... junky?
The point is that treating the first version that you saw as the definitive / preferred one is going to be problematic. You need something (or someone) to figure out the definitive version intelligently. To do that you probably need to keep all of the versions that you saw until (at least) you completed the initial data capture phase. You may even need to keep their frequencies and/or their contexts.
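One way to keep every spelling together with its frequency could look like the sketch below. KeyVariantCounter is a hypothetical name, and "most frequent spelling wins, ties go to the first one seen" is only one possible policy, chosen here for illustration:

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper that remembers every spelling of a key and how often it occurred.
class KeyVariantCounter {
    // lower-cased key -> (spelling -> count), spellings kept in first-seen order
    private final Map<String, Map<String, Integer>> variants = new HashMap<>();

    void record(String key) {
        variants.computeIfAbsent(key.toLowerCase(Locale.ROOT), k -> new LinkedHashMap<>())
                .merge(key, 1, Integer::sum);
    }

    // Assumed policy: the most frequent spelling; ties go to the spelling seen first.
    String preferred(String key) {
        Map<String, Integer> seen = variants.get(key.toLowerCase(Locale.ROOT));
        if (seen == null) {
            return null;
        }
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : seen.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }
}
```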
I'd suggest a data structure that has a couple of maps. One is a map from each (case-sensitive) key to the case-insensitive key and the other is a map from the case-insensitive key to the value. Given a case-sensitive key, each access would be a two-step affair: find the case-insensitive key to use from the first map and then use the key with the second map.
I have a situation where many many keys are pointing to a single value. The situation arises from a service locator pattern that I am implementing such that -
each method in an interface is represented as a signature string
All such signatures of a single interface are used as keys
The value being the full canonical name of the implementation class
Thus my need is to retrieve a single value when user requests any of the matching keys.
In a sense I need the opposite of a Multimap from Guava.
I am looking for the most optimized solution there is since my keys are very similar though unique for a specific value and I am not sure if using a generic Map implementation like HashMap is efficient enough to handle this case.
e.g. all the below signatures
==============
_org.appops.server.core.service.mocks.MockTestService_testOperationThree
_org.appops.server.core.service.mocks.MockTestService_getService
_org.appops.server.core.service.mocks.MockTestService_start
_org.appops.server.core.service.mocks.MockTestService_testOperationTwo_String_int
_org.appops.server.core.service.mocks.MockTestService_getName
_org.appops.server.core.service.mocks.MockTestService_shutdown
_org.appops.server.core.service.mocks.MockTestService_testOperationOne_String
=======
point to a single class, i.e. org.appops.server.core.service.mocks.MockTestServiceImpl, and I am anticipating hundreds of such classes (values) and thousands of such similar signatures (keys).
In case there is no optimized way I could always use a HashMap with replicated values for each group of keys which I would like to avoid.
Ideally I would like to use a ready utility from Guava.
HashMap is actually what you need, and the issue is that you misunderstand what it does.
In case there is no optimized way I could always use a HashMap with replicated values for each group of keys which I would like to avoid.
HashMap does not store a copy of the value for each key mapping to that value. HashMap stores a reference to the Java object. It's always the same cost. A HashMap<Integer, BigExpensiveObject> where every key is mapped to the same BigExpensiveObject takes exactly the same amount of memory as a HashMap<Integer, Integer> where every key is mapped to the same Integer. The only memory difference in the whole program would be the memory difference between one BigExpensiveObject and one Integer.
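A small sketch that makes the sharing visible. SharedValueDemo and its nested Implementation class are hypothetical stand-ins for the service locator; the signature strings are taken from the question:

```java
import java.util.HashMap;
import java.util.Map;

class SharedValueDemo {
    // Hypothetical stand-in for whatever object the locator should return.
    static final class Implementation {
        final String className;

        Implementation(String className) {
            this.className = className;
        }
    }

    static Map<String, Implementation> build() {
        // One object is created once...
        Implementation impl =
                new Implementation("org.appops.server.core.service.mocks.MockTestServiceImpl");
        Map<String, Implementation> locator = new HashMap<>();
        // ...and every key stores a reference to that same object, not a copy.
        locator.put("_org.appops.server.core.service.mocks.MockTestService_start", impl);
        locator.put("_org.appops.server.core.service.mocks.MockTestService_shutdown", impl);
        locator.put("_org.appops.server.core.service.mocks.MockTestService_getName", impl);
        return locator;
    }
}
```

Comparing any two looked-up values with == yields true here, since all three entries refer to one instance.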
Sets are essentially Maps from an existential point of view. I assume there is nothing a Set can do that a Map cannot. Maps carry the overhead of defining key-value pairs, which Sets do not have. But then again, the elements of a Set are just the keys of the underlying Map, right? So what is the point of having Sets around when Maps can do everything required? I assume a Set takes the same amount of memory as a Map does?
What are key arguments in favor of existence of Sets?
For instance, in the case of Lists, we have ArrayList and LinkedList which have differences and we can choose between these two as per our requirements.
I would argue that a Map is actually a Set!
Map<Key,Value> can be implemented with Set<Entry<Key,Value>>
This is similar to the mathematical foundations of what sets, maps, and functions are.
Firstly, can we agree that a Map is a function from Key=>Value (or Domain=>Range)? Each key corresponds with at most one value, so it is a partial function (or a total function over just those keys in the map). So a map is a function. (Scala even goes so far as to have Map implement the Function1 interface.)
Secondly, what is a function? A function is a set of tuples where each first element occurs only once in the set. The second element of the tuple is the value returned by the function.
So we have Map is a Function is a Set.
On a practical note, there are very good reasons for having Sets. They are very often the correct data structure to use from a conceptual point of view, even before you start worrying about performance. I'd use them over a List in most situations.
The primary difference between a Set and a Map is that a Map holds two objects per entry, a key and a value; it may contain duplicate values, but its keys are always unique. A Set holds only keys, and those are unique.
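Both directions of the Set/Map relationship are visible in the JDK itself: Collections.newSetFromMap builds a Set on top of any Map, and Map.entrySet() exposes a Map as a Set of entries. A short sketch, with SetFromMapDemo as a made-up class name:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

class SetFromMapDemo {
    // A Set whose elements are the keys of a backing Map; the Boolean values are placeholders.
    static Set<String> stateSet() {
        Set<String> states = Collections.newSetFromMap(new HashMap<String, Boolean>());
        states.add("Texas");
        states.add("Oklahoma");
        states.add("Texas"); // duplicate, ignored like in any Set
        return states;
    }

    // A Map viewed as a Set of key/value entries.
    static Set<Map.Entry<String, Integer>> entriesOf(Map<String, Integer> map) {
        return map.entrySet();
    }
}
```

The Set interface still earns its place: callers get add/contains semantics without having to invent a dummy value for every key.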
This question is kind of already posted here:
How to convert Map<String, String> to Map<Long, String> using guava
I think the answer of CollinD is appropriate:
All of Guava's methods for transforming and filtering produce lazy
results... the function/predicate is only applied when needed as the
object is used. They don't create copies. Because of that, though, a
transformation can easily break the requirements of a Set.
Let's say, for example, you have a Map<String, String> that contains
both "1" and "01" as keys. They are both distinct Strings, and so the
Map can legally contain both as keys. If you transform them using
Long.valueOf(String), though, they both map to the value 1. They are
no longer distinct keys. This isn't going to break anything if you
create a copy of the map and add the entries, because any duplicate
keys will overwrite the previous entry for that key. A lazily
transformed Map, though, would have no way of enforcing unique keys
and would therefore break the contract of a Map.
This is true, but actually I don't understand why it is not done because:
When the key transformation happens, if 2 keys are "merged", a runtime exception could be raised, or we could pass a flag telling Guava to take any one of the multiple possible values for the newly computed key (failfast/failsafe possibilities)
We could have a Maps.transformKeys which produces a Multimap
Is there a drawback I don't see in doing such things?
As @CollinD suggests, there's no way to do this in a lazy way. To implement get, you have to convert all the keys with your transformation function (to ensure any duplicates are discovered).
So applying Function<K,NewK> to Map<K,V> is out.
You could safely apply Function<NewK,K> to the map:
V value = innerMap.get( fn.apply(newK) );
I don't see a Guava shorthand for that--it may just not be useful enough. You could get similar results with:
Function<NewK,V> newFn = Functions.compose(Functions.forMap(map), fn);
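If an eager copy is acceptable, both behaviours asked about in the question (fail fast on merged keys, or collect all values per new key) can be sketched with plain JDK streams. This is not a Guava API; KeyTransforms and its two method names are hypothetical, and the result is a copy rather than a lazy view:

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper; plain JDK streams, eager copies only.
class KeyTransforms {
    // Throws IllegalStateException if two old keys map to the same new key.
    // Note: Collectors.toMap does not accept null values.
    static <K, N, V> Map<N, V> transformKeysStrict(Map<K, V> source, Function<K, N> fn) {
        return source.entrySet().stream()
                .collect(Collectors.toMap(e -> fn.apply(e.getKey()), Map.Entry::getValue));
    }

    // Keeps every value whose old key maps to the same new key (multimap-like result).
    static <K, N, V> Map<N, List<V>> transformKeysGrouping(Map<K, V> source, Function<K, N> fn) {
        return source.entrySet().stream()
                .collect(Collectors.groupingBy(e -> fn.apply(e.getKey()),
                        Collectors.mapping(Map.Entry::getValue, Collectors.toList())));
    }
}
```

With the keys "1" and "01" from the quoted example and Long::valueOf as the function, the strict variant throws and the grouping variant puts both values under the key 1.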
Why does java.util.Map.values() allow you to delete entries from the returned Collection when it makes no sense to remove a key value pair based on the value? The code which does this would have no idea what key the value(and hence a key) being removed is mapped from. Especially when there are duplicate values, calling remove on that Collection would result in an unexpected key being removed.
it makes no sense to remove a key value pair based on the value
I don't think you're being imaginative enough. I'll admit there probably isn't wide use for it, but there will be valid cases where it would be useful.
As a sample use case, say you had a Map<Person, TelephoneNumber> called contactList. Now you want to filter your contact list by those that are local.
To accomplish this, you could make a copy of the map, localContacts = new HashMap<>(contactList) and remove all mappings where the TelephoneNumber starts with an area code other than your local area code. This would be a valid time where you want to iterate through the values collection and remove some of the values:
Map<Person, TelephoneNumber> contactList = getContactList();
Map<Person, TelephoneNumber> localContacts = new HashMap<Person, TelephoneNumber>(contactList);
for ( Iterator<TelephoneNumber> valuesIt = localContacts.values().iterator(); valuesIt.hasNext(); ){
    TelephoneNumber number = valuesIt.next();
    if ( !number.getAreaCode().equals(myAreaCode) ) {
        valuesIt.remove();
    }
}
Especially when there are duplicate values, calling remove on that Collection would result in an unexpected key being removed.
What if you wanted to remove all mappings with that value?
It has to have a remove method because that's part of Collection. Given that, it has the choice of allowing you to remove values or throwing an UnsupportedOperationException. Since there are legitimate reasons that you might want to remove values, why not choose to allow this operation?
Maybe there's a given value where you want to remove every instance of it from the Map.
Maybe you want to trim out every third key/value pair for some reason.
Maybe you have a map from hotel room number to occupancy count and you want to remove everything from the map where the occupancy count is greater than one in order to find a room for someone to stay in.
...if you think about it more closely, there are plenty more examples like this...
In short: there are plenty of situations where this might be useful and implementing it doesn't harm anyone who doesn't use it, so why not?
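Two of the cases above, sketched with the values() view (Java 8 or later for removeIf). ValuesViewDemo and its method names are made up for illustration:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class ValuesViewDemo {
    // Hotel example: keep only rooms whose occupancy count is at most one.
    static Map<Integer, Integer> vacantOrSingle(Map<Integer, Integer> occupancyByRoom) {
        Map<Integer, Integer> copy = new HashMap<>(occupancyByRoom);
        // Removing from the values() view removes the whole mapping from the map.
        copy.values().removeIf(count -> count > 1);
        return copy;
    }

    // Removes every mapping to the given value;
    // values().remove(value) would remove just one of them.
    static <K, V> void removeAllMappingsTo(Map<K, V> map, V value) {
        map.values().removeAll(Collections.singleton(value));
    }
}
```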
I think there is quite often a use for removing a value based on a key; other answers show examples. Given that, if you want to remove a certain value, why would you only want one particular key of it removed? Even if you did, you'd have to know which key you wanted to remove (or not, as the case may be), and then you should just remove it by key anyway.
The Collection returned is a special Collection, and its semantics are such that it knows how the values in it relate back to the Map it came from. The javadoc indicates which Collection operations the returned collection supports.
I want a map indexed by two keys (a map in which you put AND retrieve values using two keys) in Java. Just to be clear, I'm looking for the following behavior:
map.put(key1, key2, value);
map.get(key1, key2); // returns value
map.get(key2, key1); // returns null
map.get(key1, key1); // returns null
What's the best way to do it? More specifically, should I use:
Map<K1,Map<K2,V>>
Map<Pair<K1,K2>, V>
Other?
(where K1,K2,V are the types of first key, second key and value respectively)
You should use Map<Pair<K1,K2>, V>
It will only contain one map, instead of N+1 maps
Key construction will be obvious (creation of the Pair)
Nobody will get confused as to the meaning of the Map as its programmer facing API won't have changed.
Dwell time in the data structure would be shorter, which is good if you find you need to synchronize it later.
If you're willing to bring in a new library (which I recommend), take a look at Table in Guava. This essentially does exactly what you're looking for, also possibly adding some functionality where you may want all of the entries that match one of your two keys.
interface Table<R,C,V>
A collection that associates an
ordered pair of keys, called a row key
and a column key, with a single value.
A table may be sparse, with only a
small fraction of row key / column key
pairs possessing a corresponding
value.
I'd recommend going for the second option
Map<Pair<K1,K2>,V>
The first one will generate more overhead when retrieving data, and even more when inserting/removing data from the Map. Every time you put a new value V, you'll need to check whether the Map for K1 exists; if not, create it and put it inside the main Map, and then put the value with K2.
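For comparison, the check-and-create step for the nested variant can be written with Map.computeIfAbsent (Java 8 or later). A minimal sketch, with NestedTwoKeyMap as a hypothetical name:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical wrapper around Map<K1, Map<K2, V>>.
class NestedTwoKeyMap<K1, K2, V> {
    private final Map<K1, Map<K2, V>> rows = new HashMap<>();

    V put(K1 key1, K2 key2, V value) {
        // Creates the inner map for key1 on first use, then stores under key2.
        return rows.computeIfAbsent(key1, k -> new HashMap<>()).put(key2, value);
    }

    V get(K1 key1, K2 key2) {
        Map<K2, V> row = rows.get(key1);
        return row == null ? null : row.get(key2);
    }

    V remove(K1 key1, K2 key2) {
        Map<K2, V> row = rows.get(key1);
        if (row == null) {
            return null;
        }
        V removed = row.remove(key2);
        if (row.isEmpty()) {
            rows.remove(key1); // drop inner maps that became empty
        }
        return removed;
    }
}
```

Every access still goes through two lookups and removal needs the extra clean-up step, which is the overhead described above.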
If you want to keep the interface you described initially, wrap your Map<Pair<K1,K2>,V> in your own "DoubleKeyMap".
(And don't forget to properly implement the hashCode and equals methods in the Pair class!!)
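A sketch of such a wrapper, assuming a hand-written pair class. KeyPair is a made-up name, used instead of Pair to avoid suggesting any particular library's class; DoubleKeyMap follows the name used above:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical immutable pair used only as a map key.
final class KeyPair<A, B> {
    private final A first;
    private final B second;

    KeyPair(A first, B second) {
        this.first = first;
        this.second = second;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof KeyPair)) {
            return false;
        }
        KeyPair<?, ?> other = (KeyPair<?, ?>) o;
        // Order matters: (a, b) is not equal to (b, a).
        return Objects.equals(first, other.first) && Objects.equals(second, other.second);
    }

    @Override
    public int hashCode() {
        return 31 * Objects.hashCode(first) + Objects.hashCode(second);
    }
}

class DoubleKeyMap<K1, K2, V> {
    private final Map<KeyPair<K1, K2>, V> map = new HashMap<>();

    V put(K1 key1, K2 key2, V value) {
        return map.put(new KeyPair<>(key1, key2), value);
    }

    V get(K1 key1, K2 key2) {
        return map.get(new KeyPair<>(key1, key2));
    }
}
```

This gives the map.put(key1, key2, value) and map.get(key1, key2) shape from the question while storing everything in a single HashMap.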
While I also am on board with what you proposed (a pair of values to use as the key), you could also consider making a wrapper which can hold/match both keys. This might get somewhat confusing since you would need to override the equals and hashCode methods and make that work, but it could be a straightforward way of indicating to the next person using your code that the key must be of a special type.
Searching a little bit, I found this post which may be of use to you. In particular, out of Apache Commons Collections, MultiKeyMap. I've never used this before, but it looks like a decent solution and may be worth exploring.
I would opt for the Map<Pair<K1,K2>, V> solution, because:
it directly expresses what you want to do
is potentially faster because it uses fewer indirections
simplifies the client code (the code that uses the Map afterwards)
Logically, your Pair (key1, key2) corresponds to something, since it is the key of your map. Therefore you may consider writing your own class having K1 and K2 as parameters and overriding the equals() and hashCode() methods (plus maybe other methods for more convenience).
This clearly appears to be a "clean" way to solve your problem.
I have used an array for the key, like this:
Map<Array[K1,K2], V>
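If the key is a plain Java array, note that arrays inherit equals and hashCode from Object, so a lookup only succeeds with the very same array instance. The sketch below (ArrayKeyDemo is a made-up name, String keys assumed) contrasts that with a List key, which compares by content:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ArrayKeyDemo {
    // Array keys are compared by identity, so an equal-looking new array misses.
    static String lookupByArray() {
        Map<String[], String> map = new HashMap<>();
        map.put(new String[] {"k1", "k2"}, "value");
        return map.get(new String[] {"k1", "k2"}); // different instance: returns null
    }

    // List keys are compared element by element, so this lookup hits.
    static String lookupByList() {
        Map<List<String>, String> map = new HashMap<>();
        map.put(Arrays.asList("k1", "k2"), "value");
        return map.get(Arrays.asList("k1", "k2"));
    }
}
```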