Comparing keys of a multi-dimensional HashMap - java

I am working on an application with a number of custom data classes. I am taking input for my application from 2 different places and want to cross-reference between the two to help ensure the data is accurate.
I have a Map.Entry<String,HashMap<String, Integer>> object called chromosome, where each value is called a marker.
I also have a custom object called IndividualList individuals which extends HashMap<Integer,Individual> where each Individual has a method Genotype getGenotype() which returns the non-static variable genotype. Genotype extends HashMap<String,String[]>
I want to look at the key of each of my marker objects and check whether it is present as a key in any Individual's genotype. Every Individual has the same keys in its genotype, so I only need to test one Individual.
The problem I am facing is which Individual to test. Because individuals is a HashMap, I cannot simply choose the first element arbitrarily. What I am doing at the moment is taking the values of individuals as a Collection, converting them to an ArrayList<Individual>, and taking the first element (an arbitrary one, as HashMap is unordered) to get an Individual. I then take this Individual's genotype and compare marker.getKey() with the keys in the genotype, like so:
for(Map.Entry<String, MarkerPosition> marker : chromosome.getValue().entrySet())
    if(!(new ArrayList<Individual>(individuals.values()).get(0)
            .getGenotype().containsKey(marker.getKey())))
        errors.add("Marker " + marker.getKey() + " is not present in genotype");
But as you can see, this is horrid and ugly and far too complicated, so I was wondering if there is a much simpler way of achieving what I want that I am missing.
Thanks!

Why can you not arbitrarily choose the first element of a HashMap?
individuals.entrySet().iterator().next()
individuals.values().iterator().next()
This will probably be the same entry each time. You should make sure the map is not empty to avoid an exception.
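A minimal sketch of that suggestion, using plain JDK maps as stand-ins for the asker's IndividualList, Individual and Genotype classes. The stand-in types, the class name MarkerCheck and the method name missingMarkers are assumptions made for illustration. It picks one arbitrary individual once, outside the loop, and guards against an empty map:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MarkerCheck {

    // genotypes: individual id -> (marker name -> alleles).
    // This is a plain-map stand-in for the question's custom classes.
    static List<String> missingMarkers(Collection<String> markerNames,
                                       Map<Integer, Map<String, String[]>> genotypes) {
        List<String> errors = new ArrayList<>();
        if (genotypes.isEmpty()) {
            // iterator().next() would throw NoSuchElementException on an empty map.
            errors.add("No individuals to check against");
            return errors;
        }
        // Any individual will do, since all genotypes share the same keys.
        Map<String, String[]> genotype = genotypes.values().iterator().next();
        for (String marker : markerNames) {
            if (!genotype.containsKey(marker)) {
                errors.add("Marker " + marker + " is not present in genotype");
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        Map<String, String[]> genotype = new HashMap<>();
        genotype.put("m1", new String[] {"A", "T"});
        Map<Integer, Map<String, String[]>> genotypes = new HashMap<>();
        genotypes.put(1, genotype);
        System.out.println(missingMarkers(Arrays.asList("m1", "m2"), genotypes));
    }
}
```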

...This question is really confusingly phrased and difficult to understand, but I'm not clear on why you don't just use
individuals.values().iterator().next()
instead of new ArrayList<Individual>(individuals.values()).get(0).
(If you can use third-party libraries, your code would probably be significantly clearer overall if you used a Guava Table, which is a general-purpose, significantly "cleaner" replacement for a Map<K1, Map<K2, V>>. Disclosure: I contribute to Guava.)

Related

What is the benefit of using a custom class over a map? [duplicate]

This question already has answers here:
Class Object vs Hashmap
(3 answers)
Closed 3 years ago.
I have a piece of code that returns min and max values from some input that it takes. I need to know what the benefits are of using a custom class that has a minimum and a maximum field over using a map that holds these two values.
//this is the class that holds the min and max values
public class MaxAndMinValues {
    private double minimum;
    private double maximum;
    //rest of the class code omitted
}
//this is the map that holds the min and max values
Map<String, Double> minAndMaxValuesMap
The most apparent answer would be object-oriented programming aspects, such as the possibility to combine data with functionality and the possibility to derive from that class.
But let's assume for the moment that this is not a major factor. Your example is so simple that I wouldn't use a Map either. What I would use is the Pair class from Apache Commons: https://commons.apache.org/proper/commons-lang/javadocs/api-3.1/org/apache/commons/lang3/tuple/Pair.html
(ImmutablePair):
https://commons.apache.org/proper/commons-lang/javadocs/api-3.1/org/apache/commons/lang3/tuple/ImmutablePair.html
The Pair class is generic and has two type parameters, one for each field. You can define a Pair of anything and get type safety, IDE support, autocompletion, and the big benefit of knowing what is inside. A Pair also offers things that a Map cannot. For example, a Pair is potentially Comparable. See also ImmutablePair if you want to use it as a key in another Map.
public Pair<Double, Double> foo(...) {
    // ...
    Pair<Double, Double> range = Pair.of(minimum, maximum);
    return range;
}
The big advantage of this class is that the type you return exposes the contained types. So if you need to, you can return values of two different types from a single method call (without using a map or a complicated inner class).
e.g. Pair<String, Double> or Pair<String, List<Double>>...
In a simple situation, where you just need to store the min and max values from user input, your custom class is a better choice than a Map. The reason is that in Java a Map object can be a HashMap, a LinkedHashMap or a TreeMap, and each of them takes some time to put your data into its structure and again to get a value back out. So in a simple case like the one you describe, just use your custom class. Moreover, you can write methods in your class to process the user input, which a Map cannot do for you.
I would look at this from the perspective of how a programming language is used. In any language there are multiple ways to achieve a result (easy, bad, complicated, performant ...). In an object-oriented language like Java, this question is more about the design side of your solution.
Think of accessibility.
The values in a Map are effectively public: you can modify the contents as you like from any part of the code. If you had a condition that the min and max should be in the range [-100, 100] and some part of your code inserts 200 into the map, you have a bug. We can cover that with validation, but how many validation checks would you write? With an object, encapsulation is always possible.
Think of re-use.
If you had the same requirement elsewhere in the code, you would have to rewrite the map logic again (probably with all the validations). That doesn't look good, right?
Think of extensibility.
If you wanted one more piece of data, such as the median or average, you would either have to dirty the map with bad keys or create a new map. An object is always easy to extend.
So it all comes down to design. If you think it's a one-time usage, a map will probably do (though it is not a standard design anyway; a map should contain one kind of data, technically and functionally).
Last but not least, think of code readability and cognitive complexity. Objects with relevant responsibilities will always be better than unclear generic storage.
Hope I made some sense!
The benefit is simple : make your code clearer and more robust.
The MaxAndMinValues name and its class definition (two fields) convey a min and a max value. Above all, the class makes sure that it accepts only these two things, and its API is self-explanatory about how to store and get values.
Map<String, Double> minAndMaxValuesMap also conveys the idea that a min and a max value are stored in it, but it has multiple drawbacks in terms of design:
We don't know how to retrieve values without looking at how they were added.
Related to that, how should the keys be named when we add entries to the map? The String type is too broad for a key. For example, "MIN", "min" and "Minimum" will all be accepted. An enum would solve this issue, but not the others.
We cannot ensure that the two values (min and max) were both added (whereas a constructor with arguments can do that).
We can add any other value to the map, since it is a Map and not a fixed data structure.
Beyond the idea of clearer code in general, I would add that if MaxAndMinValues were used only as an implementation detail inside a specific method or in a lambda, using a Map or even an array {15F, 20F} would be acceptable. But if this data is manipulated through methods, you have to make its meaning as clear as possible.
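The points above can be sketched as a small value class. This is only an illustration: the question omits the body of MaxAndMinValues, so the final fields, the validating constructor and the accessor names here are assumptions, not the asker's code.

```java
final class MaxAndMinValues {
    private final double minimum;
    private final double maximum;

    // The constructor guarantees that both values are present and consistent,
    // which a Map<String, Double> cannot enforce.
    MaxAndMinValues(double minimum, double maximum) {
        if (minimum > maximum) {
            throw new IllegalArgumentException(
                    "minimum " + minimum + " is greater than maximum " + maximum);
        }
        this.minimum = minimum;
        this.maximum = maximum;
    }

    double getMinimum() {
        return minimum;
    }

    double getMaximum() {
        return maximum;
    }

    public static void main(String[] args) {
        MaxAndMinValues range = new MaxAndMinValues(1.5, 4.0);
        System.out.println(range.getMinimum() + " .. " + range.getMaximum());
    }
}
```

There is no string key to misspell, no way to add a third unrelated value, and the validation lives in one place.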
We used a custom class instead of a HashMap so that we could sort the entries by their values.

Should we use HashSet?

A HashSet is backed by a HashMap. From its JavaDoc:
This class implements the Set interface, backed by a hash table
(actually a HashMap instance)
When taking a look at the source we can also see how they relate to each other:
// Dummy value to associate with an Object in the backing Map
private static final Object PRESENT = new Object();
public boolean add(E e) {
return map.put(e, PRESENT)==null;
}
Therefore a HashSet<E> is backed by a HashMap<E,Object>. For all HashSets in our application we have one reference object PRESENT that we use in the HashMap for the value. While the memory needed to store PRESENT is negligible, we still store a reference to it for each value in the map.
Would it not be more efficient to use null instead of PRESENT? A further consideration is whether we should forgo the HashSet altogether and use a HashMap directly, where circumstances permit the use of a Map instead of a Set.
The basic problem that triggered these thoughts is the following situation: I have a collection of objects with the following properties:
big collection of objects > 30'000
Insertion order is not relevant
Efficient check if an item is contained
Adding new items to the collection is not relevant
The chosen solution should perform optimally under the above criteria and minimize memory consumption. On this basis the data structures HashSet and HashMap spring to mind. When thinking about alternative approaches, the key question is:
How to check containment efficiently?
The only answer that comes to my mind is using the item's hash to calculate the storage location. I might be missing something here. Are there any other approaches?
I had a look at various questions that shed some light on the issue but did not quite answer my question:
Java : HashSet vs. HashMap
clarifying facts behind Java's implementation of HashSet/HashMap
Java HashSet vs HashMap
I am not looking for suggestions of alternative libraries or frameworks to address this; I want to understand whether there is another way to think about efficient containment checking of an element in a Collection.
In short: yes, you should use HashSet. It might not be the most efficient Set implementation possible, but that hardly ever matters unless you are working with huge amounts of data.
In that case, I would suggest using specialized libraries. EnumMaps if you can use enums, primitive maps like Trove if your data is mostly primitives, a bunch of other data-structures that are optimized for certain data-types, or even an in-memory-database.
Don't get me wrong, I'm someone who likes performance tuning too, but replacing the built-in data structures should only be done when it's really necessary. For most cases, they work perfectly fine.
What you could do, in case you really want to save the last bit of memory and do not care about inserting, is using a fixed-sized array, sorting that and doing a binary search every time. But I doubt that it's more efficient than a HashSet.
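A minimal sketch of that sorted-array idea, assuming the elements are Strings (any Comparable type would work the same way) and that the collection never changes after construction. The class name SortedArrayLookup is made up for illustration. Lookups are O(log n) rather than a HashSet's expected O(1), in exchange for storing only one reference per element.

```java
import java.util.Arrays;

class SortedArrayLookup {
    private final String[] sorted;

    SortedArrayLookup(String[] items) {
        // Copy so that later changes to the caller's array cannot break the ordering.
        this.sorted = items.clone();
        Arrays.sort(this.sorted);
    }

    boolean contains(String item) {
        // binarySearch returns a non-negative index only when the item is found.
        return Arrays.binarySearch(sorted, item) >= 0;
    }

    public static void main(String[] args) {
        SortedArrayLookup lookup = new SortedArrayLookup(new String[] {"pear", "apple", "fig"});
        System.out.println(lookup.contains("fig"));
        System.out.println(lookup.contains("kiwi"));
    }
}
```

Whether this actually beats a HashSet for a given workload would need measuring.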
Hashtables and HashSets are meant to be used in entirely different ways, so maybe the two shouldn't be compared in terms of "which is more efficient". A HashSet is more suitable for the mathematical notion of a set (e.g. {1,2,3,4}): it contains no duplicates and allows only one null value. A HashMap is more of a key -> value pair system: it allows multiple null values as well as duplicate values, just not duplicate keys. I know this is probably answering "difference between a hashtable and hashset", but my point is that they really can't be compared.

can a Map also be a Collection?

I'd like to have a Map that is also a Collection. Or more specifically, I'd like to be able to iterate over the entries in a Map, including the case where there are multiple entries for a particular key.
The specific problem I'm trying to solve is providing an object that can be used in jstl both to iterate over using c:forEach and in an expression like ${a.b.c}. In this example, I'd want ${a.b.c} to evaluate to the first value of c (or null if there are none), but also be able to iterate over all cs with <c:forEach items="${a.b.c}"> and have the loop body see each individual value of c in turn, although they have the same key in the Map.
Looking at things from a method point of view, this should be straightforward, just provide a Map implementation whose entrySet() method returns a set with multiple Entries with the same key. But since this seems to violate the contract of a Map, will things break in subtle yet disastrous ways? Has anyone else done this sort of thing?
(If you guessed I'm trying to present xml, you'd be correct)
EDIT
Please note that this is for use in jstl, so whatever interface I present must meet 2 conditions:
for use with the [] and . operators, it must be a Map, List, array or JavaBeans object (and of those it can't be a List or array because the indexes will not be numbers)
for use with forEach it must be an array, Collection, Iterator, Enumeration, Map, or String.
So I guess the real question is, can I count on jstl only calling .containsKey(), .get(), and .entrySet() and not caring about invariants being violated, and not internally making a copy of the Map which would not preserve the special iteration behavior.
What you are looking for is a Multimap. Guava provides an implementation of it and specifically you are looking for ArrayListMultimap.
I barely remember jstl, but what you're saying sounds somewhat contradictory:
In forEach:
here ${a.b.c} should point to some container of values, and then we iterate over it.
On the other hand you say ${a.b.c} "should evaluate to the first value of c" (or null...)
It's an ambiguous definition.
If you feel like Multimap is not what you want, you can provide your own collection implementation (probably internally based on Multimap).
Just as an idea, you can always treat a single element as a list that happens to contain one element. This way you would resolve your ambiguity, I guess.
I hope this helps
Having a Map with multiple entries for the same key irreparably breaks the Map contract. If Multimap doesn't work for you, then there's no way to do this without breaking a lot of things.
Specifically, if you pass your monstrosity to something that's specified to take a Map, it'll almost certainly break...and it sounds like that's what you want to do with it, so yeah.
How about using a Map with Collections as values? Then you can have different values for the same key, and you can iterate over them with a nested forEach loop.
You can also easily write a wrapper for an existing Map implementation that gives you a single iterator over all values, if you need it that way.
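A rough sketch of that idea, with String keys and values for simplicity. The class ListValuedMap and its method names are hypothetical, and nothing here says how jstl would treat such an object. It only shows the underlying Map-of-Lists structure, a "first value or null" lookup, and a flattened view over all values.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ListValuedMap {
    // LinkedHashMap keeps keys in insertion order, which is handy for XML-like data.
    private final Map<String, List<String>> entries = new LinkedHashMap<>();

    void put(String key, String value) {
        entries.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    // First value stored under the key, or null if there is none.
    String first(String key) {
        List<String> values = entries.get(key);
        return (values == null || values.isEmpty()) ? null : values.get(0);
    }

    // All values stored under the key; an empty list if the key is absent.
    List<String> all(String key) {
        return entries.getOrDefault(key, Collections.emptyList());
    }

    // Every value across all keys, as one flat list.
    List<String> allValues() {
        List<String> flat = new ArrayList<>();
        for (List<String> values : entries.values()) {
            flat.addAll(values);
        }
        return flat;
    }

    public static void main(String[] args) {
        ListValuedMap map = new ListValuedMap();
        map.put("c", "one");
        map.put("c", "two");
        map.put("d", "three");
        System.out.println(map.first("c") + " " + map.all("c") + " " + map.allValues());
    }
}
```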

get the number of instances of an element in a guava multiset without iterating

I have a multiset in guava and I would like to retrieve the number of instances of a given element without iterating over this multiset (I don't want to iterate because I assume that iterating takes quite some time, as it looks through the whole collection).
To do that, I was first thinking of using the entrySet() method of the multiset to obtain a set of single instances with their corresponding counts, and then transforming this set into a HashMap (where the keys are the elements of my set and the values are their instance counts). Then I could use the methods of HashMap to retrieve a value directly from its key. Done! But this makes sense only if I can transform the set into the HashMap quickly (without iterating through all elements): is that possible?
(as I said I expect this question to be flawed on multiple counts, I'd be happy if you could shed light on the conceptual mistakes I probably make here. Thx!)
Simply invoke count(element) on your multiset -- voila!
As you may know, in Guava Multiset is an interface, not a class.
If you just want to know the number of occurrences of an element, call Multiset.count(Object element).
Please forget my following statement:
Then, if you are using the popular implementation HashMultiset, there is already a HashMap<E, AtomicInteger> working behind the scenes.
That is, when the HashMultiset iterates, a HashMap iterates as well. There is no need to transform it into another HashMap.
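To illustrate why a count lookup needs no iteration, here is a plain-JDK analogue of a hash-backed multiset. It is not Guava's implementation, and the class name CountingBag is made up. It keeps one counter per distinct element, so count is a single hash lookup.

```java
import java.util.HashMap;
import java.util.Map;

class CountingBag<E> {
    // One entry per distinct element, mapped to its number of occurrences.
    private final Map<E, Integer> counts = new HashMap<>();

    void add(E element) {
        counts.merge(element, 1, Integer::sum);
    }

    // Returns 0 for elements that were never added.
    int count(Object element) {
        Integer n = counts.get(element);
        return n == null ? 0 : n;
    }

    public static void main(String[] args) {
        CountingBag<String> bag = new CountingBag<>();
        bag.add("a");
        bag.add("a");
        bag.add("b");
        System.out.println(bag.count("a") + " " + bag.count("z"));
    }
}
```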

Map with two-dimensional key in java

I want a map indexed by two keys (a map in which you put AND retrieve values using two keys) in Java. Just to be clear, I'm looking for the following behavior:
map.put(key1, key2, value);
map.get(key1, key2); // returns value
map.get(key2, key1); // returns null
map.get(key1, key1); // returns null
What's the best way to do it? More specifically, should I use:
Map<K1,Map<K2,V>>
Map<Pair<K1,K2>, V>
Other?
(where K1,K2,V are the types of first key, second key and value respectively)
You should use Map<Pair<K1,K2>, V>
It will only contain one map, instead of N+1 maps
Key construction will be obvious (creation of the Pair)
Nobody will get confused as to the meaning of the Map as its programmer facing API won't have changed.
Dwell time in the data structure would be shorter, which is good if you find you need to synchronize it later.
If you're willing to bring in a new library (which I recommend), take a look at Table in Guava. This essentially does exactly what you're looking for, also possibly adding some functionality where you may want all of the entries that match one of your two keys.
interface Table<R,C,V>
A collection that associates an ordered pair of keys, called a row key and a column key, with a single value. A table may be sparse, with only a small fraction of row key / column key pairs possessing a corresponding value.
I'd recommend going for the second option
Map<Pair<K1,K2>,V>
The first one will generate more overhead when retrieving data, and even more when inserting or removing data from the Map. Every time you put a new value V, you'll need to check whether the Map for K1 exists, create it and put it inside the main Map if it doesn't, and then put the value with K2.
If you want to have an interface like the one you're showing initially, wrap your Map<Pair<K1,K2>,V> in your own "DoubleKeyMap".
(And don't forget to properly implement the methods hashCode and equals in the Pair class!!)
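A minimal sketch of such a wrapper. The name DoubleKeyMap comes from the answer above, but the nested Pair class and the method signatures are assumptions made for illustration. The key class implements equals and hashCode so that two pairs holding equal keys in the same order are treated as the same map key.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class DoubleKeyMap<K1, K2, V> {

    // Immutable ordered pair used as the composite key.
    private static final class Pair<A, B> {
        private final A first;
        private final B second;

        Pair(A first, B second) {
            this.first = first;
            this.second = second;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Pair)) return false;
            Pair<?, ?> other = (Pair<?, ?>) o;
            return Objects.equals(first, other.first)
                    && Objects.equals(second, other.second);
        }

        @Override
        public int hashCode() {
            return Objects.hash(first, second);
        }
    }

    private final Map<Pair<K1, K2>, V> map = new HashMap<>();

    V put(K1 key1, K2 key2, V value) {
        return map.put(new Pair<>(key1, key2), value);
    }

    // Takes Object parameters, mirroring Map.get, so any lookup order compiles.
    V get(Object key1, Object key2) {
        return map.get(new Pair<>(key1, key2));
    }

    public static void main(String[] args) {
        DoubleKeyMap<String, String, Integer> m = new DoubleKeyMap<>();
        m.put("a", "b", 1);
        System.out.println(m.get("a", "b") + " " + m.get("b", "a"));
    }
}
```

Because the pair is ordered, looking up (key2, key1) does not find a value stored under (key1, key2), which matches the behavior asked for in the question.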
While I also am on board with what you proposed (a pair of values to use as the key), you could also consider making a wrapper which can hold/match both keys. This might get somewhat confusing since you would need to override the equals and hashCode methods and make that work, but it could be a straightforward way of indicating to the next person using your code that the key must be of a special type.
Searching a little bit, I found this post which may be of use to you. In particular, it mentions MultiKeyMap from Apache Commons Collections. I've never used it before, but it looks like a decent solution and may be worth exploring.
I would opt for the Map<Pair<K1,K2>, V> solution, because:
it directly expresses what you want to do
it is potentially faster because it uses fewer indirections
it simplifies the client code (the code that uses the Map afterwards)
Logically, your Pair (key1, key2) corresponds to something, since it is the key of your map. Therefore you may consider writing your own class with K1 and K2 as fields and overriding the hashCode() and equals() methods (plus maybe other methods for more convenience).
This clearly appears to be a "clean" way to solve your problem.
I have used an array for the key, like this:
Map<Array[K1,K2], V>
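One caveat worth checking before relying on this with a HashMap: Java arrays inherit equals and hashCode from Object, so two distinct arrays with the same contents are different keys. The sketch below (the class and method names are made up for illustration) contrasts an array key with a List key, which compares element by element.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ArrayKeyDemo {

    // Looks up with a new array that has the same contents as the stored key.
    static boolean foundWithArrayKey() {
        Map<String[], Integer> byArray = new HashMap<>();
        byArray.put(new String[] {"k1", "k2"}, 42);
        // Arrays use identity-based equals/hashCode, so this is a different key.
        return byArray.containsKey(new String[] {"k1", "k2"});
    }

    // Same lookup, but with Lists, which define equals/hashCode over their elements.
    static boolean foundWithListKey() {
        Map<List<String>, Integer> byList = new HashMap<>();
        byList.put(Arrays.asList("k1", "k2"), 42);
        return byList.containsKey(Arrays.asList("k1", "k2"));
    }

    public static void main(String[] args) {
        System.out.println(foundWithArrayKey()); // false
        System.out.println(foundWithListKey());  // true
    }
}
```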
