Java: Compact a HashMap (analogue of ArrayList#trimToSize)

Is there a way to compact a HashMap in the sense that you can with an ArrayList through its trimToSize() method?
One way I can think of is to iterate through all of the entries in the present map and populate a new one, then replace the original with the new one.
Is there a better way to accomplish this?

Well you don't need to go through iterating manually - you can just use:
map = new HashMap<String, String>(map); // Adjust type arguments as necessary
I believe that will do all the iteration for you. It's possible that clone() will do the same thing, but I don't know for sure.
Either way, I don't believe you're missing anything - I don't think there's any way of performing a "trim" operation in the current API. Unlike ArrayList, such an operation would be reasonably complex anyway (as expansion is) - it's not just a case of creating a new array and performing a single array copy. The entries need to be redistributed. The benefit of getting HashMap to do this itself internally would probably just be that the hash codes wouldn't need recomputing.
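For example, here is a minimal sketch of that rebuild-by-copying idea (the cache map and the key names are made up for illustration):

import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class TrimExample {
    public static void main(String[] args) {
        // Build a map that grows large...
        Map<String, String> cache = new HashMap<String, String>();
        for (int i = 0; i < 100000; i++) {
            cache.put("key" + i, "value" + i);
        }
        // ...then remove most entries. The internal table keeps its expanded capacity.
        for (Iterator<String> it = cache.keySet().iterator(); it.hasNext();) {
            if (!it.next().equals("key0")) {
                it.remove();
            }
        }
        // "Trim" by copying the survivors into a fresh map sized for its current contents.
        cache = new HashMap<String, String>(cache);
    }
}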

If you use the Trove library instead, it has support for hash map and hash set trimming (see the THashMap class, compact method) and, best of all, it automatically trims on removal of objects when the map becomes too sparse. This should be quicker than building a new map using the standard Java HashMap implementation, as (presumably) it doesn't have to reorder the objects according to their hash code but can just use the order it already knows.
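A rough sketch, assuming Trove 3 where THashMap lives in gnu.trove.map.hash and exposes the compact() method mentioned above (check the API of your Trove version):

import gnu.trove.map.hash.THashMap;

public class TroveTrimExample {
    public static void main(String[] args) {
        THashMap<String, String> map = new THashMap<String, String>();
        // ... populate the map, then remove a large number of entries ...
        map.compact(); // rehashes into a smaller internal table
        // Trove also auto-compacts on removal once the map becomes too sparse.
    }
}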

Related

Should we use HashSet?

A HashSet is backed by a HashMap. From its JavaDoc:
This class implements the Set interface, backed by a hash table
(actually a HashMap instance)
When taking a look at the source we can also see how they relate to each other:
// Dummy value to associate with an Object in the backing Map
private static final Object PRESENT = new Object();

public boolean add(E e) {
    return map.put(e, PRESENT) == null;
}
Therefore a HashSet<E> is backed by a HashMap<E,Object>. For all HashSets in our application we have one reference object PRESENT that we use as the value in the HashMap. While the memory needed to store PRESENT is negligible, we still store a reference to it for each value in the map.
Would it not be more efficient to use null instead of PRESENT? A further consideration, then, is whether we should forgo the HashSet altogether and use a HashMap directly, given that the circumstances permit a Map instead of a Set.
My basic problem that triggered these thoughts is the following situation: I have a collection of objects with the following properties:
big collection of objects > 30'000
Insertion order is not relevant
Efficient check if an item is contained
Adding new items to the collection is not relevant
The chosen solution should perform optimally with respect to the above criteria and minimize memory consumption. On this basis the data structures HashSet and HashMap spring to mind. When thinking about alternative approaches, the key question is:
How to check containment efficiently?
The only answer that comes to my mind is using the item's hash to calculate the storage location. I might be missing something here. Are there any other approaches?
I had a look at various questions that shed some light on the issue but did not quite answer mine:
Java : HashSet vs. HashMap
clarifying facts behind Java's implementation of HashSet/HashMap
Java HashSet vs HashMap
I am not looking for suggestions of alternative libraries or frameworks to address this; I want to understand whether there is another way to think about efficient containment checking of an element in a Collection.
In short, yes, you should use HashSet. It might not be the most efficient Set implementation possible, but that hardly ever matters unless you are working with huge amounts of data.
In that case, I would suggest specialized libraries: EnumMap if you can use enums, primitive collections like Trove if your data is mostly primitives, various other data structures that are optimized for certain data types, or even an in-memory database.
Don't get me wrong, I like performance-tuning too, but replacing the built-in data structures should only be done when it's really necessary. For most cases, they work perfectly fine.
What you could do, in case you really want to save the last bit of memory and do not care about inserting, is use a fixed-size array, sort it, and do a binary search every time. But I doubt that it's more efficient than a HashSet.
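A minimal sketch of that sorted-array approach, assuming the elements are Strings (or anything else Comparable):

import java.util.Arrays;

public class SortedArrayContains {
    public static void main(String[] args) {
        String[] items = { "carol", "alice", "bob" }; // fixed contents, built once
        Arrays.sort(items);                           // one-time sort
        // Binary search gives O(log n) containment checks with no per-entry object overhead.
        boolean contains = Arrays.binarySearch(items, "bob") >= 0;
        System.out.println(contains); // true
    }
}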
Hashtables and HashSets are used for entirely different purposes, so maybe the two shouldn't be compared as "which is more efficient". A HashSet is more suitable for the mathematical notion of a set (e.g. {1,2,3,4}): it contains no duplicates and allows at most one null element. A HashMap is more of a key -> value mapping: it allows multiple null values as well as duplicate values, just not duplicate keys. I know this is probably answering "difference between a Hashtable and a HashSet", but I think my point is they really can't be compared.

Java data structure for optimized get and iterate from then on

I need a data structure to do a get / find in Log N time and iterate starting from the object that was returned by the get operation. The iterator should iterate in the same order in which elements are inserted into the data structure.
Can I achieve this using TreeSet ? Or any other data structure?
Thanks!
This answer assumes that you want to get / find items by value, as opposed to access by insertion sequence number. I assume that this value is completely unrelated to the order in which items are inserted.
The closest you can get with standard Java foundation classes is a LinkedHashSet. This allows fast searching and iteration in insertion order. But it does not give you an iterator starting at a given position, so you'll have to implement that yourself. Either based on the LinkedHashSet, or using your own set implementation. I guess the easiest way would be using a HashSet and implementing the linking yourself. That way, you could use the set methods to look up the starting element, and use that to construct an iterator following the links. You could hide the links inside a wrapper object, so you won't have to expose them in your API.
If you start with a SortedMap<Integer, Object> and use keys for keeping track of the insertion order, you'll be able to use the fast tailMap operation for your needs.
If you need to find the position of an object by a key (or maybe by the object itself), then introduce another WeakHashMap<Object, Integer> that will map from your key to the position of the object. You'll then use the retrieved sequence number as the key into the former map.
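A rough sketch of that two-map idea (all names are illustrative, and a plain HashMap stands in for the WeakHashMap mentioned above):

import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class InsertionOrderIndex {
    private final TreeMap<Integer, String> byInsertion = new TreeMap<Integer, String>();
    private final Map<String, Integer> positionOf = new HashMap<String, Integer>();
    private int nextSeq = 0;

    public void add(String value) {
        byInsertion.put(nextSeq, value);
        positionOf.put(value, nextSeq);
        nextSeq++;
    }

    // Iterate in insertion order, starting from the given element (assumed to be present).
    public Iterable<String> from(String value) {
        Integer seq = positionOf.get(value);       // O(1) lookup of the start position
        return byInsertion.tailMap(seq).values();  // O(log n) to obtain the tail view
    }

    public static void main(String[] args) {
        InsertionOrderIndex index = new InsertionOrderIndex();
        index.add("a"); index.add("b"); index.add("c");
        for (String s : index.from("b")) {
            System.out.println(s); // prints b, then c
        }
    }
}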
Provided you are using Java 6, you can have a look at ConcurrentSkipListSet and ConcurrentSkipListMap.

Java random insert collection

I am considering using a Java collection that would work best with random insertions. I will be inserting a lot and only read the collection once at the end.
My desired functionality is adding an element at a specified index, anywhere between <0, current_length>. Which collection would be the most efficient to use?
Useful link for your reference:
http://www.coderfriendly.com/wp-content/uploads/2009/05/java_collections_v2.pdf
Not entirely sure how you will be reading the information after input (or how important that is to you). HashMap or ArrayList would make sense depending on what you are looking to do. Also not sure whether you are looking for something thread-safe or not.
Hope it helps.
The inefficiency of using a List is inherent in the problem. Every time you add something, every subsequent element has to be re-indexed - as the javadoc states:
Shifts the element currently at that position (if any) and any
subsequent elements to the right (adds one to their indices).
From your question/comments, it would appear that you have a bunch of Objects, and you're sorting them as you go. I'd suggest a more efficient solution to this problem would be to write a Comparator (or make your object implement Comparable), and then use Collections.sort(list, comparator) (or Collections.sort(list)).
If your objects are being sorted on the basis of other variables, you could create a wrapper class that holds those variables as fields, implements Comparable, and exposes a method like getOriginal(). You add these wrapped objects to your list, sort it, and then iterate through it, adding the original objects (from getOriginal()) to a new list.
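A rough sketch of that wrapper idea; the Payload class, the sortKey field and getOriginal() are made-up names:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SortedWrapperExample {
    // Hypothetical payload type with no natural ordering of its own.
    static class Payload {
        final String name;
        Payload(String name) { this.name = name; }
    }

    // Wrapper that carries the sort key and implements Comparable.
    static class Wrapped implements Comparable<Wrapped> {
        final int sortKey;
        final Payload original;
        Wrapped(int sortKey, Payload original) {
            this.sortKey = sortKey;
            this.original = original;
        }
        public Payload getOriginal() { return original; }
        public int compareTo(Wrapped other) {
            return Integer.compare(this.sortKey, other.sortKey);
        }
    }

    public static void main(String[] args) {
        List<Wrapped> wrapped = new ArrayList<Wrapped>();
        wrapped.add(new Wrapped(3, new Payload("c")));
        wrapped.add(new Wrapped(1, new Payload("a")));
        wrapped.add(new Wrapped(2, new Payload("b")));

        Collections.sort(wrapped);          // sort once instead of inserting at arbitrary indices

        List<Payload> originals = new ArrayList<Payload>();
        for (Wrapped w : wrapped) {
            originals.add(w.getOriginal()); // unwrap into the final, ordered list
        }
    }
}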
For info on the sorting algorithm of collections - see this SO question

can a Map also be a Collection?

I'd like to have a Map that is also a Collection. Or more specifically, I'd like to be able to iterate over the entries in a Map, including the case where there are multiple entries for a particular key.
The specific problem I'm trying to solve is providing an object that can be used in jstl both to iterate over using c:forEach and in an expression like ${a.b.c}. In this example, I'd want ${a.b.c} to evaluate to the first value of c (or null if there are none), but also be able to iterate over all cs with <c:forEach items="${a.b.c}"> and have the loop body see each individual value of c in turn, even though they have the same key in the Map.
Looking at things from a method point of view, this should be straightforward, just provide a Map implementation whose entrySet() method returns a set with multiple Entries with the same key. But since this seems to violate the contract of a Map, will things break in subtle yet disastrous ways? Has anyone else done this sort of thing?
(If you guessed I'm trying to present xml, you'd be correct)
EDIT
Please note that this is for use in jstl, so whatever interface I present must meet 2 conditions:
for use with the [] and . operators, it must be a Map, List, array or JavaBeans object (and of those it can't be a List or array because the indexes will not be numbers)
for use with forEach it must be an array, Collection, Iterator, Enumeration, Map, or String.
So I guess the real question is, can I count on jstl only calling .containsKey(), .get(), and .entrySet() and not caring about invariants being violated, and not internally making a copy of the Map which would not preserve the special iteration behavior.
What you are looking for is a Multimap. Guava provides an implementation of it and specifically you are looking for ArrayListMultimap.
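For illustration, a brief sketch of how ArrayListMultimap could be used here (Guava on the classpath is assumed):

import com.google.common.collect.ArrayListMultimap;
import java.util.List;

public class MultimapExample {
    public static void main(String[] args) {
        ArrayListMultimap<String, String> multimap = ArrayListMultimap.create();
        multimap.put("c", "first");
        multimap.put("c", "second");             // same key, both values are kept

        List<String> values = multimap.get("c"); // [first, second]
        String first = values.isEmpty() ? null : values.get(0);
        System.out.println(first);               // first
    }
}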
I barely remember jstl, but what you're saying sounds somewhat contradictory:
In foreach:
here ${a.b.c} should point to some container of values and then we iterate over it.
On the other hand you say ${a.b.c} "should evaluate to the first value of c" (or null...).
It's an ambiguous definition.
If you feel like Multimap is not what you want, you can provide your own collection implementation (probably internally based on Multimap)
Just as an idea, you can always treat a single element as a list (one that happens to contain a single element). This way you would resolve the ambiguity, I guess.
I hope this helps
Having a Map with multiple entries for the same key irreparably breaks the Map contract. If Multimap doesn't work for you, then there's no way to do this without breaking a lot of things.
Specifically, if you pass your monstrosity to something that's specified to take a Map, it'll almost certainly break...and it sounds like that's what you want to do with it, so yeah.
How about using a Map with Collections as values? Then you can have multiple values for the same key and iterate over them with a nested for-each loop, as sketched below.
You can also easily write a wrapper around an existing Map implementation that gives you a single iterator over all values, if you need it that way.
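A minimal sketch of that map-of-collections idea:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MapOfLists {
    public static void main(String[] args) {
        Map<String, List<String>> map = new HashMap<String, List<String>>();
        map.put("c", new ArrayList<String>());
        map.get("c").add("first");
        map.get("c").add("second"); // several values under one key

        // Nested iteration over every (key, value) pair.
        for (Map.Entry<String, List<String>> entry : map.entrySet()) {
            for (String value : entry.getValue()) {
                System.out.println(entry.getKey() + " -> " + value);
            }
        }
    }
}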

get the number of instances of an element in a guava multiset without iterating

I have a Multiset in Guava and I would like to retrieve the number of instances of a given element without iterating over the multiset (I don't want to iterate because I assume that iterating takes quite some time, as it looks through the whole collection).
To do that, I was thinking of first using the entrySet() method of Multiset to obtain a set of the distinct elements with their corresponding counts. Then, transform this set into a HashMap (where the keys are the elements of my set, and the values are their instance counts), because then I can use the methods of HashMap to directly retrieve a value from its key - done! But this makes sense only if I can transform the set into the HashMap in a quick way (without iterating through all its elements): is that possible?
(as I said I expect this question to be flawed on multiple counts, I'd be happy if you could shed light on the conceptual mistakes I probably make here. Thx!)
Simply invoke count(element) on your multiset -- voila!
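For instance, a minimal sketch using Guava's HashMultiset implementation:

import com.google.common.collect.HashMultiset;
import com.google.common.collect.Multiset;

public class MultisetCountExample {
    public static void main(String[] args) {
        Multiset<String> multiset = HashMultiset.create();
        multiset.add("apple");
        multiset.add("apple");
        multiset.add("pear");

        // count(element) returns the number of occurrences without iterating yourself.
        System.out.println(multiset.count("apple")); // 2
        System.out.println(multiset.count("kiwi"));  // 0
    }
}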
As you may know, in Guava Multiset is an interface, not a class.
If you just want to know how many times an element occurs, call Multiset.count(Object element).
Please forget my following statement:
Then if you are using the popular implementation HashMultiset, there is already a HashMap<E, AtomicInteger> working under the hood.
That is, when the HashMultiset iterates, the underlying HashMap iterates too, so there is no need to transform it into another HashMap.
