I have a Java class that contains a hash map as a member. Many objects of this class are created, and in many cases one such object is cloned to another object, which is then changed. The cloning is required because the changes modify the hash map, and I need to keep the hash map of the original object intact.
I am wondering if anyone has suggestions on how to speed up the cloning part, or maybe some trick to avoid it. When I profile the code, most of the time is spent cloning these hash maps (which usually hold a very small set of values, a few hundred or so).
(I am currently using the Colt OpenIntDoubleHashMap implementation.)
You should use a more efficient data structure for this. Look at the http://code.google.com/p/pcollections/ library, whose PMap structure provides persistent (immutable) maps.
UPDATE
If your map is quite small (you said only a few hundred entries), a more efficient representation might be just two parallel arrays:
int[] keys = new int[size];
double[] values = new double[size];
In this case, to clone the map you just need to use System.arraycopy, which is very fast.
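A minimal sketch of that idea, assuming a fixed capacity and linear-scan lookup (class and field names here are illustrative, not from the original post):

```java
// Sketch: a small int->double "map" backed by two parallel arrays.
// Cloning is just two System.arraycopy calls over the live entries.
final class IntDoubleArrayMap {
    int[] keys;
    double[] values;
    int size;

    IntDoubleArrayMap(int capacity) {
        keys = new int[capacity];
        values = new double[capacity];
    }

    IntDoubleArrayMap copy() {
        IntDoubleArrayMap c = new IntDoubleArrayMap(keys.length);
        System.arraycopy(keys, 0, c.keys, 0, size);
        System.arraycopy(values, 0, c.values, 0, size);
        c.size = size;
        return c;
    }

    void put(int key, double value) {
        for (int i = 0; i < size; i++) {
            if (keys[i] == key) { values[i] = value; return; }
        }
        keys[size] = key;     // assumes capacity is not exceeded
        values[size] = value;
        size++;
    }

    double get(int key) {
        for (int i = 0; i < size; i++) {
            if (keys[i] == key) return values[i];
        }
        return Double.NaN;    // sentinel for "not found"
    }
}
```

For a few hundred entries the O(n) linear scan is often competitive with hashing, and the copy is a single contiguous memory operation per array.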
Maybe implement a copy-on-write wrapper for your map if the original only changes occasionally.
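A copy-on-write wrapper could be sketched like this, assuming single-threaded use; the shared map is only cloned the first time a writer actually modifies it (the class name and fields are made up for illustration):

```java
import java.util.HashMap;
import java.util.Map;

// Copy-on-write sketch: reads go to the shared original until the
// first write, which clones the map once and then modifies the clone.
final class CowMap<K, V> {
    private Map<K, V> map;
    private boolean shared;

    CowMap(Map<K, V> original) {
        this.map = original;
        this.shared = true;
    }

    V get(K key) {
        return map.get(key);
    }

    void put(K key, V value) {
        if (shared) {                 // first write: clone, then modify
            map = new HashMap<>(map);
            shared = false;
        }
        map.put(key, value);
    }
}
```

Clones that are never modified then cost nothing beyond the wrapper object.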
If only a small fraction of the objects change, you could implement a two-layer structure:
Layer 1 is the original map.
Layer 2 keeps only the changed elements.
Any object from the original map that needs to change gets cloned, modified, and put into the layer-2 map.
Lookups first consult the layer-2 map and, if the object is not found, fall back to the layer-1 map.
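The lookup order above can be sketched as follows (a minimal version covering put/get only; a real implementation would also need tombstones for removed keys, and the names are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

// Two-layer map sketch: layer 1 is the shared, never-modified base;
// layer 2 holds only this object's local changes.
final class LayeredMap<K, V> {
    private final Map<K, V> base;                          // layer 1
    private final Map<K, V> overrides = new HashMap<>();   // layer 2

    LayeredMap(Map<K, V> base) {
        this.base = base;
    }

    V get(K key) {
        V v = overrides.get(key);          // consult layer 2 first
        return (v != null) ? v : base.get(key);
    }

    void put(K key, V value) {
        overrides.put(key, value);         // layer 1 stays intact
    }
}
```

"Cloning" then amounts to creating a new, empty layer-2 map that shares the same base.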
Related
This may be a duplicate, but I don't know the correct terminology to search for what I want, so I apologise if it is and the title is not completely specific.
This is my scenario:
I have two different types of objects I want to map to each other, call them ObjectA and ObjectB. These objects are generated in a for loop, and I want the instances from each iteration to be mapped to each other. So basically: ObjectA(1) - ObjectB(1), ObjectA(2) - ObjectB(2), etc. There will be roughly 500 - 3000 entries to map.
The reason for this is that some methods will be passed an ObjectA and I need to get the corresponding ObjectB, and vice versa. I also cannot use the initial loop index as a reference; the lookup needs to go through one of the objects themselves.
I have tried making use of Guava's HashBiMap, which works, but I don't like it for several reasons. 1) This is the only place I would make use of any Guava class, and I don't want to add ~500 kB to my package for it (this is for a mobile app, so I'm trying to keep it small). 2) I need to iterate over the objects every frame, and iterating through the keySet() was causing significant memory allocation. I'm sure there are ways around this and it was probably just some mistake I made, but still: reason 1.
The current solution I have, since I know the indexes are matched, is to just use two ArrayLists (actually libgdx Arrays, but for the purpose of understanding the logic we can assume ArrayList), and I simply do:
ObjectA objectA = objectAList.get(objectBList.indexOf(objectB));
And vice versa.
Anyway, I don't like this solution either; it feels expensive, and I'm sure there is a much simpler and faster method, I just don't know specifically what to search for.
Thanks for the help!
Maybe this will be helpful for you. There is an interface in Apache Commons Collections called BidiMap:
https://commons.apache.org/proper/commons-collections/apidocs/org/apache/commons/collections4/BidiMap.html
Classes implementing this interface have the methods you are looking for, i.e. getting a key by value and vice versa, and the package weighs only about 150 kB.
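If even that dependency is unwelcome, the same constant-time lookups in both directions can be sketched with two plain HashMaps kept in sync; this is a minimal, dependency-free alternative to BidiMap (no removal handling, and the class name is made up):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a bidirectional map: two HashMaps kept in sync give O(1)
// lookup from A to B and from B to A, with no external library.
final class BiMap<A, B> {
    private final Map<A, B> forward = new HashMap<>();
    private final Map<B, A> backward = new HashMap<>();

    void put(A a, B b) {
        forward.put(a, b);
        backward.put(b, a);
    }

    B getB(A a) { return forward.get(a); }
    A getA(B b) { return backward.get(b); }
}
```

The memory cost is one extra entry per pair, which for 500 - 3000 entries is negligible compared to the O(n) indexOf scan.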
I want to write my own Map in Java. I know how a map works, but I don't really know where to keep the keys and values. Can I keep them, for example, in lists, so the keys would be stored in one list and the values in another?
Best would be if you checked out some of the concepts behind HashMap, TreeMap, LinkedHashMap, etc.
Once you understand those concepts, you're far better prepared for writing your own map when it comes to speed.
In other words: unless you know the concepts of all available implementations, it is very unlikely your wheel-re-invention will be a better solution.
Also be sure to test your implementation very thoroughly, as collections are the backbone and heart of any good application.
Two very simple (but slow) solutions are these:
1) As suggested above, you can use an ArrayList<Pair> and add your own getItemByKey() method (in Java commonly named get).
2) You can use two arrays, kept at the same size, with keys and values matched by their respective indices.
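Solution 1 could be sketched like this (the Pair and PairListMap names are illustrative; get() is O(n), which is the "slow" part mentioned above):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a map built on a list of key/value pairs. Lookups scan
// the list linearly, so this is simple but O(n) per operation.
final class PairListMap<K, V> {
    private static final class Pair<K, V> {
        final K key;
        V value;
        Pair(K key, V value) { this.key = key; this.value = value; }
    }

    private final List<Pair<K, V>> pairs = new ArrayList<>();

    V get(K key) {
        for (Pair<K, V> p : pairs) {
            if (p.key.equals(key)) return p.value;
        }
        return null;
    }

    void put(K key, V value) {
        for (Pair<K, V> p : pairs) {
            if (p.key.equals(key)) { p.value = value; return; }
        }
        pairs.add(new Pair<>(key, value));
    }
}
```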
For the backing data structure, an array of entries (key/value pairs) is hard to beat most of the time, because the main goal of a map is to map keys to values.
Arrays give fast, constant-time O(1) access, but there is one small problem: when your map is full, you have to create a new array and copy the old entries over.
Note: HashMap works the same way internally.
Is it a good programming practice to keep the same object in multiple collections?
Let's say I have a map which contains e.g. 500+ elements:
Map<String,MyObject> map = new HashMap<>();
My application works with multiple connected clients, and I know that each client will almost always use only about 20 known, distinct elements from this map.
My question is whether it is a good idea to create a map for each client holding these 20 elements, to save some iteration.
Sure it is. It can even be a way to reuse objects instead of creating a lot of new objects holding identical data (you would call this an object pool or the flyweight pattern).
However, it depends on the context, and you must be sure about who can change the objects and how. If client A changes an object, it will also be changed for client B. If this is what you intended, it is perfectly OK.
Yes, it is a good idea, since it will take less time to get the needed objects from a smaller map, as long as you are not cloning those objects and not keeping duplicate objects in different maps.
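The sharing caveat mentioned above can be made concrete: the per-client map holds references to the same objects as the global map, so a change made through one is visible through the other (MyObject and its field here are a made-up stand-in for the question's class):

```java
import java.util.HashMap;
import java.util.Map;

// Demonstrates that putting the same object into two maps shares it:
// mutating it via the per-client view also changes what the global
// map sees, because both maps hold the same reference.
final class SharedObjectDemo {
    static final class MyObject {
        String data;
        MyObject(String data) { this.data = data; }
    }

    public static void main(String[] args) {
        Map<String, MyObject> global = new HashMap<>();
        global.put("k1", new MyObject("original"));

        // Per-client view: same object reference, not a copy.
        Map<String, MyObject> clientView = new HashMap<>();
        clientView.put("k1", global.get("k1"));

        clientView.get("k1").data = "changed";
        System.out.println(global.get("k1").data); // prints "changed"
    }
}
```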
Input : Let's say I have an object Person. It has two properties, namely:
ssnNo - Social Security Number
name.
In one hand I have a List of Person objects (with unique ssnNo) and in the other hand I have a Map containing Person's ssnNo as the key and Person's name as the value.
Output : I need Person names using its ssnNo.
Questions :
Which approach should I follow out of the two mentioned above, i.e. using the list or the map? (I think the obvious answer is the map.)
If it is the map, is it always recommended to use a map, whether the data set is large or small? I mean, are there any performance issues that come with a map?
Map is the way to go. Maps perform very well, and their advantages over lists for lookups get bigger the bigger your data set gets.
Of course, there are some important performance considerations:
Make sure you have a good hashCode() (and corresponding equals()) implementation, so that your data will be evenly spread across the buckets of the map.
Make sure you pre-size your map when you allocate it (if at all possible). The map will resize automatically, but a resize essentially requires re-inserting each existing element into the new, bigger map.
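The pre-sizing advice can be sketched as follows; HashMap's default load factor is 0.75, so to hold n entries without any resize you need an initial capacity of at least n / 0.75 (the helper name is made up):

```java
import java.util.HashMap;
import java.util.Map;

final class PresizedMapDemo {
    // Request enough capacity that `expected` entries never trigger a
    // resize: capacity must exceed expected / loadFactor (0.75 default).
    static <K, V> Map<K, V> presized(int expected) {
        return new HashMap<>((int) (expected / 0.75f) + 1);
    }

    public static void main(String[] args) {
        Map<String, Integer> map = PresizedMapDemo.presized(500);
        for (int i = 0; i < 500; i++) {
            map.put("key" + i, i);   // no internal resize along the way
        }
        System.out.println(map.size()); // prints 500
    }
}
```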
You're right, you should use a map in this case. There are no performance issues with a map compared to a list; for large data sets a map's performance is significantly better. A map uses the keys' hash codes to retrieve entries, in a similar way to how arrays use indexes to retrieve values, which gives good performance.
This looks like a situation appropriate for a Map<Long, Person> that maps a social security number to the relevant Person. You might want to consider removing the ssnNo field from Person so as to avoid any redundancies (since you would be storing those values as keys in your map).
In general, Maps and Lists are very different structures, each suited for different circumstances. You would use the former whenever you want to maintain a set of key-value pairs that allows you to easily and quickly (i.e. in constant time) look up values based on the keys (this is what you want to do). You would use the latter when you simply want to store an ordered, linear collection of elements.
I think it makes sense to have a Person object, but it also makes sense to use a Map over a List, since the look up time will be faster. I would probably use a Map with SSNs as keys and Person objects as values:
Map<SSN,Person> ssnToPersonMap;
It's all references anyway, so it actually makes no sense to have a Map<ssn,PersonName> instead of a Map<ssn,Person>; the latter is the better choice most of the time.
Using a map, especially one implemented with a hash table, will be faster than the list, since it lets you get the name in constant time, O(1). With the list you need to do a linear search, or maybe a binary search, which is slower.
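The two lookups contrasted above can be sketched side by side; Person here is a minimal stand-in for the class described in the question:

```java
import java.util.List;
import java.util.Map;

// Contrast: the list needs a linear scan over Person objects, while
// the map goes straight to the name by SSN key.
final class LookupDemo {
    static final class Person {
        final long ssnNo;
        final String name;
        Person(long ssnNo, String name) { this.ssnNo = ssnNo; this.name = name; }
    }

    // O(n): scan the list until the SSN matches.
    static String nameFromList(List<Person> people, long ssn) {
        for (Person p : people) {
            if (p.ssnNo == ssn) return p.name;
        }
        return null;
    }

    // O(1) expected: hash lookup by SSN.
    static String nameFromMap(Map<Long, String> namesBySsn, long ssn) {
        return namesBySsn.get(ssn);
    }
}
```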
The hash structures I am aware of are Hashtable, HashSet & HashMap.
Do they all use the bucket structure, i.e. when two hash codes are exactly the same, one element does not overwrite the other; instead, both are placed in the same bucket associated with that hash code?
In Sun's current implementation of the Java library, IdentityHashMap and the internal implementation in ThreadLocal use probing structures.
The general problem with probing hash tables in Java is that hashCode and equals may be relatively expensive, so you want to cache the hash value. You can't have an array that mixes references and primitives, so you'd need to do something relatively complicated. On the other hand, if you are using == to check for matches, then you can check many references without a performance problem.
IIRC, Azul had a fast concurrent quadratic probing hash map.
A linked list is used at each bucket to deal with hash collisions. Note that the Java HashSet is actually implemented on top of a HashMap (all keys being mapped to the same singleton value across all HashSets) and hence uses the same bucket structure.
When an element is added, it is checked for equality against all items in the linked list (via .equals) before it is appended at the end. Hence hash collisions are particularly bad, as this check gets more expensive as the linked list grows.
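The bucket behaviour is easy to demonstrate with a key class whose hashCode is deliberately constant: two distinct keys land in the same bucket, and the equals() check keeps them apart, so neither overwrites the other (the class is made up for the demo):

```java
// A key that forces every instance into the same bucket. Because
// equals() still distinguishes instances by id, a HashMap stores
// colliding keys side by side in the bucket rather than overwriting.
final class CollidingKey {
    final String id;
    CollidingKey(String id) { this.id = id; }

    @Override public int hashCode() { return 42; }   // forced collision
    @Override public boolean equals(Object o) {
        return o instanceof CollidingKey && ((CollidingKey) o).id.equals(id);
    }
}
```

With all keys hashing to 42, lookups degrade to a linear scan of one bucket, which is exactly why a good hashCode matters.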
I believe Java's hash structures all use a form of chaining to deal with collisions: items that have the same hash are placed into a list.
I do not believe that Java uses open addressing for its hash-based data structures (open addressing resolves collisions by probing alternative indices until it finds an open slot in the table).
No -- open addressing is an alternate method of representing hash tables, where objects are stored directly in the table, instead of residing in a linked list. Only one object can be stored at a given index, so resolving collisions is more complicated.
When adding an object where another object already resides at the same index, a probing sequence is used to determine the new index at which to store the new object. Removing objects is also more complicated: if you remove an object, you need to leave a marker that says "there used to be an object here"; for more details, see Wikipedia.
Open addressing is preferable when the objects being stored are small and will rarely be deleted. It has improved cache performance, since you don't need to go through an extra level of indirection walking a linked list.
The classes you mentioned (Hashtable, HashSet, and HashMap) don't use open addressing, but you could easily create new classes that implement open addressing and provide the same APIs as those classes.
The APIs define the behaviour; the internals of how hash collisions are managed don't affect the guarantees of the API. The performance impact of a bad hash value computation is another story, though. Let's just hash everything to 42 and see how it behaves.
Map and Set are the interfaces that determine the behaviour of a HashMap or HashSet. A HashSet is a Set, and so it behaves like a Set (i.e. duplicates are not allowed). A HashMap acts like a Map: it will not overwrite a key that merely has the same hash code, but it will overwrite an entry if the exact same key is used again. This is the same regardless of what data structure backs the Map internally. See the javadoc for Set and HashMap for more.
Did you mean to ask something about the specific implementation of one of these structures?