Input : Let's say I have an object as Person. It has 2 properties namely
ssnNo - Social Security Number
name.
In one hand I have a List of Person objects (with unique ssnNo) and in the other hand I have a Map containing Person's ssnNo as the key and Person's name as the value.
Output : I need Person names using its ssnNo.
Questions :
Which approach to follow out of the 2 I have mentioned above i.e. using list or map? (I think the obvious answer would be the map).
If it is the map, is it always recommended to use map whether the data-set is large or small? I mean are there any performance issues that come with the map.
Map is the way to go. Maps perform very well, and their advantages over lists for lookups get bigger the bigger your data set gets.
Of course, there are some important performance considerations:
Make sure you have a good hashcode (and corresponding equals) implementation, so that you data will be evenly spread across the buckets of the Map.
Make sure you pre-size your Map when you allocate it (if at all possible). The map will automatically resize, but the resize operation essentially requires re-inserting each prior element into the new, bigger Map.
You're right, you should use a map in this case. There are no performance issues using map compared to lists, the performance is significantly better than that of a list when data is large. Map uses key's hashcodes to retrieve entries, in similar way as arrays use indexes to retrieve values, which gives good performance
This looks like a situation appropriate for a Map<Long, Person> that maps a social security number to the relevant Person. You might want to consider removing the ssnNo field from Person so as to avoid any redundancies (since you would be storing those values as keys in your map).
In general, Maps and Lists are very different structures, each suited for different circumstances. You would use the former whenever you want to maintain a set of key-value pairs that allows you to easily and quickly (i.e. in constant time) look up values based on the keys (this is what you want to do). You would use the latter when you simply want to store an ordered, linear collection of elements.
I think it makes sense to have a Person object, but it also makes sense to use a Map over a List, since the look up time will be faster. I would probably use a Map with SSNs as keys and Person objects as values:
Map<SSN,Person> ssnToPersonMap;
It's all pointers. It actually makes no sense to have a Map<ssn,PersonName> instead of a Map<ssn,Person>. The latter is the best choice most of the time.
Using map especially one that implement using a hash table will be faster than the list since this will allow you to get the name in constant time O(1). However using the list you need to do a linear search or may be a binary search which is slower.
Related
I want to write my own Map in Java. I know how map works, but i don't really know where you can keep keys and values. Can i keep them for example in List? So the keys would be store in the list and values would be store in another list?
Best would be if you checked out some of the concepts behind HashMap, TreeMap, HeapMap etc.
Once you understand those concepts, you're far better prepared for writing your own map when it comes to speed.
In other words: unless you know the concepts of all available implementations, it is very unlikely your wheel-re-invention will be a better solution.
Also be sure to test your implementations very thoroughly, as Collection are the backbone and heart of any good application.
Two very very simple (but slow) solutions are these:
1) As suggested above, you can use an ArrayList<Pair> and add your custom getItemByKey() (in Java commonly named 'get') method.
2) You can use two arrays, both keeping the same size, and keeping keys and values matched by their respective indices.
For choosing the data structure there's not better than Array (not all time but almost) of Entries (key/value) because the main goal of map is to map objects for objects, so mapping keys to values.
Using arrays for fast and constant access O(1), but you have a little problem, when your map is full, you have to create new Array and copy old entries.
Note: HashMap works in the same way.
Given set of attributes and a comparator I'd like to generate an order preserving hash code that provides O(1) access. Is there a Java library for this sort of thing or would I have to design the hash function myself?
Try:
java.util.LinkedHashMap()
There is no single collection that will do this. Depending on the detail requirements there are several options to chose from.
For simplicity, I would just use a HashMap for lookups and when I need the sorted data, I'd make a copy of the values and sort it:
List<?> sorted = new ArrayList<?>(hashMap.values());
Collections.sort(sorted, Comparator<?>);
This suffices for most real world use cases.
You could also write your own super-container that internally holds the elements in two collections, one HashMap and maybe a TreeSet. You can then easily provide access methods that make use of the collection better for the purpose of the method. Just make sure you make additions and removals affect both the contained collections.
I'm making a java application that is going to be storing a bunch of random words (which can be added to or deleted from the application at any time). I want fast lookups to see whether a given word is in the dictionary or not. What would be the best java data structure to use for this? As of now, I was thinking about using a hashMap, and using the same word as both a value and the key for that value. Is this common practice? Using the same string for both the key and value in a (key,value) pair seems weird to me so I wanted to make sure that there wasn't some better idea that I was overlooking.
I was also thinking about alternatively using a treeMap to keep the words sorted, giving me an O(lgn) lookup time, but the hashMap should give an expected O(1) lookup time as I understand it, so I figured that would be better.
So basically I just want to make sure the hashMap idea with the strings doubling as both key and value in each (key,value) pair would be a good decision. Thanks.
I want fast lookups to see whether a given word is in the dictionary or not. What would be the best java data structure to use for this?
This is the textbook usecase of a Set. You can use a HashSet. The naive implementation for Set<T> uses a corresponding Map<T, Object> to simply mark whether the entry exists or not.
If you're storing it as a collection of words in a dictionary, I'd suggest taking a look at Tries. They require less memory than a Set and have quick lookup times of worst case O(string length).
Any class that is a Set should help your purpose. However, Do note that Set will not allow for duplicates. For that matter, even a Map won't allow duplicate keys. I would suggest on using a an ArrayList(assuming synchronization is not needed) if you need to add duplicate entries and treat them as separate.
My only concern would be memory, if you use the HashSet and if you have a very large collection of words... Then you will have to load the entire collection in the memory... If that's not a problem.... (And your collection must be very large for this to be a problem)... Then the HashSet should be fine... If you indeed have a very large collection of words, then you can try to use a tree, and only load in memory the parts that you are interested in.
Also keep in mind that insertion is fast, but not as fast as in a tree, remember that for this to work, Java is going to insert every element sorted. Again, nothing major, but if you add a lot of words at a time, you may consider using a tree...
I would like to find and reuse (if possible) a map implementation which has the following attributes:
While the number of entries is small, say < 32, underlying storage should be done in an array like this [key0, val0, key1, val1, ...] This storage scheme avoids many small Entry objects and provides for extremely fast look ups (even tho they are sequential scans!) on modern CPU's due to the CPU's cache not being invalidated and lack of pointer indirection into heap.
The map should maintain insertion order for key/value pairs regardless of the number of entries similar to LinkedHashMap
We are working on an in-memory representations of huge (millions of nodes/edges) graphs in Scala and having such a Map would allow us to store Node/Edge attributes as well as Edges per node in a much more efficient way for 99%+ of Nodes and Edges which have few attributes or neighbors while preserving chronological insertion order for both attributes and edges.
If anyone knows of a Scala or Java map with such characteristics I would be much obliged.
Thanx
While I'm not aware of any implementations that exactly fit your requirements, you may be interested in peeking at Flat3Map (source) in the Jakarta Commons library.
Unfortunately, the Jakarta libraries are rather outdated (e.g., no support for generics in the latest stable release, although it is promising to see that this is changing in trunk) and I usually prefer the Google Collections, but it might be worth your time to see how Apache implemented things.
Flat3Map doesn't preserve the order of the keys, unfortunately, but I do have a suggestion in regard to your original post. Instead of storing the keys and values in a single array like [key0, val0, key1, val1, ...], I recommend using parallel arrays; that is, one array with [key0, key1, ...] and another with [val0, val1, ...]. Normally I am not a proponent of parallel arrays, but at least this way you can have one array of type K, your key type, and another of type V, your value type. At the Java level, this has its own set of warts as you cannot use the syntax K[] keys = new K[32]; instead you'll need to use a bit of typecasting.
Have you measured with profiler if LinkedHashMap is too slow for you? Maybe you don't need that new map - premature optimization is the root of all evil..
Anyway for processing millions or more pieces of data in a second, even best-optimized map can be too slow, because every method call decreases performance as well in that cases. Then all you can do is to rewrite your algorithms from Java collections to arrays (i.e. int -> object maps).
Under java you can maintain a 2d array(spreadsheet). I wrote a program which basically defines a 2 d array with 3 coloumns of data, and 3 coloumns for looking up the data. the three coloumns are testID, SubtestID and Mode. This allows me to basically look up a value by testid and mode or any combination, or I can also reference by static placement. The table is loaded into memory at startup and referenced by the program. It is infinately expandable and new values can be added as needed.
If you are interested, I can post a code source example tonight.
Another idea may be to maintain a database in your program. Databases are designed to organize large amounts of data.
The hash structures I am aware of - HashTable, HashSet & HashMap.
Do they all use the bucket structure - ie when two hashcodes are similar exactly the same one element does not overwrite the other, instead they are placed in the same bucket associated with that hashcode?
In Sun's current implementation of the Java library, IdentityHashMap and the internal implementation in ThreadLocal use probing structures.
The general problem with probing hash tables in Java is that hashCode and equals may be relatively expensive. Therefore you want to cache the hash value. You can't have an array that mixes references and primitives, so you'd need to do something relatively complicated. On the other hand, if you are using == to check matches, then you can check many references without a performance problem.
IIRC, Azul had a fast concurrent quadratic probing hash map.
A linked list is used at each bucket for dealing with hash collisions. Note that the java HashSet is actually implemented by a HashMap underneath (all keys being mapped to the same singleton value across all HashSets) and hence uses the same bucket structure.
If an element is added, its equality is checked against all items in the linked list (via .equals) before it is added at the end. Hence having hash collisions is particularly bad, as this could be an expensive check as the linked list becomes larger.
I believe Java hash structures all use a form of chaining to deal with colisions when performing the hashing - which places the items that have the same hash into a list.
I do not believe that Java uses open addressing for it's hash based data structures (open addressing recomputes hashes based on retry sequences until it finds an open slit in the table)
No -- open addressing is an alternate method of representing hash tables, where objects are stored directly in the table, instead of residing in a linked list. Only one object can be stored at a given index, so resolving collisions is more complicated.
When adding an object for which another object already resides at the same index, a probing sequence is used to determine the new index at which to store the new object. Removing objects is also more complicated, since you if you remove an object, you need to leave a marker that says "there used to be an object here"; for more details, see Wikipedia.
Open addressing is preferable when the objects being stored as small and will rarely be deleted. Open addressing has improved cache performance, since you don't need to go through an extra level of indirection walking a linked list.
The classes you mentioned -- HashTable, HashSet, and HashMap don't use open addressing, but you could easily create new classes that implemented open addressing and provided the same APIs as those classes.
The apis define the behaviour, the internals of how Hash collisions are managed doesn't affect the guarantees of the API ... the performance impact of bad hash value computation is another story. Let's just hash everything to 42 and see how it behaves.
Maps and Sets are the interfaces that determine the behavior of a HashSet or HashMap. A HashSet is a Set, and so it behaves like a Set (ie duplicates are not allowed). A HashMap acts like a Map - it will not overwrite a key with a similar hashcode, but it will overwrite a key, if the same exact key is used again. This will be the same regardless of what data structure is backing the Map internally. See the javadoc for Sets and HashMaps for more.
Did you mean to ask something about the specific implementation of one of these structures?
Except the HashSet. Set is by definition unique elements.
This was a mistake. Please see the comments below.