I am trying to justify whether I'm using the most appropriate Data Structure for a set of scenarios.
The first scenario is an estate agent selling properties at different prices where no price is duplicated. Customers choose a range of prices & obtain a list of properties in that range.
To store the collection of property data I would choose a TreeSet. As no two properties will have the same price, I could have pairs of price (key) and property details (value). This would work with a TreeSet because there are no duplicate entries, and the TreeSet would keep prices sorted in natural order. Additionally, the main operation for the scenario is search/contains, which takes O(log n). Although there are structures with faster search/contains operations, e.g. HashMap, I need ordering. If I need to insert or delete an entry, I believe these operations are also O(log n).
To return a list of properties within a price range, I think I can use the headSet() method?
However, I've read on some threads that I can store as a HashMap and create a TreeSet from the HashMap; would it be worth doing this?
You need an ordered collection to be able to serve this type of query. Therefore a tree structure is better suited to your needs than a hash map. However, the equivalent of a HashMap is a TreeMap, not a TreeSet - you need a mapping between key and value. As for the range operations, there is a method better suited to your needs - subMap.
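For example, a minimal sketch of the range query with subMap (the prices and property strings are made up for illustration):

```java
import java.util.NavigableMap;
import java.util.TreeMap;

public class PriceRange {
    public static void main(String[] args) {
        // Price (key) -> property details (value); TreeMap keeps keys sorted
        TreeMap<Integer, String> properties = new TreeMap<>();
        properties.put(250_000, "2-bed flat");
        properties.put(180_000, "1-bed flat");
        properties.put(420_000, "3-bed house");
        properties.put(310_000, "2-bed house");

        // All properties priced from 200,000 to 350,000 inclusive
        NavigableMap<Integer, String> inRange =
            properties.subMap(200_000, true, 350_000, true);
        System.out.println(inRange); // {250000=2-bed flat, 310000=2-bed house}
    }
}
```

The submap is a live view of the underlying TreeMap, so obtaining it is cheap; iterating it costs O(log n + k) for k matches.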
I need an efficient data structure to store a big number (millions) of records on a live (up to a hundred insertions, deletions or updates per second) server.
Its clients need to be able to grab a chunk of that data, sorted, beginning from some point, be able to scroll (i.e. get records before and after the ones they initially got) and receive live updates.
Initially I considered some form of linked ordered set with an index. However, even though the records are unique in the sense that they have an id, the values of the fields by which the set would be ordered are not. I could resolve collisions by inserting more than one record into each node, but that does not seem right.
The other solution I came up with is a linked set with an index, which is kept sorted through insertions, deletions and updates. The big O of that would be O(n) rather than O(log n), but I'm guessing that keeping the index would speed up the process a lot? Or I could binary search for the place to insert? I don't think I can do that with a list, though.
What would be the most efficient solution and which one is best given that I need clients to receive live updates on the state of this data structure?
The code will be in Java
Millions of records -> first estimate whether you want to (and can) hold all the data in RAM.
Have a look at b-tree.
Algorithm   Average     Worst case
Space       O(n)        O(n)
Search      O(log n)    O(log n)
Insert      O(log n)    O(log n)
Delete      O(log n)    O(log n)
In Java these kinds of requirements are usually solved by using a TreeMap like a database index. The TreeMap interface isn't particularly well designed for this, so there are some tricks to it:
Your record objects should implement a Key interface or base class that just exposes the sort fields and ID. This interface should not extend Comparable.
Your record objects will be both keys and values in the TreeMap, and each record will map to itself, but the Key interface will be used as the key, so the type of the map is TreeMap<Key,Record>. Remember that every put should be of the form put(record, record).
When you make the TreeMap, use the constructor that takes a custom comparator. Pass a comparator that compares Keys using the sort fields AND the ID, so that there will be no duplicates.
To search in the map, you can use other implementations of the Key interface -- you don't have to use complete records. Because a caller can't provide an ID, though, you can't use TreeMap.get() to find a record that matches the sort fields. Use a key with ID=0 and TreeMap.ceilingEntry to get the first record with >= key, and then check the sort fields to see if they match.
Note that if you need multiple orderings on different fields, you can make your records implement multiple Key interfaces and put them in multiple maps.
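A minimal sketch of the pattern above (the field names, the Record class, and the LookupKey helper are illustrative, not part of any library):

```java
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

// The Key interface exposes only the sort fields and the ID; it does NOT extend Comparable.
interface Key {
    String sortField();
    long id();
}

class Record implements Key {
    private final String sortField;
    private final long id;
    Record(String sortField, long id) { this.sortField = sortField; this.id = id; }
    public String sortField() { return sortField; }
    public long id() { return id; }
}

// Lightweight key used only for lookups -- no full record required.
class LookupKey implements Key {
    private final String sortField;
    LookupKey(String sortField) { this.sortField = sortField; }
    public String sortField() { return sortField; }
    public long id() { return 0; } // ID=0 sorts before any real record with the same sort field
}

public class IndexDemo {
    public static void main(String[] args) {
        // Compare by sort field AND ID so no two distinct records ever collide
        Comparator<Key> bySortFieldThenId =
            Comparator.comparing(Key::sortField).thenComparingLong(Key::id);
        TreeMap<Key, Record> index = new TreeMap<>(bySortFieldThenId);

        Record a = new Record("alice", 7);
        Record b = new Record("bob", 3);
        index.put(a, a); // every put maps the record to itself
        index.put(b, b);

        // Find the first record with key >= the search key, then verify the match
        Map.Entry<Key, Record> e = index.ceilingEntry(new LookupKey("bob"));
        if (e != null && e.getValue().sortField().equals("bob")) {
            System.out.println("found id " + e.getValue().id()); // found id 3
        }
    }
}
```

For multiple orderings, the same Record objects would simply go into additional TreeMaps, each with its own comparator.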
Please explain how different collections are used in different scenarios.
By this I mean to say how can I differentiate when to use a List, a Set or a Map interface.
Please provide some links to examples that can provide a clear explanation.
Also
if insertion order is preserved then we should go for List.
if insertion order is not preserved then we should go for Set.
What does "insertion order is preserved" mean?
Insertion order
Insertion order means preserving the order in which you inserted the data.
For example, say you have inserted the data {1,2,3,4,5}:
a HashSet may return something like {2,3,1,4,5},
while a List returns {1,2,3,4,5} - it preserves the order of insertion.
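A small demo of the difference (the exact HashSet order is not guaranteed, so it is only hedged with an example):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class OrderDemo {
    public static void main(String[] args) {
        int[] data = {5, 1, 4, 2, 3};

        List<Integer> list = new ArrayList<>();
        Set<Integer> hashSet = new HashSet<>();
        Set<Integer> linkedSet = new LinkedHashSet<>();
        for (int n : data) {
            list.add(n);
            hashSet.add(n);
            linkedSet.add(n);
        }

        System.out.println(list);      // [5, 1, 4, 2, 3] -- insertion order preserved
        System.out.println(linkedSet); // [5, 1, 4, 2, 3] -- also preserved
        System.out.println(hashSet);   // order depends on hashing, e.g. [1, 2, 3, 4, 5]
    }
}
```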
When to use List, Set and Map in Java
1) If you need to access elements frequently by index, then List is the way to go. Its implementations, e.g. ArrayList, provide fast access if you know the index.
2) If you want to store elements and have them maintain the order in which they were inserted into the collection, then go for List again, as List is an ordered collection and maintains insertion order.
3) If you want a collection of unique elements with no duplicates, then choose a Set implementation, e.g. HashSet, LinkedHashSet or TreeSet. All Set implementations follow the general contract, e.g. uniqueness, but also add extra features; e.g. TreeSet is a SortedSet, and elements stored in a TreeSet can be sorted by using a Comparator or Comparable in Java. LinkedHashSet also maintains insertion order.
4) If you store data in the form of keys and values, then Map is the way to go. You can choose from Hashtable, HashMap, or TreeMap based upon your subsequent needs.
You will find some more useful info at http://java67.blogspot.com/2013/01/difference-between-set-list-and-map-in-java.html
I need a data structure that will serve both as a lookup map by key and be convertible into a sorted list. The data that goes in is a very simple code-description pair (e.g. M/Married, D/Divorced etc). The lookup requirement is in order to get the description once the user makes a selection in the UI, whose value is the code. The sorted-list requirement is in order to feed the data into UI components (JSF) which take a List as input, and the values always need to be displayed in the same order (alphabetical order of description).
The first thing that came to mind was a TreeMap. So I retrieve the data from my DB in the order I want it to be shown in the UI and load it into my tree map, keyed by the code so that I can later look up descriptions for further display once the user makes selections. As for getting a sorted list out of that same map, as per this post, I am doing the following:
List<CodeObject> list = new ArrayList<CodeObject>(map.values());
However, the list is not sorted in the same order in which the entries were put into the map. The map is declared as a SortedMap and implemented as a TreeMap:
SortedMap<String, CodeObject> map = new TreeMap<String, CodeObject>();
CodeObject is a simple POJO containing just the code and description and corresponding getters (values set through the constructor), a list of which is fed to UI components, which use the code as the value and the description for display. I used to use just a List, and that worked fine with respect to ordering, but a List does not provide an efficient interface for looking up a value by key, and I now do have that requirement.
So, my questions are:
If TreeMap is supposed to be a map in the order of item addition, why isn't TreeMap.values() in the same order?
What should I do to fulfill my requirements explained above, i.e. have a data structure that will serve as both a lookup map AND a sorted collection of elements? Will TreeMap do it for me if I use it differently or do I need an altogether different approach?
TreeMap maintains the keys' natural order. You can even order it (with a bit more manipulation and a custom comparator) by the natural/reverse order of the values. But this is not the same as "insertion order". To maintain insertion order you need to use LinkedHashMap. Java's LinkedHashMap is a subclass of HashMap - the analogy is the same as LinkedList, where each node keeps track of the next one. A plain HashMap, by contrast, cannot guarantee any particular order, so don't ask for your money back if you happen to see insertion order apparently maintained with a HashMap.
TreeMap's documentation says:
The map is sorted according to the natural ordering of its keys, or by a Comparator provided at map creation time, depending on which constructor is used.
So unless you're providing a Comparator and tracking the insertion order and using it in that Comparator, you'll get the natural order of the keys, not the order in which the keys were inserted.
If you want insertion order, as davide said, you can use LinkedHashMap:
Hash table and linked list implementation of the Map interface, with predictable iteration order...This linked list defines the iteration ordering, which is normally the order in which keys were inserted into the map (insertion-order). Note that insertion order is not affected if a key is re-inserted into the map.
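A quick demo of the difference, using the code/description pairs from the question:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class MapOrderDemo {
    public static void main(String[] args) {
        Map<String, String> tree = new TreeMap<>();
        Map<String, String> linked = new LinkedHashMap<>();
        String[][] pairs = {{"M", "Married"}, {"D", "Divorced"}, {"S", "Single"}};
        for (String[] p : pairs) {
            tree.put(p[0], p[1]);
            linked.put(p[0], p[1]);
        }
        System.out.println(tree.keySet());   // [D, M, S] -- natural key order
        System.out.println(linked.keySet()); // [M, D, S] -- insertion order
    }
}
```

Note that neither order is "alphabetical by description"; if that is the real requirement, inserting the entries into the LinkedHashMap already sorted by description (as the question's DB query does) is the simplest approach.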
What you need is LinkedHashMap
See another question as well.
Redis has a data structure called a sorted set.
The interface is roughly that of a SortedMap, but sorted by value rather than key. I could almost make do with a SortedSet, but they seem to assume static sort values.
Is there a canonical Java implementation of a similar concept?
My immediate use case is to build a set with a TTL on each element. The value of the map would be the expiration time, and I'd periodically prune expired elements. I'd also be able to bump the expiration time periodically.
So... several things.
First, decide which kind of access you'll be doing more of. If you'll be doing more HashMap actions (get, put) than accessing a sorted list, then you're better off just using a HashMap and sorting the values when you want to prune the collection.
As for pruning the collection, it sounds like you want to just remove values that have a time less than some timestamp rather than removing the earliest n items. If that's the case then you're better off just filtering the HashMap based on whether the value meets a condition. That's probably faster than trying to sort the list first and then remove old entries.
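Filtering a HashMap in place can be done with removeIf on its values view - a sketch, assuming the values are expiration timestamps:

```java
import java.util.HashMap;
import java.util.Map;

public class PruneDemo {
    public static void main(String[] args) {
        // Element -> expiration time
        Map<String, Long> expiry = new HashMap<>();
        expiry.put("a", 100L);
        expiry.put("b", 250L);
        expiry.put("c", 300L);

        long now = 200L;
        // Drop every entry whose expiration time is in the past; O(n), no sort needed
        expiry.values().removeIf(t -> t < now);
        System.out.println(expiry.keySet().size()); // 2
    }
}
```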
Since you need two separate conditions, one on the keys and one on the values, the best performance on very large amounts of data will likely require two data structures. You could rely on a regular Set and, separately, insert the same objects into a PriorityQueue ordered by TTL. Bumping the TTL could be done by writing into a field of the object that holds an additional TTL; then, when you remove the next object, you check whether there is an additional TTL, and if so, you put it back with this new TTL and reset the additional TTL to 0. (I suggest this because the cost of removing an arbitrary element from a PriorityQueue is O(n).) This would yield O(log n) time for removal of the next object (plus the cost due to the bumped TTLs, which depends on how often that happens) and for insertion, and O(1) or O(log n) time for bumping a TTL, depending on the Set implementation you choose.
Of course, the cleanest approach would be to design a new class encapsulating all this.
Also, all of this is overkill if your data set is not very large.
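A minimal sketch of that two-structure idea (the class and field names are mine, and re-adding an element that is already present is deliberately not handled):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

class ExpiringSet<E> {
    private static final class Entry<E> {
        final E value;
        long expiresAt;  // TTL currently scheduled in the queue
        long bumpedTo;   // later TTL requested via bump(), 0 if none
        Entry(E value, long expiresAt) { this.value = value; this.expiresAt = expiresAt; }
    }

    private final Map<E, Entry<E>> entries = new HashMap<>();
    private final PriorityQueue<Entry<E>> byExpiry =
        new PriorityQueue<>((a, b) -> Long.compare(a.expiresAt, b.expiresAt));

    void add(E value, long expiresAt) {            // O(log n)
        Entry<E> e = new Entry<>(value, expiresAt);
        entries.put(value, e);
        byExpiry.add(e);
    }

    void bump(E value, long newExpiresAt) {        // O(1): avoid the O(n) queue removal
        Entry<E> e = entries.get(value);
        if (e != null && newExpiresAt > e.expiresAt) e.bumpedTo = newExpiresAt;
    }

    void prune(long now) {                         // pop expired entries, honoring bumps
        while (!byExpiry.isEmpty() && byExpiry.peek().expiresAt <= now) {
            Entry<E> e = byExpiry.poll();
            if (e.bumpedTo > now) {                // was bumped: re-insert with the new TTL
                e.expiresAt = e.bumpedTo;
                e.bumpedTo = 0;
                byExpiry.add(e);
            } else if (entries.get(e.value) == e) {
                entries.remove(e.value);
            }
        }
    }

    boolean contains(E value) { return entries.containsKey(value); }
}
```

Usage: after `add("a", 10)` and `bump("a", 30)`, a `prune(20)` keeps "a" alive (its bumped TTL is re-queued), while a later `prune(40)` removes it.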
You can implement it using a combination of two data structures.
A sorted mapping of keys to scores. And a sorted reverse mapping of scores to keys.
In Java, typically these would be implemented with TreeMap (if we are sticking to the standard Collections Framework).
Redis uses Skip-Lists for maintaining the ordering, but Skip-Lists and Balanced Binary Search Trees (such as TreeMap) both serve the purpose to provide average O(log(N)) access here.
For a given sorted set, we can implement it as an independent class as follows:
class SortedSet {
    private final TreeMap<String, Integer> keyToScore;
    private final TreeMap<Integer, Set<String>> scoreToKey;

    public SortedSet() {
        keyToScore = new TreeMap<>();
        scoreToKey = new TreeMap<>();
    }

    void addItem(String key, int score) {
        if (keyToScore.containsKey(key)) {
            // Remove the old key/score pair from the reverse map
            int oldScore = keyToScore.remove(key);
            Set<String> keys = scoreToKey.get(oldScore);
            keys.remove(key);
            if (keys.isEmpty()) {
                scoreToKey.remove(oldScore);
            }
        }
        // Add key and score to both maps
        keyToScore.put(key, score);
        scoreToKey.computeIfAbsent(score, s -> new TreeSet<>()).add(key);
    }

    List<String> getKeysInRange(int startScore, int endScore) {
        // Traverse the relevant slice of scoreToKey and collect all keys
        List<String> result = new ArrayList<>();
        for (Set<String> keys : scoreToKey.subMap(startScore, true, endScore, true).values()) {
            result.addAll(keys);
        }
        return result;
    }

    // ...
}
I am wondering if there is a more efficient method for getting objects out of my LinkedHashMap with timestamps greater than a specified time. I.e. something better than the following:
Iterator<Foo> it = foo_map.values().iterator();
Foo foo;
while (it.hasNext()) {
    foo = it.next();
    if (foo.get_timestamp() < minStamp) continue;
    break;
}
In my implementation, each of my objects has essentially three values: an "id," "timestamp," and "data." The objects are inserted in order of their timestamps, so when I call an iterator over the set, I get ordered results (as required by the linked hashmap contract). The map is keyed to the object's id, so I can quickly look them up by id.
When I look them up by a timestamp condition, however, I get an iterator with sorted results. This is an improvement over a generic hashmap, but I still need to iterate sequentially over much of the range until I find the next entry with a higher timestamp than the specified one.
Since the results are already sorted, is there any algorithm I can pass the iterator (or collection to), that can search it faster than sequential? If I went with a treemap as an alternative, would it offer overall speed advantages, or is it doing essentially the same thing in the background? Since the collection is sorted by insertion order already, I'm thinking tree map has a lot more overhead I don't need?
There is no faster way ... if you just use a LinkedHashMap.
If you want faster access, you need to use a different data structure. For example, a TreeSet with an appropriate comparator might be a better solution for this aspect of your problem. If your TreeSet is ordered by date, then calling tailSet with an appropriate dummy value gives you all elements greater than or equal to a given date.
Since the results are already sorted, is there any algorithm I can pass the iterator (or collection to), that can search it faster than sequential?
Not for a LinkedHashMap.
However, if the ordered list was an ArrayList instead, then you could use "binary search" on the list ... provided that you could lock it to prevent concurrent modifications while you are searching. (Actually, concurrency is a potential issue to consider no matter how you implement this ... including your current linear search.)
If you want to keep the ability to do id lookups, then you need two data structures; e.g. a TreeSet and a HashMap which share their element objects. A TreeSet will probably be more efficient than trying to maintain an ArrayList in order assuming that there are random insertions and/or random deletions.
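A sketch of the timestamp side of that design, using a TreeMap keyed by timestamp (the separate id-to-object HashMap is omitted here):

```java
import java.util.NavigableMap;
import java.util.TreeMap;

public class TailDemo {
    public static void main(String[] args) {
        // Timestamp -> data; TreeMap gives O(log n) range lookups
        TreeMap<Long, String> byTime = new TreeMap<>();
        byTime.put(100L, "a");
        byTime.put(200L, "b");
        byTime.put(300L, "c");

        // Everything with timestamp >= 150, without scanning from the start
        NavigableMap<Long, String> recent = byTime.tailMap(150L, true);
        System.out.println(recent.keySet()); // [200, 300]
    }
}
```

This assumes timestamps are unique enough to serve as keys; if they are not, the comparator-with-id trick from the earlier answer applies here as well.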