I just finished the main part of the current data structures project, and am working on collecting the statistics. One requirement is that a count of all the references within the TreeMap be recorded.
This Map contains a 31,000+ nodes where a String is mapped to a TreeSet of indeterminate size.
I need to traverse the map and keep a running count of the number of items in the set.
Originally my idea was this:
Set<String> keySet= lyricWords.keySet();
Iterator<String> iter= keySet.iterator();
String current= iter.next();
while (iter.hasNext){
runCount+= lyricWords.get(current).size();
}
The runtime for this is far too long to be acceptable. Is there a more efficient way to do this on the final structure? I could keep a count as the map is built, but the professor wants the numbers to be based on the final structure itself.
I'm not sure. But, probably, you have infinitive loop. Try:
runCount+= iter.next().size();
for (Map.Entry<String, TreeSet> e: lyricWords.entrySet()) {
runCount+= e.getValue().size();
}
I dont see a problem with keeping a count as the map is built.
The count will be correct at the end, and you wont have to incur the cost of iterating through the entire thing again.
I think that the tree can and should keep track of its size
This isn't of much use to you since this is an assignment you're working on, but this is an example where a data structure specifically designed for mapping keys to multiple values shows how much better it is than a Map<T, Collection<V>>.
Guava's Multimap collection type keeps track of the total number of entries it contains, so if you were using a TreeMultimap<String, Foo> rather than a TreeMap<String, TreeSet<Foo>> you could just call multimap.size() to get the number you're looking for.
By the way, the Multimap implementations store a running total of the number of entries which is updated when entries are added to or removed from it. You might be able to do this by doing some fancy stuff with subclassing the TreeMap and wrapping the TreeSets that are added to it, but it would be quite challenging to make it all work properly I think.
Related
I have a list of numbers. In my program I would frequently be checking if a certain number is part of my list. If it is not part of my list, I add it to the list, otherwise I do nothing. I have found myself using a hashmap to store the items instead of an arraylist.
void add(Map<Integer, Integer> mp, int item){
if(!mp.containsKey(item)){
mp.put(item, 1);
}
}
As you can see above, I put anything as the value, since I would not be using the values.
I have tested this process to be a lot faster than using an arraylist. (Also, containsKey() for hashmap is O(1) while contains() for arraylist is O(n))
Although it works well for me, it feels awkward for the simple reason that it is not the right data structure. Is this a good practice? Is there a better DS that I can use? Is there not any list that utilizes hashing to store values?
I have a list of numbers. In my program I would frequently be checking if a certain number is part of my list. If it is not part of my list, I add it to the list, otherwise I do nothing.
You are describing a set. From the Javadoc, a java.util.Set is:
A collection that contains no duplicate elements.
Further, the operation you are describing is add():
Adds the specified element to this set if it is not already present.
In code, you would create a new Set (this example uses a HashSet):
Set<Integer> numbers = new HashSet<>();
Then any time you encounter a number you wish to keep track of, just call add(). If x is already present, the set will remain unchanged and will not throw any errors – you don't need to be careful about adding things, just add anything you see, and the set sort of "filters" out the duplicates for you.
numbers.add(x);
It's beyond your original question, but there are various things you can do with the data once you've populated a set - check if other numbers are present/absent, iterate the numbers in the set, etc. The Javadoc shows which functionality is available to use.
An alternative solution from the standard library is a java.util.BitSet. This is - in my opinion - only an option if the values of item are not too big, and if they are relatively close together. If your values are not near each other (and not starting near to zero), then it might be worthwhile looking for third party solutions that offers sparse bit sets or other sparse data structures.
You can use a bit set like:
BitSet bits = new BitSet();
void add(int item) {
bits.set(item);
}
And as suggested in the comments by Eritrean, you can also use a Set (e.g. HashSet). Internally, a HashSet uses a HashMap, so it will perform similar to your current solution, but it does away with having to put sentinel values in yourself (you just add or remove the item itself).
As an added benefit, if you use Collection<Integer> as the type of parameters/fields in your code, you can easily switch between using an ArrayList or an HashSet and test it without having to change code all over the place.
According to this question I have ordered a Java Map, as follows:
ValueComparator bvc = new ValueComparator(originalMap);
Map<String,Integer> sortedMap = new TreeMap<String,Integer>(bvc);
sortedMap.putAll(originalMap);
Now, I would like to extract the K most relevant values from the map, in top-K fashion. Is there a highly efficient way of doing it without iterating through the map?
P.S., some similar questions (e.g., this) ask for a solution to the top-1 retrieval problem.
No, not if you use a Map. You'd have to iterate over it.
Have you considered using a PriorityQueue? It's Java's implementation of a heap. It has efficient operations for insertion of arbitrary elements and for removal of the "minimum". You might think about doing this here. Instead of a Map, you could put them into a PriorityQueue ordered by relevance, with the most relevant as the root. Then, to extract the K most relevant, you'd just pop K elements from the PriorityQueue.
If you need the map-like property (mapping from String to Integer), then you could write a class that internally keeps everything in both a PriorityQueue and a HashMap. When you insert, you insert into both; when you remove the minimal element, you pop from the PriorityQueue, and that then tells you which element you also need to remove from your HashMap. This will still give you log-time inserts and min-removals.
I need to randomly access keys in a HashMap. Right now, I am using Set's toArray() method on the Set that HashMap's keySet() returns, and casting it as a String[] (my keys are Strings). Then I use Random to pick a random element of the String array.
public String randomKey() {
String[] keys = (String[]) myHashMap.keySet().toArray();
Random rand = new Random();
return keyring[rand.nextInt(keyring.length)];
}
It seems like there ought to be a more elegant way of doing this!
I've read the following post, but it seems even more convoluted than the way I'm doing it. If the following solution is the better, why is that so?Selecting random key and value sets from a Map in Java
There is no facility in a HashMap to return an entry without knowing the key so, if you want to use only that class, what you have is probably as good a solution as any.
Keep in mind however that you're not actually restricted to using a HashMap.
If you're going to be reading this collection far more often than writing it, you can create your own class which contains both a HashMap of the mappings and a different collection of the keys that allows random access (like a Vector).
That way, you won't incur the cost of converting the map to a set then an array every time you read, it will only happen when necessary (adding or deleting items from your collection).
Unfortunately, a Vector allows multiple keys of the same value so you would have to defend against that when inserting (to ensure fairness when selecting a random key). That will increase the cost of insertion.
Deletion would also be increased cost since you would have to search for the item to remove from the vector.
I'm not sure there's an easy single collection for this purpose. If you wanted to go the whole hog, you could have your current HashMap, a Vector of the keys, and yet another HashMap mapping the keys to the vector indexes.
That way, all operations (insert, delete, change, get-random) would be O(1) in time, very efficient in terms of time, perhaps less so in terms of space :-)
Or there's a halfway solution that still uses a wrapper but creates a long-lived array of strings whenever you insert, change or delete a key. That way, you only create the array when needed and you still amortise the costs. Your class then uses the hashmap for efficient access with a key, and the array for random selection.
And the change there is minimal. You already have the code for creating the array, you just have to create your wrapper class which provides whatever you need from a HashMap (and simply passes most calls through to the HashMap) plus one extra function to get a random key (using the array).
Now, I'd only consider using those methods if performance is actually a problem though. You can spend untold hours making your code faster in ways that don't matter :-)
If what you have is fast enough, it's fine.
Why not use the Collections.shuffle method, saved to a variable and simply pop one off the top as required.
http://docs.oracle.com/javase/7/docs/api/java/util/Collections.html#shuffle(java.util.List)
You could avoid copying the whole keyset into a temporary data structure, by first getting the size, choosing the random index and then iterating over the keyset the appropriate number of times.
This code would need to be synchronized to avoid concurrent modifications.
If you really just want any element of a set, this will work fine.
String s = set.iterator().next();
If you are unsure whether there is an element in the set, use:
String s;
Iterator<String> it = set.iterator();
if (it.hasNext()) {
s = it.next();
}
else {
// set was empty
}
Redis has a data structure called a sorted set.
The interface is roughly that of a SortedMap, but sorted by value rather than key. I could almost make do with a SortedSet, but they seem to assume static sort values.
Is there a canonical Java implementation of a similar concept?
My immediate use case is to build a set with a TTL on each element. The value of the map would be the expiration time, and I'd periodically prune expired elements. I'd also be able to bump the expiration time periodically.
So... several things.
First, decide which kind of access you'll be doing more of. If you'll be doing more HashMap actions (get, put) than accessing a sorted list, then you're better off just using a HashMap and sorting the values when you want to prune the collection.
As for pruning the collection, it sounds like you want to just remove values that have a time less than some timestamp rather than removing the earliest n items. If that's the case then you're better off just filtering the HashMap based on whether the value meets a condition. That's probably faster than trying to sort the list first and then remove old entries.
Since you need two separate conditions, one on the keys and the other one on the values, it is likely that the best performance on very large amounts of data will require two data structures. You could rely on a regular Set and, separately, insert the same objects in PriorityQueue ordered by TTL. Bumping the TTL could be done by writing in a field of the object that contains an additional TTL; then, when you remove the next object, you check if there is an additional TTL, and if so, you put it back with this new TTL and additional TTL = 0 [I suggest this because the cost of removal from a PriorityQueue is O(n)]. This would yield O(log n) time for removal of the next object (+ cost due to the bumped TTLs, this will depend on how often it happens) and insertion, and O(1) or O(log n) time for bumping a TTL, depending on the implementation of Set that you choose.
Of course, the cleanest approach would be to design a new class encapsulating all this.
Also, all of this is overkill if your data set is not very large.
You can implement it using a combination of two data structures.
A sorted mapping of keys to scores. And a sorted reverse mapping of scores to keys.
In Java, typically these would be implemented with TreeMap (if we are sticking to the standard Collections Framework).
Redis uses Skip-Lists for maintaining the ordering, but Skip-Lists and Balanced Binary Search Trees (such as TreeMap) both serve the purpose to provide average O(log(N)) access here.
For a given sort set,
we can implement it as an independent class as follows:
class SortedSet {
TreeMap<String, Integer>> keyToScore;
TreeMap<Integer, Set<String>>> scoreToKey
public SortedSet() {
keyToScore= new TreeMap<>();
scoreToKey= new TreeMap<>();
}
void addItem(String key, int score) {
if (keyToScore.contains(key)) {
// Remove old key and old score
}
// Add key and score to both maps
}
List<String> getKeysInRange(int startScore, int endScore) {
// traverse scoreToKey and retrieve all values
}
....
}
I have a multiset in guava and I would like to retrieve the number of instances of a given element without iterating over this multiset (I don't want to iterate because I assume that iterating takes quite some time, as it looks through all the collection).
To do that, I was thinking first to use the entryset() method of multiset, to obtain a set with single instances and their corresponding count. Then, transform this set into a hashmap (where keys are the elements of my set, and values are their instance count). Because then I can use the methods of hashmap to directly retrieve a value from its key - done! But this makes sense only if I can transform the set into the hashmap in a quick way (without iterating trhough all elements): is it possible?
(as I said I expect this question to be flawed on multiple counts, I'd be happy if you could shed light on the conceptual mistakes I probably make here. Thx!)
Simply invoke count(element) on your multiset -- voila!
You may know in Guava Multiset is an interface, not a class.
If you just want to know the repeated number of an element, call Multiset.count(Object element).
Please forget my following statement:
Then if you are using a popular implementation HashMultiset, there is already a HashMap<E, AtomicInteger> working under the scene.
That is, when the HashMultiset iterates, also a HashMap iterates. No need to transform into another HashMap.