Find top ten values in a Map - java

Say I have a TreeMap<String, TreeSet<Song>>, where the Song object has three String fields and an internal compareTo method. The keys for the map are unique words in the lyrics that are not common words such as "she", "the", "if", or "on". There are multiple copies of each Song in the map, since an average of 60 words map to a single Song.
For extra credit, the professor asked us to come up with an algorithm to find the top 10 values in the map. I didn't solve the problem in time, which is why I'm asking here.
The part that I'm stumped on is, unlike with an ordered array or list, you can't just grab the top values sequentially. So, I thought about:
Create a PriorityQueue<Node> with the Comparator sorting the Nodes based on the Set size
iterate over the map
    for each map node
        create a Node object with the key-value pair
        insert Node into the queue
Even though the PriorityQueue will end up with all the key-value pairs, the top sizes will be at the top, and I can just retrieve the first ten.
This seems like a very roundabout way, since this particular map has 31,000+ nodes mapping to over 637,000 values. Is there a better way?

A simple modification of your algorithm:
Create a PriorityQueue<Node> with the Comparator sorting the Nodes based on the Set size
iterate over the map
    for each map node
        if the queue holds fewer than ten entries, or the value for the node is larger than the smallest entry in the priority queue
            create a Node object with the key-value pair
            insert Node into the queue
            trim the queue to ten entries
At completion, the priority queue will only contain the top 10 entries.
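The pseudocode above can be sketched in Java roughly as follows. This is a minimal sketch, not the answerer's own code: the class name TopBySetSize and the method largest are hypothetical, and it assumes the queue is a min-heap ordered by set size, so the smallest of the current top n sits at the head and is the one to evict.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Set;

// Hypothetical helper; class and method names are illustrative only.
class TopBySetSize {
    /** Returns up to n entries whose value sets are largest, largest first. */
    static <K, S extends Set<?>> List<Map.Entry<K, S>> largest(Map<K, S> map, int n) {
        List<Map.Entry<K, S>> result = new ArrayList<>();
        if (n <= 0) {
            return result;
        }
        // Min-heap by set size: the head is the smallest of the current top n.
        Comparator<Map.Entry<K, S>> bySize =
                Comparator.comparingInt(e -> e.getValue().size());
        PriorityQueue<Map.Entry<K, S>> heap = new PriorityQueue<>(bySize);
        for (Map.Entry<K, S> entry : map.entrySet()) {
            if (heap.size() < n) {
                heap.add(entry);
            } else if (entry.getValue().size() > heap.peek().getValue().size()) {
                heap.poll();      // drop the current smallest
                heap.add(entry);  // keep the larger newcomer
            }
        }
        // Draining the heap yields smallest first, so reverse at the end.
        while (!heap.isEmpty()) {
            result.add(heap.poll());
        }
        Collections.reverse(result);
        return result;
    }
}
```

Because the heap never holds more than n entries, each map entry costs at most O(log n), so for n = 10 a single pass over the 31,000+ keys does the work without sorting the whole map.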

I am not sure whether you want the top 10 by key, in which case Soldier.moth is right: you can obtain a descending view by calling descendingMap and then iterate over the first 10 elements. But if you want the top 10 by some other relation, just iterate over the entrySet and keep the current top 10 in a sorted data structure, such as a TreeSet with a comparator based on size (not sure what size you mean, but you probably know). For every element, replace the smallest of the 10 if it is smaller than the current element. With a TreeSet you obtain the smallest with first().

Related

Sorting nodes under two conditions?

I only just started learning Java and am stuck on this problem.
Let's say I have a list of employees (I'll use only three in this example) in no particular order, and I go through the list and create a sorted linked list of nodes that each contain a name and a salary per week. These three nodes would individually look like so:
(John, 1000)
(Bob, 1000)
(Adam, 1000)
And I wanted to sort it first by salary then alphabetically by name so all the nodes connected would look something like this:
(Adam, 1000)(Bob, 1000)(John, 1000)
I also have a way to increase the salary, so if I were to do something like bobNode.increaseBy(200) (the amount increased at a time will always be the same, i.e. 200 every time the method is called for any name), the connected nodes would update and look something like this:
(Bob, 1200)(Adam, 1000)(John, 1000)
Is there any efficient or easy way to do this? Currently, I have a compareTo method in my Node class that returns this.name.compareTo(other.name), so the nodes are sorted alphabetically as I go through the list of employees. Is there any way to check both conditions and sort?
I was thinking about doing something like: if salary.compareTo(other.salary) == 0, compare the names instead. But since the nodes would already exist in the linked list, it wouldn't really work.
What about adding and sorting the nodes alphabetically first and whenever salary of a node is adjusted removing that node and adding it again to the correct position?
Another idea I had that's similar to the previous was to remove the adjusted nodes and creating a new linked list of nodes that contains all the nodes with that amount of salary sorted alphabetically. I will then add these new nodes after I have gone through the list of employees. Wouldn't this be a bit problematic if say I had a list of 200,000 employees with a wide range of salary thus I would have to create and iterate through many nodes?
I also wanted to note that the salary can increase without having all the employees be added first.
Any help and ideas would be greatly appreciated!
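One way to combine both conditions is a two-level comparator plus the remove-and-re-add idea from the question. The sketch below is an assumption-laden illustration, not a drop-in answer: it swaps the hand-written linked list for a java.util.TreeSet, and the Employee and Payroll classes and their members are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical employee type; field names are assumptions for illustration.
class Employee {
    final String name;
    int salary;

    Employee(String name, int salary) {
        this.name = name;
        this.salary = salary;
    }

    @Override
    public String toString() {
        return "(" + name + ", " + salary + ")";
    }
}

class Payroll {
    // Highest salary first; ties are broken alphabetically by name.
    // Note: two employees with the same name and salary count as duplicates here.
    static final Comparator<Employee> ORDER =
            Comparator.comparingInt((Employee e) -> e.salary).reversed()
                      .thenComparing(e -> e.name);

    private final TreeSet<Employee> sorted = new TreeSet<>(ORDER);

    void add(Employee e) {
        sorted.add(e);
    }

    // Remove before changing the salary: mutating a sort key while the element
    // is still inside the TreeSet would leave it filed under the old position.
    void increaseBy(Employee e, int amount) {
        sorted.remove(e);
        e.salary += amount;
        sorted.add(e);
    }

    List<Employee> inOrder() {
        return new ArrayList<>(sorted);
    }
}
```

Each add or increaseBy is O(log n), so a raise does not require rebuilding or rescanning the list, and salaries can change before all employees have been added.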

What is the right algorithm and data structure for a user-defined collection with specific insertion and deletion?

This question was asked in an interview at an investment banking company.
I have to design myCache, which keeps a cache of studentRecords objects; there can be only one myCache object for the studentRecords collection. When the user wants to insert a record into studentRecords, it is inserted directly only if there are fewer than 20 records in the collection; otherwise the least used record is removed from studentRecords first, and then the new record is inserted. Records are kept in sorted order by rank. When the user wants to read a record, myCache is checked first; if the record does not exist there, it is read from the studentRecords collection.
I created a doubly linked list and insert records by rank. I can also make the myCache class a singleton that reads records from the cache. But how do I delete the records that are least used?
I could use an ArrayList that deletes records from the top of the array (the least used record), but then I cannot keep the elements ordered by rank, and reading a record by rank becomes expensive again.
Is there any other solution that would have impressed the interviewer?
The myCache class has functions like:
public void removeRecordFromStudentRecords(String rank);
public void addRecordToStudentRecords(StudentRecords st);
public Student readRecordFromStudentRecords(String rank);
Table of StudentRecords:
SrNo  rank  name   maths  science  total  percentage
1     1     rohan  90     90       180    90
2     2     sohan  80     90       160    80
3     3     abhi   70     70       140    70
If we're talking about a cache, we should optimize time complexity first and memory later.
So, in this case, I can offer the following solution:
Use a Map (e.g. HashMap) for storing records (key: recordId, value: Record).
Use a Stack for the last used items (value: recordId).
Use a Tree (e.g. a BST) for holding ranks (key: rankValue, value: recordId).
The combination of these three data structures gives the fastest solution (I guess).
Read by id: O(1), just a simple get from the map.
Add record: O(log N), because we need to insert the key into the tree (we do not include balancing when counting complexity).
Remove by rank: O(log N), simply finding the recordId by rank in the tree (don't forget to remove the record from the Map and the recordId from the Stack).
This is just a brief overview of the problem. I guess it's enough info to understand the main idea.
In order to keep a track of the least used record, you need to store the number of hits each record has (if you do not know what "hits" are, I suggest you look up "hits and misses in caching") So each studentRecord can be an object of a class as follows:
class StudentRecord {
    int unique_id;
    int ranking;
    int hits; // number of times this record has been read
}
StudentRecord studentRecord = new StudentRecord();
Sort your cache based on studentRecord.ranking, and when you need to decide which studentRecord to delete, simply traverse the cache comparing hits and delete the element with the minimum hits.
To maintain hits, whenever you get a query for a studentRecord based on its unique_id, you increment its hits by 1. Thus, hits gives you a metric of which studentRecord is most used or least used.
EDIT: Your question is now much clearer. For sorting, you can use simple insertion sort. The reasons for this are 1) you have max 20 elements in your cache and 2) when you try and insert a new element, insertion sort will help you perfectly to find the index where you can place the element. In fact, technically speaking, you need only sort once. Then you just need to figure out where to insert future elements.
I would say a simple list with random access (like java.util.ArrayList) will suffice. It gives you random access as in arrays and can also accommodate fewer than 20 elements. I see no reason to use a doubly linked list, since there is no need to access an element's left and right neighbours here...
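The hit-counting idea from this answer might look roughly like the sketch below. It is only an illustration under stated assumptions: HitCountingCache and its methods are hypothetical names, the record is trimmed down to ranking and hits, the rank is an int key, and a TreeMap stands in for the rank-sorted list with insertion sort described above.

```java
import java.util.TreeMap;

// Trimmed-down record: only the fields this sketch needs.
class StudentRecord {
    final int ranking;
    int hits; // incremented on every successful read

    StudentRecord(int ranking) {
        this.ranking = ranking;
    }
}

// Hypothetical cache class; names and the int rank key are assumptions.
class HitCountingCache {
    private final int capacity;
    // TreeMap keeps the cached records sorted by ranking.
    private final TreeMap<Integer, StudentRecord> byRank = new TreeMap<>();

    HitCountingCache(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be at least 1");
        }
        this.capacity = capacity;
    }

    void add(StudentRecord record) {
        boolean isNew = !byRank.containsKey(record.ranking);
        if (isNew && byRank.size() >= capacity) {
            evictLeastUsed();
        }
        byRank.put(record.ranking, record);
    }

    // Returns null on a cache miss; the caller would then read the backing collection.
    StudentRecord read(int ranking) {
        StudentRecord record = byRank.get(ranking);
        if (record != null) {
            record.hits++;
        }
        return record;
    }

    boolean contains(int ranking) {
        return byRank.containsKey(ranking);
    }

    int size() {
        return byRank.size();
    }

    // Linear scan for the minimum hit count; cheap for a 20-element cache.
    private void evictLeastUsed() {
        StudentRecord victim = null;
        for (StudentRecord r : byRank.values()) {
            if (victim == null || r.hits < victim.hits) {
                victim = r;
            }
        }
        if (victim != null) {
            byRank.remove(victim.ranking);
        }
    }
}
```

With the capacity set to 20, the linear scan in evictLeastUsed touches at most 20 records, which is why the answer treats a simple traversal as good enough.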
The Least Recently Used scheduling technique can be applied here: you can keep a byte field for each entry in your list.
Every time an entry is used, you push a 1 into its byte (shifting with b >> 1).
So entries that are used more frequently will have a lot of 1s in the binary representation of their byte.
Entries that are not used at all will have all 0s.
Every time you need to delete an entry from your cache, just delete one whose byte is 0, or the one with the smallest value in this byte field.
Also, to remember references over longer time spans than just eight, you can use bigger data types.
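A minimal sketch of such a usage byte follows. It assumes the textbook "aging" variant, in which every entry's byte is shifted right once per time slice and the top bit is set only for entries used during that slice; the answer does not spell out when the shift happens, so that timing, the class name AgingCounter and the method tick are all assumptions.

```java
// Hypothetical usage history for one cache entry (aging scheme, 8 slices deep).
class AgingCounter {
    private int history; // only the low 8 bits are used

    // Call once per time slice for every entry in the cache.
    void tick(boolean usedThisSlice) {
        history = (history >>> 1) & 0x7F; // older slices move toward the low bits
        if (usedThisSlice) {
            history |= 0x80;              // most recent slice is the top bit
        }
    }

    // Smaller value means less recently or less frequently used.
    int value() {
        return history;
    }
}
```

When the cache is full, the entry with the smallest value() is the eviction candidate; switching the field to a wider mask would remember more than eight slices.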

Why is accessing an item by index slower in a linked list than an array?

I think I am missing a very obvious point but could not find it in my Java textbook.
I understand that node storage does not necessarily have to be contiguous in memory for linked list. Does this also mean that a linked list is not indexable? If so, then the only way to find an item in a linked list is to traverse the list, right, whereas you can get from an array by index?
Why is accessing an item by index slower in a linked list than an array?
A linked list has a chain of entries. If you want to get (say) the element at position 42, the code has to:
get the entry for the first element (position 0)
follow the next link to the entry for position 1
follow the next link to the entry for position 2
and so on .... 42 times in total.
There is no short cut.
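To make the walk concrete, here is a minimal sketch of a singly linked list of ints next to an array. IntNode and LinkWalk are hypothetical names, not the java.util.LinkedList internals.

```java
// Hypothetical node type for a singly linked list of ints.
class IntNode {
    final int value;
    final IntNode next;

    IntNode(int value, IntNode next) {
        this.value = value;
        this.next = next;
    }
}

class LinkWalk {
    // O(index): must follow one link per position before reaching the element.
    static int getFromList(IntNode head, int index) {
        if (index < 0) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        IntNode current = head;
        for (int i = 0; i < index && current != null; i++) {
            current = current.next;
        }
        if (current == null) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return current.value;
    }

    // O(1): the element's location follows directly from the index.
    static int getFromArray(int[] array, int index) {
        return array[index];
    }
}
```

The loop in getFromList is the "follow the next link" step repeated index times; getFromArray has no loop at all.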
I am still not understanding why a linked list is not indexable ....
Now a LinkedList is indexable in the sense that there is a get(int) operation that works. It is just that indexing a LinkedList is inefficient. In general, it takes O(N) steps to perform a get(i) in a linked list of length N. By contrast with an array, or an ArrayList, you can retrieve any element of the data structure in one step. We say that the complexity is O(1).
Contrast this with Set objects in general, and HashSet in particular. The HashSet class is NOT indexable because there is no get(int) method to retrieve the set element at position i. Indeed, even the notion of "position i" in a set is meaningless. The ordering of the elements in a Set is unspecified and (for some Set implementations, like HashSet) it may be effectively indeterminate.
Some linked list implementations provide a way to access their elements by index, but the fact is that if you want the 10th element of a linked list, the program still has to walk through the whole sequence from 0 to 9, because the elements may be spread across memory. On the other hand, when you ask for the 10th element of an array by index, the exact position of that element is computed and the program jumps directly to it. Arrays and lists have different purposes: if your algorithm needs to go back and forth over your data structure, it is much more efficient to use an array. If you mostly need add/remove operations, it is more efficient to use a list.
With a linked list you can add and remove elements at any time, so an index makes little sense. Imagine that you create an index that points at the third element of the list. After that, you insert a new element at the beginning of the list. What value should the index return?
However, it would be possible, for example, to create an index at the middle of the list and use it only if you add or remove elements in the last half of the list.
@paxdiablo explains it very well in "Is there a known implementation of an indexed linked list?"

Adding elements into ArrayList at position larger than the current size

Currently I'm using an ArrayList to store a list of elements, and I need to insert new elements at specific positions. Sometimes I need to insert an element at a position larger than the current size. For example:
ArrayList<String> arr = new ArrayList<String>();
arr.add(3,"hi");
Now I already know this will throw an IndexOutOfBoundsException. Is there another way or another object where I can do this while still keeping the order? This is because I have methods that find elements based on their index. For example:
ArrayList<String> arr = new ArrayList<String>();
arr.add("hi");
arr.add(0,"hello");
I would expect to find "hi" at index 1 instead of index 0 now.
So in summary, short of manually inserting null into the elements in-between, is there any way to satisfy these two requirements:
Insert elements into position larger than current size
Push existing elements to the right when I insert elements in the middle of the list
I've looked at "Java ArrayList add item outside current size", as well as HashMap, but HashMap doesn't satisfy my second criterion. Any help would be greatly appreciated.
P.S. Performance is not really an issue right now.
UPDATE: There have been some questions on why I have these particular requirements, it is because I'm working on operational transformation, where I'm inserting a set of operations into, say, my list (a math formula). Each operation contains a string. As I insert/delete strings into my list, I will dynamically update the unapplied operations (if necessary) through the tracking of each operation that has already been applied. My current solution now is to use a subclass of ArrayList and override some of the methods. I would certainly like to know if there is a more elegant way of doing so though.
Your requirements are contradictory:
... I will need to insert new elements at specific positions.
There is a need for me to enter elements at a position larger than the current size.
These imply that positions are stable; i.e. that an element at a given position remains at that position.
I would expect to find "hi" at index 1 instead of index 0 now.
This states that positions are not stable under some circumstances.
You really need to make up your mind which alternative you need.
If you must have stable positions, use a TreeMap or HashMap. (A TreeMap allows you to iterate the keys in order, but at the cost of more expensive insertion and lookup ... for a large collection.) If necessary, use a "position" key type that allows you to "always" generate a new key that goes between any existing pair of keys.
If you don't have to have stable positions, use an ArrayList, and deal with the case where you have to insert beyond the end position using append.
I fail to see how it is sensible for positions to be stable if you insert beyond the end, and allow instability if you insert in the middle. (Besides, the latter is going to make the former unstable eventually ...)
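For the non-stable alternative, "insert beyond the end using append" could be sketched as below. Positions.insertAt is a hypothetical helper, not part of java.util, and it deliberately ignores the gap between the requested index and the end of the list.

```java
import java.util.List;

class Positions {
    // Inserts at index when it is inside the list (shifting later elements right),
    // and appends when index is at or beyond the current size.
    // A negative index still throws IndexOutOfBoundsException from List.add.
    static <T> void insertAt(List<T> list, int index, T value) {
        if (index >= list.size()) {
            list.add(value);
        } else {
            list.add(index, value);
        }
    }
}
```

With this helper, inserting "hi" at position 3 into an empty list puts it at index 0, and a later insert of "hello" at position 0 moves "hi" to index 1, which is exactly the instability the answer points out.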
You can even use a TreeMap to maintain the order of the keys.
First and foremost, I would say use a Map instead of a List. I guess your problem can be solved in a better way if you use a Map. But in any case, if you really want to do this with an ArrayList:
ArrayList<String> a = new ArrayList<String>(); // create an empty list
// Pre-fill with n nulls (n is 100 here; pick a value that suits your requirement).
a.addAll(Arrays.asList(new String[100]));
a.add(7, "hello");
a.add(2, "hi");
a.add(1, "hi2");
Use the Vector class to solve this issue.
Vector<String> vector = new Vector<String>();
vector.setSize(100);
vector.set(98, "a");
When setSize is called with 100, all 100 elements are initialized to null.
For those who are still dealing with this, you can do it like this:
Object[] array = new Object[10];
array[0] = "1";
array[3] = "3";
array[2] = "2";
array[7] = "7";
List<Object> list = Arrays.asList(array);
But you need to know the total size up front, and the list returned by Arrays.asList is fixed-size, so elements cannot be added or removed afterwards. This should be just a comment, but I do not have enough reputation to post one.

Comparator for TreeBag to sort by the number of occurrences

I have a source of strings (let us say, a text file) and many strings repeat multiple times. I need to get the top X most common strings in the order of decreasing number of occurrences.
The idea that came to mind first was to create a sortable Bag (something like org.apache.commons.collections.bag.TreeBag) and supply a comparator that will sort the entries in the order I need. However, I cannot figure out what is the type of objects I need to compare. It should be some kind of an internal map that combines my object (String) and the number of occurrences, generated internally by TreeBag. Is this possible?
Or would I be better off simply using a HashMap and sorting it by value, as described in, for example, "Java sort HashMap by value"?
Why don't you put the strings in a map? Step 1: build a map from each string to the number of times it appears in the text.
Step 2: traverse the items in the map and keep adding them to a min-heap of size X. If the heap is full, extract the minimum before inserting (and only when the new item's count is larger than that minimum).
This takes O(n log x) time.
Otherwise, after step 1, sort the items by number of occurrences and take the first x items. A tree map would come in handy here :) (I'd add a link to the javadocs, but I'm on a tablet.)
This takes O(n log n) time.
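The two steps of the heap approach might be sketched like this. TopWords.mostCommon is a hypothetical helper; it adds each entry and then trims the heap back to x, which has the same effect as the "extract min first if the heap is full" rule without discarding a larger minimum by mistake.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

class TopWords {
    /** Returns up to x strings, most frequent first (tie order is unspecified). */
    static List<String> mostCommon(Iterable<String> words, int x) {
        // Step 1: count occurrences.
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);
        }
        List<String> result = new ArrayList<>();
        if (x <= 0) {
            return result;
        }
        // Step 2: min-heap by count that never holds more than x entries.
        Comparator<Map.Entry<String, Integer>> byCount = Map.Entry.comparingByValue();
        PriorityQueue<Map.Entry<String, Integer>> heap = new PriorityQueue<>(byCount);
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            heap.add(entry);
            if (heap.size() > x) {
                heap.poll(); // discard the least frequent of the x + 1 candidates
            }
        }
        // The heap drains least frequent first, so reverse for decreasing order.
        while (!heap.isEmpty()) {
            result.add(heap.poll().getKey());
        }
        Collections.reverse(result);
        return result;
    }
}
```

Here n is the number of distinct strings: each one costs O(log x) heap work, which matches the O(n log x) bound above.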
With Guava's TreeMultiset, just use Multisets.copyHighestCountFirst.
