Best Data Structure for fast retrieval, update, and keeping ordering

The problem is as follows:
I need to keep track of URLs and their click counts.
I need to be able to update a URL's click count quickly when a user clicks on that URL.
I need to be able to retrieve the 10 URLs with the highest click counts quickly.
NOTE: Assume you cannot use a database.
What is the best data structure to achieve this?
I have thought about using a map, but a map doesn't keep track of the ordering of the top 10 clicks.

You need an additional List<Map.Entry<URL,Integer>> holding the top ten, with T being the click count of the lowest entry in that list.
If you count another click and the new count is still not greater than T, do nothing.
If the increased count is greater than T, check whether the URL is already in the list. If it is, re-sort the list, because the entry's position may have changed. If it is not, add the entry to the list, sort, and delete the last entry if the list has more than 10 entries. Then update T.

The best data structure I can think of is a TreeSet.
The elements of a TreeSet are kept sorted, so you can easily find the top items.
Define a separate comparator class for URL that implements Comparator and sorts by count,
and pass it to the TreeSet constructor. Insertion, deletion, and lookup all run in O(log n).
Note that a TreeSet does not re-sort an element whose count changes in place: to update a count, remove the element, change it, and add it back.
Here is how you would define the structure.
TreeSet<URL> treeSet = new TreeSet<URL>(new URLComparator());
class URL {
private String url;
int count;
public URL(String string, int i) {
url = string;
count = i;
}
@Override
public int hashCode() {
return url.hashCode();
}
@Override // Not required; only used for testing
public String toString() {
return "url : " + url + " ,count : " + count+"\n";
}
}
One more note: the hashCode() of the URL class simply delegates to the hashCode() of the url string.
This is how you define the URLComparator class. The comparison is based on the URL count.
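class URLComparator implements Comparator<URL> {
@Override
public int compare(URL o1, URL o2) {
// Descending by count. Caveat: URLs with equal counts compare as equal,
// so the TreeSet keeps only one of them unless ties are broken on the url.
return Integer.compare(o2.count, o1.count);
}
}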
Testing
TreeSet<URL> treeSet = new TreeSet<URL>(new URLComparator());
treeSet.add(new URL("url1", 12));
treeSet.add(new URL("url2", 0));
treeSet.add(new URL("url3", 5));
System.out.println(treeSet);
Output:-
[url : url1 ,count : 12
, url : url3 ,count : 5
, url : url2 ,count : 0
]
To print the top 10 elements, use the following code.
Iterator<URL> iterator = treeSet.iterator();
int count = 0;
while(count < 10 && iterator.hasNext() ){
System.out.println(iterator.next());
count++;
}
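
As a rough sketch of how the update path could look with this approach: a HashMap finds the current entry for a URL, and a TreeSet keeps the entries ranked. The class and method names here (ClickCounter, click, top) are made up for illustration and are not from the answer above. The comparator breaks ties on the url so that two URLs with the same count are both kept.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical helper, not part of the answer above.
class ClickCounter {
    // Immutable (url, count) pair; a changed count means a new entry.
    private static final class Entry {
        final String url;
        final int count;

        Entry(String url, int count) {
            this.url = url;
            this.count = count;
        }
    }

    // Highest count first; ties are broken on the url so that URLs
    // with equal counts do not collapse into one element.
    private static final Comparator<Entry> BY_COUNT_DESC =
            Comparator.comparingInt((Entry e) -> e.count).reversed()
                      .thenComparing(e -> e.url);

    private final Map<String, Entry> byUrl = new HashMap<>();
    private final TreeSet<Entry> ranked = new TreeSet<>(BY_COUNT_DESC);

    // Records one click: remove the old entry, insert the updated one.
    void click(String url) {
        Entry old = byUrl.get(url);
        if (old != null) {
            ranked.remove(old);
        }
        Entry updated = new Entry(url, old == null ? 1 : old.count + 1);
        byUrl.put(url, updated);
        ranked.add(updated);
    }

    // Returns up to n urls, most clicked first.
    List<String> top(int n) {
        List<String> result = new ArrayList<>();
        for (Entry e : ranked) {
            if (result.size() == n) {
                break;
            }
            result.add(e.url);
        }
        return result;
    }
}
```

Each click costs one hash lookup plus a remove and an add on the tree, and reading the top 10 only walks the first 10 elements.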

You can use a Map<String, Integer> for this use case:
It keeps track of the key (URL) and the value (click count).
When a user clicks on a URL, you put the URL back into the map with its updated click count.
You can retrieve the top 10 click counts by sorting the map's entry set:
// create a list out of the entryset of your map
Set<Map.Entry<String, Integer>> set = map.entrySet();
List<Map.Entry<String, Integer>> list = new ArrayList<>(set);
// this can be moved into a separate method that returns the top 'N' click counts
list.sort((o1, o2) -> (o2.getValue()).compareTo(o1.getValue()));
list.stream().limit(10).forEach(entry ->
System.out.println(entry.getKey() + " ==== " + entry.getValue()));
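
If sorting the whole entry set on every query turns out to be too slow, one possible alternative (not what the snippet above does) is a bounded min-heap that only ever holds the n best entries, so a query costs roughly O(m log n) for m URLs. A minimal sketch, where TopClicks and topN are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper, not part of the answer above.
class TopClicks {
    // Returns the n entries with the highest values, highest first.
    static List<Map.Entry<String, Integer>> topN(Map<String, Integer> clicks, int n) {
        // Min-heap: the smallest of the current top n sits at the head.
        PriorityQueue<Map.Entry<String, Integer>> heap =
                new PriorityQueue<>(Map.Entry.<String, Integer>comparingByValue());
        for (Map.Entry<String, Integer> e : clicks.entrySet()) {
            heap.offer(e);
            if (heap.size() > n) {
                heap.poll(); // drop the current minimum
            }
        }
        // Only the n survivors are sorted for output.
        List<Map.Entry<String, Integer>> result = new ArrayList<>(heap);
        result.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        return result;
    }
}
```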

With a Map, you have to sort the entries by value to get the top 10 URLs,
which gives you O(n log n) complexity per query when sorting by value with a comparator.
Another way is:
Use a doubly linked list (of size 10) together with a HashMap (proceeding in an LRU-cache fashion).
Retrieve/update will be O(1).
The top 10 results will be the items in the list.
Structure of the doubly linked list:
class UrlAndCountNode{
String url;
int count;
UrlAndCountNode next;
UrlAndCountNode prev;
}
Structure of Map:
Map<String, UrlAndCountNode>

That's an interesting question IMO. It seems you need something that is sorted by clicks, but at the same time you need to alter these values. The only way to do that with a sorted data structure is to remove the entry that you want to update and put the updated one back; simply updating the clicks in place will not work. As such, I think that keeping them sorted by clicks is a better option.
The downside is that entries with the same number of clicks will get overridden, so something like a Guava multiset would be a much better option.
As such I would do this:
static class Holder {
private final String name;
private final int clicks;
public Holder(String name, int clicks) {
super();
this.name = name;
this.clicks = clicks;
}
public String getName() {
return name;
}
public int getClicks() {
return clicks;
}
@Override
public String toString() {
return "name = " + name + " clicks = " + clicks;
}
}
And methods would look like this:
private static List<Holder> firstN(Multiset<Holder> set, int n) {
return set.stream().limit(n).collect(Collectors.toList());
}
private static void updateOne(Multiset<Holder> set, String urlName, int more) {
Iterator<Holder> iter = set.iterator();
int currentClicks = 0;
boolean found = false;
while (iter.hasNext()) {
Holder h = iter.next();
if (h.getName().equals(urlName)) {
currentClicks = h.getClicks();
iter.remove();
found = true;
}
}
if (found) {
set.add(new Holder(urlName, currentClicks + more));
}
}

Related

Converting singly linked list to a map

I have been given an assignment to upgrade an existing program.
Figure out how to recode the qualifying exam problem using a Map for each terminal line, on the
assumption that the size of the problem is dominated by the number of input lines, not the 500
terminal lines.
The program takes in a text file where each line has a number and a name. The number is the PC number and the name is the user who logged on. For each PC, the program returns the user who logged on the most. Here is the existing code:
public class LineUsageData {
SinglyLinkedList<Usage> singly = new SinglyLinkedList<Usage>();
//function to add a user to the linked list or to increment count by 1
public void addObservation(Usage usage){
for(int i = 0; i < singly.size(); ++i){
if(usage.getName().equals(singly.get(i).getName())){
singly.get(i).incrementCount(1);
return;
}
}
singly.add(usage);
}
//returns the user with the most connections to the PC
public String getMaxUsage(){
int tempHigh = 0;
int high = 0;
String userAndCount = "";
for(int i = 0; i < singly.size(); ++i){//goes through list and keeps highest
tempHigh = singly.get(i).getCount();
if(tempHigh > high){
high = tempHigh;
userAndCount = singly.get(i).getName() + " " + singly.get(i).getCount();
}
}
return userAndCount;
}
}
I am having trouble on the theoretical side. We can use a HashMap or a TreeMap. I am trying to think through how I would form a map that would hold the list of users for each PC. I can reuse the Usage object, which holds the name and the count of the user. I am not supposed to alter that object, though.

When checking whether a Usage is present in the list, you perform a linear search each time (O(N)). If you replace your list with a Map<String, Usage>, you'll be able to search by name in sublinear time. TreeMap has O(log N) time for search and update; HashMap has amortized O(1) (constant) time.
So, the most efficient data structure in this case is HashMap.
import java.util.*;
public class LineUsageData {
Map<String, Usage> map = new HashMap<String, Usage>();
//function to add a user to the map or to increment count by 1
public void addObservation(Usage usage) {
Usage existentUsage = map.get(usage.getName());
if (existentUsage == null) {
map.put(usage.getName(), usage);
} else {
existentUsage.incrementCount(1);
}
}
//returns the user with the most connections to the PC
public String getMaxUsage() {
Usage maxUsage = null;
for (Usage usage : map.values()) {
if (maxUsage == null || usage.getCount() > maxUsage.getCount()) {
maxUsage = usage;
}
}
return maxUsage == null ? null : maxUsage.getName() + " " + maxUsage.getCount();
}
// alternative version that uses Collections.max
public String getMaxUsageAlt() {
Usage maxUsage = map.isEmpty() ? null :
Collections.max(map.values(), new Comparator<Usage>() {
@Override
public int compare(Usage o1, Usage o2) {
return Integer.compare(o1.getCount(), o2.getCount());
}
});
return maxUsage == null ? null : maxUsage.getName() + " " + maxUsage.getCount();
}
}
A Map can also be iterated in time proportional to its size, so you can use the same procedure to find the maximum element in it. I gave you two options: either the manual approach or the Collections.max utility method.
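
If the counts did not have to live inside Usage objects, the same idea could be sketched with a plain Map<String, Integer> per terminal line and Map.merge for the increment. This is only an illustration under that assumption; TerminalUsage and loginsByUser are made-up names, and the assignment's Usage class is deliberately not used here.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical variant: one instance per terminal line, counts stored directly.
class TerminalUsage {
    private final Map<String, Integer> loginsByUser = new HashMap<>();

    // Adds the user with count 1, or increments the existing count.
    void addObservation(String userName) {
        loginsByUser.merge(userName, 1, Integer::sum);
    }

    // Returns "name count" for the most frequent user, or null if empty.
    String getMaxUsage() {
        if (loginsByUser.isEmpty()) {
            return null;
        }
        Map.Entry<String, Integer> max = Collections.max(
                loginsByUser.entrySet(),
                Map.Entry.<String, Integer>comparingByValue());
        return max.getKey() + " " + max.getValue();
    }
}
```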

In simple words: you use a LinkedList (singly or doubly linked) when you have a list of items that you usually plan to traverse,
and a Map implementation when you have dictionary-like entries, where a key corresponds to a value and you plan to access the value using the key.
In order to convert your SinglyLinkedList to a HashMap or TreeMap, you need to find out which property of your item will be used as the key (it must have unique values).
Assuming you are using the name property from your Usage class, you can do this
(a simple example):
//You could also use TreeMap, depending on your needs.
Map<String, Usage> usageMap = new HashMap<String, Usage>();
//Iterate through your SinglyLinkedList.
for(Usage usage : singly) {
//Add all items to the Map
usageMap.put(usage.getName(), usage);
}
//Access a value using its name as the key of the Map.
Usage accessedUsage = usageMap.get("AUsageName");
Also note that:
Map<String, Usage> usageMap = new HashMap<>();
is valid, due to diamond type inference.

I solved this offline and didn't get a chance to see some of the answers, which look to be very helpful. Sorry about that, Nick and Aivean, and thanks for the responses. Here is the code I ended up writing to get this to work.
public class LineUsageData {
Map<Integer, Usage> map = new HashMap<Integer, Usage>();
int hash = 0;
public void addObservation(Usage usage){
hash = usage.getName().hashCode();
System.out.println(hash);
while((map.get(hash)) != null){
if(map.get(hash).getName().equals(usage.name)){
map.get(hash).count++;
return;
}else{
hash++;
}
}
map.put(hash, usage);
}
public String getMaxUsage(){
String str = "";
int tempHigh = 0;
int high = 0;
//for loop
for(Integer key : map.keySet()){
tempHigh = map.get(key).getCount();
if(tempHigh > high){
high = tempHigh;
str = map.get(key).getName() + " " + map.get(key).getCount();
}
}
return str;
}
}

Search multiple HashMaps at the same time

tldr: How can I search for an entry in multiple (read-only) Java HashMaps at the same time?
The long version:
I have several dictionaries of various sizes stored as HashMap< String, String >. Once they are read in, they are never to be changed (strictly read-only).
I want to check whether any dictionary, and if so which one, has an entry stored under my key.
My code was originally looking for a key like this:
public DictionaryEntry getEntry(String key) {
for (int i = 0; i < _numDictionaries; i++) {
HashMap<String, String> map = getDictionary(i);
if (map.containsKey(key))
return new DictionaryEntry(map.get(key), i);
}
return null;
}
Then it got a little more complicated: my search string could contain typos, or was a variant of the stored entry. Like, if the stored key was "banana", it is possible that I'd look up "bannana" or "a banana", but still would like the entry for "banana" returned. Using the Levenshtein-Distance, I now loop through all dictionaries and each entry in them:
public DictionaryEntry getEntry(String key) {
for (int i = 0; i < _numDictionaries; i++) {
HashMap<String, String> map = getDictionary(i);
for (Map.Entry<String, String> entry : map.entrySet()) {
// Calculate Levenshtein distance, store closest match etc.
}
}
// return closest match or null.
}
So far everything works as it should and I'm getting the entry I want. Unfortunately I have to look up around 7000 strings, in five dictionaries of various sizes (~ 30 - 70k entries) and it takes a while. From my processing output I have the strong impression my lookup dominates overall runtime.
My first idea to improve runtime was to search all dictionaries in parallel. Since none of the dictionaries is changed and no more than one thread accesses a dictionary at the same time, I don't see any safety concerns.
The question is just: how do I do this? I have never used multithreading before. My search only came up with ConcurrentHashMap (which, to my understanding, I don't need) and the Runnable interface, where I'd have to put my processing into the run() method. I think I could rewrite my current class to fit into Runnable, but I was wondering if there is a simpler way to do this (or how I can do it simply with Runnable; with my limited understanding, I think I would have to restructure a lot).
Since I was asked to share the Levenshtein-Logic: It's really nothing fancy, but here you go:
private int _maxLSDistance = 10;
public Map.Entry getClosestMatch(String key) {
Map.Entry _closestMatch = null;
int _lsDistance = Integer.MAX_VALUE;
if (key == null) {
return null;
}
for (Map.Entry entry : _dictionary.entrySet()) {
// Perfect match
if (entry.getKey().equals(key)) {
return entry;
}
// Similar match
else {
int dist = StringUtils.getLevenshteinDistance((String) entry.getKey(), key);
// If "dist" is smaller than threshold and smaller than distance of already stored entry
if (dist < _maxLSDistance) {
if (_closestMatch == null || dist < _lsDistance) {
_closestMatch = entry;
_lsDistance = dist;
}
}
}
}
return _closestMatch;
}

To use multithreading in your case, you could do something like this:
The "monitor" class, which basically stores the results and coordinates the threads:
public class Results {
private int nrOfDictionaries = 4; //
private ArrayList<String> results = new ArrayList<String>();
public void prepare() {
nrOfDictionaries = 4;
results = new ArrayList<String>();
}
public synchronized void oneDictionaryFinished() {
nrOfDictionaries--;
System.out.println("one dictionary finished");
notifyAll();
}
public synchronized boolean isReady() throws InterruptedException {
while (nrOfDictionaries != 0) {
wait();
}
return true;
}
public synchronized void addResult(String result) {
results.add(result);
}
public ArrayList<String> getAllResults() {
return results;
}
}
The thread itself, which can be set to search a specific dictionary:
public class ThreadDictionarySearch extends Thread {
// the actual dictionary
private String dictionary;
private Results results;
public ThreadDictionarySearch(Results results, String dictionary) {
this.dictionary = dictionary;
this.results = results;
}
@Override
public void run() {
for (int i = 0; i < 4; i++) {
// search dictionary;
results.addResult("result of " + dictionary);
System.out.println("adding result from " + dictionary);
}
results.oneDictionaryFinished();
}
}
And the main method for demonstration:
public static void main(String[] args) throws Exception {
Results results = new Results();
ThreadDictionarySearch threadA = new ThreadDictionarySearch(results, "dictionary A");
ThreadDictionarySearch threadB = new ThreadDictionarySearch(results, "dictionary B");
ThreadDictionarySearch threadC = new ThreadDictionarySearch(results, "dictionary C");
ThreadDictionarySearch threadD = new ThreadDictionarySearch(results, "dictionary D");
threadA.start();
threadB.start();
threadC.start();
threadD.start();
if (results.isReady()) {
// execution waits here until all dictionaries are searched,
// because isReady() in "Results" calls wait() until they have finished
for (String string : results.getAllResults()) {
System.out.println("RESULT: " + string);
}
}
}

I think the easiest would be to use a stream over the entry set:
public DictionaryEntry getEntry(String key) {
for (int i = 0; i < _numDictionaries; i++) {
HashMap<String, String> map = getDictionary(i);
map.entrySet().parallelStream().forEach( (entry) ->
{
// Calculate Levenshtein distance, store closest match etc.
}
);
}
// return closest match or null.
}
Provided you are using Java 8, of course. You could also wrap the outer loop in an IntStream. You could also use Stream.reduce directly to get the entry with the smallest distance.
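
To make the "smallest distance" idea concrete, here is a minimal sketch that flattens all dictionaries into one parallel stream and picks the closest entry with min(). Everything in it is an assumption for illustration: FuzzyLookup, Match and closest are made-up names, and the hand-written levenshtein method stands in for the StringUtils call from the question.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, not from the answer above.
class FuzzyLookup {
    // One candidate: the stored key/value and its distance to the search key.
    static final class Match {
        final String key;
        final String value;
        final int distance;

        Match(String key, String value, int distance) {
            this.key = key;
            this.value = value;
            this.distance = distance;
        }
    }

    // Two-row dynamic-programming Levenshtein distance.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Closest entry over all dictionaries, or empty if none is within maxDistance.
    static Optional<Match> closest(List<Map<String, String>> dictionaries,
                                   String key, int maxDistance) {
        return dictionaries.stream()
                .flatMap(d -> d.entrySet().stream())
                .parallel()
                // the distance is computed once per entry and carried along
                .map(e -> new Match(e.getKey(), e.getValue(), levenshtein(e.getKey(), key)))
                .filter(m -> m.distance <= maxDistance)
                .min(Comparator.comparingInt((Match m) -> m.distance));
    }
}
```

The maps are only read, so no synchronization is needed. Whether the parallel version is actually faster depends on the dictionary sizes and the machine, so it is worth measuring.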

Maybe try thread pools:
ExecutorService es = Executors.newFixedThreadPool(_numDictionaries);
for (int i = 0; i < _numDictionaries; i++) {
//prepare a Runnable implementation that contains a logic of your search
es.submit(prepared_runnable);
}
You could also try a quick check that rules out strings that cannot match at all (for example, a significant difference in length), and use it to skip a candidate early and move on to the next one.
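
A sketch of how both suggestions might fit together: one Callable per dictionary submitted through invokeAll, and a length check that skips hopeless candidates before the expensive distance call. The names ParallelDictionarySearch, Hit, closestKey and scan are made up; the distance function is passed in (for example the StringUtils.getLevenshteinDistance call from the question), and the length check assumes it behaves like Levenshtein distance, which is never smaller than the difference in length.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.ToIntBiFunction;

// Hypothetical helper, not from the answer above.
class ParallelDictionarySearch {
    private static final class Hit {
        final String key;
        final int distance;

        Hit(String key, int distance) {
            this.key = key;
            this.distance = distance;
        }
    }

    // Returns the closest key over all dictionaries, or null if none is within maxDistance.
    static String closestKey(List<Map<String, String>> dictionaries, String key,
                             int maxDistance, ToIntBiFunction<String, String> distance) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, dictionaries.size()));
        try {
            List<Callable<Hit>> tasks = new ArrayList<>();
            for (Map<String, String> dictionary : dictionaries) {
                tasks.add(() -> scan(dictionary, key, maxDistance, distance));
            }
            Hit best = null;
            // invokeAll blocks until every dictionary has been scanned
            for (Future<Hit> future : pool.invokeAll(tasks)) {
                Hit hit = future.get();
                if (hit != null && (best == null || hit.distance < best.distance)) {
                    best = hit;
                }
            }
            return best == null ? null : best.key;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    // Scans one dictionary sequentially; the map is only read, never modified.
    private static Hit scan(Map<String, String> dictionary, String key,
                            int maxDistance, ToIntBiFunction<String, String> distance) {
        Hit best = null;
        for (String candidate : dictionary.keySet()) {
            // Cheap pre-check: skip keys whose length differs by more than the threshold.
            if (Math.abs(candidate.length() - key.length()) > maxDistance) {
                continue;
            }
            int d = distance.applyAsInt(candidate, key);
            if (d <= maxDistance && (best == null || d < best.distance)) {
                best = new Hit(candidate, d);
            }
        }
        return best;
    }
}
```

In real use the pool would probably be created once and reused across the 7000 lookups instead of being rebuilt on every call.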

I have strong doubts that HashMaps are a suitable solution here, especially if you want fuzzy matching and stop words. You should use a proper full-text search solution like Elasticsearch or Apache Solr, or at least an available engine like Apache Lucene.
That being said, you can use a poor man's version: create an array of your maps and a SortedMap, iterate over the array, take the keys of the current HashMap, and store them in the SortedMap together with the index of their HashMap. To retrieve a key, you first search the SortedMap for said key, get the respective HashMap from the array using the index position, and look up the key in only that one HashMap. This should be fast enough without the need for multiple threads to dig through the HashMaps. However, you could turn the code below into a Runnable to run multiple lookups in parallel.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;
public class Search {
public static void main(String[] arg) {
if (arg.length == 0) {
System.out.println("Must give a search word!");
System.exit(1);
}
String searchString = arg[0].toLowerCase();
/*
* Populating our HashMaps.
*/
HashMap<String, String> english = new HashMap<String, String>();
english.put("banana", "fruit");
english.put("tomato", "vegetable");
HashMap<String, String> german = new HashMap<String, String>();
german.put("Banane", "Frucht");
german.put("Tomate", "Gemüse");
/*
* Now we create our ArrayList of HashMaps for fast retrieval
*/
List<HashMap<String, String>> maps = new ArrayList<HashMap<String, String>>();
maps.add(english);
maps.add(german);
/*
* This is our index
*/
SortedMap<String, Integer> index = new TreeMap<String, Integer>(String.CASE_INSENSITIVE_ORDER);
/*
* Populating the index:
*/
for (int i = 0; i < maps.size(); i++) {
// We iterate through our HashMaps...
HashMap<String, String> currentMap = maps.get(i);
for (String key : currentMap.keySet()) {
/* ...and populate our index with lowercase versions of the keys,
* referencing the array from which the key originates.
*/
index.put(key.toLowerCase(), i);
}
}
// In case our index contains our search string...
if (index.containsKey(searchString)) {
/*
* ... we find out in which map of the ones stored in maps
* the word in the index originated from.
*/
Integer mapIndex = index.get(searchString);
/*
* Next, we look up said map.
*/
HashMap<String, String> origin = maps.get(mapIndex);
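/*
* Last, we retrieve the value from the origin map. Note that this lookup
* uses the lowercased search string, so it only succeeds when the map's
* own key is lowercase ("banana", but not "Banane").
*/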
String result = origin.get(searchString);
/*
* The above steps can be shortened to
* String result = maps.get(index.get(searchString).intValue()).get(searchString);
*/
System.out.println(result);
} else {
System.out.println("\"" + searchString + "\" is not in the index!");
}
}
}
Please note that this is a rather naive implementation only provided for illustration purposes. It doesn't address several problems (you can't have duplicate index entries, for example).
With this solution, you are basically trading startup speed for query speed.

Okay.
Since your concern is getting a faster response,
I would suggest dividing the work between threads.
Say you have 5 dictionaries: you could give three dictionaries to one thread and let another thread take care of the remaining two.
Then whichever thread finds the match halts or terminates the other thread.
You may need some extra logic to divide the work, but that won't affect your performance.
You may also need a small change in the code that finds your closest match:
for (Map.Entry entry : _dictionary.entrySet()) {
You are using entrySet(), but you never use the values. Since you are not really interested in the values in that map, I would suggest iterating over keySet() instead:
for (String key : _dictionary.keySet()) {
For more details on map performance, see this note from the Java documentation:
Iteration over the collection-views of a LinkedHashMap requires time proportional to the size of the map, regardless of its capacity. Iteration over a HashMap is likely to be more expensive, requiring time proportional to its capacity.

PriorityQueue with indices for keeping counts sorted

A problem I often encounter in Java (usually while writing computational linguistics code) is the need to count the number of occurrences of some items in a dataset, then sort the items by their counts. The simplest concrete example is word counting: I need to count the number of occurrences of each word in a text file, then sort the words by their counts to find the most frequently used words.
Unfortunately, Java doesn't seem to have a good data structure for this task. I need to use the words as indices of a collection while I'm counting, so that I can efficiently look up the right counter to increment every time I read a word, but the values I want to sort on are the counts, not the words.
Map<String, Integer> provides the interface I need for looking up the count associated with a word, but Maps can only be sorted by their keys (i.e. TreeMap). PriorityQueue is a nice heap implementation that will sort on whatever comparator you give it, but it provides no way to access the elements by some kind of index and no way to update and re-heapify an element (other than by removing and adding it). Its single type parameter also means I need to stick the words and their counts together into one object in order to use it.
My current "solution" is to store the counts in a Map while counting them, then copy them all into a PriorityQueue to sort them:
Map<String, Integer> wordCounts = countStuff();
PriorityQueue<NamedCount> sortedCounts = new PriorityQueue<>(wordCounts.size(),
Collections.reverseOrder());
for(Entry<String, Integer> count : wordCounts.entrySet()) {
sortedCounts.add(new NamedCount(count.getKey(), count.getValue()));
}
(Note that NamedCount is just a simple pair<string, int> that implements Comparable to compare the integers). But this is inefficient, especially since the data set can be very large, and keeping two copies of the count set in memory is wasteful.
Is there any way I can get random access to the objects inside the PriorityQueue, so that I can just store one copy of the counts in the PriorityQueue and re-heapify as I update them? Would it make sense to use a Map<String, NamedCount> that keeps "pointers" to the objects in the PriorityQueue<NamedCount>?

First, for the base data structure, typically Guava's Multiset<String> is preferable to Map<String, Integer> in the same way that Set<String> is preferable to Map<String, Boolean>. It's a cleaner API and encapsulates the incrementing.
Now, if this were me, I would implement a custom Multiset which adds some additional logic to index the counts, and return them. Something like this:
class IndexedMultiset<T extends Comparable<T>> extends ForwardingMultiset<T> {
private final Multiset<T> delegate = HashMultiset.create();
private final TreeMultimap<Integer, T> countIndex = TreeMultimap.create();
@Override
protected Multiset<T> delegate() {
return delegate;
}
@Override
public int add(T element, int occurrences) {
int prev = super.add(element, occurrences);
countIndex.remove(prev, element);
countIndex.put(count(element), element);
return prev;
}
@Override
public boolean add(T element) {
return super.standardAdd(element);
}
//similar for remove, setCount, etc
}
Then I'd add whatever query capabilities you need based on counts. For example, retrieving an iterable of word/count pairs in descending order could look something like this:
public Iterable<CountEntry<T>> descendingCounts() {
return countIndex.keySet().descendingSet().stream()
.flatMap((count) -> countIndex.get(count).stream())
.map((element) -> new CountEntry<>(element, count(element)))
.collect(Collectors.toList());
}
public static class CountEntry<T> {
private final T element;
private final int count;
public CountEntry(T element, int count) {
this.element = element;
this.count = count;
}
public T element() {
return element;
}
public int count() {
return count;
}
@Override
public String toString() {
return element + ": " + count;
}
}
And it would all be used like this:
public static void main(String... args) {
IndexedMultiset<String> wordCounts = new IndexedMultiset<>();
wordCounts.add("foo");
wordCounts.add("bar");
wordCounts.add("baz");
wordCounts.add("baz");
System.out.println(wordCounts.descendingCounts()); //[baz: 2, bar: 1, foo: 1]
wordCounts.add("foo");
wordCounts.add("foo");
wordCounts.add("foo");
System.out.println(wordCounts.descendingCounts()); //[foo: 4, baz: 2, bar: 1]
}

If you can use third-party libraries like Guava, Multiset is designed pretty specifically as a solution to this problem:
Multiset<String> multiset = HashMultiset.create();
for (String word : words) {
multiset.add(word);
}
System.out.println(Multisets.copyHighestCountFirst(multiset));
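
If a third-party library is not an option, a plain-JDK sketch along the same lines could count with Map.merge and then sort a list of the map's own entries. WordCounts and byCountDescending are made-up names; the list holds references to the existing entries, so the counts themselves are not copied a second time.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not from the answer above.
class WordCounts {
    // Counts the words and returns the entries, highest count first
    // (ties are ordered alphabetically by word).
    static List<Map.Entry<String, Integer>> byCountDescending(Iterable<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(counts.entrySet());
        sorted.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return sorted;
    }
}
```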

A TreeSet or TreeMap that allow duplicates

I need a Collection that sorts its elements but does not remove duplicates.
I have gone for a TreeSet, since TreeSet actually adds the values to a backing TreeMap:
public boolean add(E e) {
return m.put(e, PRESENT)==null;
}
And the TreeMap removes the duplicates using the Comparator's compare logic.
I have written a Comparator that returns 1 instead of 0 for equal elements. Hence, for equal elements, the TreeSet with this Comparator will not overwrite the duplicate and will just sort it.
I have tested it with simple String objects, but I need a Set of custom objects.
public static void main(String[] args)
{
List<String> strList = Arrays.asList( new String[]{"d","b","c","z","s","b","d","a"} );
Set<String> strSet = new TreeSet<String>(new StringComparator());
strSet.addAll(strList);
System.out.println(strSet);
}
class StringComparator implements Comparator<String>
{
@Override
public int compare(String s1, String s2)
{
if(s1.compareTo(s2) == 0){
return 1;
}
else{
return s1.compareTo(s2);
}
}
}
Is this approach fine or is there a better way to achieve this?
EDIT
Actually, I have an ArrayList of the following class:
class Fund
{
String fundCode;
BigDecimal fundValue;
.....
public boolean equals(Object obj) {
// uses fundCode for equality
}
}
I need all the fundCode with highest fundValue

You can use a PriorityQueue.
PriorityQueue<Integer> pQueue = new PriorityQueue<Integer>();
PriorityQueue(): Creates a PriorityQueue with the default initial capacity (11) that orders its elements according to their natural ordering.
This is a link to doc: https://docs.oracle.com/javase/8/docs/api/java/util/PriorityQueue.html

I need all the fundCode with highest fundValue
If that's the only reason why you want to sort, I would recommend not sorting at all. Sorting typically has a complexity of O(n log n). Finding the maximum has a complexity of only O(n) and can be implemented as a simple iteration over your list:
List<Fund> maxFunds = new ArrayList<Fund>();
int max = 0;
for (Fund fund : funds) {
if (fund.getFundValue() > max) {
maxFunds.clear();
max = fund.getFundValue();
}
if (fund.getFundValue() == max) {
maxFunds.add(fund);
}
}
You can avoid writing that code by using a third-party library like Guava. See: How to get max() element from List in Guava
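
Since the Fund in the question stores fundValue as a BigDecimal, the same single pass could look roughly like the sketch below. MaxFunds and withHighestValue are made-up names, and the nested Fund is only a minimal stand-in for the question's class. It compares with compareTo rather than equals, because BigDecimal.equals also compares the scale (100 and 100.00 are not equal under equals).

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not from the answer above.
class MaxFunds {
    // Minimal stand-in for the Fund class from the question.
    static final class Fund {
        final String fundCode;
        final BigDecimal fundValue;

        Fund(String fundCode, BigDecimal fundValue) {
            this.fundCode = fundCode;
            this.fundValue = fundValue;
        }
    }

    // Returns every fund whose value equals the maximum, in one pass.
    static List<Fund> withHighestValue(List<Fund> funds) {
        List<Fund> maxFunds = new ArrayList<>();
        BigDecimal max = null; // no maximum seen yet; also works for negative values
        for (Fund fund : funds) {
            int cmp = max == null ? 1 : fund.fundValue.compareTo(max);
            if (cmp > 0) {
                maxFunds.clear();
                max = fund.fundValue;
            }
            if (cmp >= 0) {
                maxFunds.add(fund);
            }
        }
        return maxFunds;
    }
}
```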

You can sort a List using Collections.sort.
Given your Fund:
List<Fund> sortMe = new ArrayList<>(...);
Collections.sort(sortMe, new Comparator<Fund>() {
@Override
public int compare(Fund left, Fund right) {
return left.fundValue.compareTo(right.fundValue);
}
});
// sortMe is now sorted

With a TreeSet, either a Comparator or Comparable is used to compare and store objects. equals() is not called, which is why the TreeSet does not recognize the duplicate when the comparator never returns 0.
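
A tiny illustration of that point, with made-up names (ComparatorVsEquals, byLength): a TreeSet ordered by string length treats "aa" and "bb" as the same element, even though equals() says they differ.

```java
import java.util.Comparator;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical demo class, not from the answer above.
class ComparatorVsEquals {
    // Builds a TreeSet that orders strings by length only.
    static Set<String> byLength(String... values) {
        Comparator<String> lengthOnly = Comparator.comparingInt(String::length);
        Set<String> set = new TreeSet<>(lengthOnly);
        for (String value : values) {
            // add() returns false when the comparator reports 0 against an existing element
            set.add(value);
        }
        return set;
    }
}
```

Calling byLength("aa", "bb", "c") keeps "c" and "aa" and silently rejects "bb", because the comparator reports it as equal to "aa".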

Instead of a TreeSet, we can use a List and implement the Comparable interface.
public class Fund implements Comparable<Fund> {
String fundCode;
int fundValue;
public Fund(String fundCode, int fundValue) {
super();
this.fundCode = fundCode;
this.fundValue = fundValue;
}
public String getFundCode() {
return fundCode;
}
public void setFundCode(String fundCode) {
this.fundCode = fundCode;
}
public int getFundValue() {
return fundValue;
}
public void setFundValue(int fundValue) {
this.fundValue = fundValue;
}
@Override
public int compareTo(Fund compareFund) {
// descending by fund value; Integer.compare avoids overflow from subtraction
return Integer.compare(compareFund.getFundValue(), this.fundValue);
}
public static void main(String args[]){
List<Fund> funds = new ArrayList<Fund>();
Fund fund1 = new Fund("a",100);
Fund fund2 = new Fund("b",20);
Fund fund3 = new Fund("c",70);
Fund fund4 = new Fund("a",100);
funds.add(fund1);
funds.add(fund2);
funds.add(fund3);
funds.add(fund4);
Collections.sort(funds);
for(Fund fund : funds){
System.out.println("Fund code: " + fund.getFundCode() + " Fund value : " + fund.getFundValue());
}
}
}

Add the elements to an ArrayList and then sort them using the Collections.sort utility, either by implementing Comparable with your own compareTo method or by passing a Comparator written for your key.
This won't remove duplicates, and the list can still be sorted:
List<Integer> list = new ArrayList<>();
Collections.sort(list, new Comparator<Integer>() {
@Override
public int compare(Integer left, Integer right) {
// your comparison logic goes here; natural ordering is used as a placeholder
return left.compareTo(right);
}
});

I found a way to get TreeSet to store duplicate keys.
The problem originated when I wrote some code in Python using SortedContainers. I have a spatial index of objects where I want to find all objects between a start/end longitude.
The longitudes could be duplicates but I still need the ability to efficiently add/remove specific objects from the index. Unfortunately I could not find the Java equivalent of the Python SortedKeyList that separates the sort key from the type being stored.
To illustrate this consider that we have a large list of retail purchases and we want to get all purchases where the cost is in a specific range.
// We are using TreeSet as a SortedList
TreeSet<PriceBase> _index = new TreeSet<PriceBase>();
// populate the index with the purchases.
// Note that 2 of these have the same cost
_index.add(new Purchase("candy", 1.03));
Purchase _bananas = new Purchase("bananas", 1.45);
_index.add(_bananas);
_index.add(new Purchase("celery", 1.45));
_index.add(new Purchase("chicken", 4.99));
// Range scan. This iterator should return "candy", "bananas", "celery"
NavigableSet<PriceBase> _iterator = _index.subSet(
new PriceKey(0.99), true, new PriceKey(3.99), true);
// we can also remove specific items from the list and
// it finds the specific object even through the sort
// key is the same
_index.remove(_bananas);
There are 3 classes created for the list
PriceBase: Base class that returns the sort key (the price).
Purchase: subclass that contains transaction data.
PriceKey: subclass used for the range search.
When I initially implemented this with TreeSet it worked except in the case where the prices are the same. The trick is to define the compareTo() so that it is polymorphic:
If we are comparing Purchase to PriceKey then only compare the price.
If we are comparing Purchase to Purchase then compare the price and the name if the prices are the same.
For example here are the compareTo() functions for the PriceBase and Purchase classes.
// in PriceBase
@Override
public int compareTo(PriceBase _other) {
return Double.compare(this.getPrice(), _other.getPrice());
}
// in Purchase
@Override
public int compareTo(PriceBase _other) {
// compare by price
int _compare = super.compareTo(_other);
if(_compare != 0) {
// prices are not equal
return _compare;
}
if(!(_other instanceof Purchase)) {
throw new RuntimeException("Right compare must be a Purchase");
}
// compare by item name
Purchase _otherPurchase = (Purchase)_other;
return this.getName().compareTo(_otherPurchase.getName());
}
This trick allows the TreeSet to sort the purchases by price but still do a real comparison when one needs to be uniquely identified.
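The pieces described above can be put together as a minimal, self-contained sketch. The class bodies are assumptions filled in from the description (the post only shows the two compareTo() methods), and PriceIndexDemo and namesInRange are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Base class: supplies the sort key (the price).
abstract class PriceBase implements Comparable<PriceBase> {
    private final double price;

    PriceBase(double price) { this.price = price; }

    double getPrice() { return price; }

    @Override
    public int compareTo(PriceBase other) {
        return Double.compare(this.getPrice(), other.getPrice());
    }
}

// Search key used only as a bound for range scans; never stored in the set.
class PriceKey extends PriceBase {
    PriceKey(double price) { super(price); }
}

// Stored element: ties on price are broken by name.
class Purchase extends PriceBase {
    private final String name;

    Purchase(String name, double price) {
        super(price);
        this.name = name;
    }

    String getName() { return name; }

    @Override
    public int compareTo(PriceBase other) {
        int byPrice = super.compareTo(other);
        if (byPrice != 0) {
            return byPrice;
        }
        if (!(other instanceof Purchase)) {
            throw new IllegalStateException("Right compare must be a Purchase");
        }
        return this.getName().compareTo(((Purchase) other).getName());
    }
}

// Hypothetical demo class, not part of the original post.
class PriceIndexDemo {
    // Collects the names of all purchases with low <= price < high.
    static List<String> namesInRange(TreeSet<PriceBase> index, double low, double high) {
        NavigableSet<PriceBase> range =
                index.subSet(new PriceKey(low), true, new PriceKey(high), false);
        List<String> names = new ArrayList<>();
        for (PriceBase p : range) {
            names.add(((Purchase) p).getName());
        }
        return names;
    }

    public static void main(String[] args) {
        TreeSet<PriceBase> index = new TreeSet<>();
        index.add(new Purchase("candy", 1.03));
        Purchase bananas = new Purchase("bananas", 1.45);
        index.add(bananas);
        index.add(new Purchase("celery", 1.45)); // same price as bananas
        index.add(new Purchase("chicken", 4.99));

        System.out.println(namesInRange(index, 0.99, 3.99));
        index.remove(bananas); // removes only bananas, celery stays
        System.out.println(namesInRange(index, 0.99, 3.99));
    }
}
```

One caveat with this sketch: the bounds here never coincide with a stored price. If a PriceKey bound is exactly equal to the price of a stored Purchase, the TreeSet may call Purchase.compareTo() with the key on the right-hand side and reach the exception branch, so that case would need separate handling.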
In summary I needed an object index to support a range scan where the key is a continuous value like double and add/remove is efficient.
I understand there are many other ways to solve this problem but I wanted to avoid writing my own tree class. My solution seems like a hack and I am surprised that I can't find anything else. If you know of a better way then please comment.

How to Sort a Map<String, List<Object>> by the Key with the most values (that are not numeric) assigned to it

I have been working with Maps and I am baffled by how to get my program to work effectively. I can iterate over the map, get the keys and values, and sort them in alphabetical and reverse alphabetical order quite easily, and I have used custom comparators for this. However, I am now trying to sort the map so that the key with the most values comes first. The values are a list of objects I have created and can be thought of as in this scenario.
There is an Atlas (like a catalog) that has lots of towns (the key, of type String). Each town contains shops (a List<Shop>). I want to sort this so that the town with the most shops is displayed first, in descending order, with a secondary sort on the town name alphabetically, and return a String representing this.
So far I have used the Comparator interface with separate classes for alphabetical and reverse alphabetical order, and I wish to follow the same pattern for learning purposes. However, this has me completely stumped.
Example:
class Atlas {
Map<String, List<Shop>> atlas = new HashMap<String, List<Shop>>();
void addShop(Shop shop){
// Assumption: Shop exposes its town through a getTown() accessor.
String town = shop.getTown();
if(atlas.containsKey(town)){
// get the town and add the shop to it
atlas.get(town).add(shop);
}
else{
// add the town as the key and the shop as the first value in the list
List<Shop> shops = new ArrayList<Shop>();
shops.add(shop);
atlas.put(town, shops);
}
}
List<Shop> getAllShopsFromTheGivenTown(String givenTown){
if(atlas.containsKey(givenTown)){
// return the shops stored for the given town
return atlas.get(givenTown);
}
else{
// return an empty ArrayList
return new ArrayList<Shop>();
}
}
public String returnAllTownsAndShopsAlphbetically(){
String tmpString = "";
List<String> keys = new LinkedList<String>(atlas.keySet());
TownComparatorAtoZ tc = new TownComparatorAtoZ();
Collections.sort(keys, tc);
for(String town : keys){
List<Shop> shops = new LinkedList<Shop>(atlas.get(town));
ShopComparatorAtoZ sc = new ShopComparatorAtoZ();
Collections.sort(shops, sc);
for(Shop shop : shops){
if(tmpString.isEmpty()){
tmpString = tmpString + town + ": " + shop.getName();
}
else if(tmpString.contains(town)){
tmpString = tmpString + ", " + shop.getName();
}
else{
tmpString = tmpString + " | " + town + ": " + shop.getName();
}
}
}
return tmpString;
}
}
As can be seen from the above, the code (although not the cleanest or most efficient) returns things alphabetically, and it will be reworked to use a StringBuilder. However, I am wondering how I can use a comparator to achieve what I am after. If someone could provide a code snippet with an explanation of what it actually does I would be grateful: it is more about understanding how to do it than getting a copied and pasted lump of code, but I need to see it in code to understand it.
So the output I want is something like
manchester: m&s, h&m, schuch | birmingham: game, body shop | liverpool: sports

You can try something like this:
public static Map<String, List<Shop>> mySortedMap(final Map<String, List<Shop>> orig)
{
final Comparator<String> c = new Comparator<String>()
{
@Override
public int compare(final String o1, final String o2)
{
// Compare the size of the lists, largest first. If they are the
// same, compare the keys themselves.
final int sizeCompare = orig.get(o2).size() - orig.get(o1).size();
return sizeCompare != 0 ? sizeCompare : o1.compareTo(o2);
}
};
final Map<String, List<Shop>> ret = new TreeMap<String, List<Shop>>(c);
ret.putAll(orig);
return ret;
}
Explanation: TreeMap is the basic implementation of a SortedMap, and it can take a comparator of keys as a constructor argument (if no comparator is passed, the natural ordering of the keys is used). Here we create an ad hoc comparator that compares the list sizes in the original map passed as an argument, largest first, and falls back to comparing the keys themselves when the sizes are equal. Finally, we put all entries of the original map into it and return it. Note that the comparator looks keys up in orig, so the returned map should only be queried with keys that exist in orig.
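A minimal runnable sketch of this TreeMap approach, ordered so that the town with the most shops comes first as the question asks. To keep it self-contained, shops are represented by plain String names rather than the Shop class, and TownsBySize and sortedBySize are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo class; shop names stand in for Shop objects.
class TownsBySize {
    // Returns a copy of orig whose keys iterate by list size (descending),
    // then by town name (ascending).
    static Map<String, List<String>> sortedBySize(final Map<String, List<String>> orig) {
        Comparator<String> bySizeThenName = new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                // Arguments swapped so that the larger list sorts first.
                int bySize = Integer.compare(orig.get(b).size(), orig.get(a).size());
                return bySize != 0 ? bySize : a.compareTo(b);
            }
        };
        Map<String, List<String>> sorted = new TreeMap<>(bySizeThenName);
        sorted.putAll(orig);
        return sorted;
    }

    public static void main(String[] args) {
        Map<String, List<String>> atlas = new HashMap<>();
        atlas.put("liverpool", Arrays.asList("sports"));
        atlas.put("manchester", Arrays.asList("m&s", "h&m", "schuch"));
        atlas.put("birmingham", Arrays.asList("game", "body shop"));
        atlas.put("leeds", Arrays.asList("books"));

        // Towns print from most shops to fewest; ties are alphabetical.
        System.out.println(new ArrayList<>(sortedBySize(atlas).keySet()));
    }
}
```

Because the comparator depends on the list sizes in the source map, the ordering is only reliable while those lists are not modified; rebuilding the sorted map after changes is the simplest way to keep it consistent.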

What if you try something like the following:
private static final Comparator<Map.Entry<String, List<Shop>>> CountThenAtoZ =
new Comparator<Map.Entry<String, List<Shop>>>() {
@Override
public int compare(Map.Entry<String, List<Shop>> x, Map.Entry<String, List<Shop>> y) {
// Compare shop count first, largest first. If equal, compare keys alphabetically.
int cmp = Integer.compare(y.getValue().size(), x.getValue().size());
return cmp != 0 ? cmp : x.getKey().compareTo(y.getKey());
}
};
...
public String returnAllTownsAndShopsAlphbetically() {
List<Map.Entry<String, List<Shop>>> entries = new ArrayList<>(atlas.entrySet());
Collections.sort(entries, CountThenAtoZ);
String result = "";
boolean firstTown = true;
for (Map.Entry<String, List<Shop>> entry : entries) {
if (!firstTown) result += " | "; else firstTown = false;
result += entry.getKey() + ": ";
boolean firstShop = true;
TreeSet<Shop> sortedShops = new TreeSet<>(new ShopComparatorAtoZ());
sortedShops.addAll(entry.getValue());
for (Shop shop : sortedShops) {
if (!firstShop) result += ", "; else firstShop = false;
result += shop.getName();
}
}
return result;
}
The way this works is to first create a list of the atlas entries in exactly the order we want. We need access to both the keys and their associated values to build the correct ordering, so sorting a List of Map.Entry instances is the most convenient.
We then walk the sorted list to build the resulting String, making sure to sort the shops alphabetically before adding them to the String.
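The same idea can be sketched in a compact, runnable form using a lambda comparator and a StringBuilder. This is only an illustration under a few assumptions: shops are plain String names instead of Shop objects, Java 8 or later is available, and AtlasFormatter and describe are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class; shop names stand in for Shop objects.
class AtlasFormatter {
    // Most shops first; towns with equal counts in alphabetical order.
    static final Comparator<Map.Entry<String, List<String>>> COUNT_DESC_THEN_A_TO_Z =
            (x, y) -> {
                int bySize = Integer.compare(y.getValue().size(), x.getValue().size());
                return bySize != 0 ? bySize : x.getKey().compareTo(y.getKey());
            };

    static String describe(Map<String, List<String>> atlas) {
        // Copy the entries into a list so they can be sorted freely.
        List<Map.Entry<String, List<String>>> entries = new ArrayList<>(atlas.entrySet());
        Collections.sort(entries, COUNT_DESC_THEN_A_TO_Z);

        StringBuilder result = new StringBuilder();
        for (Map.Entry<String, List<String>> entry : entries) {
            if (result.length() > 0) {
                result.append(" | ");
            }
            // Sort a copy of the shop names so the atlas itself is untouched.
            List<String> shops = new ArrayList<>(entry.getValue());
            Collections.sort(shops);
            result.append(entry.getKey()).append(": ").append(String.join(", ", shops));
        }
        return result.toString();
    }

    public static void main(String[] args) {
        Map<String, List<String>> atlas = new HashMap<>();
        atlas.put("liverpool", Arrays.asList("sports"));
        atlas.put("manchester", Arrays.asList("m&s", "h&m", "schuch"));
        atlas.put("birmingham", Arrays.asList("game", "body shop"));
        System.out.println(describe(atlas));
    }
}
```

Sorting a list of entries rather than building a TreeMap keeps the comparator independent of any external map, at the cost of re-sorting each time the string is built.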
