I am working on the question below:
Suppose you have a list of Dishes, where each dish is associated with
a list of ingredients. Group together dishes with common ingredients.
For example:
Input:
"Pasta" -> ["Tomato Sauce", "Onions", "Garlic"]
"Chicken Curry" --> ["Chicken", "Curry Sauce"]
"Fried Rice" --> ["Rice", "Onions", "Nuts"]
"Salad" --> ["Spinach", "Nuts"]
"Sandwich" --> ["Cheese", "Bread"]
"Quesadilla" --> ["Chicken", "Cheese"]
Output:
("Pasta", "Fried Rice")
("Fried Rice, "Salad")
("Chicken Curry", "Quesadilla")
("Sandwich", "Quesadilla")
Also what is the time and space complexity?
I came up with the code below. Is there a better way to do this problem? It looks like the algorithm is connected components from graph theory.
public static void main(String[] args) {
    List<String> ing1 = Arrays.asList("Tomato Sauce", "Onions", "Garlic");
    List<String> ing2 = Arrays.asList("Chicken", "Curry Sauce");
    List<String> ing3 = Arrays.asList("Rice", "Onions", "Nuts");
    List<String> ing4 = Arrays.asList("Spinach", "Nuts");
    List<String> ing5 = Arrays.asList("Cheese", "Bread");
    List<String> ing6 = Arrays.asList("Chicken", "Cheese");
    Map<String, List<String>> map = new HashMap<>();
    map.put("Pasta", ing1);
    map.put("Chicken Curry", ing2);
    map.put("Fried Rice", ing3);
    map.put("Salad", ing4);
    map.put("Sandwich", ing5);
    map.put("Quesadilla", ing6);
    System.out.println(group(map));
}
private static List<List<String>> group(Map<String, List<String>> map) {
    List<List<String>> output = new ArrayList<>();
    if (map == null || map.isEmpty()) {
        return output;
    }
    Map<String, List<String>> holder = new HashMap<>();
    for (Map.Entry<String, List<String>> entry : map.entrySet()) {
        String key = entry.getKey();
        List<String> value = entry.getValue();
        for (String v : value) {
            if (!holder.containsKey(v)) {
                holder.put(v, new ArrayList<String>());
            }
            holder.get(v).add(key);
        }
    }
    return new ArrayList<List<String>>(holder.values());
}
We can get an actual complexity estimate for this approach using graph theory. A "connected components" approach would have O(|V| + |E|) complexity, where V is the set of all ingredients and dishes, and E is the set of all relations (a, b) where a is a dish and b is an ingredient of dish a (assuming you store this graph G = (V, E) as an adjacency list rather than an adjacency matrix).
Any algorithm that needs to look at every dish and all of its ingredients to produce the result must perform a traversal that takes O(|V| + |E|) time, which means no such algorithm can do asymptotically better than your approach.
Let's first turn this into a graph problem. Each dish and each ingredient is a vertex; each relation between a dish and an ingredient is an edge.
Let's analyse the maximal size of the solution. Assuming there are N dishes and M ingredients overall, the output is largest when every single dish is related; in that case the output has size O(N^2), so this is a lower bound on the achievable time complexity. We can easily construct an input for which we must iterate over all vertices and edges, so another lower bound on time complexity is N * M. We must also store all of the vertices and edges, so M * N is a lower bound on space complexity as well.
Now let's analyse your solution. You iterate over all N dishes, and for each dish you iterate over its ingredients (at most M), doing an O(1) dictionary check for each, so the total is O(N * M). Your space complexity is O(M * N) as well. I would say your solution is good.
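For reference, here is a minimal sketch (not from the original post) of the connected-components traversal, done as a DFS over a bipartite dish/ingredient graph. All names are illustrative, and note that it groups dishes transitively, which is slightly coarser than the pairwise output in the example:

import java.util.*;

class ConnectedDishGroups {
    // Groups dishes that are linked through shared ingredients, directly or transitively.
    static List<List<String>> group(Map<String, List<String>> dishes) {
        // Build an undirected adjacency list over dishes and ingredients.
        Map<String, List<String>> adj = new HashMap<>();
        for (Map.Entry<String, List<String>> e : dishes.entrySet()) {
            for (String ing : e.getValue()) {
                adj.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(ing);
                adj.computeIfAbsent(ing, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        List<List<String>> groups = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        for (String dish : dishes.keySet()) {
            if (visited.contains(dish)) continue;
            List<String> component = new ArrayList<>();
            Deque<String> stack = new ArrayDeque<>();
            stack.push(dish);
            visited.add(dish);
            while (!stack.isEmpty()) {
                String node = stack.pop();
                if (dishes.containsKey(node)) component.add(node); // keep only dishes in the output
                for (String next : adj.getOrDefault(node, Collections.emptyList())) {
                    if (visited.add(next)) stack.push(next);
                }
            }
            groups.add(component);
        }
        return groups;
    }
}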
You just need to build a reverse map here.
I think you can write the code in a more expressive way by using the Stream API introduced in Java 8.
Basic steps:
Extract all the ingredients from the map
For each ingredient, get a set of dishes, and you will have many such sets - collect all such sets into a set - and so the return-type of the method becomes Set<Set<String>>
The following is the implementation:
private static Set<Set<String>> buildReverseMap(Map<String, Set<String>> map) {
    // extracting all the values of map in a Set
    Set<String> ingredients = map.values()
            .stream()
            .flatMap(Set::stream)
            .collect(Collectors.toSet());
    return ingredients.stream()
            // map each ingredient to a set
            .map(s ->
                    map.entrySet()
                            .stream()
                            .filter(entry -> entry.getValue().contains(s))
                            .map(Map.Entry::getKey)
                            .collect(Collectors.toSet())
            ).collect(Collectors.toSet());
}
Time complexity analysis:
Assume you have N dishes and M ingredients, and that in the worst case each dish can contain every ingredient. For each ingredient you need to iterate through every dish and check whether it contains the current ingredient; this check can be done in amortized O(1) if the ingredients of each dish are stored as a HashSet<String>.
So for each ingredient you iterate through every dish and perform an amortized O(1) containment check, which gives an amortized time complexity of O(M*N).
Space-complexity Analysis:
Simply O(M*N), as in the worst case every dish can be made up of every available ingredient.
Note:
You can return a List<Set<String>> instead of Set<Set<String>> just by changing .collect(Collectors.toSet()) to .collect(Collectors.toList())
Related
Is there any **effective way** of comparing elements in Java and printing out the position of the element which occurs only once?
For example: if I have a list: ["Hi", "Hi", "No"], I want to print out 2 because "No" is in position 2. I have solved this using the following algorithm and it works, BUT the problem is that if I have a large list it takes too much time to compare the entire list to print out the first position of the unique word.
ArrayList<String> strings = new ArrayList<>(); // populated elsewhere
for (int i = 0; i < strings.size(); i++) {
    int oc = Collections.frequency(strings, strings.get(i));
    if (oc == 1) {
        System.out.print(i);
        break;
    }
}
I can think of counting each element's number of occurrences and then filtering out the first element with a count of one, though I'm not sure how large your list is.
Using Stream:
List<String> list = Arrays.asList("Hi", "Hi", "No");
// iterating through the list and storing each element and its number of occurrences in a Map
Map<String, Long> counts = list.stream().collect(Collectors.groupingBy(Function.identity(), LinkedHashMap::new, Collectors.counting()));
String value = counts.entrySet().stream()
.filter(e -> e.getValue() == 1) // keeping only the elements which occur exactly once
.map(Map.Entry::getKey) // mapping to the keys, since all of these occur only once
.findFirst() //finding the first element from the element stream
.get();
System.out.println(list.indexOf(value));
EDIT:
A simplified version can be
Map<String, Long> counts2 = new LinkedHashMap<String, Long>();
for (String val : list) {
    long count = counts2.getOrDefault(val, 0L);
    counts2.put(val, ++count);
}
for (String key : counts2.keySet()) {
    if (counts2.get(key) == 1) {
        System.out.println(list.indexOf(key));
        break;
    }
}
The basic idea is to count each element's occurrences and store the counts in a Map. Once you have the counts of all elements, you can simply look for the first element whose count is 1.
You can use a HashMap. For example, you can put the word as the key and its index as the value. When you encounter the same word again, you delete the key; at the end, the map contains only the unique words and their positions.
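A minimal sketch of that idea (not from the original answer), reusing the list from the snippet above and assuming each non-unique word occurs exactly twice (a third occurrence would re-insert the key):

Map<String, Integer> positions = new LinkedHashMap<>(); // keeps first-seen order
for (int i = 0; i < list.size(); i++) {
    String word = list.get(i);
    if (positions.containsKey(word)) {
        positions.remove(word);   // seen again: not unique, drop it
    } else {
        positions.put(word, i);   // first sighting: remember its position
    }
}
// whatever remains maps unique words to their positions; print the first one
positions.values().stream().findFirst().ifPresent(System.out::println);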
If there's only one word that's present only once, you can probably use a HashMap or HashSet + Deque (set for values, Deque for indices) to do this in linear time. A sort can give you the same in n log(n), so slower than linear but a lot faster than your solution. By sorting, it's easy to find in linear time (after the sort) which element is present only once because all duplicates will be next to each other in the array.
For example for a linear solution in pseudo-code (pseudo-Kotlin!):
counters = HashMap()
for (i, word in words.withIndex()) {
    counters.merge(word, Counter(i, 1), (oldVal, newVal) -> Counter(oldVal.firstIndex, oldVal.count + newVal.count));
}
for (counter in counters.entrySet()) {
    if (counter.count == 1) return counter.firstIndex;
}
class Counter(firstIndex, count)
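A rough Java rendering of that pseudo-code, using Map.merge() (the Counter class and method names are illustrative):

import java.util.*;

class FirstUnique {
    static final class Counter {
        final int firstIndex;
        final int count;
        Counter(int firstIndex, int count) { this.firstIndex = firstIndex; this.count = count; }
    }

    // Returns the index of the first word that occurs exactly once, or -1 if none.
    static int firstUniqueIndex(List<String> words) {
        Map<String, Counter> counters = new LinkedHashMap<>(); // insertion order = first-occurrence order
        for (int i = 0; i < words.size(); i++) {
            counters.merge(words.get(i), new Counter(i, 1),
                    (oldVal, newVal) -> new Counter(oldVal.firstIndex, oldVal.count + newVal.count));
        }
        for (Counter c : counters.values()) {
            if (c.count == 1) return c.firstIndex;
        }
        return -1;
    }
}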
Map<String,Boolean> + loops
Instead of using a Map<String,Integer> as suggested in other answers, you can maintain a HashMap (or a LinkedHashMap if you need to preserve order) of type Map<String,Boolean>, where the value denotes whether an element is unique or not.
The simplest way to generate the map is to use put() in conjunction with a containsKey() check.
But there are also more concise options like replace() + putIfAbsent(). putIfAbsent() creates a new entry only if the key is not present in the map, so we can associate such a string with a value of true (considered unique). On the other hand, replace() updates only an existing entry (otherwise the map is not affected), and if the entry exists, the key is proven to be a duplicate and has to be associated with a value of false (non-unique).
And since Java 8 we also have merge(), which expects three arguments: a key, a value, and a function that is used, when the given key already exists, to resolve the old value and the new one.
The last step is to generate the list of unique strings by iterating over the entry set of the newly created map, keeping every key that has a value of true (is unique).
List<String> strings = // initializing the list
Map<String, Boolean> isUnique = new HashMap<>(); // or LinkedHashMap if you need to preserve the initial order of strings
for (String next : strings) {
    isUnique.replace(next, false);
    isUnique.putIfAbsent(next, true);
    // isUnique.merge(next, true, (oldV, newV) -> false); // does the same as the two lines above
}
List<String> unique = new ArrayList<>();
for (Map.Entry<String, Boolean> entry : isUnique.entrySet()) {
    if (entry.getValue()) unique.add(entry.getKey());
}
Stream-based solution
With streams, it can be done using collector toMap(). The overall logic remains the same.
List<String> unique = strings.stream()
        .collect(Collectors.toMap( // creating intermediate map Map<String, Boolean>
                Function.identity(), // key
                key -> true, // value
                (oldV, newV) -> false, // resolving duplicates
                LinkedHashMap::new // Map implementation, if order is not important - discard this argument
        ))
        .entrySet().stream()
        .filter(Map.Entry::getValue)
        .map(Map.Entry::getKey)
        .toList(); // for Java 16+ or collect(Collectors.toList()) for earlier versions
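For example, if strings were the sample list from the question, Arrays.asList("Hi", "Hi", "No"), the pipeline above would yield unique = ["No"], so the original position could then be printed with:

System.out.println(strings.indexOf(unique.get(0))); // prints 2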
I want to build a Map containing elements that are sorted by their value. I receive a list of purchases containing {customerId, purchaseAmount}, and want to build a map which maps each customer to their total purchase amount. A single customer may have multiple purchases.
Finally, I want to process this information customer-by-customer, in order of decreasing total purchase amount, meaning that I process the highest-spending customer first and the lowest-spending customer last.
My initial solution was to build a Map (using HashMap), convert this Map to a List (LinkedList), sort this List in decreasing order, and then process it. This is an O(n log n) solution, and I believe it is the best possible time complexity. However, I want to know if there is some way to leverage a data structure such as TreeMap, which has a sorted property inherent to it. By default it is sorted by its keys, however I want to sort it by value. My current solution is below.
public class MessageProcessor {
    public static void main(String[] args) {
        List<Purchase> purchases = new ArrayList<>();
        purchases.add(new Purchase(1, 10));
        purchases.add(new Purchase(2, 20));
        purchases.add(new Purchase(3, 10));
        purchases.add(new Purchase(1, 22));
        purchases.add(new Purchase(2, 100));
        processPurchases(purchases);
    }
    private static void processPurchases(List<Purchase> purchases) {
        Map<Integer, Double> map = new HashMap<>();
        for (Purchase p : purchases) {
            if (!map.containsKey(p.customerId)) {
                map.put(p.customerId, p.purchaseAmt);
            } else {
                double value = map.get(p.customerId);
                map.put(p.customerId, value + p.purchaseAmt);
            }
        }
        List<Purchase> list = new LinkedList<>();
        for (Map.Entry<Integer, Double> entry : map.entrySet()) {
            list.add(new Purchase(entry.getKey(), entry.getValue()));
        }
        System.out.println(list);
        Comparator<Purchase> comparator = Comparator.comparing(p -> p.getPurchaseAmt());
        list.sort(comparator.reversed());
        //Process list
        //...
    }
}

class Purchase {
    int customerId;
    double purchaseAmt;
    public Purchase(int customerId, double purchaseAmt) {
        this.customerId = customerId;
        this.purchaseAmt = purchaseAmt;
    }
    public double getPurchaseAmt() {
        return this.purchaseAmt;
    }
}
The current code accomplishes what I want to do; however, I would like to know if there is a way to avoid transforming the Map into a List and then sorting the List with my custom Comparator, perhaps by using some kind of sorted Map. Any advice would be appreciated, as would suggestions on how to make my code more readable or idiomatic. Thanks. This is my first post on StackOverflow.
First of all, a TreeMap does not work for you, because it is sorted by keys, not by values. Another alternative would be a LinkedHashMap, which keeps entries in insertion order.
You can also use Java Streams to process your List:
Map<Integer, Double> map = purchases.stream()
.collect(Collectors.toMap(Purchase::getCustomerId, Purchase::getPurchaseAmt, (a, b) -> a + b));
This creates a map with the customerId as key and the sum of all purchases as value (it assumes Purchase also exposes a getCustomerId() getter). Next you can sort that by using another stream and collecting it into a LinkedHashMap:
LinkedHashMap<Integer, Double> sorted = map.entrySet().stream()
        .sorted(Comparator.comparing(Map.Entry<Integer, Double>::getValue).reversed())
        .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue, (a, b) -> {
            throw new IllegalStateException("");
        }, LinkedHashMap::new));
At the end you can create a new list again if you need it:
List<Purchase> list = sorted.entrySet().stream()
        .map(e -> new Purchase(e.getKey(), e.getValue()))
        .collect(Collectors.toList());
If you want more basic information on Java Streams, here is an official tutorial.
I have an unsorted list with an unknown amount of items, in the range of 1000. These items all contain a label that determines where it is supposed to go. To avoid iterating over every item in the list several times, I want to split this list into a certain number of sublists that contain only the items of certain labels.
List<Item> allItems = getItemsFromSomewhere();
List<Item> itemsLabeledA1 = new ArrayList<>();
List<Item> itemsLabeledA2 = new ArrayList<>();
List<Item> itemsLabeledB1 = new ArrayList<>();
...
List<Item> itemsLabeledL3 = new ArrayList<>();
To further complicate the issue, some of the lists require a range of items to be added, which is why each item is labeled something like "A1", "A2", "A3". These lists require every item with an A-label to be added to them. Not all labels have these aggregate lists, however. I might have to aggregate all A-labeled items, while not aggregating all B-labeled items, while still sorting A1, B1, etc. in their own lists.
Given the example above, how do I elegantly split the full list in a single iteration? My initial thought was using ifs or a switch block, but that is an ugly solution.
allItems.forEach(item -> {
    if (item.getLabel().contains("A1")) {
        itemsLabeledA1.add(item);
        allItemsLabeledA.add(item);
    }
    else if (item.getLabel().contains("B1")) itemsLabeledB1.add(item);
    ...
    else if (item.getLabel().contains("L3")) itemsLabeledL3.add(item);
});
Is there a better way?
I would use groupingBy in two separate streaming operations:
Map<String, List<Item>> allByLabel = allItems.stream().collect(
Collectors.groupingBy(Item::getLabel));
Map<String, List<Item>> allByLabelStart = allItems.stream().collect(
Collectors.groupingBy(item -> item.getLabel().substring(0, 1)));
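Hypothetical usage of those maps, matching the variables from the question (getOrDefault avoids a null check when a label never occurs):

List<Item> itemsLabeledA1 = allByLabel.getOrDefault("A1", Collections.emptyList());
List<Item> allItemsLabeledA = allByLabelStart.getOrDefault("A", Collections.emptyList());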
Maybe you can try using a HashMap where the keys are the labels and the values are the sublists?
HashMap<String, List<Item>> map = new HashMap<String, List<Item>>();
for (String label : labels) {
    map.put(label, new ArrayList<Item>());
}
allItems.forEach(item -> {
    String label = item.getLabel();
    map.get(label).add(item);
});
It seems you are classifying your items. For this, you'd need to create the different groups as per your specification, then add each item to its group. If I understood your requirement correctly, you could accomplish this as follows:
Map<String, List<Item>> map = new LinkedHashMap<>(); // keeps insertion order
allItems.forEach(it -> {
    String lbl = it.getLabel();
    map.computeIfAbsent(lbl, k -> new ArrayList<>()).add(it);
    if (needsToBeAggregated(lbl)) {
        map.computeIfAbsent(lbl.substring(0, 1), k -> new ArrayList<>()).add(it);
    }
});
Where boolean needsToBeAggregated(String label) is some method where you decide whether the item should be added to an aggregate group or not.
This solution doesn't use streams, but Java 8's Map.computeIfAbsent method instead.
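For illustration, needsToBeAggregated could be as simple as checking the label's first letter against the prefixes that have aggregate lists; the prefix set below is made up for the example:

private static final Set<String> AGGREGATED_PREFIXES = new HashSet<>(Arrays.asList("A")); // hypothetical: only A-labels are aggregated

private static boolean needsToBeAggregated(String label) {
    return AGGREGATED_PREFIXES.contains(label.substring(0, 1));
}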
I use a Dico class to store the weight of a term and the id of the document where it appears:
public class Dico
{
    private String m_term;   // term
    private double m_weight; // weight of term
    private int m_Id_doc;    // id of doc that contain term

    public Dico(int Id_Doc, String Term, double tf_ief)
    {
        this.m_Id_doc = Id_Doc;
        this.m_term = Term;
        this.m_weight = tf_ief;
    }
    public String getTerm()
    {
        return this.m_term;
    }
    public double getWeight()
    {
        return this.m_weight;
    }
    public void setWeight(double weight)
    {
        this.m_weight = weight;
    }
    public int getDocId()
    {
        return this.m_Id_doc;
    }
}
And I use this method to calculate the final weight from a Map<String,Double> and a List<Dico>:
public List<Dico> merge_list_map(List<Dico> list, Map<String, Double> map)
{
    // in map each term is unique but in list i have redundancy
    List<Dico> list_term_weight = new ArrayList<>();
    for (Map.Entry<String, Double> entrySet : map.entrySet())
    {
        String key = entrySet.getKey();
        Double value = entrySet.getValue();
        for (Dico dic : list)
        {
            String term = dic.getTerm();
            double weight = dic.getWeight();
            if (key.equals(term))
            {
                double new_weight = weight * value;
                list_term_weight.add(new Dico(dic.getDocId(), term, new_weight));
            }
        }
    }
    return list_term_weight;
}
I have 36736 elements in the map and 1053914 in the list; currently this program takes a lot of time to run: BUILD SUCCESSFUL (total time: 17 minutes 15 seconds).
How can I get only the terms from the list that equal a term from the map?
You can use the lookup functionality of the Map, i.e. Map.get(), given that your map maps terms to weights. This should give a significant performance improvement. The only difference is that the output list is in the same order as the input list, rather than the order in which the keys occur in the weighting Map.
public List<Dico> merge_list_map(List<Dico> list, Map<String, Double> map)
{
    // in map each term is unique but in list i have redundancy
    List<Dico> list_term_weight = new ArrayList<>();
    for (Dico dic : list)
    {
        String term = dic.getTerm();
        double weight = dic.getWeight();
        Double value = map.get(term); // <== fetch weight from Map
        if (value != null)
        {
            double new_weight = weight * value;
            list_term_weight.add(new Dico(dic.getDocId(), term, new_weight));
        }
    }
    return list_term_weight;
}
Basic test
List<Dico> list = Arrays.asList(new Dico(1, "foo", 1), new Dico(2, "bar", 2), new Dico(3, "baz", 3));
Map<String, Double> weights = new HashMap<String, Double>();
weights.put("foo", 2d);
weights.put("bar", 3d);
System.out.println(merge_list_map(list, weights));
Output
[Dico [m_term=foo, m_weight=2.0, m_Id_doc=1], Dico [m_term=bar, m_weight=6.0, m_Id_doc=2]]
Timing test - 10,000 elements
List<Dico> list = new ArrayList<Dico>();
Map<String, Double> weights = new HashMap<String, Double>();
for (int i = 0; i < 1e4; i++) {
    list.add(new Dico(i, "foo-" + i, i));
    if (i % 3 == 0) {
        weights.put("foo-" + i, (double) i); // <== every 3rd has a weight
    }
}
long t0 = System.currentTimeMillis();
List<Dico> result1 = merge_list_map_original(list, weights);
long t1 = System.currentTimeMillis();
List<Dico> result2 = merge_list_map_fast(list, weights);
long t2 = System.currentTimeMillis();
System.out.println(String.format("Original: %d ms", t1 - t0));
System.out.println(String.format("Fast: %d ms", t2 - t1));
// prove results equivalent, just different order
// requires Dico class to have hashCode/equals() - used eclipse default generator
System.out.println(new HashSet<Dico>(result1).equals(new HashSet<Dico>(result2)));
Output
Original: 1005 ms
Fast: 16 ms <=== loads quicker
true
Also, check the initialization of the Map (http://docs.oracle.com/javase/7/docs/api/java/util/HashMap.html). Rehashing the map is costly in terms of performance.
As a general rule, the default load factor (.75) offers a good
tradeoff between time and space costs. Higher values decrease the
space overhead but increase the lookup cost (reflected in most of the
operations of the HashMap class, including get and put). The expected
number of entries in the map and its load factor should be taken into
account when setting its initial capacity, so as to minimize the
number of rehash operations. If the initial capacity is greater than
the maximum number of entries divided by the load factor, no rehash
operations will ever occur.
If many mappings are to be stored in a HashMap instance, creating it
with a sufficiently large capacity will allow the mappings to be
stored more efficiently than letting it perform automatic rehashing as
needed to grow the table.
If you know, or can approximate, the number of elements that you will put in the map, you can create your Map like this:
Map<String, Double> foo = new HashMap<String, Double>(maxSize * 2);
In my experience, you can increase your performance by a factor of 2 or more.
In order for the merge_list_map function to be efficient, you need to actually use the Map for what it is: an efficient data structure for key lookup.
Looping over the Map entries and looking for a match in the List, as you are doing, makes the algorithm O(N*M), where M is the size of the map and N the size of the list. That is certainly the worst you can get.
If you instead loop through the List first and then, for each term, do a lookup in the Map with Map.get(String key), you get a time complexity of O(N), since a map lookup can be considered O(1).
In terms of design, and if you can use Java 8, your problem can be translated into Streams:
public static List<Dico> merge_list_map(List<Dico> dico, Map<String, Double> weights) {
    List<Dico> wDico = dico.stream()
            .filter(d -> weights.containsKey(d.getTerm()))
            .map(d -> new Dico(d.getDocId(), d.getTerm(), d.getWeight() * weights.get(d.getTerm())))
            .collect(Collectors.toList());
    return wDico;
}
The new weighted list is built following a logical process:
stream(): take the list as a stream of Dico elements
filter(): keep only the Dico elements whose term is in the weights map
map(): for each filtered element, create a new Dico() instance with the computed weight.
collect(): collect all the new instances in a new list
return the new list that contains the filtered Dico with the new weight.
Performance-wise, I tested it against some text, The Narrative of Arthur Gordon Pym by E. A. Poe:
String text = null;
try (InputStream url = new URL("http://www.gutenberg.org/files/2149/2149-h/2149-h.htm").openStream()) {
text = new Scanner(url, "UTF-8").useDelimiter("\\A").next();
}
String[] words = text.split("[\\p{Punct}\\s]+");
System.out.println(words.length); // => 108028
Since there are only about 100k words in the book, just multiply by 10 for good measure (initDico() is a helper to build the List<Dico> from the words):
List<Dico> dico = initDico(words);
List<Dico> bigDico = new ArrayList<>(10*dico.size());
for (int i = 0; i < 10; i++) {
bigDico.addAll(dico);
}
System.out.println(bigDico.size()); // 1080280
Build the weights map, using all words (initWeights() builds a frequency map of the words in the book):
Map<String, Double> weights = initWeights(words);
System.out.println(weights.size()); // 9449 distinct words
Then the test of merging the 1M words against the map of weights:
long start = System.currentTimeMillis();
List<Dico> wDico = merge_list_map(bigDico, weights);
long end = System.currentTimeMillis();
System.out.println("===== Elapsed time (ms): "+(end-start));
// => 105 ms
The weights map is significantly smaller than yours, but it should not impact the timing since the lookup operations run in quasi-constant time.
This is not a serious benchmark of the function, but it already shows that merge_list_map() should take less than 1 s (loading and building the list and map are not part of the function).
Just to complete the exercise, following are the initialisation methods used in the test above:
private static List<Dico> initDico(String[] terms) {
    List<Dico> dico = Arrays.stream(terms)
            .map(String::toLowerCase)
            .map(s -> new Dico(0, s, 1.0)) // doc id is not relevant for this test
            .collect(Collectors.toList());
    return dico;
}
// weight of a word is the frequency*1000
private static Map<String, Double> initWeights(String[] terms) {
Map<String, Long> wfreq = termFreq(terms);
long total = wfreq.values().stream().reduce(0L, Long::sum);
return wfreq.entrySet().stream()
.collect(Collectors.toMap(Map.Entry::getKey, e -> (double)(1000.0*e.getValue()/total)));
}
private static Map<String, Long> termFreq(String[] terms) {
Map<String, Long> wfreq = Arrays.stream(terms)
.map(String::toLowerCase)
.collect(groupingBy(Function.identity(), counting()));
return wfreq;
}
You could use the contains() method of List; that way you avoid the second for loop. Even though contains() has O(n) complexity, you should see a small improvement. Of course, remember to re-implement equals(). Otherwise you should use a second Map, as both of the other answers suggested.
Use the lookup functionality of the Map, as Adam pointed out, and use HashMap as the Map implementation - HashMap lookup complexity is O(1). This should result in better performance.
One famous programmer said, "Why does anybody need a DB? Just give me a hash table!" I have a list of grammar symbols together with their frequencies. One way to store this is a map: symbol -> frequency. The other way is a [binary] relation. Problem: get the top 5 symbols by frequency.
A more general question: I'm aware of [binary] relation algebra slowly making inroads into CS theory. Is there a Java library supporting relations?
List<Entry<String, Integer>> myList = new ArrayList<>();
for (Entry<String, Integer> e : myMap.entrySet())
    myList.add(e);
Collections.sort(myList, new Comparator<Entry<String, Integer>>() {
    public int compare(Entry<String, Integer> a, Entry<String, Integer> b) {
        // compare b to a to get reverse order
        return b.getValue().compareTo(a.getValue());
    }
});
List<Entry<String, Integer>> top5 = myList.subList(0, 5);
More efficient:
TreeSet<Entry<String, Integer>> myTree = new TreeSet<>(
        new Comparator<Entry<String, Integer>>() {
            public int compare(Entry<String, Integer> a, Entry<String, Integer> b) {
                // compare b to a to get reverse order
                // note: entries with equal values compare as equal and will be dropped by the set;
                // add a tie-breaker if that matters
                return b.getValue().compareTo(a.getValue());
            }
        });
for (Entry<String, Integer> e : myMap.entrySet())
    myTree.add(e);
List<Entry<String, Integer>> top5 = new ArrayList<>();
int i = 0;
for (Entry<String, Integer> e : myTree) {
    top5.add(e);
    if (i++ == 4) break;
}
With TreeSet it should be easy:
int i = 0;
for (Symbol s : symbolTree.descendingSet()) {
    i++;
    if (i > 5) break; // or probably return
    whatever(s);
}
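This assumes symbolTree is a TreeSet ordered by frequency. A hypothetical way to build it, assuming a Symbol class exposing getName() and getFrequency() (both names are illustrative, not from the question):

TreeSet<Symbol> symbolTree = new TreeSet<>(
        Comparator.comparingInt(Symbol::getFrequency)
                  .thenComparing(Symbol::getName)); // tie-breaker so equal frequencies are not dropped as duplicates
symbolTree.addAll(symbols); // symbols: your collection of grammar symbols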
Here is a general algorithm, assuming you already have a completed symbol HashTable
Make 2 arrays:
freq[5] // Use this to save the frequency counts for the 5 most frequent seen so far
word[5] // Use this to save the words that correspond to the above array, seen so far
Use an iterator to traverse your HashTable or Map:
Compare the current symbol's frequency against the ones in freq[5] in sequential order.
If the current symbol has a higher frequency than any entry in the array pairing above, shift that entry and all entries below it one position (i.e. the 5th position gets kicked out)
Add the current symbol / frequency pair to the newly vacated position
Otherwise, ignore.
Analysis:
You make at most 5 comparisons (constant time) against the arrays with each symbol seen in the HashTable, so this is O(n)
Each time you have to shift the entries in the array down, it is also constant time. Assuming you do a shift every time, this is still O(n)
Space: O(1) to store the arrays
Runtime: O(n) to iterate through all the symbols
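A minimal Java sketch of this fixed-array approach (counts stands for your symbol-to-frequency map; all names are illustrative):

// Keep the 5 highest frequencies seen so far, in descending order.
int[] freq = new int[5];
String[] word = new String[5];
for (Map.Entry<String, Integer> e : counts.entrySet()) { // counts: symbol -> frequency
    int f = e.getValue();
    for (int i = 0; i < 5; i++) {
        if (word[i] == null || f > freq[i]) {
            // shift everything from position i down by one, then insert (5th entry falls off)
            for (int j = 4; j > i; j--) {
                freq[j] = freq[j - 1];
                word[j] = word[j - 1];
            }
            freq[i] = f;
            word[i] = e.getKey();
            break;
        }
    }
}
// word[0..4] now holds the top 5 symbols by frequency (nulls if fewer than 5 symbols exist)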