max freq of repetition in an array - java

What is the fastest way to find the maximum frequency of repetition in an array in Java, with the smallest time complexity?
A = [1, 2, 3, 4, 1, 1]
ans = 1 (the value 1 occurs most often, three times)
How can this be done?

A (mostly) linear-time solution is to use a HashMap<Integer, Integer> to build a histogram of the values appearing in A:
HashMap<Integer, Integer> m = new HashMap<Integer, Integer>();
for (int x : A) {
    Integer v = m.get(x);
    if (v == null) {
        v = Integer.valueOf(0);
    }
    m.put(x, v + 1);
}
Then go over the entire map and return the entry with the maximum value.
With the entrySet() method this pass is done in linear time as well.
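A minimal sketch of that second pass, using the histogram m built above (variable names are illustrative):
Map.Entry<Integer, Integer> best = null;
for (Map.Entry<Integer, Integer> e : m.entrySet()) {
    if (best == null || e.getValue() > best.getValue()) { // keep the entry with the largest count
        best = e;
    }
}
if (best != null) {
    System.out.println("value " + best.getKey() + " occurs " + best.getValue() + " times");
}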

Related

Print the Key for the N-th highest Value in a HashMap

I have a HashMap and have to print the N-th highest value in the HashMap.
I have managed to get the highest value.
I have sorted the HashMap first so that if there are two keys with the same value, then I get the key that comes first alphabetically.
But I still don't know how to get the key for the N-th highest value.
public void printNthHighest(HashMap<String, Integer> map, int n) {
    Map<String, Integer> sortedmap = new TreeMap<>(map);
    Map.Entry<String, Integer> maxEntry = null;
    for (Map.Entry<String, Integer> entry : sortedmap.entrySet()) {
        if (maxEntry == null || entry.getValue().compareTo(maxEntry.getValue()) > 0) {
            maxEntry = entry;
        }
    }
    System.out.println(maxEntry.getKey());
}
Here is one way. It is presumed by Nth highest that duplicates must be ignored; otherwise you would be asking about a position in the map and not the intrinsic value as compared to others. For example, if the values are 8,8,8,7,7,5,5,3,2,1, then the 3rd highest value is 5, whereas 8 would simply be the value in the 3rd position of a descending sorted list.
Initialize found to false and max to Integer.MAX_VALUE.
Sort the list in reverse order based on value. Since the TreeMap is already sorted by keys and List.sort is a stable sort (see Sorting algorithms), the keys will remain in alphabetical order for duplicate values.
Loop through the list, checking whether the current value is less than max. The key here is less than; that is what skips duplicates while iterating through the list.
If the current value is less than max, assign it to max and decrement n. Also remember the key.
If n == 0, set found to true and break out of the loop.
If the loop finishes on its own, found will be false and no nth highest value exists.
int n = 2; // rank to find; the demo output below corresponds to n = 2
Map<String, Integer> map = new TreeMap<>(Map.of(
        "peter", 40, "mike", 90, "sam", 60, "john", 90, "jimmy", 32,
        "Alex", 60, "joan", 20, "alice", 40));
List<Entry<String, Integer>> save = new ArrayList<>(map.entrySet());
save.sort(Entry.comparingByValue(Comparator.reverseOrder()));
int max = Integer.MAX_VALUE;
boolean found = false;
String key = null;
for (Entry<String, Integer> e : save) {
    if (e.getValue() < max) {
        max = e.getValue();
        key = e.getKey();
        if (--n == 0) {
            found = true;
            break;
        }
    }
}
if (found) {
    System.out.println("Value = " + max);
    System.out.println("Key = " + key);
} else {
    System.out.println("Not found");
}
prints
Value = 60
Key = Alex
This problem doesn't require sorting all the given data. Sorting everything causes a huge overhead when n is close to 1: in that case the problem can be solved in linear time, while a full sort costs O(m*log m) over the map size m (if you are not familiar with Big O notation, you might be interested in reading answers to this question). For any n less than the map size, partial sorting is the better option.
If I understood you correctly, duplicated values need to be taken into account. For instance, for n=3 and values 12,12,10,8,5 the third-largest value will be 10 (if duplicates should not be counted, the following solution can be simplified).
I suggest approaching this problem in the following steps:
Reverse the given map, so that the values of the source map become keys, and vice versa. In the case of duplicated values, the key (a value in the reversed map) that comes first alphabetically is preserved.
Create a map of frequencies, so that the values of the source map become the keys of the frequency map. Its values represent the number of occurrences of each source value.
Flatten the frequency map into a list of values (each value repeated as many times as it occurs).
Perform a partial sort by using a PriorityQueue as a container for the n highest values. PriorityQueue is based on the min-heap data structure. When instantiating a PriorityQueue you either need to provide a Comparator or the elements of the queue must have a natural ordering, i.e. implement the Comparable interface (which is the case for Integer). Methods element() and peek() retrieve the smallest element of the queue. Since the queue will contain the n largest values from the given map, its smallest element will be the n-th highest value of the map.
The implementation might look like this:
public static void printKeyForNthValue(Map<String, Integer> map, int n) {
    if (n <= 0) {
        System.out.println("required element can't be found");
        return; // guard against a non-positive n
    }
    Map<Integer, String> reversedMap = getReversedMap(map);
    Map<Integer, Integer> valueToCount = getValueFrequencies(map);
    List<Integer> flattenedValues = flattenFrequencyMap(valueToCount);
    Queue<Integer> queue = new PriorityQueue<>();
    for (int next : flattenedValues) {
        queue.add(next);
        if (queue.size() > n) { // keep only the n largest values
            queue.remove();     // evict the current minimum
        }
    }
    if (queue.size() < n) {
        System.out.println("required element wasn't found");
    } else {
        System.out.println("value:\t" + queue.element());
        System.out.println("key:\t" + reversedMap.get(queue.element()));
    }
}
private static Map<Integer, String> getReversedMap(Map<String, Integer> map) {
    Map<Integer, String> reversedMap = new HashMap<>();
    for (Map.Entry<String, Integer> entry : map.entrySet()) {
        // in case of duplicates, the key that comes first alphabetically is preserved
        reversedMap.merge(entry.getValue(), entry.getKey(),
                (s1, s2) -> s1.compareTo(s2) < 0 ? s1 : s2);
    }
    return reversedMap;
}
private static Map<Integer, Integer> getValueFrequencies(Map<String, Integer> map) {
    Map<Integer, Integer> result = new HashMap<>();
    for (Integer next : map.values()) {
        result.merge(next, 1, Integer::sum); // same as result.put(next, result.getOrDefault(next, 0) + 1)
    }
    return result;
}
private static List<Integer> flattenFrequencyMap(Map<Integer, Integer> valueToCount) {
    List<Integer> result = new ArrayList<>();
    for (Map.Entry<Integer, Integer> entry : valueToCount.entrySet()) {
        for (int i = 0; i < entry.getValue(); i++) {
            result.add(entry.getKey());
        }
    }
    return result;
}
Note: if you are not familiar with the Java 8 method merge(), inside getReversedMap() you can replace it with:
if (!reversedMap.containsKey(entry.getValue()) ||
        entry.getKey().compareTo(reversedMap.get(entry.getValue())) < 0) {
    reversedMap.put(entry.getValue(), entry.getKey());
}
main() - demo
public static void main(String[] args) {
    Map<String, Integer> source =
            Map.of("w", 10, "b", 12, "a", 10, "r", 12,
                   "k", 3, "l", 5, "y", 3, "t", 9);
    printKeyForNthValue(source, 3);
}
Output (the third-greatest value from the set 12, 12, 10, 10, 9, 5, 3, 3)
value: 10
key: a
When finding the kth highest value, you should consider using a priority queue (i.e. a heap) or quickselect.
A heap can be constructed in O(n) time; however, if you initialize an empty heap and insert the n elements one by one, it takes O(n log n) time. After that you can pop k elements to get the kth highest element.
Quickselect is an algorithm designed to find the nth highest element in O(n) average time.
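For reference, a minimal quickselect sketch (not from the original answers; a standard Lomuto-partition version, O(n) on average, O(n^2) worst case):
static int quickselect(int[] a, int k) {
    // Returns the k-th largest element (k = 1 is the maximum), reordering a in place.
    int lo = 0, hi = a.length - 1, target = a.length - k; // target index in ascending order
    while (lo < hi) {
        int pivot = a[hi], store = lo;
        for (int i = lo; i < hi; i++) { // Lomuto partition around a[hi]
            if (a[i] < pivot) {
                int t = a[i]; a[i] = a[store]; a[store] = t;
                store++;
            }
        }
        int t = a[store]; a[store] = a[hi]; a[hi] = t;
        if (store == target) return a[store];
        if (store < target) lo = store + 1;
        else hi = store - 1;
    }
    return a[lo];
}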

How to calculate the sum of values of different hashmaps with the same key?

So I have this hashmap named "hm" which produces the following output (NOTE: this is just a selection):
{1=35, 2=52, 3=61, 4=68, 5=68, 6=70, 7=70, 8=70, 9=70, 10=72, 11=72}
{1=35, 2=52, 3=61, 4=68, 5=70, 6=70, 7=70, 8=68, 9=72, 10=72, 11=72}
{1=35, 2=52, 3=61, 4=68, 5=68, 6=70, 7=70, 8=70, 9=72, 10=72, 11=72}
This output was created with the following code (NOTE: the rest of the class code is not shown here):
private int scores;
HashMap<Integer, Integer> hm = new HashMap<>();

for (int i = 0; i < fileLines.length(); i++) {
    char character = fileLines.charAt(i);
    this.scores = character; // implicit char-to-int widening: stores the character code
    int position = i + 1;
    hm.put(position, this.scores);
}
System.out.println(hm);
What I am trying to do is combine all these hashmaps into one hashmap, whose values are the sums of the values per key. I am familiar with Python's defaultdict, but could not find an equivalent working example. I have searched for an answer and found the questions below, but they do not solve my problem.
How to calculate a value for each key of a HashMap?
what java collection that provides multiple values for the same key
is there a Java equivalent of Python's defaultdict?
The desired output would be:
{1=105, 2=156, 3=183, 4=204, 5=206, ... and so on}
Eventually the average per position (key) has to be calculated, but that is a problem I think I can fix on my own once I know how to do the above.
EDIT: The real output is much, much bigger! Think 100+ of these hashmaps with more than 100 keys each.
Try something like this:
public Map<Integer, Integer> combine(List<Map<Integer, Integer>> maps) {
    Map<Integer, Integer> result = new HashMap<Integer, Integer>();
    for (Map<Integer, Integer> map : maps) {
        for (Map.Entry<Integer, Integer> entry : map.entrySet()) {
            int newValue = entry.getValue();
            Integer existingValue = result.get(entry.getKey());
            if (existingValue != null) {
                newValue = newValue + existingValue;
            }
            result.put(entry.getKey(), newValue);
        }
    }
    return result;
}
Basically:
Create a new map for the result
Iterate over each map
Take each entry and, if its key is already present in the result, add its value to the existing one; if not, put it in the map
return the result
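On Java 8+, the inner loop collapses into Map.merge(). And once the sums are combined, the per-key averages mentioned in the question are one more pass. A sketch under those assumptions (method names are illustrative):
public Map<Integer, Integer> combine(List<Map<Integer, Integer>> maps) {
    Map<Integer, Integer> result = new HashMap<>();
    for (Map<Integer, Integer> map : maps) {
        // merge() sums values that share a key, inserting absent keys as-is
        map.forEach((key, value) -> result.merge(key, value, Integer::sum));
    }
    return result;
}

public Map<Integer, Double> averages(Map<Integer, Integer> sums, int mapCount) {
    Map<Integer, Double> avg = new HashMap<>();
    sums.forEach((key, sum) -> avg.put(key, (double) sum / mapCount));
    return avg;
}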
newHashMap.put(key1, map1.get(key1) + map2.get(key1) + map3.get(key1)); // assumes key1 is present in all three maps

Increase speed of composition from list and map

I use a Dico class to store the weight of a term and the id of the document where it appears:
public class Dico
{
    private String m_term;   // term
    private double m_weight; // weight of term
    private int m_Id_doc;    // id of doc that contains the term

    public Dico(int Id_Doc, String Term, double tf_ief)
    {
        this.m_Id_doc = Id_Doc;
        this.m_term = Term;
        this.m_weight = tf_ief;
    }

    public String getTerm()
    {
        return this.m_term;
    }

    public double getWeight()
    {
        return this.m_weight;
    }

    public void setWeight(double weight)
    {
        this.m_weight = weight;
    }

    public int getDocId()
    {
        return this.m_Id_doc;
    }
}
And I use this method to calculate the final weight from a Map<String, Double> and a List<Dico>:
public List<Dico> merge_list_map(List<Dico> list, Map<String, Double> map)
{
    // in map each term is unique, but in list there is redundancy
    List<Dico> list_term_weight = new ArrayList<>();
    for (Map.Entry<String, Double> entrySet : map.entrySet())
    {
        String key = entrySet.getKey();
        Double value = entrySet.getValue();
        for (Dico dic : list)
        {
            String term = dic.getTerm();
            double weight = dic.getWeight();
            if (key.equals(term))
            {
                double new_weight = weight * value;
                list_term_weight.add(new Dico(dic.getDocId(), term, new_weight));
            }
        }
    }
    return list_term_weight;
}
I have 36736 elements in the map and 1053914 in the list; currently this program takes a lot of time to run: BUILD SUCCESSFUL (total time: 17 minutes 15 seconds).
How can I get only the terms from the list that equal a term from the map?
You can use the lookup functionality of the Map, i.e. Map.get(), given that your map maps terms to weights. This should give a significant performance improvement. The only difference is that the output list is in the same order as the input list, rather than the order in which the keys occur in the weighting Map.
public List<Dico> merge_list_map(List<Dico> list, Map<String, Double> map)
{
    // in map each term is unique, but in list there is redundancy
    List<Dico> list_term_weight = new ArrayList<>();
    for (Dico dic : list)
    {
        String term = dic.getTerm();
        double weight = dic.getWeight();
        Double value = map.get(term); // <== fetch weight from Map
        if (value != null)
        {
            double new_weight = weight * value;
            list_term_weight.add(new Dico(dic.getDocId(), term, new_weight));
        }
    }
    return list_term_weight;
}
Basic test
List<Dico> list = Arrays.asList(new Dico(1, "foo", 1), new Dico(2, "bar", 2), new Dico(3, "baz", 3));
Map<String, Double> weights = new HashMap<String, Double>();
weights.put("foo", 2d);
weights.put("bar", 3d);
System.out.println(merge_list_map(list, weights));
Output
[Dico [m_term=foo, m_weight=2.0, m_Id_doc=1], Dico [m_term=bar, m_weight=6.0, m_Id_doc=2]]
Timing test - 10,000 elements
List<Dico> list = new ArrayList<Dico>();
Map<String, Double> weights = new HashMap<String, Double>();
for (int i = 0; i < 1e4; i++) {
    list.add(new Dico(i, "foo-" + i, i));
    if (i % 3 == 0) {
        weights.put("foo-" + i, (double) i); // <== every 3rd has a weight
    }
}
long t0 = System.currentTimeMillis();
List<Dico> result1 = merge_list_map_original(list, weights);
long t1 = System.currentTimeMillis();
List<Dico> result2 = merge_list_map_fast(list, weights);
long t2 = System.currentTimeMillis();
System.out.println(String.format("Original: %d ms", t1 - t0));
System.out.println(String.format("Fast: %d ms", t2 - t1));
// prove results equivalent, just different order
// requires Dico class to have hashCode()/equals() - used the Eclipse default generator
System.out.println(new HashSet<Dico>(result1).equals(new HashSet<Dico>(result2)));
Output
Original: 1005 ms
Fast: 16 ms <=== loads quicker
true
Also, check the initialization of the Map (see the HashMap javadoc: http://docs.oracle.com/javase/7/docs/api/java/util/HashMap.html). Rehashing the map is costly for performance.
As a general rule, the default load factor (.75) offers a good
tradeoff between time and space costs. Higher values decrease the
space overhead but increase the lookup cost (reflected in most of the
operations of the HashMap class, including get and put). The expected
number of entries in the map and its load factor should be taken into
account when setting its initial capacity, so as to minimize the
number of rehash operations. If the initial capacity is greater than
the maximum number of entries divided by the load factor, no rehash
operations will ever occur.
If many mappings are to be stored in a HashMap instance, creating it
with a sufficiently large capacity will allow the mappings to be
stored more efficiently than letting it perform automatic rehashing as
needed to grow the table.
If you know, or have an approximation of the number of elements that you put in the map, you can create your Map like this:
Map<String, Double> foo = new HashMap<String, Double>(maxSize * 2);
In my experience, you can increase your performance by a factor of 2 or more.
In order for the merge_list_map function to be efficient, you need to actually use the Map for what it is: an efficient data structure for key lookup.
As you are doing it, looping over the Map entries and looking for a match in the List, the algorithm is O(N*M), where M is the size of the map and N the size of the list. That is certainly the worst you can get.
If you loop first through the List and then, for each term, do a lookup in the Map with Map.get(String key), you get a time complexity of O(N), since a map lookup can be considered O(1).
In terms of design, if you can use Java 8, your problem can be translated into Streams:
public static List<Dico> merge_list_map(List<Dico> dico, Map<String, Double> weights) {
    List<Dico> wDico = dico.stream()
            .filter(d -> weights.containsKey(d.getTerm()))
            .map(d -> new Dico(d.getDocId(), d.getTerm(),
                               d.getWeight() * weights.get(d.getTerm())))
            .collect(Collectors.toList());
    return wDico;
}
The new weighted list is built following a logical process:
stream(): take the list as a stream of Dico elements
filter(): keep only the Dico elements whose term is in the weights map
map(): for each filtered element, create a new Dico() instance with the computed weight.
collect(): collect all the new instances in a new list
return the new list that contains the filtered Dico with the new weight.
Performance-wise, I tested it against some text, The Narrative of Arthur Gordon Pym by E. A. Poe:
String text = null;
try (InputStream url = new URL("http://www.gutenberg.org/files/2149/2149-h/2149-h.htm").openStream()) {
    text = new Scanner(url, "UTF-8").useDelimiter("\\A").next();
}
String[] words = text.split("[\\p{Punct}\\s]+");
System.out.println(words.length); // => 108028
Since there are only ~100k words in the book, for good measure, just multiply the list by 10 (initDico() is a helper that builds the List<Dico> from the words):
List<Dico> dico = initDico(words);
List<Dico> bigDico = new ArrayList<>(10 * dico.size());
for (int i = 0; i < 10; i++) {
    bigDico.addAll(dico);
}
System.out.println(bigDico.size()); // 1080280
Build the weights map, using all words (initWeights() builds a frequency map of the words in the book):
Map<String, Double> weights = initWeights(words);
System.out.println(weights.size()); // 9449 distinct words
Then the test of merging the 1M words against the map of weights:
long start = System.currentTimeMillis();
List<Dico> wDico = merge_list_map(bigDico, weights);
long end = System.currentTimeMillis();
System.out.println("===== Elapsed time (ms): "+(end-start));
// => 105 ms
The weights map is significantly smaller than yours, but that should not impact the timing, since the lookup operations run in quasi-constant time.
This is no serious benchmark for the function, but it already shows that merge_list_map() should run in under a second (loading and building the list and the map are not part of the function).
Just to complete the exercise, here are the initialisation methods used in the test above:
private static List<Dico> initDico(String[] terms) {
    List<Dico> dico = Arrays.stream(terms)
            .map(String::toLowerCase)
            .map(s -> new Dico(0, s, 1.0)) // the doc id is irrelevant for this test
            .collect(Collectors.toList());
    return dico;
}

// weight of a word is the frequency*1000
private static Map<String, Double> initWeights(String[] terms) {
    Map<String, Long> wfreq = termFreq(terms);
    long total = wfreq.values().stream().reduce(0L, Long::sum);
    return wfreq.entrySet().stream()
            .collect(Collectors.toMap(Map.Entry::getKey, e -> 1000.0 * e.getValue() / total));
}

// assumes: import static java.util.stream.Collectors.*;
private static Map<String, Long> termFreq(String[] terms) {
    Map<String, Long> wfreq = Arrays.stream(terms)
            .map(String::toLowerCase)
            .collect(groupingBy(Function.identity(), counting()));
    return wfreq;
}
You could also use the contains() method of List to avoid the second for loop. Even though contains() has O(n) complexity, you should see a small improvement. Of course, remember to re-implement equals(). Otherwise you should use a second Map, as others suggested.
Use the lookup functionality of the Map, as Adam pointed out, and use HashMap as the Map implementation - HashMap lookup complexity is O(1). This should result in increased performance.

Find map value with highest number of occurrences

I have a Map<Integer,Integer>
1 10
2 10
3 20
5 20
6 11
7 22
How do I find the value repeated the most in the map? In this case, that is 10 & 20; the repeat count is 2 in both cases.
Don't reinvent the wheel and use the frequency method of the Collections class:
public static int frequency(Collection<?> c, Object o)
If you need to count the occurrences for all values, use a Map and loop cleverly :)
Or put your values in a Set and loop over each element of the set with the frequency method above, as sketched below. HTH
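A rough sketch of that Set + Collections.frequency idea (variable names are illustrative; note that frequency() rescans the whole collection, so this is quadratic in the worst case):
Collection<Integer> values = map.values();
Set<Integer> distinct = new HashSet<>(values);
int best = 0;
List<Integer> mostRepeated = new ArrayList<>();
for (Integer v : distinct) {
    int f = Collections.frequency(values, v); // count occurrences of v among the map values
    if (f > best) {
        best = f;
        mostRepeated.clear();
        mostRepeated.add(v);
    } else if (f == best) {
        mostRepeated.add(v);
    }
}
System.out.println(mostRepeated + " each occur " + best + " times"); // [10, 20] each occur 2 times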
If you fancy a more functional, Java 8 one-liner solution with lambdas, try:
Map<Integer, Long> occurrences =
map.values().stream().collect(Collectors.groupingBy(w -> w, Collectors.counting()));
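From there, one more step picks out the most frequent value(s); a sketch assuming the occurrences map built above:
long max = Collections.max(occurrences.values()); // the highest count
List<Integer> winners = occurrences.entrySet().stream()
        .filter(e -> e.getValue() == max)         // keep every value that reaches it
        .map(Map.Entry::getKey)
        .collect(Collectors.toList());
System.out.println(winners + " repeated " + max + " times"); // e.g. [10, 20] repeated 2 times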
Loop over the hashmap and count the number of repetitions:
Map<Integer, Integer> countMap = new HashMap<>();
for (Integer value : myMap.values()) {
    Integer count = 1;
    if (countMap.containsKey(value)) {
        count = countMap.get(value);
        count++;
    }
    countMap.put(value, count);
}
Then loop over the result map and find the max(s):
Integer maxValue = 0;
List<Integer> maxResultList = new ArrayList<>();
for (Map.Entry<Integer, Integer> entry : countMap.entrySet()) {
    if (entry.getValue() > maxValue) {
        maxValue = entry.getValue();
        maxResultList.clear(); // a new maximum invalidates earlier candidates
        maxResultList.add(entry.getKey());
    } else if (entry.getValue().equals(maxValue)) {
        maxResultList.add(entry.getKey());
    }
}
A simple solution is to write your own put method that keeps all repeated values.
For repeated values:
Map<String, List<Integer>> map = new HashMap<>(); // backing multimap

void put(String x, int i) {
    List<Integer> list = map.get(x);
    if (list == null) {
        list = new ArrayList<Integer>();
        map.put(x, list);
    }
    list.add(i);
}
So, in this case, the map's values form a list like [10, 10, 20, 20].
For getting the repeated values' occurrence count:
You need to compare the size of your values list with the size of your values set.
Collection<Integer> listOfValues = map.values();
Set<Integer> setOfValues = new HashSet<>(map.values());
Now compare the sizes of both collections; if they are unequal, you have duplicates. Subtracting the set size from the list size tells you how many duplicate occurrences there are in total (see the sketch below).
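A compact sketch of that comparison (it reveals only that duplicates exist and how many duplicate occurrences there are, not which value repeats most):
Collection<Integer> values = map.values();                 // e.g. [10, 10, 20, 20, 11, 22]
Set<Integer> unique = new HashSet<>(values);               // {10, 20, 11, 22}
int duplicateOccurrences = values.size() - unique.size();  // 2 here
System.out.println("duplicates present: " + (duplicateOccurrences > 0));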
We can use a number of simple methods to do this.
First, we can define a method that counts elements and returns a map from each value to its occurrence count (note that Collectors.counting() produces Long counts):
static <T> Map<T, Long> countAll(Collection<T> c) {
    return c.stream().collect(Collectors.groupingBy(k -> k, Collectors.counting()));
}
Then, to filter out all entries having fewer instances than the one with the most, we can do this:
static <T, C extends Collection<T>> C maxima(Collection<T> c, Comparator<? super T> comp,
                                             Supplier<C> supplier) {
    T max = c.stream().max(comp).orElseThrow(NoSuchElementException::new);
    return c.stream().filter(t -> comp.compare(t, max) >= 0)
            .collect(Collectors.toCollection(supplier));
}
Now we can use them together to get the results we want:
maxima(countAll(yourMap.values()).entrySet(),
       Map.Entry.comparingByValue(), HashSet::new);
Note that this would produce a HashSet<Map.Entry<Integer, Long>> in your case.
Try this simple method:
public String getMapKeyWithHighestValue(HashMap<String, Integer> map) {
    String keyWithHighestVal = "";
    // getting the maximum value in the HashMap
    int maxValueInMap = Collections.max(map.values());
    // iterate through the map to find the key that corresponds to the maximum value
    for (Map.Entry<String, Integer> entry : map.entrySet()) {
        if (entry.getValue() == maxValueInMap) {
            keyWithHighestVal = entry.getKey(); // this key has the max value
        }
    }
    return keyWithHighestVal;
}

Representing binary relation in java

One famous programmer said, "Why does anybody need a DB? Just give me a hash table!" I have a list of grammar symbols together with their frequencies. One way to see it is a map: symbol# -> frequency. The other way is a [binary] relation. Problem: get the top 5 symbols by frequency.
More general question: I'm aware of [binary] relational algebra slowly making inroads into CS theory. Is there a Java library supporting relations?
List<Entry<String, Integer>> myList = new ArrayList<Entry<String, Integer>>();
for (Entry<String, Integer> e : myMap.entrySet())
    myList.add(e);
Collections.sort(myList, new Comparator<Entry<String, Integer>>() {
    public int compare(Entry<String, Integer> a, Entry<String, Integer> b) {
        // compare b to a to get reverse order
        return b.getValue().compareTo(a.getValue());
    }
});
List<Entry<String, Integer>> top5 = myList.subList(0, 5);
More efficient:
TreeSet<Entry<String, Integer>> myTree = new TreeSet<Entry<String, Integer>>(
        new Comparator<Entry<String, Integer>>() {
            public int compare(Entry<String, Integer> a, Entry<String, Integer> b) {
                // compare b to a to get reverse order;
                // note: entries whose values compare equal are treated as
                // duplicates and dropped by the TreeSet
                return b.getValue().compareTo(a.getValue());
            }
        });
for (Entry<String, Integer> e : myMap.entrySet())
    myTree.add(e);
List<Entry<String, Integer>> top5 = new ArrayList<>();
int i = 0;
for (Entry<String, Integer> e : myTree) {
    top5.add(e);
    if (i++ == 4) break;
}
With a TreeSet it should be easy:
int i = 0;
for (Symbol s : symbolTree.descendingSet()) {
    i++;
    if (i > 5) break; // or probably return
    whatever(s);
}
Here is a general algorithm, assuming you already have a completed symbol hash table.
Make 2 arrays:
freq[5] // holds the frequency counts of the 5 most frequent symbols seen so far
word[5] // holds the words corresponding to the array above
Use an iterator to traverse your HashTable or Map:
Compare the current symbol's frequency against the entries in freq[5] in sequential order.
If the current symbol has a higher frequency than an entry in the array pairing above, shift that entry and all entries below it down one position (i.e. the 5th position gets kicked out).
Add the current symbol/frequency pair in the newly vacated position.
Otherwise, ignore it (a sketch follows after the analysis below).
Analysis:
You make at most 5 comparisons (constant time) against the arrays for each symbol in the HashTable, so this is O(n).
Each time you shift entries down in the array, it is also constant time. Even if you shift on every symbol, this is still O(n).
Space: O(1) to store the arrays
Runtime: O(n) to iterate through all the symbols
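A sketch of that array-based scan (variable names are illustrative; assumes the symbol frequencies are in a Map<String, Integer>):
// Keep the 5 most frequent symbols in two parallel arrays, updated in one pass.
int[] freq = new int[5];
String[] word = new String[5];
for (Map.Entry<String, Integer> e : frequencies.entrySet()) {
    int f = e.getValue();
    for (int pos = 0; pos < 5; pos++) {
        if (f > freq[pos]) {
            for (int j = 4; j > pos; j--) { // shift lower entries down; the 5th falls out
                freq[j] = freq[j - 1];
                word[j] = word[j - 1];
            }
            freq[pos] = f;
            word[pos] = e.getKey();
            break;
        }
    }
}
// word[0..4] now holds the top 5 symbols, freq[0..4] their counts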
