Mapping large set of Keys to a small set of Values


Suppose you have 1,000,000 keys (ints) that map to 10,000 values (ints). What would be the most efficient way, in terms of lookup performance and memory usage, to implement this mapping?
Assume the values are random, i.e. there is no range of keys that maps to a single value.
The easiest approach I can think of is a HashMap, but I wonder whether you can do better by grouping the keys that match a single value.
Map<Integer,Integer> largeMap = Maps.newHashMap();
largeMap.put(1,4);
largeMap.put(2,232);
...
largeMap.put(1000000, 4);

If the set of keys is known to be in a given range (such as the 1-1000000 shown in your example), then the simplest option is to use an array. The problem is that you need to look up values by key, and that limits you to either a map or an array.
The following uses a map of values to values simply to avoid duplicate instances of equal value objects (there may be a better way to do this, but I can't think of any). The array simply serves to look up values by index:
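private static void addToArray(Integer[] array, int key,
        Integer value, Map<Integer, Integer> map) {
    // computeIfAbsent returns the canonical instance for this value;
    // putIfAbsent would return null the first time a value is seen
    array[key] = map.computeIfAbsent(value, v -> v);
}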
And then values can be added using:
Map<Integer, Integer> keys = new HashMap<>();
Integer[] largeArray = new Integer[1000001];
addToArray(largeArray, 1, 4, keys);
addToArray(largeArray, 2, 232, keys);
...
addToArray(largeArray, 1000000, 4, keys);
If new Integer[1000001] seems like a hack, you can still maintain a sort of "index offset" to indicate the actual key associated with index 0 in the array.
And I'd put that in a class:
class LargeMap {
private Map<Integer, Integer> keys = new HashMap<>();
private Integer[] keyArray;
public LargeMap(int size) {
this.keyArray = new Integer[size];
}
public void put(int key, Integer value) {
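// computeIfAbsent returns the stored instance; putIfAbsent would return null on first insertion
this.keyArray[key] = this.keys.computeIfAbsent(value, v -> v);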
}
public Integer get(int key) {
return this.keyArray[key];
}
}
And:
public static void main(String[] args) {
LargeMap myMap = new LargeMap(1_000_001); // indices 0..1_000_000, so key 1_000_000 fits
myMap.put(1, 4);
myMap.put(2, 232);
myMap.put(1000_000, 4);
}

I'm not sure if you can optimize much here by grouping anything. A 'reverse' mapping might give you slightly better performance if you want to do lookup by value instead of by key (i.e. get all keys with a certain value), but since you didn't explicitly say that you want to do this, I wouldn't go with that approach.
For optimization you can use an int array instead of a map, if the keys are in a fixed range. Array lookup is O(1) and primitive arrays use less memory than maps.
int offset = -1;
int[] values = new int[1000000];
values[1 + offset] = 4;
values[2 + offset] = 232;
// ...
values[1000000 + offset] = 4;
If the range doesn't start at 1 you can adapt the offset.
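The offset idea can be wrapped in a small class, sketched below. The class name OffsetIntMap and its methods are hypothetical, and the sketch assumes every key lies in a range that is known up front.

```java
// Hypothetical helper: maps int keys in [minKey, maxKey] to int values via a primitive array.
final class OffsetIntMap {
    private final int minKey;   // the key stored at index 0
    private final int[] values;

    OffsetIntMap(int minKey, int maxKey) {
        if (maxKey < minKey) {
            throw new IllegalArgumentException("maxKey must be >= minKey");
        }
        this.minKey = minKey;
        this.values = new int[maxKey - minKey + 1];
    }

    void put(int key, int value) {
        values[key - minKey] = value;
    }

    // Keys that were never set read as 0, the default for int arrays.
    int get(int key) {
        return values[key - minKey];
    }
}

class OffsetIntMapDemo {
    public static void main(String[] args) {
        OffsetIntMap map = new OffsetIntMap(1, 1_000_000);
        map.put(1, 4);
        map.put(2, 232);
        map.put(1_000_000, 4);
        System.out.println(map.get(2));
    }
}
```

Because unset keys read as 0, a sentinel value or a separate presence flag would be needed if 0 is a legitimate value.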
There are also libraries like trove4j which provide better performance and more efficient storage for this kind of data than standard collections, though I don't know how they compare to the simple array approach.

HashMap is the worst solution. The hash of an integer is itself. I would say a TreeMap if you want an easily available solution. You could write your own specialized tree map, for example by splitting the keys into two shorts and having a TreeMap within a TreeMap.
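A minimal sketch of the "TreeMap within a TreeMap" idea follows. The class name SplitKeyTreeMap is hypothetical, and whether this layout actually beats a HashMap or a plain array would have to be measured.

```java
import java.util.TreeMap;

// Hypothetical two-level tree map: the outer map is keyed by the high 16 bits
// of the int key, the inner map by the low 16 bits.
final class SplitKeyTreeMap {
    private final TreeMap<Short, TreeMap<Short, Integer>> outer = new TreeMap<>();

    void put(int key, int value) {
        short high = (short) (key >>> 16);
        short low = (short) key;
        outer.computeIfAbsent(high, h -> new TreeMap<>()).put(low, value);
    }

    // Returns null when the key is absent.
    Integer get(int key) {
        TreeMap<Short, Integer> inner = outer.get((short) (key >>> 16));
        return inner == null ? null : inner.get((short) key);
    }
}
```

Lookups only rely on the two halves being equal, so the signed ordering of short does not affect get and put, but iterating the nested maps would not visit keys in numeric order.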

Related

Remove a specific value from a key (HashMaps)

I have the following HashMap (HashMap<String, String[]>) and was wondering, if there is a method to remove a specific String from the array of a specific key.
I've found only methods to remove one key basing on a value, but for example, I have:
("key1", new String[]{"A", "B", "C"})
How can I remove only B?

Here's a plain Java solution:
map.computeIfPresent("key1", (k, v) -> Arrays.stream(v)
.filter(s -> !s.equals("B")).toArray(String[]::new));

You would get the values for the specific key and remove the given value from it, then put it back into the map.
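// requires java.util.ArrayList, java.util.Arrays, java.util.List and java.util.Map
public <K> void removeValueFromKey(final Map<K, K[]> map, final K key, final K value) {
    K[] values = map.get(key);
    if (values == null) {
        return; // nothing stored under this key
    }
    List<K> valuesAsList = new ArrayList<>(values.length);
    for (K currentValue : values) {
        if (!currentValue.equals(value)) {
            valuesAsList.add(currentValue);
        }
    }
    // "new K[n]" does not compile, so borrow the runtime type from the existing array
    K[] newValues = valuesAsList.toArray(Arrays.copyOf(values, 0));
    map.put(key, newValues);
}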
Be aware that the runtime is linear in the size of the given array. There is no faster way, because you need to iterate over each element of the array to find all values that are equal to the given value.
However, you could get a faster implementation with other data structures, if that is practicable. For example, sets would be better than arrays, as would any other data structure whose contains and remove run faster than O(n).
The same holds for space complexity: there is a peak where both arrays must be held in memory. This is because the size of an array cannot be changed, so the method constructs a new array. You therefore have two arrays in memory at once, i.e. roughly 2n elements, which is still O(n).
A Collection<String> may be a better solution, depending on how often you'll call the method, compared to how many elements a map holds.
Another thing is that you can speed up the process by choosing a good initial capacity for the ArrayList.
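As a sketch of the set-based alternative mentioned above, the map could hold a Set<String> per key instead of an array. The class and method names are hypothetical, and this assumes duplicates within one key's values do not need to be preserved.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

class SetValuedMapDemo {
    // Removes one value from the set stored under key; returns true if something was removed.
    static boolean removeValue(Map<String, Set<String>> map, String key, String value) {
        Set<String> values = map.get(key);
        return values != null && values.remove(value);
    }

    public static void main(String[] args) {
        Map<String, Set<String>> map = new HashMap<>();
        // LinkedHashSet keeps insertion order, which mimics the original array order
        map.put("key1", new LinkedHashSet<>(Arrays.asList("A", "B", "C")));
        removeValue(map, "key1", "B");
        System.out.println(map.get("key1")); // prints [A, C]
    }
}
```

Removal from a hash-based set is expected constant time, and no second array has to be allocated.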

Finding the maximum and minimum values in a HashMap of ArrayLists with semi-known key - Java

I have a HashMap of ArrayLists as follows:
HashMap<String, ArrayList<Double>> Flkn = new HashMap<String, ArrayList<Double>>();
Flkn.put("T_"+l+"_"+k+"_"+n, new ArrayList());
l, k and n take their values based on several loops and hence their values change depending on the parameters.
Under these circumstances, I want to know, for a given value of k, how the minimum and maximum values of the elements in the relevant ArrayLists can be found. (Please note that the length of the ArrayLists also depends on the parameters.)
For instance, let's say that I am wanting to know the minimum and maximum values within the ArrayList for k=3. Then what I am looking for would be all the ArrayLists that have the key ("T_"+l+"_"+3+"_"+n) for every value of l and n. The problem here is that there is no way I can predict the values of l and n because they are totally dependent on the code. Another inconvenient thing is that I am wanting to get the minimum and maximum values out of the loops where l and n get their values, hence using these variables directly isn't feasible.
What would be an efficient way to get Java to call every value of l and n and fetch the values in the ArrayList in order to find the minimum and maximum of these values?

If you absolutely have to deal with such "smart keys", for any kind of processing based on its parts you first need functions to extract values of those parts:
final static Function<String, Integer> EXTRACT_K = s -> Integer.parseInt(s.replaceAll("T_\\d+_(\\d+)_\\d+", "$1"));
final static Function<String, Integer> EXTRACT_L = s -> Integer.parseInt(s.replaceAll("T_(\\d+)_\\d+_\\d+", "$1"));
final static Function<String, Integer> EXTRACT_N = s -> Integer.parseInt(s.replaceAll("T_\\d+_\\d+_(\\d+)", "$1"));
When applied to a key, these functions return k, l or n, respectively (if you know a more elegant way to do this, please comment or edit).
To be as efficient as possible (iterating over only part of the map rather than all of it), I suggest switching from HashMap to an implementation of SortedMap whose ordering is based on the values stored in the smart key:
final static Comparator<String> CMP
= Comparator.comparing(EXTRACT_K)
.thenComparing(EXTRACT_L)
.thenComparing(EXTRACT_N);
SortedMap<String, List<Double>> map = new TreeMap<>(CMP);
This gives you a map whose entries are sorted first by k, then by l and finally by n. Now it is possible to get all lists mapped to a given k using:
int k = 1;
// assumes l and n are never negative, so T_0_k_0 is the smallest possible key for k
SortedMap<String, List<Double>> subMap
= map.subMap(String.format("T_0_%s_0", k), String.format("T_0_%s_0", k + 1));
To get the max and min values across the lists in subMap, take the stream of its values, convert it to a DoubleStream and use its .summaryStatistics() as follows:
DoubleSummaryStatistics s
= subMap.values().stream()
.flatMapToDouble(vs -> vs.stream().mapToDouble(Double::doubleValue))
.summaryStatistics();
The final part is to check whether values exist:
if (s.getCount() > 0) {
    max = s.getMax();
    min = s.getMin();
} else {
    // no values exist for the given k, so max and min are undefined
}

In Java 8 you could use DoubleSummaryStatistics and do something like this:
final DoubleSummaryStatistics stats =
Flkn.entrySet().stream().filter(e -> e.getKey().matches("T_[0-9]+_" + k + "_[0-9]+"))
.flatMapToDouble(e -> e.getValue().stream().mapToDouble(Double::doubleValue))
.summaryStatistics();
System.out.println(stats.getMax());
System.out.println(stats.getMin());
Here filter keeps only the entries you need, flatMapToDouble merges your lists, and summaryStatistics gives you both the minimum and the maximum.

I'll simplify this a bit. Suppose you have a key that depends on an Integer k and a String s. It might seem a good idea to use a
Map<String, Object>
where the keys are k + " " + s (or something similar).
This is a terrible idea because, as you have realised, you have to iterate over the entire map and use String.split in order to find entries for a particular k value. This is extremely inefficient.
One common solution is to use a Map<Integer, Map<String, Object>> instead. You can get the object associated to k = 3, s = "foo" by doing map.get(3).get("foo"). You can also get all objects associated to 3 by doing map.get(3).values().
The downside to this approach is that it is a bit cumbersome to add to the map. In Java 8 you can do
map.computeIfAbsent(3, k -> new HashMap<String, Object>()).put("foo", "bar");
Google Guava's Table interface takes the pain out of using a data structure like this.
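Applied to the T_l_k_n keys from the question, the nested-map idea might look like the sketch below. The class name ByKIndex and the inner key encoding (l + "_" + n) are assumptions made for illustration, not part of the original code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.DoubleSummaryStatistics;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical index: the outer key is k, the inner key encodes the remaining parts l and n.
class ByKIndex {
    private final Map<Integer, Map<String, List<Double>>> byK = new HashMap<>();

    void add(int l, int k, int n, double value) {
        byK.computeIfAbsent(k, unused -> new HashMap<>())
           .computeIfAbsent(l + "_" + n, unused -> new ArrayList<>())
           .add(value);
    }

    // Statistics over every list stored under k; getCount() is 0 when k has no values.
    DoubleSummaryStatistics statsFor(int k) {
        Map<String, List<Double>> inner = byK.getOrDefault(k, Collections.emptyMap());
        return inner.values().stream()
                .flatMapToDouble(list -> list.stream().mapToDouble(Double::doubleValue))
                .summaryStatistics();
    }
}
```

With this layout, finding the minimum and maximum for k = 3 touches only the lists stored under 3, and no key parsing is needed.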

How to sort a map

I have a Map to sort as follows:
Map<String, String> map = new HashMap();
It contains the following String keys:
String key = "key1.key2.key3.key4"
It contains the following String values:
String value = "value1.value2"
where the key and value can vary by their number of dot sections from key1/value1 to key1.key2.key3.key4.key5/value1.value2.value3.value4.value5 non-homogeneously
I need to compare them according to the number of dots present in keys or in values according to the calling method type key / value :
sortMap(Map map, int byKey);
or
sortMap(Map map, int byValue);
The methods of course will return a sorted map.
Any help would be appreciated.

There is no way to impose any sort of order on HashMap.
If you want to order elements by some comparison on the keys, then use a TreeMap with some Comparator on the keys, or just use their default Comparable ordering.
If you want to order by the values, the only real option is to use a LinkedHashMap, which preserves the order that entries were put into the map, and then to sort the entries before inserting them into the map, or perhaps some non-JDK Map implementation. There are dirty hacks that make a key comparator that actually secretly compares the values, but these are dangerous and frequently lead to unpredictable behavior.
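A minimal sketch of the LinkedHashMap approach for the question's case, ordering entries by the number of dots in each value. The class and method names are hypothetical, and ascending order is an arbitrary choice.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SortByValueDots {
    // Counts the '.' characters in s.
    static int dots(String s) {
        return (int) s.chars().filter(c -> c == '.').count();
    }

    // Returns a new LinkedHashMap that iterates in ascending order of dots per value.
    static Map<String, String> sortByValueDots(Map<String, String> unsorted) {
        List<Map.Entry<String, String>> entries = new ArrayList<>(unsorted.entrySet());
        entries.sort(Comparator.comparingInt((Map.Entry<String, String> e) -> dots(e.getValue())));
        Map<String, String> sorted = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : entries) {
            sorted.put(e.getKey(), e.getValue());
        }
        return sorted;
    }
}
```

The result is a snapshot: entries added to the returned map later are appended at the end rather than sorted into place.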

For starters, you will need to be using an instance of SortedMap. If the map doesn't implement that interface, then it has an undefined/arbitrary iteration order and you can't control it. (Generally this is the case, since a map is a way of associating values with keys; ordering is an auxiliary concern.)
So I'll assume you're using TreeMap, which is the canonical sorted map implementation. This sorts its keys according to a Comparator which you can supply in the constructor. So if you can write such a comparator that determines which is the "lower" of two arbitrary keys (spoiler alert: you can), this will be straightforward to implement.
This will, however, only work when sorting by key. I don't know if it makes much sense to sort a map by value, and I'm not aware of any straightforward way to do this. The best I can think of is to write a Comparator<Map.Entry> that sorts on values, call Map.entrySet() and push all the entries into a list, then call Collections.sort on the list. It's not very elegant or efficient but it should get the job done if performance isn't your primary concern.
(Note also that if your keys aren't immutable, you will run into a lot of trouble, as they won't be re-sorted when externally changed.)
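For the key-based case, a comparator along these lines could be passed to the TreeMap constructor. This is a sketch with hypothetical names. The tie-breaker on natural String order is an addition that matters: a comparator looking only at the dot count would treat all keys with the same count as equal, and TreeMap would then keep just one of them.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

class SortByKeyDots {
    // Counts the '.' characters in s.
    static int dots(String s) {
        return (int) s.chars().filter(c -> c == '.').count();
    }

    // Returns a TreeMap ordered by the number of dots in each key, then alphabetically.
    static Map<String, String> sortByKeyDots(Map<String, String> unsorted) {
        Comparator<String> byDots = Comparator.comparingInt(SortByKeyDots::dots);
        Map<String, String> sorted =
                new TreeMap<>(byDots.thenComparing(Comparator.<String>naturalOrder()));
        sorted.putAll(unsorted);
        return sorted;
    }
}
```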

You should use a TreeMap and implement a ValueComparator or make the key and value objects that implement Comparable.
Must be a duplicate here...
edit: duplicate of (to name just one) Sort a Map<Key, Value> by values (Java)

I did it by the following:
@SuppressWarnings({ "unchecked", "rawtypes" })
public static Map sortMap(Map unsortedMap) {
List list = new LinkedList(unsortedMap.entrySet());
// sort list based on comparator
Collections.sort(list, new Comparator() {
public int compare(Object o1, Object o2) {
String value1 = (String)((Map.Entry) (o1)).getValue();
String value2 = (String)((Map.Entry) (o2)).getValue();
// declare the count
int count1 = findOccurances(value1, '.');
int count2 = findOccurances(value2, '.');
// Go to thru the comparing
if(count1 > count2){
return -1;
}
if(count1 < count2){
return 1;
}
return 0;
}
});
// put the sorted list into map again
Map sortedMap = new LinkedHashMap();
for (Iterator it = list.iterator(); it.hasNext();) {
Map.Entry entry = (Map.Entry) it.next();
sortedMap.put(entry.getKey(), entry.getValue());
}
return sortedMap;
}
With the following helper method:
private static int findOccurances(String s, char chr) {
final char[] chars = s.toCharArray();
int count = 0;
for (int i = 0; i < chars.length; i++) {
if (chars[i] == chr) {
count++;
}
}
return count;
}
Here, I can put some switch on the comparing part with an additional int argument to change between asc/desc.
I can change between values and keys through a switch of another int argument value to get my answer.

how to order random values by hashcode in hashmap

I have a simple class that fills a simple HashMap. I want to order the values by hash code. How can I do that?
import java.util.HashMap;
import java.util.Map;
import java.util.Random;
public class Ch11Ex18 {
public static void main(String[] args) {
Random rand = new Random(47);
Map<Integer,Integer> m = new HashMap<Integer,Integer>();
for(int i = 0; i < 10000; i++) {
// Produce a number between 0 and 20:
int r = rand.nextInt(20);
Integer freq = m.get(r);
m.put(r, freq == null ? 1 : freq + 1);
}
System.out.println(m);
}
}

You don't: HashMap is inherently unordered.
You could use TreeMap with a custom comparator, but then you should be aware that if you use unequal objects with the same hash code, only one of them will end up in the map... and even so this would order by the keys rather than the values.
You could create an ArrayList<Integer> containing a copy of the values, and sort that - but then you won't have the keys.
You could create an ArrayList<Map.Entry<Integer, Integer>> containing a copy of the entries, and sort that... but really, what's the point?
Fundamentally, this is an odd thing to do - hash codes should not be used like this, basically. They're not unique, shouldn't be viewed as a source of randomness, etc. Whatever the bigger picture is here, there's pretty much bound to be a better approach.
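The TreeMap caveat can be seen in a short sketch. It uses String keys rather than the question's Integer keys, because "Aa" and "BB" are a well-known pair of unequal strings with the same hash code. The class name is hypothetical.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

class HashOrderedTreeMapDemo {
    static Map<String, Integer> build() {
        // Orders keys by hash code only, so keys with equal hash codes count as the same key.
        Comparator<String> byHash = Comparator.comparingInt(String::hashCode);
        Map<String, Integer> map = new TreeMap<>(byHash);
        map.put("Aa", 1);
        map.put("BB", 2); // same hash code as "Aa" (2112), so this replaces the value stored under "Aa"
        return map;
    }

    public static void main(String[] args) {
        System.out.println(build()); // prints {Aa=2}
    }
}
```

The map ends up with a single entry, even though two unequal keys were inserted.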

A TreeMap sorts by key.
Map yourMap= new HashMap();
// Enter values
Map sortedMap = new TreeMap(yourMap);

In your case, since Integer.hashCode() is equal to the actual number, you can just plug your mappings into a TreeMap, and they will be sorted accordingly.

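A HashMap iterates over its buckets sequentially, and the bucket is derived from the key's hash code. For the key type Integer, the hash code is equal to the actual value of the Integer, so for the small non-negative keys used here, iterating over map.keySet() happens to yield ascending order. This is an implementation detail rather than a guarantee: once keys exceed the table size or collide, the order no longer matches the hash codes, so use a TreeMap if you need a dependable order.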

Java - TreeMap Solution

I haven't done Java in a while and I need some suggestions and ideas regarding data structures.
Currently I am using a TreeMap to map String values to Integer values. I now need to do some calculations: divide the Integer value of each map entry by the size of the whole map and store the result for each entry. I was thinking about using a Map,Integer> but is there a 3 way generics data structure in Java?
My current solution for this is this ..
int treeSize = occurrence.size();
String [][] weight = new String[treeSize][2];
int counter=0;
double score =0;
for(Entry<String, Integer> entry : occurrence.entrySet()) {
weight[counter][0]=entry.getKey();
score=entry.getValue()/treeSize;
weight[counter][1]= Double.toString(score);
counter++;
}

I would use another object to hold this data:
public class Data {
private int value;
private double score;
...
}
And then type the map as Map<String, Data>. After inserting all the values, you can iterate over the entries and update the score property of each value in the map. For example:
double size = myMap.size();
for(Map.Entry<String, Data> entry : myMap.entrySet()) {
Data data = entry.getValue();
data.setScore(data.getValue() / size);
}
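For reference, a self-contained version of this approach might look as follows. The constructor and accessors of Data are assumptions filling in the parts elided above, and ScoreDemo is a hypothetical wrapper.

```java
import java.util.Map;

// Hypothetical holder pairing the raw count with its derived score.
class Data {
    private final int value;
    private double score;

    Data(int value) { this.value = value; }

    int getValue() { return value; }
    double getScore() { return score; }
    void setScore(double score) { this.score = score; }
}

class ScoreDemo {
    // Sets each entry's score to its value divided by the number of entries in the map.
    static void updateScores(Map<String, Data> map) {
        double size = map.size(); // a double, so the division below is not integer division
        for (Data data : map.values()) {
            data.setScore(data.getValue() / size);
        }
    }
}
```

Holding the size in a double matters here: dividing one int by another would truncate the result before it is stored.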
EDIT
Another thought just came to mind. Instead of calculating the values after you have inserted it, you should probably calculate it as you are inserting it; it's more efficient that way. Of course, you can only do this if you know the total number of values beforehand.
An even better way is to perform the calculation only when you retrieve a value from the map. There are two advantages to this:
You don't need a separate object. Just abstract the access of the value from the map inside another function which returns the value associated with the key, divided by the size of the map.
Since you don't have a separate object to maintain the calculated value, you don't need to update it every time you add or delete a new value.

You could use a Map.Entry<Integer, Double> to hold the two values. (Ultimately, you'd use either AbstractMap.SimpleEntry or AbstractMap.SimpleImmutableEntry)
So your TreeMap would be TreeMap<String, Map.Entry<Integer, Double>>
However, unless you have a good reason to do otherwise, I'd strongly suggest that you do the calculation on the fly. Recalculating every fraction every time anything is inserted or deleted is time consuming, and churns small little objects, so it's likely to be slower than just doing the calculation. Also, recalculation will cause threading issues if multiple threads access the TreeMap. Instead, something like
public synchronized double getFraction(String key) {
Integer value = theTreeMap.get(key);
if (value == null)
return 0.0; // or throw an exception if you prefer...
// note, since the Map has at least one entry, no need to check for div by zero
return value.doubleValue() / theTreeMap.size();
}
