I have the following HashMap (HashMap<String, String[]>) and was wondering if there is a method to remove a specific String from the array of a specific key.
I've only found methods to remove a key based on a value, but, for example, I have:
("key1", new String[]{"A", "B", "C"})
How can I remove only B?
Here's a plain Java solution:
map.computeIfPresent("key1", (k, v) -> Arrays.stream(v)
.filter(s -> !s.equals("B")).toArray(String[]::new));
You would get the values for the specific key, remove the given value from them, and then put the result back into the map.
public <K> void removeValueFromKey(final Map<K, K[]> map, final K key, final K value) {
    K[] values = map.get(key);
    if (values == null) {
        return;
    }
    List<K> valuesAsList = new ArrayList<>(values.length);
    for (K currentValue : values) {
        if (!currentValue.equals(value)) {
            valuesAsList.add(currentValue);
        }
    }
    // new K[...] is illegal in Java; build an array of the same runtime type instead
    K[] newValues = valuesAsList.toArray(Arrays.copyOf(values, valuesAsList.size()));
    map.put(key, newValues);
}
Be aware that the runtime is, of course, linear in the size of the given array. There is no faster way: you need to visit each element of the array to find all values equal to the given one.
However, you could get a faster implementation with other data structures, if that is practicable. For example, a set would be better than an array, as would any other data structure whose contains (and remove) operations are faster than O(n).
The same holds for space complexity: there is a peak where you need to hold both arrays in memory. This is because the size of an array cannot be changed, so the method must construct a new one; you will briefly have two arrays in memory, i.e. 2n space.
A Collection<String> may be a better solution, depending on how often you'll call the method and how many elements the map holds.
Another thing is that you can speed up the process by guessing a good initial capacity for the ArrayList.
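For illustration, here is a minimal sketch of the set-based alternative, assuming the map can be restructured as Map<String, Set<String>>:

Map<String, Set<String>> map = new HashMap<>();
map.put("key1", new HashSet<>(Arrays.asList("A", "B", "C")));

// HashSet.remove is O(1) on average, and no array copy is needed
Set<String> values = map.get("key1");
if (values != null) {
    values.remove("B");
}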
Related
If you had 1,000,000 keys (ints) that mapped to 10,000 values (ints), what would be the most efficient way to implement this, in terms of lookup performance and memory usage?
Assume the values are random, i.e. there is no range of keys that all map to a single value.
The easiest approach I can think of is a HashMap, but I wonder if you can do better by grouping the keys that map to a single value.
Map<Integer,Integer> largeMap = Maps.newHashMap();
largeMap.put(1,4);
largeMap.put(2,232);
...
largeMap.put(1000000, 4);
If the set of keys is known to be in a given range (such as 1-1000000 in your example), then the simplest option is to use an array. The problem is that you need to look up values by key, and that limits you to either a map or an array.
The following uses a map of values to values simply to avoid duplicate instances of equal value objects (there may be a better way to do this, but I can't think of any). The array simply serves to look up values by index:
private static void addToArray(Integer[] array, int key,
        Integer value, Map<Integer, Integer> map) {
    // computeIfAbsent returns the canonical instance for this value;
    // putIfAbsent would return null on the first insert of a new value
    array[key] = map.computeIfAbsent(value, v -> v);
}
And then values can be added using:
Map<Integer, Integer> keys = new HashMap<>();
Integer[] largeArray = new Integer[1000001];
addToArray(largeArray, 1, 4, keys);
addToArray(largeArray, 2, 232, keys);
...
addToArray(largeArray, 1000000, 4, keys);
If new Integer[1000001] seems like a hack, you can still maintain a sort of "index offset" to indicate the actual key associated with index 0 in the array.
And I'd put that in a class:
class LargeMap {
    private Map<Integer, Integer> keys = new HashMap<>();
    private Integer[] keyArray;

    public LargeMap(int size) {
        this.keyArray = new Integer[size];
    }

    public void put(int key, Integer value) {
        // computeIfAbsent returns the canonical instance for this value;
        // putIfAbsent would return null on the first insert of a new value
        this.keyArray[key] = this.keys.computeIfAbsent(value, v -> v);
    }

    public Integer get(int key) {
        return this.keyArray[key];
    }
}
And:
public static void main(String[] args) {
    LargeMap myMap = new LargeMap(1_000_001); // index 1_000_000 must be in range
    myMap.put(1, 4);
    myMap.put(2, 232);
    myMap.put(1_000_000, 4);
}
I'm not sure you can optimize much here by grouping anything. A 'reverse' mapping might give you slightly better performance if you want to look up by value instead of by key (i.e. get all keys with a certain value), but since you didn't explicitly say you want to do this, I wouldn't go with that approach.
For optimization you can use an int array instead of a map, if the keys are in a fixed range. Array lookup is O(1) and primitive arrays use less memory than maps.
int offset = -1;
int[] values = new int[1000000];
values[1 + offset] = 4;
values[2 + offset] = 232;
// ...
values[1000000 + offset] = 4;
If the range doesn't start at 1 you can adapt the offset.
There are also libraries like trove4j which provide better performance and more efficient storage for this kind of data than the standard collections, though I don't know how they compare to the simple array approach.
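As a rough sketch of what that looks like (assuming trove4j 3.x on the classpath; TIntIntHashMap is its primitive int-to-int map):

import gnu.trove.map.hash.TIntIntHashMap;

// keys and values are stored as primitive ints, so no Integer boxing
TIntIntHashMap largeMap = new TIntIntHashMap(1_000_000);
largeMap.put(1, 4);
largeMap.put(2, 232);
largeMap.put(1_000_000, 4);
int v = largeMap.get(1); // 4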
HashMap is the worst solution here: the hash of an Integer is the integer itself. I would suggest a TreeMap if you want a readily available solution. You could also write your own specialized tree map, for example splitting each key into two shorts and keeping a TreeMap within a TreeMap.
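A minimal sketch of that last idea (the field and method names are hypothetical, not a complete implementation):

// split each 32-bit key into a high and a low 16-bit half
TreeMap<Short, TreeMap<Short, Integer>> nested = new TreeMap<>();

void put(int key, int value) {
    short hi = (short) (key >>> 16); // high 16 bits of the key
    short lo = (short) key;          // low 16 bits of the key
    nested.computeIfAbsent(hi, h -> new TreeMap<>()).put(lo, value);
}

Integer get(int key) {
    TreeMap<Short, Integer> row = nested.get((short) (key >>> 16));
    return row == null ? null : row.get((short) key);
}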
I have a HashMap of ArrayLists as follows:
HashMap<String, ArrayList<Double>> Flkn = new HashMap<String, ArrayList<Double>>();
Flkn.put("T_" + l + "_" + k + "_" + n, new ArrayList<Double>());
l, k and n take their values based on several loops and hence their values change depending on the parameters.
Under these circumstances, I want to know, for a given value of k, how to find the minimum and maximum values of the elements in the relevant ArrayLists. (Please note that the length of the ArrayLists also depends on the parameters.)
For instance, let's say I want to know the minimum and maximum values within the ArrayLists for k=3. What I am looking for would be all the ArrayLists that have the key ("T_"+l+"_"+3+"_"+n) for every value of l and n. The problem is that there is no way I can predict the values of l and n, because they are totally dependent on the code. Another inconvenience is that I want to get the minimum and maximum values outside the loops where l and n get their values, so using these variables directly isn't feasible.
What would be an efficient way to have Java run through every value of l and n and fetch the values in the relevant ArrayLists in order to find their minimum and maximum?
If you absolutely have to deal with such "smart keys", then for any kind of processing based on their parts you first need functions to extract the values of those parts:
final static Function<String, Integer> EXTRACT_K = s -> Integer.parseInt(s.replaceAll("T_\\d+_(\\d+)_\\d+", "$1"));
final static Function<String, Integer> EXTRACT_L = s -> Integer.parseInt(s.replaceAll("T_(\\d+)_\\d+_\\d+", "$1"));
final static Function<String, Integer> EXTRACT_N = s -> Integer.parseInt(s.replaceAll("T_\\d+_\\d+_(\\d+)", "$1"));
Applied to a key, these functions return k, l, or n, respectively (if anyone knows a more elegant way to do this, please comment or edit).
To be as efficient as possible (iterating over only part of the map rather than the whole of it), I suggest switching from HashMap to any implementation of SortedMap, with ordering based on the values stored in the smart key:
final static Comparator<String> CMP
= Comparator.comparing(EXTRACT_K)
.thenComparing(EXTRACT_L)
.thenComparing(EXTRACT_N);
SortedMap<String, List<Double>> map = new TreeMap<>(CMP);
This way you get a map whose entries are sorted first by k, then by l, and finally by n. Now it is possible to get all lists mapped to a given k using:
int k = 1;
Collection<List<Double>> lists
= map.subMap(String.format("T_0_%s_0", k), String.format("T_0_%s_0", k + 1)).values();
To get the max and min across the items of the subMap, take the stream of those lists, convert it to a DoubleStream, and use its .summaryStatistics() as follows:
DoubleSummaryStatistics s
        = lists.stream()
               .flatMapToDouble(vs -> vs.stream().mapToDouble(Double::doubleValue))
               .summaryStatistics();
The final part is to check whether values exist:
if (s.getCount() > 0) {
    max = s.getMax();
    min = s.getMin();
} else {
    // no values exist for the given k, so max and min are undefined
}
In Java 8 you could use DoubleSummaryStatistics and do something like this:
final DoubleSummaryStatistics stats =
Flkn.entrySet().stream().filter(e -> e.getKey().matches("T_[0-9]+_" + k + "_[0-9]+"))
.flatMapToDouble(e -> e.getValue().stream().mapToDouble(Double::doubleValue))
.summaryStatistics();
System.out.println(stats.getMax());
System.out.println(stats.getMin());
filter to keep only the entries you need; flatMapToDouble to merge your lists; and summaryStatistics to get both the minimum and maximum.
I'll simplify this a bit. Suppose you have a key that depends on an Integer k and a String s. It might seem a good idea to use a
Map<String, Object>
where the keys are k + " " + s (or something similar).
This is a terrible idea because, as you have realised, you have to iterate over the entire map and use String.split in order to find entries for a particular k value. This is extremely inefficient.
One common solution is to use a Map<Integer, Map<String, Object>> instead. You can get the object associated to k = 3, s = "foo" by doing map.get(3).get("foo"). You can also get all objects associated to 3 by doing map.get(3).values().
The downside to this approach is that it is a bit cumbersome to add to the map. In Java 8 you can do
map.computeIfAbsent(3, k -> new HashMap<String, Object>()).put("foo", "bar");
Google Guava's Table interface takes the pain out of using a data structure like this.
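For instance, a brief sketch assuming Guava is on the classpath:

import com.google.common.collect.HashBasedTable;
import com.google.common.collect.Table;

// row key is k, column key is s, one put/get instead of nested maps
Table<Integer, String, Object> table = HashBasedTable.create();
table.put(3, "foo", "bar");

Object value = table.get(3, "foo");         // "bar"
Map<String, Object> rowFor3 = table.row(3); // all entries with k = 3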
Suppose I have declared an enum and a corresponding EnumMap as:
enum MyEnum {
CONSTANT1, CONSTANT2, CONSTANT3;
}
EnumMap<MyEnum, String> MyEnumMap = new EnumMap<MyEnum, String>(MyEnum.class);
I want to iterate over MyEnumMap, for example, just to print each Entry one by one.
What is the best (fastest) approach to iterate over the keys in the following cases:
When it is ensured that each constant in MyEnum is a key in MyEnumMap
When each constant in MyEnum may or may not be a key in MyEnumMap
I want to choose between a foreach loop using MyEnumMap.keySet() or MyEnum.values(). Any other approach is most welcome.
It does not matter. Internally, EnumMap is implemented with a pair of arrays of the same length as the number of enum constants: one array holds the enum elements, while the other holds the objects mapped to them, or null placeholders. Any iteration over an EnumMap is therefore equivalent to a for loop over an integer index traversing the entire range of enum ordinals, so you should pick the approach that makes your code most readable to you.
If you take a look at the code of EnumMap#keySet()
public Set<K> keySet() {
    Set<K> ks = keySet;
    if (ks != null)
        return ks;
    else
        return keySet = new KeySet();
}
you will notice that it returns the keySet used internally by EnumMap to store keys.
Now, each time we call MyEnum.values() we get a different array filled with all the enum constants. This means that first an empty array is created, which then needs to be filled with all the constants, requiring an extra iteration.
So with the first approach (keySet()) there is no extra work, since the set already exists inside the map, while the second approach (values()) creates a temporary array, which involves an additional iteration over all MyEnum elements.
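A quick way to see that values() hands back a fresh array on every call:

MyEnum[] a = MyEnum.values();
MyEnum[] b = MyEnum.values();
System.out.println(a == b); // false: each call clones the internal array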
Perhaps you just want another way of writing the code. Since keys are always unique:
for (MyEnum myEnum : MyEnum.values()) {
    String value = map.get(myEnum);
    if (value != null) {
        // use the value here
    }
}
Just another way to write it.
Or you could also try
for (Map.Entry<MyEnum, String> entry : map.entrySet()) {
System.out.println(entry.getKey() + "/" + entry.getValue());
}
It depends on your application logic, but here are some hints:
For 1):
// it is a bit faster to iterate over a plain array than over a Set,
// and you can also spot enum constants that are missing from the map,
// so you can add logic for those cases
for (MyEnum e : MyEnum.values()) {
    // here you can tell what is and is not contained in your map
}
For 2), it is still better to use approach 1), because it also tells you which enum values are not contained in the map:
for (MyEnum e : MyEnumMap.keySet()) {
    // here you can see everything that is in your map,
    // but you can't tell what is in the enum but not in the map
}
I have a map like so:
Map<List<Item>, Double> items = new HashMap<List<Item>, Double>();
I would like to sort this hashmap based on the size of the List<Item>, with the largest-sized lists first. I don't care about the ordering among same-sized lists, though.
So far I've tried to use a TreeSet like so:
SortedSet<Map.Entry<List<Item>, Double>> sortedItems = new TreeSet<Map.Entry<List<Item>, Double>>(
        new Comparator<Map.Entry<List<Item>, Double>>() {
            @Override
            public int compare(
                    Entry<List<Item>, Double> o1,
                    Entry<List<Item>, Double> o2) {
                return o2.getKey().size() - o1.getKey().size();
            }
        });
sortedItems.addAll(items.entrySet());
However, the sortedItems set only keeps one list of each size: it treats equally sized lists as duplicates and ignores them. How can I fix this issue?
EDIT: From what I can tell, when two lists of the same size are compared, my compare method returns 0. This tells the set that the entries are equal, so they are treated as duplicates. I guess the only way to fix this is to ensure that the compare method never returns 0. So I wrote this code:
@Override
public int compare(
        Entry<List<AuctionItem>, Double> o1,
        Entry<List<AuctionItem>, Double> o2) {
    if (o1.getKey().size() <= o2.getKey().size()) {
        return -1;
    } else {
        return 1;
    }
}
You are using a TreeSet with a Comparator, and as per your implementation of compare(), it returns 0 when the list sizes are the same. Since a TreeSet cannot contain duplicates, it adds only one of the lists whose sizes are equal. You do not need a TreeSet in order to sort your Map; sets should be used when the elements resemble mathematical sets (no duplicate elements). You could instead do:
List<Map.Entry<List<Item>, Double>> list =
        new LinkedList<Map.Entry<List<Item>, Double>>(map.entrySet());
Collections.sort(list, new Comparator<Map.Entry<List<Item>, Double>>() {
    public int compare(Map.Entry<List<Item>, Double> o1, Map.Entry<List<Item>, Double> o2) {
        // int has no compareTo method; compare the sizes with Integer.compare
        // (o2 first, so the largest lists come first)
        return Integer.compare(o2.getKey().size(), o1.getKey().size());
    }
});
That is, put the entries of map in a List and then sort the list.
If you attempt to put an item into a TreeSet and your custom Comparator returns 0 when comparing it to an existing element, the new item will not be added to the Set.
You have to use a Comparator which doesn't return 0. For Lists whose sizes are equal, you have to define a consistent, arbitrary order.
Here is an easy and convenient way to do so:
new Comparator<Map.Entry<List<Item>, Double>>() {
    @Override
    public int compare(Entry<List<Item>, Double> o1,
            Entry<List<Item>, Double> o2) {
        int diff = o2.getKey().size() - o1.getKey().size();
        return diff != 0 ? diff :
                System.identityHashCode(o2.getKey()) -
                System.identityHashCode(o1.getKey());
    }
};
Basically what I do is this: if the two lists have different sizes, size2 - size1 will do. If they have the same size, I return the difference of their identity hash codes, which is practically always nonzero for distinct objects (identity hash codes are not strictly guaranteed to be unique, but collisions are very rare). The identity hash code never changes for a given object, so the comparator will always return the same order for two lists of the same size.
Using this comparator you will get a sorted set which allows lists having the same size, and it will be sorted by list size.
It seems odd that you are using a list of values (items) as the key to your hashmap; however, I'll give it a go.
At its core, a Map is not an ordered collection. In short, if "Pete" has a height of 67 inches and "Bob" has a height of 72 inches, then without some extra bit of information, it is not possible to determine whether Bob should come before or after Pete.
Ordered by "key" or in this case, "name" one might impose alphabetical ordering, in which case "Bob" comes before "Pete".
Ordered by "value" or in this case, "height" one might impose smallest to largest ordering, in which case "Pete" comes before "Bob".
I'm sure you know what you want to order by (the size of the list), but this example is meant to illustrate that a Map alone is a poor data structure for ordering. Even Java's ordered maps only maintain insertion order (LinkedHashMap) or key order under a fixed comparator (TreeMap).
My suggestion is to keep two collections in one wrapping class: one holding an ordered list of the keys, the other holding the Map. Walk the ordered list of keys and return the map values in that order whenever you want the values ordered by their key characteristics. It will be easier to understand, and much more readable.
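Here is a minimal sketch of such a wrapping class, tailored to the list-size ordering from the question (the class and method names are hypothetical):

class SizeOrderedItems {
    private final List<List<Item>> orderedKeys = new ArrayList<>();
    private final Map<List<Item>, Double> values = new HashMap<>();

    void put(List<Item> key, Double value) {
        if (!values.containsKey(key)) {
            orderedKeys.add(key);
            // simple approach: re-sort on insert, keeping the largest lists first
            orderedKeys.sort((a, b) -> Integer.compare(b.size(), a.size()));
        }
        values.put(key, value);
    }

    // walk the ordered keys and return the map values in that order
    List<Double> valuesBySize() {
        List<Double> result = new ArrayList<>();
        for (List<Item> key : orderedKeys) {
            result.add(values.get(key));
        }
        return result;
    }
}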
Also realize that the Map does not listen to its key values: if someone holding a key adds an element to the List that serves as that key, the Map will not know of the alteration and will not recompute the key's bucket from its new hash code. As such, to use a Map properly, you need to approach it in one of two ways.
Make the map keys immutable, copying the values upon input and returning Collections.unmodifiableList(...) wrappers on output (see the sketch after this list).
Accept Map keys that can be listened to, and make the map a subscriber to the key updates that might occur. When the map detects a key change, the values are removed from the old key location and re-added back to the map with the new key.
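A minimal sketch of the first option (the method names are hypothetical; note the copy is shallow, so it assumes Item itself is immutable):

Map<List<Item>, Double> items = new HashMap<>();

void putEntry(List<Item> key, double value) {
    items.put(new ArrayList<>(key), value); // defensive copy on input
}

Set<List<Item>> keyViews() {
    Set<List<Item>> views = new HashSet<>();
    for (List<Item> k : items.keySet()) {
        views.add(Collections.unmodifiableList(k)); // read-only view on output
    }
    return views;
}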
If you don't need constant sorting over time, you can do that:
Map<List<Item>, Double> items = new HashMap<>();
List<Map.Entry<List<Item>, Double>> list = new ArrayList<>(items.entrySet());
Collections.sort(list, new Comparator<Map.Entry<List<Item>, Double>>() {
    @Override
    public int compare(
            Map.Entry<List<Item>, Double> o1,
            Map.Entry<List<Item>, Double> o2) {
        return Integer.compare(o2.getKey().size(), o1.getKey().size());
    }
});
If you need a TreeSet or a TreeMap, then using a Comparator is fine, but you have to define what happens when the list sizes are equal: you need another criterion to determine the ordering (such as comparing the elements a, b of both lists until a.compareTo(b) != 0), as sketched below.
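A hedged sketch of such a tie-breaking comparator, assuming Item implements Comparable<Item>:

// sizes first (largest lists first), then elements pairwise as the tie-breaker
Comparator<List<Item>> bySizeThenElements = (l1, l2) -> {
    int bySize = Integer.compare(l2.size(), l1.size());
    if (bySize != 0) {
        return bySize;
    }
    for (int i = 0; i < l1.size(); i++) { // same size, so indices are safe
        int c = l1.get(i).compareTo(l2.get(i));
        if (c != 0) {
            return c;
        }
    }
    return 0; // genuinely equal element-wise
};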
I haven't done Java in a while and I need some suggestions and ideas regarding data structures.
Currently I am using a TreeMap to map String values to Integer values. I now need to do some calculations: divide the Integer value of each entry by the size of the whole map and store the result for each entry. I was thinking about something like a Map<String, Integer, Double>, but is there a three-way generic data structure in Java?
My current solution for this is:
int treeSize = occurrence.size();
String[][] weight = new String[treeSize][2];
int counter = 0;
double score = 0;
for (Entry<String, Integer> entry : occurrence.entrySet()) {
    weight[counter][0] = entry.getKey();
    score = entry.getValue() / (double) treeSize; // cast to avoid integer division
    weight[counter][1] = Double.toString(score);
    counter++;
}
I would use another object to hold this data:
public class Data {
    private int value;
    private double score;
    ...
}
And then type the map as Map<String, Data>. After inserting all the values, you can iterate over them and update the score for each value in the map. For example:
double size = myMap.size();
for(Map.Entry<String, Data> entry : myMap.entrySet()) {
Data data = entry.getValue();
data.setScore(data.getValue() / size);
}
EDIT
Another thought just came to mind: instead of calculating the values after you have inserted them all, you could calculate each one as you insert it, which is more efficient. Of course, you can only do this if you know the total number of values beforehand.
An even better way is to perform the calculation only when you retrieve a value from the map. There are two advantages to this:
You don't need a separate object. Just abstract the access of the value from the map inside another function which returns the value associated with the key, divided by the size of the map.
Since you don't have a separate object to maintain the calculated value, you don't need to update it every time you add or delete a new value.
You could use a Map.Entry<Integer, Double> to hold the two values. (Ultimately, you'd use either AbstractMap.SimpleEntry or AbstractMap.SimpleImmutableEntry)
So your TreeMap would be TreeMap<String, Map.Entry<Integer, Double>>
However, unless you have a good reason to do otherwise, I'd strongly suggest doing the calculation on the fly. Recalculating every fraction every time anything is inserted or deleted is time-consuming and churns lots of small objects, so it's likely to be slower than just doing the calculation when asked. Recalculation will also cause threading issues if multiple threads access the TreeMap. Instead, use something like:
public synchronized double getFraction(String key) {
Integer value = theTreeMap.get(key);
if (value == null)
return 0.0; // or throw an exception if you prefer...
// note, since the Map has at least one entry, no need to check for div by zero
return value.doubleValue() / theTreeMap.size();
}