Java - TreeMap Solution - java

I haven't done Java in a while and I need some suggestions and ideas regarding data structures.
Currently I am using a TreeMap to map String values to Integer values. I now need to do some calculations: divide the Integer value of each map entry by the size of the whole map and store the result for each entry. I was thinking about using a Map,Integer> but is there a 3 way generics data structure in Java?
My current solution for this is this ..
int treeSize = occurrence.size();
String [][] weight = new String[treeSize][2];
int counter=0;
double score =0;
for(Entry<String, Integer> entry : occurrence.entrySet()) {
weight[counter][0]=entry.getKey();
score=entry.getValue()/(double) treeSize; // cast to double, otherwise this is integer division
weight[counter][1]= Double.toString(score);
counter++;
}

I would use another object to hold this data:
public class Data {
private int value;
private double score;
...
}
And then type the map as Map<String, Data>. After inserting all the values, you can iterate over the entries and update the score property for each value in the map. For example:
double size = myMap.size();
for(Map.Entry<String, Data> entry : myMap.entrySet()) {
Data data = entry.getValue();
data.setScore(data.getValue() / size);
}
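For reference, a minimal self-contained sketch of this two-pass idea. The accessor names (getValue, getScore, setScore), the constructor and the ScoreDemo class are assumptions made for illustration; they are not part of the elided Data class above.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical holder for the raw count and the score derived from it.
class Data {
    private final int value;
    private double score;

    Data(int value) { this.value = value; }

    int getValue() { return value; }
    double getScore() { return score; }
    void setScore(double score) { this.score = score; }
}

class ScoreDemo {
    // Second pass: divide every count by the number of entries in the map.
    static void updateScores(Map<String, Data> map) {
        double size = map.size(); // double, so the division below is not truncated
        for (Data data : map.values()) {
            data.setScore(data.getValue() / size);
        }
    }

    public static void main(String[] args) {
        Map<String, Data> occurrence = new TreeMap<>();
        occurrence.put("apple", new Data(3));
        occurrence.put("pear", new Data(1));
        updateScores(occurrence);
        for (Map.Entry<String, Data> e : occurrence.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue().getScore());
        }
    }
}
```

The scores go stale as soon as the map changes size, so updateScores has to run again after every insert or removal.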
EDIT
Another thought just came to mind. Instead of calculating the scores after you have inserted the values, you should probably calculate them as you insert each value; it's more efficient that way because it saves the second pass. Of course, you can only do this if you know the total number of values beforehand.
An even better way is to perform the calculation only when you retrieve a value from the map. There are two advantages to this:
You don't need a separate object. Just abstract the access of the value from the map inside another function which returns the value associated with the key, divided by the size of the map.
Since you don't have a separate object to maintain the calculated value, you don't need to update it every time you add or delete a new value.

You could use a Map.Entry<Integer, Double> to hold the two values. (Ultimately, you'd use either AbstractMap.SimpleEntry or AbstractMap.SimpleImmutableEntry)
So your TreeMap would be TreeMap<String, Map.Entry<Integer, Double>>
However, unless you have a good reason to do otherwise, I'd strongly suggest that you do the calculation on the fly. Recalculating every fraction every time anything is inserted or deleted is time consuming, and churns small little objects, so it's likely to be slower than just doing the calculation. Also, recalculation will cause threading issues if multiple threads access the TreeMap. Instead, something like
public synchronized double getFraction(String key) {
Integer value = theTreeMap.get(key);
if (value == null)
return 0.0; // or throw an exception if you prefer...
// note, since the Map has at least one entry, no need to check for div by zero
return value.doubleValue() / theTreeMap.size();
}

Related

Iterate over key-range of HashMap

Is it possible to iterate over a certain range of keys from a HashMap?
My HashMap contains key-value pairs where the key denotes a certain column in Excel (e.g. "BM" or "AT") and the value is the value in this cell.
For example, my table import is:
startH = {
BQ=2019-11-04,
BU=2019-12-02,
BZ=2020-01-06,
CD=2020-02-03,
CH=2020-03-02,
CM=2020-04-06
}
endH = {
BT=2019-11-25,
BY=2019-12-30,
CC=2020-01-27,
CG=2020-02-24,
CL=2020-03-30,
CP=2020-04-27
}
I need to iterate over those two hashmap using a key-range in order to extract the data in the correct order. For example from "BQ" to "BT".
Explanation
Is it possible to iterate over hashmap but using its index?
No.
A HashMap has no indices, and its internal layout gives you nothing to index into: it is a hash table whose buckets can, since Java 8, be converted to red-black trees when they grow large, and none of that is exposed for direct access. So no, not possible.
There is another fundamental flaw in this approach. HashMap does not maintain any order. Its iteration order is unspecified and may change, for example after inserting more elements or on a different Java version. But for this approach you would need insertion order. Fortunately LinkedHashMap does this. It still does not provide index-based access though.
Solutions
Generation
But you actually do not even want index-based access. You want to retrieve a certain key-range, for example from "BA" to "BM". A good approach that works with HashMap would be to generate your key-range and simply use Map#get to retrieve the data:
char row = 'B';
char columnStart = 'A';
char columnEnd = 'M';
for (char column = columnStart; column <= columnEnd; column++) {
String key = Character.toString(row) + column;
String data = map.get(key);
...
}
You might need to fine-tune it a bit if you need proper edge case handling, like wrapping around the alphabet (use 'A' + (column % alphabetSize)) and maybe it needs some char to int casting and vice versa for the additions, did not test it.
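One way to handle the wrap-around (for example going from "BZ" to "CA") is to convert the column labels to numbers and back. The sketch below is only an illustration of that idea; the class and method names (ColumnRange, toNumber, toLabel, range) are made up here, and it assumes labels consist of uppercase letters A-Z only.

```java
import java.util.ArrayList;
import java.util.List;

class ColumnRange {
    // "A" -> 1, "Z" -> 26, "AA" -> 27, ... (Excel-style column numbering)
    static int toNumber(String column) {
        int n = 0;
        for (char c : column.toCharArray()) {
            n = n * 26 + (c - 'A' + 1);
        }
        return n;
    }

    // Inverse of toNumber for numbers >= 1.
    static String toLabel(int number) {
        StringBuilder sb = new StringBuilder();
        while (number > 0) {
            int rem = (number - 1) % 26;
            sb.append((char) ('A' + rem));
            number = (number - 1) / 26;
        }
        return sb.reverse().toString();
    }

    // All column labels from start to end, both inclusive.
    static List<String> range(String start, String end) {
        List<String> keys = new ArrayList<>();
        int last = toNumber(end);
        for (int n = toNumber(start); n <= last; n++) {
            keys.add(toLabel(n));
        }
        return keys;
    }

    public static void main(String[] args) {
        System.out.println(range("BY", "CB")); // prints [BY, BZ, CA, CB]
    }
}
```

Each generated key can then be passed to Map#get; keys without a cell in the map simply return null.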
NavigableMap
There is actually a variant of map that offers pretty much what you want out of the box, though at a higher performance cost than a simple HashMap. The interface is called NavigableMap. The class TreeMap is a good implementation. The catch is that it requires an explicit order. The good thing though is that you actually want String's natural order, which is lexicographical.
So you can simply use it with your existing data and then use the method NavigableMap#subMap:
NavigableMap<String, String> map = new TreeMap<>(...);
String startKey = "BA";
String endKey = "BM";
Map<String, String> subMap = map.subMap(startKey, true, endKey, true); // both bounds inclusive; subMap(startKey, endKey) would exclude endKey
for (Entry<String, String> entry : subMap.entrySet()) {
...
}
If you have to do this kind of request more than once, this will definitely pay off, and it is the perfect data structure for this use case.
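A small runnable sketch with a few of the cells from the question. The class and method names (SubMapDemo, keysBetween) are made up for the example. It uses the four-argument subMap overload so that both ends of the range are included.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

class SubMapDemo {
    // Keys in [from, to], both inclusive, in the map's sort order.
    static List<String> keysBetween(NavigableMap<String, String> map, String from, String to) {
        return new ArrayList<>(map.subMap(from, true, to, true).keySet());
    }

    public static void main(String[] args) {
        NavigableMap<String, String> cells = new TreeMap<>();
        cells.put("BQ", "2019-11-04");
        cells.put("BT", "2019-11-25");
        cells.put("BU", "2019-12-02");
        cells.put("BY", "2019-12-30");
        System.out.println(keysBetween(cells, "BQ", "BT")); // prints [BQ, BT]
    }
}
```

One caveat: lexicographical order only matches Excel column order while all labels have the same length ("Z" sorts after "AA" as a String). If the sheet mixes one- and two-letter columns, the TreeMap would need a comparator that compares by length first.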
Linked iteration
As explained before, it is also possible (although not as efficient) to instead have a LinkedHashMap (to maintain insertion order) and then simply iterate over the key range. This has some major drawbacks though, for example it first needs to locate the start of the range by fully iterating to there. And it relies on the fact that you inserted them correctly.
LinkedHashMap<String, String> map = ...
String startKey = "BA";
String endKey = "BM";
boolean isInRange = false;
for (Entry<String, String> entry : map.entrySet()) {
String key = entry.getKey();
if (!isInRange) {
if (key.equals(startKey)) {
isInRange = true;
} else {
continue;
}
}
...
if (key.equals(endKey)) {
break;
}
}
// rangeLower and rangeUpper can be arguments
int i = 0;
for (Object mapKey : map.keySet()) {
    if (i > rangeUpper) {
        break; // past the end of the range
    }
    if (i >= rangeLower) {
        // Do something with mapKey
    }
    i++; // count every key, whether or not it was in range
}
The above code iterates by getting keyset and explicitly maintaining index and incrementing it in each loop. Another option is to use LinkedHashMap, which maintains a doubly linked list for maintaining insertion order.
I don't believe you can. The algorithm you propose assumes that the keys of a HashMap are ordered and they are not. Order of keys is not guaranteed, only the associations themselves are guaranteed.
You might be able to change the structure of your data to something like this:
ranges = {
BQ=BT,
BU=BY,
....
}
Then the iteration over the HashMap keys (start cells) would easily find the matching end cells.

Mapping large set of Keys to a small set of Values

Suppose you had 1,000,000 keys (ints) that mapped to 10,000 values (ints). What would be the most efficient way (in lookup performance and memory usage) to implement this?
Assume the values are random. i.e there is not a range of keys that map to a single value.
The easiest approach I can think of is a HashMap but wonder if you can do better by grouping the keys that match a single value.
Map<Integer,Integer> largeMap = Maps.newHashMap();
largeMap.put(1,4);
largeMap.put(2,232);
...
largeMap.put(1000000, 4);
If the set of keys is known to be in a given range (as 1-1000000 shown in your example), then the simplest is to use an array. The problem is that you need to look up values by key, and that limits you to either a map or an array.
The following uses a map of values to values simply to avoid duplicate instances of equal value objects (there may be a better way to do this, but I can't think of any). The array simply serves to look up values by index:
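private static void addToArray(Integer[] array, int key,
Integer value, Map<Integer, Integer> map) {
// computeIfAbsent returns the canonical instance; putIfAbsent would return null on first insert
array[key] = map.computeIfAbsent(value, v -> v);
}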
And then values can be added using:
Map<Integer, Integer> keys = new HashMap<>();
Integer[] largeArray = new Integer[1000001];
addToArray(largeArray, 1, 4, keys);
addToArray(largeArray, 2, 232, keys);
...
addToArray(largeArray, 1000000, 4, keys);
If new Integer[1000001] seems like a hack, you can still maintain a sort of "index offset" to indicate the actual key associated with index 0 in the array.
And I'd put that in a class:
class LargeMap {
private Map<Integer, Integer> keys = new HashMap<>();
private Integer[] keyArray;
public LargeMap(int size) {
this.keyArray = new Integer[size];
}
public void put(int key, Integer value) {
this.keyArray[key] = this.keys.computeIfAbsent(value, v -> v); // putIfAbsent would return null on first insert
}
public Integer get(int key) {
return this.keyArray[key];
}
}
And:
public static void main(String[] args) {
LargeMap myMap = new LargeMap(1000_001); // indices 0..1000000, so key 1000000 fits
myMap.put(1, 4);
myMap.put(2, 232);
myMap.put(1000_000, 4);
}
I'm not sure if you can optimize much here by grouping anything. A 'reverse' mapping might give you slightly better performance if you want to do lookup by values instead of by key (i.e. get all keys with a certain value) but since you didn't explicitly say that you want to do this I wouldn't go with that approach.
For optimization you can use an int array instead of a map, if the keys are in a fixed range. Array lookup is O(1) and primitive arrays use less memory than maps.
int offset = -1;
int[] values = new int[1000000];
values[1 + offset] = 4;
values[2 + offset] = 232;
// ...
values[1000000 + offset] = 4;
If the range doesn't start at 1 you can adapt the offset.
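The offset bookkeeping can be hidden in a small class. This is only a sketch; OffsetIntMap and its method names are invented for the example, and it assumes the key range is known up front.

```java
// Hypothetical wrapper around an int[] for keys in a fixed range [firstKey, lastKey].
class OffsetIntMap {
    private final int firstKey;
    private final int[] values;

    OffsetIntMap(int firstKey, int lastKey) {
        this.firstKey = firstKey;
        this.values = new int[lastKey - firstKey + 1];
    }

    void put(int key, int value) {
        values[key - firstKey] = value; // throws ArrayIndexOutOfBoundsException outside the range
    }

    int get(int key) {
        return values[key - firstKey]; // keys never put read as 0
    }

    public static void main(String[] args) {
        OffsetIntMap map = new OffsetIntMap(1, 1_000_000);
        map.put(1, 4);
        map.put(2, 232);
        map.put(1_000_000, 4);
        System.out.println(map.get(2)); // prints 232
    }
}
```

Because an int[] cannot hold null, a missing key is indistinguishable from a stored 0; if that matters, reserve a sentinel value or keep a separate BitSet of present keys.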
There are also libraries like trove4j which provide better performance and more efficient storage for this kind of data than the standard collections, though I don't know how they compare to the simple array approach.
HashMap is the worst solution. The hash of an integer is itself. I would say a TreeMap if you want an easily available solution. You could write your own specialized tree map, for example splitting the keys into two shorts and having a TreeMap within a TreeMap.

Remove a specific value from a key (HashMaps)

I have the following HashMap (HashMap<String, String[]>) and was wondering, if there is a method to remove a specific String from the array of a specific key.
I've found only methods to remove one key basing on a value, but for example, I have:
("key1", new String[]{"A", "B", "C"})
How can I remove only B?
Here's a plain Java solution:
map.computeIfPresent("key1", (k, v) -> Arrays.stream(v)
.filter(s -> !s.equals("B")).toArray(String[]::new));
You would get the values for the specific key and remove the given value from it, then put it back into the map.
// needs java.util.ArrayList, java.util.Arrays and java.util.Map
public <K> void removeValueFromKey(final Map<K, K[]> map, final K key, final K value) {
    K[] values = map.get(key);
    if (values == null) {
        return; // key not present, nothing to remove
    }
    ArrayList<K> valuesAsList = new ArrayList<K>(values.length);
    for (K currentValue : values) {
        if (!currentValue.equals(value)) {
            valuesAsList.add(currentValue);
        }
    }
    // "new K[n]" is not legal Java; copy an empty array of the same runtime type instead
    K[] newValues = valuesAsList.toArray(Arrays.copyOf(values, 0));
    map.put(key, newValues);
}
Be aware that the runtime of course is linear to the size of the given array. There is no faster way, because you need to iterate over each element of the array to find all values that are equal to the given value.
However, you could do a faster implementation with other data structures, if that is practicable. For example, sets would be better than arrays, as would any other data structure whose contains and remove run faster than O(n).
The same holds for space complexity; you have a peak where you need to hold both arrays in the memory. This is because the size of an array cannot be changed; the method will construct a new array. Thus you will have two arrays in the memory, O(2n).
A Collection<String> may be a better solution, depending on how often you'll call the method, compared to how many elements a map holds.
Another thing is that you can speed up the process by guessing a good initial capacity for the ArrayList.
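If changing the value type is an option, a minimal sketch of the set-based variant could look like this. MultiValueDemo and removeValue are names invented for the example; a LinkedHashSet is used only so the remaining values keep their insertion order.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

class MultiValueDemo {
    // Removes value from the set stored under key; returns true if something was removed.
    static <K, V> boolean removeValue(Map<K, Set<V>> map, K key, V value) {
        Set<V> values = map.get(key);
        return values != null && values.remove(value);
    }

    public static void main(String[] args) {
        Map<String, Set<String>> map = new HashMap<>();
        map.put("key1", new LinkedHashSet<>(Arrays.asList("A", "B", "C")));
        removeValue(map, "key1", "B");
        System.out.println(map.get("key1")); // prints [A, C]
    }
}
```

With a hash-based set the removal takes expected constant time and no second array is allocated, at the cost of not allowing duplicate values per key.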

Map Entry conversion with Map Tree

So what I have been trying to do is take a TreeMap I previously had, pass it to this method, convert it into an entry set and go through it in a Map.Entry loop. What I wish to do is invert my previous TreeMap into the opposite (flipped) TreeMap.
When I run my code, it gives me a Comparable error. Does this mean I have to implement the Comparable interface? I converted the ArrayList into an Integer so I thought Comparable would support it. Or is it just something wrong with my code?
Error: Exception in thread "main" java.lang.ClassCastException: java.util.ArrayList cannot be cast to java.lang.Comparable
Overview: Originally, my intended purpose for the program was to make a TreeMap that read from a text document and specifically found all the words and the index/rows of where the words were located. Now I wish to make a "top ten" list that contains the most used words. I wanted to "flip" my TreeMap so that the integer values would be what would be put in order and the string would follow
public static void getTopTenWords(TreeMap<String, ArrayList<Integer>> map) {
Set<Map.Entry<String, ArrayList<Integer>>> set = map.entrySet();
TreeMap<Integer, String> temp = new TreeMap<Integer, String>();
int count = 1;
for(Map.Entry<String, ArrayList<Integer>> entry : set){
if(temp.containsKey(entry.getValue())) {
Integer val = entry.getValue().get(count);
val++;
temp.put(val, entry.getKey());
}
else {
temp.put(entry.getValue().get(count), entry.getKey());
}
count++;
}
}
Now I wish to make a "top ten" list that contains the most used words.
I wanted to "flip" my treemap so that the integer values would be what
would be put in order and the string would follow
Note that a Map contains only unique keys. So, if you use the count as the key, two words with the same count collide and the later put overwrites the earlier one. Creating the key with new Integer(count) does not avoid this: a TreeMap compares keys with compareTo (a HashMap with equals and hashCode), never by object identity, so two Integer objects holding 2 are the same key whether or not they come from the Integer cache for the range [-128, 127].
If you need to keep every word, map each count to a collection of words instead, e.g. TreeMap<Integer, List<String>>.
Secondly, in your code: -
if (temp.containsKey(entry.getValue()))
using the above if statement, you are looking up an ArrayList in a map whose keys are Integers. The TreeMap tries to compare the ArrayList with its keys by casting it to Comparable, and that is exactly the ClassCastException you see at runtime. Also, your original Map contains just the locations of each word found in the text file. So what you need to do is get the size of the ArrayList for each word and make that the key.
You would need to modify your code a little bit.
public static void getTopTenWords(TreeMap<String, ArrayList<Integer>> map) {
    Set<Map.Entry<String, ArrayList<Integer>>> set = map.entrySet();
    TreeMap<Integer, String> temp = new TreeMap<Integer, String>();
    for(Map.Entry<String, ArrayList<Integer>> entry : set) {
        int size = entry.getValue().size();
        String word = entry.getKey();
        temp.put(size, word); // overwrites any earlier word with the same count
    }
}
So, you can see that I just used the size of the values in your entry set and put it as a key in your TreeMap. Autoboxing takes care of the Integer key; new Integer(size) is unnecessary (and deprecated since Java 9).
Also note that your TreeMap sorts its Integer keys in ascending order. Your most frequent words would be at the end; temp.descendingMap() gives you the reverse view.
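Since equal counts are the same key in a TreeMap, one way to keep every word is to group words by count. The following is only a sketch of that variant, not the code from the answer above; TopWords and topWords are names invented for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TopWords {
    // Returns up to limit words, most frequent first; ties keep the input map's order.
    static List<String> topWords(Map<String, ArrayList<Integer>> index, int limit) {
        // count -> all words with that count, highest count first
        TreeMap<Integer, List<String>> byCount = new TreeMap<>(Comparator.reverseOrder());
        for (Map.Entry<String, ArrayList<Integer>> entry : index.entrySet()) {
            int count = entry.getValue().size();
            byCount.computeIfAbsent(count, k -> new ArrayList<>()).add(entry.getKey());
        }
        List<String> top = new ArrayList<>();
        for (List<String> words : byCount.values()) {
            for (String word : words) {
                if (top.size() == limit) {
                    return top;
                }
                top.add(word);
            }
        }
        return top;
    }
}
```

Calling topWords(map, 10) on the word index from the question would then give the "top ten" list directly.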

How to select a random key from a HashMap in Java?

I'm working with a large ArrayList<HashMap<A,B>>, and I would repeatedly need to select a random key from a random HashMap (and do some stuff with it). Selecting the random HashMap is trivial, but how should I select a random key from within this HashMap?
Speed is important (as I need to do this 10000 times and the hashmaps are large), so just selecting a random number k in [0,9999] and then doing .next() on the iterator k times, is really not an option. Similarly, converting the HashMap to an array or ArrayList on every random pick is really not an option. Please, read this before replying.
Technically I feel that this should be possible, since the HashMap stores its keys in an Entry[] internally, and selecting at random from an array is easy, but I can't figure out how to access this Entry[]. So any ideas to access the internal Entry[] are more than welcome. Other solutions (as long as they don't consume linear time in the hashmap size) are also welcome of course.
Note: heuristics are fine, so if there's a method that excludes 1% of the elements (e.g. because of multi-filled buckets) that's no problem at all.
from top of my head
List<A> keysAsArray = new ArrayList<A>(map.keySet());
Random r = new Random();
then just
map.get(keysAsArray.get(r.nextInt(keysAsArray.size())));
I managed to find a solution without performance loss. I will post it here since it may help other people -- and potentially answer several open questions on this topic (I'll search for these later).
What you need is a second custom Set-like data structure to store the keys -- not a list as some suggested here. List-like data structures are too expensive to remove items from. The operations needed are adding/removing elements in constant time (to keep it up-to-date with the HashMap) and a procedure to select the random element. The following class MySet does exactly this
class MySet<A> {
ArrayList<A> contents = new ArrayList<A>();
HashMap<A,Integer> indices = new HashMap<A,Integer>();
Random R = new Random();
//selects random element in constant time
A randomKey() {
return contents.get(R.nextInt(contents.size()));
}
//adds new element in constant time
void add(A a) {
indices.put(a,contents.size());
contents.add(a);
}
//removes element in constant time
void remove(A a) {
int index = indices.get(a);
contents.set(index,contents.get(contents.size()-1));
indices.put(contents.get(index),index);
contents.remove((int)(contents.size()-1));
indices.remove(a);
}
}
You need access to the underlying entry table.
// defined statically
Field table = HashMap.class.getDeclaredField("table");
table.setAccessible(true);
Random rand = new Random();
public Entry randomEntry(HashMap map) {
Entry[] entries = (Entry[]) table.get(map);
int start = rand.nextInt(entries.length);
for(int i=0;i<entries.length;i++) {
int idx = (start + i) % entries.length;
Entry entry = entries[idx];
if (entry != null) return entry;
}
return null;
}
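This still has to traverse the entries to find one which is there so the worst case is O(n) but the typical behaviour is O(1). Note that it depends on a private field of one particular HashMap implementation; on recent Java versions (16 and later) setAccessible fails for java.util internals unless the JVM is started with --add-opens java.base/java.util=ALL-UNNAMED. It also only ever returns the first entry of a bucket, which fits the heuristic the question allows.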
Sounds like you should consider either an ancillary List of keys or a real object, not a Map, to store in your list.
As @Alberto Di Gioacchino pointed out, there is a bug in the accepted solution with the removal operation. This is how I fixed it.
class MySet<A> {
ArrayList<A> contents = new ArrayList<A>();
HashMap<A,Integer> indices = new HashMap<A,Integer>();
Random R = new Random();
//selects random element in constant time
A randomKey() {
return contents.get(R.nextInt(contents.size()));
}
//adds new element in constant time
void add(A item) {
indices.put(item,contents.size());
contents.add(item);
}
//removes element in constant time
void remove(A item) {
int index = indices.get(item);
contents.set(index,contents.get(contents.size()-1));
indices.put(contents.get(index),index);
contents.remove(contents.size()-1);
indices.remove(item);
}
}
I'm assuming you are using HashMap as you need to look something up at a later date?
If not the case, then just change your HashMap to an Array/ArrayList.
If this is the case, why not store your objects in a Map AND an ArrayList so you can look up randomly or by key.
Alternatively, could you use a TreeMap instead of HashMap? I don't know what type your key is, but you could use TreeMap.floorKey() in conjunction with some key randomizer.
After spending some time, I came to the conclusion that you need to create a model which is backed by a List<Map<A, B>> and a List<A> to maintain your keys. You need to keep access to your List<Map<A, B>> and List<A> private and just provide the operations/methods to the caller. In this way, you will have full control over the implementation, and the actual objects will be safer from external changes.
Btw, your questions lead me to,
Why does the java.util.Set<V> interface not provide a get(Object o) method?, and
Bimap: I was trying to be clever but, of course, its values() method also returns Set.
This example, IndexedSet, may give you an idea about how-to.
[edited]
This class, SetUniqueList, might help you if you decide to create your own model. It explicitly states that it wraps the list, not copies. So, I think, we can do something like,
List<A> list = new ArrayList(map.keySet());
SetUniqueList unikList = new SetUniqueList(list, map.keySet());
// Now unikList should reflect all the changes to the map keys
...
// Then you can do
unikList.get(i);
Note: I didn't try this myself. Will do that later (rushing to home).
Since Java 8, there is an O(log(N)) approach with O(log(N)) additional memory: create a Spliterator via map.entrySet().spliterator(), make log(map.size()) trySplit() calls and choose either the first or the second half randomly. When there are say less than 10 elements left in a Spliterator, dump them into a list and make a random pick.
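A rough sketch of how that could look. SpliteratorPick and randomKey are names invented here, and the threshold of 10 is taken from the description above. The pick is a heuristic, not uniform: HashMap splits by bucket ranges, so the two halves can hold different numbers of entries, and a half can even be empty, which is why the sketch retries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Spliterator;

class SpliteratorPick {
    static <K, V> K randomKey(Map<K, V> map, Random random) {
        if (map.isEmpty()) {
            throw new IllegalArgumentException("map is empty");
        }
        while (true) {
            Spliterator<Map.Entry<K, V>> part = map.entrySet().spliterator();
            // Halve the range until only a handful of elements is (estimated to be) left.
            while (part.estimateSize() > 10) {
                Spliterator<Map.Entry<K, V>> prefix = part.trySplit();
                if (prefix == null) {
                    break; // cannot be split any further
                }
                if (random.nextBoolean()) {
                    part = prefix; // continue with the first half instead of the second
                }
            }
            List<Map.Entry<K, V>> candidates = new ArrayList<>();
            part.forEachRemaining(candidates::add);
            if (!candidates.isEmpty()) {
                return candidates.get(random.nextInt(candidates.size())).getKey();
            }
            // The chosen range happened to contain no entries; try again.
        }
    }
}
```

The map must not be modified while a pick is in progress, since the spliterator is fail-fast like the regular iterators.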
If you absolutely need to access the Entry array in HashMap, you can use reflection. But then your program will be dependent on that concrete implementation of HashMap.
As proposed, you can keep a separate list of keys for each map. You would not keep deep copies of the keys, so the actual memory denormalisation wouldn't be that big.
Third approach is to implement your own Map implementation, the one that keeps keys in a list instead of a set.
How about wrapping HashMap in another implementation of Map? The other map maintains a List, and on put() it does:
if (inner.put(key, value) == null) listOfKeys.add(key);
(I assume that nulls for values aren't permitted, if they are use containsKey, but that's slower)
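A minimal sketch of such a wrapper, using containsKey so that null values are handled too. RandomKeyMap and its method names are invented for the example, and it deliberately leaves out removal: removing a key from the list in constant time needs an extra key-to-index map, as in the MySet class shown earlier.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

class RandomKeyMap<K, V> {
    private final Map<K, V> inner = new HashMap<>();
    private final List<K> keys = new ArrayList<>();
    private final Random random = new Random();

    V put(K key, V value) {
        if (!inner.containsKey(key)) {
            keys.add(key); // remember each key exactly once
        }
        return inner.put(key, value);
    }

    V get(K key) {
        return inner.get(key);
    }

    int size() {
        return inner.size();
    }

    // Constant-time random pick; throws IllegalArgumentException if the map is empty.
    K randomKey() {
        return keys.get(random.nextInt(keys.size()));
    }
}
```

The list only stores references to the key objects, so the extra memory is one reference per key.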
