I'm trying to write a data structure that can set all of its values in O(1).
My code:
public class myData {
boolean setAllStatus = false;
HashMap<Integer, Integer> hasMap = new HashMap<>();
int setAllValue = 0;
int count = 0;
public void set(int key, int value) {
hasMap.put(key, value);
}
public int get(int key) {
if (setAllStatus) {
if (hasMap.get(key) != null) {
if (count == hasMap.size()) {
return setAllValue;
} else {
// do something
}
} else {
throw new NullPointerException();
}
} else {
if (hasMap.get(key) == null) {
throw new NullPointerException();
} else {
return hasMap.get(key);
}
}
}
public void setAll(int value) {
setAllStatus = true;
setAllValue = value;
count = hasMap.size();
}
public static void main(String[] args) {
myData m = new myData();
m.set(1, 4);
m.set(4, 5);
System.out.println(m.get(4)); // 5
m.setAll(6);
System.out.println(m.get(4)); // 6
m.set(8, 7);
System.out.println(m.get(8)); // 7
}
}
When I set some values first and then set all the values to a specific value, it works, but I'm not sure how to handle a new value that is put after setAll() has been called.
What kind of solution can I use to make it work?
If you want to enhance your knowledge of data structures, I suggest you implement your own version of a hash table from the ground up (define an array of buckets, learn how to store elements in a bucket, how to resolve collisions and so on) instead of decorating the HashMap.
Your current code is very contrived:
By its nature, get() should not do anything apart from retrieving a value associated with a key because that's the only responsibility of this method (have a look at the implementation of get() in the HashMap class). Get familiar with the Single responsibility principle.
The idea of throwing an exception when the given key isn't present in the map is strange. And NullPointerException is not the right type of exception to describe this case, NoSuchElementException would be more intuitive.
You might also be interested in learning What does it mean to "program to an interface"?
The main point is that you've picked the wrong starting point (see the advice at the very beginning). Learn more about data structures, starting from the simplest ones such as a dynamic array, try to implement them from scratch, and gradually learn more about class design and language features.
Time complexity
Regarding the time complexity: since your class decorates a HashMap, the methods set() and get() run in constant time O(1) on average.
If you need to change all the values in a HashMap, that can only be done in linear time O(n). Assuming that all existing values are represented by objects that are distinct from one another, it's inherently impossible to perform this operation in constant time, because the change has to be made for every node in each bucket.
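To make the linear cost concrete, here is a minimal sketch that overwrites every value with Map.replaceAll. The class and method names (ReplaceAllDemo, setAllLinear) are only illustrative. The call visits each entry once, so its cost grows with the size of the map.

```java
import java.util.HashMap;
import java.util.Map;

class ReplaceAllDemo {
    // Overwrites every value in the map; each entry is visited once, i.e. O(n).
    static <K, V> void setAllLinear(Map<K, V> map, V newValue) {
        map.replaceAll((key, oldValue) -> newValue);
    }

    public static void main(String[] args) {
        Map<Integer, Integer> map = new HashMap<>();
        map.put(1, 4);
        map.put(4, 5);
        setAllLinear(map, 6);
        System.out.println(map.get(1) + " " + map.get(4));
    }
}
```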
The only situation when all values can be set to a new value in a constant time is the following very contrived example where each and every key would be associated with the same object, and we need to maintain a reference to this object (i.e. it would always retrieve the same value for every key that is present, which doesn't seem to be particularly useful):
public class SingleValueMap<K, V> {
private Map<K, V> map = new HashMap<>();
private V commonValue;
public void setAll(V newValue) {
this.commonValue = newValue;
}
public void add(K key) {
map.put(key, commonValue);
}
public void add(K key, V newValue) {
setAll(newValue);
map.put(key, commonValue);
}
public V get(K key) {
if (!map.containsKey(key)) throw new NoSuchElementException();
return commonValue;
}
}
And since we are no longer using the actual HashMap's functionality for storing the values, HashMap can be replaced with HashSet:
public class SingleValueMap<K, V> {
private Set<K> set = new HashSet<>();
private V commonValue;
public void setAll(V newValue) {
this.commonValue = newValue;
}
public void add(K key) {
set.add(key);
}
public void add(K key, V newValue) {
setAll(newValue);
set.add(key);
}
public V get(K key) {
if (!set.contains(key)) throw new NoSuchElementException();
return commonValue;
}
}
If I'm understanding the problem here correctly, every time setAll is called, we effectively forget about all the values of the HashMap and track only its keys basically as if it were a HashSet, where get uses the value passed into setAll. Additionally, any new set calls should still track both the key and the value until setAll is called some time later.
In other words, you need to track the set of keys before setAll, and the set of key-and-values after setAll separately in order to be able to distinguish them.
See if you can find a way, either amortized or through constant-time operations, to keep track of which keys are and are not associated with the latest setAll operation.
Given that this looks like a homework problem, I am hesitating to help further (as per these SO guidelines), but if this is not homework, let me know and I can delve further into this topic.
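For reference, one possible sketch of that bookkeeping (not necessarily what the answer above had in mind) stamps each stored value with a version counter that setAll increments. The names SetAllMap, Stamped and version are assumptions made for this illustration. A value whose stamp is older than the current version is treated as overwritten by the latest setAll.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

class SetAllMap {
    // A value together with the setAll version that was current when it was written.
    private static final class Stamped {
        final int value;
        final long version;

        Stamped(int value, long version) {
            this.value = value;
            this.version = version;
        }
    }

    private final Map<Integer, Stamped> map = new HashMap<>();
    private long version = 0;    // incremented on every setAll call
    private int setAllValue = 0; // value passed to the latest setAll

    public void set(int key, int value) {
        map.put(key, new Stamped(value, version));
    }

    // O(1): only the counter and the shared value change, no entry is touched.
    public void setAll(int value) {
        setAllValue = value;
        version++;
    }

    public int get(int key) {
        Stamped s = map.get(key);
        if (s == null) {
            throw new NoSuchElementException("no such key: " + key);
        }
        // Entries written before the latest setAll report the setAll value.
        return s.version < version ? setAllValue : s.value;
    }
}
```

With this sketch, the sequence from the question (set(1, 4), set(4, 5), setAll(6), set(8, 7)) yields 6 for keys 1 and 4 and 7 for key 8.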
Related
I am trying to implement a hash cons in Java, comparable to what String.intern does for strings. I.e., I want a class to store all distinct values of a data type T in a set and provide a T intern(T t) method that checks whether t is already in the set. If so, the instance in the set is returned, otherwise t is added to the set and returned. The reason is that the resulting values can be compared using reference equality, since two equal values returned from intern will for sure also be the same instance.
Of course, the most obvious candidate data structure for a hash cons is java.util.HashSet<T>. However, it seems that its interface is flawed and does not allow efficient insertion, because there is no method to retrieve an element that is already in the set or insert one if it is not in there.
An algorithm using HashSet would look like this:
class HashCons<T>{
HashSet<T> set = new HashSet<>();
public T intern(T t){
if(set.contains(t)) {
return ???; // <----- PROBLEM
} else {
set.add(t); // <--- Inefficient, second hash lookup
return t;
}
}
}
As you see, the problem is twofold:
This solution would be inefficient since I would access the hash table twice, once for contains and once for add. But okay, this may not be too big a performance hit, since the correct bucket will be in the cache after the contains, so add will not trigger a cache miss and thus be quite fast.
I cannot retrieve an element already in the set (see line flagged PROBLEM). There is just no method to retrieve the element in the set. So it is just not possible to implement this.
Am I missing something here? Or is it really impossible to build a usual hash cons with java.util.HashSet?
I don't think it's possible using HashSet. You could use some kind of Map instead and use your value both as key and as value. The java.util.concurrent.ConcurrentMap also happens to possess the quite convenient method
putIfAbsent(K key, V value)
that returns the value if it is already existent. However, I don't know about the performance of this method (compared to checking "manually" on non-concurrent implementations of Map).
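As a minimal sketch of the putIfAbsent idea, assuming a ConcurrentHashMap as the backing map (the class name ConcurrentHashCons is made up for this example): putIfAbsent returns the previously stored value, or null if the key was absent, so a single call is enough to either insert or retrieve the canonical instance.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class ConcurrentHashCons<T> {
    // Each value is stored as both key and value.
    private final ConcurrentMap<T, T> map = new ConcurrentHashMap<>();

    // Returns the canonical instance equal to t, inserting t if none exists yet.
    T intern(T t) {
        T existing = map.putIfAbsent(t, t);
        return existing != null ? existing : t;
    }
}
```

Note that ConcurrentHashMap rejects null keys, so this sketch assumes t is never null.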
Here is how you would do it using a HashMap:
class HashCons<T>{
Map<T,T> map = new HashMap<T,T>();
public T intern(T t){
if (!map.containsKey(t))
map.put(t,t);
return map.get(t);
}
}
I think the reason why it is not possible with HashSet is quite simple: to the set, if contains(t) is fulfilled, it means that the given t also equals one of the t' in the set. There is no reason to be able to return it (as you already have it).
Well, HashSet is implemented as a wrapper around HashMap in OpenJDK, so you won't save memory compared to the solution suggested by aRestless.
10-min sketch
class HashCons<T> {
T[] table;
int size;
int sizeLimit;
HashCons(int expectedSize) {
init(Math.max(Integer.highestOneBit(expectedSize * 2) * 2, 16));
}
private void init(int capacity) {
table = (T[]) new Object[capacity];
size = 0;
sizeLimit = (int) (capacity * 2L / 3);
}
T cons(@Nonnull T key) {
int mask = table.length - 1;
int i = key.hashCode() & mask;
do {
if (table[i] == null) break;
if (key.equals(table[i])) return table[i];
i = (i + 1) & mask;
} while (true);
table[i] = key;
if (++size > sizeLimit) rehash();
return key;
}
private void rehash() {
T[] table = this.table;
if (table.length == (1 << 30))
throw new IllegalStateException("HashCons is full");
init(table.length << 1);
for (T key : table) {
if (key != null) cons(key);
}
}
}
The error looks like this
Exception in thread "Thread-1" java.lang.NullPointerException
at java.util.LinkedHashMap$Entry.remove(LinkedHashMap.java:332)
at java.util.LinkedHashMap$Entry.recordAccess(LinkedHashMap.java:356)
at java.util.LinkedHashMap.get(LinkedHashMap.java:304)
at Server.getLastFinishedCommands(Server.java:9086)
at Server.processPacket(Server.java:484)
at PacketWorker.run(PacketWorker.java:34)
at java.lang.Thread.run(Thread.java:744)
Inside getLastFinishedCommands I use
public List<CCommand> getLastFinishedCommands(UserProfile player) {
List<CCommand> returnList = new ArrayList<CCommand>();
if(!finishedCommands.containsKey(player.myWebsitecmd-1)) {
getSavedState(player);
return null;
}
try { //<-- added this try/catch so it doesn't happen again.
//Get commands.
CCommand cmd;
long i;
long startIndex = player.myWebsitecmd;
long endIndex = startIndex+LIMIT_COMMANDS;
for(i = startIndex; i <= endIndex; i++) {
cmd = finishedCommands.get(i); //<-- this is line 9086
if(cmd == null) {
return returnList;
}
returnList.add(cmd);
}
} catch(Exception e) {} //<-- added this try/catch so it doesn't happen again.
return returnList;
}
I wanted to make a Map that auto removes old entries so I used this snippet
public static <K, V> Map<K, V> createLRUMap(final int maxEntries) {
return new LinkedHashMap<K, V>(maxEntries*3/2, 0.7f, true) {
@Override
protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
return size() > maxEntries;
}
};
}
Used it like this
public static final int MAX_COMMANDS_QUEUE = 5000;
public Map<Long, CCommand> finishedCommands = createLRUMap(MAX_COMMANDS_QUEUE);
Obviously it's some kind of ConcurrentModificationException which happens when the map is used from multiple threads, but why does it crash internally? Does anyone know how I can use this with something like a ConcurrentHashMap? I'm trying to fix this without resorting to just putting a try/catch around the whole getLastFinishedCommands function.
I want a Map that clears itself of old junk but still holds at least 5000 key/value entries.
Based on the stack trace, I assume that the code tries to unlink an entry that has already been removed by another thread. This makes it throw an NPE when it accesses the fields of a null reference. You should probably try synchronizing the collection.
From the documentation of LinkedHashMap
Note that this implementation is not synchronized. If multiple threads access a linked hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. This is typically accomplished by synchronizing on some object that naturally encapsulates the map. If no such object exists, the map should be "wrapped" using the Collections.synchronizedMap method. This is best done at creation time, to prevent accidental unsynchronized access to the map:
Map m = Collections.synchronizedMap(new LinkedHashMap(...));
You said that multiple threads are accessing this map. This could indeed cause the NPE in the remove operation of a LinkedHashMap.Entry instance. This is the implementation of this method:
private void remove() {
before.after = after;
after.before = before;
}
Here before and after refer to the linked predecessor and successor of the current entry. If another thread already changed the linking between the entries, this could of course result in an unexpected behavior, such as the NPE.
The solution is - you guessed correctly - to wrap your produced map in a synchronized map. Such as:
public static <K, V> Map<K, V> createLRUMap(final int maxEntries) {
Map<K,V> result = new LinkedHashMap<K, V>(maxEntries*3/2, 0.7f, true) {
@Override
protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
return size() > maxEntries;
}
};
return Collections.synchronizedMap(result);
}
This synchronized wrapper will indeed synchronize all calls to the underlying map, so only one single thread is allowed to go through each method (such as get, put, contains, size, and so on).
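One caveat worth sketching: the wrapper makes each individual call atomic, but a sequence of calls (such as the loop of get calls in getLastFinishedCommands) can still interleave with writers. A common approach is to hold the map's own lock for the whole sequence, since Collections.synchronizedMap synchronizes on the returned map itself. The helper readRange and the class name SynchronizedLruDemo below are hypothetical and only loosely modeled on the question's code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SynchronizedLruDemo {
    static <K, V> Map<K, V> createLRUMap(final int maxEntries) {
        Map<K, V> result = new LinkedHashMap<K, V>(maxEntries * 3 / 2, 0.7f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
        return Collections.synchronizedMap(result);
    }

    // Reads consecutive keys start..end, stopping at the first missing key.
    // The synchronized block keeps the whole loop atomic with respect to writers.
    static <V> List<V> readRange(Map<Long, V> map, long start, long end) {
        List<V> out = new ArrayList<>();
        synchronized (map) {
            for (long i = start; i <= end; i++) {
                V value = map.get(i);
                if (value == null) {
                    break;
                }
                out.add(value);
            }
        }
        return out;
    }
}
```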
I have a file of Integer[]s that is too large to put in memory. I would like to search for all arrays with a last member of x and use them in other code. Is there a way to use Guava's multimap to do this, where x is the key and stored in memory and the Integer[] is the value and that is stored on disk? In this scenario, the keys are not unique, but key-value pairs are unique. Reading of this multimap (assuming that it's possible) will be concurrent. I'm also open to suggestions of other ways to approach this.
Thanks
You could create a class representing an array on disk (based on its index in the file of arrays), let's call it FileBackedIntArray, and put instances of that as the values of a HashMultimap<Integer, FileBackedIntArray>:
public class FileBackedIntArray {
// Index of the array in the file of arrays
private final int index;
private final int lastElement;
public FileBackedIntArray(int index, int lastElement) {
this.index = index;
this.lastElement = lastElement;
}
public int getIndex() {
return index;
}
public int[] readArray() {
// Read the file and deserialize the array at the associated index
return smth;
}
public int getLastElement() {
return lastElement;
}
@Override
public int hashCode() {
return index;
}
@Override
public boolean equals(Object o) {
if (this == o) {
return true;
} else if (o == null || o.getClass() != getClass()) {
return false;
}
return index == ((FileBackedIntArray) o).index;
}
}
Do you actually need an Integer[] and not an int[], by the way (i.e. you can have null values)? As you've said in the comments, you don't really need an Integer[], so using ints everywhere will avoid boxing/unboxing and will save a lot of space, since you appear to have lots of them. Hopefully you don't have a huge number of possible values for the last element (x).
You then create an instance for each array and read the last element to put it in the Multimap without keeping the array around. Populating the Multimap needs to be either sequential or protected with a lock if concurrent, but reading can be concurrent without any protection. You could even create an ImmutableMultimap once the HashMultimap has been populated, to guard against any modification, which is a safe practice in a concurrent environment.
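The following is a rough sketch of the same idea without Guava, so that it is self-contained: a plain HashMap<Integer, List<Long>> stands in for the multimap and stores file offsets instead of FileBackedIntArray objects. The file layout (each record is an int length n followed by n ints, as written by DataOutputStream) and the class name LastElementIndex are assumptions for illustration, not something stated in the question.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LastElementIndex {
    private final File file;
    // last element -> offsets of all records whose array ends with that element
    private final Map<Integer, List<Long>> offsetsByLast = new HashMap<>();

    // Scans the file once, reading only each record's length and last element.
    LastElementIndex(File file) {
        this.file = file;
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            long pos = 0;
            long length = raf.length();
            while (pos < length) {
                raf.seek(pos);
                int n = raf.readInt();
                if (n > 0) {
                    raf.seek(pos + 4L * n); // offset of the last element
                    int last = raf.readInt();
                    offsetsByLast.computeIfAbsent(last, k -> new ArrayList<>()).add(pos);
                }
                pos += 4L + 4L * n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Loads from disk only the arrays whose last element is x.
    // Each call opens its own file handle, so calls do not share a file position.
    List<int[]> arraysEndingWith(int x) {
        List<int[]> result = new ArrayList<>();
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            for (long pos : offsetsByLast.getOrDefault(x, Collections.emptyList())) {
                raf.seek(pos);
                int[] array = new int[raf.readInt()];
                for (int i = 0; i < array.length; i++) {
                    array[i] = raf.readInt();
                }
                result.add(array);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }
}
```

The index is never modified after construction, which matches the advice above about populating first and reading concurrently afterwards.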
Why cannot I retrieve an element from a HashSet?
Consider my HashSet containing a list of MyHashObjects with their hashCode() and equals() methods overridden correctly. I was hoping to construct a MyHashObject myself, and set the relevant hash code properties to certain values.
I can query the HashSet to see if there are "equivalent" objects in the set using the contains() method. So even though contains() returns true for the two objects, they may not be == true.
How come then there isn’t any get() method similar to how the contains() works?
What is the thinking behind this API decision?
If you know what element you want to retrieve, then you already have the element. The only question for a Set to answer, given an element, is whether it contains() it or not.
If you want to iterate over the elements, just use Set.iterator().
It sounds like what you're trying to do is designate a canonical element for an equivalence class of elements. You can use a Map<MyObject,MyObject> to do this. See this Stack Overflow question or this one for a discussion.
If you are really determined to find an element that .equals() your original element with the constraint that you must use the HashSet, I think you're stuck with iterating over it and checking equals() yourself. The API doesn't let you grab something by its hash code. So you could do:
MyObject findIfPresent(MyObject source, HashSet<MyObject> set)
{
if (set.contains(source)) {
for (MyObject obj : set) {
if (obj.equals(source))
return obj;
}
}
return null;
}
It is brute-force and O(n) ugly, but if that's what you need to do...
You can use HashMap<MyHashObject, MyHashObject> instead of HashSet<MyHashObject>.
Calling containsKey() with your "reconstructed" MyHashObject first checks the collection by hashCode() and, if a matching hash code is hit, then compares your "reconstructed" object against the original with equals(); at that point you can retrieve the original using get().
The lookup is O(1), but the downside is that you will likely have to override both the equals() and hashCode() methods.
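A minimal sketch of this lookup, using a hypothetical element type Item whose equality is based on its id only (both the type and the class name CanonicalLookupDemo are made up for this example):

```java
import java.util.HashMap;
import java.util.Map;

class CanonicalLookupDemo {
    // Hypothetical element type: equals() and hashCode() look at id only.
    static final class Item {
        final String id;
        final String payload;

        Item(String id, String payload) {
            this.id = id;
            this.payload = payload;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Item && id.equals(((Item) o).id);
        }

        @Override
        public int hashCode() {
            return id.hashCode();
        }
    }

    public static void main(String[] args) {
        Map<Item, Item> items = new HashMap<>();
        Item original = new Item("42", "stored payload");
        items.put(original, original);

        // A "reconstructed" object with the same id but no payload.
        Item probe = new Item("42", null);
        Item found = items.get(probe);
        System.out.println(found == original);
        System.out.println(found.payload);
    }
}
```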
It sounds like you're essentially trying to use the hash code as a key in a map (which is what HashSets do behind the scenes). You could just do it explicitly, by declaring HashMap<Integer, MyHashObject>.
There is no get for HashSets because typically the object you would supply to the get method as a parameter is the same object you would get back.
If you know the order of elements in your Set, you can retrieve them by converting the Set to an Array. Something like this:
Set mySet = MyStorageObject.getMyStringSet();
Object[] myArr = mySet.toArray();
String value1 = myArr[0].toString();
String value2 = myArr[1].toString();
The need to get a reference to the object that is contained inside a Set is common. It can be achieved in two ways:
Use HashSet as you wanted, then:
public Object getObjectReference(HashSet<Xobject> set, Xobject obj) {
if (set.contains(obj)) {
for (Xobject o : set) {
if (obj.equals(o))
return o;
}
}
return null;
}
For this approach to work, you need to override both hashCode() and equals(Object o) methods
In the worst scenario we have O(n)
Second approach is to use TreeSet
public Object getObjectReference(TreeSet<Xobject> set, Xobject obj) {
if (set.contains(obj)) {
return set.floor(obj);
}
return null;
}
This approach gives O(log(n)), more efficient.
You don't need to override hashCode for this approach, but you have to implement the Comparable interface (i.e. define compareTo(Object o)).
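A slightly tighter variant of the TreeSet approach, shown here only as a sketch, skips the separate contains() call and instead checks whether the element returned by floor() compares equal to the probe. The helper name findStored and the class name TreeSetLookupDemo are assumptions, and the sketch assumes a set that uses natural ordering.

```java
import java.util.TreeSet;

class TreeSetLookupDemo {
    // Returns the stored element that compares equal to probe, or null if there is none.
    // floor() yields the greatest element <= probe, which is the equal one if it exists.
    static <T extends Comparable<? super T>> T findStored(TreeSet<T> set, T probe) {
        T candidate = set.floor(probe);
        return (candidate != null && candidate.compareTo(probe) == 0) ? candidate : null;
    }
}
```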
One of the easiest ways is to convert to Array:
for(int i = 0; i < set.size(); i++) {
System.out.println(set.toArray()[i]);
}
Suppose I know for sure that, in my application, the object is never used for searching in any list or hash data structure, and that its equals method is not used anywhere except indirectly by the hash data structure while adding. Is it advisable to update the existing object in the set from within the equals method? Refer to the code below. If I add this bean to a HashSet, I can do group aggregation on the object with the matching key (id). This way I am able to achieve aggregation functions such as sum, max, min, ... as well. If it is not advisable, please feel free to share your thoughts.
public class MyBean {
String id,
name;
double amountSpent;
@Override
public int hashCode() {
return id.hashCode();
}
@Override
public boolean equals(Object obj) {
if(obj!=null && obj instanceof MyBean ) {
MyBean tmpObj = (MyBean) obj;
if(tmpObj.id!=null && tmpObj.id.equals(this.id)) {
tmpObj.amountSpent += this.amountSpent;
return true;
}
}
return false;
}
}
First of all, convert your set to an array. Then, get the item by indexing the array.
Set uniqueItem = new HashSet();
uniqueItem.add("0");
uniqueItem.add("1");
uniqueItem.add("0");
Object[] arrayItem = uniqueItem.toArray();
for(int i = 0; i < uniqueItem.size(); i++) {
System.out.println("Item " + i + " " + arrayItem[i].toString());
}
If you could use List as a data structure to store your data, instead of using Map to store the result in the value of the Map, you can use following snippet and store the result in the same object.
Here is a Node class:
private class Node {
public int row, col, distance;
public Node(int row, int col, int distance) {
this.row = row;
this.col = col;
this.distance = distance;
}
public boolean equals(Object o) {
return (o instanceof Node &&
row == ((Node) o).row &&
col == ((Node) o).col);
}
}
If you store your result in distance variable and the items in the list are checked based on their coordinates, you can use the following to change the distance to a new one with the help of lastIndexOf method as long as you only need to store one element for each data:
List<Node> nodeList;
nodeList = new ArrayList<>(Arrays.asList(new Node(1, 2, 1), new Node(3, 4, 5)));
Node tempNode = new Node(1, 2, 10);
if(nodeList.contains(tempNode))
nodeList.get(nodeList.lastIndexOf(tempNode)).distance += tempNode.distance;
It is basically reimplementing Set whose items can be accessed and changed.
If you want to have a reference to the real object using the same performance as HashSet, I think the best way is to use HashMap.
Example (in Kotlin, but similar in Java) of finding an object, changing some field in it if it exists, or adding it in case it doesn't exist:
val map = HashMap<DbData, DbData>()
val dbData = map[objectToFind]
if(dbData!=null){
++dbData.someIntField
}
else {
map[objectToFind] = objectToFind
}
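For comparison, a rough Java equivalent of the same find-or-add pattern might look like the sketch below. The DbData class shown here is a hypothetical stand-in whose equality depends only on its key field, so changing someIntField does not disturb the map.

```java
import java.util.HashMap;
import java.util.Map;

class FindOrAddDemo {
    // Hypothetical data type: equality depends on key only, not on the counter.
    static final class DbData {
        final String key;
        int someIntField;

        DbData(String key) {
            this.key = key;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof DbData && key.equals(((DbData) o).key);
        }

        @Override
        public int hashCode() {
            return key.hashCode();
        }
    }

    // Increments the counter of the stored object if an equal one exists;
    // otherwise stores objectToFind itself. Returns the stored instance.
    static DbData findOrAdd(Map<DbData, DbData> map, DbData objectToFind) {
        DbData existing = map.get(objectToFind);
        if (existing != null) {
            existing.someIntField++;
            return existing;
        }
        map.put(objectToFind, objectToFind);
        return objectToFind;
    }
}
```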
I have an interesting problem I would like some help with. I have implemented a couple of queues for two separate conditions, one based on FIFO and the other on the natural order of a key (ConcurrentMap). That is, you can imagine both queues have the same data, just ordered differently. The question I have (and I am looking for an efficient way of doing this) is: if I find the key in the ConcurrentMap based on some criteria, what is the best way of finding the "position" of the key in the FIFO map? Essentially I would like to know whether it is the first key (which is easy), or, say, the 10th key.
Any help would be greatly appreciated.
There is no API for accessing the order in a FIFO map. The only way you can do it is iterate over keySet(), values() or entrySet() and count.
I believe something like the code below will do the job. I've left the implementation of element --> key as an abstract method. Note the counter being used to assign increasing numbers to elements. Also note that if add(...) is being called by multiple threads, the elements in the FIFO are only loosely ordered. That forces the fancy max(...) and min(...) logic. It's also why the position is approximate. First and last are special cases. First can be indicated clearly. Last is tricky because the current implementation returns a real index.
Since this is an approximate location, I would suggest you consider making the API return a float between 0.0 and 1.0 to indicate relative position in the queue.
If your code needs to support removal using some means other than pop(...), you will need to use approximate size, and change the return to ((id - min) / (max - min)) * size, with all the appropriate int / float casting & rounding.
public abstract class ApproximateLocation<K extends Comparable<K>, T> {
protected abstract K orderingKey(T element);
private final ConcurrentMap<K, Wrapper<T>> _map = new ConcurrentSkipListMap<K, Wrapper<T>>();
private final Deque<Wrapper<T>> _fifo = new LinkedBlockingDeque<Wrapper<T>>();
private final AtomicInteger _counter = new AtomicInteger();
public void add(T element) {
K key = orderingKey(element);
Wrapper<T> wrapper = new Wrapper<T>(_counter.getAndIncrement(), element);
_fifo.add(wrapper);
_map.put(key, wrapper);
}
public T pop() {
Wrapper<T> wrapper = _fifo.pop();
_map.remove(orderingKey(wrapper.value));
return wrapper.value;
}
public int approximateLocation(T element) {
Wrapper<T> wrapper = _map.get(orderingKey(element));
Wrapper<T> first = _fifo.peekFirst();
Wrapper<T> last = _fifo.peekLast();
if (wrapper == null || first == null || last == null) {
// element is not in composite structure; fifo has not been written to yet because of concurrency
return -1;
}
int min = Math.min(wrapper.id, Math.min(first.id, last.id));
int max = Math.max(wrapper.id, Math.max(first.id, last.id));
if (wrapper == first || max == min) {
return 0;
}
if (wrapper == last) {
return max - min;
}
return wrapper.id - min;
}
private static class Wrapper<T> {
final int id;
final T value;
Wrapper(int id, T value) {
this.id = id;
this.value = value;
}
}
}
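The relative-position suggestion made before the class above could be sketched as a pair of small helpers. The names RelativePosition, relativePosition and approximateIndex are made up for this example; the second method follows the ((id - min) / (max - min)) * size formula, with rounding as one possible choice.

```java
class RelativePosition {
    // Maps id within [min, max] to a fraction between 0.0 and 1.0.
    static double relativePosition(int id, int min, int max) {
        if (max == min) {
            return 0.0; // single element: treat it as the head of the queue
        }
        return (double) (id - min) / (max - min);
    }

    // Scales the fraction by an (approximate) queue size and rounds to an index.
    static int approximateIndex(int id, int min, int max, int size) {
        return (int) Math.round(relativePosition(id, min, max) * size);
    }
}
```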
If you can use a ConcurrentNavigableMap, the size of the headMap gives you exactly what you want.
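A minimal sketch of the headMap idea, assuming a ConcurrentSkipListMap (the helper name positionOf is hypothetical): headMap(key) is the view of all keys strictly smaller than key, so its size is the zero-based position of key in key order. For this to be the FIFO position, the map would have to be keyed by something like an insertion sequence number, which is an assumption on top of the original question.

```java
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

class HeadMapPositionDemo {
    // Counts the keys strictly smaller than key, i.e. key's zero-based rank.
    static <K, V> int positionOf(ConcurrentNavigableMap<K, V> map, K key) {
        return map.headMap(key).size();
    }

    public static void main(String[] args) {
        ConcurrentNavigableMap<Long, String> bySequence = new ConcurrentSkipListMap<>();
        bySequence.put(10L, "first");
        bySequence.put(20L, "second");
        bySequence.put(30L, "third");
        System.out.println(positionOf(bySequence, 20L));
    }
}
```

Keep in mind that size() on a ConcurrentSkipListMap view is not a constant-time operation (it traverses the elements), and under concurrent modification the result is only an estimate.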