I have implemented my own hashmap for study purposes. The key is a string and the value is an object of a class I created. I want to know whether my hashcode method is appropriate, and how to avoid recalculating the hashcode every time a value is inserted.
I save the hash value as a member variable of the node once it has been calculated. However, when the get method is called, only the key is passed in, so the hashcode has to be computed again. How can I reuse the hash value that was already calculated?
Finally, is my hash generation method appropriate?
class IHashMap {

    private class Node {
        int hash;
        String key;
        int data;
        Node right;

        public Node(String key, int data) {
            this.key = key;
            this.data = data;
            this.right = null;
            this.hash = 0;
        }
    }

    private Node[] table;
    private int tbSize;
    private int n;

    public IHashMap(int tbSize) {
        this.table = new Node[tbSize];
        this.tbSize = tbSize;
        this.n = 0;
    }

    // ... irrelevant code omitted ...

    public void put(String key, int value) {
        int hash = hashCode(key);
        Node node = new Node(key, value);
        node.hash = hash;
        if (this.table[hash] != null) {
            Node entry = this.table[hash];
            while (entry.right != null && !entry.key.equals(key))
                entry = entry.right;
            if (entry.key.equals(key)) {
                entry.data++;
            } else {
                entry.right = node;
                this.n++;
            }
        } else {
            this.table[hash] = node;
            this.n++;
        }
    }

    public int get(String key) {
        int hash = hashCode(key);
        if (this.table[hash] != null) {
            if (this.table[hash].key.equals(key))
                return this.table[hash].data;
            Node entry = this.table[hash];
            while (entry != null && !entry.key.equals(key))
                entry = entry.right;
            if (entry == null)
                return -1;
            return entry.data;
        }
        return -1;
    }

    private int hash(String key) {
        int h = 0;
        if (key.length() > 0) {
            char[] var = strToCharArray(key); // helper defined in the omitted code
            for (int i = 0; i < var.length; i++)
                h = 31 * h + var[i];
        }
        return h;
    }

    private int hashCode(String key) {
        return (hash(key) & 0x7fffffff) % this.tbSize;
    }

    // ... irrelevant code omitted ...
}
I would really appreciate it if you could answer me.
So, the hashcode is the hashcode of the thing that is being inserted.
The way to keep this from being too much of a hassle is to slip a few lines into the stored item's hashcode method, along the lines of:
private int cachedHashcode = 0; // 0 means "not computed yet"

@Override
public int hashCode() {
    if (cachedHashcode != 0) {
        return cachedHashcode;
    }
    int h = calculateHashcode(); // whatever the expensive computation is
    cachedHashcode = h;
    return h;
}

This way, for each object, you only go through the hashcode computation once.
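Applied to the map in the question, the same idea looks roughly like this. This is a sketch rather than the original code: it assumes Node.hash stores the full value of hash(key) instead of the bucket index, and the resize method is hypothetical, added only to show where the cached hash pays off.

public int get(String key) {
    int h = hash(key); // unavoidable: the lookup key must be hashed once
    Node entry = this.table[(h & 0x7fffffff) % this.tbSize];
    while (entry != null) {
        // compare the cached hashes first; equals() only runs on a hash match
        if (entry.hash == h && entry.key.equals(key))
            return entry.data;
        entry = entry.right;
    }
    return -1;
}

// Hypothetical: when the table grows, the stored hashes let you re-bucket
// every node without ever touching the key strings again.
private void resize(int newSize) {
    Node[] old = this.table;
    this.table = new Node[newSize];
    this.tbSize = newSize;
    for (Node head : old) {
        while (head != null) {
            Node next = head.right;
            int idx = (head.hash & 0x7fffffff) % newSize;
            head.right = this.table[idx];
            this.table[idx] = head;
            head = next;
        }
    }
}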
Now, keep in mind that computers have progressed a lot. They mostly sit waiting for the RAM subsystem to respond, and can execute roughly 1,000 to 10,000 arithmetic operations in the time a single RAM fetch takes. This means that "preserving CPU cycles" at the cost of extra memory lookups can actually slow your program down.
Benchmark wisely, and don't be afraid to use a little CPU if it means reducing your RAM footprint.
For those who are curious, if your program is small enough to fit into level 1 cache, it's not a big delay, but as you spill over these caches into the other levels the delays become noticeable. This is why "caching" is not always a great solution: if you cache too heavily, your program becomes larger, and will spill out of cache more often.
Modern CPUs try to compensate, mostly by pre-fetching the needed RAM before it is requested (looking ahead in the processing stream). That leads to better runtime in many cases, but also creates new issues (like preloading stuff you might not use because you chose the "other" path through the code).
The best bet is to not overly-cache stuff that is simple, unless it's expensive to reconstruct. With the JVM a method call (at the very low levels) is more expensive than you might think, so the JVM has special optimizations for Strings and their hash codes.
I have a simple 2D Point class that I wrote myself. My Points are immutable, and I need to create loads of them, so for memory efficiency I created a cache to fetch the ones that already exist.
I use around 100_000 unique points in total during the process, and I need to fetch many of them multiple times.
While profiling my app I noticed that most of the time is spent in this class.
I wonder if I did anything tremendously stupid, or whether the time spent is simply because I need to create so many points. Can I optimize this class any further? (And yes, I need the concurrent access.)
This is the code:
public class Point implements Comparable<Point> {

    private static final Map<Integer, Map<Integer, Point>> POINT_CACHE = new ConcurrentHashMap<>();
    private static final boolean USE_CACHE = true;

    public final int row;
    public final int column;
    private int hashCache = -1;

    public static Point newPoint(int row, int column) {
        if (!USE_CACHE) return new Point(row, column);
        return POINT_CACHE.computeIfAbsent(row, k -> new ConcurrentHashMap<>())
                          .computeIfAbsent(column, v -> new Point(row, column));
    }

    public static Point newPoint(Point point) {
        return newPoint(point.row, point.column);
    }

    protected Point(int row, int column) {
        this.row = row;
        this.column = column;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        Point point = (Point) o;
        return row == point.row && column == point.column;
    }

    @Override
    public int hashCode() {
        // Assuming the matrix is less than 65k x 65k, this will return unique hashes
        if (hashCache == -1) hashCache = (row << 16) | column;
        return hashCache;
        //return Objects.hash(row, column);
    }

    // Getter
}
[Profiler result screenshot]
Yes, ConcurrentHashMap operations are expensive.
They also use a lot of memory. Even more memory than a regular HashMap.
So unless you get an average cache hit rate over 90%, you are probably going to use more memory in the ConcurrentHashMap objects than you are saving by not creating duplicate Point objects.
The other thing to observe is that since your Point objects have int values as their row and column attributes, you could simply encode each distinct point as a long ... and entirely remove the need for the Point objects and the cache.
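For illustration, a minimal sketch of that encoding (the helper names are mine, not from the answer): pack the two 32-bit ints into one 64-bit long and unpack them on the way out.

public class PackedPoint {
    // Pack row and column into a single long; no Point object, no cache needed.
    static long encode(int row, int column) {
        return ((long) row << 32) | (column & 0xFFFFFFFFL);
    }

    static int row(long packed) {
        return (int) (packed >>> 32); // high 32 bits
    }

    static int column(long packed) {
        return (int) packed; // low 32 bits
    }
}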
I'm developing a game in Java using the LWJGL library, and I came across a huge memory leak after creating an infinite terrain generator. Basically, the terrain is divided into chunks and I have a render distance variable, like 8. I have a HashMap which maps every created Terrain to its coordinates, represented by a custom Key class. Then, for each Key (coordinates) within the render distance, I check if the HashMap contains that Key; if it doesn't, I create the terrain and add it to a List of terrains to render. Then I render the terrains, clear the list, and every 5 seconds I check every key of the HashMap to see if it is still within the render distance and remove the ones that aren't.
The thing is that I have 2 memory leaks, meaning that memory keeps increasing over time. To debug my code, I simply removed each part until the memory stopped increasing. I found 2 parts causing the 2 leaks:
First, the HashMap is not clearing the references to the Terrains correctly. Indeed, I kept track of the size of the HashMap and it never goes beyond ~200 elements, even though the memory increases really fast when generating new terrains. I think I'm missing something about HashMap; I should point out that I implemented hashCode and equals methods in both the Key and Terrain classes.
Second, creating new Keys, and storing a reference to the Terrain in my loops to avoid fetching it from the HashMap every time I need it, are causing a small but noticeable memory leak.
RefreshChunks is called every frame, and deleteUnusedChunks every 5 seconds
public void refreshChunks(MasterRenderer renderer, Loader loader, Vector3f cameraPosition) {
    int camposX = Math.round(cameraPosition.x / (float) Terrain.CHUNK_SIZE);
    int camposZ = Math.round(cameraPosition.z / (float) Terrain.CHUNK_SIZE);
    campos = new Vector2f(camposX, camposZ);
    for (int x = -chunkViewDist + camposX; x <= chunkViewDist + camposX; x++) {
        for (int z = -chunkViewDist + camposZ; z <= chunkViewDist + camposZ; z++) {
            Key key = new Key(x, z); // Memory increases
            Terrain value = terrains.get(key); // Memory increases
            if (Maths.compareDist(new Vector2f(x, z), campos, chunkViewDist)) {
                if (value == null) {
                    terrains.put(key, new Terrain(loader, x, z, terrainInfo)); // Memory leak happens if I fill the HashMap
                }
                if (!terrainsToRender.contains(value)) {
                    terrainsToRender.add(value);
                }
            } else {
                if (terrainsToRender.contains(value)) {
                    terrainsToRender.remove(value);
                }
            }
        }
    }
    renderer.processTerrain(terrainsToRender);
    if (DisplayManager.getCurrentTime() - lastClean > cleanGap) {
        lastClean = DisplayManager.getCurrentTime();
        deleteUnusedChunks();
    }
}

public void deleteUnusedChunks() {
    List<Key> toRemove = new ArrayList<Key>();
    terrains.forEach((k, v) -> {
        if (!Maths.compareDist(new Vector2f(k.x, k.y), campos, chunkViewDist)) {
            toRemove.add(k);
        }
    });
    for (Key s : toRemove) {
        terrains.remove(s);
    }
}
Key class implementation:
public static class Key {

    public final int x;
    public final int y;

    public Key(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Key)) return false;
        Key key = (Key) o;
        return x == key.x && y == key.y;
    }

    @Override
    public int hashCode() {
        int result = x;
        result = 31 * result + y;
        return result;
    }
}
Terrain class hashCode and equals implementation:
@Override
public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof Terrain)) return false;
    Terrain te = (Terrain) o;
    return gridx == te.gridx && gridz == te.gridz;
}

@Override
public int hashCode() {
    int result = gridx;
    result = 31 * result + gridz;
    return result;
}
I'm certainly missing something about the behavior of HashMaps and classes, thanks for your help.
Edit:
My questions are:
• Is it normal that instantiating a new Key and creating a reference to a Terrain make the memory increase over time?
• Why does the memory keep increasing over time while the HashMap size stays the same as new elements are put in and old ones removed?
Edit 2: I tried generating terrains for about 10 minutes, and at some point the program used 12 GB of RAM. Even so, I couldn't get it to crash with an out-of-memory exception, because the more I generated, the less RAM was added. But still, I don't want my game to use the maximum amount of RAM available before it starts recycling it.
I need to store objects of the class edge as keys and ArrayLists as values.
public class edge {
    String edgeType; // horizontal or vertical edge
    int i; // i and j identify a cell in the 2D matrix
    int j;
}
HashMap<edge, ArrayList<String>> edgeHM = new HashMap<edge, ArrayList<String>>();
Each key's value ArrayList is initialized with "0" and "1".
When I retrieve and edit one of the entries using the function reduceEdgeDomain, multiple keys get updated with the value. Hash collision?
E.g. I need to update key H21, but keys H21 and V21 both get updated (edgeType, i, j).
H22 <-> V22, or V53 <-> H53
public static void reduceEdgeDomain(HashMap<edge, ArrayList<String>> edgeHM, ArrayList<edge> nonEssEdges) {
    ArrayList<String> tempAL;
    for (int i = 0; i < nonEssEdges.size(); i++) {
        edge str = nonEssEdges.get(i);
        if (edgeHM.containsKey(str)) {
            tempAL = edgeHM.get(str);
            edgeHM.remove(str);
            tempAL.remove("1");
            edgeHM.put(str, tempAL);
        }
    }
}
Overridden methods for equals() and hashCode():
@Override
public boolean equals(Object o) {
    if (this == o) {
        return true;
    }
    if (!(o instanceof edge)) {
        return false;
    }
    edge edge = (edge) o;
    return i == edge.i && j == edge.j && Objects.equals(edgeType, edge.edgeType);
}

@Override
public int hashCode() {
    String uniqueIdentified = edgeType + i + j;
    MessageDigest digest = null;
    try {
        digest = MessageDigest.getInstance("SHA-256");
    } catch (NoSuchAlgorithmException e) {
        e.printStackTrace();
    }
    byte[] hash = digest.digest(uniqueIdentified.getBytes(StandardCharsets.UTF_8));
    ByteBuffer wrapped = ByteBuffer.wrap(hash); // big-endian by default
    short num = wrapped.getShort();
    return num;
}
You are using a short as your hash. There are only 65536 distinct short values, so you might just be unlucky. Also, as @JBNizet pointed out, the use of SHA-256 is not very useful here and will make your code much, much slower.
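For illustration, a sketch of a cheaper hashCode (my suggestion, not part of the original answer), which stays consistent with the posted equals() and uses the full int range:

@Override
public int hashCode() {
    // Cheap and well-distributed; no cryptographic digest is needed
    // to pick a HashMap bucket.
    return Objects.hash(edgeType, i, j);
}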
So what? Hash collisions are normal: there is an infinite number of possible strings (string length is unbounded), but a hashcode in Java is an int, which has a limited range. So there is nothing to worry about; hashcode collisions are normal.
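To illustrate (my example, not the answerer's), the JDK's own String.hashCode() already collides on very short inputs:

public class CollisionDemo {
    public static void main(String[] args) {
        // "Aa" and "BB" are distinct strings with the same hash code,
        // because 'A' * 31 + 'a' == 'B' * 31 + 'B' == 2112.
        System.out.println("Aa".hashCode()); // 2112
        System.out.println("BB".hashCode()); // 2112
    }
}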
I'm trying to implement A* in Java based on OSM data. My problem is that my implementation is not working correctly. First of all, the path found is not the shortest. Second, the closed list ends up containing about a third more nodes than Dijkstra's does. That's not what I expected.
Here is my A* code, which is based on the Wikipedia pseudocode:
public Object[] executeAstar(ArrayList<Arclistentry> data, NodeD start, NodeD dest, long[] nodenur)
{
    openlist = new PriorityQueue<NodeD>(1, comp);
    closedlist.clear();
    openlist.offer(start);
    start.setg(0);
    start.seth(calccost(start, dest));
    start.setf(start.getg() + start.geth());
    while (!openlist.isEmpty())
    {
        NodeD currentnode = openlist.poll();
        if (currentnode.getnodenumber() == dest.getpredessor())
        {
            closedlist.add(currentnode);
            return drawway(closedlist, start, dest);
        }
        closedlist.add(currentnode);
        ArrayList<Arclistentry> entries = neighbors.get((int) currentnode.getnodenumber() - 1);
        for (Arclistentry aentry : entries)
        {
            NodeD successor = new NodeD(aentry.getnode(), aentry.getstart(), aentry.getcoorddest());
            float tentative_g = currentnode.getg() + calccost(currentnode, successor); //+aentry.getcost();
            if (contains(successor, closedlist))
            {
                continue;
            }
            if (contains(successor, openlist) && tentative_g >= aentry.getcost())
            {
                continue;
            }
            if (!contains(successor, openlist))
            {
                successor.setpredessor(currentnode.getnodenumber());
                successor.setg(tentative_g);
                successor.seth(calccost(successor, dest));
                successor.setf(successor.getg() + successor.geth());
                openlist.offer(successor);
            }
            else
            {
                openlist.remove(successor);
                successor.setpredessor(currentnode.getnodenumber());
                successor.setg(tentative_g);
                successor.seth(calccost(successor, dest));
                successor.setf(successor.getg() + successor.geth());
                openlist.offer(successor);
            }
        }
    }
    return drawway(closedlist, start, dest);
}
My heuristic is calculated using the Euclidean distance. But to also take the cost of a node into account, the cost is multiplied by the heuristic's result. My data structure contains the following:
private long nodenumber;
private long predessor;
private float label;
private float f;
private float g;
private float h;
private double[] coord = new double[2];

public NodeD(long nodenr, long predessor, double[] coor)
{
    this.nodenumber = nodenr;
    this.predessor = predessor;
    this.coord = coor;
}

public NodeD(long nodenr, long predessor, float label)
{
    this.nodenumber = nodenr;
    this.predessor = predessor;
    this.label = label;
}
and for the arclist I use the following:
private long start;
private long dest_node;
private float cost_;
private double[]coordstart = new double[2];
private double[]coorddest = new double[2];
Contains Function for Priority Queue:
public boolean contains(NodeD o, PriorityQueue<NodeD> al)
{
    Iterator<NodeD> e = al.iterator();
    if (o == null)
    {
        while (e.hasNext())
        {
            if (e.next() == null)
            {
                return true;
            }
        }
    }
    else
    {
        while (e.hasNext())
        {
            NodeD t = e.next();
            if (t.equals(null))
            {
                return false;
            }
            if (((o.getnodenumber() == t.getnodenumber()) & (o.getpredessor() == t.getpredessor()))
                    || (o.getnodenumber() == t.getpredessor() & o.getpredessor() == t.getnodenumber()))
            {
                return true;
            }
        }
        return false;
    }
    return false;
}
and contains for the ArrayList (because it was not matching correctly with the built-in ArrayList.contains function):
public boolean contains(NodeD o, ArrayList<NodeD> al) {
    return indexOf(o, al) >= 0;
}

public int indexOf(NodeD o, ArrayList<NodeD> al) {
    if (o == null) {
        for (int i = 0; i < al.size(); i++)
            if (al.get(i) == null)
                return i;
    } else {
        for (int i = 0; i < al.size(); i++)
        {
            if (o.getpredessor() == al.get(i).getpredessor()) // (o.getnodenumber()==al.get(i).getnodenumber()) &&
            {
                return i;
            }
            else if ((o.getpredessor() == al.get(i).getnodenumber()) && (o.getnodenumber() == al.get(i).getpredessor()))
            {
                return i;
            }
        }
    }
    return -1;
}
The problem is that the algorithm visits all nodes. The other problem is the sorted open list, which pushes neighbors of the current node up because they have a lower f value. So what am I doing wrong in implementing this algorithm?
Recap of all our previous answers:
• Make sure the A* estimation is a lower estimate, otherwise it will wrongly skip parts
• Do not iterate over all nodes to determine the index of the edges of your current node's edge set in an array
• When creating new objects to put in your queue/sets, checks should be done on the properties of the nodes
• If your focus is on speed, avoid as much work as possible by aborting non-interesting searches as soon as possible
I'm still unsure about this line:
if((contains(successor,openlist))&& tentative_g >= aentry.getcost())
What I think you are trying to do is avoid adding a new node to the queue when you already have a better value for it in there. However, tentative_g is the length of the path from your starting node to your current node, while aentry.getcost seems to be the length of the edge you are relaxing. That doesn't seem right to me... Try to retrieve the correct (old) value to compare against your new tentative label.
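As a sketch of that fix (findInOpenList is a hypothetical helper, not in the posted code; it would return the node already queued for this successor, or null):

// Hypothetical helper: look up the node already queued for this successor.
NodeD queued = findInOpenList(successor, openlist);
if (queued != null && tentative_g >= queued.getg()) {
    continue; // the queued path is at least as good, so skip this successor
}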
Lastly, for your current code, I would also make the following changes:
• Use a HashSet for your closedlist. Every time you check whether a node is in there, you currently have to go over all of them, which is not that efficient... Try using a HashSet by overriding the hash function of your NodeD objects; the built-in contains function is much faster than your current approach (see the sketch after this list). A similar argument can be made for your openlist. You cannot change the PQ to a set, but you could omit the contains checks: if you add a node with a bad priority, you will always poll the correct priority first (because it is a PQ), and when you later poll the bad one you can just skip it. That's a small optimisation that trades PQ size against PQ lookup operations.
• Avoid recalculating stuff (mainly calccost()) by calculating it once and reusing the value when you need it (a small time gain, but nicer code).
• Try to avoid multiple lines with the same code by placing them on the correct line (e.g. the two closedlist.add calls can be merged into one add call placed above the if condition; if you have something like if(..){doA();doB();}else{doA();doC();}, put doA() before the if for legibility).
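A minimal sketch of the HashSet idea, using a stripped-down NodeD for illustration and assuming a node is identified by its node number (this equals/hashCode pair is mine, not from the posted code):

import java.util.HashSet;
import java.util.Set;

class NodeD {
    private final long nodenumber;

    NodeD(long nodenumber) {
        this.nodenumber = nodenumber;
    }

    long getnodenumber() {
        return nodenumber;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof NodeD)) return false;
        return nodenumber == ((NodeD) o).nodenumber;
    }

    @Override
    public int hashCode() {
        return Long.hashCode(nodenumber);
    }
}

// Usage: with equals/hashCode in place, membership checks are O(1) on average.
// Set<NodeD> closedlist = new HashSet<>();
// if (closedlist.contains(successor)) { /* skip */ }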
I have an interesting problem I would like some help with. I have implemented a couple of queues for two separate conditions, one based on FIFO order and the other on the natural order of a key (a ConcurrentMap). That is, you can imagine both queues holding the same data, just ordered differently. The question I have (and I am looking for an efficient way of doing this): if I find the key in the ConcurrentMap based on some criteria, what is the best way of finding the "position" of that key in the FIFO map? Essentially, I would like to know whether it is the first key (which is easy), or, say, the 10th key.
Any help would be greatly appreciated.
There is no API for accessing the order in a FIFO map. The only way you can do it is to iterate over keySet(), values() or entrySet() and count.
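For example (a sketch under the assumption that the FIFO map is a LinkedHashMap, whose iteration order is insertion order):

import java.util.LinkedHashMap;

public class FifoPosition {
    // Returns the 0-based position of key in the map's iteration order, or -1.
    static <K> int positionOf(LinkedHashMap<K, ?> fifo, K key) {
        int index = 0;
        for (K k : fifo.keySet()) {
            if (k.equals(key)) {
                return index;
            }
            index++;
        }
        return -1;
    }
}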
I believe something like the code below will do the job. I've left the implementation of element --> key as an abstract method. Note the counter being used to assign increasing numbers to elements. Also note that if add(...) is being called by multiple threads, the elements in the FIFO are only loosely ordered. That forces the fancy max(...) and min(...) logic. It's also why the position is approximate. First and last are special cases: first can be indicated clearly, while last is tricky because the current implementation returns a real index.
Since this is an approximate location, I would suggest you consider making the API return a float between 0.0 and 1.0 to indicate relative position in the queue.
If your code needs to support removal using some means other than pop(...), you will need to use approximate size, and change the return to ((id - min) / (max - min)) * size, with all the appropriate int / float casting & rounding.
import java.util.Deque;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.atomic.AtomicInteger;

public abstract class ApproximateLocation<K extends Comparable<K>, T> {

    protected abstract K orderingKey(T element);

    private final ConcurrentMap<K, Wrapper<T>> _map = new ConcurrentSkipListMap<K, Wrapper<T>>();
    private final Deque<Wrapper<T>> _fifo = new LinkedBlockingDeque<Wrapper<T>>();
    private final AtomicInteger _counter = new AtomicInteger();

    public void add(T element) {
        K key = orderingKey(element);
        Wrapper<T> wrapper = new Wrapper<T>(_counter.getAndIncrement(), element);
        _fifo.add(wrapper);
        _map.put(key, wrapper);
    }

    public T pop() {
        Wrapper<T> wrapper = _fifo.pop();
        _map.remove(orderingKey(wrapper.value));
        return wrapper.value;
    }

    public int approximateLocation(T element) {
        Wrapper<T> wrapper = _map.get(orderingKey(element));
        Wrapper<T> first = _fifo.peekFirst();
        Wrapper<T> last = _fifo.peekLast();
        if (wrapper == null || first == null || last == null) {
            // element is not in the composite structure, or the fifo has not
            // been written to yet because of concurrency
            return -1;
        }
        int min = Math.min(wrapper.id, Math.min(first.id, last.id));
        int max = Math.max(wrapper.id, Math.max(first.id, last.id));
        if (wrapper == first || max == min) {
            return 0;
        }
        if (wrapper == last) {
            return max - min;
        }
        return wrapper.id - min;
    }

    private static class Wrapper<T> {
        final int id;
        final T value;

        Wrapper(int id, T value) {
            this.id = id;
            this.value = value;
        }
    }
}
If you can use a ConcurrentNavigableMap, the size of the headMap gives you exactly what you want.
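For illustration (my sketch, not part of the original answer): with a ConcurrentSkipListMap, the number of keys strictly before a given key is the size of its head map. Note that size() on the view is not a constant-time operation; it traverses the entries.

import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

public class HeadMapPosition {
    public static void main(String[] args) {
        ConcurrentNavigableMap<String, Integer> map = new ConcurrentSkipListMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        // Number of keys strictly less than "c" == its 0-based position.
        int position = map.headMap("c").size();
        System.out.println(position); // prints 2
    }
}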