I have a simple 2D Point class that I wrote myself. My Points are immutable and I need to create loads of them, so for memory efficiency I created a cache to fetch those that already exist.
I use around 100_000 unique points in total during the process, and I need to fetch many of them multiple times.
While profiling my app I noticed that most of the time is spent in this class.
I wonder whether I did anything tremendously stupid, or whether the time really is spent because I need to create so many points. Can I optimize this class any further? (And yes, I need the concurrent access.)
This is the code:
public class Point implements Comparable<Point> {
private static final Map<Integer, Map<Integer, Point>> POINT_CACHE = new ConcurrentHashMap<>();
private static final boolean USE_CACHE = true;
public final int row;
public final int column;
private int hashCache = -1;
public static Point newPoint(int row, int column){
if(!USE_CACHE) return new Point(row,column);
return POINT_CACHE.computeIfAbsent(row, k -> new ConcurrentHashMap<>()).computeIfAbsent(column, v -> new Point(row , column));
}
public static Point newPoint(Point point){
return newPoint(point.row,point.column);
}
protected Point(int row, int column) {
this.row = row;
this.column = column;
}
@Override
public boolean equals(Object o) {
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
Point point = (Point) o;
return row == point.row && column == point.column;
}
@Override
public int hashCode() {
//Assuming the matrix is less than 65k x 65k, this will return unique hashes
if (hashCache == -1) hashCache = (row << 16) | column;
return hashCache;
//return Objects.hash(row, column);
}
//Getter
}
Profiler Result
Yes, ConcurrentHashMap operations are expensive.
They also use a lot of memory. Even more memory than a regular HashMap.
So unless you get an average cache hit rate over 90%, you are probably going to use more memory in the ConcurrentHashMap objects than you are saving in not creating duplicate Point objects.
The other thing to observe is since your Point objects have int values as their row and column attributes, you could simply encode each distinct point as a long ... and entirely remove the need for the Point objects and the cache.
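As a rough illustration of the long-encoding idea, here is a minimal sketch. The class name PackedPoint and its method names are made up for this example and are not part of the original code. It puts the row in the high 32 bits and the column in the low 32 bits, so any pair of int values round-trips, including negative ones.

```java
// Hypothetical helper: packs two ints into one long and unpacks them again.
final class PackedPoint {
    private PackedPoint() {}

    /** Row goes into the high 32 bits, column into the low 32 bits. */
    static long pack(int row, int column) {
        // Mask the column so that a negative value does not sign-extend into the row bits.
        return ((long) row << 32) | (column & 0xFFFFFFFFL);
    }

    /** Recovers the row (arithmetic shift keeps the sign). */
    static int row(long packed) {
        return (int) (packed >> 32);
    }

    /** Recovers the column (the cast keeps the low 32 bits). */
    static int column(long packed) {
        return (int) packed;
    }
}
```

Two packed points are equal exactly when their long values are equal, so no equals, hashCode or cache is needed. Whether this is faster overall depends on how the points are used elsewhere, for example whether they end up boxed as Long inside collections.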
I'm developing a game in Java using the LWJGL library, and I came across a huge memory leak after creating an infinite terrain generator. Basically, the terrain is divided into chunks and I have a render distance variable set to, say, 8. I have a HashMap which maps every created Terrain to its coordinates, represented by a custom Key class. Then, for each Key (coordinates) within the render distance, I check whether the HashMap contains this Key; if it doesn't, I create the Terrain and add it to a List of terrains to render. Then I render the terrains, clear the list, and every 5 seconds I check every key of the HashMap to see if it is still within the render distance and remove the ones that aren't.
The thing is that I have 2 memory leaks, by which I mean that memory usage increases over time. To debug my code, I simply removed each part until the memory stopped increasing. I found 2 parts causing the 2 leaks:
First, the HashMap is not correctly clearing the references to the Terrains. I kept track of the size of the HashMap and it never goes beyond ~200 elements, even though the memory increases really fast when generating new terrains. I think I'm missing something about HashMap. I should point out that I implemented hashCode and equals methods in both the Key and Terrain classes.
Second, creating new Keys and storing a reference to the Terrain in my loops (to avoid getting it from the HashMap every time I need it) causes a small but noticeable memory leak.
RefreshChunks is called every frame, and deleteUnusedChunks every 5 seconds
public void refreshChunks(MasterRenderer renderer,Loader loader, Vector3f cameraPosition) {
int camposX = Math.round(cameraPosition.x/(float)Terrain.CHUNK_SIZE);
int camposZ = Math.round(cameraPosition.z/(float)Terrain.CHUNK_SIZE);
campos = new Vector2f(camposX,camposZ);
for(int x = -chunkViewDist + camposX; x <= chunkViewDist + camposX; x++) {
for(int z = -chunkViewDist + camposZ; z <= chunkViewDist + camposZ; z++) {
Key key = new Key(x,z); //Memory increases
Terrain value = terrains.get(key); //Memory increases
if(Maths.compareDist(new Vector2f(x,z), campos, chunkViewDist)) {
if(value == null) {
terrains.put(key, new Terrain(loader,x,z,terrainInfo)); //Memory leak happens if I fill the HashMap
}
if(!terrainsToRender.contains(value)) {
terrainsToRender.add(value);
}
} else {
if(terrainsToRender.contains(value)) {
terrainsToRender.remove(value);
}
}
}
}
renderer.processTerrain(terrainsToRender);
if(DisplayManager.getCurrentTime() - lastClean > cleanGap) {
lastClean = DisplayManager.getCurrentTime();
deleteUnusedChunks();
}
}
public void deleteUnusedChunks() {
List<Key> toRemove = new ArrayList<Key>();
terrains.forEach((k, v) -> {
if(!Maths.compareDist(new Vector2f(k.x,k.y), campos, chunkViewDist)){
toRemove.add(k);
}
});
for(Key s:toRemove) {
terrains.remove(s);
}
}
Key class implementation:
public static class Key {
public final int x;
public final int y;
public Key(int x, int y) {
this.x = x;
this.y = y;
}
@Override
public boolean equals(Object o) {
if (this == o) return true;
if (!(o instanceof Key)) return false;
Key key = (Key) o;
return x == key.x && y == key.y;
}
@Override
public int hashCode() {
int result = x;
result = 31 * result + y;
return result;
}
}
Terrain class hashCode and equals implementation:
@Override
public boolean equals(Object o) {
if (this == o) return true;
if (!(o instanceof Terrain)) return false;
Terrain te = (Terrain) o;
return gridx == te.gridx && gridz == te.gridz;
}
@Override
public int hashCode() {
int result = gridx;
result = 31 * result + gridz;
return result;
}
I'm certainly missing something about the behavior of HashMaps and classes, thanks for your help.
Edit:
My questions are:
• Is it normal that instantiating a new Key and creating a reference to a Terrain make the memory increase over time?
• Why does the memory keep increasing over time while the HashMap size stays the same when putting new elements and removing old ones?
Edit 2: I tried generating terrains for about 10 minutes, and at some point the program used 12 GB of RAM. Even so, I couldn't get it to crash with an out-of-memory exception, because the more I generated, the less RAM was added. Still, I don't want my game to use the maximum amount of RAM available before it starts to recycle it.
I have implemented my own hashmap for study purposes. The key is a String and the value is an object of a class I created. I want to know if my hashCode method is appropriate, and how to avoid calculating the hash code every time a value is inserted.
I save the hash value, once calculated, as a member variable of the object. However, when the get method is called, only the key is passed in, so the hash code must be computed again. How can I reuse the hash value once it has been calculated?
Finally, is my hash generation method appropriate?
class IHashMap {
private class Node {
int hash;
String key;
int data;
Node right;
public Node(String key, int data) {
this.key = key;
this.data = data;
this.right = null;
this.hash = 0;
}
}
private Node[] table;
private int tbSize;
private int n;
public IHashMap(int tbSize) {
this.table = new Node[tbSize];
this.tbSize = tbSize;
this.n = 0;
}
//...Omit irrelevant code...
public void put(String key, int value) {
int hash = hashCode(key);
Node node = new Node(key, value);
node.hash = hash;
if(this.table[hash] != null) {
Node entry = this.table[hash];
while(entry.right != null && !entry.key.equals(key))
entry = entry.right;
if(entry.key.equals(key)) {
entry.data++;
}
else {
entry.right = node;
this.n++;
}
}
else {
this.table[hash] = node;
this.n++;
}
}
public int get(String key) {
int hash = hashCode(key);
if(this.table[hash] != null) {
if(this.table[hash].key.equals(key))
return this.table[hash].data;
Node entry = this.table[hash];
while(entry != null && !entry.key.equals(key))
entry = entry.right;
if(entry == null)
return -1;
return entry.data;
}
return -1;
}
private int hash(String key) {
int h = 0;
if(key.length() > 0) {
char[] var = strToCharArray(key);
for(int i = 0; i < var.length; i++)
h = 31 * h + var[i];
}
return h;
}
private int hashCode(String key) {
return (hash(key) & 0x7fffffff) % this.tbSize;
}
//...Omit irrelevant code...
}
I would really appreciate it if you could answer me.
So, the hashcode is the hashcode of the thing that is being inserted.
The way to prevent this from being too much of a hassle is to slip lines into the stored item's hashcode method that look like
int hashcode() {
if (I have a cached hashcode) {
return cached_hashcode;
}
(calculate hashcode)
cached_hashcode = hashcode;
return hashcode;
}
This way, each object goes through the hashcode computation only once.
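The pseudocode above could be written out roughly as follows. This is only a sketch under some assumptions: the class CachedHashKey and its fields are invented for illustration, the key is immutable after construction, and 0 is used as the "not computed yet" marker, so a key whose real hash happens to be 0 is simply recomputed on every call.

```java
import java.util.Arrays;

// Hypothetical immutable key that computes its hash code lazily and remembers it.
final class CachedHashKey {
    private final int[] parts;   // never modified after construction
    private int cachedHash;      // 0 means "not computed yet"
    int computations;            // demo-only counter, not needed in real code

    CachedHashKey(int... parts) {
        this.parts = parts.clone(); // defensive copy keeps the key immutable
    }

    @Override
    public int hashCode() {
        int h = cachedHash;
        if (h == 0) {                       // first call (or the hash really is 0)
            computations++;
            h = Arrays.hashCode(parts);
            cachedHash = h;
        }
        return h;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof CachedHashKey
                && Arrays.equals(parts, ((CachedHashKey) o).parts);
    }
}
```

Caching like this is only safe because the fields that feed the hash never change. If they could change, the cached value would go stale.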
Now, keep in mind that computers have progressed a lot. CPUs mostly wait on the RAM subsystem to respond, and can perform hundreds of arithmetic operations in the time a single fetch from main memory takes. This means that "preserving CPU cycles" at the cost of memory lookups can actually slow down your program.
Benchmark wisely, and don't be afraid to use a little CPU if it means reducing your RAM footprint.
For those who are curious: if your program's working set is small enough to fit into the L1 cache, the delay is not big, but as you spill over into the other cache levels the delays become noticeable. This is why "caching" is not always a great solution: if you cache too heavily, your program's memory footprint grows and it will spill out of the CPU cache more often.
Modern CPUs try to compensate, mostly by pre-fetching the needed RAM before it is requested (looking ahead in the processing stream). That leads to better runtime in many cases, but also creates new issues (like preloading stuff you might not use because you chose the "other" path through the code).
The best bet is to not over-cache things that are simple, unless they are expensive to reconstruct. With the JVM, even a method call has a cost at the lowest level, and String in particular is handled specially: java.lang.String already stores its hash code in a field after the first hashCode() call, so repeated calls on the same String object do not recompute it.
I have a custom class MarioState that I want to use in a HashMap. The class represents a possible state in a state space of the Mario game. Below is a simplified version of the class MarioState.
In my HashMap I want to store these states. However, not every property of MarioState should be considered when comparing two MarioStates. For example, if one MarioState has the stuck property set to true and a distance of 30, and another MarioState also has the stuck property set to true but a different distance value (e.g. 20), then they should still be considered the same.
I know that for this to work in my HashMap I have to implement the .equals() and .hashcode() methods, which is what I did (by letting the IntelliJ IDE generate them automatically).
public class MarioState{
// Tracking the distance Mario has moved.
private int distance;
private int lastDistance;
// To keep track of if Mario is stuck or not.
private int stuckCount;
private boolean stuck;
public MarioState(){
stuckCount = 0;
stuck = false;
distance = 0;
lastDistance = 0;
}
public void update(Environment environment){
// Computing the distance
int tempDistance = environment.getEvaluationInfo().distancePassedPhys;
distance = tempDistance - lastDistance;
lastDistance = tempDistance;
// If Mario hasn't moved for over 25 turns then this means he is stuck.
if(distance == 0){
stuckCount++;
} else {
stuckCount = 0;
stuck = false;
}
if(stuckCount > 25){ stuck = true; }
}
public float calculateReward(){
float reward = 0f;
reward += distance * 2;
if(stuck){ reward += -20; }
return reward;
}
@Override
public boolean equals(Object o) {
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
MarioState that = (MarioState) o;
if (stuck != that.stuck) return false;
return true;
}
@Override
public int hashCode() {
return (stuck ? 1 : 0);
}
}
The problem, however, is that when running the code some of the keys are considered different when they shouldn't be according to their .equals() and .hashcode() methods. What could cause this? Did I forget something?
The code used when inserting states in the HashMap (additional information can be provided if necessary):
public float[] getActionsQValues(MarioState state){
if(!table.containsKey(state)) {
float[] initialQvalues = getInitialQvalues(state);
table.put(state, initialQvalues);
return initialQvalues;
}
return table.get(state);
}
A screenshot taken in debug mode shows my table containing two keys with different values, but the keys themselves are the same (although the HashMap considers them different).
Your hash code computation and equality comparison are both based on stuck - but that can change over time.
If you mutate an object after adding it as a key within a hash map, in such a way that the hash code changes, then the key will not be found when you later request it - because the hash code that was stored when the key was first added will no longer be the same as its current hash code.
Wherever possible, try to avoid using mutable objects as keys within a map (even a TreeMap which doesn't use the hash code would have the same problem if you changed the object in a way which would change relative ordering). If you must use mutable objects as keys within a map, you should avoid mutating them after adding them as keys.
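The effect described above can be reproduced with a small stand-alone sketch. The State class here is a cut-down stand-in for MarioState (only the stuck flag), and the class and method names are invented for the demonstration.

```java
import java.util.HashMap;
import java.util.Map;

// Demonstrates what happens when a key is mutated after being put into a HashMap.
final class MutableKeyDemo {
    // Simplified stand-in for MarioState: equality and hash depend on a mutable field.
    static final class State {
        boolean stuck;

        @Override
        public boolean equals(Object o) {
            return o instanceof State && ((State) o).stuck == stuck;
        }

        @Override
        public int hashCode() {
            return stuck ? 1 : 0;
        }
    }

    /** Inserts a key, mutates it, then looks it up again with the same object. */
    static boolean foundAfterMutation() {
        Map<State, String> table = new HashMap<>();
        State key = new State();
        table.put(key, "q-values");   // stored under hash 0
        key.stuck = true;             // hash is now 1, but the entry still sits in the old bucket
        return table.containsKey(key); // false: the lookup goes to the wrong bucket
    }

    /** Re-inserting the mutated key creates a second entry for the very same object. */
    static int sizeAfterReinsert() {
        Map<State, String> table = new HashMap<>();
        State key = new State();
        table.put(key, "first");
        key.stuck = true;
        table.put(key, "second");
        return table.size();          // 2: the map now holds the same key object twice
    }
}
```

The second method mirrors the symptom in the question: the map ends up with two entries whose keys look identical. One way around it, assuming the design allows it, is to use a separate immutable snapshot of the relevant fields as the map key instead of the live state object.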
I have a file of Integer[]s that is too large to put in memory. I would like to search for all arrays with a last member of x and use them in other code. Is there a way to use Guava's multimap to do this, where x is the key and stored in memory and the Integer[] is the value and that is stored on disk? In this scenario, the keys are not unique, but key-value pairs are unique. Reading of this multimap (assuming that it's possible) will be concurrent. I'm also open to suggestions of other ways to approach this.
Thanks
You could create a class representing an array on disk (based on its index in the file of arrays), let's call it FileBackedIntArray, and put instances of that as the values of a HashMultimap<Integer, FileBackedIntArray>:
public class FileBackedIntArray {
// Index of the array in the file of arrays
private final int index;
private final int lastElement;
public FileBackedIntArray(int index, int lastElement) {
this.index = index;
this.lastElement = lastElement;
}
public int getIndex() {
return index;
}
public int[] readArray() {
// Read the file and deserialize the array at the associated index
return smth;
}
public int getLastElement() {
return lastElement;
}
@Override
public int hashCode() {
return index;
}
@Override
public boolean equals(Object o) {
if (this == o) {
return true;
} else if (o == null || o.getClass() != getClass()) {
return false;
}
return index == ((FileBackedIntArray) o).index;
}
}
Do you actually need an Integer[] rather than an int[] (i.e. can the arrays contain null values)? As you've said in the comments, you don't, so using ints everywhere will avoid boxing/unboxing and save a lot of space, since you appear to have lots of them. Hopefully you don't have a huge number of possible values for the last element (x).
You then create an instance for each array and read the last element to put it in the Multimap without keeping the array around. Populating the Multimap needs to be either sequential or protected with a lock if concurrent, but reading can be concurrent without any protection. You could even create an ImmutableMultimap once the HashMultimap has been populated to guard against any modification, which is a safe practice in a concurrent environment.
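The populate-then-freeze pattern can be sketched with the standard library alone. Note the substitutions: a Map<Integer, List<Integer>> stands in for Guava's HashMultimap, plain array indexes stand in for FileBackedIntArray instances, and the class LastElementIndex and its methods are hypothetical names. The input list is assumed to hold the last element of each array in file order, obtained by one sequential pass over the file.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory index from "last element" to the positions of the arrays in the file.
final class LastElementIndex {
    private final Map<Integer, List<Integer>> indexesByLast;

    private LastElementIndex(Map<Integer, List<Integer>> indexesByLast) {
        this.indexesByLast = indexesByLast;
    }

    /** Builds the index sequentially; lastElements.get(i) is the last element of array i. */
    static LastElementIndex build(List<Integer> lastElements) {
        Map<Integer, List<Integer>> m = new HashMap<>();
        for (int i = 0; i < lastElements.size(); i++) {
            m.computeIfAbsent(lastElements.get(i), k -> new ArrayList<>()).add(i);
        }
        // Freeze everything before publishing so that concurrent readers only ever see a read-only view.
        m.replaceAll((k, v) -> Collections.unmodifiableList(v));
        return new LastElementIndex(Collections.unmodifiableMap(m));
    }

    /** Returns the file positions of all arrays whose last element is x. */
    List<Integer> arraysEndingWith(int x) {
        return indexesByLast.getOrDefault(x, Collections.emptyList());
    }
}
```

For the reads to be safe from several threads, the finished index still has to be handed to them safely, for example through a final field or by starting the reader threads after it is built. Each returned position would then be used to read the corresponding array from disk on demand.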
I am trying to draw lines between different GridPositions (x,y). Every GridPos has 4 connections: North, East, South, West. The problem is that if I paint a line from GridPos(1,1) to GridPos(2,2), the program will later also paint a line in the reverse direction, from GridPos(2,2) to GridPos(1,1).
I tried to solve the problem with this class (WarpGate is the same as GridPos):
public class GateConnection {
private WarpGate gate1 = null;
private WarpGate gate2 = null;
public GateConnection(WarpGate gate1, WarpGate gate2) {
super();
this.gate1 = gate1;
this.gate2 = gate2;
}
@Override
public int hashCode() {
final int prime = 31;
int result = prime * ((gate1 == null) ? 0 : gate1.hashCode());
result += prime * ((gate2 == null) ? 0 : gate2.hashCode());
return result;
}
@Override
public boolean equals(Object obj) {
if (this == obj) {
return true;
}
if (obj == null) {
return false;
}
if (getClass() != obj.getClass()) {
return false;
}
GateConnection other = (GateConnection) obj;
if ((gate1.equals(other.gate1) || gate1.equals(other.gate2)) && (gate2.equals(other.gate2) || gate2.equals(other.gate1))) {
return true;
}
return false;
}
}
This class could be added to a HashSet and the double painting would be gone, but I don't know if the hash value is always unique.
HashCode of WarpGate (auto-generated by eclipse):
@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + gridX;
result = prime * result + gridY;
return result;
}
For now I use an ArrayList: I check whether the GateConnection exists and add it if it doesn't. But this version takes many more resources than using a HashSet.
EDIT:
The white rectangles are the connections which are painted, the numbers are the GridPositions (x|y), and the red arrows are the two directions in which the rectangle is painted, because GridPos(2|2) has a connection to GridPos(4|2) and (4|2) to (2|2).
A TreeSet neither uses hashCode() nor equals(). It uses compareTo(), though you should ensure it is consistent with equals() to respect Set semantics.
For a HashSet, the hashCode() of a stored object does not have to be unique. In fact, you can return the same code for every item if you want and they will still be stored without losing any items, if your equals() is implemented correctly. A good hashCode() will improve performance only.
The only critical rule is that two equal items must generate the same hash code.
Your implementation looks OK as long as you can guarantee that gate1 and gate2 are never equal within the same GateConnection object. If they are equal, two GateConnection objects could have different hash codes but be reported as equal. That would lead to unpredictable behaviour if they are stored in a HashSet.
E.g. GateConnection((1,1), (1,1)) equals GateConnection((1,1), (7,9)) but the hash codes are different.
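One possible way to avoid that corner case is to compare the two endpoints as an unordered pair, so that equals matches either "same order" or "swapped order" and nothing else. The sketch below uses simplified stand-ins: Gate plays the role of WarpGate and UndirectedConnection the role of GateConnection. Both names are invented here, and null gates are rejected as an assumption.

```java
import java.util.Objects;

// Simplified stand-in for WarpGate, using the hash shape from the question.
final class Gate {
    final int gridX;
    final int gridY;

    Gate(int gridX, int gridY) {
        this.gridX = gridX;
        this.gridY = gridY;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Gate && ((Gate) o).gridX == gridX && ((Gate) o).gridY == gridY;
    }

    @Override
    public int hashCode() {
        return 31 * (31 + gridX) + gridY;
    }
}

// Connection whose equality ignores direction: (a, b) equals (b, a).
final class UndirectedConnection {
    private final Gate a;
    private final Gate b;

    UndirectedConnection(Gate a, Gate b) {
        this.a = Objects.requireNonNull(a);
        this.b = Objects.requireNonNull(b);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof UndirectedConnection)) return false;
        UndirectedConnection other = (UndirectedConnection) o;
        // Either both endpoints match in order, or both match swapped.
        return (a.equals(other.a) && b.equals(other.b))
                || (a.equals(other.b) && b.equals(other.a));
    }

    @Override
    public int hashCode() {
        // Addition is commutative, so swapping the endpoints gives the same hash.
        return a.hashCode() + b.hashCode();
    }
}
```

With this pairing, a connection from (1,1) to (1,1) is no longer equal to one from (1,1) to (7,9), and equal connections always share a hash code, so a HashSet drops the reverse duplicate as intended.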