Java: need advice about WeakHashMap

I guess I'm another person trying to build some kind of cache with WeakHashMap, and I need some help with it.
I have a bunch of TrackData objects that contain information about audio tracks. Then there are Track objects that keep a reference to a TrackData inside. Several tracks can point to the same TrackData. Then I have a TrackDataCache class that looks like this:
public class TrackDataCache {
private static TrackDataCache instance = new TrackDataCache();
public static TrackDataCache getInstance() {
return instance;
}
private WeakHashMap<TrackData, WeakReference<TrackData>> cache = new WeakHashMap<TrackData, WeakReference<TrackData>>();
public void cache(Track track) {
TrackData key = track.getTrackData();
WeakReference<TrackData> trackData = cache.get(key);
if (trackData == null) {
cache.put(key, new WeakReference<TrackData>(key));
} else {
track.setTrackData(trackData.get());
}
}
}
So when I load a track, I call TrackDataCache.cache(). If its track data was not loaded before, it is cached; otherwise it is replaced with the cached copy (TrackData overrides equals() to compare location and subsong index). I want to use weak references so that I don't need to care about when I remove Tracks.
Is it OK practice to keep a weak reference to the key in a WeakHashMap, and if not, how should I approach this problem? I need weak references and constant-time retrieval of cached values. I was thinking of copying the WeakHashMap code and making the getEntry() method public, which solves the problem, but it's such a bad hack :(
PS. I understand that either Apache or Google collections may have something like this, but I really don't want to add 2 MB of dependencies.

I'd recommend replacing the WeakReferences with SoftReferences.
Any object that is referenced only by a WeakReference is a candidate for collection on every garbage-collection cycle. This means your cache can be cleared even if there's still plenty of free memory.
If you replace WeakReference with SoftReference, you are saying: remove the referenced object only when memory is running low. (All soft references are guaranteed to be cleared before an OutOfMemoryError is thrown.)
There's no ready-to-use SoftHashMap implementation in Java. There is a good example in Guava: MapMaker. It's worth using this well-tested, production-proven code rather than writing your own implementation, which will almost certainly be of lower quality. It also has a nice 'self-cleaning' mechanism:
you can specify the cache's maximum size:
"As the map size grows close to the maximum, the map will evict entries that are less likely to be used again. For example, the map may evict an entry because it hasn't been used recently or very often."
you can specify an expiration time for map entries with the expireAfterWrite and expireAfterAccess methods.
I also find your cache design not very convenient. As I understand from your code snippet, your Tracks start out with strong references to their TrackData, and you build your cache on top of that. But at some point you will want to use the cache to retrieve data, so you'll have to create new Tracks in some other way, because from then on you want to go through the cache rather than through strong references.
Different Tracks can have the same TrackData, so we can't use Track as a key. So I'd go with the following approach:
introduce an intermediate level of ids and base the cache on a Map<Integer, TrackData> with soft values and a defined self-cleaning strategy (based on MapMaker);
change the relation Track --> TrackData to Track --> Id (int), and cache Id --> TrackData.
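The id-to-data cache with soft values described above could be sketched with the JDK alone, roughly as follows. The class name SoftValueCache and its methods are hypothetical, and unlike MapMaker this sketch has no size limit or expiry; it only illustrates the soft-value idea.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

// Hypothetical JDK-only cache: strong keys, softly referenced values.
class SoftValueCache<K, V> {
    // A SoftReference that remembers its key so a cleared entry can be removed.
    private static final class Entry<K, V> extends SoftReference<V> {
        final K key;

        Entry(K key, V value, ReferenceQueue<V> queue) {
            super(value, queue);
            this.key = key;
        }
    }

    private final Map<K, Entry<K, V>> map = new HashMap<>();
    private final ReferenceQueue<V> queue = new ReferenceQueue<>();

    public synchronized V get(K key) {
        expungeCleared();
        Entry<K, V> e = map.get(key);
        return e == null ? null : e.get(); // null if absent or already collected
    }

    public synchronized void put(K key, V value) {
        expungeCleared();
        map.put(key, new Entry<>(key, value, queue));
    }

    public synchronized int size() {
        expungeCleared();
        return map.size();
    }

    // Drop map entries whose values the garbage collector has cleared.
    @SuppressWarnings("unchecked")
    private void expungeCleared() {
        Entry<K, V> e;
        while ((e = (Entry<K, V>) queue.poll()) != null) {
            map.remove(e.key, e); // only if the key still maps to this stale entry
        }
    }
}
```

A Track would then hold only the int id and call get(id), reloading the data from the file whenever the cache returns null.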

TrackData can be shared by many instances of Track. We need a key system that doesn't require a TrackData in order to obtain the same instance for several Tracks.
public class Track {
    @Override
    public int hashCode() {
        // ... make a hash code that will be the same for
        // ... tracks sharing the same track data.
    }
    @Override
    public boolean equals(Object other) {
        // ... ensure that tracks sharing the same track data are equal
        // ... (and that equal tracks always have equal hash codes)
    }
}
public class TrackDataManager {
    private WeakHashMap<Track,TrackData> cache = new WeakHashMap<Track,TrackData>();
    public TrackData getTrackData(Track track) {
        // Track.hashCode()/equals() ensures two tracks that
        // share track data will get the same object back
        TrackData data = cache.get(track);
        if (data == null) {
            data = constructDataFromTrackFile(track);
            cache.put(track, data);
        }
        return data;
    }
    private TrackData constructDataFromTrackFile(Track track) {
        // ... read data from file and create that object.
    }
}
If the construction of the TrackData object is always going to happen as part of reading the file, but the created instance is being thrown away in favour of the shared instance, I'd model that like this:
public class TrackData {
    @Override
    public int hashCode() {
        // ... make a hash code that will be the same for the same track data.
    }
    @Override
    public boolean equals(Object other) {
        // ... ensure that equal track data always have equal hash codes
    }
}
public class TrackDataCache {
    // Caveats: keying by hash code assumes different track data never collide, and a boxed
    // Integer key is only weakly reachable, so an entry can disappear at any GC.
    private WeakHashMap<Integer,TrackData> cache = new WeakHashMap<Integer,TrackData>();
    public TrackData getTrackData(Track track) {
        // cache contains shared TrackData instances, we may throw away
        // the Track's own instance in favour of the shared one.
        Integer key = track.getTrackData().hashCode();
        TrackData data = cache.get(key);
        if (data == null) {
            cache.put(key, track.getTrackData());
            data = track.getTrackData();
        } else {
            // ensure we're using the shared instance, not the local one.
            // deliberate object reference comparison
            if (data != track.getTrackData()) {
                track.setTrackData(data);
            }
        }
        return data;
    }
}
Notice that the WeakHashMap will not do anything in either of the two solutions as long as there are Track objects alive that keep references to the TrackData. This could be fixed by holding a WeakReference inside Track. However, that also means you can end up with no TrackData at all and need to read it back from the file, in which case the first solution models the problem better than the second.
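Holding the data through a WeakReference and reading it back when it has been collected might look roughly like this. ReloadingRef and its loader are hypothetical names; in the Track case the loader would be whatever re-reads the track file, and clearForTest() exists only to simulate the collector.

```java
import java.lang.ref.WeakReference;
import java.util.function.Supplier;

// Hypothetical holder: keeps its data weakly and reloads it once the GC has cleared it.
class ReloadingRef<T> {
    private final Supplier<T> loader; // e.g. re-reads the track file
    private WeakReference<T> ref = new WeakReference<>(null);

    ReloadingRef(Supplier<T> loader) {
        this.loader = loader;
    }

    synchronized T get() {
        T value = ref.get();
        if (value == null) { // never loaded, or collected since the last call
            value = loader.get();
            ref = new WeakReference<>(value);
        }
        return value; // the caller now holds a strong reference
    }

    // Test hook that simulates the collector clearing the reference.
    synchronized void clearForTest() {
        ref.clear();
    }
}
```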

Related

Omitting an instance field at run time in Java

Java's assert mechanism allows putting in assertions that have essentially no run-time cost (aside from a bigger class file) when assertions are disabled. But this may not cover all situations.
For instance, many of Java's collections feature "fail-fast" iterators that attempt to detect when you're using them in a thread-unsafe way. But this requires both the collection and the iterator itself to maintain extra state that would not be needed if these checks weren't there.
Suppose someone wanted to do something similar, but allow the checks to be disabled and if they are disabled, it saves a few bytes in the iterator and likewise a few more bytes in the ArrayList, or whatever.
Alternatively, suppose we're doing some sort of object pooling that we want to be able to turn on and off at runtime; when it's off, it should just use Java's garbage collection and take no room for reference counts, like this (note that the code as written is very broken):
class MyClass {
static final boolean useRefCounts = my.global.Utils.useRefCounts();
static {
if(useRefCounts)
int refCount; // want instance field, not local variable
}
void incrementRefCount(){
if(useRefCounts) refCount++; // only use field if it exists;
}
/**return true if ready to be collected and reused*/
boolean decrementAndTestRefCount(){
// rely on Java's garbage collector if ref counting is disabled.
return useRefCounts && --refCount == 0;
}
}
The trouble with the above code is that the static block makes no sense. But is there some trick using low-powered magic to make something along these lines work? (If high-powered magic is allowed, the nuclear option is to generate two versions of MyClass and arrange to put the correct one on the class path at start time.)

NOTE: You might not need to do this at all. The JIT is very good at inlining constants known at runtime, especially booleans, and at optimising away code that isn't used.
The int field is not ideal; however, if you are using a 64-bit JVM, the object size might not change.
On the OpenJDK/Oracle JVM (64-bit), the header is 12 bytes by default. Object alignment is 8 bytes, so the object will use 16 bytes. The field adds 4 bytes, which after alignment is still 16 bytes.
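The arithmetic above can be written out as a small helper. ObjectSizeEstimate is a hypothetical name, and the constants are just the defaults quoted above (12-byte header, 8-byte alignment); real layouts depend on the JVM and its flags.

```java
// Hypothetical helper reproducing the size arithmetic; actual layouts are JVM-specific.
class ObjectSizeEstimate {
    static final int HEADER_BYTES = 12; // default header size quoted above
    static final int ALIGNMENT = 8;     // default object alignment

    // Round a size up to the next multiple of ALIGNMENT.
    static int align(int size) {
        return (size + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    // Estimated size of an object whose instance fields occupy fieldBytes.
    static int sizeWithFieldBytes(int fieldBytes) {
        return align(HEADER_BYTES + fieldBytes);
    }
}
```

With these assumptions, an object with no fields and an object with one int field both come out at 16 bytes, while a second int field would push the estimate to 24.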
To answer the question, you need two classes (unless you use generated code or hacks)
class MyClass {
static final boolean useRefCounts = my.global.Utils.useRefCounts();
public static MyClass create() {
return useRefCounts ? new MyClassPlus() : new MyClass();
}
void incrementRefCount() {
}
boolean decrementAndTestRefCount() {
return false;
}
}
class MyClassPlus extends MyClass {
int refCount; // want instance field, not local variable
void incrementRefCount() {
refCount++; // only use field if it exists;
}
boolean decrementAndTestRefCount() {
return --refCount == 0;
}
}

If you accept a slightly higher overhead in the case you’re using your ref count, you may resort to external storage, i.e.
class MyClass {
static final WeakHashMap<MyClass,Integer> REF_COUNTS
= my.global.Utils.useRefCounts()? new WeakHashMap<>(): null;
void incrementRefCount() {
if(REF_COUNTS != null) REF_COUNTS.merge(this, 1, Integer::sum);
}
/**return true if ready to be collected and reused*/
boolean decrementAndTestRefCount() {
return REF_COUNTS != null
&& REF_COUNTS.compute(this, (me, i) -> --i == 0? null: i) == null;
}
}
There is a behavioral difference for the case that someone invokes decrementAndTestRefCount() more often than incrementRefCount(). While your original code silently runs into a negative ref count, this code will throw a NullPointerException. I prefer failing with an exception in this case…
The code above will leave you with the overhead of a single static field in case you’re not using the feature. Most JVMs should have no problems eliminating the conditionals regarding the state of a static final variable.
Note further that the code allows MyClass instances to get garbage collected while having a non-zero ref count, just like when it was an instance field, but also actively removes the mapping when the count reaches the initial state of zero again, to minimize the work needed for cleanup.

reading a reference to an object and reading the object’s fields under JMM

This post was raised after reading: https://shipilev.net/blog/2016/close-encounters-of-jmm-kind/#pitfall-semi-sync
class Box {
int x;
public Box(int v) {
x = v;
}
}
class RacyBoxy {
Box box;
public synchronized void set(Box v) {
box = v;
}
public Box get() {
return box;
}
}
and test:
@JCStressTest
@State
public class SynchronizedPublish {
    RacyBoxy boxie = new RacyBoxy();
    @Actor
    void actor() {
        boxie.set(new Box(42)); // set is synchronized
    }
    @Actor
    void observer(IntResult1 r) {
        Box t = boxie.get(); // get is not synchronized
        if (t != null) {
            r.r1 = t.x;
        } else {
            r.r1 = -1;
        }
    }
}
The author says that it is possible that r.r1 == 0, and I agree with that. But I am confused by the explanation:
The actual failure comes from the fact that reading a reference to an object and reading the object’s fields are distinct under the memory model.
I agree that
reading a reference to an object and reading the object’s fields are distinct under the memory model
but I don't see how it influences the result.
Please help me understand it.
P.S. If someone is confused about @Actor: it just means "run in a thread".

I think it addresses a common misconception among people who reason about code in terms of sequential consistency. The fact that a reference to an instance is visible in one thread does not imply that the effects of its constructor are visible there too. In other words: reading a reference to an instance is a different operation from reading the instance's fields. Many people assume that once they can observe an instance, its constructor must have run completely, but due to the missing synchronization on the read side, this is not true for the above example.
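Two common ways to close this gap are sketched below: make the field final, or publish the reference through a volatile field (synchronizing get() would work as well). The class names FinalBox, PlainBox and VolatileBoxy are hypothetical and not part of the blog post's test.

```java
// Fix 1: a final field is guaranteed visible to any thread that sees the reference,
// provided 'this' does not escape from the constructor.
class FinalBox {
    final int x;

    FinalBox(int v) {
        x = v;
    }
}

// Same shape as the Box in the question: a plain, non-final field.
class PlainBox {
    int x;

    PlainBox(int v) {
        x = v;
    }
}

// Fix 2: the volatile write in set() happens-before a volatile read in get() that sees it,
// so a reader that observes the reference also observes x == v.
class VolatileBoxy {
    private volatile PlainBox box;

    void set(PlainBox v) {
        box = v;
    }

    PlainBox get() {
        return box;
    }
}
```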

I'll just slightly augment the accepted answer here: without some barriers there are absolutely no guarantees that, once you see a reference (think of some thread getting hold of it), all the fields set in that constructor are initialized. I actually answered something like this some time ago on one of your questions, if I'm not mistaken.
There are two barriers inserted at the end of a constructor of a class that has final fields, StoreStore and LoadStore; if you think about their names, you will notice that the store that publishes the reference after the constructor cannot be re-ordered with the loads and stores inside it (Box.x in the question is not final, so it gets no such guarantee):
Store -> Store (no Store can be re-ordered with a previous Store)
Load -> Store (no Store can be re-ordered with a previous Load)
Also note that it would be impossible for you to break that under the current x86 memory model, as it is a (too?) strong memory model; these barriers are free on x86 and are not emitted as instructions at all, because the hardware does not re-order those operations.

How to design a model to allow apply or cancel updates?

Is there a design or development pattern where we deal with making updates to a copy of the actual data and applying the diff to the original reference if needed?
If not, what is the best way of designing such models?
What I think I should do:
I should probably use an enum mode to indicate whether the model is being used in 'update direct reference' mode or 'update only a copy' mode.
Update the setters and getters of the data to reference actualState or temporaryState, depending on the mode the model is in.
Have the setter for the mode create a copy of the actual data and store it in the temporary state. If the mode is switched to 'update direct reference', clear out temporaryState.
Create a method for applying the changes from temporaryState to actualState. This method shall also clear the temporary state from memory.
In code:
enum InsertionMode {
UPDATE_DIRECT, UPDATE_COPY
}
class Store {
private Data actualState;
private Data temporaryState;
private InsertionMode mode;
private void resetTemporaryState() {
....
}
private void initTemporaryState() {
this.temporaryState = copy(actualState);
}
private void commitTemporaryState() {
this.actualState = this.temporaryState;
this.resetTemporaryState();
}
public void setInsertionMode(InsertionMode mode) {
if (this.mode != mode) {
InsertionMode previousMode = this.mode;
this.mode = mode;
if (previousMode == InsertionMode.UPDATE_COPY) {
this.resetTemporaryState();
}
if (this.mode == InsertionMode.UPDATE_COPY) {
this.initTemporaryState();
}
}
}
public void commit() {
if (this.mode == InsertionMode.UPDATE_COPY) {
this.commitTemporaryState();
}
}
public void abort() {
if (this.mode == InsertionMode.UPDATE_COPY) {
this.resetTemporaryState();
this.setInsertionMode(InsertionMode.UPDATE_DIRECT);
}
}
...
}

The given code is "okay", in that it will support your requirements.
But: updating objects in place is a simple approach that is easy to implement. Depending on your context, though, things are often done quite differently in 2017.
Instead of having one mutable object that changes state, you could go for immutable objects. State becomes a sequence of such objects.
Reaching a new state means adding a newly created object at the end of the sequence; cancelling means going with the old, unchanged sequence. This approach is the basis for blockchain applications, but it can be scaled down to a smaller context as well, just by looking at its core aspect: you never change state by changing existing objects, but by creating new objects. Of course, this needs a lot of thought; you don't want to blindly duplicate everything. You are more likely looking at having "delta" objects (that represent individual changes) and "views" that show aggregations of deltas.
Beyond that: you might want to read about CQRS versus CRUD.
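Scaled down to a single object, the immutable-state idea described above could look like this minimal sketch. VersionedStore is a hypothetical name, and the sketch assumes the state type T is itself immutable (a String, a record, an unmodifiable collection) and never null.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.UnaryOperator;

// Hypothetical store: state objects are never mutated, only replaced.
class VersionedStore<T> {
    private final Deque<T> history = new ArrayDeque<>(); // newest state on top

    VersionedStore(T initial) {
        history.push(initial);
    }

    T current() {
        return history.peek();
    }

    // "Apply": derive a new state from the current one and record it.
    void apply(UnaryOperator<T> change) {
        history.push(change.apply(current()));
    }

    // "Cancel": drop the newest state and fall back to the previous one.
    boolean undo() {
        if (history.size() <= 1) {
            return false; // nothing to undo, keep the initial state
        }
        history.pop();
        return true;
    }
}
```

Because old states are kept rather than overwritten, cancelling needs no copy-back step; it just discards the newest entry.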

Java Server Client, shared variable between threads

I am working on a project to create a simple auction server that multiple clients connect to. The server class implements Runnable and so creates a new thread for each client that connects.
I am trying to have the current highest bid stored in a variable that can be seen by each client. I found answers saying to use AtomicInteger, but when I used it with methods such as atomicVariable.intValue() I got null pointer exception errors.
In what ways can I manipulate the AtomicInteger without getting this error, or is there another way to have a shared variable that is relatively simple?
Any help would be appreciated, thanks.
Update
I have the AtomicInteger working. The problem now is that only the most recent client to connect to the server seems to be able to interact with it. The other clients just sort of freeze.
Would I be correct in saying this is a problem with locking?

Well, most likely you forgot to initialize it:
private final AtomicInteger highestBid = new AtomicInteger();
However working with highestBid requires a great deal of knowledge to get it right without any locking. For example if you want to update it with new highest bid:
public boolean saveIfHighest(int bid) {
int currentBid = highestBid.get();
while (currentBid < bid) {
if (highestBid.compareAndSet(currentBid, bid)) {
return true;
}
currentBid = highestBid.get();
}
return false;
}
or in a more compact way:
for(int currentBid = highestBid.get(); currentBid < bid; currentBid = highestBid.get()) {
if (highestBid.compareAndSet(currentBid, bid)) {
return true;
}
}
return false;
You might wonder why it is so hard. Imagine two threads (requests) bidding at the same time. The current highest bid is 10. One is bidding 11, the other 12. Both threads compare their bid with the current highestBid and see that theirs is bigger. Now the second thread happens to go first and updates it to 12. Unfortunately the first request now steps in and reverts it to 11 (because it has already checked the condition).
This is a typical race condition that you can avoid either by explicit synchronization or by using atomic variables with their implicit low-level compare-and-set support.
Seeing the complexity introduced by the much more performant lock-free atomic integer, you might want to resort to classic synchronization:
// here highestBid is a plain int field, not an AtomicInteger
public synchronized boolean saveIfHighest(int bid) {
    if (highestBid < bid) {
        highestBid = bid;
        return true;
    } else {
        return false;
    }
}
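On Java 8 and later, the compare-and-set retry loop can also be left to AtomicInteger itself. This is only a sketch; the wrapper class HighestBid is a hypothetical name.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical holder for the shared highest bid (needs Java 8+).
class HighestBid {
    private final AtomicInteger highestBid = new AtomicInteger();

    // Atomically raises the stored value to max(current, bid) and
    // returns true only if this bid became the new highest.
    boolean saveIfHighest(int bid) {
        int previous = highestBid.getAndAccumulate(bid, Math::max);
        return previous < bid;
    }

    int get() {
        return highestBid.get();
    }
}
```

getAndAccumulate returns the value that was stored before the update, so comparing it with the bid tells the caller whether the bid actually won.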

I wouldn't look at the problem like that. I would simply store all the bids in a ConcurrentSkipListSet, which is a thread-safe SortedSet. With the correct implementation of compareTo(), which determines the ordering, the first element of the Set will automatically be the highest bid.
Here's some sample code:
import java.util.Date;
import java.util.SortedSet;
import java.util.concurrent.ConcurrentSkipListSet;
public class Bid implements Comparable<Bid> {
    String user;
    int amountInCents;
    Date created;
    @Override
    public int compareTo(Bid o) {
        if (amountInCents == o.amountInCents) {
            return created.compareTo(o.created); // earlier bids sort first
        }
        return Integer.compare(o.amountInCents, amountInCents); // larger bids sort first
    }
}
public class Auction {
    private SortedSet<Bid> bids = new ConcurrentSkipListSet<Bid>();
    public Bid getHighestBid() {
        return bids.isEmpty() ? null : bids.first();
    }
    public void addBid(Bid bid) {
        bids.add(bid);
    }
}
Doing this has the following advantages:
Automatically provides a bidding history
Allows a simple way to save any other bid info you need
You could also consider this method:
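/**
 * @param bid the bid to place
 * @return true if the bid was successful
 */
public boolean makeBid(Bid bid) {
    // note: the check and the add are not atomic; synchronize if bids can race
    if (bids.isEmpty()) {
        bids.add(bid);
        return true;
    }
    // larger bids sort first, so a result >= 0 means the bid does not beat the current highest
    if (bid.compareTo(bids.first()) >= 0) {
        return false;
    }
    bids.add(bid);
    return true;
}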

Using an AtomicInteger is fine, provided you initialise it as Tomasz has suggested.
What you might like to think about, however, is whether all you will literally ever need to store is just the highest bid as an integer. Will you never need to store associated information, such as the bidding time, user ID of the bidder etc? Because if at a later stage you do, you'll have to start undoing your AtomicInteger code and replacing it.
I would be tempted from the outset to set things up to store arbitrary information associated with the bid. For example, you can define a "Bid" class with the relevant field(s). Then on each bid, use an AtomicReference to store an instance of "Bid" with the relevant information. To be thread-safe, make all the fields on your Bid class final.
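A minimal sketch of that idea follows. ImmutableBid and HighestBidHolder are hypothetical names (chosen so as not to clash with the Bid class in the other answer), and the fields shown are only examples of what a bid might carry.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical immutable bid: all fields final, so instances are safe to share.
final class ImmutableBid {
    final String user;
    final int amountInCents;

    ImmutableBid(String user, int amountInCents) {
        this.user = user;
        this.amountInCents = amountInCents;
    }
}

class HighestBidHolder {
    private final AtomicReference<ImmutableBid> highest = new AtomicReference<>();

    // Same compare-and-set retry loop as with AtomicInteger, applied to a reference.
    boolean offer(ImmutableBid bid) {
        while (true) {
            ImmutableBid current = highest.get();
            if (current != null && current.amountInCents >= bid.amountInCents) {
                return false; // not higher than the current best
            }
            if (highest.compareAndSet(current, bid)) {
                return true;
            }
            // another thread changed the value in between; re-read and try again
        }
    }

    ImmutableBid current() {
        return highest.get();
    }
}
```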
You could also consider using an explicit Lock (e.g. see the ReentrantLock class) to control access to the highest bid. As Tomasz mentions, even with an AtomicInteger (or AtomicReference: the logic is essentially the same) you need to be a little careful about how you access it. The atomic classes are really designed for cases where they are very frequently accessed (as in thousands of times per second, not every few minutes as on a typical auction site). They won't really give you any performance benefit here, and an explicit Lock object might be more intuitive to program with.
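For comparison, the explicit-lock variant might look like the following sketch, assuming the highest bid is a plain int; LockedAuction is a hypothetical name.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical auction state guarded by an explicit lock.
class LockedAuction {
    private final ReentrantLock lock = new ReentrantLock();
    private int highestBid; // only read or written while holding lock

    boolean saveIfHighest(int bid) {
        lock.lock();
        try {
            if (bid > highestBid) {
                highestBid = bid;
                return true;
            }
            return false;
        } finally {
            lock.unlock(); // always release, even if an exception is thrown
        }
    }

    int getHighestBid() {
        lock.lock();
        try {
            return highestBid;
        } finally {
            lock.unlock();
        }
    }
}
```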

Simple database-like collection class in Java

The problem: Maintain a bidirectional many-to-one relationship among java objects.
Something like the Google/Commons Collections bidi maps, but I want to allow duplicate values on the forward side, and have sets of the forward keys as the reverse side values.
Used something like this:
// maintaining disjoint areas on a gameboard. Location is a space on the
// gameboard; Regions refer to disjoint collections of Locations.
MagicalManyToOneMap<Location, Region> forward = // the game universe
Map<Region, Set<Location>> inverse = forward.getInverse(); // live, not a copy
Location parkplace = Game.chooseSomeLocation(...);
Region mine = forward.get(parkplace); // assume !null; should be O(log n)
Region other = Game.getSomeOtherRegion(...);
// moving a Location from one Region to another:
forward.put(parkplace, other);
// or equivalently:
inverse.get(other).add(parkplace); // should also be O(log n) or so
// expected consistency:
assert ! inverse.get(mine).contains(parkplace);
assert forward.get(parkplace) == other;
// and this should be fast, not iterate every possible location just to filter for mine:
for (Location l : mine) { /* do something clever */ }
The simple java approaches are: 1. To maintain only one side of the relationship, either as a Map<Location, Region> or a Map<Region, Set<Location>>, and collect the inverse relationship by iteration when needed; Or, 2. To make a wrapper that maintains both sides' Maps, and intercept all mutating calls to keep both sides in sync.
1 is O(n) instead of O(log n), which is becoming a problem. I started in on 2 and was in the weeds straightaway. (Know how many different ways there are to alter a Map entry?)
This is almost trivial in the sql world (Location table gets an indexed RegionID column). Is there something obvious I'm missing that makes it trivial for normal objects?

I might misunderstand your model, but if your Location and Region have correct equals() and hashCode() implementations, then the Location -> Region mapping is just a classic, simple Map (multiple distinct keys can point to the same value object). The Region -> Set of Location mapping is a Multimap (available in Google Collections). You could compose your own class with the proper add/remove methods to manipulate both submaps.
It may be overkill, but you could also use an in-memory SQL database (HSQLDB, etc.). It allows you to create indexes on many columns.

I think you could achieve what you need with the following two classes. While it does involve two maps, they are not exposed to the outside world, so there shouldn't be a way for them to get out of sync. As for storing the same "fact" twice, I don't think you'll get around that in any efficient implementation, whether the fact is stored twice explicitly as it is here, or implicitly as it would be when your database creates an index to make joins more efficient on your two tables. You can add new things to the MagicSet and it will update both mappings, or you can add things to the MagicMapper, which will then update the inverse map automatically. I can't run this through a compiler right now, so treat it as a sketch; it should be enough to get you started. What puzzle are you trying to solve?
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;
public class MagicSet<L, R> {
    private final Map<L, R> forward;
    private final R r;
    private final Set<L> set;
    public MagicSet(Map<L, R> forward, R r) {
        this.forward = forward;
        this.r = r;
        this.set = new HashSet<L>();
    }
    public void add(L l) {
        set.add(l);
        forward.put(l, r);
    }
    public void remove(L l) {
        set.remove(l);
        forward.remove(l);
    }
    public int size() {
        return set.size();
    }
    public boolean contains(L l) {
        return set.contains(l);
    }
    // caution, do not use the remove method from this iterator. if this class was going
    // to be reused often you would want to return a wrapped iterator that handled the remove method properly.
    // In fact, if you did that, I think you could then extend AbstractSet and MagicSet would then fully implement java.util.Set.
    public Iterator<L> iterator() {
        return set.iterator();
    }
}
// note that it doesn't implement Map, though it could with some extra work.
// I don't get the impression you need that though.
public class MagicMapper<L, R> {
    private final Map<L, R> forward;
    private final Map<R, MagicSet<L, R>> inverse;
    public MagicMapper() {
        forward = new HashMap<L, R>();
        inverse = new HashMap<R, MagicSet<L, R>>();
    }
    public R getForward(L key) {
        return forward.get(key);
    }
    // returns a MagicSet rather than a java.util.Set, since MagicSet does not implement Set
    public MagicSet<L, R> getBackward(R key) {
        return inverse.get(key); // this assumes you want a null if
        // you try to use a key that has no mapping. otherwise you'd return a blank MagicSet
    }
    public void put(L l, R r) {
        R oldVal = forward.get(l);
        // if the L had already belonged to an R, we need to undo that mapping
        MagicSet<L, R> oldSet = inverse.get(oldVal);
        if (oldSet != null) { oldSet.remove(l); }
        // now get the set that belongs to the new R, and add the L to it.
        MagicSet<L, R> newSet = inverse.get(r);
        if (newSet == null) {
            newSet = new MagicSet<L, R>(forward, r);
            inverse.put(r, newSet);
        }
        newSet.add(l); // magically updates the "forward" map
    }
}
