I'm implementing an LRU cache for user photos, using Commons Collections LRUMap (which is basically a LinkedHashMap with small modifications). The findPhoto method can be called several hundred times within a few seconds.
public class CacheHandler {

    private static final int MAX_ENTRIES = 1000;

    private static Map<Long, Photo> photoCache = Collections.synchronizedMap(new LRUMap(MAX_ENTRIES));

    public static Map<Long, Photo> getPhotoCache() {
        return photoCache;
    }
}
Usage:
public Photo findPhoto(Long userId) {
    User user = userDAO.find(userId);
    if (user != null) {
        Map<Long, Photo> cache = CacheHandler.getPhotoCache();
        Photo photo = cache.get(userId);
        if (photo == null) {
            if (user.isFromAD()) {
                try {
                    photo = LDAPService.getInstance().getPhoto(user.getLogin());
                } catch (LDAPSearchException e) {
                    throw new EJBException(e);
                }
            } else {
                log.debug("Fetching photo from DB for external user: " + user.getLogin());
                UserFile file = userDAO.findUserFile(user.getPhotoId());
                if (file != null) {
                    photo = new Photo(file.getFilename(), "image/png", file.getFileData());
                }
            }
            cache.put(userId, photo);
        } else {
            log.debug("Fetching photo from cache, user: " + user.getLogin());
        }
        return photo;
    } else {
        return null;
    }
}
As you can see, I'm not using synchronization blocks. I'm assuming the worst-case scenario here is a race condition that causes two threads to run cache.put(userId, photo) for the same userId. But the data will be the same for both threads, so that is not an issue.
Is my reasoning correct? If not, is there a way to use a synchronization block without taking a large performance hit? Allowing only one thread to access the map at a time feels like overkill.
Assylias is right that what you've got will work fine.
However, if you want to avoid fetching images more than once, that is also possible, with a bit more work. The insight is this: if a thread gets a cache miss and starts loading an image, then a second thread that wants the same image before the first has finished loading it should wait for the first thread, rather than going and loading it itself.
This is fairly easy to coordinate using some of Java's simpler concurrency classes.
Firstly, let me refactor your example to pull out the interesting bit. Here's what you wrote:
public Photo findPhoto(User user) {
    Map<Long, Photo> cache = CacheHandler.getPhotoCache();
    Photo photo = cache.get(user.getId());
    if (photo == null) {
        photo = loadPhoto(user);
        cache.put(user.getId(), photo);
    }
    return photo;
}
Here, loadPhoto is a method that does the actual nitty-gritty of loading a photo, the details of which aren't relevant. I assume that the validation of the user is done in another method which calls this one. Other than that, this is your code.
What we do instead is this:
public Photo findPhoto(final User user) throws InterruptedException, ExecutionException {
    Map<Long, Future<Photo>> cache = CacheHandler.getPhotoCache();
    Future<Photo> photo;
    FutureTask<Photo> task;
    synchronized (cache) {
        photo = cache.get(user.getId());
        if (photo == null) {
            task = new FutureTask<Photo>(new Callable<Photo>() {
                @Override
                public Photo call() throws Exception {
                    return loadPhoto(user);
                }
            });
            photo = task;
            cache.put(user.getId(), photo);
        } else {
            task = null;
        }
    }
    if (task != null) task.run();
    return photo.get();
}
Note that you need to change the type of CacheHandler.photoCache to accommodate the wrapping FutureTasks. And since this code does its own explicit locking, you can remove the synchronizedMap wrapper. You could also use a ConcurrentMap for the cache, which would allow the use of putIfAbsent, a more concurrent alternative to the lock/get/check-for-null/put/unlock sequence.
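For illustration, a minimal sketch of that putIfAbsent variant could look like this (java.util.concurrent imports assumed; this is the classic memoizer pattern applied to the example above, not code from the original question):

public Photo findPhoto(final User user) throws InterruptedException, ExecutionException {
    // assumes CacheHandler now exposes a ConcurrentMap<Long, Future<Photo>>
    ConcurrentMap<Long, Future<Photo>> cache = CacheHandler.getPhotoCache();

    Future<Photo> photo = cache.get(user.getId());
    if (photo == null) {
        FutureTask<Photo> task = new FutureTask<Photo>(new Callable<Photo>() {
            @Override
            public Photo call() throws Exception {
                return loadPhoto(user);
            }
        });
        Future<Photo> existing = cache.putIfAbsent(user.getId(), task);
        if (existing == null) {
            photo = task;
            task.run();       // this thread won the race, so it does the loading
        } else {
            photo = existing; // another thread is already loading this photo
        }
    }
    return photo.get();       // blocks until the photo is ready
}

With this version no explicit synchronized block is needed at all, and a given photo is still only loaded once.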
Hopefully, what is happening here is fairly obvious. The basic pattern of getting something from the cache, checking to see if what you got was null, and if so putting something back in is still there. But instead of putting in a Photo, you put in a Future, which is essentially a placeholder for a Photo which may not (or may) be there right at that moment, but which will become available later. The get method on Future gets the thing that a place is being held for, blocking until it arrives if necessary.
This code uses FutureTask as an implementation of Future; this takes a Callable capable of producing a Photo as a constructor argument, and calls it when its run method is called. The call to run is guarded with a test that essentially recapitulates the if (photo == null) test from earlier, but outside the synchronized block (because as you realised, you really don't want to be loading photos while holding the cache lock).
This is a pattern I've seen or needed a few times. It's a shame it's not built into the standard library somewhere.
Yes, you are right: if the photo creation is idempotent (it always returns the same photo), the worst that can happen is that you fetch it more than once and put it into the map more than once.
Related
I am just getting started with IBKR API on Java. I am following the API sample code, specifically the options chain example, to figure out how to get options chains for specific stocks.
The example works well for this, but I have one question - how do I know once ALL data has been loaded? There does not seem to be a way to tell. The sample code is able to tell when each individual row has been loaded, but there doesn't seem to be a way to tell when ALL strikes have been successfully loaded.
I thought that using tickSnapshotEnd() would be beneficial, but it does not seem to work as I would expect. I would expect it to be called once for every request that completes. For example, if I query a stock like SOFI for the 2022/03/18 expiry, I see that there are 35 strikes, but tickSnapshotEnd() is called 40+ times, with some strikes repeated more than once.
Note that I am doing requests for snapshot data, not live/streaming data
reqOptionsMktData is obviously a method in the sample code you are using. I'm not sure which particular code you're using, so this is a general response.
Firstly, you are correct: there is no way to tell via the API; this must be done by the client. The API will, of course, provide the requestId that was used when the request was made. The client needs to remember what each requestId was for and decide how to process that information when it is received in the callbacks.
This can be done via a dictionary or hashtable; upon receiving data in the callback, check whether the chain is complete.
Message delivery from the API often has unexpected results; receiving extra messages is common and is something the client needs to take into account. Consider the API stateless, and track everything in the client.
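Since the question mentions the Java API, here is a rough Java sketch of that bookkeeping (the class and method names are made up; wire onTickPrice into whichever tickPrice callback your sample code uses, where tick field 1 is the bid and field 2 is the ask):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class OptionQuote {
    final double strike;
    final String expiry;
    Double bid;   // null until a bid tick arrives
    Double ask;   // null until an ask tick arrives

    OptionQuote(double strike, String expiry) {
        this.strike = strike;
        this.expiry = expiry;
    }

    boolean isComplete() {
        return bid != null && ask != null;
    }
}

class ChainTracker {
    // one entry per market data request, keyed by the tickerId you passed to the request
    private final Map<Integer, OptionQuote> chain = new ConcurrentHashMap<Integer, OptionQuote>();

    void register(int reqId, double strike, String expiry) {
        chain.put(reqId, new OptionQuote(strike, expiry));
    }

    // Call this from your tickPrice callback.
    void onTickPrice(int reqId, int field, double price) {
        OptionQuote q = chain.get(reqId);
        if (q == null) return;
        if (field == 1) q.bid = price;
        if (field == 2) q.ask = price;
    }

    boolean chainComplete() {
        for (OptionQuote q : chain.values()) {
            if (!q.isComplete()) return false;
        }
        return true;
    }
}

Once chainComplete() returns true after a tick, you know every strike you registered has both sides of the quote.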
You seem to be referring to Regulatory Snapshots; I would encourage you to look at the cost. It could quite quickly add up to the price of streaming live data. Add to that the 1 request/sec limit, and a chain will take a long time to load. I wouldn't even recommend using snapshots with live data; cancelling the request yourself is trivial and much faster.
Something like (this is obviously incomplete C#, just a starting point)
class OptionData
{
    public int ReqId { get; }
    public double Strike { get; }
    public string Expiry { get; }
    public double? Bid { get; set; } = null;
    public double? Ask { get; set; } = null;

    public bool IsComplete()
    {
        return Bid != null && Ask != null;
    }

    public OptionData(int reqId, double strike, ....
    { ...
    }
    ...
class MyData
{
    // Create somewhere to store our data, indexed by reqId.
    Dictionary<int, OptionData> optChain = new();

    public MyData()
    {
        // We would want to call reqSecDefOptParams to get a list of strikes etc.
        // Choose which part of the chain you want; likely you'll want to
        // get the current price of the underlying to decide.
        int reqId = 1;
        ...
        optChain.Add(++reqId, new OptionData(reqId, strike, expiry));
        ...
        // Request data for each contract.
        // Note the 50 msg/sec limit https://interactivebrokers.github.io/tws-api/introduction.html#fifty_messages
        // Only 1/sec for Reg snapshot.
        foreach (OptionData opt in optChain.Values)
        {
            Contract con = new()
            {
                Symbol = "SPY",
                Currency = "USD",
                Exchange = "SMART",
                Right = "C",
                SecType = "OPT",
                Strike = opt.Strike,
                Expiry = opt.Expiry
            };
            ibClient.ClientSocket.reqMktData(opt.ReqId, con, "", false, true, new List<TagValue>());
        }
    }
...
    private void Recv_TickPrice(TickPriceMessage msg)
    {
        if (optChain.ContainsKey(msg.RequestId))
        {
            if (msg.Field == 2) optChain[msg.RequestId].Ask = msg.Price;
            if (msg.Field == 1) optChain[msg.RequestId].Bid = msg.Price;
            // You may want other tick types as well,
            // see https://interactivebrokers.github.io/tws-api/tick_types.html
            if (optChain[msg.RequestId].IsComplete())
            {
                // This won't apply for a reg snapshot.
                ibClient.ClientSocket.cancelMktData(msg.RequestId);
                // You have the data, and have cancelled the request.
                // Maybe request more data or update display etc...
                // Check if the whole chain is complete.
                bool complete = true;
                foreach (OptionData opt in optChain.Values)
                    if (!opt.IsComplete()) complete = false;
                if (complete)
                {
                    // do whatever
                }
            }
        }
    }
This program is about showing the oldest, youngest, etc. person in a network.
I need to figure out how I can improve it so that I don't get the ConcurrentModificationException. I get it when I ask for several of these displays multiple times, e.g. asking for the youngest and the oldest, and having it refresh to tell me who the current youngest is.
public void randomIncreaseCoupling(int amount, double chance, double inverseChance) {
    randomChangeCoupling(amount, chance, inverseChance, true);
}

public void randomDecreaseCoupling(int amount, double chance, double inverseChance) {
    randomChangeCoupling(amount, chance, inverseChance, false);
}
This code is used in the network to randomly change the date outcome.
Also, I currently have this running in a thread, but I need to speed it up, so I need each of the 'functions' to run in its own thread.
The Class MainController is starting the Thread by:
public void startEvolution() {
    if (display == null)
        throw new Error("Controller not initialized before start");
    evolutionThread = new NetworkEvolutionSimulator(network, display);
    evolutionThread.start();
}
When I click on any button, e.g. a button to show me the oldest in this network, it is done by:
public void startOldest() {
    if (display == null)
        throw new Error("Not properly initialized");
    int order = display.getDistanceFor(Identifier.OLDEST);
    Set<Person> oldest = network.applyPredicate(PredicateFactory.IS_OLDEST, order);
    display.displayData(Identifier.OLDEST, summarize(order, oldest));
}
I tried to make it like:
public void startOldest() {
    if (display == null)
        throw new Error("Not properly initialized");
    int order = display.getDistanceFor(Identifier.OLDEST);
    Set<Person> oldest = network.applyPredicate(PredicateFactory.IS_OLDEST, order);
    display.displayData(Identifier.OLDEST, summarize(order, oldest));
    evolutionThread2 = new NetworkEvolutionSimulator(network, display);
    evolutionThread2.start();
}
But this starts the simulation thread over and over again when I press the button. What I want is for this specific function, and the others, to each start in their own thread when I press the corresponding button, so that I can use more than one of them at a time. How can I do this?
I can explain more if needed.
Thanks in advance.
My first post, so sorry if I didn't follow a specific rule.
You could use the synchronized keyword -
The synchronized keyword can be used to mark several types of code blocks:
Instance methods
Static methods
Code blocks inside instance methods
Code blocks inside static methods
Everywhere you use your set oldest, you could add a synchronized block like this:
synchronized(oldest) { ... }
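As a rough illustration of the forms listed above (the class and field names here are made up, not taken from your code):

import java.util.HashSet;
import java.util.Set;

public class Network {
    private final Set<Person> people = new HashSet<Person>();

    // Synchronized instance method: locks on 'this'.
    public synchronized void addPerson(Person p) {
        people.add(p);
    }

    // Synchronized block inside an instance method: locks on the collection itself,
    // so other threads that also synchronize on it cannot modify it while we copy it
    // (structural modification during iteration is what triggers
    // ConcurrentModificationException).
    public Set<Person> snapshot() {
        synchronized (people) {
            return new HashSet<Person>(people);
        }
    }

    // Synchronized static method: locks on Network.class.
    public static synchronized void resetGlobalCounters() {
        // ...
    }
}

The key point is that every thread touching the same collection has to lock on the same object; synchronizing only the reader or only the writer does not help.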
I have the following design in a project:
Multiple crawlers
a list ImageList for found images (Observable); this gets updated by threaded processes (thus in parallel)
two observers which listen to the list (Downloader and ImagesWindow); caveat: these can be notified multiple times, because the list gets updated by several threads
I always wanted to get only the newest entries from ImageList, so I implemented it with a counter:
public class ImageList extends Observable {

    private final ConcurrentMap<Integer, Image> images = new ConcurrentHashMap<Integer, Image>();
    private final AtomicInteger counter = new AtomicInteger(0);

    /* There is some more code within here, but it's not that important.
       What matters is that stuff gets added to the list and the list shall
       inform all listeners about the change.
       The observers then check which is the newest ID in the list (often +1,
       but I guess I will reduce the inform frequency somehow)
       and call (in a synchronized method):

       int lastIndex = list.getCurrentLastIndex();
       getImagesFromTo(myNextValue, lastIndex);
       myNextValue = lastIndex + 1;
    */

    public synchronized void addToFinished(Image job) throws InterruptedException {
        int currentCounter = counter.incrementAndGet();
        images.put(currentCounter, job);
        this.setChanged();
        this.notifyObservers();
    }

    public synchronized int getCurrentLastIndex() {
        return counter.get();
    }

    public ArrayList<Image> getImagesFromTo(int starting, int ending) {
        ArrayList<Image> newImages = new ArrayList<Image>();
        Image image;
        for (int i = starting; i <= ending; i++) {
            image = images.get(i);
            if (image != null) {
                newImages.add(image);
            }
        }
        return newImages;
    }
}
The observers (Downloader here) use this method like this:
@Override
public void update(Observable o, Object arg) {
    System.out.println("Updated downloader");
    if (o instanceof ImageList) {
        ImageList list = (ImageList) o;
        downloadNewImages(list);
    }
}
private synchronized void downloadNewImages(ImageList list) {
    int last = list.getCurrentLastIndex();
    for (Image image : list.getImagesFromTo(readImageFrom, last)) {
        // code gets stuck after this line
        if (filter.isOk(image)) {
            // and before this line
            // [here was a line, but it also fails if I remove it]
        }
    }
    // set the index to the new index
    readImageFrom = last + 1;
}
However, sometimes the loop gets stuck and a second call seems to be allowed into the method. Then this is what happens:
Downloader retrieves images 70 to 70
Downloader retrieves images 70 to 71
Downloader retrieves images 70 to 72
…
Downloader retrieves images 70 to n
So a second call is allowed to enter the method, but the counter readImageFrom never gets updated.
When I remove both calls to the other functions within the loop, the code begins to work. I know they are not synchronized, but do they have to be if the "parent" method is already synchronized?
filter.isOk() is implemented like this (the other functions just return true or false; the code fails when I include hasRightColor, I guess because it is a bit slower to calculate):
public boolean isOk(Image image) {
    return hasRightDimensions(image) && hasRightColor(image);
}
How can this happen? Eclipse does not show any thrown exception (which of course would cause the method to be exited).
Maybe there is also a totally different approach for getting only the newest content of a list from multiple observers (where each observer might be notified several times, because the program runs in parallel)?
Okay, the error was some wicked NullPointerException in filter.isOk() which was not displayed to me (who knows why).
I was not able to see it in my IDE because I had changed from this.image to passing image as a parameter, but forgot to remove the private image field in the class header and to change the parameters of the last of the three functions.
So Eclipse neither complained about a missing image nor about an unused this.image.
Finally.
I have a strange (at least for me) behaviour with a Guava cache. After the first hit, the following accesses return an empty object. I did not configure any unusual eviction, so I can't figure out where I'm going wrong.
I declared the following LoadingCache:
LoadingCache<String, Vector<Location>> locations = CacheBuilder.newBuilder()
        .maximumSize(100000)
        .build(
            new CacheLoader<String, Vector<Location>>() {
                @Override
                public Vector<Location> load(String key) {
                    return _getLocationListByTranscriptId(key);
                }
            });
and I used it only in this method:
public Vector<Location> getLocationListByTranscriptId(String transcriptid) {
    if (transcriptid.equals("TCONS_00000046")) System.out.println("tcons found, will this work?");
    Vector<Location> result;
    try {
        result = locations.get(transcriptid);
    } catch (ExecutionException e) {
        System.err.println("Error accessing cache, doing the hard way");
        result = _getLocationListByTranscriptId(transcriptid);
    }
    if (transcriptid.equals("TCONS_00000046")) {
        if (result.size() == 0) {
            System.out.println("this is a problem");
            return null;
        }
        System.out.println("this is good!");
    }
    return result;
}
Iterating over a Collection of input strings, I get the following output:
tcons found, will this work?
this is good!
tcons found, will this work?
this is a problem
So, the first time I use the cache it works, but:
A) the value is not correctly stored for future accesses;
B) the value is reset by some strange behaviour.
What can I do? Thanks, everyone, for reading this!
EDIT:
Thanks to axtavt's answer I could immediately figure out where I was editing the resulting list. I don't know why, but I was sure the Guava cache returned a copy of the values. Thank you for the answer and for the suggestions about defensive programming. (Sorry that I can't rate your answer yet.)
I believe you accidentally clear the Vector somewhere in your code. There are two possibilities:
The Vector is modified by the code that obtains it from the cache.
This kind of mistake can be prevented by making a defensive copy (though that somewhat defeats the purpose of caching), or by returning an immutable view of the collection:
LoadingCache<String, List<Location>> locations = CacheBuilder.newBuilder()
        .maximumSize(100000)
        .build(
            new CacheLoader<String, List<Location>>() {
                @Override
                public List<Location> load(String key) {
                    return Collections.unmodifiableList(
                            _getLocationListByTranscriptId(key));
                }
            });
After changing the code this way it will be easy to spot the place where illegal modification of collection takes place.
Note that there is no unmodifiable view of Vector, therefore List should be used instead.
_getLocationListByTranscriptId() stores its result in a field where it can be accessed by other methods (or by another invocation of the same method). So, you should check that _getLocationListByTranscriptId() doesn't leave any references to its result in fields.
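For example, the kind of leak being described would look roughly like this (hypothetical names, just to illustrate the failure mode):

import java.util.Vector;

class TranscriptDao {
    private Vector<Location> lastResult;              // leaked reference

    Vector<Location> _getLocationListByTranscriptId(String id) {
        lastResult = queryDatabase(id);
        return lastResult;                            // the cache ends up storing this same object
    }

    void someOtherMethod() {
        lastResult.clear();                           // silently empties the cached Vector
    }

    private Vector<Location> queryDatabase(String id) {
        return new Vector<Location>();                // stand-in for the real query
    }
}

Returning a copy (new Vector<Location>(lastResult)) or simply not keeping the field removes the aliasing.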
Here is my proposed solution (I did search for an existing one, unsuccessfully).
public abstract class AsyncCache<T> {

    /**
     * An AtomicInteger is used here only because AtomicStampedReference doesn't take a long;
     * if it did, the current thread ID could be used for the stamp.
     */
    private AtomicInteger threadStamp = new AtomicInteger(1);
    private AtomicStampedReference<T> reference = new AtomicStampedReference<T>(null, 0);

    protected abstract T rebuild();

    public void reset() {
        reference.set(null, 0);
    }

    public T get() {
        T obj = reference.getReference();
        if (obj != null) return obj;

        int threadID = threadStamp.incrementAndGet();
        reference.compareAndSet(null, null, 0, threadID);
        obj = rebuild();
        reference.compareAndSet(null, obj, threadID, threadID);
        return obj;
    }
}
The process should be easy to see: the resource is only built when requested, and it is invalidated by calling reset.
The first thread to request the resource inserts its ID into the stamped reference and will then insert its version of the resource once it has been generated, UNLESS another reset has been called in the meantime.
In the case of such a subsequent reset, the first requesting thread will return a stale version of the resource (which is a valid use case here), and some request started after the latest reset will populate the reference with its result.
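To make the intended use concrete, a hypothetical caller would look something like this (the Properties example is made up):

import java.util.Properties;

public class AsyncCacheDemo {
    // Hypothetical usage of the AsyncCache above: an expensive-to-build config object.
    static final AsyncCache<Properties> CONFIG = new AsyncCache<Properties>() {
        @Override
        protected Properties rebuild() {
            Properties p = new Properties();
            // ...expensive load, e.g. from disk or a remote service...
            return p;
        }
    };

    public static void main(String[] args) {
        Properties cfg = CONFIG.get();   // first call builds, later calls return the cached object
        CONFIG.reset();                  // invalidate; a later get() triggers a fresh rebuild
    }
}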
Please let me know if I've missed something or if there is a better (faster, simpler, more elegant) solution.
One thing: the MAX_INT wraparound of the stamp counter is intentionally not handled; I don't believe the program will live long enough for it to matter, but it would certainly be easy to do.
Thank you.
That's definitely not async, since the requesting thread will block until the rebuild() method completes. Another problem: you don't check the value returned from compareAndSet. I believe you need something like this:
if (reference.compareAndSet(null, null, 0, threadID)) { // if the resource has been reset - rebuild
    obj = rebuild();
    reference.compareAndSet(null, obj, threadID, threadID);
}
But that approach has another drawback: you may end up rebuilding the entry several times (when several threads want that entry at once). You can use future tasks for that case (http://www.codercorp.com/blog/java/simple-concurrent-in-memory-cache-for-web-application-using-future.html) or use MapMaker.
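A minimal sketch of that FutureTask idea for a single lazily rebuilt value (class and method names are mine, not from the linked article) could look like this:

import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;
import java.util.concurrent.atomic.AtomicReference;

public abstract class FutureBackedCache<T> {

    private final AtomicReference<FutureTask<T>> ref = new AtomicReference<FutureTask<T>>();

    protected abstract T rebuild();

    public void reset() {
        ref.set(null);                                // the next get() starts a fresh rebuild
    }

    public T get() throws InterruptedException, ExecutionException {
        while (true) {
            FutureTask<T> task = ref.get();
            if (task == null) {
                FutureTask<T> created = new FutureTask<T>(new Callable<T>() {
                    @Override
                    public T call() {
                        return rebuild();
                    }
                });
                if (ref.compareAndSet(null, created)) {
                    task = created;
                    task.run();                       // only the winning thread rebuilds
                } else {
                    continue;                         // lost the race; re-read the reference
                }
            }
            return task.get();                        // other threads block on the same task
        }
    }
}

Guava's MapMaker (today CacheBuilder) gives you the per-key version of the same behaviour: each value is computed only once per load, and concurrent callers for the same key wait for that single computation.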