Can PicoContainer caching be thread-safe? - java

Lost a bunch of time just trying to figure out what was going on here, but I think I'm finally onto something.
We have some fairly normal PicoContainer code which simply turns on caching, which I thought was supposed to result in singleton behaviour:
container.as(Characteristics.CACHE).addComponent(Service.class, ServiceImpl.class);
However as we found today, we have a component which is apparently being constructed not once, but four times. It's not something I can reproduce on my own computer, just on some other developer machines.
We investigated further, and it turns out that multiple threads were hitting PicoContainer to look up the same component at the same time, and instead of instantiating one copy and making the other three threads wait, it appears that it just instantiates four copies (and then only remembers to keep around one of them.)
Is there some relatively simple way to get true singular behaviour in PicoContainer?

Seems pico-container needs explicit synchronization mechanism for the case you are dealing with. Here is a link which documents this behavior and suggests the solutions for the same.
To quote this link
When components are created by two threads concurrently, with the
intention of the instance being cached, it is possible in a small
percentage of cases for the first instance into the cache to be
replaced with a second instance.
The other link worth visiting is regarding caching;

Related

Implementing shared data structure (e.g. cache) as a concurrentHashMap in a singleton bean

I am implementing an HTTP API using the Spring MVC framework.
I want to store some data between requests and between sessions. The data needs to be readable and modifiable by multiple requests in completely independent sessions, but it only needs to exist in-memory while the application is running, it does not need to be persisted to a database, and it does not need to be shared between any scaled-up, multi-node, multi-process server backend design, just one per (e.g.) Tomcat instance is completely fine. Consider for example a cache or something logging some short-lived metrics about the application-specific data coming in through the requests.
I am assuming the usual way would be to use an in-memory database or something like Redis.
However, this being my first venture into web stuff and coming from c++ parallel computing personally, this seems like an extremely over-engineered and inefficient solution to me.
Can I not just create a singleton bean containing a ConcurrentHashMap of my required types, inject it as a dependency into my Controller, and be done with it? I never see anyone talk about this anywhere, even though it seems to be the simplest solution by far to me. Is there something about how Spring MVC or Tomcat works that makes this impossible?
Basically, yes. "A singleton ConcurrentHashMap" can be used as a cache.
But, I'd go with something that works like a map but has an API that is specifically tailored to caches. Fortunately, such a thing exists.
Guava is a 'general utilities' project (just a bunch of useful utility classes, lots of em now seem a bit pointless, in the sense that java.util and co have these too, but guava is over 10 years old, and everything it has didn't exist back then) - and one of the most useful things it has is a 'Cache' class. It's a Map with bonus features.
I strongly suggest you use it and follow its API designs. It's got a few things that map doesn't have:
You can set up an eviction system; various strategies are available. You can allow k/v pairs to expire X milliseconds after being created, or optionally X milliseconds after the last time they were read. Or simply guarantee that the cache will never exceed some set size, removing the least recently accessed (or written - again, your choice) k/v pair if needed.
The obvious 'get a value' API call isn't .get() like with map, it's a variant where you provide the key as well as a computation function that would calculate the value; the Cache object will just return the cache value if it exists, but if not, it will run the computation, store it in the cache, and return that. Making your life a lot easier, you just call the get method, pass in the key and the computer, and continue, not having to care about whether the computation function is used or not.
You get some control over concurrent calculations too - if 2 threads simultaneously end up wanting the value for key K which isn't in the cache, should both threads just go compute it, or should one thread be paused to wait for the other's calculation? That's also not entirely trivial to write in a ConcurrentHashMap.
Some fairly fancy footwork - weak keying/valuing: You can set things up such that if the key is garbage collected, the k/v pair gets evicted (eventually) too. This is tricky (string keys don't really work here, for example, and sometimes your value refers to your key in which case the existence of the value would mean your key can't be GCed, making this principle worthless - so you need to design your key and value classes carefully), but can be very powerful.
I believe you can also get just the guava cache stuff on its own, but if not - you know where to look: Add guava as a dependency to your project, fire up an instance of CacheBuilder, read the javadocs, and you're off :)

Duplicate planning entities in the solution

I'm new to Optaplanner, and I try to solve a quite simple problem (for now, I will add more constraints eventually).
My model is the following: I have tasks (MarkerNesting), that must run one at a time on a VirtualMachine; the goal is to assign a list of MarkerNestings to VirtualMachines, having all machines used (we can consider that we have more tasks than machines as a first approximation). As a result, I expect each task to have a start and a end date (as shadow variables - not implemented yet).
I think I must use a chained variable, with the VirtualMachine being the anchor (chained through time pattern) - am I right?
So I wrote a program inspired by some examples (tsp and coach and shuttle) with 4 machines and 4 tasks, and I expect each machine having one task when it is solved. When running it, though, I get some strange results : not all machines are used, but the worst is that I have duplicate MarkerNesting instances (output example):
[VM 1/56861999]~~~>[Nesting(155/2143571436)/[Marker m4/60s]]~~~>[Nesting(816/767511741)/[Marker m2/300s]]~~~>[Nesting(816/418304857)/[Marker m2/300s]]~~~>[Nesting(980/1292472219)/[Marker m1/300s]]~~~>[Nesting(980/1926764753)/[Marker m1/300s]]
[VM 2/1376400422]~~~>[Nesting(155/1815546035)/[Marker m4/60s]]
[VM 3/1619356001]
[VM 4/802771878]~~~>[Nesting(111/548795052)/[Marker m3/180s]]
The instances are different (to read the log: [Nesting(id/hashcode)]), but they have the same id, so they are the same entity in the end. If I understand well, Optaplanner clones the solution whenever it finds a best one, but I don't know why it mixes instances like that.
Is there anything wrong in my code? Is it a normal behavior?
Thank you in advance!
Duplicate MarkerNesting instances that you didn't create, have the same content, but a different memory address, so are != from each other: that means something when wrong in the default solution cloner, which is based on reflection. It's been a while since anyone ran into an issue there. See docs section on "planning clone". The complex model of chained variables (which will be improved) doesn't help here at all.
Sometimes a well placed #DeepPlanningClone fixes it, but in this case it might as well be due to the #InverseRelationShadowVariable not being picked.
In any case, those system.out's in the setter method are misleading - they can happen both by the solution cloner as well as by the moves, so without the solution hash (= memory address), they tell nothing. Try doing a similar system.out in either your best solution change events, or in the BestSolutionRecaller call to cloneWorkingSolution(), for both the original as well as the clone.
As expected, I was doing something wrong: in Schedule (the PlanningSolution), I had a getter for a collection of VirtualMachine, which calculate from another field (pools : each Pool holds VirtualMachines). As a result, there where no setter, and the solution cloner was probably not able to clone the solution properly (maybe because pools is not annotated as a problem fact or a planning entity?).
To fix the problem, I removed the Pool class (not really needed), leaving a collection of VirtualMachines in Schedule.
To sum up, never introduce too many classes before you need them ^_^'
I pushed the correct version of my code on github.

Correctly using onUpgrade (and content providers) to handle updates without blocking the main thread, are `Loader`s pointless?

This is one of the questions that involves crossing what I call the "Hello World Gulf" I'm on the "Hello world" I can use SQLite and Content Providers (and resolvers) but I now need to cross to the other side, I cannot make the assumption that onUpgrade will be quick.
Now my go-to book (Wrox, Professional Android 4 development - I didn't chose it because of professional, I chose it because Wrox are like the O'Reilly of guides - O'Reilly suck at guides, they are reference book) only touches briefly on using Loaders, so I've done some searching, some more reading and so forth.
I've basically concluded a Loader is little more than a wrapper, it just does things on a different thread, and gives you a callback (on that worker thread) to process things in, it gives you 3 steps, initiating the query, using the results of the query, and resetting the query.
This seems like quite a thin wrapper, so question 1:
Why would I want to use Loaders?
I sense I may be missing something you see, most "utilities" like this with Android are really useful if you go with the grain so to speak, and as I said Loaders seem like a pretty thin wrapper, and they force me to have callback names which could become tedious of there are multiple queries going on
http://developer.android.com/reference/android/content/Loader.html
Reading that points out that "they ought to monitor the data and act upon changes" - this sounds great but it isn't obvious how that is actually done (I am thinking about database tables though)
Presentation
How should this alter the look of my application? Should I put a loading spinning thing (I'm not sure on the name, never needed them before) after a certain amount of time post activity creation? So the fragment is blank, but if X time elapses without the loader reporting back, I show a spiny thing?
Other operations
Loaders are clearly useless for updates and such, their name alone tells one this much, so any nasty updates and such would have to be wrapped by my own system for shunting work to a worker thread. This further leads me to wonder why would I want loaders?
What I think my answer is
Some sort of wrapper (at some level, content provider or otherwise) to do stuff on a worker thread will mean that the upgrade takes place on that thread, this solves the problem because ... well that's not on the main thread.
If I do write my own I can then (if I want to) ensure queries happen in a certain order, use my own data-structures (rather than Bundles) it seems that I have better control.
What I am really looking for
Discussion, I find when one knows why things are the way they are that one makes less mistakes and just generally has more confidence, I am sure there's a reason Loaders exist, and there will be some pattern that all of Android lends itself towards, I want to know why this is.
Example:
Adapters (for ListViews) it's not immediately obvious how one keeps track of rows (insert) why one must specify a default style (and why ArrayAdapter uses toString) when most of the time (in my experience, dare I say) it is subclasses, reading the source code gives one an understanding of what the Adapter must actually do, then I challenge myself "Can I think of a (better) system that meets these requirements", usually (and hopefully) my answer to that converges on how it's actually done.
Thus the "Hello World Gulf" is crossed.
I look forward to reading answers and any linked text-walls on the matter.
you shouldnt use Loaders directly, but rather LoaderManager

Are there any "gotchas" I should know about with regards to using multithreading in JGAP?

I'm working on a Genetic Programming project that tries to generate GPs that would represent an image. My approach is to split the image into different independent sections and having seperate threads do the evolution jobs on them.
Since things are going to be asynchronous, naturally you'd want objects to be independent as well. The problem is that I noticed that certain objects in JGAP are actually shared variables, so they are going to be shared between threads, and that would cause a lot of issues. For example, I noticed that all Variables with the same name are the same, which means that if I wanted to evaluate more than one IGPProgram at the same time I'd have to lock the variable, which could really hamper performance.
I also noticed that if you tried to create more than one GPConfiguration, the program would complain that you would have to reset it first. So this seems to me all GPConfigurations are shared (i.e. you can't have multiple threads create multiple configurations at the same time), which is a problem because creating GPProblems can take a lot of time, and I'm creating a lot of GPProblems, so I was hoping to reduce the time taken by splitting the work into multiple threads.
Are there any "gotchas" that I would need to know about when working with JGAP and threads? Unfortunately, multithreading isn't touched upon too much in the JGAP documentation and I was hoping I'd get some advice from people who might have experience with JGAP.
According to the FAQ, JGAP "does support multi-threaded computation". However, this doesn't mean that the entire API/object graph is fully thread safe. Do you have a code sample that demonstrates the problem you are having? I don't think you're going to get a canonical answer without refining your question a bit.
There is a threaded example in the JGAP distribution zip under examples/src/examples/simpleBooleanThreaded.
If you want some variable do not shared across thread, and make minor changes to let the code suppport multithread. you can using ThreadLocal.
When and how should I use a ThreadLocal variable?

Multiple threads modifying a collection in Java?

The project I am working on requires a whole bunch of queries towards a database. In principle there are two types of queries I am using:
read from excel file, check for a couple of parameters and do a query for hits in the database. These hits are then registered as a series of custom classes. Any hit may (and most likely will) occur more than once so this part of the code checks and updates the occurrence in a custom list implementation that extends ArrayList.
for each hit found, do a detail query and parse the output, so that the classes created in (I) get detailed info.
I figured I would use multiple threads to optimize time-wise. However I can't really come up with a good way to solve the problem that occurs with the collection these items are stored in. To elaborate a little bit; throughout the execution objects are supposed to be modified by both (I) and (II).
I deliberately didn't c/p any code, as it would be big chunks of code to make any sense.. I hope it make some sense with the description above.
Thanks,
In Java 5 and above, you may either use CopyOnWriteArrayList or a synchronized wrapper around your list. In earlier Java versions, only the latter choice is available. The same is true if you absolutely want to stick to the custom ArrayList implementation you mention.
CopyOnWriteArrayList is feasible if the container is read much more often than written (changed), which seems to be true based on your explanation. Its atomic addIfAbsent() method may even help simplify your code.
[Update] On second thought, a map sounds more fitting to the use case you describe. So if changing from a list to e.g. a map is an option, you should consider ConcurrentHashMap. [/Update]
Changing the objects within the container does not affect the container itself, however you need to ensure that the objects themselves are thread-safe.
Just use the new java.util.concurrent packages.
Classes like ConcurrentLinkedQueue and ConcurrentHashMap are already there for you to use and are all thread-safe.

Categories