How to remove randomization in Repast Simphony for testing purposes? - java

I want to remove all randomization from my Repast model so that I can refactor with confidence that functionality is unchanged. However, I was unable to remove randomization by setting the seed using RandomHelper.setSeed(1) at the top of myBuilder.build(), and making sure that my 'Default Random Seed' parameter seed was set to 1 in the GUI at initialization.
So, I tried to remove randomization from the sample JZombies model and had the same issue. Again, I set RandomHelper.setSeed(1) at the top of JZombiesBuilder.build(), and made sure the Default Random Seed was set to 1. Sometimes the output was identical, sometimes it was not.
In both cases I'm using a Text Sink to record a constant number of ticks of aggregate agent counts and aggregate agent attributes as my data. I found differences in the output files using both Windows's FC & FCIV.
What changes do I need to make to ensure deterministic behavior?
Edit:
I got deterministic behavior in the JZombies demo model by also putting RandomHelper.setSeed(1); at the top of each class's constructor. Doing the same thing in my actual model makes the first step consistently identical. There are still differences from the second tick on. I think the issue is random scheduling, now?

You should not have to set your random seed twice, so I would start with removing the RandomHelper.setSeed(1) call in your builder (and elsewhere).
The GUI random seed you are mentioning is set via the JZombies_Demo.rs/parameters.xml file.
On to your actual question. If you use RandomHelper calls for all of the stochastic elements in your code, you should see reproducible results. If you do not, there is probably some unaccounted-for source of stochasticity, e.g., a random call that does not go through RandomHelper, or iteration over a collection with no guaranteed order, such as a HashMap. For example, when you iterate over a DefaultContext with a for loop, the iteration runs over a HashSet, whereas the Context.getObjects() method iterates internally over a LinkedHashMap, so repeatability is ensured.
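To illustrate the iteration-order point with plain Java collections (not the Repast classes themselves), here is a minimal sketch. The Agent class and the helper methods are hypothetical, and java.util.Random is only a stand-in for a seeded RandomHelper. Because Agent does not override hashCode(), a HashSet would order instances by identity hash, which can differ between JVM runs, whereas a LinkedHashSet always iterates in insertion order.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

public class IterationOrderDemo {
    /** Hypothetical agent; it deliberately does not override hashCode(). */
    static final class Agent {
        final int id;
        Agent(int id) { this.id = id; }
    }

    /** Visits agents in iteration order and draws one random number per agent. */
    static List<String> run(Set<Agent> agents, long seed) {
        Random rng = new Random(seed); // stand-in for a seeded RandomHelper
        List<String> trace = new ArrayList<>();
        for (Agent a : agents) {
            trace.add(a.id + ":" + rng.nextInt(100));
        }
        return trace;
    }

    /** Builds a fresh insertion-ordered set of n agents. */
    static Set<Agent> orderedAgents(int n) {
        Set<Agent> agents = new LinkedHashSet<>();
        for (int i = 0; i < n; i++) {
            agents.add(new Agent(i));
        }
        return agents;
    }

    public static void main(String[] args) {
        // Same seed + same iteration order => identical trace on every run.
        System.out.println(run(orderedAgents(5), 1L));
        // With new HashSet<>(...) instead, the order depends on identity hash
        // codes and may change from one JVM run to the next.
    }
}
```

The same seed only gives the same results if the random draws are consumed in the same order, which is why an unordered collection can break repeatability even when every draw goes through one seeded generator.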

Related

Duplicate planning entities in the solution

I'm new to Optaplanner, and I try to solve a quite simple problem (for now, I will add more constraints eventually).
My model is the following: I have tasks (MarkerNesting) that must run one at a time on a VirtualMachine; the goal is to assign a list of MarkerNestings to VirtualMachines, with all machines used (as a first approximation, we can consider that we have more tasks than machines). As a result, I expect each task to have a start and an end date (as shadow variables - not implemented yet).
I think I must use a chained variable, with the VirtualMachine being the anchor (chained through time pattern) - am I right?
So I wrote a program inspired by some examples (tsp and coach and shuttle) with 4 machines and 4 tasks, and I expect each machine to have one task when it is solved. When running it, though, I get some strange results: not all machines are used, and worse, I have duplicate MarkerNesting instances (output example):
[VM 1/56861999]~~~>[Nesting(155/2143571436)/[Marker m4/60s]]~~~>[Nesting(816/767511741)/[Marker m2/300s]]~~~>[Nesting(816/418304857)/[Marker m2/300s]]~~~>[Nesting(980/1292472219)/[Marker m1/300s]]~~~>[Nesting(980/1926764753)/[Marker m1/300s]]
[VM 2/1376400422]~~~>[Nesting(155/1815546035)/[Marker m4/60s]]
[VM 3/1619356001]
[VM 4/802771878]~~~>[Nesting(111/548795052)/[Marker m3/180s]]
The instances are different (to read the log: [Nesting(id/hashcode)]), but they have the same id, so they are the same entity in the end. If I understand well, Optaplanner clones the solution whenever it finds a best one, but I don't know why it mixes instances like that.
Is there anything wrong in my code? Is it a normal behavior?
Thank you in advance!
Duplicate MarkerNesting instances that you didn't create, with the same content but a different memory address (so they are != to each other), mean that something went wrong in the default solution cloner, which is based on reflection. It's been a while since anyone ran into an issue there. See the docs section on "planning clone". The complex model of chained variables (which will be improved) doesn't help here at all.
Sometimes a well-placed @DeepPlanningClone fixes it, but in this case it might as well be due to the @InverseRelationShadowVariable not being picked up.
In any case, those System.out calls in the setter method are misleading: they can be triggered by the solution cloner as well as by the moves, so without the solution hash (= memory address) they tell nothing. Try doing a similar System.out in either your best solution change events, or in the BestSolutionRecaller call to cloneWorkingSolution(), for both the original and the clone.
As expected, I was doing something wrong: in Schedule (the PlanningSolution), I had a getter for a collection of VirtualMachine, which was calculated from another field (pools: each Pool holds VirtualMachines). As a result, there was no setter, and the solution cloner was probably not able to clone the solution properly (maybe because pools is not annotated as a problem fact or a planning entity?).
To fix the problem, I removed the Pool class (not really needed), leaving a collection of VirtualMachines in Schedule.
To sum up, never introduce too many classes before you need them ^_^'
I pushed the correct version of my code on github.

Android: cancellable Java Collection sort in AsyncTask.doInBackground

I am writing an application for Android mobile phones.
I have a java.util.ArrayList that contains objects which require a custom java.util.Comparator to be sorted properly.
I know I can use java.util.Collections.sort() but the amount of data is such that I want to sort my ArrayList in an android.os.AsyncTask.
I do not want to use several AsyncTask objects that would each sort a subset of the data.
Any AsyncTask can be cancelled so I want to regularly call AsyncTask.isCancelled() while I sort. If it returns true, I give up on sorting (and on my whole data set).
I Googled but could not find an AsyncTask-friendly way to sort while regularly checking for cancellation.
I may be able to call isCancelled() in my implementation of java.util.Comparator.compare() and throw my own subclass of java.lang.RuntimeException if it returns true. Then try{java.util.Collections.sort(ArrayList, Comparator);} catch () {} for that specific exception. I don't feel entirely comfortable with that approach.
Alternatively, I can use an intermediary java.util.TreeSet and write 2 loops that each check for cancellation before every iteration. The first loop would add all the items of the ArrayList to the TreeSet (and my Comparator implementation keeps them sorted on insertion). The second loop would add all the objects in the TreeSet back into the ArrayList, in the correct order, thanks to the TreeSet's ordered java.util.Iterator. This approach uses some extra memory but will assuredly work.
The last approach I can think of is to implement myself what I have actually been looking for in Android: A generic java.util.List (preferably quick)sort using a java.util.Comparator and an android.os.AsyncTask that regularly checks for cancellation.
Has anybody found that?
Do you have any other solution to this problem?
EDIT:
Although I haven't given any thought to what the sorting method signature would look like, I would also be perfectly happy with using a android.os.CancellationSignal to decide when to abandon the sorting.
I’ll try to describe my thought process here. If anybody had better offers at any point…
Let's re-affirm what we are trying to achieve here.
We need a sorting algorithm with the following properties:
1. Runs on a single task.
2. Is in place, i.e. does not use extra memory.
3. Can be cancelled at will, i.e. returns immediately or very close to it when we decide the sort is no longer needed.
4. Is efficient.
5. Does not use exceptions to control a perfectly normal flow of your application. You are right about not feeling comfortable about that one ☺
There is no native Android tool to do that AFAIK.
Let's focus for a second on requirement 3.
Here is a quote from the AsyncTask documentation, from the section on cancelling a task:
"To ensure that a task is cancelled as quickly as possible, you should always check the return value of isCancelled() periodically from doInBackground(Object[]), if possible (inside a loop for instance.)"
Meaning, an iterative sorting algorithm, where you check the isCancelled() flag on each iteration, will fulfil this requirement. The problem is that simple iterative sorting algorithms, such as insertion sort, are often not very efficient. It shouldn't matter too much for small inputs, but since you say your typical input is a huge array list, and that triggered our quest anyway, we need to keep things as efficient as possible.
Since you did mention quicksort, I was thinking, it has got everything we need! It's efficient, it's in place, it runs on a single task. There is only one shortfall. It is, in its classic form, recursive, meaning it won't return immediately upon cancellation. Luckily a brief Google search yields many results that can help, including this one. In brief, you can find there a variant of quicksort that is iterative. This is done by replacing the recursive call stack with an explicit stack that stores the same indexes that the recursive implementation would use to perform "partition" with.
Take this algorithm, add a check of asyncTask.isCancelled() on each iteration, and you've got yourself a solution that meets all the requirements.
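As a rough illustration of that idea, here is a minimal sketch of an iterative quicksort with a cancellation check. It is not taken from the linked page: the class name, the Lomuto partition with a middle pivot, and the use of a BooleanSupplier as the cancellation flag are all assumptions made for the example. It has no Android dependency so that it can be run anywhere.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.Deque;
import java.util.List;
import java.util.function.BooleanSupplier;

public class CancellableSort {
    /**
     * Sorts the list in place with an iterative quicksort.
     * Returns true if sorting finished, false if it was cancelled
     * (the list is then left only partially sorted).
     */
    public static <T> boolean sort(List<T> list, Comparator<? super T> cmp,
                                   BooleanSupplier cancelled) {
        Deque<int[]> pending = new ArrayDeque<>(); // replaces the recursive call stack
        pending.push(new int[] {0, list.size() - 1});
        while (!pending.isEmpty()) {
            if (cancelled.getAsBoolean()) {
                return false; // checked once per partition step
            }
            int[] range = pending.pop();
            int lo = range[0];
            int hi = range[1];
            if (lo >= hi) {
                continue; // ranges of size 0 or 1 are already sorted
            }
            int p = partition(list, cmp, lo, hi);
            pending.push(new int[] {lo, p - 1});
            pending.push(new int[] {p + 1, hi});
        }
        return true;
    }

    /** Lomuto partition using the middle element as pivot; returns the pivot's final index. */
    private static <T> int partition(List<T> list, Comparator<? super T> cmp, int lo, int hi) {
        Collections.swap(list, lo + (hi - lo) / 2, hi); // move the pivot to the end
        T pivot = list.get(hi);
        int store = lo;
        for (int i = lo; i < hi; i++) {
            if (cmp.compare(list.get(i), pivot) < 0) {
                Collections.swap(list, i, store++);
            }
        }
        Collections.swap(list, store, hi);
        return store;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>(Arrays.asList(5, 3, 9, 1, 7));
        boolean done = sort(data, Comparator.<Integer>naturalOrder(), () -> false);
        System.out.println(done + " " + data);
    }
}
```

Inside doInBackground() the third argument would presumably be something like this::isCancelled. Note that the check runs once per partition step, so the worst-case delay before returning is one pass over the largest remaining range; checking inside the partition loop as well would tighten that at a small cost per comparison.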

How can you easily compare modified code against a reference implementation?

I'm currently in the process of modifying somebody else's R-Tree implementation in order to add additional behaviour. I want to ensure that once I have made my changes the basic structure of the tree remains unchanged.
My current approach is to create a copy of the reference code and move it into its own package (tree_ref). I have then created a unit test which has instances of my modified tree and the original tree (in tree_ref). I am filling the trees with data and then checking that their field values are identical - in which case I assert the test case as having passed.
It strikes me that this may not be the best approach and that there may be some recognised methodology that I am unaware of to solve this problem. I haven't been able to find one by searching.
Any help is appreciated. Thanks.
What you're doing makes sense, and is a good practice. Note that whenever you 'clone-and-own' an existing package, you're likely doing it for a reason. Maybe it's performance. Maybe it is a behavior change. But whatever the reason, the tests you run against the reference and the test subject need to be agnostic to the changes.
This sort of testing usually works well with randomized inputs, for example when the test subject is some sort of collection implementation.
Note also that if the reference implementation had solid unit tests, you don't need to cover those cases -- you simply need to target the tests at your implementation.
(And for completeness, let me state this no-brainer) you still have to add your own tests to cover the new behavior you've introduced with your changes.
I would do that in two stages:
First, insert random data into the tree. (I assume that's what you are doing)
Second, check some extreme cases (does the tree handle negative numbers, NaN, Infinity, hundreds of identical points, unbalanced distribution of points?)
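The random-data stage could be sketched roughly as follows. Since the R-tree code is not shown, this example uses stand-ins: TreeSet plays the reference implementation and ConcurrentSkipListSet plays the modified one, and the class and method names are hypothetical. It compares the two through their public behavior rather than their private fields, which keeps the test agnostic to internal changes.

```java
import java.util.ArrayList;
import java.util.Random;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentSkipListSet;

public class DifferentialTest {
    /**
     * Applies the same seeded random operations to both sets and returns the
     * index of the first step at which they disagree, or -1 if they never do.
     */
    static int firstDivergence(SortedSet<Integer> reference, SortedSet<Integer> modified,
                               long seed, int steps) {
        Random rng = new Random(seed); // fixed seed makes any failure reproducible
        for (int step = 0; step < steps; step++) {
            Integer value = rng.nextInt(50);
            boolean refResult;
            boolean modResult;
            if (rng.nextBoolean()) {
                refResult = reference.add(value);
                modResult = modified.add(value);
            } else {
                refResult = reference.remove(value);
                modResult = modified.remove(value);
            }
            // Compare both the return values and the ordered contents.
            boolean sameContents =
                    new ArrayList<>(reference).equals(new ArrayList<>(modified));
            if (refResult != modResult || !sameContents) {
                return step;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int step = firstDivergence(new TreeSet<>(), new ConcurrentSkipListSet<>(), 42L, 10_000);
        System.out.println(step < 0 ? "implementations agree" : "diverged at step " + step);
    }
}
```

Reporting the step index along with the seed makes it easy to replay exactly the sequence of operations that exposed a difference.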
R-trees are fun. Enjoy!

Eclipse - how to debug input parameter change

I have the following problem, we might even call it a classic one:
public void myMethod(Map<Object, Object> parameter){
    someOtherObject.method(parameter);
    .
    .
    .
    someOtherThirdPartyObject.method(parameter);
}
And suddenly, in the end some method touched the input parameter Map, and I don't know where and how. Now, I know it would be desirable to make the parameter immutable, but it is not and that is the root of the problem. For instance, the methods inside myMethod are intended to perform some validations, but they do some more as well, which is wrong by design.
So, the question is how to create a breakpoint in this method where the execution pauses if an attribute of this parameter Map changes? It might be a good idea to put a conditional breakpoint after each method call, but if you have 20-odd methods, it's rather painful.
How can I debug when this input parameter is changing?
What you want appears to be called a "watchpoint". I actually didn't know this functionality existed and I used to work on the Eclipse Project!
http://help.eclipse.org/indigo/index.jsp?topic=%2Forg.eclipse.cdt.doc.user%2Ftasks%2Fcdt_t_add_watch.htm
It looks like you'll have to figure out what fields are being edited and then set a "Write" watchpoint using the help document above.
Additionally, Eclipse highlights variables which are modified, so if you step over your method calls one by one you will be able to see which one is modifying the value (and which field is being modified) because it will be highlighted (bright yellow, by default) in the "variables" tab in the "debug" perspective. Once you know which method is modifying the data you can run debug again, but this time debug the method that changes the value, and just keep repeating until you find the problem.
This is a classic problem solving scenario where you start with a very large search space and systematically and methodically narrow it down until the search space is small enough for you to locate the problem.
If you're trying to locate a spot where your map is being modified incorrectly, you might want to first start at the higher levels of myMethod. Put breakpoints around the methods called inside myMethod. At each breakpoint, look at the contents of the Map. Eclipse has a variable watch panel where you can see the contents of every variable at a specific moment in time.
When you hit the breakpoint where you notice something is wrong, stop. You now know to dig into someOtherObject.method(parameter); assuming the data was changed at its breakpoint.
Now, someOtherObject.method will likely have other methods inside it. Put your breakpoints inside this method around all of its function calls and repeat the process. Continue repeating until there are no more methods left. Eventually, you will narrow down the problem and have the answer.
Unfortunately, there is no magic "fix my code" button for these types of problems. It just takes good, old fashioned Sherlock Holmes style investigative skills and reasoning to eliminate areas of the code that you know aren't the problem until you're left with a smaller section that allows you to get at the root cause.
If no code modification is allowed, you can
use the watchpoints method described by acattle to watch changes at this specific map instance or
have breakpoints in the Map methods modifying its state (if you want to do that for multiple instances). It does not matter that the Map code is binary only, you can still open it using Ctrl-Shift-T (Open Type), select the methods like put(...) or remove(...) in the outline view and add breakpoints using the context menu in the outline view.
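If a temporary code change is acceptable while debugging, a different technique from the watchpoints and breakpoints described above is to pass a read-only view of the map to the suspect methods. Collections.unmodifiableMap throws an UnsupportedOperationException on any attempt to modify the map, and the stack trace names the offending method. The sketch below assumes that approach; validate and sneakyValidate are hypothetical stand-ins for the real callees. It only catches changes made through the Map interface, not changes to the objects stored in the map.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class MutationFinder {
    // Hypothetical callee that only reads the map.
    static void validate(Map<Object, Object> parameter) {
        parameter.containsKey("id");
    }

    // Hypothetical callee that secretly modifies the map.
    static void sneakyValidate(Map<Object, Object> parameter) {
        parameter.put("touched", Boolean.TRUE);
    }

    /** Returns the name of the method that tried to modify the map, or null if none did. */
    static String findMutator(Map<Object, Object> parameter) {
        Map<Object, Object> readOnly = Collections.unmodifiableMap(parameter);
        try {
            validate(readOnly);
            sneakyValidate(readOnly);
            return null;
        } catch (UnsupportedOperationException e) {
            // The top frame is inside the JDK's unmodifiable wrapper; the first
            // frame that belongs to our own class is the offender.
            for (StackTraceElement frame : e.getStackTrace()) {
                if (frame.getClassName().equals(MutationFinder.class.getName())) {
                    return frame.getMethodName();
                }
            }
            return "unknown";
        }
    }

    public static void main(String[] args) {
        System.out.println(findMutator(new HashMap<>()));
    }
}
```

In a real debugging session it is usually enough to let the exception propagate and read the stack trace, or to set an exception breakpoint on UnsupportedOperationException in Eclipse.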

Where does that randomness come from?

I'm working on a data mining research project and use code from a big svn.
Apparently one of the methods I use from that svn uses randomness somewhere without asking for a seed, which makes two runs of my program return different results. That's annoying for what I want to do, so I'm trying to locate that "uncontrolled" randomness.
Since the classes I use depend on many others, that's pretty painful to do by hand. Any idea how I could find where that randomness comes from?
Edit:
Roughly, my code is structured as :
- stuff i wrote
- call to a method I didnt write involving lots of others classes
- stuff i wrote
I know that the randomness is introduced in the method I didn't write, but can't locate where exactly...
Idea:
What I'm looking for might be a tool or Eclipse plug-in that would let me see each time Random is instantiated during the execution of my program. Know anything like that ?
The default seed of many random number generators is the current time. If it's a cryptographic random number generator, it's a seed that's far more complex than that.
I'd bet that your random numbers are probably being seeded with the current time. The only way to fix that is to find the code that creates or seeds the random number generator and change it to seed to a constant. I'm not sure what the syntax of that is in Java, but in my world (C#) it's something like:
Random r = new Random(seedValue);
So even with an answer from StackOverflow, you still have some detective work to do to find the code you want.
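For what it's worth, the Java syntax is the same as the C# line above: java.util.Random has a constructor that takes a long seed. A minimal sketch (the class and method names are made up for the example) showing that two generators created with the same seed produce the same sequence:

```java
import java.util.Arrays;
import java.util.Random;

public class SeededRandomDemo {
    /** Draws n ints in [0, 1000) from a generator created with the given seed. */
    static int[] draw(long seed, int n) {
        Random rng = new Random(seed);
        int[] values = new int[n];
        for (int i = 0; i < n; i++) {
            values[i] = rng.nextInt(1000);
        }
        return values;
    }

    public static void main(String[] args) {
        // Same seed, same sequence: this prints true on every run.
        System.out.println(Arrays.equals(draw(1L, 5), draw(1L, 5)));
    }
}
```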
Maybe it's a bit old-fashioned style, but...
How about tracing the intermediate results (variables, function arguments) to standard output, gathering the output of two different runs and checking where they start to differ?
Maybe you want to read this:
In Java, when you create a new Random object without arguments, the seed is automatically derived from the system clock's current time in nanoseconds (current JDKs also mix in a "seed uniquifier" value). So, when you check out the source of the Random class you will see a constructor, something like this:
public Random()
{
this(System.nanoTime());
}
Or maybe this:
In Eclipse you can put your cursor on a variable and then press F3 (Open Declaration). This will bring you to the point where this variable is declared.
A second tool you can use is "Find usages" (called References in Eclipse). Your IDE will then search for all usages of a method, a variable or whatever you want.
Which "big svn" are you using?
You could write some simple tests, to test whether or not two identical calls to underlying functions return two identical results...
Unless you know where the Random object is created, you're going to have to do some detective work this way.
How much of this code is open to you?
Why don't you insert a lot of logging calls (e.g. to standard error) that trace the state of the value you are concerned about throughout the program.
You can compare the trace across two successive runs to narrow down where the randomness is happening by searching for the first difference in the two log files.
Then you can insert more logging calls in that area until you precisely identify the problem.
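Finding the first difference between the two traces is easy to automate. Here is a small sketch; the class name and the command-line usage are assumptions, and any diff tool would do the same job:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class TraceDiff {
    /** Returns the 0-based index of the first differing line, or -1 if the traces are identical. */
    static int firstDifference(List<String> a, List<String> b) {
        int common = Math.min(a.size(), b.size());
        for (int i = 0; i < common; i++) {
            if (!a.get(i).equals(b.get(i))) {
                return i;
            }
        }
        // One trace may simply be longer than the other.
        return a.size() == b.size() ? -1 : common;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java TraceDiff run1.log run2.log (the file names are placeholders)
        if (args.length != 2) {
            System.err.println("expected two trace files");
            return;
        }
        List<String> run1 = Files.readAllLines(Paths.get(args[0]));
        List<String> run2 = Files.readAllLines(Paths.get(args[1]));
        int index = firstDifference(run1, run2);
        System.out.println(index < 0 ? "traces are identical"
                                     : "first difference at line " + (index + 1));
    }
}
```

For this to work, the logged lines must not contain anything that legitimately varies between runs, such as timestamps or object hash codes.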
Java's Set implementations do not all guarantee that they iterate over their elements in the same order. With a HashSet whose elements use the default identity-based hashCode(), the traversal order may change even if you run the program twice on the same machine. To avoid that, one has to change those sets into lists, or into an order-preserving implementation such as LinkedHashSet or TreeSet.
