Where does that randomness come from? - java

I'm working on a data mining research project and use code from a big svn.
Apparently one of the methods I use from that svn uses randomness somewhere without asking for a seed, so two runs of my program return different results. That's annoying for what I want to do, so I'm trying to locate that "uncontrolled" randomness.
Since the classes I use depend on many others, that's pretty painful to do by hand. Any idea how I could find where that randomness comes from?
Edit:
Roughly, my code is structured as :
- stuff I wrote
- call to a method I didn't write involving lots of other classes
- stuff i wrote
I know that the randomness is introduced in the method I didn't write, but can't locate where exactly...
Idea:
What I'm looking for might be a tool or Eclipse plug-in that would let me see each time Random is instantiated during the execution of my program. Does anything like that exist?

The default seed of many random number generators is the current time. If it's a cryptographic random number generator, it's a seed that's far more complex than that.
I'd bet that your random numbers are probably being seeded with the current time. The only way to fix that is to find the code that creates or seeds the random number generator and change it to seed to a constant. I'm not sure what the syntax of that is in Java, but in my world (C#) it's something like:
Random r = new Random(seedValue);
So even with an answer from StackOverflow, you still have some detective work to do to find the code you want.
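In Java the constructor has the same shape: java.util.Random accepts a long seed. A minimal sketch (the class and method names are made up for illustration) showing that two generators built from the same seed produce the same sequence:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class SeededRandomDemo {
    // Draws `count` ints from a generator created with a fixed seed.
    static List<Integer> draw(long seed, int count) {
        Random r = new Random(seed);
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            out.add(r.nextInt(100));
        }
        return out;
    }

    public static void main(String[] args) {
        // Same seed, same sequence, on every run.
        System.out.println(draw(42L, 5).equals(draw(42L, 5)));
    }
}
```

Once the place where the library creates its Random is found, passing a constant seed like this is what makes two runs comparable.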

Maybe it's a bit old-fashioned style, but...
How about tracing the intermediate results (variables, function arguments) to standard output, gathering the output of two different runs, and checking where they start to differ?

Maybe you want to read this:
In Java, when you create a new Random object without passing a seed, the seed is automatically derived from the system clock's current time in nanoseconds (recent JDKs also mix in an extra value so that two generators created at the same instant differ). So, when you check out the source of the Random class you will see a constructor, something like this:
public Random()
{
    this(System.nanoTime());
}
Or maybe this:
In Eclipse you can put your cursor on a variable and press F3 (Open Declaration). This takes you to the point where the variable is declared.
A second tool you can use is "Find usages" (called References in Eclipse). Your IDE will then search for all usages of a method, a variable, or whatever you want.

Which "big svn" are you using?
You could write some simple tests, to test whether or not two identical calls to underlying functions return two identical results...
Unless you know where the Random object is created, you're going to have to do some detective work this way.
How much of this code is open to you?

Why don't you insert a lot of logging calls (e.g. to standard error) that trace the state of the value you are concerned about throughout the program?
You can compare the trace across two successive runs to narrow down where the randomness is happening by searching for the first difference in the two log files.
Then you can insert more logging calls in that area until you precisely identify the problem.
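The comparison step can be automated. A minimal sketch, assuming each run's trace has already been read into a list of lines (the class name, method name, and sample log lines are invented for illustration):

```java
import java.util.Arrays;
import java.util.List;

class TraceDiff {
    // Returns the index of the first line that differs, or -1 if the traces are identical.
    static int firstDifference(List<String> a, List<String> b) {
        int common = Math.min(a.size(), b.size());
        for (int i = 0; i < common; i++) {
            if (!a.get(i).equals(b.get(i))) {
                return i;
            }
        }
        // One trace is a prefix of the other: they diverge where the shorter one ends.
        return a.size() == b.size() ? -1 : common;
    }

    public static void main(String[] args) {
        List<String> run1 = Arrays.asList("load ok", "clusters=3", "score=0.81");
        List<String> run2 = Arrays.asList("load ok", "clusters=3", "score=0.79");
        System.out.println("first difference at line index " + firstDifference(run1, run2));
    }
}
```

The code that produced the first differing line is where to add finer-grained logging next.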

Java's Set interface does not guarantee an iteration order, and HashSet in particular promises none. If the elements do not override hashCode(), the order in which a HashSet is traversed may change even between two runs of the same program on the same machine. To get a repeatable order, replace the HashSet uses with LinkedHashSet (insertion order), TreeSet (sorted order), or lists.
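A small sketch of the difference, assuming a made-up Node class that does not override hashCode() (so the identity hash is used, which is not guaranteed to be the same from run to run):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class SetOrderDemo {
    // Hypothetical element type with no hashCode() override.
    static class Node {
        final String name;
        Node(String name) { this.name = name; }
    }

    // Adds the nodes to the given set and returns their names in iteration order.
    static List<String> iterationOrder(Set<Node> set, List<Node> nodes) {
        set.addAll(nodes);
        List<String> names = new ArrayList<>();
        for (Node n : set) {
            names.add(n.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<Node> nodes = Arrays.asList(new Node("a"), new Node("b"), new Node("c"), new Node("d"));
        // HashSet order depends on the hash values and may vary between runs.
        System.out.println("HashSet:       " + iterationOrder(new HashSet<>(), nodes));
        // LinkedHashSet always iterates in insertion order: a, b, c, d.
        System.out.println("LinkedHashSet: " + iterationOrder(new LinkedHashSet<>(), nodes));
    }
}
```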

Related

How to remove randomization in Repast Simphony for testing purposes?

I want to remove all randomization from my Repast model so that I can refactor with confidence that functionality is unchanged. However, I was unable to remove randomization by setting the seed using RandomHelper.setSeed(1) at the top of myBuilder.build(), and making sure that my 'Default Random Seed' parameter seed was set to 1 in the GUI at initialization.
So, I tried to remove randomization from the sample JZombies model and had the same issue. Again, I set RandomHelper.setSeed(1) at the top of JZombiesBuilder.build(), and made sure the Default Random Seed was set to 1. Sometimes the output was identical, sometimes it was not.
In both cases I'm using a Text Sink to record a constant number of ticks of aggregate agent counts and aggregate agent attributes as my data. I found differences in the output files using both Windows's FC & FCIV.
What changes do I need to make to ensure deterministic behavior?
Edit:
I got deterministic behavior in the JZombies demo model by also putting RandomHelper.setSeed(1); at the top of each class's constructor. Doing the same thing in my actual model makes the first step consistently identical. There are still differences from the second tick on. I think the issue is random scheduling, now?
You should not have to set your random seed twice, so I would start with removing the RandomHelper.setSeed(1) call in your builder (and elsewhere).
The GUI random seed you are mentioning is set via the JZombies_Demo.rs/parameters.xml file.
On to your actual question. If you are using RandomHelper calls for all of your stochastic elements in the code, you should see reproducible results. If not, this could indicate that there is some unaccounted for stochasticity, e.g., the use of a non-RandomHelper call or something like iterating through a HashMap. E.g., when you iterate using the for loop over a DefaultContext, the iteration occurs over a HashSet, but when using the Context.getObjects() method, the internal iteration is over a LinkedHashMap, so repeatability is ensured.

Java code change analysis tool - e.g tell me if a method signature has changed, method implementation

Is there any diff tool specifically for Java that doesn't just highlight differences in a file, but is more complex?
By more complex I mean it'd take 2 input files, the same class file of different versions, and tell me things like:
Field names changed
New methods added
Deleted methods
Methods whose signatures have changed
Methods whose implementations have changed (not interested in any more detail than that)
Done some Googling and can't find anything like this...I figure it could be useful in determining whether or not changes to dependencies would require a rebuild of a particular module.
Thanks in advance
Edit:
I suppose I should clarify:
I'm not bothered about a GUI for the tool, it'd be something I'm interested in calling programmatically.
And as for my reasoning:
To work out if I need to rebuild certain modules/components if their dependencies have changed (which could save us around 1 hour per component)... There is a more detailed explanation but I don't really see it as important.
To be used to analyse changes made to certain components that we are trying to lock down and rely on as being more stable, we are attempting to ensure that only very rarely should method signatures change in a particular component.
You said above that Clirr is what you're looking for.
But for others with slightly different needs, I'd like to recommend JDiff. Both have pros and cons, but for my needs I ended up using JDiff. I don't think it'll satisfy your last bullet point and it's difficult to call programmatically. What it does do is generate a useful report for API differences.
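For a rough programmatic check, the added and removed method signatures can also be computed with plain reflection. This is only a sketch of the idea, not how Clirr or JDiff work; the ApiDiff class and the V1/V2 stand-in classes are invented for illustration, and comparing two real versions of the same class would additionally require loading each from its own class loader:

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

class ApiDiff {
    // Collects "returnType name(paramTypes)" for every method declared by the class.
    static Set<String> signatures(Class<?> type) {
        Set<String> result = new TreeSet<>();
        for (Method m : type.getDeclaredMethods()) {
            if (m.isSynthetic()) {
                continue; // skip compiler-generated methods
            }
            String params = Arrays.stream(m.getParameterTypes())
                    .map(Class::getSimpleName)
                    .collect(Collectors.joining(", "));
            result.add(m.getReturnType().getSimpleName() + " " + m.getName() + "(" + params + ")");
        }
        return result;
    }

    // Signatures present in `after` but not in `before`.
    static Set<String> added(Class<?> before, Class<?> after) {
        Set<String> diff = signatures(after);
        diff.removeAll(signatures(before));
        return diff;
    }

    // Signatures present in `before` but not in `after`.
    static Set<String> removed(Class<?> before, Class<?> after) {
        return added(after, before);
    }

    // Two stand-in "versions" of the same class.
    static class V1 { int size() { return 0; } void add(String s) { } }
    static class V2 { int size() { return 0; } void add(String s, int index) { } void clear() { } }

    public static void main(String[] args) {
        System.out.println("added:   " + added(V1.class, V2.class));
        System.out.println("removed: " + removed(V1.class, V2.class));
    }
}
```

A changed signature shows up as one removal plus one addition. Detecting changed implementations is out of reach for reflection and would need a bytecode-level comparison.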

Good algorithm for generating call graphs?

I am writing some code to generate call graphs for a particular intermediate representation without executing it by statically scanning the IR code. The IR code itself is not too complex and I have a good understanding of what function call sequences look like so all I need to do is trace the calls. I am currently doing it the obvious way:
Keep track of where we are
If we encounter a function call, branch to that location, execute and come back
While branching put an edge between the caller and the callee
I am satisfied with my progress so far, but I want to make sure that I am not reinventing the wheel and that I won't run into corner cases. Are there any accepted good algorithms (and/or design patterns) that do this efficiently?
UPDATE:
The IR code is a byte-code disassembly from a homebrewed Java-like language and looks like the Jasmin specification.
From an academic perspective, here are some considerations:
Do you care about being conservative / correct? For example, suppose the code you're analyzing contains a call through a function pointer. If you're just generating documentation, then it's not necessary to deal with this. If you're doing a code optimization that might go wrong, you will need to assume that 'call through pointer' means 'could be anything.'
Beware of exceptional execution paths. Your IR may or may not abstract this away from you, but keep in mind that many operations can throw both language-level exceptions as well as hardware interrupts. Again, it depends on what you want to do with the call graph later.
Consider how you'll deal with cycles (e.g. recursion, mutual recursion). This may affect how you write code for traversing the graphs later on (i.e., they will need some sort of 'visited' set to avoid traversing cycles forever).
Cheers.
Update March 6:
Based on extra information added to the original post:
Be careful about virtual method invocations. Keep in mind that, in general, it is unknowable which method will execute. You may have to assume that the call will go to any of the subclasses of a particular class. The standard example goes a bit like this: suppose you have an ArrayList<A>, and you have class B extends A. Based on a random number generator, you will add instances of A and B to the list. Now you call x.foo() for all x in the list, where foo() is a virtual method in A with an override in B. So, by just looking at the source code, there is no way of knowing whether the loop calls A.foo, B.foo, or both at run time.
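A minimal sketch of the traversal with a 'visited' set, assuming the IR has already been scanned into a map from each function name to the callees found in its body (that map and all names here are hypothetical). For a virtual call site, the conservative choice is to list every override as a possible callee:

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class CallGraphBuilder {
    // Returns the "caller -> callee" edges reachable from `entry`.
    static Set<String> build(Map<String, List<String>> callSites, String entry) {
        Set<String> edges = new LinkedHashSet<>();
        Set<String> visited = new HashSet<>();
        Deque<String> worklist = new ArrayDeque<>();
        worklist.push(entry);
        while (!worklist.isEmpty()) {
            String caller = worklist.pop();
            if (!visited.add(caller)) {
                continue; // already scanned: this stops recursion from looping forever
            }
            for (String callee : callSites.getOrDefault(caller, Collections.<String>emptyList())) {
                edges.add(caller + " -> " + callee);
                worklist.push(callee);
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        Map<String, List<String>> callSites = new LinkedHashMap<>();
        callSites.put("main", Arrays.asList("parse", "A.foo", "B.foo")); // x.foo() may hit A or B
        callSites.put("parse", Arrays.asList("parse"));                  // direct recursion
        callSites.put("A.foo", Arrays.asList("B.foo"));
        callSites.put("B.foo", Arrays.asList("A.foo"));                  // mutual recursion
        build(callSites, "main").forEach(System.out::println);
    }
}
```

Each function body is scanned once, so the work is proportional to the number of functions plus the number of call sites.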
I don't know the algorithm, but pycallgraph does a decent job. It is worth checking out the source for it. It is not long and should be good for checking out existing design patterns.

How to test code easily?

I'm learning Java by reading "Head First Java" and doing all the puzzles and exercises. The book recommends writing TestDrive classes to test the code and classes I've written. That's simple enough to do, but I don't think I can fully test my code this way, because I'm writing the test code already knowing what I want to get. I don't know if that makes sense, but is there a simple way of testing my code that tells me what isn't working correctly? Thanks.
That's right - you know what to expect, and write test cases to cover that knowledge. In many respects this is normal - you want to test the stuff you've written just so you know it works as you expect.
Now, you need to take it to the next step: find a system where it will be working (i.e. integrate it with the other bits and pieces of the complete puzzle) and see if it still works according to your assumptions and knowledge.
Then you need to give it to someone else to test for you - they will quickly find the bits that you never thought of.
Then you give it to a real user, and they not only find the things you and your tester never thought of, but they also find the things that were never thought of by the requirements analyst.
This is the way software works, and possibly the reason it's never finished.
PS. One thing about your test code matters more than anything - once you've run it once and found that your code works as expected, you can add more stuff to your app and then run your test code again to make sure it still works as expected. This is called regression testing and I think it's the only reason to write your own unit tests.
and: Dilbert's take on testing.
What do we mean by code? When Unit testing, which is what I think we're talking about here, we are testing specific methods and classes.
"I think I can't fully test my code because I'm writing the test code knowing what I want to get"
In other words you are investigating whether some code fulfils a contract. Consider this example:
int getInvestvalue( int depositCents, double annualInterestRate, int years) {
}
What tests can you devise? If you devise a good set of tests you can have some confidence in this routine. So we could try these kinds of input:
deposit 100, rate 5.0, years 1 : expected answer 105
deposit 100, rate 0, years 1 : expected answer 100
deposit 100, rate 10, years 0 : expected answer 100
What else? How about a negative rate?
More interestingly, how about a very high rate of interest like 1,000,000.50 over 100,000 years? What happens to the result, and would it fit in an integer? The point of devising this test is that it challenges the interface: why is there no exception documented?
The question then comes: how do we figure out those test cases. I don't think there is a single approach that leads to building a comprehensive set but here's a couple of things to consider:
Edges: Zero, one, two, many. In my example we don't just do a rate of 5%. We consider especially the special cases. Zero is special, one is special, negative is special, a big number is special ...
Corner cases: combinations of edges. In my example that's a large rate and a large number of years. Picking these is something of an art, and is helped by our knowledge of the implementation: here we know that there's a "multiplier" effect between rates and years.
White box: using knowledge of the implementation to drive code coverage. Adjusting the inputs to force the code down particular paths. For example, if you know that the code has an "if negative rate" conditional path, then this is a clue to include a negative rate test.
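The cases above can be written down as executable checks. A sketch in plain Java, assuming a hypothetical implementation of getInvestvalue (yearly compounding, rounded to the nearest cent, throwing ArithmeticException on overflow); the original answer leaves the body unspecified, so both the implementation and the expected value for a negative rate are assumptions:

```java
class InvestTests {
    // Hypothetical implementation: yearly compounding, rounded to the nearest cent.
    // Math.toIntExact throws ArithmeticException when the result does not fit in an int.
    static int getInvestvalue(int depositCents, double annualInterestRate, int years) {
        double value = depositCents * Math.pow(1 + annualInterestRate / 100.0, years);
        return Math.toIntExact(Math.round(value));
    }

    static void check(boolean condition, String name) {
        if (!condition) {
            throw new AssertionError(name);
        }
    }

    public static void main(String[] args) {
        // Typical case and edges: zero rate, zero years, negative rate.
        check(getInvestvalue(100, 5.0, 1) == 105, "typical");
        check(getInvestvalue(100, 0, 1) == 100, "zero rate");
        check(getInvestvalue(100, 10, 0) == 100, "zero years");
        check(getInvestvalue(100, -5.0, 1) == 95, "negative rate");

        // Corner case: huge rate and huge duration must fail loudly, not wrap around.
        boolean overflowed = false;
        try {
            getInvestvalue(100, 1_000_000.50, 100_000);
        } catch (ArithmeticException expected) {
            overflowed = true;
        }
        check(overflowed, "overflow is reported");
        System.out.println("all checks passed");
    }
}
```

In a real project the same checks would normally live in JUnit or TestNG test methods rather than in main.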
One of the tenets of "Test Driven Development" is writing a test first (i.e. before you've written the code). Obviously this test will initially fail (your program may not even compile). If the test doesn't fail, then you know you've got a problem with the test itself. Once the test fails, the objective then becomes to keep writing code until the test passes.
Also, some of the more popular unit testing frameworks such as JUnit will allow you to test if something works or explicitly doesn't work (i.e. you can assert that a certain type of exception is thrown). This becomes useful to check bad input, corner cases, etc.
To steal a line from Stephen Covey, just begin with the end in mind and write as many tests as you can think of. This may seem trivial for very simple code, but the idea becomes useful as you move onto more complex problems.
This site has a lot of helpful resources for testing code: SoftwareTestingHelp
First, you need to make sure your code is written to be unit tested. Dependencies on outside classes should be made explicit (required by the constructor if possible) so that it isn't possible to write a unit test without identifying every possible way to break things. If you find that there are too many dependencies, or that it isn't obvious how each dependency will be used, you need to work on the Single Responsibility Principle, which will make your classes smaller, simpler, and more modular.
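A small sketch of an explicit dependency, using an invented Dice class: the source of randomness is required by the constructor, so a test can supply a seeded generator and get repeatable results.

```java
import java.util.Random;

class Dice {
    private final Random random;

    // The source of randomness is an explicit dependency, so a test can pass a seeded one.
    Dice(Random random) {
        this.random = random;
    }

    // Returns a value from 1 to 6.
    int roll() {
        return random.nextInt(6) + 1;
    }

    public static void main(String[] args) {
        Dice first = new Dice(new Random(1L));
        Dice second = new Dice(new Random(1L));
        // Both dice were given the same seed, so they roll the same values.
        for (int i = 0; i < 5; i++) {
            System.out.println(first.roll() == second.roll());
        }
    }
}
```

Had Dice created its own Random internally, a test would have no way to control the sequence.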
Once your code is written so that you can foresee situations that might occur based on your dependencies and input parameters, you should write tests looking for the correct behavior from a variety of those foreseeable situations. One of the biggest advantages I've found to unit testing is that it actually forced me to think, "What if ...", and figure out what the correct behavior would be in each case. For example, I have to decide whether it makes more sense to throw an exception or return a null value in the case of certain errors.
Once you think you've got all your bases covered, you might also want to throw your code at a tool like QuickCheck to help you identify possibilities that you might have missed.
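The idea behind such tools can be sketched by hand: generate many random inputs and check that a property holds for all of them. This is not QuickCheck's API, only an illustration of the technique; the clamp method under test and all names are hypothetical.

```java
import java.util.Random;

class PropertyCheck {
    // Hypothetical method under test.
    static int clamp(int value, int low, int high) {
        return Math.max(low, Math.min(high, value));
    }

    // Tries `trials` random inputs; returns a description of the first failing input, or null.
    static String findCounterexample(long seed, int trials) {
        Random random = new Random(seed); // a fixed seed keeps any failure reproducible
        for (int i = 0; i < trials; i++) {
            int a = random.nextInt();
            int b = random.nextInt();
            int low = Math.min(a, b);
            int high = Math.max(a, b);
            int value = random.nextInt();
            int result = clamp(value, low, high);
            // Property 1: the result always lies within [low, high].
            boolean inRange = low <= result && result <= high;
            // Property 2: a value already inside the range is returned unchanged.
            boolean unchangedWhenInside = value < low || value > high || result == value;
            if (!inRange || !unchangedWhenInside) {
                return "value=" + value + ", low=" + low + ", high=" + high;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String failure = findCounterexample(2024L, 10_000);
        System.out.println(failure == null ? "property held" : "counterexample: " + failure);
    }
}
```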
TestDrive
No, you should be writing JUnit or TestNG tests.
Done correctly, your tests are your specification. They define what your code is supposed to do. Each test defines a new aspect of your application. Therefore, you would never write tests looking for things that don't work correctly, since your tests specify how things should work correctly.
Once you think you've finished unit testing and coding your component, one of the best and easiest ways to raise confidence that things are working correctly is to use a technique called Exploratory Testing, which can be thought of as an unscripted exploration of the part of the application you've written looking for bugs based on your intuition and experience (and deviousness!).
Pair Programming is another great way to prevent and flush out the bugs from your code. Two minds are better than one and often someone else will think of something you didn't (and vice versa).

Tool for testing using parameter permutations

I remember there was a testing program/library that would run your code and/or unit tests, creating all kinds of permutations of the parameters (null values, random integers, strings and so on). I just can't remember what it was called, and searching Google doesn't seem to come up with anything.
I know it was created for Java code and I think it was quite expensive as well. That's really all I can remember, anyone have a clue what program/library I am thinking about?
The AgitarOne JUnit Generator comes to mind. I'm not sure it's what you're thinking of, though.
JTest from Parasoft may be what you're thinking of.
