Running time of Construction Heuristic in OptaPlanner - java

I am using the OptaPlanner to optimize a chained planning problem which is similar to the VehicleRoutingExample. My planning entities have a planning variable which is another planning entity.
Now I am testing a huge dataset with ca. 1500 planning entities.
I am using an EasyJavaScoreCalculator to get a HardSoftScore. The score includes several time-based and other factors which are calculated in loops.
My problem is that the construction heuristic (FIRST_FIT or FIRST_FIT_DECREASING) takes more than ten minutes to initialize a solution.
I have already reduced the number of constraints and the number of loops used to calculate the score, but it did not have a noticeable effect on the running time.
Is there a way to make the construction heuristic run faster? (I thought it would take less time than the Local Search phase, but it doesn't…)

EasyJavaScoreCalculator is very slow and doesn't scale beyond a few hundred entities. Use an IncrementalJavaScoreCalculator or a Drools score calculator instead. To see the difference for yourself, take the VRP example and switch between the 3 implementations (easy, incremental and Drools).
Also see the docs section about incremental score calculation, which explains why that's much faster.
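For illustration, here is a minimal sketch of what an incremental calculator looks like, assuming OptaPlanner 6.x package names, a hypothetical MySolution/Visit domain and a single soft constraint; it is not the VRP example's actual implementation. The key idea: on every move, only the changed entity's contribution is retracted and re-added instead of recalculating everything from scratch.

import org.optaplanner.core.api.score.buildin.hardsoft.HardSoftScore;
// Package location in 6.x; the interface moved in later versions.
import org.optaplanner.core.impl.score.director.incremental.IncrementalScoreCalculator;

public class MyIncrementalScoreCalculator implements IncrementalScoreCalculator<MySolution> {

    private int hardScore;
    private int softScore;

    @Override
    public void resetWorkingSolution(MySolution solution) {
        // Recalculate from scratch only once, when the solver (re)starts.
        hardScore = 0;
        softScore = 0;
        for (Visit visit : solution.getVisitList()) {
            insert(visit);
        }
    }

    @Override
    public void beforeVariableChanged(Object entity, String variableName) {
        retract((Visit) entity); // undo only this entity's contribution
    }

    @Override
    public void afterVariableChanged(Object entity, String variableName) {
        insert((Visit) entity); // re-add only this entity's contribution
    }

    @Override
    public void beforeEntityAdded(Object entity) {
        // nothing to retract yet
    }

    @Override
    public void afterEntityAdded(Object entity) {
        insert((Visit) entity);
    }

    @Override
    public void beforeEntityRemoved(Object entity) {
        retract((Visit) entity);
    }

    @Override
    public void afterEntityRemoved(Object entity) {
        // contribution already retracted
    }

    // Hypothetical soft constraint: penalize the travel time of a single visit.
    private void insert(Visit visit) {
        softScore -= visit.getTravelTimeFromPreviousStandstill();
    }

    private void retract(Visit visit) {
        softScore += visit.getTravelTimeFromPreviousStandstill();
    }

    @Override
    public HardSoftScore calculateScore() {
        return HardSoftScore.valueOf(hardScore, softScore); // valueOf(...) in 6.x; newer versions use of(...)
    }
}

In the 6.x solver config, you then point incrementalScoreCalculatorClass (instead of easyScoreCalculatorClass) at this class.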

Related

Mallet Topic Modelling API - How to decide number of intervals needed or best for optimization?

Sorry, I'm quite a beginner in the field of NLP. As the title says, what is the best optimization interval in the Mallet API? I was also wondering whether it depends on, or relates to, the number of iterations, topics, corpus size, etc.
The optimization interval is the number of iterations between hyperparameter updates. Values between 20 and 50 seem to work well, but I haven't done any systematic tests. One possible failure mode to look out for is that too many optimization rounds could lead to instability, with the alpha hyperparameters going to zero.
Here is an interesting blog post where Christof Schöch did some systematic tests: Topic Modeling with MALLET: Hyperparameter Optimization.
TL;DR:
It all depends on the project’s aims. But it is important that we are aware of the massive effects Mallet’s inconspicuous parameter of the hyperparameter optimization can have on the resulting models.
EDIT: The authors did not fix the random seed. So results might be explained by random initialization of MALLET.
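For reference, here is a minimal sketch of where this interval is set when driving MALLET from Java. It assumes cc.mallet.topics.ParallelTopicModel and a tab-separated input file (one document per line), and the chosen values (50 topics, interval 20, burn-in 100, 1000 iterations) are just placeholders. On the command line, the equivalent is the --optimize-interval option of mallet train-topics.

import java.io.FileReader;
import java.util.ArrayList;
import java.util.regex.Pattern;

import cc.mallet.pipe.CharSequence2TokenSequence;
import cc.mallet.pipe.CharSequenceLowercase;
import cc.mallet.pipe.Pipe;
import cc.mallet.pipe.SerialPipes;
import cc.mallet.pipe.TokenSequence2FeatureSequence;
import cc.mallet.pipe.iterator.CsvIterator;
import cc.mallet.topics.ParallelTopicModel;
import cc.mallet.types.InstanceList;

public class TopicModelSketch {
    public static void main(String[] args) throws Exception {
        // Minimal import pipeline: one document per line ("id<TAB>label<TAB>text").
        ArrayList<Pipe> pipes = new ArrayList<Pipe>();
        pipes.add(new CharSequenceLowercase());
        pipes.add(new CharSequence2TokenSequence(Pattern.compile("\\p{L}+")));
        pipes.add(new TokenSequence2FeatureSequence());

        InstanceList instances = new InstanceList(new SerialPipes(pipes));
        instances.addThruPipe(new CsvIterator(new FileReader(args[0]),
                Pattern.compile("^(\\S*)[\\s,]*(\\S*)[\\s,]*(.*)$"), 3, 2, 1));

        ParallelTopicModel model = new ParallelTopicModel(50, 50.0, 0.01);
        model.addInstances(instances);
        model.setNumIterations(1000);
        model.setBurninPeriod(100);    // iterations before hyperparameter optimization begins
        model.setOptimizeInterval(20); // hyperparameters re-estimated every 20 iterations
        model.estimate();
    }
}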

Watchmaker genetic algorithm combining Termination conditions

I am using Stagnation(numGenerations, true) to terminate an evolution in Watchmaker.
I would like the numGenerations to depend on how well the evolution is doing. If I have a rotten population (low fitness) then I would like to bail out early. If the population is performing well, I'd like to give it more time.
How would I do that?
I read the user manual, worked through the examples on http://watchmaker.uncommons.org/, looked at the API, and searched around the web. Didn't see this topic addressed specifically. I'm new to Java and genetic algorithms, so I could have easily missed something.
Rereading the API I discovered that multiple TerminationConditions can be supplied to engine.evolvePopulation(). That let me write a recursive function that keeps going as long as the fitness continues to improve.
// Gist of the approach: evolve with two termination conditions and recurse
// while the target fitness keeps being reached (MyCandidate stands in for whatever type the engine evolves).
private void process(Parameters params) {
    List<EvaluatedCandidate<MyCandidate>> result = engine.evolvePopulation(
            params.size, 0,
            new Stagnation(params.stagnation, true),
            new TargetFitness(params.target, true));
    if (result.get(0).getFitness() >= params.target) {
        process(params.increase());
    }
}
In my case, the target is incremented by a fixed amount every time. The size and stagnation are increased as a function of the cube of the target. That way, the better a particular population becomes, the more time gets invested into it. Not sure that's the best approach, but for this problem it got the answer I was looking for.
Oh by the way, my program doesn't really look like what I pasted in above. I'm a pretty lousy programmer and my code is a lot uglier than that. Just trying to show the gist of the idea.
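To make that gist a bit more concrete, here is a hypothetical Parameters class matching the description above; the names, base constant and step value are my own assumptions, not the poster's actual code.

// Hypothetical sketch: target grows by a fixed step, while size and stagnation
// scale with the cube of the target (all constants below are assumed).
class Parameters {
    private static final double TARGET_STEP = 1.0;
    private static final double BASE = 10.0;

    final int size;
    final int stagnation;
    final double target;

    Parameters(double target) {
        this.target = target;
        this.size = (int) (BASE * target * target * target);
        this.stagnation = (int) (BASE * target * target * target);
    }

    Parameters increase() {
        return new Parameters(target + TARGET_STEP);
    }
}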
The Stagnation termination condition only aborts the evolution if the best fitness score in the population does not improve for a certain number of consecutive generations. It does not cut off after a fixed number of generations from the start (for that you would use the GenerationCount condition); it only kicks in when the evolution appears to have stopped making progress. So if your population is performing well (by which I take it you mean that the fitness is continuing to improve), the stagnation condition is unlikely to be triggered.
If you want something different you might need to write your own TerminationCondition. It's just a single method that takes the PopulationData as an argument so that you can make decisions based on that at the end of each generation. You just need to be able to define "rotten population" in terms of the mean and/or best fitness and the number of generations so far.
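For example, here is a minimal sketch of such a condition, assuming one hypothetical definition of "rotten": the best fitness is still below some floor after a grace period, with natural (higher-is-better) fitness scores.

import org.uncommons.watchmaker.framework.PopulationData;
import org.uncommons.watchmaker.framework.TerminationCondition;

public class RottenPopulation implements TerminationCondition {

    private final double fitnessFloor;   // assumed threshold for a "rotten" population
    private final int graceGenerations;  // generations to wait before judging

    public RottenPopulation(double fitnessFloor, int graceGenerations) {
        this.fitnessFloor = fitnessFloor;
        this.graceGenerations = graceGenerations;
    }

    public boolean shouldTerminate(PopulationData<?> populationData) {
        // Bail out early if, after the grace period, the best candidate is still below the floor.
        return populationData.getGenerationNumber() >= graceGenerations
                && populationData.getBestCandidateFitness() < fitnessFloor;
    }
}

You can pass it alongside Stagnation and TargetFitness in the same evolvePopulation(...) call; whichever condition is satisfied first ends the evolution.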

Mergesort running faster on larger inputs

I'm working on an empirical analysis of merge sort (sorting strings) for school, and I've run into a strange phenomenon that I can't explain or find an explanation of. When I run my code, I capture the running time using the built-in System.nanoTime() method, and for some reason, at a certain input size, it actually takes less time to execute the sort routine than with a smaller input size.
My algorithm is just a basic merge sort, and my test code is simple too:
//Get current system time
long start = System.nanoTime();
//Perform mergesort procedure
a = q.sort(a);
//Calculate total elapsed sort time
long time = System.nanoTime()-start;
The output I got for elapsed time when sorting 900 strings was: 3928492ns
For 1300 strings it was: 3541923ns
Both of those are averages of about 20 trials, so the results are pretty consistent. After 1300 strings, the execution time continues to grow as expected. I suspect there is some peak input size where this phenomenon is most noticeable.
So my Question: What might be causing this sudden increase in speed of the program? I was thinking there might be some sort of optimization going on with arrays holding larger amounts of data, although 1300 items in an array is hardly large.
Some info:
Compiler: Java version 1.7.0_07
Algorithm: Basic recursive merge sort (using arrays)
Input type: Strings 6-10 characters long, shuffled (random order)
Am I missing anything?
As for "Am I missing anything?": you're trying to do a microbenchmark, but the code you've posted is not a reliable benchmark. To write one, please follow the rules stated here: How do I write a correct micro-benchmark in Java?
Your code appears faster because, after enough invocations of your method, the JIT compiler kicks in and optimizes it, so later runs execute faster even when processing larger data.
Some recommendations:
Use several array/list inputs of different sizes. Good values for this kind of analysis are 100, 1000 (1k), 10000 (10k), 100000 (100k), 1000000 (1m) and random sizes in between. You will get more accurate results from evaluations that take longer.
Use arrays/lists of different objects. Create a POJO, make it implement the Comparable interface, then execute your sort method. As explained above, use arrays of different sizes.
Not directly related to your question, but the execution results depend on the JDK used. Eclipse is just an IDE and can work with different JDK versions, e.g. at my workplace I use JDK 6 u30 for company projects, but for personal projects (like proofs of concept) I use JDK 7 u40.
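As a rough illustration of the warm-up point above (not a substitute for a proper benchmark harness), here is a sketch that runs the sort repeatedly before timing it. Arrays.sort stands in for the asker's q.sort(a), and the sizes, trial counts and warm-up iterations are arbitrary assumptions.

import java.util.Arrays;
import java.util.Random;

public class SortBenchmarkSketch {
    public static void main(String[] args) {
        Random rnd = new Random(42);

        // Warm-up: run the sort often enough for the JIT to compile the hot path.
        String[] warmup = randomStrings(1000, rnd);
        for (int i = 0; i < 5000; i++) {
            Arrays.sort(warmup.clone()); // replace with your merge sort, e.g. q.sort(...)
        }

        for (int n : new int[] {100, 1000, 10000, 100000}) {
            String[] input = randomStrings(n, rnd);
            long best = Long.MAX_VALUE;
            for (int trial = 0; trial < 20; trial++) {
                String[] copy = input.clone();
                long start = System.nanoTime();
                Arrays.sort(copy); // replace with your merge sort
                best = Math.min(best, System.nanoTime() - start);
            }
            System.out.printf("n=%d best=%dns%n", n, best);
        }
    }

    // Random lowercase strings of length 6-10, mirroring the input described above.
    private static String[] randomStrings(int n, Random rnd) {
        String[] a = new String[n];
        for (int i = 0; i < n; i++) {
            StringBuilder sb = new StringBuilder();
            int len = 6 + rnd.nextInt(5);
            for (int j = 0; j < len; j++) {
                sb.append((char) ('a' + rnd.nextInt(26)));
            }
            a[i] = sb.toString();
        }
        return a;
    }
}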

java frameworks complexity statistics

It is extremely difficult to illustrate the complexity of frameworks (hibernate, spring, apache-commons, ...)
The only thing I could think of was to compare the file sizes of the jar libraries or even better, the number of classes contained in the jar files.
Of course this is not a mathematically sound measure of complexity. But at least it should make clear that some frameworks are lightweight compared to others.
Of course it would take quite some time to calculate such statistics. In an attempt to save time, I was wondering if perhaps somebody has done so already?
EDIT:
Yes, there are a lot of tools to calculate the complexity of individual methods and classes. But this question is about third party jar files.
Also please note that 40% of the phrases in my original question stress the fact that everybody is well aware that complexity is hard to measure and that file size and number of classes may indeed not be sufficient. So it is not necessary to elaborate on this any further.
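As a side note, the jar-based metric proposed in the question is easy to compute yourself. Here is a minimal sketch (my own illustration, class name assumed) that counts the .class entries in each jar passed on the command line, using java.util.jar.JarFile:

import java.io.IOException;
import java.util.Collections;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class JarClassCounter {
    public static void main(String[] args) throws IOException {
        for (String path : args) {
            try (JarFile jar = new JarFile(path)) {
                // Count every .class entry; inner and anonymous classes are included.
                long classCount = Collections.list(jar.entries()).stream()
                        .map(JarEntry::getName)
                        .filter(name -> name.endsWith(".class"))
                        .count();
                System.out.printf("%s: %d classes%n", path, classCount);
            }
        }
    }
}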
There are tools out there that can measure the complexity of code. However, this is more of a psychological question, as you cannot mathematically define the term 'complex code'. And obviously, showing the same piece of code to two random people will give you very different answers.
In general, the issue with complexity arises from the fact that a human brain cannot process more than a certain number of lines of code simultaneously (strictly speaking, functional pieces, but normal lines of code should roughly correspond to that). The exact number of lines one can hold and understand in memory at the same time varies based on many factors (including time of day, day of the week and the status of your coffee machine) and therefore completely depends on the audience. However, the fewer lines of code you have to keep in your 'internal memory register' for one task, the better, so this should be the general factor when trying to determine the complexity of an API.
There is, however, a pitfall with this way of calculating complexity: many APIs offer a fast way of solving a problem (an easy entry level), but this solution later turns out to cause several very complex coding decisions that overall make your code very difficult to understand. In contrast, other APIs require a very complex setup that is hard to understand at first, but the rest of your code will be extremely easy because of that initial setup.
Therefore a good way of measuring API complexity is to define a representative and sufficiently large task to solve with that API, and then measure the average number of simultaneous lines of code one has to keep in mind to implement it. And once you're done, please publish the result in a scientific paper of your choice. ;)

Drools Planner rule profiling

We are using Drools Planner 5.4.0.Final.
We want to profile our java application to understand if we can improve performance.
Is there a way to profile how much time a rule needs to be evaluated?
We use a lot of eval(....) and our "average calculate count per second" is nearly 37. Even after removing all eval(...), our "average calculate count per second" stays the same.
We have already profiled the application and saw that most of the time is spent in doMove ... afterVariableChanged(...).
So we suspect some of our rules are inefficient, but we don't understand where the problem is.
Thanks!
A decent average calculate count per second is higher than 1000 (at least), a good one higher than 5000. Follow these steps in order:
1) First, I strongly recommend upgrading to 6.0.0.CR5. Just follow the upgrade recipe, which will guide you step by step in a few hours. That alone will double your average calculate count (and potentially far more), due to several improvements (selectors, constraint match system, ...).
2) Open the black box by enabling logging: first DEBUG, then TRACE. The logs can show if the moves are slow (= rules are slow) or the step initialization is slow (= you need JIT selection).
3) Use the stepLimit benchmark technique to find out which rule(s) are slow.
4) Use the benchmarker (if you aren't already) and play with JIT selection, late acceptance, etc. See those topics in the docs.
