We are using Drools Planner 5.4.0.Final.
We want to profile our java application to understand if we can improve performance.
Is there a way to profile how much time a rule needs to be evaluated?
We use a lot of eval(...) and our "average calculate count per second" is nearly 37. After removing all eval(...) calls, our "average calculate count per second" stays the same.
We have already profiled the application and saw that most of the time is spent in doMove ... afterVariableChanged(...).
So we suspect some of our rules are inefficient, but we don't understand where the problem is.
Thanks!
A decent average calculate count per second is at least 1000; a good one is higher than 5000. Follow these steps in order:
1) First, I strongly recommend upgrading to 6.0.0.CR5. Just follow the upgrade recipe, which will guide you step by step in a few hours. That alone will double your average calculate count per second (and potentially improve it far more), due to several improvements (selectors, the constraint match system, ...).
2) Open the black box by enabling logging: first DEBUG, then TRACE. The logs can show whether the moves are slow (= the rules are slow) or the step initialization is slow (= you need JIT selection).
3) Use the stepLimit benchmark technique to find out which rule(s) are slow.
4) Use the benchmarker (if you aren't already) and play with JIT selection, late acceptance, etc. See those topics in the docs.
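For step 4: the benchmarker is driven by an XML benchmark config (which is also where you can give each solver a step count limit for the stepLimit technique of step 3), and starting it from Java is only a few lines. A rough sketch, assuming OptaPlanner 6.x; the config file name is a placeholder and the exact factory method names have shifted slightly between releases:

    import org.optaplanner.benchmark.api.PlannerBenchmark;
    import org.optaplanner.benchmark.api.PlannerBenchmarkFactory;

    public class MyBenchmarkRunner {
        public static void main(String[] args) {
            // "myBenchmarkConfig.xml" is a placeholder: it lists the solver configs to
            // compare, the datasets, and the termination (e.g. a step count limit).
            PlannerBenchmarkFactory benchmarkFactory =
                    PlannerBenchmarkFactory.createFromXmlResource("myBenchmarkConfig.xml");
            PlannerBenchmark benchmark = benchmarkFactory.buildPlannerBenchmark();
            // Runs every configured solver on every dataset and writes an HTML report,
            // including the average calculate count per second of each run.
            benchmark.benchmark();
        }
    }

Comparing a run with your eval()-heavy rules against a run with rewritten rules in that report is usually the quickest way to see whether the rules really are the bottleneck.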
Sorry, I'm quite a beginner in the field of NLP. As the title says, what is the best optimization interval in the MALLET API? I was also wondering whether it depends on the number of iterations, topics, corpus size, etc.
The optimization interval is the number of iterations between hyperparameter updates. Values between 20 and 50 seem to work well, but I haven't done any systematic tests. One possible failure mode to look out for is that too many optimization rounds could lead to instability, with the alpha hyperparameters going to zero.
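If you are driving MALLET from Java rather than from the command line, the interval is set on the topic model itself. A minimal sketch, assuming an already-imported instance list; the file name, topic count and hyperparameter values are just placeholders:

    import java.io.File;

    import cc.mallet.topics.ParallelTopicModel;
    import cc.mallet.types.InstanceList;

    public class TopicModelSketch {
        public static void main(String[] args) throws Exception {
            // "training.mallet" stands in for an InstanceList you have already built.
            InstanceList instances = InstanceList.load(new File("training.mallet"));

            // 50 topics with example starting values for alphaSum and beta.
            ParallelTopicModel model = new ParallelTopicModel(50, 1.0, 0.01);
            model.addInstances(instances);

            model.setNumIterations(1000);
            // Re-estimate the hyperparameters every 20 iterations...
            model.setOptimizeInterval(20);
            // ...but only after a burn-in, so noisy early counts don't drive
            // the alpha values toward zero.
            model.setBurninPeriod(200);

            model.estimate();
        }
    }

If you do see the instability described above, a longer burn-in or a larger interval is a reasonable first thing to try.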
Here is an interesting blog post where Christof Schöch did some systematic tests on this:
Topic Modeling with MALLET: Hyperparameter Optimization
TL;DR:
It all depends on the project’s aims. But it is important that we are aware of the massive effects Mallet’s inconspicuous parameter of the hyperparameter optimization can have on the resulting models.
EDIT: The authors did not fix the random seed, so the results might be explained by MALLET's random initialization.
I am using OptaPlanner to optimize a chained planning problem which is similar to the VehicleRoutingExample. My planning entities have a planning variable which is another planning entity.
Now I am testing a huge dataset with roughly 1,500 planning entities.
I am using an EasyJavaScoreCalculator to get a HardSoftScore. The score includes several time-based and other factors which are calculated in loops.
My problem is that the ConstructionHeuristic (FIRST_FIT or FIRST_FIT_DECREASING) takes more than ten minutes to initialize a solution.
I have already reduced the number of constraints and the number of loops used to calculate the score, but it did not have a noticeable effect on the running time.
Is there a way to make the CH need less time? (I thought it would take less time than the LocalSearch phase, but it doesn't…)
An EasyJavaScoreCalculator is very slow and doesn't scale beyond a few hundred entities. Use an IncrementalScoreCalculator or a Drools score calculator instead. To see the difference for yourself, take the VRP example and switch between the three implementations (easy, incremental and Drools).
Also see the docs section about incremental score calculation, which explains why that is much faster.
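For reference, an incremental calculator has roughly the following shape. This is only a sketch: MySolution, MyEntity, insert() and retract() are placeholders for your own domain, and the exact package and interface signature differ a bit between OptaPlanner versions (this follows the 6.x API):

    import org.optaplanner.core.api.score.Score;
    import org.optaplanner.core.api.score.buildin.hardsoft.HardSoftScore;
    import org.optaplanner.core.impl.score.director.incremental.IncrementalScoreCalculator;

    public class MyIncrementalScoreCalculator
            implements IncrementalScoreCalculator<MySolution> {

        private int hardScore;
        private int softScore;

        @Override
        public void resetWorkingSolution(MySolution workingSolution) {
            hardScore = 0;
            softScore = 0;
            for (MyEntity entity : workingSolution.getEntityList()) {
                insert(entity);
            }
        }

        @Override
        public void beforeEntityAdded(Object entity) {
            // nothing to do: the entity's variables are still uninitialized
        }

        @Override
        public void afterEntityAdded(Object entity) {
            insert((MyEntity) entity);
        }

        @Override
        public void beforeVariableChanged(Object entity, String variableName) {
            retract((MyEntity) entity); // remove this entity's old contribution
        }

        @Override
        public void afterVariableChanged(Object entity, String variableName) {
            insert((MyEntity) entity);  // add this entity's new contribution
        }

        @Override
        public void beforeEntityRemoved(Object entity) {
            retract((MyEntity) entity);
        }

        @Override
        public void afterEntityRemoved(Object entity) {
            // nothing to do: the contribution was already retracted
        }

        @Override
        public Score calculateScore() {
            return HardSoftScore.valueOf(hardScore, softScore);
        }

        private void insert(MyEntity entity) {
            // Recompute only this entity's time/overlap penalties and add them
            // to hardScore/softScore.
        }

        private void retract(MyEntity entity) {
            // Subtract exactly what insert() added for this entity.
        }
    }

The key point is that a move only triggers before/afterVariableChanged for the handful of entities it touches, so the score update cost no longer grows with the ~1,500 entities in the dataset.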
It is extremely difficult to illustrate the complexity of frameworks (Hibernate, Spring, Apache Commons, ...).
The only thing I could think of was to compare the file sizes of the jar libraries or, even better, the number of classes contained in the jar files.
Of course this is not a mathematically sound proof of complexity. But at least it should make clear that some frameworks are lightweight compared to others.
Of course it would take quite some time to calculate the statistics. In an attempt to save time, I was wondering whether somebody has perhaps already done so?
EDIT:
Yes, there are a lot of tools to calculate the complexity of individual methods and classes. But this question is about third-party jar files.
Also, please note that 40% of the phrases in my original question stress the fact that everybody is well aware that complexity is hard to measure and that file size and number of classes may indeed not be sufficient. So it is not necessary to elaborate on this any further.
There are tools out there that can measure the complexity of code. However, this is more of a psychological question, as you cannot mathematically define the term 'complex code'. And obviously, giving the same piece of code to two random people will get you very different answers.
In general, the issue with complexity arises from the fact that a human brain cannot process more than a certain number of lines of code simultaneously (functional pieces, really, but ordinary lines of code should be exactly that). The exact number of lines that one can hold and understand in memory at the same time of course varies based on many factors (including the time of day, the day of the week and the status of your coffee machine) and therefore completely depends on the audience. However, the fewer lines of code you have to keep in your 'internal memory register' for one task, the better, so this should be the general factor when trying to determine the complexity of an API.
There is, however, a pitfall with this way of calculating complexity: many APIs offer you a fast way of solving a problem (an easy entry level), but this solution later turns out to cause several very complex coding decisions that, overall, make your code very difficult to understand. In contrast, other APIs require a very complex setup that is hard to understand at first, but the rest of your code will be extremely easy because of that initial setup.
Therefore a good way of measuring API complexity is to define a representative and sufficiently large task to solve with that API, and then measure the average number of simultaneous lines of code one has to keep in mind to implement that task. And once you're done, please publish the result in a scientific paper of your choice. ;)
We have a small index - less than 1MB in size and covering roughly 10,000 documents. The only fields that are stored are quite short which explains the small index size.
After the documents are loaded into the index, an update of an existing document can take between 1 and 2 seconds (there's quite a variance in this range though). We've tried utilizing various best practices (such as those in the Lucene wiki) but can't find what's wrong. We've even gone ahead and are now using RAMDirectory to remove the possibility of IO being the problem.
Is this really the performance to expect?
UPDATE
As requested below, I'm adding some more details:
We're treating Lucene as a black box; we just time how long it takes to reindex/update an object. We don't know what's going on inside.
The objects (or documents, in Lucene's terms) are quite small, with a total size of about 2 KB of data each.
A code snippet outlining your entire update procedure would help. Are you committing after each update? This is not necessary, and for top performance you must use near-real-time (NRT) readers. Newer Lucene versions have an NRTManager that handles most of the boilerplate involved.
In many cases the best practice is to commit only rarely or never (except when shutting down). If your service shuts down ungracefully, you lose your index, but even if you didn't, you'd have to rebuild it upon restart anyway to account for all the changes that happened in the meantime.
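A rough sketch of that pattern, assuming a Lucene 4.x-style API and a hypothetical "id" field used for updates (SearcherManager covers the same ground as NRTManager with less boilerplate; constructor arguments vary slightly between versions):

    import java.io.IOException;

    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.SearcherManager;

    public class NrtIndexer {

        private final IndexWriter writer;
        private final SearcherManager searcherManager;

        public NrtIndexer(IndexWriter writer) throws IOException {
            this.writer = writer;
            // true = apply deletes, so an updated document's old version
            // disappears from searches as soon as the reader is refreshed
            this.searcherManager = new SearcherManager(writer, true, null);
        }

        public void update(String id, Document doc) throws IOException {
            // Replace the previous version of the document; note: no commit() here.
            writer.updateDocument(new Term("id", id), doc);
        }

        public void refresh() throws IOException {
            // Makes recent updates visible to new searchers; far cheaper than commit().
            searcherManager.maybeRefresh();
        }

        public IndexSearcher acquireSearcher() throws IOException {
            return searcherManager.acquire(); // pass it back to release() when done
        }

        public void shutdown() throws IOException {
            searcherManager.close();
            writer.close(); // this is where the index finally gets committed
        }
    }

Calling refresh() from a periodic background task (every second or so) rather than after every single update keeps the update path itself as cheap as possible.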
I implemented TextRank in Java, but it seems pretty slow. Does anyone know about its expected performance?
If it's not expected to be slow, could any of the following be the problem:
1) It didn't seem like there was a way to create an edge and add a weight to it at the same time in JGraphT, so I calculate the weight and, if it's > 0, I add an edge. I later recalculate the weights to add them while looping through the edges (see the sketch after this list). Is that a terrible idea?
2) I'm using JGraphT. Is that a slow library?
3) Anything else I could do to make it faster?
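To make (1) concrete, here is a stripped-down sketch of the graph construction with JGraphT's SimpleWeightedGraph; the similarity function and the sentence strings are placeholders, not my real code:

    import org.jgrapht.graph.DefaultWeightedEdge;
    import org.jgrapht.graph.SimpleWeightedGraph;

    public class SentenceGraphSketch {

        // Stand-in for the real TextRank similarity/weight calculation.
        static double similarity(String a, String b) {
            return a.equals(b) ? 0.0 : 1.0;
        }

        public static void main(String[] args) {
            String[] sentences = {"s1", "s2", "s3"};
            SimpleWeightedGraph<String, DefaultWeightedEdge> graph =
                    new SimpleWeightedGraph<>(DefaultWeightedEdge.class);
            for (String s : sentences) {
                graph.addVertex(s);
            }

            for (int i = 0; i < sentences.length; i++) {
                for (int j = i + 1; j < sentences.length; j++) {
                    double w = similarity(sentences[i], sentences[j]);
                    if (w > 0) {
                        // addEdge returns the created edge, so the weight can be set
                        // right here; in my real code the weights are instead filled in
                        // during a second pass over graph.edgeSet().
                        DefaultWeightedEdge e = graph.addEdge(sentences[i], sentences[j]);
                        graph.setEdgeWeight(e, w);
                    }
                }
            }
        }
    }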
It depends what you mean by "pretty slow". A bit of googling found this paragraph:
"We calculated the total time for RAKE and TextRank (as an average over 100iterations) to extract keywords from the Inspec testing set of 500 abstracts, afterthe abstracts were read from files and loaded in memory. RAKE extracted key-words from the 500 abstracts in 160 milliseconds. TextRank extracted keywordsin 1002 milliseconds, over 6 times the time of RAKE."
(See http://www.scribd.com/doc/51398390/11/Evaluating-ef%EF%AC%81ciency for the context.)
So from this, I infer that a decent TextRank implementation should be capable of extracting keywords from ~500 abstracts in ~1 second.