Drools performance for decision tables


I have a potential performance/memory bottleneck when I calculate an insurance premium with the Drools engine.
I use Drools in my project to separate business logic from Java code, and I decided to use it for the premium calculation too.
Am I using Drools the wrong way?
How can I meet the requirements in a more performant way?
Details below:
Calculations
I have to calculate the insurance premium for a given contract.
A contract is configured with:
productCode (code from dictionary)
contractCode (code from dictionary)
client’s personal data (e.g. age, address)
insurance sum (SI)
etc.
At the moment, the premium is calculated using this formula:
premium := SI * px * (1 + py) / pz
where:
px is a factor parameterized in an Excel file that depends on 2 properties (the client's age and sex)
py is a factor parameterized in an Excel file that depends on 4 contract properties
pz is parameterized similarly
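To make the formula concrete, here is one evaluation with made-up factor values. It is written in Java purely as a worked example (the requirements below deliberately keep the formula out of Java code), and using BigDecimal with HALF_UP rounding to 2 decimal places is an assumption about how the premium should be rounded.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class PremiumFormulaExample {

    // premium := SI * px * (1 + py) / pz, rounded to 2 decimal places.
    // BigDecimal keeps monetary arithmetic free of binary floating-point drift.
    static BigDecimal premium(BigDecimal si, BigDecimal px, BigDecimal py, BigDecimal pz) {
        return si.multiply(px)
                .multiply(BigDecimal.ONE.add(py))
                .divide(pz, 10, RoundingMode.HALF_UP) // keep 10 decimals before the final rounding
                .setScale(2, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        // Hypothetical factor values, not taken from the real px/py/pz tables
        BigDecimal result = premium(new BigDecimal("10000"), new BigDecimal("0.002"),
                new BigDecimal("0.1"), new BigDecimal("0.5"));
        System.out.println(result); // prints 44.00
    }
}
```

With SI = 10000, px = 0.002, py = 0.1 and pz = 0.5 this is 10000 * 0.002 * 1.1 / 0.5 = 44.00.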
Requirements
R1 - Java code doesn't know the formula,
R2 - Java code knows nothing about the formula's dependencies, in other words that the premium depends on px, py and pz,
R3 - Java code knows nothing about the parameters' dependencies, i.e. that px depends on the client's age and sex, and so on.
With R1, R2 and R3 implemented, the Java code is separated from the business logic, and any business analyst (BA) can modify the formula and add new dependencies without a redeploy.
My solution, so far
I have a contract domain model consisting of the classes Contract, Product, Client, Policy and so on. The Contract class is defined as:
public class Contract {
String code; // contractCode
double sumInsured; // SI
String clientSex; // M, F
int professionCode; // code from dictionary
int policyYear; // 1..5
int clientAge; //
... // etc.
}
In addition, I introduced a Var class that is a container for any parameterized variable:
public class Var {
public final String name;
public final ContractPremiumRequest request;
private double value; // calculated value
private boolean ready; // true if value is calculated
public Var(String name, ContractPremiumRequest request) {
this.name = name;
this.request = request;
}
...
public void setReady(boolean ready) {
this.ready = ready;
request.check();
}
...
// getters, setters
}
and finally, the request class:
public class ContractPremiumRequest {
public static enum State {
INIT,
IN_PROGRESS,
READY
}
public final Contract contract;
private State state = State.INIT;
// all dependencies (parameterized factors, e.g. px, py, ...)
private Map<String, Var> varMap = new TreeMap<>();
// calculated response - premium value
private BigDecimal value;
public ContractPremiumRequest(Contract contract) {
this.contract = contract;
}
// true if *all* vars are ready
private boolean _isReady() {
for (Var var : varMap.values()) {
if (!var.isReady()) {
return false;
}
}
return true;
}
// check if should modify state
public void check() {
if (_isReady()) {
setState(State.READY);
}
}
// read number from var with given [name]
public double getVar(String name) {
return varMap.get(name).getValue();
}
// adding uncalculated factor to this request – makes request IN_PROGRESS
public Var addVar(String name) {
Var var = new Var(name, this);
varMap.put(name, var);
setState(State.IN_PROGRESS);
return var;
}
...
// getters, setters
}
Now I can use these classes in the following flow:
request = new ContractPremiumRequest(contract)
creates request with state == INIT
px = request.addVar( "px" )
creates Var("px") with ready == false
moves request to state == IN_PROGRESS
py = request.addVar( "py" )
px.setValue( factor ), px.setReady( true )
set calculated value on px
makes it ready == true
request.check() makes state == READY if ALL vars are ready
now we can use the formula, as the request has all dependencies calculated
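To see the state transitions of this flow in isolation, here is a stripped-down, self-contained sketch of Var and ContractPremiumRequest without Drools. It is a simplification, not the author's classes: the Contract reference and the premium value are left out, and the nested class names are illustrative.

```java
import java.util.Map;
import java.util.TreeMap;

public class PremiumFlowSketch {

    enum State { INIT, IN_PROGRESS, READY }

    // Simplified Var: holds one factor value and notifies its request when it becomes ready
    static class Var {
        final String name;
        final Request request;
        private double value;
        private boolean ready;

        Var(String name, Request request) {
            this.name = name;
            this.request = request;
        }

        void setValue(double value) { this.value = value; }
        double getValue() { return value; }
        boolean isReady() { return ready; }

        void setReady(boolean ready) {
            this.ready = ready;
            request.check(); // may move the request to READY
        }
    }

    // Simplified ContractPremiumRequest without the Contract reference
    static class Request {
        private State state = State.INIT;
        private final Map<String, Var> varMap = new TreeMap<>();

        State getState() { return state; }

        // Adding an uncalculated factor makes the request IN_PROGRESS
        Var addVar(String name) {
            Var var = new Var(name, this);
            varMap.put(name, var);
            state = State.IN_PROGRESS;
            return var;
        }

        double getVar(String name) { return varMap.get(name).getValue(); }

        // READY only once *all* vars are ready
        void check() {
            for (Var var : varMap.values()) {
                if (!var.isReady()) {
                    return;
                }
            }
            state = State.READY;
        }
    }

    public static void main(String[] args) {
        Request request = new Request();       // INIT
        Var px = request.addVar("px");         // IN_PROGRESS
        Var py = request.addVar("py");
        px.setValue(0.002);
        px.setReady(true);                     // py is not ready yet
        System.out.println(request.getState()); // IN_PROGRESS
        py.setValue(0.1);
        py.setReady(true);                     // all vars ready
        System.out.println(request.getState()); // READY
    }
}
```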
I have created 2 DRL rules and prepared 3 decision tables (px.xls, py.xls, ...) with factors provided by the BA.
Rule1 - contract_premium_prepare.drl:
rule "contract premium request - prepare dependencies"
when
$req : ContractPremiumRequest (state == ContractPremiumRequest.State.INIT)
then
insert( $req.addVar("px") );
insert( $req.addVar("py") );
insert( $req.addVar("pz") );
$req.setState(ContractPremiumRequest.State.IN_PROGRESS);
end
Rule2 - contract_premium_calculate.drl:
rule "contract premium request - calculate premium"
when
$req : ContractPremiumRequest (state == ContractPremiumRequest.State.READY)
then
double px = $req.getVar("px");
double py = $req.getVar("py");
double pz = $req.getVar("pz");
double si = $req.contract.getSumInsured();
// use formula to calculate premium
double premium = si * px * (1 + py) / pz;
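// round to 2 digits (needs java.math.BigDecimal and java.math.RoundingMode imported at the top of the .drl file)
$req.setValue(BigDecimal.valueOf(premium).setScale(2, RoundingMode.HALF_UP));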
end
Decision table px.xls:
Decision table py.xls:
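The KieContainer is constructed once, at startup:
// imports: org.kie.api.KieServices, org.kie.api.runtime.KieContainer,
// org.kie.internal.builder.DecisionTableConfiguration, DecisionTableInputType, KnowledgeBuilderFactory
DecisionTableConfiguration dtconf = KnowledgeBuilderFactory.newDecisionTableConfiguration();
dtconf.setInputType(DecisionTableInputType.XLS);
// note: dtconf is not passed to the classpath container below; META-INF/kmodule.xml drives compilation
KieServices ks = KieServices.Factory.get();
KieContainer kc = ks.getKieClasspathContainer();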
Now, to calculate the premium for a given contract, we write:
ContractPremiumRequest request = new ContractPremiumRequest(contract); // state == INIT
kc.newStatelessKieSession("session-rules").execute(request);
BigDecimal premium = request.getValue();
This is what happens:
Rule1 fires for ContractPremiumRequest[INIT]
this rule creates and adds px, py and pz dependencies (Var objects)
the matching Excel row fires for each px, py, pz object and makes it ready
Rule2 fires for ContractPremiumRequest[READY] and uses the formula
Volumes
PX decision table has ~100 rows,
PY decision table has ~8000 rows,
PZ decision table has ~50 rows.
My results
The first calculation, which loads and initializes the decision tables, takes ~45 seconds, which might become problematic.
Each calculation (after some warmup) takes ~0.8 ms, which is acceptable for our team.
Heap consumption is ~150 MB, which is problematic, as we expect much bigger tables to be used.
Question
Am I using Drools the wrong way?
How can I meet the requirements in a more performant way?
How to optimize memory usage?
========== EDIT (after 2 years) ==========
This is a short summary after 2 years.
Our system has grown a lot, as we expected. We ended up with more than 500 tables (or matrices) of insurance pricing, actuarial factors, coverage configs etc.
Some tables are more than 1 million rows in size. We used Drools, but we couldn't handle the performance problems.
Finally we switched to the Hyperon engine (http://hyperon.io).
This system is a beast: it allows us to run hundreds of rule matches in approx. 10 ms total.
We were even able to trigger a full policy recalculation on every key-typed event in UI fields.
As we have learned, Hyperon uses fast in-memory indexes for each rule table, and these indexes are somehow compacted so they have almost no memory footprint.
We have one more benefit now: all pricing, factor and config tables can be modified online (both values and structure), and this is fully transparent to the Java code.
The application just continues to work with the new logic; no development or restart is needed.
However, we needed some time and effort to get to know Hyperon well enough :)
I have found a comparison made by our team a year ago. It shows engine initialization (Drools/Hyperon) and 100k simple calculations from the JVisualVM perspective:

The problem is that you have created a huge amount of code (all the rules resulting from the tables) for what is a relatively small amount of data. I have seen similar cases, and they all benefited from inserting the tables as data. PxRow, PyRow and PzRow should be defined like this:
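class PxRow {
    private final String gender;
    private final int age;
    private final double px;
    public PxRow(String gender, int age, double px) {
        this.gender = gender; this.age = age; this.px = px;
    }
    // getters are what the rule's constraints (gender, age, px) read
    public String getGender() { return gender; }
    public int getAge() { return age; }
    public double getPx() { return px; }
}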
Data can still be in (simpler) spreadsheets or anything else you fancy for data entry by the BA boffins. You insert all rows as facts PxRow, PyRow, PzRow. Then you need one or two rules:
rule calculate
no-loop true  // the modify() below must not re-activate this rule for the same Contract
when
    $c: Contract( $cs: clientSex, $ca: clientAge,
                  $pc: professionCode, $pyr: policyYear,...
                  ...
                  $si: sumInsured )
    PxRow( gender == $cs, age == $ca, $px: px )
    PyRow( profCode == $pc, polYear == $pyr,... $py: py )
    PzRow( ... $pz: pz )
then
    double premium = $si * $px * (1 + $py) / $pz;
    // round to 2 digits
    modify( $c ){ setPremium( premium ) }
end
Forget the flow and all the other decorations. But you may need another rule just in case your Contract doesn't match Px or Py or Pz:
rule "no match"
salience -100
when
$c: Contract( premium == null ) // or 0.00
then
// diagnostic
end

After reading the question more carefully, I would offer a few recommendations:
I'd prefer a relational database to Excel spreadsheets.
These are trivially simple calculations. I think the model is overkill. A rules engine seems like far too big a hammer for a problem of this size.
I would code it more simply.
Make the calculation interface-based so you can modify it by injecting a new class implementation.
Learn how to write JUnit tests.
My first choice would be a simple decision table calculation, without a rules engine, maintaining the factors in a relational database.
A Rete rules engine is a big hammer for if/else or switch statements. I think it's overkill unless you're leveraging induction features.
I would not put anything in session. I'm envisioning an idempotent REST service that takes in a request and returns a response with premium and whatever else has to come back.
It sounds to me like you are grossly overcomplicating the solution prematurely. Do the simplest thing that can possibly work; measure the performance; refactor as needed based on the data you get back and requirements.
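A minimal sketch of the "simple decision table calculation without a rules engine" idea: each factor table becomes an in-memory map keyed by its input columns, and the formula sits behind a small interface so it can be swapped by injection. The class and method names, the simplified keys (py really depends on 4 properties) and the sample rows are all hypothetical; a real version would load the rows from the relational database suggested above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FactorLookupSketch {

    // Pluggable calculation: inject a different implementation to change the formula
    interface PremiumFormula {
        double apply(double si, double px, double py, double pz);
    }

    // One factor table held as plain data, keyed by the tuple of its input columns
    static class FactorTable {
        private final Map<List<Object>, Double> rows = new HashMap<>();

        void addRow(double factor, Object... key) {
            rows.put(Arrays.asList(key), factor);
        }

        // Average constant-time hash lookup, however many rows the table has
        double lookup(Object... key) {
            Double factor = rows.get(Arrays.asList(key));
            if (factor == null) {
                throw new IllegalArgumentException("no factor for " + Arrays.toString(key));
            }
            return factor;
        }
    }

    public static void main(String[] args) {
        // Hypothetical rows; in practice they would be loaded from database tables
        FactorTable pxTable = new FactorTable();
        pxTable.addRow(0.002, "M", 35);        // key: (clientSex, clientAge)
        FactorTable pyTable = new FactorTable();
        pyTable.addRow(0.1, 1234, 1);          // key: (professionCode, policyYear), simplified
        FactorTable pzTable = new FactorTable();
        pzTable.addRow(0.5, "C1");             // key: contractCode, simplified

        PremiumFormula formula = (si, px, py, pz) -> si * px * (1 + py) / pz;
        double premium = formula.apply(10000,
                pxTable.lookup("M", 35), pyTable.lookup(1234, 1), pzTable.lookup("C1"));
        System.out.printf("%.2f%n", premium);  // 44.00
    }
}
```

A missing combination fails loudly with an exception, which plays the role of the "no match" rule in the other answer.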
How experienced a developer are you? Are you alone or part of a team? Is this a new system that's never been done by you before?

Related

How to fetch Real values from model as decimals (doubles) in Z3 (Java)?

I'm trying to fetch Real values from a Model computed by a Solver. However, even though I've set pp.decimal to true (both in the SMT2 file and using the Global.setParameter), that's only obeyed when printing the model itself.
When I attempt to fetch values by using model.getConstInterp over the values of model.getConstDecls, they all display fractions (making my hacky solution of using Double.parseDouble infeasible).
I was wondering if there's any convenient way to fetch the values of constant functions within the model without forcing me to write a parser (for either the model or the arithmetic expressions it's producing).
Any help would be much appreciated.
EDIT to include example:
// imports: com.microsoft.z3.* (BoolExpr, Context, Expr, FuncDecl, Model, Solver)
BoolExpr[] assertions = ctx.parseSMTLIB2String(smt, null, null, null, null);
// get solver from context (modelled upon assertions)
Solver solver = ctx.mkSolver();
solver.add(assertions);
switch (solver.check()) {
    case SATISFIABLE: {
        // fetch our model
        Model model = solver.getModel();
        System.out.println(model);
        for (FuncDecl constant : model.getConstDecls()) {
            // get the interpretation
            Expr value = model.getConstInterp(constant);
            System.out.println(value.toString());
        }
        break;
    }
    default:
        // UNSATISFIABLE or UNKNOWN: there is no model to read
        break;
}
Output:
(define-fun b () Real
(- 1.0))
(define-fun w2 () Real
0.5)
(define-fun w1 () Real
0.5)
-1
1/2
1/2
I'm looking to somehow extract the results of these constant functions into Java doubles. I could simply parse the Exprs' toString() values if both of them honored pp.decimal.

After some light digging (beyond looking at autocomplete suggestions), I worked out that you can simply check whether the Expr you have is a RatNum. If it is, you can downcast it to RatNum, call getNumerator() and getDenominator(), and get a double by dividing them.
Expr value = model.getConstInterp(constant);
if (value.isRatNum()) {
    RatNum rational = (RatNum) value;
    IntNum num = rational.getNumerator(), den = rational.getDenominator();
    // getInt() only works while the numerator and denominator fit in an int
    System.out.println("Value = " + ((double) num.getInt() / den.getInt()));
}
This makes sense now.
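Because getInt() is limited to int-sized numerators and denominators, a division on arbitrary-size parts may be safer. A minimal sketch, assuming you pass in BigIntegers; the Z3 Java API appears to expose them as RatNum.getBigIntNumerator() and getBigIntDenominator() (check the names against your Z3 version). The helper itself is plain Java and does not depend on Z3.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.MathContext;

public class RationalToDouble {

    // 20 significant digits is more than a double can represent (about 17),
    // so the final doubleValue() conversion determines the result's precision.
    private static final MathContext PRECISION = new MathContext(20);

    // Converts a numerator/denominator pair of any size to a double
    static double toDouble(BigInteger numerator, BigInteger denominator) {
        return new BigDecimal(numerator)
                .divide(new BigDecimal(denominator), PRECISION)
                .doubleValue();
    }

    public static void main(String[] args) {
        // Values from the model in the question: -1 and 1/2
        System.out.println(toDouble(BigInteger.valueOf(-1), BigInteger.ONE)); // -1.0
        System.out.println(toDouble(BigInteger.ONE, BigInteger.valueOf(2)));  // 0.5
    }
}
```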

How To Iterate Through Every Latitude/Longitude in Java

I'm accessing this API that gives me global weather:
https://callforcode.weather.com/doc/v3-global-weather-notification-headlines/
However, it takes lat/lng as an input parameter, and I need the data for the entire world.
I figure I can loop through the coordinates in steps of 2 degrees of latitude and 2 degrees of longitude, giving me a point on the world roughly every ~120 miles, which should give me all the data in 16,200 API calls ((360/2) * (180/2)).
How can I do this effectively in Java?
I'd conceived something like this; but is there a better way of doing this?
for (int lat = -90; lat < 90; lat += 2) {           // latitude runs from -90 to 90
    for (int lng = -180; lng < 180; lng += 2) {     // longitude runs from -180 to 180
        // call api with lat, lng (90 * 180 = 16,200 calls in total)
    }
}

It's somewhat of a paradigm shift, but I would NOT use a nested for-loop for this problem. In many situations where you are looking at iterating over an entire result set, it is often possible to trim the coverage dramatically without losing much or any effectiveness. Caching, trimming, prioritizing... these are the things you need: not a for-loop.
Cut sections entirely - maybe you can ignore the ocean, maybe you can ignore Antarctica and the North Pole (since people there have better ways of checking the weather anyway)
Change your search frequency based on population density. Maybe northern Canada doesn't need to be checked as thoroughly as Los Angeles or Chicago.
Rely on caching in low-usage areas - presumably you can track what areas are actually being used and can then more frequently refresh those sections.
So what you end up with is some sort of weighted caching system that takes into account population density, usage patterns, and other priorities to determine what latitude/longitude coordinates to check and how frequently.
High-level code might look something like this:
void executeUpdateSweep(List<CoordinateCacheItem> cacheItems)
{
for(CoordinateCacheItem item : cacheItems)
{
if(shouldRefreshCache(item))
{
//call api with lat = item.y , lng = item.x
}
}
}
boolean shouldRefreshCache(CoordinateCacheItem item)
{
long ageWeight = calculateAgeWeight(item);//how long since last update?
long basePopulationWeight = item.getBasePopulationWeight();//how many people (users and non-users) live here?
long usageWeight = calculateUsageWeight(item);//how much is this item requested?
return ageWeight + basePopulationWeight + usageWeight > someArbitraryThreshold;
}
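The high-level code above can be turned into a small runnable sketch. CoordinateCacheItem's fields, the weight formulas (1 point per stale minute, 10 points per request) and the threshold of 100 are all made-up assumptions for illustration; real weights would come from population data and measured usage.

```java
import java.util.ArrayList;
import java.util.List;

public class WeightedRefreshSketch {

    // Hypothetical cache entry for one grid point; all weights are illustrative
    static class CoordinateCacheItem {
        final double lat;
        final double lng;
        final long basePopulationWeight; // e.g. derived from population density
        long lastRefreshMillis;
        long requestCount;               // user requests since the last refresh

        CoordinateCacheItem(double lat, double lng, long basePopulationWeight) {
            this.lat = lat;
            this.lng = lng;
            this.basePopulationWeight = basePopulationWeight;
        }
    }

    static final long THRESHOLD = 100; // arbitrary; tune it from measured data

    static boolean shouldRefreshCache(CoordinateCacheItem item, long nowMillis) {
        long ageWeight = (nowMillis - item.lastRefreshMillis) / 60_000; // 1 point per stale minute
        long usageWeight = item.requestCount * 10;                      // 10 points per request
        return ageWeight + item.basePopulationWeight + usageWeight > THRESHOLD;
    }

    // Returns the items refreshed in this sweep; the weather API call itself is left out
    static List<CoordinateCacheItem> executeUpdateSweep(List<CoordinateCacheItem> items, long nowMillis) {
        List<CoordinateCacheItem> refreshed = new ArrayList<>();
        for (CoordinateCacheItem item : items) {
            if (shouldRefreshCache(item, nowMillis)) {
                // call the API here with lat = item.lat, lng = item.lng
                item.lastRefreshMillis = nowMillis;
                item.requestCount = 0;
                refreshed.add(item);
            }
        }
        return refreshed;
    }

    public static void main(String[] args) {
        CoordinateCacheItem city = new CoordinateCacheItem(34.0, -118.0, 80);  // busy area
        city.requestCount = 5;
        CoordinateCacheItem remote = new CoordinateCacheItem(70.0, -100.0, 1); // sparse area
        long tenMinutes = 10 * 60_000;
        // city: 10 + 80 + 50 = 140 > 100, refreshed; remote: 10 + 1 + 0 = 11, skipped
        System.out.println(executeUpdateSweep(List.of(city, remote), tenMinutes).size()); // 1
    }
}
```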

ArrayList of lat/long points of interest?

I've made an app that implements augmented reality based on POIs, and all the functionality works for one POI, but I would now like to be able to add multiple points. Can anyone give me advice on how to do this? Can I create an array of POIs? I've posted my relevant code below, but I don't really know where to go from here.
private void setAugmentedRealityPoint() {
    homePoi = new AugmentedPOI(
        "Home",
        "Latitude, longitude",
        28.306802, -81.601358
    );
}
This is how it's currently set, and I then use it in other areas, as shown below:
public double calculateAngle() {
    double dX = homePoi.getPoiLatitude() - myLatitude;
    double dY = homePoi.getPoiLongitude() - myLongitude;
    // ... (rest of the method, including the return, not shown)
}
and here:
private boolean isWithinDistance(double myLatitude, double myLongitude) {
    Location my1 = new Location("One");
    my1.setLatitude(myLatitude);
    my1.setLongitude(myLongitude);
    Location target = new Location("Two");
    target.setLatitude(homePoi.getPoiLatitude());
    target.setLongitude(homePoi.getPoiLongitude());
    double range = my1.distanceTo(target); // distanceTo returns meters
    double zone = 20;
    return range < zone;
}
Any help would be appreciated.

Using a List would be a smart idea. You could add all entries into it in code, or you could pull them in from a JSON file. When you're rendering them, you could check if they are in range.
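As a minimal sketch of the List approach, the single homePoi check can become a loop over a list. Android's Location.distanceTo is not available in plain Java, so this sketch swaps in the haversine great-circle formula with a mean Earth radius of 6,371 km; on Android you could keep calling Location.distanceTo inside the loop instead. The Poi class is a simplified, hypothetical stand-in for AugmentedPOI.

```java
import java.util.ArrayList;
import java.util.List;

public class PoiListSketch {

    // Simplified stand-in for the question's AugmentedPOI
    static class Poi {
        final String name;
        final double latitude;
        final double longitude;

        Poi(String name, double latitude, double longitude) {
            this.name = name;
            this.latitude = latitude;
            this.longitude = longitude;
        }
    }

    static final double EARTH_RADIUS_M = 6_371_000; // mean Earth radius in meters

    // Haversine great-circle distance in meters (stands in for Location.distanceTo)
    static double distanceMeters(double lat1, double lng1, double lat2, double lng2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLng = Math.toRadians(lng2 - lng1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLng / 2) * Math.sin(dLng / 2);
        return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
    }

    // Same check as isWithinDistance, but applied to every POI in the list
    static List<Poi> poisInRange(List<Poi> pois, double myLat, double myLng, double zoneMeters) {
        List<Poi> result = new ArrayList<>();
        for (Poi poi : pois) {
            if (distanceMeters(myLat, myLng, poi.latitude, poi.longitude) < zoneMeters) {
                result.add(poi);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Poi> pois = new ArrayList<>();
        pois.add(new Poi("Home", 28.306802, -81.601358));
        pois.add(new Poi("Far away", 40.7128, -74.0060)); // hypothetical second POI
        // About 11 m north of "Home": 0.0001 degrees of latitude is roughly 11 m
        for (Poi poi : poisInRange(pois, 28.306902, -81.601358, 20)) {
            System.out.println(poi.name); // prints Home
        }
    }
}
```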
If you have a lot of these POIs, you should divide them into smaller and smaller regions, and only load what you need. For example, structure them like this:
- CountryA
+ County 1
* POI
* POI
- CountryB
+ County 1
* POI
* POI
+ County 2
* POI
Get the country and county of the user, and only load what you really need. I assume this is a multiplayer game, so I'll share some of my code.
On the server side, I have 3 objects: Country, County and POI.
First I discover all countries on disk and make an object for each. Inside my Country object I have a list of all its counties, and inside my County object I have a list of POIs. When a player joins, they send a packet with their country and county, and I can select the appropriate POIs for them. Storing POIs in smaller regions is essential, or your server will have a hard time going through all of the POIs for every player.
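The country/county/POI tree above maps naturally onto nested maps. A minimal sketch, assuming countries and counties are identified by name; the class and method names are hypothetical and not taken from the linked Server.java.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RegionIndexSketch {

    // Minimal POI stand-in (name plus coordinates)
    static class Poi {
        final String name;
        final double latitude;
        final double longitude;

        Poi(String name, double latitude, double longitude) {
            this.name = name;
            this.latitude = latitude;
            this.longitude = longitude;
        }
    }

    // country -> county -> POIs in that county
    private final Map<String, Map<String, List<Poi>>> index = new HashMap<>();

    void add(String country, String county, Poi poi) {
        index.computeIfAbsent(country, k -> new HashMap<>())
             .computeIfAbsent(county, k -> new ArrayList<>())
             .add(poi);
    }

    // Only the player's own county is returned; no other POI is scanned.
    // An unknown country or county yields an empty (read-only) list.
    List<Poi> poisFor(String country, String county) {
        return index.getOrDefault(country, Collections.emptyMap())
                    .getOrDefault(county, Collections.emptyList());
    }

    public static void main(String[] args) {
        RegionIndexSketch regions = new RegionIndexSketch();
        // Hypothetical data mirroring the CountryA/CountryB tree above
        regions.add("CountryA", "County 1", new Poi("POI 1", 51.50, -0.12));
        regions.add("CountryA", "County 1", new Poi("POI 2", 51.51, -0.13));
        regions.add("CountryB", "County 2", new Poi("POI 3", 48.85, 2.35));
        System.out.println(regions.poisFor("CountryA", "County 1").size()); // 2
        System.out.println(regions.poisFor("CountryB", "County 1").size()); // 0
    }
}
```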
Here is my method for discovering data: Server.java#L311-L385
Code for selecting POIs for a player: Server.java#L139-L181
And how you can render it: PlayScreen.java#L209-L268
You need to port it to your own app, and I'm probably horrible at explaining, but I hope you got something out of it.

How to efficiently remove duplicate collision pairs in spatial hash grid?

I'm working on a 2D game for Android, so performance is a real issue. In this game a lot of collisions might occur between any objects, and I don't want to check by brute force, in O(n^2), whether any game object collides with another one. To reduce the number of collision checks, I decided to use spatial hashing as the broad-phase algorithm because it seems quite simple and efficient: the scene is divided into rows and columns, and collisions are checked only between objects residing in the same grid cell.
Here's the basic concept I quickly scratched:
public class SpatialHashGridElement
{
    HashSet<GameObject> gameObjects = new HashSet<GameObject>();
}
static final int SPATIAL_HASH_GRID_ROWS = 4;
static final int SPATIAL_HASH_GRID_COLUMNS = 5;
static SpatialHashGridElement[] spatialHashGrid = new SpatialHashGridElement[SPATIAL_HASH_GRID_ROWS * SPATIAL_HASH_GRID_COLUMNS];
static
{
    // the array starts out full of nulls, so create every cell once
    for(int i = 0; i < spatialHashGrid.length; i++)
        spatialHashGrid[i] = new SpatialHashGridElement();
}
void updateGrid()
{
    // cast to float so the division is not truncated if screenWidth/screenHeight are ints
    float spatialHashGridElementWidth = (float) screenWidth / SPATIAL_HASH_GRID_COLUMNS;
    float spatialHashGridElementHeight = (float) screenHeight / SPATIAL_HASH_GRID_ROWS;
    for(SpatialHashGridElement e : spatialHashGrid)
        e.gameObjects.clear();
    for(GameObject go : displayList)
    {
        // vertices are stored as (x, y, z) triples
        for(int i = 0; i < go.vertices.length/3; i++)
        {
            int row = (int) Math.abs(((go.vertices[i*3 + 1] / spatialHashGridElementHeight) % SPATIAL_HASH_GRID_ROWS));
            int col = (int) Math.abs(((go.vertices[i*3 + 0] / spatialHashGridElementWidth) % SPATIAL_HASH_GRID_COLUMNS));
            // HashSet.add already ignores duplicates, so no contains() check is needed
            spatialHashGrid[row * SPATIAL_HASH_GRID_COLUMNS + col].gameObjects.add(go);
        }
    }
}
The code probably isn't of the highest quality, so if you spot anything to improve, please don't hesitate to tell me. But the most worrying problem right now is that the same collision pair might be checked in 2 grid cells. Worst-case example (assuming none of the objects spans more than 2 cells):
Here we have 2 game objects colliding (red and blue). Each of them resides in 4 cells, so the same pair will be checked in each of those cells.
I can't come up with an efficient approach to eliminate duplicate pairs without filtering the grid after building it in updateGrid(). Is there some brilliant way to detect that a collision pair has already been inserted, even during updateGrid()? I will be very grateful for any tips!

I'll try to explain my idea with some pseudo-code (Java, plus the C# partial keyword to mean "these members are added to the existing GameObject class"):
public partial class GameObject {
// ...
Set<GameObject> collidedSinceLastTick = new HashSet<GameObject>(); // clear this at the start of every tick
public boolean collidesWith(GameObject other) {
if (collidedSinceLastTick.contains(other)) {
return true; // or even false, see below
}
boolean collided = false;
// TODO: your costly logic here
if (collided) {
collidedSinceLastTick.add(other);
// maybe return false if other actions depend on a GameObject just colliding once per tick
}
return collided;
}
// ...
}
HashSet and .hashCode() can both be tuned in some cases. Maybe you could even remove displayList and "hold" everything in spatialHashGrid to reduce the memory footprint a little. Of course, do that only if you don't need special access to displayList: in XML's Document Object Model, objects can be accessed by a path through the tree, and "hot spots" can be accessed by an ID that has to be assigned explicitly. For serialization (saving game state or whatever), iterating through spatialHashGrid should not be a performance issue. It is a bit slower than serializing the gameObject set because you may have to suppress duplicates, although Java serialization with default settings does not save the same object twice anyway: after the first occurrence of an object it saves just a reference.
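For comparison, a different technique from the answer above: give every object a hypothetical integer id, and while generating candidate pairs cell by cell, record each pair under an order-independent 64-bit key in a set, skipping any pair already seen in an earlier cell. Here each cell is a plain int[] of ids to keep the sketch self-contained. HashSet<Long> boxes every key; on Android a primitive long set would avoid that, but it keeps the sketch short.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PairDedupSketch {

    // Order-independent key for the pair (a, b): smaller id in the high 32 bits
    static long pairKey(int idA, int idB) {
        int lo = Math.min(idA, idB);
        int hi = Math.max(idA, idB);
        return ((long) lo << 32) | (hi & 0xFFFFFFFFL);
    }

    // cells: for each grid cell, the ids of the objects it contains.
    // Returns each candidate pair once, even if both objects share several cells.
    static List<int[]> uniquePairs(List<int[]> cells) {
        Set<Long> seen = new HashSet<>();
        List<int[]> pairs = new ArrayList<>();
        for (int[] cell : cells) {
            for (int i = 0; i < cell.length; i++) {
                for (int j = i + 1; j < cell.length; j++) {
                    if (seen.add(pairKey(cell[i], cell[j]))) { // false if already present
                        pairs.add(new int[] {cell[i], cell[j]});
                    }
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Objects 1 (red) and 2 (blue) share 4 cells; object 3 shares one of them
        List<int[]> cells = List.of(new int[] {1, 2}, new int[] {2, 1},
                new int[] {1, 2, 3}, new int[] {2, 1});
        System.out.println(uniquePairs(cells).size()); // 3: (1,2), (1,3), (2,3)
    }
}
```

The set has to be cleared (or recreated) every tick, just like the grid itself.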

Filtering Data in Java/Processing

I am working on a map-based data visualization project that scrapes data from an XML file. Locations are placed on a map based on their geolocation, and they are interactive: a mouse click displays information about the location. I need to start filtering the results based on information about each location. For example, let's say I want to display information about trees, and I know their location and their type. I would want to filter walnut, cherry and oak in and out using check boxes.
I am trying to plan how to attack this problem from a design standpoint. Currently all the information is pulled directly from the XML file, with very little going into new arrays/lists. Any recommendations as I try to conquer this task? If you need me to elaborate or want more information, please let me know.
EDIT:
I'm sorry if this is vague; I'm not entirely sure how to ask the question. Right now I am taking 311 data and putting information into arrays based on what I want to display. Let's say I want to get an address. (At this point a map has been populated with all of the individual locations from the 311 data, let's say 200 spots.) I click one location, and that location is tied to an index in an array that holds all of the addresses, so at any time I can use an index to get information from an array. There are multiple arrays holding information like address, report type, time, etc. I want the locations on the map to be sorted by report type. I hope this makes more sense.

I hope I understand correctly; this seems like a regular data management system requirement. It's hard to cover these kinds of systems in a few words, but in a nutshell such systems are divided into layers:
data layer, usually some database; try to install and use a database like MySQL
data access layer; since you're using Java, consider Hibernate, which lets you describe and use your database through objects rather than RDBMS tables; here you would also have SQL/HQL queries
some business layer with the logic on top of these plain data objects, or maybe a connection to some external service
serving this data to your client, whatever it runs on, whether it's a Java client or a web browser
Check out Spring http://projects.spring.io/spring-framework/ to see how it's done in practice.
Then, if you feel that going back and forth to the server for more data costs too much performance, you may decide to cache some of the information on the client side.
Last, always remember Donald Knuth's saying:
"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil."

You're using the processing tag for Processing.org, right? If I understand you, isn't this a case for making an object that groups all the data related to one location, and retrieving the info with a getter or even dot notation when needed? I thought of something like:
class Local {
  String name;
  String address;
  //whatever else...
  float mapPosX, mapPosY;
  boolean isPressedOver() {
    // true if the mouse is over this location (radius of 10 px chosen arbitrarily)
    return dist(mouseX, mouseY, mapPosX, mapPosY) < 10;
  }
}
Create Locals from the XML data, store them in an array of Local, and when the mouse is pressed, get the matching one:
if (localsArray[i].isPressedOver()) { display(localsArray[i].address); }
Here is a very simple example of the idea, leaving out the XML parsing into an array:
Place[] places = new Place[4];
void setup() {
size(600, 400);
noStroke();
places[0] = new Place ("one", "That street, 12 - BR", 0.32044, 0.230098, 200, 98);
places[1] = new Place ("two", "This street, 35 - UG", 0.22222, 0.084723, 394,176);
places[2] = new Place ("three", "Other street, 132 - TY", 0.32321, 0.36388, 157, 283);
places[3] = new Place ("four", "Principal street, 672 - OP", 0.909044, 0.7828939, 276, 312);
}
void draw() {
background(75, 16, 160);
for(Place p:places){
p.display();
}
}
class Place {
String name;
String address;
float latitude;
float longitude;
float xPos;
float yPos;
float sz = 40;
Place(String n, String a, float lat, float lng, float x, float y) {
name = n;
address = a;
latitude = lat;
longitude = lng;
xPos = x;
yPos = y;
}
void display() {
fill(200, 210, 100);
rect(xPos, yPos, sz, sz);
if (isOver()) {
String quick = name + " - " + address + " - " + latitude + " - " + longitude ;
fill(0);
text(quick, xPos - textWidth(quick)/2, yPos - 10);
}
}
boolean isOver() {
return (mouseX > xPos && mouseX < xPos + sz && mouseY > yPos && mouseY < yPos + sz);
}
}
In the following link I pasted code I'm working on. It doesn't display anything, but it does get XML data and build objects based on it. For now the output goes to the console. I don't know if it's going to help much. Most variable names are in Portuguese :P and it is not commented... but it works. You can run it; it gets the XML from a web API. There are two classes; don't bother with Query, which is needed to get the XML but isn't related to your question. The Prop class is the data holder: it takes an XML element as a parameter and parses its fields into member variables. For now there is only one method, toString(), used to display data in the console.
http://pastebin.com/8gGDsFAv
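Building on the one-object-per-location idea, the check-box filtering asked about in the question can be a set of ticked report types plus a filter pass before drawing. A plain-Java sketch: the Report class, the toggle() and visible() helpers and the sample data are hypothetical; in a Processing sketch, draw() would iterate over the filtered list instead of the whole array.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ReportFilterSketch {

    // One object per 311 report instead of parallel arrays (address[i], type[i], ...)
    static class Report {
        final String address;
        final String type; // report type, the value behind each check box

        Report(String address, String type) {
            this.address = address;
            this.type = type;
        }
    }

    // Called when a check box is clicked: ticks the type if unticked, and vice versa
    static void toggle(Set<String> visibleTypes, String type) {
        if (!visibleTypes.remove(type)) {
            visibleTypes.add(type);
        }
    }

    // The reports to draw on the map: only those whose type is ticked
    static List<Report> visible(List<Report> all, Set<String> visibleTypes) {
        List<Report> result = new ArrayList<>();
        for (Report r : all) {
            if (visibleTypes.contains(r.type)) {
                result.add(r);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Report> reports = List.of(
                new Report("12 Main St", "Pothole"),   // hypothetical data
                new Report("3 Oak Ave", "Graffiti"),
                new Report("7 Elm St", "Pothole"));
        Set<String> visibleTypes = new HashSet<>();
        toggle(visibleTypes, "Pothole");               // tick "Pothole"
        System.out.println(visible(reports, visibleTypes).size()); // 2
    }
}
```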
