Drools Database - java

I am developing a rating system based on Drools rules to replace an old one built with ASP and a relational DB.
Everything runs as expected, but some rules have become very long because the rating system has to compare the input against a lot of constant values, and I do not want to go back to using an external database.
Is there a standard data structure I should use to hold a lot of constant values? I know it is possible to build a Java structure for that, but my objective is to give the rules file to the sales team, who barely understand Java but are very good with ratios.
For example, I want to replace this with something cleaner:
if("A".equals($inpt)) { $outpt = 0.1; }
else if("B".equals($inpt)) { $outpt = 0.2; }
else if("C".equals($inpt)) { $outpt = 0.3; }

I'll have to guess a little. Let's say you have
rule "set output"
when
    $s: Something( $input: input )
    InToOut( input == $input, $output: output )
then
    modify( $s ){ setOutput( $output ) }
end
Your sales team members will surely understand if you give them the skeleton
rule "setInToOut"
salience 999999999
when
then
    insert( new InToOut( "A", 0.1 ) );
    insert( new InToOut( "B", 0.2 ) );
    ...
end
You can simplify this with a function.

Related

Dataflow GCS to BigQuery - How to output multiple rows per input?

Currently I am using the Google-provided gcs-text-to-bigquery template and feeding in a transform function to transform my JSONL file. The JSONL is pretty nested, and I want to output multiple rows per line of the newline-delimited JSON by doing some transforms.
For example:
{'state': 'FL', 'metropolitan_counties': [{'name': 'miami dade', 'population': 100000}, {'name': 'county2', 'population': 100000}, …], 'rural_counties': [{'name': 'county1', 'population': 100000}, {'name': 'county2', 'population': 100000}, …], 'total_state_pop': 10000000, …}
There will obviously be more counties than 2 and each state will have one of these lines. The output my boss wants is:
When I run the gcs-to-bq text transform, I end up with only one row per state (so I get miami dade county from FL, and then whatever the first county is in my transform for the next state). From what I have read, I think this is because the mapping in the template expects one output per JSON line. It seems I could use a ParDo (with a DoFn? I am not sure what that is), or there is a similar option with beam.Map in Python. There is some business logic in the transforms (right now about 25 lines of code, since the JSON has more columns than I showed, but those are pretty simple).
Any suggestions? The data is coming in tonight or tomorrow, and there will be hundreds of thousands of rows in the BQ table.
The template I am using is currently in Java, but I can translate it to Python pretty easily, as there are a lot of Python examples online. I know Python better, and I think it is easier given the different types (sometimes a field can be null); the examples I saw also look simpler. I am open to either, though.
Solving that in Python is somewhat straightforward. Here's one possibility (not fully tested):
from __future__ import absolute_import
import ast
import apache_beam as beam
from apache_beam.io import ReadFromText
from apache_beam.io import WriteToText
from apache_beam.options.pipeline_options import PipelineOptions
import os
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '/path/to/service_account.json'
pipeline_args = [
    '--job_name=test'
]
pipeline_options = PipelineOptions(pipeline_args)
def jsonify(element):
    return ast.literal_eval(element)
def unnest(element):
    state = element.get('state')
    state_pop = element.get('total_state_pop')
    if state is None or state_pop is None:
        return
    for type_ in ['metropolitan_counties', 'rural_counties']:
        for e in element.get(type_, []):
            name = e.get('name')
            pop = e.get('population')
            county_type = (
                'Metropolitan' if type_ == 'metropolitan_counties' else 'Rural'
            )
            if name is None or pop is None:
                continue
            yield {
                'State': state,
                'County_Type': county_type,
                'County_Name': name,
                'County_Pop': pop,
                'State_Pop': state_pop
            }
with beam.Pipeline(options=pipeline_options) as p:
    lines = p | ReadFromText('gs://url to file')
    schema = 'State:STRING,County_Type:STRING,County_Name:STRING,County_Pop:INTEGER,State_Pop:INTEGER'
    data = (
        lines
        | 'Jsonify' >> beam.Map(jsonify)
        | 'Unnest' >> beam.FlatMap(unnest)
        | 'Write to BQ' >> beam.io.Write(beam.io.BigQuerySink(
            'project_id:dataset_id.table_name', schema=schema,
            create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)
        )
    )
This will only succeed if you are working with batch data. If you have streaming data, then just change beam.io.Write(beam.io.BigQuerySink(...)) to beam.io.WriteToBigQuery(...).
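To see why beam.FlatMap gives several rows per input line, the flattening step can be tried locally without Beam. The sketch below is only an illustration: the sample line is made up, and it parses with json.loads instead of ast.literal_eval on the assumption that the real file is strict JSON (double quotes, null), which ast.literal_eval would reject.

```python
import json


def unnest(record):
    """Yield one flat row per county; a generator is the shape beam.FlatMap expects."""
    labels = {'metropolitan_counties': 'Metropolitan', 'rural_counties': 'Rural'}
    for key, county_type in labels.items():
        # `or []` covers both a missing key and an explicit JSON null.
        for county in record.get(key) or []:
            yield {
                'State': record['state'],
                'County_Type': county_type,
                'County_Name': county['name'],
                'County_Pop': county['population'],
                'State_Pop': record['total_state_pop'],
            }


# Made-up sample line (an assumption, not real data); note the JSON null.
line = (
    '{"state": "FL", "total_state_pop": 10000000, '
    '"metropolitan_counties": [{"name": "miami dade", "population": 100000}, '
    '{"name": "county2", "population": 100000}], '
    '"rural_counties": null}'
)

rows = list(unnest(json.loads(line)))
for row in rows:
    print(row)
```

In the pipeline, this would mean swapping the body of jsonify for json.loads(element); beam.FlatMap then emits each yielded dict as its own row, whereas beam.Map would emit one element per input line.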

How to train Matrix Factorization Model in Apache Spark MLlib's ALS Using Training, Test and Validation datasets

I want to use Apache Spark's ALS machine learning algorithm. I found that the best model should be chosen to get the best results. I have split the data into three sets (training, validation, and test), as suggested on forums.
I found the following code sample for training a model on these sets.
val ranks = List(8, 12)
val lambdas = List(1.0, 10.0)
val numIters = List(10, 20)
var bestModel: Option[MatrixFactorizationModel] = None
var bestValidationRmse = Double.MaxValue
var bestRank = 0
var bestLambda = -1.0
var bestNumIter = -1
for (rank <- ranks; lambda <- lambdas; numIter <- numIters) {
    val model = ALS.train(training, rank, numIter, lambda)
    val validationRmse = computeRmse(model, validation, numValidation)
    if (validationRmse < bestValidationRmse) {
        bestModel = Some(model)
        bestValidationRmse = validationRmse
        bestRank = rank
        bestLambda = lambda
        bestNumIter = numIter
    }
}
val testRmse = computeRmse(bestModel.get, test, numTest)
This code trains a model for each combination of rank, lambda, and number of iterations and compares the RMSE (root mean squared error) on the validation set. These iterations yield a better model, represented by the best (rank, lambda, numIter) combination. But after that it does not do much with the test set.
It just computes the RMSE on the test set.
My question is how the model can be tuned further with the test set data.
No, one would never fine-tune the model using test data. If you do that, it stops being your test data.
I'd recommend this section of Prof. Andrew Ng's famous course that discusses the model training process: https://www.coursera.org/learn/machine-learning/home/week/6
Depending on your observation of the error values with validation data set, you might want to add/remove features, get more data or make changes in the model, or maybe even try a different algorithm altogether. If the cross-validation and the test rmse look reasonable, then you are done with the model and you could use it for the purpose (some prediction, I would assume) that made you build it in the first place.
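The split of responsibilities can be sketched in a few lines of plain Python. Everything here is a toy under stated assumptions: fit_shrunken_mean is a hypothetical stand-in for ALS.train (a constant predictor shrunk toward zero by lam), and the three small lists are made-up data. The point is only the protocol: hyperparameters are chosen on the validation set, and the test set is touched once at the end.

```python
import math


def rmse(predictions, actuals):
    """Root mean squared error between two equally long sequences."""
    squared = sum((p - a) ** 2 for p, a in zip(predictions, actuals))
    return math.sqrt(squared / len(actuals))


def fit_shrunken_mean(values, lam):
    """Hypothetical stand-in for ALS.train: a constant predictor shrunk by lam."""
    return sum(values) / (len(values) + lam)


# Made-up data, for illustration only.
training = [3.0, 4.0, 5.0, 4.0]
validation = [4.0, 3.5, 4.5]
test = [4.0, 5.0, 3.0]

best_lam, best_val_rmse, best_model = None, float('inf'), None
for lam in [0.0, 1.0, 10.0]:
    model = fit_shrunken_mean(training, lam)
    val_rmse = rmse([model] * len(validation), validation)
    if val_rmse < best_val_rmse:  # selection uses validation data only
        best_lam, best_val_rmse, best_model = lam, val_rmse, model

# The test set is used exactly once, to report the final error.
test_rmse = rmse([best_model] * len(test), test)
print(best_lam, best_val_rmse, test_rmse)
```

If the test RMSE were fed back into the loop to pick lam, it would become a second validation set and would no longer give an unbiased estimate.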

Drools on web application

I'm very new to Drools, but I want to integrate it into my existing project. I'm using the Spring MVC framework. I successfully implemented the simple hello world example from the free Drools project. What I want to do now is:
1. Send a bean to the rules for evaluation.
2. Modify the bean depending on the rules.
3. Send it back to the controller to build a response for the user.
I'm already done with step 1: I was able to insert the bean into the rules. What I have in my rules for now is something like this:
global String $test;
rule "Excellent"
when
    $m: FLTBean ( listeningScore > 85 )
    $p: FLTBean ( listeningScore < 101 )
then
    $test = "Excellent";
    System.out.println( $test );
end
I don't know how to do steps 2 and 3. If possible, please give me some simple code that does this. I also want to have nested rules; an example with 2 nested rules would be great.
Thanks in advance.
There are a couple of ways you can do this, depending on whether you are using a stateless or stateful session. With a stateless session, the rule can modify the bean directly:
rule "Excellent"
no-loop
when
    $m: FLTBean ( listeningScore > 85 && listeningScore < 101 )
then
    $m.setRating("Excellent");
    update( $m );
end
In which case your Java code for a stateless session could be:
FLTBean flt = new FLTBean();
flt.setScore(91);
List<Object> facts = new ArrayList<Object>();
facts.add(flt);
ksession.execute(facts);
System.out.println("Result is " + flt.getRating());
If you are using a stateful session then you can insert facts, fire rules and then query facts out of the working memory. Your rule can insert new facts into the working memory like so:
rule "Excellent"
when
    $m: FLTBean ( listeningScore > 85 && listeningScore < 101 )
then
    insert( new FLTResult("Excellent") );
end
To get the result back out again, you can use the Drools API to find any objects in the working memory.
/** Provide a reference to the session and the class name
 * of the fact you are searching for.
 */
public Collection<Object> findFacts(final StatefulKnowledgeSession session,
                                    final String factClass) {
    ObjectFilter filter = new ObjectFilter() {
        @Override
        public boolean accept(Object object) {
            return object.getClass().getSimpleName().equals(factClass);
        }
    };
    Collection<Object> results = session.getObjects(filter);
    return results;
}
// And call that like so:
FLTBean flt = new FLTBean();
flt.setScore(91);
ksession.insert(flt);
ksession.fireAllRules();
Collection<Object> results = findFacts(ksession, "FLTResult");
One option is to write a query for the bean so that you can get it back from the rule engine once the rules have executed. This could become cumbersome if you have a lot of beans to fetch. The docs show an example of this approach.
Another option is to have a global collection in which you collect all the beans at the end of rule execution. Just make sure the "collect" rule has low salience so that it is the last to execute. The rule would look something like this:
rule 'collect results'
salience -500
when
    $beans : ArrayList() from collect( MyBean() )
then
    someGlobal.setBeans( $beans);
end
In fact, you can probably add directly to a global List if you want. You can also add some conditions with the collect if you don't want all the beans.

Find out which field matched term in custom score script

I am using a custom score query with a multiMatchQuery. Ultimately, what I want is simple and requires little explanation. In my Java custom score script, I want to be able to find out which field a result matched on.
Example:
If I search Starbucks and a result comes back with the name Starbucks then I want to be able to know that name.basic was the field that matched my query. If I search for coffee and starbucks comes back I want to be able to know that tags was the field that matched.
Is there any way to do this?
Search Query Code:
def basicSearchableSearch(t: String, lat: Double, lon: Double, r: Double, z: Int, bb: BoundingBox, max: Int): SearchResponse = {
    val multiQuery = filteredQuery(
        multiMatchQuery(t)
            //Matches businesses and POIs
            .field("name.basic").operator(Operator.OR)
            .field("name.no_space")
            //Businesses only
            .field("tags").boost(6f),
        geoBoundingBoxFilter("location")
            .bottomRight(bb.botRight.y,bb.botRight.x)
            .topLeft(bb.topLeft.y,bb.topLeft.x)
    )
    val customQuery = customScoreQuery(
        multiQuery
    )
        .script("customJavaScript")
        .lang("native")
        .param("lat",lat)
        .param("lon",lon)
        .param("zoom",z)
    global.Global.getClient().prepareSearch("searchable")
        .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
        .setQuery(customQuery)
        .setFrom(0).setSize(max)
        .execute()
        .actionGet();
}
It's only simple for simple queries. On complex queries, the question of which field matched is actually quite nontrivial, so I cannot think of any efficient way to do it.
Perhaps, you could consider moving your custom score calculation closer to the match. The multi_match query is basically a shortcut for a set of match queries on the same query string combined by a dis_max query. So, you are currently building something like this:
custom_score(
    filtered(
        dis_max(match_1, match_2, match_3)
    )
)
What you can do is to move your custom_score under dis_max and build something like this:
filtered(
    dis_max(
        custom_score_1(match_1),
        custom_score_2(match_2),
        custom_score_3(match_3)
    )
)
Obviously, this will be a somewhat different query, since dis_max will operate on custom score instead of original score.

Java String formatting solution

I have a string description of a company that was hand-typed, sloppily, by different users. Here is an example (focus on the dots, spaces, first letters, etc.):
XXXX is a Global menagement consulting,Technology services and
outsourcing company, with 257000people serving clients in more than
120 countries.. combining unparalleled experience, comprehensive
capabilities across all industries and business functions,and
extensive research on the worlds most successfull companies, XXXX
collaborates with clients to help them become high-performance
businesses and governments., the company generated net revenues of
US$27.9 Billion for the fiscal year ended 31.07.2012..
Now what I want is to format the string into a nicer version like this:
XXXX is a global management consulting, technology services and
outsourcing company, with 257,000 people serving clients in more than
120 countries. Combining unparalleled experience, comprehensive
capabilities across all industries and business functions, and
extensive research on the world’s most successful companies, XXXX
collaborates with clients to help them become high-performance
businesses and governments. The company generated net revenues of
US$27.9 billion for the fiscal year ended Aug. 31, 2012.
My question is: is there a library with ready-made methods that could do all the spelling corrections, unneeded space removal, etc.?
So far, I do it by replacing things like " ," with ", " and calling toUpperCase() when there is a "." in front, etc.
desc = desc.replace("  ", " ");
desc = desc.replace("..", ".");
desc = desc.replace(" .", ".");
desc = desc.replace(" ,", ", ");
desc = desc.replace(".,", ".");
desc = desc.replace(",.", ".");
desc = desc.replace(", .", ".");
desc = desc.replace("*", "");
I'm sure there is a cleaner and better version to do this. Using regex maybe??
Any solution would be appreciated.
If I were trying to solve your problem, I would probably read the text 1 char at a time and format it as I go. For example, in pseudocode...
while (has more chars){
    char letter = readChar();
    if (letter == ','){
        // checking for the ',.' combination
        letter = readChar();
        if (letter == '.'){
            // write out a '.' only
            out.print('.');
        }
        else {
            // it wasn't the ',.' combination, so you need to output both characters, whatever they are
            out.print(',');
            out.print(letter);
        }
    }
    else if (another letter you want to filter){
        // etc.
    }
    else {
        // doesn't match any of the filters, so just output the letter
        out.print(letter);
    }
}
Basically, if you read the text 1 char at a time, you can detect any of your chosen formatting problems as you go and correct them immediately. This provides a performance improvement, as you only read over the text string once (not 8 times, as you currently do), and it lets you add as many different or complex formatting changes as you want. The downside, however, is that you need to write the logic yourself rather than relying on built-in functions.
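A minimal single-pass version of this idea, sketched in Python for brevity (the same loop carries over to Java with a StringBuilder). The function name tidy_punctuation and the exact rules are assumptions chosen to mirror the replace() calls in the question; it does not fix spelling, capitalization, or number formatting.

```python
def tidy_punctuation(text):
    """Normalize spaces, dots, and commas in one pass over the input."""
    out = []
    for ch in text:
        if ch == '*':
            # Drop asterisks entirely.
            continue
        if ch == ' ':
            # Collapse runs of spaces and skip leading spaces.
            if out and out[-1] != ' ':
                out.append(ch)
            continue
        if ch in '.,':
            # Remove any space sitting in front of the punctuation mark.
            while out and out[-1] == ' ':
                out.pop()
            if out and out[-1] in '.,':
                # A run such as '..', '.,', ',.' or ', .' becomes one mark;
                # a dot anywhere in the run wins over a comma.
                if ch == '.':
                    out[-1] = '.'
                continue
            out.append(ch)
            continue
        out.append(ch)
    return ''.join(out)


print(tidy_punctuation("120  countries.. combining"))
print(tidy_punctuation("businesses and governments., the company"))
```

Because digits are passed through untouched, values such as US$27.9 and 31.07.2012 keep their inner dots; only runs of adjacent punctuation are merged.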
