I'm working with wit.ai's Duckling (https://duckling.wit.ai/), but I depend on it and call it from within my Java application. I have no Clojure experience.
I am able to run Duckling's parse method, however I can't figure out how to pass in the date/time to be used as context for the time and date resolution.
Here is the function:
(defn parse
"Public API. Parses text using given module. If dims are provided as a list of
keywords referencing token dimensions, only these dimensions are extracted.
Context is a map with a :reference-time key. If not provided, the system
current date and time is used."
([module text]
(parse module text []))
([module text dims]
(parse module text dims (default-context :now)))
([module text dims context]
(->> (analyze text context module (map (fn [dim] {:dim dim :label dim}) dims) nil)
:winners
(map #(assoc % :value (engine/export-value % {})))
(map #(select-keys % [:dim :body :value :start :end :latent])))))
The testing corpus defines the context date at the top of the file, and that context is passed into the parse function when the corpus is tested.
{:reference-time (time/t -2 2013 2 12 4 30 0)
:min (time/t -2 1900)
:max (time/t -2 2100)}
Here is my Java code:
public void extract(String input) {
IFn require = Clojure.var("clojure.core", "require");
require.invoke(Clojure.read("duckling.core"));
Clojure.var("duckling.core", "load!").invoke();
LazySeq o = (LazySeq) Clojure.var("duckling.core", "parse").invoke("en$core", input, dims);
}
My question is, how do I insert a specific date/time in as a parameter to the parse function?
EDIT 1 Looking at it some more, it looks like this is a datetime object. Duckling depends on clj-time 0.8.0, but I can't figure out how to create that same object in Java by calling out to clj-time.
Duckling has its own datetime helper function ('t') in the duckling.time.obj namespace, and calling it is how I get the datetime object that parse expects.
private final Keyword REFERENCE_TIME = Keyword.intern("reference-time");
private final Keyword MIN = Keyword.intern("min");
private final Keyword MAX = Keyword.intern("max");
public void extract(String input) {
PersistentArrayMap datemap = (PersistentArrayMap) Clojure.var("duckling.time.obj", "t").invoke(-5, 2017, 2, 21, 23, 30, 0);
PersistentArrayMap minMap = (PersistentArrayMap) Clojure.var("duckling.time.obj", "t").invoke(-5, 1900);
PersistentArrayMap maxMap = (PersistentArrayMap) Clojure.var("duckling.time.obj", "t").invoke(-5, 2100);
Object[] contextArr = new Object[6];
contextArr[0] = REFERENCE_TIME;
contextArr[1] = datemap;
contextArr[2] = MIN;
contextArr[3] = minMap;
contextArr[4] = MAX;
contextArr[5] = maxMap;
PersistentArrayMap cljContextMap = PersistentArrayMap.createAsIfByAssoc(contextArr);
LazySeq results = (LazySeq) Clojure.var("duckling.core", "parse").invoke("en$core", input, dims, cljContextMap);
}
The only thing left to do is to create the datemap with dynamic values instead of hardcoded ones.
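One way to get dynamic values is to derive the seven arguments from a java.time.ZonedDateTime. This is a minimal sketch, assuming (as the corpus call suggests) that 't' takes the UTC offset in whole hours followed by year, month, day, hour, minute and second. DucklingTimeArgs and toArgs are hypothetical names, not part of Duckling.

```java
import java.time.ZonedDateTime;

class DucklingTimeArgs {
    // Returns {offsetHours, year, month, day, hour, minute, second} as boxed Integers.
    // Assumption: the zone offset is a whole number of hours; any remainder is dropped.
    static Object[] toArgs(ZonedDateTime when) {
        int offsetHours = when.getOffset().getTotalSeconds() / 3600;
        return new Object[] {
            offsetHours,
            when.getYear(),
            when.getMonthValue(),
            when.getDayOfMonth(),
            when.getHour(),
            when.getMinute(),
            when.getSecond()
        };
    }
}
```

With a = DucklingTimeArgs.toArgs(ZonedDateTime.now()), the hardcoded call would become Clojure.var("duckling.time.obj", "t").invoke(a[0], a[1], a[2], a[3], a[4], a[5], a[6]).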
I'm currently developing some functionality that needs to either subtract or add time to a Calendar class instance. The time I need to add/sub is in a properties file and could be any of these formats:
30,sec
90,sec
1.5,min
2,day
2.333,day
Let's assume addition for simplicity. I would read those values in a String array:
String[] propertyValues = "30,sec".split(",");
I would read the second value in that comma-separated pair, and map that to the relevant int in the Calendar class (so for example, "sec" becomes Calendar.SECOND, "min" becomes Calendar.MINUTE):
int calendarMajorModifier = mapToCalendarClassIntValues(propertyValues[1]);
To then do the actual operation I would do it as simple as:
cal.add(calendarMajorModifier, Integer.parseInt(propertyValues[0]));
This works and it's not overly complicated. The issue now is fractional values (2.333,day for example): how would you deal with them?
String[] propertyValues = "2.333,day".split(",");
As you can imagine the code becomes quite hairy (I haven't actually written it yet, so please ignore syntax mistakes)
float timeComponent = Float.parseFloat(propertyValues[0]);
if (calendarMajorModifier == Calendar.DATE) {
int dayValue = (int) timeComponent; // whole days only
cal.add(calendarMajorModifier, dayValue);
timeComponent = (timeComponent - dayValue) * 24; //Need to convert a fraction of a day to hours
if (timeComponent != 0) {
calendarMajorModifier = Calendar.HOUR;
}
}
if (calendarMajorModifier == Calendar.HOUR) {
int hourValue = (int) timeComponent; // whole hours only
cal.add(calendarMajorModifier, hourValue);
timeComponent = (timeComponent - hourValue) * 60; //Need to convert a fraction of an hour to minutes
if (timeComponent != 0) {
calendarMajorModifier = Calendar.MINUTE;
}
}
... etc
Granted, I can see how there may be a refactoring opportunity, but still seems like a very brute-forceish solution.
I am using the Calendar class for these operations, but it could technically be any class, as long as I can convert between them (e.g. by getting the long value and using that), because the function needs to return a Calendar. Ideally the class should also be part of the JDK to avoid third-party licensing issues :).
Side note: I suggested changing the format to something like yy:MM:ww:dd:hh:mm:ss to avoid floating values but that didn't pan out. I also suggested something like 2,day,5,hour, but again, ideally needs to be format above.
I'd transform the value into the smallest unit and add that:
float timeComponent = Float.parseFloat(propertyValues[0]);
int unitFactor = mapUnitToFactor(propertyValues[1]);
cal.add(Calendar.SECOND, (int)(timeComponent * unitFactor));
and mapUnitToFactor would be something like:
int mapUnitToFactor(String unit)
{
if ("sec".equals(unit))
return 1;
if ("min".equals(unit))
return 60;
if ("hour".equals(unit))
return 3600;
if ("day".equals(unit))
return 24*3600;
throw new IllegalArgumentException("Unknown unit: " + unit);
}
So for example 2.333 days would be turned into 201571 seconds.
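Put together, the approach might look like the sketch below. OffsetParser and its method names are made up for illustration, and it rounds to the nearest second instead of truncating. It also treats a day as exactly 86400 seconds, so it ignores daylight-saving shifts.

```java
import java.util.Calendar;

class OffsetParser {
    // Number of seconds in one unit of the given kind.
    static long unitToSeconds(String unit) {
        switch (unit) {
            case "sec":  return 1L;
            case "min":  return 60L;
            case "hour": return 3_600L;
            case "day":  return 86_400L;
            default: throw new IllegalArgumentException("Unknown unit: " + unit);
        }
    }

    // Parses "amount,unit" (e.g. "2.333,day") into whole seconds, rounded to nearest.
    static int toSeconds(String property) {
        String[] parts = property.split(",");
        double amount = Double.parseDouble(parts[0].trim());
        long seconds = Math.round(amount * unitToSeconds(parts[1].trim()));
        return Math.toIntExact(seconds); // throws if the value does not fit in an int
    }

    // Adds the parsed offset to the calendar in place.
    static void apply(Calendar cal, String property) {
        cal.add(Calendar.SECOND, toSeconds(property));
    }
}
```

Subtraction works the same way with a negated result, e.g. cal.add(Calendar.SECOND, -OffsetParser.toSeconds("90,sec")).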
Please bear with me: I have been using Java for 2 days and I've hit a bit of a hurdle.
I am using Talend to perform a count using the tMemorize and tJava components, but this may be a question for a Java developer. I have previously posted an issue with using this method within a joblet, but my new issue is more Java related. The earlier question can be viewed here:
using Joblets in talend with tMemorize and tJavaFlex
I need to reference an array generated by the Java code that Talend produces. I cannot reference this element directly because of an issue with using tJavaFlex within multiple joblets: Talend renames joblets each time they are used.
It may be useful to understand how my code works in normal circumstances (excluding the use of joblets).
int counter = 1;
if (EnquiryID_mem_1_tMemorizeRows_1[0].equals(EnquiryID_mem_1_tMemorizeRows_1[1]))
{
counter++;
}
row3.counter = counter;
EnquiryID_mem_1_tMemorizeRows_1[0] and EnquiryID_mem_1_tMemorizeRows_1[1] are what I need to reference.
To overcome this I have written the following code.
String string = currentComponent;
String[] parts = string.split("_");
String part1 = parts[0];
String part2 = parts[1];
String joblet = part1+'_'+part2;
String newrow = "EnquiryID_"+joblet+"_tMemorizeRows_1";
if (newrow[0].equals(newrow[1]))
{
counter++;
}
row3.counter = counter;
However I get the following error:
The type of the expression must be an array type but it resolved to String
I understand that the newrow variable is a String and that I am using it to reference an array. I have searched far and wide online for a solution but I cannot find one. Can someone help me please?
Thank you
Here is the Talend-generated code that my code should reference. The excerpt runs from the component I am currently using up to the point where currentComponent changes to one I do not use directly.
currentComponent = "mem_1_tMemorizeRows_1";
// row1
// row1
if (execStat) {
runStat.updateStatOnConnection("row1" + iterateId,
1, 1);
}
for (int i_mem_1_tMemorizeRows_1 = iRows_mem_1_tMemorizeRows_1 - 1; i_mem_1_tMemorizeRows_1 > 0; i_mem_1_tMemorizeRows_1--) {
EnquiryID_mem_1_tMemorizeRows_1[i_mem_1_tMemorizeRows_1] = EnquiryID_mem_1_tMemorizeRows_1[i_mem_1_tMemorizeRows_1 - 1];
}
EnquiryID_mem_1_tMemorizeRows_1[0] = row1.EnquiryID;
mem_1_row2 = row1;
tos_count_mem_1_tMemorizeRows_1++;
/**
* [mem_1_tMemorizeRows_1 main ] stop
*/
/**
* [mem_1_tJavaFlex_1 main ] start
*/
currentComponent = "mem_1_tJavaFlex_1";
// mem_1_row2
// mem_1_row2
if (execStat) {
runStat.updateStatOnConnection("mem_1_row2"
+ iterateId, 1, 1);
}
mem_1_row3.QuoteID = mem_1_row2.QuoteID;
mem_1_row3.EnquiryID = mem_1_row2.EnquiryID;
if (EnquiryID_mem_1_tMemorizeRows_1[0]
.equals(EnquiryID_mem_1_tMemorizeRows_1[1])) {
rower++;
}
mem_1_row3.rower = rower;
tos_count_mem_1_tJavaFlex_1++;
/**
* [mem_1_tJavaFlex_1 main ] stop
*/
/**
* [mem_1_tMap_1 main ] start
*/
currentComponent = "mem_1_tMap_1";
Thank you to everyone who has helped so far.
This
if (newrow[0].equals(newrow[1]))
tries to pick the first and second elements of the array newrow. Unfortunately, you declare newrow as
String newrow = "EnquiryID_"+joblet+"_tMemorizeRows_1";
which is a String, not an array. Index syntax does not work on a String, so that if check cannot compile, whatever you intend it to do.
EDIT:
If you are trying to pick a char out of a String, you need to use charAt(index).
If you want to treat newrow as an array you have to declare it as such and pass appropriate elements to it.
EDIT 2: I think you are trying to pass the actual data in joblet to newrow in this:
String newrow = "EnquiryID_"+joblet+"_tMemorizeRows_1";
But what happens here is that everything is concatenated into one String, so you need to figure out where the data you are looking for (parts[0] and parts[1], I assume) sits in that String, i.e. which indices contain the values, so that you can pull them out.
An example of how newrow will look after that assignment:
"EnquiryID_part1_part2_tMemorizeRows_1"
So "part1" will start at index 10 and will end at index 14. I am just using "part1" here, but it would have whatever value is stored in part1 variable.
If you can show us what you expect it to look like that would help.
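To make the difference concrete, here is a small sketch. A String holding a variable's name only gives you characters. To reach an array through a computed name, the array has to be stored under that name somewhere, for example in a Map. That Map is my own assumption: Talend's generated code does not create one, so you would have to register the arrays yourself. NameLookupDemo and its methods are hypothetical.

```java
import java.util.Map;

class NameLookupDemo {
    // Rebuilds the variable-style name from a component name such as "mem_1_tJavaFlex_1".
    static String arrayNameFor(String currentComponent) {
        String[] parts = currentComponent.split("_");
        String joblet = parts[0] + "_" + parts[1];
        return "EnquiryID_" + joblet + "_tMemorizeRows_1";
    }

    // Looks the array up by name and compares its first two elements.
    static boolean firstTwoEqual(Map<String, String[]> arraysByName, String name) {
        String[] rows = arraysByName.get(name);
        return rows != null && rows.length > 1
                && rows[0] != null && rows[0].equals(rows[1]);
    }
}
```

On the String itself, name.charAt(0) is a single character, and "EnquiryID_part1_part2_tMemorizeRows_1".substring(10, 15) pulls out "part1".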
I'm not super familiar with talend (understand: not at all). But it sounds like you have some sort of attribute of a generated class (say myGeneratedObject) and you want to access it by name.
In that case, you could do something like:
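String newrow = "EnquiryID_" + joblet + "_tMemorizeRows_1";
// Needs java.lang.reflect.Field and java.lang.reflect.Array.
// getField only finds public fields; field.get returns the field's current value.
Field field = myGeneratedObject.getClass().getField(newrow);
Object value = field.get(myGeneratedObject);
if (value != null && value.getClass().isArray()) {
if (Array.get(value, 0).equals(Array.get(value, 1))) {
counter++;
}
}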
It all depends how you access that field really and where it's declared. But if it's an attribute of an object, then the code above should work, +/- contextual adjustments due to my lack of knowledge of the exact problem.
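Here is a self-contained version of that idea that you can run outside Talend. It assumes the array is a public field of some object. Generated is a hypothetical stand-in for the generated class, and this will not work if the array is only a local variable inside a generated method.

```java
import java.lang.reflect.Array;
import java.lang.reflect.Field;

class ReflectionLookupDemo {
    // Hypothetical stand-in for the generated class; getField only sees public fields.
    static class Generated {
        public String[] EnquiryID_mem_1_tMemorizeRows_1 = {"42", "42"};
    }

    // Reads the named field from target and compares its first two array elements.
    static boolean firstTwoEqual(Object target, String fieldName) {
        try {
            Field field = target.getClass().getField(fieldName);
            Object value = field.get(target);
            if (value == null || !value.getClass().isArray() || Array.getLength(value) < 2) {
                return false;
            }
            Object first = Array.get(value, 0);
            return first != null && first.equals(Array.get(value, 1));
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("No accessible field named " + fieldName, e);
        }
    }
}
```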
I am working on a project that has me really confused right now.
Given is a List<TimeInterval> list that contains elements of the class TimeInterval, which looks like this:
public class TimeInterval {
private static final Instant CONSTANT = new Instant(0);
private final LocalDate validFrom;
private final LocalDate validTo;
public TimeInterval(LocalDate validFrom, LocalDate validTo) {
this.validFrom = validFrom;
this.validTo = validTo;
}
public boolean isValid() {
try {
return toInterval() != null;
}
catch (IllegalArgumentException e) {
return false;
}
}
public boolean overlapsWith(TimeInterval timeInterval) {
return this.toInterval().overlaps(timeInterval.toInterval());
}
private Interval toInterval() throws IllegalArgumentException {
return new Interval(validFrom.toDateTime(CONSTANT), validTo.toDateTime(CONSTANT));
}
The intervals are generated using the following:
TimeInterval tI = new TimeInterval(ld_dateValidFrom, ld_dateValidTo);
The intervals within the list may overlap:
|--------------------|
|-------------------|
This should result in:
|-------||-----------||------|
It should NOT result in:
|--------|-----------|-------|
Generally speaking in numbers:
I1: 2014-01-01 - 2014-01-30
I2: 2014-01-07 - 2014-01-15
That should result in:
I1: 2014-01-01 - 2014-01-06
I2: 2014-01-07 - 2014-01-15
I3: 2014-01-16 - 2014-01-30
I'm using the Joda-Time API, but since I'm using it for the first time I don't really have a clue how to solve my problem. I already had a look at the methods overlap() / overlapWith(), but I still don't get it.
Your help is much appreciated!
UPDATE
I found something similar to my problem >here< but that doesn't help me for now.
I tried it over and over again, and even though it worked for the first intervals I tested, it doesn't actually work the way I wanted it to.
Here are the intervals I have been given:
2014-10-20 ---> 2014-10-26
2014-10-27 ---> 2014-11-02
2014-11-03 ---> 2014-11-09
2014-11-10 ---> 2014-11-16
2014-11-17 ---> 9999-12-31
This is the function I am using to generate the new intervals:
private List<Interval> cleanIntervalList(List<Interval> sourceList) {
TreeMap<DateTime, Integer> endPoints = new TreeMap<DateTime, Integer>();
// Fill the treeMap from the TimeInterval list. For each start point,
// increment the value in the map, and for each end point, decrement it.
for (Interval interval : sourceList) {
DateTime start = interval.getStart();
if (endPoints.containsKey(start)) {
endPoints.put(start, endPoints.get(start)+1);
}
else {
endPoints.put(start, 1);
}
DateTime end = interval.getEnd();
if (endPoints.containsKey(end)) {
endPoints.put(end, endPoints.get(start)-1);
}
else {
endPoints.put(end, 1);
}
}
System.out.println(endPoints);
int curr = 0;
DateTime currStart = null;
// Iterate over the (sorted) map. Note that the first iteration is used
// merely to initialize curr and currStart to meaningful values, as no
// interval precedes the first point.
List<Interval> targetList = new LinkedList<Interval>();
for (Entry<DateTime, Integer> e : endPoints.entrySet()) {
if (curr > 0) {
if (e.getKey().equals(endPoints.lastEntry().getKey())){
targetList.add(new Interval(currStart, e.getKey()));
}
else {
targetList.add(new Interval(currStart, e.getKey().minusDays(1)));
}
}
curr += e.getValue();
currStart = e.getKey();
}
System.out.println(targetList);
return targetList;
}
This is what the output actually looks like:
2014-10-20 ---> 2014-10-25
2014-10-26 ---> 2014-10-26
2014-10-27 ---> 2014-11-01
2014-11-02 ---> 2014-11-02
2014-11-03 ---> 2014-11-08
2014-11-09 ---> 2014-11-09
2014-11-10 ---> 2014-11-15
2014-11-16 ---> 2014-11-16
2014-11-17 ---> 9999-12-31
And this is what the output SHOULD look like:
2014-10-20 ---> 2014-10-26
2014-10-27 ---> 2014-11-02
2014-11-03 ---> 2014-11-09
2014-11-10 ---> 2014-11-16
2014-11-17 ---> 9999-12-31
Since there is no overlap in the original intervals, I don't get why it produces stuff like
2014-10-26 ---> 2014-10-26
2014-11-02 ---> 2014-11-02
2014-11-09 ---> 2014-11-09
etc
I've been trying to fix this all day long and I'm still not getting there :( Any more help is much appreciated!
Half-Open
I suggest you reconsider the terms of your goal. Joda-Time wisely uses the "Half-Open" approach to defining a span of time. The beginning is inclusive while the ending is exclusive. For example, a week starts at the beginning of the first day and runs up to, but not including, the first moment of the next week. Half-open proves to be a quite helpful and natural way to handle spans of time, as discussed in other answers.
Using this Half-Open approach for your example, you do indeed want this result:
|--------|-----------|-------|
I1: 2014-01-01 - 2014-01-07
I2: 2014-01-07 - 2014-01-16
I3: 2014-01-16 - 2014-01-30
Search StackOverflow for "half-open" to find discussion and examples, such as this answer of mine.
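As a rough illustration of why Half-Open is convenient, here is a date-only sketch. It uses java.time.LocalDate from the JDK rather than Joda-Time so that it runs without extra libraries, and HalfOpenRange is a hypothetical class, not a Joda-Time type. An inclusive last day becomes an exclusive end by adding one day. Back-to-back ranges then share an endpoint without overlapping.

```java
import java.time.LocalDate;

class HalfOpenRange {
    final LocalDate start;        // inclusive
    final LocalDate endExclusive; // exclusive

    HalfOpenRange(LocalDate start, LocalDate endExclusive) {
        if (endExclusive.isBefore(start)) {
            throw new IllegalArgumentException("end is before start");
        }
        this.start = start;
        this.endExclusive = endExclusive;
    }

    // Converts a closed range (first day .. last day) to half-open form.
    static HalfOpenRange ofInclusive(LocalDate first, LocalDate last) {
        return new HalfOpenRange(first, last.plusDays(1));
    }

    // True when the two ranges share at least one day.
    boolean overlaps(HalfOpenRange other) {
        return start.isBefore(other.endExclusive) && other.start.isBefore(endExclusive);
    }

    // True when one range ends exactly where the other begins.
    boolean abuts(HalfOpenRange other) {
        return endExclusive.equals(other.start) || other.endExclusive.equals(start);
    }
}
```

With the weekly ranges from the Question, 2014-10-20..2014-10-26 and 2014-10-27..2014-11-02 abut and do not overlap, so nothing needs to be split.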
Joda-Time Interval
Joda-Time has an excellent Interval class to represent a span of time defined by a pair of endpoints on the timeline. That Interval class offers overlap, overlaps (sic), abuts, and gap methods. Note in particular the overlap method that generates a new Interval when comparing two others; that may be key to your solution.
But unfortunately, that class only works with DateTime objects and not LocalDate (date-only, no time-of-day or time zone). Perhaps that lack of support for LocalDate is why you or your team invented that TimeInterval class. But I suggest that, rather than using that custom class, you consider using DateTime objects with Joda-Time's classes. I'm not 100% certain that is better than rolling your own date-only interval class (I've been tempted to do that), but my gut tells me so.
To focus on days rather than day+time, on your DateTime objects call the withTimeAtStartOfDay method to adjust the time portion to the first moment of the day. That first moment is usually 00:00:00.000 but not necessarily due to Daylight Saving Time (DST) and possibly other anomalies. Just be careful and consistent with the time zone; perhaps use UTC throughout.
Here is some example code in Joda-Time 2.5 using the values suggested in the Question. In these particular lines, the call to withTimeAtStartOfDay may be unnecessary, as Joda-Time defaults to the first moment of the day when no time-of-day is provided. But I suggest using those calls to withTimeAtStartOfDay, as they make your code self-documenting as to your intent and keep all your day-focused use of DateTime consistent.
Interval i1 = new Interval( new DateTime( "2014-01-01", DateTimeZone.UTC ).withTimeAtStartOfDay(), new DateTime( "2014-01-30", DateTimeZone.UTC ).withTimeAtStartOfDay() );
Interval i2 = new Interval( new DateTime( "2014-01-07", DateTimeZone.UTC ).withTimeAtStartOfDay(), new DateTime( "2014-01-15", DateTimeZone.UTC ).withTimeAtStartOfDay() );
From there, apply the logic suggested in the other answers.
Here is a suggested algorithm, based on the answer you have already found. First, you need to sort all the end points of the intervals.
TreeMap<LocalDate,Integer> endPoints = new TreeMap<LocalDate,Integer>();
This map's keys - which are sorted since this is a TreeMap - will be the LocalDate objects at the start and end of your intervals. They are mapped to a number that represents the number of end points at this date subtracted from the number of start points at this date.
Now traverse your list of TimeIntervals. For each one, for the start point, check whether it is already in the map. If so, add one to the Integer. If not, add it to the map with the value of 1.
For the end point of the same interval, if it exists in the map, subtract 1 from the Integer. If not, create it with the value of -1.
Once you finished filling endPoints, create a new list for the "broken up" intervals you will create.
List<TimeInterval> newList = new ArrayList<TimeInterval>();
Now start iterating over endPoints. If you had at least one interval in the original list, you'll have at least two points in endPoints. You take the first, and keep the key (LocalDate) in a variable currStart, and its associated Integer in another variable (curr or something).
Loop starting from the second element until the end. At each iteration:
If curr > 0, create a new TimeInterval starting at currStart and ending at the current key date. Add it to newList.
Add the Integer value to curr.
Assign the key as your next currStart.
And so on until the end.
What happens here is this: ordering the dates makes sure you have no overlaps. Each new interval is guaranteed not to overlap with any new one since they have exclusive and sorted end points. The trick here is to find the spaces in the timeline which are not covered by any intervals at all. Those empty spaces are characterized by the fact that your curr is zero, as it means that all the intervals that started before the current point in time have also ended. All the other "spaces" between the end points are covered by at least one interval so there should be a corresponding new interval in your newList.
Here is an implementation, but please notice that I did not use Joda Time (I don't have it installed at the moment, and there is no particular feature here that requires it). I created my own rudimentary TimeInterval class:
public class TimeInterval {
private final Date validFrom;
private final Date validTo;
public TimeInterval(Date validFrom, Date validTo) {
this.validFrom = validFrom;
this.validTo = validTo;
}
public Date getStart() {
return validFrom;
}
public Date getEnd() {
return validTo;
}
@Override
public String toString() {
return "[" + validFrom + " - " + validTo + "]";
}
}
The important thing is to add the accessor methods for the start and end to be able to perform the algorithm as I wrote it. In reality, you should probably use Joda's Interval or implement their ReadableInterval if you want to use their extended features.
Now for the method itself. For this to work with yours you'll have to change all Date to LocalDate:
public static List<TimeInterval> breakOverlappingIntervals( List<TimeInterval> sourceList ) {
TreeMap<Date,Integer> endPoints = new TreeMap<>();
// Fill the treeMap from the TimeInterval list. For each start point, increment
// the value in the map, and for each end point, decrement it.
for ( TimeInterval interval : sourceList ) {
Date start = interval.getStart();
if ( endPoints.containsKey(start)) {
endPoints.put(start, endPoints.get(start) + 1);
} else {
endPoints.put(start, 1);
}
Date end = interval.getEnd();
if ( endPoints.containsKey(end)) {
endPoints.put(end, endPoints.get(end) - 1);
} else {
endPoints.put(end, -1);
}
}
int curr = 0;
Date currStart = null;
// Iterate over the (sorted) map. Note that the first iteration is used
// merely to initialize curr and currStart to meaningful values, as no
// interval precedes the first point.
List<TimeInterval> targetList = new ArrayList<>();
for ( Map.Entry<Date,Integer> e : endPoints.entrySet() ) {
if ( curr > 0 ) {
targetList.add(new TimeInterval(currStart, e.getKey()));
}
curr += e.getValue();
currStart = e.getKey();
}
return targetList;
}
(Note that it would probably be more efficient to use a mutable Integer-like object rather than Integer here, but I opted for clarity).
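For reference, the same sweep can be written more compactly with TreeMap.merge, which updates each counter in a single call. This sketch uses java.time.LocalDate and a plain two-element array {start, endExclusive} per interval so that it runs on the JDK alone. IntervalSplitter is a hypothetical name, and the ends are treated as exclusive.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class IntervalSplitter {
    // Each interval is {start, endExclusive}. Returns the non-overlapping pieces in order.
    static List<LocalDate[]> split(List<LocalDate[]> source) {
        // +1 for every interval starting at a date, -1 for every interval ending there.
        TreeMap<LocalDate, Integer> endPoints = new TreeMap<>();
        for (LocalDate[] interval : source) {
            endPoints.merge(interval[0], 1, Integer::sum);
            endPoints.merge(interval[1], -1, Integer::sum);
        }
        List<LocalDate[]> result = new ArrayList<>();
        int open = 0;               // intervals covering the span that ends at the current key
        LocalDate currStart = null;
        for (Map.Entry<LocalDate, Integer> e : endPoints.entrySet()) {
            if (open > 0) {
                result.add(new LocalDate[] {currStart, e.getKey()});
            }
            open += e.getValue();
            currStart = e.getKey();
        }
        return result;
    }
}
```

Feeding it {2014-01-01, 2014-01-30} and {2014-01-07, 2014-01-15} yields three pieces that break at 2014-01-07 and 2014-01-15.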
I'm not fully up to speed on Joda; I'll need to read up on that if you want an overlap-specific solution.
However, this is possible using only the dates. This is mostly pseudocode, but should bring the point across. I've also added notation so you can tell what the intervals look like. There's also some confusion for me as to whether I should be adding 1 or subtracting 1 for an overlap, so I erred on the side of caution by pointing outward from the overlap (-1 for start, +1 for end).
TimeInterval a, b; //a and b are our two starting intervals
TimeInterval c = null; //in case we have a third interval
if(a.start > b.start) { //move the earliest interval to a, latest to b, if necessary
c = a;
a = b;
b = c;
c = null;
}
if(b.start > a.start && b.start < a.end) { //case where b starts in the a interval
if(b.end > a.end) { //b ends after a |AA||AB||BB|
c = new TimeInterval(a.end + 1, b.end);//we need time interval c
b.end = a.end;
a.end = b.start - 1;
}
else if (b.end < a.end) { //b ends before a |AA||AB||AA|
c = new TimeInterval(b.end + 1, a.end);//we need time interval c
a.end = b.start - 1;
}
else { //b and a end at the same time, we don't need c |AA||AB|
c = null;
a.end = b.start - 1;
}
}
else if(a.start == b.start) { //case where b starts same time as a
if(b.end > a.end) { //b ends after a |AB||B|
b.start = a.end + 1;
a.end = a.end;
}
else if(b.end < a.end) { //b ends before a |AB||A|
b.start = b.end + 1;
b.end = a.end;
a.end = b.start - 1; //a keeps only the shared part, b is now the remainder
}
else { //b and a are the same |AB|
b = null;
}
}
else {
//no overlap
}
I am a beginner with Jess rules, so I can't work out how to use them. I have read a lot of tutorials but I am still confused.
So I have this code:
Date choosendate = "2013-05-05";
Date date1 = "2013-05-10";
Date date2 = "2013-05-25";
Date date3 = "2013-05-05";
int var = 0;
if (choosendate.compareTo(date1)==0)
{
var = 1;
}
else if (choosendate.compareTo(date2)==0)
{
var = 2;
}
else if (choosendate.compareTo(date3)==0)
{
var = 3;
}
How could I do this with Jess rules?
I would like to write a Jess rule that takes the dates, compares them and gives the variable var back to Java. Could you give me a simple example so I can understand it?
This problem isn't a good fit for Jess as written (the Java code is short and efficient as-is) but I can show you a solution that could be adapted to other more complex situations. First, you would need to define a template to hold Date, int pairs:
(deftemplate pair (slot date) (slot score))
Then you could create some facts using the template. These are somewhat equivalent to your date1, date2, etc, except they associate each date with the corresponding var value:
(import java.util.Date)
(assert (pair (date (new Date 113 4 10)) (score 1)))
(assert (pair (date (new Date 113 4 25)) (score 2)))
(assert (pair (date (new Date 113 4 5)) (score 3)))
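In case the numbers look odd: the (deprecated) java.util.Date constructor used here takes the year minus 1900 and a zero-based month, so 113 4 10 means 10 May 2013. The following sketch shows the mapping from ordinary calendar values. LegacyDateDemo and its method names are made up for the illustration.

```java
import java.util.Date;
import java.util.GregorianCalendar;

class LegacyDateDemo {
    // Deprecated constructor: year is an offset from 1900 and month is zero-based.
    @SuppressWarnings("deprecation")
    static Date legacy(int year, int month, int day) {
        return new Date(year - 1900, month - 1, day);
    }

    // Same instant (midnight in the default time zone) without the deprecated API.
    static Date viaCalendar(int year, int month, int day) {
        return new GregorianCalendar(year, month - 1, day).getTime();
    }
}
```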
We can define a global variable to hold the final, computed score (makes it easier to get from Java.) This is the equivalent of your var variable:
(defglobal ?*var* = 0)
Assuming that the "chosen date" is going to be in an ordered fact chosendate, we could write a rule like the following. It replaces your chain of if statements, and will compare your chosen date to all the dates in working memory until it finds a match, then stop:
(defrule score-date
(chosendate ?d)
(pair (date ?d) (score ?s))
=>
(bind ?*var* ?s)
(halt))
OK, now, all the code above goes in a file called dates.clp. The following Java code will make use of it (the call to Rete.watchAll() is included so you can see some interesting trace output; you'd leave that out in a real program):
import jess.*;
// ...
// Get Jess ready
Rete engine = new Rete();
engine.batch("dates.clp");
engine.watchAll();
// Plug in the "chosen date"
Date chosenDate = new Date(113, 4, 5);
Fact fact = new Fact("chosendate", engine);
fact.setSlotValue("__data", new Value(new ValueVector().add(chosenDate), RU.LIST));
engine.assertFact(fact);
// Run the rule and report the result
int count = engine.run();
if (count > 0) {
int score = engine.getGlobalContext().getVariable("*var*").intValue(null);
System.out.println("Score = " + score);
} else {
System.out.println("No matching date found.");
}
As I said, this isn't a great fit, because the resulting code is larger and more complex than your original. Where using a rule engine makes sense is if you've got multiple rules that interact; such a Jess program has no more overhead than this, and so fairly quickly starts to look like a simplification compared to equivalent Java code. Good luck with Jess!
I have a very simple code taken from this example, where I am using the Lin, Path and Wu-Palmer similarity measures to compute the similarity between two words. My code is as follows:
import edu.cmu.lti.lexical_db.ILexicalDatabase;
import edu.cmu.lti.lexical_db.NictWordNet;
import edu.cmu.lti.ws4j.RelatednessCalculator;
import edu.cmu.lti.ws4j.impl.Lin;
import edu.cmu.lti.ws4j.impl.Path;
import edu.cmu.lti.ws4j.impl.WuPalmer;
public class Test {
private static ILexicalDatabase db = new NictWordNet();
private static RelatednessCalculator lin = new Lin(db);
private static RelatednessCalculator wup = new WuPalmer(db);
private static RelatednessCalculator path = new Path(db);
public static void main(String[] args) {
String w1 = "walk";
String w2 = "trot";
System.out.println(lin.calcRelatednessOfWords(w1, w2));
System.out.println(wup.calcRelatednessOfWords(w1, w2));
System.out.println(path.calcRelatednessOfWords(w1, w2));
}
}
And the scores are as expected EXCEPT when both words are identical. If both words are the same (e.g. w1 = "walk"; w2 = "walk";), the three measures I have should each return 1.0. But instead, they are returning 1.7976931348623157E308.
I have used ws4j before (the same version, in fact), but I have never seen this behavior. Searching online has not yielded any clues. What could possibly be going wrong here?
P.S. The fact that the Lin, Wu-Palmer and Path measures should return 1 can also be verified with the online demo provided by ws4j
I had a similar problem, and here's what is going on. I hope that other people who run into this problem will find my response helpful.
If you have noticed, the online demo allows you to choose word sense by specifying word in the following format: word#pos_tag#word_sense. For example, a noun gender with the first word sense would be gender#n#1.
Your code snippet uses the first word sense by default. When I calculate WuPalmer similarity between "gender" and "sex", it will return 0.26. If I use online demo, it will return 1.0. But if we use "gender#n#1" and "sex#n#1" the online demo will return 0.26, so there is no discrepancy. The online demo calculates the max of all pos tag / word sense pairs. Here's a corresponding snippet of code that should do the trick:
ILexicalDatabase db = new NictWordNet();
WS4JConfiguration.getInstance().setMFS(true);
RelatednessCalculator rc = new Lin(db);
String word1 = "gender";
String word2 = "sex";
List<POS[]> posPairs = rc.getPOSPairs();
double maxScore = -1D;
for(POS[] posPair: posPairs) {
List<Concept> synsets1 = (List<Concept>)db.getAllConcepts(word1, posPair[0].toString());
List<Concept> synsets2 = (List<Concept>)db.getAllConcepts(word2, posPair[1].toString());
for(Concept synset1: synsets1) {
for (Concept synset2: synsets2) {
Relatedness relatedness = rc.calcRelatednessOfSynset(synset1, synset2);
double score = relatedness.getScore();
if (score > maxScore) {
maxScore = score;
}
}
}
}
if (maxScore == -1D) {
maxScore = 0.0;
}
System.out.println("sim('" + word1 + "', '" + word2 + "') = " + maxScore);
Also, this will give you 0.0 similarity on non-stemmed word forms, e.g. 'genders' and 'sex.' You can use a porter stemmer included in ws4j to make sure you stem words beforehand if needed.
Hope this helps!
I had raised this issue at the googlecode site for ws4j, and it turns out that indeed it was a bug. The reply I received is as follows:
This looks like it is due to attempting to override a protected static field (this can't be done in Java). The attached patch fixes the issue by moving the definitions of the min and max fields to non-static final members in RelatednessCalculator and adding getters. Implementations then provide their min/max values through super constructor calls.
Patch can be applied with patch -p1 < 0001-Cannot-override-static-members-replacing-fields-with.patch
And here is the (now resolved) issue on their site.
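To illustrate the mechanism described in that reply, here is a stripped-down sketch. The class names are hypothetical stand-ins, not the actual ws4j sources. A static field redeclared in a subclass merely hides the parent's field, so code in the parent keeps reading its own value. Passing the value through the super constructor avoids that.

```java
class StaticHidingDemo {
    // Broken variant: the subclass redeclares the static field.
    static class BrokenBase {
        protected static double max = Double.MAX_VALUE;
        double getMax() { return max; }      // always resolves to BrokenBase.max
    }
    static class BrokenLin extends BrokenBase {
        protected static double max = 1.0;   // hides BrokenBase.max, does not override it
    }

    // Fixed variant: a final instance field supplied through the super constructor.
    static class FixedBase {
        private final double max;
        protected FixedBase(double max) { this.max = max; }
        double getMax() { return max; }
    }
    static class FixedLin extends FixedBase {
        FixedLin() { super(1.0); }
    }
}
```

Here new StaticHidingDemo.BrokenLin().getMax() gives Double.MAX_VALUE, which prints as 1.7976931348623157E308, while new StaticHidingDemo.FixedLin().getMax() gives 1.0.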
Here is why -
In jcn we have...
sim(c1, c2) = 1 / distance(c1, c2)
distance(c1, c2) = ic(c1) + ic(c2) - (2 * ic(lcs(c1, c2)))
where c1, c2 are the two concepts,
ic is the information content of the concept.
lcs(c1, c2) is the least common subsumer of c1 and c2.
Now, we don't want distance to be 0 (=> similarity will become
undefined).
distance can be 0 in 2 cases...
(1) ic(c1) = ic(c2) = ic(lcs(c1, c2)) = 0
ic(lcs(c1, c2)) can be 0 if the lcs turns out to be the root
node (information content of the root node is zero). But since
c1 and c2 can never be the root node, ic(c1) and ic(c2) would be 0
only if the 2 concepts have a 0 frequency count, in which case, for
lack of data, we return a relatedness of 0 (similar to the lin case).
Note that the root node ACTUALLY has an information content of
zero. Technically, none of the other concepts can have an information
content value of zero. We assign concepts zero values, when
in reality their information content is undefined (due to zero
frequency counts). To see why look at the formula for information
content: ic(c) = -log(freq(c)/freq(ROOT)) {log(0)? log(1)?}
(2) The second case that distance turns out to be zero is when...
ic(c1) + ic(c2) = 2 * ic(lcs(c1, c2))
(which could have a more likely special case ic(c1) = ic(c2) =
ic(lcs(c1, c2)) if all three turn out to be the same concept.)
How should one handle this?
Intuitively this is the case of maximum relatedness (zero
distance). For jcn this relatedness would be infinity... But we
can't return infinity. And simply returning a 0 wouldn't work...
since here we have found a pair of concepts with maximum
relatedness, and returning a 0 would be like saying that they
aren't related at all.
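The formulas above can be tried out with a few lines of Java. The frequency counts below are invented purely for illustration, and JcnDemo is a hypothetical helper, not ws4j code. When all three concepts coincide, the distance is exactly zero, and Java's double division then yields Infinity rather than throwing an exception, which is the case the text says needs special handling.

```java
class JcnDemo {
    // ic(c) = -log(freq(c) / freq(ROOT))
    static double ic(double freq, double rootFreq) {
        return -Math.log(freq / rootFreq);
    }

    // distance(c1, c2) = ic(c1) + ic(c2) - 2 * ic(lcs(c1, c2))
    static double distance(double ic1, double ic2, double icLcs) {
        return ic1 + ic2 - 2 * icLcs;
    }

    // sim(c1, c2) = 1 / distance(c1, c2); infinite when the distance is zero
    static double similarity(double ic1, double ic2, double icLcs) {
        return 1.0 / distance(ic1, ic2, icLcs);
    }
}
```

For example, with x = JcnDemo.ic(10, 1000), JcnDemo.similarity(x, x, x) is Double.POSITIVE_INFINITY.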
1.7976931348623157E308 is the value of Double.MAX_VALUE, but the scores of some similarity/relatedness algorithms (Lin, WuPalmer and Path) lie between 0 and 1. For identical synsets, then, the maximum value they can return is 1. In the version in my repo (https://github.com/DonatoMeoli/WS4J) I fixed this and other bugs.
Now, for two identical words, the values returned are:
HirstStOnge 16.0
LeacockChodorow 1.7976931348623157E308
Lesk 1.7976931348623157E308
WuPalmer 1.0
Resnik 1.7976931348623157E308
JiangConrath 1.7976931348623157E308
Lin 1.0
Path 1.0
Done in 67 msec.
Process finished with exit code 0