OR some AND rules in OWL API?

I can't seem to figure out how to OR (ObjectUnionOf?) a set of AND (ObjectIntersectionOf) rules. When I open the OWL file my code produces in Protégé, I get rules like (has_difi_min some double[<= "184.84"^^double]) and (has_mean_ndvi some double[<= "0.3428"^^double]), etc., with lines separating the "rulesets", as shown in the screenshot below.
My OWLAPI code:
/* write rules */
// OWLObjectIntersectionOf intersection = null;
OWLClassExpression firstRuleSet = null;
OWLClass owlCls = null;
OWLObjectUnionOf union = null;
Iterator it = rules.map.entrySet().iterator();
Set<OWLClassExpression> unionSet = new HashSet<OWLClassExpression>();
while (it.hasNext()) {
    Map.Entry pair = (Map.Entry) it.next();
    String currCls = (String) pair.getKey();
    owlCls = factory.getOWLClass(IRI.create("#" + currCls));
    ArrayList<owlRuleSet> currRuleset = (ArrayList<owlRuleSet>) pair.getValue();
    for (int i = 0; i < currRuleset.size(); i++) {
        firstRuleSet = factory.getOWLObjectIntersectionOf(currRuleset.get(i).getRuleList(currCls));
        union = factory.getOWLObjectUnionOf(firstRuleSet);
        manager.addAxiom(ontology, factory.getOWLEquivalentClassesAxiom(owlCls, union));
    }
}
manager.saveOntology(ontology);
This is what it looks like:
I want the lines to be ORs.
edit: Thanks Ignazio!
My OWLAPI code now looks like so:
/* write rules */
OWLClass owlCls = null;
OWLClassExpression firstRuleSet = null;
OWLObjectUnionOf totalUnion = null;
Iterator it = rules.map.entrySet().iterator();
Set<OWLClassExpression> unionSet = new HashSet<OWLClassExpression>();
while (it.hasNext()) {
    Map.Entry pair = (Map.Entry) it.next();
    String currCls = (String) pair.getKey();
    owlCls = factory.getOWLClass(IRI.create("#" + currCls));
    ArrayList<owlRuleSet> currRuleset = (ArrayList<owlRuleSet>) pair.getValue();
    for (int i = 0; i < currRuleset.size(); i++) {
        firstRuleSet = factory.getOWLObjectIntersectionOf(currRuleset.get(i).getRuleList(currCls));
        unionSet.add(firstRuleSet);
    }
    totalUnion = factory.getOWLObjectUnionOf(unionSet);
    unionSet.clear();
    manager.addAxiom(ontology, factory.getOWLEquivalentClassesAxiom(owlCls, totalUnion));
}
manager.saveOntology(ontology);

You are creating unionSet but not using it. Instead of adding an axiom to the ontology, add firstRuleSet to unionSet, then create an equivalent class axiom outside the main loop, just before saving the ontology.
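A minimal sketch of that pattern, using a fresh set per class so nothing has to be cleared (rules.map, owlRuleSet and getRuleList() are names taken from the question's code):
// Sketch: one intersection per ruleset, then one union and one axiom per class.
Iterator it = rules.map.entrySet().iterator();
while (it.hasNext()) {
    Map.Entry pair = (Map.Entry) it.next();
    String currCls = (String) pair.getKey();
    OWLClass owlCls = factory.getOWLClass(IRI.create("#" + currCls));
    ArrayList<owlRuleSet> currRuleset = (ArrayList<owlRuleSet>) pair.getValue();
    Set<OWLClassExpression> unionSet = new HashSet<OWLClassExpression>(); // fresh set per class
    for (owlRuleSet ruleset : currRuleset) {
        unionSet.add(factory.getOWLObjectIntersectionOf(ruleset.getRuleList(currCls)));
    }
    manager.addAxiom(ontology,
            factory.getOWLEquivalentClassesAxiom(owlCls, factory.getOWLObjectUnionOf(unionSet)));
}
manager.saveOntology(ontology);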

Related

Accessing custom Lucene attribute from DirectoryReader

I added a custom attribute to my Lucene pipeline as described here (in the "Adding a custom Attribute" section).
Now, after building my index (by adding all the documents via IndexWriter), I want to be able to access this attribute when reading the index directory. How do I do this?
What I'm doing now is the following:
DirectoryReader reader = DirectoryReader.open(index);
TermsEnum iterator = null;
for (int i = 0; i < reader.maxDoc(); i++) {
    Terms terms = reader.getTermVector(i, "content");
    iterator = terms.iterator(iterator);
    AttributeSource attributes = iterator.attributes();
    SentenceAttribute sentence = attributes.addAttribute(SentenceAttribute.class);
    while (true) {
        BytesRef term = iterator.next();
        if (term == null) {
            break;
        }
        System.out.println(term.utf8ToString());
        System.out.println(sentence.getStringSentenceId());
    }
}
It doesn't seem to work: I get the same sentenceId all the time.
I use Lucene 4.9.1.
Finally, I solved it. Custom attributes only live in the analysis chain and are not persisted in the index, so they cannot be read back from a TermsEnum; that is why the same sentenceId came back every time. Instead, I used a PayloadAttribute to store the data I needed.
To store payloads in the index, first set the storeTermVectorPayloads property on the Field's FieldType, along with the other term-vector options:
fieldType.setStoreTermVectors(true);
fieldType.setStoreTermVectorOffsets(true);
fieldType.setStoreTermVectorPositions(true);
fieldType.setStoreTermVectorPayloads(true);
Then, for each token during the analysis phase, set the payload attribute:
private final PayloadAttribute payloadAtt = addAttribute(PayloadAttribute.class);
// in incrementToken()
payloadAtt.setPayload(new BytesRef(String.valueOf(myAttr)));
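For context, here is a minimal sketch of a TokenFilter that sets such a payload; how the per-token value (a sentence id here) is maintained is an assumption about the original pipeline:
import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.PayloadAttribute;
import org.apache.lucene.util.BytesRef;

public final class SentencePayloadFilter extends TokenFilter {
    private final PayloadAttribute payloadAtt = addAttribute(PayloadAttribute.class);
    private int sentenceId = 0; // assumed: updated by your own sentence-boundary logic

    public SentencePayloadFilter(TokenStream input) {
        super(input);
    }

    @Override
    public boolean incrementToken() throws IOException {
        if (!input.incrementToken()) {
            return false;
        }
        // store the current value as the token's payload so it survives indexing
        payloadAtt.setPayload(new BytesRef(String.valueOf(sentenceId)));
        return true;
    }
}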
Then build the index; after that, it's possible to read the payloads back like this:
DocsAndPositionsEnum payloads = null;
TermsEnum iterator = null;
BytesRef ref;
Terms termVector = reader.getTermVector(docId, "field");
iterator = termVector.iterator(iterator);
while ((ref = iterator.next()) != null) {
    payloads = iterator.docsAndPositions(null, payloads, DocsAndPositionsEnum.FLAG_PAYLOADS);
    while (payloads.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
        int freq = payloads.freq();
        for (int i = 0; i < freq; i++) {
            payloads.nextPosition();
            BytesRef payload = payloads.getPayload();
            // do something with the payload
        }
    }
}

How to extract key phrases from a given text with OpenNLP?

I'm using Apache OpenNLP and I'd like to extract the key phrases of a given text. I'm already gathering entities, but I would like to have key phrases.
The problem I have is that I can't use TF-IDF, because I don't have models for that and I only have a single text (not multiple documents).
Here is some code (prototyped, not so clean):
public List<KeywordsModel> extractKeywords(String text, NLPProvider pipeline) {
    SentenceDetectorME sentenceDetector = new SentenceDetectorME(pipeline.getSentencedetecto("en"));
    TokenizerME tokenizer = new TokenizerME(pipeline.getTokenizer("en"));
    POSTaggerME posTagger = new POSTaggerME(pipeline.getPosmodel("en"));
    ChunkerME chunker = new ChunkerME(pipeline.getChunker("en"));
    ArrayList<String> stopwords = pipeline.getStopwords("en");
    Span[] sentSpans = sentenceDetector.sentPosDetect(text);
    Map<String, Float> results = new LinkedHashMap<>();
    SortedMap<String, Float> sortedData = new TreeMap(new MapSort.FloatValueComparer(results));
    float sentenceCounter = sentSpans.length;
    float prominenceVal = 0;
    int sentences = sentSpans.length;
    for (Span sentSpan : sentSpans) {
        prominenceVal = sentenceCounter / sentences;
        sentenceCounter--;
        String sentence = sentSpan.getCoveredText(text).toString();
        int start = sentSpan.getStart();
        Span[] tokSpans = tokenizer.tokenizePos(sentence);
        String[] tokens = new String[tokSpans.length];
        for (int i = 0; i < tokens.length; i++) {
            tokens[i] = tokSpans[i].getCoveredText(sentence).toString();
        }
        String[] tags = posTagger.tag(tokens);
        Span[] chunks = chunker.chunkAsSpans(tokens, tags);
        for (Span chunk : chunks) {
            if ("NP".equals(chunk.getType())) {
                int npstart = start + tokSpans[chunk.getStart()].getStart();
                int npend = start + tokSpans[chunk.getEnd() - 1].getEnd();
                String potentialKey = text.substring(npstart, npend);
                if (!results.containsKey(potentialKey)) {
                    boolean hasStopWord = false;
                    String[] pKeys = potentialKey.split("\\s+");
                    if (pKeys.length < 3) {
                        for (String pKey : pKeys) {
                            for (String stopword : stopwords) {
                                if (pKey.toLowerCase().matches(stopword)) {
                                    hasStopWord = true;
                                    break;
                                }
                            }
                            if (hasStopWord) {
                                break;
                            }
                        }
                    } else {
                        hasStopWord = true;
                    }
                    if (!hasStopWord) {
                        int count = StringUtils.countMatches(text, potentialKey);
                        results.put(potentialKey, (float) (Math.log(count) / 100) + (float) (prominenceVal / 5));
                    }
                }
            }
        }
    }
    sortedData.putAll(results);
    System.out.println(sortedData);
    return null;
}
What it basically does is give me the nouns back, sorted by prominence value (where does the phrase appear in the text?) and by counts.
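To put that scoring formula into numbers: a noun phrase that occurs 5 times and first appears in the opening sentence (prominenceVal = 1) would score roughly log(5)/100 + 1/5 ≈ 0.016 + 0.2 = 0.216, so sentence position dominates the raw count.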
But honestly, this doesn't work so well.
I also tried it with the Lucene analyzer, but the results were not good either.
So, how can I achieve what I want to do? I already know of KEA/Maui-indexer etc. (but I'm afraid I can't use them because of the GPL :( )
Also interesting: which other algorithms can I use instead of TF-IDF?
Example:
This text: http://techcrunch.com/2015/09/04/etsys-pulling-the-plug-on-grand-st-at-the-end-of-this-month/
Good output in my opinion: Etsy, Grand St., solar chargers, maker marketplace, tech hardware
Finally, I found something:
https://github.com/srijiths/jtopia
It uses the POS taggers from OpenNLP/Stanford NLP and is licensed under ASL 2.0. I haven't measured precision and recall yet, but in my opinion it delivers great results.
Here is my code:
Configuration.setTaggerType("openNLP");
Configuration.setSingleStrength(6);
Configuration.setNoLimitStrength(5);
// if tagger type is "openNLP" then give the openNLP POS tagger path
//Configuration.setModelFileLocation("model/openNLP/en-pos-maxent.bin");
// if tagger type is "default" then give the default POS lexicon file
//Configuration.setModelFileLocation("model/default/english-lexicon.txt");
// if tagger type is "stanford"
Configuration.setModelFileLocation("Dont need that here");
Configuration.setPipeline(pipeline);
TermsExtractor termExtractor = new TermsExtractor();
TermDocument topiaDoc = termExtractor.extractTerms(text);
//logger.info("Extracted terms : " + topiaDoc.getExtractedTerms());
Map<String, ArrayList<Integer>> finalFilteredTerms = topiaDoc.getFinalFilteredTerms();
List<KeywordsModel> keywords = new ArrayList<>();
for (Map.Entry<String, ArrayList<Integer>> e : finalFilteredTerms.entrySet()) {
    KeywordsModel keyword = new KeywordsModel();
    keyword.setLabel(e.getKey());
    keywords.add(keyword);
}
I modified the Configuration file a bit so that the POSModel is loaded from the pipeline instance.

Get value from List<ListObject>

I am interested in how to get a value from an Object contained in a List<>.
Here is a code example with the Objects:
@Override
public List<ListObject> initChildren() {
    //Init the list
    List<ListObject> mObjects = new ArrayList<ListObject>();
    //Add an object to the list
    StockObject s1 = new StockObject(this);
    s1.code = "Системне програмування-1";
    s1.num = "1.";
    s1.value = "307/18";
    s1.time = "8:30 - 10:05";
    mObjects.add(s1);
    StockObject s2 = new StockObject(this);
    s2.code = "Комп'ютерна електроніка";
    s2.num = "2.";
    s2.value = "305/18";
    s2.time = "10:25 - 11:00";
    mObjects.add(s2);
    StockObject s3 = new StockObject(this);
    s3.code = "Психологія";
    s3.num = "3.";
    s3.value = "201/20";
    s3.time = "11:20 - 13:55";
    mObjects.add(s3);
    StockObject s4 = new StockObject(this);
    s4.code = "Проектування програмного забезпечення";
    s4.num = "4.";
    s4.value = "24";
    s4.time = "14:15 - 16:50";
    mObjects.add(s4);
    return mObjects;
}
You can use the get() method as follows:
mObjects.get(index)
where index is the zero-based index into your List, just like an array.
To access a field of the object directly, you can do, for example:
mObjects.get(index).code
You can use it like below:
List<ListObject> mObjects = new ArrayList<ListObject>();
.......................your program...........
//access using enhanced for loop
for (ListObject myObj : mObjects) {
    System.out.println(myObj.code);
    System.out.println(myObj.num);
    System.out.println(myObj.value);
}
//access using index
int index = 0;
System.out.println(mObjects.get(index).code);
You can iterate over the collection and cast each element to the data type you know is in there.
List<ListObject> listOfObjects = initChildren();
for (Iterator iterator = listOfObjects.iterator(); iterator.hasNext();) {
    StockObject so = (StockObject) iterator.next();
    // do whatever you want with your StockObject (so)
    System.out.println("Code:" + so.code);
}
You can use the for-each syntax as well, like the following:
List<ListObject> listOfObjects = initChildren();
for (ListObject listObject : listOfObjects) {
    StockObject so = (StockObject) listObject;
    // do whatever you want with your StockObject (so)
    System.out.println("Code:" + so.code);
}
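As a side note, if the list will only ever hold StockObject instances, declaring it as List<StockObject> avoids the casts altogether (a sketch, assuming StockObject extends ListObject as in the question):
List<StockObject> stocks = new ArrayList<StockObject>();
stocks.add(s1); // same objects as built in initChildren()
for (StockObject so : stocks) {
    System.out.println("Code:" + so.code); // no cast needed
}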

BIRT: How to remove a dataset parameter programmatically

I want to modify an existing *.rptdesign file and save it under a new name.
The existing file contains a Data Set with a template SQL select statement and several DS parameters.
I'd like to use an actual SQL select statement that uses only some of the DS parameters.
However, the following code results in the exception:
Exception in thread "main" java.lang.RuntimeException: The structure is floating, and its handle is invalid!
at org.eclipse.birt.report.model.api.StructureHandle.getStringProperty(StructureHandle.java:207)
at org.eclipse.birt.report.model.api.DataSetParameterHandle.getName(DataSetParameterHandle.java:143)
at org.eclipse.birt.report.model.api.DataSetHandle$DataSetParametersPropertyHandle.removeParamBindingsFor(DataSetHandle.java:851)
at org.eclipse.birt.report.model.api.DataSetHandle$DataSetParametersPropertyHandle.removeItems(DataSetHandle.java:694)
--
OdaDataSetHandle dsMaster = (OdaDataSetHandle) report.findDataSet("Master");
// find out which DS parameters are actually used
HashSet<String> bindVarsUsed = new HashSet<String>();
...
ArrayList<OdaDataSetParameterHandle> toRemove = new ArrayList<OdaDataSetParameterHandle>();
for (Iterator iter = dsMaster.parametersIterator(); iter.hasNext(); ) {
    OdaDataSetParameterHandle dsPara = (OdaDataSetParameterHandle) iter.next();
    String name = dsPara.getName();
    if (name.startsWith("param_")) {
        String bindVarName = name.substring(6);
        if (!bindVarsUsed.contains(bindVarName)) {
            toRemove.add(dsPara);
        }
    }
}
PropertyHandle paramsHandle = dsMaster.getPropertyHandle(OdaDataSetHandle.PARAMETERS_PROP);
paramsHandle.removeItems(toRemove);
What is wrong here?
Has anyone used the DE API to remove parameters from an existing Data Set?
I had a similar issue. It seems that removing one parameter invalidates the remaining parameter handles (hence the "floating structure" exception), so I resolved it by calling removeItem once per pass and re-evaluating parametersIterator every time:
protected void updateDataSetParameters(OdaDataSetHandle dataSetHandle) throws SemanticException {
    int countMatches = StringUtils.countMatches(dataSetHandle.getQueryText(), "?");
    int paramIndex = 0;
    do {
        paramIndex = 0;
        PropertyHandle odaDataSetParameterProp = dataSetHandle.getPropertyHandle(OdaDataSetHandle.PARAMETERS_PROP);
        Iterator parametersIterator = dataSetHandle.parametersIterator();
        while (parametersIterator.hasNext()) {
            Object next = parametersIterator.next();
            paramIndex++;
            if (paramIndex > countMatches) {
                odaDataSetParameterProp.removeItem(next);
                break;
            }
        }
        if (paramIndex < countMatches) {
            paramIndex++;
            OdaDataSetParameter dataSetParameter = createDataSetParameter(paramIndex);
            odaDataSetParameterProp.addItem(dataSetParameter);
        }
    } while (countMatches != paramIndex);
}

private OdaDataSetParameter createDataSetParameter(int paramIndex) {
    OdaDataSetParameter dataSetParameter = StructureFactory.createOdaDataSetParameter();
    dataSetParameter.setName("param_" + paramIndex);
    dataSetParameter.setDataType(DesignChoiceConstants.PARAM_TYPE_INTEGER);
    dataSetParameter.setNativeDataType(1);
    dataSetParameter.setPosition(paramIndex);
    dataSetParameter.setIsInput(true);
    dataSetParameter.setIsOutput(false);
    dataSetParameter.setExpressionProperty("defaultValue", new Expression("<evaluation script>", ExpressionType.JAVASCRIPT));
    return dataSetParameter;
}
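For completeness, here is a sketch of the surrounding open/modify/save-under-a-new-name flow with the DE API; the file names are placeholders and exception handling is omitted:
// Sketch only: open an existing design, adjust the data set, save under a new name.
DesignConfig config = new DesignConfig();
IDesignEngine engine = new DesignEngine(config);
SessionHandle session = engine.newSessionHandle(ULocale.ENGLISH);
ReportDesignHandle design = session.openDesign("existing.rptdesign"); // placeholder path
OdaDataSetHandle dsMaster = (OdaDataSetHandle) design.findDataSet("Master");
updateDataSetParameters(dsMaster);
design.saveAs("new.rptdesign"); // placeholder path
design.close();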

In ArrayList, how to remove a subdirectory if its parent is already present in the list?

I have an ArrayList<String> containing paths of directories, like:
/home, /usr...
I want to write code that removes a path from the list if the list already contains a parent directory of that path.
For example, if the list contains:
/home
/home/games
then, /home/games should get removed as its parent /home is already in the list.
Below is the code:
for (int i = 0; i < checkedList.size(); i++) {
    File f = new File(checkedList.get(i));
    if (checkedList.contains(f.getParent())) {
        checkedList.remove(checkedList.get(i));
        i--; // step back so the element shifted into this slot is not skipped
    }
}
Above, checkedList is a String ArrayList.
The problem comes when the list contains:
/home
/home/games/minesweeper
Now the minesweeper folder will not get removed, because its immediate parent /home/games is not in the list. How do I remove these kinds of elements too?
Another possible solution would be using String.startsWith(String); a sketch of that follows the example output below.
But of course you can take advantage of the parent functionality of the File class in order to handle relative directories and other particularities. Here is a draft of the solution:
List<String> listOfDirectories = new ArrayList<String>();
listOfDirectories.add("/home/user/tmp/test");
listOfDirectories.add("/home/user");
listOfDirectories.add("/tmp");
listOfDirectories.add("/etc/test");
listOfDirectories.add("/etc/another");
List<String> result = new ArrayList<String>();
for (int i = 0; i < listOfDirectories.size(); i++) {
    File current = new File(listOfDirectories.get(i));
    File parent = current;
    while ((parent = parent.getParentFile()) != null) {
        if (listOfDirectories.contains(parent.getAbsolutePath())) {
            current = parent;
        }
    }
    String absolutePath = current.getAbsolutePath();
    if (!result.contains(absolutePath)) {
        result.add(absolutePath);
    }
}
System.out.println(result);
This would print:
[/home/user, /tmp, /etc/test, /etc/another]
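If the paths are normalized absolute Strings, the String.startsWith variant mentioned above can be sketched like this; the trailing separator guard keeps /home from swallowing a sibling like /homework:
List<String> roots = new ArrayList<String>();
for (String path : listOfDirectories) {
    boolean covered = false;
    for (String other : listOfDirectories) {
        if (!other.equals(path) && path.startsWith(other + "/")) {
            covered = true; // an ancestor is already in the list
            break;
        }
    }
    if (!covered && !roots.contains(path)) {
        roots.add(path);
    }
}
System.out.println(roots); // same output as above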
You can do some string manipulation to get the base directory of each string.
int baseIndex = checkedList.get(i).indexOf("/", 1);
if (baseIndex != -1) {
    String baseDirectory = checkedList.get(i).substring(0, baseIndex);
    if (checkedList.contains(baseDirectory)) {
        checkedList.remove(checkedList.get(i));
    }
}
This gets the index of the second '/' and extracts the string up to that slash. If the second slash exists, it checks whether the list contains that base string and removes the current string if there's a match.
You can extract the root from each string and add it to a HashSet.
For example:
if you have /home/games you can extract "home" from the string using string manipulation or a regular expression or whatever you want.
Before you add "home" to the HashSet, check whether it is already there:
if (hashset.contains("home")) {
    // then it's already added
} else {
    hashset.add("home");
}
Would doing the opposite work? If the parent is NOT found in your ArrayList, add the value to a final output ArrayList:
for (int i = 0; i < checkedList.size(); i++) {
    File f = new File(checkedList.get(i));
    if (!checkedList.contains(f.getParent())) {
        yourOutputList.add(checkedList.get(i));
    }
}
You should check every parent of each list item in turn.
I will assume that your list contains normalized absolute path File objects:
for (int i = 0; i < checkedList.size(); i++) {
    File curItem = checkedList.get(i);
    for (
        File curParent = curItem.getParentFile();
        curParent != null;
        curParent = curParent.getParentFile()
    )
    {
        if (checkedList.contains(curParent))
        {
            checkedList.remove(curItem);
            i--; // compensate for the shift caused by the removal
            break;
        }
    }
}
Actually, I would rewrite it with a ListIterator:
for (ListIterator<File> iter = checkedList.listIterator(); iter.hasNext(); )
{
    File curItem = iter.next();
    for (
        File curParent = curItem.getParentFile();
        curParent != null;
        curParent = curParent.getParentFile()
    )
    {
        if (checkedList.contains(curParent))
        {
            iter.remove();
            break;
        }
    }
}
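A different take, sketched under the assumption that the list holds normalized absolute path Strings (pathsIn below stands for such a list): sort it first, and a single pass then keeps only the topmost directories, because every descendant sorts immediately after one of its ancestors:
List<String> paths = new ArrayList<String>(pathsIn); // pathsIn: hypothetical List<String> of absolute paths
Collections.sort(paths); // ancestors sort before their descendants
List<String> kept = new ArrayList<String>();
String lastRoot = null;
for (String p : paths) {
    // keep p only if it is neither a duplicate of nor nested under the last kept root
    if (lastRoot == null || !(p.equals(lastRoot) || p.startsWith(lastRoot + "/"))) {
        kept.add(p);
        lastRoot = p;
    }
}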
