I'm trying to use the Jena OntModel to get the direct relations of an ontology.
The problem comes from the listClasses() method.
I searched the net for a while but could not find a relevant answer to my problem.
So here is a complete example with minimal data and code showing what goes wrong.
I have, for example, this basic ontology (N-Triples formatted):
<http://weblab.ow2.org/wookie#Anti-social_behaviour> <http://www.w3.org/2000/01/rdf-schema#subClassOf> <http://weblab.ow2.org/wookie#CriminalEvent>.
<http://weblab.ow2.org/wookie#Robbery> <http://www.w3.org/2000/01/rdf-schema#subClassOf> <http://weblab.ow2.org/wookie#CriminalEvent>.
<http://weblab.ow2.org/wookie#Vehicle_crime> <http://www.w3.org/2000/01/rdf-schema#subClassOf> <http://weblab.ow2.org/wookie#CriminalEvent>.
<http://weblab.ow2.org/wookie#Bicycle_theft> <http://www.w3.org/2000/01/rdf-schema#subClassOf> <http://weblab.ow2.org/wookie#CriminalEvent>.
<http://weblab.ow2.org/wookie#CriminalEvent> <http://www.w3.org/2000/01/rdf-schema#subClassOf> <http://weblab.ow2.org/wookie#Event>.
<http://weblab.ow2.org/wookie#Event> <http://www.w3.org/2000/01/rdf-schema#subClassOf> <http://weblab.ow2.org/wookie#WookieThing>.
<http://weblab.ow2.org/wookie#Event> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2000/01/rdf-schema#Class>.
Basically, I would like to get all the classes and, for each class, its subclasses.
I use the following Java code:
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import com.hp.hpl.jena.ontology.OntClass;
import com.hp.hpl.jena.ontology.OntModel;
import com.hp.hpl.jena.query.Query;
import com.hp.hpl.jena.query.QueryExecutionFactory;
import com.hp.hpl.jena.query.QueryFactory;
import com.hp.hpl.jena.query.ResultSet;
import com.hp.hpl.jena.rdf.model.ModelFactory;
import com.hp.hpl.jena.rdf.model.Statement;
import com.hp.hpl.jena.util.FileManager;
public class Main {
public static void main(final String [] argv) throws FileNotFoundException, IOException {
final OntModel model = ModelFactory.createOntologyModel();
model.read(FileManager.get().open("./src/test/resources/eventOnto.n3"), "", "N-TRIPLE");
// This part allows to check that the ontology model is really loaded and that inference is
// correctly done. WORKING
final List<Statement> statements = model.listStatements().toList();
Collections.sort(statements, new Comparator<Statement>() {
@Override
public int compare(final Statement o1, final Statement o2) {
return o1.toString().compareTo(o2.toString());
}
});
for (final Statement statement : statements) {
System.out.println(statement);
}
System.out.println("-------------------------------------------------");
// Listing all the classes.
final List<OntClass> classes = model.listClasses().toList();
for (final OntClass ontclass : classes) {
System.out.println(ontclass);
}
System.out.println("-------------------------------------------------");
// Bug got nothing. So try with a SPARQL query...
final Query query = QueryFactory.create("PREFIX rdf:<http://www.w3.org/2000/01/rdf-schema#> SELECT distinct ?class WHERE {?class a rdf:Class.}");
final ResultSet queryResult = QueryExecutionFactory.create(query, model).execSelect();
while (queryResult.hasNext()) {
System.out.println(queryResult.next());
}
// and got many results...
}
}
The output shows that the ontology model is correctly loaded and that basic inference is done. It also shows that listClasses() doesn't return anything, while the SPARQL query asking for classes returns every single class.
[http://weblab.ow2.org/wookie#Anti-social_behaviour, http://www.w3.org/1999/02/22-rdf-syntax-ns#type, http://www.w3.org/2000/01/rdf-schema#Class]
[http://weblab.ow2.org/wookie#Anti-social_behaviour, http://www.w3.org/1999/02/22-rdf-syntax-ns#type, http://www.w3.org/2000/01/rdf-schema#Resource]
[http://weblab.ow2.org/wookie#Anti-social_behaviour, http://www.w3.org/2000/01/rdf-schema#subClassOf, http://weblab.ow2.org/wookie#Anti-social_behaviour]
[http://weblab.ow2.org/wookie#Anti-social_behaviour, http://www.w3.org/2000/01/rdf-schema#subClassOf, http://weblab.ow2.org/wookie#CriminalEvent]
[http://weblab.ow2.org/wookie#Anti-social_behaviour, http://www.w3.org/2000/01/rdf-schema#subClassOf, http://weblab.ow2.org/wookie#Event]
[http://weblab.ow2.org/wookie#Anti-social_behaviour, http://www.w3.org/2000/01/rdf-schema#subClassOf, http://weblab.ow2.org/wookie#WookieThing]
[http://weblab.ow2.org/wookie#Anti-social_behaviour, http://www.w3.org/2000/01/rdf-schema#subClassOf, http://www.w3.org/2000/01/rdf-schema#Resource]
[http://weblab.ow2.org/wookie#Anti-social_behaviour, http://www.w3.org/2000/01/rdf-schema#subClassOf, http://www.w3.org/2002/07/owl#Thing]
...(many other lines)...
-------------------------------------------------
-------------------------------------------------
( ?class = rdf:Property )
( ?class = rdf:List )
( ?class = rdfs:Resource )
( ?class = rdfs:Class )
( ?class = rdf:Statement )
( ?class = rdfs:Literal )
( ?class = <http://weblab.ow2.org/wookie#Event> )
( ?class = rdf:XMLLiteral )
( ?class = owl:Thing )
( ?class = <http://weblab.ow2.org/wookie#WookieThing> )
( ?class = <http://weblab.ow2.org/wookie#CriminalEvent> )
( ?class = rdfs:Container )
( ?class = <http://weblab.ow2.org/wookie#Robbery> )
( ?class = <http://weblab.ow2.org/wookie#Bicycle_theft> )
( ?class = rdf:Seq )
( ?class = rdf:Alt )
( ?class = <http://weblab.ow2.org/wookie#Anti-social_behaviour> )
( ?class = <http://weblab.ow2.org/wookie#Vehicle_crime> )
( ?class = rdfs:ContainerMembershipProperty )
( ?class = rdf:Bag )
( ?class = rdfs:Datatype )
Does anyone know why OntModel.listClasses() doesn't return anything?
You get no results for two reasons.
The default ontology profile is an OWL profile, so you get an OWL model when you call createOntologyModel() without a particular OntModelSpec. Instead, you should probably be using an RDFS ontology model. That will make listClasses() look for instances of rdfs:Class rather than owl:Class. That will still only get you one result, though, because you've only declared one thing to be an rdfs:Class.
You've only declared one thing as an rdfs:Class, and the rest are supposed to be inferred to be rdfs:Classes based on the fact that they're subclasses of something else. That means that you need to use an inference model. In this case, you probably want an RDFS inference model, but because (some of) Jena's OWL reasoners include the RDFS rules, you might be able to use an OWL model too.
Here's code that loads your data into a few different types of models and shows the results of listClasses:
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import org.apache.jena.riot.Lang;
import org.apache.jena.riot.RDFDataMgr;
import com.hp.hpl.jena.ontology.OntClass;
import com.hp.hpl.jena.ontology.OntModel;
import com.hp.hpl.jena.ontology.OntModelSpec;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;
public class ListClassesExample {
public static void main(String[] args) throws IOException {
String content =
"<http://weblab.ow2.org/wookie#Anti-social_behaviour> <http://www.w3.org/2000/01/rdf-schema#subClassOf> <http://weblab.ow2.org/wookie#CriminalEvent>.\n" +
"<http://weblab.ow2.org/wookie#Robbery> <http://www.w3.org/2000/01/rdf-schema#subClassOf> <http://weblab.ow2.org/wookie#CriminalEvent>.\n" +
"<http://weblab.ow2.org/wookie#Vehicle_crime> <http://www.w3.org/2000/01/rdf-schema#subClassOf> <http://weblab.ow2.org/wookie#CriminalEvent>.\n" +
"<http://weblab.ow2.org/wookie#Bicycle_theft> <http://www.w3.org/2000/01/rdf-schema#subClassOf> <http://weblab.ow2.org/wookie#CriminalEvent>.\n" +
"<http://weblab.ow2.org/wookie#CriminalEvent> <http://www.w3.org/2000/01/rdf-schema#subClassOf> <http://weblab.ow2.org/wookie#Event>.\n" +
"<http://weblab.ow2.org/wookie#Event> <http://www.w3.org/2000/01/rdf-schema#subClassOf> <http://weblab.ow2.org/wookie#WookieThing>.\n" +
"<http://weblab.ow2.org/wookie#Event> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2000/01/rdf-schema#Class>.";
final Model base = ModelFactory.createDefaultModel();
try ( InputStream in = new ByteArrayInputStream( content.getBytes() )) {
RDFDataMgr.read( base, in, Lang.NTRIPLES );
}
System.out.println( "== OWL Classes (no inference) ==" );
OntModel owlOntology = ModelFactory.createOntologyModel( OntModelSpec.OWL_MEM, base );
for ( OntClass klass : owlOntology.listClasses().toList() ) {
System.out.println( klass );
}
System.out.println( "== RDFS Classes (no inference) ==" );
OntModel rdfsOntology = ModelFactory.createOntologyModel( OntModelSpec.RDFS_MEM, base );
for ( OntClass klass : rdfsOntology.listClasses().toList() ) {
System.out.println( klass );
}
System.out.println( "== RDFS Classes (with inference) ==" );
OntModel rdfsOntologyInf = ModelFactory.createOntologyModel( OntModelSpec.RDFS_MEM_RDFS_INF, base );
for ( OntClass klass : rdfsOntologyInf.listClasses().toList() ) {
System.out.println( klass );
}
System.out.println( "== End ==");
}
}
== OWL Classes (no inference) ==
== RDFS Classes (no inference) ==
http://weblab.ow2.org/wookie#Event
== RDFS Classes (with inference) ==
http://www.w3.org/1999/02/22-rdf-syntax-ns#Property
http://www.w3.org/1999/02/22-rdf-syntax-ns#List
http://www.w3.org/2000/01/rdf-schema#Resource
http://www.w3.org/2000/01/rdf-schema#Class
http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement
http://www.w3.org/2000/01/rdf-schema#Literal
http://weblab.ow2.org/wookie#Event
http://weblab.ow2.org/wookie#WookieThing
http://weblab.ow2.org/wookie#CriminalEvent
http://www.w3.org/2000/01/rdf-schema#Container
http://weblab.ow2.org/wookie#Robbery
http://weblab.ow2.org/wookie#Bicycle_theft
http://www.w3.org/1999/02/22-rdf-syntax-ns#Seq
http://www.w3.org/1999/02/22-rdf-syntax-ns#Alt
http://weblab.ow2.org/wookie#Anti-social_behaviour
http://weblab.ow2.org/wookie#Vehicle_crime
http://www.w3.org/2000/01/rdf-schema#ContainerMembershipProperty
http://www.w3.org/1999/02/22-rdf-syntax-ns#Bag
http://www.w3.org/2000/01/rdf-schema#Datatype
http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral
== End ==
This has some classes that you didn't mention, because the RDFS axioms entail them. If you just want the things that are declared as rdfs:Classes in your data, and the subclasses of them, you could use a SPARQL query:
String query = "\n" +
"prefix rdfs: <"+RDFS.getURI()+">\n" +
"\n" +
"select distinct ?class where {\n" +
" { ?class a rdfs:Class } union\n" +
" { ?class rdfs:subClassOf|^rdfs:subClassOf [] }\n" +
"}";
ResultSet results = QueryExecutionFactory.create( query, base ).execSelect();
System.out.println( query );
ResultSetFormatter.out( results );
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select distinct ?class where {
{ ?class a rdfs:Class } union
{ ?class rdfs:subClassOf|^rdfs:subClassOf [] }
}
--------------------------------------------------------
| class |
========================================================
| <http://weblab.ow2.org/wookie#Event> |
| <http://weblab.ow2.org/wookie#Bicycle_theft> |
| <http://weblab.ow2.org/wookie#Anti-social_behaviour> |
| <http://weblab.ow2.org/wookie#WookieThing> |
| <http://weblab.ow2.org/wookie#Vehicle_crime> |
| <http://weblab.ow2.org/wookie#CriminalEvent> |
| <http://weblab.ow2.org/wookie#Robbery> |
--------------------------------------------------------
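Since the original goal was to list each class's subclasses, here is a minimal sketch of that last step. It reuses the rdfsOntologyInf model from the example above and, as an added assumption, filters to the wookie namespace so the RDFS vocabulary classes are skipped:
String ns = "http://weblab.ow2.org/wookie#";
for ( OntClass klass : rdfsOntologyInf.listClasses().toList() ) {
    if ( klass.getURI() != null && klass.getURI().startsWith( ns ) ) {
        System.out.println( klass.getLocalName() );
        // 'true' asks for direct subclasses only, not the whole transitive closure
        for ( OntClass sub : klass.listSubClasses( true ).toList() ) {
            System.out.println( "  direct subclass: " + sub );
        }
    }
}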
Related
I've made an ontology in Protégé and I want to create new individuals using Eclipse. I'm using this code:
public class testOwl2 {
public static final String SOURCE_URL = "http://www.semanticweb.org/nira/ontologies/2022/3/untitled-ontology-9";
// where we've stashed it on disk for the time being
protected static final String SOURCE_FILE = "C:\\Users\\benni\\Ontologies\\L'ontologie classique.owl";
// the namespace of the ontology
public static final String NS = SOURCE_URL + "#";
/***********************************/
/* External signature methods */
/***********************************/
public void run() {
OntModel m = ModelFactory.createOntologyModel( OntModelSpec.OWL_MEM );
loadModel( m );
// get an OntClass reference to one of the classes in the model
// note: ideally, we would delegate this step to Jena's schemagen tool
OntClass Patient = m.getOntClass( NS + "Patient" );
//OntProperty Patient_relation = m.getObjectProperty( NS + "Has_sign" );
// similarly a reference to the attack duration property,
// and again, using schemagen would be better
OntProperty Patient_Crea = m.getDatatypeProperty( NS + "Creatinine_value" );
// create an instance of the attack class to represent the current attack
Individual Patient1 = m.createIndividual( NS + "P4", Patient );
// add a duration to the attack
Patient1.addProperty( Patient_Crea, m.createTypedLiteral( 10 ) );
m.prepare();
// finally, print out the model to show that we have some data
m.write( System.out, "Turtle" );
}
/***********************************/
/* Internal implementation methods */
/***********************************/
/** read the ontology and add it as a sub-model of the given ontmodel */
protected void loadModel( OntModel m ) {
FileManager.get().getLocationMapper().addAltEntry( SOURCE_URL, SOURCE_FILE );
Model baseOntology = FileManager.get().loadModel( SOURCE_URL );
m.addSubModel( baseOntology );
// for compactness, add a prefix declaration st: (for Sam Thomas)
m.setNsPrefix( "st", NS );
}
public static void main( String[] args ) {
new testOwl2().run();
}
}
Output
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix st: <http://www.semanticweb.org/nira/ontologies/2022/3/untitled-ontology-9#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
st:P4
a st:Patient ;
st:Creatinine_value "10"^^xsd:int .
But in the OWL file (in Protégé) I don't have any of the individuals that I create in my ontology. Could you tell me what is wrong with this code and where I can find these individuals? Thank you.
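One detail worth checking (an assumption on my part, not a confirmed diagnosis): run() only writes the model to System.out; nothing in the code saves the new individual back to the .owl file, so Protégé never sees it. A minimal sketch of persisting the model at the end of run():
// Hypothetical addition at the end of run(): save the model so Protégé can reload it.
// writeAll() includes the imported sub-model; m.write() alone would emit only the new triples.
try ( java.io.FileOutputStream out = new java.io.FileOutputStream( SOURCE_FILE ) ) {
    m.writeAll( out, "RDF/XML", null );
}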
I have tried numerous ways and followed some of the examples that are scattered around the web on how to write a jagged array (an array of arrays that may be of differing lengths) in HDF5.
Most of the examples are in C and rather low-level. Anyhow, I can't seem to get it working, and I just looked at the C source code; it pretty much says that any variable-length datatypes other than strings are not supported (if I understood correctly).
My miserable dysfunctional code (as is):
public void WIP_createVLenFloatDataSet( List<? extends Number> floats ) throws Exception
{
String group = "/test";
long groupId = createGroupIfNotExist( group );
MDataQualifier qualifier = new MDataQualifierImpl( group, "float", "0.0.0" );
long datasetId = openDataSet( qualifier );
long heapType = H5.H5Tcopy( MDataType.FLOAT_ARRAY.getHDFType() );
heapType = H5.H5Tvlen_create( heapType );
// heapType = H5.H5Tarray_create( heapType, 1, new long[]{1} );
if( !exists( datasetId ) )
{
long[] maxDims = new long[]{ HDF5Constants.H5S_UNLIMITED };
long dataspaceId = H5.H5Screate_simple( 1, new long[]{ 1 }, null );
// Create the dataset.
long datasetId1 = -1;
try
{
if( exists( m_fileId ) && exists( dataspaceId ) && exists( heapType ) )
{
long creationProperties = H5.H5Pcreate( HDF5Constants.H5P_DATASET_CREATE );
H5.H5Pset_chunk( creationProperties, /*ndims*/1, new long[]{ 1 } );
datasetId1 = H5.H5Dcreate( groupId, qualifier.getVersionedName(), heapType, dataspaceId, H5P_DEFAULT, creationProperties, H5P_DEFAULT );
// H5.H5Pclose( creationProperties );
}
}
catch( Exception e )
{
LOG.error( "Problems creating the dataset: " + e.getMessage(), e );
}
datasetId = datasetId1;
if( exists( datasetId ) )
{
// flushIfNecessary();
LOG.trace( "Wrote empty dataset {}", qualifier.getVersionedName() );
}
}
List<? extends Number> data = ( List<? extends Number> )floats;
// H5.H5Dwrite( datasetId, heapType, dataspaceId, memSpaceId, HDF5Constants.H5P_DEFAULT, Floats.toArray( data) );
ByteBuffer bb = ByteBuffer.allocate( data.size() * 4 );
floats.forEach( f -> bb.putFloat( f.floatValue() ) );
// H5.H5Dwrite( datasetId, heapType, H5S_ALL, H5S_ALL, H5P_DEFAULT, Floats.toArray( data ) );
H5.H5Dwrite( datasetId, heapType, H5S_ALL, H5S_ALL, H5P_DEFAULT, bb.array() );
}
Has anyone done this before and can at least confirm that it's not possible?
The most I can get out of HDF5 is the message "buf does not support variable length type".
Apparently the "glue code" of the JNI wrapper doesn't support this. If you want to use this feature, you either have to implement your own JNI bindings or wait for a newer version. The official JNI code is open source and can be found here.
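If you control the file layout, one workaround that avoids variable-length types altogether is to emulate the jagged array with two ordinary fixed-type datasets: one flat dataset holding all values back to back, and one index dataset holding each row's start offset. This is only a sketch under some assumptions: the rows fit in memory, groupId is the group handle from the question, and the H5Dwrite overload accepting primitive arrays is available:
// Emulates a jagged float[][] with two datasets: "values" and "rowStarts".
// Row i spans values[rowStarts[i] .. rowStarts[i+1]) (to the end for the last row).
void writeJagged( long groupId, List<float[]> rows ) throws Exception {
    long[] rowStarts = new long[ rows.size() ];
    int total = 0;
    for ( int i = 0; i < rows.size(); i++ ) {
        rowStarts[i] = total;
        total += rows.get( i ).length;
    }
    float[] values = new float[ total ];
    int pos = 0;
    for ( float[] row : rows ) {
        System.arraycopy( row, 0, values, pos, row.length );
        pos += row.length;
    }
    long valSpace = H5.H5Screate_simple( 1, new long[]{ total }, null );
    long valSet = H5.H5Dcreate( groupId, "values", HDF5Constants.H5T_NATIVE_FLOAT, valSpace,
            HDF5Constants.H5P_DEFAULT, HDF5Constants.H5P_DEFAULT, HDF5Constants.H5P_DEFAULT );
    H5.H5Dwrite( valSet, HDF5Constants.H5T_NATIVE_FLOAT, HDF5Constants.H5S_ALL,
            HDF5Constants.H5S_ALL, HDF5Constants.H5P_DEFAULT, values );
    long idxSpace = H5.H5Screate_simple( 1, new long[]{ rows.size() }, null );
    long idxSet = H5.H5Dcreate( groupId, "rowStarts", HDF5Constants.H5T_NATIVE_INT64, idxSpace,
            HDF5Constants.H5P_DEFAULT, HDF5Constants.H5P_DEFAULT, HDF5Constants.H5P_DEFAULT );
    H5.H5Dwrite( idxSet, HDF5Constants.H5T_NATIVE_INT64, HDF5Constants.H5S_ALL,
            HDF5Constants.H5S_ALL, HDF5Constants.H5P_DEFAULT, rowStarts );
    // H5Dclose/H5Sclose calls omitted for brevity.
}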
I have a drools rule created via the Guvnor console and the rule validates and inserts a fact into the working memory if conditions were met. The rule is:
rule "EligibilityCheck001"
dialect "mvel"
when
    Eligibility( XXX == "XXX", YYY == "YYY", ZZZ == "ZZZ", BBB == "BBB" )
then
    EligibilityInquiry fact0 = new EligibilityInquiry();
    fact0.setServiceName( "ABCD" );
    fact0.setMemberStatus( true );
    insert( fact0 );
    System.out.println( "Hello from Drools" );
end
Java code that executes the rule is as follows
RuleAgent ruleAgent = RuleAgent.newRuleAgent("/Guvnor.properties");
RuleBase ruleBase = ruleAgent.getRuleBase();
FactType factType = ruleBase.getFactType("mortgages.Eligibility");
Object obj = factType.newInstance();
factType.set(obj, "XXX", "XXX");
factType.set(obj, "YYY", "YYY");
factType.set(obj, "ZZZ", "XXX");
factType.set(obj, "BBB", "BBB");
WorkingMemory workingMemory = ruleBase.newStatefulSession();
workingMemory.insert(obj);
workingMemory.fireAllRules();
System.out.println("After drools execution");
long count = workingMemory.getFactCount();
System.out.println("count " + count);
Everything looks great with the output as below:
Hello from Drools
After drools execution
count 2
I cannot seem to find a way to get the EligibilityInquiry fact object back in my Java code and get the attributes set in the rule above (serviceName and status). I have used the StatefulSession approach.
The properties file has the link to the snapshot with basic authentication via username and password. There are 2 total facts: EligibilityInquiry and Eligibility.
I am fairly new to drools and any help with the above is appreciated.
(Note: I fixed the order of the statements, a typo ("XX"), and removed the comments from the output. Less surprise.)
This snippet assumes that EligibilityInquiry is also declared in DRL.
FactType eligInqFactType = ruleBase.getFactType("mortgages.EligibilityInquiry");
Class<?> eligInqClass = eligInqFactType.getFactClass();
ObjectFilter filter = new FilterByClass( eligInqClass );
Collection<Object> eligInqs = workingMemory.getObjects( filter );
And the filter is
public class FilterByClass implements ObjectFilter {
private Class<?> theClass;
public FilterByClass( Class<?> clazz ){
theClass = clazz;
}
public boolean accept(Object object){
return theClass.isInstance( object );
}
}
You might also use a query, which takes about the same amount of code.
// DRL code
query "eligInqs"
eligInq : EligibilityInquiry()
end
// after return from fireAllRules
QueryResults results = workingMemory.getQueryResults( "eligInqs" );
for ( QueryResultsRow row : results ) {
Object eligInqObj = row.get( "eligInq" );
System.out.println( eligInqClass.cast( eligInqObj ) );
}
Or you can call workingMemory.getObjects() and iterate the collection and check for the class of each object.
for ( Object obj : workingMemory.getObjects() ) {
    if ( eligInqClass.isInstance( obj ) ) {
        System.out.println( eligInqClass.cast( obj ) );
    }
}
Or you can (with or without inserting the created EligibilityInquiry object as a fact) add the fact to a global java.util.List eligInqList and iterate that in your Java code. Note that the API of StatefulKnowledgeSession is required (instead of WorkingMemory).
// Java - prior to fireAllRules (kbase is an org.drools.KnowledgeBase from the knowledge-api)
StatefulKnowledgeSession kSession = kbase.newStatefulKnowledgeSession();
List<Object> list = new ArrayList<Object>();
kSession.setGlobal( "eligInqList", list );
// DRL
global java.util.List eligInqList;
// in a rule
then
EligibilityInquiry fact0 = new EligibilityInquiry();
fact0.setServiceName( "ABCD" );
fact0.setMemberStatus( true );
insert( fact0 );
eligInqList.add( fact0 );
end
// after return from fireAllRules
for( Object elem: list ){
System.out.println( eligInqClass.cast( elem ) );
}
Probably an embarras de richesses.
I am trying to query a local version of the Linked Movie Database using SPARQL. The file is in N-Triples format and its size is approximately 450 MB. I am using servlets for the implementation. When I pass the query, it takes more than five minutes for the servlet to process it, and at the end I get the following exception:
type Exception report
message
description The server encountered an internal error () that prevented it from fulfilling this request.
exception
javax.servlet.ServletException: Servlet execution threw an exception
root cause
java.lang.OutOfMemoryError: Java heap space
java.util.Arrays.copyOfRange(Arrays.java:3209)
java.lang.String.<init>(String.java:215)
java.lang.StringBuilder.toString(StringBuilder.java:430)
org.openjena.riot.tokens.TokenizerText.allBetween(TokenizerText.java:732)
org.openjena.riot.tokens.TokenizerText.parseToken(TokenizerText.java:152)
org.openjena.riot.tokens.TokenizerText.hasNext(TokenizerText.java:69)
org.openjena.atlas.iterator.PeekIterator.fill(PeekIterator.java:37)
org.openjena.atlas.iterator.PeekIterator.next(PeekIterator.java:77)
org.openjena.riot.lang.LangBase.nextToken(LangBase.java:145)
org.openjena.riot.lang.LangNTriples.parseOne(LangNTriples.java:59)
org.openjena.riot.lang.LangNTriples.parseOne(LangNTriples.java:21)
org.openjena.riot.lang.LangNTuple.runParser(LangNTuple.java:58)
org.openjena.riot.lang.LangBase.parse(LangBase.java:75)
org.openjena.riot.system.JenaReaderNTriples2.readWorker(JenaReaderNTriples2.java:28)
org.openjena.riot.system.JenaReaderRIOT.readImpl(JenaReaderRIOT.java:124)
org.openjena.riot.system.JenaReaderRIOT.read(JenaReaderRIOT.java:79)
com.hp.hpl.jena.rdf.model.impl.ModelCom.read(ModelCom.java:226)
com.hp.hpl.jena.util.FileManager.readModelWorker(FileManager.java:395)
com.hp.hpl.jena.util.FileManager.loadModelWorker(FileManager.java:299)
com.hp.hpl.jena.util.FileManager.loadModel(FileManager.java:250)
ServletExample.runQuery(ServletExample.java:92)
ServletExample.doGet(ServletExample.java:62)
javax.servlet.http.HttpServlet.service(HttpServlet.java:627)
javax.servlet.http.HttpServlet.service(HttpServlet.java:729)
note The full stack trace of the root cause is available in the Apache Tomcat/5.5.31 logs.
My code is:
import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.ServletException;
import javax.servlet.http.*;
import com.hp.hpl.jena.query.*;
import com.hp.hpl.jena.rdf.model.*;
import com.hp.hpl.jena.util.FileManager;
public class ServletExample
extends HttpServlet
{
/***********************************/
/* Constants */
/***********************************/
private static final long serialVersionUID = 1L;
public static final String SPARQL_ENDPOINT = "http://data.linkedmdb.org/sparql";
public static final String QUERY ="PREFIX m: <http://data.linkedmdb.org/resource/movie/>"
+"SELECT DISTINCT ?actorName WHERE {"+
"?dir1 m:director_name \"Sofia Coppola\"."+
"?dir2 m:director_name \"Francis Ford Coppola\"."+
"?dir1film m:director ?dir1;"+
"m:actor ?actor."+
"?dir2film m:director ?dir2;"+
"m:actor ?actor."+
"?actor m:actor_name ?actorName."+
"}";
/*"PREFIX m: <http://data.linkedmdb.org/resource/movie/>\n" +
"SELECT DISTINCT ?actorName WHERE {\n" +
" ?dir1 m:director_name %dir_name_1%.\n" +
" ?dir2 m:director_name %dir_name_2%.\n" +
" ?dir1film m:director ?dir1;\n" +
" m:actor ?actor.\n" +
" ?dir2film m:director ?dir2;\n" +
" m:actor ?actor.\n" +
" ?actor m:actor_name ?actorName.\n" +
"}\n" +
"";*/
private static final String HEADER = "<html>\n" +
" <head>\n" +
" <title>results</title>\n" +
" <link href=\"simple.css\" type=\"text/css\" rel=\"stylesheet\" />\n" +
" </head>\n" +
" <body>\n" +
"";
private static final String FOOTER = "</body></html>";
/**
* Respond to HTTP GET request. Will need to be mounted against some URL
* pattern in web.xml
*/
@Override
protected void doGet( HttpServletRequest req, HttpServletResponse resp )
throws ServletException, IOException
{
String dir1 = req.getParameter( "dir1" );//"Sofia";
String dir2 = req.getParameter( "dir2" );//"Francis Ford Coppola";
//String dir1 = "Sofia";
//String dir2 = "Francis Ford Coppola";
if (dir1 == null || dir2 == null || dir1.isEmpty() || dir2.isEmpty()) {
noInput( resp );
}
else {
runQuery( resp, dir1, dir2 );
}
}
protected void noInput( HttpServletResponse resp )
throws IOException
{
header( resp );
resp.getWriter().println( "<p>Please select director names as query params <code>dir1</code> and <code>dir2</code></p>" );
footer( resp );
}
protected void footer( HttpServletResponse resp ) throws IOException {
resp.getWriter().println( FOOTER );
}
protected void header( HttpServletResponse resp ) throws IOException {
resp.getWriter().println( HEADER );
}
protected void runQuery( HttpServletResponse resp, String dir1, String dir2 )
throws IOException
{
PrintWriter out = resp.getWriter();
// Set up the query
// String q = QUERY.replace( "%dir_name_1%", "\"" + dir1 + "\"" )
// .replace( "%dir_name_2%", "\"" + dir2 + "\"" );
String q=QUERY;
Query query = QueryFactory.create( q ) ;
Model model = FileManager.get().loadModel( "e:\\applications\\linkedmdb-18-05-2009-dump\\dump.nt" );
// QueryExecution qexec = QueryExecutionFactory.sparqlService( SPARQL_ENDPOINT, query );
//com.hp.hpl.jena.query.Query query = QueryFactory.create(QUERY);
QueryExecution qexec = QueryExecutionFactory.create(query, model);
// perform the query
ResultSet results = qexec.execSelect();
// generate the output
header( resp );
if (!results.hasNext()) {
out.println( "<p>No results, sorry.</p>" );
}
else {
out.println( "<h1>Results</h1>" );
while (results.hasNext()) {
QuerySolution qs = results.next();
String actorName = qs.getLiteral( "actorName" ).getLexicalForm();
out.println( String.format( "<div>Actor named: %s</div>", actorName ) );
}
}
footer( resp );
}
}
Is there any way to resolve this exception?
It seems you're loading all your data into memory using Jena/RIOT. As far as I know, LinkedMDB is large enough to cause problems with this approach: you're bringing your whole database into memory.
Increasing the JVM heap could be one possible solution, but it won't scale if your data keeps growing.
The right solution is to go for other Jena configurations that are designed for datasets of this size:
1. Jena SDB, which uses relational databases as a backend.
2. Jena TDB, which uses native storage based on B-tree indexes to speed up queries. It scales better than (1).
Optionally, you could go for a scalable RDF database such as 4store and query your data via Jena ARQ. This solution is by far the one that will scale and perform best.
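As a rough sketch of the TDB option (assuming the Jena TDB jar is on the classpath and com.hp.hpl.jena.tdb.TDBFactory is imported; the store directory name is made up), load the dump once into an on-disk store and then query that store, instead of re-parsing the 450 MB file on every request:
// One-time bulk load into a TDB directory, then query the persistent model.
Model model = TDBFactory.createModel( "e:/tdb-linkedmdb" );
if ( model.isEmpty() ) {
    FileManager.get().readModel( model, "e:/applications/linkedmdb-18-05-2009-dump/dump.nt" );
}
QueryExecution qexec = QueryExecutionFactory.create( QueryFactory.create( QUERY ), model );
try {
    ResultSet results = qexec.execSelect();
    while ( results.hasNext() ) {
        System.out.println( results.next() );
    }
} finally {
    qexec.close();
}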
You are running out of heap memory in the Java Virtual Machine (JVM). Either increase the amount of heap memory available to the JVM or design your software to use less memory, for example by processing the data in smaller chunks.
To increase heap memory, add this parameter to your servlet container's or application server's startup script, wherever your java binary is executed. It tells the JVM that it may use up to 512 megabytes of memory; if that is not enough, try larger values:
-Xmx512m
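For example, with Apache Tomcat the conventional place for this (assuming a Unix-style install; the exact file depends on your Tomcat version) is the CATALINA_OPTS variable, e.g. in bin/setenv.sh:
CATALINA_OPTS="-Xmx512m"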
It is hard to say how to improve your software to use less memory without seeing the actual code.
I am trying to get the Highlighter class from Lucene to work properly with tokens coming from Solr's WordDelimiterFilter. It works 90% of the time, but if the matching text contains a ',' such as "1,500" the output is incorrect:
Expected: 'test <b>1,500</b> this'
Observed: 'test 1<b>1,500</b> this'
I am not currently sure whether it is Highlighter messing up the recombination or WordDelimiterFilter messing up the tokenization but something is unhappy. Here are the relevant dependencies from my pom:
<dependency>
  <groupId>org.apache.lucene</groupId>
  <artifactId>lucene-core</artifactId>
  <version>2.9.3</version>
  <type>jar</type>
  <scope>compile</scope>
</dependency>
<dependency>
  <groupId>org.apache.lucene</groupId>
  <artifactId>lucene-highlighter</artifactId>
  <version>2.9.3</version>
  <type>jar</type>
  <scope>compile</scope>
</dependency>
<dependency>
  <groupId>org.apache.solr</groupId>
  <artifactId>solr-core</artifactId>
  <version>1.4.0</version>
  <type>jar</type>
  <scope>compile</scope>
</dependency>
And here is a simple JUnit test class demonstrating the problem:
package test.lucene;
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;
import java.io.IOException;
import java.io.Reader;
import java.util.HashMap;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.InvalidTokenOffsetsException;
import org.apache.lucene.search.highlight.QueryScorer;
import org.apache.lucene.search.highlight.SimpleFragmenter;
import org.apache.lucene.search.highlight.SimpleHTMLFormatter;
import org.apache.lucene.util.Version;
import org.apache.solr.analysis.StandardTokenizerFactory;
import org.apache.solr.analysis.WordDelimiterFilterFactory;
import org.junit.Test;
public class HighlighterTester {
private static final String PRE_TAG = "<b>";
private static final String POST_TAG = "</b>";
private static String[] highlightField( Query query, String fieldName, String text )
throws IOException, InvalidTokenOffsetsException {
SimpleHTMLFormatter formatter = new SimpleHTMLFormatter( PRE_TAG, POST_TAG );
Highlighter highlighter = new Highlighter( formatter, new QueryScorer( query, fieldName ) );
highlighter.setTextFragmenter( new SimpleFragmenter( Integer.MAX_VALUE ) );
return highlighter.getBestFragments( getAnalyzer(), fieldName, text, 10 );
}
private static Analyzer getAnalyzer() {
return new Analyzer() {
@Override
public TokenStream tokenStream( String fieldName, Reader reader ) {
// Start with a StandardTokenizer
TokenStream stream = new StandardTokenizerFactory().create( reader );
// Chain on a WordDelimiterFilter
WordDelimiterFilterFactory wordDelimiterFilterFactory = new WordDelimiterFilterFactory();
HashMap<String, String> arguments = new HashMap<String, String>();
arguments.put( "generateWordParts", "1" );
arguments.put( "generateNumberParts", "1" );
arguments.put( "catenateWords", "1" );
arguments.put( "catenateNumbers", "1" );
arguments.put( "catenateAll", "0" );
wordDelimiterFilterFactory.init( arguments );
return wordDelimiterFilterFactory.create( stream );
}
};
}
@Test
public void TestHighlighter() throws ParseException, IOException, InvalidTokenOffsetsException {
String fieldName = "text";
String text = "test 1,500 this";
String queryString = "1500";
String expected = "test " + PRE_TAG + "1,500" + POST_TAG + " this";
QueryParser parser = new QueryParser( Version.LUCENE_29, fieldName, getAnalyzer() );
Query q = parser.parse( queryString );
String[] observed = highlightField( q, fieldName, text );
for ( int i = 0; i < observed.length; i++ ) {
System.out.println( "\t" + i + ": '" + observed[i] + "'" );
}
if ( observed.length > 0 ) {
System.out.println( "Expected: '" + expected + "'\n" + "Observed: '" + observed[0] + "'" );
assertEquals( expected, observed[0] );
}
else {
assertTrue( "No matches found", false );
}
}
}
Anyone have any ideas or suggestions?
After further investigation, this appears to be a bug in the Lucene Highlighter code. As you can see here:
public class TokenGroup {
...
protected boolean isDistinct() {
return offsetAtt.startOffset() >= endOffset;
}
...
The code attempts to determine whether a group of tokens is distinct by checking whether the start offset is greater than or equal to the previous end offset. The problem with this approach is illustrated by this issue. If you were to step through the tokens, you would see that they are as follows:
0-4: 'test', 'test'
5-6: '1', '1'
7-10: '500', '500'
5-10: '1500', '1,500'
11-15: 'this', 'this'
From this you can see that the third token starts after the end of the second, but the fourth starts the same place as the second. The intended outcome would be to group tokens 2, 3, and 4, but per this implementation, token 3 is seen as separate from 2, so 2 shows up by itself, then 3 and 4 get grouped leaving this outcome:
Expected: 'test <b>1,500</b> this'
Observed: 'test 1<b>1,500</b> this'
I'm not sure this can be accomplished without 2 passes, one to get all the indexes and a second to combine them. Also, I'm not sure what the implications would be outside of this specific case. Does anyone have any ideas here?
EDIT
Here is the final source code I came up with. It groups things correctly, and it also appears to be MUCH simpler than the Lucene Highlighter implementation, though admittedly it does not handle different levels of scoring, as my application only needs a yes/no answer on whether a fragment of text gets highlighted. It's also worth noting that I am using their QueryScorer to score the text fragments, which has the weakness of being term-oriented rather than phrase-oriented: the search string "grammatical or spelling" would end up with highlighting that looks something like "<b>grammatical</b> or <b>spelling</b>", as the "or" would most likely get dropped by your analyzer. Anyway, here is my source:
public TextFragments<E> getTextFragments( TokenStream tokenStream,
String text,
Scorer scorer )
throws IOException, InvalidTokenOffsetsException {
OffsetAttribute offsetAtt = (OffsetAttribute) tokenStream.addAttribute( OffsetAttribute.class );
TermAttribute termAtt = (TermAttribute) tokenStream.addAttribute( TermAttribute.class );
TokenStream newStream = scorer.init( tokenStream );
if ( newStream != null ) {
tokenStream = newStream;
}
TokenGroups tgs = new TokenGroups();
scorer.startFragment( null );
while ( tokenStream.incrementToken() ) {
tgs.add( offsetAtt.startOffset(), offsetAtt.endOffset(), scorer.getTokenScore() );
if ( log.isTraceEnabled() ) {
log.trace( new StringBuilder()
.append( scorer.getTokenScore() )
.append( " " )
.append( offsetAtt.startOffset() )
.append( "-" )
.append( offsetAtt.endOffset() )
.append( ": '" )
.append( termAtt.term() )
.append( "', '" )
.append( text.substring( offsetAtt.startOffset(), offsetAtt.endOffset() ) )
.append( "'" )
.toString() );
}
}
return tgs.fragment( text );
}
private class TokenGroup {
private int startIndex;
private int endIndex;
private float score;
public TokenGroup( int startIndex, int endIndex, float score ) {
this.startIndex = startIndex;
this.endIndex = endIndex;
this.score = score;
}
}
private class TokenGroups implements Iterable<TokenGroup> {
private List<TokenGroup> tgs;
public TokenGroups() {
tgs = new ArrayList<TokenGroup>();
}
public void add( int startIndex, int endIndex, float score ) {
add( new TokenGroup( startIndex, endIndex, score ) );
}
public void add( TokenGroup tg ) {
for ( int i = tgs.size() - 1; i >= 0; i-- ) {
if ( tg.startIndex < tgs.get( i ).endIndex ) {
tg = merge( tg, tgs.remove( i ) );
}
else {
break;
}
}
tgs.add( tg );
}
private TokenGroup merge( TokenGroup tg1, TokenGroup tg2 ) {
return new TokenGroup( Math.min( tg1.startIndex, tg2.startIndex ),
Math.max( tg1.endIndex, tg2.endIndex ),
Math.max( tg1.score, tg2.score ) );
}
private TextFragments<E> fragment( String text ) {
TextFragments<E> fragments = new TextFragments<E>();
int lastEndIndex = 0;
for ( TokenGroup tg : this ) {
if ( tg.startIndex > lastEndIndex ) {
fragments.add( text.substring( lastEndIndex, tg.startIndex ), textModeNormal );
}
fragments.add(
text.substring( tg.startIndex, tg.endIndex ),
tg.score > 0 ? textModeHighlighted : textModeNormal );
lastEndIndex = tg.endIndex;
}
if ( lastEndIndex < text.length() ) {
fragments.add( text.substring( lastEndIndex ), textModeNormal );
}
return fragments;
}
@Override
public Iterator<TokenGroup> iterator() {
return tgs.iterator();
}
}
Here's a possible cause.
Your highlighter needs to use the same Analyzer used for search. IIUC, your code uses a default analyzer for the highlighting, even though it uses a specialized analyzer for parsing the query. I believe you need to change the Fragmenter to work with your specific TokenStream.
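For what it's worth, here is a minimal sketch of that idea, reusing only classes already imported in the test above (plus the getAnalyzer() helper from the question): keep a single Analyzer instance and hand the same instance to both the QueryParser and getBestFragments():
// One Analyzer instance drives both query parsing and highlighting
// (checked exceptions are left to the caller).
Analyzer analyzer = getAnalyzer();
QueryParser parser = new QueryParser( Version.LUCENE_29, "text", analyzer );
Query q = parser.parse( "1500" );
Highlighter highlighter = new Highlighter( new SimpleHTMLFormatter( "<b>", "</b>" ),
        new QueryScorer( q, "text" ) );
highlighter.setTextFragmenter( new SimpleFragmenter( Integer.MAX_VALUE ) );
String[] fragments = highlighter.getBestFragments( analyzer, "text", "test 1,500 this", 10 );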