I am making an application that works with about 2-3 OWL files, each serving a different task within the same application. I am using Jena as my semantic web framework. My question is: how do we organize/set up these OWL files?
Should I read all the OWL files into the same dataset, or should I maintain a different dataset for each OWL file?
Note: I am not considering imported OWLs, as those are handled by Jena itself.
If I use the same dataset, how can I differentiate between the results obtained from functions like OntModel.listHierarchyRootClasses() and other such functions?
Is it possible to name the ontologies when I read them into the OntModel?
Hence I would like to know the best practice for handling more than one OWL file in the same application.
For example:
I read my ontologies into an OntModel backed by a TDB dataset:
public static void loadModel() {
    dataset.begin(ReadWrite.WRITE);
    try {
        ontModel = ModelToOntModel(model);
        FileManager.get().readModel(ontModel, "SourceOwl1.owl");
        FileManager.get().readModel(ontModel, "SourceOwl2.owl");
        registerListener();
        dataset.commit();
    } catch (Exception e) {
        System.out.println("Error in Loading model from source!!");
        e.printStackTrace();
    } finally {
        dataset.end();
    }
}
Once the OntModel is ready, a user input specifies a particular class (say SourceOWL2_ClassA) from any of the OWL files, whose object properties and datatype properties I then need to process in order to provide the user with information in that context.
But when I do that, properties from SourceOWL1 also get listed and hence cause errors. Furthermore, the structures of SourceOWL1 and SourceOWL2 are very different: SourceOWL1 contains about 3 imports while SourceOWL2 contains none.
After a few days of extensive hands-on work I found the solution.
The answer is to make use of named models in the Dataset.
The mistake in the above code snippet is that the model/ontModel used is generated from the default model, i.e.
Model model = dataset.getDefaultModel();
Instead one should make use of:
Model namedModel = dataset.addNamedModel("NameOfModel"); where "NameOfModel" can be any string convenient for the developer.
After that, load the OWL files into their respective named models.
Thus the above function can be rewritten as follows:
public static void loadModel() {
    dataset.begin(ReadWrite.WRITE);
    try {
        // Load first model into its own named model
        Model namedModel1 = dataset.addNamedModel("NamedModel1");
        OntModel ontModel1 = ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM, namedModel1);
        FileManager.get().readModel(ontModel1, "SourceOwl1.owl");
        // Load second model into a separate named model
        Model namedModel2 = dataset.addNamedModel("NamedModel2");
        OntModel ontModel2 = ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM, namedModel2);
        FileManager.get().readModel(ontModel2, "SourceOwl2.owl");
        // Similarly you can load many other models within the same dataset.
        dataset.commit();
    } catch (Exception e) {
        System.out.println("Error in Loading model from source!!");
        e.printStackTrace();
    } finally {
        dataset.end();
    }
}
To answer the problems stated in the question:
Once dataset creation is complete, we can access the ontology/OntModel specific to our requirement by using dataset.getNamedModel("NamedModel1") and hence treat it as an OntModel independent of the others.
Since the ontModel used in the question was generated via dataset.getDefaultModel(), ontModel.listHierarchyRootClasses() used to return root classes from all the source OWLs. But now one can access the desired model using the named-model concept, and ontModel.listHierarchyRootClasses() will answer the root classes specific to that ontology only.
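For example, here is a minimal sketch (reusing the dataset and the named-model names from the snippet above) of reading one named model back and querying it in isolation:
dataset.begin(ReadWrite.READ);
try {
    // Wrap the named model in an OntModel; only SourceOwl1.owl is visible here
    Model namedModel1 = dataset.getNamedModel("NamedModel1");
    OntModel ontModel1 = ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM, namedModel1);
    ExtendedIterator<OntClass> roots = ontModel1.listHierarchyRootClasses();
    while (roots.hasNext()) {
        System.out.println(roots.next());
    }
} finally {
    dataset.end();
}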
For more information on named models you can refer to the Jena documentation on RDF datasets.
It helped me clear up my concepts; hope it helps you too.
Related
I'm trying to create a Model in Jena that won't load all the data into memory but will instead read from the filesystem.
I found a whole lot of available configurations, but they all seem to be in-memory (for example in OntModelSpec).
Use Apache Jena TDB - see the TDB documentation.
TDB stores your dataset on disk, but accesses it very efficiently: you shouldn't experience any real performance difference over an in-memory model.
Typically, if I'm dealing with a large model or dataset I work like this:
Load model on commandline:
# /tmp/DB is where TDB will store the indexed model
$ tdbloader2 --loc /tmp/DB file.nt
(use tdbloader on Windows)
(Optional) Try a query:
$ tdbquery --loc /tmp/DB --query query.sparql
Access it like any other model from Java:
Dataset dataset = TDBFactory.createDataset("/tmp/DB") ;
Model model = dataset.getDefaultModel() ;
... continue as before ...
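As a small sketch of that last step (assuming the /tmp/DB location above), you can also wrap the access in a read transaction, which is the recommended way to work with TDB:
Dataset dataset = TDBFactory.createDataset("/tmp/DB");
dataset.begin(ReadWrite.READ);   // TDB works most reliably inside transactions
try {
    Model model = dataset.getDefaultModel();
    System.out.println("Triples in the default graph: " + model.size());
} finally {
    dataset.end();
}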
Alternatively, you can create your own implementation of org.apache.jena.graph.Graph that doesn't keep its data in memory.
An example is D2RQ, where de.fuberlin.wiwiss.d2rq.jena.GraphD2RQ works against relational databases, but it is based on an outdated version of Jena.
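If you go that route, the usual pattern (a rough, non-working skeleton; the class name below is illustrative) is to extend org.apache.jena.graph.impl.GraphBase and answer find() requests from your external store:
public class MyDiskBackedGraph extends GraphBase {
    @Override
    protected ExtendedIterator<Triple> graphBaseFind(Triple pattern) {
        // Translate the (subject, predicate, object) pattern into a lookup
        // against the external store and return the matching triples.
        throw new UnsupportedOperationException("backend lookup not implemented");
    }
}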
I'm trying to understand how to first load, and then read, an RDF file in N-Turtle format.
I'm using Jena Java APIs (https://jena.apache.org/index.html).
I'm trying with this Java code:
Model model = ModelFactory.createDefaultModel();
FileInputStream inputStream = null;
try {
inputStream = new FileInputStream("path_in_my_pc");
} catch (FileNotFoundException e) {}
I have to search for a word in the RDF file and print the results that match.
I was thinking of saving the RDF into a string and then using some String method to find what I need. Is there another way to do this?
It would also be useful to understand how to iterate over the whole RDF file and print the entire document.
Thank you for the help.
There's no format called N-Turtle. There are N3, Turtle, and N-Triples. N3 and Turtle are more or less interchangeable (they're not actually the same, but most of the time, when people say N3, they actually mean Turtle). For reading from Turtle, the answer to
Jena read from turtle fails
should work for you. For N-Triples,
How to load N-TRIPLE file in apache jena?
can work. Also have a look at
How to load an N-Triples model and obtain the namespaces from RDF file?(newbie)
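In short, reading and scanning a Turtle (or N-Triples) file with Jena can look roughly like the sketch below; the file name and the search word are placeholders, and RDFDataMgr picks the parser from the file extension:
Model model = RDFDataMgr.loadModel("data.ttl");   // or "data.nt" for N-Triples
// Iterate over every statement in the model
StmtIterator it = model.listStatements();
while (it.hasNext()) {
    Statement stmt = it.nextStatement();
    // e.g. match a word against the string form of the object
    if (stmt.getObject().toString().contains("someWord")) {
        System.out.println(stmt);
    }
}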
I have this piece of information in RDF/XML
<rdf:RDF xmlns:cim="http://iec.ch/TC57/2012/CIM-schema-cim16#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<cim:SynchronousMachineTimeConstantReactance rdf:ID="_54302da0-b02c-11e3-af35-080027008896">
<cim:IdentifiedObject.aliasName>GENCLS_DYN</cim:IdentifiedObject.aliasName>
<cim:IdentifiedObject.name>RoundRotor Dynamics</cim:IdentifiedObject.name>
<cim:SynchronousMachineTimeConstantReactance.tpdo>0.30000001192092896</cim:SynchronousMachineTimeConstantReactance.tpdo>
<cim:SynchronousMachineTimeConstantReactance.tppdo>0.15000000596046448</cim:SynchronousMachineTimeConstantReactance.tppdo>
I have learned a little bit about how to read the document, but now I want to go further. I am "playing" with API functions to try to get the values, but I am lost (and I think I do not properly understand how Jena and RDF work). So, how can I get the values of each tag?
Greetings!
I would start with the Reading and Writing RDF in Apache Jena documentation, and then read The Core RDF API. One important step in understanding the RDF data model is to separate any notion of XML from your understanding of RDF. RDF is a graph data model that just so happens to have one serialization which is in XML.
You'll note that XML-specific language like "tags" doesn't show up at all in the discussion unless you are talking about how to serialize/deserialize RDF/XML.
In order to make the data you are looking at more human friendly, I'd suggest writing it out in TURTLE. TURTLE (or TTL) is another serialization of RDF that is much easier to read or write.
The following code will express your data in TURTLE and will be helpful in understanding what you see.
final InputStream yourInputFile = ...;
final Model model = ModelFactory.createDefaultModel();
model.read(yourInputFile, "RDF/XML");
model.write(System.out, null, "TURTLE");
You'll also want to provide minimal working examples whenever submitting questions on the subject area. For example, I had to add some missing end-tags to your data in order for it to be valid XML:
<rdf:RDF xmlns:cim="http://iec.ch/TC57/2012/CIM-schema-cim16#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<cim:SynchronousMachineTimeConstantReactance rdf:ID="_54302da0-b02c-11e3-af35-080027008896">
<cim:IdentifiedObject.aliasName>GENCLS_DYN</cim:IdentifiedObject.aliasName>
<cim:IdentifiedObject.name>RoundRotor Dynamics</cim:IdentifiedObject.name>
<cim:SynchronousMachineTimeConstantReactance.tpdo>0.30000001192092896</cim:SynchronousMachineTimeConstantReactance.tpdo>
<cim:SynchronousMachineTimeConstantReactance.tppdo>0.15000000596046448</cim:SynchronousMachineTimeConstantReactance.tppdo>
</cim:SynchronousMachineTimeConstantReactance>
</rdf:RDF>
Which becomes the following TURTLE:
<file:///R:/workspaces/create/git-svn/create-sparql/RDF/XML#_54302da0-b02c-11e3-af35-080027008896>
a cim:SynchronousMachineTimeConstantReactance ;
cim:IdentifiedObject.aliasName "GENCLS_DYN" ;
cim:IdentifiedObject.name "RoundRotor Dynamics" ;
cim:SynchronousMachineTimeConstantReactance.tpdo "0.30000001192092896" ;
cim:SynchronousMachineTimeConstantReactance.tppdo "0.15000000596046448" .
RDF operates at the statement level, so to find out that your _54302da0-b02c-11e3-af35-080027008896 is a cim:SynchronousMachineTimeConstantReactance you would look for the corresponding triples. Jena's Model API (linked to above) will provide you with methods to identify the properties that resources have.
The following will list all statements whose subject is the aforementioned resource:
final Resource s = model.getResource("file:///R:/workspaces/create/git-svn/create-sparql/RDF/XML#_54302da0-b02c-11e3-af35-080027008896");
final ExtendedIterator<Statement> properties = s.listProperties();
while( properties.hasNext() ) {
System.out.println(properties.next());
}
which produces:
[file:///R:/workspaces/create/git-svn/create-sparql/RDF/XML#_54302da0-b02c-11e3-af35-080027008896, http://iec.ch/TC57/2012/CIM-schema-cim16#SynchronousMachineTimeConstantReactance.tppdo, "0.15000000596046448"]
[file:///R:/workspaces/create/git-svn/create-sparql/RDF/XML#_54302da0-b02c-11e3-af35-080027008896, http://iec.ch/TC57/2012/CIM-schema-cim16#SynchronousMachineTimeConstantReactance.tpdo, "0.30000001192092896"]
[file:///R:/workspaces/create/git-svn/create-sparql/RDF/XML#_54302da0-b02c-11e3-af35-080027008896, http://iec.ch/TC57/2012/CIM-schema-cim16#IdentifiedObject.name, "RoundRotor Dynamics"]
[file:///R:/workspaces/create/git-svn/create-sparql/RDF/XML#_54302da0-b02c-11e3-af35-080027008896, http://iec.ch/TC57/2012/CIM-schema-cim16#IdentifiedObject.aliasName, "GENCLS_DYN"]
[file:///R:/workspaces/create/git-svn/create-sparql/RDF/XML#_54302da0-b02c-11e3-af35-080027008896, http://www.w3.org/1999/02/22-rdf-syntax-ns#type, http://iec.ch/TC57/2012/CIM-schema-cim16#SynchronousMachineTimeConstantReactance]
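If you already know which property you want, you can also fetch a single value directly instead of iterating; here is a small sketch using the model and resource from the code above and the cim namespace from your data:
final String CIM = "http://iec.ch/TC57/2012/CIM-schema-cim16#";
final Property name = model.getProperty(CIM + "IdentifiedObject.name");
final Statement stmt = s.getProperty(name);
if (stmt != null) {
    System.out.println(stmt.getString());   // prints: RoundRotor Dynamics
}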
I intend to convert the data in an SQL database into an RDF dump. I have a model and an ontology defined.
Model model = ModelFactory.createDefaultModel();
OntModel ontModel = ModelFactory.createOntologyModel(OntModelSpec.OWL_DL_MEM, model);
The ontModel has many classes defined in it. Now let us suppose I have 10000 records in my SQL db and I want to load them into the model and write it into a file. However, I want to paginate in case of a memory overflow.
int fromIndex = 0;
int toIndex = 10;
while (true) {
    // 1. get resources between fromIndex and toIndex from the SQL db
    //    if there are no more resources, 'break'
    // 2. push these resources into the model
    // 3. write the model to a file
    RDFWriter writer = model.getWriter();
    File file = new File(file_path);
    FileWriter fileWriter = new FileWriter(file, true);
    writer.write(this.model, fileWriter, BASE_URL);
    model.close();
    fromIndex = toIndex + 1;
    toIndex = toIndex + 10;
}
Now, how does one append the new resources to the existing resources in the file? Currently I see the ontology getting written twice, and it throws an exception:
org.apache.jena.riot.RiotException: The markup in the document
following the root element must be well-formed.
Is there a way to handle this already?
Now, how does one append the new resources to the existing resources in the file? Currently I see the ontology getting written twice, and it throws an exception.
A model is a set of triples. You can add more triples to the set, but that's not quite the same thing as "appending", since a model doesn't contain duplicate triples, and sets don't have a specified order, so "appending" isn't quite the right metaphor.
Jena can handle pretty big models, so you might first see whether you can just create the model and add everything to it, and then write the model to the file. While it's good to be cautious, it's not a bad idea to see whether you can do what you want without jumping through hoops.
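For instance, here is a minimal sketch of that straightforward route (the file name is a placeholder, and the elided loop body stands in for your SQL-to-RDF mapping):
Model model = ModelFactory.createDefaultModel();
OntModel ontModel = ModelFactory.createOntologyModel(OntModelSpec.OWL_DL_MEM, model);
// ... add a resource and its statements for every record fetched from SQL ...
try (OutputStream out = new FileOutputStream("dump.ttl")) {
    // Write the whole model once, after all records have been added
    RDFDataMgr.write(out, ontModel, Lang.TURTLE);
}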
If you do have problems with in-memory models, you might consider using a TDB backed model which will use disk for storage. You could do incremental updates to that model using the model API, or SPARQL queries, and then extract the model afterward in some serialization.
One more option, and this is probably the easiest if you really do want to append to a file, is to use a non-XML serialization of the RDF, such as Turtle or N-Triples. These are text based (N-Triples is line-based), so appending new content to a file is not a problem. This approach is described in the answer to Adding more individuals to existing RDF ontology.
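A hedged sketch of that last option, appending each batch to an N-Triples file (dump.nt is a placeholder name); because N-Triples is line-based, simply concatenating batches produces a valid file:
while (true) {
    Model batch = ModelFactory.createDefaultModel();
    // ... fetch the next page of records from SQL and add them to 'batch';
    //     break out of the loop when there are no more records ...
    try (OutputStream out = new FileOutputStream("dump.nt", true)) {   // append mode
        RDFDataMgr.write(out, batch, Lang.NTRIPLES);
    }
    batch.close();
}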
I have an RDF file that I am able to read using
Model model = ModelFactory.createDefaultModel();
// use the FileManager to find the input file
InputStream in = FileManager.get().open(args[0]);
if (in == null) {
throw new IllegalArgumentException(
"File: " + args[0] + " not found");
}
// read the RDF/XML file
model.read(in, null);
I also have an OWL file that contains the description of the ontology used for creating my models. My question is: do I need to read this file (and how?) in order to work with my RDF model correctly?
To make myself clear, I will give you an example:
I need to know whether one resource has some relationship with another resource (for example, Station1 has the predicate "isResponsibleFor" Workorder1). How can I do this with Jena?
If I try to use something like resource.hasProperty(ResourceFactory.createProperty("isResponsibleFor")), it returns false (but the property is there!).
Can you direct me to some advanced tutorial on this topic perhaps? I found many tutorials on the Apache site etc., but they do not provide me with the information I am looking for. Sorry if the question is not clear; I am quite new to Jena.
EDIT: currently, I am searching whether my model contains given statement using this:
public static boolean containsStatement(Model model, String sub,
String pred, String obj) {
// list the statements in the Model
StmtIterator iter = model.listStatements();
// print out the predicate, subject and object of each statement
while (iter.hasNext()) {
Statement stmt = iter.nextStatement(); // get next statement
Resource subject = stmt.getSubject(); // get the subject
Property predicate = stmt.getPredicate(); // get the predicate
RDFNode object = stmt.getObject(); // get the object
if (subject.toString().contains(sub)
&& predicate.toString().contains(pred)
&& object.toString().contains(obj)) {
return true;
}
}
return false;
}
but I am pretty sure that this is a highly inefficient approach. Could you suggest something more elegant and fast? Thanks!
Short answer: no, you don't need the ontology to work with your RDF file, but in many cases it can help your application.
First, you can shorten loading your file:
Model model = FileManager.get().loadModel( args[0] );
Now, in order to work with the relationship between resources, as given by the URI of the property connecting the subject resource to the object, you need the full URI of the predicate. Typically, this will be something like http://example.com/foo#isResponsibleFor. If you just use the short name of the predicate, it won't work - which is what you are finding.
You don't show any examples of your actual RDF data, so I'm going to use a fake namespace. Use your actual namespace in your code. In the meantime:
String NS = "http://example.com/example#";
Property isResponsibleFor = model.getProperty( NS + "isResponsibleFor" );
Resource station = model.getResource( NS + "station1" );
for (StmtIterator i = station.listProperties( isResponsibleFor ); i.hasNext(); ) {
Statement s = i.next();
Resource workorder = s.getResource();
// now you can do something with the work-order resource
}
In your code, you had:
public static boolean containsStatement(Model model, String sub, String pred, String obj)
There are a number of things wrong here. First, it's better if you can write your code in a more object-oriented style, which tends not to use static methods if that can be avoided. Second, don't use strings when you refer to things in a model. Jena has the Resource class to denote resources in a model, and many other RDF-specific classes as well. Use strings for handling input from your user, but otherwise convert strings to resources or other RDF objects as soon as you can. Third, I'd advise against exposing the details of your representation via your object's API. containsStatement makes it clear in the API that you are using RDF triples; that's not a detail that callers of the API need to know, and it breaks encapsulation. A better API would have methods such as listWorkItems, which relates to the domain and hides the details of the implementation.
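For example, a domain-level wrapper might look like the following hypothetical sketch (the class, method, and namespace are illustrative, not part of your model):
public class StationService {
    private static final String NS = "http://example.com/example#";
    private final Model model;

    public StationService(Model model) {
        this.model = model;
    }

    /** List the work orders a given station is responsible for. */
    public List<Resource> listWorkOrders(Resource station) {
        Property isResponsibleFor = model.getProperty(NS + "isResponsibleFor");
        List<Resource> result = new ArrayList<>();
        for (StmtIterator i = station.listProperties(isResponsibleFor); i.hasNext(); ) {
            result.add(i.next().getResource());
        }
        return result;
    }
}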
Regarding the use of your ontology, there are two specific ways your application can benefit from using your ontology. First, you can automatically generate statements such as:
Property isResponsibleFor = model.getProperty( NS + "isResponsibleFor" );
by using Jena's schemagen vocabulary generator tool. You can use schemagen as part of your build process to ensure that your vocabulary class automatically stays up-to-date as your ontology changes.
Secondly, by using Jena's inference engines, you can use the ontology to infer additional statements about your domain. For example, suppose you have class WidgetOrder, which is a type of WorkItem. Without inference, and without your ontology, if you ask the model object to list all of the WorkItems, it won't list the WidgetOrder resources. However, with the ontology and the reasoner, listing resources of type WorkItem will also return the resources that only have a declared type of WidgetOrder, because the other types can be inferred.
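As a rough sketch of that second point (the file names and the WorkItem class URI are assumptions for illustration), you can layer a rule-based OWL reasoner over your data and ontology like this:
// Load the instance data and the ontology (file names are placeholders)
Model data = FileManager.get().loadModel("data.rdf");
Model schema = FileManager.get().loadModel("ontology.owl");
// Build an OntModel with a built-in OWL rule reasoner over data + ontology
Model union = ModelFactory.createUnion(schema, data);
OntModel inf = ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM_MICRO_RULE_INF, union);
// Listing WorkItem instances now also returns resources typed only as WidgetOrder,
// because the reasoner infers the supertype from the subclass axiom
OntClass workItem = inf.getOntClass(NS + "WorkItem");
ExtendedIterator<? extends OntResource> it = workItem.listInstances();
while (it.hasNext()) {
    System.out.println(it.next());
}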