Write to Jena RDF model through pagination - java

I intend to convert the data in an SQL database into an RDF dump. I have a model and an ontology defined.
Model model = ModelFactory.createDefaultModel();
OntModel ontModel = ModelFactory.createOntologyModel(OntModelSpec.OWL_DL_MEM, model);
The ontModel has many classes defined in it. Now suppose I have 10,000 records in my SQL database and I want to load them into the model and write the model to a file. However, I want to paginate to avoid running out of memory.
int fromIndex = 0;
int toIndex = 10;
while (true) {
    // 1. get resources between fromIndex and toIndex from the SQL db
    //    if there are no more resources, 'break'
    // 2. push these resources into the model
    // 3. write the model to a file
    RDFWriter writer = model.getWriter();
    File file = new File(file_path);
    FileWriter fileWriter = new FileWriter(file, true);
    writer.write(this.model, fileWriter, BASE_URL);
    model.close();
    fromIndex = toIndex + 1;
    toIndex = toIndex + 10;
}
Now, how does one append the new resources to the existing resources in the file? Currently I see the ontology getting written twice, and it throws an exception:
org.apache.jena.riot.RiotException: The markup in the document
following the root element must be well-formed.
Is there a way to handle this already?

Now, how does one append the new resources to the existing resources in the file? Currently I see the ontology getting written twice and it throws an exception
A model is a set of triples. You can add more triples to the set, but that's not quite the same thing as "appending", since a model doesn't contain duplicate triples, and sets don't have a specified order, so "appending" isn't quite the right metaphor.
Jena can handle pretty big models, so you might first see whether you can just create the model and add everything to it, and then write the model to the file. While it's good to be cautious, it's not a bad idea to see whether you can do what you want without jumping through hoops.
If you do have problems with in-memory models, you might consider using a TDB backed model which will use disk for storage. You could do incremental updates to that model using the model API, or SPARQL queries, and then extract the model afterward in some serialization.
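For instance, here is a rough sketch of the TDB option, not the only way to do it: the store directory is a placeholder, this assumes Jena's TDB1 TDBFactory and RIOT's RDFDataMgr, and file_path is from your question:
Dataset dataset = TDBFactory.createDataset("/path/to/tdb"); // placeholder directory
// For each page of SQL results, add the triples inside a write transaction:
dataset.begin(ReadWrite.WRITE);
try {
    Model tdbModel = dataset.getDefaultModel();
    // ... push this page's resources into tdbModel ...
    dataset.commit();
} finally {
    dataset.end();
}
// After the last page, serialize the accumulated model once:
dataset.begin(ReadWrite.READ);
try (FileOutputStream out = new FileOutputStream(file_path)) {
    RDFDataMgr.write(out, dataset.getDefaultModel(), Lang.TURTLE);
} finally {
    dataset.end();
}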
One more option, and this is probably the easiest if you really do want to append to a file, is to use a non-XML serialization of the RDF, such as Turtle or N-Triples. These are text based (N-Triples is line-based), so appending new content to a file is not a problem. This approach is described in the answer to Adding more individuals to existing RDF ontology.
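As a minimal sketch of that approach (assuming Jena's RIOT API; file_path comes from your question), the body of your loop would write each page with an N-Triples writer opened in append mode:
Model page = ModelFactory.createDefaultModel();
// ... push this page's resources into 'page' ...
try (FileOutputStream out = new FileOutputStream(file_path, true)) { // true = append
    RDFDataMgr.write(out, page, Lang.NTRIPLES); // line-based, so appending is safe
}
page.close();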

Related

Adding blank nodes to a Jena model

I'm trying to populate a Jena ontology model with an existing set of triples, some of which contain blank nodes. I want to maintain these blank nodes inside this new model faithfully but I can't work out a way of adding them into a Jena model.
I have been using:
Statement s = ResourceFactory.createStatement(subject, predicate, object);
To add new statements to the model:
private OntModel model = ModelFactory.createOntologyModel();
model.add(s);
but this only allows certain types as subject, predicate, and object: Resource subject, Property predicate, RDFNode object. None of these types allows adding a blank node as subject or object, such as one created with:
Node subject = NodeFactory.createBlankNode(subjectValue);
Any suggestions? I've tried just using the blank nodes as resources and creating a Resource object, but that breaks everything, as they then become classes rather than blank nodes.
Any help would be much appreciated; I've been pulling my hair out over this.
Well, if you already have an existing set of triples, you can easily read them from a file by using:
OntModel model = ModelFactory.createOntologyModel();
model.read(new FileInputStream("data.ttl"), null, "TTL");
This will take care of blank nodes; see the Jena documentation.
You can create a blank node by hand like this:
Resource subject = model.createResource("s");   // a named resource
Property predicate = model.createProperty("p");
Resource object = model.createResource();       // no URI given: a blank node
model.add(subject, predicate, object);
which will result in something like:
[s, p, aad22737-ce84-4564-a9c5-9bdfd49b55de]
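If you need to keep the original blank-node labels, one option (a sketch, assuming the label is available as a string called subjectValue) is to create the resource from a Jena AnonId instead of a URI:
Resource subject = model.createResource(new AnonId(subjectValue)); // anonymous, with a fixed id
model.add(subject, predicate, object);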

How to read into RDF file N-Turtle format using Jena Java APIs

I'm trying to understand how to load, and then read, an RDF file in N-Turtle format.
I'm using Jena Java APIs (https://jena.apache.org/index.html).
I'm trying with this Java code:
Model model = ModelFactory.createDefaultModel();
FileInputStream inputStream = null;
try {
    inputStream = new FileInputStream("path_in_my_pc");
} catch (FileNotFoundException e) {
    e.printStackTrace(); // don't swallow the exception silently
}
I need to search for a word in the RDF file and print the results that match.
I was thinking of saving the RDF Turtle into a string and then using some String method to find what I need. Is there another way to do this?
It would also be useful to understand how to iterate over the whole RDF file and print the entire document.
Thank you for the help.
There's no format called N-Turtle. There are N3, Turtle, and N-Triples. N3 and Turtle are more or less interchangeable (they're not actually the same, but most of the time, when people say N3, they actually mean Turtle). For reading from Turtle, the answer to
Jena read from turtle fails
should work for you. For N-Triples,
How to load N-TRIPLE file in apache jena?
can work. Also have a look at
How to load an N-Triples model and obtain the namespaces from RDF file?(newbie)
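Putting those together, here is a minimal sketch (assuming the file really is Turtle, and "keyword" stands in for the word you are searching for) that loads the file, prints every statement, and flags literals containing the word:
Model model = RDFDataMgr.loadModel("path_in_my_pc", Lang.TURTLE);
StmtIterator it = model.listStatements();
while (it.hasNext()) {
    Statement stmt = it.nextStatement();
    System.out.println(stmt); // prints the entire document, one statement at a time
    if (stmt.getObject().isLiteral()
            && stmt.getObject().asLiteral().getLexicalForm().contains("keyword")) {
        System.out.println("match: " + stmt);
    }
}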

Handle more than one OWL files in a same JENA application

I am making an application which may require about 2-3 OWL files to work with, in order to serve different tasks within the same application. I am using Jena as my semantic web framework. My question is: how do we organize/set up these OWL files?
Should I read all the OWL files into the same dataset, or should I maintain different datasets for different OWLs?
Note: I am not considering the imported OWLs, as those are handled by Jena itself.
If I use the same dataset, how can I differentiate between the results obtained from functions like OntModel.listHierarchyRootClasses() and other such functions?
Is it possible to name the ontologies when I read them into the OntModel?
Hence I would like to know the best practice for handling more than one OWL file in the same application.
For example, I read my ontologies into an OntModel backed by a TDB dataset:
public static void loadModel() {
    dataset.begin(ReadWrite.WRITE);
    try {
        ontModel = ModelToOntModel(model); // helper wrapping the dataset's model as an OntModel
        FileManager.get().readModel(ontModel, "SourceOwl1.owl");
        FileManager.get().readModel(ontModel, "SourceOwl2.owl");
        registerListener();
        dataset.commit();
    } catch (Exception e) {
        System.out.println("Error in loading model from source!");
        e.printStackTrace();
    } finally {
        dataset.end();
    }
}
Once the ontModel is ready, a user input specifies a particular class (say SourceOWL2_ClassA) from any of the OWL files, whose object properties and datatype properties I then need to process in order to provide the user with some information in the same context.
But when I do that, properties from SourceOWL1 also get listed and hence cause errors. Furthermore, the structures of SourceOWL1 and SourceOWL2 are very different: SourceOWL1 contains about 3 imports, while SourceOWL2 contains none.
After a few days of extensive hands-on work I found the solution.
The answer is to make use of NAMED MODELS in the Dataset.
The mistake committed in the above code snippet is that the model/ontModel used is generated from the default model, i.e.
Model model = dataset.getDefaultModel();
Instead, one should make use of:
Model namedModel = dataset.addNamedModel("NameOfModel"); where NameOfModel can be any string convenient for the developer.
Then load the OWL files into their respective named models.
Thus the above function can be rewritten as follows:
public static void loadModel() {
    dataset.begin(ReadWrite.WRITE);
    try {
        Model namedModel1 = dataset.addNamedModel("NamedModel1");
        OntModel ontModel1 = ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM, namedModel1);
        FileManager.get().readModel(ontModel1, "SourceOwl1.owl");
        // Load the second model the same way, with its own names
        Model namedModel2 = dataset.addNamedModel("NamedModel2");
        OntModel ontModel2 = ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM, namedModel2);
        FileManager.get().readModel(ontModel2, "SourceOwl2.owl");
        // Similarly you can load many other models within the same dataset.
        dataset.commit();
    } catch (Exception e) {
        System.out.println("Error in loading model from source!");
        e.printStackTrace();
    } finally {
        dataset.end();
    }
}
To answer the problems stated in the question:
Once dataset creation is complete, we can access the different ontologies/OntModels specific to our requirement by using dataset.getNamedModel("NamedModel1"), and hence treat each as an OntModel independent of the others.
Since the ontModel used in the question was generated via dataset.getDefaultModel(), ontModel.listHierarchyRootClasses() used to return root classes from all of the source OWLs. But now one can access the desired model using the named-model concept, and listHierarchyRootClasses() will answer the root classes specific to that ontology only.
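A minimal sketch of that access pattern (reusing the dataset and model names from the code above):
dataset.begin(ReadWrite.READ);
try {
    Model base = dataset.getNamedModel("NamedModel1");
    OntModel ontModel1 = ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM, base);
    // Root classes from SourceOwl1 only, not from the other named models:
    ontModel1.listHierarchyRootClasses().forEachRemaining(System.out::println);
} finally {
    dataset.end();
}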
For more information on Named models you can refer here
It helped me clear my concepts.. hope it helps you too..

Jena - how to use ontology + rdf together

I have an RDF file that I am able to read using:
Model model = ModelFactory.createDefaultModel();
// use the FileManager to find the input file
InputStream in = FileManager.get().open(args[0]);
if (in == null) {
    throw new IllegalArgumentException("File: " + args[0] + " not found");
}
// read the RDF/XML file
model.read(in, null);
I also have an OWL file which contains the description of the ontology used for creating my models. My question is: do I need to read this file (and how?) in order to work with my RDF model correctly?
To make myself clear, I will give you an example:
I need to know whether one resource has some relationship with another resource (for example, Station1 has the predicate "isResponsibleFor" Workorder1). How can I do this with Jena?
If I try to use something like resource.hasProperty(ResourceFactory.createProperty("isResponsibleFor")), it returns false (but the property is there!).
Can you direct me to some advanced tutorial on this topic, perhaps? I found many tutorials on the Apache site etc., but they do not provide the information I am looking for. Sorry if the question is not clear; I am quite new to Jena.
EDIT: currently, I am checking whether my model contains a given statement using this:
public static boolean containsStatement(Model model, String sub,
        String pred, String obj) {
    // list the statements in the Model
    StmtIterator iter = model.listStatements();
    // check the predicate, subject and object of each statement
    while (iter.hasNext()) {
        Statement stmt = iter.nextStatement();    // get next statement
        Resource subject = stmt.getSubject();     // get the subject
        Property predicate = stmt.getPredicate(); // get the predicate
        RDFNode object = stmt.getObject();        // get the object
        if (subject.toString().contains(sub)
                && predicate.toString().contains(pred)
                && object.toString().contains(obj)) {
            return true;
        }
    }
    return false;
}
but I am pretty sure that this is a highly inefficient approach. Could you suggest something more elegant and fast? Thanks!
Short answer: no, you don't need the ontology to work with your RDF file, but in many cases it can help your application.
First, you can shorten loading your file:
Model model = FileManager.get().loadModel( args[0] );
Now, in order to work with the relationship between resources, as given by the URI of the property connecting the subject resource to the object, you need the full URI of the predicate. Typically, this will be something like http://example.com/foo#isResponsibleFor. If you just use the short name of the predicate, it won't work - which is what you are finding.
You don't show any examples of your actual RDF data, so I'm going to use a fake namespace. Use your actual namespace in your code. In the meantime:
String NS = "http://example.com/example#";
Property isResponsibleFor = model.getProperty( NS + "isResponsibleFor" );
Resource station = model.getResource( NS + "station1" );
for (StmtIterator i = station.listProperties( isResponsibleFor ); i.hasNext(); ) {
    Statement s = i.next();
    Resource workorder = s.getResource();
    // now you can do something with the work-order resource
}
In your code, you had:
public static boolean containsStatement(Model model, String sub, String pred, String obj)
There are a number of things wrong here. First, it's better if you can write your code in a more object-oriented style, which tends not to use static methods if that can be avoided. Second, don't use strings when you refer to things in a model. Jena has the Resource class to denote resources in a model, and many other RDF-specific classes as well. Use strings for handling input from your user, otherwise convert strings to resources or other RDF objects as soon as you can. Thirdly, I'd advise against exposing the details of your representation via your object's API. containsStatement makes it clear in the API that you are using RDF triples; that's not a detail that callers of the API need to know and it breaks encapsulation. A better API would have methods such as listWorkItems - that relates to the domain, and hides details of the implementation.
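As a sketch of a more direct check (reusing the hypothetical NS from above; workorder1 is likewise a made-up name), the model can do the lookup itself instead of your code scanning every statement:
Resource station = model.getResource( NS + "station1" );
Property isResponsibleFor = model.getProperty( NS + "isResponsibleFor" );
Resource workorder = model.getResource( NS + "workorder1" );
// true exactly when the triple (station, isResponsibleFor, workorder) is in the model
boolean linked = model.contains( station, isResponsibleFor, workorder );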
Regarding the use of your ontology, there are two specific ways your application can benefit from using your ontology. First, you can automatically generate statements such as:
Property isResponsibleFor = model.getProperty( NS + "isResponsibleFor" );
by using Jena's schemagen vocabulary generator tool. You can use schemagen as part of your build process to ensure that your vocabulary class automatically stays up-to-date as your ontology changes.
Secondly, by using Jena's inference engines, you can use the ontology to infer additional statements about your domain. For example, suppose you have class WidgetOrder, which is a type of WorkItem. Without inference, and without your ontology, if you ask the model object to list all of the WorkItems, it won't list the WidgetOrder resources. However, with the ontology and the reasoner, listing resources of type WorkItem will also return the resources that only have a declared type of WidgetOrder, because the other types can be inferred.
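A sketch of that setup (the file names and the WorkItem URI are placeholders, and OWL_MEM_MICRO_RULE_INF is just one of several reasoner specs you could pick):
OntModel inf = ModelFactory.createOntologyModel( OntModelSpec.OWL_MEM_MICRO_RULE_INF );
inf.read( "ontology.owl" ); // declares WidgetOrder as a subclass of WorkItem
inf.read( "data.rdf" );     // instance data, with resources typed as WidgetOrder
Resource workItem = inf.getResource( NS + "WorkItem" );
// With the reasoner, this also finds resources whose only declared type is WidgetOrder
inf.listStatements( null, RDF.type, workItem )
   .forEachRemaining( s -> System.out.println( s.getSubject() ) );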

how to get the data from xml feeds

I have the following feed from my vendor:
http://scores.cricandcric.com/cricket/getFeed?key=4333433434343&format=xml&tagsformat=long&type=schedule
I want to get the data from that XML file as Java objects, so that I can insert it into my database regularly.
The above data is just regular updates from the vendor, which I then use to update my website.
Can you please suggest what options are available to get this working?
Should I use any web services, or just XStream, to get my final output? Please advise, as I am a newcomer to this concept.
The vendor has told me that he can give me the data in three formats: RSS, XML, or JSON. I am not sure which is the easiest and cheapest to consume.
I would suggest just writing a program that parses the XML and inserts the data directly into your database.
Example
This Groovy script inserts the data into an H2 database.
//
// Dependencies
// ============
@Grapes([
    @Grab(group='com.h2database', module='h2', version='1.3.163'),
    @GrabConfig(systemClassLoader=true)
])
import groovy.sql.Sql
//
// Main program
// ============
def sql = Sql.newInstance("jdbc:h2:db/cricket", "user", "pass", "org.h2.Driver")
def dataUrl = new URL("http://scores.cricandcric.com/cricket/getFeed?key=4333433434343&format=xml&tagsformat=long&type=schedule")

dataUrl.withReader { reader ->
    def feeds = new XmlSlurper().parse(reader)
    feeds.matches.match.each {
        def data = [
            it.id,
            it.name,
            it.type,
            it.tournamentId,
            it.location,
            it.date,
            it.GMTTime,
            it.localTime,
            it.description,
            it.team1,
            it.team2,
            it.teamId1,
            it.teamId2,
            it.tournamentName,
            it.logo
        ].collect {
            it.text()
        }
        sql.execute("INSERT INTO matches (id,name,type,tournamentId,location,date,GMTTime,localTime,description,team1,team2,teamId1,teamId2,tournamentName,logo) VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)", data)
    }
}
Well... you could use an XML parser (stream or DOM), or a JSON parser (again, stream or 'DOM'), and build the objects on the fly. But with this data, which seems to consist of records of cricket matches, why not go with a CSV format?
This seems to be your basic 'datum':
<id>1263</id>
<name>Australia v India 3rd Test at Perth - Jan 13-17, 2012</name>
<type>TestMatch</type>
<tournamentId>137</tournamentId>
<location>Perth</location>
<date>2012-01-14</date>
<GMTTime>02:30:00</GMTTime>
<localTime>10:30:00</localTime>
<description>3rd Test day 2</description>
<team1>Australia</team1>
<team2>India</team2>
<teamId1>7</teamId1>
<teamId2>1</teamId2>
<tournamentName>India tour of Australia 2011-12</tournamentName>
<logo>/cricket/137/tournament.png</logo>
Of course you would still have to parse the CSV, and deal with character delimiting (such as when you have a ' or a " in a string), but it will reduce your network traffic quite substantially, and likely parse much faster on the client. That said, it depends on what your client is.
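If you stay with XML, here is a minimal Java sketch of the parsing side (assuming the <match> elements shown above; "feed.xml" is a placeholder for the downloaded feed):
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class FeedParser {
    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse("feed.xml");
        NodeList matches = doc.getElementsByTagName("match");
        for (int i = 0; i < matches.getLength(); i++) {
            Element match = (Element) matches.item(i);
            String id = match.getElementsByTagName("id").item(0).getTextContent();
            String name = match.getElementsByTagName("name").item(0).getTextContent();
            // ... read the remaining fields the same way, then insert into the DB ...
            System.out.println(id + ": " + name);
        }
    }
}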
Actually, you have a RESTful source that can return data in several formats, and you only need to read from it; no further interaction is needed.
So you can use any XML parser to parse the XML data and put the extracted data into whatever data structure you want or already have.
I had not heard of XStream before, but you can find more information about selecting the best parser for your situation at this StackOverflow question.
