Adding blank nodes to a Jena model - java

I'm trying to populate a Jena ontology model with an existing set of triples, some of which contain blank nodes. I want to maintain these blank nodes inside this new model faithfully but I can't work out a way of adding them into a Jena model.
I have been using:
Statement s = ResourceFactory.createStatement(subject, predicate, object);
To add new statements to the model:
private OntModel model = ModelFactory.createOntologyModel();
model.add(s);
but this only allows for certain types as subject, predicate, and object; Resource subject, Property predicate, RDFNode object. None of these types allow for the adding of a blanknode as subject or object such as through:
Node subject = NodeFactory.createBlankNode(subjectValue);
Any suggestions? I've tried just using the blanknodes as resources and creating a Resource object but that breaks everything as they become classes then and not blank nodes.
Any help would be much appreciated, been pulling my hair out with this.

well, if you already have an existing set of triples you can easily read them from file by using:
OntModel model = ModelFactory.createOntologyModel();
model.read(new FileInputStream("data.ttl"), null, "TTL");
this will take care of blank nodes, see the jena documentation
you can create a blank node by hand like this:
Resource subject = model.createResource("s");
Property predicate = model.createProperty("p");
Resource object = model.createResource();
model.add(subject, predicate, object);
which will result in something like:
[s, p, aad22737-ce84-4564-a9c5-9bdfd49b55de]

Related

How to retrieve a RDF Property

I have this piece of information in RDF/XML
<rdf:RDF xmlns:cim="http://iec.ch/TC57/2012/CIM-schema-cim16#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<cim:SynchronousMachineTimeConstantReactance rdf:ID="_54302da0-b02c-11e3-af35-080027008896">
<cim:IdentifiedObject.aliasName>GENCLS_DYN</cim:IdentifiedObject.aliasName>
<cim:IdentifiedObject.name>RoundRotor Dynamics</cim:IdentifiedObject.name>
<cim:SynchronousMachineTimeConstantReactance.tpdo>0.30000001192092896</cim:SynchronousMachineTimeConstantReactance.tpdo>
<cim:SynchronousMachineTimeConstantReactance.tppdo>0.15000000596046448</cim:SynchronousMachineTimeConstantReactance.tppdo>
I have learned a little bit about how to read the document but now I want to go farther. I am "playing" with API functions to try to get the values but I am lost (and I think I do not understand properly how JENA and RDF work). So, how can I get the values of each tag?
Greetings!
I would start with the Reading and Writing RDF in Apache Jena documentation, and then read The Core RDF Api. One important step in understanding the RDF Data Model is to seperate any notion of XML from your understanding of RDF. RDF is a graph data model that just so happens to have one serialization which is in XML.
You'll note that xml-specific language like "tags" actually don't show up at all in the discussion unless you are talking about how to serialize/deserialize RDF/XML.
In order to make the data you are looking at more human friendly, I'd suggest writing it out in TURTLE. TURTLE (or TTL) is another serialization of RDF that is much easier to read or write.
The following code will express your data in TURTLE and will be helpful in understanding what you see.
final InputStream yourInputFile = ...;
final Model model = ModelFactory.createDefaultModel();
model.read(yourInputFile, "RDF/XML");
model.write(System.out, null, "TURTLE");
You'll also want to provide minimal working examples whenever submitting questions on the subject area. For example, I had to add some missing end-tags to your data in order for it to be valid XML:
<rdf:RDF xmlns:cim="http://iec.ch/TC57/2012/CIM-schema-cim16#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<cim:SynchronousMachineTimeConstantReactance rdf:ID="_54302da0-b02c-11e3-af35-080027008896">
<cim:IdentifiedObject.aliasName>GENCLS_DYN</cim:IdentifiedObject.aliasName>
<cim:IdentifiedObject.name>RoundRotor Dynamics</cim:IdentifiedObject.name>
<cim:SynchronousMachineTimeConstantReactance.tpdo>0.30000001192092896</cim:SynchronousMachineTimeConstantReactance.tpdo>
<cim:SynchronousMachineTimeConstantReactance.tppdo>0.15000000596046448</cim:SynchronousMachineTimeConstantReactance.tppdo>
</cim:SynchronousMachineTimeConstantReactance>
</rdf:RDF>
Which becomes the following TURTLE:
<file:///R:/workspaces/create/git-svn/create-sparql/RDF/XML#_54302da0-b02c-11e3-af35-080027008896>
a cim:SynchronousMachineTimeConstantReactance ;
cim:IdentifiedObject.aliasName "GENCLS_DYN" ;
cim:IdentifiedObject.name "RoundRotor Dynamics" ;
cim:SynchronousMachineTimeConstantReactance.tpdo "0.30000001192092896" ;
cim:SynchronousMachineTimeConstantReactance.tppdo "0.15000000596046448" .
RDF operates at the statement level, so to find out that your _54302da0-b02c-11e3-af35-080027008896 is a cim:SynchronousMachineTimeConstantReactance you would look for the corresponding triples. Jena's Model API (linked to above) will provide you with methods to identify the properties that resources have.
The following will list all statements whose subject is the aforementioned resource:
final Resource s = model.getResource("file:///R:/workspaces/create/git-svn/create-sparql/RDF/XML#_54302da0-b02c-11e3-af35-080027008896");
final ExtendedIterator<Statement> properties = s.listProperties();
while( properties.hasNext() ) {
System.out.println(properties.next());
}
which produces:
[file:///R:/workspaces/create/git-svn/create-sparql/RDF/XML#_54302da0-b02c-11e3-af35-080027008896, http://iec.ch/TC57/2012/CIM-schema-cim16#SynchronousMachineTimeConstantReactance.tppdo, "0.15000000596046448"]
[file:///R:/workspaces/create/git-svn/create-sparql/RDF/XML#_54302da0-b02c-11e3-af35-080027008896, http://iec.ch/TC57/2012/CIM-schema-cim16#SynchronousMachineTimeConstantReactance.tpdo, "0.30000001192092896"]
[file:///R:/workspaces/create/git-svn/create-sparql/RDF/XML#_54302da0-b02c-11e3-af35-080027008896, http://iec.ch/TC57/2012/CIM-schema-cim16#IdentifiedObject.name, "RoundRotor Dynamics"]
[file:///R:/workspaces/create/git-svn/create-sparql/RDF/XML#_54302da0-b02c-11e3-af35-080027008896, http://iec.ch/TC57/2012/CIM-schema-cim16#IdentifiedObject.aliasName, "GENCLS_DYN"]
[file:///R:/workspaces/create/git-svn/create-sparql/RDF/XML#_54302da0-b02c-11e3-af35-080027008896, http://www.w3.org/1999/02/22-rdf-syntax-ns#type, http://iec.ch/TC57/2012/CIM-schema-cim16#SynchronousMachineTimeConstantReactance]
which produces:

How match JAXB elements in CIM/RDF?

Trying to load a model from a CIM/XML file acording to IEC 61970 (Common Information Model, for power systems models), I found a problem;
According JAXB´s graphs between elements are provided by #XmlREF #XmlID and these both should be equals to match. But in CIM/RDF the references to a resource through an ID, i.e. rdf:resource="#_37C0E103000D40CD812C47572C31C0AD" contain the "#" character, consequently JAXB is unable to match "GeographicalRegion" vs. "SubGeographicalRegion.Region" when in the rdf:resource atribute the "#" character is present.
Here an example:
<cim:GeographicalRegion rdf:ID="_37C0E103000D40CD812C47572C31C0AD">
<cim:IdentifiedObject.name>GeoRegion</cim:IdentifiedObject.name>
<cim:IdentifiedObject.localName>OpenCIM3bus</cim:IdentifiedObject.localName>
</cim:GeographicalRegion>
<cim:SubGeographicalRegion rdf:ID="_ID_SubGeographicalRegion">
<cim:IdentifiedObject.name>SubRegion</cim:IdentifiedObject.name>
<cim:IdentifiedObject.localName>SubRegion</cim:IdentifiedObject.localName>
<cim:SubGeographicalRegion.Region rdf:resource="#_37C0E103000D40CD812C47572C31C0AD"/>
</cim:SubGeographicalRegion>
I realize you're asking for a solution using JAXB, but I would urge you to consider an RDF-based solution as it is more flexible and robust. You're basically trying to reinvent what RDF parsers already have built in. RDF/XML is a difficult format to parse, it doesn't make much sense to try and hack your own parsing together - especially since files that have very different XML structures can express exactly the same information: this only becomes apparent when looking at the level of the RDF. You may find that your JAXB parser workaround works on one CIM/RDF file but completely fails on another.
So, here's an example of how to process your file using the Sesame RDF API. No inferencing is involved, this just parses the file and puts it in an in-memory RDF model, which you can then manipulate and query from any angle.
Assuming the root element of your CIM file looks something like this:
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:cim="http://example.org/cim/">
(only a guess of course, but I need prefixes for a proper example)
Then you can do the following, using Sesame's Rio RDF/XML parser:
String baseURI = "http://example.org/my/file";
FileInputStream in = new FileInputStream("/path/to/my/cim.rdf");
Model model = Rio.parse(in, baseURI, RDFFormat.RDFXML);
This creates an in-memory RDF model of your document. You can then simply filter-query over that. For example, to print out the properties of all resources that have _37C0E103000D40CD812C47572C31C0AD as their SubGeographicalRegion.Region:
String CIM_NS = "http://example.org/cim/";
ValueFactory vf = ValueFactoryImpl.getInstance();
URI subRegion = vf.createURI(CIM_NS, "SubGeographicalRegion.Region");
URI res = vf.createURI("http://example.org/my/file#_37C0E103000D40CD812C47572C31C0AD");
Set<Resource> subs = model.filter(null, subRegion, res).subjects();
for (Resource sub: subs) {
System.out.println("resource: " + sub + " has the following properties: ");
for (URI prop: model.filter(sub, null, null).predicates()) {
System.out.println(prop + ": " + model.filter(sub, prop, null).objectValue());
}
}
Of course at this point you can also choose to convert the model to some other syntax format for further handling by your application - as you see fit. The point is that the difference between the identifiers with the leading # and without has been resolved for you by the RDF/XML parser.
This is of course personal opinion only, since I don't know the details of your use case, but I think you'll find that this is quite quick and flexible. I should also point out that although the above solution keeps the entire model in memory, you can easily adapt this to a more streaming (and therefore less memory-intensive) approach if you find your files are too big.

Write to Jena RDF model through pagination

I intend to convert the data in an SQL database into an RDF dump. I have a model and an ontology defined.
Model model = ModelFactory.createDefaultModel();
OntModel ontModel = ModelFactory.createOntologyModel(OntModelSpec.OWL_DL_MEM, model);
The ontModel has many classes defined in it. Now let us suppose I have 10000 records in my SQL db and I want to load them into the model and write it into a file. However, I want to paginate in case of a memory overflow.
int fromIndex = 0;
int toIndex = 10;
while(true) {
//1. get resources between fromIndex to toIndex from sql db
// if no more resources 'break'
//2. push these resources in model
//3. write the model to a file
RDFWriter writer = model.getWriter();
File file = new File(file_path);
FileWriter fileWriter = new FileWriter(file, true);
writer.write(this.model, fileWriter, BASE_URL);
model.close();
from = to+1;
to = to+10;
}
Now, how does one append the new resources to the existing resources in the file. Because currently I see the ontology getting written twice and it throws an exception
org.apache.jena.riot.RiotException: The markup in the document
following the root element must be well-formed.
Is there a way to handle this already?
Now, how does one append the new resources to the existing resources
in the file. Because currently I see the ontology getting written
twice and it throws an exception
A model is a set of triples. You can add more triples to the set, but that's not quite the same thing as "appending", since a model doesn't contain duplicate triples, and sets don't have a specified order, so "appending" isn't quite the right metaphor.
Jena can handle pretty big models, so you might first see whether you can just create the model and add everything to it, and then write the model to the file. While it's good to be cautious, it's not a bad idea to see whether you can do what you want without jumping through hoops.
If you do have problems with in-memory models, you might consider using a TDB backed model which will use disk for storage. You could do incremental updates to that model using the model API, or SPARQL queries, and then extract the model afterward in some serialization.
One more option, and this is probably the easiest if you really do want to append to a file, is to use a non-XML serialization of the RDF, such as Turtle or N-Triples. These are text based (N-Triples is line-based), so appending new content to a file is not a problem. This approach is described in the answer to Adding more individuals to existing RDF ontology.

Jena - how to use ontology + rdf together

I have a RDF file that I am able to read using
Model model = ModelFactory.createDefaultModel();
// use the FileManager to find the input file
InputStream in = FileManager.get().open(args[0]);
if (in == null) {
throw new IllegalArgumentException(
"File: " + args[0] + " not found");
}
// read the RDF/XML file
model.read(in, null);
I also have OWL file which contains the description of the ontology which is used for creating my models. My question is: do I need to read this file (and how?) in order to work with my RDF model correctly?
To make myself clear, I will give ou an example:
I need to know whether one resource has some relationship with other resource (for example Station1 has predicate "isResponsibleFor" Workorder1). How can I do this with Jena?
If I try to use something like resource.hasProperty(ResourceFactory.createProperty("isResponsibleFor")), it returns false (but the property is there!).
Can you direct me to some advanced tutorial on this topic perhaps? I found many tutorials on Papache site etc. but they do not provide me with the information I am looking for. Sorry if the question is not clear, I am quite new to Jena
EDIT: currently, I am searching whether my model contains given statement using this:
public static boolean containsStatement(Model model, String sub,
String pred, String obj) {
// list the statements in the Model
StmtIterator iter = model.listStatements();
// print out the predicate, subject and object of each statement
while (iter.hasNext()) {
Statement stmt = iter.nextStatement(); // get next statement
Resource subject = stmt.getSubject(); // get the subject
Property predicate = stmt.getPredicate(); // get the predicate
RDFNode object = stmt.getObject(); // get the object
if (subject.toString().contains(sub)
&& predicate.toString().contains(pred)
&& object.toString().contains(obj)) {
return true;
}
}
return false;
}
but I am pretty sure that this is highly ineffective approach.. could you suggest me something more elegant and fast? Thanks!
Short answer: no, you don't need the ontology to work with your RDF file, but in many cases it can help your application.
First, you can shorten loading your file:
Model model = FileManager.get().loadModel( args[0] );
Now, in order to work with the relationship between resources, as given by the URI of the property connecting the subject resource to the object, you need the full URI of the predicate. Typically, this will be something like http://example.com/foo#isResponsibleFor. If you just use the short-name of predicate, it won't work - which is what you are finding.
You don't show any examples of your actual RDF data, so I'm going to use a fake namespace. Use your actual namespace in your code. In the meantime:
String NS = "http://example.com/example#";
Property isResponsibleFor = model.getProperty( NS + "isResponsibleFor" );
Resource station = model.getResource( NS + "station1" );
for (StmtIterator i = station.listProperties( isResponsibleFor ); i.hasNext(); ) {
Statement s = i.next();
Resource workorder = s.getResource();
// now you can do something with the work-order resource
}
In your code, you had:
public static boolean containsStatement(Model model, String sub, String pred, String obj)
There are a number of things wrong here. First, it's better if you can write your code in a more object-oriented style, which tends not to use static methods if that can be avoided. Second, don't use strings when you refer to things in a model. Jena has the Resource class to denote resources in a model, and many other RDF-specific classes as well. Use strings for handling input from your user, otherwise convert strings to resources or other RDF objects as soon as you can. Thirdly, I'd advise against exposing the details of your representation via your object's API. containsStatement makes it clear in the API that you are using RDF triples; that's not a detail that callers of the API need to know and it breaks encapsulation. A better API would have methods such as listWorkItems - that relates to the domain, and hides details of the implementation.
Regarding the use of your ontology, there are two specific ways your application can benefit from using your ontology. First, you can automatically generate statements such as:
Property isResponsibleFor = model.getProperty( NS + "isResponsibleFor" );
by using Jena's schemagen vocabulary generator tool. You can use schemagen as part of your build process to ensure that your vocabulary class automatically stays up-to-date as your ontology changes.
Secondly, by using Jena's inference engines, you can use the ontology to infer additional statements about your domain. For example, suppose you have class WidgetOrder, which is a type of WorkItem. Without inference, and without your ontology, if you ask the model object to list all of the WorkItems, it won't list the WidgetOrder resources. However, with the ontology and the reasoner, listing resources of type WorkItem will also return the resources that only have a declared type of WidgetOrder, because the other types can be inferred.

Access fields of returned Java Object

I made a simple client call to the XML-RPC WordPress API/Posts using a xml-rpc client and according to their documentation here it returns a struct. How can i access the return values.
Here is a look at my code:
XmlRpcClient client = new XmlRpcClient("http://www.mywebsite.net/xmlrpc.php", false);
String[] fields = new String[4];
fields[0] = "post_id";
fields[1] = "post_title";
fields[2] = "post_date";
fields[3] = "post_content";
Object token = client.invoke("wp.getPost", new Object[]{"0","myusername", "mypassword", 1545, fields });
System.out.println(token);
When I print use
System.out.println(token);
I get the following out put:
{item_one=I am item number one, item_two=I am Item two...}
How can I access item_one and item_two?
There's a bit of information missing (what's the fully qualified name of XmlRpcClient?), but assuming that client.invoke actually returns just an Object and not something more specific that has accessor methods, you can parse the response using something like this:
Object token = client.invoke("wp.getPost", new Object[]{"0","myusername", "mypassword", 1545, fields });
String[] items = token.toString().split(",");
for (String item : items) {
String[] parts = item.split("=");
String key = parts[0];
String value = parts[1];
// do stuff with your keys and values here
}
Of course this isn't perfect code -- you may need to check for nulls, use String.trim(), etc, but it should get you started.
You don't have a true Java representation of the data returned, in that you don't have an object on which you can access
token.item_one
rather you have a string containing a representation of a set - that is something that (in concept) from which you could retrieve an value by its name
token.get("item_one")
This string format is probably JSON, which pretty much looks like JavaScript, and hence can represent quite complex data. In general you can have arrays of objects and objects containing objects (for example, a Customer might contain an Address object)
So you have two possibilities:
1). parse the string into a true Java representation such as one of the standard Java collection classes. You then use the get-by-name style I show above.
2). define a Java class that mimics the structure of the data and then parse the string to fill out such an object, you can then use the "dot" form of access - you really have a Java Object representing the data.
In the first case there are suitable libraries such as quickJson
For the second you can use implementations of standards such as JAX/B, which tends to be more work as you may need to construct the target Java Class by hand. Enterprise Java runtimes will give you these facilities and perhaps tooling to help, or look at implementaitons such as Jackson. You will see that JAX/B hada focus on mapping from XML to Java, but tutorials such as this show how to work with JSON instead.
My guess is that the first option, simple parsing to a collection may be enough for you.

Categories