Read restriction values using Jena - java

I have an object restriction defined as follows:
hasYear some integer[minLength 2, maxLength 4, >=1995, <=2012]
How can I read the individual values defined in the restriction using Jena?

There are several approaches. First, you can traverse the Jena Model with the following code:
model.read(...);
StmtIterator si = model.listStatements(
model.getResource("required property uri"), RDFS.range, (RDFNode) null);
while (si.hasNext()) {
Statement stmt = si.next();
Resource range = stmt.getObject().asResource();
// get restrictions collection
Resource nextNode = range.getPropertyResourceValue(OWL2.withRestrictions);
// guard: the range may have no owl:withRestrictions list at all
while (nextNode != null) {
Resource restr = nextNode.getPropertyResourceValue(RDF.first);
if (restr == null)
break;
StmtIterator pi = restr.listProperties();
while (pi.hasNext()) {
Statement restrStmt = pi.next();
Property restrType = restrStmt.getPredicate();
Literal value = restrStmt.getObject().asLiteral();
// print type and value for each restriction
System.out.println(restrType + " = " + value);
}
// go to the next element of collection
nextNode = nextNode.getPropertyResourceValue(RDF.rest);
}
}
If you use the OntModel representation of the RDF graph, the code can be simplified by using
model.listRestrictions()
ontClass.asRestriction()
etc.
A good example of this approach (thanks to Ian Dickinson)
Another way is to use a SPARQL 1.1 query with the same meaning:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?datatype ?restr_type ?restr_value {
?prop rdfs:range ?range.
?range owl:onDatatype ?datatype;
owl:withRestrictions ?restr_list.
?restr_list rdf:rest*/rdf:first ?restr.
?restr ?restr_type ?restr_value
}

Related

Java XML JDOM2 XPath - Read text value from XML attribute and element using XPath expression

The program should be allowed to read from an XML file using XPath expressions.
I already started the project using JDOM2, switching to another API is unwanted.
The difficulty is, that the program does not know beforehand if it has to read an element or an attribute.
Does the API provide any function to receive the content (string) just by giving it the XPath expression?
From what I know about XPath in JDOM2, it uses objects of different types to evaluate XPath expressions pointing to attributes or elements.
I am only interested in the content of the attribute / element where the XPath expression points to.
Here is an example XML file:
<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
<book category="COOKING">
<title lang="en">Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>
</book>
<book category="CHILDREN">
<title lang="en">Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
<book category="WEB">
<title lang="en">XQuery Kick Start</title>
<author>James McGovern</author>
<author>Per Bothner</author>
<author>Kurt Cagle</author>
<author>James Linn</author>
<author>Vaidyanathan Nagarajan</author>
<year>2003</year>
<price>49.99</price>
</book>
<book category="WEB">
<title lang="en">Learning XML</title>
<author>Erik T. Ray</author>
<year>2003</year>
<price>39.95</price>
</book>
</bookstore>
This is what my program looks like:
package exampleprojectgroup;
import java.io.IOException;
import java.util.LinkedList;
import java.util.List;
import org.jdom2.Attribute;
import org.jdom2.Document;
import org.jdom2.Element;
import org.jdom2.JDOMException;
import org.jdom2.filter.Filters;
import org.jdom2.input.SAXBuilder;
import org.jdom2.input.sax.XMLReaders;
import org.jdom2.xpath.XPathExpression;
import org.jdom2.xpath.XPathFactory;
public class ElementAttribute2String
{
ElementAttribute2String()
{
run();
}
public void run()
{
final String PATH_TO_FILE = "c:\\readme.xml";
/* It is essential that the program has to work with a variable amount of XPath expressions. */
LinkedList<String> xPathExpressions = new LinkedList<>();
/* Simulate user input.
* First XPath expression points to attribute,
* second one points to element.
* Many more expressions follow in a real situation.
*/
xPathExpressions.add( "/bookstore/book/@category" );
xPathExpressions.add( "/bookstore/book/price" );
/* One list should be sufficient to store the result. */
List<Element> elementsResult = null;
List<Attribute> attributesResult = null;
List<Object> objectsResult = null;
try
{
SAXBuilder saxBuilder = new SAXBuilder( XMLReaders.NONVALIDATING );
Document document = saxBuilder.build( PATH_TO_FILE );
XPathFactory xPathFactory = XPathFactory.instance();
int i = 0;
for ( String string : xPathExpressions )
{
/* Works only for elements, uncomment to give it a try. */
// XPathExpression<Element> xPathToElement = xPathFactory.compile( xPathExpressions.get( i ), Filters.element() );
// elementsResult = xPathToElement.evaluate( document );
// for ( Element element : elementsResult )
// {
// System.out.println( "Content of " + string + ": " + element.getText() );
// }
/* Works only for attributes, uncomment to give it a try. */
// XPathExpression<Attribute> xPathToAttribute = xPathFactory.compile( xPathExpressions.get( i ), Filters.attribute() );
// attributesResult = xPathToAttribute.evaluate( document );
// for ( Attribute attribute : attributesResult )
// {
// System.out.println( "Content of " + string + ": " + attribute.getValue() );
// }
/* I want to receive the content of the XPath expression as a string
* without having to know if it is an attribute or element beforehand.
*/
XPathExpression<Object> xPathExpression = xPathFactory.compile( xPathExpressions.get( i ) );
objectsResult = xPathExpression.evaluate( document );
for ( Object object : objectsResult )
{
if ( object instanceof Attribute )
{
System.out.println( "Content of " + string + ": " + ((Attribute)object).getValue() );
}
else if ( object instanceof Element )
{
System.out.println( "Content of " + string + ": " + ((Element)object).getText() );
}
}
i++;
}
}
catch ( IOException ioException )
{
ioException.printStackTrace();
}
catch ( JDOMException jdomException )
{
jdomException.printStackTrace();
}
}
}
Another thought is to search for the '@' character in the XPath expression, to determine if it is pointing to an attribute or element.
This gives me the desired result, though I wish there was a more elegant solution.
Does the JDOM2 API provide anything useful for this problem?
Could the code be redesigned to meet my requirements?
Thank you in advance!
XPath expressions are hard to type/cast because they need to be compiled in a system that is sensitive to the return type of the XPath functions/values that are in the expression. JDOM relies on third-party code to do that, and that third party code does not have a mechanism to correlate those types at your JDOM code's compile time. Note that XPath expressions can return a number of different types of content, including String, boolean, Number, and Node-List-like content.
In most cases, the XPath expression return type is known before the expression is evaluated, and the programmer has the "right" casting/expectations for processing the results.
In your case, you don't, and the expression is more dynamic.
I recommend that you declare a helper function to process the content:
private static String extractValue(Object source) {
if (source instanceof Attribute) {
return ((Attribute)source).getValue();
}
if (source instanceof Content) {
return ((Content)source).getValue();
}
return String.valueOf(source);
}
This at least will neaten up your code, and if you use Java8 streams, can be quite compact:
List<String> values = xPathExpression.evaluate( document )
.stream()
.map(o -> extractValue(o))
.collect(Collectors.toList());
Note that the XPath spec for Element nodes is that the string-value is the concatenation of the Element's text() content as well as all child elements' content. Thus, in the following XML snippet:
<a>bilbo <b>samwise</b> frodo</a>
the getValue() on the a element will return bilbo samwise frodo, but the getText() will return bilbo frodo. Choose which mechanism you use for the value extraction carefully.
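The string-value rule described above can be checked with a small sketch. Note the assumptions: this uses the JDK's own javax.xml.xpath rather than JDOM2, and the class and method names (XPathStringValueDemo, stringValue) are made up for illustration. With this API, asking for a string result returns the string-value of the first matched node, whether it is an attribute or an element.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper class; uses the JDK XPath API, not JDOM2.
final class XPathStringValueDemo {

    /** Returns the XPath string-value of the first node matched by expr. */
    static String stringValue(String xml, String expr) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            // evaluate(String, InputSource) converts the result to a String
            return xpath.evaluate(expr, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Cannot evaluate " + expr, e);
        }
    }

    public static void main(String[] args) {
        String hobbits = "<a>bilbo <b>samwise</b> frodo</a>";
        // Element string-value includes the text of descendant elements
        System.out.println(stringValue(hobbits, "/a"));

        String books = "<bookstore><book category=\"COOKING\">"
                + "<price>30.00</price></book></bookstore>";
        // The same call works for an attribute and for an element
        System.out.println(stringValue(books, "/bookstore/book/@category"));
        System.out.println(stringValue(books, "/bookstore/book/price"));
    }
}
```

This only returns the first match, so it is not a drop-in replacement for iterating over all results as the JDOM2 code in the question does.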
I had the exact same problem and took the approach of recognizing when an attribute is the focus of the XPath. I solved it with two functions. The first compiles the XPathExpression for later use:
XPathExpression xpExpression;
if (xpath.matches( ".*/@[\\w]++$")) {
// must be an attribute value we're after..
xpExpression = xpfac.compile(xpath, Filters.attribute(), null, myNSpace);
} else {
xpExpression = xpfac.compile(xpath, Filters.element(), null, myNSpace);
}
The second evaluates and returns a value:
Object target = xpExpression.evaluateFirst(baseEl);
if (target != null) {
String value = null;
if (target instanceof Element) {
Element targetEl = (Element) target;
value = targetEl.getTextNormalize();
} else if (target instanceof Attribute) {
Attribute targetAt = (Attribute) target;
value = targetAt.getValue();
}
}
I suspect it's a matter of coding style whether you prefer the helper function suggested in the previous answer or this approach. Either will work.

Basic RDFS inferencing with the Jena API

I'm currently following the Jena API inferencing tutorial:
https://jena.apache.org/documentation/inference/
and as an exercise to test my understanding, I'd like to rewrite the first example, which demonstrates a trivial RDFS reasoning from a programmatically built model:
import com.hp.hpl.jena.rdf.model.*;
import com.hp.hpl.jena.vocabulary.*;
public class Test1 {
static public void main(String...argv) {
String NS = "foo:";
Model m = ModelFactory.createDefaultModel();
Property p = m.createProperty(NS, "p");
Property q = m.createProperty(NS, "q");
m.add(p, RDFS.subPropertyOf, q);
m.createResource(NS + "x").addProperty(p, "bar");
InfModel im = ModelFactory.createRDFSModel(m);
Resource x = im.getResource(NS + "x");
// verify that property q of x is "bar" (which follows
// from x having property p, and p being a subproperty of q)
System.out.println("Statement: " + x.getProperty(q));
}
}
to something which does the same, but with the model read from this Turtle file instead (which is my own translation of the above, and thus might be buggy):
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix foo: <http://example.org/foo#>.
foo:p a rdf:Property.
foo:q a rdf:Property.
foo:p rdfs:subPropertyOf foo:q.
foo:x foo:p "bar".
with this code:
public class Test2 {
static public void main(String...argv) {
String NS = "foo:";
Model m = ModelFactory.createDefaultModel();
m.read("foo.ttl");
InfModel im = ModelFactory.createRDFSModel(m);
Property q = im.getProperty(NS + "q");
Resource x = im.getResource(NS + "x");
System.out.println("Statement: " + x.getProperty(q));
}
}
which doesn't seem to be the right approach (I suspect in particular that my extraction of the q property is somehow not right). What am I doing wrong?
String NS = "foo:";
m.createResource(NS + "x")
creates the URI foo:x, but in the Turtle version foo:x expands to http://example.org/foo#x, so the two names do not match.
You can see the difference by printing the model: im.write(System.out, "TTL");
To fix it, change NS = "foo:" to NS = "http://example.org/foo#"

What is the easy and best way to read below xml?

I am new to XML.
Please let me know the easiest and best way to read the XML below in Java. In my XML, queries is the root element and query is its child element.
<queries>
<query id="getUserByName">
select * from users where name=?
</query>
<query id="getUserByEmail">
select * from users where email=?
</query>
</queries>
I will pass a query id, and based on that I need to fetch the corresponding query. Please help me with code for better understanding.
With XPath, it's simple.
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;
public class Test {
public static final String xml =
"<queries>"
+ " <query id=\"getUserByName\">"
+ " select * from users where name=?"
+ " </query>"
+ " <query id=\"getUserByEmail\">"
+ " select * from users where email=?"
+ " </query>"
+ "</queries>";
public static void main(String[] args) throws Exception {
System.out.println(getQuery("getUserByName"));
System.out.println(getQuery("getUserByEmail"));
}
public static String getQuery (String id) throws Exception {
InputStream is = new ByteArrayInputStream(xml.getBytes("UTF8"));
InputSource inputSource = new InputSource(is);
XPath xpath = XPathFactory.newInstance().newXPath();
return xpath.evaluate("/queries/query[@id='" + id +"']", inputSource);
}
}
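The code above builds the expression by concatenating the id into the XPath string, which breaks if the id ever contains a quote. A possible variant, sketched here as an assumption rather than as part of the original answer, binds the id through an XPath variable and trims the surrounding whitespace with normalize-space(). The class name QueryLookup and the variable name $id are illustrative.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical variant of the lookup above using an XPath variable.
final class QueryLookup {

    static final String XML =
            "<queries>"
            + " <query id=\"getUserByName\">"
            + " select * from users where name=?"
            + " </query>"
            + " <query id=\"getUserByEmail\">"
            + " select * from users where email=?"
            + " </query>"
            + "</queries>";

    /** Returns the trimmed query text for the id, or "" if there is no match. */
    static String getQuery(String id) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Resolve the $id variable to the caller's value; must be set before evaluating
        xpath.setXPathVariableResolver(
                name -> "id".equals(name.getLocalPart()) ? id : null);
        try {
            return xpath.evaluate("normalize-space(/queries/query[@id=$id])",
                    new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(getQuery("getUserByName"));
        System.out.println(getQuery("getUserByEmail"));
    }
}
```

Note that normalize-space() also collapses runs of whitespace inside the query text, which is harmless for SQL like this but may not suit every payload.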
Another easy option is the JAXB parser. Personally I love this one, as it sets everything up using simple annotations.
Steps:
Create a couple of bean classes that mirror the structure of your XML. In your case, a Queries class containing a List<Query>. Define Query to contain a String variable. If you take the time to go through the annotations, I'm sure you can do this even with a single bean class and a few more annotations.
Pass your XML string to a JAXB context for the Queries class and you are done.
You'll get one Java object for each query tag. Once you have the bean objects, manipulation becomes easy.
Ref:
JAXB Hello World Example
JAXB Tutorial
A good solution would be to load these queries to a map first and access it later based on the map. To load queries to a map you could do something like:
Map<String, String> queriesMap = new HashMap<String, String>();
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
ByteArrayInputStream inputStream = new ByteArrayInputStream("<queries> <query id=\"getUserByName\"> select * from users where name=? </query> <query id=\"getUserByEmail\"> select * from users where email=? </query></queries>".getBytes());
// you could use something like: new FileInputStream("queries.xml");
Document doc = documentBuilder.parse(inputStream);
// get queries elements
NodeList queriesNodes = doc.getElementsByTagName("queries");
// iterate over it
for (int i = 0; i < queriesNodes.getLength(); i++) {
// get queries element
Node node = queriesNodes.item(i);
// get query elements (theoretically)
NodeList queryNodes = node.getChildNodes();
for (int j = 0; j < queryNodes.getLength(); j++) {
Node queryNode = queryNodes.item(j);
// if not element just skip to next one (in case of text nodes for the white spaces)
if (!(queryNode.getNodeType() == Node.ELEMENT_NODE)) {
continue;
}
// get query
Node idAttr = queryNode.getAttributes().getNamedItem("id");
if (idAttr != null) {
// String.trim() avoids the extra Apache Commons Lang dependency that StringUtils.trim needs
queriesMap.put(idAttr.getTextContent(), queryNode.getTextContent().trim());
}
}
}
System.out.println(queriesMap);
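The same map-loading idea can be written more compactly. The following is a sketch under a few assumptions: it uses only JDK classes, it asks the document directly for every query element instead of walking the children of queries, and the class and method names (QueryMapLoader, load) are invented for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical loader: parses the XML once and keeps id -> query text in a map.
final class QueryMapLoader {

    static Map<String, String> load(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            // getElementsByTagName returns only elements, so no node-type check is needed
            NodeList nodes = doc.getElementsByTagName("query");
            Map<String, String> queries = new LinkedHashMap<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element query = (Element) nodes.item(i);
                if (query.hasAttribute("id")) {
                    queries.put(query.getAttribute("id"), query.getTextContent().trim());
                }
            }
            return queries;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse queries XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<queries>"
                + " <query id=\"getUserByName\"> select * from users where name=? </query>"
                + " <query id=\"getUserByEmail\"> select * from users where email=? </query>"
                + "</queries>";
        System.out.println(load(xml).get("getUserByName"));
    }
}
```

Loading once and looking up by key afterwards avoids re-parsing the XML on every call, which is the main advantage of the map approach over a per-call XPath evaluation.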

Lucene performance: Transferring fields data from one index to another

In short, I need to invert the mapping between fields and values of one index and write the result to a second index.
The following is the scenario.
Index 1 Structure
[Field => Values] [Stored]
Doc 1
keys => keyword1;
Ids => id1, id1, id2, id3, id7, id11, etc..
Doc 2
keys => keyword2;
Ids => id3, id11, etc..
Index 2 Structure
[Field => Values] [Stored]
Doc 1
ids => id1
keys => keyword1, keyword1
Doc 3
ids => id3
keys => keyword1, keyword2, etc..
Please note that the keys<->ids mapping is reversed in the resulting Index.
What do you think is the most effective way to accomplish this in terms of time complexity?
The only way I could think of is:
1) index1Reader.terms();
2) Process only terms belonging to "Ids" field
3) For each term, get TermDocs
4) For each doc, load it, get "keys" field info
5) Create a new Lucene Doc, add 'Id', multi Keys, write it to index2.
6) Go to step 2.
Since the fields are stored, I'm sure that there are multiple ways of doing it.
Please guide me with any performance techniques. Even the slightest improvement will have a huge impact in my scenario considering that the Index1 size is ~ 6GB.
Total no. of unique keywords: 18 million;
Total no. of unique ids: 0.9 million
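Leaving Lucene aside, the transformation itself is a plain inversion of a one-to-many mapping. The sketch below shows it with in-memory collections only. It is an illustration under assumptions, not the code from this post: the names (MappingInverter, invert) are hypothetical, duplicate pairs are collapsed by using a set, and holding everything in memory may not be feasible at the sizes quoted above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical in-memory illustration of inverting key -> ids into id -> keys.
final class MappingInverter {

    static Map<String, Set<String>> invert(Map<String, List<String>> idsByKey) {
        Map<String, Set<String>> keysById = new TreeMap<>();
        for (Map.Entry<String, List<String>> entry : idsByKey.entrySet()) {
            for (String id : entry.getValue()) {
                // One pass over every (key, id) pair; a set drops repeated pairs
                keysById.computeIfAbsent(id, k -> new TreeSet<>()).add(entry.getKey());
            }
        }
        return keysById;
    }

    public static void main(String[] args) {
        Map<String, List<String>> index1 = new HashMap<>();
        index1.put("keyword1", Arrays.asList("id1", "id1", "id2", "id3"));
        index1.put("keyword2", Arrays.asList("id3", "id11"));
        // Print each id with the keywords that referenced it
        invert(index1).forEach((id, keys) -> System.out.println(id + " => " + keys));
    }
}
```

The point of the sketch is that a single pass over the key documents touches each pair once, whereas the term-driven steps above load a stored document for every (id, doc) posting.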
Interesting UPDATE
Optimization 1
While adding a new doc, instead of creating multiple duplicate 'Field' objects, building a single StringBuffer with a " " delimiter and then adding the whole string as a single Field seems to give up to a 25% improvement.
UPDATE 2: Code
public void go() throws IOException, ParseException {
String id = null;
int counter = 0;
while ((id = getNextId()) != null) { // this method is not taking time..
System.out.println("Node id: " + id);
updateIndex2DataForId(id);
if(++counter > 10){
break;
}
}
index2Writer.close();
}
private void updateIndex2DataForId(String id) throws ParseException, IOException {
// Get all terms containing the node id
TermDocs termDocs = index1Reader.termDocs(new Term("id", id));
// Iterate
Document doc = new Document();
doc.add(new Field("id", id, Store.YES, Index.NOT_ANALYZED));
int docId = -1;
while (termDocs.next()) {
docId = termDocs.doc();
doc.add(getKeyDataAsField(docId, Store.YES, Index.NOT_ANALYZED));
}
index2Writer.addDocument(doc);
}
private Field getKeyDataAsField(int docId, Store storeOption, Index indexOption) throws CorruptIndexException,
IOException {
Document doc = index1Reader.document(docId, fieldSelector); // fieldSel has "key"
Field f = new Field("key", doc.get("key"), storeOption, indexOption);
return f;
}
Usage of FieldCache worked like a charm... But, we need to allot more and more RAM to accommodate all the fields on the heap.
I've updated the above updateIndex2DataForId() with the following snippet..
private void updateIndex2DataForId(String id) throws ParseException, IOException {
// Get all terms containing the node id
TermDocs termDocs = index1Reader.termDocs(new Term("id", id));
// Iterate
Document doc = new Document();
doc.add(new Field("id", id, Store.YES, Index.NOT_ANALYZED));
int docId = -1;
StringBuffer buffer = new StringBuffer();
while (termDocs.next()) {
docId = termDocs.doc();
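buffer.append(keys[docId]).append(' '); // keys[] is pre-populated using FieldCache
}
// StringBuffer has no trim(); convert to String first. The field is named "key", as in the earlier version.
doc.add(new Field("key", buffer.toString().trim(), Store.YES, Index.ANALYZED));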
index2Writer.addDocument(doc);
}
String[] keys = FieldCache.DEFAULT.getStrings(index1Reader, "keywords");
It made everything faster. I cannot give exact metrics, but the improvement is very substantial.
Now the program completes in a reasonable time. Any further guidance is still highly appreciated.

How to find min-max occurrence of an element in xsd using xsom

I want to find the minimum and maximum occurrence (minOccurs/maxOccurs) of an XSD element using XSOM in Java. I have this code, which finds the complex types. Can anyone help me find the occurrence of every XSD element? At least give me a code snippet showing the class and method to use to find the occurrence.
String xmlfile = "Calendar.xsd";
XSOMParser parser = new XSOMParser();
parser.parse(new File(xmlfile));
XSSchemaSet sset = parser.getResult();
XSSchema s = sset.getSchema(1);
if (s.getTargetNamespace().equals("")) // this is the ns with all the stuff
// in
{
// try ElementDecls
Iterator jtr = s.iterateElementDecls();
while (jtr.hasNext())
{
XSElementDecl e = (XSElementDecl) jtr.next();
System.out.print("got ElementDecls " + e.getName());
// ok we've got a CALENDAR.. what next?
// not this anyway
/*
*
* XSParticle[] particles = e.asElementDecl() for (final XSParticle p :
* particles) { final XSTerm pterm = p.getTerm(); if
* (pterm.isElementDecl()) { final XSElementDecl ed =
* pterm.asElementDecl(); System.out.println(ed.getName()); }
*/
}
// try all Complex Types in schema
Iterator<XSComplexType> ctiter = s.iterateComplexTypes();
while (ctiter.hasNext())
{
// this will be a eSTATUS. Lets type and get the extension to
// see its a ENUM
XSComplexType ct = (XSComplexType) ctiter.next();
String typeName = ct.getName();
System.out.println(typeName + newline);
// as Content
XSContentType content = ct.getContentType();
// now what?
// as Partacle?
XSParticle p2 = content.asParticle();
if (null != p2)
{
System.out.print("We got partical thing !" + newline);
// might would be good if we got here but we never do :-(
}
// try complex type Element Decs
List<XSElementDecl> el = ct.getElementDecls();
for (XSElementDecl ed : el)
{
System.out.print("We got ElementDecl !" + ed.getName() + newline);
// would be good if we got here but we never do :-(
}
Collection<? extends XSAttributeUse> c = ct.getAttributeUses();
Iterator<? extends XSAttributeUse> i = c.iterator();
while (i.hasNext())
{
XSAttributeDecl attributeDecl = i.next().getDecl();
System.out.println("type: " + attributeDecl.getType());
System.out.println("name:" + attributeDecl.getName());
}
}
}
Assuming you are referring to com.sun.xml.xsom, occurrence is a property of a particle (elements are not the only particles).
The relevant APIs are XSParticle.getMaxOccurs() and XSParticle.getMinOccurs().
For one source showing how to traverse a schema tree using XSOM, please take a look here. It shows basically how the visitor pattern works with XSOM (for which Sun built a package).
