Issue with binding values from sub selection in Jena ARQ

Issue with binding values from sub selection in Jena ARQ - java

I want to run the following simple testing query:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX vcard: <http://www.w3.org/2001/vcard-rdf/3.0#>
SELECT ?givenName ?name_count ?temp
WHERE
{ BIND(if(( ?name_count = 2 ), "just two", "definitely not 2") AS ?temp)
{ SELECT DISTINCT ?givenName (COUNT(?givenName) AS ?name_count)
WHERE
{ ?y vcard:Family ?givenName }
GROUP BY ?givenName
}
}
The graph I am querying is this from the tutorial https://jena.apache.org/tutorials/sparql_data.html:
#prefix vCard: <http://www.w3.org/2001/vcard-rdf/3.0#> .
#prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
<http://somewhere/MattJones/> vCard:FN "Matt Jones" .
<http://somewhere/MattJones/> vCard:N _:b0 .
_:b0 vCard:Family "Jones" .
_:b0 vCard:Given "Matthew" .
<http://somewhere/RebeccaSmith/> vCard:FN "Becky Smith" .
<http://somewhere/RebeccaSmith/> vCard:N _:b1 .
_:b1 vCard:Family "Smith" .
_:b1 vCard:Given "Rebecca" .
<http://somewhere/JohnSmith/> vCard:FN "John Smith" .
<http://somewhere/JohnSmith/> vCard:N _:b2 .
_:b2 vCard:Family "Smith" .
_:b2 vCard:Given "John" .
<http://somewhere/SarahJones/> vCard:FN "Sarah Jones" .
<http://somewhere/SarahJones/> vCard:N _:b3 .
_:b3 vCard:Family "Jones" .
_:b3 vCard:Given "Sarah" .
Now the problem is that running it with Jena:
Query query = QueryFactory.create(theAboveQueryAsString);
QueryExecution qexec = QueryExecutionFactory.create(query, theAboveGraphmodel);
ResultSet execSel = qexec.execSelect();
ResultSetRewindable results = ResultSetFactory.copyResults(execSel);;
ResultSetFormatter.out(System.out, results, query);
gives off this result in console:
----------------------------------
| givenName | name_count | temp |
==================================
| "Smith" | 2 | |
| "Jones" | 2 | |
----------------------------------
having the temp values as null.
On the other hand running the same query on the the same graph in Ontotext GraphDb enviroment i get the correct result (saved as CSV):
givenName | name_count | temp
------------------------------------
Jones | 2 | just two
Smith | 2 | just two
Could there be a bug in the ARQ engine or am I missing something?
Thanks in advance.
I am using jena-arq 3.12.0
Java(TM) SE Runtime Environment (build 1.8.0_181-b13)
Eclipse Version Version: 2019-06 (4.12.0)

There is a join between BIND and sub-select. The arguments to the join step are calculated before the join is done. So the BIND is evaluated, the sub-select is evaluated separately and the results joined. ?name_count isn't set in the BIND assignment. If you move it after the sub-select, it will apply to the results of the sub-select.
BIND adds a binding to the result of the pattern before it.
(base <http://example/base/>
(prefix ((rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>)
(vcard: <http://www.w3.org/2001/vcard-rdf/3.0#>))
(project (?givenName ?name_count ?temp)
(join
(extend ((?temp (if (= ?name_count 2) "just two" "definitely not 2")))
(table unit))
(distinct
(project (?givenName ?name_count)
(extend ((?name_count ?.0))
(group (?givenName) ((?.0 (count ?givenName)))
(bgp (triple ?y vcard:Family ?givenName))))))))))
Here, the (extend...) is one of two argument to the (join...). (table unit) is the "nothing" before the BIND.
If put afterwards, the algebra is:
(base <http://example/base/>
(prefix ((rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>)
(vcard: <http://www.w3.org/2001/vcard-rdf/3.0#>))
(project (?givenName ?name_count ?temp)
(extend ((?temp (if (= ?name_count 2) "just two" "definitely not 2")))
(distinct
(project (?givenName ?name_count)
(extend ((?name_count ?.0))
(group (?givenName) ((?.0 (count ?givenName)))
(bgp (triple ?y vcard:Family ?givenName))))))))))
and the extend (which is from the syntax BIND) is working on the (distinct ... of the sub-query.

Related

The function "org.apache.jena.rdf.model.hasProperty" didn't work

I'm learning jena recently.
I try to understand their TUTORIALS. (https://jena.apache.org/tutorials/rdf_api.html#ch-Navigating-a-Model).
When I compile Tutorial06, (https://github.com/apache/jena/blob/main/jena-core/src-examples/jena/examples/rdf/Tutorial06.java)
I put a "/" in the end of the URI by accident in line 32:
static final String johnSmithURI = "http://somewhere/JohnSmith/";
(it should be: static final String johnSmithURI = "http://somewhere/JohnSmith";)
so I got the exception.
I want to use "org.apache.jena.rdf.model.hasProperty" to set a condition,
but it didn't work.
There might be two situation:
if I directly copy and paste "vcard.hasProperty(VCARD.Family)"
the console will show "The method hasProperty(Property) is undefined for the type Resource"
it's strange
this method is defined in their doc
https://jena.apache.org/documentation/javadoc/jena/org/apache/jena/rdf/model/Resource.html#hasProperty(org.apache.jena.rdf.model.Property)
However, if I just choose the function after I keyed in "." like the picture
enter image description here
It won't show the error
but the value of Boolean seems strange
I add those code since line 51 in Tutorial06:
System.out.println("vcard.hasProperty(VCARD.FN) : " + vcard.hasProperty(VCARD.FN)) ; System.out.println("vcard.hasProperty(VCARD.N) : " + vcard.hasProperty(VCARD.N)) ; System.out.println("vcard.hasProperty(VCARD.Family) : " + vcard.hasProperty(VCARD.Family)) ; System.out.println("vcard.hasProperty(VCARD.Given) : " + vcard.hasProperty(VCARD.Given)) ; System.out.println("vcard.hasProperty(VCARD.EMAIL) : " + vcard.hasProperty(VCARD.EMAIL)) ;
Result:
vcard.hasProperty(VCARD.FN) : true
vcard.hasProperty(VCARD.N) : true
vcard.hasProperty(VCARD.Family) : false
vcard.hasProperty(VCARD.Given) : false
vcard.hasProperty(VCARD.EMAIL) : false
the rdf file look like this:
<rdf:Description rdf:about="http://somewhere/JohnSmith">
<vCard:FN>John Smith</vCard:FN>
<vCard:N rdf:parseType="Resource">
<vCard:Family>Smith</vCard:Family>
<vCard:Given>John</vCard:Given>
</vCard:N>
</rdf:Description>
I'll appreciate if anyone can give me some idea about those situation.
I use Eclipse with jdk-11 and the jena-core -3.2.0.jar as my library

The "vCard:Family" has a different subject because of the <vCard:N rdf:parseType="Resource">.
The RDF structure is clearer in Turtle or N-Triples:
#prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
#prefix vCard: <http://www.w3.org/2001/vcard-rdf/3.0#> .
<http://somewhere/JohnSmith>
vCard:FN "John Smith" ;
vCard:N [ vCard:Family "Smith" ;
vCard:Given "John"
] .
or equivalently:
#prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
#prefix vCard: <http://www.w3.org/2001/vcard-rdf/3.0#> .
<http://somewhere/JohnSmith>
vCard:FN "John Smith" ;
vCard:N _:b0 .
_:b0 vCard:Family "Smith" ;
vCard:Given "John" .
or in N-Triples:
<http://somewhere/JohnSmith> <http://www.w3.org/2001/vcard-rdf/3.0#FN> "John Smith" .
<http://somewhere/JohnSmith> <http://www.w3.org/2001/vcard-rdf/3.0#N> _:B99dd1c9aX2D9045X2D4719X2D8d7eX2D9f4043b9b757 .
_:B99dd1c9aX2D9045X2D4719X2D8d7eX2D9f4043b9b757 <http://www.w3.org/2001/vcard-rdf/3.0#Family> "Smith" .
_:B99dd1c9aX2D9045X2D4719X2D8d7eX2D9f4043b9b757 <http://www.w3.org/2001/vcard-rdf/3.0#Given> "John" .

Dynamic Array in RDF/XML

In c# I am generating a graph and am using RDF/XML to send it to my android application. Part of my graph is supposed to show different trains that arrive at a certain train station.
Adding a train, its arrival, departure, destination and so on is no problem. Unfortunately I don't know how to get all stations the different trains pass on their way in the RDF/XML.
For example:
Train A starts in city 1, passes city 2 & 3, and arrives in city 4.
Train B starts in city 1, passes city 5 & 6 & 7, and arrives in city 8.
How can I dynamically add arrays like [city 5, city 6, city 7] to a train?
My current RDF/XML:
<?xml version="1.0" encoding="utf-16"?>" +
"<!DOCTYPE rdf:RDF [\n" +
"\t<!ENTITY rdf 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'>\n" +
"]>\n" +
"<rdf:RDF xmlns:rdfs=\"http://www.w3.org/2000/01/rdf-schema#\" xmlns:xsd=\"http://www.w3.org/2001/XMLSchema#\" xmlns:ns0=\"http://my.url.com/ontologies/mash-up#\" xmlns:ns1=\"http://xmlns.com/foaf/0.1/\" xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">\n" +
" <ns0:Train rdf:about=\"http://my.url.com/ontologies/mash-up#Train-1d4b674c-479c-48a3-aab3-b729fc96cbd4\">\n" +
" <ns0:name rdf:datatype=\"&xsd;string\">RE 1</ns0:name>\n" +
" <ns0:description rdf:datatype=\"&xsd;string\">Platform 1</ns0:description>\n" +
" <ns0:arrival rdf:datatype=\"&xsd;dateTime\">2015-04-14T18:00:40Z</ns0:arrival>\n" +
" <ns0:departure rdf:datatype=\"&xsd;dateTime\">2015-04-14T18:02:40Z</ns0:departure>\n" +
" <ns0:destination rdf:datatype=\"&xsd;string\">Padaborn</ns0:destination>\n" +
" <ns1:primaryTopic rdf:resource=\"http://my.url.com/ontologies/mash-up#Train-1d4b674c-479c-48a3-aab3-b729fc96cbd4\" />\n" +
" </ns0:Train>\n" +
"</rdf:RDF>";
Simply adding all stations as an own attribute would mean my Jena query would have to have very very many optionals and I'd like to avoid that.
Thanks in advance.
Edit:
This is how I would do it. That would mean very many optionals...
" <ns0:stations rdf:about=\"http://my.url.com/ontologies/mash-up#Stations-xxxx\">“+
" <ns0:from rdf:datatype=\"&xsd;string\">Aachen</ns0:from>\n“ +
" <ns0:to rdf:datatype=\"&xsd;string\">Padaborn</ns0:to>\n“ +
" <ns0:over1 rdf:datatype=\"&xsd;string\“>Köln</ns0:to>\n“ +
" <ns0:over2 rdf:datatype=\"&xsd;string\“>Düsseldorf</ns0:to>\n“ +
" <ns0:over3 rdf:datatype=\"&xsd;string\“>Duisburg</ns0:to>\n“ +
" <ns0:over4 rdf:datatype=\"&xsd;string\“>Essen</ns0:to>\n“ +
" <ns0:over5 rdf:datatype=\"&xsd;string\“>Dortmund</ns0:to>\n“ +
" <ns0:over6 rdf:datatype=\"&xsd;string\“>Hamm (Westf)</ns0:to>\n“ +
" </ns0:stations>\n"

If order of the stations does not matter:
The simplest option, if the order of the stations passed is not important, would be to use the same property. That would look like this in the data (I'll use Turtle, since it's more human readable and writeable, but show the corresponding RDF/XML too; it's easy to convert between the two, since they're just different serializations of the same data):
#prefix : <urn:train:>.
#prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
:train1
:name "RE 1" ;
:description "Platform 1" ;
:arrival "2015-04-14T18:00:40Z"^^xsd:dateTime ;
:departure "2015-04-14T18:02:40Z"^^xsd:dateTime ;
:over "Station1", "Station2", "Station3" ;
:destination "Padaborn" .
<rdf:RDF
xmlns="urn:train:"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:xsd="http://www.w3.org/2001/XMLSchema#">
<rdf:Description rdf:about="urn:train:train1">
<name>RE 1</name>
<description>Platform 1</description>
<arrival rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime"
>2015-04-14T18:00:40Z</arrival>
<departure rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime"
>2015-04-14T18:02:40Z</departure>
<over>Station1</over>
<over>Station2</over>
<over>Station3</over>
<destination>Padaborn</destination>
</rdf:Description>
</rdf:RDF>
Then you can use just one optional in the SPARQL query:
prefix : <urn:train:>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
select * where {
?train :name ?name ;
:description ?description ;
:arrival ?arrival ;
:departure ?departure ;
:destination ?destination .
optional {
?train :over ?over
}
}
--------------------------------------------------------------------------------------------------------------------------------------------
| train | name | description | arrival | departure | destination | over |
============================================================================================================================================
| :train1 | "RE 1" | "Platform 1" | "2015-04-14T18:00:40Z"^^xsd:dateTime | "2015-04-14T18:02:40Z"^^xsd:dateTime | "Padaborn" | "Station3" |
| :train1 | "RE 1" | "Platform 1" | "2015-04-14T18:00:40Z"^^xsd:dateTime | "2015-04-14T18:02:40Z"^^xsd:dateTime | "Padaborn" | "Station2" |
| :train1 | "RE 1" | "Platform 1" | "2015-04-14T18:00:40Z"^^xsd:dateTime | "2015-04-14T18:02:40Z"^^xsd:dateTime | "Padaborn" | "Station1" |
--------------------------------------------------------------------------------------------------------------------------------------------
You might want those in a list, so you'd group by and group_concat their values:
prefix : <urn:train:>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
select ?train ?name ?description
?arrival ?departure
?destination
(group_concat(?over) as ?overs)
where {
?train :name ?name ;
:description ?description ;
:arrival ?arrival ;
:departure ?departure ;
:destination ?destination .
optional {
?train :over ?over
}
}
group by ?train ?name ?description ?arrival ?departure ?destination
--------------------------------------------------------------------------------------------------------------------------------------------------------------
| train | name | description | arrival | departure | destination | overs |
==============================================================================================================================================================
| :train1 | "RE 1" | "Platform 1" | "2015-04-14T18:00:40Z"^^xsd:dateTime | "2015-04-14T18:02:40Z"^^xsd:dateTime | "Padaborn" | "Station3 Station2 Station1" |
--------------------------------------------------------------------------------------------------------------------------------------------------------------
If the order matters
If the order is more important, then you'll need to preserve it somehow. You could either do this with distinct properties, or one property that has a list value (or some other structured data that supports ordering).
With multiple properties
Your solution that used over1, over2, over3, properties can actually be realized using just one optional pattern, since you can use a variable in place of a property, and then filter on the value of that. Just check whether its URI begins with the right prefix, including over:
#prefix : <urn:train:>.
#prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
:train1
:name "RE 1" ;
:description "Platform 1" ;
:arrival "2015-04-14T18:00:40Z"^^xsd:dateTime ;
:departure "2015-04-14T18:02:40Z"^^xsd:dateTime ;
:over1 "Station1" ;
:over2 "Station2" ;
:over3 "Station3" ;
:destination "Padaborn" .
<rdf:RDF
xmlns="urn:train:"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:xsd="http://www.w3.org/2001/XMLSchema#">
<rdf:Description rdf:about="urn:train:train1">
<name>RE 1</name>
<description>Platform 1</description>
<arrival rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime"
>2015-04-14T18:00:40Z</arrival>
<departure rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime"
>2015-04-14T18:02:40Z</departure>
<over1>Station1</over1>
<over2>Station2</over2>
<over3>Station3</over3>
<destination>Padaborn</destination>
</rdf:Description>
</rdf:RDF>
prefix : <urn:train:>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
select * where {
?train :name ?name ;
:description ?description ;
:arrival ?arrival ;
:departure ?departure ;
:destination ?destination .
optional {
?train ?overProp ?over .
filter strstarts(str(?overProp),str(:over))
}
}
order by ?overProp
-------------------------------------------------------------------------------------------------------------------------------------------------------
| train | name | description | arrival | departure | destination | overProp | over |
=======================================================================================================================================================
| :train1 | "RE 1" | "Platform 1" | "2015-04-14T18:00:40Z"^^xsd:dateTime | "2015-04-14T18:02:40Z"^^xsd:dateTime | "Padaborn" | :over1 | "Station1" |
| :train1 | "RE 1" | "Platform 1" | "2015-04-14T18:00:40Z"^^xsd:dateTime | "2015-04-14T18:02:40Z"^^xsd:dateTime | "Padaborn" | :over2 | "Station2" |
| :train1 | "RE 1" | "Platform 1" | "2015-04-14T18:00:40Z"^^xsd:dateTime | "2015-04-14T18:02:40Z"^^xsd:dateTime | "Padaborn" | :over3 | "Station3" |
-------------------------------------------------------------------------------------------------------------------------------------------------------
And of course, you can use group by to combine the values here in the same way.
With just a little bit of structure
You could also use just one property, but have the value be something with just a little bit of structure. E.g., a blank node that has a value for the station and the index in which it's passed:
#prefix : <urn:train:>.
#prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
:train1
:name "RE 1" ;
:description "Platform 1" ;
:arrival "2015-04-14T18:00:40Z"^^xsd:dateTime ;
:departure "2015-04-14T18:02:40Z"^^xsd:dateTime ;
:over [:station "Station1" ; :number 2 ] ,
[:station "Station2" ; :number 3 ] ,
[:station "Station3" ; :number 1 ] ;
:destination "Padaborn" .
<rdf:RDF
xmlns="urn:train:"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:xsd="http://www.w3.org/2001/XMLSchema#">
<rdf:Description rdf:about="urn:train:train1">
<name>RE 1</name>
<description>Platform 1</description>
<arrival rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime"
>2015-04-14T18:00:40Z</arrival>
<departure rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime"
>2015-04-14T18:02:40Z</departure>
<over rdf:parseType="Resource">
<station>Station1</station>
<number rdf:datatype="http://www.w3.org/2001/XMLSchema#integer"
>2</number>
</over>
<over rdf:parseType="Resource">
<station>Station2</station>
<number rdf:datatype="http://www.w3.org/2001/XMLSchema#integer"
>3</number>
</over>
<over rdf:parseType="Resource">
<station>Station3</station>
<number rdf:datatype="http://www.w3.org/2001/XMLSchema#integer"
>1</number>
</over>
<destination>Padaborn</destination>
</rdf:Description>
</rdf:RDF>
prefix : <urn:train:>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
select * where {
?train :name ?name ;
:description ?description ;
:arrival ?arrival ;
:departure ?departure ;
:destination ?destination .
optional {
?train :over [ :station ?over ;
:number ?number ]
}
}
-----------------------------------------------------------------------------------------------------------------------------------------------------
| train | name | description | arrival | departure | destination | over | number |
=====================================================================================================================================================
| :train1 | "RE 1" | "Platform 1" | "2015-04-14T18:00:40Z"^^xsd:dateTime | "2015-04-14T18:02:40Z"^^xsd:dateTime | "Padaborn" | "Station3" | 1 |
| :train1 | "RE 1" | "Platform 1" | "2015-04-14T18:00:40Z"^^xsd:dateTime | "2015-04-14T18:02:40Z"^^xsd:dateTime | "Padaborn" | "Station2" | 3 |
| :train1 | "RE 1" | "Platform 1" | "2015-04-14T18:00:40Z"^^xsd:dateTime | "2015-04-14T18:02:40Z"^^xsd:dateTime | "Padaborn" | "Station1" | 2 |
-----------------------------------------------------------------------------------------------------------------------------------------------------
With a single property and a list
You could also have a single over property and have its value be a list. This is very easy to write in the N3/Turtle serialization. It's not as pretty in RDF/XML, but like I said in a comment, it's better to use a human writeable syntax if you're writing by hand, or to use a proper API for writing it. It's not too hard to query in the SPARQL either.
#prefix : <urn:train:>.
#prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
:train1
:name "RE 1" ;
:description "Platform 1" ;
:arrival "2015-04-14T18:00:40Z"^^xsd:dateTime ;
:departure "2015-04-14T18:02:40Z"^^xsd:dateTime ;
:over ("Station1" "Station2" "Station3") ;
:destination "Padaborn" .
<rdf:RDF
xmlns="urn:train:"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:xsd="http://www.w3.org/2001/XMLSchema#">
<rdf:Description rdf:about="urn:train:train1">
<name>RE 1</name>
<description>Platform 1</description>
<arrival rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime"
>2015-04-14T18:00:40Z</arrival>
<departure rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime"
>2015-04-14T18:02:40Z</departure>
<over rdf:parseType="Resource">
<rdf:first>Station1</rdf:first>
<rdf:rest rdf:parseType="Resource">
<rdf:first>Station2</rdf:first>
<rdf:rest rdf:parseType="Resource">
<rdf:first>Station3</rdf:first>
<rdf:rest rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#nil"/>
</rdf:rest>
</rdf:rest>
</over>
<destination>Padaborn</destination>
</rdf:Description>
</rdf:RDF>
prefix : <urn:train:>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
select * where {
?train :name ?name ;
:description ?description ;
:arrival ?arrival ;
:departure ?departure ;
:destination ?destination .
optional {
?train :over/(rdf:rest*/rdf:first) ?over
}
}
--------------------------------------------------------------------------------------------------------------------------------------------
| train | name | description | arrival | departure | destination | over |
============================================================================================================================================
| :train1 | "RE 1" | "Platform 1" | "2015-04-14T18:00:40Z"^^xsd:dateTime | "2015-04-14T18:02:40Z"^^xsd:dateTime | "Padaborn" | "Station1" |
| :train1 | "RE 1" | "Platform 1" | "2015-04-14T18:00:40Z"^^xsd:dateTime | "2015-04-14T18:02:40Z"^^xsd:dateTime | "Padaborn" | "Station2" |
| :train1 | "RE 1" | "Platform 1" | "2015-04-14T18:00:40Z"^^xsd:dateTime | "2015-04-14T18:02:40Z"^^xsd:dateTime | "Padaborn" | "Station3" |
--------------------------------------------------------------------------------------------------------------------------------------------
And of course, you can can group by and group_concat here, too. With the list based approach, you can actually compute the position of the element in the list, too. See, e.g., my answer to Is it possible to get the position of an element in an RDF Collection in SPARQL?.

Query SPARQL to DBPedia using Java code

I would like to get the URIs of the page on DBPedia that have a label equal to "London". That is, when I query DBPedia, if a page the property rdfs:label with the value "London", then I want to get its URI, e.g., http://dbpedia.org/resource/London. I'm using the following Java code, but I get no results. What am I doing wrong here?
String strings = "London";
String service = "http://dbpedia.org/sparql";
String query = "PREFIX dbo:<http://dbpedia.org/ontology/>"
+ "PREFIX : <http://dbpedia.org/resource/>"
+ "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#/>"
+ "select ?URI where {?URI rdfs:label "+strings+".}";
QueryExecution qe=QueryExecutionFactory.sparqlService(service, query);
ResultSet rs = qe.execSelect();
while (rs.hasNext()){
QuerySolution s= rs.nextSolution();
System.out.println(s.getResource("?URI").toString());
}

If you take a query like this to the DBpedia endpoint, you'll get a parse error:
select ?resource where {
?resource rdfs:label London
}
Virtuoso 37000 Error SP030: SPARQL compiler, line 4: syntax error at 'London' before '}'
SPARQL query:
define sql:big-data-const 0
#output-format:text/html
define sql:signal-void-variables 1 define input:default-graph-uri <http://dbpedia.org> select ?resource where {
?resource rdfs:label London
}
You need to put your strings inside single or double quotes. So you'd have a query like this, which parses correctly, but still has no results:
select ?resource where {
?resource rdfs:label "London"
}
SPARQL results
Much of the time, when you're working with DBpedia, it will help a lot to browse the data by hand to find out what's there, and then base your queries on that knowledge. In this case, you're interested in http://dbpedia.org/page/London. If you put that in a web browser, you'll see a bunch of the information London. If you scroll to the bottom, you can click on the N3/Turtle to see the same information in the Turtle serialization, which is very close to the SPARQL syntax. If you search in that page for rdfs:label, you'll see:
dbpedia:London rdfs:label "Londres"#fr ,
"London"#en ,
"London"#sv ,
"Londra"#it ,
"\u30ED\u30F3\u30C9\u30F3"#ja ,
"\u4F26\u6566"#zh ,
"Londres"#es ,
"Londyn"#pl ,
"Londen"#nl ,
"Londres"#pt ,
"\u041B\u043E\u043D\u0434\u043E\u043D"#ru ,
"London"#de ;
Most (perhaps all) of the labels in DBpedia have language tags, so you actually need to search for things that have the label "London"#en. For instance, you can use this query to get thes results:
select ?resource where {
?resource rdfs:label "London"#en
}
SPARQL results
Most of your code for the query is right, but you just need to be using the proper syntax for literals, and need to know what the data in DBpedia looks like. This is also a good case for parameterized SPARQL strings. There's an example in get latitude and longitude of a place dbpedia, but the short idea is that you should be able to write
select ?resource where {
?resource rdfs:label ?label
}
as a ParmeterizedSparqlString, and then use an API method to replace ?label with the literal that you want (in this case, "London"#en) without having to do any string concatenation. The ParameterizedSparqlString will also ensure that all the quoting is done correctly, and that the literal is formatted in the right way. Here's an example:
import com.hp.hpl.jena.query.ParameterizedSparqlString;
import com.hp.hpl.jena.query.QueryExecution;
import com.hp.hpl.jena.query.QueryExecutionFactory;
import com.hp.hpl.jena.query.ResultSet;
import com.hp.hpl.jena.query.ResultSetFactory;
import com.hp.hpl.jena.query.ResultSetFormatter;
import com.hp.hpl.jena.rdf.model.Literal;
import com.hp.hpl.jena.rdf.model.ResourceFactory;
public class LondonExample {
public static void main(String[] args) {
ParameterizedSparqlString qs = new ParameterizedSparqlString( "" +
"prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n" +
"\n" +
"select ?resource where {\n" +
" ?resource rdfs:label ?label\n" +
"}" );
Literal london = ResourceFactory.createLangLiteral( "London", "en" );
qs.setParam( "label", london );
System.out.println( qs );
QueryExecution exec = QueryExecutionFactory.sparqlService( "http://dbpedia.org/sparql", qs.asQuery() );
// Normally you'd just do results = exec.execSelect(), but I want to
// use this ResultSet twice, so I'm making a copy of it.
ResultSet results = ResultSetFactory.copyResults( exec.execSelect() );
while ( results.hasNext() ) {
// As RobV pointed out, don't use the `?` in the variable
// name here. Use *just* the name of the variable.
System.out.println( results.next().get( "resource" ));
}
// A simpler way of printing the results.
ResultSetFormatter.out( results );
}
}
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?resource where {
?resource rdfs:label "London"#en
}
http://dbpedia.org/resource/London
http://dbpedia.org/resource/Category:London
http://www.ontologyportal.org/SUMO#LondonUnitedKingdom
http://sw.opencyc.org/2008/06/10/concept/en/London
http://sw.opencyc.org/2008/06/10/concept/Mx8Ngx4rvWhwppwpEbGdrcN5Y29ycA-GTG9uZG9uHiu9WLbVnCkRsZ2tw3ljb3Jw
http://wikidata.dbpedia.org/resource/Q1442133
http://wikidata.dbpedia.org/resource/Q261303
http://wikidata.dbpedia.org/resource/Q79348
http://wikidata.dbpedia.org/resource/Q92561
http://wikidata.dbpedia.org/resource/Q3258936
http://wikidata.dbpedia.org/resource/Q84
http://wikidata.dbpedia.org/resource/Q6669759
http://wikidata.dbpedia.org/resource/Q6669762
http://wikidata.dbpedia.org/resource/Q6669763
http://wikidata.dbpedia.org/resource/Q6669771
http://wikidata.dbpedia.org/resource/Q586353
http://wikidata.dbpedia.org/resource/Q1310705
http://wikidata.dbpedia.org/resource/Q1749384
http://wikidata.dbpedia.org/resource/Q3836562
http://wikidata.dbpedia.org/resource/Q3836563
http://wikidata.dbpedia.org/resource/Q3836565
http://wikidata.dbpedia.org/resource/Q1001456
http://wikidata.dbpedia.org/resource/Q5712562
http://wikidata.dbpedia.org/resource/Q3061911
http://wikidata.dbpedia.org/resource/Q6669774
http://wikidata.dbpedia.org/resource/Q6669754
http://wikidata.dbpedia.org/resource/Q6669757
http://wikidata.dbpedia.org/resource/Q6669761
http://wikidata.dbpedia.org/resource/Q6669767
http://wikidata.dbpedia.org/resource/Q6669769
http://wikidata.dbpedia.org/resource/Q2477346
---------------------------------------------------------------------------------------------------------------
| resource |
===============================================================================================================
| <http://dbpedia.org/resource/London> |
| <http://dbpedia.org/resource/Category:London> |
| <http://www.ontologyportal.org/SUMO#LondonUnitedKingdom> |
| <http://sw.opencyc.org/2008/06/10/concept/en/London> |
| <http://sw.opencyc.org/2008/06/10/concept/Mx8Ngx4rvWhwppwpEbGdrcN5Y29ycA-GTG9uZG9uHiu9WLbVnCkRsZ2tw3ljb3Jw> |
| <http://wikidata.dbpedia.org/resource/Q1442133> |
| <http://wikidata.dbpedia.org/resource/Q261303> |
| <http://wikidata.dbpedia.org/resource/Q79348> |
| <http://wikidata.dbpedia.org/resource/Q92561> |
| <http://wikidata.dbpedia.org/resource/Q3258936> |
| <http://wikidata.dbpedia.org/resource/Q84> |
| <http://wikidata.dbpedia.org/resource/Q6669759> |
| <http://wikidata.dbpedia.org/resource/Q6669762> |
| <http://wikidata.dbpedia.org/resource/Q6669763> |
| <http://wikidata.dbpedia.org/resource/Q6669771> |
| <http://wikidata.dbpedia.org/resource/Q586353> |
| <http://wikidata.dbpedia.org/resource/Q1310705> |
| <http://wikidata.dbpedia.org/resource/Q1749384> |
| <http://wikidata.dbpedia.org/resource/Q3836562> |
| <http://wikidata.dbpedia.org/resource/Q3836563> |
| <http://wikidata.dbpedia.org/resource/Q3836565> |
| <http://wikidata.dbpedia.org/resource/Q1001456> |
| <http://wikidata.dbpedia.org/resource/Q5712562> |
| <http://wikidata.dbpedia.org/resource/Q3061911> |
| <http://wikidata.dbpedia.org/resource/Q6669774> |
| <http://wikidata.dbpedia.org/resource/Q6669754> |
| <http://wikidata.dbpedia.org/resource/Q6669757> |
| <http://wikidata.dbpedia.org/resource/Q6669761> |
| <http://wikidata.dbpedia.org/resource/Q6669767> |
| <http://wikidata.dbpedia.org/resource/Q6669769> |
| <http://wikidata.dbpedia.org/resource/Q2477346> |
---------------------------------------------------------------------------------------------------------------

The question mark in a variable name is just to distinguish it from other tokens in a SPARQL query. It could quite legally also be a $ and ?URI and $URI are the same variable.
When working with the Jena API (or pretty much any SPARQL API) you should not need to include the ?/$ in the variable name.
So simple change your getResource() call to omit the leading ? from the variable name like so:
System.out.println(s.getResource("URI").toString());
Note that a Jena ResultSet also has a getResultVars() method that provides a list of variables names present in the results that Jena will recognise. If you ever encounter a similar problem in the future it is worth printing out the variable names to see exactly how Jena is seeing the variables.

Matching strings with at least one word in common

I'm making a query to get the URIs of documents, that have a specific title. My query is:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dc: <http://purl.org/dc/elements/1.1/> SELECT ?document WHERE {
?document dc:title ?title.
FILTER (?title = "…" ).
}
where "…" is actually the value of this.getTitle(), since the query string is generated by:
String queryString = "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> " +
"PREFIX dc: <http://purl.org/dc/elements/1.1/> SELECT ?document WHERE { " +
"?document dc:title ?title." +
"FILTER (?title = \"" + this.getTitle() + "\" ). }";
With the query above, I get only the documents with titles exactly like this.getTitle(). Imagine this.getTitle is formed by more than 1 word. I'd like to get documents even if only one word forming this.getTitle appears on the document title (for example). How could I do that?

Let's say you've got some data like (in Turtle):
#prefix : <http://stackoverflow.com/q/20203733/1281433> .
#prefix dc: <http://purl.org/dc/elements/1.1/> .
:a dc:title "Great Gatsby" .
:b dc:title "Boring Gatsby" .
:c dc:title "Great Expectations" .
:d dc:title "The Great Muppet Caper" .
Then you can use a query like:
prefix : <http://stackoverflow.com/q/20203733/1281433>
prefix dc: <http://purl.org/dc/elements/1.1/>
select ?x ?title where {
# this is just in place of this.getTitle(). It provides a value for
# ?TITLE that is "Gatsby Strikes Again".
values ?TITLE { "Gatsby Strikes Again" }
# Select a thing and its title.
?x dc:title ?title .
# Then filter based on whether the ?title matches the result
# of replacing the strings in ?TITLE with "|", and matching
# case insensitively.
filter( regex( ?title, replace( ?TITLE, " ", "|" ), "i" ))
}
to get results like
------------------------
| x | title |
========================
| :b | "Boring Gatsby" |
| :a | "Great Gatsby" |
------------------------
What's particularly neat about this is that since you're generating the pattern on the fly, you could even make it based on another value from the graph pattern. For instance, if you want all pairs of things whose titles match on at least one word, you could do:
prefix : <http://stackoverflow.com/q/20203733/1281433>
prefix dc: <http://purl.org/dc/elements/1.1/>
select ?x ?xtitle ?y ?ytitle where {
?x dc:title ?xtitle .
?y dc:title ?ytitle .
filter( regex( ?xtitle, replace( ?ytitle, " ", "|" ), "i" ) && ?x != ?y )
}
order by ?x ?y
to get:
-----------------------------------------------------------------
| x | xtitle | y | ytitle |
=================================================================
| :a | "Great Gatsby" | :b | "Boring Gatsby" |
| :a | "Great Gatsby" | :c | "Great Expectations" |
| :a | "Great Gatsby" | :d | "The Great Muppet Caper" |
| :b | "Boring Gatsby" | :a | "Great Gatsby" |
| :c | "Great Expectations" | :a | "Great Gatsby" |
| :c | "Great Expectations" | :d | "The Great Muppet Caper" |
| :d | "The Great Muppet Caper" | :a | "Great Gatsby" |
| :d | "The Great Muppet Caper" | :c | "Great Expectations" |
-----------------------------------------------------------------
Of course, it's very important to note that you're pulling generating patterns based on your data now, and that means that someone who can put data into your system could put very expensive patterns in to bog down the query and cause a denial-of-service. On a more mundane note, you could run into trouble if any of your titles have characters in them that would interfere with the regular expressions. One interesting problem would be if something had a title with multiple spaces so that the pattern became The|Words|With||Two|Spaces, since the empty pattern in there might make everything match. This is an interesting approach, but it's got a lot of caveats.
In general, you could do this as shown here, or by generating the regular expression in code (where you can take care of escaping, etc.), or you could use a SPARQL engine that supports some text-based extensions (e.g., jena-text, which adds Apache Lucene or Apache Solr to Apache Jena).

Retrieving superclasses implied by OWL intersection classes

An OWL ontology may have classes A, B, and C, and axiom (in DL notation):
A &sqsubseteq; (B &sqcap; C)
or in approximate Manchester OWL Syntax:
A subClassOf (B and C)
It is logically true that A is a subclass of B, and that A is a subclass of C, but the triples
A rdfs:subClassOf B
A rdfs:subClassOf C
are not necessarily present in the RDF serialization of the OWL ontology. For instance, consider this very simple ontology in Protégé and its RDF serialization in RDF/XML and Turtle:
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns="http://stackoverflow.com/q/19924861/1281433/sample.owl#"
xmlns:owl="http://www.w3.org/2002/07/owl#"
xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
<owl:Ontology rdf:about="http://stackoverflow.com/q/19924861/1281433/sample.owl"/>
<owl:Class rdf:about="http://stackoverflow.com/q/19924861/1281433/sample.owl#C"/>
<owl:Class rdf:about="http://stackoverflow.com/q/19924861/1281433/sample.owl#B"/>
<owl:Class rdf:about="http://stackoverflow.com/q/19924861/1281433/sample.owl#A">
<rdfs:subClassOf>
<owl:Class>
<owl:intersectionOf rdf:parseType="Collection">
<owl:Class rdf:about="http://stackoverflow.com/q/19924861/1281433/sample.owl#B"/>
<owl:Class rdf:about="http://stackoverflow.com/q/19924861/1281433/sample.owl#C"/>
</owl:intersectionOf>
</owl:Class>
</rdfs:subClassOf>
</owl:Class>
</rdf:RDF>
#prefix : <http://stackoverflow.com/q/19924861/1281433/sample.owl#> .
#prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
#prefix owl: <http://www.w3.org/2002/07/owl#> .
#prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
#prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
<http://stackoverflow.com/q/19924861/1281433/sample.owl>
a owl:Ontology .
:B a owl:Class .
:C a owl:Class .
:A a owl:Class ;
rdfs:subClassOf [ a owl:Class ;
owl:intersectionOf ( :B :C )
] .
The serialization has a triple with rdfs:subClassOf, but the object isn't :B or :C, so a query like
:A rdfs:subClassOf ?superclass
won't return the superclasses of :A. How can I write a SPARQL query that will return those superclasses of :A?

It sounds like you've got a class that is a subclass of some intersection class. E.g., you might have
Student &sqsubseteq; Person &sqcap; enrolledIn some Course
In the Protégé OWL ontology editor, this would look like:
If you write a SPARQL query for subclasses, e.g.,
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?subclass ?superclass where {
?subclass rdfs:subClassOf ?superclass
}
and you don't have a reasoner inferring additional data, you won't see Student as a subclass in your results, but you might see an blank (anonymous) node:
---------------------------------------------------------
| subclass | superclass |
=========================================================
| <http://www.examples.org/school#Student> | _:b0 |
---------------------------------------------------------
To understand why this is the case, you need to take a look at the RDF serialization of the ontology. In this case, it's (in RDF/XML):
<rdf:RDF
xmlns="http://www.examples.org/school#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:owl="http://www.w3.org/2002/07/owl#"
xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
<owl:Ontology rdf:about="http://www.examples.org/school"/>
<owl:Class rdf:about="http://www.examples.org/school#Course"/>
<owl:Class rdf:about="http://www.examples.org/school#Person"/>
<owl:Class rdf:about="http://www.examples.org/school#Student">
<rdfs:subClassOf>
<owl:Class>
<owl:intersectionOf rdf:parseType="Collection">
<owl:Class rdf:about="http://www.examples.org/school#Person"/>
<owl:Restriction>
<owl:onProperty>
<owl:ObjectProperty rdf:about="http://www.examples.org/school#enrolledIn"/>
</owl:onProperty>
<owl:someValuesFrom rdf:resource="http://www.examples.org/school#Course"/>
</owl:Restriction>
</owl:intersectionOf>
</owl:Class>
</rdfs:subClassOf>
</owl:Class>
</rdf:RDF>
Or in the more human-readable Turtle (which is also more like the SPARQL query syntax):
#prefix : <http://www.examples.org/school#> .
#prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
#prefix owl: <http://www.w3.org/2002/07/owl#> .
#prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
#prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
:Student a owl:Class ;
rdfs:subClassOf [ a owl:Class ;
owl:intersectionOf ( :Person [ a owl:Restriction ;
owl:onProperty :enrolledIn ;
owl:someValuesFrom :Course
] )
] .
:Person a owl:Class .
:enrolledIn a owl:ObjectProperty .
:Course a owl:Class .
<http://www.examples.org/school>
a owl:Ontology .
There is, in fact, a Student rdfs:subClassOf [ ... ] triple in the data, but the [ ... ] is a blank node; it's an anonymous owl:Class that is the intersection of some other classes. A reasoner would be able to tell you that if X &sqsubseteq; (Y and Z) then X &sqsubseteq; Y and X &sqsubseteq; Z, but a SPARQL query on its own won't do that. You could make a more complex SPARQL query like this that would, though:
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix owl: <http://www.w3.org/2002/07/owl#>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
select ?subclass ?superclass where {
{ ?subclass rdfs:subClassOf ?superclass }
union
{ ?subclass rdfs:subClassOf [ owl:intersectionOf [ rdf:rest* [ rdf:first ?superclass ] ] ] }
}
--------------------------------------------------------------------------------------
| subclass | superclass |
======================================================================================
| <http://www.examples.org/school#Student> | _:b0 |
| <http://www.examples.org/school#Student> | <http://www.examples.org/school#Person> |
| <http://www.examples.org/school#Student> | _:b1 |
--------------------------------------------------------------------------------------
The two blank nodes are the anonymous intersection class, and the anonymous restriction class (enrolledIn some Course). If you only want IRI results, you can use a filter:
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix owl: <http://www.w3.org/2002/07/owl#>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
select ?subclass ?superclass where {
{ ?subclass rdfs:subClassOf ?superclass }
union
{ ?subclass rdfs:subClassOf [ owl:intersectionOf [ rdf:rest* [ rdf:first ?superclass ] ] ] }
filter( isIRI( ?superclass ) )
}
--------------------------------------------------------------------------------------
| subclass | superclass |
======================================================================================
| <http://www.examples.org/school#Student> | <http://www.examples.org/school#Person> |
--------------------------------------------------------------------------------------
Now, as final touch, if you want to make your query a bit smaller, since the only difference in those two unioned patterns is the path that connects ?subclass and ?superclass, you can actually write this with just one property path. (Although, as noted in Sparql query Subclass or EquivalentTo, you might run into some issues with Protégé if you do this.) The idea is that you can rewrite this:
{ ?subclass rdfs:subClassOf ?superclass }
union
{ ?subclass rdfs:subClassOf [ owl:intersectionOf [ rdf:rest* [ rdf:first ?superclass ] ] ] }
as this, by using property paths, which also removes the need for the blank nodes:
?subclass ( rdfs:subClassOf |
( rdfs:subClassOf / owl:intersectionOf / rdf:rest* / rdf:first ) ) ?superclass
and you can simplify that a bit more to
?subclass rdfs:subClassOf/((owl:intersectionOf/rdf:rest*/rdf:first)+) ?superclass
and you can even remove one level of parentheses from that, to make it
?subclass rdfs:subClassOf/(owl:intersectionOf/rdf:rest*/rdf:first)+ ?superclass
but then you'd have to start remembering the precedence rules, and that's not much fun. The query works though:
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix owl: <http://www.w3.org/2002/07/owl#>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
select ?subclass ?superclass where {
?subclass rdfs:subClassOf/(owl:intersectionOf/rdf:rest*/rdf:first)+ ?superclass
filter( isIRI( ?superclass ) )
}
--------------------------------------------------------------------------------------
| subclass | superclass |
======================================================================================
| <http://www.examples.org/school#Student> | <http://www.examples.org/school#Person> |
--------------------------------------------------------------------------------------

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Issue with binding values from sub selection in Jena ARQ - java

Related

The function "org.apache.jena.rdf.model.hasProperty" didn't work

Dynamic Array in RDF/XML

Query SPARQL to DBPedia using Java code

Matching strings with at least one word in common

Retrieving superclasses implied by OWL intersection classes

Categories

Resources