Matching strings with at least one word in common

Matching strings with at least one word in common - java

I'm making a query to get the URIs of documents, that have a specific title. My query is:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dc: <http://purl.org/dc/elements/1.1/> SELECT ?document WHERE {
?document dc:title ?title.
FILTER (?title = "…" ).
}
where "…" is actually the value of this.getTitle(), since the query string is generated by:
String queryString = "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> " +
"PREFIX dc: <http://purl.org/dc/elements/1.1/> SELECT ?document WHERE { " +
"?document dc:title ?title." +
"FILTER (?title = \"" + this.getTitle() + "\" ). }";
With the query above, I get only the documents with titles exactly like this.getTitle(). Imagine this.getTitle is formed by more than 1 word. I'd like to get documents even if only one word forming this.getTitle appears on the document title (for example). How could I do that?

Let's say you've got some data like (in Turtle):
#prefix : <http://stackoverflow.com/q/20203733/1281433> .
#prefix dc: <http://purl.org/dc/elements/1.1/> .
:a dc:title "Great Gatsby" .
:b dc:title "Boring Gatsby" .
:c dc:title "Great Expectations" .
:d dc:title "The Great Muppet Caper" .
Then you can use a query like:
prefix : <http://stackoverflow.com/q/20203733/1281433>
prefix dc: <http://purl.org/dc/elements/1.1/>
select ?x ?title where {
# this is just in place of this.getTitle(). It provides a value for
# ?TITLE that is "Gatsby Strikes Again".
values ?TITLE { "Gatsby Strikes Again" }
# Select a thing and its title.
?x dc:title ?title .
# Then filter based on whether the ?title matches the result
# of replacing the strings in ?TITLE with "|", and matching
# case insensitively.
filter( regex( ?title, replace( ?TITLE, " ", "|" ), "i" ))
}
to get results like
------------------------
| x | title |
========================
| :b | "Boring Gatsby" |
| :a | "Great Gatsby" |
------------------------
What's particularly neat about this is that since you're generating the pattern on the fly, you could even make it based on another value from the graph pattern. For instance, if you want all pairs of things whose titles match on at least one word, you could do:
prefix : <http://stackoverflow.com/q/20203733/1281433>
prefix dc: <http://purl.org/dc/elements/1.1/>
select ?x ?xtitle ?y ?ytitle where {
?x dc:title ?xtitle .
?y dc:title ?ytitle .
filter( regex( ?xtitle, replace( ?ytitle, " ", "|" ), "i" ) && ?x != ?y )
}
order by ?x ?y
to get:
-----------------------------------------------------------------
| x | xtitle | y | ytitle |
=================================================================
| :a | "Great Gatsby" | :b | "Boring Gatsby" |
| :a | "Great Gatsby" | :c | "Great Expectations" |
| :a | "Great Gatsby" | :d | "The Great Muppet Caper" |
| :b | "Boring Gatsby" | :a | "Great Gatsby" |
| :c | "Great Expectations" | :a | "Great Gatsby" |
| :c | "Great Expectations" | :d | "The Great Muppet Caper" |
| :d | "The Great Muppet Caper" | :a | "Great Gatsby" |
| :d | "The Great Muppet Caper" | :c | "Great Expectations" |
-----------------------------------------------------------------
Of course, it's very important to note that you're pulling generating patterns based on your data now, and that means that someone who can put data into your system could put very expensive patterns in to bog down the query and cause a denial-of-service. On a more mundane note, you could run into trouble if any of your titles have characters in them that would interfere with the regular expressions. One interesting problem would be if something had a title with multiple spaces so that the pattern became The|Words|With||Two|Spaces, since the empty pattern in there might make everything match. This is an interesting approach, but it's got a lot of caveats.
In general, you could do this as shown here, or by generating the regular expression in code (where you can take care of escaping, etc.), or you could use a SPARQL engine that supports some text-based extensions (e.g., jena-text, which adds Apache Lucene or Apache Solr to Apache Jena).

Related

SQL query using Spark and java language

I have two dataframe on spark .
The first dataframe1 is :
+--------------+--------------+--------------+
|id_z |longitude |latitude |
+--------------+--------------+--------------+
|[12,20,30 ] |-7.0737816 | 33.82666 |
|13 |-7.5952683 | 33.5441916 |
+--------------+--------------+--------------+
The second dataframe2 is :
+--------------+--------------+---------------+
|id_z2 |longitude2 |latitude2 |
+--------------+--------------+---------------+
| 14 |-8.5952683 | 38.5441916 |
| 12 |-7.0737816 | 33.82666 |
+--------------+--------------+---------------+
I want to apply the logic of the following request.
String sql = "SELECT * FROM dataframe2 WHERE dataframe2 .id_z2 IN ("'"
+ id_z +"'") and longitude2 = "'"+longitude+"'" and latitude = "'"+latitude+"'"";
I prefer not to have a join, is it possible to do this?
I really need your help , or just a starting point that will make things easier for me.
Thnak you

Issue with binding values from sub selection in Jena ARQ

I want to run the following simple testing query:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX vcard: <http://www.w3.org/2001/vcard-rdf/3.0#>
SELECT ?givenName ?name_count ?temp
WHERE
{ BIND(if(( ?name_count = 2 ), "just two", "definitely not 2") AS ?temp)
{ SELECT DISTINCT ?givenName (COUNT(?givenName) AS ?name_count)
WHERE
{ ?y vcard:Family ?givenName }
GROUP BY ?givenName
}
}
The graph I am querying is this from the tutorial https://jena.apache.org/tutorials/sparql_data.html:
#prefix vCard: <http://www.w3.org/2001/vcard-rdf/3.0#> .
#prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
<http://somewhere/MattJones/> vCard:FN "Matt Jones" .
<http://somewhere/MattJones/> vCard:N _:b0 .
_:b0 vCard:Family "Jones" .
_:b0 vCard:Given "Matthew" .
<http://somewhere/RebeccaSmith/> vCard:FN "Becky Smith" .
<http://somewhere/RebeccaSmith/> vCard:N _:b1 .
_:b1 vCard:Family "Smith" .
_:b1 vCard:Given "Rebecca" .
<http://somewhere/JohnSmith/> vCard:FN "John Smith" .
<http://somewhere/JohnSmith/> vCard:N _:b2 .
_:b2 vCard:Family "Smith" .
_:b2 vCard:Given "John" .
<http://somewhere/SarahJones/> vCard:FN "Sarah Jones" .
<http://somewhere/SarahJones/> vCard:N _:b3 .
_:b3 vCard:Family "Jones" .
_:b3 vCard:Given "Sarah" .
Now the problem is that running it with Jena:
Query query = QueryFactory.create(theAboveQueryAsString);
QueryExecution qexec = QueryExecutionFactory.create(query, theAboveGraphmodel);
ResultSet execSel = qexec.execSelect();
ResultSetRewindable results = ResultSetFactory.copyResults(execSel);;
ResultSetFormatter.out(System.out, results, query);
gives off this result in console:
----------------------------------
| givenName | name_count | temp |
==================================
| "Smith" | 2 | |
| "Jones" | 2 | |
----------------------------------
having the temp values as null.
On the other hand running the same query on the the same graph in Ontotext GraphDb enviroment i get the correct result (saved as CSV):
givenName | name_count | temp
------------------------------------
Jones | 2 | just two
Smith | 2 | just two
Could there be a bug in the ARQ engine or am I missing something?
Thanks in advance.
I am using jena-arq 3.12.0
Java(TM) SE Runtime Environment (build 1.8.0_181-b13)
Eclipse Version Version: 2019-06 (4.12.0)

There is a join between BIND and sub-select. The arguments to the join step are calculated before the join is done. So the BIND is evaluated, the sub-select is evaluated separately and the results joined. ?name_count isn't set in the BIND assignment. If you move it after the sub-select, it will apply to the results of the sub-select.
BIND adds a binding to the result of the pattern before it.
(base <http://example/base/>
(prefix ((rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>)
(vcard: <http://www.w3.org/2001/vcard-rdf/3.0#>))
(project (?givenName ?name_count ?temp)
(join
(extend ((?temp (if (= ?name_count 2) "just two" "definitely not 2")))
(table unit))
(distinct
(project (?givenName ?name_count)
(extend ((?name_count ?.0))
(group (?givenName) ((?.0 (count ?givenName)))
(bgp (triple ?y vcard:Family ?givenName))))))))))
Here, the (extend...) is one of two argument to the (join...). (table unit) is the "nothing" before the BIND.
If put afterwards, the algebra is:
(base <http://example/base/>
(prefix ((rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>)
(vcard: <http://www.w3.org/2001/vcard-rdf/3.0#>))
(project (?givenName ?name_count ?temp)
(extend ((?temp (if (= ?name_count 2) "just two" "definitely not 2")))
(distinct
(project (?givenName ?name_count)
(extend ((?name_count ?.0))
(group (?givenName) ((?.0 (count ?givenName)))
(bgp (triple ?y vcard:Family ?givenName))))))))))
and the extend (which is from the syntax BIND) is working on the (distinct ... of the sub-query.

Elasticsearch - how to group by and count matches in an index

I have an instance of Elasticsearch running with thousands of documents. My index has 2 fields like this:
|____Type_____|__ Date_added __ |
| walking | 2018-11-27T00:00:00.000 |
| walking | 2018-11-26T00:00:00.000 |
| running | 2018-11-24T00:00:00.000 |
| running | 2018-11-25T00:00:00.000 |
| walking | 2018-11-27T04:00:00.000 |
I want to group by and count how many matches were found for the "type" field, in a certain range.
In SQL I would do something like this:
select type,
count(type)
from index
where date_added between '2018-11-20' and '2018-11-30'
group by type
I want to get something like this:
| type | count |
| running | 2 |
| walking | 3 |
I'm using the High Level Rest Client api in my project, so far my query looks like this, it's only filtering by the start and end time:
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.query(QueryBuilders
.boolQuery()
.must(QueryBuilders
.rangeQuery("date_added")
.from(start.getTime())
.to(end.getTime()))
)
);
How can I do a "group by" in the "type" field? Is it possible to do this in ElasticSearch?

That's a good start! Now you need to add a terms aggregation to your query:
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.query(QueryBuilders.boolQuery()
.must(QueryBuilders
.rangeQuery("date_added")
.from(start.getTime())
.to(end.getTime()))
)
);
// add these two lines
TermsAggregationBuilder groupBy = AggregationBuilders.terms("byType").field("type.keyword");
sourceBuilder.aggregation(groupBy);

After using Val's reply to aggregate the fields, I wanted to print the aggregations of my query together with the value of them. Here's what I did:
Terms terms = searchResponse.getAggregations().get("byType");
Collection<Terms.Bucket> buckets = (Collection<Bucket>) terms.getBuckets();
for (Bucket bucket : buckets) {
System.out.println("Type: " + bucket.getKeyAsString() + " = Count("+bucket.getDocCount()+")");
}
This is the output after running the query in an index with 2700 documents with a field called "type" and 2 different types:
Type: walking = Count(900)
Type: running = Count(1800)

Dynamic Array in RDF/XML

In c# I am generating a graph and am using RDF/XML to send it to my android application. Part of my graph is supposed to show different trains that arrive at a certain train station.
Adding a train, its arrival, departure, destination and so on is no problem. Unfortunately I don't know how to get all stations the different trains pass on their way in the RDF/XML.
For example:
Train A starts in city 1, passes city 2 & 3, and arrives in city 4.
Train B starts in city 1, passes city 5 & 6 & 7, and arrives in city 8.
How can I dynamically add arrays like [city 5, city 6, city 7] to a train?
My current RDF/XML:
<?xml version="1.0" encoding="utf-16"?>" +
"<!DOCTYPE rdf:RDF [\n" +
"\t<!ENTITY rdf 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'>\n" +
"]>\n" +
"<rdf:RDF xmlns:rdfs=\"http://www.w3.org/2000/01/rdf-schema#\" xmlns:xsd=\"http://www.w3.org/2001/XMLSchema#\" xmlns:ns0=\"http://my.url.com/ontologies/mash-up#\" xmlns:ns1=\"http://xmlns.com/foaf/0.1/\" xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">\n" +
" <ns0:Train rdf:about=\"http://my.url.com/ontologies/mash-up#Train-1d4b674c-479c-48a3-aab3-b729fc96cbd4\">\n" +
" <ns0:name rdf:datatype=\"&xsd;string\">RE 1</ns0:name>\n" +
" <ns0:description rdf:datatype=\"&xsd;string\">Platform 1</ns0:description>\n" +
" <ns0:arrival rdf:datatype=\"&xsd;dateTime\">2015-04-14T18:00:40Z</ns0:arrival>\n" +
" <ns0:departure rdf:datatype=\"&xsd;dateTime\">2015-04-14T18:02:40Z</ns0:departure>\n" +
" <ns0:destination rdf:datatype=\"&xsd;string\">Padaborn</ns0:destination>\n" +
" <ns1:primaryTopic rdf:resource=\"http://my.url.com/ontologies/mash-up#Train-1d4b674c-479c-48a3-aab3-b729fc96cbd4\" />\n" +
" </ns0:Train>\n" +
"</rdf:RDF>";
Simply adding all stations as an own attribute would mean my Jena query would have to have very very many optionals and I'd like to avoid that.
Thanks in advance.
Edit:
This is how I would do it. That would mean very many optionals...
" <ns0:stations rdf:about=\"http://my.url.com/ontologies/mash-up#Stations-xxxx\">“+
" <ns0:from rdf:datatype=\"&xsd;string\">Aachen</ns0:from>\n“ +
" <ns0:to rdf:datatype=\"&xsd;string\">Padaborn</ns0:to>\n“ +
" <ns0:over1 rdf:datatype=\"&xsd;string\“>Köln</ns0:to>\n“ +
" <ns0:over2 rdf:datatype=\"&xsd;string\“>Düsseldorf</ns0:to>\n“ +
" <ns0:over3 rdf:datatype=\"&xsd;string\“>Duisburg</ns0:to>\n“ +
" <ns0:over4 rdf:datatype=\"&xsd;string\“>Essen</ns0:to>\n“ +
" <ns0:over5 rdf:datatype=\"&xsd;string\“>Dortmund</ns0:to>\n“ +
" <ns0:over6 rdf:datatype=\"&xsd;string\“>Hamm (Westf)</ns0:to>\n“ +
" </ns0:stations>\n"

If order of the stations does not matter:
The simplest option, if the order of the stations passed is not important, would be to use the same property. That would look like this in the data (I'll use Turtle, since it's more human readable and writeable, but show the corresponding RDF/XML too; it's easy to convert between the two, since they're just different serializations of the same data):
#prefix : <urn:train:>.
#prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
:train1
:name "RE 1" ;
:description "Platform 1" ;
:arrival "2015-04-14T18:00:40Z"^^xsd:dateTime ;
:departure "2015-04-14T18:02:40Z"^^xsd:dateTime ;
:over "Station1", "Station2", "Station3" ;
:destination "Padaborn" .
<rdf:RDF
xmlns="urn:train:"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:xsd="http://www.w3.org/2001/XMLSchema#">
<rdf:Description rdf:about="urn:train:train1">
<name>RE 1</name>
<description>Platform 1</description>
<arrival rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime"
>2015-04-14T18:00:40Z</arrival>
<departure rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime"
>2015-04-14T18:02:40Z</departure>
<over>Station1</over>
<over>Station2</over>
<over>Station3</over>
<destination>Padaborn</destination>
</rdf:Description>
</rdf:RDF>
Then you can use just one optional in the SPARQL query:
prefix : <urn:train:>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
select * where {
?train :name ?name ;
:description ?description ;
:arrival ?arrival ;
:departure ?departure ;
:destination ?destination .
optional {
?train :over ?over
}
}
--------------------------------------------------------------------------------------------------------------------------------------------
| train | name | description | arrival | departure | destination | over |
============================================================================================================================================
| :train1 | "RE 1" | "Platform 1" | "2015-04-14T18:00:40Z"^^xsd:dateTime | "2015-04-14T18:02:40Z"^^xsd:dateTime | "Padaborn" | "Station3" |
| :train1 | "RE 1" | "Platform 1" | "2015-04-14T18:00:40Z"^^xsd:dateTime | "2015-04-14T18:02:40Z"^^xsd:dateTime | "Padaborn" | "Station2" |
| :train1 | "RE 1" | "Platform 1" | "2015-04-14T18:00:40Z"^^xsd:dateTime | "2015-04-14T18:02:40Z"^^xsd:dateTime | "Padaborn" | "Station1" |
--------------------------------------------------------------------------------------------------------------------------------------------
You might want those in a list, so you'd group by and group_concat their values:
prefix : <urn:train:>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
select ?train ?name ?description
?arrival ?departure
?destination
(group_concat(?over) as ?overs)
where {
?train :name ?name ;
:description ?description ;
:arrival ?arrival ;
:departure ?departure ;
:destination ?destination .
optional {
?train :over ?over
}
}
group by ?train ?name ?description ?arrival ?departure ?destination
--------------------------------------------------------------------------------------------------------------------------------------------------------------
| train | name | description | arrival | departure | destination | overs |
==============================================================================================================================================================
| :train1 | "RE 1" | "Platform 1" | "2015-04-14T18:00:40Z"^^xsd:dateTime | "2015-04-14T18:02:40Z"^^xsd:dateTime | "Padaborn" | "Station3 Station2 Station1" |
--------------------------------------------------------------------------------------------------------------------------------------------------------------
If the order matters
If the order is more important, then you'll need to preserve it somehow. You could either do this with distinct properties, or one property that has a list value (or some other structured data that supports ordering).
With multiple properties
Your solution that used over1, over2, over3, properties can actually be realized using just one optional pattern, since you can use a variable in place of a property, and then filter on the value of that. Just check whether its URI begins with the right prefix, including over:
#prefix : <urn:train:>.
#prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
:train1
:name "RE 1" ;
:description "Platform 1" ;
:arrival "2015-04-14T18:00:40Z"^^xsd:dateTime ;
:departure "2015-04-14T18:02:40Z"^^xsd:dateTime ;
:over1 "Station1" ;
:over2 "Station2" ;
:over3 "Station3" ;
:destination "Padaborn" .
<rdf:RDF
xmlns="urn:train:"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:xsd="http://www.w3.org/2001/XMLSchema#">
<rdf:Description rdf:about="urn:train:train1">
<name>RE 1</name>
<description>Platform 1</description>
<arrival rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime"
>2015-04-14T18:00:40Z</arrival>
<departure rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime"
>2015-04-14T18:02:40Z</departure>
<over1>Station1</over1>
<over2>Station2</over2>
<over3>Station3</over3>
<destination>Padaborn</destination>
</rdf:Description>
</rdf:RDF>
prefix : <urn:train:>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
select * where {
?train :name ?name ;
:description ?description ;
:arrival ?arrival ;
:departure ?departure ;
:destination ?destination .
optional {
?train ?overProp ?over .
filter strstarts(str(?overProp),str(:over))
}
}
order by ?overProp
-------------------------------------------------------------------------------------------------------------------------------------------------------
| train | name | description | arrival | departure | destination | overProp | over |
=======================================================================================================================================================
| :train1 | "RE 1" | "Platform 1" | "2015-04-14T18:00:40Z"^^xsd:dateTime | "2015-04-14T18:02:40Z"^^xsd:dateTime | "Padaborn" | :over1 | "Station1" |
| :train1 | "RE 1" | "Platform 1" | "2015-04-14T18:00:40Z"^^xsd:dateTime | "2015-04-14T18:02:40Z"^^xsd:dateTime | "Padaborn" | :over2 | "Station2" |
| :train1 | "RE 1" | "Platform 1" | "2015-04-14T18:00:40Z"^^xsd:dateTime | "2015-04-14T18:02:40Z"^^xsd:dateTime | "Padaborn" | :over3 | "Station3" |
-------------------------------------------------------------------------------------------------------------------------------------------------------
And of course, you can use group by to combine the values here in the same way.
With just a little bit of structure
You could also use just one property, but have the value be something with just a little bit of structure. E.g., a blank node that has a value for the station and the index in which it's passed:
#prefix : <urn:train:>.
#prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
:train1
:name "RE 1" ;
:description "Platform 1" ;
:arrival "2015-04-14T18:00:40Z"^^xsd:dateTime ;
:departure "2015-04-14T18:02:40Z"^^xsd:dateTime ;
:over [:station "Station1" ; :number 2 ] ,
[:station "Station2" ; :number 3 ] ,
[:station "Station3" ; :number 1 ] ;
:destination "Padaborn" .
<rdf:RDF
xmlns="urn:train:"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:xsd="http://www.w3.org/2001/XMLSchema#">
<rdf:Description rdf:about="urn:train:train1">
<name>RE 1</name>
<description>Platform 1</description>
<arrival rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime"
>2015-04-14T18:00:40Z</arrival>
<departure rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime"
>2015-04-14T18:02:40Z</departure>
<over rdf:parseType="Resource">
<station>Station1</station>
<number rdf:datatype="http://www.w3.org/2001/XMLSchema#integer"
>2</number>
</over>
<over rdf:parseType="Resource">
<station>Station2</station>
<number rdf:datatype="http://www.w3.org/2001/XMLSchema#integer"
>3</number>
</over>
<over rdf:parseType="Resource">
<station>Station3</station>
<number rdf:datatype="http://www.w3.org/2001/XMLSchema#integer"
>1</number>
</over>
<destination>Padaborn</destination>
</rdf:Description>
</rdf:RDF>
prefix : <urn:train:>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
select * where {
?train :name ?name ;
:description ?description ;
:arrival ?arrival ;
:departure ?departure ;
:destination ?destination .
optional {
?train :over [ :station ?over ;
:number ?number ]
}
}
-----------------------------------------------------------------------------------------------------------------------------------------------------
| train | name | description | arrival | departure | destination | over | number |
=====================================================================================================================================================
| :train1 | "RE 1" | "Platform 1" | "2015-04-14T18:00:40Z"^^xsd:dateTime | "2015-04-14T18:02:40Z"^^xsd:dateTime | "Padaborn" | "Station3" | 1 |
| :train1 | "RE 1" | "Platform 1" | "2015-04-14T18:00:40Z"^^xsd:dateTime | "2015-04-14T18:02:40Z"^^xsd:dateTime | "Padaborn" | "Station2" | 3 |
| :train1 | "RE 1" | "Platform 1" | "2015-04-14T18:00:40Z"^^xsd:dateTime | "2015-04-14T18:02:40Z"^^xsd:dateTime | "Padaborn" | "Station1" | 2 |
-----------------------------------------------------------------------------------------------------------------------------------------------------
With a single property and a list
You could also have a single over property and have its value be a list. This is very easy to write in the N3/Turtle serialization. It's not as pretty in RDF/XML, but like I said in a comment, it's better to use a human writeable syntax if you're writing by hand, or to use a proper API for writing it. It's not too hard to query in the SPARQL either.
#prefix : <urn:train:>.
#prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
:train1
:name "RE 1" ;
:description "Platform 1" ;
:arrival "2015-04-14T18:00:40Z"^^xsd:dateTime ;
:departure "2015-04-14T18:02:40Z"^^xsd:dateTime ;
:over ("Station1" "Station2" "Station3") ;
:destination "Padaborn" .
<rdf:RDF
xmlns="urn:train:"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:xsd="http://www.w3.org/2001/XMLSchema#">
<rdf:Description rdf:about="urn:train:train1">
<name>RE 1</name>
<description>Platform 1</description>
<arrival rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime"
>2015-04-14T18:00:40Z</arrival>
<departure rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime"
>2015-04-14T18:02:40Z</departure>
<over rdf:parseType="Resource">
<rdf:first>Station1</rdf:first>
<rdf:rest rdf:parseType="Resource">
<rdf:first>Station2</rdf:first>
<rdf:rest rdf:parseType="Resource">
<rdf:first>Station3</rdf:first>
<rdf:rest rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#nil"/>
</rdf:rest>
</rdf:rest>
</over>
<destination>Padaborn</destination>
</rdf:Description>
</rdf:RDF>
prefix : <urn:train:>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
select * where {
?train :name ?name ;
:description ?description ;
:arrival ?arrival ;
:departure ?departure ;
:destination ?destination .
optional {
?train :over/(rdf:rest*/rdf:first) ?over
}
}
--------------------------------------------------------------------------------------------------------------------------------------------
| train | name | description | arrival | departure | destination | over |
============================================================================================================================================
| :train1 | "RE 1" | "Platform 1" | "2015-04-14T18:00:40Z"^^xsd:dateTime | "2015-04-14T18:02:40Z"^^xsd:dateTime | "Padaborn" | "Station1" |
| :train1 | "RE 1" | "Platform 1" | "2015-04-14T18:00:40Z"^^xsd:dateTime | "2015-04-14T18:02:40Z"^^xsd:dateTime | "Padaborn" | "Station2" |
| :train1 | "RE 1" | "Platform 1" | "2015-04-14T18:00:40Z"^^xsd:dateTime | "2015-04-14T18:02:40Z"^^xsd:dateTime | "Padaborn" | "Station3" |
--------------------------------------------------------------------------------------------------------------------------------------------
And of course, you can can group by and group_concat here, too. With the list based approach, you can actually compute the position of the element in the list, too. See, e.g., my answer to Is it possible to get the position of an element in an RDF Collection in SPARQL?.

Query SPARQL to DBPedia using Java code

I would like to get the URIs of the page on DBPedia that have a label equal to "London". That is, when I query DBPedia, if a page the property rdfs:label with the value "London", then I want to get its URI, e.g., http://dbpedia.org/resource/London. I'm using the following Java code, but I get no results. What am I doing wrong here?
String strings = "London";
String service = "http://dbpedia.org/sparql";
String query = "PREFIX dbo:<http://dbpedia.org/ontology/>"
+ "PREFIX : <http://dbpedia.org/resource/>"
+ "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#/>"
+ "select ?URI where {?URI rdfs:label "+strings+".}";
QueryExecution qe=QueryExecutionFactory.sparqlService(service, query);
ResultSet rs = qe.execSelect();
while (rs.hasNext()){
QuerySolution s= rs.nextSolution();
System.out.println(s.getResource("?URI").toString());
}

If you take a query like this to the DBpedia endpoint, you'll get a parse error:
select ?resource where {
?resource rdfs:label London
}
Virtuoso 37000 Error SP030: SPARQL compiler, line 4: syntax error at 'London' before '}'
SPARQL query:
define sql:big-data-const 0
#output-format:text/html
define sql:signal-void-variables 1 define input:default-graph-uri <http://dbpedia.org> select ?resource where {
?resource rdfs:label London
}
You need to put your strings inside single or double quotes. So you'd have a query like this, which parses correctly, but still has no results:
select ?resource where {
?resource rdfs:label "London"
}
SPARQL results
Much of the time, when you're working with DBpedia, it will help a lot to browse the data by hand to find out what's there, and then base your queries on that knowledge. In this case, you're interested in http://dbpedia.org/page/London. If you put that in a web browser, you'll see a bunch of the information London. If you scroll to the bottom, you can click on the N3/Turtle to see the same information in the Turtle serialization, which is very close to the SPARQL syntax. If you search in that page for rdfs:label, you'll see:
dbpedia:London rdfs:label "Londres"#fr ,
"London"#en ,
"London"#sv ,
"Londra"#it ,
"\u30ED\u30F3\u30C9\u30F3"#ja ,
"\u4F26\u6566"#zh ,
"Londres"#es ,
"Londyn"#pl ,
"Londen"#nl ,
"Londres"#pt ,
"\u041B\u043E\u043D\u0434\u043E\u043D"#ru ,
"London"#de ;
Most (perhaps all) of the labels in DBpedia have language tags, so you actually need to search for things that have the label "London"#en. For instance, you can use this query to get thes results:
select ?resource where {
?resource rdfs:label "London"#en
}
SPARQL results
Most of your code for the query is right, but you just need to be using the proper syntax for literals, and need to know what the data in DBpedia looks like. This is also a good case for parameterized SPARQL strings. There's an example in get latitude and longitude of a place dbpedia, but the short idea is that you should be able to write
select ?resource where {
?resource rdfs:label ?label
}
as a ParmeterizedSparqlString, and then use an API method to replace ?label with the literal that you want (in this case, "London"#en) without having to do any string concatenation. The ParameterizedSparqlString will also ensure that all the quoting is done correctly, and that the literal is formatted in the right way. Here's an example:
import com.hp.hpl.jena.query.ParameterizedSparqlString;
import com.hp.hpl.jena.query.QueryExecution;
import com.hp.hpl.jena.query.QueryExecutionFactory;
import com.hp.hpl.jena.query.ResultSet;
import com.hp.hpl.jena.query.ResultSetFactory;
import com.hp.hpl.jena.query.ResultSetFormatter;
import com.hp.hpl.jena.rdf.model.Literal;
import com.hp.hpl.jena.rdf.model.ResourceFactory;
public class LondonExample {
public static void main(String[] args) {
ParameterizedSparqlString qs = new ParameterizedSparqlString( "" +
"prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n" +
"\n" +
"select ?resource where {\n" +
" ?resource rdfs:label ?label\n" +
"}" );
Literal london = ResourceFactory.createLangLiteral( "London", "en" );
qs.setParam( "label", london );
System.out.println( qs );
QueryExecution exec = QueryExecutionFactory.sparqlService( "http://dbpedia.org/sparql", qs.asQuery() );
// Normally you'd just do results = exec.execSelect(), but I want to
// use this ResultSet twice, so I'm making a copy of it.
ResultSet results = ResultSetFactory.copyResults( exec.execSelect() );
while ( results.hasNext() ) {
// As RobV pointed out, don't use the `?` in the variable
// name here. Use *just* the name of the variable.
System.out.println( results.next().get( "resource" ));
}
// A simpler way of printing the results.
ResultSetFormatter.out( results );
}
}
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?resource where {
?resource rdfs:label "London"#en
}
http://dbpedia.org/resource/London
http://dbpedia.org/resource/Category:London
http://www.ontologyportal.org/SUMO#LondonUnitedKingdom
http://sw.opencyc.org/2008/06/10/concept/en/London
http://sw.opencyc.org/2008/06/10/concept/Mx8Ngx4rvWhwppwpEbGdrcN5Y29ycA-GTG9uZG9uHiu9WLbVnCkRsZ2tw3ljb3Jw
http://wikidata.dbpedia.org/resource/Q1442133
http://wikidata.dbpedia.org/resource/Q261303
http://wikidata.dbpedia.org/resource/Q79348
http://wikidata.dbpedia.org/resource/Q92561
http://wikidata.dbpedia.org/resource/Q3258936
http://wikidata.dbpedia.org/resource/Q84
http://wikidata.dbpedia.org/resource/Q6669759
http://wikidata.dbpedia.org/resource/Q6669762
http://wikidata.dbpedia.org/resource/Q6669763
http://wikidata.dbpedia.org/resource/Q6669771
http://wikidata.dbpedia.org/resource/Q586353
http://wikidata.dbpedia.org/resource/Q1310705
http://wikidata.dbpedia.org/resource/Q1749384
http://wikidata.dbpedia.org/resource/Q3836562
http://wikidata.dbpedia.org/resource/Q3836563
http://wikidata.dbpedia.org/resource/Q3836565
http://wikidata.dbpedia.org/resource/Q1001456
http://wikidata.dbpedia.org/resource/Q5712562
http://wikidata.dbpedia.org/resource/Q3061911
http://wikidata.dbpedia.org/resource/Q6669774
http://wikidata.dbpedia.org/resource/Q6669754
http://wikidata.dbpedia.org/resource/Q6669757
http://wikidata.dbpedia.org/resource/Q6669761
http://wikidata.dbpedia.org/resource/Q6669767
http://wikidata.dbpedia.org/resource/Q6669769
http://wikidata.dbpedia.org/resource/Q2477346
---------------------------------------------------------------------------------------------------------------
| resource |
===============================================================================================================
| <http://dbpedia.org/resource/London> |
| <http://dbpedia.org/resource/Category:London> |
| <http://www.ontologyportal.org/SUMO#LondonUnitedKingdom> |
| <http://sw.opencyc.org/2008/06/10/concept/en/London> |
| <http://sw.opencyc.org/2008/06/10/concept/Mx8Ngx4rvWhwppwpEbGdrcN5Y29ycA-GTG9uZG9uHiu9WLbVnCkRsZ2tw3ljb3Jw> |
| <http://wikidata.dbpedia.org/resource/Q1442133> |
| <http://wikidata.dbpedia.org/resource/Q261303> |
| <http://wikidata.dbpedia.org/resource/Q79348> |
| <http://wikidata.dbpedia.org/resource/Q92561> |
| <http://wikidata.dbpedia.org/resource/Q3258936> |
| <http://wikidata.dbpedia.org/resource/Q84> |
| <http://wikidata.dbpedia.org/resource/Q6669759> |
| <http://wikidata.dbpedia.org/resource/Q6669762> |
| <http://wikidata.dbpedia.org/resource/Q6669763> |
| <http://wikidata.dbpedia.org/resource/Q6669771> |
| <http://wikidata.dbpedia.org/resource/Q586353> |
| <http://wikidata.dbpedia.org/resource/Q1310705> |
| <http://wikidata.dbpedia.org/resource/Q1749384> |
| <http://wikidata.dbpedia.org/resource/Q3836562> |
| <http://wikidata.dbpedia.org/resource/Q3836563> |
| <http://wikidata.dbpedia.org/resource/Q3836565> |
| <http://wikidata.dbpedia.org/resource/Q1001456> |
| <http://wikidata.dbpedia.org/resource/Q5712562> |
| <http://wikidata.dbpedia.org/resource/Q3061911> |
| <http://wikidata.dbpedia.org/resource/Q6669774> |
| <http://wikidata.dbpedia.org/resource/Q6669754> |
| <http://wikidata.dbpedia.org/resource/Q6669757> |
| <http://wikidata.dbpedia.org/resource/Q6669761> |
| <http://wikidata.dbpedia.org/resource/Q6669767> |
| <http://wikidata.dbpedia.org/resource/Q6669769> |
| <http://wikidata.dbpedia.org/resource/Q2477346> |
---------------------------------------------------------------------------------------------------------------

The question mark in a variable name is just to distinguish it from other tokens in a SPARQL query. It could quite legally also be a $ and ?URI and $URI are the same variable.
When working with the Jena API (or pretty much any SPARQL API) you should not need to include the ?/$ in the variable name.
So simple change your getResource() call to omit the leading ? from the variable name like so:
System.out.println(s.getResource("URI").toString());
Note that a Jena ResultSet also has a getResultVars() method that provides a list of variables names present in the results that Jena will recognise. If you ever encounter a similar problem in the future it is worth printing out the variable names to see exactly how Jena is seeing the variables.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Matching strings with at least one word in common - java

Related

SQL query using Spark and java language

Issue with binding values from sub selection in Jena ARQ

Elasticsearch - how to group by and count matches in an index

Dynamic Array in RDF/XML

Query SPARQL to DBPedia using Java code

Categories

Resources