Query SPARQL to DBPedia using Java code

Query SPARQL to DBPedia using Java code - java

I would like to get the URIs of the page on DBPedia that have a label equal to "London". That is, when I query DBPedia, if a page the property rdfs:label with the value "London", then I want to get its URI, e.g., http://dbpedia.org/resource/London. I'm using the following Java code, but I get no results. What am I doing wrong here?
String strings = "London";
String service = "http://dbpedia.org/sparql";
String query = "PREFIX dbo:<http://dbpedia.org/ontology/>"
+ "PREFIX : <http://dbpedia.org/resource/>"
+ "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#/>"
+ "select ?URI where {?URI rdfs:label "+strings+".}";
QueryExecution qe=QueryExecutionFactory.sparqlService(service, query);
ResultSet rs = qe.execSelect();
while (rs.hasNext()){
QuerySolution s= rs.nextSolution();
System.out.println(s.getResource("?URI").toString());
}

If you take a query like this to the DBpedia endpoint, you'll get a parse error:
select ?resource where {
?resource rdfs:label London
}
Virtuoso 37000 Error SP030: SPARQL compiler, line 4: syntax error at 'London' before '}'
SPARQL query:
define sql:big-data-const 0
#output-format:text/html
define sql:signal-void-variables 1 define input:default-graph-uri <http://dbpedia.org> select ?resource where {
?resource rdfs:label London
}
You need to put your strings inside single or double quotes. So you'd have a query like this, which parses correctly, but still has no results:
select ?resource where {
?resource rdfs:label "London"
}
SPARQL results
Much of the time, when you're working with DBpedia, it will help a lot to browse the data by hand to find out what's there, and then base your queries on that knowledge. In this case, you're interested in http://dbpedia.org/page/London. If you put that in a web browser, you'll see a bunch of the information London. If you scroll to the bottom, you can click on the N3/Turtle to see the same information in the Turtle serialization, which is very close to the SPARQL syntax. If you search in that page for rdfs:label, you'll see:
dbpedia:London rdfs:label "Londres"#fr ,
"London"#en ,
"London"#sv ,
"Londra"#it ,
"\u30ED\u30F3\u30C9\u30F3"#ja ,
"\u4F26\u6566"#zh ,
"Londres"#es ,
"Londyn"#pl ,
"Londen"#nl ,
"Londres"#pt ,
"\u041B\u043E\u043D\u0434\u043E\u043D"#ru ,
"London"#de ;
Most (perhaps all) of the labels in DBpedia have language tags, so you actually need to search for things that have the label "London"#en. For instance, you can use this query to get thes results:
select ?resource where {
?resource rdfs:label "London"#en
}
SPARQL results
Most of your code for the query is right, but you just need to be using the proper syntax for literals, and need to know what the data in DBpedia looks like. This is also a good case for parameterized SPARQL strings. There's an example in get latitude and longitude of a place dbpedia, but the short idea is that you should be able to write
select ?resource where {
?resource rdfs:label ?label
}
as a ParmeterizedSparqlString, and then use an API method to replace ?label with the literal that you want (in this case, "London"#en) without having to do any string concatenation. The ParameterizedSparqlString will also ensure that all the quoting is done correctly, and that the literal is formatted in the right way. Here's an example:
import com.hp.hpl.jena.query.ParameterizedSparqlString;
import com.hp.hpl.jena.query.QueryExecution;
import com.hp.hpl.jena.query.QueryExecutionFactory;
import com.hp.hpl.jena.query.ResultSet;
import com.hp.hpl.jena.query.ResultSetFactory;
import com.hp.hpl.jena.query.ResultSetFormatter;
import com.hp.hpl.jena.rdf.model.Literal;
import com.hp.hpl.jena.rdf.model.ResourceFactory;
public class LondonExample {
public static void main(String[] args) {
ParameterizedSparqlString qs = new ParameterizedSparqlString( "" +
"prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n" +
"\n" +
"select ?resource where {\n" +
" ?resource rdfs:label ?label\n" +
"}" );
Literal london = ResourceFactory.createLangLiteral( "London", "en" );
qs.setParam( "label", london );
System.out.println( qs );
QueryExecution exec = QueryExecutionFactory.sparqlService( "http://dbpedia.org/sparql", qs.asQuery() );
// Normally you'd just do results = exec.execSelect(), but I want to
// use this ResultSet twice, so I'm making a copy of it.
ResultSet results = ResultSetFactory.copyResults( exec.execSelect() );
while ( results.hasNext() ) {
// As RobV pointed out, don't use the `?` in the variable
// name here. Use *just* the name of the variable.
System.out.println( results.next().get( "resource" ));
}
// A simpler way of printing the results.
ResultSetFormatter.out( results );
}
}
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?resource where {
?resource rdfs:label "London"#en
}
http://dbpedia.org/resource/London
http://dbpedia.org/resource/Category:London
http://www.ontologyportal.org/SUMO#LondonUnitedKingdom
http://sw.opencyc.org/2008/06/10/concept/en/London
http://sw.opencyc.org/2008/06/10/concept/Mx8Ngx4rvWhwppwpEbGdrcN5Y29ycA-GTG9uZG9uHiu9WLbVnCkRsZ2tw3ljb3Jw
http://wikidata.dbpedia.org/resource/Q1442133
http://wikidata.dbpedia.org/resource/Q261303
http://wikidata.dbpedia.org/resource/Q79348
http://wikidata.dbpedia.org/resource/Q92561
http://wikidata.dbpedia.org/resource/Q3258936
http://wikidata.dbpedia.org/resource/Q84
http://wikidata.dbpedia.org/resource/Q6669759
http://wikidata.dbpedia.org/resource/Q6669762
http://wikidata.dbpedia.org/resource/Q6669763
http://wikidata.dbpedia.org/resource/Q6669771
http://wikidata.dbpedia.org/resource/Q586353
http://wikidata.dbpedia.org/resource/Q1310705
http://wikidata.dbpedia.org/resource/Q1749384
http://wikidata.dbpedia.org/resource/Q3836562
http://wikidata.dbpedia.org/resource/Q3836563
http://wikidata.dbpedia.org/resource/Q3836565
http://wikidata.dbpedia.org/resource/Q1001456
http://wikidata.dbpedia.org/resource/Q5712562
http://wikidata.dbpedia.org/resource/Q3061911
http://wikidata.dbpedia.org/resource/Q6669774
http://wikidata.dbpedia.org/resource/Q6669754
http://wikidata.dbpedia.org/resource/Q6669757
http://wikidata.dbpedia.org/resource/Q6669761
http://wikidata.dbpedia.org/resource/Q6669767
http://wikidata.dbpedia.org/resource/Q6669769
http://wikidata.dbpedia.org/resource/Q2477346
---------------------------------------------------------------------------------------------------------------
| resource |
===============================================================================================================
| <http://dbpedia.org/resource/London> |
| <http://dbpedia.org/resource/Category:London> |
| <http://www.ontologyportal.org/SUMO#LondonUnitedKingdom> |
| <http://sw.opencyc.org/2008/06/10/concept/en/London> |
| <http://sw.opencyc.org/2008/06/10/concept/Mx8Ngx4rvWhwppwpEbGdrcN5Y29ycA-GTG9uZG9uHiu9WLbVnCkRsZ2tw3ljb3Jw> |
| <http://wikidata.dbpedia.org/resource/Q1442133> |
| <http://wikidata.dbpedia.org/resource/Q261303> |
| <http://wikidata.dbpedia.org/resource/Q79348> |
| <http://wikidata.dbpedia.org/resource/Q92561> |
| <http://wikidata.dbpedia.org/resource/Q3258936> |
| <http://wikidata.dbpedia.org/resource/Q84> |
| <http://wikidata.dbpedia.org/resource/Q6669759> |
| <http://wikidata.dbpedia.org/resource/Q6669762> |
| <http://wikidata.dbpedia.org/resource/Q6669763> |
| <http://wikidata.dbpedia.org/resource/Q6669771> |
| <http://wikidata.dbpedia.org/resource/Q586353> |
| <http://wikidata.dbpedia.org/resource/Q1310705> |
| <http://wikidata.dbpedia.org/resource/Q1749384> |
| <http://wikidata.dbpedia.org/resource/Q3836562> |
| <http://wikidata.dbpedia.org/resource/Q3836563> |
| <http://wikidata.dbpedia.org/resource/Q3836565> |
| <http://wikidata.dbpedia.org/resource/Q1001456> |
| <http://wikidata.dbpedia.org/resource/Q5712562> |
| <http://wikidata.dbpedia.org/resource/Q3061911> |
| <http://wikidata.dbpedia.org/resource/Q6669774> |
| <http://wikidata.dbpedia.org/resource/Q6669754> |
| <http://wikidata.dbpedia.org/resource/Q6669757> |
| <http://wikidata.dbpedia.org/resource/Q6669761> |
| <http://wikidata.dbpedia.org/resource/Q6669767> |
| <http://wikidata.dbpedia.org/resource/Q6669769> |
| <http://wikidata.dbpedia.org/resource/Q2477346> |
---------------------------------------------------------------------------------------------------------------

The question mark in a variable name is just to distinguish it from other tokens in a SPARQL query. It could quite legally also be a $ and ?URI and $URI are the same variable.
When working with the Jena API (or pretty much any SPARQL API) you should not need to include the ?/$ in the variable name.
So simple change your getResource() call to omit the leading ? from the variable name like so:
System.out.println(s.getResource("URI").toString());
Note that a Jena ResultSet also has a getResultVars() method that provides a list of variables names present in the results that Jena will recognise. If you ever encounter a similar problem in the future it is worth printing out the variable names to see exactly how Jena is seeing the variables.

Related

UNNEST(ARRAY[]) returning a single row with paranthesis

I am trying to flatten a list of strings coming from the UI using the following SQL query
#Query(value = "INSERT INTO mydb.temp select unnest(array[:myList]) ", nativeQuery = true)
public void findrows(
#Param("myList") List<String> myList) throws MDBServiceException;
The result I was getting is this:
| id |
| -------- |
| (A01,B01)|
Instead I want my result to be like this:
| id |
| --- |
| A01 |
| B01 |
I am also trying with json_array_text_elements but no luck. Any help is much appreciated.

Set returning functions (like unnest) should be used in the FROM clause. And an INSERT statement should always specify the target columns for which values are provided:
INSERT INTO mydb.temp (col1, col2, col3)
select c1, c2, c3
from unnest(array[:myList]) as u(c1, c2, c3)

SQL query using Spark and java language

I have two dataframe on spark .
The first dataframe1 is :
+--------------+--------------+--------------+
|id_z |longitude |latitude |
+--------------+--------------+--------------+
|[12,20,30 ] |-7.0737816 | 33.82666 |
|13 |-7.5952683 | 33.5441916 |
+--------------+--------------+--------------+
The second dataframe2 is :
+--------------+--------------+---------------+
|id_z2 |longitude2 |latitude2 |
+--------------+--------------+---------------+
| 14 |-8.5952683 | 38.5441916 |
| 12 |-7.0737816 | 33.82666 |
+--------------+--------------+---------------+
I want to apply the logic of the following request.
String sql = "SELECT * FROM dataframe2 WHERE dataframe2 .id_z2 IN ("'"
+ id_z +"'") and longitude2 = "'"+longitude+"'" and latitude = "'"+latitude+"'"";
I prefer not to have a join, is it possible to do this?
I really need your help , or just a starting point that will make things easier for me.
Thnak you

Elasticsearch - how to group by and count matches in an index

I have an instance of Elasticsearch running with thousands of documents. My index has 2 fields like this:
|____Type_____|__ Date_added __ |
| walking | 2018-11-27T00:00:00.000 |
| walking | 2018-11-26T00:00:00.000 |
| running | 2018-11-24T00:00:00.000 |
| running | 2018-11-25T00:00:00.000 |
| walking | 2018-11-27T04:00:00.000 |
I want to group by and count how many matches were found for the "type" field, in a certain range.
In SQL I would do something like this:
select type,
count(type)
from index
where date_added between '2018-11-20' and '2018-11-30'
group by type
I want to get something like this:
| type | count |
| running | 2 |
| walking | 3 |
I'm using the High Level Rest Client api in my project, so far my query looks like this, it's only filtering by the start and end time:
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.query(QueryBuilders
.boolQuery()
.must(QueryBuilders
.rangeQuery("date_added")
.from(start.getTime())
.to(end.getTime()))
)
);
How can I do a "group by" in the "type" field? Is it possible to do this in ElasticSearch?

That's a good start! Now you need to add a terms aggregation to your query:
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.query(QueryBuilders.boolQuery()
.must(QueryBuilders
.rangeQuery("date_added")
.from(start.getTime())
.to(end.getTime()))
)
);
// add these two lines
TermsAggregationBuilder groupBy = AggregationBuilders.terms("byType").field("type.keyword");
sourceBuilder.aggregation(groupBy);

After using Val's reply to aggregate the fields, I wanted to print the aggregations of my query together with the value of them. Here's what I did:
Terms terms = searchResponse.getAggregations().get("byType");
Collection<Terms.Bucket> buckets = (Collection<Bucket>) terms.getBuckets();
for (Bucket bucket : buckets) {
System.out.println("Type: " + bucket.getKeyAsString() + " = Count("+bucket.getDocCount()+")");
}
This is the output after running the query in an index with 2700 documents with a field called "type" and 2 different types:
Type: walking = Count(900)
Type: running = Count(1800)

Spark Sql, unable to query multiple possible values in a array

I have the data schema of LinkeIn account as shown below. I need to query the skills which is in the for of array, where array may contains either JAVA OR java OR Java or JAVA developer OR Java developer.
Dataset<Row> sqlDF = spark.sql("SELECT * FROM people"
+ " WHERE ARRAY_CONTAINS(skills,'Java') "
+ " OR ARRAY_CONTAINS(skills,'JAVA')"
+ " OR ARRAY_CONTAINS(skills,'Java developer') "
+ "AND ARRAY_CONTAINS(experience['description'],'Java developer')" );
The above query is what i have tried and please suggest any better way.and also how to use case-insentive query ?

df.printschema()
root
|-- skills: array (nullable = true)
| |-- element: string (containsNull = true)
df.show()
+--------------------+
| skills|
+--------------------+
| [Java, java]|
|[Java Developer, ...|
| [dev]|
+--------------------+
Now lets register it as a temp table:
>>> df.registerTempTable("t")
Now, we will explode the array, convert each element as lower case and query using LIKE operator:
>>> res = sqlContext.sql("select skills, lower(skill) as skill from (select skills, explode(skills) skill from t) a where lower(skill) like '%java%'")
>>> res.show()
+--------------------+--------------+
| skills| skill|
+--------------------+--------------+
| [Java, java]| java|
| [Java, java]| java|
|[Java Developer, ...|java developer|
|[Java Developer, ...| java dev|
+--------------------+--------------+
Now, you can do a distinct on skills field.

Matching strings with at least one word in common

I'm making a query to get the URIs of documents, that have a specific title. My query is:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dc: <http://purl.org/dc/elements/1.1/> SELECT ?document WHERE {
?document dc:title ?title.
FILTER (?title = "…" ).
}
where "…" is actually the value of this.getTitle(), since the query string is generated by:
String queryString = "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> " +
"PREFIX dc: <http://purl.org/dc/elements/1.1/> SELECT ?document WHERE { " +
"?document dc:title ?title." +
"FILTER (?title = \"" + this.getTitle() + "\" ). }";
With the query above, I get only the documents with titles exactly like this.getTitle(). Imagine this.getTitle is formed by more than 1 word. I'd like to get documents even if only one word forming this.getTitle appears on the document title (for example). How could I do that?

Let's say you've got some data like (in Turtle):
#prefix : <http://stackoverflow.com/q/20203733/1281433> .
#prefix dc: <http://purl.org/dc/elements/1.1/> .
:a dc:title "Great Gatsby" .
:b dc:title "Boring Gatsby" .
:c dc:title "Great Expectations" .
:d dc:title "The Great Muppet Caper" .
Then you can use a query like:
prefix : <http://stackoverflow.com/q/20203733/1281433>
prefix dc: <http://purl.org/dc/elements/1.1/>
select ?x ?title where {
# this is just in place of this.getTitle(). It provides a value for
# ?TITLE that is "Gatsby Strikes Again".
values ?TITLE { "Gatsby Strikes Again" }
# Select a thing and its title.
?x dc:title ?title .
# Then filter based on whether the ?title matches the result
# of replacing the strings in ?TITLE with "|", and matching
# case insensitively.
filter( regex( ?title, replace( ?TITLE, " ", "|" ), "i" ))
}
to get results like
------------------------
| x | title |
========================
| :b | "Boring Gatsby" |
| :a | "Great Gatsby" |
------------------------
What's particularly neat about this is that since you're generating the pattern on the fly, you could even make it based on another value from the graph pattern. For instance, if you want all pairs of things whose titles match on at least one word, you could do:
prefix : <http://stackoverflow.com/q/20203733/1281433>
prefix dc: <http://purl.org/dc/elements/1.1/>
select ?x ?xtitle ?y ?ytitle where {
?x dc:title ?xtitle .
?y dc:title ?ytitle .
filter( regex( ?xtitle, replace( ?ytitle, " ", "|" ), "i" ) && ?x != ?y )
}
order by ?x ?y
to get:
-----------------------------------------------------------------
| x | xtitle | y | ytitle |
=================================================================
| :a | "Great Gatsby" | :b | "Boring Gatsby" |
| :a | "Great Gatsby" | :c | "Great Expectations" |
| :a | "Great Gatsby" | :d | "The Great Muppet Caper" |
| :b | "Boring Gatsby" | :a | "Great Gatsby" |
| :c | "Great Expectations" | :a | "Great Gatsby" |
| :c | "Great Expectations" | :d | "The Great Muppet Caper" |
| :d | "The Great Muppet Caper" | :a | "Great Gatsby" |
| :d | "The Great Muppet Caper" | :c | "Great Expectations" |
-----------------------------------------------------------------
Of course, it's very important to note that you're pulling generating patterns based on your data now, and that means that someone who can put data into your system could put very expensive patterns in to bog down the query and cause a denial-of-service. On a more mundane note, you could run into trouble if any of your titles have characters in them that would interfere with the regular expressions. One interesting problem would be if something had a title with multiple spaces so that the pattern became The|Words|With||Two|Spaces, since the empty pattern in there might make everything match. This is an interesting approach, but it's got a lot of caveats.
In general, you could do this as shown here, or by generating the regular expression in code (where you can take care of escaping, etc.), or you could use a SPARQL engine that supports some text-based extensions (e.g., jena-text, which adds Apache Lucene or Apache Solr to Apache Jena).

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Query SPARQL to DBPedia using Java code - java

Related

UNNEST(ARRAY[]) returning a single row with paranthesis

SQL query using Spark and java language

Elasticsearch - how to group by and count matches in an index

Spark Sql, unable to query multiple possible values in a array

Matching strings with at least one word in common

Categories

Resources