OpenRdf Exception when parsing data from DBPedia

OpenRdf Exception when parsing data from DBPedia - java

I use OpenRdf with Sparql to gather data from DBPedia but I encounter some errors on the following query ran against the DBPedia Sparql endpoint:
CONSTRUCT{
?battle ?relation ?data .
}
WHERE{
?battle rdf:type yago:Battle100953559 ;
?relation ?data .
FILTER(?relation != owl:sameAs)
}
LIMIT 1
OFFSET 18177
I modified the LIMIT and OFFSET to point out the specific result that provokes the problem.
The response is this one :
#prefix foaf: <http://xmlns.com/foaf/0.1/> .
#prefix ns1: <http://en.wikipedia.org/wiki/> .
<http://dbpedia.org/resource/Mongol%E2%80%93Jin_Dynasty_War> foaf:isPrimaryTopicOf ns1:Mongol–Jin_Dynasty_War .
The problem is that the ns1:Mongol–Jin_Dynasty_War entity contains a minus sign, therefore I get the following exception when running this query inside a Java application using OpenRdf :
org.openrdf.query.QueryEvaluationException: org.openrdf.rio.RDFParseException: Expected '.', found '–' [line 3]
Is there any way to circumvent this problem ?
Thanks !

To help other users who might encounter the same problem, I'll post here the way to set the preferred output format for Graph Queries using OpenRDF v2.7.x.
You need to creat a subclass of SPARQLRepository to access the HTTPClient (for some reason, the field is protected.
public class NtripleSPARQLRepository extends SPARQLRepository {
public NtripleSPARQLRepository(String endpointUrl) {
super(endpointUrl);
this.getHTTPClient().setPreferredRDFFormat(RDFFormat.NTRIPLES);
}
}
The you just need to create a new Instance of this class :
NtripleSPARQLRepository repository = new NtripleSPARQLRepository(service);
RepositoryConnection connection = new SPARQLConnection(repository);
Query query = connection.prepareQuery(QueryLanguage.SPARQL, "YOUR_QUERY");

If you are querying a Virtuoso server, then you are probably encountering sloppiness in the implementation of Virtuoso. I have seen this when getting XML results (vertical tab in output but only XML 1.0) and most recently in JSON results (\U escape for characters not in Basic Multilingual Plane).

Related

SPARQL Query to escape emojis?

I'm using SPARQL query to extract instances which is valid.
But using this query, I can also get instances which name contains emoticon (e.g., http://ko.dbpedia.org/resource/😼), and it gives me an error while iterating over the query resultsets. How can I escape from emojis?
SELECT DISTINCT ?s WHERE {
?s ?p ?o
FILTER regex(str(?s), "^http://ko.dbpedia.org/resource")
}
ORDER BY DESC(?s)
limit 100
Error message is as follows
Exception in thread "main" com.hp.hpl.jena.shared.JenaException: Convert results are FAILED.:virtuoso.jdbc4.VirtuosoException: Virtuoso Communications Link Failure (timeout) : malformed input around byte 34
at virtuoso.jena.driver.VirtuosoQueryExecution$VResultSet.moveForward(VirtuosoQueryExecution.java:498)
at virtuoso.jena.driver.VirtuosoQueryExecution$VResultSet.hasNext(VirtuosoQueryExecution.java:441)
at kr.ac.kaist.dm.BBox.TypeInference.LoadTriple.processTriples(LoadTriple.java:92)
at kr.ac.kaist.dm.BBox.TypeInference.TypeInferenceMain.main(TypeInferenceMain.java:110)
Sample Code is as follows.
VirtuosoQueryExecution vqe = VirtuosoQueryExecutionFactory.create(sparql, set);
ResultSet results = vqe.execSelect();
int i = 0;
while(results.hasNext()){ // <----- LoadTriple.java:92 here.
I just posted the extended version of this question on virtuoso-opensource issue #543.
I just want to escape from emoji rather than including all possible characters like "FILTER regex(?s, \"[a-zA-Z가-힣~!##$%^&*()-_=+|'<>]+\") }"

ENCODE_FOR_URI() should work:
FILTER regex(ENCODE_FOR_URI(str(?s)), "^http://ko.dbpedia.org/resource")
...though you would also need to URI encode the regex match string:
http%3A%2F%2Fko.dbpedia.org%2Fresource

Neo4J transactional REST api string escape doesn't work

While sending off Cypher queries to Neo4J's transactional Cypher API, I am running into the following error:
Neo.ClientError.Request.InvalidFormat Unable to deserialize request:
Unrecognized character escape ''' (code 39)
My Cypher query looks like this
MATCH (n:Test {id:'test'}) SET n.`label` = 'John Doe\'s house';
While this query works just fine when executed in Neo4J's browser interface it fails when using the REST API. Is this a bug or am I doing something wrong? In case this is not a bug, how do I have to escape ' to get it working in both?
Edit:
I found this answer and tested the triple single and triple double quotes but they just caused another Neo.ClientError.Request.InvalidFormat error to be thrown.
Note: I am using Neo4J 2.2.2
Note 2: Just in case it's important, below is the JSON body I am sending to the endpoint.
{"statements":[
{"statement": "MATCH (n:Test {id:'test'}) SET n.`label` = 'John Doe\'s house';"}
]}

You'll have to escape the \ too:
{"statements":[
{"statement": "MATCH (n:Test {id:'test'}) SET n.`label` = 'John Doe\\'s house';"}
]}
But if you use parameters (recommended), you can do
{"statements":[
{"statement": "MATCH (n:Test {id:'test'}) SET n.`label` = {lbl}",
"parameters" : {"lbl" : "Jane Doe's house"}
}
]}

Aggregating within Apache Jena

I'm using the Java API of Apache Jena to store and retrieve documents and the words within them. For this I decided to set up the following datastructure:
_dataset = TDBFactory.createDataset("./database");
_dataset.begin(ReadWrite.WRITE);
Model model = _dataset.getDefaultModel();
Resource document= model.createResource("http://name.space/Source/DocumentA");
document.addProperty(RDF.value, "Document A");
Resource word = model.createResource("http://name.space/Word/aword");
word.addProperty(RDF.value, "aword");
Resource resource = model.createResource();
resource.addProperty(RDF.value, word);
resource.addProperty(RSS.items, "5");
document.addProperty(RDF.type, resource);
_dataset.commit();
_dataset.end();
The code example above represents a document ("Document A") consisting of five (5) words ("aword"). The occurences of a word in a document are counted and stored as a property. A word can also occur in other documents, therefore the occurence count relating to a specific word in a specific document is linked together by a blank node. (I'm not entirely sure if this structure makes any sense as I'm fairly new to this way of storing information, so please feel free to provide better solutions!)
My major question is: How can I get a list of all distinct words and the sum of their occurences over all documents?

Your data model is a bit unconventional, in my opinion. With your code, you'll end up with data that looks like this (in Turtle notation), and which uses rdf:type and rdf:value in unconventional ways:
:doc rdf:value "document a" ;
rdf:type :resource .
:resource rdf:value :word ;
:items 5 .
:word rdf:value "aword" .
It's unusual, because usually you wouldn't have such complex information on the type attribute of a resource. From the SPARQL standpoint though, rdf:type and rdf:value are properties just like any other, and you can still retrieve the information you're looking for with a simple query. It would look more or less like this (though you'll need to define some prefixes, etc.):
select ?word (sum(?n) as ?nn) where {
?document rdf:type ?type .
?type rdf:value/rdf:value ?word ;
:items ?n .
}
group by ?word
That query will produce a result for each word, and with each will be the sum of all the values of the :items properties associated with the word. There are lots of questions on Stack Overflow that have examples of running SPARQL queries with Jena. E.g., (the first one that I found with Google): Query Jena TDB store.

Rally REST Query throwing MalformedJsonException

I'm trying to query something from the Rally database. Right now I'm just trying to make sure I can get through initially. This code:
//create rallyrest object
RallyRestApi restApi = new RallyRestApi(new URI(hostname), username, password);
restApi.setApplicationName("QueryTest");
restApi.setWsapiVersion("v2.0");
restApi.setApplicationVersion("1.1");
System.out.println("1: So far, so good -- RallyRestApi object created");
try {
//create query request
String type = "HierarchicalRequirement";
QueryRequest qreq = new QueryRequest(type);
System.out.println("2: Still going -- Query Request Created");
//set fetch, filter, and project
qreq.setFetch(new Fetch("Name","FormattedID"));
qreq.setQueryFilter(new QueryFilter("Name", "contains", "freight"));
qreq.setProject(projectNumber);
System.out.println("3: Going strong -- Fetch, Filter, and Project set");
//create response from query********Blows up
QueryResponse resp = restApi.query(qreq);
System.out.println("4: We made it!");
} catch (Exception e) {
System.out.println("Error: " + e);
}
finally {
restApi.close();
}
}
gives me this error:
Exception in thread "main" com.google.gson.JsonSyntaxException: com.google.gson.stream.MalformedJsonException: Use JsonReader.setLenient(true) to accept malformed JSON at line 1 column 22
at com.google.gson.JsonParser.parse(JsonParser.java:65)
at com.google.gson.JsonParser.parse(JsonParser.java:45)
at com.rallydev.rest.response.Response.<init>(Response.java:25)
at com.rallydev.rest.response.QueryResponse.<init>(QueryResponse.java:18)
at com.rallydev.rest.RallyRestApi.query(RallyRestApi.java:227)
at RQuery.main(RQuery.java:65)
Caused by: com.google.gson.stream.MalformedJsonException: Use JsonReader.setLenient(true) to accept malformed JSON at line 1 column 22
at com.google.gson.stream.JsonReader.syntaxError(JsonReader.java:1505)
at com.google.gson.stream.JsonReader.checkLenient(JsonReader.java:1386)
at com.google.gson.stream.JsonReader.doPeek(JsonReader.java:531)
at com.google.gson.stream.JsonReader.peek(JsonReader.java:414)
at com.google.gson.JsonParser.parse(JsonParser.java:60)
... 5 more
Java Result: 1
Could someone please explain why this is happening? Is my code wrong? If I need to do as the error suggests and set Json.lenient(true), please give me instructions on how to do that with my code.
Thank you!

Your code works for me. I don't see anything wrong with the code.
Try different query, maybe there are some extra characters.
See this post - it mentioned a case with trailing NUL (\0) characters. Perhaps you need to set lenient to true, but I don't know how to do it: there is no direct access to it when working with Rally QueryResponse.
The reason for trying a different query is that there are two local factors: your java environment and your data. MalformedJsonException indicates bad json which points to data. You are fetching only "Name" and "FormattedID", so chances are the culprit is somewhere in the name. Try a different query, e.g. (FormattedID = US123) but choose the story that does not contain "freight" in the name. Establish that at least one particular query works - it will indicate further that the issue is indeed related to data, and not the environment.
Next, try the same query (Name contains "freight") directly in WS API, which is an interactive document where queries can be tested. An equivalent of the query in your code can also be pasted in the browser:
https://rally1.rallydev.com/slm/webservice/v2.0/hierarchicalrequirement?workspace=https://rally1.rallydev.com/slm/webservice/v2.0/workspace/123&query=(Name%20contains%20%22fraight%22)&start=1&pagesize=200&fetch=Name,FormattedID
Make sure to replace 123 in /workspace/123 with the valid OID of your workspace.
Does the query return or you see the same error in the browser?
If the query returns, what is the TotalResultCount?
The total result count will help to troubleshoot further. You may run your code one page at a time, and knowing the TotalResultCount it is possible to manipulate pagesize, start and limit to narrow down your code to the page where the culprit story exists (assuming that there is a culprit story). Here is an example:
qreq.setPageSize(200);
qreq.setStart(2);
qreq.setLimit(200);
Maximum pagesize is 200. Default is 20. The actual number to use depends on TotalResultCount.
The start index for queries begins at 1. The default is 1. In this example we start with second page
My environment is Java SE 1.6 and these jars:
httpcore-4.2.4.jar
httpclient-4.2.5.jar
commons-logging-1.1.1.jar
commons-codec-1.6.jar
gson-2.2.4.jar
rally-rest-api-2.0.4.jar

Faceting using SolrJ and Solr4

I've gone through the related questions on this site but haven't found a relevant solution.
When querying my Solr4 index using an HTTP request of the form
&facet=true&facet.field=country
The response contains all the different countries along with counts per country.
How can I get this information using SolrJ?
I have tried the following but it only returns total counts across all countries, not per country:
solrQuery.setFacet(true);
solrQuery.addFacetField("country");
The following does seem to work, but I do not want to have to explicitly set all the groupings beforehand:
solrQuery.addFacetQuery("country:usa");
solrQuery.addFacetQuery("country:canada");
Secondly, I'm not sure how to extract the facet data from the QueryResponse object.
So two questions:
1) Using SolrJ how can I facet on a field and return the groupings without explicitly specifying the groups?
2) Using SolrJ how can I extract the facet data from the QueryResponse object?
Thanks.
Update:
I also tried something similar to Sergey's response (below).
List<FacetField> ffList = resp.getFacetFields();
log.info("size of ffList:" + ffList.size());
for(FacetField ff : ffList){
String ffname = ff.getName();
int ffcount = ff.getValueCount();
log.info("ffname:" + ffname + "|ffcount:" + ffcount);
}
The above code shows ffList with size=1 and the loop goes through 1 iteration. In the output ffname="country" and ffcount is the total number of rows that match the original query.
There is no per-country breakdown here.
I should mention that on the same solrQuery object I am also calling addField and addFilterQuery. Not sure if this impacts faceting:
solrQuery.addField("user-name");
solrQuery.addField("user-bio");
solrQuery.addField("country");
solrQuery.addFilterQuery("user-bio:" + "(Apple OR Google OR Facebook)");
Update 2:
I think I got it, again based on what Sergey said below. I extracted the List object using FacetField.getValues().
List<FacetField> fflist = resp.getFacetFields();
for(FacetField ff : fflist){
String ffname = ff.getName();
int ffcount = ff.getValueCount();
List<Count> counts = ff.getValues();
for(Count c : counts){
String facetLabel = c.getName();
long facetCount = c.getCount();
}
}
In the above code the label variable matches each facet group and count is the corresponding count for that grouping.

Actually you need only to set facet field and facet will be activated (check SolrJ source code):
solrQuery.addFacetField("country");
Where did you look for facet information? It must be in QueryResponse.getFacetFields (getValues.getCount)

In the solr Response you should use QueryResponse.getFacetFields() to get List of FacetFields among which figure "country". so "country" is idenditfied by QueryResponse.getFacetFields().get(0)
you iterate then over it to get List of Count objects using
QueryResponse.getFacetFields().get(0).getValues().get(i)
and get value name of facet using QueryResponse.getFacetFields().get(0).getValues().get(i).getName()
and the corresponding weight using
QueryResponse.getFacetFields().get(0).getValues().get(i).getCount()

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.