SPARQL query to escape emojis?

I'm using a SPARQL query to extract valid instances.
However, the query also returns instances whose names contain emoji (e.g., http://ko.dbpedia.org/resource/😼), and this causes an error while iterating over the query result set. How can I exclude the emoji?
SELECT DISTINCT ?s WHERE {
?s ?p ?o
FILTER regex(str(?s), "^http://ko.dbpedia.org/resource")
}
ORDER BY DESC(?s)
limit 100
The error message is as follows:
Exception in thread "main" com.hp.hpl.jena.shared.JenaException: Convert results are FAILED.:virtuoso.jdbc4.VirtuosoException: Virtuoso Communications Link Failure (timeout) : malformed input around byte 34
at virtuoso.jena.driver.VirtuosoQueryExecution$VResultSet.moveForward(VirtuosoQueryExecution.java:498)
at virtuoso.jena.driver.VirtuosoQueryExecution$VResultSet.hasNext(VirtuosoQueryExecution.java:441)
at kr.ac.kaist.dm.BBox.TypeInference.LoadTriple.processTriples(LoadTriple.java:92)
at kr.ac.kaist.dm.BBox.TypeInference.TypeInferenceMain.main(TypeInferenceMain.java:110)
The sample code is as follows:
VirtuosoQueryExecution vqe = VirtuosoQueryExecutionFactory.create(sparql, set);
ResultSet results = vqe.execSelect();
int i = 0;
while(results.hasNext()){ // <----- LoadTriple.java:92 here.
I just posted the extended version of this question on virtuoso-opensource issue #543.
I just want to exclude emoji, rather than whitelist every allowed character as in "FILTER regex(?s, \"[a-zA-Z가-힣~!@#$%^&*()-_=+|'<>]+\") }"

ENCODE_FOR_URI() should work:
FILTER regex(ENCODE_FOR_URI(str(?s)), "^http://ko.dbpedia.org/resource")
...though you would also need to URI encode the regex match string:
http%3A%2F%2Fko.dbpedia.org%2Fresource
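If the query string is built in Java, the encoded match string can be generated rather than typed by hand. The following is a minimal sketch: the class name UriPrefixEncoder is hypothetical, and java.net.URLEncoder implements HTML form encoding rather than SPARQL's ENCODE_FOR_URI, so the two can differ for some characters (for example spaces). For this particular prefix it yields the same string as shown above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, not part of Jena or Virtuoso.
final class UriPrefixEncoder {

    // Percent-encodes the raw prefix so it can be compared against
    // the output of ENCODE_FOR_URI(str(?s)) inside the FILTER.
    static String encode(String raw) {
        try {
            return URLEncoder.encode(raw, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a compliant JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String encoded = encode("http://ko.dbpedia.org/resource");
        System.out.println(encoded);
        System.out.println(
            "FILTER regex(ENCODE_FOR_URI(str(?s)), \"^" + encoded + "\")");
    }
}
```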


Unable to capture next line character in Java

I need to parse a Python file that contains multiple SQL queries and find the start and end positions of each query, so that I can extract only the query text, using Java.
I am using the .contains function to check for sql(''' as the opening marker of a query. For the closing marker I have ''') but in some cases ''') appears in the middle of the query, where a variable is involved, and it should not be detected as the end of the query.
Something like this :
spark.sql(''' SELECT .......
FROM.....
WHERE xxx IN ('''+ Variable +''')
''')
Here the second-to-last line is also detected as the end of the query if I use line.contains(" ''') "), which is wrong.
All I can think of is to check for a newline character at the end of the query, as the queries are separated by two empty lines. So I tried if (line.contains(" ''')\n") and if (line.contains(" ''')\r\n"), but neither works for me.
Kindly let me know of any other way to do this.
Note that I do not have the privilege to change the query file.
Thanks
I believe a simple contains won't solve this problem.
You will have to use Pattern if you are looking to match \n.
String query = "spark.sql(''' SELECT .......\n" +
"FROM..... \n" +
"WHERE xxx IN ('''+ Variable +''')\n" +
"''')";
Pattern pattern = Pattern.compile("^spark\\.sql\\('''(.*)'''\\)$", Pattern.DOTALL);
System.out.println(pattern.matcher(query).find());
Output:
true
Pattern.DOTALL tells Java to allow the dot to match newline characters, too.
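To pull several queries out of one file, the same idea can be extended with a lazy match and a lookahead. The sketch below is only an illustration under the assumption stated in the question, namely that a query's closing ''') is followed by a blank line or by the end of the file. The class name SparkSqlExtractor and the sample input are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts the text between spark.sql(''' and the
// closing ''') of each query in the file content.
final class SparkSqlExtractor {

    // (.*?) is lazy, so each candidate ''') is tried in order; the lookahead
    // accepts it only if a blank line or the end of input follows.
    private static final Pattern QUERY = Pattern.compile(
        "spark\\.sql\\('''(.*?)'''\\)[ \\t]*(?=\\R[ \\t]*\\R|\\s*\\z)",
        Pattern.DOTALL);

    static List<String> extract(String source) {
        List<String> queries = new ArrayList<>();
        Matcher m = QUERY.matcher(source);
        while (m.find()) {
            queries.add(m.group(1)); // m.start(1) / m.end(1) give the positions
        }
        return queries;
    }

    public static void main(String[] args) {
        String source =
            "spark.sql(''' SELECT a\n"
            + "FROM t\n"
            + "WHERE x IN ('''+ v +''')\n"
            + "''')\n\n\n"
            + "df = spark.sql(''' SELECT b FROM u ''')\n";
        for (String q : extract(source)) {
            System.out.println("[" + q + "]");
        }
    }
}
```

The whole file has to be read into a single String first (not line by line), because the match spans several lines.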

JPA Select query not returning results with one letter word

I have a query that returns no results when the search term starts with a one-letter word followed by a space and then another word (e.g., "T Distribution"). Searching for "Distribution" alone does return results, including those for "T Distribution". The behavior is the same for every search term that begins with a one-letter word followed by a space and another word.
The problem appears when the search term is of this pattern:
"[one-letter][space][letter/word]". example: "o ring".
Why would the LIKE operator not work correctly in this case?
Here is my query:
@Cacheable(value = "filteredConcept")
@Query("SELECT NEW sina.backend.data.model.ConceptSummaryVer04(s.id, s.arabicGloss, s.englishGloss, s.example, s.dataSourceId,
s.synsetFrequnecy, s.arabicWordsCache, s.englishWordsCache, s.superId, s.categoryId, s.dataSourceCacheAr, s.dataSourceCacheEn,
s.superTypeCasheAr, s.superTypeCasheEn, s.area, s.era, s.rank, s.undiacritizedArabicWordsCache, s.normalizedEnglishWordsCache,
s.isTranslation, s.isGloss, s.arabicSynonymsCount, s.englishSynonymsCount) FROM Concept s
where s.undiacritizedArabicWordsCache LIKE %:searchTerm% AND data_source_id != 200 AND data_source_id != 31")
List<ConceptSummaryVer04> findByArabicWordsCacheAndNotConcept(@Param("searchTerm") String searchTerm, Sort sort);
the result of the query on the database itself:
link to screenshot
results on the database are returned no matter the letters case:
link to screenshot
I solved this problem.
It was caused by the minimum word length of the full-text index on the MySQL database, which was set to 2 (ft_min_word_len = 2).
I changed that setting and rebuilt the index. After that, the query returned one-letter words.
See: 12.9.6 Fine-Tuning MySQL Full-Text Search
Use some quotes:
LIKE '%:searchTerm%';
Set searchTerm="%your_word%" and use it in the query like this:
... s.undiacritizedArabicWordsCache LIKE :searchTerm ...
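A small sketch of how the wildcard-wrapped value could be built before it is passed as the :searchTerm parameter. The class name LikePatterns is hypothetical, and the backslash escaping of % and _ is an assumption: it is the default LIKE escape character in MySQL, but other databases may need an explicit ESCAPE clause.

```java
// Hypothetical helper for building a "contains" LIKE pattern.
final class LikePatterns {

    // Escapes the LIKE wildcards in the user's term, then wraps it in %...%
    // so that a term such as "o ring" matches anywhere in the column.
    static String contains(String term) {
        String escaped = term
            .replace("\\", "\\\\") // escape the escape character first
            .replace("%", "\\%")
            .replace("_", "\\_");
        return "%" + escaped + "%";
    }

    public static void main(String[] args) {
        System.out.println(contains("o ring"));
        System.out.println(contains("50%"));
    }
}
```

The repository method would then be called with LikePatterns.contains(userInput) while the query itself uses LIKE :searchTerm without the surrounding % signs.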

Neo4J transactional REST api string escape doesn't work

While sending off Cypher queries to Neo4J's transactional Cypher API, I am running into the following error:
Neo.ClientError.Request.InvalidFormat Unable to deserialize request:
Unrecognized character escape ''' (code 39)
My Cypher query looks like this
MATCH (n:Test {id:'test'}) SET n.`label` = 'John Doe\'s house';
While this query works just fine when executed in Neo4J's browser interface it fails when using the REST API. Is this a bug or am I doing something wrong? In case this is not a bug, how do I have to escape ' to get it working in both?
Edit:
I found this answer and tested the triple single and triple double quotes but they just caused another Neo.ClientError.Request.InvalidFormat error to be thrown.
Note: I am using Neo4J 2.2.2
Note 2: Just in case it's important, below is the JSON body I am sending to the endpoint.
{"statements":[
{"statement": "MATCH (n:Test {id:'test'}) SET n.`label` = 'John Doe\'s house';"}
]}
You'll have to escape the \ too:
{"statements":[
{"statement": "MATCH (n:Test {id:'test'}) SET n.`label` = 'John Doe\\'s house';"}
]}
But if you use parameters (recommended), you can do
{"statements":[
{"statement": "MATCH (n:Test {id:'test'}) SET n.`label` = {lbl}",
"parameters" : {"lbl" : "Jane Doe's house"}
}
]}
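When the request body is assembled in Java, the same rules apply: the backslash and the double quote must be escaped for JSON, while a single quote needs no escaping at all. The sketch below builds the parameterized body by hand purely for illustration. The class name CypherRequestBody is hypothetical, the {lbl} placeholder follows the syntax used above, and in real code a JSON library would normally do the escaping.

```java
// Hypothetical helper: builds the JSON body for a single parameterized statement.
final class CypherRequestBody {

    // Minimal JSON string escaping: quotes, backslashes and control characters.
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    // One statement with one string parameter.
    static String build(String statement, String paramName, String paramValue) {
        return "{\"statements\":[{\"statement\":" + jsonString(statement)
            + ",\"parameters\":{" + jsonString(paramName) + ":"
            + jsonString(paramValue) + "}}]}";
    }

    public static void main(String[] args) {
        System.out.println(build(
            "MATCH (n:Test {id:'test'}) SET n.`label` = {lbl}",
            "lbl", "Jane Doe's house"));
    }
}
```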

OpenRdf Exception when parsing data from DBPedia

I use OpenRdf with Sparql to gather data from DBPedia but I encounter some errors on the following query ran against the DBPedia Sparql endpoint:
CONSTRUCT{
?battle ?relation ?data .
}
WHERE{
?battle rdf:type yago:Battle100953559 ;
?relation ?data .
FILTER(?relation != owl:sameAs)
}
LIMIT 1
OFFSET 18177
I modified the LIMIT and OFFSET to point out the specific result that provokes the problem.
The response is this one :
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ns1: <http://en.wikipedia.org/wiki/> .
<http://dbpedia.org/resource/Mongol%E2%80%93Jin_Dynasty_War> foaf:isPrimaryTopicOf ns1:Mongol–Jin_Dynasty_War .
The problem is that the ns1:Mongol–Jin_Dynasty_War entity contains an en dash (–) rather than a plain hyphen, so I get the following exception when running this query inside a Java application using OpenRdf:
org.openrdf.query.QueryEvaluationException: org.openrdf.rio.RDFParseException: Expected '.', found '–' [line 3]
Is there any way to circumvent this problem ?
Thanks !
To help other users who might encounter the same problem, I'll post here how to set the preferred output format for graph queries using OpenRDF v2.7.x.
You need to create a subclass of SPARQLRepository to access the HTTPClient (for some reason, the field is protected).
public class NtripleSPARQLRepository extends SPARQLRepository {
public NtripleSPARQLRepository(String endpointUrl) {
super(endpointUrl);
this.getHTTPClient().setPreferredRDFFormat(RDFFormat.NTRIPLES);
}
}
Then you just need to create a new instance of this class:
NtripleSPARQLRepository repository = new NtripleSPARQLRepository(service);
RepositoryConnection connection = new SPARQLConnection(repository);
Query query = connection.prepareQuery(QueryLanguage.SPARQL, "YOUR_QUERY");
If you are querying a Virtuoso server, then you are probably encountering sloppiness in the implementation of Virtuoso. I have seen this when getting XML results (vertical tab in output but only XML 1.0) and most recently in JSON results (\U escape for characters not in Basic Multilingual Plane).

How to replace particular string in JAVA?

I have string like
order by o desc,b asc
Here I want to replace o and b columns of this clause by table_o and table_b and output
order by table_o desc, table_b asc
I am using replace function for that but output becomes like
table_order table_by table_o desc,table_b asc
How to solve this problem using regular expression?
One more example
"order by orders desc, bye asc"
should be replaced as
"order by table_orders desc, table_bye asc"
Here is one possible solution. [You might have to tweak spaces around desc asc and , based on your actual SQL]
String str = "select a,b,c * from Table order by o desc,b asc,c,d";
System.out.println(str.replaceAll(
"(.*order by )?(\\w+)( desc| asc)?(,|$)", "$1table_$2$3$4"));
Result
select a,b,c * from Table order by table_o desc,table_b asc,table_c,table_d
Regex details
(.*order by )? => matches select a,b,c * from Table order by => back ref $1
(\\w+) => matches a column name => back ref $2
( desc| asc)? => matches desc or asc => back ref $3
(,|$) => matches a trailing comma or the end of the line => back ref $4
Please note: this solution only works with simple SQL queries, and would produce wrong results if the order by clause is part of an inner query of a complex SQL statement. Moreover, regex is not an ideal tool for parsing SQL syntax.
See this link Regular expression to match common SQL syntax?
If full-fledged SQL parsing is required, it's better to use either SQL parsers or parser generators like ANTLR to parse SQL. See this link for a list of available ANTLR SQL grammars.
If you just want to replace text like that just use these regexes:
" o "
" b "
Probably you are looking for this: Regular Expressions in Java SE & EE. Have a look at the Regular Expressions chapter, which will do the job most of the time.
Simply use a space in the replace function (you do not need a regex).
Pseudo-code:
string = string_replace(string, " o ", " table_o ")
Edit:
After your example: you can put every valid boundary character between [ and ]. The regex will then match it. To keep the original boundary, capture it between ( and ) and put it back in the replacement.
E.g.:
string = regex_replace(string, "([ \t])o([ \t,])", "\1table_o\2")
\1 and \2 might be different in your regex implementation.
Also I'd suggest clarifying your case so that it is clear what you really want to replace and also take a look at Truth's suggestion of the XY problem.
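In Java, the boundary idea above can also be expressed with the \b word-boundary anchor, which avoids capturing and re-inserting the neighbouring characters. This is a minimal sketch assuming the column names to rewrite are known in advance; the class name ColumnPrefixer and its method are hypothetical.

```java
import java.util.StringJoiner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: prefixes whole-word occurrences of the given column names.
final class ColumnPrefixer {

    static String prefix(String clause, String prefix, String... columns) {
        StringJoiner alternatives = new StringJoiner("|");
        for (String column : columns) {
            alternatives.add(Pattern.quote(column)); // treat names literally
        }
        // \b ensures "o" does not match inside "order" and "b" not inside "by".
        return clause.replaceAll(
            "\\b(" + alternatives + ")\\b",
            Matcher.quoteReplacement(prefix) + "$1");
    }

    public static void main(String[] args) {
        System.out.println(prefix("order by o desc,b asc", "table_", "o", "b"));
        System.out.println(
            prefix("order by orders desc, bye asc", "table_", "orders", "bye"));
    }
}
```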
You can use code like this to convert your text:
String sql = "select o, b, c,d from Table order by orders ,b asc, c desc,d desc, e";
String text = sql.toLowerCase();
String orderBy = "order by ";
int start = text.indexOf(orderBy);
if (start >= 0) {
String subtext = text.substring(start+orderBy.length());
System.out.printf("Replaced: [%s%s%s]%n", text.substring(0, start), orderBy, subtext.replaceAll("(\\w+)(\\s+(?:asc|desc)?,?\\s*)?", "table_$1$2"));
}
OUTPUT:
Replaced: [select o, b, c,d from table order by table_orders ,table_b asc, table_c desc,table_d desc, table_e]
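When the column names are not known in advance, another option is to split the order by clause on commas and prefix the first token of each item, without lowercasing the rest of the statement. The sketch below assumes plain column names (no function calls containing commas) and nothing after the order by clause, such as LIMIT. The class name OrderByPrefixer is hypothetical.

```java
import java.util.StringJoiner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: prefixes every column in a trailing ORDER BY clause.
final class OrderByPrefixer {

    private static final Pattern ORDER_BY =
        Pattern.compile("(?i)\\border\\s+by\\s+");

    static String prefixOrderBy(String sql, String prefix) {
        Matcher m = ORDER_BY.matcher(sql);
        if (!m.find()) {
            return sql; // no ORDER BY clause, nothing to do
        }
        String head = sql.substring(0, m.end());
        StringJoiner items = new StringJoiner(", ");
        for (String item : sql.substring(m.end()).split(",")) {
            // Each item looks like "col", "col asc" or "col desc".
            items.add(prefix + item.trim());
        }
        return head + items;
    }

    public static void main(String[] args) {
        System.out.println(prefixOrderBy("order by o desc,b asc", "table_"));
        System.out.println(
            prefixOrderBy("order by orders desc, bye asc", "table_"));
    }
}
```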
