Cypher: do not repeat the same results - java

I have a Cypher query that goes like this:
start n=node(*)
match p=n-[r:OWES*1..200]->n
return extract(s in relationships(p) : s.amount),
extract(t in nodes(p) : t.name),
length(p)
The query returns nodes that form a closed cycle connected by OWES relationships, up to 200 levels deep.
The results are:
2
[155.55, 100641359]
[LJUBOJEVIC STR I PRZIONA KAFE VL.LJ , SASA , LJUBOJEVIC STR I PRZIONA KAFE VL.LJ ]
2
[100641359, 155.55]
[SASA , LJUBOJEVIC STR I PRZIONA KAFE VL.LJ , SASA ]
3
[100641359, 100641367, 550111.55]
[SASA , LJUBOJEVIC STR I PRZIONA KAFE VL.LJ , ADVOKAT KOSTIC JEVREM VRBAS , SASA ]
3
[100641367, 550111.55, 100641359]
[LJUBOJEVIC STR I PRZIONA KAFE VL.LJ , ADVOKAT KOSTIC JEVREM VRBAS , SASA , LJUBOJEVIC STR I PRZIONA KAFE VL.LJ ]
3
[550111.55, 100641359, 100641367]
[ADVOKAT KOSTIC JEVREM VRBAS , SASA , LJUBOJEVIC STR I PRZIONA KAFE VL.LJ , ADVOKAT KOSTIC JEVREM VRBAS ]
So each cycle is returned several times: a cycle with 3 relationships gives 3 results and a cycle with 2 relationships gives 2, each one the same cycle in a different order. How can I change my Cypher query to return each path only once, without giving up the * in the pattern? If that is not possible in Cypher, can I handle it in Java somehow?

This is using Cypher 2.0 because I'm making use of the STARTNODE function.
It is a bit of a monstrosity, but it works. I wouldn't use it without adding some serious constraints to keep the overall collection size down.
CREATE
(a {name:'A'}),
(b {name:'B'}),
(c {name:'C'}),
(d {name:'D'}),
(e {name:'E'}),
(f {name:'F'}),
a-[:OWES {amount:100}]->b,
b-[:OWES {amount:200}]->c,
c-[:OWES {amount:300}]->a,
e-[:OWES {amount:400}]->f,
f-[:OWES {amount:500}]->e
start nn=node(*)
MATCH nn-[nr:OWES]->()
WITH nn, nr ORDER BY nn.name, nr.amount
WITH COLLECT([nn, nr.amount]) as sortedPairs
START n=node(*)
match p=n-[r:OWES*1..200]->n
WITH sortedPairs,
extract(s in r: [STARTNODE(s), s.amount]) as pairs
WITH
filter(sp in sortedPairs: ANY(f in pairs WHERE HEAD(f) = HEAD(sp) AND LAST(f) = LAST(sp))) as finalPairs
return distinct
extract(s in finalPairs : HEAD(s)),
extract(s in finalPairs : LAST(s)),
length(finalPairs)
Results:
+----------------------------------------------------------------------------------------------------------------------+
| extract(s in finalPairs : HEAD(s))                         | extract(s in finalPairs : LAST(s)) | length(finalPairs) |
+----------------------------------------------------------------------------------------------------------------------+
| [Node[39]{name:"E"},Node[38]{name:"F"}]                    | [400,500]                          | 2                  |
| [Node[43]{name:"A"},Node[42]{name:"B"},Node[41]{name:"C"}] | [100,200,300]                      | 3                  |
+----------------------------------------------------------------------------------------------------------------------+
2 rows
13 ms
Execution Plan
Distinct(_rows=2, _db_hits=0)
ColumnFilter(symKeys=["sortedPairs", "pairs", "finalPairs"], returnItemNames=["finalPairs"], _rows=5, _db_hits=0)
Extract(symKeys=["sortedPairs", "pairs"], exprKeys=["finalPairs"], _rows=5, _db_hits=0)
ColumnFilter(symKeys=["n", "sortedPairs", " UNNAMED155", "pairs", "p", "r"], returnItemNames=["sortedPairs", "pairs"], _rows=5, _db_hits=0)
Extract(symKeys=["n", "sortedPairs", " UNNAMED155", "p", "r"], exprKeys=["pairs"], _rows=5, _db_hits=13)
ExtractPath(name="p", patterns=[" UNNAMED155=n-[:OWES*1..200]->n"], _rows=5, _db_hits=0)
PatternMatch(g="(n)-[' UNNAMED155']-(n)", _rows=5, _db_hits=0)
AllNodes(identifier="n", _rows=6, _db_hits=6)
ColumnFilter(symKeys=[" INTERNAL_AGGREGATEfbdcf75a-046d-4501-9696-1e2c80469b29"], returnItemNames=["sortedPairs"], _rows=1, _db_hits=0)
EagerAggregation(keys=[], aggregates=["( INTERNAL_AGGREGATEfbdcf75a-046d-4501-9696-1e2c80469b29,Collect)"], _rows=1, _db_hits=5)
ColumnFilter(symKeys=["nr", " UNNAMEDS-2101388511", " UNNAMEDS2003458696", "nn", " UNNAMED39"], returnItemNames=["nn", "nr"], _rows=5, _db_hits=0)
Sort(descr=["SortItem(Cached( UNNAMEDS2003458696 of type Any),true)", "SortItem(Cached( UNNAMEDS-2101388511 of type Any),true)"], _rows=5, _db_hits=0)
Extract(symKeys=["nn", " UNNAMED39", "nr"], exprKeys=[" UNNAMEDS2003458696", " UNNAMEDS-2101388511"], _rows=5, _db_hits=10)
PatternMatch(g="(nn)-['nr']-( UNNAMED39)", _rows=5, _db_hits=0)
AllNodes(identifier="nn", _rows=6, _db_hits=6)
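The question also asks whether the duplicates could be handled in Java instead. One possible approach, shown here only as a minimal sketch, is to reduce every returned cycle to a canonical rotation and keep each canonical form once. The class name CycleDedup and the step encoding (node name plus the amount of its outgoing OWES relationship, as a string such as "SASA:100641359") are assumptions made for illustration, not part of the original query or of any Neo4j API.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class CycleDedup {
    // Compares two lists of equal length element by element.
    private static int compare(List<String> a, List<String> b) {
        for (int i = 0; i < a.size(); i++) {
            int c = a.get(i).compareTo(b.get(i));
            if (c != 0) {
                return c;
            }
        }
        return 0;
    }

    // Returns the rotation of the cycle that sorts first lexicographically.
    // All rotations of the same cycle share this canonical form.
    static List<String> canonical(List<String> cycle) {
        int n = cycle.size();
        List<String> best = new ArrayList<>(cycle);
        for (int shift = 1; shift < n; shift++) {
            List<String> rotated = new ArrayList<>(n);
            for (int i = 0; i < n; i++) {
                rotated.add(cycle.get((shift + i) % n));
            }
            if (compare(rotated, best) < 0) {
                best = rotated;
            }
        }
        return best;
    }

    // Keeps one representative per cycle, in order of first appearance.
    static List<List<String>> dedupe(List<List<String>> cycles) {
        Set<List<String>> seen = new LinkedHashSet<>();
        for (List<String> cycle : cycles) {
            seen.add(canonical(cycle));
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        List<List<String>> cycles = new ArrayList<>();
        cycles.add(List.of("L:155.55", "S:100641359"));
        cycles.add(List.of("S:100641359", "L:155.55"));
        System.out.println(dedupe(cycles));
    }
}
```

Each step should be listed once per cycle, so the repeated end node of the closed path is dropped before calling dedupe. This only removes rotations of the same cycle; it does not reduce the work the database does to find them.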

Related

Issue with binding values from sub selection in Jena ARQ

I want to run the following simple testing query:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX vcard: <http://www.w3.org/2001/vcard-rdf/3.0#>
SELECT ?givenName ?name_count ?temp
WHERE
{ BIND(if(( ?name_count = 2 ), "just two", "definitely not 2") AS ?temp)
{ SELECT DISTINCT ?givenName (COUNT(?givenName) AS ?name_count)
WHERE
{ ?y vcard:Family ?givenName }
GROUP BY ?givenName
}
}
The graph I am querying is this one, from the tutorial at https://jena.apache.org/tutorials/sparql_data.html:
@prefix vCard: <http://www.w3.org/2001/vcard-rdf/3.0#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
<http://somewhere/MattJones/> vCard:FN "Matt Jones" .
<http://somewhere/MattJones/> vCard:N _:b0 .
_:b0 vCard:Family "Jones" .
_:b0 vCard:Given "Matthew" .
<http://somewhere/RebeccaSmith/> vCard:FN "Becky Smith" .
<http://somewhere/RebeccaSmith/> vCard:N _:b1 .
_:b1 vCard:Family "Smith" .
_:b1 vCard:Given "Rebecca" .
<http://somewhere/JohnSmith/> vCard:FN "John Smith" .
<http://somewhere/JohnSmith/> vCard:N _:b2 .
_:b2 vCard:Family "Smith" .
_:b2 vCard:Given "John" .
<http://somewhere/SarahJones/> vCard:FN "Sarah Jones" .
<http://somewhere/SarahJones/> vCard:N _:b3 .
_:b3 vCard:Family "Jones" .
_:b3 vCard:Given "Sarah" .
Now the problem is that running it with Jena:
Query query = QueryFactory.create(theAboveQueryAsString);
QueryExecution qexec = QueryExecutionFactory.create(query, theAboveGraphmodel);
ResultSet execSel = qexec.execSelect();
ResultSetRewindable results = ResultSetFactory.copyResults(execSel);
ResultSetFormatter.out(System.out, results, query);
gives this result in the console:
----------------------------------
| givenName | name_count | temp |
==================================
| "Smith" | 2 | |
| "Jones" | 2 | |
----------------------------------
with the temp values null.
On the other hand, running the same query on the same graph in the Ontotext GraphDB environment, I get the correct result (saved as CSV):
givenName | name_count | temp
------------------------------------
Jones | 2 | just two
Smith | 2 | just two
Could there be a bug in the ARQ engine or am I missing something?
Thanks in advance.
I am using jena-arq 3.12.0
Java(TM) SE Runtime Environment (build 1.8.0_181-b13)
Eclipse Version: 2019-06 (4.12.0)
There is a join between BIND and sub-select. The arguments to the join step are calculated before the join is done. So the BIND is evaluated, the sub-select is evaluated separately and the results joined. ?name_count isn't set in the BIND assignment. If you move it after the sub-select, it will apply to the results of the sub-select.
BIND adds a binding to the result of the pattern before it.
(base <http://example/base/>
(prefix ((rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>)
(vcard: <http://www.w3.org/2001/vcard-rdf/3.0#>))
(project (?givenName ?name_count ?temp)
(join
(extend ((?temp (if (= ?name_count 2) "just two" "definitely not 2")))
(table unit))
(distinct
(project (?givenName ?name_count)
(extend ((?name_count ?.0))
(group (?givenName) ((?.0 (count ?givenName)))
(bgp (triple ?y vcard:Family ?givenName))))))))))
Here, the (extend...) is one of the two arguments to the (join...). (table unit) is the "nothing" before the BIND.
If the BIND is put after the sub-select, the algebra is:
(base <http://example/base/>
(prefix ((rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>)
(vcard: <http://www.w3.org/2001/vcard-rdf/3.0#>))
(project (?givenName ?name_count ?temp)
(extend ((?temp (if (= ?name_count 2) "just two" "definitely not 2")))
(distinct
(project (?givenName ?name_count)
(extend ((?name_count ?.0))
(group (?givenName) ((?.0 (count ?givenName)))
(bgp (triple ?y vcard:Family ?givenName))))))))))
and the extend (which comes from the BIND syntax) operates on the (distinct ...) result of the sub-query.
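For reference, here is a sketch of the question's query with the BIND moved after the sub-select, held in a Java string so it can be passed to QueryFactory.create(...) as in the question. The class name FixedBindQuery is only an illustrative assumption, and the query has not been run against the asker's setup.

```java
class FixedBindQuery {
    // Same query as in the question, but the BIND now follows the sub-select,
    // so ?name_count is already bound when the IF expression is evaluated.
    static final String QUERY =
        "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n"
        + "PREFIX vcard: <http://www.w3.org/2001/vcard-rdf/3.0#>\n"
        + "SELECT ?givenName ?name_count ?temp\n"
        + "WHERE\n"
        + "{ { SELECT DISTINCT ?givenName (COUNT(?givenName) AS ?name_count)\n"
        + "    WHERE { ?y vcard:Family ?givenName }\n"
        + "    GROUP BY ?givenName\n"
        + "  }\n"
        + "  BIND(IF(?name_count = 2, \"just two\", \"definitely not 2\") AS ?temp)\n"
        + "}\n";

    public static void main(String[] args) {
        // Print the query; in the question's code it would replace theAboveQueryAsString.
        System.out.println(QUERY);
    }
}
```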

Elasticsearch - how to group by and count matches in an index

I have an instance of Elasticsearch running with thousands of documents. My index has 2 fields like this:
| Type    | Date_added              |
| walking | 2018-11-27T00:00:00.000 |
| walking | 2018-11-26T00:00:00.000 |
| running | 2018-11-24T00:00:00.000 |
| running | 2018-11-25T00:00:00.000 |
| walking | 2018-11-27T04:00:00.000 |
I want to group by and count how many matches were found for the "type" field, in a certain range.
In SQL I would do something like this:
select type,
count(type)
from index
where date_added between '2018-11-20' and '2018-11-30'
group by type
I want to get something like this:
| type | count |
| running | 2 |
| walking | 3 |
I'm using the High Level REST Client API in my project. So far my query looks like this; it only filters by the start and end time:
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.query(QueryBuilders
    .boolQuery()
    .must(QueryBuilders
        .rangeQuery("date_added")
        .from(start.getTime())
        .to(end.getTime())));
How can I do a "group by" in the "type" field? Is it possible to do this in ElasticSearch?
That's a good start! Now you need to add a terms aggregation to your query:
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.query(QueryBuilders.boolQuery()
    .must(QueryBuilders
        .rangeQuery("date_added")
        .from(start.getTime())
        .to(end.getTime())));
// add these two lines
TermsAggregationBuilder groupBy = AggregationBuilders.terms("byType").field("type.keyword");
sourceBuilder.aggregation(groupBy);
After using Val's reply to aggregate the fields, I wanted to print the aggregations of my query together with the value of them. Here's what I did:
Terms terms = searchResponse.getAggregations().get("byType");
// Each bucket holds one distinct value of the field and its document count
for (Terms.Bucket bucket : terms.getBuckets()) {
    System.out.println("Type: " + bucket.getKeyAsString() + " = Count(" + bucket.getDocCount() + ")");
}
This is the output after running the query in an index with 2700 documents with a field called "type" and 2 different types:
Type: walking = Count(900)
Type: running = Count(1800)
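To make the semantics of the range filter plus terms aggregation used above concrete, here is a plain-Java sketch that computes the same kind of result in memory for the five sample rows from the question. It does not call Elasticsearch at all; the TypeCounts and Doc names are hypothetical, and the inclusive range bounds are an assumption.

```java
import java.time.LocalDateTime;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class TypeCounts {
    // Hypothetical in-memory stand-in for one indexed document.
    static final class Doc {
        final String type;
        final LocalDateTime dateAdded;

        Doc(String type, String dateAdded) {
            this.type = type;
            this.dateAdded = LocalDateTime.parse(dateAdded);
        }
    }

    // Keeps documents with from <= dateAdded <= to, then counts them per type,
    // which is what "WHERE date_added BETWEEN ... GROUP BY type" expresses in SQL.
    static Map<String, Long> countByType(List<Doc> docs, LocalDateTime from, LocalDateTime to) {
        return docs.stream()
            .filter(d -> !d.dateAdded.isBefore(from) && !d.dateAdded.isAfter(to))
            .collect(Collectors.groupingBy((Doc d) -> d.type, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<Doc> docs = Arrays.asList(
            new Doc("walking", "2018-11-27T00:00:00.000"),
            new Doc("walking", "2018-11-26T00:00:00.000"),
            new Doc("running", "2018-11-24T00:00:00.000"),
            new Doc("running", "2018-11-25T00:00:00.000"),
            new Doc("walking", "2018-11-27T04:00:00.000"));
        System.out.println(countByType(docs,
            LocalDateTime.parse("2018-11-20T00:00:00"),
            LocalDateTime.parse("2018-11-30T00:00:00")));
    }
}
```

In the real query the filtering and counting happen inside Elasticsearch, so only the buckets travel back to the client.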

Get rows based on maximum value for subsets of a table

I want to filter a table based on the values of one column, then get the maximum value for each of these values.
e.g.
id | value
-----------
0 | 10
0 | 22
0 | 50
1 | 33
1 | 4
2 | 5
2 | 23
2 | 33
3 | 22
3 | 50
Filter the rows with IDs 2 and 3, then get the maximum value for each ID:
id | value
-----------
2 | 33
3 | 50
How do I do that using Hibernate?
This is my attempt:
List<Integer> ids = ... // Retrieved from elsewhere
Disjunction disjunction = Restrictions.disjunction();
for (int id : ids) {
    disjunction.add(Restrictions.eq("id", id)); // Specify which IDs
}
@SuppressWarnings("unchecked")
List<Item> items = (List<Item>) sessionFactory.getCurrentSession()
    .createCriteria(Item.class)
    .add(disjunction)
    .setProjection(
        Projections.projectionList()
            .add(Projections.max("value"))
            .add(Projections.groupProperty("id")))
    .setResultTransformer(Criteria.DISTINCT_ROOT_ENTITY)
    .list();
This is just giving me the 'id' with the highest value (e.g. 3, not the entire row)
I am trying to do this in a spring mvc app.
Thanks in advance
select MAX(id),max(value) from ABCD where id in (110,56001) group by id
Try running a query of this form to get the expected output.
You can specify a WHERE clause in your CriteriaQuery, and then use multiselect together with groupBy to do the GROUP BY. Note that calling where() a second time replaces the earlier restriction instead of adding to it, so both IDs go into a single IN predicate, and the maximum is built with CriteriaBuilder.max():
CriteriaBuilder cb = em.getCriteriaBuilder();
CriteriaQuery<Object[]> query = cb.createQuery(Object[].class);
Root<Item> item = query.from(Item.class);
// Restrict to the wanted IDs with one predicate
query.where(item.get("id").in(2, 3));
// Select the ID and the maximum value per ID
query.multiselect(item.get("id"), cb.max(item.<Integer>get("value"))).groupBy(item.get("id"));
List<Object[]> results = em.createQuery(query).getResultList();
System.out.println("id | value\n-----------");
for (Object[] object : results) {
    System.out.println(object[0] + " | " + object[1]);
}

Mongodb select all fields group by one field and sort by another field

We have a collection 'message' with the following fields:
_id | messageId | chainId | createOn
1 | 1 | A | 155
2 | 2 | A | 185
3 | 3 | A | 225
4 | 4 | B | 226
5 | 5 | C | 228
6 | 6 | B | 300
We want to select all fields of the documents matching the following criteria:
distinct by field 'chainId'
ordered (sorted) by 'createdOn' in descending order
So the expected result is:
_id | messageId | chainId | createOn
3 | 3 | A | 225
5 | 5 | C | 228
6 | 6 | B | 300
We are using spring-data in our java application. I tried to go with different approaches, nothing helped me so far.
Is it possible to achieve above with single query?
What you want can be achieved with the aggregation framework. The basic form (which may also be useful to others) is:
db.collection.aggregate([
// Group by the grouping key, but keep the valid values
{ "$group": {
"_id": "$chainId",
"docId": { "$first": "$_id" },
"messageId": { "$first": "$messageId" },
"createOn": { "$first": "$createdOn" }
}},
// Then sort
{ "$sort": { "createOn": -1 } }
])
This "groups" on the distinct values of "chainId" while taking the $first boundary value for each of the other fields. Alternatively, if you want the last value, use $last instead. For either the smallest or the largest row it probably makes sense to $sort before the $group, because $first and $last depend on the order of the incoming documents. If the whole row is not important, just use $min and $max.
See the MongoDB aggregate() documentation for more information on usage, as well as the driver JavaDocs and SpringData Mongo connector documentation for more usage of the aggregate method and possible helpers.
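As an illustration of what "sort, then take one document per group" is meant to produce, here is a plain-Java sketch over the sample data from the question. It works on an in-memory list rather than through the MongoDB driver; the LatestPerChain and Message names are hypothetical, and it assumes the wanted row per chainId is the one with the largest creation value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LatestPerChain {
    // Hypothetical in-memory stand-in for one document of the 'message' collection.
    static final class Message {
        final int id;
        final int messageId;
        final String chainId;
        final long createdOn;

        Message(int id, int messageId, String chainId, long createdOn) {
            this.id = id;
            this.messageId = messageId;
            this.chainId = chainId;
            this.createdOn = createdOn;
        }
    }

    // Keeps the newest message of every chain, then sorts the survivors newest first.
    static List<Message> latestPerChain(List<Message> messages) {
        Map<String, Message> latest = new HashMap<>();
        for (Message m : messages) {
            latest.merge(m.chainId, m, (a, b) -> a.createdOn >= b.createdOn ? a : b);
        }
        List<Message> result = new ArrayList<>(latest.values());
        result.sort(Comparator.comparingLong((Message m) -> m.createdOn).reversed());
        return result;
    }

    public static void main(String[] args) {
        List<Message> messages = Arrays.asList(
            new Message(1, 1, "A", 155), new Message(2, 2, "A", 185),
            new Message(3, 3, "A", 225), new Message(4, 4, "B", 226),
            new Message(5, 5, "C", 228), new Message(6, 6, "B", 300));
        for (Message m : latestPerChain(messages)) {
            System.out.println(m.id + " | " + m.messageId + " | " + m.chainId + " | " + m.createdOn);
        }
    }
}
```

In the aggregation pipeline the same effect comes from sorting on the creation field before the $group stage, so that $first or $last picks a well-defined document per chain.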
Here is the solution using the MongoDB Java driver:
final MongoClient mongoClient = new MongoClient();
final DB db = mongoClient.getDB("mstreettest");
final DBCollection collection = db.getCollection("message");
final BasicDBObject groupFields = new BasicDBObject("_id", "$chainId");
groupFields.put("docId", new BasicDBObject("$first", "$_id"));
groupFields.put("messageId", new BasicDBObject("$first", "$messageId"));
groupFields.put("createOn", new BasicDBObject("$first", "$createdOn"));
final DBObject group = new BasicDBObject("$group", groupFields);
final DBObject sortFields = new BasicDBObject("createOn", -1);
final DBObject sort = new BasicDBObject("$sort", sortFields);
final DBObject projectFields = new BasicDBObject("_id", 0);
projectFields.put("_id", "$docId");
projectFields.put("messageId", "$messageId");
projectFields.put("chainId", "$_id");
projectFields.put("createOn", "$createOn");
final DBObject project = new BasicDBObject("$project", projectFields);
final AggregationOutput aggregate = collection.aggregate(group, sort, project);
and the result will be:
{ "_id" : 5 , "messageId" : 5 , "createOn" : { "$date" : "2014-04-23T04:45:45.173Z"} , "chainId" : "C"}
{ "_id" : 4 , "messageId" : 4 , "createOn" : { "$date" : "2014-04-23T04:12:25.173Z"} , "chainId" : "B"}
{ "_id" : 1 , "messageId" : 1 , "createOn" : { "$date" : "2014-04-22T08:29:05.173Z"} , "chainId" : "A"}
I tried it with Spring Data MongoDB and it didn't work when grouping by chainId; the exception was java.lang.NumberFormatException: For input string: "C".
Replace this line:
final DBObject group = new BasicDBObject("$group", groupFields);
with this one:
final DBObject group = new BasicDBObject("_id", groupFields);
Here is the solution using springframework.data.mongodb:
Aggregation aggregation = Aggregation.newAggregation(
Aggregation.group("chainId"),
Aggregation.sort(new Sort(Sort.Direction.ASC, "createdOn"))
);
AggregationResults<XxxBean> results = mongoTemplate.aggregate(aggregation, "collection_name", XxxBean.class);

Matching strings with at least one word in common

I'm writing a query to get the URIs of documents that have a specific title. My query is:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dc: <http://purl.org/dc/elements/1.1/> SELECT ?document WHERE {
?document dc:title ?title.
FILTER (?title = "…" ).
}
where "…" is actually the value of this.getTitle(), since the query string is generated by:
String queryString = "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> " +
"PREFIX dc: <http://purl.org/dc/elements/1.1/> SELECT ?document WHERE { " +
"?document dc:title ?title." +
"FILTER (?title = \"" + this.getTitle() + "\" ). }";
With the query above, I get only the documents whose titles are exactly equal to this.getTitle(). Suppose this.getTitle() consists of more than one word. I'd like to get documents even if only one of the words in this.getTitle() appears in the document title. How could I do that?
Let's say you've got some data like (in Turtle):
@prefix : <http://stackoverflow.com/q/20203733/1281433> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
:a dc:title "Great Gatsby" .
:b dc:title "Boring Gatsby" .
:c dc:title "Great Expectations" .
:d dc:title "The Great Muppet Caper" .
Then you can use a query like:
prefix : <http://stackoverflow.com/q/20203733/1281433>
prefix dc: <http://purl.org/dc/elements/1.1/>
select ?x ?title where {
# this is just in place of this.getTitle(). It provides a value for
# ?TITLE that is "Gatsby Strikes Again".
values ?TITLE { "Gatsby Strikes Again" }
# Select a thing and its title.
?x dc:title ?title .
# Then filter based on whether the ?title matches the result
# of replacing the spaces in ?TITLE with "|", and matching
# case insensitively.
filter( regex( ?title, replace( ?TITLE, " ", "|" ), "i" ))
}
to get results like
------------------------
| x | title |
========================
| :b | "Boring Gatsby" |
| :a | "Great Gatsby" |
------------------------
What's particularly neat about this is that since you're generating the pattern on the fly, you could even make it based on another value from the graph pattern. For instance, if you want all pairs of things whose titles match on at least one word, you could do:
prefix : <http://stackoverflow.com/q/20203733/1281433>
prefix dc: <http://purl.org/dc/elements/1.1/>
select ?x ?xtitle ?y ?ytitle where {
?x dc:title ?xtitle .
?y dc:title ?ytitle .
filter( regex( ?xtitle, replace( ?ytitle, " ", "|" ), "i" ) && ?x != ?y )
}
order by ?x ?y
to get:
-----------------------------------------------------------------
| x | xtitle | y | ytitle |
=================================================================
| :a | "Great Gatsby" | :b | "Boring Gatsby" |
| :a | "Great Gatsby" | :c | "Great Expectations" |
| :a | "Great Gatsby" | :d | "The Great Muppet Caper" |
| :b | "Boring Gatsby" | :a | "Great Gatsby" |
| :c | "Great Expectations" | :a | "Great Gatsby" |
| :c | "Great Expectations" | :d | "The Great Muppet Caper" |
| :d | "The Great Muppet Caper" | :a | "Great Gatsby" |
| :d | "The Great Muppet Caper" | :c | "Great Expectations" |
-----------------------------------------------------------------
Of course, it's very important to note that you're now generating patterns based on your data, which means that someone who can put data into your system could put in very expensive patterns to bog down the query and cause a denial of service. On a more mundane note, you could run into trouble if any of your titles contain characters that interfere with the regular expressions. One interesting problem would be a title with multiple consecutive spaces, so that the pattern became The|Words|With||Two|Spaces, since the empty alternative in there might make everything match. This is an interesting approach, but it comes with a lot of caveats.
In general, you could do this as shown here, or by generating the regular expression in code (where you can take care of escaping, etc.), or you could use a SPARQL engine that supports some text-based extensions (e.g., jena-text, which adds Apache Lucene or Apache Solr to Apache Jena).
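As a rough sketch of the "generate the regular expression in code" option, the following Java builds the alternation pattern from a title while escaping regex metacharacters and collapsing repeated whitespace, which avoids the empty-alternative problem described above. The TitlePattern class, its method names, and the exact set of escaped characters are assumptions for illustration; the resulting pattern still has to be embedded safely as a SPARQL string literal (or passed as a query parameter) before use.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

class TitlePattern {
    // Characters escaped with a backslash so that they match literally.
    private static final String META = "\\.^$*+?()[]{}|-";

    // Escapes every metacharacter in a single word.
    static String escape(String word) {
        StringBuilder sb = new StringBuilder();
        for (char c : word.toCharArray()) {
            if (META.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Splits the title on runs of whitespace and joins the escaped words with "|".
    // Returns an empty string when the title contains no words at all.
    static String buildPattern(String title) {
        return Arrays.stream(title.trim().split("\\s+"))
            .filter(word -> !word.isEmpty())
            .map(TitlePattern::escape)
            .collect(Collectors.joining("|"));
    }

    public static void main(String[] args) {
        System.out.println(buildPattern("Gatsby  Strikes Again"));
    }
}
```

An empty result means there is nothing to match on, so the caller should skip the filter rather than send an empty pattern, which would match every title.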
