Elasticsearch grouping by nested fields - java

Is there a way to group by nested fields and perform an aggregation on non-nested fields?
I have data like this in ES
{
  "_index": "bighalf",
  "_type": "excel",
  "_id": "AVE0rgXqe0-x669Gsae3",
  "_score": 1,
  "_source": {
    "Name": "Marsh",
    "date": "2015-11-07T10:47:14",
    "grade": 9,
    "year": 2016,
    "marks": 70,
    "subject": "Mathematics",
    "Gender": "male",
    "dob": "22/11/2000",
    "sprint": [
      {
        "sprintdate": "2015-11-06T22:30:00",
        "sprintname": "changed",
        "sprintpoints": 52
      }
    ]
  }
},
{
  "_index": "bighalf",
  "_type": "excel",
  "_id": "AVE0rvTHe0-x669Gsae5",
  "_score": 1,
  "_source": {
    "Name": "Taylor",
    "date": "2015-11-07T10:47:14",
    "grade": 9,
    "year": 2016,
    "marks": 54,
    "subject": "Mathematics",
    "Gender": "male",
    "dob": "22/11/2000",
    "sprint": [
      {
        "sprintdate": "2015-11-07T22:30:00",
        "sprintname": "jira",
        "sprintpoints": 52
      }
    ]
  }
}
I wanted to group by sprintname and find the sum of marks. I tried this:
SumBuilder sumGrades = AggregationBuilders.sum("sum_grade").field("grade");
NestedBuilder nested = AggregationBuilders.nested("nested").path("sprint")
    .subAggregation(AggregationBuilders.terms("by_sprint")
        .field("sprint.sprintname")
        .subAggregation(sumGrades));
String[] names = { "changed", "jira" };
QueryBuilder query = QueryBuilders.boolQuery().must(
    QueryBuilders.nestedQuery("sprint",
        QueryBuilders.boolQuery().must(QueryBuilders.termsQuery("sprint.sprintname", names))));
FilterAggregationBuilder aggregation = AggregationBuilders.filter("agg").filter(query)
    .subAggregation(nested);
The sum_grade aggregation did not work for me. When I replaced field("grade") with the nested field ("sprint.sprintpoints") it worked, but my requirement is to sum "grade" grouped by sprint.sprintname.

Since your sprint field is of nested type, your aggregation needs a reverse_nested aggregation in order to "jump back" to the root document from within the nested ones. It goes like this:
SumBuilder sumGrades = AggregationBuilders.sum("sum_grade").field("grade");
ReverseNestedBuilder backToGrades = AggregationBuilders.reverseNested("spring_to_grade")
    .subAggregation(sumGrades);
TermsBuilder bySprint = AggregationBuilders.terms("by_sprint")
    .field("sprint.sprintname")
    .subAggregation(backToGrades);
NestedBuilder nested = AggregationBuilders.nested("nested").path("sprint")
    .subAggregation(bySprint);
String[] names = { "changed", "jira" };
QueryBuilder query = QueryBuilders.boolQuery().must(
    QueryBuilders.nestedQuery("sprint",
        QueryBuilders.boolQuery().must(QueryBuilders.termsQuery("sprint.sprintname", names))));
FilterAggregationBuilder aggregation = AggregationBuilders.filter("agg").filter(query)
    .subAggregation(nested);

Related

How can I parse a GeoPoint value out of an ElasticSearch response in Java?

I am searching an Elasticsearch index from Java using Elastic's high-level REST client.
My response looks like this...
{
  "took": 25,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 10000,
      "relation": "gte"
    },
    "max_score": 2,
    "hits": [
      {
        "_index": "contacts_1_rvmmtqnnlh",
        "_type": "_doc",
        "_id": "1",
        "_score": 2,
        "_source": {
          "location": {
            "lon": -71.34,
            "lat": 41.12
          }
        }
      },
      {
        "_index": "contacts_1_rvmmtqnnlh",
        "_type": "_doc",
        "_id": "5291485",
        "_score": 2,
        "_source": {
          "home_address1": "208 E Main ST Ste 230",
          "firstname": "Geri",
          "home_city": "Belleville",
          "location": "39.919499456869055,-89.08605153191894",
          "lastname": "Boyer"
        }
      },
      ...
      {
        "_index": "contacts_1_rvmmtqnnlh",
        "_type": "_doc",
        "_id": "5291492",
        "_score": 2,
        "_source": {
          "home_address1": "620 W High ST",
          "firstname": "Edna",
          "home_city": "Nashville",
          "location": "40.55917440131824,-89.24254785283054",
          "lastname": "Willis"
        }
      }
    ]
  }
}
How can I parse out the latitude and longitude of each document hit? The latitude and longitude are stored in a field named "location" that is of type GeoPoint.
Here is what I have tried...
SearchHit[] hits = searchResponse.getHits().getHits();
for (SearchHit hit : hits) {
    Map<String, Object> contactMap = hit.getSourceAsMap();
    LinkedHashMap<String, Object> contactLHM = new LinkedHashMap<>(contactMap);
    Object coordinate = contactLHM.get("location");
    location.latitude = ??????
    location.longitude = ?????
}
How can I parse out the latitude and longitude given that the value of the coordinate variable is
{lon=-71.34, lat=41.12}
By the way, this is the location class definition:
public static class Location {
    public Double latitude;
    public Double longitude;
}
The _source here indicates that you have saved documents with different _source formats. That is possible with the geo_point type, and of course you can query both with the same queries: Elasticsearch understands both formats and analyzes them to the same internal structure (lat, lon), but that doesn't mean it changes your _source, which is exactly the data you saved.
If it's an option, save the data in only one format so the _source always comes back the same. If that's not an option, you need to handle both formats (location as a string, location as an object with lat and lon). You can also normalize existing documents with an update-by-query script:
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update-by-query.html
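Assuming the hit source has already been read into a Map (as in the snippet above), a defensive parser handling both _source shapes might look like the following sketch. The class and method names here are hypothetical; only the two location formats come from the response above.

```java
import java.util.HashMap;
import java.util.Map;

public class GeoPointParser {

    public static class Location {
        public Double latitude;
        public Double longitude;
    }

    // Handles both _source formats seen in the response:
    // an object {lat: .., lon: ..} and a "lat,lon" string.
    @SuppressWarnings("unchecked")
    public static Location parseLocation(Object coordinate) {
        Location location = new Location();
        if (coordinate instanceof Map) {
            Map<String, Object> point = (Map<String, Object>) coordinate;
            location.latitude = ((Number) point.get("lat")).doubleValue();
            location.longitude = ((Number) point.get("lon")).doubleValue();
        } else if (coordinate instanceof String) {
            // String format is "latitude,longitude"
            String[] parts = ((String) coordinate).split(",");
            location.latitude = Double.parseDouble(parts[0].trim());
            location.longitude = Double.parseDouble(parts[1].trim());
        }
        return location;
    }

    public static void main(String[] args) {
        Map<String, Object> asObject = new HashMap<>();
        asObject.put("lat", 41.12);
        asObject.put("lon", -71.34);
        Location a = parseLocation(asObject);
        Location b = parseLocation("39.919499456869055,-89.08605153191894");
        System.out.println(a.latitude + " " + a.longitude);
        System.out.println(b.latitude + " " + b.longitude);
    }
}
```

Inside the loop from the question, `location.latitude`/`location.longitude` would then be filled from `parseLocation(coordinate)`.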

Implementing priority search in Elasticsearch

I'm trying to implement custom search in Elasticsearch.
Problem statement: consider 3 documents inserted into Elasticsearch, each with a "names" array field:
{
  "id": 1,
  "names": ["John Wick", "Iron man"]
}
{
  "id": 2,
  "names": ["Wick Stone", "Nick John"]
}
{
  "id": 3,
  "names": ["Manny Nick", "Stone cold"]
}
When I search for "Nick", I want to boost documents whose name starts with "Nick", so in this case the document with id 2 should come first, followed by the document with id 3. Likewise, if I search for the whole name "Manny Nick", the doc with id 3 should be given priority.
In such a case you want to modify the score of matched results for the required criteria: match documents whose names contain "Nick", and at the same time boost the score of documents whose names start with "Nick", so those rank higher.
One way to achieve this is the Function Score Query.
In the query below, the search matches the keyword "Nick", and the score of matched documents is boosted for the criterion "names starting with Nick" using a Match Phrase Prefix Query filter with an additional weight of 20.
{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "names": "Nick"
        }
      },
      "boost": "1",
      "functions": [
        {
          "filter": {
            "match_phrase_prefix": {
              "names": "Nick"
            }
          },
          "weight": 20
        }
      ],
      "boost_mode": "sum"
    }
  }
}
Testing:
Inserted data: the three sample documents shown above.
Output:
{
  "took": 10,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 20.693148,
    "hits": [
      {
        "_index": "stack_1",
        "_type": "1",
        "_id": "T9kn5WsBrk7qsVCmKBGH",
        "_score": 20.693148,
        "_source": {
          "id": 2,
          "names": [
            "Wick Stone",
            "Nick John"
          ]
        }
      },
      {
        "_index": "stack_1",
        "_type": "1",
        "_id": "Ttkm5WsBrk7qsVCm2RF_",
        "_score": 20.287682,
        "_source": {
          "id": 3,
          "names": [
            "Manny Nick",
            "Stone cold"
          ]
        }
      }
    ]
  }
}
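The scores above illustrate boost_mode "sum": each document's final score is its base relevance score plus the weight of every matching filter function (here 20). A minimal sketch of that combination, with base scores approximated from the output above (the method name is hypothetical):

```java
public class BoostModeSum {

    // With boost_mode "sum", the function_score result is the base
    // query score plus the weight of each matching filter function.
    static double finalScore(double queryScore, boolean filterMatches, double weight) {
        return filterMatches ? queryScore + weight : queryScore;
    }

    public static void main(String[] args) {
        // Base relevance scores roughly as in the output above;
        // both docs match the match_phrase_prefix filter.
        System.out.println(finalScore(0.693148, true, 20.0));  // doc with id 2
        System.out.println(finalScore(0.287682, true, 20.0));  // doc with id 3
    }
}
```

This is why both hits score just above 20 while still preserving their relative relevance order.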

How to get an Accumulator result using Morphia in MongoDB?

I'm using Morphia with MongoDB in Java, and I'd like to get the number of records from an aggregation query like this:
AggregationPipeline pipCount = ds.createAggregation(MyTable.class)
.match(query1)
.match(query2)
.unwind("transactions")
.match(query3)
.group("_id", grouping("_id"), grouping("count", new Accumulator("$sum", 1)));
Iterator<MyTable> result = pipCount.aggregate(MyTable.class);
I need grouping("_id") to remove duplicate results and then count them, but I can't find any way to read the sum value. Any idea?
Sample Data:
{
  "_id": "00000222",
  "create_date": ISODate("2015-05-06T07:20:31.000+0000"),
  "update_date": ISODate("2015-05-06T07:20:31.000+0000"),
  "payment": 70.0,
  "fee": 0.0,
  "type": "RECURRING",
  "currency": "USD",
  "status": "OK",
  "transactions": [{
    "_id": "111111223",
    "amount": 1260.0,
    "fee_type": "VARIABLE_ADD",
    "fee_rate": 2.75,
    "status": "ERROR",
    "charges": [{
      "_id": "2222223344",
      "amount": 1000.0,
      "recurring": true,
      "firstTime": false,
      "oneTime": true
    }, {
      "_id": "222222222233221",
      "amount": 70.0,
      "recurring": true,
      "firstTime": true,
      "oneTime": true
    }]
  }],
  "users": {
    "_id": "33333333332212",
    "update_date": ISODate("2015-12-18T08:03:35.000+0000"),
    "user_id": "sdjfhsd#skjksdf.com",
    "first_name": "dsjfj",
    "last_name": "skdfjf"
  }
}
Result: 1
You can try something like this. You don't need the extra grouping("_id"): grouping on _id already removes duplicates while the accumulator counts. Then project the count, map the response to Document, and read the count from it.
import org.bson.Document;

AggregationPipeline pipCount = datastore.createAggregation(MyTable.class)
    .match(query1)
    .match(query2)
    .unwind("transactions")
    .match(query3)
    .group("_id", grouping("count", new Accumulator("$sum", 1)))
    .project(Projection.projection("count"));

Iterator<Document> result = pipCount.aggregate(Document.class);
while (result.hasNext()) {
    Document document = result.next();
    Integer count = document.getInteger("count");
}

Grouping by ID in mongoDB

Can anyone help me with the following aggregate operation in MongoDB: given a collection of items with ids and group ids, group them by group id. For example, for this collection of items:
{
  "id": 1,
  "group_id": 10,
  "data": "some_data",
  "name": "first"
},
{
  "id": 2,
  "group_id": 10,
  "data": "some_data",
  "name": "second"
},
{
  "id": 3,
  "group_id": 20,
  "data": "some_data",
  "name": "third"
}
Create a new collection of groups with the following structure:
{
  "id": 10,
  "items": [
    {
      "id": 1,
      "group_id": 10,
      "data": "some_data",
      "name": "first"
    },
    {
      "id": 2,
      "group_id": 10,
      "data": "some_data",
      "name": "second"
    }
  ]
},
{
  "id": 20,
  "items": [
    {
      "id": 3,
      "group_id": 20,
      "data": "some_data",
      "name": "third"
    }
  ]
}
The corresponding snippet with Java and spring-data-mongodb will also be appreciated.
In fact I'm doing the same right now in Java and want to move this logic to Mongo for paging optimisation.
You can do it with the following simple $group aggregation:
db.table.aggregate([
  {
    $group: {
      _id: "$group_id",
      items: { "$push": "$$ROOT" }
    }
  }
]);
When you want to output the data from the aggregation into a new collection, use the $out stage.
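Since the question also asks for a Java snippet: as a language-level illustration of what $group with $push produces, the same grouping can be sketched with plain Java streams (the Map-based documents are stand-ins for your mapped entities; the class and method names are hypothetical):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupByGroupId {

    // Equivalent of { $group: { _id: "$group_id", items: { $push: "$$ROOT" } } }:
    // the map key plays the role of _id, the value list the role of items.
    static Map<Object, List<Map<String, Object>>> groupByGroupId(List<Map<String, Object>> items) {
        return items.stream()
            .collect(Collectors.groupingBy(doc -> doc.get("group_id")));
    }

    public static void main(String[] args) {
        List<Map<String, Object>> items = List.of(
            Map.of("id", 1, "group_id", 10, "name", "first"),
            Map.of("id", 2, "group_id", 10, "name", "second"),
            Map.of("id", 3, "group_id", 20, "name", "third"));

        Map<Object, List<Map<String, Object>>> groups = groupByGroupId(items);
        System.out.println(groups.get(10).size()); // 2 items in group 10
        System.out.println(groups.get(20).size()); // 1 item in group 20
    }
}
```

With spring-data-mongodb the pipeline itself is typically built along the lines of Aggregation.newAggregation(Aggregation.group("group_id").push("$$ROOT").as("items")); treat that line as an assumption to check against your Spring Data version.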

How to insert bulk data separately into MongoDB?

I have a JSON file that contains three documents together. I want to insert all three separately into MongoDB. Is that possible? If yes, then how?
{
  "docs": [
    {
      "_id": "First",
      "count": 4,
      "name": "Fish"
    },
    {
      "_id": "Second",
      "count": 6,
      "name": "Meat"
    },
    {
      "_id": "Third",
      "count": 8,
      "name": "Vegetables"
    }
  ]
}
Inserting a group of documents from the mongo shell: first assign the JSON to a variable:
var input = {
  "docs": [
    {
      "_id": "First",
      "count": 4,
      "name": "Fish"
    },
    {
      "_id": "Second",
      "count": 6,
      "name": "Meat"
    },
    {
      "_id": "Third",
      "count": 8,
      "name": "Vegetables"
    }
  ]
};
Then insert the docs array:
db.collection.insert(input["docs"]);
This inserts each item in the docs array as a separate document in the collection. Running
db.collection.find();
gives us the three documents that were inserted:
{ "_id" : "First", "count" : 4, "name" : "Fish" }
{ "_id" : "Second", "count" : 6, "name" : "Meat" }
{ "_id" : "Third", "count" : 8, "name" : "Vegetables" }
To do it in Java, load and parse the JSON file with a JSON library such as Jackson, extract the docs array, and persist each element.
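Structurally, the Java version mirrors the shell approach: parse the file into a map, pull out the docs array, and insert each element as its own document. The sketch below uses plain collections as stand-ins for the parsed JSON and the target collection (Jackson and the MongoDB driver calls are only referenced in comments, as assumptions):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class InsertDocsSeparately {

    // Stand-in for reading the "docs" array out of the structure a JSON
    // parser (e.g. Jackson) would produce for the file.
    @SuppressWarnings("unchecked")
    static List<Object> extractDocs(Map<String, Object> input) {
        return (List<Object>) input.get("docs");
    }

    public static void main(String[] args) {
        Map<String, Object> input = Map.of("docs", List.of(
            Map.of("_id", "First", "count", 4, "name", "Fish"),
            Map.of("_id", "Second", "count", 6, "name", "Meat"),
            Map.of("_id", "Third", "count", 8, "name", "Vegetables")));

        // Stand-in for the MongoDB collection; with the Java driver this
        // would typically be collection.insertMany(docs) or one
        // insertOne per element.
        List<Object> collection = new ArrayList<>();
        for (Object doc : extractDocs(input)) {
            collection.add(doc); // each array element becomes its own document
        }

        System.out.println(collection.size()); // 3 separate documents
    }
}
```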
