I want to write an Elasticsearch aggregation with the Java API to do field collapsing and result grouping.
The JSON aggregation is shown below.
I took this code from the Elasticsearch docs.
The 'dedup_by_score' aggregation has a sub-aggregation called 'top_hit',
which the terms aggregation uses for bucket ordering.
... some query
"aggs": {
"dedup_by_score": {
"terms": {
"field": "keyword",
"order": {
"top_hit": "desc"
},
"size": 10
},
"aggs": {
"top_hit": {
"max": {
"script": {
"source": "_score"
}
}
}
}
}
}
I want to convert this JSON query to Java.
This is what I have tried so far in Java:
AggregationBuilder aggregation = AggregationBuilders.terms("dedup_by_score")
.field("keyword")
.order(BucketOrder.aggregation("top_hit", false))
.size(10)
.subAggregation(
AggregationBuilders.topHits("top_hit")
.subAggregation(
AggregationBuilders.max("max").script(new Script("_score"))
)
);
But I got an error like below from Elasticsearch
{
"type":"aggregation_initialization_exception",
"reason":"Aggregator [top_hit] of type [top_hits] cannot accept sub-aggregations"
}
How can I fix this Java code? I'm using Elasticsearch 6.7.1.
Thanks in advance
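Top hits aggregations can't have sub-aggregations, and neither can a max aggregation. In your JSON, 'top_hit' is itself a max aggregation on _score, so add it directly under the terms aggregation and keep the name that the order clause refers to. Try this:
AggregationBuilder aggregation = AggregationBuilders.terms("dedup_by_score")
.field("keyword")
.order(BucketOrder.aggregation("top_hit", false))
.size(10)
.subAggregation(
// a max aggregation named "top_hit", matching the JSON in the question
AggregationBuilders.max("top_hit").script(new Script("_score"))
);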
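To make clear what the JSON aggregation in the question computes, here is a minimal in-memory sketch in plain Java. It does not use the Elasticsearch client: the class name DedupByScoreSketch, the method topKeywords and the (keyword, score) pairs are assumptions made up for illustration. It keeps the best score per keyword (the role of the max sub-aggregation), orders the keywords by that score in descending order, and keeps the first `size` of them.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical in-memory illustration, not Elasticsearch API.
class DedupByScoreSketch {

    // hits: (keyword, score) pairs; returns up to `size` (keyword, maxScore) pairs, best first.
    static List<Map.Entry<String, Double>> topKeywords(List<Map.Entry<String, Double>> hits, int size) {
        // "terms" on keyword with a "max" sub-aggregation on the score
        Map<String, Double> maxScorePerKeyword = hits.stream()
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue, Double::max));

        // "order": { "top_hit": "desc" } followed by "size"
        return maxScorePerKeyword.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue(Comparator.reverseOrder()))
                .limit(size)
                .collect(Collectors.toList());
    }
}
```

Each keyword appears at most once in the result, which is the deduplication effect the aggregation name refers to.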
I have an Elasticsearch query that works well with curl. It is my first query.
First I filter by organization (multitenancy), then group by customer, and finally sum the sales amount, but I only want the 3 best customers.
My question is: how do I build the aggregation with AggregationBuilders to get the "bucket_sort" statement? I already have the sales grouped by customer with the Java API.
The Elasticsearch query is:
curl -X POST 'http://localhost:9200/sales/sale/_search?pretty' -H 'Content-Type: application/json' -d '
{
"aggs": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"organization_id": "15"
}
}
]
}
},
"aggs": {
"by_customer": {
"terms": {
"field": "customer_id"
},
"aggs": {
"sum_total" : {
"sum": {
"field": "amount"
}
},
"total_total_sort": {
"bucket_sort": {
"sort": [
{"sum_total": {"order": "desc"}}
],
"size": 3
}
}
}
}
}
}
}
}'
My Java Code:
@Test
public void queryBestCustomers() throws UnknownHostException {
Client client = Query.client();
AggregationBuilder sum = AggregationBuilders.sum("sum_total").field("amount");
AggregationBuilder groupBy = AggregationBuilders.terms("by_customer").field("customer_id").subAggregation(sum);
AggregationBuilder aggregation =
AggregationBuilders
.filters("filtered",
new FiltersAggregator.KeyedFilter("must", QueryBuilders.termQuery("organization_id", "15"))).subAggregation(groupBy);
SearchRequestBuilder requestBuilder = client.prepareSearch("sales")
.setTypes("sale")
.addAggregation(aggregation);
SearchResponse response = requestBuilder.execute().actionGet();
}
I hope I got your question right.
Try adding "order" to your groupBy agg:
AggregationBuilder groupBy = AggregationBuilders.terms("by_customer").field("customer_id").subAggregation(sum).order(Terms.Order.aggregation("sum_total", false));
One more thing: if you want the top 3 customers, then .size(3) should be set on the groupBy aggregation as well, not on the sort. Like this:
AggregationBuilder groupBy = AggregationBuilders.terms("by_customer").field("customer_id").subAggregation(sum).order(Terms.Order.aggregation("sum_total", false)).size(3);
As another answer mentioned, "order" does work for your use case.
However, there are other use cases where you may want bucket_sort, for example to page through the aggregation buckets.
Because bucket_sort is a pipeline aggregation, you cannot instantiate it with AggregationBuilders. Use PipelineAggregatorBuilders instead.
The Elasticsearch documentation on the bucket sort pipeline aggregation has more information.
The ".from(50)" in the following code shows how to page through the buckets: the returned buckets start at item 50, if there are that many. Omitting "from" is equivalent to ".from(0)".
BucketSortPipelineAggregationBuilder paging = PipelineAggregatorBuilders.bucketSort(
"paging", List.of(new FieldSortBuilder("sum_total").order(SortOrder.DESC))).from(50).size(10);
AggregationBuilders.terms("by_customer").field("customer_id").subAggregation(sum).subAggregation(paging);
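As a rough mental model of what from and size do in bucket_sort, here is a small plain Java sketch. It is not Elasticsearch API: the class BucketSortPagingSketch, the method page and the map of per-customer sums are hypothetical names used only for illustration. The buckets are sorted by the metric in descending order, the first `from` buckets are skipped, and at most `size` buckets are kept.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical in-memory illustration, not Elasticsearch API.
class BucketSortPagingSketch {

    // sumPerCustomer: bucket key -> value of the "sum_total" metric for that bucket
    static List<String> page(Map<String, Double> sumPerCustomer, int from, int size) {
        return sumPerCustomer.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue(Comparator.reverseOrder())) // sort desc
                .skip(from)   // "from": number of buckets to skip
                .limit(size)  // "size": number of buckets to return
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

Note that bucket_sort only reorders and truncates the buckets that the parent terms aggregation returned, so the terms aggregation itself has to return enough buckets for the page you ask for.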
I wrote a query in MongoDB as follows:
db.getCollection('student').aggregate(
[
{
$match: { "student_age" : { "$ne" : 15 } }
},
{
$group:
{
_id: "$student_name",
count: {$sum: 1},
sum1: {$sum: "$student_age"}
}
}
])
In other words, I want to fetch the count of students who aren't 15 years old and the sum of their ages. The query works fine and I get two data items.
In my application, I want to do the query by Spring Data.
I wrote the following code:
Criteria where = Criteria.where("AGE").ne(15);
Aggregation aggregation = Aggregation.newAggregation(
Aggregation.match(where),
Aggregation.group().sum("student_age").as("totalAge"),
count().as("countOfStudentNot15YearsOld"));
When this code is run, the output query will be:
"aggregate" : "MyDocument", "pipeline" :
[ { "$match" : { "AGE" : { "$ne" : 15 } } },
{ "$group" : { "_id" : null, "totalAge" : { "$sum" : "$student_age" } } },
{ "$count" : "countOfStudentNot15YearsOld" }],
"cursor" : { "batchSize" : 2147483647 }
Unfortunately, the result contains only the countOfStudentNot15YearsOld item.
I want to get the same result as with my native query.
If you're asking to return the grouping for both "15" and "not 15" as a result, then you're looking for the $cond operator, which allows "branching" based on conditional evaluation.
From the shell, you would use it like this:
db.getCollection('student').aggregate([
{ "$group": {
"_id": null,
"countFifteen": {
"$sum": {
"$cond": [{ "$eq": [ "$student_age", 15 ] }, 1, 0 ]
}
},
"countNotFifteen": {
"$sum": {
"$cond": [{ "$ne": [ "$student_age", 15 ] }, 1, 0 ]
}
},
"sumNotFifteen": {
"$sum": {
"$cond": [{ "$ne": [ "$student_age", 15 ] }, "$student_age", 0 ]
}
}
}}
])
So you use $cond to perform a logical test, in this case whether "student_age" in the current document is 15 or not, and return a numerical value in response: 1 for "counting", or the actual field value when that is what you want to send to the accumulator. In short, it's a "ternary" operator, or if/then/else condition (which can also be written in the more expressive form with keys), that tests a condition and decides what to return.
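As a plain Java analogy (not MongoDB or Spring API; CondSumSketch and summarize are hypothetical names for illustration), the three accumulators behave like the loop below, where each ternary expression plays the role of one $cond and each += plays the role of $sum:

```java
import java.util.List;

// Hypothetical in-memory illustration of $sum over $cond, not MongoDB API.
class CondSumSketch {

    // Returns {countFifteen, countNotFifteen, sumNotFifteen} for the given ages.
    static int[] summarize(List<Integer> ages) {
        int countFifteen = 0;
        int countNotFifteen = 0;
        int sumNotFifteen = 0;
        for (int age : ages) {
            countFifteen += (age == 15) ? 1 : 0;    // $cond: [ $eq 15, 1, 0 ]
            countNotFifteen += (age != 15) ? 1 : 0; // $cond: [ $ne 15, 1, 0 ]
            sumNotFifteen += (age != 15) ? age : 0; // $cond: [ $ne 15, "$student_age", 0 ]
        }
        return new int[] { countFifteen, countNotFifteen, sumNotFifteen };
    }
}
```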
For the spring mongodb implementation you use ConditionalOperators.Cond to construct the same BSON expressions:
import org.springframework.data.mongodb.core.aggregation.*;
ConditionalOperators.Cond isFifteen = ConditionalOperators.when(new Criteria("student_age").is(15))
.then(1).otherwise(0);
ConditionalOperators.Cond notFifteen = ConditionalOperators.when(new Criteria("student_age").ne(15))
.then(1).otherwise(0);
ConditionalOperators.Cond sumNotFifteen = ConditionalOperators.when(new Criteria("student_age").ne(15))
.thenValueOf("student_age").otherwise(0);
GroupOperation groupStage = Aggregation.group()
.sum(isFifteen).as("countFifteen")
.sum(notFifteen).as("countNotFifteen")
.sum(sumNotFifteen).as("sumNotFifteen");
Aggregation aggregation = Aggregation.newAggregation(groupStage);
You just extend that logic, using .then() for a "constant" value such as 1 for the "counts", and .thenValueOf() where you need the "value" of a field from the document, which is equivalent to "$student_age" in the common shell notation.
Since ConditionalOperators.Cond shares the AggregationExpression interface, this can be used with .sum() in the form that accepts an AggregationExpression as opposed to a string. This is an improvement on past releases of spring mongo which would require you to perform a $project stage so there were actual document properties for the evaluated expression prior to performing a $group.
If all you want is to replicate the original query for spring mongodb, then your mistake was using the $count aggregation stage rather than appending to the group():
Criteria where = Criteria.where("AGE").ne(15);
Aggregation aggregation = Aggregation.newAggregation(
Aggregation.match(where),
Aggregation.group()
.sum("student_age").as("totalAge")
.count().as("countOfStudentNot15YearsOld")
);
I am new to programming and I want to get the sum of power used in a month from data stored in Elasticsearch. I used Sense and got the value, but I am finding it hard to do the same with the Java API in Scala. This is what I did:
POST /myIndext/myType/_search?search_type=dfs_query_then_fetch
{
"aggs": {
"duration": {
"date_histogram": {
"field": "Day",
"interval": "month",
"format": "yyyy-MM-dd"},
"aggs": {
"Power_total": {
"sum": {
"field": "myField"
}
}
}
}
}
}
RESULT WAS
( "aggregations": {
"duration": {
"buckets": [
{
"key_as_string": "2017-01-01",
"key": 1480550400000,
"doc_count": 619,
"myField": {
"value": 5218.066633789334
}
}
The Scala code is this:
val matchquery = QueryBuilders.matchQuery("ID", configurate)
val queryK = QueryBuilders.matchQuery("ID", configurate)
val filterA = QueryBuilders.rangeQuery("Day").gte("2017-01-02T00:00:05.383+0100").lte("2017-01-13T00:00:05.383+0100")
val query = QueryBuilders.filteredQuery(queryK, filterA)
val agg = AggregationBuilders.dateHistogram("duration")
.field("Day")
.interval(DateHistogramInterval.MONTH)
.minDocCount(0)
.extendedBounds(new DateTime("2017-01-01T00:00:05.383+0100"), new DateTime("2017-01-13T00:00:05.383+0100"))
.subAggregation(AggregationBuilders.sum("power_total").field("myField"))
val result: SearchResponse = client
.prepareSearch("myIndex")
.setTypes("myType")
.setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
.setQuery(query)
.addAggregation(agg)
.addSort("Day", SortOrder.DESC)
.setSize(815)
.addField("myField")
.execute()
.actionGet()
val results = result.getHits.getHits
println("Current results: " + results.length)
for (hit <- results) {
println("------------------------------")
val response = hit.getSource
println(response)
}
client.close()
RESULT WAS
current result = 0
Please let me know why I am not getting a value for "myField" like I did using Sense.
I have tried several times and still get the same problem. Could it be that I am not parsing the query response the right way?
Everything was correct. The only pitfall was that I was querying a date-time that is not stored in my database: instead of "2017-01-01", I was using "2017-01-02".
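The effect described above can be reproduced with a small plain Java sketch. It does not use the Elasticsearch client, and MonthlySumSketch, sumPerMonth and the (day, power) readings are hypothetical names chosen for illustration. Readings outside the queried date range are dropped before they are summed per month, so a range that starts one day after the stored date matches nothing.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory illustration, not Elasticsearch API.
class MonthlySumSketch {

    // readings: (day, power) pairs; from and to are inclusive bounds of the range filter.
    static Map<YearMonth, Double> sumPerMonth(List<Map.Entry<LocalDate, Double>> readings,
                                              LocalDate from, LocalDate to) {
        Map<YearMonth, Double> totals = new TreeMap<>();
        for (Map.Entry<LocalDate, Double> reading : readings) {
            LocalDate day = reading.getKey();
            if (day.isBefore(from) || day.isAfter(to)) {
                continue; // outside the range query, so the reading is never seen
            }
            // month bucket with a running sum, like date_histogram plus sum
            totals.merge(YearMonth.from(day), reading.getValue(), Double::sum);
        }
        return totals;
    }
}
```

With readings stored only on 2017-01-01, a range starting at 2017-01-02 yields an empty map, while a range starting at 2017-01-01 yields one bucket for January 2017.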
This is my code in Marvel Sense:
GET /sweet/cake/_search
{
"query": {
"bool": {
"must": [
{"term": {
"code":"18"
}}
]
}
},
"size": 0,
"aggs": {
"group_by_state": {
"terms": {
"field": "id"
}
}
}
}
And I want to write it in Java, but I don't know how.
You can find some examples in the official documentation for the Java client.
But in your case, you need to create one bool/must query using the QueryBuilders and one terms aggregation using the AggregationBuilders. It goes like this:
// build the query
BoolQueryBuilder query = QueryBuilders.boolQuery()
.must(QueryBuilders.termQuery("code", "18"));
// build the terms aggregation
// (the builder type is TermsBuilder in 1.x/2.x and TermsAggregationBuilder from 5.0 on)
TermsBuilder stateAgg = AggregationBuilders.terms("group_by_state")
.field("id");
SearchResponse resp = client.prepareSearch("sweet")
.setTypes("cake")
.setQuery(query)
.setSize(0)
.addAggregation(stateAgg)
.execute()
.actionGet();
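For readers who want to check what the request should return, the following plain Java sketch mimics it in memory. It is not Elasticsearch API: TermFilterCountSketch, docCountPerId and the (code, id) pairs are hypothetical names for illustration. Documents are first filtered by the term on "code", then counted per "id", which corresponds to the doc_count of each bucket in "group_by_state".

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical in-memory illustration, not Elasticsearch API.
class TermFilterCountSketch {

    // docs: (code, id) pairs; returns id -> number of documents with the given code.
    static Map<String, Long> docCountPerId(List<Map.Entry<String, String>> docs, String code) {
        return docs.stream()
                .filter(doc -> doc.getKey().equals(code)) // bool/must term query on "code"
                .collect(Collectors.groupingBy(Map.Entry::getValue, Collectors.counting())); // terms on "id"
    }
}
```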
I found this article in the Spring Forum, which obviously discusses partly the same problem but has no answer to my question.
Given the following document...
{
"_id": { "$oid": "5214b5d529ee12460939e2ba"},
"title": "this is my title",
"tags": [ "fun", "sport" ],
"comments": [
{
"author": "alex",
"text": "this is cool",
"createdAt": 1
},
{
"author": "sam",
"text": "this is bad",
"createdAt": 2
},
{
"author": "jenny",
"text": "this is bad",
"createdAt": 3
}
]
}
... I want to do this aggregation (Javascript) ...
//This is as concise as possible to focus on the actual problem which is the sort operation when ported to Spring!
db.articles.aggregate(
{$unwind:"$comments"},
//do more like match, group, etc...
{$sort:{"comments.createdAt":-1}} //Sort descending -> here the problem occurs in Spring (works in Javascript!)
);
... but with Spring -> Throws Invalid Reference!
Aggregation agg = newAggregation(
unwind("comments"),
sort(Direction.DESC, "comments.createdAt") //Throws invalid reference 'comments.createdAt'!
//How can I make this work?
);
Of course I can do it with the native Java-Driver and without usage of Spring's MongoTemplate but I don't like this approach very much. What can I do to make this exact aggregation work with Spring?
I am using the current Version 1.4.0.RELEASE.
The code as posted indeed works successfully - the problem I had was something else.
I did something like this:
Aggregation agg = newAggregation(
project("comments"), //This was the problem! Without this it works as desired!
unwind("comments"),
sort(Direction.DESC, "comments.createdAt")
);
As I wrote in the code, I wanted to project only the comments field to save some overhead, but this actually caused my problem!
Thanks a lot for the hint!
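For reference, the result the unwind and sort stages are expected to produce can be sketched in plain Java. This is not Spring Data or MongoDB API: UnwindSortSketch, authorsNewestFirst and the (author, createdAt) pairs are hypothetical names for illustration. Each article's comments array is flattened into one stream (the role of $unwind) and the comments are then sorted by createdAt in descending order.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical in-memory illustration, not Spring Data MongoDB API.
class UnwindSortSketch {

    // articles: one list of (author, createdAt) comment pairs per article.
    static List<String> authorsNewestFirst(List<List<Map.Entry<String, Integer>>> articles) {
        return articles.stream()
                .flatMap(List::stream) // $unwind: one element per comment
                .sorted(Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder())) // $sort createdAt desc
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```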