Aggregations with Java High Level Rest Client - java

Edit: I'm using Elasticsearch 7.3.0.
I'm trying to run a query with an aggregation and a sub-aggregation, but the sub-aggregation is absent from the SearchResponse.
As part of debugging, I ran my query in a unit test, copied the generated query, and ran it manually with Postman. There, the response is exactly what I expect, but for some reason parts are missing in my Java code.
SearchRequest request = new SearchRequest("index");
SearchSourceBuilder search = new SearchSourceBuilder();
SortBuilder sortByDate = SortBuilders
.fieldSort("date")
.order(SortOrder.DESC);
// Getting the latest result for each bucket
TopHitsAggregationBuilder latestResults = AggregationBuilders
.topHits("latest")
.sort(sortByDate)
.fetchSource("*","")
.size(1);
// Aggregate per service
TermsAggregationBuilder perService = AggregationBuilders
.terms("services")
.field("service.service_id")
.subAggregation(latestResults);
search.aggregation(perService);
search.size(1);
request.source(search);
SearchResponse response = client.search(request, RequestOptions.DEFAULT);
Here is the request generated:
{
"size": 0,
"aggregations": {
"services": {
"terms": {
"field": "service.service_id",
"size": 10,
"min_doc_count": 1,
"shard_min_doc_count": 0,
"show_term_doc_count_error": false,
"order": [
{
"_count": "desc"
},
{
"_key": "asc"
}
]
},
"aggregations": {
"latest": {
"top_hits": {
"from": 0,
"size": 1,
"version": false,
"seq_no_primary_term": false,
"explain": false,
"_source": {
"includes": [
"*"
],
"excludes": [
""
]
},
"sort": [
{
"date": {
"order": "desc"
}
}
]
}
}
}
}
}
}
In my code, response is
{
...
"aggregations": {
"sterms#services": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": []
}
}
}
If I run the same query manually I get
{
...
"aggregations": {
"services": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "09045f59-3709-4769-8c92-d611f773a401",
"doc_count": 2,
"latest": {
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": null,
"hits": [ ... ]

Related

Getting Unique results for Objects inside List in Elasticsearch

I have mapping like this
"custom_metadata": {
"properties": {
"key": {
"type": "keyword"
},
"value": {
"type": "keyword"
}
}
}
The ingested data looks like this
// data in document 1
"custom_metadata": [
{
"value": "NPL",
"key": "schema"
},
{
"value": "SAPERP",
"key": "system"
}
]
// data in document 2
"custom_metadata": [
{
"value": "trial",
"key": "schema"
},
{
"value": "Oracle",
"key": "system"
}
]
I want to aggregate on each key and get the relevant values in the search results, like this:
"schema": [
{"value": "NPL"},
{"value": "trial"}
],
"system": [
{"value": "SAPERP"},
{"value": "Oracle"}
]
Note: the output above is only a representation. If I get something like it from ES, I can parse it and build the desired result on the service side.
What I have tried:
"custom_metadata_key": {
"terms": {
"field": "custom_metadata.key"
},
"aggregations": {
"custom_metadata_value": {
"terms": {
"field": "custom_metadata.value"
}
}
}
}
The sub-aggregation above aggregates on each key but returns all the values for every key:
{
"key" : "schema",
"doc_count" : 2,
"custom_metadata_value" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Oracle",
"doc_count" : 2
},
{
"key" : "NPL",
"doc_count" : 1
},
{
"key" : "SAPERP",
"doc_count" : 1
},
{
"key" : "trial",
"doc_count" : 1
}
]
}
}
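The output above repeats for every key and gives the same value buckets for all of them.
You need to change the data type of the custom_metadata field from object to nested; then you can achieve the desired output easily. The type of an existing field cannot be changed in place, so this means creating a new index with the mapping below and reindexing. The query below uses the .keyword sub-fields, which exist when key and value are mapped dynamically; with the explicit keyword mapping from the question, use custom_metadata.key and custom_metadata.value instead.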
Mapping
{
"mappings": {
"properties": {
"custom_metadata":{
"type": "nested"
}
}
}
}
Query
{
"size": 0,
"aggs": {
"data": {
"nested": {
"path": "custom_metadata"
},
"aggs": {
"custom_metadata_key": {
"terms": {
"field": "custom_metadata.key.keyword",
"size": 10
},
"aggs": {
"custom_metadata_value": {
"terms": {
"field": "custom_metadata.value.keyword",
"size": 10
}
}
}
}
}
}
}
}
Output
"aggregations": {
"data": {
"doc_count": 4,
"custom_metadata_key": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "schema",
"doc_count": 2,
"custom_metadata_value": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "NPL",
"doc_count": 1
},
{
"key": "trial",
"doc_count": 1
}
]
}
},
{
"key": "system",
"doc_count": 2,
"custom_metadata_value": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Oracle",
"doc_count": 1
},
{
"key": "SAPERP",
"doc_count": 1
}
]
}
}
]
}
}
}
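To see why the object mapping mixes values across keys while the nested mapping keeps them paired, here is a minimal plain-Java sketch. It does not use the Elasticsearch client; it only models the behaviour under the assumption that an object array is flattened into one list of keys and one list of values per document, while each nested entry is indexed as its own hidden document. All class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Plain-Java model of terms -> terms aggregation over custom_metadata.
// Hypothetical names; this is an illustration, not Elasticsearch code.
class MetadataMappingSketch {

    // One {key, value} entry of the custom_metadata array.
    static final class Entry {
        final String key;
        final String value;
        Entry(String key, String value) { this.key = key; this.value = value; }
    }

    // "object" mapping: per document the array is flattened into a set of
    // keys and a set of values, so every key is counted with every value.
    static Map<String, Map<String, Integer>> aggregateAsObject(List<List<Entry>> docs) {
        Map<String, Map<String, Integer>> result = new TreeMap<>();
        for (List<Entry> doc : docs) {
            Set<String> keys = new TreeSet<>();
            Set<String> values = new TreeSet<>();
            for (Entry e : doc) {
                keys.add(e.key);
                values.add(e.value);
            }
            for (String k : keys) {
                for (String v : values) {
                    result.computeIfAbsent(k, x -> new TreeMap<>()).merge(v, 1, Integer::sum);
                }
            }
        }
        return result;
    }

    // "nested" mapping: each entry is its own unit, so key and value stay paired.
    static Map<String, Map<String, Integer>> aggregateAsNested(List<List<Entry>> docs) {
        Map<String, Map<String, Integer>> result = new TreeMap<>();
        for (List<Entry> doc : docs) {
            for (Entry e : doc) {
                result.computeIfAbsent(e.key, x -> new TreeMap<>()).merge(e.value, 1, Integer::sum);
            }
        }
        return result;
    }

    // The two documents from the question.
    static List<List<Entry>> sampleDocs() {
        return Arrays.asList(
            Arrays.asList(new Entry("schema", "NPL"), new Entry("system", "SAPERP")),
            Arrays.asList(new Entry("schema", "trial"), new Entry("system", "Oracle")));
    }

    public static void main(String[] args) {
        System.out.println("object: " + aggregateAsObject(sampleDocs()));
        System.out.println("nested: " + aggregateAsNested(sampleDocs()));
    }
}
```

With the two sample documents, the object variant lists all four values under both schema and system, while the nested variant yields only NPL and trial under schema, and Oracle and SAPERP under system, which is the shape of the output above.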

Elasticsearch: Filter the records based on nested field with nested field containing only the filtered object

I am trying to filter the records based on nested field and want only the matching object in that array to be shown as part of the record.
Below is the detailed explanation of my requirement.
So, I have Elasticsearch data like this:
[{
"basicInfo": {
"requestId": 123,
},
"managerInfo": {
"manager": "John",
},
"groupInfo": [
{
"id": "id1",
"name": "abc",
"status": "Approved"
},
{
"id": "id2",
"name": "abc",
"status": "Pending"
}
]
},
{
"basicInfo": {
"requestId": 233,
},
"managerInfo": {
"manager": "John Sr",
},
"groupInfo": [
{
"id": "id3",
"name": "abc",
"status": "Pending"
}
]
}
]
I want to filter the records only with groupInfo.status as Approved and basicInfo.requestId as 123, but my condition is I should only get the Approved record in the groupInfo and not the pending ones. So, the output I am expecting is:
{
"took": 23,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 3.0602708,
"hits": [
{
"_index": "my_index",
"_type": "request",
"_id": "123",
"_score": 3.0602708,
"_source": {
"basicInfo": {
"requestId": 123
},
"managerInfo": {
"manager": "John"
},
"groupInfo": [
{
"id": "id1",
"name": "abc",
"status": "Approved"
}
// No id2 here as it is in pending state
]
}
}
]
}
}
But instead I am able to achieve:
{
"took": 23,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 3.0602708,
"hits": [
{
"_index": "my_index",
"_type": "request",
"_id": "123",
"_score": 3.0602708,
"_source": {
"basicInfo": {
"requestId": 123
},
"managerInfo": {
"manager": "John"
},
"groupInfo": [
{
"id": "id1",
"name": "abc",
"status": "Approved"
},
{
"id": "id2",
"name": "abc",
"status": "Pending"
}
]
}
}
]
}
}
This is the query I am using:
{
"query": {
"bool": {
"must": [
{
"match": {
"basicInfo.requestId": "123"
}
},
{
"nested": {
"path": "groupInfo",
"query": {
"bool": {
"must": [
{
"term": {
"groupInfo.status": "Approved"
}
}
]
}
}
}
}
]
}
}
}
So my first question is: is what I am expecting even possible? Can we filter the result so that we get only the matching objects of that array?
If yes, how can we do it?
Thanks in advance.
Maybe you are looking for Inner Hits.
In many cases, it’s very useful to know which inner nested objects (in
the case of nested) or children/parent documents (in the case of
parent/child) caused certain information to be returned. The inner
hits feature can be used for this. This feature returns per search hit
in the search response additional nested hits that caused a search hit
to match in a different scope.
{
"query": {
"bool": {
"must": [
{
"match": {
"basicInfo.requestId": "123"
}
},
{
"nested": {
"path": "groupInfo",
"query": {
"bool": {
"must": [
{
"term": {
"groupInfo.status": "Approved"
}
}
]
}
},
"inner_hits":{}
}
}
]
}
}
}
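Note that inner_hits does not rewrite _source: each hit still carries the full groupInfo array, and the matching nested objects are returned separately in that hit's inner_hits section (keyed by the nested path, here groupInfo, unless a name is given). The plain-Java sketch below only models that response shape so the distinction is explicit; it does not call Elasticsearch, and all names in it are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of a nested query with inner_hits.
// Hypothetical names; this is an illustration, not Elasticsearch code.
class InnerHitsSketch {

    // One element of the groupInfo array.
    static final class Group {
        final String id;
        final String status;
        Group(String id, String status) { this.id = id; this.status = status; }
    }

    // One search hit: the untouched source plus the nested objects that matched.
    static final class Hit {
        final List<Group> source;
        final List<Group> innerHits;
        Hit(List<Group> source, List<Group> innerHits) {
            this.source = source;
            this.innerHits = innerHits;
        }
    }

    // Returns null when no nested object matches (the document is not a hit).
    static Hit nestedTermQuery(List<Group> groupInfo, String status) {
        List<Group> matched = new ArrayList<>();
        for (Group g : groupInfo) {
            if (g.status.equals(status)) {
                matched.add(g);
            }
        }
        return matched.isEmpty() ? null : new Hit(groupInfo, matched);
    }

    public static void main(String[] args) {
        List<Group> groupInfo = Arrays.asList(
            new Group("id1", "Approved"),
            new Group("id2", "Pending"));
        Hit hit = nestedTermQuery(groupInfo, "Approved");
        System.out.println("groups in _source: " + hit.source.size());
        System.out.println("groups in inner_hits: " + hit.innerHits.size());
    }
}
```

In other words, the client should read the approved groups from the inner hits of each result rather than from groupInfo in _source.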

how to translate nested elastic search query in java?

The query below filters and aggregates. How do I translate it to Java? It works from Postman, and I need to build the same query with the Java client API; I am using the REST High Level Client as the Elasticsearch client. I tried the Java code below, but the query it generates is a bit different from the working one.
BoolQueryBuilder booleanQuery = QueryBuilders.boolQuery();
booleanQuery.filter(QueryBuilders
.queryStringQuery(String.join(" OR ", exactMatchThese))
.field("events.recommendationData.exceptionId"));
QueryBuilder queryBuilder = QueryBuilders.nestedQuery("events.recommendationData", booleanQuery, ScoreMode.None);
Search Query which is working
GET <index-name>/_search
{
"query": {
"bool": {
"filter": [
{
"nested": { --> note
"path": "events.recommendationData",
"query": {
"query_string": {
"query": "\"1\" OR \"2\"",
"fields": [
"events.recommendationData.exceptionId"
],
"type": "best_fields",
"default_operator": "or",
"max_determinized_states": 10000,
"enable_position_increments": true,
"fuzziness": "AUTO",
"fuzzy_prefix_length": 0,
"fuzzy_max_expansions": 50,
"phrase_slop": 0,
"escape": false,
"auto_generate_synonyms_phrase_query": true,
"fuzzy_transpositions": true,
"boost": 1
}
}
}
}
]
}
},
"size": 1,
"aggs": {
"genres": {
"nested": {
"path": "events.recommendationData.recommendations"
},
"aggs": {
"nested_comments_recomms": {
"terms": {
"field": "events.recommendationData.recommendations.recommendationType"
}
}
}
}
}
}
Below is the search query generated by the Java code above; it does not work.
{
"query": {
"nested": {
"query": {
"bool": {
"filter": [
{
"query_string": {
"query": "\"1\" OR \"2\"",
"fields": [
"events.recommendationData.exceptionId^1.0"
],
"type": "best_fields",
"default_operator": "or",
"max_determinized_states": 10000,
"enable_position_increments": true,
"fuzziness": "AUTO",
"fuzzy_prefix_length": 0,
"fuzzy_max_expansions": 50,
"phrase_slop": 0,
"escape": false,
"auto_generate_synonyms_phrase_query": true,
"fuzzy_transpositions": true,
"boost": 1
}
}
],
"adjust_pure_negative": true,
"boost": 1
}
},
"path": "events.recommendationData",
"ignore_unmapped": false,
"score_mode": "none",
"boost": 1
}
},
"aggregations": {
"recommendationTypes": {
"terms": {
"field": "events.recommendationData.recommendations.recommendationType",
"size": 10,
"min_doc_count": 1,
"shard_min_doc_count": 0,
"show_term_doc_count_error": false,
"order": [
{
"_count": "desc"
},
{
"_key": "asc"
}
]
}
}
}
}
Your innermost query block is the query string (keeping the field restriction from your original code), i.e.
QueryStringQueryBuilder queryString = QueryBuilders
.queryStringQuery(String.join(" OR ", exactMatchThese))
.field("events.recommendationData.exceptionId");
This is the query part of the nested query, so create a nested query and pass the query string to it:
NestedQueryBuilder nestedQuery = QueryBuilders
.nestedQuery("events.recommendationData", queryString, ScoreMode.None);
Finally, add the nested query to the filter clause of a bool query:
BoolQueryBuilder boolQuery = QueryBuilders.boolQuery().filter(nestedQuery);
All together:
QueryStringQueryBuilder queryString = QueryBuilders
.queryStringQuery(String.join(" OR ", exactMatchThese))
.field("events.recommendationData.exceptionId");
NestedQueryBuilder nestedQuery = QueryBuilders
.nestedQuery("events.recommendationData", queryString, ScoreMode.None);
BoolQueryBuilder boolQuery = QueryBuilders.boolQuery().filter(nestedQuery);

Limiting Bucket Size Returned By Aggregation

I have a huge amount of Elasticsearch data. I need to run aggregations and return buckets, but I want to limit the amount of data returned from Elasticsearch so that I get only a sample of the data, not all of it.
I have tried adding a "size" attribute, but it is not accepted in bucketing aggregations.
{
"size": 0,
"query": {
"bool": {
"adjust_pure_negative": true,
"boost": 1
}
},
"aggregations": {
"my_agg_1": {
"histogram": {
"field": "coAt",
"interval": 86400,
"offset": 1558216800,
"order": {
"_key": "asc"
},
"keyed": false,
"min_doc_count": 1
},
"aggregations": {
"my_agg_2": {
"terms": {
"field": "atr1",
"missing": "NaN",
"value_type": "string",
"size": 2147483647,
"min_doc_count": 1,
"shard_min_doc_count": 0,
"show_term_doc_count_error": false,
"order": [
{
"_count": "desc"
},
{
"_key": "asc"
}
]
},
"aggregations": {
"atr2": {
"top_hits": {
"from": 0,
"size": 1,
"version": false,
"explain": false,
"sort": [
{
"coAt": {
"order": "desc"
}
}
]
}
},
"clientIP_count": {
"value_count": {
"field": "clientIP"
}
}
}
}
}
}
}
}

get a specific value based on criteria from elasticsearch array using java

My Elasticsearch data looks like this:
{
"took": 12,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 1,
"hits": [
{
"_index": "eswar",
"_type": "azure",
"_id": "AV6y005oafdLlkt7Fe-R",
"_score": 1,
"_source": {
"costs": [
{
"cost": 3.6,
"endDate": "2017-02-15T00:00:00+00:00",
"startDate": "2017-02-14T00:00:00+00:00"
},
{
"cost": 2,
"endDate": "2017-02-14T00:00:00+00:00",
"startDate": "2017-02-13T00:00:00+00:00"
}
],
"externalUUID": "/subscriptions/9ee6993f-a036-4118-9eab-c66d9fda1ef3/resourceGroups/VISTARAGATEWAYIMAGE/providers/Microsoft.Compute/disks/VistaraGateway01_disk1_ec7798e17f934e6483ed5d2490e80d98",
"clientId": 154,
"region": "useast",
"cloudProviderId": 57063
}
},
{
"_index": "eswar",
"_type": "azure",
"_id": "AV6y00rmafdLlkt7Fe-Q",
"_score": 1,
"_source": {
"costs": [
{
"cost": 0,
"endDate": "2017-02-14T00:00:00+00:00",
"startDate": "2017-02-13T00:00:00+00:00"
},
{
"cost": 3,
"endDate": "2017-02-17T00:00:00+00:00",
"startDate": "2017-02-16T00:00:00+00:00"
}
],
"externalUUID": "/subscriptions/9ee6993f-a036-4118-9eab-c66d9fda1ef3/resourceGroups/vistaragatewayimage/providers/Microsoft.Compute/virtualMachines/VistaraGateway",
"clientId": 154,
"region": "eastus",
"cloudProviderId": 57063
}
}
]
}
}
I want to get costs.cost: 3.6 as the aggregation result, but I am getting 5.
How can I filter data inside an array?
RangeQueryBuilder startDateRQB = QueryBuilders.rangeQuery("costs.startDate").gte("2017-02-14T00:00:00+00:00");
RangeQueryBuilder endDateRQB = QueryBuilders.rangeQuery("costs.endDate").lte("2017-02-15T00:00:00+00:00");
RegexpQueryBuilder deviceNameREQB= QueryBuilders.regexpQuery("region", "useast.*");
BoolQueryBuilder bQB=QueryBuilders.boolQuery().must(deviceNameREQB).must(startDateRQB).must(endDateRQB);
BoolQueryBuilder sQB=QueryBuilders.boolQuery().must(startDateRQB).must(endDateRQB);
SearchResponse response = client.prepareSearch(index).setQuery(bQB).addAggregation(AggregationBuilders.sum("Totalcost").field("costs.cost")).execute().actionGet();
Sum sum=response.getAggregations().get("Totalcost");
double cost=sum.getValue();
System.out.println(cost);
I suggest you define costs as a nested object.
Then you will be able to add conditions on the data inside it (the nested documents).
This approach opens up a wide range of possibilities for your queries.
Have a look at the following solution:
{
"size": 0,
"query": {
"regexp": {
"region": "useast.*"
}
},
"aggregations": {
"costs_agg": {
"nested": {
"path": "costs"
},
"aggregations": {
"bool_agg": {
"filter": {
"bool": {
"must": [
{
"range": {
"costs.startDate": {
"gte": "2017-02-14T00:00:00+00:00"
}
}
},
{
"range": {
"costs.endDate": {
"lte": "2017-02-15T00:00:00+00:00"
}
}
}
]
}
},
"aggregations": {
"cost_sum_agg": {
"sum": {
"field": "costs.cost"
}
}
}
}
}
}
}
}
Let me explain each part (aggregations by name):
query: region is a field of the parent document, not of the nested costs objects, so it is filtered in the top-level query (a regexp query, as in your Java code)
costs_agg: nested aggregation that moves into the costs scope
bool_agg: a filter aggregation wrapping a bool query. The thing with aggregating over nested objects is that a query above the aggregation only selects parent documents; it won't filter the individual nested objects. The solution here is to filter the needed nested documents inside the aggregation itself
cost_sum_agg: the final sum over the nested objects that passed the filter
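The difference between filtering whole documents and filtering individual costs entries can be illustrated with a plain-Java model of the two sample documents. This is a sketch, not Elasticsearch code: it assumes that a document-level query matches when any array element satisfies each condition and that the sum then covers every cost of the matching document, whereas the nested filter checks both date conditions on the same entry. All class and method names are hypothetical.

```java
import java.time.OffsetDateTime;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of document-level vs. nested-level filtering before a sum.
// Hypothetical names; this is an illustration, not Elasticsearch code.
class NestedCostSumSketch {

    static final class Cost {
        final double cost;
        final OffsetDateTime start;
        final OffsetDateTime end;
        Cost(double cost, String start, String end) {
            this.cost = cost;
            this.start = OffsetDateTime.parse(start);
            this.end = OffsetDateTime.parse(end);
        }
    }

    static final class Doc {
        final String region;
        final List<Cost> costs;
        Doc(String region, List<Cost> costs) { this.region = region; this.costs = costs; }
    }

    static final OffsetDateTime FROM = OffsetDateTime.parse("2017-02-14T00:00:00+00:00");
    static final OffsetDateTime TO = OffsetDateTime.parse("2017-02-15T00:00:00+00:00");

    // Document-level filter: each condition may be met by a different entry,
    // and every cost of a matching document is summed.
    static double sumWithDocumentLevelFilter(List<Doc> docs) {
        double sum = 0;
        for (Doc d : docs) {
            boolean startOk = d.costs.stream().anyMatch(c -> !c.start.isBefore(FROM));
            boolean endOk = d.costs.stream().anyMatch(c -> !c.end.isAfter(TO));
            if (d.region.matches("useast.*") && startOk && endOk) {
                for (Cost c : d.costs) sum += c.cost;
            }
        }
        return sum;
    }

    // Nested-level filter: region is checked on the document, the date range
    // on each cost entry, and only matching entries are summed.
    static double sumWithNestedFilter(List<Doc> docs) {
        double sum = 0;
        for (Doc d : docs) {
            if (!d.region.matches("useast.*")) continue;
            for (Cost c : d.costs) {
                if (!c.start.isBefore(FROM) && !c.end.isAfter(TO)) sum += c.cost;
            }
        }
        return sum;
    }

    // The two documents from the question.
    static List<Doc> sampleDocs() {
        return Arrays.asList(
            new Doc("useast", Arrays.asList(
                new Cost(3.6, "2017-02-14T00:00:00+00:00", "2017-02-15T00:00:00+00:00"),
                new Cost(2, "2017-02-13T00:00:00+00:00", "2017-02-14T00:00:00+00:00"))),
            new Doc("eastus", Arrays.asList(
                new Cost(0, "2017-02-13T00:00:00+00:00", "2017-02-14T00:00:00+00:00"),
                new Cost(3, "2017-02-16T00:00:00+00:00", "2017-02-17T00:00:00+00:00"))));
    }

    public static void main(String[] args) {
        System.out.println("document-level filter: " + sumWithDocumentLevelFilter(sampleDocs()));
        System.out.println("nested-level filter:   " + sumWithNestedFilter(sampleDocs()));
    }
}
```

In this model the document-level variant adds both costs of the first document (3.6 and 2), while the nested-level variant keeps only the 3.6 entry whose own start and end dates fall inside the range.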
Hope it helps.
