My Elasticsearch data looks like this:
{
"took": 12,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 1,
"hits": [
{
"_index": "eswar",
"_type": "azure",
"_id": "AV6y005oafdLlkt7Fe-R",
"_score": 1,
"_source": {
"costs": [
{
"cost": 3.6,
"endDate": "2017-02-15T00:00:00+00:00",
"startDate": "2017-02-14T00:00:00+00:00"
},
{
"cost": 2,
"endDate": "2017-02-14T00:00:00+00:00",
"startDate": "2017-02-13T00:00:00+00:00"
}
],
"externalUUID": "/subscriptions/9ee6993f-a036-4118-9eab-c66d9fda1ef3/resourceGroups/VISTARAGATEWAYIMAGE/providers/Microsoft.Compute/disks/VistaraGateway01_disk1_ec7798e17f934e6483ed5d2490e80d98",
"clientId": 154,
"region": "useast",
"cloudProviderId": 57063
}
},
{
"_index": "eswar",
"_type": "azure",
"_id": "AV6y00rmafdLlkt7Fe-Q",
"_score": 1,
"_source": {
"costs": [
{
"cost": 0,
"endDate": "2017-02-14T00:00:00+00:00",
"startDate": "2017-02-13T00:00:00+00:00"
},
{
"cost": 3,
"endDate": "2017-02-17T00:00:00+00:00",
"startDate": "2017-02-16T00:00:00+00:00"
}
],
"externalUUID": "/subscriptions/9ee6993f-a036-4118-9eab-c66d9fda1ef3/resourceGroups/vistaragatewayimage/providers/Microsoft.Compute/virtualMachines/VistaraGateway",
"clientId": 154,
"region": "eastus",
"cloudProviderId": 57063
}
}
]
}
}
I want the aggregation result to be 3.6 (the costs.cost of the array element that falls in the date range), but I am getting 5.
How can I filter the data inside the array as well?
// Date-range conditions on fields inside the costs array
RangeQueryBuilder startDateRQB = QueryBuilders.rangeQuery("costs.startDate").gte("2017-02-14T00:00:00+00:00");
RangeQueryBuilder endDateRQB = QueryBuilders.rangeQuery("costs.endDate").lte("2017-02-15T00:00:00+00:00");
// Condition on the top-level region field
RegexpQueryBuilder deviceNameREQB = QueryBuilders.regexpQuery("region", "useast.*");
BoolQueryBuilder bQB = QueryBuilders.boolQuery().must(deviceNameREQB).must(startDateRQB).must(endDateRQB);
// Sum costs.cost over all documents matched by bQB
SearchResponse response = client.prepareSearch(index)
        .setQuery(bQB)
        .addAggregation(AggregationBuilders.sum("Totalcost").field("costs.cost"))
        .execute()
        .actionGet();
Sum sum = response.getAggregations().get("Totalcost");
double cost = sum.getValue();
System.out.println(cost);
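I suggest defining costs as a nested object (a field of type "nested" in the mapping; changing an existing field to nested requires reindexing the data).
Then you will be able to add conditions on the data inside each array element (the nested documents).
This approach opens up a wide range of possibilities for your queries.
Have a look at the following solution: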
{
"size": 0,
"query": {
"regexp": {
"region": "useast.*"
}
},
"aggregations": {
"costs_agg": {
"nested": {
"path": "costs"
},
"aggregations": {
"bool_agg": {
"filter": {
"bool": {
"must": [
{
"range": {
"costs.startDate": {
"gte": "2017-02-14T00:00:00+00:00"
}
}
},
{
"range": {
"costs.endDate": {
"lte": "2017-02-15T00:00:00+00:00"
}
}
}
]
}
},
"aggregations": {
"cost_sum_agg": {
"sum": {
"field": "costs.cost"
}
}
}
}
}
}
}
}
Here is what each aggregation (by name) does:
costs_agg: a nested aggregation that steps into the costs scope.
bool_agg: filters the nested documents by date range. A query above the aggregation selects whole parent documents but does not filter the nested objects inside them, so the needed nested documents have to be filtered inside the aggregation itself.
cost_sum_agg: the final sum of costs.cost over the nested documents that passed the filter.
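To see why the original query sums every element while the nested version does not, the two behaviours can be mimicked in plain Java over the two sample documents. This is only an in-memory sketch, not Elasticsearch code: the class NestedCostSum and the records Cost and Doc are hypothetical, and String.matches stands in for the regexp query. With a plain object mapping, Elasticsearch flattens costs into independent arrays of startDate, endDate and cost values, so the two range conditions can be met by different elements and the sum covers every cost of each matching document; a nested mapping keeps each element's fields together.

```java
import java.util.List;

public class NestedCostSum {
    // Hypothetical in-memory stand-ins for the two sample documents.
    record Cost(double cost, String startDate, String endDate) {}
    record Doc(String region, List<Cost> costs) {}

    static final String FROM = "2017-02-14T00:00:00+00:00";
    static final String TO = "2017-02-15T00:00:00+00:00";

    static final List<Doc> DOCS = List.of(
        new Doc("useast", List.of(
            new Cost(3.6, "2017-02-14T00:00:00+00:00", "2017-02-15T00:00:00+00:00"),
            new Cost(2, "2017-02-13T00:00:00+00:00", "2017-02-14T00:00:00+00:00"))),
        new Doc("eastus", List.of(
            new Cost(0, "2017-02-13T00:00:00+00:00", "2017-02-14T00:00:00+00:00"),
            new Cost(3, "2017-02-16T00:00:00+00:00", "2017-02-17T00:00:00+00:00"))));

    // All dates share one format and offset, so string order equals time order.
    static boolean startsInRange(Cost c) { return c.startDate().compareTo(FROM) >= 0; }
    static boolean endsInRange(Cost c) { return c.endDate().compareTo(TO) <= 0; }

    // Object mapping: each condition may be met by a different element,
    // and the sum then covers every cost of every matching document.
    static double flattenedSum() {
        return DOCS.stream()
            .filter(d -> d.region().matches("useast.*"))
            .filter(d -> d.costs().stream().anyMatch(NestedCostSum::startsInRange)
                      && d.costs().stream().anyMatch(NestedCostSum::endsInRange))
            .flatMap(d -> d.costs().stream())
            .mapToDouble(Cost::cost)
            .sum();
    }

    // Nested mapping + filter aggregation: only elements meeting both conditions are summed.
    static double nestedSum() {
        return DOCS.stream()
            .filter(d -> d.region().matches("useast.*"))
            .flatMap(d -> d.costs().stream())
            .filter(c -> startsInRange(c) && endsInRange(c))
            .mapToDouble(Cost::cost)
            .sum();
    }

    public static void main(String[] args) {
        System.out.println("flattened: " + flattenedSum());
        System.out.println("nested:    " + nestedSum());
    }
}
```

The sketch only models which elements end up in the sum, not how costs.cost is mapped, so its flattened total need not equal the number reported in the question.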
Hope it helps.
I am using Java to query Elasticsearch via the ElasticSearchClient. Because the returned variables are large, I would like to retrieve only the relevant ones, but the variables in _source are nested.
Below is a sample response (hits from multiple indices can be returned, all with the same _source structure):
[
{
"_index": "kn-tas-20200630",
"_type": "_doc",
"_id": "1122334455",
"_score": null,
"_source": {
"variables": [
{
"rawValue": "DEFH",
"name": "MANAGER"
},
{
"rawValue": "ABCD",
"name": "EMPLOYEE"
},
{
"rawValue": "[{\"rowId\":102030,\"rowType\":\"SIM\"}]",
"name": "extData"
}
]
},
"sort": [
1665735632119
]
}
]
I would like to build a query with SearchSourceBuilder that retrieves only the following:
The rawValue for a given name (I provide MANAGER, I get "DEFH")
The rowType value inside extData (I provide extData and rowType, I get "SIM")
Below is my query:
{
"from": 0,
"size": 100,
"query": {
"bool": {
"must": [
{
"terms": {
"prcKey": [
"K-112"
],
"boost": 1.0
}
}
],
"must_not": [
{
"exists": {
"field": "endDate",
"boost": 1.0
}
},
{
"term": {
"personInCharge": {
"value": "ABC",
"boost": 1.0
}
}
}
],
"adjust_pure_negative": true,
"boost": 1.0
}
},
"_source": {
"includes": [
"variables.name",
"variables.rawValue"
],
"excludes": []
},
"sort": [
{
"createTime": {
"order": "desc"
}
}
]
}
How can I fix my query? I tried using nested queries but without any luck.
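_source filtering (includes/excludes) selects fields by path, not array elements by value, so variables.name and variables.rawValue come back for every element. One workaround, sketched here under assumptions, is to pick the wanted values out on the client after the search. The Variable record, the class VariableLookup and its helper methods are hypothetical, and the regex-based extraction of rowType is a naive stand-in; a real JSON parser (for example Jackson) would be more robust.

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VariableLookup {
    // Hypothetical stand-in for one element of _source.variables.
    record Variable(String name, String rawValue) {}

    // The variables array from the sample hit in the question.
    static final List<Variable> SAMPLE = List.of(
        new Variable("MANAGER", "DEFH"),
        new Variable("EMPLOYEE", "ABCD"),
        new Variable("extData", "[{\"rowId\":102030,\"rowType\":\"SIM\"}]"));

    // Returns the rawValue of the first variable with the given name.
    static Optional<String> rawValueByName(List<Variable> vars, String name) {
        return vars.stream()
            .filter(v -> v.name().equals(name))
            .map(Variable::rawValue)
            .findFirst();
    }

    // Pulls a string-valued field out of the JSON kept in rawValue (naive regex, not a JSON parser).
    static Optional<String> jsonStringField(String json, String field) {
        Matcher m = Pattern.compile("\"" + Pattern.quote(field) + "\"\\s*:\\s*\"([^\"]*)\"").matcher(json);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // Looks up a variable by name, then a field inside its JSON rawValue.
    static Optional<String> nestedValue(List<Variable> vars, String name, String field) {
        return rawValueByName(vars, name).flatMap(raw -> jsonStringField(raw, field));
    }

    public static void main(String[] args) {
        System.out.println(rawValueByName(SAMPLE, "MANAGER").orElse("not found"));
        System.out.println(nestedValue(SAMPLE, "extData", "rowType").orElse("not found"));
    }
}
```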
I am trying to filter records based on a nested field, and I want only the matching object in that array to be shown as part of each record.
Below is a detailed explanation of my requirement.
I have Elasticsearch data like this:
[{
"basicInfo": {
"requestId": 123
},
"managerInfo": {
"manager": "John"
},
"groupInfo": [
{
"id": "id1",
"name": "abc",
"status": "Approved"
},
{
"id": "id2",
"name": "abc",
"status": "Pending"
}
]
},
{
"basicInfo": {
"requestId": 233
},
"managerInfo": {
"manager": "John Sr"
},
"groupInfo": [
{
"id": "id3",
"name": "abc",
"status": "Pending"
}
]
}
]
I want to keep only the records where groupInfo.status is Approved and basicInfo.requestId is 123, with the additional condition that groupInfo contains only the Approved entries, not the Pending ones. So the output I expect is:
{
"took": 23,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 3.0602708,
"hits": [
{
"_index": "my_index",
"_type": "request",
"_id": "123",
"_score": 3.0602708,
"_source": {
"basicInfo": {
"requestId": 123
},
"managerInfo": {
"manager": "John"
},
"groupInfo": [
{
"id": "id1",
"name": "abc",
"status": "Approved"
}
// No id2 here as it is in pending state
]
}
}
]
}
}
Instead, I get:
{
"took": 23,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 3.0602708,
"hits": [
{
"_index": "my_index",
"_type": "request",
"_id": "123",
"_score": 3.0602708,
"_source": {
"basicInfo": {
"requestId": 123
},
"managerInfo": {
"manager": "John"
},
"groupInfo": [
{
"id": "id1",
"name": "abc",
"status": "Approved"
},
{
"id": "id2",
"name": "abc",
"status": "Pending"
}
]
}
}
]
}
}
This is the query I am using:
{
"query": {
"bool": {
"must": [
{
"match": {
"basicInfo.requestId": "123"
}
},
{
"nested": {
"path": "groupInfo",
"query": {
"bool": {
"must": [
{
"term": {
"groupInfo.status": "Approved"
}
}
]
}
}
}
}
]
}
}
}
So my first question is: is what I expect even possible? Can we filter the result so that we get only the matching array elements?
If so, how?
Thanks in advance.
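Maybe you are looking for inner hits. Note that inner_hits does not trim _source: each hit still contains the whole groupInfo array, and the objects that matched the nested query are listed separately under that hit's inner_hits section. The Elasticsearch documentation describes the feature as follows:
In many cases, it’s very useful to know which inner nested objects (in the case of nested) or children/parent documents (in the case of parent/child) caused certain information to be returned. The inner hits feature can be used for this. This feature returns per search hit in the search response additional nested hits that caused a search hit to match in a different scope.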
{
"query": {
"bool": {
"must": [
{
"match": {
"basicInfo.requestId": "123"
}
},
{
"nested": {
"path": "groupInfo",
"query": {
"bool": {
"must": [
{
"term": {
"groupInfo.status": "Approved"
}
}
]
}
},
"inner_hits":{}
}
}
]
}
}
}
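The effect of inner_hits can be mimicked in plain Java: each hit keeps its full source, and the groupInfo objects that satisfied the nested query are reported next to it. This is an in-memory sketch with hypothetical Group, Request and Hit records and a hypothetical search method, not the Elasticsearch client API; the Approved-only array the question asks for is what the sketch calls innerHits.

```java
import java.util.List;

public class InnerHitsSketch {
    // Hypothetical in-memory stand-ins for the records in the question.
    record Group(String id, String name, String status) {}
    record Request(int requestId, String manager, List<Group> groupInfo) {}
    // One search hit: the unchanged source plus the nested objects that matched.
    record Hit(Request source, List<Group> innerHits) {}

    static final List<Request> DATA = List.of(
        new Request(123, "John", List.of(
            new Group("id1", "abc", "Approved"),
            new Group("id2", "abc", "Pending"))),
        new Request(233, "John Sr", List.of(
            new Group("id3", "abc", "Pending"))));

    // requestId match + nested query on groupInfo.status, with inner_hits.
    static List<Hit> search(int requestId, String status) {
        return DATA.stream()
            .filter(r -> r.requestId() == requestId)
            .map(r -> new Hit(r, r.groupInfo().stream()
                .filter(g -> g.status().equals(status))
                .toList()))
            // The nested query only matches documents with at least one matching object.
            .filter(h -> !h.innerHits().isEmpty())
            .toList();
    }

    public static void main(String[] args) {
        for (Hit h : search(123, "Approved")) {
            System.out.println("_source groupInfo: " + h.source().groupInfo().size() + " objects");
            System.out.println("inner_hits:        " + h.innerHits());
        }
    }
}
```

In other words, if only the Approved objects are needed, one option is to read them from each hit's inner_hits section rather than from _source.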
Edit: I'm using Elasticsearch 7.3.0.
I'm running a query with an aggregation and a sub-aggregation, but the sub-aggregation is missing from the SearchResponse.
While debugging, I ran my query in a unit test, copied the generated query, and ran it manually with Postman. There the response is exactly what I expect, but in my Java code parts of it are missing.
SearchRequest request = new SearchRequest("index");
SearchSourceBuilder search = new SearchSourceBuilder();
FieldSortBuilder sortByDate = SortBuilders
.fieldSort("date")
.order(SortOrder.DESC);
// Getting the latest result for each bucket
TopHitsAggregationBuilder latestResults = AggregationBuilders
.topHits("latest")
.sort(sortByDate)
.fetchSource("*","")
.size(1);
// Aggregate per service
TermsAggregationBuilder perService = AggregationBuilders
.terms("services")
.field("service.service_id")
.subAggregation(latestResults);
search.aggregation(perService);
search.size(1);
request.source(search);
SearchResponse response = client.search(request, RequestOptions.DEFAULT);
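The terms aggregation with a top_hits sub-aggregation (size 1, sorted by date descending) is meant to return the most recent document for each service.service_id. As a hedged, in-memory sketch of that intended result (not of the client behaviour the question is about), the same grouping can be written with Java streams; the Event record, the class LatestPerService, the service ids and the dates are all hypothetical. Unlike the terms aggregation, which returns 10 buckets by default ("size": 10 in the generated request), the sketch keeps every group.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class LatestPerService {
    // Hypothetical stand-in for an indexed document: service.service_id plus an ISO-8601 date.
    record Event(String serviceId, String date) {}

    static final List<Event> SAMPLE = List.of(
        new Event("svc-a", "2019-10-01"),
        new Event("svc-a", "2019-10-02"),
        new Event("svc-b", "2019-09-30"));

    // Group by service id (terms) and keep the newest event per group (top_hits, size 1, date desc).
    static Map<String, Event> latestPerService(List<Event> events) {
        return events.stream().collect(Collectors.toMap(
            Event::serviceId,
            e -> e,
            (a, b) -> a.date().compareTo(b.date()) >= 0 ? a : b,
            TreeMap::new));
    }

    public static void main(String[] args) {
        latestPerService(SAMPLE).forEach((id, e) -> System.out.println(id + " -> " + e.date()));
    }
}
```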
Here is the request generated:
{
"size": 0,
"aggregations": {
"services": {
"terms": {
"field": "service.service_id",
"size": 10,
"min_doc_count": 1,
"shard_min_doc_count": 0,
"show_term_doc_count_error": false,
"order": [
{
"_count": "desc"
},
{
"_key": "asc"
}
]
},
"aggregations": {
"latest": {
"top_hits": {
"from": 0,
"size": 1,
"version": false,
"seq_no_primary_term": false,
"explain": false,
"_source": {
"includes": [
"*"
],
"excludes": [
""
]
},
"sort": [
{
"date": {
"order": "desc"
}
}
]
}
}
}
}
}
}
In my code, the response is:
{
...
"aggregations": {
"sterms#services": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": []
}
}
}
If I run the same query manually, I get:
{
...
"aggregations": {
"services": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "09045f59-3709-4769-8c92-d611f773a401",
"doc_count": 2,
"latest": {
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": null,
"hits": [ ... ]
I have a huge amount of Elasticsearch data and need to run aggregations that return buckets. I need to limit the size of the data returned from Elasticsearch so that I only get a sample of the data, not all of it.
I have tried adding a "size" attribute, but it is not accepted by bucketing aggregations.
{
"size": 0,
"query": {
"bool": {
"adjust_pure_negative": true,
"boost": 1
}
},
"aggregations": {
"my_agg_1": {
"histogram": {
"field": "coAt",
"interval": 86400,
"offset": 1558216800,
"order": {
"_key": "asc"
},
"keyed": false,
"min_doc_count": 1
},
"aggregations": {
"my_agg_2": {
"terms": {
"field": "atr1",
"missing": "NaN",
"value_type": "string",
"size": 2147483647,
"min_doc_count": 1,
"shard_min_doc_count": 0,
"show_term_doc_count_error": false,
"order": [
{
"_count": "desc"
},
{
"_key": "asc"
}
]
},
"aggregations": {
"atr2": {
"top_hits": {
"from": 0,
"size": 1,
"version": false,
"explain": false,
"sort": [
{
"coAt": {
"order": "desc"
}
}
]
}
},
"clientIP_count": {
"value_count": {
"field": "clientIP"
}
}
}
}
}
}
}
}
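The histogram in this query buckets coAt with "interval": 86400 and "offset": 1558216800. The Elasticsearch histogram documentation gives the bucket key as floor((value - offset) / interval) * interval + offset, and a small Java helper makes that arithmetic concrete. It assumes coAt holds epoch seconds (so 86400 is one day), which the question does not state; the class name and the sample values are made up.

```java
public class HistogramBucketKey {
    // bucket_key = floor((value - offset) / interval) * interval + offset
    // Math.floorDiv rounds toward negative infinity, matching floor for values below the offset.
    static long bucketKey(long value, long interval, long offset) {
        return Math.floorDiv(value - offset, interval) * interval + offset;
    }

    public static void main(String[] args) {
        long interval = 86400L;      // from the query: one day, if coAt is in seconds
        long offset = 1558216800L;   // from the query
        long[] samples = {1558216799L, 1558216800L, 1558300000L, 1558303200L}; // made-up values
        for (long v : samples) {
            System.out.println(v + " -> bucket " + bucketKey(v, interval, offset));
        }
    }
}
```

A value one second before the offset falls into the previous day's bucket, which is why floorDiv is used instead of plain integer division.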
I run a query on Elasticsearch from Kibana 4.4.1 that looks like this:
{
"size": 0,
"query": {
"filtered": {
"query": {
"query_string": {
"query": "FALK0911622560T",
"analyze_wildcard": true
}
},
"filter": {
"bool": {
"must": [
{
"range": {
"#timestamp": {
"gte": 1438290000000,
"lte": 1440968400000,
"format": "epoch_millis"
}
}
}
],
"must_not": []
}
}
}
},
"aggs": {
"2": {
"date_histogram": {
"field": "#timestamp",
"interval": "1w",
"time_zone": "Europe/Helsinki",
"min_doc_count": 1,
"extended_bounds": {
"min": 1438290000000,
"max": 1440968400000
}
},
"aggs": {
"1": {
"percentiles": {
"field": "Quantity",
"percents": [
50
]
}
}
}
}
}
}
This query returns all the docs with "ProductCode" = "FALK0911622560T" within the given time range.
I tried to do the same thing with the Elasticsearch Java API using the following code:
BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery().must(QueryBuilders.matchQuery(matchQueryKey,matchQueryValue));
SearchResponse response = client.prepareSearch(indexName)
.setTypes(indexTypeName)
.setQuery(boolQueryBuilder)
.setSize(100)
.addAggregation(AggregationBuilders
.dateHistogram("myHistogram")
.field("#timestamp")
.interval(DateHistogramInterval.WEEK)
.timeZone("Europe/Helsinki")
.minDocCount(1)
.extendedBounds(1438290000000L, 1440968400000L))
.addFields(fieldsOfInterest)
.execute()
.actionGet();
response.getAggregations();
But I get all the documents in the index with "ProductCode" = "FALK0911622560T", regardless of the time range.
Within the given time range, I should get only 5 documents in response.getAggregations(), because I set the interval to one week.
A document in Elasticsearch looks like this:
{
"_index": "warehouse-550",
"_type": "core2",
"_id": "AVOKCqQ68h4KkDGZvk6b",
"_score": null,
"_source": {
"message": "5,550,67.01,FALK0911622560T,2015-07-31;08:00:00.000\r",
"#version": "1",
"#timestamp": "2015-07-31T06:00:00.000Z",
"path": "D:/Programs/Logstash/x_testingLocally/processed-stocklevels-550-25200931072015.csv",
"host": "EVO385",
"type": "core2",
"Quantity": 5,
"Warehouse": "550",
"Price": 67.01,
"ProductCode": "FALK0911622560T",
"Timestamp": "2015-07-31;08:00:00.000"
},
"fields": {
"#timestamp": [
1438322400000
]
},
"highlight": {
"ProductCode": [
"#kibana-highlighted-field#FALK0911622560T#/kibana-highlighted-field#"
],
"message": [
"5,550,67.01,#kibana-highlighted-field#FALK0911622560T#/kibana-highlighted-field#,2015-07-31;08:00:00.000\r"
]
},
"sort": [
1438322400000
]
}
Please help.
Thank you.
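You did not add the range query, so only the match query filters the documents. Change your boolQueryBuilder to the following (fromValue and toValue stand for your range bounds, e.g. 1438290000000L and 1440968400000L):
BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery()
        .must(QueryBuilders.matchQuery(matchQueryKey, matchQueryValue))
        // Restrict documents to the time window; extended_bounds alone does not filter
        .must(QueryBuilders.rangeQuery("#timestamp").gte(fromValue).lte(toValue));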
You can then get the buckets using (Histogram is org.elasticsearch.search.aggregations.bucket.histogram.Histogram):
Histogram histogram = response.getAggregations().get("myHistogram");
List<? extends Histogram.Bucket> bucketList = histogram.getBuckets();
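It may help to see why the range query is needed: extended_bounds only forces empty buckets to be created at the edges of the histogram; it does not filter documents, so without a range query every matching document is bucketed. The plain-Java sketch below mimics "filter by time range, then count per week" in the Europe/Helsinki time zone. It is not the Elasticsearch API; the class WeeklyBuckets and its methods are hypothetical, and it assumes weeks start on Monday, which is how the 1w interval rounds in Elasticsearch.

```java
import java.time.DayOfWeek;
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.temporal.TemporalAdjusters;
import java.util.Map;
import java.util.TreeMap;

public class WeeklyBuckets {
    static final ZoneId HELSINKI = ZoneId.of("Europe/Helsinki");

    // Monday of the week that contains the timestamp, in Helsinki local time.
    static LocalDate weekStart(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).atZone(HELSINKI).toLocalDate()
            .with(TemporalAdjusters.previousOrSame(DayOfWeek.MONDAY));
    }

    // Range filter first (the missing rangeQuery), then count per week (the date_histogram).
    // Only weeks that receive documents appear, as with "min_doc_count": 1.
    static Map<LocalDate, Integer> countPerWeek(long fromMillis, long toMillis, long... timestamps) {
        Map<LocalDate, Integer> counts = new TreeMap<>();
        for (long ts : timestamps) {
            if (ts >= fromMillis && ts <= toMillis) {
                counts.merge(weekStart(ts), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // 1438322400000 is the sample document's timestamp; 1400000000000 (May 2014) is made up.
        System.out.println(countPerWeek(1438290000000L, 1440968400000L, 1438322400000L, 1400000000000L));
    }
}
```

The out-of-range timestamp is dropped before bucketing, so only the week starting 2015-07-27 receives a count.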