I have a product that has a property categoryIds.
"id" : 1,
"title" : "product",
"price" : "1100.00",
"categories" : [ the ids of the product's categories],
"tags" : [ the ids of the product's tags ],
"variants" : [ nested type with properties: name, definition, maybe in the future availability dates]
I want to group the product ids by category in the query.
In POST _search, I ask for products that belong to specific categories (e.g. [1, 2, 3]), and I can also restrict them by variant.
How can I group/aggregate the response to get a list of product ids for each category?
What I'm trying to get:
{
"productsForCategories": {
"1": [
"product-1",
"product-2",
"product-3"
],
"2": [
"product-1",
"product-3",
"product-4"
],
"3": [
"product-5",
"product-6"
]
}
}
Thanks in advance for all answers.
This is the request my Java code generated:
curl --location --request POST 'https://localhost:9200/products/_search' \
--header 'Content-Type: application/json' \
--data-raw '{
"size": 0,
"query": {
"bool": {
"must": [
{
"bool": {
"should": [
{
"term": {
"categories": {
"value": 7,
"boost": 1.0
}
}
}
],
"adjust_pure_negative": true,
"minimum_should_match": "1",
"boost": 1.0,
"_name": "fromRawQuery"
}
}
],
"filter": [
{
"bool": {
"adjust_pure_negative": true,
"boost": 1.0,
"_name": "filterPart"
}
}
],
"adjust_pure_negative": true,
"boost": 1.0,
"_name": "queryPart"
}
},
"_source": {
"includes": [
"categories",
"productType",
"relations"
],
"excludes": []
},
"stored_fields": "_id",
"sort": [
{
"_score": {
"order": "desc"
}
}
],
"aggregations": {
"agg": {
"global": {},
"aggregations": {
"categories": {
"terms": {
"field": "categories",
"size": 2147483647,
"min_doc_count": 1,
"shard_min_doc_count": 0,
"show_term_doc_count_error": false,
"order": [
{
"_count": "desc"
},
{
"_key": "asc"
}
]
},
"aggregations": {
"productsForCategories": {
"terms": {
"field": "_id",
"size": 2147483647,
"min_doc_count": 1,
"shard_min_doc_count": 0,
"show_term_doc_count_error": false,
"order": [
{
"_count": "desc"
},
{
"_key": "asc"
}
]
}
}
}
}
}
}
}
}'
You can use a terms aggregation. It is a multi-bucket, value-source-based aggregation in which buckets are built dynamically, one per unique value.
Here is a working example with index mapping, index data, search query, and search result.
Index Mapping:
{
"mappings":{
"properties":{
"categories":{
"type":"keyword"
}
}
}
}
Index Data:
{
"id":1,
"product":"p1",
"category":[1,2,7]
}
{
"id":2,
"product":"p2",
"category":[7,4,5]
}
{
"id":3,
"product":"p3",
"category":[4,5,6]
}
Search Query:
{
"size": 0,
"aggs": {
"cats": {
"terms": {
"field": "category",
"include": [
7
]
},
"aggs": {
"products": {
"terms": {
"field": "product.keyword",
"size": 10
}
}
}
}
}
}
Search Result:
"aggregations": {
"cats": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 7,
"doc_count": 2,
"products": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "p1",
"doc_count": 1
},
{
"key": "p2",
"doc_count": 1
}
]
}
}
]
}
}
I believe what you want is the products that belong to each category. As Bhavya mentioned, you can use a terms aggregation for that.
GET products/_search
{
"size": 0, //<===== If you need only aggregated results, set this to 0. It represents query result size.
"aggs": {
"categories": {
"terms": {
"field": "cat_ids", // <================= Equivalent of group by Cat_ids
"size": 10
},"aggs": {
"products": {
"terms": {
"field": "name.keyword",//<============= For Each category group by products
"size": 10
}
}
}
}
}
}
Result:
"aggregations" : {
"categories" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 1, //<========== category id
"doc_count" : 2, //<========== For the given category id 2 products
"products" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "p1", //<========= for cat_id=1, p1 is there
"doc_count" : 1
},
{
"key" : "p2", //<========= for cat_id=1, p2 is there
"doc_count" : 1
}
]
}
},
{
"key" : 2,
"doc_count" : 2,
"products" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "p1",
"doc_count" : 1
},
{
"key" : "p2",
"doc_count" : 1
}
]
}
},
{
"key" : 3,
"doc_count" : 1,
"products" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "p1",
"doc_count" : 1
}
]
}
}
]
}
}
Details are present as comments. Please remove the comments and try running the query.
Filtering aggregation results: See this
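To get from these nested buckets to the shape asked for in the question, the response can be flattened on the client side. Below is a minimal Python sketch, assuming the aggregation names categories and products used in the query above. The helper name products_for_categories and the trimmed sample response are only illustrative.

```python
from typing import Any, Dict, List


def products_for_categories(response: Dict[str, Any],
                            outer: str = "categories",
                            inner: str = "products") -> Dict[str, Dict[str, List[str]]]:
    """Flatten a two-level terms aggregation into {category id: [product keys]}."""
    grouped: Dict[str, List[str]] = {}
    for category_bucket in response["aggregations"][outer]["buckets"]:
        # Each outer bucket is one category; its inner buckets are the products.
        grouped[str(category_bucket["key"])] = [
            product_bucket["key"]
            for product_bucket in category_bucket[inner]["buckets"]
        ]
    return {"productsForCategories": grouped}


# Trimmed copy of the result above (only the fields the helper reads).
sample = {
    "aggregations": {
        "categories": {
            "buckets": [
                {"key": 1, "doc_count": 2,
                 "products": {"buckets": [{"key": "p1", "doc_count": 1},
                                          {"key": "p2", "doc_count": 1}]}},
                {"key": 3, "doc_count": 1,
                 "products": {"buckets": [{"key": "p1", "doc_count": 1}]}},
            ]
        }
    }
}

print(products_for_categories(sample))
```

Note that each terms aggregation returns at most "size" buckets, so both sizes have to be large enough to cover the categories and products you expect.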
I'm working with the NiFi JoltTransformJSON processor and am fairly new to JOLT specs. I can't figure out how to move the {} that currently wraps each item of an array so that a single {} wraps all items of the array in the output JSON. Please compare the OUTPUT and REQUIRED/EXPECTED OUTPUT sections below. I created the spec by reading the JOLT definition and some examples, which got me pretty close, but this last part has had me stuck for a while. Any hints would be awesome!
INPUT JSON:
{
"AssetID": "1",
"AssetNumber": "2",
"AssetMaterial": "Cisco MDS 9706",
"RackUnits": "9.0",
"MaterialType": "Chassis",
"AssetName": "Cisco-MDS-9706_1",
"CustRID": "A001",
"SerialNumber": "OU812",
"Room": "ROOM5",
"Datacenter": "DC69",
"UMountingID": "86",
"CabinetAssetID": "181",
"CabinetName": "CAB666"
}
CURRENT SPEC:
[
{
"operation": "shift",
"spec": {
"AssetID": "data[].6.value",
"AssetNumber": "data[].7.value",
"AssetMaterial": "data[].8.value",
"AssetName": "data[].9.value",
"CustRID": "data[].10.value",
"SerialNumber": "data[].11.value",
"Room": "data[].12.value",
"Datacenter": "data[].13.value",
"UMountingID": "data[].14.value",
"CabinetAssetID": "data[].15.value",
"CabinetName": "data[].16.value"
}
},
{
"operation": "default",
"spec": {
"to": "table1"
}
},
{
"operation": "default",
"spec": {
"fieldsToReturn": [6, 7, 8, 9, 10, 11, 12]
}
}
]
OUTPUT JSON:
{
"data": [
{
"6": {
"value": "1"
}
},
{
"7": {
"value": "2"
}
},
{
"8": {
"value": "Cisco MDS 9706"
}
},
{
"9": {
"value": "Cisco-MDS-9706_1"
}
},
{
"10": {
"value": "A001"
}
},
{
"11": {
"value": "OU812"
}
},
{
"12": {
"value": "ROOM5"
}
},
{
"13": {
"value": "DC69"
}
},
{
"14": {
"value": "86"
}
},
{
"15": {
"value": "181"
}
},
{
"16": {
"value": "CAB666"
}
}
],
"to": "table1",
"fieldsToReturn": [ 6, 7, 8, 9, 10, 11, 12 ]
}
REQUIRED/EXPECTED OUTPUT:
{
"data" : [
{
"6" : {
"value" : "1"
},
"7" : {
"value" : "2"
},
"8" : {
"value" : "Cisco MDS 9706"
},
"9" : {
"value" : "Cisco-MDS-9706_1"
},
"10" : {
"value" : "A001"
},
"11" : {
"value" : "OU812"
},
"12" : {
"value" : "ROOM5"
},
"13" : {
"value" : "DC69"
},
"14" : {
"value" : "86"
},
"15" : {
"value" : "181"
},
"16" : {
"value" : "CAB666"
}
}
],
"to" : "table1",
"fieldsToReturn" : [ 6, 7, 8, 9, 10, 11, 12 ]
}
It seems that all the key-value pairs should live in the same element of the data array, so use a fixed index, data[0], instead of data[] to combine them into a single object:
[
{
"operation": "shift",
"spec": {
"AssetID": "data[0].6.value",
"AssetNumber": "data[0].7.value",
"AssetMaterial": "data[0].8.value",
...
...
"#table1": "to"
}
},
{
"operation": "default",
"spec": {
"fieldsToReturn": [6, 7, 8, 9, 10, 11, 12]
}
},
{
"operation": "sort"
}
]
By the way, the first default transformation spec is superfluous; instead use
"#table1": "to"
within the shift transformation spec
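The difference between the two target paths can be illustrated outside of JOLT. The following Python sketch is not JOLT itself but an emulation of the observed behaviour: with data[] every field ends up in its own array element, while with data[0] every field is written into the same element. The function names and the shortened field list are assumptions made for the illustration.

```python
from typing import Any, Dict, List


def shift_append(record: Dict[str, Any], columns: Dict[str, str]) -> List[Dict[str, Any]]:
    """Mimics "data[].N.value": each field lands in a new array element."""
    return [{col: {"value": record[field]}}
            for field, col in columns.items() if field in record]


def shift_index0(record: Dict[str, Any], columns: Dict[str, str]) -> List[Dict[str, Any]]:
    """Mimics "data[0].N.value": each field lands in the same element 0."""
    merged = {col: {"value": record[field]}
              for field, col in columns.items() if field in record}
    return [merged]


record = {"AssetID": "1", "AssetNumber": "2", "AssetMaterial": "Cisco MDS 9706"}
columns = {"AssetID": "6", "AssetNumber": "7", "AssetMaterial": "8"}

print(shift_append(record, columns))  # one object per field
print(shift_index0(record, columns))  # a single object holding all fields
```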
I'm trying out Elasticsearch aggregations in Spring WebFlux.
I want to know how to get the bucket values (the keys and counts inside buckets) from Flux<AggregationContainer<?>> in the Java API, as well as the total number of records in Elasticsearch.
Request:
order/_search
{
"size": 0,
"aggs": { "status":
{ "terms":
{ "field":"order_status" , "size": 100}}}}
Response:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
.....
},
"aggregations" : {
"status" : {
"buckets" : [
{
"key" : "CREATED",
"doc_count" : 34
},
.....
{
"key" : "CANCELLED",
"doc_count" : 2
},
{
"key" : "COMPLETED",
"doc_count" : 1
}
]
}
}
}
Java implementation:
@Autowired
private ReactiveElasticsearchOperations operations;
Flux<AggregationContainer<?>> result = operations.aggregate(searchQuery, Count.class, "indexs");
I have recently started working with Elasticsearch, and I have run into a problem that I don't know how to solve.
I have JSON like this:
{
"objects": [
"object1": {
"id" : "12345",
"name":"abc"
},
"12345"
]
}
Object2 is a reference to object1. When I try to save (index) this into Elasticsearch, it says:
"org.elasticsearch.index.mapper.MapperParsingException: failed to parse"
After some googling I found that this happens because object1 is an object, but object2 is treated as a string.
We cannot change the JSON in our project, so how can I save it in Elasticsearch in this case?
Thanks for any help and suggestions.
How do you do that?
I run this command and it works.
PUT test/t1/1
{
"objects": {
"object1": {
"id" : "12345",
"name":"abc"
},
"object2": "12345"
}
}
and the result is:
{
"_index": "test",
"_type": "t1",
"_id": "1",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 2,
"failed": 0
},
"created": true
}
UPDATE 1
Depending on your requirements one of these may solve your problem:
PUT test/t1/2
{
"objects": [
{
"object1": {
"id": "12345",
"name": "abc"
}
},
{
"object2": "12345"
}
]
}
PUT test/t1/2
{
"objects": [
{
"object1": {
"id": "12345",
"name": "abc"
},
"object2": "12345"
},
{
...
}
]
}
I am using the Transport client to retrieve data from Elasticsearch.
Example code snippet:
String[] names = {"Stokes","Roshan"};
BoolQueryBuilder builder = QueryBuilders.boolQuery();
AggregationBuilder<?> aggregation = AggregationBuilders.filters("agg")
.filter(builder.filter(QueryBuilders.termsQuery("Name", "Taylor"))
.filter(QueryBuilders.rangeQuery("grade").lt(9.0)))
.subAggregation(AggregationBuilders.terms("by_year").field("year")
.subAggregation(AggregationBuilders.sum("sum_marks").field("marks"))
.subAggregation(AggregationBuilders.sum("sum_grade").field("grade")));
SearchResponse response = client.prepareSearch(index)
.setTypes(datasquareID)
.addAggregation(aggregation)
.execute().actionGet();
System.out.println(response.toString());
I wanted to calculate the sum of marks and the sum of grades with names "Stokes" or "Roshan" whose grade is less than 9 and group them by "year". Please let me know whether my approach is correct or not. Please let me know your suggestions as well.
Documents in ES:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 5,
"max_score" : 1,
"hits" : [{
"_index" : "bighalf",
"_type" : "excel",
"_id" : "AVE0rgXqe0-x669Gsae3",
"_score" : 1,
"_source" : {
"Name" : "Taylor",
"grade" : 9,
"year" : 2016,
"marks" : 54,
"subject" : "Mathematics",
"Gender" : "male",
"dob" : "13/09/2000"
}
}, {
"_index" : "bighalf",
"_type" : "excel",
"_id" : "AVE0rvTHe0-x669Gsae5",
"_score" : 1,
"_source" : {
"Name" : "Marsh",
"grade" : 9,
"year" : 2015,
"marks" : 70,
"subject" : "Mathematics",
"Gender" : "male",
"dob" : "22/11/2000"
}
}, {
"_index" : "bighalf",
"_type" : "excel",
"_id" : "AVE0sBbZe0-x669Gsae7",
"_score" : 1,
"_source" : {
"Name" : "Taylor",
"grade" : 3,
"year" : 2015,
"marks" : 87,
"subject" : "physics",
"Gender" : "male",
"dob" : "13/09/2000"
}
}, {
"_index" : "bighalf",
"_type" : "excel",
"_id" : "AVE0rWz4e0-x669Gsae2",
"_score" : 1,
"_source" : {
"Name" : "Stokes",
"grade" : 9,
"year" : 2015,
"marks" : 91,
"subject" : "Mathematics",
"Gender" : "male",
"dob" : "21/12/2000"
}
}, {
"_index" : "bighalf",
"_type" : "excel",
"_id" : "AVE0roT4e0-x669Gsae4",
"_score" : 1,
"_source" : {
"Name" : "Roshan",
"grade" : 9,
"year" : 2015,
"marks" : 85,
"subject" : "Mathematics",
"Gender" : "male",
"dob" : "12/12/2000"
}
}
]
}
}
Response :
"aggregations" : {
"agg" : {
"buckets" : [{
"doc_count" : 0,
"by_year" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : []
}
}
]
}
}
Please let me know the solution for my requirement.
I think the issue is in your filters aggregation. To sum it up, you want to restrict your aggregation to documents "... with names "Stokes" or "Roshan" whose grade is less than 9". To do this, combine both conditions in a single bool query and use it in a filter aggregation:
// create the sum aggregations
SumBuilder sumMarks = AggregationBuilders.sum("sum_marks").field("marks");
SumBuilder sumGrades = AggregationBuilders.sum("sum_grade").field("grade");
// create the year aggregation + add the sum sub-aggregations
TermsBuilder yearAgg = AggregationBuilders.terms("by_year").field("year")
.subAggregation(sumMarks)
.subAggregation(sumGrades);
// create the bool filter for the condition above
String[] names = {"stokes","roshan"};
BoolQueryBuilder aggFilter = QueryBuilders.boolQuery()
.must(QueryBuilders.termsQuery("Name", names))
.must(QueryBuilders.rangeQuery("grade").lte(9.0));
// create the filter aggregation and add the year sub-aggregation
FilterAggregationBuilder aggregation = AggregationBuilders.filter("agg")
.filter(aggFilter)
.subAggregation(yearAgg);
// create the request and execute it
SearchResponse response = client.prepareSearch(index)
.setTypes(datasquareID)
.addAggregation(aggregation)
.execute().actionGet();
System.out.println(response.toString());
In the end, it will look like this:
{
"query": {
"match_all": {}
},
"aggs": {
"agg": {
"filter": {
"bool": {
"must": [
{
"terms": {
"Name": [
"stokes",
"roshan"
]
}
},
{
"range": {
"grade": {
"lte": 9
}
}
}
]
}
},
"aggs": {
"by_year": {
"terms": {
"field": "year"
},
"aggs": {
"sum_marks": {
"sum": {
"field": "marks"
}
},
"sum_grade": {
"sum": {
"field": "grade"
}
}
}
}
}
}
}
}
For your documents above, the result will look like this:
"aggregations": {
"agg": {
"doc_count": 2,
"by_year": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 2015,
"doc_count": 2,
"sum_grade": {
"value": 18
},
"sum_marks": {
"value": 176
}
}
]
}
}
}
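The numbers in this result can be checked by hand against the five documents from the question. The Python sketch below recomputes the same filter, group-by-year and sums in memory. It mirrors the answer's conditions (names matched case-insensitively, grade <= 9) and is only a sanity check, not Elasticsearch client code; the function name sums_by_year is hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List

# Only the fields used by the aggregation, copied from the documents above.
docs: List[Dict[str, Any]] = [
    {"Name": "Taylor", "grade": 9, "year": 2016, "marks": 54},
    {"Name": "Marsh", "grade": 9, "year": 2015, "marks": 70},
    {"Name": "Taylor", "grade": 3, "year": 2015, "marks": 87},
    {"Name": "Stokes", "grade": 9, "year": 2015, "marks": 91},
    {"Name": "Roshan", "grade": 9, "year": 2015, "marks": 85},
]


def sums_by_year(documents: Iterable[Dict[str, Any]],
                 names: Iterable[str],
                 max_grade: float) -> Dict[int, Dict[str, int]]:
    """Filter by name and grade, then sum marks and grade per year."""
    wanted = {name.lower() for name in names}
    buckets: Dict[int, Dict[str, int]] = defaultdict(
        lambda: {"doc_count": 0, "sum_marks": 0, "sum_grade": 0})
    for doc in documents:
        if doc["Name"].lower() in wanted and doc["grade"] <= max_grade:
            bucket = buckets[doc["year"]]
            bucket["doc_count"] += 1
            bucket["sum_marks"] += doc["marks"]
            bucket["sum_grade"] += doc["grade"]
    return dict(buckets)


print(sums_by_year(docs, ["stokes", "roshan"], 9))
```

With a strict "less than 9" condition the same data yields no buckets, because both Stokes and Roshan have grade 9.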
I am struggling to split the data into the tokens I expect, and I have not been able to get it working. I tried all the filters and tokenizers.
I have updated the settings in Elasticsearch as given below.
{
"settings": {
"analysis": {
"filter": {
"filter_word_delimiter": {
"preserve_original": "true",
"type": "word_delimiter"
}
},
"analyzer": {
"en_us": {
"tokenizer": "keyword",
"filter": [ "filter_word_delimiter","lowercase" ]
}
}
}
}
}
Executed Queries
curl -XGET "XX.XX.XX.XX:9200/keyword/_analyze?pretty=1&analyzer=en_us" -d 'DataGridControl'
Hits value
{
"tokens" : [ {
"token" : "datagridcontrol",
"start_offset" : 0,
"end_offset" : 16,
"type" : "word",
"position" : 1
}, {
"token" : "data",
"start_offset" : 0,
"end_offset" : 4,
"type" : "word",
"position" : 1
}, {
"token" : "grid",
"start_offset" : 4,
"end_offset" : 8,
"type" : "word",
"position" : 2
}, {
"token" : "control",
"start_offset" : 9,
"end_offset" : 16,
"type" : "word",
"position" : 3
} ]
}
Expected result:
DataGridControl
DataGrid
DataControl
Data
grid
control
Which tokenizer and filter should I add to the index settings?
Any help?
Try this:
{
"settings": {
"analysis": {
"filter": {
"filter_word_delimiter": {
"type": "word_delimiter"
},
"custom_shingle": {
"type": "shingle",
"token_separator":"",
"max_shingle_size":3
}
},
"analyzer": {
"en_us": {
"tokenizer": "keyword",
"filter": [
"filter_word_delimiter",
"custom_shingle",
"lowercase"
]
}
}
}
}
}
and let me know if it gets you any closer.
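To see roughly which tokens this filter chain can produce, here is a small Python emulation. It is a sketch only: split_on_case is a crude stand-in for word_delimiter (it splits at case changes and digits), and shingles joins up to three adjacent tokens with an empty separator while keeping the single tokens. The function names are made up for the illustration, and the real analyzer may order or position tokens differently.

```python
import re
from typing import List


def split_on_case(text: str) -> List[str]:
    """Rough stand-in for word_delimiter: split at case changes and digits."""
    return re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", text)


def shingles(tokens: List[str], max_size: int = 3, separator: str = "") -> List[str]:
    """Emit every run of 1..max_size adjacent tokens, joined by separator."""
    out: List[str] = []
    for start in range(len(tokens)):
        for size in range(1, max_size + 1):
            if start + size <= len(tokens):
                out.append(separator.join(tokens[start:start + size]))
    return out


def analyze(text: str) -> List[str]:
    """split_on_case -> shingles -> lowercase, like the filter chain above."""
    return [token.lower() for token in shingles(split_on_case(text))]


print(analyze("DataGridControl"))
```

Because shingles only combine adjacent tokens, this chain yields "gridcontrol" but cannot yield "DataControl" from the expected list, since "Data" and "Control" are not neighbours.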