ElasticSearch - grouping by two different fields - java

Is there a way to group by two fields in Elasticsearch?
TermsBuilder yearAgg = AggregationBuilders.terms("by_year").field("year").subAggregation(AggregationBuilders.terms("by_name")).field("Name").subAggregation(sumMarks);
// create the bool filter for the condition above
String[] names = { "stokes", "roshan" };
BoolQueryBuilder aggFilter = QueryBuilders.boolQuery().must(QueryBuilders.termsQuery("Name", names));
// create the filter aggregation and add the year sub-aggregation
FilterAggregationBuilder aggregation = AggregationBuilders.filter("agg").filter(aggFilter).subAggregation(yearAgg);
// create the request and execute it
SearchResponse response = client.prepareSearch("bighalf").setTypes("excel").addAggregation(aggregation).execute().actionGet();
System.out.println(response.toString());
I tried to group by two different terms, but I am not getting the expected result.
Response after grouping:
{
"aggregations": {
"agg": {
"doc_count": 2,
"by_year": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "roshan",
"doc_count": 1,
"by_name": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "roshan",
"doc_count": 1
}
]
},
"sum_marks": {
"value": 85
}
},
{
"key": "stokes",
"doc_count": 1,
"by_name": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "stokes",
"doc_count": 1
}
]
},
"sum_marks": {
"value": 91
}
}
]
}
}
}
}
I can only see the document count under the "by_name" grouping. Is there a better way to group on two different fields in Elasticsearch?

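There is an error in the way you build your aggregation: the parenthesis that closes subAggregation( is misplaced, so .field("Name") and .subAggregation(sumMarks) are applied to by_year instead of by_name. As a result, by_year ends up grouping on Name, which is why its bucket keys in your response are names rather than years.
// your code
TermsBuilder yearAgg = AggregationBuilders.terms("by_year")
.field("year").subAggregation(AggregationBuilders.terms("by_name")).field("Name").subAggregation(sumMarks);
// The closing parenthesis right after terms("by_name") is wrong; it should go at the very end.
Do it like this instead: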
TermsBuilder yearAgg = AggregationBuilders.terms("by_year").field("year")
.subAggregation(AggregationBuilders.terms("by_name").field("Name").subAggregation(sumMarks));
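For reference, the corrected builder chain should correspond to a request body shaped roughly like the one below. This is a minimal Python sketch that only builds the JSON and does not talk to a cluster. The `marks` field inside `sum_marks` is an assumption, since the definition of `sumMarks` is not shown in the question, and `build_year_name_aggregation` is a hypothetical helper name.

```python
import json


def build_year_name_aggregation(names):
    """Build a filter aggregation that groups by year, then by Name."""
    return {
        "agg": {
            "filter": {"bool": {"must": [{"terms": {"Name": names}}]}},
            "aggs": {
                "by_year": {
                    "terms": {"field": "year"},
                    "aggs": {
                        "by_name": {
                            "terms": {"field": "Name"},
                            # Assumed definition of sumMarks: a sum over a "marks" field.
                            "aggs": {"sum_marks": {"sum": {"field": "marks"}}},
                        }
                    },
                }
            },
        }
    }


body = {"size": 0, "aggs": build_year_name_aggregation(["stokes", "roshan"])}
print(json.dumps(body, indent=2))
```

The point to check is the nesting: `by_name` sits inside `by_year`, and `sum_marks` sits inside `by_name`, so each (year, name) pair gets its own sum.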


Scala - Ids lists of objects with duplicated values from spark dataset

I need to create lists of IDs for all objects that have identical parameters (same values and same quantity). I am looking for a solution that is more efficient than two nested loops and an if.
Object structure in the dataset:
case class MergedProduct(id: String,
products: List[Product])
case class Product(productUrl: String, productId: String)
Example of data in dataset:
[ {
"id": "ID1",
"products": [
{
"product": {
"productUrl": "SOMEURL",
"productId": "1"
}
},
{
"product": {
"productUrl": "SOMEOTHERURL",
"productId": "1"
}
}
],
},
{
"id": "ID2",
"products": [
{
"product": {
"productUrl": "SOMEURL",
"productId": "1"
}
},
{
"product": {
"productUrl": "SOMEOTHERURL",
"productId": "1"
}
}
],
},
{
"id": "ID3",
"products": [
{
"product": {
"productUrl": "DIFFERENTURL",
"productId": "1"
}
},
{
"product": {
"productUrl": "SOMEOTHERURL",
"productId": "1"
}
}
],
},
{
"id": "ID4",
"products": [
{
"product": {
"productUrl": "SOMEOTHERURL",
"productId": "1"
}
},
{
"product": {
"productUrl": "DIFFERENTURL",
"productId": "1"
}
}
],
},
{
"id": "ID5",
"products": [
{
"product": {
"productUrl": "NOTDUPLICATEDURL",
"productId": "1"
}
},
{
"product": {
"productUrl": "DIFFERENTURL",
"productId": "1"
}
}
],
}
]
In this example, we have 4 objects that are duplicated, so I would like to get their ID in the corresponding lists.
Example output is List[List[String]]:
List(List("ID1", "ID2"), List("ID3","ID4"))
I am looking for something efficient and readable - the dataset we are talking about has nearly 700 million objects.
The only goal is to log that the duplicates exist, so I can remove the listed duplicates from the dataset (it does not affect the database). I was thinking of taking each MergedProduct one by one, searching for other MergedProducts with identical Products, getting their IDs, logging that they exist, then removing those MergedProduct IDs from the dataset and moving on to the next one until the whole dataset is checked. But in that case I would first have to collect the dataset as a list of MergedProducts and then do all the operations, which seems like a roundabout way.
After trying some options and looking for neat solutions, I think this is reasonably OK:
private def getDuplicates(mergedProducts: List[MergedProduct]): List[List[String]] = {
  // Sort each product list by (url, id) so that element order does not affect the grouping key
  val duplicates = mergedProducts
    .groupBy(_.products.sortBy(p => (p.productUrl, p.productId)))
    .filter(_._2.size > 1)
    .values
    .toList
  duplicates.map(group => group.map(_.id))
}
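The grouping idea can be sketched outside Spark as well: build a canonical, order-independent key from each product list and collect the IDs that share a key. The Python sketch below is only an illustration under a few assumptions: products are plain dicts without the nested "product" wrapper shown in the JSON sample, and `get_duplicates` and `product` are hypothetical helper names.

```python
from collections import defaultdict


def get_duplicates(merged_products):
    """Return lists of ids whose product lists match regardless of order."""
    groups = defaultdict(list)
    for merged in merged_products:
        # Canonical key: sorted (url, id) pairs. Order no longer matters,
        # but the number of occurrences of each product still does.
        key = tuple(
            sorted((p["productUrl"], p["productId"]) for p in merged["products"])
        )
        groups[key].append(merged["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


def product(url):
    # Every product in the sample data has productId "1".
    return {"productUrl": url, "productId": "1"}


data = [
    {"id": "ID1", "products": [product("SOMEURL"), product("SOMEOTHERURL")]},
    {"id": "ID2", "products": [product("SOMEURL"), product("SOMEOTHERURL")]},
    {"id": "ID3", "products": [product("DIFFERENTURL"), product("SOMEOTHERURL")]},
    {"id": "ID4", "products": [product("SOMEOTHERURL"), product("DIFFERENTURL")]},
    {"id": "ID5", "products": [product("NOTDUPLICATEDURL"), product("DIFFERENTURL")]},
]

print(get_duplicates(data))  # [['ID1', 'ID2'], ['ID3', 'ID4']]
```

This does a single pass over the data instead of comparing every pair. On a distributed dataset the same key could presumably be used in a group-by, so that nothing has to be collected into a driver-side list first.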

How to aggregate MongoDB the final total sum? From the sum calculated earlier

How can I append a final total to the result, computed from the prices already returned?
This is the original result:
[
{
"name": "a",
"prices": 10,
},
{
"name": "a",
"prices": 20,
}
]
But I need this:
[
{
"name": "a",
"prices": 10,
},
{
"name": "a",
"prices": 20,
},
// the additional document I need
{
"name": "total",
"total":30
}
]
$group by null, pushing the root documents into a docs array and summing prices into totalPrices
$project with $concatArrays to append a total document to docs
$unwind to deconstruct the docs array
$project to show the name and prices fields of each docs element
db.collection.aggregate([
{
$group: {
_id: null,
docs: { $push: "$$ROOT" },
totalPrices: { $sum: "$prices" }
}
},
{
$project: {
docs: {
$concatArrays: [
"$docs",
[
{
name: "total",
prices: "$totalPrices"
}
]
]
}
}
},
{ $unwind: "$docs" },
{
$project: {
_id: 0,
name: "$docs.name",
prices: "$docs.prices"
}
}
])
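To make the effect of the four stages concrete, here is a small Python sketch that mimics them on an in-memory list. It is not MongoDB code, and `append_total` is a hypothetical helper name.

```python
def append_total(docs):
    """Mimic the pipeline: keep every document and append one total document."""
    # $group: sum all prices
    total = sum(doc["prices"] for doc in docs)
    # final $project: keep only name and prices
    projected = [{"name": doc["name"], "prices": doc["prices"]} for doc in docs]
    # $concatArrays + $unwind: the total document becomes the last result row
    return projected + [{"name": "total", "prices": total}]


result = append_total([{"name": "a", "prices": 10}, {"name": "a", "prices": 20}])
print(result)
```

Note that, as in the pipeline, the total is stored in the `prices` field of the last document and not in a separate `total` field.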

Elasticsearch group/aggregate response by search criteria

I have a product that has a property categoryIds.
"id" : 1,
"title" : "product",
"price" : "1100.00",
"categories" : [ the ids of the product's categories],
"tags" : [ the ids of the product's tags ],
"variants" : [ nested type with properties: name, definition, maybe in the future availability dates]
I want to group the product IDs by the categories used in the query.
In POST _search, I ask for products that belong to specific categories (e.g. [1, 2, 3]), and I can also narrow them down by variant.
How can I group/aggregate the response to get a list of product IDs for each category?
What I'm trying to get:
{
"productsForCategories": {
"1": [
"product-1",
"product-2",
"product-3"
],
"2": [
"product-1",
"product-3",
"product-4"
],
"3": [
"product-5",
"product-6"
]
}
}
Thanks in advance for all answers.
This is the request that the Java client generated:
curl --location --request POST 'https://localhost:9200/products/_search' \
--header 'Content-Type: application/json' \
--data-raw '{
"size": 0,
"query": {
"bool": {
"must": [
{
"bool": {
"should": [
{
"term": {
"categories": {
"value": 7,
"boost": 1.0
}
}
}
],
"adjust_pure_negative": true,
"minimum_should_match": "1",
"boost": 1.0,
"_name": "fromRawQuery"
}
}
],
"filter": [
{
"bool": {
"adjust_pure_negative": true,
"boost": 1.0,
"_name": "filterPart"
}
}
],
"adjust_pure_negative": true,
"boost": 1.0,
"_name": "queryPart"
}
},
"_source": {
"includes": [
"categories",
"productType",
"relations"
],
"excludes": []
},
"stored_fields": "_id",
"sort": [
{
"_score": {
"order": "desc"
}
}
],
"aggregations": {
"agg": {
"global": {},
"aggregations": {
"categories": {
"terms": {
"field": "categories",
"size": 2147483647,
"min_doc_count": 1,
"shard_min_doc_count": 0,
"show_term_doc_count_error": false,
"order": [
{
"_count": "desc"
},
{
"_key": "asc"
}
]
},
"aggregations": {
"productsForCategories": {
"terms": {
"field": "_id",
"size": 2147483647,
"min_doc_count": 1,
"shard_min_doc_count": 0,
"show_term_doc_count_error": false,
"order": [
{
"_count": "desc"
},
{
"_key": "asc"
}
]
}
}
}
}
}
}
}
}'
You can use a terms aggregation, which is a multi-bucket, value-source-based aggregation where buckets are built dynamically, one per unique value.
Here is a working example with index mapping, index data, search query, and search result.
Index Mapping:
{
"mappings":{
"properties":{
"categories":{
"type":"keyword"
}
}
}
}
Index Data:
{
"id":1,
"product":"p1",
"categories":[1,2,7]
}
{
"id":2,
"product":"p2",
"categories":[7,4,5]
}
{
"id":3,
"product":"p3",
"categories":[4,5,6]
}
Search Query:
{
"size": 0,
"aggs": {
"cats": {
"terms": {
"field": "categories",
"include": [
7
]
},
"aggs": {
"products": {
"terms": {
"field": "product.keyword",
"size": 10
}
}
}
}
}
}
Search Result:
"aggregations": {
"cats": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 7,
"doc_count": 2,
"products": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "p1",
"doc_count": 1
},
{
"key": "p2",
"doc_count": 1
}
]
}
}
]
}
I believe what you want is the products corresponding to each category. As Bhavya mentioned, you can use a terms aggregation for that.
GET products/_search
{
"size": 0, //<===== If you need only aggregated results, set this to 0. It represents query result size.
"aggs": {
"categories": {
"terms": {
"field": "cat_ids", // <================= Equivalent of group by Cat_ids
"size": 10
},"aggs": {
"products": {
"terms": {
"field": "name.keyword",//<============= For Each category group by products
"size": 10
}
}
}
}
}
}
Result:
"aggregations" : {
"categories" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 1, //<========== category id
"doc_count" : 2, //<========== For the given category id 2 products
"products" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "p1", //<========= for cat_id=1, p1 is there
"doc_count" : 1
},
{
"key" : "p2", //<========= for cat_id=1, p2 is there
"doc_count" : 1
}
]
}
},
{
"key" : 2,
"doc_count" : 2,
"products" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "p1",
"doc_count" : 1
},
{
"key" : "p2",
"doc_count" : 1
}
]
}
},
{
"key" : 3,
"doc_count" : 1,
"products" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "p1",
"doc_count" : 1
}
]
}
}
]
}
}
Details are present as comments. Please remove the comments and try running the query.
Filtering aggregation results: See this
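The nested buckets in these responses still have to be reshaped on the client side to get the `productsForCategories` map from the question. A minimal Python sketch of that step follows. The aggregation names `categories` and `products` match the second query above, and `products_for_categories` is a hypothetical helper name.

```python
def products_for_categories(response, category_agg="categories", product_agg="products"):
    """Flatten nested terms buckets into {category id: [product keys]}."""
    buckets = response["aggregations"][category_agg]["buckets"]
    return {
        # JSON object keys are strings, so the category id is stringified.
        str(category["key"]): [p["key"] for p in category[product_agg]["buckets"]]
        for category in buckets
    }


sample = {
    "aggregations": {
        "categories": {
            "buckets": [
                {
                    "key": 1,
                    "doc_count": 2,
                    "products": {
                        "buckets": [
                            {"key": "p1", "doc_count": 1},
                            {"key": "p2", "doc_count": 1},
                        ]
                    },
                },
                {
                    "key": 3,
                    "doc_count": 1,
                    "products": {"buckets": [{"key": "p1", "doc_count": 1}]},
                },
            ]
        }
    }
}

print({"productsForCategories": products_for_categories(sample)})
```

For the first example's response, the same function would be called with `category_agg="cats"`.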

Completion suggester in Elasticsearch on a multi-field

I'm using Elasticsearch for the first time. I'm trying to use the completion suggester on a multi-field. I don't see any error, but I don't get any suggestions in the response.
Mapping creation:
PUT /products5/
{
"mappings":{
"products" : {
"properties" : {
"name" : {
"type":"text",
"fields":{
"text":{
"type":"keyword"
},
"suggest":{
"type" : "completion"
}
}
}
}
}
}
}
Indexing:
PUT /products5/product/1
{
"name": "Apple iphone 5"
}
PUT /products5/product/2
{
"name": "iphone 4 16GB"
}
PUT /products5/product/3
{
"name": "iphone 3 SS 16GB black"
}
PUT /products5/product/4
{
"name": "Apple iphone 4 S 16 GB white"
}
PUT /products5/product/5
{
"name": "Apple iphone case"
}
Query:
POST /products5/product/_search
{
"suggest":{
"my-suggestion":{
"prefix":"i",
"completion":{
"field":"name.suggest"
}
}
}
}
Output:
{
"took": 0,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 0,
"max_score": 0,
"hits": []
},
"suggest": {
"my-suggestion": [
{
"text": "i",
"offset": 0,
"length": 1,
"options": []
}
]
}
}
Please tell me what the mistake is; I have tried every option I could think of.
At first glance this looks correct. One possible reason for the empty response is that you added documents to the index before you created the mapping, in which case the documents were not indexed according to the mapping you specified.
I have also found an issue with your mapping type name. There is an inconsistency between the type name in the mapping and the one in the URL you use when creating documents. You create the mapping with the type products, but when you add documents you specify product, without the trailing s. That is a typo, so the documents are not indexed with the mapping that defines name.suggest.
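A quick way to catch this kind of mismatch is to compare the type names declared in the mapping with the type segment of each document URL. The Python sketch below only inspects strings and dicts and does not call Elasticsearch. It assumes typed mappings and `/<index>/<type>/<id>` paths as used in the question, and `type_mismatches` is a hypothetical helper name.

```python
def type_mismatches(mapping_body, document_paths):
    """Return document paths whose type segment is not declared in the mapping."""
    declared = set(mapping_body["mappings"])
    mismatched = []
    for path in document_paths:
        # Paths look like /<index>/<type>/<id>
        _, _index, doc_type, _doc_id = path.split("/")
        if doc_type not in declared:
            mismatched.append(path)
    return mismatched


mapping = {
    "mappings": {
        "products": {
            "properties": {
                "name": {
                    "type": "text",
                    "fields": {"suggest": {"type": "completion"}},
                }
            }
        }
    }
}

print(type_mismatches(mapping, ["/products5/product/1", "/products5/products/2"]))
```

With the mapping and URLs from the question, every document path would be reported, because all of them use `product` while the mapping declares `products`.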

Change row color using Google spreadsheet API

I'd like to change the row color of a spreadsheet using the Google Spreadsheet API.
I'm using Java. I saw it working in JavaScript, but I haven't found how to do it in Java.
The Google Sheets API documentation is not the best, to say the least, but after some fiddling here is Python code that works:
import httplib2
from googleapiclient import discovery

# `credentials` (an authorized oauth2client credentials object) and `ss.id`
# (the spreadsheet ID) are assumed to be defined elsewhere.
http = credentials.authorize(httplib2.Http())
discovery_url = 'https://sheets.googleapis.com/$discovery/rest?version=v4'
service = discovery.build('sheets', 'v4', http=http,
                          discoveryServiceUrl=discovery_url, cache_discovery=False)
spreadsheet = service.spreadsheets().get(spreadsheetId=ss.id).execute()

requests = []
for sheet in spreadsheet.get('sheets'):
    sheetId = sheet.get('properties').get('sheetId')
    # Set the background of the first cell (row 0, column 0) of each sheet to red
    requests.append({
        "updateCells": {
            "rows": [
                {
                    "values": [{
                        "userEnteredFormat": {
                            "backgroundColor": {
                                "red": 1,
                                "green": 0,
                                "blue": 0,
                                "alpha": 1
                            }
                        }
                    }]
                }
            ],
            "fields": 'userEnteredFormat.backgroundColor',
            "range": {
                "sheetId": sheetId,
                "startRowIndex": 0,
                "endRowIndex": 1,
                "startColumnIndex": 0,
                "endColumnIndex": 1
            }
        }
    })

body = {
    'requests': requests
}
response = service.spreadsheets().batchUpdate(spreadsheetId=ss.id, body=body).execute()
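The snippet above colors a single cell. Since the question asks about a whole row, one option is a `repeatCell` request whose range leaves out the column indexes, which makes the range unbounded across columns. The sketch below only builds the request body and does not call the API. `row_background_request` is a hypothetical helper name, and sheet ID 0 is an assumption.

```python
def row_background_request(sheet_id, row_index, red, green, blue):
    """Build a repeatCell request that colors one entire row."""
    return {
        "repeatCell": {
            # No column indexes: the range is unbounded across columns.
            "range": {
                "sheetId": sheet_id,
                "startRowIndex": row_index,
                "endRowIndex": row_index + 1,
            },
            "cell": {
                "userEnteredFormat": {
                    # Color components are floats between 0 and 1.
                    "backgroundColor": {"red": red, "green": green, "blue": blue}
                }
            },
            # Only touch the background color, leave values and other formatting alone.
            "fields": "userEnteredFormat.backgroundColor",
        }
    }


body = {
    "requests": [
        row_background_request(sheet_id=0, row_index=0, red=1.0, green=0.0, blue=0.0)
    ]
}
print(body)
```

The resulting `body` can be passed to `service.spreadsheets().batchUpdate(...)` in the same way as in the snippet above.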
With JavaScript API you could use this:
const range = {
sheetId: 250062959, // find your own
startRowIndex: 0,
endRowIndex: 1,
startColumnIndex: 0,
endColumnIndex: 1,
};
const request = {
spreadsheetId, // fill with your own
resource: {
requests: [
{
updateCells: {
range,
fields: '*',
rows: [
{
values: [
{
userEnteredValue: { stringValue: 'message' },
userEnteredFormat: {
backgroundColor: { red: 1, green: 0, blue: 0 },
},
},
],
},
],
},
},
],
},
};
try {
const result = await client.spreadsheets.batchUpdate(request);
console.log(result);
} catch (error) {
throw new Error(`update row error ${error}`);
}
I know it has been a long time since you originally asked, but according to the v4 API you could technically set a conditional format that is always true with spreadsheets.batchUpdate,
e.g. https://developers.google.com/sheets/api/samples/conditional-formatting
It may not be the easiest thing to manage, but it is technically possible.
object = {
  "updateCells": {
    "range": {
      "sheetId": sheetId,
      "startRowIndex": startRowIndex,
      "endRowIndex": endRowIndex,
      "startColumnIndex": startColumnIndex,
      "endColumnIndex": endColumnIndex
    },
    "rows": [{
      "values": [{
        "textFormatRuns": [
          {
            "format": {
              "foregroundColor": {
                "red": 0.0,
                "green": 1.0,
                "blue": 0.12
              }
            },
            "startIndex": 0
          }
        ]
      }]
    }],
    "fields": "textFormatRuns(format)"
  }
}
Set cell color:
https://developers.google.com/apps-script/reference/spreadsheet/range#setBackground(String)
Google Apps Script (JavaScript) is the only option as far as I know. It can't be done with the Spreadsheet API (gdata).
