I have created a composite aggregation query that aggregates on two different attributes, as shown below:
{
"from": 0,
"size": 0,
"query": {
"bool": {
"must": [
{
"nested": {
"query": {
"script": {
"script": {
"source": "params.territoryIds.contains(doc['territoryHierarchy.id'].value) ",
"lang": "painless",
"params": {
"territoryIds": [
12345678
]
}
},
"boost": 1.0
}
},
"path": "territoryHierarchy",
"ignore_unmapped": false,
"score_mode": "none",
"boost": 1.0
}
},
{
"bool": {
"should": [
{
"nested": {
"query": {
"script": {
"script": {
"source": "doc['forecastHeaders.id'].value == params.id && doc['forecastHeaders.revenueCategory'].value == params.revenueCategory ",
"lang": "painless",
"params": {
"revenueCategory": 0,
"id": 987654321
}
},
"boost": 1.0
}
},
"path": "forecastHeaders",
"ignore_unmapped": false,
"score_mode": "none",
"boost": 1.0
}
},
{
"nested": {
"query": {
"script": {
"script": {
"source": "doc['forecastHeaders.id'].value == params.id && doc['forecastHeaders.revenueCategory'].value == params.revenueCategory ",
"lang": "painless",
"params": {
"revenueCategory": 0,
"id": 987654321
}
},
"boost": 1.0
}
},
"path": "forecastHeaders",
"ignore_unmapped": false,
"score_mode": "none",
"boost": 1.0
}
}
],
"adjust_pure_negative": true,
"boost": 1.0
}
},
{
"terms": {
"revnWinProbability": [
40,
50
],
"boost": 1.0
}
},
{
"terms": {
"revenueStatus.keyword": [
"OPEN"
],
"boost": 1.0
}
},
{
"range": {
"recordUpdateTime":{
"gte":1655117440000
}
}
}
],
"adjust_pure_negative": true,
"boost": 1.0
}
},
"version": true,
"aggregations": {
"TopLevelAggregation": {
"composite" : {
"size" : 10000,
"sources" : [
{
"directs": {
"terms": {
"script": {
"source": "def territoryNamesList = new ArrayList(); def name; def thLength = params._source.territoryHierarchy.length; for(int i = 0; i< thLength;i++) { def thRecord = params._source.territoryHierarchy[i]; if (params.territoryIds.contains(thRecord.id) && i+params.levelToReturn < thLength) { territoryNamesList.add(params._source.territoryHierarchy[i+params.levelToReturn].name);} } return territoryNamesList;",
"lang": "painless",
"params": {
"territoryIds": [
12345678
],
"levelToReturn": 1
}
}
}
}
},
{
"qtr" : {
"terms" : {
"field" : "quarter.keyword",
"missing_bucket" : false,
"order" : "asc"
}
}
}
]
},
"aggregations": {
"revnRevenueAmount": {
"sum": {
"script": {
"source": "doc['revenueTypeCategory.keyword'].value != 'Other' ? doc['revnRevenueAmount']:doc['revnRevenueAmount']",
"lang": "painless"
},
"value_type": "long"
}
}
}
}
}
}
So this query does a composite aggregation based on two different terms aggregations, directs and qtr, and it works fine.
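To make the bucketing concrete, here is a minimal plain-Java sketch of what a composite aggregation over two terms sources computes: one bucket per (directs, qtr) combination, with a sum per bucket. It does not use the Elasticsearch client at all. The `Doc` class, the `compositeSum` method and the sample values are hypothetical, and each document is assumed to produce a single `directs` value, whereas the script in the query above can return several.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class CompositeBucketsSketch {

    // Hypothetical, simplified view of one document: a single "directs"
    // value, a quarter, and a revenue amount.
    static final class Doc {
        final String direct;
        final String quarter;
        final long amount;

        Doc(String direct, String quarter, long amount) {
            this.direct = direct;
            this.quarter = quarter;
            this.amount = amount;
        }
    }

    // Groups documents by the (direct, quarter) pair and sums the amount
    // per bucket. The TreeMap keeps the pairs in ascending key order.
    static Map<List<String>, Long> compositeSum(List<Doc> docs) {
        Map<List<String>, Long> buckets = new TreeMap<>((a, b) -> {
            int byDirect = a.get(0).compareTo(b.get(0));
            return byDirect != 0 ? byDirect : a.get(1).compareTo(b.get(1));
        });
        for (Doc d : docs) {
            buckets.merge(Arrays.asList(d.direct, d.quarter), d.amount, Long::sum);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<Doc> docs = new ArrayList<>();
        docs.add(new Doc("East", "2022-Q1", 100));
        docs.add(new Doc("East", "2022-Q1", 50));
        docs.add(new Doc("West", "2022-Q2", 70));
        // Prints one line per (direct, quarter) bucket with its summed amount.
        compositeSum(docs).forEach((key, sum) -> System.out.println(key + " -> " + sum));
    }
}
```

Each key in the resulting map plays the role of one composite bucket key, and the summed value plays the role of the revnRevenueAmount sub-aggregation.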
Now I am trying to write the corresponding Java client implementation with Spring Data. So far I have the following code:
BoolQueryBuilder baseQueryBuilder = getQueryBuilder(searchCriteria);
List<TermsAggregationBuilder> aggregationBuilders = getMultiBaseAggregationBuilders(searchCriteria, baseQueryBuilder);
Here getQueryBuilder supplies the bool query part, and getMultiBaseAggregationBuilders returns the two terms aggregations shown in the query above, directs and qtr. I cannot find any API for passing this list of terms aggregations to the composite aggregation builder. I would be really grateful if someone could point me to how this list can be used inside the composite aggregation builder, so that the Java code produces the same request as the Elasticsearch query above. Thanks in advance.
I am using Java to perform queries on Elasticsearch, via the ElasticSearchClient. Because the returned documents contain large variables, I would like to retrieve only the relevant ones, but the variables in _source are nested.
Below is a sample response (hits from multiple indices can be returned with the same _source structure):
[
{
"_index": "kn-tas-20200630",
"_type": "_doc",
"_id": "1122334455",
"_score": null,
"_source": {
"variables": [
{
"rawValue": "DEFH",
"name": "MANAGER"
},
{
"rawValue": "ABCD",
"name": "EMPLOYEE"
},
{
"rawValue": "[{\"rowId\":102030,\"rowType\":\"SIM\"}]",
"name": "extData"
}
]
},
"sort": [
1665735632119
]
}
]
I would like to create a query using SearchSourceBuilder to query ES and retrieve only the following:
Get the rawValue by name (I provide MANAGER, I get "DEFH")
Get the rowType value (I provide extData + rowType, I get "SIM")
Below is my query:
{
"from": 0,
"size": 100,
"query": {
"bool": {
"must": [
{
"terms": {
"prcKey": [
"K-112"
],
"boost": 1.0
}
}
],
"must_not": [
{
"exists": {
"field": "endDate",
"boost": 1.0
}
},
{
"term": {
"personInCharge": {
"value": "ABC",
"boost": 1.0
}
}
}
],
"adjust_pure_negative": true,
"boost": 1.0
}
},
"_source": {
"includes": [
"variables.name",
"variables.rawValue"
],
"excludes": []
},
"sort": [
{
"createTime": {
"order": "desc"
}
}
]
}
How can I fix my query? I tried using nested queries but without any luck.
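For illustration only, the two lookups described above can also be done on the client after the hit has been fetched. The plain-Java sketch below is not a fix for the query itself. It assumes the `variables` array has already been read into a list of maps, the class and method names are hypothetical, and the regular expression is a deliberately simple stand-in for a real JSON parser.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class VariableLookupSketch {

    // Builds one entry of the "variables" array from the sample hit.
    static Map<String, String> variable(String name, String rawValue) {
        Map<String, String> v = new HashMap<>();
        v.put("name", name);
        v.put("rawValue", rawValue);
        return v;
    }

    // Returns the rawValue of the first variable whose name matches,
    // ignoring case, so "Manager" finds the entry named "MANAGER".
    static Optional<String> rawValueByName(List<Map<String, String>> variables, String name) {
        return variables.stream()
                .filter(v -> name.equalsIgnoreCase(v.get("name")))
                .filter(v -> v.get("rawValue") != null)
                .map(v -> v.get("rawValue"))
                .findFirst();
    }

    // Extracts a string-valued field from a JSON text with a regular
    // expression. Assumption: the value contains no escaped quotes.
    static Optional<String> stringField(String rawJson, String field) {
        Pattern p = Pattern.compile("\"" + Pattern.quote(field) + "\"\\s*:\\s*\"([^\"]*)\"");
        Matcher m = p.matcher(rawJson);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        List<Map<String, String>> variables = new ArrayList<>();
        variables.add(variable("MANAGER", "DEFH"));
        variables.add(variable("EMPLOYEE", "ABCD"));
        variables.add(variable("extData", "[{\"rowId\":102030,\"rowType\":\"SIM\"}]"));

        System.out.println(rawValueByName(variables, "Manager").orElse("not found"));
        System.out.println(rawValueByName(variables, "extData")
                .flatMap(raw -> stringField(raw, "rowType"))
                .orElse("not found"));
    }
}
```

With the sample hit above, the first lookup yields DEFH and the second yields SIM.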
I have a set of phrases such as [remix], [18+], etc. How can I search by a single character, for example "[", and find all of these variants?
Right now I have the following analyzer config:
{
"analysis": {
"analyzer": {
"bigram_analyzer": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"lowercase",
"bigram_filter"
]
},
"full_text_analyzer": {
"type": "custom",
"tokenizer": "ngram_tokenizer",
"filter": [
"lowercase"
]
}
},
"filter": {
"bigram_filter": {
"type": "edge_ngram",
"max_gram": 2
}
},
"tokenizer": {
"ngram_tokenizer": {
"type": "ngram",
"min_gram": 3,
"max_gram": 3,
"token_chars": [
"letter",
"digit",
"symbol",
"punctuation"
]
}
}
}
}
The mapping is defined at the Java entity level using the Spring Boot Data Elasticsearch starter.
If I understand your problem correctly, you want to implement an autocomplete analyzer that returns any term that starts with [ or any other character. To do so, you can create a custom analyzer that uses an edge_ngram filter. Here is an example.
Here is the testing index:
PUT /testing-index-v3
{
"settings": {
"number_of_shards": 1,
"analysis": {
"filter": {
"autocomplete_filter": {
"type": "edge_ngram",
"min_gram": 1,
"max_gram": 15
}
},
"analyzer": {
"autocomplete": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"lowercase",
"autocomplete_filter"
]
}
}
}
},
"mappings": {
"properties": {
"term": {
"type": "text",
"analyzer": "autocomplete"
}
}
}
}
Here are the input documents:
POST /testing-index-v3/_doc
{
"term": "[+18]"
}
POST testing-index-v3/_doc
{
"term": "[remix]"
}
POST testing-index-v3/_doc
{
"term": "test"
}
And finally our search:
GET testing-index-v3/_search
{
"query": {
"match": {
"term": {
"query": "[remi",
"analyzer": "keyword",
"fuzziness": 0
}
}
}
}
As you can see, I chose the keyword tokenizer for the autocomplete analyzer. The edge_ngram filter with min_gram 1 and max_gram 15 means that each indexed term is expanded into its prefixes, like this:
input-query = i, in, inp, inpu, input, and so on, up to a prefix of 15 characters. This expansion is wanted only at indexing time. In the query we specify the keyword analyzer, which applies at search time: it keeps the query text as a single token, so the query must exactly match one of the indexed prefixes. Here are some example searches and results:
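The prefix expansion described above can be sketched in plain Java. This is only an approximation of what the keyword tokenizer, lowercase filter and edge_ngram filter do together. It is not Elasticsearch code, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class EdgeNgramSketch {

    // Index time: treat the whole input as one token (keyword tokenizer),
    // lowercase it, then emit every prefix from minGram to maxGram
    // characters. Lengths are counted in Java chars, which is enough for
    // the ASCII examples used here.
    static List<String> edgeNgrams(String term, int minGram, int maxGram) {
        String token = term.toLowerCase(Locale.ROOT);
        List<String> grams = new ArrayList<>();
        int longest = Math.min(maxGram, token.length());
        for (int len = minGram; len <= longest; len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }

    // Search time: the keyword analyzer leaves the query untouched, so a
    // document matches only if the query equals one of its indexed prefixes.
    static boolean matches(String indexedTerm, String query) {
        return edgeNgrams(indexedTerm, 1, 15).contains(query);
    }

    public static void main(String[] args) {
        System.out.println(edgeNgrams("[remix]", 1, 15));
        for (String term : new String[] {"[+18]", "[remix]", "test"}) {
            System.out.println(term + " matches \"[\": " + matches(term, "["));
        }
    }
}
```

Because the query side is not lowercased in this setup, a query such as "[Remi" would not match even though "[remi" does.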
GET testing-index-v3/_search
{
"query": {
"match": {
"term": {
"query": "[",
"analyzer": "keyword",
"fuzziness": 0
}
}
}
}
result:
"hits" : [
{
"_index" : "testing-index-v3",
"_type" : "_doc",
"_id" : "w5c_IHsBGGZ-oIJIi-6n",
"_score" : 0.7040055,
"_source" : {
"term" : "[remix]"
}
},
{
"_index" : "testing-index-v3",
"_type" : "_doc",
"_id" : "xJc_IHsBGGZ-oIJIju7m",
"_score" : 0.7040055,
"_source" : {
"term" : "[+18]"
}
}
]
GET testing-index-v3/_search
{
"query": {
"match": {
"term": {
"query": "[+",
"analyzer": "keyword",
"fuzziness": 0
}
}
}
}
result:
"hits" : [
{
"_index" : "testing-index-v3",
"_type" : "_doc",
"_id" : "xJc_IHsBGGZ-oIJIju7m",
"_score" : 0.7040055,
"_source" : {
"term" : "[+18]"
}
}
]
Hope this answer helps you. Good luck with your adventures with elasticsearch!
I tried to write a filter query using the Elasticsearch Java API, version 7.6,
but I could not find good documentation on how to write a filter-context search.
Does anyone know how to write the following with the Java API?
GET /_search
{
"query": {
"bool": {
"must": [
{ "match": { "title": "Search" }},
{ "match": { "content": "Elasticsearch" }}
],
"filter": [
{ "term": { "status": "published" }},
{ "range": { "publish_date": { "gte": "2015-01-01" }}}
]
}
}
}
Try the following
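import java.util.List;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.builder.SearchSourceBuilder;

BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
// Query context: these clauses must match and contribute to the score.
List<QueryBuilder> mustClauses = boolQueryBuilder.must();
mustClauses.add(QueryBuilders.matchQuery("title", "Search"));
mustClauses.add(QueryBuilders.matchQuery("content", "Elasticsearch"));
// Filter context: these clauses must match but do not affect the score.
List<QueryBuilder> filterClauses = boolQueryBuilder.filter();
filterClauses.add(QueryBuilders.termQuery("status", "published"));
filterClauses.add(QueryBuilders.rangeQuery("publish_date").gte("2015-01-01"));
SearchRequest searchRequest = new SearchRequest();
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(boolQueryBuilder);
searchRequest.source(searchSourceBuilder);
// Print the request body as JSON.
System.out.println(searchSourceBuilder.toString());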
The resulting query is
{
"query": {
"bool": {
"must": [
{
"match": {
"title": {
"query": "Search",
"operator": "OR",
"prefix_length": 0,
"max_expansions": 50,
"fuzzy_transpositions": true,
"lenient": false,
"zero_terms_query": "NONE",
"auto_generate_synonyms_phrase_query": true,
"boost": 1.0
}
}
},
{
"match": {
"content": {
"query": "Elasticsearch",
"operator": "OR",
"prefix_length": 0,
"max_expansions": 50,
"fuzzy_transpositions": true,
"lenient": false,
"zero_terms_query": "NONE",
"auto_generate_synonyms_phrase_query": true,
"boost": 1.0
}
}
}
],
"filter": [
{
"term": {
"status": {
"value": "published",
"boost": 1.0
}
}
},
{
"range": {
"publish_date": {
"from": "2015-01-01",
"to": null,
"include_lower": true,
"include_upper": true,
"boost": 1.0
}
}
}
],
"adjust_pure_negative": true,
"boost": 1.0
}
}
}
I am trying to achieve something similar to the following SQL for the JSON document below.
SELECT * FROM TABLE_NAME WHERE ORGID = "BACKENDORG" AND TYPE = "AUDIT" AND
(sessionId LIKE '%16ECA064B298356B%' OR loginUserId LIKE '%16ECA064B298356B%' OR txnId LIKE '%16ECA064B298356B%');
{
"sessionId": "16ECA064B298356B",
"message": "2019-12-03T05:29:13.217Z [http-nio-8080-exec-4] INFO http-nio-8080-exec-4 QueryController backendorg 16CFAFCCFB14D9A3 16ECA064B298356B 16ECA3A4EFA026BF
"type": "audit",
"orgId": "backendorg",
"loginUserId": "16CFAFCCFB14D9A3",
"txnId": "16ECA3A4EFA026BF"
}
I am trying to write a LIKE query using BoolQueryBuilder. Here is my query:
{
"query": {
"bool": {
"must": [
{
"term": {
"orgId": {
"value": "backendorg",
"boost": 1
}
}
},
{
"term": {
"type": {
"value": "audit",
"boost": 1
}
}
},
{
"bool": {
"should": [
{
"wildcard": {
"sessionId": {
"wildcard": "16ECA064B298356B",
"boost": 1
}
}
},
{
"wildcard": {
"loginUserId": {
"wildcard": "16ECA064B298356B",
"boost": 1
}
}
},
{
"wildcard": {
"txnId": {
"wildcard": "16ECA064B298356B",
"boost": 1
}
}
}
],
"adjust_pure_negative": true,
"minimum_should_match": "1",
"boost": 1
}
}
],
"adjust_pure_negative": true,
"boost": 1
}
}
}
The above query should return the JSON document above, but it shows zero hits.
It would be helpful if someone could point out the issue.
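One thing that may be worth checking: a wildcard query is a term-level query, so the pattern is compared against whole indexed terms and, by default, is not analyzed. The plain-Java sketch below models only that matching rule. It is not Elasticsearch code, and the class and method names are hypothetical. It also assumes the fields are text fields analyzed with the standard analyzer, which would index the values in lowercase. The mapping is not shown here, so this is only a guess.

```java
import java.util.regex.Pattern;

final class WildcardSketch {

    // Translates a wildcard pattern into a regular expression:
    // '*' matches any character sequence, '?' matches one character,
    // everything else is taken literally. Backslash escaping is ignored.
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    // The pattern has to cover the whole indexed term, and the comparison
    // is case-sensitive.
    static boolean matchesTerm(String wildcard, String indexedTerm) {
        return toRegex(wildcard).matcher(indexedTerm).matches();
    }

    public static void main(String[] args) {
        // Assumed indexed form of the sessionId after lowercasing.
        String indexedTerm = "16eca064b298356b";
        System.out.println(matchesTerm("16ECA064B298356B", indexedTerm));
        System.out.println(matchesTerm("*16eca064b298356b*", indexedTerm));
    }
}
```

Under that assumption, the uppercase pattern without any * behaves like an exact, case-sensitive comparison and does not match, while a lowercase pattern wrapped in * is the closer equivalent of the SQL LIKE '%...%' condition.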
I have a huge amount of Elasticsearch data. I need to run aggregations and return buckets, but I need to limit the amount of data returned from Elasticsearch so that I get only a sample of the data rather than all of it.
I have tried adding a "size" attribute, but it is not accepted in bucketing aggregations.
{
"size": 0,
"query": {
"bool": {
"adjust_pure_negative": true,
"boost": 1
}
},
"aggregations": {
"my_agg_1": {
"histogram": {
"field": "coAt",
"interval": 86400,
"offset": 1558216800,
"order": {
"_key": "asc"
},
"keyed": false,
"min_doc_count": 1
},
"aggregations": {
"my_agg_2": {
"terms": {
"field": "atr1",
"missing": "NaN",
"value_type": "string",
"size": 2147483647,
"min_doc_count": 1,
"shard_min_doc_count": 0,
"show_term_doc_count_error": false,
"order": [
{
"_count": "desc"
},
{
"_key": "asc"
}
]
},
"aggregations": {
"atr2": {
"top_hits": {
"from": 0,
"size": 1,
"version": false,
"explain": false,
"sort": [
{
"coAt": {
"order": "desc"
}
}
]
}
},
"clientIP_count": {
"value_count": {
"field": "clientIP"
}
}
}
}
}
}
}
}