I am fetching documents from Elasticsearch using the Java API. My documents contain the following code, and I am searching for it with the patterns below.
code : MS-VMA1615-0D
Input : *VMA1615-0* -- Am getting the results (MS-VMA1615-0D).
Input : MS-VMA1615-0D -- Am getting the results (MS-VMA1615-0D).
Input : *VMA1615-0 -- Am getting the results (MS-VMA1615-0D).
Input : *VMA*-0* -- Am getting the results (MS-VMA1615-0D).
But if I give input like the one below, I get no results.
Input : VMA1615 -- Am not getting the results.
I expect this to return the code MS-VMA1615-0D.
Please find below the Java code that I am using:
private final String INDEX = "products";
private final String TYPE = "doc";
SearchRequest searchRequest = new SearchRequest(INDEX);
searchRequest.types(TYPE);
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
QueryStringQueryBuilder qsQueryBuilder = new QueryStringQueryBuilder(code);
qsQueryBuilder.defaultField("code");
searchSourceBuilder.query(qsQueryBuilder);
searchSourceBuilder.size(50);
searchRequest.source(searchSourceBuilder);
SearchResponse searchResponse = null;
try {
searchResponse = SearchEngineClient.getInstance().search(searchRequest);
} catch (IOException e) {
e.getLocalizedMessage();
}
Item item = null;
SearchHit[] searchHits = searchResponse.getHits().getHits();
Please find my mapping details :
PUT products
{
"settings": {
"analysis": {
"analyzer": {
"custom_analyzer": {
"type": "custom",
"tokenizer": "whitespace",
"char_filter": [
"html_strip"
],
"filter": [
"lowercase",
"asciifolding"
]
}
}
}
},
"mappings": {
"doc": {
"properties": {
"code": {
"type": "text",
"analyzer": "custom_analyzer"
}
}
}
}
}
To get this behaviour you need to change the tokenizer. You are currently using the whitespace tokenizer, which indexes "MS-VMA1615-0D" as a single term, so a search for VMA1615 has no term to match. Replace it with the pattern tokenizer, which by default splits on non-word characters (\W+).
So your new mapping should look like this:
PUT products
{
"settings": {
"analysis": {
"analyzer": {
"custom_analyzer": {
"type": "custom",
"tokenizer": "pattern",
"char_filter": [
"html_strip"
],
"filter": [
"lowercase",
"asciifolding"
]
}
}
}
},
"mappings": {
"doc": {
"properties": {
"code": {
"type": "text",
"analyzer": "custom_analyzer"
}
}
}
}
}
After changing the mapping (and reindexing the documents), a query for VMA1615 will return MS-VMA1615-0D.
This works because the pattern tokenizer splits the string "MS-VMA1615-0D" into "MS", "VMA1615" and "0D" (the lowercase filter then lowercases them). A query containing any of these terms will match the document.
POST _analyze
{
"tokenizer": "pattern",
"text": "MS-VMA1615-0D"
}
will return:
{
"tokens": [
{
"token": "MS",
"start_offset": 0,
"end_offset": 2,
"type": "word",
"position": 0
},
{
"token": "VMA1615",
"start_offset": 3,
"end_offset": 10,
"type": "word",
"position": 1
},
{
"token": "0D",
"start_offset": 11,
"end_offset": 13,
"type": "word",
"position": 2
}
]
}
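The difference between the two tokenizers can be reproduced outside Elasticsearch. The sketch below is only an approximation: it assumes the whitespace tokenizer behaves like a split on \s+ and the pattern tokenizer's default like a split on \W+, followed by lowercasing. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class PatternTokenizerSketch {
    // Splits on the given regex, drops empty pieces and lowercases each token
    // (a rough stand-in for tokenizer + lowercase filter).
    public static List<String> tokenize(String text, String splitRegex) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split(splitRegex)) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Whitespace-like split: the whole code stays one term.
        System.out.println(tokenize("MS-VMA1615-0D", "\\s+")); // [ms-vma1615-0d]
        // Pattern-like split on non-word characters: three terms.
        System.out.println(tokenize("MS-VMA1615-0D", "\\W+")); // [ms, vma1615, 0d]
    }
}
```

With the first split there is no term "vma1615" to match, which is why only the wildcard queries found the document.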
Based on your comment:
That is not how Elasticsearch works. Elasticsearch stores terms and
the documents that contain them in an inverted index, and the terms
are whatever the analyzer produces at index time. By default the text
is split roughly on word boundaries such as whitespace, i.e. the text
"Hi there I am a technocrat" would be split up into the words
["Hi", "there", "I", "am", "a", "technocrat"]. So the terms that get
stored depend on how the text is tokenized. After indexing, if I query
for "technocrat" in the above example, I get the result because the
inverted index has that term associated with my document. In your case "VMA" is not stored as a term.
To make "VMA" a term of its own, use the mapping below, which splits on hyphens and digits:
PUT products
{
"settings": {
"analysis": {
"analyzer": {
"custom_analyzer": {
"type": "custom",
"tokenizer": "my_pattern_tokenizer",
"char_filter": [
"html_strip"
],
"filter": [
"lowercase",
"asciifolding"
]
}
},
"tokenizer": {
"my_pattern_tokenizer": {
"type": "pattern",
"pattern": "-|\\d"
}
}
}
},
"mappings": {
"doc": {
"properties": {
"code": {
"type": "text",
"analyzer": "custom_analyzer"
}
}
}
}
}
So to check:
POST products/_analyze
{
"tokenizer": "my_pattern_tokenizer",
"text": "MS-VMA1615-0D"
}
will produce:
{
"tokens": [
{
"token": "MS",
"start_offset": 0,
"end_offset": 2,
"type": "word",
"position": 0
},
{
"token": "VMA",
"start_offset": 3,
"end_offset": 6,
"type": "word",
"position": 1
},
{
"token": "D",
"start_offset": 12,
"end_offset": 13,
"type": "word",
"position": 2
}
]
}
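To see what this means for searching, here is a toy inverted index. It is a sketch under assumptions, not Elasticsearch itself: the analyzer is approximated by a split on "-" or any digit plus lowercasing, the query is analyzed the same way, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class InvertedIndexSketch {
    // Approximates the custom analyzer: split on "-" or a digit, then lowercase.
    public static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String part : text.split("-|\\d")) {
            if (!part.isEmpty()) {
                terms.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // Maps each term to the ids of the documents that contain it.
    public static Map<String, Set<String>> buildIndex(Map<String, String> docs) {
        Map<String, Set<String>> index = new TreeMap<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            for (String term : analyze(doc.getValue())) {
                index.computeIfAbsent(term, t -> new TreeSet<>()).add(doc.getKey());
            }
        }
        return index;
    }

    // Analyzes the query like the documents and returns ids matching any term.
    public static Set<String> search(Map<String, Set<String>> index, String query) {
        Set<String> hits = new TreeSet<>();
        for (String term : analyze(query)) {
            hits.addAll(index.getOrDefault(term, Collections.emptySet()));
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new HashMap<>();
        docs.put("1", "MS-VMA1615-0D");
        Map<String, Set<String>> index = buildIndex(docs);
        System.out.println(index);                 // {d=[1], ms=[1], vma=[1]}
        System.out.println(search(index, "VMA"));  // [1]
        System.out.println(search(index, "1615")); // []
    }
}
```

Note the trade-off: because digits are treated as separators, they are discarded, so a purely numeric query such as 1615 produces no terms at all.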
Related
I am trying to filter the records based on nested field and want only the matching object in that array to be shown as part of the record.
Below is the detailed explanation of my requirement.
So, I have Elasticsearch data like this:
[{
"basicInfo": {
"requestId": 123
},
"managerInfo": {
"manager": "John"
},
"groupInfo": [
{
"id": "id1",
"name": "abc",
"status": "Approved"
},
{
"id": "id2",
"name": "abc",
"status": "Pending"
}
]
},
{
"basicInfo": {
"requestId": 233
},
"managerInfo": {
"manager": "John Sr"
},
"groupInfo": [
{
"id": "id3",
"name": "abc",
"status": "Pending"
}
]
}
]
I want to return only the records with groupInfo.status equal to Approved and basicInfo.requestId equal to 123, and within each record I want groupInfo to contain only the Approved objects, not the Pending ones. So the output I am expecting is:
{
"took": 23,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 3.0602708,
"hits": [
{
"_index": "my_index",
"_type": "request",
"_id": "123",
"_score": 3.0602708,
"_source": {
"basicInfo": {
"requestId": 123
},
"managerInfo": {
"manager": "John"
},
"groupInfo": [
{
"id": "id1",
"name": "abc",
"status": "Approved"
}
// No id2 here as it is in pending state
]
}
}
]
}
}
But instead I am getting:
{
"took": 23,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 3.0602708,
"hits": [
{
"_index": "my_index",
"_type": "request",
"_id": "123",
"_score": 3.0602708,
"_source": {
"basicInfo": {
"requestId": 123
},
"managerInfo": {
"manager": "John"
},
"groupInfo": [
{
"id": "id1",
"name": "abc",
"status": "Approved"
},
{
"id": "id2",
"name": "abc",
"status": "Pending"
}
]
}
}
]
}
}
This is the query I am using:
{
"query": {
"bool": {
"must": [
{
"match": {
"basicInfo.requestId": "123"
}
},
{
"nested": {
"path": "groupInfo",
"query": {
"bool": {
"must": [
{
"term": {
"groupInfo.status": "Approved"
}
}
]
}
}
}
}
]
}
}
}
So my first question is: is what I am expecting even possible? Can we filter the result so that only the matching objects of the array are returned?
If yes, how can we do it?
Thanks in advance.
Maybe you are looking for Inner Hits.
In many cases, it’s very useful to know which inner nested objects (in
the case of nested) or children/parent documents (in the case of
parent/child) caused certain information to be returned. The inner
hits feature can be used for this. This feature returns per search hit
in the search response additional nested hits that caused a search hit
to match in a different scope.
{
"query": {
"bool": {
"must": [
{
"match": {
"basicInfo.requestId": "123"
}
},
{
"nested": {
"path": "groupInfo",
"query": {
"bool": {
"must": [
{
"term": {
"groupInfo.status": "Approved"
}
}
]
}
},
"inner_hits":{}
}
}
]
}
}
}
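With inner_hits, the _source of each hit still contains the full groupInfo array; the nested objects that matched are reported separately in the inner_hits section of the hit. The sketch below only mimics that idea on plain Java collections, so it is an illustration of the result shape rather than the Elasticsearch API. The class, the helper methods and the sample data builder are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class InnerHitsSketch {
    // Builds one nested groupInfo object.
    static Map<String, String> group(String id, String name, String status) {
        Map<String, String> g = new LinkedHashMap<>();
        g.put("id", id);
        g.put("name", name);
        g.put("status", status);
        return g;
    }

    // The groupInfo array of the document with requestId 123.
    public static List<Map<String, String>> sampleGroupInfo() {
        return Arrays.asList(
                group("id1", "abc", "Approved"),
                group("id2", "abc", "Pending"));
    }

    // Returns only the nested objects with the given status; the input list is left as is.
    public static List<Map<String, String>> matchingGroups(
            List<Map<String, String>> groupInfo, String status) {
        List<Map<String, String>> matches = new ArrayList<>();
        for (Map<String, String> g : groupInfo) {
            if (status.equals(g.get("status"))) {
                matches.add(g);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<Map<String, String>> source = sampleGroupInfo();
        System.out.println(source.size());                      // 2
        System.out.println(matchingGroups(source, "Approved")); // [{id=id1, name=abc, status=Approved}]
    }
}
```

In a real response you would read the matched objects from the inner hits of each search hit instead of from _source.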
I have two filters defined in my config JSON file. I want to apply each of them separately to the original text and then combine the results.
"filter": {
"autocomplete_filter": {
"type": "edge_ngram",
"min_gram": 3,
"max_gram": 20
},
"shingle_filter": {
"type": "shingle",
"min_shingle_size": 1,
"max_shingle_size": 2
}
},
Example:
"best mac laptop" -> "best", "mac", "laptop", "best mac", "mac laptop", "bes", "best", "best ", "best m", "best ma", "best mac", ...
As above, I want to index the original data with the shingle filter and, separately, with the autocomplete filter, and then combine both sets of tokens in a single document. Is that possible? Is there any way to do it?
After looking closely at the Spring Data Elasticsearch docs, I was able to index the same field with two different analyzers.
@Document(indexName = "course-doc")
@Setting(settingPath = "es-config/autocomplete.json")
@Getter
@Setter
public class Course {
    @Id
    long id;
    @MultiField(
        mainField = @Field(type = FieldType.Text, analyzer = "autocomplete_index", searchAnalyzer = "autocomplete_search"),
        otherFields = {@InnerField(suffix = "search", type = FieldType.Text, analyzer = "search_index", searchAnalyzer = "autocomplete_search")})
    String name;
}
autocomplete.json
{
"analysis": {
"filter": {
"autocomplete_filter": {
"type": "edge_ngram",
"min_gram": 2,
"max_gram": 20
},
"shingle_filter": {
"type": "shingle",
"min_shingle_size": 1,
"max_shingle_size": 10
}
},
"analyzer": {
"autocomplete_search": {
"type": "custom",
"tokenizer": "standard",
"filter": [ "lowercase" ]
},
"autocomplete_index": {
"type": "custom",
"tokenizer": "standard",
"filter": [ "lowercase", "stop" , "autocomplete_filter" ]
},
"search_index": {
"type": "custom",
"tokenizer": "standard",
"filter": [ "lowercase" , "shingle_filter" ]
},
"standard-analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [ "lowercase", "stop" ]
}
}
}
}
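To get a feel for what the two index analyzers put into name and name.search, the token filters can be imitated in plain Java. This is a rough sketch, assuming whitespace splitting stands in for the standard tokenizer, edge n-grams are built per token with min_gram 2, and shingles are limited to two words to match the example in the question. The class and method names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AnalyzerSketch {
    // Prefixes of a single token, from minGram up to maxGram characters.
    public static List<String> edgeNgrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int len = minGram; len <= Math.min(maxGram, token.length()); len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }

    // Single tokens plus word n-grams of up to maxSize adjacent tokens.
    public static List<String> shingles(List<String> tokens, int maxSize) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            for (int size = 1; size <= maxSize && start + size <= tokens.size(); size++) {
                out.add(String.join(" ", tokens.subList(start, start + size)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("best mac laptop".split("\\s+"));

        // Roughly what the autocomplete analyzer would index in "name".
        List<String> autocomplete = new ArrayList<>();
        for (String token : tokens) {
            autocomplete.addAll(edgeNgrams(token, 2, 20));
        }
        System.out.println(autocomplete); // [be, bes, best, ma, mac, la, lap, lapt, lapto, laptop]

        // Roughly what the shingle analyzer would index in "name.search".
        System.out.println(shingles(tokens, 2)); // [best, best mac, mac, mac laptop, laptop]
    }
}
```

The two token sets live in separate sub-fields of the same document, so a query can target either one or both.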
This is my query. It performs the nested sort across documents, but I also want it to sort the entries of the item_numbers array within each document, in a single Elasticsearch query.
{
"query": {
"nested": {
"query": {
"bool": {
"must": [{
"match": {
"item_numbers.type": "catalog"
}
}]
}
},
"path": "item_numbers"
}
},
"sort": [{
"item_numbers.value.keyword": {
"order": "asc",
"nested": {
"path": "item_numbers"
}
}
}]
}
My output for the above query is below :
{
"data": [
{
"item_numbers": [
{
"value": "Ball",
"value_phonetic": "",
"type": "catalog"
},
{
"value": "Apple",
"value_phonetic": "",
"type": "catalog"
},
{
"value": "Cat",
"value_phonetic": "",
"type": "catalog"
}
]
},
{
"item_numbers": [
{
"value": "Cococola",
"value_phonetic": "",
"type": "catalog"
},
{
"value": "Appy",
"value_phonetic": "",
"type": "catalog"
}
]
}
]
}
But I also want the entries of the item_numbers array to be sorted within each document.
Expected output :
{
"data": [
{
"item_numbers": [
{
"value": "Apple",
"value_phonetic": "",
"type": "catalog"
},
{
"value": "Ball",
"value_phonetic": "",
"type": "catalog"
},
{
"value": "Cat",
"value_phonetic": "",
"type": "catalog"
}
]
},
{
"item_numbers": [
{
"value": "Appy",
"value_phonetic": "",
"type": "catalog"
},
{
"value": "Cococola",
"value_phonetic": "",
"type": "catalog"
}
]
}
]
}
Does anyone know what changes to make to the query to get this output?
The global sort, even though it's nested, is only applied on the top level -- meaning the inner docs don't get sorted.
What you're looking for is sorted inner_hits:
{
"_source": "sorted_item_numbers", <--
"query": {
"nested": {
"query": {
"bool": {
"must": [
{
"match": {
"item_numbers.type": "catalog"
}
}
]
}
},
"inner_hits": { <--
"name": "sorted_item_numbers",
"sort": {
"item_numbers.value.keyword": "asc"
}
},
"path": "item_numbers"
}
},
"sort": [
{
"item_numbers.value.keyword": {
"order": "asc",
"nested": {
"path": "item_numbers"
}
}
}
]
}
Note that the response will look slightly different from standard hits: the top-level docs are sorted (the doc with the lowest item_numbers.value comes first), and the contents of item_numbers are returned, sorted, under inner_hits.
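The two levels of sorting can be pictured on plain lists. The following is a sketch only, assuming each document is reduced to the list of its item_numbers values, that the inner sort is ascending, and that documents are ordered by their smallest value (the default mode for an ascending nested sort). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class NestedSortSketch {
    // Sorts the values inside each document, then the documents by their smallest value.
    public static List<List<String>> sortDocs(List<List<String>> docs) {
        List<List<String>> sorted = new ArrayList<>();
        for (List<String> values : docs) {
            List<String> copy = new ArrayList<>(values);
            Collections.sort(copy); // inner level: what sorted inner_hits gives you
            sorted.add(copy);
        }
        // Outer level: what the top-level nested sort gives you.
        sorted.sort(Comparator.comparing((List<String> l) -> l.isEmpty() ? "" : l.get(0)));
        return sorted;
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("Ball", "Apple", "Cat"),
                Arrays.asList("Cococola", "Appy"));
        System.out.println(sortDocs(docs)); // [[Apple, Ball, Cat], [Appy, Cococola]]
    }
}
```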
I have a JSON
{
"Id": "xxx",
"Type": "Transaction.Create",
"Payload": {
"result": 2,
"description": "Pending",
"body": {
"redirect": {
"url": "xxx",
"fields": {
"MD": "8a829449620619e80162252adeb66a39"
}
},
"card": {
"expiryMonth": "1",
"expiryYear": "2033"
},
"order": {
"amount": 1
}
}
}
}
And I want to remove the card info of it like this:
{
"Id": "xxx",
"Type": "Transaction.Create",
"Payload": {
"result": 2,
"description": "Pending",
"body": {
"redirect": {
"url": "xxx",
"fields": {
"MD": "8a829449620619e80162252adeb66a39"
}
},
"order": {
"amount": 1
}
}
}
}
How can I do this with Apache velocity?
What works is:
#set($content = $util.urlEncode($input.json('$')))
#set($new = $content.replaceAll("2033","2055"))
Action=SendMessage&MessageBody={"body": "$new","Event-Signature": "$util.urlEncode($input.params('Event-Signature'))"}
This gives me
{
"Id": "xxx",
"Type": "Transaction.Create",
"Payload": {
"result": 2,
"description": "Pending",
"body": {
"redirect": {
"url": "xxx",
"fields": {
"MD": "8a829449620619e80162252adeb66a39"
}
},
"card": {
"expiryMonth": "1",
"expiryYear": "2055"
},
"order": {
"amount": 1
}
}
}
}
Now I want to remove the card part, but this does not work:
#set($content = $util.urlEncode($input.json('$')))
#set($new = $content.delete("$.Payload.body.card"))
Action=SendMessage&MessageBody={"body": "$new","Event-Signature": "$util.urlEncode($input.params('Event-Signature'))"}
What am I doing wrong?
The main goal is to transform the payload with a mapping template in API Gateway for a webhook. The webhook contains too much information, and we want to remove part of the JSON from the POST call.
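Try the following. Note that card sits under Payload.body, and that $content must be the parsed object (for example from $input.path('$')) rather than the URL-encoded string:
#set($content = $input.path('$'))
#set($dummy = $content.Payload.body.remove("card"))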
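Velocity forwards method calls to the underlying Java objects, so a remove call in the template ends up as a remove on a Java map. The sketch below shows the equivalent operation in plain Java, assuming the parsed payload is a nested Map structure; that assumption, the class name and the sample builder are illustrative only. It also shows why the path matters: card is a key of Payload.body, not of Payload.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MapRemoveSketch {
    // Builds a reduced version of the webhook payload from the question.
    public static Map<String, Object> sampleEvent() {
        Map<String, Object> card = new LinkedHashMap<>();
        card.put("expiryMonth", "1");
        card.put("expiryYear", "2033");
        Map<String, Object> order = new LinkedHashMap<>();
        order.put("amount", 1);
        Map<String, Object> body = new LinkedHashMap<>();
        body.put("card", card);
        body.put("order", order);
        Map<String, Object> payload = new LinkedHashMap<>();
        payload.put("result", 2);
        payload.put("body", body);
        Map<String, Object> event = new LinkedHashMap<>();
        event.put("Id", "xxx");
        event.put("Payload", payload);
        return event;
    }

    // Walks down to Payload.body and removes the "card" entry in place.
    @SuppressWarnings("unchecked")
    public static void removeCard(Map<String, Object> event) {
        Map<String, Object> payload = (Map<String, Object>) event.get("Payload");
        Map<String, Object> body = (Map<String, Object>) payload.get("body");
        body.remove("card");
    }

    public static void main(String[] args) {
        Map<String, Object> event = sampleEvent();
        removeCard(event);
        System.out.println(event); // {Id=xxx, Payload={result=2, body={order={amount=1}}}}
    }
}
```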
I do a query on Elasticsearch from Kibana 4.4.1 which looks like this :
{
"size": 0,
"query": {
"filtered": {
"query": {
"query_string": {
"query": "FALK0911622560T",
"analyze_wildcard": true
}
},
"filter": {
"bool": {
"must": [
{
"range": {
"@timestamp": {
"gte": 1438290000000,
"lte": 1440968400000,
"format": "epoch_millis"
}
}
}
],
"must_not": []
}
}
}
},
"aggs": {
"2": {
"date_histogram": {
"field": "@timestamp",
"interval": "1w",
"time_zone": "Europe/Helsinki",
"min_doc_count": 1,
"extended_bounds": {
"min": 1438290000000,
"max": 1440968400000
}
},
"aggs": {
"1": {
"percentiles": {
"field": "Quantity",
"percents": [
50
]
}
}
}
}
}
}
This query returns all the docs with "ProductCode" = "FALK0911622560T" within the given time range.
I tried the same thing with the Elasticsearch Java API using the following code:
BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery().must(QueryBuilders.matchQuery(matchQueryKey,matchQueryValue));
SearchResponse response = client.prepareSearch(indexName)
.setTypes(indexTypeName)
.setQuery(boolQueryBuilder)
.setSize(100)
.addAggregation(AggregationBuilders
.dateHistogram("myHistogram")
.field("@timestamp")
.interval(DateHistogramInterval.WEEK)
.timeZone("Europe/Helsinki")
.minDocCount(1)
.extendedBounds(1438290000000L, 1440968400000L))
.addFields(fieldsOfInterest)
.execute()
.actionGet();
response.getAggregations();
But I get all the documents in the index with "ProductCode" = "FALK0911622560T".
For the given time range, I should have only 5 documents in response.getAggregations(), because I set the interval to a week.
A doc in Elasticsearch looks like this :
{
"_index": "warehouse-550",
"_type": "core2",
"_id": "AVOKCqQ68h4KkDGZvk6b",
"_score": null,
"_source": {
"message": "5,550,67.01,FALK0911622560T,2015-07-31;08:00:00.000\r",
"@version": "1",
"@timestamp": "2015-07-31T06:00:00.000Z",
"path": "D:/Programs/Logstash/x_testingLocally/processed-stocklevels-550-25200931072015.csv",
"host": "EVO385",
"type": "core2",
"Quantity": 5,
"Warehouse": "550",
"Price": 67.01,
"ProductCode": "FALK0911622560T",
"Timestamp": "2015-07-31;08:00:00.000"
},
"fields": {
"@timestamp": [
1438322400000
]
},
"highlight": {
"ProductCode": [
"@kibana-highlighted-field@FALK0911622560T@/kibana-highlighted-field@"
],
"message": [
"5,550,67.01,@kibana-highlighted-field@FALK0911622560T@/kibana-highlighted-field@,2015-07-31;08:00:00.000\r"
]
},
"sort": [
1438322400000
]
}
Please help.
Thank you.
You did not add the range query, so the aggregation runs over every document that matches the product code instead of only those in the time range. Change your boolQueryBuilder to the following:
BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery()
.must(QueryBuilders.matchQuery(matchQueryKey, matchQueryValue))
.must(QueryBuilders.rangeQuery("@timestamp").gte(fromValue).lte(toValue));
You can get the buckets using:
InternalDateHistogram histogram = (InternalDateHistogram) searchResponse.getAggregations().getAsMap().get(aggregation_name);
List bucketList = histogram != null ? histogram.getBuckets() : null;
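The effect of the missing range filter is easy to see in a small stand-alone model of "filter by time range, then bucket by week". This is a sketch with java.time rather than the Elasticsearch client, assuming weekly buckets start on Monday in the given time zone; the class and method names are hypothetical, and the timestamps are the bounds and the sample document from the question plus two invented ones.

```java
import java.time.DayOfWeek;
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.temporal.TemporalAdjusters;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WeeklyHistogramSketch {
    // Counts timestamps per week (keyed by the Monday of that week), ignoring
    // anything outside [from, to], which is the job of the range query.
    public static Map<LocalDate, Integer> weeklyCounts(
            List<Long> epochMillis, long from, long to, ZoneId zone) {
        Map<LocalDate, Integer> buckets = new TreeMap<>();
        for (long ts : epochMillis) {
            if (ts < from || ts > to) {
                continue; // without this check every matching doc ends up in a bucket
            }
            LocalDate day = Instant.ofEpochMilli(ts).atZone(zone).toLocalDate();
            LocalDate weekStart = day.with(TemporalAdjusters.previousOrSame(DayOfWeek.MONDAY));
            buckets.merge(weekStart, 1, Integer::sum);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<Long> timestamps = Arrays.asList(
                1438322400000L,  // 2015-07-31T06:00:00Z, the sample doc
                1438754400000L,  // 2015-08-05T06:00:00Z, invented
                1420070400000L); // 2015-01-01T00:00:00Z, invented, outside the range
        System.out.println(weeklyCounts(timestamps, 1438290000000L, 1440968400000L,
                ZoneId.of("Europe/Helsinki"))); // {2015-07-27=1, 2015-08-03=1}
    }
}
```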