Java Elasticsearch API with Elasticsearch server on WM machine

I run the following query on Elasticsearch from Kibana 4.4.1:
{
"size": 0,
"query": {
"filtered": {
"query": {
"query_string": {
"query": "FALK0911622560T",
"analyze_wildcard": true
}
},
"filter": {
"bool": {
"must": [
{
"range": {
"@timestamp": {
"gte": 1438290000000,
"lte": 1440968400000,
"format": "epoch_millis"
}
}
}
],
"must_not": []
}
}
}
},
"aggs": {
"2": {
"date_histogram": {
"field": "@timestamp",
"interval": "1w",
"time_zone": "Europe/Helsinki",
"min_doc_count": 1,
"extended_bounds": {
"min": 1438290000000,
"max": 1440968400000
}
},
"aggs": {
"1": {
"percentiles": {
"field": "Quantity",
"percents": [
50
]
}
}
}
}
}
}
This query returns all the docs with "ProductCode" = "FALK0911622560T" within the given time range.
I tried to do the same with the Elasticsearch Java API using the following code:
BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery().must(QueryBuilders.matchQuery(matchQueryKey,matchQueryValue));
SearchResponse response = client.prepareSearch(indexName)
.setTypes(indexTypeName)
.setQuery(boolQueryBuilder)
.setSize(100)
.addAggregation(AggregationBuilders
.dateHistogram("myHistogram")
.field("@timestamp")
.interval(DateHistogramInterval.WEEK)
.timeZone("Europe/Helsinki")
.minDocCount(1)
.extendedBounds(1438290000000L, 1440968400000L))
.addFields(fieldsOfInterest)
.execute()
.actionGet();
response.getAggregations();
But I get all the documents in the index with "ProductCode" = "FALK0911622560T", regardless of the time range.
Within the given time range I should get only 5 results in response.getAggregations(), because I set the interval to one week.
A doc in Elasticsearch looks like this:
{
"_index": "warehouse-550",
"_type": "core2",
"_id": "AVOKCqQ68h4KkDGZvk6b",
"_score": null,
"_source": {
"message": "5,550,67.01,FALK0911622560T,2015-07-31;08:00:00.000\r",
"@version": "1",
"@timestamp": "2015-07-31T06:00:00.000Z",
"path": "D:/Programs/Logstash/x_testingLocally/processed-stocklevels-550-25200931072015.csv",
"host": "EVO385",
"type": "core2",
"Quantity": 5,
"Warehouse": "550",
"Price": 67.01,
"ProductCode": "FALK0911622560T",
"Timestamp": "2015-07-31;08:00:00.000"
},
"fields": {
"@timestamp": [
1438322400000
]
},
"highlight": {
"ProductCode": [
"@kibana-highlighted-field@FALK0911622560T@/kibana-highlighted-field@"
],
"message": [
"5,550,67.01,@kibana-highlighted-field@FALK0911622560T@/kibana-highlighted-field@,2015-07-31;08:00:00.000\r"
]
},
"sort": [
1438322400000
]
}
Please help.
Thank you.

You did not add the range query. Change your boolQueryBuilder to the following:
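BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery()
.must(QueryBuilders.matchQuery(matchQueryKey, matchQueryValue))
// fromValue/toValue are the range bounds in epoch millis, e.g. 1438290000000L and 1440968400000L
.must(QueryBuilders.rangeQuery("@timestamp").gte(fromValue).lte(toValue));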
You can get buckets using:
// "myHistogram" is the name given to the dateHistogram aggregation in the question
Histogram histogram = response.getAggregations().get("myHistogram");
List<? extends Histogram.Bucket> bucketList = histogram.getBuckets();
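To see why the missing range clause returns every document, here is a stdlib-only sketch that imitates locally what a range filter followed by a weekly date_histogram with min_doc_count 1 does. It is not Elasticsearch client code: WeeklyHistogramSketch and weeklyBuckets are hypothetical names, and it assumes a 1w bucket starts on Monday 00:00 in the configured time zone.

```java
import java.time.DayOfWeek;
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.temporal.ChronoUnit;
import java.time.temporal.TemporalAdjusters;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

class WeeklyHistogramSketch {
    // Imitates a range filter (gte/lte in epoch millis) followed by a 1w date_histogram
    // with min_doc_count 1. Assumption: a weekly bucket starts Monday 00:00 in "zone".
    static SortedMap<ZonedDateTime, Long> weeklyBuckets(
            List<Long> epochMillis, long gte, long lte, ZoneId zone) {
        SortedMap<ZonedDateTime, Long> buckets = new TreeMap<>();
        for (long ms : epochMillis) {
            // Range clause: without this check every matching document would be bucketed.
            if (ms < gte || ms > lte) {
                continue;
            }
            ZonedDateTime weekStart = ZonedDateTime.ofInstant(Instant.ofEpochMilli(ms), zone)
                    .truncatedTo(ChronoUnit.DAYS)
                    .with(TemporalAdjusters.previousOrSame(DayOfWeek.MONDAY));
            // Buckets exist only for weeks that contain a document (min_doc_count 1).
            buckets.merge(weekStart, 1L, Long::sum);
        }
        return buckets;
    }

    public static void main(String[] args) {
        // The sample doc's timestamp (2015-07-31T06:00Z), one week later, and one in April 2015.
        List<Long> timestamps = List.of(1438322400000L, 1438927200000L, 1430000000000L);
        SortedMap<ZonedDateTime, Long> buckets =
                weeklyBuckets(timestamps, 1438290000000L, 1440968400000L, ZoneId.of("Europe/Helsinki"));
        buckets.forEach((weekStart, count) -> System.out.println(weekStart.toLocalDate() + " -> " + count));
    }
}
```

The April timestamp is dropped by the range check; that check is exactly what the original Java query lacked, so the histogram there was built over all matching documents.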

Related

Searching _source and returning only needed fields using Java

I am using Java to perform queries on Elasticsearch via the ElasticSearchClient. Because large variables are returned, I would like to retrieve only the relevant ones, but the variables in _source are nested.
Below is a sample response (hits from multiple indexes can be returned with the same _source structure):
[
{
"_index": "kn-tas-20200630",
"_type": "_doc",
"_id": "1122334455",
"_score": null,
"_source": {
"variables": [
{
"rawValue": "DEFH",
"name": "MANAGER"
},
{
"rawValue": "ABCD",
"name": "EMPLOYEE"
},
{
"rawValue": "[{\"rowId\":102030,\"rowType\":\"SIM\"}]",
"name": "extData"
}
]
},
"sort": [
1665735632119
]
}
]
I would like to create a query using SearchSourceBuilder to query ES and only retrieve the following:
Get the rawValue by name (I provide Manager, I get "DEFH")
Get the rowType value (I provide extData + rowType, I get "SIM")
Below is my query:
{
"from": 0,
"size": 100,
"query": {
"bool": {
"must": [
{
"terms": {
"prcKey": [
"K-112"
],
"boost": 1.0
}
}
],
"must_not": [
{
"exists": {
"field": "endDate",
"boost": 1.0
}
},
{
"term": {
"personInCharge": {
"value": "ABC",
"boost": 1.0
}
}
}
],
"adjust_pure_negative": true,
"boost": 1.0
}
},
"_source": {
"includes": [
"variables.name",
"variables.rawValue"
],
"excludes": []
},
"sort": [
{
"createTime": {
"order": "desc"
}
}
]
}
How can I fix my query? I tried using nested queries but without any luck.
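Source filtering ("_source" includes/excludes) only selects field paths; it cannot pick array elements by the value of a sibling field, and in the sample the extData rawValue is a JSON string rather than an object, so rowType is not a field Elasticsearch can address. One option is to keep the includes as they are and do the lookup client-side. A minimal stdlib-only sketch, assuming the variables array has already been deserialized into maps with "name" and "rawValue" keys; VariableLookup and its methods are hypothetical, and the regex extraction is a simplification of real JSON parsing.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VariableLookup {
    // rawValue of the first variable whose name matches, ignoring case ("Manager" finds "MANAGER").
    static Optional<String> rawValueByName(List<Map<String, String>> variables, String name) {
        return variables.stream()
                .filter(v -> name.equalsIgnoreCase(v.get("name")))
                .map(v -> v.get("rawValue"))
                .findFirst();
    }

    // Naive extraction of a string-valued field from a JSON string such as
    // [{"rowId":102030,"rowType":"SIM"}]. Illustration only; use a JSON library in practice.
    static Optional<String> jsonStringField(String json, String field) {
        Matcher m = Pattern.compile("\"" + Pattern.quote(field) + "\"\\s*:\\s*\"([^\"]*)\"").matcher(json);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // The variables array from the sample _source in the question.
    static List<Map<String, String>> sample() {
        return List.of(
                Map.of("name", "MANAGER", "rawValue", "DEFH"),
                Map.of("name", "EMPLOYEE", "rawValue", "ABCD"),
                Map.of("name", "extData", "rawValue", "[{\"rowId\":102030,\"rowType\":\"SIM\"}]"));
    }

    public static void main(String[] args) {
        List<Map<String, String>> vars = sample();
        System.out.println(rawValueByName(vars, "Manager").orElse("not found"));
        System.out.println(rawValueByName(vars, "extData")
                .flatMap(raw -> jsonStringField(raw, "rowType"))
                .orElse("not found"));
    }
}
```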

Elasticsearch: Filter the records based on nested field with nested field containing only the filtered object

I am trying to filter records based on a nested field, and I want only the matching objects in that array to be shown as part of the record.
Below is a detailed explanation of my requirement.
I have Elasticsearch data like this:
[{
"basicInfo": {
"requestId": 123
},
"managerInfo": {
"manager": "John"
},
"groupInfo": [
{
"id": "id1",
"name": "abc",
"status": "Approved"
},
{
"id": "id2",
"name": "abc",
"status": "Pending"
}
]
},
{
"basicInfo": {
"requestId": 233
},
"managerInfo": {
"manager": "John Sr"
},
"groupInfo": [
{
"id": "id3",
"name": "abc",
"status": "Pending"
}
]
}
]
I want to filter for records with groupInfo.status = Approved and basicInfo.requestId = 123, with the condition that groupInfo contains only the Approved entries and not the Pending ones. The output I expect is:
{
"took": 23,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 3.0602708,
"hits": [
{
"_index": "my_index",
"_type": "request",
"_id": "123",
"_score": 3.0602708,
"_source": {
"basicInfo": {
"requestId": 123
},
"managerInfo": {
"manager": "John"
},
"groupInfo": [
{
"id": "id1",
"name": "abc",
"status": "Approved"
}
// No id2 here as it is in pending state
]
}
}
]
}
}
But what I actually get is:
{
"took": 23,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 3.0602708,
"hits": [
{
"_index": "my_index",
"_type": "request",
"_id": "123",
"_score": 3.0602708,
"_source": {
"basicInfo": {
"requestId": 123
},
"managerInfo": {
"manager": "John"
},
"groupInfo": [
{
"id": "id1",
"name": "abc",
"status": "Approved"
},
{
"id": "id2",
"name": "abc",
"status": "Pending"
}
]
}
}
]
}
}
This is the query I am using:
{
"query": {
"bool": {
"must": [
{
"match": {
"basicInfo.requestId": "123"
}
},
{
"nested": {
"path": "groupInfo",
"query": {
"bool": {
"must": [
{
"term": {
"groupInfo.status": "Approved"
}
}
]
}
}
}
}
]
}
}
}
So my question is: is what I am expecting even possible? Can we filter the result so that we get only the matching elements of that array?
If yes, how can we do it?
Thanks in advance.
Maybe you are looking for Inner Hits.
In many cases, it’s very useful to know which inner nested objects (in the case of nested) or children/parent documents (in the case of parent/child) caused certain information to be returned. The inner hits feature can be used for this. This feature returns per search hit in the search response additional nested hits that caused a search hit to match in a different scope.
{
"query": {
"bool": {
"must": [
{
"match": {
"basicInfo.requestId": "123"
}
},
{
"nested": {
"path": "groupInfo",
"query": {
"bool": {
"must": [
{
"term": {
"groupInfo.status": "Approved"
}
}
]
}
},
"inner_hits":{}
}
}
]
}
}
}
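Note that inner_hits does not rewrite _source: each hit still carries the full groupInfo array, and the nested objects that matched are returned separately in the hit's inner_hits section. The stdlib-only sketch below imitates that selection on the two sample documents, purely to illustrate which objects the nested query with inner_hits reports; it is not Elasticsearch code, and InnerHitsSketch, Group, Request and search are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class InnerHitsSketch {
    // One element of the nested groupInfo array.
    static final class Group {
        final String id;
        final String name;
        final String status;

        Group(String id, String name, String status) {
            this.id = id;
            this.name = name;
            this.status = status;
        }

        @Override
        public String toString() {
            return id + "/" + name + "/" + status;
        }
    }

    // A request document, reduced to the fields the query touches.
    static final class Request {
        final int requestId;
        final List<Group> groupInfo;

        Request(int requestId, List<Group> groupInfo) {
            this.requestId = requestId;
            this.groupInfo = groupInfo;
        }
    }

    // Imitates bool.must = [match basicInfo.requestId, nested(term groupInfo.status)] with
    // inner_hits: maps each matching requestId to the nested objects that matched.
    static Map<Integer, List<Group>> search(List<Request> docs, int requestId, String status) {
        Map<Integer, List<Group>> hits = new LinkedHashMap<>();
        for (Request doc : docs) {
            if (doc.requestId != requestId) {
                continue;
            }
            List<Group> innerHits = new ArrayList<>();
            for (Group g : doc.groupInfo) {
                if (g.status.equals(status)) {
                    innerHits.add(g);
                }
            }
            // The nested clause matches only if at least one nested object matched.
            if (!innerHits.isEmpty()) {
                hits.put(doc.requestId, innerHits);
            }
        }
        return hits;
    }

    // The two documents from the question.
    static List<Request> sample() {
        return List.of(
                new Request(123, List.of(new Group("id1", "abc", "Approved"), new Group("id2", "abc", "Pending"))),
                new Request(233, List.of(new Group("id3", "abc", "Pending"))));
    }

    public static void main(String[] args) {
        System.out.println(search(sample(), 123, "Approved"));
    }
}
```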

How to use a terms query in Elasticsearch 7.x in this case

Elasticsearch version is 7.x.
Here is some nested data:
data1:
[{name:"tom"},{name:"jack"}]
data2:
[{name:"tom"},{name:"rose"}]
data3:
[{name:"tom"},{name:"rose3"}]
...
dataN:
[{name:"tom"},{name:"roseN"}]
When I use a terms query, I only want to search for tom and jack, but I don't want to include rose...roseN.
query:{
terms:{["tom","jack"]}
}
This query does not work.
Adding a working example
Index Data:
PUT /_doc/1
{
"names": [
{
"name": "tom"
},
{
"name": "jack"
}
]
}
PUT /_doc/2
{
"names": [
{
"name": "tom"
},
{
"name": "rose"
}
]
}
Search Query:
{
"query": {
"bool": {
"must": {
"terms": {
"names.name": [
"tom",
"jack"
]
}
},
"must_not": {
"match": {
"names.name": "rose"
}
}
}
}
}
Search Result:
"hits": [
{
"_index": "65838516",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"_source": {
"names": [
{
"name": "tom"
},
{
"name": "jack"
}
]
}
}
]
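A terms query on an array field is satisfied as soon as any element equals any of the listed values, which is why data2 ... dataN match on "tom" alone. The stdlib-only sketch below imitates that any-of behaviour, together with the answer's must_not clause; it is not Elasticsearch code, and TermsSketch and its methods are hypothetical. It also shows a limit of the answer as written: must_not on "rose" excludes data2 but not data3, because the standard analyzer indexes "rose3" as a single term, so excluding rose3 ... roseN would need a different clause.

```java
import java.util.List;
import java.util.Set;

class TermsSketch {
    // A terms query on an array field matches when ANY value equals ANY listed term.
    static boolean termsAny(List<String> names, Set<String> terms) {
        return names.stream().anyMatch(terms::contains);
    }

    // The answer's query: must terms [tom, jack] and must_not match "rose".
    // Assumes "rose3" is indexed as one term, so it is not equal to "rose".
    static boolean matchesAnswerQuery(List<String> names) {
        return termsAny(names, Set.of("tom", "jack")) && !names.contains("rose");
    }

    // Stricter "every name is one of the allowed values" check, closer to what the question asks.
    static boolean allNamesIn(List<String> names, Set<String> allowed) {
        return names.stream().allMatch(allowed::contains);
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(
                List.of("tom", "jack"), List.of("tom", "rose"), List.of("tom", "rose3"));
        for (List<String> names : docs) {
            System.out.println(names + " answerQuery=" + matchesAnswerQuery(names)
                    + " allIn=" + allNamesIn(names, Set.of("tom", "jack")));
        }
    }
}
```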

Write an Elasticsearch filter search using the ES Java API v7.x

I tried to write a filter query using the Elasticsearch Java API version 7.6,
but there is no good documentation on how to write a filter-context search.
Does anyone know how to write the following with the Java API?
GET /_search
{
"query": {
"bool": {
"must": [
{ "match": { "title": "Search" }},
{ "match": { "content": "Elasticsearch" }}
],
"filter": [
{ "term": { "status": "published" }},
{ "range": { "publish_date": { "gte": "2015-01-01" }}}
]
}
}
}
Try the following
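import java.util.List;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.builder.SearchSourceBuilder;

BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
// Query context: these clauses must match and contribute to the score
List<QueryBuilder> mustClauses = boolQueryBuilder.must();
mustClauses.add(QueryBuilders.matchQuery("title", "Search"));
mustClauses.add(QueryBuilders.matchQuery("content", "Elasticsearch"));
// Filter context: these clauses must match but do not affect the score
List<QueryBuilder> filterClauses = boolQueryBuilder.filter();
filterClauses.add(QueryBuilders.termQuery("status", "published"));
filterClauses.add(QueryBuilders.rangeQuery("publish_date").gte("2015-01-01"));
SearchRequest searchRequest = new SearchRequest();
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(boolQueryBuilder);
searchRequest.source(searchSourceBuilder);
System.out.println(searchRequest.toString());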
The resulting query is
{
"query": {
"bool": {
"must": [
{
"match": {
"title": {
"query": "Search",
"operator": "OR",
"prefix_length": 0,
"max_expansions": 50,
"fuzzy_transpositions": true,
"lenient": false,
"zero_terms_query": "NONE",
"auto_generate_synonyms_phrase_query": true,
"boost": 1.0
}
}
},
{
"match": {
"content": {
"query": "Elasticsearch",
"operator": "OR",
"prefix_length": 0,
"max_expansions": 50,
"fuzzy_transpositions": true,
"lenient": false,
"zero_terms_query": "NONE",
"auto_generate_synonyms_phrase_query": true,
"boost": 1.0
}
}
}
],
"filter": [
{
"term": {
"status": {
"value": "published",
"boost": 1.0
}
}
},
{
"range": {
"publish_date": {
"from": "2015-01-01",
"to": null,
"include_lower": true,
"include_upper": true,
"boost": 1.0
}
}
}
],
"adjust_pure_negative": true,
"boost": 1.0
}
}
}

Elasticsearch Java API: search does not match without (*) wildcards in my input query

I am fetching documents from Elasticsearch using the Java API. My documents contain the following code, and I am trying to search for it with the patterns below.
Code: MS-VMA1615-0D
Input: *VMA1615-0* -- I get the result (MS-VMA1615-0D).
Input: MS-VMA1615-0D -- I get the result (MS-VMA1615-0D).
Input: *VMA1615-0 -- I get the result (MS-VMA1615-0D).
Input: *VMA*-0* -- I get the result (MS-VMA1615-0D).
But if I give input like the following, I get no results:
Input: VMA1615 -- no results.
I expect it to return the code MS-VMA1615-0D.
Here is the Java code I am using:
private final String INDEX = "products";
private final String TYPE = "doc";
SearchRequest searchRequest = new SearchRequest(INDEX);
searchRequest.types(TYPE);
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
// query_string query on the "code" field, using the user's input as-is
QueryStringQueryBuilder qsQueryBuilder = new QueryStringQueryBuilder(code);
qsQueryBuilder.defaultField("code");
searchSourceBuilder.query(qsQueryBuilder);
searchSourceBuilder.size(50);
searchRequest.source(searchSourceBuilder);
SearchResponse searchResponse;
try {
searchResponse = SearchEngineClient.getInstance().search(searchRequest);
} catch (IOException e) {
// Rethrow instead of swallowing the error; otherwise searchResponse would be null below
throw new UncheckedIOException(e);
}
Item item = null;
SearchHit[] searchHits = searchResponse.getHits().getHits();
Here are my mapping details:
PUT products
{
"settings": {
"analysis": {
"analyzer": {
"custom_analyzer": {
"type": "custom",
"tokenizer": "whitespace",
"char_filter": [
"html_strip"
],
"filter": [
"lowercase",
"asciifolding"
]
}
}
}
},
"mappings": {
"doc": {
"properties": {
"code": {
"type": "text",
"analyzer": "custom_analyzer"
}
}
}
}
}
To do what you're looking for, you need to change the tokenizer. You are currently using the whitespace tokenizer, which keeps "MS-VMA1615-0D" as a single token; replace it with the pattern tokenizer.
Your new mapping should look like this:
PUT products
{
"settings": {
"analysis": {
"analyzer": {
"custom_analyzer": {
"type": "custom",
"tokenizer": "pattern",
"char_filter": [
"html_strip"
],
"filter": [
"lowercase",
"asciifolding"
]
}
}
}
},
"mappings": {
"doc": {
"properties": {
"code": {
"type": "text",
"analyzer": "custom_analyzer"
}
}
}
}
}
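After changing the mapping (and reindexing your data), a query for VMA1615 will return MS-VMA1615-0D.
This works because the pattern tokenizer (default pattern \W+, i.e. it splits on non-word characters) tokenizes "MS-VMA1615-0D" into "MS", "VMA1615" and "0D", which the lowercase filter then lowercases at index time. So whenever your query contains any of these terms, the document matches.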
POST _analyze
{
"tokenizer": "pattern",
"text": "MS-VMA1615-0D"
}
will return:
{
"tokens": [
{
"token": "MS",
"start_offset": 0,
"end_offset": 2,
"type": "word",
"position": 0
},
{
"token": "VMA1615",
"start_offset": 3,
"end_offset": 10,
"type": "word",
"position": 1
},
{
"token": "0D",
"start_offset": 11,
"end_offset": 13,
"type": "word",
"position": 2
}
]
}
Based on your comment:
It is not how Elasticsearch works. Elasticsearch stores terms and the documents they belong to in an inverted index, so which terms get stored depends on how the text is tokenized. For example, a whitespace tokenizer splits the text "Hi there I am a technocrat" into ["Hi", "there", "I", "am", "a", "technocrat"]. If you then query for "technocrat", you get the document back because the inverted index associates that term with it. In your case, "VMA" is not stored as a term.
To make "VMA" a term of its own, use the mapping below, whose tokenizer splits on hyphens and on digits:
PUT products
{
"settings": {
"analysis": {
"analyzer": {
"custom_analyzer": {
"type": "custom",
"tokenizer": "my_pattern_tokenizer",
"char_filter": [
"html_strip"
],
"filter": [
"lowercase",
"asciifolding"
]
}
},
"tokenizer": {
"my_pattern_tokenizer": {
"type": "pattern",
"pattern": "-|\\d"
}
}
}
},
"mappings": {
"doc": {
"properties": {
"code": {
"type": "text",
"analyzer": "custom_analyzer"
}
}
}
}
}
To check:
POST products/_analyze
{
"tokenizer": "my_pattern_tokenizer",
"text": "MS-VMA1615-0D"
}
will produce:
{
"tokens": [
{
"token": "MS",
"start_offset": 0,
"end_offset": 2,
"type": "word",
"position": 0
},
{
"token": "VMA",
"start_offset": 3,
"end_offset": 6,
"type": "word",
"position": 1
},
{
"token": "D",
"start_offset": 12,
"end_offset": 13,
"type": "word",
"position": 2
}
]
}
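To compare the three tokenizations side by side, here is a rough stdlib-only imitation of the analysis chain: split on a regex the way the pattern tokenizer does, drop empty tokens, then lowercase as the lowercase filter would. It is an approximation, not Elasticsearch code: html_strip and asciifolding are ignored, the whitespace tokenizer is approximated by splitting on \s+, and TokenizerSketch, analyze and matches are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class TokenizerSketch {
    // Split text on the given regex, discard empty pieces, and lowercase each token.
    static List<String> analyze(String text, String splitRegex) {
        return Arrays.stream(Pattern.compile(splitRegex).split(text))
                .filter(token -> !token.isEmpty())
                .map(token -> token.toLowerCase(Locale.ROOT))
                .collect(Collectors.toList());
    }

    // Rough stand-in for query-time lookup: the query is analyzed the same way, and the
    // document matches if any query term is among the indexed terms.
    static boolean matches(String indexedText, String queryText, String splitRegex) {
        List<String> indexed = analyze(indexedText, splitRegex);
        return analyze(queryText, splitRegex).stream().anyMatch(indexed::contains);
    }

    public static void main(String[] args) {
        String code = "MS-VMA1615-0D";
        System.out.println(analyze(code, "\\s+"));  // whitespace tokenizer (approximation)
        System.out.println(analyze(code, "\\W+"));  // pattern tokenizer, default pattern
        System.out.println(analyze(code, "-|\\d")); // my_pattern_tokenizer from the answer
    }
}
```

With the whitespace split the only indexed term is "ms-vma1615-0d", so a plain VMA1615 query finds nothing; the default pattern yields "vma1615", and the custom pattern yields "vma".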
