ElasticSearch post_filter Java API issue - java

I am trying to use the ElasticSearch Java APIs to do a query-string-query and then restrict the results by a date range based on a field in the result set. When I test it out using Kibana I get 77 hits but when I try to do the same thing using the Java APIs I get '0' hits.
Here is the query as written in Kibana:
GET /enyo_cad/_search
{
"from": 0, "size": 20,
"query": {
"query_string": {
"query": "smith",
"lenient": true
}
},
"post_filter": {
"range": {
"cadIncident.dateTimeReceived": {
"gte": "2014-01-01T00:00:00",
"lte": "2016-01-01T00:00:00"
}
}
},
"highlight": {
"fields": {
"*": {}
},
"require_field_match": false
},
"sort": [
{ "cadIncident.dateTimeReceived": { "order": "desc" }},
{ "_score": { "order": "desc" }}
]
}
And here is my Java code:
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(QueryBuilders.queryStringQuery(searchString)
.lenient(Boolean.TRUE)
.analyzeWildcard(Boolean.TRUE)
);
searchSourceBuilder.from(fromHit);
searchSourceBuilder.size(pageSize);
if (startDate != null && endDate != null) {
String dtName = getDateTimeAttributeName(appList, radarManager);
searchSourceBuilder.postFilter(QueryBuilders.rangeQuery(dtName)
.gte(startDate)
.lte(endDate));
}
// Sort by date only if exactly one application module is selected; otherwise keep the default relevance-score sorting.
if (dateSort && appList.size() == 1) {
searchSourceBuilder.sort(getDateTimeAttributeName(appList, radarManager), SortOrder.DESC);
searchSourceBuilder.sort("_score", SortOrder.DESC);
}
HighlightBuilder highlightBuilder = new HighlightBuilder();
HighlightBuilder.Field highlightFields = new HighlightBuilder.Field("*");
highlightFields.highlighterType("unified"); //highlighter type unified is default https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-highlighting.html
highlightBuilder.field(highlightFields);
searchSourceBuilder.highlighter(highlightBuilder);
searchRequest.source(searchSourceBuilder);
searchRequest.indices(indexFilter);
searchResponse = restClient.search(searchRequest, requestOptions);
SearchHits hits = searchResponse.getHits();
Any help would be greatly appreciated..

Found the solution. Sorry for the post; it was just a missing pair of parentheses. The condition should have been:
if (dateSort && (appList.size() == 1)) {
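As a side note on the condition itself: in Java, `==` has higher precedence than `&&`, so the added parentheses make the grouping explicit rather than change how the expression is evaluated. A minimal sketch to check this, where the class and method names are illustrative and not part of the original code:

```java
// Minimal check that == binds tighter than && in Java.
// Parameter names mirror the snippet above; the tested values are made up.
final class PrecedenceCheck {
    static boolean withoutParens(boolean dateSort, int appCount) {
        return dateSort && appCount == 1;
    }

    static boolean withParens(boolean dateSort, int appCount) {
        return dateSort && (appCount == 1);
    }

    public static void main(String[] args) {
        boolean[] flags = {true, false};
        int[] sizes = {0, 1, 2};
        for (boolean dateSort : flags) {
            for (int size : sizes) {
                // Print both results side by side for every combination.
                System.out.println(dateSort + ", " + size + " -> "
                        + withoutParens(dateSort, size) + " / " + withParens(dateSort, size));
            }
        }
    }
}
```

Both methods return the same value for every combination, so the explicit parentheses are mainly a readability aid.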

Related

ElasticSearch Java API nested query with inner_hits error

I have a problem with the ElasticSearch Java API. I use version 5.1.2.
The code is pasted below. I need to optimize the search mechanism by limiting inner_hits to the object id only. I used InnerHitBuilder with .setFetchSourceContext(FetchSourceContext.DO_NOT_FETCH_SOURCE) and .addDocValueField("item.id"). The generated query has an error: there is an "ignore_unmapped" attribute inside the "inner_hits" node.
..."inner_hits": {
"name": "itemTerms",
"ignore_unmapped": false,
"from": 0,
"size": 2147483647,
"version": false,
"explain": false,
"track_scores": false,
"_source": false,
"docvalue_fields": ["item.id"]
}...
Executing this query results in an error:
{
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "[inner_hits] unknown field [ignore_unmapped], parser not found"
}
],
"type": "illegal_argument_exception",
"reason": "[inner_hits] unknown field [ignore_unmapped], parser not found"
},
"status": 400
}
When I manually remove that attribute from the query, everything runs smoothly.
protected BoolQueryBuilder itemTermQuery(FileTerms terms, boolean withInners) {
BoolQueryBuilder termsQuery = QueryBuilders.boolQuery();
for (String term : FileTerms.terms()) {
if (terms.term(term).isEmpty())
continue;
Set<String> fns = terms.term(term).stream().
map(x -> x.getTerm())
.filter(y -> !y.isEmpty())
.collect(Collectors.toSet());
if (!fns.isEmpty())
termsQuery = termsQuery.must(
QueryBuilders.termsQuery("item.terms." + term + ".term", fns));
}
QueryBuilder query = terms.notEmpty() ? termsQuery : QueryBuilders.matchAllQuery();
TermsQueryBuilder discontinuedQuery = QueryBuilders.termsQuery("item.terms." + FileTerms.Terms.USAGE_IS + ".term",
new FileTerm("Discontinued", "", "", "", "").getTerm());
FunctionScoreQueryBuilder.FilterFunctionBuilder[] functionBuilders = {
new FunctionScoreQueryBuilder.FilterFunctionBuilder(query, ScoreFunctionBuilders.weightFactorFunction(1)),
new FunctionScoreQueryBuilder.FilterFunctionBuilder(discontinuedQuery, ScoreFunctionBuilders.weightFactorFunction(-1000))
};
FunctionScoreQueryBuilder functionScoreQuery = functionScoreQuery(functionBuilders);
NestedQueryBuilder nested = QueryBuilders.nestedQuery("item", functionScoreQuery.query(), ScoreMode.None);
if (withInners) nested = nested.innerHit(new InnerHitBuilder()
.setFetchSourceContext(FetchSourceContext.DO_NOT_FETCH_SOURCE)
.addDocValueField("item.id")
.setSize(Integer.MAX_VALUE)
.setName("itemTerms"));
return QueryBuilders.boolQuery().must(nested);
}
How can I build the query without that unnecessary attribute inside the "inner_hits" node?
EDIT:
I use the 5.1.2 library and a 5.1.2 Elasticsearch server.
<dependency>
<groupId>org.elasticsearch.client</groupId>
<artifactId>transport</artifactId>
<version>5.1.2</version>
</dependency>
"version": {
"number": "5.1.2",
"build_hash": "c8c4c16",
"build_date": "2017-01-11T20:18:39.146Z",
"build_snapshot": false,
"lucene_version": "6.3.0"
},

java : search for substring in elasticsearch

I'm trying to search for substrings in Elasticsearch, but what I've learned and coded so far doesn't match substrings the way I want.
Here's what I've coded:
BoolQueryBuilder query = new BoolQueryBuilder();
query.must(new QueryStringQueryBuilder("tagName : *"+tagName+"*"));
SearchResponse response = esclient.prepareSearch(index).setTypes(type)
.setQuery(query)
.execute().actionGet();
SearchHit[] hits = response.getHits().getHits();
for (SearchHit hit : hits) {
Map map = hit.getSource();
list.add((String) map.get("tagName"));
}
list = list.stream().distinct().collect(Collectors.toList());
for (int i = 0; i < list.size(); i++) {
jsonArrayBuilder.add((String) list.get(i));
}
What I'm trying to implement: if any part of the given tag name matches, the tag should be listed.
For example, if I'm looking for a tag named "social_security_number" and I type "social security", I would like it to be listed.
What actually happens is that if I leave out the underscore, the tag is not listed.
Is this possible? Should I modify this code to search that way?
Here is my index structure :
POST arempris/emptagnames
{
"mappings" : {
"emptags":{
"properties": {
"employeeid": {
"type":"integer"
},
"tagName": {
"type": "text",
"fielddata": true,
"analyzer": "lowercase_keyword",
"search_analyzer": "lowercase_keyword"
}
}
}
}
}
I would greatly appreciate your help. Thanks a lot in advance.
The analyzer that you have set does not tokenize anything, so the whole tag, underscores included, is indexed as a single term, and a space in the search input cannot match it. A custom analyzer that splits on whitespace, underscores, and any other delimiter you find useful is a good solution. The request below will work, but check carefully what the analyzer does and read the documentation for every part you don't understand.
PUT stackoverflow
{
"settings": {
"analysis": {
"analyzer": {
"customanalyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"standard",
"generatewordparts"
]
}
},
"filter": {
"generatewordparts": {
"type": "word_delimiter",
"split_on_numerics": false,
"split_on_case_change": false,
"generate_word_parts": true,
"generate_number_parts": false,
"stem_english_possessive": false,
"catenate_all": false
}
}
}
},
"mappings": {
"emptags": {
"properties": {
"employeeid": {
"type": "integer"
},
"tagName": {
"type": "text",
"fielddata": true,
"analyzer": "customanalyzer",
"search_analyzer": "customanalyzer"
}
}
}
}
}
PUT stackoverflow/emptags/1
{
"employeeid": 1,
"tagName": "social_security_number"
}
GET stackoverflow/_analyze
{
"analyzer" : "customanalyzer",
"text" : "social_security_number123"
}
GET stackoverflow/_search
{
"query": {
"query_string": {
"default_field": "tagName",
"query": "*curi*"
}
}
}
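To see why the wildcard query above can now match, here is a rough plain-Java imitation of the two analysis behaviours. It is only a sketch: it is not the Lucene implementation, it assumes that "lowercase_keyword" means a keyword tokenizer plus lowercasing, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Rough imitation of the two analyzers; NOT the real Lucene analysis chain.
final class AnalyzerSketch {
    // Assumed behaviour of a keyword tokenizer + lowercase filter: one single token.
    static List<String> keywordLowercase(String text) {
        return Collections.singletonList(text.toLowerCase(Locale.ROOT));
    }

    // Imitates splitting on whitespace and underscores, then lowercasing each part.
    static List<String> splitOnDelimiters(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[\\s_]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Mimics a *fragment* wildcard applied to each indexed token.
    static boolean anyTokenContains(List<String> tokens, String fragment) {
        for (String token : tokens) {
            if (token.contains(fragment)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(keywordLowercase("social_security_number"));
        System.out.println(splitOnDelimiters("social_security_number"));
        System.out.println(anyTokenContains(splitOnDelimiters("social_security_number"), "curi"));
    }
}
```

With the split version, each word is its own term, so a search for "security" or "*curi*" has a term to match. The _analyze request above shows what the real analyzer produces.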
Another solution would be to normalize your input and replace any symbol that you want to treat as whitespace (e.g. an underscore) with a space.
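A minimal sketch of that normalization step in plain Java, assuming underscores and runs of whitespace are the only separators you care about. The class and method names are hypothetical, and the same function would have to be applied both to the stored tag names and to the user's search input.

```java
import java.util.Locale;

// Hypothetical helper: normalizes a tag name or search input before indexing/searching.
final class TagInputNormalizer {
    // Trims, lowercases, and collapses underscores/whitespace runs into one space.
    static String normalize(String input) {
        return input.trim()
                .toLowerCase(Locale.ROOT)
                .replaceAll("[_\\s]+", " ");
    }

    public static void main(String[] args) {
        System.out.println(normalize("social_security_number"));
        System.out.println(normalize("  Social  Security "));
    }
}
```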
Read more here:
http://nocf-www.elastic.co/guide/en/elasticsearch/reference/current/analysis-word-delimiter-tokenfilter.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-normalizers.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-tokenizers.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-analyze.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-custom-analyzer.html

Issue in extracting the sum aggregation in ElasticSearch multi field group by using JavaAPI

Using ElasticSearch 5.2, a group by is being done similar to
select city,institutionId, SUM(appOpenCount) from XYZ where ( time > 123 && appOpenCount > 0 ) group by city, institutionId.
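For reference, the grouping that this statement describes can be sketched in plain Java with nested collectors. This only illustrates the intended result shape (city, then institution, then sum); the Row class and the sample values are made up, and nothing here calls Elasticsearch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class GroupBySketch {
    // Hypothetical in-memory row carrying the four mapped fields.
    static final class Row {
        final String city;
        final String institutionId;
        final int appOpenCount;
        final long time;

        Row(String city, String institutionId, int appOpenCount, long time) {
            this.city = city;
            this.institutionId = institutionId;
            this.appOpenCount = appOpenCount;
            this.time = time;
        }
    }

    // WHERE time > minTime AND appOpenCount > 0 GROUP BY city, institutionId, summing appOpenCount.
    static Map<String, Map<String, Integer>> sumByCityAndInstitution(List<Row> rows, long minTime) {
        return rows.stream()
                .filter(r -> r.time > minTime && r.appOpenCount > 0)
                .collect(Collectors.groupingBy(
                        (Row r) -> r.city,
                        Collectors.groupingBy(
                                (Row r) -> r.institutionId,
                                Collectors.summingInt((Row r) -> r.appOpenCount))));
    }

    // Made-up sample data for illustration only.
    static List<Row> sampleRows() {
        return Arrays.asList(
                new Row("city-1", "inst-1", 7, 200L),
                new Row("city-1", "inst-1", 8, 300L),
                new Row("city-1", "inst-2", 4, 250L),
                new Row("city-1", "inst-1", 9, 100L),  // filtered out: time is not > 123
                new Row("city-2", "inst-3", 0, 400L)); // filtered out: appOpenCount is not > 0
    }

    public static void main(String[] args) {
        System.out.println(sumByCityAndInstitution(sampleRows(), 123L));
    }
}
```

The nesting of the collectors mirrors the nesting needed in the aggregation request: the sum sits inside the institution grouping, which sits inside the city grouping.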
I have it working with curl, but when I convert the same request to the Java API I am missing something, and I do not get the final sum aggregation.
I have a type temp_type with the mapping given below.
{
"temp_index" : {
"mappings" : {
"temp_type" : {
"properties" : {
"appOpenCount" : {
"type" : "integer"
},
"city" : {
"type" : "keyword"
},
"institutionId" : {
"type" : "keyword"
},
"time" : {
"type" : "long"
}
}
}
}
}
}
My aggregation call (curl -XGET) looks like this:
curl -XGET "http://localhost:9200/temp_index/temp_type/_search?pretty" -d'
{
"size":0,
"_source":false,
"from" : 0,
"query": {
"bool": {
"must": [
{"range": { "time": { "gte": 1513744603000 } } },
{ "range": { "appOpenCount": { "gt": 0 } } }
]
}
},
"aggregations": {
"city-aggs": {
"terms": { "field": "city"},
"aggregations": {
"intitution-agg": {
"terms": { "field": "institutionId" },
"aggregations": {
"appOpenCount": { "sum": { "field": "appOpenCount" }}}
}
}
}
}
}'
The response is perfect (the aggregated number makes sense mathematically):
{
"took" : 57,
"timed_out" : false,
"_shards" : { ... },
"hits" : {... },
"aggregations" : {
"city-aggs" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "city-1",
"doc_count" : 25,
"intitution-agg" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "inst-1",
"doc_count" : 5,
"appOpenCount" : {
"value" : 15.0
}
}
]
}
}
]
}
}
}
Using this as a template, I converted it to a Java API call. I am able to execute it and access the city-aggs and institution-agg keys, but I am not sure how to access the appOpenCount aggregation; I get null for the Sum aggregation.
// bool query
BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
List<QueryBuilder> mustQueries = boolQueryBuilder.must();
mustQueries.add(QueryBuilders.rangeQuery("time").gte(startTime));
mustQueries.add(QueryBuilders.rangeQuery("appOpenCount").gt(0));
queryBuilder = boolQueryBuilder;
// aggregationbuilder
AggregationBuilder aggregationBuilder = null;
TermsAggregationBuilder cityAggs = AggregationBuilders.terms("city-aggs").field("city");
TermsAggregationBuilder institutionAggs = AggregationBuilders.terms(
"institution-agg").field("institutionId");
SumAggregationBuilder fieldAggBuilder = AggregationBuilders.sum("appOpenCount").field("appOpenCount");
aggregationBuilder = cityAggs.subAggregation(institutionAggs).subAggregation(fieldAggBuilder);
// search call
SearchResponse searchResponse = client.prepareSearch(indexName)
.setTypes(typeName)
.setQuery(queryBuilder)
.addAggregation(aggregationBuilder)
.setFrom(0)
.setSize(0)
.execute().actionGet();
// Iterate the searchResponse
Terms cityAggsTerms = searchResponse.getAggregations().get("city-aggs");
List<Terms.Bucket> mainCityBuckets = cityAggsTerms.getBuckets();
for (Terms.Bucket mainCityBucket : mainCityBuckets) {
String cityName = mainCityBucket.getKeyAsString();
LOGGER.info("CityName : " + cityName); // all good
Terms institutionTerms = mainCityBucket.getAggregations().get("institution-agg");
List<Terms.Bucket> institutionBuckets = institutionTerms.getBuckets();
for (Terms.Bucket institutionBucket : institutionBuckets) {
String institutionName = institutionBucket.getKeyAsString();
LOGGER.info("InstitutionName : " + institutionName ); // all good
Sum appOpenCountSum = institutionBucket.getAggregations().get("appOpenCount");
if(appOpenCountSum != null) {
double appOpenCount = appOpenCountSum.getValue();
LOGGER.info("InstitutionName : " + institutionName +
" and appOpenCount is " + appOpenCount);
} else {
LOGGER.info("appOpenCountSum is null");
}
} // institution for
}// city for
How can I access the value of the appOpenCount aggregation? I am hitting the case where my "appOpenCountSum" variable is null. I am able to access city-aggs and institution-agg and get proper values, but I am not sure how to access the appOpenCount aggregation inside a Terms.Bucket. Any help would be appreciated.
I followed the example provided in the Elasticsearch docs for this:
https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/_metrics_aggregations.html#java-aggs-metrics-sum
I have given an in-depth breakdown and hopefully it helps others too.
EDIT: The issue was the way I was building the aggregation query in Java. The fieldAggBuilder should be added to institutionAggs, not chained onto cityAggs as I had done previously. The corrected code is below.
// aggregationbuilder
AggregationBuilder aggregationBuilder = null;
TermsAggregationBuilder cityAggs = AggregationBuilders.terms("city-aggs").field("city");
TermsAggregationBuilder institutionAggs = AggregationBuilders.terms(
"institution-agg").field("institutionId");
SumAggregationBuilder fieldAggBuilder =
AggregationBuilders.sum("appOpenCount").field("appOpenCount");
institutionAggs.subAggregation(fieldAggBuilder); // this was missing previously
aggregationBuilder = cityAggs.subAggregation(institutionAggs);
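The reason the chained form goes wrong is that subAggregation returns the builder it was called on, so a second chained call attaches a sibling to the parent instead of a child to the first sub-aggregation. The sketch below reproduces that with a tiny stand-in class. Agg is hypothetical and is not the Elasticsearch AggregationBuilder; it only imitates the return-this behaviour.

```java
import java.util.ArrayList;
import java.util.List;

final class AggNestingSketch {
    // Hypothetical stand-in for an aggregation builder; not the Elasticsearch class.
    static final class Agg {
        final String name;
        final List<Agg> children = new ArrayList<>();

        Agg(String name) {
            this.name = name;
        }

        // Returns the parent (this), which is why chained calls add siblings.
        Agg subAggregation(Agg child) {
            children.add(child);
            return this;
        }

        // Renders the tree as name[child, child[...]].
        String render() {
            StringBuilder sb = new StringBuilder(name);
            if (!children.isEmpty()) {
                sb.append('[');
                for (int i = 0; i < children.size(); i++) {
                    if (i > 0) {
                        sb.append(", ");
                    }
                    sb.append(children.get(i).render());
                }
                sb.append(']');
            }
            return sb.toString();
        }
    }

    // Mirrors the original code: both calls are applied to city-aggs.
    static String chained() {
        return new Agg("city-aggs")
                .subAggregation(new Agg("institution-agg"))
                .subAggregation(new Agg("appOpenCount"))
                .render();
    }

    // Mirrors the corrected code: the sum is attached to institution-agg first.
    static String nested() {
        Agg institution = new Agg("institution-agg");
        institution.subAggregation(new Agg("appOpenCount"));
        return new Agg("city-aggs").subAggregation(institution).render();
    }

    public static void main(String[] args) {
        System.out.println(chained()); // city-aggs[institution-agg, appOpenCount]
        System.out.println(nested());  // city-aggs[institution-agg[appOpenCount]]
    }
}
```

In the chained form the sum ends up next to institution-agg inside each city bucket, which is why looking it up inside an institution bucket returned null.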

How to create elasticsearch aggregation using Java API on scala

I am new to programming and I want to get the sum of power used in a month from data stored in Elasticsearch. I used Sense and got the value, but I am finding it hard to do the same with the Java API in Scala. This is what I did:
POST /myIndex/myType/_search?search_type=dfs_query_then_fetch
{
"aggs": {
"duration": {
"date_histogram": {
"field": "Day",
"interval": "month",
"format": "yyyy-MM-dd"},
"aggs": {
"Power_total": {
"sum": {
"field": "myField"
}
}
}
}
}
}
The result was:
"aggregations": {
"duration": {
"buckets": [
{
"key_as_string": "2017-01-01",
"key": 1480550400000,
"doc_count": 619,
"myField": {
"value": 5218.066633789334
}
}
The Scala code is this:
val matchquery = QueryBuilders.matchQuery("ID", configurate)
val queryK = QueryBuilders.matchQuery("ID", configurate)
val filterA = QueryBuilders.rangeQuery("Day").gte("2017-01-02T00:00:05.383+0100").lte("2017-01-13T00:00:05.383+0100")
val query = QueryBuilders.filteredQuery(queryK, filterA)
val agg = AggregationBuilders.dateHistogram("duration")
.field("Day")
.interval(DateHistogramInterval.MONTH)
.minDocCount(0)
.extendedBounds(new DateTime("2017-01-01T00:00:05.383+0100"), new DateTime("2017-01-13T00:00:05.383+0100"))
.subAggregation(AggregationBuilders.sum("power_total").field("myField"))
val result: SearchResponse = client
.prepareSearch("myIndex")
.setTypes("myType")
.setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
.setQuery(query)
.addAggregation(agg)
.addSort("Day", SortOrder.DESC)
.setSize(815)
.addField("myField")
.execute()
.actionGet()
val results = result.getHits.getHits
println("Current results: " + results.length)
for (hit <- results) {
println("------------------------------")
val response = hit.getSource
println(response)
}
client.close()
The result was:
Current results: 0
Please let me know why I am not getting a value for "myField" like I did using Sense.
I have tried several times and still get the same result. Could it be that I am not parsing the query response the right way?
Everything was correct. The only pitfall was that I was querying a date-time that is not stored in my database: instead of "2017-01-01", I was passing "2017-01-02".
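A quick way to catch this kind of mismatch is to check a known document timestamp against the range bounds before running the query. The sketch below does that with java.time; the class name and the sample document timestamps are made up, and the bounds are the ones from the range query above.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

final class RangeCheck {
    // Matches timestamps such as 2017-01-02T00:00:05.383+0100 (offset without a colon).
    static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ss.SSSZ");

    // True when gte <= value <= lte, compared as instants.
    static boolean inRange(String value, String gte, String lte) {
        OffsetDateTime v = OffsetDateTime.parse(value, FMT);
        return !v.isBefore(OffsetDateTime.parse(gte, FMT))
                && !v.isAfter(OffsetDateTime.parse(lte, FMT));
    }

    public static void main(String[] args) {
        String gte = "2017-01-02T00:00:05.383+0100";
        String lte = "2017-01-13T00:00:05.383+0100";
        // A document dated 2017-01-01 falls outside the range, so it cannot be a hit.
        System.out.println(inRange("2017-01-01T00:00:00.000+0100", gte, lte));
        System.out.println(inRange("2017-01-05T10:00:00.000+0100", gte, lte));
    }
}
```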

Extracting fields from JSON fields from mongodb

I want to extract all the places mentioned in the "location" field and none of the other fields in the JSON below, but I can't extract them because the field is nested. Can anyone help me?
DBCursor cursorTotal = coll.find(obje);
while (cursorTotal.hasNext()) {
DBObject curNext = cursorTotal.next();
System.out.println("data::"+curNext.get("list.myList.location"));
}
My "curNext" gives this output:
{
"_id": {
"$oid": "51ebe983e4b0d529b4df2a0e"
},
"date": {
"$date": "2013-07-21T13:31:11.000Z"
},
"lTitle": "Three held for running CISF job racket",
"list": {
"myList": [
{
"location": "Germany"
},
{
"location": "Geneva"
},
{
"location": "Paris"
}
]
},
"hash": -1535814113,
"category": "news"
}
I want my output as
Germany,Geneva,Paris
I waited a long time here for an answer and finally found what I was searching for. I am noting my answer so that someone else can benefit from it.
DBCursor cursorTotal = coll.find(obje);
while (cursorTotal.hasNext()) {
DBObject curNext = cursorTotal.next();
String res=curNext.toString();
JsonElement jelement = new JsonParser().parse(res);
JsonObject jobject = jelement.getAsJsonObject();
jobject = jobject.getAsJsonObject("list");
JsonArray jarray = jobject.getAsJsonArray("myList");
jobject = jarray.get(0).getAsJsonObject();
String result = jobject.get("location").getAsString();
System.out.println("all places::"+result);
}
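Note that the snippet above reads only the element at index 0, so it prints a single location. To get every location, loop over the whole array. The sketch below shows that loop on plain Map and List objects used as stand-ins for the nested document; it does not use the MongoDB driver or Gson, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class LocationExtractor {
    // Walks list -> myList -> each entry's "location" and joins the values with commas.
    @SuppressWarnings("unchecked")
    static String joinLocations(Map<String, Object> doc) {
        Map<String, Object> list = (Map<String, Object>) doc.get("list");
        List<Map<String, Object>> myList = (List<Map<String, Object>>) list.get("myList");
        List<String> locations = new ArrayList<>();
        for (Map<String, Object> entry : myList) {
            locations.add((String) entry.get("location"));
        }
        return String.join(",", locations);
    }

    // Builds a stand-in with the same shape as the document shown in the question.
    static Map<String, Object> sampleDoc() {
        List<Map<String, Object>> myList = new ArrayList<>();
        for (String place : new String[] {"Germany", "Geneva", "Paris"}) {
            Map<String, Object> entry = new HashMap<>();
            entry.put("location", place);
            myList.add(entry);
        }
        Map<String, Object> list = new HashMap<>();
        list.put("myList", myList);
        Map<String, Object> doc = new HashMap<>();
        doc.put("list", list);
        doc.put("category", "news");
        return doc;
    }

    public static void main(String[] args) {
        System.out.println(joinLocations(sampleDoc())); // Germany,Geneva,Paris
    }
}
```

With Gson, the same idea would be a loop over the elements of the "myList" JsonArray instead of a single get(0).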
To find only the locations you should use MongoDB aggregation. The query below fetches all locations as an array:
db.collectionName.aggregate({
"$unwind": "$list.myList"
},
{
"$group": {
"_id": "$_id",
"location": {
"$push": "$list.myList.location"
}
}
},
{
"$project": {
"location": "$location",
"_id": 0
}
})
Unfortunately I don't know how to convert this to Java, but the link below may help you convert the above query to Java:
Mongo Java aggregation driver
