I have the following data in my index:
{
  "category": "fruit",
  "name": "apple",
  "price": 2.6
},
{
  "category": "fruit",
  "name": "orange",
  "price": 1.8
},
{
  "category": "vegs",
  "name": "tomato",
  "price": 0.95
}
I would like to sum the prices by category that will lead to a result like:
fruit - 4.4
vegs - 0.95
I do realize that I need to use a nested aggregation, but I fail to see how exactly. Here is the code I have so far:
SearchRequest searchRequest = new SearchRequest("products");
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.aggregation(
        AggregationBuilders.nested("category_price", "products")
                .subAggregation(AggregationBuilders.terms("field").field("category"))
                .subAggregation(AggregationBuilders.avg("avg_price").field("price")));
searchRequest.source(searchSourceBuilder);
SearchResponse response = client.search(searchRequest);
Nested agg = response.getAggregations().get("category_price");
Terms name = agg.getAggregations().get("field");
for (Terms.Bucket bucket : name.getBuckets()) {
ReverseNested resellerToProduct = bucket.getAggregations().get("avg_price");
System.out.println(resellerToProduct.getDocCount());
System.out.println(resellerToProduct.getName());
}
You created the second aggregation as a sibling, but you need it as a sub-aggregation. Also, you don't need a nested aggregation here:
AggregationBuilder aggregationBuilder = AggregationBuilders.global("agg")
.subAggregation(AggregationBuilders.terms("by_category").field("category")
.subAggregation(AggregationBuilders.sum("sum_price").field("price")));
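For completeness, here is a minimal sketch of executing that builder and printing each category's sum. It assumes the products index and client from the question; the aggregation result classes live in slightly different packages across client versions, and if category is a text field you would aggregate on category.keyword instead:
import org.elasticsearch.search.aggregations.bucket.global.Global;
import org.elasticsearch.search.aggregations.bucket.terms.Terms;
import org.elasticsearch.search.aggregations.metrics.Sum; // metrics.sum.Sum in older clients

SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder()
        .size(0) // hits are not needed, only the aggregation
        .aggregation(aggregationBuilder);
SearchRequest searchRequest = new SearchRequest("products");
searchRequest.source(searchSourceBuilder);
SearchResponse response = client.search(searchRequest); // newer clients: client.search(searchRequest, RequestOptions.DEFAULT)

Global agg = response.getAggregations().get("agg");
Terms byCategory = agg.getAggregations().get("by_category");
for (Terms.Bucket bucket : byCategory.getBuckets()) {
    Sum sumPrice = bucket.getAggregations().get("sum_price");
    // prints e.g. "fruit - 4.4" and "vegs - 0.95"
    System.out.println(bucket.getKeyAsString() + " - " + sumPrice.getValue());
}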
We are using AWS Elasticsearch, version 7.7.
I already followed Update nested field in an index of ElasticSearch with Java API.
I have the below JSON in Elasticsearch:
{
"_index": "product",
"_type": "_doc",
"_source": {
"id": 1,
"name": "test",
"properties": [{
"id": 1,
"qty": 10
}]
}
}
I have the below code:
BulkRequest request = new BulkRequest();
request.add(new UpdateRequest(<ES Endpoint>, "1")
        .doc(XContentType.JSON, "name", "TEST 1"));
BulkResponse bulkResponse = restClient.bulk(request, RequestOptions.DEFAULT);
How should I update the "qty" value inside "properties"?
https://www.elastic.co/guide/en/elasticsearch/client/java-api/6.8/java-docs-update.html
You can pass a Map with all fields to update in the doc() call:
Map<String, Object> doc = new HashMap<>();
doc.put("name", "TEST 1");
doc.put("qty", 12);
request.add(new UpdateRequest("index", "1")
        .doc(doc));
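Note that a partial-document merge like the above only works for top-level fields; since qty lives inside the properties array, merging {"qty": 12} would just add a new top-level field. A hedged sketch (index name, document id, and the element's id value are assumptions) that patches one array element in place with a Painless scripted update:
import org.elasticsearch.script.Script;
import org.elasticsearch.script.ScriptType;

Map<String, Object> params = new HashMap<>();
params.put("id", 1);   // which properties element to change
params.put("qty", 12); // the new quantity

UpdateRequest update = new UpdateRequest("product", "1")
        .script(new Script(ScriptType.INLINE, "painless",
                // walk the nested array and patch the matching element
                "for (def item : ctx._source.properties) {"
              + "  if (item.id == params.id) { item.qty = params.qty; }"
              + "}",
                params));

BulkRequest request = new BulkRequest();
request.add(update);
BulkResponse bulkResponse = restClient.bulk(request, RequestOptions.DEFAULT);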
I am using the following code to search. It is working fine, but it returns results only when a complete word matches; I want results for a partial query too (a minimum of 3 matching characters of an incomplete word). There is another requirement: I have a field campus in my documents, with values like campus: "Bradford", campus: "Oxford", campus: "Harvard", etc. My query should return documents whose campus is Bradford or Oxford and where "Nel" appears somewhere in the rest of the document.
RestHighLevelClient client;
QueryBuilder matchQueryBuilder = QueryBuilders.queryStringQuery("Nel");
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.query(matchQueryBuilder);
SearchRequest searchRequest = new SearchRequest("index_name");
searchRequest.source(sourceBuilder);
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
Mapped to SQL, the condition would be WHERE campus = 'Bradford' OR campus = 'Oxford'.
In the document, I have "Nelson Mandela II".
Currently, it works if I use Nelson as the query, but I need it to work with the query Nel.
There are basically two possible ways to achieve the use-case you are looking for.
Solution 1: Using wildcard query
Assuming that you have two fields
name of type text
campus of type text
Below is how your Java code would look:
private static void wildcardQuery(RestHighLevelClient client, SearchSourceBuilder sourceBuilder)
throws IOException {
System.out.println("-----------------------------------------------------");
System.out.println("Wildcard Query");
MatchQueryBuilder campusClause_1 = QueryBuilders.matchQuery("campus", "oxford");
MatchQueryBuilder campusClause_2 = QueryBuilders.matchQuery("campus", "bradford");
//Using wildcard query
WildcardQueryBuilder nameClause = QueryBuilders.wildcardQuery("name", "nel*");
//Main Query
BoolQueryBuilder query = QueryBuilders.boolQuery()
.must(nameClause)
.should(campusClause_1)
.should(campusClause_2)
.minimumShouldMatch(1);
sourceBuilder.query(query);
SearchRequest searchRequest = new SearchRequest();
//specify your index name in the below parameter
searchRequest.indices("my_wildcard_index");
searchRequest.source(sourceBuilder);
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
System.out.println(searchResponse.getHits().getTotalHits());
System.out.println("-----------------------------------------------------");
}
Note that if the above fields were of keyword type and you needed an exact, case-sensitive match, you'd use the below instead:
TermQueryBuilder campusClause_2 = QueryBuilders.termQuery("campus", "Bradford");
Solution 2: Using Edge Ngram tokenizer (preferred solution)
For this you would need to make use of the Edge Ngram tokenizer.
Below is how your mapping would look:
Mapping:
PUT my_index
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"filter": "lowercase",
"tokenizer": "my_tokenizer"
}
},
"tokenizer": {
"my_tokenizer": {
"type": "edge_ngram",
"min_gram": 2,
"max_gram": 10,
"token_chars": [
"letter",
"digit"
]
}
}
}
},
"mappings": {
"properties": {
"name":{
"type": "text",
"analyzer": "my_analyzer"
},
"campus": {
"type": "text"
}
}
}
}
Sample Documents:
PUT my_index/_doc/1
{
"name": "Nelson Mandela",
"campus": "Bradford"
}
PUT my_index/_doc/2
{
"name": "Nel Chaz",
"campus": "Oxford"
}
Query DSL:
POST my_index/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"name": "nel"
}
}
],
"should": [
{
"match": {
"campus": "bradford"
}
},
{
"match": {
"campus": "oxford"
}
}
],
"minimum_should_match": 1
}
}
}
Java Code:
private static void boolMatchQuery(RestHighLevelClient client, SearchSourceBuilder sourceBuilder)
throws IOException {
System.out.println("-----------------------------------------------------");
System.out.println("Bool Query");
MatchQueryBuilder campusClause_1 = QueryBuilders.matchQuery("campus", "oxford");
MatchQueryBuilder campusClause_2 = QueryBuilders.matchQuery("campus", "bradford");
//Plain old match query would suffice here
MatchQueryBuilder nameClause = QueryBuilders.matchQuery("name", "nel");
BoolQueryBuilder query = QueryBuilders.boolQuery()
.must(nameClause)
.should(campusClause_1)
.should(campusClause_2)
.minimumShouldMatch(1);
sourceBuilder.query(query);
SearchRequest searchRequest = new SearchRequest();
searchRequest.indices("my_index");
searchRequest.source(sourceBuilder);
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
System.out.println(searchResponse.getHits().getTotalHits());
}
Note how I've just made use of a match query for the name field. I'd suggest you read a bit about analysis, analyzers, tokenizers, and the edge-ngram tokenizer.
In the console, you should see the total number of matching hits.
Similarly, you can make use of other query types in the above solutions, e.g. a term query if you are looking for an exact match on a keyword field.
Updated Answer:
Personally, I do not recommend Solution 1, as it wastes a lot of computational power for a single field, let alone multiple fields.
In order to do multi-field sub-string matches, the best way is to make use of a concept called copy_to and then apply the Edge N-Gram tokenizer to that field.
So what does this Edge N-Gram tokenizer really do? Put simply, based on min_gram and max_gram it breaks your tokens down, e.g. Zeppelin into Zep, Zepp, Zeppe, Zeppel, Zeppeli, Zeppelin, and inserts these values into the inverted index of that field. Now if you execute even a very simple match query, it returns that document because the inverted index contains the substring.
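To see for yourself what the tokenizer emits, you can run the analyzer against a sample string. A hedged sketch with the 7.x high-level REST client, assuming the my_index/my_analyzer names from the mapping below:
import org.elasticsearch.client.indices.AnalyzeRequest;
import org.elasticsearch.client.indices.AnalyzeResponse;

AnalyzeRequest analyze = AnalyzeRequest.withIndexAnalyzer("my_index", "my_analyzer", "Zeppelin");
AnalyzeResponse tokens = client.indices().analyze(analyze, RequestOptions.DEFAULT);
for (AnalyzeResponse.AnalyzeToken token : tokens.getTokens()) {
    System.out.println(token.getTerm()); // zep, zepp, zeppe, ..., zeppelin
}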
And about copy_to field:
The copy_to parameter allows you to copy the values of multiple fields
into a group field, which can then be queried as a single field.
Using copy_to, we have the below mapping for the two fields campus and name.
Mapping:
PUT my_index
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"filter": "lowercase",
"tokenizer": "my_tokenizer"
}
},
"tokenizer": {
"my_tokenizer": {
"type": "edge_ngram",
"min_gram": 3,
"max_gram": 10,
"token_chars": [
"letter",
"digit"
]
}
}
}
},
"mappings": {
"properties": {
"name":{
"type": "text",
"copy_to": "search_string" <---- Note this
},
"campus": {
"type": "text",
"copy_to": "search_string" <---- Note this
},
"search_string": {
"type": "text",
"analyzer": "my_analyzer" <---- Note this
}
}
}
}
Notice in the above mapping how I've applied the Edge N-gram analyzer only to search_string. This consumes extra disk space, so you may want to take a step back and make sure you don't use this analyzer for every field; it depends on your use-case.
Sample Document:
POST my_index/_doc/1
{
"campus": "Cambridge University",
"name": "Ramanujan"
}
Search Query:
POST my_index/_search
{
"query": {
"match": {
"search_string": "ram"
}
}
}
And that makes the Java code as simple as the below:
private static void boolMatchQuery(RestHighLevelClient client, SearchSourceBuilder sourceBuilder)
throws IOException {
System.out.println("-----------------------------------------------------");
System.out.println("Bool Query");
MatchQueryBuilder searchClause = QueryBuilders.matchQuery("search_string", "ram");
//Feel free to add multiple clauses
BoolQueryBuilder query = QueryBuilders.boolQuery()
.must(searchClause);
sourceBuilder.query(query);
SearchRequest searchRequest = new SearchRequest();
searchRequest.indices("my_index");
searchRequest.source(sourceBuilder);
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
System.out.println(searchResponse.getHits().getTotalHits());
}
Hope that helps!
I have a JSONArray from net.minidev.json which I want to convert to List<HashMap<String,Object>>.
There are many answers for converting the JSONArray using Gson.
However, I cannot add Gson as a dependency to my pom.xml, so I am looking for a way to achieve it with Java-8 features.
My JSONArray is something like this; it comprises multiple hierarchies.
[
{
"data": {
"name": "idris"
},
"children": [
{
"processStandardDeduction": 69394,
"cropId": 1,
"data": null,
"expectedQuantityPerAcre": 1220,
"name": "Red Tomato 1 quality",
"id": 1003,
"autoArchivePlotDays": 59902
},
{
"processStandardDeduction": 69394,
"cropId": 1,
"autoArchivePlotDays": 59902
},
{
"processStandardDeduction": 69394,
"cropId": 1,
"autoArchivePlotDays": 59902
}
],
"name": "Red Tomato",
"id": 1002
},
{
"data": null,
"name": "Red Tomato 1 quality",
"id": 1003,
"processStandardDeduction": 69394,
"cropId": 1,
"expectedQuantityPerAcre": 1220,
"cropName": "Tomato",
"autoArchivePlotDays": 59902
},
{
"data": null,
"name": "Red Tomato 3 quality",
"id": 1001,
"processStandardDeduction": 69394,
"autoArchivePlotDays": 59902
},
{
"processStandardDeduction": 69394,
"cropId": 1,
"data": null,
"id": 1004,
"autoArchivePlotDays": 59902
}
]
I would like to achieve the same structure in List<HashMap<String, Object>>.
I tried looping each element of the JSONArray by converting it to each HashMap<String,Object> and then adding it to the List<HashMap<String,Object>>.
ObjectMapper mapper = new ObjectMapper();
List<HashMap<String, Object>> cropDetailsList = new ArrayList<>();
for (Object eachCropJson : cropDetails) { //cropDetails is the JSONArray
HashMap<String, Object> eachCropMap = (HashMap<String, Object>) mapper.convertValue(eachCropJson,
HashMap.class);
cropDetailsList.add(eachCropMap);
}
return cropDetailsList;
I would like to try a better approach using Java-8 features without using a forEach.
Thanks in advance.
If you get this JSON as a String, then you can use the ObjectMapper.readValue method:
readValue(String content, TypeReference valueTypeRef)
Code
List<HashMap<String, Object>> cropDetailsList = mapper.readValue(jsonString,
new TypeReference<List<HashMap<String, Object>>>(){});
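Since you have a JSONArray rather than the raw String, you can bridge the two; net.minidev's JSONArray implements toJSONString(), so a hedged combination of the question's setup and the call above would be:
List<HashMap<String, Object>> cropDetailsList = mapper.readValue(
        cropDetails.toJSONString(),
        new TypeReference<List<HashMap<String, Object>>>() {});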
In the same way, if you want to iterate the JSONArray with a stream:
List<HashMap<String, Object>> cropDetailsList = cropDetails.stream()
.map(eachCropJson->(HashMap<String, Object>) mapper.convertValue(eachCropJson, HashMap.class))
.collect(Collectors.toList());
Using a stream, this can also be done without the mapper, since net.minidev's JSONObject extends HashMap. Note that net.minidev's JSONArray has no toList() method; it extends ArrayList, so stream() is available directly:
List<HashMap<String, Object>> output = cropDetails.stream().map(m -> (HashMap<String, Object>) m).collect(Collectors.toList());
I have a problem with the Elasticsearch Java API. I use version 5.1.2.
I will now describe the code pasted below. I need to optimize the search mechanism by limiting inner_hits to only the object id. I used InnerHitBuilder with .setFetchSourceContext(FetchSourceContext.DO_NOT_FETCH_SOURCE) and .addDocValueField("item.id"). The generated query has an error: there is an "ignore_unmapped" attribute inside the "inner_hits" node.
..."inner_hits": {
"name": "itemTerms",
"ignore_unmapped": false,
"from": 0,
"size": 2147483647,
"version": false,
"explain": false,
"track_scores": false,
"_source": false,
"docvalue_fields": ["item.id"]
}...
Executing such a query results in this error:
{
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "[inner_hits] unknown field [ignore_unmapped], parser not found"
}
],
"type": "illegal_argument_exception",
"reason": "[inner_hits] unknown field [ignore_unmapped], parser not found"
},
"status": 400
}
When I manually remove that attribute from the query, everything runs smoothly.
protected BoolQueryBuilder itemTermQuery(FileTerms terms, boolean withInners) {
BoolQueryBuilder termsQuery = QueryBuilders.boolQuery();
for (String term : FileTerms.terms()) {
if (terms.term(term).isEmpty())
continue;
Set<String> fns = terms.term(term).stream().
map(x -> x.getTerm())
.filter(y -> !y.isEmpty())
.collect(Collectors.toSet());
if (!fns.isEmpty())
termsQuery = termsQuery.must(
QueryBuilders.termsQuery("item.terms." + term + ".term", fns));
}
QueryBuilder query = terms.notEmpty() ? termsQuery : QueryBuilders.matchAllQuery();
TermsQueryBuilder discontinuedQuery = QueryBuilders.termsQuery("item.terms." + FileTerms.Terms.USAGE_IS + ".term",
new FileTerm("Discontinued", "", "", "", "").getTerm());
FunctionScoreQueryBuilder.FilterFunctionBuilder[] functionBuilders = {
new FunctionScoreQueryBuilder.FilterFunctionBuilder(query, ScoreFunctionBuilders.weightFactorFunction(1)),
new FunctionScoreQueryBuilder.FilterFunctionBuilder(discontinuedQuery, ScoreFunctionBuilders.weightFactorFunction(-1000))
};
FunctionScoreQueryBuilder functionScoreQuery = functionScoreQuery(functionBuilders);
NestedQueryBuilder nested = QueryBuilders.nestedQuery("item", functionScoreQuery.query(), ScoreMode.None);
if (withInners) nested = nested.innerHit(new InnerHitBuilder()
.setFetchSourceContext(FetchSourceContext.DO_NOT_FETCH_SOURCE)
.addDocValueField("item.id")
.setSize(Integer.MAX_VALUE)
.setName("itemTerms"));
return QueryBuilders.boolQuery().must(nested);
}
How can I build the query without that unnecessary attribute inside the "inner_hits" node?
EDIT:
I use the 5.1.2 library and a 5.1.2 Elasticsearch server.
<dependency>
<groupId>org.elasticsearch.client</groupId>
<artifactId>transport</artifactId>
<version>5.1.2</version>
</dependency>
"version": {
"number": "5.1.2",
"build_hash": "c8c4c16",
"build_date": "2017-01-11T20:18:39.146Z",
"build_snapshot": false,
"lucene_version": "6.3.0"
},
Okay, let's start. Imagine that we have the following mongo collection:
{
"city": "TWENTYNINE PALMS",
"loc": [-116.06041, 34.237969],
"pop": 11412,
"state": "CA",
"_id": "92278"
}
{
"city": "NEW CUYAMA",
"loc": [-74.823806, 34.996709],
"pop": 80,
"state": "CA",
"_id": "93254"
}
{
"city": "WATERBURY",
"loc": [-72.996268, 41.550328],
"pop": 25128,
"state": "CT",
"_id": "06705"
}
Notice that the loc array is [longitude, latitude] (first values like -116.06 are too large in magnitude to be latitudes).
I would like to obtain, using the java mongo driver, the "pop" average of the cities whose longitude is between -75 and -70.
So, in SQL I know that the query would be:
SELECT avg(pop)
WHERE loc.longitude > -75 AND loc.longitude < -70
I am very new to mongodb; this is my current code:
BasicDBObject doc = new BasicDBObject("loc.0", new BasicDBObject("$gte",
-75).append("$lte", -70));
DBCursor cursor = collection.find(doc);
The previous code returns all the documents whose first loc value is between -75 and -70, but I do not know how to obtain the average using the mongo driver. I know that I can iterate over the results using java.
Thank you
Use the aggregation framework with the following aggregation pipeline (Mongo shell implementation):
db.collection.aggregate([
    {
        "$match": {
            "loc.0": { "$gte": -75, "$lte": -70 }
        }
    },
    {
        "$group": {
            "_id": 0,
            "average": {
                "$avg": "$pop"
            }
        }
    }
])
With the example above, this outputs to console:
/* 1 */
{
"result" : [
{
"_id" : 0,
"average" : 12604
}
],
"ok" : 1
}
With Java, this can be implemented as follows:
DBObject matchFields = new BasicDBObject("loc.0",
        new BasicDBObject("$gte", -75).append("$lte", -70));
// each pipeline stage must be wrapped in its operator, i.e. {"$match": {...}}
DBObject match = new BasicDBObject("$match", matchFields);
DBObject groupFields = new BasicDBObject("_id", 0);
groupFields.put("average", new BasicDBObject("$avg", "$pop"));
DBObject group = new BasicDBObject("$group", groupFields);
AggregationOutput output = collection.aggregate(match, group);
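To read the computed average back, iterate the output's result documents (AggregationOutput.results() in the legacy driver); a minimal sketch:
for (DBObject result : output.results()) {
    System.out.println(result.get("average")); // 12604 for the sample data
}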