How to update a nested Elasticsearch value via BulkRequest? - java

We are using AWS Elasticsearch version 7.7.
I have already followed Update nested field in an index of ElasticSearch with Java API.
I have the following JSON document in Elasticsearch:
{
  "_index": "product",
  "_type": "_doc",
  "_source": {
    "id": 1,
    "name": "test",
    "properties": [{
      "id": 1,
      "qty": 10
    }]
  }
}
I have the below code:
BulkRequest request = new BulkRequest();
request.add(new UpdateRequest(<ES Endpoint>, "1")
        .doc(XContentType.JSON, "name", "TEST 1"));
BulkResponse bulkResponse = restClient.bulk(request, RequestOptions.DEFAULT);
How should I update the "qty" value inside "properties"?
https://www.elastic.co/guide/en/elasticsearch/client/java-api/6.8/java-docs-update.html

You can pass a Map with all fields to update in the doc() call:
Map<String, Object> doc = new HashMap<>();
doc.put("name", "TEST 1");
doc.put("qty", 12);
request.add(new UpdateRequest("index", "1").doc(doc));
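Note that "qty" sits inside the nested "properties" array, and a partial-document update merges objects but replaces arrays as a whole, so a top-level "qty" key will not change "properties". One option is to resend the complete "properties" list in the doc map. A minimal sketch, assuming the index is "product", the document id is "1", and restClient is your RestHighLevelClient (java.util imports assumed):
Map<String, Object> property = new HashMap<>();
property.put("id", 1);
property.put("qty", 12);  // the new quantity

Map<String, Object> updatedDoc = new HashMap<>();
updatedDoc.put("name", "TEST 1");
updatedDoc.put("properties", Collections.singletonList(property));  // the whole array is replaced

BulkRequest bulk = new BulkRequest();
bulk.add(new UpdateRequest("product", "1").doc(updatedDoc));
BulkResponse bulkResponse = restClient.bulk(bulk, RequestOptions.DEFAULT);
If you only want to change one element of a larger array, a scripted update (or a read-modify-write of the array) is usually needed instead.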

Related

ElasticSearch Java API nested query with inner_hits error

I have a problem with the Elasticsearch Java API. I use version 5.1.2.
I will describe the code pasted below. I need to optimize the search mechanism by limiting inner_hits to only the object id, so I used InnerHitBuilder with .setFetchSourceContext(FetchSourceContext.DO_NOT_FETCH_SOURCE) and .addDocValueField("item.id"). The generated query has a problem: there is an "ignore_unmapped" attribute inside the "inner_hits" node.
..."inner_hits": {
"name": "itemTerms",
"ignore_unmapped": false,
"from": 0,
"size": 2147483647,
"version": false,
"explain": false,
"track_scores": false,
"_source": false,
"docvalue_fields": ["item.id"]
}...
Executing such a query results in the following error:
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "[inner_hits] unknown field [ignore_unmapped], parser not found"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "[inner_hits] unknown field [ignore_unmapped], parser not found"
  },
  "status": 400
}
When I manually remove that attribute from the query, everything runs smoothly.
protected BoolQueryBuilder itemTermQuery(FileTerms terms, boolean withInners) {
    BoolQueryBuilder termsQuery = QueryBuilders.boolQuery();
    for (String term : FileTerms.terms()) {
        if (terms.term(term).isEmpty())
            continue;
        Set<String> fns = terms.term(term).stream()
                .map(x -> x.getTerm())
                .filter(y -> !y.isEmpty())
                .collect(Collectors.toSet());
        if (!fns.isEmpty())
            termsQuery = termsQuery.must(
                    QueryBuilders.termsQuery("item.terms." + term + ".term", fns));
    }
    QueryBuilder query = terms.notEmpty() ? termsQuery : QueryBuilders.matchAllQuery();
    TermsQueryBuilder discontinuedQuery = QueryBuilders.termsQuery(
            "item.terms." + FileTerms.Terms.USAGE_IS + ".term",
            new FileTerm("Discontinued", "", "", "", "").getTerm());
    FunctionScoreQueryBuilder.FilterFunctionBuilder[] functionBuilders = {
            new FunctionScoreQueryBuilder.FilterFunctionBuilder(query, ScoreFunctionBuilders.weightFactorFunction(1)),
            new FunctionScoreQueryBuilder.FilterFunctionBuilder(discontinuedQuery, ScoreFunctionBuilders.weightFactorFunction(-1000))
    };
    FunctionScoreQueryBuilder functionScoreQuery = functionScoreQuery(functionBuilders);
    NestedQueryBuilder nested = QueryBuilders.nestedQuery("item", functionScoreQuery.query(), ScoreMode.None);
    if (withInners) nested = nested.innerHit(new InnerHitBuilder()
            .setFetchSourceContext(FetchSourceContext.DO_NOT_FETCH_SOURCE)
            .addDocValueField("item.id")
            .setSize(Integer.MAX_VALUE)
            .setName("itemTerms"));
    return QueryBuilders.boolQuery().must(nested);
}
How can I build the query without that unnecessary attribute inside the "inner_hits" node?
EDIT:
I use the 5.1.2 client library and a 5.1.2 Elasticsearch server.
<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>transport</artifactId>
    <version>5.1.2</version>
</dependency>
"version": {
"number": "5.1.2",
"build_hash": "c8c4c16",
"build_date": "2017-01-11T20:18:39.146Z",
"build_snapshot": false,
"lucene_version": "6.3.0"
},

Elasticsearch aggregate and sum by field (group by)

I have the following data in my index:
{
  "category": "fruit",
  "name": "apple",
  "price": 2.6
},
{
  "category": "fruit",
  "name": "orange",
  "price": 1.8
},
{
  "category": "vegs",
  "name": "tomato",
  "price": 0.95
}
I would like to sum the prices by category, which should lead to a result like:
fruit - 4.4
vegs - 0.95
I do realize that I need to use an aggregation, but I fail to see how exactly. Here is the code I have so far:
SearchRequest searchRequest = new SearchRequest("products");
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.aggregation(
        AggregationBuilders.nested("category_price", "products")
                .subAggregation(AggregationBuilders.terms("field").field("category"))
                .subAggregation(AggregationBuilders.avg("avg_price").field("price")));
searchRequest.source(searchSourceBuilder);
SearchResponse response = client.search(searchRequest);
Nested agg = response.getAggregations().get("category_price");
Terms name = agg.getAggregations().get("field");
for (Terms.Bucket bucket : name.getBuckets()) {
    ReverseNested resellerToProduct = bucket.getAggregations().get("avg_price");
    System.out.println(resellerToProduct.getDocCount());
    System.out.println(resellerToProduct.getName());
}
You created the second aggregation as a sibling, but you need it as a sub-aggregation of the terms aggregation. Also, you don't need a nested aggregation here.
AggregationBuilder aggregationBuilder = AggregationBuilders.global("agg")
.subAggregation(AggregationBuilders.terms("by_category").field("category")
.subAggregation(AggregationBuilders.sum("sum_price").field("price")));
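Reading the result back then mirrors that structure, roughly as follows (a sketch assuming the usual response classes Global, Terms and Sum; if category is mapped as text you would aggregate on category.keyword instead):
Global agg = response.getAggregations().get("agg");
Terms byCategory = agg.getAggregations().get("by_category");
for (Terms.Bucket bucket : byCategory.getBuckets()) {
    Sum sumPrice = bucket.getAggregations().get("sum_price");
    // e.g. "fruit - 4.4" and "vegs - 0.95"
    System.out.println(bucket.getKeyAsString() + " - " + sumPrice.getValue());
}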

How to get an Accumulator result using Morphia in MongoDB?

I'm using Morphia with MongoDB in Java, and I'd like to get the number of records from an aggregation query like this:
AggregationPipeline pipCount = ds.createAggregation(MyTable.class)
.match(query1)
.match(query2)
.unwind("transactions")
.match(query3)
.group("_id", grouping("_id"), grouping("count", new Accumulator("$sum", 1)));
Iterator<MyTable> result = pipCount.aggregate(MyTable.class);
I need to use grouping("_id") to remove duplicate results and then count them, but I can't find any way to read the sum value.
Any ideas?
Sample Data:
{
  "_id": "00000222",
  "create_date": ISODate("2015-05-06T07:20:31.000+0000"),
  "update_date": ISODate("2015-05-06T07:20:31.000+0000"),
  "payment": 70.0,
  "fee": 0.0,
  "type": "RECURRING",
  "currency": "USD",
  "status": "OK",
  "transactions": [{
    "_id": "111111223",
    "amount": 1260.0,
    "fee_type": "VARIABLE_ADD",
    "fee_rate": 2.75,
    "status": "ERROR",
    "charges": [{
      "_id": "2222223344",
      "amount": 1000.0,
      "recurring": true,
      "firstTime": false,
      "oneTime": true
    }, {
      "_id": "222222222233221",
      "amount": 70.0,
      "recurring": true,
      "firstTime": true,
      "oneTime": true
    }]
  }],
  "users": {
    "_id": "33333333332212",
    "update_date": ISODate("2015-12-18T08:03:35.000+0000"),
    "user_id": "sdjfhsd#skjksdf.com",
    "first_name": "dsjfj",
    "last_name": "skdfjf"
  }
}
Result: 1
You can try something like this. You don't need the extra grouping: the first group already takes care of duplicates while counting. Then project the count, map the response to Document, and read the count from it.
import org.bson.Document;
AggregationPipeline pipCount = datastore.createAggregation(MyTable.class)
        .match(query1)
        .match(query2)
        .unwind("somethingID")
        .match(query3)
        .group("_id", grouping("count", new Accumulator("$sum", 1)))
        .project(Projection.projection("count"));
Iterator<Document> result = pipCount.aggregate(Document.class);
while (result.hasNext()) {
    Document document = result.next();
    Integer count = document.getInteger("count");
}
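If you would rather not deal with raw Document instances, Morphia can also map the aggregation output onto a small helper class. An untested sketch, where CountResult is a hypothetical class introduced here only for illustration:
import org.mongodb.morphia.annotations.Id;

// hypothetical holder for the { _id, count } documents produced by the group stage
public class CountResult {
    @Id
    private String id;
    private int count;

    public int getCount() {
        return count;
    }
}

Iterator<CountResult> counts = pipCount.aggregate(CountResult.class);
while (counts.hasNext()) {
    System.out.println(counts.next().getCount());
}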

Elasticsearch grouping by nested fields

Is there a way to group by nested fields and perform an aggregation on non-nested fields?
I have data like this in ES:
{
  "_index": "bighalf",
  "_type": "excel",
  "_id": "AVE0rgXqe0-x669Gsae3",
  "_score": 1,
  "_source": {
    "Name": "Marsh",
    "date": "2015-11-07T10:47:14",
    "grade": 9,
    "year": 2016,
    "marks": 70,
    "subject": "Mathematics",
    "Gender": "male",
    "dob": "22/11/2000",
    "sprint": [
      {
        "sprintdate": "2015-11-06T22:30:00",
        "sprintname": "changed",
        "sprintpoints": 52
      }
    ]
  }
},
{
  "_index": "bighalf",
  "_type": "excel",
  "_id": "AVE0rvTHe0-x669Gsae5",
  "_score": 1,
  "_source": {
    "Name": "Taylor",
    "date": "2015-11-07T10:47:14",
    "grade": 9,
    "year": 2016,
    "marks": 54,
    "subject": "Mathematics",
    "Gender": "male",
    "dob": "22/11/2000",
    "sprint": [
      {
        "sprintdate": "2015-11-07T22:30:00",
        "sprintname": "jira",
        "sprintpoints": 52
      }
    ]
  }
}
I want to group by sprintname and find the sum of marks.
I tried this:
SumBuilder sumGrades = AggregationBuilders.sum("sum_grade").field("grade");
NestedBuilder nested = AggregationBuilders.nested("nested").path("sprint")
.subAggregation(AggregationBuilders.terms("by_sprint").field("sprint.sprintname").subAggregation(sumGrades));
String names[] = { "changed", "jira" };
QueryBuilder query = QueryBuilders.boolQuery().must(
QueryBuilders.nestedQuery("sprint",QueryBuilders.boolQuery().must(QueryBuilders.termsQuery("sprint.sprintname", names))));
FilterAggregationBuilder aggregation = AggregationBuilders.filter("agg").filter(query).subAggregation(nested);
The sum_grade did not work for me, but when I replaced field("grade") with the nested field ("sprintpoints") it worked. My requirement, however, is to sum "grade" grouped by sprint.sprintname.
Since your sprint field is of nested type, in your aggregation you need to use a reverse_nested aggregation in order to "jump back" to the root document from within the nested ones. It goes like this:
SumBuilder sumGrades = AggregationBuilders.sum("sum_grade").field("grade");
ReverseNestedBuilder backToGrades = AggregationBuilders.reverseNested("spring_to_grade")
        .subAggregation(sumGrades);
TermsBuilder bySprint = AggregationBuilders.terms("by_sprint")
        .field("sprint.sprintname").subAggregation(backToGrades);
NestedBuilder nested = AggregationBuilders.nested("nested").path("sprint")
        .subAggregation(bySprint);
String names[] = { "changed", "jira" };
QueryBuilder query = QueryBuilders.boolQuery().must(
        QueryBuilders.nestedQuery("sprint",
                QueryBuilders.boolQuery().must(QueryBuilders.termsQuery("sprint.sprintname", names))));
FilterAggregationBuilder aggregation = AggregationBuilders.filter("agg").filter(query).subAggregation(nested);
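Unwrapping the response then follows the same aggregation tree, along these lines (a sketch assuming searchResponse is the SearchResponse of a request that included the "agg" aggregation above):
Filter filterAgg = searchResponse.getAggregations().get("agg");
Nested nestedAgg = filterAgg.getAggregations().get("nested");
Terms bySprintTerms = nestedAgg.getAggregations().get("by_sprint");
for (Terms.Bucket bucket : bySprintTerms.getBuckets()) {
    ReverseNested toRoot = bucket.getAggregations().get("spring_to_grade");
    Sum sumGrade = toRoot.getAggregations().get("sum_grade");
    // prints the sprint name and the summed grades of the parent documents
    System.out.println(bucket.getKey() + " -> " + sumGrade.getValue());
}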

Add _timestamp to Elasticsearch results using Java API

I have an Elasticsearch index which has _timestamp populated on every record. Using Marvel or curl I can get the _timestamp in the "fields" part of the result, for example:
GET index/type/_search?fields=_timestamp,_source
{
  "took": 11,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "failed": 0
  },
  "hits": {
    "total": 116888,
    "max_score": 1,
    "hits": [
      {
        "_index": "index",
        "_type": "type",
        "_id": "mXJdWqSLSfykbMtChiCRjA",
        "_score": 1,
        "_source": {
          "results": "example"
        },
        "fields": {
          "_timestamp": 1443618319514
        }
      },...
However, when doing a search using the Java API, I can't get it to return the _timestamp.
SearchRequestBuilder builder = client.prepareSearch(index)
        .addFacet(facet)
        .setFrom(start)
        .setSize(limit);
SearchResponse response = builder.execute().actionGet();
Can anyone tell me how to ask for _timestamp too?
You simply need to use the setFields() method like this:
SearchRequestBuilder builder = client.prepareSearch(index)
        .setType(type)
        .addFacet(facet)
        .setFields("_timestamp")  // <-- add this line
        .setFrom(start)
        .setSize(limit);
SearchResponse response = builder.execute().actionGet();
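The timestamp is then returned per hit as a field rather than in the source; reading it back could look roughly like this (a sketch using the same pre-5.x Java API as in the question):
for (SearchHit hit : response.getHits().getHits()) {
    SearchHitField timestampField = hit.field("_timestamp");
    if (timestampField != null) {
        // epoch millis, e.g. 1443618319514
        Number timestamp = timestampField.getValue();
        System.out.println(hit.getId() + " -> " + timestamp);
    }
}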
