Spring MongoDB Aggregation Group - Can't get the group query right - java

I have a document in MongoDB that looks as follows:
{
  "_id" : ObjectId("5ceb812b3ec6d22cb94c82ca"),
  "key" : "KEYCODE001",
  "values" : [
    {
      "classId" : "CLASS_01",
      "objects" : [
        { "code" : "DD0001" },
        { "code" : "DD0010" }
      ]
    },
    {
      "classId" : "CLASS_02",
      "objects" : [
        { "code" : "AD0001" }
      ]
    }
  ]
}
I am interested in getting a result like the following:
{
  "classId" : "CLASS_01",
  "objects" : [
    { "code" : "DD0001" },
    { "code" : "DD0010" }
  ]
}
To get this, I came up with an aggregation pipeline in Robo 3T, shown below, and it works as expected:
[
  {
    $match: { 'key': 'KEYCODE001' }
  },
  {
    "$unwind": {
      "path": "$values",
      "preserveNullAndEmptyArrays": true
    }
  },
  {
    "$unwind": {
      "path": "$values.objects",
      "preserveNullAndEmptyArrays": true
    }
  },
  {
    $match: { 'values.classId': 'CLASS_01' }
  },
  {
    $project: {
      'object': '$values.objects',
      'classId': '$values.classId'
    }
  },
  {
    $group: {
      '_id': '$classId',
      'objects': { $push: '$object' }
    }
  },
  {
    $project: {
      '_id': 0,
      'classId': '$_id',
      'objects': '$objects'
    }
  }
]
Now, when I try to do the same in a Spring Boot application, I can't get it running. I end up with the error java.lang.IllegalArgumentException: Invalid reference '$complication'!. The following is what I have done in Java so far:
final Aggregation aggregation = newAggregation(
    match(Criteria.where("key").is("KEYCODE001")),
    unwind("$values", true),
    unwind("$values.objects", true),
    match(Criteria.where("classId").is("CLASS_01")),
    project().and("$values.classId").as("classId").and("$values.objects").as("object"),
    group("classId", "objects").push("$object").as("objects").first("$classId").as("_id"),
    project().and("$_id").as("classId").and("$objects").as("objects")
);
What am I doing wrong? Upon research, I found that grouping on multiple fields does not work, or something like that (please refer to this question). So, is what I am currently doing even possible in Spring Boot?

After hours of debugging and trial and error, I found the following solution to work.
final Aggregation aggregation = newAggregation(
    match(Criteria.where("key").is("KEYCODE001")),
    unwind("values", true),
    unwind("values.objects", true),
    match(Criteria.where("values.classId").is("CLASS_01")),
    project().and("values.classId").as("classId").and("values.objects").as("object"),
    group(Fields.from(Fields.field("_id", "classId"))).push("object").as("objects"),
    project().and("_id").as("classId").and("objects").as("objects")
);
It all boils down to group(Fields.from(Fields.field("_id", "classId"))).push("object").as("objects"), which introduces an org.springframework.data.mongodb.core.aggregation.Fields object that wraps a list of org.springframework.data.mongodb.core.aggregation.Field objects. Each Field encapsulates the name of a field and the target it maps to. This resulted in the following pipeline, which matches the expected one:
[
  {
    "$match" : { "key" : "KEYCODE001" }
  },
  {
    "$unwind" : { "path" : "$values", "preserveNullAndEmptyArrays" : true }
  },
  {
    "$unwind" : { "path" : "$values.objects", "preserveNullAndEmptyArrays" : true }
  },
  {
    "$match" : { "values.classId" : "CLASS_01" }
  },
  {
    "$project" : { "classId" : "$values.classId", "object" : "$values.objects" }
  },
  {
    "$group" : {
      "_id" : "$classId",
      "objects" : { "$push" : "$object" }
    }
  },
  {
    "$project" : { "classId" : "$_id", "objects" : 1 }
  }
]
Additionally, I figured out that there is no need to use the $ sign anywhere; Spring Data resolves the field references on its own.
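For completeness, here is a minimal sketch of actually executing this pipeline with MongoTemplate. The collection name "keyCodes" is an assumption for illustration, not something from the original question.

import org.bson.Document;
import org.springframework.data.mongodb.core.aggregation.AggregationResults;

// Run the aggregation built above against a hypothetical "keyCodes" collection.
AggregationResults<Document> results =
        mongoTemplate.aggregate(aggregation, "keyCodes", Document.class);

// Since the pipeline matches a single classId, at most one grouped document
// comes back: { "classId" : "CLASS_01", "objects" : [ ... ] }
Document grouped = results.getUniqueMappedResult();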

Related

MongoTemplate upsert property snapshot

I'm using Mongo 4.2.15. Here is an entry:
{
  "keys": {
    "country": "US",
    "channel": "c999"
  },
  "counters": {
    "sale": 0
  },
  "increments": null
}
I want to be able to initialize the counter set, as well as increment the counters.sale value and save a snapshot of the increment result to the increments property. Something like this:
db.getCollection('counterSets').update(
  { "$and" : [
      { "keys.country" : "US" },
      { "keys.channel" : "c999" }
  ]},
  { "$inc" : { "counters.sale" : 10 },
    "$set" : {
      "keys" : { "country" : "US", "channel" : "c999" },
      "increments" : {
        "3000c058-b8a7-4cff-915b-4979ef9a6ed9" : { "counters" : "$counters" }
      }
    }
  },
  { upsert: true })
The result is:
{
  "_id" : ObjectId("61965aba1501d6eb40588ba0"),
  "keys" : {
    "country" : "US",
    "channel" : "c999"
  },
  "counters" : {
    "sale" : 10.0
  },
  "increments" : {
    "3000c058-b8a7-4cff-915b-4979ef9a6ed9" : {
      "counters" : "$counters"
    }
  }
}
Is it possible to do such an update, which somehow copies the increment result from the root counters object to the child increments.3000c058-b8a7-4cff-915b-4979ef9a6ed9.counters, with a single upsert? I want to implement a safe increment. Or maybe you can suggest another design?
In order to use expressions, your $set should be part of an aggregation pipeline, so your query should look like the following.
NOTE: I've added square brackets to turn the update into a pipeline.
db.getCollection('counterSets').update(
  { "$and" : [
      { "keys.country" : "US" },
      { "keys.channel" : "c999" }
  ]},
  [
    { "$set": { "counters.sale": { "$sum": ["$counters.sale", 10] } } },
    { "$set": { "increments.x": "$counters" } }
  ],
  { upsert: true })
I haven't found any information about the atomicity of aggregation pipeline updates, so use this carefully.
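If you need the same update from Java, here is a rough sketch that goes through the driver collection underneath MongoTemplate. Pipeline updates require MongoDB 4.2+ and a 3.11+ Java driver; the snapshot key "x" is a placeholder, just like in the shell example above.

import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.UpdateOptions;
import org.bson.Document;
import java.util.Arrays;

MongoCollection<Document> coll = mongoTemplate.getCollection("counterSets");

Document filter = new Document("keys.country", "US").append("keys.channel", "c999");
// Stage 1: the $inc equivalent, expressed as a $set with $sum.
Document incStage = new Document("$set", new Document("counters.sale",
        new Document("$sum", Arrays.asList("$counters.sale", 10))));
// Stage 2: snapshot the freshly incremented counters object.
Document snapStage = new Document("$set", new Document("increments.x", "$counters"));

coll.updateOne(filter, Arrays.asList(incStage, snapStage),
        new UpdateOptions().upsert(true));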

Why is my Java Elasticsearch request translation not valid?

I'm currently making an Elasticsearch request to retrieve some data. I succeeded in writing the right request in JSON format. After that, I tried to translate it into Java. But when I print the request that the Java code sends to ES, the two requests are not the same, and I can't figure out how to fix that.
Here is the JSON request that returns the GOOD data:
{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "bool": {
          "must": [
            { "terms": { "accountId": ["107276450147"] } },
            { "range": {
                "date": {
                  "lt": "1480612801000",
                  "gte": "1478020801000"
                }
            }}
          ]
        }
      }
    }
  },
  "size": 0,
  "aggregations": {
    "field-aggregation": {
      "terms": {
        "field": "publicationId",
        "size": 2147483647
      },
      "aggregations": {
        "top-aggregation": {
          "top_hits": {
            "size": 1,
            "_source": {
              "includes": [],
              "excludes": []
            }
          }
        }
      }
    }
  }
}
And here is the Java-generated request, which does not return the good data:
{
  "from": 0,
  "size": 10,
  "aggregations": {
    "field-aggregation": {
      "terms": {
        "field": "publicationId",
        "size": 2147483647
      },
      "aggregations": {
        "top-aggregation": {
          "top_hits": {
            "size": 1,
            "_source": {
              "includes": [],
              "excludes": []
            }
          }
        }
      }
    }
  }
}
And finally, the Java code that generates the wrong JSON request:
TopHitsBuilder top = AggregationBuilders.topHits("top-aggregation")
    .setFetchSource(true)
    .setSize(1);
TermsBuilder field = AggregationBuilders.terms("field-aggregation")
    .field(aggFieldName)
    .size(Integer.MAX_VALUE)
    .subAggregation(top);
BoolFilterBuilder filterBuilder = FilterBuilders.boolFilter()
    .must(FilterBuilders.termsFilter("accountId", Collections.singletonList("107276450147")))
    .must(FilterBuilders.rangeFilter("date").gte(1478020801000L).lte(1480612801000L));
NativeSearchQueryBuilder query = new NativeSearchQueryBuilder()
    .withQuery(QueryBuilders.filteredQuery(QueryBuilders.matchAllQuery(), filterBuilder))
    .withIndices("metric")
    .withTypes(type)
    .addAggregation(field);
return template.query(query.build());
First of all, I need to remove the "size" : 10 and the "from" that the Java code generates. And then I have to add the filters. I did this, but they are never added.
Can you tell me what is wrong in my Java code and why the filters do not appear in the final JSON?
Thanks guys.
Thanks guys. I finally managed to solve the problem. The Java code was sending the right query, but I was looking in the wrong place in the ES Java API. Nevertheless, I set the request type to COUNT to keep ES from sending back the non-aggregated data, which is useless for me.
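For reference, a hedged sketch of what that might look like with the builder from the question; SearchType.COUNT existed in ES 1.x (on newer versions you would request size 0 instead):

// Setting the search type to COUNT suppresses the hits themselves
// ("from"/"size" no longer matter), so only the aggregations come back.
NativeSearchQueryBuilder query = new NativeSearchQueryBuilder()
    .withQuery(QueryBuilders.filteredQuery(QueryBuilders.matchAllQuery(), filterBuilder))
    .withSearchType(SearchType.COUNT)
    .withIndices("metric")
    .withTypes(type)
    .addAggregation(field);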

How to update nested field values in Elasticsearch using the Java API

I have the following document in Elasticsearch:
{
  "uuid": "123",
  "Email": "mail@example.com",
  "FirstName": "personFirstNmae",
  "LastName": "personLastName",
  "Inbox": {
    "uuid": "1234",
    "messageList": [
      {
        "uuid": "321",
        "Subject": "subject1",
        "Body": "bodyText1",
        "ArtworkUuid": "101",
        "DateAndTime": "2015-10-15T10:59:12.096+05:00",
        "ReadStatusInt": 0,
        "Delete": {
          "deleteStatus": 0,
          "deleteReason": 0
        }
      },
      {
        "uuid": "123",
        "Subject": "subject",
        "Body": "bodyText",
        "ArtworkUuid": "100",
        "DateAndTime": "2015-10-15T10:59:11.982+05:00",
        "ReadStatusInt": 1,
        "Delete": {
          "deleteStatus": 0,
          "deleteReason": 0
        }
      }
    ]
  }
}
And here is the mapping of the document:
{
  "testdb" : {
    "mappings" : {
      "directUser" : {
        "properties" : {
          "Email" : { "type" : "string", "store" : true },
          "FirstName" : { "type" : "string", "store" : true },
          "Inbox" : {
            "type" : "nested",
            "include_in_parent" : true,
            "properties" : {
              "messageList" : {
                "type" : "nested",
                "include_in_parent" : true,
                "properties" : {
                  "ArtworkUuid" : { "type" : "string", "store" : true },
                  "Body" : { "type" : "string", "store" : true },
                  "DateAndTime" : {
                    "type" : "date",
                    "store" : true,
                    "format" : "dateOptionalTime"
                  },
                  "Delete" : {
                    "type" : "nested",
                    "include_in_parent" : true,
                    "properties" : {
                      "deleteReason" : { "type" : "integer", "store" : true },
                      "deleteStatus" : { "type" : "integer", "store" : true }
                    }
                  },
                  "ReadStatusInt" : { "type" : "integer", "store" : true },
                  "Subject" : { "type" : "string", "store" : true },
                  "uuid" : { "type" : "string", "store" : true }
                }
              },
              "uuid" : { "type" : "string", "store" : true }
            }
          },
          "LastName" : { "type" : "string", "store" : true },
          "uuid" : { "type" : "string", "store" : true }
        }
      }
    }
  }
}
Now I want to update the values of Inbox.messageList.Delete.deleteStatus and Inbox.messageList.Delete.deleteReason from 0 to 1 in the message with uuid 321 (Inbox.messageList.uuid).
I want to achieve something like this:
{
  "uuid": "123",
  "Email": "mail@example.com",
  "FirstName": "personFirstNmae",
  "LastName": "personLastName",
  "Inbox": {
    "uuid": "1234",
    "messageList": [
      {
        "uuid": "321",
        "Subject": "subject1",
        "Body": "bodyText1",
        "ArtworkUuid": "101",
        "DateAndTime": "2015-10-15T10:59:12.096+05:00",
        "ReadStatusInt": 0,
        "Delete": {
          "deleteStatus": 1,
          "deleteReason": 1
        }
      },
      {
        "uuid": "123",
        "Subject": "subject",
        "Body": "bodyText",
        "ArtworkUuid": "100",
        "DateAndTime": "2015-10-15T10:59:11.982+05:00",
        "ReadStatusInt": 1,
        "Delete": {
          "deleteStatus": 0,
          "deleteReason": 0
        }
      }
    ]
  }
}
I am trying the following code to get my desired updated document:
var xb: XContentBuilder = XContentFactory.jsonBuilder().startObject()
  .startObject("Inbox")
xb.startArray("messageList")
xb.startObject()
xb.startObject("Delete")
xb.field("deleteStatus", 1)
xb.field("deleteReason", 1)
xb.endObject()
xb.endObject()
xb.endArray()
  .endObject()
xb.endObject()
val responseUpdate = client.prepareUpdate("testdb", "directUser", directUserObj.getUuid.toString())
  .setDoc(xb).execute().actionGet()
But with this code my document becomes:
{
  "uuid": "123",
  "Email": "mail@example.com",
  "FirstName": "personFirstNmae",
  "LastName": "personLastName",
  "Inbox": {
    "uuid": "1234",
    "messageList": [
      {
        "Delete": {
          "deleteStatus": 1,
          "deleteReason": 1
        }
      }
    ]
  }
}
And I do not want this. Please help me: how can I achieve my desired document? I am using Elasticsearch version 1.6.
The best way I've found to update a single nested field is to use the Elasticsearch update API that takes a (parameterised) script, a la this answer. Last time I checked, this kind of thing is only supported in Groovy scripts, not Lucene expression scripts (unfortunately). The reason your update produces the result it does is that you are replacing the whole nested object rather than updating a specific nested item. A Groovy script update will allow you to select and update only the nested object with the specified ID.
You can also have a look at the nested object that I updated, and how I used the UpdateRequest class in Java, here.
Specifically for the Java API, it is also possible to update a nested document as shown in this answer by PeteyPabPro.
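To make the scripted approach concrete, here is a rough ES 1.x Java API sketch, assuming dynamic Groovy scripting is enabled on the cluster (the exact builder methods vary slightly between 1.x releases):

// Walk the nested array in a Groovy script and flip the flags only on the
// message whose uuid matches the "msgId" parameter.
String script =
    "for (msg in ctx._source.Inbox.messageList) {" +
    "  if (msg.uuid == msgId) {" +
    "    msg.Delete.deleteStatus = 1;" +
    "    msg.Delete.deleteReason = 1;" +
    "  }" +
    "}";

client.prepareUpdate("testdb", "directUser", "123")   // parent document uuid
      .setScript(script, ScriptService.ScriptType.INLINE)
      .addScriptParam("msgId", "321")                  // nested message uuid
      .execute().actionGet();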

Retrieve values from nested JSON array in MongoDB

My mongo collection has entries in the following format
{
  "myobj" : {
    "objList" : [
      { "location" : "Texas" },
      { "location" : "Houston" },
      { "name" : "Sam" }
    ]
  },
  "category" : "cat1"
}
{
  "myobj" : {
    "objList" : [
      { "location" : "Tennesy" },
      { "location" : "NY" },
      { "location" : "SF" }
    ]
  },
  "category" : "cat2"
}
I want to extract the category where location is "Houston". In the case of a simple JSON object, I just have to pass it as a query like:
BasicDBObject place = new BasicDBObject();
place.put("location", "Houston");
But in the case of nested JSON, I don't know how to pass it as a query and get the appropriate category. That is, if I pass my location as "Houston", then it should return its appropriate category, "cat1". I hope my question is clear now.
Ok, you have your documents:
db.coll1.insert({
  "myobj" : {
    "objList" : [
      { "location" : "Texas" },
      { "location" : "Houston" },
      { "name" : "Sam" }
    ]
  },
  "category" : "cat1"
})
and
db.coll1.insert({
  "myobj" : {
    "objList" : [
      { "location" : "Tennesy" },
      { "location" : "Houston" },
      { "location" : "SF" }
    ]
  },
  "category" : "cat1"
})
Now you can find what you want using the dot operator:
db.coll1.find({"myobj.objList.location": "Texas"}).pretty() will return one object which has Texas
db.coll1.find({"myobj.objList.location": "SF"}).pretty() will return one object which has SF
db.coll1.find({"myobj.objList.location": "Houston"}).pretty() will return both objects
And now I hope you will be able to write it in Java. I have never used Java, but based on this question you can do something like the line below. If it does not work, just look up how to use dot notation in the Java driver for Mongo:
DBCursor cursor = coll1.find(new BasicDBObject("myobj.objList.location", "Texas"));
P.S. You said that you wanted to retrieve category. In that case, you will need to use a projection: db.coll1.find({<the query above>}, {category: 1, _id: 0})
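Putting the dot notation and the projection together, a minimal Java sketch with the legacy DBCollection API from the question might look like this:

import com.mongodb.BasicDBObject;
import com.mongodb.DBCursor;

// Match documents whose nested objList contains location "Houston",
// and project only the category field.
BasicDBObject query = new BasicDBObject("myobj.objList.location", "Houston");
BasicDBObject projection = new BasicDBObject("category", 1).append("_id", 0);

DBCursor cursor = coll1.find(query, projection);
while (cursor.hasNext()) {
    System.out.println(cursor.next().get("category")); // prints "cat1"
}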

Format date in elasticsearch query (during retrieval)

I have an Elasticsearch index with a field "aDate" (and lots of other fields) with the following mapping:
"aDate" : {
"type" : "date",
"format" : "date_optional_time"
}
When I query for a document, I get a result like:
"aDate" : 1421179734000,
I know this is the epoch, the internal Java/Elasticsearch date format, but I want to get a result like:
"aDate" : "2015-01-13T20:08:54",
I played around with scripting:
{
  "query": {
    "match_all": {}
  },
  "script_fields": {
    "aDate": {
      "script": "if (!_source.aDate?.equals('null')) new java.text.SimpleDateFormat('yyyy-MM-dd\\'T\\'HH:mm:ss').format(new java.util.Date(_source.aDate));"
    }
  }
}
but it gives strange results (the script basically works, but aDate is the only field returned and _source is missing). It looks like:
"hits": [{
"_index": "idx1",
"_type": "type2",
"_id": "8770",
"_score": 1.0,
"fields": {
"aDate": ["2015-01-12T17:15:47"]
}
},
I would prefer a solution without scripting if possible.
When you run a query in Elasticsearch, you can ask it to return the raw data, for example by specifying fields:
curl -XGET http://localhost:9200/myindex/date-test/_search?pretty -d '
{
  "fields" : "aDate",
  "query" : {
    "match_all" : {}
  }
}'
This will give you the date in the format in which you originally stored it:
{
  "_index" : "myindex",
  "_type" : "date-test",
  "_id" : "AUrlWNTAk1DYhbTcL2xO",
  "_score" : 1.0,
  "fields" : {
    "aDate" : [ "2015-01-13T20:08:56" ]
  }
}, {
  "_index" : "myindex",
  "_type" : "date-test",
  "_id" : "AUrlQnFgk1DYhbTcL2xM",
  "_score" : 1.0,
  "fields" : {
    "aDate" : [ 1421179734000 ]
  }
}
It's not possible to change the date format unless you use a script.
curl -XGET http://localhost:9200/myindex/date-test/_search?pretty -d '
{
  "query" : {
    "match_all" : {}
  },
  "script_fields" : {
    "aDate" : {
      "script" : "use( groovy.time.TimeCategory ) { new Date( doc[\"aDate\"].value ) }"
    }
  }
}'
Will return:
{
  "_index" : "myindex",
  "_type" : "date-test",
  "_id" : "AUrlWNTAk1DYhbTcL2xO",
  "_score" : 1.0,
  "fields" : {
    "aDate" : [ "2015-01-13T20:08:56.000Z" ]
  }
}, {
  "_index" : "myindex",
  "_type" : "date-test",
  "_id" : "AUrlQnFgk1DYhbTcL2xM",
  "_score" : 1.0,
  "fields" : {
    "aDate" : [ "2015-01-13T20:08:54.000Z" ]
  }
}
To apply a format, append it as follows:
"script":"use( groovy.time.TimeCategory ){ new Date( doc[\"aDate\"].value ).format(\"yyyy-MM-dd\") }"
will return "aDate" : [ "2015-01-13" ]
To display the T, you'll need to use quotes but replace them with the Unicode equivalent:
"script":"use( groovy.time.TimeCategory ){ new Date( doc[\"aDate\"].value ).format(\"yyyy-MM-dd\u0027T\u0027HH:mm:ss\") }"
returns "aDate" : [ "2015-01-13T20:08:54" ]
To return both script_fields and _source
Use _source in your query to specify the fields you want to return:
curl -XGET http://localhost:9200/myindex/date-test/_search?pretty -d '
{
  "_source" : "name",
  "query" : {
    "match_all" : {}
  },
  "script_fields" : {
    "aDate" : {
      "script" : "use( groovy.time.TimeCategory ) { new Date( doc[\"aDate\"].value ) }"
    }
  }
}'
Will return my name field:
"_source":{"name":"Terry"},
"fields" : {
"aDate" : [ "2015-01-13T20:08:56.000Z" ]
}
Using an asterisk will return all fields, e.g. "_source" : "*":
"_source":{"name":"Terry","aDate":1421179736000},
"fields" : {
"aDate" : [ "2015-01-13T20:08:56.000Z" ]
}
Since 5.0.0, Elasticsearch uses Painless as its scripting language.
Try this (works in 6.3.2):
"script":"doc['aDate'].value.toString('yyyy-MM-dd HH:mm:ss')"
As LabOctoCat mentioned, Olly Cruickshank's answer no longer works in Elasticsearch 2.2. I changed the script to:
"script":"new Date(doc['time'].value)"
You can format the date according to this.
Scripting only computes the answer when the document is retrieved. This is expensive, and it keeps you from using any date-related search functions in Elasticsearch.
You should store the value in an Elasticsearch "date" field when inserting it. It looks like a Java Date() object will do.
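As an illustration of that advice, a sketch that formats the date on the way in, so no script is needed at query time; the index and type names simply mirror the examples above.

import java.text.SimpleDateFormat;
import java.util.Date;

// Index the date as an ISO string that the date_optional_time format can parse.
String iso = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss").format(new Date());
client.prepareIndex("myindex", "date-test")
      .setSource("aDate", iso)
      .execute().actionGet();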
Thanks @Archon for your suggestion. I used your answer as a guide to remove the time element from a datetime field in Elasticsearch:
{
  "aggs": {
    "grp_by_date": {
      "terms": {
        "size": 200,
        "script": "doc['TransactionReconciliationsCreated'].value.toString('yyyy-MM-dd')"
      }
    }
  }
}
If you use Elasticsearch 7 and want to display a datetime in a specified timezone, you can request it like this:
"query": {
"bool": {
"filter": [
{
"term": {
"client": {
"value": "iOS",
"boost": 1
}
}
}
],
"adjust_pure_negative": true,
"boost": 1
}
},
"script_fields": {
"time": {
"script": "ZonedDateTime input = doc['time'].value; input = input.withZoneSameInstant(ZoneId.of('Asia/Shanghai')); String output = input.format(DateTimeFormatter.ISO_ZONED_DATE_TIME); return output"
}
},
"_source": true,
This returns:
{
  ...
  "_source" : {
    ...
    "time" : 1632903354213
    ...
  },
  "fields" : {
    "time" : [
      "2021-09-29T16:15:54.213+08:00[Asia/Shanghai]"
    ]
  }
},
...
}
