mongo + spring data + aggragate sum - java

I am looking for a solution without spring data. My project requirement is to do without spring data.
To calculate the sum using aggregate function by mongo command, able to get output. But same by using spring data getting exception.
Sample mongo query :
db.getCollection('events_collection').aggregate(
{ "$match" : { "store_no" : 3201 , "event_id" : 882800} },
{ "$group" : { "_id" : "$load_dt", "event_id": { "$first" : "$event_id" }, "start_dt" : { "$first" : "$start_dt" }, "count" : { "$sum" : 1 } } },
{ "$sort" : { "_id" : 1 } },
{ "$project" : { "load_dt" : "$_id", "ksn_cnt" : "$count", "event_id" : 1, "start_dt" : 1, "_id" : 0 } }
)
Same thing done in java as,
String json = "[ { \"$match\": { \"store_no\": 3201, \"event_id\": 882800 } }, { \"$group\": { \"_id\": \"$load_dt\", \"event_id\": { \"$first\": \"$event_id\" }, \"start_dt\": { \"$first\": \"$start_dt\" }, \"count\": { \"$sum\": 1 } } }, { \"$sort\": { \"_id\": 1 } }, { \"$project\": { \"load_dt\": \"$_id\", \"ksn_cnt\": \"$count\", \"event_id\": 1, \"start_dt\": 1, \"_id\": 0 } } ]";
BasicDBList pipeline = (BasicDBList) JSON.parse(json);
System.out.println(pipeline);
AggregationOutput output = col.aggregate(pipeline);
exception is :
com.mongodb.CommandFailureException: { "serverUsed" : "somrandomserver/10.10.10.10:27001" , "errmsg" : "exception: pipeline element 0 is not an object" , "code" : 15942 , "ok" : 0.0}
Could someone please suggest how to use aggregate function with spring?

Try the following (untested) Spring Data MongoDB aggregation equivalent
import static org.springframework.data.mongodb.core.aggregation.Aggregation.*;
MongoTemplate mongoTemplate = repository.getMongoTemplate();
Aggregation agg = newAggregation(
match(Criteria.where("store_no").is(3201).and("event_id").is(882800)),
group("load_dt")
.first("event_id").as("event_id")
.first("start_dt").as("start_dt")
.count().as("ksn_cnt"),
sort(ASC, previousOperation()),
project("ksn_cnt", "event_id", "start_dt")
.and("load_dt").previousOperation()
.and(previousOperation()).exclude()
);
AggregationResults<OutputType> result = mongoTemplate.aggregate(agg,
"events_collection", OutputType.class);
List<OutputType> mappedResult = result.getMappedResults();
As a first step, filter the input collection by using a match operation which accepts a Criteria query as an argument.
In the second step, group the intermediate filtered documents by the "load_dt" field and calculate the document count and store the result in the new field "ksn_cnt".
Sort the intermediate result by the id-reference of the previous group operation as given by the previousOperation() method.
Finally in the fourth step, select the "ksn_cnt", "event_id", and "start_dt" fields from the previous group operation. Note that "load_dt" again implicitly references an group-id field. Since you do not want an implicit generated id to appear, exclude the id from the previous operation via and(previousOperation()).exclude().
Note that if you provide an input class as the first parameter to the newAggregation method the MongoTemplate will derive the name of the input collection from this class. Otherwise if you don’t not specify an input class you must provide the name of the input collection explicitly. If an input-class and an input-collection is provided the latter takes precedence.

Related

How to create a Mongodb Distinct search with a filter?

I was looking to create a distinct search in a MongoDB collection with a filter condition.
The collection is alertStatus with the following format:
{ 'id' : 0, 'group' : "groupe1", 'state' : "PENDING" }
{ 'id' : 1, 'group' : "groupe2", 'state' : "PENDING" }
{ 'id' : 2, 'group' : "groupe3", 'state' : "DONE" }
{ 'id' : 3, 'group' : "groupe2", 'state' : "PENDING" }
I want the list of different groups where the state is "PENDING"
Here is a working solution for Java/SpringBoot :
#Autowired
protected MongoTemplate mongoTemplate;
DistinctIterable<String> groups = mongoTemplate.getCollection("alertStatus")
.distinct("group",String.class)
.filter(new Document("state","PENDING"));
for ( String g : groups) {
log.info(g);
}
If someone has a solution using JPA, please let me know.

I need to retrieve MongoDB's object just with filtered's array item

I'm needing to retrieve just with two dates, all the documents from my MongoDB's collection, with the filtered items from the array.
This is an example of 2 of my documents;
{
"_id" : ObjectId("5f18fa823406b7000132d097"),
"last_date" : "22/07/2020 23:48:32",
"history_dates" : [
"22/07/2020 23:48:32",
"22/07/2020 00:18:53",
"23/07/2020 00:49:12",
"23/07/2020 01:19:30"
],
"hostname" : "MyHostname1",
"ip" : "142.0.111.79",
"component" : "C:\\Windows\\System32\\es-ES\\KernelBase.dll.mui",
"process" : "LogonUI.exe",
"date" : "23/07/2020 10:26:04",
}
{
"_id" : ObjectId("5f18fa823406b7000132d098"),
"last_date" : "22/07/2020 23:48:33",
"history_dates" : [
"22/07/2020 23:48:33",
"23/07/2020 00:18:53",
],
"hostname" : "MyHostName2",
"ip" : "142.0.111.54",
"component" : "C:\\Windows\\System32\\es-ES\\KernelBase.dll.mui",
"process" : "svchost.exe",
"date" : "23/07/2020 10:26:04",
}
I'm needing to make a find to my database (Using Spring Data), to retrieve the same objects, but with the "history_dates"'s array filtered between the 2 dates recieved.
For example, if my 2 recieved dates are: "23/07/2020" and "24/07/2020", I want MongoDB to return the next objects;
{
"_id" : ObjectId("5f18fa823406b7000132d097"),
"last_date" : "22/07/2020 23:48:32",
"history_dates" : [
"23/07/2020 00:49:12",
"23/07/2020 01:19:30"
],
"hostname" : "MyHostname1",
"ip" : "142.0.111.79",
"component" : "C:\\Windows\\System32\\es-ES\\KernelBase.dll.mui",
"process" : "LogonUI.exe",
"date" : "23/07/2020 10:26:04",
}
{
"_id" : ObjectId("5f18fa823406b7000132d098"),
"last_date" : "22/07/2020 23:48:33",
"history_dates" : [
"23/07/2020 00:18:53"
],
"hostname" : "MyHostName2",
"ip" : "142.0.111.54",
"component" : "C:\\Windows\\System32\\es-ES\\KernelBase.dll.mui",
"process" : "svchost.exe",
"date" : "23/07/2020 10:26:04",
}
I'm really ignorant about MongoDB's queries, and I have been trying to make this with Spring Data all the week.
UPDATE 1.
Thanks varman, and do you know how can i just retrieve the documents with filtered arrays not empty?
So basically you need to do filter. MongoTemplate offers a lot of operation for mongodb, if some methods don't exist in MongoTemplate, we can go with Bson Document pattern. In that case, try this article: Trick to covert mongo shell query.
Actually you need a Mongo query something like following. Using $addFields one of the methods shown below. But you can use $project, $set etc. Here $addFields overwrites your history_dates. (It uses to add new fields to document too).
{
$addFields: {
history_dates: {
$filter: {
input: "$history_dates",
cond: {
$and: [{
$gt: ["$$this", "23/07/2020"]
},
{
$lt: ["$$this", "24/07/2020"]
}
]
}
}
}
}
}
Working Mongo playground.
You need to convert this into spring data. So #Autowired the MongoTemplate in you class.
#Autowired
MongoTemplate mongoTemplate;
The method is,
public List<Object> filterDates(){
Aggregation aggregation = Aggregation.newAggregation(
a->new Document("$addFields",
new Document("history_dates",
new Document("$filter",
new Document("input","$history_dates")
.append("cond",
new Document("$and",
Arrays.asList(
new Document("$gt",Arrays.asList("$$this","23/07/2020")),
new Document("$lt",Arrays.asList("$$this","24/07/2020"))
)
)
)
)
)
)
).withOptions(AggregationOptions.builder().allowDiskUse(Boolean.TRUE).build());
return mongoTemplate.aggregate(aggregation, mongoTemplate.getCollectionName(YOUR_CLASS.class), Object.class).getMappedResults();
}
Mongo template doesn't provide add methods for $addFields and $filter. So we just go with bson document pattern. I haven't tested this in Spring.

Two Aggregate Totals in One Group

I wrote a query in MongoDB as follows:
db.getCollection('student').aggregate(
[
{
$match: { "student_age" : { "$ne" : 15 } }
},
{
$group:
{
_id: "$student_name",
count: {$sum: 1},
sum1: {$sum: "$student_age"}
}
}
])
In others words, I want to fetch the count of students that aren't 15 years old and the summary of their age. The query works fine and I get two data items.
In my application, I want to do the query by Spring Data.
I wrote the following code:
Criteria where = Criteria.where("AGE").ne(15);
Aggregation aggregation = Aggregation.newAggregation(
Aggregation.match(where),
Aggregation.group().sum("student_age").as("totalAge"),
count().as("countOfStudentNot15YearsOld"));
When this code is run, the output query will be:
"aggregate" : "MyDocument", "pipeline" :
[ { "$match" { "AGE" : { "$ne" : 15 } } },
{ "$group" : { "_id" : null, "totalAge" : { "$sum" : "$student_age" } } },
{ "$count" : "countOfStudentNot15YearsOld" }],
"cursor" : { "batchSize" : 2147483647 }
Unfortunately, the result is only countOfStudentNot15YearsOld item.
I want to fetch the result like my native query.
If your're asking to return the grouping for both "15" and "not 15" as a result then you're looking for the $cond operator which will allow a "branching" based on conditional evaluation.
From the "shell" content you would use it like this:
db.getCollection('student').aggregate([
{ "$group": {
"_id": null,
"countFiteen": {
"$sum": {
"$cond": [{ "$eq": [ "$student_age", 15 ] }, 1, 0 ]
}
},
"countNotFifteen": {
"$sum": {
"$cond": [{ "$ne": [ "$student_age", 15 ] }, 1, 0 ]
}
},
"sumNotFifteen": {
"$sum": {
"$cond": [{ "$ne": [ "$student_age", 15 ] }, "$student_age", 0 ]
}
}
}}
])
So you use the $cond to perform a logical test, in this case whether the "student_age" in the current document being considered is 15 or not, then you can return a numerical value in response which is 1 here for "counting" or the actual field value when that is what you want to send to the accumulator instead. In short it's a "ternary" operator or if/then/else condition ( which in fact can be shown in the more expressive form with keys ) you can use to test a condition and decide what to return.
For the spring mongodb implementation you use ConditionalOperators.Cond to construct the same BSON expressions:
import org.springframework.data.mongodb.core.aggregation.*;
ConditionalOperators.Cond isFifteen = ConditionalOperators.when(new Criteria("student_age").is(15))
.then(1).otherwise(0);
ConditionalOperators.Cond notFifteen = ConditionalOperators.when(new Criteria("student_age").ne(15))
.then(1).otherwise(0);
ConditionalOperators.Cond sumNotFifteen = ConditionalOperators.when(new Criteria("student_age").ne(15))
.thenValueOf("student_age").otherwise(0);
GroupOperation groupStage = Aggregation.group()
.sum(isFifteen).as("countFifteen")
.sum(notFifteen).as("countNotFifteen")
.sum(sumNotFifteen).as("sumNotFifteen");
Aggregation aggregation = Aggregation.newAggregation(groupStage);
So basically you just extend off of that logic, using .then() for a "constant" value such as 1 for the "counts", and .thenValueOf() where you actually need the "value" of a field from the document, so basically equal to the "$student_age" as shown for the common shell notation.
Since ConditionalOperators.Cond shares the AggregationExpression interface, this can be used with .sum() in the form that accepts an AggregationExpression as opposed to a string. This is an improvement on past releases of spring mongo which would require you to perform a $project stage so there were actual document properties for the evaluated expression prior to performing a $group.
If all you want is to replicate the original query for spring mongodb, then your mistake was using the $count aggregation stage rather than appending to the group():
Criteria where = Criteria.where("AGE").ne(15);
Aggregation aggregation = Aggregation.newAggregation(
Aggregation.match(where),
Aggregation.group()
.sum("student_age").as("totalAge")
.count().as("countOfStudentNot15YearsOld")
);

DB script for changing the model of mongoDB collection [duplicate]

In MongoDB, is it possible to update the value of a field using the value from another field? The equivalent SQL would be something like:
UPDATE Person SET Name = FirstName + ' ' + LastName
And the MongoDB pseudo-code would be:
db.person.update( {}, { $set : { name : firstName + ' ' + lastName } );
The best way to do this is in version 4.2+ which allows using the aggregation pipeline in the update document and the updateOne, updateMany, or update(deprecated in most if not all languages drivers) collection methods.
MongoDB 4.2+
Version 4.2 also introduced the $set pipeline stage operator, which is an alias for $addFields. I will use $set here as it maps with what we are trying to achieve.
db.collection.<update method>(
{},
[
{"$set": {"name": { "$concat": ["$firstName", " ", "$lastName"]}}}
]
)
Note that square brackets in the second argument to the method specify an aggregation pipeline instead of a plain update document because using a simple document will not work correctly.
MongoDB 3.4+
In 3.4+, you can use $addFields and the $out aggregation pipeline operators.
db.collection.aggregate(
[
{ "$addFields": {
"name": { "$concat": [ "$firstName", " ", "$lastName" ] }
}},
{ "$out": <output collection name> }
]
)
Note that this does not update your collection but instead replaces the existing collection or creates a new one. Also, for update operations that require "typecasting", you will need client-side processing, and depending on the operation, you may need to use the find() method instead of the .aggreate() method.
MongoDB 3.2 and 3.0
The way we do this is by $projecting our documents and using the $concat string aggregation operator to return the concatenated string.
You then iterate the cursor and use the $set update operator to add the new field to your documents using bulk operations for maximum efficiency.
Aggregation query:
var cursor = db.collection.aggregate([
{ "$project": {
"name": { "$concat": [ "$firstName", " ", "$lastName" ] }
}}
])
MongoDB 3.2 or newer
You need to use the bulkWrite method.
var requests = [];
cursor.forEach(document => {
requests.push( {
'updateOne': {
'filter': { '_id': document._id },
'update': { '$set': { 'name': document.name } }
}
});
if (requests.length === 500) {
//Execute per 500 operations and re-init
db.collection.bulkWrite(requests);
requests = [];
}
});
if(requests.length > 0) {
db.collection.bulkWrite(requests);
}
MongoDB 2.6 and 3.0
From this version, you need to use the now deprecated Bulk API and its associated methods.
var bulk = db.collection.initializeUnorderedBulkOp();
var count = 0;
cursor.snapshot().forEach(function(document) {
bulk.find({ '_id': document._id }).updateOne( {
'$set': { 'name': document.name }
});
count++;
if(count%500 === 0) {
// Excecute per 500 operations and re-init
bulk.execute();
bulk = db.collection.initializeUnorderedBulkOp();
}
})
// clean up queues
if(count > 0) {
bulk.execute();
}
MongoDB 2.4
cursor["result"].forEach(function(document) {
db.collection.update(
{ "_id": document._id },
{ "$set": { "name": document.name } }
);
})
You should iterate through. For your specific case:
db.person.find().snapshot().forEach(
function (elem) {
db.person.update(
{
_id: elem._id
},
{
$set: {
name: elem.firstname + ' ' + elem.lastname
}
}
);
}
);
Apparently there is a way to do this efficiently since MongoDB 3.4, see styvane's answer.
Obsolete answer below
You cannot refer to the document itself in an update (yet). You'll need to iterate through the documents and update each document using a function. See this answer for an example, or this one for server-side eval().
For a database with high activity, you may run into issues where your updates affect actively changing records and for this reason I recommend using snapshot()
db.person.find().snapshot().forEach( function (hombre) {
hombre.name = hombre.firstName + ' ' + hombre.lastName;
db.person.save(hombre);
});
http://docs.mongodb.org/manual/reference/method/cursor.snapshot/
Starting Mongo 4.2, db.collection.update() can accept an aggregation pipeline, finally allowing the update/creation of a field based on another field:
// { firstName: "Hello", lastName: "World" }
db.collection.updateMany(
{},
[{ $set: { name: { $concat: [ "$firstName", " ", "$lastName" ] } } }]
)
// { "firstName" : "Hello", "lastName" : "World", "name" : "Hello World" }
The first part {} is the match query, filtering which documents to update (in our case all documents).
The second part [{ $set: { name: { ... } }] is the update aggregation pipeline (note the squared brackets signifying the use of an aggregation pipeline). $set is a new aggregation operator and an alias of $addFields.
Regarding this answer, the snapshot function is deprecated in version 3.6, according to this update. So, on version 3.6 and above, it is possible to perform the operation this way:
db.person.find().forEach(
function (elem) {
db.person.update(
{
_id: elem._id
},
{
$set: {
name: elem.firstname + ' ' + elem.lastname
}
}
);
}
);
I tried the above solution but I found it unsuitable for large amounts of data. I then discovered the stream feature:
MongoClient.connect("...", function(err, db){
var c = db.collection('yourCollection');
var s = c.find({/* your query */}).stream();
s.on('data', function(doc){
c.update({_id: doc._id}, {$set: {name : doc.firstName + ' ' + doc.lastName}}, function(err, result) { /* result == true? */} }
});
s.on('end', function(){
// stream can end before all your updates do if you have a lot
})
})
update() method takes aggregation pipeline as parameter like
db.collection_name.update(
{
// Query
},
[
// Aggregation pipeline
{ "$set": { "id": "$_id" } }
],
{
// Options
"multi": true // false when a single doc has to be updated
}
)
The field can be set or unset with existing values using the aggregation pipeline.
Note: use $ with field name to specify the field which has to be read.
Here's what we came up with for copying one field to another for ~150_000 records. It took about 6 minutes, but is still significantly less resource intensive than it would have been to instantiate and iterate over the same number of ruby objects.
js_query = %({
$or : [
{
'settings.mobile_notifications' : { $exists : false },
'settings.mobile_admin_notifications' : { $exists : false }
}
]
})
js_for_each = %(function(user) {
if (!user.settings.hasOwnProperty('mobile_notifications')) {
user.settings.mobile_notifications = user.settings.email_notifications;
}
if (!user.settings.hasOwnProperty('mobile_admin_notifications')) {
user.settings.mobile_admin_notifications = user.settings.email_admin_notifications;
}
db.users.save(user);
})
js = "db.users.find(#{js_query}).forEach(#{js_for_each});"
Mongoid::Sessions.default.command('$eval' => js)
With MongoDB version 4.2+, updates are more flexible as it allows the use of aggregation pipeline in its update, updateOne and updateMany. You can now transform your documents using the aggregation operators then update without the need to explicity state the $set command (instead we use $replaceRoot: {newRoot: "$$ROOT"})
Here we use the aggregate query to extract the timestamp from MongoDB's ObjectID "_id" field and update the documents (I am not an expert in SQL but I think SQL does not provide any auto generated ObjectID that has timestamp to it, you would have to automatically create that date)
var collection = "person"
agg_query = [
{
"$addFields" : {
"_last_updated" : {
"$toDate" : "$_id"
}
}
},
{
$replaceRoot: {
newRoot: "$$ROOT"
}
}
]
db.getCollection(collection).updateMany({}, agg_query, {upsert: true})
(I would have posted this as a comment, but couldn't)
For anyone who lands here trying to update one field using another in the document with the c# driver...
I could not figure out how to use any of the UpdateXXX methods and their associated overloads since they take an UpdateDefinition as an argument.
// we want to set Prop1 to Prop2
class Foo { public string Prop1 { get; set; } public string Prop2 { get; set;} }
void Test()
{
var update = new UpdateDefinitionBuilder<Foo>();
update.Set(x => x.Prop1, <new value; no way to get a hold of the object that I can find>)
}
As a workaround, I found that you can use the RunCommand method on an IMongoDatabase (https://docs.mongodb.com/manual/reference/command/update/#dbcmd.update).
var command = new BsonDocument
{
{ "update", "CollectionToUpdate" },
{ "updates", new BsonArray
{
new BsonDocument
{
// Any filter; here the check is if Prop1 does not exist
{ "q", new BsonDocument{ ["Prop1"] = new BsonDocument("$exists", false) }},
// set it to the value of Prop2
{ "u", new BsonArray { new BsonDocument { ["$set"] = new BsonDocument("Prop1", "$Prop2") }}},
{ "multi", true }
}
}
}
};
database.RunCommand<BsonDocument>(command);
MongoDB 4.2+ Golang
result, err := collection.UpdateMany(ctx, bson.M{},
mongo.Pipeline{
bson.D{{"$set",
bson.M{"name": bson.M{"$concat": []string{"$lastName", " ", "$firstName"}}}
}},
)

Spring Data Mongo Template - Counting an array

Mongo document:
{
"_id" : "1",
"array" : [
{
"item" : "item"
},
{
"item" : "item"
}
]
}
My mongo shell query looks like so:
db.getCollection('collectionName').aggregate(
{$match: { _id: "1"}},
{$project: { count: { $size:"$array" }}}
)
Is there anyway to implement this using the Mongo Template from Spring?
So far I have this:
MatchOperation match = new MatchOperation(Criteria.where("_id").is("1"));
ProjectionOperation project = new ProjectionOperation();
Aggregation aggregate = Aggregation.newAggregation(match, project);
mongoTemplate.aggregate(aggregate, collectionName, Integer.class);
I think I am only missing the project logic but I'm not sure if it is possible to do $size or equivalent here.
It's quite possible, the $size operator is supported (see DATAMONGO-979 and its implementation here). Your implementation could follow this example:
import static org.springframework.data.mongodb.core.aggregation.Aggregation.*;
Aggregation agg = new Aggregation(
match(where("_id").is("1")), //
project() //
.and("array") //
.size() //
.as("count")
);
AggregationResults<IntegerCount> results = mongoTemplate.aggregate(
agg, collectionName, Integer.class
);
List<IntegerCount> intCount = results.getMappedResults();
Please find below the sample code. You can change it accordingly for your requirement with collection name, collection name class and array field name.
MatchOperation match = new MatchOperation(Criteria.where("_id").is("1"));
Aggregation aggregate = Aggregation.newAggregation(match, Aggregation.project().and("array").project("size").as("count"));
AggregationResults<CollectionNameClass> aggregateResult = mongoOperations.aggregate(aggregate, "collectionName", <CollectionNameClass>.class);
if (aggregateResult!=null) {
//You can find the "count" as an attrribute inside "result" key
System.out.println("Output ====>" + aggregateResult.getRawResults().get("result"));
System.out.println("Output ====>" + aggregateResult.getRawResults().toMap());
}
Sample output:-
Output ====>[ { "_id" : "3ea8e671-1e64-4cde-bd78-5980049a772b" , "count" : 47}]
Output ====>{serverUsed=127.0.0.1:27017, waitedMS=0, result=[ { "_id" : "3ea8e671-1e64-4cde-bd78-5980049a772b" , "count" : 47}], ok=1.0}
You can write query as
Aggregation aggregate = Aggregation.newAggregation(Aggregation.match(Criteria.where("_id").is(1)),
Aggregation.project().and("array").size().as("count")); mongoTemplate.aggregate(aggregate, collectionName, Integer.class);
It will execute the following query { "aggregate" : "collectionName" , "pipeline" : [ { "$match" : { "_id" : 1}} , { "$project" : { "count" : { "$size" : [ "$array"]}}}]}

Categories