Get month wise count in MongoDB/Java Springboot - java

I have a collection in mongodb with below data:
collection name: runState
runId: 1
startTime:2020-09-16T20:56:06.598+00:00
endTime:2020-09-16T20:57:09.196+00:00
product_action: org_rhel_oracle_install
Task: completed
ranBy:David
runId: 2
startTime:2021-01-11T20:56:06.598+00:00
endTime:2021-01-11T20:56:09.196+00:00
product_action: org_rhel_oracle_install
Task: completed
ranBy:John
runId: 2
startTime:2021-01-27T20:56:06.598+00:00
endTime:2021-01-27T20:56:09.196+00:00
product_action: org_rhel_oracle_install
Task: completed
ranBy:John
runId: 3
startTime:2021-01-11T20:56:06.598+00:00
endTime:2021-01-11T20:57:09.196+00:00
product_action: org_rhel_postgres_install
Task: completed
ranBy:John
runId: 4
startTime:2021-02-09T20:56:06.598+00:00
endTime:2021-02-09T20:57:09.196+00:00
product_action: org_rhel_oracle_install
Task: completed
ranBy:John
runId: 5
startTime:2021-02-09T20:56:06.598+00:00
endTime:2021-02-09T20:57:09.196+00:00
product_action: org_rhel_postgres_install
Task: completed
ranBy:John
runId: 6
startTime:2021-09-09T20:56:06.598+00:00
endTime:2021-09-09T20:57:09.196+00:00
product_action: org_rhel_postgres_install
Task: completed
ranBy:John
runId: 7
startTime:2022-01-09T20:56:06.598+00:00
endTime:2022-01-09T20:57:09.196+00:00
product_action: org_rhel_oracle_install
Task: completed
ranBy:David
runId: 8
startTime:2022-01-10T20:56:06.598+00:00
endTime:2022-01-10T20:57:09.196+00:00
product_action: org_rhel_oracle_install
Task: failed
ranBy:David
I want a month-wise count for the last 12 months (Jan 2021 to Jan 2022) for each product where the task is completed (the product can be derived from product_action).
Output should be in below format:
{
"_id" : "postgres",
completed: [
{
"month" : "FEB-2021",
"count" : 1
},
{
"month" : "SEP-2021",
"count" : 1
},
{
"month" : "JAN-2021",
"count" : 1
}
]
},
{
"_id" : "oracle",
"completed" : [
{
"month" : "FEB-2021",
"count" : 1
},
{
"month" : "JAN-2021",
"count" : 2
}
]
}
I have started with below, but not sure how to get count for month wise like above.
{"product_action":{$regex:"postgres|oracle"},"Task":"completed"}
As this is new to me, can someone help me with the MongoDB query to get this result, and also with the code to achieve it in Java Spring Boot?
This is the Java code I tried using aggregation, but it does not yield the result I want.
Aggregation agg = Aggregation.newAggregation(
Aggregation.project("endTime","Task","product_action").and(DateOperators.Month.monthOf("endTime")).as("month"),
Aggregation.match(Criteria.where("product_action").regex("postgres|oracle").and("Task").is("completed")
.and("endTime").gte(parseDate("2021-02-01"))),
Aggregation.group("month","Task").count().as("count")
);

Try this on for size:
db.foo.aggregate([
// Get the easy stuff out of the way. Filter for the desired date range and only
// those items that are complete:
{$match: {$and: [
{"endTime":{$gte:new ISODate("2021-01-01")}},
{"endTime":{$lt:new ISODate("2022-01-01")}},
{"Task":"completed"}
]} }
// Now group by product and date expressed as month-year. The product
// is embedded in the field value so there are a few approaches to digging
// it out. Here, we split on underscore and take the [2] item.
,{$group: {_id: {
p: {$arrayElemAt:[{$split:["$product_action","_"]},2]},
d: {$dateToString: {date: "$endTime", format: "%m-%Y"}}
},
n: {$sum: 1}
}}
// The OP seeks to make the date component nested inside the product
// instead of having it as a two-part grouping. We will "regroup" and
// create an array. This is slightly different than the format indicated
// by the OP but values as keys (e.g. "Jan-2021: 2") is in general a
// poor idea so instead we construct an array of proper name:value pairs.
,{$group: {_id: '$_id.p',
completed: {$push: {d: '$_id.d', n: '$n'}}
}}
]);
which yields
{
"_id" : "postgres",
"completed" : [
{
"d" : "02-2021",
"n" : 1
},
{
"d" : "09-2021",
"n" : 1
},
{
"d" : "01-2021",
"n" : 1
}
]
}
{
"_id" : "oracle",
"completed" : [
{
"d" : "02-2021",
"n" : 1
},
{
"d" : "01-2021",
"n" : 2
}
]
}
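The $match stage above hard-codes its date bounds. For a rolling window such as "the last 12 months", the bounds could be computed in Java first. The following is a minimal sketch under that assumption; the class and method names (MonthWindow, startOfMonth) are hypothetical and UTC is assumed for the month boundaries.

```java
import java.time.Instant;
import java.time.YearMonth;
import java.time.ZoneOffset;

final class MonthWindow {
    // Returns the first instant (UTC) of the month that lies
    // `monthsBack` months before `current`.
    static Instant startOfMonth(YearMonth current, long monthsBack) {
        return current.minusMonths(monthsBack)
                .atDay(1)
                .atStartOfDay(ZoneOffset.UTC)
                .toInstant();
    }

    public static void main(String[] args) {
        YearMonth now = YearMonth.now(ZoneOffset.UTC);
        Instant from = startOfMonth(now, 12); // inclusive lower bound for $gte
        Instant to = startOfMonth(now, 0);    // exclusive upper bound for $lt
        System.out.println(from + " .. " + to);
    }
}
```

The two Instant values can then be supplied as the $gte and $lt bounds on endTime, either as java.util.Date values via Date.from(...) or as ISO-8601 strings via toString().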
UPDATED
It has come up before that the $dateToString function has no format specifier that produces the three-letter abbreviation for a month, e.g. JAN (or the long form, e.g. January, for that matter). Sorting still works with 01-2021, 02-2021, 04-2021, whereas JAN-2021, FEB-2021, APR-2021 do not sort chronologically. (MM-YYYY strings sort by month first, so the order is chronological only while the range stays within one calendar year, as it does here.) If such output is really desired directly from the DB instead of post-processing in client-side code, then the second $group is replaced by a $sort and a $group as follows:
// Ensure the NN-YYYY dates are going in increasing order. The product
// component _id.p does not matter here -- only the dates have to be
// increasing. NOTE: This is OPTIONAL with respect to changing
// NN-YYYY into MON-YYYY but almost always the follow on question is
// how to get the completed list in date order...
,{$sort: {'_id.d':1}}
// Regroup as before but index the NN part of NN-YYYY into an
// array of 3 letter abbrevs, then reconstruct the string with the
// dash and the year component. Remember: the order of the _id
// in the doc stream coming out of $group is not deterministic
// but the array created by $push will preserve the order in
// which it was pushed -- which is the date-ascending sorted order
// from the prior stage.
,{$group: {_id: '$_id.p',
completed: {$push: {
d: {$concat: [
{$arrayElemAt:[ ['JAN','FEB','MAR',
'APR','MAY','JUN',
'JUL','AUG','SEP',
'OCT','NOV','DEC'],
// minus 1 to adjust for zero-based array:
{$subtract:[{$toInt: {$substr:['$_id.d',0,2]}},1]}
]},
"-",
{$substr:['$_id.d',3,4]}
]},
n: '$n'}}
}}
which yields:
{
"_id" : "postgres",
"completed" : [
{
"d" : "JAN-2021",
"n" : 1
},
{
"d" : "FEB-2021",
"n" : 1
},
{
"d" : "SEP-2021",
"n" : 1
}
]
}
{
"_id" : "oracle",
"completed" : [
{
"d" : "JAN-2021",
"n" : 2
},
{
"d" : "FEB-2021",
"n" : 1
}
]
}
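If the month abbreviation is instead produced by post-processing in client-side code, the same array-lookup idea carries over to Java. This is only a sketch: MonthLabels and toLabel are hypothetical names, and the input is assumed to be the "MM-YYYY" string produced by the first pipeline.

```java
final class MonthLabels {
    private static final String[] ABBREVS = {
        "JAN", "FEB", "MAR", "APR", "MAY", "JUN",
        "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"
    };

    // Converts "MM-YYYY" (e.g. "02-2021") into "MON-YYYY" (e.g. "FEB-2021").
    static String toLabel(String mmYyyy) {
        int month = Integer.parseInt(mmYyyy.substring(0, 2));
        // minus 1 to adjust for the zero-based array
        return ABBREVS[month - 1] + "-" + mmYyyy.substring(3);
    }

    public static void main(String[] args) {
        System.out.println(toLabel("09-2021"));
    }
}
```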
As for converting this to Java, there are several approaches, but unless a great deal of programmatic control is required, capturing the query as "relaxed JSON" (quotes not required around keys) in a Java string and calling Document.parse() seems to be the easiest way. A full example including helper functions and the appropriate Java driver calls can be found here: https://moschetti.org/rants/mongoaggcvt.html but the gist of it is:
private static class StageHelper {
private StringBuilder txt;
public StageHelper() {
this.txt = new StringBuilder();
}
public void add(String expr, Object ... subs) {
expr = expr.replace("'", "\""); // This is the helpful part. Strings are immutable, so keep the result.
if(subs.length > 0) {
expr = String.format(expr, subs); // this too
}
txt.append(expr);
}
public Document fetch() {
Document b = Document.parse(txt.toString());
return b;
}
}
private List<Document> makePipeline() {
List<Document> pipeline = new ArrayList<Document>();
StageHelper s = new StageHelper();
s.add("{$match: {$and: [ ");
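// Note use of EJSON here plus string substitution of dates. Full ISO-8601
// timestamps with an explicit offset are the safest input for $date:
s.add(" {endTime:{$gte: {$date: '%s'}} },", "2021-01-01T00:00:00Z");
s.add(" {endTime:{$lt: {$date: '%s'}} },", "2022-01-01T00:00:00Z");
s.add(" {Task:'completed'} ");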
s.add("]} } ");
pipeline.add(s.fetch());
s = new StageHelper();
s.add("{$group: {_id: { ");
s.add(" p: {$arrayElemAt:[{$split:['$product_action','_']},2]}, ");
s.add(" d: {$dateToString: {date: '$endTime', 'format': '%m-%Y'}} ");
s.add(" }, ");
s.add(" n: {$sum: 1} ");
s.add("}} ");
pipeline.add(s.fetch());
s = new StageHelper();
s.add("{$sort: {'_id.d':1}} ");
pipeline.add(s.fetch());
s = new StageHelper();
s.add("{$group: {_id: '$_id.p', ");
s.add(" completed: {$push: { ");
s.add(" d: {$concat: [ ");
s.add(" {$arrayElemAt:[ ['JAN','FEB','MAR', ");
s.add(" 'APR','MAY','JUN', ");
s.add(" 'JUL','AUG','SEP', ");
s.add(" 'OCT','NOV','DEC'], ");
s.add(" {$subtract:[{$toInt: {$substr:['$_id.d',0,2]}},1]} ");
s.add(" ]}, ");
s.add(" '-', ");
s.add(" {$substr:['$_id.d',3,4]} ");
s.add(" ]}, ");
s.add(" n: '$n'}} ");
s.add(" }} ");
pipeline.add(s.fetch());
return pipeline;
}
...
import com.mongodb.client.MongoCursor;
import com.mongodb.client.AggregateIterable;
AggregateIterable<Document> output = coll.aggregate(pipeline);
MongoCursor<Document> iterator = output.iterator();
while (iterator.hasNext()) {
Document doc = iterator.next();
System.out.println(doc.toJson()); // process each grouped result here
}
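The quoting and substitution trick used by the helper can be tried without a database. The sketch below is a hypothetical, driver-free variant (StageText is not part of any library) that only builds the JSON text; with the MongoDB driver on the classpath, the resulting string would be handed to Document.parse().

```java
final class StageText {
    private final StringBuilder txt = new StringBuilder();

    // Appends a fragment written with single quotes, converting them to
    // double quotes. String.replace returns a new string, so the result
    // must be kept. Substitution only runs when arguments are supplied,
    // so fragments with a literal % (such as '%m-%Y') are safe as long
    // as they are added without arguments.
    StageText add(String expr, Object... subs) {
        String json = expr.replace("'", "\"");
        if (subs.length > 0) {
            json = String.format(json, subs);
        }
        txt.append(json);
        return this;
    }

    String text() {
        return txt.toString();
    }

    public static void main(String[] args) {
        String stage = new StageText()
                .add("{$match: {endTime: {$gte: {$date: '%s'}}, ", "2021-01-01T00:00:00Z")
                .add("Task: 'completed'}}")
                .text();
        System.out.println(stage);
    }
}
```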

Related

How to get the count of element with non-empty-array-field when group in mongodb aggregate using Spring Data Mongo?

I have the following documents in one collection named as mail_test. Some of them have a tags field which is an array:
/* 1 */
{
"_id" : ObjectId("601a7c3a57c6eb4c1efb84ff"),
"email" : "aaaa#bbb.com",
"content" : "11111"
}
/* 2 */
{
"_id" : ObjectId("601a7c5057c6eb4c1efb8590"),
"email" : "aaaa#bbb.com",
"content" : "22222"
}
/* 3 */
{
"_id" : ObjectId("601a7c6d57c6eb4c1efb8675"),
"email" : "aaaa#bbb.com",
"content" : "33333",
"tags" : [
"x"
]
}
/* 4 */
{
"_id" : ObjectId("601a7c8157c6eb4c1efb86f4"),
"email" : "aaaa#bbb.com",
"content" : "4444",
"tags" : [
"yyy",
"zzz"
]
}
There are two documents with non-empty-tags, so I want the result to be 2.
I use the following statement to aggregate and get the correct tag_count:
db.getCollection('mail_test').aggregate([{$group:{
"_id":null,
"all_count":{$sum:1},
"tag_count":{"$sum":{$cond: [ { $ne: ["$tags", undefined] }, 1, 0]}}
//if replace `undefined` with `null`, I got the tag_count as 4, that is not what I want
//I also have tried `$exists`, but it cannot be used here.
}}])
and the result is:
{
"_id" : null,
"all_count" : 4.0,
"tag_count" : 2.0
}
and I use spring data mongo in java to do this:
private void test(){
Aggregation agg = Aggregation.newAggregation(
Aggregation.match(new Criteria()),//some condition here
Aggregation.group(Fields.fields()).sum(ConditionalOperators.when(Criteria.where("tags").ne(null)).then(1).otherwise(0)).as("tag_count")
//I need an `undefined` instead of `null`,or is there are any other solution?
);
AggregationResults<MailTestGroupResult> results = mongoTemplate.aggregate(agg, MailTest.class, MailTestGroupResult.class);
List<MailTestGroupResult> mappedResults = results.getMappedResults();
int tag_count = mappedResults.get(0).getTag_count();
System.out.println(tag_count);//get 4,wrong
}
I need an undefined instead of null, but I don't know how to do this. Or is there any other solution?
You can use Aggregation operators to check if the field tags exists or not with one of the following constructs in the $group stage of your query (to calculate the tag_count value):
"tag_count":{ "$sum": { $cond: [ { $gt: [ { $size: { $ifNull: ["$tags", [] ] }}, 0 ] }, 1, 0] }}
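// - OR -
"tag_count":{ "$sum": { $cond: [ { $eq: [ { $type: "$tags" }, "array" ] }, 1, 0] }}
Both return the same result for the documents you posted. Note that the second construct also counts documents whose tags field is an empty array, while the first does not.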
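The difference between the two conditions only shows up for empty arrays. The following plain-Java sketch mirrors both checks on in-memory maps to make that visible; TagCount and its method names are hypothetical, and this is an illustration of the logic, not Spring Data code.

```java
import java.util.List;
import java.util.Map;

final class TagCount {
    // Mirrors {$gt: [{$size: {$ifNull: ["$tags", []]}}, 0]}:
    // the field must hold a non-empty array.
    static long nonEmptyTags(List<Map<String, Object>> docs) {
        return docs.stream()
                .filter(d -> d.get("tags") instanceof List
                        && !((List<?>) d.get("tags")).isEmpty())
                .count();
    }

    // Mirrors {$eq: [{$type: "$tags"}, "array"]}:
    // any array counts, even an empty one.
    static long arrayTypedTags(List<Map<String, Object>> docs) {
        return docs.stream()
                .filter(d -> d.get("tags") instanceof List)
                .count();
    }
}
```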

Get count of unique ObjectId from array MongoDB

I'm new to working with MongoDb and do not know a lot of things.
I need to write an aggregation request.
Here is the JSON document structure.
{
"_id" : ObjectId("5a72f7a75ef7d430e8c462d2"),
"crawler_id" : ObjectId("5a71cbb746e0fb0007adc6c2"),
"skill" : "stack",
"created_date" : ISODate("2018-02-01T13:19:03.522+0000"),
"modified_date" : ISODate("2018-02-01T13:22:23.078+0000"),
"connects" : [
{
"subskill" : "we’re",
"weight" : NumberInt(1),
"parser_id" : [
ObjectId("5a71d88d5ef7d41964fbec11")
]
},
{
"subskill" : "b1",
"weight" : NumberInt(2),
"parser_id" : [
ObjectId("5a71d88d5ef7d41964fbec11"),
ObjectId("5a71d88d5ef7d41964fbec1b")
]
},
{
"subskill" : "making",
"weight" : NumberInt(2),
"parser_id" : [
ObjectId("5a71d88d5ef7d41964fbec1b"),
ObjectId("5a71d88d5ef7d41964fbec1c")
]
},
{
"subskill" : "delivery",
"weight" : NumberInt(2),
"parser_id" : [
ObjectId("5a71d88d5ef7d41964fbec1c"),
ObjectId("5a71d88d5ef7d41964fbec1e")
]
}
]
}
I need the result return the name of skill and the number of unique parser_id.
In this case, the result should be:
[
{
"skill": "stack",
"quantity": 4
}
]
where "stack" - skill name,
and "quantity" - count of unique parser_id.
ObjectId("5a71d88d5ef7d41964fbec11")
ObjectId("5a71d88d5ef7d41964fbec1b")
ObjectId("5a71d88d5ef7d41964fbec1c")
ObjectId("5a71d88d5ef7d41964fbec1e")
Can someone help me with this request?
Given the document supplied in your question, this command ...
db.collection.aggregate([
{ $unwind: "$connects" },
// parser_id is itself an array, so unwind it too to get one document per id
{ $unwind: "$connects.parser_id" },
// count all occurrences
{ "$group": { "_id": {skill: "$skill", parser_id: "$connects.parser_id"}, "count": { "$sum": 1 } }},
// sum all occurrences and count distinct
{ "$group": { "_id": "$_id.skill", "quantity": { "$sum": 1 } }},
// (optional) rename the '_id' attribute to 'skill'
{ $project: { 'skill': '$_id', 'quantity': 1, _id: 0 } }
])
... will return:
{
"quantity" : 4,
"skill" : "stack"
}
The above command unwinds connects and then each parser_id array, groups by skill and individual parser_id, and then gets a distinct count of those groups. Without the second $unwind the grouping key would be the whole parser_id array, which counts distinct arrays rather than distinct ids.
Your question includes the java tag so I suspect you are looking to execute the same command using the MongoDB Java driver. The code below (using MongoDB Java driver v3.x) will return the same result:
MongoClient mongoClient = ...;
MongoCollection<Document> collection = mongoClient.getDatabase("...").getCollection("...");
List<Document> documents = collection.aggregate(Arrays.asList(
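Aggregates.unwind("$connects"),
// parser_id is itself an array, so unwind it too to get one document per id
Aggregates.unwind("$connects.parser_id"),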
new Document("$group", new Document("_id", new Document("skill", "$skill").append("parser_id", "$connects.parser_id"))
.append("count", new Document("$sum", 1))),
new Document("$group", new Document("_id", "$_id.skill").append("quantity", new Document("$sum", 1))),
new Document("$project", new Document("skill", "$_id").append("quantity", 1).append("_id", 0))
)).into(new ArrayList<>());
for (Document document : documents) {
logger.info("{}", document.toJson());
}
Note: this code deliberately uses the form new Document(<pipeline aggregator>, ...) instead of the Aggregates utilities to make it easier to see the translation between the shell command and its Java equivalent.
Try $project with $reduce.
$setUnion keeps only the distinct ids, and $size then returns the number of elements in the resulting array.
db.col.aggregate(
[
{$project : {
_id : 0,
skill : 1,
quantity : {$size :{$reduce : {input : "$connects.parser_id", initialValue : [] , in : {$setUnion : ["$$value", "$$this"]}}}}
}
}
]
).pretty()
result
{ "skill" : "stack", "quantity" : 4 }
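The $reduce / $setUnion step is a fold of several id arrays into one set. A minimal Java sketch of the same idea follows, assuming the ids are represented as plain strings (a stand-in for ObjectId) and using the hypothetical names DistinctIds and countDistinct.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class DistinctIds {
    // Folds all id arrays into one set (like $reduce with $setUnion)
    // and returns its size (like $size).
    static int countDistinct(List<List<String>> idArrays) {
        Set<String> union = new LinkedHashSet<>();
        for (List<String> ids : idArrays) {
            union.addAll(ids);
        }
        return union.size();
    }

    public static void main(String[] args) {
        // Shortened ids standing in for the parser_id arrays of the sample document.
        List<List<String>> parserIds = List.of(
                List.of("ec11"),
                List.of("ec11", "ec1b"),
                List.of("ec1b", "ec1c"),
                List.of("ec1c", "ec1e"));
        System.out.println(countDistinct(parserIds));
    }
}
```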

How to get documents, where any of the field value matches to any of the listed expressions in MongoDB Java

I am trying to fetch all documents in a collection where any of the document's fields matches any of the listed regular expressions.
Considering below scenarios.
User can create documents with different fields names as they wish in a collection.
such as
document1 = >{ "_id":1, "card" : 1234 , "status": 4}
document2 => {"_id": ***, "Housenumber" : 356/78 , "value" : null}
------
documentn =>{ "_id" : ObjectId("4ecd2e33dd68c9021e453d12"), "searchword" : "win" }
------
Field names are not same for all the documents in a collection.
regular expressions can be:"/^(^456$|^win$............etc)/"
I tried to get key dynamically and do find query as mentioned below:
----------
table = db.getCollection(coll);
DBObject dataKeys = table.findOne();
Set<String> keys = dataKeys.keySet();
Iterator<String> iterator = keys.iterator();
while(iterator.hasNext()){
String key = iterator.next();
regexQuery.put(**key**, new BasicDBObject("$regex", "^((^(([0-9]{4}[-. _]?)$)|"
+ "(^[a-zA-Z0-9._%+-]...........................0-9]$$").append("$options", "i"));
DBCursor cursor = table.find(regexQuery);
while (cursor.hasNext()) {
System.out.println(cursor.next());
I can see key value is coming properly but it is not fetching the matching documents.
I am new to MongoDB and I followed above approach after googling it.
If you are looking to regex match on the field names (not the values), then use $objectToArray to turn the field names (LHS) into expression-worthy values (RHS):
var r = [
{ _id: 1, name: "buzz", addr: "here"}
,{ _id: 2, searchword: "win", value: 6}
,{ _id: 3, game:0, word: "foo", fruit: "apple", fame: 7}
,{ _id: 4, qval:23}
];
db.foo.insert(r);
var rin = [ /ame/, /^val/ ]; // list of regex
db.foo.aggregate([
{$project: {x: {$objectToArray: "$$CURRENT"}}}
,{$unwind: "$x"}
,{$match: {"x.k": {$in: rin}}}
]);
{ "_id" : 1, "x" : { "k" : "name", "v" : "buzz" } }
{ "_id" : 2, "x" : { "k" : "value", "v" : 6 } }
{ "_id" : 3, "x" : { "k" : "game", "v" : 0 } }
{ "_id" : 3, "x" : { "k" : "fame", "v" : 7 } }
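The same field-name matching can also be done client-side on documents that have already been fetched. This is a sketch only: it treats a document as a plain Map (the driver's Document class implements Map<String, Object>), and FieldNameMatcher and matchingKeys are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

final class FieldNameMatcher {
    // Returns the field names of `doc` that match at least one pattern.
    // find() is used so that unanchored patterns match anywhere in the name.
    static List<String> matchingKeys(Map<String, ?> doc, List<Pattern> patterns) {
        List<String> hits = new ArrayList<>();
        for (String key : doc.keySet()) {
            for (Pattern p : patterns) {
                if (p.matcher(key).find()) {
                    hits.add(key);
                    break; // one matching pattern is enough for this key
                }
            }
        }
        return hits;
    }
}
```

Doing this on the server with $objectToArray, as shown above, avoids transferring non-matching documents, so the client-side version is mainly useful for small result sets.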

Regex query MongoDB Performance issue

I have a MongoDB collection whose documents contain a single field. I receive about 31,000 documents each day, and the collection holds almost 6 months of data.
Here is how my data looks like in database
{
"_id" : ObjectId("59202aa3f32dfba00d0773c3"),
"Data" : "20-05-2017 18:38:13 SYSTEM_000_00_SAVING ",
"__v" : 0
}
{
"_id" : ObjectId("59202aa3f32dfba00d0773c4"),
"Data" : "20-05-2017 18:38:13 SyTime_000_09_00:00 ",
"__v" : 0
}
here is my code for query
DBObject query = new BasicDBObject();
Pattern regex = Pattern.compile("20-05-2017");
query.put("Data", regex);
I have created an index, but the query is still slow:
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "NOB_SRB.fdevices"
},
{
"v" : 1,
"unique" : true,
"key" : {
"Data" : 1.0
},
"name" : "Data_1",
"ns" : "NOB_SRB.fdevices"
}
]
Add a start-of-input anchor ^ to the start of the regex:
Pattern regex = Pattern.compile("^20-05-2017");
Because your regex does not have an anchor, the date is searched for anywhere in the field, so every value has to be examined. With the anchor, a case-sensitive prefix expression lets MongoDB limit the scan to the range of index keys that start with that prefix.
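The semantic difference between the two patterns can be checked locally with java.util.regex before running the query. A small sketch with a hypothetical helper (AnchoredRegexDemo.found):

```java
import java.util.regex.Pattern;

final class AnchoredRegexDemo {
    // True if the pattern is found somewhere in the value; an anchored
    // pattern can only be found at the very start of the value.
    static boolean found(String regex, String value) {
        return Pattern.compile(regex).matcher(value).find();
    }

    public static void main(String[] args) {
        String data = "20-05-2017 18:38:13 SYSTEM_000_00_SAVING ";
        System.out.println(found("^20-05-2017", data));
        System.out.println(found("^20-05-2017", "x " + data));
    }
}
```

Since every Data value in the sample starts with the date, both patterns select the same documents here; only the anchored one can be answered as a prefix range on the index.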

Count number of fields in a document mongo Java

I have a collection containing following documents -
{
"_id" : ObjectId("56e7a51b4a66e30330151847"),
"host" : "myTestHost.com",
"sessionId" : "daxxxxxx-xxxx-xxxx-xxxx-xxxxxxx1",
"ssoId" : "xxxxxx#gmail.com",
"days" : {
"13" : NumberLong(130),
"11" : NumberLong(457),
"10" : NumberLong(77)
},
"count" : NumberLong(664),
"timeStamp" : NumberLong("1458021713370")
}
I am using mongo java driver 3.2.1.
This document contains an embedded document 'days', that holds a specific count for each day of month.
I need to find the number of days for which a count is present. For example, for the document above, the number of days for which a count is present is 3 (the 13th, 11th and 10th day of the month).
I know how to get the count on mongo console -
mongos>var count = 0;
mongos> db.monthData.find({},{days:1}).forEach(function(record){for(f in record.days) { count++;}});
mongos> count;
I need to convert this to java code.
Maybe you can reshape your schema as follows:
{
"_id" : ObjectId("56e7a51b4a66e30330151847"),
"host" : "myTestHost.com",
"sessionId" : "daxxxxxx-xxxx-xxxx-xxxx-xxxxxxx1",
"ssoId" : "xxxxxx#gmail.com",
"days" : [
{
"13" : NumberLong(130)
},
{
"11" : NumberLong(457)
},
{
"10" : NumberLong(77)
}
],
"count" : NumberLong(664),
"timeStamp" : NumberLong(1458021713370)
}
days becomes an array of objects and in this way you can easily use aggregation pipeline to know how many elements are in days array:
>db.collection.aggregate(
[
{$match:{days:{"$exists":1}}},
{$project:{
numberOfDays: {$size:"$days"},
_id:1}
}
]
)
The aggregation returns:
{ "_id" : ObjectId("56e7a51b4a66e30330151847"), "numberOfDays" : 3 }
To use Aggregation Pipeline with Java driver see aggregate, AggregateIterable, Block and read Data Aggregation with Java Driver
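If the schema is left unchanged, the shell loop from the question translates fairly directly to Java, because the driver's Document class implements Map<String, Object> and an embedded document is itself a Document. The sketch below works on plain maps so it can run without a database; DayCounter and its methods are hypothetical names.

```java
import java.util.List;
import java.util.Map;

final class DayCounter {
    // Number of keys in the embedded "days" document, or 0 if it is absent.
    static int countDays(Map<String, ?> doc) {
        Object days = doc.get("days");
        return days instanceof Map ? ((Map<?, ?>) days).size() : 0;
    }

    // Sums the day counts over several documents, like the shell loop
    // that increments `count` for every key of record.days.
    static int totalDays(List<? extends Map<String, ?>> docs) {
        int total = 0;
        for (Map<String, ?> doc : docs) {
            total += countDays(doc);
        }
        return total;
    }
}
```

With the driver, each Document yielded by collection.find() could be passed to countDays as-is.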
