I have a requirement to count the common liked items of two users. How can I do this in Spring Data MongoDB?
Suppose user1 likes item a and user2 also likes item a; then the number of common items between these users is 1. The like/dislike structure is the same for every item, so how do I get this count with a MongoDB query?
My domain design looks like
public class UserItemHistory {
    long userId;
    long itemId;
    int status; // status will be 1 if the user likes the item
}
The data structure will be:
{
user_id:1,
item_id:2,
status:1
}
{
user_id:2,
item_id:2,
status:1
}
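Note that the stored documents use snake_case field names (user_id, item_id) while the domain class uses camelCase properties. Assuming that is really how the data is persisted, the class presumably maps the names with @Field; here is a minimal sketch of such a mapping (the collection name and annotations are assumptions for illustration, not taken from the original post):
import org.springframework.data.annotation.Id;
import org.springframework.data.mongodb.core.mapping.Document;
import org.springframework.data.mongodb.core.mapping.Field;

@Document("userItemHistory") // collection name is an assumption
public class UserItemHistory {

    @Id
    private String id;

    @Field("user_id") // maps the camelCase property to the stored snake_case field
    private long userId;

    @Field("item_id")
    private long itemId;

    private int status; // 1 if the user likes the item
}
With such a mapping in place, criteria written against the property names (userId, itemId) are translated to the stored field names when the aggregation is run as a typed aggregation.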
You basically want this as an aggregation pipeline run with the .aggregate() method. In short, you match on the two users and restrict to documents with a "liked" status. Then you group by the itemId and count the number of matches between the two users. Finally, you filter this to all items having a count greater than 1, as that indicates the same item was liked by both users.
As an aggregation pipeline, that is three simple steps: $match, then $group to count, then $match again on the count values:
{ "$match": {
"userId": { "$in": [1,2] },
"status": 1
}},
{ "$group": {
"_id": "$itemId",
"count": { "$sum": 1 }
}},
{ "$match": { "count": { "$gt": 1 } } }
Which can be written with the aggregation helpers in Spring Data MongoDB like this:
// Uses static imports from org.springframework.data.mongodb.core.aggregation.Aggregation
// (newAggregation, match, group) plus org.springframework.data.mongodb.core.query.Criteria.
Aggregation aggregation = newAggregation(
    match(
        Criteria.where("userId").in(Arrays.asList(1, 2))
            .and("status").is(1)
    ),
    group("itemId").count().as("count"),
    match(Criteria.where("count").gt(1))
);
This provides the pipeline to be run with .aggregate() against the mapped class or the collection name, as you choose.
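For example, executing it through a MongoTemplate could look roughly like this sketch (the mongoTemplate bean and the "userItemHistory" collection name are assumptions for illustration):
import java.util.List;
import org.bson.Document;
import org.springframework.data.mongodb.core.MongoTemplate;
import org.springframework.data.mongodb.core.aggregation.AggregationResults;

// Run the pipeline against the collection and read back the raw result documents.
// Each result has _id = the itemId and count = how many of the two users liked it.
AggregationResults<Document> results =
        mongoTemplate.aggregate(aggregation, "userItemHistory", Document.class);

List<Document> commonItems = results.getMappedResults();
commonItems.forEach(doc ->
        System.out.println("itemId=" + doc.get("_id") + ", count=" + doc.get("count")));
The size of commonItems is then the number of items both users liked.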
So, I have this repository in MongoDB that holds movies with this structure:
title: String
description: String
likes: Set<String>
hates: Set<String>
The likes & hates are a Set because they hold a list of UserIds, where the users with those UserIds are the ones that liked/hated the movie.
I am trying to have my service get all movies from the database, sorted by the number of likes/hates. Previously, my structure was different, and likes/hates were just Integers. Then, getting all sorted movies was easy:
public List<MovieDocument> getSortedMovies(SortProperty sortBy, Order order) {
    return moviesRepository.findAll(Sort.by(fromString(order.toString()), sortBy.toString()));
}
Where sortBy was either likes or hates and order was either asc or desc, provided by the client of the API.
In the above case, MoviesRepository didn't have any custom methods:
@Repository
public interface MoviesRepository extends MongoRepository<MovieDocument, String> {}
How am I supposed to do that now that likes and hates are Set objects?
Again, what I want is to get all movies sorted by the size of the likes/hates sets.
Can I do that using any of the built-in MongoRepository methods? I had a look and didn't see anything useful.
Looking at other StackOverflow posts, I saw there is an option to add methods to my MoviesRepository with an Aggregation annotation. This would look something like:
#Aggregation("{$project: { 'like_count': { $size: '$likes' } }}, {$sort: {'like_count': -1}}]")
List<String> getSortedMovieIdsByLikesDesc();
However, this does not return the whole MovieDocument, but rather it returns the number of likes. In addition to that, it looks like I'd have to create a new custom method for each property/order combination i.e. likes-asc, likes-desc, hates-asc, hates-desc. This feels tedious and not very extensible.
How would I fix the above to return whole documents and is there any other way to do this I'm not considering?
EDIT
I tried the following based on input from @rickhg12hs.
#Aggregation("{$set: { like_count: { $size: $likes } }}, {$sort: {like_count: -1}}")
List<MovieDocument> getSortedMovieIdsByLikesDesc();
#Aggregation("{$set: { like_count: { $size: $likes } }}, {$sort: {like_count: 1}}")
List<MovieDocument> getSortedMovieIdsByLikesAsc();
#Aggregation("{$set: { hate_count: { $size: $hates } }}, {$sort: {hate_count: -1}}")
List<MovieDocument> getSortedMovieIdsByHatesDesc();
#Aggregation("{$set: { hate_count: { $size: $hates } }}, {$sort: {hate_count: 1}}")
List<MovieDocument> getSortedMovieIdsByHatesAsc();
Unfortunately, all four of those methods seem to return the exact same thing when called. Specifically they return the two items that are in the database unordered.
You seem to be doing almost everything right. Here's an example that does what I think you want.
db.collection.aggregate([
{
// Count likes and hates
"$set": {
"likeCount": {
"$size": "$likes"
},
"hateCount": {
"$size": "$hates"
}
}
},
{
// Most likes first, split ties with
// least hates first
"$sort": {
"likeCount": -1,
"hateCount": 1
}
},
// {
// "$project": {
// "likeCount": 0,
// "hateCount": 0
// }
// }
])
You can uncomment the "$project" stage if you want to remove the counts too.
Try it on mongoplayground.net.
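Back in Spring Data terms, the same pipeline can be exposed from the repository, roughly like this sketch (one stage per string in @Aggregation; the method name is just an example):
import java.util.List;
import org.springframework.data.mongodb.repository.Aggregation;
import org.springframework.data.mongodb.repository.MongoRepository;
import org.springframework.stereotype.Repository;

@Repository
public interface MoviesRepository extends MongoRepository<MovieDocument, String> {

    // Each pipeline stage goes in its own string; the counts are added with $set
    // and stripped again with $project so whole MovieDocuments come back.
    @Aggregation(pipeline = {
            "{ $set: { likeCount: { $size: '$likes' }, hateCount: { $size: '$hates' } } }",
            "{ $sort: { likeCount: -1, hateCount: 1 } }",
            "{ $project: { likeCount: 0, hateCount: 0 } }"
    })
    List<MovieDocument> findAllSortedByMostLiked();
}
If the four asc/desc variants still feel tedious, an @Aggregation method can also take a Sort parameter that Spring Data appends as a trailing $sort stage; in that case the fixed $sort (and the $project that strips the counts) would need adjusting, so verify that behaviour against your Spring Data version.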
I have a collection that I want to query with pagination, sorted by the dateCreated field in descending order so that the latest documents come first.
I am holding a lastKnownCommentId (ObjectId) for performance purposes. Having lastKnownCommentId avoids loading documents from the start again on subsequent pages; without it, paging with just a limit on the query would cause a performance issue.
Query<Comment> query = datastore.createQuery(Comment.class)
.field(POST_ID).equal(postId);
if (lastKnownCommentId != null) {
query.field(FieldConstants.OBJECT_ID).greaterThan(new ObjectId(lastKnownCommentId));
}
query.order(Sort.descending(FieldConstants.DATE_CREATED));
return query.asList(new FindOptions().limit(10));
Now I have 12 documents in this collection that match one postId. When this query is executed for the first page with lastKnownCommentId = null, it gives me 10 documents sorted by date; 2 documents are still left over for the next page.
For the second page with lastKnownCommentId = someId (someId being the ObjectId of the last document from the first page), it gives me 9 documents as the result instead of the 2 documents remaining from the first page.
Things work fine if I don't sort by date; I can skip the sort in the query entirely and sort the resulting list instead. I don't understand why this happens when sorting in the query.
I tried to cross-check with an aggregation and it returns the same output.
db.comment.aggregate(
// Pipeline
[
// Stage 1
{
$match: {
"postId":{"$eq":"5fb2090fe4d37312a4c3ce59"}
}
},
// Stage 2
{
$sort: {
"dateCreated":-1
}
},
// Stage 3
{
$match: {
"_id":{"$gt":ObjectId("5fb0e53392ad724f9026d2f7")}
}
},
// Stage 4
{
$limit: // positive integer
10
},
],
// Options
{
cursor: {
batchSize: 50
}
}
);
What does the query look like just before you return it? (Print out the value, or loop over it if it's an object, so you can see what the values are.)
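For reference, with Morphia 1.x the effective filter and sort can usually be printed just before returning, along these lines (getQueryObject()/getSortObject() exist on the 1.x Query API; adjust if your Morphia version differs):
// Print what will actually be sent to MongoDB before returning the page.
System.out.println("filter: " + query.getQueryObject());
System.out.println("sort:   " + query.getSortObject());
return query.asList(new FindOptions().limit(10));
Seeing the raw filter and sort side by side helps confirm whether the _id range condition and the dateCreated sort combine the way you expect.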
I have a huge number of documents in Elasticsearch, which have lots of status fields from various systems.
I want to apply the filtering on the ES index first to get the ids of the matching records and then download those records from MySQL.
That is, I want to fetch all the values of a field, say cid, as an array in the ES response.
If I just ask for _source: ["cid"], that still returns individual documents and is too slow (I want to fetch ~1M records).
Another option is to use a terms aggregation, which is faster but still gives an individual bucket for each id.
Is there a workaround where I can fetch that many document ids (using an aggregation/script/query)?
Currently I am doing:
{
"size": 0,
"query":{
"bool": {
"filter": {
"term": {
"source": "web"
}
}
}
},
"aggs": {
"ids": {
"terms": {
"field": "cid",
"size": 10000
}
}
}
}
Any solution or advice would be helpful.
I have a node called quotes in Firebase. I'm facing issues fetching data for a particular range in Android. I want to fetch 3 consecutive quotes by id, starting from id 2. Here is my database:
"quotes" : {
"-L75elQJaD3EYPsd4oWS" : {
"authorName" : "Hellen v",
"famousQuote" : "When one door of happiness closes, another opens; but often we look so long at the closed door that we do not see the one which has been opened for us.",
"id" : "1",
"uploadedBy" : "Admin"
},
"-L7GOvDNI-o_H8RvNwoN" : {
"authorName" : "Rocky Balboa",
"famousQuote" : "It's not about how hard you can hit; it's about how hard you can get hit and keep moving forward.",
"id" : "2",
"uploadedBy" : "Admin"
},
"-L7GP9oBv5NR1T6HlDd4" : {
"authorName" : "African proverb",
"famousQuote" : "If you want to go fast, go alone. If you want to go far, go together.",
"id" : "3",
"uploadedBy" : "Admin"
},
"-L7GPjM1F3_7Orcz0Q1q" : {
"authorName" : "A.P.J Abdul Kalam",
"famousQuote" : "Don’t take rest after your first victory because if you fail in second, more lips are waiting to say that your first victory was just luck.",
"id" : "4",
"uploadedBy" : "Admin"
},
Below is the rule which I'm using for quotes
"quotes": {
".indexOn": ".value"
}
How can I get the quotes that have id 2, 3 and 4?
If you have more than 4 records in your database, you can solve this with a query that combines the startAt() and endAt() methods to limit both ends of the range, like this:
DatabaseReference rootRef = FirebaseDatabase.getInstance().getReference();
Query query = rootRef.child("quotes").orderByChild("id").startAt("2").endAt("4");
query.addListenerForSingleValueEvent(/* ... */);
See here for more information about Firebase Query's startAt() method:
Create a query constrained to only return child nodes with a value greater than or equal to the given value, using the given orderBy directive or priority as default.
And here is more information about Firebase Query's endAt() method:
Create a query constrained to only return child nodes with a value less than or equal to the given value, using the given orderBy directive or priority as default.
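For completeness, the elided listener on that range query could look roughly like this sketch (the child field names come from the data above; the log tag and the classes from com.google.firebase.database and android.util.Log are the usual ones, but treat the exact handling as illustrative):
query.addListenerForSingleValueEvent(new ValueEventListener() {
    @Override
    public void onDataChange(DataSnapshot dataSnapshot) {
        // One child per quote whose id falls in the "2".."4" range.
        for (DataSnapshot child : dataSnapshot.getChildren()) {
            String author = child.child("authorName").getValue(String.class);
            String quote = child.child("famousQuote").getValue(String.class);
            Log.d("Quotes", author + ": " + quote);
        }
    }

    @Override
    public void onCancelled(DatabaseError databaseError) {
        Log.w("Quotes", "Query cancelled", databaseError.toException());
    }
});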
Edit: According to your comment, if you only want the items that have the id property set to 2, 3 and 4, you should use nested queries like this:
// Note: the Firebase method is equalTo(); the listeners fire asynchronously, so the
// queries are nested to keep the list complete before it is used. "Item" stands for
// your quote model class, and each query is assumed to match exactly one child.
List<Item> list = new ArrayList<>();
Query queryTwo = rootRef.child("quotes").orderByChild("id").equalTo("2");
queryTwo.addListenerForSingleValueEvent(new ValueEventListener() {
    @Override public void onDataChange(DataSnapshot snapTwo) {
        list.add(snapTwo.getChildren().iterator().next().getValue(Item.class));
        Query queryThree = rootRef.child("quotes").orderByChild("id").equalTo("3");
        queryThree.addListenerForSingleValueEvent(new ValueEventListener() {
            @Override public void onDataChange(DataSnapshot snapThree) {
                list.add(snapThree.getChildren().iterator().next().getValue(Item.class));
                Query queryFour = rootRef.child("quotes").orderByChild("id").equalTo("4");
                queryFour.addListenerForSingleValueEvent(new ValueEventListener() {
                    @Override public void onDataChange(DataSnapshot snapFour) {
                        list.add(snapFour.getChildren().iterator().next().getValue(Item.class));
                        // Do what you need to do with the list that contains three items
                    }
                    @Override public void onCancelled(DatabaseError error) { }
                });
            }
            @Override public void onCancelled(DatabaseError error) { }
        });
    }
    @Override public void onCancelled(DatabaseError error) { }
});
I have a DB of users' visits to places, containing place_id and user_id, like this:
{place_id : 1, user_id : 1}
{place_id : 1, user_id : 1}
{place_id : 1, user_id : 2}
{place_id : 2, user_id : 3}
{place_id : 2, user_id : 3}
And I want to get the number of distinct users in each place. I ended up with the following native MongoDB aggregation:
db.collection.aggregate([{
$group: {
_id: "$place_id",
setOfUsers: {
$addToSet: "$user_id"
}
}
}, {
$project: {
distinctUserCount: {
$size: "$setOfUsers"
}
}
}])
And now I want to implement it using Spring Data. The problem is the $size operation in the projection, since the Spring Data API does not seem to have it; at least I haven't found it in the reference documentation.
GroupOperation group = Aggregation.group("place_id").addToSet("user_id").as("setOfUsers");
ProjectionOperation project = Aggregation.project(). .... ?
Maybe there is a way to also create the size field, so that the nested API can be used:
Aggregation.project().and("distinctUserCount").nested( ???);
Any help is appreciated.
I am going to answer this in "one hit", so rather than address your "$project" issue, I'm going to advise that there is a better approach.
The $addToSet operator will create a "unique" array (or "set") of the elements you ask it to add. It is, however, basically another form of $group in itself, the difference being that the elements are added to an "array" (or "set") in the results.
This is "bad" for scalability, as your potential problem here is that the "set" could grow to exceed the 16MB BSON limit for document size. Maybe it does not right now, but who knows what the code you write today will be doing in ten years' time.
Therefore, since $group is really the same thing, and you also need "two" pipeline stages to get the "distinct" count anyway, just use "two" $group stages instead:
// Uses static imports from org.springframework.data.mongodb.core.aggregation.Aggregation
// (newAggregation, group, fields).
Aggregation pipeline = newAggregation(
    group(fields("place_id", "user_id")),
    group("_id.place_id").count().as("distinctUserCount")
);
Being the shell equivalent of:
[
{ "$group": {
"_id": { "place_id": "$place_id", "user_id": "$user_id" }
}},
{ "$group": {
"_id": "$_id.place_id",
"distinctUserCount": { "$sum": 1 }
}}
]
This is simple code and it is much more "scalable", as the individual "user_id" values are at first contained in separate documents in the pipeline. The "second" $group (in place of a $project with $size) then "counts" the distinct combinations that were already separated out by the first grouping key.
Learn the limitations and pitfalls, and code well.
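For completeness, a minimal sketch of executing this pipeline and reading the counts back, assuming a MongoTemplate bean and a "visits" collection name (both are assumptions, not from the original question):
import org.bson.Document;
import org.springframework.data.mongodb.core.MongoTemplate;
import org.springframework.data.mongodb.core.aggregation.AggregationResults;

// _id carries the place_id from the second $group stage,
// distinctUserCount carries the number of distinct users for that place.
AggregationResults<Document> results =
        mongoTemplate.aggregate(pipeline, "visits", Document.class);

for (Document doc : results.getMappedResults()) {
    System.out.println("place " + doc.get("_id")
            + " -> distinct users: " + doc.get("distinctUserCount"));
}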