Upsert many documents based on _id


I need to upsert many documents based on _id.
E.g.
document_1 = {_id:"1", "age":11, "name":"name1"}
document_2 = {_id:"2", "age":22, "name":"name2"}
I wrote the below
db.my_collection.updateMany(
{ _id: {"$in":["1","2"] } },
[
{$set: {_id:"1", "age":11, "name":"name1"}},
{$set: {_id:"2", "age":22, "name":"name2"}}
],
true
)
But no documents get updated or inserted. Where have I gone wrong?

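The third parameter must be an options document, not a bare boolean: pass {upsert: true}. (new: true belongs to findAndModify-style calls, not to updateMany.) Also note that every stage of an update pipeline is applied to every matched document, so the two $set stages above cannot assign different values to documents "1" and "2".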

$merge seems to be a better option for an upsert operation. You can store the records you want to upsert in another collection, say to_be_upserted, and perform $merge in an aggregation pipeline.
db.to_be_upserted.aggregate([
{
"$merge": {
"into": "my_collection",
"on": "_id",
"whenMatched": "merge",
"whenNotMatched": "insert"
}
}
])
Here is the Mongo Playground for your reference.
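To make the per-document behaviour concrete, here is a small in-memory sketch in plain Java of what whenMatched: "merge" and whenNotMatched: "insert" do for each incoming document. It is only a model of the semantics, not driver code; the names MergeUpsertSketch, mergeInto and doc, and the extra "city" field, are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class MergeUpsertSketch {
    // Models $merge with on: "_id", whenMatched: "merge", whenNotMatched: "insert".
    static void mergeInto(Map<String, Map<String, Object>> target,
                          List<Map<String, Object>> incoming) {
        for (Map<String, Object> doc : incoming) {
            String id = (String) doc.get("_id");
            // A new _id gets a fresh, empty document (the "insert" branch) ...
            Map<String, Object> existing =
                    target.computeIfAbsent(id, k -> new LinkedHashMap<>());
            // ... and incoming fields overwrite or extend what is there (the "merge" branch).
            existing.putAll(doc);
        }
    }

    // Hypothetical helper that builds a document shaped like the ones in the question.
    static Map<String, Object> doc(String id, int age, String name) {
        Map<String, Object> d = new LinkedHashMap<>();
        d.put("_id", id);
        d.put("age", age);
        d.put("name", name);
        return d;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Object>> myCollection = new LinkedHashMap<>();
        Map<String, Object> old = doc("1", 10, "old");
        old.put("city", "Oslo"); // field that the incoming document does not mention
        myCollection.put("1", old);

        mergeInto(myCollection, List.of(doc("1", 11, "name1"), doc("2", 22, "name2")));
        System.out.println(myCollection);
    }
}
```

Document "1" keeps its untouched field while age and name are replaced, and document "2" is created. With a driver, the usual per-document equivalent is a bulk write containing one upsert per _id.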

Related

Firebase query to fetch data in given range

I have a node called quotes in Firebase. I'm facing issues while fetching data in Android for a particular range. I want to fetch 3 consecutive quote ids starting from 2. Here is my database:
"quotes" : {
"-L75elQJaD3EYPsd4oWS" : {
"authorName" : "Hellen v",
"famousQuote" : "When one door of happiness closes, another opens; but often we look so long at the closed door that we do not see the one which has been opened for us.",
"id" : "1",
"uploadedBy" : "Admin"
},
"-L7GOvDNI-o_H8RvNwoN" : {
"authorName" : "Rocky Balboa",
"famousQuote" : "It's not about how hard you can hit; it's about how hard you can get hit and keep moving forward.",
"id" : "2",
"uploadedBy" : "Admin"
},
"-L7GP9oBv5NR1T6HlDd4" : {
"authorName" : "African proverb",
"famousQuote" : "If you want to go fast, go alone. If you want to go far, go together.",
"id" : "3",
"uploadedBy" : "Admin"
},
"-L7GPjM1F3_7Orcz0Q1q" : {
"authorName" : "A.P.J Abdul Kalam",
"famousQuote" : "Don’t take rest after your first victory because if you fail in second, more lips are waiting to say that your first victory was just luck.",
"id" : "4",
"uploadedBy" : "Admin"
},
Below is the rule which I'm using for quotes
"quotes": {
".indexOn": ".value"
}
How can I get the quotes which have id 2, 3 and 4?

If you have more than 4 records in your database, you can combine the startAt() and endAt() methods to limit both ends of your query, like this:
DatabaseReference rootRef = FirebaseDatabase.getInstance().getReference();
Query query = rootRef.child("quotes").orderByChild("id").startAt("2").endAt("4");
query.addListenerForSingleValueEvent(/* ... */);
Here is more information about the Firebase Query startAt() method:
Create a query constrained to only return child nodes with a value greater than or equal to the given value, using the given orderBy directive or priority as default.
And here is more information about the Firebase Query endAt() method:
Create a query constrained to only return child nodes with a value less than or equal to the given value, using the given orderBy directive or priority as default.
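One thing worth checking with a range like this: the id values in the question are strings, and string values are ordered lexicographically rather than numerically. The plain-Java sketch below only imitates that comparison with String.compareTo (the class and method names are invented for illustration, and the extra ids "5", "10" and "25" are hypothetical data):

```java
import java.util.List;
import java.util.stream.Collectors;

final class StringRangeSketch {
    // Keeps ids with start <= id <= end under String ordering, which is how a
    // string-valued startAt/endAt range is assumed to compare values here.
    static List<String> between(List<String> ids, String start, String end) {
        return ids.stream()
                .filter(id -> id.compareTo(start) >= 0 && id.compareTo(end) <= 0)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // prints [2, 25, 3, 4]
        System.out.println(between(List.of("1", "2", "3", "4", "5", "10", "25"), "2", "4"));
    }
}
```

With only single-digit ids the range behaves as expected, but an id such as "25" would also fall between "2" and "4". Storing the id as a number, or zero-padding it, avoids that surprise.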
Edit: According to your comment, if you only want the items that have the id property set to 2, 3 and 4, you can use nested queries like this (the listener bodies are abbreviated):
Query queryTwo = rootRef.child("quotes").orderByChild("id").equalTo("2");
queryTwo.addListenerForSingleValueEvent(
List<Item> list = new ArrayList<>();
list.add(itemTwo);
Query queryThree = rootRef.child("quotes").orderByChild("id").equalTo("3");
queryThree.addListenerForSingleValueEvent(
list.add(itemThree);
Query queryFour = rootRef.child("quotes").orderByChild("id").equalTo("4");
queryFour.addListenerForSingleValueEvent(
list.add(itemFour);
//Do what you need to do with the list that contains three items
);
);
);

Spring data - Count distinct items from grouping

I have a database of user visits to places, containing place_id and user_id, like this:
{place_id : 1, user_id : 1}
{place_id : 1, user_id : 1}
{place_id : 1, user_id : 2}
{place_id : 2, user_id : 3}
{place_id : 2, user_id : 3}
I want to get the number of distinct users in each place. I ended up with the following native Mongo aggregation:
db.collection.aggregate([{
$group: {
_id: "$place_id",
setOfUsers: {
$addToSet: "$user_id"
}
}
}, {
$project: {
distinctUserCount: {
$size: "$setOfUsers"
}
}
}])
Now I want to implement it using Spring Data. The problem is the $size operation in the projection, since the Spring Data API does not seem to have one; at least I haven't found it in the reference.
GroupOperation group = Aggregation.group("place_id").addToSet("user_id").as("setOfUsers");
ProjectionOperation project = Aggregation.project(). .... ?
Maybe there is a way to create the size field as well, so that the nested API can be used:
Aggregation.project().and("distinctUserCount").nested( ???);
Any help is appreciated.

I am going to answer this in "one hit", so rather than address your "$project" issue, I'm going to advise here that there is a better approach.
The $addToSet operator will create a "unique" array ( or "set" ) of the elements you ask to add to it. It is however basically another form of $group in itself, with the difference being the elements are added to an "array" ( or "set" ) in results.
This is "bad" for scalability, as the potential problem here is that the "set" may eventually exceed the BSON limit for document size. Maybe it does not right now, but who knows what the code you write today will be doing in ten years' time.
Therefore, since $group is really the same thing, and you need "two" pipeline stages to get the "distinct" count anyway, just use "two" $group stages instead:
Aggregation pipeline = newAggregation(
group(fields("place_id","user_id")),
group("_id.place_id").count().as("distinctUserCount")
);
Being the shell equivalent of:
[
{ "$group": {
"_id": { "place_id": "$place_id", "user_id": "$user_id" }
}},
{ "$group": {
"_id": "$_id.place_id",
"distinctUserCount": { "$sum": 1 }
}}
]
This is simple code and it is much more "scalable", as the individual "user_id" values are at first contained in separate documents in the pipeline. The "second" $group ( in place of a $project with $size ) then "counts" the distinct values that were already determined by the first grouping key.
Learn the limitations and pitfalls, and code well.
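The same two-stage idea can be sketched in plain Java streams, which may help when reasoning about what each $group stage contributes. This is only an in-memory illustration using the sample visits from the question; DistinctCountSketch and distinctUsersPerPlace are invented names, and each visit is represented as an int[]{place_id, user_id} for brevity.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class DistinctCountSketch {
    // Returns place_id -> number of distinct user_id values.
    static Map<Integer, Long> distinctUsersPerPlace(List<int[]> visits) {
        return visits.stream()
                // Stage 1: group on the (place_id, user_id) pair, i.e. drop duplicate pairs.
                .map(v -> List.of(Integer.valueOf(v[0]), Integer.valueOf(v[1])))
                .distinct()
                // Stage 2: group on place_id alone and count the surviving pairs.
                .collect(Collectors.groupingBy(p -> p.get(0), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<int[]> visits = List.of(
                new int[]{1, 1}, new int[]{1, 1}, new int[]{1, 2},
                new int[]{2, 3}, new int[]{2, 3});
        // prints {1=2, 2=1}
        System.out.println(distinctUsersPerPlace(visits));
    }
}
```

No per-place set of users is ever materialised in a single value, which mirrors why the two-$group pipeline avoids building a large array inside one document.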

How do I check if a MongoDB object exists and create/update respectively?

I am developing a wireless network survey tool built with Java (Swing GUI) and a MongoDB data storage solution. I am new to MongoDB and hardly a Java guru so I need some help. I want to find if a network exists in my database and append heard points to the network document. If the network doesn't exist, I would like to create a document for that network and add the heard points. I have been trying to fix this for days but I just can't seem to wrap my head around the solution. Also, it would be nice if the BSSID was the unique id so I don't get any duplicate networks. My ideal data structure would look something like this:
{ 'bssid' : 'ca:fe:de:ad:be:ef',
'channel' : 6,
'heardpoints' : {
'point' : { 'lat' : 36.12345, 'long' : -75.234564 },
'point' : { 'lat' : 36.34567, 'long' : -75.345678 }
}
This is what I have tried so far. It seems to add the initial point but it does not add additional points after the first one was made.
BasicDBObject query = new BasicDBObject();
query.put("bssid", pkt[1]);
DBCursor cursor = coll.find(query);
if (!cursor.hasNext()) {
// Document doesnt exist so create one
BasicDBObject document = new BasicDBObject();
document.put("bssid", pkt[1]);
BasicDBObject heardpoints = new BasicDBObject();
BasicDBObject point = new BasicDBObject();
point.put("lat", latitude);
point.put("long", longitude);
heardpoints.put("point", point);
document.put("heardpoints", heardpoints);
coll.insert(document);
} else {
// Document exists so we will update here
DBObject network = cursor.next();
BasicDBObject heardpoints = new BasicDBObject();
BasicDBObject point = new BasicDBObject();
point.put("lat", latitude);
point.put("long", longitude);
heardpoints.put("point", point);
network.put("heardpoints", heardpoints);
coll.save(network);
}
I feel like I am way off the reservation on this one. Any support would help, thanks a lot!
UPDATE
I am using the upsert suggestion but still having some issue. No doubt this will work for me, I am just not doing it correctly. I am still not getting any new points past the first one added.
BasicDBObject query = new BasicDBObject("bssid", pkt[1]);
System.out.println(query);
DBCursor cursor = coll.find(query);
System.out.println(cursor);
try {
DBObject network = cursor.next();
System.out.println(network);
network.put("heardpoints", new BasicDBObject("point",
new BasicDBObject("lat", latitude)
.append("long", longitude)));
coll.update(query, network, true, false);
} catch (NoSuchElementException ex) {
System.err.println("mongo error");
} finally {
cursor.close();
}

You've got two ways to address this really, it just depends on how you actually want to use the data. In either case the first thing to address is your "ideal data structure", and mostly because it is invalid. This is the wrong part:
'heardpoints' : {
'point' : { 'lat' : 36.12345, 'long' : -75.234564 },
'point' : { 'lat' : 36.34567, 'long' : -75.345678 }
}
So this "hash/map" is invalid because you have the same "key" named twice. You cannot do that, and you probably want an "array" instead, as well as something that you have a hope of using GeoSpatial queries on later when you want to:
Array Approach
"heardpoints": [
{
"geometry": {
"type": "Point",
"coordinates": [-75.234564, 36.12345 ]
},
"time": ISODate("2014-11-04T21:09:18.437Z")
},
{
"geometry": {
"type": "Point",
"coordinates": [ -75.345678, 36.34567 ]
},
"time": ISODate("2014-11-04T21:10:28.919Z")
}
]
Note the ordering of the coordinates: longitude first, then latitude, which is how MongoDB and the GeoJSON spec it follows do it.
Now this is the form where you keep all of your "heardpoints" data in a "single document" per "bssid" value, with each location kept in an array. Note that this is not really an "upsert" per se, except in the first creation instance. The main intent is to "update" the same "bssid" value document. Just in shell form now, with a Java syntax translation later:
db.collection.update(
{ "bssid": "ca:fe:de:ad:be:ef" },
{
"$setOnInsert": { "channel": 6 },
"$push": {
"heardpoints": {
"$each": [{
"geometry": {
"type": "Point",
"coordinates": [-75.234564, 36.12345 ]
},
"time": ISODate("2014-11-04T21:09:18.437Z")
}],
"$sort": { "time": -1 },
"$slice": 20
}
}
},
{ "upsert": true }
);
Whatever the language and API representation, there are basically two parts to a MongoDB update operation. Essentially this:
[ < Query >, < Update > ]
Depending on the API presentation there are technically "three" parts, where the third is Options. But before considering the "upsert" option, it is important to understand how both the Query and Update document portions are handled in an update operation.
The most important thing to know about the Update document is that it has two forms. If you just supply "keys" and "values" in a standard object form, then whatever is supplied will "overwrite" any existing content in a matched document. The other form (which will be used in all examples) is to use "update operators", which allow "parts" of the document to be modified or "augmented". That is an important distinction. But on with the examples.
On a blank collection or at least one where the specified "bssid" value does not exist, then a new document would be created containing that "bssid" field value. Additionally there is some other behavior that is going to happen.
There is a special "update operator" in here called $setOnInsert. Just like the conditions specified in the Query portion of the statement, any fields and values mentioned here are only "created" in the document when a "new" document is inserted. So if the document matching the query condition was found then none of the operations here are actually performed to change the found document. This is a good place to set initial values and also limit the write activity on the document to just the fields where it is required.
The second section in the Update document is another "update operator" called $push. As expected from the common term in computing languages, this "adds items" to an "array". So on document creation a new array is made, and otherwise the items are appended to the "existing" array content in the found document.
There are some interesting modifiers here which have their own purpose. $each is a modifier that allows more than one item to be sent to an operator like $push at a time. We are only using it for a single item, but its use is generally required with the other two modifiers we are interested in.
The next is $sort, which is applied to the array elements present in the document in order to "sort" them by the condition. In this case there is a "time" field on the array elements, so the "sort" makes sure that as new elements are added the contents of the array are always ordered so that the "newest" entries are at the front of the array.
The final one is $slice, which complements $sort by essentially specifying a "capped amount" for the array. So to make sure our documents never get too large, the $slice modifier, which is applied "after" the $sort modifier has done its work, "removes" any entries beyond the specified "maximum" and maintains the array length at that number. So quite a useful feature.
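The combined effect of the three modifiers can be modelled in a few lines of plain Java. This is only an in-memory sketch of the order of operations described above (append, then sort newest first, then cap); array elements are reduced to bare timestamps, and the names CappedPushSketch and pushSortSlice are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class CappedPushSketch {
    // Models $push with $each, $sort: { time: -1 } and $slice: cap on a list of timestamps.
    static List<Long> pushSortSlice(List<Long> existing, List<Long> each, int cap) {
        List<Long> result = new ArrayList<>(existing);
        result.addAll(each);                     // $each: append all supplied items
        result.sort(Comparator.reverseOrder());  // $sort: newest (largest time) first
        // $slice: keep at most `cap` elements from the front of the sorted array
        return new ArrayList<>(result.subList(0, Math.min(cap, result.size())));
    }

    public static void main(String[] args) {
        // The oldest entry (10) falls off the end once the cap of 3 is reached.
        // prints [40, 30, 20]
        System.out.println(pushSortSlice(List.of(30L, 20L, 10L), List.of(40L), 3));
    }
}
```

Because the sort runs before the slice, it is always the oldest entries that are discarded, which is what makes the cap behave like a "most recent N" window.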
Of course if you did not care about a "time" value then there is another way to handle this so that the "coordinate" data is only kept for "unique" combinations. That way is to use the $addToSet operator to manage array or "set" entries by itself:
db.collection.update(
{ "bssid": "ca:fe:de:ad:be:ef" },
{
"$setOnInsert": { "channel": 6 },
"$addToSet": {
"heardpoints": {
"$each": [{
"geometry": {
"type": "Point",
"coordinates": [-75.234564, 36.12345 ]
}
}]
}
}
},
{ "upsert": true }
);
Now that does not actually need the $each modifier, but it's just left there for a future point. $addToSet essentially looks at the existing array content and compares it to the element you have supplied. Where that data does not exactly match something already present in the array, it is added to the "set". Otherwise, nothing happens since the data is already there.
So if you just want the data collected for specific points where they vary then this is a good approach. But there is a "catch", and a couple actually that are worth mentioning.
Suppose you want to keep only 20 entries as was mentioned before. While $addToSet supports the $each modifier, unfortunately the other modifiers such as $slice are not supported. So you can't "maintain a cap" with a single update statement, and you would in fact have to issue "two" update operations in order to achieve this:
db.collection.update(
{ "bssid": "ca:fe:de:ad:be:ef" },
{
"$setOnInsert": { "channel": 6 },
"$addToSet": {
"heardpoints": {
"$each": [{
"geometry": {
"type": "Point",
"coordinates": [-75.234564, 36.12345 ]
}
}]
}
}
},
{ "upsert": true }
);
db.collection.update(
{ "bssid": "ca:fe:de:ad:be:ef" },
{
"$setOnInsert": { "channel": 6 },
"$push": {
"heardpoints": {
"$each": [],
"$slice": 20
}
}
}
)
But even so we have a new problem here. Aside from now counting in "two" operations, keeping this cap has another problem, which basically is that a "set" is "not ordered" in any way. So you can limit the total number of items in the list with the second update, but there is no way to remove the "oldest" item for example.
In order to do this then you want a "time" field for the "last update", but yes there is a catch again. Once you supply a "time" value then the "distinct data" that makes a "set" is no longer true. An $addToSet operation considers the following to be two "different" entries as all fields and not just the "coordinate" data is considered:
"heardpoints": [
{
"geometry": {
"type": "Point",
"coordinates": [-75.234564, 36.12345 ]
},
"time": ISODate("2014-11-04T21:09:18.437Z")
},
{
"geometry": {
"type": "Point",
"coordinates": [-75.234564, 36.12345 ]
},
"time": ISODate("2014-11-04T21:10:28.919Z")
}
]
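The whole-element comparison can be illustrated with a simplified in-memory model in plain Java. It uses Map equality as a stand-in for document comparison, which is an assumption of this sketch (it ignores details such as field order), and the names AddToSetSketch, addToSet and point are invented.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class AddToSetSketch {
    // Models $addToSet: append only when no existing element equals the whole new element.
    static boolean addToSet(List<Map<String, Object>> array, Map<String, Object> element) {
        if (array.contains(element)) {
            return false; // already in the "set", nothing happens
        }
        array.add(element);
        return true;
    }

    // Hypothetical helper: a heard point, optionally carrying a "time" field.
    static Map<String, Object> point(double lon, double lat, Long time) {
        Map<String, Object> p = new HashMap<>();
        p.put("coordinates", List.of(lon, lat));
        if (time != null) {
            p.put("time", time);
        }
        return p;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> withoutTime = new ArrayList<>();
        addToSet(withoutTime, point(-75.234564, 36.12345, null));
        addToSet(withoutTime, point(-75.234564, 36.12345, null));
        System.out.println(withoutTime.size()); // same coordinates, no time: one entry

        List<Map<String, Object>> withTime = new ArrayList<>();
        addToSet(withTime, point(-75.234564, 36.12345, 1L));
        addToSet(withTime, point(-75.234564, 36.12345, 2L));
        System.out.println(withTime.size()); // same coordinates, different time: two entries
    }
}
```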
Where the intent is to just "update the time" on the existing point at the given coordinates, then you need to take a different approach. But again this is two updates and in reverse, you try to update a document first and then do something else if that does not succeed. Meaning the "upsert" attempt is the second operation:
var result = db.collection.update(
{
"bssid": "ca:fe:de:ad:be:ef",
"heardpoints.geometry.coordinates": [-75.234564, 36.12345 ]
},
{
"$set": {
"heardpoints.$.time": ISODate("2014-11-04T21:10:28.919Z")
}
}
);
// If result did not match and modify anything existing then perform the upsert
if ( result.nMatched === 0 ) {
db.collection.update(
{ "bssid": "ca:fe:de:ad:be:ef" }, // just this key and not the array
{
"$setOnInsert": { "channel": 6 },
"$push": {
"heardpoints": {
"$each": [{
"geometry": {
"type": "Point",
"coordinates": [-75.234564, 36.12345 ]
},
"time": ISODate("2014-11-04T21:09:18.437Z")
}],
"$sort": { "time": -1 },
"$slice": 20
}
}
},
{ "upsert": true }
);
}
So there are two operations, where the first tries to "update" an existing array entry by querying for that position. That first operation cannot be an upsert, since it would create a new document with the same "bssid" and the array entry that was not found. Even if that were desirable, it is not allowed with the positional $ operator, which relies on the matched position of the found element so that the element can be altered via the $set operator.
In the Java invocation there is a WriteResult type that is returned which can be used like this:
WriteResult writeResult = collection.update(query1, update1, false, false);
if ( writeResult.getN() == 0 ) {
// Upsert would be tried if the array item was not found
writeResult = collection.update(query2, update2, true, false);
}
If something was not updated then the serialized content looks like this:
{ "serverUsed" : "192.168.2.3:27017" , "ok" : 1 , "n" : 0 , "updatedExisting" : false}
Which means you basically test the n value to see what happened, and make your decision on whether to "update" the array item or "push" a new one depending on whether the query matched that array item or not.
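The control flow of this "update first, upsert second" pattern can be sketched without a database. The plain-Java model below keeps networks in a map and returns an n-style count from the first step; it is only an illustration of the decision logic, and every name in it (UpdateThenUpsertSketch, Point, updateTime, pushPoint, record) is hypothetical, as is the choice to store coordinates as a single string.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class UpdateThenUpsertSketch {
    // One heard point: coordinates as "lon,lat" text plus a last-seen time.
    static final class Point {
        final String coords;
        long time;
        Point(String coords, long time) { this.coords = coords; this.time = time; }
    }

    // bssid -> heardpoints array of that network document
    final Map<String, List<Point>> networks = new HashMap<>();

    // Step 1: like the positional $set; returns n, the number of documents matched.
    int updateTime(String bssid, String coords, long time) {
        List<Point> points = networks.get(bssid);
        if (points == null) {
            return 0;
        }
        for (Point p : points) {
            if (p.coords.equals(coords)) {
                p.time = time;
                return 1;
            }
        }
        return 0;
    }

    // Step 2: like the upsert with $push; creates the network document when missing.
    void pushPoint(String bssid, String coords, long time) {
        networks.computeIfAbsent(bssid, k -> new ArrayList<>()).add(new Point(coords, time));
    }

    // Only falls back to the upsert when the first step matched nothing.
    void record(String bssid, String coords, long time) {
        if (updateTime(bssid, coords, time) == 0) {
            pushPoint(bssid, coords, time);
        }
    }

    public static void main(String[] args) {
        UpdateThenUpsertSketch s = new UpdateThenUpsertSketch();
        s.record("ca:fe:de:ad:be:ef", "-75.234564,36.12345", 1L); // new network, point pushed
        s.record("ca:fe:de:ad:be:ef", "-75.234564,36.12345", 2L); // same point, time updated
        s.record("ca:fe:de:ad:be:ef", "-75.345678,36.34567", 3L); // new point pushed
        System.out.println(s.networks.get("ca:fe:de:ad:be:ef").size());
    }
}
```

Against a real server the two steps are separate round trips, so another writer can slip in between them; this in-memory sketch does not model that.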
Document Approach
The general conclusion from the above is that where you want to keep distinct data for the "coordinates" and just modify a "time" entry then the above process can get messy. The operations are not ideally atomic, and though there can be some tuning, it is probably not well suited to high volume updates.
This is a case then where the logic is to "remove" the array storage, and store each distinct "point" in its own document with the related "bssid" field. This simplifies the choice between updating and "inserting" a new one into a single operation model. Documents in the collection now look like this:
{
"bssid": "ca:fe:de:ad:be:ef",
"channel": 6,
"geometry": {
"type": "Point",
"coordinates": [-75.234564, 36.12345 ]
},
"time": ISODate("2014-11-04T21:09:18.437Z")
},
{
"bssid": "ca:fe:de:ad:be:ef",
"channel": 6,
"geometry": {
"type": "Point",
"coordinates": [ -75.345678, 36.34567 ]
},
"time": ISODate("2014-11-04T21:10:28.919Z")
}
Distinct in their own collection and not bound in the same document under an array. There is data duplication but the "update" process is now much simplified:
db.collection.update(
{
"bssid": "ca:fe:de:ad:be:ef",
"geometry": {
"type": "Point",
"coordinates": [-75.234564, 36.12345 ]
}
},
{
"$setOnInsert": { "channel": 6 },
"$set": { "time": ISODate("2014-11-04T21:10:28.919Z") }
},
{ "upsert": true }
)
All that does is match a document based on the supplied "bssid" and "point" values, either "updating" the "time" where it matched or inserting a new document with all values where that "bssid" and "point" data was not found.
The overall case is that where this started off with simple needs it was fine to "embed" the array in the document, but with more complex needs that storage form can become a pain to maintain. On the other hand, using separate documents in the collection has its benefits, but then you do have to do your own work to "clean up" entries beyond any cap limits you might want. But it is arguable that this does not necessarily need to be a "real time" operation.
Different approaches, so work with the one that suits you best. This is just a guide to implement in either way and showing the pitfalls and solutions. What works best for you, only you can tell.
This really is more about the technique than the specific Java coding. That part is not hard, so here is just some of the most difficult structure from above for reference:
DBObject update = new BasicDBObject(
"$setOnInsert", new BasicDBObject(
"channel", 6
)
).append(
"$push", new BasicDBObject(
"heardpoints", new BasicDBObject(
"$each", new DBObject[]{
new BasicDBObject(
"geometry",
new BasicDBObject("type","Point").append(
"coordinates", new double[]{-75.234564, 36.12345}
)
).append(
"time", new DateTime(2014,1,1,0,0,DateTimeZone.UTC).toDate()
)
}
).append(
"$sort", new BasicDBObject(
"time", -1
)
).append("$slice", 20)
)
);

OR and AND Operators in Elasticsearch query

I have a few JSON documents with the following format:
_source: {
userId: "A1A1",
customerId: "C1",
component: "comp_1",
timestamp: 1408986553,
}
I want to query the documents based on the following:
((userId == currentUserId) OR (customerId == currentCustomerId) OR (currentRole == ADMIN)) AND (component == currentComponent)
I tried using the SearchSourceBuilder and QueryBuilders.matchQuery, but I wasn't able to put multiple sub queries with AND and OR operators.
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(QueryBuilders.matchQuery("userId",userId)).sort("timestamp", SortOrder.DESC).size(count);
How do we query Elasticsearch using OR and AND operators?

I think in this case the Bool query is the best shot.
Something like :
{
"bool" : {
"must" : { "term" : { "component" : "comp_1" } },
"should" : [
{ "term" : { "userId" : "A1A1" } },
{ "term" : { "customerId" : "C1" } },
{ "term" : { "currentRole" : "ADMIN" } }
],
"minimum_should_match" : 1
}
}
Which gives in Java:
QueryBuilder qb = QueryBuilders
.boolQuery()
.must(termQuery("component", currentComponent))
.should(termQuery("userId", currentUserId))
.should(termQuery("customerId", currentCustomerId))
.should(termQuery("currentRole", ADMIN))
.minimumNumberShouldMatch(1);
The must parts are ANDs, the should parts are more or less ORs, except that you can specify a minimum number of shoulds to match (using minimum_should_match). When the bool query also contains a must clause, as here, this minimum defaults to 0, meaning that a document matching no should condition would be returned as well, so set it to 1 explicitly. It defaults to 1 only when there is no must or filter clause.
If you want to do more complex queries involving nested ANDs and ORs, simply nest other bool queries inside must or should parts.
Also, as you're looking for exact values (ids and so on), maybe you can use term queries instead of match queries, which spare you the analysis phase (if those fields are analyzed at all, which doesn't necessarily make sense for ids). If they are analyzed, you still can do that, but only if you know exactly how your terms are stored (standard analyzer stores them lower cased for instance).
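As a sanity check on the logic, the bool query above corresponds to the following plain-Java predicate. This is only an illustration of how must, should and minimum_should_match combine, not Elasticsearch client code; the class and parameter names are made up, and it treats the ADMIN check as a property of the caller (as in the question) rather than as a document field, which is an assumption.

```java
final class BoolQuerySketch {
    // must: component matches; should: any of the three conditions; minimum_should_match: 1
    static boolean matches(String docUserId, String docCustomerId, String docComponent,
                           String currentUserId, String currentCustomerId,
                           boolean callerIsAdmin, String currentComponent) {
        boolean must = docComponent.equals(currentComponent);

        int shouldMatched = 0;
        if (docUserId.equals(currentUserId)) shouldMatched++;
        if (docCustomerId.equals(currentCustomerId)) shouldMatched++;
        if (callerIsAdmin) shouldMatched++;

        // Every must clause has to hold, plus at least one should clause.
        return must && shouldMatched >= 1;
    }

    public static void main(String[] args) {
        // Same user, same component
        System.out.println(matches("A1A1", "C1", "comp_1", "A1A1", "C9", false, "comp_1"));
        // Nothing in the should group matches
        System.out.println(matches("A1A1", "C1", "comp_1", "X", "Y", false, "comp_1"));
    }
}
```

Since the role does not depend on the document, a real implementation could just as well drop the should group entirely when the caller is an admin and send only the component condition.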

If you use a query_string query, your ANDs and ORs will be interpreted as such by the Lucene library.
This allows you to search for
(currentUserId OR currentCustomerId) AND currentComponent
for instance. By default, the values will be searched for in all fields.

Update / delete multiple objects using Jongo

I have a method which takes in a Collection of Objects that are to be deleted.
This is the way I am deleting them now
public void deleteAll(Collection<Object> objs){
for(Object obj : objs) {
collection.remove("{ _id: # }", obj.getId());
}
}
I am doing something very similar for update where I am looping through the passed collection of Objects. This seems to be very time consuming.
Is there a better way of doing the update/delete?

It's possible to both remove and update multiple documents with a single query.
remove
You need to use a query with a selector using $in, and an array of _id values to match.
With Jongo, you can build the list to match with $in into the query in a couple of different ways:
// pass an array of ids
ObjectId[] ids = {id1, id2, id3};
collection.remove("{ _id: { $in: # } }", ids);
// or pass each id separately
collection.remove("{ _id: { $in:[#, #, #] }}", id1, id2, id3);
update
This is the exact same concept as above, using $in to select the objects you want to update. However, you also have to set the multi option, so that the update applies to all the documents it matches, not just the first.
With Jongo this is done like so:
ObjectId[] ids = {id1, id2, id3};
collection
.update("{ _id: { $in: # } }", ids)
.multi()
.with("{ $set: { foo: 'bar' } }");
