I am trying to upsert millions of documents using BulkWriteOperation, but my code throws an exception whenever my query condition is not satisfied but a document with that id already exists.
Here is my code:
if (provisionSubscriberList.size() > 0) {
    Map<String, Object> map = new HashMap<String, Object>();
    map.put("id", campaignTO.getId());
    map.put("testSample", false);
    map.put("status", "Active");
    map.put("controlGroup", false);
    try {
        WriteConcern wc = WriteConcern.ACKNOWLEDGED;
        BulkWriteOperation bulk = mongoTemplate.getCollection("provisionSubscriber").initializeOrderedBulkOperation();
        for (ProvisionSubscriberEntity provisionalSubscriber : provisionSubscriberList) {
            Query queryForAddSubscriber = new Query();
            Update updateFieldsForAddSubscriber = new Update();
            updateFieldsForAddSubscriber.set("msisdn", provisionalSubscriber.getMsisdn());
            updateFieldsForAddSubscriber.set("deviceType", provisionalSubscriber.getDeviceType());
            updateFieldsForAddSubscriber.addToSet("campaignIdList", map);
            List<DBObject> criteria = new ArrayList<DBObject>();
            criteria.add(new BasicDBObject("_id", new ObjectId(provisionalSubscriber.getId())));
            criteria.add(new BasicDBObject("campaignIdList.id", new BasicDBObject("$ne", campaignTO.getId())));
            criteria.add(new BasicDBObject("campaignIdList.controlGroup", new BasicDBObject("$ne", true)));
            criteria.add(new BasicDBObject("campaignIdList.status", new BasicDBObject("$ne", "Active")));
            BasicDBObject queryCriteria = new BasicDBObject("$and", criteria);
            bulk.find(queryCriteria).upsert().updateOne(updateFieldsForAddSubscriber.getUpdateObject());
        }
        BulkWriteResult results = bulk.execute(wc);
        System.out.println(results);
        for (BulkWriteUpsert up : results.getUpserts()) {
            System.out.println(up.getId());
        }
    } catch (BulkWriteException e) {
        e.printStackTrace();
    }
}
And here is the exception I am getting:
com.mongodb.BulkWriteException: Bulk write operation error on server 192.168.1.113:27017. Write errors: [BulkWriteError{index=0, code=11000, message='E11000 duplicate key error index: jmailer_digiengage.provisionSubscriber.$_id_ dup key: { : ObjectId('58c8f33301de9614143f5812') }', details={ }}].
at com.mongodb.BulkWriteHelper.translateBulkWriteException(BulkWriteHelper.java:56)
at com.mongodb.DBCollection.executeBulkWriteOperation(DBCollection.java:2310)
at com.mongodb.BulkWriteOperation.execute(BulkWriteOperation.java:136)
at com.lumatadigital.digiengage.daoImpl.ProvisioningDaoImpl.provisionOnCampaign(ProvisioningDaoImpl.java:120)
at com.lumatadigital.digiengage.schedular.service.SchedularJobConfig.provisioningJob(SchedularJobConfig.java:29)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.springframework.util.MethodInvoker.invoke(MethodInvoker.java:269)
at org.springframework.scheduling.quartz.MethodInvokingJobDetailFactoryBean$MethodInvokingJob.executeInternal(MethodInvokingJobDetailFactoryBean.java:257)
at org.springframework.scheduling.quartz.QuartzJobBean.execute(QuartzJobBean.java:75)
at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557)
EDIT: Basically, I want to insert the data if no document exists, update it if a document exists and satisfies my query, and otherwise skip that document. I also want to track the upserted documents.
This happens because of your query criteria:
List<DBObject> criteria = new ArrayList<DBObject>();
criteria.add(new BasicDBObject("_id",new ObjectId(provisionalSubscriber.getId())));
criteria.add(new BasicDBObject("campaignIdList.id", new BasicDBObject("$ne", campaignTO.getId())));
criteria.add(new BasicDBObject("campaignIdList.controlGroup", new BasicDBObject("$ne", true)));
criteria.add(new BasicDBObject("campaignIdList.status", new BasicDBObject("$ne", "Active")));
BasicDBObject queryCriteria = new BasicDBObject("$and", criteria);
The _id is already in the database from the initial insert. When the update statement runs the next time, the $ne (not equal to) criteria on the campaign list no longer match the existing document, so MongoDB treats the operation as an upsert and tries to insert a new document with the same _id instead of updating the existing one.
Hence you are getting the below error:
E11000 duplicate key error index: jmailer_digiengage.provisionSubscriber.$_id_ dup key: { : ObjectId('58c8f33301de9614143f5812') }
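To see this mechanism in isolation, here is a minimal sketch (the collection name and campaign field are illustrative, not from your code) using the 3.x MongoCollection API. The first upsert inserts the document; on the second run the $ne filter no longer matches, so the server attempts a fresh insert reusing the same _id and fails with E11000:
import com.mongodb.MongoClient;
import com.mongodb.MongoWriteException;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.UpdateOptions;
import org.bson.Document;
import org.bson.types.ObjectId;

MongoClient client = new MongoClient("localhost", 27017);
MongoCollection<Document> coll = client.getDatabase("test").getCollection("upsertDemo");
ObjectId id = new ObjectId();
UpdateOptions upsert = new UpdateOptions().upsert(true);
Document filter = new Document("_id", id).append("campaignId", new Document("$ne", "A"));
Document update = new Document("$set", new Document("campaignId", "A"));

// 1st run: nothing matches, so the upsert inserts { _id: id, campaignId: "A" }.
coll.updateOne(filter, update, upsert);

// 2nd run: the document exists but campaignId == "A", so the $ne filter matches
// nothing; the upsert then tries to INSERT a new document, reusing the _id from
// the equality part of the filter, and hits the duplicate key error.
try {
    coll.updateOne(filter, update, upsert);
} catch (MongoWriteException e) {
    System.out.println(e.getError().getMessage()); // E11000 duplicate key error ...
}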
To do a bulk update, you can use the code below:
MongoCollection<Document> collection = database.getCollection("collection");
List<WriteModel<Document>> updates = new ArrayList<WriteModel<Document>>();
UpdateOptions options = new UpdateOptions();
options.upsert(true);
// Doc1 update
Document doc1 = new Document("$set", new Document("key1", "value1"));
updates.add(new UpdateOneModel<Document>(new Document("_id",new ObjectId("562a44971bca3c0001953f42")), doc1, options));
//Doc2 update
Document doc2 = new Document("$set", new Document("key1", "value2"));
updates.add(new UpdateOneModel<Document>(new Document("_id",new ObjectId("562a44971bca3c0001954071")), doc2, options));
BulkWriteResult result = collection.bulkWrite(updates);
System.out.println("Updated count : " + result.getModifiedCount());
In the snippet below:
updates.add(new UpdateOneModel<Document>(new Document("_id",new ObjectId("562a44971bca3c0001954071")), doc2, options));
The first argument is the filter condition: you can use any key present in the doc to select the document you want to update. The second argument holds the fields that need to be updated for the doc. The third argument is the additional options that can be passed to the model (here, upsert(true)).
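To get the exact semantics asked for in the EDIT (insert if the document is missing, update only when the conditions match, otherwise skip, and track the upserts), one option is to queue two models per subscriber: an upsert keyed on _id alone, plus a conditional update without upsert. This is a sketch, reusing collection, map, campaignTO and provisionSubscriberList from the code above; because the conditional update does not upsert, a non-matching filter is simply a no-op instead of a duplicate-key insert:
import java.util.ArrayList;
import java.util.List;
import com.mongodb.bulk.BulkWriteResult;
import com.mongodb.bulk.BulkWriteUpsert;
import com.mongodb.client.model.UpdateOneModel;
import com.mongodb.client.model.UpdateOptions;
import com.mongodb.client.model.WriteModel;
import org.bson.Document;
import org.bson.types.ObjectId;

List<WriteModel<Document>> ops = new ArrayList<WriteModel<Document>>();
UpdateOptions upsert = new UpdateOptions().upsert(true);
for (ProvisionSubscriberEntity s : provisionSubscriberList) {
    ObjectId id = new ObjectId(s.getId());
    // 1) Upsert keyed on _id only: inserts the document if missing, otherwise
    //    just refreshes the plain fields. This can never violate the _id index.
    ops.add(new UpdateOneModel<Document>(
            new Document("_id", id),
            new Document("$set", new Document("msisdn", s.getMsisdn())
                    .append("deviceType", s.getDeviceType())),
            upsert));
    // 2) Conditional update WITHOUT upsert: adds the campaign entry only when
    //    the condition holds (append the other $ne clauses the same way); when
    //    the filter matches nothing, the model is silently skipped.
    ops.add(new UpdateOneModel<Document>(
            new Document("_id", id)
                    .append("campaignIdList.id", new Document("$ne", campaignTO.getId())),
            new Document("$addToSet", new Document("campaignIdList", new Document(map)))));
}
BulkWriteResult result = collection.bulkWrite(ops); // ordered by default
// Track the documents that were actually inserted (upserted):
for (BulkWriteUpsert up : result.getUpserts()) {
    System.out.println("upserted at index " + up.getIndex() + ": " + up.getId());
}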
Related
I am trying to do an unordered bulk insert, but I am getting a write error, while the same document works as a single insert.
final DBCollection col = db.getCollection(STUDENT);
BulkWriteOperation bulkop = col.initializeUnorderedBulkOperation();
for (...) { // loop elided in the original
    DBObject doc = new BasicDBObject();
    // putting values in doc
    bulkop.insert(doc);
}
bulkop.execute(WriteConcern.FSYNCED);
Here is the log:
Bulk write operation error on server IP:27017. Write errors: [com.mongodb.BulkWriteError@24cd7085].
com.mongodb.BulkWriteException: Bulk write operation error on server 192.168.50.166:27017. Write errors: [com.mongodb.BulkWriteError@24cd7085].
at com.mongodb.BulkWriteHelper.translateBulkWriteException(BulkWriteHelper.java:57)
at com.mongodb.DBCollection.executeBulkWriteOperation(DBCollection.java:2202)
at com.mongodb.DBCollection.executeBulkWriteOperation(DBCollection.java:2188)
at com.mongodb.BulkWriteOperation.execute(BulkWriteOperation.java:121)
I am using Mongo 3.0.4 Java Driver.
For the MongoDB Java driver 3.0.1 and above, you can use this logic:
MongoClient mongoClient = new MongoClient("localhost", 27017);
MongoDatabase database = mongoClient.getDatabase("testDB");
MongoCollection<Document> collection = database.getCollection("testColl").withWriteConcern(WriteConcern.FSYNCED);
BulkWriteOptions options = new BulkWriteOptions();
options.ordered(false);
List<InsertOneModel<Document>> insertList = new ArrayList<InsertOneModel<Document>>();
for (int i = 0; i < 1000; i++) {
    Document document = new Document().append("_id", i).append("testdat", "testvalue");
    insertList.add(new InsertOneModel<Document>(document));
}
collection.bulkWrite(insertList, options);
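The log in the question only shows com.mongodb.BulkWriteError@24cd7085 because the errors are printed with their default toString(). To see which document failed and why, you can catch MongoBulkWriteException and inspect the individual write errors (a sketch, reusing collection, insertList and options from above):
import com.mongodb.MongoBulkWriteException;
import com.mongodb.bulk.BulkWriteError;

try {
    collection.bulkWrite(insertList, options);
} catch (MongoBulkWriteException e) {
    for (BulkWriteError err : e.getWriteErrors()) {
        // index is the position of the failing InsertOneModel in insertList
        System.out.println("index=" + err.getIndex()
                + " code=" + err.getCode()
                + " message=" + err.getMessage());
    }
}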
I'm using mongo-java-driver 3.0.2.
I have a method that uses MongoCollection.aggregate(List<Bson> pipeline) to sort and limit:
private static MongoIterable<Document> selectTop(int n) {
BasicDBObject sortFields = new BasicDBObject("score", -1);
BasicDBObject sort = new BasicDBObject("$sort", sortFields);
BasicDBObject limit = new BasicDBObject("$limit", n);
List<BasicDBObject> pipeline = new ArrayList<>();
pipeline.add(sort);
pipeline.add(limit);
return playersCollection.aggregate(pipeline);
}
When n is big, it fails with:
com.mongodb.MongoCommandException: Command failed with error 16820: 'exception: Sort exceeded memory limit of 104857600 bytes, but did not opt in to external sorting. Aborting operation. Pass allowDiskUse:true to opt in.'
I've found that the MongoDB shell provides a db.collection.aggregate(pipeline, options) method, where options can contain an allowDiskUse field.
I can't find the equivalent to this in the Java API. Although there is an AggregationOptions class, the MongoCollection class doesn't provide an aggregate(List<Bson> pipeline, AggregationOptions options) method.
This still works on the 3.0.3 driver:
MongoClient client = new MongoClient(new ServerAddress("127.0.0.1", 27017));
DB test = client.getDB("test");
DBCollection sample = test.getCollection("sample");
List<DBObject> aggregationQuery = Arrays.<DBObject>asList(
new BasicDBObject("$sort",new BasicDBObject("score",-1)),
new BasicDBObject("$limit",1)
);
System.out.println(aggregationQuery);
Cursor aggregateOutput = sample.aggregate(
aggregationQuery,
AggregationOptions.builder()
.allowDiskUse(true)
.build()
);
while ( aggregateOutput.hasNext() ) {
DBObject doc = aggregateOutput.next();
System.out.println(doc);
}
Of course you can use the newer classes as well:
MongoClient client = new MongoClient(new ServerAddress("192.168.2.4", 27017));
MongoDatabase db = client.getDatabase("test");
MongoCollection<Document> collection = db.getCollection("sample");
AggregateIterable<Document> result = collection.aggregate(Arrays.asList(
new BasicDBObject("$sort", new BasicDBObject("score", -1)),
new BasicDBObject("$limit", 1)
)).allowDiskUse(true);
MongoCursor<Document> cursor = result.iterator();
while (cursor.hasNext()) {
Document doc = cursor.next();
System.out.println(doc);
}
So .aggregate() on MongoCollection returns an AggregateIterable instance, which has an .allowDiskUse() method as well as others to set aggregation options.
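For example, here is a sketch chaining a few of those setters (the values are illustrative; batchSize and maxTime are among the other options available on AggregateIterable in the 3.x driver):
import java.util.concurrent.TimeUnit;

AggregateIterable<Document> result = collection.aggregate(Arrays.asList(
        new BasicDBObject("$sort", new BasicDBObject("score", -1)),
        new BasicDBObject("$limit", 1)))
        .allowDiskUse(true)             // opt in to external sorting
        .batchSize(100)                 // documents per cursor batch
        .maxTime(30, TimeUnit.SECONDS); // server-side time limit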
Can anyone explain why, in Java, an aggregation pipeline with "$out" doesn't write the result to the new collection when I write only this:
Document match = new Document("$match", new Document("top_speed", new Document("$gte", 350)));
Document out = new Document("$out", "new_collection");
coll.aggregate(Arrays.asList(match, out));
When I save the aggregation result and iterate over it, the new collection is created and the result of the match is inside it (obviously something is wrong with the Java behaviour in this case):
AggregateIterable<Document> resultAgg = coll.aggregate(Arrays.asList(match, out));
for (Document doc : resultAgg) {
    System.out.println("The result of aggregation match:-" + doc.toJson());
}
I can't understand why.
You can call the toCollection() method instead of iterating. The aggregate() method returns a lazy AggregateIterable: the pipeline is only sent to the server once you start iterating the result, which is why the $out collection only appears after you loop over it. toCollection() executes the pipeline (which must end with a $out stage) immediately, without returning a cursor.
Document match = new Document("$match", new Document("top_speed", new Document("$gte", 350)));
Document out = new Document("$out", "new_collection");
coll.aggregate(Arrays.asList(match, out)).toCollection();
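After toCollection() returns, the results can be read from the output collection directly. A sketch, assuming db is the MongoDatabase that coll came from:
MongoCollection<Document> newColl = db.getCollection("new_collection");
System.out.println("documents written by $out: " + newColl.count());
for (Document doc : newColl.find()) {
    System.out.println(doc.toJson());
}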
I'm trying to query and sort documents as follows:
Query only for documents older than SOMETIME.
Within range of AROUNDME_RANGE_RADIUS_IN_RADIANS.
Get distance for each document.
Sort them by time. New to Old.
Overall it should return up to 20 results.
But it seems that since $geoNear is by default limited to 100 results, I get unexpected results.
I see $geoNear working in the following order:
Gets docs from the entire collection, by distance.
And only then executes the given Query.
Is there a way to reverse the order?
MongoDB v2.6.5
Java Driver v2.10.1
Thank you.
Example document in my collection:
{
"timestamp" : ISODate("2014-12-27T06:52:17.949Z"),
"text" : "hello",
"loc" : [
34.76701564815013,
32.05852053407342
]
}
I'm using aggregate since, from what I understand, it's the only way to sort by "timestamp" and also get the distance.
BasicDBObject query = new BasicDBObject("timestamp", new BasicDBObject("$lt", SOMETIME));

// aggregate: geoNear
double[] currentLoc = new double[] {
        Double.parseDouble(myLon),
        Double.parseDouble(myLat)
};
DBObject geoNearFields = new BasicDBObject();
geoNearFields.put("near", currentLoc);
geoNearFields.put("distanceField", "dis");
geoNearFields.put("maxDistance", AROUNDME_RANGE_RADIUS_IN_RADIANS);
geoNearFields.put("query", query);
//geoNearFields.put("num", 5000); // FIXME: a temp solution I would really like to avoid
DBObject geoNear = new BasicDBObject("$geoNear", geoNearFields);

// aggregate: sort by timestamp
DBObject sortFields = new BasicDBObject("timestamp", -1);
DBObject sort = new BasicDBObject("$sort", sortFields);

// aggregate: limit
DBObject limit = new BasicDBObject("$limit", 20);

AggregationOutput output = col.aggregate(geoNear, sort, limit);
You could add a $match stage at the top of the pipeline, to filter the documents before the $geoNear stage.
BasicDBObject match = new BasicDBObject("timestamp",
        new BasicDBObject("$lt", SOMETIME));
AggregationOutput output = col.aggregate(match, geoNear, sort, limit);
The piece of code below is then no longer required:
geoNearFields.put("query", query);
I have the following structure in my document:
{
_id : ObjectId("43jh4j343j4j"),
array : [
{
_arrayId : ObjectId("dsd87dsa9d87s9d7"),
someField : "something",
someField2 : "something2"
},
{
_arrayId : ObjectId("sds9a0d9da0d9sa0"),
someField : "somethingElse",
someField2 : "somethingElse2"
}
]
}
I want to update someField and someField2, but only for the array item that matches _arrayId (e.g. _arrayId : ObjectId("dsd87dsa9d87s9d7")), and only in this specific document (e.g. _id : ObjectId("43jh4j343j4j")), not in any other.
The arrayIds are not unique across documents, which is why I need to target a specific document. I could use the $ positional operator if I wanted to update that value within the array of every document it exists in, but that's not what I want.
I am trying to accomplish this in Java, but a command-line solution would work as well.
Here is RameshVel's solution translated to Java:
DB db = conn.getDB( "yourDB" );
DBCollection coll = db.getCollection( "yourCollection" );
ObjectId _id = new ObjectId("4e71b07ff391f2b283be2f95");
ObjectId arrayId = new ObjectId("4e639a918dca838d4575979c");
BasicDBObject query = new BasicDBObject();
query.put("_id", _id);
query.put("array._arrayId", arrayId);
BasicDBObject data = new BasicDBObject();
data.put("array.$.someField", "updated");
BasicDBObject command = new BasicDBObject();
command.put("$set", data);
coll.update(query, command);
You can still use the $ positional operator to accomplish this, but you need to specify the ObjectId of the parent doc along with the _arrayId filter. The command-line query below works fine:
db.so.update({_id:ObjectId("4e719eb07f1d878c5cf7333c"),
"array._arrayId":ObjectId("dsd87dsa9d87s9d7")},
{$set:{"array.$.someField":"updated"}})
...and this is how to do it with mongo-driver version >= 3.1 (mine is 3.2.2):
final MongoClient mongoClient = new MongoClient(new MongoClientURI(mongoURIString));
final MongoDatabase blogDatabase = mongoClient.getDatabase("yourDB");
MongoCollection<Document> postsCollection = blogDatabase.getCollection("yourCollection");
ObjectId _id = new ObjectId("4e71b07ff391f2b283be2f95");
ObjectId arrayId = new ObjectId("4e639a918dca838d4575979c");
Bson filter = Filters.and(Filters.eq("_id", _id), Filters.eq("array._arrayId", arrayId));
Bson setUpdate = Updates.set("array.$.someField", "updated");
postsCollection.updateOne(filter, setUpdate);
Seeing as none of the answers actually explain how to do this a) in Java and b) for multiple fields in a nested array item, here is the solution for mongo-java-driver 3.12.3.
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.Updates;
import org.bson.Document;
import org.bson.types.ObjectId;
MongoClient mongoClient = MongoClients.create(...);
MongoDatabase db = mongoClient.getDatabase("testDb");
MongoCollection<Document> collection = db.getCollection("testCollection");
collection.updateOne(
Filters.and(
Filters.eq("_id", new ObjectId("43jh4j343j4j")),
Filters.eq("array._arrayId", new ObjectId("dsd87dsa9d87s9d7"))
),
Updates.combine(
Updates.set("array.$.someField", "new value 1"),
Updates.set("array.$.someField2", "new value 2")
)
);
This thread has helped me towards the right solution, but I had to do more research for the full solution, so hoping that someone else will benefit from my answer too.
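To verify the update, you can read the document back and inspect the matched array element. A sketch using the same placeholder ids as above (they are not valid ObjectId hex strings; substitute real ones):
import com.mongodb.client.model.Projections;

Document updated = collection.find(
        Filters.and(
                Filters.eq("_id", new ObjectId("43jh4j343j4j")),
                Filters.elemMatch("array", Filters.eq("_arrayId", new ObjectId("dsd87dsa9d87s9d7")))))
        .projection(Projections.elemMatch("array")) // return only the matched array item
        .first();
System.out.println(updated.toJson());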