Why are upserts so slow with the MongoDB Java API?

Using Mongo Java Driver 2.13 and Mongo 3.0.
I am trying to move from Spring Data save() to the MongoDB API's bulk write operations, since I am saving/updating about 100K objects. I am writing the service/repository layer code so that I can pass in a Collection of my specific objects and either create new records or update existing ones, in other words upsert. When I do plain inserts the performance is quite acceptable.
If I change the code to do upserts, the performance is far too slow. Am I doing something wrong in the following code sample? (Note that it is scaled down to just the necessary logic, i.e. no error handling.)
public void save(Collection<MyDomainObject> objects) {
    BulkWriteOperation bulkWriter = dbCollection.initializeUnorderedBulkOperation();
    for (MyDomainObject mdo : objects) {
        DBObject dbObject = convert(mdo);
        bulkWriter.find(new BasicDBObject("id", mdo.getId()))
                .upsert().updateOne(new BasicDBObject("$set", dbObject));
    }
    bulkWriter.execute(writeConcern);
}
Note that I also tried replaceOne() instead of updateOne() with the same results.
I also noticed in the Mongo log that "nscannedObjects" keeps increasing while "nMatched", "nModified" and "upsert" are never larger than 1. Does this mean that it is table scanning for each record?
Am I using upsert the correct way? Any other suggestions?

Thanks to ry_donahue, I figured out the issue.
The query was not filtering on the correct ID field, i.e. the indexed one. The conversion of the domain object to a DBObject produced both an "id" and an "_id" field, and the upsert filter used "id", which has no index.
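To make the duplicate-field problem concrete, here is a minimal sketch of a conversion step that writes the domain id only to "_id". A plain Map stands in for DBObject (both are key/value documents), and the MyDomainObject fields and the convert method are hypothetical, since the question does not show the real class or converter.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ConvertSketch {
    // Hypothetical domain object; the question does not show its fields.
    static final class MyDomainObject {
        private final String id;
        private final String name;

        MyDomainObject(String id, String name) {
            this.id = id;
            this.name = name;
        }

        String getId() { return id; }
        String getName() { return name; }
    }

    // Builds the document to store. The id goes ONLY into "_id" (the field
    // MongoDB always indexes); no separate "id" key is written, so the upsert
    // filter on "_id" and the stored document refer to the same field.
    static Map<String, Object> convert(MyDomainObject mdo) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("_id", mdo.getId()); // real driver code would use new ObjectId(mdo.getId())
        doc.put("name", mdo.getName());
        return doc;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = convert(new MyDomainObject("5f1d7c2e9a0b1c2d3e4f5a6b", "widget"));
        System.out.println(doc);
    }
}
```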
I also changed updateOne() to replaceOne(). The code now looks like this:
public void save(Collection<MyDomainObject> objects) {
    BulkWriteOperation bulkWriter = dbCollection.initializeUnorderedBulkOperation();
    for (MyDomainObject mdo : objects) {
        DBObject dbObject = convert(mdo);
        // Filter on _id, which MongoDB always indexes, so each upsert is an index lookup
        bulkWriter.find(new BasicDBObject("_id", new ObjectId(mdo.getId())))
                .upsert().replaceOne(dbObject);
    }
    bulkWriter.execute(writeConcern);
}
This now gives very good performance.
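Why the filter field matters so much can be sketched with a toy in-memory model (plain Java, not the MongoDB driver; all names are hypothetical). It counts how many stored documents are examined while upserting n new documents one by one: filtering on an unindexed field walks the whole collection for every upsert, which is consistent with the ever-growing "nscannedObjects" in the log, while an indexed field costs one probe per upsert.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class UpsertScanModel {
    // Unindexed filter: each upsert scans every stored document before
    // deciding to insert, so the total examined grows roughly as n^2 / 2.
    static long upsertWithoutIndex(int n) {
        List<String> collection = new ArrayList<>();
        long scanned = 0;
        for (int i = 0; i < n; i++) {
            String key = "key-" + i;
            boolean found = false;
            for (String existing : collection) {
                scanned++;
                if (existing.equals(key)) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                collection.add(key); // no match, so the upsert inserts
            }
        }
        return scanned;
    }

    // Indexed filter (like _id): one hash-index probe per upsert.
    static long upsertWithIndex(int n) {
        Map<String, Integer> index = new HashMap<>();
        long probes = 0;
        for (int i = 0; i < n; i++) {
            probes++;
            index.putIfAbsent("key-" + i, i);
        }
        return probes;
    }

    public static void main(String[] args) {
        int n = 10_000;
        System.out.println("without index: " + upsertWithoutIndex(n)); // prints 49995000
        System.out.println("with index:    " + upsertWithIndex(n));    // prints 10000
    }
}
```

The real server uses a B-tree index rather than a hash map, but the shape of the cost is the same: linear work per upsert without an index versus roughly constant (logarithmic) work with one.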

Related

How to query multi-valued array fields in Elasticsearch using Java client?

Using the Elasticsearch High Level REST Client for Java v7.3
I have a few fields in the schema that look like this:
{
    "document_type" : ["Utility", "Credit"]
}
Basically, a field can have an array of strings as its value. I need to filter on a specific document_type and also run a general string query.
I've tried the following code:
QueryBuilder query = QueryBuilders.boolQuery()
        .must(QueryBuilders.queryStringQuery(terms))
        .filter(QueryBuilders.termQuery("document_type", "Utility"));
...which does not return any results. If I remove the .filter() part, the query returns results fine, so the filter appears to be what prevents any results from coming back. I suspect it's because document_type is a multi-valued array, but maybe I'm wrong. How would I build a query that searches all documents for specific terms but also filters by document_type?
I think the reason is the wrong query. Consider using the terms query instead of the term query; there is also an equivalent in the Java API.
Here is a good overview of the Query DSL queries and their equivalents in the High Level REST Client: https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-high-query-builders.html
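The matching rule behind a terms filter on a multi-valued field can be illustrated with a toy in-memory model (plain Java, not the Elasticsearch client; the class and method names are hypothetical): a document passes when any of its field values equals any requested term. In the High Level REST Client the corresponding builder would be along the lines of QueryBuilders.termsQuery("document_type", "Utility"). The comparison below is exact and case-sensitive, so it only reflects a field indexed as keyword values; Elasticsearch's text analysis is not modeled.

```java
import java.util.List;
import java.util.Set;

public class TermsFilterModel {
    // Exact-match filter over an array field such as
    // "document_type": ["Utility", "Credit"].
    // Passes if ANY stored value equals ANY requested term.
    static boolean matchesTerms(List<String> fieldValues, Set<String> terms) {
        for (String value : fieldValues) {
            if (terms.contains(value)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> documentType = List.of("Utility", "Credit");
        System.out.println(matchesTerms(documentType, Set.of("Utility")));  // true
        System.out.println(matchesTerms(documentType, Set.of("Mortgage"))); // false
    }
}
```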

Faster way of updating database table using Hibernate (Java 8 reduction?)

I am working on a monitoring tool developed in Spring Boot using Hibernate as ORM.
I need to compare each row in my table (already persisted rows of sent messages) and check whether a MailId (unique) has received feedback (status: OPENED, BOUNCED, DELIVERED...) or not.
I get the feedback by reading CSV files from a network folder. Reading and parsing the files is very fast, but updating my database is very slow. My algorithm is not very efficient, because I loop through a list that can have hundreds of thousands of objects and look up each one in my table.
This is the method that performs the update by modifying the "target" object (a row in the database table):
@Override
public void updateTargetObjectFoo() throws CSVProcessingException, FileNotFoundException {
    // performProcessing reads the files in a folder, parses them into Java objects
    // and maps them into feedBackList (type Foo)
    List<Foo> feedBackList = performProcessing(env.getProperty("foo_in"), EXPECTED_HEADER_FIELDS_STATUS, Foo.class, ".LETTERS.STATUS.");
    for (Foo foo : feedBackList) {
        // findByKey does a simple SELECT in MySQL where MailId = foo.getMailId()
        Foo persistedFoo = fooDao.findByKey(foo.getMailId());
        if (persistedFoo != null) {
            persistedFoo.setStatus(foo.getStatus());
            persistedFoo.setDnsCode(foo.getDnsCode());
            persistedFoo.setReturnDate(foo.getReturnDate());
            persistedFoo.setReturnTime(foo.getReturnTime());
            // saveAccount does a MySQL UPDATE on the table; save the loaded
            // entity that was just modified, not the parsed CSV object
            fooDao.saveAccount(persistedFoo);
        }
    }
}
What if I do this selection/comparison and update on the Java side, and then write the whole updated list back to the database?
Would it be faster?
Thanks to all for your help.
Hibernate is not particularly well-suited for batch processing.
You may be better off using Spring's JdbcTemplate to do JDBC batch processing.
However, if you must do this via Hibernate, this may help: https://docs.jboss.org/hibernate/orm/5.2/userguide/html_single/chapters/batch/Batching.html
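Whichever persistence layer is used, the per-row SELECT can be avoided by doing the comparison on the Java side, as the question suggests. The sketch below is plain Java under stated assumptions: Foo is a hypothetical stand-in with only mailId and status, and the persisted rows are assumed to fit in memory. It indexes the CSV feedback by MailId once, applies it to the loaded rows, and groups the changed rows into chunks, each of which would become one JDBC batch (for example one JdbcTemplate.batchUpdate call or one addBatch/executeBatch round).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FeedbackBatchSketch {
    // Hypothetical stand-in for the question's Foo entity / CSV row.
    static final class Foo {
        final String mailId;
        String status;

        Foo(String mailId, String status) {
            this.mailId = mailId;
            this.status = status;
        }
    }

    // Index the CSV feedback by MailId once, instead of one SELECT per row.
    static Map<String, Foo> indexByMailId(List<Foo> feedback) {
        Map<String, Foo> byId = new HashMap<>();
        for (Foo f : feedback) {
            byId.put(f.mailId, f);
        }
        return byId;
    }

    // Apply feedback to already-loaded rows and split the changed rows into
    // chunks of batchSize; each chunk is meant to be sent as one UPDATE batch.
    static List<List<Foo>> applyAndBatch(List<Foo> persisted, Map<String, Foo> feedback, int batchSize) {
        List<List<Foo>> batches = new ArrayList<>();
        List<Foo> current = new ArrayList<>();
        for (Foo row : persisted) {
            Foo fb = feedback.get(row.mailId);
            if (fb == null) {
                continue; // no feedback for this mail yet
            }
            row.status = fb.status;
            current.add(row);
            if (current.size() == batchSize) {
                batches.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Foo> persisted = List.of(new Foo("m1", "SENT"), new Foo("m2", "SENT"), new Foo("m3", "SENT"));
        List<Foo> csv = List.of(new Foo("m1", "OPENED"), new Foo("m3", "BOUNCED"));
        List<List<Foo>> batches = applyAndBatch(persisted, indexByMailId(csv), 1);
        System.out.println(batches.size()); // 2
    }
}
```

In practice a batch size in the hundreds is more typical than 1; the value here only keeps the example small.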

MongoDB - How to get the count for a find query

I cannot for the life of me find out how to get a count for a find query using the java driver in mongo db. Can someone please put me out of my misery?
I have the following:
MongoCursor<Document> findRes = collection.find().iterator();
But there is no count method that I can find anywhere.
public Long getTotalCount(String collectionName, Document filterDocument) {
    MongoCollection<Document> collection = database.getCollection(collectionName);
    // count(filter) counts matching documents; count() counts the whole collection
    return filterDocument != null ? collection.count(filterDocument) : collection.count();
}
Here filterDocument is an org.bson.Document with the filter criteria, or null if you want the total count.
You can also use the more powerful Filters class, for example: collection.count(Filters.and(Filters.eq("field", "value"), /* second condition, and so on */));
So, to accept either a Document or a Filters expression as the parameter, you can change the signature to: public Long getTotalCount(String collectionName, Bson filterDocument) {
long rows = db.getCollection(myCollection).count(new Document("_id", 10));
This is in Java; myCollection is the collection name.
MongoDB has a built-in count() method that can be called on a cursor to get the number of documents returned.
I tried the following in the mongo shell and it worked well; it can easily be applied in Java or any other language too:
var findres = db.c.find()
findres.count()
This gave the output 29353.
cursor.count() is what you're looking for, I believe. Your find query returns a cursor, so you can just call count() on it.

MongoDB + Morphia - full text search using AND instead of OR

I've set up full text search in MongoDB and it's working quite well (Mongo 2.6.5).
However, it does an OR instead of an AND.
1) Is it possible to make the query an AND query while still getting all the benefits of full text search (stemming, etc.)?
2) If so, is it possible to add this option via the Morphia wrapper library?
EDIT
I see that full text search includes a 'score' for each document returned. Is it possible to only return docs with a certain score or above? Is there some score that would represent a 'fuzzy' AND query, i.e. usually all tokens are in the document, but not absolutely always? If so, this would solve the problem as well.
Naturally, if it's possible to do this via Morphia that would be super helpful, but I can use the native Java driver as well.
Any pointers in the right direction are much appreciated.
EDIT
Code looks like this, I'm using Morphia 1.0.1:
Datastore ds = Dao.instance().getDatabase();
Query<Product> q = ds.createQuery(Product.class).search("grey vests");
List<Product> prods = q.asList();
Printing the query gives:
{ "$text" : { "$search" : "grey vests"}}
Note: I am able to take an intersection of multiple result sets to create an AND query. However, this is very slow, since a term like "grey" returns a massive result set that is slow to feed back.
EDIT
I've tried to chain the search() calls and pass a single 'token' to each call, but I get a runtime error. The code becomes:
q.search("grey").search("vests");
The query I get is (which seems like it's doing the right thing):
{ "$and" : [ { "$text" : { "$search" : "grey"}} , { "$text" : { "$search" : "vests"}}]}
The error is:
com.mongodb.MongoQueryException: Query failed with error code 17287 and error message 'Can't canonicalize query: BadValue Too many text expressions' on server ...
at com.mongodb.connection.ProtocolHelper.getQueryFailureException(ProtocolHelper.java:93)

Alias in MongoDB

I've started to fiddle with MongoDB and came up with a question.
Say I have an object (POJO) with an id field (say, named 'ID') that I would like to represent in JSON and store in/load from MongoDB.
As far as I understand, every document always has an _id field (with underscore, lowercase).
What I would like is for MongoDB to return my JSON with the field ID instead of _id when I query.
In SQL I would use something like
SELECT _id as ID ...
My question is whether it's possible to do this in MongoDB, and if it is, a Java-based example would be really appreciated :)
I understand that it's possible to iterate over the records and replace _id with ID manually, but I don't want this O(n) loop.
I also don't really want to duplicate the data and store both "id" and "_id".
So I'm looking for a solution at the level of the query, or maybe the Java driver.
Thanks in advance and have a nice day
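MongoDB doesn't use SQL; it's more like an object query language over collections.
What you can try is something similar to the code below, using the Mongo Java Driver:
// Query by id with the legacy Java driver; "yourCollection" is a placeholder name
DBObject query = new BasicDBObject("_id", id);
DBCursor cursor = db.getCollection("yourCollection").find(query);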
I ended up using the following approach with the Java driver:
DBCursor cursor = runSomeQuery();
try {
    while (cursor.hasNext()) {
        DBObject dbObject = cursor.next();
        ObjectId id = (ObjectId) dbObject.get("_id");
        dbObject.removeField("_id");
        dbObject.put("ID", id.toString());
        System.out.println(dbObject);
    }
} finally {
    cursor.close();
}
I was wondering whether this is the best solution or whether there are better options.
Mark
Here's an example of what I am doing in JavaScript. It may be helpful to you. In my case I am removing the _id field and aliasing two deeply nested fields to simpler names.
db.players.aggregate([
    { $match: { accountId: '12345' } },
    { $project: {
        "_id": 0,
        "id": "$id",
        "masterVersion": "$branches.master.configuration.player.template.version",
        "previewVersion": "$branches.preview.configuration.player.template.version"
      }
    }
])
I hope you find this helpful.
