MongoDB Java driver bulk update vs regular update performance

I am using the MongoDB Java driver in my application.
In one use case I had to update a huge number of records in a collection. I found that the MongoDB Java driver's bulk update API performed far worse than regular updates. This is strange, since I thought bulk updates were supposed to improve performance.
Here are the specific details:
The collection had 270k records and there were around 17k update statements to execute. The database is installed locally.
Bulk Update Code
public void scramble(String collectionName, String fieldName) {
final List<String> distinctValues = getDistinctValues(collectionName, fieldName);
final BulkWriteOperation bulk = mongoDb.getCollection(collectionName).initializeUnorderedBulkOperation();
for (final String actualValue : distinctValues) {
if (!StringUtils.isEmpty(actualValue)) {
final String scrambledValue = getScrambledValue(actualValue);
update(fieldName, actualValue, scrambledValue, bulk);
}
}
bulk.execute();
}
protected void update(String fieldName, String actualValue, String scrambledValue, BulkWriteOperation bulk) {
final DBObject searchQuery = new BasicDBObject(fieldName, actualValue);
final DBObject updateQuery = new BasicDBObject("$set", new BasicDBObject(fieldName, scrambledValue));
bulk.find(searchQuery).update(updateQuery);
}
Regular Update Code
public void scramble(String collectionName, String fieldName) {
final List<String> distinctValues = getDistinctValues(collectionName, fieldName);
for (final String actualValue : distinctValues) {
if (!StringUtils.isEmpty(actualValue)) {
final String scrambledValue = getScrambledValue(actualValue);
update(collectionName, fieldName, actualValue, scrambledValue);
}
}
}
protected void update(String collectionName, String fieldName, String actualValue, String scrambledValue) {
final DBObject searchQuery = new BasicDBObject(fieldName, actualValue);
final DBObject updateQuery = new BasicDBObject("$set", new BasicDBObject(fieldName, scrambledValue));
mongoDb.getCollection(collectionName).update(searchQuery, updateQuery);
}
The bulk update code took around 1.5 hours, while the regular update code took 2-3 minutes.
Can someone please explain this behavior?
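One thing worth checking, offered as a possible explanation rather than a confirmed diagnosis: the two snippets may not be doing the same amount of work. In the legacy driver API used here, DBCollection.update(query, update) modifies at most one matching document (multi defaults to false), whereas bulk.find(query).update(update) modifies every matching document; bulk.find(query).updateOne(update) is the single-document counterpart. The sketch below uses a plain in-memory list as a stand-in for the collection. It does not call the MongoDB driver, and UpdateSemanticsSketch, updateFirst and updateAll are hypothetical names chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** In-memory stand-in for a collection; this is NOT the MongoDB driver API. */
final class UpdateSemanticsSketch {

    /** Mimics a single-document update: stops at the first match. Returns the modified count. */
    static int updateFirst(List<Map<String, String>> docs, String field, String from, String to) {
        for (Map<String, String> doc : docs) {
            if (from.equals(doc.get(field))) {
                doc.put(field, to);
                return 1;
            }
        }
        return 0;
    }

    /** Mimics a multi-document update: visits every document. Returns the modified count. */
    static int updateAll(List<Map<String, String>> docs, String field, String from, String to) {
        int modified = 0;
        for (Map<String, String> doc : docs) {
            if (from.equals(doc.get(field))) {
                doc.put(field, to);
                modified++;
            }
        }
        return modified;
    }

    /** Four documents, three of which share the value "alice". */
    static List<Map<String, String>> sample() {
        List<Map<String, String>> docs = new ArrayList<>();
        for (String name : new String[] {"alice", "bob", "alice", "alice"}) {
            Map<String, String> doc = new HashMap<>();
            doc.put("name", name);
            docs.add(doc);
        }
        return docs;
    }

    public static void main(String[] args) {
        System.out.println(updateFirst(sample(), "name", "alice", "x1")); // 1
        System.out.println(updateAll(sample(), "name", "alice", "x1"));   // 3
    }
}
```

If this is what is happening, the 2-3 minute run would have changed only one document per distinct value, so comparing the modified-document counts reported by both runs should confirm it or rule it out.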

Related

Updating elasticsearch entities with bulk

I have the following data in Elasticsearch (version 7.x):
{
"id":"1234",
"expirationDate":"17343234234",
"paths":"http:localhost:9090",
"work":"software dev",
"family":{
"baba":"jams",
"mother":"ela"
}
},
{
"id":"00021",
"expirationDate":"0123234",
"paths":"http:localhost:8080",
"work":"software engi",
"family":{
"baba":"stev",
"mother":"hela"
}
}
How can I update every entity whose expirationDate is earlier than the current time, setting it to the current time? For example:
the entity with id 00021 is expired because its expiration date is earlier than today, so it should be updated to the current time.
I am thinking of something like void updateExpiredEntity(List<ids> ids,Long currentTime) using void bulkUpdate(List<UpdateQuery> queries, BulkOptions bulkOptions, IndexCoordinates index);
Please provide a code implementation.
Is the following correct?
public void update(UUID id,Long currentDate) {
UpdateQuery updateQuery = UpdateQuery.builder(id.toString()).withRouting("expirationDate=currentDate")
.build();
elasticsearchTemplate.bulkUpdate(List.of(updateQuery), IndexCoordinates.of("index"));
}
}
Since you are using Elasticsearch 7.x, I will assume that you are using Spring Data Elasticsearch 4.0.x, which comes with Spring Boot 2.3.x, since that is the version that supports Elasticsearch 7.x.
Many things have changed in this version of Spring Data Elasticsearch, and updating documents by query is one of them. Where we previously autowired ElasticsearchTemplate, we now have to use ElasticsearchRestTemplate and RestHighLevelClient instead.
In your case you can use RestHighLevelClient to update documents by query. Assuming that expirationDate is stored with a numeric mapping type, in seconds, the code you asked for should look like this:
public class ElasticsearchService {

    @Autowired
    private ElasticsearchRestTemplate elasticsearchRestTemplate;

    @Autowired
    private RestHighLevelClient highLevelClient;

    public void updateExpireDateDemo() throws IOException {
        String indexName = "test";
        // Current time as epoch seconds
        long seconds = new Date().getTime() / 1000;
        // Match every document whose expirationDate is at or before the current time
        UpdateByQueryRequest request = new UpdateByQueryRequest(indexName);
        request.setQuery(new RangeQueryBuilder("expirationDate").lte(seconds));
        // Painless script that overwrites expirationDate with the current time
        Script updateScript = new Script(
                ScriptType.INLINE, "painless",
                "ctx._source.expirationDate=" + seconds + ";",
                Collections.emptyMap());
        request.setScript(updateScript);
        highLevelClient.updateByQuery(request, RequestOptions.DEFAULT);
    }
}
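A possible refinement, sketched under assumptions: Elasticsearch scripts accept named parameters, so the timestamp can be passed as params.now instead of being concatenated into the script source, which keeps the source text constant between runs. The sketch below only builds the two pieces (the script source and the parameter map) with the standard library. Passing them to the Script constructor in place of the concatenated string and Collections.emptyMap() is an assumption to verify against your client version. ExpirationScriptSketch and its members are hypothetical names.

```java
import java.time.Clock;
import java.time.Instant;
import java.time.ZoneOffset;
import java.util.Map;

/** Builds the inputs for a parameterised Painless script; it does not call Elasticsearch. */
final class ExpirationScriptSketch {

    /** Constant Painless source; the timestamp arrives through params.now. */
    static final String SCRIPT_SOURCE = "ctx._source.expirationDate = params.now";

    /** Current time in epoch seconds, read from the given clock so it can be fixed in tests. */
    static long epochSeconds(Clock clock) {
        return Instant.now(clock).getEpochSecond();
    }

    /** Parameter map that would accompany SCRIPT_SOURCE. */
    static Map<String, Object> scriptParams(long nowSeconds) {
        return Map.of("now", nowSeconds);
    }

    public static void main(String[] args) {
        // A fixed clock keeps the demo output reproducible; use Clock.systemUTC() in real code.
        Clock fixed = Clock.fixed(Instant.ofEpochSecond(1_600_000_000L), ZoneOffset.UTC);
        long now = epochSeconds(fixed);
        System.out.println(SCRIPT_SOURCE);
        System.out.println(scriptParams(now)); // {now=1600000000}
    }
}
```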
I don't quite see why you need bulkUpdate, but if you do, you first have to query Elasticsearch for the records that need updating in order to get the id of each document. Then you can update them with a list of UpdateQuery objects. Your code would look like this:
@Service
public class ElasticsearchService {

    @Autowired
    private ElasticsearchRestTemplate elasticsearchRestTemplate;

    public void updateExpireDateByBulkDemo() throws IOException {
        String indexName = "test";
        // Current time as epoch seconds
        long seconds = new Date().getTime() / 1000;
        List<UpdateQuery> updateList = new ArrayList<>();
        // Find the expired documents so that their ids can be collected
        RangeQueryBuilder expireQuery = new RangeQueryBuilder("expirationDate").lte(seconds);
        NativeSearchQuery query = new NativeSearchQueryBuilder().withQuery(expireQuery).build();
        SearchHits<Data> searchResult = elasticsearchRestTemplate.search(query, Data.class, IndexCoordinates.of(indexName));
        // Build one scripted update per matching document
        for (SearchHit<Data> hit : searchResult.getSearchHits()) {
            String elasticsearchDocumentId = hit.getId();
            updateList.add(UpdateQuery.builder(elasticsearchDocumentId).withScript("ctx._source.expirationDate=" + seconds + ";").build());
        }
        if (!updateList.isEmpty()) {
            elasticsearchRestTemplate.bulkUpdate(updateList, IndexCoordinates.of(indexName));
        }
    }
}
However, this only updates the first page of the search results. If you need to update every record that matches your query, you have to use the searchScroll method in order to get every document id.
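The general shape of "keep reading until the results are exhausted" can be sketched independently of the scroll API. The loop below is a minimal illustration in plain Java: fetchPage is a hypothetical stand-in for whatever search call returns one page of document ids, and PagedIdCollector, collectAllIds and pagesOf are names invented for this example. It is not the Spring Data Elasticsearch scroll API, whose method names and signatures should be taken from the documentation for your version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

/** Generic "read until exhausted" loop; the page source is a hypothetical stand-in for a search call. */
final class PagedIdCollector {

    /**
     * Calls fetchPage(pageNumber, pageSize) repeatedly and stops at the first
     * page that comes back shorter than pageSize.
     */
    static List<String> collectAllIds(BiFunction<Integer, Integer, List<String>> fetchPage, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<String> allIds = new ArrayList<>();
        int pageNumber = 0;
        while (true) {
            List<String> pageIds = fetchPage.apply(pageNumber, pageSize);
            allIds.addAll(pageIds);
            if (pageIds.size() < pageSize) {
                return allIds; // a short (or empty) page means there is nothing left
            }
            pageNumber++;
        }
    }

    /** In-memory page source used only to exercise the loop. */
    static BiFunction<Integer, Integer, List<String>> pagesOf(List<String> ids) {
        return (pageNumber, pageSize) -> {
            int from = Math.min(pageNumber * pageSize, ids.size());
            int to = Math.min(from + pageSize, ids.size());
            return ids.subList(from, to);
        };
    }

    public static void main(String[] args) {
        List<String> ids = List.of("a", "b", "c", "d", "e");
        System.out.println(collectAllIds(pagesOf(ids), 2)); // [a, b, c, d, e]
    }
}
```

The collected ids would then feed the UpdateQuery list in the same way as the single-page loop in the answer.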

Is there any way to write custom or native queries in Java JPA (DocumentDbRepository) while firing a query to azure-cosmosdb?

I am connected to azure-cosmosdb and can run default queries such as findAll() and findById(String Id). However, I can't write a native query using the @Query annotation, because the annotation is ignored: the query is always derived from the name of the method in the repository class/interface. I need a way to run a custom or native query against azure-cosmosdb.
I tried the @Query annotation, but it does not work.
List<MonitoringSessions> findBySessionID(@Param("sessionID") String sessionID);
@Query(nativeQuery = true, value = "SELECT * FROM MonitoringSessions M WHERE M.sessionID like :sessionID")
List<MonitoringSessions> findSessions(@Param("sessionID") String sessionID);
findBySessionID() works as expected, but findSessions() does not. The root error below was raised when running the code.
Caused by: org.springframework.data.mapping.PropertyReferenceException: No property findSessions found for type MonitoringSessions
Thanks for the response. I got exactly what I wanted from the link below. Credit goes to the author of the linked page.
https://cosmosdb.github.io/labs/java/technical_deep_dive/03-querying_the_database_using_sql.html
public class Program {
private final ExecutorService executorService;
private final Scheduler scheduler;
private AsyncDocumentClient client;
private final String databaseName = "UniversityDatabase";
private final String collectionId = "StudentCollection";
private int numberOfDocuments;
public Program() {
// public constructor
executorService = Executors.newFixedThreadPool(100);
scheduler = Schedulers.from(executorService);
client = new AsyncDocumentClient.Builder().withServiceEndpoint("uri")
.withMasterKeyOrResourceToken("key")
.withConnectionPolicy(ConnectionPolicy.GetDefault()).withConsistencyLevel(ConsistencyLevel.Eventual)
.build();
}
public static void main(String[] args) throws InterruptedException, JSONException {
FeedOptions options = new FeedOptions();
// as this is a multi collection enable cross partition query
options.setEnableCrossPartitionQuery(true);
// note that setMaxItemCount sets the number of items to return in a single page
// result
options.setMaxItemCount(5);
String sql = "SELECT TOP 5 s.studentAlias FROM coll s WHERE s.enrollmentYear = 2018 ORDER BY s.studentAlias";
Program p = new Program();
Observable<FeedResponse<Document>> documentQueryObservable = p.client
.queryDocuments("dbs/" + p.databaseName + "/colls/" + p.collectionId, sql, options);
// observable to an iterator
Iterator<FeedResponse<Document>> it = documentQueryObservable.toBlocking().getIterator();
while (it.hasNext()) {
FeedResponse<Document> page = it.next();
List<Document> results = page.getResults();
// here we iterate over all the items in the page result
for (Object doc : results) {
System.out.println(doc);
}
}
}
}

How do you execute a MongoDB query stored as string in Java?

I'm fairly new to the MongoDB Java driver and I was wondering how to execute a query stored as a string. Is this the best way to execute such queries, or is there a better approach?
I came across the snippet below in another Stack Overflow thread, but haven't been able to get anything useful out of it. The output does not contain the result of the query at all.
The code I'm running right now:
@Test
public void testExecuteStoredQueries() {
String code = "db.getCollection('users').find({})";
final BasicDBObject command = new BasicDBObject();
String formattedCode = String.format("function() { return %s ; }", code);
System.out.println("Formatted code:");
System.out.println(formattedCode);
command.put("eval", formattedCode);
Document result = DbEngine.getInstance().getDatabase().runCommand(command);
System.out.println(result.toJson());
}
Summarized output:
{
"retval": {
"_mongo": "....",
"_db": "...",
"_collection": "...",
"_ns": "cezy.users",
"_query": {},
"_fields": null,
"_limit": 0,
"_skip": 0,
"_batchSize": 0,
"_options": 0,
"_cursor": null,
"_numReturned": 0,
"_special": false
},
"ok": 1
}
I use Morphia when I have to deal with objects. When you retrieve data from MongoDB, long values come back as Extended JSON rather than plain JSON. Parsing Extended JSON can be troublesome and might break the code, since Gson doesn't support converting Extended JSON to JSON.
private void createDatastore(boolean createIndexes) {
    Morphia morphia = new Morphia();
    // ClassName is a placeholder for your mapped entity class
    morphia.map(ClassName.class);
    datastore = morphia.createDatastore(mongoClient, databaseName);
    if (createIndexes) {
        datastore.ensureIndexes();
    }
}

@Override
public Datastore getDatastore() {
    return this.datastore;
}

@Test
public void testExecuteStoredQueries() {
    String code = "db.getCollection('users').find({})";
    String formattedCode = String.format("function() { return %s ; }", code);
    final BasicDBObject basicObject = new BasicDBObject(new BasicDBObject("$in", formattedCode));
    Query<ClassName> query = getDatastore().createQuery(ClassName.class).filter("_eval", basicObject);
    List<ClassName> results = query.asList();
    // if you want to access each object and perform some task
    results.forEach((item) -> {
        // perform your task
    });
}
Removing the function creation and adding ".toArray()" pretty much solved the problem.
@Test
public void testExecuteStoredQueries() {
String code = "db.users.find({}).toArray();";
final BasicDBObject command = new BasicDBObject();
command.put("eval", code);
Document result = DbEngine.getInstance().getDatabase().runCommand(command);
System.out.println(result.toJson());
assertNotNull(result.get("retval"));
}
The array is in the "retval" field of the response.

Nested Query in DynamoDB returns nothing

I'm using DynamoDB with the Java SDK, but I'm having some issues querying nested documents. I've included simplified code below. If I remove the filter expression, everything is returned. With the filter expression, nothing is returned. I've also tried withQueryFilterEntry (which I'd prefer to use) and I get the same results. Any help is appreciated. Most of the documentation and forum posts online seem to use an older version of the Java SDK than the one I'm using.
Here's the JSON:
{
conf:
{type:"some"},
desc: "else"
}
Here's the query
DynamoDBQueryExpression<JobDO> queryExpression = new DynamoDBQueryExpression<PJobDO>();
queryExpression.withFilterExpression("conf.Type = :type").addExpressionAttributeValuesEntry(":type", new AttributeValue(type));
return dbMapper.query(getItemType(), queryExpression);
Is it a naming issue? Your sample JSON has "type", but the query uses "Type".
For example, the following works for me using DynamoDB Local:
public static void main(String [] args) {
AmazonDynamoDBClient client = new AmazonDynamoDBClient(new BasicAWSCredentials("akey1", "skey1"));
client.setEndpoint("http://localhost:8000");
DynamoDBMapper mapper = new DynamoDBMapper(client);
client.createTable(new CreateTableRequest()
.withTableName("nested-data-test")
.withAttributeDefinitions(new AttributeDefinition().withAttributeName("desc").withAttributeType("S"))
.withKeySchema(new KeySchemaElement().withKeyType("HASH").withAttributeName("desc"))
.withProvisionedThroughput(new ProvisionedThroughput().withReadCapacityUnits(1L).withWriteCapacityUnits(1L)));
NestedData u = new NestedData();
u.setDesc("else");
Map<String, String> c = new HashMap<String, String>();
c.put("type", "some");
u.setConf(c);
mapper.save(u);
DynamoDBQueryExpression<NestedData> queryExpression = new DynamoDBQueryExpression<NestedData>();
queryExpression.withHashKeyValues(u);
queryExpression.withFilterExpression("conf.#t = :type")
.addExpressionAttributeNamesEntry("#t", "type") // returns nothing if use "Type"
.addExpressionAttributeValuesEntry(":type", new AttributeValue("some"));
for(NestedData u2 : mapper.query(NestedData.class, queryExpression)) {
System.out.println(u2.getDesc()); // "else"
}
}
NestedData.java:
@DynamoDBTable(tableName = "nested-data-test")
public class NestedData {
    private String desc;
    private Map<String, String> conf;

    @DynamoDBHashKey
    public String getDesc() { return desc; }
    public void setDesc(String desc) { this.desc = desc; }

    @DynamoDBAttribute
    public Map<String, String> getConf() { return conf; }
    public void setConf(Map<String, String> conf) { this.conf = conf; }
}
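The naming point can be illustrated without DynamoDB at all. The sketch below is an in-memory stand-in (NestedPathSketch, resolve and sampleItem are hypothetical names, not part of the AWS SDK). It substitutes #placeholders in the way expression attribute names do and then walks a nested map, where key lookup is case-sensitive. The real service evaluates filter expressions itself, so this only mirrors the behaviour noted in the comment above, where using "Type" returns nothing.

```java
import java.util.Map;

/** In-memory illustration of case-sensitive nested attribute lookup; NOT the DynamoDB SDK. */
final class NestedPathSketch {

    /**
     * Resolves a dotted path such as "conf.#t" against a nested map.
     * Segments found in names (for example "#t") are replaced by the mapped attribute name first.
     * Returns null when any segment is missing.
     */
    static Object resolve(Map<String, Object> item, String path, Map<String, String> names) {
        Object current = item;
        for (String segment : path.split("\\.")) {
            if (!(current instanceof Map)) {
                return null; // ran out of nested maps before the path ended
            }
            String key = names.getOrDefault(segment, segment);
            current = ((Map<?, ?>) current).get(key);
        }
        return current;
    }

    /** Same shape as the item in the question: a conf map with a lower-case "type" key. */
    static Map<String, Object> sampleItem() {
        return Map.of("conf", Map.of("type", "some"), "desc", "else");
    }

    public static void main(String[] args) {
        Map<String, Object> item = sampleItem();
        System.out.println(resolve(item, "conf.#t", Map.of("#t", "type"))); // some
        System.out.println(resolve(item, "conf.#t", Map.of("#t", "Type"))); // null
    }
}
```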

spring data mongodb query document

I am getting a null response when I try to run the query below from Java.
I need to filter on the placed timestamp range and on the release's description and status.
// My document as follows:
<ordersAuditRequest>
    <ordersAudit>
        <createTS>2013-04-19 12:19:17.165</createTS>
        <orderSnapshot>
            <orderId>43060151</orderId>
            <placedTS>2013-04-19 12:19:17.165</placedTS>
            <releases>
                <ffmCenterDesc>TW</ffmCenterDesc>
                <relStatus>d </relStatus>
            </releases>
        </orderSnapshot>
    </ordersAudit>
</ordersAuditRequest>
I am using following query but it returns null.
Query query = new Query();
query.addCriteria(Criteria.where("orderSnapshot.releases.ffmCenterDesc").is(ffmCenterDesc)
.and("orderSnapshot.releases.relStatus").is(relStatus)
.andOperator(
Criteria.where("orderSnapshot.placedTS").gt(orderPlacedStart),
Criteria.where("orderSnapshot.placedTS").lt(orderPlacedEnd)
)
);
I can't reproduce your problem, which suggests that the issue is a mismatch between the values in the database and the values you're passing in to the query. This is not unusual when matching dates: you need to make sure they're stored as ISODates in the database and queried using java.util.Date.
I have a test that shows your query working, but I've made a number of assumptions about your data.
My test looks like this. Hopefully it will point you in the right direction; if you give me more feedback, I can re-create your problem more accurately.
@Test
public void shouldBeAbleToQuerySpringDataWithDates() throws Exception {
// Setup - insert test data into the DB
SimpleDateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd' 'hh:mm:ss.SSS");
MongoTemplate mongoTemplate = new MongoTemplate(new Mongo(), "TheDatabase");
// cleanup old test data
mongoTemplate.getCollection("ordersAudit").drop();
Release release = new Release("TW", "d");
OrderSnapshot orderSnapshot = new OrderSnapshot(43060151, dateFormat.parse("2013-04-19 12:19:17.165"), release);
OrdersAudit ordersAudit = new OrdersAudit(dateFormat.parse("2013-04-19 12:19:17.165"), orderSnapshot);
mongoTemplate.save(ordersAudit);
// Create and run the query
Date from = dateFormat.parse("2013-04-01 01:00:05.000");
Date to = dateFormat.parse("2014-04-01 01:00:05.000");
Query query = new Query();
query.addCriteria(Criteria.where("orderSnapshot.releases.ffmCenterDesc").is("TW")
.and("orderSnapshot.releases.relStatus").is("d")
.andOperator(
Criteria.where("orderSnapshot.placedTS").gt(from),
Criteria.where("orderSnapshot.placedTS").lt(to)
)
);
// Check the results
List<OrdersAudit> results = mongoTemplate.find(query, OrdersAudit.class);
Assert.assertEquals(1, results.size());
}
public class OrdersAudit {
private Date createdTS;
private OrderSnapshot orderSnapshot;
public OrdersAudit(final Date createdTS, final OrderSnapshot orderSnapshot) {
this.createdTS = createdTS;
this.orderSnapshot = orderSnapshot;
}
}
public class OrderSnapshot {
private long orderId;
private Date placedTS;
private Release releases;
public OrderSnapshot(final long orderId, final Date placedTS, final Release releases) {
this.orderId = orderId;
this.placedTS = placedTS;
this.releases = releases;
}
}
public class Release {
String ffmCenterDesc;
String relStatus;
public Release(final String ffmCenterDesc, final String relStatus) {
this.ffmCenterDesc = ffmCenterDesc;
this.relStatus = relStatus;
}
}
Notes:
This is a TestNG class, not JUnit.
I've used SimpleDateFormat to create Java Date classes, this is just for ease of use.
The XML value you pasted for relStatus included spaces, which I have stripped.
You showed us the document structure in XML, not JSON, so I've had to assume what your data looks like. I've translated it almost directly into JSON, so it looks like this in the database:
{
"_id" : ObjectId("51d689843004ec60b17f50de"),
"_class" : "OrdersAudit",
"createdTS" : ISODate("2013-04-18T23:19:17.165Z"),
"orderSnapshot" : {
"orderId" : NumberLong(43060151),
"placedTS" : ISODate("2013-04-18T23:19:17.165Z"),
"releases" : {
"ffmCenterDesc" : "TW",
"relStatus" : "d"
}
}
}
You can find what yours really looks like by doing a db.<collectionName>.findOne() call in the mongoDB shell.
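A side note on the date pattern used in the test, offered as an observation rather than a confirmed diagnosis: in SimpleDateFormat, hh is the 12-hour field and HH is the 24-hour field, so "12:19" parsed with hh and no AM/PM marker comes out as 00:19. That would be consistent with the stored ISODate reading 23:19 on the previous day if the test ran in a UTC+1 zone, which is an assumption. The sketch below pins the time zone to UTC purely so that the output is reproducible; HourPatternSketch and parseUtc are hypothetical names.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

/** Compares the 12-hour (hh) and 24-hour (HH) pattern letters on the same input. */
final class HourPatternSketch {

    /** Parses text with the given pattern, interpreting it in UTC. */
    static Date parseUtc(String pattern, String text) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ROOT);
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            return format.parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Cannot parse: " + text, e);
        }
    }

    public static void main(String[] args) {
        String text = "2013-04-19 12:19:17.165";
        // hh: hour 12 with no AM/PM marker is read as midnight
        System.out.println(parseUtc("yyyy-MM-dd hh:mm:ss.SSS", text).toInstant()); // 2013-04-19T00:19:17.165Z
        // HH: hour 12 is read as noon
        System.out.println(parseUtc("yyyy-MM-dd HH:mm:ss.SSS", text).toInstant()); // 2013-04-19T12:19:17.165Z
    }
}
```

When comparing stored dates against query bounds, a 12-hour shift like this can be enough to push a document outside the range, so it is worth confirming which pattern letter the application uses when it writes and when it queries.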
