Manage concurrent access to MongoDB - Java

I need to manage concurrent access for data updates in MongoDB.
Example: two users, A and B, connect to my application. User A updates a piece of data, and then user B tries to update the same data. I want user B's update to be rejected, because the data has already been updated by user A.

If users A and B only update one document and you know both the initial value and the updated value, try the code below.
The code tries to update the secret field, where we know the initial value is expectedSecret:
public void compareAndSet(String expectedSecret, String targetSecret) {
    // get a MongoDB collection
    MongoCollection<Document> collection = client.getDatabase(DATABASE).getCollection(COLLECTION);
    // match only documents whose secret still holds the value we expect
    BasicDBObject filter = new BasicDBObject();
    filter.append("secret", expectedSecret);
    // the update document must use an update operator such as $set
    BasicDBObject update = new BasicDBObject();
    update.append("$set", new BasicDBObject("secret", targetSecret));
    UpdateResult result = collection.updateOne(filter, update);
    // if nothing matched, another user changed the value first
    if (result.getMatchedCount() == 0) {
        throw new IllegalStateException("secret was already updated by another user");
    }
}
What if you don't know the initial value?
You can add a version field that represents the state of the document, and check that field as part of the update.
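For illustration, a minimal sketch of that version-field approach using the driver's Filters/Updates helpers; the version field, the ObjectId parameter, and the method name are assumptions, not from the question:
// Optimistic locking: the update only matches if the version we read earlier
// is still current, and it bumps the version in the same atomic operation.
public boolean updateWithVersion(ObjectId id, long expectedVersion, String newSecret) {
    MongoCollection<Document> collection = client.getDatabase(DATABASE).getCollection(COLLECTION);
    Bson filter = Filters.and(
            Filters.eq("_id", id),
            Filters.eq("version", expectedVersion));
    Bson update = Updates.combine(
            Updates.set("secret", newSecret),
            Updates.inc("version", 1L));
    UpdateResult result = collection.updateOne(filter, update);
    // zero matched documents means another user got there first
    return result.getMatchedCount() > 0;
}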
What if you need to update more than one document?
Multi-document transactions need server-side support (a replica set on MongoDB 4.0 or later); see the MongoDB documentation for details:
However, for situations that require atomicity for updates to multiple documents or consistency between reads to multiple documents, MongoDB provides the ability to perform multi-document transactions against replica sets. Multi-document transactions can be used across multiple operations, collections, databases, and documents. Multi-document transactions provide an “all-or-nothing” proposition.
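For illustration, a sketch of a multi-document transaction with the Java driver's ClientSession API; it assumes a replica set, and the database, collection, and filter values below are made up:
import com.mongodb.client.ClientSession;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.Updates;
import org.bson.Document;

public class TransactionSketch {
    // Updates two documents atomically: either both changes commit or neither does.
    static void updateBoth(MongoClient client) {
        MongoCollection<Document> coll = client.getDatabase("mydb").getCollection("mycoll");
        try (ClientSession session = client.startSession()) {
            session.withTransaction(() -> {
                coll.updateOne(session, Filters.eq("_id", 1), Updates.set("status", "done"));
                coll.updateOne(session, Filters.eq("_id", 2), Updates.set("status", "done"));
                return null; // the transaction body must return a value
            });
        }
    }
}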

Related

Spring Data JPA: Efficiently Query The Database for A Large Dataset

I have written an application to scrape a huge set of reviews. For each review I store the review itself Review_Table(User_Id, Trail_Id, Rating), the Username (Id, Username, UserLink) and the Trail, which is built earlier in the code (Id, ...60 other attributes):
for (Element card : reviewCards) {
    String userName = card.select("expression").text();
    String userLink = card.select("expression").attr("href");
    String userRatingString = card.select("expression").attr("aria-label");
    Double userRating;
    if (userRatingString.equals("NaN Stars")) {
        userRating = 0.0;
    } else {
        userRating = Double.parseDouble(userRatingString.replaceAll("[^0-9.]", ""));
    }
    User u;
    Rating r;
    // probably this is the bottleneck
    if (userService.getByUserLink(userLink) != null) {
        u = new User(userName, userLink, new HashSet<Rating>());
        r = Rating.builder()
                .user(u)
                .userRating(userRating)
                .trail(t)
                .build();
    } else {
        u = userService.getByUserLink(userLink);
        r = Rating.builder()
                .user(u)
                .userRating(userRating)
                .trail(t)
                .build();
    }
    i = i + 1;
    ratingSet.add(r);
    userSet.add(u);
}
saveToDb(userSet, t, link, ratingSet);
savedEntities = savedEntities + 1;
log.info(savedEntities + " Saved Entities");
}
The code works fine for small to medium-sized datasets, but I hit a huge bottleneck with larger ones. Suppose I have 13K user entities already stored in the Postgres DB and another batch of 8,500 reviews comes in to be scraped: I have to check for every review whether the user of that review is already stored. This takes forever.
I tried to define an index on the UserLink attribute in Postgres, but the speed didn't improve at all.
I tried to collect all the users stored in the DB into a set and use the contains method to check whether a particular user already exists in it (this way I thought I could bypass the database bottleneck of 8K reads and writes, though in a risky way, because if there were too many users in the DB table I would run into a memory overflow). The speed, again, didn't improve.
At this point I don't have any other ideas for improving this.
Well, for one, you would certainly benefit from not querying for each user individually in a loop. What you can do is query for and cache only the UserLink or UserName, meaning get and cache the complete set of just one of them, because that is all you seem to need to differentiate in the if-else.
You can actually query for individual fields with a Spring Data JPA @Query, either directly or with Spring Data JPA projections to fetch a subset of fields if needed, and cache and use them for the lookup. If you think the users could run into millions or billions, you could consider a distributed cache like Apache Ignite, where your collection could scale easily.
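For illustration, a sketch of that cache-first lookup; getAllUserLinks() is a hypothetical projection method, not one that exists in the question's code:
// Fetch every stored user link once, up front, instead of one query per review.
Set<String> knownLinks = new HashSet<>(userService.getAllUserLinks());

for (Element card : reviewCards) {
    String userLink = card.select("expression").attr("href");
    if (knownLinks.contains(userLink)) {
        // existing user: reuse it (e.g. from a cached map) instead of re-querying
    } else {
        // new user: create it and remember the link so later cards in this
        // batch don't create a duplicate
        knownLinks.add(userLink);
    }
}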
By the way, the if-else branches seem to be inverted, are they not?
Next, don't store each review individually, which the above code appears to do; write in batches instead. Also, since you are using Postgres, you can use the CopyManager provided by Postgres for bulk data transfer, wired in through a Spring Data custom repository. You can keep writing to a new text/CSV file locally on a set schedule (every x minutes), use it to write that batched text/CSV to the table (after those x minutes), and then remove the file, as in the sketch below. This would be really quick.
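A minimal sketch of the CopyManager idea, assuming a plain Postgres DataSource; the table name, columns, and COPY options are assumptions:
import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

import javax.sql.DataSource;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.sql.Connection;

public class RatingBulkLoader {
    private final DataSource dataSource;

    public RatingBulkLoader(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    // Streams a locally written CSV batch straight into the rating table,
    // bypassing row-by-row inserts entirely.
    public void copyRatings(String csvPath) throws Exception {
        try (Connection conn = dataSource.getConnection();
             Reader reader = Files.newBufferedReader(Paths.get(csvPath))) {
            CopyManager copyManager = conn.unwrap(PGConnection.class).getCopyAPI();
            copyManager.copyIn(
                    "COPY rating (user_id, trail_id, rating) FROM STDIN WITH (FORMAT csv)",
                    reader);
        }
    }
}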
The other option is to write a stored procedure that combines the above and invoke it, again, through a custom repository.
Please let me know which one you would like elaborated.
UPDATE (Jan 12 2022):
One other item I missed: when querying for UserLink or UserName, you can use a very efficient form of select query that Postgres supports instead of an IN clause, like below:
@Query(value = "select * from user u where u.userLink = ANY('{:userLinks}'::varchar[])", nativeQuery = true)
List<Users> getUsersByLinks(@Param("userLinks") String[] userLinks);

Faster way of updating database table using Hibernate (Java 8 reduction?)

I am working on a monitoring tool developed in Spring Boot, using Hibernate as the ORM.
I need to compare each row (already-persisted rows of sent messages) in my table and see whether a MailId (unique) has received feedback (status: OPENED, BOUNCED, DELIVERED...), yes or no.
I get the feedback by reading CSV files from a network folder. The parsing and reading of the CSV files goes very fast, but the update of my database is very slow. My algorithm is not very efficient, because I loop through a list that can have hundreds of thousands of objects and look each one up in my table.
This is the method that performs the update in my table by updating the "target" object (a row in the database table):
@Override
public void updateTargetObjectFoo() throws CSVProcessingException, FileNotFoundException {
    // performProcessing reads files from a folder, parses them into Java objects,
    // and maps them into a feedback list of type Foo
    List<Foo> feedBackList = performProcessing(env.getProperty("foo_in"), EXPECTED_HEADER_FIELDS_STATUS, Foo.class, ".LETTERS.STATUS.");
    for (Foo foo : feedBackList) {
        // findByKey does a simple SELECT in MySQL where MailId = foo.getMailId()
        Foo persistedFoo = fooDao.findByKey(foo.getMailId());
        if (persistedFoo != null) {
            persistedFoo.setStatus(foo.getStatus());
            persistedFoo.setDnsCode(foo.getDnsCode());
            persistedFoo.setReturnDate(foo.getReturnDate());
            persistedFoo.setReturnTime(foo.getReturnTime());
            // saveAccount does a MySQL UPDATE on the table; it must receive the
            // persisted entity we just modified, not the parsed CSV object
            fooDao.saveAccount(persistedFoo);
        }
    }
}
What if I performed this selection/comparison and update on the Java side, and then re-updated the whole list in the database?
Would that be faster?
Thanks to all for your help.
Hibernate is not particularly well-suited for batch processing.
You may be better off using Spring's JdbcTemplate to do JDBC batch processing, as sketched below.
However, if you must do this via Hibernate, this may help: https://docs.jboss.org/hibernate/orm/5.2/userguide/html_single/chapters/batch/Batching.html
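For illustration, a hedged sketch of the JdbcTemplate route; the table and column names are assumptions, not from the question:
import org.springframework.jdbc.core.JdbcTemplate;

import java.sql.PreparedStatement;
import java.util.List;

public class FooBatchUpdater {
    private final JdbcTemplate jdbcTemplate;

    public FooBatchUpdater(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    // One UPDATE statement executed in JDBC batches of 1000, instead of one
    // SELECT plus one UPDATE per row.
    public void updateFeedback(List<Foo> feedBackList) {
        String sql = "UPDATE foo SET status = ?, dns_code = ?, return_date = ?, return_time = ? WHERE mail_id = ?";
        jdbcTemplate.batchUpdate(sql, feedBackList, 1000, (PreparedStatement ps, Foo foo) -> {
            ps.setString(1, foo.getStatus());
            ps.setString(2, foo.getDnsCode());
            ps.setObject(3, foo.getReturnDate());
            ps.setObject(4, foo.getReturnTime());
            ps.setString(5, foo.getMailId());
        });
    }
}
A MailId with no matching row simply updates zero rows, which mirrors the persistedFoo != null guard in the original loop.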

How to listen for change in a collection?

I have to check for changes in an old embedded DBF database which is populated by an old third-party application. I don't have access to the source code of that application and cannot put a trigger or anything similar on the database. Due to business constraints I cannot change that...
My objective is to capture new records, deleted records and modified records from a table (~1500 records) of that database with a Java application for further processing. The database is accessible in my Spring application through JPA/Hibernate with the HXTT DBF driver.
I am looking now for a way to efficiently capture changes made by the third-party app in the database.
Do I have to periodically read the whole table and check whether each record is still unchanged, or apply some kind of diff between two readings? Is there a kind of "trigger" I can set in my Java app? How do I listen properly for those changes?
There is no JPA mechanism for getting callbacks from a database when the data changes.
The only option is to build your own change detection. Typically you would start by detecting which entities were added, which were removed, and which still exist. For the ones that still exist you will need to check whether they were changed, so the entity needs an equals() method (a sketch of one follows the code below).
An entity is identified by its primary key, so you will need to get the set of all primary keys; once you have that, you can easily use Guava's Sets methods to produce the three sets of added, removed, and existing (before and now), like this:
List<MyEntity> old = new ArrayList<>();     // loaded from the DB last time
List<MyEntity> current = new ArrayList<>(); // loaded from the DB now

Map<Long, MyEntity> oldMap = old.stream()
        .collect(Collectors.toMap(MyEntity::getId, Function.identity()));
Map<Long, MyEntity> currentMap = current.stream()
        .collect(Collectors.toMap(MyEntity::getId, Function.identity()));

Set<Long> oldKeys = oldMap.keySet();
Set<Long> currentKeys = currentMap.keySet();

Sets.SetView<Long> deletedKeys = Sets.difference(oldKeys, currentKeys);
Sets.SetView<Long> addedKeys = Sets.difference(currentKeys, oldKeys);
Sets.SetView<Long> couldBeChanged = Sets.intersection(oldKeys, currentKeys);

for (Long id : couldBeChanged) {
    if (!oldMap.get(id).equals(currentMap.get(id))) {
        // entity with this id was changed
    }
}
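And a sketch of the equals()/hashCode() pair the change detection relies on; the compared fields are illustrative:
@Override
public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof MyEntity)) return false;
    MyEntity other = (MyEntity) o;
    // compare the business fields, not just the id, so modified rows are detected
    return Objects.equals(getId(), other.getId())
            && Objects.equals(getName(), other.getName())
            && Objects.equals(getStatus(), other.getStatus());
}

@Override
public int hashCode() {
    return Objects.hash(getId(), getName(), getStatus());
}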

List all MongoDB Databases and their details from Java

I am developing a Java/MongoDB application and require a list of all existing MongoDB Databases.
I know I can use this code:
final MongoClient mongoClient = DatabaseManager.getMongoclient();
final ListDatabasesIterable<Document> databasesDocument = mongoClient.listDatabases();
final MongoCursor<Document> mongoCursor = databasesDocument.iterator();
while (mongoCursor.hasNext()) {
    final Document databaseDocument = mongoCursor.next();
    Assert.assertNotNull(databaseDocument);
}
However, the details only include the database name, its size on disk, and whether or not the database is empty.
I need to know when the database was created (date and time).
Is there any way I can retrieve this information from within a Java application?
As far as I know, MongoDB doesn't keep track of database creation dates.
One possible workaround, if you are the creator of the databases, is to track it yourself: create a meta collection in a meta database and insert a new record (db_name = time) whenever you create a database, along the lines of the sketch below.
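For illustration, a minimal sketch of that workaround; the meta database, collection, and field names are made up:
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

import java.util.Date;

public class DatabaseCreationTracker {
    // Call this wherever your application creates a database, so the creation
    // time can be looked up later.
    static void recordCreation(MongoClient client, String dbName) {
        MongoCollection<Document> meta = client.getDatabase("meta").getCollection("db_creation");
        meta.insertOne(new Document("db_name", dbName).append("created_at", new Date()));
    }
}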

GAE datastore: Entity deleted only after multiple calls to delete()

I'm playing with the GAE datastore on a local machine with Eclipse. I've created two servlets - AddMovie and DeleteMovie:
AddMovie
Entity movie = new Entity("movie",System.currentTimeMillis());
movie.setProperty("name", "Hakeshset Beanan");
movie.setProperty("director", "Godard");
datastore.put(movie);
DeleteMovie
Query q = new Query("movie");
PreparedQuery pq = datastore.prepare(q);
List<Entity> movies = Lists.newArrayList(pq.asIterable());
response.put("numMoviesFound", String.valueOf(movies.size()));
for (Entity movie : movies) {
    Key key = movie.getKey();
    datastore.delete(key);
}
The funny thing is that the DeleteMovie servlet does not delete all the movies. Consecutive calls return {"numMoviesFound":"15"}, then {"numMoviesFound":"9"}, {"numMoviesFound":"3"} and finally {"numMoviesFound":"3"}.
Why aren't all movies deleted from the datastore at once?
Update: The problem seems to happen only in the local Eclipse environment, not on GAE servers.
I think you should delete all your movies in a single batch operation; it would ensure better consistency (see the sketch after the quote below).
Talking about consistency, your problem is right here:
Google App Engine's High Replication Datastore (HRD) provides high availability for your reads and writes by storing data synchronously in multiple data centers. However, the delay from the time a write is committed until it becomes visible in all data centers means that queries across multiple entity groups (non-ancestor queries) can only guarantee eventually consistent results. Consequently, the results of such queries may sometimes fail to reflect recent changes to the underlying data.
https://developers.google.com/appengine/docs/java/datastore/structuring_for_strong_consistency
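For illustration, a sketch that collects the keys with a keys-only query and deletes them in one batch call; note that because of the eventual consistency described above, a query issued right afterwards may still report stale counts:
import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.Key;
import com.google.appengine.api.datastore.PreparedQuery;
import com.google.appengine.api.datastore.Query;

import java.util.ArrayList;
import java.util.List;

public class DeleteAllMovies {
    // A keys-only query avoids fetching full entities, and a single delete()
    // call issues one batch RPC instead of one per movie.
    static void deleteAll() {
        DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
        PreparedQuery pq = datastore.prepare(new Query("movie").setKeysOnly());
        List<Key> keys = new ArrayList<>();
        for (Entity movie : pq.asIterable()) {
            keys.add(movie.getKey());
        }
        datastore.delete(keys);
    }
}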
