GAE datastore: Entity deleted only after multiple calls to delete() - java

I'm playing with the GAE datastore on a local machine with Eclipse. I've created two servlets - AddMovie and DeleteMovie:
AddMovie
Entity movie = new Entity("movie",System.currentTimeMillis());
movie.setProperty("name", "Hakeshset Beanan");
movie.setProperty("director", "Godard");
datastore.put(movie);
DeleteMovie
Query q = new Query("movie");
PreparedQuery pq = datastore.prepare(q);
List<Entity> movies = Lists.newArrayList(pq.asIterable());
response.put("numMoviesFound", String.valueOf(movies.size()));
for (Entity movie : movies) {
    Key key = movie.getKey();
    datastore.delete(key);
}
The funny thing is that the DeleteMovie servlet does not delete all the movies. Consecutive calls return {"numMoviesFound":"15"}, then {"numMoviesFound":"9"}, then {"numMoviesFound":"3"}, and finally {"numMoviesFound":"0"}.
Why aren't all movies deleted from the datastore at once?
Update: The problem seems to happen only in the local Eclipse dev server, not on GAE's servers.

I think you should delete all your movies in a single transaction; it would ensure better consistency.
Speaking of consistency, your problem is right here:
Google App Engine's High Replication Datastore (HRD) provides high availability for your reads and writes by storing data synchronously in multiple data centers. However, the delay from the time a write is committed until it becomes visible in all data centers means that queries across multiple entity groups (non-ancestor queries) can only guarantee eventually consistent results. Consequently, the results of such queries may sometimes fail to reflect recent changes to the underlying data.
https://developers.google.com/appengine/docs/java/datastore/structuring_for_strong_consistency
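As an aside, whether or not you wrap it in a transaction, you can cut the number of RPCs by collecting the keys first and issuing one batch delete. A minimal sketch of the DeleteMovie loop rewritten that way (same datastore and movies variables as above):
// Collect the keys first, then delete them all in one batch call
// instead of one delete() RPC per entity.
List<Key> keys = new ArrayList<>();
for (Entity movie : movies) {
    keys.add(movie.getKey());
}
datastore.delete(keys); // DatastoreService.delete(Iterable<Key>)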

Related

Spring + Hibernate: loading large amounts of records

I'm trying to find the best/optimal way of loading larger amounts of data from a MySQL database in a Spring/Hibernate service.
I pull about 100k records from a 3rd-party API (in chunks, usually between 300 and 1000). I then need to pull translations for each record from the database. Since there are 30 languages, there will be 30 rows per record, so 1000 records from the API means 30,000 rows from the database.
The records from the API come in the form of POJOs (super small in size). Say I get 1000 records: I split the list into multiple 100-record lists, collect the IDs of each record, and select all translations from the database for those records. I only need two values from the table, which I then add to my POJOs, and then I push the POJOs to the next service.
Basically this:
interface I18nRepository extends CrudRepository<Translation, Long> {
    // derived query: all translations whose recordId is in the given list
    List<Translation> findAllByRecordIdIn(List<Long> ids);
}

List<APIRecord> records = api.findRecords(...);
List<List<APIRecord>> partitioned = Lists.partition(records, 100); // Guava
for (List<APIRecord> chunk : partitioned) {
    List<Long> ids = new ArrayList<>();
    for (APIRecord record : chunk) {
        ids.add(record.getId());
    }
    List<Translation> translations = i18nRepository.findAllByRecordIdIn(ids);
    for (APIRecord record : chunk) {
        for (Translation translation : translations) {
            // compare boxed Long ids with equals(), not ==
            if (translation.getRecordId().equals(record.getId())) {
                record.addTranslation(translation);
            }
        }
    }
}
As far as Spring Boot/Hibernate properties go, I only have the default ones set. I would like to make this as efficient, fast, and memory-light as possible. One idea I had was to use a lower-level API instead of Hibernate to bypass object mapping.
In my opinion, you should bypass JPA/Hibernate for bulk operations; there's no way to make them efficient in JPA.
Consider using Spring's JdbcTemplate and native SQL.
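A minimal sketch of that idea, assuming a translation table with record_id and value columns (the table and column names are hypothetical, adjust to your schema); JdbcTemplate skips entity mapping entirely:
import java.util.*;
import java.util.stream.Collectors;
import org.springframework.jdbc.core.JdbcTemplate;

public Map<Long, List<String>> loadTranslations(JdbcTemplate jdbc, List<Long> ids) {
    // one IN (...) query per chunk of ids, no Hibernate session involved
    String placeholders = ids.stream().map(id -> "?").collect(Collectors.joining(","));
    String sql = "SELECT record_id, value FROM translation WHERE record_id IN (" + placeholders + ")";
    Map<Long, List<String>> byRecord = new HashMap<>();
    jdbc.query(sql, rs -> {
        // group the two needed values by record id as rows stream in
        byRecord.computeIfAbsent(rs.getLong("record_id"), k -> new ArrayList<>())
                .add(rs.getString("value"));
    }, ids.toArray());
    return byRecord;
}
Each 100-record chunk then costs a single round trip, and only the two columns you need are ever materialized.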

Timeouts in datastore queries

I am using Objectify v5.1.11 from auto-scaled App Engine instances in the Java 8 runtime environment.
I have an API which IoT devices call periodically to upload statistics information. In this API, I insert an entity into the datastore to store the statistics. This entity uses datastore's auto-generated IDs. The entity definition is as follows:
@Entity(name = "Stats")
public class StatsEntity {
    @Id
    private Long statisticsId;
    @Index
    private Long deviceId;
    @Index
    private String statsKey;
    @Index
    private Date creationTime;
}
But I had a requirement to check for duplicates before inserting the entity, so I switched to custom-generated (String) IDs. I came up with a mechanism of appending the deviceId to the statsKey string (unique for each statistic within a device) provided by the device to generate the ID.
This is to avoid the eventual-consistency behaviour I would get if I used a query to check whether the entity already exists. Since a get by ID is strongly consistent, I can use it to check for duplicates.
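A rough sketch of that duplicate check, assuming the @Id field is changed to String and ':' is used as the separator (both are assumptions; stats is a hypothetical entity instance):
// Build the custom ID, then do a strongly consistent get-by-key
// before saving.
String statsId = deviceId + ":" + statsKey; // separator is an assumption
StatsEntity existing = ofy().load().type(StatsEntity.class).id(statsId).now();
if (existing == null) {
    ofy().save().entity(stats).now(); // not a duplicate, safe to insert
}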
There is another API to fetch the statistics uploaded by a device. In this API, I list the entities by filtering on deviceId and ordering by creationTime in descending order (newest first), with a page size of 100. The request times out because it exceeds App Engine's 60 s limit. I see the following exception in the logs:
Task was cancelled.
java.util.concurrent.CancellationException: Task was cancelled.
at com.google.common.util.concurrent.AbstractFuture.cancellationExceptionWithCause(AbstractFuture.java:1355)
at com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:555)
at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:436)
at com.google.common.util.concurrent.AbstractFuture$TrustedFuture.get(AbstractFuture.java:99)
at com.google.appengine.tools.development.TimedFuture.get(TimedFuture.java:42)
at com.google.common.util.concurrent.ForwardingFuture.get(ForwardingFuture.java:62)
at com.google.appengine.api.utils.FutureWrapper.get(FutureWrapper.java:93)
at com.google.appengine.api.datastore.FutureHelper.getInternal(FutureHelper.java:69)
at com.google.appengine.api.datastore.FutureHelper.quietGet(FutureHelper.java:33)
at com.google.appengine.api.datastore.BaseQueryResultsSource.loadMoreEntities(BaseQueryResultsSource.java:243)
at com.google.appengine.api.datastore.BaseQueryResultsSource.loadMoreEntities(BaseQueryResultsSource.java:180)
at com.google.appengine.api.datastore.QueryResultIteratorImpl.ensureLoaded(QueryResultIteratorImpl.java:173)
at com.google.appengine.api.datastore.QueryResultIteratorImpl.hasNext(QueryResultIteratorImpl.java:70)
at com.googlecode.objectify.impl.KeysOnlyIterator.hasNext(KeysOnlyIterator.java:29)
at com.google.common.collect.Iterators$5.hasNext(Iterators.java:580)
at com.google.common.collect.TransformedIterator.hasNext(TransformedIterator.java:42)
at com.googlecode.objectify.impl.ChunkIterator.hasNext(ChunkIterator.java:39)
at com.google.common.collect.MultitransformedIterator.hasNext(MultitransformedIterator.java:50)
at com.google.common.collect.MultitransformedIterator.hasNext(MultitransformedIterator.java:50)
at com.google.common.collect.Iterators$PeekingImpl.hasNext(Iterators.java:1105)
at com.googlecode.objectify.impl.ChunkingIterator.hasNext(ChunkingIterator.java:51)
at com.ittiam.cvml.dao.repository.PerformanceStatsRepositoryImpl.list(PerformanceStatsRepositoryImpl.java:154)
at com.ittiam.cvml.service.PerformanceStatsServiceImpl.listPerformanceStats(PerformanceStatsServiceImpl.java:227)
The statsKey provided by the device is based on time and is hence monotonically increasing (a step increase every 15 minutes), which is bad as per this link.
But my traffic is not large enough to warrant this behaviour: each device makes 2 to 3 requests every 15 minutes, and there are about 300 devices.
When I try to list entities for devices which haven't made any requests since I switched to custom IDs, I still observe this issue.
Edit
My code to list the entities is as follows:
Query<StatsEntity> query = ofy().load().type(StatsEntity.class);
List<StatsEntity> entityList = new ArrayList<StatsEntity>();
query = query.filter("deviceId", deviceId);
query = query.order("-creationTime");
query = query.limit(100);
QueryResultIterator<StatsEntity> iterator = query.iterator();
while (iterator.hasNext()) {
    entityList.add(iterator.next());
}
This error usually occurs because of write contention, which happens when multiple transactions read and write the same entity group concurrently.
There are various approaches to this problem:
A query lives for only 30 seconds, but you can extend that window by converting your API into a task queue; task-queue requests run for around 10 minutes, which is usually enough to ride out write contention (see the sketch after this list).
If possible, make your entity group smaller.
You can find more approaches here.
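For the first approach, a minimal sketch using the App Engine task queue API (the handler URL is hypothetical):
// Defer the heavy listing work to a push-queue task, which gets a
// ~10 minute deadline instead of the 60 s request limit.
Queue queue = QueueFactory.getDefaultQueue();
queue.add(TaskOptions.Builder.withUrl("/tasks/list-stats")
        .param("deviceId", String.valueOf(deviceId))
        .method(TaskOptions.Method.POST));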
Hope this answers your question!!!

manage concurrency access mongoDB

I need to manage concurrent access for data updates in MongoDB.
Example: two users, A and B, connect to my application. User A has updated a piece of data, and user B then wants to update the same data. I want user B's update to fail, because the data has already been updated by user A.
If user A and user B only update one document, and you know both the initial value and the updated value, try this code.
The code tries to update the secret field, and we know the initial value is expertSecret:
public void compareAndSet(String expertSecret, String targetSecret) {
    // get a mongodb collection
    MongoCollection<Document> collection = client.getDatabase(DATABASE).getCollection(COLLECTION);
    // the filter only matches if the document still holds the expected value
    BasicDBObject filter = new BasicDBObject();
    filter.append("secret", expertSecret);
    // updateOne requires an update operator such as $set, not a bare document
    BasicDBObject update = new BasicDBObject();
    update.append("$set", new BasicDBObject("secret", targetSecret));
    collection.updateOne(filter, update);
}
What if you don't know the initial value?
You can add a field representing the operation (for example, a version number) and check that field as part of the update, as sketched below.
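A hedged sketch of that idea with a hypothetical version field (docId and expectedVersion are placeholders); the update only succeeds if nobody has modified the document since it was read:
// Optimistic locking: the filter pins the version we read; $inc bumps it.
Document filter = new Document("_id", docId).append("version", expectedVersion);
Document update = new Document("$set", new Document("secret", targetSecret))
        .append("$inc", new Document("version", 1));
UpdateResult result = collection.updateOne(filter, update);
if (result.getModifiedCount() == 0) {
    // someone else updated first; reject user B's update or retry
}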
What if you need to update more than one document?
Multi-document transactions need server-side support; you can get more information here:
However, for situations that require atomicity for updates to multiple documents or consistency between reads to multiple documents, MongoDB provides the ability to perform multi-document transactions against replica sets. Multi-document transactions can be used across multiple operations, collections, databases, and documents. Multi-document transactions provide an “all-or-nothing” proposition.
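A minimal sketch of such a transaction with the Java driver (requires MongoDB 4.0+ and a replica set; filterA/updateA and filterB/updateB are placeholders):
try (ClientSession session = client.startSession()) {
    session.startTransaction();
    try {
        // both updates commit or neither does
        collection.updateOne(session, filterA, updateA);
        collection.updateOne(session, filterB, updateB);
        session.commitTransaction();
    } catch (RuntimeException e) {
        session.abortTransaction();
        throw e;
    }
}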

How to listen for change in a collection?

I have to check for changes in an old embedded DBF database which is populated by an old third-party application. I don't have access to that application's source code and cannot put a trigger or anything similar on the database. Due to business constraints I cannot change that...
My objective is to capture new records, deleted records, and modified records from a table (~1500 records) of that database with a Java application, for further processing. The database is accessible in my Spring application through JPA/Hibernate with the HXTT DBF driver.
I am now looking for a way to efficiently capture changes made by the third-party app in the database.
Do I have to periodically read the whole table and check whether each record is still unchanged, or apply some kind of diff between two readings? Is there a kind of "trigger" I can set in my Java app? How do I listen properly for those changes?
There is no JPA mechanism for getting callbacks from the database when data changes.
The only option is to build your own change detection. Typically you would start by detecting which entities were added, which were removed, and which still exist. For the ones that still exist you will need to check whether they changed, so the entity needs an equals() method.
An entity is identified by its primary key, so you will need to get the set of all primary keys; once you have that, you can easily use Guava's Sets methods to produce the three sets of added, removed, and existing (before and now), like this:
import java.util.*;
import java.util.function.Function;
import java.util.stream.Collectors;
import com.google.common.collect.Sets;

List<MyEntity> old = new ArrayList<>();     // loaded from the DB last time
List<MyEntity> current = new ArrayList<>(); // loaded from the DB now
Map<Long, MyEntity> oldMap = old.stream()
        .collect(Collectors.toMap(MyEntity::getId, Function.identity()));
Map<Long, MyEntity> currentMap = current.stream()
        .collect(Collectors.toMap(MyEntity::getId, Function.identity()));
Set<Long> oldKeys = oldMap.keySet();
Set<Long> currentKeys = currentMap.keySet();
Sets.SetView<Long> deletedKeys = Sets.difference(oldKeys, currentKeys);
Sets.SetView<Long> addedKeys = Sets.difference(currentKeys, oldKeys);
Sets.SetView<Long> couldBeChanged = Sets.intersection(oldKeys, currentKeys);
for (Long id : couldBeChanged) {
    // note the negation: equal objects are unchanged
    if (!oldMap.get(id).equals(currentMap.get(id))) {
        // entity with this id was changed
    }
}
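The equals() this relies on is ordinary value equality; a minimal sketch with hypothetical fields (uses java.util.Objects):
@Override
public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof MyEntity)) return false;
    MyEntity other = (MyEntity) o;
    // compare every column you want change detection to notice
    return Objects.equals(getId(), other.getId())
            && Objects.equals(getName(), other.getName())
            && Objects.equals(getUpdatedAt(), other.getUpdatedAt());
}

@Override
public int hashCode() {
    return Objects.hash(getId(), getName(), getUpdatedAt());
}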

Inconsistent behavior when adding an Entity using Google App Engine datastore

I'm working on a Google App Engine project that's written in Java.
Whenever I add an entity into the datastore I use a transaction:
DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
Transaction transaction = datastore.beginTransaction();
datastore.put(entity);
transaction.commit();
I then count the number of entities that are in the datastore:
Query query = new Query(kind);
query.setFilter(filter);
PreparedQuery preparedQuery = datastore.prepare(query);
int count = preparedQuery.countEntities(FetchOptions.Builder.withDefaults());
System.out.println("count = " + count);
I've noticed that the datastore can sometimes be inconsistent: I'll add an entity, but the count doesn't change; if I then add another entity, the count goes up by 2.
Why does this happen? How do I stop it from happening?
You are experiencing the effects of "eventual consistency", which is a side effect of using the High Replication Datastore. Basically, you are running the query before the data has had time to replicate across data centres. You can avoid this by using "ancestor queries", which always give strongly consistent results. I suggest you read this, which explains it in detail.
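A rough sketch of the ancestor approach, assuming all entities of this kind are created under a single hypothetical parent key (fine at low write rates, since a single entity group sustains roughly one write per second):
// All writes and the count query share one entity group via the parent key.
Key parent = KeyFactory.createKey("EntityList", "default"); // hypothetical kind/name
Entity entity = new Entity(kind, parent);
datastore.put(entity);

Query query = new Query(kind).setAncestor(parent); // strongly consistent
int count = datastore.prepare(query)
        .countEntities(FetchOptions.Builder.withDefaults());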
