Timeouts in datastore queries - java

I am using Objectify v5.1.11 from auto-scaled App Engine instances in the Java 8 runtime environment.
I have an API which IoT devices call periodically to upload statistics information. In this API, I insert an entity into Datastore to store the statistics. This entity uses Datastore's auto-generated IDs. The entity definition is as follows:
@Entity(name = "Stats")
public class StatsEntity {
    @Id
    private Long statisticsId;
    @Index
    private Long deviceId;
    @Index
    private String statsKey;
    @Index
    private Date creationTime;
}
But I had a requirement to check for duplicates before inserting the entity, so I switched to custom-generated (String) IDs. I came up with a mechanism of appending the deviceId to the statsKey string (unique for each statistic within a device) provided by the device to generate the ID.
This is to avoid the eventual-consistency behaviour I would get if I used a query to check whether the entity already exists. Since get-by-ID is strongly consistent, I can use it to check for duplicates, as in the sketch below.
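A minimal sketch of that mechanism (assuming the @Id field is now a String; the ":" separator and variable names are illustrative):

import static com.googlecode.objectify.ObjectifyService.ofy;

// Build the custom String ID from deviceId and statsKey.
String statisticsId = deviceId + ":" + statsKey;

// Get-by-ID is strongly consistent, so this reliably detects duplicates.
StatsEntity existing = ofy().load().type(StatsEntity.class).id(statisticsId).now();
if (existing == null) {
    ofy().save().entity(newStats).now(); // newStats carries statisticsId as its @Id
}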
There is another API to fetch the statistics uploaded by a device. In this API, I list the entities by filtering on deviceId and ordering by creationTime in descending order (newest first), with a page size of 100. This request times out because it exceeds App Engine's 60-second limit. I see the following exception in the logs:
Task was cancelled.
java.util.concurrent.CancellationException: Task was cancelled.
at com.google.common.util.concurrent.AbstractFuture.cancellationExceptionWithCause(AbstractFuture.java:1355)
at com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:555)
at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:436)
at com.google.common.util.concurrent.AbstractFuture$TrustedFuture.get(AbstractFuture.java:99)
at com.google.appengine.tools.development.TimedFuture.get(TimedFuture.java:42)
at com.google.common.util.concurrent.ForwardingFuture.get(ForwardingFuture.java:62)
at com.google.appengine.api.utils.FutureWrapper.get(FutureWrapper.java:93)
at com.google.appengine.api.datastore.FutureHelper.getInternal(FutureHelper.java:69)
at com.google.appengine.api.datastore.FutureHelper.quietGet(FutureHelper.java:33)
at com.google.appengine.api.datastore.BaseQueryResultsSource.loadMoreEntities(BaseQueryResultsSource.java:243)
at com.google.appengine.api.datastore.BaseQueryResultsSource.loadMoreEntities(BaseQueryResultsSource.java:180)
at com.google.appengine.api.datastore.QueryResultIteratorImpl.ensureLoaded(QueryResultIteratorImpl.java:173)
at com.google.appengine.api.datastore.QueryResultIteratorImpl.hasNext(QueryResultIteratorImpl.java:70)
at com.googlecode.objectify.impl.KeysOnlyIterator.hasNext(KeysOnlyIterator.java:29)
at com.google.common.collect.Iterators$5.hasNext(Iterators.java:580)
at com.google.common.collect.TransformedIterator.hasNext(TransformedIterator.java:42)
at com.googlecode.objectify.impl.ChunkIterator.hasNext(ChunkIterator.java:39)
at com.google.common.collect.MultitransformedIterator.hasNext(MultitransformedIterator.java:50)
at com.google.common.collect.MultitransformedIterator.hasNext(MultitransformedIterator.java:50)
at com.google.common.collect.Iterators$PeekingImpl.hasNext(Iterators.java:1105)
at com.googlecode.objectify.impl.ChunkingIterator.hasNext(ChunkingIterator.java:51)
at com.ittiam.cvml.dao.repository.PerformanceStatsRepositoryImpl.list(PerformanceStatsRepositoryImpl.java:154)
at com.ittiam.cvml.service.PerformanceStatsServiceImpl.listPerformanceStats(PerformanceStatsServiceImpl.java:227)
The statsKey provided by the device is based on time and hence monotonically increasing (a step increase every 15 minutes), which is bad as per this link.
But my traffic is not large enough to warrant this behaviour: each device makes 2 to 3 requests every 15 minutes, and there are about 300 devices.
When I try to list entities for devices which haven't made any requests since I made this switch to custom IDs, I still observe this issue.
Edit
My code to list the entity is as follows:
Query<StatsEntity> query = ofy().load().type(StatsEntity.class);
List<StatsEntity> entityList = new ArrayList<>();
query = query.filter("deviceId", deviceId);
query = query.order("-creationTime");
query = query.limit(100);

QueryResultIterator<StatsEntity> iterator = query.iterator();
while (iterator.hasNext()) {
    entityList.add(iterator.next());
}

This error usually occurs because of write contention: multiple transactions reading from and writing to the same entity group concurrently.
There are various approaches to solve this problem:
A query lives for only about 30 seconds, but you can extend the available time by converting your API into a task queue handler; task queue requests last around 10 minutes, and they are the usual way to handle such write-contention work (a sketch follows below).
If possible, make your entity group smaller.
You can find more approaches here.
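For example, a minimal sketch of deferring the listing work to a push task queue (the handler URL and parameter are assumptions):

import com.google.appengine.api.taskqueue.Queue;
import com.google.appengine.api.taskqueue.QueueFactory;
import com.google.appengine.api.taskqueue.TaskOptions;

// Enqueue the heavy listing work; the task handler gets ~10 minutes.
Queue queue = QueueFactory.getDefaultQueue();
queue.add(TaskOptions.Builder.withUrl("/tasks/listStats") // hypothetical handler URL
        .param("deviceId", String.valueOf(deviceId))
        .method(TaskOptions.Method.POST));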
Hope this answers your question!

Related

How to implement pagination by nextPageToken?

I am trying to implement pagination using a nextPageToken.
I have a table:
CREATE TABLE IF NOT EXISTS categories
(
    id        BIGINT PRIMARY KEY,
    name      VARCHAR(30) NOT NULL,
    parent_id BIGINT REFERENCES categories (id)
);
So I have an entity Category:
@Id
@GeneratedValue(strategy = GenerationType.SEQUENCE)
private Long id;

@Column(name = "name")
private String name;

@OneToOne
@JoinColumn(name = "parent_id", referencedColumnName = "id")
private Category category;
I don't really understand what I have to do next.
The client requests a token (that keeps what?).
Assume I have a controller:
@GetMapping
public ResponseEntity<CategoriesTokenResponse> getCategories(
        @RequestParam String nextPageToken
) {
    return ResponseEntity.ok(categoryService.getCategories(nextPageToken));
}
Service:
public CategoriesTokenResponse getCategories(String nextPageToken) {
    return new CategoriesTokenResponse(categoryDtoList, "myToken");
}

@Data
@AllArgsConstructor
public class CategoriesTokenResponse {
    private final List<CategoryDto> categories;
    private final String token;
}
How do I implement the SQL query for that? And how do I generate the nextPageToken for each id?
SELECT * FROM categories WHERE parent_id = what?
AND max(category id from previous page = token?)
ORDER BY id LIMIT 20;
First, you need to understand what you are working with here. Every implementation has some sort of limitation or inefficiency. For example, using page tokens like that is only good for infinite-scroll pages: you can't jump to a specific page, so if my browser crashes while I'm on page 100, I have to scroll through 100 pages again. Token-based paging is faster for massive data sets, but does that matter if you need access to all pages, or if you limit the return to begin with, such as only serving the first 30 pages?
Basically, decide this: do you only care about the first few pages because search/sort is always in use (like a user never going past the first 1-5 pages of Google results), and is the data set large? Then tokens are a great use case. Will a user select "all items in the last 6 months" and actually need all of them, or is the sort/search weak? Will you return all pages rather than capping the response at 30 pages? Is development speed more important than a 0.1-3 second speed increase (depending on data size)? Then go with the built-in JPA Page objects.
I have used Page objects on 700k records with less than a second of speed change compared to 70k records. Based on that, I don't see removing OFFSET adding a ton of value unless you plan for a huge data set. I just tested a new system I'm building with Pageable: it returned 10 items on page 1 in 84 milliseconds with no page limiter, for 27k records, over a VPN into my work network from my house. A table with over 500k records took 131 milliseconds. That's pretty fast. Want to make it faster? Force a total max return of 30 pages and a max of 100 results per page, because normally users don't need all the data in that table; if they want something else, they can refine the search. The speed difference between this and seek/key-style paging is less than a second. This assumes a normal SQL database, too; NoSQL is a bit different. Baeldung has a ton of articles on JPA paging, like the following: https://www.baeldung.com/rest-api-pagination-in-spring
JPA paging should take no more than 30 minutes to learn and implement; it's extremely easy and comes stock on JPA repositories. I strongly suggest using that over seek/key-style paging, as you likely aren't building a system like Google's or Facebook's.
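A rough sketch of the built-in paging against your entity (the repository name is an assumption; findByCategory_Id is a derived query traversing the parent relation):

import org.springframework.data.domain.Page;
import org.springframework.data.domain.PageRequest;
import org.springframework.data.domain.Pageable;
import org.springframework.data.domain.Sort;
import org.springframework.data.jpa.repository.JpaRepository;

public interface CategoryRepository extends JpaRepository<Category, Long> {
    Page<Category> findByCategory_Id(Long parentId, Pageable pageable);
}

// Usage: the next page number can double as a simple nextPageToken.
Page<Category> page = categoryRepository.findByCategory_Id(
        parentId, PageRequest.of(0, 20, Sort.by("id")));
String nextPageToken = page.hasNext() ? String.valueOf(page.getNumber() + 1) : null;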
If you absolutely want to go with seek/key-style paging, there's a good informational page here:
https://blog.jooq.org/2013/10/26/faster-sql-paging-with-jooq-using-the-seek-method/
In general, what you are looking for is using jOOQ with Spring. Implementation example here:
https://docs.spring.io/spring-boot/docs/1.3.5.RELEASE/reference/html/boot-features-jooq.html
Basically, create a DSL context:
private final DSLContext dsl;

@Autowired
public JooqExample(DSLContext dslContext) {
    this.dsl = dslContext;
}
Then use it like so:
dsl.select(PLAYERS.PLAYER_ID,
           PLAYERS.FIRST_NAME,
           PLAYERS.LAST_NAME,
           PLAYERS.SCORE)
   .from(PLAYERS)
   .where(PLAYERS.GAME_ID.eq(42))
   .orderBy(PLAYERS.SCORE.desc(),
            PLAYERS.PLAYER_ID.asc())
   .seek(949, 15) // (!)
   .limit(10)
   .fetch();
Instead of explicitly phrasing the seek predicate, just pass the last record from the previous query, and jOOQ will ensure that all records before and including this record are skipped, given the ORDER BY clause.
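As for minting the nextPageToken itself, one hedged option is to encode the last id of the page as an opaque string (Base64 here is an arbitrary choice):

import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Issue the token from the last id of the current page.
String nextPageToken = Base64.getUrlEncoder().withoutPadding()
        .encodeToString(Long.toString(lastId).getBytes(StandardCharsets.UTF_8));

// On the next request, decode the token back into the seek position.
long lastSeenId = Long.parseLong(new String(
        Base64.getUrlDecoder().decode(nextPageToken), StandardCharsets.UTF_8));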

Spring + Hibernate loading large amount records

I'm trying to find the best/optimal way of loading larger amounts of data from a MySQL database in a Spring/Hibernate service.
I pull about 100k records from a 3rd-party API (in chunks, usually between 300 and 1000). I then need to pull translations for each record from the database. Since there are 30 languages, there are 30 rows per record, so 1000 records from the API means 30,000 rows from the database.
The records from the API come in the form of POJOs (super small in size). Say I get 1000 records: I split the list into multiple 100-record lists, then collect the IDs of each record and select all translations from the database for those records. I only need two values from the table, which I then add to my POJOs, and then I push the POJOs to the next service.
Basically this:
interface I18nRepository extends CrudRepository<Translation, Long> {
    List<Translation> findAllByRecordIdIn(List<Long> ids);
}

List<APIRecord> records = api.findRecords(...);
List<List<APIRecord>> partitioned = Lists.partition(records, 100); // Guava

for (List<APIRecord> chunk : partitioned) {
    List<Long> ids = new ArrayList<>();
    for (APIRecord record : chunk) {
        ids.add(record.getId());
    }
    List<Translation> translations = i18nRepository.findAllByRecordIdIn(ids);
    for (APIRecord record : chunk) {
        for (Translation translation : translations) {
            if (translation.getRecordId().equals(record.getId())) {
                record.addTranslation(translation);
            }
        }
    }
}
As far as Spring Boot/Hibernate properties go, I only have the defaults set. I would like to make this as efficient, fast and memory-light as possible. One idea I had was to use a lower-level API instead of Hibernate to bypass the object mapping.
In my opinion, you should bypass JPA/Hibernate for bulk operations.
There's no way to make bulk operations efficient in JPA.
Consider using Spring's JpaTemplate and native SQL.
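A hedged sketch of that idea using NamedParameterJdbcTemplate and native SQL (the table and column names and the Translation constructor are assumptions):

import java.util.List;
import java.util.Map;
import org.springframework.jdbc.core.namedparam.NamedParameterJdbcTemplate;

NamedParameterJdbcTemplate jdbc = new NamedParameterJdbcTemplate(dataSource);

// Select only the two needed columns; the IN clause is expanded automatically.
List<Translation> translations = jdbc.query(
        "SELECT record_id, value FROM translation WHERE record_id IN (:ids)",
        Map.of("ids", ids),
        (rs, rowNum) -> new Translation(rs.getLong("record_id"), rs.getString("value")));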

GAE: Exception while allocating 16 digit ids

I noticed a problem with allocating IDs on Google App Engine while using the datastore. In my application I have a set of data that has to be initially uploaded. The data was prepared on a test App Engine environment, so it has auto-generated values for the ID fields. Since I want to preserve these values, I recreate the entities using the remote API with Objectify as a separate process. After the upload I want to make sure that the used IDs are removed from the value range of the auto-generator. I'm using DatastoreService.allocateIdRange with a range of a single long value. Everything works fine on the dev server, but on appspot, for some values (16-digit values), I receive an "Exceeded maximum allocated IDs" IllegalArgumentException.
Is there any limitation on the allocateIdRange call? (I have found none in the documentation.)
Below is a sample code I'm using for id allocation for datastore after upload:
DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
String kind = Key.getKind(clazz);
PreparedQuery query = datastore.prepare(new Query(kind).setKeysOnly());
KeyRange keyRange = null;
Long id = null;
for (Entity entity : query.asIterable()) {
    id = entity.getKey().getId();
    keyRange = new KeyRange(null, kind, id, id);
    DatastoreService.KeyRangeState state = datastore.allocateIdRange(keyRange);
}
This is a known issue with allocateIdRange(). A better error message would be "You can't call allocateIdRange() on scattered ids".
Scattered ids are the default since 1.8.1 and have values >= 2^52. Unfortunately we don't currently expose an API to reserve these ids.
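Given that, one hedged workaround is to reserve only the legacy-range ids and skip the scattered ones:

// Sketch: scattered ids have values >= 2^52 and cannot be reserved, so
// only call allocateIdRange() for ids below that floor.
void reserveIfLegacyId(DatastoreService datastore, String kind, long id) {
    final long scatteredIdFloor = 1L << 52;
    if (id < scatteredIdFloor) {
        datastore.allocateIdRange(new KeyRange(null, kind, id, id));
    }
}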
It sounds like you may be trying to allocate an ID larger than the maximum allowed ID. This is limited by the largest integer size in JavaScript, which is 2^53.
Here is the page describing the App Engine limitation and the largest JavaScript int.

GAE datastore: Entity deleted only after multiple calls to delete()

I'm playing with the GAE datastore on a local machine with Eclipse. I've created two servlets - AddMovie and DeleteMovie:
AddMovie
Entity movie = new Entity("movie", System.currentTimeMillis());
movie.setProperty("name", "Hakeshset Beanan");
movie.setProperty("director", "Godard");
datastore.put(movie);
DeleteMovie
Query q = new Query("movie");
PreparedQuery pq = datastore.prepare(q);
List<Entity> movies = Lists.newArrayList(pq.asIterable());
response.put("numMoviesFound", String.valueOf(movies.size()));
for (Entity movie : movies) {
    Key key = movie.getKey();
    datastore.delete(key);
}
The funny thing is that the DeleteMovie servlet does not delete all the movies. Consecutive calls return {"numMoviesFound":"15"}, then {"numMoviesFound":"9"}, {"numMoviesFound":"3"} and finally {"numMoviesFound":"3"}.
Why aren't all movies deleted from the datastore at once?
Update: The problem seems to happen only on the local Eclipse dev server, not on the GAE servers.
I think you should delete all your movies in a single transaction; it would ensure better consistency.
Talking about consistency, your problem is right here:
Google App Engine's High Replication Datastore (HRD) provides high availability for your reads and writes by storing data synchronously in multiple data centers. However, the delay from the time a write is committed until it becomes visible in all data centers means that queries across multiple entity groups (non-ancestor queries) can only guarantee eventually consistent results. Consequently, the results of such queries may sometimes fail to reflect recent changes to the underlying data.
https://developers.google.com/appengine/docs/java/datastore/structuring_for_strong_consistency
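For what it's worth, here is a minimal sketch of batching the deletes into a single call (it saves one RPC per entity, though it does not by itself make the non-ancestor query strongly consistent):

// Collect the keys with a keys-only query, then delete them in one batch.
Query q = new Query("movie").setKeysOnly();
List<Key> keys = new ArrayList<>();
for (Entity movie : datastore.prepare(q).asIterable()) {
    keys.add(movie.getKey());
}
datastore.delete(keys);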

Hibernate Batch Processing Using Native SQL

I have an application using Hibernate. One of its modules calls native SQL (a stored procedure) in a batch process. Roughly, what it does is that every time it writes a file, it updates a field in the database. Right now I am not sure how many files will need to be written, as it depends on the number of transactions per day, so it could be anywhere from zero to a million.
If I use this code snippet in a loop, will I have any problems?
@Transactional
public void test()
{
    // The for loop represents a list of records that needs to be processed.
    for (int i = 0; i < 1000000; i++)
    {
        // Process the records and write the information into a file.
        ...
        // Update a field(s) in the database using a stored procedure based on the processed information.
        updateField(String.valueOf(i));
    }
}

@Transactional(propagation = Propagation.MANDATORY)
public void updateField(String value)
{
    Session session = getSession();
    SQLQuery sqlQuery = session.createSQLQuery("exec spUpdate :value");
    sqlQuery.setParameter("value", value);
    sqlQuery.executeUpdate();
}
Will I need any other configuration for my data source and transaction manager?
Will I need to set hibernate.jdbc.batch_size and hibernate.cache.use_second_level_cache?
Will I need to use session flush and clear for this? The samples in the Hibernate tutorial use POJOs and not native SQL, so I am not sure whether they also apply here.
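For reference, the batching idiom those tutorial samples describe looks roughly like this (the batch size of 50 is an arbitrary choice, and the session variable is assumed to be in scope); with a pure native-SQL stored procedure there are no managed entities, so there is little for flush/clear to do:

for (int i = 0; i < 1000000; i++) {
    updateField(String.valueOf(i));
    if (i % 50 == 0) {
        session.flush(); // push pending changes to the database
        session.clear(); // detach entities so the persistence context stays small
    }
}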
Please note that another part of the application is already using Hibernate, so as much as possible I would like to stick to Hibernate.
Thank you for your time; I am hoping for a quick response. If possible, a code snippet would be really useful to me.
Application Work Flow
1) Query Database for the transaction information. (Transaction date, Type of account, currency, etc..)
2) For each account process transaction information. (Discounts, Current Balance, etc..)
3) Write the transaction information and processed information to a file.
4) Update a database field based on the process information
5) Go back to step 2 while there are still accounts. (Assuming that no exceptions are thrown.)
The code snippet will open and close a session for each iteration, which is definitely not good practice.
Is it possible to have a job which checks how many new files were added to the folder?
The job could run, say, every 15-25 minutes, check which files were changed or added in the last 15-25 minutes, and update the database in a batch.
Something like that will lower the number of session opens and closes, and it should be much faster than the current approach.
