Filtering paginated results - java

How can I solve the following problem? I have a paginated list built with Class.createCriteria() with some filters applied, but I have to filter the records "afterwards", because the function used for filtering can't be used inside createCriteria(): it does not depend strictly on this class' properties.
The problem is that even though createCriteria() returns a paginated list of 10 records, after filtering them with this other function only e.g. 3 records remain. I don't know how to apply this filter inside createCriteria() so that I always get those 10 records and my totalCount is counted properly.

Related

scanindexforward() not working as expected while doing pagination in dynamodb

I am using DynamoDbEnhancedAsyncClient to query DynamoDB using a GSI and pagination. Below is the code that I am using to achieve this. I am trying to limit the number of items per page and the number of pages sent to the subscriber of the Mono. I need to sort the records in each page in descending order by timestamp, which is the sort key in my GSI. For this I am using scanIndexForward(false) below. However, I am not getting any records in the page even though there are 4 records in total in DynamoDB.
SdkPublisher<Page<Customer>> query = secindex.query(QueryEnhancedRequest.builder()
        .queryConditional(queryconditional).scanIndexForward(false)
        .limit(2).build()); // descending sort-key order, at most 2 items per request
Mono.from(PagePublisher.create(query.limit(1))); // emit only the first page
secindex is the DynamoDbAsyncIndex for the GSI. As per the above code, 1 page with 2 records should be returned to the client, however none are returned. If I remove scanIndexForward(false), the result is as expected but sorted in ascending order. How do I make it return a limited number of records in descending order? Does pagination work differently when scanIndexForward() is supplied?
Without knowing exactly what filters are on your Dynamo call I can only guess, but I've seen this sort of thing many times.
Correction: the limit is applied before the query results are returned, not after, as an earlier version of this answer stated. Because additional filters are applied after the limit, the query can read 2 items, filter both of them out, and ultimately return 0.
A standard DynamoDB Query can only select items by Hash Key/Range Key, with some basic Range Key conditions (gt, lt, between, begins with, etc.). All other filters on attributes that are not Hash/Range are applied only after the matching items have been read and the limit has been applied.
1 - Query Dynamo with a Hash Key/Range Key combination and any conditions on the Range Key.
2 - All items that match are read, up to 1 MB of data. Anything more than that needs additional calls.
3 - The limit is applied to these results (before anything is returned, not after).
4 - The filter is applied to what has been limited.
5 - Whatever is left is returned.
This means that when you use filter conditions on a Dynamo query, you often do not get back what you expect: the items that match the filter may be on the next page, and if nothing on the current page matches, you get back 0.
Since you are also using Limit, the data is read in descending Sort Key order (as scan index forward is false), and if the first two values don't match your other filters, you get 0 items back.
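To make the order of operations concrete, here is a minimal in-memory sketch in plain Java. It does not call DynamoDB at all; the Item type, the "status" attribute and the query helper are made up for illustration, and the 4 records are just an assumption that mirrors the numbers in the question.

```java
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class LimitThenFilterDemo {

    // Hypothetical item: a sort-key timestamp plus one non-key attribute.
    record Item(long timestamp, String status) {}

    // Mimics the order described above: sort by the sort key, cut the result
    // down to `limit` items, and only then apply the attribute filter.
    static List<Item> query(List<Item> items, boolean scanIndexForward,
                            int limit, Predicate<Item> filter) {
        Comparator<Item> bySortKey = Comparator.comparingLong(Item::timestamp);
        if (!scanIndexForward) {
            bySortKey = bySortKey.reversed();
        }
        return items.stream()
                .sorted(bySortKey)
                .limit(limit)    // limit first
                .filter(filter)  // filter afterwards
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Item> items = List.of(
                new Item(1, "ACTIVE"), new Item(2, "ACTIVE"),
                new Item(3, "CLOSED"), new Item(4, "CLOSED"));
        Predicate<Item> active = i -> i.status().equals("ACTIVE");

        System.out.println(query(items, true, 2, active).size());   // prints 2
        System.out.println(query(items, false, 2, active).size());  // prints 0
    }
}
```

With ascending order the two oldest items happen to pass the filter; with descending order the two newest are read, both fail the filter, and the page comes back empty even though matching items exist.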
I would recommend you try querying all the items without any filters beyond just Hash/Range key - no other attribute filters.
Then filter the response manually on your side.
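A rough sketch of what "filter manually on your side" could look like, assuming the key-only query results are already in a list in the order you want. The class and method names here are hypothetical, and the strings stand in for real items.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class ClientSideFilter {

    // Filters first and limits afterwards, so the page is only short when
    // fewer than pageSize items match in total.
    static <T> List<T> filterThenLimit(List<T> keyOnlyResults,
                                       Predicate<T> filter, int pageSize) {
        return keyOnlyResults.stream()
                .filter(filter)
                .limit(pageSize)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Stand-in for the unfiltered query response, newest first.
        List<String> results = List.of("4:CLOSED", "3:CLOSED", "2:ACTIVE", "1:ACTIVE");
        System.out.println(filterThenLimit(results, s -> s.endsWith("ACTIVE"), 2));
        // prints [2:ACTIVE, 1:ACTIVE]
    }
}
```

The trade-off is that you read and pay for items you then throw away, which is why the key design matters if this happens a lot.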
You should also be aware of the internal pagination of the Dynamo SDKs: a single call only returns up to 1 MB of data from Dynamo. Anything beyond that requires a second call that includes the LastEvaluatedKey returned with the first page of results. https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Query.Pagination.html has more information.
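The paging loop can be sketched like this. It is not the AWS SDK: the Page record and the fetchPage function are hypothetical stand-ins for "one query call that takes a start key and returns items plus a LastEvaluatedKey". The loop keeps asking for the next page until enough items have passed the filter or there is nothing more to read.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

class PagedCollector {

    // Hypothetical page shape: the items of one call plus the key to resume
    // from (null when there is nothing more to read).
    record Page<T>(List<T> items, String lastEvaluatedKey) {}

    static <T> List<T> collect(Function<String, Page<T>> fetchPage,
                               Predicate<T> filter, int wanted) {
        List<T> result = new ArrayList<>();
        String startKey = null; // null means "start from the beginning"
        do {
            Page<T> page = fetchPage.apply(startKey);
            for (T item : page.items()) {
                if (filter.test(item) && result.size() < wanted) {
                    result.add(item);
                }
            }
            startKey = page.lastEvaluatedKey();
        } while (startKey != null && result.size() < wanted);
        return result;
    }

    public static void main(String[] args) {
        // Fake two-page data source standing in for the real query call.
        Function<String, Page<Integer>> fake = key -> key == null
                ? new Page<Integer>(List.of(9, 8, 7), "k1")
                : new Page<Integer>(List.of(6, 5, 4), null);
        System.out.println(collect(fake, n -> n % 2 == 0, 2)); // prints [8, 6]
    }
}
```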
If your system cannot afford to do the filtering itself after the query is called, then you need to re-evaluate your HashKey/SortKey combinations. Dynamo works best with a schema designed around access patterns. That is to say: I have X data and I will need Y data, so I make X the Hash Key and the Y values different Range Keys under that X.
As an example, take user data. You might have a HashKey of "user_id".
Then you have several different patterns for Range Keys:
meta# (with attributes of email, username, hashed and salted passwords, etc.)
post#1
post#2
post#3
avatar#
So if you make a query on just the Hash Key of the user id, you get all the info. Or if you have a page with just their posts, you can query on the hash key of the user id and a range key that begins with post#.
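As an illustration of that access pattern, here is a small in-memory model in plain Java. It only imitates the idea of "hash key selects a partition, range keys are kept sorted inside it"; the class name, the "user-1" id and the payload strings are assumptions for the example, not anything DynamoDB provides.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SingleTableSketch {

    // hash key -> (range key -> payload), with range keys kept in sorted order
    private final Map<String, TreeMap<String, String>> partitions = new HashMap<>();

    void put(String hashKey, String rangeKey, String payload) {
        partitions.computeIfAbsent(hashKey, k -> new TreeMap<>()).put(rangeKey, payload);
    }

    // "Hash key equals X and range key begins with prefix".
    // An empty prefix returns every range key under the hash key.
    List<String> queryBeginsWith(String hashKey, String prefix) {
        TreeMap<String, String> partition =
                partitions.getOrDefault(hashKey, new TreeMap<>());
        // Every key that starts with the prefix sorts at or after the prefix
        // and before the prefix followed by the largest char value.
        return new ArrayList<>(
                partition.subMap(prefix, true, prefix + Character.MAX_VALUE, false).keySet());
    }

    public static void main(String[] args) {
        SingleTableSketch table = new SingleTableSketch();
        table.put("user-1", "meta#", "email, username");
        table.put("user-1", "post#1", "first post");
        table.put("user-1", "post#2", "second post");
        table.put("user-1", "post#3", "third post");
        table.put("user-1", "avatar#", "image");

        System.out.println(table.queryBeginsWith("user-1", "post#"));
        // prints [post#1, post#2, post#3]
    }
}
```

No filter or limit is involved: the key condition alone selects exactly the posts.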
This is the most important aspect of a good Dynamo schema: the ability to query anything you need with just a Hash Key, or a Hash Key and Range Key.
With a well understood set of access patterns and a Dynamo table that is set up for them, you should need no filters or limits, because your combination of Hash/Range key queries will be enough.
(This does sometimes mean duplicating data! You may have the same information in a Post# item as you do in the Meta# item, i.e. they both contain usernames. This is OK, as you need the user name when you query for a post as well as when you query for the password/username to see if they match. Dynamo, as a NoSQL store, handles this very well and very fast: a given Hash/Range key combination is basically considered its own table in terms of access, making queries VERY fast against it.)

Returning prioritized documents in Elasticsearch

I am trying to write an ES search method that returns records/documents that have Tickets at the top of the list. For example, I do a search on 'Smith' and I get 200 results. I want all results ordered so that those who have tickets appear first. I have this query working; however, whenever I try to sort by first name, the query no longer sorts by those with tickets and then by first name.
Does anyone have any thoughts on how I might keep the original sort order (by Tickets)?

GAE Search API: How to prevent the 2000 bytes query limit?

We have been using the GAE Search API for quite some time but recently hit the query length limit of 2000 bytes:
java.lang.IllegalArgumentException: query string must not be longer than 2000 bytes, was 2384
We basically have documents saved with a secondary id set as an atomic field. Within our query we do some sorting and distance calculations and also exclude docs whose secondary ids match a list of ids, using a NOT statement like the following:
... AND NOT sec_id:(x AND y AND ...)
With a certain amount of excluded ids we obviously hit the query length limit. I could split the query into separate ones with the same base query and only use a different set of excluded ids but then the sorting is problematic.
So I am wondering if there is another way to implement this kind of query, preferably with a black and also a white list within one query (AND NOT :(..) & AND :(..)).

How to remove duplicate rows in a Cursor (Android SDK)?

Currently, I make 3 queries (resulting in 3 cursors) and then merge the cursors using the MergeCursor class. However, this causes duplicates in the cursor, and I can't seem to find a way to remove them. What would be the ideal method to fix this problem?
A Cursor is an object tied to the ResultSet, not the data therein.
If the three result-sets have identical keys, their primary-keys will need to be fetched to de-duplicate the rows - the Cursor implementation does not provide this function. There are several options, two named here:
As alluded to in an earlier comment: do this server-side and have the joined result returned. Ex: send the base query from the client, have the server start three queries and merge the result. Note, though, that databases excel at set operations and there is almost never a performance gain in doing this programmatically.
Launch one task that in turn runs the three queries and does the work to fetch the rows, returning just the distinct set of keys.
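For the second option, the de-duplication step itself might look roughly like the sketch below. It assumes the rows have already been read out of the three cursors into plain lists; the Row type with an id primary key and the mergeDistinct helper are hypothetical, not part of the Android Cursor API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RowDeduplicator {

    // Hypothetical row: the primary key plus one column.
    record Row(long id, String title) {}

    // Merges several result sets, keeping the first row seen for each primary
    // key and preserving first-seen order.
    static List<Row> mergeDistinct(List<List<Row>> resultSets) {
        Map<Long, Row> byId = new LinkedHashMap<>();
        for (List<Row> rows : resultSets) {
            for (Row row : rows) {
                byId.putIfAbsent(row.id(), row);
            }
        }
        return new ArrayList<>(byId.values());
    }

    public static void main(String[] args) {
        List<Row> first = List.of(new Row(1, "a"), new Row(2, "b"));
        List<Row> second = List.of(new Row(2, "b"), new Row(3, "c"));
        List<Row> third = List.of(new Row(1, "a"));

        System.out.println(mergeDistinct(List.of(first, second, third)).size()); // prints 3
    }
}
```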

getting the number of filtered results in a listgrid (smartgwt)

I have a ListGrid filled with entries, with filtering enabled. When I execute the filter, the ListGrid gets updated accordingly. Now I want to get the number of results that were found with the filters. I tried adding a FilterEditorSubmitHandler, but that gets executed before the filter is actually executed (and listGrid.getDataSource().getFields() always returns 0).
Is there a way to get the number of results after the filter was applied? To be more precise: the FilterEditorSubmitHandler is called before the actual filtering, and I need a handler that gets called after the filter was applied, or right after the grid was updated again.
You can use any of the following:
grid.getRecords();
grid.getRecordList();
grid.getResultSet();
grid.getDataAsRecordList();
I just found the correct handler.
The DataArrivedHandler fires after a filter is executed and gets the correct number of results. On the first fill of the ListGrid it returns 0, but when filling the DataSource I already know the size of the set I retrieve.
As a side note: I'm using an RPC call to get the data out of the DB, not the SmartGWT server stuff.
