Count entities in datastore Google app engine - java

In my GAE application I have a survey. For example one question is "where do you live?".
If we are using the Google guestbook example application, the key would be guestbook, and under it I have created smaller keys like address, content2, content3. I have no problem posting or retrieving individual results such as "Fred MALE 24 Tokyo" (content1, gender, age, place). However, on a separate page I want to show the total number of people who took the survey, for example: 10 people answered the survey, 3 males and 7 females. I also want to show results such as "10 people answered the survey in Tokyo." Can someone explain a way to count like this with the datastore? It would also help if your answer was in the context of Google's guestbook example, or another simple GAE example. Thank you in advance!

There is count, see https://developers.google.com/appengine/docs/python/datastore/queryclass#Query_count
However, see the notes there. In short:
1. Its time is proportional to the number of elements being counted.
2. If you have many elements it will time out because of (1), so you would need to use a limit and paginate client-side.
3. It's faster than iterating, but only by a constant factor (because of (1)).
For many elements, your options are to use something like mapreduce, to use Cloud SQL (easiest but $), or to maintain the counts yourself, usually with sharding and/or memcache to improve performance and greatly reduce the chance that simultaneous operations lose data (because memcache supports atomic addition operations).
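The "maintain the counts yourself, with sharding" option can be sketched roughly as follows. This is a minimal in-memory stand-in, not App Engine code: the class name ShardedCounter is hypothetical, and the AtomicLongArray plays the role that separate counter entities (each updated in its own transaction) would play in the datastore.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical in-memory model of a sharded counter. In a real App Engine
// app each shard would be its own datastore entity; here an AtomicLongArray
// stands in for those entities (an assumption made for illustration).
final class ShardedCounter {
    private final AtomicLongArray shards;

    ShardedCounter(int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        this.shards = new AtomicLongArray(shardCount);
    }

    // Each write picks a random shard, so concurrent writers rarely
    // contend on the same slot.
    void increment() {
        int i = ThreadLocalRandom.current().nextInt(shards.length());
        shards.incrementAndGet(i);
    }

    // Reading sums all shards: the cost depends on the number of shards,
    // not on the number of entities that were counted.
    long total() {
        long sum = 0;
        for (int i = 0; i < shards.length(); i++) {
            sum += shards.get(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        ShardedCounter answers = new ShardedCounter(8);
        for (int i = 0; i < 10; i++) {
            answers.increment();
        }
        System.out.println(answers.total() + " people answered the survey");
    }
}
```

The trade-off is that reads touch every shard, so the shard count should stay small relative to the write rate you need to absorb.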

Do you mean a query like this?
    select count(*) from questionaires where sex = 'M'
I don't think you can get this from the GAE datastore.
The options are:
1. Return a list from a query and output its size:
    @SuppressWarnings("unchecked")
    public List<User> users() {
        PersistenceManager pm = getPersistenceManagerFactory().getPersistenceManager();
        // Fetches every User entity; the caller reads size() for the count
        String query = "select from " + User.class.getName();
        return (List<User>) pm.newQuery(query).execute();
    }
2. Maintain a count on these fields in the datastore.
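As a rough illustration of the second option, the tallies can be updated at the moment each survey answer is stored, so the results page only reads a few numbers. The sketch below is plain Java with a HashMap standing in for counter entities in the datastore; SurveyTally and its key format ("gender:...", "place:...") are hypothetical names chosen for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model: one counter per value you want to report on.
// In the real application these counters would live in the datastore and be
// updated in the same request that saves the survey answer.
final class SurveyTally {
    private final Map<String, Long> counts = new HashMap<>();

    // Called once per submitted survey.
    void record(String gender, String place) {
        counts.merge("total", 1L, Long::sum);
        counts.merge("gender:" + gender, 1L, Long::sum);
        counts.merge("place:" + place, 1L, Long::sum);
    }

    // Returns 0 for a key that has never been recorded.
    long count(String key) {
        return counts.getOrDefault(key, 0L);
    }

    public static void main(String[] args) {
        SurveyTally tally = new SurveyTally();
        tally.record("MALE", "Tokyo");
        tally.record("FEMALE", "Tokyo");
        tally.record("FEMALE", "Osaka");
        System.out.println(tally.count("total") + " people answered the survey");
        System.out.println(tally.count("place:Tokyo") + " people answered the survey in Tokyo");
    }
}
```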

Related

How to select items in date range in DynamoDB

How can I select all items within a given date range?
SELECT * FROM GameScores where createdAt >= start_date && createdAt <= end_date
I want to make a query like this. Do I need to create a global secondary index or not?
I've tried this
public void getItemsByDate(Date start, Date end) {
SimpleDateFormat df = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'");
String stringStart = df.format(start);
String stringEnd = df.format(end);
ScanSpec scanSpec = new ScanSpec();
scanSpec.withFilterExpression("CreatedAt BETWEEN :from AND :to")
.withValueMap(
new ValueMap()
.withString(":from", stringStart)
.withString(":to", stringEnd));
ItemCollection<ScanOutcome> items = null;
items = gamesScoresTable.scan(scanSpec);
}
But it doesn't work: I'm getting fewer results than expected.
I can answer your questions, but to suggest any real solution, I would need to see the general shape of your data, as well as what your GameScore's primary key is.
TL;DR
Set up your table so that you can retrieve data with queries rather than scans and filters, and then create indexes to support less frequently used access patterns and improve querying flexibility. Because reads are so fast when you provide the full (or, although not as fast, partial) primary key, i.e. when you use queries, DynamoDB works best when the table structure is driven by the application's access patterns.
When designing your tables, keep in mind NoSQL design best practices, as well as best practices for querying and scanning and it will pay dividends in the long run.
Explanations
Question 1
How can I select all items within a given date range?
To answer this, I'd like to break that question down a little more. Let's start with: How can I select all items?
This, you have already accomplished. A scan is a great way to retrieve all items in your table, and unless you have all your items within one partition, it is the only way to retrieve all the items in your table. Scans can be helpful when you have to access data by unknown keys.
Scans, however, have limitations, and as your table grows in size they'll cost you in both performance and dollars. A single scan can only retrieve a maximum of 1MB of data, of a single partition, and is capped at that partition's read capacity. When a scan tops out at either limitation, consecutive scans will happen sequentially, meaning a scan on a large table could take multiple round trips.
On top of that, with scans you consume read capacity based on the size of the item, no matter how much (or little) data is returned. If you only request a small number of attributes in your ProjectionExpression, and your FilterExpression eliminates 90% of the items in your table, you still pay to read the entire table.
You can optimize performance of scans using Parallel Scans, but if you require an entire table scan for an access pattern that happens frequently for your application, you should consider restructuring your table. More about scans.
Let's now look at: How can I select all items, based on some criteria?
The ideal way to retrieve data based on some criteria (in your case SELECT * FROM GameScores where createdAt >= start_date && createdAt <= end_date) would be to query the base table (or index). To do so, per the documentation:
You must provide the name of the partition key attribute and a single value for that attribute. Query returns all items with that partition key value.
Like the documentation says, querying a partition will return all of its values. If your GameScores table has a partition key of GameName, then a query for GameName = PacMan will return all Items with that partition key. Other GameName partitions, however, will not be captured in this query.
If you need more depth in your query:
Optionally, you can provide a sort key attribute and use a comparison operator to refine the search results.
Here's a list of all the possible comparison operators you can use with your sort key. This is where you can leverage a between comparison operator in the KeyConditionExpression of your query operation. Something like: GameName = PacMan AND createdAt BETWEEN time1 AND time2 will work, if createdAt is the sort key of the table or index that you are querying.
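To make the partition key plus sort key idea concrete, here is a small plain-Java model of it. It is only a sketch and does not use the DynamoDB SDK: GameScoresModel is a hypothetical class, a HashMap stands in for partitions, and a TreeMap stands in for the sort-key ordering inside one partition. It assumes createdAt is stored as an ISO-8601 UTC string, which sorts lexicographically in chronological order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of a table keyed by (GameName, createdAt).
final class GameScoresModel {
    // partition key (GameName) -> sort key (createdAt string) -> score
    private final Map<String, TreeMap<String, Integer>> partitions = new HashMap<>();

    void put(String gameName, String createdAt, int score) {
        partitions.computeIfAbsent(gameName, k -> new TreeMap<>()).put(createdAt, score);
    }

    // Mirrors "GameName = :game AND createdAt BETWEEN :from AND :to".
    // BETWEEN is inclusive on both ends, hence the two 'true' flags.
    // Only the named partition is touched; other games are never read.
    // Note: subMap throws IllegalArgumentException if from > to.
    List<Integer> query(String gameName, String from, String to) {
        TreeMap<String, Integer> partition = partitions.get(gameName);
        if (partition == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(partition.subMap(from, true, to, true).values());
    }

    public static void main(String[] args) {
        GameScoresModel table = new GameScoresModel();
        table.put("PacMan", "2020-01-01T00:00:00.000Z", 100);
        table.put("PacMan", "2020-02-01T00:00:00.000Z", 200);
        table.put("Galaga", "2020-02-01T00:00:00.000Z", 999);
        System.out.println(table.query("PacMan",
                "2020-01-15T00:00:00.000Z", "2020-12-31T23:59:59.999Z"));
    }
}
```

The point of the model is that the range lookup works on one ordered partition, which is why the date attribute has to be the sort key of the table or index being queried.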
If it is not the sort key, you might have the answer to your second question.
Question 2
Do I need to create a Global Secondary Index?
Let's start with: Do I need to create an index?
If your base table data structure does not fit some of your application's access patterns, you might need to. However, in DynamoDB, denormalizing your data can also support more access patterns. I would recommend watching this video on how to structure your data.
Moving onto: Do I need to create a GSI?
GSIs do not support strong read consistency, so if you need that, you'll need to go with a Local Secondary Index (LSI). However, if you've already created your base table, you won't be able to create an LSI. Another difference between the two is the primary key: a GSI can have a different partition key and sort key from the base table, while an LSI can only differ in its sort key. More about indexes.

ORDER BY with SUM, MIN, MAX, etc using Ebean

I am using PlayFramework 2.2.2 with Ebean and MsSql.
I am looking for the simplest or cleanest method to be able to sort by MIN or MAX etc.
A sample raw sql query might look like:
SELECT id, name, tickets FROM users WHERE tickets != NULL ORDER BY MAX(tickets)
I don't know if it's just me, but the documentation for ebean is incredibly confusing. It seems any time anyone comes up with a query that couldn't be written by a 9 year old, the answer is "switch to RawSQL". Well, why bother with Ebean at all then?
Anyway, I would really like to see some CONCRETE Ebean examples of ordering by MIN/MAX, etc.
Do you really need to order by MIN or MAX, given that id is in the select list?
Let me know whether id is unique or whether duplicates are allowed.
If id is unique, I would suggest the following query:
select id, name, tickets from users where tickets is not null order by tickets desc
I encountered the same problem, and then I found the following information.
Design Goal:
This query language is NOT designed to be a replacement for SQL. It is designed to be a simple way to describe the "Object Graph" you want Ebean to build for you. Each find/fetch represents a node in that "Object Graph" which makes it easy to define for each node which properties you want to fetch.
Once you hit the limits of this language such as wanting aggregate functions (sum, average, min etc) or recursive queries etc you use SQL. Ebean's goal is to make it as easy as possible to use your own SQL to populate entity beans. Refer to RawSql .
-> http://www.avaje.org/static/javadoc/pub/com/avaje/ebean/Query.html

sql query performance: finding max

I have a table with group and permission column. I want to find the max permission from a list of group. I am using java and oracle database, I thought of two ways to do this:
Way 1:
in java loop through the group list
result = select permission from table where group = currentgroup
if result > max, max = result
Way 2:
max = select max(permission) from table where group in (group list)
I thought way 2 would be faster, but the group list can be very long and I don't know if it is a good idea to have a long list in a single SQL query.
From the information you've given, the second approach is by far the best. Databases are optimised for exactly these kinds of tasks, so within reason it's always best to narrow the data down in the database. The first approach means the database has to return all the values anyway, which increases processing time and bandwidth and uses up memory in your Java application.
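On the worry about a very long group list: Oracle rejects an IN list with more than 1000 expressions (ORA-01795), so one common workaround is to split the list into chunks, run way 2 once per chunk, and keep the largest result. The sketch below only builds the chunks and the parameterised SQL text; the class name and the table and column names (group_permissions, group_name, permission) are placeholders, not names taken from the question.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for running "way 2" against a long group list.
final class MaxPermissionSql {
    // Oracle allows at most 1000 expressions in a single IN list.
    static final int ORACLE_IN_LIMIT = 1000;

    // Splits the list into consecutive chunks of at most chunkSize elements.
    static <T> List<List<T>> chunks(List<T> items, int chunkSize) {
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < items.size(); i += chunkSize) {
            out.add(items.subList(i, Math.min(i + chunkSize, items.size())));
        }
        return out;
    }

    // Builds the query text with one "?" placeholder per group, for use
    // with a PreparedStatement. Table and column names are assumptions.
    static String maxQuery(int placeholders) {
        return "SELECT MAX(permission) FROM group_permissions WHERE group_name IN ("
                + String.join(", ", Collections.nCopies(placeholders, "?")) + ")";
    }

    public static void main(String[] args) {
        System.out.println(maxQuery(3));
    }
}
```

Each chunk would then be bound to a PreparedStatement created from maxQuery(chunk.size()), and the application keeps the maximum of the per-chunk results, which is still far fewer round trips than one query per group.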

Google App Engine Queries with JDOQL, how to just get a count?

I've used "select from X.class.getName()" to get all records of class X, but if there are a lot of records, it might take a long time to get the results.
I just want a count of how many records there are in the Datastore. What's the fastest query to get this number? Is there something like "select COUNT() X.class.getName()" that can return, for example, 234000 (the count of all records)?
See What's the best way to count results in GQL?
(The short answer is that you should store the number of objects and update it whenever you add or remove objects from the datastore.)

Implementing result paging in hibernate (getting total number of rows)

How do I implement paging in Hibernate? The Query object has methods called setMaxResults and setFirstResult, which are certainly helpful. But where can I get the total number of results, so that I can show a link to the last page of results and print things such as "results 200 to 250 of xxx"?
You can use Query.setMaxResults(int results) and Query.setFirstResult(int offset).
Editing too: There's no way to know how many results you'll get. So, first you must query with "select count(*)...". A little ugly, IMHO.
You must run a separate query to get the total count. If new records are added, or existing records start matching the criteria, between one paging request from the client and the next, you have to query the count again to reflect that. I usually do this in HQL like this:
long count = ((Number) session.createQuery("select count(*) from ....").uniqueResult()).longValue();
For Criteria queries I usually push my data into a DTO like this:
ScrollableResults scrollable = criteria.scroll(ScrollMode.SCROLL_INSENSITIVE);
if (scrollable.last()) { // returns true if there is a result set
    genericDTO.setTotalCount(scrollable.getRowNumber() + 1);
    criteria.setFirstResult(command.getStart())
            .setMaxResults(command.getLimit());
    genericDTO.setLineItems(Collections.unmodifiableList(criteria.list()));
}
scrollable.close();
return genericDTO;
You could perform two queries: a count(*) type query, which should be cheap if you are not joining too many tables together, and a second query that has the limits set. Then you know how many items exist but only grab the ones being viewed.
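Once the count query has returned a total, the remaining paging values are simple arithmetic. The sketch below is a hypothetical helper (PageInfo is not a Hibernate class) that derives the offset to pass to setFirstResult, the number of pages, and a "results X to Y of N" label from a total count, a page size, and a zero-based page index.

```java
// Hypothetical helper: paging arithmetic on top of a separately queried total.
final class PageInfo {
    private final long totalCount; // result of the count(*) query
    private final int pageSize;    // value passed to setMaxResults
    private final int pageIndex;   // zero-based page number

    PageInfo(long totalCount, int pageSize, int pageIndex) {
        if (totalCount < 0 || pageSize <= 0 || pageIndex < 0) {
            throw new IllegalArgumentException("invalid paging arguments");
        }
        this.totalCount = totalCount;
        this.pageSize = pageSize;
        this.pageIndex = pageIndex;
    }

    // Offset of the first row of this page, i.e. the setFirstResult argument.
    int firstResult() {
        return pageIndex * pageSize;
    }

    // Number of pages needed to show every row (rounds up).
    long totalPages() {
        return (totalCount + pageSize - 1) / pageSize;
    }

    // Human-readable range using 1-based row numbers.
    String label() {
        if (totalCount == 0) {
            return "no results";
        }
        long from = firstResult() + 1L;
        long to = Math.min((long) firstResult() + pageSize, totalCount);
        return "results " + from + " to " + to + " of " + totalCount;
    }

    public static void main(String[] args) {
        PageInfo page = new PageInfo(1234, 50, 4);
        System.out.println(page.label() + " (page " + (4 + 1) + " of " + page.totalPages() + ")");
    }
}
```

The link to the last page is then the page with index totalPages() - 1, as long as the total is recomputed whenever the underlying data may have changed.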
You can do one thing: prepare the Criteria query as your business requirements dictate, with all predicates, sorting, searching, etc.,
and then do as below:
CriteriaBuilder criteriaBuilder = em.getCriteriaBuilder();
CriteriaQuery<Feedback> criteriaQuery = criteriaBuilder.createQuery(Feedback.class);
Root<Feedback> root = criteriaQuery.from(Feedback.class);
// Prepare all your predicates as your business needs require, e.g.:
Predicate yourPredicateAsPerYourBusinessNeed = criteriaBuilder.equal(root.get("applicationName"), applicationName);
criteriaQuery.where(yourPredicateAsPerYourBusinessNeed).distinct(true);
TypedQuery<Feedback> criteriaQueryWithPredicate = em.createQuery(criteriaQuery);
// Get the total count here (note: this streams every matching row in order to count it)
Long totalCount = criteriaQueryWithPredicate.getResultStream().distinct().count();
Now we have the query and the total count.
So now we can apply pagination to that query, as below:
List<Feedback> feedbackList = criteriaQueryWithPredicate.setFirstResult(offset).setMaxResults(pageSize).getResultList();
Now you can prepare a wrapper containing the list returned by the DB along with the totalCount, the starting page number (the offset in this case), the page size, etc., and return it to your service / controller class.
I am 101% sure this will solve your problem, because I was facing the same problem and sorted it out the same way.
Thanks- Sunil Kumar Mali
You can just setMaxResults to the maximum number of rows you want returned. There is no harm in setting this value greater than the number of actual rows available. The problem with the other solutions is that they assume the ordering of records remains the same on each repeat of the query, and that no changes are happening between commands.
To avoid that if you really want to scroll through results, it is best to use the ScrollableResults. Don't throw this object away between paging, but use it to keep the records in the same order. To find out the number of records from the ScrollableResults, you can simply move to the last() position, and then get the row number. Remember to add 1 to this value, since row numbers start counting at 0.
I personally think you should handle the paging in the front-end. I know this isn't that efficient, but at least it would be less error-prone.
If you would use the count(*) thing what would happen if records get deleted from the table in between requests for a certain page? Lots of things could go wrong this way.
