Google App-Engine Java Group By and Count(*) queries - java

I am able to insert, filter, and order records, but I can't run a simple count.
Is there a way to get the total number of rows in a table?
And is there a way to use GROUP BY in a query?

You can see statistics about your data in the admin interface or by using DatastoreService, and there you can find the total count of rows ("entities of a kind").
There is no way to use GROUP BY; it is not supported by Google BigTable. See "Unsupported Features".
Before using a storage system like this, you need to read about NoSQL and understand how it works and why there can't be GROUP BY, JOIN, etc.

The simplest way is to use cursors to iterate over all records that match the query and count them, but this adds a lot of overhead if you have many entities.
A smarter way, if your filter is not too complicated, is to use sharded counters to keep a count for every possible filter value. This will also give you a headache if the filter has too many options.
Why is it so important to show this number to the user? That brings me to the old joke:
Q: How do you count 8 million rows?
A: Why does anyone need to know that there are 8M rows?
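
To make the sharded counter idea a bit more concrete, here is a minimal in-memory sketch. It is not App Engine code: the class name ShardedCounter is made up, and each array slot only stands in for what would be a separate shard entity in the datastore. The point is just the shape of the technique: writes go to one randomly chosen shard, and the count is the sum of all shards.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLongArray;

/** In-memory illustration only: each slot stands in for one shard entity. */
final class ShardedCounter {
    private final AtomicLongArray shards;

    ShardedCounter(int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        this.shards = new AtomicLongArray(shardCount);
    }

    /** Increments one randomly chosen shard so concurrent writers rarely hit the same slot. */
    void increment() {
        int index = ThreadLocalRandom.current().nextInt(shards.length());
        shards.incrementAndGet(index);
    }

    /** The total is the sum over all shards; there are only a few, so this stays cheap. */
    long total() {
        long sum = 0;
        for (int i = 0; i < shards.length(); i++) {
            sum += shards.get(i);
        }
        return sum;
    }
}
```

You would keep one such counter per filter value you want to count, which is exactly why it becomes painful when the filter has many options.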

Related

How to batch the fetches when pulling huge number of records from Oracle using JPA?

I have a Java application where we use Spring Data JPA to query our Oracle database. For one use case, I need to fetch all the records in a table. The table currently has about 400,000 records, and it might grow in the near future. I don't feel comfortable pulling all the records into the JVM, since we don't know how large they can be. So I want to configure the code to fetch a specific number of records at a time, say 50,000, and process them before moving on to the next 50,000. Is there a way to achieve this with JPA? I came across the JDBC property hibernate.jdbc.fetch_size, which can be used with Hibernate. What I am trying to understand is this: if I use repository.findAll() returning List<Entity>, how can a fetch size work, given that the List will hold all the entities? I was also looking into repository methods returning Stream<>, but I'm not sure if I have to use that. Please suggest a better solution for this use case if there is one.
Thanks
With JPA you can use the pagination feature: you tell the repository how many results should be on one page (e.g. 50,000).
For more information, see https://www.baeldung.com/jpa-pagination
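
A rough sketch of the page-by-page loop, kept free of Spring so it runs on its own. PagedProcessor and the fetchPage function are names invented for this example; in a Spring Data JPA application, fetchPage would typically wrap something like repository.findAll(PageRequest.of(pageIndex, pageSize)).getContent(), with a stable sort order so that pages do not overlap.

```java
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Consumer;

final class PagedProcessor {
    /**
     * Calls fetchPage(pageIndex, pageSize) repeatedly and hands each page to the handler.
     * Stops at the first empty or short page. Returns the number of rows processed.
     */
    static <T> long processInPages(BiFunction<Integer, Integer, List<T>> fetchPage,
                                   int pageSize,
                                   Consumer<List<T>> handler) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        long processed = 0;
        int pageIndex = 0;
        while (true) {
            List<T> page = fetchPage.apply(pageIndex, pageSize);
            if (page.isEmpty()) {
                break; // nothing left to read
            }
            handler.accept(page); // only one page is held in memory at a time
            processed += page.size();
            if (page.size() < pageSize) {
                break; // a short page means this was the last one
            }
            pageIndex++;
        }
        return processed;
    }
}
```

Only one page of entities is referenced at a time, which is the property the question is after; whether 50,000 is a sensible page size depends on how large the rows are.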

How to handle efficient database connection and performance in java?

I have 5000 records as a search result, and for each product number I have to pull the related data. That means separating out the 5000 product numbers and sending them to the database to pull the data. Creating one query and hitting the database for each product number is not efficient.
I'm looking for ideas on how to handle this situation.
Note: I'm using Hibernate, Oracle, and Java.
You got that search result with some query, so it might be simpler to reuse that query with a join to retrieve the related data.
Instead of 5000 queries to get the result, you may use the IN clause.
You should probably split it into chunks, however, since such long SQL queries can throw errors, or use a temporary table and do a JOIN. Take a look at this.
Maybe you could use a materialized view and some basic paging? http://docs.oracle.com/cd/A97630_01/server.920/a96567/repmview.htm
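
A small sketch of the chunked IN clause approach. Oracle rejects IN lists with more than 1000 expressions (ORA-01795), so 1000 is a natural upper bound for the chunk size. The table and column names (product_details, product_number) and the class name are assumptions for the example only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class InClauseChunks {
    /** Splits values into consecutive lists of at most chunkSize elements. */
    static <T> List<List<T>> partition(List<T> values, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < values.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, values.size());
            chunks.add(new ArrayList<>(values.subList(start, end)));
        }
        return chunks;
    }

    /** Builds a parameterized query with one '?' per value, to be bound via PreparedStatement. */
    static String buildQuery(int placeholderCount) {
        if (placeholderCount <= 0) {
            throw new IllegalArgumentException("placeholderCount must be positive");
        }
        String placeholders = String.join(", ", Collections.nCopies(placeholderCount, "?"));
        // Hypothetical table and column names.
        return "SELECT * FROM product_details WHERE product_number IN (" + placeholders + ")";
    }
}
```

With 5000 product numbers and a chunk size of 1000 this comes to 5 round trips instead of 5000. With Hibernate the same partitioning works with a named list parameter per chunk.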

How to use Bulk API with WHERE clause in Salesforce

I want to use Bulk API of Salesforce to run queries of this format.
Select Id from Object where field='<value>'.
I have thousands of such field values and want to retrieve the Ids of those objects. AFAIK, a Salesforce Bulk query supports only one SOQL statement as input.
One option could be to form a query like
Select Id, field from Object where field in (<all field values>)
but the problem is that SOQL has a 10,000-character limit.
Any suggestions here?
Thanks
It seems like you are attempting to perform some kind of search query. If so you might look into using a SOSL query as opposed to SOQL as long as the fields you are searching are indexed by SFDC.
Otherwise, I agree with Born2BeMild. Your second approach is better and breaking up your list of values into batches would help get around the limits.
It would also help if you described your use case in a bit more detail. Queries on a dynamic set of fields and values typically don't yield the best performance, even with the Bulk API. You are almost better off downloading the data to a local database and exploring the data that way.
You could break those down into batches of 200 or so values and iteratively query Salesforce to build up a result set in memory or process subsets of the data.
You would have to check the governor limits for the maximum number of SOQL queries though. You should be able to track your usage via the API at runtime to avoid going over the maximum.
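
One way to do the batching is to fill each IN list until the query would exceed the character limit from the question, then start a new query. This is only a sketch: SoqlBatcher is a made-up helper, the prefix is whatever your SELECT ... WHERE field IN ( part is, and the limit is passed in as a parameter rather than hard-coded. Quotes and backslashes in values are escaped with a backslash.

```java
import java.util.ArrayList;
import java.util.List;

final class SoqlBatcher {
    /** Escapes backslashes and single quotes for use inside a quoted string literal. */
    static String escape(String value) {
        return value.replace("\\", "\\\\").replace("'", "\\'");
    }

    /**
     * Builds queries of the form prefix + 'v1','v2',... + ")" such that
     * no query is longer than maxLength characters.
     */
    static List<String> buildQueries(String prefix, List<String> values, int maxLength) {
        List<String> queries = new ArrayList<>();
        StringBuilder inList = new StringBuilder();
        for (String value : values) {
            String literal = "'" + escape(value) + "'";
            // prefix + literal + closing parenthesis must fit on its own
            if (prefix.length() + literal.length() + 1 > maxLength) {
                throw new IllegalArgumentException("value does not fit within maxLength: " + value);
            }
            // length if we appended a comma, this literal, and the closing parenthesis
            int lengthIfAdded = prefix.length() + inList.length() + 1 + literal.length() + 1;
            if (inList.length() > 0 && lengthIfAdded > maxLength) {
                queries.add(prefix + inList + ")");
                inList.setLength(0);
            }
            if (inList.length() > 0) {
                inList.append(',');
            }
            inList.append(literal);
        }
        if (inList.length() > 0) {
            queries.add(prefix + inList + ")");
        }
        return queries;
    }
}
```

Each resulting string can then be submitted as its own query, and the Ids merged on the client side.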
The problem is that you are hitting the governor limits. Salesforce can only process 200 records at a time if they are coming from a database. Therefore, to work with all these records, you first need to add them all to a list, for example:
List<Account> accounts = [SELECT Id, Name FROM Account];
Then you can work with the list accounts and do everything you need to do with it. When you are done, you can update the database using:
update accounts;
This link might be helpful:
https://help.salesforce.com/apex/HTViewSolution?id=000004410&language=en_US

Best way to sort the data : DB Query or in Application Code

I have a MySQL table with some data (more than a million rows). I have a requirement to sort the data based on the criteria below:
1) Newest
2) Oldest
3) Top rated
4) Least rated
What is the recommended way to develop the sort functionality?
1) For every sort request, execute a DB query with the required joins and ORDER BY conditions and return the sorted data.
2) Get all the data (unsorted) from the table and put it in a cache. Write custom comparators (Java) to sort the data.
I am leaning towards #2, as the DB is loaded only once. Moreover, application code is better than a DB query.
Please share your thoughts.
Thanks,
Karthik
Do as much in the database as you can. Note that if you have 1,000,000 rows, returning all million is nearly useless. Are you going to display this on a web site? I think not. Do you really care about the 500,000th least popular post? Again, I think not.
So do the sorts in the database and return the top 100, 500, or 1000 rows.
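
As a sketch of what "sort in the database and return the top N" can look like from Java: map each of the four sort options to a fixed ORDER BY clause and always add a LIMIT. The table and column names (posts, created_at, rating) are assumptions, since the question does not give a schema.

```java
/** The four sort options from the question, each mapped to a fixed ORDER BY clause. */
enum SortOption {
    NEWEST("created_at DESC"),
    OLDEST("created_at ASC"),
    TOP_RATED("rating DESC"),
    LEAST_RATED("rating ASC");

    private final String orderBy;

    SortOption(String orderBy) {
        this.orderBy = orderBy;
    }

    String orderBy() {
        return orderBy;
    }
}

final class SortedQueryBuilder {
    /** Builds a query that lets MySQL sort and return only the first `limit` rows. */
    static String build(SortOption option, int limit) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        // Hypothetical table and columns; the ORDER BY text comes only from the enum,
        // never from user input, so it cannot be used for SQL injection.
        return "SELECT id, title, rating, created_at FROM posts ORDER BY "
                + option.orderBy() + " LIMIT " + limit;
    }
}
```

With an index on each sort column, the database can usually read the first rows straight off the index instead of sorting the whole table.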
It's much faster to do it in the database:
1) the database is optimized for I/O operations, and can use indices, and other DB optimizations to improve the response time
2) taking the data from the database to the application will get all data into memory. The app will have to look at all the data to reorder it, without optimized algorithms
3) the database only takes the minimum necessary data into memory, which can be much less than all the data which has to be moved to Java
4) you can always create extra indices on the database to improve the query performance.
I would say that doing the operation in the DB will always be faster. You should ensure that caching in the DB is on and working properly. Make sure you are not using now() in your query, because it will disable the MySQL query cache. Take a look here at how the MySQL query cache works. Basically, a query is cached based on its string, so if the query string differs every time you fetch, the cache is not used.
AFAIK usually it should run faster if you let the DB sort your data.
And regarding code on application level vs db level I would agree in the case of stored procedures but sorting in SELECTs is fine IMHO.
If you want to show the data to the user also consider paging (in which case you're better off with sorting on the db level anyway).
Fetching a million rows from the database sounds like a terrible idea. It will generate a lot of network traffic and take quite some time to transfer all the data, not to mention the amount of memory you would need to allocate in your application to store a million objects.
So if you can fetch only a subset with a query, do that. Overall, do as much filtering as you can in the database.
And I do not see any problem with ordering in a single query. You can always use UNION if you can't do it as one SELECT.
You do not have four tasks, you have two:
sorting by newest is the same as sorting by oldest, in reverse order
AND
sorting by top rated is the same as sorting by least rated, in reverse order.
So you need to make two calls to the DB. Yes, sort in the DB. Then, instead of calling the DB to sort every time, do this:
1] track the timestamp of the latest record in the db
2] before calling to sort and retrieve entire list, check if date has changed
3] if date has not changed, use the list you have in memory
4] if date has changed, update the list
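
The four steps above can be sketched roughly as follows. TimestampCheckedCache is an invented name; latestTimestamp stands for a cheap query such as a MAX over a timestamp column, and sortedLoader for the expensive sorted query. Note that a "latest record" timestamp only notices new rows; to catch rating changes or deletes you would need to track a last-modified value instead.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;
import java.util.function.Supplier;

final class TimestampCheckedCache<T> {
    private final Supplier<Long> latestTimestamp;  // cheap check against the DB
    private final Supplier<List<T>> sortedLoader;  // expensive sorted query
    private Long cachedTimestamp;
    private List<T> cachedRows;

    TimestampCheckedCache(Supplier<Long> latestTimestamp, Supplier<List<T>> sortedLoader) {
        this.latestTimestamp = latestTimestamp;
        this.sortedLoader = sortedLoader;
    }

    /** Returns the cached list, reloading it only when the timestamp has changed. */
    synchronized List<T> get() {
        Long current = latestTimestamp.get();
        if (cachedRows == null || !Objects.equals(current, cachedTimestamp)) {
            cachedRows = Collections.unmodifiableList(new ArrayList<>(sortedLoader.get()));
            cachedTimestamp = current;
        }
        return cachedRows;
    }
}
```

You would keep one instance per sort order, so two instances cover the two DB calls; the reverse orders can be served by iterating the cached list backwards.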
I know this is an old thread, but it comes up in my search, so I'd like to post my opinion.
I'm a bit old school, but for that many rows, I would consider dumping the data from your database (each RDBMS has its own method; for MySQL it looks like the mysqldump command).
You can then process this with sorting algorithms or tools that are available in your Java libraries or operating system.
Be careful about the work you're asking your database to do. Remember that it has to be available to service other requests. Don't "bring it to its knees" servicing only one request, unless it's a nightly batch cycle type of scenario and you're certain it won't be asked to do anything else.

How to Iterate across records in a MySql Database using Java

I have a customer with a very small set of data and records that I'd normally just serialize to a data file and be done but they want to run extra reports and have expandability down the road to do things their own way. The MySQL database came up and so I'm adapting their Java POS (point of sale) system to work with it.
I've done this before and here was my approach in a nutshell for one of the tables, say Customers:
I set up a loop to store the primary keys in an ArrayList, then set up a form to go from one record to the next, running SQL queries based on the PK. The query would pull down the fname, lname, address, etc. and fill in the fields on the screen.
I thought it might be a little clunky running a SQL query each time they click Next, so I'm looking for another approach to this problem. Any help is appreciated! I don't need exact code or anything, just some concepts will do fine.
Thanks!
I would say the solution you suggest yourself is not very good not only because you run SQL query every time a button is pressed, but also because you are iterating over primary keys, which probably are not sorted in any meaningful order...
What you want is to retrieve a certain number of records which are sorted sensibly (by first/last name or something) and keep them as a kind of cache in your ArrayList or something similar... This can be done quite easily with SQL. When the user starts iterating over the results by pressing "Next", you can in the background start loading more records.
The key to keeping it usable is to load some records before the user actually requests them, so that latency stays small, while keeping in mind that you also don't want to load the whole database at once.
Take a look at indexing your database. http://www.informit.com/articles/article.aspx?p=377652
Use JPA with the built in Hibernate provider. If you are not familiar with one or both, then download NetBeans - it includes a very easy to follow tutorial you can use to get up to speed. Managing lists of objects is trivial with the new JPA and you won't find yourself reinventing the wheel.
The key concept here is pagination.
Let's say you set your page size to 10. This means you select 10 records from the database, in a certain order, so your query should have an ORDER BY clause and a LIMIT clause at the end. You use this result set to display the form while the user navigates with the Previous/Next buttons.
When the user navigates out of the page, you fetch another page.
https://www.google.com/search?q=java+sql+pagination
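
A minimal sketch of the Previous/Next bookkeeping, assuming a customers table with the fname, lname and address columns from the question plus an id primary key (the table name, id column and class name are guesses). It only builds the MySQL query for the current page; running it and filling the form is left out.

```java
final class CustomerPager {
    private final int pageSize;
    private long pageIndex = 0;

    CustomerPager(int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        this.pageSize = pageSize;
    }

    /** Query for the current page: a stable ORDER BY plus LIMIT and OFFSET. */
    String currentQuery() {
        long offset = pageIndex * pageSize;
        // Hypothetical table name; id is added to the ORDER BY so ties sort consistently.
        return "SELECT id, fname, lname, address FROM customers"
                + " ORDER BY lname, fname, id"
                + " LIMIT " + pageSize + " OFFSET " + offset;
    }

    /** Moves to the next page; the caller stops when a page comes back short or empty. */
    void next() {
        pageIndex++;
    }

    /** Moves to the previous page, never going before the first one. */
    void previous() {
        if (pageIndex > 0) {
            pageIndex--;
        }
    }
}
```

The rows of the current page can be kept in an ArrayList and stepped through in memory, so the database is only queried when the user crosses a page boundary.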
