What is the fastest way to sort - Java

Problem Description
I'm writing an Android application that works with a fairly large amount of data: I have a database (15 MB) and my application shows data from it. I have queries that get the data from the database already sorted, for example alphabetically or by some parameters that I provide.
Question
Since I store the data in an array and then show it to the user, I want to know which is the faster way to sort the data: while making the query, or by just putting the data in the array and then sorting it?

I also faced this situation in my application, and I resolved the performance problem in the following way.
First I created an index on my table based on the primary key.
Then I used ORDER BY to sort the elements.
To search locally, I kept the entire content in one object and performed the search on that object.
If you use these, you will surely improve performance; in my case the improvement was roughly 200%.
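For example, assuming the data lives in SQLite on the device (the dbHelper, items table, and name column below are my own placeholders, not from the question), the approach looks roughly like this:

SQLiteDatabase db = dbHelper.getReadableDatabase();

// An index on the sort column lets SQLite return rows already ordered
// (normally created once, e.g. in SQLiteOpenHelper.onCreate()).
db.execSQL("CREATE INDEX IF NOT EXISTS idx_items_name ON items(name)");

// Let the database do the sorting via ORDER BY.
Cursor c = db.rawQuery(
        "SELECT _id, name FROM items ORDER BY name COLLATE NOCASE", null);
List<String> names = new ArrayList<>(c.getCount());
while (c.moveToNext()) {
    names.add(c.getString(1)); // column index 1 is "name"
}
c.close();
// Subsequent local searches run against the in-memory "names" list.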

Related

Problem using indexOn on a deeply nested item

I have searched a lot on Stack Overflow and read many questions.
I had three indexOn problems; two of them are solved and one remains.
I am sorting the database and have indexOn rules on "favorite" and "poet", which run successfully, but I need one more indexOn for the numbers inside the heart node.
The query runs successfully, but I am getting an indexOn warning in Android Studio.
I have tried using variables in place of the numbers in the database rule, but I am still getting the warning:
Using an unspecified index. Your data will be downloaded and filtered on the client. Consider adding '".indexOn": "heart/+91916*******"' at gazal to your security and Firebase Database rules for better performance
queryFav = FirebaseDatabase.getInstance()
        .getReference(reference).orderByChild(child).equalTo("heart");
The above query runs successfully, but what should the indexOn rule be?
The message you get means you are running a query that orders by heart/+91916******* on a node named gazal. To run that query efficiently, you need an index on heart/+91916******* on that node in your security rules. But since the +91916******* part of that index is probably dynamic (i.e. you'll have a different value of +91916******* for every user of the app), you'd have to add an index for each user. That is not feasible.
In other words: your current data structure makes it easy to read the users who have hearted a specific poem. It does not, however, make it easy to determine the poems that a specific user has hearted. To make that equally easy, you'll want to add an additional data structure:
"user_hearts": {
"+91916*******": {
"-KjYiXzl1ancR8Pi3MfQ": true
}
}
With the above structure you can easily read the user_hearts node for the user you're interested in, and then read the poems one by one if needed.
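A minimal sketch of that read with the Android SDK (node names follow the structure above; currentUserKey stands in for whatever per-user key you store):

DatabaseReference userHearts = FirebaseDatabase.getInstance()
        .getReference("user_hearts")
        .child(currentUserKey);

userHearts.addListenerForSingleValueEvent(new ValueEventListener() {
    @Override
    public void onDataChange(DataSnapshot snapshot) {
        for (DataSnapshot hearted : snapshot.getChildren()) {
            String poemKey = hearted.getKey(); // e.g. "-KjYiXzl1ancR8Pi3MfQ"
            // fetch the poem itself from the main node if needed
        }
    }

    @Override
    public void onCancelled(DatabaseError error) { }
});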
Also see:
Firebase query if child of child contains a value
Firebase Realtime Database - index on location containing uid

Caching Data vs. Multiple Queries

Very broad & open question concerning performance & implementation here:
The Program
I've built a program that allows a user to import an Excel spreadsheet containing a username and email address. This spreadsheet can have up to 100,000 unique records.
The Requirement
The requirement of this program is to check for duplicates in the database in order to prevent saving the same user twice.
The Issue
The issue I anticipate running into is performance when it comes to checking for duplicates - I am looking for the fastest/most efficient method of validating unique users (based on the name and email address).
My first solution was to cache all existing members into a HashMap upon import; this way I can traverse the map and compare the records being uploaded one by one. The obvious pro here is a single database call. However, if my database has one million users stored, I assume this may crash or severely lag my application.
The second solution was to query the database for each record to see if the username/email already exists. I'm not sure this is desirable, because 50,000 users would mean 50,000 database calls - that doesn't sound too good to me.
Is there a preferred solution over the two listed above, or any aspect of this task I'm not taking into consideration here? (Batching, Database query patterns, etc).
Any input is appreciated, thank you!
Note: I'm using a SQL Server database (even though I'd like to be database agnostic, I'm open to any SQL recommendations).
If your database supports such a feature, you could use a MERGE statement or an INSERT IGNORE so that all duplicate records are silently discarded and you can skip testing whether the record already exists.
MERGE: https://en.wikipedia.org/wiki/Merge_%28SQL%29
MySQL INSERT IGNORE: https://dev.mysql.com/doc/refman/5.5/en/insert.html
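Since the asker is on SQL Server, here is a hedged sketch of that idea using MERGE together with JDBC batching, so 100,000 records don't mean 100,000 round trips (the users table and its columns are assumptions):

String merge =
    "MERGE INTO users AS t " +
    "USING (VALUES (?, ?)) AS s(username, email) " +
    "ON t.username = s.username AND t.email = s.email " +
    "WHEN NOT MATCHED THEN INSERT (username, email) " +
    "VALUES (s.username, s.email);";

try (Connection conn = DriverManager.getConnection(url, user, pass);
     PreparedStatement ps = conn.prepareStatement(merge)) {
    conn.setAutoCommit(false);
    for (String[] row : spreadsheetRows) { // {username, email} per record
        ps.setString(1, row[0]);
        ps.setString(2, row[1]);
        ps.addBatch();
    }
    ps.executeBatch(); // existing users are silently skipped by the MERGE
    conn.commit();
}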
Add a UNIQUE constraint to the email and username columns.
If you want to update on duplicates, use whatever upsert syntax your database supports.
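For instance (table and column names assumed; conn is an open JDBC connection):

try (Statement st = conn.createStatement()) {
    // Duplicate inserts now fail at the database level instead of
    // requiring a lookup per record.
    st.execute("ALTER TABLE users " +
               "ADD CONSTRAINT uq_users_username_email UNIQUE (username, email)");
}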

Result Set to Multi Hash Map

I have a situation here. I have a huge database with more than 10 columns and millions of rows. I am using a matching algorithm that matches each input record against the values in the database.
The database operation takes a lot of time when there are millions of records to match. I am thinking of using a multimap or some ResultSet alternative so that I can keep the whole table in memory and avoid hitting the database again.
Can anybody tell me what I should do?
I don't think this is the right way to go. You are trying to do the database's work manually in Java. I'm not saying that you are not capable of doing this, but most databases have been developed over many years and are quite good at doing exactly what you want.
However, databases need to be configured correctly for a given type of query to be executed fast. So my suggestion is that you first check whether you can tweak the database configuration to improve the performance of the query. The most common thing is to add the right indexes to your table. Read How MySQL Uses Indexes or the corresponding part of the manual of your particular database for more information.
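For example, if the matching algorithm filters on one column, a one-time index on that column (names here are my own placeholders) lets the database avoid a full table scan:

try (Statement st = conn.createStatement()) { // conn: an open JDBC connection
    st.execute("CREATE INDEX idx_records_match_key ON records (match_key)");
}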
The other thing is, with so much data, storing everything in main memory is probably not faster and might even be infeasible, not to mention that you would have to transfer all the data first.
In any case, try to use a profiler to identify the bottleneck of the program first. Maybe the problem is not even on the database side.

Best way to sort the data : DB Query or in Application Code

I have a MySQL table with some data (over a million rows). I have a requirement to sort the data based on the criteria below:
1) Newest
2) Oldest
3) Top rated
4) Least rated
What is the recommended way to implement the sort functionality?
1) For every sort request, execute a DB query with the required joins and ORDER BY conditions and return the sorted data.
2) Get all the data (unsorted) from the table and put it in a cache. Write custom comparators (Java) to sort the data.
I am leaning towards #2, as the DB is hit only once. Moreover, application code is better than a DB query.
Please share your thoughts....
Thanks,
Karthik
Do as much in the database as you can. Note that if you have 1,000,000 rows, returning all million is nearly useless. Are you going to display this on a web site? I think not. Do you really care about the 500,000th least popular post? Again, I think not.
So do the sorts in the database and return the top 100, 500, or 1000 rows.
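A sketch of that pattern (MySQL syntax; table and column names are assumptions):

String sql = "SELECT id, title, rating, created_at FROM posts " +
             "ORDER BY created_at DESC LIMIT 100";
try (Connection conn = DriverManager.getConnection(url, user, pass);
     Statement st = conn.createStatement();
     ResultSet rs = st.executeQuery(sql)) {
    while (rs.next()) {
        // render the row; the other ~999,900 rows never leave the database
    }
}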
It's much faster to do it in the database:
1) The database is optimized for I/O operations and can use indices and other DB optimizations to improve the response time.
2) Taking the data from the database to the application pulls all the data into memory, and the app will have to look at all of it to reorder it, without optimized algorithms.
3) The database only takes the minimum necessary data into memory, which can be much less than all the data that would have to be moved to Java.
4) You can always create extra indices on the database to improve query performance.
I would say that the operation will always be faster on the DB. You should ensure that caching is ON on the DB and working properly. Ensure that you are not using now() in your query, because it will disable the MySQL query cache. Take a look here at how the MySQL query cache works. In short: a query is cached based on its string, so if the query string differs every time you fetch, no cache is used.
AFAIK it should usually run faster if you let the DB sort your data.
And regarding application-level vs. DB-level code, I would agree in the case of stored procedures, but sorting in SELECTs is fine IMHO.
If you want to show the data to the user, also consider paging (in which case you're better off sorting at the DB level anyway).
Fetching a million rows from the database sounds like a terrible idea. It will generate a lot of network traffic and require quite some time to transfer all the data, not to mention the amount of memory you would need to allocate in your application for storing a million objects.
So if you can fetch only a subset with a query, do that. Overall, do as much filtering as you can in the database.
And I do not see any problem with ordering in a single query. You can always use UNION if you can't do it as one SELECT.
You do not have four tasks, you have two:
sorting by newest is the same as sorting by oldest (just reversed),
and
sorting by top rated is the same as sorting by least rated (just reversed).
So you only need to make two calls to the DB. Yes, sort in the DB. Then, instead of calling the DB to sort every time, do this (a sketch follows the list):
1) Track the timestamp of the latest record in the DB.
2) Before calling the DB to sort and retrieve the entire list, check whether that timestamp has changed.
3) If it has not changed, use the list you have in memory.
4) If it has changed, refresh the list.
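A minimal sketch of that check (the posts table, created_at column, Post class, and fetchSorted helper are all my own placeholders):

private List<Post> cachedNewest; // last sorted result
private Timestamp lastSeen;      // MAX(created_at) when it was fetched

List<Post> newestPosts(Connection conn) throws SQLException {
    Timestamp latest;
    try (Statement st = conn.createStatement();
         ResultSet rs = st.executeQuery("SELECT MAX(created_at) FROM posts")) {
        rs.next();
        latest = rs.getTimestamp(1);
    }
    if (cachedNewest == null || !latest.equals(lastSeen)) {
        cachedNewest = fetchSorted(conn); // SELECT ... ORDER BY created_at DESC
        lastSeen = latest;
    }
    return cachedNewest; // data unchanged: no second sort query needed
}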
I know this is an old thread, but it comes up in my search, so I'd like to post my opinion.
I'm a bit old school, but for that many rows I would consider dumping the data from your database (each RDBMS has its own method; for MySQL it looks like the mysqldump command: Link).
You can then process the dump with sorting algorithms or tools available in your Java libraries or operating system.
Be careful about the work you're asking your database to do. Remember that it has to be available to service other requests. Don't "bring it to its knees" servicing only one request, unless it's a nightly-batch-cycle type of scenario and you're certain it won't be asked to do anything else.

Is it faster to access a Java list (ArrayList) than to access the same data in a MySQL database?

I have the MySQL database on the local machine where I'm running the Java program.
I plan to create an ArrayList of all the entries of a particular table. From this point onwards I will not access the database to get a particular entry in the table; instead I will use the ArrayList I created. Is this going to be faster or slower than accessing the database to grab a particular entry?
Please note that the table I'm interested in has about 2 million entries.
Thank you.
More info: I need only two fields, one of type Long and one of type String. The index of the table is Long, not int.
No, it's going to be much slower, because to find an element in an ArrayList you have to scan it sequentially until your element is found.
It can be faster for a few hundred entries, because you don't have the connection overhead, but with two million entries MySQL is going to win, provided that you create the correct indexes. Only retrieve the rows that you actually need each time.
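That last point as a fragment (the entries table matches the asker's two fields; conn is an open JDBC connection):

String sql = "SELECT name FROM entries WHERE id = ?";
try (PreparedStatement ps = conn.prepareStatement(sql)) {
    ps.setLong(1, wantedId); // the Long key you are looking up
    try (ResultSet rs = ps.executeQuery()) {
        if (rs.next()) {
            String name = rs.getString("name");
            // with an index on id this is one fast lookup, not a scan
        }
    }
}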
Why are you thinking of doing this? Are you experiencing slow queries?
To find out, activate the slow query log in your my.cnf by uncommenting (or adding) the following lines:
# Here you can see queries with especially long duration
log_slow_queries = /var/log/mysql/mysql-slow.log
long_query_time = 1
Then see which queries take a long time, and run them with EXPLAIN in front; consider adding an index where EXPLAIN tells you no index is being used, or just post a new question with your CREATE TABLE statement and an example query to optimize.
This question is too vague and can easily go either way, depending on:
How many fields are in each record, and how big are they?
What kind of access are you going to perform? Text search? Sequential?
For example, if each record consists of a couple of bytes of data, it's much faster to store them all in memory (not necessarily in an ArrayList, though). You may want to put them into a TreeSet, for example.
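A toy sketch of the TreeSet idea (not tied to the asker's schema): the set stays sorted and membership tests are O(log n).

TreeSet<String> emails = new TreeSet<>();
emails.add("alice@example.com");
emails.add("bob@example.com");

boolean known = emails.contains("bob@example.com"); // O(log n) lookup
String first = emails.first();                      // smallest element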
It depends on what you will do with the data. If you just want a few rows, only those should be fetched from the DB. If you know that you need ALL the data, go ahead and load the whole table into Java if it fits in memory. What will you do with it afterwards? Sequential or random reading? Will the data be changed? A Map or Set could be a faster alternative, depending on how the collection will be used.
Whether it is faster or slower is measurable: time it. It is definitely faster to work with structures stored in memory than with data tables located on disk, provided you have enough memory and you don't have 20 users running the same process at the same time.
How do you access the data? Do you have an integer index?
First, accessing an ArrayList is much, much faster than accessing a database: accessing memory is much faster than accessing a hard disk.
If the number of entries in the array is big, and I guess it is, then you should consider using a "direct access" data structure such as a HashMap, which will act like a database table where values are referenced by their keys.
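Applied to the asker's two columns (a Long id and a String value; table and column names are my own placeholders, and conn is an open JDBC connection), the idea looks like this:

Map<Long, String> byId = new HashMap<>(2_000_000);
try (Statement st = conn.createStatement();
     ResultSet rs = st.executeQuery("SELECT id, name FROM entries")) {
    while (rs.next()) {
        byId.put(rs.getLong("id"), rs.getString("name"));
    }
}
String name = byId.get(someId); // O(1), vs. a linear scan of an ArrayList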
