I have 2 tables as follows:
ParentUpdate
name varchar(255)
value int not null
primary key: name
ChildUpdate
parentName varchar(255)
name varchar(255)
value int
data varchar(1000)
primary key: name
foreign key: parentName to ParentUpdate.name
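Roughly, the setup as a JDBC sketch (not my exact code: Derby syntax is assumed, since the plan output below looks like Derby's RUNTIMESTATISTICS, and the TESTINDEX index on parentName is inferred from the slow plan; populating the 2500 rows is omitted):
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class ReproSketch {
    public static void main(String[] args) throws Exception {
        // In-memory Derby database; assumes the Derby jars are on the classpath.
        try (Connection c = DriverManager.getConnection("jdbc:derby:memory:repro;create=true");
             Statement s = c.createStatement()) {
            s.execute("CREATE TABLE ParentUpdate (name VARCHAR(255) PRIMARY KEY, value INT NOT NULL)");
            s.execute("CREATE TABLE ChildUpdate (parentName VARCHAR(255), name VARCHAR(255) PRIMARY KEY, "
                    + "value INT, data VARCHAR(1000), "
                    + "FOREIGN KEY (parentName) REFERENCES ParentUpdate (name))");
            s.execute("CREATE INDEX TESTINDEX ON ChildUpdate (parentName)"); // inferred from the slow plan
            // ... insert 'Parent 1'/'Parent 2' and the 2500 ChildUpdate rows here ...
            int updated = s.executeUpdate(
                    "update ChildUpdate set parentName = 'Parent 2' where parentName = 'Parent 1'");
            System.out.println("rows updated: " + updated);
        }
    }
}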
When I run the statement "update ChildUpdate set parentName = 'Parent 2' where parentName = 'Parent 1'" against 2500 records in the ChildUpdate table and 1 record in the ParentUpdate table, a single byte of difference in the size of the ChildUpdate data column makes the statement 15 times slower.
When the data column of each ChildUpdate row holds exactly 14 bytes of the same character, the query above runs in about 500 milliseconds. When I add one more byte to the data column, the runtime suddenly jumps to about 7500 milliseconds.
If I then shrink the data back from 15 bytes to 14, it is fast again; grow it back to 15 and it is slow again. This is reproducible every time.
Can you please help me figure out how to get the fast performance consistently, without this seemingly random behaviour?
The query plans are below for both cases.
Statement Name:
null
Statement Text:
update ChildUpdate set parentName = 'Parent 2' where parentName = 'Parent 1'
Parse Time: 16
Bind Time: 0
Optimize Time: 0
Generate Time: 0
Compile Time: 16
Execute Time: -1453199411351
Begin Compilation Timestamp : 2016-01-19 05:30:11.32
End Compilation Timestamp : 2016-01-19 05:30:11.336
Begin Execution Timestamp : 2016-01-19 05:30:11.351
End Execution Timestamp : 2016-01-19 05:30:11.773
Statement Execution Plan Text:
Update ResultSet using row locking:
deferred: false
Rows updated = 2500
Indexes updated = 2
Execute Time = -1453199411383
Normalize ResultSet:
Number of opens = 1
Rows seen = 2500
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 16
close time (milliseconds) = 16
optimizer estimated row count: 51.50
optimizer estimated cost: 796.12
Source result set:
Project-Restrict ResultSet (2):
Number of opens = 1
Rows seen = 2500
Rows filtered = 0
restriction = false
projection = true
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 16
close time (milliseconds) = 16
restriction time (milliseconds) = 0
projection time (milliseconds) = 0
optimizer estimated row count: 51.50
optimizer estimated cost: 796.12
Source result set:
Table Scan ResultSet for CHILDUPDATE at read committed isolation level using exclusive row locking chosen by the optimizer
Number of opens = 1
Rows seen = 2500
Rows filtered = 0
Fetch Size = 1
constructor time (milliseconds) = 0
open time (milliseconds) = 15
next time (milliseconds) = 16
close time (milliseconds) = 16
next time in milliseconds/row = 0
scan information:
Bit set of columns fetched={0, 1}
Number of columns fetched=2
Number of pages visited=41
Number of rows qualified=2500
Number of rows visited=2500
Scan type=heap
start position:
null
stop position:
null
qualifiers:
Column[0][0] Id: 0
Operator: =
Ordered nulls: false
Unknown return value: false
Negate comparison result: false
optimizer estimated row count: 51.50
optimizer estimated cost: 796.12
total time: ~500 milliseconds
And the slow version:
Statement Name:
null
Statement Text:
update ChildUpdate set parentName = 'Parent 2' where parentName = 'Parent 1'
Parse Time: 0
Bind Time: 0
Optimize Time: 0
Generate Time: 0
Compile Time: 0
Execute Time: -1453199485700
Begin Compilation Timestamp : 2016-01-19 05:31:25.684
End Compilation Timestamp : 2016-01-19 05:31:25.684
Begin Execution Timestamp : 2016-01-19 05:31:25.7
End Execution Timestamp : 2016-01-19 05:31:33.141
Statement Execution Plan Text:
Update ResultSet using row locking:
deferred: true
Rows updated = 2500
Indexes updated = 2
Execute Time = -1453199485747
Normalize ResultSet:
Number of opens = 1
Rows seen = 2500
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 47
close time (milliseconds) = 0
optimizer estimated row count: 51.50
optimizer estimated cost: 810.94
Source result set:
Project-Restrict ResultSet (3):
Number of opens = 1
Rows seen = 2500
Rows filtered = 0
restriction = false
projection = true
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 32
close time (milliseconds) = 0
restriction time (milliseconds) = 0
projection time (milliseconds) = 0
optimizer estimated row count: 51.50
optimizer estimated cost: 810.94
Source result set:
Project-Restrict ResultSet (2):
Number of opens = 1
Rows seen = 2500
Rows filtered = 0
restriction = false
projection = true
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 32
close time (milliseconds) = 0
restriction time (milliseconds) = 0
projection time (milliseconds) = 0
optimizer estimated row count: 51.50
optimizer estimated cost: 810.94
Source result set:
Index Scan ResultSet for CHILDUPDATE using index TESTINDEX at read committed isolation level using exclusive row locking chosen by the optimizer
Number of opens = 1
Rows seen = 2500
Rows filtered = 0
Fetch Size = 1
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 32
close time (milliseconds) = 0
next time in milliseconds/row = 0
scan information:
Bit set of columns fetched={0, 1, 2}
Number of columns fetched=3
Number of deleted rows visited=0
Number of pages visited=42
Number of rows qualified=2500
Number of rows visited=2500
Scan type=btree
Tree height=2
start position:
None
stop position:
None
qualifiers:
Column[0][0] Id: 1
Operator: =
Ordered nulls: false
Unknown return value: false
Negate comparison result: false
optimizer estimated row count: 51.50
optimizer estimated cost: 810.94
total time: ~7 seconds 500 milliseconds
I have to do a little assignment at my university:
I have a server that runs n independent services. All of these services started at the same time in the past, and every service i writes b[i] lines to a log file on the server every s[i] seconds. The input consists of l, the number of lines in the log file, and n, the number of services; the next n lines then give, for every service i, the period s[i] as mentioned and the number of lines b[i] that the service writes to the log file.
From the number of lines in the log file, I have to compute how long ago, in seconds, the programs all started running. Example:
input:
19 3
7 1
8 1
10 2
Output:
42
I have to use divide and conquer, but I can't even figure out how to split this into subproblems. I also have to use this function, where ss is the array of the periods of the services and bs holds the number of lines each service writes to the log file:
long linesAt(int t, int[] ss, int[] bs) {
    long out = 0;
    for (int i = 0; i < ss.length; i++) {
        // integer division floors, so this counts the full periods of service i
        out += bs[i] * (long) (t / ss[i]);
    }
    return out;
}
ss and bs are built from the input; for the example they look like this, where the row above is the index of the array:
ss:
0 1 2
7 8 10
bs:
0 1 2
1 1 2
It is easy to see that 42 should be the output:
linesAt(42) = floor(42/7)*1 + floor(42/8)*1 + floor(42/10)*2 = 6 + 5 + 8 = 19
Now I have to write a function
int solve(long l, int[] ss, int[] bs)
I already wrote some brute-force pseudocode, but I can't figure out how to solve this with the divide-and-conquer paradigm. My pseudocode looks like this:
Solve(l, ss, bs)
    t = 0
    while (linesAt(t, ss, bs) != l)
        t = t + 1
    end while
    return t
I think I have to split l in some way, so as to calculate the time for smaller lengths. But I don't really see how, because when you look at this table it doesn't seem to be possible:
t out
0..6 0
7 1
8 2
9 2
10 4
11..13 4
14 5
15 5
16 6
17..19 6
20 8
...
40 18
42 19
Chantal.
Sounds like a classic binary search would fit the bill, with a prior step to obtain a suitable maximum. You start with some estimate of the time t (say 100) and call linesAt to obtain the line count for that t. If the value returned is too small (i.e. smaller than l), you double t and try again, until the number of lines is too large.
At this point, your maximum is t and your minimum is t/2. You then repeatedly (see the sketch after this list):
pick t as the point halfway between maximum and minimum
call linesAt(t, ...) to obtain the number of lines
if you've found the target, stop
if you have too many lines, adjust the maximum: maximum = t
if you have too few lines, adjust the minimum: minimum = t
The above algorithm is a binary search: it splits the search space in half each iteration. Thus, it is an example of divide and conquer.
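A minimal Java sketch of the above, reusing the linesAt helper from the question (starting the doubling at 1, and returning the first t at which the log reaches l lines, are my choices, not part of the assignment):
static int solve(long l, int[] ss, int[] bs) {
    // Galloping phase: double hi until the log holds at least l lines at time hi.
    int hi = 1;
    while (linesAt(hi, ss, bs) < l) {
        hi *= 2; // assumes the answer fits in an int, as the assignment's signature implies
    }
    int lo = hi / 2; // invariant: linesAt(lo) < l (or lo == 0)
    // Binary search phase: find the smallest t with linesAt(t) >= l.
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (linesAt(mid, ss, bs) < l) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return lo; // for the example: solve(19, {7, 8, 10}, {1, 1, 2}) == 42
}
Each step halves the remaining interval, so the search costs O(n log T) array passes, where T is the answer.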
You are trying to solve an integer equation:
floor(n/7)*1+floor(n/8)*1+floor(n/10)*2 = 19
You can remove the floor function and solve for n and get a lower bound and upper bound, then search between these two bounds.
Solving the following equation:
(n/7)*1+(n/8)*1+(n/10)*2 = 19
n=19/(1/7+1/8+2/10)
Having found n, which range of values m0 is such that floor(m0 / 7) = floor(n / 7)?
floor(n/7) * 7 <= m0 <= (ceiling(n/7) * 7) - 1
In the same manner, calculate the ranges m1 and m2.
Take the largest upper end of the mi ranges as the upper bound and the smallest lower end as the lower bound, for i from 0 to 2.
A binary search at this point would probably be overkill.
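For completeness, here is this idea as a Java sketch. Instead of the per-term mi ranges above, it uses the slightly coarser bounds that follow from t*S - B < linesAt(t) <= t*S, where S = b[0]/s[0] + ... + b[n-1]/s[n-1] and B = b[0] + ... + b[n-1]; the resulting window is small, so a plain linear scan is enough:
static long solveByBounds(long l, int[] ss, int[] bs) {
    double S = 0; // lines written per second in the relaxed (no-floor) model
    long B = 0;   // sum of all b[i], bounding the error the floors introduce
    for (int i = 0; i < ss.length; i++) {
        S += (double) bs[i] / ss[i];
        B += bs[i];
    }
    // From t*S - B < linesAt(t) <= t*S: any exact solution lies in [l/S, (l+B)/S].
    long lower = (long) Math.floor(l / S);
    long upper = (long) Math.ceil((l + B) / S);
    for (long t = lower; t <= upper; t++) {
        if (linesAt((int) t, ss, bs) == l) {
            return t;
        }
    }
    return -1; // l is never the exact line count (it falls inside a jump)
}
For the example this scans only t = 40..50 and returns 42.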
I have n tasks, each with a specific deadline and a time it takes to complete. However, I cannot complete all tasks within their deadlines. I need to arrange these tasks so as to minimize how far any task overshoots its deadline. Consider this case (the left values are deadlines and the right values are the times the tasks take):
2 2
1 1
4 3
These three tasks can be done optimally like this:
time 1 : task 2 - task2 complete; 0 overshoot for task2
time 2 : task 1
time 3 : task 1 - task1 complete; 1 overshoot for task1
time 4 : task 3
time 5 : task 3
time 6 : task 3 - task3 complete; 2 overshoot for task3
I need a faster algorithm for this; my goal is to find the maximum of all the overshoots (in the above case it's 2). Right now I am sorting the tasks by deadline, but that is not fast, because whenever a new task is added I have to re-sort the whole list. Is there another way?
Following Lawrey's suggestion, I am now using a PriorityQueue, but it is not giving me an exact sort.
This is my code:
class Compare2DArray implements Comparator<int[]> {
public int compare(int a[], int b[]) {
for (int i = 0; i < a.length && i < b.length; i++)
if (a[i] != b[i])
return a[i] - b[i];
return a.length - b.length;
}
}
public class MyClass{
public static void main(String args[]) {
Scanner scan = new Scanner(System.in);
int numberOfInputs= scan.nextInt();
PriorityQueue<int[]> inputsList = new PriorityQueue<int[]>(numberOfInputs,new Compare2DArray());
for (int i = 0; i < numberOfInputs; i++) {
int[] input = new int[2];
input[0] = scan.nextInt();
input[1] = scan.nextInt();
inputsList.add(input);
}
}
But this sorts this queue of arrays
2 2
1 1
4 3
10 1
2 1
as
1 1
2 1
4 3
10 1
2 2
instead of
1 1
2 1
2 2
4 3
10 1
The same comparator works fine for sorting a List, so I don't understand what's wrong with the PriorityQueue.
A PriorityQueue is implemented as a heap. Hence, when you scan over the elements of a priority queue, there is no guarantee that you will see them in sorted order. That is why you are not getting the sorted sequence you expect.
I also faced the same problem on this question. I ended up using a multimap in C++, but even then the time complexity didn't improve much.
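To see the difference, compare iterating the queue with polling it; a minimal sketch reusing the Compare2DArray comparator from the question:
import java.util.Arrays;
import java.util.PriorityQueue;

public class PollDemo {
    public static void main(String[] args) {
        PriorityQueue<int[]> pq = new PriorityQueue<>(new Compare2DArray());
        for (int[] task : new int[][] {{2, 2}, {1, 1}, {4, 3}, {10, 1}, {2, 1}}) {
            pq.add(task);
        }
        // Iteration exposes the internal heap layout, e.g. [1,1] [2,1] [4,3] [10,1] [2,2]
        for (int[] task : pq)
            System.out.println("iterated: " + Arrays.toString(task));
        // Polling drains the queue in comparator order: [1,1] [2,1] [2,2] [4,3] [10,1]
        while (!pq.isEmpty())
            System.out.println("polled:   " + Arrays.toString(pq.poll()));
    }
}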
Unless you have a really long list of tasks, e.g. millions, it shouldn't be taking that long.
However, what you likely need is a PriorityQueue, which has O(log N) add and take.
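To connect this back to the original goal, here is a sketch of computing the maximum overshoot by polling the queue in deadline order. The earliest-deadline-first greedy is my assumption, not something stated in this thread, though it is the classic rule for minimizing maximum lateness:
// tasks is the PriorityQueue<int[]> of {deadline, duration} pairs built above,
// ordered by deadline first.
static long maxOvershoot(PriorityQueue<int[]> tasks) {
    long time = 0;  // running completion time
    long worst = 0; // largest (completion time - deadline) seen so far
    while (!tasks.isEmpty()) {
        int[] t = tasks.poll(); // earliest remaining deadline
        time += t[1];           // run the task to completion
        worst = Math.max(worst, time - t[0]);
    }
    return worst;
}
For the three tasks at the top of the question this returns 2.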
I was attempting the same question (it's from InterviewStreet, I suppose). Did you get this order:
1 1, 2 1, 4 3, 10 1, 2 2
when you printed the heap? Did you try popping items off the heap one by one and checking their order?
I ask because my implementation is in Python, and when I print the heap I get the same order you describe. But I don't think that is the point here: when I pop elements off the heap one by one, I get the proper order, that is:
1 1, 2 1, 2 2, 4 3, 10 1
Here is what my code in Python looks like (I am using the heapq library to implement the priority queue).
To add elements to the heap:
[deadline, minutes] = map( int, raw_input().split() )
heapq.heappush( heap, ( deadline, minutes ) )
To remove them from the heap:
d, m = heapq.heappop( heap )
Here is the output I get when I print the heap, followed by popping elements from the heap step by step:
Heap: [(1, 1), (2, 1), (4, 3), (10, 1), (2, 2)]
Job taken: 1 1
Job taken: 2 1
Job taken: 2 2
Job taken: 4 3
Job taken: 10 1
Hope that helps!
I want to get data from the database (MySQL) via JPA, and I want it sorted by some column value.
So, what is the best practice:
retrieve the data from the database as a list of objects (JPA), then sort it programmatically using some Java APIs,
OR
let the database sort it by using a sorting select query?
Thanks in advance
If you are retrieving a subset of all the database data, for example displaying 20 rows on screen out of 1000, it is better to sort in the database. This will be faster and easier, and it lets you retrieve one page of rows (20, 50, 100) at a time instead of all of them.
If your dataset is fairly small, sorting in your code may be more convenient if you want to implement a complex sort. Usually such a complex sort can be done in SQL too, but not as easily as in code.
The short of it is: the rule of thumb is to sort via SQL, with some edge cases to the rule.
In general, you're better off using ORDER BY in your SQL query -- this way, if there is an applicable index, you may be getting your sorting "for free" (worst case, it will be the same amount of work as doing it in your code, but often it may be less work than that!).
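A minimal sketch of both options (the Person entity, its lastName field, and the EntityManager are hypothetical illustrations, not from the question):
import java.util.Comparator;
import java.util.List;
import javax.persistence.EntityManager;

public class SortExamples {
    // Option 1: sort in the database via ORDER BY; an index on lastName can make this nearly free.
    static List<Person> sortInDatabase(EntityManager em) {
        return em.createQuery("SELECT p FROM Person p ORDER BY p.lastName", Person.class)
                 .getResultList();
    }

    // Option 2: fetch unsorted, then sort in Java; fine for small result sets
    // or for sort keys that are awkward to express in JPQL.
    static List<Person> sortInJava(EntityManager em) {
        List<Person> people =
                em.createQuery("SELECT p FROM Person p", Person.class).getResultList();
        people.sort(Comparator.comparing(Person::getLastName));
        return people;
    }
}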
I ran into this very same question, and decided that I should run a little benchmark to quantify the speed differences. The results surprised me. I would like to post my experience with this very sort of question.
As with a number of the other posters here, my thought was that the database layer would do the sort faster, because databases are supposedly tuned for this sort of thing. @Alex made a good point that if the database already has an index on the sort column, then it will be faster. I wanted to answer the question of which raw sort is faster for non-indexed sorts. Note, I said faster, not simpler: I think in many cases letting the db do the work is simpler and less error-prone.
My main assumption was that the sort would fit in main memory. Not all problems will fit there, but a good number do. For out-of-memory sorts, it may well be that databases shine, though I did not test that. For in-memory sorts, Java, C and C++ all outperformed MySQL in my informal benchmark, if one could call it that.
I wish I had had more time to more thoroughly compare the database layer vs application layer, but alas other duties called. Still, I couldn't help but record this note for others who are traveling down this road.
As I started down this path I started to see more hurdles. Should I compare data transfer? How? Can I compare the time to read from the DB against the time to read a flat file in Java? How do I isolate the sort time from the data-transfer time and from the time to read the records? With these questions, here are the methodology and the timing numbers I came up with.
All times are in ms unless otherwise posted
All sort routines were the defaults provided by the language (these are good enough for randomly ordered data)
All compilation was with a typical "release" profile selected via NetBeans, with no customization unless otherwise posted
All MySQL tests used the following schema:
mysql> CREATE TABLE test_1000000
(
pk bigint(11) NOT NULL,
float_value DOUBLE NULL,
bigint_value bigint(11) NULL,
PRIMARY KEY (pk )
) Engine MyISAM;
mysql> describe test_1000000;
+--------------+------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------+------------+------+-----+---------+-------+
| pk | bigint(11) | NO | PRI | NULL | |
| float_value | double | YES | | NULL | |
| bigint_value | bigint(11) | YES | | NULL | |
+--------------+------------+------+-----+---------+-------+
First, here is a little snippet to populate the DB. There may be easier ways, but this is what I did:
public static void BuildTable(Connection conn, String tableName, long iterations) {
    Random ran = new Random();
    Math.random();
    try {
        long epoch = System.currentTimeMillis();
        for (long i = 0; i < iterations; i++) {
            if (i % 100000 == 0) {
                System.out.println(i + " next 100k");
            }
            // PerformQuery (helper not shown) INSERTs one row of random values.
            PerformQuery(conn, tableName, i, ran.nextDouble(), ran.nextLong());
        }
    } catch (Exception e) {
        logger.error("Caught General Exception Error from main " + e);
    }
}
MYSQL Direct CLI results:
select * from test_10000000 order by bigint_value limit 10;
10 rows in set (2.32 sec)
These timings were somewhat difficult to take, as the only info I had was the time reported after the execution of the command.
From the mysql prompt, for 10000000 elements, it is roughly 2.1 to 2.4 seconds whether sorting by bigint_value or by float_value.
Java JDBC mysql call (similar performance to doing the sort from the mysql CLI):
public static void SortDatabaseViaMysql(Connection conn, String tableName) {
    try (Statement stmt = conn.createStatement()) {
        String cmd = "SELECT * FROM " + tableName + " order by float_value limit 100";
        ResultSet rs = stmt.executeQuery(cmd);
        rs.close();
    } catch (Exception e) {
        logger.error("Sort query failed: " + e);
    }
}
Five runs:
da=2379 ms
da=2361 ms
da=2443 ms
da=2453 ms
da=2362 ms
Java sort, generating the random numbers on the fly (interestingly, the sort itself ran a bit slower here than in the disk-read runs below). Assignment time is the time to generate the random numbers and populate the array.
Calling like
JavaSort(10,10000000);
Timing results:
assignment time 331 sort time 1139
assignment time 324 sort time 1037
assignment time 317 sort time 1028
assignment time 319 sort time 1026
assignment time 317 sort time 1018
assignment time 325 sort time 1025
assignment time 317 sort time 1024
assignment time 318 sort time 1054
assignment time 317 sort time 1024
assignment time 317 sort time 1017
These results were for reading a file of doubles in binary mode
assignment time 4661 sort time 1056
assignment time 4631 sort time 1024
assignment time 4733 sort time 1004
assignment time 4725 sort time 980
assignment time 4635 sort time 980
assignment time 4725 sort time 980
assignment time 4667 sort time 978
assignment time 4668 sort time 980
assignment time 4757 sort time 982
assignment time 4765 sort time 987
Doing a bulk buffer transfer instead results in much faster read times
assignment time 77 sort time 1192
assignment time 59 sort time 1125
assignment time 55 sort time 999
assignment time 55 sort time 1000
assignment time 56 sort time 999
assignment time 54 sort time 1010
assignment time 55 sort time 999
assignment time 56 sort time 1000
assignment time 55 sort time 1002
assignment time 56 sort time 1002
C and C++ Timing results (see below for source)
Debug profile using qsort
assignment 0 seconds 110 milliseconds Time taken 2 seconds 340 milliseconds
assignment 0 seconds 90 milliseconds Time taken 2 seconds 340 milliseconds
assignment 0 seconds 100 milliseconds Time taken 2 seconds 330 milliseconds
assignment 0 seconds 100 milliseconds Time taken 2 seconds 340 milliseconds
assignment 0 seconds 100 milliseconds Time taken 2 seconds 330 milliseconds
assignment 0 seconds 100 milliseconds Time taken 2 seconds 340 milliseconds
assignment 0 seconds 90 milliseconds Time taken 2 seconds 340 milliseconds
assignment 0 seconds 100 milliseconds Time taken 2 seconds 330 milliseconds
assignment 0 seconds 100 milliseconds Time taken 2 seconds 340 milliseconds
assignment 0 seconds 100 milliseconds Time taken 2 seconds 330 milliseconds
Release profile using qsort
assignment 0 seconds 100 milliseconds Time taken 1 seconds 600 milliseconds
assignment 0 seconds 90 milliseconds Time taken 1 seconds 600 milliseconds
assignment 0 seconds 90 milliseconds Time taken 1 seconds 580 milliseconds
assignment 0 seconds 90 milliseconds Time taken 1 seconds 590 milliseconds
assignment 0 seconds 80 milliseconds Time taken 1 seconds 590 milliseconds
assignment 0 seconds 90 milliseconds Time taken 1 seconds 590 milliseconds
assignment 0 seconds 90 milliseconds Time taken 1 seconds 600 milliseconds
assignment 0 seconds 90 milliseconds Time taken 1 seconds 590 milliseconds
assignment 0 seconds 90 milliseconds Time taken 1 seconds 600 milliseconds
assignment 0 seconds 90 milliseconds Time taken 1 seconds 580 milliseconds
Release profile Using std::sort( a, a + ARRAY_SIZE );
assignment 0 seconds 100 milliseconds Time taken 0 seconds 880 milliseconds
assignment 0 seconds 90 milliseconds Time taken 0 seconds 870 milliseconds
assignment 0 seconds 90 milliseconds Time taken 0 seconds 890 milliseconds
assignment 0 seconds 120 milliseconds Time taken 0 seconds 890 milliseconds
assignment 0 seconds 90 milliseconds Time taken 0 seconds 890 milliseconds
assignment 0 seconds 90 milliseconds Time taken 0 seconds 880 milliseconds
assignment 0 seconds 90 milliseconds Time taken 0 seconds 900 milliseconds
assignment 0 seconds 90 milliseconds Time taken 0 seconds 890 milliseconds
assignment 0 seconds 100 milliseconds Time taken 0 seconds 890 milliseconds
assignment 0 seconds 150 milliseconds Time taken 0 seconds 870 milliseconds
Release profile Reading random data from file and using std::sort( a, a + ARRAY_SIZE )
assignment 0 seconds 50 milliseconds Time taken 0 seconds 880 milliseconds
assignment 0 seconds 40 milliseconds Time taken 0 seconds 880 milliseconds
assignment 0 seconds 50 milliseconds Time taken 0 seconds 880 milliseconds
assignment 0 seconds 50 milliseconds Time taken 0 seconds 880 milliseconds
assignment 0 seconds 40 milliseconds Time taken 0 seconds 880 milliseconds
Below is the source code used. Hopefully it has minimal bugs :)
Java source
Note that internal to JavaSort, runCode and writeFlag need to be adjusted depending on what you want to time. Also note that the memory allocation happens in the for loop (thus testing GC, but I did not see any appreciable difference when moving the allocation outside the loop).
public static void JavaSort(int iterations, int numberElements) {
    Random ran = new Random();
    Math.random();
    int runCode = 2;           // 0 = generate on the fly, 1 = DataInputStream read, 2 = mapped buffer
    boolean writeFlag = false; // true = (re)write MyBinaryFile.txt from the generated data
    for (int j = 0; j < iterations; j++) {
        double[] a1 = new double[numberElements]; // allocated inside the loop on purpose (see note above)
        long timea = System.currentTimeMillis();
        if (runCode == 0) {
            // generate random numbers on the fly
            for (int i = 0; i < numberElements; i++) {
                a1[i] = ran.nextDouble();
            }
        } else if (runCode == 1) {
            // disk IO: read the doubles one at a time
            try (DataInputStream in = new DataInputStream(new FileInputStream("MyBinaryFile.txt"))) {
                int i = 0;
                while (i < numberElements) { // always read exactly numberElements doubles
                    a1[i++] = in.readDouble();
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        } else if (runCode == 2) {
            // bulk read through a memory-mapped buffer
            try (FileInputStream stream = new FileInputStream("MyBinaryFile.txt")) {
                FileChannel inChannel = stream.getChannel();
                ByteBuffer buffer = inChannel.map(FileChannel.MapMode.READ_ONLY, 0, inChannel.size());
                buffer.order(ByteOrder.BIG_ENDIAN);
                DoubleBuffer doubleBuffer = buffer.asDoubleBuffer();
                doubleBuffer.get(a1);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
        if (writeFlag) {
            // write the generated doubles out so the read runs have input
            try (DataOutputStream out = new DataOutputStream(new FileOutputStream("MyBinaryFile.txt"))) {
                for (int i = 0; i < numberElements; i++) {
                    out.writeDouble(a1[i]);
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
        long timeb = System.currentTimeMillis();
        Arrays.sort(a1);
        long timec = System.currentTimeMillis();
        System.out.println("assignment time " + (timeb - timea) + " sort time " + (timec - timeb));
    }
}
C/C++ source
#include <iostream>
#include <vector>
#include <algorithm>
#include <fstream>
#include <cstdlib>
#include <ctime>
#include <cstdio>
#include <cmath>
#define ARRAY_SIZE 10000000
using namespace std;
int compa(const void * elem1, const void * elem2) {
double f = *((double*) elem1);
double s = *((double*) elem2);
if (f > s) return 1;
if (f < s) return -1;
return 0;
}
int compb(const void *a, const void *b) {
    // compare the pointed-to doubles
    if (*(const double *) a < *(const double *) b) return -1;
    if (*(const double *) a > *(const double *) b) return 1;
    return 0;
}
void timing_testa(int iterations) {
clock_t start = clock(), diffa, diffb;
int msec;
bool writeFlag = false;
int runCode = 1;
for (int loopCounter = 0; loopCounter < iterations; loopCounter++) {
double *a = (double *) malloc(sizeof (double)*ARRAY_SIZE);
start = clock();
size_t bytes = sizeof (double)*ARRAY_SIZE;
if (runCode == 0) {
for (int i = 0; i < ARRAY_SIZE; i++) {
a[i] = rand() / (RAND_MAX + 1.0);
}
}
else if (runCode == 1) {
ifstream inlezen;
inlezen.open("test", ios::in | ios::binary);
inlezen.read(reinterpret_cast<char*> (&a[0]), bytes);
}
if (writeFlag) {
ofstream outf;
const char* pointer = reinterpret_cast<const char*>(&a[0]);
outf.open("test", ios::out | ios::binary);
outf.write(pointer, bytes);
outf.close();
}
diffa = clock() - start;
msec = diffa * 1000 / CLOCKS_PER_SEC;
printf("assignment %d seconds %d milliseconds\t", msec / 1000, msec % 1000);
start = clock();
//qsort(a, ARRAY_SIZE, sizeof (double), compa);
std::sort( a, a + ARRAY_SIZE );
//printf("%f %f %f\n",a[0],a[1000],a[ARRAY_SIZE-1]);
diffb = clock() - start;
msec = diffb * 1000 / CLOCKS_PER_SEC;
printf("Time taken %d seconds %d milliseconds\n", msec / 1000, msec % 1000);
free(a);
}
}
int main(int argc, char** argv) {
    printf("hello world\n");
    //srand(1); // fix the seed to make runs reproducible
    srand(time(NULL));
    timing_testa(5);
    return 0;
}
This is not completely on point, but I posted something recently that relates to database vs. application-side sorting. The article is about a .net technique, so most of it likely won't be interesting to you, but the basic principles remain:
Deferring sorting to the client side (e.g. jQuery, Dataset/Dataview sorting) may look tempting. And it actually is a viable option for paging, sorting and filtering, if (and only if):
1. the set of data is small, and
2. there is little concern about performance and scalability
From my experience, the systems that meet this kind of criteria are few and far between. Note that it's not possible to mix and match sorting/paging between the application and the database: if you ask the database for an unsorted 100 rows of data and then sort those rows on the application side, you're likely not going to get the set of data you were expecting. This may seem obvious, but I've seen the mistake made enough times that I wanted to at least mention it.
It is much more efficient to sort and filter in the database for a number of reasons. For one thing, database engines are highly optimized for doing exactly the kind of work that sorting and filtering entail; this is what their underlying code was designed to do. But even barring that—even assuming you could write code that could match the kind of sorting, filtering and paging performance of a mature database engine—it’s still preferable to do this work in the database, for the simple reason that it’s more efficient to limit the amount of data that is transferred from the database to the application server.
So for example, if you have 10,000 rows before filtering, and your query pares that number down to 75, filtering on the client results in the data from all 10,000 rows being passed over the wire (and into your app server's memory), whereas filtering on the database side would result in only the filtered 75 rows being moved between database and application. This can make a huge impact on performance and scalability.
The full post is here:
http://psandler.wordpress.com/2009/11/20/dynamic-search-objects-part-5sorting/
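To make the mix-and-match pitfall from the excerpt concrete, a hypothetical JPA sketch (Person, lastName and em are the same illustrative names as above):
// WRONG: page first (unsorted), then sort in the application. This sorts one
// arbitrary page of 100 rows, not the first 100 rows of the sorted data.
List<Person> page = em.createQuery("SELECT p FROM Person p", Person.class)
    .setMaxResults(100)
    .getResultList();
page.sort(Comparator.comparing(Person::getLastName));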
I'm almost positive that it will be faster to let the database sort it. There are engineers who spend a lot of time perfecting and optimizing their sorting algorithms, whereas you would have to implement your own sorting algorithm, which might add a few more computations.
I would let the database do the sort; databases are generally very good at that.
Let the database sort it. Then you can easily have paging with JPA, without reading in the whole resultset.
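For example (a sketch, reusing the hypothetical Person entity from above):
// Sorting and paging in one query: the database returns rows 40-59, already sorted.
List<Person> page = em.createQuery(
        "SELECT p FROM Person p ORDER BY p.lastName", Person.class)
    .setFirstResult(40) // skip the first 40 rows
    .setMaxResults(20)  // page size
    .getResultList();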
Well, there is not really a straightforward answer to this; it must be answered in context.
Is your application (middle tier) running on the same node as the database?
If yes, you do not have to worry about the latency between the database and the middle tier. Then the question becomes: how big is the resultset of your query? Remember that to sort in the middle tier, you will take a list/set of size N and either write a custom comparator or use the default Collection comparator. So at the outset, you are set back by the size N.
But if the answer is no, then you are also hit by the latency involved in transferring your resultset from the DB to the middle tier. And if you then perform pagination in the middle tier, which is the last thing you should do, you are throwing away 90-95% of that resultset after cutting the pages.
So the wasted bandwidth cannot be justified. Imagine doing this for every request, across all your tenant organizations.
However you look at it, this is bad design.
I would do this in the database, no matter what, because almost all applications today demand pagination. Even if they don't, sending massive resultsets over the wire to your client is a total waste, and it drags everybody down across all your tenants.
One interesting idea that I am toying with these days is to harness the power of HTML5 and two-way data binding in browser frameworks like Angular, and push some processing back to the browser. That way, you don't end up waiting in line for someone before you to finish. True distributed processing. But care must be taken in deciding what can be pushed and what cannot.
Depends on the context.
TL;DR
If you have the full data in your application server, do it in the application server.
If you already have the full dataset that you need on the application server side, then it is better to do it there, because those servers can scale horizontally. The most likely scenarios for this are:
the data set you're retrieving from the database is small
you cached the data on the application server side on startup
you're doing event sourcing and are building up the data on the application server side anyway.
Don't do it on the client side unless you can guarantee it won't impact the client devices.
Databases themselves may be optimized, but if you can pull burden away from them you can reduce your costs overall because scaling the databases up is more expensive than scaling up application servers.