No JIT Optimization

Have a look at this question:
The code:
class test
{
public static void main(String abc[])
{
for( int k=1; k<=3; k++)
{
for( int N=1; N<=1_000_000_000; N=N*10)
{
long t1 = System.nanoTime();
int j=1;
for(int i=0; i<=N; i++)
j=j*i;
long t2 = System.nanoTime() - t1;
System.out.println("Time taken for "+ N + " : "+ t2);
}
}
}
}
The output of above code:
Time taken for 1 : 2160
Time taken for 10 : 1142
Time taken for 100 : 2651
Time taken for 1000 : 19453
Time taken for 10000 : 407754
Time taken for 100000 : 4648124
Time taken for 1000000 : 12859417
Time taken for 10000000 : 13706643
Time taken for 100000000 : 136928177
Time taken for 1000000000 : 1368847843
Time taken for 1 : 264
Time taken for 10 : 233
Time taken for 100 : 332
Time taken for 1000 : 1562
Time taken for 10000 : 17341
Time taken for 100000 : 136869
Time taken for 1000000 : 1366934
Time taken for 10000000 : 13689017
Time taken for 100000000 : 136887869
Time taken for 1000000000 : 1368178175
Time taken for 1 : 231
Time taken for 10 : 242
Time taken for 100 : 328
Time taken for 1000 : 1551
Time taken for 10000 : 13854
Time taken for 100000 : 136850
Time taken for 1000000 : 1366919
Time taken for 10000000 : 13692465
Time taken for 100000000 : 136833634
Time taken for 1000000000 : 1368862705
In the loop, the value of i starts from 0, so the product is always zero, yet there is no JIT optimization. Why not?
In the question linked above, I had put the for loop in a method call, and the JIT optimized it away. Does putting the statements in a method help the optimization process?

In your previous question the JIT optimized away the complete code of the method start() without any analysis of which values happened to be in the variables when the method returned. This is because you chose to make your method void, giving the JIT a dead-easy clue that any values calculated will be discarded.
In your current example, unlike the one from your previous question, no void methods are called, so naturally that optimization does not occur. Why there is not some other optimization that would help this completely different case is an unanswerable question. There is just no such optimization in the specific JVM implementation, and the specific JVM invocation, with which you have tested your code.

The loop itself does get JIT-compiled (as shown by the slightly lower running times on the second and third executions). However, eliminating the entire loop is, AFAIK, only done when the method itself is executed multiple times, because only then does the JIT have sufficient runtime information to be sure it can eliminate the loop without consequence.
If I change your code, the loop is eliminated on the third invocation:
public class LoopJit2 {
public static void main(String abc[]) {
for (int x = 0; x < 3; x++) {
loopMethod();
}
}
private static void loopMethod() {
for (int N = 1; N <= 1_000_000_000; N = N * 10) {
long t1 = System.nanoTime();
int j = 1;
for (int i = 0; i <= N; i++)
j = j * i;
long t2 = System.nanoTime() - t1;
System.out.println("Time taken for " + N + " : " + t2);
}
}
}
Time series:
Time taken for 1 : 1466
Time taken for 10 : 1467
Time taken for 100 : 2934
Time taken for 1000 : 20044
Time taken for 10000 : 201422
Time taken for 100000 : 1993200
Time taken for 1000000 : 4038223
Time taken for 10000000 : 11182357
Time taken for 100000000 : 111290192
Time taken for 1000000000 : 1038002176
Time taken for 1 : 1466
Time taken for 10 : 1467
Time taken for 100 : 2934
Time taken for 1000 : 20044
Time taken for 10000 : 10755
Time taken for 100000 : 124667
Time taken for 1000000 : 1010045
Time taken for 10000000 : 10201156
Time taken for 100000000 : 103184413
Time taken for 1000000000 : 1019723107
Time taken for 1 : 978
Time taken for 10 : 1467
Time taken for 100 : 1467
Time taken for 1000 : 1955
Time taken for 10000 : 978
Time taken for 100000 : 489
Time taken for 1000000 : 977
Time taken for 10000000 : 977
Time taken for 100000000 : 978
Time taken for 1000000000 : 978

Related

My loop does not work for finding the System Time Millis in Eclipse Java

Why does my loop not work?
I'm trying to increment n by 5 and then output the time it took to run the linear method for each n:
for(n=5;n<=10;n=n+5) {
long startTime= System.currentTimeMillis();
System.out.println(startTime);
System.out.println("\nOddanaci(" + n +")\n" );
linear(n);
long endTime= System.currentTimeMillis();
long diff= endTime - startTime;
System.out.println("\n\nThe Total time it took to run this program is\n"+diff);
}
my output
Please enter the a non-negative value to find its Oddonacci sequence: 3
Here is the Oddonacci(3) sequence
1 1 1
The method has been called 1 times.
1537756698523
Oddanaci(5)
1 1 1 3 5
The Total time it took to run this program is
1
1537756698524
Oddanaci(10)
1 1 1 3 5 9 17 31 57 105
The Total time it took to run this program is
0
Why does it output as zero? Doesn't n iterate?

Java multiplication strange behaviour

Have a look at the code below:
class Test
{
public static void main(String abc[])
{
for( int N=1; N <= 1_000_000_000; N=N*10)
{
long t1 = System.nanoTime();
start(N);
long t2 = System.nanoTime() - t1;
System.out.println("Time taken for " + N + " : " + t2);
}
}
public static void start( int N )
{
int j=1;
for(int i=0; i<=N; i++)
j=j*i;
}
}
The output produced by the above code is:
Time taken for 1 : 7267
Time taken for 10 : 3312
Time taken for 100 : 7908
Time taken for 1000 : 51181
Time taken for 10000 : 432124
Time taken for 100000 : 4313696
Time taken for 1000000 : 9347132
Time taken for 10000000 : 858
Time taken for 100000000 : 658
Time taken for 1000000000 : 750
Questions:
1.) Why is the time taken for N=1 unusually greater than for N=10? (Sometimes it even exceeds N=100.)
2.) Why is the time taken for N=10M and onwards unusually lower?
The pattern described in the above questions is pronounced and remains even after many iterations.
Is there any connection to memoization here?
EDIT:
Thank you for your answers. I thought of replacing the method call with the actual loop. But now there is no JIT optimization. Why not? Does putting the statements in a method help the optimization process?
The modified code is below:
class test
{
public static void main(String abc[])
{
for( int k=1; k<=3; k++)
{
for( int N=1; N<=1_000_000_000; N=N*10)
{
long t1 = System.nanoTime();
int j=1;
for(int i=0; i<=N; i++)
j=j*i;
long t2 = System.nanoTime() - t1;
System.out.println("Time taken for "+ N + " : "+ t2);
}
}
}
}
EDIT 2:
The output of above modified code:
Time taken for 1 : 2160
Time taken for 10 : 1142
Time taken for 100 : 2651
Time taken for 1000 : 19453
Time taken for 10000 : 407754
Time taken for 100000 : 4648124
Time taken for 1000000 : 12859417
Time taken for 10000000 : 13706643
Time taken for 100000000 : 136928177
Time taken for 1000000000 : 1368847843
Time taken for 1 : 264
Time taken for 10 : 233
Time taken for 100 : 332
Time taken for 1000 : 1562
Time taken for 10000 : 17341
Time taken for 100000 : 136869
Time taken for 1000000 : 1366934
Time taken for 10000000 : 13689017
Time taken for 100000000 : 136887869
Time taken for 1000000000 : 1368178175
Time taken for 1 : 231
Time taken for 10 : 242
Time taken for 100 : 328
Time taken for 1000 : 1551
Time taken for 10000 : 13854
Time taken for 100000 : 136850
Time taken for 1000000 : 1366919
Time taken for 10000000 : 13692465
Time taken for 100000000 : 136833634
Time taken for 1000000000 : 1368862705

1.) Why is time taken for N=1 unusually greater than the N=10
Because it's the first time the VM has seen that code - it may decide to just interpret it, or it will take a little bit of time JITting it to native code, but probably without optimization. This is one of the "gotchas" of benchmarking Java.
2.) Why is time taken for N=10M and onwards unusually lower ?
At that point, the JIT has worked harder to optimize the code - reducing it to almost nothing.
In particular, if you run this code multiple times (just in a loop), you'll see the effect of the JIT compiler optimizing:
Time taken for 1 : 3732
Time taken for 10 : 1399
Time taken for 100 : 3266
Time taken for 1000 : 26591
Time taken for 10000 : 278508
Time taken for 100000 : 2496773
Time taken for 1000000 : 4745361
Time taken for 10000000 : 933
Time taken for 100000000 : 466
Time taken for 1000000000 : 933
Time taken for 1 : 933
Time taken for 10 : 467
Time taken for 100 : 466
Time taken for 1000 : 466
Time taken for 10000 : 933
Time taken for 100000 : 466
Time taken for 1000000 : 933
Time taken for 10000000 : 467
Time taken for 100000000 : 467
Time taken for 1000000000 : 466
Time taken for 1 : 467
Time taken for 10 : 467
Time taken for 100 : 466
Time taken for 1000 : 466
Time taken for 10000 : 466
Time taken for 100000 : 467
Time taken for 1000000 : 466
Time taken for 10000000 : 466
Time taken for 100000000 : 466
Time taken for 1000000000 : 466
As you can see, after the first pass the loop takes the same amount of time whatever the input (modulo noise: basically it's always either ~460ns or ~933ns, unpredictably), which means the JIT has optimized the loop out.
If you actually return j and change the initial value of i to 1 instead of 0, you'll see the kind of results you expect. The change of the initial value of i to 1 is needed because otherwise the JIT can spot that you'll always end up returning 0.

You're actually benchmarking Java's JIT. If I modify your code a bit:
class Test
{
public static void main(String abc[])
{
for( int N=1; N <= 1_000_000_000; N=N*10)
{
long t1 = System.nanoTime();
start(N);
long t2 = System.nanoTime() - t1;
System.out.println("Time taken for " + N + " : " + t2);
}
for( int N=1; N <= 1_000_000_000; N=N*10)
{
long t1 = System.nanoTime();
start(N);
long t2 = System.nanoTime() - t1;
System.out.println("Time taken for " + N + " : " + t2);
}
}
public static void start( int N )
{
int j=1;
for(int i=0; i<=N; i++)
j=j*i;
}
}
I get this:
Time taken for 1 : 1811
Time taken for 10 : 604
Time taken for 100 : 1510
Time taken for 1000 : 10565
Time taken for 10000 : 104439
Time taken for 100000 : 829173
Time taken for 1000000 : 604
Time taken for 10000000 : 302
Time taken for 100000000 : 0
Time taken for 1000000000 : 0
Time taken for 1 : 0
Time taken for 10 : 302
Time taken for 100 : 0
Time taken for 1000 : 302
Time taken for 10000 : 301
Time taken for 100000 : 302
Time taken for 1000000 : 0
Time taken for 10000000 : 0
Time taken for 100000000 : 0
Time taken for 1000000000 : 302
Never benchmark a "cold" system. Always repeat every measurement several times and discard the first few, because the optimizations have not yet kicked in.
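The advice above can be sketched as a small harness. This is only a minimal illustration, not a replacement for a real benchmarking tool such as JMH: the class name WarmBench, the method names, and the warm-up and run counts are all assumptions of mine, and the right number of warm-up runs depends on the JVM and its flags.

```java
import java.util.function.IntUnaryOperator;

final class WarmBench {
    // Workload modelled on the loop from the question, but starting at 1
    // and returning the result so the JIT cannot simply discard it.
    static int product(int n) {
        int j = 1;
        for (int i = 1; i <= n; i++) {
            j = j * i;
        }
        return j;
    }

    // Runs the task warmupRuns times without timing it, then returns
    // the timings (in nanoseconds) of measuredRuns further runs.
    static long[] measure(IntUnaryOperator task, int n, int warmupRuns, int measuredRuns) {
        int sink = 0; // accumulates results so they count as "used"
        for (int r = 0; r < warmupRuns; r++) {
            sink += task.applyAsInt(n);
        }
        long[] timings = new long[measuredRuns];
        for (int r = 0; r < measuredRuns; r++) {
            long t1 = System.nanoTime();
            sink += task.applyAsInt(n);
            timings[r] = System.nanoTime() - t1;
        }
        // Printing the sink makes the computed values observable.
        System.out.println("sink (ignore): " + sink);
        return timings;
    }

    public static void main(String[] args) {
        // 20 warm-up runs and 5 measured runs are arbitrary choices.
        long[] timings = measure(WarmBench::product, 1_000_000, 20, 5);
        for (long t : timings) {
            System.out.println("Time taken for 1000000 : " + t);
        }
    }
}
```

The warm-up results are thrown away on purpose; only the later runs say anything about the compiled code.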

The reason is that 1) you don't return the value, and 2) the result of the calculation is always 0. Eventually the JIT will simply compile the loop away.
You get your expected behaviour if you change your loop to:
public static int start(int N) {
int j = 1;
for (int i = 1; i <= N; i++)
j = j * i;
return j;
}
Note that I have both changed the loop init to int i = 1 and added return j. If I only do one of those, the loop will (eventually) still be compiled away.
This will produce the following series (if executed twice):
Time taken for 1 : 2934
Time taken for 10 : 1466
Time taken for 100 : 3422
Time taken for 1000 : 20534
Time taken for 10000 : 191644
Time taken for 100000 : 1898845
Time taken for 1000000 : 1210489
Time taken for 10000000 : 11884401
Time taken for 100000000 : 115257525
Time taken for 1000000000 : 1061254223
Time taken for 1 : 978
Time taken for 10 : 978
Time taken for 100 : 978
Time taken for 1000 : 2444
Time taken for 10000 : 11244
Time taken for 100000 : 103644
Time taken for 1000000 : 1030089
Time taken for 10000000 : 10448535
Time taken for 100000000 : 107299391
Time taken for 1000000000 : 1072580803

Empty speed test, unexpected result

Why might this code
long s, e, sum1 = 0, sum2 = 0, TRIALS = 10000000;
for(long i=0; i<TRIALS; i++) {
s = System.nanoTime();
e = System.nanoTime();
sum1 += e - s;
s = System.nanoTime();
e = System.nanoTime();
sum2 += e - s;
}
System.out.println(sum1 / TRIALS);
System.out.println(sum2 / TRIALS);
produce this result
-60
61
"on my machine?"
EDIT:
Sam I am's answer points to the nanoTime() documentation, which helps, but now, more precisely, why does the result consistently favor the first sum?
"my machine":
JavaSE-1.7, Eclipse
Win 7 x64, AMD Athlon II X4 635
Switching the order inside the loop produces the reverse results:
for(int i=0; i<TRIALS; i++) {
s = System.nanoTime();
e = System.nanoTime();
sum2 += e - s;
s = System.nanoTime();
e = System.nanoTime();
sum1 += e - s;
}
61
-61
Looking at (e - s) before adding it to sum1 makes sum1 positive:
for(long i=0; i<TRIALS; i++) {
s = System.nanoTime();
e = System.nanoTime();
temp = e-s;
if(temp < 0)
count++;
sum1 += temp;
s = System.nanoTime();
e = System.nanoTime();
sum2 += e - s;
}
61
61
And as Andrew Alcock points out, sum1 += -s + e produces the expected outcome.
for(long i=0; i<TRIALS; i++) {
s = System.nanoTime();
e = System.nanoTime();
sum1 += -s + e;
s = System.nanoTime();
e = System.nanoTime();
sum2 += -s + e;
}
61
61
A few other tests: http://pastebin.com/QJ93NZxP

This answer is supposition. If you update your question with some details about your environment, it's likely that someone else can give a more detailed, grounded answer.
The nanoTime() function works by accessing some high-resolution timer with low access latency. On the x86, I believe this is the Time Stamp Counter, which is driven by the basic clock cycle of the machine.
If you're seeing consistent results of +/- 60 ns, then I believe you're simply seeing the basic interval of the timer on your machine.
However, what about the negative numbers? Again, supposition, but if you read the Wikipedia article, you'll see a comment that Intel processors might re-order the instructions.
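One way to check the "basic interval of the timer" guess on a given machine is to record the smallest positive step between successive nanoTime() calls. The sketch below is only an illustration; the class and method names are hypothetical, and the value it reports includes the cost of the call itself, so it is an upper bound on the granularity rather than an exact figure.

```java
final class TimerGranularity {
    // Returns the smallest positive difference observed between
    // successive System.nanoTime() calls over the given number of samples.
    // Returns Long.MAX_VALUE if no positive difference was seen.
    static long smallestStep(int samples) {
        long smallest = Long.MAX_VALUE;
        long previous = System.nanoTime();
        for (int i = 0; i < samples; i++) {
            long now = System.nanoTime();
            long delta = now - previous;
            if (delta > 0 && delta < smallest) {
                smallest = delta;
            }
            previous = now;
        }
        return smallest;
    }

    public static void main(String[] args) {
        System.out.println("Smallest observed step: " + smallestStep(1_000_000) + " ns");
    }
}
```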

In conjunction with roundar, we ran a number of tests on this code. In summary, the effect disappeared when:
Running the same code in interpreted mode (-Xint)
Changing the aggregation logic order from sum += e - s to sum += -s + e
Running on some different architectures or different VMs (eg I ran on Java 6 on Mac)
Placing logging statements inspecting s and e
Performing additional arithmetic on s and e
In addition, the effect is not threading:
There are no additional threads spawned
Only local variables are involved
This effect is 100% reproducible in roundar's environment, and always results in precisely the same timings, namely +61 and -61.
The effect is not a timing issue because:
The execution takes place over 10m iterations
This effect is 100% reproducible in roundar's environment
The result is precisely the same timings, namely +61 and -61, on all iterations.
Given the above, I believe we have a bug in the HotSpot module of the Java VM. The code as written should return positive results, but does not.

Straight from Oracle's documentation:
In short: the frequency of updating the values can cause results to differ.
nanoTime
public static long nanoTime()
Returns the current value of the most precise available system timer, in nanoseconds.
This method can only be used to measure elapsed time and is not related to any other notion of system or wall-clock time.
The value returned represents nanoseconds since some fixed but arbitrary time
(perhaps in the future, so values may be negative). This method
provides nanosecond precision, but not necessarily nanosecond
accuracy. No guarantees are made about how frequently values change.
Differences in successive calls that span greater than approximately
292 years (2^63 nanoseconds) will not accurately compute elapsed time
due to numerical overflow.
For example, to measure how long some code takes to execute:
long startTime = System.nanoTime();
// ... the code being measured ...
long estimatedTime = System.nanoTime() - startTime;
Returns:
The current value of the system timer, in nanoseconds.
Since:
1.5
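The "approximately 292 years" figure in the quoted documentation follows from the range of a signed 64-bit nanosecond counter. As a rough check, the sketch below converts 2^63 nanoseconds to years, assuming a 365.25-day year (my choice, not the documentation's). It also shows a subtraction-based comparison of two nanoTime() values, which stays correct if the counter wraps; the class and method names are hypothetical.

```java
final class NanoTimeRange {
    // Converts 2^63 nanoseconds to years, assuming 365.25 days per year.
    static double maxSpanYears() {
        double nanos = Math.pow(2, 63);
        double seconds = nanos / 1_000_000_000.0;
        double secondsPerYear = 365.25 * 24 * 60 * 60;
        return seconds / secondsPerYear;
    }

    // True if t0 was taken before t1. Comparing via the difference
    // (t0 - t1 < 0) rather than t0 < t1 tolerates numerical overflow.
    static boolean isBefore(long t0, long t1) {
        return t0 - t1 < 0;
    }

    public static void main(String[] args) {
        System.out.println("2^63 ns is about " + maxSpanYears() + " years");
        long t0 = System.nanoTime();
        long t1 = System.nanoTime();
        System.out.println("t1 before t0? " + isBefore(t1, t0));
    }
}
```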

java short,integer,long performance

I read that JVM stores internally short, integer and long as 4 bytes. I read it from an article from the year 2000, so I don't know how true it is now.
For the newer JVMs, is there any performance gain in using short over integer/long? And has that part of the implementation changed since 2000?
Thanks

Integer types are stored in a number of bytes that depends on the exact type:
byte on 8 bits
short on 16 bits, signed
int on 32 bits, signed
long on 64 bits, signed
See the spec here.
As for performance, it depends on what you're doing with them.
For example, arithmetic on byte or short operands is carried out in int: integer literals are ints by default, and byte and short operands are promoted to int before the operation.
byte b = 10; // allowed: 10 is an int literal, but the constant fits in a byte and is narrowed implicitly
That's why you can't do :
byte b = 10;
b = b + 1; // Error, right member converted to int, cannot be reassigned to byte without a cast.
So, if you plan to use bytes or shorts to perform some looping, you won't gain anything.
for (byte b=0; b<10; b++)
{ ... }
On the other hand, if you're using arrays of bytes or shorts to store some data, you will obviously benefit from their reduced size.
byte[] bytes = new byte[1000];
int[] ints = new int[1000]; // 4X the size
So, my answer is : it depends :)
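The promotion to int described above can be seen in a small compilable sketch. The class and method names are hypothetical; the point is only that `b + 1` has type int, so it needs a cast, while the compound form `b += 1` carries an implicit cast back to byte.

```java
final class BytePromotion {
    // b + 1 is an int expression, so an explicit narrowing cast is needed.
    static byte incrementWithCast(byte b) {
        return (byte) (b + 1);
    }

    // Compound assignment includes an implicit cast back to byte.
    static byte incrementCompound(byte b) {
        b += 1;
        return b;
    }

    public static void main(String[] args) {
        byte b = 10; // accepted: the constant 10 fits in a byte
        System.out.println(incrementWithCast(b));               // prints 11
        System.out.println(incrementCompound(b));               // prints 11
        System.out.println(incrementCompound(Byte.MAX_VALUE));  // prints -128 (wraps around)
    }
}
```

Either way the addition itself happens in int, which is why a byte or short loop counter does not make the arithmetic any cheaper.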

Type   Bits  Range
long   64    -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807
int    32    -2,147,483,648 to 2,147,483,647
short  16    -32,768 to 32,767
byte   8     -128 to 127
Use what you need. I would think shorts are rarely used because of their small range, and the value is in big-endian format.
Any performance gain would be minimal, but as I said, if your application requires a range larger than that of a short, go with int. The long type may be far larger than you need, but again it all depends on your application.
You should only use short if you are concerned about space (memory); otherwise use int (in most cases). If you are creating arrays, try declaring them as both int and short: the short array will use half the space of the int array. But if you run tests for speed, you will see little to no difference (if you are dealing with arrays); the only thing you save is space.
Also, since a commenter mentioned long: a long is 64 bits, so you cannot store the full range of a long in 4 bytes (notice the range of long above).
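The space claim for arrays is simple arithmetic on the element widths. The sketch below computes only the element payload and deliberately ignores the array object header and alignment padding, which vary by JVM; the class and method names are hypothetical, and the BYTES constants assume Java 8 or later.

```java
final class ArrayPayload {
    // Payload of an array in bytes: element count times element width.
    // Excludes the object header and any alignment padding.
    static long payloadBytes(int length, int bytesPerElement) {
        return (long) length * bytesPerElement;
    }

    public static void main(String[] args) {
        int n = 1000;
        System.out.println("short[" + n + "]: " + payloadBytes(n, Short.BYTES) + " bytes");   // 2000
        System.out.println("int[" + n + "]:   " + payloadBytes(n, Integer.BYTES) + " bytes"); // 4000
        System.out.println("long[" + n + "]:  " + payloadBytes(n, Long.BYTES) + " bytes");    // 8000
    }
}
```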

It's an implementation detail, but it's still true that for performance reasons, most JVMs will use a full word (or more) for each variable, since CPUs access memory in word units. If the JVM stored the variables in sub-word units and locations, it would actually be slower.
This means that a 32-bit JVM will use 4 bytes for a short (and even a boolean), while a 64-bit JVM will use 8 bytes. However, the same is not true for array elements.

There's basically no difference. One has to "confuse" the JITC a bit so that it doesn't recognize that the increment/decrement operations are self-cancelling and that the results aren't used. Do that and the three cases come out about equal. (Actually, short seems to be a tiny bit faster.)
public class ShortTest {
public static void main(String[] args){
// Do the inner method 5 times to see how it changes as the JITC attempts to
// do further optimizations.
for (int i = 0; i < 5; i++) {
calculate(i);
}
}
public static void calculate(int passNum){
System.out.println("Pass " + passNum);
// Broke into two (nested) loop counters so the total number of iterations could
// be large enough to be seen on the clock. (Though this isn't as important when
// the JITC over-optimizations are prevented.)
int M = 100000;
int N = 100000;
java.util.Random r = new java.util.Random();
short x = (short) r.nextInt(1);
short y1 = (short) (x + 1);
int y2 = x + 1;
long y3 = x + 1;
long time1=System.currentTimeMillis();
short s=x;
for (int j = 0; j<M;j++) {
for(int i = 0; i<N;i++) {
s+=y1;
s-=1;
if (s > 100) {
System.out.println("Shouldn't be here");
}
}
}
long time2=System.currentTimeMillis();
System.out.println("Time elapsed for shorts: "+(time2-time1) + " (" + time1 + "," + time2 + ")");
long time3=System.currentTimeMillis();
int in=x;
for (int j = 0; j<M;j++) {
for(int i = 0; i<N;i++) {
in+=y2;
in-=1;
if (in > 100) {
System.out.println("Shouldn't be here");
}
}
}
long time4=System.currentTimeMillis();
System.out.println("Time elapsed for ints: "+(time4-time3) + " (" + time3 + "," + time4 + ")");
long time5=System.currentTimeMillis();
long l=x;
for (int j = 0; j<M;j++) {
for(int i = 0; i<N;i++) {
l+=y3;
l-=1;
if (l > 100) {
System.out.println("Shouldn't be here");
}
}
}
long time6=System.currentTimeMillis();
System.out.println("Time elapsed for longs: "+(time6-time5) + " (" + time5 + "," + time6 + ")");
System.out.println(s+in+l);
}
}
Results:
C:\JavaTools>java ShortTest
Pass 0
Time elapsed for shorts: 59119 (1422405830404,1422405889523)
Time elapsed for ints: 45810 (1422405889524,1422405935334)
Time elapsed for longs: 47840 (1422405935335,1422405983175)
0
Pass 1
Time elapsed for shorts: 58258 (1422405983176,1422406041434)
Time elapsed for ints: 45607 (1422406041435,1422406087042)
Time elapsed for longs: 46635 (1422406087043,1422406133678)
0
Pass 2
Time elapsed for shorts: 31822 (1422406133679,1422406165501)
Time elapsed for ints: 39663 (1422406165502,1422406205165)
Time elapsed for longs: 37232 (1422406205165,1422406242397)
0
Pass 3
Time elapsed for shorts: 30392 (1422406242398,1422406272790)
Time elapsed for ints: 37949 (1422406272791,1422406310740)
Time elapsed for longs: 37634 (1422406310741,1422406348375)
0
Pass 4
Time elapsed for shorts: 31303 (1422406348376,1422406379679)
Time elapsed for ints: 36583 (1422406379680,1422406416263)
Time elapsed for longs: 38730 (1422406416264,1422406454994)
0
C:\JavaTools>java -version
java version "1.7.0_65"
Java(TM) SE Runtime Environment (build 1.7.0_65-b19)
Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)

I agree with user2391480: calculations with shorts seem to be far more expensive. Here is an example where, on my machine (Java 7 64-bit, Intel i7-3770, Windows 7), operations with shorts are around 50 times slower than with integers and longs.
public class ShortTest {
public static void main(String[] args){
calculate();
calculate();
}
public static void calculate(){
int N = 100000000;
long time1=System.currentTimeMillis();
short s=0;
for(int i = 0; i<N;i++) {
s+=1;
s-=1;
}
long time2=System.currentTimeMillis();
System.out.println("Time elapsed for shorts: "+(time2-time1));
long time3=System.currentTimeMillis();
int in=0;
for(int i = 0; i<N;i++) {
in+=1;
in-=1;
}
long time4=System.currentTimeMillis();
System.out.println("Time elapsed for ints: "+(time4-time3));
long time5=System.currentTimeMillis();
long l=0;
for(int i = 0; i<N;i++) {
l+=1;
l-=1;
}
long time6=System.currentTimeMillis();
System.out.println("Time elapsed for longs: "+(time6-time5));
System.out.println(s+in+l);
}
}
Output:
Time elapsed for shorts: 113
Time elapsed for ints: 2
Time elapsed for longs: 2
0
Time elapsed for shorts: 119
Time elapsed for ints: 2
Time elapsed for longs: 2
0
Note: specifying "1" to be a short (in order to avoid casting every time, as suggested by user Robotnik as a source of delay) does not seem to help, e.g.
short s=0;
short one = (short)1;
for(int i = 0; i<N;i++) {
s+=one;
s-=one;
}
EDIT: modified as per request of user Hot Licks in the comment, in order to invoke the calculate() method more than once outside the main method.

Calculations with a short type are extremely expensive.
Take the following useless loop for example:
short t=0;
//int t=0;
//long t=0;
for(many many times...)
{
t+=1;
t-=1;
}
If it is a short, it will take literally 1000s of times longer than if it's an int or a long.
Checked on 64-bit JVMs versions 6/7 on Linux

database sort vs. programmatic java sort

I want to get data from the database (MySQL) via JPA, and I want it sorted by some column value.
So, which is the best practice:
Retrieve the data from the database as list of objects (JPA), then
sort it programmatically using some java APIs.
OR
Let the database sort it by using a sorting select query.
Thanks in advance

If you are retrieving a subset of all the database data, for example displaying 20 rows on screen out of 1000, it is better to sort on the database. This will be faster and easier and will allow you to retrieve one page of rows (20, 50, 100) at a time instead of all of them.
If your dataset is fairly small, sorting in your code may be more convenient if you want to implement a complex sort. Usually this complex sort can be done in SQL, but not as easily as in code.
The short of it is, the rule of thumb is sort via SQL, with some edge cases to the rule.
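For the small-dataset case, a "complex sort" in application code might look like the sketch below. It is only an illustration: the Row class is a hypothetical stand-in for a JPA entity, the sort keys are made up, and the Comparator factory methods assume Java 8 or later.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class InMemorySort {
    // Hypothetical row type standing in for an entity loaded via JPA.
    static final class Row {
        final long pk;
        final double floatValue;

        Row(long pk, double floatValue) {
            this.pk = pk;
            this.floatValue = floatValue;
        }
    }

    // Sorts by floatValue ascending, breaking ties by pk descending.
    // Returns a new list and leaves the input untouched.
    static List<Row> sortedCopy(List<Row> rows) {
        List<Row> copy = new ArrayList<>(rows);
        copy.sort(Comparator.comparingDouble((Row r) -> r.floatValue)
                .thenComparing(Comparator.comparingLong((Row r) -> r.pk).reversed()));
        return copy;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(new Row(1, 0.5), new Row(2, 0.1), new Row(3, 0.5));
        for (Row r : sortedCopy(rows)) {
            System.out.println(r.pk + " " + r.floatValue);
        }
    }
}
```

This only pays off once all rows are already in memory; for paging through a large table, ordering in the query remains the simpler route.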

In general, you're better off using ORDER BY in your SQL query -- this way, if there is an applicable index, you may be getting your sorting "for free" (worst case, it will be the same amount of work as doing it in your code, but often it may be less work than that!).

I ran into this very same question, and decided that I should run a little benchmark to quantify the speed differences. The results surprised me. I would like to post my experience with this very sort of question.
As with a number of the other posters here, my thought was that the database layer would do the sort faster because databases are supposedly tuned for this sort of thing. @Alex made a good point that if the database already has an index on the sort column, then it will be faster. I wanted to answer the question of which raw sort is faster on non-indexed data. Note, I said faster, not simpler. I think in many cases letting the db do the work is simpler and less error-prone.
My main assumption was that the sort would fit in main memory. Not all problems will fit here, but a good number do. For out-of-memory sorts, it may well be that databases shine, though I did not test that. For in-memory sorts, all of Java, C, and C++ outperformed MySQL in my informal benchmark, if one could call it that.
I wish I had had more time to more thoroughly compare the database layer vs application layer, but alas other duties called. Still, I couldn't help but record this note for others who are traveling down this road.
As I started down this path I started to see more hurdles. Should I compare data transfer? How? Can I compare time to read db vs time to read a flat file in java? How to isolate the sort time vs data transfer time vs time to read the records? With these questions here was the methodology and timing numbers I came up with.
All times in ms unless otherwise posted
All sort routines were the defaults provided by the language (these are good enough for random sorted data)
All compilation was with a typical "release-profile" selected via netbeans with no customization unless otherwise posted
All tests for mysql used the following schema
mysql> CREATE TABLE test_1000000
(
pk bigint(11) NOT NULL,
float_value DOUBLE NULL,
bigint_value bigint(11) NULL,
PRIMARY KEY (pk )
) Engine MyISAM;
mysql> describe test_1000000;
+--------------+------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------+------------+------+-----+---------+-------+
| pk | bigint(11) | NO | PRI | NULL | |
| float_value | double | YES | | NULL | |
| bigint_value | bigint(11) | YES | | NULL | |
+--------------+------------+------+-----+---------+-------+
First here is a little snippet to populate the DB. There may be easier ways, but this is what I did:
public static void BuildTable(Connection conn, String tableName, long iterations) {
Random ran = new Random();
Math.random();
try {
long epoch = System.currentTimeMillis();
for (long i = 0; i < iterations; i++) {
if (i % 100000 == 0) {
System.out.println(i + " next 100k");
}
PerformQuery(conn, tableName, i, ran.nextDouble(), ran.nextLong());
}
} catch (Exception e) {
logger.error("Caught General Exception Error from main " + e);
}
}
MYSQL Direct CLI results:
select * from test_10000000 order by bigint_value limit 10;
10 rows in set (2.32 sec)
These timings were somewhat difficult as the only info I had was the time reported after the execution of the command.
From the mysql prompt, sorting 10000000 elements takes roughly 2.1 to 2.4 seconds, whether sorting by bigint_value or float_value.
Java JDBC mysql call (similar performance to doing the sort from the mysql CLI):
public static void SortDatabaseViaMysql(Connection conn, String tableName) {
try {
Statement stmt = conn.createStatement();
String cmd = "SELECT * FROM " + tableName + " order by float_value limit 100";
ResultSet rs = stmt.executeQuery(cmd);
} catch (Exception e) {
}
}
Five runs:
da=2379 ms
da=2361 ms
da=2443 ms
da=2453 ms
da=2362 ms
Java sort, generating random numbers on the fly (this was actually slower than the disk IO read). Assignment time is the time to generate the random numbers and populate the array.
Called like this:
JavaSort(10,10000000);
Timing results:
assignment time 331 sort time 1139
assignment time 324 sort time 1037
assignment time 317 sort time 1028
assignment time 319 sort time 1026
assignment time 317 sort time 1018
assignment time 325 sort time 1025
assignment time 317 sort time 1024
assignment time 318 sort time 1054
assignment time 317 sort time 1024
assignment time 317 sort time 1017
These results were for reading a file of doubles in binary mode
assignment time 4661 sort time 1056
assignment time 4631 sort time 1024
assignment time 4733 sort time 1004
assignment time 4725 sort time 980
assignment time 4635 sort time 980
assignment time 4725 sort time 980
assignment time 4667 sort time 978
assignment time 4668 sort time 980
assignment time 4757 sort time 982
assignment time 4765 sort time 987
Doing a buffer transfer results in much faster runtimes
assignment time 77 sort time 1192
assignment time 59 sort time 1125
assignment time 55 sort time 999
assignment time 55 sort time 1000
assignment time 56 sort time 999
assignment time 54 sort time 1010
assignment time 55 sort time 999
assignment time 56 sort time 1000
assignment time 55 sort time 1002
assignment time 56 sort time 1002
C and C++ Timing results (see below for source)
Debug profile using qsort
assignment 0 seconds 110 milliseconds Time taken 2 seconds 340 milliseconds
assignment 0 seconds 90 milliseconds Time taken 2 seconds 340 milliseconds
assignment 0 seconds 100 milliseconds Time taken 2 seconds 330 milliseconds
assignment 0 seconds 100 milliseconds Time taken 2 seconds 340 milliseconds
assignment 0 seconds 100 milliseconds Time taken 2 seconds 330 milliseconds
assignment 0 seconds 100 milliseconds Time taken 2 seconds 340 milliseconds
assignment 0 seconds 90 milliseconds Time taken 2 seconds 340 milliseconds
assignment 0 seconds 100 milliseconds Time taken 2 seconds 330 milliseconds
assignment 0 seconds 100 milliseconds Time taken 2 seconds 340 milliseconds
assignment 0 seconds 100 milliseconds Time taken 2 seconds 330 milliseconds
Release profile using qsort
assignment 0 seconds 100 milliseconds Time taken 1 seconds 600 milliseconds
assignment 0 seconds 90 milliseconds Time taken 1 seconds 600 milliseconds
assignment 0 seconds 90 milliseconds Time taken 1 seconds 580 milliseconds
assignment 0 seconds 90 milliseconds Time taken 1 seconds 590 milliseconds
assignment 0 seconds 80 milliseconds Time taken 1 seconds 590 milliseconds
assignment 0 seconds 90 milliseconds Time taken 1 seconds 590 milliseconds
assignment 0 seconds 90 milliseconds Time taken 1 seconds 600 milliseconds
assignment 0 seconds 90 milliseconds Time taken 1 seconds 590 milliseconds
assignment 0 seconds 90 milliseconds Time taken 1 seconds 600 milliseconds
assignment 0 seconds 90 milliseconds Time taken 1 seconds 580 milliseconds
Release profile Using std::sort( a, a + ARRAY_SIZE );
assignment 0 seconds 100 milliseconds Time taken 0 seconds 880 milliseconds
assignment 0 seconds 90 milliseconds Time taken 0 seconds 870 milliseconds
assignment 0 seconds 90 milliseconds Time taken 0 seconds 890 milliseconds
assignment 0 seconds 120 milliseconds Time taken 0 seconds 890 milliseconds
assignment 0 seconds 90 milliseconds Time taken 0 seconds 890 milliseconds
assignment 0 seconds 90 milliseconds Time taken 0 seconds 880 milliseconds
assignment 0 seconds 90 milliseconds Time taken 0 seconds 900 milliseconds
assignment 0 seconds 90 milliseconds Time taken 0 seconds 890 milliseconds
assignment 0 seconds 100 milliseconds Time taken 0 seconds 890 milliseconds
assignment 0 seconds 150 milliseconds Time taken 0 seconds 870 milliseconds
Release profile Reading random data from file and using std::sort( a, a + ARRAY_SIZE )
assignment 0 seconds 50 milliseconds Time taken 0 seconds 880 milliseconds
assignment 0 seconds 40 milliseconds Time taken 0 seconds 880 milliseconds
assignment 0 seconds 50 milliseconds Time taken 0 seconds 880 milliseconds
assignment 0 seconds 50 milliseconds Time taken 0 seconds 880 milliseconds
assignment 0 seconds 40 milliseconds Time taken 0 seconds 880 milliseconds
Below is the source code used. Hopefully minimal bugs :)
Java source
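Note that runCode and writeFlag inside JavaSort need to be adjusted depending on what you want to time: runCode selects how the array is filled (0 for random values, 1 for reading through a DataInputStream, 2 for a memory-mapped file), and writeFlag writes the array out to MyBinaryFile.txt. Also note that the memory allocation happens inside the for loop (thus exercising the GC), but I did not see any appreciable difference when moving the allocation outside the loop.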
// imports needed by JavaSort
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.DoubleBuffer;
import java.nio.channels.FileChannel;
import java.util.Arrays;
import java.util.Random;

public static void JavaSort(int iterations, int numberElements) {
    Random ran = new Random();
    // 0 = random values, 1 = read via DataInputStream, 2 = read via memory-mapped file
    int runCode = 2;
    // set to true for one run to create MyBinaryFile.txt
    boolean writeFlag = false;
    for (int j = 0; j < iterations; j++) {
        double[] a1 = new double[numberElements];
        long timea = System.currentTimeMillis();
        if (runCode == 0) {
            for (int i = 0; i < numberElements; i++) {
                a1[i] = ran.nextDouble();
            }
        }
        else if (runCode == 1) {
            // disk I/O through a plain DataInputStream
            try (DataInputStream in = new DataInputStream(new FileInputStream("MyBinaryFile.txt"))) {
                int i = 0;
                // always read exactly numberElements values; the file must hold at least that many
                while (i < numberElements) {
                    a1[i++] = in.readDouble();
                }
            }
            catch (Exception e) {
                // report the failure instead of silently sorting an array of zeros
                e.printStackTrace();
            }
        }
        else if (runCode == 2) {
            // disk I/O through a memory-mapped file
            try (FileInputStream stream = new FileInputStream("MyBinaryFile.txt");
                 FileChannel inChannel = stream.getChannel()) {
                ByteBuffer buffer = inChannel.map(FileChannel.MapMode.READ_ONLY, 0, inChannel.size());
                buffer.order(ByteOrder.BIG_ENDIAN);
                DoubleBuffer doubleBuffer = buffer.asDoubleBuffer();
                doubleBuffer.get(a1);
            }
            catch (Exception e) {
                e.printStackTrace();
            }
        }
        if (writeFlag) {
            try (DataOutputStream out = new DataOutputStream(new FileOutputStream("MyBinaryFile.txt"))) {
                for (int i = 0; i < numberElements; i++) {
                    out.writeDouble(a1[i]);
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
        long timeb = System.currentTimeMillis();
        Arrays.sort(a1);
        long timec = System.currentTimeMillis();
        System.out.println("assignment time " + (timeb - timea) + "  sort time " + (timec - timeb));
    }
}
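The file-based runs above only work once MyBinaryFile.txt exists and holds enough doubles. As a minimal sketch of that round trip (the class name BinaryDoubleFileDemo, its helper methods and the temporary file are my own, not part of the benchmark), this writes doubles with DataOutputStream and reads them back through a memory-mapped buffer. DataOutputStream writes big-endian, which is why the reader uses ByteOrder.BIG_ENDIAN.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

class BinaryDoubleFileDemo {
    // Write the values as raw 8-byte big-endian doubles.
    static void writeDoubles(Path file, double[] values) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            for (double v : values) {
                out.writeDouble(v);
            }
        }
    }

    // Map the whole file and copy its contents into a double[].
    static double[] readDoublesMapped(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            buffer.order(ByteOrder.BIG_ENDIAN);
            double[] values = new double[(int) (channel.size() / Double.BYTES)];
            buffer.asDoubleBuffer().get(values);
            return values;
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical temporary file, used here instead of MyBinaryFile.txt
        Path file = Files.createTempFile("doubles", ".bin");
        file.toFile().deleteOnExit();
        writeDoubles(file, new double[] {1.5, -2.0, 3.25});
        System.out.println(Arrays.toString(readDoublesMapped(file)));
    }
}
```

Sizing the array from the file length avoids the underflow that occurs when the file holds fewer values than the array expects.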
C/C++ source
#include <algorithm>
#include <cstdio>
#include <cstdlib>
#include <ctime>
#include <fstream>
#include <iostream>
#define ARRAY_SIZE 10000000
using namespace std;

// Comparator for qsort: orders doubles ascending.
int compa(const void * elem1, const void * elem2) {
    double f = *((const double*) elem1);
    double s = *((const double*) elem2);
    if (f > s) return 1;
    if (f < s) return -1;
    return 0;
}

void timing_testa(int iterations) {
    clock_t start, diffa, diffb;
    int msec;
    bool writeFlag = false; // set to true (with runCode = 0) for one run to create the "test" file
    int runCode = 1;        // 0 = random data, 1 = read data from the "test" file
    for (int loopCounter = 0; loopCounter < iterations; loopCounter++) {
        double *a = (double *) malloc(sizeof (double)*ARRAY_SIZE);
        if (a == NULL) {
            fprintf(stderr, "malloc failed\n");
            return;
        }
        start = clock();
        size_t bytes = sizeof (double)*ARRAY_SIZE;
        if (runCode == 0) {
            for (int i = 0; i < ARRAY_SIZE; i++) {
                a[i] = rand() / (RAND_MAX + 1.0);
            }
        }
        else if (runCode == 1) {
            ifstream inlezen;
            inlezen.open("test", ios::in | ios::binary);
            inlezen.read(reinterpret_cast<char*> (&a[0]), bytes);
            if (!inlezen) {
                // a short or missing file would otherwise leave the array uninitialized
                fprintf(stderr, "could not read the \"test\" file\n");
                free(a);
                return;
            }
        }
        if (writeFlag) {
            ofstream outf;
            const char* pointer = reinterpret_cast<const char*>(&a[0]);
            outf.open("test", ios::out | ios::binary);
            outf.write(pointer, bytes);
            outf.close();
        }
        diffa = clock() - start;
        msec = (int) (diffa * 1000 / CLOCKS_PER_SEC);
        printf("assignment %d seconds %d milliseconds\t", msec / 1000, msec % 1000);
        start = clock();
        //qsort(a, ARRAY_SIZE, sizeof (double), compa);
        std::sort( a, a + ARRAY_SIZE );
        //printf("%f %f %f\n",a[0],a[1000],a[ARRAY_SIZE-1]);
        diffb = clock() - start;
        msec = (int) (diffb * 1000 / CLOCKS_PER_SEC);
        printf("Time taken %d seconds %d milliseconds\n", msec / 1000, msec % 1000);
        free(a);
    }
}

int main(int argc, char** argv) {
    printf("hello world\n");
    //srand(1);//change seed to fix it
    srand(time(NULL));
    timing_testa(5);
    return 0;
}

This is not completely on point, but I posted something recently that relates to database vs. application-side sorting. The article is about a .NET technique, so most of it likely won't be interesting to you, but the basic principles remain:
Deferring sorting to the client side (e.g. jQuery, Dataset/Dataview sorting) may look tempting. And it actually is a viable option for paging, sorting and filtering, if (and only if):
1. the set of data is small, and
2. there is little concern about performance and scalability
In my experience, systems that meet both criteria are few and far between. Note that it’s not possible to mix and match sorting and paging between the application and the database: if you ask the database for an unsorted 100 rows of data, then sort those rows on the application side, you’re likely not going to get the set of data you were expecting. This may seem obvious, but I’ve seen the mistake made enough times that I wanted to at least mention it.
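To make the sort/paging mix-up concrete, here is a minimal Java sketch with made-up data. The class name PagingOrderDemo and the integer "rows" are hypothetical stand-ins for a database result. It compares sorting the full set before taking a page against taking an arbitrary page first and sorting only that.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class PagingOrderDemo {
    // Correct: sort everything, then take the first pageSize rows
    // (what a database does for ORDER BY combined with a row limit).
    static List<Integer> sortThenPage(List<Integer> rows, int pageSize) {
        List<Integer> sorted = new ArrayList<>(rows);
        Collections.sort(sorted);
        return new ArrayList<>(sorted.subList(0, Math.min(pageSize, sorted.size())));
    }

    // Mistake: take pageSize rows in arbitrary order, then sort only those.
    static List<Integer> pageThenSort(List<Integer> rows, int pageSize) {
        List<Integer> page = new ArrayList<>(rows.subList(0, Math.min(pageSize, rows.size())));
        Collections.sort(page);
        return page;
    }

    // Hypothetical unsorted table: the values 0..n-1 in shuffled order.
    static List<Integer> sampleRows(int n, long seed) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            rows.add(i);
        }
        Collections.shuffle(rows, new Random(seed));
        return rows;
    }

    public static void main(String[] args) {
        List<Integer> rows = sampleRows(10_000, 42L);
        System.out.println("sort then page: " + sortThenPage(rows, 5));
        System.out.println("page then sort: " + pageThenSort(rows, 5));
    }
}
```

The first call always yields the five smallest values. The second yields whichever five rows happened to come first, merely arranged in order.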
It is much more efficient to sort and filter in the database for a number of reasons. For one thing, database engines are highly optimized for doing exactly the kind of work that sorting and filtering entail; this is what their underlying code was designed to do. But even barring that—even assuming you could write code that could match the kind of sorting, filtering and paging performance of a mature database engine—it’s still preferable to do this work in the database, for the simple reason that it’s more efficient to limit the amount of data that is transferred from the database to the application server.
So for example, if you have 10,000 rows before filtering, and your query pares that number down to 75, filtering on the client results in the data from all 10,000 rows being passed over the wire (and into your app server’s memory), whereas filtering on the database side would result in only the filtered 75 rows being moved between database and application. This can make a huge impact on performance and scalability.
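The 10,000-versus-75 example can be simulated in a few lines of Java. This is only a sketch under loud assumptions: FilterLocationDemo is a hypothetical in-memory stand-in for a database, and "rows sent" is a counter, not real network traffic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class FilterLocationDemo {
    private final List<Integer> table = new ArrayList<>(); // stand-in for the database table
    private long rowsSent = 0;                             // rows that crossed the simulated wire

    FilterLocationDemo(int rowCount) {
        for (int i = 0; i < rowCount; i++) {
            table.add(i);
        }
    }

    // Database-side filtering: only the matching rows are sent.
    List<Integer> queryFiltered(Predicate<Integer> where) {
        List<Integer> result = new ArrayList<>();
        for (Integer row : table) {
            if (where.test(row)) {
                result.add(row);
            }
        }
        rowsSent += result.size();
        return result;
    }

    // Application-side filtering: every row is sent, then filtered locally.
    List<Integer> queryAllThenFilter(Predicate<Integer> where) {
        List<Integer> all = new ArrayList<>(table);
        rowsSent += all.size();
        List<Integer> result = new ArrayList<>();
        for (Integer row : all) {
            if (where.test(row)) {
                result.add(row);
            }
        }
        return result;
    }

    long rowsSent() {
        return rowsSent;
    }

    public static void main(String[] args) {
        FilterLocationDemo dbSide = new FilterLocationDemo(10_000);
        dbSide.queryFiltered(row -> row < 75);
        System.out.println("database-side filter sent " + dbSide.rowsSent() + " rows");

        FilterLocationDemo appSide = new FilterLocationDemo(10_000);
        appSide.queryAllThenFilter(row -> row < 75);
        System.out.println("application-side filter sent " + appSide.rowsSent() + " rows");
    }
}
```

Both strategies return the same 75 rows; they differ only in how many rows had to move to get there.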
The full post is here:
http://psandler.wordpress.com/2009/11/20/dynamic-search-objects-part-5sorting/

I'm almost positive that it will be faster to let the database sort it. There are engineers who spend a lot of time perfecting and optimizing their search algorithms, whereas you'd have to implement your own sorting algorithm, which might add a few more computations.

I would let the database do the sort; databases are generally very good at that.

Let the database sort it. Then you can have paging with JPA easily, without reading in the whole resultset.

Well, there is not really a straightforward way to answer this; it must be answered in the context.
Is your application (middle tier) running on the same node as the database?
If yes, you do not have to worry about the latency between the database and the middle tier. Then the question becomes: how big is the subset/resultset of your query? Remember that to sort it in the middle tier, you will take a list/set of size N and either write a custom comparator or use the default ordering of the collection's elements. So from the outset, you are set back by the size N.
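For reference, middle-tier sorting with a custom comparator might look like the following sketch. The Customer type, its fields and the class name MiddleTierSortDemo are hypothetical; the point is only that the whole list of size N has to be in memory before the sort can start.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class MiddleTierSortDemo {
    // Hypothetical row type fetched from the database.
    static final class Customer {
        private final String name;
        private final int age;

        Customer(String name, int age) {
            this.name = name;
            this.age = age;
        }

        String getName() { return name; }
        int getAge() { return age; }

        @Override
        public String toString() { return name + "(" + age + ")"; }
    }

    // Custom comparator: order by age, then by name to break ties.
    static final Comparator<Customer> BY_AGE_THEN_NAME =
            Comparator.comparingInt(Customer::getAge).thenComparing(Customer::getName);

    // Sorts a copy of the fetched rows; all N rows must already be in memory.
    static List<Customer> sortInMiddleTier(List<Customer> rows) {
        List<Customer> sorted = new ArrayList<>(rows);
        sorted.sort(BY_AGE_THEN_NAME);
        return sorted;
    }

    public static void main(String[] args) {
        List<Customer> rows = new ArrayList<>();
        rows.add(new Customer("Carol", 41));
        rows.add(new Customer("Bob", 29));
        rows.add(new Customer("Alice", 41));
        System.out.println(sortInMiddleTier(rows));
    }
}
```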
But if the answer is no, then you are hit by the latency involved in transferring your resultset from DB to middle tier. And then if you are performing pagination, which is the last thing you should do, you are throwing away 90-95% of that resultset after cutting the pages.
So the wasted bandwidth cannot be justified. Imagine doing this for every request, across your tenant organizations.
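The 90-95% figure is easy to check with a little arithmetic. In this sketch the resultset size of 1,000 rows and the page sizes of 100 and 50 are assumed numbers chosen for illustration, and PaginationWasteDemo is a hypothetical name.

```java
class PaginationWasteDemo {
    // Fraction of a fetched resultset that is discarded when only one page is shown.
    static double discardedFraction(int resultSetSize, int pageSize) {
        if (resultSetSize <= 0) {
            return 0.0;
        }
        int shown = Math.min(pageSize, resultSetSize);
        return (resultSetSize - shown) / (double) resultSetSize;
    }

    public static void main(String[] args) {
        System.out.println(discardedFraction(1_000, 100)); // 0.9
        System.out.println(discardedFraction(1_000, 50));  // 0.95
    }
}
```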
Whichever way you look at it, this is bad design.
I would do this in the database, no matter what, if only because almost all applications today demand pagination. Even if yours doesn't, sending massive resultsets over the wire to your client is a total waste that drags everybody down across all your tenants.
One interesting idea that I am toying with these days is to harness the power of HTML5 and the 2-way data binding in browser frameworks like Angular, and push some processing back to the browser. That way, you don't end up waiting in line for someone ahead of you to finish: true distributed processing. But care must be taken in deciding what can be pushed and what cannot.

Depends on the context.
TL;DR
If you have the full data in your application server, do it in the application server.
If you have the full dataset that you need on the application server side already then it is better to do it on the application server side because those servers can scale horizontally. The most likely scenarios for this are:
the data set you're retrieving from the database is small
you cached the data on the application server side on startup
you're doing event sourcing and you're building up the data on the application server side anyway
Don't do it on the client side unless you can guarantee it won't impact the client devices.
Databases themselves may be optimized, but if you can take load off them you can reduce your costs overall, because scaling databases up is more expensive than scaling up application servers.
