I was solving a problem where the basic idea is to calculate a power of 2 for some k and then multiply it by 10. The result should be that value mod 10^9+7.
Given constraints: 1 ≤ K ≤ 10^9
I am using Java for this. I used the Math.pow function, but 2^10000000 exceeds its range, and I don't want to use BigInteger here. Is there any other way to calculate such large values?
The actual problem is:
For each valid i, the sign with number i had the integer i written on one side and 10^K−i−1 written on the other side.
Now, Marichka is wondering — how many road signs have exactly two distinct decimal digits written on them (on both sides in total)? Since this number may be large, compute it modulo 10^9+7.
I'm using this pow approach, but it is not efficient. Any suggestions for solving this problem?
My original Solution:
/* package codechef; // don't place package name! */
import java.util.*;

class Codechef
{
    public static void main(String[] args) throws java.lang.Exception
    {
        Scanner scan = new Scanner(System.in);
        int t = scan.nextInt();
        while (t-- > 0) {
            long k = scan.nextInt();
            long mul = 10 * (long) Math.pow(2, k - 1);
            long ans = mul % 1000000007;
            System.out.println(ans);
        }
    }
}
After trying some examples, I found that this pow solution works fine for small constraints but not for large ones.
while (t-- > 0) {
    long k = scan.nextInt();
    long mul = 10 * (long) Math.pow(2, k);
    long ans = mul % 1000000007;
    System.out.println(ans);
}
This pow call exceeds the representable range. Is there a good solution to this?
Basically, for a function f built from additions and multiplications, f(g(x)) mod M is the same as f(g(x) mod M) mod M. As exponentiation is just a lot of multiplication, you can decompose your single exponentiation into many multiplications and apply the modulo at every step, i.e.
10 * 2^5 mod 13
is the same as
10
* 2 mod 13
* 2 mod 13
* 2 mod 13
* 2 mod 13
* 2 mod 13
You can compact the loop by not breaking the exponentiation all the way down; i.e. this would give the same answer again:
10
* 4 mod 13
* 4 mod 13
* 2 mod 13
Faruk's recursive solution shows an elegant way to do this.
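As a concrete illustration, the step-by-step version might look like this in Java (a sketch of my own, not code from the answer):

// Apply % M after every multiplication so the running product never
// overflows a long (assumes base and M fit comfortably below 2^31).
static long powMod(long base, long exp, long M) {
    long result = 1 % M;
    for (long i = 0; i < exp; i++) {
        result = (result * base) % M;
    }
    return result;
}
// e.g. (10 * powMod(2, 5, 13)) % 13 == 8, the same as 320 mod 13

This loop performs exp multiplications, so for K up to 10^9 you would want the halving (square-and-multiply) approach from the next answer, which needs only O(log K) steps.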
You need to use the idea of dividing the power by 2.
// Fast modular exponentiation: computes (p^e) mod M in O(log e) steps.
long bigmod(long p, long e, long M) {
    if (e == 0)
        return 1;
    if (e % 2 == 0) {
        // p^e = (p^(e/2))^2
        long t = bigmod(p, e / 2, M);
        return (t * t) % M;
    }
    // e is odd: p^e = p^(e-1) * p
    return (bigmod(p, e - 1, M) * p) % M;
}
while (t-- > 0) {
    long k = scan.nextInt();
    long ans = bigmod(2, k, 1000000007);
    System.out.println(ans);
}
You can get details about the idea from here: https://www.geeksforgeeks.org/how-to-avoid-overflow-in-modular-multiplication/
Since long is an 8-byte signed datatype, its range is -(2^63) to (2^63 - 1). Hence, to store 2^100 you would have to use another datatype.
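To see the failure mode concretely, here is a quick check of my own; note that Java's narrowing conversion from double to long saturates at Long.MAX_VALUE rather than wrapping:

public class OverflowDemo {
    public static void main(String[] args) {
        // Math.pow(2, 100) is about 1.27e30, far beyond long's range,
        // so the cast clamps to Long.MAX_VALUE.
        System.out.println((long) Math.pow(2, 100)); // 9223372036854775807
        System.out.println(Long.MAX_VALUE);          // 9223372036854775807
    }
}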
I am trying to code a Fibonacci algorithm in Java, and I am running into problems with int and long sizes. My simple code is the following, which works up to a limit for n:
public static long f(int n) {
    if (n == 1 || n == 2)
        return 1;
    else {
        long value = f(n - 2) + f(n - 1);
        return value;
    }
}
If in my main I pass 50, for example, my code grinds to a halt, I'm guessing due to the size of the computation. I also have another approach which I'm struggling to understand, which is the following:
private static long[] cache = new long[60];

public static long f(int n) {
    if (n == 1 || n == 2)
        return 1;
    else if (cache[n] > 0)
        return cache[n];
    else {
        long value = f(n - 2) + f(n - 1);
        cache[n] = value;
        return value;
    }
}
With this approach everything works fine no matter what n is; my issue is that I can't see what makes the difference.
By "crushed" you mean that the computation runs very long. The reason is that the same call is made many times. If you add this to your method:
static long count;

public static long f(int n) {
    count++;
    ...
you'll see how many times the method is executed. For f(50), the method is actually called 25,172,538,049 times, which runs in 41 seconds on my machine.
When you cache the results of previous invocations, which is called memoization, you eliminate all the redundant calls. E.g., f(40) = f(39) + f(38), but f(39) = f(38) + f(37), so f(38) is called twice. Remembering the result of f(38) means that subsequent invocations have the answer immediately, without having to redo the recursion.
Without memoization, I get this:
n f(n) count time(ns)
== ============== =============== ==============
1 1 1 6,273
2 1 1 571
3 2 3 855
4 3 5 1,141
5 5 9 1,425
6 8 15 1,140
7 13 25 1,996
8 21 41 2,851
9 34 67 7,413
10 55 109 16,536
11 89 177 8,839
12 144 287 19,103
13 233 465 21,098
14 377 753 11,405
15 610 1,219 5,703
16 987 1,973 9,979
17 1,597 3,193 21,099
18 2,584 5,167 32,788
19 4,181 8,361 35,639
20 6,765 13,529 57,307
21 10,946 21,891 91,521
22 17,711 35,421 147,687
23 28,657 57,313 237,496
24 46,368 92,735 283,970
25 75,025 150,049 331,583
26 121,393 242,785 401,720
27 196,418 392,835 650,052
28 317,811 635,621 1,053,483
29 514,229 1,028,457 1,702,679
30 832,040 1,664,079 2,750,745
31 1,346,269 2,692,537 4,455,137
32 2,178,309 4,356,617 12,706,520
33 3,524,578 7,049,155 11,714,051
34 5,702,887 11,405,773 19,571,980
35 9,227,465 18,454,929 30,605,757
36 14,930,352 29,860,703 51,298,507
37 24,157,817 48,315,633 84,473,965
38 39,088,169 78,176,337 127,818,746
39 63,245,986 126,491,971 208,727,118
40 102,334,155 204,668,309 336,785,071
41 165,580,141 331,160,281 543,006,638
42 267,914,296 535,828,591 875,782,771
43 433,494,437 866,988,873 1,429,555,753
44 701,408,733 1,402,817,465 2,301,577,345
45 1,134,903,170 2,269,806,339 3,724,691,882
46 1,836,311,903 3,672,623,805 6,010,675,962
47 2,971,215,073 5,942,430,145 9,706,561,705
48 4,807,526,976 9,615,053,951 15,715,064,841
49 7,778,742,049 15,557,484,097 25,427,015,418
50 12,586,269,025 25,172,538,049 41,126,559,697
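As a side note (my own sketch, not part of the original answer), an iterative version avoids both the redundant calls and the deep recursion in one go:

// Iterative Fibonacci: O(n) time, O(1) space, no recursion at all.
// (long itself overflows past f(92), so the range is still limited.)
public static long f(int n) {
    long prev = 1, curr = 1; // f(1), f(2)
    for (int i = 3; i <= n; i++) {
        long next = prev + curr;
        prev = curr;
        curr = next;
    }
    return curr;
}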
If you get a StackOverflowError, it is due to the recursive nature of your algorithm. The second algorithm stores known results in an array to prevent calls from piling up when it asks for an already-computed result.
One problem can be the size of the result, which exceeds the int limit:
fibonacci(50) = 12,586,269,025
int max = 2,147,483,647
The other can be a StackOverflowError due to the recursion, as #Xalamadrax pointed out.
So in my Java class we have a homework assignment to use System.currentTimeMillis to display the amount of time between clicks. I've tried and tried, but it isn't working. Here's my code.
/* Matthew Caldwell
 * September 21, 2011
 * Homework #4: Problem 5.8.1 pg. 149
 * Program that displays the amount of time that passed
 * between two mouse clicks. Nothing is displayed
 * on the first click, then each successive click will
 * display how much time passed between that click and
 * the previous one.
 */

import objectdraw.*;
import java.awt.*;

public class ElapsedTimeClient extends WindowController {

    public static void main(String[] args) {
        new ElapsedTimeClient().startController(800, 500);
    }

    private Text title, result;
    private double count = 0;

    public void begin() {
        // Set up the title and result
        title = new Text("CLICK COUNTER",
                canvas.getWidth() / 2,
                20, canvas);
        title.move(-title.getWidth() / 2, 0);
        result = new Text("", canvas.getWidth() / 2, 40, canvas);
    }

    public void onMouseClick(Location p) {
        double timerS = 0;

        if (count == 0) {
            timerS = System.currentTimeMillis();
        } else {
            result.setText("The time between clicks was " +
                    (System.currentTimeMillis() - timerS) / 1000 +
                    " seconds.");
            timerS = System.currentTimeMillis();
        }

        count++;
    }
}
I don't really want anyone to tell me exactly how to do it; I just need a little guidance. What am I doing wrong?
The code all compiles and runs just fine, but when I click, instead of giving me the time that elapsed between clicks, it gives me a big long number that never changes. It tells me 1.316639174817E9 almost every single time.
Firstly, system time in millis should be represented as a long, not a double.
Secondly, you need to make the variable that holds the time of the last click (timerS) an instance variable; it's currently method-local and so is reset every time.
In short, change:
double timerS=0;
from being a local variable, to an instance variable, and a long:
public class ElapsedTimeClient extends WindowController {
long timerS;
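With that change, the click handler might look like this (a sketch of just the relevant method; the rest of the class stays as posted):

public void onMouseClick(Location p) {
    if (count == 0) {
        timerS = System.currentTimeMillis();
    } else {
        long now = System.currentTimeMillis();
        // divide by 1000.0, not 1000, so the seconds keep their fraction
        result.setText("The time between clicks was "
                + (now - timerS) / 1000.0 + " seconds.");
        timerS = now;
    }
    count++;
}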
The timerS variable is declared inside the onMouseClick method, and thus only exists within that method. After your first mouse click, the variable disappears and can't be used to compare times.
Instead, you should use a class variable to store this information.
Your problems are:
timerS should be a field of the class (not a local variable) otherwise its value will not be held between calls to your method
timerS should be of type long - the type that's returned from the system time
Also:
count should be type int
In addition to the other answers, note that System.nanoTime is the preferred method for this type of measurement. Rather than measuring the difference in "clock time", it measures elapsed time in nanoseconds. You will often find that the resolution of currentTimeMillis is no better than around 16 ms, whereas nanoTime can resolve much smaller intervals.
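A sketch of that pattern (illustrative only; the field name is my own):

private long lastClick; // nanoTime of the previous click, 0 before the first

public void onMouseClick(Location p) {
    long now = System.nanoTime();
    if (lastClick != 0) {
        // nanoTime values are only meaningful as differences
        result.setText("The time between clicks was "
                + (now - lastClick) / 1e9 + " seconds.");
    }
    lastClick = now;
}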
You don't need the double type for timerS; currentTimeMillis() returns a long.
Why does this happen? If you use a double like this:
(double - long) / 1000
the subtraction yields a double, so the division is carried out as
double / double = double
and the result keeps its "precise" form (e.g. 1.316639174817E9) instead of being a long.
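A quick demonstration of the difference (my own snippet):

public class DivisionDemo {
    public static void main(String[] args) {
        long now = 5500L, before = 3000L;
        System.out.println((now - before) / 1000);   // 2   -- long/int is integer division
        System.out.println((now - before) / 1000.0); // 2.5 -- long/double promotes to double
    }
}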
I have this question about the performance of a method in Java with a variable number of parameters.
Say I have the following 2 alternatives:
public static final boolean isIn(int i, int v1, int v2) {
    return (v1 == i) || (v2 == i);
}

public static final boolean isIn(int i, int... values) {
    for (int v : values) {
        if (i == v) {
            return true;
        }
    }
    return false;
}
Now the main problem comes when I have versions of the first method that go up to 20, 30 or even 50 parameters. Now that just hurts the eyes. Ok, this is legacy code, and I'd like to replace all of it with the single varargs method.
Any idea what the impact on performance would be? Any chance the compiler does some optimization for the second method so that it resembles the first form more or less?
EDIT: Ok, maybe I was not clear enough. I don't have performance problems with the methods with 50 arguments. It's just about readability, as Peter Lawrey said.
I was wondering about performance problems if I switch to the new method with a variable number of arguments.
In other words: what would be the best way to do it if you care about performance? Methods with 50 arguments, or the single method with varargs?
I had the same question, and turned to experimentation.
public class ArgTest {

    int summation(int a, int b, int c, int d, int e, int f) {
        return a + b + c + d + e + f;
    }

    int summationVArgs(int... args) {
        int sum = 0;
        for (int arg : args) {
            sum += arg;
        }
        return sum;
    }

    final static public int META_ITERATIONS = 200;
    final static public int ITERATIONS = 1000000;

    static public void main(String[] args) {
        final ArgTest at = new ArgTest();
        for (int loop = 0; loop < META_ITERATIONS; loop++) {
            int sum = 0;
            final long fixedStart = System.currentTimeMillis();
            for (int i = 0; i < ITERATIONS; i++) {
                sum += at.summation(2312, 45569, -9816, 19122, 4991, 901776);
            }
            final long fixedEnd = System.currentTimeMillis();
            final long vargStart = fixedEnd;
            for (int i = 0; i < ITERATIONS; i++) {
                sum += at.summationVArgs(2312, 45569, -9816, 19122, 4991, 901776);
            }
            final long vargEnd = System.currentTimeMillis();
            System.out.printf("%03d:%d Fixed-Args: %d ms\n", loop + 1, ITERATIONS, fixedEnd - fixedStart);
            System.out.printf("%03d:%d Vargs-Args: %d ms\n", loop + 1, ITERATIONS, vargEnd - vargStart);
        }
        System.exit(0);
    }
}
If you run this code on a modern JVM (here 1.8.0_20), you will see that the variable number of arguments causes overhead in performance, and possibly in memory consumption as well.
I'll only post the first 25 runs:
001:1000000 Fixed-Args: 16 ms
001:1000000 Vargs-Args: 45 ms
002:1000000 Fixed-Args: 13 ms
002:1000000 Vargs-Args: 32 ms
003:1000000 Fixed-Args: 0 ms
003:1000000 Vargs-Args: 27 ms
004:1000000 Fixed-Args: 0 ms
004:1000000 Vargs-Args: 22 ms
005:1000000 Fixed-Args: 0 ms
005:1000000 Vargs-Args: 38 ms
006:1000000 Fixed-Args: 0 ms
006:1000000 Vargs-Args: 11 ms
007:1000000 Fixed-Args: 0 ms
007:1000000 Vargs-Args: 17 ms
008:1000000 Fixed-Args: 0 ms
008:1000000 Vargs-Args: 40 ms
009:1000000 Fixed-Args: 0 ms
009:1000000 Vargs-Args: 89 ms
010:1000000 Fixed-Args: 0 ms
010:1000000 Vargs-Args: 21 ms
011:1000000 Fixed-Args: 0 ms
011:1000000 Vargs-Args: 16 ms
012:1000000 Fixed-Args: 0 ms
012:1000000 Vargs-Args: 26 ms
013:1000000 Fixed-Args: 0 ms
013:1000000 Vargs-Args: 7 ms
014:1000000 Fixed-Args: 0 ms
014:1000000 Vargs-Args: 7 ms
015:1000000 Fixed-Args: 0 ms
015:1000000 Vargs-Args: 6 ms
016:1000000 Fixed-Args: 0 ms
016:1000000 Vargs-Args: 141 ms
017:1000000 Fixed-Args: 0 ms
017:1000000 Vargs-Args: 139 ms
018:1000000 Fixed-Args: 0 ms
018:1000000 Vargs-Args: 106 ms
019:1000000 Fixed-Args: 0 ms
019:1000000 Vargs-Args: 70 ms
020:1000000 Fixed-Args: 0 ms
020:1000000 Vargs-Args: 6 ms
021:1000000 Fixed-Args: 0 ms
021:1000000 Vargs-Args: 5 ms
022:1000000 Fixed-Args: 0 ms
022:1000000 Vargs-Args: 6 ms
023:1000000 Fixed-Args: 0 ms
023:1000000 Vargs-Args: 12 ms
024:1000000 Fixed-Args: 0 ms
024:1000000 Vargs-Args: 37 ms
025:1000000 Fixed-Args: 0 ms
025:1000000 Vargs-Args: 12 ms
...
Even at the best of times, the Vargs-Args never dropped to 0ms.
The compiler does next to no optimisation. The JVM can optimise code, but the two methods won't perform anything like each other. If you have lines of code like isIn(i, 1,2,3,4,5,6,7,8,9 /* plus 40 more */), you have more than performance issues to worry about, IMHO. I would worry about readability first.
If you are worried about performance, pass the arguments as an int[] that is reused.
BTW, the most efficient way to look up a large set of int values is to use a Set, such as Trove's TIntHashSet.
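For illustration, both suggestions might look like this (a sketch; it uses a plain java.util.HashSet where the answer suggests Trove's TIntHashSet, which would additionally avoid Integer boxing):

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class LookupDemo {
    // Reuse one array across calls instead of letting varargs allocate a new one each time.
    private static final int[] CANDIDATES = {1, 2, 3, 4, 5, 6, 7, 8, 9};

    static boolean isIn(int i, int[] values) {
        for (int v : values) {
            if (v == i) return true;
        }
        return false;
    }

    // For large sets, an O(1) hash lookup beats the linear scan above.
    private static final Set<Integer> LOOKUP =
            new HashSet<>(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9));

    public static void main(String[] args) {
        System.out.println(isIn(5, CANDIDATES));  // true
        System.out.println(LOOKUP.contains(10)); // false
    }
}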
To #Canonical Chris: I don't think the problem in your test comes from variable arguments. The summationVArgs function takes more time to complete because of the for loop.
I created this function and added it to the benchmark:
int summationVArgs2(int... args) {
    return args[0] + args[1] + args[2] + args[3] + args[4] + args[5];
}
and this is what I see:
028:1000000 Fixed-Args: 0 ms
028:1000000 Vargs-Args: 12 ms
028:1000000 Vargs2-Args2: 0 ms
The for loop in summationVArgs compiles to more operations than the plain additions: an add to advance the iterator, a comparison to check the loop condition, and a branch to continue or exit the loop, all of which execute once per iteration (except the final exit branch).
Sorry for my bad English. I hope you can understand it :)
Come back when you have profiler output that says this is a problem. Until then, it's premature optimization.
It will be the same as if you declared
isIn(int i, int[] values) {
However, there will be some small overhead in packaging the variables up into an array when calling your method.
You've heard about the two rules of optimisation?
Don't optimize.
(For experts only!) Don't optimize yet.
In other words, this is nothing you should care about from the performance point of view.
I want to get data from the database (MySQL) via JPA, sorted by some column value.
So, what is the best practice:
Retrieve the data from the database as a list of objects (JPA), then sort it programmatically using some Java APIs,
OR
let the database sort it by using a sorting SELECT query?
Thanks in advance
If you are retrieving a subset of all the database data, for example displaying 20 rows on screen out of 1000, it is better to sort on the database. This will be faster and easier and will allow you to retrieve one page of rows (20, 50, 100) at a time instead of all of them.
If your dataset is fairly small, sorting in your code may be more convenient if you want to implement a complex sort. Usually such a complex sort can be done in SQL, but not as easily as in code.
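For example, a multi-key in-code sort is easy to express with a Comparator (a sketch with a hypothetical Person type, not from the original answer):

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortDemo {
    record Person(String lastName, int age) {} // hypothetical example type

    public static void main(String[] args) {
        List<Person> people = new ArrayList<>(List.of(
                new Person("Smith", 40), new Person("Adams", 25), new Person("Smith", 31)));
        // Complex sort: last name ascending, then age descending.
        people.sort(Comparator.comparing(Person::lastName)
                .thenComparing(Comparator.comparingInt(Person::age).reversed()));
        System.out.println(people);
    }
}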
The short of it: the rule of thumb is to sort via SQL, with some edge cases to the rule.
In general, you're better off using ORDER BY in your SQL query -- this way, if there is an applicable index, you may be getting your sorting "for free" (worst case, it will be the same amount of work as doing it in your code, but often it may be less work than that!).
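In JPA terms that just means putting the ORDER BY into the query; a minimal sketch (assuming a hypothetical Person entity and an EntityManager em):

// The database does the sorting and can use an index on lastName if one exists.
List<Person> people = em.createQuery(
        "SELECT p FROM Person p ORDER BY p.lastName ASC", Person.class)
    .getResultList();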
I ran into this very same question, and decided that I should run a little benchmark to quantify the speed differences. The results surprised me. I would like to post my experience with this very sort of question.
As with a number of the other posters here, my thought was that the database layer would do the sort faster, because databases are supposedly tuned for this sort of thing. #Alex made a good point that if the database already has an index on the sort column, then it will be faster. I wanted to answer the question of which raw sort is faster for non-indexed sorts. Note, I said faster, not simpler. I think in many cases letting the db do the work is simpler and less error prone.
My main assumption was that the sort would fit in main memory. Not all problems will fit there, but a good number do. For out-of-memory sorts, it may well be that databases shine, though I did not test that. In the case of in-memory sorts, all of Java/C/C++ outperformed MySQL in my informal benchmark, if one could call it that.
I wish I had had more time to more thoroughly compare the database layer vs application layer, but alas other duties called. Still, I couldn't help but record this note for others who are traveling down this road.
As I started down this path I started to see more hurdles. Should I compare data transfer? How? Can I compare time to read db vs time to read a flat file in java? How to isolate the sort time vs data transfer time vs time to read the records? With these questions here was the methodology and timing numbers I came up with.
All times in ms unless otherwise posted
All sort routines were the defaults provided by the language (these are good enough for random sorted data)
All compilation was with a typical "release-profile" selected via netbeans with no customization unless otherwise posted
All tests for mysql used the following schema
mysql> CREATE TABLE test_1000000
(
pk bigint(11) NOT NULL,
float_value DOUBLE NULL,
bigint_value bigint(11) NULL,
PRIMARY KEY (pk )
) Engine MyISAM;
mysql> describe test_1000000;
+--------------+------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------+------------+------+-----+---------+-------+
| pk | bigint(11) | NO | PRI | NULL | |
| float_value | double | YES | | NULL | |
| bigint_value | bigint(11) | YES | | NULL | |
+--------------+------------+------+-----+---------+-------+
First, here is a little snippet to populate the DB. There may be easier ways, but this is what I did:
public static void BuildTable(Connection conn, String tableName, long iterations) {
    Random ran = new Random();
    Math.random();
    try {
        long epoch = System.currentTimeMillis();
        for (long i = 0; i < iterations; i++) {
            if (i % 100000 == 0) {
                System.out.println(i + " next 100k");
            }
            PerformQuery(conn, tableName, i, ran.nextDouble(), ran.nextLong());
        }
    } catch (Exception e) {
        logger.error("Caught General Exception Error from main " + e);
    }
}
MYSQL Direct CLI results:
select * from test_10000000 order by bigint_value limit 10;
10 rows in set (2.32 sec)
These timings were somewhat difficult to capture, as the only info I had was the time reported after the execution of the command.
From the mysql prompt, for 10000000 elements it is roughly 2.1 to 2.4 seconds, whether sorting bigint_value or float_value.
Java JDBC mysql call (similar performance to doing sort from mysql cli)
public static void SortDatabaseViaMysql(Connection conn, String tableName) {
    try {
        Statement stmt = conn.createStatement();
        String cmd = "SELECT * FROM " + tableName + " order by float_value limit 100";
        ResultSet rs = stmt.executeQuery(cmd);
    } catch (Exception e) {
    }
}
Five runs:
da=2379 ms
da=2361 ms
da=2443 ms
da=2453 ms
da=2362 ms
Java sort, generating the random numbers on the fly (this was actually slower than the buffered file read shown further below). Assignment time is the time to generate the random numbers and populate the array.
Calling like
JavaSort(10,10000000);
Timing results:
assignment time 331 sort time 1139
assignment time 324 sort time 1037
assignment time 317 sort time 1028
assignment time 319 sort time 1026
assignment time 317 sort time 1018
assignment time 325 sort time 1025
assignment time 317 sort time 1024
assignment time 318 sort time 1054
assignment time 317 sort time 1024
assignment time 317 sort time 1017
These results were for reading a file of doubles in binary mode
assignment time 4661 sort time 1056
assignment time 4631 sort time 1024
assignment time 4733 sort time 1004
assignment time 4725 sort time 980
assignment time 4635 sort time 980
assignment time 4725 sort time 980
assignment time 4667 sort time 978
assignment time 4668 sort time 980
assignment time 4757 sort time 982
assignment time 4765 sort time 987
Doing a buffer transfer results in much faster runtimes
assignment time 77 sort time 1192
assignment time 59 sort time 1125
assignment time 55 sort time 999
assignment time 55 sort time 1000
assignment time 56 sort time 999
assignment time 54 sort time 1010
assignment time 55 sort time 999
assignment time 56 sort time 1000
assignment time 55 sort time 1002
assignment time 56 sort time 1002
C and C++ Timing results (see below for source)
Debug profile using qsort
assignment 0 seconds 110 milliseconds Time taken 2 seconds 340 milliseconds
assignment 0 seconds 90 milliseconds Time taken 2 seconds 340 milliseconds
assignment 0 seconds 100 milliseconds Time taken 2 seconds 330 milliseconds
assignment 0 seconds 100 milliseconds Time taken 2 seconds 340 milliseconds
assignment 0 seconds 100 milliseconds Time taken 2 seconds 330 milliseconds
assignment 0 seconds 100 milliseconds Time taken 2 seconds 340 milliseconds
assignment 0 seconds 90 milliseconds Time taken 2 seconds 340 milliseconds
assignment 0 seconds 100 milliseconds Time taken 2 seconds 330 milliseconds
assignment 0 seconds 100 milliseconds Time taken 2 seconds 340 milliseconds
assignment 0 seconds 100 milliseconds Time taken 2 seconds 330 milliseconds
Release profile using qsort
assignment 0 seconds 100 milliseconds Time taken 1 seconds 600 milliseconds
assignment 0 seconds 90 milliseconds Time taken 1 seconds 600 milliseconds
assignment 0 seconds 90 milliseconds Time taken 1 seconds 580 milliseconds
assignment 0 seconds 90 milliseconds Time taken 1 seconds 590 milliseconds
assignment 0 seconds 80 milliseconds Time taken 1 seconds 590 milliseconds
assignment 0 seconds 90 milliseconds Time taken 1 seconds 590 milliseconds
assignment 0 seconds 90 milliseconds Time taken 1 seconds 600 milliseconds
assignment 0 seconds 90 milliseconds Time taken 1 seconds 590 milliseconds
assignment 0 seconds 90 milliseconds Time taken 1 seconds 600 milliseconds
assignment 0 seconds 90 milliseconds Time taken 1 seconds 580 milliseconds
Release profile Using std::sort( a, a + ARRAY_SIZE );
assignment 0 seconds 100 milliseconds Time taken 0 seconds 880 milliseconds
assignment 0 seconds 90 milliseconds Time taken 0 seconds 870 milliseconds
assignment 0 seconds 90 milliseconds Time taken 0 seconds 890 milliseconds
assignment 0 seconds 120 milliseconds Time taken 0 seconds 890 milliseconds
assignment 0 seconds 90 milliseconds Time taken 0 seconds 890 milliseconds
assignment 0 seconds 90 milliseconds Time taken 0 seconds 880 milliseconds
assignment 0 seconds 90 milliseconds Time taken 0 seconds 900 milliseconds
assignment 0 seconds 90 milliseconds Time taken 0 seconds 890 milliseconds
assignment 0 seconds 100 milliseconds Time taken 0 seconds 890 milliseconds
assignment 0 seconds 150 milliseconds Time taken 0 seconds 870 milliseconds
Release profile Reading random data from file and using std::sort( a, a + ARRAY_SIZE )
assignment 0 seconds 50 milliseconds Time taken 0 seconds 880 milliseconds
assignment 0 seconds 40 milliseconds Time taken 0 seconds 880 milliseconds
assignment 0 seconds 50 milliseconds Time taken 0 seconds 880 milliseconds
assignment 0 seconds 50 milliseconds Time taken 0 seconds 880 milliseconds
assignment 0 seconds 40 milliseconds Time taken 0 seconds 880 milliseconds
Below is the source code used. Hopefully minimal bugs :)
Java source
Note that inside JavaSort, runCode and writeFlag need to be adjusted depending on what you want to time. Also note that the memory allocation happens in the for loop (thus testing GC, but I did not see any appreciable difference moving the allocation outside the loop).
// Needs: java.io.DataInputStream/DataOutputStream/FileInputStream/FileOutputStream,
// java.nio.ByteBuffer/ByteOrder/DoubleBuffer, java.nio.channels.FileChannel,
// java.util.Arrays, java.util.Random
public static void JavaSort(int iterations, int numberElements) {
    Random ran = new Random();
    Math.random();
    int runCode = 2;
    boolean writeFlag = false;
    for (int j = 0; j < iterations; j++) {
        double[] a1 = new double[numberElements];
        long timea = System.currentTimeMillis();
        if (runCode == 0) {
            // generate random numbers on the fly
            for (int i = 0; i < numberElements; i++) {
                a1[i] = ran.nextDouble();
            }
        } else if (runCode == 1) {
            // do disk io!!
            try {
                DataInputStream in = new DataInputStream(new FileInputStream("MyBinaryFile.txt"));
                int i = 0;
                //while (in.available() > 0) {
                while (i < numberElements) { // this should be changed so that I always read in the size of array elements
                    a1[i++] = in.readDouble();
                }
            } catch (Exception e) {
            }
        } else if (runCode == 2) {
            // memory-mapped buffer transfer
            try {
                FileInputStream stream = new FileInputStream("MyBinaryFile.txt");
                FileChannel inChannel = stream.getChannel();
                ByteBuffer buffer = inChannel.map(FileChannel.MapMode.READ_ONLY, 0, inChannel.size());
                //int[] result = new int[500000];
                buffer.order(ByteOrder.BIG_ENDIAN);
                DoubleBuffer doubleBuffer = buffer.asDoubleBuffer();
                doubleBuffer.get(a1);
            } catch (Exception e) {
            }
        }
        if (writeFlag) {
            // write the generated numbers out so later runs can read them back
            try {
                DataOutputStream out = new DataOutputStream(new FileOutputStream("MyBinaryFile.txt"));
                for (int i = 0; i < numberElements; i++) {
                    out.writeDouble(a1[i]);
                }
            } catch (Exception e) {
            }
        }
        long timeb = System.currentTimeMillis();
        Arrays.sort(a1);
        long timec = System.currentTimeMillis();
        System.out.println("assignment time " + (timeb - timea) + " " + " sort time " + (timec - timeb));
        //delete a1;
    }
}
C/C++ source
#include <iostream>
#include <vector>
#include <algorithm>
#include <fstream>
#include <cstdlib>
#include <ctime>
#include <cstdio>
#include <math.h>
#include <stdio.h>
#include <time.h>
#include <stdlib.h>
#define ARRAY_SIZE 10000000
using namespace std;
int compa(const void * elem1, const void * elem2) {
    double f = *((double*) elem1);
    double s = *((double*) elem2);
    if (f > s) return 1;
    if (f < s) return -1;
    return 0;
}

int compb(const void *a, const void *b) {
    if (*(double **)a < *(double **)b) return -1;
    if (*(double **)a > *(double **)b) return 1;
    return 0;
}
void timing_testa(int iterations) {
    clock_t start = clock(), diffa, diffb;
    int msec;
    bool writeFlag = false;
    int runCode = 1;
    for (int loopCounter = 0; loopCounter < iterations; loopCounter++) {
        double *a = (double *) malloc(sizeof (double) * ARRAY_SIZE);
        start = clock();
        size_t bytes = sizeof (double) * ARRAY_SIZE;
        if (runCode == 0) {
            for (int i = 0; i < ARRAY_SIZE; i++) {
                a[i] = rand() / (RAND_MAX + 1.0);
            }
        } else if (runCode == 1) {
            ifstream inlezen;
            inlezen.open("test", ios::in | ios::binary);
            inlezen.read(reinterpret_cast<char*> (&a[0]), bytes);
        }
        if (writeFlag) {
            ofstream outf;
            const char* pointer = reinterpret_cast<const char*>(&a[0]);
            outf.open("test", ios::out | ios::binary);
            outf.write(pointer, bytes);
            outf.close();
        }
        diffa = clock() - start;
        msec = diffa * 1000 / CLOCKS_PER_SEC;
        printf("assignment %d seconds %d milliseconds\t", msec / 1000, msec % 1000);
        start = clock();
        //qsort(a, ARRAY_SIZE, sizeof (double), compa);
        std::sort( a, a + ARRAY_SIZE );
        //printf("%f %f %f\n",a[0],a[1000],a[ARRAY_SIZE-1]);
        diffb = clock() - start;
        msec = diffb * 1000 / CLOCKS_PER_SEC;
        printf("Time taken %d seconds %d milliseconds\n", msec / 1000, msec % 1000);
        free(a);
    }
}
/*
*
*/
int main(int argc, char** argv) {
    printf("hello world\n");
    double *a = (double *) malloc(sizeof (double) * ARRAY_SIZE);
    //srand(1); // change seed to fix it
    srand(time(NULL));
    timing_testa(5);
    free(a);
    return 0;
}
This is not completely on point, but I posted something recently that relates to database vs. application-side sorting. The article is about a .net technique, so most of it likely won't be interesting to you, but the basic principles remain:
Deferring sorting to the client side (e.g. jQuery, Dataset/Dataview sorting) may look tempting. And it actually is a viable option for paging, sorting and filtering, if (and only if):
1. the set of data is small, and
2. there is little concern about performance and scalability
From my experience, the systems that meet this kind of criteria are few and far between. Note that it’s not possible to mix and match sorting/paging in the application/database—if you ask the database for an unsorted 100 rows of data, then sort those rows on the application side, you’re likely not going to get the set of data you were expecting. This may seem obvious, but I’ve seen the mistake made enough times that I wanted to at least mention it.
It is much more efficient to sort and filter in the database for a number of reasons. For one thing, database engines are highly optimized for doing exactly the kind of work that sorting and filtering entail; this is what their underlying code was designed to do. But even barring that—even assuming you could write code that could match the kind of sorting, filtering and paging performance of a mature database engine—it’s still preferable to do this work in the database, for the simple reason that it’s more efficient to limit the amount of data that is transferred from the database to the application server.
So for example, if you have 10,000 rows before filtering, and your query pares that number down to 75, filtering on the client results in the data from all 10,000 rows being passed over the wire (and into your app server's memory), where filtering on the database side would result in only the filtered 75 rows being moved between database and application. This can make a huge impact on performance and scalability.
The full post is here:
http://psandler.wordpress.com/2009/11/20/dynamic-search-objects-part-5sorting/
I'm almost positive that it will be faster to let the database sort it. There are engineers who spend a lot of time perfecting and optimizing their search algorithms, whereas you'll have to implement your own sorting algorithm, which might add a few more computations.
I would let the database do the sort, they are generally very good at that.
Let the database sort it. Then you can have paging with JPA easily, without reading in the whole resultset.
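For example (a sketch; the Person entity and page size are hypothetical):

// Sorted paging: the database sorts, and only one page crosses the wire.
List<Person> page = em.createQuery(
        "SELECT p FROM Person p ORDER BY p.lastName", Person.class)
    .setFirstResult(40) // skip the first two pages of 20
    .setMaxResults(20)  // fetch just this page
    .getResultList();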
Well, there is not really a straightforward way to answer this; it must be answered in the context.
Is your application (middle tier) running on the same node as the database?
If yes, you do not have to worry about the latency between the database and the middle tier. Then the question becomes: how big is the subset/resultset of your query? Remember that to sort this in the middle tier, you will take a list/set of size N and either write a custom comparator or use the default Collection comparator. Or whatever. So at the outset, you are set back by the size N.
But if the answer is no, then you are hit by the latency involved in transferring your resultset from the DB to the middle tier. And then, if you are performing pagination (which is done last), you are throwing away 90-95% of that resultset after cutting out the pages.
So the wasted bandwidth cannot be justified. Imagine doing this for every request, across your tenant organizations.
Whichever way you look at it, this is bad design.
I would do this in the database, no matter what. Almost all applications today demand pagination, and even if they don't, sending massive resultsets over the wire to your client is a total waste that drags everybody down across all your tenants.
One interesting idea that I am toying with these days is to harness the power of HTML5 and two-way data binding in browser frameworks like Angular, and push some processing back to the browser. That way, you don't end up waiting in line for someone before you to finish. True distributed processing. But care must be taken in deciding what can be pushed and what cannot.
Depends on the context.
TL;DR
If you have the full data in your application server, do it in the application server.
If you have the full dataset that you need on the application server side already, then it is better to do it there, because those servers can scale horizontally. The most likely scenarios for this are:
the data set you're retrieving from the database is small
you cached the data on the application server side on startup
you're doing event sourcing and are building up the data on the application server side anyway
Don't do it on the client side unless you can guarantee it won't impact the client devices.
Databases themselves may be optimized, but if you can pull the burden away from them, you can reduce your overall costs, because scaling databases up is more expensive than scaling up application servers.