I have an int array with 1000 elements. I need to extract the size of various sub-populations within the array (how many are even, odd, greater than 500, etc.).
I could use a for loop and a bunch of if statements to increment a counter for each matching item, such as:
for (int i = 0; i < someArray.length; i++) {
if(conditionA) sizeA++;
if(conditionB) sizeB++;
if(conditionC) sizeC++;
...
}
or I could do something more lazy such as:
Supplier<IntStream> ease = () -> Arrays.stream(someArray);
int sizeA = ease.get().filter(conditionA).toArray().length;
int sizeB = ease.get().filter(conditionB).toArray().length;
int sizeC = ease.get().filter(conditionC).toArray().length;
...
The benefit of the second way seems to be limited to readability, but is there a massive hit on efficiency? Could it possibly be more efficient? I guess it boils down to this: is iterating through the array once with 4 conditions always better than iterating through it 4 times with one condition each time (assuming the conditions are independent)? I am aware that in this particular example the second method has lots of additional method calls, which I'm sure don't help efficiency.
Preamble:
As @Kayaman points out, for a small array (1000 elements) it probably doesn't matter.
The correct approach to this kind of thing is to do the optimization after you have working code, and a working benchmark, and after you have profiled the code to see where the real hotspots are.
But assuming that this is worth spending effort on optimization, the first version is likely to be faster than the second version for a couple of reasons:
The overheads of incrementing and testing the index are only incurred once in the first version versus three times in the second one.
For an array that is too large to fit into the memory cache, the first version will entail fewer memory reads than the second one. Since memory access is typically a bottleneck (especially on a multi-core machine), this can be significant.
Streams add an extra performance overhead compared to simple iteration of an array.
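To make the two approaches concrete, here is a minimal sketch that counts the same sub-populations both ways. The class and method names are hypothetical, and it uses count() rather than toArray().length so the stream version does not allocate an intermediate array. It shows that both give the same answers; it says nothing about timing.

```java
import java.util.Arrays;
import java.util.function.IntPredicate;

final class SubPopulationCounts {
    // One pass over the array; every condition is tested for each element.
    static long[] countSinglePass(int[] data, IntPredicate... conditions) {
        long[] counts = new long[conditions.length];
        for (int value : data) {
            for (int c = 0; c < conditions.length; c++) {
                if (conditions[c].test(value)) {
                    counts[c]++;
                }
            }
        }
        return counts;
    }

    // One stream pass per condition; count() avoids building a filtered array.
    static long[] countWithStreams(int[] data, IntPredicate... conditions) {
        long[] counts = new long[conditions.length];
        for (int c = 0; c < conditions.length; c++) {
            counts[c] = Arrays.stream(data).filter(conditions[c]).count();
        }
        return counts;
    }
}
```

A call such as SubPopulationCounts.countSinglePass(someArray, x -> x % 2 == 0, x -> x > 500) returns the counts in the same order as the conditions.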
I did some time measuring with this code:
Random r = new Random();
int[] array = IntStream.generate(() -> r.nextInt(100)).limit(1000).toArray();
long odd = 0;
long even = 0;
long divisibleBy3 = 0;
long start = System.nanoTime();
//for (int i: array) {
// if (i % 2 == 1) {
// odd++;
// }
// if (i % 2 == 0) {
// even++;
// }
// if (i % 3 == 0) {
// divisibleBy3++;
// }
//}
even = Arrays.stream(array).parallel().filter(x -> x % 2 == 0).toArray().length;
odd = Arrays.stream(array).parallel().filter(x -> x % 2 == 1).toArray().length;
divisibleBy3 = Arrays.stream(array).parallel().filter(x -> x % 3 == 0).toArray().length;
System.out.println(System.nanoTime() - start);
The above outputs an 8-digit number, usually around 14000000.
If I uncomment the for loop and comment out the streams, I get a 5-digit number as output, usually around 80000.
So the streams are slower in terms of execution time.
When the array size is bigger, though, the difference between streams and loops becomes smaller.
I am implementing something very similar to a Genetic Algorithm. So you go through multiple generations of a population - at the end of a generation you create a new population in three different ways 'randomly', 'mutation' and 'crossover'.
Currently the probabilities are static but I need to make it so that the probability of mutation gradually increases. I appreciate any direction as I'm a little stuck..
This is what I have:
int random = generator.nextInt(10);
if (random < 1)
randomlyCreate();
else if (random > 1 && random < 9 )
crossover();
else
mutate();
Thank you.
In your if statement, replace the hard coded numbers with variables and update them at the start of each generation.
Your if statement effectively divides the interval 0 to 10 into three bins. The probability of calling mutate() vs crossover() vs randomlyCreate() depends on the size of each bin. You can adjust the mutation rate by gradually moving the boundaries of the bins.
In your code, mutate() is called 20% of the time (when random = 9 or 1), randomlyCreate() is called 10% of the time (when random = 0), and crossover() is called the other 70% of the time.
The code below starts out with these same ratios at generation 0, but the mutation rate increases by 1% each generation. So for generation 1 the mutation rate is 21%, for generation 2 it is 22%, and so on. randomlyCreate() is called 1 / 7 as often as crossover(), regardless of the mutation rate.
You could make the increase in mutation rate quadratic, exponential, or whatever form you choose by altering getMutationBoundary().
I've used floats in the code below. Doubles would work just as well.
If the mutation rate is what you're most interested in, it might be more intuitive to move the mutation bin so that it's at [0, 2] initially, and then increase its upper boundary from there (2.1, 2.2, etc). Then you can read off the mutation rate easily, (21%, 22%, etc).
void mainLoop() {
// make lots of generations
for (int generation = 0; generation < MAX_GEN; generation++) {
float mutationBoundary = getMutationBoundary(generation);
float creationBoundary = getCreationBoundary(mutationBoundary);
createNewGeneration(mutationBoundary, creationBoundary);
// Do some stuff with this generation, e.g. measure fitness
}
}
void createNewGeneration(float mutationBoundary, float creationBoundary) {
// create each member of this generation
for (int i = 0; i < MAX_POP; i++) {
createNewMember(mutationBoundary, creationBoundary);
}
}
void createNewMember(float mutationBoundary, float creationBoundary) {
float random = 10 * generator.nextFloat();
if (random > mutationBoundary) {
mutate();
}
else {
if (random < creationBoundary) {
randomlyCreate();
}
else {
crossover();
}
}
}
float getMutationBoundary(int generation) {
// Mutation bin is initially between [8, 10].
// Lower bound slides down linearly, so it becomes [7.9, 10], [7.8, 10], etc.
// Subtracting 0.1 each generation makes the bin grow in size.
// Initially the bin is 10 - 8 = 2.0 units wide, then 10 - 7.9 = 2.1 units wide,
// and so on. So the probability of mutation grows from 2 / 10 = 20%
// to 2.1 / 10 = 21% and so on.
float boundary = 8 - 0.1f * generation;
if (boundary < 0) {
boundary = 0;
}
return boundary;
}
float getCreationBoundary(float mutationBoundary) {
return mutationBoundary / 8; // fixed ratio: creation bin is 1/8 of the non-mutation range
}
Use a variable where you currently use the 9 and, for example, multiply it by 0.9 every iteration, unless mutate() happens, in which case you multiply it by 3. That way the chance of mutation grows slowly but exponentially (yes, that is possible) until a mutation actually occurs, at which point the chance of another mutation drops like a brick and the process starts all over again.
These values are completely arbitrary and are not based on any knowledge about mutation whatsoever; I'm just showing how you could manipulate the threshold so that it has a different value every time. Also, if you use what I just described, make sure the variable is set back to 10 if it ever goes over 10.
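The idea above could be sketched like this. The class name, the step() method and the use of a double threshold are assumptions made for illustration; the factors 0.9 and 3 and the cap at 10 come from the description, and they remain arbitrary.

```java
import java.util.Random;

final class AdaptiveMutationBoundary {
    // Mutation happens when the random draw is at or above this boundary.
    // 9 matches the hard-coded threshold in the question.
    private double boundary = 9.0;
    private final Random generator;

    AdaptiveMutationBoundary(Random generator) {
        this.generator = generator;
    }

    /** Draws once, reports whether this step mutates, and updates the boundary. */
    boolean step() {
        double random = 10 * generator.nextDouble();   // value in [0, 10)
        boolean mutate = random >= boundary;
        // Lowering the boundary widens the mutation bin; raising it narrows the bin.
        boundary = mutate ? boundary * 3 : boundary * 0.9;
        if (boundary > 10) {
            boundary = 10;                             // cap, as advised above
        }
        return mutate;
    }

    double boundary() {
        return boundary;
    }
}
```

When step() returns false, the caller would still choose between randomlyCreate() and crossover() as before.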
Any choice of probabilities for genetic operators is arbitrary (this also holds if you use some function to increase or decrease the probabilities). It is better to encode the operators inside the chromosome. For example, you can add a number of bits that encode all the operators you use. When generating children, you look at these bits for all elements of the population and apply each operator with a probability equal to its current share in the whole population, considered globally.
For example:
void adaptive_probabilities(GA *ga, long chromosome_length) {
register int i, mut = 1, xover = 1, uxover = 1, ixover = 1, pop;
char bit1, bit2;
for (i = 0; i < ga->npop; i++) {
bit1 = ga->pop[i]->chromosome[chromosome_length - 2];
bit2 = ga->pop[i]->chromosome[chromosome_length - 1];
if (bit1 == '0' && bit2 == '0') {
mut++;
} else if (bit1 == '0' && bit2 == '1') {
xover++;
} else if (bit1 == '1' && bit2 == '0') {
uxover++;
} else if (bit1 == '1' && bit2 == '1') {
ixover++;
}
}
pop = ga->npop + 4;
ga->prob[0] = mut / (float)pop;
ga->prob[1] = xover / (float)pop;
ga->prob[2] = uxover / (float)pop;
ga->prob[3] = ixover / (float)pop;
}
In my case I use two bits because my chromosomes encode four operators (three types of crossover plus mutation). The operator bits are located at the end of the chromosome. All probabilities are > 0 (the operator counters start from 1), so I have to normalize the probabilities correctly with
pop = ga->npop + 4;
Then I generate a random number to choose the operator based on the calculated probabilities saved in the array ga->prob. The last bits of each new child are changed to reflect the operator used.
This mechanism ensures a double search by the GA: in the error space (as usual) and in the operator space. The probabilities change automatically and are optimized, because children are generated with higher probability using the operators that are best at that moment of the calculation.
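The step of choosing an operator from the probability array is not shown above. One common way to do it is roulette-wheel selection, sketched here in Java rather than C. The class and method names are hypothetical, and the sketch assumes the probabilities are non-negative and sum to roughly 1, as they do after the normalization above.

```java
import java.util.Random;

final class OperatorRoulette {
    /**
     * Returns index i with probability prob[i].
     * Assumes prob holds non-negative values that sum to approximately 1.
     */
    static int pick(float[] prob, Random generator) {
        float r = generator.nextFloat();   // uniform in [0, 1)
        float cumulative = 0f;
        for (int i = 0; i < prob.length; i++) {
            cumulative += prob[i];
            if (r < cumulative) {
                return i;
            }
        }
        // Rounding can leave the total slightly below 1; fall back to the last operator.
        return prob.length - 1;
    }
}
```

With the layout used above, index 0 would mean mutation and indexes 1 to 3 the three crossover variants.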
Suppose I have a method to calculate combinations of r items from n items:
public static long combi(int n, int r) {
if ( r == n) return 1;
long numr = 1;
for(int i=n; i > (n-r); i--) {
numr *=i;
}
return numr/fact(r);
}
public static long fact(int n) {
long rs = 1;
if(n <2) return 1;
for (int i=2; i<=n; i++) {
rs *=i;
}
return rs;
}
As you can see, it involves a factorial, which can easily overflow. For example, fact(200) returns zero. The question is: why do I get zero?
Secondly, how do I deal with overflow in the above context? The method should return the largest number that fits in a long if the result is too big, instead of returning a wrong answer.
One approach (but this could be wrong) is that if the result exceeds some large number, for example 1,400,000,000, then return the remainder of the result modulo 1,400,000,001. Can you explain what this means and how I can do that in Java?
Note that I do not guarantee that above methods are accurate for calculating factorial and combinations. Extra bonus if you can find errors and correct them.
Note that I can only use int or long and if it is unavoidable, can also use double. Other data types are not allowed.
I am not sure who marked this question as homework. This is NOT homework. I wish it were homework and I could go back in time to being a young student at university. But I am old, with more than 10 years working as a programmer. I just want to practice developing highly optimized solutions in Java. In our times at university, the Internet did not even exist. Today's students are lucky that they can even post their homework on a site like SO.
Use the multiplicative formula, instead of the factorial formula.
Since it's homework, I don't want to just give you a solution. However, a hint I will give is that instead of calculating two large numbers and dividing the result, try calculating both together, e.g. calculate the numerator until it's about to overflow, then calculate the denominator. In this last step you can choose to divide the numerator instead of multiplying the denominator. This stops both values from getting really large when the ratio of the two is relatively small.
I got this result before an overflow was detected.
combi(61,30) = 232714176627630544 which is 2.52% of Long.MAX_VALUE
The only "bug" I found in your code is not having any overflow detection, since you know it's likely to be a problem. ;)
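One way to interleave the multiplication and the division, as hinted above, is the following sketch. It is not the code that produced the result quoted above; the class name is hypothetical, and returning Long.MAX_VALUE on overflow follows the behaviour asked for in the question. Cancelling common factors with a gcd before each multiplication keeps every intermediate value equal to a smaller binomial coefficient, so Math.multiplyExact only throws when the true result no longer fits.

```java
final class MultiplicativeCombi {
    /** Returns C(n, r), or Long.MAX_VALUE if the exact value does not fit in a long. */
    static long combi(int n, int r) {
        if (n < 0 || r < 0 || r > n) {
            throw new IllegalArgumentException("need 0 <= r <= n");
        }
        r = Math.min(r, n - r);              // C(n, r) == C(n, n - r); use the shorter product
        long result = 1;                     // invariant: result == C(n - r + i - 1, i - 1)
        for (int i = 1; i <= r; i++) {
            long factor = n - r + i;         // next numerator term
            long g = gcd(result, i);         // cancel what we can before multiplying
            long divisor = i / g;            // what is left of i divides factor exactly
            try {
                result = Math.multiplyExact(result / g, factor / divisor);
            } catch (ArithmeticException overflow) {
                return Long.MAX_VALUE;       // saturate instead of returning a wrapped value
            }
        }
        return result;
    }

    private static long gcd(long a, long b) {
        while (b != 0) {
            long t = a % b;
            a = b;
            b = t;
        }
        return a;
    }
}
```

With this sketch, combi(61, 30) gives the same 232714176627630544 quoted above.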
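To answer your first question (why did you get zero): long arithmetic wraps modulo 2^64, and the wrapped values of fact() eventually hit a result with all 64 bits zero! Each even factor contributes at least one factor of 2, and by 66! the product contains 64 factors of 2, so it is a multiple of 2^64 and every later product stays zero. Change your fact code to this to see it happen: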
public static long fact(int n) {
long rs = 1;
if( n <2) return 1;
for (int i=2; i<=n; i++) {
rs *=i;
System.out.println(rs);
}
return rs;
}
Take a look at the outputs! They are very interesting.
Now onto the second question....
It looks like you want to give exact integer (er, long) answers for values of n and r that fit, and throw an exception if they do not. This is a fair exercise.
To do this properly you should not use factorial at all. The trick is to recognize that C(n,r) can be computed incrementally by adding terms. This can be done using recursion with memoization, or by the multiplicative formula mentioned by Stefan Kendall.
As you accumulate the results into a long variable that you will use for your answer, check the value after each addition to see if it goes negative. When it does, throw an exception. If it stays positive, you can safely return your accumulated result as your answer.
To see why this works consider Pascal's triangle
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
1 6 15 20 15 6 1
which is generated like so:
C(0,0) = 1 (base case)
C(1,0) = 1 (base case)
C(1,1) = 1 (base case)
C(2,0) = 1 (base case)
C(2,1) = C(1,0) + C(1,1) = 2
C(2,2) = 1 (base case)
C(3,0) = 1 (base case)
C(3,1) = C(2,0) + C(2,1) = 3
C(3,2) = C(2,1) + C(2,2) = 3
...
When computing the value of C(n,r) using memoization, store the results of recursive invocations as you encounter them in a suitable structure such as an array or hashmap. Each value is the sum of two smaller numbers. The numbers start small and are always positive. Whenever you compute a new value (let's call it a subterm) you are adding smaller positive numbers. Recall from your computer organization class that whenever you add two modular positive numbers, there is an overflow if and only if the sum is negative. It only takes one overflow in the whole process for you to know that the C(n,r) you are looking for is too large.
This line of argument could be turned into a nice inductive proof, but that might be for another assignment, and perhaps another StackExchange site.
ADDENDUM
Here is a complete application you can run. (I haven't figured out how to get Java to run on codepad and ideone).
/**
* A demo showing how to do combinations using recursion and memoization, while detecting
* results that cannot fit in 64 bits.
*/
public class CombinationExample {
/**
* Returns the number of combinations of r things out of n total.
*/
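public static long combi(int n, int r) {
// Validate before allocating: a negative n would otherwise fail in the array
// allocation, and a negative r would index outside the cache.
if (n < 0 || r < 0 || r > n) {
throw new IllegalArgumentException("Nonsense args");
}
long[][] cache = new long[n + 1][n + 1];
return c(n, r, cache);
}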
/**
* Recursive helper for combi.
*/
private static long c(int n, int r, long[][] cache) {
if (r == 0 || r == n) {
return cache[n][r] = 1;
} else if (cache[n][r] != 0) {
return cache[n][r];
} else {
cache[n][r] = c(n-1, r-1, cache) + c(n-1, r, cache);
if (cache[n][r] < 0) {
throw new RuntimeException("Woops too big");
}
return cache[n][r];
}
}
/**
* Prints out a few example invocations.
*/
public static void main(String[] args) {
String[] data = ("0,0,3,1,4,4,5,2,10,0,10,10,10,4,9,7,70,8,295,100," +
"34,88,-2,7,9,-1,90,0,90,1,90,2,90,3,90,8,90,24").split(",");
for (int i = 0; i < data.length; i += 2) {
int n = Integer.valueOf(data[i]);
int r = Integer.valueOf(data[i + 1]);
System.out.printf("C(%d,%d) = ", n, r);
try {
System.out.println(combi(n, r));
} catch (Exception e) {
System.out.println(e.getMessage());
}
}
}
}
Hope it is useful. It's just a quick hack so you might want to clean it up a little.... Also note that a good solution would use proper unit testing, although this code does give nice output.
You can use the java.math.BigInteger class to deal with arbitrarily large numbers.
If you make the return type double, it can handle up to fact(170), but you'll lose some precision because of the nature of double (I don't know why you'd need exact precision for such huge numbers).
For input over 170, the result is infinity
Note that java.lang.Long includes constants for the min and max values for a long.
When you add together two signed 2s-complement positive values of a given size, and the result overflows, the result will be negative. Bit-wise, it will be the same bits you would have gotten with a larger representation, only the high-order bit will be truncated away.
Multiplying is a bit more complicated, unfortunately, since you can overflow by more than one bit.
But you can multiply in parts. Basically you break the two factors into low and high halves (or more than that, if you already have an "overflowed" value), perform the four possible multiplications between the four halves, then recombine the results. (It's really just like doing decimal multiplication by hand, but each "digit" is, say, 32 bits.)
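If all you need is to know whether an overflow would happen, rather than the full wide product, a simpler check than splitting into halves is enough. The sketch below is an assumption about what is needed here, not the split multiplication described above; the class and method names are hypothetical, and both methods assume non-negative arguments.

```java
final class OverflowChecks {
    /** For non-negative a and b, the wrapped sum is negative exactly when it overflowed. */
    static boolean additionOverflows(long a, long b) {
        return a + b < 0;
    }

    /**
     * For non-negative a and b, a wrapped product can have any sign,
     * so compare against the largest safe factor before multiplying.
     */
    static boolean multiplicationOverflows(long a, long b) {
        if (a == 0 || b == 0) {
            return false;
        }
        return a > Long.MAX_VALUE / b;
    }
}
```

Math.addExact and Math.multiplyExact in the standard library perform equivalent checks and throw ArithmeticException instead of returning a flag.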
You can copy the code from java.math.BigInteger to deal with arbitrarily large numbers. Go ahead and plagiarize.
I have been told I have to write a BigInteger class. I know there is one, but I have to write my own. I am to take either ints or a string and turn them into arrays to store them. From there I am to allow adding, subtracting, and multiplying of the numbers. Taking the ints and the string and making the arrays was fine. I am having issues with the rest.
For the add, I have tried to make something that checks the sizes of the two digit arrays and then determines which is smaller and which is bigger. From there it loops until it gets to the end of the smaller one, and as it loops it takes the digit at that position in each of the two numbers and adds them. This is OK until the sum is 10 or greater, in which case I need to carry over a digit. I think I had that working at one point too.
Keep in mind the two things my BigInt has are the array of digits and an int for the sign, 1 or -1.
So in this case I am having issues with it adding correctly and getting the sign right. Same with subtracting.
As for multiplying, I am completely lost on that and haven't even tried. Below is some of the code I have tried (the add function). PLEASE HELP ME.
public BigInt add(BigInt val){
int[] bigger;
int[] smaller;
int[] dStore;
int carryOver = 0;
int tempSign = 1;
if(val.getSize() >= this.getSize()){
bigger = val.getData();
smaller = this.getData();
dStore = new int[val.getSize()+2];
if(val.getSign() == 1){
tempSign = 1;
}else{
tempSign = -1;
}
}else{
bigger = this.getData();
smaller = val.getData();
dStore = new int[this.getSize()+2];
if(this.getSign() == 1){
tempSign = 1;
}else{
tempSign = -1;
}
}
for(int i=0;i<smaller.length;i++){
if((bigger[i] < 0 && smaller[i] < 0) || (bigger[i] >= 0 && smaller[i] >= 0)){
dStore[i] = Math.abs(bigger[i]) + Math.abs(smaller[i]) + carryOver;
}else if((bigger[i] <= 0 || smaller[i] <= 0) && (bigger[i] > 0 || smaller[i] > 0)){
dStore[i] = bigger[i] + smaller[i];
dStore[i] = Math.abs(dStore[i]);
}
if(dStore[i] >= 10){
dStore[i] = dStore[i] - 10;
if(i == smaller.length - 1){
dStore[i+1] = 1;
}
carryOver = 1;
}else{
carryOver = 0;
}
}
for(int i = smaller.length;i<bigger.length;i++){
dStore[i] = bigger[i];
}
BigInt rVal = new BigInt(dStore);
rVal.setSign(tempSign);
return rVal;
}
If you know how to add and multiply big numbers by hand, implementing those algorithms in Java won't be difficult.
If their signs differ, you'll need to actually subtract the digits (and borrow if appropriate). Also, it looks like your carry function doesn't work to carry past the length of the smaller number (the carried "1" gets overwritten).
To go further into signs, you have a few different cases (assume that this is positive and val is negative for these cases):
If this has more digits, then you'll want to subtract val from this, and the result will be positive
If val has more digits, then you'll want to subtract this from val, and the result will be negative
If they have the same number of digits, you'll have to scan to find which is larger (start at the most significant digit).
Of course if both are positive then you just add as normal, and if both are negative you add, then set the result to be negative.
Now that we know the numbers are stored in reverse...
I think your code works if the numbers both have the same sign. I tried the following test cases:
Same length, really basic test.
Same length, carryover in the middle.
Same length, carryover at the end.
Same length, carryover in the middle and at the end
First number is longer, carryover in the middle and at the end
Second number is longer, carryover in the middle and at the end
Both negative, first number is longer, carryover in the middle and at the end
This all worked out just fine.
However, when one is positive and one is negative, it doesn't work properly.
This isn't too surprising, because -1 + 7 is actually more like subtraction than addition. You should think of it as 7-1, and it'll be much easier for you if you check for this case and instead call subtraction.
Likewise, -1 - 1 should be considered addition, even though it looks like subtraction.
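The sign handling discussed above can be sketched as follows. This is not the asker's BigInt class: the Num holder and all method names are hypothetical, digits are assumed to be stored least significant first with one decimal digit per int, and inputs are assumed to have no leading zeros. Same signs add the magnitudes; different signs subtract the smaller magnitude from the larger and take the sign of the larger.

```java
import java.util.Arrays;

final class SignedDigitAdd {
    /** Minimal stand-in for a BigInt: a sign (1 or -1) and little-endian decimal digits. */
    static final class Num {
        final int sign;
        final int[] digits;
        Num(int sign, int[] digits) { this.sign = sign; this.digits = digits; }
    }

    static Num add(Num x, Num y) {
        if (x.sign == y.sign) {
            return new Num(x.sign, addMagnitudes(x.digits, y.digits));
        }
        int cmp = compareMagnitudes(x.digits, y.digits);
        if (cmp == 0) {
            return new Num(1, new int[] {0});          // equal magnitudes cancel out
        }
        return cmp > 0
                ? new Num(x.sign, subtractMagnitudes(x.digits, y.digits))
                : new Num(y.sign, subtractMagnitudes(y.digits, x.digits));
    }

    static int[] addMagnitudes(int[] a, int[] b) {
        int n = Math.max(a.length, b.length);
        int[] out = new int[n + 1];                    // room for a final carry
        int carry = 0;
        for (int i = 0; i < n; i++) {                  // the carry runs past the shorter number
            int sum = carry + (i < a.length ? a[i] : 0) + (i < b.length ? b[i] : 0);
            out[i] = sum % 10;
            carry = sum / 10;
        }
        out[n] = carry;
        return trim(out);
    }

    /** Requires the magnitude of a to be at least the magnitude of b. */
    static int[] subtractMagnitudes(int[] a, int[] b) {
        int[] out = new int[a.length];
        int borrow = 0;
        for (int i = 0; i < a.length; i++) {
            int diff = a[i] - borrow - (i < b.length ? b[i] : 0);
            if (diff < 0) { diff += 10; borrow = 1; } else { borrow = 0; }
            out[i] = diff;
        }
        return trim(out);
    }

    /** More digits means larger; otherwise scan from the most significant digit. */
    static int compareMagnitudes(int[] a, int[] b) {
        if (a.length != b.length) return Integer.compare(a.length, b.length);
        for (int i = a.length - 1; i >= 0; i--) {
            if (a[i] != b[i]) return Integer.compare(a[i], b[i]);
        }
        return 0;
    }

    /** Drops leading zeros, which sit at the end of a little-endian array. */
    static int[] trim(int[] d) {
        int len = d.length;
        while (len > 1 && d[len - 1] == 0) len--;
        return Arrays.copyOf(d, len);
    }

    /** Demo helper only: converts small values back to a long for checking. */
    static long toLong(Num x) {
        long v = 0;
        for (int i = x.digits.length - 1; i >= 0; i--) v = v * 10 + x.digits[i];
        return x.sign * v;
    }
}
```

Subtraction can then reuse add by flipping the sign of the second operand.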
I actually wrote a big-number library in assembly some years ago; I can add the multiplication code here if that helps. My advice is not to try to invent the algorithms on your own. There are already well-known ways to add, subtract, multiply, divide, powmod, xgcd and more with big numbers. I remember reading Bruce Schneier's Applied Cryptography and The Art of Assembly by Randall Hyde for this. Both have the needed algorithms (also in pseudocode). I would highly advise that you take a look, especially at the second one, which is a free online resource.
public class Main {
public static void main(String args []){
long numberOfPrimes = 0; //Initialises variable numberOfPrimes to 0 (same for all other variables)
int number = 1;
int maxLimit = 10000000;
boolean[] sieve = new boolean[maxLimit]; //creates new boolean array called sieve and allocates space on the
//stack for this array which has maxLimit spaces in it
for ( int i = 2; i < maxLimit; i++ ) { //for statement cycling from 2 to 10000000, does not execute the rest
//of the block if the boolean value in the array is true
if ( sieve[i] == true ) continue;
numberOfPrimes++; //otherwise it increments the number of prime numbers found
if ( numberOfPrimes == 10001 ) { //if 10001st prime number is found, break from loop
number = i;
break;
}
for ( int j = i+i; j < maxLimit; j += i ) //do not understand the point of this loop logically
sieve[j] = true; //testing if the value in the array is true again?
}
System.out.println("10001st prime: "+ number);
}
}
I don't really understand what is going on in this program and was hoping somebody could explain it to me? I have commented the specific lines causing me trouble/what I understand lines to be doing. Thank you very much for all the help! :)
Make yourself familiar with the Sieve of Eratosthenes algorithm. Wikipedia even has an animated GIF demonstrating the process. Your code is just a straightforward implementation of it.
Yes, this is your basic implementation of Eratosthenes' Sieve. There are quite a few ways in which you can improve it, but let's go over the basic principle first.
What you are doing is creating an array of boolean values. The INDEX in the array represents the number which we are testing to see if it is a prime or not.
Now you are going to start checking each number to see if it is a prime. First off, the definition of a prime is "a whole number greater than 1 that is divisible ONLY by itself and 1".
for ( int i = 2; i < maxLimit; i++ )
You start with the INDEX 2 (the number 2), because 0 and 1 are not prime and 2 is the smallest prime.
if ( sieve[i] == true ) continue;
If a number has been marked as a non-prime previously, we don't bother with the current iteration.
numberOfPrimes++;
if ( numberOfPrimes == 10001 ) {
number = i;
break;
}
If the INDEX we are at currently has not been marked as a non-prime, it has to be a prime, so we increment the number of primes we have found. The next piece of code, I'm assuming, is part of the requirements of the program, which state that once 10001 primes have been found the program must stop. That part can be left out if you actually want to check for primes up to the maximum number defined instead of for a specific number of primes.
for ( int j = i+i; j < maxLimit; j += i )
sieve[j] = true;
This is where the actual magic of the sieve starts. From the definition of a prime, a number cannot be a prime if it is divisible by anything other than itself and 1. Therefore, for any new prime we find, we can mark all of its multiples as NOT being prime. For example, in the first iteration of the outer for loop, we start with 2. Because sieve[2] is false (it has not been marked before), 2 is a prime. Then all other multiples of 2 CANNOT be primes. The inner for loop goes through the rest of the sieve and marks every multiple of 2 as true (non-prime). So that loop will do: sieve[4] = true; sieve[6] = true; ... up until the end of the sieve.
Now, when you reach the first number greater than the maximum defined initially, you can be certain that any number that has a factor has been marked as not being a prime. What you end up with is a boolean array where each index from 2 upward that is still marked false represents a prime number.
You can probably get a much better description on Wikipedia, but this is the gist of it. Hope it helps!
for ( int j = i+i; j < maxLimit; j += i ) //dont understand the point of this loop logically
sieve[j] = true; //testing if the value in the array is true again ?
This loop is not testing anything; it is setting values. It sets all the items in the array whose indexes are multiples of i to true. When i is 2, the items 4, 6, 8 ... will be set to true. When i is 3, the items 6, 9, 12 ... will be set to true, and so on.
And as you can deduce by the first if,
if ( sieve[i] == true ) continue;
... all the items that are true correspond to non-prime numbers.
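Putting the marking loop and the skip check together, the program from the question can be condensed into the following sketch. The class and method names are hypothetical; the array is called composite here only to make the meaning of true explicit, and the -1 return for a limit that is too small is an added assumption.

```java
final class NthPrimeSieve {
    /**
     * Returns the n-th prime (1-based) found below maxLimit,
     * or -1 if fewer than n primes exist below maxLimit.
     */
    static int nthPrime(int n, int maxLimit) {
        boolean[] composite = new boolean[maxLimit];   // false means "not yet ruled out"
        int found = 0;
        for (int i = 2; i < maxLimit; i++) {
            if (composite[i]) {
                continue;                              // i is a multiple of an earlier prime
            }
            found++;                                   // i survived, so it is prime
            if (found == n) {
                return i;
            }
            // Set (not test) every multiple of i; long avoids int overflow near the limit.
            for (long j = (long) i + i; j < maxLimit; j += i) {
                composite[(int) j] = true;
            }
        }
        return -1;
    }
}
```

Calling NthPrimeSieve.nthPrime(10001, 10000000) corresponds to the program in the question.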
I find the easiest way to understand something is to deconstruct it. Therefore, let's go through the loop a few times, shall we?
Dawn of the First Iteration
− 9999998 Values Remain −
i = 2
sieve[2] is false, so we keep going in the current iteration.
numberOfPrimes = 1 and thus we continue processing
Set every multiple of 2 to true in sieve[].
Dawn of the Second Iteration
− 9999997 Values Remain −
i = 3
sieve[3] is false, so we keep going in the current iteration.
numberOfPrimes = 2 and thus we continue processing
Set every multiple of 3 to true in sieve[].
Dawn of the Third Iteration
− 9999996 Values Remain −
i = 4
sieve[4] is true (from first iteration). Skip to next iteration.
etc... but in this case, the moon doesn't crash into Termina.
The loop in question isn't checking for true values, it's setting true values.
It's going through each multiple of the prime and marking it as non-prime up to maxLimit. You'll notice there's no other math in the code to determine what's prime and what's not.
This is the algorithm for finding the prime numbers between 1 and the given maximum limit.
The inner loop marks as true every number that is divisible by some smaller number. So on the first pass of the outer loop all multiples of 2 are set to true, then the multiples of 3, then 5 (4 is skipped because it is already marked), and so on. The numbers from 2 upward for which the boolean array still contains false are the prime numbers.