XOR Neural Network(FF) converges to 0.5


I've created a program that lets me build flexible neural networks of any size/length; however, I'm testing it with the simple structure of an XOR setup (feed-forward, sigmoid activation, backpropagation, no batching).
EDIT: The following is a completely new approach to my original question which didn't supply enough information
EDIT 2: I started my weight between -2.5 and 2.5, and fixed a problem in my code where I forgot some negatives. Now it either converges to 0 for all cases or 1 for all, instead of 0.5
Everything works exactly the way that I THINK it should; however, it is converging toward 0.5 instead of switching between outputs of 0 and 1. I've gone through and hand-calculated an entire pass of feeding forward, calculating delta errors, backpropagating, etc., and it matched what I got from the program. I have also tried tuning it by changing the learning rate and momentum, as well as increasing the complexity of the network (more neurons/layers).
Because of this, I assume that either one of my equations is wrong, or I have some other sort of misunderstanding in my Neural Network. The following is the logic with equations that I follow for each step:
I have an input layer with two inputs and a bias, a hidden with 2 neurons and a bias, and an output with 1 neuron.
Take the input from each of the two input neurons and the bias neuron, then multiply them by their respective weights, and then add them together as the input for each of the two neurons in the hidden layer.
Take the input of each hidden neuron, pass it through the Sigmoid activation function (Reference 1) and use that as the neuron's output.
Take the outputs of each neuron in hidden layer (1 for the bias), multiply them by their respective weights, and add those values to the output neuron's input.
Pass the output neuron's input through the Sigmoid activation function, and use that as the output for the whole network.
Calculate the Delta Error(Reference 2) for the output neuron
Calculate the Delta Error(Reference 3) for each of the 2 hidden neurons
Calculate the Gradient(Reference 4) for each weight (starting from the end and working back)
Calculate the Delta Weight(Reference 5) for each weight, and add that to its value.
Start the process over by changing the inputs and expected output (Reference 6).
Here are the specifics of those references to equations/processes (This is probably where my problem is!):
Reference 1: x is the input of the neuron: (1/(1 + Math.pow(Math.E, (-1 * x))))
Reference 2: -1*(actualOutput - expectedOutput)*(Sigmoid(x) * (1 - Sigmoid(x))) // Same sigmoid used in reference 1
Reference 3: SigmoidDerivative(Neuron.input)*(The sum of(Neuron.Weights * the deltaError of the neuron they connect to))
Reference 4: ParentNeuron.output * NeuronItConnectsTo.deltaError
Reference 5: learningRate*(weight.gradient) + momentum*(Previous Delta Weight)
Reference 6: I have an ArrayList with the values 0,1,1,0 in it, in that order. It takes the first pair (0,1) and then expects a 1. The second time through, it takes the second pair (1,1) and expects a 0. It just keeps iterating through the list for each new set. Perhaps training it in this systematic way causes the problem?
Like I said before, the reason I don't think it's a code problem is that it matched exactly what I had calculated with paper and pencil (which wouldn't have happened if there were a coding error).
Also when I initialize my weights the first time, I give them a random double value between 0 and 1. This article suggests that that may lead to a problem: Neural Network with backpropogation not converging
Could that be it? I used the n^(-1/2) rule but that did not fix it.
If I can be more specific or you want other code let me know, thanks!

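This is wrong:
SigmoidDerivative(Neuron.input)*(The sum of(Neuron.Weights * the deltaError of the neuron they connect to))
Of the two functions below, the first (g) is the sigmoid activation and the second (gD) is the derivative of the sigmoid activation. Note that gD takes the already-activated value g(z), not the raw input z.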
private double g(double z) {
return 1 / (1 + Math.exp(-z));
}
private double gD(double gZ) {
return gZ * (1 - gZ);
}
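To make the two functions above concrete, here is a minimal sketch of how they could feed into the delta calculations for one output neuron and one hidden neuron. The class and method names are hypothetical, and the sketch assumes a squared-error loss with the sign convention delta = (actual - expected) * g'(z); it is not taken from the asker's program.

```java
// Hypothetical helper class; names are illustrative, not from the question's code.
final class DeltaSketch {
    /** Sigmoid activation g(z). */
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** Sigmoid derivative written in terms of the output a = g(z): g'(z) = a * (1 - a). */
    static double sigmoidDerivativeFromOutput(double a) {
        return a * (1.0 - a);
    }

    /** Output delta, assuming squared error and delta = (a - y) * g'(z). */
    static double outputDelta(double actual, double expected) {
        return (actual - expected) * sigmoidDerivativeFromOutput(actual);
    }

    /** Hidden delta: g'(z) times the weighted sum of the deltas this neuron feeds into. */
    static double hiddenDelta(double hiddenOutput, double[] outgoingWeights, double[] nextDeltas) {
        double sum = 0.0;
        for (int k = 0; k < outgoingWeights.length; k++) {
            sum += outgoingWeights[k] * nextDeltas[k];
        }
        return sum * sigmoidDerivativeFromOutput(hiddenOutput);
    }

    public static void main(String[] args) {
        double a = sigmoid(0.0);
        double dOut = outputDelta(a, 1.0);
        double dHidden = hiddenDelta(0.5, new double[] {2.0}, new double[] {dOut});
        System.out.println(a + " " + dOut + " " + dHidden);
    }
}
```

With this sign convention the weight update subtracts learningRate * delta * input; if the delta is negated, as in the question, the update must add instead. Mixing the two conventions is an easy way to end up with a network that drifts to a constant output.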
Unrelated note: your notation of (-1*x) is really strange; just use -x.
From the way you phrase the steps of your ANN, the implementation seems poorly structured. Try to focus on implementing forward propagation and backpropagation, and then an UpdateWeights method.
Creating a matrix class
This is my Java implementation; it's very simple and somewhat rough. I use a Matrix class to make the math behind it appear very simple in code.
If you can code in C++, you can overload operators, which makes it even easier to write comprehensible code.
https://github.com/josephjaspers/ArtificalNetwork/blob/master/src/artificalnetwork/ArtificalNetwork.java
Here are the algorithms (C++)
All of this code can be found on my GitHub (the neural nets are simple and functional).
Each layer includes the bias nodes, which is why there are offsets
void NeuralNet::forwardPropagation(std::vector<double> data) {
setBiasPropogation(); //sets all the bias nodes activation to 1
a(0).set(1, Matrix(data)); //1 to offset for bias unit (A = X)
for (int i = 1; i < layers; ++i) {
// (set(1 -- offsets the bias unit
z(i).set(1, w(i - 1) * a(i - 1));
a(i) = g(z(i)); // g(z) is the sigmoid function
}
}
void NeuralNet::setBiasPropogation() {
for (int i = 0; i < activation.size(); ++i) {
a(i).set(0, 0, 1);
}
}
outLayer D = A - Y (y is the output data)
hiddenLayers d^l = (w^l(T) * d^l+1) *: gD(a^l)
d = derivative vector
W = weights matrix (Length = connections, width = features)
a = activation matrix
gD = derivative function
^l = IS NOT POWER OF (this just means at layer l)
* = dot product (matrix product)
*: = multiply (multiply each element "through")
cpy(n) returns a copy of the matrix offset by n (ignores n rows)
void NeuralNet::backwardPropagation(std::vector<double> output) {
d(layers - 1) = a(layers - 1) - Matrix(output);
for (int i = layers - 2; i > -1; --i) {
d(i) = (w(i).T() * d(i + 1).cpy(1)).x(gD(a(i)));
}
}
Explaining this code may be confusing without images, so I'm including this link, which I think is a good source; it also contains an explanation of backpropagation which may be better than my own.
http://galaxy.agh.edu.pl/~vlsi/AI/backp_t_en/backprop.html
void NeuralNet::updateWeights() {
// the operator () (int l, int w) returns a double reference at that position in the matrix
// the operator [] (int n) returns the nth double (reference) in the matrix (useful for vectors)
for (int l = 0; l < layers - 1; ++l) {
for (int i = 1; i < d(l + 1).length(); ++i) {
for (int j = 0; j < a(l).length(); ++j) {
w(l)(i - 1, j) -= (d(l + 1)[i] * a(l)[j]) * learningRate + m(l)(i - 1, j);
m(l)(i - 1, j) = (d(l + 1)[i] * a(l)[j]) * learningRate * momentumRate;
}
}
}
}

Related

Java finding Zero Points with Newton Algorithm endless loop

I'm having a problem with a method that takes a polynomial like f(x)=x²+1 and calculates possible zero points with the Newton algorithm.
I have given requirements for specific variables so even if the naming is not good or a variable is not needed I have to use them :/
The Polynom I give my method as a parameter is a double-array: For f(x)=x²+1 it would be {1.0,0.0,1.0}
so its constructed like 1.0*x^0 + 0.0*x^1+1.0*x^2
For my Code:
x0 is the start value for the newton algorithm and eps is for the accuracy of the calculation
I followed my given Instructions and got the following code working:
public static double newton(double[] a, double x0, double eps) {
double z;
double xn;
double xa = x0;
double zaehler;
double nenner;
do {
zaehler = horner(a, xa);
nenner = horner(ableit(a), xa);
if(nenner == 0) {
return Double.POSITIVE_INFINITY;
}
xn = xa - (zaehler/nenner);
xa = xn;
} while((Math.abs(horner(a, xn))) >= eps);
z = xn;
return z;
}
the method horner() calculates the y-Value of a given function for a given x-Value.
My problem is that if the function doesn't have a zero point, like x²+1, and I start with x0=1 and eps=0.1, I get Infinity returned.
But if I start with x0=10 and eps=0.1, for example, I create an endless loop.
How can I deal with this or is this a general Problem with the Newton Algorithm?!
Is the only way to set a fixed maximum of Iterations?
The Code is working for Polynoms that have at least one zero-point!

The Newton–Raphson method requires the existence of a real root x such that f(x)=0. The function you use x^2+1 has no real roots, so your algorithm will not work in this case (nor in others where there is no root).
Since x^2+1 >= 1 for all real x this implies horner(a, xn) >= 1, so the loop
while((Math.abs(horner(a, xn))) >= eps)
will not terminate for eps <= 1.
Maybe before starting to iterate, you should check the existence of a zero.
E.g. if the degree of the polynomial (the highest power of x with a nonzero coefficient) is odd, then there will be a real zero.
Or extend your algorithm so that it first tries to find some real a and b such that f(a)f(b) <= 0 (then there is a root between a and b).
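Since Newton's method cannot detect a missing root on its own, a common safeguard is an iteration cap. The following is only a sketch under assumptions: the class name, the maxIterations parameter, and the choice to return NaN on failure are all hypothetical and not part of the asker's required signature. Coefficients are ordered as in the question, a[0] + a[1]*x + a[2]*x^2.

```java
// Hypothetical sketch: Newton iteration with an iteration cap.
final class NewtonSketch {
    /** Evaluates a[0] + a[1]*x + ... + a[n]*x^n with Horner's scheme. */
    static double horner(double[] a, double x) {
        double result = 0.0;
        for (int i = a.length - 1; i >= 0; i--) {
            result = result * x + a[i];
        }
        return result;
    }

    /** Coefficients of the derivative polynomial. */
    static double[] derivative(double[] a) {
        double[] d = new double[Math.max(a.length - 1, 0)];
        for (int i = 1; i < a.length; i++) {
            d[i - 1] = i * a[i];
        }
        return d;
    }

    /** Returns an approximate root, or NaN if none is found within maxIterations steps. */
    static double newton(double[] a, double x0, double eps, int maxIterations) {
        double[] da = derivative(a);
        double x = x0;
        for (int i = 0; i < maxIterations; i++) {
            if (Math.abs(horner(a, x)) < eps) {
                return x; // close enough to a zero
            }
            double slope = horner(da, x);
            if (slope == 0.0) {
                return Double.NaN; // horizontal tangent: the step is undefined
            }
            x -= horner(a, x) / slope;
        }
        return Math.abs(horner(a, x)) < eps ? x : Double.NaN;
    }

    public static void main(String[] args) {
        System.out.println(newton(new double[] {-2.0, 0.0, 1.0}, 1.0, 1e-12, 100)); // x^2 - 2
        System.out.println(newton(new double[] {1.0, 0.0, 1.0}, 10.0, 0.1, 100));   // x^2 + 1
    }
}
```

Returning NaN keeps "no root found" distinguishable from a legitimate result; the caller checks it with Double.isNaN.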

Compute a table in every possible way

I would like to compute a table with values in "every possible way" by multiplying one value from each column to a product. I would preferably solve the problem in Java. The table is of size n*m. It could for example be of size 3*5 and containing:
0.5, 3.0, 5.0, 4.0, 0.75
0.5, 3.0, 5.0, 4.0, 0.75
0.5, 9.0, 5.0, 4.0, 3.0
One way of getting the product would be:
0.5 * 3.0 * 5.0 * 4.0 * 0.75
How do I compute this in "every possible way" when the table is of size n*m? I would like to write one program (presumably containing loops) that works for every n*m table.

You could do it recursively, as the other answer mentioned, but in general I find Java is somewhat unhappy with recursion. One other method would be to keep track of a "signature" of where you are in the table (i.e., an array of length m where each value satisfies 0 <= val < n). Each signature uniquely specifies a path through the table, and you can compute the value from a given signature pretty easily:
double val = 1.;
for (int j=0; j<m; j++)
val *= table[signature[j]][j]; // assumes table[row][column]
To iterate through all signatures, think of them as (up to) m-digit numbers in base n and simply increment through, making sure to carry when you get above n. Here's some untested, unoptimized, probably badly indexed sample code:
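// needs java.util.ArrayList and java.util.List
int[] sig = new int[m]; // one row index per column, all starting at 0
List<Double> values = new ArrayList<>(); // will hold n^m products
boolean done = false;
while (!done) {
    values.add(getValue(table, sig)); // getValue: the product loop shown above
    // increment sig like an m-digit counter in base n
    int j = 0;
    while (j < m && ++sig[j] == n) {
        sig[j] = 0; // digit overflowed: reset it and carry into the next one
        j++;
    }
    done = (j == m); // carried past the last digit: every signature was visited
}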

Create a recursive method that makes two calls: one call where you use a number in a column in the final product, and one call where you do not. In the call where you do not use it, you make two more calls, one where you use the next number in the column and one where you do not, and so on. When you do use a number, you go to the next column, effectively building a recursion tree of sorts where each leaf is a different combination for the product.
You would not need any data structure for this besides your table and it would be able to take in any size of table. If you do not understand the method I have described I can provide some short example code but it is fairly simple.
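method findProducts(double product, int x, int y)
if (x and y are in bounds of the table)
findProducts(product * table[x][y], 0, y+1) // use this value, move to the next column
findProducts(product, x+1, y) // skip this value, try the next one in the column
else
print(product)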
Something like this, a counter would be useful so you could only print those values that are combinations of y numbers, the amount of rows.
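For comparison, here is a simplified recursive sketch that loops over the rows of each column instead of making the two "use / do not use" calls described above. The class and method names are hypothetical, and the table is assumed to be rectangular and indexed as table[row][column].

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: one recursion level per column, one branch per row.
final class TableProducts {
    /** Appends to out every product formed by picking one value from each remaining column. */
    static void collect(double[][] table, int column, double product, List<Double> out) {
        if (column == table[0].length) {
            out.add(product); // one value has been chosen from every column
            return;
        }
        for (double[] row : table) {
            collect(table, column + 1, product * row[column], out);
        }
    }

    /** Returns all n^m products for a table with n rows and m columns. */
    static List<Double> allProducts(double[][] table) {
        List<Double> out = new ArrayList<>();
        collect(table, 0, 1.0, out);
        return out;
    }

    public static void main(String[] args) {
        double[][] table = {
            {0.5, 3.0, 5.0, 4.0, 0.75},
            {0.5, 3.0, 5.0, 4.0, 0.75},
            {0.5, 9.0, 5.0, 4.0, 3.0}
        };
        System.out.println(allProducts(table).size() + " products");
    }
}
```

The recursion depth equals the number of columns, so stack usage stays small even though the number of products grows as n^m.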

How do I generate normal cumulative distribution in Java? its inverse cdf? How about lognormal?

I am brand new to Java, second day! I want to generate samples from a normal distribution. I am using the inverse transformation method.
Basically, I want to find the normal cumulative distribution, then find its inverse, and generate samples.
My question is: Is there a built-in function for the inverse normal cdf? Or do I have to hand code it?
I have seen people refer to this on apache commons. Is this a built-in? Or do I have to download it?
If I have to do it myself, can you give me some tips? If I download, doesn't my prof also have to have the "package" or special file installed?
Thanks in advance!
Edit:Just found I can't use libraries, also heard there is simpler way converting normal using radian.

As it is mentioned here:
Apache Commons - Math has what you are looking for.
More specifically, check out the NormalDistributionImpl class.
And no your professor doesn't need to download stuff if you provide him with all the needed libraries.
UPDATE :
If you want to hand code it (I don't know the actual formula), you can check the following link:
http://home.online.no/~pjacklam/notes/invnorm/
There are 2 people who implemented it in java: http://home.online.no/~pjacklam/notes/invnorm/#Java

I had the same problem and found a solution. The following code gives results for the cumulative distribution function just like Excel does:
private static double erf(double x)
{
//A&S formula 7.1.26
double a1 = 0.254829592;
double a2 = -0.284496736;
double a3 = 1.421413741;
double a4 = -1.453152027;
double a5 = 1.061405429;
double p = 0.3275911;
x = Math.abs(x);
double t = 1 / (1 + p * x);
//Direct calculation using formula 7.1.26 is absolutely correct
//But calculation of nth order polynomial takes O(n^2) operations
//return 1 - (a1 * t + a2 * t * t + a3 * t * t * t + a4 * t * t * t * t + a5 * t * t * t * t * t) * Math.Exp(-1 * x * x);
//Horner's method, takes O(n) operations for nth order polynomial
return 1 - ((((((a5 * t + a4) * t) + a3) * t + a2) * t) + a1) * t * Math.exp(-1 * x * x);
}
public static double NORMSDIST(double z)
{
double sign = 1;
if (z < 0) sign = -1;
double result=0.5 * (1.0 + sign * erf(Math.abs(z)/Math.sqrt(2)));
return result;
}

Mathematically, this is a hard problem, and there are a few solutions you might consider.
Disclaimer: Mathematical jargon ahead.
As you probably already know, the normalcdf function is used to calculate probabilities of normal random variables. Because a normal distribution is continuous, the corresponding probability density function (normalpdf) does not itself give probabilities, (in contrast to discrete distributions like binomial or geometric distributions). Instead, the area under the curve gives the probability that the normal random variable falls within a range of values. So, the normalcdf function you seek is the area under a section of the normalpdf function.
Mathematically, finding the area under a continuous curve is a fundamental problem of calculus. The solution to this type of problem is called an integral and integrating a function over a range of numbers means finding the area under the curve and between the lowest value in the range to the highest.
In most circumstances, we could just integrate the pdf function to get the cdf function, then evaluate it wherever we want. The heart of the problem, and the reason that an algorithm in Java is not as simple as one might think, is that the normalpdf function does not have a closed-form integral: its value cannot be calculated in any finite number of steps. So, values of the normalcdf function are particularly elusive.
There are two main classes of solutions for the problem.
1. Numerical Integration Techniques
Numerical integration techniques solve the problem by approximating the area under the curve geometrically. The area is divided into rectangles or other shapes of equal or varying widths, with the height of each being given by the pdf function. The sum of the areas of the rectangles is an approximation of the area under the curve, which is the corresponding probability. These techniques can be used to compute values to arbitrary precision, but they are more computationally expensive than class 2. Using better approximations (e.g. Simpson's rule) improves the computation. A simple numeric integration method follows.
public static double normCDF(double z)
{ double LeftEndpoint = -100;
int nRectangles = 100000;
double runningSum = 0;
double x;
for(int n = 0; n < nRectangles; n++){
x = LeftEndpoint + n*(z-LeftEndpoint)/nRectangles;
runningSum += Math.pow(Math.sqrt(2*Math.PI),-1)*Math.exp(-Math.pow(x,2)/2)*(z-LeftEndpoint)/nRectangles;
}
System.out.println(runningSum);
return runningSum;
}
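The Simpson's rule variant mentioned above could look roughly like the following sketch. The class name and the intervals parameter are hypothetical. Unlike the rectangle version, it assumes the symmetry of the normal curve, so it integrates only from 0 to z and adds 0.5 instead of starting at -100.

```java
// Hypothetical sketch: standard normal CDF via composite Simpson's rule.
final class SimpsonNormalCdf {
    /** Standard normal probability density. */
    static double pdf(double x) {
        return Math.exp(-0.5 * x * x) / Math.sqrt(2.0 * Math.PI);
    }

    /** Phi(z) = 0.5 + integral of pdf from 0 to z, using the given number of intervals. */
    static double cdf(double z, int intervals) {
        if (intervals % 2 != 0) {
            intervals++; // Simpson's rule needs an even number of intervals
        }
        double h = z / intervals; // negative for z < 0, which makes the integral negative
        double sum = pdf(0.0) + pdf(z);
        for (int i = 1; i < intervals; i++) {
            sum += pdf(i * h) * (i % 2 == 0 ? 2.0 : 4.0);
        }
        return 0.5 + sum * h / 3.0;
    }

    public static void main(String[] args) {
        System.out.println(cdf(1.96, 1000));
        System.out.println(cdf(-1.96, 1000));
    }
}
```

Because the integrand is smooth, a thousand intervals typically already give far more digits than the 100000 rectangles used above, at a small fraction of the cost.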
2. Analytic Techniques
Analytic techniques take advantage of the fact that while the normalpdf does not have a closed-form integral, the pdf can be "converted" to a sum called a Taylor series, then integrated term-by-term. Basically, it turns the pdf into a sum of infinitely many simple functions, then integrates each one analytically, then adds together all of the integrals. Since this is an analytic procedure, a programmer need only include the integral series in the program after computing the coefficients. The precision of the result just depends on how many terms of the sum you include in the calculation, and it tends to approach accurate values much sooner than numerical integration techniques. For example, the solution by Mohammad Aldefrawy computes just five coefficients. Below is a method that includes the computation of coefficients, so one could compute values to arbitrary precision. (Actually, the normalcdf series isn't computed directly. Instead, the coefficients of the related error function are computed, then converted by a linear transformation.) However, since computation of the coefficients involves the factorial function, one experiences memory issues for substantially large numbers of coefficients. Thankfully, this method returns values with much higher precision in a fraction of the iterations required by methods in the previous class of solutions to yield similar results.
public static double normalCDF(double x){
System.out.println(0.5*(1+erf(x/Math.sqrt(2))));
return 0.5*(1+erf(x/Math.sqrt(2)));
}
public static double erf(double z)
{
int nTerms = 315;
double runningSum = 0;
for(int n = 0; n < nTerms; n++){
runningSum += Math.pow(-1,n)*Math.pow(z,2*n+1)/(factorial(n)*(2*n+1));
}
return (2/Math.sqrt(Math.PI))*runningSum;
}
static double factorial(int n){
if(n == 0) return 1;
if(n == 1) return 1;
return n*factorial(n-1);
}
Other functions
For the inverse function, since we used the error function in the normalCDF method, we can use the inverse error function in a similar way. Again, we obtain the coefficients of the inverse error function analytically, then compute them as needed in the method.
public static double invErf(double z)
{
int nTerms = 315;
double runningSum = 0;
double[] a = new double[nTerms + 1];
double[] c = new double[nTerms + 1];
c[0]=1;
for(int n = 1; n < nTerms; n++){
double runningSum2=0;
for (int k = 0; k <= n-1; k++){
runningSum2 += c[k]*c[n-1-k]/((k+1)*(2*k+1));
}
c[n] = runningSum2;
runningSum2 = 0;
}
for(int n = 0; n < nTerms; n++){
a[n] = c[n]/(2*n+1);
runningSum += a[n]*Math.pow((0.5)*Math.sqrt(Math.PI)*z,2*n+1);
}
return runningSum;
}
public static double invNorm(double A){
return (2/Math.sqrt(2))*invErf(2*A-1);
}
I don't have a method for the lognormal function, but you could obtain one using the same idea.
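As a rough illustration of "the same idea" for the lognormal case: if ln(X) is normal with mean mu and standard deviation sigma, then the lognormal CDF is the normal CDF evaluated at (ln(x) - mu) / sigma. The sketch below assumes that parameterisation; the class and method names are hypothetical, and it uses the five-coefficient Abramowitz and Stegun 7.1.26 approximation of erf from the earlier answer rather than the series above.

```java
// Hypothetical sketch: lognormal CDF built on top of a normal CDF.
final class LogNormalSketch {
    /** erf approximation, Abramowitz and Stegun 7.1.26 (absolute error around 1.5e-7). */
    static double erf(double x) {
        double sign = x < 0 ? -1.0 : 1.0;
        x = Math.abs(x);
        double t = 1.0 / (1.0 + 0.3275911 * x);
        double poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
                - 0.284496736) * t + 0.254829592) * t;
        return sign * (1.0 - poly * Math.exp(-x * x));
    }

    /** Standard normal CDF. */
    static double normalCdf(double z) {
        return 0.5 * (1.0 + erf(z / Math.sqrt(2.0)));
    }

    /** CDF of X where ln(X) ~ Normal(mu, sigma^2); assumes sigma > 0. */
    static double logNormalCdf(double x, double mu, double sigma) {
        if (x <= 0.0) {
            return 0.0; // a lognormal variable is strictly positive
        }
        return normalCdf((Math.log(x) - mu) / sigma);
    }

    public static void main(String[] args) {
        System.out.println(logNormalCdf(1.0, 0.0, 1.0));
        System.out.println(logNormalCdf(Math.E, 0.0, 1.0));
    }
}
```

The inverse follows the same pattern in the other direction: apply the inverse normal CDF to the probability, scale by sigma, add mu, and exponentiate.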

I never tried it but the guys from algo team were using Colt and they were happy with the results.

What is a good solution for calculating an average where the sum of all values exceeds a double's limits?

I have a requirement to calculate the average of a very large set of doubles (10^9 values). The sum of the values exceeds the upper bound of a double, so does anyone know any neat little tricks for calculating an average that doesn't require also calculating the sum?
I am using Java 1.5.

You can calculate the mean iteratively. This algorithm is simple, fast, you have to process each value just once, and the variables never get larger than the largest value in the set, so you won't get an overflow.
double mean(double[] ary) {
double avg = 0;
int t = 1;
for (double x : ary) {
avg += (x - avg) / t;
++t;
}
return avg;
}
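Inside the loop, avg is always the average of all values processed so far. In other words, if all the values are finite you should not get an overflow; the one exception is the intermediate difference x - avg, which can still overflow when values of opposite sign both lie near the limits of double.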

The very first issue I'd like to ask you is this:
Do you know the number of values beforehand?
If not, then you have little choice but to sum, and count, and divide, to do the average. If Double isn't high enough precision to handle this, then tough luck, you can't use Double, you need to find a data type that can handle it.
If, on the other hand, you do know the number of values beforehand, you can look at what you're really doing and change how you do it, but keep the overall result.
The average of N values, stored in some collection A, is this:
A[0]/N + A[1]/N + A[2]/N + A[3]/N + ... + A[N-2]/N + A[N-1]/N
To calculate subsets of this result, you can split up the calculation into equally sized sets, so you can do this, for 3-valued sets (assuming the number of values is divisible by 3, otherwise you need a different divisor):
(A[0]/3 + A[1]/3 + A[2]/3) / (N/3) + (A[3]/3 + A[4]/3 + A[5]/3) / (N/3) + ... + (A[N-3]/3 + A[N-2]/3 + A[N-1]/3) / (N/3)
Note that you need equally sized sets, otherwise numbers in the last set, which will not have enough values compared to all the sets before it, will have a higher impact on the final result.
Consider the numbers 1-7 in sequence, if you pick a set-size of 3, you'll get this result:
(1/3 + 2/3 + 3/3) / y + (4/3 + 5/3 + 6/3) / y + (7/3) / y
which gives:
2/y + 5/y + (7/3)/y
If y is 3 for all the sets, you get this:
2/3 + 5/3 + (7/3)/3
which gives:
(2*3)/9 + (5*3)/9 + 7/9
which is:
6/9 + 15/9 + 7/9
which totals:
28/9 ~ 3.1111...
The average of 1-7, is 4. Obviously this won't work. Note that if you do the above exercise with the numbers 1, 2, 3, 4, 5, 6, 7, 0, 0 (note the two zeroes at the end there), then you'll get the above result.
In other words, if you can't split the number of values up into equally sized sets, the last set will be counted as though it has the same number of values as all the sets preceding it, but it will be padded with zeroes for all the missing values.
So, you need equally sized sets. Tough luck if your original input set consists of a prime number of values.
What I'm worried about here though is loss of precision. I'm not entirely sure Double will give you good enough precision in such a case, if it initially cannot hold the entire sum of the values.

Apart from using the better approaches already suggested, you can use BigDecimal to make your calculations. (Bear in mind it is immutable)

IMHO, the most robust way of solving your problem is
sort your set
split in groups of elements whose sum wouldn't overflow - since they are sorted, this is fast and easy
do the sum in each group - and divide by the group size
do the sum of the group's sum's (possibly calling this same algorithm recursively) - be aware that if the groups will not be equally sized, you'll have to weight them by their size
One nice thing of this approach is that it scales nicely if you have a really large number of elements to sum - and a large number of processors/machines to use to do the math

Please clarify the potential ranges of the values.
Given that a double has a range ~= +/-10^308, and you're summing 10^9 values, the apparent range suggested in your question is values of the order of 10^299.
That seems somewhat, well, unlikely...
If your values really are that large, then with a normal double you've got only 17 significant decimal digits to play with, so you'll be throwing away about 280 digits worth of information before you can even think about averaging the values.
I would also note (since no-one else has) that for any set of numbers X:
mean(X) = sum(X[i] - c) / N + c
for any arbitrary constant c.
In this particular problem, setting c = min(X) might dramatically reduce the risk of overflow during the summation.
May I humbly suggest that the problem statement is incomplete...?
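The shift identity above, mean(X) = sum(X[i] - c) / N + c with c = min(X), can be sketched as follows. The class name is hypothetical, and the sketch assumes a non-empty array whose spread (max minus min) is small enough that the sum of offsets itself does not overflow.

```java
// Hypothetical sketch: mean computed from offsets relative to the minimum.
final class ShiftedMean {
    /** Mean of a non-empty array, summing v - min instead of v. */
    static double mean(double[] values) {
        double c = values[0];
        for (double v : values) {
            c = Math.min(c, v); // c = min(X)
        }
        double sumOfOffsets = 0.0;
        for (double v : values) {
            sumOfOffsets += v - c; // every offset is >= 0
        }
        return sumOfOffsets / values.length + c;
    }

    public static void main(String[] args) {
        double[] huge = {1e308, 1e308, 1e308};
        System.out.println(huge[0] + huge[1] + huge[2]); // the plain sum overflows
        System.out.println(mean(huge));
    }
}
```

This needs two passes over the data (one for the minimum, one for the offsets), which matters if the values are streamed rather than held in memory.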

A double can be divided by a power of 2 without loss of precision. So if your only problem is the absolute size of the sum, you could pre-scale your numbers before summing them. But with a dataset of this size, there is still the risk that you will hit a situation where you are adding small numbers to a large one, and the small numbers will end up being mostly (or completely) ignored.
For instance, when you add 2.2e-20 to 9.0e20 the result is 9.0e20, because once the scales are adjusted so that the numbers can be added together, the smaller number is 0. Doubles can only hold about 17 digits, and you would need more than 40 digits to add these two numbers together without loss.
So, depending on your data set and how many digits of precision you can afford to lose, you may need to do other things. Breaking the data into sets will help, but a better way to preserve precision might be to determine a rough average (you may already know this number), then subtract each value from the rough average before you sum it. That way you are summing the distances from the average, so your sum should never get very large.
Then you take the average delta and add it to your rough average to get the correct average. Keeping track of the min and max delta will also tell you how much precision you lost during the summing process. If you have lots of time and need a very accurate result, you can iterate.

You could take the average of averages of equal-sized subsets of numbers that don't exceed the limit.

divide all values by the set size and then sum it up

Option 1 is to use an arbitrary-precision library so you don't have an upper-bound.
Other options (which lose precision) are to sum in groups rather than all at once, or to divide before summing.

So I don't repeat myself so much, let me state that I am assuming that the list of numbers is normally distributed, and that you can sum many numbers before you overflow. The technique still works for non-normal distributions, but some things will not meet the expectations I describe below.
--
Sum up a sub-series, keeping track of how many numbers you eat, until you approach the overflow, then take the average. This will give you an average a0, and count n0. Repeat until you exhaust the list. Now you should have many ai, ni.
Each ai and ni should be relatively close, with the possible exception of the last bite of the list. You can mitigate that by under-biting near the end of the list.
You can combine any subset of these ai, ni by picking any ni in the subset (call it np) and dividing all the ni in the subset by that value. The max size of the subsets to combine is the roughly constant value of the n's.
The ni/np should be close to one. Now sum ni/np * ai and multiply by np/(sum ni), keeping track of sum ni. This gives you a new ni, ai combination, if you need to repeat the procedure.
If you will need to repeat (i.e., the number of ai, ni pairs is much larger than the typical ni), try to keep relative n sizes constant by combining all the averages at one n level first, then combining at the next level, and so on.

First of all, make yourself familiar with the internal representation of double values. Wikipedia should be a good starting point.
Then, consider that doubles are expressed as "value plus exponent" where exponent is a power of two. The limit of the largest double value is an upper limit of the exponent, and not a limit of the value! So you may divide all large input numbers by a large enough power of two. This should be safe for all large enough numbers. You can re-multiply the result with the factor to check whether you lost precision with the multiplication.
Here we go with an algorithm
public static double average(double[] numbers) {
    double eachSum = 0, tempSum = 0;
    double factor = Math.pow(2.0, 30); // about as large as 10^9
    for (double each : numbers) {
        double temp = each / factor;
        if (temp * factor != each) {
            eachSum += each; // scaling lost precision, so sum this value unscaled
        } else {
            tempSum += temp; // scaling was exact, so sum the scaled value
        }
    }
    return (tempSum / numbers.length) * factor + (eachSum / numbers.length);
}
And don't be worried by the additional division and multiplication. The FPU will optimize the hell out of them since they are done with a power of two (for comparison, imagine adding and removing digits at the end of a decimal number).
PS: in addition, you may want to use Kahan summation to improve the precision. Kahan summation avoids loss of precision when very large and very small numbers are summed up.
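A minimal sketch of the Kahan summation mentioned in the PS, with a hypothetical class name. It carries a running compensation term that captures the low-order bits lost in each addition. Note that it improves precision only; it does not prevent overflow, so it would be combined with the scaling above rather than replace it.

```java
// Hypothetical sketch: Kahan (compensated) summation.
final class KahanSum {
    /** Sums the values while tracking the rounding error of each addition. */
    static double sum(double[] values) {
        double sum = 0.0;
        double compensation = 0.0; // low-order bits lost so far
        for (double v : values) {
            double y = v - compensation;  // re-apply the error from the previous step
            double t = sum + y;           // low-order bits of y may be lost here
            compensation = (t - sum) - y; // recover what was lost (algebraically zero)
            sum = t;
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] xs = new double[1_000_001];
        java.util.Arrays.fill(xs, 1e-16);
        xs[0] = 1.0;
        double naive = 0.0;
        for (double x : xs) {
            naive += x; // each 1e-16 is too small to change 1.0 on its own
        }
        System.out.println(naive);
        System.out.println(sum(xs));
    }
}
```

In the demo, the naive loop never moves away from 1.0 because every individual addend is below half a unit in the last place, while the compensated sum accumulates them.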

I posted an answer to a question spawned from this one, realizing afterwards that my answer is better suited to this question than to that one. I've reproduced it below. I notice though, that my answer is similar to a combination of Bozho's and Anon.'s.
As the other question was tagged language-agnostic, I chose C# for the code sample I've included. Its relative ease of use and easy-to-follow syntax, along with its inclusion of a couple of features facilitating this routine (a DivRem function in the BCL, and support for iterator functions), as well as my own familiarity with it, made it a good choice for this problem. Since the OP here is interested in a Java solution, but I'm not Java-fluent enough to write it effectively, it might be nice if someone could add a translation of this code to Java.
Some of the mathematical solutions here are very good. Here's a simple technical solution.
Use a larger data type. This breaks down into two possibilities:
Use a high-precision floating point library. One who encounters a need to average a billion numbers probably has the resources to purchase, or the brain power to write, a 128-bit (or longer) floating point library.
I understand the drawbacks here. It would certainly be slower than using intrinsic types. You still might over/underflow if the number of values grows too high. Yada yada.
If your values are integers or can be easily scaled to integers, keep your sum in a list of integers. When you overflow, simply add another integer. This is essentially a simplified implementation of the first option. A simple (untested) example in C# follows
class BigMeanSet{
List<uint> list = new List<uint>();
public double GetAverage(IEnumerable<uint> values){
list.Clear();
list.Add(0);
uint count = 0;
foreach(uint value in values){
Add(0, value);
count++;
}
return DivideBy(count);
}
void Add(int listIndex, uint value){
if((list[listIndex] += value) < value){ // then overflow has ocurred
if(list.Count == listIndex + 1)
list.Add(0);
Add(listIndex + 1, 1);
}
}
double DivideBy(uint count){
const double shift = 4.0 * 1024 * 1024 * 1024;
double rtn = 0;
long remainder = 0;
for(int i = list.Count - 1; i >= 0; i--){
rtn *= shift;
remainder <<= 32;
rtn += Math.DivRem(remainder + list[i], count, out remainder);
}
rtn += remainder / (double)count;
return rtn;
}
}
Like I said, this is untested—I don't have a billion values I really want to average—so I've probably made a mistake or two, especially in the DivideBy function, but it should demonstrate the general idea.
This should provide as much accuracy as a double can represent and should work for any number of 32-bit elements, up to 2^32 - 1. If more elements are needed, then the count variable will need to be expanded and the DivideBy function will increase in complexity, but I'll leave that as an exercise for the reader.
In terms of efficiency, it should be as fast or faster than any other technique here, as it only requires iterating through the list once, only performs one division operation (well, one set of them), and does most of its work with integers. I didn't optimize it, though, and I'm pretty certain it could be made slightly faster still if necessary. Ditching the recursive function call and list indexing would be a good start. Again, an exercise for the reader. The code is intended to be easy to understand.
If anybody more motivated than I am at the moment feels like verifying the correctness of the code, and fixing whatever problems there might be, please be my guest.
I've now tested this code, and made a couple of small corrections (a missing pair of parentheses in the List<uint> constructor call, and an incorrect divisor in the final division of the DivideBy function).
I tested it by first running it through 1000 sets of random length (ranging between 1 and 1000) filled with random integers (ranging between 0 and 2^32 - 1). These were sets for which I could easily and quickly verify accuracy by also running a canonical mean on them.
I then tested with 100* large series, with random length between 10^5 and 10^9. The lower and upper bounds of these series were also chosen at random, constrained so that the series would fit within the range of a 32-bit integer. For any series, the results are easily verifiable as (lowerbound + upperbound) / 2.
*Okay, that's a little white lie. I aborted the large-series test after about 20 or 30 successful runs. A series of length 10^9 takes just under a minute and a half to run on my machine, so half an hour or so of testing this routine was enough for my tastes.
For those interested, my test code is below:
static IEnumerable<uint> GetSeries(uint lowerbound, uint upperbound){
for(uint i = lowerbound; i <= upperbound; i++)
yield return i;
}
static void Test(){
Console.BufferHeight = 1200;
Random rnd = new Random();
for(int i = 0; i < 1000; i++){
uint[] numbers = new uint[rnd.Next(1, 1000)];
for(int j = 0; j < numbers.Length; j++)
numbers[j] = (uint)rnd.Next();
double sum = 0;
foreach(uint n in numbers)
sum += n;
double avg = sum / numbers.Length;
double ans = new BigMeanSet().GetAverage(numbers);
Console.WriteLine("{0}: {1} - {2} = {3}", numbers.Length, avg, ans, avg - ans);
if(avg != ans)
Debugger.Break();
}
for(int i = 0; i < 100; i++){
uint length = (uint)rnd.Next(100000, 1000000001);
uint lowerbound = (uint)rnd.Next(int.MaxValue - (int)length);
uint upperbound = lowerbound + length;
double avg = ((double)lowerbound + upperbound) / 2;
double ans = new BigMeanSet().GetAverage(GetSeries(lowerbound, upperbound));
Console.WriteLine("{0}: {1} - {2} = {3}", length, avg, ans, avg - ans);
if(avg != ans)
Debugger.Break();
}
}
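Since the question is tagged Java, here is a rough Java sketch of the same carry idea. It is not a port of the C# above: instead of a growable list it uses a fixed two-word (128-bit) accumulator, and the class and method names (ExactLongMean, mean) are made up for illustration. It assumes the inputs are signed long values and that a 128-bit sum is wide enough, which holds for any array Java can allocate.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.MathContext;

class ExactLongMean {
    // Sums signed 64-bit values exactly in a 128-bit accumulator (high:low),
    // then performs a single division at the end.
    static double mean(long[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("values must not be empty");
        }
        long low = 0;   // low 64 bits of the running sum
        long high = 0;  // high 64 bits (carries and sign extension)
        for (long v : values) {
            long newLow = low + v; // may wrap around; that is expected
            if (Long.compareUnsigned(newLow, low) < 0) {
                high++; // unsigned wrap of the low word: carry into the high word
            }
            if (v < 0) {
                high--; // sign-extend a negative addend into the high word
            }
            low = newLow;
        }
        // Reassemble the two's-complement value: high * 2^64 + unsigned(low)
        BigInteger total = BigInteger.valueOf(high).shiftLeft(64)
                .add(new BigInteger(Long.toUnsignedString(low)));
        return new BigDecimal(total)
                .divide(BigDecimal.valueOf(values.length), MathContext.DECIMAL128)
                .doubleValue();
    }
}
```

For example, ExactLongMean.mean(new long[]{Long.MIN_VALUE, Long.MAX_VALUE}) gives -0.5, where a plain long sum would also work but three copies of Long.MAX_VALUE would already wrap.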

A random sampling of a small set of the full dataset will often result in a 'good enough' solution. You obviously have to make this determination yourself based on system requirements. Sample size can be remarkably small and still obtain reasonably good answers. This can be adaptively computed by calculating the average of an increasing number of randomly chosen samples - the average will converge within some interval.
Sampling not only addresses the double overflow concern, but is much, much faster. Not applicable for all problems, but certainly useful for many problems.
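A minimal Java sketch of the adaptive idea, under some assumptions that are mine rather than part of the answer: the data fits in an array, samples are drawn with replacement, and "converged" is taken to mean that the estimated standard error of the mean has dropped below a caller-supplied tolerance. The names (SampledMean, estimate, minSamples, maxSamples) are hypothetical.

```java
import java.util.Random;

class SampledMean {
    // Estimates the mean of data by random sampling with replacement,
    // stopping once the estimated standard error drops below tolerance
    // or maxSamples draws have been made. minSamples should be at least 2.
    static double estimate(double[] data, double tolerance,
                           int minSamples, int maxSamples, Random rnd) {
        double mean = 0.0;
        double m2 = 0.0; // running sum of squared deviations (Welford's method)
        int n = 0;
        while (n < maxSamples) {
            double x = data[rnd.nextInt(data.length)];
            n++;
            double delta = x - mean;
            mean += delta / n;           // incremental mean update
            m2 += delta * (x - mean);    // incremental variance accumulator
            if (n >= minSamples) {
                double stdErr = Math.sqrt(m2 / (n - 1) / n);
                if (stdErr < tolerance) {
                    break; // the estimate has settled within the tolerance
                }
            }
        }
        return mean;
    }
}
```

Note that the squared deviations in m2 can themselves overflow for values near the limit of a double, so for such data you would want to scale the values down first.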

Consider this:
avg(n1) : n1 = a1
avg(n1, n2) : ((1/2)*n1)+((1/2)*n2) = ((1/2)*a1)+((1/2)*n2) = a2
avg(n1, n2, n3) : ((1/3)*n1)+((1/3)*n2)+((1/3)*n3) = ((2/3)*a2)+((1/3)*n3) = a3
So for any set of doubles of arbitrary size, you could do this (this is in C#, but I'm pretty sure it could be easily translated to Java):
static double GetAverage(IEnumerable<double> values) {
int i = 0;
double avg = 0.0;
foreach (double value in values) {
avg = (((double)i / (double)(i + 1)) * avg) + ((1.0 / (double)(i + 1)) * value);
i++;
}
return avg;
}
Actually, this simplifies nicely into (already provided by martinus):
static double GetAverage(IEnumerable<double> values) {
int i = 1;
double avg = 0.0;
foreach (double value in values) {
avg += (value - avg) / (i++);
}
return avg;
}
I wrote a quick test to try this function out against the more conventional method of summing up the values and dividing by the count (GetAverage_old). For my input I wrote this quick function to return as many random positive doubles as desired:
static IEnumerable<double> GetRandomDoubles(long numValues, double maxValue, int seed) {
Random r = new Random(seed);
for (long i = 0L; i < numValues; i++)
yield return r.NextDouble() * maxValue;
yield break;
}
And here are the results of a few test trials:
long N = 100L;
double max = double.MaxValue * 0.01;
IEnumerable<double> doubles = GetRandomDoubles(N, max, 0);
double oldWay = GetAverage_old(doubles); // 1.00535024998431E+306
double newWay = GetAverage(doubles); // 1.00535024998431E+306
doubles = GetRandomDoubles(N, max, 1);
oldWay = GetAverage_old(doubles); // 8.75142021696299E+305
newWay = GetAverage(doubles); // 8.75142021696299E+305
doubles = GetRandomDoubles(N, max, 2);
oldWay = GetAverage_old(doubles); // 8.70772312848651E+305
newWay = GetAverage(doubles); // 8.70772312848651E+305
OK, but what about for 10^9 values?
long N = 1000000000;
double max = 100.0; // we start small, to verify accuracy
IEnumerable<double> doubles = GetRandomDoubles(N, max, 0);
double oldWay = GetAverage_old(doubles); // 49.9994879713857
double newWay = GetAverage(doubles); // 49.9994879713868 -- pretty close
max = double.MaxValue * 0.001; // now let's try something enormous
doubles = GetRandomDoubles(N, max, 0);
oldWay = GetAverage_old(doubles); // Infinity
newWay = GetAverage(doubles); // 8.98837362725198E+305 -- no overflow
Naturally, how acceptable this solution is will depend on your accuracy requirements. But it's worth considering.
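For reference, a Java translation of the simplified version might look like the sketch below. The class name RunningMean is made up, and it takes an array rather than a lazy sequence to keep the example short. It assumes value - avg does not itself overflow, which holds when all values have the same sign.

```java
class RunningMean {
    // Incremental mean: avg_k = avg_{k-1} + (x_k - avg_{k-1}) / k.
    // Returns 0.0 for an empty array.
    static double mean(double[] values) {
        double avg = 0.0;
        long count = 0;
        for (double value : values) {
            count++;
            // No running sum is kept, so the total never has to fit in a double
            avg += (value - avg) / count;
        }
        return avg;
    }
}
```

With four copies of Double.MAX_VALUE * 0.5 as input, a naive sum is Infinity while this returns the value itself.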

Check out the section on the cumulative moving average.

To keep the logic simple and the performance acceptable, if not the best, I recommend using BigDecimal together with the primitive type.
The concept is simple: sum the values in a primitive type, and whenever the next addition would underflow or overflow, move the accumulated value into a BigDecimal and reset the primitive for the next partial sum. One more thing to be aware of: when you construct a BigDecimal, you should always use a String rather than a double.
import java.math.BigDecimal;
import java.math.MathContext;
BigDecimal average(double[] values){
    BigDecimal totalSum = BigDecimal.ZERO;
    double tempSum = 0.00;
    for (double value : values){
        // flush the primitive accumulator before it can overflow
        if (isOutOfRange(tempSum, value)) {
            totalSum = sum(totalSum, tempSum);
            tempSum = 0.00;
        }
        tempSum += value;
    }
    totalSum = sum(totalSum, tempSum);
    BigDecimal count = new BigDecimal(values.length);
    // a MathContext avoids an ArithmeticException on non-terminating quotients such as 1/3
    return totalSum.divide(count, MathContext.DECIMAL128);
}
BigDecimal sum(BigDecimal val1, double val2){
    BigDecimal val = new BigDecimal(String.valueOf(val2));
    return val1.add(val);
}
boolean isOutOfRange(double sum, double value){
    // sum + value > max cannot be tested directly once it overflows,
    // so the check is rearranged to value > max - sum
    if(sum >= 0.00 && value > Double.MAX_VALUE - sum){
        return true;
    }
    // likewise, sum + value < min is rearranged to value < min - sum;
    // the most negative double is -Double.MAX_VALUE (Double.MIN_VALUE is the smallest positive value)
    if(sum < 0.00 && value < -Double.MAX_VALUE - sum){
        return true;
    }
    return false;
}
With this approach, every time the partial sum would underflow or overflow, we move it into the bigger variable. This may slow things down a bit because of the BigDecimal calculations, but it guarantees runtime stability.

Why so many complicated, long answers? Here is the simplest way to find the running average so far, without needing to know the number of elements or their size in advance.
long int i = 0;
double average = 0;
while(there are still elements)
{
average = average * ((double) i / (i + 1)) + X[i] / (i + 1); // parenthesize i + 1 and avoid integer division
i++;
}
return average;

Generating correlated numbers

Here is a fun one: I need to generate random x/y pairs that are correlated at a given value of Pearson product moment correlation coefficient, or Pearson r. You can imagine this as two arrays, array X and array Y, where the values of array X and array Y must be re-generated, re-ordered or transformed until they are correlated with each other at a given level of Pearson r. Here is the kicker: Array X and Array Y must be uniform distributions.
I can do this with a normal distribution, but transforming the values without skewing the distribution has me stumped. I tried re-ordering the values in the arrays to increase the correlation, but I will never get arrays correlated at 1.00 or -1.00 just by sorting.
Any ideas?
--
here is the AS3 code for random correlated gaussians, to get the wheels turning:
public static function nextCorrelatedGaussians(r:Number):Array{
var d1:Number;
var d2:Number;
var n1:Number;
var n2:Number;
var lambda:Number;
var arr:Array = new Array();
var isNeg:Boolean;
if (r<0){
r *= -1;
isNeg=true;
}
lambda= ( (r*r) - Math.sqrt( (r*r) - (r*r*r*r) ) ) / (( 2*r*r ) - 1 );
n1 = nextGaussian();
n2 = nextGaussian();
d1 = n1;
d2 = ((lambda*n1) + ((1-lambda)*n2)) / Math.sqrt( (lambda*lambda) + (1-lambda)*(1-lambda));
if (isNeg) {d2*= -1}
arr.push(d1);
arr.push(d2);
return arr;
}

I ended up writing a short paper on this
It doesn't include your sorting method (although in practice I think it's similar to my first method, in a roundabout way), but does describe two ways that don't require iteration.

Here is an implementation of twolfe18's algorithm written in ActionScript 3:
for (var j:int=0; j < size; j++) {
xValues[j] = Math.random();
}
var varX:Number = Util.variance(xValues);
var varianceE:Number = 1/(r*varX) - varX;
for (var i:int=0; i < size; i++) {
yValues[i] = xValues[i] + boxMuller(0, Math.sqrt(varianceE));
}
boxMuller is just a method that generates a random Gaussian with the arguments (mean, stdDev).
size is the size of the distribution.
Sample output
Target p: 0.8
Generated p: 0.04846346291280387
variance of x distribution: 0.0707786253165176
varianceE: 17.589920412141158
As you can see I'm still a ways off. Any suggestions?

This apparently simple question has been messing with my mind since yesterday evening! I looked into the topic of simulating distributions with a dependency, and the best I found is this: simulate dependent random variables. The gist of it is, you can easily simulate 2 normals with a given correlation, and they outline a method to transform these non-independent normals, but this won't preserve the correlation. The transformed variables will still be correlated, so to speak, but not with an identical coefficient. See the paragraph "Rank correlation coefficients".
Edit: from what I gather from the second part of the article, the copula method would allow you to simulate / generate random variables with rank correlation.
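To make the copula idea concrete, here is a rough Java sketch of a Gaussian copula. It draws two correlated standard normals and pushes each through the normal CDF, which makes both marginals uniform on [0, 1]. It relies on the known relation for a bivariate normal that the Pearson correlation of the transformed uniforms is (6/pi) * asin(rho/2), so the normal correlation is set to rho = 2 * sin(pi * r / 6). The class and method names are made up, and erf uses the Abramowitz and Stegun 7.1.26 approximation (absolute error around 1.5e-7), so the marginals are uniform only to that accuracy. As with any random method, only the expected Pearson r matches the target.

```java
import java.util.Random;

class CorrelatedUniforms {
    // Abramowitz-Stegun 7.1.26 approximation of erf (absolute error about 1.5e-7).
    static double erf(double x) {
        double sign = x < 0 ? -1.0 : 1.0;
        double ax = Math.abs(x);
        double t = 1.0 / (1.0 + 0.3275911 * ax);
        double poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741
                + t * (-1.453152027 + t * 1.061405429))));
        return sign * (1.0 - poly * Math.exp(-ax * ax));
    }

    // Standard normal CDF.
    static double phi(double z) {
        return 0.5 * (1.0 + erf(z / Math.sqrt(2.0)));
    }

    // Returns n pairs {x, y}, each marginally uniform on [0, 1],
    // whose expected Pearson correlation is r (with -1 <= r <= 1).
    static double[][] generate(double r, int n, Random rnd) {
        double rho = 2.0 * Math.sin(Math.PI * r / 6.0); // normal-scale correlation
        double mix = Math.sqrt(Math.max(0.0, 1.0 - rho * rho)); // guard against rounding
        double[][] pairs = new double[n][2];
        for (int i = 0; i < n; i++) {
            double z1 = rnd.nextGaussian();
            double z2 = rho * z1 + mix * rnd.nextGaussian();
            pairs[i][0] = phi(z1);
            pairs[i][1] = phi(z2);
        }
        return pairs;
    }

    // Sample Pearson correlation of the pairs.
    static double pearson(double[][] pairs) {
        double mx = 0, my = 0;
        for (double[] p : pairs) { mx += p[0]; my += p[1]; }
        mx /= pairs.length;
        my /= pairs.length;
        double sxy = 0, sxx = 0, syy = 0;
        for (double[] p : pairs) {
            double dx = p[0] - mx, dy = p[1] - my;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }
}
```

Calling CorrelatedUniforms.generate(0.8, 200000, new Random(12345)) and then pearson on the result should land close to 0.8, with the usual sampling noise.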

start with the model y = x + e where e is the error (a normal random variable). e should have a mean of 0 and variance k.
long story short, you can write a formula for the expected value of the Pearson in terms of k, and solve for k. note, you cannot randomly generate data with the Pearson exactly equal to a specific value, only with the expected Pearson of a specific value.
i'll try to come back and edit this post to include a closed form solution when i have access to some paper.
EDIT: ok, i have a hand-wavy solution that is probably correct (but will require testing to confirm). for now, assume desired Pearson = p > 0 (you can figure out the p < 0 case). like i mentioned earlier, set your model for Y = X + E (X is uniform, E is normal).
sample to get your x's
compute var(x)
the variance of E should be: (1/(r*sd(x)))^2 - var(x)
generate your y's based on your x's and sample from your normal random variable E
for p < 0, set Y = -X + E. proceed accordingly.
basically, this follows from the definition of Pearson: cov(x,y)/var(x)*var(y). when you add noise to the x's (Y = X + E), the expected covariance cov(x,y) should not change from that with no noise. the var(x) does not change. the var(y) is the sum of var(x) and var(e), hence my solution.
SECOND EDIT: ok, i need to read definitions better. the definition of Pearson is cov(x, y)/(sd(x)*sd(y)). from that, i think the true value of var(E) should be (1/(r*sd(x)))^2 - var(x). see if that works.
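One way to check the model numerically: with Y = X + E and E independent of X, cov(x, y) = var(x) and var(y) = var(x) + var(E), so the definition cov(x, y)/(sd(x)*sd(y)) becomes r = sd(x)/sqrt(var(x) + var(E)), which rearranges to var(E) = var(x) * (1/r^2 - 1). That rearrangement is an assumption of this sketch rather than the expression given above, so treat it as something to verify. The Java below (class and method names are hypothetical) generates the y's that way and measures the resulting Pearson r. Note that Y is the sum of a uniform and a normal, so it is not itself uniform.

```java
import java.util.Random;

class NoisyLinearPairs {
    // Sample variance of v.
    static double variance(double[] v) {
        double mean = 0;
        for (double d : v) { mean += d; }
        mean /= v.length;
        double ss = 0;
        for (double d : v) { ss += (d - mean) * (d - mean); }
        return ss / (v.length - 1);
    }

    // Builds y = sign(r) * x + E, with var(E) = var(x) * (1/r^2 - 1).
    // Requires 0 < |r| <= 1.
    static double[] makeY(double[] x, double r, Random rnd) {
        double sdE = Math.sqrt(variance(x) * (1.0 / (r * r) - 1.0));
        double sign = r < 0 ? -1.0 : 1.0; // the p < 0 case uses Y = -X + E
        double[] y = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            y[i] = sign * x[i] + sdE * rnd.nextGaussian();
        }
        return y;
    }

    // Sample Pearson correlation of x and y.
    static double pearson(double[] x, double[] y) {
        double mx = 0, my = 0;
        for (int i = 0; i < x.length; i++) { mx += x[i]; my += y[i]; }
        mx /= x.length;
        my /= y.length;
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - mx, dy = y[i] - my;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }
}
```

For uniform x on [0, 1] (variance about 1/12) and r = 0.8, this gives var(E) of roughly 0.047, and the measured Pearson r of a large sample should come out near 0.8.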

To get a correlation of 1 both X and Y should be the same, so copy X to Y and you have a correlation of 1. To get a -1 correlation, make Y = 1 - X. (assuming X values are [0,1])

A strange problem demands a strange solution -- here is how I solved it.
-Generate array X
-Clone array X to Create array Y
-Sort array X (you can use whatever method you want to sort array X -- quicksort, heapsort anything stable.)
-Measure the starting level of pearson's R with array X sorted and array Y unsorted.
WHILE the correlation is outside of the range you are hoping for
IF the correlation is too low
run one iteration of CombSort11 on array Y then recheck correlation
ELSE IF the correlation is too high
randomly swap two values and recheck correlation
And that's it! Combsort is the real key: it has the effect of increasing the correlation slowly and steadily. Check out Jason Harrison's demo to see what I mean. To get a negative correlation you can invert the sort or invert one of the arrays after the whole process is complete.
Here is my implementation in AS3:
public static function nextReliableCorrelatedUniforms(r:Number, size:int, error:Number):Array {
var yValues:Array = new Array;
var xValues:Array = new Array;
var coVar:Number = 0;
for (var e:int=0; e < size; e++) { //create x values
xValues.push(Math.random());
}
yValues = xValues.concat();
if(r != 1.0){
xValues.sort(Array.NUMERIC);
}
var trueR:Number = Util.getPearson(xValues, yValues);
while(Math.abs(trueR-r)>error){
if (trueR < r-error){ // combsort11 for y
var gap:int = yValues.length;
var swapped:Boolean = true;
while (trueR <= r-error) {
if (gap > 1) {
gap = Math.round(gap / 1.3);
}
var i:int = 0;
swapped = false;
while (i + gap < yValues.length && trueR <= r-error) {
if (yValues[i] > yValues[i + gap]) {
var t:Number = yValues[i];
yValues[i] = yValues[i + gap];
yValues[i + gap] = t;
trueR = Util.getPearson(xValues, yValues)
swapped = true;
}
i++;
}
}
}
else { // decorrelate
while (trueR >= r+error) {
var a:int = Random.randomUniformIntegerBetween(0, size-1);
var b:int = Random.randomUniformIntegerBetween(0, size-1);
var temp:Number = yValues[a];
yValues[a] = yValues[b];
yValues[b] = temp;
trueR = Util.getPearson(xValues, yValues)
}
}
}
var correlates:Array = new Array;
for (var h:int=0; h < size; h++) {
var pair:Array = new Array(xValues[h], yValues[h]);
correlates.push(pair);}
return correlates;
}
