I'm trying to apply L2 normalization to a double vector in Java.
double[] vector = {0.00423823948, 0.00000000000823285934, 0.0000342523505342, 0.000040240234023423, 0, 0};
Now if I apply the L2 normalization:
double squareVectorSum = 0;
for (double i : vector) {
    squareVectorSum += i * i;
}
double normalizationFactor = Math.sqrt(squareVectorSum);
// System.out.println(squareVectorSum + " " + normalizationFactor);
double[] vector_result = new double[vector.length];
for (int i = 0; i < vector.length; i++) {
    double normalizedFeature = vector[i] / normalizationFactor;
    vector_result[i] = normalizedFeature;
}
My normalized vector looks like this:
Normalized vector (L2 normalization)
0.9999222784309146 1.9423676996312713E-9 0.008081112110203743 0.009493825603572155 0.0 0.0
Now, if I compute the squared sum of all the normalized-vector components, I should get a value equal to one. Instead, computing it like this:
double sum = 0;
for (double i : vector_result) {
    sum += i * i;
}
I get:
Squared sum of the normalized vector
1.0000000000000004
Why is my sum not equal to one?
Are there problems in the code?
Or is it just that my numbers are too small and there is some approximation error with doubles?
As indicated above, this is a common issue, one you're going to have to deal with if you use binary floating-point arithmetic. The problem mostly crops up when you want to compare two floating-point numbers for equality: since the operations applied to arrive at the values may not be identical, neither will their binary representations be.
There are at least a couple of strategies you can consider for dealing with this situation. The first involves comparing the absolute difference between two floating-point numbers x and y to some small value ϵ > 0, rather than testing for strict equality. That would look something like:
if (Math.abs(y - x) < epsilon) {
    // Assume x == y
} else {
    // Assume x != y
}
This works well when the possible values of x and y have relatively tightly bounded exponents. When that is not the case, the values of x and y may be such that the difference always dominates the ϵ you choose (if the exponents are too large), or ϵ always dominates the difference (if the exponents are too small). To get around this, instead of comparing the absolute difference, you can compare the ratio of x and y to 1.0 and check whether that ratio differs from 1.0 by more than ϵ. That would look like:
if (Math.abs(x / y - 1.0) < epsilon) {
    // Assume x == y
} else {
    // Assume x != y
}
You will likely need to add another check to ensure y != 0 to avoid division by zero, but that's the general idea.
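Putting both checks together, a small helper might look like this (a sketch; the name nearlyEqual and the tolerance handling are illustrative, not a standard API):
static boolean nearlyEqual(double x, double y, double epsilon) {
    if (x == y) {
        return true;                        // exact match, including both zero
    }
    if (x == 0 || y == 0) {
        return Math.abs(y - x) < epsilon;   // a ratio against zero is meaningless
    }
    return Math.abs(x / y - 1.0) < epsilon; // relative comparison otherwise
}
In your case, nearlyEqual(sum, 1.0, 1e-9) would return true for the 1.0000000000000004 you computed.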
Other options include using a fixed-point library or a rational-number library for Java. I have no specific recommendations there, though.
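Though not a fixed-point or rational library, the standard java.math.BigDecimal covers many of the same use cases for decimal values (a minimal sketch; note it is slower than double, and division needs an explicit scale and rounding mode):
BigDecimal tenth = new BigDecimal("0.1");  // the String constructor keeps the value exact
BigDecimal sum = BigDecimal.ZERO;
for (int i = 0; i < 10; i++) {
    sum = sum.add(tenth);
}
System.out.println(sum.compareTo(BigDecimal.ONE) == 0); // prints true; the same test fails with doubles
Note compareTo rather than equals: BigDecimal.equals also compares scale, so 1.0 and 1 are not equals even though they compare as numerically identical.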
We are often taught that floating-point numbers should not be compared for exact equality. However, the following function, which returns the Golden Ratio when passed any positive number, does in fact compare doubles for equality and to my surprise it seems to always work:
public static double f(double x) {
    double y;
    while ((y = 1 + 1 / x) != x)
        x = (x + y) / 2;
    return x;
}
@Test
void test() {
    assertEquals((1 + sqrt(5)) / 2, f(1.0)); // Passes!
}
I thought that maybe it works for some input arguments but not others. But even if I use JQwik's property testing, it still works!
@Property
void test2(@ForAll @Positive double x) {
    assertEquals((1 + sqrt(5)) / 2, f(x)); // Always passes!
}
Can anyone tell me why I never come across a situation where the two floating-point numbers are different by a very small amount?
You were just lucky; in general you don't get exact equality. Try this, for example:
public static void main(String[] args) {
    var s = 0.0;
    for (int i = 0; i < 10; i++) {
        s += 0.1;
    }
    System.out.println(s == 1.0); // prints false
}
In your concrete example, one would have to do a careful analysis to prove that the iteration always converges to the floating-point number closest to phi. If sqrt also returns the floating-point number closest to the exact root (Java's Math.sqrt is in fact specified to be correctly rounded), we get exact equality.
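You can watch that convergence directly by printing the iterates (a quick sketch based on the f from the question):
double x = 1.0, y;
while ((y = 1 + 1 / x) != x) {
    System.out.println(x); // prints 1.0, 1.5, 1.5833..., homing in on phi
    x = (x + y) / 2;
}
System.out.println(x);     // prints 1.618033988749895, bit-identical to (1 + Math.sqrt(5)) / 2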
... and to my surprise it seems to always work:
Not always.
When I tried f(-1/1.6180339887498949), the x and y values oscillated between two floating-point values that differed in the last few bits (as noted by @Steve Summit), resulting in an infinite loop:
x:-0.61803398874989490 y:-0.61803398874989468 // Decimal notation
x:-0x1.3c6ef372fe950p-1 y:-0x1.3c6ef372fe94ep-1 // Hex notation
x:-0.61803398874989479 y:-0.6180339887498949
x:-0x1.3c6ef372fe94fp-1 y:-0x1.3c6ef372fe950p-1
x:-0.61803398874989490 y:-0.61803398874989468
x:-0x1.3c6ef372fe950p-1 y:-0x1.3c6ef372fe94ep-1
f(some_starting_x) generally converges to an x such that 1 + 1 / x evaluates to x again, which meets the stopping condition.
Careful analysis can show that if x starts reasonably close, the while loop will eventually get close to the desired answer, yet even then an oscillation like the one shown above is possible. Thus an iteration limit or a close-enough test is needed. Usually, when the two oscillating values are close, they are massaged (e.g. averaged) to form the best answer; if they are not close, the loop simply failed to find a stable answer.
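A defensive version of f along those lines might cap the iterations and average the final pair when the loop stalls (a sketch, not the original poster's code; the cap of 1000 is an arbitrary choice):
public static double fSafe(double x) {
    double y;
    int iterations = 0;
    while ((y = 1 + 1 / x) != x && iterations++ < 1000) {
        x = (x + y) / 2;
    }
    // If the loop converged, y == x and the average is just x.
    // If it hit the cap while oscillating, x and y differ by a few
    // ULPs and their average is the best single answer available.
    return (x + y) / 2;
}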
Can anyone tell me why I never come across a situation where the two floating-point numbers are different by a very small amount?
Inadequate testing.
Moral of the story:
Do not rely on floating-point equality alone, except in select cases.
f() was not a select case and deserved additional stopping code.
Ref: the two solutions of x = 1 + 1/x, i.e. of x² - x - 1 = 0, which gives x = (1 ± √5)/2:
x1 = 1.6180339887498948482045868343656...
x2 = -0.61803398874989484820458683436564...
Note x1 * x2 == -1. x1 is the golden ratio φ.
I have a program which multiplies a probability over 500 times, but when I am doing so the output is zero. Should I use some other data type?
Please help.
Here is the code I am using:
double d = 1 / 80000d;
for (int i = 0; i < 500; i++) {
    d *= d;
}
System.out.println(d);
The output is zero because double has limited precision, and if you multiply a number lower than 1 by itself enough times, you'll get a result too small to be distinguished from 0.
If you print d after each iteration, you'll see that it becomes 0 quite fast:
1.5625E-10
2.4414062500000002E-20
5.960464477539064E-40
3.552713678800502E-79
1.2621774483536196E-157
1.593091911E-314
0.0
When working with probabilities, you can avoid this sort of numerical issue by working with logarithms instead, so that repeated multiplication becomes addition. Something like:
double d = 1 / 80000d;
double ld = Math.log(d);
for (int i = 0; i < 500; i++) {
    ld += ld; // log(d * d) == log(d) + log(d)
}
System.out.println(ld);
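You then stay in log space for further work as well; for example, comparing two products of probabilities (a small sketch with made-up values):
double logP1 = Math.log(0.03) + Math.log(0.2); // log of 0.03 * 0.2 = 0.006
double logP2 = Math.log(0.01) + Math.log(0.5); // log of 0.01 * 0.5 = 0.005
// The larger logarithm corresponds to the larger probability,
// with no risk of either product underflowing to 0.
System.out.println(logP1 > logP2); // prints true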
Naturally, if you take a number less than 1 and multiply it by itself enough times, sooner or later the result will be too small to be represented in double, Extended, or any other floating-point type. What you get back is the approximation that the type can store, and zero is one of the special values of the IEEE 754 format. I don't know Java, but other languages have an Extended type with a wider range.
When creating a range of numbers as follows,
float increment = 0.01f;
for (int i = 0; i < 100; i++) {
    float value = i * increment;
    System.out.println(value);
}
it is clear that for some i I will end up with values like 0.049999997, which are not exact multiples of 0.01, due to rounding errors.
When I try the same with floats in the range of usual integers, I have never seen the same problem:
float increment = 1.0f; // still a float, but representing an integer value
for (int i = 0; i < 100; i++) {
    float value = i * increment;
    System.out.println(value);
}
One could expect this to also print out e.g. 49.999999 instead of 50, which however I have never seen.
I am wondering whether I can rely on that for any value of i and any value of increment, as long as it represents an integer (even though its type is float).
And if so, I would be interested in an explanation of why rounding errors cannot happen in that case.
Integers up to 2^24 = 16,777,216 can be represented exactly as a float. Therefore you don't get rounding errors when you work only with them.
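You can see where that guarantee ends with a quick check (2^24 is the limit imposed by the float significand's 24 bits):
float f = 16_777_216f;      // 2^24, exactly representable
System.out.println(f + 1f); // prints 1.6777216E7 (the +1 is rounded away)
System.out.println(f + 2f); // prints 1.6777218E7 (2^24 + 2 is representable)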
This is because float is stored in binary floating-point notation.
In rough terms, it tries to represent your decimal number as a sum of fractions with power-of-two denominators, i.e. 1/2^n1 + 1/2^n2 + ... + 1/2^nm, until it gets as close as possible to the value you entered.
For example (roughly):
0.5 is represented as 1/2
0.25 is represented as 1/2²
0.1 is represented starting from 1/2^4, but in this case the stored significand is 1.600000023841858, which gives a number close to, but not equal to, 0.1 (1/2^4 × 1.600000023841858 = 0.100000001...).
Now you can see why, after some loop iterations, the value drifts away from what you expect.
For the full details of how this works, read about the IEEE 754 floating-point standard.
If you want exactness, you should use, for example, BigDecimal in Java, which uses a different representation for decimal numbers.
double has the same problem.
Check this tool to see the representation of floating-point values:
http://www.h-schmidt.net/FloatConverter/IEEE754.html
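You can also inspect the stored bits directly from Java (a small sketch using the standard Float.floatToIntBits and Float.toHexString):
int bits = Float.floatToIntBits(0.1f);
// Layout: sign (1 bit) | exponent (8 bits) | significand (23 bits)
System.out.println(Integer.toBinaryString(bits));
// prints 111101110011001100110011001101 (0.1f is not stored exactly)
System.out.println(Float.toHexString(0.1f));
// prints 0x1.99999ap-4, i.e. 1.600000023841858 * 2^-4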
It doesn't really represent an integer. It's still a float whose value you're just keeping on whole numbers. You'll get rounding errors as soon as the value grows large enough that its floating-point spacing exceeds 1.0 (for float, beyond 2^24), at which point adding 1.0 may no longer change the value.
I have to generate data for a Poisson distribution. My range is n = 1000 up to 100K, where n is the number of data elements; k varies from 1 to n. It says to use lambda = n/2.
I have never taken stats and have no idea how to get the correct curve here. I can feed it lambda = n/2, but do I vary k from 0 to n? I tried this (passing k in as a parameter), and when I graphed the data it ramped up, not a fish tail. What am I doing wrong, or am I doing it correctly?
Thanks
I have this code in Java, from Knuth:
static double poissonRandomNumber(int lambda) {
    double L = Math.exp(-lambda);
    int k = 0;
    double p = 1;
    do {
        k = k + 1;
        double u = Math.random();
        p = p * u;
    } while (p > L);
    return k - 1;
}
One of the problems you are running into is a basic limitation of how computers represent and perform calculations with floating-point numbers.
A real number is represented on a computer in a form similar to scientific notation:
significand × base^exponent
For double-precision numbers, there are 11 bits used for the exponent and 52 for the significand. The smallest positive floating-point value > 0.0 is 4.9 × 10^-324 (defined as Double.MIN_VALUE in Java), and the smallest normalized value is about 2.2 × 10^-308. See the IEEE 754 standard for floating-point numbers for a good writeup on this.
Consider the line of code:
double L = Math.exp(-lambda);
With a lambda of 1000, e^-1000 (which is about 10^-435) is smaller than Double.MIN_VALUE, so the computer cannot represent e^-1000 any differently than it represents e^-100000: both simply underflow to 0.0.
You can solve this problem by noticing that lambda is an "arrival rate", so you can draw random samples for shorter intervals and sum them. Writing p(λ) for a Poisson sample with mean λ, a sample
x = p(L);
can be computed as the sum of two independent samples
x = p(L/2) + p(L/2);
and for large lambda you can sum many smaller pieces:
x = p(L/100) + p(L/100) + ... + p(L/100); // 100 independent samples
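A sketch of that splitting idea, reusing the structure of the sampler from the question (the chunk size of 500 is an arbitrary choice that keeps Math.exp(-sub) far above the underflow threshold):
static double poissonLargeLambda(int lambda) {
    int chunks = 1 + lambda / 500;     // keep each sub-lambda around 500 or less
    double sub = (double) lambda / chunks;
    double L = Math.exp(-sub);         // no longer underflows to 0.0
    long total = 0;
    for (int c = 0; c < chunks; c++) {
        int k = 0;
        double p = 1;
        do {
            k = k + 1;
            p = p * Math.random();
        } while (p > L);
        total += k - 1;                // sum of Poisson(sub) samples is Poisson(lambda)
    }
    return total;
}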
The Wikipedia article on the Poisson distribution has some good pointers to ways of computing Poisson distributions for large values of lambda.
The double value increases erratically:
for (double i = -1; i <= 1; i += 0.1)
{
    for (double j = -1; j <= 1; j += 0.1)
    {
        // logic
        System.out.print(i);
        System.out.print(j);
    }
}
Here, the values come out like:
-1, -0.9, -0.8, -0.69, -0.51 ... -0.099, 1.007 (why are you greater than 1?)
The output is not exactly this, but something like it.
But I want the exact values only. What should I do?
You can use an integer counter, and multiply to get the double:
for (int i = -10; i <= 10; i++) {
    double iDouble = 0.1 * i;
    // ...
}
The double will still have rounding error - that is inevitable - but the rounding error will not affect the loop count.
You can't get exact values due to the limitations of doubles: they can't always represent exactly the decimal you want, and they have precision errors. In your case you may want to cast the double to an int for the comparison, but as @alex said, you shouldn't be doing this.
This is due to the way doubles are stored in memory: they are only exact if the fractional part is a sum of negative powers of two, e.g. 0, 1/2, 1/4, 3/4, etc. This is also why you should never use equality tests for doubles, but rather > and <. For exact calculations, you could use BigDecimal:
BigDecimal bigDecimal = new BigDecimal("123.45"); // the String constructor avoids inheriting the double's rounding error
bigDecimal = bigDecimal.add(new BigDecimal("123.45"));
System.out.println(bigDecimal); // prints 246.90
As said here, floating-point variables must not be used as loop counters. Limited-precision IEEE 754 floating-point types cannot represent:
all simple fractions exactly
all decimals precisely, even when the decimals can be represented in a small number of digits.
all digits of large values, meaning that incrementing a large floating-point value might not change that value within the available precision.
(...) Using floating-point loop counters can lead to unexpected behavior.
Instead, use an integer loop counter and derive the floating-point value from it inside the loop, like this:
for (int count = 0; count <= 20; count += 1) { // 21 steps: -1.0 through 1.0
    double x = -1 + count * 0.1;
    /* ... */
}
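Computing each value as a fresh product also keeps rounding errors from accumulating across iterations, unlike repeated addition (a quick comparison):
double viaProduct = 10 * 0.1;  // one rounding step
double viaSum = 0.0;
for (int i = 0; i < 10; i++) {
    viaSum += 0.1;             // ten rounding steps accumulate
}
System.out.println(viaProduct); // prints 1.0
System.out.println(viaSum);     // prints 0.9999999999999999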