Suppose we implement the following two methods to calculate the nth multiple of a real number x.
public static double multiply( double x, int n )
{
    return x * n;
}

public static double iterativeAdd( double x, int n )
{
    double a = 0.0;
    for( int b = 0; b < n; b++ )
    {
        a += x;
    }
    return a;
}
Assume that n is a legal int and that both x and the exact mathematical product of n and x are no less in absolute value than Double.MIN_VALUE (unless both are 0.0) and no greater in absolute value than Double.MAX_VALUE. Here's what I'm wondering: In general, which is closer to the exact value of the product of x and n: the double returned by multiply( x, n ) or the double returned by iterativeAdd( x, n ) and how do you know?
To my knowledge, the first method will produce the more accurate result: in the second method, each addition has a chance of truncating and rounding away digits, and those errors accumulate over n operations, whereas the multiplication computes the result once and rounds only once.
Generally, every floating-point operation you perform can grow the error. Floating-point numbers have a fixed size in memory, which limits their precision, so each operation's result is rounded to the nearest value the type can represent, and this rounding accumulates over many operations.
Both methods will get you very close to the answer, but if you run both on a large and varied set of inputs, you will see that on average iterativeAdd() ends up farther from the exact value.
Additionally, multiply() will be significantly faster on any machine, so there's no benefit to ever using iterativeAdd().
Both will return approximately the same value, but there is a greater chance that iterativeAdd() returns the less accurate one, though the difference will be negligible.
Any single floating-point operation incurs some precision loss, however small.
multiply() uses such an operation only once, but iterativeAdd() uses it n times.
In general we should also avoid a function like iterativeAdd() because its n floating-point operations take up a lot of processor time.
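A quick way to see the gap (a sketch; the class name and the choice of x and n are just for illustration) is to compare both methods against a reference product kept in full precision with BigDecimal:

```java
import java.math.BigDecimal;

public class MultiplyVsAdd {
    public static double multiply(double x, int n) {
        return x * n;
    }

    public static double iterativeAdd(double x, int n) {
        double a = 0.0;
        for (int b = 0; b < n; b++) {
            a += x;
        }
        return a;
    }

    public static void main(String[] args) {
        double x = 0.1;
        int n = 1_000_000;
        // Reference: the exact product of the double nearest 0.1 and n.
        BigDecimal exact = new BigDecimal(x).multiply(BigDecimal.valueOf(n));
        BigDecimal errMul = new BigDecimal(multiply(x, n)).subtract(exact).abs();
        BigDecimal errAdd = new BigDecimal(iterativeAdd(x, n)).subtract(exact).abs();
        System.out.println("multiply error:     " + errMul);
        System.out.println("iterativeAdd error: " + errAdd);
    }
}
```

On a typical run the iterative error is many orders of magnitude larger. Even with n = 10 the effect shows: multiply(0.1, 10) is exactly 1.0, while iterativeAdd(0.1, 10) yields 0.9999999999999999.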
We are often taught that floating-point numbers should not be compared for exact equality. However, the following function, which returns the Golden Ratio when passed any positive number, does in fact compare doubles for equality and to my surprise it seems to always work:
public static double f(double x) {
    double y;
    while ((y = 1 + 1 / x) != x)
        x = (x + y) / 2;
    return x;
}
@Test
void test() {
    assertEquals((1 + sqrt(5)) / 2, f(1.0)); // Passes!
}
I thought that maybe it works for some input arguments but not others. But even if I use JQwik's property testing, it still works!
@Property
void test2(@ForAll @Positive double x) {
    assertEquals((1 + sqrt(5)) / 2, f(x)); // Always passes!
}
Can anyone tell me why I never come across a situation where the two floating-point numbers are different by a very small amount?
You were just lucky; in general you don't get exact equality. Try this, for example:
public static void main(String[] args) {
    var s = 0.0;
    for (int i = 0; i < 10; i++) {
        s += 0.1;
    }
    System.out.println(s == 1.0);
}
In your concrete example, one would have to do a careful analysis to prove that your iteration always converges to the floating-point number closest to phi. If sqrt also returns the floating-point number closest to the exact root, the two would compare exactly equal.
... and to my surprise it seems to always work:
Not always.
When I tried f(-1/1.6180339887498949), the x and y values oscillated between two floating-point values that differed in the last few bits (as @Steve Summit noted), producing an infinite loop:
x:-0.61803398874989490 y:-0.61803398874989468 // Decimal notation
x:-0x1.3c6ef372fe950p-1 y:-0x1.3c6ef372fe94ep-1 // Hex notation
x:-0.61803398874989479 y:-0.6180339887498949
x:-0x1.3c6ef372fe94fp-1 y:-0x1.3c6ef372fe950p-1
x:-0.61803398874989490 y:-0.61803398874989468
x:-0x1.3c6ef372fe950p-1 y:-0x1.3c6ef372fe94ep-1
f(some_starting_x) generally converges toward an x such that 1 + 1 / x evaluates to x again, meeting the stopping condition.
A careful analysis can show that if the starting x is reasonably close, the while loop will eventually get near the desired answer, yet even then an oscillation like the one shown above is possible. An iteration limit or a close-enough test is therefore needed. Usually, when the two oscillating values are close, they are massaged (e.g., averaged) to form the best answer; if they are not close, the loop simply failed to find a stable answer.
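A sketch of what such additional stopping code could look like (the iteration cap of 1000 and the averaging of the two oscillating values are illustrative choices, not the only ones):

```java
public class FixedPoint {
    // f() from the question, hardened with an iteration cap and a
    // fallback so a two-value oscillation cannot hang the loop.
    public static double f(double x) {
        double y = 1 + 1 / x;
        for (int i = 0; i < 1000 && y != x; i++) {
            x = (x + y) / 2;
            y = 1 + 1 / x;
        }
        // If the loop ended by hitting the cap, x and y differ by only a
        // few ulps; averaging them yields the best single answer.
        return y == x ? x : (x + y) / 2;
    }

    public static void main(String[] args) {
        System.out.println(f(1.0));                      // converges to phi
        System.out.println(f(-1 / 1.6180339887498949));  // terminates despite oscillating
    }
}
```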
Can anyone tell me why I never come across a situation where the two floating-point numbers are different by a very small amount?
Inadequate testing.
Moral of the story:
Do not rely on floating-point equality alone, except in select cases.
f() was not a select case and deserved additional stopping code.
Ref: Two x with math property: x = 1 + 1/x:
x1 = 1.6180339887498948482045868343656...
x2 = -0.61803398874989484820458683436564...
Note that x1*x2 == -1. x1 is the golden ratio φ.
So, here is my requirement: a quadratic equation has two roots, one an integer and one not, and I want to take only the integer value for further manipulation. I can't figure out how it's done. Can anyone tell me, please? (Java would be better.)
[I do not use Java regularly. Here is a solution using C. As only elementary concepts are used, a Java practitioner should be able to translate it readily.]
Searching the web for “sum of cubes” reveals this page, which tells us the sum of k³ for k from 1 to n is n²·(n+1)²/4.
That is a quartic equation, for which closed-form solutions are known, but we easily see that, for positive n, n²·(n+1)²/4 is between n⁴/4 and (n+1)⁴/4. Then, if m is the sum of the first n cubes, n = floor((4·m)^(1/4)). So, if we have a pow implementation that is faithfully rounded using round-to-nearest (the computed result is one of the two representable values nearest the mathematical result), we can find n with floor(pow(4*m, .25)). If pow is not faithfully rounded, then round(pow(4*m, .25)) will serve over the domain for which pow returns some reasonable result without too much error. (round works because (4·m)^(1/4) never exceeds n by more than ½. Proof omitted, although Wolfram Alpha shows us the limit as n goes to ∞ is ½, and the excess is monotonic.)
Thus, if m is the sum of the first n cubes, then n is the result of round(pow(4*m, .25)). So we can compute this value for n, then compute the sum of the first n cubes as n*n*(n+1)*(n+1)/4 and test whether that equals m. If it does, we found a solution and return it. If it does not, m is not a sum of cubes, and we return −1:
#include <math.h>
#include <stdio.h>

static double findNb(double m)
{
    double n = round(pow(4*m, .25));
    double sum = n * n * (n+1) * (n+1) / 4;
    return m == sum ? n : -1;
}

static void Test(double m)
{
    printf("findNb(%.99g) -> %.99g.\n", m, findNb(m));
}

int main(void)
{
    Test(0);
    Test(1);
    Test(2);
    Test(8);
    Test(9);
    Test(10);
    Test(250500249999.);
    Test(250500250000.);
    Test(250500250001.);
}
Output:
findNb(0) -> 0.
findNb(1) -> 1.
findNb(2) -> -1.
findNb(8) -> -1.
findNb(9) -> 2.
findNb(10) -> -1.
findNb(250500249999) -> -1.
findNb(250500250000) -> 1000.
findNb(250500250001) -> -1.
Of course, the limits of floating-point precision will cause this code to fail once m is too large to be represented exactly in a double.
Use the basic quadratic formula to find the roots.
Store the two roots in separate double variables.
Take each root modulo 1 (the % operator works on doubles). If root % 1 != 0, that root is not an integer.
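The steps above can be sketched in Java like this (class name, tolerance, and the exception are illustrative, and real roots are assumed; note that an exact `root % 1 != 0` test can be fooled by rounding, so a small tolerance is safer in practice):

```java
public class QuadraticIntRoot {
    // Solves a*x^2 + b*x + c = 0 (assuming a real discriminant) and
    // returns the root that is an integer, or throws if neither is.
    public static long integerRoot(double a, double b, double c) {
        double d = Math.sqrt(b * b - 4 * a * c);
        double[] roots = { (-b + d) / (2 * a), (-b - d) / (2 * a) };
        for (double root : roots) {
            // Tolerance instead of (root % 1 != 0), to survive rounding error.
            if (Math.abs(root - Math.rint(root)) < 1e-9) {
                return (long) Math.rint(root);
            }
        }
        throw new IllegalArgumentException("no integer root");
    }

    public static void main(String[] args) {
        // x^2 - 3.5x + 3 = 0 has roots 2 and 1.5
        System.out.println(integerRoot(1, -3.5, 3)); // prints 2
    }
}
```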
I must calculate sin(x) with a Taylor series until the output has 6 decimal places. The argument is an angle. I haven't implemented checking the decimal places yet; I'm just printing successive values (to check that it's working), but after 10-20 iterations it shows infinities/NaNs.
What's wrong with my thinking?
public static void sin(double x){
    double sin = 0;
    int n=1;
    while(1<2){
        sin += (Math.pow(-1,n) / factorial(2*n+1)) * Math.pow(x, 2*n+1);
        n++;
        try {
            Thread.sleep(50);
        } catch (InterruptedException ex) {
        }
        // CHECKING THE PRECISION HERE LATER
        System.out.println(sin);
    }
}
The equation: [image of the Taylor series for sin(x), not reproduced here]
Don't compute each term using factorials and powers! You will rapidly overflow.
Just realize that each next term is -term * x * x / ((n+1)*(n+2)) where n increases by 2 for each term:
double tolerance = 0.0000007; // or whatever limit you want
double sin = 0.;
int n = 1;
double term = x;
while ( Math.abs(term) > tolerance ) {
    sin += term;
    term *= -( (x/(n+1)) * (x/(n+2)) );
    n += 2;
}
To add on to the answer provided by @Xoce (and @FredK), remember that you are computing the Maclaurin series (the special case of the Taylor series about x = 0). While this converges fairly quickly for values within about pi/2 of zero, for x much further out the factorial can explode before the digits converge.
My recommendation is to use the actual Taylor series about the closest value for which sin is known exactly (i.e., the nearest multiple of pi/2), not just about zero. And definitely do the convergence check!
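As a simpler variant of that advice, even just reducing the angle into [-pi, pi] before summing keeps every term small, so the series converges in a handful of steps (a sketch; Math.IEEEremainder does the reduction, and the term recurrence is the one from the previous answer):

```java
public class SinSeries {
    // Maclaurin series for sin after range reduction: the reduced |x| is
    // at most pi, so the terms shrink quickly and no factorial is needed.
    public static double sin(double x) {
        x = Math.IEEEremainder(x, 2 * Math.PI); // bring x into [-pi, pi]
        double sum = 0, term = x;
        int n = 1;
        while (Math.abs(term) > 1e-12) {
            sum += term;
            term *= -(x / (n + 1)) * (x / (n + 2)); // next odd-power term
            n += 2;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sin(100.0));
        System.out.println(Math.sin(100.0)); // reference
    }
}
```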
Problem:
NaN ("not a number") results from invalid arithmetic, such as 0.0/0.0 or infinity minus infinity; you get infinities when you divide by a number that is very small or zero, or when a result overflows the double range.
Solution
This happens because your factorial overflows the int range; at some point it wraps around to zero, and from then on you are dividing by zero.
If your factorial takes an int argument and returns one, change it to use, for example, a BigInteger.
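For instance, a factorial on BigInteger never overflows (a sketch; as a point of reference, an int factorial already overflows at 13! and a long at 21!):

```java
import java.math.BigInteger;

public class Factorial {
    // Exact factorial: BigInteger grows as needed, so no wrap-around.
    public static BigInteger factorial(int n) {
        BigInteger result = BigInteger.ONE;
        for (int i = 2; i <= n; i++) {
            result = result.multiply(BigInteger.valueOf(i));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(factorial(25)); // exact, no overflow
    }
}
```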
When creating a range of numbers as follows,
float increment = 0.01f;
for (int i = 0; i < 100; i++) {
    float value = i * increment;
    System.out.println(value);
}
it is clear that for some i I will end up with values like 0.049999997, which are not exact multiples of 0.01, due to rounding errors.
When I try the same with floats in the range of usual integers, I have never seen the same problem:
float increment = 1.0f; // Still a float but representing an int value
for (int i = 0; i < 100; i++) {
    float value = i * increment;
    System.out.println(value);
}
One could expect that this also prints, e.g., 49.999999 instead of 50, which I have never seen, however.
I am wondering whether I can rely on that for any value of i and any value of increment, as long as increment represents an integer (although its type is float).
And if so, I would be interested in an explanation of why rounding errors cannot happen in that case.
Integers up to 2^24 = 16,777,216 can be represented exactly as a float (the significand has 24 bits). Therefore you don't get rounding errors as long as you work only with them.
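This is easy to verify: every integer up to 2^24 is exact, and the very next integer is not, so adding 1 there silently does nothing:

```java
public class FloatIntegers {
    public static void main(String[] args) {
        float limit = 16_777_216f;               // 2^24, exactly representable
        System.out.println(limit - 1f);          // 16777215 is still exact
        System.out.println(limit + 1f == limit); // true: 16777217 is not representable
    }
}
```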
This is because float uses binary floating-point notation.
In rough terms, it tries to represent your decimal number as a sum of powers-of-two fractions: 1/2^n1 + 1/2^n2 + 1/2^n3 + ... + 1/2^nm, getting as close as it can to the value you wrote, or hitting it exactly.
For example (roughly):
0.5 is represented as 1/2
0.25 is represented as 1/2²
0.1 has no finite representation of this form; the closest float is 1/2^4 multiplied by the mantissa 1.600000023841858, which gives 0.100000001, close to but not equal to 0.1.
Now you can see why, after some loop iterations, the value drifts to unexpected values.
For the full details of how this works, read about the IEEE 754 floating-point standard.
If you want exact decimal arithmetic, use, for example, Java's BigDecimal, which uses a different representation for decimal numbers.
double has the same problem.
Check this tool to see the representation of floating-point numbers:
http://www.h-schmidt.net/FloatConverter/IEEE754.html
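For instance, the 0.01 loop from the question stays exact once BigDecimal is used with its String constructor (the String matters: new BigDecimal(0.01) would capture the already-inexact double value):

```java
import java.math.BigDecimal;

public class ExactIncrement {
    public static void main(String[] args) {
        BigDecimal increment = new BigDecimal("0.01"); // String ctor: exact decimal
        BigDecimal value = BigDecimal.ZERO;
        for (int i = 0; i < 100; i++) {
            value = value.add(increment); // decimal addition, no rounding here
        }
        System.out.println(value); // prints 1.00, exactly
    }
}
```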
It doesn't really represent an integer; it's still a float to which you keep adding 1.0. You will get rounding errors as soon as the value grows past 2^24, because from there on a float can no longer represent every integer, and adding 1.0 may not change the value at all.
The double value drifts unexpectedly.
for (double i = -1; i <= 1; i += 0.1)
{
    for (double j = -1; j <= 1; j += 0.1)
    {
        // logic
        System.out.print(i);
        System.out.print(j);
    }
}
Here, the values come out like:
-1, -0.9, -0.8, -0.69, -0.51, ..., -0.099, 1.007 (why are you greater than 1?)
The output is not exactly this, but something like it.
But I want the exact values only. What should I do?
You can use an integer counter, and multiply to get the double:
for (int i = -10; i <= 10; i++) {
    double iDouble = 0.1 * i;
    ....
}
The double will still have rounding error - that is inevitable - but the rounding error will not affect the loop count.
You can't get exact values due to the limitations of doubles: they can't always represent exactly the decimal you want, and they have precision errors. In your case you might round the double to an int for the comparison, but as @alex said, you shouldn't be doing this.
This is due to the way doubles are stored in memory: they are exact only if the fractional part is a sum of negative powers of two, e.g., 0, 1/2, 1/4, 3/4, and so on. This is also why you should not compare doubles with equality, but rather with > and <. For exact calculations, you could use BigDecimal, constructed from a String so the decimal itself is captured exactly:
BigDecimal bigDecimal = new BigDecimal("123.45");
bigDecimal = bigDecimal.add(new BigDecimal("123.45"));
System.out.println(bigDecimal); // prints 246.90
As said here, floating-point variables must not be used as loop counters. Limited-precision IEEE 754 floating-point types cannot represent:
all simple fractions exactly
all decimals precisely, even when the decimals can be represented in a small number of digits.
all digits of large values, meaning that incrementing a large floating-point value might not change that value within the available precision.
(...) Using floating-point loop counters can lead to unexpected behavior.
Instead, use an integer loop counter and derive the double from it inside the loop, like this:
for (int count = 0; count <= 20; count += 1) {
    double x = -1 + count * 0.1;
    /* ... */
}