Since the trigonometric functions in java.lang.Math are quite slow: is there a library that does a quick and good approximation? It seems possible to do a calculation several times faster without losing much precision. (On my machine a multiplication takes 1.5ns, and java.lang.Math.sin 46ns to 116ns). Unfortunately there is not yet a way to use the hardware functions.
UPDATE: The functions should be accurate enough, say, for GPS calculations. That means you would need at least 7 decimal digits accuracy, which rules out simple lookup tables. And it should be much faster than java.lang.Math.sin on your basic x86 system. Otherwise there would be no point in it.
For values over pi/4, Java does some expensive computations in addition to the hardware functions. It does so for a good reason, but sometimes you care more about speed than about last-bit accuracy.
Computer Approximations by Hart tabulates Chebyshev-economized approximate formulas for a bunch of functions at different precisions.
Edit: Getting my copy off the shelf, it turned out to be a different book that just sounds very similar. Here's a sin function using its tables. (Tested in C since that's handier for me.) I don't know if this will be faster than the Java built-in, but it's guaranteed to be less accurate, at least. :) You may need to range-reduce the argument first; see John Cook's suggestions. The book also has arcsin and arctan.
#include <math.h>
#include <stdio.h>
// Return an approx to sin(pi/2 * x) where -1 <= x <= 1.
// In that range it has a max absolute error of 5e-9
// according to Hastings, Approximations For Digital Computers.
static double xsin (double x) {
    double x2 = x * x;
    return ((((.00015148419 * x2
        - .00467376557) * x2
        + .07968967928) * x2
        - .64596371106) * x2
        + 1.57079631847) * x;
}

int main () {
    double pi = 4 * atan (1);
    printf ("%.10f\n", xsin (0.77));
    printf ("%.10f\n", sin (0.77 * (pi/2)));
    return 0;
}
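Since this is a Java question, here is a rough Java port of the same polynomial with a naive range reduction bolted on (my own sketch; fastSin is a made-up name, and the modulo-based reduction will itself lose accuracy for very large arguments):

// Approximates sin(pi/2 * x) for -1 <= x <= 1 (Hastings coefficients above).
static double xsin(double x) {
    double x2 = x * x;
    return ((((.00015148419 * x2
        - .00467376557) * x2
        + .07968967928) * x2
        - .64596371106) * x2
        + 1.57079631847) * x;
}

// Naive range reduction so the approximation accepts any argument.
static double fastSin(double x) {
    double t = x / (Math.PI / 2) % 4.0;   // position in quarter-circles
    if (t < 0) t += 4.0;                  // now 0 <= t < 4
    if (t > 2.0) return -fastSin((t - 2.0) * (Math.PI / 2)); // sin(x + pi) = -sin(x)
    if (t > 1.0) t = 2.0 - t;             // sin(pi - x) = sin(x)
    return xsin(t);
}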
Here is a collection of low-level tricks for quickly approximating trig functions. There is example code in C which I find hard to follow, but the techniques are just as easily implemented in Java.
Here's my equivalent implementation of invsqrt and atan2 in Java.
I could have done something similar for the other trig functions, but I have not found it necessary as profiling showed that only sqrt and atan/atan2 were major bottlenecks.
public class FastTrig
{
    /** Fast approximation of 1.0 / sqrt(x).
     * See http://www.beyond3d.com/content/articles/8/
     * @param x Positive value to estimate inverse of square root of
     * @return Approximately 1.0 / sqrt(x)
     **/
    public static double invSqrt(double x)
    {
        double xhalf = 0.5 * x;
        long i = Double.doubleToRawLongBits(x);
        i = 0x5FE6EB50C7B537AAL - (i >> 1);
        x = Double.longBitsToDouble(i);
        x = x * (1.5 - xhalf * x * x);
        return x;
    }

    /** Approximation of arctangent.
     * Slightly faster and substantially less accurate than
     * {@link Math#atan2(double, double)}.
     **/
    public static double fast_atan2(double y, double x)
    {
        double d2 = x * x + y * y;

        // Bail out if d2 is NaN, zero or subnormal
        if (Double.isNaN(d2) ||
            (Double.doubleToRawLongBits(d2) < 0x10000000000000L))
        {
            return Double.NaN;
        }

        // Normalise such that 0.0 <= y <= x
        boolean negY = y < 0.0;
        if (negY) { y = -y; }
        boolean negX = x < 0.0;
        if (negX) { x = -x; }
        boolean steep = y > x;
        if (steep)
        {
            double t = x;
            x = y;
            y = t;
        }

        // Scale to unit circle (0.0 <= y <= x <= 1.0)
        double rinv = invSqrt(d2); // rinv ≅ 1.0 / hypot(x, y)
        x *= rinv; // x ≅ cos θ
        y *= rinv; // y ≅ sin θ, hence θ ≅ asin y

        // Hack: we want: ind = floor(y * 256)
        // We deliberately force truncation by adding floating-point numbers whose
        // exponents differ greatly. The FPU will right-shift y to match exponents,
        // dropping all but the first 9 significant bits, which become the 9 LSBs
        // of the resulting mantissa.
        // Inspired by a similar piece of C code at
        // http://www.shellandslate.com/computermath101.html
        double yp = FRAC_BIAS + y;
        int ind = (int) Double.doubleToRawLongBits(yp);

        // Find φ (a first approximation of θ) from the LUT
        double φ = ASIN_TAB[ind];
        double cφ = COS_TAB[ind]; // cos(φ)

        // sin(φ) == ind / 256.0
        // Note that sφ is truncated, hence not identical to y.
        double sφ = yp - FRAC_BIAS;
        double sd = y * cφ - x * sφ; // sin(θ-φ) ≡ sinθ cosφ - cosθ sinφ

        // asin(sd) ≅ sd + ⅙sd³ (from first 2 terms of Maclaurin series)
        double d = (6.0 + sd * sd) * sd * ONE_SIXTH;
        double θ = φ + d;

        // Translate back to correct octant
        if (steep) { θ = Math.PI * 0.5 - θ; }
        if (negX) { θ = Math.PI - θ; }
        if (negY) { θ = -θ; }
        return θ;
    }

    private static final double ONE_SIXTH = 1.0 / 6.0;
    private static final int FRAC_EXP = 8; // LUT precision == 2 ** -8 == 1/256
    private static final int LUT_SIZE = (1 << FRAC_EXP) + 1;
    private static final double FRAC_BIAS =
        Double.longBitsToDouble((0x433L - FRAC_EXP) << 52);
    private static final double[] ASIN_TAB = new double[LUT_SIZE];
    private static final double[] COS_TAB = new double[LUT_SIZE];

    static
    {
        /* Populate trig tables */
        for (int ind = 0; ind < LUT_SIZE; ++ind)
        {
            double v = ind / (double) (1 << FRAC_EXP);
            double asinv = Math.asin(v);
            COS_TAB[ind] = Math.cos(asinv);
            ASIN_TAB[ind] = asinv;
        }
    }
}
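A quick sanity check (my own snippet, not part of the original class):

public class FastTrigDemo {
    public static void main(String[] args) {
        System.out.println(FastTrig.fast_atan2(1.0, 1.0)); // ≈ 0.785398 (pi/4)
        System.out.println(Math.atan2(1.0, 1.0));          // 0.7853981633974483
        System.out.println(FastTrig.invSqrt(4.0));         // ≈ 0.5, to about 3 decimals after one Newton step
    }
}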
This might do it: http://sourceforge.net/projects/jafama/
I'm surprised that the built-in Java functions would be so slow. Surely the JVM is calling the native trig functions on your CPU, not implementing the algorithms in Java. Are you certain your bottleneck is calls to trig functions and not some surrounding code? Maybe some memory allocations?
Could you rewrite in C++ the part of your code that does the math? Just calling C++ code to compute trig functions probably wouldn't speed things up, but moving some context too, like an outer loop, to C++ might speed things up.
If you must roll your own trig functions, don't use Taylor series alone. The CORDIC algorithms are much faster unless your argument is very small. You could use CORDIC to get started, then polish the result with a short Taylor series. See this StackOverflow question on how to implement trig functions.
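For concreteness, here is a bare-bones rotation-mode CORDIC in Java (my own sketch: table-driven, with the argument already reduced to [-pi/2, pi/2]; in a real fixed-point implementation the multiplications by pow2 become bit shifts, which is where the speed comes from):

class Cordic {
    static final int ITERS = 40;
    static final double[] ATAN = new double[ITERS];
    static final double GAIN; // product of cos(atan(2^-i)), pre-applied below

    static {
        double k = 1.0;
        for (int i = 0; i < ITERS; i++) {
            ATAN[i] = Math.atan(Math.pow(2, -i)); // one-time table setup
            k *= Math.cos(ATAN[i]);
        }
        GAIN = k;
    }

    // Returns { cos(theta), sin(theta) } for theta in [-pi/2, pi/2].
    static double[] sinCos(double theta) {
        double x = GAIN, y = 0.0, z = theta, pow2 = 1.0;
        for (int i = 0; i < ITERS; i++) {
            double d = (z >= 0.0) ? 1.0 : -1.0; // rotate toward the residual angle
            double nx = x - d * pow2 * y;
            double ny = y + d * pow2 * x;
            x = nx;
            y = ny;
            z -= d * ATAN[i];
            pow2 *= 0.5;
        }
        return new double[] { x, y };
    }

    public static void main(String[] args) {
        System.out.println(sinCos(Math.PI / 6)[1]); // ≈ 0.5
    }
}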
On x86, the java.lang.Math sin and cos functions do not directly call the hardware functions, because Intel didn't always do such a good job implementing them. There is a nice explanation in bug #4857011.
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4857011
You might want to think hard about accepting an inexact result. It's amusing how often I spend time finding this in others' code.
"But the comment says Sin..."
You could pre-store your sin and cos in an array if you only need some approximate values.
For example, if you want to store the values from 0° to 360°:
double[] sin = new double[360];
for (int i = 0; i < sin.length; ++i) sin[i] = Math.sin(i / 180.0 * Math.PI);
you then use this array using degrees/integers instead of radians/double.
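A lookup helper might then look like this (a sketch; lookupSin is a made-up name, and the double modulo handles negative angles):

static double lookupSin(int degrees) {
    int d = ((degrees % 360) + 360) % 360; // wrap into [0, 360)
    return sin[d];
}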
I haven't heard of any libs, probably because it's rare enough to see trig-heavy Java apps. It's also easy enough to roll your own with JNI (same precision, better performance), numerical methods (variable precision/performance), or a simple approximation table.
As with any optimization, best to test that these functions are actually a bottleneck before bothering to reinvent the wheel.
Trigonometric functions are the classical example for a lookup table. See the excellent lookup table article on Wikipedia.
If you're looking for a J2ME library, you can try the fixed-point integer math library MathFP.
The java.lang.Math functions call the hardware functions. There are simpler approximations you could make, but they won't be as accurate.
On my laptop, sin and cos take about 144 ns.
In my sin/cos test I evaluated the functions for the integers from zero to one million. I assume that 144 ns is not fast enough for you.
Do you have a specific requirement for the speed you need?
Can you quantify your requirement as an acceptable time per operation?
Check out Apache Commons Math package if you want to use existing stuff.
If performance is really of the essence, then you can go about implementing these functions yourself using standard math methods, specifically Taylor/Maclaurin series.
For example, here are several Taylor series expansions that might be useful (taken from Wikipedia):
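sin x = x - x^3/3! + x^5/5! - x^7/7! + ...
cos x = 1 - x^2/2! + x^4/4! - x^6/6! + ...
tan x = x + x^3/3 + 2x^5/15 + 17x^7/315 + ...   (for |x| < pi/2)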
Could you elaborate on what you need to do if these routines are too slow. You might be able to do some coordinate transformations ahead of time some way or another.
I have to write a program where I enter a, b, c, d (the coefficients of a cubic equation) and get X1, X2, X3 (the roots of the equation) as the result. I have to use Viète's formulas and BigDecimal for this, because my lecturer requires it.
I came to the conclusion that I have to solve the following system of equations:
x1+x2+x3=-b/a
x1*x2+x1*x3+x2*x3=c/a
x1*x2*x3=-d/a
I have no idea how I can do it in Java.
I tried to use the JAMA package, but I don't think I can use it to solve such a system of equations.
How can I do that?
If you want to find the roots of a cubic polynomial in Java you can do it easily using the Newton-Raphson method.
The algorithm:
1. Input: initial x, func(x), derivFunc(x)
   Output: root of func()
2. Compute the values of func(x) and derivFunc(x) for the given initial x
3. Compute h: h = func(x) / derivFunc(x)
4. While |h| is greater than the allowed error ε:
   - h = func(x) / derivFunc(x)
   - x = x - h
Here is a demonstration for solving the cubic equation x^3 - x^2 + 2 = 0.
class XYZ {
    static final double EPSILON = 0.001;

    // An example function whose root
    // is determined using the Newton-Raphson method.
    // The function is x^3 - x^2 + 2
    static double func(double x)
    {
        return x * x * x - x * x + 2;
    }

    // Derivative of the above function,
    // which is 3*x^2 - 2*x
    static double derivFunc(double x)
    {
        return 3 * x * x - 2 * x;
    }

    // Function to find the root
    static void newtonRaphson(double x)
    {
        double h = func(x) / derivFunc(x);
        while (Math.abs(h) >= EPSILON)
        {
            h = func(x) / derivFunc(x);
            // x(i+1) = x(i) - f(x) / f'(x)
            x = x - h;
        }
        System.out.print("The value of the"
                + " root is : "
                + Math.round(x * 100.0) / 100.0);
    }

    // Driver code
    public static void main(String[] args)
    {
        // Initial value assumed
        double x0 = -20;
        newtonRaphson(x0);
    }
}
Output: The value of the root is : -1.0
To do it your way you would have to solve a system of non-linear equations, which is harder but can be done using the multivariate Newton-Raphson method. You might want to look it up. Also note that this is an approximate method that refines an initial guess of your own (in this case -20).
The Newton (Raphson, Kantorovich) method applied to the Viète equations gives you the (Weierstrass-)Durand-Kerner method of simultaneous root approximation. However, in the completed method you will no longer see the Viète identities; they essentially cancel out. You will need complex numbers on top of the required real-number data type.
If you go with the simple Newton method as in the other answer, then after computing the one real root you can split off the corresponding linear factor via the Horner-Ruffini scheme and solve the remaining quadratic equation directly. Then you only need to consider the possibly complex nature of the roots when constructing the output strings, as the real and imaginary parts have easy direct formulas.
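A minimal sketch of that deflation approach (my own code, in plain double rather than the BigDecimal the assignment demands, and with no safeguards against Newton failing to converge; solveCubic is a made-up name):

static void solveCubic(double a, double b, double c, double d) {
    // 1. Find one real root r with Newton's method (every real cubic has one).
    double r = 1.0;
    for (int i = 0; i < 100; i++) {
        double f  = ((a * r + b) * r + c) * r + d;  // Horner evaluation
        double fp = (3 * a * r + 2 * b) * r + c;    // derivative
        if (fp == 0.0) { r += 1.0; continue; }      // dodge a flat spot
        double step = f / fp;
        r -= step;
        if (Math.abs(step) < 1e-12) break;
    }
    // 2. Deflate by (x - r) via the Horner-Ruffini scheme:
    //    a*x^3 + b*x^2 + c*x + d = (x - r) * (a*x^2 + p*x + q) + remainder
    double p = b + a * r;
    double q = c + p * r;
    // 3. Solve the remaining quadratic a*x^2 + p*x + q = 0 directly.
    double disc = p * p - 4 * a * q;
    System.out.println("x1 = " + r);
    if (disc >= 0) {
        double s = Math.sqrt(disc);
        System.out.println("x2 = " + ((-p + s) / (2 * a)));
        System.out.println("x3 = " + ((-p - s) / (2 * a)));
    } else {
        double re = -p / (2 * a), im = Math.sqrt(-disc) / (2 * a);
        System.out.println("x2 = " + re + " + " + im + "i");
        System.out.println("x3 = " + re + " - " + im + "i");
    }
}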
I'm a bit annoyed with a method I wrote to approximate the sine function in Java. Here it is; it's based on the Taylor series.
static double PI = 3.14159265358979323846;
static double eps = 0.0000000000000000001;
static void sin(double x) {
    x = x % (2 * PI);
    double term = 1.0;
    double res = 0.0;
    for (int i = 1; term > eps; i++) {
        term = term * (x / i);
        if (i % 4 == 1) res += term;
        if (i % 4 == 3) res -= term;
    }
    System.out.println(res);
}
For small values I get a very good approximation of sine, but for large values (e.g. pow(10, 22)) the results seem very, very wrong.
Here are the results:
sin(pow(10,22)) // 0.8740280612007599
Math.sin(pow(10,22)) // -0.8522008497671888
Does someone have an idea? Thank you!
Be reassured that the Java sin function will be off too.
Your problem is that the Taylor expansion for sin has a small radius of convergence, and convergence is slow even if you're within that radius.
There are floating point considerations too: a floating point double gives you about 15 significant figures of accuracy.
So for large arguments to sin, the accuracy will deteriorate significantly, especially given that sin is a periodic function:
sin(x + 2 * pi * n) = sin(x) for any integer n.
Your answer is incorrect for big numbers because you accumulate a lot of rounding errors due to the double representation. When the number is big, your for loop will iterate many times before the term becomes smaller than epsilon, and in each iteration a rounding error is accumulated. The result is a very big error in the final value. Read a good reference on numerical analysis. In any case, a Taylor series approximates sin near 0, by definition, so it is normal for it not to be correct for very big numbers.
The difference actually has nothing to do with the radius of convergence of the Taylor Series and has to do with double precision not being accurate enough to hold the precision required for such big numbers. The radius of the Taylor series for the sine function is infinity.
10^22 is approximately 2^73. Since the mantissa for a double precision number is 52 bits, consecutive values that can be stored with double precision format will be 2^21 apart from each other. Since an evaluation of the sine function requires more resolution than that, you won't be able to reliably get an answer.
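You can see the gap between consecutive doubles at that magnitude directly:

System.out.println(Math.ulp(1e22)); // 2097152.0, i.e. 2^21
// The sine's period, 2*pi ≈ 6.28, is vastly smaller than this spacing,
// so a double near 1e22 cannot even identify which cycle of sin it falls in.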
I am brand new to Java, second day! I want to generate samples with a normal distribution. I am using inverse transformation.
Basically, I want to find the normal cumulative distribution function, then find its inverse, and use that to generate samples.
My question is: Is there a built-in function for the inverse normal CDF? Or do I have to hand-code it?
I have seen people refer to this being in Apache Commons. Is this built in? Or do I have to download it?
If I have to do it myself, can you give me some tips? If I download it, doesn't my prof also have to have the "package" or special file installed?
Thanks in advance!
Edit: Just found out I can't use libraries. I also heard there is a simpler way to generate normals using radians.
As it is mentioned here:
Apache Commons - Math has what you are looking for.
More specifically, check out the NormalDistributionImpl class.
And no, your professor doesn't need to download anything if you provide him with all the needed libraries.
UPDATE :
If you want to hand code it (I don't know the actual formula), you can check the following link:
http://home.online.no/~pjacklam/notes/invnorm/
Two people have implemented it in Java: http://home.online.no/~pjacklam/notes/invnorm/#Java
I had the same problem and found a solution; the following code gives results for the cumulative distribution function just like Excel does:

private static double erf(double x)
{
    // A&S formula 7.1.26
    double a1 = 0.254829592;
    double a2 = -0.284496736;
    double a3 = 1.421413741;
    double a4 = -1.453152027;
    double a5 = 1.061405429;
    double p = 0.3275911;
    x = Math.abs(x);
    double t = 1 / (1 + p * x);
    // Direct calculation using formula 7.1.26 is absolutely correct,
    // but evaluating the nth-order polynomial term by term takes O(n^2) operations:
    // return 1 - (a1 * t + a2 * t * t + a3 * t * t * t + a4 * t * t * t * t
    //         + a5 * t * t * t * t * t) * Math.exp(-1 * x * x);
    // Horner's method takes O(n) operations for an nth-order polynomial:
    return 1 - ((((((a5 * t + a4) * t) + a3) * t + a2) * t) + a1) * t * Math.exp(-1 * x * x);
}

public static double NORMSDIST(double z)
{
    double sign = 1;
    if (z < 0) sign = -1;
    double result = 0.5 * (1.0 + sign * erf(Math.abs(z) / Math.sqrt(2)));
    return result;
}
Mathematically, this is a hard problem, and there are a few solutions you might consider.
Disclaimer: mathematical jargon ahead.
As you probably already know, the normalcdf function is used to calculate probabilities of normal random variables. Because a normal distribution is continuous, the corresponding probability density function (normalpdf) does not itself give probabilities, (in contrast to discrete distributions like binomial or geometric distributions). Instead, the area under the curve gives the probability that the normal random variable falls within a range of values. So, the normalcdf function you seek is the area under a section of the normalpdf function.
Mathematically, finding the area under a continuous curve is a fundamental problem of calculus. The solution to this type of problem is called an integral and integrating a function over a range of numbers means finding the area under the curve and between the lowest value in the range to the highest.
In most circumstances, we could just integrate the pdf to get the cdf, then evaluate it wherever we want. The heart of the problem, and the reason that an algorithm in Java is not as simple as one might think, is that the normalpdf does not have a closed-form integral: its value cannot be calculated exactly in any finite number of steps. So, values of the normalcdf function are particularly elusive.
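Concretely, the quantity we are after is the standard textbook definition:

normalcdf(z) = integral from -infinity to z of (1 / sqrt(2*pi)) * exp(-t^2 / 2) dt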
There are two main classes of solutions for the problem.
1. Numerical Integration Techniques
Numerical integration techniques solve the problem by approximating the area under the curve geometrically. The area is divided into rectangles or other shapes of equal or varying widths, with the height of each given by the pdf. The sum of the areas of the rectangles approximates the area under the curve, which is the corresponding probability. These techniques can compute values to arbitrary precision, but are more computationally expensive than class 2. Using better approximations (e.g. Simpson's rule) improves the computation. A simple numerical integration method follows.
public static double normCDF(double z)
{
    double leftEndpoint = -100;
    int nRectangles = 100000;
    double runningSum = 0;
    double x;
    for (int n = 0; n < nRectangles; n++) {
        x = leftEndpoint + n * (z - leftEndpoint) / nRectangles;
        runningSum += Math.pow(Math.sqrt(2 * Math.PI), -1)
                * Math.exp(-Math.pow(x, 2) / 2) * (z - leftEndpoint) / nRectangles;
    }
    System.out.println(runningSum);
    return runningSum;
}
2. Analytic Techniques
Analytic techniques take advantage of the fact that while the normalpdf does not have a closed-form integral, the pdf can be "converted" to a sum called a Taylor series, then integrated term by term. Basically, this turns the pdf into a sum of infinitely many simple functions, integrates each one analytically, and adds the integrals together. Since this is an analytic procedure, a programmer need only include the integral series in the program after computing the coefficients. The precision of the result just depends on how many terms of the sum you include in the calculation, and it tends to approach accurate values much sooner than numerical integration techniques. For example, the solution by Mohammad Aldefrawy computes just five coefficients. Below is a method that includes the computation of coefficients, so one could compute values to arbitrary precision (the normalcdf series isn't computed directly; instead, the coefficients of the related error function are computed and then converted by a linear transformation). However, since the computation of the coefficients involves the factorial function, one runs into overflow issues for substantially large numbers of coefficients. Thankfully, this method returns values with much higher precision in a fraction of the iterations required by methods in the previous class of solutions to yield similar results.
public static double normalCDF(double x) {
    System.out.println(0.5 * (1 + erf(x / Math.sqrt(2))));
    return 0.5 * (1 + erf(x / Math.sqrt(2)));
}

public static double erf(double z)
{
    int nTerms = 315;
    double runningSum = 0;
    for (int n = 0; n < nTerms; n++) {
        runningSum += Math.pow(-1, n) * Math.pow(z, 2 * n + 1) / (factorial(n) * (2 * n + 1));
    }
    return (2 / Math.sqrt(Math.PI)) * runningSum;
}

static double factorial(int n) {
    if (n == 0) return 1;
    if (n == 1) return 1;
    return n * factorial(n - 1);
}
Other functions
For the inverse function, since we used the error function in the normalCDF method, we can use the inverse error function in a similar way. Again, we obtain the coefficients of the inverse error function analytically, then compute them as needed in the method.
public static double invErf(double z)
{
    int nTerms = 315;
    double runningSum = 0;
    double[] a = new double[nTerms + 1];
    double[] c = new double[nTerms + 1];
    c[0] = 1;
    for (int n = 1; n < nTerms; n++) {
        double runningSum2 = 0;
        for (int k = 0; k <= n - 1; k++) {
            runningSum2 += c[k] * c[n - 1 - k] / ((k + 1) * (2 * k + 1));
        }
        c[n] = runningSum2;
    }
    for (int n = 0; n < nTerms; n++) {
        a[n] = c[n] / (2 * n + 1);
        runningSum += a[n] * Math.pow(0.5 * Math.sqrt(Math.PI) * z, 2 * n + 1);
    }
    return runningSum;
}

public static double invNorm(double A) {
    return (2 / Math.sqrt(2)) * invErf(2 * A - 1);
}
I don't have a method for the lognormal function, but you could obtain one using the same idea.
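To tie this back to the original question, inverse-transform sampling with the invNorm above would look something like this (my own sketch; mu and sigma are illustrative parameters, and u == 0.0 would need special-casing in production code):

java.util.Random rng = new java.util.Random();
double mu = 0.0, sigma = 1.0;            // desired mean and standard deviation
double u = rng.nextDouble();             // uniform sample in [0, 1)
double sample = mu + sigma * invNorm(u); // normally distributed sample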
I never tried it, but the guys from the algo team were using Colt, and they were happy with the results.
I have a piece of code that needs to do many computations based on double values, and it takes too much time. Can I speed it up by dropping some decimals? If I use a formatter to parse the double, won't that do the calculation first and then shed the extra decimals, so nothing would be gained? What's the best way of doing this?
Just something to get an idea:
double avgRatingForPreferredItem = (double) tempAverageRating.get(matrix.get(0).getItemID1())/matrix.size();
double avgRatingForRandomItem = (double) tempAverageRating.get(matrix.get(0).getItemID2())/matrix.size();
double numarator = 0;
for (MatrixColumn matrixCol : matrix) {
    numarator += (matrixCol.getRatingForItemID1() - avgRatingForPreferredItem)
            * (matrixCol.getRatingForItemID2() - avgRatingForRandomItem);
}
double numitor = 0;
double numitorStanga = 0;
double numitorDreapta = 0;
for (MatrixColumn matrixCol : matrix) {
    numitorStanga += (matrixCol.getRatingForItemID1() - avgRatingForPreferredItem)
            * (matrixCol.getRatingForItemID1() - avgRatingForPreferredItem);
    numitorDreapta += (matrixCol.getRatingForItemID2() - avgRatingForRandomItem)
            * (matrixCol.getRatingForItemID2() - avgRatingForRandomItem);
}
numitor = Math.sqrt( numitorStanga * numitorDreapta );
double corelare = numarator/numitor;
I don't believe the actual values involved can make any difference.
It's worth at least trying to reduce the computations here:
for (MatrixColumn matrixCol : matrix) {
    numitorStanga += (matrixCol.getRatingForItemID1() - avgRatingForPreferredItem)
            * (matrixCol.getRatingForItemID1() - avgRatingForPreferredItem);
    numitorDreapta += (matrixCol.getRatingForItemID2() - avgRatingForRandomItem)
            * (matrixCol.getRatingForItemID2() - avgRatingForRandomItem);
}
It depends on how smart the JIT compiler is - and I'm assuming getRatingForItemID1 and getRatingForItemID2 are just pass-through properties - but your code at least looks like it's doing redundant subtractions. So:
for (MatrixColumn matrixCol : matrix) {
    double diff1 = matrixCol.getRatingForItemID1() - avgRatingForPreferredItem;
    double diff2 = matrixCol.getRatingForItemID2() - avgRatingForRandomItem;
    numitorStanga += diff1 * diff1;
    numitorDreapta += diff2 * diff2;
}
You could try changing everything to float instead of double - on some architectures that may make things faster; on others it may well not.
Are you absolutely sure that it's the code you've shown which has the problem, though? It's only an O(N) algorithm - how long is it taking, and how large is the matrix?
Floating-point calculations are the same speed regardless of the number of decimal places; the hardware operates on the complete value every time anyway. Also keep in mind that the number of decimal places is irrelevant here: double stores numbers in binary, and truncating decimal places could well produce a binary representation of the same length.
Another way to make this faster is to use arrays instead of objects. The problem with using objects is you have no idea how they are arranged in memory (often badly in my experience as the JVM doesn't optimise for this at all)
double avgRatingForPreferredItem = (double) tempAverageRating.get(matrix.get(0).getItemID1()) / matrix.size();
double avgRatingForRandomItem = (double) tempAverageRating.get(matrix.get(0).getItemID2()) / matrix.size();
double[] ratingForItemID1 = matrix.getRatingForItemID1();
double[] ratingForItemID2 = matrix.getRatingForItemID2();
double numarator = 0, numitorStanga = 0, numitorDreapta = 0;
for (int i = 0; i < ratingForItemID1.length; i++) {
    double rating1 = ratingForItemID1[i] - avgRatingForPreferredItem;
    double rating2 = ratingForItemID2[i] - avgRatingForRandomItem;
    numarator += rating1 * rating2;
    numitorStanga += rating1 * rating1;
    numitorDreapta += rating2 * rating2;
}
double numitor = Math.sqrt(numitorStanga * numitorDreapta);
double corelare = numarator / numitor;
Accessing data continuously in memory can be 5x faster than random access.
You might be able to speed up your algorithm (depending on the value range used) by converting your floating-point values to long values scaled by the number of decimal places you need, e.g. value * 10000 for 4 decimal places.
If you choose to do this, you will need to keep the scale in mind for division and multiplication (numitorDreapta += (diff2 * diff2) / 10000;), which does add some clutter to your code.
You will need to convert before and after, but if you need to do a lot of calculations using integer arithmetic instead of floating point might yield the speedup you are looking for.
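A sketch of what that might look like (my own illustration; SCALE = 10000 gives 4 decimal places, and descaling after each multiplication is the part that's easy to get wrong):

class FixedPointSketch {
    static final long SCALE = 10000; // 4 decimal places

    static long toFixed(double v) { return Math.round(v * SCALE); }
    static double fromFixed(long v) { return v / (double) SCALE; }

    public static void main(String[] args) {
        long a = toFixed(1.2345);
        long b = toFixed(2.5);
        long sum = a + b;               // addition needs no adjustment
        long product = (a * b) / SCALE; // a product carries SCALE^2, so divide once
        System.out.println(fromFixed(sum));     // 3.7345
        System.out.println(fromFixed(product)); // 3.0862 (exact: 3.08625; note the truncation)
    }
}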
How can I multiply and divide without using arithmetic operators? I read a similar question here, but I still have problems with multiplication and division.
Also, how can square root be calculated without using math functions?
If you have addition and negation, as in the highest-voted answer to the post you linked, you can use looped additions and subtractions to implement multiplication and division.
As for the square root, just implement Newton's iteration on the basis of the operations from step 1.
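A rough sketch of both steps (my own code; the integer square root uses Newton's iteration, and the division there is assumed to come from step 1 but is written with / for brevity):

class NoArithmetic {
    // Step 1: addition from bitwise operations only.
    static int add(int a, int b) {
        while (b != 0) {
            int carry = (a & b) << 1; // bits that overflow into the next column
            a = a ^ b;                // add without carry
            b = carry;
        }
        return a;
    }

    // Multiplication by shift-and-add (non-negative b for simplicity).
    static int multiply(int a, int b) {
        int result = 0;
        while (b != 0) {
            if ((b & 1) != 0) result = add(result, a);
            a <<= 1;
            b >>>= 1;
        }
        return result;
    }

    // Step 2: integer square root via Newton's iteration.
    static int isqrt(int n) {
        if (n < 2) return n;
        int x = n;
        int y = (x + n / x) / 2;
        while (y < x) {
            x = y;
            y = (x + n / x) / 2;
        }
        return x;
    }

    public static void main(String[] args) {
        System.out.println(multiply(7, 6)); // 42
        System.out.println(isqrt(81));      // 9
    }
}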
One example using bitwise operators can be found here:
http://geeki.wordpress.com/2007/12/12/adding-two-numbers-with-bitwise-and-shift-operators/
Addition can be built up into multiplication and division. For sqrt you could use a Taylor series.
http://en.wikipedia.org/wiki/Taylor_series
Fast square root function (even faster than the library function!):
EDIT: not true; it's actually slower now because of recent hardware improvements. This is, however, the code used in Quake II.
double fsqrt (double y)
{
    double x, z, tempf;
    unsigned long *tfptr = ((unsigned long *)&tempf) + 1;

    tempf = y;
    *tfptr = (0xbfcdd90a - *tfptr) >> 1;  /* estimate of 1/sqrt(y) */
    x = tempf;
    z = y * 0.5;                          /* hoist out the "/2" */
    x = (1.5 * x) - (x * x) * (x * z);    /* iteration formula */
    x = (1.5 * x) - (x * x) * (x * z);
    // x = (1.5*x) - (x*x)*(x*z);         /* not necessary in games */
    return x * y;
}
}