Converting base of floating point number without losing precision - java

Terminology
In this question I call a "floating point number" a "decimal number" to avoid ambiguity with the float/double Java primitive data types. The term "decimal" here has nothing to do with "base 10".
Background
I am expressing a decimal number of any base in this way:
class Decimal{
    int[] digits;
    int exponent;
    int base;
    int signum;
}
which approximately expresses this double value:
public double toDouble(){
    if(signum == 0) return 0d;
    double out = 0d;
    for(int i = digits.length - 1, j = 0; i >= 0; i--, j++){
        out += digits[i] * Math.pow(base, j + exponent);
    }
    return out * signum;
}
I am aware that some conversions are not possible. For example, it is not possible to convert 0.1 (base 3) to base 10 exactly, because the result is a recurring decimal. Similarly, converting 0.1 (base 9) to base 3 is not possible, but converting 0.3 (base 3) is possible. There are probably other cases that I have not considered.
The traditional way
The traditional (by-hand) way to change the base of an integer from base 10 to base 2 is to divide the number by powers of 2; going from base 2 to base 10, you multiply each digit by the corresponding power of 2 and add the results. Changing from base x to base y usually involves converting to base 10 as an intermediate.
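To make the by-hand procedure concrete, here is a minimal Java sketch for values that fit in an int (an assumption; the class and method names are illustrative and not part of the question's Decimal class). Horner's rule reads the digits in the old base, and repeated division by the new base writes them out again. The intermediate here is Java's binary int rather than literally base 10, but it plays the same "convert through an intermediate" role.

```java
final class IntBaseChange {
    /** Reads digits (most significant first) in the given base using Horner's rule. */
    static int fromDigits(int[] digits, int base) {
        int value = 0;
        for (int d : digits) {
            value = value * base + d; // shift one place left in `base`, then add the digit
        }
        return value;
    }

    /** Writes a non-negative value in the given base by repeated division, most significant digit first. */
    static int[] toDigits(int value, int base) {
        if (value == 0) return new int[] {0};
        int n = 0;
        for (int v = value; v > 0; v /= base) n++; // count the digits first
        int[] digits = new int[n];
        for (int i = n - 1; i >= 0; i--) {
            digits[i] = value % base; // the remainder is the current least significant digit
            value /= base;
        }
        return digits;
    }

    /** Converts via the int intermediate, e.g. 12 (base 3) = 5 becomes 101 (base 2). */
    static int[] changeBase(int[] digits, int oldBase, int newBase) {
        return toDigits(fromDigits(digits, oldBase), newBase);
    }
}
```

This only covers integers of bounded size; fractional digits and arbitrary lengths are what the rest of the question is about.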
First question: Argument validation
Therefore, my first question is: if I were to implement the method public Decimal Decimal.changeBase(int newBase), how can I validate whether the conversion to newBase can be done without producing a recurring decimal? (A recurring decimal is incompatible with the design of the int[] digits field, since I don't plan to add an int recurringOffset field just for this.)
Second question: Implementation
How should I implement this? My instinct is that this question becomes much easier once the first question is solved.
Third question: What about recurring number output?
Even though I don't plan to make an int recurringOffset field just for this, this question should also be asked for the sake of future readers.
For example, according to Wolfram|Alpha:
0.1 (base 4) = 0.[2...] (base 9)
How can this be calculated (by hand, if by programming sounds too complicated)?
I think that a data structure like this can represent this decimal number:
class Decimal{
    int[] constDigits;
    int exponent;
    int base;
    int signum;
    @Nullable @NonEmpty int[] appendRecurring;
}
For example, 61/55 can be expressed like this:
{
constDigits: [1, 1], // 11
exponent: -1, // 11e-1
base: 10,
signum: 1, // positive
appendRecurring: [0, 9]
}
Not a homework question
I am not looking for any libraries. Please do not answer this question with reference to any libraries. (Because I'm writing this class just for fun, OK?)

To your first question: whenever all prime factors of the old base are also prime factors of the new base, you can always convert without the result becoming periodic. For example, every base 2 number can be represented exactly in base 10. Unfortunately, this condition is sufficient but not necessary: some base 10 numbers, like 0.5, can be represented exactly in base 2, although 2 does not have the prime factor 5.
When you write the number as a fraction reduced to lowest terms, it can be represented exactly, without a periodic part, in base x if and only if every prime factor of the denominator also divides x (the exponents of the primes do not matter).
For example, if your number is 3/25 you can represent this exactly in every base that has a prime factor 5. That is 5, 10, 15, 20, 25, ...
If the number is 4/175, the denominator has prime factors 5 and 7, and therefore the number can be represented exactly in bases 35, 70, 105, 140, 175, ...
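The lowest-terms criterion can be checked without factoring anything: repeatedly divide the denominator by its greatest common divisor with the base, and the expansion terminates exactly when the denominator reaches 1. A minimal Java sketch, assuming a positive denominator and values that fit in a long (TerminatingCheck is an illustrative name):

```java
final class TerminatingCheck {
    static long gcd(long a, long b) {
        a = Math.abs(a);
        b = Math.abs(b);
        while (b != 0) {
            long t = a % b;
            a = b;
            b = t;
        }
        return a;
    }

    /**
     * True if num/den has a finite (non-recurring) expansion in the given base:
     * after reducing the fraction, the denominator may only contain primes that divide the base.
     */
    static boolean terminates(long num, long den, long base) {
        den /= gcd(num, den); // reduce to lowest terms
        for (long g = gcd(den, base); g > 1; g = gcd(den, base)) {
            den /= g;         // strip every prime factor shared with the base
        }
        return den == 1;      // anything left over would make the expansion recur
    }
}
```

For example, terminates(5, 10, 2) is true because 5/10 reduces to 1/2 (the 0.5 case above), while terminates(4, 175, 10) is false because the factor 7 never cancels.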
For implementation, you can either work in the old base (basically doing divisions) or in the new base (basically doing multiplications). I would avoid going through a third base during the conversion.
Since you added periodic representations to your question, the best way to convert seems to be to turn the original representation into a fraction (this can always be done, even for periodic representations) and then convert that fraction to the new representation by carrying out the division.
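The first step, turning the question's Decimal fields into an exact fraction, might look like the sketch below. It assumes, as the question's toDouble() does, that the value is the digit string read as an integer times base^exponent, and it uses long arithmetic only for brevity; without a bignum type, larger numbers would overflow. DecimalToFraction is a hypothetical helper name.

```java
final class DecimalToFraction {
    static long gcd(long a, long b) {
        a = Math.abs(a);
        b = Math.abs(b);
        while (b != 0) {
            long t = a % b;
            a = b;
            b = t;
        }
        return a;
    }

    /** signum * D * base^exponent as a reduced {numerator, denominator}, where D = digits read as an integer. */
    static long[] toFraction(int[] digits, int exponent, int base, int signum) {
        long num = 0;
        for (int d : digits) {
            num = num * base + d;   // D, most significant digit first
        }
        long den = 1;
        for (int i = 0; i < Math.abs(exponent); i++) {
            if (exponent > 0) {
                num *= base;        // positive exponent: shift left
            } else {
                den *= base;        // negative exponent: fractional digits
            }
        }
        long g = gcd(num, den);
        return new long[] {signum * (num / g), den / g};
    }
}
```

The resulting fraction can then be expanded in the new base by ordinary long division.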

To answer the third part of the question, once you have your fraction reduced (and you found out that the "decimal" expansion will be a recurring fraction), you can detect the recurring part by simply doing the long-hand division and remembering the remainders you've encountered.
For example, to print out 2/11 in base 6, you do this:
2/11 = 0 (rem 2/11)
2*6/11 = 1 (rem 1/11)
1*6/11 = 0 (rem 6/11)
6*6/11 = 3 (rem 3/11)
3*6/11 = 1 (rem 7/11)
7*6/11 = 3 (rem 9/11)
9*6/11 = 4 (rem 10/11)
10*6/11 = 5 (rem 5/11)
5*6/11 = 2 (rem 8/11)
8*6/11 = 4 (rem 4/11)
4*6/11 = 2 (rem 2/11) <-- We've found a duplicate remainder
(Had 2/11 been convertible to a base 6 number of finite length, we would've reached 0 remainder instead.)
So your result will be 0.[1031345242...]. You can fairly easily design a data structure to hold this, bearing in mind that there could be several digits before the recurrence begins. Your proposed data structure is good for this.
Personally, I'd probably just work with fractions; floating point is all about trading away some precision and accuracy for compactness. If you don't want to compromise on precision, floating point is going to cause you a lot of trouble. (Though with careful design you can get pretty far with it.)
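The remainder-tracking long division described above fits in a few lines of Java. This is a sketch, not the answerer's code: it handles only the fractional part 0 <= num < den, assumes bases from 2 to 36 (the range supported by Character.forDigit), and marks the recurring part with brackets in the same notation used above. A real implementation would fill digit arrays such as constDigits and appendRecurring instead of a String.

```java
import java.util.HashMap;
import java.util.Map;

final class RecurringExpansion {
    /**
     * Expansion of num/den (0 <= num < den) in the given base, e.g. "0.1[09]";
     * digits inside [...] repeat forever. Long division that remembers the
     * position at which each remainder first occurred.
     */
    static String expand(long num, long den, int base) {
        if (base < 2 || base > Character.MAX_RADIX) throw new IllegalArgumentException("base");
        if (num < 0 || den <= 0 || num >= den) throw new IllegalArgumentException("need 0 <= num < den");
        if (num == 0) return "0";
        StringBuilder out = new StringBuilder("0.");
        Map<Long, Integer> firstSeen = new HashMap<>(); // remainder -> index of the digit it produces
        long rem = num;
        while (rem != 0) {
            Integer start = firstSeen.get(rem);
            if (start != null) {
                // Same remainder again: everything from `start` on repeats.
                out.insert((int) start, '[').append(']');
                return out.toString();
            }
            firstSeen.put(rem, out.length());
            rem *= base;
            out.append(Character.forDigit((int) (rem / den), base)); // next digit
            rem %= den;
        }
        return out.toString(); // remainder 0: the expansion terminates
    }
}
```

For 61/55 = 1 + 6/55, expand(6, 55, 10) gives "0.1[09]", matching the constDigits/appendRecurring example in the question, and expand(2, 11, 6) reproduces the table above as "0.[1031345242]".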

I waited until after the bounty was awarded to post this, because it is not a direct answer to your questions but rather a few hints on how to approach your task.
Number format
An arbitrary exponential form of the number is a big problem during base conversion. Instead, I would convert (normalize) your number to the form:
(sign) mantissa.repetition * base^exp
where unsigned int exp is the exponent of the least significant digit of the mantissa. The mantissa and repetition could be strings for easy manipulation and printing, but that would of course limit your maximum base. For example, if you reserve e for the exponent, you can use { 0,1,2,...,9, A,B,C,...,Z } as digits, so the maximum base would be only 36 (not counting special characters). If that is not enough, stay with your int digit representation.
Base conversion (mantissa)
I would handle the mantissa as an integer for now, so the conversion is done simply by repeatedly dividing mantissa / new_base in old_base arithmetic, which can be done on strings directly. This causes no problems, because any integer can always be converted from any base to any other base without inconsistencies, rounding, or remainders. The conversion could look like this:
// convert a=1024 [dec] -> c [bin]
AnsiString a="1024",b="2",c="",r="";
while (a!="0") { a=divide(r,a,b,10); c=r+c; }
// output c = "10000000000"
Where:
a is the number in the old base that you want to convert
b is the new base, written in the old base
c is the number in the new base
r is the remainder of each division, which is prepended to c as its next digit
The divide function used above looks like this:
//---------------------------------------------------------------------------
#define dig2chr(x) ((x<10)?char(x+'0'):char(x+'A'-10))
#define chr2dig(x) ((x>'9')?BYTE(x-'A'+10):BYTE(x-'0'))
//---------------------------------------------------------------------------
int compare( const AnsiString &a,const AnsiString &b); // compare a,b return { -1,0,+1 } -> { < , == , > }
AnsiString divide(AnsiString &r,const AnsiString &a, AnsiString &b,int base); // return a/b computed in base and r = a%b
//---------------------------------------------------------------------------
int compare(const AnsiString &a,const AnsiString &b)
{
if (a.Length()>b.Length()) return +1;
if (a.Length()<b.Length()) return -1;
for (int i=1;i<=a.Length();i++)
{
if (a[i]>b[i]) return +1;
if (a[i]<b[i]) return -1;
}
return 0;
}
//---------------------------------------------------------------------------
AnsiString divide(AnsiString &r,const AnsiString &a,AnsiString &b,int base)
{
int i,j,na,nb,e,sh,aa,bb,cy;
AnsiString d=""; r="";
// trivial cases
e=compare(a,b);
if (e< 0) { r=a; return "0"; }
if (e==0) { r="0"; return "1"; }
// shift b
for (sh=0;compare(a,b)>=0;sh++) b=b+"0";
if (compare(a,b)<0) { sh--; b=b.SetLength(b.Length()-1); }
// divide
for (r=a;sh>=0;sh--)
{
for (j=0;compare(r,b)>=0;j++)
{
// r-=b
na=r.Length();
nb=b.Length();
for (i=0,cy=0;i<nb;i++)
{
aa=chr2dig(r[na-i]);
bb=chr2dig(b[nb-i]);
aa-=bb+cy; cy=0;
while (aa<0) { aa+=base; cy++; }
r[na-i]=dig2chr(aa);
}
if (cy)
{
aa=chr2dig(r[na-i]);
aa-=cy;
r[na-i]=dig2chr(aa);
}
// leading zeros removal
while ((r.Length()>b.Length())&&(r[1]=='0')) r=r.SubString(2,r.Length()-1);
}
d+=dig2chr(j);
if (sh) b=b.SubString(1,b.Length()-1);
while ((r.Length()>b.Length())&&(r[1]=='0')) r=r.SubString(2,r.Length()-1);
}
return d;
}
//---------------------------------------------------------------------------
It is written in C++ and VCL. AnsiString is the VCL string type; it manages its own memory, and its characters are indexed from 1.
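Because a base is a small int, the division in the loop above does not need a general bignum-by-bignum division; schoolbook short division by an int is enough. The following Java sketch works on int digit arrays (most significant digit first), as in the question's int[] digits, rather than on AnsiString; the class and method names are illustrative.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class DigitArrayBaseConversion {
    /** Short division in place: digits (in `base`, most significant first) /= divisor; returns the remainder. */
    static int divideInPlace(int[] digits, int base, int divisor) {
        long rem = 0;
        for (int i = 0; i < digits.length; i++) {
            long cur = rem * base + digits[i]; // bring down the next digit
            digits[i] = (int) (cur / divisor);
            rem = cur % divisor;
        }
        return (int) rem;
    }

    static boolean isZero(int[] digits) {
        for (int d : digits) {
            if (d != 0) return false;
        }
        return true;
    }

    /** Non-negative integer given as digits in oldBase, converted to digits in newBase (most significant first). */
    static int[] convert(int[] digits, int oldBase, int newBase) {
        int[] work = digits.clone(); // keep the caller's array intact
        Deque<Integer> out = new ArrayDeque<>();
        while (!isZero(work)) {
            // each remainder is the next digit of the result, least significant first
            out.addFirst(divideInPlace(work, oldBase, newBase));
        }
        if (out.isEmpty()) out.addFirst(0); // the number zero
        return out.stream().mapToInt(Integer::intValue).toArray();
    }
}
```

convert(new int[] {1, 0, 2, 4}, 10, 2) returns the eleven digits of 10000000000, the same result as the C++ example.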
Base conversion (repetition)
There are two approaches I know of. The simpler one, which may introduce rounding errors, is to expand the repetition into a long enough digit string and handle it as an ordinary fractional number. For example, for rep="123" [dec], conversion to a different base is done by repeatedly multiplying by the new base in old-base arithmetic and taking the integer part as the next digit. So let's create a long enough sequence:
0 + 0.123123123123123 * 2
0 + 0.246246246246246 * 2
0 + 0.492492492492492 * 2
0 + 0.984984984984984 * 2
1 + 0.969969969969968 * 2
1 + 0.939939939939936 * 2
1 + 0.879879879879872 * 2 ...
------------------------------
= "0.000111..." [bin]
After this step you need to perform a repetition analysis and normalize the number again after the exponent-correction step (next section).
The second approach needs the repetition stored as a division, so you need it in the form a/b in old_base. You just convert a and b as integers (the same way as the mantissa) and then carry out the division in the new base to obtain the fractional part plus the repetition part.
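For the second approach, a repetition can be turned into a fraction with the identity 0.[r] = r / (base^k - 1), where r is the k-digit repetition read as an integer (for example 0.[123] = 123/999 in base 10). A minimal sketch, assuming a non-empty repetition and values that fit in a long; RepetendFraction is a hypothetical name.

```java
final class RepetendFraction {
    static long gcd(long a, long b) {
        while (b != 0) {
            long t = a % b;
            a = b;
            b = t;
        }
        return a;
    }

    /** 0.[r] in the given base as a reduced fraction {num, den}, using r / (base^k - 1). */
    static long[] repetendToFraction(int[] repetend, int base) {
        if (repetend.length == 0) throw new IllegalArgumentException("empty repetition");
        long num = 0;
        long pow = 1;
        for (int d : repetend) {
            num = num * base + d; // r, the repeated digits read as an integer
            pow *= base;          // base^k
        }
        long den = pow - 1;
        long g = gcd(num, den);
        return new long[] {num / g, den / g};
    }
}
```

repetendToFraction(new int[] {2}, 9) gives 1/4, which agrees with 0.[2] (base 9) = 0.1 (base 4) from the question.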
So now you should have the converted number in the form:
mantissa.fractional [new_base] * old_base^exp
or:
mantissa.fractional+a/b [new_base] * old_base^exp
Base conversion (exponent)
You need to change old_base^old_exp to new_base^new_exp. The simplest way is to multiply the number by the value old_base^old_exp in new-base arithmetic. So, for starters, multiply the whole
mantissa.fractional+(a/b) [new_base]
by old_base, old_exp times, in the new arithmetic (later you can switch to exponentiation by squaring or something better). After that, normalize your number: find where the repetition string begins; its digit position relative to the radix point is the new_exp value.
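Exponentiation by squaring, mentioned above as the faster replacement for multiplying old_exp times, needs only about log2(old_exp) multiplications. This sketch uses plain long arithmetic with no overflow check (an assumption for brevity); in the answer's scheme the same loop would run on numbers stored as digits in the new base. PowBySquaring is an illustrative name.

```java
final class PowBySquaring {
    /** base^exp for exp >= 0 using O(log exp) multiplications; overflow is not detected. */
    static long pow(long base, int exp) {
        long result = 1;
        long square = base;              // base^(2^i) for the current bit i of exp
        while (exp > 0) {
            if ((exp & 1) == 1) {
                result *= square;        // this bit of exp contributes base^(2^i)
            }
            square *= square;
            exp >>= 1;
        }
        return result;
    }
}
```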
[Notes]
For this you will need routines to express old_base and new_base in each other's representation, but as a base is not a bignum, just a small unsigned int, that should not be a problem for you (I hope).

Related

Anomalies in converting float to int in Java

As the title states, I'm trying to convert some floats to ints, and I see anomalies for a couple of values. I have a method that converts the decimal portion of a float, for example .45 from 123.45, to a string representation of the form 45/100.
The problem is that for 0.35, 0.45, 0.65 and 0.95 I get 0.34, 0.44, 0.64 and 0.94 respectively. Here's my implementation:
public String convertDecimalPartToString(float input){
int numberOfDigits = numberOfDigits(input);
System.out.println(numberOfDigits);
String numerator = Integer.toString((int) (input * Math.pow(10, numberOfDigits)));
String denominator = Integer.toString((int) Math.pow(10, numberOfDigits));
return numerator + "/" + denominator;
}
public int numberOfDigits(float input){
String inputAsString = Float.toString(input);
System.out.println(inputAsString);
int digits = 0;
// go to less than 2 because first 2 will be 0 and .
for(int i =0;i<inputAsString.length()-2;i++){
digits++;
}
return digits;
}
My test method looks like this:
@Test
public void testConvertDecimalPartToString(){
float input = 0.95f;
//output is 94/100 and assert fails
Assert.assertEquals("95/100", checkWriter.convertDecimalPartToString(input));
}
It seems like there's a miscalculation in this line:
String numerator = Integer.toString((int) (input * Math.pow(10, numberOfDigits)));
but I don't understand why it works properly for 0.05, 0.15, 0.25, 0.55, 0.75 and 0.85.
Can anybody help me understand?
The problem is blessed numbers. Imagine I gave you 3 bits (0 or 1 values): you could only represent 8 different values with this: 000, 001, 010, 011, 100, 101, 110, and 111. That's all of em. Can't represent more than 8 concepts if that's the only legal values!
float is a 32-bit value. With 32 bits, I can give you up to 4 billion different values. That's a lot of values, but it is not infinite, and yet there are infinitely many numbers between 0 and 1, let alone between 0 and 340,282,346,638,528,860,000,000,000,000,000,000,000.000000, which is the largest value a float can represent.
So how does that work? Well, not every number is actually representable. Only about 4 billion numbers are. These are the blessed numbers.
Anytime you try to make a non-blessed number, or the result of a calculation isn't blessed, then your number is rounded to the nearest blessed number, and if you perform a sequence of operations, that rounding occurs at every step.
The nature of blessed numbers is such that there are about as many blessed numbers between 0 and 1 as there are above 1: as you move away from 0, the interval between any 2 adjacent blessed numbers grows. Eventually it'll be more than 1.0, in fact.
Furthermore, computers count in binary, not decimal. Just like you cannot represent '1 divided amongst 3 things' in decimal (0.33333... it never ends), something as simple as 0.1 (1 divided amongst 10 things) cannot be perfectly represented in binary, so something as simple as 1.0/10.0 already rounds!
Your code will fail if any rounding occurs. The solution in your case is fairly easy; adding 0.005 before truncating would do it. A better way is to first render it to a rounded string:
String x = String.format("%.02f", yourValue);
and then find what you need in the string. The above takes care of proper rounding and will do a better job than using Math.pow, which, as it moves you away from that 0, causes MORE errors to show up (further from 0 -> more extreme rounding errors, as there are fewer blessed numbers out that far).
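The effect is easy to reproduce: 0.95f is actually stored as 0.949999988079071..., so multiplying by 100 gives 94.9999988... and the (int) cast truncates it to 94. The sketch below contrasts that truncation with Math.round and with String.format; passing Locale.ROOT is an addition to the snippet above, since with some default locales %f prints a comma instead of a dot. DecimalPart is an illustrative name.

```java
import java.util.Locale;

final class DecimalPart {
    /** What the question does: the cast truncates toward zero, so 94.9999988 becomes 94. */
    static int truncated(float input, int digits) {
        return (int) (input * Math.pow(10, digits));
    }

    /** Rounds to the nearest integer instead of truncating. */
    static long rounded(float input, int digits) {
        return Math.round(input * Math.pow(10, digits));
    }

    /** Lets the formatter round; Locale.ROOT guarantees '.' as the decimal separator. */
    static String formatted(float input) {
        return String.format(Locale.ROOT, "%.2f", input);
    }
}
```

rounded(0.95f, 2) gives 95 where truncated(0.95f, 2) gives 94; values such as 0.55f, which are stored slightly above their decimal value, happen to work with either.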
NB: Note that double is as fast as float, and given that you have 64 bits to spend there, has way more blessed numbers, so, fewer errors.
NB2: Another way to do this is to just move your concept of a 'unit'. For example, if this represents cash, just have int cents; - no problems there, it's much easier to know what the blessed numbers are for int (every integer from -2^31 to 2^31 - 1).

Why `2.0 - 1.1` and `2.0F - 1.1F` produce different results?

I am working on code where I compare double and float values:
class Demo {
public static void main(String[] args) {
System.out.println(2.0 - 1.1); // 0.8999999999999999
System.out.println(2.0 - 1.1 == 0.9); // false
System.out.println(2.0F - 1.1F); // 0.9
System.out.println(2.0F - 1.1F == 0.9F); // true
System.out.println(2.0F - 1.1F == 0.9); // false
}
}
Output is given below:
0.8999999999999999
false
0.9
true
false
I believe a double can hold more precision than a float.
Please explain this: it looks as if the float value does not lose precision but the double one does?
Edit:
@goodvibration I'm aware that 0.9 cannot be stored exactly as a float or double in any computer language; I'm just confused about how Java handles this in detail: why is 2.0F - 1.1F == 0.9F, but 2.0 - 1.1 != 0.9? Another interesting finding that may help:
class Demo {
public static void main(String[] args) {
System.out.println(2.0 - 0.9); // 1.1
System.out.println(2.0 - 0.9 == 1.1); // true
System.out.println(2.0F - 0.9F); // 1.1
System.out.println(2.0F - 0.9F == 1.1F); // true
System.out.println(2.0F - 0.9F == 1.1); // false
}
}
I know I can't count on float or double precision; I just can't figure it out, and it's driving me crazy. What's the real deal behind this? Why is 2.0 - 0.9 == 1.1 but 2.0 - 1.1 != 0.9?
The difference between float and double:
IEEE 754 single-precision binary floating-point format
IEEE 754 double-precision binary floating-point format
Let's run your numbers in a simple C program, in order to get their binary representations:
#include <stdio.h>
typedef union {
float val;
struct {
unsigned int fraction : 23;
unsigned int exponent : 8;
unsigned int sign : 1;
} bits;
} F;
typedef union {
double val;
struct {
unsigned long long fraction : 52;
unsigned long long exponent : 11;
unsigned long long sign : 1;
} bits;
} D;
int main() {
F f = {(float )(2.0 - 1.1)};
D d = {(double)(2.0 - 1.1)};
printf("%d %d %d\n" , f.bits.sign, f.bits.exponent, f.bits.fraction);
printf("%llu %llu %llu\n", (unsigned long long)d.bits.sign, (unsigned long long)d.bits.exponent, (unsigned long long)d.bits.fraction);
return 0;
}
The printout of this code is:
0 126 6710886
0 1022 3602879701896396
Based on the two format specifications above, let's convert these numbers to rational values.
In order to achieve high accuracy, let's do this in a simple Python program:
from decimal import Decimal
from decimal import getcontext
getcontext().prec = 100
TWO = Decimal(2)
def convert(sign, exponent, fraction, e_len, f_len):
return (-1) ** sign * TWO ** (exponent - 2 ** (e_len - 1) + 1) * (1 + fraction / TWO ** f_len)
def toFloat(sign, exponent, fraction):
return convert(sign, exponent, fraction, 8, 23)
def toDouble(sign, exponent, fraction):
return convert(sign, exponent, fraction, 11, 52)
f = toFloat(0, 126, 6710886)
d = toDouble(0, 1022, 3602879701896396)
print('{:.40f}'.format(f))
print('{:.40f}'.format(d))
The printout of this code is:
0.8999999761581420898437500000000000000000
0.8999999999999999111821580299874767661095
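If we print these two values with a fixed number of decimal digits, anywhere between 8 and 15, we see the reverse of what you observed: the double value prints as 0.9, while the float value prints as something close to, but not equal to, 0.9.
In other words, this code:
for n in range(8, 15 + 1):
    string = '{:.' + str(n) + 'f}'
    print(string.format(f))
    print(string.format(d))
gives this printout:
0.89999998
0.90000000
0.899999976
0.900000000
0.8999999762
0.9000000000
0.89999997616
0.90000000000
0.899999976158
0.900000000000
0.8999999761581
0.9000000000000
0.89999997615814
0.90000000000000
0.899999976158142
0.900000000000000
So a fixed number of digits does not explain Java's output. Instead, Float.toString and Double.toString (which System.out.println uses) print as many digits as are needed to uniquely distinguish the value from the adjacent float or double values, and no more. 2.0F - 1.1F is exactly the float nearest to 0.9, so it prints as 0.9. 2.0 - 1.1, however, is one step below the double nearest to 0.9, so 16 digits are needed to tell them apart: 0.8999999999999999.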
Nice question BTW...
Pop quiz: Represent 1/3rd, in decimal.
Answer: You can't; not precisely.
Computers count in binary. There are many more numbers that 'cannot be completely represented'. Just like, in the decimal question, if you have only a small piece of paper to write it on, you may simply go with 0.3333333 and call it a day, and you'd then have a number that is quite close to, but not entirely the same as, 1 / 3, so do computers represent fractions.
Or, think about it this way: a float occupies 32-bits; a double occupies 64. There are only 2^32 (about 4 billion) different numbers that a 32-bit value can represent. And yet, even between 0 and 1 there are an infinite amount of numbers. So, given that there are at most 2^32 specific, concrete numbers that are 'representable precisely' as a float, any number that isn't in that blessed set of about 4 billion values, is not representable. Instead of just erroring out, you simply get the one in this pool of 4 billion values that IS representable, and is the closest number to the one you wanted.
In addition, because computers count in binary and not decimal, your sense of what is 'representable' and what isn't, is off. You may think that 1/3 is a big problem, but surely 1/10 is easy, right? That's simply 0.1 and that is a precise representation. Ah, but, a tenth works well in decimal. After all, decimal is based around the number 10, no surprise there. But in binary? a half, a fourth, an eighth, a sixteenth: Easy in binary. A tenth? That is as difficult as a third: NOT REPRESENTABLE.
0.9 is, itself, not a representable number. And yet, when you printed your float, that's what you got.
The reason is, printing floats/doubles is an art, more than a science. Given that only a few numbers are representable, and given that these numbers don't feel 'natural' to humans due to the binary v. decimal thing, you really need to add a 'rounding' strategy to the number or it'll look crazy (nobody wants to read 0.899999999999999999765). And that is precisely what System.out.println and co do.
But you really should take control of the rounding: never use System.out.println to print doubles and floats. Use System.out.printf("%.6f", yourDouble); instead, and in this case BOTH would print 0.900000. Neither type can represent 0.9 precisely. For floats, however, the result of the subtraction (take the float closest to 2.0, which is 2.0 itself, and the float closest to 1.1, which is not precisely 1.1, subtract them, and round to the nearest float) happens to be the float closest to 0.9, so println prints it as 0.9 even though it isn't exactly 0.9; for doubles the corresponding result is not the double closest to 0.9, so it does not print as 0.9.
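One way to see the actual stored values, rather than their printed forms, is new BigDecimal(double), which converts the binary value exactly. A small Java sketch (ExactValues is an illustrative name); the float result is widened to double, which is also exact:

```java
import java.math.BigDecimal;

final class ExactValues {
    /** Exact decimal expansion of a double (a float argument is widened exactly). */
    static String exact(double x) {
        return new BigDecimal(x).toPlainString();
    }

    public static void main(String[] args) {
        float f = 2.0F - 1.1F;
        double d = 2.0 - 1.1;
        System.out.println(exact(f) + "  prints as " + f);
        System.out.println(exact(d) + "  prints as " + d);
        System.out.println(exact(0.9) + "  is the double nearest to 0.9");
        System.out.println(Math.nextUp(d) == 0.9); // d is one step below that double
    }
}
```

The last line prints true: 2.0 - 1.1 is the double immediately below the double nearest to 0.9, which is why println needs 16 digits to tell them apart, whereas 2.0F - 1.1F lands exactly on the float nearest to 0.9.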

Luhn checksum validation in Java

I have to implement the Luhn algorithm in Java; the problem I face is how to do this in an efficient and elegant way (not a requirement, but that is what I want).
The Luhn algorithm works like this:
You take a number, let's say 56789
Loop over the following steps until there are no digits left:
You pick the left-most digit and add it to the total sum. sum = 5
You discard this digit and go to the next one. number = 6789
You double this digit; if the result has more than one digit, you take it apart and add the digits separately to the sum. 2*6 = 12, so sum = 5 + 1 = 6 and then sum = 6 + 2 = 8.
Additional restrictions
For this particular problem I was required to read all digits one at a time and do computations on each of them separately before moving on. I also assume that all numbers are positive.
The problems I face and the questions I have
As said before, I am trying to solve this in an elegant and efficient way. That's why I don't want to invoke toString() on the number to access the individual digits, which requires a lot of converting. I also can't use modulo, because of the restriction above: once I read a digit, I must do the computation on it right away. I could only use modulo if I knew the length of the number in advance, but that feels like I would first have to count all the digits one by one, which goes against the restriction. Right now I can only think of one way to do this, but it would also require a lot of computation and only ever gives me the first digit*:
int firstDigit(int x) {
while (x > 9) {
x /= 10;
}
return x;
}
Found here: https://stackoverflow.com/a/2968068/3972558
*However, when I think about it, this is basically a different and weird way to use the length of a number: dividing it repeatedly until only one digit is left.
So basically I am stuck, and I think I must use the length of the number, which is not a property it really has, so I have to compute it by hand. Is there a good way to do this? Now I am thinking that I should use modulo in combination with the length of the number.
That way I would know whether the total number of digits is odd or even, and then I could do the computations from right to left. Just for fun, I think I could use this efficient way to get the length of a number: https://stackoverflow.com/a/1308407/3972558
This question appeared in the book Think Like a Programmer.
You can optimise it by unrolling the loop once (or as many times as you like). This will be close to twice as fast for large numbers, but it makes small numbers slower. If you have an idea of the typical range of your numbers, you can determine how much to unroll this loop.
int firstDigit(int x) {
while (x > 99)
x /= 100;
if (x > 9)
x /= 10;
return x;
}
Use org.apache.commons.validator.routines.checkdigit.LuhnCheckDigit, e.g. LuhnCheckDigit.LUHN_CHECK_DIGIT.isValid(code), where code is the number as a String.
Maven Dependency:
<dependency>
<groupId>commons-validator</groupId>
<artifactId>commons-validator</artifactId>
<version>1.4.0</version>
</dependency>
Normally you would process the digits from right to left, using division by 10 to shift the digits and modulo 10 to extract the last one. You can still use this technique when processing the digits from left to right: just divide by 1000000000 to extract the first digit and multiply by 10 to shift the rest left:
0000056789
0000567890
0005678900
0056789000
0567890000
5678900000
6789000000
7890000000
8900000000
9000000000
Some of those numbers exceed the maximum value of an int. If you have to support the full range of input, you will have to store the number as a long:
static int checksum(int x) {
    long n = x;
    int sum = 0;
    while (n != 0) {
        long d = 1000000000L;
        int digit = (int) (n / d);
        n %= d;
        n *= 10L;
        // add digit to sum
    }
    return sum;
}
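With the placeholder filled in, the left-to-right scheme can look like the sketch below. It assumes the standard Luhn rule (double every second digit counted from the right) and a non-negative int. Treating the input as exactly 10 digits, with leading zeros that contribute nothing, fixes the parity, so the digits to double are simply those at even positions counted from the left, and the length never needs to be known in advance. LuhnLeftToRight is an illustrative name.

```java
final class LuhnLeftToRight {
    /** Digit sum of 2 * d for a single digit d: 2 * d, or 2 * d - 9 when that has two digits. */
    static int doubled(int d) {
        int twice = 2 * d;
        return twice > 9 ? twice - 9 : twice;
    }

    /** Luhn sum of a non-negative int, reading one digit at a time from the left. */
    static int checksum(int x) {
        if (x < 0) throw new IllegalArgumentException("expects a non-negative number");
        final long d = 1_000_000_000L; // 10^9: n / d is the leftmost of 10 digits
        long n = x;                    // long, because n * 10 can exceed Integer.MAX_VALUE
        int sum = 0;
        // Pad to exactly 10 digits: leading zeros add nothing, and with an even length
        // "every second digit from the right" means positions 0, 2, 4, ... from the left.
        for (int pos = 0; pos < 10; pos++) {
            int digit = (int) (n / d);
            n = (n % d) * 10;          // drop the digit and shift the rest left
            sum += (pos % 2 == 0) ? doubled(digit) : digit;
        }
        return sum;
    }

    static boolean isValid(int x) {
        return checksum(x) % 10 == 0;
    }
}
```

For the question's example, checksum(56789) is 31 (5 + 3 + 7 + 7 + 9), so 56789 is not a valid Luhn number.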
As I understand it, you will eventually need to read every digit, so what is wrong with converting the initial number to a string (and therefore a char[])? Then you can easily implement the algorithm by iterating over that char array.
The JDK implementation of Integer.toString is well optimized, so you would need to implement your own optimizations to beat it; e.g. it uses lookup tables for faster conversion, converts two chars at once, etc.
final static int [] sizeTable = { 9, 99, 999, 9999, 99999, 999999, 9999999,
99999999, 999999999, Integer.MAX_VALUE };
// Requires positive x
static int stringSize(int x) {
for (int i=0; ; i++)
if (x <= sizeTable[i])
return i+1;
}
This is just an excerpt, but feel free to check the complete implementation :)
I would first convert the number to a kind of BCD (binary-coded decimal). I'm not sure I could find a better optimisation than the JDK Integer.toString() conversion method, but as you said you did not want to use it:
List<Byte> bcd(int i) {
List<Byte> l = new ArrayList<Byte>(10); // max size for an integer to avoid reallocations
if (i == 0) {
l.add((byte) i);
}
else {
while (i != 0) {
l.add((byte) (i % 10));
i = i / 10;
}
}
return l;
}
It is more or less what you proposed for getting the first digit, but now you have all your digits in a single pass (least significant digit first) and can use them in your algorithm.
I proposed byte because it is enough, but as Java always converts to int for arithmetic, it might be more efficient to use a List<Integer> directly, even though it wastes memory.

Implementation of java.util.Random.nextInt

This function is from java.util.Random. It returns a pseudorandom int uniformly distributed between 0 (inclusive) and the given n (exclusive). Unfortunately, I don't understand how it works.
public int nextInt(int n) {
if (n <= 0)
throw new IllegalArgumentException("n must be positive");
if ((n & -n) == n) // i.e., n is a power of 2
return (int)((n * (long)next(31)) >> 31);
int bits, val;
do {
bits = next(31);
val = bits % n;
} while (bits - val + (n-1) < 0);
return val;
}
My questions are:
Why does it treat the case where n is a power of two specially? Is it just for performance?
Why does it reject numbers for which bits - val + (n-1) < 0?
It does this in order to ensure a uniform distribution of values between 0 and n. You might be tempted to do something like:
int x = rand.nextInt() % n;
but this will alter the distribution of values, unless n is a divisor of 2^31, i.e. a power of 2. This is because the modulo operator would produce equivalence classes whose size is not the same.
For instance, let's suppose that nextInt() generates an integer between 0 and 6 inclusive and you want to draw 0,1 or 2. Easy, right?
int x = rand.nextInt() % 3;
No. Let's see why:
0 % 3 = 0
1 % 3 = 1
2 % 3 = 2
3 % 3 = 0
4 % 3 = 1
5 % 3 = 2
6 % 3 = 0
So you have 3 values that map on 0 and only 2 values that map on 1 and 2. You have a bias now, as 0 is more likely to be returned than 1 or 2.
As always, the javadoc documents this behaviour:
The hedge "approximately" is used in the foregoing description only
because the next method is only approximately an unbiased source of
independently chosen bits. If it were a perfect source of randomly
chosen bits, then the algorithm shown would choose int values from the
stated range with perfect uniformity.
The algorithm is slightly tricky. It rejects values that would result
in an uneven distribution (due to the fact that 2^31 is not divisible
by n). The probability of a value being rejected depends on n. The
worst case is n=2^30+1, for which the probability of a reject is 1/2,
and the expected number of iterations before the loop terminates is 2.
The algorithm treats the case where n is a power of two specially: it
returns the correct number of high-order bits from the underlying
pseudo-random number generator. In the absence of special treatment,
the correct number of low-order bits would be returned. Linear
congruential pseudo-random number generators such as the one
implemented by this class are known to have short periods in the
sequence of values of their low-order bits. Thus, this special case
greatly increases the length of the sequence of values returned by
successive calls to this method if n is a small power of two.
The emphasis is mine.
next generates random bits.
When n is a power of 2, a random integer in that range can be generated just by generating random bits (I assume that always generating 31 and throwing some away is for reproducibility). This code path is simpler and I guess it's a more commonly used case so it's worth making a special "fast path" for this case.
When n isn't a power of 2, it throws away numbers at the "top" of the range so that the random number is evenly distributed. E.g. imagine we had n=3, and imagine we were using 3 bits rather than 31 bits. So bits is a randomly generated number between 0 and 7. How can you generate a fair random number there? Answer: if bits is 6 or 7, we throw it away and generate a new one.
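The counting argument can be checked exhaustively with a small k-bit source in place of next(31). This sketch enumerates every possible source value once, so the counts are the exact distribution; the rejection test mirrors the one in nextInt, except that with a small k it can compare against 2^k directly instead of relying on int overflow. ModuloBias is an illustrative name.

```java
final class ModuloBias {
    /** How often each result 0..n-1 occurs when every k-bit value is mapped with plain %. */
    static int[] naiveCounts(int k, int n) {
        int[] counts = new int[n];
        for (int bits = 0; bits < (1 << k); bits++) {
            counts[bits % n]++;
        }
        return counts;
    }

    /** Same, but discarding values from the incomplete top block, like the loop in nextInt. */
    static int[] rejectionCounts(int k, int n) {
        int limit = 1 << k;
        int[] counts = new int[n];
        for (int bits = 0; bits < limit; bits++) {
            int val = bits % n;
            // nextInt detects bits - val + (n-1) >= 2^31 through int overflow (< 0);
            // with a small k we can compare against 2^k directly.
            if (bits - val + (n - 1) >= limit) {
                continue; // rejected: this block does not contain all n results
            }
            counts[val]++;
        }
        return counts;
    }
}
```

With k = 3 and n = 3, naiveCounts gives {3, 3, 2} (values 6 and 7 fold onto 0 and 1), while rejectionCounts gives {2, 2, 2}; for a power of two such as n = 4 nothing is rejected.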

Issue with implementation of Fermat's little theorem

Here's my implementation of Fermat's little theorem. Does anyone know why it's not working?
Here are the rules I'm following:
Let n be the number to test for primality.
Pick any integer a between 2 and n-1.
compute a^n mod n.
check whether a^n = a mod n.
My code:
int low = 2;
int high = n -1;
Random rand = new Random();
//Pick any integer a between 2 and n-1.
Double a = (double) (rand.nextInt(high-low) + low);
//compute:a^n = a mod n
Double val = Math.pow(a,n) % n;
//check whether a^n = a mod n
if(a.equals(val)){
return "True";
}else{
return "False";
}
This is a list of primes less than 100000. Whenever I input any of these numbers, instead of getting 'true', I get 'false'.
The First 100,008 Primes
This is the reason why I believe the code isn't working.
In Java, a double has a limited precision of only about 15 to 17 significant digits. This means that while you can compute the value of Math.pow(a,n) for very large numbers, you have no guarantee you'll get an exact result once the value has more than 15 digits.
With large values of a or n, your computation will exceed that limit. For example
Math.pow(3, 67) will have a value of 9.270946314789783e31 which means that any digit after the last 3 is lost. For this reason, after applying the modulo operation, you have no guarantee to get the right result (example).
This means that your code does not actually test what you think it does. This is inherent to the way floating point numbers work, and you must change the way you hold your values to solve this problem. You could use long, but then you would have problems with overflow: a long cannot hold a value greater than 2^63 - 1, so again, in the case of 3^67 you'd have another problem.
One solution is to use a class designed to hold arbitrary large numbers such as BigInteger which is part of the Java SE API.
As the others have noted, taking the power will quickly overflow. For example, if the number n you are testing for primality is as small as, say, 30, and the random number a is 20, then 20^30 is about 10^39, which is roughly 2^130, far more than a long (or the 53-bit significand of a double) can hold exactly.
You want to use BigInteger, which even has the exact method you want:
public BigInteger modPow(BigInteger exponent, BigInteger m)
"Returns a BigInteger whose value is (this^exponent mod m)"
Also, I don't think that testing a single random number between 2 and n-1 will "prove" anything. You have to loop through all the integers between 2 and n-1.
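Following the BigInteger suggestion, a minimal sketch of the test using BigInteger.modPow might look like this. It checks a^n mod n == a mod n for several random bases a in [2, n-1] and assumes n fits in an int. Even many rounds never prove primality: a Carmichael number such as 561 passes for every base. FermatTest is an illustrative name.

```java
import java.math.BigInteger;
import java.util.Random;

final class FermatTest {
    /** True if a^n mod n == a mod n; modPow never materialises a^n, so nothing overflows. */
    static boolean passes(int n, int a) {
        BigInteger bn = BigInteger.valueOf(n);
        BigInteger ba = BigInteger.valueOf(a);
        return ba.modPow(bn, bn).equals(ba.mod(bn));
    }

    /** Fermat test with `rounds` random bases; true means "probably prime", never "proven prime". */
    static boolean probablyPrime(int n, int rounds, Random rnd) {
        if (n < 4) return n == 2 || n == 3;
        for (int i = 0; i < rounds; i++) {
            int a = 2 + rnd.nextInt(n - 2); // uniform in [2, n-1]
            if (!passes(n, a)) return false; // a is a witness that n is composite
        }
        return true;
    }
}
```

For example, passes(341, 2) is true although 341 = 11 * 31, but passes(341, 3) is false, so a second base catches it.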
@evthim Even if you use the modPow function of the BigInteger class, you cannot correctly identify all the prime numbers in the range you selected. To clarify: you will get all the prime numbers in the range, but some of the numbers you get are not prime. If you rewrite this code using the BigInteger class and try all 64-bit numbers, some composite numbers will also be reported as prime. These numbers are as follows:
341, 561, 645, 1105, 1387, 1729, 1905, 2047, 2465, 2701, 2821, 3277, 4033, 4369, 4371, 4681, 5461, 6601, 7957, 8321, 8481, 8911, 10261, 10585, 11305, 12801, 13741, 13747, 13981, 14491, 15709, 15841, 16705, 18705, 18721, 19951, 23001, 23377, 25761, 29341, ...
https://oeis.org/a001567
161038, 215326, 2568226, 3020626, 7866046, 9115426, 49699666, 143742226, 161292286, 196116194, 209665666, 213388066, 293974066, 336408382, ..., 2001038066, 2138882626, 2952654706, 3220041826, ...
https://oeis.org/a006935
As a solution, get the list of these numbers from the link below and make sure that the number you are testing is not in it.
http://www.cecm.sfu.ca/Pseudoprimes/index-2-to-64.html
A solution in C# is as follows:
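using System;
using System.Numerics;
public static class Primality
{
    public static bool IsPrime(ulong number)
    {
        if (number == 2) return true;
        // 0, 1 and even numbers are not prime (this also rules out the even pseudoprimes of A006935)
        if (number < 2 || (number & 1) == 0) return false;
        // Fermat test to base 2, then rule out the odd base-2 pseudoprimes of A001567
        return BigInteger.ModPow(2, number, number) == 2 && !BinarySearchInA001567(number);
    }
    public static bool BinarySearchInA001567(ulong number)
    {
        // Is number in list?
        // todo: Binary Search in A001567 (https://oeis.org/A001567) below 2 ^ 64
        // Only 2.35 Gigabytes as a text file http://www.cecm.sfu.ca/Pseudoprimes/index-2-to-64.html
        throw new NotImplementedException();
    }
}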
