How does BigInteger store its data? - java

I've been searching around for quite a while, and I've found almost nothing on how BigInteger actually holds its numbers. Are they an array of chars? Something else? And how is data converted to/from BigInteger?
From what I've found, I assume that all arbitrary-precision classes, like BigInteger and BigDecimal, hold their data as a character array. Is this how it actually works, or is it just people's guess?
I'm asking because I have been working on my own implementation of something like BigInteger, but I can't figure out how to hold numbers larger than Long.MAX_VALUE (I don't remember the actual number).
Thanks in advance.

With an int[]
From the source:
/**
* The magnitude of this BigInteger, in <i>big-endian</i> order: the
* zeroth element of this array is the most-significant int of the
* magnitude. The magnitude must be "minimal" in that the most-significant
* int ({@code mag[0]}) must be non-zero. This is necessary to
* ensure that there is exactly one representation for each BigInteger
* value. Note that this implies that the BigInteger zero has a
* zero-length mag array.
*/
final int[] mag;
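The mag field is private, so you can't read it directly, but you can reproduce the layout the comment describes. The following is a minimal sketch; the class MagnitudeDemo and the helper magnitudeWords are hypothetical names, not part of the JDK. It splits |value| into 32-bit words, most significant first, with no leading zero words, so zero gives an empty array.

```java
import java.math.BigInteger;
import java.util.Arrays;

public class MagnitudeDemo {
    // Hypothetical helper: splits |value| into 32-bit words, most-significant first,
    // mimicking the documented big-endian, minimal layout of BigInteger's private mag field.
    static int[] magnitudeWords(BigInteger value) {
        BigInteger v = value.abs();
        int n = (v.bitLength() + 31) / 32;   // minimal number of words; 0 for zero
        int[] words = new int[n];
        for (int i = n - 1; i >= 0; i--) {
            words[i] = v.intValue();          // low-order 32 bits of v (may print as negative)
            v = v.shiftRight(32);
        }
        return words;
    }

    public static void main(String[] args) {
        BigInteger twoTo64 = BigInteger.ONE.shiftLeft(64);
        System.out.println(Arrays.toString(magnitudeWords(twoTo64)));         // [1, 0, 0]
        System.out.println(Arrays.toString(magnitudeWords(BigInteger.ZERO))); // []
    }
}
```

Each int is treated as an unsigned 32-bit "digit", so a word such as 0xFFFFFFFF shows up as -1 when printed; the sign of the BigInteger is kept in a separate field, not in the words.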

The most common way of representing numbers is the positional notation system. Numbers are written using digits that represent multiples of powers of the specified base. The base we are most familiar with and use every day is base 10. When we write the number 12345 in base 10, it actually means: 12345 = 1*10^4 + 2*10^3 + 3*10^2 + 4*10^1 + 5*10^0
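The same positional formula works for any base, and BigInteger's mag array is this idea with base 2^32, where each int is one "digit". A small sketch (the class Positional and method fromDigits are hypothetical) evaluates a digit list with Horner's rule, which is the formula above with the powers factored out:

```java
import java.math.BigInteger;

public class Positional {
    // Evaluates digits (most-significant first) in the given base:
    // d[0]*base^(n-1) + ... + d[n-1]*base^0, computed with Horner's rule.
    static BigInteger fromDigits(long[] digits, long base) {
        BigInteger b = BigInteger.valueOf(base);
        BigInteger result = BigInteger.ZERO;
        for (long d : digits) {
            result = result.multiply(b).add(BigInteger.valueOf(d));
        }
        return result;
    }

    public static void main(String[] args) {
        // 12345 = 1*10^4 + 2*10^3 + 3*10^2 + 4*10^1 + 5*10^0
        System.out.println(fromDigits(new long[] {1, 2, 3, 4, 5}, 10)); // 12345
        // Base 2^32: the "digits" are 32-bit words, as in BigInteger's mag array
        System.out.println(fromDigits(new long[] {1, 0, 0}, 1L << 32)); // 18446744073709551616
    }
}
```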

There are many ways to represent big integers. Strings of characters are simple,
and anyone who has ever done long division with pencil and paper can write the
arithmetic routines.
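As an illustration of the string approach, here is a sketch of pencil-and-paper addition on decimal digit strings (the class StringAdd and method add are hypothetical). It handles only non-negative numbers and shows that a string of digits has no upper limit like Long.MAX_VALUE:

```java
public class StringAdd {
    // Adds two non-negative decimal numbers given as digit strings, the way you
    // would on paper: right to left, carrying into the next column.
    static String add(String a, String b) {
        StringBuilder out = new StringBuilder();
        int i = a.length() - 1, j = b.length() - 1, carry = 0;
        while (i >= 0 || j >= 0 || carry != 0) {
            int sum = carry;
            if (i >= 0) sum += a.charAt(i--) - '0';
            if (j >= 0) sum += b.charAt(j--) - '0';
            out.append((char) ('0' + sum % 10)); // digit for this column
            carry = sum / 10;                     // carry into the next column
        }
        return out.reverse().toString();          // digits were produced least-significant first
    }

    public static void main(String[] args) {
        // Long.MAX_VALUE + 1 overflows a long, but not a string of digits
        System.out.println(add("9223372036854775807", "1")); // 9223372036854775808
    }
}
```

This is easy to write but wastes space and time (one char per decimal digit), which is why BigInteger packs 32 bits into each array element instead.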

Related

random byte to int java

In the Random class, define a nextByte method that returns a value of the primitive type
byte. The values returned in a sequence of calls should be uniformly distributed over all the
possible values in the type.
In the Random class, define a nextInt method that returns a value of the primitive type
int. The values returned in a sequence of calls should be uniformly distributed over all the possible
values in the type.
(Hint: Java requires implementations to use the twos-complement representation for integers.
Figure out how to calculate a random twos-complement representation from four random byte
values using Java’s shift operators.)
Hi, I was able to do part 3, and now I need to use part 3 to solve part 4, but I do not know what to do. I was thinking of using nextByte to make an array of 4 bytes, then taking the two's complement of each so I wouldn't have negative numbers, and then putting them together into one int.
Suppose nextByte returns these bytes: byte[] bytes = {42, -15, -7, 8};
Then I would take the two's complement of each, which I think would be {42, 241, 249, 8}. Is this what it would look like, and why doesn't this code work:
public static int twosComplement(int input_value, int num_bits){
int mask = (int) Math.pow(2, (num_bits - 1));
return -(input_value & mask) + (input_value & ~mask);
}
Then I would use the following to put all four bytes into an int, would this work:
int i= (bytes[0]<<24)&0xff000000|
(bytes[1]<<16)&0x00ff0000|
(bytes[2]<< 8)&0x0000ff00|
(bytes[3]<< 0)&0x000000ff;
Please be as specific as possible.
The assignment says that Java already uses two's complement integers. This is a useful property that simplifies the rest of the code: it guarantees that if you group together 32 random bits (or in general however many bits your desired output type has), then this covers all possible values exactly once and there are no invalid patterns.
That might not be true of some other integer representations, which might only have 2³²-1 different values (leaving an invalid pattern that you would have to avoid) or have 2³² valid patterns but both a "positive" and a "negative" zero, which would cause a random bit pattern to have a biased "interpreted value" (with zero occurring twice as often as it should).
So that is not something for you to do; it is a convenient property you can use to keep the code simple. Actually, you already used it. This code:
int i= (bytes[0]<<24)&0xff000000|
(bytes[1]<<16)&0x00ff0000|
(bytes[2]<< 8)&0x0000ff00|
(bytes[3]<< 0)&0x000000ff;
works properly thanks to those properties. By the way, it can be simplified a bit: after shifting left by 24, sign extension is no longer an issue, because all the extended bits have been shifted out. And shifting left by 0 is obviously a no-op. So (bytes[0]<<24)&0xff000000 can be written as (bytes[0]<<24), and (bytes[3]<< 0)&0x000000ff as bytes[3]&0xff. But you can keep it as it was, with its nice regular structure.
The twosComplement function is not necessary.
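Putting those points together, a sketch of nextInt built from four nextByte calls might look like the following. The class BytesToInt is hypothetical, and its nextByte is only a stand-in backed by java.util.Random.nextBytes; in the assignment you would call your own nextByte. Masking with 0xff is what turns -15 into 241, so no separate twosComplement step is needed.

```java
import java.util.Random;

public class BytesToInt {
    private final Random rng = new Random();

    // Packs four bytes (most-significant first) into one int.
    // "& 0xff" undoes the sign extension that happens when a negative byte is
    // promoted to int; bytes[0] needs no mask because << 24 shifts the extended bits out.
    static int toInt(byte[] bytes) {
        return (bytes[0] << 24)
             | (bytes[1] & 0xff) << 16
             | (bytes[2] & 0xff) << 8
             | (bytes[3] & 0xff);
    }

    // Hypothetical stand-in for the assignment's nextByte(): uniform over all 256 byte values.
    byte nextByte() {
        byte[] one = new byte[1];
        rng.nextBytes(one);
        return one[0];
    }

    // Four independent uniform bytes give 32 uniform bits; with two's complement,
    // every 32-bit pattern is a distinct int, so every int value is equally likely.
    int nextInt() {
        return toInt(new byte[] {nextByte(), nextByte(), nextByte(), nextByte()});
    }

    public static void main(String[] args) {
        byte[] bytes = {42, -15, -7, 8};
        System.out.printf("%08x%n", toInt(bytes));           // 2af1f908
        System.out.println(Byte.toUnsignedInt((byte) -15));  // 241
        System.out.println(new BytesToInt().nextInt());      // some random int
    }
}
```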

Calculating number of bits and number of words BigInteger

While converting a String into a BigInteger, Java internally calculates the number of bits and then the number of words (each word is a group of 9 digits, I think) in the BigInteger, as can be seen here from line 325 to line 327. numWords is then used to create an array that can accommodate that BigInteger.
I don't understand the logic used for calculating numBits in line 325 and then the logic for numWords in Line 326.
Logically, I think that for the string "123456789", numWords should be 1, and for "12345678912", numWords should be 2, but that's not always the case. For example, for "12345678912345678912", numWords should be 3, but it comes out to be 2.
Can anyone please explain the logic used in line 325 and 326?
Representing a decimal number with numDigits digits as a binary number requires up to
numDigits * Math.log(10) / Math.log(2)
bits.
int numBits = (int)(((numDigits * bitsPerDigit[radix]) >>> 10) + 1);
In the calculation above, bitsPerDigit[10] is 3402, which is log2(10) scaled by 2^10 and rounded up:
Math.log(10) / Math.log(2) * Math.pow(2, 10) = 3401.6543691646593
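The estimate can be reproduced in a few lines. This is a sketch, assuming line 326 rounds numBits up to whole 32-bit words as (numBits + 31) >>> 5; the class WordEstimate and its methods are hypothetical, and 3402 is the bitsPerDigit[10] value quoted above.

```java
import java.math.BigInteger;

public class WordEstimate {
    // bitsPerDigit[10] as quoted above: log2(10) * 2^10, rounded up
    static final long BITS_PER_DECIMAL_DIGIT_X1024 = 3402;

    // Upper-bound estimate of the bits needed for a numDigits-digit decimal number.
    static int estimatedBits(int numDigits) {
        return (int) (((numDigits * BITS_PER_DECIMAL_DIGIT_X1024) >>> 10) + 1);
    }

    // Assumed rounding of the bit estimate up to whole 32-bit words.
    static int estimatedWords(int numDigits) {
        return (estimatedBits(numDigits) + 31) >>> 5;
    }

    // Words actually needed once the exact value is known (minimal magnitude).
    static int actualWords(String decimal) {
        return (new BigInteger(decimal).bitLength() + 31) >>> 5;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"123456789", "12345678912", "12345678912345678912"}) {
            System.out.println(s + ": estimated bits = " + estimatedBits(s.length())
                    + ", estimated words = " + estimatedWords(s.length())
                    + ", actual words = " + actualWords(s));
        }
    }
}
```

For the 20-digit string, the estimate is 67 bits, i.e. 3 words, but the exact value 12345678912345678912 lies between 2^63 and 2^64, so it needs only 64 bits, i.e. 2 words. The estimate is a cheap upper bound for sizing the array before the digits are parsed; the words themselves are 32-bit ints, not groups of 9 digits.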
In Java, BigIntegers are not stored as strings or bytes with a digit each. They are stored as an array of 32-bit integers, which together form the so-called magnitude of the BigInteger. There can be no leading zero integers (*), so the BigInteger is stored as compactly as possible.
The "words" mentioned are these 32-bit integers. They are not groups of 9 digits, they are used in full, so each bit counts.
So you just have to know how many 32-bit integers are stored: that count (the length of the internal array) times 32 gives the number of bits available. But the top integer can still have leading zero bits, so you must count the leading zero bits of that top integer and subtract them from the product, in pseudo-code:
numBits = internalArray.length * 32 - numberOfLeadingZeroBits(internalArray[0]);
Note that the internal array is stored with the top integer at the lowest address (I have no idea why that is), so the top integer is at index 0 of the array.
(*) In reality, the above is a little more complicated, since the top item may be stored at an offset from the start of the array (probably to make certain calculations easier), but to understand the mechanism, you can pretend there are no extra integers.
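The pseudo-code translates directly into Java with Integer.numberOfLeadingZeros. This sketch works on a plain int[] laid out like the internal array (the class BitLengthDemo and method bitLength are hypothetical); for non-negative values the result matches BigInteger.bitLength().

```java
import java.math.BigInteger;

public class BitLengthDemo {
    // numBits = words * 32 - leading zero bits of the top (index 0) word.
    // mag is assumed minimal and big-endian, like BigInteger's internal array.
    static int bitLength(int[] mag) {
        if (mag.length == 0) return 0;        // zero has an empty magnitude
        return mag.length * 32 - Integer.numberOfLeadingZeros(mag[0]);
    }

    public static void main(String[] args) {
        int[] mag = {1, 0, 0};                 // 2^64 as three big-endian words
        System.out.println(bitLength(mag));                           // 65
        System.out.println(BigInteger.ONE.shiftLeft(64).bitLength()); // 65
    }
}
```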
"Words" doesn't refer to words as you know them; it refers to words as units of memory (here, 32-bit ints).
https://en.wikipedia.org/wiki/Word_(computer_architecture)

Why is a double always 8 bytes and an int always 4 bytes, even if the int has more digits?

I don't understand how an int 63823 takes up less space than a double 1.0. Isn't there more information stored in the int, in this particular instance?
Good question. What you're seeing when you see 63823 and 1.0 is a representation of the underlying data, you are not seeing the underlying data. It is specially formatted so that you can read it, but it is not how the machine sees it.
Java uses very special formats for representing int and double. You need to look at those representations to understand why 63823 takes thirty-two bits when represented as a Java int and 1.0 takes sixty-four bits when represented as a Java double.
In particular, 63823 as an int in Java is represented as:
00000000000000001111100101001111
and 1.0 as a double is represented in Java as:
0011111111110000000000000000000000000000000000000000000000000000
If you want to explore more, I recommend Two's Complement and What Every Computer Scientist Should Know About Floating-Point Arithmetic.
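You can print those bit patterns yourself. This sketch (the class BitPatterns and its helpers are hypothetical) uses Integer.toBinaryString and Double.doubleToLongBits, padding with leading zeros so that the full 32 and 64 bits are visible regardless of the value:

```java
public class BitPatterns {
    // All 32 bits of an int, including leading zeros.
    static String intBits(int value) {
        return String.format("%32s", Integer.toBinaryString(value)).replace(' ', '0');
    }

    // All 64 bits of a double: 1 sign bit, 11 exponent bits, 52 fraction bits.
    static String doubleBits(double value) {
        return String.format("%64s", Long.toBinaryString(Double.doubleToLongBits(value))).replace(' ', '0');
    }

    public static void main(String[] args) {
        System.out.println(intBits(63823));
        System.out.println(doubleBits(1.0));
        System.out.println(Integer.SIZE + " " + Double.SIZE); // 32 64
    }
}
```

The size is fixed by the type (Integer.SIZE and Double.SIZE), not by how many digits the value happens to have.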
Not exactly. The double 1.0 represents more information because, by the definition of a double as a 64 bit float, there are more values that it could be. To use your example, if you had a special data type that could only have two values, 63823 and 98321234213474932, then it would only take 1 bit to represent the number 63823, though it would be far less useful than an int.
In terms of implementation, it's often a lot easier and faster to work with fixed-size data types, so that you can allocate a fixed chunk of memory (that's what a variable is) without having to know its value and constantly reallocate space. Examples of types with a different approach are String and BigInteger, which allocate space to accommodate their values. Note that both are immutable in Java; that's not a coincidence.
These primitive data types need to be defined somewhere for you to use them. A primitive is not a flexible container where you can stuff in whatever you want; it is more like a bottle, which takes the same space whether it is full or empty. It also has a maximum it can contain.
The zeros that are not shown also count. Approximately, ignoring the fact that the numbers are actually stored in binary and not in decimal, when you write both numbers with the implied zero digits included, you get:
1.0 = 1.00000000000000000*10^0000
63823 = 0000063823
As you can see, 1.0 is twice as long as 63823. Therefore it requires twice as much storage.
The int and double don't store decimal digits at all. The decimal representation of 63823 has 5 digits after removing leading zeros, while the int itself has room for 32 binary digits. The double has room for 52 stored binary digits of mantissa (53 bits of precision, counting the implicit leading 1), an 11-bit exponent, and a sign bit.

Java multiplying two BigInt objects

There's a BigInt class and two objects, num1 and num2. I have a lab assignment, and I have to multiply num1 and num2. Each can be an integer of up to 50 digits. The class has a size and a digit field: size is the number of digits in the integer that is entered, and digit is an array that holds the integer.
I have to write a method that multiplies these two objects and returns the product. I'm a little confused about how to start. I've seen examples where there are two loops and a base, and I have no idea what the base would be used for.
Any pointers in the right direction would be appreciated.
I assume the base is the radix of the stored digits (10 for decimal, 16 for hexadecimal, etc.), used for a more general implementation.
Generally, you need to use normal long multiplication, as you learned in school.
Also note that the result could be up to 100 digits in length - if you just need the 50 least significant, you could optimize the long multiplication a bit (pretty much cut it in half).
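Here is a sketch of that long multiplication on digit arrays. The class LongMultiplication and method multiply are hypothetical, and it assumes digits are stored least-significant first; the lab's BigInt class may store them the other way round, in which case reverse the indices. The base is what each column "wraps" at: a column value of base or more is split into a digit and a carry, exactly like carrying past 9 in decimal.

```java
import java.util.Arrays;

public class LongMultiplication {
    // Schoolbook multiplication of two non-negative numbers stored as digit arrays,
    // least-significant digit first. Assumes base * base fits in an int (base <= 46340).
    static int[] multiply(int[] a, int[] b, int base) {
        int[] result = new int[a.length + b.length]; // product has at most this many digits
        for (int i = 0; i < a.length; i++) {
            int carry = 0;
            for (int j = 0; j < b.length; j++) {
                int cur = result[i + j] + a[i] * b[j] + carry;
                result[i + j] = cur % base;          // digit that stays in this column
                carry = cur / base;                  // overflow moves to the next column
            }
            result[i + b.length] += carry;
        }
        // Trim leading zeros (stored at the high end), but keep at least one digit.
        int size = result.length;
        while (size > 1 && result[size - 1] == 0) size--;
        return Arrays.copyOf(result, size);
    }

    public static void main(String[] args) {
        // 12 * 34 = 408, digits least-significant first
        System.out.println(Arrays.toString(multiply(new int[] {2, 1}, new int[] {4, 3}, 10))); // [8, 0, 4]
    }
}
```

The two nested loops are the rows and columns of the paper method; the result array is sized a.length + b.length, which is where the "up to 100 digits" for two 50-digit inputs comes from.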

Using logarithm instead of division for large numbers?

I couldn't really come up with a proper title for my question, but allow me to present my case. I want to calculate a significance ratio of the form p = 1 - X / Y
Here X comes from an iterative process; the process takes a large number of steps and counts how many different ways the process can end up in different states (stored in a HashMap). Once the iteration is over, I select a number of states and sum their values. It's hard to tell how large these numbers are so I am intending to implement the sum as BigInteger.
Y, on the other hand, comes from a binomial coefficient with numbers in the thousands. I am inclined to use logGamma to calculate these coefficients, which gives me the natural logarithm of the value.
What I am interested in is doing the division X / Y in the best/most effective way. If I can get X as a natural logarithm, then I could subtract the logarithms and get my result as 1 - e^(lnX - lnY).
I see that BigInteger can't be logarithmized by Math.log, what can I do in this case?
You may be able to use doubles. A double can be extremely large, up to about 1.8e308. What it lacks is precision: it only holds about 15-17 significant decimal digits. But if you can live with that (in other words, if you don't care about the difference between 100,000,000,000,000,000 and 100,000,000,000,000,001), then doubles might get you close enough.
If you are calculating binomial coefficients of numbers in the thousands, then doubles will not be good enough, since values such as C(2000, 1000) exceed the range of a double.
Instead, I would be inclined to call the toString method on the number and compute the log as log(10) * number.toString().length() + log(asFloat("0." + number.toString())), where asFloat takes a string representation of a number and converts it to a float.
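A sketch of that idea in Java, with asFloat replaced by Double.parseDouble (the class BigLog and method ln are hypothetical names). It relies on x = 0.d1d2d3... * 10^n, so ln(x) = n * ln(10) + ln(0.d1d2d3...), and only the leading digits matter for a double. ln(Y) is faked with a power of ten here; in the real problem it would come from the logGamma-based computation.

```java
import java.math.BigInteger;

public class BigLog {
    // Natural log of a positive BigInteger: x = 0.d1d2d3... * 10^n,
    // so ln(x) = n*ln(10) + ln(0.d1d2d3...). A double only uses the leading
    // ~17 significant digits, so the remaining digits are dropped.
    static double ln(BigInteger x) {
        if (x.signum() <= 0) throw new IllegalArgumentException("x must be positive");
        String digits = x.toString();
        String lead = digits.substring(0, Math.min(17, digits.length()));
        return digits.length() * Math.log(10) + Math.log(Double.parseDouble("0." + lead));
    }

    public static void main(String[] args) {
        BigInteger x = BigInteger.TEN.pow(400).multiply(BigInteger.valueOf(3)); // too big for a double
        double lnX = ln(x);
        double lnY = ln(BigInteger.TEN.pow(401)); // placeholder for ln(Y)
        double p = 1 - Math.exp(lnX - lnY);       // 1 - X/Y with X/Y = 0.3
        System.out.println(p);                    // approximately 0.7
    }
}
```

Working in log space keeps every intermediate value small; only the final difference lnX - lnY is exponentiated, and that is a ratio that fits comfortably in a double.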
If you need maximum precision, how about converting the BigIntegers into BigDecimals and doing algebra on them. If precision isn't paramount, then perhaps you can convert your BigIntegers into doubles and do simple algebra with them. Perhaps you can tell us more about your problem domain and why you feel logarithms are the best way to go.
