Why is my byte array displaying the wrong length?

BigInteger number = new BigInteger("7316717653133062491922511967442657474235534919493496983520312774506326239578318016984801869478851843858615607891129494954595017379583319528532088055111254069874715852386305071569329096329522744304355766896648950445244523161731856403098711121722383113622298934233803081353362766142828064444866452387493035890729629049156044077239071381051585930796086670172427121883998797908792274921901699720888093776657273330010533678812202354218097512545405947522435258490771167055601360483958644670632441572215539753697817977846174064955149290862569321978468622482839722413756570560574902614079729686524145351004748216637048440319989000889524345065854122758866688116427171479924442928230863465674813919123162824586178664583591245665294765456828489128831426076900422421902267105562632111110937054421750694165896040807198403850962455444362981230987879927244284909188845801561660979191338754992005240636899125607176060588611646710940507754100225698315520005593572972571636269561882670428252483600823257530420752963450");
byte[] array = number.toByteArray();
System.out.println(array.length);
I was working on Problem 8 of Project Euler, where the number is supposed to be 1000 digits long, but whenever I run this program it prints 416. Could someone please explain why this isn't working?

One character doesn't mean one byte here. For example, the number 11 is 00001011 in binary, which can be represented by just 1 byte.
Similarly, in your case
7316717653133062491922511967442657474235534919493496983520312774506326239578318016984801869478851843858615607891129494954595017379583319528532088055111254069874715852386305071569329096329522744304355766896648950445244523161731856403098711121722383113622298934233803081353362766142828064444866452387493035890729629049156044077239071381051585930796086670172427121883998797908792274921901699720888093776657273330010533678812202354218097512545405947522435258490771167055601360483958644670632441572215539753697817977846174064955149290862569321978468622482839722413756570560574902614079729686524145351004748216637048440319989000889524345065854122758866688116427171479924442928230863465674813919123162824586178664583591245665294765456828489128831426076900422421902267105562632111110937054421750694165896040807198403850962455444362981230987879927244284909188845801561660979191338754992005240636899125607176060588611646710940507754100225698315520005593572972571636269561882670428252483600823257530420752963450
is in binary
10110010001100111000001011110110000101110010001100111100000010000000000101001001010111001001101001000010101110100011010111001001000111101101011011110011001101111011101010000110110111010110010101110010000001011101011000001001000100110101110101111000101001100001010101011011000001101001110001110010010111110010010101101101101110110101111000101111010010110101101110001101111110110110001101101101101100011101000010110100011011101100100111000100100000000110111001011101011000111100100100101101100011111011111011000100110001100010001111110010101111100010001110100000100000111100001111010110100110101000010111100010100000010010001011111100001100000111111100000101101000101101011001110110000000011000111001110000000001001001011101010001000100010101001011110011001100110001100011101010101000010101010110001110110101100100001010101001100101101111000111000110000010010111001110000010011011111110011011110110001110111101011010100010010001101001100100111101010011011101100000000100111001010111111101101010011010000110111111100010110011101110100100011100000101001110100010111010111110001110010000110101111110010001010100011011000010001111110110010100101011000000010011001111000110010101111110100111111001001010110110100000101001001011100000101010001100100110100010011000111011011111100010000010000111010111111110110101000100101010111110111010000101110110010000000011000111111011001110110011111111001000011001101111101101100001011010001011101011110001011011110100101010000000011101001110110010000110011000100011100010000101101100110001110010001100101001101110000101101101001100101001011111110001001010110011001111001110010111000000001001101100001100010010011110111101011001010100100100001111101010111111010101011010110010010100000110001100100101111010010001100010111110011110111010101110101101111111101010110100110111010000110100101100101101010011001000101100000001101010011001010101101100110001010110001111000110001001100100111011110111111111001011100000111111100001100100011000111111010111001100010010010101001010
01100011110110110000101001111010101001011101000101011011011000010010000001000110001000100101000000110010000100101000101001101111010010011101010001110011001110000001011011001111100100110010101101011000101001111110011101101010001111000111101101110111001001111101001010000011001101000111110100000000100011101000101111101001111100101111101010000100011101100000110010010001001110010100101010101101000100111000001110100010011110110000100001111001001001010111101001111001100010101000110000101111100101110001110001000011001010001000001101111100010110001000111111010101110110100100111011100100010000011111100001100011001011110010111111100011111010010100100000111100110101011000010100011100100001000101011101110000011101010110100111101101110000110010011110101110110100011001101110111101010110100000010001011111110011000010111111111101101110011110010100101011100100111101000110100001011011011010101111100001101111010110011110111000000010101101000111000100101101010110010110110010100001000000110000110011100011000111101011010110011010000100000111000101101100111111101111110100110010010011001011001010110001110111011100110101010101011000110100100001000011101111011100111010001101101011111011111001010110111011101110000001110010011001101010000010110001100101101111011111011111000100010100001000011001100010101100010100100101011101111010
Now check how many bytes it takes to represent this number: it needs about 3,322 bits, and 3,322 bits plus one sign bit fit in 416 bytes.
More generally, a binary string of length N can represent numbers up to 2^N - 1.
For example, for length 2 the largest binary string is 11, which is 2^2 - 1 = 3 in base 10.

This is because toByteArray stores the binary representation of the number, not the decimal one. You can think of each byte as a single digit in base 256. That's why the representation needs fewer than half as many bytes as the number has decimal digits.
If you need one byte per digit, convert your BigInteger to a String: its length equals the number of digits (plus one character for the minus sign '-' if the number is negative).
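To see the difference between the two lengths, here is a minimal sketch. The class and method names are made up for illustration, and 10^999 is used as a stand-in 1000-digit number rather than the one from the question. The byte count is compared with the formula from the toByteArray Javadoc, ceil((bitLength() + 1) / 8), which equals bitLength() / 8 + 1 in integer arithmetic.

```java
import java.math.BigInteger;

class ByteLengthDemo {
    // Number of decimal digits, ignoring a possible minus sign.
    static int decimalDigits(BigInteger n) {
        return n.abs().toString().length();
    }

    // Number of bytes in the two's complement form returned by toByteArray().
    static int byteCount(BigInteger n) {
        return n.toByteArray().length;
    }

    public static void main(String[] args) {
        BigInteger n = BigInteger.TEN.pow(999); // smallest 1000-digit number
        System.out.println("decimal digits: " + decimalDigits(n));
        System.out.println("bytes:          " + byteCount(n));
        // Same value, computed from the bit length (includes one sign bit).
        System.out.println("from bitLength: " + (n.bitLength() / 8 + 1));
    }
}
```

If you want to count or walk over the decimal digits, work with the String; the byte array is only useful when you want the binary form.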

I don't know precisely how BigInteger stores values, but my guess would be that rather than storing them as a string, with one byte per digit, it stores them as one long number, with log_2(n) bits being used to store the number n, and therefore ceiling(log_2(n) / 8) bytes being used.

Because a byte array is a number in base 256 (since every digit can have range 0-255 or 0x00-0xFF) while the input number is in base 10. When you convert your number into a byte array you obtain a number which is in a different base, hence has a different amount of digits.
To prove it you can apply the change of base of logarithms:
logA(C) = logB(C) / logB(A)
log10(C) = log256(C) / log256(10)
1000 ~= 416 / log256(10)
1000 ~= 416 / (log2(10)/log2(256))
1000 ~= 416 / (3.3219/8)
1000 ~= 416 / 0.4152
1000 * 0.4152 ~= 416
415.2 ~= 416
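The same change of base can be run as code. This is only a rough sketch with a hypothetical helper name: it estimates the byte count from the digit count alone, so it ignores the sign bit and the actual leading digits and can be off by one for some inputs.

```java
class Base256Estimate {
    // Estimated number of base-256 digits (bytes) for a number
    // with the given count of base-10 digits.
    static long estimatedBytes(long decimalDigits) {
        double bitsPerDecimalDigit = Math.log(10) / Math.log(2); // about 3.3219
        return (long) Math.ceil(decimalDigits * bitsPerDecimalDigit / 8);
    }

    public static void main(String[] args) {
        System.out.println(estimatedBytes(1000)); // prints 416
    }
}
```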

Related

What is the difference in bytes of a number as a string and as an integer?

Let's say we have a my_string = "123456"
I do
my_string.getBytes()
and
new BigInteger("123456").toByteArray()
The resulting byte arrays are different for both these cases. Why is that so? Isn't "123456" same as 123456 other than the difference in data type?

They are different because the String type is made up of Unicode characters. The character '2' is not at all the same as the numeric value 2.

No. Why would they be? "123456" is a sequence of the ASCII character 1 (which is represented not as the number 1 but as the number 49), followed by the character 2 (50), and so on. 123456 as an int isn't represented as a sequence of digits 0-9 at all; it is stored as a number in binary.
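Printing both arrays makes the difference visible. A small sketch (the class and method names are invented for this example, and US_ASCII is chosen explicitly so the result does not depend on the platform default charset):

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class StringVsNumberBytes {
    // One byte per character: the ASCII codes of the digits.
    static byte[] textBytes(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    // The number itself in big-endian two's complement binary.
    static byte[] numberBytes(String s) {
        return new BigInteger(s).toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(textBytes("123456")));   // [49, 50, 51, 52, 53, 54]
        System.out.println(Arrays.toString(numberBytes("123456"))); // [1, -30, 64]
    }
}
```

The second array is 0x01 0xE2 0x40, that is 1 * 65536 + 226 * 256 + 64 = 123456; the byte 0xE2 prints as -30 because Java bytes are signed.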

I assume that you are asking about the total memory used to represent a number as a String versus a byte[].
The String size will depend on the actual string representation used. This depends on the JVM version; see What is the Java's internal represention for String? Modified UTF-8? UTF-16?
For Java 8 and earlier (with some caveats), the String consists of a String object with 1 int field and 1 reference field. Assuming 64-bit references, that adds up to 8 bytes of header + 1 x 4 bytes + 1 x 8 bytes + 4 bytes of padding. Then add the char[] used to represent the characters: 12 bytes of header + 2 bytes per character, rounded up to a multiple of 8.
For Java 9 and later, the main object has the same size. (There is an extra field ... but that fits into the "padding".) The char[] is replaced by a byte[], and since you are just storing ASCII decimal digits [1], they will be encoded one character per byte.
In short, the asymptotic space usage is 1 byte per decimal digit for Java 9 or later and 2 bytes per decimal digit in Java 8 or earlier.
For the byte[] representation produced from a BigInteger, the representation consists of 12 bytes of header + 1 byte per byte, rounded up to a multiple of 8. The asymptotic size is 1 byte per byte.
In both cases there is also the size of the reference to the representation; i.e. another 8 bytes.
If you do the sums, the byte[] representation is more compact than the String representation in all cases. But int or long are significantly more compact than either of these representations in all cases.
[1] If you are not ... or if you are curious why I added this caveat ... read the Q&A at the link above!

Calculating number of bits and number of words BigInteger

While converting a String into a BigInteger, Java internally calculates the number of bits and then the number of words (each word is a group of 9 digits, I think) in the BigInteger, as can be seen here from Line 325 to Line 327. numWords is then used to create an array that can accommodate that BigInteger.
I don't understand the logic used for calculating numBits in Line 325, or the logic for numWords in Line 326.
Logically, I think that for the string "123456789" numWords should be 1, and for "12345678912" numWords should be 2, but that's not always the case. For example, for "12345678912345678912" numWords should be 3, but it comes out to be 2.
Can anyone please explain the logic used in Lines 325 and 326?

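Representing a decimal number with numDigits digits as a binary number requires at most
numDigits * Math.log(10) / Math.log(2)
bits. The source computes this in integer arithmetic:
int numBits = (int)(((numDigits * bitsPerDigit[radix]) >>> 10) + 1);
In the calculation above, bitsPerDigit[10] is 3402, which is log2(10) scaled by 2^10 = 1024 and rounded up; the >>> 10 then divides by 1024 again:
Math.log(10) / Math.log(2) * Math.pow(2, 10) = 3401.6543691646593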
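As a sketch of that calculation outside the JDK (the class, constant and method names here are hypothetical; only the value 3402 and the shape of the expression come from the quoted line), you can compare the estimate with the exact bit length that BigInteger reports:

```java
import java.math.BigInteger;

class NumBitsEstimate {
    // Value of bitsPerDigit[10] quoted above: log2(10) * 1024, rounded up.
    static final long BITS_PER_DECIMAL_DIGIT = 3402;

    // Same arithmetic as the quoted numBits line, for radix 10 only.
    static int estimatedBits(int numDigits) {
        return (int) (((numDigits * BITS_PER_DECIMAL_DIGIT) >>> 10) + 1);
    }

    public static void main(String[] args) {
        String s = "12345678912345678912"; // the 20-digit example from the question
        System.out.println(estimatedBits(s.length()));     // prints 67
        System.out.println(new BigInteger(s).bitLength()); // prints 64
    }
}
```

The estimate depends only on the number of digits, so it is an upper bound: a 20-digit number may need up to 67 bits, while this particular one needs 64.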

In Java, BigIntegers are not stored as strings or as bytes holding one digit each. They are stored as an array of 32-bit integers, which together form the so-called magnitude of the BigInteger. There can be no leading zero integers(*), so the BigInteger is stored as compactly as possible.
The "words" mentioned are these 32-bit integers. They are not groups of 9 digits; they are used in full, so each bit counts.
So to get the number of bits, you start from the number of 32-bit integers stored, that is, the length of the internal array, times 32. But the top integer can still have leading zero bits, so you must subtract the number of leading zero bits of that top integer from the product. In pseudo-code:
numBits = internalArray.length * 32 - numberOfLeadingZeroBits(internalArray[0]);
Note that the internal array is stored with the top integer at the lowest address (I have no idea why that is), so the top integer is at index 0 of the array.
(*) In reality, the above is a little more complicated, since the top item may be stored at an offset from the start of the array (probably to make certain calculations easier), but to understand the mechanism, you can pretend there are no extra integers.
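The internal array is private, so the pseudo-code cannot be run directly. The following sketch rebuilds the same quantities through the public API only, for non-negative values; the class and helper names are invented for illustration and are not part of BigInteger.

```java
import java.math.BigInteger;

class MagnitudeBits {
    // Number of 32-bit words needed for the magnitude of a non-negative value.
    static int wordCount(BigInteger n) {
        return (n.bitLength() + 31) / 32;
    }

    // Most significant 32-bit word of the magnitude (0 if the value is zero).
    static int topWord(BigInteger n) {
        int words = wordCount(n);
        return words == 0 ? 0 : n.shiftRight((words - 1) * 32).intValue();
    }

    // words * 32 minus the leading zero bits of the top word.
    static int bitsFromWords(BigInteger n) {
        int words = wordCount(n);
        return words == 0 ? 0 : words * 32 - Integer.numberOfLeadingZeros(topWord(n));
    }

    public static void main(String[] args) {
        BigInteger n = new BigInteger("12345678912345678912");
        System.out.println(wordCount(n));     // prints 2
        System.out.println(bitsFromWords(n)); // prints 64
        System.out.println(n.bitLength());    // prints 64
    }
}
```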

"Words" here doesn't refer to words as you know them; it refers to words as units of memory.
https://en.wikipedia.org/wiki/Word_(computer_architecture)

Shifting BigInteger of Java by long variable

I know there are methods shiftLeft(int n) and shiftRight(int n) for BigInteger class which only takes int type as an argument but I have to shift it by a long variable. Is there any method to do it?

A BigInteger can only have Integer.MAX_VALUE bits. Shifting right by more than this will always give zero (or -1 for a negative value). Shifting any value other than zero left by that much will overflow.
From the Javadoc
* BigInteger constructors and operations throw {@code ArithmeticException} when
* the result is out of the supported range of
* -2<sup>{@code Integer.MAX_VALUE}</sup> (exclusive) to
* +2<sup>{@code Integer.MAX_VALUE}</sup> (exclusive).
If you need more than 2 billion bits to represent your value, you have a fairly unusual problem that BigInteger wasn't designed for.
If you need to do bit manipulation on a very large scale, I suggest using a BitSet[]. This will allow up to 2 billion bit sets of 2 billion bits each, more than your addressable memory.
"Yes, the long variable might go up to 10^10."
For each 10^10-bit number you need 1.25 GB of memory. For this size of data, you may need to store it off heap. We have a library which persists this much data in a single memory mapping without using much heap, but you need to have at least this much space free on a single disk. https://github.com/OpenHFT/Chronicle-Bytes

BigInteger does not support values where long shift amounts would be appropriate. I tried
BigInteger a = BigInteger.valueOf(2).pow(Integer.MAX_VALUE);
and I got the following exception:
Exception in thread "main" java.lang.ArithmeticException: BigInteger would overflow supported range.
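If the shift amount arrives as a long anyway, the cases described in these answers can be wrapped in a pair of helpers. This is a sketch under those assumptions; LongShift and its methods are hypothetical names, not BigInteger API.

```java
import java.math.BigInteger;

class LongShift {
    // Right shift by a long amount. Once the amount exceeds Integer.MAX_VALUE,
    // every bit has been shifted out: 0 for non-negative values, -1 for negative ones.
    static BigInteger shiftRight(BigInteger value, long n) {
        if (n < 0) {
            throw new IllegalArgumentException("negative shift: " + n);
        }
        if (n <= Integer.MAX_VALUE) {
            return value.shiftRight((int) n);
        }
        return value.signum() < 0 ? BigInteger.ONE.negate() : BigInteger.ZERO;
    }

    // Left shift by a long amount. Zero stays zero; any other value cannot be
    // represented once the amount exceeds Integer.MAX_VALUE.
    static BigInteger shiftLeft(BigInteger value, long n) {
        if (n < 0) {
            throw new IllegalArgumentException("negative shift: " + n);
        }
        if (value.signum() == 0) {
            return BigInteger.ZERO;
        }
        if (n > Integer.MAX_VALUE) {
            throw new ArithmeticException("BigInteger would overflow supported range");
        }
        return value.shiftLeft((int) n);
    }

    public static void main(String[] args) {
        long huge = 10_000_000_000L; // 10^10, the amount mentioned by the asker
        System.out.println(shiftRight(BigInteger.valueOf(1024), huge)); // prints 0
        System.out.println(shiftRight(BigInteger.valueOf(-5), huge));   // prints -1
        System.out.println(shiftLeft(BigInteger.ZERO, huge));           // prints 0
    }
}
```

Left shifts by amounts that still fit in an int can also fail with an ArithmeticException or run out of memory; the helper simply passes those through to BigInteger.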

Since 2 ^ X is equal to 10 ^ (X * ln(2) / ln(10)), we can calculate for X = 10 ^ 10:
2 ^ (10 ^ 10) = 10 ^ 3,010,299,956.63981195...
= 10 ^ 3,010,299,956 * 10 ^ 0.63981195...
= 4.3632686... * 10 ^ 3,010,299,956
Meaning 4 followed by more than 3 billion more digits.
That's a very large number, and storing it to full precision will take some doing.

Floating point notation representation in java specification

Here: http://docs.oracle.com/javase/specs/jls/se8/html/jls-4.html#jls-4.2.3
it says that:
The finite nonzero values of any floating-point value set can all be expressed in the form s · m · 2^(e - N + 1), where s is +1 or -1, m is a positive integer less than 2^N, and e is an integer between Emin = -(2^(K-1)-2) and Emax = 2^(K-1)-1, inclusive, and where N and K are parameters that depend on the value set.
and there is a table below:
Parameter float
N 24
K 8
So let's say N = 24 and K = 8 then we can have the following value from the formula:
s · 2^N · 2^(2^(K-1)-1 - N + 1) which gives us according to values specified in the table:
s * 2^24 * 2^(127 - 24) which is equal to s * 2^127. But float has only 32 bits so it's not possible to store in it such a big number.
So it's obvious that initial formula should be read in a different way. How then?
Also in javadoc for Float max value: http://docs.oracle.com/javase/7/docs/api/java/lang/Float.html#MAX_VALUE
it says:
A constant holding the largest positive finite value of type float, (2-2^-23)·2^127
This also doesn't make sense, as the resulting value is much larger than 2^32, which is possibly the biggest value that can be stored in a float variable. So again, I'm misreading this notation. How should it be read?

The idea with the floating point notation is to store a much larger range of numbers than can be stored in the same space (bytes) with the integer representation. So, for example, you say that the "resulting value is much larger than 2^32". But, that would only be a problem if we're storing a typical binary number as one computes in a typical math class.
Instead, floating point representations break those 32 bits into two main parts:
- significand
- exponent
For simplicity, imagine that 3 bytes are used for the significand and 1 byte for the exponent. Also assume that each of these uses your typical binary integer style of representation. So, the three bytes can hold values up to 2^24 - 1, or 2^23 - 1 if you want to keep one bit for the sign.
Likewise, the other byte can store values up to 2^7 - 1 (if you want a sign there too).
So, you could express 500 · 2^100 by storing the 500 in the three bytes and the 100 in the 1 byte.
Essentially, one cannot store every number precisely. One changes it into significand-and-exponent form and stores as many significant digits as fit in the portion reserved for the significand (3 bytes in this example).
Rather than try to explain the complications, check this Wikipedia article for more.
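For the real float layout (1 sign bit, 8 exponent bits, 23 stored fraction bits), the JLS form s · m · 2^(e - N + 1) can be checked directly. The sketch below uses made-up names and handles finite values only; it pulls m and e out of the bit pattern and rebuilds the value. Note that m must be less than 2^N, so the largest significand is 2^24 - 1, not 2^24.

```java
class FloatParts {
    static final int N = 24; // significand precision of float, from the JLS table

    // Integer significand m, including the implicit leading 1 of normal values.
    static int significand(float f) {
        int bits = Float.floatToIntBits(f);
        int fraction = bits & 0x7FFFFF;
        int exponentField = (bits >>> 23) & 0xFF;
        return exponentField == 0 ? fraction : (fraction | 0x800000);
    }

    // Exponent e such that |f| = m * 2^(e - N + 1); subnormals use Emin = -126.
    static int exponent(float f) {
        int exponentField = (Float.floatToIntBits(f) >>> 23) & 0xFF;
        return exponentField == 0 ? -126 : exponentField - 127;
    }

    // Rebuilds s * m * 2^(e - N + 1) in double arithmetic (finite inputs only).
    static double rebuild(float f) {
        double sign = Float.floatToIntBits(f) < 0 ? -1.0 : 1.0;
        return sign * Math.scalb((double) significand(f), exponent(f) - N + 1);
    }

    public static void main(String[] args) {
        float max = Float.MAX_VALUE;
        System.out.println(significand(max) == (1 << 24) - 1); // prints true
        System.out.println(exponent(max));                      // prints 127
        System.out.println(rebuild(max) == max);                // prints true
    }
}
```

So Float.MAX_VALUE is (2^24 - 1) · 2^104, which is the same number as (2 - 2^-23) · 2^127. It fits in 32 bits because only the 24-bit significand and the small exponent are stored, not the expanded integer.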

Why do integers in Java not use all the 32 or 64 bits?

I was looking into 32-bit and 64-bit. I noticed that the range of integer values that can be stored in 32 bits is ±4,294,967,295, but the Java int is also 32-bit (if I am not mistaken) and it only stores values up to ±2,147,483,648. Same thing for long: it stores values up to ±2^63, but 64 bits can store ±2^64 values. How come these values are different?

Integers in Java are signed, so one bit is reserved to represent whether the number is positive or negative. The representation is called "two's complement notation." With this approach, the maximum positive value represented by n bits is given by
(2 ^ (n - 1)) - 1
and the corresponding minimum negative value is given by
-(2 ^ (n - 1))
The "off-by-one" aspect to the positive and negative bounds is due to zero. Zero takes up a slot, leaving an even number of negative numbers and an odd number of positive numbers. If you picture the represented values as marks on a circle—like hours on a clock face—you'll see that zero belongs more to the positive range than the negative range. In other words, if you count zero as sort of positive, you'll find more symmetry in the positive and negative value ranges.
To learn this representation, start small. Take, say, three bits and write out all the numbers that can be represented:
0
1
2
3
-4
-3
-2
-1
Can you write the three-bit sequence that defines each of those numbers? Once you understand how to do that, try it with one more bit. From there, you can imagine how it extends up to 32 or 64 bits.
That sequence forms a "wheel," where each number is formed by adding one to the previous one, with a wraparound from 3 to -4. That wraparound effect (which can also occur with subtraction) is called "modular arithmetic."
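To check your answers to the three-bit exercise, a short sketch like the following prints every pattern next to its two's complement value. The class and method names are invented for this example; the sign extension relies on Java's arithmetic right shift.

```java
class TwosComplementWheel {
    // Interprets the low `bits` bits of `pattern` as a two's complement number.
    static int toSigned(int pattern, int bits) {
        int shift = 32 - bits;
        return (pattern << shift) >> shift; // >> copies the sign bit back down
    }

    // The low `bits` bits of `pattern` as a string of 0s and 1s.
    static String bitString(int pattern, int bits) {
        StringBuilder sb = new StringBuilder();
        for (int i = bits - 1; i >= 0; i--) {
            sb.append((pattern >>> i) & 1);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints 000 -> 0 up to 011 -> 3, then 100 -> -4 up to 111 -> -1.
        for (int pattern = 0; pattern < 8; pattern++) {
            System.out.println(bitString(pattern, 3) + " -> " + toSigned(pattern, 3));
        }
    }
}
```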

In 32 bits you can store 2^32 values. Whether you call these values 0 to 4294967295 or -2147483648 to +2147483647 is up to you. This difference is called "signed type" versus "unsigned type". Java supports only signed types for int. Other languages have separate types for unsigned 32-bit integers.
No language will have a 32-bit type for ±4294967295, because the "-" part would require another bit.
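The same 32 bits can be read either way from Java itself. A brief sketch, assuming Java 8 or later for Integer.toUnsignedLong (the class and method names are hypothetical):

```java
class SignedVsUnsigned {
    // Reads the 32 bits of an int as an unsigned value, widened to long.
    static long asUnsigned(int bits) {
        return Integer.toUnsignedLong(bits);
    }

    public static void main(String[] args) {
        int allOnes = 0xFFFFFFFF;                 // all 32 bits set
        System.out.println(allOnes);              // prints -1
        System.out.println(asUnsigned(allOnes));  // prints 4294967295
        System.out.println(Integer.MIN_VALUE);    // prints -2147483648
        System.out.println(Integer.MAX_VALUE);    // prints 2147483647
    }
}
```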

That's because Java ints are signed, so you need one bit for the sign.
