I was wondering if there is a way to take an array of integers (and possibly decimal points) and convert them into a single number (an int or a double)? For example, if I had the array {4, 1, ., 9}, is it possible to convert it into a double, 41.9? I am implementing a way to do it by iterating and multiplying by various powers of 10, but I'm not sure if it will work out because of rounding errors and such (will it calculate e.g. 9 * (10^-1) correctly all the time?).
I guess a quick and dirty solution is to convert your array of integers and points (in which case you probably have a char array like so: {'4','1','.','9'}) into a string, "41.9", and then use Double.parseDouble(s) to parse that string into a double. It won't be fast, though.
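A minimal sketch of that string route, assuming the input really is a char array; the class and method names here are made up for illustration. Since Double.parseDouble rounds to the nearest double, it also sidesteps the worry about accumulating errors from repeated multiplication by powers of 10.

```java
final class DigitsToDouble {
    // Builds a String from the characters and lets the JDK parse it.
    static double toDouble(char[] digits) {
        return Double.parseDouble(new String(digits));
    }

    public static void main(String[] args) {
        char[] digits = {'4', '1', '.', '9'};
        System.out.println(toDouble(digits)); // prints 41.9
    }
}
```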
This only works if the possible values in the array are very limited (for instance 0 to 9) and the number of elements is not that large.
The following algorithm does this:
long l = 0x00;
int[] array = {5,2,3,6,7,8};
int radix = 9;
for(int i = 0x00; i < array.length; i++) {
    l *= radix;
    l += array[i];
}
return l;
Where radix is the maximum value +1.
Even then, however, a long has a maximum value of 2^63-1, or 9 223 372 036 854 775 807. That means that if the number of possible values per item is 10, you can only store 18 items safely: a 19-item sequence can already exceed the maximum.
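The loop above wraps around silently once the value no longer fits in a long. A hedged variant, assuming you would rather get an exception than a wrong number, can use Math.multiplyExact and Math.addExact (the class and method names are illustrative only):

```java
final class DigitsToLong {
    // Combines digits (each in 0..radix-1) into one long, most significant first.
    // Throws ArithmeticException instead of silently wrapping on overflow.
    static long combine(int[] digits, int radix) {
        long result = 0L;
        for (int digit : digits) {
            if (digit < 0 || digit >= radix) {
                throw new IllegalArgumentException("digit out of range: " + digit);
            }
            result = Math.addExact(Math.multiplyExact(result, (long) radix), (long) digit);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(combine(new int[] {5, 2, 3, 6, 7, 8}, 10)); // prints 523678
    }
}
```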
A double won't solve the problem. Since doubles use floating point semantics, they can only store approx. 15 to 17 significant decimal digits correctly. If you store more in a double, information will get lost eventually. What happens to the least significant values depends on the larger values (since half of the time a 0 will be generated as the least significant bit, and a 1 in the other case).
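To see the information loss concretely, here is a small sketch (names are illustrative) that pushes a long through a double and back. 2^53 + 1 is the first positive integer a double cannot hold exactly.

```java
final class DoublePrecisionDemo {
    // Converts a long to double and back, which loses low bits for large values.
    static long roundTrip(long value) {
        return (long) (double) value;
    }

    public static void main(String[] args) {
        long exact = 9007199254740993L;       // 2^53 + 1
        System.out.println(roundTrip(exact)); // prints 9007199254740992
    }
}
```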
I was wondering about the differences between positive and negative zero in different numeric types.
I understand IEEE-754 floating point arithmetic and the bit representation of double precision, so the following didn't come as a surprise:
double posz = 0.0;
double negz = -0.0;
System.out.println(Long.toBinaryString(Double.doubleToLongBits(posz)));
System.out.println(Long.toBinaryString(Double.doubleToLongBits(negz)));
// output
>>> 0
>>> 1000000000000000000000000000000000000000000000000000000000000000
What did surprise me, and showed me that I'm clueless about the bit representation of the long type in Java, is that even if I shift right (unsigned >>>), the binary representation of positive and negative zero is the same:
long posz = 0L;
long negz = -0L;
for (int i = 63; i >= 0; i--) {
    System.out.print((posz >>> i) & 1);
}
System.out.println();
for (int i = 63; i >= 0; i--) {
    System.out.print((negz >>> i) & 1);
}
// output
>>> 0000000000000000000000000000000000000000000000000000000000000000
>>> 0000000000000000000000000000000000000000000000000000000000000000
So I am wondering what Java does, in terms of bit representation, when I write the following:
long posz = 0L;
long negz = -0L;
Does the compiler understand that they are both zero and disregard the sign (and so assign 0 to the sign bit), or is there other magic here?
or is there other magic here?
Yes. 2's complement.
2's complement is a bit magical. It accomplishes 2 major objectives. Before getting into that, let's first stew on the notion of negative zero for a moment.
Negative zero is kinda weird. Why does it exist at all?
Negative zero isn't actually a thing. Ask any mathematician "Hey, so, what's up with negative zero?" and they'll just look at you in befuddlement. It's not a thing. Mathematically, 0 and -0 are utterly identical. Not just 'nearly identical', but 100%, fully, in all possible ways, identical. We don't generally want our numbers to be capable of representing both 5.0 as well as 5.00 - as those two are entirely, 100%, identical. If you don't think that a value system ought to waste bits trying to differentiate between 5.0 and 5.00, then it's equally bizarro to want the ability to represent -0.0 and +0.0 as distinct entities.
So, wanting -0 in the first place is kinda weird. None of the integral primitives (long, int, short, byte, and I guess char, which is technically numeric too) can represent this number. Instead, long z = -0 boils down to:
Take the constant "0".
Apply the 'negate' operation to this number. (- is a unary operator. Just like 2+5 makes the system calculate the binary operation of "addition" on elements 2 and 5, -x makes the system calculate the unary operation of "negation" on element x.) Applying the negation operation to 0 produces 0. It's no different from writing, say, int x = 5 + 0;. That +0 part doesn't do anything. The - in front of -0 doesn't do anything. In contrast to -0.0, where it does do something (it gets you negative zero, the double value, instead of positive zero).
Store this result in z (so, just 0 then).
There is no way to tell if that minus is there. They both result in ALL ZERO bits, and hence, there is no way for the computer to tell if you initialized that variable with the expression -0 or with +0. Again in contrast to double where, as you noticed, one bit differs.
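A quick sketch of that contrast, with made-up names. Note that == treats the two double zeros as equal; you need the raw bits (or something like 1.0 / x) to tell them apart.

```java
final class ZeroDemo {
    // True when the two doubles have different bit patterns.
    static boolean bitsDiffer(double a, double b) {
        return Double.doubleToLongBits(a) != Double.doubleToLongBits(b);
    }

    public static void main(String[] args) {
        System.out.println(Long.toBinaryString(-0L)); // prints 0
        System.out.println(0L == -0L);                // prints true
        System.out.println(0.0 == -0.0);              // prints true
        System.out.println(bitsDiffer(0.0, -0.0));    // prints true
        System.out.println(1.0 / -0.0);               // prints -Infinity
    }
}
```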
So why does double have it then?
Let's stew a bit on the notion of doubles and IEEE-754 math.
A double takes 64 bits. From basic pure mathematical principles then, a double is as incapable of representing more than 2^64 different possible values as you are of breaking the speed of light or making 1+1=3.
And yet, a double aims to represent all numbers. There are way more numbers between 0 and 1 than 2^64 options (in fact, an infinite amount of numbers exist between 0 and 1), and that's just 0 to 1.
So, how doubles actually work is different. A few less than 2^64 numbers are chosen from the entire number line. Let's call these the blessed numbers.
The blessed numbers are not equally distributed. The closer you are to 0, the more blessed numbers exist. In other words, the distance between 2 blessed numbers increases as you move away from 0. For example, if you start at, say, 1e100 (a 1 with a hundred zeroes) and want to find the next blessed number, it's quite a ways off. The gap is in fact larger than 1.0! 1e100+1 is in fact 1e100 again, because the way double math works is that after every single mathematical operation you do on them, the end result is rounded to the nearest blessed number.
Let's try it!
double d = 1e100;
System.out.println(d);
System.out.println(d + 1);
// prints: 1.0E100
// 1.0E100
But that means... double values don't actually represent a single number! What any given double represents is in fact this concept:
An unknown number whose value lies between [D - 𝛿, D + 𝛿], where D is the blessed number that is closest to the unknown number this value represents, and 𝛿 is half of the distance between D and the nearest blessed number on either side.
Given that usually 𝛿 is incredibly small, this is 'good enough'. But this weirdness does explain why you really, really do not want any business at all with double if accuracy is important (such as with currencies. Don't store those in doubles, ever!)
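If you want to look at the gaps between blessed numbers yourself, Math.nextUp and Math.ulp expose them. This is only an illustrative sketch; the helper name is made up.

```java
final class UlpDemo {
    // Distance from d to the next representable double above it.
    static double gapAbove(double d) {
        return Math.nextUp(d) - d;
    }

    public static void main(String[] args) {
        System.out.println(gapAbove(1.0));         // prints 2.220446049250313E-16
        System.out.println(gapAbove(1e100) > 1.0); // prints true
        System.out.println(1e100 + 1 == 1e100);    // prints true
    }
}
```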
Given that, what does -0.0 represent? Not actually just 0. It represents, specifically: An unknown number whose value lies between [-𝛿, 0], where 0 is real zero (and thus has no sign), and 𝛿 is Double.MIN_VALUE: the smallest non-zero positive number representable with a double.
That's why -0.0 and +0.0 both exist: They are in fact different concepts. Rarely relevant, but sometimes it is. In contrast to e.g. long where 5 just means 5 and not "between 4.5 and 5.5", because longs fundamentally don't recognize that fractional parts exist in the first place. Given that 5 just means 5, then 0 just means 0, and there is no such thing as negative zero in the first place.
Now we get to 2's complement
2's complement is a cool system. It has two neat properties:
It only has the one zero.
It does not matter if you treat the bit sequence as signed-by-way-of-2s-complement or as unsigned, for the purposes of these operations: addition, subtraction, increment, decrement, zero-check. The modifications you make to the bits to implement those operations are identical.
It DOES matter for greater than, less than, and divide.
2's complement works like this: To negate a number, take all bits and flip them (i.e. do a NOT operation on the bits). Then, add 1.
Let's try it!
int x = 5;
int y = -x;
for (int i = 31; i >= 0; i--) {
    System.out.print((x >>> i) & 1);
}
System.out.println();
for (int i = 31; i >= 0; i--) {
    System.out.print((y >>> i) & 1);
}
System.out.println();
// prints 00000000000000000000000000000101
// 11111111111111111111111111111011
As we can see, the 'flip all bits and add 1' algorithm was applied.
2s complement is, of course, reversible: If you do 'flip all bits and add 1' twice in a row you get the same number out.
Now let's try -0. 0 is 32 0 bits, then flip them all, then add 1:
00000000000000000000000000000000
11111111111111111111111111111111 // flip all
100000000000000000000000000000000 // add 1
00000000000000000000000000000000 // that 1 fell off
Because ints can only store 32 bits, that final '1' falls off the end, and we're left with zero again.
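The 'flip all bits and add 1' recipe can be written directly with Java's ~ operator. A small sketch (the method name is made up) that checks it against the built-in unary minus, including the zero case:

```java
final class TwosComplementNegate {
    // Negates by flipping every bit and adding 1; overflow wraps, as int math does.
    static int negate(int x) {
        return ~x + 1;
    }

    public static void main(String[] args) {
        System.out.println(negate(5));         // prints -5
        System.out.println(negate(negate(5))); // prints 5
        System.out.println(negate(0));         // prints 0
    }
}
```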
Now let's go with bytes (a bit smaller) and try to add, say, 200 and 50 together.
11001000 // 200 in binary
00110010 // 50 in binary
-------- +
11111010 // 250 in binary.
Now let's instead go: Oh wait, whoops, that was an error, actually these numbers are in 2s complement. That wasn't 200, nono. 11001000 is a bit sequence that actually means something else. Let's apply the 'flip all bits, add 1' scheme: flipping gives 00110111, adding 1 gives 00111000, which is 56, so the original was actually -56. So the operation was meant to represent '-56 + 50', which is -6. -6 in binary is (write out 6, flip bits, add 1):
00000110
11111001
11111010
hey now, look at that, nothing changed! It's the same result! So, when the computer does x + y, where x and y are numbers, the computer does not care. Whether x is "an unsigned number" or "a signed with 2s complement number", the operation is identical.
That's why 2s complement is applied. It makes math MUCH faster. The CPU doesn't have to futz about with branching out to deal with sign bits.
In this sense it is more correct to say that in Java, int, long, char, byte and short are neither signed nor unsigned, they just are. At least for the purposes of +, -, ++, and --. The idea that int is signed is fundamentally a property of e.g. System.out.println(int): that method chooses to render the bit sequence 11111111111111111111111111111111 as "-1" instead of as 4294967295.
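Java (8 and later) ships helpers that render or compare the same bits under the unsigned interpretation, which makes the point easy to try out. A sketch with an illustrative helper name:

```java
final class UnsignedViewDemo {
    // Renders the same 32 bits both ways: "signed/unsigned".
    static String bothViews(int bits) {
        return bits + "/" + Integer.toUnsignedString(bits);
    }

    public static void main(String[] args) {
        System.out.println(bothViews(0xFFFFFFFF));   // prints -1/4294967295
        byte sum = (byte) (200 + 50);                // bit pattern 11111010
        System.out.println(sum);                     // prints -6
        System.out.println(Byte.toUnsignedInt(sum)); // prints 250
        // Ordering is where the interpretation matters:
        System.out.println(-1 < 1);                              // prints true
        System.out.println(Integer.compareUnsigned(-1, 1) > 0);  // prints true
    }
}
```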
long has no such thing as negative zero. Only float and double have a different representation of positive and negative zero.
I'm implementing Karatsuba multiplication in Scala (my choice) for an online course. Considering the algorithm is meant to multiply large numbers, I chose the BigInt type which is backed by Java BigInteger. I'd like to implement the algorithm efficiently, which using base 10 arithmetic is copied below from Wikipedia:
procedure karatsuba(num1, num2)
    if (num1 < 10) or (num2 < 10)
        return num1*num2
    /* calculates the size of the numbers */
    m = max(size_base10(num1), size_base10(num2))
    m2 = floor(m/2)
    /* split the digit sequences in the middle */
    high1, low1 = split_at(num1, m2)
    high2, low2 = split_at(num2, m2)
    /* 3 calls made to numbers approximately half the size */
    z0 = karatsuba(low1, low2)
    z1 = karatsuba((low1 + high1), (low2 + high2))
    z2 = karatsuba(high1, high2)
    return (z2 * 10 ^ (m2 * 2)) + ((z1 - z2 - z0) * 10 ^ m2) + z0
Given that BigInteger is internally represented as an int[], if I can calculate m2 in terms of the int[], I can use bit shifting to extract the lower and higher halves of the number. Similarly, the last step can be achieved by bit shifting too.
However, it's easier said than done, as I can't seem to wrap my head around the logic. For example, if the max number is 999, the binary representation is 1111100111, lower half is 99 = 1100011, upper half is 9 = 1001. How do I get the above split?
Note:
There is an existing question that shows how to implement using arithmetic on BigInteger, but not bit shifting. Hence, my question is not a duplicate.
To be able to use bit shifting to do the splits and recombination, the base needs to be a power of two. Using two itself, as in the linked answer, is probably reasonable. Then the "length" of the inputs can be found directly with bitLength, and the split could be implemented as:
// x = a + 2^N b
BigInteger b = x.shiftRight(N);
BigInteger a = x.subtract(b.shiftLeft(N));
Where N is the size that a will have in bits.
Given that BigInteger is implemented with 32-bit limbs, it makes sense to use 2³² as the base, ensuring that the big shifts involve only the movement of whole integers, and not also the slower code path where the BigInteger is shifted by a value between 1 and 31. This could be accomplished by rounding N to a multiple of 32.
The specific constant in this line,
if (N <= 2000) return x.multiply(y); // optimize this parameter
should probably not be trusted too much, given that comment. For performance there should be some bound though, otherwise the recursive splitting goes too deep. For example, when the size of the numbers is 32 bits or less, it's clearly better to just multiply, but a good cut-off is probably much higher. In the source of BigInteger itself, the cutoff is expressed in terms of the number of limbs instead of bits, and set to 80 (so 2560 bits). It also has another threshold above which it switches to 3-way Toom-Cook multiplication instead of Karatsuba multiplication.
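Putting the pieces together, a rough sketch of a shift-based Karatsuba could look like the following. It assumes non-negative inputs, the class name and the CUTOFF_BITS constant are made up for illustration (2560 simply mirrors the figure mentioned above and is not tuned), and the split point is rounded up to a multiple of 32 as suggested.

```java
import java.math.BigInteger;

final class KaratsubaSketch {
    // Hypothetical cut-off in bits; at or below it we just call BigInteger.multiply.
    private static final int CUTOFF_BITS = 2560;

    // Multiplies two non-negative BigIntegers using base-2^half splitting.
    static BigInteger multiply(BigInteger x, BigInteger y) {
        int n = Math.max(x.bitLength(), y.bitLength());
        if (n <= CUTOFF_BITS) {
            return x.multiply(y);
        }
        // Split point: about half the bits, rounded up to a multiple of 32.
        int half = ((n / 2) + 31) & ~31;

        // x = lowX + 2^half * highX, and likewise for y.
        BigInteger highX = x.shiftRight(half);
        BigInteger lowX = x.subtract(highX.shiftLeft(half));
        BigInteger highY = y.shiftRight(half);
        BigInteger lowY = y.subtract(highY.shiftLeft(half));

        // Three recursive multiplications instead of four.
        BigInteger z0 = multiply(lowX, lowY);
        BigInteger z2 = multiply(highX, highY);
        BigInteger z1 = multiply(lowX.add(highX), lowY.add(highY))
                .subtract(z2)
                .subtract(z0);

        // Recombine with shifts instead of multiplications by powers of the base.
        return z2.shiftLeft(2 * half).add(z1.shiftLeft(half)).add(z0);
    }

    public static void main(String[] args) {
        BigInteger a = BigInteger.ONE.shiftLeft(20000).subtract(BigInteger.ONE);
        BigInteger b = BigInteger.ONE.shiftLeft(15000).add(BigInteger.valueOf(12345));
        System.out.println(multiply(a, b).equals(a.multiply(b))); // prints true
    }
}
```

Don't expect this to beat BigInteger.multiply, which already does the same thing internally; it is only meant to show where the shifts go.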
To reverse an integer and put it into a list, one would do the following (where x is some integer):
int lastDigit = x;
while(lastDigit != 0)
{
    list.add(lastDigit % 10);
    lastDigit /= 10;
}
So if x was 502, 2 0 and 5 would get added to the list.
This is obviously really useful, but until yesterday I thought the only way to do something like this was by converting the int to a string first.
I'm not sure if this is just common knowledge but I had not seen this method before today. I would like to understand how it works instead of merely memorizing it.
Could someone explain why the number modulus 10 gives the last digit, and why dividing it by 10 gives the next digit on the next iteration? Why would it eventually equal 0?
The modulus operator gives you the remainder from doing a division calculation.
502 % 10 is 2 because 502/10 = 50 plus a remainder of 2.
Therefore the remainder in this calculation is 2, meaning 2 will be added to the list.
The division by ten in the next line is performed using integer arithmetic, so 502/10 gives a result of 50.
Dividing any non-negative number less than 10 by 10 gives a result of zero, which ends the loop.
Think of % 10 as getting the least significant (right most) digit in decimal system (hence 10).
And then think of / 10 as shifting all digits one place right (also decimal). You obviously have to do it until the number is 0. All remaining digits can be understood as leading zeros in this case.
In binary system you can also use the bitwise operations & 1 and >> 1 instead of modulo (% 2) and integer (/ 2) divisions.
The list append operation (here add) is the one that reverses the order. The operations above are just for extraction of the single digits.
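Here is the same loop written for an arbitrary radix, plus the bitwise variant for binary, as a small sketch. It assumes x is non-negative, and the class and method names are made up.

```java
import java.util.ArrayList;
import java.util.List;

final class DigitExtraction {
    // Digits of a non-negative x in the given radix, least significant first.
    static List<Integer> digits(int x, int radix) {
        List<Integer> out = new ArrayList<>();
        while (x != 0) {
            out.add(x % radix); // last digit
            x /= radix;         // shift all digits one place to the right
        }
        return out;
    }

    // Same idea for radix 2, using & 1 and >>> 1 instead of % 2 and / 2.
    static List<Integer> bits(int x) {
        List<Integer> out = new ArrayList<>();
        while (x != 0) {
            out.add(x & 1);
            x >>>= 1;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(digits(502, 10)); // prints [2, 0, 5]
        System.out.println(digits(6, 2));    // prints [0, 1, 1]
        System.out.println(bits(6));         // prints [0, 1, 1]
    }
}
```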
While converting a String into a BigInteger, Java internally calculates the number of bits and then the number of words (each word is a group of 9 integers, I think) in the BigInteger, as can be seen here from Line 325 to Line 327. numWords is then used to create an array that can accommodate that BigInteger.
I don't understand the logic used for calculating numBits in line 325 and then the logic for numWords in Line 326.
Logically I think that for the string "123456789", numWords should be 1, and for "12345678912", numWords should be 2, but that's not always the case. For example, for "12345678912345678912", numWords should be 3, but it comes out to be 2.
Can anyone please explain the logic used in line 325 and 326?
Representing a decimal number with numDigits digits as a binary number requires at most about
numDigits * Math.log(10) / Math.log(2)
bits.
int numBits = (int)(((numDigits * bitsPerDigit[radix]) >>> 10) + 1);
In the calculation above bitsPerDigit[10] is 3402.
Math.log(10) / Math.log(2) * Math.pow(2, 10) = 3401.6543691646593
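Plugging numbers into that formula shows it is a fixed-point approximation of numDigits * log2(10), rounded up. A sketch that only uses the constant 3402 quoted above (the class, method and constant names are made up):

```java
final class NumBitsEstimate {
    // bitsPerDigit[10] as quoted above: log2(10) scaled by 1024 and rounded up.
    private static final long BITS_PER_DECIMAL_DIGIT_X1024 = 3402L;

    // Upper estimate of the bits needed for a decimal number with numDigits digits.
    static int estimateBits(int numDigits) {
        return (int) (((numDigits * BITS_PER_DECIMAL_DIGIT_X1024) >>> 10) + 1);
    }

    public static void main(String[] args) {
        System.out.println(estimateBits(9));  // prints 30
        System.out.println(estimateBits(11)); // prints 37
        System.out.println(estimateBits(20)); // prints 67
    }
}
```

This is an estimate taken before the digits are looked at, so it can be a few bits larger than what the actual value ends up needing.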
In Java, BigIntegers are not stored as strings or bytes with a digit each. They are stored as an array of 32-bit integers, which together form the so-called magnitude of the BigInteger. There can be no leading zero integers(*), so the BigInteger is stored as compactly as possible.
The "words" mentioned are these 32-bit integers. They are not groups of 9 digits, they are used in full, so each bit counts.
So you just have to know how many 32-bit integers are stored: the length of the internal array times 32 gives an upper bound on the number of bits. But the top integer can still have leading zero bits, so you must count those and subtract them from the product. In pseudo-code:
numBits = internalArray.length * 32 - numberOfLeadingZeroBits(internalArray[0]);
Note that the internal array is stored with the top integer at the lowest address (I have no idea why that is), so the top integer is at index 0 of the array.
(*) In reality, the above is a little more complicated, since the top item may be stored at an offset from the start of the array (probably to make certain calculations easier), but to understand the mechanism, you can pretend there are no extra integers.
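You can check this from the outside with the public bitLength() method, without touching the internal array. A sketch using the three strings from the question (the class and helper names are made up):

```java
import java.math.BigInteger;

final class WordCountDemo {
    // Number of 32-bit words needed for the magnitude of a non-negative value.
    static int wordsNeeded(BigInteger value) {
        return (value.bitLength() + 31) / 32;
    }

    public static void main(String[] args) {
        String[] samples = {"123456789", "12345678912", "12345678912345678912"};
        for (String s : samples) {
            BigInteger v = new BigInteger(s);
            System.out.println(s + ": " + v.bitLength() + " bits, " + wordsNeeded(v) + " word(s)");
        }
        // prints:
        // 123456789: 27 bits, 1 word(s)
        // 12345678912: 34 bits, 2 word(s)
        // 12345678912345678912: 64 bits, 2 word(s)
    }
}
```

The 20-digit value is just below 2^64, which is why it fits in 2 words rather than 3.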
"Words" doesn't refer to words as you know them; it refers to words as memory blocks.
https://en.wikipedia.org/wiki/Word_(computer_architecture)
In this simple code I cannot get a long larger than 1000000000. That is 10 characters long, and I want to get larger values, such as 15 characters.
long value = nextLong(rand,1000000000);
long nextLong(Random rng, long n) {
    long bits, val;
    do {
        bits = (rng.nextLong() << 1) >>> 1;
        val = bits % n;
    } while (bits-val+(n-1) < 0L);
    return val;
}
Your long constant is missing an L suffix:
long value = nextLong(rand,100000000000000L);
I want to get larger than such as 15 character.
Java's long has a range of -9223372036854775808 to 9223372036854775807 (18 full digits + top digit in the range 0..8), which is sufficient to cover the range that you need to cover. If you need 19 decimal digits or more, you would need to use BigInteger.
You should be able to use BigInteger.
Import using:
import java.math.BigInteger;
declare like this:
BigInteger myBigInt = new BigInteger("123456789123456789");
Increase your limit value of 'n'. Since you are limiting the generated random value by performing a modulo 'n', obviously the generated value needs to be less than 'n'. Since your limit is a long, you can increase that limit to allow for 15 digit results without other changes.
However, I am not sure what you are trying to accomplish with the loop in the nextLong function. It will only loop when bits > (Long.MAX_VALUE - n + 1).
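For illustration, here is the method from the question called with a larger bound, as a hedged sketch. The wrapper class name is made up, and the comments describe my reading of the loop: it looks like the usual rejection step that retries when bits falls in the last, incomplete block of size n, so that every result stays equally likely.

```java
import java.util.Random;

final class BoundedRandomLong {
    // Same logic as in the question: a value in [0, n), for n > 0.
    static long nextLong(Random rng, long n) {
        long bits, val;
        do {
            bits = (rng.nextLong() << 1) >>> 1;  // drop the sign bit: 0..Long.MAX_VALUE
            val = bits % n;                      // always in [0, n)
        } while (bits - val + (n - 1) < 0L);     // overflow means incomplete block: retry
        return val;
    }

    public static void main(String[] args) {
        long bound = 1_000_000_000_000_000L;     // 10^15, so results have up to 15 digits
        System.out.println(nextLong(new Random(), bound));
    }
}
```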
I get the feeling that you're limiting yourself by your own modulo operation.
Remember that modulo division is the same as short division - the kind we used back in third grade. That is, instead of dividing out the entire number, we take the whole portion and the remainder.
So, let's take a simple example (a power of 10, since you're using one as well):
99 / 10 = 9 remainder 9
That is to say, if I divide 99 by 10 using short division, 10 goes in evenly 9 times, with 9 left over. Notice that the left-over is smaller than what I'm dividing by.
This scales up with higher orders of divisors:
999 / 10 = 99 remainder 9
9999 / 10 = 999 remainder 9
99999 / 10 = 9999 remainder 9
...and so forth. Notice that our remainder is always smaller than our divisor. This makes sense, since if it were at least as large as our divisor, the divisor would fit in one more time, and that would count toward the quotient and not the remainder.
Now, we come back to your example. You're taking a long value, which can be several orders of magnitude larger or smaller than your passed in value of a billion (which fits fine into an int, and is promoted to a long when you call your method).
The ultimate issue comes down to this:
val = bits % n;
...where bits is some arbitrary long value that could be greater than n.
Remember what we discovered with the short division above? That's right: your resulting val will always be smaller than your n value. That is to say, it will never be larger than or equal to n.
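A tiny sketch of that effect with the largest possible input (names are illustrative): the result is capped by n, not by the size of bits.

```java
final class RemainderBound {
    // Remainder of a non-negative value; the result is always in [0, n).
    static long remainder(long bits, long n) {
        return bits % n;
    }

    public static void main(String[] args) {
        long bits = Long.MAX_VALUE; // 9223372036854775807
        System.out.println(remainder(bits, 1_000_000_000L));         // prints 854775807
        System.out.println(remainder(bits, 1_000_000_000_000_000L)); // prints 372036854775807
    }
}
```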
I'm not entirely sure what it is you're trying to accomplish, so I don't have The Right Thing™ for you to do. But I'd recommend that you re-evaluate the purpose of that modulo operation.