What is the most efficient way to compute the product
a * b^2 * c^3 * d^4 * e^5 * ...
assuming that squaring costs about half as much as multiplication? The number of operands is less than 100.
Is there a simple algorithm also for the case that the multiplication time is proportional to the square of operand length (as with java.math.BigInteger)?
The first (and only) answer is perfect w.r.t. the number of operations.
Funnily enough, when applied to sizable BigIntegers, this part doesn't matter at all. Even computing a*b*b*c*c*c*d*d*d*d*e*e*e*e*e without any optimizations takes about the same time.
Most of the time gets spent in the final multiplication (BigInteger implements none of the smarter algorithms like Karatsuba, Toom–Cook, or FFT, so the time is quadratic). What's important is assuring that the intermediate multiplicands are about the same size, i.e., given numbers p, q, r, s of about the same size, computing (pq) (rs) is usually faster than ((pq) r) s. The speed ratio seems to be about 1:2 for some dozens of operands.
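To illustrate keeping the operands of each multiplication about the same size, here is a minimal divide-and-conquer sketch (the class and method names are mine, not from the answer); it produces the same result as a left-to-right product but always pairs up similarly sized halves, i.e. the (pq)(rs) grouping:

import java.math.BigInteger;
import java.util.List;

class BalancedProduct {
    // Multiplies xs[from..to) by splitting in the middle, so the two operands of
    // each multiplication stay roughly the same size.
    static BigInteger product(List<BigInteger> xs, int from, int to) {
        if (to - from == 1) return xs.get(from);
        int mid = (from + to) >>> 1;
        return product(xs, from, mid).multiply(product(xs, mid, to));
    }

    public static void main(String[] args) {
        List<BigInteger> xs = List.of(BigInteger.valueOf(12345), BigInteger.valueOf(67890),
                BigInteger.valueOf(13579), BigInteger.valueOf(24680));
        System.out.println(product(xs, 0, xs.size()));   // same value as multiplying left to right
    }
}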
Update
In Java 8, there are both Karatsuba and Toom–Cook multiplications in BigInteger.
I absolutely don't know if this is the optimal approach (although I think it is asymptotically optimal), but you can do it all in O(N) multiplications. You group the arguments of a * b^2 * c^3 like this: c * (c*b) * (c*b*a). In pseudocode:
result = 1
accum = 1
for i in 0 .. n-1:               # n is the number of arguments
    accum = accum * arg[n-1-i]   # arg is 0-based; start from the factor with the highest exponent
    result = result * accum
I think it is asymptotically optimal, because you have to use N-1 multiplications just to multiply N input arguments.
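A direct BigInteger translation of that accumulation scheme might look like this (a sketch; the class and method names are mine):

import java.math.BigInteger;

class RisingPowers {
    // Computes arg[0]^1 * arg[1]^2 * ... * arg[n-1]^n with O(n) BigInteger multiplications,
    // using the grouping c * (c*b) * (c*b*a) described above.
    static BigInteger product(BigInteger[] arg) {
        BigInteger result = BigInteger.ONE;
        BigInteger accum = BigInteger.ONE;
        for (int i = arg.length - 1; i >= 0; i--) {
            accum = accum.multiply(arg[i]);    // suffix product arg[i] * arg[i+1] * ... * arg[n-1]
            result = result.multiply(accum);
        }
        return result;
    }
}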
As mentioned in the Oct 26 '12 edit:
With multiplication time superlinear in the size of the operands, it would be advantageous to keep the operands of long operations similar in size (especially if the only Toom-Cook available was toom-2 (Karatsuba)). If not going for a full optimisation, putting operands in a queue that allows popping them in order of increasing (significant) length looks like a decent shot from the hip.
Then again, there are special cases: 0, powers of 2, multiplications where one factor is (otherwise) "trivial" ("long-by-single-digit multiplication", linear in sum of factor lengths).
And squaring is simpler/faster than general multiplication (question suggests assuming ½), which would suggest the following strategy:
in a pre-processing step, count trailing zeroes weighted by exponent
result 0 if encountering a 0
remove trailing zeroes, discard resulting values of 1
result 1 if no values left
find and combine values occurring more than once
set up a queue allowing extraction of the "shortest" number. For each pair (number, exponent), insert the factors exponentiation by squaring would multiply
optional: combine "trivial factors" (see above) and re-insert
Not sure how to go about this. Say factors of length 12 were trivial, and initial factors are of length 1, 2, …, 10, 11, 12, …, n. Optimally, you combine 1+10, 2+9, … for 7 trivial factors from 12. Combining shortest gives 3, 6, 9, 12 for 8 from 12
extract the shortest pair of factors, multiply and re-insert (see the sketch after this list)
once there is just one number, the result is that with the zeroes from the first step tacked on
(If factorisation was cheap, it would have to go on pretty early to get most from cheap squaring.)
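A rough sketch of the shortest-first combining step, ignoring the pre-processing and the exponentiation-by-squaring setup (class and method names are mine), could use a priority queue ordered by bit length:

import java.math.BigInteger;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class ShortestFirstProduct {
    // Repeatedly multiplies the two currently shortest factors, which keeps the
    // operands of each multiplication roughly balanced in size.
    static BigInteger product(List<BigInteger> factors) {
        PriorityQueue<BigInteger> queue =
                new PriorityQueue<>(Comparator.comparingInt(BigInteger::bitLength));
        queue.addAll(factors);
        while (queue.size() > 1) {
            BigInteger a = queue.poll();
            BigInteger b = queue.poll();
            queue.add(a.multiply(b));
        }
        return queue.isEmpty() ? BigInteger.ONE : queue.poll();
    }
}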
So my question is about Java, but it could be about any programming language.
There is this declaration:
Random rnd = new Random();
We want to get a random number in the range 0 to x.
I want to know if there is any mathematical difference between the following:
rnd.nextInt() % x;
and
rnd.nextInt(x)
The main question is: is one of these solutions more random than the other? Is one solution more appropriate or "correct" than the other? If they are equivalent, I will be happy to see a mathematical proof of it.
Welcome to "mathematical insight" with "MS Paint".
So, from a statistical standpoint, it would depend on the distribution of the numbers being generated. First of all, we'll treat the probability of any one number coming up as an independent event (aka discarding the seed, which RNG, etc). Following that, a modulus simply takes a range of numbers (e.g. a from N, where 0<=a<N), and subdivides them based on the divisor (the x in a % x). While the numbers are technically from a discrete population (integers), the range of integers for a probability mass function would be so large that it'd end up looking like a continuous graph anyhow. So let's consider a graph of the probability distribution function for a range of numbers:
If your random number generator doesn't generate with a uniform distribution across the range of numbers (aka, any number is as likely to come up as another number), then modulo would (potentially) be breaking up the results of a non-uniform distribution. When you consider the individual integers in those ranges as discrete (and individual) outcomes, the probability of any number i (0 <= i < x) being the result is the sum of the probabilities of the individual values that map onto it (p(i_1) + p(i_2) + ... + p(i_(N/x))). To think of it another way, if we overlaid the subdivisions of the ranges, it's plain to see that in non-symmetric distributions, it's much more likely that a modulo would not result in equally likely outcomes:
Remember, the likelihood of an outcome i in the graph above would be obtained by summing the likelihoods of the individual numbers (i_1, ..., i_(N/x)) in the range N that could result in i. For further clarity, if your range N doesn't divide evenly by the modular divisor x, there will always be some N % x of the outcomes that have 1 additional integer that could produce their result. This means that most modulus divisors that aren't a power of 2 (and similarly, ranges that are not a multiple of their divisor) would be skewed towards their lower results, even with a uniform distribution:
So to summarize the point, Random#nextInt(int bound) takes all of these things (and more!) into consideration, and will consistently produce an outcome with uniform probability across the range of bound. Random#nextInt() % bound is only a halfway step that works in some specific scenarios. To your teacher's point, I would argue it's more likely you'll see some specific subset of numbers when using the modulus approach, not less.
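For reference, one standard way to get an unbiased bounded value from a raw generator is rejection sampling, in the spirit of what nextInt(bound) does internally (this sketch is mine, not the JDK source):

import java.util.Random;

class UnbiasedBound {
    // Draws uniformly from [0, bound) by discarding raw values that fall into the
    // incomplete "tail" of the integer range, so no residue class is over-represented.
    static int nextIntUnbiased(Random rnd, int bound) {
        int limit = (Integer.MAX_VALUE / bound) * bound;   // largest multiple of bound <= 2^31 - 1
        int raw;
        do {
            raw = rnd.nextInt() & Integer.MAX_VALUE;       // non-negative 31-bit value
        } while (raw >= limit);
        return raw % bound;
    }
}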
new Random(x) just creates the Random object with the given seed; it does not itself yield a random value.
I presume you are asking what the difference is between nextInt() % x and nextInt(x).
The difference is as follows.
nextInt(x)
nextInt(x) yields a random number n where 0 ≤ n < x, evenly distributed.
nextInt() % x
nextInt() % x yields a random number in the full integer range¹, and then applies modulo x. The full integer range includes negative numbers, so the result could also be a negative number. In other words, the range is −x < n < x.
Furthermore, in the vast majority of cases the distribution is not even. nextInt() has 2^32 possibilities, but, for simplicity's sake, let's assume it has 2^4 = 16 possibilities, and we choose x less than 16. Let's assume that x is 10.
All 16 possibilities are 0, 1, 2, …, 14, 15. After applying the modulo 10, the results are 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5. That means that some numbers have a greater likelihood of occurring than others. It also means that the chance of some numbers occurring twice has increased.
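A tiny sketch (mine, just for illustration) that tallies this toy example, 16 equally likely raw values reduced modulo 10, makes the skew visible:

import java.util.Arrays;

class ModuloSkew {
    public static void main(String[] args) {
        int[] counts = new int[10];
        for (int v = 0; v < 16; v++) {    // 16 equally likely raw values 0..15
            counts[v % 10]++;
        }
        System.out.println(Arrays.toString(counts));
        // prints [2, 2, 2, 2, 2, 2, 1, 1, 1, 1]: 0..5 are twice as likely as 6..9
    }
}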
As we see, nextInt() % x has two problems:
Range is not as required.
Uneven distribution.
So you should definitely use nextInt(int bound) here. If the requirement is to get only unique numbers, you must exclude the numbers already drawn from the number generator. See also Generating Unique Random Numbers in Java.
¹ According to the Javadoc.
So I've implemented my own little RSA algorithm and in the course of that I wrote a function to find large prime numbers.
First I wrote a function prime? that tests for primality, and then I wrote two versions of a prime-searching function. In the first version I just test random BigIntegers until I hit a prime. In the second version I sample a random BigInteger and then increment it until I find a prime.
(defn resampling []
  (let [rnd (Random.)]
    (->> (repeatedly #(BigInteger. 512 rnd))
         (take-while (comp not prime?))
         (count))))

(defn incrementing []
  (->> (BigInteger. 512 (Random.))
       (iterate inc)
       (take-while (comp not prime?))
       (count)))

(let [n 100]
  {:resampling   (/ (reduce + (repeatedly n resampling)) n)
   :incrementing (/ (reduce + (repeatedly n incrementing)) n)})
Running this code yielded the two averages of 332.41 for the resampling function and 310.74 for the incrementing function.
Now the first number makes complete sense to me. The prime number theorem states that the n'th prime is about n*ln(n) in size (where ln is the natural logarithm). So the distance between adjacent primes is approximately n*ln(n) - (n-1)*ln(n-1) ≈ (n - (n - 1))*ln(n) = ln(n) (For large values of n ln(n) ≈ ln(n - 1)). Since I'm sampling 512-bit integers I'd expect the distance between primes to be in the vicinity of ln(2^512) = 354.89. Therefore random sampling should take about 354.89 attempts on average before hitting a prime, which comes out quite nicely.
The puzzle for me is why the incrementing function is taking about just as many steps. If I imagine throwing a dart at a grid where primes are spaced 355 units apart, it should take only about half that many steps on average to walk to the next higher prime, since on average I'd be hitting the center between two primes.
(The code for prime? is a little lengthy. You can take a look at it here.)
You assume that primes are evenly distributed, but that is not the case.
Let's consider the following possible scenario: if primes always came in pairs, for example 10...01 and 10...03, then the next pair would come after about 2*ln(n). For the sampling algorithm this distribution makes no difference, but for the incrementing algorithm the probability of starting inside such a pair is almost 0, so on average it would need to walk half of the big distance, that is ln(n).
In a nutshell: to estimate the behavior of the incremental algorithm right, it is not enough to know the average distance between the primes.
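To see this effect with actual (small) primes rather than 512-bit BigIntegers, here is a little sketch of my own (not from the answer): a uniformly random starting point is more likely to land inside a long gap than a short one, so the average forward distance to the next prime comes out noticeably larger than half the mean gap.

import java.util.Random;

class GapBias {
    public static void main(String[] args) {
        int limit = 1_000_000;
        boolean[] composite = new boolean[limit + 1];
        for (int i = 2; i * i <= limit; i++)
            if (!composite[i])
                for (int j = i * i; j <= limit; j += i) composite[j] = true;

        // Mean gap between consecutive primes up to the limit.
        int primeCount = 0, firstPrime = -1, lastPrime = -1;
        for (int i = 2; i <= limit; i++)
            if (!composite[i]) { primeCount++; if (firstPrime < 0) firstPrime = i; lastPrime = i; }
        double meanGap = (double) (lastPrime - firstPrime) / (primeCount - 1);

        // Mean forward distance from a uniformly random start to the next prime.
        Random rnd = new Random();
        long totalSteps = 0;
        int samples = 100_000;
        for (int s = 0; s < samples; s++) {
            int start = 2 + rnd.nextInt(limit / 2);
            int k = start;
            while (composite[k]) k++;
            totalSteps += k - start;
        }
        System.out.printf("mean gap: %.2f, mean forward distance from a random start: %.2f%n",
                meanGap, (double) totalSteps / samples);
    }
}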
I have two matrices of doubles, 200,000x3,000 and 3,000x200,000. They are dense, and most values (80%) are filled.
How many iterations are needed for this?
The naive algorithm takes on the order of 200,000 * 3,000 * 200,000 = 120,000,000,000,000 (120 trillion) operations, so it will probably take a while.
The operands will each take about 4.5 GB whereas the output matrix will require about 298 GB, assuming 8 bytes per double.
It is not straightforward to compare Strassen to the naive algorithm as:
Furthermore, there is no need for the matrices to be square. Non-square matrices can be split in half using the same methods, yielding smaller non-square matrices. If the matrices are sufficiently non-square it will be worthwhile reducing the initial operation to more square products, using simple methods which are essentially O(n^2). For instance:
A product of size [2N x N] * [N x 10N] can be done as 20 separate [N x N] * [N x N] operations, arranged to form the result;
A product of size [N x 10N] * [10N x N] can be done as 10 separate [N x N] * [N x N] operations, summed to form the result (see the sketch below).
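For the second case, a minimal sketch (all names are mine, and the square kernel is passed in rather than implemented) of reducing an [N x kN] * [kN x N] product to k square products that are summed:

import java.util.function.BinaryOperator;

class TallSkinnyProduct {
    // a is N x kN, b is kN x N; the result is N x N, computed as the sum of k
    // square [N x N] * [N x N] products so a square kernel (e.g. Strassen) can be reused.
    static double[][] multiply(double[][] a, double[][] b, int n, int k,
                               BinaryOperator<double[][]> squareMultiply) {
        double[][] result = new double[n][n];
        for (int block = 0; block < k; block++) {
            double[][] aBlock = new double[n][n];
            double[][] bBlock = new double[n][n];
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++) {
                    aBlock[i][j] = a[i][block * n + j];   // columns block*n .. block*n + n - 1 of a
                    bBlock[i][j] = b[block * n + i][j];   // rows    block*n .. block*n + n - 1 of b
                }
            double[][] partial = squareMultiply.apply(aBlock, bBlock);
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++) result[i][j] += partial[i][j];
        }
        return result;
    }
}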
These techniques will make the implementation more complicated, compared to simply padding to a power-of-two square; however, it is a reasonable assumption that anyone undertaking an implementation of Strassen, rather than conventional, multiplication, will place a higher priority on computational efficiency than on simplicity of the implementation.
See also Adaptive Strassen’s Matrix Multiplication.
I want to generate random integers which are uniformly distributed in log space. That is, the log of the values will be uniformly distributed.
A normal uniformly distributed unsigned int will have 75% of its magnitudes above 1 billion, and something like 99.98% above 1 million, so small values are underrepresented. A uniform value from log space would have the same number of values in the range 4-8, as 256-512, for example.
Ignoring negative values for now, one way I can think of is something like:
Random r = new Random();
return (int)Math.pow(2, r.nextDouble() * 31);
That should generate a log-uniformly distributed 31-bit value. It's not going to be fast though, with a pow() operation in there, and introducing floating-point values to generate integers is a bit of a smell. Furthermore, a lot of the range of double is lost by Random.nextDouble(), and it is not clear to me if this code can even generate all 2^31-1 positive integer values.
Better solutions welcome.
There are two similar solutions below which both involve filling the integer with random bits, then shifting a random number of bits to the right. Something like:
int number = rand.nextInt(Integer.MAX_VALUE) >> rand.nextInt(Integer.SIZE);
This has two types of bias:
Step-wise bias
This produces sort of a stepwise log-distributed value, not a smooth one. In particular, the right shift by a random value in [0,31] means there are 31 equally probable "sizes" of integers, and every value in that range is equally probable. Since there are 2^N values in range N, the values in one range are twice as probable as the ones in the next, so you get log behavior between the ranges, but the ranges themselves are flat.
I don't know of an easy way to get rid of this bias.
Top bit bias
A second form of bias occurs because the MSB is not always 1 (e.g., even a shift amount of 10 doesn't necessarily produce a 31-10=21 bit value), so there is an additional distortion. In effect, the ranges overlap. The value 1 is not just present (with p(1)=.5) for a shift amount of 30, but also for shifts of 29 (p(1)=0.25), 28 (p(1)=.125), and so on. That effect cancels out for smaller values (i.e., if you look at shift amounts of 30 and 29 only, 1 seems like it is 3x more likely than 2, rather than the predicted value of 2x, but once you look at more values it converges). It doesn't cancel out for large values, however, which is why you see the 20:32207 bucket be smaller than the others in #sprinter's answer.
I think this form of bias can pretty easily be removed simply by forcing the top bit to zero, so something like:
(r.nextInt(0x40000000) | 0x40000000) >> r.nextInt(31)
This has a couple of other tweaks: it uses a max of 2^30 for the rand, which is faster (there is a special case for powers of 2 in the nextInt(int) code), since we never need the rand to produce the second-from-MSB bit anyway (we force it to 1). This also eliminates a microscopic additional source of bias, namely that Integer.MAX_VALUE could never be generated, so one value was missing from full representation.
It shifts by [0,31) bits, so you never get zero. If you want zeros too, change that to shift by [0,32) bits and you'll get zeros as frequently as ones (technically not log-distributed anymore, but useful in many cases). Another approach is to subtract one from the final value to get zeros (at the cost of never getting Integer.MAX_VALUE).
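Putting the pieces together, here is a sketch (class and variable names are mine) of the MSB-forced variant with a quick bucket count by bit length; the buckets should come out roughly equal, which is the log-uniform behavior we are after:

import java.util.Random;

class LogUniformSketch {
    public static void main(String[] args) {
        Random r = new Random();
        int[] buckets = new int[32];               // bucket i counts values with floor(log2(value)) == i
        for (int i = 0; i < 1_000_000; i++) {
            int value = (r.nextInt(0x40000000) | 0x40000000) >> r.nextInt(31);
            buckets[31 - Integer.numberOfLeadingZeros(value)]++;
        }
        for (int i = 0; i <= 30; i++) {
            System.out.println(i + ": " + buckets[i]);   // roughly 1,000,000 / 31 per bucket
        }
    }
}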
Incorrect answer provided for information only. This does not satisfy OP's requirements for the reasons given in the question.
int number = rand.nextInt(Integer.MAX_VALUE) >> rand.nextInt(Integer.SIZE);
My informal test of that seems to indicate there is the expected skew. I generated 1M numbers this way and had the following distribution of the log (ignoring zeros)
0:46819
1:47045
2:40663
3:44001
4:45306
5:43802
6:46447
7:43355
8:47366
9:42747
10:46387
11:43899
12:45179
13:45496
14:44431
15:46751
16:43055
17:47127
18:41243
19:41837
20:32207
21:11965
Per the Java documentation, the hash code for a String object is computed as:
s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
using int arithmetic, where s[i] is the i-th character of the string, n is the length of the string, and ^ indicates exponentiation.
Why is 31 used as a multiplier?
I understand that the multiplier should be a relatively large prime number. So why not 29, or 37, or even 97?
According to Joshua Bloch's Effective Java (a book that can't be recommended enough, and which I bought thanks to continual mentions on stackoverflow):
The value 31 was chosen because it is an odd prime. If it were even and the multiplication overflowed, information would be lost, as multiplication by 2 is equivalent to shifting. The advantage of using a prime is less clear, but it is traditional. A nice property of 31 is that the multiplication can be replaced by a shift and a subtraction for better performance: 31 * i == (i << 5) - i. Modern VMs do this sort of optimization automatically.
(from Chapter 3, Item 9: Always override hashCode when you override equals, page 48)
Goodrich and Tamassia computed from over 50,000 English words (formed as the union of the word lists provided in two variants of Unix) that using the constants 31, 33, 37, 39, and 41 will produce fewer than 7 collisions in each case. This may be the reason that so many Java implementations choose such constants.
See section 9.2 Hash Tables (page 522) of Data Structures and Algorithms in Java.
On (mostly) old processors, multiplying by 31 can be relatively cheap. On an ARM, for instance, it is only one instruction:
RSB r1, r0, r0, ASL #5 ; r1 := - r0 + (r0<<5)
Most other processors would require a separate shift and subtract instruction. However, if your multiplier is slow this is still a win. Modern processors tend to have fast multipliers so it doesn't make much difference, so long as 32 goes on the correct side.
It's not a great hash algorithm, but it's good enough and better than the 1.0 code (and very much better than the 1.0 spec!).
By multiplying, bits are shifted to the left. This uses more of the available space of hash codes, reducing collisions.
By not using a power of two, the lower-order, rightmost bits are populated as well, to be mixed with the next piece of data going into the hash.
The expression n * 31 is equivalent to (n << 5) - n.
You can read Bloch's original reasoning under "Comments" in http://bugs.java.com/bugdatabase/view_bug.do?bug_id=4045622. He investigated the performance of different hash functions in regards to the resulting "average chain size" in a hash table. P(31) was one of the common functions during that time which he found in K&R's book (but even Kernighan and Ritchie couldn't remember where it came from). In the end he basically had to choose one and so he took P(31) since it seemed to perform well enough. Even though P(33) was not really worse and multiplication by 33 is equally fast to calculate (just a shift by 5 and an addition), he opted for 31 since 33 is not a prime:
Of the remaining four, I'd probably select P(31), as it's the cheapest to calculate on a RISC machine (because 31 is the difference of two powers of two). P(33) is similarly cheap to calculate, but it's performance is marginally worse, and 33 is composite, which makes me a bit nervous.
So the reasoning was not as rational as many of the answers here seem to imply. But we're all good in coming up with rational reasons after gut decisions (and even Bloch might be prone to that).
Actually, 37 would work pretty well! z := 37 * x can be computed as y := x + 8 * x; z := x + 4 * y. Both steps correspond to one LEA x86 instruction each, so this is extremely fast.
In fact, multiplication with the even-larger prime 73 could be done at the same speed by setting y := x + 8 * x; z := x + 8 * y.
Using 73 or 37 (instead of 31) might be better, because it leads to denser code: the two LEA instructions only take 6 bytes vs. the 7 bytes for move+shift+subtract for the multiplication by 31. One possible caveat is that the 3-argument LEA instructions used here became slower on Intel's Sandy Bridge architecture, with an increased latency of 3 cycles.
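For concreteness, the two-step decompositions can be checked directly in Java (a trivial sketch of my own, using shifts in place of the LEA address arithmetic):

class LeaStyleMultiply {
    public static void main(String[] args) {
        int x = 123456;
        int y37 = x + (x << 3);           // y = x + 8*x = 9*x   (one LEA on x86)
        int z37 = x + (y37 << 2);         // z = x + 4*y = 37*x  (one LEA on x86)
        int y73 = x + (x << 3);           // y = x + 8*x = 9*x
        int z73 = x + (y73 << 3);         // z = x + 8*y = 73*x
        System.out.println(z37 == 37 * x);    // true
        System.out.println(z73 == 73 * x);    // true
    }
}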
Moreover, 73 is Sheldon Cooper's favorite number.
Neil Coffey explains why 31 is used under Ironing out the bias.
Basically using 31 gives you a more even set-bit probability distribution for the hash function.
From JDK-4045622, where Joshua Bloch describes the reasons why that particular (new) String.hashCode() implementation was chosen
The table below summarizes the performance of the various hash
functions described above, for three data sets:
1) All of the words and phrases with entries in Merriam-Webster's
2nd Int'l Unabridged Dictionary (311,141 strings, avg length 10 chars).
2) All of the strings in /bin/, /usr/bin/, /usr/lib/, /usr/ucb/
and /usr/openwin/bin/* (66,304 strings, avg length 21 characters).
3) A list of URLs gathered by a web-crawler that ran for several
hours last night (28,372 strings, avg length 49 characters).
The performance metric shown in the table is the "average chain size"
over all elements in the hash table (i.e., the expected value of the
number of key compares to look up an element).
                          Webster's  Code Strings     URLs
                          ---------  ------------     ----
Current Java Fn.             1.2509        1.2738  13.2560
P(37)    [Java]              1.2508        1.2481   1.2454
P(65599) [Aho et al]         1.2490        1.2510   1.2450
P(31)    [K+R]               1.2500        1.2488   1.2425
P(33)    [Torek]             1.2500        1.2500   1.2453
Vo's Fn                      1.2487        1.2471   1.2462
WAIS Fn                      1.2497        1.2519   1.2452
Weinberger's Fn(MatPak)      6.5169        7.2142  30.6864
Weinberger's Fn(24)          1.3222        1.2791   1.9732
Weinberger's Fn(28)          1.2530        1.2506   1.2439
Looking at this table, it's clear that all of the functions except for
the current Java function and the two broken versions of Weinberger's
function offer excellent, nearly indistinguishable performance. I
strongly conjecture that this performance is essentially the
"theoretical ideal", which is what you'd get if you used a true random
number generator in place of a hash function.
I'd rule out the WAIS function as its specification contains pages of random numbers, and its performance is no better than any of the
far simpler functions. Any of the remaining six functions seem like
excellent choices, but we have to pick one. I suppose I'd rule out
Vo's variant and Weinberger's function because of their added
complexity, albeit minor. Of the remaining four, I'd probably select
P(31), as it's the cheapest to calculate on a RISC machine (because 31
is the difference of two powers of two). P(33) is similarly cheap to
calculate, but it's performance is marginally worse, and 33 is
composite, which makes me a bit nervous.
Josh
Bloch doesn't quite go into this, but the rationale I've always heard/believed is that this is basic algebra. Hashes boil down to multiplication and modulus operations, which means that you never want to use numbers with common factors if you can help it. In other words, relatively prime numbers provide an even distribution of answers.
The numbers that come into play when using a hash are typically:
modulus of the data type you put it into (2^32 or 2^64)
modulus of the bucket count in your hashtable (varies; in Java it used to be prime, now 2^n)
multiply or shift by a magic number in your mixing function
The input value
You really only get to control a couple of these values, so a little extra care is due.
In the latest version of the JDK, 31 is still used. https://docs.oracle.com/en/java/javase/12/docs/api/java.base/java/lang/String.html#hashCode()
The purpose of a string hash is to be:
unique (see the ^ operator in the hashCode calculation documentation; it helps with uniqueness)
cheap to calculate
31 is an odd prime that fits in an 8-bit (= 1 byte) register.
Multiplying by 31 is a shift left by 5 followed by subtracting the value itself, so it needs only cheap resources.
Java String hashCode() and 31
This is because 31 has a nice property: multiplication by it can be replaced by a bitwise shift and a subtraction, which is faster than the standard multiplication:
31 * i == (i << 5) - i
I'm not sure, but I would guess they tested some sample of prime numbers and found that 31 gave the best distribution over some sample of possible Strings.
A big expectation from hash functions is that their result's uniform randomness survives an operation such as hash(x) % N where N is an arbitrary number (and in many cases, a power of two), one reason being that such operations are used commonly in hash tables for determining slots. Using prime number multipliers when computing the hash decreases the probability that your multiplier and the N share divisors, which would make the result of the operation less uniformly random.
Others have pointed out the nice property that multiplication by 31 can be done by a shift and a subtraction. I just want to point out that there is a mathematical term for such primes: Mersenne prime.
All Mersenne primes are one less than a power of two, so we can write them as:
p = 2^n - 1
Multiplying x by p:
x * p = x * (2^n - 1) = x * 2^n - x = (x << n) - x
Shifts (SAL/SHL) and subtractions (SUB) are generally faster than multiplications (MUL) on many machines. See the instruction tables from Agner Fog.
That's why GCC seems to optimize multiplications by Mersenne primes by replacing them with shifts and subs, see here.
However, in my opinion, such a small prime is a bad choice for a hash function. With a relatively good hash function, you would expect to have randomness at the higher bits of the hash. However, with the Java hash function, there is almost no randomness at the higher bits with shorter strings (and still highly questionable randomness at the lower bits). This makes it more difficult to build efficient hash tables. See this nice trick you couldn't do with the Java hash function.
Some answers mention that they believe it is good that 31 fits into a byte. This is actually useless since:
(1) We execute shifts instead of multiplications, so the size of the multiplier does not matter.
(2) As far as I know, there is no specific x86 instruction to multiply an 8-byte value by a 1-byte value, so you would have had to convert "31" to an 8-byte value anyway even if you were multiplying. See here: you multiply entire 64-bit registers.
(And 127 is actually the largest Mersenne prime that could fit in a byte.)
Does a smaller value increase randomness in the middle-lower bits? Maybe, but it also seems to greatly increase the possible collisions :).
One could list many different issues but they generally boil down to two core principles not being fulfilled well: Confusion and Diffusion
But is it fast? Probably, since it doesn't do much. However, if performance is really the focus here, one character per loop is quite inefficient. Why not do 4 characters at a time (8 bytes) per loop iteration for longer strings, like this? Well, that would be difficult to do with the current definition of hash where you need to multiply every character individually (please tell me if there is a bit hack to solve this :D).
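For what it's worth, one way to consume four characters per iteration while still producing the exact same value is to fold in precomputed powers of 31 (this is a sketch of my own, not the JDK's code, and whether it actually runs faster depends on the JIT):

class UnrolledHash {
    // Same result as String.hashCode(), processing four chars per iteration:
    // h' = h*31^4 + c0*31^3 + c1*31^2 + c2*31 + c3
    static int hash4(CharSequence s) {
        int h = 0;
        int i = 0;
        int n = s.length();
        for (; i + 4 <= n; i += 4) {
            h = h * (31 * 31 * 31 * 31)
                    + s.charAt(i) * (31 * 31 * 31)
                    + s.charAt(i + 1) * (31 * 31)
                    + s.charAt(i + 2) * 31
                    + s.charAt(i + 3);
        }
        for (; i < n; i++) {              // leftover tail: the classic one-char-at-a-time loop
            h = 31 * h + s.charAt(i);
        }
        return h;
    }
}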