I am writing a program that solves a sum of tenth powers problem and I need to have a fast algorithm to find n^10 as well as n^(1/10) For natural n<1 000 000. I am precomputing an array of powers, so n^10 (array lookup) takes O(1). For n^(1/10) I am doing a binary search. Is there any way to accelerate extraction of a root beyond that? For example, making an array and filling elements with corresponding roots if the index is a perfect power or leaving zero otherwise would give O(1), but I will run out of memory. Is there a way to make root extraction faster than O(log(n))?
Why should the array of roots run out of memory? If it is the same size as the array of powers, it will fit using the same datatypes. However for the powers, (10^6)^10 = 10^60, which does not fit into a long variable so you need to use biginteger or bigdecimal types. In case your number n is bigger than the biggest array size n_max your memory can afford, you can divide n by n_m until it fits, i.e. split n = n_max^m*k, where m is a natural number and k < n_max:
public class Roots
{
static final int N_MAX = 1_000_000;
double[] roots = new double[N_MAX+1];
Roots() {for (int i = 0; i <= N_MAX; i++) {roots[i] = Math.pow(i, 0.1);}}
double root(long n)
{
int m = 0;
while (n > N_MAX)
{
n /= N_MAX;
m++;
}
return (Math.pow(roots[N_MAX],m)*roots[(int)n]); // in a real case you would precompute pow(roots[N_MAX],m) as well
}
static public void main(String[] args)
{
Roots root = new Roots();
System.out.println(root.root(1000));
System.out.println(root.root(100_000_000_000_000l));
}
}
Apart LUT You got two options to speed up I can think of:
use binary search without multiplication
If you are using bignums then 10th-root binary search search is not O(log(n)) anymore as the basic operation used in it are no longer O(1) !!! For example +,-,<<,>>,|,&,^,>=,<=,>,<,==,!= will became O(b) and * will be O(b^2) or O(b.log(b)) where b=log(n) depending on algorithm used (or even operand magnitude). So naive binary search for root finding will be in the better case O(log^2(n).log(log(n)))
To speedup it you can try not to use multiplication. Yes it is possible and the final complexity will bee O(log^2(n)) Take a look at:
How to get a square root for 32 bit input in one clock cycle only?
To see how to achieve this. The difference is only in solving different equations:
x1 = x0+m
x1^10 = f(x0,m)
If you obtain algebraically x1=f(x0,m) then each multiplication inside translate to bit-shifts and adds... For example 10*x = x<<1 + x<<3. The LUT table is not necessary as you can iterate it during binary search.
I imagine that f(x0,m) will contain lesser powers of x0 so analogically compute all the needed powers too ... so the final result will have no powering. Sorry too lazy to do that for you, you can use some math app for that like Derive for Windows
you can use pow(x,y) = x^y = exp2(y*log2(x))
So x^0.1 = exp2(log2(x)/10) But you would need bigdecimals for this (or fixed point) here see how I do it:
How can I write a power function myself?
For more ideas see this:
Power by squaring for negative exponents
Related
This is a common interview question.
You have a stream of numbers coming in (let's say more than a million). The numbers are between [0-999]).
Implement a class which supports three methods in O(1)
* insert(int i);
* getMean();
* getMedian();
This is my code.
public class FindAverage {
private int[] store;
private long size;
private long total;
private int highestIndex;
private int lowestIndex;
public FindAverage() {
store = new int[1000];
size = 0;
total = 0;
highestIndex = Integer.MIN_VALUE;
lowestIndex = Integer.MAX_VALUE;
}
public void insert(int item) throws OutOfRangeException {
if(item < 0 || item > 999){
throw new OutOfRangeException();
}
store[item] ++;
size ++;
total += item;
highestIndex = Integer.max(highestIndex, item);
lowestIndex = Integer.min(lowestIndex, item);
}
public float getMean(){
return (float)total/size;
}
public float getMedian(){
}
}
I can't seem to think of a way to get the median in O(1) time.
Any help appreciated.
You have already done all the heavy lifting, by building the store counters. Together with the size value, it's easy enough.
You simply start iterating the store, summing up the counts until you reach half of size. That is your median value, if size is odd. For even size, you'll grab the two surrounding values and get their average.
Performance is O(1000/2) on average, which means O(1), since it doesn't depend on n, i.e. performance is unchanged even if n reaches into the billions.
Remember, O(1) doesn't mean instant, or even fast. As Wikipedia says it:
An algorithm is said to be constant time (also written as O(1) time) if the value of T(n) is bounded by a value that does not depend on the size of the input.
In your case, that bound is 1000.
The possible values that you can read are quite limited - just 1000. So you can think of implementing something like a counting sort - each time a number is input you increase the counter for that value.
To implement the median in constant time, you will need two numbers - the median index(i.e. the value of the median) and the number of values you've read and that are on the left(or right) of the median. I will just stop here hoping you will be able to figure out how to continue on your own.
EDIT(as pointed out in the comments): you already have the array with the sorted elements(stored) and you know the number of elements to the left of the median(size/2). You only need to glue the logic together. I would like to point out that if you use linear additional memory you won't need to iterate over the whole array on each insert.
For the general case, where range of elements is unlimited, such data structure does not exist based on any comparisons based algorithm, as it will allow O(n) sorting.
Proof: Assume such DS exist, let it be D.
Let A be input array for sorting. (Assume A.size() is even for simplicity, that can be relaxed pretty easily by adding a garbage element and discarding it later).
sort(A):
ds = new D()
for each x in A:
ds.add(x)
m1 = min(A) - 1
m2 = max(A) + 1
for (i=0; i < A.size(); i++):
ds.add(m1)
# at this point, ds.median() is smallest element in A
for (i = 0; i < A.size(); i++):
yield ds.median()
# Each two insertions advances median by 1
ds.add(m2)
ds.add(m2)
Claim 1: This algorithm runs in O(n).
Proof: Since we have constant operations of add() and median(), each of them is O(1) per iteration, and the number of iterations is linear - the complexity is linear.
Claim 2: The output is sorted(A).
Proof (guidelines): After inserting n times m1, the median is the smallest element in A. Each two insertions after it advances the median by one item, and since the advance is sorted, the total output is sorted.
Since the above algorithm sorts in O(n), and not possible under comparisons model, such DS does not exist.
QED.
This is a problem I'm trying to solve on my own to be a bit better at recursion(not homework). I believe I found a solution, but I'm not sure about the time complexity (I'm aware that DP would give me better results).
Find all the ways you can go up an n step staircase if you can take k steps at a time such that k <= n
For example, if my step sizes are [1,2,3] and the size of the stair case is 10, I could take 10 steps of size 1 [1,1,1,1,1,1,1,1,1,1]=10 or I could take 3 steps of size 3 and 1 step of size 1 [3,3,3,1]=10
Here is my solution:
static List<List<Integer>> problem1Ans = new ArrayList<List<Integer>>();
public static void problem1(int numSteps){
int [] steps = {1,2,3};
problem1_rec(new ArrayList<Integer>(), numSteps, steps);
}
public static void problem1_rec(List<Integer> sequence, int numSteps, int [] steps){
if(problem1_sum_seq(sequence) > numSteps){
return;
}
if(problem1_sum_seq(sequence) == numSteps){
problem1Ans.add(new ArrayList<Integer>(sequence));
return;
}
for(int stepSize : steps){
sequence.add(stepSize);
problem1_rec(sequence, numSteps, steps);
sequence.remove(sequence.size()-1);
}
}
public static int problem1_sum_seq(List<Integer> sequence){
int sum = 0;
for(int i : sequence){
sum += i;
}
return sum;
}
public static void main(String [] args){
problem1(10);
System.out.println(problem1Ans.size());
}
My guess is that this runtime is k^n where k is the numbers of step sizes, and n is the number of steps (3 and 10 in this case).
I came to this answer because each step size has a loop that calls k number of step sizes. However, the depth of this is not the same for all step sizes. For instance, the sequence [1,1,1,1,1,1,1,1,1,1] has more recursive calls than [3,3,3,1] so this makes me doubt my answer.
What is the runtime? Is k^n correct?
TL;DR: Your algorithm is O(2n), which is a tighter bound than O(kn), but because of some easily corrected inefficiencies the implementation runs in O(k2 × 2n).
In effect, your solution enumerates all of the step-sequences with sum n by successively enumerating all of the viable prefixes of those step-sequences. So the number of operations is proportional to the number of step sequences whose sum is less than or equal to n. [See Notes 1 and 2].
Now, let's consider how many possible prefix sequences there are for a given value of n. The precise computation will depend on the steps allowed in the vector of step sizes, but we can easily come up with a maximum, because any step sequence is a subset of the set of integers from 1 to n, and we know that there are precisely 2n such subsets.
Of course, not all subsets qualify. For example, if the set of step-sizes is [1, 2], then you are enumerating Fibonacci sequences, and there are O(φn) such sequences. As k increases, you will get closer and closer to O(2n). [Note 3]
Because of the inefficiencies in your coded, as noted, your algorithm is actually O(k2 αn) where α is some number between φ and 2, approaching 2 as k approaches infinity. (φ is 1.618..., or (1+sqrt(5))/2)).
There are a number of improvements that could be made to your implementation, particularly if your intent was to count rather than enumerate the step sizes. But that was not your question, as I understand it.
Notes
That's not quite exact, because you actually enumerate a few extra sequences which you then reject; the cost of these rejections is a multiplier by the size of the vector of possible step sizes. However, you could easily eliminate the rejections by terminating the for loop as soon as a rejection is noticed.
The cost of an enumeration is O(k) rather than O(1) because you compute the sum of the sequence arguments for each enumeration (often twice). That produces an additional factor of k. You could easily eliminate this cost by passing the current sum into the recursive call (which would also eliminate the multiple evaluations). It is trickier to avoid the O(k) cost of copying the sequence into the output list, but that can be done using a better (structure-sharing) data-structure.
The question in your title (as opposed to the problem solved by the code in the body of your question) does actually require enumerating all possible subsets of {1…n}, in which case the number of possible sequences would be exactly 2n.
If you want to solve this recursively, you should use a different pattern that allows caching of previous values, like the one used when calculating Fibonacci numbers. The code for Fibonacci function is basically about the same as what do you seek, it adds previous and pred-previous numbers by index and returns the output as current number. You can use the same technique in your recursive function , but add not f(k-1) and f(k-2), but gather sum of f(k-steps[i]). Something like this (I don't have a Java syntax checker, so bear with syntax errors please):
static List<Integer> cache = new ArrayList<Integer>;
static List<Integer> storedSteps=null; // if used with same value of steps, don't clear cache
public static Integer problem1(Integer numSteps, List<Integer> steps) {
if (!ArrayList::equal(steps, storedSteps)) { // check equality data wise, not link wise
storedSteps=steps; // or copy with whatever method there is
cache.clear(); // remove all data - now invalid
// TODO make cache+storedSteps a single structure
}
return problem1_rec(numSteps,steps);
}
private static Integer problem1_rec(Integer numSteps, List<Integer> steps) {
if (0>numSteps) { return 0; }
if (0==numSteps) { return 1; }
if (cache.length()>=numSteps+1) { return cache[numSteps] } // cache hit
Integer acc=0;
for (Integer i : steps) { acc+=problem1_rec(numSteps-i,steps); }
cache[numSteps]=acc; // cache miss. Make sure ArrayList supports inserting by index, otherwise use correct type
return acc;
}
so I have a question about an algorithm I'm supposed to "invent"/"find". It's an algorithm which calculates 2^(n) - 1 for Ө(n^n) and Ө(1) and Ө(n).
I was thinking for several hours but I couldn't find any solution for both tasks (the first ones while the last one was the easist imo, I posted the algorithm below). But I'm not skilled enough to "invent"/"find" one for a very slow and very fast algorithm.
So far my algorithms are (In Pseudocode):
The one for Ө(n)
int f(int n) {
int number = 2
if(n = 0) then return 0
if(n==1) then return 1
while(n > 1)
number = number * 2
n--
number = number - 1
return number
A simple one and kinda obvious one which uses recursion though I don't know how fast it is (It would be nice if someone could tell me that):
int f(int n) {
if(n==0) then return 0
if(n==1) then return 1
return 3*f(n-1) - 2*f(n-2)
}
Assuming n is not bounded by any constant (and output should not be a simple int, but a data type that can contain large integers to allow it) - there is no algorithm
to yield 2^n -1 in Ө(1), since the size of the output itself is
Ө(log(n)), so if we assume there is such algorithm, and let it
run in constant time and makes less than C operations, for n =
2^(C+1), you will require C+1 operations only to print the
output, which contradicts the assumption that C is the upper bound, so
there is no such algorithm.
For Ө(n^n), if you have a more efficient algorithm (Ө(n) for example), you can make a pointless loop that runs extra n^n iterations and do nothing important, it will make your algorithm Ө(n^n).
There is also a Ө(log(n)*M(logn)) algorithm, using exponent by squaring, and then simply reducing 1 from this value. In here M(x) is complexity of your multiplying operator for number containing x digits.
As commented by #kajacx, you can even improve (3) by applying Fourier transform
Something like:
HugeInt h = 1;
h = h << n;
h = h - 1;
Obviously HugeInt is pseudo-code for an integer type that can be of arbitrary size allowing for any n.
=====
Look at amit's answer instead!
the Ө(n^n) is too tricky for me, but a real Ө(1) algorithm on any "binary" architecture would be:
return n-1 bits filled with 1
(assuming your architecture can allocate and fill n-1 bits in constant time)
;)
I have a map of items with some probability distribution:
Map<SingleObjectiveItem, Double> itemsDistribution;
Given a certain m I have to generate a Set of m elements sampled from the above distribution.
As of now I was using the naive way of doing it:
while(mySet.size < m)
mySet.add(getNextSample(itemsDistribution));
The getNextSample(...) method fetches an object from the distribution as per its probability. Now, as m increases the performance severely suffers. For m = 500 and itemsDistribution.size() = 1000 elements, there is too much thrashing and the function remains in the while loop for too long. Generate 1000 such sets and you have an application that crawls.
Is there a more efficient way to generate a unique set of random numbers with a "predefined" distribution? Most collection shuffling techniques and the like are uniformly random. What would be a good way to address this?
UPDATE: The loop will call getNextSample(...) "at least" 1 + 2 + 3 + ... + m = m(m+1)/2 times. That is in the first run we'll definitely get a sample for the set. The 2nd iteration, it may be called at least twice and so on. If getNextSample is sequential in nature, i.e., goes through the entire cumulative distribution to find the sample, then the run time complexity of the loop is at least: n*m(m+1)/2, 'n' is the number of elements in the distribution. If m = cn; 0<c<=1 then the loop is at least Sigma(n^3). And that too is the lower bound!
If we replace sequential search by binary search, the complexity would be at least Sigma(log n * n^2). Efficient but may not be by a large margin.
Also, removing from the distribution is not possible since I call the above loop k times, to generate k such sets. These sets are part of a randomized 'schedule' of items. Hence a 'set' of items.
Start out by generating a number of random points in two dimentions.
Then apply your distribution
Now find all entries within the distribution and pick the x coordinates, and you have your random numbers with the requested distribution like this:
The problem is unlikely to be the loop you show:
Let n be the size of the distribution, and I be the number of invocations to getNextSample. We have I = sum_i(C_i), where C_i is the number of invocations to getNextSample while the set has size i. To find E[C_i], observe that C_i is the inter-arrival time of a poisson process with λ = 1 - i / n, and therefore exponentially distributed with λ. Therefore, E[C_i] = 1 / λ = therefore E[C_i] = 1 / (1 - i / n) <= 1 / (1 - m / n). Therefore, E[I] < m / (1 - m / n).
That is, sampling a set of size m = n/2 will take, on average, less than 2m = n invocations of getNextSample. If that is "slow" and "crawls", it is likely because getNextSample is slow. This is actually unsurprising, given the unsuitable way the distrubution is passed to the method (because the method will, of necessity, have to iterate over the entire distribution to find a random element).
The following should be faster (if m < 0.8 n)
class Distribution<T> {
private double[] cummulativeWeight;
private T[] item;
private double totalWeight;
Distribution(Map<T, Double> probabilityMap) {
int i = 0;
cummulativeWeight = new double[probabilityMap.size()];
item = (T[]) new Object[probabilityMap.size()];
for (Map.Entry<T, Double> entry : probabilityMap.entrySet()) {
item[i] = entry.getKey();
totalWeight += entry.getValue();
cummulativeWeight[i] = totalWeight;
i++;
}
}
T randomItem() {
double weight = Math.random() * totalWeight;
int index = Arrays.binarySearch(cummulativeWeight, weight);
if (index < 0) {
index = -index - 1;
}
return item[index];
}
Set<T> randomSubset(int size) {
Set<T> set = new HashSet<>();
while(set.size() < size) {
set.add(randomItem());
}
return set;
}
}
public class Test {
public static void main(String[] args) {
int max = 1_000_000;
HashMap<Integer, Double> probabilities = new HashMap<>();
for (int i = 0; i < max; i++) {
probabilities.put(i, (double) i);
}
Distribution<Integer> d = new Distribution<>(probabilities);
Set<Integer> set = d.randomSubset(max / 2);
//System.out.println(set);
}
}
The expected runtime is O(m / (1 - m / n) * log n). On my computer, a subset of size 500_000 of a set of 1_000_000 is computed in about 3 seconds.
As we can see, the expected runtime approaches infinity as m approaches n. If that is a problem (i.e. m > 0.9 n), the following more complex approach should work better:
Set<T> randomSubset(int size) {
Set<T> set = new HashSet<>();
while(set.size() < size) {
T randomItem = randomItem();
remove(randomItem); // removes the item from the distribution
set.add(randomItem);
}
return set;
}
To efficiently implement remove requires a different representation for the distribution, for instance a binary tree where each node stores the total weight of the subtree whose root it is.
But that is rather complicated, so I wouldn't go that route if m is known to be significantly smaller than n.
If you are not concerning with randomness properties too much then I do it like this:
create buffer for pseudo-random numbers
double buff[MAX]; // [edit1] double pseudo random numbers
MAX is size should be big enough ... 1024*128 for example
type can be any (float,int,DWORD...)
fill buffer with numbers
you have range of numbers x = < x0,x1 > and probability function probability(x) defined by your probability distribution so do this:
for (i=0,x=x0;x<=x1;x+=stepx)
for (j=0,n=probability(x)*MAX,q=0.1*stepx/n;j<n;j++,i++) // [edit1] unique pseudo-random numbers
buff[i]=x+(double(i)*q); // [edit1] ...
The stepx is your accuracy for items (for integral types = 1) now the buff[] array has the same distribution as you need but it is not pseudo-random. Also you should add check if j is not >= MAX to avoid array overruns and also at the end the real size of buff[] is j (can be less than MAX due to rounding)
shuffle buff[]
do just few loops of swap buff[i] and buff[j] where i is the loop variable and j is pseudo-random <0-MAX)
write your pseudo-random function
it just return number from the buffer. At first call returns the buff[0] at second buff[1] and so on ... For standard generators When you hit the end of buff[] then shuffle buff[] again and start from buff[0] again. But as you need unique numbers then you can not reach the end of buffer so so set MAX to be big enough for your task otherwise uniqueness will not be assured.
[Notes]
MAX should be big enough to store the whole distribution you want. If it is not big enough then items with low probability can be missing completely.
[edit1] - tweaked answer a little to match the question needs (pointed by meriton thanks)
PS. complexity of initialization is O(N) and for get number is O(1).
You should implement your own random number generator (using a MonteCarlo methode or any good uniform generator like mersen twister) and basing on the inversion method (here).
For example : exponential law: generate a uniform random number u in [0,1] then your random variable of the exponential law would be : ln(1-u)/(-lambda) lambda being the exponential law parameter and ln the natural logarithm.
Hope it'll help ;).
I think you have two problems:
Your itemDistribution doesn't know you need a set, so when the set you are building gets
large you will pick a lot of elements that are already in the set. If you start with the
set all full and remove elements you will run into the same problem for very small sets.
Is there a reason why you don't remove the element from the itemDistribution after you
picked it? Then you wouldn't pick the same element twice?
The choice of datastructure for itemDistribution looks suspicious to me. You want the
getNextSample operation to be fast. Doesn't the map from values to probability force you
to iterate through large parts of the map for each getNextSample. I'm no good at
statistics but couldn't you represent the itemDistribution the other way, like a map from
probability, or maybe the sum of all smaller probabilities + probability to a element
of the set?
Your performance depends on how your getNextSample function works. If you have to iterate over all probabilities when you pick the next item, it might be slow.
A good way to pick several unique random items from a list is to first shuffle the list and then pop items off the list. You can shuffle the list once with the given distribution. From then on, picking your m items ist just popping the list.
Here's an implementation of a probabilistic shuffle:
List<Item> prob_shuffle(Map<Item, int> dist)
{
int n = dist.length;
List<Item> a = dist.keys();
int psum = 0;
int i, j;
for (i in dist) psum += dist[i];
for (i = 0; i < n; i++) {
int ip = rand(psum); // 0 <= ip < psum
int jp = 0;
for (j = i; j < n; j++) {
jp += dist[a[j]];
if (ip < jp) break;
}
psum -= dist[a[j]];
Item tmp = a[i];
a[i] = a[j];
a[j] = tmp;
}
return a;
}
This in not Java, but pseudocude after an implementation in C, so please take it with a grain of salt. The idea is to append items to the shuffled area by continuously picking items from the unshuffled area.
Here, I used integer probabilities. (The proabilities don't have to add to a special value, it's just "bigger is better".) You can use floating-point numbers but because of inaccuracies, you might end up going beyond the array when picking an item. You should use item n - 1 then. If you add that saftey net, you could even have items with zero probability that always get picked last.
There might be a method to speed up the picking loop, but I don't really see how. The swapping renders any precalculations useless.
Accumulate your probabilities in a table
Probability
Item Actual Accumulated
Item1 0.10 0.10
Item2 0.30 0.40
Item3 0.15 0.55
Item4 0.20 0.75
Item5 0.25 1.00
Make a random number between 0.0 and 1.0 and do a binary search for the first item with a sum that is greater than your generated number. This item would have been chosen with the desired probability.
Ebbe's method is called rejection sampling.
I sometimes use a simple method, using an inverse cumulative distribution function, which is a function that maps a number X between 0 and 1 onto the Y axis.
Then you just generate a uniformly distributed random number between 0 and 1, and apply the function to it.
That function is also called the "quantile function".
For example, suppose you want to generate a normally distributed random number.
It's cumulative distribution function is called Phi.
The inverse of that is called probit.
There are many ways to generate normal variates, and this is just one example.
You can easily construct an approximate cumulative distribution function for any univariate distribution you like, in the form of a table.
Then you can just invert it by table-lookup and interpolation.
Let's say that you have an arbitrarily large sized two-dimensional array with an even amount of items in it. Let's also assume for clarity that you can only choose between two things to put as a given item in the array. How would you go about putting a random choice at a given index in the array but once the array is filled you have an even split among the two choices?
If there are any answers with code, Java is preferred but other languages are fine as well.
You could basically think about it in the opposite way. Rather than deciding for a given index, which value to put in it, you could select n/2 elements from the array and place the first value in them. Then place the 2nd value in the other n/2.
A 2-D A[M,N] array can be mapped to a vector V[M*N] (you can use a row-major or a column-major order to do the mapping).
Start with a vector V[M*N]. Fill its first half with the first choice, and the second half of the array with the second choice object. Run a Fisher-Yates shuffle, and convert the shuffled array to a 2-D array. The array is now filled with elements that are evenly split among the two choices, and the choices at each particular index are random.
The below creates a List<T> the size of the area of the matrix, and fills it half with the first choice (spaces[0]) and half with the second (spaces[1]). Afterward, it applies a shuffle (namely Fisher-Yates, via Collections.shuffle) and begins to fill the matrix with these values.
static <T> void fill(final T[][] matrix, final T... space) {
final int w = matrix.length;
final int h = matrix[0].length;
final int area = w * h;
final List<T> sample = new ArrayList<T>(area);
final int half = area >> 1;
sample.addAll(Collections.nCopies(half, space[0]));
sample.addAll(Collections.nCopies(half, space[1]));
Collections.shuffle(sample);
final Iterator<T> cursor = sample.iterator();
for (int x = w - 1; x >= 0; --x) {
final T[] column = matrix[x];
for (int y = h - 1; y >= 0; --y) {
column[y] = cursor.next();
}
}
}
Pseudo-code:
int trues_remaining = size / 2;
int falses_remaining = size / 2;
while (trues_remaining + falses_remaining > 0)
{
if (trues_remaining > 0)
{
if (falses_remaining > 0)
array.push(getRandomBool());
else
array.push(true);
}
else
array.push(false);
}
Doesn't really scale to more than two values, though. How about:
assoc_array = { 1 = 4, 2 = 4, 3 = 4, 4 = 4 };
while (! assoc_array.isEmpty())
{
int index = rand(assoc_array.getNumberOfKeys());
int n = assoc_array.getKeyAtIndex(index);
array.push(n);
assoc_array[n]--;
if (assoc_array[n] <= 0) assoc_array.deleteKey(n);
}
EDIT: just noticed you asked for a two-dimensional array. Well it should be easy to adapt this approach to n-dimensional.
EDIT2: from your comment above, "school yard pick" is a great name for this.
It doesn't sound like your requirements for randomness are very strict, but I thought I'd contribute some more thoughts for anyone who may benefit from them.
You're basically asking for a pseudorandom binary sequence, and the most popular one I know of is the maximum length sequence. This uses a register of n bits along with a linear feedback shift register to define a periodic series of 1's and 0's that has a perfectly flat frequency spectrum. At least it is perfectly flat within certain bounds, determined by the sequence's period (2^n-1 bits).
What does that mean? Basically it means that the sequence is guaranteed to be maximally random across all shifts (and therefore frequencies) if its full length is used. When compared to an equal length sequence of numbers generated from a random number generator, it will contain MORE randomness per length than your typical randomly generated sequence.
It is for this reason that it is used to determine impulse functions in white noise analysis of systems, especially when experiment time is valuable and higher order cross effects are less important. Because the sequence is random relative to all shifts of itself, its auto-correlation is a perfect delta function (aside from qualifiers indicated above) so the stimulus does not contaminate the cross correlation between stimulus and response.
I don't really know what your application for this matrix is, but if it simply needs to "appear" random then this would do that very effectively. In terms of being balanced, 1's vs 0's, the sequence is guaranteed to have exactly one more 1 than 0. Therefore if you're trying to create a grid of 2^n, you would be guaranteed to get the correct result by tacking a 0 onto the end.
So an m-sequence is more random than anything you'll generate using a random number generator and it has a defined number of 0's and 1's. However, it doesn't allow for unqualified generation of 2d matrices of arbitrary size - only those where the total number of elements in the grid is a power of 2.