How to pick an item by its probability? - java

I have a list of items. Each of these items has its own probability.
Can anyone suggest an algorithm to pick an item based on its probability?

Generate a uniformly distributed random number in [0, 1).
Iterate through your list until the cumulative probability of the visited elements is greater than or equal to that random number.
Sample code:
double p = Math.random();
double cumulativeProbability = 0.0;
for (Item item : items) {
    cumulativeProbability += item.probability();
    if (p <= cumulativeProbability) {
        return item;
    }
}

So with each item, store a number that marks its relative probability. For example, if you have 3 items and one should be twice as likely to be selected as either of the other two, then your list will have:
[{A,1},{B,1},{C,2}]
Then sum the numbers of the list (i.e. 4 in our case).
Now generate a random number in [0, totalSum):
int index = rand.nextInt(4);
and return the item whose cumulative range contains that index.
Java code:
class Item {
    int relativeProb;
    String name;
    // getters, setters and constructor
}
...
class RandomSelector {
    List<Item> items;
    Random rand = new Random();
    int totalSum = 0;

    RandomSelector(List<Item> items) {
        this.items = items;
        for (Item item : items) {
            totalSum = totalSum + item.relativeProb;
        }
    }

    public Item getRandom() {
        int index = rand.nextInt(totalSum);
        int sum = 0;
        int i = 0;
        while (sum <= index) {
            sum = sum + items.get(i++).relativeProb;
        }
        return items.get(i - 1);
    }
}
(Note the loop condition is sum <= index: with sum < index, indexes 0 and 1 would both map to the first item in the example above, skewing the distribution. The original also summed the list in the constructor before any items were added, so the items are now passed in.)

Pretend that we have the following list:
Item A 25%
Item B 15%
Item C 35%
Item D 5%
Item E 20%
Let's pretend that all the probabilities are integers, and assign each item a "range" calculated as follows:
Start - sum of the probabilities of all items before it, plus 1
End - Start plus its own probability, minus 1
The new numbers are as follows:
Item A 1 to 25
Item B 26 to 40
Item C 41 to 75
Item D 76 to 80
Item E 81 to 100
Now pick a random number from 1 to 100. Let's say that you pick 32. 32 falls in Item B's range.
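The ranges above can be sketched in Java; the item names and weights come from the example, while the class and method names are illustrative:

```java
import java.util.Random;

public class RangePicker {
    // Weights from the example above, summing to 100.
    static final String[] NAMES = {"A", "B", "C", "D", "E"};
    static final int[] WEIGHTS = {25, 15, 35, 5, 20};

    static String pick(Random rand) {
        int r = rand.nextInt(100) + 1;   // random number in 1..100
        int end = 0;
        for (int i = 0; i < NAMES.length; i++) {
            end += WEIGHTS[i];           // running upper bound of item i's range
            if (r <= end) return NAMES[i];
        }
        throw new AssertionError("unreachable: weights sum to 100");
    }

    public static void main(String[] args) {
        // e.g. a draw of 32 falls in B's range (26..40)
        System.out.println(pick(new Random()));
    }
}
```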

You can try the Roulette Wheel Selection.
First, add up all the probabilities, then scale them to sum to 1 by dividing each one by the total. Suppose the scaled probabilities are A(0.4), B(0.3), C(0.25) and D(0.05). Then you can generate a random floating-point number in the range [0, 1) and decide like this:
random number in [0.00, 0.40) -> pick A
in [0.40, 0.70) -> pick B
in [0.70, 0.95) -> pick C
in [0.95, 1.00) -> pick D
You can also do it with random integers - say you generate a random integer between 0 and 99 (inclusive), then you can make the decision as above.
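A minimal Java sketch of roulette wheel selection, using the illustrative weights above; note that multiplying the random draw by the total weight is equivalent to normalizing the weights first:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;

public class RouletteWheel {
    // Returns an element of items with probability proportional to its weight.
    static <T> T spin(List<T> items, double[] weights, Random rand) {
        double total = 0;
        for (double w : weights) total += w;
        double r = rand.nextDouble() * total;   // uniform in [0, total)
        double cumulative = 0;
        for (int i = 0; i < items.size(); i++) {
            cumulative += weights[i];
            if (r < cumulative) return items.get(i);
        }
        return items.get(items.size() - 1);     // guard against rounding error
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("A", "B", "C", "D");
        double[] weights = {0.4, 0.3, 0.25, 0.05};
        System.out.println(spin(items, weights, new Random()));
    }
}
```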

The algorithms described in Ushman's, Brent's and kaushaya's answers are implemented in the Apache commons-math library.
Take a look at the EnumeratedDistribution class (Groovy code follows):
def probabilities = [
        new Pair<String, Double>("one", 25),
        new Pair<String, Double>("two", 30),
        new Pair<String, Double>("three", 45)]
def distribution = new EnumeratedDistribution<String>(probabilities)
println distribution.sample() // here you get one of your values
Note that the sum of the probabilities doesn't need to equal 1 or 100 - it will be normalized automatically.

My method is pretty simple. Generate a random number. Since the probabilities of your items are known, iterate through the cumulative probabilities in sorted order and pick the first item whose cumulative probability exceeds the randomly generated number.

A slow but simple way to do it is to have every member pick a random number based on its probability, then pick the one with the highest value.
Analogy:
Imagine 1 of 3 people needs to be chosen, but they have different probabilities. You give them dice with different numbers of faces. The first person's die has 4 faces, the second person's 6, and the third person's 8. They roll their dice and the one with the biggest number wins.
Lets say we have the following list:
[{A,50},{B,100},{C,200}]
Pseudocode:
A.value = random(0 to 50);
B.value = random(0 to 100);
C.value = random (0 to 200);
We pick the one with the highest value.
This method above does not exactly map the probabilities. For example, 100 will not have twice the chance of 50. But we can fix that by tweaking the method a bit.
Method 2
Instead of picking a number from 0 to each item's weight, we give each item a range that starts just above the previous item's upper limit and spans its own weight.
[{A,50},{B,100},{C,200}]
Pseudocode:
A.lowLimit = 0;              A.topLimit = 50;
B.lowLimit = A.topLimit + 1; B.topLimit = B.lowLimit + 100;
C.lowLimit = B.topLimit + 1; C.topLimit = C.lowLimit + 200;
resulting limits
A.limits = 0, 50
B.limits = 51, 151
C.limits = 152, 352
Then we pick a random number from 0 to 352 and compare it to each variable's limits to see whether the random number falls within them. (Strictly speaking, these inclusive ranges are each off by one; half-open ranges such as [0, 50), [50, 150) and [150, 350) map the weights exactly.)
I believe this tweak performs better, since it needs only one random generation.
There is a similar method in other answers but this method does not require the total to be 100 or 1.00.

Brent's answer is good, but it doesn't account for the possibility of erroneously choosing an item with a probability of 0 in cases where p = 0. That's easy enough to handle by checking the probability (or perhaps not adding the item in the first place):
double p = Math.random();
double cumulativeProbability = 0.0;
for (Item item : items) {
    cumulativeProbability += item.probability();
    if (p <= cumulativeProbability && item.probability() != 0) {
        return item;
    }
}

A space-costly way is to clone each item a number of times proportional to its probability. Selection is then done in O(1).
For example
//input
[{A,1},{B,1},{C,3}]
// transform into
[{A,1},{B,1},{C,1},{C,1},{C,1}]
Then simply pick any item randomly from this transformed list.
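A minimal Java sketch of this transformation, using the example weights above (expand is a hypothetical helper, not from any library):

```java
import java.util.*;

public class ExpandedList {
    // Build the expanded list once (O(total weight) space); picking is then O(1).
    static <T> List<T> expand(Map<T, Integer> weights) {
        List<T> expanded = new ArrayList<>();
        for (Map.Entry<T, Integer> e : weights.entrySet()) {
            for (int i = 0; i < e.getValue(); i++) expanded.add(e.getKey());
        }
        return expanded;
    }

    public static void main(String[] args) {
        Map<String, Integer> weights = new LinkedHashMap<>();
        weights.put("A", 1);
        weights.put("B", 1);
        weights.put("C", 3);
        List<String> pool = expand(weights);   // [A, B, C, C, C]
        Random rand = new Random();
        System.out.println(pool.get(rand.nextInt(pool.size())));
    }
}
```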

Adapted the code from https://stackoverflow.com/a/37228927/11257746 into a general extension method. This will allow you to get a weighted random value from a Dictionary with the structure <TKey, int>, where int is a weight value.
A Key that has a value of 50 is 10 times more likely to be chosen than a key with the value of 5.
C# code using LINQ:
/// <summary>
/// Get a random key out of a dictionary which has integer values treated as weights.
/// A key in the dictionary with a weight of 50 is 10 times more likely to be chosen than an element with the weight of 5.
///
/// Example usage to get 1 item:
/// Dictionary<MyType, int> myTypes;
/// MyType chosenType = myTypes.GetWeightedRandomKey<MyType, int>().First();
///
/// Adapted into a general extension method from https://stackoverflow.com/a/37228927/11257746
/// </summary>
public static IEnumerable<TKey> GetWeightedRandomKey<TKey, TValue>(this Dictionary<TKey, int> dictionaryWithWeights)
{
    int totalWeights = 0;
    foreach (KeyValuePair<TKey, int> pair in dictionaryWithWeights)
    {
        totalWeights += pair.Value;
    }

    System.Random random = new System.Random();
    while (true)
    {
        // random.Next(0, n) returns a value in [0, n)
        int randomWeight = random.Next(0, totalWeights);
        foreach (KeyValuePair<TKey, int> pair in dictionaryWithWeights)
        {
            int weight = pair.Value;
            // >= so that boundary values fall into the next key's range
            if (randomWeight - weight >= 0)
                randomWeight -= weight;
            else
            {
                yield return pair.Key;
                break;
            }
        }
    }
}
Example usage:
public enum MyType { Thing1, Thing2, Thing3 }
public Dictionary<MyType, int> MyWeightedDictionary = new Dictionary<MyType, int>();

public void MyVoid()
{
    MyWeightedDictionary.Add(MyType.Thing1, 50);
    MyWeightedDictionary.Add(MyType.Thing2, 25);
    MyWeightedDictionary.Add(MyType.Thing3, 5);

    // Get a single random key
    MyType myChosenType = MyWeightedDictionary.GetWeightedRandomKey<MyType, int>().First();

    // Get 20 random keys
    List<MyType> myChosenTypes = MyWeightedDictionary.GetWeightedRandomKey<MyType, int>().Take(20).ToList();
}

If you don't mind adding a third-party dependency to your code, you can use the MockNeat.probabilities() method.
For example:
String s = mockNeat.probabilites(String.class)
.add(0.1, "A") // 10% chance to pick A
.add(0.2, "B") // 20% chance to pick B
.add(0.5, "C") // 50% chance to pick C
.add(0.2, "D") // 20% chance to pick D
.val();
Disclaimer: I am the author of the library, so I might be biased when I am recommending it.

All the solutions mentioned so far take linear effort per sample. The following takes only logarithmic effort and also handles unnormalized probabilities. I'd recommend using a TreeMap rather than a List:
import java.util.*;
import java.util.stream.IntStream;

public class ProbabilityMap<T> extends TreeMap<Double,T> {
    private static final long serialVersionUID = 1L;
    public static Random random = new Random();
    public double sumOfProbabilities;

    public Map.Entry<Double,T> next() {
        return ceilingEntry(random.nextDouble() * sumOfProbabilities);
    }

    @Override public T put(Double key, T value) {
        return super.put(sumOfProbabilities += key, value);
    }

    public static void main(String[] args) {
        ProbabilityMap<Integer> map = new ProbabilityMap<>();
        map.put(0.1, 1); map.put(0.3, 3); map.put(0.2, 2);
        IntStream.range(0, 10).forEach(i -> System.out.println(map.next()));
    }
}

You could use the Julia code:
function selrnd(a::Vector{Int})
    c = a[:]
    sumc = c[1]
    for i = 2:length(c)
        sumc += c[i]
        c[i] += c[i-1]
    end
    r = rand() * sumc
    for i = 1:length(c)
        if r <= c[i]
            return i
        end
    end
end
This function returns the index of an item efficiently.


How to generate random values where a percentage of them are 0?

I create a random stream
Random random = new Random();
Stream<Integer> boxed = random.ints(0, 100000000).boxed();
But I need 60% of the numbers generated to be 0, while the remaining can be truly random. How can I do it?
EDIT:
And I need only positive numbers and between 0-100
1
2
0
0
9
0
0
1
12
I'll assume the OP wants approximately 60% of the generated values to be zero, and the remaining approximately 40% to be (pseudo-)random values in the range 1-100, inclusive.
The JDK library makes it easy to generate a stream of N different values.
Since there are 100 values in the range [1,100], and this represents 40% of the output, there need to be 150 values that map to zero to cover the remaining 60%. Thus N is 250.
We can create a stream of ints in the range [0,249] (inclusive) and map the lowest 150 values in this range to zero, leaving the remainder in the range [1,100]. Here's the code:
IntStream is = random.ints(0, 250)
.map(i -> Math.max(i-149, 0));
UPDATE
If the task is to produce exactly 60% zeroes, there's a way to do it, using a variation of an algorithm in Knuth, TAOCP Vol 2, sec 3.4.2, Random Sampling and Shuffling, Algorithm S. (I explain this algorithm in a bit more detail in this other answer.) This algorithm lets one choose n elements at random from a collection of N total elements, making a single pass over the collection.
In this case we're not selecting elements from a collection. Instead, we're emitting a known quantity of numbers, requiring some subset of them to be zeroes, with the remainder being random numbers from some range. The basic idea is that, as you emit numbers, the probability of emitting a zero depends on the quantity of zeroes remaining to be emitted vs. the quantity of numbers remaining to be emitted. Since this is a stream of fixed size, and it has a bit of state, I've opted to implement it using a Spliterator:
static IntStream randomWithPercentZero(int count, double pctZero, int range) {
    return StreamSupport.intStream(
        new Spliterators.AbstractIntSpliterator(count, Spliterator.SIZED) {
            int remainingInts = count;
            int remainingZeroes = (int) Math.round(count * pctZero);
            Random random = new Random();

            @Override
            public boolean tryAdvance(IntConsumer action) {
                if (remainingInts == 0)
                    return false;
                if (random.nextDouble() < (double) remainingZeroes / remainingInts--) {
                    remainingZeroes--;
                    action.accept(0);
                } else {
                    action.accept(random.nextInt(range) + 1);
                }
                return true;
            }
        },
        false);
}
There's a fair bit of boilerplate, but you can see the core of the algorithm within tryAdvance. If no numbers are remaining, it returns false, signaling the end of the stream. Otherwise, it emits a number, with a certain probability (starting at 60%) of it being a zero, otherwise a random number in the desired range. As more zeroes are emitted, the numerator drops toward zero. If enough zeroes have been emitted, the fraction becomes zero and no more zeroes are emitted.
If few zeroes are emitted, the denominator drops until it gets closer to the numerator, increasing the probability of emitting a zero. If few enough zeroes are emitted, eventually the required quantity of zeroes equals the quantity of numbers remaining, so the value of the fraction becomes 1.0. If this happens, the rest of the stream is zeroes, so enough zeroes will always be emitted to meet the requirement. The nice thing about this approach is that there is no need to collect all the numbers in an array and shuffle them, or anything like that.
Call the method like this:
IntStream is = randomWithPercentZero(1_000_000, 0.60, 100);
This gets a stream of 1,000,000 ints, 60% of which are zeroes, and the remainder are in the range 1-100 (inclusive).
Since the size of the target interval is divisible by ten, you can count on the last digit of generated numbers being uniformly distributed. Hence, this simple approach should work:
Generate numbers in the range [0, 1000)
If the last digit of the random number r is 0..5, inclusive, return zero
Otherwise, return r / 10
Here is this approach in code:
Stream<Integer> boxed = random.ints(0, 1000).map(r -> r%10 < 6 ? 0 : r/10).boxed();
You can use IntStream.map and re-use your Random instance to generate a random number from 0 to 9 inclusive, returning zero if it's in the first 60%, else the generated number:
Stream<Integer> boxed = random.ints(0, 100)
        .map(i -> (random.nextInt(10) < 6) ? 0 : i)
        .boxed();
Construct 10 objects. 6 of them return 0 all the time, and the remaining 4 return a random number based on your specs.
Now randomly select one of the objects and call it:
List<Callable<Integer>> callables = new ArrayList<>(10);
for (int i = 0; i < 6; i++) {
    callables.add(() -> 0);
}
Random rand = new Random();
for (int i = 6; i < 10; i++) {
    callables.add(() -> rand.nextInt(100) + 1); // 1-100, per the edited requirement
}
callables.get(rand.nextInt(10)).call();
This is a simple way to implement it. You can optimize it further.
Why not generate an array of those 60% (those are zero values to start with) and just randomly generate the other 40%:
List<Integer> toShuffle = IntStream
        .concat(Arrays.stream(new int[60_000_000]),
                random.ints(40_000_000, 0, Integer.MAX_VALUE))
        .boxed()
        .collect(Collectors.toCollection(() -> new ArrayList<>(100_000_000)));
Collections.shuffle(toShuffle);
If you want exactly 60% of zeros and 40% of strictly positive numbers, you can simply use a modulus check:
Stream<Integer> boxed = IntStream.range(0, 100_000_000)
.map(i -> (i % 10 < 6) ? 0 : r.nextInt(Integer.MAX_VALUE) + 1)
.boxed();
You may want to "shuffle" the stream after that to avoid having 6 zeros in a row every ten positions.
You can control your stream to generate exactly 60% zeroes by counting the generated values and producing a zero whenever needed to stay at the desired 60%.
public class RandomGeneratorSample {
    public static void main(String... strings) {
        Random random = new Random();
        Controll60PercentOfZero controll60 = new Controll60PercentOfZero();
        Stream<Integer> boxed = random.ints(0, 100).map(x -> controll60.nextValueMustBeZero(x) ? 0 : x).boxed();
        boxed.forEach(System.out::println);
    }

    static class Controll60PercentOfZero {
        private long count_zero = 1;
        private long count_not_zero = 1;

        public boolean nextValueMustBeZero(int x) {
            if (x == 0) {
                count_zero++;
            } else {
                count_not_zero++;
            }
            boolean nextValueMustBeZero = (count_zero * 100 / count_not_zero) < 60;
            if (nextValueMustBeZero) {
                count_zero++;
            }
            return nextValueMustBeZero;
        }
    }
}
Interesting question.
Most of the other answers are very relevant, and each one handles it from a different point of view.
I would like to contribute.
To get a Collection (and not a stream) with exactly 60% of 0s that appear "pseudo-randomly", you could:
declare and instantiate a List
then loop on the number of elements that you want to add to it
inside the loop, for 6 iterations out of every 10, add 0 at a random index in the List; otherwise, add a random value at a random index in the List.
The drawback is that about 40% of the time, nextInt() is invoked twice: once to generate the value and once to generate the index at which to insert it in the List.
Here is sample code that generates 1000 elements from 0 to 1000, of which exactly 60% are 0:
Random random = new Random();
List<Integer> values = new ArrayList<>();
for (int i = 0; i < 1000; i++) {
    int nextValue = i % 10 < 6 ? 0 : random.nextInt(1000) + 1;
    int indexInList = random.nextInt(values.size() + 1); // any insertion point, including the end
    values.add(indexInList, nextValue);
}

How can I put a certain amount of integers in random locations in a 2d array?

I'm trying to put a certain amount of integers into an array in random spots without putting them in the same place.
My combine method concatenates two given integers and returns the Int.
Places is an arrayList to keep the locations of the integers already put into the array.
The random method returns a random integer in between the two given ints.
The combine method works and so does the random method, but I'm not sure why it isn't working.
public void fillOne(int b)
{
    for (int x = 0; x < b; x++)
    {
        int kachow = random(0, 5);
        int kachigga = random(0, 5);
        int skrrt = combine(kachow, kachigga);
        if (notInArray(skrrt))
        {
            locations[kachow][kachigga] = 1;
            places.add(skrrt);
        }
    }
}
You haven't really explained what isn't working. But an obvious flaw in your algorithm is that it isn't guaranteed to set b elements to 1. If your algorithm generates a duplicate position then it will set fewer than b elements.
Your logic for storing the combined positions is overly complex. One solution would be to reverse the logic: generate a single integer representing both dimensions then divide it into two when you are setting the location. That makes the checks a lot simpler.
For example, if your array is 5x5:
Set<Integer> positions = new HashSet<>();
while (positions.size() < n)
    positions.add(random.nextInt(25));
for (int p : positions)
    locations[p / 5][p % 5] = 1;
Because positions is a set, it automatically excludes duplicates, which means the code will keep adding random positions until there are n distinct positions in the set.
Even simpler, if you are using Java 8:
random.ints(0, 25)
      .distinct().limit(n)
      .forEach(p -> locations[p / 5][p % 5] = 1);

Allocating N tonnes of food in K rooms with M capacity

I found this problem online:
You have N tonnes of food and K rooms to store it in. Every room has a capacity of M. In how many ways can you distribute the food among the rooms so that every room has at least 1 ton of food?
My approach was to recursively find all possible variations that satisfy the conditions of the problem. I start with an array of size K, initialized to 1. Then I keep adding 1 to every element of the array and recursively check whether the new array satisfies the condition. However, the recursion tree gets too large too quickly and the program takes too long for slightly higher values of N, K and M.
What would be a more efficient algorithm to achieve this task? Are there any optimizations to be done to the existing algorithm implementation?
This is my implementation:
import java.util.Arrays;
import java.util.HashSet;
import java.util.Scanner;

public class Main {
    // keeping track of valid variations, disregarding duplicates
    public static HashSet<String> solutions = new HashSet<>();

    // calculating sum of each variation
    public static int sum(int[] array) {
        int sum = 0;
        for (int i : array) {
            sum += i;
        }
        return sum;
    }

    public static void distributionsRecursive(int food, int rooms, int roomCapacity, int[] variation, int sum) {
        // if all food has been allocated
        if (sum == food) {
            // add solution to solutions
            solutions.add(Arrays.toString(variation));
            return;
        }
        // keep adding 1 to every index in current variation
        for (int i = 0; i < rooms; i++) {
            // create new array for every recursive call
            int[] tempVariation = Arrays.copyOf(variation, variation.length);
            // if element is equal to room capacity, can't add any more in it
            if (tempVariation[i] == roomCapacity) {
                continue;
            } else {
                tempVariation[i]++;
                sum = sum(tempVariation);
                // recursively call function on new variation
                distributionsRecursive(food, rooms, roomCapacity, tempVariation, sum);
            }
        }
        return;
    }

    public static int possibleDistributions(int food, int rooms, int roomCapacity) {
        int[] variation = new int[rooms];
        // start from all 1, keep going till all food is allocated
        Arrays.fill(variation, 1);
        distributionsRecursive(food, rooms, roomCapacity, variation, rooms);
        return solutions.size();
    }

    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        int food = in.nextInt();
        int rooms = in.nextInt();
        int roomCapacity = in.nextInt();
        int total = possibleDistributions(food, rooms, roomCapacity);
        System.out.println(total);
        in.close();
    }
}
Yes, your recursion tree will become large if you do this in a naive manner. Let's say you have 10 tonnes and 3 rooms, and M=2. One valid arrangement is [2,3,5]. But you also have [2,5,3], [3,2,5], [3,5,2], [5,2,3], and [5,3,2]. So for every valid grouping of numbers there are up to K! permutations (fewer when values repeat).
A possibly better way to approach this problem would be to determine how many ways you can make K numbers (minimum M and maximum N) add up to N. Start by making the first number as large as possible, which would be N-(M*(K-1)). In my example, that would be:
10 - 2*(3-1) = 6
Giving the answer [6,2,2].
You can then build an algorithm to adjust the numbers to come up with valid combinations by "moving" values from left to right. In my example, you'd have:
6,2,2
5,3,2
4,4,2
4,3,3
You avoid the seemingly infinite recursion by ensuring that values are decreasing from left to right. For example, in the above you'd never have [3,4,3].
If you really want all valid arrangements, you can generate the permutations for each of the above combinations. I suspect that's not necessary, though.
I think that should be enough to get you started towards a good solution.
One solution would be to compute the result for k rooms from the result for k - 1 rooms.
I've simplified the problem a bit by allowing 0 tonnes to be stored in a room. If we have to store at least 1, we can just subtract this in advance and reduce the capacity of the rooms by 1.
So we define a function calc: (Int,Int) => List[Int] that computes for a number of rooms and a capacity a list of numbers of combinations. The first entry contains the number of combinations we get for storing 0 , the next entry when storing 1 and so on.
We can easily compute this function for one room. So calc(1,m) gives us a list of ones up to the mth element and then it only contains zeros.
For a larger k we can define this function recursively. We just calculate calc(k - 1, m) and then build the new list by summing up prefixes of the old list. E.g. if we want to store 5 tons, we can store all 5 in the first room and 0 in the following rooms, or 4 in the first and 1 in the following and so on. So we have to sum up the combinations for 0 to 5 for the rest of the rooms.
As we have a maximal capacity we might have to leave out some of the combinations, i.e. if the room only has capacity 3 we must not count the combinations for storing 0 and 1 tons in the rest of the rooms.
I've implemented this approach in Scala. I've used streams (i.e. infinite lists), but since you know the maximal number of elements you need, this is not necessary.
The time complexity of the approach should be O(k*n^2)
def calc(rooms: Int, capacity: Int): Stream[Long] =
  if (rooms == 1) {
    Stream.from(0).map(x => if (x <= capacity) 1L else 0L)
  } else {
    val rest = calc(rooms - 1, capacity)
    Stream.from(0).map(x => rest.take(x + 1).drop(Math.max(0, x - capacity)).sum)
  }
You can try it here:
http://goo.gl/tVgflI
(I've replaced the Long by BigInt there to make it work for larger numbers)
First tip, remove distributionsRecursive and don't build up a list of solutions. The list of all solutions is a huge data set. Just produce a count.
That will let you turn possibleDistributions into a recursive function defined in terms of itself. The recursive step will be, possibleDistributions(food, rooms, roomCapacity) = sum from i = 1 to roomCapacity of possibleDistributions(food - i, rooms - 1, roomCapacity).
You will save a lot of memory, but still have your underlying performance problem. However with a pure recursive function you can now fix that with https://en.wikipedia.org/wiki/Memoization.
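A sketch of that memoized recurrence in Java. The class name, the bit-packed memo key, and the infeasibility pruning are my additions, not part of the original answer:

```java
import java.util.HashMap;
import java.util.Map;

public class FoodCount {
    public static long count(int food, int rooms, int cap) {
        return count(food, rooms, cap, new HashMap<>());
    }

    // possibleDistributions(food, rooms) = sum over i = 1..cap of
    // possibleDistributions(food - i, rooms - 1), memoized on (food, rooms).
    private static long count(int food, int rooms, int cap, Map<Long, Long> memo) {
        if (rooms == 0) return food == 0 ? 1 : 0;
        // quick infeasibility checks: every room needs 1..cap tonnes
        if (food < rooms || food > (long) rooms * cap) return 0;
        long key = (long) food << 20 | rooms;  // cap is fixed per top-level call
        Long cached = memo.get(key);
        if (cached != null) return cached;
        long total = 0;
        for (int i = 1; i <= Math.min(cap, food); i++) {
            total += count(food - i, rooms - 1, cap, memo);
        }
        memo.put(key, total);
        return total;
    }

    public static void main(String[] args) {
        // 10 tonnes, 3 rooms, capacity 5 per room
        System.out.println(count(10, 3, 5)); // prints 18
    }
}
```

For example, with 10 tonnes, 3 rooms and capacity 5 this counts the 18 ordered distributions, matching what the brute-force enumeration in the question would produce.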

How to efficiently generate a set of unique random numbers with a predefined distribution?

I have a map of items with some probability distribution:
Map<SingleObjectiveItem, Double> itemsDistribution;
Given a certain m I have to generate a Set of m elements sampled from the above distribution.
As of now I was using the naive way of doing it:
while (mySet.size() < m)
    mySet.add(getNextSample(itemsDistribution));
The getNextSample(...) method fetches an object from the distribution as per its probability. Now, as m increases the performance severely suffers. For m = 500 and itemsDistribution.size() = 1000 elements, there is too much thrashing and the function remains in the while loop for too long. Generate 1000 such sets and you have an application that crawls.
Is there a more efficient way to generate a unique set of random numbers with a "predefined" distribution? Most collection shuffling techniques and the like are uniformly random. What would be a good way to address this?
UPDATE: The loop will call getNextSample(...) "at least" 1 + 2 + 3 + ... + m = m(m+1)/2 times. That is, on the first run we'll definitely get a sample for the set. On the 2nd iteration, it may be called at least twice, and so on. If getNextSample is sequential in nature, i.e., goes through the entire cumulative distribution to find the sample, then the run time complexity of the loop is at least: n*m(m+1)/2, where 'n' is the number of elements in the distribution. If m = cn; 0<c<=1, then the loop is at least Ω(n^3). And that too is the lower bound!
If we replace sequential search by binary search, the complexity would be at least Ω(n^2 log n). Efficient, but perhaps not by a large margin.
Also, removing from the distribution is not possible since I call the above loop k times, to generate k such sets. These sets are part of a randomized 'schedule' of items. Hence a 'set' of items.
Start out by generating a number of random points in two dimensions.
Then apply your distribution.
Now find all entries within the distribution and pick the x coordinates, and you have your random numbers with the requested distribution.
The problem is unlikely to be the loop you show:
Let n be the size of the distribution, and I be the number of invocations to getNextSample. We have I = sum_i(C_i), where C_i is the number of invocations to getNextSample while the set has size i. To find E[C_i], observe that C_i is the inter-arrival time of a Poisson process with λ = 1 - i / n, and therefore exponentially distributed with λ. Therefore, E[C_i] = 1 / λ = 1 / (1 - i / n) <= 1 / (1 - m / n). Therefore, E[I] < m / (1 - m / n).
That is, sampling a set of size m = n/2 will take, on average, less than 2m = n invocations of getNextSample. If that is "slow" and "crawls", it is likely because getNextSample is slow. This is actually unsurprising, given the unsuitable way the distribution is passed to the method (because the method will, of necessity, have to iterate over the entire distribution to find a random element).
The following should be faster (if m < 0.8 n)
class Distribution<T> {
    private double[] cummulativeWeight;
    private T[] item;
    private double totalWeight;

    Distribution(Map<T, Double> probabilityMap) {
        int i = 0;
        cummulativeWeight = new double[probabilityMap.size()];
        item = (T[]) new Object[probabilityMap.size()];
        for (Map.Entry<T, Double> entry : probabilityMap.entrySet()) {
            item[i] = entry.getKey();
            totalWeight += entry.getValue();
            cummulativeWeight[i] = totalWeight;
            i++;
        }
    }

    T randomItem() {
        double weight = Math.random() * totalWeight;
        int index = Arrays.binarySearch(cummulativeWeight, weight);
        if (index < 0) {
            index = -index - 1;
        }
        return item[index];
    }

    Set<T> randomSubset(int size) {
        Set<T> set = new HashSet<>();
        while (set.size() < size) {
            set.add(randomItem());
        }
        return set;
    }
}

public class Test {
    public static void main(String[] args) {
        int max = 1_000_000;
        HashMap<Integer, Double> probabilities = new HashMap<>();
        for (int i = 0; i < max; i++) {
            probabilities.put(i, (double) i);
        }
        Distribution<Integer> d = new Distribution<>(probabilities);
        Set<Integer> set = d.randomSubset(max / 2);
        //System.out.println(set);
    }
}
The expected runtime is O(m / (1 - m / n) * log n). On my computer, a subset of size 500_000 of a set of 1_000_000 is computed in about 3 seconds.
As we can see, the expected runtime approaches infinity as m approaches n. If that is a problem (i.e. m > 0.9 n), the following more complex approach should work better:
Set<T> randomSubset(int size) {
    Set<T> set = new HashSet<>();
    while (set.size() < size) {
        T randomItem = randomItem();
        remove(randomItem); // removes the item from the distribution
        set.add(randomItem);
    }
    return set;
}
To efficiently implement remove requires a different representation for the distribution, for instance a binary tree where each node stores the total weight of the subtree whose root it is.
But that is rather complicated, so I wouldn't go that route if m is known to be significantly smaller than n.
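For reference, a sketch of such a representation in Java, using a Fenwick (binary indexed) tree over the weights instead of a plain binary tree; both drawing and removal take O(log n). The class and method names are illustrative:

```java
import java.util.Random;

public class FenwickSampler {
    private final long[] tree;     // 1-based Fenwick tree over the weights
    private final long[] weight;   // current weight of each item
    private final int n;
    private long totalWeight;
    private final Random rand = new Random();

    public FenwickSampler(long[] weights) {
        n = weights.length;
        tree = new long[n + 1];
        weight = weights.clone();
        for (int i = 0; i < n; i++) update(i, weights[i]);
    }

    // add delta to the weight of item index (0-based)
    private void update(int index, long delta) {
        totalWeight += delta;
        for (int i = index + 1; i <= n; i += i & -i) tree[i] += delta;
    }

    /** Picks an index with probability proportional to its weight, then removes it. */
    public int drawAndRemove() {
        if (totalWeight == 0) throw new IllegalStateException("no items left");
        long target = (long) (rand.nextDouble() * totalWeight);  // in [0, totalWeight)
        // binary descent: find the first item whose cumulative weight exceeds target
        int pos = 0;
        for (int step = Integer.highestOneBit(n); step > 0; step >>= 1) {
            if (pos + step <= n && tree[pos + step] <= target) {
                target -= tree[pos + step];
                pos += step;
            }
        }
        // pos is now the 0-based index of the chosen item
        update(pos, -weight[pos]);
        weight[pos] = 0;
        return pos;
    }
}
```

Zero-weight items are never drawn, and each drawn item's weight is zeroed, so repeated calls sample without replacement.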
If you are not concerning with randomness properties too much then I do it like this:
create buffer for pseudo-random numbers
double buff[MAX]; // [edit1] double pseudo-random numbers
MAX is the size; it should be big enough, 1024*128 for example
the type can be any (float, int, DWORD...)
fill buffer with numbers
you have range of numbers x = < x0,x1 > and probability function probability(x) defined by your probability distribution so do this:
for (i=0, x=x0; x <= x1; x += stepx)
    for (j=0, n=probability(x)*MAX, q=0.1*stepx/n; j < n; j++, i++) // [edit1] unique pseudo-random numbers
        buff[i] = x + (double(i)*q); // [edit1] ...
The stepx is your accuracy for items (for integral types = 1). Now the buff[] array has the same distribution as you need, but it is not pseudo-random. You should also add a check that i does not reach MAX, to avoid array overruns; note that at the end the real size of buff[] is i (it can be less than MAX due to rounding).
shuffle buff[]
do just a few loops of swapping buff[i] and buff[j], where i is the loop variable and j is pseudo-random in <0, MAX)
write your pseudo-random function
It just returns a number from the buffer: the first call returns buff[0], the second buff[1], and so on. For standard generators, when you hit the end of buff[] you shuffle buff[] again and start from buff[0]. But as you need unique numbers, you cannot reach the end of the buffer, so set MAX big enough for your task, otherwise uniqueness will not be assured.
[Notes]
MAX should be big enough to store the whole distribution you want. If it is not big enough then items with low probability can be missing completely.
[edit1] - tweaked answer a little to match the question needs (pointed by meriton thanks)
PS. complexity of initialization is O(N) and for get number is O(1).
You should implement your own random number generator (using a Monte Carlo method or any good uniform generator like the Mersenne Twister), based on the inversion method.
For example, the exponential law: generate a uniform random number u in [0,1); then your random variable of the exponential law would be ln(1-u)/(-lambda), lambda being the exponential law parameter and ln the natural logarithm.
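As a sketch of the inversion method for the exponential law described above (method names are illustrative):

```java
import java.util.Random;

public class InversionSampling {
    // Inversion method for the exponential distribution:
    // if U ~ Uniform[0,1), then -ln(1 - U) / lambda ~ Exp(lambda),
    // because x = -ln(1 - u) / lambda inverts the CDF F(x) = 1 - e^(-lambda*x).
    static double nextExponential(Random rand, double lambda) {
        double u = rand.nextDouble();            // uniform in [0, 1)
        return -Math.log(1.0 - u) / lambda;
    }

    public static void main(String[] args) {
        Random rand = new Random();
        double lambda = 2.0;
        // the sample mean should be close to 1 / lambda = 0.5
        double sum = 0;
        int n = 100_000;
        for (int i = 0; i < n; i++) sum += nextExponential(rand, lambda);
        System.out.println(sum / n);
    }
}
```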
Hope it'll help ;).
I think you have two problems:
1. Your itemDistribution doesn't know you need a set, so when the set you are building gets large, you will pick a lot of elements that are already in the set. If you start with the set full and remove elements, you will run into the same problem for very small sets. Is there a reason why you don't remove an element from the itemDistribution after you have picked it? Then you couldn't pick the same element twice.
2. The choice of data structure for itemDistribution looks suspicious to me. You want the getNextSample operation to be fast, but doesn't the map from values to probabilities force you to iterate through large parts of the map for each getNextSample? I'm no good at statistics, but couldn't you represent the itemDistribution the other way around: as a map from probability (or the sum of all smaller probabilities plus the item's own probability) to an element of the set?
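The reversed representation suggested here, a map keyed by the running sum of probabilities, can be sketched in Java with a TreeMap, which makes each sample an O(log n) lookup instead of a linear scan (the class and method names are illustrative):

```java
import java.util.NavigableMap;
import java.util.Random;
import java.util.TreeMap;

public class CumulativeSampler<T> {
    // key = running sum of weights up to and including this item
    private final NavigableMap<Double, T> cumulative = new TreeMap<>();
    private double total = 0;
    private final Random rng;

    public CumulativeSampler(Random rng) { this.rng = rng; }

    public void add(T item, double weight) {
        total += weight;
        cumulative.put(total, item);
    }

    // O(log n): find the first key strictly greater than r
    public T sample() {
        double r = rng.nextDouble() * total;
        return cumulative.higherEntry(r).getValue();
    }
}
```

Removing a picked item from such a map is more work (the keys after it shift), but for repeated sampling with replacement this layout is fast.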
Your performance depends on how your getNextSample function works. If you have to iterate over all probabilities when you pick the next item, it might be slow.
A good way to pick several unique random items from a list is to first shuffle the list and then pop items off it. You can shuffle the list once with the given distribution. From then on, picking your m items is just popping the list.
Here's an implementation of a probabilistic shuffle:
List<Item> prob_shuffle(Map<Item, int> dist)
{
    int n = dist.length;
    List<Item> a = dist.keys();
    int psum = 0;
    int i, j;
    for (i in dist) psum += dist[i];
    for (i = 0; i < n; i++) {
        int ip = rand(psum); // 0 <= ip < psum
        int jp = 0;
        for (j = i; j < n; j++) {
            jp += dist[a[j]];
            if (ip < jp) break;
        }
        psum -= dist[a[j]];
        Item tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
    return a;
}
This is not Java but pseudocode, modeled after an implementation in C, so please take it with a grain of salt. The idea is to append items to the shuffled area by continuously picking items from the unshuffled area.
Here, I used integer probabilities. (The probabilities don't have to add up to a special value; it's just "bigger is better".) You can use floating-point numbers, but because of inaccuracies you might end up going beyond the array when picking an item; you should use item n - 1 then. If you add that safety net, you could even have items with zero probability that always get picked last.
There might be a method to speed up the picking loop, but I don't really see how. The swapping renders any precalculations useless.
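For readers who prefer real Java, here is one possible rendering of the pseudocode above (a sketch, not a verified port of the original C; it assumes all weights are positive):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class ProbShuffle {
    // Repeatedly pick an item from the unshuffled tail with probability
    // proportional to its weight and swap it to the front, exactly as in
    // the pseudocode above.
    static <T> List<T> probShuffle(Map<T, Integer> dist, Random rng) {
        List<T> a = new ArrayList<>(dist.keySet());
        int psum = 0;
        for (int w : dist.values()) psum += w;
        for (int i = 0; i < a.size(); i++) {
            int ip = rng.nextInt(psum); // 0 <= ip < psum
            int jp = 0;
            int j = i;
            for (; j < a.size(); j++) {
                jp += dist.get(a.get(j));
                if (ip < jp) break;
            }
            psum -= dist.get(a.get(j)); // remove the picked weight from the pool
            Collections.swap(a, i, j);
        }
        return a;
    }
}
```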
Accumulate your probabilities in a table:

          Probability
Item    Actual  Accumulated
Item1   0.10    0.10
Item2   0.30    0.40
Item3   0.15    0.55
Item4   0.20    0.75
Item5   0.25    1.00
Generate a random number between 0.0 and 1.0 and do a binary search for the first item whose accumulated sum is greater than your generated number. That item will have been chosen with the desired probability.
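A sketch of that lookup in Java, using Arrays.binarySearch over the accumulated column (the class and method names are illustrative; r would come from Math.random() or Random.nextDouble()):

```java
import java.util.Arrays;

public class CumulativePick {
    // cumulative[i] is the accumulated column from the table above;
    // r is a random number in [0.0, 1.0)
    static int pick(double[] cumulative, double r) {
        int idx = Arrays.binarySearch(cumulative, r);
        // On a miss, binarySearch returns -(insertionPoint) - 1, and the
        // insertion point is exactly the first entry greater than r.
        // On an exact hit, the first entry greater than r is the next one.
        return idx >= 0 ? idx + 1 : -idx - 1;
    }
}
```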
Ebbe's method is called rejection sampling.
I sometimes use a simple method, using an inverse cumulative distribution function: a function that maps a number x between 0 and 1 to a value drawn from the distribution.
Then you just generate a uniformly distributed random number between 0 and 1, and apply the function to it.
That function is also called the "quantile function".
For example, suppose you want to generate a normally distributed random number.
Its cumulative distribution function is called Phi.
The inverse of that is called probit.
There are many ways to generate normal variates, and this is just one example.
You can easily construct an approximate cumulative distribution function for any univariate distribution you like, in the form of a table.
Then you can just invert it by table-lookup and interpolation.
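Such a table-based quantile function with linear interpolation might be sketched like this (the class name and table values are illustrative):

```java
public class TableQuantile {
    // xs[i]: sample points of the distribution; cdf[i]: P(X <= xs[i]).
    // Both arrays must be increasing, and cdf should span roughly 0..1.
    static double quantile(double[] xs, double[] cdf, double u) {
        if (u <= cdf[0]) return xs[0];
        for (int i = 1; i < cdf.length; i++) {
            if (u <= cdf[i]) {
                // linear interpolation between table entries i-1 and i
                double t = (u - cdf[i - 1]) / (cdf[i] - cdf[i - 1]);
                return xs[i - 1] + t * (xs[i] - xs[i - 1]);
            }
        }
        return xs[xs.length - 1];
    }
}
```

Feeding it uniform random numbers in [0, 1) then yields approximate samples of the tabulated distribution.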

Random permutation of integers using a random number generator

This is my homework assignment:
Random r = new Random();

public int get100RandomNumber() {
    return 1 + r.nextInt(100);
}
You are given a pre-defined function named getrand100() (above) which
returns an integer which is one random number from 1-100. You can call
this function as many times as you want but beware that this function
is quite resource intensive. You cannot use any other random
generator. You cannot change the definition of getrand100().
Output: Print numbers 1-20 in random order. (Not 20 random numbers)
What I have tried:
public class MyClass {
    static Random r = new Random();
    static HashSet<Integer> s;

    public static void main(String args[]) {
        myMethod();
        System.out.println(s);
    }

    public static void myMethod() {
        boolean b = false;
        s = new HashSet<Integer>();
        int i = getRand100();
        if (i >= 20)
            i = i % 20;
        int j = 0;
        int k, l;
        while (s.size() <= 20)
        {
            System.out.println("occurence no" + ++j);
            System.out.println("occurence value" + i);
            b = s.add(i);
            while (!b) {
                k = ++i;
                if (k <= 20)
                    b = s.add(k);
                if (b == true)
                    break;
                if (!b) {
                    l = --i;
                    if (i >= 1 && i <= 20)
                        b = s.add(l);
                    if (b == true)
                        break;
                }
            }
        }
        System.out.println(s);
    }

    public static int getRand100()
    {
        return r.nextInt(100) + 1;
    }
}
Thanks for any help!
I believe you are asking how to use a random number generator to print out the numbers 1 to 20 in a random order. This is also known as a "random permutation". The Fisher-Yates shuffle is such an algorithm.
However, to implement the algorithm, you first of all need a random number generator that can pick one out of N items with equal probability where N ranges from 2 up to the size of the set to shuffle, while you only have one that can pick one out of 100 items with equal probability. That can easily be obtained by a combination of modulo arithmetic and "rerolling".
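Putting those two pieces together, a sketch: an unbiased 0..n-1 generator built from getRand100() by rerolling the biased tail, driving a Fisher-Yates shuffle (the helper names uniform and shuffled20 are invented for illustration):

```java
import java.util.Random;

public class ShufflePrint {
    static final Random r = new Random();

    // the given resource-intensive generator: one number from 1-100
    static int getRand100() { return 1 + r.nextInt(100); }

    // Uniform value in 0..n-1 derived from getRand100() by rejecting
    // ("rerolling") draws above the largest multiple of n, which would
    // otherwise bias the modulo.
    static int uniform(int n) {
        int limit = 100 - (100 % n); // largest multiple of n <= 100
        int v;
        do { v = getRand100() - 1; } while (v >= limit);
        return v % n;
    }

    // Fisher-Yates shuffle of the numbers 1..20
    static int[] shuffled20() {
        int[] nums = new int[20];
        for (int i = 0; i < 20; i++) nums[i] = i + 1;
        for (int i = 19; i > 0; i--) {
            int j = uniform(i + 1);
            int tmp = nums[i]; nums[i] = nums[j]; nums[j] = tmp;
        }
        return nums;
    }

    public static void main(String[] args) {
        for (int n : shuffled20()) System.out.println(n);
    }
}
```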
Assuming you are allowed to use the ArrayList class, I'd recommend filling a list with the numbers you want (1 to 20 in this case), then randomly picking numbers from the list and removing them. Using getRand100() % theList.size() should be sufficiently random for your purposes, and you only need to call it 19 times. When only one element is left, there's no need to "randomly" pick it from the list anymore. ;-)
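That suggestion might look like this as a sketch (randomOrder is an invented name; as the answer concedes, the modulo is slightly biased when 100 is not a multiple of the list size):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class PickAndRemove {
    static final Random r = new Random();

    // the given generator: one number from 1-100
    static int getRand100() { return 1 + r.nextInt(100); }

    // Produce 1-20 in random order using only 19 calls to getRand100()
    static List<Integer> randomOrder() {
        List<Integer> pool = new ArrayList<>();
        for (int i = 1; i <= 20; i++) pool.add(i);
        List<Integer> out = new ArrayList<>();
        while (pool.size() > 1) {
            // pick a pseudo-random index and remove that element
            out.add(pool.remove((getRand100() - 1) % pool.size()));
        }
        out.add(pool.get(0)); // the last element needs no random draw
        return out;
    }

    public static void main(String[] args) {
        randomOrder().forEach(System.out::println);
    }
}
```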
I believe that I've come up with a way to convert any number between 1 and n! (assuming the number of items is known) to a unique permutation of n items.
In essence, this allows for an "immediate" randomization of an entire deck without having to use any shuffling algorithms. For now, it runs in O(n^2) and requires using BigInteger packages (i.e., in Java or JavaScript), but I'm looking for ways to optimize the runtime (although, honestly, 2500 iterations is nothing these days anyway). Regardless, when given at least 226 bits of valid, random data, the function is able to generate a shuffled array of 52 integers in under 10 ms.
The method is similar to that used to convert a decimal number to binary (continually dividing by 2, etc). I'm happy to provide my code upon request; I find it interesting that I haven't come across it before.
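The answer doesn't show its code, but one common way to realize such a mapping is the factorial number system (Lehmer code); here is a hedged sketch along those lines, assuming 0-based permutation indices k in [0, n!) rather than the answer's 1 to n!:

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

public class IndexToPermutation {
    // Maps k in [0, n!) to the k-th permutation of 0..n-1 by repeated
    // division, similar to converting a decimal number to binary.
    static int[] permutation(BigInteger k, int n) {
        List<Integer> pool = new ArrayList<>();
        for (int i = 0; i < n; i++) pool.add(i);
        // factorial starts out as (n-1)!
        BigInteger factorial = BigInteger.ONE;
        for (int i = 2; i < n; i++) factorial = factorial.multiply(BigInteger.valueOf(i));
        int[] out = new int[n];
        for (int i = 0; i < n; i++) {
            BigInteger[] qr = k.divideAndRemainder(factorial);
            out[i] = pool.remove(qr[0].intValue()); // O(n) removal -> O(n^2) total
            k = qr[1];
            if (n - 1 - i > 1) factorial = factorial.divide(BigInteger.valueOf(n - 1 - i));
        }
        return out;
    }
}
```

With 226 bits of randomness you can cover all 52! orderings of a deck, which matches the figure quoted above.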