Fastest way to pick m distinct objects from a given pattern? - java

Given an array of n elements, where every element is in the range 2 to 10^5. Suppose we paint the elements of the array such that among every m (m <= n) consecutive elements no two have the same colour. How do I pick m distinct elements (not necessarily consecutive) such that no two of the chosen elements have the same colour and the difference between the largest and the smallest chosen element is as small as possible?
Ex: for n = 4, A = {10 20 15 28}, m = 2, we can paint the elements as R G R G or G R G R. In both cases, if we pick any m consecutive elements, no two elements have the same colour: R G, G R or R G. There are 4 ways to pick 2 elements: 10 20, 10 28, 20 15 or 15 28. Of these, 20 - 15 = 5 is the best answer.
** duplicates allowed in array
My approach is to first put all elements of the same colour into separate arrays. In the example above I get [[10,15],[20,28]]: 10 and 15 are R, 20 and 28 are G. Then I use recursion on every element of R and try all combinations with the other colours.
void recurse(List<List<Integer>> bs, int max, int min, int depth) {
if(depth == bs.size()) {
int diference = max - min;
// compare diff with old res here
return;
}
for(int i=0;i<bs.get(depth).size();++i) {
int newMax = Math.max(max,bs.get(depth).get(i));
int newMin = Math.min(min,bs.get(depth).get(i));
recurse(bs, newMax, newMin, depth+1);
}
}
This is not wrong and does produce the correct result, but I'm looking for a faster algorithm. The expected time complexity is O(n); in other words, I want to pass every test case in 1 second. Note that 2 <= m <= n <= 10^5.

We can solve this in O(n log n) time and O(n) space. First notice that any assigned colour must be a distance of m elements from its neighbours of the same colour or we would invalidate the constraints.
Separate each such list of elements of the same colour (defined only by their distance from each other) into its own list and sort it.
Now merge all the m sorted lists into one sorted list where each value is also paired with a label to the colour of the list it came from (the merged list could be of tuples, for example).
(Alternatively, we could first create the entire labeled list and just sort that.)
Iterate over the sorted, labeled list with a sliding window of size m, allowing only one element of each colour to stay in the window at any one time. (We could use a hash map or simple array to track the window. Remember that the window in this case is of unique labels, not a consecutive subarray of the labeled list.) Update the smallest range existing in the window during the iteration to determine the answer.
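The steps above can be sketched in Java roughly as follows. This is only a minimal sketch: the class and method names are made up for illustration, and it assumes the element at index i gets colour i % m (which is what "same colour every m positions" amounts to when exactly m colours are used).

```java
import java.util.Arrays;

class MinColourRange {
    // Assumption: the element at index i has colour i % m.
    static int minRange(int[] a, int m) {
        int n = a.length;
        // Sort the indices by value so each value keeps its colour label.
        Integer[] idx = new Integer[n];
        for (int i = 0; i < n; i++) idx[i] = i;
        Arrays.sort(idx, (p, q) -> Integer.compare(a[p], a[q]));

        int[] count = new int[m]; // how often each colour occurs in the window
        int distinct = 0;         // number of colours present in the window
        int best = Integer.MAX_VALUE;
        int tail = 0;
        for (int head = 0; head < n; head++) {
            if (count[idx[head] % m]++ == 0) distinct++;
            // While every colour is present, record the range and shrink from the tail.
            while (distinct == m) {
                best = Math.min(best, a[idx[head]] - a[idx[tail]]);
                if (--count[idx[tail] % m] == 0) distinct--;
                tail++;
            }
        }
        return best;
    }
}
```

For the question's example, minRange(new int[]{10, 20, 15, 28}, 2) gives 5. The sort dominates, so the whole thing is O(n log n).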

I think you could sort the numbers (while keeping track of their colors), and then walk through the result from its start: first grow a candidate until all the colors are present (so the head covers a unique color in the sublist), then shrink it so that repeated colors are thrown away from the tail (so the tail points at a unique color too), then check whether it is the best candidate so far, then throw away the tail (so that color will be missing), and proceed again with the head:
import java.util.Arrays;
import java.util.List;
import java.util.Random;
public class NewClass {
public static void doThing(int nums[],int m){
int n=nums.length;
ColorNumber l[]=new ColorNumber[n];
for(int i=0;i<n;i++)
l[i]=new ColorNumber(nums[i], i%m);
System.out.println(Arrays.asList(l));
Arrays.sort(l, null);
List printlist=Arrays.asList(l);
System.out.println(printlist);
int present[]=new int[m];
int head=0,tail=0;
int minhead=0,mintail=0,mindiff=Integer.MAX_VALUE;
while(head<n){
System.out.println("try growing");
int i=0;
while(i<m && head<n){
while(present[i]==0 && head<n){
present[l[head].color]++;
head++;
}
//if(present[i]>0)i++; // the bug
while(i<m && present[i]>0)i++; // the fix
}
if(i==m){
System.out.println(printlist.subList(tail, head));
System.out.println("try shrinking");
while(present[l[tail].color]>1){
present[l[tail].color]--;
tail++;
}
int diff=l[head-1].number-l[tail].number;
System.out.println(printlist.subList(tail, head)+" diff: "+diff);
if(diff<mindiff){minhead=head;mintail=tail;mindiff=diff;}
present[l[tail].color]--;
tail++;
}
}
System.out.println("min: "+mindiff+", "+printlist.subList(mintail, minhead));
}
static class ColorNumber implements Comparable<ColorNumber>{
final int number;
final int color;
public ColorNumber(int number, int color) {
this.number = number;
this.color = color;
}
@Override
public int compareTo(ColorNumber o) {
return Integer.compare(number, o.number);
}
@Override
public String toString() {
return number+"("+color+")";
}
}
public static void main(String args[]){
Random r=new Random(0);
int nums[]=new int[10];
for(int i=0;i<nums.length;i++)
nums[i]=r.nextInt(100);
doThing(nums, 3);
System.out.println("---");
doThing(new int[]{10,20,15,28},2);
System.out.println("---");
doThing(new int[] {2,1},2); // test case for bug
}
}
The output (one 3-color constant random sequence - because a seed is provided -, your 2-color example and the test case for the bug you fixed):
[60(0), 48(1), 29(2), 47(0), 15(1), 53(2), 91(0), 61(1), 19(2), 54(0)]
[15(1), 19(2), 29(2), 47(0), 48(1), 53(2), 54(0), 60(0), 61(1), 91(0)]
try growing
[15(1), 19(2), 29(2), 47(0)]
try shrinking
[15(1), 19(2), 29(2), 47(0)] diff: 32
try growing
[19(2), 29(2), 47(0), 48(1)]
try shrinking
[29(2), 47(0), 48(1)] diff: 19
try growing
[47(0), 48(1), 53(2)]
try shrinking
[47(0), 48(1), 53(2)] diff: 6
try growing
[48(1), 53(2), 54(0)]
try shrinking
[48(1), 53(2), 54(0)] diff: 6
try growing
[53(2), 54(0), 60(0), 61(1)]
try shrinking
[53(2), 54(0), 60(0), 61(1)] diff: 8
try growing
min: 6, [47(0), 48(1), 53(2)]
---
[10(0), 20(1), 15(0), 28(1)]
[10(0), 15(0), 20(1), 28(1)]
try growing
[10(0), 15(0), 20(1)]
try shrinking
[15(0), 20(1)] diff: 5
try growing
min: 5, [15(0), 20(1)]
---
[2(0), 1(1)]
[1(1), 2(0)]
try growing
[1(1), 2(0)]
try shrinking
[1(1), 2(0)] diff: 1
min: 1, [1(1), 2(0)]
In the output only the color of the lowest and the highest value is going to be unique, the in-between elements can be picked at will as they do not contribute to the difference (this code outputs them all like in case of the last attempt in the first sequence ([53(2), 54(0), 60(0), 61(1)])). If a specific output is needed, some Set could be used, or a for loop over the colors, printing only one (the first one it encounters) element for each color (and skipping the rest with a simple break).

Related

Finding mode for every window of size k in an array

Given an array of size n and k, how do you find the mode for every contiguous subarray of size k?
For example
arr = 1 2 2 6 6 1 1 7
k = 3
ans = 2 2 6 6 1 1
I was thinking of having a hashmap where the key is the number and the value is its frequency, a treemap where the key is the frequency and the value is the number, and a queue to remove the first element when the size > k. Here the time complexity is O(n log(n)). Can we do this in O(1)?
This can be done in O(n) time
I was intrigued by this problem in part because, as I indicated in the comments, I felt certain that it could be done in O(n) time. I had some time over this past weekend, so I wrote up my solution to this problem.
Approach: Mode Frequencies
The basic concept is this: the mode of a collection of numbers is the number(s) which occur with the highest frequency within that set.
This means that whenever you add a number to the collection, if the number added was not already one of the mode-values then the frequency of the mode would not change. So with the collection (8 9 9) the mode-values are {9} and the mode-frequency is 2. If you add say a 5 to this collection ((8 9 9 5)) neither the mode-frequency nor the mode-values change. If instead you add an 8 to the collection ((8 9 9 8)) then the mode-values change to {9, 8} but the mode-frequency is still unchanged at 2. Finally, if you instead added a 9 to the collection ((8 9 9 9)), now the mode-frequency goes up by one.
Thus in all cases when you add a single number to the collection, the mode-frequency is either unchanged or goes up by only one. Likewise, when you remove a single number from the collection, the mode-frequency is either unchanged or goes down by at most one. So all incremental changes to the collection result in only two possible new mode-frequencies. This means that if we had all of the distinct numbers of the collection indexed by their frequencies, then we could always find the new Mode in a constant amount of time (i.e., O(1)).
To accomplish this I use a custom data structure ("ModeTracker") that has a multiset ("numFreqs") to store the distinct numbers of the collection along with their current frequency in the collection. This is implemented with a Dictionary<int, int> (I think that this is a Map in Java). Thus given a number, we can use this to find its current frequency within the collection in O(1).
This data structure also has an array of sets ("freqNums") that given a specific frequency will return all of the numbers that have that frequency in the current collection.
I have included the code for this data structure class below. Note that this is implemented in C# as I do not know Java well enough to implement it there, but I believe that a Java programmer should have no trouble translating it.
(pseudo)Code:
class ModeTracker
{
HashSet<int>[] freqNums; //numbers at each frequency
Dictionary<int, int> numFreqs; //frequencies for each number
int modeFreq_ = 0; //frequency of the current mode
public ModeTracker(int maxFrequency)
{
freqNums = new HashSet<int>[maxFrequency + 2];
// populate every slot, including maxFrequency + 1, which addNumber() reaches
// because the caller adds the new number before removing the old one
for (int i=0; i<freqNums.Length; i++)
{
freqNums[i] = new HashSet<int>();
}
numFreqs = new Dictionary<int, int>();
}
public int Mode { get { return freqNums[modeFreq_].First(); } }
public void addNumber(int n)
{
int newFreq = adjustNumberCount(n, 1);
// new mode-frequency is one greater or the same
if (freqNums[modeFreq_+1].Count > 0) modeFreq_++;
}
public void removeNumber(int n)
{
int newFreq = adjustNumberCount(n, -1);
// new mode-frequency is the same or one less
if (freqNums[modeFreq_].Count == 0) modeFreq_--;
}
int adjustNumberCount(int num, int adjust)
{
// make sure we already have this number
if (!numFreqs.ContainsKey(num))
{
// add entries for it
numFreqs.Add(num, 0);
freqNums[0].Add(num);
}
// now adjust this number's frequency
int oldFreq = numFreqs[num];
int newFreq = oldFreq + adjust;
numFreqs[num] = newFreq;
// remove the old freq for this number and add the new one
freqNums[oldFreq].Remove(num);
freqNums[newFreq].Add(num);
return newFreq;
}
}
Also, below is a small C# function that demonstrates how to use this datastructure to solve the problem originally posed in the question.
int[] ModesOfSubarrays(int[] arr, int subLen)
{
ModeTracker tracker = new ModeTracker(subLen);
int[] modes = new int[arr.Length - subLen + 1];
for (int i=0; i < arr.Length; i++)
{
//add every number into the tracker
tracker.addNumber(arr[i]);
if (i >= subLen)
{
// remove the number that just rotated out of the window
tracker.removeNumber(arr[i-subLen]);
}
if (i >= subLen - 1)
{
// add the new Mode to the output
modes[i - subLen + 1] = tracker.Mode;
}
}
return modes;
}
I have tested this and it does appear to work correctly for all of my tests.
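For Java readers, the same idea might translate roughly as below. This is a sketch rather than a line-by-line port: the class name SlidingMode and the helper move() are made up here, the tracker is folded into one method, and it assumes 1 <= k <= arr.length. Note that the frequency table needs k + 2 slots, because a number is added before the old one is removed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class SlidingMode {
    // Returns one mode for every window of k consecutive elements.
    static int[] modes(int[] arr, int k) {
        // freqNums.get(f) holds the numbers that currently occur f times.
        List<Set<Integer>> freqNums = new ArrayList<>();
        for (int f = 0; f <= k + 1; f++) freqNums.add(new LinkedHashSet<>());
        Map<Integer, Integer> numFreqs = new HashMap<>();
        int modeFreq = 0;
        int[] out = new int[arr.length - k + 1];
        for (int i = 0; i < arr.length; i++) {
            move(arr[i], +1, numFreqs, freqNums);
            // after an add, the mode frequency is unchanged or one higher
            if (!freqNums.get(modeFreq + 1).isEmpty()) modeFreq++;
            if (i >= k) {
                move(arr[i - k], -1, numFreqs, freqNums);
                // after a remove, the mode frequency is unchanged or one lower
                if (freqNums.get(modeFreq).isEmpty()) modeFreq--;
            }
            if (i >= k - 1) out[i - k + 1] = freqNums.get(modeFreq).iterator().next();
        }
        return out;
    }

    // Shifts num from its old frequency bucket to the new one.
    private static void move(int num, int delta, Map<Integer, Integer> numFreqs,
                             List<Set<Integer>> freqNums) {
        int oldFreq = numFreqs.getOrDefault(num, 0);
        int newFreq = oldFreq + delta;
        numFreqs.put(num, newFreq);
        freqNums.get(oldFreq).remove(num);
        freqNums.get(newFreq).add(num);
    }
}
```

With the question's input (1 2 2 6 6 1 1 7, k = 3) this yields 2 2 6 6 1 1. When several numbers tie for the mode, which one is reported depends on the set's iteration order.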
Complexity Analysis
Going through the individual steps of the `ModesOfSubarrays()` function:
1. The new ModeTracker object is created in O(n) time or less.
2. The modes[] array is created in O(n) time.
3. The for(..) loop runs n times:
3a. the addNumber() function takes O(1) time
3b. the removeNumber() function takes O(1) time
3c. getting the new Mode takes O(1) time
So the total time is O(n) + O(n) + n*(O(1) + O(1) + O(1)) = O(n)
Please let me know of any questions that you might have about this code.

Allocating N tonnes of food in K rooms with M capacity

I found this problem online:
You have N tonnes of food and K rooms to store them into. Every room has a capacity of M. In how many ways can you distribute the food in the rooms, so that every room has at least 1 ton of food.
My approach was to recursively find all possible variations that satisfy the conditions of the problem. I start with an array of size K, initialized to 1. Then I keep adding 1 to every element of the array and recursively check whether the new array satisfies the condition. However, the recursion tree gets too large too quickly and the program takes too long for slightly higher values of N, K and M.
What would be a more efficient algorithm to achieve this task? Are there any optimizations to be done to the existing algorithm implementation?
This is my implementation:
import java.util.Arrays;
import java.util.HashSet;
import java.util.Scanner;
public class Main {
// keeping track of valid variations, disregarding duplicates
public static HashSet<String> solutions = new HashSet<>();
// calculating sum of each variation
public static int sum(int[] array) {
int sum = 0;
for (int i : array) {
sum += i;
}
return sum;
}
public static void distributionsRecursive(int food, int rooms, int roomCapacity, int[] variation, int sum) {
// if all food has been allocated
if (sum == food) {
// add solution to solutions
solutions.add(Arrays.toString(variation));
return;
}
// keep adding 1 to every index in current variation
for (int i = 0; i < rooms; i++) {
// create new array for every recursive call
int[] tempVariation = Arrays.copyOf(variation, variation.length);
// if element is equal to room capacity, can't add any more in it
if (tempVariation[i] == roomCapacity) {
continue;
} else {
tempVariation[i]++;
sum = sum(tempVariation);
// recursively call function on new variation
distributionsRecursive(food, rooms, roomCapacity, tempVariation, sum);
}
}
return;
}
public static int possibleDistributions(int food, int rooms, int roomCapacity) {
int[] variation = new int[rooms];
// start from all 1, keep going till all food is allocated
Arrays.fill(variation, 1);
distributionsRecursive(food, rooms, roomCapacity, variation, rooms);
return solutions.size();
}
public static void main(String[] args) {
Scanner in = new Scanner(System.in);
int food = in.nextInt();
int rooms = in.nextInt();
int roomCapacity = in.nextInt();
int total = possibleDistributions(food, rooms, roomCapacity);
System.out.println(total);
in.close();
}
}
Yes, your recursion tree will become large if you do this in a naive manner. Let's say you have 10 tonnes and 3 rooms, and M=2. One valid arrangement is [2,3,5]. But you also have [2,5,3], [3,2,5], [3,5,2], [5,2,3], and [5,3,2]. So for every valid grouping of numbers, there are actually K! permutations.
A possibly better way to approach this problem would be to determine how many ways you can make K numbers (minimum M and maximum N) add up to N. Start by making the first number as large as possible, which would be N-(M*(K-1)). In my example, that would be:
10 - 2*(3-1) = 6
Giving the answer [6,2,2].
You can then build an algorithm to adjust the numbers to come up with valid combinations by "moving" values from left to right. In my example, you'd have:
6,2,2
5,3,2
4,4,2
4,3,3
You avoid the seemingly infinite recursion by ensuring that values are decreasing from left to right. For example, in the above you'd never have [3,4,3].
If you really want all valid arrangements, you can generate the permutations for each of the above combinations. I suspect that's not necessary, though.
I think that should be enough to get you started towards a good solution.
One solution would be to compute the result for k rooms from the result for k - 1 rooms.
I've simplified the problem a bit in allowing to store 0 tonnes in a room. If we have to store at least 1 we can just subtract this in advance and reduce the capacity of rooms by 1.
So we define a function calc: (Int,Int) => List[Int] that computes for a number of rooms and a capacity a list of numbers of combinations. The first entry contains the number of combinations we get for storing 0 , the next entry when storing 1 and so on.
We can easily compute this function for one room. So calc(1,m) gives us a list of ones up to the mth element and then it only contains zeros.
For a larger k we can define this function recursively. We just calculate calc(k - 1, m) and then build the new list by summing up prefixes of the old list. E.g. if we want to store 5 tons, we can store all 5 in the first room and 0 in the following rooms, or 4 in the first and 1 in the following and so on. So we have to sum up the combinations for 0 to 5 for the rest of the rooms.
As we have a maximal capacity we might have to leave out some of the combinations, i.e. if the room only has capacity 3 we must not count the combinations for storing 0 and 1 tons in the rest of the rooms.
I've implemented this approach in Scala. I've used streams (i.e. infinite Lists) but as you know the maximal amount of elements you need this is not necessary.
The time complexity of the approach should be O(k*n^2)
def calc(rooms: Int, capacity: Int): Stream[Long] =
if(rooms == 1) {
Stream.from(0).map(x => if(x <= capacity) 1L else 0L)
} else {
val rest = calc(rooms - 1, capacity)
Stream.from(0).map(x => rest.take(x+1).drop(Math.max(0,x - capacity)).sum)
}
You can try it here:
http://goo.gl/tVgflI
(I've replaced the Long by BigInt there to make it work for larger numbers)
First tip, remove distributionsRecursive and don't build up a list of solutions. The list of all solutions is a huge data set. Just produce a count.
That will let you turn possibleDistributions into a recursive function defined in terms of itself. The recursive step will be, possibleDistributions(food, rooms, roomCapacity) = sum from i = 1 to roomCapacity of possibleDistributions(food - i, rooms - 1, roomCapacity).
You will save a lot of memory, but still have your underlying performance problem. However with a pure recursive function you can now fix that with https://en.wikipedia.org/wiki/Memoization.
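As a rough illustration of that recursive step with memoization, here is a minimal Java sketch. The class name FoodDistribution and the table layout are my own choices, and it assumes the count fits in a long (for large inputs you would switch to BigInteger or work modulo some number).

```java
import java.util.Arrays;

class FoodDistribution {
    // Number of ways to put `food` tonnes into `rooms` rooms,
    // each room holding between 1 and `cap` tonnes.
    static long count(int food, int rooms, int cap) {
        long[][] memo = new long[rooms + 1][food + 1];
        for (long[] row : memo) Arrays.fill(row, -1); // -1 marks "not computed yet"
        return count(food, rooms, cap, memo);
    }

    private static long count(int food, int rooms, int cap, long[][] memo) {
        if (food < 0) return 0;                       // overshot the total
        if (rooms == 0) return food == 0 ? 1 : 0;     // all rooms filled
        if (memo[rooms][food] >= 0) return memo[rooms][food];
        long total = 0;
        // Put i tonnes in this room, then distribute the rest over the other rooms.
        for (int i = 1; i <= cap; i++) {
            total += count(food - i, rooms - 1, cap, memo);
        }
        memo[rooms][food] = total;
        return total;
    }
}
```

For example, count(4, 2, 3) is 3: the distributions are (1,3), (2,2) and (3,1). Each of the rooms * food table entries is filled once with a loop of length cap, so the running time is about O(rooms * food * cap).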

Print all the combinations of elements in a 2D Matrix

Print all the combinations of elements in matrix of size m * n.
Sample Example:
1 3 5
2 6 7
Expected Output:
2 , 1
2 , 3
2 , 5
6 , 1
6 , 3
6 , 5
7 , 1
7 , 3
7 , 5
Rules:
- Every combination starts from bottom of matrix and proceeds towards top. It may switch columns though.
- Every combination should have number of elements equal to number of rows.
- A combination can't have an element from the same row present twice.
I never could figure the solution out for general case. I can use 3 loops. But I want to understand the recursive solution. I use Java.
Here's a non-recursive way to solve this problem (it's not all that pretty, but it works for your input). I know you were interested in recursion, but I don't have anything like that for you at the moment. Generally, I avoid recursion due to the size of problems I work with (constant heap space errors due to the size of the recursive stack even when -Xmx60G). Hope this helps.
private static List<int[]> combos;
public static void main(String[] args){
combos = new ArrayList<int[]>();
generate(new int[][]{{1,3,5},{2,6,7}});
for(int[] s : combos){
System.out.println(java.util.Arrays.toString(s));
}
}
private static void generate(int[][] elements) {
int rows = elements.length;
int[] elementsIndex = new int[rows];
int[] elementsTotals = new int[rows];
java.util.Arrays.fill(elementsTotals, elements[0].length);
int curIdx = 0;
int[] c = new int[rows];
while(true){
while(curIdx >= 0){
if(curIdx == rows) {
addCombo(c);
curIdx--;
}
if(elementsIndex[curIdx] == elementsTotals[curIdx]){
elementsIndex[curIdx] = 0;
curIdx--;
} else break;
}
if(curIdx < 0) break;
// toggle order:
// bottom up: elements[rows-curIdx-1][elementsIndex[curIdx]++]
// top down: elements[curIdx][elementsIndex[curIdx]++]
c[curIdx] = elements[rows-curIdx-1][elementsIndex[curIdx]++];
curIdx++;
}
}
private static void addCombo(int[] c){
int[] a = new int[c.length];
System.arraycopy(c, 0, a, 0, c.length);
combos.add(a);
}
A recursive solution would look something like this:
printPermutations(String head, Matrix m)
if m is empty print head and return
else for each item in last row of m
printPermutations(head + item, m - bottom row)
here "head" is all the work we've done so far. In the body of a recursive method, there have to be at least two alternatives, one that outputs a result and ends the recursion, and one that goes deeper. For the deeper alternative, we transfer the selected element to head, remove the bottom row since we can't pick more than one item from any row, and do it again.
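In Java, the pseudocode above could look roughly like this. It is a sketch under a few assumptions: instead of actually removing the bottom row it passes the index of the current bottom row, it collects the lines in a list rather than printing them, and the names MatrixCombos, combos and collect are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class MatrixCombos {
    // Returns every combination, picking one item per row from the bottom row upwards.
    static List<String> combos(int[][] m) {
        List<String> out = new ArrayList<>();
        collect(m, m.length - 1, "", out);
        return out;
    }

    // `row` is the current bottom row; `head` is everything chosen so far.
    private static void collect(int[][] m, int row, String head, List<String> out) {
        if (row < 0) {          // no rows left: head is a complete combination
            out.add(head);
            return;
        }
        for (int item : m[row]) {
            String next = head.isEmpty() ? String.valueOf(item) : head + " , " + item;
            collect(m, row - 1, next, out); // go deeper with the bottom row "removed"
        }
    }
}
```

Calling combos(new int[][]{{1, 3, 5}, {2, 6, 7}}) returns the nine lines from the question, starting with "2 , 1" and ending with "7 , 5". The recursion depth equals the number of rows, so the stack only becomes a concern for very tall matrices.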
Done well, a recursive solution is often simpler and cleaner than using loops. On the other hand, recursion tends to use more memory.

Finding unique numbers from sorted array in less than O(n)

I had an interview and there was the following question:
Find unique numbers from sorted array in less than O(n) time.
Ex: 1 1 1 5 5 5 9 10 10
Output: 1 5 9 10
I gave the solution but that was of O(n).
Edit: Sorted array size is approx 20 billion and unique numbers are approx 1000.
Divide and conquer:
look at the first and last element of a sorted sequence (the initial sequence is data[0]..data[data.length-1]).
If both are equal, the only element in the sequence is the first (no matter how long the sequence is).
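If they are different, divide the sequence and repeat for each subsequence.
With m distinct values this takes roughly O(m log(n)) time, which is far below O(n) when m is much smaller than n (as in the question's edit), and O(n) only in the worst case (when each element is different).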
Java code:
public static List<Integer> findUniqueNumbers(int[] data) {
List<Integer> result = new LinkedList<Integer>();
// an empty array has no first/last element to compare
if (data.length > 0) {
findUniqueNumbers(data, 0, data.length - 1, result, false);
}
return result;
}
private static void findUniqueNumbers(int[] data, int i1, int i2, List<Integer> result, boolean skipFirst) {
int a = data[i1];
int b = data[i2];
// homogenous sequence a...a
if (a == b) {
if (!skipFirst) {
result.add(a);
}
}
else {
//divide & conquer
int i3 = (i1 + i2) / 2;
findUniqueNumbers(data, i1, i3, result, skipFirst);
findUniqueNumbers(data, i3 + 1, i2, result, data[i3] == data[i3 + 1]);
}
}
I don't think it can be done in less than O(n). Take the case where the array contains 1 2 3 4 5: in order to get the correct output, each element of the array would have to be looked at, hence O(n).
If your sorted array of size n has m distinct elements, you can do O(mlogn).
Note that this is going to be efficient when m << n (e.g. m=2 and n=100).
Algorithm:
Initialization: Current element y = first element x[0]
Step 1: Do a binary search for the last occurrence of y in x (this can be done in O(log(n)) time). Let its index be i.
Step 2: If i is the last index, stop. Otherwise y = x[i+1] and go to step 1.
Edit: In cases where m = O(n) this algorithm is going to work badly. To alleviate it you can run it in parallel with regular O(n) algorithm. The meta algorithm consists of my algorithm and O(n) algorithm running in parallel. The meta algorithm stops when either of these two algorithms complete.
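The binary-search steps above can be sketched in Java as follows (without the parallel fallback from the edit). The class and method names are made up for illustration, and the array is assumed to be sorted in ascending order.

```java
import java.util.ArrayList;
import java.util.List;

class DistinctBySearch {
    // Returns the distinct values of an ascending sorted array in O(m log n).
    static List<Integer> distinct(int[] x) {
        List<Integer> out = new ArrayList<>();
        int i = 0;
        while (i < x.length) {
            int y = x[i];
            out.add(y);
            // Binary search for the last index in [i, x.length - 1] that still holds y.
            int lo = i, hi = x.length - 1;
            while (lo < hi) {
                int mid = lo + (hi - lo + 1) / 2; // upper middle, so lo always advances
                if (x[mid] == y) {
                    lo = mid;       // still inside the run of y
                } else {
                    hi = mid - 1;   // x[mid] > y, the run ends before mid
                }
            }
            i = lo + 1; // jump to the first element of the next run
        }
        return out;
    }
}
```

For the question's example 1 1 1 5 5 5 9 10 10 this returns [1, 5, 9, 10], doing one binary search per distinct value.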
Since the data consists of integers, there are a finite number of unique values that can occur between any two values. So, start with looking at the first and last value in the array. If a[length-1] - a[0] < length - 1, there will be some repeating values. Put a[0] and a[length-1] into some constant-access-time container like a hash set. If the two values are equal, you know that there is only one unique value in the array and you are done. You know that the array is sorted. So, if the two values are different, you can look at the middle element now. If the middle element is already in the set of values, you know that you can skip the whole left part of the array and only analyze the right part recursively. Otherwise, analyze both left and right part recursively.
Depending on the data in the array you will be able to get the set of all unique values in a different number of operations. You get them in constant time O(1) if all the values are the same since you will know it after only checking the first and last element. If there are "relatively few" unique values, your complexity will be close to O(log N) because after each partition you will "quite often" be able to throw away at least one half of the analyzed sub-array. If the values are all unique and a[length-1] - a[0] = length - 1, you can also "define" the set in constant time because they have to be consecutive numbers from a[0] to a[length-1]. However, in order to actually list them, you will have to output each number, and there are N of them.
Perhaps someone can provide a more formal analysis, but my estimate is that this algorithm is roughly linear in the number of unique values rather than the size of the array. This means that if there are few unique values, you can get them in few operations even for a huge array (e.g. in constant time regardless of array size if there is only one unique value). Since the number of unique values is no greater than the size of the array, I claim that this makes this algorithm "better than O(N)" (or, strictly: "not worse than O(N) and better in many cases").
import java.util.*;
/**
* remove duplicate in a sorted array in average O(log(n)), worst O(n)
* @author XXX
*/
public class UniqueValue {
public static void main(String[] args) {
int[] test = {-1, -1, -1, -1, 0, 0, 0, 0,2,3,4,5,5,6,7,8};
UniqueValue u = new UniqueValue();
System.out.println(u.getUniqueValues(test, 0, test.length - 1));
}
// i must be start index, j must be end index
public List<Integer> getUniqueValues(int[] array, int i, int j) {
if (array == null || array.length == 0) {
return new ArrayList<Integer>();
}
List<Integer> result = new ArrayList<>();
if (array[i] == array[j]) {
result.add(array[i]);
} else {
int mid = (i + j) / 2;
result.addAll(getUniqueValues(array, i, mid));
// avoid duplicate divide
while (mid < j && array[mid] == array[++mid]);
if (array[(i + j) / 2] != array[mid]) {
result.addAll(getUniqueValues(array, mid, j));
}
}
return result;
}
}

How to pick an item by its probability?

I have a list of items. Each of these items has its own probability.
Can anyone suggest an algorithm to pick an item based on its probability?
Generate a uniformly distributed random number.
Iterate through your list until the cumulative probability of the visited elements is greater than the random number
Sample code:
double p = Math.random();
double cumulativeProbability = 0.0;
for (Item item : items) {
cumulativeProbability += item.probability();
if (p <= cumulativeProbability) {
return item;
}
}
So with each item store a number that marks its relative probability. For example, if you have 3 items and one should be twice as likely to be selected as either of the other two, then your list will have:
[{A,1},{B,1},{C,2}]
Then sum the numbers of the list (i.e. 4 in our case).
Now generate a random number below that sum.
int index = rand.nextInt(4);
Return the item whose slice of the running total contains that index.
Java code:
class Item {
int relativeProb;
String name;
//Getters Setters and Constructor
}
...
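class RandomSelector {
final List<Item> items;
final Random rand = new Random();
int totalSum = 0;
// the items are passed in so that totalSum is computed over the real list
RandomSelector(List<Item> items) {
this.items = new ArrayList<>(items);
for(Item item : this.items) {
totalSum = totalSum + item.relativeProb;
}
}
public Item getRandom() {
int index = rand.nextInt(totalSum); // 0 <= index < totalSum
int sum = 0;
int i=0;
// advance until the running total exceeds index; using <= keeps the weights exact
while(sum <= index ) {
sum = sum + items.get(i++).relativeProb;
}
return items.get(i-1);
}
}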
pretend that we have the following list
Item A 25%
Item B 15%
Item C 35%
Item D 5%
Item E 20%
Let's pretend that all the probabilities are integers, and assign each item a "range" that is calculated as follows.
Start - Sum of the probabilities of all items before it
End - Start + own probability - 1
The new numbers are as follows
Item A 0 to 24
Item B 25 to 39
Item C 40 to 74
Item D 75 to 79
Item E 80 to 99
Now pick a random integer from 0 to 99. Let's say that you pick 32. 32 falls in Item B's range.
You can try the Roulette Wheel Selection.
First, add all the probabilities, then scale all the probabilities in the scale of 1, by dividing each one by the sum. Suppose the scaled probabilities are A(0.4), B(0.3), C(0.25) and D(0.05). Then you can generate a random floating-point number in the range [0, 1). Now you can decide like this:
random number in [0.00, 0.40) -> pick A
in [0.40, 0.70) -> pick B
in [0.70, 0.95) -> pick C
in [0.95, 1.00) -> pick D
You can also do it with random integers - say you generate a random integer between 0 to 99 (inclusive), then you can make decision like the above.
The algorithm described in Ushman's, Brent's and @kaushaya's answers is implemented in the Apache commons-math library.
Take a look at EnumeratedDistribution class (groovy code follows):
def probabilities = [
new Pair<String, Double>("one", 25),
new Pair<String, Double>("two", 30),
new Pair<String, Double>("three", 45)]
def distribution = new EnumeratedDistribution<String>(probabilities)
println distribution.sample() // here you get one of your values
Note that sum of probabilities doesn't need to be equal to 1 or 100 - it will be normalized automatically.
My method is pretty simple. Generate a random number. Now, since the probabilities of your items are known, simply iterate through the sorted list of probabilities and pick the item whose probability is less than the randomly generated number.
For more details,read my answer here.
A slow but simple way to do it is to have every member to pick a random number based on its probability and pick the one with highest value.
Analogy:
Imagine 1 of 3 people needs to be chosen but they have different probabilities. You give them dice with different numbers of faces. The first person's die has 4 faces, the 2nd person's 6, and the third person's 8. They roll their dice and the one with the biggest number wins.
Lets say we have the following list:
[{A,50},{B,100},{C,200}]
Pseudocode:
A.value = random(0 to 50);
B.value = random(0 to 100);
C.value = random (0 to 200);
We pick the one with the highest value.
The method above does not map the probabilities exactly. For example, 100 will not have twice the chance of 50. But we can fix that by tweaking the method a bit.
Method 2
Instead of picking a number from 0 to the weight we can limit them from the upper limit of previous variable to addition of the current variable.
[{A,50},{B,100},{C,200}]
Pseudocode:
A.lowLimit= 0; A.topLimit= A.lowLimit+50-1;
B.lowLimit= A.topLimit+1; B.topLimit= B.lowLimit+100-1
C.lowLimit= B.topLimit+1; C.topLimit= C.lowLimit+200-1
resulting limits
A.limits = 0,49
B.limits = 50,149
C.limits = 150,349
Then we pick a random integer from 0 to 349 and compare it to each variable's limits to see whether the random number is in its limits.
I believe this tweak has better performance since there is only 1 random generation.
There is a similar method in other answers but this method does not require the total to be 100 or 1.00.
Brent's answer is good, but it doesn't account for the possibility of erroneously choosing an item with a probability of 0 in cases where p = 0. That's easy enough to handle by checking the probability (or perhaps not adding the item in the first place):
double p = Math.random();
double cumulativeProbability = 0.0;
for (Item item : items) {
cumulativeProbability += item.probability();
if (p <= cumulativeProbability && item.probability() != 0) {
return item;
}
}
A space-costly way is to clone each item the number of times its probability. Selection will be done in O(1).
For example
//input
[{A,1},{B,1},{C,3}]
// transform into
[{A,1},{B,1},{C,1},{C,1},{C,1}]
Then simply pick any item randomly from this transformed list.
Adapted the code from https://stackoverflow.com/a/37228927/11257746 into a general extension method. This will allow you to get a weighted random value from a Dictionary with the structure <TKey, int>, where int is a weight value.
A Key that has a value of 50 is 10 times more likely to be chosen than a key with the value of 5.
C# code using LINQ:
/// <summary>
/// Get a random key out of a dictionary which has integer values treated as weights.
/// A key in the dictionary with a weight of 50 is 10 times more likely to be chosen than an element with the weight of 5.
///
/// Example usage to get 1 item:
/// Dictionary<MyType, int> myTypes;
/// MyType chosenType = myTypes.GetWeightedRandomKey<MyType, int>().First();
///
/// Adapted into a general extension method from https://stackoverflow.com/a/37228927/11257746
/// </summary>
public static IEnumerable<TKey> GetWeightedRandomKey<TKey, TValue>(this Dictionary<TKey, int> dictionaryWithWeights)
{
int totalWeights = 0;
foreach (KeyValuePair<TKey, int> pair in dictionaryWithWeights)
{
totalWeights += pair.Value;
}
System.Random random = new System.Random();
while (true)
{
int randomWeight = random.Next(0, totalWeights);
foreach (KeyValuePair<TKey, int> pair in dictionaryWithWeights)
{
int weight = pair.Value;
if (randomWeight - weight >= 0) // >= so that each key gets exactly `weight` of the possible values
randomWeight -= weight;
else
{
yield return pair.Key;
break;
}
}
}
}
Example usage:
public enum MyType { Thing1, Thing2, Thing3 }
public Dictionary<MyType, int> MyWeightedDictionary = new Dictionary<MyType, int>();
public void MyVoid()
{
MyWeightedDictionary.Add(MyType.Thing1, 50);
MyWeightedDictionary.Add(MyType.Thing2, 25);
MyWeightedDictionary.Add(MyType.Thing3, 5);
// Get a single random key
MyType myChosenType = MyWeightedDictionary.GetWeightedRandomKey<MyType, int>().First();
// Get 20 random keys
List<MyType> myChosenTypes = MyWeightedDictionary.GetWeightedRandomKey<MyType, int>().Take(20).ToList();
}
If you don't mind adding a third party dependency in your code you can use the MockNeat.probabilities() method.
For example:
String s = mockNeat.probabilites(String.class)
.add(0.1, "A") // 10% chance to pick A
.add(0.2, "B") // 20% chance to pick B
.add(0.5, "C") // 50% chance to pick C
.add(0.2, "D") // 20% chance to pick D
.val();
Disclaimer: I am the author of the library, so I might be biased when I am recommending it.
All mentioned solutions have linear effort. The following has only logarithmic effort and also deals with unnormalized probabilities. I'd recommend using a TreeMap rather than a List:
import java.util.*;
import java.util.stream.IntStream;
public class ProbabilityMap<T> extends TreeMap<Double,T>{
private static final long serialVersionUID = 1L;
public static Random random = new Random();
public double sumOfProbabilities;
public Map.Entry<Double,T> next() {
return ceilingEntry(random.nextDouble()*sumOfProbabilities);
}
@Override public T put(Double key, T value) {
return super.put(sumOfProbabilities+=key, value);
}
public static void main(String[] args) {
ProbabilityMap<Integer> map = new ProbabilityMap<>();
map.put(0.1,1); map.put(0.3,3); map.put(0.2,2);
IntStream.range(0, 10).forEach(i->System.out.println(map.next()));
}
}
You could use the Julia code:
function selrnd(a::Vector{Int})
c = a[:]
sumc = c[1]
for i=2:length(c)
sumc += c[i]
c[i] += c[i-1]
end
r = rand()*sumc
for i=1:length(c)
if r <= c[i]
return i
end
end
end
This function returns the index of an item efficiently.
