Random number with probabilities - Java

I am wondering what would be the best way (e.g. in Java) to generate random numbers within a particular range where each number has a certain probability of occurring?
e.g.
Generate random integers from within [1;3] with the following probabilities:
P(1) = 0.2
P(2) = 0.3
P(3) = 0.5
Right now I am considering generating a random integer within [0;100] and doing the following:
If it is within [0;20] --> I got my random number 1.
If it is within [21;50] --> I got my random number 2.
If it is within [51;100] --> I got my random number 3.
What would you say?

Yours is a pretty good way already and works well with any range.
Just thinking: another possibility is to get rid of the fractions by multiplying with a constant multiplier, and then build an array with the size of this multiplier. Multiplying by 10 you get
P(1) = 2
P(2) = 3
P(3) = 5
Then you create an array with the inverse values -- '1' goes into elements 1 and 2, '2' into 3 to 5, and so on:
P = (1,1, 2,2,2, 3,3,3,3,3);
and then you can pick a random element from this array instead.
(Addendum) Using the probabilities from the example in kiruwka's comment:
int[] numsToGenerate = new int[] { 1, 2, 3, 4, 5 };
double[] discreteProbabilities = new double[] { 0.1, 0.25, 0.3, 0.25, 0.1 };
the smallest multiplier that leads to all-integers is 20, which gives you
2, 5, 6, 5, 2
and so the length of numsToGenerate would be 20, with the following values:
1 1
2 2 2 2 2
3 3 3 3 3 3
4 4 4 4 4
5 5
The distribution is exactly the same: the chance of '1', for example, is now 2 out of 20 -- still 0.1.
This is based on your original probabilities all adding up to 1. If they do not, multiply the total by this same factor (which is then going to be your array length as well).
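A minimal sketch of this idea in Java (class and variable names are mine):

import java.util.Random;

public class PoolSample {
    public static void main(String[] args) {
        // P(1)=0.2, P(2)=0.3, P(3)=0.5, scaled by 10 and expanded into a pool
        int[] pool = {1, 1, 2, 2, 2, 3, 3, 3, 3, 3};
        Random rnd = new Random();
        int draw = pool[rnd.nextInt(pool.length)]; // a uniform index yields a weighted value
        System.out.println(draw);
    }
}

Building the pool is a one-time cost; every draw afterwards is a single O(1) array lookup.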

Some time ago I wrote a helper class to solve this issue. The source code should show the concept clearly enough:
import java.util.HashMap;
import java.util.Map;

public class DistributedRandomNumberGenerator {

    private Map<Integer, Double> distribution;
    private double distSum;

    public DistributedRandomNumberGenerator() {
        distribution = new HashMap<>();
    }

    public void addNumber(int value, double distribution) {
        if (this.distribution.get(value) != null) {
            distSum -= this.distribution.get(value); // replacing an entry: remove its old weight first
        }
        this.distribution.put(value, distribution);
        distSum += distribution;
    }

    public int getDistributedRandomNumber() {
        double rand = Math.random();
        double ratio = 1.0 / distSum;
        double tempDist = 0;
        for (Integer i : distribution.keySet()) {
            tempDist += distribution.get(i);
            if (rand / ratio <= tempDist) { // rand / ratio == rand * distSum
                return i;
            }
        }
        return 0;
    }
}
The usage of the class is as follows:
DistributedRandomNumberGenerator drng = new DistributedRandomNumberGenerator();
drng.addNumber(1, 0.3d); // Adds the numerical value 1 with a probability of 0.3 (30%)
// [...] Add more values
int random = drng.getDistributedRandomNumber(); // Generate a random number
Test driver to verify functionality:
public static void main(String[] args) {
    DistributedRandomNumberGenerator drng = new DistributedRandomNumberGenerator();
    drng.addNumber(1, 0.2d);
    drng.addNumber(2, 0.3d);
    drng.addNumber(3, 0.5d);

    int testCount = 1000000;
    HashMap<Integer, Double> test = new HashMap<>();
    for (int i = 0; i < testCount; i++) {
        int random = drng.getDistributedRandomNumber();
        test.put(random, (test.get(random) == null) ? (1d / testCount) : test.get(random) + 1d / testCount);
    }
    System.out.println(test.toString());
}
Sample output for this test driver:
{1=0.20019100000017953, 2=0.2999349999988933, 3=0.4998739999935438}

You already wrote the implementation in your question. ;)
final int ran = myRandom.nextInt(100);
if (ran > 50) { return 3; }
else if (ran > 20) { return 2; }
else { return 1; }
You can speed this up for more complex implementations by pre-calculating the results in a lookup table like this:
t[0] = 1; t[1] = 1; // ... one for each possible result
return t[ran];
But this should only be used if this is a performance bottleneck and called several hundred times per second.
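For the 0.2/0.3/0.5 example, the full table could be built like this (a sketch; note these boundaries give exactly 20/30/50 out of 100, which differs by one from the if-chain above):

import java.util.Arrays;
import java.util.Random;

public class LookupTable {
    public static void main(String[] args) {
        int[] t = new int[100];
        Arrays.fill(t, 0, 20, 1);   // indices 0..19  -> 1 (20%)
        Arrays.fill(t, 20, 50, 2);  // indices 20..49 -> 2 (30%)
        Arrays.fill(t, 50, 100, 3); // indices 50..99 -> 3 (50%)
        int ran = new Random().nextInt(100);
        System.out.println(t[ran]);
    }
}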

If you have performance issues: instead of scanning all n values, which is O(n), you can perform a binary search over the cumulative weights, which costs O(log n).
Random r = new Random();
// cumulative weights for P(1)=0.2, P(2)=0.3, P(3)=0.5 (each entry is the running total)
double[] weights = new double[] {0.2, 0.2 + 0.3, 0.2 + 0.3 + 0.5};
// end of init
double random = r.nextDouble();
// next, perform the binary search in the weights array
On average you only need about log2(weights.length) accesses, which pays off when you have many weight entries.
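A sketch of this in Java using the built-in Arrays.binarySearch (the cumulative values match the question's probabilities):

import java.util.Arrays;
import java.util.Random;

public class CumulativeBinarySearch {
    public static void main(String[] args) {
        int[] values = {1, 2, 3};
        double[] cumulative = {0.2, 0.5, 1.0}; // P(1)=0.2, P(2)=0.3, P(3)=0.5
        double r = new Random().nextDouble();
        int idx = Arrays.binarySearch(cumulative, r);
        if (idx < 0) idx = -idx - 1; // binarySearch returns -(insertionPoint)-1 on a miss
        System.out.println(values[idx]);
    }
}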

Your approach is fine for the specific numbers you picked, although you could reduce storage by using an array of 10 instead of an array of 100. However, this approach doesn't generalize well to large numbers of outcomes or outcomes with probabilities such as 1/e or 1/PI.
A potentially better solution is to use an alias table. The alias method takes O(n) work to set up the table for n outcomes, but then is constant time to generate regardless of how many outcomes there are.
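For reference, here is a compact sketch of Vose's variant of the alias method (the class name and structure are mine, not from any library):

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Random;

public class AliasTable {
    private final int[] alias;
    private final double[] prob;
    private final Random rng = new Random();

    public AliasTable(double[] p) { // p must sum to 1
        int n = p.length;
        alias = new int[n];
        prob = new double[n];
        double[] scaled = new double[n];
        Deque<Integer> small = new ArrayDeque<>();
        Deque<Integer> large = new ArrayDeque<>();
        for (int i = 0; i < n; i++) {
            scaled[i] = p[i] * n;
            (scaled[i] < 1.0 ? small : large).push(i);
        }
        while (!small.isEmpty() && !large.isEmpty()) {
            int s = small.pop(), l = large.pop();
            prob[s] = scaled[s];          // chance of keeping column s
            alias[s] = l;                 // otherwise fall through to l
            scaled[l] += scaled[s] - 1.0; // l donated (1 - scaled[s]) of its mass
            (scaled[l] < 1.0 ? small : large).push(l);
        }
        while (!large.isEmpty()) prob[large.pop()] = 1.0;
        while (!small.isEmpty()) prob[small.pop()] = 1.0; // numerical leftovers
    }

    public int next() { // O(1) per sample
        int column = rng.nextInt(prob.length);
        return rng.nextDouble() < prob[column] ? column : alias[column];
    }

    public static void main(String[] args) {
        AliasTable t = new AliasTable(new double[] {0.2, 0.3, 0.5});
        System.out.println(t.next()); // 0, 1 or 2 with the given probabilities
    }
}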

Try this:
In this example I use an array of chars, but you can substitute it with your integer array.
The weight list contains, for each char, its associated weight; it represents the probability distribution of my charset.
In the weightsum list I store, for each char, its own weight plus the sum of all preceding weights.
For example, the third element of weightsum, corresponding to 'C', is 65:
weight('A') + weight('B') + weight('C') = 10 + 30 + 25 = 65
So weightsum represents the cumulative distribution of my charset. It contains the following values:
10, 40, 65, 125, 145, 215, 225, 305, 325, 355
It's easy to see that the 8th element, corresponding to 'H', has the largest gap (80, matching its weight), so 'H' is the most likely to come out!
List<Character> charset = Arrays.asList('A','B','C','D','E','F','G','H','I','J');
List<Integer> weight = Arrays.asList(10,30,25,60,20,70,10,80,20,30);
List<Integer> weightsum = new ArrayList<>();
int i = 0, j = 0, k = 0;
Random rnd = new Random();

weightsum.add(weight.get(0));
for (i = 1; i < 10; i++)
    weightsum.add(weightsum.get(i-1) + weight.get(i));
Then I use a loop to draw 30 random chars from the charset, each one drawn according to the cumulative distribution.
In k I store a random number between 0 (inclusive) and the maximum value in weightsum (exclusive).
Then I look in weightsum for the first entry greater than k; the position of that entry in weightsum corresponds to the position of the char in charset.
for (j = 0; j < 30; j++) {
    k = rnd.nextInt(weightsum.get(weightsum.size() - 1)); // reuse the single Random instance
    for (i = 0; k >= weightsum.get(i); i++) ; // '>=' gives each char a slot exactly as wide as its weight
    System.out.print(charset.get(i));
}
The code give out that sequence of char:
HHFAIIDFBDDDHFICJHACCDFJBGBHHB
Let's do the math!
A = 2
B = 4
C = 3
D = 5
E = 0
F = 4
G = 1
H = 6
I = 3
J = 2
Total.:30
As expected, D and H have the most occurrences (weights 60 and 80).
E, on the other hand, didn't come out at all (weight 20; with only 30 draws that can happen).

There is a more effective way than dealing with fractions, creating big arrays, or hard-coding the range to 100:
Take the sum of all the weights and run the random number generator on that sum. In your case the weight array becomes int[]{2, 3, 5}, sum = 10, so:
result = new Random().nextInt(10)
Then iterate over the array from index 0, keeping a running sum of the weights, and return the element at the first index where the running sum exceeds result.
E.g. if result is 6, the running sums are 2, 5, 10, and the first one greater than 6 occurs at index 2, which corresponds to the number 3 (the one with weight 5).
This solution scales regardless of how big the numbers or the range are.
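A small self-contained sketch of that approach (names are mine):

import java.util.Random;

public class WeightedPick {
    public static void main(String[] args) {
        int[] values  = {1, 2, 3};
        int[] weights = {2, 3, 5}; // sum = 10
        int sum = 0;
        for (int w : weights) sum += w;
        int result = new Random().nextInt(sum); // 0..sum-1
        int running = 0;
        for (int i = 0; i < weights.length; i++) {
            running += weights[i];
            if (result < running) { // first index whose running sum exceeds result
                System.out.println(values[i]);
                break;
            }
        }
    }
}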

If you are not against adding a new library to your code, this feature is already implemented in MockNeat; check the probabilities() method.
Some examples directly from the wiki:
String s = mockNeat.probabilites(String.class)
.add(0.1, "A") // 10% chance
.add(0.2, "B") // 20% chance
.add(0.5, "C") // 50% chance
.add(0.2, "D") // 20% chance
.val();
Or if you want to generate numbers within given ranges with a given probability you can do something like:
Integer x = m.probabilites(Integer.class)
.add(0.2, m.ints().range(0, 100))
.add(0.5, m.ints().range(100, 200))
.add(0.3, m.ints().range(200, 300))
.val();
Disclaimer: I am the author of the library, so I might be biased when I am recommending it.

Here is Python code, even though you asked for Java; the approach is very similar.
import numpy as np

# weighted probability
theta = np.array([0.1, 0.25, 0.6, 0.05])
print(theta)
sample_axis = np.hstack((np.zeros(1), np.cumsum(theta)))
print(sample_axis)  # [0.   0.1  0.35 0.95 1.  ]
sample_axis represents the cumulative distribution. You can draw from a uniform distribution on [0, 1) and binary-search this axis for the index:
def binary_search(axis, q, s, e):
    if e - s <= 1:
        print(s)
        return s
    else:
        m = int(np.around((s + e) / 2))
        if q < axis[m]:
            return binary_search(axis, q, s, m)  # 'return' added so the result propagates up
        else:
            return binary_search(axis, q, m, e)

range_index = np.random.rand(1)
print(range_index)
q = range_index
s = 0
e = sample_axis.shape[0] - 1
binary_search(sample_axis, q, 0, e)

Also responded here: find random country but probability of picking higher population country should be higher. Using TreeMap:
TreeMap<Integer, Integer> map = new TreeMap<>();
map.put(percent1, 1);
map.put(percent1 + percent2, 2);
// ...
int random = (new Random()).nextInt(100) + 1; // 1..100, so each key covers exactly its percentage
int result = map.ceilingEntry(random).getValue();
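Filled in with the question's 20/30/50 split, that could look like this (a sketch; the +1 keeps each key covering exactly its percentage):

import java.util.Random;
import java.util.TreeMap;

public class TreeMapPick {
    public static void main(String[] args) {
        TreeMap<Integer, Integer> map = new TreeMap<>();
        map.put(20, 1);           // P(1) = 0.20
        map.put(20 + 30, 2);      // P(2) = 0.30
        map.put(20 + 30 + 50, 3); // P(3) = 0.50
        int random = new Random().nextInt(100) + 1; // 1..100
        System.out.println(map.ceilingEntry(random).getValue());
    }
}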

This may be useful to someone: a simple one I did in Python. You just have to change the way p and r are written. This one, for instance, maps random values between 0 and 0.1 to the range 1e-20 to 1e-12.
import random

def generate_distributed_random():
    p = [1e-20, 1e-12, 1e-10, 1e-08, 1e-04, 1e-02, 1]
    r = [0, 0.1, 0.3, 0.5, 0.7, 0.9, 1]
    val = random.random()
    for i in range(1, len(r)):
        if r[i - 1] <= val <= r[i]:
            slope = (p[i] - p[i - 1]) / (r[i] - r[i - 1])
            return p[i - 1] + (val - r[i - 1]) * slope

print(generate_distributed_random())

Referencing the paper pointed to by pjs in another post, populating the base64 table can be further optimized. The result is amazingly fast: initialization is slightly expensive, but if the probabilities are not changing often, this is a good approach.
*For a duplicate key, the last probability is taken instead of being combined (slightly different from the EnumeratedIntegerDistribution behaviour).
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;

public class RandomGen5 extends BaseRandomGen {

    private int[] t_array = new int[4];
    private int sumOfNumerator;
    private final static int DENOM = (int) Math.pow(2, 24);
    private static final int[] bitCount = new int[] {18, 12, 6, 0};
    private static final int[] cumPow64 = new int[] {
        (int) (Math.pow(64, 3) + Math.pow(64, 2) + Math.pow(64, 1) + Math.pow(64, 0)),
        (int) (Math.pow(64, 2) + Math.pow(64, 1) + Math.pow(64, 0)),
        (int) (Math.pow(64, 1) + Math.pow(64, 0)),
        (int) (Math.pow(64, 0))
    };

    ArrayList[] base64Table = {new ArrayList<Integer>(),
                               new ArrayList<Integer>(),
                               new ArrayList<Integer>(),
                               new ArrayList<Integer>()};

    public int nextNum() {
        int rand = (int) (randGen.nextFloat() * sumOfNumerator);
        for (int x = 0; x < 4; x++) {
            if (rand < t_array[x])
                return x == 0 ? (int) base64Table[x].get(rand >> bitCount[x])
                              : (int) base64Table[x].get((rand - t_array[x-1]) >> bitCount[x]);
        }
        return 0;
    }

    public void setIntProbList(int[] intList, float[] probList) {
        Map<Integer, Float> map = normalizeMap(intList, probList);
        populateBase64Table(map);
    }

    private void clearBase64Table() {
        for (int x = 0; x < 4; x++) {
            base64Table[x].clear();
        }
    }

    private void populateBase64Table(Map<Integer, Float> intProbMap) {
        int startPow, decodedFreq, table_index;
        float rem;

        clearBase64Table();

        for (Map.Entry<Integer, Float> numObj : intProbMap.entrySet()) {
            rem = numObj.getValue();
            table_index = 3;
            for (int x = 0; x < 4; x++) {
                decodedFreq = (int) (rem % 64); // extract base-64 digits, least significant first
                rem /= 64;
                for (int y = 0; y < decodedFreq; y++) {
                    base64Table[table_index].add(numObj.getKey());
                }
                table_index--;
            }
        }

        startPow = 3;
        for (int x = 0; x < 4; x++) {
            t_array[x] = x == 0 ? (int) (Math.pow(64, startPow--) * base64Table[x].size())
                                : ((int) ((Math.pow(64, startPow--) * base64Table[x].size()) + t_array[x-1]));
        }
    }

    private Map<Integer, Float> normalizeMap(int[] intList, float[] probList) {
        Map<Integer, Float> tmpMap = new HashMap<>();
        Float mappedFloat;
        int numerator;
        float normalizedProb, distSum = 0;

        // Remove duplicates, and calculate the sum over the de-duplicated keys
        for (int x = 0; x < probList.length; x++) {
            mappedFloat = tmpMap.get(intList[x]);
            if (mappedFloat != null) {
                distSum += probList[x] - mappedFloat; // replace the old weight with the new one
            } else {
                distSum += probList[x];
            }
            tmpMap.put(intList[x], probList[x]);
        }

        // Normalise the map to key -> corresponding numerator by multiplying with 2^24
        sumOfNumerator = 0;
        for (Map.Entry<Integer, Float> intProb : tmpMap.entrySet()) {
            normalizedProb = intProb.getValue() / distSum;
            numerator = (int) (normalizedProb * DENOM);
            intProb.setValue((float) numerator);
            sumOfNumerator += numerator;
        }
        return tmpMap;
    }
}

Related

Down to Zero II

This is the question:
You are given Q queries. Each query consists of a single number N. You can perform any of the following operations in each move:
If we take 2 integers a and b where N = a*b (a, b cannot be equal to 1), then we can change N = max(a, b).
Decrease the value of N by 1.
Determine the minimum number of moves required to reduce the value of N to 0.
Input Format
The first line contains the integer Q.
The next Q lines each contain an integer, N.
Output Format
Output Q lines, each containing the minimum number of moves required to reduce the value of N to 0.
I have written the following code. This code gives some wrong answers and also a "time limit exceeded" error. Can you tell me what mistakes are present in my code? Where or what am I doing wrong here?
My code:
public static int downToZero(int n) {
    // Write your code here
    int count1 = 0;
    int prev_i = 0;
    int prev_j = 0;
    int next1 = 0;
    int next2 = Integer.MAX_VALUE;
    if (n == 0) {
        return 0;
    }
    while (n != 0) {
        if (n == 1) {
            count1++;
            break;
        }
        next1 = n - 1;
        outerloop:
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= n; j++) {
                if (i * j == n) {
                    if (prev_i == j && prev_j == i) {
                        break outerloop;
                    }
                    if (i != j) {
                        prev_i = i;
                        prev_j = j;
                    }
                    int max = Math.max(i, j);
                    if (max < next2) {
                        next2 = max;
                    }
                }
            }
        }
        n = Math.min(next1, next2);
        count1++;
    }
    return count1;
}
This part is coded for us:
public class Solution {
    public static void main(String[] args) throws IOException {
        BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(System.in));
        BufferedWriter bufferedWriter = new BufferedWriter(new FileWriter(System.getenv("OUTPUT_PATH")));

        int q = Integer.parseInt(bufferedReader.readLine().trim());
        for (int qItr = 0; qItr < q; qItr++) {
            int n = Integer.parseInt(bufferedReader.readLine().trim());
            int result = Result.downToZero(n);
            bufferedWriter.write(String.valueOf(result));
            bufferedWriter.newLine();
        }

        bufferedReader.close();
        bufferedWriter.close();
    }
}
E.g. it is not working for the number 7176.
To explore the whole solution tree and find the globally optimal solution, we must choose the best result both from all possible divisor pairs and from solution(n-1).
My weird translation to Java (ideone) uses bottom-up dynamic programming to make execution faster.
We calculate solutions for values i from 1 to n; they are written into table[i].
At first we set the result to 1 + the best result for the previous value (table[i-1]).
Then we factor i into all pairs of divisors and check whether using the already calculated result for the larger divisor, table[i / a], gives a better result.
Finally we write the result into the table.
Note that we can calculate the table once and use it for all Q queries.
class Ideone
{
    public static int makezeroDP(int n) {
        int[] table = new int[n+1];
        table[1] = 1; table[2] = 2; table[3] = 3;
        int res;
        for (int i = 4; i <= n; i++) {
            res = 1 + table[i-1];
            int a = 2;
            while (a * a <= i) {
                if (i % a == 0)
                    res = Math.min(res, 1 + table[i / a]);
                a += 1;
            }
            table[i] = res;
        }
        return table[n];
    }

    public static void main(String[] args) throws java.lang.Exception
    {
        int n = 145; //999999;
        System.out.println(makezeroDP(n));
    }
}
Old part
A simple implementation (sorry, in Python) gives the answer 7 for 7176:
def makezero(n):
    if n <= 3:
        return n
    result = 1 + makezero(n - 1)
    t = 2
    while t * t <= n:
        if n % t == 0:
            result = min(result, 1 + makezero(n // t))
        t += 1
    return result
In Python you need to raise the recursion limit or change the algorithm. Here is a version using memoization, as I wrote in the comments:
t = [-i for i in range(1000001)]

def makezeroMemo(n):
    if t[n] > 0:
        return t[n]
    if t[n-1] < 0:
        res = 1 + makezeroMemo(n-1)
    else:
        res = 1 + t[n-1]
    a = 2
    while a * a <= n:
        if n % a == 0:
            res = min(res, 1 + makezeroMemo(n // a))
        a += 1
    t[n] = res
    return res
Bottom-up table dynamic programming. No recursion.
def makezeroDP(n):
    table = [0, 1, 2, 3] + [0] * (n - 3)
    for i in range(4, n + 1):
        res = 1 + table[i-1]
        a = 2
        while a * a <= i:
            if i % a == 0:
                res = min(res, 1 + table[i // a])
            a += 1
        table[i] = res
    return table[n]
We can construct the directed acyclic graph quickly with a sieve and
then compute shortest paths. No trial division needed.
Time and space usage is Θ(N log N).
n_max = 1000000
successors = [[n - 1] for n in range(n_max + 1)]
for a in range(2, n_max + 1):
    for b in range(a, n_max // a + 1):
        successors[a * b].append(b)
table = [0]
for n in range(1, n_max + 1):
    table.append(min(table[s] for s in successors[n]) + 1)
print(table[7176])
Results:
7
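Since the question is in Java, a rough Java translation of the sieve version above could look like this (a sketch, untested against the judge):

import java.util.ArrayList;
import java.util.List;

public class DownToZeroSieve {
    static int[] buildTable(int nMax) {
        // straightforward but memory-hungry: about N log N list entries in total
        List<List<Integer>> successors = new ArrayList<>(nMax + 1);
        for (int n = 0; n <= nMax; n++) {
            List<Integer> s = new ArrayList<>();
            if (n > 0) s.add(n - 1); // the "decrease by 1" move
            successors.add(s);
        }
        for (int a = 2; a <= nMax; a++)
            for (int b = a; b <= nMax / a; b++)
                successors.get(a * b).add(b); // N = a*b -> max(a,b) = b, since b >= a
        int[] table = new int[nMax + 1];
        for (int n = 1; n <= nMax; n++) {
            int best = Integer.MAX_VALUE;
            for (int s : successors.get(n)) best = Math.min(best, table[s]);
            table[n] = best + 1;
        }
        return table;
    }

    public static void main(String[] args) {
        System.out.println(buildTable(1_000_000)[7176]); // expected: 7
    }
}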
EDIT:
The algorithm below uses a greedy approach and doesn't return optimal results; it just simplifies the OP's approach. For 7176, given as the example, the algorithm below returns 10, while I can see a shorter chain of 7176 -> 104 -> 52 -> 13 -> 12 -> 4 -> 2 -> 1 -> 0 with 8 steps, and the expected answer is 7.
Let's review your problem in simple terms.
If we take 2 integers a and b where N=a*b (a ,b cannot be equal to 1), then we can change N=max(a,b)
and
Determine the minimum number of moves required to reduce the value of N to 0.
You're looking for 2 factors of N, a and b, and, if you want the minimum number of moves, the maximum of the two at each step should be as small as possible. We know for a fact that this minimum is reached when the factors are closest to sqrt(N). Let me give you an example:
36 = 1 * 36 = 2 * 18 = 3 * 12 = 4 * 9 = 6 * 6
We know that sqrt(36) = 6 and you can see that the minimum over the factor pairs at this step is max(6, 6) = 6. Sure, 36 is 6 squared; so let me take a number without special properties, 96, whose square root rounded down to the nearest integer is 9.
96 = 2 * 48 = 3 * 32 = 4 * 24 = 6 * 16 = 8 * 12
You can see that your minimum value for max(a, b) is max(8, 12) = 12, which is, again, attained when factors are closest to square root.
Now let's look at the code:
for (int i = 1; i <= n; i++) {
    for (int j = 1; j <= n; j++) {
        if (i * j == n) {
You can do this in one loop, knowing that n / i is integer division, so you need to check whether i * (n / i) == n. With the previous observation, we need to start at the square root and go down until we reach 1. If we find factors i and n / i, we know that this pair is also the minimum attainable at this step. If no factors are found and you reach 1 (which obviously is a factor of n), you have a prime number and you need to use the second operation:
Decrease the value of N by 1.
Note that if you go from sqrt(n) down to 1 looking for factors, then when you find one, max(i, n / i) will be n / i.
Additionally, if n = 1, you take 1 step. If n = 2, you take 2 steps (2 -> 1). If n = 3, you take 3 steps (3 -> 2 -> 1). Therefore if n is 1, 2 or 3, you take n steps to go to 0. OK, less talking, more coding:
static int downToZero(int n) {
    if (n == 1 || n == 2 || n == 3) return n;
    int sqrt = (int) Math.sqrt(n);
    for (int i = sqrt; i > 1; i--) {
        if (n / i * i == n) {
            return 1 + downToZero(n / i);
        }
    }
    return 1 + downToZero(n - 1);
}
Notice that I'm stopping when i equals 2, I know that if I reach 1, it's a prime number and I need to go a step forward and look at n - 1.
However, I have traced the steps your algorithm and mine take by adding a print statement each time n changes, and we both get the same sequence: 7176, 92, 23, 22, 11, 10, 5, 4, 2, 1, which returns 10. Isn't that correct?
So, I found a solution which works for all the test cases (wrapped in the Result class that the scaffold above calls):
class Result {
    static final int LIMIT = 1_000_000;
    static int[] solutions = buildSolutions();

    public static int downToZero(int n) {
        // Write your code here
        return solutions[n];
    }

    static int[] buildSolutions() {
        int[] solutions = new int[LIMIT + 1];
        for (int i = 1; i < solutions.length; i++) {
            solutions[i] = solutions[i - 1] + 1; // the "decrease by 1" move
            for (int j = 2; j * j <= i; j++) {
                if (i % j == 0) {
                    solutions[i] = Math.min(solutions[i], solutions[i / j] + 1); // the "larger divisor" move
                }
            }
        }
        return solutions;
    }
}

Is there any better algorithm to do this? (Unit fractions creation)

I have been given a problem where fractions between 1/2 and 1/1000 have to be added to create the longest possible sequence of unique unit fractions.
The rules for constructing these fractions:
Let's create a set D. D only holds unique unit fractions. Different pairs of sub-fractions may add up to the same fraction, for example:
1/10
/ \
1/110 + 1/11 1/35 + 1/14
All sub-fractions can be held within the set as long as they themselves are not the same fraction; it is fine for different pairs to total up to the same parent fraction.
The goal:
The fractions have to be added in a way that sums up to exactly 1. No denominator is allowed to be over 1000; it is explicitly between 2 and 1000, hence the fractions which would make up 1/1000 are not applicable (1/1004 + 1/251000).
What I currently found to be the most effective:
Find the two smallest factors which make up the denominator of the current fraction, e.g. for 1/6: A = 3, B = 2. Now complete the following equations: C = (A+B)*A, D = (A+B)*B. C and D are the denominators of the sub-fractions which add up to my initial fraction. This works because 1/((A+B)*A) + 1/((A+B)*B) = (B + A)/((A+B)*A*B) = 1/(A*B).
1/6
/ \
1/15 1/10
In code:
public static int[] provideFactorsSmallest(int n) {
    int k[] = new int[2];
    // find the smallest factor of n
    for (int i = 2; i <= n - 1; i++) {
        if (n % i == 0) {
            k[0] = i;
            break;
        }
    }
    // find the matching cofactor, then turn the pair into the two sub-denominators
    for (int i = k[0] + 1; i <= n - 1 && k[0] != 0; i++) {
        if (k[0] * i == n) {
            k[1] = i;
            int firstTerm = k[0];
            int secondTerm = k[1];
            k[0] = (firstTerm + secondTerm) * firstTerm;
            k[1] = (firstTerm + secondTerm) * secondTerm;
            return k;
        }
    }
    return null;
}
My question:
What would be the most effective way to pair and group the numbers to achieve the longest possible fraction sequence?
Thank you in advance!

Simulating unfair dice with java/R (programming done)

Task: An unfair die (6 sides) is rolled n times. The probability of 1 is p1, the probability of 2 is p2, and so on. Write a computer program that, for a given n (n < 100), the probability set (p1,p2,p3,p4,p5,p6) and $x \in [n,600n]$, finds the probability that the sum of the dice values is less than x. The program cannot run for more than 5 minutes. This is an extra question that would give me extra points, but so far nobody has solved it. I guess a beginner computer scientist like me can learn from this code as well, since I found zero help on biased dice on the web and came up with a roulette-like solution. I kind of wanted to show the world my way, too.
I have 2 solutions - using geometric and using statistical probability.
My questions are: 1) Is it correct when I do it like this, or am I going wrong somewhere?
2) Which one do you think gives the better answer, geometric or statistical probability?
My intuition says it is the geometric one, because it is more reliable.
I think my code is giving me the correct answer - usually more than 0.99.....
I wanted somebody to check my work since I'm not sure at all, and I wanted to share this code with others.
I prefer Java since it is much faster than R with loops, but I also give R code for the statistical version; they are very similar, I hope that is not a problem.
Java code :
import java.util.ArrayList;
public class Statistical_prob_lisayl2_geometrical {
public static double mean(ArrayList<Double> a) {
double sum=0;
int len = a.size();
for (int i = 0; i < len; i++) {
sum = sum + a.get(i);
}
return (sum/len);
}
public static double geom_prob(double p1,double p2,double p3,double p4,double p5,double p6){
ArrayList<Double> prob_values = new ArrayList<Double>();
int repeatcount = 1000000;
int[] options = {1,2,3,4,5,6};
int n = 50;
double[] probabilities = {p1,p2,p3,p4,p5,p6};
for (int i = 0 ; i < repeatcount ; i++ ) { // a lot of repeats for better statistical probability
int sum = 0; //for each repeat, the sum is being computed
for (int j = 0; j < n ; j++ ) { // for each repeat there is n cast of dies and we compute them here
double probability_value=0; // the value we start looking from with probability
double instant_probability = Math.random(); // we generate random probability for dice value
for (int k = 0; k < 6; k++ ) { // because we have 6 sides, we start looking at each probability like a roulette table
probability_value = probability_value + probabilities[k]; // we sum the probabilities for checking in which section the probability belongs to
if (probability_value>instant_probability) {
sum = sum + options[k]; // if probability belongs to certain area , lets say p3 to p4, then p3 is added to sum
break; // we break the loop , because it would give us false values otherwise
}
}
}
double length1 = (600*n)-n-(sum-n); //length of possible x values minus length of sum
double length2 = 600*n-n;
prob_values.add( (length1/length2) ); // geometric probability l1/l2
}
return mean(prob_values); // we give the mean value of a arraylist, with 1000000 numbers in it
}
public static double stat_prob(double p1,double p2,double p3,double p4,double p5,double p6){
ArrayList<Double> prob_values = new ArrayList<Double>();
int repeatcount = 1000000;
int[] options = {1,2,3,4,5,6};
int n = 50;
double[] probabilities = {p1,p2,p3,p4,p5,p6};
int count = 0;
for (int i = 0 ; i < repeatcount ; i++ ) {
int sum = 0;
for (int j = 0; j < n ; j++ ) {
double probability_value=0;
double instant_probability = Math.random();
for (int k = 0; k < 6; k++ ) {
probability_value = probability_value + probabilities[k];
if (probability_value>instant_probability) {
sum = sum + options[k];
break;
}
}
}
int x = (int)Math.round(Math.random()*(600*n-n)+n);
if( x>sum ) {
count = count + 1;
}
}
double probability = (double)count/(double)repeatcount;
return probability;
}
public static void main(String[] args) {
System.out.println(stat_prob(0.1,0.1,0.1,0.1,0.3,0.3));
System.out.println(geom_prob(0.1,0.1,0.1,0.1,0.3,0.3));
}
}
R code:
repeatcount = 100000
options = c(1,2,3,4,5,6)
n = 50
probabilities = c(1/10,1/10,1/10,1/10,3/10,3/10)
count = 0
for (i in 1:repeatcount) {
sum = 0
for (i in 1:n) {
probability_value=0
instant_probability = runif(1,0,1)
for (k in 1:6){
probability_value = probability_value + probabilities[k]
if (probability_value>instant_probability) {
sum = sum + options[k]
break
}
}
}
x = runif(1,n,600*n)
x
sum
if ( x> sum ) {
count = count + 1
}
}
count
probability = count/repeatcount
probability
Is this what you are trying to do??
n <- 50 # number of rolls in a trial
k <- 100000 # number of trials in the simulation
x <- 220 # cutoff for calculating P(X<x)
p <- c(1/10,1/10,1/10,1/10,3/10,3/10) # distribution of p-side
X <- sapply(1:k,function(i)sum(sample(1:6,n,replace=T,prob=p)))
P <- sum(X<x)/length(X) # probability that X < x
par(mfrow=c(1,2))
hist(X)
plot(quantile(X,probs=seq(0,1,.01)),seq(0,1,.01),type="l",xlab="x",ylab="P(X < x)")
lines(c(x,x,0),c(0,P,P),col="red",lty=2)
This makes sense because the expected value of a single side is
E(s) = 1*0.1 + 2*0.1 + 3*0.1 + 4*0.1 + 5*0.3 + 6*0.3 = 4.3
Since you are simulating 50 rolls, the expected value of the total should be 50*4.3, or about 215, which is almost exactly what it is.
The slow step, below, runs in about 3.5s on my system. Obviously the actual time will depend on the number of trials in the simulation, and the speed of your computer, but 5 min is absurd...
system.time(X <- sapply(1:k,function(i)sum(sample(1:6,n,replace=T,prob=p))))
# user system elapsed
# 3.20 0.00 3.24

Integer Factorization

Could anyone explain to me why the algorithm below is an error-free integer factorization method that always returns a non-trivial factor of N?
I know how weird this sounds, but I designed this method 2 years ago and still don't understand the mathematical logic behind it, which makes it difficult for me to improve it. It's so simple that it involves only addition and subtraction.
public static long factorX(long N) {
    long x = 0, y = 0;
    long b = (long) (Math.sqrt(N));
    long a = b * (b + 1) - N;
    if (a == b) return a;
    while (a != 0) {
        a -= (2 + 2 * x++ - y);
        if (a < 0) {
            a += (x + b + 1);
            y++;
        }
    }
    return (x + b + 1);
}
}
It seems that the above method actually finds a solution by iteration to the diophantine equation:
f(x,y) = a - x(x+1) + (x+b+1)y
where b = floor( sqrt(N) ) and a = b(b+1) - N
that is, when a = 0, f(x,y) = 0 and (x+b+1) is a factor of N.
Example: N = 8509
b = 92, a = 47
f(34,9) = 47 - 34(34+1) + 9(34+92+1) = 0
and so x+b+1 = 127 is a factor of N.
Rewriting the method:
public static long factorX(long N) {
    long x = 1, y = 0, f;
    long b = (long) (Math.sqrt(N));
    long a = b * (b + 1) - N;
    if (a == b) return a;
    while (true) {
        f = a - x * (x + 1) + (x + b + 1) * y;
        if (f < 0) { f += (x + b + 1); y++; } // same correction as 'a += (x+b+1); y++' above
        if (f == 0) return x + b + 1;         // test before advancing x, so the right factor is returned
        x++;
    }
}
I'd really appreciate any suggestions on how to improve this method.
Here's a list of 18-digit random semiprimes and the time taken to factor each:
349752871155505651 = 666524689 x 524741059 in 322 ms
259160452058194903 = 598230151 x 433211953 in 404 ms
339850094323758691 = 764567807 x 444499613 in 1037 ms
244246972999490723 = 606170657 x 402934339 in 560 ms
285622950750261931 = 576888113 x 495109787 in 174 ms
191975635567268981 = 463688299 x 414018719 in 101 ms
207216185150553571 = 628978741 x 329448631 in 1029 ms
224869951114090657 = 675730721 x 332780417 in 1165 ms
315886983148626037 = 590221057 x 535201141 in 110 ms
810807767237895131 = 957028363 x 847213937 in 226 ms
469066333624309021 = 863917189 x 542952889 in 914 ms
OK, I used Matlab to see what was going on here. Here is the result for N=100000:
You are increasing x on each iteration, and the funny pattern of the a variable is strongly related to the remainder N % (x+b+1) (as you can see in the gray line of the plot, a + (N % (x+b+1)) - x = floor(sqrt(N))).
Thus, I think you are just finding the first factor larger than sqrt(N) by simple iteration, but with a rather obscure criterion to decide it is really a factor :D
(Sorry for the half-answer... I have to leave, I will maybe continue later.)
Here is the Matlab code, in case you want to test it yourself:
clear all
close all

N = int64(100000);

histx = [];
histDiffA = [];
histy = [];
hista = [];
histMod = [];
histb = [];

x = int64(0);
y = int64(0);
b = int64(floor(sqrt(double(N))));
a = int64(b*(b+1) - N);

if (a == b)
    factor = a;
else
    while (a ~= 0)
        a = a - (2 + 2*x - y);
        histDiffA(end+1) = (2 + 2*x - y);
        x = x + 1;
        if (a < 0)
            a = a + (x + b + 1);
            y = y + 1;
        end
        hista(end+1) = a;
        histb(end+1) = b;
        histx(end+1) = x;
        histy(end+1) = y;
        histMod(end+1) = mod(N, (x + b + 1));
    end
    factor = x + b + 1;
end

figure('Name', 'Values');
hold on
plot(hista, '-or')
plot(hista + histMod - histx, '--*', 'Color', [0.7 0.7 0.7])
plot(histb, '-ob')
plot(histx, '-*g')
plot(histy, '-*y')
legend({'a', 'a+mod(N,x+b+1)-x', 'b', 'x', 'y'});
hold off

fprintf('factor is %d \n', factor);
Your method is a variant of trial multiplication of (n-a)*(n+b), where n = floor(sqrt(N)) and b == 1.
The algorithm then iterates a-- / b++ until the difference (n-a)*(n+b) - N == 0.
The partial differences (with respect to a and b) are proportional to 2b and 2a respectively, thus no true multiplications are necessary.
The complexity is a linear function of |a| or |b| -- the more "square" N is, the faster the method converges. In summary, there are much faster methods, one of the easiest to understand being the quadratic residue sieve.
Pardon my c#, I don't know Java.
Stepping x and y by 2 increases algorithm speed.
using System.Numerics; // needed for BigInteger

/* Methods ************************************************************/

// factor odd integers
private static BigInteger sfactor(BigInteger k)
{
    BigInteger x, y;
    int flag;
    x = y = iSqrt(k);                       // integer square root
    if (x % 2 == 0) { x -= 1; y += 1; }     // if even, make x & y odd
    do
    {
        flag = BigInteger.Compare((x * y), k);
        if (flag > 0) x -= 2;
        y += 2;
    } while (flag != 0);
    return x;
} // end of sfactor()

// Finds the integer square root of a positive number
private static BigInteger iSqrt(BigInteger num)
{
    if (0 == num) { return 0; }            // avoid zero divide
    BigInteger n = (num / 2) + 1;          // initial estimate, never low
    BigInteger n1 = (n + (num / n)) >> 1;  // right shift to divide by 2
    while (n1 < n)
    {
        n = n1;
        n1 = (n + (num / n)) >> 1;
    }
    return n;
} // end iSqrt()

how to generate bins for histogram using apache math 3.0 in java?

I have been looking for a way to generate bins for a specific dataset (by specifying the lower bound, upper bound and number of bins required) using Apache Commons Math 3.0. I have looked at Frequency http://commons.apache.org/math/apidocs/org/apache/commons/math3/stat/Frequency.html
but it does not give me what I want. I want a method that gives me the frequency of values in an interval (e.g. how many values are between 0 and 5). Any suggestions or ideas?
Here is a simple way to implement a histogram using Apache Commons Math 3:

import org.apache.commons.math3.random.EmpiricalDistribution;
import org.apache.commons.math3.stat.descriptive.SummaryStatistics;

final int BIN_COUNT = 20;
double[] data = {1.2, 0.2, 0.333, 1.4, 1.5, 1.2, 1.3, 10.4, 1, 2.0};
long[] histogram = new long[BIN_COUNT];

EmpiricalDistribution distribution = new EmpiricalDistribution(BIN_COUNT);
distribution.load(data);

int k = 0;
for (SummaryStatistics stats : distribution.getBinStats()) {
    histogram[k++] = stats.getN();
}
As far as I know there is no good histogram class in Apache Commons. I ended up writing my own. If all you want are linearly distributed bins from min to max, then it is quite easy to write.
Maybe something like this:
public static int[] calcHistogram(double[] data, double min, double max, int numBins) {
    final int[] result = new int[numBins];
    final double binSize = (max - min) / numBins;

    for (double d : data) {
        int bin = (int) ((d - min) / binSize);
        if (bin < 0) {
            // this data point is smaller than min
        } else if (bin >= numBins) {
            // this data point is bigger than max
        } else {
            result[bin] += 1;
        }
    }
    return result;
}
Edit: Here's an example.
double[] data = { 2, 4, 6, 7, 8, 9 };
int[] histogram = calcHistogram(data, 0, 10, 4);
// This is a histogram with 4 bins, 0-2.5, 2.5-5, 5-7.5, 7.5-10.
assert histogram[0] == 1; // one point (2) in range 0-2.5
assert histogram[1] == 1; // one point (4) in range 2.5-5.
// etc..
I think your code has a bug in it -- please see the corrected code below:
public static int[] calcHistogram(double[] data, double min, double max, int numBins) {
    final int[] result = new int[numBins];
    final double binSize = (max - min) / numBins;

    for (double d : data) {
        int bin = (int) ((d - min) / binSize); // changed this from numBins
        if (bin < 0) {
            // this data point is smaller than min
        } else if (bin >= numBins) {
            // this data point is bigger than max
        } else {
            result[bin] += 1;
        }
    }
    return result;
}
This is in addition to #Altair7852's answer.
If you want to generate the x-value bin intervals for your y values (the frequency in each bin, aka histogram[] at index i), here is the full method:
private fun displayHistogram(binCount: Int, data: DoubleArray) {
    val histogram = DoubleArray(binCount)
    val distribution = org.apache.commons.math3.random.EmpiricalDistribution(binCount)
    distribution.load(data)
    var k = 0
    for (stats in distribution.binStats) {
        histogram[k++] = stats.n.toDouble()
    }
    val binSize = (data.max()!!.toDouble() - data.min()!!.toDouble()) / binCount
    for (i in 0 until histogram.size) {
        series2?.appendData(DataPoint(generateHistogramXValues(data.min()!!.toDouble(), histogram.size, binSize)[i], histogram[i]), false, histogram.count())
    }
}
Here is the x-values generating method:

private fun generateHistogramXValues(min: Double, numberOfBIns: Int, binSize: Double): DoubleArray {
    val xValuesArray = DoubleArray(numberOfBIns)
    for (i in 0 until numberOfBIns) {
        if (i == 0) {
            xValuesArray[i] = min
        } else {
            val previous = xValuesArray[i - 1]
            xValuesArray[i] = previous + binSize
        }
    }
    return xValuesArray
}
I'm doing this on Android using the GraphView graphing library, but you can use this with any library.
Here's a Java streams based implementation of the same function, using range, filter and count operations.
import java.util.Arrays;
import java.util.stream.IntStream;

public static Long[] calcHistogram(Double[] data, Double min, Double max, Integer numBins) {
    final var interval = (max - min) / numBins;
    return IntStream.range(0, numBins)
            .boxed()
            .map(n -> {
                var binStart = min + n * interval;
                var binEnd = min + (n + 1) * interval;
                return Arrays.stream(data).filter(d -> d >= binStart && d < binEnd).count();
            })
            .toArray(Long[]::new);
}
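For instance, with the same data as the earlier example (expected counts in the comment; values equal to max fall outside the last bin here, just as in the loop version):

Double[] data = {2.0, 4.0, 6.0, 7.0, 8.0, 9.0};
Long[] histogram = calcHistogram(data, 0.0, 10.0, 4);
// histogram is [1, 1, 2, 2] for bins 0-2.5, 2.5-5, 5-7.5, 7.5-10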
