I'm trying to implement a probability distribution function in Java that returns the i-th entry in an array with probability:
F(i) = 6i(n - i) / (n³ - n)
where n is the array length i.e. for an array length 4:
P1 = 3/10, P2 = 4/10, P3 = 3/10, P4 = 0
Note that this formula assumes numbering from 1 to n rather than 0 to n-1 as in Java.
At the moment I'm just using the uniform distribution i.e.
int i = (int)(Math.random()*((arraySize)-1));
with the -1 so it doesn't choose the last element (i.e. Pn = 0 as in the above formula).
Anyone with any ideas or tips on implementing this?
double rand = Math.random(); // uniform random number in [0, 1)
double cumulative = 0;
// rand falls in [F(1)+...+F(i-1), F(1)+...+F(i)) with probability F(i),
// so we return the first i whose cumulative probability exceeds rand.
for (int i = 1; i < array.size(); i++) {
    cumulative += F(i);
    if (rand < cumulative) {
        return i;
    }
}
return array.size(); // fallback for floating-point rounding; F(n) = 0, so this is effectively never hit
This is essentially what thomson_matt says, but a little more formally: You should perform discrete inverse transform sampling. Pseudocode for your example:
p = [0.3, 0.4, 0.3, 0.0]
c = [0.3, 0.7, 1.0, 1.0] // cumulative sum
generate x uniformly in continuous range [0,1]
find the smallest i such that x <= c[i]; that index i is your sample.
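Putting those steps into runnable Java might look like this (a minimal sketch using the example distribution above):

double[] p = {0.3, 0.4, 0.3, 0.0};
double[] c = new double[p.length];
double cum = 0;
for (int i = 0; i < p.length; i++) {
    cum += p[i];
    c[i] = cum;   // cumulative sum
}
double x = Math.random();
int sample = 0;
while (sample < c.length - 1 && c[sample] < x) {
    sample++;     // smallest i with x <= c[i]
}
// sample is now a 0-based index drawn with probability p[sample]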
To do this, you want to divide the range [0, 1] into regions that have the required size. So in this case:
0 -> 0.0 - 0.3
1 -> 0.3 - 0.7
2 -> 0.7 - 1.0
3 -> 1.0 - 1.0
Then generate a random number with Math.random(), and see which interval it falls into.
In general, you want to do something like the following pseudocode:
double r = Math.random();
int i = -1;
while (r >= 0)
{
    i++;
    r -= F(i); // F(i) is the probability of value i
}
// i is now the value you want.
You generate a value on [0, 1], then subtract the size of each interval until you go below 0, at which point you've found your random value.
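To plug the question's formula into that loop, F could look like this (a sketch; the question's formula is 1-based, so entry i of the formula corresponds to Java array index i - 1):

// Probability of the 1-based entry i for an array of length n:
// F(i) = 6i(n - i) / (n^3 - n)
static double F(int i, int n) {
    return 6.0 * i * (n - i) / ((double) n * n * n - n);
}

With the loop above starting at i = -1, calling F(0) is harmless (it returns 0), and the loop ends up returning a 1-based index; subtract 1 if you need a Java array index.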
You could try using a navigable map with the probability distribution. Unlike normal Maps, a NavigableMap defines an absolute ordering over its keys, and if a key isn't present in the map it can tell you which is the closest key, or which is the smallest key greater than the argument. I've used ceilingEntry, which returns the map entry with the smallest key greater than or equal to the given key.
If you use a TreeMap as your implementation of NavigableMap, then lookups on distributions with many classes will be faster, as it performs a binary search rather than starting with the first key and testing each key in turn.
The other advantage of NavigableMap is that you get the class of data you're directly interested in, rather than an index into another array or list, which can make code cleaner.
In my example I've used BigDecimal, since I'm not particularly fond of floating-point numbers: you can't specify the precision you need. But you could use floats or doubles or whatever.
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Arrays;
import java.util.NavigableMap;
import java.util.TreeMap;

public class Main {

    public static void main(String[] args) {
        String[] classes = {"A", "B", "C", "D"};
        BigDecimal[] probabilities = createProbabilities(classes.length);
        BigDecimal[] distribution = createDistribution(probabilities);
        System.out.println("probabilities: " + Arrays.toString(probabilities));
        System.out.println("distribution: " + Arrays.toString(distribution) + "\n");

        NavigableMap<BigDecimal, String> map = new TreeMap<BigDecimal, String>();
        for (int i = 0; i < distribution.length; i++) {
            // Skip zero-probability classes: their cumulative value equals the
            // previous key and would otherwise overwrite that entry in the map.
            if (probabilities[i].signum() > 0) {
                map.put(distribution[i], classes[i]);
            }
        }

        BigDecimal d = new BigDecimal(Math.random());
        System.out.println("probability: " + d);
        System.out.println("result: " + map.ceilingEntry(d).getValue());
    }

    private static BigDecimal[] createDistribution(BigDecimal[] probabilities) {
        BigDecimal[] distribution = new BigDecimal[probabilities.length];
        distribution[0] = probabilities[0];
        for (int i = 1; i < distribution.length; i++) {
            distribution[i] = distribution[i - 1].add(probabilities[i]);
        }
        return distribution;
    }

    private static BigDecimal[] createProbabilities(int n) {
        BigDecimal[] probabilities = new BigDecimal[n];
        for (int i = 0; i < probabilities.length; i++) {
            probabilities[i] = F(i + 1, n);
        }
        return probabilities;
    }

    private static BigDecimal F(int i, int n) {
        // F(i) = 6i(n - i) / (n^3 - n)
        BigDecimal j = new BigDecimal(i);
        BigDecimal m = new BigDecimal(n);
        BigDecimal six = new BigDecimal(6);
        BigDecimal dividend = m.subtract(j).multiply(j).multiply(six);
        BigDecimal divisor = m.pow(3).subtract(m);
        return dividend.divide(divisor, 64, RoundingMode.HALF_UP);
    }
}
When analysing data sets, such as data for human heights or for human weights, a common step is to adjust the data. This adjustment can be done by normalizing to values between 0 and 1, or throwing away outliers.
For this program, adjust the values by dividing all values by the largest value. The input begins with an integer indicating the number of floating-point values that follow. Assume that the list will always contain fewer than 20 floating-point values.
Output each floating-point value with two digits after the decimal point, which can be achieved as follows:
System.out.printf("%.2f", yourValue);
Ex: If the input is:
5 30.0 50.0 10.0 100.0 65.0
the output is:
0.30 0.50 0.10 1.00 0.65
The 5 indicates that there are five floating-point values in the list, namely 30.0, 50.0, 10.0, 100.0, and 65.0. 100.0 is the largest value in the list, so each value is divided by 100.0.
For coding simplicity, follow every output value by a space, including the last one.
This is my code so far:
import java.util.Scanner;

public class LabProgram {
    public static void main(String[] args) {
        Scanner scnr = new Scanner(System.in);
        double numElements;
        numElements = scnr.nextDouble();
        double[] userList = new double[numElements];
        int i;
        double maxValue;

        for (i = 0; i < userList.length; ++i) {
            userList[i] = scnr.nextDouble();
        }
        maxValue = userList[i];
        for (i = 0; i < userList.length; ++i) {
            if (userList[i] > maxValue) {
                maxValue = userList[i];
            }
        }
        for (i = 0; i < userList.length; ++i) {
            userList[i] = userList[i] / maxValue;
            System.out.print(userList[i] + " ");
            System.out.printf("%.2f", userList[i]);
        }
    }
}
I keep getting this error:
LabProgram.java:8: error: incompatible types: possible lossy conversion from double to int
double [] userList = new double [numElements];
^
1 error
I think my variable is messed up. I read through my book and could not find help. Can someone please help me on here. Thank you so much! This has been very stressful for me.
The specific error occurs because an array's size (and any index into it) must be an int, not a double. So declare and assign at once: int numElements = scnr.nextInt();
Better ways of structuring the program:
skip manual input (i.e. Scanner and friends) for now; it makes development tedious and testing vastly slower
you can integrate the interactive part later, once the method is done; your code already shows you know how
use an explicit method to do your work, and don't throw everything into the main method; this way you can run multiple examples/tests on the method, and you have a better implementation for later
check for invalid input INSIDE the method that you implement; once you can rely on such a method, you can keep on using it later on
you could even move the example numbers to their own test method, so you can run multiple test methods; you will learn about unit testing later on
Example code:
public class LabProgram {
    public static void main(final String[] args) {
        final double[] initialValues = new double[] { 30.0, 50.0, 10.0, 100.0, 65.0 };
        final double[] adjustedValues = normalizeValuesByHighest(initialValues);
        System.out.println("Adjusted values:");
        for (final double d : adjustedValues) {
            System.out.printf("%.2f ", Double.valueOf(d));
        }
        // expected output is 0.30 0.50 0.10 1.00 0.65
        System.out.println();
        System.out.println("All done.");
    }

    public static double[] normalizeValuesByHighest(final double[] pInitialValues) {
        if (pInitialValues == null) throw new IllegalArgumentException("Invalid double[] given!");
        if (pInitialValues.length < 1) throw new IllegalArgumentException("double[] given contains no elements!");

        // detect valid max value
        double tempMaxValue = -Double.MAX_VALUE;
        boolean hasValues = false;
        for (final double d : pInitialValues) {
            if (Double.isNaN(d)) continue;
            tempMaxValue = Math.max(tempMaxValue, d);
            hasValues = true;
        }
        if (!hasValues) throw new IllegalArgumentException("double[] given contains no valid elements, only NaNs!");

        // create return array
        final double maxValue = tempMaxValue; // final from here on
        final double[] ret = new double[pInitialValues.length];
        for (int i = 0; i < pInitialValues.length; i++) {
            ret[i] = pInitialValues[i] / maxValue; // NaN will stay NaN
        }
        return ret;
    }
}
Output:
Adjusted values:
0,30 0,50 0,10 1,00 0,65
All done.
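One side note on the output above: the commas come from my default locale's decimal separator. If your grader expects dots, printf can take an explicit locale (Locale.US here is just one choice that formats with dots):

// Force '.' as the decimal separator regardless of the platform locale
System.out.printf(java.util.Locale.US, "%.2f ", d);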
I'm trying to minimise a value in Java using commons-math. I've had a look at their documentation but I don't really get how to implement it.
Basically, in my code below, I have a Double holding the expected goals in a soccer match, and I'd like to adjust it so that the probability of under 3 goals occurring in a game becomes 0.5.
import org.apache.commons.math3.distribution.PoissonDistribution;

public class Solver {
    public static void main(String[] args) {
        final Double expectedGoals = 2.9d;
        final PoissonDistribution poissonGoals = new PoissonDistribution(expectedGoals);
        Double probabilityUnderThreeGoals = 0d;
        for (int score = 0; score < 15; score++) {
            final Double probability = poissonGoals.probability(score);
            if (score < 3) {
                probabilityUnderThreeGoals = probabilityUnderThreeGoals + probability;
            }
        }
        System.out.println(probabilityUnderThreeGoals); // prints 0.44596319855718064, I want to optimise this to 0.5
    }
}
The cumulative probability (<= x) of a Poisson random variable with mean λ can be calculated by:
P(X <= x) = e^(-λ) * Σ_{i=0..x} λ^i / i!
In your case, x is 2 and you want to find λ (the mean) such that this equals 0.5. You can type this into WolframAlpha and have it solve it for you. So rather than an optimisation problem, this is just a root-finding problem (though one could argue that optimisation problems are just finding roots.)
You can also do this with Apache Commons Maths, with one of the root finders.
import org.apache.commons.math3.analysis.UnivariateFunction;
import org.apache.commons.math3.analysis.solvers.BisectionSolver;
import org.apache.commons.math3.distribution.PoissonDistribution;
import org.apache.commons.math3.util.CombinatoricsUtils;

int maximumGoals = 2;
double expectedProbability = 0.5;

// The Poisson CDF at maximumGoals, shifted by the target so its root is the lambda we want.
UnivariateFunction f = x -> {
    double sum = 0;
    for (int i = 0; i <= maximumGoals; i++) {
        sum += Math.pow(x, i) / CombinatoricsUtils.factorialDouble(i);
    }
    return sum * Math.exp(-x) - expectedProbability;
};

// The four parameters that "solve" takes are:
// the maximum number of evaluations, the function to solve, and the min and max of the search interval.
// I've put some somewhat sensible values as an example. Feel free to change them.
double answer = new BisectionSolver().solve(Integer.MAX_VALUE, f, 0, maximumGoals / expectedProbability);

System.out.println("Solved: " + answer);
System.out.println("Cumulative Probability: " + new PoissonDistribution(answer).cumulativeProbability(maximumGoals));
This prints:
Solved: 2.674060344696045
Cumulative Probability: 0.4999999923623868
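As an aside, the same function can be handed to a faster-converging solver. A sketch with BrentSolver (the 1e-12 accuracy and 1000-evaluation cap are just illustrative choices):

import org.apache.commons.math3.analysis.solvers.BrentSolver;

// Brent's method typically needs far fewer evaluations than bisection
double brentAnswer = new BrentSolver(1e-12).solve(1000, f, 0, maximumGoals / expectedProbability);
System.out.println("Brent: " + brentAnswer);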
Hello, I am trying to write a method that generates a random number within a range, where it can take a bias that makes the number more likely to be higher or lower, depending on the bias.
To do this, I am currently using:
public int randIntWeightedLow(int max, int min, int rolls) {
    int rValue = 100;
    for (int i = 0; i < rolls; i++) {
        int rand = randInt(min, max);
        if (rand < rValue) {
            rValue = rand;
        }
    }
    return rValue;
}
This works okay by giving me a number in the range, and the more rolls I add, the more likely the number will be low. However, the problem I am running into is that there is a big difference between having 3 rolls and 4 rolls.
I am looking to have something like:
public void randomIntWithBias(int min, int max, float bias){
}
where giving a negative bias would make the number come out low more often, and a positive bias would make it come out high more often, while still keeping the result between min and max.
Currently, to generate a random number I am using:
public int randInt(final int n1, final int n2) {
    if (n1 == n2) {
        return n1;
    }
    final int min = n1 > n2 ? n2 : n1;
    final int max = n1 > n2 ? n1 : n2;
    return rand.nextInt(max - min + 1) + min;
}
I am new to Java and coding in general, so any help would be greatly appreciated.
OK, here is a quick sketch of how it could be done.
First, I propose using the Apache Commons Math library, which already implements sampling integers with different probabilities. We need EnumeratedIntegerDistribution.
Second, two parameters make the distribution linear: p0 and delta. The kth value gets relative probability p0 + k*delta. For positive delta larger numbers are more probable, for negative delta smaller numbers are more probable, and delta = 0 is plain uniform sampling.
Code (my Java is rusty, please bear with me)
import org.apache.commons.math3.distribution.EnumeratedIntegerDistribution;

public int randomIntWithBias(int min, int max, double p0, double delta) {
    if (p0 < 0.0)
        throw new IllegalArgumentException("Negative initial probability");

    int N = max - min + 1;          // total number of items to sample
    double[] p = new double[N];     // probabilities
    int[] items = new int[N];       // items
    double sum = 0.0;               // total of the probabilities

    for (int k = 0; k != N; ++k) {  // fill arrays
        p[k] = p0 + k * delta;
        sum += p[k];
        items[k] = min + k;
    }

    if (delta < 0.0) {              // with negative delta we could get negative probabilities
        if (p[N - 1] < 0.0)         // checking only the last (smallest) probability is enough
            throw new IllegalArgumentException("Negative probability");
    }

    for (int k = 0; k != N; ++k) {  // normalize probabilities
        p[k] /= sum;
    }

    EnumeratedIntegerDistribution rng = new EnumeratedIntegerDistribution(items, p);
    return rng.sample();
}
That's the gist of the idea; the code could (and should) be optimized and cleaned up.
UPDATE
Of course, instead of a linear bias function you could put in, say, a quadratic one. A general quadratic function has three parameters: pass them in, fill the array of probabilities in a similar way, normalize, and sample.
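For instance, only the weight-filling loop changes (a sketch; a, b and c are the hypothetical quadratic coefficients, and here the negativity check must cover every entry rather than just the last one):

// Quadratic relative weights instead of linear ones: p[k] = a*k^2 + b*k + c
for (int k = 0; k != N; ++k) {
    p[k] = a * k * k + b * k + c;
    if (p[k] < 0.0)
        throw new IllegalArgumentException("Negative probability");
    sum += p[k];
    items[k] = min + k;
}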
Given an array with x elements, I must find four numbers that, when summed, equal zero. I also need to determine how many such sums exist.
The cubic-time solution uses three nested loops, after which we just look up the last number with binary search. Instead, by taking the Cartesian product of the array with itself, we can store every pair sum in a secondary array; then for each sum d we just have to look for -d.
That gives something like the following, for (close to) quadratic time:
public static int quad(Double[] S) {
    ArrayList<Double> pairs = new ArrayList<>(S.length * S.length);
    int count = 0;
    for (Double d : S) {
        for (Double di : S) {
            pairs.add(d + di);
        }
    }
    Collections.sort(pairs);
    for (Double d : pairs) {
        int index = Collections.binarySearch(pairs, -d);
        if (index > 0) count++; // -d was found so increment
    }
    return count;
}
With x being 353 (for our specific array input), the solution should be 528, but instead I only find 257 using this solution. With our cubic-time version we are able to find all 528 4-sums:
public static int count(Double[] a) {
    Arrays.sort(a);
    int N = a.length;
    int count = 0;
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            for (int k = 0; k < N; k++) {
                int l = Arrays.binarySearch(a, -(a[i] + a[j] + a[k]));
                if (l > 0) count++;
            }
        }
    }
    return count;
}
Is the precision of double lost by any chance?
EDIT: Using BigDecimal instead of double was discussed, but we were afraid it would have an impact on performance. We are only dealing with 353 elements in our array, so would this mean anything to us?
EDITEDIT: I apologize if I use BigDecimal incorrectly. I have never dealt with the library before. So after multiple suggestions I tried using BigDecimal instead
public static int quad(Double[] S) {
    ArrayList<BigDecimal> pairs = new ArrayList<>(S.length * S.length);
    int count = 0;
    for (Double d : S) {
        for (Double di : S) {
            pairs.add(new BigDecimal(d + di));
        }
    }
    Collections.sort(pairs);
    for (BigDecimal d : pairs) {
        int index = Collections.binarySearch(pairs, d.negate());
        if (index >= 0) count++;
    }
    return count;
}
So instead of 257 it was able to find 261 solutions. This might indicate there is a problem with double and that I am in fact losing precision. However, 261 is far from 528, and I am unable to locate the cause.
LASTEDIT: I believe this is horrible and ugly code, but it seems to be working nonetheless. We had already experimented with while loops, and with BigDecimal we are now able to get all 528 matches.
I am not sure whether it is close enough to quadratic time or not; time will tell.
I present you the monster:
public static int quad(Double[] S) {
    ArrayList<BigDecimal> pairs = new ArrayList<>(S.length * S.length);
    int count = 0;
    for (Double d : S) {
        for (Double di : S) {
            pairs.add(new BigDecimal(d + di));
        }
    }
    Collections.sort(pairs);
    for (BigDecimal d : pairs) {
        BigDecimal negation = d.negate();
        int index = Collections.binarySearch(pairs, negation);
        // walk back to just before the first element of the run of matches
        // (compareTo is used because BigDecimal.equals also compares scale)
        while (index >= 0 && negation.compareTo(pairs.get(index)) == 0) {
            index--;
        }
        index++;
        // count every element of the run; the bounds guards keep us inside the list
        while (index >= 0 && index < pairs.size() && negation.compareTo(pairs.get(index)) == 0) {
            count++;
            index++;
        }
    }
    return count;
}
You should use the BigDecimal class instead of double here, since exact precision of the floating point numbers in your array adding up to 0 is a must for your solution. If one of your decimal values was .1, you're in trouble. That binary fraction cannot be precisely represented with a double. Take the following code as an example:
double counter = 0.0;
while (counter != 1.0)
{
    System.out.println("Counter = " + counter);
    counter = counter + 0.1;
}
You would expect this to execute 10 times, but it is an infinite loop since counter will never be precisely 1.0.
Example output:
Counter = 0.0
Counter = 0.1
Counter = 0.2
Counter = 0.30000000000000004
Counter = 0.4
Counter = 0.5
Counter = 0.6
Counter = 0.7
Counter = 0.7999999999999999
Counter = 0.8999999999999999
Counter = 0.9999999999999999
Counter = 1.0999999999999999
Counter = 1.2
Counter = 1.3
Counter = 1.4000000000000001
Counter = 1.5000000000000002
Counter = 1.6000000000000003
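For contrast, a sketch of the same loop with BigDecimal, which terminates after exactly ten iterations because 0.1 has an exact decimal representation:

import java.math.BigDecimal;

BigDecimal counter = BigDecimal.ZERO;
BigDecimal step = new BigDecimal("0.1");        // exact, unlike the double literal 0.1
while (counter.compareTo(BigDecimal.ONE) != 0)  // compareTo ignores scale, unlike equals
{
    System.out.println("Counter = " + counter);
    counter = counter.add(step);
}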
When you search for either pairs or an individual element, you need to count with multiplicity. I.e., if you find element -d in your array of either singletons or pairs, then you need to increase the count by the number of matches that are found, not just increase by 1. This is probably why you're not getting the full number of results when you search over pairs. And it could mean that the number 528 of matches is not the true full number when you are searching over singletons. And in general, you should not use double precision arithmetic for exact arithmetic; use an arbitrary precision rational number package instead.
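A minimal sketch of what counting with multiplicity could look like for the sorted pairs list (countMatches is a hypothetical helper; compareTo is used because BigDecimal.equals also compares scale):

import java.math.BigDecimal;
import java.util.Collections;
import java.util.List;

// Counts how many elements of a sorted list are equal in value to target.
static int countMatches(List<BigDecimal> sorted, BigDecimal target) {
    int hit = Collections.binarySearch(sorted, target);
    if (hit < 0) return 0; // target not present at all
    int lo = hit, hi = hit;
    while (lo > 0 && sorted.get(lo - 1).compareTo(target) == 0) lo--;
    while (hi < sorted.size() - 1 && sorted.get(hi + 1).compareTo(target) == 0) hi++;
    return hi - lo + 1; // length of the run of equal values
}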
Given an array of size n, I want to generate random probabilities for each index such that a[0] + a[1] + ... + a[n-1] = 1.
One possible result might be:
index: 0     1    2     3     4
value: 0.15  0.2  0.18  0.22  0.25
Another perfectly legal result can be:
index: 0     1     2     3     4
value: 0.01  0.01  0.96  0.01  0.01
How can I generate these easily and quickly? Answers in any language are fine, Java preferred.
Get n random numbers, calculate their sum and normalize the sum to 1 by dividing each number with the sum.
The task you are trying to accomplish is tantamount to drawing a random point from the N-dimensional unit simplex.
http://en.wikipedia.org/wiki/Simplex#Random_sampling might help you.
A naive solution might go as following:
public static double[] getArray(int n)
{
    double a[] = new double[n];
    double s = 0.0d;
    Random random = new Random();
    for (int i = 0; i < n; i++)
    {
        a[i] = 1.0d - random.nextDouble();
        a[i] = -1 * Math.log(a[i]);
        s += a[i];
    }
    for (int i = 0; i < n; i++)
    {
        a[i] /= s;
    }
    return a;
}
To draw a point uniformly from the N-dimensional unit simplex, we must take a vector of exponentially distributed random variables, then normalize it by the sum of those variables. To get an exponentially distributed value, we take a negative log of uniformly distributed value.
This is relatively late, but to show the amendment to @Kobi's simple and straightforward answer: the paper pointed to by @dreeves makes the sampling uniform. The method (if I understand it correctly) is to:
Generate n-1 distinct values from the range [1, 2, ..., M-1].
Sort the resulting vector.
Add 0 and M as the first and last elements of the resulting vector.
Generate a new vector by computing x_i - x_{i-1} for i = 1, 2, ..., n. That is, the new vector is made up of the differences between consecutive elements of the old vector.
Divide each element of the new vector by M. You have your uniform distribution!
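A sketch of those steps in Java (M is the resolution of the grid; larger M gives a finer approximation of the continuous simplex, and M must be at least n so that n-1 distinct cuts exist):

import java.util.Random;
import java.util.TreeSet;

public static double[] uniformSimplexSample(int n, int M) {
    Random rnd = new Random();
    TreeSet<Integer> cuts = new TreeSet<>();     // keeps the values distinct and sorted
    while (cuts.size() < n - 1) {
        cuts.add(1 + rnd.nextInt(M - 1));        // n-1 distinct values in [1, M-1]
    }
    double[] result = new double[n];
    int prev = 0, i = 0;
    for (int cut : cuts) {
        result[i++] = (cut - prev) / (double) M; // differences of consecutive elements
        prev = cut;
    }
    result[i] = (M - prev) / (double) M;         // final segment, up to the appended M
    return result;
}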
I am curious to know if generating distinct random values and normalizing them to 1 by dividing by their sum will also produce a uniform distribution.
Get n random numbers, calculate their sum and normalize the sum to 1
by dividing each number with the sum.
Expanding on Kobi's answer, here's a Java function that does exactly that.
public static double[] getRandDistArray(int n) {
    double[] randArray = new double[n];
    double sum = 0;

    // Generate n random numbers
    for (int i = 0; i < randArray.length; i++) {
        randArray[i] = Math.random();
        sum += randArray[i];
    }

    // Normalize sum to 1
    for (int i = 0; i < randArray.length; i++) {
        randArray[i] /= sum;
    }
    return randArray;
}
In a test run, getRandDistArray(5) returned the following
[0.1796505603694718, 0.31518724882558813, 0.15226147256596428, 0.30954417535503603, 0.043356542883939767]
If you want to generate values from a normal distribution efficiently, try the Box–Muller transform.
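A minimal Box–Muller sketch (the method name is just illustrative, and note java.util.Random already offers a built-in nextGaussian(); the second value of the generated pair is discarded here for simplicity):

// Box–Muller: two independent uniform draws give a standard normal value.
static double gaussianSample() {
    double u1 = 1.0 - Math.random(); // shift to (0, 1] so log(u1) is finite
    double u2 = Math.random();
    return Math.sqrt(-2.0 * Math.log(u1)) * Math.cos(2.0 * Math.PI * u2);
}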
public static double[] array(int n) {
    double[] a = new double[n];
    double flag = 0;
    for (int i = 0; i < n; i++) {
        a[i] = Math.random();
        flag += a[i];
    }
    for (int i = 0; i < n; i++) a[i] /= flag;
    return a;
}
Here, a first stores the raw random numbers, and flag keeps the running sum of all the numbers generated. In the second loop each entry is divided by flag, so at the end the array holds random numbers that form a probability distribution.