Finding a mode with decreasing precision - Java

I feel like there should be an existing library that more simply does two things: A) find the mode of an array of doubles, and B) gracefully degrade the precision until a particular frequency is reached.
So imagine an array like this:
double[] a = {1.12, 1.15, 1.13, 2.0, 3.4, 3.44, 4.1, 4.2, 4.3, 4.4};
If I were looking for a frequency of 3, it would go from 2 decimal places to 1 and finally return 1.1 as my mode. If I had a frequency requirement of 4, it would return 4 as my mode.
I do have code that works the way I want and returns what I expect, but I feel like there should be a more efficient way to accomplish this, or an existing library that would help me do the same. My code is below; I'd be interested in thoughts/comments on different approaches I should have taken. I use the iterations variable to limit how far the precision can degrade.
public static double findMode(double[] r, int frequencyReq)
{
    double mode = 0d;
    int frequency = 0;
    int iterations = 4; // limits how far the precision may degrade

    while (frequency < frequencyReq && iterations > 0) {
        // Build a format string with 'iterations' decimal places, e.g. "#.####".
        StringBuilder roundFormatString = new StringBuilder("#.");
        for (int j = 0; j < iterations; j++) {
            roundFormatString.append('#');
        }
        DecimalFormat roundFormat = new DecimalFormat(roundFormatString.toString());

        // Count occurrences of each rounded value. The map must be rebuilt for
        // every precision level; sharing one map across passes would double-count
        // values that round to the same key at several precisions.
        Map<Double, Integer> counter = new HashMap<Double, Integer>();
        for (int i = 0; i < r.length; i++) {
            double element = Double.valueOf(roundFormat.format(r[i]));
            Integer count = counter.get(element);
            counter.put(element, count == null ? 1 : count + 1);
        }

        for (Map.Entry<Double, Integer> entry : counter.entrySet()) {
            if (entry.getValue() > frequency) {
                mode = entry.getKey();
                frequency = entry.getValue();
                log.debug("key: " + entry.getKey() + " Count: " + entry.getValue());
            }
        }
        iterations--;
    }
    return mode;
}
Edit
Another way to rephrase the question, per Paulo's comment: the goal is to locate a number whose neighborhood contains at least frequency array elements, with the radius of the neighborhood being as small as possible.

Here is a solution to the reformulated question:
The goal is to locate a number whose neighborhood contains at least frequency array elements, with the radius of the neighborhood being as small as possible.
(I took the liberty of switching the order of 1.15 and 1.13 in the input array.)
The basic idea is: we have the input already sorted (i.e. neighboring elements are consecutive), and we know how many elements we want in our neighborhood. So we loop once over this array, measuring the distance between each element and the element frequency-1 positions to its right. The window between them spans exactly frequency elements, so it forms a neighbourhood. Then we simply take the minimum such distance. (My method has a complicated way of returning the results; you may want to do it better.)
This is not completely equivalent to your original question (does not work by fixed steps of digits), but maybe this is more what you really want :-)
You'll have to find a better way of formatting the results, though.
package de.fencing_game.paul.examples;

import java.util.Arrays;

/**
 * Searching for dense points in a distribution.
 *
 * Inspired by http://stackoverflow.com/questions/5329628/finding-a-mode-with-decreasing-precision.
 */
public class InpreciseMode {

    /** Our input data, should be sorted ascending. */
    private double[] data;

    public InpreciseMode(double... data) {
        this.data = data;
    }

    /**
     * Searches for the smallest neighbourhood (by diameter) which
     * contains at least minSize elements.
     *
     * @return an array of two arrays:
     *    { { the middle point of the neighbourhood,
     *        the diameter of the neighbourhood },
     *      all the elements of the neighbourhood }
     *
     * TODO: better return an object of a class encapsulating these.
     */
    public double[][] findSmallNeighbourhood(int minSize) {
        int currentLeft = -1;
        int currentRight = -1;
        double currentMinDiameter = Double.POSITIVE_INFINITY;

        for (int i = 0; i + minSize - 1 < data.length; i++) {
            double diameter = data[i + minSize - 1] - data[i];
            if (diameter < currentMinDiameter) {
                currentMinDiameter = diameter;
                currentLeft = i;
                currentRight = i + minSize - 1;
            }
        }
        return new double[][] {
            {
                (data[currentRight] + data[currentLeft]) / 2.0,
                currentMinDiameter
            },
            Arrays.copyOfRange(data, currentLeft, currentRight + 1)
        };
    }

    public void printSmallNeighbourhoods() {
        for (int frequency = 2; frequency <= data.length; frequency++) {
            double[][] found = findSmallNeighbourhood(frequency);
            System.out.printf("There are %d elements in %f radius " +
                              "around %f:%n   %s.%n",
                              frequency, found[0][1] / 2, found[0][0],
                              Arrays.toString(found[1]));
        }
    }

    public static void main(String[] params) {
        InpreciseMode m =
            new InpreciseMode(1.12, 1.13, 1.15, 2.0, 3.4, 3.44, 4.1,
                              4.2, 4.3, 4.4);
        m.printSmallNeighbourhoods();
    }
}
The output is
There are 2 elements in 0,005000 radius around 1,125000:
[1.12, 1.13].
There are 3 elements in 0,015000 radius around 1,135000:
[1.12, 1.13, 1.15].
There are 4 elements in 0,150000 radius around 4,250000:
[4.1, 4.2, 4.3, 4.4].
There are 5 elements in 0,450000 radius around 3,850000:
[3.4, 3.44, 4.1, 4.2, 4.3].
There are 6 elements in 0,500000 radius around 3,900000:
[3.4, 3.44, 4.1, 4.2, 4.3, 4.4].
There are 7 elements in 1,200000 radius around 3,200000:
[2.0, 3.4, 3.44, 4.1, 4.2, 4.3, 4.4].
There are 8 elements in 1,540000 radius around 2,660000:
[1.12, 1.13, 1.15, 2.0, 3.4, 3.44, 4.1, 4.2].
There are 9 elements in 1,590000 radius around 2,710000:
[1.12, 1.13, 1.15, 2.0, 3.4, 3.44, 4.1, 4.2, 4.3].
There are 10 elements in 1,640000 radius around 2,760000:
[1.12, 1.13, 1.15, 2.0, 3.4, 3.44, 4.1, 4.2, 4.3, 4.4].

I think there's nothing wrong with your code, and I doubt you will find a library that does something so specific. But if you still want an idea for approaching this problem in a more OOP style that reuses the Java collections, here is another approach:
Create a class that represents numbers with a variable number of decimals. It would have something like VariableDecimal(double d, int ndecimals) as its constructor.
In that class, override the Object methods equals and hashCode. Your implementation of equals tests whether two VariableDecimal instances are the same, taking into account the value d rounded to the given number of decimals. hashCode can simply return d * 10^ndecimals cast to an int.
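For illustration, here is a minimal sketch of what such a class might look like (the rounding via scaling to a long and the field names are my own assumptions, not part of the original answer):

public class VariableDecimal {
    private final long scaled;   // d rounded to ndecimals places, stored as an integer
    private final int ndecimals;

    public VariableDecimal(double d, int ndecimals) {
        this.ndecimals = ndecimals;
        // round d to ndecimals decimal places by scaling it to an integer
        this.scaled = Math.round(d * Math.pow(10, ndecimals));
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof VariableDecimal)) return false;
        VariableDecimal other = (VariableDecimal) o;
        return scaled == other.scaled && ndecimals == other.ndecimals;
    }

    @Override
    public int hashCode() {
        return (int) scaled;
    }
}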
In your logic, use a HashMap keyed by your new class:
HashMap<VariableDecimal, AtomicInteger> counters = new HashMap<VariableDecimal, AtomicInteger>();
for (double d : a) {
    VariableDecimal vd = new VariableDecimal(d, ndecimals);
    if (counters.get(vd) == null)
        counters.put(vd, new AtomicInteger(0));
    counters.get(vd).incrementAndGet();
}
/* at the end of this loop, counters holds the frequency of each double at the
   selected precision, so you can simply traverse it and take the max */
This piece of code doesn't show the iteration that decrements the number of decimals, which is trivial.

Related

is there a faster way to search through cumulative distribution?

I have a List<Double> that holds probabilities (weights) for sampling an item. For example, the List holds 5 values as follows.
0.1, 0.4, 0.2, 0.1, 0.2
Each i-th Double value is the probability of sampling the i-th item of another List<Object>.
How can I construct an algorithm to perform the sampling according to these probabilities?
I tried something like this, where I first made the list of probabilities into a cumulative form.
0.1, 0.5, 0.7, 0.8, 1.0
Then my approach is as follows. I generate a random double, and iterate over the list to find the first item that is larger than the random double, and then return its index.
Random r = new Random();
double p = r.nextDouble();
int total = list.size();
for (int i = 0; i < total; i++) {
    double d = list.get(i);
    if (d > p) {
        return i;
    }
}
return total - 1;
This approach is slow because I am crawling through the list sequentially. In reality, my list has 800,000 items with associated weights (probabilities) that I need to sample from, so, needless to say, this sequential approach is too slow.
I'm not sure how binary search can help. Let's say I generated p = 0.01. Then, a binary search can use recursion as follows with the list.
compare 0.01 to 0.7, repeat with L = 0.1, 0.5
compare 0.01 to 0.1, stop
compare 0.01 to 0.5, stop
0.01 is smaller than 0.7, 0.5, and 0.1, but I obviously only want 0.1. So the stopping criterion is still not clear to me when using binary search.
If there's a library to help with this type of thing I'd also be interested.
Here is how you could do it using binary search, starting with the cumulative probabilities:
public static void main(String[] args) {
    double[] cdf = {0.1, 0.5, 0.7, 0.8, 1.0};
    double random = 0.75; // generate randomly between zero and one
    int el = Arrays.binarySearch(cdf, random);
    if (el < 0) {
        // binarySearch returns -(insertionPoint) - 1 when the key is not found;
        // the insertion point is exactly the index we want
        el = -(el + 1);
    }
    System.out.println(el);
}
P.S. When the list of probabilities is short, a simple linear scan might turn out to be as efficient as binary search.
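Wrapped into a reusable sampler, the idea might look like this (a sketch; the method name sampleIndex and the Random parameter are my additions, and java.util.Arrays and java.util.Random are assumed to be imported):

static int sampleIndex(double[] cdf, Random rng) {
    // an exact hit returns that index directly; a miss returns the insertion point
    int el = Arrays.binarySearch(cdf, rng.nextDouble());
    return el >= 0 ? el : -(el + 1);
}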
This isn't the most memory-efficient approach, but you can use a NavigableMap whose keys are the cumulative lower bounds of each item's interval (note the 0.0 key in the example below). Then you can just use floorEntry(random.nextDouble()). Like the binary search, lookups take O(log n) time, with O(n) memory.
So...
NavigableMap<Double, Object> pdf = new TreeMap<>();
pdf.put(0.0, "foo");  // each key is the lower bound of its item's interval
pdf.put(0.1, "bar");
pdf.put(0.5, "baz");
pdf.put(0.7, "quz");
pdf.put(0.8, "quuz");
Random random = new Random();
pdf.floorEntry(random.nextDouble()).getValue();

Unique Computational value for an array

I have been thinking about it but have run out of ideas. I have 10 arrays, each of length 18, holding 18 double values. These 18 values are features of an image. Now I have to apply k-means clustering to them.
To implement k-means clustering, I need a unique computational value for each array. Is there any mathematical, statistical, or other logic that would help me create a computational value for each array that is unique to it, based on the values inside it? Thanks in advance.
Here is an example array. I have 10 more:
[0.07518284315321135
0.002987851573676068
0.002963866526639678
0.002526139418225552
0.07444872939213325
0.0037219653347541617
0.0036979802877177715
0.0017920256571474585
0.07499695903867931
0.003477831820276616
0.003477831820276616
0.002036159171625004
0.07383539747505984
0.004311312204791184
0.0043352972518275745
0.0011786937400740452
0.07353130134299131
0.004339580295941216]
Did you check Arrays.hashCode in Java 7?
/**
 * Returns a hash code based on the contents of the specified array.
 * For any two <tt>double</tt> arrays <tt>a</tt> and <tt>b</tt>
 * such that <tt>Arrays.equals(a, b)</tt>, it is also the case that
 * <tt>Arrays.hashCode(a) == Arrays.hashCode(b)</tt>.
 *
 * <p>The value returned by this method is the same value that would be
 * obtained by invoking the {@link List#hashCode() <tt>hashCode</tt>}
 * method on a {@link List} containing a sequence of {@link Double}
 * instances representing the elements of <tt>a</tt> in the same order.
 * If <tt>a</tt> is <tt>null</tt>, this method returns 0.
 *
 * @param a the array whose hash value to compute
 * @return a content-based hash code for <tt>a</tt>
 * @since 1.5
 */
public static int hashCode(double a[]) {
    if (a == null)
        return 0;

    int result = 1;
    for (double element : a) {
        long bits = Double.doubleToLongBits(element);
        result = 31 * result + (int)(bits ^ (bits >>> 32));
    }
    return result;
}
I don't understand why @Marco13 mentioned "this is not returning unique values for arrays".
UPDATE
See @Marco13's comment for the reason why it cannot be unique.
UPDATE
If we draw a graph of your input points (18 elements), there is one spike followed by 3 low values, and the pattern repeats.
If that holds, you can find the mean of your peaks (elements 1, 4, 8, 12, 16) and the low mean of the remaining values.
So you will have a peak mean and a low mean, and you can find a unique number to represent the two (while preserving the values) using the bijective algorithm described here.
This algorithm also provides formulas to reverse the process, i.e. to recover the peak and low means from the unique value.
To find the unique pairing value: <x, y> = x + (y + ((x + 1)/2))^2
Also refer to Exercise 1 on page 2 of the PDF to recover x and y.
Code for finding the means and the pairing value:
public static double mean(double[] array) {
    double peakMean = 0;
    double lowMean = 0;
    for (int i = 0; i < array.length; i++) {
        // elements 1, 4, 8, 12, 16 (1-based) are the peaks
        if ((i + 1) % 4 == 0 || i == 0) {
            peakMean = peakMean + array[i];
        } else {
            lowMean = lowMean + array[i];
        }
    }
    peakMean = peakMean / 5;  // 5 peak values
    lowMean = lowMean / 13;   // 13 low values
    return bijective(lowMean, peakMean);
}

public static double bijective(double x, double y) {
    double tmp = (y + ((x + 1) / 2));
    return x + (tmp * tmp);
}
For testing:
public static void main(String[] args) {
    double[] arrays = {0.07518284315321135, 0.002987851573676068, 0.002963866526639678, 0.002526139418225552, 0.07444872939213325, 0.0037219653347541617, 0.0036979802877177715, 0.0017920256571474585, 0.07499695903867931, 0.003477831820276616, 0.003477831820276616, 0.002036159171625004, 0.07383539747505984, 0.004311312204791184, 0.0043352972518275745, 0.0011786937400740452, 0.07353130134299131, 0.004339580295941216};
    System.out.println(mean(arrays));
}
You can use these peak and low values to find similar images.
You can simply sum the values using double precision; the resulting value will be unique most of the time. On the other hand, if the position of each value is relevant, you can compute a sum that uses the index as a multiplier.
The code could be as simple as:
public static double sum(double[] values) {
    double val = 0.0;
    for (double d : values) {
        val += d;
    }
    return val;
}

public static double hash_w_order(double[] values) {
    double val = 0.0;
    for (int i = 0; i < values.length; i++) {
        val += values[i] * (i + 1);
    }
    return val;
}

public static void main(String[] args) {
    double[] myvals =
        { 0.07518284315321135, 0.002987851573676068, 0.002963866526639678, 0.002526139418225552, 0.07444872939213325, 0.0037219653347541617, 0.0036979802877177715, 0.0017920256571474585, 0.07499695903867931, 0.003477831820276616,
          0.003477831820276616, 0.002036159171625004, 0.07383539747505984, 0.004311312204791184, 0.0043352972518275745, 0.0011786937400740452, 0.07353130134299131, 0.004339580295941216 };
    System.out.println("Computed value based on sum: " + sum(myvals));
    System.out.println("Computed value based on values and its position: " + hash_w_order(myvals));
}
The output of that code, using your list of values, is:
Computed value based on sum: 0.41284176550504803
Computed value based on values and its position: 3.7396448842464496
Well, here's a method that works for any number of doubles.
public BigInteger uniqueID(double[] array) {
    // Base 2^64: each double contributes one unsigned 64-bit "digit".
    final BigInteger base = BigInteger.ONE.shiftLeft(64);
    final BigInteger mask = base.subtract(BigInteger.ONE);
    BigInteger count = BigInteger.ZERO;
    for (double d : array) {
        long bitRepresentation = Double.doubleToRawLongBits(d);
        count = count.multiply(base);
        // Interpret the raw bits as unsigned so every digit lies in [0, 2^64).
        count = count.add(BigInteger.valueOf(bitRepresentation).and(mask));
    }
    return count;
}
Explanation
Each double is a 64-bit value, which means there are 2^64 different possible double values. Since a long is easier to work with for this sort of thing, and it's the same number of bits, we can get a 1-to-1 mapping from doubles to longs using Double.doubleToRawLongBits(double).
This is awesome, because now we can treat this like a simple combinations problem. You know how you know that 1234 is a unique number? There's no other number with the same value. This is because we can break it up by its digits like so:
1234 = 1 * 10^3 + 2 * 10^2 + 3 * 10^1 + 4 * 10^0
The powers of 10 would be "basis" elements of the base-10 numbering system, if you know linear algebra. In this way, base-10 numbers are like arrays consisting of only values from 0 to 9 inclusively.
If we want something similar for double arrays, we can discuss the base-(2^64) numbering system. Each double value would be a digit in a base-(2^64) representation of a value. If there are 18 digits, there are (2^64)^18 unique values for a double[] of length 18.
That number is gigantic, so we're going to need to represent it with a BigInteger data-structure instead of a primitive number. How big is that number?
(2^64)^18 = 61172327492847069472032393719205726809135813743440799050195397570919697796091958321786863938157971792315844506873509046544459008355036150650333616890210625686064472971480622053109783197015954399612052812141827922088117778074833698589048132156300022844899841969874763871624802603515651998113045708569927237462546233168834543264678118409417047146496
There are that many unique configurations of 18-length double arrays and this code lets you uniquely describe them.
I'm going to suggest three methods, with different pros and cons which I will outline.
Hash Code
This is the obvious "solution", though it has been correctly pointed out that it will not be unique. However, it will be very unlikely that any two arrays will have the same value.
Weighted Sum
Your elements appear to be bounded; perhaps they range from a minimum of 0 to a maximum of 1. If this is the case, you can multiply the first number by N^0, the second by N^1, the third by N^2 and so on, where N is some large number (ideally the inverse of your precision). This is easily implemented, particularly if you use a matrix package, and very fast. We can make this unique if we choose.
Euclidean Distance from Mean
Subtract the mean of your arrays from each array, square the results, and sum the squares. If you have an expected mean, you can use that. Again, this is not unique; there will be collisions, but you (almost) can't avoid that.
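As a rough sketch of this third option (my code, not part of the original answer; it assumes a per-feature mean is what gets subtracted):

static double squaredDistanceFromMean(double[] array, double[] mean) {
    double sum = 0.0;
    for (int i = 0; i < array.length; i++) {
        double diff = array[i] - mean[i]; // deviation of this feature from the mean
        sum += diff * diff;
    }
    return sum; // squared Euclidean distance; take Math.sqrt for the distance itself
}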
The difficulty of uniqueness
It has already been explained that hashing will not give you a unique solution. A unique number is possible in theory, using the Weighted Sum, but we have to use numbers of a very large size. Let's say your numbers are 64 bits in memory. That means that there are 2^64 possible numbers they can represent (slightly less using floating point). Eighteen such numbers in an array could represent 2^(64*18) different numbers. That's huge. If you use anything less, you will not be able to guarantee uniqueness due to the pigeonhole principle.
Let's look at a trivial example. If you have four letters, a, b, c and d, and you have to number each of them uniquely using the numbers 1 to 3, you can't. That's the pigeonhole principle. You have 2^(18*64) possible arrays, so you can't number them uniquely with fewer than 2^(18*64) numbers, and hashing doesn't give you that.
If you use BigDecimal, you can represent (almost) arbitrarily large numbers. If the largest element you can get is 1 and the smallest 0, then you can set N = 1/(precision) and apply the Weighted Sum mentioned above. This will guarantee uniqueness. The smallest positive double in Java is Double.MIN_VALUE. Note that the array of weights needs to be stored as BigDecimals!
That satisfies this part of your question:
create a computational value for each array, which is unique to it
based upon values inside it
However, there is a problem:
1 and 2 suck for K Means
I am assuming from your discussion with Marco13 that you are performing the clustering on the single values, not the length-18 arrays. As Marco has already mentioned, hashing sucks for k-means. The whole idea is that the smallest change in the data results in a large change in hash values. That means that two similar images produce two very similar arrays, which produce two very different "unique" numbers. Similarity is not preserved. The result will be pseudo-random!
Weighted Sums are better, but still bad. They will basically ignore all the elements except the last one, unless the last elements are the same; only then will they look at the next-to-last, and so on. Similarity is not really preserved.
Euclidean distance from the mean (or at least some point) will at least group things together in a sort of sensible way. Direction will be ignored, but at least things that are far from the mean won't be grouped with things that are close. Similarity of one feature is preserved, the other features are lost.
In summary
1 is very easy, but is not unique and doesn't preserve similarity.
2 is easy, can be unique and doesn't preserve similarity.
3 is easy, but is not unique and preserves some similarity.
Implementation of Weighted Sum. Not really tested:
import java.math.BigDecimal;
import java.math.RoundingMode;

public class Array2UniqueID {

    private final double min;
    private final double max;
    private final double prec;
    private final int length;

    /**
     * Used to provide a {@code BigInteger} that is unique to the given array.
     * <p>
     * This uses weighted sum to guarantee that two IDs match if and only if
     * every element of the array also matches. Similarity is not preserved.
     *
     * @param min smallest value an array element can possibly take
     * @param max largest value an array element can possibly take
     * @param prec smallest difference possible between two array elements
     * @param length length of each array
     */
    public Array2UniqueID(double min, double max, double prec, int length) {
        this.min = min;
        this.max = max;
        this.prec = prec;
        this.length = length;
    }

    /**
     * A convenience constructor which assumes the array consists of doubles of
     * full range.
     * <p>
     * This will result in very large IDs being returned.
     *
     * @see Array2UniqueID#Array2UniqueID(double, double, double, int)
     * @param length length of each array
     */
    public Array2UniqueID(int length) {
        this(-Double.MAX_VALUE, Double.MAX_VALUE, Double.MIN_VALUE, length);
    }

    public BigDecimal createUniqueID(double[] array) {
        // Validate the data
        if (array.length != length) {
            throw new IllegalArgumentException("Array length must be "
                    + length + " but was " + array.length);
        }
        for (double d : array) {
            if (d < min || d > max) {
                throw new IllegalArgumentException("Each element of the array"
                        + " must be in the range [" + min + ", " + max + "]");
            }
        }

        double range = max - min;
        /* maxNums is the maximum number of numbers that could possibly exist
         * between max and min.
         * The ID will be in the range 0 to maxNums^length.
         * maxNums = range / prec + 1
         * Stored as a BigDecimal for convenience, but is an integer
         * (rounding avoids non-terminating-expansion errors from divide).
         */
        BigDecimal maxNums = BigDecimal.valueOf(range)
                .divide(BigDecimal.valueOf(prec), 0, RoundingMode.HALF_UP)
                .add(BigDecimal.ONE);

        BigDecimal id = BigDecimal.ZERO;
        for (int i = 0; i < array.length; i++) {
            // the i-th element becomes the i-th "digit" in base maxNums
            BigDecimal num = BigDecimal.valueOf(array[i] - min)
                    .divide(BigDecimal.valueOf(prec), 0, RoundingMode.HALF_UP)
                    .multiply(maxNums.pow(i));
            id = id.add(num);
        }
        return id;
    }
}
As I understand it, you are going to do k-means clustering based on the double values.
Why not just wrap each double value in an object with an array and position identifier, so you would know which cluster it ended up in?
Something like:
public class Element {
    public final double value;
    public final int array;
    public final int position;

    public Element(double value, int array, int position) {
        this.value = value;
        this.array = array;
        this.position = position;
    }
}
If you need to cluster each array as a whole:
You can transform the original arrays of length 18 into arrays of length 19, with the last (or first) element being a unique id that you ignore during clustering but can refer to after clustering has finished (see the sketch below). This has a small memory footprint, just 8 additional bytes per array, and gives an easy association with the original value.
If space is really a problem and all values in an array are less than 1, you can instead add a unique id greater than or equal to 1 to each array and cluster based on the remainder of division by 1: 0.07518284315321135 stays 0.07518284315321135 for the first array, but becomes 1.07518284315321135 for the second, although this increases the computation needed during clustering.
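A minimal sketch of the first idea, appending the id as a 19th element (the helper name withId is mine, not the answerer's):

static double[] withId(double[] features, int id) {
    double[] tagged = java.util.Arrays.copyOf(features, features.length + 1);
    tagged[features.length] = id; // the id rides along but is ignored by the distance function
    return tagged;
}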
First of all, let's try to understand what you need mathematically:
Uniquely mapping an array of m real numbers to a single number is in fact a bijection between R^m and R, or at least an injection into N.
Since floating-point values are in fact rational numbers, your problem is to find a bijection between Q^m and N, which can be transformed into a mapping from N^m to N, because you know your values will always be greater than 0 (just scale your values by the precision to make them integers).
Thus you need to map N^m to N. Take a look at the Cantor pairing function for some ideas.
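For illustration, a minimal sketch of the Cantor pairing function folded over an array of non-negative integers (my code, not part of the original answer; BigInteger is used because the results grow very quickly):

import java.math.BigInteger;

public class CantorPairing {

    /** Cantor pairing function: pi(x, y) = (x + y)(x + y + 1)/2 + y. */
    static BigInteger pair(BigInteger x, BigInteger y) {
        BigInteger s = x.add(y);
        return s.multiply(s.add(BigInteger.ONE)).divide(BigInteger.valueOf(2)).add(y);
    }

    /** Reduce an array to a single value by pairing from left to right. */
    static BigInteger pairAll(long[] values) {
        BigInteger acc = BigInteger.valueOf(values[0]);
        for (int i = 1; i < values.length; i++) {
            acc = pair(acc, BigInteger.valueOf(values[i]));
        }
        return acc;
    }
}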
A guaranteed way to generate a unique result based on the array is to convert it to one big string, and use that for your computational value.
It may be slow, but it will be unique based on the array's values.
Implementation examples:
Best way to convert an ArrayList to a string

Weighted sampling with replacement in Java

Is there a function in Java, or in a library such as Apache Commons Math, which is equivalent to the MATLAB function randsample?
More specifically, I want to find a function randSample which returns a vector of Independent and Identically Distributed random variables according to the probability distribution which I specify.
For example:
int[] a = randSample(new int[]{0, 1, 2}, 5, new double[]{0.2, 0.3, 0.5})
// { 0 w.p. 0.2
// a[i] = { 1 w.p. 0.3
// { 2 w.p. 0.5
The output is the same as the MATLAB code randsample([0 1 2], 5, true, [0.2 0.3 0.5]) where the true means sampling with replacement.
If such a function does not exist, how do I write one?
Note: I know that a similar question has been asked on Stack Overflow but unfortunately it has not been answered.
I'm pretty sure one doesn't exist, but it's pretty easy to make a function that produces samples like that. First off, Java does come with a random number generator, java.util.Random, whose method nextDouble() produces random doubles between 0.0 and 1.0.
import java.util.Random;

Random generator = new Random();
double someRandomDouble = generator.nextDouble();
// This will be a uniformly distributed
// random variable between 0.0 and 1.0.
For sampling with replacement, if you convert the input pdf into a cdf, you can use the random doubles Java provides to build a random data set by seeing into which part of the cdf each one falls. So first you need to convert the pdf into a cdf:
int[] randsample(int[] values, int numsamples,
                 boolean withReplacement, double[] pdf) {
    if (withReplacement) {
        double[] cdf = new double[pdf.length];
        cdf[0] = pdf[0];
        for (int i = 1; i < pdf.length; i++) {
            cdf[i] = cdf[i - 1] + pdf[i];
        }
Then you make the properly-sized array of ints to store the result and start finding the random results:
        Random generator = new Random();
        int[] results = new int[numsamples];
        for (int i = 0; i < numsamples; i++) {
            double randomValue = generator.nextDouble();
            int currentPosition = 0;
            // Test the bound first so we never read past the end of cdf.
            while (currentPosition < cdf.length && randomValue > cdf[currentPosition]) {
                currentPosition++; // Check the next one.
            }
            if (currentPosition < cdf.length) { // It worked!
                results[i] = values[currentPosition];
            } else { // It didn't work (e.g. the pdf summed to slightly less than 1).
                results[i] = values[cdf.length - 1];
                // Assign it the last value.
            }
        }
        // Now we're done and can return the results!
        return results;
    } else { // Without replacement.
        throw new UnsupportedOperationException("This is unimplemented!");
    }
}
There's some error checking (make sure value array and pdf array are the same size) and some other features you can implement by overloading this to provide the other functions, but hopefully this is enough for you to start. Cheers!
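A quick usage sketch matching the example from the question (assuming the method above is in scope; the printed sequence will vary from run to run):

int[] samples = randsample(new int[]{0, 1, 2}, 5, true, new double[]{0.2, 0.3, 0.5});
System.out.println(java.util.Arrays.toString(samples)); // e.g. [2, 0, 2, 1, 2]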

Compute a table in every possible way

I would like to compute a table of values in "every possible way" by multiplying one value from each column into a product. I would prefer to solve the problem in Java. The table is of size n*m; it could for example be of size 3*5, containing:
0.5, 3.0, 5.0, 4.0, 0.75
0.5, 3.0, 5.0, 4.0, 0.75
0.5, 9.0, 5.0, 4.0, 3.0
One way of getting the product would be:
0.5 * 3.0 * 5.0 * 4.0 * 0.75
How do I compute this in "every possible way" when the table is of size n*m? I would like to write one program (presumably containing loops) that works for every n*m table.
You could do it recursively, as the other answer mentions, but in general I find Java is somewhat unhappy with recursion. Another method is to keep track of a "signature" of where you are in the table: an array of length m where each value satisfies 0 <= val < n, choosing one row for each column. Each signature uniquely specifies a path through the table, and you can compute the value for a given signature pretty easily:
double val = 1.;
for (int j = 0; j < m; j++)
    val *= table[signature[j]][j]; // the chosen row in column j
To iterate through all signatures, think of them as m-digit numbers in base n and simply increment through them, carrying whenever a digit reaches n. Here's some untested, unoptimized sample code:
int[] sig = new int[m];                     // all zeros: the first signature
double[] values = new double[(int) Math.pow(n, m)];
int count = 0;
while (sig[m - 1] < n) {                    // the top digit overflows only when we are done
    values[count++] = getValue(table, sig); // getValue applies the product loop shown above
    // increment the signature as an m-digit base-n counter
    int carry = 1, j = 0;
    while (carry > 0 && j < m) {
        sig[j] += carry;
        carry = 0;
        if (sig[j] >= n && j < m - 1) {     // let the final digit overflow to end the outer loop
            sig[j] -= n;
            carry = 1;
        }
        j++;
    }
}
Create a recursive method that makes two calls: one where you use the current number in the column for the final product, and one where you do not. In the call where you do not use it, you make two more calls, one where you use the next number in the column and one where you do not, and so on. When you do use a number, you move to the next column, effectively building a recursion tree where each leaf is a different combination forming a product.
You would not need any data structure besides your table, and it would work for any size of table. If you do not understand the method I have described, I can provide some short example code, but it is fairly simple.
method findProducts(double product, int x, int y)
    if (x and y are in bounds of the table)
        findProducts(product * table[x][y], 0, y + 1)  // use row x of column y, move to the next column
        findProducts(product, x + 1, y)                // skip row x, try the next row in this column
    else
        print(product)
Something like this; a counter would be useful so that you only print values that are products of one number from each column.
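For reference, here is a compact runnable version of this recursive idea (my sketch, not the answerer's exact code; it hard-codes the example table from the question):

public class AllProducts {

    static double[][] table = {
        {0.5, 3.0, 5.0, 4.0, 0.75},
        {0.5, 3.0, 5.0, 4.0, 0.75},
        {0.5, 9.0, 5.0, 4.0, 3.0}
    };

    // Pick each row in column 'col' in turn, multiply it in, and recurse on the next column.
    static void findProducts(double product, int col) {
        if (col == table[0].length) {
            System.out.println(product); // one factor from every column has been used
            return;
        }
        for (int row = 0; row < table.length; row++) {
            findProducts(product * table[row][col], col + 1);
        }
    }

    public static void main(String[] args) {
        findProducts(1.0, 0); // prints all 3^5 = 243 possible products
    }
}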

How do I generate the normal cumulative distribution in Java? Its inverse cdf? How about the lognormal?

I am brand new to Java, second day! I want to generate samples from a normal distribution, using inverse transform sampling.
Basically, I want to find the normal cumulative distribution function, then its inverse, and use that to generate samples.
My question is: is there a built-in function for the inverse normal cdf, or do I have to hand-code it?
I have seen people refer to this in Apache Commons. Is it built in, or do I have to download it?
If I have to do it myself, can you give me some tips? If I download it, doesn't my prof also have to have the "package" or special file installed?
Thanks in advance!
Edit: Just found out I can't use libraries; I also heard there is a simpler way to convert to a normal using radians (presumably a trigonometric method such as Box-Muller).
As mentioned here:
Apache Commons Math has what you are looking for.
More specifically, check out the NormalDistributionImpl class.
And no, your professor doesn't need to download anything if you provide him with all the needed libraries.
UPDATE :
If you want to hand code it (I don't know the actual formula), you can check the following link:
http://home.online.no/~pjacklam/notes/invnorm/
Two people have implemented it in Java: http://home.online.no/~pjacklam/notes/invnorm/#Java
I had the same problem and found the solution; the following code gives results for the cumulative distribution function just like Excel does:
private static double erf(double x) {
    // A&S formula 7.1.26 (Abramowitz & Stegun)
    double a1 = 0.254829592;
    double a2 = -0.284496736;
    double a3 = 1.421413741;
    double a4 = -1.453152027;
    double a5 = 1.061405429;
    double p = 0.3275911;
    x = Math.abs(x);
    double t = 1 / (1 + p * x);
    // Direct calculation using formula 7.1.26 is absolutely correct,
    // but evaluating the nth-order polynomial term by term takes O(n^2) operations:
    // return 1 - (a1 * t + a2 * t * t + a3 * t * t * t + a4 * t * t * t * t
    //             + a5 * t * t * t * t * t) * Math.exp(-1 * x * x);
    // Horner's method takes O(n) operations for an nth-order polynomial:
    return 1 - ((((((a5 * t + a4) * t) + a3) * t + a2) * t) + a1) * t * Math.exp(-1 * x * x);
}

public static double NORMSDIST(double z) {
    double sign = 1;
    if (z < 0) sign = -1;
    double result = 0.5 * (1.0 + sign * erf(Math.abs(z) / Math.sqrt(2)));
    return result;
}
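A quick sanity check (my addition, not part of the original answer; A&S formula 7.1.26 is accurate to about 1.5e-7):

public static void main(String[] args) {
    System.out.println(NORMSDIST(0));    // expect 0.5
    System.out.println(NORMSDIST(1.96)); // expect about 0.975
    System.out.println(NORMSDIST(-1.0)); // expect about 0.1587
}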
Mathematically, this is a hard problem, and there are a few solutions you might consider.
Disclaimer: mathematical jargon ahead.
As you probably already know, the normalcdf function is used to calculate probabilities of normal random variables. Because a normal distribution is continuous, the corresponding probability density function (normalpdf) does not itself give probabilities (in contrast to discrete distributions like the binomial or geometric). Instead, the area under the curve gives the probability that the normal random variable falls within a range of values. So the normalcdf function you seek is the area under a section of the normalpdf function.
Mathematically, finding the area under a continuous curve is a fundamental problem of calculus. The solution to this type of problem is called an integral, and integrating a function over a range of numbers means finding the area under the curve from the lowest value in the range to the highest.
In most circumstances, we could just integrate the pdf to get the cdf, then evaluate it wherever we want. The heart of the problem, and the reason an algorithm in Java is not as simple as one might think, is that the normal pdf does not have a closed-form integral; its value cannot be calculated in any finite number of steps. So values of the normalcdf function are particularly elusive.
There are two main classes of solutions for the problem.
1. Numerical Integration Techniques
Numerical integration techniques solve the problem by approximating the area under the curve geometrically. The area is divided into rectangles or other shapes of equal or varying widths, with the height of each given by the pdf. The sum of the areas of the rectangles is an approximation of the area under the curve, which is the corresponding probability. These techniques can compute values to arbitrary precision, but they are more computationally expensive than class 2. Better approximations (e.g. Simpson's rule) improve the computation. A simple numerical integration method follows.
public static double normCDF(double z) {
    double leftEndpoint = -100; // effectively negative infinity for the standard normal
    int nRectangles = 100000;
    double runningSum = 0;
    double x;
    for (int n = 0; n < nRectangles; n++) {
        x = leftEndpoint + n * (z - leftEndpoint) / nRectangles;
        // area of one rectangle: pdf(x) times the rectangle's width
        runningSum += Math.pow(Math.sqrt(2 * Math.PI), -1)
                * Math.exp(-Math.pow(x, 2) / 2) * (z - leftEndpoint) / nRectangles;
    }
    System.out.println(runningSum);
    return runningSum;
}
2. Analytic Techniques
Analytic techniques take advantage of the fact that while the normal pdf does not have a closed-form integral, it can be "converted" to a sum called a Taylor series and then integrated term by term. Basically, this turns the pdf into a sum of infinitely many simple functions, integrates each one analytically, and adds all of the integrals together. Since this is an analytic procedure, a programmer need only include the integral series in the program after computing the coefficients.

The precision of the result depends on how many terms of the sum you include in the calculation, and it approaches accurate values much sooner than numerical integration techniques do. For example, the solution by Mohammad Aldefrawy computes just five coefficients. Below is a method that includes the computation of coefficients, so one can compute values to arbitrary precision (actually, the normalcdf series isn't computed directly; instead, the coefficients of the related error function are computed and then converted by a linear transformation). However, since the computation of the coefficients involves the factorial function, one experiences memory issues for substantially large numbers of coefficients. Thankfully, this method returns values with much higher precision in a fraction of the iterations required by methods in the previous class of solutions to yield similar results.
public static double normalCDF(double x) {
    System.out.println(0.5 * (1 + erf(x / Math.sqrt(2))));
    return 0.5 * (1 + erf(x / Math.sqrt(2)));
}

public static double erf(double z) {
    int nTerms = 315;
    double runningSum = 0;
    for (int n = 0; n < nTerms; n++) {
        runningSum += Math.pow(-1, n) * Math.pow(z, 2 * n + 1) / (factorial(n) * (2 * n + 1));
    }
    return (2 / Math.sqrt(Math.PI)) * runningSum;
}

static double factorial(int n) {
    if (n == 0) return 1;
    if (n == 1) return 1;
    return n * factorial(n - 1);
}
Other functions
For the inverse function, since we used the error function in the normalCDF method, we can use the inverse error function in a similar way. Again, we obtain the coefficients of the inverse error function analytically, then compute them as needed in the method.
public static double invErf(double z) {
    int nTerms = 315;
    double runningSum = 0;
    double[] a = new double[nTerms + 1];
    double[] c = new double[nTerms + 1];
    c[0] = 1;
    // compute the series coefficients recursively
    for (int n = 1; n < nTerms; n++) {
        double runningSum2 = 0;
        for (int k = 0; k <= n - 1; k++) {
            runningSum2 += c[k] * c[n - 1 - k] / ((k + 1) * (2 * k + 1));
        }
        c[n] = runningSum2;
    }
    for (int n = 0; n < nTerms; n++) {
        a[n] = c[n] / (2 * n + 1);
        runningSum += a[n] * Math.pow((0.5) * Math.sqrt(Math.PI) * z, 2 * n + 1);
    }
    return runningSum;
}

public static double invNorm(double A) {
    return (2 / Math.sqrt(2)) * invErf(2 * A - 1);
}
I don't have a method for the lognormal function, but you could obtain one using the same idea.
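For what it's worth, the lognormal cdf reduces to the normal cdf via a standard identity (a sketch using the normalCDF method above; mu and sigma are the parameters of the underlying normal distribution):

public static double lognormalCDF(double x, double mu, double sigma) {
    if (x <= 0) return 0; // the lognormal distribution is supported on (0, infinity)
    return normalCDF((Math.log(x) - mu) / sigma);
}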
I never tried it myself, but the guys from the algo team were using Colt and they were happy with the results.
