JavaFX NumberAxis AutoRange Infinite Loop - java

I have a LineChart where the Y-axis is set to auto-range. Occasionally the JavaFX thread hangs because NumberAxis.autoRange() gets stuck in an infinite loop. New data is generated by a worker thread and then added to the chart (on the JavaFX thread) every few seconds. The infinite loop happens in this code (taken from NumberAxis.autoRange()):
for (double major = minRounded; major <= maxRounded; major += tickUnitRounded, count ++) {
double size = side.isVertical() ? measureTickMarkSize(major, getTickLabelRotation(), formatter).getHeight() :
measureTickMarkSize(major, getTickLabelRotation(), formatter).getWidth();
if (major == minRounded) { // first
last = size/2;
} else {
maxReqTickGap = Math.max(maxReqTickGap, last + 6 + (size/2) );
}
}
From debugging I've seen that the if (major == minRounded) conditional is true every time. So the major variable must not be getting updated.
I do not have a compiled version of the NumberAxis class with local variable debug info so I cannot see what the local variables are. Building the JavaFX Runtime classes seems like a lot of work but may be the next step.
I'm not able to reliably repro this issue and thus not able to provide a Minimal, Complete, and Verifiable example. I have not seen any issues logged in the Oracle or OpenJDK bug databases.
JDK Version: 8u60
EDIT:
I reported this bug to Oracle and am currently waiting for them to accept it.

Problem
The loop in question relies on double arithmetic. If minValue and maxValue are very small double values, the loop can fail to terminate.
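One way a loop of the form major += tickUnitRounded can stop advancing is when the step is lost in double arithmetic. The following is a minimal sketch of that effect, not the JavaFX code itself. The class and method names are made up for illustration, and whether this is the exact mechanism behind the reported hang is an assumption.

```java
// Hypothetical demo class; not part of JavaFX.
class StalledStepDemo {

    /** Returns true when adding step to value leaves value unchanged. */
    static boolean stepIsLost(double value, double step) {
        return value + step == value;
    }

    public static void main(String[] args) {
        // A step far below the spacing of doubles near 1e17 is absorbed.
        System.out.println(stepIsLost(1.0e17, 1.0));   // true
        // The spacing between adjacent doubles at that magnitude.
        System.out.println(Math.ulp(1.0e17));          // 16.0
        // A power of ten below the double range underflows to zero,
        // so a tick unit computed this way would never advance a loop.
        double tinyTick = Math.pow(10, -400);
        System.out.println(tinyTick == 0.0);           // true
        // An ordinary step is not lost.
        System.out.println(stepIsLost(1.0, 0.5));      // false
    }
}
```

If the step is lost in either way, a loop condition such as major <= maxRounded never becomes false.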
Bug or not?
To me this is not really a bug. Ask yourself whether you really want to show such tiny fractions on your axis, or whether you can scale them up instead. The user of your application may even be happier reading 1.5 with the base on the axis label than 0.00000000000000000000000000000000000000015 or 1.5E-33.
There are probably other places in the Java API where this can happen too, because it comes down to the limits of double arithmetic.
A simple example
This demonstrates that the method loops forever if the values are too small.
import javafx.geometry.Side;
public class AutoRangeTester {
/**
* @param args the command line arguments
*/
public static void main(String[] args) {
AutoRangeTester art = new AutoRangeTester();
art.autoRange(Double.MIN_VALUE, Double.MIN_VALUE + 0.000000000000000000000000000000001, 100, 50);
}
/**
* Called to set the upper and lower bound and anything else that needs to be
* auto-ranged
*
* @param minValue The min data value that needs to be plotted on this axis
* @param maxValue The max data value that needs to be plotted on this axis
* @param length The length of the axis in display coordinates
* @param labelSize The approximate average size a label takes along the axis
*
* @return The calculated range
*/
public Object autoRange(double minValue, double maxValue, double length,
double labelSize) {
final Side side = Side.LEFT;
// check if we need to force zero into range
if (true) {
if (maxValue < 0) {
maxValue = 0;
} else if (minValue > 0) {
minValue = 0;
}
}
final double range = maxValue - minValue;
// pad min and max by 2%, checking if the range is zero
final double paddedRange = (range == 0) ? 2 : Math.abs(range) * 1.02;
final double padding = (paddedRange - range) / 2;
// if min and max are not zero then add padding to them
double paddedMin = minValue - padding;
double paddedMax = maxValue + padding;
// check padding has not pushed min or max over zero line
if ((paddedMin < 0 && minValue >= 0) || (paddedMin > 0 && minValue <= 0)) {
// padding pushed min above or below zero so clamp to 0
paddedMin = 0;
}
if ((paddedMax < 0 && maxValue >= 0) || (paddedMax > 0 && maxValue <= 0)) {
// padding pushed min above or below zero so clamp to 0
paddedMax = 0;
}
// calculate the number of tick-marks we can fit in the given length
int numOfTickMarks = (int) Math.floor(length / labelSize);
// can never have less than 2 tick marks one for each end
numOfTickMarks = Math.max(numOfTickMarks, 2);
// calculate tick unit for the number of ticks can have in the given data range
double tickUnit = paddedRange / (double) numOfTickMarks;
// search for the best tick unit that fits
double tickUnitRounded = 0;
double minRounded = 0;
double maxRounded = 0;
int count = 0;
double reqLength = Double.MAX_VALUE;
// loop till we find a set of ticks that fit length and result in a total of less than 20 tick marks
while (reqLength > length || count > 20) {
int exp = (int) Math.floor(Math.log10(tickUnit));
final double mant = tickUnit / Math.pow(10, exp);
double ratio = mant;
if (mant > 5d) {
exp++;
ratio = 1;
} else if (mant > 1d) {
ratio = mant > 2.5 ? 5 : 2.5;
}
tickUnitRounded = ratio * Math.pow(10, exp);
minRounded = Math.floor(paddedMin / tickUnitRounded) * tickUnitRounded;
maxRounded = Math.ceil(paddedMax / tickUnitRounded) * tickUnitRounded;
count = 0;
for (double major = minRounded; major <= maxRounded; major
+= tickUnitRounded, count++) {
System.out.println("minRounded: " + minRounded);
System.out.println("maxRounded: " + maxRounded);
System.out.println("major: " + major);
System.out.println("tickUnitRounded: " + tickUnitRounded);
System.out.println("-------------------------------------");
}
}
return null;
}
}
UPDATE
The Bug-Report: https://bugs.openjdk.java.net/browse/JDK-8136535
A fix is scheduled for version 9.

Related

Generate random number given a probability the higher the probability is the higher the result

I am working on a simple project and I have this method:
public int generatePeople(float luck, int min, int max) {
SecureRandom sr = new SecureRandom();
int people = 0;
// generate random people
return people;
}
I would like to generate a number of people between min and max, where the number increases or decreases depending on the variable luck. This should behave like a Gaussian, but I have no idea how to do it in Java.
The center of the interval depends on the value of luck. For example, with luck = 0.5f values around (min + max)/2 (the center of the interval) are most likely, while luck = 0.25f or luck = 0.75f moves the center to the left, to (min + max)/4, or to the right, to 3*(min + max)/4, so that values from the left or the right side become more likely.
We can generate a Gaussian-distributed random int based on min and max with
int randomInt = (int) (sr.nextGaussian() * (max - min) + min);
Then, to add the influence of luck, we can add a shift value.
Considering luck a float from 0 to 1, where luck = 0.5 must have no influence, we can achieve this with something like:
int people = (int) (randomInt + (luck - 0.5f) * (sr.nextGaussian() * (max - min) + min));
However, by adding something we can exceed min or max value. To avoid that we can then add a check for exceeding and assign min or max value in case of exceeding:
people = Math.min(people, max);
people = Math.max(people, min);
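The shift-and-clamp idea above can be put together as a self-contained method. This is only a sketch under assumptions that are not in the original posts: the mean is placed at min + luck * (max - min), the standard deviation is set to one sixth of the range, and the class name LuckyGaussian is hypothetical.

```java
import java.security.SecureRandom;

// Hypothetical helper class for illustration.
class LuckyGaussian {

    static int generatePeople(SecureRandom sr, float luck, int min, int max) {
        // Assumption: luck in [0, 1] moves the center linearly from min to max.
        double mean = min + luck * (max - min);
        // Assumption: one sixth of the range, so most samples fall inside it.
        double sigma = (max - min) / 6.0;
        long people = Math.round(mean + sr.nextGaussian() * sigma);
        // Clamp, because a Gaussian sample can still land outside [min, max].
        return (int) Math.max(min, Math.min(max, people));
    }

    public static void main(String[] args) {
        SecureRandom sr = new SecureRandom();
        for (int i = 0; i < 5; i++) {
            System.out.println(generatePeople(sr, 0.25f, 0, 9));
        }
    }
}
```

Clamping piles the out-of-range samples onto min and max, so the two end values come up slightly more often than a pure Gaussian would suggest.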
Assuming the min,max range is inclusive, there are span = max-min+1 possible values. One approach would be to define a Gaussian probability density function (PDF), centered on luck * span, with a sigma of, for example, span/2. You could vary this to alter the tightness of the distribution around the center.
You'd then use the PDF to calculate a 'weight' for each possible value within the range of span - values near the center of the distribution will get a higher weight - and use a standard technique for selecting a random value within a range, where each value has an associated weight.
Here's some Java code to illustrate.
static int generatePeople(double luck, int min, int max)
{
int span = max - min + 1;
double mean = luck * span;
double sigma = span / 2.0; // floating-point division, so sigma is never 0
double[] weights = new double[span];
double totWeight = 0;
for(int i=0; i<span; i++)
{
weights[i] = pdf(mean, sigma, i+0.5);
totWeight += weights[i];
}
double rnd = Math.random()*totWeight;
for(int i=0; i<span; i++)
{
if(rnd < weights[i]) return min+i;
rnd -= weights[i];
}
return max;
}
static double pdf(double mean, double sigma, double x)
{
double e = (mean - x) / sigma;
return Math.exp(-(e*e)/2)/(sigma*Math.sqrt(2*Math.PI));
}
We can illustrate the behavior of this approach by generating a large number of values and examining the distribution between min and max for a given value of luck.
static void test(double luck, int min, int max)
{
int[] score = new int[max-min+1];
for(int i=0; i<1000*score.length; i++)
score[generatePeople(luck, min, max)-min]++;
for(int i=0; i<score.length; i++)
{
System.out.println(min+i + " : " + score[i]);
}
}
Output:
test(0.5, 0, 9);
Gives:
0 : 786
1 : 878
2 : 1059
3 : 1110
4 : 1152
5 : 1162
6 : 1106
7 : 1020
8 : 962
9 : 765
While
test(0.25, 0, 9);
Gives:
0 : 1195
1 : 1236
2 : 1285
3 : 1284
4 : 1146
5 : 1078
6 : 904
7 : 771
8 : 610
9 : 491

Maximum height of the staircase

Given an integer A representing the number of square blocks. The height of each square block is 1. The task is to create a staircase of max height using these blocks. The first stair would require only one block, the second stair would require two blocks and so on. Find and return the maximum height of the staircase.
Your submission failed for the following input: A : 92761
Your function returned the following : 65536
The expected returned value : 430
Approach:
We are interested in the number of steps and we know that each step Si uses exactly Bi number of bricks. We can represent this problem as an equation:
n * (n + 1) / 2 = T (For Natural number series starting from 1, 2, 3, 4, 5 …)
n * (n + 1) = 2 * T
n-1 will represent our final solution because our series in problem starts from 2, 3, 4, 5…
Now, we just have to solve this equation and for that we can exploit binary search to find the solution to this equation. Lower and Higher bounds of binary search are 1 and T.
CODE
public int solve(int A) {
int l=1,h=A,T=2*A;
while(l<=h)
{
int mid=l+(h-l)/2;
if((mid*(mid+1))==T)
return mid;
if((mid*(mid+1))>T && (mid!=0 && (mid*(mid-1))<=T) )
return mid-1;
if((mid*(mid+1))>T)
h=mid-1;
else
l=mid+1;
}
return 0;
}
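A likely reason for the wrong result is int overflow: with A = 92761 the first midpoint is 46381, and 46381 * 46382 = 2151243542 no longer fits in an int. The following is a sketch of the same binary search using long arithmetic. It is one possible rewrite rather than the original poster's code, and the class name is hypothetical.

```java
// Hypothetical class name; the method mirrors the solve(int) signature above.
class StaircaseBinarySearch {

    static int solve(int blocks) {
        long lo = 0;
        long hi = blocks;
        // Invariant: lo steps always fit, more than hi steps never fit.
        while (lo < hi) {
            long mid = lo + (hi - lo + 1) / 2;   // upper middle avoids an endless loop
            if (mid * (mid + 1) / 2 <= blocks) {
                lo = mid;                        // mid steps fit, try higher
            } else {
                hi = mid - 1;                    // mid steps need too many blocks
            }
        }
        return (int) lo;
    }

    public static void main(String[] args) {
        System.out.println(solve(92761));        // 430
    }
}
```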
To expand on the comment by Matt Timmermans:
You know that for n steps, you need (n * (n + 1))/2 blocks. You want to know, given B blocks, how many steps you can create.
So you have:
(n * (n + 1))/2 = B
(n^2 + n)/2 = B
n^2 + n = 2B
n^2 + n - 2B = 0
That looks suspiciously like something for which you'd use the quadratic formula.
In this case, a=1, b=1, and c=(-2B). Plugging the numbers into the formula:
n = ((-b) + sqrt(b^2 - 4*a*c))/(2*a)
= (-1 + sqrt(1 - 4*1*(-2B)))/(2*a)
= (-1 + sqrt(1 + 8B))/2
= (sqrt(1 + 8B) - 1)/2
So if you have 5050 blocks, you get:
n = (sqrt(1 + 40400) - 1)/2
= (sqrt(40401) - 1)/2
= (201 - 1)/2
= 100
Try it with the quadratic formula calculator. Use 1 for the value of a and b, and replace c with negative two times the number of blocks you're given. So in the example above, c would be -10100.
In your program, since you can't have a partial step, you'd want to truncate the result.
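The closed form above translates directly into Java. This is a sketch with a hypothetical class name. The two correction loops are an added safeguard against rounding in Math.sqrt and are not part of the derivation itself.

```java
// Hypothetical class name for illustration.
class StaircaseFormula {

    static int maxHeight(int blocks) {
        // n = (sqrt(1 + 8B) - 1) / 2, truncated because partial steps do not count.
        long n = (long) ((Math.sqrt(1.0 + 8.0 * blocks) - 1.0) / 2.0);
        // Safeguard: nudge n if floating-point rounding put it off by one.
        while (n * (n + 1) / 2 > blocks) {
            n--;
        }
        while ((n + 1) * (n + 2) / 2 <= blocks) {
            n++;
        }
        return (int) n;
    }

    public static void main(String[] args) {
        System.out.println(maxHeight(5050));    // 100
        System.out.println(maxHeight(92761));   // 430
    }
}
```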
Why are you using all these formulas? A simple while() loop should do the trick; after all, it's just a simple Gaussian sum.
public static int calculateStairs(int blocks) {
int lastHeight = 0;
int sum = 0;
int currentHeight = 0; //number of bricks / level
while (sum <= blocks) {
lastHeight = currentHeight;
currentHeight++;
sum += currentHeight;
}
return lastHeight;
}
So this should do the job, as it also returns the expected value. Correct me if I'm wrong.
public int solve(int blocks) {
int current = 0; //Create variables (initialized so the final return compiles)
for (int x = 0; x < Integer.MAX_VALUE; x++) { //Increment until return
current = 0; //Set current to 0
//Implementation of the Gauss sum
for (int i = 1; i <= x; i++) { //Sum up [1,*current height*]
current += i;
} //Now we have the amount of blocks required for the current height
//Now we check if the amount of blocks is bigger than
// the wanted amount, and if so we return the last one
if (current > blocks) {
return x - 1;
}
}
return current;
}

BigInteger: count the number of decimal digits in a scalable method

I need to count the number of decimal digits of a BigInteger. For example:
99 returns 2
1234 returns 4
9999 returns 4
12345678901234567890 returns 20
I need to do this for a BigInteger with 184948 decimal digits and more. How can I do this fast and scalable?
The convert-to-String approach is slow:
public String getWritableNumber(BigInteger number) {
// Takes over 30 seconds for 184948 decimal digits
return "10^" + (number.toString().length() - 1);
}
This loop-divide-by-ten approach is even slower:
public String getWritableNumber(BigInteger number) {
int digitSize = 0;
while (!number.equals(BigInteger.ZERO)) {
number = number.divide(BigInteger.TEN);
digitSize++;
}
return "10^" + (digitSize - 1);
}
Are there any faster methods?
Here's a fast method based on Dariusz's answer:
public static int getDigitCount(BigInteger number) {
double factor = Math.log(2) / Math.log(10);
int digitCount = (int) (factor * number.bitLength() + 1);
if (BigInteger.TEN.pow(digitCount - 1).compareTo(number) > 0) {
return digitCount - 1;
}
return digitCount;
}
The following code tests the numbers 1, 9, 10, 99, 100, 999, 1000, etc. all the way to ten-thousand digits:
public static void test() {
for (int i = 0; i < 10000; i++) {
BigInteger n = BigInteger.TEN.pow(i);
if (getDigitCount(n.subtract(BigInteger.ONE)) != i || getDigitCount(n) != i + 1) {
System.out.println("Failure: " + i);
}
}
System.out.println("Done");
}
This can check a BigInteger with 184,948 decimal digits and more in well under a second.
This looks like it is working. I haven't run exhaustive tests yet, nor have I run any timing tests, but it seems to have a reasonable run time.
import java.math.BigInteger;
import java.util.Random;
public class Test {
/**
* Optimised for huge numbers.
*
* http://en.wikipedia.org/wiki/Logarithm#Change_of_base
*
* States that log[b](x) = log[k](x)/log[k](b)
*
* We can get log[2](x) as the bitLength of the number so what we need is
* essentially bitLength/log[2](10). Sadly that will lead to inaccuracies so
* here I will attempt an iterative process that should achieve accuracy.
*
* log[2](10) = 3.32192809488736234787 so if I divide by 10^(bitLength/4) we
* should not go too far. In fact repeating that process while adding (bitLength/4)
* to the running count of the digits will end up with an accurate figure
* given some twiddling at the end.
*
* So here's the scheme:
*
* While there are more than 4 bits in the number
* Divide by 10^(bits/4)
* Increase digit count by (bits/4)
*
* Fiddle around to accommodate the remaining digit - if there is one.
*
* Essentially - each time around the loop we remove a number of decimal
* digits (by dividing by 10^n) keeping a count of how many we've removed.
*
* The number of digits we remove is estimated from the number of bits in the
* number (i.e. log[2](x) / 4). The perfect figure for the reduction would be
* log[2](x) / 3.3219... so dividing by 4 is a good under-estimate. We
* don't go too far but it does mean we have to repeat it just a few times.
*/
private int log10(BigInteger huge) {
int digits = 0;
int bits = huge.bitLength();
// Serious reductions.
while (bits > 4) {
// 4 > log[2](10) so we should not reduce it too far.
int reduce = bits / 4;
// Divide by 10^reduce
huge = huge.divide(BigInteger.TEN.pow(reduce));
// Removed that many decimal digits.
digits += reduce;
// Recalculate bitLength
bits = huge.bitLength();
}
// Now 4 bits or less - add 1 if necessary.
if ( huge.intValue() > 9 ) {
digits += 1;
}
return digits;
}
// Random tests.
Random rnd = new Random();
// Limit the bit length.
int maxBits = BigInteger.TEN.pow(200000).bitLength();
public void test() {
// 100 tests.
for (int i = 1; i <= 100; i++) {
BigInteger huge = new BigInteger((int)(Math.random() * maxBits), rnd);
// Note start time.
long start = System.currentTimeMillis();
// Do my method.
int myLength = log10(huge);
// Record my result.
System.out.println("Digits: " + myLength+ " Took: " + (System.currentTimeMillis() - start));
// Check the result.
int trueLength = huge.toString().length() - 1;
if (trueLength != myLength) {
System.out.println("WRONG!! " + (myLength - trueLength));
}
}
}
public static void main(String args[]) {
new Test().test();
}
}
Took about 3 seconds on my Celeron M laptop so it should hit sub 2 seconds on some decent kit.
I think that you could use bitLength() to get a log2 value, then change the base to 10.
The result may be wrong, however, by one digit, so this is just an approximation.
However, if that's acceptable, you could always add 1 to the result to get an upper bound on the digit count, or subtract 1 to get a lower bound.
You can first convert the BigInteger to a BigDecimal and then use this answer to compute the number of digits. This seems more efficient than using BigInteger.toString() as that would allocate memory for String representation.
private static int numberOfDigits(BigInteger value) {
return significantDigits(new BigDecimal(value));
}
private static int significantDigits(BigDecimal value) {
return value.scale() < 0
? value.precision() - value.scale()
: value.precision();
}
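As a quick check, the BigDecimal approach can be run against the sample values from the question. This is a small sketch with a hypothetical class name. It relies on the fact that a BigDecimal built from a BigInteger has scale 0, so precision() is the digit count, and it makes no claim about speed.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical class name for illustration.
class DigitCountDemo {

    static int numberOfDigits(BigInteger value) {
        // Scale is 0 for a BigDecimal created from a BigInteger,
        // so precision() equals the number of decimal digits.
        return new BigDecimal(value).precision();
    }

    public static void main(String[] args) {
        System.out.println(numberOfDigits(BigInteger.valueOf(99)));                     // 2
        System.out.println(numberOfDigits(BigInteger.valueOf(1234)));                   // 4
        System.out.println(numberOfDigits(BigInteger.valueOf(9999)));                   // 4
        System.out.println(numberOfDigits(new BigInteger("12345678901234567890")));     // 20
    }
}
```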
This is another way to do it that is faster than the convert-to-String method. It is not the best run time, but still a reasonable 0.65 seconds versus 2.46 seconds with the convert-to-String method (at 180000 digits).
This method computes the integer part of the base-10 logarithm from the given value. However, instead of using loop-divide, it uses a technique similar to Exponentiation by Squaring.
Here is a crude implementation that achieves the runtime mentioned earlier:
public static BigInteger log(BigInteger base,BigInteger num)
{
/* The technique tries to get the products among the squares of base
* close to the actual value as much as possible without exceeding it.
* */
BigInteger resultSet = BigInteger.ZERO;
BigInteger actMult = BigInteger.ONE;
BigInteger lastMult = BigInteger.ONE;
BigInteger actor = base;
BigInteger incrementor = BigInteger.ONE;
while(actMult.multiply(base).compareTo(num)<1)
{
int count = 0;
while(actMult.multiply(actor).compareTo(num)<1)
{
lastMult = actor; //Keep the old squares
actor = actor.multiply(actor); //Square the base repeatedly until the value exceeds
if(count>0) incrementor = incrementor.multiply(BigInteger.valueOf(2));
//Update the current exponent of the base
count++;
}
if(count == 0) break;
/* If there is no way to multiply the "actMult"
* with squares of the base (including the base itself)
* without keeping it below the actual value,
* it is the end of the computation
*/
actMult = actMult.multiply(lastMult);
resultSet = resultSet.add(incrementor);
/* Update the product and the exponent
* */
actor = base;
incrementor = BigInteger.ONE;
//Reset the values for another iteration
}
return resultSet;
}
public static int digits(BigInteger num)
{
if(num.equals(BigInteger.ZERO)) return 1;
if(num.compareTo(BigInteger.ZERO)<0) num = num.multiply(BigInteger.valueOf(-1));
return log(BigInteger.valueOf(10),num).intValue()+1;
}
Hope this helps.

generating random integers between 0 and some value where half are in the set (0,5] and the other half (5,x]

I am looking for a way to generate a random integer from 0-x, where x is defined at runtime by the human user. However, half of those numbers must be greater than zero and less than or equal to 5 (0,5] and the other half must be in the set of [6,x].
I know that the following code will generate a number from 0-x. The main problem is ensuring that half of them will be in the set of (0,5]
Math.random() * x;
I'm not looking for someone to do this for me, just looking for some hints. Thank you!
You could first flip a coin and based on that generate upper or lower number:
final Random rnd = new Random();
while (true)
System.out.println(rnd.nextBoolean()? rnd.nextInt(6) : 6 + rnd.nextInt(x-5));
Or, using the unwieldy Math.random() (bound to have trouble at the edges of the range):
while (true)
System.out.println(Math.floor(
Math.random() < 0.5 ? (Math.random() * 6) : (6 + (x-5) * Math.random())
));
Consider this as a hint only :)
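The coin-flip hint can be written out as a self-contained method. This is a sketch under one assumption: the lower half is taken as 1..5, because the question asks for (0,5], whereas the snippet above uses nextInt(6) and so also produces 0. The class name is hypothetical.

```java
import java.util.Random;

// Hypothetical class name for illustration.
class HalfAndHalf {

    /** Returns 1..5 with probability 1/2 and 6..x with probability 1/2. */
    static int next(Random rnd, int x) {
        if (x < 6) {
            throw new IllegalArgumentException("x must be at least 6");
        }
        return rnd.nextBoolean()
                ? 1 + rnd.nextInt(5)        // uniform over 1..5
                : 6 + rnd.nextInt(x - 5);   // uniform over 6..x
    }

    public static void main(String[] args) {
        Random rnd = new Random();
        for (int i = 0; i < 10; i++) {
            System.out.println(next(rnd, 20));
        }
    }
}
```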
I'd do this:
double halfX= x / 2.0;
double random = Math.random() * x;
if( random< halfX ) {
random = random*5.0/(halfX);
} else {
random = (random/halfX - 1) * (x-5.0) + 5.0 ;
}
I think it is good now. This is less understandable and readable, but has only one call to random for each invocation. Apart from the fact MarkoTopolnic pointed out: the user needed an integer... I'd have to calculate what rounding would do to the distribution.
This is absolutely not easy... My head aches, so the best I can come up with:
double halfX= x / 2.0 + 1.0;
double random = Math.random() * (x+2.0);
int randomInt;
if( random< halfX ) {
randomInt = (int) (random*6.0/(halfX)); //truncating, means equal distribution from 0-5
} else {
randomInt = (int) ((random/halfX - 1.0) * (x-5.0) + 6.0) ; //notice x-5.0, this range before truncation is actually from 6.0 to x+1.0, after truncating it gets to [6;x], as this is integer
}
I'm not sure about the second part, though... A few hours of sleep would get it right... I hope the intentions and logic are clear, though...
In case anyone is curious, here's the solution I came up with based on Marko's solution.
I had the following class defined for another part of this program.
public class BooleanSource
{
private double probability;
BooleanSource(double p) throws IllegalArgumentException
{
if(p < 0.0)
throw new IllegalArgumentException("Probability too small");
if(p > 1.0)
throw new IllegalArgumentException("Probability too large");
probability = p;
}
public boolean occurs()
{
return (Math.random() < probability);
}
}
With that, I did the following
private static void setNumItems(Customer c, int maxItems)
{
BooleanSource numProb = new BooleanSource(0.5);
int numItems;
if(numProb.occurs())
{
double num = (Math.random()*4)+1;
numItems = (int) Math.round(num);
}
else
{
double num = 5 + (maxItems-5)*Math.random();
numItems = (int) Math.round(num);
}
c.setNumItems(numItems);
}

Efficient implementation of mutual information in Java

I'm looking to calculate mutual information between two features, using Java.
I've read Calculating Mutual Information For Selecting a Training Set in Java already, but that was a discussion of whether mutual information was appropriate for the poster, with only some light pseudo-code as to the implementation.
My current code is below, but I'm hoping there is a way to optimise it, as I have large quantities of information to process. I'm aware that calling out to another language/framework may improve speed, but would like to focus on solving this in Java for now.
Any help much appreciated.
public static double calculateNewMutualInformation(double frequencyOfBoth, double frequencyOfLeft,
double frequencyOfRight, int noOfTransactions) {
if (frequencyOfBoth == 0 || frequencyOfLeft == 0 || frequencyOfRight == 0)
return 0;
// supp = f11
double supp = frequencyOfBoth / noOfTransactions; // P(x,y)
double suppLeft = frequencyOfLeft / noOfTransactions; // P(x)
double suppRight = frequencyOfRight / noOfTransactions; // P(y)
double f10 = (suppLeft - supp); // P(x) - P(x,y)
double f00 = (1 - suppRight) - f10; // (1 - P(y)) - (P(x) - P(x,y))
double f01 = (suppRight - supp); // P(y) - P(x,y)
// -1 * ((P(x) * log(Px)) + ((1 - P(x)) * log(1-p(x)))
double HX = -1 * ((suppLeft * MathUtils.logWithoutNaN(suppLeft)) + ((1 - suppLeft) * MathUtils.logWithoutNaN(1 - suppLeft)));
// -1 * ((P(y) * log(Py)) + ((1 - P(y)) * log(1-p(y)))
double HY = -1 * ((suppRight * MathUtils.logWithoutNaN(suppRight)) + ((1 - suppRight) * MathUtils.logWithoutNaN(1 - suppRight)));
double one = (supp * MathUtils.logWithoutNaN(supp)); // P(x,y) * log(P(x,y))
double two = (f10 * MathUtils.logWithoutNaN(f10));
double three = (f01 * MathUtils.logWithoutNaN(f01));
double four = (f00 * MathUtils.logWithoutNaN(f00));
double HXY = -1 * (one + two + three + four);
return (HX + HY - HXY) / (HX == 0 ? MathUtils.EPSILON : HX);
}
public class MathUtils {
public static final double EPSILON = 0.000001;
public static double logWithoutNaN(double value) {
if (value == 0) {
return Math.log(EPSILON);
} else if (value < 0) {
return 0;
}
return Math.log(value);
}
}
I have found the following to be fast, but I have not compared it against your method - only that provided in weka.
It works on the premise of re-arranging the MI equation so that it is possible to minimise the number of floating point operations:
We start by defining probability as count/frequency over the number of samples/transactions. So, we define the number of items as n, the number of times x occurs as |x|, the number of times y occurs as |y|, and the number of times they co-occur as |x,y|. We then get:
$$ I(X;Y) = \sum_{x}\sum_{y} \frac{|x,y|}{n} \log\left(\frac{|x,y|/n}{(|x|/n)\,(|y|/n)}\right) $$
Now we can rearrange that by flipping the bottom of the inner divide, which gives us (n|x,y|)/(|x||y|). Also, compute N = 1/n so we have one less divide operation. This gives us:
$$ I(X;Y) = \sum_{x}\sum_{y} N\,|x,y| \log\left(\frac{n\,|x,y|}{|x|\,|y|}\right) $$
This gives us the following code:
/***
* Computes MI between variables t and a. Assumes that a.length == t.length.
* @param a candidate variable a
* @param avals number of values a can take (max(a) == avals - 1)
* @param t target variable
* @param tvals number of values t can take (max(t) == tvals - 1)
* @return the mutual information between a and t
*/
static double computeMI(int[] a, int avals, int[] t, int tvals) {
double numinst = a.length;
double oneovernuminst = 1/numinst;
double sum = 0;
// longs are required here because of big multiples in calculation
long[][] crosscounts = new long[avals][tvals];
long[] tcounts = new long[tvals];
long[] acounts = new long[avals];
// Compute counts for the two variables
for (int i=0;i<a.length;i++) {
int av = a[i];
int tv = t[i];
acounts[av]++;
tcounts[tv]++;
crosscounts[av][tv]++;
}
for (int tv=0;tv<tvals;tv++) {
for (int av=0;av<avals;av++) {
if (crosscounts[av][tv] != 0) {
// Main fraction: (n|x,y|)/(|x||y|)
double sumtmp = (numinst*crosscounts[av][tv])/(acounts[av]*tcounts[tv]);
// Log bit (|x,y|/n) and update product
// (log2 was undefined in the original; assuming it converts the natural log to base 2)
sum += oneovernuminst*crosscounts[av][tv]*Math.log(sumtmp)/Math.log(2);
}
}
}
return sum;
}
This code assumes that the values of a and t are not sparse (i.e. min(t)=0 and tvals=max(t)+1) for it to be efficient. Otherwise (as commented) large and unnecessary arrays are created.
I believe this approach improves further when computing MI between several variables at once (the count operations can be condensed - especially that of the target). The implementation I use is one that interfaces with WEKA.
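To sanity-check the rearranged formula, here is a compact, self-contained sketch of the same counting approach with two tiny inputs. The class and method names are hypothetical, the result is reported in bits (an assumption about the intended unit), and values are assumed to lie in 0..aVals-1 and 0..tVals-1.

```java
// Hypothetical class name; a condensed variant of the counting approach above.
class MutualInfoSketch {

    static double mutualInformationBits(int[] a, int aVals, int[] t, int tVals) {
        int n = a.length;
        long[][] joint = new long[aVals][tVals];
        long[] aCounts = new long[aVals];
        long[] tCounts = new long[tVals];
        // Count occurrences and co-occurrences in one pass.
        for (int i = 0; i < n; i++) {
            aCounts[a[i]]++;
            tCounts[t[i]]++;
            joint[a[i]][t[i]]++;
        }
        double sum = 0;
        for (int av = 0; av < aVals; av++) {
            for (int tv = 0; tv < tVals; tv++) {
                long c = joint[av][tv];
                if (c != 0) {
                    // Inner fraction (n|x,y|)/(|x||y|), weighted by |x,y|.
                    double ratio = ((double) n * c) / ((double) aCounts[av] * tCounts[tv]);
                    sum += c * Math.log(ratio);
                }
            }
        }
        // Apply the 1/n factor once and convert from nats to bits.
        return sum / (n * Math.log(2));
    }

    public static void main(String[] args) {
        int[] a = {0, 0, 1, 1};
        // Identical variables: one full bit of shared information.
        System.out.println(mutualInformationBits(a, 2, new int[] {0, 0, 1, 1}, 2));
        // Independent variables: no shared information.
        System.out.println(mutualInformationBits(a, 2, new int[] {0, 1, 0, 1}, 2));
    }
}
```

Pulling the 1/n factor out of the loop is the same kind of saving the answer aims for: fewer floating-point operations per cell.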
Finally, it might be more efficient even to take the log out of the summations. But I am unsure whether log or power will take more computation within the loop. This is done by:
Apply a*log(b) = log(b^a)
Move the log to outside the summations, using log(a)+log(b) = log(ab)
and gives:
$$ I(X;Y) = N \log \prod_{x}\prod_{y} \left(\frac{n\,|x,y|}{|x|\,|y|}\right)^{|x,y|} $$
I am not a mathematician, but...
There are just a bunch of floating-point calculations here. Some mathemagician might be able to reduce this to fewer calculations; try the Math SE.
Meanwhile, you should be able to use a static final double for Math.log(EPSILON)
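A sketch of that suggestion, using a hypothetical class name and a new constant LOG_EPSILON that is not in the original code:

```java
// Hypothetical variant of the MathUtils class from the question.
class MathUtilsSketch {

    static final double EPSILON = 0.000001;
    // Computed once when the class is loaded instead of on every call.
    static final double LOG_EPSILON = Math.log(EPSILON);

    static double logWithoutNaN(double value) {
        if (value == 0) {
            return LOG_EPSILON;
        } else if (value < 0) {
            return 0;
        }
        return Math.log(value);
    }
}
```

This only saves work for inputs that are exactly zero, so the gain depends on how often that case occurs in the data.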
Your problem might not be a single call but the volume of data for which this calculation has to be done. That problem is better solved by throwing more hardware at it.
