Mutual Information: Calculation example (Java) in contingency table style

I am using the pointwise mutual information (PMI) association measure to quantify how strongly words co-occur, based on word frequencies obtained from a large corpus.
I am calculating PMI via the classical formula
log(P(X,Y) / (P(X)*P(Y)))
and via the contingency table notation with joint and marginal frequencies that I found on http://collocations.de/AM/index.html
The results I get are very similar, but not the same. As far as I understand, both methods should produce exactly the same value.
I wrote a small Java program (minimal working example) that applies both formulae to word frequencies from a corpus. I get different results for the two methods. Does someone know why?
public class MutualInformation
{
public static void main(String[] args)
{
long N = 1024908267229L;
// mutual information = log(P(X,Y) / (P(X) * P(Y)))
double XandY = (double) 1210738 / N;
double X = (double) 67360790 / N;
double Y = (double) 1871676 / N;
System.out.println(Math.log(XandY / (X * Y)) / Math.log(10));
System.out.println("------");
// contingency table notation as on www.collocations.de
long o11 = 1210738;
long o12 = 67360790;
long o21 = 1871676;
long c1 = o11 + o21;
long r1 = o11 + o12;
double e11 = ((double) r1 * c1 / N);
double frac = (double) o11 / e11;
System.out.println(Math.log(frac) / Math.log(10));
}
}

Let's write it in the same terms:
long o11 = 1210738;
long o12 = 67360790;
long o21 = 1871676;
long N = 1024908267229L;
The first equation is
XandY = o11 / N;
X = o12 / N;
Y = o21 / N;
so
XandY / (X * Y)
is
(o11 / N) / (o12 / N * o21 / N)
or
o11 * N / (o12 * o21)
Note there is no adding going on.
The second equation is rather different.
c1 = o11 + o21;
r1 = o11 + o12;
e11 = ((double) r1 * c1 / N);
frac = (double) o11 / e11;
so
e11 = (o11 + o21) * (o11 + o12) /N;
frac = (o11 * N) / (o11^2 + o11 * o12 + o21 * o11 + o21 * o12);
I would expect these to be different as mathematically they are not the same.
I suggest you write what you want as maths first, and then find the most efficient way of coding it.
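For what it's worth, the two notations do agree once the table is filled in the way collocations.de defines it: there, R1 = O11 + O12 and C1 = O11 + O21 are the marginal frequencies themselves, so O12 and O21 would be the marginals minus the joint count. A minimal sketch, assuming 67360790 and 1871676 are the marginal frequencies f(X) and f(Y); the class and method names are made up for illustration:

```java
class PmiContingencySketch {
    // PMI (base 10) straight from probabilities: log10(P(X,Y) / (P(X) * P(Y)))
    static double pmiFromProbabilities(long joint, long fx, long fy, long n) {
        double pXY = (double) joint / n;
        double pX = (double) fx / n;
        double pY = (double) fy / n;
        return Math.log10(pXY / (pX * pY));
    }

    // PMI (base 10) from the contingency table: log10(O11 / E11), E11 = R1 * C1 / N
    static double pmiFromContingency(long joint, long fx, long fy, long n) {
        long o11 = joint;
        long o12 = fx - joint; // X occurs without Y (assumed meaning of the cell)
        long o21 = fy - joint; // Y occurs without X (assumed meaning of the cell)
        long r1 = o11 + o12;   // row total, equals fx
        long c1 = o11 + o21;   // column total, equals fy
        double e11 = (double) r1 * c1 / n;
        return Math.log10(o11 / e11);
    }

    public static void main(String[] args) {
        long n = 1024908267229L;
        System.out.println(pmiFromProbabilities(1210738L, 67360790L, 1871676L, n));
        System.out.println(pmiFromContingency(1210738L, 67360790L, 1871676L, n));
    }
}
```

With the cells derived this way, both methods should print the same value up to floating-point rounding.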

Related

Karatsuba Algorithm without BigInteger in Java, unexpected behaviour while recursion

I want to implement the Karatsuba algorithm in Java without using the BigInteger class. Following the pseudo-code and this question, I came up with the following code:
public static long recKaratsuba(long i1, long i2){
if(i1<10 || i2<10) {
return i1*i2 ;
}
long len = Math.round(Long.toString(Math.max(i1,i2)).length());
long N = Math.round(len/2) ;
long b = (long) (i1% Math.pow(10, N)) ;
long a = (long) (i1/ Math.pow(10,N));
long d = (long) (i2% Math.pow(10,N)) ;
long c = (long) (i2/ Math.pow(10,N));
//System.out.println("a,b,c,d :" + a + b + c + d);
long ac = recKaratsuba(a, c) ;
long bd = recKaratsuba(b, d) ;
long pos = recKaratsuba(a+b,c+d) ;
return ((long)(bd + ac*Math.pow(10,len) + (pos -ac -bd)*Math.pow(10,N) )) ;
}
The problem is that it produces the wrong answer: 1234*5678 gives 11686652, but it should be 7006652.
As a beginner in Java and algorithms, I am unable to pinpoint the exact bug in this code. I do realize that this program is very inefficient and doesn't work for more than 4 digits (according to the linked question), but this is what I came up with intuitively after learning the pseudo-code.
So my question is: what is the problem in my code, and how do I implement this algorithm without using BigInteger?
There are a few things I noticed:
Instead of i1 and i2, maybe use x and y
Variables len and N should be int, not long
No need to round the maximum of the lengths of the string representations: lengths are ints, and ints are whole numbers that can't be rounded
No need to round the division by 2: dividing an int by an int always yields an int (integer division is performed)
The error is in the return statement: Math.pow(10, len) should be Math.pow(10, 2 * N) instead; this matters when len is odd, because then 2 * N is len - 1 rather than len
Avoid repeating identical calculations, especially Math.pow(10, N)
The fixed code gives the correct results for all examples that I have tested.
public static long recKaratsuba2(long x, long y) {
if (x < 10 || y < 10) {
return x * y;
}
int len = Long.toString(Math.max(x, y)).length();
double den = Math.pow(10, len / 2);
long a = (long) (x / den);
long b = (long) (x % den);
long c = (long) (y / den);
long d = (long) (y % den);
long ac = recKaratsuba2(a, c);
long bd = recKaratsuba2(b, d);
long pos = recKaratsuba2(a + b, c + d);
return (long) (bd + den * (ac * den + (pos - ac - bd)));
}
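Since den is a double in the code above, intermediate values pass through floating point. Here is a sketch that stays in long arithmetic throughout, assuming non-negative inputs whose product fits in a long. The class name and the pow10 helper are hypothetical, not part of the answer above:

```java
class KaratsubaLongSketch {
    // 10^e computed with integer arithmetic only
    static long pow10(int e) {
        long p = 1;
        for (int i = 0; i < e; i++) {
            p *= 10;
        }
        return p;
    }

    static long karatsuba(long x, long y) {
        if (x < 10 || y < 10) {
            return x * y; // single-digit base case
        }
        int len = Long.toString(Math.max(x, y)).length();
        long den = pow10(len / 2); // split point, 10^N with N = len / 2
        long a = x / den, b = x % den;
        long c = y / den, d = y % den;
        long ac = karatsuba(a, c);
        long bd = karatsuba(b, d);
        long cross = karatsuba(a + b, c + d) - ac - bd; // equals ad + bc
        return ac * den * den + cross * den + bd;       // den * den is 10^(2N)
    }

    public static void main(String[] args) {
        System.out.println(karatsuba(1234, 5678));
    }
}
```

Writing the high term as ac * den * den makes the 10^(2N) factor explicit, which is the same correction as in the fixed return statement.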

Finding the cos of an angle in java

I was asked to find the cos following this equation:
I was able to find the sin of the angle, however when finding the cos, the number I would get is quite different from the correct value:
I used the following code for finding the cos.
PS: I can't use Math.cos.
public static double cos(double x, int n){
// declaring cos and factorial
double cos = 0.0;
// this loop determines how long does the loop go so the answer is more accurate
for (long howlong = 1 ; howlong <=n; howlong++){
double factorial =1;
// this will calculate the factorial for even numbers ex/ 2*2 = 4 , 4-2 = 2
// for the denominator
for (int factorialnumber=1; factorialnumber<=2*howlong-2; factorialnumber++){
factorial = factorial * howlong;
}
// now we need to create the pattern for the + and -
// so we use % that switches the sign everytime i increments by 1
if (howlong%2==1){
cos = cos + (double) (Math.pow(x, 2*howlong-2)/factorial);
}
else{
cos = cos - (double) (Math.pow(x, 2*howlong-2)/factorial);
}
}
return cos;
}
Edit: I figured out my mistake: I was multiplying factorial by howlong instead of factorialnumber.
You have two bugs.
(Bug 1) Where you wrote
factorial = factorial * howlong;
it should have been
factorial = factorial * factorialnumber;
(Bug 2) You're not resetting your factorials on each iteration through the outer loop. So you need to move the line
double factorial =1;
down a couple of lines, so that it's inside the outer loop.
If you make those two changes, then the result of cos(Math.PI / 6, 10) is 0.8660254037844386 which seems correct to me.
The computation of your factorial was wrong.
Try it with this code:
public static double cos(double x, int n) {
// declaring cos and factorial
double cos = 0.0;
// this loop determines how long does the loop go so the answer is more
// accurate
for (long howlong = 1; howlong <= n; howlong++) {
// now we need to create the pattern for the + and -
// so we use % that switches the sign everytime i increments by 1
if (howlong % 2 == 1) {
cos = cos + Math.pow(x, 2 * howlong - 2) / factorial(2 * howlong - 2);
}
else {
cos = cos - Math.pow(x, 2 * howlong - 2) / factorial(2 * howlong - 2);
}
}
return cos;
}
public static long factorial(long n) {
long result = 1;
for (int i = 2; i <= n; i++) {
result *= i;
}
return result;
}
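Note that factorial returns a long, so it overflows once 2 * howlong - 2 exceeds 20. One way around that, sketched here as an alternative and not what the answers above do, is to derive each term from the previous one, so that neither Math.pow nor a factorial is needed. The class name is made up:

```java
class CosSeriesSketch {
    // Sums the first n terms of cos(x) = 1 - x^2/2! + x^4/4! - ...
    static double cos(double x, int n) {
        double sum = 0.0;
        double term = 1.0; // term for k = 0: x^0 / 0!
        for (int k = 0; k < n; k++) {
            sum += term;
            // next term = -term * x^2 / ((2k + 1) * (2k + 2))
            term = -term * x * x / ((2.0 * k + 1) * (2.0 * k + 2));
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(cos(Math.PI / 6, 10));
        System.out.println(Math.cos(Math.PI / 6)); // reference value for comparison only
    }
}
```

Each term costs a few multiplications, and the sign flip comes for free from the recurrence.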
Your factorial calculation is not correct; please change it to
double value = 1;
for (int factorialnumber = 1; factorialnumber <= 2 * howlong - 2; factorialnumber++) {
value = factorialnumber * value;
}
factorial = value;
System.out.println(value + " " + (2 * howlong - 2));

Find the Maximum Step in a Staircase

I am solving a staircase problem and have come up with several candidate solutions. The problem is as follows:
Question: You are given a number of stairs, N. What is the maximum step you can make?
For N = 5, the maximum step you can make is 2, because
5 = 1 + 2 + 2
Similarly, for 8 it is 8 = 1 + 2 + 3 + 2, so the maximum step is 3.
Similarly, for 16 it is 16 = 1 + 2 + 3 + 4 + 5 + 1, so the maximum step is 5.
When the next number is less than the previous one, the series stops.
Clearly, the maximum step is the largest number in the series.
Solution 1:
I came up with a simple solution. It works fine, but it is not optimized.
The code is below:
public static long stairCase(long N) {
long i = 1;
long curr = N;
while (i < N) {
curr = curr - i;
if (i >= curr) {
return i;
}
i = i + 1;
}
return i;
}
Solution 2:
Then I figured out that the sum is n(n+1)/2. So I set n(n+1)/2 = number of stairs, but I can't get the solution by calculating the roots and taking the larger one. My code is below, but it doesn't work for N = 16 and many other cases.
int b = 1;
int var = (1) - (4 * -c);
double temp1 = Math.sqrt(var);
double root1 = (-b + temp1) / (2 * a);
double root2 = (-b - temp1) / (2 * a);
double root1Abs = Math.abs(root1);
double root2Abs = Math.abs(root2);
return (long) (root1Abs > root2Abs ? Math.floor(root1Abs) : Math
.floor(root2Abs));
Solution 3:
I came up with another solution, but it still doesn't work for N = 4 and many other cases. Below is my code:
double answer = Math.sqrt(c * 2);
return (long) (Math.floor(answer));
Does anyone have an optimized solution (preferably in constant time)? The input can be very large (long).
m = number of stairs
n = result
The inequality is
n * (n+1) <= 2m
The solution is
n <= (sqrt(8*m+1)-1)/2
We want the largest such integer, so
n = floor((sqrt(8*m+1)-1)/2)
The Java code:
import java.io.*;
public class Solution {
public static int staircase(int m){
return (int) Math.floor((Math.sqrt(8*(double)m+1)-1)/2);
}
public static void main(String[] args){
System.out.println("Result:"+staircase(16));
}
}
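The question mentions that the input is a long, and for very large values Math.sqrt on a double can be off by one. Here is a sketch of the same formula with an integer correction step, assuming 0 <= m <= 10^18 so that the triangular numbers stay within long range. The class and helper names are illustrative:

```java
class StaircaseSketch {
    // n * (n + 1) / 2, dividing the even factor first to delay overflow
    static long triangular(long n) {
        return (n % 2 == 0) ? (n / 2) * (n + 1) : n * ((n + 1) / 2);
    }

    // Largest n with n * (n + 1) / 2 <= m
    static long staircase(long m) {
        // floating-point estimate from the closed form
        long n = (long) ((Math.sqrt(8.0 * m + 1) - 1) / 2);
        // correct the estimate in case rounding pushed it too high or too low
        while (n > 0 && triangular(n) > m) {
            n--;
        }
        while (triangular(n + 1) <= m) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println("Result:" + staircase(16));
    }
}
```

The two loops should run at most a step or two, so the method remains effectively constant time.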
Actually, I figured out the solution myself: I should use 2 * c
int a = 1;
int b = 1;
int var = (1) - (4 * -2 * c);
double temp1 = Math.sqrt(var);
double root1 = (-b + temp1) / (2 * a);
double root2 = (-b - temp1) / (2 * a);
return (long) (root1 > root2 ? Math.floor(root1) : Math.floor(root2));

Java calculate confidence interval

I'm looking for a method, with or without parameters, that calculates a confidence interval.
I don't want the Apache methods,
just a simple method or some piece of code that does this.
My knowledge is restricted, it basically boils down to completing an online task against an expected set of answers (https://www.hackerrank.com/challenges/stat-warmup).
However, as far as I read up, there are mistakes in the given answer, and I'd like to correct these.
My source is pretty much wikipedia https://en.wikipedia.org/wiki/Confidence_interval#Basic_Steps
/**
*
* @return double[]{lower, upper}, i.e. double array with the lower and upper boundary of the 95% confidence interval for the given numbers
*/
private static double[] calculateLowerUpperConfidenceBoundary95Percent(int[] givenNumbers) {
// calculate the mean value (= average)
double sum = 0.0;
for (int num : givenNumbers) {
sum += num;
}
double mean = sum / givenNumbers.length;
// calculate standard deviation
double squaredDifferenceSum = 0.0;
for (int num : givenNumbers) {
squaredDifferenceSum += (num - mean) * (num - mean);
}
double variance = squaredDifferenceSum / givenNumbers.length;
double standardDeviation = Math.sqrt(variance);
// value for 95% confidence interval, source: https://en.wikipedia.org/wiki/Confidence_interval#Basic_Steps
double confidenceLevel = 1.96;
double temp = confidenceLevel * standardDeviation / Math.sqrt(givenNumbers.length);
return new double[]{mean - temp, mean + temp};
}
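For reference, here is a self-contained variant of the method above that can be run directly. It differs in one respect, which is an assumption about what is wanted: it uses the sample standard deviation (dividing by n - 1) instead of the population one. It still relies on the normal-approximation value 1.96. Names are illustrative:

```java
class ConfidenceIntervalSketch {
    // Returns {lower, upper} of an approximate 95% confidence interval for the mean
    static double[] interval95(double[] data) {
        int n = data.length;
        double sum = 0.0;
        for (double v : data) {
            sum += v;
        }
        double mean = sum / n;

        double squaredDiffSum = 0.0;
        for (double v : data) {
            squaredDiffSum += (v - mean) * (v - mean);
        }
        // sample standard deviation: divide by n - 1 (assumption, see above)
        double sampleStdDev = Math.sqrt(squaredDiffSum / (n - 1));

        // half-width of the interval: z * s / sqrt(n), with z = 1.96 for 95%
        double margin = 1.96 * sampleStdDev / Math.sqrt(n);
        return new double[] {mean - margin, mean + margin};
    }

    public static void main(String[] args) {
        double[] ci = interval95(new double[] {2, 4, 4, 4, 5, 5, 7, 9});
        System.out.println("[ " + ci[0] + ", " + ci[1] + " ]");
    }
}
```

For small samples a Student's t quantile would normally replace 1.96; that is left out here to keep the sketch short.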
Here you go; this code calculates a confidence interval:
/**
*
* @author alaaabuzaghleh
*/
public class TestCI {
public static void main(String[] args) {
int maximumNumber = 100000;
int num = 0;
double[] data = new double[maximumNumber];
// first pass: read in data, compute sample mean
double dataSum = 0.0;
while (num<maximumNumber) {
data[num] = num*10;
dataSum += data[num];
num++;
}
double ave = dataSum / num;
double variance1 = 0.0;
for (int i = 0; i < num; i++) {
variance1 += (data[i] - ave) * (data[i] - ave);
}
double variance = variance1 / (num - 1);
double standardDaviation= Math.sqrt(variance);
// half-width of the interval for the mean: z * s / sqrt(n)
double margin = 1.96 * standardDaviation / Math.sqrt(num);
double lower = ave - margin;
double higher = ave + margin;
// print results
System.out.println("average = " + ave);
System.out.println("sample variance = " + variance);
System.out.println("sample standard daviation = " + standardDaviation);
System.out.println("approximate confidence interval");
System.out.println("[ " + lower + ", " + higher + " ]");
}
}

Differential Equations in Java

I am trying to create a simple simulation program for the SIR epidemic model in Java.
Basically, SIR is defined by a system of three differential equations:
S'(t) = - l(t) * S(t)
I'(t) = l(t) * S(t) - g(t) * I(t)
R'(t) = g(t) * I(t)
S - susceptible people, I - infected people, R - recovered people.
l(t) = [c * x * I(t)] / N(t)
c - number of contacts, x - infectiveness (probability to get sick after contact with sick person), N(t) - total population (which is constant).
How can I solve such differential equations in Java? I don't think I know any useful way to do that, so my implementation produces rubbish.
public class Main {
public static void main(String[] args) {
int tppl = 100;
double sppl = 1;
double hppl = 99;
double rppl = 0;
int numContacts = 50;
double infectiveness = 0.5;
double lamda = 0;
double duration = 0.5;
double gamma = 1 / duration;
for (int i = 0; i < 40; i++) {
lamda = (numContacts * infectiveness * sppl) / tppl;
hppl = hppl - lamda * hppl;
sppl = sppl + lamda * hppl - gamma * sppl;
rppl = rppl + gamma * sppl;
System.out.println (i + " " + tppl + " " + hppl + " " + sppl + " " + rppl);
}
}
}
I would greatly appreciate any help, many thanks in advance!
Time-series differential equations can be simulated numerically by taking dt = a small number, and using one of several numerical integration techniques e.g. Euler's method, or Runge-Kutta. Euler's method may be primitive but it works OK for some equations and it's simple enough that you might give it a try. e.g.:
S'(t) = - l(t) * S(t)
I'(t) = l(t) * S(t) - g(t) * I(t)
R'(t) = g(t) * I(t)
int N = 100;
double[] S = new double[N+1];
double[] I = new double[N+1];
double[] R = new double[N+1];
S[0] = /* initial value */
I[0] = /* initial value */
R[0] = /* initial value */
double dt = total_time / N;
for (int i = 0; i < N; ++i)
{
double t = i*dt;
double l = /* compute l here */
double g = /* compute g here */
/* calculate derivatives */
double dSdt = - l * S[i];
double dIdt = l * S[i] - g * I[i];
double dRdt = g * I[i];
/* now integrate using Euler */
S[i+1] = S[i] + dSdt * dt;
I[i+1] = I[i] + dIdt * dt;
R[i+1] = R[i] + dRdt * dt;
}
The tough part is figuring out how many steps to use. You should read one of the articles I have linked to. More sophisticated differential equation solvers use variable step sizes that adapt to accuracy/stability for each step.
I would actually recommend using numerical software like R or Mathematica or MATLAB or Octave, as they include ODE solvers and you wouldn't need to go to all the trouble yourself. But if you need to do this as part of a larger Java application, at least try it out first with math software, then get a sense of what the step sizes are and what solvers work.
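To make the Euler outline concrete, here is a self-contained sketch that plugs in the numbers from the question: 99 susceptible, 1 infected, c = 50, x = 0.5, and gamma = 1 / 0.5 = 2. The step size dt = 0.001 and the 10000 steps are assumptions chosen only so that the update stays stable; the class and method names are made up:

```java
class SirEulerSketch {
    // Returns {S, I, R} after the given number of Euler steps of size dt
    static double[] simulate(double s0, double i0, double r0,
                             double contacts, double infectiveness,
                             double gamma, double dt, int steps) {
        double s = s0, i = i0, r = r0;
        double n = s0 + i0 + r0; // total population, constant
        for (int k = 0; k < steps; k++) {
            // l(t) = c * x * I(t) / N
            double lambda = contacts * infectiveness * i / n;
            // derivatives, all computed from the current state
            double dS = -lambda * s;
            double dI = lambda * s - gamma * i;
            double dR = gamma * i;
            // Euler update
            s += dS * dt;
            i += dI * dt;
            r += dR * dt;
        }
        return new double[] {s, i, r};
    }

    public static void main(String[] args) {
        double[] end = simulate(99, 1, 0, 50, 0.5, 2.0, 0.001, 10000);
        System.out.println("S=" + end[0] + " I=" + end[1] + " R=" + end[2]);
    }
}
```

Note that all three derivatives are computed before any of the state variables is updated. The loop in the question updates hppl first and then reuses the new value, which is one reason its output drifts.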
Good luck!
