In an effort to learn and use hidden Markov models, I am writing my own code to implement them. I am using this wiki article to help with my work. I do not wish to resort to pre-written libraries, because I have found I can achieve a better understanding if I write it myself. And no, this isn't a school assignment! :)
Unfortunately, my highest level of education consists of high school computer science and statistics. I have no background in Machine Learning besides the casual poking around with ANN libraries and TensorFlow. I am therefore having a bit of trouble translating mathematical equations into code. Specifically, I'm worried my implementations of the alpha and beta functions aren't functionally correct. If anyone can assist in describing where I messed up and how to correct my mistakes to have a functioning HMM implementation, it'd be greatly appreciated.
Here are my class-wide globals:
public int n; //number of states
public int t; //number of observations
public int time; //iteration holder
public double[][] emitprob; //Emission parameter
public double[][] stprob; //State transition parameter
public ArrayList<String> states, observations, x, y;
My constructor:
public Model(ArrayList<String> sts, ArrayList<String> obs)
{
//the most important algorithm we need right now is
//unsupervised learning through BM. Supervised is
//pretty easy.
//need hashtable of count objects... Aya...
//perhaps a learner...?
states = sts;
observations = obs;
n = states.size();
t = observations.size();
x = new ArrayList();
y = new ArrayList();
time = 0;
stprob = new double[n][n];
emitprob = new double[n][t];
stprob = newDistro(n,n);
emitprob = newDistro(n,t);
}
The newDistro method creates a new random distribution, normalized so that all of its entries sum to 1:
public double[][] newDistro(int x, int y)
{
Random r = new Random(System.currentTimeMillis());
double[][] returnme = new double[x][y];
double sum = 0;
for(int i = 0; i < x; i++)
{
for(int j = 0; j < y; j++)
{
returnme[i][j] = Math.abs(r.nextInt());
sum += returnme[i][j];
}
}
for(int i = 0; i < x; i++)
{
for(int j = 0; j < y; j++)
{
returnme[i][j] /= sum;
}
}
return returnme;
}
My viterbi algorithm implementation:
public ArrayList<String> viterbi(ArrayList<String> obs)
{
//K means states
//T means observations
//T arrays should be constructed as K * T (N * T)
ArrayList<String> path = new ArrayList();
String firstObservation = obs.get(0);
int firstObsIndex = observations.indexOf(firstObservation);
double[] pi = new double[n]; //initial probs of first obs for each st
int ts = obs.size();
double[][] t1 = new double[n][ts];
double[][] t2 = new double[n][ts];
int[] y = new int[obs.size()];
for(int i = 0; i < obs.size(); i++)
{
y[i] = observations.indexOf(obs.get(i));
}
for(int i = 0; i < n; i++)
{
pi[i] = emitprob[i][firstObsIndex];
}
for(int i = 0; i < n; i++)
{
t1[i][0] = pi[i] * emitprob[i][y[0]];
t2[i][0] = 0;
}
for(int i = 1; i < ts; i++)
{
for(int j = 0; j < n; j++)
{
double maxValue = 0;
int maxIndex = 0;
//first we compute the max value
for(int q = 0; q < n; q++)
{
double value = t1[q][i-1] * stprob[q][j];
if(value > maxValue)
{
maxValue = value; //the max
maxIndex = q; //the argmax
}
}
t1[j][i] = emitprob[j][y[i]] * maxValue;
t2[j][i] = maxIndex;
}
}
int[] z = new int[ts];
int maxIndex = 0;
double maxValue = 0.0d;
for(int k = 0; k < n; k++)
{
double myValue = t1[k][ts-1];
if(myValue > maxValue)
{
myValue = maxValue;
maxIndex = k;
}
}
path.add(states.get(maxIndex));
for(int i = ts-1; i >= 2; i--)
{
z[i-1] = (int)t2[z[i]][i];
path.add(states.get(z[i-1]));
}
System.out.println(path.size());
for(String s: path)
{
System.out.println(s);
}
return path;
}
My forward algorithm, which relies on the alpha function described later:
public double forward(ArrayList<String> obs)
{
double result = 0;
int length = obs.size()-1;
for(int i = 0; i < n; i++)
{
result += alpha(i, length, obs);
}
return result;
}
The remaining functions are for implementing the Baum-Welch Algorithm.
The alpha function is the one I'm most afraid I've gotten wrong. I had trouble understanding in which "direction" it needs to iterate over the sequence: do I start from the last element (size-1) or the first (index zero)?
public double alpha(int j, int t, ArrayList<String> obs)
{
double sum = 0;
if(t == 0)
{
return stprob[0][j];
}
else
{
String lastObs = obs.get(t);
int obsIndex = observations.indexOf(lastObs);
for(int i = 0; i < n; i++)
{
sum += alpha(i, t-1, obs) * stprob[i][j] * emitprob[j][obsIndex];
}
}
return sum;
}
I'm having similar "correctness" issues with my beta function:
public double beta(int i, int t, ArrayList<String> obs)
{
double result = 0;
int obsSize = obs.size()-1;
if(t == obsSize)
{
return 1;
}
else
{
String lastObs = obs.get(t+1);
int obsIndex = observations.indexOf(lastObs);
for(int j = 0; j < n; j++)
{
result += beta(j, t+1, obs) * stprob[i][j] * emitprob[j][obsIndex];
}
}
return result;
}
I'm more confident in my gamma function; however, since it explicitly requires alpha and beta, I'm worried it'll be "off" somehow.
public double gamma(int i, int t, ArrayList<String> obs)
{
double top = alpha(i, t, obs) * beta(i, t, obs);
double bottom = 0;
for(int j = 0; j < n; j++)
{
bottom += alpha(j, t, obs) * beta(j, t, obs);
}
return top / bottom;
}
Same for my "squiggle" function. I apologize for the naming; I'm not sure of the actual name for the symbol.
public double squiggle(int i, int j, int t, ArrayList<String> obs)
{
String lastObs = obs.get(t+1);
int obsIndex = observations.indexOf(lastObs);
double top = alpha(i, t, obs) * stprob[i][j] * beta(j, t+1, obs) * emitprob[j][obsIndex];
double bottom = 0;
double innerSum = 0;
double outterSum = 0;
for(i = 0; i < n; i++)
{
for(j = 0; j < n; j++)
{
innerSum += alpha(i, t, obs) * stprob[i][j] * beta(j, t+1, obs) * emitprob[j][obsIndex];
}
outterSum += innerSum;
}
return top / bottom;
}
Lastly, to update my state transition and emission probability arrays, I have implemented these functions as aStar and bStar.
public double aStar(int i, int j, ArrayList<String> obs)
{
double squiggleSum = 0;
double gammaSum = 0;
int T = obs.size()-1;
for(int t = 0; t < T; t++)
{
squiggleSum += squiggle(i, j, t, obs);
gammaSum += gamma(i, t, obs);
}
return squiggleSum / gammaSum;
}
public double bStar(int i, String v, ArrayList<String> obs)
{
double top = 0;
double bottom = 0;
for(int t = 0; t < obs.size()-1; t++)
{
if(obs.get(t).equals(v))
{
top += gamma(i, t, obs);
}
bottom += gamma(i, t, obs);
}
return top / bottom;
}
In my understanding, since the b* function includes a piecewise function that returns either 1 or 0, I think implementing it in an "if" statement and only adding the result if the string is equal to the observation history is the same as what is described, since the function would render the call to gamma 0, thus saving a little computation time. Is this correct?
In summary, I want to get my math right, to ensure a successful (albeit simple) HMM implementation. As for the Baum-Welch algorithm, I am having trouble understanding how to implement the complete function. Would it be as simple as running aStar over all states (as an n * n for loop) and bStar for all observations, inside a loop with a convergence check? Also, what would be a best-practice function for checking for convergence without overfitting?
Please let me know everything I need to do in order to get this right.
Thank you very much for any help you can give me!
To avoid underflow, use a scaling factor in the forward and backward algorithms. To get the correct result, compute alpha with nested for loops rather than recursion; in the forward method the steps run forward through the observations, starting at index zero.
The backward method is similar to the forward one, except that it steps from the last observation back to the first.
You invoke both of them from the method that implements the Baum-Welch algorithm.
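As a rough illustration of the scaled forward pass described above, here is a minimal sketch. It is not taken from the question's code: the class name `ScaledForward`, the separate start-probability vector `pi`, and the use of integer symbol indices instead of strings are all assumptions made for the example, and each row of `a` and `b` is assumed to sum to 1. After every time step the alpha values are rescaled to sum to 1, and the logarithms of the scaling factors are accumulated, so the method returns log P(observations) without underflowing on long sequences.

```java
/** Sketch of an iterative forward pass with per-step scaling for a discrete HMM. */
final class ScaledForward {

    /**
     * Returns log P(obs | model).
     * pi[i]   : probability of starting in state i (assumed input, not in the original code)
     * a[i][j] : probability of moving from state i to state j
     * b[i][k] : probability of emitting symbol k while in state i
     * obs[t]  : index of the symbol observed at time t
     */
    static double logLikelihood(double[] pi, double[][] a, double[][] b, int[] obs) {
        int n = pi.length;
        double[] alpha = new double[n];

        // Initialisation at t = 0: start probability times emission probability.
        for (int i = 0; i < n; i++) {
            alpha[i] = pi[i] * b[i][obs[0]];
        }
        double logProb = Math.log(normalize(alpha));

        // Induction: step forward through the remaining observations.
        for (int t = 1; t < obs.length; t++) {
            double[] next = new double[n];
            for (int j = 0; j < n; j++) {
                double sum = 0.0;
                for (int i = 0; i < n; i++) {
                    sum += alpha[i] * a[i][j];
                }
                next[j] = sum * b[j][obs[t]];
            }
            // Rescale so the values stay in a safe range; keep the factor in log space.
            logProb += Math.log(normalize(next));
            alpha = next;
        }
        return logProb;
    }

    /** Divides v by its sum in place and returns that sum (the scaling factor). */
    static double normalize(double[] v) {
        double sum = 0.0;
        for (double x : v) {
            sum += x;
        }
        for (int i = 0; i < v.length; i++) {
            v[i] /= sum;
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] pi = {0.6, 0.4};
        double[][] a = {{0.7, 0.3}, {0.4, 0.6}};
        double[][] b = {{0.5, 0.5}, {0.1, 0.9}};
        System.out.println(Math.exp(logLikelihood(pi, a, b, new int[] {0, 1})));
    }
}
```

A backward pass can reuse the same scaling factors, which is why implementations usually store them per time step; this sketch only keeps their running log-sum.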
What I'm trying to do is figure out a way to print the outliers of standard deviation here. The outliers are defined as having a variance greater than 2x the standard deviation. I can't figure out how but I've started by creating a boolean flag, however I don't understand the dynamics of this. Could someone please help me figure out how to print out the outliers somehow? Thanks.
public class Main {
public static void main(String[] args)
{
{
Algebra n = new Algebra();
System.out.println(" ");
System.out.println("The maximum number is: " + n.max);
System.out.println("The minimum is: " + n.min);
System.out.println("The mean is: " + n.avg);
System.out.println("The standard deviation is " + n.stdev);
}
}
}
2nd part:
public class Algebra
{
static int[] n = createArray();
int max = displayMaximum(n);
int min = displayMinimum(n);
double avg = displayAverage(n);
double stdev = displayStdDev(n);
public boolean outliers() {
for(int i = 0; i < n.length; i++)
{
boolean flag = (n[i] < stdev*2);
}
return
}
public Algebra()
{
this(n);
System.out.println("The numbers that are outliers are ");
for(int i = 0; i < n.length; i++)
{
System.out.print(" " + (n[i] < stdev*2));
}
}
public Algebra(int[] n)
{
createArray();
}
public static int[] createArray()
{
int[] n = new int[100];
for(int i = 0; i < n.length; i++)
n[i] = (int)(Math.random()*100 + 1);
return n;
}
public int displayMaximum(int[] n)
{
int maxValue = n[0];
for(int i=1; i < n.length; i++){
if(n[i] > maxValue){
maxValue = n[i];
}
}
return maxValue;
}
public int displayMinimum(int[] n)
{
int minValue = n[0];
for(int i=1;i<n.length;i++){
if(n[i] < minValue){
minValue = n[i];
}
}
return minValue;
}
protected double displayAverage(int[] n)
{
int sum = 0;
double mean = 0;
for (int i = 0; i < n.length; i++) {
sum += n[i];
mean = sum / n.length;
}
return mean;
}
protected double displayStdDev(int[] n)
{
int sum = 0;
double mean = 0;
for (int i = 0; i < n.length; i++) {
sum = sum + n[i];
mean = sum/ n.length;
}
double squareSum = 0.0;
for (int i = 0; i < n.length; i++)
{
squareSum += Math.pow(n[i] - mean, 2);
}
return Math.sqrt((squareSum) / (n.length - 1));
}
}
Variance is the average of the squared differences from the mean; for a single value, the relevant quantity is its squared difference from the mean. This is a fairly straightforward calculation.
public static double variance(double val, double mean) {
return Math.pow(val - mean, 2);
}
You define an outlier as a value whose variance is greater than 2x the standard deviation.
public static boolean isOutlier(double val, double mean, double std) {
return variance(val, mean) > 2*std;
}
You then just need to iterate through the values and print any values that are evaluated as an outlier.
public void printOutliers() {
for (int i : n) {
if (isOutlier(i, avg, stdev)) {
...
}
}
}
You should note that if one value is classified as an outlier and subsequently removed, values previously classified as outliers may no longer be. You may also be interested in the extent of an outlier in the current set; one value may be an outlier to a greater extent than another.
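For comparison, a widely used convention flags a value when its absolute distance from the mean exceeds some multiple of the standard deviation, which keeps both sides of the comparison in the same units. The sketch below follows that convention rather than the squared-difference test above; the class name `OutlierSketch` and the multiplier parameter `k` are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: collect values lying more than k sample standard deviations from the mean. */
final class OutlierSketch {

    static double mean(int[] data) {
        double sum = 0.0; // double accumulator avoids integer division
        for (int v : data) {
            sum += v;
        }
        return sum / data.length;
    }

    /** Sample standard deviation (divides by n - 1, as in the question's code). */
    static double sampleStdDev(int[] data) {
        double m = mean(data);
        double squareSum = 0.0;
        for (int v : data) {
            squareSum += (v - m) * (v - m);
        }
        return Math.sqrt(squareSum / (data.length - 1));
    }

    /** Returns every value whose distance from the mean exceeds k standard deviations. */
    static List<Integer> outliers(int[] data, double k) {
        double m = mean(data);
        double sd = sampleStdDev(data);
        List<Integer> result = new ArrayList<>();
        for (int v : data) {
            if (Math.abs(v - m) > k * sd) {
                result.add(v);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[] data = {10, 10, 10, 10, 10, 10, 10, 10, 10, 100};
        System.out.println("Outliers: " + outliers(data, 2.0));
    }
}
```

Returning a list instead of printing inside the loop leaves the caller free to print, count, or remove the flagged values.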
I have this method where I find the distances with an Euclidean algorithm and save the values as double in an array of doubles. Now I need to find the minimum value of each test and return the value indexed.
public static double distance() {
for (int i = 0; i < GetFile.testMatrix.length;) {
double[] distances = new double[4000];
double minDistance = 999999;
for (int j = 0; j < GetFile.trainingMatrix.length; j++) {
distances[j] = EuclideanDistance.findED(GetFile.trainingMatrix[j], GetFile.testMatrix[i]);
}
return minDistance;
}
return 0;
}
I would appreciate any help. Thanks in advance
It doesn't look as though you need to store the result in an array at all, since that's being discarded. You should consider tracking the result in minDistance every time you get a result from findED.
This also looks like the kind of thing that would be a lot easier to understand if you used Java's streams, if you're using a version of Java that has access to them.
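A minimal sketch of tracking the running minimum, along with the index where it occurred, might look like this. The `euclidean` helper is only a stand-in for `EuclideanDistance.findED`, whose implementation is not shown in the question, and the class and method names are hypothetical.

```java
/** Sketch: find the training row closest to a test row without storing all distances. */
final class NearestSketch {

    /** Stand-in for EuclideanDistance.findED; assumes both rows have the same length. */
    static double euclidean(double[] p, double[] q) {
        double sum = 0.0;
        for (int k = 0; k < p.length; k++) {
            double d = p[k] - q[k];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Returns the index of the closest training row, or -1 if there are no rows. */
    static int nearestIndex(double[][] training, double[] test) {
        int bestIndex = -1;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int j = 0; j < training.length; j++) {
            double d = euclidean(training[j], test);
            if (d < bestDistance) { // keep the smallest distance seen so far
                bestDistance = d;
                bestIndex = j;
            }
        }
        return bestIndex;
    }

    public static void main(String[] args) {
        double[][] training = {{0, 0}, {3, 4}, {1, 1}};
        System.out.println(nearestIndex(training, new double[] {1, 2}));
    }
}
```

Starting from positive infinity avoids having to guess an upper bound such as 999999 for the distances.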
There are several ways to do so.
One way would be iterating over the values and taking always the smallest:
double minDistance = distances[0];
for(int j =1 ;j < GetFile.trainingMatrix.length; j++){
if(distances[j]<minDistance)
minDistance=distances[j];
}
or alternatively
double minDistance = distances[0];
for(int j =1 ;j < GetFile.trainingMatrix.length; j++){
minDistance = Math.min(minDistance, distances[j]);
}
or using streams (with distances as List):
double minDistance = distances.stream().mapToDouble(e -> e).min().getAsDouble();
or even nicer (with your for-loop completely implemented):
double minDistance = Stream.iterate(0,j -> j+1)
.limit(GetFile.trainingMatrix.length)
.mapToDouble(j->EuclideanDistance.findED(GetFile.trainingMatrix[j], GetFile.testMatrix[i]))
.min().orElse(-1);
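Another way, staying close to your code:
public static double distance() {
for (int i = 0; i < GetFile.testMatrix.length;) {
// Size the array to the data so that unused zero-filled slots cannot win the minimum
double[] distances = new double[GetFile.trainingMatrix.length];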
for (int j = 0; j < GetFile.trainingMatrix.length; j++) {
distances[j] = EuclideanDistance.findED(GetFile.trainingMatrix[j], GetFile.testMatrix[i]);
}
return getMinDistance(distances);
}
return 0;
}
static double getMinDistance(double[] distances) {
double minDistance = Double.MAX_VALUE;
for (double distance : distances) {
minDistance = Math.min(distance, minDistance);
}
return minDistance;
}
Initialize minDistance with distances[0] unless you are very sure about the maximum value that the distances array will contain.
public static double distance() {
for (int i = 0; i < GetFile.testMatrix.length;) {
double[] distances = new double[4000];
double minDistance;
for (int j = 0; j < GetFile.trainingMatrix.length; j++) {
distances[j] = EuclideanDistance.findED(GetFile.trainingMatrix[j], GetFile.testMatrix[i]);
}
minDistance = distances[0];
// Scan only the filled slots; "k" avoids clashing with the outer loop variable "i"
for (int k = 1; k < GetFile.trainingMatrix.length; k++) {
if (minDistance > distances[k]) {
minDistance = distances[k];
}
}
return minDistance;
}
return 0;
}
I searched the net and found a lot of errors like mine, but I can't find an exact solution for my matrix multiplication. I use the code below, and I have a lot of matrix multiplications to do. First I take the transpose of matrix A, which is ATP, then multiply it by P:
n1=ATP*P;
Then I multiply n1 by A:
N1=n1*A;
With the code below, that first product is correct (I checked it against MATLAB), but the multiplication of n1 and A gives an error.
P is a matrix like P={{100,0,0,0...}{0,100,0,0...},......{...0,0,100}}
What is the problem?
Thanks a lot.
public static double[][] multiply(double[][] a, double[][] b) {
int rowsa = a.length;
int columnsa = a[0].length;
int rowsb = b.length;
int columnsb = b[0].length;
double c[][] = new double[rowsa][columnsb];
try {
if (columnsa == rowsb) {
for (int ii = 0; ii < rowsa; ii++) {
for (int jj = 0; jj < columnsb; jj++) {
c[ii][jj] = 0;
for (int kk = 0; kk < rowsb; kk++) {
c[ii][jj] += a[ii][kk] * b[kk][jj];
}
}
}
} else {
}
} catch (Exception e) {
System.out.println("Matris is empty");
}
return c;
}
I call the code like
double ATP[][] = transpose(A);
double n1[][] = multiply(ATP, PP);
double N1[][] = multiply(n1, A);
double n2[][] = multiply(n1, ll);
double N11[][] = inverse(N1);
double kats[][] = multiply(N11, n2);
I can't give the values of A because they change: I read the time and calculate the elements of the matrix from it. Every step adds a row to A. It starts as A(4,56) and continues as A(5,56).
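The post does not show the actual error message, so the following is only a diagnostic aid, not a confirmed fix. The multiply method above returns a zero-filled matrix without any warning when the column count of a differs from the row count of b (the else branch is empty), which can hide a shape mismatch until a later step. A sketch of a variant that reports the shapes instead, with a hypothetical class name:

```java
/** Sketch: matrix product that fails loudly when the shapes do not match. */
final class CheckedMultiply {

    static double[][] multiply(double[][] a, double[][] b) {
        if (a.length == 0 || b.length == 0) {
            throw new IllegalArgumentException("Matrix is empty");
        }
        int rowsA = a.length;
        int colsA = a[0].length;
        int rowsB = b.length;
        int colsB = b[0].length;
        if (colsA != rowsB) {
            // Report both shapes so the offending step is easy to identify.
            throw new IllegalArgumentException(
                "Cannot multiply " + rowsA + "x" + colsA + " by " + rowsB + "x" + colsB);
        }
        double[][] c = new double[rowsA][colsB]; // Java initialises the entries to 0.0
        for (int i = 0; i < rowsA; i++) {
            for (int j = 0; j < colsB; j++) {
                for (int k = 0; k < colsA; k++) {
                    c[i][j] += a[i][k] * b[k][j];
                }
            }
        }
        return c;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}};
        double[][] b = {{5, 6}, {7, 8}};
        System.out.println(java.util.Arrays.deepToString(multiply(a, b)));
    }
}
```

If A gains a row at every step while P keeps a fixed size, a check like this would show at which product the dimensions stop agreeing.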
I'm trying to write a method that calculates the exponential of a square matrix. In this instance, the matrix is a square array of value:
[1 0]
[0 10]
and the method should return a value of:
[e 0]
[0 e^10]
However, when I run my code, I get a range of values depending on what bits I've rearranged, none particularly close to the expected value.
The way the method works is to utilise the power series for the matrix, so basically for a matrix A, n steps and an identity matrix I:
exp(A) = I + A + 1/2!*AA + 1/3!*AAA + ... +1/n!*AAA..
The code follows here. The method where I'm having the issue is the method exponential(Matrix A, int nSteps). The methods involved are enclosed, and the Matrix objects take the arguments (int m, int n) to create an array of size double[m][n].
public static Matrix multiply(Matrix m1, Matrix m2){
if(m1.getN()!=m2.getM()) return null;
Matrix res = new Matrix(m1.getM(), m2.getN());
for(int i = 0; i < m1.getM(); i++){
for(int j = 0; j < m2.getN(); j++){
res.getArray()[i][j] = 0;
for(int k = 0; k < m1.getN(); k++){
res.getArray()[i][j] = res.getArray()[i][j] + m1.getArray()[i][k]*m2.getArray()[k][j];
}
}
}
return res;
}
public static Matrix identityMatrix(int M){
Matrix id = new Matrix(M, M);
for(int i = 0; i < id.getM(); i++){
for(int j = 0; j < id.getN(); j++){
if(i==j) id.getArray()[i][j] = 1;
else id.getArray()[i][j] = 0;
}
}
return id;
}
public static Matrix addMatrix(Matrix m1, Matrix m2){
Matrix m3 = new Matrix(m1.getM(), m2.getN());
for(int i = 0; i < m3.getM(); i++){
for(int j = 0; j < m3.getN(); j++){
m3.getArray()[i][j] = m1.getArray()[i][j] + m2.getArray()[i][j];
}
}
return m3;
}
public static Matrix scaleMatrix(Matrix m, double scale){
Matrix res = new Matrix(m.getM(), m.getN());
for(int i = 0; i < res.getM(); i++){
for(int j = 0; j < res.getN(); j++){
res.getArray()[i][j] = m.getArray()[i][j]*scale;
}
}
return res;
}
public static Matrix exponential(Matrix A, int nSteps){
Matrix runtot = identityMatrix(A.getM());
Matrix sum = identityMatrix(A.getM());
double factorial = 1.0;
for(int i = 1; i <= nSteps; i++){
sum = Matrix.multiply(Matrix.scaleMatrix(sum, factorial), A);
runtot = Matrix.addMatrix(runtot, sum);
factorial /= (double)i;
}
return runtot;
}
So my question is, how should I modify my code, so that I can input a matrix and a number of timesteps to calculate the exponential of said matrix after said timesteps?
My approach would be to keep two accumulators:
the sum, which is your approximation of exp(A)
the nth term of the series, M_n, that is A^n/n!
Note that there is a nice recursive relationship for M_n: M_{n+1} = M_n * A / (n+1)
Which yields:
public static Matrix exponential(Matrix A, int nSteps){
Matrix seriesTerm = identityMatrix(A.getM());
Matrix sum = identityMatrix(A.getM());
for(int i = 1; i <= nSteps; i++){
seriesTerm = Matrix.scaleMatrix(Matrix.multiply(seriesTerm,A),1.0/i);
sum = Matrix.addMatrix(seriesTerm, sum);
}
return sum;
}
I totally understand the sort of thrill that implementing such algorithms can give you. But if this is not a hobby project, I concur that you should use a library for this kind of stuff. Making such computations precise and efficient is really not a trivial matter, and a huge wheel to reinvent.
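To try the recurrence M_{n+1} = M_n * A / (n+1) without the question's Matrix class, here is a self-contained sketch on plain double[][] arrays. The class and method names are hypothetical, and it is a plain truncated power series with no claim to numerical robustness.

```java
/** Sketch: truncated power series for exp(A) using the term recurrence. */
final class MatrixExpSketch {

    static double[][] identity(int n) {
        double[][] id = new double[n][n];
        for (int i = 0; i < n; i++) {
            id[i][i] = 1.0;
        }
        return id;
    }

    /** Product of two square matrices of the same size. */
    static double[][] multiply(double[][] a, double[][] b) {
        int n = a.length;
        double[][] c = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                for (int k = 0; k < n; k++) {
                    c[i][j] += a[i][k] * b[k][j];
                }
            }
        }
        return c;
    }

    /** Returns I + A + A^2/2! + ... + A^nSteps/nSteps! for a square matrix A. */
    static double[][] exponential(double[][] a, int nSteps) {
        int n = a.length;
        double[][] term = identity(n); // M_0 = I
        double[][] sum = identity(n);
        for (int step = 1; step <= nSteps; step++) {
            term = multiply(term, a);  // M_{step-1} * A
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < n; j++) {
                    term[i][j] /= step;      // ... divided by step gives M_step
                    sum[i][j] += term[i][j]; // add the new term to the running sum
                }
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 0}, {0, 10}};
        System.out.println(java.util.Arrays.deepToString(exponential(a, 60)));
    }
}
```

The number of terms needed grows with the size of the entries: for the matrix in the question, the entry 10 needs several dozen terms before the series settles near e^10.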
Basically, my homework says to ask the user for a matrix A, then ask the user what power to raise matrix A to.
So basically,
I need to find a way to raise a matrix to a power. I can multiply matrices, but raising one to a power is harder because I must multiply it by itself. So what I do is create a variable to hold the matrix like so
for (i = 0; i < matrixARowSize; i++)
{
for (j = 0; j < matrixAColumnSize; j++)
{
for (k = 0; k < matrixARowSize; k++)
{
sum += matrixA[i][j] * matrixA[i][j];
}
matrixC[i][j] = sum;
sum = 0;
}
}
Then I would have to multiply to itself as much as the user wants to.
Eg:
matrixC[i][j] * matrixC[i][j]*matrixC[i][j] ...// etc
up to whatever power the user wants. I can do that with many If statements yes, but I also need to be able to add them together like so:
matrixC^6 + matrixC^5 + matrixC^4 ...
etc from whatever power the user wants. (Highest is 6).
Any suggestions on how to do this?
You can do this:
int raiseMethod(int val, int pow) {
int temp = val;
for (int i = 1; i < pow; i++) {
temp *= val;
}
return temp;
}
for (int i = 0; i < arrayRows; i++) {
for (int j = 0; j < arrayColumns; j++) {
array[i][j] = raiseMethod(array[i][j], powerToRaise);
}
}
This way, each position in the array is updated with its raised value. Note that this raises every element individually, which is not the same as multiplying the matrix by itself.
I believe you are looking for the Math.pow() method, which raises one number to the power of another, e.g.
sum += (int) Math.pow(matrixA[i][j], raiseByPower);
You can use binary exponentiation (exponentiation by squaring) on the matrix.
This Matrix structure contains everything you need.
#include <stdio.h>
#include <string.h>
const int SIZE = 6;
struct Matrix
{
int m[SIZE][SIZE];
Matrix()
{
memset(m,0,sizeof(m));
}
Matrix( int a[SIZE][SIZE] )
{
for(int i = 0;i<SIZE;++i)for(int j = 0;j<SIZE;++j)
{
m[i][j] = a[i][j];
}
}
Matrix operator * ( const Matrix &a )
{
Matrix ret;
for(int k = 0;k<SIZE;++k) for(int i = 0;i<SIZE;++i) for(int j = 0;j<SIZE;++j)
{
ret.m[i][j] += m[i][k] * a.m[k][j];
}
return ret;
}
Matrix operator ^ ( int P )
{
Matrix ret , a(this->m);
for(int i = 0;i<SIZE;++i)
ret.m[i][i] = 1;
while(P)
{
if( P&1 )
ret = ret * a;
a = a * a;
P >>= 1;
}
return ret;
}
Matrix operator + (const Matrix &a)
{
Matrix ret;
for(int i = 0;i<SIZE;++i) for(int j = 0;j<SIZE;++j)
{
ret.m[i][j] = m[i][j] + a.m[i][j];
}
return ret;
}
};
You can use this structure like the following:
Matrix A, B;
Matrix res = (A^6) + (B^5);
This power function performs O(log n) matrix multiplications, where n is the exponent.
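Since the question's own snippet uses plain nested loops over arrays, the same idea can be sketched in Java as follows. This is an assumed translation rather than part of the answer above: the class and method names are hypothetical, the matrices are taken to be square long[][] arrays, and overflow is not handled. The `sumOfPowers` helper covers the A^1 + A^2 + ... requirement from the question by reusing each power to build the next.

```java
/** Sketch: matrix power by repeated squaring, plus a sum of consecutive powers. */
final class MatrixPowerSketch {

    static long[][] identity(int n) {
        long[][] id = new long[n][n];
        for (int i = 0; i < n; i++) {
            id[i][i] = 1;
        }
        return id;
    }

    /** Product of two square matrices of the same size. */
    static long[][] multiply(long[][] a, long[][] b) {
        int n = a.length;
        long[][] c = new long[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                for (int k = 0; k < n; k++) {
                    c[i][j] += a[i][k] * b[k][j];
                }
            }
        }
        return c;
    }

    /** Returns base^exponent for exponent >= 0; exponent 0 gives the identity. */
    static long[][] power(long[][] base, int exponent) {
        if (exponent < 0) {
            throw new IllegalArgumentException("exponent must be non-negative");
        }
        long[][] result = identity(base.length);
        long[][] square = base;
        while (exponent > 0) {
            if ((exponent & 1) == 1) {       // this bit of the exponent is set
                result = multiply(result, square);
            }
            square = multiply(square, square);
            exponent >>= 1;
        }
        return result;
    }

    /** Returns A^1 + A^2 + ... + A^maxPower. */
    static long[][] sumOfPowers(long[][] a, int maxPower) {
        int n = a.length;
        long[][] sum = new long[n][n];
        long[][] term = identity(n);
        for (int p = 1; p <= maxPower; p++) {
            term = multiply(term, a);        // term now holds A^p
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < n; j++) {
                    sum[i][j] += term[i][j];
                }
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        long[][] a = {{1, 1}, {1, 0}};
        System.out.println(java.util.Arrays.deepToString(power(a, 6)));
        System.out.println(java.util.Arrays.deepToString(sumOfPowers(a, 6)));
    }
}
```

With a maximum power of 6, as in the question, the simple loop in `sumOfPowers` is already cheap; repeated squaring matters mainly for large exponents.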