Does anyone know of a scientific/mathematical library in Java that has a straightforward implementation of weighted linear regression? Something along the lines of a function that takes 3 arguments and returns the corresponding coefficients:
linearRegression(x,y,weights)
This seems fairly straightforward, so I imagine it exists somewhere.
PS) I've tried Flanagan's library: http://www.ee.ucl.ac.uk/~mflanaga/java/Regression.html. It has the right idea, but it seems to crash sporadically and complain about my degrees of freedom.
Not a library, but the code is posted: http://www.codeproject.com/KB/recipes/LinReg.aspx
(and includes the mathematical explanation for the code, which is a huge plus).
Also, it seems that there is another implementation of the same algorithm here: http://sin-memories.blogspot.com/2009/04/weighted-linear-regression-in-java-and.html
Finally, there is a library from a university in New Zealand (Weka) that seems to have it implemented: http://www.cs.waikato.ac.nz/~ml/weka/ (pretty decent Javadocs). The specific class is described here:
http://weka.sourceforge.net/doc/weka/classifiers/functions/LinearRegression.html
I was also searching for this, but I couldn't find anything. The reason might be that you can simplify the problem to the standard regression as follows:
The weighted linear regression without residual can be represented as

diag(sqrt(weights)) * y = diag(sqrt(weights)) * X * b

where multiplying a matrix by diag(sqrt(weights)) on the left multiplies each of its rows by the corresponding square-rooted weight. Therefore, the translation between weighted and unweighted regression without residual is trivial.
To translate a regression with residual y=Xb+u into a regression without residual y=Xb, you add an additional column to X - a new column with only ones.
Now that you know how to simplify the problem, you can use any library to solve the standard linear regression.
Here's an example, using Apache Commons Math:
import org.apache.commons.math3.stat.regression.OLSMultipleLinearRegression;

void linearRegression(double[] xUnweighted, double[] yUnweighted, double[] weights) {
    double[] y = new double[yUnweighted.length];
    double[][] x = new double[xUnweighted.length][2];
    for (int i = 0; i < y.length; i++) {
        y[i] = Math.sqrt(weights[i]) * yUnweighted[i];
        x[i][0] = Math.sqrt(weights[i]) * xUnweighted[i];
        x[i][1] = Math.sqrt(weights[i]); // the sqrt-weighted column of ones absorbs the intercept
    }
    OLSMultipleLinearRegression regression = new OLSMultipleLinearRegression();
    regression.setNoIntercept(true); // the intercept is already modeled by the extra column
    regression.newSampleData(y, x);
    double[] regressionParameters = regression.estimateRegressionParameters();
    double slope = regressionParameters[0];
    double intercept = regressionParameters[1];
    System.out.println("y = " + slope + "*x + " + intercept);
}
This can be explained intuitively as follows: in linear regression with u=0, if you take any point (x,y) and scale it to (Cx,Cy), the error for the new point also gets multiplied by C. In other words, linear regression already applies a higher weight to points with larger coordinates. Since we are minimizing the squared error, scaling a point by sqrt(w) multiplies its squared residual by w, which is why we take the square roots of the weights.
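If you'd rather avoid a dependency entirely, the same weighted fit for a single predictor can be computed directly from the closed-form normal equations. A minimal stdlib-only sketch (method and variable names are mine):

```java
public class WeightedRegression {

    // Closed-form weighted least squares for y = slope * x + intercept.
    // Minimizes sum(w[i] * (y[i] - slope * x[i] - intercept)^2).
    static double[] linearRegression(double[] x, double[] y, double[] w) {
        double sw = 0, swx = 0, swy = 0, swxx = 0, swxy = 0;
        for (int i = 0; i < x.length; i++) {
            sw   += w[i];
            swx  += w[i] * x[i];
            swy  += w[i] * y[i];
            swxx += w[i] * x[i] * x[i];
            swxy += w[i] * x[i] * y[i];
        }
        double slope = (sw * swxy - swx * swy) / (sw * swxx - swx * swx);
        double intercept = (swy - slope * swx) / sw;
        return new double[] { slope, intercept };
    }

    public static void main(String[] args) {
        // Exact data y = 2x + 1: the weights should not change an exact fit.
        double[] x = { 0, 1, 2, 3 };
        double[] y = { 1, 3, 5, 7 };
        double[] w = { 1, 2, 1, 2 };
        double[] p = linearRegression(x, y, w);
        System.out.println("slope = " + p[0] + ", intercept = " + p[1]); // 2.0 and 1.0
    }
}
```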
I personally used the org.apache.commons.math.stat.regression.SimpleRegression class of the Apache Commons Math library.
I also found a more lightweight class from Princeton University, but I didn't test it:
http://introcs.cs.princeton.edu/java/97data/LinearRegression.java.html
Here's a direct Java port of the C# code for weighted linear regression from the first link in Aleadam's answer:
https://github.com/lukehutch/WeightedLinearRegression.java
I am working on a function plotter project for Android in which the user inputs the equation as a string.
This string is evaluated using the EvalEx library, and I get a bunch of data points which I plot using the GraphView library.
The problem is that when I give it an equation with a negative square root, for example SQRT(1-x), it causes errors.
for (int i = 0; i < x.length; i++) {
    // the equation solver only takes BigDecimal as input
    BigDecimal x1 = new BigDecimal(x[i]);
    try {
        // eq is the string that I got from the EditText
        BigDecimal y1 = new Expression(eq).with("x", x1).eval();
        y[i] = y1.floatValue();
    } catch (ArithmeticException excp) {
        // these are the data points that go into the plot function
        x[i] = 0;
        y[i] = 0;
    }
}
I'm still a little unclear what you are trying to do, but I think I understand enough to give you a definitive answer.
Here's the thing:
The eval(...) function returns a single value.
That's what the API says. That's all it can do. You cannot avoid that fact. (There is no magic.)
So if you want to get both (real) square roots, you need to take the positive result returned by SQRT and negate it yourself to get the second solution. In your code. Something like this:
y1 = new Expression("SQRT(3 - x^2)").with("x", x1).eval();
y2 = BigDecimal.ZERO - y1;
Of course, this is special-case code1. And there is no general-case code that is going to give you multiple solutions to equations when the SQRT functions could be anywhere in your expression.
And expressions with complex solutions will be even more intractable with the EvalEx API. You cannot represent a complex solution using the (single) BigDecimal that is returned by the eval method. (If you look at the code, taking a square root of a negative number throws the API's ExpressionException.)
The bottom line is that EvalEx is a simple, light-weight expression evaluator. It is not designed for your use-case which involves finding all solutions, and / or dealing with complex numbers. And making the existing API work for these use-cases would be ... impossible.
But the good news is that the source code for EvalEx is available on GitHub.
https://github.com/uklimaschewski/EvalEx (I assume this corresponds to the version you are using.)
You could download it and use it as the starting point for writing a more sophisticated expression evaluator.
Or ... you could look for an alternative library that does what you need.
1 - That is, it is implemented with the pre-knowledge of what the expression we are evaluating is! Moreover, it still fails for values of x where 3 - x^2 is negative.
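Applying that negate-the-positive-root idea point by point, a plotter can draw both branches of the curve. Here is a minimal stdlib-only sketch (the radicand check stands in for catching EvalEx's exception, and the names are mine):

```java
public class SqrtBranches {

    // For each x, return {+sqrt(3 - x^2), -sqrt(3 - x^2)}, or {NaN, NaN}
    // where the expression has no real value (the point is then skipped).
    static double[][] bothBranches(double[] xs) {
        double[][] points = new double[xs.length][2];
        for (int i = 0; i < xs.length; i++) {
            double radicand = 3 - xs[i] * xs[i];
            if (radicand >= 0) {
                double y1 = Math.sqrt(radicand); // what SQRT(...) would return
                points[i][0] = y1;
                points[i][1] = -y1;              // the second solution
            } else {
                points[i][0] = Double.NaN;
                points[i][1] = Double.NaN;
            }
        }
        return points;
    }

    public static void main(String[] args) {
        double[][] p = bothBranches(new double[] { 0.0, 1.0, 2.0 });
        System.out.println(p[0][0] + " " + p[0][1]); // +sqrt(3) and -sqrt(3)
        System.out.println(p[2][0]);                 // NaN, since 3 - 4 < 0
    }
}
```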
I'm using the non-linear least squares Levenberg-Marquardt algorithm in Java to fit a number of exponential curves (A + B*exp(C*x)). Although the data is quite clean and approximates the model well, the algorithm fails to fit the majority of them even with an excessive number of iterations (5000-6000). For the curves it can fit, it does so in about 150 iterations.
LeastSquaresProblem problem = new LeastSquaresBuilder()
        .start(start)
        .model(jac)
        .target(dTarget)
        .lazyEvaluation(false)
        .maxEvaluations(5000)
        .maxIterations(6000)
        .build();

LevenbergMarquardtOptimizer optimizer = new LevenbergMarquardtOptimizer();
LeastSquaresOptimizer.Optimum optimum = optimizer.optimize(problem);
My question is how would I define a convergence criteria in apache commons in order to stop it hitting a max number of iterations?
I don't believe Java is your problem. Let's address the mathematics.
This problem is easier to solve if you change your function.
Your assumed equation is:
y = A + B*exp(C*x)
It'd be easier if you could do this:
y-A = B*exp(C*x)
Now A is just a constant that can be zero or whatever value you need to shift the curve up or down. Let's call that variable z:
z = B*exp(C*x)
Taking the natural log of both sides:
ln(z) = ln(B*exp(C*x))
We can simplify that right hand side to get the final result:
ln(z) = ln(B) + C*x
Transform your (x, y) data to (x, z) and you can use least squares fitting of a straight line where C is the slope in (x, z) space and ln(B) is the intercept. Lots of software available to do that.
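A minimal stdlib-only sketch of that transform, assuming A has already been subtracted out (so all z values are positive); the names are mine:

```java
public class ExpFit {

    // Fit z = B * exp(C * x) by ordinary least squares on ln(z) = ln(B) + C * x.
    // Assumes A is already subtracted out, so every z[i] must be positive.
    static double[] fitExponential(double[] x, double[] z) {
        int n = x.length;
        double sx = 0, sy = 0, sxx = 0, sxy = 0;
        for (int i = 0; i < n; i++) {
            double lnz = Math.log(z[i]);
            sx  += x[i];
            sy  += lnz;
            sxx += x[i] * x[i];
            sxy += x[i] * lnz;
        }
        double c = (n * sxy - sx * sy) / (n * sxx - sx * sx); // slope in (x, ln z)
        double lnB = (sy - c * sx) / n;                       // intercept in (x, ln z)
        return new double[] { Math.exp(lnB), c };             // {B, C}
    }

    public static void main(String[] args) {
        double[] x = { 0, 1, 2, 3 };
        double[] z = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            z[i] = 2.0 * Math.exp(0.5 * x[i]); // exact data with B = 2, C = 0.5
        }
        double[] bc = fitExponential(x, z);
        System.out.println("B = " + bc[0] + ", C = " + bc[1]);
    }
}
```

Note that fitting in log space minimizes relative rather than absolute error, so even if you keep the Levenberg-Marquardt fit, this result makes a much better starting point than an arbitrary guess.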
Within a Java project I've developed, I need to calculate the inverse of a matrix. In order to align with other projects and other developers, I'm using the Efficient Java Matrix Library (org.ejml).
For inverting the matrix I'm using invert from org.ejml.ops.CommonOps, and it had worked fine until now, when I'm getting an unexpected result.
I've isolated the case that doesn't work to be:
DenseMatrix64F X = new DenseMatrix64F(3, 3);
X.setData(new double[]{
     77.44000335693366,  -24.64000011444091,  -8.800000190734865,
    -24.640000114440916,   7.839999732971196,  2.799999952316285,
     -8.800000190734865,   2.799999952316285,  1.0000000000000004});

DenseMatrix64F invX = new DenseMatrix64F(3, 3);
boolean completed = CommonOps.invert(X, invX);

System.out.println(X);
System.out.println(invX);
System.out.println(completed);
The output I get from this test is:
Type = dense , numRows = 3 , numCols = 3
77.440 -24.640 -8.800
-24.640 7.840 2.800
-8.800 2.800 1.000
Type = dense , numRows = 3 , numCols = 3
NaN -Infinity Infinity
NaN Infinity -Infinity
NaN -Infinity Infinity
true
My first thought was that it could be a singular matrix and therefore not invertible, but after testing the same matrix with a different calculation tool I've found that it is not singular.
So I went back to the EJML documentation and found out the following information for this particular function.
If the algorithm could not invert the matrix then false is returned. If it returns true that just means the algorithm finished. The results could still be bad because the matrix is singular or nearly singular.
And, in this particular case the matrix is not singular but we could say it is near singular.
The only solution I could think of was to search the inverted matrix for NaN or infinite values after calculating it, and if I find something funny in there, to just replace the inverted matrix with the original matrix. Although it doesn't seem a very clean practice, it yields reasonable results.
My question is:
Could you think of any solution for this situation? Something smarter and wiser than just using the original matrix as its own inverse matrix.
In case there is no way around it, do you know of any other Java matrix library that has a solution to this situation? I'm not looking forward to introducing a new library, but it may be the only option if this becomes a real problem.
Regards and thanks for your inputs!
You should try using SVD if you have to have an inverse; also consider a pseudo-inverse instead. Basically, any library using LU decomposition will have serious issues here. Here's the output from Octave; note how two of the singular values are almost zero. Octave will give you an inverse with real numbers, but it's a poor one...
octave:7> cond(B)
ans = 8.5768e+17
octave:8> svd(B)
ans =
8.6280e+01
3.7146e-15
1.0060e-16
inv(B)*B
warning: inverse: matrix singular to machine precision, rcond = 4.97813e-19
ans =
0.62500 0.06250 0.03125
0.00000 0.00000 0.00000
0.00000 0.00000 4.00000
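The near-singularity is also visible by hand: the matrix from the question is (up to rounding noise) the outer product v*v^T of v = (8.8, -2.8, -1), so its rank is 1 and its determinant is essentially zero, which is exactly what kills an LU-based inverse. A quick stdlib-only check, not using EJML:

```java
public class DetCheck {

    // Determinant of a 3x3 matrix by cofactor expansion.
    static double det3(double[][] m) {
        return m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
             - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
             + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
    }

    public static void main(String[] args) {
        double[][] m = {
            {  77.44000335693366,  -24.64000011444091,  -8.800000190734865 },
            { -24.640000114440916,   7.839999732971196,  2.799999952316285 },
            {  -8.800000190734865,   2.799999952316285,  1.0000000000000004 }
        };
        // Effectively zero: LU-based inversion cannot succeed on this matrix.
        System.out.println(det3(m));
    }
}
```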
I'm doing some video processing, for each frame I need to get a gradient of a bi-variate function.
The function is represented as a two-dimensional array of doubles: the domain is the row and column indices, and the range is the double value at those indices. More simply put, the function f is defined for a double[][] matrix as:
f(x,y)=matrix[x][y]
I'm trying to use the Apache Commons Math library for it:
SmoothingPolynomialBicubicSplineInterpolator interpolator =
        new SmoothingPolynomialBicubicSplineInterpolator();
BicubicSplineInterpolatingFunction f = interpolator.interpolate(xs, ys, matrix.getData());

for (int i = 0; i < ans.length; i++) {
    for (int j = 0; j < ans[0].length; j++) {
        ans[i][j] = f.partialDerivativeY(i, j);
    }
}
with xs, as a sorted array of the x indices (0,1,...,matrix.getRowDimension() - 1)
ys the same on the columns dimension (0,1,...,matrix.getColumnDimension() - 1)
The problem is that for a typical matrix of size 150x80 it takes as much as 1.4 seconds to run, which makes it completely unusable for my needs. So, as a novice user of this library, and of programmatic numerical analysis in general, I want to know:
Am I doing something wrong?
Is there another, faster way I can accomplish this task?
Is there another open source library (preferably maven-friendly) that offers a solution?
Numerical differentiation is an entire topic unto itself; a simple Google search should bring up enough material for you to work with (the Wikipedia article alone might be sufficient). There are parameters of your problem that I cannot know, so I can only speak broadly here, but there are direct methods of determining the gradient at a given point, i.e. ones that don't require interpolation. See Wikipedia for the formulae, ranging from the simple f(x+1)-f(x) (the case h=1) to higher-order ones. Calculating the partial derivatives is then a simple O(NM) loop with a very easy formula inside (no interpolation required).
The specifics can get gritty:

The higher-order formulae need to be reduced at the edges, or discarded altogether.

Your precise speed requirements might render more complex formulae useless (depending on the platform, sometimes the lookup times for higher-order formulae make them too slow; again, it depends on the cache etc.). This is easy to test, as the formulae are simple; code them and benchmark.

The specific implementation also depends on your error requirements. The theory provides error bounds, so they will play a role in which formula you need; but again, there's a trade-off with speed. These requirements in turn can be relaxed in practice if you know specifics about the types of matrices you'll be processing.
The implementation can be made even easier (and maybe faster) if you have existing convolution tools, since this method is really just a convolution of the matrix (note; technically it's called a cross-correlation).
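A minimal sketch of that direct approach with h = 1 (central differences in the interior, one-sided differences at the borders); the names are mine:

```java
public class Gradient {

    // Gradient of f(x, y) = m[x][y] by finite differences: a single O(N*M)
    // pass, no interpolation. grad[i][j] holds {df/dx, df/dy} at (i, j).
    static double[][][] gradient(double[][] m) {
        int rows = m.length, cols = m[0].length;
        double[][][] grad = new double[rows][cols][2];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                // df/dx: central difference where possible, one-sided at the edges
                if (i == 0)             grad[i][j][0] = m[1][j] - m[0][j];
                else if (i == rows - 1) grad[i][j][0] = m[i][j] - m[i - 1][j];
                else                    grad[i][j][0] = (m[i + 1][j] - m[i - 1][j]) / 2.0;
                // df/dy: the same idea along the columns
                if (j == 0)             grad[i][j][1] = m[i][1] - m[i][0];
                else if (j == cols - 1) grad[i][j][1] = m[i][j] - m[i][j - 1];
                else                    grad[i][j][1] = (m[i][j + 1] - m[i][j - 1]) / 2.0;
            }
        }
        return grad;
    }

    public static void main(String[] args) {
        // On the linear field f(x, y) = 2x + 3y all the formulae above are
        // exact, so the gradient is {2, 3} everywhere, edges included.
        double[][] m = new double[4][5];
        for (int i = 0; i < 4; i++)
            for (int j = 0; j < 5; j++)
                m[i][j] = 2 * i + 3 * j;
        double[][][] g = gradient(m);
        System.out.println(g[2][2][0] + " " + g[2][2][1]); // 2.0 3.0
    }
}
```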
Are there any methods which do that? I have an application where I need the area under the curve, and I am given the formula, so if I can do the integration by hand, I should be able to do it programmatically? I can't find the name of the method I'm referring to, but this image demonstrates it: http://www.mathwords.com/a/a_assets/area%20under%20curve%20ex1work.gif
Edit: to everyone replying, I have already implemented the rectangular, trapezoidal, and Simpson's rules. However, they take 10k+ strips to be accurate. Shouldn't I be able to find the integrated version of a function programmatically? If not, there must be a bloody good reason for that.
Numerical integration
There are multiple methods, which can be used. For description, have a look in Numerical Recipes: The Art of Scientific Computing.
For Java there is the Apache Commons Math library, which can be used. Integration routines are in the Numerical Analysis section.
Symbolic integration
Check out jScience. Functions module "provides support for fairly simple symbolic math analysis (to solve algebraic equations, integrate, differentiate, calculate expressions, and so on)".
If the type of function is given, it may be possible to integrate faster in that specific case than by using a standard library.
To compute it exactly, you would need a computer algebra system library of some sort to perform symbolic manipulations. Such systems are rather complicated to implement, and I am not familiar with any high quality, open source libraries for Java. An alternative, though, assuming it meets your requirements, would be to estimate the area under the curve using the trapezoidal rule. Depending on how accurate you require your result to be, you can vary the size of the subdivisions accordingly.
I would recommend using Simpson's rule or the trapezium rule, because it could be excessively complicated to integrate every single type of graph.
See numerical analysis, specifically numerical integration. How about using the Riemann sum method?
You can use numerical integration with some rule, like the already mentioned Simpson's or trapezoidal rules, or Monte Carlo simulation, which uses a pseudo-random generator.

You can try some libraries for symbolic integration, but I'm not sure that you can get a symbolic representation of every integral.
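For reference, composite Simpson's rule really is only a few lines, and since it is exact for polynomials up to degree three, it usually needs far fewer strips than the rectangle rule. A stdlib-only sketch (the interface choice is mine):

```java
import java.util.function.DoubleUnaryOperator;

public class SimpsonRule {

    // Composite Simpson's rule over [a, b] with n subintervals (n must be even).
    static double simpson(DoubleUnaryOperator f, double a, double b, int n) {
        double h = (b - a) / n;
        double sum = f.applyAsDouble(a) + f.applyAsDouble(b);
        for (int i = 1; i < n; i++) {
            // interior points alternate between weight 4 (odd) and 2 (even)
            sum += f.applyAsDouble(a + i * h) * (i % 2 == 0 ? 2 : 4);
        }
        return sum * h / 3.0;
    }

    public static void main(String[] args) {
        // Exact for cubics: the integral of x^3 over [0, 1] is 0.25,
        // recovered here with only 10 strips.
        System.out.println(simpson(x -> x * x * x, 0.0, 1.0, 10));
    }
}
```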
Here's a simple but efficient approach:
public static double area(DoubleFunction<Double> f, double start, double end, int intervals) {
    double deltaX = (end - start) / intervals;
    double area = 0.0;
    double effectiveStart = start + (deltaX / 2);
    for (int i = 0; i < intervals; ++i) {
        area += f.apply(effectiveStart + (i * deltaX));
    }
    return deltaX * area;
}
This is a Riemann sum using the midpoint rule, which is a variation of the trapezoidal rule: instead of calculating the area of a trapezoid, I use a rectangle whose height is f(x) at the middle of the interval. This is faster and gives a better result. That is why my effective starting value of x is at the middle of the first interval. And by looping over an integer, I avoid any round-off problems.
I also improve performance by waiting till the end of the loop before multiplying by deltaX. I could have written the loop like this:
for (int i = 0; i < intervals; ++i) {
    area += deltaX * f.apply(effectiveStart + (i * deltaX)); // width * height of each rectangle
}
But deltaX is constant, so it's faster to wait until the loop is finished.
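To make the method concrete, here it is wrapped in a runnable class and called on f(x) = x², whose exact area on [0, 1] is 1/3 (a usage sketch, assuming Java 8+ for the lambda):

```java
import java.util.function.DoubleFunction;

public class MidpointDemo {

    // The midpoint-rule method from the answer above, unchanged.
    public static double area(DoubleFunction<Double> f, double start, double end, int intervals) {
        double deltaX = (end - start) / intervals;
        double area = 0.0;
        double effectiveStart = start + (deltaX / 2);
        for (int i = 0; i < intervals; ++i) {
            area += f.apply(effectiveStart + (i * deltaX));
        }
        return deltaX * area;
    }

    public static void main(String[] args) {
        // With 1000 intervals the midpoint rule lands within ~1e-7 of 1/3 here.
        System.out.println(area(x -> x * x, 0.0, 1.0, 1000));
    }
}
```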
One of the most popular techniques of numerical integration is the fourth-order Runge-Kutta (RK4) method. Its implementation is as follows:
double dx; // step size
double y;  // initial value
for (int i = 0; i < number_of_iterations; i++) {
    // classic RK4 stages for the ODE y' = f(y)
    double k1 = f(y);
    double k2 = f(y + dx / 2 * k1);
    double k3 = f(y + dx / 2 * k2);
    double k4 = f(y + dx * k3);
    y += dx / 6 * (k1 + 2 * k2 + 2 * k3 + k4);
}
It will converge much faster than the rectangle, trapezoidal, and Simpson's rules. It is one of the more commonly used techniques for integration in physics simulations.