Using least square method with Commons Math and fitting

I was trying to use Commons Math to find the coefficients of a polynomial. The routine seems to exist, but I get the error below. Does anyone see the issue?
I was trying to convert this question to commons-math:
https://math.stackexchange.com/questions/121212/how-to-find-curve-equation-from-data
From plotting your data (Wolfram|Alpha link), it does not look linear, so it is better fit by a polynomial. I assume you want to fit the data:
X Y
1 4
2 8
3 13
4 18
5 24
..
using a quadratic polynomial y = ax^2 + bx + c.
Wolfram|Alpha provides a great utility for this, and I would like to get the same answers from Commons Math.
http://www.wolframalpha.com/input/?i=fit+4%2C+8%2C+13%2C
For example, entering that data gives 4.5x - 0.666667 (linear).
Here is the code and the error:
import org.apache.commons.math3.stat.regression.OLSMultipleLinearRegression;
import org.apache.commons.math3.stat.regression.SimpleRegression;
final OLSMultipleLinearRegression regression2 = new OLSMultipleLinearRegression();
double[] y = {
4.0,
8,
13,
};
double[][] x2 =
{
{ 1.0, 1, 1 },
{ 1.0, 2, 4 },
{ 0.0, 3, 9 },
};
regression2.newSampleData(y, x2);
regression2.setNoIntercept(true);
regression2.newSampleData(y, x2);
double[] beta = regression2.estimateRegressionParameters();
for (double d : beta) {
System.out.println("D: " + d);
}
Exception in thread "main" org.apache.commons.math3.exception.MathIllegalArgumentException: not enough data (3 rows) for this many predictors (3 predictors)
at org.apache.commons.math3.stat.regression.AbstractMultipleLinearRegression.validateSampleData(AbstractMultipleLinearRegression.java:236)
at org.apache.commons.math3.stat.regression.OLSMultipleLinearRegression.newSampleData(OLSMultipleLinearRegression.java:70)
at org.berlin.bot.algo.BruteForceSort.main(BruteForceSort.java:108)

The javadoc for validateSampleData() states that the two-dimensional array must have at least one more row than it has columns.
http://commons.apache.org/proper/commons-math/javadocs/api-3.3/org/apache/commons/math3/stat/regression/AbstractMultipleLinearRegression.html
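
As a rough illustration of that rule, the check can be sketched in plain Java. The class and method names below are hypothetical, and the condition is an assumption based on the javadoc and the error message above, not a copy of the library source.

```java
// Hypothetical helper that mirrors the documented rule; not the Commons Math implementation.
final class SampleSizeCheck {

    /** Returns true when x has at least one more row than it has columns. */
    static boolean hasEnoughRows(double[][] x) {
        int rows = x.length;
        int columns = x[0].length;
        return rows >= columns + 1;
    }

    public static void main(String[] args) {
        double[][] threeRows = { { 1, 1, 1 }, { 1, 2, 4 }, { 1, 3, 9 } };
        double[][] fourRows = { { 1, 1, 1 }, { 1, 2, 4 }, { 1, 3, 9 }, { 1, 4, 16 } };
        System.out.println(hasEnoughRows(threeRows)); // false: 3 rows, 3 columns
        System.out.println(hasEnoughRows(fourRows));  // true: 4 rows, 3 columns
    }
}
```

With the three-row matrix from the question the check fails, which matches the "not enough data (3 rows) for this many predictors (3 predictors)" message.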

Rcook was right. I provided an additional row (test case), and that produced the same answer as Wolfram|Alpha:
D: 0.24999999999999822
D: 3.4500000000000033
D: 0.24999999999999914
That is, 0.25x^2 + 3.45x + 0.25, from this code:
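import org.apache.commons.math3.stat.regression.OLSMultipleLinearRegression;

final OLSMultipleLinearRegression regression2 = new OLSMultipleLinearRegression();
// Four observations for three coefficients: one more row than columns.
double[] y = { 4, 8, 13, 18 };
// Each row is { 1, x, x^2 }; the leading 1 stands for the constant term.
double[][] x2 = {
    { 1, 1, 1 },
    { 1, 2, 4 },
    { 1, 3, 9 },
    { 1, 4, 16 },
};
// The matrix already has a column of ones, so turn off the automatic
// intercept before loading the data (one newSampleData call is enough).
regression2.setNoIntercept(true);
regression2.newSampleData(y, x2);
// beta holds { c, b, a } for y = a*x^2 + b*x + c.
double[] beta = regression2.estimateRegressionParameters();
for (double d : beta) {
    System.out.println("D: " + d);
}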
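
As a cross-check that does not depend on Commons Math, the same quadratic fit can be sketched by solving the normal equations directly. This is only an illustrative sketch: the class and method names are hypothetical, and it assumes at least three distinct x values so that the system is not singular.

```java
// Plain-Java cross-check: fit y = c + b*x + a*x^2 by solving the normal equations.
final class QuadraticFit {

    /** Returns { c, b, a } minimizing the squared error of c + b*x + a*x^2. */
    static double[] fit(double[] xs, double[] ys) {
        // Augmented 3x4 system (A^T A | A^T y), where each row of A is { 1, x, x^2 }.
        double[][] m = new double[3][4];
        for (int k = 0; k < xs.length; k++) {
            double[] row = { 1.0, xs[k], xs[k] * xs[k] };
            for (int i = 0; i < 3; i++) {
                for (int j = 0; j < 3; j++) {
                    m[i][j] += row[i] * row[j];
                }
                m[i][3] += row[i] * ys[k];
            }
        }
        // Gauss-Jordan elimination with partial pivoting.
        for (int col = 0; col < 3; col++) {
            int pivot = col;
            for (int r = col + 1; r < 3; r++) {
                if (Math.abs(m[r][col]) > Math.abs(m[pivot][col])) {
                    pivot = r;
                }
            }
            double[] tmp = m[col];
            m[col] = m[pivot];
            m[pivot] = tmp;
            for (int r = 0; r < 3; r++) {
                if (r != col) {
                    double factor = m[r][col] / m[col][col];
                    for (int j = col; j < 4; j++) {
                        m[r][j] -= factor * m[col][j];
                    }
                }
            }
        }
        // The matrix is now diagonal, so each coefficient is rhs / diagonal.
        return new double[] { m[0][3] / m[0][0], m[1][3] / m[1][1], m[2][3] / m[2][2] };
    }

    public static void main(String[] args) {
        double[] beta = fit(new double[] { 1, 2, 3, 4 }, new double[] { 4, 8, 13, 18 });
        // Expected to be close to 0.25, 3.45, 0.25, up to rounding error.
        for (double d : beta) {
            System.out.println("D: " + d);
        }
    }
}
```

The coefficients come back in the same order as in the regression output above: constant term first, then the x term, then the x^2 term.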

Related

Choco Solver setObjective maximize polynomial equation

I'm currently trying out Choco Solver (4.0.8) and I'm trying to solve this problem:
Maximize 8x1 + 11x2 + 6x3 + 4x4
subject to 5x1 + 7x2 + 4x3 + 3x4 ≤ 14
I'm stuck on maximising the first expression. I guess I just need a hint about which subtype of Variable EQUATION should be.
Model model = new Model("my first problem");
BoolVar x1 = model.boolVar("x1");
BoolVar x2 = model.boolVar("x2");
BoolVar x3 = model.boolVar("x3");
BoolVar x4 = model.boolVar("x4");
BoolVar[] bools = {x1, x2, x3, x4};
int[] c = {5, 7, 4, 3};
int[] c2 = {8, 11, 6, 4};
Variable EQUATION = new Variable();
model.scalar(bools, c, "<=", 14).post(); // 5x1 + 7x2 + 4x3 + 3x4 ≤ 14
model.setObjective(Model.MAXIMIZE, EQUATION); // 8x1 + 11x2 + 6x3 + 4x4
model.getSolver().solve();
System.out.println(x1);
System.out.println(x2);
System.out.println(x3);
System.out.println(x4);

It seems I have found a solution like this:
Variable EQUATION = new ScaleView(x1, 8)
.add(new ScaleView(x2, 11),
new ScaleView(x3, 6),
new ScaleView(x4, 4)).intVar();
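
Because the problem has only four 0/1 variables, the result a solver should report can be checked by enumerating all sixteen assignments. The sketch below is independent of Choco Solver; the class and method names are hypothetical and it is meant only as a sanity check, not as a replacement for the model.

```java
// Exhaustive check of the 0/1 problem; does not use Choco Solver.
final class BruteForceObjective {

    /** Returns the best objective value over all 0/1 assignments within the weight limit. */
    static int maximize(int[] weights, int[] values, int limit) {
        int n = weights.length;
        int best = Integer.MIN_VALUE;
        // Each bit of mask says whether the corresponding variable is 1.
        for (int mask = 0; mask < (1 << n); mask++) {
            int weight = 0;
            int value = 0;
            for (int i = 0; i < n; i++) {
                if ((mask & (1 << i)) != 0) {
                    weight += weights[i];
                    value += values[i];
                }
            }
            if (weight <= limit && value > best) {
                best = value;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[] c = { 5, 7, 4, 3 };   // constraint coefficients
        int[] c2 = { 8, 11, 6, 4 }; // objective coefficients
        System.out.println(maximize(c, c2, 14)); // 21, reached with x2 = x3 = x4 = 1
    }
}
```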

OLS Multiple Linear Regression with commons-math

Currently I have a dependency on commons-math 2.1, but I want to upgrade it to commons-math 3.6. Unfortunately, some test cases no longer work. I know what is causing the problem, but I don't know how to change the test case so that it still tests the same behavior as before.
I have the following test code:
@Test
public void testIdentityMatrix() {
double[][] x = { { 1, 0, 0, 0 }, { 0, 1, 0, 0 }, { 0, 0, 0, 1 }, { 0, 0, 0, 1 } };
double[] y = { 1, 2, 3, 4 };
OLSMultipleLinearRegression regression = new OLSMultipleLinearRegression();
regression.setNoIntercept(true);
regression.newSampleData(y, x);
double[] b = regression.estimateRegressionParameters();
for (int i = 0; i < y.length; i++)
{
assertEquals(b[i], y[i], 0.001);
}
}
After the upgrade to commons-math 3.6 the OLSMultipleLinearRegression checks the given matrix x and vector y for valid contents. And this validation fails with the message:
not enough data (4 rows) for this many predictors (4 predictors)
What do I need to change to correct that test case?

This is a bug in Commons Math 3.x. When the model has no intercept, having as many observations as regressors should be fine as long as the design matrix is not singular. In your example, I think you mean for the third x row to be {0,0,1,0} (otherwise the design matrix is singular). With this change to your data, and with the code patch from the Hipparchus fix applied, your test succeeds. This bug is being tracked as MATH-1392 in Commons Math.
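
The singularity remark can be checked without the library by computing the rank of both matrices. The following is a minimal sketch with hypothetical names, using Gaussian elimination and an arbitrarily chosen tolerance of 1e-12; it is not how Commons Math tests for singularity.

```java
// Rank via Gaussian elimination; a square matrix is singular when rank < size.
final class DesignMatrixRank {

    /** Returns the rank of the matrix, computed on a copy so the input is unchanged. */
    static int rank(double[][] matrix) {
        int rows = matrix.length;
        int cols = matrix[0].length;
        double[][] m = new double[rows][];
        for (int i = 0; i < rows; i++) {
            m[i] = matrix[i].clone();
        }
        int rank = 0;
        for (int col = 0; col < cols && rank < rows; col++) {
            // Pick the remaining row with the largest entry in this column.
            int pivot = rank;
            for (int r = rank + 1; r < rows; r++) {
                if (Math.abs(m[r][col]) > Math.abs(m[pivot][col])) {
                    pivot = r;
                }
            }
            if (Math.abs(m[pivot][col]) < 1e-12) {
                continue; // no usable pivot in this column
            }
            double[] tmp = m[rank];
            m[rank] = m[pivot];
            m[pivot] = tmp;
            // Eliminate the column from the rows below the pivot row.
            for (int r = rank + 1; r < rows; r++) {
                double factor = m[r][col] / m[rank][col];
                for (int j = col; j < cols; j++) {
                    m[r][j] -= factor * m[rank][j];
                }
            }
            rank++;
        }
        return rank;
    }

    public static void main(String[] args) {
        double[][] original = { { 1, 0, 0, 0 }, { 0, 1, 0, 0 }, { 0, 0, 0, 1 }, { 0, 0, 0, 1 } };
        double[][] corrected = { { 1, 0, 0, 0 }, { 0, 1, 0, 0 }, { 0, 0, 1, 0 }, { 0, 0, 0, 1 } };
        System.out.println(rank(original));  // 3: two identical rows, so the matrix is singular
        System.out.println(rank(corrected)); // 4: full rank
    }
}
```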

The number of samples has to be bigger than the number of variables. Apparently your test case is not correct. You would have to add at least one more sample.
If you change
double[][] x = { { 1, 0, 0, 0 }, { 0, 1, 0, 0 }, { 0, 0, 0, 1 }, { 0, 0, 0, 1 } };
to
double[][] x = { { 1, 0, 0, 0 }, { 0, 1, 0, 0 }, { 0, 0, 0, 1 }, { 0, 0, 0, 1 }, {1,0,0,0} };
it should pass the validation, provided y also gets a matching fifth value (although I didn't test it).

I guess the 3rd row of x should be { 0, 0, 1, 0 } instead of { 0, 0, 0, 1 }?
However, if you change x to
double[][] x = { { 1, 0, 0, 0 }, { 0, 1, 0, 0 }, { 0, 0, 1, 0 }, { 0, 0, 0, 1 }, { 1, 1, 1, 1 } };
and change y to
double[] y = { 1, 2, 3, 4, 10 };
so that the last element is the sum of the other elements, then it works.

Apache Commons Math3: Multiply row with column vector

I want to multiply two vectors a^T = (1,2,3) and b = (4,5,6). With pen and paper, I got
c = 1*4 + 2*5 + 3*6 = 4 + 10 + 18 = 32
With apache commons math3 I do
ArrayRealVector a = new ArrayRealVector(new double []{1, 2, 3});
ArrayRealVector b = new ArrayRealVector(new double []{4, 5, 6});
to get a representation of the vectors. And to get the result I want to do something like
double c = a.transpose().multiply(b);
but I can't find the right method for it (neither transpose nor multiply).

This is the dot product, which you can do with double c = a.dotProduct(b);
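
For reference, the dot product is just the sum of the element-wise products, which a plain-Java sketch makes explicit. The class and method names here are hypothetical and this is not the Commons Math implementation.

```java
// Hand-written dot product: sum of a[i] * b[i].
final class DotProductSketch {

    static double dot(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        double c = dot(new double[] { 1, 2, 3 }, new double[] { 4, 5, 6 });
        System.out.println(c); // 32.0, i.e. 1*4 + 2*5 + 3*6
    }
}
```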

My table is outputting incorrect values for the percentage

public class apples {
private static String[] level1 = new String[] { "A", "B", "I", "K", "N", "O", "P", "S", "T", "W" };
public static void main(String[] args) {
int[] scores1 = { 99, 80, 56, 88, 70, 35, 67, 60, 78, 56 };
int[] correct1 = {20, 20, 13, 15, 22, 18, 19, 21, 23, 25};
int[] incorrect1 = {2, 1, 5, 2, 2, 5, 8, 1, 0, 0};
double[] percentage1 = new double[correct1.length];
for(int a = 0; a < correct1.length; a++ ){
percentage1[a] = (double)((correct1[a] / (correct1[a] + incorrect1[a]))*100);
}
System.out.println("Character \t Correct \t Incorrect \t Percentage");
for(int counter = 0; counter<scores1.length;counter++){
System.out.println(level1[counter] + "\t\t " + correct1[counter] + "\t\t " + incorrect1[counter] + "\t\t " + percentage1[counter]);
}
}
}
This outputs a table with 4 headings. The character, correct and incorrect columns show as expected. However, the percentage column is not working properly. For example, character 'A' with correct 20 and incorrect 2 gives a percentage of 0.0. Any 'incorrect' value > 0 outputs a percentage of 0, and any 'incorrect' value equal to 0 gives a percentage of 100 (which is correct). Can someone please explain where I have gone wrong?

You are dealing with integers here, and for integer division, the result is truncated. You'll need to cast the original values to double instead, or multiply one part by 1.0 to get it as a double:
percentage1[a] = ((correct1[a]*1.0 / (correct1[a] + incorrect1[a]))*100);

percentage1[a] = (double)((correct1[a] / (correct1[a] + incorrect1[a]))*100);
The above code casts to a double after the calculation is completed.
To cast as part of the calculation, use:
percentage1[a] = (( ((double)correct1[a]) / (correct1[a] + incorrect1[a]))*100);

Your calculations here
percentage1[a] = (double)((correct1[a] / (correct1[a] + incorrect1[a]))*100);
perform integer division (you only cast the result to double afterwards). If you want the actual floating-point result, at least one operand of the division has to be a double before the calculation.
So the quickest option would be to declare the arrays as double:
double[] correct1 = {20, 20, 13, 15, 22, 18, 19, 21, 23, 25};
double[] incorrect1 = {2, 1, 5, 2, 2, 5, 8, 1, 0, 0};
Another would be to change the computation to something like this
percentage1[a] = (1.0 * correct1[a] / (correct1[a] + incorrect1[a]))*100;
or to simplify a little:
percentage1[a] = 100.0 * correct1[a] / (correct1[a] + incorrect1[a]);
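
A short, self-contained sketch shows the difference between the two forms. The class and method names are hypothetical, and the method assumes correct + incorrect is not zero.

```java
// Integer division truncates; promoting one operand to double keeps the fraction.
final class PercentageDemo {

    /** Percentage of correct answers; 100.0 * correct is a double, so the division is too. */
    static double percentage(int correct, int incorrect) {
        return 100.0 * correct / (correct + incorrect);
    }

    public static void main(String[] args) {
        System.out.println((20 / (20 + 2)) * 100); // 0, because 20 / 22 is truncated to 0
        System.out.println(percentage(20, 2));     // roughly 90.9
        System.out.println(percentage(23, 0));     // 100.0
    }
}
```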

Array interpolation

Before I start, please accept my apologies that I'm not a mathematician and don't really know the proper names for what I'm trying to do... ;) Pointers to any plain-English explanations that might help would be most appreciated (as I'm purely Googling at the moment based upon what I think the solution might be).
If I have a multi-dimensional array of source values and want to upscale that array by a factor of n, I think that what I'd need to use is Bicubic Interpolation. Certainly the image top right on that page is representative of what I'm aiming for: creating a graduated flow of values between the underlying source data points, based upon the value(s) of their surrounding neighbours. I completely accept that by not increasing the volume of data I am not increasing the resolution of the data; merely blurring the edges.
Something akin to going from this [image] to this [image] (and beyond).
Following the link within the Wikipedia article gives me a (supposed) example implementation of what I'm striving for, but if it does I fear I'm currently missing the logical leap to get myself there. When I call getValue(source, 0.5, 0.5) on the BicubicInterpolator, what am I getting back? I thought that if I gave an x/y of (0.0,0.0) I would get back the bottom-left value of the grid and if I looked at (1,1) I would get top-right, and any value between would give me the specified position within the interpolated grid.
double[][] source = new double[][] {{1, 1, 1, 2}, {1, 2, 2, 3}, {1, 2, 2, 3}, {1, 1, 3, 3}};
BicubicInterpolator bi = new BicubicInterpolator();
for (double idx = 0; idx <= 1; idx += 0.1) {
LOG.info("Result (" + String.format("%3.1f", idx) + ", " + String.format("%3.1f", idx) + ") : " + bi.getValue(source, idx, idx));
}
The output for a diagonal line across my source grid however is;
Result (0.0, 0.0) : 2.0
Result (0.1, 0.1) : 2.08222625
Result (0.2, 0.2) : 2.128
Result (0.3, 0.3) : 2.13747125
Result (0.4, 0.4) : 2.11424
Result (0.5, 0.5) : 2.06640625
Result (0.6, 0.6) : 2.00672
Result (0.7, 0.7) : 1.9518312500000001
Result (0.8, 0.8) : 1.92064
Result (0.9, 0.9) : 1.93174625
Result (1.0, 1.0) : 2.0
I'm confused because the diagonals go from 1 to 3 and from 1 to 2; there's nothing going from 2 to 2 with very little (overall) variation. Am I completely misunderstanding things?
EDIT: Following Peter's suggestion to expand the boundary for analysis, the grid can now be generated as a quick'n'dirty upscale to a 30x30 matrix [image].
Now that what's going on is making a bit more sense, I can see that I need to consider a few additional things:
Control the overshoot (seen in the middle of the grid where the source has a block of four cells with a value of 2, but the interpolated value peaks at 2.2)
Cater for blank values in the source grid and have them treated as blanks, rather than zero, so that they don't skew the calculation
Be prepared to be told I'm on a fool's errand and that a different solution is needed
See if this was what the customer thought they actually wanted when they said "make it less blocky"

If you assume the "outside" temperature is the same as the outermost ring of values, and you want to shift which part of the grid you are considering...
public static void main(String... args) {
double[][] source = new double[][]{{1, 1, 1, 2}, {1, 2, 2, 3}, {1, 2, 2, 3}, {1, 1, 3, 3}};
BicubicInterpolator bi = new BicubicInterpolator();
for (int i = 0; i <= 30; i++) {
double idx = i / 10.0;
System.out.printf("Result (%3.1f, %3.1f) : %3.1f%n", idx, idx, bi.getValue(source, idx, idx));
}
}
public static class CubicInterpolator {
public static double getValue(double[] p, double x) {
int xi = (int) x;
x -= xi;
double p0 = p[Math.max(0, xi - 1)];
double p1 = p[xi];
double p2 = p[Math.min(p.length - 1,xi + 1)];
double p3 = p[Math.min(p.length - 1, xi + 2)];
return p1 + 0.5 * x * (p2 - p0 + x * (2.0 * p0 - 5.0 * p1 + 4.0 * p2 - p3 + x * (3.0 * (p1 - p2) + p3 - p0)));
}
}
public static class BicubicInterpolator extends CubicInterpolator {
private double[] arr = new double[4];
public double getValue(double[][] p, double x, double y) {
int xi = (int) x;
x -= xi;
arr[0] = getValue(p[Math.max(0, xi - 1)], y);
arr[1] = getValue(p[xi], y);
arr[2] = getValue(p[Math.min(p.length - 1,xi + 1)], y);
arr[3] = getValue(p[Math.min(p.length - 1, xi + 2)], y);
return getValue(arr, x+ 1);
}
}
prints
Result (0.0, 0.0) : 1.0
Result (0.1, 0.1) : 1.0
Result (0.2, 0.2) : 1.0
Result (0.3, 0.3) : 1.1
Result (0.4, 0.4) : 1.1
Result (0.5, 0.5) : 1.3
Result (0.6, 0.6) : 1.4
Result (0.7, 0.7) : 1.6
Result (0.8, 0.8) : 1.7
Result (0.9, 0.9) : 1.9
Result (1.0, 1.0) : 2.0
Result (1.1, 1.1) : 2.1
Result (1.2, 1.2) : 2.1
Result (1.3, 1.3) : 2.1
Result (1.4, 1.4) : 2.1
Result (1.5, 1.5) : 2.1
Result (1.6, 1.6) : 2.0
Result (1.7, 1.7) : 2.0
Result (1.8, 1.8) : 1.9
Result (1.9, 1.9) : 1.9
Result (2.0, 2.0) : 2.0
Result (2.1, 2.1) : 2.1
Result (2.2, 2.2) : 2.3
Result (2.3, 2.3) : 2.5
Result (2.4, 2.4) : 2.7
Result (2.5, 2.5) : 2.8
Result (2.6, 2.6) : 2.9
Result (2.7, 2.7) : 3.0
Result (2.8, 2.8) : 3.0
Result (2.9, 2.9) : 3.0
Result (3.0, 3.0) : 3.0
Looking at how this works, you have a 2x2 grid of inside values and a 4x4 square around it of outside values. The (0.0, 0.0) to (1.0, 1.0) range maps the diagonal between the 2 in cell (2,2) and the 2 in cell (3,3), using the outer values to help interpolate.
double[][] source = new double[][]{{1, 1, 1, 2}, {1, 2, 2, 3}, {1, 2, 2, 3}, {1, 1, 3, 3}};
BicubicInterpolator bi = new BicubicInterpolator();
for (int i = -10; i <= 20; i++) {
double idx = i / 10.0;
System.out.printf("Result (%3.1f, %3.1f) : %3.1f%n", idx, idx, bi.getValue(source, idx, idx));
}
prints
Result (-1.0, -1.0) : -5.0
Result (-0.9, -0.9) : -2.8
Result (-0.8, -0.8) : -1.2
Result (-0.7, -0.7) : -0.2
Result (-0.6, -0.6) : 0.5
Result (-0.5, -0.5) : 1.0
Result (-0.4, -0.4) : 1.3
Result (-0.3, -0.3) : 1.5
Result (-0.2, -0.2) : 1.7
Result (-0.1, -0.1) : 1.9
Result (0.0, 0.0) : 2.0
Result (0.1, 0.1) : 2.1
Result (0.2, 0.2) : 2.1
Result (0.3, 0.3) : 2.1
Result (0.4, 0.4) : 2.1
Result (0.5, 0.5) : 2.1
Result (0.6, 0.6) : 2.0
Result (0.7, 0.7) : 2.0
Result (0.8, 0.8) : 1.9
Result (0.9, 0.9) : 1.9
Result (1.0, 1.0) : 2.0
Result (1.1, 1.1) : 2.1
Result (1.2, 1.2) : 2.3
Result (1.3, 1.3) : 2.5
Result (1.4, 1.4) : 2.7
Result (1.5, 1.5) : 2.8
Result (1.6, 1.6) : 2.7
Result (1.7, 1.7) : 2.1
Result (1.8, 1.8) : 0.9
Result (1.9, 1.9) : -1.4
Result (2.0, 2.0) : -5.0
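
The one-dimensional formula in CubicInterpolator above is the Catmull-Rom form of cubic interpolation: between two inner points it passes exactly through both of them, but in between it can rise above them, which is the overshoot mentioned in the question. A minimal sketch with a hypothetical class name and made-up sample values:

```java
// Catmull-Rom cubic between p[1] and p[2], with p[0] and p[3] as outer neighbours.
final class CatmullRomSketch {

    /** Interpolates for t in [0, 1]; t = 0 gives p[1] and t = 1 gives p[2]. */
    static double interpolate(double[] p, double t) {
        return p[1] + 0.5 * t * (p[2] - p[0]
                + t * (2.0 * p[0] - 5.0 * p[1] + 4.0 * p[2] - p[3]
                + t * (3.0 * (p[1] - p[2]) + p[3] - p[0])));
    }

    public static void main(String[] args) {
        double[] p = { 1, 2, 2, 1 }; // flat in the middle, lower on both sides
        System.out.println(interpolate(p, 0.0)); // 2.0
        System.out.println(interpolate(p, 0.5)); // 2.125, above both inner values
        System.out.println(interpolate(p, 1.0)); // 2.0
    }
}
```

If the overshoot is unwanted, clamping the result to the range of the two inner values is one simple option, at the cost of a less smooth curve.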
