I am trying to convert an Excel Solver solution to a Java app.
The Excel Solver setup is:
Solver Parameters:
Set Objective: D24 to Max
By Changing Variable Cells: C4:C23
Subject to the Constraints:
C24 = B24
L24 <= N24
L25 >= N25
(non-negative)
GRG Nonlinear
I have been googling for some time and cannot find a Java library to achieve this. Any ideas?
I have tried choco-solver http://www.emn.fr/z-info/choco-solver/
Solver solver = new Solver("my first problem");
// 2. Create variables through the variable factory
IntVar x = VariableFactory.bounded("X", 0, 5, solver);
IntVar y = VariableFactory.bounded("Y", 0, 5, solver);
// 3. Create and post constraints by using constraint factories
solver.post(IntConstraintFactory.arithm(x, "+", y, "<", 5));
// 4. Define the search strategy
solver.set(IntStrategyFactory.inputOrder_InDomainMin(new IntVar[]{x,y} ));
// 5. Launch the resolution process
if (solver.findSolution()) {
    do {
        prettyOut();
    } while (solver.nextSolution());
}
I am finding it difficult to relate this to the Excel Solver functions; my math is not great.
There is a SimplexSolver implementation in Apache Commons Math, but I can't say anything about its performance or the problem sizes it can handle. Finding non-proprietary solutions for optimization problems can be tricky, because it is very difficult to efficiently optimize large problems; it is an ongoing field of research with only a handful of good commercial/research solutions.
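For reference, here is a minimal sketch of the Commons Math 3 SimplexSolver API on a toy problem of my own (maximize 2x + 3y subject to x + y <= 10 with non-negative variables). Note that SimplexSolver only handles linear programs, whereas Excel's GRG Nonlinear is a nonlinear method, so it may not cover the original model:
import java.util.Arrays;

import org.apache.commons.math3.optim.MaxIter;
import org.apache.commons.math3.optim.PointValuePair;
import org.apache.commons.math3.optim.linear.LinearConstraint;
import org.apache.commons.math3.optim.linear.LinearConstraintSet;
import org.apache.commons.math3.optim.linear.LinearObjectiveFunction;
import org.apache.commons.math3.optim.linear.NonNegativeConstraint;
import org.apache.commons.math3.optim.linear.Relationship;
import org.apache.commons.math3.optim.linear.SimplexSolver;
import org.apache.commons.math3.optim.nonlinear.scalar.GoalType;

public class SimplexExample {
    public static void main(String[] args) {
        // Objective: maximize 2x + 3y
        LinearObjectiveFunction objective =
                new LinearObjectiveFunction(new double[] {2, 3}, 0);

        // Constraint: x + y <= 10
        LinearConstraintSet constraints = new LinearConstraintSet(
                new LinearConstraint(new double[] {1, 1}, Relationship.LEQ, 10));

        PointValuePair solution = new SimplexSolver().optimize(
                new MaxIter(100),
                objective,
                constraints,
                GoalType.MAXIMIZE,
                new NonNegativeConstraint(true)); // x, y >= 0

        System.out.println("x, y = " + Arrays.toString(solution.getPoint())
                + ", objective = " + solution.getValue());
    }
}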
If you want to keep Excel files as input, you need to parse the data and convert it. For reading Excel files you can use Apache POI.
If you are looking for an API to read and write Microsoft Office documents, take a look at Apache POI:
http://poi.apache.org/
http://viralpatel.net/blogs/java-read-write-excel-file-apache-poi/
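As a minimal sketch of reading a cell with POI (the file name and cell position are just placeholders):
import java.io.FileInputStream;

import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;

public class ReadCells {
    public static void main(String[] args) throws Exception {
        // WorkbookFactory handles both .xls and .xlsx
        try (FileInputStream in = new FileInputStream("input.xlsx")) {
            Workbook workbook = WorkbookFactory.create(in);
            Sheet sheet = workbook.getSheetAt(0);
            Cell cell = sheet.getRow(3).getCell(2);   // C4 (0-based row 3, column 2)
            System.out.println(new DataFormatter().formatCellValue(cell));
        }
    }
}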
Related
I have an Excel file that contains a complex calculation model, and the performance of using Apache POI directly is not acceptable to me. I am therefore thinking of extracting the full formula chain and porting it to Java.
For example: if C1 = A1+B1 and A1 = A2+B2, B1 = A3+B3, then C1 = A2+B2+A3+B3.
I also noticed that there is an object called CalculationChain; it seems there might be a convenient way to get the formula chain from it, rather than parsing these nested cells one by one.
Can anyone give an example to shed some light on this?
This is not a good answer, but it moves one step forward.
XSSFEvaluationWorkbook xssfEvaluationWorkbook = XSSFEvaluationWorkbook.create(workbook);
Ptg[] ptg = FormulaParser.parse(cell.getCellFormula(), xssfEvaluationWorkbook, FormulaType.NAMEDRANGE, sheetIndex);
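Building on that, one possible way to expand nested formulas is to parse each formula into its Ptg tokens and recurse into every plain cell reference (RefPtg) that itself holds a formula. A rough sketch, assuming a reasonably recent POI version and single-sheet references only (FormulaType.CELL may be the more natural type for a cell formula; error handling is omitted):
import org.apache.poi.ss.formula.FormulaParser;
import org.apache.poi.ss.formula.FormulaType;
import org.apache.poi.ss.formula.ptg.Ptg;
import org.apache.poi.ss.formula.ptg.RefPtg;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.CellType;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.xssf.usermodel.XSSFEvaluationWorkbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class FormulaExpander {

    // Recursively print the formula tree starting from one cell.
    static void expand(XSSFWorkbook workbook, Sheet sheet, Cell cell, int sheetIndex) {
        if (cell == null || cell.getCellType() != CellType.FORMULA) {
            return;
        }
        XSSFEvaluationWorkbook evalBook = XSSFEvaluationWorkbook.create(workbook);
        Ptg[] tokens = FormulaParser.parse(
                cell.getCellFormula(), evalBook, FormulaType.CELL, sheetIndex);

        System.out.println(cell.getAddress() + " = " + cell.getCellFormula());
        for (Ptg token : tokens) {
            if (token instanceof RefPtg) {            // a single-cell reference like A1
                RefPtg ref = (RefPtg) token;
                Row row = sheet.getRow(ref.getRow());
                Cell referenced = (row == null) ? null : row.getCell(ref.getColumn());
                expand(workbook, sheet, referenced, sheetIndex);  // recurse into A1, B1, ...
            }
        }
    }
}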
I'm working with ImageJ. I have two arrays of points (i.e. it[], cmx[]) and I want to fit them to a sine function. I've been working with CurveFitter but I don't understand it very well, and I'm also having issues with UserFunction.
Is there an easier approach to this? If you have examples I would appreciate it.
The following Groovy script is an example of running curve fitting on three data points:
import ij.measure.CurveFitter;
xData = [0,1,2];
yData = [3.1, 5.1, 6.9];
cv = new CurveFitter((double[]) xData.toArray(), (double[]) yData.toArray());
cv.doFit(CurveFitter.STRAIGHT_LINE);
println (cv.getResultString());
I'm not sure whether CurveFitter allows fitting to trigonometric functions; there doesn't seem to be such an option among the available fit types. You might try a high-degree polynomial fit instead.
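For example, a polynomial fit looks almost the same as the straight-line example above; here is a Java sketch with made-up data, using the POLY4 fit type from ij.measure.CurveFitter:
import ij.measure.CurveFitter;

public class PolyFitExample {
    public static void main(String[] args) {
        double[] x = {0, 1, 2, 3, 4, 5, 6};
        double[] y = {3.1, 5.1, 6.9, 7.4, 7.2, 6.1, 4.0};

        CurveFitter fitter = new CurveFitter(x, y);
        fitter.doFit(CurveFitter.POLY4);          // 4th-degree polynomial

        System.out.println(fitter.getResultString());
        double[] params = fitter.getParams();     // fitted coefficients
    }
}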
You can also ask on the ImageJ forum or mailing list regarding the implementation details of the CurveFitter class.
I'm trying to calculate the inverse of a two-tailed Student's t-distribution using Commons Math. I'm using Excel to compare values and validate whether my results are correct.
So, using Excel to calculate TINV with 5 degrees of freedom and 95.45% confidence, I use
=TINV(0.0455,5)
and get the result: 2.64865.
Using Commons Math like so:
TDistribution t = new TDistribution(5);
double value = t.inverseCumulativeProbability(0.9545);
I get the result: 2.08913.
I'm probably doing something wrong, obviously. I'm not really that math savvy, but I need to port an Excel sheet formula to Java for a project and got stuck on this.
What should I be using to get exactly the TINV result? What am I missing?
MS documentation [1] says that TINV returns a two-tailed value. I'm pretty sure Commons Math is returning a one-tailed value. In order to get Commons Math to agree with Excel, cut the tail mass in half, i.e., call
t.inverseCumulativeProbability (1 - tail_mass/2);
[1] http://office.microsoft.com/en-us/excel-help/tinv-function-HP010335663.aspx
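For the numbers in the question, that would look like this (a sketch using Commons Math 3; the printed value should come out close to Excel's 2.64865):
import org.apache.commons.math3.distribution.TDistribution;

public class TinvExample {
    public static void main(String[] args) {
        double tailMass = 0.0455;            // Excel's TINV probability argument
        TDistribution t = new TDistribution(5);

        // Excel's TINV is two-tailed, so split the tail mass between both tails.
        double twoTailed = t.inverseCumulativeProbability(1 - tailMass / 2);
        System.out.println(twoTailed);       // approximately 2.6486
    }
}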
I am trying to use Weka for feature selection using the PCA algorithm.
My original feature space contains ~9000 attributes across 2700 samples.
I tried to reduce dimensionality of the data using the following code:
AttributeSelection selector = new AttributeSelection();
PrincipalComponents pca = new PrincipalComponents();
Ranker ranker = new Ranker();
selector.setEvaluator(pca);
selector.setSearch(ranker);
Instances instances = SamplesManager.asWekaInstances(trainSet);
try {
    selector.SelectAttributes(instances);
    return SamplesManager.asSamplesList(selector.reduceDimensionality(instances));
} catch (Exception e) {
    ...
}
However, it did not finish running within 12 hours. It is stuck in the method selector.SelectAttributes(instances).
My questions are:
Is such a long computation time expected for Weka's PCA? Or am I using PCA wrongly?
If the long run time is expected:
How can I tune the PCA algorithm to run much faster? Can you suggest an alternative (with example code showing how to use it)?
If it is not:
What am I doing wrong? How should I invoke PCA using weka and get my reduced dimensionality?
Update: the comments confirm my suspicion that it is taking much more time than expected.
I'd like to know: how can I run PCA in Java, using Weka or an alternative library?
Added a bounty for this one.
After digging into the Weka code, the bottleneck is creating the covariance matrix and then calculating the eigenvectors of that matrix. Even switching to a sparse matrix implementation (I used COLT's SparseDoubleMatrix2D) did not help.
The solution I came up with was to first reduce the dimensionality with a fast method (I used the information gain ranker plus filtering based on document frequency), and then run PCA on the reduced data to reduce it further.
The code is more complex, but it essentially comes down to this:
// First pass: rank attributes by information gain and keep only the top candidates.
Ranker ranker = new Ranker();
InfoGainAttributeEval ig = new InfoGainAttributeEval();
Instances instances = SamplesManager.asWekaInstances(trainSet);
ig.buildEvaluator(instances);
int[] firstAttributes = ranker.search(ig, instances);
int[] candidates = Arrays.copyOfRange(firstAttributes, 0, FIRST_SIZE_REDUCTION);
instances = reduceDimensions(instances, candidates); // my own helper that drops all other attributes
// Second pass: run PCA on the already reduced data.
PrincipalComponents pca = new PrincipalComponents();
pca.setVarianceCovered(var);
ranker = new Ranker();
ranker.setNumToSelect(numFeatures);
AttributeSelection selection = new AttributeSelection();
selection.setEvaluator(pca);
selection.setSearch(ranker);
selection.SelectAttributes(instances);
instances = selection.reduceDimensionality(instances);
However, this method scored worse than using greedy information gain and a ranker when I cross-validated for estimated accuracy.
It looks like you're using the default configuration for PCA, and judging by the long runtime, it is likely doing far more work than you need for your purposes.
Take a look at the options for PrincipalComponents.
I'm not sure if -D means it will normalize the data for you or if you have to do it yourself. You want your data to be normalized (centered about the mean), though, so I would do this manually first.
-R sets the amount of variance you want accounted for. The default is 0.95. The correlation in your data might not be that strong, so try setting it lower, to something like 0.8.
-A sets the maximum number of attributes to include. I presume the default is all of them. Again, you should try setting it to something lower.
I suggest starting out with very lax settings (e.g. -R 0.1 and -A 2) and then working your way up to acceptable results.
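A sketch of applying those flags programmatically (PrincipalComponents implements Weka's OptionHandler, so the command-line flags can be passed as options; exact option behaviour may vary between Weka versions):
import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.PrincipalComponents;
import weka.attributeSelection.Ranker;
import weka.core.Instances;
import weka.core.Utils;

public class PcaWithLaxSettings {
    public static Instances reduce(Instances instances) throws Exception {
        PrincipalComponents pca = new PrincipalComponents();
        // Pass the command-line flags directly; here -R 0.8 and -A 2 as suggested above.
        pca.setOptions(Utils.splitOptions("-R 0.8 -A 2"));

        AttributeSelection selection = new AttributeSelection();
        selection.setEvaluator(pca);
        selection.setSearch(new Ranker());
        selection.SelectAttributes(instances);
        return selection.reduceDimensionality(instances);
    }
}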
For the construction of your covariance matrix, you can use the following formula, which is also used by MATLAB. It is faster than the Apache library.
Here, the data matrix is an m x n matrix (m --> #databaseFaces).
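Presumably the formula meant here is the standard mean-centered sample covariance, C = (1/(m-1)) * (X - mean)^T * (X - mean), which is what MATLAB's cov computes for an m x n data matrix. A dependency-free sketch of that computation (names illustrative):
public class CovarianceExample {

    // Returns the n x n sample covariance matrix of an m x n data matrix
    // (m rows = samples/faces, n columns = variables).
    static double[][] covariance(double[][] x) {
        int m = x.length;
        int n = x[0].length;

        // Column means.
        double[] mean = new double[n];
        for (double[] row : x) {
            for (int j = 0; j < n; j++) {
                mean[j] += row[j] / m;
            }
        }

        // C = (1 / (m - 1)) * (X - mean)^T * (X - mean)
        double[][] c = new double[n][n];
        for (double[] row : x) {
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < n; j++) {
                    c[i][j] += (row[i] - mean[i]) * (row[j] - mean[j]) / (m - 1);
                }
            }
        }
        return c;
    }
}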
Does anyone know of any peakfitting libraries for Java?
I want to give it an array of data, and it tell me the position of the peak.
eg for this data:
x, y
-5, 0.875333026
-4, 0.885868909
-3, 0.895851362
-2, 0.903971085
-1, 0.908274124
0, 0.907117054
1, 0.901894046
2, 0.894918547
3, 0.887651936
4, 0.880114302
5, 0.872150014
it will say that the peak is at (about) -0.75
I'll probably just want to fit a Gaussian, or maybe a split Gaussian...
I've tagged it as curve-fitting, not peak-fitting or peak-finding as I don't have enough reputation to make new tags...
edit: I would prefer Apache (or compatible) licensed code...
Do you want to determine the position of peak by least-squares fitting?
I think the most popular method for this is Levenberg–Marquardt algorithm.
I don't know any Java libraries, but I'd search for terms like: nonlinear curve fitting, nonlinear least-squares, Levenberg–Marquardt or just Marquardt method. You may also consider coding it yourself. If you have a library for matrix manipulations it is like 20-30 lines of code (see Numerical Recipes).
Finally, there is my own program for peak detection and peak fitting (where a peak means a bell-shaped curve), released under the GPL. It includes a library (libfityk) and SWIG-based bindings to this library for Python and Lua. Someone reported generating Java bindings as well and using libfityk from Java. But honestly, it may be overkill for your needs.
Not sure if you've found an answer to this yet, but you can certainly make use of Apache Commons Math (http://commons.apache.org/math/index.html). It has a couple of curve-fitting classes as well as an implementation of the Levenberg–Marquardt algorithm.
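For the Gaussian case specifically, newer Commons Math releases (3.3 and later) ship a ready-made GaussianCurveFitter. Here is a sketch using the data from the question, where the fitted mean is the estimated peak position; the baseline is subtracted first because the Gaussian model has no constant offset:
import org.apache.commons.math3.fitting.GaussianCurveFitter;
import org.apache.commons.math3.fitting.WeightedObservedPoints;

public class PeakFitExample {
    public static void main(String[] args) {
        double[] x = {-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5};
        double[] y = {0.875333026, 0.885868909, 0.895851362, 0.903971085,
                      0.908274124, 0.907117054, 0.901894046, 0.894918547,
                      0.887651936, 0.880114302, 0.872150014};

        // The Gaussian model has no constant offset, so subtract the baseline first.
        double baseline = Double.MAX_VALUE;
        for (double v : y) {
            baseline = Math.min(baseline, v);
        }

        WeightedObservedPoints points = new WeightedObservedPoints();
        for (int i = 0; i < x.length; i++) {
            points.add(x[i], y[i] - baseline);
        }

        // Fitted parameters are {normalization, mean, sigma}; the mean is the peak position.
        double[] fit = GaussianCurveFitter.create().fit(points.toList());
        System.out.println("Peak at x = " + fit[1]);
    }
}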