How does WEKA normalize attributes?

Suppose I load a dataset into WEKA and apply a normalization filter so that the attribute values lie between 0 and 1. Suppose further that the normalization is done by dividing by the maximum value, and that the model is then built. What happens if I deploy the model and a new instance to be classified has a feature value larger than the maximum in the training set? How is such a situation handled? Is the value simply set to 1, does it become larger than 1, or is an exception thrown?

The documentation doesn't specify this for filters in general, so it must depend on the filter. I looked at the source code of weka.filters.unsupervised.attribute.Normalize, which I assume you are using, and I don't see any bounds checking in it.
The actual scaling code is in the Normalize.convertInstance() method:
value = (vals[j] - m_MinArray[j]) / (m_MaxArray[j] - m_MinArray[j])
* m_Scale + m_Translation;
Barring any (unlikely) additional checks outside this method, I'd say that it will scale to a value greater than 1 in the situation that you describe. To be 100% sure, your best bet is to write a test case, invoke the filter yourself, and find out. With libraries that haven't specified their behaviour in the Javadoc, you never know what the next release will do. So if you depend heavily on a particular behaviour, it's not a bad idea to write an automated test that regression-tests the behaviour of the library.
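To make the arithmetic concrete, here is a minimal sketch that applies the same formula in plain Java, without WEKA. The class name, the training range of 10 to 20, and the scale and translation values of 1 and 0 are assumptions chosen for illustration; this shows what the quoted line computes, not what the filter as a whole is guaranteed to do.

```java
final class MinMaxSketch {
    // Same arithmetic as the quoted line from Normalize.convertInstance().
    static double normalize(double value, double min, double max,
                            double scale, double translation) {
        return (value - min) / (max - min) * scale + translation;
    }

    public static void main(String[] args) {
        double min = 10.0; // hypothetical training-set minimum
        double max = 20.0; // hypothetical training-set maximum

        // A value inside the training range lands inside [0, 1].
        System.out.println(normalize(15.0, min, max, 1.0, 0.0)); // prints 0.5

        // A value above the training maximum is not clamped by this formula.
        System.out.println(normalize(25.0, min, max, 1.0, 0.0)); // prints 1.5
    }
}
```

A regression test against the real filter would follow the same pattern: fit on the training data, push one out-of-range instance through, and assert on the result.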

I had the same question. Here is what I did; maybe it can help you.
I assume you use weka.filters.unsupervised.attribute.Normalize to normalize your data.
As Erwin Bolwidt said, WEKA uses
value = (vals[j] - m_MinArray[j]) / (m_MaxArray[j] - m_MinArray[j])
* m_Scale + m_Translation;
to normalize each attribute.
Don't forget that the Normalize class has these two methods:
public double[] getMinArray()
public double[] getMaxArray()
which return the calculated minimum and maximum values for the attributes in the data.
You can store these minimum and maximum values and then apply the formula yourself to normalize new data.
Remember that you can set attribute values on an Instance, and you can classify your result with Evaluation.evaluationForSingleInstance.
I'll give you the link later; I hope this helps.
Thank you
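A rough sketch of the "store the ranges and normalize yourself" idea follows. It is plain Java and does not call WEKA; the two arrays stand in for whatever getMinArray() and getMaxArray() returned after training. The class name, the clamp flag, and the choice to map a constant attribute to 0 are my own assumptions, not something the answer above specifies.

```java
import java.util.Arrays;

final class StoredRangeNormalizer {
    private final double[] min;
    private final double[] max;

    // min and max are the per-attribute ranges saved after training.
    StoredRangeNormalizer(double[] min, double[] max) {
        this.min = min.clone();
        this.max = max.clone();
    }

    // Applies (v - min) / (max - min) per attribute; optionally clamps to [0, 1].
    double[] normalize(double[] vals, boolean clamp) {
        double[] out = new double[vals.length];
        for (int j = 0; j < vals.length; j++) {
            double range = max[j] - min[j];
            // Assumption: a constant attribute maps to 0 to avoid dividing by zero.
            double v = (range == 0.0) ? 0.0 : (vals[j] - min[j]) / range;
            if (clamp) {
                v = Math.max(0.0, Math.min(1.0, v));
            }
            out[j] = v;
        }
        return out;
    }

    public static void main(String[] args) {
        StoredRangeNormalizer n = new StoredRangeNormalizer(
                new double[] {0.0, 10.0}, new double[] {4.0, 20.0});
        double[] unseen = {6.0, 15.0}; // first value exceeds the training maximum
        System.out.println(Arrays.toString(n.normalize(unseen, false))); // [1.5, 0.5]
        System.out.println(Arrays.toString(n.normalize(unseen, true)));  // [1.0, 0.5]
    }
}
```

Doing the scaling yourself makes the out-of-range policy an explicit decision (leave the value above 1, or clamp it) instead of something inherited from the library.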

Related

Incorrect class prediction using Weka

I am using the WEKA API, weka-stable-3.8.1.
I have been trying to use the J48 decision tree (WEKA's implementation of C4.5).
My data has around 22 features and a nominal class with two possible values: yes or no.
When I evaluate with the following code:
Classifier model = (Classifier) weka.core.SerializationHelper.read(trainedModelDestination);
Evaluation evaluation = new Evaluation(trainingInstances);
evaluation.evaluateModel(model, testingInstances);
System.out.println("Number of correct predictions : "+evaluation.correct());
I get all predictions correct.
But when I classify these test cases individually using:
for (Instance i : testingInstances) {
    double predictedClassLabel = model.classifyInstance(i);
    System.out.println("predictedClassLabel : " + predictedClassLabel);
}
I always get the same output, i.e. 0.0.
Why is this happening?
If the provided snippet is indeed from your code, you seem to be always classifying the first test instance: "testingInstances.firstInstance()".
Rather, you may want to make a loop to classify each test instance.
for (Instance i : testingInstances) {
    double predictedClassLabel = model.classifyInstance(i);
    System.out.println("predictedClassLabel : " + predictedClassLabel);
}
Should have updated much sooner.
Here's how I fixed this:
During the training phase, the model learns from your training set. While learning from this set it encounters categorical/nominal features as well.
Most algorithms require numerical values to work. To deal with this, the algorithm maps each nominal value to a specific numerical value.
Since the algorithm has learned this mapping during the training phase, the Instances object holds this information. During the testing phase you have to use the same Instances object that was created during the training phase. Otherwise, the classifier will not correctly map your nominal values to their expected values.
Note:
This kind of encoding gives biased training results in non-tree-based models; something like one-hot encoding should be used in such cases.
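The effect described above can be illustrated without WEKA. For a nominal class, the double returned by classifyInstance is the index of the predicted value in the class attribute's declared list of values, so its meaning depends on the header it is read against. The sketch below uses plain lists as stand-ins for two headers; the class name, method names, and label orders are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

final class NominalIndexSketch {
    // The numeric code of a label is its position in the declared value list.
    static double encode(List<String> declaredValues, String label) {
        return declaredValues.indexOf(label);
    }

    // Turns a predicted index back into a label using a given header.
    static String decode(List<String> declaredValues, double prediction) {
        return declaredValues.get((int) prediction);
    }

    public static void main(String[] args) {
        List<String> trainingHeader = Arrays.asList("no", "yes"); // order seen in training
        List<String> rebuiltHeader = Arrays.asList("yes", "no");  // order from a re-created dataset

        // The same prediction 0.0 means different labels under different headers.
        System.out.println(decode(trainingHeader, 0.0)); // prints no
        System.out.println(decode(rebuiltHeader, 0.0));  // prints yes
    }
}
```

This is why a prediction of 0.0 is not a label by itself, and why test instances should be built against the header used during training.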

Automate real time data using java

I am a newbie to automation and Java. I am working on a problem that requires me to read real-time stock market data from the database and verify it against the value seen on the UI. I am OK with deviations of up to 5% in the value. To verify that these tests have passed, it's important for me to assert the database values against the value in the UI.
I have some simple logic to verify these values. I wanted to know whether this is a good way of coding it in Java, or whether there is a better way to achieve these results.
Algorithm:
1. I read the int/float value from the DB.
2. Calculate 5% of the value in step 1.
3. Get the value in the UI and assert that it's greater than or equal to the value in step 2.
4. If greater, I call Assert.assertEquals(true, true); otherwise I fail my assert.
If there is a better way to handle these values, I'd appreciate a better answer.
It's more usual to have your assertion represent the meaning of your test; having to assert(true, true) does not do this. So:
3. Calculate the absolute difference between the value obtained in step 1 and the UI value (the UI value might be higher or lower than the DB value, so the difference must always be made positive).
4. Assert.assertTrue(difference < theFivePercentValue)
Also you could consider using the Hamcrest extension to JUnit that includes a closeTo() method.
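The revised steps can be sketched in plain Java as follows. The class and method names are hypothetical, the tolerance is expressed as a fraction (0.05 for 5%), and the comparison is inclusive, which is my choice. In a real test the boolean would be passed to the test framework's assertTrue, or replaced by Hamcrest's closeTo(expected, allowedError).

```java
final class ToleranceCheck {
    // True when actual lies within the given fraction of expected, above or below.
    static boolean withinTolerance(double expected, double actual, double fraction) {
        double allowed = Math.abs(expected) * fraction;   // step 2: 5% of the DB value
        double difference = Math.abs(actual - expected);  // step 3: always positive
        return difference <= allowed;                     // step 4: the condition to assert
    }

    public static void main(String[] args) {
        double dbValue = 200.0; // hypothetical value read from the database
        System.out.println(withinTolerance(dbValue, 209.0, 0.05)); // prints true
        System.out.println(withinTolerance(dbValue, 189.0, 0.05)); // prints false
    }
}
```

Keeping the comparison in one named method means a failing test reports that the UI value was out of tolerance, which says more than a failed assertEquals(true, true).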

Solving a non linear system in java (using optim toolbox)

I have a system of nonlinear dynamics which I wish to solve to optimality. I know how to do this in MATLAB, but I wish to implement it in Java. For some reason I'm lost as to how to do it in Java.
What I have is the following:
z(t) which returns states in a dynamic system.
z(t) = [state1(t),...,state10(t)]
The rate of change of this dynamic system is given by:
z'(t) = f(z(t),u(t),d(t)) = [dstate1(t)/dt,...,dstate10(t)/dt]
where u(t) and d(t) are external variables whose values I know.
In addition I have a function, let's denote it g(t), which is defined from a state variable:
g(t) = state4(t)/c1
where c1 is some constant.
Now I wish to solve the following unconstrained nonlinear system numerically:
g(t) - c2 = 0
f(z(t),u(t),0)= 0
where c2 is some constant. The above system can be seen as a simple f'(x) = 0 problem consisting of 11 equations and 1 unknown, and if I were to solve this in MATLAB I would do the following:
[output] = fsolve(@myDerivatives, someInitialGuess);
I am aware that Java doesn't come with any built-in solvers. So as I see it, there are two options for solving the above-mentioned problem:
Option 1: Do it myself. I could use a numerical method such as Gauss-Newton or similar to solve this system of nonlinear equations. However, I will start by using a Java toolbox first, and then move to a numerical method afterwards.
Option 2: Solvers (e.g. commons optim). This is the solution I would like to look into. I have been looking into this toolbox; however, I have failed to find an exact example of how to actually use the MultivariateFunction evaluator and the numerical optimizer. Do any of you have experience in doing so?
Please let me know if you have any ideas or suggestions for solving this problem.
Thanks!
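For reference, the "do it myself" route from Option 1 can be sketched as a plain Newton iteration with a finite-difference Jacobian. This is a minimal illustration that assumes a square system (as many equations as unknowns), which may not match the system above; an overdetermined system would need a least-squares method such as Gauss-Newton instead. The class name, step size, tolerance, and the 2x2 test system are all hypothetical.

```java
import java.util.function.UnaryOperator;

final class NewtonSketch {
    // Solves f(x) = 0 for a square system, starting from the given guess.
    static double[] solve(UnaryOperator<double[]> f, double[] guess, int maxIter, double tol) {
        double[] x = guess.clone();
        int n = x.length;
        for (int iter = 0; iter < maxIter; iter++) {
            double[] fx = f.apply(x);
            double norm = 0.0;
            for (double v : fx) norm = Math.max(norm, Math.abs(v));
            if (norm < tol) return x;

            // Forward-difference Jacobian: column j is (f(x + h e_j) - f(x)) / h.
            double[][] a = new double[n][n + 1];
            for (int j = 0; j < n; j++) {
                double h = 1e-7 * Math.max(1.0, Math.abs(x[j]));
                double[] xh = x.clone();
                xh[j] += h;
                double[] fxh = f.apply(xh);
                for (int i = 0; i < n; i++) a[i][j] = (fxh[i] - fx[i]) / h;
            }
            for (int i = 0; i < n; i++) a[i][n] = -fx[i]; // right-hand side of J * step = -f

            double[] step = gaussianSolve(a);
            for (int i = 0; i < n; i++) x[i] += step[i];
        }
        throw new IllegalStateException("Newton iteration did not converge");
    }

    // Solves the augmented system [J | -f] by Gaussian elimination with partial pivoting.
    static double[] gaussianSolve(double[][] a) {
        int n = a.length;
        for (int col = 0; col < n; col++) {
            int pivot = col;
            for (int r = col + 1; r < n; r++) {
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) pivot = r;
            }
            double[] tmp = a[col]; a[col] = a[pivot]; a[pivot] = tmp;
            if (a[col][col] == 0.0) throw new ArithmeticException("singular Jacobian");
            for (int r = col + 1; r < n; r++) {
                double factor = a[r][col] / a[col][col];
                for (int c = col; c <= n; c++) a[r][c] -= factor * a[col][c];
            }
        }
        double[] x = new double[n];
        for (int i = n - 1; i >= 0; i--) {
            double sum = a[i][n];
            for (int c = i + 1; c < n; c++) sum -= a[i][c] * x[c];
            x[i] = sum / a[i][i];
        }
        return x;
    }

    public static void main(String[] args) {
        // Hypothetical test system: x^2 + y^2 = 4 and x = y.
        UnaryOperator<double[]> f =
                v -> new double[] {v[0] * v[0] + v[1] * v[1] - 4.0, v[0] - v[1]};
        double[] root = solve(f, new double[] {1.0, 1.0}, 50, 1e-10);
        System.out.println(root[0] + ", " + root[1]); // both close to sqrt(2)
    }
}
```

Like fsolve, this only finds a root near the initial guess, and it fails if the Jacobian becomes singular along the way.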
Please compare what your original problem looks like:
A global optimization problem
minimize f(y)
is solved by looking for solutions of the derivatives system
0=grad f(y) or 0=df/dy (partial derivatives)
(the gradient is the column vector containing all partial derivatives), that is, you are computing the "flat" or horizontal points of f(y).
For optimization under constraints
minimize f(y,u) such that g(y,u)=0
one builds the Lagrangian functional
L(y,p,u) = f(y,u)+p*g(y,u) (scalar product)
and then computes the flat points of that system, that is
g(y,u)=0, dL/dy(y,p,u)=0, dL/du(y,p,u)=0
After that, as in the global optimization case, you have to determine what type of flat point it is: maximum, minimum or saddle point.
Optimal control problems have the structure (one of several equivalent variants)
minimize integral(0,T) f(t,y(t),u(t)) dt
such that y'(t)=g(t,y(t),u(t)), y(0)=y0 and h(T,y(T))=0
To solve it, one considers the Hamiltonian
H(t,y,p,u)=f(t,y,u)-p*g(t,y,u)
and obtains the transformed problem
y' = -dH/dp = g, (partial derivatives, gradient)
p' = dH/dy,
with boundary conditions
y(0)=y0, p(T)= something with dh/dy(T,y(T))
u(t) realizes the minimum in v -> H(t,y(t),p(t),v)

Eigenvalue and the corresponding EigenVector in Java

Given a Matrix, I'm interested in the Eigenvalues and the corresponding Eigenvector.
Using Jama, I can get the Eigenvalues and the Eigenvectors, yet the correlation between the two is not defined: I want to map each Eigenvector to the corresponding Eigenvalue.
Can you please recommend a way to do so? I tried to implement it myself, but it got nasty.
Thanks :)
I am still looking for an authoritative answer, but for now, according to the experiments and observations I performed, the eigenvectors and eigenvalues seem to correspond.
Usually they are presented in corresponding order. But you can always multiply the matrix by an eigenvector and see what multiplier it applies to the vector. That multiplier is the eigenvalue.
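The multiplication check just described can be sketched in plain Java. The class name and the 2x2 matrix are hypothetical; with Jama, the vector would be column k of getV() and the candidate eigenvalue would be entry k of getRealEigenvalues(), assuming real eigenvalues.

```java
final class EigenPairCheck {
    // Largest absolute component of A*v - lambda*v; near zero for a true eigenpair.
    static double residual(double[][] a, double[] v, double lambda) {
        double worst = 0.0;
        for (int i = 0; i < a.length; i++) {
            double av = 0.0;
            for (int j = 0; j < v.length; j++) {
                av += a[i][j] * v[j];
            }
            worst = Math.max(worst, Math.abs(av - lambda * v[i]));
        }
        return worst;
    }

    public static void main(String[] args) {
        // Symmetric test matrix with eigenvalues 3 (vector [1, 1]) and 1 (vector [1, -1]).
        double[][] a = {{2.0, 1.0}, {1.0, 2.0}};
        double[] v = {1.0, 1.0};
        System.out.println(residual(a, v, 3.0)); // prints 0.0, so 3 belongs to this vector
        System.out.println(residual(a, v, 1.0)); // prints 2.0, so 1 does not
    }
}
```

With computed eigenvectors the residual will be tiny but not exactly zero, so compare it against a small tolerance.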
I asked the developer of Weka by mail regarding the above issue, and they confirmed the assumption:
The eigenvectors are indeed provided in the same order as the eigenvalues.
Use a HashMap to store them? I'm not sure this answer is relevant, given that the question is a bit vague.

Handling division by zero in graphics code

I'm writing a library for procedural image generation (Clisk) which allows users to define their own mathematical functions to generate images.
It's clearly possible for them to define a function which could result in a divide by zero for some pixels, e.g. (pseudocode)
red = 1.0 / (xposition - 0.5)
This would result in a divide by zero whenever xposition = 0.5 (the middle of the image)
Ideally I don't want image generation to crash... but at the same time I don't want to create a clunky hack to ignore divide by zeros that will cause problems later.
What would be a good, robust, systematic approach to handling these cases?
Ideally I don't want image generation to crash... but at the same time I don't want to create a clunky hack to ignore divide by zeros that will cause problems later.
(I'm assuming you mean the snippet to be an example of some user-supplied code ...)
Clearly, if the user-supplied code could throw exceptions, then you can't stop that happening. (And the advice to check before division is obviously irrelevant ... to you.)
So what could you do apart from "crash"? Generate an empty image? Ignore the user's function? You'd be producing garbage ... and that's not what the user needs.
You certainly can't reach in and fix his / her Java code. (And if that snippet is meant to be code written in some custom language, then you can't reach in and correct that either. You / your library doesn't know what the user-supplied code should be doing ...)
No. I reckon that the best answer is to wrap any unexpected (unchecked) exceptions coming out of the user-supplied code in an exception of your own that tells the user clearly that the error occurred in his code. It is then up to the application code calling your library code whether to deal with the exception or "crash".
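A minimal sketch of that wrapping approach follows, assuming the user function is passed in as a DoubleUnaryOperator. The exception class, the evaluator class, and the sample function are hypothetical. Note that in Java only integer division by zero throws; floating-point division by zero yields Infinity or NaN, so the sample function deliberately uses integer arithmetic to trigger the failure.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical library exception that marks a failure inside user-supplied code.
final class UserFunctionException extends RuntimeException {
    UserFunctionException(String message, Throwable cause) {
        super(message, cause);
    }
}

final class SafeEvaluator {
    // Runs the user function and rewraps any unchecked exception it throws.
    static double evaluate(DoubleUnaryOperator userFunction, double x) {
        try {
            return userFunction.applyAsDouble(x);
        } catch (RuntimeException e) {
            throw new UserFunctionException(
                    "User-supplied function failed at x = " + x, e);
        }
    }

    public static void main(String[] args) {
        // Integer division by zero at x = 0.5 throws ArithmeticException.
        DoubleUnaryOperator red = x -> 100 / (int) (x * 2 - 1);
        try {
            evaluate(red, 0.5);
        } catch (UserFunctionException e) {
            System.out.println(e.getMessage() + " (cause: " + e.getCause() + ")");
        }
    }
}
```

The calling application can then catch the library's own exception type and decide whether to abort, skip the pixel, or report the problem to the user.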
If you are asking for a "good, robust, systematic approach" for users to write their functions, I think you are barking up the wrong tree. And it is not really your concern ...
I'm not a graphics programmer really, but you could do
private static final double MIN_X = 0.0000001;
red = 1.0 / Math.max(xpos - 0.5, MIN_X);
Obviously, you will probably have to add an absolute value in there if you allow negatives.
You could always just supply a parameter asking them what to do on divide-by-zero. It's their code, after all - they should know what's best for their case.
Then the question becomes, what's a reasonable default for that parameter? I'd say "return 0.0" or "throw an exception" are both reasonable for this application. Just make sure you document it.
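One way to picture the parameter idea is a division helper that takes the policy as an argument. This is only a sketch: the class, the enum, and its two values are hypothetical names that mirror the two defaults suggested above, and the exact-zero comparison is my simplification.

```java
final class SafeDivision {
    // What to do when the denominator is exactly zero.
    enum ZeroPolicy { RETURN_ZERO, THROW }

    static double divide(double numerator, double denominator, ZeroPolicy policy) {
        if (denominator == 0.0) {
            if (policy == ZeroPolicy.THROW) {
                throw new ArithmeticException("division by zero");
            }
            return 0.0; // RETURN_ZERO: substitute a documented default
        }
        return numerator / denominator;
    }

    public static void main(String[] args) {
        double xposition = 0.5; // the middle of the image
        System.out.println(divide(1.0, xposition - 0.5, ZeroPolicy.RETURN_ZERO)); // prints 0.0
        System.out.println(divide(1.0, 0.75 - 0.5, ZeroPolicy.RETURN_ZERO));      // prints 4.0
    }
}
```

Whichever default is chosen, documenting it is what keeps the behaviour from being a hidden hack.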
