Computing for Sample Standard Deviation

In the code below I wanted to get the sample standard deviation, but I got the population standard deviation instead. What am I doing wrong?
public void compute(View view) {
no1 = Double.parseDouble(et1.getText().toString());
no2 = Double.parseDouble(et2.getText().toString());
no3 = Double.parseDouble(et3.getText().toString());
m = (no1 + no2 + no3)/3;
mm1= (no1-m);
mm1 = mm1*mm1;
mm2= (no2-m);
mm2 = mm2*mm2;
mm3= (no3-m);
mm3 = mm3*mm3;
std = (mm1+mm2+mm3)/3;
tv1.setText(String.valueOf(Math.sqrt(std)));
}

If you're trying to estimate the standard deviation of a population from a random sample of that population (the "sample standard deviation"), then the calculation is almost the same, but the divisor needs to be decreased by one.
In other words, your sample size is three so you need to divide by two in order to adjust for the fact you're working from a sample and not the entire population. So your final line of calculation needs to look like this:
std = (mm1 + mm2 + mm3) / 2;
You can find numerous pages online which give a detailed explanation about the difference between population and sample standard deviation, such as this article on macroption.com.
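For any number of inputs, the same idea can be sketched as below. This is only an illustration of dividing by n - 1; the class and method names are made up for the example and are not part of the original code.

```java
public class SampleStdDevSketch {
    // Sample standard deviation: divide the sum of squared deviations by (n - 1).
    public static double sampleStdDev(double... values) {
        int n = values.length;
        if (n < 2) {
            throw new IllegalArgumentException("need at least two values");
        }
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        double mean = sum / n;
        double sumOfSquares = 0;
        for (double v : values) {
            sumOfSquares += (v - mean) * (v - mean);
        }
        return Math.sqrt(sumOfSquares / (n - 1));
    }

    public static void main(String[] args) {
        // mean = 4, squared deviations sum to 8, 8 / (3 - 1) = 4, sqrt = 2.0
        System.out.println(sampleStdDev(2, 4, 6));
    }
}
```

Dividing by n instead would give the population value (about 1.63 for the same three numbers).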

Related

FASTEST way to truncate a float in Java

I have a program that takes in anywhere from 20,000 to 500,000 velocity vectors and must output these vectors multiplied by some scalar. The program allows the user to set a variable accuracy, which is basically just how many decimal places to truncate to in the calculations. The program is quite slow at the moment, and I discovered that it's not because of multiplying a lot of numbers, it's because of the method I'm using to truncate floating point values.
I've already looked at several solutions on here for truncating decimals, like this one, and they mostly recommend DecimalFormat. This works great for formatting decimals once or twice to print nice user output, but is far too slow for hundreds of thousands of truncations that need to happen in a few seconds.
What is the most efficient way to truncate a floating-point value to n number of places, keeping execution time at utmost priority? I do not care whatsoever about resource usage, convention, or use of external libraries. Just whatever gets the job done the fastest.
EDIT: Sorry, I guess I should have been more clear. Here's a very simplified version of what I'm trying to illustrate:
import java.util.*;
import java.lang.*;
import java.text.DecimalFormat;
import java.math.RoundingMode;
public class MyClass {
static class Vector{
float x, y, z;
@Override
public String toString(){
return "[" + x + ", " + y + ", " + z + "]";
}
}
public static ArrayList<Vector> generateRandomVecs(){
ArrayList<Vector> vecs = new ArrayList<>();
Random rand = new Random();
for(int i = 0; i < 500000; i++){
Vector v = new Vector();
v.x = rand.nextFloat() * 10;
v.y = rand.nextFloat() * 10;
v.z = rand.nextFloat() * 10;
vecs.add(v);
}
return vecs;
}
public static void main(String args[]) {
int precision = 2;
float scalarToMultiplyBy = 4.0f;
ArrayList<Vector> velocities = generateRandomVecs();
System.out.println("First 10 raw vectors:");
for(int i = 0; i < 10; i++){
System.out.print(velocities.get(i) + " ");
}
/*
This is the code that I am concerned about
*/
DecimalFormat df = new DecimalFormat("##.##");
df.setRoundingMode(RoundingMode.DOWN);
long start = System.currentTimeMillis();
for(Vector v : velocities){
/* Highly inefficient way of truncating*/
v.x = Float.parseFloat(df.format(v.x * scalarToMultiplyBy));
v.y = Float.parseFloat(df.format(v.y * scalarToMultiplyBy));
v.z = Float.parseFloat(df.format(v.z * scalarToMultiplyBy));
}
long finish = System.currentTimeMillis();
long timeElapsed = finish - start;
System.out.println();
System.out.println("Runtime: " + timeElapsed + " ms");
System.out.println("First 10 multiplied and truncated vectors:");
for(int i = 0; i < 10; i++){
System.out.print(velocities.get(i) + " ");
}
}
}
The reason it is very important to do this is that a different part of the program will store trigonometric values in a lookup table. The lookup table will be generated to n places beforehand, so any velocity vector that has a float value to 7 places (e.g. 5.2387471) must be truncated to n places before lookup. Truncation is needed instead of rounding because, in the context of this program, it is OK if a vector is slightly less than its true value, but not greater.
Lookup table for 2 decimal places:
...
8.03 -> -0.17511085919
8.04 -> -0.18494742685
8.05 -> -0.19476549993
8.06 -> -0.20456409661
8.07 -> -0.21434223706
...
Say I wanted to look up the cosines of each element in the vector {8.040844, 8.05813164, 8.065688} in the table above. Obviously, I can't look up these values directly, but I can look up {8.04, 8.05, 8.06} in the table.
What I need is a very fast method to go from {8.040844, 8.05813164, 8.065688} to {8.04, 8.05, 8.06}

The fastest way, which will introduce rounding error, is going to be to multiply by 10^n, call Math.rint (or cast to an integer type if you need truncation toward zero rather than rounding to the nearest value), and divide by 10^n.
That's...not really all that helpful, though, considering the introduced error, and -- more importantly -- that it doesn't actually buy anything. Why drop decimal places if it doesn't improve efficiency or anything? If it's about making the values shorter for display or the like, truncate then, but until then, your program will run as fast as possible if you just use full float precision.
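If truncation really is required for the table lookup, a minimal sketch of the multiply, cast, divide approach might look like this. The class and method names are hypothetical, and it assumes the scaled value fits in an int; the scale (10^n) is passed in so it can be computed once outside the hot loop.

```java
public class TruncateSketch {
    // Truncates toward zero by scaling, casting to int (which drops the fraction),
    // and scaling back. `scale` is 10^n, e.g. 100f for two decimal places.
    public static float truncate(float value, float scale) {
        return (int) (value * scale) / scale;
    }

    public static void main(String[] args) {
        float scale = (float) Math.pow(10, 2); // computed once, not per element
        float[] inputs = {8.040844f, 8.05813164f, 8.065688f};
        for (float v : inputs) {
            System.out.println(truncate(v, scale));
        }
    }
}
```

Because floats are binary, a value that sits exactly on a decimal boundary may be stored slightly below it and truncate one step lower than expected, so treat this as a starting point rather than a guarantee.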

Trying to convert this formula into an arithmetic expression in Java

I'm trying to take user input in the form of myMonthlyPayment, myAnnualInterestRate, and myPrincipal in order to calculate the number of months needed to pay off debt, using the formula I've attached to this post. What I have in Eclipse for the formula right now is:
monthsNeeded = ((Math.log(myMonthlyPayment) - Math.log(myMonthlyPayment)
- ((myAnnualInterestRate / 1200.0) * myPrincipal))
/ ((Math.log(myAnnualInterestRate) / 1200.0) + 1.0));
I should be getting an output of 79 months with the inputs I'm using but instead I'm getting -62. I know the formula is correct, I'm almost positive I've made a mistake somewhere in the translation of it into Java. If someone could point it out that would be greatly appreciated!

So I've fixed it, with a sample input and output.
I didn't put much effort into making this code beautiful but you can see that even separating it into 3 parts using method extraction (although I didn't know how to name them, lacking the domain knowledge) made the code easier to understand.
public class Example {
public static void main(String[] args) {
double myMonthlyPayment = 2000;
double myAnnualInterestRate = 5;
double myPrincipal = 200000;
System.out.println(a(myMonthlyPayment));
System.out.println(b(myPrincipal, myAnnualInterestRate, myMonthlyPayment));
System.out.println(c(myAnnualInterestRate));
double monthsNeeded = (a(myMonthlyPayment) - b(myPrincipal, myAnnualInterestRate, myMonthlyPayment))
/ c(myAnnualInterestRate);
System.out.println(monthsNeeded);
}
private static double c(double myAnnualInterestRate) {
return Math.log((myAnnualInterestRate / 1200.0) + 1);
}
private static double b(double myPrincipal, double myAnnualInterestRate, double myMonthlyPayment) {
return Math.log(myMonthlyPayment - (myAnnualInterestRate / 1200.0) * myPrincipal);
}
private static double a(double myMonthlyPayment) {
return Math.log(myMonthlyPayment);
}
}

I think this is what you're looking for:
monthsNeeded = (Math.log(myMonthlyPayment) - Math.log(myMonthlyPayment - myAnnualInterestRate / 1200d * myPrincipal)) / Math.log(myAnnualInterestRate / 1200d + 1);
It seems that, in your solution, you weren't calculating your myAnnualInterestRate/1200*myPrincipal inside your second Math.log(...). You had also left some calculations outside of Math.log(...) in the bottom half of your equation.
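For reference, the formula that the code in the answers implements can be written out as follows. It is reconstructed here from that code, since the attached image is not reproduced in the post; M stands for myMonthlyPayment, r for myAnnualInterestRate (in percent), P for myPrincipal, and n for monthsNeeded.

```latex
n = \frac{\ln(M) - \ln\!\left(M - \frac{r}{1200}\,P\right)}{\ln\!\left(\frac{r}{1200} + 1\right)}
```

Written this way, it is easier to see that both the subtraction in the numerator's second term and the addition in the denominator sit inside a logarithm.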
If you have an equation that does an operation inside a natural log, when you convert that equation to Java code, the operation needs to still be done, inside the natural log:
ln(someNumber + 10)
would be converted to:
Math.log(someNumber + 10),
NOT:
Math.log(someNumber) + 10
Hope this helps and good luck. :)

Mind helping a newcomer who hit their first speed bump?

I just started learning how to program in Java. Everything was going well until I came across this "bonus" question/problem our teacher gave us to solve as an additional "challenge".
Please click here to view the Question and the Sample input/output (it's an image file)
Note that I'm not allowed to use anything that wasn't taught or discussed in class. So things like arrays, method overloading, passing arrays to methods, parseInt, etc. are ruled out.
Here's what I was able to come up with, so far:
import java.util.Scanner;
public class Test
{
public static void main(String[] args)
{
int N; // number of lines of input
double length1, length2, length3; // the 3 lengths
double perimeter; // you get this by adding the 3 lengths
double minperimeter=0; // dummy value
Scanner input = new Scanner(System.in);
System.out.println("Enter the number of triangles you have:");
N = input.nextInt();
System.out.println("Insert the lengths of the sides of these " +
"triangles (3 real numbers per line):");
for (int counter=0; counter<N; counter++)
{
length1 = input.nextDouble();
length2 = input.nextDouble();
length3 = input.nextDouble();
perimeter = (length1 + length2 + length3);
minperimeter = Math.min(perimeter,Math.min(perimeter,perimeter));
}
System.out.printf("The minimum perimeter is %.1f%n", minperimeter);
}
}
My 2 main problems are:
1) The program only stores and works with the 'last' input.
The ones before it get replaced with this one. [update: solved this problem]
2) How do I print the "triangle number" in the final output? [update: solved this problem, too]
So, can anyone please help me come up with a solution that requires only the very basic learnings of Java? If it helps, this is the book we're using. Currently at Chapter 4. But we did learn about Math Class (which is in Chapter 5).
Update: Thank you so much for your replies, everyone! I was able to come up with a solution that does exactly what was asked in my question.

Math.min(perimeter,perimeter) will always give you perimeter. You probably wanted Math.min(perimeter, minperimeter).
Since it's a programming assignment, it's best if I don't give you the full solution to your second question, but your hint is in the counter variable of your for loop. Save it whenever you update minperimeter, so that you know in which iteration of the loop you found the minimum.
Also, initialise minperimeter to 10000 or higher. If you start at 0, Math.min will never return anything larger than 0, so no real perimeter can ever replace it.
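Since the question has been marked as solved, here is one way the two hints could fit together without arrays. It is a sketch only: the class name, the method name and the wording of the output line are made up, because the assignment text is an image that isn't reproduced here. It uses the first iteration to seed the minimum instead of a large starting value.

```java
import java.util.Locale;
import java.util.Scanner;

public class MinPerimeterSketch {
    // Reads N, then N sets of three side lengths, and reports the smallest
    // perimeter together with the 1-based number of the triangle it came from.
    public static String solve(Scanner input) {
        int n = input.nextInt();
        double minPerimeter = 0; // overwritten on the first iteration
        int minTriangle = 0;
        for (int counter = 0; counter < n; counter++) {
            double perimeter = input.nextDouble() + input.nextDouble() + input.nextDouble();
            if (counter == 0 || perimeter < minPerimeter) {
                minPerimeter = perimeter;
                minTriangle = counter + 1; // remember which iteration held the minimum
            }
        }
        return String.format(Locale.ROOT,
                "Triangle %d has the minimum perimeter %.1f", minTriangle, minPerimeter);
    }

    public static void main(String[] args) {
        // Fixed sample input so the sketch runs without typing anything.
        Scanner input = new Scanner("3\n3 4 5\n2 2 2\n2 2 2.5\n").useLocale(Locale.ROOT);
        System.out.println(solve(input)); // Triangle 2 has the minimum perimeter 6.0
    }
}
```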

Change your for loop as:
double minperimeter=-1;
for (int counter=0; counter<N; counter++)
{
length1 = input.nextDouble();
length2 = input.nextDouble();
length3 = input.nextDouble();
perimeter = (length1 + length2 + length3);
if(minperimeter == -1){
minperimeter = perimeter;
} else{
minperimeter = Math.min(perimeter, minperimeter);
}
}

You have to store the smaller perimeter in your variable perimeter.
The hint in your task tells you that any given perimeter is smaller than 1000, so initialize perimeter to 1000.
Then, in your for loop, store the smaller value:
perimeter = Math.min(perimeter, length1 + length2 + length3);
If the sum of the sides is smaller than the current perimeter, the smaller value is stored.
Please note that, according to your task, you have to input 3 doubles on one line.

Alternative Solution
Make an ArrayList, add every perimeter to it, and then find the minimum value in that list. This needs java.util.ArrayList, java.util.List and java.util.Collections, so it only applies once lists are allowed.
// named perimeters so it does not clash with the existing double perimeter
List<Double> perimeters = new ArrayList<>();
for (int counter=0; counter<N; counter++)
{
length1 = input.nextDouble();
length2 = input.nextDouble();
length3 = input.nextDouble();
perimeters.add(length1 + length2 + length3);
}
System.out.printf("The minimum perimeter is %.1f%n", Collections.min(perimeters));

ws4j returns infinity for similarity measures that should return 1

I have some very simple code taken from this example, where I am using the Lin, Path and Wu-Palmer similarity measures to compute the similarity between two words. My code is as follows:
import edu.cmu.lti.lexical_db.ILexicalDatabase;
import edu.cmu.lti.lexical_db.NictWordNet;
import edu.cmu.lti.ws4j.RelatednessCalculator;
import edu.cmu.lti.ws4j.impl.Lin;
import edu.cmu.lti.ws4j.impl.Path;
import edu.cmu.lti.ws4j.impl.WuPalmer;
public class Test {
private static ILexicalDatabase db = new NictWordNet();
private static RelatednessCalculator lin = new Lin(db);
private static RelatednessCalculator wup = new WuPalmer(db);
private static RelatednessCalculator path = new Path(db);
public static void main(String[] args) {
String w1 = "walk";
String w2 = "trot";
System.out.println(lin.calcRelatednessOfWords(w1, w2));
System.out.println(wup.calcRelatednessOfWords(w1, w2));
System.out.println(path.calcRelatednessOfWords(w1, w2));
}
}
And the scores are as expected EXCEPT when both words are identical. If both words are the same (e.g. w1 = "walk"; w2 = "walk";), the three measures I have should each return 1.0. But instead, they are returning 1.7976931348623157E308.
I have used ws4j before (the same version, in fact), but I have never seen this behavior. Searching online has not yielded any clues. What could possibly be going wrong here?
P.S. The fact that the Lin, Wu-Palmer and Path measures should return 1 can also be verified with the online demo provided by ws4j

I had a similar problem, and here's what's going on. I hope that other people who run into this problem will find my response helpful.
As you may have noticed, the online demo allows you to choose a word sense by specifying the word in the following format: word#pos_tag#word_sense. For example, the noun gender with the first word sense would be gender#n#1.
Your code snippet uses the first word sense by default. When I calculate the WuPalmer similarity between "gender" and "sex", it returns 0.26. If I use the online demo, it returns 1.0. But if we use "gender#n#1" and "sex#n#1", the online demo returns 0.26, so there is no discrepancy. The online demo calculates the max over all pos tag / word sense pairs. Here's a corresponding snippet of code that should do the trick:
ILexicalDatabase db = new NictWordNet();
WS4JConfiguration.getInstance().setMFS(true);
RelatednessCalculator rc = new Lin(db);
String word1 = "gender";
String word2 = "sex";
List<POS[]> posPairs = rc.getPOSPairs();
double maxScore = -1D;
for(POS[] posPair: posPairs) {
List<Concept> synsets1 = (List<Concept>)db.getAllConcepts(word1, posPair[0].toString());
List<Concept> synsets2 = (List<Concept>)db.getAllConcepts(word2, posPair[1].toString());
for(Concept synset1: synsets1) {
for (Concept synset2: synsets2) {
Relatedness relatedness = rc.calcRelatednessOfSynset(synset1, synset2);
double score = relatedness.getScore();
if (score > maxScore) {
maxScore = score;
}
}
}
}
if (maxScore == -1D) {
maxScore = 0.0;
}
System.out.println("sim('" + word1 + "', '" + word2 + "') = " + maxScore);
Also, this will give you 0.0 similarity on non-stemmed word forms, e.g. 'genders' and 'sex'. You can use the Porter stemmer included in ws4j to stem words beforehand if needed.
Hope this helps!

I had raised this issue at the googlecode site for ws4j, and it turns out that indeed it was a bug. The reply I received is as follows:
This looks like it is due to attempting to override a protected static field (this can't be done in Java). The attached patch fixes the issue by moving the definition of min and max the fields to non-static final members in RelatednessCalculator and adding getters. Implementations then provide their min/max values through super constructor calls.
Patch can be applied with patch -p1 < 0001-Cannot-override-static-members-replacing-fields-with.patch
And here is the (now resolved) issue on their site.
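To illustrate the kind of bug that reply describes, here is a small standalone sketch. The classes are hypothetical stand-ins, not the actual ws4j code: a static field redeclared in a subclass hides the parent's field instead of overriding it, so a method in the parent keeps reading the parent's value. The second pair of classes shows the general shape of the fix mentioned in the reply (a final instance field set through the super constructor, read through a getter).

```java
public class StaticHidingSketch {
    // Hypothetical stand-in for a base calculator with a static maximum.
    static class Calculator {
        protected static double max = Double.MAX_VALUE;

        double scoreForIdentical() {
            return max; // resolved to Calculator.max at compile time
        }
    }

    static class UnitCalculator extends Calculator {
        protected static double max = 1.0; // hides Calculator.max, does not override it
    }

    // Shape of the fix: a final instance field supplied via the super constructor.
    static class FixedCalculator {
        private final double max;

        FixedCalculator(double max) {
            this.max = max;
        }

        double getMax() {
            return max;
        }

        double scoreForIdentical() {
            return getMax();
        }
    }

    static class FixedUnitCalculator extends FixedCalculator {
        FixedUnitCalculator() {
            super(1.0);
        }
    }

    public static double hiddenResult() {
        return new UnitCalculator().scoreForIdentical();
    }

    public static double fixedResult() {
        return new FixedUnitCalculator().scoreForIdentical();
    }

    public static void main(String[] args) {
        System.out.println(hiddenResult()); // 1.7976931348623157E308
        System.out.println(fixedResult());  // 1.0
    }
}
```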

Here is why -
In jcn we have...
sim(c1, c2) = 1 / distance(c1, c2)
distance(c1, c2) = ic(c1) + ic(c2) - (2 * ic(lcs(c1, c2)))
where c1, c2 are the two concepts,
ic is the information content of the concept.
lcs(c1, c2) is the least common subsumer of c1 and c2.
Now, we don't want distance to be 0 (=> similarity will become
undefined).
distance can be 0 in 2 cases...
(1) ic(c1) = ic(c2) = ic(lcs(c1, c2)) = 0
ic(lcs(c1, c2)) can be 0 if the lcs turns out to be the root
node (information content of the root node is zero). But since
c1 and c2 can never be the root node, ic(c1) and ic(c2) would be 0
only if the 2 concepts have a 0 frequency count, in which case, for
lack of data, we return a relatedness of 0 (similar to the lin case).
Note that the root node ACTUALLY has an information content of
zero. Technically, none of the other concepts can have an information
content value of zero. We assign concepts zero values, when
in reality their information content is undefined (due to zero
frequency counts). To see why look at the formula for information
content: ic(c) = -log(freq(c)/freq(ROOT)) {log(0)? log(1)?}
(2) The second case that distance turns out to be zero is when...
ic(c1) + ic(c2) = 2 * ic(lcs(c1, c2))
(which could have a more likely special case ic(c1) = ic(c2) =
ic(lcs(c1, c2)) if all three turn out to be the same concept.)
How should one handle this?
Intuitively this is the case of maximum relatedness (zero
distance). For jcn this relatedness would be infinity... But we
can't return infinity. And simply returning a 0 wouldn't work...
since here we have found a pair of concepts with maximum
relatedness, and returning a 0 would be like saying that they
aren't related at all.

1.7976931348623157E308 is the value of Double.MAX_VALUE, but the scores of some similarity/relatedness algorithms (Lin, WuPalmer and Path) lie between 0 and 1, so for identical synsets the maximum value that can be returned is 1. In the version in my repo (https://github.com/DonatoMeoli/WS4J) I fixed this and other bugs.
Now, for two identical words, the values returned are:
HirstStOnge 16.0
LeacockChodorow 1.7976931348623157E308
Lesk 1.7976931348623157E308
WuPalmer 1.0
Resnik 1.7976931348623157E308
JiangConrath 1.7976931348623157E308
Lin 1.0
Path 1.0
Done in 67 msec.
Process finished with exit code 0

How to compute the probability of a multi-class prediction using libsvm?

I'm using libsvm and the documentation leads me to believe that there's a way to output the believed probability of an output classification's accuracy. Is this so? And if so, can anyone provide a clear example of how to do it in code?
Currently, I'm using the Java libraries in the following manner
SvmModel model = Svm.svm_train(problem, parameters);
SvmNode x[] = getAnArrayOfSvmNodesForProblem();
double predictedValue = Svm.svm_predict(model, x);

Given your code-snippet, I'm going to assume you want to use the Java API packaged with libSVM, rather than the more verbose one provided by jlibsvm.
To enable prediction with probability estimates, train a model with the svm_parameter field probability set to 1. Then, just change your code so that it calls the svm method svm_predict_probability rather than svm_predict.
Modifying your snippet, we have:
parameters.probability = 1;
svm_model model = svm.svm_train(problem, parameters);
svm_node x[] = problem.x[0]; // let's try the first data pt in problem
double[] prob_estimates = new double[NUM_LABEL_CLASSES];
svm.svm_predict_probability(model, x, prob_estimates);
It's worth knowing that training with multiclass probability estimates can change the predictions made by the classifier. For more on this, see the question Calculating Nearest Match to Mean/Stddev Pair With LibSVM.
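One detail that is easy to miss: as far as I understand the libsvm README, the entries of prob_estimates follow the order of the model's label array (the one svm.svm_get_labels fills in), not sorted label order. The sketch below shows only that index-to-label mapping in plain Java with no libsvm dependency; the class name, method name and sample arrays are made up for illustration.

```java
public class ProbabilityLabelSketch {
    // Returns the label whose probability estimate is highest.
    // labels[i] is assumed to be the class label that probEstimates[i] refers to.
    public static int mostLikelyLabel(int[] labels, double[] probEstimates) {
        int best = 0;
        for (int i = 1; i < probEstimates.length; i++) {
            if (probEstimates[i] > probEstimates[best]) {
                best = i;
            }
        }
        return labels[best];
    }

    public static void main(String[] args) {
        int[] labels = {3, 1, 2};                 // stand-in for the model's label order
        double[] probEstimates = {0.2, 0.7, 0.1}; // stand-in for prob_estimates
        System.out.println(mostLikelyLabel(labels, probEstimates)); // 1
    }
}
```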

The accepted answer worked like a charm. Make sure to set probability = 1 during training.
If you want to drop a prediction when its confidence does not meet a threshold, here is a code sample:
double confidenceScores[] = new double[model.nr_class];
svm.svm_predict_probability(model, svmVector, confidenceScores);
/*System.out.println("text="+ text);
for (int i = 0; i < model.nr_class; i++) {
System.out.println("i=" + i + ", labelNum:" + model.label[i] + ", name=" + classLoadMap.get(model.label[i]) + ", score="+confidenceScores[i]);
}*/
//finding max confidence;
int maxConfidenceIndex = 0;
double maxConfidence = confidenceScores[maxConfidenceIndex];
for (int i = 1; i < confidenceScores.length; i++) {
if(confidenceScores[i] > maxConfidence){
maxConfidenceIndex = i;
maxConfidence = confidenceScores[i];
}
}
double threshold = 0.3; // set this based on the data & no. of classes
int labelNum = model.label[maxConfidenceIndex];
// reverse map number to name
String targetClassLabel = classLoadMap.get(labelNum);
LOG.info("classNumber:{}, className:{}; confidence:{}; for text:{}",
labelNum, targetClassLabel, (maxConfidence), text);
if (maxConfidence < threshold ) {
LOG.info("Not enough confidence; threshold={}", threshold);
targetClassLabel = null;
}
return targetClassLabel;
