Difference between R.loess and org.apache.commons.math LoessInterpolator

I'm trying to convert an R script to Java using the Apache Commons Math library. Can I use org.apache.commons.math.analysis.interpolation.LoessInterpolator in place of R's loess? I cannot get the same result.
EDIT:
Here is a Java program that creates a random array (x, y) and computes the loess either with LoessInterpolator or by calling R. At the end, the results are printed.
import java.io.*;
import java.util.Random;
import org.apache.commons.math.analysis.interpolation.LoessInterpolator;
public class TestLoess
{
private String RScript="/usr/local/bin/Rscript";
private static class ConsummeInputStream
extends Thread
{
private InputStream in;
ConsummeInputStream(InputStream in)
{
this.in=in;
}
@Override
public void run()
{
try
{
int c;
while((c=this.in.read())!=-1)
System.err.print((char)c);
}
catch(IOException err)
{
err.printStackTrace();
}
}
}
TestLoess()
{
}
private void run() throws Exception
{
int num=100;
Random rand=new Random(0L);
double x[]=new double[num];
double y[]=new double[x.length];
for(int i=0;i< x.length;++i)
{
x[i]=rand.nextDouble()+(i>0?x[i-1]:0);
y[i]=Math.sin(i)*100;
}
LoessInterpolator loessInterpolator=new LoessInterpolator(
0.75,//bandwidth,
2//robustnessIters
);
double y2[]=loessInterpolator.smooth(x, y);
Process proc=Runtime.getRuntime().exec(
new String[]{RScript,"-"}
);
ConsummeInputStream errIn=new ConsummeInputStream(proc.getErrorStream());
BufferedReader stdin=new BufferedReader(new InputStreamReader(proc.getInputStream()));
PrintStream out=new PrintStream(proc.getOutputStream());
errIn.start();
out.print("T<-as.data.frame(matrix(c(");
for(int i=0;i< x.length;++i)
{
if(i>0) out.print(',');
out.print(x[i]+","+y[i]);
}
out.println("),ncol=2,byrow=TRUE))");
out.println("colnames(T)<-c('x','y')");
out.println("T2<-loess(y ~ x, T)");
out.println("write.table(residuals(T2),'',col.names= F,row.names=F,sep='\\t')");
out.flush();
out.close();
double y3[]=new double[x.length];
for(int i=0;i< y3.length;++i)
{
y3[i]=Double.parseDouble(stdin.readLine());
}
System.out.println("X\tY\tY.java\tY.R");
for(int i=0;i< y3.length;++i)
{
System.out.println(""+x[i]+"\t"+y[i]+"\t"+y2[i]+"\t"+y3[i]);
}
}
public static void main(String[] args)
throws Exception
{
new TestLoess().run();
}
}
compilation & exec:
javac -cp commons-math-2.2.jar TestLoess.java && java -cp commons-math-2.2.jar:. TestLoess
output:
X Y Y.java Y.R
0.730967787376657 0.0 6.624884763714674 -12.5936186703287
0.9715042030481429 84.14709848078965 6.5263049649584 71.9725380029913
1.6089216283982513 90.92974268256818 6.269100654071115 79.839773167581
2.159358633515885 14.112000805986721 6.051308261720918 3.9270340708818
2.756903911313087 -75.68024953079282 5.818424835586378 -84.9176311089431
3.090122310789737 -95.89242746631385 5.689740879461759 -104.617807889069
3.4753114955304554 -27.941549819892586 5.541837854229562 -36.0902352062634
4.460153035730264 65.6986598718789 5.168028655980764 58.9472823439219
5.339335553602744 98.93582466233818 4.840314399516663 93.3329030534449
6.280584733084859 41.21184852417566 4.49531113985498 36.7282165788057
6.555538699120343 -54.40211108893698 4.395343460231256 -58.5812856445538
6.68443584999412 -99.99902065507035 4.348559404444451 -104.039069260889
6.831037507640638 -53.657291800043495 4.295400167908642 -57.5419313320511
6.854275630124528 42.016703682664094 4.286978656933373 38.1564179414478
7.401015387322993 99.06073556948704 4.089252482141094 95.7504087842369
8.365502247999844 65.02878401571168 3.7422883733498726 62.5865641279576
8.469992934250815 -28.790331666506532 3.704793544880599 -31.145867173504
9.095139297716374 -96.13974918795569 3.4805388562453574 -98.0047896609079
9.505935493207435 -75.09872467716761 3.3330472034239405 -76.6664588290508
The output values for y are clearly not the same between R and Java; the Y.R column looks good (it's close to the original Y column). How should I change this in order to get Y.java ~ Y.R?

You need to change the default values of three input parameters to make the Java and R versions identical:
The Java LoessInterpolator only does linear local polynomial regression, but R supports linear (degree=1), quadratic (degree=2), and a strange degree=0 option. R's default is degree=2, so you need to specify degree=1 in R to match Java.
LoessInterpolator defaults to DEFAULT_ROBUSTNESS_ITERS=2 robustness iterations, but R defaults to iterations=4. So you need to set control = loess.control(iterations=X) in R (X is the number of iterations).
LoessInterpolator defaults to DEFAULT_BANDWIDTH=0.3, but R defaults to span=0.75, so set both to the same value.
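To make the role of the degree and span/bandwidth parameters concrete, here is a minimal from-scratch sketch of degree-1 local regression with tricube weights. It is not the Commons Math or R implementation, it performs no robustness iterations, and the names LocalLinearSketch, smooth and span are illustrative assumptions.

```java
import java.util.Arrays;

class LocalLinearSketch {

    /** Tricube kernel: (1 - u^3)^3 for 0 <= u < 1, otherwise 0. */
    static double tricube(double u) {
        if (u >= 1.0) return 0.0;
        double t = 1.0 - u * u * u;
        return t * t * t;
    }

    /**
     * Fits a weighted straight line (degree 1) around each x[i], using the
     * ceil(span * n) nearest points, and returns the fitted value at x[i].
     */
    static double[] smooth(double[] x, double[] y, double span) {
        int n = x.length;
        int k = Math.min(n, Math.max(2, (int) Math.ceil(span * n)));
        double[] fitted = new double[n];
        for (int i = 0; i < n; i++) {
            double[] dist = new double[n];
            for (int j = 0; j < n; j++) dist[j] = Math.abs(x[j] - x[i]);
            double[] sorted = dist.clone();
            Arrays.sort(sorted);
            double h = sorted[k - 1]; // distance to the k-th nearest point

            // Accumulate the weighted sums needed for a weighted least-squares line.
            double sw = 0, swx = 0, swy = 0, swxx = 0, swxy = 0;
            for (int j = 0; j < n; j++) {
                double w = h > 0 ? tricube(dist[j] / h) : (dist[j] == 0 ? 1.0 : 0.0);
                sw += w;
                swx += w * x[j];
                swy += w * y[j];
                swxx += w * x[j] * x[j];
                swxy += w * x[j] * y[j];
            }
            double meanX = swx / sw;
            double meanY = swy / sw;
            double varX = swxx - swx * meanX;
            // Fall back to a flat fit if all weighted points share the same x.
            double slope = Math.abs(varX) > 1e-12 ? (swxy - swx * meanY) / varX : 0.0;
            fitted[i] = meanY + slope * (x[i] - meanX);
        }
        return fitted;
    }

    public static void main(String[] args) {
        double[] x = new double[20];
        double[] y = new double[20];
        for (int i = 0; i < x.length; i++) {
            x[i] = i;
            y[i] = Math.sin(i) * 100;
        }
        // Same data, two different spans: the fitted values differ.
        System.out.println(Arrays.toString(smooth(x, y, 0.75)));
        System.out.println(Arrays.toString(smooth(x, y, 0.3)));
    }
}
```

Running main with the two spans shows how strongly the neighbourhood size alone changes the fit, which is why the Java and R settings have to match before the outputs can be compared.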

I can't speak for the java implementation, but lowess has a number of parameters which control the bandwidth of the fit. Unless you're fitting with the same control parameters you should expect the results to differ. My recommendation whenever people are smoothing data is to plot the original data as well as the fit, and decide for yourself what control parameters yield your desired tradeoff between fidelity to the data and smoothing (aka noise removal).

There are two problems here. First, if you plot the data you are generating, it looks almost random, and the fit generated by loess in R is very poor, e.g.
plot(T$x, T$y)
lines(T$x, T2$fitted, col="blue", lwd=3)
Second, in your R script you are writing the residuals, not the predictions, so in this line
out.println("write.table(residuals(T2),'',col.names= F,row.names=F,sep='\\t')");
you need to change residuals(T2) to predict(T2), e.g.
out.println("write.table(predict(T2),'',col.names= F,row.names=F,sep='\\t')");
So it was pure chance in your code example that the first couple of lines of residuals generated by R looked like a good fit.
For me, if I try fitting with some more appropriate data, then Java and R return similar but not identical results. I also found the results were closer if I did not adjust the default robustnessIters setting.
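The relationship between the two quantities is simply fitted = y - residual. As a small sketch, the fitted values can be recovered from the first three rows of the output in the question (ResidualsDemo and fittedFromResiduals are illustrative names, not part of any library):

```java
import java.util.Arrays;

class ResidualsDemo {

    /** Recovers fitted values from observations and residuals: fitted = y - residual. */
    static double[] fittedFromResiduals(double[] y, double[] residuals) {
        double[] fitted = new double[y.length];
        for (int i = 0; i < y.length; i++) {
            fitted[i] = y[i] - residuals[i];
        }
        return fitted;
    }

    public static void main(String[] args) {
        // First three rows of the Y and Y.R columns from the question's output.
        double[] y = {0.0, 84.14709848078965, 90.92974268256818};
        double[] residuals = {-12.5936186703287, 71.9725380029913, 79.839773167581};
        System.out.println(Arrays.toString(fittedFromResiduals(y, residuals)));
    }
}
```

For these rows the recovered fitted values are roughly 12.6, 12.2 and 11.1, so R's fit is nearly flat as well; the Y.R column only resembled Y because a residual from a flat fit is almost the original value.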

Related

Neuron results are a little off

Hi, I coded a single neuron to predict a student's mark for subject D based on the marks they got for subjects A, B and C.
After training my neuron with some historical data that contains the 3 marks as well as the actual mark they got for subject D, I then input test data to see how closely the predicted mark would match the actual one.
Below is my Neuron class
public class Neuron
{
double[] Weights = new double[3];
public Neuron(double W1, double W2, double W3)
{
Weights[0] = W1;
Weights[1] = W2;
Weights[2] = W3;
}
public double FnetLinear(int Z1, int Z2, int Z3)
{
return (Z1*Weights[0] + Z2*Weights[1] + Z3*Weights[2]);
}
public void UpdateWeight(int i, double Wi)
{
Weights[i] = Wi;
}
}
And here is my main class
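import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Random;
public class Main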
{
public int t;
public Neuron neuron;
double LearningRate = 0.00001;
public ArrayList<Marks> TrainingSet, TestSet;
public static void main(String[] args) throws IOException
{
Main main = new Main();
main.run();
}
public void run()
{
TrainingSet = ReadCSV("G:\\EVOS\\EVO_Assignemnt1\\resources\\Streamdata.csv");
TestSet = ReadCSV("G:\\EVOS\\EVO_Assignemnt1\\resources\\Test.csv");
Random ran = new Random();
neuron = new Neuron(ran.nextDouble(), ran.nextDouble(), ran.nextDouble());
train();
Test();
}
public void train()
{
t = 0;
while(t<1000000)
{
for(Marks mark: TrainingSet)
{
for(int i=0; i<neuron.Weights.length; i++)
{
double yp = neuron.FnetLinear(mark.marks[0] , mark.marks[1], mark.marks[2]);
double wi = neuron.Weights[i] - LearningRate*(-2*(mark.marks[3]-yp))*mark.marks[i];
neuron.UpdateWeight(i, wi);
}
}
t++;
}
}
public void Test()
{
System.out.println("Test Set results:");
int count = 1;
for(Marks mark: TestSet)
{
double fnet = neuron.FnetLinear(mark.marks[0] , mark.marks[1], mark.marks[2]);
System.out.println("Mark " + count + ": " + fnet);
count++;
}
}
public static ArrayList<Marks> ReadCSV(String csv)
{
ArrayList<Marks> temp = new ArrayList<>();
String line;
BufferedReader br;
try
{
br = new BufferedReader(new FileReader(csv));
while((line=br.readLine()) != null)
{
String[] n = line.split(",");
Marks stud = new Marks(Integer.valueOf(n[0]), Integer.valueOf(n[1]), Integer.valueOf(n[2]), Integer.valueOf(n[3]));
temp.add(stud);
}
}
catch (Exception e)
{
System.out.println("ERROR");
}
return temp;
}
}
This is the test data with the last number being the actual mark.
After running the test data i get results around these:
As you can see the first 4 marks predictions are way off from the actual mark.
I followed the textbook's explanation in Computational Intelligence: An Introduction (Chapter 2, if you are curious).
However, I would like to know what I am doing wrong. How can I get more accurate results?

Neural networks are very black-box-esque; because of this, it's pretty hard to say exactly why your predicted marks are way off.
That being said, here are some of the main methods of increasing the accuracy of your neural network:
Adjust the number of layers and neurons; I notice you're only using a single neuron. A single neuron in a neural network is typically just... bad. You're never going to get any good results like that. Neural networks need enough complexity in the form of layering and neuron count in order to calculate or predict whatever it is you're trying to teach it to do. A single neuron by itself really can't learn anything useful. This is also probably a big reason why your network accuracy is so bad.
Train for longer; I notice you're only training your network 1 million times; this is not always enough. For reference, the last time I trained a neural network, I used over 30 million sets of input/output.
Retrain your network with different starting weights; Randomized starting weights are great, but sometimes you just get a bad batch of starting weights. In the same project where I used 30 million input/output sets, I also tried over 25 different configurations of initial starting weights across 15 different layouts of nodes and layers.
Pick a different activation function; Linear activation functions are usually not that useful. I usually default to using a sigmoid function to start off, unless there are specific other functions that fulfill the use case I'm trying to train.
A common pitfall that can cause low accuracy is bad training data; Make sure the training data you're using is correct and is internally consistent with whatever it is you're trying to teach.
As a final note, I find myself having some trouble understanding what kind of a neural network you're trying to write exactly; I've made the assumption that this is some sort of attempt at a feed forward, back propagation neural network with a single neuron in it, but most of the advice here should still apply.
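For comparison, here is a minimal sketch of a single linear neuron trained by per-sample gradient descent. It differs from the question's loop in three assumed ways: the prediction is computed once per sample before any weight is changed, the inputs are scaled to [0, 1], and a bias term is included. The data is synthetic (generated from known weights), not the asker's CSV, and LinearNeuronSketch and all constants are illustrative.

```java
import java.util.Arrays;
import java.util.Random;

class LinearNeuronSketch {
    final double[] w = new double[3];
    double bias = 0.0;

    /** Linear activation: weighted sum of the inputs plus the bias. */
    double predict(double[] in) {
        double sum = bias;
        for (int i = 0; i < w.length; i++) sum += w[i] * in[i];
        return sum;
    }

    /** Per-sample gradient descent on the squared error (target - prediction)^2. */
    void train(double[][] inputs, double[] targets, double rate, int epochs) {
        for (int e = 0; e < epochs; e++) {
            for (int s = 0; s < inputs.length; s++) {
                // Compute the error once, then update every weight with it.
                double err = targets[s] - predict(inputs[s]);
                for (int i = 0; i < w.length; i++) {
                    w[i] += rate * 2 * err * inputs[s][i];
                }
                bias += rate * 2 * err;
            }
        }
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        double[][] inputs = new double[200][3];
        double[] targets = new double[200];
        for (int s = 0; s < inputs.length; s++) {
            for (int i = 0; i < 3; i++) inputs[s][i] = rnd.nextDouble(); // marks scaled to [0, 1]
            // Hypothetical ground truth used only to generate the synthetic data.
            targets[s] = 0.2 * inputs[s][0] + 0.3 * inputs[s][1] + 0.5 * inputs[s][2];
        }
        LinearNeuronSketch neuron = new LinearNeuronSketch();
        neuron.train(inputs, targets, 0.05, 500);
        System.out.println(Arrays.toString(neuron.w) + " bias=" + neuron.bias);
    }
}
```

On this noise-free synthetic data the learned weights approach the generating weights; on real marks the fit can only be as good as a linear relationship between the subjects allows.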

Program is delayed in writing to a .txt file?

So, I've searched around stackoverflow for a bit, but I can't seem to find an answer to this issue.
My current homework for my CS class involves reading from a file of 5000 random numbers and doing various things with the data, like putting it into an array, seeing how many times a number occurs, and finding what the longest increasing sequence is. I've got all that done just fine.
In addition to this, I am (for myself) adding in a method that will allow me to overwrite the file and create 5000 new random numbers to make sure my code works with multiple different test cases.
The method works for the most part; however, after I call it, it doesn't seem to "activate" until after the rest of the program finishes. If I run it and tell it to change the numbers, I have to run it again to actually see the changed values in the program. Is there a way to fix this?
Current output showing the delay between changing the data:
Not trying to change the data here- control case.
elkshadow5$ ./CompileAndRun.sh
Create a new set of numbers? Y for yes. n
What number are you looking for? 66
66 was found 1 times.
The longest sequence is [606, 3170, 4469, 4801, 5400, 8014]
It is 6 numbers long.
The numbers should change here but they don't.
elkshadow5$ ./CompileAndRun.sh
Create a new set of numbers? Y for yes. y
What number are you looking for? 66
66 was found 1 times.
The longest sequence is [606, 3170, 4469, 4801, 5400, 8014]
It is 6 numbers long.
Now the data shows that it's changed, the run after the data should have been changed.
elkshadow5$ ./CompileAndRun.sh
Create a new set of numbers? Y for yes. n
What number are you looking for? 1
1 was found 3 times.
The longest sequence is [1155, 1501, 4121, 5383, 6000]
It is 5 numbers long.
My code:
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Scanner;
public class jeftsdHW2 {
static Scanner input = new Scanner(System.in);
public static void main(String args[]) throws Exception {
jeftsdHW2 random = new jeftsdHW2();
int[] data;
data = new int[5000];
random.readDataFromFile(data);
random.overwriteRandNums();
}
public int countingOccurrences(int find, int[] array) {
int count = 0;
for (int i : array) {
if (i == find) {
count++;
}
}
return count;
}
public int[] longestSequence(int[] array) {
int[] sequence;
return sequence;
}
public void overwriteRandNums() throws Exception {
System.out.print("Create a new set of numbers? Y for yes.\t");
String answer = input.next();
char yesOrNo = answer.charAt(0);
if (yesOrNo == 'Y' || yesOrNo == 'y') {
writeDataToFile();
}
}
public void readDataFromFile(int[] data) throws Exception {
try {
java.io.File infile = new java.io.File("5000RandomNumbers.txt");
Scanner readFile = new Scanner(infile);
for (int i = 0; i < data.length; i++) {
data[i] = readFile.nextInt();
}
readFile.close();
} catch (FileNotFoundException e) {
System.out.println("Please make sure the file \"5000RandomNumbers.txt\" is in the correct directory before trying to run this.");
System.out.println("Thank you.");
System.exit(1);
}
}
public void writeDataToFile() throws Exception {
int j;
StringBuilder theNumbers = new StringBuilder();
try {
PrintWriter writer = new PrintWriter("5000RandomNumbers.txt", "UTF-8");
for (int i = 0; i < 5000; i++) {
if (i > 1 && i % 10 == 0) {
theNumbers.append("\n");
}
j = (int) (9999 * Math.random());
if (j < 1000) {
theNumbers.append(j + "\t\t");
} else {
theNumbers.append(j + "\t");
}
}
writer.print(theNumbers);
writer.flush();
writer.close();
} catch (IOException e) {
System.out.println("error");
}
}
}

It is possible that the file has not been physically written to the disk; using flush is not enough for this. From the Java documentation:
If the intended destination of this stream is an abstraction provided by the underlying operating system, for example a file, then flushing the stream guarantees only that bytes previously written to the stream are passed to the operating system for writing; it does not guarantee that they are actually written to a physical device such as a disk drive.
Because of HDD read and write speeds, it is advisable to depend as little as possible on HDD access.
Perhaps storing the random number strings in a list when re-running, and using that, would be a solution. You could still write the list to disk, but this way the implementation does not depend on when the file is written.
EDIT
After the OP posted more code, it became apparent that my original answer is not related to the problem. Nonetheless, it is sound.
The code the OP posted is not enough to see when the file is read after being written. It seems the file is written after it is read, which of course is what is perceived as an error. Reading after writing should produce a program that does what you want.
That is, this:
random.readDataFromFile(data);
random.overwriteRandNums();
will not be reflected until the next execution. This:
random.overwriteRandNums();
random.readDataFromFile(data);
will use the updated file in the current execution.
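The write-then-read ordering can be sketched in isolation as follows. This is a simplified stand-in for the asker's program, not a drop-in replacement: WriteThenRead, writeNumbers and readNumbers are illustrative names, and a temporary file is used instead of 5000RandomNumbers.txt.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Random;
import java.util.Scanner;

class WriteThenRead {

    /** Overwrites the file with count pseudo-random integers, ten per line. */
    static void writeNumbers(Path file, int count, long seed) throws IOException {
        Random rnd = new Random(seed);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            if (i > 0) sb.append(i % 10 == 0 ? '\n' : '\t');
            sb.append(rnd.nextInt(10000));
        }
        Files.write(file, sb.toString().getBytes(StandardCharsets.UTF_8));
    }

    /** Reads count integers back from the file. */
    static int[] readNumbers(Path file, int count) throws IOException {
        int[] data = new int[count];
        try (Scanner in = new Scanner(file, "UTF-8")) {
            for (int i = 0; i < count; i++) data[i] = in.nextInt();
        }
        return data;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("numbers", ".txt");
        writeNumbers(file, 5000, System.nanoTime()); // write first...
        int[] data = readNumbers(file, 5000);        // ...then read in the same run
        System.out.println("First value read back: " + data[0]);
        Files.delete(file);
    }
}
```

Because the write happens before the read within the same run, the array in memory always reflects the freshly written file.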

Can't find certain function when calling R within java

I'm trying to use R within Java, specifically within Processing.
I want to use the readPNG function, but when I try to, R reports an error saying the readPNG function can't be found. This is extremely weird because I have the png library active, and if I use it directly from R it works just fine. I'm using the Rserve package to connect Java and R. Any advice would be very much appreciated.
Here's part of the code I'm using if it helps.
import org.rosuda.REngine.Rserve.*;
import org.rosuda.REngine.*;
double[] data;
void setup() {
size(300,300);
try {
RConnection c = new RConnection();
// read the PNG file in R and convert the result to an array of doubles
data= c.eval("readPNG('juego-11932.png')").asDoubles();
} catch ( REXPMismatchException rme ) {
rme.printStackTrace();
} catch ( REngineException ree ) {
ree.printStackTrace();
}
}
void draw() {
background(255);
for( int i = 0; i < data.length; i++) {
line( i * 3.0, height/2, i* 3.0, height/2 - (float)data[i] * 50 );
}
}
Your Java code connects to a fresh R session so no packages are loaded. Hence you have to either use png::readPNG() or load the png package explicitly.

rJava: java code altered but R object the same

I have a simple java program that creates an array of random numbers. I am using rJava to call this program and create an R object. I know how to create random numbers in R ... I am trying to reproduce the results of a complicated java program exactly, which requires I use the same random numbers. Here is the java:
import java.util.Random;
public class rJava
{
public static void main(String[] args)
{
createRan();
}
public static double[] createRan()
{
int numSims = 100000;
int m_RandomSeed = 1234567;
Random m_Rnd = new Random(m_RandomSeed);
double[] randoms = new double[100000];
for(int i=0; i < numSims; i++)
{
randoms[i] = m_Rnd.nextDouble();
}
return randoms;
}
}
rJava seems to be working fine for me ... I use the following commands in R and an object called "rans" with 100000 random numbers is created.
library(rJava)
.jinit()
obj <- .jnew("rJava")
rans <- .jcall(obj, "[D", "createRan")
My problem is that I wanted to change the size of the array for testing purposes to something more manageable, like 10 random numbers instead of 100,000. I saved and recompiled rJava.java, and re-ran the R code above. It still created an array of 100,000 numbers. I rebooted my computer and tried again ... still 100,000. I would ultimately like to pass a parameter into the Java code to choose how many random numbers to generate, but would like to understand what is going on here first. I know very little about Java; is there some place where the initial state of rJava.java is stored and is being called from? As I said, I have recompiled the class file, so I would think the "original" would be overwritten.
Thanks

Has anyone built a program slicer in Java?

I have to build a program slicer in Java to slice source code based on a slicing criterion. I see there are very few libraries out there for this purpose. Nevertheless, I would like to try this myself. I have read some publications on the topic that describe using a dependence graph to work out the data and control dependencies in a program. A slicing algorithm can then be used in conjunction with a slicing criterion to generate slices of the Java program. Has anyone done this type of thing before? If so, could you perhaps point me in the right direction to get started? I have searched and searched and cannot figure out where to start or what APIs exist (if any).
An Example would be:
public class Foo {
public void fooBar() {
int x = 10;
int y = 12;
String s = "";
for(int j=0; j<10; j++) {
s += x;
x++;
y += 3;
}
System.out.println("y value " + y);
}
}
If a slicing criterion (13, y) is chosen, where 13 is the last line in the above code, then the result will be
public class Foo {
public void fooBar() {
int y = 12;
for(int j=0; j<10; j++) {
y += 3;
}
}
}
The slicing criterion returns all of the statements that may affect variable 'y' at line 13.
There is very little work in this area. You can reuse the code of an open-source utility such as Checkstyle or Yasca, and then apply your own implementation logic for the slicing.
Late, but maybe useful for others: Wala. WALA includes a slicer, based on context-sensitive tabulation of reachability in the system dependence graph.
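The dependence-graph idea from the question can be sketched as a backward reachability search. In this sketch the data and control dependences of the Foo example are written out by hand as an assumption; a real slicer would derive them by analysing the source or bytecode. SliceSketch and the statement numbering are illustrative.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

class SliceSketch {

    /** Returns every statement the criterion statement transitively depends on, including itself. */
    static SortedSet<Integer> backwardSlice(Map<Integer, List<Integer>> dependsOn, int criterion) {
        SortedSet<Integer> slice = new TreeSet<>();
        Deque<Integer> work = new ArrayDeque<>();
        work.push(criterion);
        while (!work.isEmpty()) {
            int stmt = work.pop();
            if (slice.add(stmt)) { // visit each statement only once
                for (int dep : dependsOn.getOrDefault(stmt, Collections.<Integer>emptyList())) {
                    work.push(dep);
                }
            }
        }
        return slice;
    }

    /**
     * Hand-built dependences for fooBar() (an assumption, not computed):
     * 1: int x = 10;   2: int y = 12;   3: String s = "";
     * 4: for (...)     5: s += x;       6: x++;
     * 7: y += 3;       8: System.out.println("y value " + y);
     */
    static Map<Integer, List<Integer>> fooBarDependences() {
        Map<Integer, List<Integer>> deps = new HashMap<>();
        deps.put(5, Arrays.asList(1, 3, 4, 5, 6)); // reads x and s, controlled by the loop
        deps.put(6, Arrays.asList(1, 4, 6));       // reads x, controlled by the loop
        deps.put(7, Arrays.asList(2, 4, 7));       // reads y, controlled by the loop
        deps.put(8, Arrays.asList(2, 7));          // reads y
        return deps;
    }

    public static void main(String[] args) {
        System.out.println(backwardSlice(fooBarDependences(), 8));
    }
}
```

Slicing from the println statement keeps the declaration of y, the loop header and y += 3, plus the println itself, and drops every statement that only touches x or s.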
