WEKA cross validate linear regression - can I get RMSPE? - java

Is it possible to get RMSPE after cross validating a model? I see I can easily get RMSE - but what about the Root Mean Square Percentage Error?
Sample code I've put together with WEKA linear regression cross validation:
// loads data and set class index
final ArrayList<Attribute> attributes = new ArrayList<>();
attributes.add(new Attribute("x"));
attributes.add(new Attribute("y"));
Instances data = new Instances("name", attributes, 0);
data.add(new DenseInstance(1d, new double[]{5, 80}));
// ... add more data
// -c last
data.setClassIndex(data.numAttributes() - 1);
// classifier
final LinearRegression cls = new LinearRegression();
// other options
int seed = 129;
int folds = 3;
// randomize data
Random rand = new Random(seed);
Instances randData = new Instances(data);
randData.randomize(rand);
if (randData.classAttribute().isNominal())
randData.stratify(folds);
// perform cross-validation
Evaluation eval = new Evaluation(data);
eval.crossValidateModel(cls, data, folds, new Random(seed));
System.out.println("rootMeanSquaredError " + eval.rootMeanSquaredError());
System.out.println("rootRelativeSquaredError " + eval.rootRelativeSquaredError());
System.out.println("rootMeanPriorSquaredError " + eval.rootMeanPriorSquaredError());
// output evaluation
System.out.println();
System.out.println("=== Setup ===");
System.out.println("Classifier: " + cls.getClass().getName() + " " + Utils.joinOptions(cls.getOptions()));
System.out.println("Dataset: " + data.relationName());
System.out.println("Folds: " + folds);
System.out.println("Seed: " + seed);
System.out.println();
System.out.println(eval.toSummaryString("=== " + folds + "-fold Cross-validation ===", true));
/*
=== Setup ===
Classifier: weka.classifiers.functions.LinearRegression -S 0 -R 1.0E-8 -num-decimal-places 4
Dataset: name
Folds: 3
Seed: 129
=== 3-fold Cross-validation ===
Correlation coefficient 0.6289
Mean absolute error 7.5177
Root mean squared error 8.262
Relative absolute error 85.7748 %
Root relative squared error 77.9819 %
Total Number of Instances 15
*/

Weka doesn't compute the RMSPE by default. I've put together a little Weka package that should do the trick for numeric classes (NB: only done limited testing), called rmspe-weka-package.
After an evaluation run (with that package installed), you should be able to retrieve the statistic as follows:
Evaluation eval = ... // initialize your evaluation object
... // perform your evaluation
double rmspe = eval.getPluginMetric("weka.classifiers.evaluation.RMSPE").getStatistic("RMSPE");
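If installing a package is not an option, the statistic can also be computed by hand from pairs of actual and predicted values. The following is a minimal sketch and not part of Weka: the class name `Rmspe` and its method are hypothetical, it uses the common definition RMSPE = sqrt(mean(((actual - predicted) / actual)^2)), and it skips instances whose actual value is 0 (an assumption, since the percentage error is undefined there). In recent Weka versions the pairs should be obtainable from the `Prediction` objects returned by `eval.predictions()`, via `actual()` and `predicted()`.

```java
final class Rmspe {
    /**
     * Root mean square percentage error, returned as a fraction
     * (multiply by 100 to get a percentage).
     */
    static double rmspe(double[] actual, double[] predicted) {
        if (actual.length != predicted.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        double sumOfSquares = 0.0;
        int counted = 0;
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] == 0.0) {
                continue; // assumption: skip instances where the percentage error is undefined
            }
            double relativeError = (actual[i] - predicted[i]) / actual[i];
            sumOfSquares += relativeError * relativeError;
            counted++;
        }
        // NaN signals that no instance had a non-zero actual value
        return counted == 0 ? Double.NaN : Math.sqrt(sumOfSquares / counted);
    }
}
```

Note that this pools the predictions of all folds into one statistic, which is also how Weka reports its built-in error measures after cross-validation.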

Related

VMWare java SDK: When doesn't an available PerfMetricID report Data?

I'm trying to use the VMware SDK for Java to collect performance data for each entity (cluster, datastore, host, VM) in a VMware environment.
The idea is to get the available PerfMetricIds for the target entity with queryAvailablePerfMetric, query them, and report the counter details, the timestamp and the value.
However, not every PerfMetricId returned for an entity reports data. For example, for each datastore I get at least 4 IDs that return no data when queried; these IDs represent the counters for the average number of read and write operations. For a cluster, the CPU usage is missing, and so on.
So when does this happen? Shouldn't every metric returned by queryAvailablePerfMetric report data? What am I missing here?
Minimal code snippet:
// VMWare credentials
String vmwareUrl = args[0];
String vmwareUsername = args[1];
String vmwarePassword = args[2];
// connect to vCenter
ServiceInstance si = new ServiceInstance(new URL(vmwareUrl), vmwareUsername, vmwarePassword, true);
// get performance manager
PerformanceManager perfMgr = si.getPerformanceManager();
// define the time window (the last one hour)
Calendar calTo = Calendar.getInstance();
Calendar calFrom = Calendar.getInstance();
calFrom.setTime(calTo.getTime());
calFrom.add(Calendar.HOUR, -1);
// get any datastore for testing purposes
Folder rootFolder = si.getRootFolder();
ManagedEntity[] datastores = new InventoryNavigator(rootFolder).searchManagedEntities("Datastore");
ManagedEntity me = datastores[1];
// query all available metrics for the entity
PerfMetricId[] availablePmis = perfMgr.queryAvailablePerfMetric(me, calFrom, calTo, perfMgr.getHistoricalInterval()[0].getSamplingPeriod());
// create PerfQuerySpec
PerfQuerySpec qSpec = new PerfQuerySpec();
qSpec.setEntity(me.getMOR());
qSpec.setMetricId(availablePmis);
qSpec.setFormat("csv");
qSpec.setStartTime(calFrom);
qSpec.setEndTime(calTo);
// query perf
PerfEntityMetricBase[] perfValues = perfMgr.queryPerf(new PerfQuerySpec[]{qSpec});
// Printing
System.out.println("Found pmis (CounterIDs only): ");
for (PerfMetricId pmi : availablePmis){
System.out.print(pmi.getCounterId() + ", ");
}
System.out.print("\nPmis with values:");
int pmisCount=0;
for (PerfEntityMetricBase value : perfValues) {
PerfMetricSeriesCSV[] csvValues = ((PerfEntityMetricCSV) value).getValue();
pmisCount += csvValues.length;
for (PerfMetricSeriesCSV csv : csvValues) {
System.out.println("Counter ID: " + csv.getId().getCounterId() + " ---- Metric instance: " + csv.getId().getInstance());
System.out.println("\tInfo: " + ((PerfEntityMetricCSV) value).getSampleInfoCSV());
System.out.println("\tValues: " + csv.getValue());
}
}
System.out.println("---------------");
System.out.println("Detected PMIs: " + availablePmis.length);
System.out.println("PMIs with values: " + pmisCount);
Any help (or discussions) would be appreciated
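To see exactly which counters are affected rather than only how many, the two sets of counter IDs can be compared directly. This is a small sketch with a hypothetical helper class `MissingCounters`; it assumes the available IDs were collected from `pmi.getCounterId()` and the reported IDs from `csv.getId().getCounterId()`, as in the snippet above, and it ignores metric instances.

```java
import java.util.Set;
import java.util.TreeSet;

final class MissingCounters {
    /** Returns the counter IDs that were advertised but came back without values. */
    static Set<Integer> missing(int[] availableIds, int[] reportedIds) {
        Set<Integer> result = new TreeSet<>();
        for (int id : availableIds) {
            result.add(id);
        }
        for (int id : reportedIds) {
            result.remove(id);
        }
        return result; // sorted, so the output is stable between runs
    }
}
```

Printing the result next to the counter names makes it easier to check whether the silent counters share a property, for example the same statistics level or rollup type.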

Loading saved Tensorflow model in Java

I have developed a TensorFlow model with Python on Linux, based on the tutorial here: "http://cv-tricks.com/tensorflow-tutorial/training-convolutional-neural-network-for-image-classification/". I trained and saved the model using "tf.train.Saver". I am able to deploy the model in a Linux environment and perform prediction successfully. Now I need to load this saved model in Java on Windows. From what I have read online, this does not work with "tf.train.Saver", and I have to change my code to use "Serving" to be able to load a saved TF model in Java. Therefore, I followed the tutorial here:
"https://github.com/tensorflow/serving/blob/master/tensorflow_serving/example/mnist_saved_model.py" and changed my code. However, I get an error with "tf.FixedLenFeature", which asks me to use "FixedLenSequenceFeature". Here is the complete error message:
"ValueError: First dimension of shape for feature x unknown. Consider using FixedLenSequenceFeature."
which is happening here:
feature_configs = {'x': tf.FixedLenFeature(shape=[None, img_size,img_size,num_channels], dtype=tf.float32),}
I am not sure this is the right path, since I have a batch of images of size [batchsize*128*128*3] and should not need the sequence feature. It would be great if someone could clear this up for me and answer these questions:
1- Do I have to change my code from "tf.train.Saver" to "serving" to be able to load the saved model and deploy it in JAVA?
2- If the answer to the above question is yes, how can I feed the data correctly and solve the aforementioned ERROR?
3- Is there any example of how to DEPLOY the model that was saved using "serving"?
Here is my training code that throws the error:
import dataset
import tensorflow as tf
import time
from datetime import timedelta
import math
import random
import numpy as np
import os
#Adding Seed so that random initialization is consistent
from numpy.random import seed
seed(1)
from tensorflow import set_random_seed
set_random_seed(2)
batch_size = 32
#Prepare input data
classes = ['class1','class2','class3']
num_classes = len(classes)
# 20% of the data will automatically be used for validation
validation_size = 0.2
img_size = 128
num_channels = 3
train_path='/home/user1/Downloads/Expression/Augmented/Data/Train'
# We shall load all the training and validation images and labels into memory using openCV and use that during training
data = dataset.read_train_sets(train_path, img_size, classes, validation_size=validation_size)
print("Complete reading input data. Will Now print a snippet of it")
print("Number of files in Training-set:\t\t{}".format(len(data.train.labels)))
print("Number of files in Validation-set:\t{}".format(len(data.valid.labels)))
session = tf.Session()
serialized_tf_example = tf.placeholder(tf.string, name='tf_example')
feature_configs = {'x': tf.FixedLenFeature(shape=[None, img_size,img_size,num_channels], dtype=tf.float32),}
tf_example = tf.parse_example(serialized_tf_example, feature_configs)
x = tf.identity(tf_example['x'], name='x') # use tf.identity() to assign name
# x = tf.placeholder(tf.float32, shape=[None, img_size,img_size,num_channels], name='x')
## labels
y_true = tf.placeholder(tf.float32, shape=[None, num_classes], name='y_true')
y_true_cls = tf.argmax(y_true, dimension=1)
##Network graph params
filter_size_conv1 = 3
num_filters_conv1 = 32
filter_size_conv2 = 3
num_filters_conv2 = 32
filter_size_conv3 = 3
num_filters_conv3 = 64
fc_layer_size = 128
def create_weights(shape):
    return tf.Variable(tf.truncated_normal(shape, stddev=0.05))
def create_biases(size):
    return tf.Variable(tf.constant(0.05, shape=[size]))
def create_convolutional_layer(input,
                               num_input_channels,
                               conv_filter_size,
                               num_filters):
    ## We shall define the weights that will be trained using create_weights function.
    weights = create_weights(shape=[conv_filter_size, conv_filter_size, num_input_channels, num_filters])
    ## We create biases using the create_biases function. These are also trained.
    biases = create_biases(num_filters)
    ## Creating the convolutional layer
    layer = tf.nn.conv2d(input=input,
                         filter=weights,
                         strides=[1, 1, 1, 1],
                         padding='SAME')
    layer += biases
    ## We shall be using max-pooling.
    layer = tf.nn.max_pool(value=layer,
                           ksize=[1, 2, 2, 1],
                           strides=[1, 2, 2, 1],
                           padding='SAME')
    ## Output of pooling is fed to Relu which is the activation function for us.
    layer = tf.nn.relu(layer)
    return layer
def create_flatten_layer(layer):
    # We know that the shape of the layer will be [batch_size img_size img_size num_channels]
    # But let's get it from the previous layer.
    layer_shape = layer.get_shape()
    ## Number of features will be img_height * img_width * num_channels. But we shall calculate it in place of hard-coding it.
    num_features = layer_shape[1:4].num_elements()
    ## Now, we flatten the layer so we shall have to reshape to num_features
    layer = tf.reshape(layer, [-1, num_features])
    return layer
def create_fc_layer(input,
                    num_inputs,
                    num_outputs,
                    use_relu=True):
    # Let's define trainable weights and biases.
    weights = create_weights(shape=[num_inputs, num_outputs])
    biases = create_biases(num_outputs)
    # Fully connected layer takes input x and produces wx+b. Since these are matrices, we use the matmul function in TensorFlow
    layer = tf.matmul(input, weights) + biases
    if use_relu:
        layer = tf.nn.relu(layer)
    return layer
layer_conv1 = create_convolutional_layer(input=x,
num_input_channels=num_channels,
conv_filter_size=filter_size_conv1,
num_filters=num_filters_conv1)
layer_conv2 = create_convolutional_layer(input=layer_conv1,
num_input_channels=num_filters_conv1,
conv_filter_size=filter_size_conv2,
num_filters=num_filters_conv2)
layer_conv3= create_convolutional_layer(input=layer_conv2,
num_input_channels=num_filters_conv2,
conv_filter_size=filter_size_conv3,
num_filters=num_filters_conv3)
layer_flat = create_flatten_layer(layer_conv3)
layer_fc1 = create_fc_layer(input=layer_flat,
num_inputs=layer_flat.get_shape()[1:4].num_elements(),
num_outputs=fc_layer_size,
use_relu=True)
layer_fc2 = create_fc_layer(input=layer_fc1,
num_inputs=fc_layer_size,
num_outputs=num_classes,
use_relu=False)
y_pred = tf.nn.softmax(layer_fc2,name='y_pred')
y_pred_cls = tf.argmax(y_pred, dimension=1)
values, indices = tf.nn.top_k(y_pred, 3)
table = tf.contrib.lookup.index_to_string_table_from_tensor(
tf.constant([str(i) for i in xrange(3)]))
prediction_classes = table.lookup(tf.to_int64(indices))
session.run(tf.global_variables_initializer())
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=layer_fc2,
labels=y_true)
cost = tf.reduce_mean(cross_entropy)
optimizer = tf.train.AdamOptimizer(learning_rate=1e-4).minimize(cost)
correct_prediction = tf.equal(y_pred_cls, y_true_cls)
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
session.run(tf.global_variables_initializer())
def show_progress(epoch, feed_dict_train, feed_dict_validate, val_loss):
    acc = session.run(accuracy, feed_dict=feed_dict_train)
    val_acc = session.run(accuracy, feed_dict=feed_dict_validate)
    msg = "Training Epoch {0} --- Training Accuracy: {1:>6.1%}, Validation Accuracy: {2:>6.1%}, Validation Loss: {3:.3f}"
    print(msg.format(epoch + 1, acc, val_acc, val_loss))
total_iterations = 0
# saver = tf.train.Saver()
def train(num_iteration):
    global total_iterations
    for i in range(total_iterations,
                   total_iterations + num_iteration):
        x_batch, y_true_batch, _, cls_batch = data.train.next_batch(batch_size)
        x_valid_batch, y_valid_batch, _, valid_cls_batch = data.valid.next_batch(batch_size)
        feed_dict_tr = {x: x_batch,
                        y_true: y_true_batch}
        feed_dict_val = {x: x_valid_batch,
                         y_true: y_valid_batch}
        session.run(optimizer, feed_dict=feed_dict_tr)
        if i % int(data.train.num_examples/batch_size) == 0:
            print(i)
            val_loss = session.run(cost, feed_dict=feed_dict_val)
            epoch = int(i / int(data.train.num_examples/batch_size))
            show_progress(epoch, feed_dict_tr, feed_dict_val, val_loss)
            print("Saving the model Now!")
            # saver.save(session, save_path_full, global_step=i)
    total_iterations += num_iteration
train(num_iteration=10000)#3000
# Export model
# WARNING(break-tutorial-inline-code): The following code snippet is
# in-lined in tutorials, please update tutorial documents accordingly
# whenever code changes.
export_path_base = './SavedModel/'
export_path = os.path.join(
tf.compat.as_bytes(export_path_base),
tf.compat.as_bytes(str(1)))
print 'Exporting trained model to', export_path
builder = tf.saved_model.builder.SavedModelBuilder(export_path)
# Build the signature_def_map.
classification_inputs = tf.saved_model.utils.build_tensor_info(
serialized_tf_example)
classification_outputs_classes = tf.saved_model.utils.build_tensor_info(
prediction_classes)
classification_outputs_scores = tf.saved_model.utils.build_tensor_info(values)
classification_signature = (
tf.saved_model.signature_def_utils.build_signature_def(
inputs={
tf.saved_model.signature_constants.CLASSIFY_INPUTS:
classification_inputs
},
outputs={
tf.saved_model.signature_constants.CLASSIFY_OUTPUT_CLASSES:
classification_outputs_classes,
tf.saved_model.signature_constants.CLASSIFY_OUTPUT_SCORES:
classification_outputs_scores
},
method_name=tf.saved_model.signature_constants.CLASSIFY_METHOD_NAME))
tensor_info_x = tf.saved_model.utils.build_tensor_info(x)
tensor_info_y = tf.saved_model.utils.build_tensor_info(y_pred)
prediction_signature = (
tf.saved_model.signature_def_utils.build_signature_def(
inputs={'images': tensor_info_x},
outputs={'scores': tensor_info_y},
method_name=tf.saved_model.signature_constants.PREDICT_METHOD_NAME))
legacy_init_op = tf.group(tf.tables_initializer(), name='legacy_init_op')
builder.add_meta_graph_and_variables(
session, [tf.saved_model.tag_constants.SERVING],
signature_def_map={
'predict_images':
prediction_signature,
tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY:
classification_signature,
},
legacy_init_op=legacy_init_op)
builder.save()
print 'Done exporting!'

calling a tell and told prolog predicate from java does not work

I have the code below, and when I run it from Prolog it creates a new file with a new rule. For example, when I run
create_rec(check_symptoms(Symptoms,noOk))
it creates a new file named rule_new that contains all the predicates with the old rules.
The new rule predicate it adds is
rule(r15,_G3583,_G3601):-check_symptoms(_G3593,noOk)
The problem is that when I call the predicate create_rec from Java it does not work:
rules([r1,r2,r3,r4,r5,r6,r7,r8,r9,r10,r11,r12,r13,r14]).
% other predicates
create_rec(Body) :-
clause(rules(R_ids), B1),
last(R_ids,LastRule_id),
atom_codes(LastRule_id, Rule_codes),
Rule_codes = [H|T],
atom_codes(AtomNum,T),
atom_number(AtomNum,Num),
NewNum is Num+1,
atom_codes(NewNum,NewNumCodes),
atom_codes(NewRule_id,[114|NewNumCodes]),
append(R_ids, [NewRule_id], New_R_ids),
retract(rules(R_ids)),
asserta((rules(New_R_ids))),
asserta((rule(NewRule_id,A,B) :- Body)),save_rule.
save_rule :-
tell('rule_new.pl'),
write(':- dynamic rule/3, rules/1.'),nl,
write(':- [\'kb_anemia_V5b.pl\'].'),nl,
write(':- encoding(utf8).'),nl,
write(':- style_check(-singleton).'),
clause(rules(R_ids),B),nl,
write((rules(R_ids) :- B)),
write('.'),
get_rule_data(R_ids),
told.
get_rule_data([]).
get_rule_data([Rule_id|Rest_Rule_Id]) :-
clause(rule(Rule_id,A,B),Body1),
% fix(B,[],B2),
write(rule(Rule_id,A,B2):-Body1),write('.'), nl,
get_rule_data(Rest_Rule_Id).
% other predicates
The code in Java is:
Term consult_arg[] = {
new Atom(Diagnosis.class.getResource("anemia_diagnosis").getPath())};
Query consult_query = new Query( // build the query that runs the consult
"consult",
consult_arg);
boolean consulted = consult_query.hasSolution();
if (!consulted) {
System.err.println("Consult failed");
System.exit(1);
}
bodycr = body_txt1.getText();
String t9 = "create_rec(" + bodycr + " )." + "\n";
System.out.println("FUNCTION IS " + t9);
Query q9 = new Query(t9);
diagnosis = q9.oneSolution().toString();
System.out.println(diagnosis);
JOptionPane.showMessageDialog(null, "KB is created ");
It takes the body from the text area and calls the predicate. The pop-up message is displayed and no errors are reported, but the file is not created.
I don't know what is wrong. Can anyone help me?

Orientdb - SQL query with millions of vertices causes Java OutOfMemory error

I need to create edges between all vertices of class V1 and all vertices of class V2. My classes have 2-3 million vertices each. A double for loop with a SELECT * FROM V1, SELECT * FROM V2 gives a Java OutOfMemory (heap space) error (see below). This is an offline process that will be performed once or twice if needed (not a frequent operation) as the graph will not be regularly updated by the users, only myself.
How can I do it in batches (using SELECT...LIMIT or g.getVertices()) to avoid this?
Here's my code:
OrientGraphNoTx G = MyOrientDBFactory.getNoTx();
G.setUseLightweightEdges(false);
G.declareIntent(new OIntentMassiveInsert());
for (Vertex p1 : (Iterable<Vertex>) G.command(new OCommandSQL("SELECT * FROM V1")).execute())
{
    for (Vertex p2 : (Iterable<Vertex>) G.command(new OCommandSQL("SELECT * FROM V2")).execute())
    {
        // compare by value; == would compare object references
        if (java.util.Objects.equals(p1.getProperty("prop1"), p2.getProperty("prop1")))
        {
            //p1.addEdge("MyEdge", p2);
            G.command(new OCommandSQL("create edge MyEdge from " + p1.getId() + " to " + p2.getId() + " retry 100")).execute();
        }
    }
}
G.shutdown();
OrientDB 2.1.5 with Java/Graph API
NetBeans 8.1 with VM options -Xmx4096m and -Dstorage.diskCache.bufferSize=7200
Error message in console:
2016-05-24 15:48:06:112 INFO {db=MyDB} [TIP] Query 'SELECT * FROM
V1' returned a result set with more than 10000 records. Check if
you really need all these records, or reduce the resultset by using a
LIMIT to improve both performance and used RAM
[OProfilerStub]java.lang.OutOfMemoryError: Java heap space Dumping
heap to java_pid7896.hprof ...
Error message in Netbeans output
Exception in thread "main"
com.orientechnologies.orient.enterprise.channel.binary.OResponseProcessingException:
Exception during response processing. at
com.orientechnologies.orient.enterprise.channel.binary.OChannelBinaryAsynchClient.throwSerializedException(OChannelBinaryAsynchClient.java:443)
at
com.orientechnologies.orient.enterprise.channel.binary.OChannelBinaryAsynchClient.handleStatus(OChannelBinaryAsynchClient.java:398)
at
com.orientechnologies.orient.enterprise.channel.binary.OChannelBinaryAsynchClient.beginResponse(OChannelBinaryAsynchClient.java:282)
at
com.orientechnologies.orient.enterprise.channel.binary.OChannelBinaryAsynchClient.beginResponse(OChannelBinaryAsynchClient.java:171)
at
com.orientechnologies.orient.client.remote.OStorageRemote.beginResponse(OStorageRemote.java:2166)
at
com.orientechnologies.orient.client.remote.OStorageRemote.command(OStorageRemote.java:1189)
at
com.orientechnologies.orient.client.remote.OStorageRemoteThread.command(OStorageRemoteThread.java:444)
at
com.orientechnologies.orient.core.command.OCommandRequestTextAbstract.execute(OCommandRequestTextAbstract.java:63)
at
com.tinkerpop.blueprints.impls.orient.OrientGraphCommand.execute(OrientGraphCommand.java:49)
at xx.xxx.xxx.xx.MyEdge.(MyEdge.java:40) at
xx.xxx.xxx.xx.GMain.main(GMain.java:60) Caused by:
java.lang.OutOfMemoryError: GC overhead limit exceeded
As a workaround, you can use code similar to the following:
// assumes V1 lives in a single cluster (ids[0]) with contiguous record positions
Iterable<Vertex> cv1 = g.command(new OCommandSQL("SELECT count(*) FROM V1")).execute();
long counterv1 = cv1.iterator().next().getProperty("count");
int[] ids = g.getRawGraph().getMetadata().getSchema().getClass("V1").getClusterIds();
long repeat = counterv1 / 10000;
long rest = counterv1 - (repeat * 10000);
List<Vertex> v1 = new ArrayList<Vertex>();
int rid = 0;
// fetch the full batches of 10000 records
for (int i = 0; i < repeat; i++) {
    Iterable<Vertex> v = g.command(new OCommandSQL("SELECT * FROM V1 WHERE @rid >= #" + ids[0] + ":" + rid + " limit 10000")).execute();
    CollectionUtils.addAll(v1, v.iterator());
    rid = 10000 * (i + 1);
}
// fetch the remaining records, if any
if (rest > 0) {
    Iterable<Vertex> v = g.command(new OCommandSQL("SELECT * FROM V1 WHERE @rid >= #" + ids[0] + ":" + rid + " limit " + rest)).execute();
    CollectionUtils.addAll(v1, v.iterator());
}
Hope it helps.
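The batching idea can also be sketched independently of OrientDB. The following is a minimal illustration, not the code above: it swaps the position arithmetic for "last seen id" paging, so gaps in the ids do not matter, and it hands each batch to a handler instead of collecting everything in one list. The class `BatchPager` and the `fetchPage` callback are hypothetical; with OrientDB the callback would presumably wrap a query of the form `SELECT FROM V1 WHERE @rid > <last rid> LIMIT <n>`.

```java
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Consumer;

final class BatchPager {
    /**
     * Visits all ids in batches of at most batchSize.
     * fetchPage(afterId, limit) must return up to `limit` ids that are
     * strictly greater than afterId, in ascending order.
     * Returns the total number of ids visited.
     */
    static long forEachBatch(BiFunction<Long, Integer, List<Long>> fetchPage,
                             int batchSize,
                             Consumer<List<Long>> handler) {
        long lastId = -1L; // assumption: real ids are non-negative
        long total = 0;
        while (true) {
            List<Long> page = fetchPage.apply(lastId, batchSize);
            if (page.isEmpty()) {
                break; // nothing left after lastId
            }
            handler.accept(page);               // process this batch only
            total += page.size();
            lastId = page.get(page.size() - 1); // resume after the last id seen
        }
        return total;
    }
}
```

Because only one batch is referenced at a time, memory use is bounded by the batch size rather than by the number of vertices in the class.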

Mahout returning same results in sequential runs

I'm trying an Apache Mahout example using the code below. Everything works fine, except that each time I change the userId value I need to run the class twice before the new values are returned. That is, every run shows the output of the previous run, even though the userId, and therefore that user's recommendations, have changed.
I've tried not using the caching recommender, but that hasn't worked either.
I'm using Eclipse IDE and the code of the class is the following:
DataModel model = new FileDataModel(new File("database.csv"));
UserSimilarity userSimilarity = new LogLikelihoodSimilarity(model);
System.out.println("Method: " + userSimilarity.getClass().getName().substring(userSimilarity.getClass().getName().lastIndexOf(".") + 1));
int neighborhoodSize = 25; // renamed: the original declared "neighborhood" twice, which does not compile
System.out.println("Neighborhood: " + neighborhoodSize);
UserNeighborhood neighborhood = new NearestNUserNeighborhood(neighborhoodSize, userSimilarity, model);
Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, userSimilarity);
Recommender cachingRecommender = new CachingRecommender(recommender);
int userId = 1234;
System.out.println("User ID: " + userId);
List<RecommendedItem> recommendations = cachingRecommender.recommend(userId, 15);
System.out.println("Recommendations:");
for (RecommendedItem r : recommendations) {
System.out.println(r.getItemID() + " " + r.getValue());
}
