"Invalid classification data: expect label value" - java

I'm trying to train a model using deep learning in Java. When I start training on the training data, I get this error:
Invalid classification data: expect label value (at label index column = 0) to be in range 0 to 1 inclusive (0 to numClasses-1, with numClasses=2); got label value of 2
I don't understand the error, since I am a beginner with Deeplearning4j. I am using a data set that describes relationships between pairs of people: if two people have a relationship the class label is 1, otherwise it is 0.
The Java code
public class SNA {
private static Logger log = LoggerFactory.getLogger(SNA.class);
public static void main(String[] args) throws Exception {
int seed = 123;
double learningRate = 0.01;
int batchSize = 50;
int nEpochs = 30;
int numInputs = 2;
int numOutputs = 2;
int numHiddenNodes = 20;
//load the training data
RecordReader rr = new CSVRecordReader(0,",");
rr.initialize(new FileSplit(new File("C:\\Users\\GTS\\Desktop\\SNA project\\experiments\\First experiment\\train\\slashdotTrain.csv")));
DataSetIterator trainIter = new RecordReaderDataSetIterator(rr, batchSize,0, 2);
// load test data
RecordReader rrTest = new CSVRecordReader();
rr.initialize(new FileSplit(new File("C:\\Users\\GTS\\Desktop\\SNA project\\experiments\\First experiment\\test\\slashdotTest.csv")));
DataSetIterator testIter = new RecordReaderDataSetIterator(rrTest, batchSize,0, 2);
log.info("**** Building Model ****");
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
.seed(seed)
.optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
.iterations(1)
.learningRate(learningRate)
.updater(Updater.NESTEROVS).momentum(0.9)
.list()
.layer(0, new DenseLayer.Builder()
.nIn(numInputs)
.nOut(numHiddenNodes)
.activation("relu")
.weightInit(WeightInit.XAVIER)
.build())
.layer(1, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
.activation("softmax")
.weightInit(WeightInit.XAVIER)
.nIn(numHiddenNodes)
.nOut(numOutputs)
.build())
.pretrain(false).backprop(true)
.build();
MultiLayerNetwork model = new MultiLayerNetwork(conf);
model.init();
// Listener to show how the network is training in the log
model.setListeners(new ScoreIterationListener(10));
log.info(" **** Train Model **** ");
for (int i = 0; i < nEpochs; i++) {
model.fit(trainIter);
}
System.out.println("**** Evaluate Model ****");
Evaluation evaluation = new Evaluation(numOutputs);
while (testIter.hasNext()) {
DataSet t = testIter.next();
INDArray feature = t.getFeatureMatrix();
INDArray labels = t.getLabels();
INDArray predicted = model.output(feature, false);
evaluation.eval(labels, predicted);
}
System.out.println(evaluation.stats());
}
}
Any help, please?
Thanks a lot.

Problem solved:
Change the third parameter of RecordReaderDataSetIterator (the label index) from 0 to 2, both for trainIter and for testIter. The data set has three columns and the class label is the third one, so its zero-based index is 2. With index 0 the iterator treated the first column as the label, which is why it reported a label value of 2 outside the range 0 to 1.
Solution:
DataSetIterator trainIter = new RecordReaderDataSetIterator(rr, batchSize,2, 2);
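A quick way to confirm which column holds the labels is to check the raw CSV rows before handing them to the iterator. The following is a minimal sketch in plain Java, not part of Deeplearning4j; the class name LabelColumnCheck, the method isValidLabelColumn, and the sample rows are all made up for illustration, assuming a three-column layout with the label last.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: checks whether a CSV column holds valid class labels. */
final class LabelColumnCheck {

    /** Returns true if every row has an integer in [0, numClasses) at labelIndex. */
    static boolean isValidLabelColumn(List<String> rows, int labelIndex, int numClasses) {
        for (String row : rows) {
            String[] cols = row.split(",");
            if (labelIndex >= cols.length) {
                return false; // the row has no such column
            }
            try {
                int label = Integer.parseInt(cols[labelIndex].trim());
                if (label < 0 || label >= numClasses) {
                    return false; // label outside 0 .. numClasses-1
                }
            } catch (NumberFormatException e) {
                return false; // not an integer label
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Made-up rows in the assumed layout: personA, personB, label
        List<String> rows = Arrays.asList("2,7,1", "3,5,0", "2,9,1");
        System.out.println("column 0 valid: " + isValidLabelColumn(rows, 0, 2)); // false
        System.out.println("column 2 valid: " + isValidLabelColumn(rows, 2, 2)); // true
    }
}
```

If column 0 fails this check and column 2 passes, the label index passed to the iterator should be 2.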


Casting input from file as object reference

What I'm wondering is, is there a way to read the file so that the strings 'vertex' and 'connected' can be cast as a reference to the Vertex objects outside the loop? The Strings will share the same name as the file is read, with it progressing from a-t, so that isn't a problem. If this isn't possible, is there any other way to work around this? Maybe by somehow creating a vertex object within the loop yet not overwriting it? I tried it but it would overwrite itself for each loop, as it would create a new Vertex with the same value each time. Thanks in advance.
public static void main(String[] args) throws FileNotFoundException {
File file = new File("ass3.txt");
Scanner scan = new Scanner(file);
if (file.exists() == false) {
System.out.println("File doesn't exist or could not be found.");
System.exit(0);
}
int nVertices = scan.nextInt();
int nEdges = scan.nextInt();
for (int i = 0; i < 21; i++) {
String s = scan.nextLine();
}
Queue selectedSet = new Queue();
Queue candidateSet = new Queue();
Vertex a = new Vertex("a");
Vertex b = new Vertex("b");
Vertex c = new Vertex("c");
Vertex d = new Vertex("d");
Vertex e = new Vertex("e");
Vertex f = new Vertex("f");
Vertex g = new Vertex("g");
Vertex h = new Vertex("h");
Vertex i = new Vertex("i");
Vertex j = new Vertex("j");
Vertex k = new Vertex("k");
Vertex l = new Vertex("l");
Vertex m = new Vertex("m");
Vertex n = new Vertex("n");
Vertex o = new Vertex("o");
Vertex p = new Vertex("p");
Vertex q = new Vertex("q");
Vertex r = new Vertex("r");
Vertex s = new Vertex("s");
Vertex t = new Vertex("t");
for (int z = 0; z < 99; z++) {
String vertex = scan.next();
String connected = scan.next();
int weight = scan.nextInt();
vertex.addNeighbour(new Edge(weight,vertex,connected));
}
You should be using a Map. A Map associates keys with values; in this case the keys are the vertex names (Strings) and the values are the Vertex objects.
Some sample code:
HashMap<String, Vertex> vertices = new HashMap<String, Vertex>();
vertices.put("a", new Vertex("a"));
...
You can then look up vertices in the map, inside or outside your loop:
Vertex v = vertices.get(vertex);
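Put together, the map approach might look like the sketch below. The Vertex and Edge classes here are hypothetical stand-ins for the ones in the question (their real fields are not shown there), and GraphLoader and load are names invented for this example. It uses computeIfAbsent so each vertex is created once, the first time its name is read, which also removes the need for the twenty hand-written declarations.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Scanner;

final class GraphLoader {

    // Hypothetical stand-ins for the question's Vertex and Edge classes
    static final class Vertex {
        final String name;
        final List<Edge> neighbours = new ArrayList<>();
        Vertex(String name) { this.name = name; }
        void addNeighbour(Edge e) { neighbours.add(e); }
    }

    static final class Edge {
        final int weight;
        final Vertex from;
        final Vertex to;
        Edge(int weight, Vertex from, Vertex to) {
            this.weight = weight;
            this.from = from;
            this.to = to;
        }
    }

    /** Reads "from to weight" triples and returns the vertices keyed by name. */
    static Map<String, Vertex> load(Scanner scan) {
        Map<String, Vertex> vertices = new HashMap<>();
        while (scan.hasNext()) {
            String from = scan.next();
            String to = scan.next();
            int weight = scan.nextInt();
            // Reuse the existing Vertex for a name, or create it on first sight
            Vertex v = vertices.computeIfAbsent(from, Vertex::new);
            Vertex w = vertices.computeIfAbsent(to, Vertex::new);
            v.addNeighbour(new Edge(weight, v, w));
        }
        return vertices;
    }

    public static void main(String[] args) {
        Map<String, Vertex> vertices = load(new Scanner("a b 4\na c 2\nb c 7"));
        System.out.println(vertices.get("a").neighbours.size()); // 2
    }
}
```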

How can I predict unlabeled data with .arff file in Weka?

I have labeled training and test files (both named "labeled.arff"). I build a classifier from them and write it to "modelFile.model".
I also have an "unlabeled.arff" file in which the last attribute of every row is "?".
How can I make predictions for it in Java or C#?
I have some code, but it is not right: it always gives me the same prediction.
Thank you
// Write to Model
public static void Classify()
{
Instances train = new Instances(new java.io.FileReader(dirTrain + "labeled.arff"));
Instances test = new Instances(new java.io.FileReader(dirTest + "labeled.arff"));
train.setClassIndex(train.numAttributes() - 1);
test.setClassIndex(test.numAttributes() - 1);
// train Classifier
Classifier cl = new J48();
// Randomize the order of the instances in the dataset
weka.filters.Filter myRandom = new weka.filters.unsupervised.instance.Randomize();
myRandom.setInputFormat(train);
train = weka.filters.Filter.useFilter(train, myRandom);
// Build the classifier
cl.buildClassifier(train);
// evaluate classifier and print some statistics
Evaluation eval = new Evaluation(train);
eval.evaluateModel(cl, test);
Console.WriteLine(eval.toSummaryString("\nResults Decision Tree\n======\n", false));
SerializationHelper.write(dirModel + "modelFile.model", cl);
}
// Make predictions
public void Predictions()
{
Classifier cl = (Classifier)SerializationHelper.read(dirModel + "modelFile.model");
// load unlabeled data
Instances unlabeled = new Instances(new java.io.FileReader(pathFeatures + "unlabeled.arff"));
// set class attribute
unlabeled.setClassIndex(unlabeled.numAttributes() - 1);
// create copy
Instances labeled = new Instances(unlabeled);
// label instances
for (int i = 0; i < unlabeled.numInstances(); i++)
{
double clsLabel = cl.classifyInstance(unlabeled.instance(i));
labeled.instance(i).setClassValue(clsLabel);
}
int numCorrect = 0;
for (int i = 0; i < unlabeled.numInstances(); i++)
{
double pred = cl.classifyInstance(unlabeled.instance(i));
Console.Write("ID: " + unlabeled.instance(i).value(i));
//Console.Write(", actual: " + unlabeled.classAttribute().value((int)unlabeled.instance(i).classValue()));
Console.WriteLine(", predicted: " + unlabeled.classAttribute().value((int)pred));
}
Console.WriteLine("Correct predictions: " + numCorrect);
}

Repaired Records : Cell information from worksheet created

I'm receiving an error when opening my OpenXML created spreadsheet. The error is as follows.
repaired record : xl/worksheets/sheet.xml partial cell information
private void SavexlsExcelFile(String fullPathName)
{
using (SpreadsheetDocument document = SpreadsheetDocument.Create(fullPathName, SpreadsheetDocumentType.Workbook))
{
WorkbookPart workbookPart = document.AddWorkbookPart();
workbookPart.Workbook = new Workbook();
worksheetPart = workbookPart.AddNewPart<WorksheetPart>();
worksheetPart.Worksheet = new Worksheet();
Columns columns = new Columns();
worksheetPart.Worksheet.AppendChild(columns);
Sheets sheets = workbookPart.Workbook.AppendChild(new Sheets());
Sheet sheet = new Sheet() { Id = workbookPart.GetIdOfPart(worksheetPart), SheetId = 1, Name = "Sheet" };
sheets.Append(sheet);
workbookPart.Workbook.Save();
sheetData = worksheetPart.Worksheet.AppendChild(new SheetData());
List<List<string>> dataRow = new List<List<string>>();
List<String> dtRow = new List<String>();
Row row = new Row();
for (int i = 0; i < dataGridView1.RowCount; i++)
{
for (int j = 0; j < dataGridView1.ColumnCount; j++)
{
if (i == 0)
{
Cell dataCell = new Cell();
dataCell.DataType = CellValues.String;
CellValue cellValue = new CellValue();
cellValue.Text = dataGridView1.Columns[j].Name;
dataCell.StyleIndex = 2;
dataCell.Append(cellValue);
row.AppendChild(dataCell);
//dataColumn.Add(dataGridView1.Columns[j].Name);
}
dtRow.Add(dataGridView1.Rows[i].Cells[j].Value.ToString());
}
}
dataRow.Add(dtRow);
sheetData.AppendChild(row);
row = new Row();
foreach (List<string> datarow in dataRow)
{
row = new Row();
foreach(string dtrow in datarow)
{
row.Append(ConstructCell(dtrow, CellValues.String, 2));
}
sheetData.AppendChild(row);
}
worksheetPart.Worksheet.Save();
}
}
private Cell ConstructCell(string value, CellValues dataType, uint styleIndex = 0)
{
return new Cell()
{
CellValue = new CellValue(value),
DataType = new EnumValue<CellValues>(dataType),
StyleIndex = styleIndex
};
}
There are 2 issues here that I can see. The first is that your use of Columns is incorrect. You should use Columns if you wish to control things such as the width of a column. To use Columns correctly, you'll need to add child Column elements. For example (taken from here):
Columns columns = new Columns();
columns.Append(new Column() { Min = 1, Max = 3, Width = 20, CustomWidth = true });
columns.Append(new Column() { Min = 4, Max = 4, Width = 30, CustomWidth = true });
In your sample you could just remove the following two lines
Columns columns = new Columns();
worksheetPart.Worksheet.AppendChild(columns);
The second issue is the StyleIndex you are using; the style doesn't exist in your document because you haven't added it. The easiest thing to do here is to just remove the StyleIndex altogether.
When debugging files like this, it's always worth looking at the OpenXml Productivity Tool. You can open a generated file in the tool and validate it to see what errors you have in your file.
Excel normally stores cell text in a shared string table. You need to insert the string into the shared string table:
string text = dataGridView1.Columns[j].Name;
cell.DataType = CellValues.SharedString;
if (!_spreadSheet.WorkbookPart.GetPartsOfType<SharedStringTablePart>().Any())
{
    _spreadSheet.WorkbookPart.AddNewPart<SharedStringTablePart>();
}
var sharedStringTablePart = _spreadSheet.WorkbookPart.GetPartsOfType<SharedStringTablePart>().First();
if (sharedStringTablePart.SharedStringTable == null)
{
    sharedStringTablePart.SharedStringTable = new SharedStringTable();
}
// Iterate through the shared string table to check if the value is already present.
foreach (SharedStringItem ssItem in sharedStringTablePart.SharedStringTable.Elements<SharedStringItem>())
{
    if (ssItem.InnerText == text)
    {
        cell.CellValue = new CellValue(ssItem.ElementsBefore().Count().ToString());
        SaveChanges();
        return;
    }
}
// The text does not exist in the part. Create the SharedStringItem.
var item = sharedStringTablePart.SharedStringTable.AppendChild(new SharedStringItem(new Text(text)));
cell.CellValue = new CellValue(item.ElementsBefore().Count().ToString());
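The lookup-or-append logic in that snippet can be seen in isolation in the following toy model, written in Java to match the other examples on this page. It is not the Open XML SDK: SharedStrings and its methods are invented names, and the sketch only illustrates the idea that each distinct string is stored once and cells refer to it by its zero-based position. It also keeps a map from text to index so a lookup does not have to scan every item, which is an assumption about what one might want rather than something the answer does.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a shared string table; not part of any spreadsheet library. */
final class SharedStrings {
    private final List<String> items = new ArrayList<>();
    private final Map<String, Integer> indexByText = new HashMap<>();

    /** Returns the index of text, appending it first if it is not yet present. */
    int indexOf(String text) {
        Integer existing = indexByText.get(text);
        if (existing != null) {
            return existing; // already stored: reuse its position
        }
        int index = items.size(); // new entries go at the end
        items.add(text);
        indexByText.put(text, index);
        return index;
    }

    int size() {
        return items.size();
    }

    String get(int index) {
        return items.get(index);
    }
}
```

A cell holding the text "Name" would then store the number returned by indexOf("Name") rather than the text itself.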

ClassifyInstance() Method error

For the code below, the classifyInstance() line gives an error:
Exception in thread "main" java.lang.NullPointerException
at weka.classifiers.functions.LinearRegression.classifyInstance(LinearRegression.java:272)
at LR.main(LR.java:45)
I tried to debug it, without success. How can I use my saved model to predict the class attribute of my test file? This is a regression problem.
for (int i = 0; i < unlabeled.numInstances(); i++) {
double clsLabel = cls.classifyInstance(unlabeled.instance(i));
labeled.instance(i).setClassValue(clsLabel);
System.out.println(clsLabel + " -> " + unlabeled.classAttribute().value((int) clsLabel));
}
This is the actual code:
public class LR{
public static void main(String[] args) throws Exception
{
BufferedReader datafile = new BufferedReader(new FileReader("C:\\dataset.arff"));
Instances data = new Instances(datafile);
data.setClassIndex(data.numAttributes()-1); //setting class attribute
datafile.close();
LinearRegression lr = new LinearRegression(); //build model
int folds=10;
Evaluation eval = new Evaluation(data);
eval.crossValidateModel(lr, data, folds, new Random(1));
System.out.println(eval.toSummaryString());
//save the model
weka.core.SerializationHelper.write("C:\\lr.model", lr);
//load the model
Classifier cls = (Classifier)weka.core.SerializationHelper.read("C:\\lr.model");
Instances unlabeled = new Instances(new BufferedReader(new FileReader("C:\\testfile.arff")));
// set class attribute
unlabeled.setClassIndex(unlabeled.numAttributes() - 1);
// create copy
Instances labeled = new Instances(unlabeled);
double clsLabel;
// label instances
for (int i = 0; i < unlabeled.numInstances(); i++)
{
clsLabel = cls.classifyInstance(unlabeled.instance(i));
labeled.instance(i).setClassValue(clsLabel);
System.out.println(clsLabel + " -> " + unlabeled.classAttribute().value((int) clsLabel));
}
// save labeled data
BufferedWriter writer = new BufferedWriter(new FileWriter("C:\\final.arff"));
writer.write(labeled.toString());
writer.newLine();
writer.flush();
writer.close();
}
}
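Did you train your classifier?
It looks to me like you are trying to classify without having trained your classifier. crossValidateModel trains and evaluates copies of the classifier on each fold and leaves lr itself untrained, so call lr.buildClassifier(data) before saving the model.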
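Why evaluation alone leaves the original object untrained can be shown with a toy example that does not use Weka at all. Everything below (UntrainedModelDemo, MeanModel, evaluateOnCopy) is hypothetical and only mimics the pattern of evaluating on a copy; it is a sketch of the idea, not of Weka's implementation.

```java
final class UntrainedModelDemo {

    /** Toy regressor that predicts the mean of its training targets. */
    static final class MeanModel {
        private Double mean; // stays null until train() is called

        void train(double[] targets) {
            double sum = 0;
            for (double t : targets) {
                sum += t;
            }
            mean = sum / targets.length;
        }

        boolean isTrained() {
            return mean != null;
        }

        double predict() {
            if (mean == null) {
                throw new IllegalStateException("model was never trained");
            }
            return mean;
        }

        MeanModel copy() {
            MeanModel m = new MeanModel();
            m.mean = mean;
            return m;
        }
    }

    /** Trains and scores a copy, so the model passed in is left untouched. */
    static double evaluateOnCopy(MeanModel model, double[] train, double[] test) {
        MeanModel copy = model.copy();
        copy.train(train);
        double error = 0;
        for (double t : test) {
            error += Math.abs(copy.predict() - t);
        }
        return error / test.length; // mean absolute error
    }
}
```

After evaluateOnCopy returns, calling predict() on the original model still throws, because only the copy was trained. Training the original explicitly before saving or using it avoids that.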

Create table with docx4j

I am trying to create a new table from input data and insert it into a docx document.
The following code produces a corrupted output file:
private Tbl getSampleTable(WordprocessingMLPackage wPMLpackage) {
ObjectFactory factory = Context.getWmlObjectFactory();
int writableWidthTwips = wPMLpackage.getDocumentModel().getSections().get(0).getPageDimensions().getWritableWidthTwips();
List<Map<String, String>> data = getSampleTableData();
TableDefinition tableDef = getSampleTableDef();
int cols = tableDef.getColumns().size();
int cellWidthTwips = new Double(Math.floor((writableWidthTwips / cols))).intValue();
Tbl table = TblFactory.createTable((data.size() + 1), cols, cellWidthTwips);
Tr headerRow = (Tr) table.getContent().get(0);
int f = 0;
for (Column column : tableDef.getColumns()) {
Tc column = (Tc) headerRow.getContent().get(f);
f++;
Text text = factory.createText();
text.setValue(column.getName());
R run = factory.createR();
run.getContent().add(text);
column.getContent().add(run);
headerRow.getContent().add(column);
}
int i = 1;
for (Map<String, String> entry : data) {
Tr row = (Tr) table.getContent().get(i);
i++;
int p = 0;
for (String key : entry.keySet()) {
Tc column = (Tc) row.getContent().get(p);
p++;
Text tx = factory.createText();
R run = factory.createR();
tx.setValue(entry.get(key));
run.getContent().add(tx);
column.getContent().add(run);
row.getContent().add(column);
}
}
return table;
}
Without the table inserted, the docx document is created as expected.
I call this function to insert the table into a file that I receive as an input parameter:
ByteArrayInputStream bis = new ByteArrayInputStream(file);
WordprocessingMLPackage wPMLpackage = null;
wPMLpackage = WordprocessingMLPackage.load(bis);
// Zip it up
ByteArrayOutputStream baos = new ByteArrayOutputStream();
SaveToZipFile saver = new SaveToZipFile(wPMLpackage);
saver.save(baos);
byte[] template = baos.toByteArray();
WordprocessingMLPackage target = WordprocessingMLPackage.load(new ByteArrayInputStream(template));
target.getMainDocumentPart().getContent().clear();
target.getMainDocumentPart().addObject(getSampleTable(target));
ByteArrayOutputStream baos2 = new ByteArrayOutputStream();
SaveToZipFile saver2 = new SaveToZipFile(target);
saver2.save(baos2);
return baos2.toByteArray();
Does anyone have an idea why the generated file can't be interpreted by Microsoft Word? The error message is "The file can't be opened as its contents causes problems". Manipulating the document works as long as I don't insert this table.
Inserting the runs into the paragraph inside each cell, rather than adding them directly to the cell, leads to the desired result:
private Tbl getSampleTable(WordprocessingMLPackage wPMLpackage) {
    ObjectFactory factory = Context.getWmlObjectFactory();
    int writableWidthTwips = wPMLpackage.getDocumentModel().getSections()
            .get(0).getPageDimensions()
            .getWritableWidthTwips();
    List<Map<String, String>> data = getSampleTableData();
    TableDefinition tableDef = getSampleTableDef();
    int cols = tableDef.getColumns().size();
    int cellWidthTwips = new Double(
            Math.floor((writableWidthTwips / cols))
    ).intValue();
    // One extra row for the header
    Tbl table = TblFactory.createTable((data.size() + 1), cols, cellWidthTwips);
    Tr headerRow = (Tr) table.getContent().get(0);
    int f = 0;
    for (Column column : tableDef.getColumns()) {
        // Named "cell" so it does not clash with the loop variable "column"
        Tc cell = (Tc) headerRow.getContent().get(f);
        // The run goes into the cell's first paragraph, not into the cell itself
        P cellPara = (P) cell.getContent().get(0);
        f++;
        Text text = factory.createText();
        text.setValue(column.getName());
        R run = factory.createR();
        run.getContent().add(text);
        cellPara.getContent().add(run);
    }
    int i = 1;
    for (Map<String, String> entry : data) {
        Tr row = (Tr) table.getContent().get(i);
        i++;
        int d = 0;
        for (String key : entry.keySet()) {
            Tc cell = (Tc) row.getContent().get(d);
            P cellPara = (P) cell.getContent().get(0);
            d++;
            Text tx = factory.createText();
            R run = factory.createR();
            tx.setValue(entry.get(key));
            run.getContent().add(tx);
            cellPara.getContent().add(run);
        }
    }
    return table;
}
In creating a table (or anything else for that matter), one approach worth bearing in mind is to create what you want in Word, then use one of the docx4j code gen tools to generate corresponding Java code.
The code gen tool is available 2 ways:
online at http://webapp.docx4java.org/OnlineDemo/PartsList.html
or as a Word AddIn, see http://www.docx4java.org/forums/docx4jhelper-addin-f30/
The advantage of the Word AddIn is that you avoid the save-upload cycle.
