I prepared a CSV file with the input data for a neural network, and another CSV file on which I can test the network. The results are not satisfactory. I tried increasing and decreasing the size of the input data. I am probably missing something, and I would be glad for some tips. Here is my Encog code:
//input data
File file = new File("path to file");
CSVFormat format = new CSVFormat('.', ',');
VersatileDataSource source = new CSVDataSource(file, false, format);
VersatileMLDataSet data = new VersatileMLDataSet(source);
data.getNormHelper().setFormat(format);
ColumnDefinition wig20OpenN = data.defineSourceColumn("wig20OpenN", 0, ColumnType.continuous);
(...)
ColumnDefinition futureClose = data.defineSourceColumn("futureClose", 81, ColumnType.continuous);
data.analyze();
data.defineSingleOutputOthersInput(futureClose);
EncogModel model = new EncogModel(data);
// TYPE_RBFNETWORK, TYPE_SVM, TYPE_NEAT, TYPE_FEEDFORWARD <- these are the method types I tried
model.selectMethod(data, MLMethodFactory.TYPE_SVM);
model.setReport(new ConsoleStatusReportable());
data.normalize();
model.holdBackValidation(0.001, true, 10);
model.selectTrainingType(data);
MLRegression bestMethod = (MLRegression)model.crossvalidate(20, true);
// Display the training and validation errors.
System.out.println( "Training error: " + model.calculateError(bestMethod, model.getTrainingDataset()));
System.out.println( "Validation error: " + model.calculateError(bestMethod, model.getValidationDataset()));
NormalizationHelper helper = data.getNormHelper();
File testingData = new File("path to testing file");
ReadCSV csv = new ReadCSV(testingData, false, format);
String[] line = new String[81];
MLData input = helper.allocateInputVector();
while (csv.next()) {
    StringBuilder result = new StringBuilder();
    for (int i = 0; i < 81; i++) {
        line[i] = csv.get(i);
    }
    String correct = csv.get(81);
    helper.normalizeInputVector(line, input.getData(), false);
    MLData output = bestMethod.compute(input);
    String predicted = helper.denormalizeOutputVectorToString(output)[0];
    result.append(Arrays.toString(line));
    result.append(" -> predicted: ");
    result.append(predicted);
    result.append("(correct: ");
    result.append(correct);
    result.append(")");
    System.out.println(result.toString());
}
// Delete the data file and shut down.
file.delete();
Encog.getInstance().shutdown();
What I have tried so far is changing the MLMethodFactory type, but I ran into problems here: only TYPE_RBFNETWORK, TYPE_SVM, TYPE_NEAT and TYPE_FEEDFORWARD work fine; for example, if I change it to TYPE_PNN I get the following exception:
Exception in thread "main" org.encog.EncogError: Please call selectTraining first to choose how to train.
OK, I know from the documentation that I should use this method:
selectTraining(VersatileMLDataSet dataset, String trainingType, String trainingArgs)
But the String values expected for trainingType and trainingArgs are confusing.
And one last question: what about saving the neural network to a file after training, and loading it later to check it against the test data? I would like to have these as separate steps.
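A minimal sketch of what I imagine the save/load step could look like with EncogDirectoryPersistence (the file name network.eg is just a placeholder); I am not sure this is the recommended way:
import java.io.File;
import org.encog.ml.MLRegression;
import org.encog.persist.EncogDirectoryPersistence;

// Save the trained method to disk (placeholder file name).
EncogDirectoryPersistence.saveObject(new File("network.eg"), bestMethod);

// ... later, in a separate program, load it back and reuse it:
MLRegression loaded = (MLRegression) EncogDirectoryPersistence.loadObject(new File("network.eg"));
// Note: the NormalizationHelper would also have to be kept (e.g. via Java serialization)
// so that the test rows can be normalized/denormalized the same way as in training.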
Edit: I forgot to mention that the size of the input data is 1500 rows.
I see that you are not satisfied with your results, but they are actually relatively fine. I propose you consider adding scaling to your training. You have 81 columns, and in your input rows I see values like 16519.1600, 2315.94, and even -0.6388282285709328. It is hard for a neural network to adjust its weights correctly for inputs on such different scales.
P.S. Scaling here means normalizing the columns. Books usually describe normalizing rows, but normalizing columns is just as important.
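A minimal sketch of the kind of per-column scaling I mean (plain min-max scaling to [0, 1]; the data array here is just an illustration, not your actual dataset):
// Min-max scaling applied per column: x' = (x - min) / (max - min),
// so every input column ends up in the [0, 1] range.
public static void scaleColumns(double[][] data) {
    int cols = data[0].length;
    for (int c = 0; c < cols; c++) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double[] row : data) {            // find this column's range
            min = Math.min(min, row[c]);
            max = Math.max(max, row[c]);
        }
        double range = max - min;
        if (range == 0) {
            continue;                          // constant column, leave it as is
        }
        for (double[] row : data) {            // rescale the column
            row[c] = (row[c] - min) / range;
        }
    }
}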
I have the following dependency added:
<dependency>
<groupId>net.sf.supercsv</groupId>
<artifactId>super-csv</artifactId>
<version>2.4.0</version>
</dependency>
private final static String[] COLS = { "col1", "col2", "col3", "col4", "col5",
"col6", "col7", "col8", "col9", "col10", "col11",
"col12", "col13", "col14" };
private final static String[] TEMP_COLS = {"col1", "col2", "col3", "col4", "col5",
"col6", "col7", "col8", "col9", "col10", "col11",
"col12", "col13"};
Below is how I build my reader.
protected CsvPreference csvPref = CsvPreference.STANDARD_PREFERENCE;
protected String encoding = "US-ASCII";
InputStream is = fs.open(path);
BufferedReader br = new BufferedReader(new InputStreamReader(is, encoding));
ICsvBeanReader csvReader = new CsvBeanReader(br, csvPref);
As part of the bean reader, I have the following code:
Selections bean = null;
try{
bean = reader.read(Selections.class, Selections.getCols());
}catch(Exception e){
// bean = reader.read(Selections.class, Selections.getTempCols());
// slf4j.error(bean.getEventCode() + bean.getProgramId());
slf4j.error("Error Logged for bean because of COLUMNS MISMATCH");
}
The above code throws this exception:
java.lang.IllegalArgumentException:the nameMapping array and the number of columns read should be the same size (nameMapping length = 14, columns = 13))
I am not sure what is causing this exception. It is thrown for some of the records even though all the records have 14 columns (I have verified this with a script, and I have even created a schema and uploaded the file with 14 columns). Out of 7,000,000 records, 2,100,000 have this issue.
To debug which record is causing this problem, I made the changes below:
Selections bean = null;
try{
bean = reader.read(Selections.class, Selections.getCols());
}catch(Exception e){
bean = reader.read(Selections.class, Selections.getTempCols());
slf4j.error(bean.getEventCode() + bean.getProgramId());
slf4j.error("Error Logged for bean because of COLUMNS MISMATCH");
}
Now, the above changes are throwing : java.lang.IllegalArgumentException: the nameMapping array and the number of columns read should be the same size (nameMapping length = 13, columns = 14)
I have no idea why the CSV reader is behaving so strangely. When the column count is not 14 it throws an exception, yet when I try to read the record inside the exception handler to print the details, it says the column count is 14.
Please help me debug this issue. I will update with more details if needed. Please let me know.
After a dive into the Super CSV source, and your confirmation that you can upload the file with 14 columns correctly, I'd suggest you look for a replacement for Super CSV.
My recommendation: Check out Apache Commons CSV.
This library also supports an iterative approach, so you wouldn't need to hold 7,000,000 records in memory.
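For example, a minimal sketch of the iterative style with Commons CSV (the path and the expected column count of 14 are taken from your description; adjust the delimiter and quote settings, e.g. via CSVFormat.DEFAULT.withDelimiter(...), to whatever your file really uses):
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

public static void readSelections(String path) throws IOException {
    // Stream the file record by record, so the 7,000,000 rows never sit in memory at once.
    try (Reader reader = Files.newBufferedReader(Paths.get(path));
         CSVParser parser = CSVFormat.DEFAULT.parse(reader)) {
        for (CSVRecord record : parser) {
            if (record.size() != 14) {
                // Inspect the odd row instead of failing the whole run.
                System.err.println("Line " + parser.getCurrentLineNumber()
                        + " has " + record.size() + " columns");
                continue;
            }
            String col1 = record.get(0); // map the fields into your bean here
            // ...
        }
    }
}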
Finally I resolved the problem: it was caused by the quote character I had configured in my CSV preferences.
new CsvPreference.Builder('"', '\u0001', "\r\n").build()
My incoming data contains the " character as part of the data. The issue was resolved once I replaced the quote character with one that will never appear in the incoming data.
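For example, something along these lines; the \u0002 here is purely an illustration of "a character that will never occur in the data":
import org.supercsv.prefs.CsvPreference;

// Quote char swapped from '"' (which appears inside the data) to a control character
// that never occurs in the incoming rows; the delimiter stays '\u0001'.
CsvPreference pref = new CsvPreference.Builder('\u0002', '\u0001', "\r\n").build();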
I am not an expert at this; it was my own oversight, and Super CSV is not at fault. I believe Super CSV is a decent API to explore and use.
To know more about column quote mode, please refer to their API.
https://super-csv.github.io/super-csv/apidocs/org/supercsv/quote/ColumnQuoteMode.html
I use CsvJDBC to read data from a CSV. I get the CSV from a web service request, so it is not loaded from a file. I set these properties:
Properties props = new java.util.Properties();
props.put("separator", ";"); // separator is a semicolon
props.put("fileExtension", ".txt"); // file extension is .txt
props.put("charset", "UTF-8"); // UTF-8
My sample1.txt contains this data:
code;description
c01;d01
c02;d02
My sample2.txt contains this data:
code;description
c01;d01
c02;d0;;;;;2
Removing the headers from the CSV is optional for me, but changing the semicolon separator is not.
EDIT: My query for resultSet: SELECT * FROM myCSV
I want to read the code column in sample1.txt and sample2.txt with:
resultSet.getString(1)
and read the full description column with its many semicolons (d0;;;;;2). Is this possible with the CsvJdbc driver, or do I need to change drivers?
Thank you for any advice!
This is a problem that occurs when you have messy, invalid input, which you need to try to interpret, that's being read by a too-high-level package that only handles clean input. A similar example is trying to read arbitrary HTML with an XML parser - close, but no cigar.
You can guess where I'm going: you need to pre-process your input.
The preprocessing may be very easy if you can make some assumptions about the data - for example, if there are guaranteed to be no quoted semi-colons in the first column.
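For example, under that assumption a minimal sketch is to split each line on the first semicolon only (this presumes the lines have already been read into Strings):
// Split each raw line on the FIRST semicolon only: everything after it is the
// description, even if the description itself contains more semicolons.
for (String line : lines) {
    String[] parts = line.split(";", 2);   // limit = 2 keeps the rest of the line intact
    String code = parts[0];
    String description = parts.length > 1 ? parts[1] : "";
    // hand code/description to whatever needs them next
}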
You could try Super CSV. We have implemented such a solution in our project. More on this can be found at http://supercsv.sourceforge.net/
and
Using CsvBeanReader to read a CSV file with a variable number of columns
Finally, this problem was solved without the CsvJdbc or Super CSV drivers. These drivers work fine: you can query data from a CSV file and they have many features. In my case I don't need to query data from the CSV. Unfortunately, the description column sometimes contains one or more semicolons, which is my separator.
First I took the code from the answer by @Maher Abuthraa and modified it to:
private String createDescriptionFromResult(ResultSet resultSet, int columnCount) throws SQLException {
    if (columnCount > 2) {
        StringBuilder data_list = new StringBuilder();
        for (int ii = 2; ii <= columnCount; ii++) {
            data_list.append(resultSet.getString(ii));
            if (ii != columnCount)
                data_list.append(";");
        }
        // data_list has all data from all the indexes you are looking for
        return data_list.toString();
    } else {
        // use the standard way
        return resultSet.getString(2);
    }
}
The loop starts from 2 because column 1 is the code and only the description column contains the extra semicolons. The CsvJdbc driver splits the columns by the ; separator, so those semicolons disappear from the column data. I therefore explicitly re-add a semicolon after each fragment of the description, except after the last column, because it is not relevant in my case.
This code works fine, but it did not solve all of my problems. When I declared only two columns in the CSV header, I got an error on any row that contained more than two semicolons. So I tried to either ignore the headers or add extra column names (or simply extra ;) to the header. In Super CSV the ignore-headers option works fine.
My colleague's opinion was: you don't need a CSV driver at all, because you are trying to load a CSV that is not really a CSV if the separator is sometimes part of the data.
I think my colleague is right, so I loaded the CSV data with the following code:
InputStream in = null;
try {
    in = new ByteArrayInputStream(csvData);
    List<String> lines = IOUtils.readLines(in, "UTF-8");
    Iterator<String> it = lines.iterator();
    while (it.hasNext()) {
        String line = it.next();
        String description = null;
        String code = null;
        String[] columns = line.split(";");
        if (columns.length >= 2) {
            code = columns[0];
            // re-join everything after the first column so the description keeps its semicolons
            String[] dest = new String[columns.length - 1];
            System.arraycopy(columns, 1, dest, 0, columns.length - 1);
            description = org.apache.commons.lang.StringUtils.join(dest, ";");
            (...)
OK, my solution is to read all the remaining fields whenever there are more than 2 columns, like:
int ccc = meta.getColumnCount();
if (ccc > 2) {
    ArrayList<String> data_list = new ArrayList<String>();
    for (int ii = 1; ii < ccc; ii++) {
        data_list.add(resultSet.getString(ii));
    }
    // data_list has all data from all the indexes you are looking for
} else {
    // use the standard way
    resultSet.getString(1);
}
If the table is defined to have as many columns as there could be semi-colons in the source, ignoring the initial column definitions, then the excess semi-colons would be consumed by the database driver automatically.
The most likely reason for them to appear in the final column is because the parser returns the balance of the row to the terminator in the field.
Simply increasing the number of columns in the table to match the maximum possible in the input will avoid the need for custom parsing in the program. Try:
code;description;dummy1;dummy2;dummy3;dummy4;dummy5
c01;d01
c02;d0;;;;;2
Then, the additional ';' delimiters will be consumed by the parser correctly.
I have a large file which has 10,000 rows and each row has a date appended at the end. All the fields in a row are tab separated. There are 10 dates available and those 10 dates have randomly been assigned to all the 10,000 rows. I am now writing a java code to write all those rows with the same date into a separate file where each file has the corresponding rows with that date.
I am trying to do it using string manipulation, but when I try to sort the rows based on the date, I get an error on the date value saying the literal is out of range. Here is the code I used. Please have a look at it and let me know whether this is the right approach; if not, kindly suggest a better one. I tried changing the datatype to Long, but I still get the same error. A row in the file looks something like this:
Each field is tab separated and the fields are:
business id, category, city, biz.name, longitude, state, latitude, type, date
qarobAbxGSHI7ygf1f7a_Q ["Sandwiches","Restaurants"] Gilbert Jersey Mike's Subs -111.8120071 AZ 3.5 33.3788385 business 06012010
The code is:
File f = new File(fn);
if (f.exists() && f.length() > 0) {
    BufferedReader br = new BufferedReader(new FileReader(fn));
    BufferedWriter bw = new BufferedWriter(new FileWriter("FilteredDate.txt"));
    String s = null;
    while ((s = br.readLine()) != null) {
        String[] st = s.split("\t");
        if (Integer.parseInt(st[13]) == 06012010) {
Thanks a lot for your time..
Try this,
List<String> sampleList = new ArrayList<String>();
sampleList.add("06012012");
sampleList.add("06012013");
sampleList.add("06012014");
sampleList.add("06012015");
//
//
// s is the current line read from the file; stringBuilder collects the matched dates
String[] sampleArray = s.split(" ");
if (sampleArray != null) {
    String sample = sampleArray[sampleArray.length - 1];
    if (sampleList.contains(sample)) {
        stringBuilder.append(sample + "\n");
    }
}
I suggest not using split, but rather:
String str = s.substring(s.lastIndexOf('\t'));
In any case, you take st[13] when I can see you only have 9 columns; you probably just need st[8].
One last thing: look at this post to learn what 06012010 really means as a Java literal.
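Along those lines, a minimal sketch that takes everything after the last tab as the date and writes each row to a per-date file, keeping the date as a String (a leading zero, as in 06012010, makes Java treat an int literal as octal, which is why comparing the date as text is safer):
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.HashMap;
import java.util.Map;

public static void splitByDate(String inputFile) throws IOException {
    Map<String, PrintWriter> writers = new HashMap<String, PrintWriter>(); // one output file per date
    BufferedReader br = new BufferedReader(new FileReader(inputFile));
    try {
        String row;
        while ((row = br.readLine()) != null) {
            // The date is the last tab-separated field; keep it as a String, never parse it to int.
            String date = row.substring(row.lastIndexOf('\t') + 1).trim();
            PrintWriter out = writers.get(date);
            if (out == null) {
                out = new PrintWriter(new FileWriter(date + ".txt"));
                writers.put(date, out);
            }
            out.println(row);
        }
    } finally {
        br.close();
        for (PrintWriter out : writers.values()) {
            out.close();
        }
    }
}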
I need to normalize a CSV file. I followed this article written by Jeff Heaton. This is (some of) my code:
File sourceFile = new File("Book1.csv");
File targetFile = new File("Book1_norm.csv");
EncogAnalyst analyst = new EncogAnalyst();
AnalystWizard wizard = new AnalystWizard(analyst);
wizard.wizard(sourceFile, true, AnalystFileFormat.DECPNT_COMMA);
final AnalystNormalizeCSV norm = new AnalystNormalizeCSV();
norm.analyze(sourceFile, false, CSVFormat.ENGLISH, analyst);
norm.setProduceOutputHeaders(false);
norm.normalize(targetFile);
The only difference between my code and the one in the article is this line:
norm.setOutputFormat(CSVFormat.ENGLISH);
I tried to use it, but it seems that in Encog 3.1.0 that method doesn't exist. The error I get is this one (it looks like the problem is with the line norm.normalize(targetFile)):
Exception in thread "main" org.encog.app.analyst.AnalystError: Can't find column: 11700
at org.encog.app.analyst.util.CSVHeaders.find(CSVHeaders.java:187)
at org.encog.app.analyst.csv.normalize.AnalystNormalizeCSV.extractFields(AnalystNormalizeCSV.java:77)
at org.encog.app.analyst.csv.normalize.AnalystNormalizeCSV.normalize(AnalystNormalizeCSV.java:192)
at IEinSoftware.main(IEinSoftware.java:55)
I added a FAQ that shows how to normalize a CSV file. http://www.heatonresearch.com/faq/4/2
Here's a function to do it (this is C#, using Encog for .NET); of course you need to create an analyst first:
private EncogAnalyst _analyst;
public void NormalizeFile(FileInfo SourceDataFile, FileInfo NormalizedDataFile)
{
    var wizard = new AnalystWizard(_analyst);
    wizard.Wizard(SourceDataFile, _useHeaders, AnalystFileFormat.DecpntComma);
    var norm = new AnalystNormalizeCSV();
    norm.Analyze(SourceDataFile, _useHeaders, CSVFormat.English, _analyst);
    norm.ProduceOutputHeaders = _useHeaders;
    norm.Normalize(NormalizedDataFile);
}
I don't have SoapUI Pro. I am testing a web service. The requirement is: I need to pass an error code in the request, and the corresponding error description should be returned in the response. I need to add an assertion for this, and the description in the response varies every time.
Here is exactly what I want:
I need to run the same request repeatedly, changing only the error code (the input) each time, while the description in the response varies. How can I validate this? Is there a way to do it without a data source?
Regards,
Chandra
This is the way I have created it. Is there any way to improve the code or do it in a better way?
import java.io.File;
File file = new File('c:/customers.csv');
InputStream inputFile = new FileInputStream(file);
String[] lines = inputFile.text.split('\n');
List<String[]> rows = lines.collect {it.split(',')}
log.info('There are ' + rows.size() + ' customers to be inserted');
for(int i = 0; i < rows.size(); i++) {
String[] row = rows.get(i);
String errorcode = row[0];
// log.info(errorcode)
String errorDescription = row[1];
//log.info(errorDescription)
testRunner.testCase.testSuite.project.setPropertyValue('errorcode', errorcode);
testRunner.testCase.testSuite.project.setPropertyValue('errorDescription', errorDescription);
testRunner.runTestStepByName("createCard-1");
log.info(errorcode + " finished")
}