How to merge two sets of weka Instances together - java

Currently, I'm copying one instance at a time from one dataset to the other. Is there a way to do this so that string mappings remain intact? The mergeInstances works horizontally, is there an equivalent vertical merge?
This is one step of a loop I use to read datasets of the same structure from multiple arff files into one large dataset. There has got to be a simpler way.
Instances iNew = new ConverterUtils.DataSource(name).getDataSet();
for (int i = 0; i < iNew.numInstances(); i++) {
Instance nInst = iNew.instance(i);
inst.add(nInst);
}

If you want a fully automated method that also copies string and nominal attributes properly, you can use the following function:
public static Instances merge(Instances data1, Instances data2)
throws Exception
{
// Check where are the string attributes
int asize = data1.numAttributes();
boolean strings_pos[] = new boolean[asize];
for(int i=0; i<asize; i++)
{
Attribute att = data1.attribute(i);
strings_pos[i] = ((att.type() == Attribute.STRING) ||
(att.type() == Attribute.NOMINAL));
}
// Create a new dataset
Instances dest = new Instances(data1);
dest.setRelationName(data1.relationName() + "+" + data2.relationName());
DataSource source = new DataSource(data2);
Instances instances = source.getStructure();
Instance instance = null;
while (source.hasMoreElements(instances)) {
instance = source.nextElement(instances);
dest.add(instance);
// Copy string attributes
for(int i=0; i<asize; i++) {
if(strings_pos[i]) {
dest.instance(dest.numInstances()-1)
.setValue(i,instance.stringValue(i));
}
}
}
return dest;
}
Please note that the following conditions should hold (they are not checked in the function):
Datasets must have the same attributes structure (number of attributes, type of attributes)
Class index has to be the same
Nominal values have to exactly correspond
To modify on the fly the values of the nominal attributes of data2 to match the ones of data1, you can use:
data2.renameAttributeValue(
data2.attribute("att_name_in_data2"),
"att_value_in_data2",
"att_value_in_data1");

Why not make a new ARFF file which has the data from both of the originals? A simple
cat 1.arff > tmp.arff
tail -n+20 2.arff >> tmp.arff
where 20 is replaced by the line number of the first data row in 2.arff (the length of your arff header plus one, since tail -n+N starts printing at line N). This would then produce a new arff file with all of the desired instances, and you could read this new file with your existing code:
Instances iNew = new ConverterUtils.DataSource(name).getDataSet();
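Counting header lines by hand is easy to get wrong, so the same concatenation can be sketched in plain Java by locating the @data line instead. This is only a minimal sketch, not part of Weka: the class ArffAppend and the method append are hypothetical names, and it assumes both files share an identical header and that @data sits on a line of its own.

```java
import java.util.ArrayList;
import java.util.List;

final class ArffAppend {
    // Returns all lines of `first`, followed by the data rows of `second`
    // (everything after its @data line). Assumes both headers are identical.
    static List<String> append(List<String> first, List<String> second) {
        int dataLine = -1;
        for (int i = 0; i < second.size(); i++) {
            // ARFF keywords are case-insensitive, so accept @data and @DATA.
            if (second.get(i).trim().equalsIgnoreCase("@data")) {
                dataLine = i;
                break;
            }
        }
        if (dataLine < 0) {
            throw new IllegalArgumentException("No @data line in second file");
        }
        List<String> merged = new ArrayList<>(first);
        merged.addAll(second.subList(dataLine + 1, second.size()));
        return merged;
    }
}
```

With real files, the two lists could come from Files.readAllLines(Paths.get("1.arff")) and Files.readAllLines(Paths.get("2.arff")), and the result could be written with Files.write(Paths.get("tmp.arff"), merged).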
You could also invoke weka on the command line using this documentation: http://old.nabble.com/how-to-merge-two-data-file-a.arff-and-b.arff-into-one-data-list--td22890856.html
java weka.core.Instances append filename1 filename2 > output-file
However, there is no function in the documentation http://weka.sourceforge.net/doc.dev/weka/core/Instances.html#main%28java.lang.String which will allow you to append multiple arff files natively within your java code. As of Weka 3.7.6, the code that appends two arff files is this:
// read two files, append them and print result to stdout
else if ((args.length == 3) && (args[0].toLowerCase().equals("append"))) {
DataSource source1 = new DataSource(args[1]);
DataSource source2 = new DataSource(args[2]);
String msg = source1.getStructure().equalHeadersMsg(source2.getStructure());
if (msg != null)
throw new Exception("The two datasets have different headers:\n" + msg);
Instances structure = source1.getStructure();
System.out.println(source1.getStructure());
while (source1.hasMoreElements(structure))
System.out.println(source1.nextElement(structure));
structure = source2.getStructure();
while (source2.hasMoreElements(structure))
System.out.println(source2.nextElement(structure));
}
Thus it looks like Weka itself simply iterates through all of the instances in a data set and prints them, the same process your code uses.

Another possible solution is to use addAll from java.util.AbstractCollection, since Instances inherits it.
instances1.addAll(instances2);

I've just shared an extended weka.core.Instances class with methods like innerJoin, leftJoin, fullJoin, update and union.
table1.makeIndex(table1.attribute("Continent_ID"));
table2.makeIndex(table2.attribute("Continent_ID"));
Instances result = table1.leftJoin(table2);
The Instances can have different numbers of attributes; levels of NOMINAL and STRING variables are merged together if necessary.
Sources and some examples are here on GitHub: weka.join.

Related

How to read a specific value from a JCoTable object

I successfully got the Table entries from a SAP system via RFC_GET_TABLE_ENTRIES. It works all fine and lists me all the rows of the table.
My problem right now is that I have no idea how to get a single value out. Usually I would write something like codes[x][y], but that doesn't work because it is not a normal two-dimensional array but a JCoTable, and I have no idea how that works.
The code is a little longer but this is the call itself.
JCoDestination destination = JCoDestinationManager.getDestination("mySAPSystem");
JCoFunction function = destination.getRepository().getFunction("RFC_GET_TABLE_ENTRIES");
if (function==null)
throw new RuntimeException("Function not found in SAP.");
function.getImportParameterList().setValue( "MAX_ENTRIES", 30);
function.getImportParameterList().setValue( "TABLE_NAME", "ZTEST_TABLE ");
JCoTable codes = function.getTableParameterList().getTable("ENTRIES");
codes.appendRow();
and this is the console output
System.out.println("RFC_GET_TABLE_ENTRIES");
for (int i = 0; i < 30; i++) {
codes.setRow(i);
System.out.println(codes.getString("WA"));
}
getString actually accepts indexes as well. If you want to retrieve values according to x and y you can do the following
codes.setRow(y);
String value = codes.getString(x); // It can also be getFloat, getInt, etc. depending on the data type,
// or getValue, which gives you an Object
It works similarly to codes[x][y] as if it's an array, but this is not commonly used.
In other cases, you may want to iterate through each single value in the row with JCoRecordFieldIterator.
JCoRecordFieldIterator itr = codes.getRecordFieldIterator();
while(itr.hasNextField()){
JCoRecordField field = itr.nextRecordField();
Object value = field.getValue(); // Or getString, getFloat, etc. for a typed value
// Whatever you want to do with the value
}

How to create a new Section in the dataprovider ini4j

I am reading from ini files and passing them via data providers to test cases.
(The data provider reads these and returns an Ini.Section[][] array. If there are several sections, testng runs the test that many times.)
Let's imagine there is a section like this:
[sectionx]
key1=111
key2=222
key3=aaa,bbb,ccc
What I want, in the end, is to read this data and execute the test case three times, each time with a different value of key3, the other keys being the same.
One way would be to copy&paste the section as many times as needed... which is clearly not an ideal solution.
The way to go about it would seem to create further copies of the section, then change the key values to aaa, bbb and ccc. The data provider would return the new array and testng would do the rest.
However, I cannot seem to be able to create a new instance of the section object. Ini.Section is actually an interface; the implementing class org.ini4j.BasicProfileSection is not visible. It does not appear to be possible to create a copy of the object, or to inherit the class. I can only manipulate existing objects of this type, but not create new ones. Is there any way around it?
It seems that it is not possible to create copies of sections or the ini files. I ended up using this workaround:
First create an 'empty' ini file, that will serve as a sort of a placeholder. It will look like this:
[env]
test1=1
test2=2
test3=3
[1]
[2]
[3]
...with a sufficiently large number of sections, equal to or greater than the number of sections in the other ini files.
Second, read the data in the data provider. When there is a key that contains several values, create a new Ini object for each value. The new Ini object must be created from a new file object. (You can read the placeholder file over and over, creating any number of Ini files.)
Finally, you have to copy the content of the actual ini file into the placeholder file.
The following code works for me:
public static Ini copyIniFile(Ini originalFile){
Set<Entry<String, Section>> entries = originalFile.entrySet();
Ini emptyFile;
try {
FileInputStream file = new FileInputStream(new File(EMPTY_DATA_FILE_NAME));
emptyFile = new Ini(file);
file.close();
} catch (Exception e) {
e.printStackTrace();
return null;
}
for(Entry<String, Section> entry : entries){
String key = (String) entry.getKey();
Section section = (Section) entry.getValue();
copySection(key, section, emptyFile);
}
return emptyFile;
}
public static Ini.Section copySection(String key, Ini.Section origin, Ini destinationFile){
Ini.Section newSection = destinationFile.get(key);
if(newSection==null) throw new IllegalArgumentException();
for(Entry<String, String> entry : origin.entrySet()){
newSection.put(entry.getKey().toString(), entry.getValue().toString());
}
return newSection;
}
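The expansion step itself (one copy of the section per comma-separated value of key3) can be sketched independently of ini4j. The sketch below uses plain maps as a stand-in for Ini.Section, on the assumption that a section can be treated as a Map<String, String>, as the entrySet() loop above does. SectionExpander and expand are hypothetical names, not ini4j API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SectionExpander {
    // Produces one copy of `section` per comma-separated value of `key`.
    // All other entries are copied unchanged; the input map is not modified.
    static List<Map<String, String>> expand(Map<String, String> section, String key) {
        List<Map<String, String>> copies = new ArrayList<>();
        String raw = section.get(key);
        if (raw == null) {
            // Key absent: nothing to expand, return a single plain copy.
            copies.add(new LinkedHashMap<>(section));
            return copies;
        }
        for (String value : raw.split(",")) {
            Map<String, String> copy = new LinkedHashMap<>(section);
            copy.put(key, value.trim());
            copies.add(copy);
        }
        return copies;
    }
}
```

Each resulting map could then be written into one of the placeholder sections with a loop like the one in copySection.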

Comparing strings from a written file

I'm stuck on this program I'm making for school. Here's my code:
public static void experiencePointFileWriter() throws IOException{
File writeFileResults = new File("User Highscore.txt");
BufferedWriter bw;
bw = new BufferedWriter(new FileWriter(writeFileResults, true));
bw.append(userName + ": " + experiencePoints);
bw.newLine();
bw.flush();
bw.close();
FileReader fileReader = new FileReader(writeFileResults);
char[] a = new char[50];
fileReader.read(a); // reads the content to the array
for (char c : a)
System.out.print(c); // prints the characters one by one
fileReader.close();
}
The dilemma I'm facing is: how can I sort new scores together with the scores already in writeFileResults by the numerical value of int experiencePoints? If you're wondering about the variables: userName is assigned by a textfield.getText method, and an event happens when you press one of 36 buttons, which launches a math.Random statement with one of 24 possible outcomes. They all add different integer numbers to experiencePoints.
Well, I don't want to do your homework, and this does seem introductory so I'd like to give you some hints.
First, there's a few things missing:
We don't have some of the variables you've given us, so there is no type associated with oldScores
There is no reference to userName or experiencePoints outside this method call
If you can add this information, it would make this process easier. I could infer things, but then I might be wrong, or worse yet, have you learn nothing because I did your assignment for you. ;)
EDIT:
So, based on the extra information, your data file holds an "array" of usernames and experience values. Thus, the best way (read: best design, not shortest) would be to load these into custom objects and then write a comparator (read: implement the Comparator interface).
Thus, in pseudo-Java, you'd have:
Declare your data type:
private static class UserScore {
private final String name;
private final double experience;
// ... fill in the rest, it's just a data struct
}
In your reader, when you read the values, split each line to get the values, and create a new List<UserScore> object which contains all of the values read from the file (I'll let you figure this part out)
After you have your list, you can use Collections#sort to sort the list to be the correct order, here would be an example of this:
// assuming we have our list, userList
Collections.sort(userList, new Comparator<UserScore>() {
public int compare(UserScore left, UserScore right) {
// Double.compare avoids the truncation that casting the difference to int would cause
return Double.compare(left.getExperience(), right.getExperience());
}
});
// userList is now sorted based on the experience points
Re-write your file, as you see fit. You now have a sorted list.
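For reference, the read-split-sort steps can be put together roughly as follows. This is a sketch under assumptions rather than the intended solution: it assumes each line has the form "name: points" as written by the question's bw.append call, stores the points as an int, and sorts highest first. ScoreSorter and parseAndSort are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class ScoreSorter {
    // Simple immutable data holder for one line of the score file.
    static final class UserScore {
        final String name;
        final int experience;

        UserScore(String name, int experience) {
            this.name = name;
            this.experience = experience;
        }
    }

    // Parses lines of the form "name: points" and returns them sorted
    // by points, highest first. Lines without ": " are skipped.
    static List<UserScore> parseAndSort(List<String> lines) {
        List<UserScore> scores = new ArrayList<>();
        for (String line : lines) {
            int sep = line.lastIndexOf(": ");
            if (sep < 0) {
                continue;
            }
            String name = line.substring(0, sep);
            int points = Integer.parseInt(line.substring(sep + 2).trim());
            scores.add(new UserScore(name, points));
        }
        scores.sort(Comparator.comparingInt((UserScore s) -> s.experience).reversed());
        return scores;
    }
}
```

The lines could be read with Files.readAllLines(Paths.get("User Highscore.txt")). Dropping .reversed() gives ascending order instead.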

Adding an instance to Instances in weka

I have a few arff files. I would like to read them sequentially and create a large dataset. Instances.add(Instance inst) doesn't add string values to the instances, hence the attempt to setDataset() ... but even this fails. Is there a way to accomplish the intuitively correct thing for strings?
ArffLoader arffLoader = new ArffLoader();
arffLoader.setFile(new File(fName));
Instances newData = arffLoader.getDataSet();
for (int i = 0; i < newData.numInstances(); i++) {
Instance one = newData.instance(i);
one.setDataset(data);
data.add(one);
}
This is from the mailing list; I saved it earlier:
how to merge two data file a.arff and b.arff into one data list?
Depends what merge you are talking about. Do you just want to append
the second file (both have the same attributes) or do you want to
merge the attributes (both have the same number of instances)?
In the first case ("append"):
java weka.core.Instances append filename1 filename2 > output-file
and the latter case ("merge"):
java weka.core.Instances merge filename1 filename2 > output-file
Here's the relevant Javadoc:
http://weka.sourceforge.net/doc.dev/weka/core/Instances.html#main(java.lang.String[])
Use mergeInstances to merge two datasets.
public static Instances mergeInstances(Instances first,
Instances second)
For datasets with the same number of instances, your code would be something like this:
ArffLoader arffLoader = new ArffLoader();
arffLoader.setFile(new File(fName1));
Instances newData1 = arffLoader.getDataSet();
arffLoader.setFile(new File(fName2));
Instances newData2 = arffLoader.getDataSet();
Instances mergedData = Instances.mergeInstances( newData1 ,newData2);
For datasets with the same attributes (the "append" case), I do not see any Java method in Weka. If you read the source, Instances.main does something like this:
// Instances.java
// public static void main(String[] args) {
// read two files, append them and print result to stdout
else if ((args.length == 3) && (args[0].toLowerCase().equals("append"))) {
DataSource source1 = new DataSource(args[1]);
DataSource source2 = new DataSource(args[2]);
String msg = source1.getStructure().equalHeadersMsg(source2.getStructure());
if (msg != null)
throw new Exception("The two datasets have different headers:\n" + msg);
Instances structure = source1.getStructure();
System.out.println(source1.getStructure());
while (source1.hasMoreElements(structure))
System.out.println(source1.nextElement(structure));
structure = source2.getStructure();
while (source2.hasMoreElements(structure))
System.out.println(source2.nextElement(structure));
}

Select object dynamically

Here's the situation :
I have 3 objects all named **List and I have a method with a String parameter;
gameList = new StringBuffer();
appsList = new StringBuffer();
movieList = new StringBuffer();
public void fetchData(String category) {
URL url = null;
BufferedReader input;
gameList.delete(0, gameList.length());
Is there a way to do something like the following :
public void fetchData(String category) {
URL url = null;
BufferedReader input;
"category"List.delete(0, gameList.length());
, so I can choose which of the lists to be used based on the String parameter?
I suggest you create a HashMap<String, StringBuffer> and use that:
map = new HashMap<String, StringBuffer>();
map.put("game", new StringBuffer());
map.put("apps", new StringBuffer());
map.put("movie", new StringBuffer());
...
public void fetchData(String category) {
StringBuffer buffer = map.get(category);
if (buffer == null) {
// No such category. Throw an exception?
} else {
// Do whatever you need to
}
}
If the lists are fields of your object - yes, using reflection:
Field field = getClass().getDeclaredField(category + "List");
StringBuffer result = (StringBuffer) field.get(this);
But generally you should avoid reflection. And if your objects are fixed - i.e. they don't change, simply use an if-clause.
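A self-contained sketch of the reflection lookup, including the checked exceptions it has to handle, might look like this. The class name Catalog and the method listFor are hypothetical; the field names follow the question.

```java
import java.lang.reflect.Field;

final class Catalog {
    private final StringBuffer gameList = new StringBuffer();
    private final StringBuffer appsList = new StringBuffer();
    private final StringBuffer movieList = new StringBuffer();

    // Looks up the field named category + "List" on this object,
    // e.g. "game" resolves to gameList.
    StringBuffer listFor(String category) {
        try {
            Field field = Catalog.class.getDeclaredField(category + "List");
            return (StringBuffer) field.get(this);
        } catch (NoSuchFieldException | IllegalAccessException e) {
            throw new IllegalArgumentException("Unknown category: " + category, e);
        }
    }
}
```

Note that a misspelled category only fails at run time here, which is one reason to prefer the map or if/else variants.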
The logically simplest way taking your question as given would just be:
StringBuffer which;
if (category.equals("game"))
which=gameList;
else if (category.equals("apps"))
which=appsList;
else if (category.equals("movie"))
which=movieList;
else
... some kind of error handling ...
which.delete(0, which.length());
As Jon Skeet noted, if the list is big or dynamic you probably want to use a map rather than an if/else/if.
That said, I'd encourage you to use an integer constant or an enum rather than a String. Like:
enum ListType {GAME, APP, MOVIE};
void deleteList(ListType category)
{
if (category==ListType.GAME)
... etc ...
In this simple example, if this is all you'd ever do with it, it wouldn't matter much. But I'm working on a system now that uses String tokens for this sort of thing all over the place, and it creates a lot of problems.
Suppose you call the function and by mistake you pass in "app" instead of "apps", or "Game" instead of "game". Or maybe you're thinking you added handling for "song" yesterday but in fact you went to lunch instead. This will successfully compile, and you won't have any clue that there's a problem until run-time. If the program does not throw an error on an invalid value but instead takes some default action, you could have a bug that's difficult to track down. But with an enum, if you mis-spell the name or try to use one that isn't defined, the compiler will immediately alert you to the error.
Suppose that some functions take special action for some of these options but not others. Like you find yourself writing
if (category.equals("app"))
getSpaceRequirements();
and that sort of thing. Then someone reading the program sees a reference to "app" here, a reference to "game" 20 lines later, etc. It could be difficult to determine what all the possible values are. Any given function might not explicitly reference them all. But with an enum, they're all neatly in one place.
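The enum and the map suggestion combine naturally. As a rough sketch (CategoryLists, get and clear are hypothetical names; the enum constants follow the example above), an EnumMap keeps one buffer per category, and a misspelled constant is rejected by the compiler:

```java
import java.util.EnumMap;
import java.util.Map;

final class CategoryLists {
    enum ListType { GAME, APP, MOVIE }

    // One buffer per enum constant, created up front.
    private final Map<ListType, StringBuffer> buffers = new EnumMap<>(ListType.class);

    CategoryLists() {
        for (ListType type : ListType.values()) {
            buffers.put(type, new StringBuffer());
        }
    }

    StringBuffer get(ListType type) {
        return buffers.get(type);
    }

    // Empties the buffer that belongs to the given category.
    void clear(ListType type) {
        StringBuffer buffer = buffers.get(type);
        buffer.delete(0, buffer.length());
    }
}
```

All valid categories are visible in one place (ListType.values()), and adding a new one requires no change to get or clear.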
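You could use a switch statement (switching on a String requires Java 7 or later; each case needs a break, otherwise execution falls through to default):
StringBuffer buffer = null;
switch (category) {
case "game": buffer = gameList; break;
case "apps": buffer = appsList; break;
case "movie": buffer = movieList; break;
default: return;
}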
