I'm new to Weka. I want to use Sequential Minimal Optimization (SMO) in Weka.
Could anyone tell me how to proceed?
Here is my Java code, but it doesn't work:
import java.io.File;
import java.io.FileOutputStream;
import java.io.ObjectOutputStream;
import java.util.Random;

import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.functions.SMO;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class SVMTest {
    public void test(File input) throws Exception {
        File tmp = new File("tmp-file-duplicate-pairs.arff");
        //tmp.deleteOnExit();
        // NOTE: with removeFeatures(...) commented out, nothing writes tmp,
        // so loading it below fails unless the file already exists.
        //removeFeatures(input, tmp, useType, useNames, useActivities, useOccupation, useFriends, useMailAndSite, useLocations);
        Instances data = new DataSource(tmp.getAbsolutePath()).getDataSet();
        // The last attribute is the class
        data.setClassIndex(data.numAttributes() - 1);

        String ctype = "SMO";
        SMO c = new SMO();
        c.setOptions(new String[] {"-M"}); // -M: fit calibration (logistic) models to the SVM outputs
        c.buildClassifier(data);
        saveModel(c, ctype, input.getParentFile().getParentFile());
        //c = loadClassifier(input.getParentFile().getParentFile(), ctype);

        // 10-fold cross-validation with a fixed seed
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(c, data, 10, new Random(1));
        System.out.println(c);
        System.out.println(eval.toSummaryString());
        System.out.println(eval.toClassDetailsString());
        System.out.println(eval.toMatrixString());
        tmp.delete();
    }

    private static void saveModel(Classifier c, String name, File path) throws Exception {
        // try-with-resources closes the stream; an I/O error now propagates
        // instead of leaving oos null and causing a NullPointerException
        try (ObjectOutputStream oos = new ObjectOutputStream(
                new FileOutputStream(new File(path, name + ".model")))) {
            oos.writeObject(c);
        }
    }
}
I also want to know how to provide the .arff file, since my dataset is in the form of XML files.
I guess you have figured it out by now, but in case it helps others, there is a wiki page about it:
http://weka.wikispaces.com/Text+categorization+with+WEKA
To use SMO, let's say you have some training instances, "trainset", and a test set, "testset".
To build the classifier:
// train SMO and output model
SMO classifier = new SMO();
classifier.buildClassifier(trainset);
To evaluate it, for example with cross-validation:
Evaluation eval = new Evaluation(testset);
Random rand = new Random(1); // using seed = 1
int folds = 10;
eval.crossValidateModel(classifier, testset, folds, rand);
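Then eval holds all the statistics. Note that crossValidateModel trains a fresh copy of the classifier on each fold, so it does not evaluate the model you built on trainset; to evaluate that model on the test set instead, use eval.evaluateModel(classifier, testset).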
You can read the input file with these lines:
Instances training_data = new Instances(new BufferedReader(
new FileReader("tmp-file-duplicate-pairs.arff")));
training_data.setClassIndex(training_data.numAttributes() - 1);
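Since the question's dataset is in XML, it first has to become something like the ARFF file read above. As a minimal sketch using only the JDK, you can generate the ARFF text yourself. The XML layout here (`<instance label="...">` elements containing numeric `<f>` features) is a hypothetical assumption, as are the class name `XmlToArff` and the generated attribute names `f0`, `f1`, ...; adapt the parsing to your actual files.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

class XmlToArff {
    // Assumed (hypothetical) XML layout, one <instance> per example:
    // <dataset><instance label="dup"><f>0.5</f><f>1</f></instance>...</dataset>
    static String convert(String xml, String relation) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
        NodeList instances = doc.getElementsByTagName("instance");
        List<String> rows = new ArrayList<>();
        Set<String> labels = new LinkedHashSet<>(); // nominal class values, first-seen order
        int numFeatures = -1;
        for (int i = 0; i < instances.getLength(); i++) {
            Element inst = (Element) instances.item(i);
            NodeList feats = inst.getElementsByTagName("f");
            if (numFeatures < 0) {
                numFeatures = feats.getLength();
            } else if (feats.getLength() != numFeatures) {
                throw new IllegalArgumentException("instance " + i + " has a different feature count");
            }
            StringBuilder row = new StringBuilder();
            for (int j = 0; j < feats.getLength(); j++) {
                row.append(Double.parseDouble(feats.item(j).getTextContent().trim())).append(',');
            }
            String label = inst.getAttribute("label");
            labels.add(label);
            rows.add(row.append(label).toString());
        }
        StringBuilder arff = new StringBuilder("@RELATION " + relation + "\n\n");
        for (int j = 0; j < numFeatures; j++) {
            arff.append("@ATTRIBUTE f").append(j).append(" NUMERIC\n");
        }
        // The class attribute goes last, matching setClassIndex(numAttributes() - 1)
        arff.append("@ATTRIBUTE class {").append(String.join(",", labels)).append("}\n\n@DATA\n");
        for (String row : rows) {
            arff.append(row).append('\n');
        }
        return arff.toString();
    }

    public static void main(String[] args) {
        String xml = "<dataset><instance label=\"dup\"><f>0.5</f><f>1</f></instance>"
                + "<instance label=\"nodup\"><f>0.1</f><f>0</f></instance></dataset>";
        System.out.print(convert(xml, "pairs"));
    }
}
```

Save the returned string to a `.arff` file and load it with the `Instances` or `DataSource` code above. ARFF requires quoting nominal values that contain spaces or commas, which this sketch does not do.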
The following link explains how to use SMO in Weka:
http://preciselyconcise.com/apis_and_installations/training_a_weka_classifier_in_java.php
Tab-Separated File:
2019-06-06 10:00:00 1.0
2019-06-06 11:00:00 2.0
I'd like to iterate over the file once and add the value of each column to a list.
My working approach would be:
import java.util.*;
import java.io.*;
public class Program {
public static void main(String[] args)
{
ArrayList<Double> List_1 = new ArrayList<Double>();
ArrayList<Double> List_2 = new ArrayList<Double>();
String[] values = null;
String fileName = "File.txt";
File file = new File(fileName);
try
{
Scanner inputStream = new Scanner(file);
while (inputStream.hasNextLine()){
try {
String data = inputStream.nextLine();
values = data.split("\\t");
if (values[1] != null && !values[1].isEmpty() == true) {
double val_1 = Double.parseDouble(values[1]);
List_1.add(val_1);
}
if (values[2] != null && !values[2].isEmpty() == true) {
double val_2 = Double.parseDouble(values[2]);
List_2.add(val_2);
}
}
catch (ArrayIndexOutOfBoundsException exception){
}
}
inputStream.close();
}
catch (FileNotFoundException e) {
e.printStackTrace();
}
System.out.println(List_1);
System.out.println(List_2);
}
}
I get:
[1.0]
[2.0]
It doesn't work without the checks for null, isEmpty and the ArrayIndexOutOfBoundsException.
I would appreciate any hints on how to save a few lines while keeping the scanner approach.
One option is to create a Map of Lists using the column number as the key. This approach gives you an "unlimited" number of columns and exactly the same output as the one in the question.
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Scanner;
import java.util.TreeMap;

public class Program {
    public static void main(String[] args) throws Exception
    {
        // column index -> values of that column; TreeMap keeps the columns in order
        Map<Integer, List<Double>> listMap = new TreeMap<Integer, List<Double>>();
        String[] values = null;
        String fileName = "File.csv";
        File file = new File(fileName);
        Scanner inputStream = new Scanner(file);
        while (inputStream.hasNextLine()) {
            String data = inputStream.nextLine();
            values = data.split("\\t");
            // start at 1 to skip the first column, as the question does
            for (int column = 1; column < values.length; column++) {
                List<Double> list = listMap.get(column);
                if (list == null) {
                    listMap.put(column, list = new ArrayList<Double>());
                }
                if (!values[column].isEmpty()) {
                    list.add(Double.parseDouble(values[column]));
                }
            }
        }
        inputStream.close();
        for (List<Double> list : listMap.values()) {
            System.out.println(list);
        }
    }
}
You can clean up your code somewhat by using try-with-resources to open and close the Scanner for you:
try (Scanner inputStream = new Scanner(file))
{
//your code...
}
This is useful because inputStream will be closed automatically when the try block exits, so you do not need to close it manually with inputStream.close();.
Additionally, if you really want to "save lines", you can also combine these steps:
double val_2 = Double.parseDouble(values[2]);
List_2.add(val_2);
into a single step, since you do not use val_2 anywhere else:
List_2.add(Double.parseDouble(values[2]));
Finally, you are also using !values[1].isEmpty() == true, which compares a boolean value to true. This is typically bad practice; you can reduce it to !values[1].isEmpty(), which behaves the same. There is no need to use == with booleans. Likewise, the values[1] != null check can be dropped, because String.split never puts null elements into the array it returns.
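Putting these suggestions together with the map-of-lists idea from the previous answer, a compact version might look like the following sketch. The names `ColumnReader` and `readColumns` are illustrative, and it assumes every non-empty field after the first column is a number, as in the question's example. Taking a `Scanner` instead of a file name also makes the parsing easy to try on an in-memory string.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Scanner;
import java.util.TreeMap;

class ColumnReader {
    // Collects every column except the first (index 0 is skipped, as in the
    // question) into its own list, keyed by column index in ascending order.
    static Map<Integer, List<Double>> readColumns(Scanner in) {
        Map<Integer, List<Double>> columns = new TreeMap<>();
        while (in.hasNextLine()) {
            // limit -1 keeps trailing empty fields; short or blank lines just
            // produce fewer elements, and the loop bound never goes past them
            String[] values = in.nextLine().split("\t", -1);
            for (int column = 1; column < values.length; column++) {
                if (!values[column].isEmpty()) {
                    // computeIfAbsent replaces the get / null check / put sequence
                    columns.computeIfAbsent(column, k -> new ArrayList<>())
                           .add(Double.parseDouble(values[column]));
                }
            }
        }
        return columns;
    }

    public static void main(String[] args) throws FileNotFoundException {
        // try-with-resources closes the Scanner (and the file) automatically
        try (Scanner in = new Scanner(new File("File.txt"))) {
            for (List<Double> column : readColumns(in).values()) {
                System.out.println(column);
            }
        }
    }
}
```

Because the loop only visits indices that exist in the split array, no ArrayIndexOutOfBoundsException handler is needed. One difference from the map-of-lists answer: a column that is empty on every line never gets a list.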
You can do it like this:
List<List<String>> listOfLists = new ArrayList<>(100);
try (BufferedReader bfr = Files.newBufferedReader(Paths.get("inputFileDir.tsv"))) {
    String line = null;
    while ((line = bfr.readLine()) != null) {
        String[] cols = line.split("\\t");
        // ArrayList has no constructor taking an array, so wrap it with Arrays.asList
        List<String> outputList = new ArrayList<>(Arrays.asList(cols));
        //at this line your expected list of cols of each line is ready to use.
        listOfLists.add(outputList);
    }
}
As a matter of fact, this is simple code in Java. But because it seems that you are a beginner in Java and code like a Python programmer, I decided to write a sample to give you a good starting point. Good luck!
I have a file with some information. How can I read all of it?
Name names;
try (FileInputStream fileInputStream = new FileInputStream(file);
     ObjectInputStream objectInputStream = new ObjectInputStream(fileInputStream)) {
    names = (Name) objectInputStream.readObject();
} catch (IOException | ClassNotFoundException e) {
    e.printStackTrace();
}
You have several solutions, all depending on how the input was produced:
You can iterate until the stream is fully consumed. I think this is the worst of the solutions I provide, because you rely on hitting EOF, whereas you should know when you're done (an EOF could also mean that your file format is wrong).
Set<Name> result = new HashSet<>();
try {
for (;;) {
result.add((Name)objectInputStream.readObject());
}
} catch (EOFException e) {
// End of stream
}
return result;
When producing the input, serialize a whole collection, then read it back with a single readObject() call. Serialization can read the collection back as long as each element implements Serializable.
static void write(Path path, Set<Name> names) throws IOException {
    try (OutputStream os = new BufferedOutputStream(Files.newOutputStream(path));
         ObjectOutputStream oos = new ObjectOutputStream(os)) {
        oos.writeObject(names);
    }
}
@SuppressWarnings("unchecked") // the generic cast cannot be checked at runtime
static Set<Name> read(Path path) throws IOException, ClassNotFoundException {
    // Files.newInputStream is not buffered, so wrap it in a BufferedInputStream
    try (InputStream is = new BufferedInputStream(Files.newInputStream(path));
         ObjectInputStream ois = new ObjectInputStream(is)) {
        return (Set<Name>) ois.readObject();
    }
}
When producing the input, you can first write an int with the number of objects, then read that many back: this is useful when you don't really care about the collection type (HashSet). The resulting file will also be smaller, because it won't contain the HashSet metadata.
int result = objectInputStream.readInt();
Name[] names = new Name[result]; // do some check on result!
for (int i = 0; i < result; ++i) {
names[i] = (Name) objectInputStream.readObject();
}
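The count-prefix snippet above only shows the reading side. A minimal round-trip sketch, assuming a hypothetical serializable `Name` class with a single `value` field (the question does not show its fields), writes the count with `writeInt` before the objects and validates it on the way back. It uses in-memory byte arrays so it can run without a file; with real files you would wrap `Files.newOutputStream` and `Files.newInputStream` the same way.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;
import java.util.List;

class CountPrefixed {
    // Hypothetical stand-in for the question's Name class
    static class Name implements Serializable {
        private static final long serialVersionUID = 1L;
        final String value;
        Name(String value) { this.value = value; }
    }

    static byte[] write(List<Name> names) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
            oos.writeInt(names.size()); // how many objects follow
            for (Name n : names) {
                oos.writeObject(n);
            }
        }
        return bytes.toByteArray();
    }

    static Name[] read(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(data))) {
            int count = ois.readInt();
            if (count < 0) { // the "do some check on result" from the snippet above
                throw new IOException("corrupt object count: " + count);
            }
            Name[] names = new Name[count];
            for (int i = 0; i < count; i++) {
                names[i] = (Name) ois.readObject();
            }
            return names;
        }
    }

    public static void main(String[] args) throws Exception {
        Name[] back = read(write(Arrays.asList(new Name("Ada"), new Name("Alan"))));
        System.out.println(back.length + " " + back[0].value + " " + back[1].value);
    }
}
```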
Also, Sets are good, but since they remove duplicates using hashCode()/equals(), you may get fewer objects back if your definition of equals/hashCode changed after the fact (for example, your Name was case sensitive and now it is not, so new Name("AA").equals(new Name("aa"))).
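To illustrate the `equals`/`hashCode` pitfall, here is a sketch with a hypothetical `Name` whose equality ignores case. Two names that would be distinct under case-sensitive equality collapse into one entry. The same thing happens when an old serialized `HashSet` is read back, because `HashSet` re-inserts each element during deserialization using the current `equals`/`hashCode`.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class CaseInsensitiveName {
    // Hypothetical Name whose equals/hashCode ignore case
    static final class Name {
        final String value;
        Name(String value) { this.value = value; }

        // equals and hashCode both use the same case-folded key, so they stay consistent
        private String key() { return value.toLowerCase(Locale.ROOT); }

        @Override
        public boolean equals(Object o) {
            return o instanceof Name && ((Name) o).key().equals(key());
        }

        @Override
        public int hashCode() {
            return key().hashCode();
        }
    }

    public static void main(String[] args) {
        Set<Name> names = new HashSet<>();
        names.add(new Name("AA"));
        names.add(new Name("aa")); // treated as a duplicate of "AA"
        System.out.println(names.size());
    }
}
```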
I need to take a list of Objects and write their instance variables to a text file. It would look something like this:
Hot Dog,1.25,Grocery Store
Gas,42.15,Gas Station
etc.
I have some code that looks like this:
public void writeListToFile(String fileName, ArrayList<BudgetItem> writeList) throws Exception {
PrintWriter out = null;
for(int i = 0; i<writeList.size(); i++) {
if(writeList.get(i) instanceof Expense) {
Expense writeExpense = (Expense) writeList.get(i);
try {
out = new PrintWriter(new FileWriter(fileName));
dump(out, writeExpense);
}
finally {
}
}
else if(writeList.get(i) instanceof Income) {
Income writeIncome = (Income) writeList.get(i);
try {
out = new PrintWriter(new FileWriter(fileName));
dump(out, writeIncome);
}
finally {
}
}
}
out.close();
}
public void dump(PrintWriter out, Expense writeExpense) {
out.print(writeExpense.getDateOfTransaction().get(GregorianCalendar.YEAR));
out.print(",");
out.print(writeExpense.getDateOfTransaction().get(GregorianCalendar.MONTH));
out.print(",");
out.print(writeExpense.getDateOfTransaction().get(GregorianCalendar.DATE));
out.print(",");
out.print(writeExpense.getItemName());
out.print(",");
out.print(writeExpense.getMethodOfPay());
out.print(",");
out.print(writeExpense.getPlaceOfPurchase());
out.print(",");
out.print(writeExpense.getQuantity());
out.print(",");
out.print(writeExpense.getPrice());
out.print("\n");
}
and another method, similar to the second one, for Income.
When I run it, it only writes out one line, the first object in the list, and nothing else. I can't figure out what's going on. I know object serialization is a faster option, but for this project, since I am still learning, I want to use this way.
Main method as requested by one of the answers:
public static void main(String[] args) throws Exception {
String itemName = "Hot Dog";
int quantity = 1;
String placeOfPurchase = "Weiner Stand";
String methodOfPay = "Credit";
BigDecimal price = new BigDecimal(1.25);
GregorianCalendar g = new GregorianCalendar(2013,11,1);
Expense e = new Expense(g, price, itemName, quantity, placeOfPurchase, methodOfPay);
BudgetItem bi = (BudgetItem) e;
String itemName2 = "Gun";
int quantity2 = 1;
String placeOfPurchase2 = "Weiner Stand";
String methodOfPay2 = "Credit";
BigDecimal price2 = new BigDecimal(1.25);
GregorianCalendar g2 = new GregorianCalendar(2013,11,1);
Expense e2 = new Expense(g, price, itemName, quantity, placeOfPurchase, methodOfPay);
BudgetItem bi2 = (BudgetItem) e2;
ArrayList<BudgetItem> abi = new ArrayList<BudgetItem>();
abi.add(bi);
abi.add(bi2);
RegisterFileIO rfio = new RegisterFileIO();
rfio.writeListToFile(System.getProperty("user.dir") + "/data.out", abi);
BufferedReader in = new BufferedReader(new FileReader(System.getProperty("user.dir") + "/data.out"));
Scanner lineScanner = new Scanner(in);
lineScanner.useDelimiter(",");
while(lineScanner.hasNext()) {
System.out.println(lineScanner.next());
}
}
I believe the problem is that you create a new PrintWriter on each iteration. You should create it once, outside the loop. What happens is that each new FileWriter truncates the file, overwriting the data that was stored in it before.
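PrintWriter out = null;
try {
    // Create the writer once, so the file is truncated only once
    out = new PrintWriter(new FileWriter(fileName));
    for (int i = 0; i < writeList.size(); i++) {
        if (writeList.get(i) instanceof Expense) {
            Expense writeExpense = (Expense) writeList.get(i);
            dump(out, writeExpense);
        } else if (writeList.get(i) instanceof Income) {
            Income writeIncome = (Income) writeList.get(i);
            dump(out, writeIncome);
        }
    }
} finally {
    if (out != null) {
        out.close(); // also flushes the buffered output
    }
}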
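A small experiment makes the overwriting visible. This sketch (the names `OverwriteDemo` and `writeEachSeparately` are illustrative) opens a new `PrintWriter` for each line, as the question's loop does, but closes each one. With `FileWriter`'s default of not appending, only the last line survives; passing `append = true` keeps all of them. Opening the writer once, as in the corrected loop, is still the cleaner fix.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

class OverwriteDemo {
    // Opens a fresh writer per line, like the question's loop.
    // append=false truncates the file on every open; append=true keeps earlier lines.
    static List<String> writeEachSeparately(Path file, List<String> lines, boolean append)
            throws IOException {
        Files.deleteIfExists(file);
        for (String line : lines) {
            try (PrintWriter out = new PrintWriter(new FileWriter(file.toFile(), append))) {
                out.println(line);
            }
        }
        return Files.readAllLines(file);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("budget", ".csv");
        List<String> items = Arrays.asList("Hot Dog,1.25,Grocery Store", "Gas,42.15,Gas Station");
        System.out.println(writeEachSeparately(file, items, false)); // only the last line
        System.out.println(writeEachSeparately(file, items, true));  // both lines
        Files.delete(file);
    }
}
```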
This is because you're instantiating a new PrintWriter object (and a new FileWriter object) for each object in your list.
You should instantiate it only once, before the for loop. Replace
PrintWriter out = null;
with
PrintWriter out = new PrintWriter(new FileWriter(fileName));
Just a side note: with your current code, you might end up with a NullPointerException at line out.close(); if your ArrayList is empty.
First: you are writing Java, not C++. Use Java structures and techniques.
As mentioned by MadConan, your implementation is overkill. Use toString() (or toBlammy() - blammy being something other than string) on each object type (Expense and Income) to format the output.
Hint: anytime you have a bunch of if (instanceof blammy) you should consider polymorphism instead.
Your code should look something like this:
public void writeListToFile(
    final String fileName,
    final List<BudgetItem> listBudgetItem)
    throws IOException
{
    // try-with-resources closes the PrintWriter even if an exception is thrown
    try (PrintWriter out = new PrintWriter(new FileWriter(fileName)))
    {
        for (BudgetItem current : listBudgetItem)
        {
            out.println(current.toBlammy());
        }
    }
}
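Here is one way the polymorphic version could look, as a sketch only: `BudgetItem`, `Expense` and `Income` below are simplified, hypothetical stand-ins for the question's classes (just a few fields), and `toCsvLine()` plays the role of the answer's `toBlammy()`. Writing to a `java.io.Writer` rather than a file name lets the same method target a `FileWriter` or, for testing, a `StringWriter`.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.io.Writer;
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.List;

class BudgetCsv {
    // Hypothetical, simplified versions of the question's classes
    abstract static class BudgetItem {
        final String itemName;
        final BigDecimal amount;
        BudgetItem(String itemName, BigDecimal amount) {
            this.itemName = itemName;
            this.amount = amount;
        }
        // Each subclass formats itself, so the writer needs no instanceof checks
        abstract String toCsvLine();
    }

    static class Expense extends BudgetItem {
        final String placeOfPurchase;
        Expense(String itemName, BigDecimal amount, String placeOfPurchase) {
            super(itemName, amount);
            this.placeOfPurchase = placeOfPurchase;
        }
        @Override
        String toCsvLine() {
            return String.join(",", itemName, amount.toPlainString(), placeOfPurchase);
        }
    }

    static class Income extends BudgetItem {
        Income(String itemName, BigDecimal amount) { super(itemName, amount); }
        @Override
        String toCsvLine() {
            return String.join(",", "INCOME", itemName, amount.toPlainString());
        }
    }

    // One writer for the whole list; try-with-resources closes (and flushes) it
    static void writeList(Writer target, List<? extends BudgetItem> items) {
        try (PrintWriter out = new PrintWriter(target)) {
            for (BudgetItem item : items) {
                out.println(item.toCsvLine());
            }
        }
    }

    public static void main(String[] args) {
        StringWriter sw = new StringWriter();
        writeList(sw, Arrays.asList(
                new Expense("Hot Dog", new BigDecimal("1.25"), "Grocery Store"),
                new Expense("Gas", new BigDecimal("42.15"), "Gas Station")));
        System.out.print(sw);
    }
}
```

To write to a file, pass `new FileWriter(fileName)` as the target. `PrintWriter` never throws `IOException`; if you need to detect write failures, check `out.checkError()` before the writer is closed.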
I am trying to add serialization and deserialization to my app. I have already added serialization, which writes the data to a text file. This problem involves ArrayLists. I was browsing this page: http://www.vogella.com/articles/JavaSerialization/article.html when I saw this code:
Person p = null;
FileInputStream fis = null;
ObjectInputStream in = null;
try {
fis = new FileInputStream(filename);
in = new ObjectInputStream(fis);
p = (Person) in.readObject();
in.close();
} catch (Exception ex) {
ex.printStackTrace();
}
System.out.println(p);
I was confused on this line:
p = (Person) in.readObject();
How do I make this line an ArrayList when creating an ArrayList is not as simple as that:
List<String> List = new ArrayList<String>();
Thanks for the help in advance!
I took the code directly from the website you linked and modified it for an ArrayList. You say "How do I make this line an ArrayList when creating an ArrayList is not as simple as that", but creating an ArrayList is as simple as that.
public static void main(String[] args) {
String filename = "c:\\time.ser";
ArrayList<String> p = new ArrayList<String>();
p.add("String1");
p.add("String2");
// Save the object to file
FileOutputStream fos = null;
ObjectOutputStream out = null;
try {
fos = new FileOutputStream(filename);
out = new ObjectOutputStream(fos);
out.writeObject(p);
out.close();
} catch (Exception ex) {
ex.printStackTrace();
}
// Read the object from file
FileInputStream fis = null;
ObjectInputStream in = null;
try {
fis = new FileInputStream(filename);
in = new ObjectInputStream(fis);
p = (ArrayList<String>) in.readObject();
in.close();
} catch (Exception ex) {
ex.printStackTrace();
}
System.out.println(p);
}
prints out [String1, String2]
Have you written the whole ArrayList to the file as a single object?
Or have you written the Person objects from an ArrayList to the file one by one in a loop?
I have a text file with a sequence of 4194304 letters ranging from A-D all on one line (4 MB).
How would I pick a random position and replace the 100 characters that follow it with the contents of another file that is 100 characters long, then write the result out to a file?
I'm actually currently able to do this, but I feel it's really inefficient when I iterate it several times.
Here's how I'm currently achieving this:
Random rnum = new Random();
FileInputStream fin = null;
FileOutputStream fout = null;
int cnt = 10000;
FileInputStream fin1 = null;
File file1 = new File("fileWithSet100C.txt");
int randChar = 0;
while(cnt > 0){
try {
int c = 4194304 - 100;
randChar = rnum.nextInt(c);
File file = new File("file.txt");
//seems inefficient to initiate these guys over and over
fin = new FileInputStream(file);
fin1 = new FileInputStream(file1);
//would like to remove this and have it just replace the original
fout = new FileOutputStream("newfile.txt");
int byte_read;
int byte_read2;
byte[] buffer = new byte[randChar];
byte[] buffer2 = new byte[(int)file1.length()]; // the 100-character replacement
byte_read = fin.read(buffer);
byte_read2 = fin1.read(buffer2);
fout.write(buffer, 0, byte_read);
fout.write(buffer2, 0, byte_read2);
byte_read = fin.read(buffer2);
buffer = new byte[4096]; // 4 KB
while((byte_read = (fin.read(buffer))) != -1){
fout.write(buffer, 0, byte_read);
}
cnt--;
}
catch (...) {
...
}
finally {
...
}
try{
File file = new File("newfile.txt");
fin = new FileInputStream(file);
fout = new FileOutputStream("file.txt");
int byte_read;
byte[] buffer = new byte[4096]; // 4 KB
while((byte_read = (fin.read(buffer))) != -1){
fout.write(buffer, 0, byte_read);
}
}
catch (...) {
...
}
finally {
...
}
Thanks for reading!
EDIT:
For those curious, here's the code I used to solve the aforementioned problem:
String stringToInsert = "insertSTringHERE";
byte[] answerByteArray = stringToInsert.getBytes(StandardCharsets.US_ASCII);
ByteBuffer byteBuffer = ByteBuffer.wrap(answerByteArray);
Random rnum = new Random();
int randChar = rnum.nextInt(4194002); //4MB worth of bytes
File fi = new File("file.txt");
// "rw" opens the file (creating it if needed) for reading and writing without truncating it
try (RandomAccessFile raf = new RandomAccessFile(fi, "rw");
     FileChannel fo = raf.getChannel()) {
    // Move to the random position and write out the contents
    // of the byteBuffer, overwriting the bytes already there.
    fo.position(randChar);
    while (byteBuffer.hasRemaining()) {
        fo.write(byteBuffer);
    }
} catch (IOException e) {
    // TODO error handling and logging
}
You probably want to use Java's random-access file features. Sun/Oracle has a Random Access Files tutorial that will probably be useful to you.
If you can't use Java 7, then look at RandomAccessFile which also has seek functionality and has existed since Java 1.0.
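With Java 7's `java.nio.file` API, the overwrite can be done in place through a `SeekableByteChannel`, without copying the rest of the 4 MB file. This is a sketch; the `overwrite` helper and its bounds check are my own choices, not part of the tutorial. Since the question's file holds single-byte ASCII letters, byte positions equal character positions.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class OverwriteInPlace {
    // Overwrites replacement.length bytes of file starting at position.
    // WRITE without TRUNCATE_EXISTING leaves the rest of the file untouched.
    static void overwrite(Path file, long position, byte[] replacement) throws IOException {
        try (SeekableByteChannel ch = Files.newByteChannel(file, StandardOpenOption.WRITE)) {
            if (position < 0 || position + replacement.length > ch.size()) {
                throw new IllegalArgumentException("replacement does not fit at " + position);
            }
            ch.position(position);
            ByteBuffer buf = ByteBuffer.wrap(replacement);
            while (buf.hasRemaining()) {
                ch.write(buf); // a single write call may not drain the buffer
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("letters", ".txt");
        Files.write(file, "AAAAAAAAAA".getBytes(StandardCharsets.US_ASCII));
        overwrite(file, 3, "BCD".getBytes(StandardCharsets.US_ASCII));
        System.out.println(new String(Files.readAllBytes(file), StandardCharsets.US_ASCII));
        Files.delete(file);
    }
}
```

For the question's case you could call `overwrite(path, random.nextInt(size - 100 + 1), Files.readAllBytes(patchPath))`, where `size` is the file length, so the 100 replacement bytes always fit.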
First off, you could keep your Files as fields (instance variables). This would allow you to use the file whenever you need it without reading it again. Also note that if you keep creating new files, you will lose the data that you have already acquired.
For example:
public class Foo {
    // Fields //
    File file;

    public Foo(String location) {
        // Do Something
        file = new File(location);
    }

    public void add() {
        // Add
    }
}
Answering your question, I would first read both files and then make all the changes you want in memory. After you have made all the changes, I would then write the changes to the file.
However, if the files are very large, I would make the changes one by one on disk. It will be slower, but you will not run out of memory this way. For what you are doing, I doubt a buffer would do much to counter the slowdown.
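The read-everything, modify-in-memory, write-once approach described above can be sketched with `java.nio.file.Files`. A 4 MB file fits comfortably in memory, so both files are read once, all 10000 replacements are applied to the byte array, and the result is written back a single time. `replaceAtRandom` is an illustrative helper and the file names are the ones from the question. Like the question's code, it overwrites rather than inserts, so the file length stays the same.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Random;

class ReplaceInMemory {
    // Overwrites patch.length bytes of data starting at a random offset
    // and returns the offset used.
    static int replaceAtRandom(byte[] data, byte[] patch, Random random) {
        if (patch.length > data.length) {
            throw new IllegalArgumentException("patch longer than data");
        }
        // Valid start positions are 0 .. data.length - patch.length inclusive
        int offset = random.nextInt(data.length - patch.length + 1);
        System.arraycopy(patch, 0, data, offset, patch.length);
        return offset;
    }

    public static void main(String[] args) throws IOException {
        Path target = Paths.get("file.txt");
        byte[] data = Files.readAllBytes(target);                         // read once
        byte[] patch = Files.readAllBytes(Paths.get("fileWithSet100C.txt"));
        Random random = new Random();
        for (int i = 0; i < 10000; i++) { // the question repeats the replacement 10000 times
            replaceAtRandom(data, patch, random);
        }
        Files.write(target, data); // write back once at the end
    }
}
```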
My overall suggestion would be to use arrays. For example I would do the following...
// Inserts newChars into str at index and returns the longer array
public char[] addCharsToString(String str, char[] newChars, int index) {
    char[] string = str.toCharArray();
    char[] tmp = new char[string.length + newChars.length];
    // chars before index
    System.arraycopy(string, 0, tmp, 0, index);
    // the new chars, copied from the start of newChars
    System.arraycopy(newChars, 0, tmp, index, newChars.length);
    // the rest of the original string, shifted right by newChars.length
    System.arraycopy(string, index, tmp, index + newChars.length, string.length - index);
    return tmp;
}
Hope this helps!