csv to arff conversion

csv to arff conversion - java

I am a beginner in Java. I want to convert an existing .csv file into an .arff file. I have written the code below, but instead of converting the file it produces errors. Can anybody help me resolve these errors and suggest how to fix the program?
program :
import weka.core.Instances;
import weka.core.converters.ArffSaver;
import weka.core.converters.CSVLoader;
import java.io.File;
import java.io.IOException;
public class meena {
/**
* takes 2 arguments:
* - CSV input file
* - ARFF output file
*/
public static void main(String[] args) throws IOException {
String args0="C:\\Documents and Settings\\CORI\\My Documents\\NetBeansProjects\\trainingset\\build\\classes\\svmlearn\\in.csv ";
String args1="C:\\Documents and Settings\\CORI\\My Documents\\NetBeansProjects\\trainingset\\build\\classes\\svmlearn\\output1.txt";
// load CSV
CSVLoader loader = new CSVLoader();
loader.setSource(new File(args0));
Instances data = loader.getDataSet();
// save ARFF
ArffSaver saver = new ArffSaver();
saver.setInstances(data);
saver.setFile(new File(args[1]));
saver.setDestination(new File(args[1]));
saver.writeBatch();
}
}
I am getting the below error:
---Registering Weka Editors---
Trying to add database driver (JDBC): RmiJdbc.RJDriver - Error, not in CLASSPATH?
Trying to add database driver (JDBC): jdbc.idbDriver - Error, not in CLASSPATH?
Trying to add database driver (JDBC): com.mckoi.JDBCDriver - Error, not in CLASSPATH?
Trying to add database driver (JDBC): org.hsqldb.jdbcDriver - Error, not in CLASSPATH?
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
at meena.main(meena.java:26)
Java Result: 1
Please help me convert the .csv file into an .arff file by explaining clearly how and where to pass the input.

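There are two points here.
On line 26 you use args[1] when I believe you mean to use args1. The program is started with fewer than two command-line arguments, so args[1] does not exist and the access throws the ArrayIndexOutOfBoundsException. The same applies to the setDestination call on the following line.
The "Trying to add database driver (JDBC)" messages do not prevent your code from running successfully.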
Weka's official reasoning
Stack Overflow answer
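A minimal sketch of the first point, assuming you want the program to accept both paths on the command line and fall back to fixed names when they are missing. The class PathArgs, the method pick, and the default file names are hypothetical and are not part of Weka; the Weka calls appear only in comments and are the ones from the question.

```java
import java.io.File;

// Hypothetical helper, not part of Weka: picks a path from the command-line
// arguments when one is present and uses a fallback value otherwise.
final class PathArgs {

    // Returns args[index] if that element exists, else the fallback.
    // The result is trimmed because the input path in the question ends in a space.
    static String pick(String[] args, int index, String fallback) {
        String value = (args != null && index < args.length) ? args[index] : fallback;
        return value.trim();
    }

    public static void main(String[] args) {
        // Assumed default names, for illustration only.
        File csv = new File(pick(args, 0, "in.csv"));
        File arff = new File(pick(args, 1, "out.arff"));
        System.out.println("input:  " + csv.getPath());
        System.out.println("output: " + arff.getPath());
        // With these two File objects, the calls from the question become:
        //   loader.setSource(csv);
        //   saver.setFile(arff);
    }
}
```

Because pick never indexes past the end of args, running the program without arguments no longer throws; it simply uses the fallback names.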

Related

Poi 5.0 Jar movement Issue in XWPFDocument

I am currently using the Apache POI 3.14 jar to create Word documents. I now want to upgrade POI to the latest stable version, 5.0, but after upgrading I cannot even load a document stream into XWPFDocument. I have attached sample code that reads a simple docx file and writes it out again; the error occurs as soon as the file bytes are loaded into XWPFDocument. I am baffled, and any detailed help would be really appreciated.
package basePackage;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
public class PoiJars {
public static void main(String[] args) throws Exception {
String docxFilePath = "SimpleWordFile.docx";
InputStream stream = new FileInputStream(docxFilePath);
XWPFDocument document = new XWPFDocument(stream);
FileOutputStream outFile = new FileOutputStream("output.docx");
document.write(outFile);
}
}
Exception that occurs is:
Exception in thread "main" java.lang.NoClassDefFoundError: org/openxmlformats/schemas/drawingml/x2006/chart/ChartSpaceDocument$Factory
at org.apache.poi.xddf.usermodel.chart.XDDFChart.<init>(XDDFChart.java:155)
at org.apache.poi.xwpf.usermodel.XWPFChart.<init>(XWPFChart.java:75)
at org.apache.poi.ooxml.POIXMLFactory.createDocumentPart(POIXMLFactory.java:61)
at org.apache.poi.ooxml.POIXMLDocumentPart.read(POIXMLDocumentPart.java:660)
at org.apache.poi.ooxml.POIXMLDocument.load(POIXMLDocument.java:165)
at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:126)
at basePackage.PoiJars.main(PoiJars.java:18)
Caused by: java.lang.ClassNotFoundException: org.openxmlformats.schemas.drawingml.x2006.chart.ChartSpaceDocument$Factory
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 7 more
Jars I am using in Class path:
poi-5.0.0.jar, poi-ooxml-5.0.0.jar, xmlbeans-4.0.0.jar along with other commons & codec jar dependencies.
My questions are:
1) Why am I not even able to load a basic docx file into XWPFDocument?
2) If I use poi-ooxml-full-5.0.0.jar instead of poi-ooxml-5.0.0.jar, the XWPFDocument class is not present in it. What is the reason?
3) Can someone please share some links that give a complete understanding of the POI architecture and code flow, so that I can modify classes in the jar according to my needs?

Avro Text file generated by Flume Twitter Agent not being read in Java

I am not able to read and parse the file created by streaming Twitter data with the Flume Twitter agent, neither with Java nor with Avro Tools. My requirement is to convert the Avro format into JSON format.
With either method, I get the exception: org.apache.avro.AvroRuntimeException: java.io.IOException: Block size invalid or too large for this implementation: -40
I am using the vanilla Hadoop configuration in a pseudo-distributed single-node cluster; the Hadoop version is 2.7.1.
The Flume version is 1.6.0.
The Flume config file for the Twitter agent and the Java code to parse the Avro file are attached below:
TwitterAgent.sources=Twitter
TwitterAgent.channels=MemChannel
TwitterAgent.sinks=HDFS
TwitterAgent.sources.Twitter.type=org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.channels=MemChannel
TwitterAgent.sources.Twitter.consumerKey=xxxxxxxxxxxxxx
TwitterAgent.sources.Twitter.consumerSecret=xxxxxxxxxxxxxxxx
TwitterAgent.sources.Twitter.accessToken=xxxxxxxxxxxxxxxx
TwitterAgent.sources.Twitter.accessTokenSecret=xxxxxxxxxxxxxx
TwitterAgent.sources.Twitter.keywords=Modi,PMO,Narendra Modi,BJP
TwitterAgent.sinks.HDFS.channel=MemChannel
TwitterAgent.sinks.HDFS.type=hdfs
TwitterAgent.sinks.HDFS.hdfs.path=hdfs://localhost:9000/user/ashish/Twitter_Data
TwitterAgent.sinks.HDFS.hdfs.fileType=DataStream
TwitterAgent.sinks.HDFS.hdfs.writeformat=Text
TwitterAgent.sinks.HDFS.hdfs.batchSize=100
TwitterAgent.sinks.HDFS.hdfs.rollSize=0
TwitterAgent.sinks.HDFS.hdfs.rollCount=10
TwitterAgent.sinks.HDFS.hdfs.rollInterval=30
TwitterAgent.channels.MemChannel.type=memory
TwitterAgent.channels.MemChannel.capacity=10000
TwitterAgent.channels.MemChannel.transactionCapacity=100
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.FileReader;
import org.apache.avro.file.SeekableInput;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;
import org.apache.avro.mapred.FsInput;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import java.io.IOException;
public class AvroReader {
public static void main(String[] args) throws IOException {
Path path = new Path("hdfs://localhost:9000/user/ashish/Twitter_Data/FlumeData.1449656815028");
Configuration config = new Configuration();
SeekableInput input = new FsInput(path, config);
DatumReader<GenericRecord> reader = new GenericDatumReader<>();
FileReader<GenericRecord> fileReader = DataFileReader.openReader(input, reader);
for (GenericRecord datum : fileReader) {
System.out.println("value = " + datum);
}
fileReader.close();
}
}
Exception stack trace which I got is :
2015-12-09 17:48:19,291 WARN [main] util.NativeCodeLoader (NativeCodeLoader.java:<clinit>(62)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
value = {"id": "674535686809120768", "user_friends_count": 1260, "user_location": "ユウサリ", "user_description": "「テガミバチ」に登場するザジのbotです。追加してほしい言葉などの希望があればＤＭでお願いします。リムーブする際はブロックでお願いします。", "user_statuses_count": 47762, "user_followers_count": 1153, "user_name": "ザジ", "user_screen_name": "zazie_bot", "created_at": "2015-12-09T15:56:54Z", "text": "#ill_akane_bot お前、なんか、\u2026すっげー楽しそうだな\u2026", "retweet_count": 0, "retweeted": false, "in_reply_to_user_id": 204695477, "source": "<a href=\"http:\/\/twittbot.net\/\" rel=\"nofollow\">twittbot.net<\/a>", "in_reply_to_status_id": 674535430423887872, "media_url_https": null, "expanded_url": null}
Exception in thread "main" org.apache.avro.AvroRuntimeException: java.io.IOException: Block size invalid or too large for this implementation: -40
at org.apache.avro.file.DataFileStream.hasNextBlock(DataFileStream.java:275)
at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:197)
at avro.AvroReader.main(AvroReader.java:24)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
Caused by: java.io.IOException: Block size invalid or too large for this implementation: -40
at org.apache.avro.file.DataFileStream.hasNextBlock(DataFileStream.java:266)
... 7 more
Also, do I need to provide the Avro schema for the Avro file to be read correctly? If so, where?

I also ran into this problem. I cannot see your data file, which no longer exists, but I checked my own data file, which should be the same as yours.
I found that my data file was already an Avro container file, meaning it contains both its schema and its data.
The Avro file I got was malformed: it should include just one header containing the Avro schema, but it actually contains multiple headers.
Another thing: tweets are already in JSON format, so why does Flume convert them to Avro format?
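One way to check for the multiple-header symptom described above is to count how often the Avro container magic bytes occur in the file. The following is a minimal sketch, assuming the file has first been copied to the local file system (for example with hdfs dfs -get); the class name AvroHeaderCount is hypothetical. An Avro object container file starts with the four bytes 'O', 'b', 'j', 1. The same byte sequence can also occur by chance inside the data, so a count above one is a hint rather than proof.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical diagnostic, not part of Avro: counts occurrences of the
// Avro object container magic bytes in a byte array.
final class AvroHeaderCount {

    // An Avro object container file begins with 'O', 'b', 'j', 1.
    private static final byte[] MAGIC = {'O', 'b', 'j', 1};

    static int countMagic(byte[] data) {
        int count = 0;
        for (int i = 0; i + MAGIC.length <= data.length; i++) {
            boolean match = true;
            for (int j = 0; j < MAGIC.length; j++) {
                if (data[i + j] != MAGIC[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java AvroHeaderCount <local-copy-of-flume-file>");
            return;
        }
        // Reads the whole file into memory, which is fine for small Flume roll files.
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("magic sequences found: " + countMagic(data));
    }
}
```

A well-formed container file is expected to report one occurrence at offset 0. Several occurrences would be consistent with several containers having been concatenated into one file.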

Convert OWL/XML in RDF/XML with a simple command line in shell

I'm asking your help to create a converter to transform OWL/XML into RDF/XML. My purpose is to use OWLapi 2 through a simple shell command with bash.
My files are in OWL/XML but I have to transform them into RDF/XML to send them in my fuseki database. I could transform each file thanks to Protégé or a converter available online, but I've more than one thousand files to convert.
See my current java file (but I don't know how to use it) :
package owl2rdf;
import java.io.File;
import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.io.RDFXMLOntologyFormat;
import org.semanticweb.owlapi.model.IRI;
import org.semanticweb.owlapi.model.OWLOntology;
import org.semanticweb.owlapi.model.OWLOntologyManager;
@SuppressWarnings("deprecation")
public class owl2rdf {
public static void main(String[] args) throws Exception {
// Get hold of an ontology manager
OWLOntologyManager manager = OWLManager.createOWLOntologyManager();
// Load the ontology from a local files
File file = new File(args[0]);
System.out.println("Loaded ontology: " + file);
OWLOntology ontology = manager.loadOntologyFromOntologyDocument(file);
// Get the ontology format ; in our case it's normally OWL/XML
OWLOntologyFormat format = manager.OWLOntologyFormat(file);
System.out.println(" format: " + format);
// save the file into RDF/XML format
RDFXMLOntologyFormat rdfxmlFormat = new RDFXMLOntologyFormat();
manager.saveOntology(ontology, rdfxmlFormat, IRI.create(file));
}
}
When I execute this code, I get many errors related to exceptions that I don't understand at all, although this appears to be a common error:
Exception in thread "main" java.lang.reflect.InvocationTargetException
Caused by: java.lang.NoClassDefFoundError: com/google/inject/Provider
Caused by: java.lang.ClassNotFoundException: com.google.inject.Provider

To convert an entire directory of OWL/XML files into RDF/XML files:
1- Create your class owl2rdf.java in the package owl2rdf
package owl2rdf;
//import all necessary classes
import java.io.File;
import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.io.RDFXMLOntologyFormat;
import org.semanticweb.owlapi.model.IRI;
import org.semanticweb.owlapi.model.OWLOntology;
import org.semanticweb.owlapi.model.OWLOntologyManager;
@SuppressWarnings("deprecation")
public class owl2rdf {
// main() takes the path of the file to convert as its only argument
public static void main(String[] args) throws Exception {
// Get hold of an ontology manager
OWLOntologyManager manager = OWLManager.createOWLOntologyManager();
// Load the ontology from a local files
File file = new File(args[0]);
System.out.println("Loaded ontology: " + file);
OWLOntology ontology = manager.loadOntologyFromOntologyDocument(file);
// save the file into RDF/XML format
//in this case, my ontology file and format already manage prefix
RDFXMLOntologyFormat rdfxmlFormat = new RDFXMLOntologyFormat();
manager.saveOntology(ontology, rdfxmlFormat, IRI.create(file));
}
}
2- Use a Java IDE such as Eclipse to manage all dependencies (Maven repository, downloaded jars, classpath, etc.) and to build owl2rdf.jar
3- Create your bash script my-script.sh; the version here is absolutely not optimized
#!/bin/bash
cd your-dir
for i in *
do
# get the absolute path; be careful, realpath is only included in recent coreutils packages
r=$(realpath "$i")
# to avoid problems with relative paths when calling java -jar, the .jar is kept in the parent directory
cd ..
java -jar owl2rdf.jar "$r"
cd your-dir
done
echo "Conversion finished, see below if there are errors."
4- Execute your script
$ chmod +x my-script.sh; ./my-script.sh
5- Aha moment: all your OWL/XML files are converted to RDF/XML. You can, for example, import them into a Fuseki or Sesame database.

NoClassDefFoundError after adding Opencsv jar to project

I've added the opencsv jar to my project so that data can be written out to a file in CSV format.
The jar file was added using the following steps:
1. Properties --> Add external jars --> opencsv-3.1.jar
2. Order & Export tab --> tick opencsv-3.1.jar
But when I run the project I get an error stating that one of the classes belonging to the opencsv jar cannot be found: java.lang.NoClassDefFoundError: com.opencsv.CSVWriter
Does anyone know how to resolve this error, or am I missing a step in adding the jar to the project?

See javadoc of API
CSVWriter is in au.com.bytecode.opencsv package
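To see which of the two package names mentioned in this thread is actually available at runtime, you can probe the classpath directly. This is a minimal sketch; the class name ClasspathProbe is hypothetical, and the two candidate names are simply the one from the question's error message and the one from this answer. Which of them a given opencsv jar contains depends on the jar, so run the probe with the same classpath as the failing project.

```java
// Hypothetical diagnostic: reports whether a class can be found on the
// classpath of the running program.
final class ClasspathProbe {

    // Returns true if the class loader can locate the named class.
    // The class is not initialized, only looked up.
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Candidate names taken from the question and from the answer above.
        String[] candidates = {
            "com.opencsv.CSVWriter",
            "au.com.bytecode.opencsv.CSVWriter"
        };
        for (String name : candidates) {
            System.out.println(name + " -> " + (isPresent(name) ? "found" : "missing"));
        }
    }
}
```

If both names are reported as missing, the jar is not on the runtime classpath at all, which points to the project setup rather than to the import statement.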

As cross-listed from here, this was my solution to the problem:
I have been struggling with getting OpenCSV set up with Maven and eclipse for a while due to exactly the same error. Ultimately I abandoned OpenCSV and used CSVParser instead, which is available from the Apache Commons and works much more readily.
Update your POM with the dependency listed here, and the following will work out-of-the-box:
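import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.List;
public class importFile {
public static void main(String[] args) throws IOException {
// Assumption: the CSV path is passed as the first argument (csvFileInput was left undefined in the original)
String csvFileInput = args[0];
// try-with-resources closes the parser and the underlying reader
try (Reader in = new FileReader(csvFileInput);
CSVParser parser = new CSVParser(in, CSVFormat.DEFAULT)) {
List<CSVRecord> list = parser.getRecords();
// print every field of every record, one per line
for (CSVRecord row : list)
for (String entry : row)
System.out.println(entry);
}
}
}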

How to avoid java.lang.NoClassDefFoundError

I have code that uses Apache POI to add text to an existing .doc file and save the result under another name.
The following is the code I have tried so far:
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import org.apache.poi.xwpf.model.XWPFHeaderFooterPolicy;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFFooter;
import org.apache.poi.xwpf.usermodel.XWPFTable;
public class FooterTableWriting {
public static void main(String args[])
{
String path="D:\\vignesh\\AgileDocTemplate.doc";
String attch="D:\\Attach.doc";
String comment="good";
String stat="ready";
String coaddr="xyz";
String cmail="abc@gmail.com";
String sub="comp";
String title="Globematics";
String cat="General";
setFooter(path, attch, comment, stat, coaddr, cmail, sub, title, cat);
}
private static void setFooter(String docTemplatePath,String attachmentPath,String comments,String status,String coAddress,String coEmail,String subject,String title,String catagory)
{
try{
InputStream input = new FileInputStream(new File(docTemplatePath));
XWPFDocument document=new XWPFDocument(input);
XWPFHeaderFooterPolicy headerPolicy =new XWPFHeaderFooterPolicy(document);
XWPFFooter footer = headerPolicy.getDefaultFooter();
XWPFTable[] table = footer.getTables();
for (XWPFTable xwpfTable : table)
{
xwpfTable.getRow(1).getCell(0).setText(comments);
xwpfTable.getRow(1).getCell(1).setText(status);
xwpfTable.getRow(1).getCell(2).setText(coAddress);
xwpfTable.getRow(1).getCell(3).setText(coEmail);
xwpfTable.getRow(1).getCell(4).setText(subject);
xwpfTable.getRow(1).getCell(5).setText(title);
xwpfTable.getRow(1).getCell(6).setText(catagory);
}
File f=new File (attachmentPath.substring(0,attachmentPath.lastIndexOf('\\')));
if(!f.exists())
f.mkdirs();
FileOutputStream out = new FileOutputStream(new File(attachmentPath));
document.write(out);
out.close();
System.out.println("Attachment Created!");
}
catch(Exception e)
{
e.printStackTrace();
}
}
}
The following is what I got
org.apache.poi.POIXMLException: org.apache.xmlbeans.XmlException: error: The document is not a document@http://schemas.openxmlformats.org/wordprocessingml/2006/main: document element mismatch got themeManager@http://schemas.openxmlformats.org/drawingml/2006/main
at org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:124)
at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:200)
at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:74)
at ext.gt.checkOut.FooterTableWriting.setFooter(FooterTableWriting.java:32)
at ext.gt.checkOut.FooterTableWriting.main(FooterTableWriting.java:25)
Caused by: org.apache.xmlbeans.XmlException: error: The document is not a document@http://schemas.openxmlformats.org/wordprocessingml/2006/main: document element mismatch got themeManager@http://schemas.openxmlformats.org/drawingml/2006/main
at org.apache.xmlbeans.impl.store.Locale.verifyDocumentType(Locale.java:458)
at org.apache.xmlbeans.impl.store.Locale.autoTypeDocument(Locale.java:363)
at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:1279)
at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:1263)
at org.apache.xmlbeans.impl.schema.SchemaTypeLoaderBase.parse(SchemaTypeLoaderBase.java:345)
at org.openxmlformats.schemas.wordprocessingml.x2006.main.DocumentDocument$Factory.parse(Unknown Source)
at org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:92)
... 4 more
I have added all the jar files related to this, but I still can't find the solution. I'm new to Apache POI, so please help me with some explanations and examples.
Thanks

Copied from my comment on the question:
Looks like you need poi-ooxml-schemas.jar that comes in the Apache POI distribution. Just adding a single jar doesn't mean that you have all the classes of the framework.
After solving the problem based on my comment (or other people's answers), you have this new exception:
org.apache.xmlbeans.XmlException: error: The document is not a document@http://schemas.openxmlformats.org/wordprocessingml/2006/main: document element mismatch got themeManager@http://schemas.openxmlformats.org/drawingml/2006/main
Reading "Apache POI - HWPF - Java API to Handle Microsoft Word Files", it looks like you're using the wrong class to handle Word 2003 and earlier documents: "HWPF is the name of our port of the Microsoft Word 97(-2007) file format to pure Java ... The partner to HWPF for the new Word 2007 .docx format is XWPF." This means that you either need the HWPFDocument class to handle the document, or you need to convert your document from Word 2003- to Word 2007+.
IMO Apache POI is a good solution for handling Excel files, but I would look at other options for handling Word documents. Check this question to get more related info.
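Since the file extension does not always match the real file format, it can help to look at the first bytes of the file before choosing between HWPF and XWPF. The following is a minimal sketch using only the standard library; the class name WordFormatSniffer and its return labels are hypothetical. It relies on two well-known signatures: OLE2 compound files (binary .doc) start with D0 CF 11 E0 A1 B1 1A E1, and ZIP archives (the container used by .docx) start with 'P', 'K', 3, 4. A ZIP signature only shows that the file is a ZIP archive, not that it is a valid Word document.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical diagnostic, not part of Apache POI: guesses the container
// format of a file from its leading bytes.
final class WordFormatSniffer {

    // Signature of an OLE2 compound file, used by binary .doc files.
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Signature of a ZIP local file header, used by .docx packages.
    private static final byte[] ZIP = {'P', 'K', 3, 4};

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    // Returns "OLE2", "ZIP", or "UNKNOWN" depending on the leading bytes.
    static String sniff(byte[] data) {
        if (startsWith(data, OLE2)) {
            return "OLE2";
        }
        if (startsWith(data, ZIP)) {
            return "ZIP";
        }
        return "UNKNOWN";
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java WordFormatSniffer <file>");
            return;
        }
        // Reads the whole file for simplicity; only the first bytes are inspected.
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(args[0] + ": " + sniff(data));
    }
}
```

An "OLE2" result suggests the HWPF classes, a "ZIP" result suggests the XWPF classes, and "UNKNOWN" suggests the file is neither kind of Word document, for example an RTF or HTML file saved with a .doc extension.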

This is the dependency hierarchy for poi-ooxml-3.9.jar.
It means that any of these jars can be needed at runtime even if they aren't used at compile time.
Make sure you have all the jars in the classpath of your project.

Add this dependency on your config file:
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>ooxml-schemas</artifactId>
<version>1.3</version>
</dependency>
or
The system couldn't find poi-ooxml-schemas-xx.xx.jar. Please add the library to your classpath.

The class org.openxmlformats.schemas.wordprocessingml.x2006.main.DocumentDocument.Factory is located in the jar ooxml-schemas-1.0.jar which can be downloaded here

You're getting that error because you don't have the proper dependency for the XWPFDocument. ooxml-schemas requires xmlbeans, and ooxml requires poi and ooxml-schemas, etc...
Check here: http://poi.apache.org/overview.html#components

Thought I would report my experience with this error. I started getting it out of the blue and hadn't changed anything in my workspace. It turns out that it occurs while trying to read an Excel file that has more than one sheet (the second sheet was a pivot table with a large amount of data). I'm not quite sure whether it's due to the size of the data; I suspect so, because I HAVE read Excel files that contain more than one worksheet. When I deleted that second sheet, it worked. No changes to the classpath were needed.

org.apache.poi.POIXMLException: org.apache.xmlbeans.XmlException: Element themeManager@http://schemas.openxmlformats.org/drawingml/2006/main is not a valid workbook@http://schemas.openxmlformats.org/spreadsheetml/2006/main document or a valid substitution.
Solution: use the .xlsx format instead of .xls

FWIW I had to add this:
compile 'org.apache.poi:ooxml-schemas:1.3'

In my case I had different versions of the POI jars: poi-scratchpad was 3.9 and all the others (poi, poi-ooxml, poi-ooxml-schemas) were 3.12. I changed the version of poi-scratchpad to 3.12 as well and everything started working.
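A version mismatch like this one can be spotted by grouping the POI jar file names by their version suffix. The following is a minimal sketch, assuming the jars follow the name-version.jar naming used in this thread; the class name PoiVersionCheck is hypothetical, and the jar names in main are the ones from the answer above. Only jars whose names start with poi are considered, because other artifacts such as ooxml-schemas or xmlbeans have their own version numbers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical diagnostic, not part of Apache POI: groups POI jar names
// by version so that mixed versions become visible.
final class PoiVersionCheck {

    // Matches names such as poi-3.12.jar or poi-ooxml-schemas-3.12.jar.
    // Group 1 is the artifact name, group 2 the version.
    private static final Pattern POI_JAR =
        Pattern.compile("(poi(?:-[a-z]+)*)-(\\d+(?:\\.\\d+)*)\\.jar");

    // Returns a map from version to the POI artifacts that use that version.
    static Map<String, List<String>> groupByVersion(List<String> jarNames) {
        Map<String, List<String>> byVersion = new TreeMap<>();
        for (String name : jarNames) {
            Matcher m = POI_JAR.matcher(name);
            if (m.matches()) {
                byVersion.computeIfAbsent(m.group(2), v -> new ArrayList<>()).add(m.group(1));
            }
        }
        return byVersion;
    }

    public static void main(String[] args) {
        List<String> jars = Arrays.asList(
            "poi-3.12.jar",
            "poi-ooxml-3.12.jar",
            "poi-ooxml-schemas-3.12.jar",
            "poi-scratchpad-3.9.jar");
        Map<String, List<String>> byVersion = groupByVersion(jars);
        System.out.println(byVersion);
        if (byVersion.size() > 1) {
            System.out.println("Mixed POI versions detected.");
        }
    }
}
```

More than one key in the resulting map means that POI jars of different versions are mixed, which is the situation described above.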

If you are not using Maven for your project dependencies, you should have the following jars in your classpath
