I used the Weka GUI to train a classifier on some sample ARFF files. After training, I saved the resulting model.
Now I need to use this model file in my Java code to classify some text. Could you please tell me how to proceed? I don't want to run an evaluation; I just want to classify the given input text.
I have gone through http://weka.wikispaces.com/Serialization and http://weka.wikispaces.com/Use+Weka+in+your+Java+code .
But I still couldn't find code for this. I only found a way to load a model file, and got no clue on how to classify text directly into classes. Any help in this regard would be appreciated.
Although the previously suggested post is very nice, I believe that the one I produced some time ago better fits your needs, as it specifically deals with text, and it is generic regarding the classifier. Please check "A Simple Text Classifier in Java with WEKA".
See here for a similar tutorial by Hidalgo: Java code with slides and a textual dataset. It trains a model, loads the model, and classifies unseen textual data in real time. Very easy to follow.
https://github.com/drelhaj/MachineLearning
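One small piece that the linked tutorials rely on can be sketched without Weka on the classpath: unseen text has to be presented in the same ARFF shape as the training data, with the class value left missing (`?`). The sketch below assumes the model was trained on a file with one string attribute and one nominal class; the names `UnlabeledArff`, `text`, and `class` and the labels are hypothetical and must be changed to match your training header. Loading the model and predicting would still go through Weka itself (for example `SerializationHelper.read` and `Classifier.classifyInstance`), which is not shown here.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: builds a one-row ARFF document for unseen text.
final class UnlabeledArff {

    // Quote a string value for ARFF: single quotes with backslash escapes.
    static String quote(String value) {
        String escaped = value
                .replace("\\", "\\\\")
                .replace("'", "\\'")
                .replace("\n", "\\n")
                .replace("\r", "\\r");
        return "'" + escaped + "'";
    }

    // The relation name, attribute names and class labels are assumptions;
    // they must match the header of the ARFF file the model was trained on.
    // Labels are assumed to be simple tokens that need no quoting.
    static String build(String relation, List<String> classLabels, String text) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append("\n\n");
        sb.append("@attribute text string\n");
        sb.append("@attribute class {").append(String.join(",", classLabels)).append("}\n\n");
        sb.append("@data\n");
        sb.append(quote(text)).append(",?\n"); // "?" marks the unknown class
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(build("unseen", Arrays.asList("spam", "ham"), "it's a free offer"));
    }
}
```

Whether the classifier accepts a raw string attribute depends on how it was trained (for instance inside a filtered classifier that converts text to word vectors), so treat this as a starting point only.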
Good day to all. I am currently building a program that covers the review of product warranty applications. I'm writing it in JavaFX using NetBeans. The program has the following scenes:
A screen where the information for each warranty request is entered. All the information is stored in a table in a database. The interaction between the program and the database is done through JDBC.
A screen with a table that shows all the requests that have been saved. If a row is selected, a button is enabled that carries all the data of the selected request over to the third scene.
A screen where all the data of the tests performed on the selected warranty application are entered. The results are also stored in another table in the database.
After the application is evaluated, a warranty review report must be generated. Currently this report is generated as a PDF from Excel. What I want is to generate the PDFs dynamically from the JavaFX program, using the test results stored in the database. Is there a plugin that writes these documents automatically? I'm good at writing in LaTeX, so if there is a way to generate the LaTeX source from the program and pull in the necessary information from the database, that would be perfect. Thanks in advance for the help. Any pointer or idea is welcome.
It seems like you have two core requirements:
Fetch data from the database suitable for reporting
Generate the report(s) in PDF from JavaFX, with LaTeX as a fallback
What you really need seems to be a PDF library for Java. I can suggest iText and Docmosis as good options (please note I work for Docmosis). Both require a commercial license for commercial products, so you would have to buy one.
Assuming you are using one of these libraries, the process for each report is:
execute the query to fetch the appropriate data for the report
manipulate the data if required to make the reporting stage simple
generate the report
Using iText you would write the query, the manipulation code and then the code to layout the report including the data.
Using Docmosis you would write the query, possibly some manipulation code (Docmosis can also work directly with your ResultSet) and the code to execute the report. The layout is designed in the template (Word or LibreOffice Writer).
When you mention writing "these documents automatically" I assume you mean creating the PDF file format, which iText and Docmosis can do. If you mean creating the report layout itself, then you always need to design/write something to make the report do what you require.
I hope that helps.
Thank you very much for your response, Paul! I had found something related to the libraries you mentioned, and it is indeed something like what I'm looking for. I can see that you know the subject well, so do you know of a library, preferably free, that would let me do the following (pseudocode)?
take the row from the database
Save the information of that row in the attributes of a created class.
create text1: "the guarantee with reference" + object.attribute1 + "was not approved in view of the physical revision test indicated that" + object.attribute2 + "
create text2: "..."
...
create the text n: "..."
take text 1 and place it in the header of the pdf document
Take text 2, put it in bold and place it in the subtitle
Generate a table and fill it with the content of text 3, 4 ...
compile all information as a pdf, (word file, xls or others if possible)
I understand that with the libraries you recommend I can easily do items 1 to 8, but I do not know whether it is possible to insert the texts into an existing template, so that the library places each text in its zone of the template file. I imagine that this can easily be done with LaTeX, since everything is written in plain text.
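Since LaTeX is plain text, the template idea can be sketched with nothing but string handling: keep a `.tex` template with named placeholders and substitute the values read from the database row. This is only a minimal sketch under my own assumptions: the `<<name>>` placeholder syntax and the class `LatexTemplate` are made up for illustration, and the values are escaped so that characters such as `&` or `_` do not break the LaTeX source. Compiling the result to PDF (for example by calling `pdflatex` from Java) is a separate step not shown here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: fills <<name>> placeholders in a LaTeX template.
final class LatexTemplate {

    // Placeholder syntax <<name>> is an assumption chosen for this sketch.
    private static final Pattern PLACEHOLDER = Pattern.compile("<<(\\w+)>>");

    // Escape characters that have a special meaning in LaTeX.
    static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '\\': out.append("\\textbackslash{}"); break;
                case '~':  out.append("\\textasciitilde{}"); break;
                case '^':  out.append("\\textasciicircum{}"); break;
                case '&': case '%': case '$': case '#':
                case '_': case '{': case '}':
                    out.append('\\').append(c); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    // Replace every placeholder with its escaped value; fail on unknown names.
    static String fill(String template, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("No value for placeholder " + m.group(1));
            }
            m.appendReplacement(sb, Matcher.quoteReplacement(escape(value)));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("reference", "W_2041");          // would come from the database row
        row.put("finding", "corrosion & wear");
        String template = "\\section*{Warranty <<reference>>}\nNot approved: <<finding>>.\n";
        System.out.print(fill(template, row));
    }
}
```

In practice the map would be filled from the attributes of the object built from the selected row, and the template would live in its own `.tex` file so the layout can be edited without touching the Java code.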
I found a library called Java LaTeX Report (JLR) that allows me to do what I want. This information may be useful to someone. Thank you again for your answer Paul, if you consider the libraries that you mention do the job more easily than JLR please let me know!
I was assigned to work on a project where we will receive AFP (Advanced Function Presentation) files and need to extract the documents, i.e. the content and the corresponding metadata. I have been looking into the AFP file format and haven't found any useful resource on how I should proceed with the task.
I have found almost no information so far and don't know how to proceed. I looked into some open source projects and found this: https://github.com/yan74/afplib
I tried running it, but it does not work on the sample AFP file I have.
I really need some insight into which resources I should go through to be able to do this project.
I need to write the code in Java, and I have looked at some licensed software products that do the same, like PROARCHIVER and PAPYRUS.
Thanks in advance
AFP is an easy format: it is composed of structured fields, and your first step is decoding them. Download the "Mixed Object Document Content Architecture Reference", read the first 50 pages, and write code to split the AFP file into structured fields, so that you can create a simple dump of your file.
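The splitting step can be sketched roughly as follows. It assumes the plain layout described in the MO:DCA reference: each structured field starts with a 0x5A byte, followed by an 8-byte introducer (2-byte big-endian length that counts the introducer plus the data but not the 0x5A, a 3-byte identifier, a flag byte, and 2 reserved bytes). The class name `AfpSplitter` is made up for this example. Files that wrap each record in extra bytes (record length prefixes, line terminators) would need those stripped first, and the sketch does not interpret the flag byte.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any AFP library.
final class AfpSplitter {

    static final class StructuredField {
        final int id;      // 3-byte identifier, e.g. 0xD3A8A8 for Begin Document
        final int flags;   // flag byte from the introducer (not interpreted here)
        final byte[] data; // everything after the 8-byte introducer

        StructuredField(int id, int flags, byte[] data) {
            this.id = id;
            this.flags = flags;
            this.data = data;
        }

        @Override
        public String toString() {
            return String.format("%06X flags=%02X dataLength=%d", id, flags, data.length);
        }
    }

    // Reads structured fields until end of stream.
    static List<StructuredField> split(InputStream in) throws IOException {
        DataInputStream din = new DataInputStream(in);
        List<StructuredField> fields = new ArrayList<>();
        int marker;
        while ((marker = din.read()) != -1) {
            if (marker != 0x5A) {
                throw new IOException("Expected 0x5A but found 0x" + Integer.toHexString(marker));
            }
            int length = din.readUnsignedShort(); // introducer + data, without the 0x5A
            if (length < 8) {
                throw new IOException("Structured field length too small: " + length);
            }
            int id = (din.readUnsignedByte() << 16)
                    | (din.readUnsignedByte() << 8)
                    | din.readUnsignedByte();
            int flags = din.readUnsignedByte();
            din.readUnsignedShort();              // reserved bytes, skipped
            byte[] data = new byte[length - 8];
            din.readFully(data);
            fields.add(new StructuredField(id, flags, data));
        }
        return fields;
    }

    // Convenience overload for data already in memory.
    static List<StructuredField> split(byte[] bytes) {
        try {
            return split(new ByteArrayInputStream(bytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Printing each returned field with `toString()` gives the kind of simple dump mentioned above; for a real file you would pass a buffered `FileInputStream` to `split`.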
After that, if you want to extract images (the AFP world calls them IOCA), you need the Image Object Content Architecture Reference.
If you want to extract text (called PTX), you need the Presentation Text Object Content Architecture Reference.
good job
I'm trying to compare a single document's topic distribution (using LDA) with the topic distributions of other files in a previously created topic model, using MALLET.
I know that this can be done through MALLET commands in the terminal, but I'm having trouble finding a way to implement it in Java.
To give a gist of what the functionality of my program is:
The existing topic model was built from a large corpus of texts. I want to use it to compare the topic distribution of a tweet that contains a certain hashtag with those of the corpus files, and then pull out the corpus file most similar to the tweet.
I've read through MALLET's Java API docs, but they seem very confusing and not really explanatory.
If anyone could give me a few tips, I'd appreciate it.
First, take a look at these:
Developer's guide
Tutorial slides after slide 97
Code examples in the source directory: src/cc/mallet/examples
Now, these examples show the basic functionality, but they don't show how to save and load the model if you need to separate training from testing. Basically what you need is to save both the model and the instances after training (since you need to train and test with the same pipeline), and load them before testing.
Save model and pipeline after training:
model.write(new File("model.dat"));
instances.save(new File("pipeline.dat"));
Load model and pipeline before testing:
ParallelTopicModel model = ParallelTopicModel.read(new File("model.dat"));
InstanceList instances = InstanceList.load(new File("pipeline.dat"));
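Once the model is loaded, the comparison itself is just arithmetic on arrays of topic proportions. The sketch below is independent of MALLET and uses cosine similarity, which is my own choice for illustration (Jensen-Shannon divergence is another common option). It assumes you already have a `double[]` for the tweet and one per corpus document; in MALLET these would typically come from something like `model.getInferencer().getSampledDistribution(...)` for the new document and `model.getTopicProbabilities(i)` for the training documents, but please check those calls against the API docs for your version.

```java
// Hypothetical helper: compares topic distributions held as plain arrays.
final class TopicSimilarity {

    // Cosine similarity between two topic distributions of equal length.
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Distributions must have the same number of topics");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // an all-zero vector is treated as dissimilar to everything
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Index of the corpus document most similar to the query, or -1 if the corpus is empty.
    static int mostSimilar(double[] query, double[][] corpus) {
        int best = -1;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < corpus.length; i++) {
            double score = cosine(query, corpus[i]);
            if (score > bestScore) {
                bestScore = score;
                best = i;
            }
        }
        return best;
    }
}
```

The returned index can then be mapped back to the corresponding instance in the loaded `instances` list to recover the file name.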
Hope this helps.
I've been searching for an answer to this for a while to no avail.
First a bit of background: I'm trying to create an AI for robocode using Weka.
First I log the required data from a manual robot to an ARFF file; this is working as it should.
This data is then processed using Weka to create a model, which I then save to a file.
I can successfully import the model, classify a dataset that has been imported from another ARFF file, and use the results.
What I want to do now is, every time the game status changes, assemble an instance and classify it using my previously saved model, to decide, for example, which way to move.
I've tried to look it up on the wiki: http://weka.wikispaces.com/Programmatic+Use
and this IBM tutorial: http://www.ibm.com/developerworks/opensource/library/os-weka3/ to name a couple. I've also been looking through the APIs, but that hasn't given me much to go on.
Much of what I've tried is deprecated, for example creating a prototype with the attributes and fast vectors, then creating an empty dataset, then creating a new instance with the required values using something like inst.setValue(attrib, value) and adding it to the dataset.
Also, what about the class index, that is, the attribute I'm predicting? In the instance, does it have to be null, or set to missing, or something else? Surely I won't know that value, as I'm trying to predict it.
So are there any ideas how I can go about this?
Any help is greatly appreciated.
Thank you muchly.
Managed to find the answer a while ago.
For anyone else having trouble with this: what you have to do is described in the Weka manual included with every download (it's a PDF).
See page 202 onwards in the manual, Section 16.3, "Creating datasets in memory".
Follow the steps there and it works perfectly.
How can I print a document (taken from a database or from the current fields of a form) in Java with a customized page size? Most importantly, I want to customize the page to my requirements (text alignment may also be needed). I am not a hard-core Java coder. Your help would mean a lot to me.
Thanks.
It is not clear what the parenthetical about the database and the form fields means. I suggest going through the 2D Graphics tutorial, which describes printing in Java in detail.
Everywhere I've worked that wanted well-formatted output from a Java back end, we've deployed Apache FOP (http://xmlgraphics.apache.org/fop/), which allowed us to use XSLT to convert XML to PDF. It works really well, but has a pretty steep learning curve.
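As a rough illustration of the printing API that tutorial covers, a custom page size can be described with `java.awt.print.Paper` and `PageFormat`, and the content drawn by a `Printable`. This is a minimal sketch: the class name `CustomPage`, the millimetre-based helper, and the single line of text are my own assumptions, and a real printer driver may adjust or reject sizes it does not support.

```java
import java.awt.Graphics;
import java.awt.print.PageFormat;
import java.awt.print.Paper;
import java.awt.print.Printable;
import java.awt.print.PrinterException;
import java.awt.print.PrinterJob;

// Hypothetical helper showing a custom page size with the java.awt.print API.
final class CustomPage {

    // The printing API measures in points (1/72 inch).
    static final double POINTS_PER_MM = 72.0 / 25.4;

    // Build a page description from a size and a uniform margin given in millimetres.
    static PageFormat pageFormat(double widthMm, double heightMm, double marginMm) {
        double w = widthMm * POINTS_PER_MM;
        double h = heightMm * POINTS_PER_MM;
        double m = marginMm * POINTS_PER_MM;
        Paper paper = new Paper();
        paper.setSize(w, h);
        paper.setImageableArea(m, m, w - 2 * m, h - 2 * m);
        PageFormat format = new PageFormat();
        format.setPaper(paper);
        return format;
    }

    // A one-page Printable that draws a single line at the top-left of the printable area.
    static Printable singlePage(String text) {
        return (Graphics g, PageFormat pf, int pageIndex) -> {
            if (pageIndex > 0) {
                return Printable.NO_SUCH_PAGE;
            }
            int x = (int) pf.getImageableX();
            int y = (int) pf.getImageableY() + g.getFontMetrics().getAscent();
            g.drawString(text, x, y);
            return Printable.PAGE_EXISTS;
        };
    }

    // Shows the print dialog and prints on a 100 mm x 150 mm page with 10 mm margins.
    static void print(String text) throws PrinterException {
        PrinterJob job = PrinterJob.getPrinterJob();
        job.setPrintable(singlePage(text), pageFormat(100, 150, 10));
        if (job.printDialog()) {
            job.print();
        }
    }
}
```

Text alignment would be handled inside the `Printable`, for example by measuring strings with `FontMetrics.stringWidth` and offsetting the x coordinate.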