Create your own gazetteer list


I am new to Natural Language Processing and GATE. Currently I'm learning to use GATE / ANNIE. ANNIE's default gazetteer lists are great, but obviously they don't provide lists for everything.
I need to create a list of characters in a story book.
Creating lists and adding entries to each list by hand, either in the GATE Gazetteer Editor (as described in section 13.2.2 of the GATE manual) or in a text editor, does not seem practicable. Does anyone know a way to create our own gazetteer lists other than creating or editing them directly in GATE or in a text editor?

As said in the GATE manual, you can edit any of the existing lists in a text editor. Probably the most straightforward way is to create these lists programmatically: if you have the entries in a database, dump the records in the gazetteer format (basically one entry per line). If you have them in a CSV file or a web page, export them to the right format.
Another option is to use a more advanced gazetteer which uses an ontology or semantic repository. See the manual link above for different gazetteers and how to work with them.
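As a rough illustration of the "dump records in the gazetteer format" idea, here is a minimal sketch that writes a list of names to a .lst file, one entry per line, encoded explicitly as UTF-8. The class and method names are hypothetical and not part of GATE; where the names come from (database query, CSV export) is left out.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of GATE: writes a plain gazetteer list file.
final class GazetteerListWriter {

    // Trims entries, drops blanks and duplicates, keeps first-seen order.
    static List<String> normalize(List<String> rawNames) {
        Set<String> unique = new LinkedHashSet<>();
        for (String name : rawNames) {
            if (name == null) {
                continue;
            }
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                unique.add(trimmed);
            }
        }
        return new ArrayList<>(unique);
    }

    // Writes one entry per line, explicitly encoded as UTF-8.
    static void write(Path lstFile, List<String> rawNames) throws IOException {
        Files.write(lstFile, normalize(rawNames), StandardCharsets.UTF_8);
    }
}
```

A call such as GazetteerListWriter.write(Paths.get("characters.lst"), names) would produce the file; it still has to be registered with the gazetteer, for example through the Gazetteer Editor.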

I created a list using the contents of a column from a database table as suggested above. Simply saved it as a .lst file using Notepad++ in the same directory as all the other .lst files (I'm using the ANNIE gazetteer) and then added it using the gazetteer editor.
One problem that I ran into was not having it saved in the correct encoding (UTF-8). GATE didn't like it and it showed in the messages when loading. Once I figured that out and corrected it, it worked fine.
If you need to create a list of entities from text maybe you could look into the gazetteer list collector - http://gate.ac.uk/sale/tao/splitch13.html - 13.7

Related

generate PDF calling information from database in javaFX

Good day to all. I am currently building a program for reviewing product warranty applications. I'm writing it in JavaFX using NetBeans. The program has the following scenes:
A screen where the information for each warranty request is entered. All the information is stored in a table in a database. The program talks to the database through JDBC.
A screen with a table showing all the requests that have been saved. When a row is selected, a button is enabled that carries all the data of the selected request over to the third scene.
A screen where the data from the tests performed on the selected warranty application is entered. The results are stored in another table in the database.
After the application is evaluated, a warranty review report must be generated. Currently this report is produced as a PDF from Excel. What I want is to generate the PDF reports dynamically from the JavaFX program, using the test results stored in the database. Is there a plugin to write these documents automatically? I'm good at writing texts in LaTeX, so if there is a way to generate the LaTeX source from the program and pull in the necessary information from the database, that would be perfect. Thanks in advance for the help. Any pointer or idea is welcome.

It seems like you have two core requirements:
Fetch data from the database suitable for reporting
Generate the report(s) in PDF from JavaFX, with LaTeX as a possible fallback
What you really need seems to be a PDF library for Java. I can suggest iText and Docmosis as good options (please note I work for Docmosis). Both require a paid license for use in commercial products, so you would have to buy one.
Assuming you are using one of these libraries, the process for each report is:
execute the query to fetch the appropriate data for the report
manipulate the data if required to make the reporting stage simple
generate the report
Using iText you would write the query, the manipulation code and then the code to layout the report including the data.
Using Docmosis you would write the query, possibly some manipulation code (Docmosis can also work directly with your ResultSet) and the code to execute the report. The layout is designed in the template (Word or LibreOffice Writer).
When you mention writing "these documents automatically" I assume you mean creating the PDF file format, which iText and Docmosis can do. If you mean creating the report layout itself, then you always need to design/write something to make the report do what you require.
I hope that helps.

Thank you very much for your response, Paul! I had found something related to the libraries you mentioned, and they are indeed close to what I'm looking for. I can see that you know the subject well, so do you know of a library, preferably free, that would let me do the following (pseudocode)?
take the row from the database
Save the information of that row in the attributes of a created class.
create text1: "the guarantee with reference" + object.attribute1 + "was not approved in view of the physical revision test indicated that" + object.attribute2 + "
create text2: "..."
...
create the text n: "..."
take text 1 and place it in the header of the pdf document
Take text 2, put it in bold and place it in the subtitle
Generate a table and fill it with the content of text 3, 4 ...
compile all information as a pdf, (word file, xls or others if possible)
I understand that with the libraries you recommend I can easily do steps 1 to 8, but I do not know whether it is possible to insert the texts into an existing template, so that the library places each text in its respective zone of the template file. I imagine this can easily be done with LaTeX, since everything is written in plain text.
I found a library called Java LaTeX Report (JLR) that allows me to do what I want. This information may be useful to someone. Thank you again for your answer Paul, if you consider the libraries that you mention do the job more easily than JLR please let me know!
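The "texts into a template" idea from the pseudocode can be sketched in plain Java, independent of any particular library. This is not JLR's API: the @@name@@ placeholder syntax, the class name and the field names are assumptions made for illustration, and the escaping covers only a partial list of LaTeX special characters.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical template filler; the @@name@@ placeholder syntax is an assumption.
final class LatexTemplate {
    private static final Pattern PLACEHOLDER = Pattern.compile("@@(\\w+)@@");

    // Escapes some characters that LaTeX treats specially (partial list only).
    static String escape(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            switch (c) {
                case '&': case '%': case '$': case '#': case '_': case '{': case '}':
                    sb.append('\\').append(c);
                    break;
                default:
                    sb.append(c);
            }
        }
        return sb.toString();
    }

    // Replaces each known placeholder in a single pass; unknown ones are left as-is.
    static String fill(String template, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1));
            String replacement = (value == null) ? m.group() : escape(value);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

The values map would be filled from the row read from the database, and the resulting .tex source would then be compiled to PDF by a LaTeX toolchain, which is outside the scope of this sketch.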

Is there a clean way to transform text files that are not the same into a standard format

I'm pretty sure the answer I'm going to get is: "Why don't you just have the text files all be the same or follow some set format?" Unfortunately I do not have this option, but I was wondering if there is a way to take any text file and translate it into another text or XML file that will always look the same.
The text files pretty much have the same data, just arranged differently.
The closest I can come up with is to have an XSLT sheet for each text file, but then I have to turn around and read the file that was just created, delete it, and repeat for each text file.
So, is there a way to grab the data from text files that essentially hold the same data, just stored differently, and store it in an object that I could then reuse later in some process?
If it were up to me, I would push for every text file to follow some predefined format, since they all pretty much contain the same data, but it's not up to me.

Odd question... You say they are text files yet mention XSLT as a possible solution. XSLT will only work if the source is XML; if that is the case, please rephrase the question. If you say text files, I assume delimiter-separated (e.g. CSV), fixed-length, ...
There are some parsers out there (like Smooks) that allow you to parse multiple formats, but you will still have to do the "mapping" yourself, of course.
This is a typical problem in the integration world, so any integration tool should offer you a solution (e.g. WSO2, Fuse, ...).
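Without any tool, the "mapping" can also be sketched by hand: one small parser per source layout, all producing the same in-memory object, so no intermediate file has to be written and re-read. The two layouts and the name/age fields below are invented for illustration; the real files would need their own parsers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical common target type; the fields are chosen only for illustration.
final class CommonRecord {
    final String name;
    final int age;

    CommonRecord(String name, int age) {
        this.name = name;
        this.age = age;
    }
}

// One implementation per source layout; all of them yield CommonRecord objects.
interface LineParser {
    CommonRecord parse(String line);
}

final class Parsers {
    // Layout A (assumed): comma-separated "name,age".
    static final LineParser CSV = line -> {
        String[] parts = line.split(",", -1);
        return new CommonRecord(parts[0].trim(), Integer.parseInt(parts[1].trim()));
    };

    // Layout B (assumed): fixed width, age in columns 0-2, name from column 3 on.
    static final LineParser FIXED_WIDTH = line ->
        new CommonRecord(line.substring(3).trim(), Integer.parseInt(line.substring(0, 3).trim()));

    // Parses every non-blank line with the parser that matches the file's layout.
    static List<CommonRecord> parseAll(List<String> lines, LineParser parser) {
        List<CommonRecord> result = new ArrayList<>();
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                result.add(parser.parse(line));
            }
        }
        return result;
    }
}
```

The rest of the program then works only with CommonRecord and never needs to know which layout a file used.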

Update objects written to a text file in Java

Writing Java objects or a List to a text file is fine. But I want to know how I can update or rewrite an object that was written previously without writing all the objects again. For example, let's assume a java.util.List holds a set of objects and that list is written to a text file. Later the file is read again, all the objects are loaded from it, and a Java application changes one object's value at run time. I don't want to write the entire list back to the text file. Instead, only the updated object should be rewritten or updated in the file, without rewriting the whole list. Any suggestions or helpful sources with sample code, please.

Take a look at RandomAccessFile. This will let you seek to the place in the file you want, and only update the part that you want to update.
Also take a look at this question on stackoverflow.
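A minimal sketch of the seek-and-overwrite idea with RandomAccessFile. It only works under a strong assumption: every record occupies the same number of bytes, so the position of record i can be computed. The record layout here (an int id followed by a double value, in binary rather than text) and the class name are hypothetical.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical fixed-size record store: each record is an int id plus a double value.
final class FixedRecordFile {
    static final int RECORD_SIZE = Integer.BYTES + Double.BYTES; // 4 + 8 = 12 bytes

    // Appends a record at the end of the file.
    static void append(RandomAccessFile file, int id, double value) throws IOException {
        file.seek(file.length());
        file.writeInt(id);
        file.writeDouble(value);
    }

    // Overwrites only the value field of the record at the given index.
    static void updateValue(RandomAccessFile file, long index, double newValue) throws IOException {
        file.seek(index * RECORD_SIZE + Integer.BYTES);
        file.writeDouble(newValue);
    }

    // Reads the value field of the record at the given index.
    static double readValue(RandomAccessFile file, long index) throws IOException {
        file.seek(index * RECORD_SIZE + Integer.BYTES);
        return file.readDouble();
    }
}
```

With variable-length data such as strings this breaks down unless each field is padded to a fixed maximum width.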

Without some fairly complex logic, you won't usually be able to update an object without rewriting the entire file. For example, if one of the objects on your list contains a string "shortstring", and you need to update it with string "muchmuchlongerstring", there will be no space in the file for the longer string without rewriting all the following content in the file.
If you want to persist large object trees to a file and still have the ability to update them, your code will be less buggy and life will be simplified by using one of the many file-based DBs out there, like:
SQLite (see Java and SQLite)
Derby
H2 (disk-based tables)

Java: How can I assemble/create a single instance for classification using a Weka generated model?

I've been searching for an answer to this for a while to no avail.
First a bit of background: I'm trying to create an AI for robocode using Weka.
I'm first logging the required data from a manual robot to an ARFF file, this is working as it should.
This data is then processed using Weka and a model is created; I then save the model to a file.
I can successfully import the model and classify a dataset that has been imported from another arff file and use the results.
What I want to do now is, every time the game status changes, assemble an instance and classify it with my previously saved model, to decide for example which way to move.
I've tried to look it up on the wiki: http://weka.wikispaces.com/Programmatic+Use
and this ibm tutorial: http://www.ibm.com/developerworks/opensource/library/os-weka3/ to name a couple, I've also been looking through the APIs but that hasn't given me much to go on.
Much of what I've tried is deprecated, for example creating a prototype with the attributes and FastVectors, then creating an empty dataset, then creating a new instance with the required values using something like inst.setValue(attrib, value) and adding it to the dataset.
Also, what about the class index, i.e. the attribute I'm predicting? Does it have to be null or set to missing in the instance? Surely I won't know that value, as I'm trying to predict it.
So are there any ideas how I can go about this?
any help is greatly appreciated,
Thank you muchly.

Managed to find the answer a while ago.
For anyone else having trouble with this: what you need is in the Weka manual included with every download (it's a PDF).
Page 202 onwards in the manual - Section 16.3 "Creating datasets in memory".
Follow the steps there and it works perfectly.

best practices question: How to save a collection of images and a java object in a single file? File is read to be rendered

I am making a Java program that has a collection of flash-card-like objects. I store the objects in a JTree composed of DefaultMutableTreeNodes. Each node has a user object attached to it which has a few string/native data type parameters. However, I also want each of these objects to have an image (typical formats: JPG, PNG, etc.).
I would like to be able to store all of this information, including the images and the tree data to the disk in a single file so the file can be transferred between users and the entire tree, including the images and parameters for each object, can be reconstructed.
I had not approached a problem like this before, so I was not sure what the best practices were. I found XMLEncoder (http://java.sun.com/j2se/1.4.2/docs/api/java/beans/XMLEncoder.html) to be a very effective way of storing my tree and the native data type information. However, I couldn't figure out how to save the image data itself inside the XML file, and I'm not sure it is possible since the data is binary (so restricted characters would be invalid). My next thought was to associate a hash string instead of an image with each user object, and then gzip together all of the images, with the hash strings as the names, and the XMLEncoder-encoded tree in the same compressed file. That seemed really contrived though.
Does anyone know a good approach for this type of issue?
Thanks!

Assuming this isn't just a serializable graph, consider bundling the files together in Jar format. If you already have your data structures working with XMLEncoder, you can reuse this code by saving the data as a jar entry.
If memory serves, the jar library has better support for Unicode name entries than the zip package, which is why I would favour it.
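The jar-bundling suggestion could look roughly like the sketch below, which packs named byte arrays into one jar stream and reads them back. The class name and the entry names used in the usage note ("tree.xml", "images/...") are assumptions for illustration; the XML bytes would come from pointing XMLEncoder at a ByteArrayOutputStream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.jar.JarEntry;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;

// Hypothetical bundle format: one jar whose entries are the XML tree and the images.
final class BundleFile {

    // Writes every map entry as one jar entry (entry name -> raw bytes).
    static void write(OutputStream out, Map<String, byte[]> entries) throws IOException {
        try (JarOutputStream jar = new JarOutputStream(out)) {
            for (Map.Entry<String, byte[]> e : entries.entrySet()) {
                jar.putNextEntry(new JarEntry(e.getKey()));
                jar.write(e.getValue());
                jar.closeEntry();
            }
        }
    }

    // Reads all entries back into memory, preserving their order in the jar.
    static Map<String, byte[]> read(InputStream in) throws IOException {
        Map<String, byte[]> entries = new LinkedHashMap<>();
        try (JarInputStream jar = new JarInputStream(in)) {
            byte[] buffer = new byte[8192];
            JarEntry entry;
            while ((entry = jar.getNextJarEntry()) != null) {
                ByteArrayOutputStream bytes = new ByteArrayOutputStream();
                int n;
                while ((n = jar.read(buffer)) != -1) {
                    bytes.write(buffer, 0, n);
                }
                entries.put(entry.getName(), bytes.toByteArray());
            }
        }
        return entries;
    }
}
```

Each user object would then store only the entry name of its image (for example "images/card1.png"), and the images are looked up in the map after loading.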

You might consider using an MS JET database (.mdb file) and storing all the stuff in there. That'll also make it easy to examine and edit the data in (for example) MS Access.

You can employ a virtual file system, which stores its data in a single container. We develop and offer one such file system, SolFS; however, right now there's no Java binding for it. We will release a Java JNI interface for SolFS within a month.
