Java: LyX to String

I would like to know whether there are any Java libraries that can extract the plain text of a .lyx document, so that I get the unformatted content for further analysis. This seems like a relatively easy job, but so far I haven't found anything I could integrate into my project.
Daniel
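In case no ready-made library turns up, a rough filter is not much work, because .lyx files are plain text in which markup lines start with a backslash. The sketch below is a heuristic based on my own assumptions about the format (skip the `\begin_header` ... `\end_header` block, skip `#` comment lines and every other line starting with `\`, end a paragraph at `\end_layout`). It is not LyX's own exporter and will, for example, merge footnote or inset text into the surrounding paragraph. The class name `LyxText` is hypothetical.

```java
// Heuristic LyX-to-plain-text filter: an assumption about the .lyx layout, not an official parser.
final class LyxText {
    static String toPlainText(String lyx) {
        StringBuilder out = new StringBuilder();
        StringBuilder para = new StringBuilder();
        boolean inHeader = false;
        for (String line : lyx.split("\\R")) {
            // Document settings live between \begin_header and \end_header.
            if (line.startsWith("\\begin_header")) { inHeader = true; continue; }
            if (line.startsWith("\\end_header")) { inHeader = false; continue; }
            // Skip settings and "#LyX ..." comment lines.
            if (inHeader || line.startsWith("#")) continue;
            // \end_layout closes a paragraph (title, section, standard text, ...).
            if (line.startsWith("\\end_layout")) {
                flush(para, out);
                continue;
            }
            // Any other line starting with a backslash is treated as markup and dropped.
            if (line.startsWith("\\")) continue;
            // Remaining lines are assumed to be text; one paragraph may span several lines.
            para.append(line);
        }
        flush(para, out);
        return out.toString();
    }

    // Emits the collected paragraph (if any) as one line of output.
    private static void flush(StringBuilder para, StringBuilder out) {
        String text = para.toString().trim();
        if (!text.isEmpty()) out.append(text).append('\n');
        para.setLength(0);
    }
}
```

To use it on a file, read the whole document first, e.g. `LyxText.toPlainText(Files.readString(Path.of("thesis.lyx")))` on Java 11+ (the file name is a placeholder).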

Related

PDF manipulation with placeholders

I am looking for a Java tool that can manipulate an existing PDF containing placeholders like ${foo}. I want to generate mail merge documents from that.
I found a lot of solutions with forms, but these do not seem suitable for me. Currently I generate the PDF with iText, but converting existing Word files or similar documents that way is a really annoying task. So far I haven't found another solution with iText.
I also used JODReports in conjunction with JODConverter but it is necessary to run OpenOffice as a service and the performance is bad.
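Whatever tool ends up reading or writing the PDF, the mail-merge step itself is plain string substitution. A minimal sketch, assuming the template text is already available as a `String` (for example before it is handed to iText): it replaces each `${key}` with a value from a map and leaves unknown keys untouched. It does not locate placeholders inside an existing PDF's content streams, which is the hard part of the question; the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class Placeholders {
    // Matches ${name}; group 1 captures the key (letters, digits, underscore).
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(\\w+)\\}");

    static String fill(String template, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            String value = values.get(m.group(1));
            // Unknown keys are left as-is so missing data is easy to spot in the output.
            String replacement = value != null ? value : m.group();
            // quoteReplacement stops '$' and '\' in the value from being read as group references.
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```

`appendReplacement` with a `StringBuilder` needs Java 9 or later; on Java 8, use a `StringBuffer` instead.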

Parsing structured documents in Java

I would like to parse some legal documents with a Java library into pieces of text that represent headers, paragraphs, etc. Legal documents are usually well structured, so I would like to use something a bit easier than JavaCC (or other parser generators). Are there any libraries that would (nearly) automatically detect such a structure?
Thanks.
I don't think there is a tool that can "nearly automatically" extract such structures. If the structure is really easy to extract, you don't need a tool; you can easily code it yourself. If it is not so easy, you need a tool that is powerful enough (JavaCC, ANTLR, ...).
I think parsing the text yourself with custom code is the best way. Maybe read a bit about parsing beforehand (recursive descent, lexer/parser separation, ...). For simple structures it is not hard to get a working solution quickly.
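As an illustration of the "code it yourself" route, here is a minimal line-based sketch rather than a full recursive-descent parser: blank lines end paragraphs, wrapped lines are joined, and any line that looks like a heading becomes a header block. The heading rule (`Article 3`, `Section 2.1`, or the section sign followed by a number) is a hypothetical example; real legal texts will need their own patterns, and all names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class LegalDocSplitter {
    enum Kind { HEADER, PARAGRAPH }

    static final class Block {
        final Kind kind;
        final String text;
        Block(Kind kind, String text) { this.kind = kind; this.text = text; }
        @Override public String toString() { return kind + ": " + text; }
    }

    // Hypothetical heading rule: "Article", "Section" or the section sign (U+00A7), then a number like 2 or 2.1.
    private static final Pattern HEADER =
            Pattern.compile("^(Article|Section|\u00A7)\\s*\\d+(\\.\\d+)*\\b.*");

    static List<Block> split(String text) {
        List<Block> blocks = new ArrayList<>();
        StringBuilder para = new StringBuilder();
        for (String raw : text.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty()) {
                flush(para, blocks);                 // a blank line ends the current paragraph
            } else if (HEADER.matcher(line).matches()) {
                flush(para, blocks);                 // a heading also ends the previous paragraph
                blocks.add(new Block(Kind.HEADER, line));
            } else {
                if (para.length() > 0) para.append(' ');
                para.append(line);                   // join wrapped lines of one paragraph
            }
        }
        flush(para, blocks);
        return blocks;
    }

    private static void flush(StringBuilder para, List<Block> blocks) {
        if (para.length() > 0) {
            blocks.add(new Block(Kind.PARAGRAPH, para.toString()));
            para.setLength(0);
        }
    }
}
```

If the documents nest (articles containing sections containing paragraphs), the same classification can feed a small recursive-descent step that groups blocks by heading level.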
Apache POI - the Java API for Microsoft Documents
Apache PDFBox - Java PDF Library
An easier option is Apache Tika, a content analysis toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries. It uses PDFBox and POI internally.
Usage: java -jar tika-app-0.9.jar -t [file]
With -t, Tika will parse the file(s) specified on the command line and output the extracted text content.

Download Pubmed Abstracts in Java

Does anyone have an implementation of a program that, given a MeSH term, downloads PubMed abstracts with title, author, date, and content to separate plain-text files?
http://www.ncbi.nlm.nih.gov/entrez/eutils/soap/v2.0/DOC/esoap_java_help.html has an example. It worked for me like a charm.
I posted the code as a Maven project on GitHub.
There is a built-in function for downloading different type of files (for example XML, CSV, and plain text files) right on the PubMed homepage. Just make a search and then select "Send to" where you'll be given a plethora of options.
As an alternative to esoap, you can also use a RESTful API.
Assuming you want to get all articles with the MeSH keyword galactosylceramides, your query would look like this:
http://www.ebi.ac.uk/europepmc/webservices/rest/search/resulttype=core&query=KW:galactosylceramides
Of course, you have to parse the XML result, but I don't think that's a big problem.
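Parsing the XML needs nothing beyond the JDK's DOM parser. A minimal sketch, assuming the response is already in a `String` (fetched from the query URL above, e.g. with `java.net.http.HttpClient` on Java 11+) and assuming each article is a `<result>` element with `title`, `authorString`, `pubYear` and `abstractText` children. Those element names are my assumption; check them against an actual response before relying on them. The class name is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class EuropePmcResults {
    // Assumed child elements of each <result>; verify against a real response.
    static final String[] FIELDS = {"title", "authorString", "pubYear", "abstractText"};

    static List<Map<String, String>> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList results = doc.getElementsByTagName("result");
            List<Map<String, String>> records = new ArrayList<>();
            for (int i = 0; i < results.getLength(); i++) {
                Element result = (Element) results.item(i);
                Map<String, String> record = new LinkedHashMap<>();
                for (String field : FIELDS) record.put(field, childText(result, field));
                records.add(record);
            }
            return records;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse search result XML", e);
        }
    }

    // Looks at direct children only, so a nested <journal><title> is not mistaken for the article title.
    private static String childText(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && n.getNodeName().equals(name)) {
                return n.getTextContent().trim();
            }
        }
        return "";
    }
}
```

Each returned map can then be written to its own plain-text file (e.g. with `Files.writeString`), which matches the one-file-per-abstract layout asked for in the question.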
There is an example here, but not in Java. http://www.ncbi.nlm.nih.gov/books/NBK25500/

Where in the NetBeans 6.9.1 source code is the code for scanning Java files and finding all the methods, variables, imports etc

I'm looking for a sample, or possibly even code I can use, that scans Java files and tells me key pieces of information about each class, which I can then use much like the NetBeans refactoring and go-to-source features do.
Instead of reusing the Netbeans sources, you should probably just find a good library.
tells me key pieces of information about each class
Depending on your definition of "key pieces", I would recommend QDox:
http://qdox.codehaus.org/
QDox is a high speed, small footprint parser for extracting class/interface/method definitions from source files complete with JavaDoc @tags.
If you are looking to reuse the NetBeans code that parses Java files, I don't know.
If you are looking for how to parse a Java file, you can try ANTLR. ANTLR is a parser generator, and there is an existing Java grammar you can use right away. Once you generate a Java parser, you can use it to parse your Java files. You will have to learn how to use ANTLR.
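If adding QDox or ANTLR is not an option, the JDK itself ships a Java parser: the Compiler Tree API (`com.sun.source.tree`, `com.sun.source.util`), which the `javac` in a JDK exposes through `JavacTask`. This is a different technique from the answers above, not NetBeans' own code. A minimal sketch that parses source held in a string and lists imports, classes, methods and variables; the class `SourceOutline` is hypothetical, and it needs a full JDK, since `ToolProvider.getSystemJavaCompiler()` returns `null` on a plain JRE.

```java
import com.sun.source.tree.ClassTree;
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.ImportTree;
import com.sun.source.tree.MethodTree;
import com.sun.source.tree.VariableTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

final class SourceOutline {
    // In-memory source file, so the example needs no files on disk.
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;
        StringSource(String name, String code) {
            super(URI.create("string:///" + name + ".java"), JavaFileObject.Kind.SOURCE);
            this.code = code;
        }
        @Override public CharSequence getCharContent(boolean ignoreEncodingErrors) { return code; }
    }

    static List<String> outline(String className, String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) throw new IllegalStateException("No system Java compiler (running on a JRE?)");
        JavacTask task = (JavacTask) compiler.getTask(null, null, null, null, null,
                List.of(new StringSource(className, code)));
        List<String> found = new ArrayList<>();
        try {
            // parse() builds syntax trees only; no type checking happens here.
            for (CompilationUnitTree unit : task.parse()) {
                for (ImportTree imp : unit.getImports()) {
                    found.add("import " + imp.getQualifiedIdentifier());
                }
                new TreeScanner<Void, Void>() {
                    @Override public Void visitClass(ClassTree node, Void p) {
                        found.add("class " + node.getSimpleName());
                        return super.visitClass(node, p);
                    }
                    @Override public Void visitMethod(MethodTree node, Void p) {
                        found.add("method " + node.getName());
                        return super.visitMethod(node, p);
                    }
                    // Reports fields, parameters and local variables alike.
                    @Override public Void visitVariable(VariableTree node, Void p) {
                        found.add("variable " + node.getName());
                        return super.visitVariable(node, p);
                    }
                }.scan(unit, null);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return found;
    }
}
```

Because only `parse()` is called, this is purely syntactic: no types are resolved. For resolved symbols (closer to what "go to source" needs), `JavacTask.analyze()` and the `Trees` utility are the next step.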
I don't quite understand your question.
If you want to navigate inside Java source code with NetBeans, hold CTRL and move your mouse over the word you want to explore; NetBeans highlights the word, and if you click it, you go to the source.

A solution to highlight some parts of a text file in Java? How to implement a simple DSL editor?

I'm trying to find a solution to highlight part of a text file in Java.
Basically, what I'm doing is lexing and parsing a text file that follows a certain grammar, storing some information related to the various elements of this file, and then logging that information to a database.
I would like to have something more visual like a representation of the text file with some parts highlighted (and an index of the various colors used) - or even better with some context-sensitive information attached to a particular token.
Is there an easy way to do so? In terms of features, what I would like is basically a really primitive, stand-alone equivalent of an Eclipse plugin for a particular language. Maybe there's a framework for building DSL editors, or something like that.
I hope this is clear.
Thanks
I think Xtext is just what you are looking for: it generates an Eclipse editor and more from a grammar.
Although not for Eclipse, there's MPS by JetBrains (the makers of the now open source IntelliJ IDEA) which may be worth taking a look at:
http://www.jetbrains.com/mps/
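If a full editor framework is more than needed, a lightweight alternative is to render the parsed file as HTML: each token from the existing lexer becomes a colored `<span>`, its context-sensitive information goes into the `title` attribute (shown as a tooltip in a browser), and a legend lists the color used for each token type. A minimal sketch, assuming the lexer can report tokens as `[start, end)` character offsets; the `Token` class and all other names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

final class HtmlHighlighter {
    // Hypothetical token shape: [start, end) offsets into the text, a type name and extra info.
    static final class Token {
        final int start, end;
        final String type, info;
        Token(int start, int end, String type, String info) {
            this.start = start; this.end = end; this.type = type; this.info = info;
        }
    }

    // Minimal HTML escaping for text and attribute values ('&' must come first).
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("\"", "&quot;");
    }

    static String render(String text, List<Token> tokens, Map<String, String> colors) {
        List<Token> sorted = new ArrayList<>(tokens);
        sorted.sort(Comparator.comparingInt(t -> t.start));
        StringBuilder html = new StringBuilder("<pre>");
        int pos = 0;
        for (Token t : sorted) {
            if (t.start < pos) continue; // this simple sketch skips overlapping tokens
            html.append(escape(text.substring(pos, t.start)));
            String color = colors.getOrDefault(t.type, "black");
            html.append("<span style=\"color:").append(color)
                .append("\" title=\"").append(escape(t.info)).append("\">")
                .append(escape(text.substring(t.start, t.end))).append("</span>");
            pos = t.end;
        }
        html.append(escape(text.substring(pos))).append("</pre>\n<ul>");
        // Legend: one entry per token type, drawn in its color.
        for (Map.Entry<String, String> e : colors.entrySet()) {
            html.append("<li style=\"color:").append(e.getValue()).append("\">")
                .append(escape(e.getKey())).append("</li>");
        }
        return html.append("</ul>").toString();
    }
}
```

Use a `LinkedHashMap` for the colors if the legend order matters. The result can be written to a file and opened in a browser.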
