I am using CoreNlp to get the information extraction from a large text. However, its using the "triple" approach where a single sentence produce many output which is good, but there are some sentences that doesn't make sense. I tried to eliminate this by running another unsupervised NLP and try to utilize function in CoreNlp, yet I stuck at getting word vector form CoreNlp. Can anyone point where do I need to start searching for codes that do the word embedding in CoreNlp? Also I am newbie in java and IT.
There are some open libraries like glove, word2vec, text2vec, but I noticed glove already been used in CoreNlp (correct me if wrong).
since training your own model from scratch might turn out to be a time-consuming task, you could just download pretrained vectors from:
https://nlp.stanford.edu/projects/glove/
however, there is an example with dl4j here that might do to trick:
https://github.com/deeplearning4j/dl4j-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/nlp/glove/GloVeExample.java
I want to use a math-expression parser of java code. In particular I would like to convert a math-expression given as String to an abstract syntax tree consisted of separate nodes.
Is there anyone to recommend me a relevant open source tool?
If no, how do you reckon the possibility to exploit Intellij source code to do this work?
Which classes are responsible for code parsing and analysis?
Are they included in idea.jar? How can I easily infiltrate their functionality (methods etc)?
I am speaking exclusively for Intellij.
Take a look at MVEL library.
If you only want the results of the math-expression you should revise the question and the answer i selected months ago:
Java 1.5: mathematical formula parser
Brieff description: use the java integration with dinamyc languajes like javascript to let them do the work for you
I would not use IntelliJ, as much as I love it.
If you need an AST, look no further than ANTLR. If you can write a grammar for your equations, ANTLR can generate a lexer/parser to create it for you.
I'm looking for a way to automatically generate source code for new methods within an existing Java source code file, based on the fields defined within the class.
In essence, I'm looking to execute the following steps:
Read and parse SomeClass.java
Iterate through all fields defined in the source code
Add source code method someMethod()
Save SomeClass.java (Ideally, preserving the formatting of the existing code)
What tools and techniques are best suited to accomplish this?
EDIT
I don't want to generate code at runtime; I want to augment existing Java source code
What you want is a Program Transformation system.
Good ones have parsers for the language you care about, build ASTs representing the program for the parsed code, provide you with access to the AST for analaysis and modification, and can regenerate source text from the AST. Your remark about "scanning the fields" is just a kind of traversal of the AST representing the program. For each interesting analysis result you produce, you want to make a change to the AST, perhaps somewhere else, but nonetheless in the AST.
And after all the chagnes are made, you want to regenerate text with comments (as originally entered, or as you have constructed in your new code).
There are several tools that do this specifically for Java.
Jackpot provides a parser, builds ASTs, and lets you code Java procedures to do what you want with the trees. Upside: easy conceptually. Downside: you write a lot more Java code to climb around/hack at trees than you'd expect. Jackpot only works with Java.
Stratego and TXL parse your code, build ASTs, and let you write "surce-to-source" transformations (using the syntax of the target language, e.g., Java in this case) to express patterns and fixes. Additional good news: you can define any programming language you like, as the target language to be processed, and both of these have Java definitions.
But they are weak on analysis: often you need symbol tables, and data flow analysis, to really make analyses and changes you need. And they insist that everything is a rewrite rule, whether that helps you or not; this is a little like insisting you only need a hammer in toolbox; after all, everything can be treated like a nail, right?
Our DMS Software Reengineering Toolkit allows the definition of an abitrary target language (and has many predefined langauges including Java), includes all the source-to-source transformation capabilities of Stratego, TXL, the procedural capability of Jackpot,
and additionally provides symbol tables, control and data flow analysis information. The compiler guys taught us these things were necessary to build strong compilers (= "analysis + optimizations + refinement") and it is true of code generation systems too, for exactly the same reasons. Using this approach you can generate code and optimize it to the extent you have the knowledge to do so. One example, similar to your serialization ideas, is to generate fast XML readers and writers for specified XML DTDs; we've done that with DMS for Java and COBOL.
DMS has been used to read/modify/write many kinds of source files. A nice example that will make the ideas clear can be found in this technical paper, which shows how to modify code to insert instrumentation probes: Branch Coverage Made Easy.
A simpler, but more complete example of defining an arbitrary lanauges and transformations to apply to it can be found at How to transform Algebra using the same ideas.
Have a look at Java Emitter Templates. They allow you to create java source files by using a mark up language. It is similar to how you can use a scripting language to spit out HTML except you spit out compilable source code. The syntax for JET is very similar to JSP and so isn't too tricky to pick up. However this may be an overkill for what you're trying to accomplish. Here are some resources if you decide to go down that path:
http://www.eclipse.org/articles/Article-JET/jet_tutorial1.html
http://www.ibm.com/developerworks/library/os-ecemf2
http://www.vogella.de/articles/EclipseJET/article.html
Modifying the same java source file with auto-generated code is maintenance nightmare. Consider generating a new class that extends you current class and adds the desired method. Use reflection to read from user-defined class and create velocity templates for the auto-generating classes. Then for each user-defined class generate its extending class. Integrate the code generation phase in your build lifecycle.
Or you may use 'bytecode enhancement' techniques to enhance the classes without having to modify the source code.
Updates:
mixing auto-generated code always pose a risk of someone modifying it in future to just to tweak a small behavior. It's just the matter of next build, when this changes will be lost.
you will have to solely rely on the comments on top of auto-generated source to prevent developers from doing so.
version-controlling - Lets say you update the template of someMethod(), now all of your source file's version will be updated, even if the source updates is auto-generated. you will see redundant history.
You can use cglib to generate code at runtime.
Iterating through the fields and defining someMethod is a pretty vague problem statement, so it's hard to give you a very useful answer, but Eclipse's refactoring support provides some excellent tools. It'll give you constructors which initialize a selected set of the defined members, and it'll also define a toString method for you.
I don't know what other someMethod()'s you'd want to consider, but there's a start for you.
I'd be very wary of injecting generated code into files containing hand-written code. Hand-written code should be checked into revision control, but generated code should not be; the code generation should be done as part of the build process. You'd have to structure your build process so that for each file you make a temporary copy, inject the generated source code into it, and compile the result, without touching the original source file that the developers work on.
Antlr is really a great tool that can be used very easily for transforming Java source code to Java source code.
I'm trying to find a solution to highlight part of a text file in Java.
Basically, what I'm doing is lexing and parsing a text file respecting a certain grammar, storing some information related to the various elements of this file and then logging the information to a database.
I would like to have something more visual like a representation of the text file with some parts highlighted (and an index of the various colors used) - or even better with some context-sensitive information attached to a particular token.
Is there an easy way to do so? Basically what I would like to have, in terms of features, is a really primitive Eclipse plugin for a particular language and stand-alone. Maybe there's a framework to build DSL editors, something like that.
Hope it is clear...
Thanks
I think Xtext is just what you are looking for, it generates an Eclipse editor and more from a grammar.
Although not for Eclipse, there's MPS by JetBrains (the makers of the now open source IntelliJ IDEA) which may be worth taking a look at:
http://www.jetbrains.com/mps/
I writing a dynamic HTML parsers functionality.
I will want to modify existing parsers and also would want to add more parsers (I expect parsers will be modified as sites a remodified and new parsers will be needed for new sites).
I started writing a generic functionality which use a XML with conditions and rules for each site but as this works fine for now, I'm pretty sure it will need constant modifications...
The parsers will parse and write the data to a DB.
My application runs on JBOSS 4.
Any known best practice for that?
Thanks,
Rod
Thanks for your answer. Maybe I was unclear. I realized that imm. from the rate my question got. What I am writing feature that manage parsers execution. Each parser will parse a different text document structure. Documents structure might change from time to time and more new structured document will be added to be parsed. I dont want to recompile build deploy my application for each arser change.
I want to manage the execution of each parser as theymight be executed in parralel or according to execution rules.
Does Using Java ScriptingEngine might be a good option?
There are lots of ways to have some code that can be modified without redeploying. Using groovy scripts to do the parsing is one. Is is a rather simple matter to check to see if the script has been modified and automatically reload it.
The design sounds convoluted to me, but IFF you prove to yourself there's not a much simpler way to accomplish the same task, you may want a rules engine like Drools...