Using Apache UIMA ConceptMapper in a "proof-of-concept mode" - java

I'm trying to use UIMA ConceptMapper to extract some key concepts and other interesting metadata from text documents. Due to the time constraints of the project and the fact that I'm not sure if UIMA ConceptMapper will work in this scenario, does anyone know of any quick way to create a basic program using ConceptMapper? That is, can I get away with a quick proof-of-concept without having to write:
Analysis engine descriptor
Different structures, interfaces, etc.
other various meta-stuff
just to see what it can annotate from a single document? Obviously, if it works on a proof-of-concept level, then the long-term plan is to have all those structures in place...

Have you tried the UIMA Ruta Workbench? It will let you quickly prototype with WORDTABLE and WORDLIST, similar to what ConceptMapper can do.


Niche Templating Engine for Batch Jobs

Let me start by saying this is more of an exploratory question than a technical problem. I feel it doesn't belong on Code Review because there's nothing to review. I'm just trying to figure out the best approach to take.
My requirement is to build a batch process that can process user-defined files. These files usually come from external sources, so the filenames are not standard. One requirement that's causing me some headaches is supporting arbitrary dates in the filenames. And since these are batch job definitions that run on particular intervals, the definition has to be flexible enough to support it.
For example, one definition might be
File1_Type1_{CurrentDate in YYMMDD}
File1_Type2_{CurrentDate in YYYYMMDD}
File1_Type3_Static_Text
So basically, I feel like I need a full-fledged template engine in order to support these cases. However, that sounds like huge overkill, so I'm interested to hear people's thoughts on this.
Since I'm working in Java/Scala, I found this library:
https://scalate.github.io/scalate/documentation/ssp-reference.html
If we let users create ssp files like so:
#import(java.util.Date)
File1_Type1_${new Date}
then it gives the user full control over the entire formatting. But it feels like overkill to me. Or is it? Any feedback is welcome.
There are a huge number of template engines in the Java space. I've used Apache Velocity and FreeMarker (https://freemarker.apache.org/).
I'm not aware of anything in the Scala space, but it would be an intriguing idea.
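If a full template engine does turn out to be overkill, a single placeholder type like the ones in the question can be resolved with java.time and a regular expression alone. The sketch below is a minimal illustration, not a recommendation over Velocity or FreeMarker; the class name FilenameTemplate and the exact {CurrentDate in ...} syntax are assumptions taken from the question's examples. Note that DateTimeFormatter treats uppercase Y as week-based year and uppercase D as day-of-year, so the user-facing tokens are translated to lowercase before formatting.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FilenameTemplate {
    // Hypothetical placeholder syntax from the question, e.g. {CurrentDate in YYMMDD}
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\{CurrentDate in ([YMD]+)\\}");

    /** Replaces every {CurrentDate in ...} placeholder with the given date; other text is kept. */
    public static String resolve(String template, LocalDate date) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Translate YYYY/MM/DD tokens into java.time letters (yyyy, MM, dd):
            // in DateTimeFormatter 'Y' is week-based year and 'D' is day-of-year.
            String javaPattern = m.group(1).replace('Y', 'y').replace('D', 'd');
            String formatted = date.format(DateTimeFormatter.ofPattern(javaPattern));
            m.appendReplacement(out, Matcher.quoteReplacement(formatted));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        LocalDate d = LocalDate.of(2024, 3, 7);
        System.out.println(resolve("File1_Type1_{CurrentDate in YYMMDD}", d));   // File1_Type1_240307
        System.out.println(resolve("File1_Type2_{CurrentDate in YYYYMMDD}", d)); // File1_Type2_20240307
        System.out.println(resolve("File1_Type3_Static_Text", d));               // File1_Type3_Static_Text
    }
}
```

The trade-off is that every new kind of placeholder (yesterday's date, a sequence number) needs its own pattern here, which is exactly the point at which a real template engine starts to pay for itself.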

Does Mahout work in real time, or does it pre-process the data based on the algorithm rules?

I am trying to build a recommendation engine, and I am thinking of using Apache Mahout for it, but I can't make out whether Mahout processes the data in real time or pre-processes it when the server is idle and stores the results somewhere in the database.
Also, does anyone know what approach sites like Amazon and Netflix follow?
Either/or, but not both. It contains parts, from an older project, that are essentially real time at moderate scale. There are also Hadoop-based implementations, which are all offline. The two are not related.
I am a primary creator of these parts, and if you want a system that does both together, I suggest you look at my current project Myrrix (http://myrrix.com)

Java; Runtime Interpretation; Strategies To Add Plugins

I'm starting on my first large project. It will be a program very similar to Rosetta Stone: a program for learning a foreign language, written in Java using Swing. In my program I plan to let the user select downloaded courses to learn from. I will be able to create an English course since I am a native English speaker. However, I want people who speak other languages to be able to write courses for users as well (this is essential for my program to work).
Since I want users to be able to download courses for the languages they want, hard-coding the courses into the program is out of the question. The courses need to be interpreted at runtime. Also, since I want others to collaborate on my work (i.e. make courses), I need to make it easy for them to do so.
What would be the best way to go about doing this?
The idea I have come up with is a strict, empty course outline (hard-coded) with a simple XML file that details the text and sounds to be used. The drawback is that this severely limits the author. Different languages may need to start out by teaching different things.
Any advice on the problem at hand as well as the project as a whole will be greatly appreciated. Any links to any relevant resources or information would also be greatly appreciated.
Thank you for your time and effort,
Joseph Pond
Simply, you should base your program on a system such as Eclipse RCP, or the Netbeans Platform. Both of these systems already deal with exactly this problem, and both are perfectly adequate for this task. They're not just for IDEs.
It's a larger first step, as you will need to learn one of these platforms in addition to Swing.
But, they solve the problem, and their overall organization and technique will serve your program well anyway.
Don't reinvent this wheel, just learn one of these instead.
If you are set on doing this from scratch (Will's idea isn't bad), what I would do is first lay down the file format that would make it easiest to create your language course. It could be XML, plain text or some other format you come up with yourself.
You will probably need some flexibility in the course format because you will want to be able to specify things like questions and answers. XML is a pain because of all the closing tags, but it gives you a good amount of metadata. If you like XML for that reason, you may consider defining your language file in YAML instead: it gives you the same structured data as XML but uses indentation instead of angle brackets.
You probably also want to write each file in the language it's created for, so you might or might not want to require English words as keys. If you don't want any English at all, you may have to skip both XML and YAML and come up with your own file format, possibly one where the layout and/or special symbols define the flow and "functionality".
Once you have defined the file format, you won't have to worry about hard-coding anything... you won't be able to because it will already be in the file.
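To make the "it's all in the file" idea concrete, here is a minimal sketch of reading an XML course definition with the JDK's built-in DOM parser. The element and attribute names (course, lesson, item, sound) and the class name CourseLoader are hypothetical, chosen only to match the "text and sounds" outline from the question; a real format would be whatever you settle on.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class CourseLoader {
    /** One step of a lesson: the text to show and an optional sound file. */
    public static final class Item {
        public final String text;
        public final String sound;
        Item(String text, String sound) { this.text = text; this.sound = sound; }
    }

    /** Parses every <item> element of a (hypothetical) course document, in document order. */
    public static List<Item> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("item");
            List<Item> items = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                // getAttribute returns "" when the attribute is missing
                items.add(new Item(e.getTextContent().trim(), e.getAttribute("sound")));
            }
            return items;
        } catch (ParserConfigurationException | SAXException | IOException ex) {
            throw new IllegalArgumentException("Invalid course file", ex);
        }
    }

    public static void main(String[] args) {
        String xml = "<course lang=\"es\">"
                + "<lesson title=\"Saludos\">"
                + "<item sound=\"hola.ogg\">Hola</item>"
                + "<item>Gracias</item>"
                + "</lesson></course>";
        for (Item it : parse(xml)) {
            System.out.println(it.text + " [" + it.sound + "]");
        }
    }
}
```

The program only knows the structure (items with text and an optional sound); the content, order and language all come from the file, so a new course is just a new file.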
Plug-in functionality would be nice as well... This is where your definition file also contains information that tells you what class to instantiate (reflectively) and use to parse/display the data. In that way you could add new types of questions just by delivering a new jar file.
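A minimal sketch of the reflective plug-in idea, assuming a hypothetical QuestionType interface that every question type implements and a course file that names the implementing class. The demo takes the class name from the class itself so it runs in any package; in practice the string would come from the definition file. To pick up classes from a newly delivered jar, you would pass a java.net.URLClassLoader built over that jar instead of the application's own class loader.

```java
import java.util.Map;

public class PluginDemo {
    /** Hypothetical plug-in contract that every question type implements. */
    public interface QuestionType {
        String render(Map<String, String> data);
    }

    /** Example implementation; a real one might ship in a separate jar. */
    public static class TranslateQuestion implements QuestionType {
        public TranslateQuestion() {} // plug-ins need a public no-arg constructor

        @Override
        public String render(Map<String, String> data) {
            return "Translate: " + data.get("prompt");
        }
    }

    /** Loads the named class through the given loader and checks it implements QuestionType. */
    public static QuestionType load(String className, ClassLoader loader) {
        try {
            Class<?> cls = Class.forName(className, true, loader);
            // asSubclass throws ClassCastException if the class is not a QuestionType
            return cls.asSubclass(QuestionType.class).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot load question type " + className, e);
        }
    }

    public static void main(String[] args) {
        // In practice this name would be read from the course definition file.
        String typeFromCourseFile = TranslateQuestion.class.getName();
        QuestionType q = load(typeFromCourseFile, PluginDemo.class.getClassLoader());
        System.out.println(q.render(Map.of("prompt", "la casa"))); // Translate: la casa
    }
}
```

java.util.ServiceLoader is a standard alternative that discovers implementations listed in a jar's META-INF/services file, which saves the course file from having to name classes at all.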
If this is confusing, sorry, this is difficult in a one-way forum because I can't look at your face and see if you're following me or if I'm even going in the right direction. If you think I'm on the right track and want more details (I've done a bit of this stuff before) feel free to leave a follow-up question (or an email address) in a comment and I'd be glad to discuss it with you further.
If I was doing this, I'd seriously consider using Eclipse EMF to model the "language" for defining courses. EMF is rather daunting to start with, but it gives you:
A high-level model that can be entered/edited in a variety of ways.
An automatic mechanism for serializing "instances" (i.e. courses) to XML. (And you can tinker with the serialization if you choose.)
Automatically generated Java classes for in-memory representations of your instances. These provide APIs that are tuned to your model, and generic ones that are the EMF equivalent of Java reflection ... but based on EMF model classes rather than Java classes.
An automatically generated tree editor for your "instances".
Hooks for implementing your own constraints / validation rules to say what is a valid "course".
Related Eclipse plugins offer:
Mappings to text-based languages with generation of parsers/unparsers
Mappings to graphical languages; e.g. notations using boxes / arrows / etc
Various more advanced persistence mechanisms
Comparisons/differencing, model-to-model transformations, constraints in OCL, etc
I've used EMF in a couple of largish projects, and the main point that keeps me coming back for more is ease of model evolution ... compared with building everything at a lower level of abstraction. If my model (language) needs to be extended / changed, I can make the necessary changes using the EMF Model editor, regenerate the code, extend my custom code to do the right stuff with the extensions, and I'm pretty much done (modulo conversion of stored instances).

How to create WSDL file given SOAP WSDL operations

I haven't had any experience with web service related development. So, any ideas will be greatly appreciated.
Suppose I have a file listing a draft specification of WSDL operations; one example follows. How would I go about creating the WSDL file? Is Notepad sufficient, or do I need a WSDL editor?
getHostSystemInfo
Returns detailed information about host systems specified via given IDs.
input HostSystemIdCollection(Collection of Strings)
Output HostSystemInfoCollection
HostSystemInfo
Id: mandatory
Properties: Following properties should be provided for host systems
HostSystemName
HostSystemProperty1
HostSystemProperty2
HostSystemProperty3
....
....
If the question is just "how do I create the WSDL", then you could indeed use Notepad and just write it; it's only XML after all. However, writing syntactically correct XML by hand is pretty dull and error-prone, so I would recommend using WSDL-aware tooling, for example an Eclipse editor.
An alternative is to write some Java that expresses the interface and generate the WSDL from it. There are many ways of doing this, including starting with an EJB and annotating it accordingly. A few web searches should help you find what you need.
My experience is that simple POC situations tend to work well starting from the Java. Larger-scale projects benefit from considered designs starting from the WSDL.
Coding WSDL by hand is a big pain! I used an XML editor to create it and then generated the stubs with JAX-WS. It is important to understand the differences between the WSDL styles, which is not trivial (have a look at WSDL styles). A good help is to import the WSDL schema into your IDE (Eclipse, IDEA) and then work with autocompletion.
Just out of interest, why are you using WSDL + SOAP? If you have a choice and you are using HTTP anyway, have a look at REST. It can make implementing a web API a LOT easier, both on the server side and for API clients.
If you haven't done any web services before, I would strongly recommend a WSDL editor. NetBeans has a plugin that should help.
The other way of doing it, which may be easier is by using the Java annotations defined in JSR 181.
Of course you could use the worst text editor in the world (!), but I'd seriously consider using any decent XML editor or IDE (Eclipse's WSDL support is pretty decent). This will save you a lot of pain and suffering.
Or, if this is an option, you could just annotate a Java class with JAX-WS annotations and have your WSDL dynamically generated from the Java code. Personally, I prefer the WSDL-first approach, the Java-first approach is just a suggestion to get you started.
You could use Axis2 to create that for you.

Executing Constantly Changing Logic

I'm writing functionality for dynamic HTML parsers.
I will want to modify existing parsers and also add more parsers (I expect parsers will need to be modified as sites are redesigned, and new parsers will be needed for new sites).
I started writing generic functionality that uses an XML file with conditions and rules for each site, but although this works fine for now, I'm pretty sure it will need constant modification...
The parsers will parse and write the data to a DB.
My application runs on JBoss 4.
Any known best practice for that?
Thanks,
Rod
Thanks for your answer. Maybe I was unclear; I realized that immediately from the rating my question got. What I am writing is a feature that manages parser execution. Each parser will parse a different text document structure. Document structures might change from time to time, and new document structures will be added. I don't want to recompile, build and deploy my application for each parser change.
I want to manage the execution of each parser, as they might be executed in parallel or according to execution rules.
Might using the Java ScriptEngine (javax.script) be a good option?
There are lots of ways to have code that can be modified without redeploying. Using Groovy scripts to do the parsing is one. It is a rather simple matter to check whether the script has been modified and automatically reload it.
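The check-and-reload part can be sketched with the JDK alone. This minimal sketch is not Groovy-specific: the hypothetical ReloadingResource class caches whatever a loader function produces from a file and re-runs the loader whenever the file's last-modified time changes. Here the loader simply reads the file as text; with the Groovy approach it would compile the script instead (not shown). get() is synchronized so parsers running in parallel see a consistent version.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.function.Function;

/** Re-runs a loader whenever the backing file's modification time changes. */
public class ReloadingResource<T> {
    private final Path path;
    private final Function<Path, T> loader;
    private FileTime loadedAt; // modification time of the currently cached version
    private T value;

    public ReloadingResource(Path path, Function<Path, T> loader) {
        this.path = path;
        this.loader = loader;
    }

    /** Returns the cached value, reloading first if the file changed since the last load. */
    public synchronized T get() {
        try {
            FileTime current = Files.getLastModifiedTime(path);
            if (value == null || !current.equals(loadedAt)) {
                value = loader.apply(path);
                loadedAt = current;
            }
            return value;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static String readText(Path p) {
        try {
            return Files.readString(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Self-check: returns the contents seen before and after an edit to the file. */
    static String demo() {
        try {
            Path file = Files.createTempFile("parser-rules", ".txt");
            Files.writeString(file, "v1");
            Files.setLastModifiedTime(file, FileTime.fromMillis(1_000_000L));
            ReloadingResource<String> rules = new ReloadingResource<>(file, ReloadingResource::readText);
            String before = rules.get();
            // Simulate someone editing the parser definition without redeploying
            Files.writeString(file, "v2");
            Files.setLastModifiedTime(file, FileTime.fromMillis(2_000_000L));
            String after = rules.get();
            Files.delete(file);
            return before + " -> " + after;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // v1 -> v2
    }
}
```

Polling the modification time on each get() is the simplest design; java.nio.file.WatchService is an alternative if you would rather be notified of changes than check on every call.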
The design sounds convoluted to me, but IFF you prove to yourself there's not a much simpler way to accomplish the same task, you may want a rules engine like Drools...
