The list below shows some of the classifier-related packages in mahout-distribution-0.8:
org.apache.mahout.classifier
org.apache.mahout.classifier.df
org.apache.mahout.classifier.df.builder
org.apache.mahout.classifier.df.data
org.apache.mahout.classifier.df.data.conditions
org.apache.mahout.classifier.df.mapreduce
org.apache.mahout.classifier.df.mapreduce.inmem
org.apache.mahout.classifier.df.mapreduce.partial
org.apache.mahout.classifier.df.node
org.apache.mahout.classifier.df.ref
org.apache.mahout.classifier.df.split
org.apache.mahout.classifier.df.tools
I guess "df" above stands for "decision forest". I'm not familiar with Mahout, and its source code is hard to follow, so I want to find a Mahout decision forest example showing how to use these packages, similar to the HelloWorldClustering code in Chapter 7, "Introduction to clustering," of Mahout in Action.
I have struggled with this problem for a while. I have read a lot of articles on the Internet but still haven't found an effective example showing how to write the code in a real project. Can anyone give me an example with code?
I've recently been using Mahout's DecisionForest, and the best resource I've found is Mark Needham and Jennifer Smith's example:
http://www.markhneedham.com/blog/2012/10/27/kaggle-digit-recognizer-mahout-random-forest-attempt/
Take a look at that; the GitHub repository is linked at the bottom of the page.
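For orientation, here is a minimal in-memory sketch along the lines of that post, assuming Mahout 0.8 on the classpath (class names come from the org.apache.mahout.classifier.df packages; the toy data, the "N N L" descriptor, and the tree count are made up for illustration, and exact method signatures may differ slightly between Mahout versions):

```java
import java.util.Random;
import org.apache.mahout.classifier.df.DecisionForest;
import org.apache.mahout.classifier.df.builder.DefaultTreeBuilder;
import org.apache.mahout.classifier.df.data.Data;
import org.apache.mahout.classifier.df.data.DataLoader;
import org.apache.mahout.classifier.df.data.Dataset;
import org.apache.mahout.classifier.df.data.Instance;
import org.apache.mahout.classifier.df.ref.SequentialBuilder;
import org.apache.mahout.common.RandomUtils;

public class HelloDecisionForest {
  public static void main(String[] args) throws Exception {
    // Toy CSV rows: two numeric features, last column is the class label.
    String[] trainRows = {
        "1.0,2.0,yes",
        "1.1,1.9,yes",
        "5.0,6.0,no",
        "5.2,6.1,no"
    };
    // Descriptor: N = numerical attribute, L = label column.
    Dataset dataset = DataLoader.generateDataset("N N L", false, trainRows);
    Data data = DataLoader.loadData(dataset, trainRows);

    Random rng = RandomUtils.getRandom();
    // Build a small forest sequentially (the "ref" package), no Hadoop needed.
    SequentialBuilder forestBuilder =
        new SequentialBuilder(rng, new DefaultTreeBuilder(), data);
    DecisionForest forest = forestBuilder.build(10); // 10 trees

    // Classify one of the training instances; the result is a label index.
    Instance instance = data.get(0);
    double prediction = forest.classify(dataset, rng, instance);
    System.out.println("predicted label index: " + prediction);
  }
}
```

The mapreduce packages you listed do the same job distributed over Hadoop; the ref package above is the simplest way to see the moving parts.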
Related
I am new to this domain. My goal is to find similarities between event log patterns, and for this I have selected the alpha algorithm. I have already seen videos about the heuristic approach in ProM. My confusion is how I can implement this in my Java project using the ProM framework/plugins. Is this possible? And have I selected the right algorithm for this task?
As I said, I am new to this domain, so it would be very helpful if someone could guide me on the first steps.
Thanks
You cannot. ProM stands on its own and does not support being embedded in other projects (such as a plain Java or web application). You can either write a ProM plugin to use your algorithm inside ProM, or create your own Java project and implement the process mining logic from the ground up.
You can implement your ProM plugin as a Java class. You can also modify the existing ProM plugins locally on your machine. However, the Alpha algorithm is not suitable for this task. There are plenty of plugins available that can help you in this regard. For example, if you consider directly-follows relations as the pattern, the "Discover Matrix" plugin could be useful.
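To make the "plugin as a Java class" idea concrete, here is a rough sketch of the usual shape of a ProM plugin, assuming the ProM framework and OpenXES jars are on the classpath (the plugin name, labels, and method body are hypothetical, and annotation attributes can vary between ProM versions):

```java
import org.deckfour.xes.model.XLog;
import org.processmining.framework.plugin.PluginContext;
import org.processmining.framework.plugin.annotations.Plugin;
import org.processmining.framework.plugin.annotations.PluginVariant;

public class MyPatternPlugin {

  // ProM discovers plugins via annotations; the framework passes in the
  // context and the event log the user selected in the workspace.
  @Plugin(name = "My Log Pattern Plugin",
          parameterLabels = { "Event Log" },
          returnLabels = { "Pattern Result" },
          returnTypes = { String.class })
  @PluginVariant(variantLabel = "Default", requiredParameterLabels = { 0 })
  public static String run(PluginContext context, XLog log) {
    // Your pattern-similarity logic over the traces of the log goes here.
    // XLog is a List of traces, so simple statistics are straightforward.
    return "number of traces: " + log.size();
  }
}
```

The class is compiled into ProM's plugin classpath rather than called from an external project, which is why embedding ProM in your own application does not work the other way around.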
I know that this question was asked before, but the answer was not satisfying (it was just a link).
So my question is: is there any way to extend the existing OpenNLP models? I already know about the technique using DBpedia/Wikipedia. But what if I just want to append some lines of text to improve the models? Is there really no way? (If so, that would be a real shortcoming...)
Unfortunately, you can't. See this question which has a detailed answer to the same problem.
I think that is a tough problem, because when you deal with texts you often run into licensing issues. For example, you cannot build a corpus from Twitter data and publish it to the community (see this paper for more information).
Therefore, companies often build domain-specific corpora and use them internally. We did this in our research project, where we built a tool (Quick Pad Tagger) to create annotated corpora efficiently (see here).
OK, I think this needs a separate answer.
I found the Yago database: http://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago//
This database seems fantastic at first glance. You can download all the tagged data and put it into a database (they already provide the tools for that).
The next step is to "refactor" the tagged entities so that OpenNLP can use them (OpenNLP expects a format like <START:person> Pierre Vinken <END>).
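To illustrate that format, here is a small self-contained helper (plain Java, no OpenNLP dependency; the class and method names are made up) that wraps an entity in the markers the name finder trainer expects, one sentence per line:

```java
public class NameSampleFormatter {

    // Wrap 'entity' inside 'sentence' with OpenNLP's <START:type> ... <END>
    // markers. The training tool expects one annotated sentence per line.
    static String annotate(String sentence, String entity, String type) {
        String marked = "<START:" + type + "> " + entity + " <END>";
        return sentence.replace(entity, marked);
    }

    public static void main(String[] args) {
        String line = annotate("Pierre Vinken will join the board",
                               "Pierre Vinken", "person");
        System.out.println(line);
        // → <START:person> Pierre Vinken <END> will join the board
    }
}
```

Once you have a file of such lines, OpenNLP's command-line TokenNameFinderTrainer can build a model from it.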
Then you create some text files and train a model with the training tool that OpenNLP provides.
I'm not 100% sure this works, but I will come back and report.
I have to build a lexical graph from the words in a corpus. For that, I need to write a program using word2vec.
The thing is, I'm new at this. I've been trying for four days to find a way to use word2vec, but I'm lost. My big problem is that I don't even know where to find the Java code (I've heard about deeplearning but couldn't find the files on their website) or how to integrate it into my project.
One of the easiest ways to use a Word2Vec representation in your Java code is deeplearning4j, the library you mentioned. I assume you have already seen the main pages of the project. As for the code, check these links:
Github repository
Examples
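To give an idea of the API, here is a minimal sketch assuming a recent deeplearning4j (plus an nd4j backend) on the classpath; the toy sentences and hyperparameter values are made up for illustration:

```java
import java.util.Arrays;
import org.deeplearning4j.models.word2vec.Word2Vec;
import org.deeplearning4j.text.sentenceiterator.CollectionSentenceIterator;
import org.deeplearning4j.text.sentenceiterator.SentenceIterator;
import org.deeplearning4j.text.tokenization.tokenizerfactory.DefaultTokenizerFactory;
import org.deeplearning4j.text.tokenization.tokenizerfactory.TokenizerFactory;

public class Word2VecSketch {
  public static void main(String[] args) {
    // In a real project you would iterate over your corpus files instead.
    SentenceIterator iter = new CollectionSentenceIterator(
        Arrays.asList("the cat sat on the mat", "the dog sat on the rug"));
    TokenizerFactory tokenizer = new DefaultTokenizerFactory();

    Word2Vec vec = new Word2Vec.Builder()
        .minWordFrequency(1)   // keep rare words; this corpus is tiny
        .layerSize(50)         // dimensionality of the word vectors
        .windowSize(5)
        .iterate(iter)
        .tokenizerFactory(tokenizer)
        .build();
    vec.fit();                 // trains the model

    // Nearest neighbours by vector similarity: the raw material
    // for the edges of a lexical graph.
    System.out.println(vec.wordsNearest("cat", 3));
  }
}
```

For a lexical graph, you could add an edge between two words whenever their cosine similarity (vec.similarity(a, b)) exceeds some threshold.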
I need to read and write some data in a .mdb Access file, and on the web I found the Jackcess library, which does exactly that.
Unfortunately, I couldn't find any documentation for it. On the library website there are a couple of examples, but no real documentation. Can anyone tell me if there is some sort of documentation somewhere?
The Javadoc is intended to be fairly self-explanatory. The primary classes are Database and Table. The library is also heavily unit tested, so you can dig into the unit-test code for many examples. There isn't currently a great "getting started" document; it has been discussed before, but unfortunately no one has picked up the ball on actually writing it. That said, the help forum is actively monitored.
UPDATE:
There is now a cookbook, which is the beginning of more comprehensive user-level documentation.
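As a quick orientation around those two primary classes, here is a minimal read/write sketch, assuming Jackcess 2.x on the classpath (the file name, table name, and column values are hypothetical):

```java
import java.io.File;
import com.healthmarketscience.jackcess.Database;
import com.healthmarketscience.jackcess.DatabaseBuilder;
import com.healthmarketscience.jackcess.Row;
import com.healthmarketscience.jackcess.Table;

public class JackcessSketch {
  public static void main(String[] args) throws Exception {
    // Open an existing .mdb file; Database is Closeable.
    try (Database db = DatabaseBuilder.open(new File("example.mdb"))) {
      Table table = db.getTable("Contacts");

      // Read: a Table is Iterable over its rows.
      for (Row row : table) {
        System.out.println(row.get("Name"));
      }

      // Write: values are passed in column order.
      table.addRow("Alice", 42);
    }
  }
}
```

The unit tests mentioned above exercise these same calls in many more variations, which makes them a good substitute for a tutorial.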
You can use jackcess-orm, which uses the DAO pattern and POJOs with annotations.
I have a Maven project imported into Eclipse. I'm trying to understand the code's architecture. What is the best way to do this?
Would a UML Eclipse plugin help with this?
Would sequence diagrams help?
Which plugins should I use?
Please share your opinion.
When I work with an open source project/codebase, I get a high-level view and focus on the core code/logic by checking the package names and structure. I then typically work out how the API is used by looking at any example code or documentation in the project. If I still need more help, I draw up some inheritance diagrams, print out interesting classes that I may need to change significantly, and look for more examples of the code being used elsewhere.
I am biased, as I have been using our recently launched Architexa Eclipse plugin to accomplish the above. I am sure there are other plugins that do something similar.
I guess you will find some pointers in this SE-Radio podcast: Episode 148: Software Archaeology with Dave Thomas.
Of course, UML can help, but then again it might not. For reverse engineering, there is the MoDisco project in Eclipse, which might be useful.