Custom recommender jobs using Apache Mahout 0.11.2 over Hadoop - Java

I am a newbie to Apache Mahout and am using Apache Mahout 0.11.2. To give it a try, I created a Java class called SampleReccommender.java, shown below.
package f;

import java.io.File;
import java.util.List;

import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.ThresholdUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.UserBasedRecommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class SampleReccommender {
    public static void main(String[] args) {
        try {
            // Load the preference data from the file given on the command line.
            DataModel datamodel = new FileDataModel(new File(args[0]));
            // Create the UserSimilarity object.
            UserSimilarity usersimilarity = new PearsonCorrelationSimilarity(datamodel);
            // Create the UserNeighborhood object.
            UserNeighborhood userneighborhood = new ThresholdUserNeighborhood(1.0, usersimilarity, datamodel);
            // Create the user-based recommender.
            UserBasedRecommender recommender = new GenericUserBasedRecommender(datamodel, userneighborhood, usersimilarity);
            // Recommend 3 items for user 2.
            List<RecommendedItem> recommendations = recommender.recommend(2, 3);
            System.out.println(recommendations.size());
            for (RecommendedItem recommendation : recommendations) {
                System.out.println(recommendation);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
I managed to run the same code from the command line with:
java -cp n.jar f.SampleReccommender n_lib/wishlistdata.txt
Now, from what I read on the internet and in the book "Mahout in Action", I understood that the same code can be run on Hadoop using the following commands.
First, I need to include my SampleReccommender class in the existing apache-mahout-distribution-0.11.2/mahout-mr-0.11.2-job.jar, so I followed this procedure:
jar uf /Users/rohitjain/Documents/apache-mahout-distribution-0.11.2/mahout-mr-0.11.2-job.jar samplerecommender.jar
Then I tried running the Mahout job with the following command:
bin/hadoop jar /Users/rohitjain/Documents/apache-mahout-distribution-0.11.2/mahout-mr-0.11.2-job.jar org.apache.mahout.cf.taste.hadoop.item.RecommenderJob -i /input/wishlistdata.txt -o /output/ --recommenderClassName \ f.SampleRecommender
But it gives me the error:
Unexpected --recommenderClassName while processing Job-Specific Options:
I based the above command on the syntax given in the "Mahout in Action" book, shown below:
hadoop jar mahout-core-0.5-job.jar \ org.apache.mahout.cf.taste.hadoop.pseudo.RecommenderJob \ -Dmapred.input.dir=input/ua.base.hadoop \ -Dmapred.output.dir=output \ --recommenderClassName \ org.apache.mahout.cf.taste.impl.recommender.slopeone.SlopeOneRecommender
Am I doing anything wrong? Also, can the same code I used for the standalone implementation be used for RecommenderJob, or does it require an altogether different implementation?

Mahout in Action is out of date, and the code you are using is being deprecated.
These days Mahout runs on more modern compute platforms like Spark. For the latest Mahout recommender, you can start with the command-line interface to spark-itemsimilarity and integrate it with Solr or Elasticsearch. Or you can pick up a fully integrated end-to-end solution linked below (a rough CLI sketch follows the links):
Building a recommender with Mahout: http://mahout.apache.org/users/algorithms/recommender-overview.html
Mahout spark-itemsimilarity: http://mahout.apache.org/users/recommender/intro-cooccurrence-spark.html
Universal Recommender from ActionML: https://github.com/actionml/template-scala-parallel-universal-recommendation
The UR is built on PredictionIO ML Framework here: https://github.com/actionml/PredictionIO
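As a rough sketch of the spark-itemsimilarity CLI route, an invocation might look like the following; the input file name follows the question above, the output directory is a placeholder, and you should check the intro-cooccurrence-spark page linked above for the exact flags:
bin/mahout spark-itemsimilarity --input wishlistdata.txt --output similarity-output --master local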

Related

Using MongoDB and Java WITHOUT Gradle, Maven or an IDE

I want to use MongoDB in Java, without an IDE or additional tools. I have downloaded mongo-java-driver-3.12.8.jar and put it in the same folder as my helloMongo.java file.
I then tried to run it with:
javac -cp "mongo-java-driver.jar" helloMongo.java
java -cp "mongo-java-driver.jar" helloMongo
Only to be told that it cannot find the main class.
Then I tried, assuming the main path had been lost in Java's braindead implementation:
javac -cp ".;mongo-java-driver.jar" helloMongo.java
java -cp ".;mongo-java-driver.jar" helloMongo
Still no luck. Then I tried:
javac -cp ".;/mongo-java-driver.jar" helloMongo.java
java -cp ".;/mongo-java-driver.jar" helloMongo
And a hundred other variants, and still no luck.
Are an IDE and Gradle essentially required to use Mongo with Java?
package com.javatpoint.java.mongo.db;

import com.mongodb.MongoClient;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;

public class JavaMongoDemo {
    public static void main(String[] args) {
        try {
            //---------- Connecting to the database ------------------//
            MongoClient mongoClient = new MongoClient("localhost", 27017);
            //---------- Creating the database -----------------------//
            MongoDatabase db = mongoClient.getDatabase("javatpoint");
            //---------- Creating the collection ---------------------//
            MongoCollection<Document> table = db.getCollection("employee");
            //---------- Creating the document -----------------------//
            Document doc = new Document("name", "Peter John");
            doc.append("id", 12);
            //---------- Inserting data ------------------------------//
            table.insertOne(doc);
        } catch (Exception e) {
            System.out.println(e);
        }
    }
}
If your package is com.javatpoint.java.mongo.db, then your class file has to be in
./com/javatpoint/java/mongo/db
and, assuming you leave the Mongo jar in the same directory as your source, your java command must put both the current directory and the jar on the classpath:
java -cp ".;./com/javatpoint/java/mongo/db/mongo-java-driver.jar" com.javatpoint.java.mongo.db.helloMongo
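The question's helloMongo.java itself isn't shown; as a minimal sketch, a class matching that layout would declare the package and a main method, with a trivial use of the driver to prove the classpath works (host and port are the MongoDB defaults):
package com.javatpoint.java.mongo.db;

import com.mongodb.MongoClient;

public class helloMongo {
    public static void main(String[] args) {
        // Connect to a local MongoDB on the default port; adjust as needed.
        MongoClient client = new MongoClient("localhost", 27017);
        System.out.println("Connected to " + client.getAddress());
        client.close();
    }
}
Compiled and run from the project root (Windows classpath separator ';'), the full round trip would then be:
javac -cp ".;./com/javatpoint/java/mongo/db/mongo-java-driver.jar" com/javatpoint/java/mongo/db/helloMongo.java
java -cp ".;./com/javatpoint/java/mongo/db/mongo-java-driver.jar" com.javatpoint.java.mongo.db.helloMongo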

Import Paramiko in Jython

I am trying to import the Python paramiko module from a Java program, so I used Jython. When I try to import paramiko from Jython, it gives the error below:
Exception in thread "main" Traceback (most recent call last):
File "", line 1, in
ImportError: No module named paramiko
Please advise me on how to import paramiko from Jython.
import org.python.core.PyException;
import org.python.util.PythonInterpreter;

public class jythonTest {
    public static void main(String[] args) throws PyException {
        PythonInterpreter interp = new PythonInterpreter();
        interp.exec("import sys");
        interp.exec("import paramiko");
        interp.exec("import time");
    }
}
This might be because Jython doesn't read the Python packages from the place where you might have installed them through the CPython CLI.
One way to solve your problem is to install Paramiko during the execution of the code:
PythonInterpreter interp = new PythonInterpreter();
interp.exec("from pip._internal import main as pip_main");
interp.exec("pip_main(['install', 'paramiko'])");
interp.exec("import paramiko");
Or
PythonInterpreter interp = new PythonInterpreter();
interp.exec("from pip import main as pip_main");
interp.exec("pip_main(['install', 'paramiko'])");
interp.exec("import paramiko");
Refer to Installing python module within code for more ways to install packages from code, depending on your Python version. The above should hold good for Python 2.7, which is what I believe Jython is based on.
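Alternatively, if Paramiko is already installed for a CPython 2.7 interpreter on the same machine, a sketch like the following can point Jython at that installation. The site-packages path is an example; substitute your own. Note also that packages relying on C extensions may still fail under Jython:
import org.python.core.PyException;
import org.python.util.PythonInterpreter;

public class JythonParamikoPath {
    public static void main(String[] args) throws PyException {
        PythonInterpreter interp = new PythonInterpreter();
        // Make an existing CPython 2.7 site-packages directory visible to Jython.
        // The path below is an example; substitute your installation's path.
        interp.exec("import sys");
        interp.exec("sys.path.append('/usr/lib/python2.7/site-packages')");
        interp.exec("import paramiko");
    }
}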

How to import LibSVM into my Java code

In Java programming we first add weka.jar to our classpath, so that we can call all the classification or clustering algorithms in WEKA in the form of the following code:
import weka.classifiers.trees.RandomForest;
...
RandomForest rf = new RandomForest(); // RandomForest object
But unfortunately, we cannot use this approach to import the LibSVM algorithm, because there is no such class in weka.jar.
So, my question is: how do I import LibSVM into my Java code? Any help will be appreciated :)
Firstly, I'd like to say there are many ways to solve this problem. The solution mentioned here is quite simple, but the other answers from StackOverflow are not described in detail, which cost me too much time to verify. So I'm happy to share it with all WEKA beginners :)
a) Download the LibSVM.jar from Maven Repository Center. Note that this LibSVM.jar is different from the libsvm.jar developed by Chih-Chung Chang and Chih-Jen Lin;
b) Add the LibSVM.jar to the classpath of our Java project;
c) Call the LibSVM classifier where you need it; see the following Java code.
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.functions.LibSVM; // contained in LibSVM.jar
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

String path = "file/train.arff";
Instances train = DataSource.read(path);                 // load the dataset
train.setClassIndex(train.numAttributes() - 1);          // set the class index to the last attribute
LibSVM svm = new LibSVM();                               // create the SVM classifier
svm.buildClassifier(train);
Evaluation eval = new Evaluation(train);
eval.crossValidateModel(svm, train, 10, new Random(1));  // 10-fold cross-validation
System.out.println(eval.toSummaryString());              // print the evaluation summary
See: https://weka.wikispaces.com/LibSVM
Use Weka's package manager to install LibSVM. Suppose weka.jar is in your current folder; then run this:
java -cp weka.jar weka.core.WekaPackageManager -install-package LibSVM
During the installation, it shows:
[DefaultPackageManager] Tmp file: /tmp/LibSVM1.0.107382715397815864641.zip
[DefaultPackageManager] Installing: Description.props
[DefaultPackageManager] Installing: LibSVM.jar
[DefaultPackageManager] Installing: build_package.xml
...
You can see that "LibSVM.jar" is installed somewhere. In my case, it is at:
/home/john/wekafiles/packages/LibSVM/LibSVM.jar
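Packages installed this way are, as far as I know, picked up automatically when you launch Weka itself through weka.jar; for your own standalone code you still add the package jar to the classpath. A sketch, where the jar path follows the output above and MyLibSVMDemo is a placeholder for your own main class:
java -cp "weka.jar:/home/john/wekafiles/packages/LibSVM/LibSVM.jar" MyLibSVMDemo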

Java - MySQL to Hive Import where MySQL Running on Windows and Hive Running on CentOS (Horton Sandbox)

Before any answers and comments: I tried several options I found on Stack Overflow, but all ended in failure. These are the links:
How can I execute Sqoop in Java?
How to use Sqoop in Java Program?
How to import table from MySQL to Hive using Java?
How to load SQL data into the Hortonworks?
I tried it in the Horton Sandbox through the command line and succeeded:
sqoop import --connect jdbc:mysql://192.168.56.101:3316/database_name --username=user --password=pwd --table table_name --hive-import -m 1 -- --schema default
Where 192.168.56.101 is the Windows machine and 192.168.56.102 is the Horton Sandbox 2.6.
Now I want to do the same thing from Java, where that Java code runs somewhere else, not in the Horton Sandbox.
How do I locate HIVE_HOME and the other Sqoop parameters, given that they are on the Sandbox?
Which parameters do I have to pass? They should be passed either as SqoopOptions or as the String array argument to Sqoop.runTool; both are failing.
I also get confused about which library to import (com.cloudera.sqoop or org.apache.sqoop), and I get
The method run(com.cloudera.sqoop.SqoopOptions) in the type ImportTool is not applicable for the arguments (org.apache.sqoop.SqoopOptions)
with these two lines (the option parameters are set between them):
SqoopOptions options = new SqoopOptions();
int ret = new ImportTool().run(options);
If I choose the Cloudera classes, the method is deprecated; but if I choose the Apache ones, the run method doesn't accept the options argument.
I have been stuck on this for weeks. Please help.
Yes, you can do it via SSH. The Horton Sandbox comes with SSH support pre-installed. You can execute the sqoop command via an SSH client on Windows, or, if you want to do it programmatically (which is what I have done, in Java), you have to follow these steps.
Download the sshxcute Java library: https://code.google.com/p/sshxcute/
Add it to the build path of your Java project, then use the following Java code:
import net.neoremind.sshxcute.core.ConnBean;
import net.neoremind.sshxcute.core.SSHExec;
import net.neoremind.sshxcute.task.CustomTask;
import net.neoremind.sshxcute.task.impl.ExecCommand;

public class TestSSH {
    public static void main(String[] args) throws Exception {
        // Initialize a ConnBean object; the parameters are ip, username, password.
        ConnBean cb = new ConnBean("192.168.56.102", "root", "hadoop");
        // Pass the ConnBean instance to the static method getInstance(ConnBean)
        // to retrieve a singleton SSHExec instance.
        SSHExec ssh = SSHExec.getInstance(cb);
        // Connect to the server.
        ssh.connect();
        // Print the client IP from which you connected to the SSH server on the Horton Sandbox.
        CustomTask sampleTask1 = new ExecCommand("echo $SSH_CLIENT");
        System.out.println(ssh.exec(sampleTask1));
        // Run the Sqoop import on the Sandbox.
        CustomTask sampleTask2 = new ExecCommand("sqoop import --connect jdbc:mysql://192.168.56.101:3316/mysql_db_name --username=mysql_user --password=mysql_pwd --table mysql_table_name --hive-import -m 1 -- --schema default");
        ssh.exec(sampleTask2);
        ssh.disconnect();
    }
}
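If you want to check whether the Sqoop job actually succeeded, sshxcute's exec returns a result object. As a sketch, assuming the Result type and its rc exit-code field from the sshxcute documentation:
import net.neoremind.sshxcute.core.Result;

// ...inside main, replacing the plain ssh.exec(sampleTask2) call:
Result res = ssh.exec(sampleTask2);
if (res.rc != 0) {
    // Non-zero exit code from the remote sqoop command indicates failure.
    System.err.println("sqoop import failed with exit code " + res.rc);
}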

How to access a remote .mdb file in a shared folder using Java and Jackcess from a Linux machine

This is my first post. I am trying to open a remote .mdb file, which is in a shared folder on a Windows machine, from a Linux machine using the Jackcess library, and set the table values in a business object. I wrote the code below.
Scenario 1: if I run the code from a Windows machine, it works fine. Scenario 2: if I run the code from a Linux machine, it gets a file-not-found exception. I hope it is a small mistake; please correct me about what I am missing here.
package com.gg.main;

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Map;

import com.healthmarketscience.jackcess.Database;
import com.healthmarketscience.jackcess.Table;
import com.penske.model.Login;

public class Test {
    public static void main(String[] args) {
        Table table = null;
        Database db = null;
        Login login = null;
        ArrayList<Login> rowList = null;
        try {
            rowList = new ArrayList<>();
            // Open the .mdb file via a UNC path (this works on Windows).
            db = Database.open(new File("//aa.bb.com/file/access.mdb"));
            table = db.getTable("Maintenance");
            System.out.println(table.getColumns());
            for (Map<String, Object> row : table) {
                login = new Login();
                if (row.get("Req_ID") != null) {
                    login.setId(row.get("Req_ID").toString());
                }
                if (row.get("Name") != null) {
                    login.setName(row.get("Name").toString());
                }
                if (row.get("Loc") != null) {
                    login.setLoc(row.get("Loc").toString());
                }
                rowList.add(login);
            }
            login.setRowList(rowList);
        } catch (IOException e1) {
            e1.printStackTrace();
        }
    }
}
Linux does not have native support for Windows UNC paths, such as the one you use here:
new File("//aa.bb.com/file/access.mdb")
You'll have to mount the remote filesystem somewhere in your Linux filesystem where your program can access it, using smbfs or something like it, and then change the paths in your program to use that local filesystem path. It's been a long time since I've had to interact with Windows machines, but it should go something like this:
mount -t smbfs -o username=foo,password=bar //aa.bb.com/file /mnt/whatever_you_choose_to_name_it
See the manpage for smbmount for details.
Of course, if your program is supposed to start automatically, e.g. as part of the system booting, you'll have to see to it that the filesystem is automatically mounted as well. See fstab(5).
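Once the share is mounted, the only change on the Java side is the path. A sketch, where the mount point /mnt/share is an example matching whatever you chose in the mount command above:
// After mounting //aa.bb.com/file at e.g. /mnt/share:
Database db = Database.open(new File("/mnt/share/access.mdb"));
Table table = db.getTable("Maintenance");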
