I am trying to use Mahout for recommendations but am getting none.
My dataset:
0,102,5.0
1,101,5.0
1,102,5.0
Code:
DataModel datamodel = new FileDataModel(new File("dataset.csv"));
// Creating UserSimilarity object.
UserSimilarity usersimilarity = new PearsonCorrelationSimilarity(datamodel);
// Creating UserNeighborhood object.
UserNeighborhood userneighborhood = new ThresholdUserNeighborhood(0.1, usersimilarity, datamodel);
// Creating UserBasedRecommender.
UserBasedRecommender recommender = new GenericUserBasedRecommender(datamodel, userneighborhood, usersimilarity);
List<RecommendedItem> recommendations = recommender.recommend(0, 1);
for (RecommendedItem recommendation : recommendations) {
System.out.println(recommendation);
}
I am using Mahout version 0.13.0.
Ideally, it should recommend item_id = 101 to user_id = 0, since user 0 and user 1 have item 102 in common, so item 101 (rated by user 1) should be recommended to user 0.
Logs:
18:08:11.669 [main] INFO org.apache.mahout.cf.taste.impl.model.file.FileDataModel - Creating FileDataModel for file dataset.csv
18:08:11.700 [main] INFO org.apache.mahout.cf.taste.impl.model.file.FileDataModel - Reading file info...
18:08:11.702 [main] INFO org.apache.mahout.cf.taste.impl.model.file.FileDataModel - Read lines: 3
18:08:11.722 [main] INFO org.apache.mahout.cf.taste.impl.model.GenericDataModel - Processed 2 users
18:08:11.738 [main] DEBUG org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender - Recommending items for user ID '0'
The Hadoop MapReduce code in Mahout is being deprecated. The new recommender code starts with #rawkintrevo's examples. If you are a Scala programmer, follow them.
Most engineers would like a system that works with no modification. The Mahout algorithm is encapsulated in The Universal Recommender, built on top of Apache PredictionIO. It has a server to accept events, like the ones in your example, internal event storage, and a query server for results. There are numerous improvements over the old MapReduce code, including using real-time user behavior to make recommendations. Neither the new Mahout nor the old includes servers for input and query; the Universal Recommender has REST endpoints for both.
Given that the code you are using will be deprecated, I strongly suggest that you dive into the Mahout code (#rawkintrevo's examples) or look at The Universal Recommender, which is an entire end-to-end system.
Install PredictionIO with a "single machine" setup here, or to really shortcut setup, use our prepackaged AWS AMI here; it includes PIO and The Universal Recommender pre-installed.
Add the UR Template here.
There is a Java SDK for sending events to the recommender here (a rough sketch of its use follows below).
Once you have this set up, you deal with config, REST or the Java SDK, and the PIO CLI. No Scala coding required.
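For reference, sending events to the PIO EventServer from Java looks roughly like the sketch below. This is a hedged illustration, not the definitive client code: it assumes the org.apache.predictionio.sdk.java EventClient/Event classes from the SDK linked above, and the access key, server URL, and "rate" event name are placeholders you would take from your own pio app new output and UR engine configuration.
import org.apache.predictionio.sdk.java.Event;
import org.apache.predictionio.sdk.java.EventClient;

public class SendRatingEvent {
    public static void main(String[] args) throws Exception {
        // Placeholder access key and EventServer URL from `pio app new` / your PIO install.
        EventClient client = new EventClient("YOUR_ACCESS_KEY", "http://localhost:7070");

        // One event per interaction row, e.g. user 0 rated item 102.
        // The event name must match an indicator configured in the UR's engine.json.
        Event event = new Event()
            .event("rate")
            .entityType("user")
            .entityId("0")
            .targetEntityType("item")
            .targetEntityId("102");

        String eventId = client.createEvent(event); // returns the stored event's id
        System.out.println("Created event " + eventId);

        client.close();
    }
}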
I have three examples that are based on version 0.13.0 (and Scala, which is required for Samsara, the R-like Scala DSL Mahout has utilized since v0.10).
Walk
The first example is a very slow walk-through:
https://gist.github.com/rawkintrevo/3869030ff1a731d43c5e77979a5bf4a8
and is meant as a companion to Pat Ferrel's blog post/slide deck, found here:
http://actionml.com/blog/cco
Crawl
The second example is a little more "real" in that it utilizes SimilarityAnalysis.cooccurrencesIDSs(...), which is the proper interface for the CCO algorithm.
https://gist.github.com/rawkintrevo/c1bb00896263bdc067ddcd8299f4794c
Run
Here we use 'real' data. The MovieLens data set doesn't have enough going on to showcase CCO's multi-modal power (the ability to recommend on multiple user behaviors), so in this example we load 'real' data and generate recommendations.
https://gist.github.com/rawkintrevo/f87cc89f4d337d7ffea80a6af3bee83e
Conclusion
I know you specifically asked for Java; however, Apache Mahout isn't geared for Java at the moment. In theory you could import the Scala into your Java, or maybe wrap the functions in another, more Java-friendly function... I've heard rumors late at night (or possibly in a dream) that some grad students somewhere were working on a Java API, but it's not in the trunk at the moment, nor is there a PR, nor is there a bullet in the road map.
Hope the above provides some insight.
Appendix
The most trivial example for Stack Overflow (you can run this interactively in the Mahout Spark shell by typing $MAHOUT_HOME/bin/mahout spark-shell, assuming SPARK_HOME, JAVA_HOME and MAHOUT_HOME are set):
val inputRDD = sc.parallelize(Array( ("u1", "purchase", "iphone"),
("u1","purchase","ipad"),
("u2","purchase","nexus"),
("u2","purchase","galaxy"),
("u3","purchase","surface"),
("u4","purchase","iphone"),
("u4","purchase","galaxy"),
("u1","category-browse","phones"),
("u1","category-browse","electronics"),
("u1","category-browse","service"),
("u2","category-browse","accessories"),
("u2","category-browse","tablets"),
("u3","category-browse","accessories"),
("u3","category-browse","service"),
("u4","category-browse","phones"),
("u4","category-browse","tablets")) )
import org.apache.mahout.math.indexeddataset.{IndexedDataset, BiDictionary}
import org.apache.mahout.sparkbindings.indexeddataset.IndexedDatasetSpark
val purchasesIDS = IndexedDatasetSpark.apply(inputRDD.filter(_._2 == "purchase").map(o => (o._1, o._3)))(sc)
val browseIDS = IndexedDatasetSpark.apply(inputRDD.filter(_._2 == "category-browse").map(o => (o._1, o._3)))(sc)
import org.apache.mahout.math.cf.SimilarityAnalysis
val llrDrmList = SimilarityAnalysis.cooccurrencesIDSs(Array(purchasesIDS, browseIDS),
randomSeed = 1234,
maxInterestingItemsPerThing = 3,
maxNumInteractions = 4)
val llrAtA = llrDrmList(0).matrix.collect
IndexedDatasetSpark.apply() requires an RDD[(String, String)] where the first string is the 'row' (e.g. users) and the second string is the 'behavior'. So for the 'buy' matrix the columns would be 'products', but this could also be a 'gender' matrix with two columns (male/female).
Then you pass an array of IndexedDatasets to SimilarityAnalysis.cooccurrencesIDSs().
Related
I am using the NetLogo API controller with Spring Boot.
This is my code (I got it from this link):
HeadlessWorkspace workspace = HeadlessWorkspace.newInstance();
try {
workspace.open("models/Residential_Solar_PV_Adoption.nlogo",true);
workspace.command("set number-of-residences 900");
workspace.command("set %-similar-wanted 7");
workspace.command("set count-years-simulated 14");
workspace.command("set number-of-residences 500");
workspace.command("set carbon-tax 13.7");
workspace.command("setup");
workspace.command("repeat 10 [ go ]");
workspace.command("reset-ticks");
workspace.dispose();
}
catch(Exception ex) {
ex.printStackTrace();
}
I got this result in the console:
But I want to get the table view and save it to a database. Which command can I use to get the table view?
Table view:
Any help, please?
If you can clarify why you're trying to generate the data this way, I or others might be able to give better advice.
There is no single NetLogo command or NetLogo API method to generate that table; you have to use BehaviorSpace to get it. Here are some options, listed in rough order of simplest to hardest.
Option 1
If possible, I'd recommend just running BehaviorSpace experiments from the command line to generate your table. This will get you exactly the same output you're looking for. You can find information on how to do that in the NetLogo manual's BehaviorSpace guide. If necessary, you can run NetLogo headless from the command line from within a Java program; just look for resources on calling out to external programs from Java, maybe with ProcessBuilder (a sketch follows below).
If you're running from within Java in order to set up and change the parameters of your BehaviorSpace experiments in a way that you cannot do from within the program, you could instead generate experiment XML files in Java to pass to NetLogo at the command line. See the docs on the XML format.
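To make the ProcessBuilder idea from Option 1 concrete, here is a minimal sketch. The jar path, model path, and experiment name are placeholders for your setup; the --model/--experiment/--table flags are the ones described in the BehaviorSpace guide mentioned above.
import java.io.File;
import java.io.IOException;

public class RunBehaviorSpaceHeadless {
    public static void main(String[] args) throws IOException, InterruptedException {
        // Placeholder paths; point these at your NetLogo install and your model.
        ProcessBuilder pb = new ProcessBuilder(
            "java", "-Xmx1024m", "-Dfile.encoding=UTF-8",
            "-cp", "/path/to/NetLogo/app/netlogo-6.0.jar",
            "org.nlogo.headless.Main",
            "--model", "models/Residential_Solar_PV_Adoption.nlogo",
            "--experiment", "my-experiment",   // a BehaviorSpace experiment saved in the model (or supplied via a setup file)
            "--table", "results-table.csv");   // the table output you are after
        pb.redirectErrorStream(true);
        pb.redirectOutput(new File("behaviorspace.log"));

        int exitCode = pb.start().waitFor();
        System.out.println("NetLogo headless finished with exit code " + exitCode);
    }
}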
Option 2
You can recreate the contents of the table using the CSV extension in your model and adding a few more commands to generate the data. This will not create the exact same table, but it will get your data output in a computer and human readable format.
In pure NetLogo code, you'd want something like the below. Note that you can control more of the behavior (like file names or the desired variables) by running other pre-experiment commands before running setup or go in your Java code. You could also run the CSV-specific file code from Java using the controlling API and leave the model unchanged (a Java sketch of that appears after the NetLogo code below), but you'll need to write your own version of the csv:to-row formatting.
extensions [ csv ]  ; needed for csv:to-row below

globals [
;; your model globals here
output-variables
]
to setup
clear-all
;;; your model setup code here
file-open "my-output.csv"
; the given variables should be valid reporters for the NetLogo model
set output-variables [ "ticks" "current-price" "number-of-residences" "count-years-simulated" "solar-PV-cost" "%-lows" "k" ]
file-print csv:to-row output-variables
reset-ticks
end
to go
;;; the rest of your model code here
file-print csv:to-row map [ v -> runresult v ] output-variables
file-flush
tick
end
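Alternatively, here is a rough Java-side sketch of the controlling-API variant mentioned above, where the model stays unchanged and the CSV formatting happens in Java instead of NetLogo. The reporter names and output file are placeholders; any valid reporter in your model can go in the list.
import java.io.PrintWriter;
import org.nlogo.headless.HeadlessWorkspace;

public class ExportRunData {
    public static void main(String[] args) throws Exception {
        // Reporters to record each step; these must be valid reporters in the model.
        String[] reporters = { "ticks", "number-of-residences", "count-years-simulated", "carbon-tax" };

        HeadlessWorkspace workspace = HeadlessWorkspace.newInstance();
        try (PrintWriter out = new PrintWriter("my-output.csv")) {
            workspace.open("models/Residential_Solar_PV_Adoption.nlogo", true);
            workspace.command("setup");

            out.println(String.join(",", reporters)); // header row

            for (int step = 0; step < 10; step++) {
                workspace.command("go");
                StringBuilder row = new StringBuilder();
                for (int i = 0; i < reporters.length; i++) {
                    if (i > 0) row.append(',');
                    row.append(workspace.report(reporters[i])); // pull each value back into Java
                }
                out.println(row);
            }
        } finally {
            workspace.dispose();
        }
    }
}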
Option 3
If you really need to reproduce the BehaviorSpace table export exactly, you can try to run a BehaviorSpace experiment directly from Java. The table is generated by this code, but as you can see it's tied in with the LabProtocol class, meaning you'll have to set up and run your model through BehaviorSpace instead of just step-by-step using a workspace as you've done in your sample code.
A good example of this might be the Main.scala object, which extracts some experiment settings from the expected command-line arguments, and then uses them with the lab.run() method to run the BehaviorSpace experiment and generate the output. That's Scala code and not Java, but hopefully it isn't too hard to translate. You'd similarly have to set up an org.nlogo.nvm.LabInterface.Settings instance and pass that off to HeadlessWorkspace.newLab.run() to get things going.
I've created a model based on the 'wide and deep' example (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/learn/wide_n_deep_tutorial.py).
I've exported the model as follows:
m = build_estimator(model_dir)
m.fit(input_fn=lambda: input_fn(df_train, True), steps=FLAGS.train_steps)
results = m.evaluate(input_fn=lambda: input_fn(df_test, True), steps=1)
print('Model statistics:')
for key in sorted(results):
print("%s: %s" % (key, results[key]))
print('Done training!!!')
# Export model
export_path = sys.argv[-1]
print('Exporting trained model to %s' % export_path)
m.export(
    export_path,
    input_fn=serving_input_fn,
    use_deprecated_input_fn=False,
    input_feature_key=INPUT_FEATURE_KEY)
My question is, how do I create a client to make predictions from this exported model? Also, have I exported the model correctly?
Ultimately I need to be able to do this in Java too. I suspect I can do this by creating Java classes from proto files using gRPC.
Documentation is very sketchy, which is why I am asking here.
Many thanks!
I wrote a simple tutorial Exporting and Serving a TensorFlow Wide & Deep Model.
TL;DR
To export an estimator there are four steps:
Define features for export as a list of all features used during estimator initialization.
Create a feature config using create_feature_spec_for_parsing.
Build a serving_input_fn suitable for use in serving using input_fn_utils.build_parsing_serving_input_fn.
Export the model using export_savedmodel().
To run a client script properly you need to do the following steps:
Create and place your script somewhere in the /serving/ folder, e.g. /serving/tensorflow_serving/example/
Create or modify corresponding BUILD file by adding a py_binary.
Build and run a model server, e.g. tensorflow_model_server.
Create, build and run a client that sends a tf.Example to our tensorflow_model_server for inference.
For more details look at the tutorial itself.
Just spent a solid week figuring this out. First off, m.export is going to be deprecated in a couple of weeks, so instead of that block, use: m.export_savedmodel(export_path, input_fn=serving_input_fn).
That means you then have to define serving_input_fn(), which of course is supposed to have a different signature than the input_fn() defined in the wide and deep tutorial. Namely, moving forward, I guess it's recommended that input_fn()-type things are supposed to return an InputFnOps object, defined here.
Here's how I figured out how to make that work:
import tensorflow as tf
from tensorflow.contrib.learn.python.learn.utils import input_fn_utils
from tensorflow.python.ops import array_ops
from tensorflow.python.framework import dtypes
def serving_input_fn():
features, labels = input_fn()
features["examples"] = tf.placeholder(tf.string)
serialized_tf_example = array_ops.placeholder(dtype=dtypes.string,
shape=[None],
name='input_example_tensor')
inputs = {'examples': serialized_tf_example}
labels = None # these are not known in serving!
return input_fn_utils.InputFnOps(features, labels, inputs)
This is probably not 100% idiomatic, but I'm pretty sure it works. For now.
I'm hoping that someone familiar with Spark can give me a "gut check" on whether I'm likely abusing the SparkML framework or if the performance I'm seeing is understandable given the context (#rows, #features).
Briefly, I have a small dataset (~150 rows) that is fairly wide (~180 features). I have coded up analogous Lasso training codes in Spark and scikit-learn, which result in identical models (same model coefficients and LOOCVE). However, the Spark code takes roughly 100x longer (sklearn takes about 5 seconds, Spark close to 600 seconds).
I understand that Spark is optimized for large distributed datasets and that this difference can reasonably be attributed to overhead latency that would be hidden by data parallelism, but this still feels extremely sluggish.
The Spark code is essentially:
//... code to add a number of PipelineStages to a List<PipelineStage> (~90 UnaryTransformer stages), ending in a StandardScaler
// Add Lasso model
LinearRegression lasso = new LinearRegression()
.setLabelCol(response)
.setFeaturesCol("normed_features")
.setMaxIter(100000)
.setPredictionCol(response+"_prediction")
.setElasticNetParam(1.0)
.setFitIntercept(true)
.setRegParam(0.2);
// stages is the List<PipelineStage> loaded with 90 or so UnaryTransformer steps
stages.add(lasso);
Pipeline pipeline = new Pipeline().setStages(stages.toArray(new PipelineStage[0]));
DataFrame df = getTrainingData(trainingData, response);
RegressionEvaluator evaluator = new RegressionEvaluator()
    .setLabelCol(response)
    .setMetricName("mae")
    .setPredictionCol(response+"_prediction");
df.cache();
ParamMap[] paramGrid = new ParamGridBuilder().build();
CrossValidator cv = new CrossValidator()
.setEstimator(pipeline)
.setEvaluator(evaluator)
.setEstimatorParamMaps(paramGrid)
.setNumFolds(20);
double cve = cv.fit(df).avgMetrics()[0];
The Python code uses Lasso and GridSearchCV with the same number of folds (20).
Unfortunately, I can't really provide an MWE as we use a custom Transformer that I'd have to paste in, but I'm wondering if anyone would be willing to weigh in on whether this runtime difference between sklearn and Spark implies user error. The only good practice I am knowingly applying is caching the training DataFrame before fitting the CrossValidator.
I am new to BO. I need to find the universe name and the corresponding metadata information (table names, column names, join conditions, etc.). I am unable to find a proper way to start. I looked at the Data Access SDK and the Semantic SDK.
Can anyone please provide sample code or a procedure for getting started?
I googled a lot, but I am unable to find any sample examples.
I looked into this link, but that code will only work on an R2 server.
http://www.forumtopics.com/busobj/viewtopic.php?t=67088
Help is highly appreciated.
Assuming you're talking about IDT-based universes, you'll need to code some Java. The JavaDoc for the API is available here.
In a nutshell, you do something like this:
// Create a Semantic Layer context and get the local resource service.
SlContext context = SlContext.create();
LocalResourceService service = context.getService(LocalResourceService.class);
// Retrieve the universe's business layer into a local directory and load it.
String blxFile = service.retrieve("universe.unx", "output directory");
RelationalBusinessLayer businessLayer = (RelationalBusinessLayer) service.load(blxFile);
RootFolder rootFolder = businessLayer.getRootFolder();
Once you have a hook on the rootFolder, you can use the getChildren() method to drill into the folder structure and access the various subfolders/business objects available.
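As a rough, untested sketch of that drill-down (I'm assuming here, based on the JavaDoc above, that getChildren() returns BlItem objects with getName() and that subfolders are Folder instances; adjust to the actual types in your SDK version):
// Recursively print the business layer outline starting from the root folder.
private static void printFolder(Folder folder, String indent) {
    for (BlItem item : folder.getChildren()) {
        System.out.println(indent + item.getName());
        if (item instanceof Folder) {
            printFolder((Folder) item, indent + "  ");
        }
    }
}

// Usage, continuing from the snippet above:
// printFolder(rootFolder, "");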
You may also want to check the CmsResourceService class to access universes stored on the repository.
Getting the information you are after will require a two-part solution. Part 1: use the Rebean SDK to look at WebI reports for the universe and object names being used within them.
Part 2: break out your favorite COM programming tool (since I try to avoid COM, I use the Excel macro editor) and access the BusinessObjects Designer library. The main code snippets that I currently have are:
Dim boUniv As Designer.Universe
Dim tbl As Designer.Table
' boUniv must first be set by opening a universe through the Designer application object
For Each tbl In boUniv.Tables
    Debug.Print tbl.Name
Next tbl
This prints all of the tables in a universe.
You will need to combine the 2 parts on your own for a dependency list between WebI reports and Universes.
I'm trying to integrate SugarCRM with one of my projects. I'm using Apache Axis as my SOAP client.
I've created the Sugar CRM client Stub classes using Apache Axis.
I'm able to login and add Leads, Opportunities, Accounts and Contacts.
But I'm unable to add a relationship between my Account and Opportunity.
I've found the following method in the SugarsoapPortType:
port.set_relationship(session, module_name, module_id, link_field_name, related_ids, name_value_list, delete)
but I cannot understand the different parameters required by this method.
Most of the online documents suggest a simple way, as given below:
$result = $client->call('set_relationship',array("session"=>$session _id,array("module1"=>"Emails","module1_id"=>"<module1_id>","module2"=>"Accounts","module2_id"=> "<module2_id>")));
How can I achieve this using Java?
Thanks
Got this after a lot of searching.
Example:
New_set_relationship_list_result relationship = port.set_relationship(sessionID, "Accounts", "<account_id>", "opportunities", new String[] {"<opportunity_id>"}, null, 0);
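Mapping that call back onto the parameter names from the question (the angle-bracket values are placeholders for your own record ids):
New_set_relationship_list_result relationship = port.set_relationship(
    sessionID,                           // session: the session id returned by login
    "Accounts",                          // module_name: the module that owns the relationship
    "<account_id>",                      // module_id: the id of that Account record
    "opportunities",                     // link_field_name: the link field on Accounts pointing to Opportunities
    new String[] {"<opportunity_id>"},   // related_ids: one or more ids to relate
    null,                                // name_value_list: extra relationship fields (none here)
    0);                                  // delete flag: 0 creates the relationship, 1 would remove it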