Using Mahout in Java code, not the CLI - Java
I want to be able to build a model using Java. I am able to do so with the CLI as follows:
./mahout trainlogistic --input Candy-Crush.twtr.csv \
--output ./model \
--target hd_click --categories 2 \
--predictors click_frequency country_code ctr device_price_range hd_conversion time_of_day num_clicks phone_type twitter is_weekend app_entertainment app_wallpaper app_widgets arcade books_and_reference brain business cards casual comics communication education entertainment finance game_wallpaper game_widgets health_and_fitness health_fitness libraries_and_demo libraries_demo lifestyle media_and_video media_video medical music_and_audio news_and_magazines news_magazines personalization photography productivity racing shopping social sports sports_apps sports_games tools transportation travel_and_local weather app_entertainment_percentage app_wallpaper_percentage app_widgets_percentage arcade_percentage books_and_reference_percentage brain_percentage business_percentage cards_percentage casual_percentage comics_percentage communication_percentage education_percentage entertainment_percentage finance_percentage game_wallpaper_percentage game_widgets_percentage health_and_fitness_percentage health_fitness_percentage libraries_and_demo_percentage libraries_demo_percentage lifestyle_percentage media_and_video_percentage media_video_percentage medical_percentage music_and_audio_percentage news_and_magazines_percentage news_magazines_percentage personalization_percentage photography_percentage productivity_percentage racing_percentage shopping_percentage social_percentage sports_apps_percentage sports_games_percentage sports_percentage tools_percentage transportation_percentage travel_and_local_percentage weather_percentage reads_magazine_sum reads_magazine_count interested_in_gardening_sum interested_in_gardening_count kids_birthday_coming_sum kids_birthday_coming_count job_seeker_sum job_seeker_count friends_sum friends_count married_sum married_count charity_donor_sum charity_donor_count student_sum student_count interested_in_real_estate_sum interested_in_real_estate_count sports_fan_sum sports_fan_count bascketball_sum bascketball_count interested_in_politics_sum interested_in_politics_count gamer_sum gamer_count activist_sum activist_count traveler_sum traveler_count likes_soccer_sum likes_soccer_count interested_in_celebs_sum interested_in_celebs_count auto_racing_sum auto_racing_count age_group_sum age_group_count healthy_lifestyle_sum healthy_lifestyle_count interested_in_finance_sum interested_in_finance_count sports_teams_usa_sum sports_teams_usa_count interested_in_deals_sum interested_in_deals_count business_oriented_sum business_oriented_count interested_in_cooking_sum interested_in_cooking_count music_lover_sum music_lover_count beauty_sum beauty_count follows_fashion_sum follows_fashion_count likes_wrestling_sum likes_wrestling_count name_sum name_count shopper_sum shopper_count golf_sum golf_count vegetarian_sum vegetarian_count dating_sum dating_count interested_in_fashion_sum interested_in_fashion_count interested_in_news_sum interested_in_news_count likes_tennis_sum likes_tennis_count male_sum male_count interested_in_cars_sum interested_in_cars_count follows_bloggers_sum follows_bloggers_count entertainment_sum entertainment_count interested_in_books_sum interested_in_books_count has_kids_sum has_kids_count interested_in_movies_sum interested_in_movies_count musicians_sum musicians_count tech_oriented_sum tech_oriented_count female_sum female_count has_pet_sum has_pet_count practicing_sports_sum practicing_sports_count \
--types numeric word numeric word word word numeric word word word numeric \
--features 100 --passes 1 --rate 50
I can't understand the 20 newsgroups example because it's too big to learn from.
Can anyone give me code that does the same as the CLI command?
To clarify, I need something like this:
model.train(1,0,"monday",6,44,1,7,4,6,78,7,3,4,6,........,"good");
model.train(1,0,"sunday",6,44,5,7,9,2,4,6,78,7,3,4,6,........,"bad");
model.train(1,0,"monday",4,99,2,4,6,3,4,6,........,"good");
model.writeTofile("myModel.model");
PLEASE DO NOT ANSWER IF YOU ARE NOT FAMILIAR WITH CLASSIFICATION AND ONLY WANT TO TELL ME HOW TO EXECUTE A CLI COMMAND FROM JAVA.
I am not 100% familiar with the Mahout API (I agree that documentation is very sparse) so I can only give pointers, but I hope it helps:
The Java source code for the trainlogistic example can actually be found in the mahout-examples library - it's on maven [0] (in org.apache.mahout.classifier.sgd.TrainLogistic). I suppose if you wanted to, you could just use the exact same source code, but it depends on a couple of utility classes in the mahout-examples library (and it's not very clean, either).
The class performing the training in this example is org.apache.mahout.classifier.sgd.OnlineLogisticRegression [1], although considering the large number of predictor variables you have you might want to use the AdaptiveLogisticRegression [2] (same package), which uses a number of OnlineLogisticRegressions internally. But you have to see for yourself which works best with your data.
The API is fairly straightforward: there's a train method that takes a Vector of your input data, a classify method to test your model, and methods such as learningRate to change the model's parameters.
To save the model to disk like the command line tool does, use org.apache.mahout.classifier.sgd.ModelSerializer, which has a straightforward API to write and read your model. (There are also write and readFields methods in the OLR class itself, but frankly, I'm not sure what they do or whether they differ from ModelSerializer - they're not documented either.)
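To make that concrete, here is a minimal sketch of the train/classify/serialize cycle, assuming the 0.8-era SGD API. The encoder choice (one StaticWordValueEncoder for all predictors plus a ConstantValueEncoder for the intercept), the example feature values, and the hyperparameters are illustrative assumptions, not a drop-in equivalent of the full CLI command:

import org.apache.mahout.classifier.sgd.L1;
import org.apache.mahout.classifier.sgd.ModelSerializer;
import org.apache.mahout.classifier.sgd.OnlineLogisticRegression;
import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.Vector;
import org.apache.mahout.vectorizer.encoders.ConstantValueEncoder;
import org.apache.mahout.vectorizer.encoders.StaticWordValueEncoder;

public class TrainLogisticSketch {
    public static void main(String[] args) throws Exception {
        int numFeatures = 100;                                          // mirrors --features 100
        OnlineLogisticRegression olr =
                new OnlineLogisticRegression(2, numFeatures, new L1())  // 2 categories, L1 prior
                        .learningRate(50);                              // mirrors --rate 50

        StaticWordValueEncoder featureEncoder = new StaticWordValueEncoder("features");
        ConstantValueEncoder interceptEncoder = new ConstantValueEncoder("intercept");

        // One vector per CSV row; in reality you would loop over your file.
        Vector row = new RandomAccessSparseVector(numFeatures);
        interceptEncoder.addToVector("1", row);                   // bias term
        featureEncoder.addToVector("click_frequency", 3.2, row);  // numeric predictor: name + value
        featureEncoder.addToVector("country_code=US", row);       // categorical predictor, weight 1
        olr.train(1, row);                                        // 1 = observed hd_click category

        // classifyScalar gives P(category == 1) for a two-category model
        System.out.println("P(hd_click) = " + olr.classifyScalar(row));

        // rough equivalent of --output ./model
        ModelSerializer.writeBinary("myModel.model", olr);
    }
}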
Lastly, aside from the source code in mahout-examples, here are two other examples of using the Mahout API directly that might be useful [3, 4].
Sources:
[0] http://repo1.maven.org/maven2/org/apache/mahout/mahout-examples/0.8/
[1] http://archive.cloudera.com/cdh4/cdh/4/mahout/mahout-core/org/apache/mahout/classifier/sgd/OnlineLogisticRegression.html
[2] http://archive.cloudera.com/cdh4/cdh/4/mahout/mahout-core/org/apache/mahout/classifier/sgd/AdaptiveLogisticRegression.html
[3] http://mail-archives.apache.org/mod_mbox/mahout-user/201206.mbox/%3CCAJwFCa3X2fL_SRxT7f7v9uMjS3Tc9WrT7vuMQCVXyH71k0H0zQ#mail.gmail.com%3E
[4] http://skife.org/mahout/2013/02/14/first_steps_with_mahout.html
This blog has a good post about how to do training and classification with the Mahout Java API: http://nigap.blogspot.com/2012/02/bayes-algorithm-with-apache-mahout.html
You could use Runtime.exec to execute the same command line from Java.
The simple approach is:
Process p = Runtime.getRuntime().exec("/usr/bin/bash -ic \"<path_to_mahout>/mahout trainlogistic --input Candy-Crush.twtr.csv "
+ "--output ./model "
+ "--target hd_click --categories 2 "
+ "--predictors click_frequency country_code ctr device_price_range hd_conversion time_of_day num_clicks phone_type twitter is_weekend app_entertainment app_wallpaper app_widgets arcade books_and_reference brain business cards casual comics communication education entertainment finance game_wallpaper game_widgets health_and_fitness health_fitness libraries_and_demo libraries_demo lifestyle media_and_video media_video medical music_and_audio news_and_magazines news_magazines personalization photography productivity racing shopping social sports sports_apps sports_games tools transportation travel_and_local weather app_entertainment_percentage app_wallpaper_percentage app_widgets_percentage arcade_percentage books_and_reference_percentage brain_percentage business_percentage cards_percentage casual_percentage comics_percentage communication_percentage education_percentage entertainment_percentage finance_percentage game_wallpaper_percentage game_widgets_percentage health_and_fitness_percentage health_fitness_percentage libraries_and_demo_percentage libraries_demo_percentage lifestyle_percentage media_and_video_percentage media_video_percentage medical_percentage music_and_audio_percentage news_and_magazines_percentage news_magazines_percentage personalization_percentage photography_percentage productivity_percentage racing_percentage shopping_percentage social_percentage sports_apps_percentage sports_games_percentage sports_percentage tools_percentage transportation_percentage travel_and_local_percentage weather_percentage reads_magazine_sum reads_magazine_count interested_in_gardening_sum interested_in_gardening_count kids_birthday_coming_sum kids_birthday_coming_count job_seeker_sum job_seeker_count friends_sum friends_count married_sum married_count charity_donor_sum charity_donor_count student_sum student_count interested_in_real_estate_sum interested_in_real_estate_count sports_fan_sum sports_fan_count bascketball_sum bascketball_count interested_in_politics_sum interested_in_politics_count gamer_sum gamer_count activist_sum activist_count traveler_sum traveler_count likes_soccer_sum likes_soccer_count interested_in_celebs_sum interested_in_celebs_count auto_racing_sum auto_racing_count age_group_sum age_group_count healthy_lifestyle_sum healthy_lifestyle_count interested_in_finance_sum interested_in_finance_count sports_teams_usa_sum sports_teams_usa_count interested_in_deals_sum interested_in_deals_count business_oriented_sum business_oriented_count interested_in_cooking_sum interested_in_cooking_count music_lover_sum music_lover_count beauty_sum beauty_count follows_fashion_sum follows_fashion_count likes_wrestling_sum likes_wrestling_count name_sum name_count shopper_sum shopper_count golf_sum golf_count vegetarian_sum vegetarian_count dating_sum dating_count interested_in_fashion_sum interested_in_fashion_count interested_in_news_sum interested_in_news_count likes_tennis_sum likes_tennis_count male_sum male_count interested_in_cars_sum interested_in_cars_count follows_bloggers_sum follows_bloggers_count entertainment_sum entertainment_count interested_in_books_sum interested_in_books_count has_kids_sum has_kids_count interested_in_movies_sum interested_in_movies_count musicians_sum musicians_count tech_oriented_sum tech_oriented_count female_sum female_count has_pet_sum has_pet_count practicing_sports_sum practicing_sports_count "
+ "--types numeric word numeric word word word numeric word word word numeric "
+ "--features 100 --passes 1 --rate 50\"");
If you opt for this, then I suggest reading this first:
When Runtime.exec() won't
This way the application will run in a different process.
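If you go this route, note that you normally have to drain the child process's output and wait for it to finish, which is what the article above covers. Here is a minimal sketch of that, using ProcessBuilder instead of Runtime.exec for convenience; the mahout path is still a placeholder and the long --predictors/--types lists are elided:

import java.io.BufferedReader;
import java.io.InputStreamReader;

public class RunMahoutCli {
    public static void main(String[] args) throws Exception {
        ProcessBuilder pb = new ProcessBuilder(
                "<path_to_mahout>/mahout", "trainlogistic",
                "--input", "Candy-Crush.twtr.csv",
                "--output", "./model",
                "--target", "hd_click", "--categories", "2",
                // ... add the same --predictors and --types arguments as above ...
                "--features", "100", "--passes", "1", "--rate", "50");
        pb.redirectErrorStream(true);                 // merge stderr into stdout

        Process p = pb.start();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                System.out.println(line);             // drain output so the child can't block
            }
        }
        System.out.println("mahout exited with " + p.waitFor());
    }
}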
Additionally, you can follow the section 'Integration with your application' on the following site:
Recommender Documentation
Also, this is a good reference on writing a recommender:
Introducing Apache Mahout
Hope this helps.
Cheers
Related
NetLogo API Controller - Get Table View
I am using the NetLogo API controller with Spring Boot. This is my code (I got it from this link):

HeadlessWorkspace workspace = HeadlessWorkspace.newInstance();
try {
    workspace.open("models/Residential_Solar_PV_Adoption.nlogo", true);
    workspace.command("set number-of-residences 900");
    workspace.command("set %-similar-wanted 7");
    workspace.command("set count-years-simulated 14");
    workspace.command("set number-of-residences 500");
    workspace.command("set carbon-tax 13.7");
    workspace.command("setup");
    workspace.command("repeat 10 [ go ]");
    workspace.command("reset-ticks");
    workspace.dispose();
    workspace.dispose();
} catch (Exception ex) {
    ex.printStackTrace();
}

I got a result in the console, but I want to get the table view and save it to a database. Which command can I use to get the table view? Any help please?
If you can clarify why you're trying to generate the data this way, I or others might be able to give better advice. There is no single NetLogo command or NetLogo API method to generate that table; you have to use BehaviorSpace to get it. Here are some options, listed in rough order of simplest to hardest.

Option 1

If possible, I'd recommend just running BehaviorSpace experiments from the command line to generate your table. This will get you exactly the same output you're looking for. You can find information on how to do that in the NetLogo manual's BehaviorSpace guide. If necessary, you can run NetLogo headless from the command line from within a Java program; just look for resources on calling out to external programs from Java, maybe with ProcessBuilder (see the sketch after this answer). If you're running from within Java in order to set up and change the parameters of your BehaviorSpace experiments in a way that you cannot do from within the program, you could instead generate experiment XML files in Java to pass to NetLogo at the command line. See the docs on the XML format.

Option 2

You can recreate the contents of the table using the CSV extension in your model and adding a few more commands to generate the data. This will not create the exact same table, but it will get your data output in a computer- and human-readable format. In pure NetLogo code, you'd want something like the below. Note that you can control more of the behavior (like file names or the desired variables) by running other pre-experiment commands before running setup or go in your Java code. You could also run the CSV-specific file code from Java using the controlling API and leave the model unchanged, but you'll need to write your own NetLogo code version of the csv:to-row primitive.

globals [
  ;; your model globals here
  output-variables
]

to setup
  clear-all
  ;;; your model setup code here
  file-open "my-output.csv"
  ; the given variables should be valid reporters for the NetLogo model
  set output-variables [ "ticks" "current-price" "number-of-residences" "count-years-simulated" "solar-PV-cost" "%-lows" "k" ]
  file-print csv:to-row output-variables
  reset-ticks
end

to go
  ;;; the rest of your model code here
  file-print csv:to-row map [ v -> runresult v ] output-variables
  file-flush
  tick
end

Option 3

If you really need to reproduce the BehaviorSpace table export exactly, you can try to run a BehaviorSpace experiment directly from Java. The table is generated by this code, but as you can see it's tied in with the LabProtocol class, meaning you'll have to set up and run your model through BehaviorSpace instead of just step-by-step using a workspace as you've done in your sample code. A good example of this might be the Main.scala object, which extracts some experiment settings from the expected command-line arguments and then uses them with the lab.run() method to run the BehaviorSpace experiment and generate the output. That's Scala code and not Java, but hopefully it isn't too hard to translate. You'd similarly have to set up an org.nlogo.nvm.LabInterface.Settings instance and pass that off to a HeadlessWorkspace.newLab.run() to get things going.
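For Option 1, a minimal Java sketch of calling the headless BehaviorSpace runner as an external process could look like the following. The NetLogo install directory, jar name, memory setting, model path, and experiment name are placeholders to adapt; the org.nlogo.headless.Main flags are the ones described in the BehaviorSpace guide:

import java.io.File;

public class RunBehaviorSpace {
    public static void main(String[] args) throws Exception {
        ProcessBuilder pb = new ProcessBuilder(
                "java", "-Xmx1024m", "-Dfile.encoding=UTF-8",
                "-cp", "NetLogo.jar",
                "org.nlogo.headless.Main",
                "--model", "models/Residential_Solar_PV_Adoption.nlogo",
                "--experiment", "my-experiment",          // must exist in the model's BehaviorSpace
                "--table", "results-table.csv");          // the table output you are after
        pb.directory(new File("/path/to/netlogo"));       // placeholder NetLogo installation dir
        pb.inheritIO();                                   // show NetLogo's output in this console

        int exitCode = pb.start().waitFor();
        System.out.println("NetLogo exited with " + exitCode);
    }
}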
Apache Mahout not giving any recommendation
I am trying to use Mahout for recommendations but am getting none.

My dataset:

0,102,5.0
1,101,5.0
1,102,5.0

Code:

DataModel datamodel = new FileDataModel(new File("dataset.csv"));
// Creating UserSimilarity object.
UserSimilarity usersimilarity = new PearsonCorrelationSimilarity(datamodel);
// Creating UserNeighborhood object.
UserNeighborhood userneighborhood = new ThresholdUserNeighborhood(0.1, usersimilarity, datamodel);
// Create UserBasedRecommender.
UserBasedRecommender recommender = new GenericUserBasedRecommender(datamodel, userneighborhood, usersimilarity);
List<RecommendedItem> recommendations = recommender.recommend(0, 1);
for (RecommendedItem recommendation : recommendations) {
    System.out.println(recommendation);
}

I am using Mahout version 0.13.0.

Ideally, it should recommend item_id = 101 to user_id = 0: since user 0 and user 1 have item 102 in common, it should recommend item_id = 101 to user_id = 0.

Logs:

18:08:11.669 [main] INFO org.apache.mahout.cf.taste.impl.model.file.FileDataModel - Creating FileDataModel for file dataset.csv
18:08:11.700 [main] INFO org.apache.mahout.cf.taste.impl.model.file.FileDataModel - Reading file info...
18:08:11.702 [main] INFO org.apache.mahout.cf.taste.impl.model.file.FileDataModel - Read lines: 3
18:08:11.722 [main] INFO org.apache.mahout.cf.taste.impl.model.GenericDataModel - Processed 2 users
18:08:11.738 [main] DEBUG org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender - Recommending items for user ID '0'
The Hadoop MapReduce code in Mahout is being deprecated. The new recommender code starts with #rawkintrevo's examples; if you are a Scala programmer, follow them.

Most engineers would like a system that works with no modification. The Mahout algorithm is encapsulated in The Universal Recommender, built on top of Apache PredictionIO. It has a server to accept events (like the ones in your example), internal event storage, and a query server for results. There are numerous improvements over the old MapReduce code, including using real-time user behavior to make recommendations. Neither the new Mahout nor the old included servers for input and query; the Universal Recommender has REST endpoints for both.

Given that the code you are using will be deprecated, I strongly suggest that you dive into the Mahout code (#rawkintrevo's examples) or look at The Universal Recommender, which is an entire end-to-end system:

Install PredictionIO with a "single machine" setup here, or to really shortcut setup use our prepackaged AWS AMI here; it includes PIO and The Universal Recommender pre-installed.
Add the UR Template here.
A Java SDK for sending events to the recommender is here.

Once you have this set up you deal with config, REST or the Java SDK, and the PIO CLI. No Scala coding required.
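To give a feel for the Java side of that setup, here is a rough sketch of sending one event with the PredictionIO Java SDK. Treat the class names (io.prediction.EventClient, io.prediction.Event), the method signatures, and the access key/URL as assumptions to check against the SDK docs for your version:

import io.prediction.Event;
import io.prediction.EventClient;

public class SendEventSketch {
    public static void main(String[] args) throws Exception {
        // placeholders: your PIO app's access key and event server URL
        EventClient client = new EventClient("YOUR_ACCESS_KEY", "http://localhost:7070");
        try {
            Event event = new Event()
                    .event("purchase")
                    .entityType("user").entityId("u1")
                    .targetEntityType("item").targetEntityId("iphone");
            String eventId = client.createEvent(event);   // blocking variant, returns the event id
            System.out.println("created event " + eventId);
        } finally {
            client.close();
        }
    }
}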
I have three examples that are based on version 0.13.0 (and Scala, which is required for Samsara, the R-like Scala DSL Mahout has utilized since v0.10).

Walk

The first example is a very slow walk-through: https://gist.github.com/rawkintrevo/3869030ff1a731d43c5e77979a5bf4a8 It is meant as a companion to Pat Ferrel's blog post/slide deck found here: http://actionml.com/blog/cco

Crawl

The second example is a little more "real" in that it utilizes SimilarityAnalysis.cooccurrencesIDSs(...), which is the proper interface for the CCO algorithm: https://gist.github.com/rawkintrevo/c1bb00896263bdc067ddcd8299f4794c

Run

Here we use 'real' data. The MovieLens data set doesn't have enough going on to showcase CCO's multi-modal power (the ability to recommend on multiple user behaviors), so here we load 'real' data and generate recommendations: https://gist.github.com/rawkintrevo/f87cc89f4d337d7ffea80a6af3bee83e

Conclusion

I know you specifically asked for Java, however Apache Mahout isn't geared for Java at the moment. In theory you could import Scala into your Java, or maybe wrap the functions in another, more Java-friendly function... I've heard rumors late at night (or possibly in a dream) that some grad students somewhere were working on a Java API, but it's not in the trunk at the moment, nor is there a PR, nor is there a bullet in the roadmap.

Hope the above provides some insight.

Appendix

The most trivial example for Stack Overflow (you can run this interactively in the Mahout Spark shell by typing $MAHOUT_HOME/bin/mahout spark-shell, assuming SPARK_HOME, JAVA_HOME and MAHOUT_HOME are set):

val inputRDD = sc.parallelize(Array(
  ("u1", "purchase", "iphone"),
  ("u1", "purchase", "ipad"),
  ("u2", "purchase", "nexus"),
  ("u2", "purchase", "galaxy"),
  ("u3", "purchase", "surface"),
  ("u4", "purchase", "iphone"),
  ("u4", "purchase", "galaxy"),
  ("u1", "category-browse", "phones"),
  ("u1", "category-browse", "electronics"),
  ("u1", "category-browse", "service"),
  ("u2", "category-browse", "accessories"),
  ("u2", "category-browse", "tablets"),
  ("u3", "category-browse", "accessories"),
  ("u3", "category-browse", "service"),
  ("u4", "category-browse", "phones"),
  ("u4", "category-browse", "tablets")))

import org.apache.mahout.math.indexeddataset.{IndexedDataset, BiDictionary}
import org.apache.mahout.sparkbindings.indexeddataset.IndexedDatasetSpark

val purchasesIDS = IndexedDatasetSpark.apply(inputRDD.filter(_._2 == "purchase").map(o => (o._1, o._3)))(sc)
val browseIDS = IndexedDatasetSpark.apply(inputRDD.filter(_._2 == "category-browse").map(o => (o._1, o._3)))(sc)

import org.apache.mahout.math.cf.SimilarityAnalysis

val llrDrmList = SimilarityAnalysis.cooccurrencesIDSs(
  Array(purchasesIDS, browseIDS),
  randomSeed = 1234,
  maxInterestingItemsPerThing = 3,
  maxNumInteractions = 4)

val llrAtA = llrDrmList(0).matrix.collect

IndexedDatasetSpark.apply( requires an RDD[(String, String)] where the first string is the 'row' (e.g. users) and the second string is the 'behavior'; so for the 'buy matrix' the columns would be 'products', but this could also be a 'gender' matrix with two columns (male/female). Then you pass an array of IndexedDatasets to SimilarityAnalysis.cooccurrencesIDSs(.
Country/region codes (iso-3166-1/iso-3166-2) to longitude and latitude
I need to convert from iso-3166-1/iso-3166-2 codes to longitude/latitude. Examples:

Input: "US", Output: (37.09024, -95.71289100000001)
Input: "VE-O", Output: (10.9970723, -63.91132959999999)

I have been searching around but failed to find a complete listing or, ideally, a Java library doing it. This GitHub project is promising but is missing geolocation information for a lot of regions. Note that, unlike the question Need a list of all countries in the world, with a longitude and latitude coordinate, this one refers to regional subdivisions (iso-3166-2).
Since I didn't get any answers, I will explain how I solved it.

I found this CSV listing for the iso-3166-1 country code centroids: http://dev.maxmind.com/geoip/legacy/codes/country_latlon/ (I had to make a few manual tweaks).

As for the iso-3166-2 region centroids, I ended up creating a shell script which uses the Google Maps API to print the region centroids in CSV format (note that I didn't verify the full output, but the cases I checked are correct). Here's a simplified version of the script using curl and jq to process the API's output:

#!/bin/bash
# Example list of regions (full list can be obtained from https://www.ip2location.com/free/iso3166-2 )
REGIONS="VE-O GB-BKM GB-CAM GB-CMA"
for REGION in $REGIONS; do
    LATLON=$(curl -s "maps.googleapis.com/maps/api/geocode/json?sensor=false&address=$REGION" \
        | jq -r '.results[0].geometry.location.lat, ",", .results[0].geometry.location.lng')
    echo $REGION , $LATLON | tr -d ' '
done

Then I imported the CSV listings in my Java code using Apache Commons CSV (see the sketch below).
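For that last step, here is a minimal sketch of loading such a CSV with Apache Commons CSV, assuming rows of the form region-code,lat,lng with no header row (the file name is a placeholder):

import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVRecord;

public class CentroidLookup {
    public static void main(String[] args) throws Exception {
        Map<String, double[]> centroids = new HashMap<>();
        try (Reader in = Files.newBufferedReader(Paths.get("region_latlon.csv"))) {
            for (CSVRecord record : CSVFormat.DEFAULT.parse(in)) {
                centroids.put(record.get(0), new double[] {
                        Double.parseDouble(record.get(1)),   // latitude
                        Double.parseDouble(record.get(2))    // longitude
                });
            }
        }
        double[] veO = centroids.get("VE-O");
        System.out.println(veO[0] + ", " + veO[1]);
    }
}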
These are the best resources I found when I had to tackle this problem:
Country coordinates: http://dev.maxmind.com/geoip/legacy/codes/country_latlon/
ISO 3166-2 coordinates: https://github.com/oodavid/iso-3166-2/blob/master/iso_3166_2.js
How can I send a query from Java to lpsolve as a String?
Hi, I formulated a linear programming problem using Java and I want to send it to be solved by lpsolve without the need to create each constraint separately. I want to send the entire block (which works well if I insert it into the IDE) and get a result. So basically, instead of using something like

problem.strAddConstraint("", LpSolve.EQ, 9);
problem.strAddConstraint("", LpSolve.LE, 5);

I want to just send this as one string:

min: 0*x11 + 0*x12 + 0*x13
x11 + x12 + x13 = 9;
x12 + x12 < 5;

Can it be done, and if so, how?
LpSolve supports LP files as well as MPS files. Everything is thoroughly detailed in the API documentation (see http://lpsolve.sourceforge.net/5.5/). You can do the job like this in Java:

LpSolve problem = LpSolve.readLp("model.lp", LpSolve.NORMAL, "test model");
problem.solve();

What is sad with file-based approaches is that you will not be able to use warm-start features. I would not suggest such an approach if you want to optimize successive similar problems.

Cheers
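Building on that, here is a minimal sketch of the file-based approach applied to the question's LP text: write the string to a temporary file, then read and solve it. It assumes the lp_solve Java wrapper (lpsolve55j) is on the classpath and that the second constraint was meant to read x12 + x13:

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

import lpsolve.LpSolve;

public class SolveFromString {
    public static void main(String[] args) throws Exception {
        // LP-format text built elsewhere in your program
        String lpText =
                "min: 0 x11 + 0 x12 + 0 x13;\n" +
                "x11 + x12 + x13 = 9;\n" +
                "x12 + x13 <= 5;\n";

        Path lpFile = Files.createTempFile("model", ".lp");
        Files.write(lpFile, lpText.getBytes(StandardCharsets.UTF_8));

        LpSolve problem = LpSolve.readLp(lpFile.toString(), LpSolve.NORMAL, "test model");
        try {
            problem.solve();
            System.out.println("objective = " + problem.getObjective());
            double[] vars = problem.getPtrVariables();
            for (int i = 0; i < vars.length; i++) {
                System.out.println("var[" + i + "] = " + vars[i]);
            }
        } finally {
            problem.deleteLp();
        }
    }
}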
How to write a Ruby-regex pattern in Java (includes recursive named-grouping)?
Well... I have a file containing TINTIN script. I already managed to grab all actions and substitutions from it to show them properly ordered on a website using Ruby, which helps me to keep an overview.

Example TINTIN script:

#substitution {You tell {([a-zA-Z,\-\ ]*)}, %*$} {<279>[<269> $sysdate[1]<279>, <269>$systime<279> |<219> Tell <279>] <269>to <219>%2<279> : <219>%3} {4}
#substitution {{([a-zA-Z,\-\ ]*)} tells you, %*$} {<279>[<269> $sysdate[1]<279>, <269>$systime<279> |<119> Tell <279>] <269>from <119>%2<279> : <119>%3} {2}
#action {Your muscles suddenly relax, and your nimbleness is gone.} {
  #if {$sw_keepaon} {
    aon;
  };
} {5}
#action {xxxxx} {
  #if {$sw_keepfamiliar} {
    familiar $familiar;
  };
} {5}

To grab them in my Ruby app I read my script file into a variable 'input' and then use the following pattern to scan the 'input':

pattern = /(?<braces>{([^{}]|\g<braces>)*}){0}^#(?<type>action|substitution)\s*(?<b1>\g<braces>)\s*(?<b2>\g<braces>)\s*(?<b3>\g<braces>)/im

input = ""
File.open("/home/igambin/lmud/lmud.tt") { |file| input = file.read }

input.scan(pattern) { |prio, type, pattern, code|
  ## here I usually create objects, but for simplicity only output now
  puts "Type    : #{type}"
  puts "Pattern : #{pattern}"
  puts "Priority: #{prio}"
  puts "Code    :\n#{code}"
  puts
}

Now my idea was to use the NetBeans platform to write a module to not only keep an overview but also to assist in editing the TINTIN script file. So, opening the file in an editor window, I still need to parse the TINTIN file and have all 'actions' and 'substitutions' from the file grabbed and displayed in an eTable, in which I could double-click on one item to open a modification window. I've set up the module and got everything ready so far; I just can't figure out how to translate the Ruby regex pattern I've written into a working Java regex pattern. It seems named-group capturing and especially the recursive application of these groups is not supported in Java. Without that I seem to be unable to find a working solution...

Here's the Ruby pattern again:

pattern = /(?<braces>{([^{}]|\g<braces>)*}){0}^#(?<type>action|substitution)\s*(?<b1>\g<braces>)\s*(?<b2>\g<braces>)\s*(?<b3>\g<braces>)/im

Can anyone help me create a Java pattern that matches the same? Many thanks in advance for tips/hints/ideas and especially for solutions or close-to-solution comments!
Your text format seems pretty simple; it's possible you don't really need recursive matching. This Java-compatible regex matches your sample data correctly, as far as I can tell: (?s)#(substitution|action)\s*\{(.*?)\}\s*\{(.*?)\}\s*\{(\d+)\} Would that work for you? If you run Java 7, you can even name the groups. ;)
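In case it helps, here is a small sketch of applying that pattern from Java 7+ with named groups added (the group names are just illustrative):

import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TintinScan {
    private static final Pattern TRIGGER = Pattern.compile(
            "(?s)#(?<type>substitution|action)\\s*\\{(?<pattern>.*?)\\}\\s*\\{(?<code>.*?)\\}\\s*\\{(?<prio>\\d+)\\}");

    public static void main(String[] args) throws Exception {
        String input = new String(Files.readAllBytes(Paths.get("/home/igambin/lmud/lmud.tt")));
        Matcher m = TRIGGER.matcher(input);
        while (m.find()) {
            System.out.println("Type    : " + m.group("type"));
            System.out.println("Pattern : " + m.group("pattern"));
            System.out.println("Priority: " + m.group("prio"));
            System.out.println("Code    :\n" + m.group("code"));
            System.out.println();
        }
    }
}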
Can anyone help me to create a java pattern that matches the same? No, no one can: Java's regex engine does not support recursive patterns (as Ruby 1.9 does).