I am following the book Apache Mahout Cookbook by Piero Giacomelli. When I download the Maven sources using NetBeans as the IDE, I suspect the sources are from Mahout version 1.0 rather than 0.8, because an error is shown on the SlopeOneRecommender import alone.
Here is the complete code -
package com.packtpub.mahout.cookbook.chapter01;
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.List;
import org.apache.commons.cli2.OptionException;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.common.LongPrimitiveIterator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.CachingRecommender;
import org.apache.mahout.cf.taste.impl.recommender.slopeone.SlopeOneRecommender;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
public class App {
static final String inputFile = "/home/hadoop/ml-1m/ratings.dat";
static final String outputFile = "/home/hadoop/ml-1m/ratings.csv";
public static void main( String[] args ) throws IOException, TasteException, OptionException
{
CreateCsvRatingsFile();
// create data source (model) - from the csv file
File ratingsFile = new File(outputFile);
DataModel model = new FileDataModel(ratingsFile);
// create a simple recommender on our data
CachingRecommender cachingRecommender = new CachingRecommender(new SlopeOneRecommender(model));
// for all users
for (LongPrimitiveIterator it = model.getUserIDs(); it.hasNext();){
long userId = it.nextLong();
// get the recommendations for the user
List<RecommendedItem> recommendations = cachingRecommender.recommend(userId, 10);
// if empty write something
if (recommendations.size() == 0){
System.out.print("User ");
System.out.print(userId);
System.out.println(": no recommendations");
}
// print the list of recommendations for each
for (RecommendedItem recommendedItem : recommendations) {
System.out.print("User ");
System.out.print(userId);
System.out.print(": ");
System.out.println(recommendedItem);
}
}
}
private static void CreateCsvRatingsFile() throws FileNotFoundException, IOException {
BufferedReader br = new BufferedReader(new FileReader(inputFile));
BufferedWriter bw = new BufferedWriter(new FileWriter(outputFile));
String line = null;
String line2write = null;
String[] temp;
int i = 0;
while (
(line = br.readLine()) != null
&& i < 1000
){
i++;
temp = line.split("::");
line2write = temp[0] + "," + temp[1];
bw.write(line2write);
bw.newLine();
bw.flush();
}
br.close();
bw.close();
}
}
The error is shown only on import org.apache.mahout.cf.taste.impl.recommender.slopeone.SlopeOneRecommender;
and consequently on the line where I create an object of that class. The error shown is "package does not exist".
Please help. Is it because I am using a newer version of Mahout? I am not even certain whether I am using version 0.8 or a higher version, as I followed all the links given in the book.
SlopeOneRecommender was removed from Mahout as of version 0.8. If you want to use it, switch back to an earlier version such as 0.7:
<dependency>
<groupId>org.apache.mahout</groupId>
<artifactId>mahout-core</artifactId>
<version>0.7</version>
</dependency>
See http://permalink.gmane.org/gmane.comp.apache.mahout.user/20282
Exactly. SlopeOneRecommender was removed from Mahout as of version 0.8, so either go back to version 0.7, or, if your purpose is only to try out Mahout, use one of the other recommenders, such as ItemAverageRecommender.
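For example, a minimal sketch of the swap in the code above (assuming Mahout 0.8 or later, where ItemAverageRecommender is available in the same impl.recommender package):
import org.apache.mahout.cf.taste.impl.recommender.ItemAverageRecommender;
// model is the FileDataModel built from ratings.csv, as in the question
// ItemAverageRecommender estimates a preference from the item's average rating
CachingRecommender cachingRecommender = new CachingRecommender(new ItemAverageRecommender(model));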
I'm trying to create and save a generated model directly from Java. The documentation specifies how to do this in R and Python, but not in Java. A similar question was asked before, but no real answer was provided (beyond linking to the H2O docs, which don't contain a code example).
It'd be sufficient for my present purpose to get some pointers for translating the following reference code to Java. I'm mainly looking for guidance on the relevant JAR(s) to import from the Maven repository.
import h2o
h2o.init()
path = h2o.system_file("prostate.csv")
h2o_df = h2o.import_file(path)
h2o_df['CAPSULE'] = h2o_df['CAPSULE'].asfactor()
model = h2o.glm(y = "CAPSULE",
x = ["AGE", "RACE", "PSA", "GLEASON"],
training_frame = h2o_df,
family = "binomial")
h2o.download_pojo(model)
I think I've figured out an answer to my question. A self-contained code sample follows. However, I'd still appreciate an answer from the community, since I don't know if this is the best/idiomatic way to do it.
package org.name.company;
import hex.glm.GLMModel;
import water.H2O;
import water.Key;
import water.api.StreamWriter;
import water.api.StreamingSchema;
import water.fvec.Frame;
import water.fvec.NFSFileVec;
import hex.glm.GLMModel.GLMParameters.Family;
import hex.glm.GLMModel.GLMParameters;
import hex.glm.GLM;
import water.util.JCodeGen;
import java.io.*;
import java.util.Map;
public class Launcher
{
public static void initCloud(){
String[] args = new String [] {"-name", "h2o_test_cloud"};
H2O.main(args);
H2O.waitForCloudSize(1, 10 * 1000);
}
public static void main( String[] args ) throws Exception {
// Initialize the cloud
initCloud();
// Create a Frame object from CSV
File f = new File("/path/to/data.csv");
NFSFileVec nfs = NFSFileVec.make(f);
Key frameKey = Key.make("frameKey");
Frame fr = water.parser.ParseDataset.parse(frameKey, nfs._key);
// Create a GLM and output coefficients
Key modelKey = Key.make("modelKey");
try {
GLMParameters params = new GLMParameters();
params._train = frameKey;
params._response_column = fr.names()[1];
params._intercept = true;
params._lambda = new double[]{0};
params._family = Family.gaussian;
GLMModel model = new GLM(params).trainModel().get();
Map<String, Double> coefs = model.coefficients();
for(Map.Entry<String, Double> entry : coefs.entrySet()) {
System.out.format("%s: %f\n", entry.getKey(), entry.getValue());
}
String filename = JCodeGen.toJavaId(model._key.toString()) + ".java";
StreamingSchema ss = new StreamingSchema(model.new JavaModelStreamWriter(false), filename);
StreamWriter sw = ss.getStreamWriter();
OutputStream os = new FileOutputStream("/base/path/" + filename);
sw.writeTo(os);
} finally {
if (fr != null) {
fr.remove();
}
}
}
}
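On the Maven side, a dependency sketch that should cover the classes used above (h2o-core for the water.* classes, h2o-algos for hex.glm; the version is an assumption, so match it to your H2O release):
<dependency>
    <groupId>ai.h2o</groupId>
    <artifactId>h2o-core</artifactId>
    <version>3.14.0.2</version>
</dependency>
<dependency>
    <groupId>ai.h2o</groupId>
    <artifactId>h2o-algos</artifactId>
    <version>3.14.0.2</version>
</dependency>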
Would something like this do the trick?
public void saveModel(URI uri, Keyed<Frame> model)
{
Persist p = H2O.getPM().getPersistForURI(uri);
OutputStream os = p.create(uri.toString(), true);
model.writeAll(new AutoBuffer(os, true)).close();
}
Make sure the URI has a proper form, otherwise H2O will break with an NPE. As for Maven, you should be able to get away with just h2o-core:
<dependency>
<groupId>ai.h2o</groupId>
<artifactId>h2o-core</artifactId>
<version>3.14.0.2</version>
</dependency>
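A hypothetical usage sketch (the path is made up; the point is that the URI needs a scheme such as file:// so H2O's persistence layer can resolve it):
URI uri = URI.create("file:///tmp/glm_model.bin"); // java.net.URI with an explicit scheme, not a bare path
saveModel(uri, model); // model is the Keyed object you want to persist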
I'm trying to write some data from an API (the Google stocks/finance API) to my AWS Firehose stream. I have already downloaded and installed the AWS plugin for Eclipse, set up my Firehose stream on AWS, and everything seems to be configured correctly. I am encountering some problems, though. The following line seems to be deprecated; I tried different variations from Amazon's SDK, but I can't seem to get the correct code.
AmazonKinesisFirehoseClient firehoseClient = new
AmazonKinesisFirehoseClient(credentials);
Next, I'm getting some errors with the following. The specific error is, "The method setRecord(Record) is undefined for the type PutRecordRequest," even though I took it directly from Amazon's API reference.
request.setRecord(record);
firehoseClient.putRecord(request);
Also getting an error on the second line above: "The method putRecord(com.amazonaws.services.kinesisfirehose.model.PutRecordRequest) in the type AmazonKinesisFirehoseClient is not applicable for the arguments (com.amazonaws.services.kinesis.model.PutRecordRequest)"
package com.amazonaws.samples;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.ByteBuffer;
import org.apache.http.client.CredentialsProvider;
import com.amazonaws.*;
import com.amazonaws.AmazonClientException;
import com.amazonaws.auth.AWSCredentials;
import com.amazonaws.auth.AWSCredentialsProvider;
import com.amazonaws.auth.profile.ProfileCredentialsProvider;
import com.amazonaws.client.builder.AwsClientBuilder;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient;
import com.amazonaws.services.kinesis.AmazonKinesis;
import com.amazonaws.services.kinesis.AmazonKinesisClient;
import com.amazonaws.services.kinesis.AmazonKinesisClientBuilder;
import com.amazonaws.services.kinesis.clientlibrary.interfaces.IRecordProcessorFactory;
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream;
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.KinesisClientLibConfiguration;
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.Worker;
import com.amazonaws.services.kinesis.model.PutRecordRequest;
import com.amazonaws.services.kinesis.model.ResourceNotFoundException;
import com.amazonaws.services.kinesisfirehose.AmazonKinesisFirehoseClient;
import com.amazonaws.services.kinesisfirehose.model.PutRecordBatchRequest;
import com.amazonaws.services.kinesisfirehose.model.Record;
public class FirehoseExample {
public static void main(String[] args) {
AWSCredentials credentials = null;
try {
credentials = new ProfileCredentialsProvider().getCredentials();
}
catch (Exception e) {
throw new AmazonClientException("Cannot load the credentials from the credential profiles file. "
+ "Please make sure that your credentials file is at the correct "
+ "location (/Users/elybenari/.aws/credentials), and is in valid format.", e);
}
AmazonKinesisFirehoseClient firehoseClient = new AmazonKinesisFirehoseClient(credentials);
PutRecordRequest request = new PutRecordRequest();
request.setStreamName("project-stream");
Record record = new Record();
for (int i = 0; i < 20*60; i++){
try {
URL url = new URL("https://www.google.com/finance/info?q=NASDAQ:AMZN");
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
BufferedReader reader = new BufferedReader(new InputStreamReader(conn.getInputStream()));
StringBuilder response = new StringBuilder();
String line;
while ((line = reader.readLine()) != null) {
response.append(line);
}
reader.close();
System.out.println(response.toString().replace("\n", "").replaceAll(" ", ""));
System.out.println("****\n");
ByteBuffer buffer = ByteBuffer.wrap(response.toString().replace("\n", "").replaceAll(" ", "").getBytes());
record.setData(buffer);
request.setRecord(record);
firehoseClient.putRecord(request);
Thread.sleep(2000);
}
catch(Exception e){
e.printStackTrace();
}
}
}
}
The problem is that you've imported some classes from the Kinesis Java package rather than the Kinesis Firehose one. For example, you've used:
import com.amazonaws.services.kinesis.model.PutRecordRequest;
Whereas you should have used:
import com.amazonaws.services.kinesisfirehose.model.PutRecordRequest;
Kinesis, Kinesis Firehose and Kinesis Analytics are different services, even though they fall under one umbrella of streaming services on AWS. Consequently, they have different package namespaces in the Java SDK. If you start from the official documentation here, you'll reach the correct Java SDK reference here.
EDIT: To answer your other question: yes, the following is deprecated:
AmazonKinesisFirehoseClient firehoseClient = new AmazonKinesisFirehoseClient(credentials);
You should instead use the following:
AmazonKinesisFirehose firehoseClient = AmazonKinesisFirehoseClientBuilder.standard().withCredentials(new AWSStaticCredentialsProvider(awsCredentials)).build();
Refer to the official documentation here on how to correctly initialize AmazonKinesisFirehoseClient.
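Putting both fixes together, a minimal sketch of the corrected client setup and put call (AWS SDK for Java v1 class names; the delivery stream name "project-stream" is taken from the question, the payload is a placeholder):
import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.profile.ProfileCredentialsProvider;
import com.amazonaws.services.kinesisfirehose.AmazonKinesisFirehose;
import com.amazonaws.services.kinesisfirehose.AmazonKinesisFirehoseClientBuilder;
import com.amazonaws.services.kinesisfirehose.model.PutRecordRequest;
import com.amazonaws.services.kinesisfirehose.model.Record;
import java.nio.ByteBuffer;
// build the client through the builder instead of the deprecated constructor
AmazonKinesisFirehose firehose = AmazonKinesisFirehoseClientBuilder.standard()
        .withCredentials(new AWSStaticCredentialsProvider(new ProfileCredentialsProvider().getCredentials()))
        .build();
// the Firehose model classes use a delivery stream name, not setStreamName
Record record = new Record().withData(ByteBuffer.wrap("payload".getBytes()));
PutRecordRequest request = new PutRecordRequest()
        .withDeliveryStreamName("project-stream")
        .withRecord(record);
firehose.putRecord(request);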
So I'm trying to use jsoup to scrape Reddit for images, but when I scrape certain subreddits such as /r/wallpaper, I get a 429 error and am wondering how to fix this. Totally understand that this code is horrible and this is a pretty noob question, but I'm completely new to this. Anyways:
import java.io.*;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import java.util.Scanner;
import java.util.logging.Level;
import java.util.logging.Logger;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Attributes;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
public class javascraper{
public static void main (String[]args) throws MalformedURLException
{
Scanner scan = new Scanner (System.in);
System.out.println("Where do you want to store the files?");
String folderpath = scan.next();
System.out.println("What subreddit do you want to scrape?");
String subreddit = scan.next();
subreddit = ("http://reddit.com/r/" + subreddit);
new File(folderpath + "/" + subreddit).mkdir();
//test
try{
//gets http protocol
Document doc = Jsoup.connect(subreddit).timeout(0).get();
//get page title
String title = doc.title();
System.out.println("title : " + title);
//get all links
Elements links = doc.select("a[href]");
for(Element link : links){
//get value from href attribute
String checkLink = link.attr("href");
Elements images = doc.select("img[src~=(?i)\\.(png|jpe?g|gif)]");
if (imgCheck(checkLink)){ // checks to see if the link is an image link
System.out.println("link : " + link.attr("href"));
downloadImages(checkLink, folderpath);
}
}
}
catch (IOException e){
e.printStackTrace();
}
}
public static boolean imgCheck(String http){
String png = ".png";
String jpg = ".jpg";
String jpeg = "jpeg"; // no period so the checker will only check the last four characters
String gif = ".gif";
int length = http.length();
if (http.contains(png)|| http.contains("gfycat") || http.contains(jpg)|| http.contains(jpeg) || http.contains(gif)){
return true;
}
else{
return false;
}
}
private static void downloadImages(String src, String folderpath) throws IOException{
String folder = null;
//Exctract the name of the image from the src attribute
int indexname = src.lastIndexOf("/");
if (indexname == src.length()) {
src = src.substring(1, indexname);
}
indexname = src.lastIndexOf("/");
String name = src.substring(indexname, src.length());
System.out.println(name);
//Open a URL Stream
URL url = new URL(src);
InputStream in = url.openStream();
OutputStream out = new BufferedOutputStream(new FileOutputStream( folderpath+ name));
for (int b; (b = in.read()) != -1;) {
out.write(b);
}
out.close();
in.close();
}
}
Your issue is caused by the fact that your scraper is violating reddit's API rules. Error 429 means "Too many requests": you're requesting too many pages too fast.
You can make one request every 2 seconds, and you also need to set a proper user agent (the format they recommend is <platform>:<app ID>:<version string> (by /u/<reddit username>)). As it currently stands, your code runs too fast and doesn't specify a user agent, so it's going to be severely rate-limited.
To fix it, first off, add this to the start of your class, before the main method:
public static final String USER_AGENT = "<PUT YOUR USER AGENT HERE>";
(Make sure to specify an actual user agent).
Then, change this (in downloadImages)
URL url = new URL(src);
InputStream in = url.openStream();
to this:
URLConnection connection = (new URL(src)).openConnection();
try {
    Thread.sleep(2000); // delay to comply with rate limiting
} catch (InterruptedException e) {
    Thread.currentThread().interrupt(); // Thread.sleep is checked, so handle the interrupt here
}
connection.setRequestProperty("User-Agent", USER_AGENT);
InputStream in = connection.getInputStream();
You'll also want to change this (in main)
Document doc = Jsoup.connect(subreddit).timeout(0).get();
to this:
Document doc = Jsoup.connect(subreddit).userAgent(USER_AGENT).timeout(0).get();
Then your code should stop running into that error.
Note that using reddit's API (i.e., /r/subreddit/.json instead of /r/subreddit) would probably make this project easier, but it isn't required and your current code will work. A quick sketch of fetching the JSON listing follows.
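For example (a sketch only; subreddit here is just the subreddit name, and USER_AGENT is the constant introduced above):
// fetch the raw JSON listing for a subreddit instead of scraping the HTML page
String json = Jsoup.connect("http://reddit.com/r/" + subreddit + "/.json")
        .userAgent(USER_AGENT)
        .ignoreContentType(true) // the response is JSON, not HTML, so tell jsoup not to reject it
        .execute()
        .body();
// parse the image URLs out of the listing with any JSON library of your choice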
As you can look up on Wikipedia, the 429 status code means you have sent too many requests:
The user has sent too many requests in a given amount of time. Intended for use with rate limiting schemes.
A solution would be to slow down your scraper. There are several ways to do this; one would be to sleep between requests, as sketched below.
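A minimal sketch (the 2-second pause mirrors the one-request-every-2-seconds limit mentioned in the other answer):
// call this once before each request so the scraper stays under the rate limit
private static void throttle() {
    try {
        Thread.sleep(2000); // roughly one request every two seconds
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }
}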
Can I use the Java API to connect to HBase in standalone mode (without Hadoop)?
Here is my code; I was wondering how to make it work. Should I set some property on the 'config' variable?
I have these installed locally: HBase 0.98.0 and Hadoop 2.2.0.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;
public class MyLittleHBaseClient {
public static void main(String[] args) throws IOException {
// maybe I should do some configuration here, but I don't know how
Configuration config = HBaseConfiguration.create();
HTable table = new HTable(config, "myLittleHBaseTable");
Put p = new Put(Bytes.toBytes("myLittleRow"));
p.add(Bytes.toBytes("myLittleFamily"), Bytes.toBytes("someQualifier"),
Bytes.toBytes("Some Value"));
table.put(p);
Get g = new Get(Bytes.toBytes("myLittleRow"));
Result r = table.get(g);
byte [] value = r.getValue(Bytes.toBytes("myLittleFamily"),
Bytes.toBytes("someQualifier"));
String valueStr = Bytes.toString(value);
System.out.println("GET: " + valueStr);
Scan s = new Scan();
s.addColumn(Bytes.toBytes("myLittleFamily"), Bytes.toBytes("someQualifier"));
ResultScanner scanner = table.getScanner(s);
try {
for (Result rr = scanner.next(); rr != null; rr = scanner.next()) {
System.out.println("Found row: " + rr);
}
} finally {
scanner.close();
}
}
}
If your hbase-site.xml in standalone mode is empty, you don't have to set anything. If you have overridden anything in hbase-site.xml, it is better to add that hbase-site.xml as a resource instead of setting each parameter separately:
Configuration config = HBaseConfiguration.create();
config.addResource(new Path("<HBASE_CONF_DIR_PATH>/hbase-site.xml")); // a Path points at the file itself; the String overload looks on the classpath
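Alternatively, if you prefer setting the key properties in code, here is a sketch assuming a default standalone setup where HBase manages its own ZooKeeper on localhost:
Configuration config = HBaseConfiguration.create();
config.set("hbase.zookeeper.quorum", "localhost"); // standalone HBase runs ZooKeeper locally
config.set("hbase.zookeeper.property.clientPort", "2181"); // default client port, change if overridden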
I have a program that asks the user which application to open. This is how it works:
1. The user writes the application they want to open in an input dialog, for example "Open application Notepad".
2. The program looks for the word "application" in the text file, so it is sure that the user wanted to open an application.
3. Both the "Open application" sentence and the application name get stored in a text file.
4. The program then removes "Open application" from the text file, so only the application name remains.
But a space always ends up in front of the application name. Please help me remove the space in front of the application name!
Here is my code:
package Test_Code;
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;
import javax.swing.JOptionPane;
public class New_Loader_3 {
public static void main(String[]args) throws IOException{
String Test = JOptionPane.showInputDialog("Test");
BufferedWriter writer = new BufferedWriter(new FileWriter("/Applications/Userdata/tmp/Application.txt"));
writer.write(Test);
writer.close();
int tokencount;
FileReader fr=new FileReader("/Applications/Userdata/tmp/Application.txt");
BufferedReader br=new BufferedReader(fr);
String s1;
int linecount=0;
String line;
String words[]=new String[500];
while ((s1=br.readLine())!=null)
{
linecount++;
int indexfound=s1.indexOf("application");
if (indexfound>-1)
{
FileInputStream fstream1121221 = new FileInputStream("/Applications/Userdata/tmp/Application.txt");
DataInputStream in1121211 = new DataInputStream(fstream1121221);
BufferedReader br1112211 = new BufferedReader(new InputStreamReader(in1121211));
String Name12122131;
while ((Name12122131 = br1112211.readLine()) != null) {
if (Name12122131.startsWith(" "))
{
System.out.println("Name12122131");
}
}
String mega = Test.replaceAll("Open application","");
System.out.println(mega);
BufferedWriter Update_Catch = new BufferedWriter(new FileWriter("/Applications/Userdata/tmp/Application.txt"));
Update_Catch.write(mega);
Update_Catch.close();
}
}
System.out.println("Done");
}
}
It's because the user types in Open<space>application<space>Notepad. When you replace Open<space>application, the space before Notepad is still left. So I suggest you use this instead:
String mega = Test.replaceAll("Open application ","");
Adding a <space> at the end of Open<space>application replaces that space too, so mega will now be Notepad.
Otherwise you could keep what you already have and simply call mega.trim(), as in the sketch below.
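For example, a sketch using the variable names from the question:
String mega = Test.replaceAll("Open application", "").trim(); // trim() strips the leading space left behind
System.out.println(mega); // prints "Notepad" for the input "Open application Notepad"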