I am running a Storm topology. This is the basic word-count topology. I am using a text file as the source and Storm for processing the data. While submitting the topology I am facing the issues below. I am very new to Storm. Please suggest the changes I need to make in the following code. Thanks in advance!
My topology
public class TopologyMain {
    public static void main(String[] args) throws InterruptedException, AlreadyAliveException, InvalidTopologyException {
        // Topology definition
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("word-reader", new WordReader());
        builder.setBolt("word-normalizer", new WordNormalizer())
                .shuffleGrouping("word-reader");
        builder.setBolt("word-counter", new WordCounter(), 1)
                .fieldsGrouping("word-normalizer", new Fields("word"));
        // Configuration
        Config conf = new Config();
        conf.put("wordsFile", args[0]);
        conf.setDebug(false);
        // Topology run
        conf.put(Config.TOPOLOGY_MAX_SPOUT_PENDING, 1);
        conf.put(Config.NIMBUS_HOST, "192.168.1.229");
        //LocalCluster cluster = new LocalCluster();
        //cluster.submitTopology("Getting-Started-Toplogie", conf, builder.createTopology());
        //Thread.sleep(1000);
        System.setProperty("storm.jar", "/home/raremile/st/examples-ch02-getting_started/target/Getting-Started-0.0.1-SNAPSHOT.jar");
        StormSubmitter.submitTopology("Count-Word-Topology-With-Refresh-Cache", conf,
                builder.createTopology());
        //cluster.shutdown();
    }
}
ERROR
Exception in thread "main" java.lang.NoSuchMethodError: backtype.storm.topology.TopologyBuilder.setBolt(Ljava/lang/String;Lbacktype/storm/topology/IBasicBolt;Ljava/lang/Integer;)Lbacktype/storm/topology/BoltDeclarer;
at TopologyMain.main(TopologyMain.java:21)
I am able to run this code in local mode without any error.
Update: I changed the Storm dependency version to 0.9.0.1 and I am able to run it. The NoSuchMethodError was a version mismatch: the topology jar was built against a Storm version whose TopologyBuilder.setBolt signature differs from the one available at runtime.
<dependency>
    <groupId>storm</groupId>
    <artifactId>storm</artifactId>
    <version>0.9.0.1</version>
</dependency>
Related
I do an HBase scan in the Mapper, and the Reducer then writes the results to HDFS.
The number of records output by the mapper is roughly 1,000,000,000.
The problem is that the number of reducers is always one, even though I have set -Dmapred.reduce.tasks=100, so the reduce process is very slow.
// edit at 2016-12-04 by 祝方泽
The code of my main class:
public class GetUrlNotSent2SpiderFromHbase extends Configured implements Tool {

    public int run(String[] arg0) throws Exception {
        Configuration conf = getConf();
        Job job = new Job(conf, conf.get("mapred.job.name"));
        String input_table = conf.get("input.table");
        job.setJarByClass(GetUrlNotSent2SpiderFromHbase.class);

        Scan scan = new Scan();
        scan.setCaching(500);
        scan.setCacheBlocks(false);
        scan.addColumn(Bytes.toBytes("info"), Bytes.toBytes("sitemap_type"));
        scan.addColumn(Bytes.toBytes("info"), Bytes.toBytes("is_send_to_spider"));

        TableMapReduceUtil.initTableMapperJob(
                input_table,
                scan,
                GetUrlNotSent2SpiderFromHbaseMapper.class,
                Text.class,
                Text.class,
                job);
        /*job.setMapperClass(GetUrlNotSent2SpiderFromHbaseMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);*/
        job.setReducerClass(GetUrlNotSent2SpiderFromHbaseReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        if (job.waitForCompletion(true) && job.isSuccessful()) {
            return 0;
        }
        return -1;
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        int res = ToolRunner.run(conf, new GetUrlNotSent2SpiderFromHbase(), args);
        System.exit(res);
    }
}
Here is the script to run this MapReduce job:
table="xxx"
output="yyy"
sitemap_type="zzz"
JOBCONF=""
JOBCONF="${JOBCONF} -Dmapred.job.name=test_for_scan_hbase"
JOBCONF="${JOBCONF} -Dinput.table=$table"
JOBCONF="${JOBCONF} -Dmapred.output.dir=$output"
JOBCONF="${JOBCONF} -Ddemand.sitemap.type=$sitemap_type"
JOBCONF="${JOBCONF} -Dyarn.app.mapreduce.am.command-opts='-Xmx8192m'"
JOBCONF="${JOBCONF} -Dyarn.app.mapreduce.am.resource.mb=9216"
JOBCONF="${JOBCONF} -Dmapreduce.map.java.opts='-Xmx1536m'"
JOBCONF="${JOBCONF} -Dmapreduce.map.memory.mb=2048"
JOBCONF="${JOBCONF} -Dmapreduce.reduce.java.opts='-Xmx1536m'"
JOBCONF="${JOBCONF} -Dmapreduce.reduce.memory.mb=2048"
JOBCONF="${JOBCONF} -Dmapred.reduce.tasks=100"
JOBCONF="${JOBCONF} -Dmapred.job.priority=VERY_HIGH"
hadoop fs -rmr $output
hadoop jar get_url_not_sent_2_spider_from_hbase_hourly.jar hourly.GetUrlNotSent2SpiderFromHbase $JOBCONF
echo "===== scan HBase finished ====="
Update: I set job.setNumReduceTasks(100); in the code and it worked.
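Roughly, the change looks like this in the driver's run() method (just a sketch, not the full code):
    // Sketch: force the number of reduce tasks in the driver, after the Job is
    // created and before waitForCompletion() is called.
    Job job = new Job(conf, conf.get("mapred.job.name"));
    job.setReducerClass(GetUrlNotSent2SpiderFromHbaseReducer.class);
    job.setNumReduceTasks(100); // 100 reduce tasks instead of the single default reducer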
Since you mentioned that only one reducer is running, that is the obvious reason why the reduce phase is so slow.
A unified way to see which configuration properties are actually applied to a job (call this for every job you execute to verify that the parameters were passed correctly): add the method below to your job driver and call it before the job is submitted. It prints the configuration entries applied from all possible sources, whether from -D or elsewhere.
public static void printConfigApplied(Configuration conf) {
    try {
        conf.writeXml(System.out);
    } catch (final IOException e) {
        e.printStackTrace();
    }
}
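Then call it right before the job is submitted, for example (illustrative placement inside your run() method):
    // Illustrative: dump the effective configuration just before submitting the job
    printConfigApplied(job.getConfiguration());
    if (job.waitForCompletion(true) && job.isSuccessful()) {
        return 0;
    }
    return -1;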
If the printed configuration does not contain your -D values, that confirms the properties are not being applied from the command line, i.e. the way you are passing them is not correct.
Since job.setNumReduceTasks(100) works when set programmatically, I strongly suspect the lines below, where the command-line properties should reach the driver:
Configuration conf = getConf();
Job job = new Job(conf, conf.get("mapred.job.name"));
Change this to follow the standard ToolRunner driver pattern.
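Something along these lines (a minimal sketch; MyDriver and the default job name are illustrative, not your actual class). With ToolRunner, GenericOptionsParser strips the -Dkey=value options from the command line and places them in the Configuration returned by getConf(), as long as the -D options appear before any positional arguments:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Minimal ToolRunner driver skeleton (illustrative names).
public class MyDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // getConf() already contains every -Dkey=value parsed by GenericOptionsParser.
        Configuration conf = getConf();
        Job job = Job.getInstance(conf, conf.get("mapred.job.name", "my-job"));
        job.setJarByClass(MyDriver.class);
        // ... mapper/reducer/input/output setup goes here ...
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new MyDriver(), args));
    }
}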
When I'm trying to run my code it throws this Exception:
Exception in thread "main" org.apache.spark.SparkException: Could not parse Master URL:spark:http://localhost:18080
This is my code:
SparkConf conf = new SparkConf().setAppName("App_Name").setMaster("spark:http://localhost:18080").set("spark.ui.port","18080");
JavaStreamingContext ssc = new JavaStreamingContext(sc, new Duration(1000));
String[] filet=new String[]{"Obama","ISI"};
JavaReceiverInputDStream<Status> reciverStream=TwitterUtils.createStream(ssc,filet);
JavaDStream<String> statuses = reciverStream.map(new Function<Status, String>() {
public String call(Status status) { return status.getText(); }
}
);
ssc.start();
ssc.awaitTermination();}}
Any idea how can I fix this problem?
The problem is that you specify two schemes in the URL you pass to SparkConf.setMaster().
spark is the scheme, so you don't need to add http after it. See the javadoc of SparkConf.setMaster() for more examples.
So the master URL you should be using is "spark://localhost:18080". Change this line:
SparkConf conf = new SparkConf().setAppName("App_Name")
.setMaster("spark://localhost:18080").set("spark.ui.port","18080");
Also note that the standard port for the standalone master is 7077, not 18080, so you may want to try 7077.
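For example (a sketch assuming a standalone master running on the default port; adjust the host and port to your setup):
SparkConf conf = new SparkConf()
        .setAppName("App_Name")
        .setMaster("spark://localhost:7077")   // "spark://" is the scheme, no "http"
        .set("spark.ui.port", "18080");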
Currently my cluster consists of 2 ZooKeeper nodes, 3 Kafka nodes, and 3 Storm nodes.
In my topology I configure the ZooKeeper IPs/ports by loading a properties file. This works in local mode, but in cluster mode it does not.
My Maven project structure is:
project
src/main/java
src/main/test
src/main/resources
Inside the resources directory I have the files zoo.config and log4j.config.
My topology is
public class Mytopology {

    public static void main(String[] args) throws AlreadyAliveException,
            InvalidTopologyException, FileNotFoundException, IOException {

        /** PropertiesConfigurator is used to configure the logger from a properties file */
        Properties prop = new Properties();
        PropertyConfigurator.configure("src/main/resources/log4j.properties");
        String zoo_cluster = null;
        int zoo_cluster_timeout_ms;

        /** Topology definition */
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("kafka", new KafkaSpout(), 1);
        builder.setBolt("bolt1", new bolt1(), 1).shuffleGrouping("kafka");
        builder.setBolt("bolt2", new bolt2(), 1).shuffleGrouping("bolt1");

        /** Create a Storm Config object */
        Config config = new Config();
        prop.load(new FileInputStream("src/main/resources/zookeeper-config.properties"));
        zoo_cluster = prop.getProperty("zookeeper.connect");
        String zoo_cluster_timeout = prop.getProperty("consumer.timeout.ms");
        zoo_cluster_timeout_ms = Integer.parseInt(zoo_cluster_timeout);
        config.put("kafka.zookeeper.connect", zoo_cluster);
        config.put("kafka.consumer.timeout.ms", zoo_cluster_timeout_ms);

        /** Submit topology to the cluster */
        if (args != null && args.length > 2) {
            StormSubmitter.submitTopology(args[2], config, builder.createTopology());
            System.out.println("Topology submitted into Storm Cluster ........");
        }
        /** Submit topology in local mode */
        else if (args != null && args.length > 1) {
            LocalCluster cluster = new LocalCluster();
            cluster.submitTopology("storm-local", config,
                    builder.createTopology());
            System.out.println("Topology submitted into Local Mode........");
            Utils.sleep(100000);
        }
    }
}
Please help me: how do I read the property files (both the ZooKeeper config and the log4j config for error logs) when running in cluster mode?
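One note that may be relevant (a sketch of a common approach, not verified against this exact setup): paths like src/main/resources/... exist only on the development machine, not on the Nimbus or worker nodes, so in cluster mode the files are usually read from the topology jar's classpath instead, for example:
// Sketch: load the properties from the topology jar's classpath instead of a
// src/main/resources path, which does not exist on the cluster machines.
Properties prop = new Properties();
java.io.InputStream in = Mytopology.class.getClassLoader()
        .getResourceAsStream("zookeeper-config.properties");
if (in == null) {
    throw new java.io.FileNotFoundException("zookeeper-config.properties not found on classpath");
}
prop.load(in);
in.close();
String zoo_cluster = prop.getProperty("zookeeper.connect");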
A few days ago I decided to run Selenium WebDriver (a third-party package) inside Hadoop's MapReduce framework, and I ran into a problem: the map step freezes at new FirefoxDriver(). The FirefoxDriver class is in the third-party jar selenium-server-standalone-2.38.0.jar. If you have experience with this, or are interested, I need your help!
Some Details:
Problem details:
In order to run the code from the command line, I use Xvfb to suppress the Firefox graphical interface. Then the problem described above arises. Going through the tasktracker's log, I found that the code freezes at this.driver = new FirefoxDriver(ffprofile);. Although the code freezes, Firefox has been started; I checked with ps -A | grep firefox.
Environment:
Ubuntu 10.04 32-bit; hadoop-1.2.0; Mozilla Firefox 17.0.5; selenium-server-standalone-2.38.0.jar; Xvfb.
Hints:
(1) Hadoop runs in pseudo-distributed mode;
(2) Everything is OK when I run the code in Eclipse: Firefox pops up as planned (I show the demo code at the end);
(3) If you run into org.openqa.selenium.WebDriverException: Unable to bind to locking port 7054 within 45000 ms, use the command ps -A | grep firefox to check whether a Firefox instance is still running, and use killall firefox;
(4) When the code runs from the command line you may see Error no display specified. You can install Xvfb and start it with Xvfb :99 -ac 2>/dev/null &. Before starting Xvfb, append the line export DISPLAY=:99 to the end of HADOOP_HOME/conf/hadoop-env.sh.
Code demo:
public class MapRunnerNewFirefox extends Configured implements Tool, MapRunnable{
public static final Logger LOG = LoggerFactory.getLogger(MapRunnerNewFirefox.class);
@Override
public void configure(JobConf conf) {
}
@Override
public void run(RecordReader recordReader,
OutputCollector output, Reporter reporter) throws IOException {
LongWritable key = new LongWritable(-1);// shouldn't be null ,otherwise the recordReader will report nullpointer err;
Text val = new Text("begin text"); // same as up line;
int i = 0;
reporter.progress();
while(recordReader.next(key, val)){
if(LOG.isInfoEnabled()){
LOG.info("key: "+key.toString()+" val: "+val.toString());
}
String temp = "ao";
NewFirefox ff = new NewFirefox("/home/cc/firefox/firefox/firefox");
output.collect(new Text("get-"+key.toString()), new Text(temp));
}
}
@Override
public int run(String[] args) throws Exception {
if(LOG.isInfoEnabled()) {
LOG.info("set maprunner conf");
}
Path urlDir = new Path(args[0]);
Path resultDir = new Path(args[1] + System.currentTimeMillis());
JobConf job = new JobConf(getConf());
job.setNumMapTasks(1);
job.setJobName("hello maprunners");
job.setInputFormat(TextInputFormat.class);
FileInputFormat.addInputPath(job, urlDir);
job.setMapRunnerClass(MapRunnerNewFirefox.class);
FileOutputFormat.setOutputPath(job, resultDir);
job.setOutputFormat(TextOutputFormat.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
JobClient.runJob(job);
return 0;
}
public static void main(String[] args) throws Exception{
Configuration conf = new Configuration();
int res = ToolRunner.run(conf, new MapRunnerNewFirefox(), args);
System.exit(res);
}
}
public class NewFirefox {
private WebDriver driver;
private static final Logger LOG = LoggerFactory.getLogger(NewFirefox.class);
public NewFirefox(String firefoxPath){
if(LOG.isInfoEnabled()){
LOG.info("firefox ****0");
}
System.setProperty("webdriver.firefox.bin", firefoxPath);
if(LOG.isInfoEnabled()){
LOG.info("firefox ****1");
}
ProfilesIni profile = new ProfilesIni();
FirefoxProfile ffprofile = profile.getProfile("default");
if(LOG.isInfoEnabled()){
LOG.info("firefox ****2");
}
this.driver = new FirefoxDriver(ffprofile);
if(LOG.isInfoEnabled()){
LOG.info("firefox ****3");
}
this.driver.quit();
if(LOG.isInfoEnabled()){
LOG.info("firefox quit");
}
}
}
I have created a topology which should read from a file and write to a new file. My program runs properly on a local cluster, but when submitting to a remote cluster I do not get any error, yet the file is not created. Below is my code for submitting the topology to the remote cluster:
public static void main(String[] args) {
    final Logger logger = LoggingService.getLogger(FileToFileTopology.class.getName());
    try {
        Properties prop = new Properties();
        prop.load(new FileInputStream(args[0] + "/connection.properties"));
        LoggingService.generateAppender("storm_etl", prop, "");
        logger.info("inside main method...." + args.length);
        System.out.println("inside main sys out");

        Config conf = new Config();
        conf.setDebug(false);
        conf.put(Config.TOPOLOGY_MAX_SPOUT_PENDING, 1);

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("file-reader", new FileReaderSpout(args[1]));
        builder.setBolt("file-writer", new WriteToFileBolt(args[1]), 2).shuffleGrouping("file-reader");

        logger.info("submitting topology");
        StormSubmitter.submitTopology(args[0], conf, builder.createTopology());
    } catch (Exception e) {
        System.out.println("inside catch");
        logger.info("inside catch" + e.getMessage());
        logger.error("inside error", e);
        e.printStackTrace();
    }
}
I have also used log4j to create my own log file for the topology. The log file gets created, but there is no error in it. Please help.
I had the same issue with Hortonworks 2.2. It happened because of permissions.
Even if you submit to the cluster as the root user, the storm jar command executes as the 'storm' user. It can read the file from the source, but it won't write, because it doesn't have the necessary rights.
Modify the permissions of the destination folder where you want to write the file so that it grants all permissions:
chmod 777 -R /user/filesfolder