My cluster currently consists of 2 ZooKeeper nodes, 3 Kafka nodes and 3 Storm nodes.
In my topology I configure ZooKeeper through a properties file loader: I load the ZooKeeper IPs/ports from a properties file. This works in local mode, but not in cluster mode.
My Maven project structure is:
project
  src/main/java
  src/main/test
  src/main/resources
Inside the resources directory I have the ZooKeeper and log4j properties files (zookeeper-config.properties and log4j.properties in the code below).
My topology is
public class Mytopology {
    public static void main(String[] args) throws AlreadyAliveException,
            InvalidTopologyException, FileNotFoundException, IOException {

        /** PropertyConfigurator is used to configure the logger from a properties file */
        Properties prop = new Properties();
        PropertyConfigurator.configure("src/main/resources/log4j.properties");
        String zoo_cluster = null;
        int zoo_cluster_timeout_ms;

        /** Topology definition */
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("kafka", new KafkaSpout(), 1);
        builder.setBolt("bolt1", new bolt1(), 1).shuffleGrouping("kafka");
        builder.setBolt("bolt2", new bolt2(), 1).shuffleGrouping("bolt1");

        /** Create a Storm Config object */
        Config config = new Config();
        prop.load(new FileInputStream("src/main/resources/zookeeper-config.properties"));
        zoo_cluster = prop.getProperty("zookeeper.connect");
        String zoo_cluster_timeout = prop.getProperty("consumer.timeout.ms");
        zoo_cluster_timeout_ms = Integer.parseInt(zoo_cluster_timeout);
        config.put("kafka.zookeeper.connect", zoo_cluster);
        config.put("kafka.consumer.timeout.ms", zoo_cluster_timeout_ms);

        /** Submit topology to the cluster */
        if (args != null && args.length > 2) {
            StormSubmitter.submitTopology(args[2], config, builder.createTopology());
            System.out.println("Topology submitted into Storm Cluster ........");
        }
        /** Submit topology locally */
        else if (args != null && args.length > 1) {
            LocalCluster cluster = new LocalCluster();
            cluster.submitTopology("storm-local", config, builder.createTopology());
            System.out.println("Topology submitted into Local Mode........");
            Utils.sleep(100000);
        }
    }
}
Please help: how can I read the property files (both the ZooKeeper config and the error-log/log4j config) in cluster mode?
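One likely issue is that paths like src/main/resources/... exist only on the machine where the topology is built, not on the Storm worker nodes. A minimal sketch of loading both files from the classpath instead, so they are packaged inside the topology jar (this is an assumption about the fix, not something stated above):

// assumes zookeeper-config.properties and log4j.properties end up on the jar's classpath
// (they do, if they live under src/main/resources in a standard Maven build)
Properties prop = new Properties();
try (InputStream in = Mytopology.class.getClassLoader()
        .getResourceAsStream("zookeeper-config.properties")) {
    prop.load(in);
}
// log4j's PropertyConfigurator also accepts a classpath URL
PropertyConfigurator.configure(
        Mytopology.class.getClassLoader().getResource("log4j.properties"));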
I do an HBase scan in the Mapper, then the Reducer writes the result to HDFS.
The number of records output by the mapper is roughly 1,000,000,000.
The problem is that the number of reducers is always one, even though I have set -Dmapred.reduce.tasks=100. The reduce phase is very slow.
// edited 2016-12-04 by 祝方泽
The code of my main class:
public class GetUrlNotSent2SpiderFromHbase extends Configured implements Tool {

    public int run(String[] arg0) throws Exception {
        Configuration conf = getConf();
        Job job = new Job(conf, conf.get("mapred.job.name"));
        String input_table = conf.get("input.table");
        job.setJarByClass(GetUrlNotSent2SpiderFromHbase.class);

        Scan scan = new Scan();
        scan.setCaching(500);
        scan.setCacheBlocks(false);
        scan.addColumn(Bytes.toBytes("info"), Bytes.toBytes("sitemap_type"));
        scan.addColumn(Bytes.toBytes("info"), Bytes.toBytes("is_send_to_spider"));

        TableMapReduceUtil.initTableMapperJob(
                input_table,
                scan,
                GetUrlNotSent2SpiderFromHbaseMapper.class,
                Text.class,
                Text.class,
                job);

        /*job.setMapperClass(GetUrlNotSent2SpiderFromHbaseMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);*/

        job.setReducerClass(GetUrlNotSent2SpiderFromHbaseReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        if (job.waitForCompletion(true) && job.isSuccessful()) {
            return 0;
        }
        return -1;
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        int res = ToolRunner.run(conf, new GetUrlNotSent2SpiderFromHbase(), args);
        System.exit(res);
    }
}
Here is the script used to run this MapReduce job:
table="xxx"
output="yyy"
sitemap_type="zzz"
JOBCONF=""
JOBCONF="${JOBCONF} -Dmapred.job.name=test_for_scan_hbase"
JOBCONF="${JOBCONF} -Dinput.table=$table"
JOBCONF="${JOBCONF} -Dmapred.output.dir=$output"
JOBCONF="${JOBCONF} -Ddemand.sitemap.type=$sitemap_type"
JOBCONF="${JOBCONF} -Dyarn.app.mapreduce.am.command-opts='-Xmx8192m'"
JOBCONF="${JOBCONF} -Dyarn.app.mapreduce.am.resource.mb=9216"
JOBCONF="${JOBCONF} -Dmapreduce.map.java.opts='-Xmx1536m'"
JOBCONF="${JOBCONF} -Dmapreduce.map.memory.mb=2048"
JOBCONF="${JOBCONF} -Dmapreduce.reduce.java.opts='-Xmx1536m'"
JOBCONF="${JOBCONF} -Dmapreduce.reduce.memory.mb=2048"
JOBCONF="${JOBCONF} -Dmapred.reduce.tasks=100"
JOBCONF="${JOBCONF} -Dmapred.job.priority=VERY_HIGH"
hadoop fs -rmr $output
hadoop jar get_url_not_sent_2_spider_from_hbase_hourly.jar hourly.GetUrlNotSent2SpiderFromHbase $JOBCONF
echo "===== scan HBase finished ====="
I set job.setNumReduceTasks(100); in the code, and it worked.
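For reference, a minimal sketch of where that call goes in the run() method shown above (the placement is my assumption):

// next to the existing reducer setup in run()
job.setReducerClass(GetUrlNotSent2SpiderFromHbaseReducer.class);
job.setNumReduceTasks(100);   // explicitly request 100 reduce tasks instead of relying on -D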
Since you mentioned that only one reducer is running, that is the obvious reason why the reduce phase is so slow.
A unified way to see which configuration properties were applied to a Job (call this for every job you execute to verify that parameters are being passed correctly): add the method below to your job driver to print the configuration entries applied from all possible sources (whether from -D or elsewhere), and call it in the driver before the job is submitted:
public static void printConfigApplied(Configuration conf) {
    try {
        conf.writeXml(System.out);
    } catch (final IOException e) {
        e.printStackTrace();
    }
}
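A possible call site in the driver above (the placement is an assumption, just before the job is submitted):

// dump the effective configuration so you can verify the -D options arrived
printConfigApplied(conf);
return job.waitForCompletion(true) && job.isSuccessful() ? 0 : -1;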
If the printed output does not contain your -Dxxx values, that proves your system properties are not being applied from the command line, so the way you are passing them is not correct. Since job.setNumReduceTasks() works when set programmatically, I strongly suspect the lines below, where your system properties are not being passed correctly to the driver:
Configuration conf = getConf();
Job job = new Job(conf, conf.get("mapred.job.name"));
Change this to follow the example in the linked reference.
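The linked example is not quoted here, but a common driver pattern (a sketch under my assumptions, not necessarily the example the answer refers to) builds the Job from the Configuration returned by getConf(), which already carries any -D options parsed by ToolRunner/GenericOptionsParser; the default job name below is only an illustration:

public int run(String[] args) throws Exception {
    // getConf() already contains the -D options parsed by ToolRunner
    Configuration conf = getConf();
    Job job = Job.getInstance(conf, conf.get("mapred.job.name", "scan-hbase"));
    job.setJarByClass(GetUrlNotSent2SpiderFromHbase.class);
    // ... scan / mapper / reducer setup as in the original run() ...
    return job.waitForCompletion(true) ? 0 : -1;
}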
We have an OSGi based server, where we use an embedded Jetty to handle web traffic.
I'm using the XmlConfiguration to create a jetty server instance, see the code below.
The configStream comes from the jetty-http.xml, which is read by default from our plugin or from a custom location.
Now I'm trying to enable HTTPS for the server. I would like to load jetty-ssl.xml and jetty-https.xml the same way as jetty-http.xml.
How can I do that? I can't load another stream into the XmlConfiguration.
Is there another approach, maybe without XmlConfiguration?
XmlConfiguration xmlConfig = new XmlConfiguration(configStream);
Object root = xmlConfig.configure();
if (!(root instanceof Server)) {
    throw new IllegalArgumentException("expected a Server object as a root for server configuration"); //$NON-NLS-1$
}
server = (Server) root;
I found a solution, see the code below.
XmlConfiguration can be used for all of the XML files; if two of them create a Server instance (e.g. jetty.xml and jetty-ssl.xml), only one Server is created and the configurations/beans are added to the same instance.
List<String> configurations = new ArrayList<String>();
configurations.add("jetty.xml"); //$NON-NLS-1$
// use pre-configured jetty xml files to construct a server instance
if (System.getProperty("jetty.sslContext.keyStorePath") != null) { //$NON-NLS-1$
    configurations.add("jetty-ssl.xml"); //$NON-NLS-1$
    configurations.add("jetty-ssl-context.xml"); //$NON-NLS-1$
    configurations.add("jetty-https.xml"); //$NON-NLS-1$
} else {
    configurations.add("jetty-http.xml"); //$NON-NLS-1$
}

XmlConfiguration last = null;
List<Object> objects = new ArrayList<Object>();
for (String configFile : configurations) {
    InputStream configStream = null;
    File xmlConfiguration = new File(webserverHome, CONFIG_LOCATION + configFile);
    if (xmlConfiguration.exists()) {
        configStream = new FileInputStream(xmlConfiguration);
        logger.info("Using custom XML configuration {}", xmlConfiguration); //$NON-NLS-1$
    } else {
        // configStream = ... // read from bundle
        logger.info("Using default XML configuration {}/{}", Activator.PLUGIN_ID, CONFIG_LOCATION + configFile); //$NON-NLS-1$
    }
    XmlConfiguration configuration = new XmlConfiguration(configStream);
    if (last != null) {
        configuration.getIdMap().putAll(last.getIdMap());
    }
    objects.add(configuration.configure());
    last = configuration;
}
// first object is a Server instance because of the jetty.xml
server = (Server) objects.get(0);
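After the loop, the resulting server can then be started as usual (a small follow-up sketch; exception handling omitted):

server.start();
server.join();   // block until the server stops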
I am running a Storm topology. This is the basic word-count topology. I am using a text file as the source and Storm for processing the data. While submitting it I am facing the issue below. I am very new to Storm, so please suggest the changes I need to make in the following code. Thanks in advance!
My topology:
public class TopologyMain {
    public static void main(String[] args) throws InterruptedException, AlreadyAliveException, InvalidTopologyException {

        //Topology definition
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("word-reader", new WordReader());
        builder.setBolt("word-normalizer", new WordNormalizer())
                .shuffleGrouping("word-reader");
        builder.setBolt("word-counter", new WordCounter(), 1)
                .fieldsGrouping("word-normalizer", new Fields("word"));

        //Configuration
        Config conf = new Config();
        conf.put("wordsFile", args[0]);
        conf.setDebug(false);

        //Topology run
        conf.put(Config.TOPOLOGY_MAX_SPOUT_PENDING, 1);
        conf.put(Config.NIMBUS_HOST, "192.168.1.229");
        //LocalCluster cluster = new LocalCluster();
        //cluster.submitTopology("Getting-Started-Toplogie", conf, builder.createTopology());
        //Thread.sleep(1000);
        System.setProperty("storm.jar", "/home/raremile/st/examples-ch02-getting_started/target/Getting-Started-0.0.1-SNAPSHOT.jar");
        StormSubmitter.submitTopology("Count-Word-Topology-With-Refresh-Cache", conf,
                builder.createTopology());
        //cluster.shutdown();
    }
}
ERROR
Exception in thread "main" java.lang.NoSuchMethodError: backtype.storm.topology.TopologyBuilder.setBolt(Ljava/lang/String;Lbacktype/storm/topology/IBasicBolt;Ljava/lang/Integer;)Lbacktype/storm/topology/BoltDeclarer;
at TopologyMain.main(TopologyMain.java:21)
I am able to run this code in local mode without any error.
I changed the version to 0.9.0.1 and I am able to run it:
<dependency>
    <groupId>storm</groupId>
    <artifactId>storm</artifactId>
    <version>0.9.0.1</version>
</dependency>
I have created a topology which should read from a file and write to a new file. My program runs properly in a local cluster, but when submitting to a remote cluster I do not get any error, yet the file is not created. Below is my code to submit the topology to the remote cluster:
public static void main(String[] args) {
    final Logger logger = LoggingService.getLogger(FileToFileTopology.class.getName());
    try {
        Properties prop = new Properties();
        prop.load(new FileInputStream(args[0] + "/connection.properties"));
        LoggingService.generateAppender("storm_etl", prop, "");
        logger.info("inside main method...." + args.length);
        System.out.println("inside main sys out");

        Config conf = new Config();
        conf.setDebug(false);
        conf.put(Config.TOPOLOGY_MAX_SPOUT_PENDING, 1);

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("file-reader", new FileReaderSpout(args[1]));
        builder.setBolt("file-writer", new WriteToFileBolt(args[1]), 2).shuffleGrouping("file-reader");

        logger.info("submitting topology");
        StormSubmitter.submitTopology(args[0], conf, builder.createTopology());
    }
    catch (Exception e) {
        System.out.println("inside catch");
        logger.info("inside catch" + e.getMessage());
        logger.error("inside error", e);
        e.printStackTrace();
    }
}
I have also used log4j to create my own log file for the topology; the log file gets created, but there is no error in it. Please help.
I had the same issue with Hortonworks 2.2. This happened because of permissions.
Even if you submit to the cluster as the root user, the storm jar command executes as the 'storm' user. It can read the file from the source, but it won't write, because it doesn't have the necessary rights.
Modify the permissions of the destination folder where you want to write the file so that it is writable:
chmod 777 -R /user/filesfolder
I have an issue configuring the quartz.properties file to connect to MongoDB.
This is my quartz.properties file:
#specify the jobstore used
org.quartz.jobStore.class=com.novemberain.quartz.mongodb.MongoDBJobStore
org.quartz.jobStore.mongoUri=mongodb://localhost:27017
#The datasource for the jobstore that is to be used
org.quartz.jobStore.dbName=myds
org.quartz.jobStore.addresses=host1,host2
#quartz table prefixes in the database
org.quartz.jobStore.collectionPrefix=quartz_
org.quartz.threadPool.threadCount = 4
Can anyone recommend either a way to back Quartz with MongoDB using quartz.properties, or a simple alternative to Quartz?
Use this method to get a scheduler with or without properties:
public Scheduler getScheduler(Properties properties) throws SchedulerException {
    SchedulerFactory factory;
    if (!properties.isEmpty()) {
        // with properties
        factory = new StdSchedulerFactory(properties);
    } else {
        // without properties
        factory = new StdSchedulerFactory();
    }
    return factory.getScheduler();
}
And this method to load the properties file:
public Scheduler load() throws SchedulerException {
    Properties properties = new Properties();
    try {
        // file 'my-quartz.properties' in 'src/main/resources/config' (Maven project structure)
        properties.load(this.getClass().getResourceAsStream("/config/my-quartz.properties"));
    } catch (IOException e) {
        // process the exception, maybe load default properties
    }
    return getScheduler(properties);
}
You can put the properties file into the 'src/main/resources/config' folder,
or set $JAVA_OPTS=-Dorg.quartz.properties=/config/my-quartz.properties
You can also load the properties with Spring:
@Component
public class SchedulerLoader {

    @Value("${org.quartz.jobStore.class}")
    private String quartzJobStoreClass;

    @Value("${org.quartz.jobStore.mongoUri}")
    private String quartzJobStoreMongoUri;

    @Value("${org.quartz.jobStore.dbName}")
    private String quartzJobStoreDbName;

    @Value("${org.quartz.jobStore.collectionPrefix}")
    private String quartzJobStoreCollectionPrefix;

    @Value("${org.quartz.threadPool.threadCount}")
    private String quartzThreadPoolThreadCount;

    @Value("${org.quartz.jobStore.addresses}")
    private String quartzJobStoreAddresses;

    public Scheduler load() throws SchedulerException {
        Properties properties = new Properties();
        properties.setProperty("org.quartz.jobStore.class", quartzJobStoreClass);
        properties.setProperty("org.quartz.jobStore.mongoUri", quartzJobStoreMongoUri);
        properties.setProperty("org.quartz.jobStore.dbName", quartzJobStoreDbName);
        properties.setProperty("org.quartz.jobStore.collectionPrefix", quartzJobStoreCollectionPrefix);
        properties.setProperty("org.quartz.threadPool.threadCount", quartzThreadPoolThreadCount);
        properties.setProperty("org.quartz.jobStore.addresses", quartzJobStoreAddresses);
        return getScheduler(properties);
    }
    ...
With the Spring config:
<beans ...>
...
<context:annotation-config/>
<context:property-placeholder location="classpath:config/*.properties"/>
...
Here is the properties file:
# Use the MongoDB store
org.quartz.jobStore.class=com.novemberain.quartz.mongodb.MongoDBJobStore
# MongoDB URI (optional if 'org.quartz.jobStore.addresses' is set)
org.quartz.jobStore.mongoUri=mongodb://localhost:27020
# comma separated list of mongodb hosts/replica set seeds (optional if 'org.quartz.jobStore.mongoUri' is set)
org.quartz.jobStore.addresses=host1,host2
# database name
org.quartz.jobStore.dbName=quartz
# Will be used to create collections like mycol_jobs, mycol_triggers, mycol_calendars, mycol_locks
org.quartz.jobStore.collectionPrefix=mycol
# thread count setting is ignored by the MongoDB store but Quartz requires it
org.quartz.threadPool.threadCount=1
You should also add the Maven dependency (check https://github.com/michaelklishin/quartz-mongodb for details):
<dependency>
    <groupId>com.novemberain</groupId>
    <artifactId>quartz-mongodb</artifactId>
    <version>2.0.0-rc1</version>
</dependency>
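To tie the pieces together, a minimal usage sketch (assuming the load() helper shown above and the properties file on the classpath; exception handling omitted):

Scheduler scheduler = new SchedulerLoader().load();
scheduler.start();      // starts the scheduler backed by the MongoDB job store
// ... schedule jobs and triggers as usual ...
scheduler.shutdown();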