HDFS Permission denied - java

I'm trying to start a MapReduce job from Java, but when I submit the job I get a Permission Denied exception. I'm able to run hdfs dfs -ls / from the command line without any error, but it doesn't work from my Java program.
Here's my code
public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.set("mapreduce.map.class", "org.apache.hadoop.conf.TestMapper");
    conf.set("mapreduce.reduce.class", "org.apache.hadoop.conf.TestReducer");
    conf.set("mapreduce.framework.name", "yarn");
    conf.set("hadoop.security.group.mapping", "org.apache.hadoop.security.ShellBasedUnixGroupsMapping");
    conf.set("fs.default.name", "hdfs://master:9000");
    conf.set("dfs.permission", "false");
    conf.set("yarn.nodemanager.aux-services", "mapreduce_shuffle");
    conf.set("yarn.resourcemanager.resource-tracker.address", "master:8025");
    conf.set("yarn.resourcemanager.scheduler.address", "master:8030");
    conf.set("yarn.resourcemanager.address", "master:8040");
    conf.set("yarn.nodemanager.localizer.address", "master:8060");
    Job job = null;
    try {
        job = Job.getInstance(conf, "Test Map Reduce");
        job.setJarByClass(RunJob.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        TextInputFormat.setInputPaths(job, new Path("/input.txt"));
        TextOutputFormat.setOutputPath(job, new Path("/output"));
        job.submit();
    } catch (Exception e) {
        e.printStackTrace();
    }
}
But I get the following exception
org.apache.hadoop.security.AccessControlException: Permission denied: user=manthosh, access=EXECUTE, inode="/tmp":hduser:supergroup:drwxrwx---
The solution here doesn't work.
What am I missing?
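For what it's worth, the exception says the client is connecting as the local OS user (manthosh), who has no execute permission on /tmp, which is owned by hduser on HDFS. On a cluster without Kerberos, one common workaround is to submit as that HDFS user, either by exporting HADOOP_USER_NAME=hduser before launching the JVM or by wrapping the submission in a UserGroupInformation.doAs call. The sketch below is my own suggestion, not from the original post, and assumes security is disabled and that hduser is the intended job owner:
import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.security.UserGroupInformation;

public class SubmitAsHduser {
    public static void main(String[] args) throws Exception {
        // Impersonate "hduser", the owner of /tmp in the error message.
        // This only works on a cluster without Kerberos security enabled.
        UserGroupInformation ugi = UserGroupInformation.createRemoteUser("hduser");
        ugi.doAs(new PrivilegedExceptionAction<Void>() {
            @Override
            public Void run() throws Exception {
                Configuration conf = new Configuration();
                conf.set("fs.defaultFS", "hdfs://master:9000");
                conf.set("mapreduce.framework.name", "yarn");
                Job job = Job.getInstance(conf, "Test Map Reduce");
                job.setJarByClass(SubmitAsHduser.class);
                job.setOutputKeyClass(LongWritable.class);
                job.setOutputValueClass(Text.class);
                // ... same mapper/reducer and format setup as in the question ...
                TextInputFormat.setInputPaths(job, new Path("/input.txt"));
                TextOutputFormat.setOutputPath(job, new Path("/output"));
                job.submit();
                return null;
            }
        });
    }
}
The other common fix is to repair the ownership or permissions of /tmp on HDFS itself (for example hdfs dfs -chmod 1777 /tmp, run as hduser).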

Related

how to submit mapreduce job with yarn api in java

I want to submit my MR job using the YARN Java API. I tried to follow the WritingYarnApplications guide, but I don't know what to add to amContainer. Below is the code I have written:
package org.apache.hadoop.examples;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.protocolrecords.GetNewApplicationResponse;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.util.Records;
import org.mortbay.util.ajax.JSON;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class YarnJob {
    private static Logger logger = LoggerFactory.getLogger(YarnJob.class);

    public static void main(String[] args) throws Throwable {
        Configuration conf = new Configuration();
        YarnClient client = YarnClient.createYarnClient();
        client.init(conf);
        client.start();

        System.out.println(JSON.toString(client.getAllQueues()));
        System.out.println(JSON.toString(client.getConfig()));
        //System.out.println(JSON.toString(client.getApplications()));
        System.out.println(JSON.toString(client.getYarnClusterMetrics()));

        YarnClientApplication app = client.createApplication();
        GetNewApplicationResponse appResponse = app.getNewApplicationResponse();
        ApplicationId appId = appResponse.getApplicationId();

        // Create launch context for app master
        ApplicationSubmissionContext appContext = Records.newRecord(ApplicationSubmissionContext.class);
        // set the application id
        appContext.setApplicationId(appId);
        // set the application name
        appContext.setApplicationName("test");
        // Set the queue to which this application is to be submitted in the RM
        appContext.setQueue("default");

        // Set up the container launch context for the application master
        ContainerLaunchContext amContainer = Records.newRecord(ContainerLaunchContext.class);
        //amContainer.setLocalResources();
        //amContainer.setCommands();
        //amContainer.setEnvironment();
        appContext.setAMContainerSpec(amContainer);
        appContext.setResource(Resource.newInstance(1024, 1));
        appContext.setApplicationType("MAPREDUCE");

        // Submit the application to the applications manager
        client.submitApplication(appContext);
        //client.stop();
    }
}
I can run a MapReduce job properly from the command line:
hadoop jar wordcount.jar org.apache.hadoop.examples.WordCount /user/admin/input /user/admin/output/
But how can I submit this wordcount job with the YARN Java API?
You do not use the YARN client to submit the job; instead, use the MapReduce APIs to submit it. See this link for an example.
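As a minimal illustration of that approach, here is a sketch of mine (not from the linked example) that submits the wordcount job through the MapReduce Job API, assuming hadoop-mapreduce-examples is on the classpath and reusing the /user/admin paths from the question:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.examples.WordCount;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SubmitWordCount {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(SubmitWordCount.class);
        // Reuse the mapper/reducer shipped with hadoop-mapreduce-examples.
        job.setMapperClass(WordCount.TokenizerMapper.class);
        job.setCombinerClass(WordCount.IntSumReducer.class);
        job.setReducerClass(WordCount.IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path("/user/admin/input"));
        FileOutputFormat.setOutputPath(job, new Path("/user/admin/output"));
        job.submit();   // returns immediately without waiting for completion
        // or: System.exit(job.waitForCompletion(true) ? 0 : 1);  // blocks until done
    }
}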
However, if you need more control over the job, such as getting the completion status, the mapper phase progress, the reducer phase progress, etc., you can use
job.submit();
Instead of
job.waitForCompletion(true)
You can use the methods job.mapProgress() and job.reduceProgress() to get the status. There are many more methods on the Job object that you can explore.
As far as your query about
hadoop jar wordcount.jar org.apache.hadoop.examples.WordCount /user/admin/input /user/admin/output/
What's happening here is that you are running your driver program, which is packaged in wordcount.jar. Instead of doing "java -jar wordcount.jar" you are using "hadoop jar wordcount.jar" (you could just as well use "yarn jar wordcount.jar"). Hadoop/YARN sets up the necessary additional classpath entries compared to the plain java -jar command. This executes the main() of your driver program, which is in the class org.apache.hadoop.examples.WordCount as specified in the command.
You can check out the source here Source for WordCount class
The only reason I would assume you want to submit a job via YARN is to integrate it with some kind of service that kicks off MapReduce2 jobs on certain events.
For this you can always have your driver's main() look something like this:
public class MyMapReduceDriver extends Configured implements Tool {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        /******/
        int errCode = ToolRunner.run(conf, new MyMapReduceDriver(), args);
        System.exit(errCode);
    }

    @Override
    public int run(String[] args) throws Exception {
        while (true) {
            try {
                runMapReduceJob();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

    private void runMapReduceJob() throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "word count");
        /******/
        job.submit();
        // Poll the job status until it leaves the PREP/RUNNING states
        while (job.getJobState() == JobStatus.State.RUNNING
                || job.getJobState() == JobStatus.State.PREP) {
            Thread.sleep(1000);
            System.out.println(" Map: " + StringUtils.formatPercent(job.mapProgress(), 0)
                    + " Reducer: " + StringUtils.formatPercent(job.reduceProgress(), 0));
        }
    }
}
Hope this helps.

During an HBase scan with MapReduce, the number of reducers is always one

I do an HBase scan in the Mapper, then the Reducer writes the results to HDFS.
The number of records output by the mapper is roughly 1,000,000,000.
The problem is that the number of reducers is always one, even though I have set -Dmapred.reduce.tasks=100. The reduce process is very slow.
// edited on 2016-12-04 by 祝方泽
The code of my main class:
public class GetUrlNotSent2SpiderFromHbase extends Configured implements Tool {

    public int run(String[] arg0) throws Exception {
        Configuration conf = getConf();
        Job job = new Job(conf, conf.get("mapred.job.name"));
        String input_table = conf.get("input.table");
        job.setJarByClass(GetUrlNotSent2SpiderFromHbase.class);

        Scan scan = new Scan();
        scan.setCaching(500);
        scan.setCacheBlocks(false);
        scan.addColumn(Bytes.toBytes("info"), Bytes.toBytes("sitemap_type"));
        scan.addColumn(Bytes.toBytes("info"), Bytes.toBytes("is_send_to_spider"));

        TableMapReduceUtil.initTableMapperJob(
                input_table,
                scan,
                GetUrlNotSent2SpiderFromHbaseMapper.class,
                Text.class,
                Text.class,
                job);

        /*job.setMapperClass(GetUrlNotSent2SpiderFromHbaseMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);*/

        job.setReducerClass(GetUrlNotSent2SpiderFromHbaseReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        if (job.waitForCompletion(true) && job.isSuccessful()) {
            return 0;
        }
        return -1;
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        int res = ToolRunner.run(conf, new GetUrlNotSent2SpiderFromHbase(), args);
        System.exit(res);
    }
}
Here is the script to run this MapReduce job:
table="xxx"
output="yyy"
sitemap_type="zzz"
JOBCONF=""
JOBCONF="${JOBCONF} -Dmapred.job.name=test_for_scan_hbase"
JOBCONF="${JOBCONF} -Dinput.table=$table"
JOBCONF="${JOBCONF} -Dmapred.output.dir=$output"
JOBCONF="${JOBCONF} -Ddemand.sitemap.type=$sitemap_type"
JOBCONF="${JOBCONF} -Dyarn.app.mapreduce.am.command-opts='-Xmx8192m'"
JOBCONF="${JOBCONF} -Dyarn.app.mapreduce.am.resource.mb=9216"
JOBCONF="${JOBCONF} -Dmapreduce.map.java.opts='-Xmx1536m'"
JOBCONF="${JOBCONF} -Dmapreduce.map.memory.mb=2048"
JOBCONF="${JOBCONF} -Dmapreduce.reduce.java.opts='-Xmx1536m'"
JOBCONF="${JOBCONF} -Dmapreduce.reduce.memory.mb=2048"
JOBCONF="${JOBCONF} -Dmapred.reduce.tasks=100"
JOBCONF="${JOBCONF} -Dmapred.job.priority=VERY_HIGH"
hadoop fs -rmr $output
hadoop jar get_url_not_sent_2_spider_from_hbase_hourly.jar hourly.GetUrlNotSent2SpiderFromHbase $JOBCONF
echo "===== scan HBase finished ====="
I set job.setNumReduceTasks(100); in the code, and it worked.
Since you mentioned that only one reducer is running, that is the obvious reason why the reduce phase is very slow.
A unified way to see the configuration properties applied to a Job (call this for every job you execute to verify that parameters are passed correctly):
Add the method below to your job driver to print the configuration entries applied from all possible sources (i.e. from -D or elsewhere), and call it in the driver program before your job is submitted:
public static void printConfigApplied(Configuration conf) {
    try {
        conf.writeXml(System.out);
    } catch (final IOException e) {
        e.printStackTrace();
    }
}
Since job.setNumReduceTasks() works when set programmatically, this suggests that your system properties (the -Dxxx options) are not being applied from the command line, i.e. the way you are passing them is not correct. I strongly suspect the lines below, where your system properties are not reaching the driver:
Configuration conf = getConf();
Job job = new Job(conf, conf.get("mapred.job.name"));
Change this to match the example in this link.
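For completeness, here is a sketch of mine (not from the original answer) of a run() method that sets the reducer count explicitly, reading it from the configuration so that a -D option can still override the assumed default of 100:
public int run(String[] args) throws Exception {
    Configuration conf = getConf();  // ToolRunner has already parsed any -D options into this
    Job job = new Job(conf, conf.get("mapred.job.name", "scan_hbase"));
    job.setJarByClass(GetUrlNotSent2SpiderFromHbase.class);
    // ... same scan/mapper/reducer setup as in the driver above ...
    // "mapreduce.job.reduces" is the current name of the old mapred.reduce.tasks
    // property; 100 here is an assumed default, not taken from the original post.
    job.setNumReduceTasks(conf.getInt("mapreduce.job.reduces", 100));
    return job.waitForCompletion(true) && job.isSuccessful() ? 0 : -1;
}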

Error while submitting topology to Storm cluster

I am running a Storm topology. This is the basic wordcount topology. I am using a text file as the source and Storm for processing the data. While submitting it I am facing these issues. I am very new to Storm. Please suggest the changes I need to make in the following code. Thanks in advance!
My topology
public class TopologyMain {
    public static void main(String[] args) throws InterruptedException, AlreadyAliveException, InvalidTopologyException {
        //Topology definition
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("word-reader", new WordReader());
        builder.setBolt("word-normalizer", new WordNormalizer())
                .shuffleGrouping("word-reader");
        builder.setBolt("word-counter", new WordCounter(), 1)
                .fieldsGrouping("word-normalizer", new Fields("word"));

        //Configuration
        Config conf = new Config();
        conf.put("wordsFile", args[0]);
        conf.setDebug(false);

        //Topology run
        conf.put(Config.TOPOLOGY_MAX_SPOUT_PENDING, 1);
        conf.put(Config.NIMBUS_HOST, "192.168.1.229");
        //LocalCluster cluster = new LocalCluster();
        //cluster.submitTopology("Getting-Started-Toplogie", conf, builder.createTopology());
        //Thread.sleep(1000);
        System.setProperty("storm.jar", "/home/raremile/st/examples-ch02-getting_started/target/Getting-Started-0.0.1-SNAPSHOT.jar");
        StormSubmitter.submitTopology("Count-Word-Topology-With-Refresh-Cache", conf,
                builder.createTopology());
        //cluster.shutdown();
    }
}
ERROR
Exception in thread "main" java.lang.NoSuchMethodError: backtype.storm.topology.TopologyBuilder.setBolt(Ljava/lang/String;Lbacktype/storm/topology/IBasicBolt;Ljava/lang/Integer;)Lbacktype/storm/topology/BoltDeclarer;
at TopologyMain.main(TopologyMain.java:21)
I am able to run this code in local mode without any error.
I changed the version to 0.9.0.1 and I am able to run it:
<dependency>
    <groupId>storm</groupId>
    <artifactId>storm</artifactId>
    <version>0.9.0.1</version>
</dependency>

Running "Selenium WebDriver" in the MapReduce framework of Hadoop freezes in the map step

A few days ago, I decided to let Selenium WebDriver (a third-party package) run in the MapReduce framework of Hadoop, and I ran into a problem. The map step freezes in new FirefoxDriver();. The FirefoxDriver class is in the third-party jar named selenium-server-standalone-2.38.0.jar. If someone has experience with or interest in this, I need your help!
Some Details:
problem details
In order to run the code on the command line, I use Xvfb to suppress the Firefox graphical interface. Then the problem I described at the beginning arises. I went through the tasktracker's log and found that the code freezes at this.driver = new FirefoxDriver(ffprofile);. Although the code freezes, Firefox has been started; I checked this using ps -A | grep firefox.
environment:
ubuntu 10.04 32bit; hadoop-1.2.0; Mozilla Firefox 17.0.5; selenium-server-standalone-2.38.0.jar; Xvfb;
hints
(1) Hadoop runs in pseudo-distributed mode;
(2) Everything is OK when I run the code in Eclipse; Firefox pops up as planned (I show the demo code at the end);
(3) If you run into org.openqa.selenium.WebDriverException: Unable to bind to locking port 7054 within 45000 ms, use the command ps -A | grep firefox to check whether a Firefox instance is already running, and use killall firefox.
(4) When running the code on the command line, you may hit Error no display specified.
You can install Xvfb and start it with Xvfb :99 -ac 2>/dev/null &. Before starting Xvfb, append the line export DISPLAY=:99 to the end of HADOOP_HOME/conf/hadoop-env.sh (a sketch of pointing the driver at this display follows below).
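As an alternative to exporting DISPLAY in hadoop-env.sh, the display can also be set per driver instance from the Java side. This is only a sketch of mine (not from the original post), assuming Selenium 2.38 and an Xvfb server already running on display :99:
import java.io.File;

import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxBinary;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.openqa.selenium.firefox.FirefoxProfile;

public class HeadlessFirefox {
    public static WebDriver create(String firefoxPath) {
        // Point the binary at the Xvfb display so no real X server is needed.
        FirefoxBinary binary = new FirefoxBinary(new File(firefoxPath));
        binary.setEnvironmentProperty("DISPLAY", ":99"); // assumes Xvfb :99 is running
        FirefoxProfile profile = new FirefoxProfile();
        return new FirefoxDriver(binary, profile);
    }
}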
code demo
public class MapRunnerNewFirefox extends Configured implements Tool, MapRunnable {

    public static final Logger LOG = LoggerFactory.getLogger(MapRunnerNewFirefox.class);

    @Override
    public void configure(JobConf conf) {
    }

    @Override
    public void run(RecordReader recordReader,
            OutputCollector output, Reporter reporter) throws IOException {
        LongWritable key = new LongWritable(-1); // shouldn't be null, otherwise the recordReader will report a NullPointerException
        Text val = new Text("begin text"); // same as the line above
        int i = 0;
        reporter.progress();
        while (recordReader.next(key, val)) {
            if (LOG.isInfoEnabled()) {
                LOG.info("key: " + key.toString() + " val: " + val.toString());
            }
            String temp = "ao";
            NewFirefox ff = new NewFirefox("/home/cc/firefox/firefox/firefox");
            output.collect(new Text("get-" + key.toString()), new Text(temp));
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        if (LOG.isInfoEnabled()) {
            LOG.info("set maprunner conf");
        }
        Path urlDir = new Path(args[0]);
        Path resultDir = new Path(args[1] + System.currentTimeMillis());

        JobConf job = new JobConf(getConf());
        job.setNumMapTasks(1);
        job.setJobName("hello maprunners");
        job.setInputFormat(TextInputFormat.class);
        FileInputFormat.addInputPath(job, urlDir);
        job.setMapRunnerClass(MapRunnerNewFirefox.class);
        FileOutputFormat.setOutputPath(job, resultDir);
        job.setOutputFormat(TextOutputFormat.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        JobClient.runJob(job);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        int res = ToolRunner.run(conf, new MapRunnerNewFirefox(), args);
        System.exit(res);
    }
}
public class NewFirefox {

    private WebDriver driver;
    private static final Logger LOG = LoggerFactory.getLogger(NewFirefox.class);

    public NewFirefox(String firefoxPath) {
        if (LOG.isInfoEnabled()) {
            LOG.info("firefox ****0");
        }
        System.setProperty("webdriver.firefox.bin", firefoxPath);
        if (LOG.isInfoEnabled()) {
            LOG.info("firefox ****1");
        }
        ProfilesIni profile = new ProfilesIni();
        FirefoxProfile ffprofile = profile.getProfile("default");
        if (LOG.isInfoEnabled()) {
            LOG.info("firefox ****2");
        }
        this.driver = new FirefoxDriver(ffprofile);
        if (LOG.isInfoEnabled()) {
            LOG.info("firefox ****3");
        }
        this.driver.quit();
        if (LOG.isInfoEnabled()) {
            LOG.info("firefox quit");
        }
    }
}

issue in file creation while running topology in remote cluster using storm

I have created a topology which should read from a file and write it to a new file. My program runs properly on a local cluster, but when I submit it to a remote cluster I do not get any error, yet the file is not created. Below is my code to submit the topology to the remote cluster:
public static void main(String[] args) {
    final Logger logger = LoggingService.getLogger(FileToFileTopology.class.getName());
    try {
        Properties prop = new Properties();
        prop.load(new FileInputStream(args[0] + "/connection.properties"));
        LoggingService.generateAppender("storm_etl", prop, "");
        logger.info("inside main method...." + args.length);
        System.out.println("inside main sys out");

        Config conf = new Config();
        conf.setDebug(false);
        conf.put(Config.TOPOLOGY_MAX_SPOUT_PENDING, 1);

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("file-reader", new FileReaderSpout(args[1]));
        builder.setBolt("file-writer", new WriteToFileBolt(args[1]), 2).shuffleGrouping("file-reader");

        logger.info("submitting topology");
        StormSubmitter.submitTopology(args[0], conf, builder.createTopology());
    } catch (Exception e) {
        System.out.println("inside catch");
        logger.info("inside catch" + e.getMessage());
        logger.error("inside error", e);
        e.printStackTrace();
    }
}
I have also used log4j to create my own log file for my topology; the log file gets created but there is no error in it. Please help.
I had the same issue with Hortonworks 2.2. This happened because of permissions.
Even if you are submitting to the cluster as the root user, the storm jar command executes as the 'storm' user. It can read the source file, but it won't write, because it doesn't have the necessary rights.
Modify the permissions of the destination folder where you want to write the file to grant all permissions:
chmod 777 -R /user/filesfolder
