I am working on a MapReduce Java project in Eclipse (on Ubuntu 14.04 LTS) that uses the Apache Avro serialization framework, for which I need the avro-tools-1.7.7.jar file. I downloaded this jar from the Apache website and wrote my Java code against it. When I execute the program I get a java.lang.VerifyError. I have read on a few websites that this error is caused by a mismatch between the JDK version of the compiled class files in the jar and the runtime JDK version, so I compared the .class version of the downloaded jar with my runtime JVM version, found a mismatch, and downgraded my JDK from 1.7 to 1.6 so that they now match: the compiled classes in the jar have major version 50, and so do my current project's class files. But I am still getting the error.
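(For reference, this is how the class file major version can be checked; the class name below is just one example class from the avro-tools jar.)
javap -verbose -classpath avro-tools-1.7.7.jar org.apache.avro.tool.Main | grep "major version"
# prints "major version: 50" for classes compiled with JDK 1.6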
srimanth@srimanth-Inspiron-N5110:~$ hadoop jar Desktop/AvroMapReduceExamples.jar practice.AvroSort file:///home/srimanth/avrofile.avro file:///home/srimanth/sorted/ test.avro
15/04/19 22:14:36 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" java.lang.VerifyError: (class: org/apache/hadoop/mapred/JobTrackerInstrumentation, method: create signature: (Lorg/apache/hadoop/mapred/JobTracker;Lorg/apache/hadoop/mapred/JobConf;)Lorg/apache/hadoop/mapred/JobTrackerInstrumentation;) Incompatible argument to function
at org.apache.hadoop.mapred.LocalJobRunner.<init>(LocalJobRunner.java:420)
at org.apache.hadoop.mapred.JobClient.init(JobClient.java:470)
at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:455)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
at practice.AvroSort.run(AvroSort.java:63)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at practice.AvroSort.main(AvroSort.java:67)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:622)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Here is my Java program:
package practice;
import java.io.File;
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.mapred.AvroCollector;
import org.apache.avro.mapred.AvroJob;
import org.apache.avro.mapred.AvroMapper;
import org.apache.avro.mapred.AvroReducer;
import org.apache.avro.mapred.Pair;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
public class AvroSort extends Configured implements Tool {
static class SortMapper<K> extends AvroMapper<K, Pair<K, K>> {
public void map(K datum, AvroCollector<Pair<K, K>> collector,
Reporter reporter) throws IOException {
collector.collect(new Pair<K, K>(datum, null, datum, null));
}
}
static class SortReducer<K> extends AvroReducer<K, K, K> {
public void reduce(K key, Iterable<K> values,
AvroCollector<K> collector,
Reporter reporter) throws IOException {
for (K value : values) {
collector.collect(value);
}
}
}
@Override
public int run(String[] args) throws Exception {
if (args.length != 3) {
System.err.printf(
"Usage: %s [generic options] <input> <output> <schema-file>\n",
getClass().getSimpleName());
ToolRunner.printGenericCommandUsage(System.err);
return -1;
}
String input = args[0];
String output = args[1];
String schemaFile = args[2];
JobConf conf = new JobConf(getConf(), getClass());
conf.setJobName("Avro sort");
FileInputFormat.addInputPath(conf, new Path(input));
FileOutputFormat.setOutputPath(conf, new Path(output));
Schema schema = new Schema.Parser().parse(new File(schemaFile));
AvroJob.setInputSchema(conf, schema);
Schema intermediateSchema = Pair.getPairSchema(schema, schema);
AvroJob.setMapOutputSchema(conf, intermediateSchema);
AvroJob.setOutputSchema(conf, schema);
AvroJob.setMapperClass(conf, SortMapper.class);
AvroJob.setReducerClass(conf, SortReducer.class);
JobClient.runJob(conf);
return 0;
}
public static void main(String[] args) throws Exception {
int exitCode = ToolRunner.run(new AvroSort(), args);
System.exit(exitCode);
}
}
Additional info: JDK version: 1.6, Hadoop version: 2.6.0, and I am not using Maven.
Please help, I have been stuck on this the entire day. I would really appreciate some help.
Related
I am building a Gradle Java project (please refer below) that uses Apache Beam code and am executing it on Eclipse Oxygen.
package com.xxxx.beam;
import java.io.IOException;
import org.apache.beam.runners.spark.SparkContextOptions;
import org.apache.beam.runners.spark.SparkPipelineResult;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.PipelineRunner;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.transforms.SimpleFunction;
import org.apache.beam.sdk.values.KV;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.io.FileIO.ReadableFile;
public class ApacheBeamTestProject {
public void modelExecution(){
SparkContextOptions options = (SparkContextOptions) PipelineOptionsFactory.create();
options.setSparkMaster("xxxxxxxxx");
JavaSparkContext sc = options.getProvidedSparkContext();
JavaLinearRegressionWithSGDExample.runJavaLinearRegressionWithSGDExample(sc);
Pipeline p = Pipeline.create(options);
p.apply(FileIO.match().filepattern("hdfs://path/to/*.gz"))
// withCompression can be omitted - by default compression is detected from the filename.
.apply(FileIO.readMatches())
.apply(MapElements
// uses imports from TypeDescriptors
.via(
new SimpleFunction <ReadableFile, KV<String,String>>() {
private static final long serialVersionUID = -5715607038612883677L;
@SuppressWarnings("unused")
public KV<String,String> createKV(ReadableFile f) {
String temp = null;
try{
temp = f.readFullyAsUTF8String();
}catch(IOException e){
}
return KV.of(f.getMetadata().resourceId().toString(), temp);
}
}
))
.apply(FileIO.write())
;
SparkPipelineResult result = (SparkPipelineResult) p.run();
result.getState();
}
public static void main(String[] args) throws IOException {
System.out.println("Test log");
PipelineOptions options = PipelineOptionsFactory.create();
Pipeline p = Pipeline.create(options);
p.apply(FileIO.match().filepattern("hdfs://path/to/*.gz"))
// withCompression can be omitted - by default compression is detected from the filename.
.apply(FileIO.readMatches())
.apply(MapElements
// uses imports from TypeDescriptors
.via(
new SimpleFunction <ReadableFile, KV<String,String>>() {
private static final long serialVersionUID = -5715607038612883677L;
@SuppressWarnings("unused")
public KV<String,String> createKV(ReadableFile f) {
String temp = null;
try{
temp = f.readFullyAsUTF8String();
}catch(IOException e){
}
return KV.of(f.getMetadata().resourceId().toString(), temp);
}
}
))
.apply(FileIO.write());
p.run();
}
}
I am observing the following error when executing this project in Eclipse.
Test log
Exception in thread "main" java.lang.IllegalArgumentException: No Runner was specified and the DirectRunner was not found on the classpath.
Specify a runner by either:
Explicitly specifying a runner by providing the 'runner' property
Adding the DirectRunner to the classpath
Calling 'PipelineOptions.setRunner(PipelineRunner)' directly
at org.apache.beam.sdk.options.PipelineOptions$DirectRunner.create(PipelineOptions.java:291)
at org.apache.beam.sdk.options.PipelineOptions$DirectRunner.create(PipelineOptions.java:281)
at org.apache.beam.sdk.options.ProxyInvocationHandler.returnDefaultHelper(ProxyInvocationHandler.java:591)
at org.apache.beam.sdk.options.ProxyInvocationHandler.getDefault(ProxyInvocationHandler.java:532)
at org.apache.beam.sdk.options.ProxyInvocationHandler.invoke(ProxyInvocationHandler.java:155)
at org.apache.beam.sdk.options.PipelineOptionsValidator.validate(PipelineOptionsValidator.java:95)
at org.apache.beam.sdk.options.PipelineOptionsValidator.validate(PipelineOptionsValidator.java:49)
at org.apache.beam.sdk.PipelineRunner.fromOptions(PipelineRunner.java:44)
at org.apache.beam.sdk.Pipeline.create(Pipeline.java:150)
This project doesn't contain a pom.xml file; Gradle is set up for all the dependencies.
I am not sure how to fix this error. Could someone advise?
It seems that you are trying to use the DirectRunner and it is not on the classpath of your application. You can supply it by adding the beam-runners-direct-java dependency to your application:
https://mvnrepository.com/artifact/org.apache.beam/beam-runners-direct-java
EDIT (answered in comment): you are trying to run this code on Spark, but you didn't specify that in the PipelineOptions. By default Beam tries to run the code on the DirectRunner, so I think this is why you get this error. Specifying
options.setRunner(SparkRunner.class);
before creating the pipeline sets the correct runner and fixes the issue.
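A minimal sketch of that fix, assuming the beam-runners-spark dependency is on the classpath (the master URL is just a placeholder):
import org.apache.beam.runners.spark.SparkContextOptions;
import org.apache.beam.runners.spark.SparkRunner;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
public class SparkRunnerExample {
    public static void main(String[] args) {
        // Build Spark-specific options and pin the runner before the pipeline is created.
        SparkContextOptions options = PipelineOptionsFactory.as(SparkContextOptions.class);
        options.setRunner(SparkRunner.class);   // otherwise Beam falls back to the DirectRunner
        options.setSparkMaster("local[*]");     // placeholder master URL, adjust for your cluster
        Pipeline p = Pipeline.create(options);  // the pipeline now uses the SparkRunner
        p.run().waitUntilFinish();
    }
}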
Downloading the beam-runners-direct-java-x.x.x.jar and adding it to the project classpath worked for me. Please refer to this Maven repository to download the DirectRunner jar file.
Furthermore, if you need a specific Beam runner for your project, you can pass the runner name as a program argument (e.g. --runner=DataflowRunner) and add the corresponding jar to the project classpath.
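For the program-argument approach to work, the options have to be built from the command-line arguments; a minimal sketch (the chosen runner, e.g. SparkRunner or DataflowRunner, still needs its jar on the classpath):
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
public class RunnerFromArgsExample {
    public static void main(String[] args) {
        // Parses arguments such as --runner=SparkRunner or --runner=DataflowRunner.
        PipelineOptions options = PipelineOptionsFactory.fromArgs(args).withValidation().create();
        Pipeline p = Pipeline.create(options);
        p.run().waitUntilFinish();
    }
}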
First of all, I'm running MapReduce as a Java action through Oozie. I used avro-tools to extract the schema of the Avro files and then to compile it into a Java class (RxAvro.java).
Whenever I try to use the Avro object, I get the following ClassCastException:
java.lang.ClassCastException: org.apache.avro.generic.GenericData$Record cannot be cast to com.xxx.yyy.zz.jobs.actions.test.RxAvro
2018-01-05 07:46:07,038 WARN [main] org.apache.hadoop.mapred.YarnChild:
Exception running child : java.lang.ClassCastException: org.apache.avro.generic.GenericData$Record cannot be cast to com.xxx.yyy.zzz.jobs.actions.test.RxAvro
at com.xxx.yyy.zzz.jobs.actions.test.AvroParqueMapreduce$Map.map(AvroParqueMapreduce.java:70)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1796)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
MapReduce Code:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import parquet.avro.AvroParquetInputFormat;
import java.io.IOException;
public class AvroParqueMapreduce extends Configured implements Tool {
/**
* Main entry point for the example.
*
* @param args arguments
* @throws Exception when something goes wrong
*/
public static void main(final String[] args) throws Exception {
int res = ToolRunner.run(new Configuration(), new AvroParqueMapreduce(), args);
System.exit(res);
}
/**
* The MapReduce driver - setup and launch the job.
*
* @param args the command-line arguments
* @return the process exit code
* @throws Exception if something goes wrong
*/
public int run(final String[] args) throws Exception {
Path inputPath = new Path(args[0]);
Path outputPath = new Path(args[1]);
Configuration conf = super.getConf();
Job job = new Job(conf);
job.setJarByClass(AvroParqueMapreduce.class);
job.setInputFormatClass(AvroParquetInputFormat.class);
AvroParquetInputFormat.setInputPaths(job, inputPath);
job.setMapperClass(Map.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileOutputFormat.setOutputPath(job, outputPath);
return job.waitForCompletion(true) ? 0 : 1;
}
public static class Map extends Mapper<Void, RxAvro, Text, Text> {
@Override
public void map(Void key,
RxAvro value,
Context context) throws IOException, InterruptedException {
context.write(new Text(value.getObjectId().toString()),
new Text(value.getCollectionInterval().toString()));
}
}
}
Any help would be greatly appreciated.
There are several problems in your code. You should set your input format class properly as follows; AvroParquetInputFormat is not correct for Avro data.
job.setInputFormatClass(AvroKeyInputFormat.class);
In addition to that, you need to set up your input schema:
AvroJob.setInputKeySchema(job, RxAvro.getClassSchema());
This tutorial can be useful as a starting point: https://dzone.com/articles/mapreduce-avro-data-files
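Putting both suggestions together, a rough sketch of the driver and mapper changes (assuming RxAvro is your generated specific record class; with AvroKeyInputFormat the whole record arrives as the map key wrapped in AvroKey, and the value is NullWritable):
import java.io.IOException;
import org.apache.avro.mapred.AvroKey;
import org.apache.avro.mapreduce.AvroJob;
import org.apache.avro.mapreduce.AvroKeyInputFormat;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
public class RxAvroJobSketch {
    // With AvroKeyInputFormat the record is the key, wrapped in AvroKey; the value is NullWritable.
    public static class Map extends Mapper<AvroKey<RxAvro>, NullWritable, Text, Text> {
        @Override
        public void map(AvroKey<RxAvro> key, NullWritable value, Context context)
                throws IOException, InterruptedException {
            RxAvro record = key.datum();
            context.write(new Text(record.getObjectId().toString()),
                    new Text(record.getCollectionInterval().toString()));
        }
    }
    public static void configure(Job job) {
        job.setInputFormatClass(AvroKeyInputFormat.class);
        // Declare the reader schema so records are deserialized into RxAvro
        // instead of GenericData.Record (the cause of the ClassCastException).
        AvroJob.setInputKeySchema(job, RxAvro.getClassSchema());
    }
}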
I have a file
import java.io.IOException;
import java.nio.file.Paths;
import java.util.*;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.*;
import org.apache.hadoop.mapreduce.lib.output.*;
import org.apache.hadoop.util.*;
public class ViewCount extends Configured implements Tool {
public static void main(String args[]) throws Exception {
int res = ToolRunner.run(new ViewCount(), args);
System.exit(res);
}
public int run(String[] args) throws Exception {
//Path inputPath = new Path(args[0]);
Path inputPath = Paths.get("C:/WorkSpace/input.txt");
Path outputPath = Paths.get("C:/WorkSpace/output.txt");
Configuration conf = getConf();
Job job = new Job(conf, this.getClass().toString());
I am trying to run the app on Windows. How can I set inputPath and outputPath? The method I use now doesn't work. Before, I had
Path inputPath = new Path(args[0]);
Path outputPath = new Path(args[1]);
and I had to go to the command line. Now I want to run the app from the IDE.
I'm getting
Required:
org.apache.hadoop.fs.Path
Found:
java.nio.file.Path
For Eclipse, you can set arguments via:
Run -> Run Configurations -> Arguments.
It should be the same in IntelliJ.
The error tells you that it is expecting an org.apache.hadoop.fs.Path, but instead it receives a java.nio.file.Path.
This means you should drop the java.nio.file.Paths import (the second import) and use
org.apache.hadoop.fs.Path instead. IDE import suggestions can be wrong sometimes ;)
Change the import and then use the method you already had to add the input and output paths. Those arguments are given in Eclipse by right-clicking the project -> Run As -> Run Configurations -> Arguments. The two paths should be whitespace-separated. Apply and run!
For the next executions, just run the project.
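Concretely, the skeleton of the class then goes back to something like this (a sketch; the two paths are supplied as program arguments in the run configuration):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;          // Hadoop's Path, not java.nio.file.Path
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
public class ViewCount extends Configured implements Tool {
    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new ViewCount(), args));
    }
    @Override
    public int run(String[] args) throws Exception {
        // args[0] and args[1] come from Run Configurations -> Arguments,
        // e.g. "C:/WorkSpace/input.txt C:/WorkSpace/output"
        Path inputPath = new Path(args[0]);
        Path outputPath = new Path(args[1]);
        Configuration conf = getConf();
        Job job = Job.getInstance(conf, getClass().getSimpleName());
        // ... set the mapper, reducer, formats, and the input/output paths as before ...
        return job.waitForCompletion(true) ? 0 : 1;
    }
}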
The MaxTempMapper.java class:
package com.hadoop.gskCodeBase.maxTemp;
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class MaxTempMapper extends Mapper<LongWritable,Text,Text,IntWritable> {
private static final int MISSING=9999;
@Override
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
String line = value.toString();
String year = line.substring(15,19);
int airTemperature;
if(line.charAt(87)== '+'){
airTemperature=Integer.parseInt(line.substring(88, 92));
}else{
airTemperature=Integer.parseInt(line.substring(87, 92));
}
String quality=line.substring(92,93);
if(airTemperature !=MISSING && quality.matches("[01459]")){
context.write(new Text(year), new IntWritable(airTemperature));
}
}
}
The MaxTempReducer.java class:
package com.hadoop.gskCodeBase.maxTemp;
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
public class MaxTempReducer extends Reducer<Text,IntWritable,Text,IntWritable> {
@Override
public void reduce(Text key,Iterable<IntWritable> values,Context context) throws IOException,InterruptedException {
int maxValue = Integer.MIN_VALUE;
for(IntWritable value : values){
maxValue=Math.max(maxValue, value.get());
}
context.write(key, new IntWritable(maxValue));
}
}
The MaxTempDriver.java class:
package com.hadoop.gskCodeBase.maxTemp;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
public class MaxTempDriver extends Configured implements Tool{
public int run(String[] args) throws Exception{
if(args.length !=2){
System.err.println("UsageTemperatureDriver <input path> <outputpath>");
System.exit(-1);
}
Job job = Job.getInstance();
job.setJarByClass(MaxTempDriver.class);
job.setJobName("Max Temperature");
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job,new Path(args[1]));
job.setMapperClass(MaxTempMapper.class);
job.setReducerClass(MaxTempReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
boolean success = job.waitForCompletion(true);
return success ? 0 : 1;
}
public static void main(String[] args) throws Exception {
MaxTempDriver driver = new MaxTempDriver();
int exitCode = ToolRunner.run(driver, args);
System.exit(exitCode);
}
}
I have to execute the above three classes on a single-node Hadoop cluster on Windows using the command prompt.
Can someone please help me with how to execute these three classes from the command prompt (Windows)?
Archive all the Java files into a single .jar file. Then just run it as you normally do. On Windows, it's easier to run Hadoop via the Cygwin terminal. You can execute the job with the following command:
hadoop jar <path to .jar> <path to input folder in hdfs> <path to output folder in hdfs>
Eg:
hadoop jar wordcount.jar /input /output
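For this particular job (assuming the exported jar is named maxtemp.jar), that would look something like:
hadoop jar maxtemp.jar com.hadoop.gskCodeBase.maxTemp.MaxTempDriver /input /output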
-UPDATE-
You should assign your driver class to job.setJarByClass(). In this case, it would be your MaxTempDriver.class.
In Eclipse, you can create a jar file by right-clicking on your source folder > Export > JAR file. From there you can follow the steps. You can set your main class during the process as well.
Hope this answers your question.
I have a pseudo-distributed Hadoop setup on a Linux machine. I have done a few examples in Eclipse, which is also installed on that Linux machine, and they worked fine. Now I want to run MapReduce jobs through Eclipse (installed on a Windows machine) and access the HDFS that is already present on my Linux machine. I have written the following driver code:
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
public class Windows_Driver extends Configured implements Tool{
public static void main(String[] args) throws Exception {
int exitcode = ToolRunner.run(new Windows_Driver(), args);
System.exit(exitcode);
}
@Override
public int run(String[] arg0) throws Exception {
JobConf conf = new JobConf(Windows_Driver.class);
conf.set("fs.defaultFS", "hdfs://<Ip address>:50070");
FileInputFormat.setInputPaths(conf, new Path("sample"));
FileOutputFormat.setOutputPath(conf, new Path("sam"));
conf.setMapperClass(Win_Mapper.class);
conf.setMapOutputKeyClass(Text.class);
conf.setMapOutputValueClass(Text.class);
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(Text.class);
JobClient.runJob(conf);
return 0;
}
}
And the Mapper code :
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
public class Win_Mapper extends MapReduceBase implements Mapper<LongWritable, Text,Text, Text> {
@Override
public void map(LongWritable key, Text value, OutputCollector<Text, Text> o, Reporter arg3) throws IOException {
...
o.collect(... , ...);
}
}
When I run this, I get the following error:
SEVERE: PriviledgedActionException as:miracle cause:java.io.IOException: Failed to set permissions of path: \tmp\hadoop-miracle\mapred\staging\miracle1262421749\.staging to 0700
Exception in thread "main" java.io.IOException: Failed to set permissions of path: \tmp\hadoop-miracle\mapred\staging\miracle1262421749\.staging to 0700
at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:691)
at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:664)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:514)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:349)
at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:193)
at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:126)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:942)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:910)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1353)
at Windows_Driver.run(Windows_Driver.java:41)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at Windows_Driver.main(Windows_Driver.java:16)
How can I rectify the error? And how can I access my HDFS remotely from Windows?
The submit() method on the Job creates an internal JobSubmitter instance, which does all the data validation, including input path and output path availability, file/directory creation permissions, and other checks. During the different phases of an MR job it creates temporary directories under which it puts temp files. The temp directory is taken from core-site.xml via the hadoop.tmp.dir property. The issue on your system seems to be that the temp directory is /tmp/ and the user running the MR job doesn't have permission to change its rwx status to 700. Provide appropriate permissions and rerun the job.
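As an illustration only (hadoop.tmp.dir is the same property mentioned above; the path is just a placeholder for a directory the submitting user can write to), the temp directory could also be overridden in the driver before the job is submitted:
// inside Windows_Driver.run(), before JobClient.runJob(conf)
conf.set("hadoop.tmp.dir", "C:/hadoop-tmp");   // must point at a location the current user has rwx permission on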