I am writing a MapReduce application that takes input in (key, value) format and simply emits the same data from the reducer.
This is the sample input:
1500s 1
1960s 1
Aldus 1
In the code below, I set the input format to KeyValueTextInputFormat and specify tab as the delimiter in main(). When I run the code, I get this error:
java.lang.Exception: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.LongWritable
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.LongWritable
at cscie63.examples.WordDesc$KVMapper.map(WordDesc.java:1)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
I have tried several things to debug this, but nothing helped.
public class WordDesc {
public static class KVMapper
extends Mapper<Text, LongWritable, Text, LongWritable>{
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(Text key, LongWritable value , Context context
) throws IOException, InterruptedException {
context.write(key,value);
}
}
public static class KVReducer
extends Reducer<Text,LongWritable,Text,LongWritable> {
private IntWritable result = new IntWritable();
public void reduce(Text key, LongWritable value,
Context context
) throws IOException, InterruptedException {
context.write(key, value);
}
}
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
conf.set("mapreduce.input.keyvaluelinerecordreader.key.value.separator", "\t");
String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
if (otherArgs.length < 2) {
System.err.println("Usage: wordcount <in> [<in>...] <out>");
System.exit(2);
}
Job job = new Job(conf, "word desc");
job.setInputFormatClass(KeyValueTextInputFormat.class);
job.setJarByClass(WordDesc.class);
job.setMapperClass(KVMapper.class);
job.setCombinerClass(KVReducer.class);
job.setReducerClass(KVReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(LongWritable.class);
for (int i = 0; i < otherArgs.length - 1; ++i) {
FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
}
FileOutputFormat.setOutputPath(job,
new Path(otherArgs[otherArgs.length - 1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
The line job.setInputFormatClass(KeyValueTextInputFormat.class); tells Hadoop to deliver each input record as a (Text, Text) key-value pair. Because your mapper declares its input value as LongWritable, the cast fails with this exception.
A quick fix is to read the value as Text and then, if you want a LongWritable, parse it:
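public static class KVMapper
    extends Mapper<Text, Text, Text, LongWritable>{
  // Reused for every record to avoid allocating a new object per call.
  private final static LongWritable val = new LongWritable();
  @Override
  public void map(Text key, Text value, Context context)
      throws IOException, InterruptedException {
    // KeyValueTextInputFormat delivers the value as Text, so parse it explicitly.
    val.set(Long.parseLong(value.toString()));
    context.write(key,val);
  }
}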
Here value is a Text, value.toString() returns its String representation, and Long.parseLong() parses that String as a long. Finally, val.set() stores the result in a LongWritable.
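The two steps involved (splitting a line into key and value at the tab, then parsing the textual value) can be tried without Hadoop. The sketch below is plain Java; the class and both helper methods are hypothetical names made up for illustration, and the handling of a line without a tab is an assumption about how a key/value reader would treat it.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Plain-Java illustration only; it does not use any Hadoop classes.
class KeyValueParseSketch {

    // Hypothetical helper: split a line at the first tab into (key, value).
    static Map.Entry<String, String> splitAtFirstTab(String line) {
        int pos = line.indexOf('\t');
        if (pos < 0) {
            // Assumption: with no separator, the whole line is the key and the value is empty.
            return new SimpleEntry<>(line, "");
        }
        return new SimpleEntry<>(line.substring(0, pos), line.substring(pos + 1));
    }

    // Hypothetical helper: the value is text until it is parsed explicitly.
    static long parseCount(String value) {
        return Long.parseLong(value.trim());
    }

    public static void main(String[] args) {
        Map.Entry<String, String> kv = splitAtFirstTab("1500s\t1");
        System.out.println(kv.getKey() + " -> " + parseCount(kv.getValue()));
    }
}
```

The point of the sketch is that both halves of the line start out as text; nothing turns the second half into a number unless the code parses it.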
By the way, I don't think you need a reducer for this. You could make the job faster by setting the number of reduce tasks to 0 with job.setNumReduceTasks(0).
Related
I am trying to analyze retail store data and want to break down sales by city. Here is my data:
Date Time City Product-Cat Sale-Value Payment-Mode
2012-01-01 09:20 Fort Worth Women's Clothing 153.57 Visa
2012-01-01 09:00 San Jose Mens Clothing 214.05 Rupee
2012-01-01 09:00 San Diego Music 76.43 Amex
2012-01-01 09:00 New York Cameras 45.76 Visa
Now I want to calculate the sales breakdown by product category across all the stores.
Here are the mapper, the reducer and the main class:
public class RetailDataAnalysis {
public static class RetailDataAnalysisMapper extends Mapper<Text,Text,Text,Text>{
// when trying with LongWritable Key
public void map(LongWritable key,Text Value,Context context) throws IOException, InterruptedException{
String analyser [] = Value.toString().split(",");
Text productCategory = new Text(analyser[3]);
Text salesPrice = new Text(analyser[4]);
context.write(productCategory, salesPrice);
}
// When trying with Text key
public void map(Text key,Text Value,Context context) throws IOException, InterruptedException{
String analyser [] = Value.toString().split(",");
Text productCategory = new Text(analyser[3]);
Text salesPrice = new Text(analyser[4]);
context.write(productCategory, salesPrice);
}
}
public static class RetailDataAnalysisReducer extends Reducer<Text,Text,Text,Text>{
protected void reduce(Text key,Iterable<Text> values,Context context)throws IOException, InterruptedException{
String csv ="";
for(Text value:values){
if(csv.length()>0){
csv+= ",";
}
csv+=value.toString();
}
context.write(key, new Text(csv));
}
}
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
String [] otherArgs = new GenericOptionsParser(conf,args).getRemainingArgs();
if(otherArgs.length<2){
System.out.println("Usage Retail Data ");
System.exit(2);
}
Job job= new Job(conf,"Retail Data Analysis");
job.setJarByClass(RetailDataAnalysis.class);
job.setMapperClass(RetailDataAnalysisMapper.class);
job.setCombinerClass(RetailDataAnalysisReducer.class);
job.setReducerClass(RetailDataAnalysisReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
for(int i=0;i<otherArgs.length-1;++i){
FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
}
FileOutputFormat.setOutputPath(job, new Path(otherArgs[otherArgs.length-1]));
System.exit(job.waitForCompletion(true)?0:1);
}
}
This is the exception I get when using a LongWritable key:
18/04/11 09:15:40 INFO mapreduce.Job: Task Id : attempt_1523355254827_0008_m_000000_2, Status : FAILED
Error: java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, received org.apache.hadoop.io.LongWritable
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1069)
This is the exception I get when using a Text key:
Error: java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, received org.apache.hadoop.io.LongWritable
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1069)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:712)
at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
at org.apache.hadoop.mapreduce.Mapper.map(Mapper.java:124)
Please help me solve this; I am very new to Hadoop.
You may need a different input format class. The default is TextInputFormat, which splits the file line by line and passes the byte offset of each line as a LongWritable key and the line itself as a Text value.
You can specify the input format class this way:
job.setInputFormatClass(TextInputFormat.class);
In your case, if you do not need the key, just the values, you can use LongWritable as key:
public static class RetailDataAnalysisMapper extends Mapper<LongWritable, Text, Text, Text> {
public void map(LongWritable key, Text Value, Context context) throws IOException, InterruptedException {
//...
}
}
Edit:
Here is the whole code after modifying it to use LongWritable as the key:
public class RetailDataAnalysis {
public static class RetailDataAnalysisMapper extends Mapper<LongWritable, Text, Text, Text> {
public void map(LongWritable key, Text Value, Context context) throws IOException, InterruptedException {
String analyser[] = Value.toString().split(",");
Text productCategory = new Text(analyser[3]);
Text salesPrice = new Text(analyser[4]);
context.write(productCategory, salesPrice);
}
}
public static class RetailDataAnalysisReducer extends Reducer<Text, Text, Text, Text> {
protected void reduce(Text key, Iterable<Text> values, Context context)
throws IOException, InterruptedException {
String csv = "";
for (Text value : values) {
if (csv.length() > 0) {
csv += ",";
}
csv += value.toString();
}
context.write(key, new Text(csv));
}
}
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
if (otherArgs.length < 2) {
System.out.println("Usage Retail Data ");
System.exit(2);
}
Job job = new Job(conf, "Retail Data Analysis");
job.setJarByClass(RetailDataAnalysis.class);
job.setMapperClass(RetailDataAnalysisMapper.class);
job.setCombinerClass(RetailDataAnalysisReducer.class);
job.setReducerClass(RetailDataAnalysisReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
for (int i = 0; i < otherArgs.length - 1; ++i) {
FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
}
FileOutputFormat.setOutputPath(job, new Path(otherArgs[otherArgs.length - 1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
Also, if you are splitting the data on a comma (,), your data should be a CSV with date and time as separate fields, so that the product category is at index 3 and the sale value at index 4, like this:
2012-01-01,09:20,Fort Worth,Women's Clothing,153.57,Visa
2012-01-01,09:00,San Jose,Mens Clothing,214.05,Rupee
2012-01-01,09:00,San Diego,Music,76.43,Amex
2012-01-01,09:00,New York,Cameras,45.76,Visa
Not space-separated as shown in your question.
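To see why the separator matters, here is a small plain-Java sketch of the field extraction the mapper performs. The class, the method and the column layout (date, time, city, category, sale value, payment mode as six comma-separated fields) are assumptions for illustration, chosen so that indices 3 and 4 match the mapper in the question.

```java
// Plain-Java illustration only; it does not use any Hadoop classes.
class CsvFieldSketch {
    // Assumed column layout: date,time,city,category,saleValue,paymentMode
    static final int CATEGORY = 3;
    static final int SALE_VALUE = 4;

    // Hypothetical helper: returns {category, saleValue}, or null if the line
    // does not have enough comma-separated fields.
    static String[] categoryAndSale(String line) {
        String[] fields = line.split(",");
        if (fields.length <= SALE_VALUE) {
            return null; // malformed line, or a line that is not comma-separated
        }
        return new String[] { fields[CATEGORY].trim(), fields[SALE_VALUE].trim() };
    }

    public static void main(String[] args) {
        String[] ok = categoryAndSale("2012-01-01,09:20,Fort Worth,Women's Clothing,153.57,Visa");
        System.out.println(ok[0] + " / " + ok[1]);
        // A space-separated line contains no commas, so split(",") yields one field.
        System.out.println(categoryAndSale("2012-01-01 09:20 Fort Worth Women's Clothing 153.57 Visa"));
    }
}
```

Checking the field count before indexing also avoids an ArrayIndexOutOfBoundsException in the mapper when a malformed line shows up.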
When you read a file with MapReduce, the default file input format reads one line at a time and sends it to the mapper as a <LongWritable, Text> pair, so the mapper declaration becomes:
public static class RetailDataAnalysisMapper extends Mapper<LongWritable,Text,Text,Text>
In case you need to read as
public static class RetailDataAnalysisMapper extends Mapper<Text,Text,Text,Text>
you would need to change the input format, for example to the built-in KeyValueTextInputFormat or to a custom input format with its own record reader.
Then add the following line to the driver code:
job.setInputFormatClass(YourCustomInputFormat.class); // placeholder name for whichever input format you choose
Hadoop handles everything as <key, value> pairs, so when you read a file with the default input format, the byte offset of each line becomes the LongWritable key and the line itself becomes the Text value.
So you need to use the default signature Mapper<LongWritable, Text, <anything>, <anything>>.
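As a rough illustration of what "offset as key, line as value" means, the plain-Java sketch below computes the byte offset at which each line starts. It is not Hadoop code; the class and method names are hypothetical, and it assumes UTF-8 text with single-byte \n line endings.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java illustration only; it does not use any Hadoop classes.
class LineOffsetSketch {

    // Hypothetical helper: maps the starting byte offset of each line to the line text,
    // mimicking the (offset, line) pairs a line-oriented input format hands to a mapper.
    static Map<Long, String> offsetsAndLines(String content) {
        Map<Long, String> records = new LinkedHashMap<>();
        if (content.isEmpty()) {
            return records;
        }
        long offset = 0;
        for (String line : content.split("\n")) {
            records.put(offset, line);
            // Advance past the line's bytes plus the one-byte newline (assumed \n).
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        offsetsAndLines("1500s\t1\n1960s\t1\nAldus\t1\n")
            .forEach((offset, line) -> System.out.println(offset + " => " + line));
    }
}
```

The key is a position in the file rather than anything taken from the line, which is why a mapper that only cares about the line content can ignore it.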
I'm trying to run a Hadoop 2 MapReduce job whose output format class is SequenceFileOutputFormat and whose input format class is SequenceFileInputFormat.
The mapper emits both key and value as BytesWritable. The reducer emits the key as IntWritable and the value as BytesWritable.
Every time, I get the following error:
Error: java.io.IOException: wrong key class: org.apache.hadoop.io.BytesWritable is not class org.apache.hadoop.io.IntWritable
at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1306)
at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat$1.write(SequenceFileOutputFormat.java:83)
at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:558)
at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:105)
at org.apache.hadoop.mapreduce.Reducer.reduce(Reducer.java:150)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
I discovered that the problem goes away when the output format is not SequenceFileOutputFormat, but I need it to be SequenceFileOutputFormat.
Here is the main:
Configuration conf = new Configuration(true);
conf.set("refpath", "/out/Sample1/Local/EU/CloudBurst/BinaryFiles/ref.br");
conf.set("qrypath", "/out/Sample1/Local/EU/CloudBurst/BinaryFiles/qry.br");
conf.set("MIN_READ_LEN", Integer.toString(MIN_READ_LEN));
conf.set("MAX_READ_LEN", Integer.toString(MAX_READ_LEN));
conf.set("K", Integer.toString(K));
conf.set("SEED_LEN", Integer.toString(SEED_LEN));
conf.set("FLANK_LEN", Integer.toString(FLANK_LEN));
conf.set("ALLOW_DIFFERENCES", Integer.toString(ALLOW_DIFFERENCES));
conf.set("BLOCK_SIZE", Integer.toString(BLOCK_SIZE));
conf.set("REDUNDANCY", Integer.toString(REDUNDANCY));
conf.set("FILTER_ALIGNMENTS", (FILTER_ALIGNMENTS ? "1" : "0"));
Job job = new Job(conf,"CloudBurst");
job.setNumReduceTasks(NUM_REDUCE_TASKS); // MV2
//conf.setNumMapTasks(NUM_MAP_TASKS); TODO find solution for mv2
FileInputFormat.addInputPath(job, new Path("/out/Sample1/Local/EU/CloudBurst/BinaryFiles/ref.br"));//TODO change it fit to the params
FileInputFormat.addInputPath(job, new Path("/out/Sample1/Local/EU/CloudBurst/BinaryFiles/qry.br"));//TODO change it fit to the params
job.setJarByClass(MerReduce.class);//mv2
job.setInputFormatClass(SequenceFileInputFormat.class);
job.setOutputFormatClass(SequenceFileOutputFormat.class);
// The order of seeds is not important, but make sure the reference seeds are seen before the qry seeds
job.setPartitionerClass(MerReduce.PartitionMers.class); // mv2
job.setGroupingComparatorClass(MerReduce.GroupMersWC.class); //mv2 TODO
job.setMapperClass(MerReduce.MapClass.class);
job.setReducerClass(MerReduce.ReduceClass.class);
job.setMapOutputKeyClass(BytesWritable.class);//mv2
job.setMapOutputValueClass(BytesWritable.class);//mv2
job.setOutputKeyClass(IntWritable.class);
job.setOutputValueClass(BytesWritable.class);
Path oPath = new Path("/out/Sample1/Local/EU/Vectors");//TODO change it fit to the params
//conf.setOutputPath(oPath);
FileOutputFormat.setOutputPath(job, oPath);
System.err.println(" Removing old results");
FileSystem.get(conf).delete(oPath);
int code = job.waitForCompletion(true) ? 0 : 1;
System.err.println("Finished");
}
The mapper class headline:
public static class MapClass extends Mapper<IntWritable, BytesWritable, BytesWritable, BytesWritable>
public void map(IntWritable id, BytesWritable rawRecord,Context context) throws IOException, InterruptedException
The reducer class headline:
public static class ReduceClass extends Reducer<BytesWritable, BytesWritable, IntWritable, BytesWritable>
public synchronized void reduce(BytesWritable mer, Iterator<BytesWritable> values,Context context)
throws IOException, InterruptedException {
Does anybody have an idea?
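SequenceFileInputFormat is the right input format for a sequence file with IntWritable keys and BytesWritable values, so that line can stay as it is. (job.setInputFormatClass(IntWritable.class) would not compile, because IntWritable is not an InputFormat.)
The stack trace points at org.apache.hadoop.mapreduce.Reducer.reduce(Reducer.java:150), which is the default reduce of the base class, not yours. Your reduce takes Iterator<BytesWritable>, while the method to override takes Iterable<BytesWritable>, so your method is only an overload and is never called. The default reduce passes the BytesWritable key straight through, and SequenceFileOutputFormat rejects it because the job declares IntWritable as the output key class.
Change the parameter to Iterable<BytesWritable> and annotate the method with @Override so the compiler catches this kind of mismatch.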
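The effect of declaring reduce with Iterator instead of Iterable, as the reducer in the question does, can be reproduced without Hadoop. In the plain-Java sketch below, BaseReducer is a hypothetical stand-in for a framework base class whose default reduce passes data through; none of these names are Hadoop APIs.

```java
import java.util.Arrays;
import java.util.Iterator;

// Plain-Java illustration only; it does not use any Hadoop classes.
class OverloadVsOverrideSketch {

    // Hypothetical stand-in for a framework base class with a pass-through default.
    static class BaseReducer {
        String reduce(String key, Iterable<Integer> values) {
            return "identity:" + key;
        }

        // The "framework" only ever calls the Iterable version.
        String run(String key, Iterable<Integer> values) {
            return reduce(key, values);
        }
    }

    // Iterator instead of Iterable: this is an overload, not an override,
    // so run() never reaches it. Adding @Override here would be a compile error.
    static class WrongReducer extends BaseReducer {
        String reduce(String key, Iterator<Integer> values) {
            return "custom:" + key;
        }
    }

    // Matching parameter types: this really overrides the default.
    static class RightReducer extends BaseReducer {
        @Override
        String reduce(String key, Iterable<Integer> values) {
            return "custom:" + key;
        }
    }

    public static void main(String[] args) {
        System.out.println(new WrongReducer().run("k", Arrays.asList(1, 2)));
        System.out.println(new RightReducer().run("k", Arrays.asList(1, 2)));
    }
}
```

The same reasoning applies to any mapper or reducer method whose parameter types differ from the base class: the job compiles and runs, but the default pass-through behaviour is used and the output types no longer match the job configuration.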
In Hadoop 2.4.0, I get the following error when running the code sample below. I think there is a Hadoop version mismatch. Could you review the code, and how can I fix it?
I am trying to write a MapReduce job that copies an HCatalog table.
Thank you.
Exception in thread "main" java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
at org.apache.hcatalog.mapreduce.HCatBaseOutputFormat.getJobInfo(HCatBaseOutputFormat.java:94)
at org.apache.hcatalog.mapreduce.HCatBaseOutputFormat.getOutputFormat(HCatBaseOutputFormat.java:82)
at org.apache.hcatalog.mapreduce.HCatBaseOutputFormat.checkOutputSpecs(HCatBaseOutputFormat.java:72)
at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:458)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:343)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
at org.deneme.hadoop.UseHCat.run(UseHCat.java:102)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.deneme.hadoop.UseHCat.main(UseHCat.java:107)
Code Sample
public class UseHCat extends Configured implements Tool{
public static class Map extends Mapper<WritableComparable, HCatRecord,Text,IntWritable> {
String groupname;
@Override
protected void map( WritableComparable key,
HCatRecord value,
org.apache.hadoop.mapreduce.Mapper<WritableComparable, HCatRecord,
Text, IntWritable>.Context context)
throws IOException, InterruptedException {
// The group table from /etc/group has name, 'x', id
groupname = (String) value.get(0);
int id = (Integer) value.get(2);
// Just select and emit the name and ID
context.write(new Text(groupname), new IntWritable(id));
}
}
public static class Reduce extends Reducer<Text, IntWritable,
WritableComparable, HCatRecord> {
protected void reduce( Text key,
java.lang.Iterable<IntWritable> values,
org.apache.hadoop.mapreduce.Reducer<Text, IntWritable,
WritableComparable, HCatRecord>.Context context)
throws IOException, InterruptedException {
// Only expecting one ID per group name
Iterator<IntWritable> iter = values.iterator();
IntWritable iw = iter.next();
int id = iw.get();
// Emit the group name and ID as a record
HCatRecord record = new DefaultHCatRecord(2);
record.set(0, key.toString());
record.set(1, id);
context.write(null, record);
}
}
public int run(String[] args) throws Exception {
Configuration conf = getConf(); //hdfs://sandbox.hortonworks.com:8020
//conf.set("fs.defaultFS", "hdfs://192.168.1.198:8020");
//conf.set("mapreduce.job.tracker", "192.168.1.115:50001");
//Configuration conf = new Configuration();
//conf.set("fs.defaultFS", "hdfs://192.168.1.198:8020/data");
args = new GenericOptionsParser(conf, args).getRemainingArgs();
// Get the input and output table names as arguments
String inputTableName = args[0];
String outputTableName = args[1];
// Assume the default database
String dbName = null;
String jobName = "UseHCat";
String userChosenName = getConf().get(JobContext.JOB_NAME);
if (userChosenName != null)
jobName += ": " + userChosenName;
Job job = Job.getInstance(getConf());
job.setJobName(jobName);
// Job job = new Job(conf, "UseHCat");
// HCatInputFormat.setInput(job, InputJobInfo.create(dbName,inputTableName, null));
HCatInputFormat.setInput(job, dbName, inputTableName);
job.setJarByClass(UseHCat.class);
job.setMapperClass(Map.class);
job.setReducerClass(Reduce.class);
// An HCatalog record as input
job.setInputFormatClass(HCatInputFormat.class);
// Mapper emits a string as key and an integer as value
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
// Ignore the key for the reducer output; emitting an HCatalog record as value
job.setOutputKeyClass(WritableComparable.class);
job.setOutputValueClass(DefaultHCatRecord.class);
job.setOutputFormatClass(HCatOutputFormat.class);
HCatOutputFormat.setOutput(job, OutputJobInfo.create(dbName, outputTableName, null));
HCatSchema s = HCatOutputFormat.getTableSchema(job.getConfiguration());
System.err.println("INFO: output schema explicitly set for writing:" + s);
HCatOutputFormat.setSchema(job, s);
return (job.waitForCompletion(true) ? 0 : 1);
}
public static void main(String[] args) throws Exception {
// System.setProperty("hadoop.home.dir", "C:"+File.separator+"hadoop-2.4.0");
int exitCode = ToolRunner.run(new UseHCat(), args);
System.exit(exitCode);
}
}
In Hadoop 1.x.x JobContext is a class, whereas in Hadoop 2.x.x it is an interface, and the HCatalog-core APIs are not compatible with Hadoop 2.x.x.
The HCatBaseOutputFormat class needs the following code change to fix the issue:
//JobContext ctx = new JobContext(conf,jobContext.getJobID());
JobContext ctx = new Job(conf);
I am using MultipleTextOutputFormat to split a single file into multiple files, i.e. each line goes to a new file.
This is my code:
public class MOFExample extends Configured implements Tool {
private static double count = 0;
static class KeyBasedMultipleTextOutputFormat extends
MultipleTextOutputFormat<Text, Text> {
@Override
protected String generateFileNameForKeyValue(Text key, Text value,
String name) {
return count++ + "_";// + name;
}
}
/**
* The main job driver.
*/
public int run(final String[] args) throws Exception {
Path csvInputs = new Path(args[0]);
Path outputDir = new Path(args[1]);
JobConf jobConf = new JobConf(super.getConf());
jobConf.setJarByClass(MOFExample.class);
jobConf.setMapperClass(IdentityMapper.class);
jobConf.setInputFormat(KeyValueTextInputFormat.class);
jobConf.setOutputFormat(KeyBasedMultipleTextOutputFormat.class);
jobConf.setOutputValueClass(Text.class);
jobConf.setOutputKeyClass(Text.class);
FileInputFormat.setInputPaths(jobConf, csvInputs);
FileOutputFormat.setOutputPath(jobConf, outputDir);
//jobConf.setNumMapTasks(4);
jobConf.setNumReduceTasks(4);
return JobClient.runJob(jobConf).isSuccessful() ? 0 : 1;
}
public static void main(final String[] args) throws Exception {
int res = ToolRunner.run(new Configuration(), new MOFExample(), args);
System.exit(res);
}
}
This code runs fine on a small text file, but when the input file has more than 1900 lines, which is still not a large file, it throws an exception:
Exception in thread "main" java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:836)
at MOFExample.run(MOFExample.java:57)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at MOFExample.main(MOFExample.java:61)
I also tried this tutorial, but it produces an empty output directory without any exception when the input file is large; it also worked fine with a small input file.
Note: I am using a single-node cluster.
In the map phase I read an HDFS file and write updates to HBase.
Versions: Hadoop 2.5.1, HBase 1.0.0
The exception is as follows:
Exception in thread "main" org.apache.hadoop.mapred.InvalidJobConfException: Output directory not set.
Maybe there is something wrong with
job.setOutputFormatClass(TableOutputFormat.class);
This line shows the error:
The method setOutputFormatClass(Class<? extends OutputFormat>) in the type Job is not applicable for the arguments (Class<TableOutputFormat>)
The code is as follows:
public class HdfsAppend2HbaseUtil extends Configured implements Tool{
public static class HdfsAdd2HbaseMapper extends Mapper<Text, Text, ImmutableBytesWritable, Put>{
public void map(Text ikey, Text ivalue, Context context)
throws IOException, InterruptedException {
String oldIdList = HBaseHelper.getValueByKey(ikey.toString());
StringBuffer sb = new StringBuffer(oldIdList);
String newIdList = ivalue.toString();
sb.append("\t" + newIdList);
Put p = new Put(ikey.toString().getBytes());
p.addColumn("idFam".getBytes(), "idsList".getBytes(), sb.toString().getBytes());
context.write(new ImmutableBytesWritable(), p);
}
}
public int run(String[] paths) throws Exception {
Configuration conf = HBaseConfiguration.create();
conf.set("hbase.zookeeper.quorum", "master,salve1");
conf.set("hbase.zookeeper.property.clientPort", "2181");
Job job = Job.getInstance(conf,"AppendToHbase");
job.setJarByClass(cn.edu.hadoop.util.HdfsAppend2HbaseUtil.class);
job.setInputFormatClass(KeyValueTextInputFormat.class);
job.setMapperClass(HdfsAdd2HbaseMapper.class);
job.setNumReduceTasks(0);
job.setOutputFormatClass(TableOutputFormat.class);
job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, "reachableTable");
FileInputFormat.setInputPaths(job, new Path(paths[0]));
job.setOutputKeyClass(ImmutableBytesWritable.class);
job.setOutputValueClass(Put.class);
return job.waitForCompletion(true) ? 0 : 1;
}
public static void main(String[] args) throws Exception {
System.out.println("Append Start: ");
long time1 = System.currentTimeMillis();
long time2;
String[] pathsStr = {Const.TwoDegreeReachableOutputPathDetail};
int exitCode = ToolRunner.run(new HdfsAppend2HbaseUtil(), pathsStr);
time2 = System.currentTimeMillis();
System.out.println("Append Cost " + "\t" + (time2-time1)/1000 +" s");
System.exit(exitCode);
}
}
You did not set the output directory, as you did for the input path.
Set it like this:
FileOutputFormat.setOutputPath(job, new Path(<output path>));
At last I know why. As I suspected, there was something wrong with:
job.setOutputFormatClass(TableOutputFormat.class);
This line shows the error:
The method setOutputFormatClass(Class<? extends OutputFormat>) in the type Job is not applicable for the arguments (Class<TableOutputFormat>)
In fact we need to import
org.apache.hadoop.hbase.mapreduce.TableOutputFormat
and not
org.apache.hadoop.hbase.mapred.TableOutputFormat
The mapred version extends org.apache.hadoop.mapred.FileOutputFormat, see:
https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapred/TableOutputFormat.html
and the mapreduce version extends
org.apache.hadoop.mapreduce.OutputFormat
see:
https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.html
Thank you all very much!