Loading data via HFile into HBase not working - Java

I wrote a mapper to load data from disk into HBase via HFiles. The program runs successfully, but no data ends up in my HBase table. Any ideas what's wrong?
Here's my java program:
protected void writeToHBaseViaHFile() throws Exception {
try {
System.out.println("In try...");
Configuration conf = HBaseConfiguration.create();
conf.set("hbase.zookeeper.quorum", "XXXX");
Connection connection = ConnectionFactory.createConnection(conf);
System.out.println("got connection");
String inputPath = "/tmp/nuggets_from_Hive/part-00000";
String outputPath = "/tmp/mytemp" + new Random().nextInt(1000);
final TableName tableName = TableName.valueOf("steve1");
System.out.println("got table steve1, outputPath = " + outputPath);
// tag::SETUP[]
Table table = connection.getTable(tableName);
Job job = Job.getInstance(conf, "ConvertToHFiles");
System.out.println("job is setup...");
HFileOutputFormat2.configureIncrementalLoad(job, table,
connection.getRegionLocator(tableName)); // <1>
System.out.println("done configuring incremental load...");
job.setInputFormatClass(TextInputFormat.class); // <2>
job.setJarByClass(Importer.class); // <3>
job.setMapperClass(LoadDataMapper.class); // <4>
job.setMapOutputKeyClass(ImmutableBytesWritable.class); // <5>
job.setMapOutputValueClass(KeyValue.class); // <6>
FileInputFormat.setInputPaths(job, inputPath);
HFileOutputFormat2.setOutputPath(job, new org.apache.hadoop.fs.Path(outputPath));
System.out.println("Setup complete...");
// end::SETUP[]
if (!job.waitForCompletion(true)) {
System.out.println("Failure");
} else {
System.out.println("Success");
}
} catch (Exception e) {
e.printStackTrace();
}
}
Here's my mapper class:
public class LoadDataMapper extends Mapper<LongWritable, Text, ImmutableBytesWritable, Cell> {
public static final byte[] FAMILY = Bytes.toBytes("pd");
public static final byte[] COL = Bytes.toBytes("bf");
public static final ImmutableBytesWritable rowKey = new ImmutableBytesWritable();
@Override
protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
String[] line = value.toString().split("\t"); // <1>
byte[] rowKeyBytes = Bytes.toBytes(line[0]);
rowKey.set(rowKeyBytes);
KeyValue kv = new KeyValue(rowKeyBytes, FAMILY, COL, Bytes.toBytes(line[1])); // <6>
context.write (rowKey, kv); // <7>
System.out.println("line[0] = " + line[0] + "\tline[1] = " + line[1]);
}
}
I've created the table steve1 in my cluster, but got 0 rows after the program runs successfully:
hbase(main):007:0> count 'steve1'
0 row(s) in 0.0100 seconds
=> 0
What I've tried:
I tried adding a print statement in the mapper class to see whether it actually reads the data, but nothing ever got printed to my console.
I'm at a loss as to how to debug this.
Any ideas are greatly appreciated!

This only creates the HFiles; you still need to load them into your table. For example, you need to do something like:
LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
loader.doBulkLoad(new Path(outputPath), admin, hTable, regionLocator);
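A minimal sketch of how that load step could look right after job.waitForCompletion(true) in the question's code, assuming the HBase 1.x client API (org.apache.hadoop.hbase.client.Admin/Table/RegionLocator, org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles, org.apache.hadoop.fs.Path) and reusing the connection, conf, tableName and outputPath variables from the question:
// Bulk-load the HFiles produced under outputPath into the table's regions
try (Admin admin = connection.getAdmin();
     Table htable = connection.getTable(tableName);
     RegionLocator regionLocator = connection.getRegionLocator(tableName)) {
    LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
    loader.doBulkLoad(new Path(outputPath), admin, htable, regionLocator);
}
The loader moves the generated HFiles into the region directories, so after this step count 'steve1' in the shell should show the imported rows.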

Related

MapReduce: Reduce function is writing strange values that are not expected

My reduce function in Java is writing values to the output file that I don't expect. I inspected my code with breakpoints and saw that, for every context.write call I make, the key and the value I'm writing are correct. Where am I going wrong?
What I'm trying to do is take input rows of the form date, customer, vendor, amount, each representing a transaction, and generate a dataset with rows of the form date, user, balance, where the balance is the sum of all transactions in which the user was either the customer or the vendor.
Here is my code:
public class Transactions {
public static class TokenizerMapper
extends Mapper<Object, Text, Text, Text>{
public void map(Object key, Text value, Context context
) throws IOException, InterruptedException {
var splittedValues = value.toString().split(",");
var date = splittedValues[0];
var customer = splittedValues[1];
var vendor = splittedValues[2];
var amount = splittedValues[3];
var reduceValue = new Text(customer + "," + vendor + "," + amount);
context.write(new Text(date), reduceValue);
}
}
public static class IntSumReducer
extends Reducer<Text,Text,Text,Text> {
public void reduce(Text key, Iterable<Text> values,
Context context
) throws IOException, InterruptedException {
Map<String, Integer> balanceByUserId = new ConcurrentHashMap<>();
values.forEach(transaction -> {
var splittedTransaction = transaction.toString().split(",");
var customer = splittedTransaction[0];
var vendor = splittedTransaction[1];
var amount = 0;
if (splittedTransaction.length > 2) {
amount = Integer.parseInt(splittedTransaction[2]);
}
if (!balanceByUserId.containsKey(customer)) {
balanceByUserId.put(customer, 0);
}
if (!balanceByUserId.containsKey(vendor)) {
balanceByUserId.put(vendor, 0);
}
balanceByUserId.put(customer, balanceByUserId.get(customer) - amount);
balanceByUserId.put(vendor, balanceByUserId.get(vendor) + amount);
});
balanceByUserId.entrySet().forEach(entry -> {
var reducerValue = new Text(entry.getKey() + "," + entry.getValue().toString());
try {
context.write(key, reducerValue);
} catch (IOException e) {
e.printStackTrace();
} catch (InterruptedException e) {
e.printStackTrace();
}
});
}
}
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "transactions");
job.setJarByClass(Transactions.class);
job.setMapperClass(TokenizerMapper.class);
job.setCombinerClass(IntSumReducer.class);
job.setReducerClass(IntSumReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
where the balance is the sum of all transactions in which the user was either the customer or the vendor
balanceByUserId exists only per unique date, because your map output key is the date.
If you want to aggregate by customer info (name / ID?), then the customer should be the key of the mapper output.
Once all the data for each customer is grouped in the reducer, you can sort by date if needed, but aggregate by the other details, as in the sketch below.
It is also worth pointing out that this would be easier in Hive or SparkSQL than in MapReduce.
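Purely as an illustration (class and field names here are invented, not the asker's, and the date is ignored for simplicity; a composite user+date key could be used if a per-date balance is needed), keying by user instead of date could look like this: each transaction emits two records, one debiting the customer and one crediting the vendor, and the reducer just sums the signed amounts per user. Amounts are assumed to be integers, as in the original code, and these classes would sit inside the existing Transactions driver, which would also need job.setMapOutputValueClass(IntWritable.class) and job.setOutputValueClass(IntWritable.class).
public static class UserBalanceMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final Text user = new Text();
    private final IntWritable signedAmount = new IntWritable();

    @Override
    protected void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        // Input row: date,customer,vendor,amount
        String[] fields = value.toString().split(",");
        int amount = Integer.parseInt(fields[3].trim());
        user.set(fields[1]);            // the customer is debited
        signedAmount.set(-amount);
        context.write(user, signedAmount);
        user.set(fields[2]);            // the vendor is credited
        signedAmount.set(amount);
        context.write(user, signedAmount);
    }
}

public static class BalanceReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text user, Iterable<IntWritable> amounts, Context context)
            throws IOException, InterruptedException {
        int balance = 0;
        for (IntWritable a : amounts) {
            balance += a.get();
        }
        context.write(user, new IntWritable(balance));
    }
}
Because summation is associative and the input and output types match, BalanceReducer can also be registered as the combiner without changing the result.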

Run a Cypher query from Spring to Neo4j

I have uploaded a CSV file and already have nodes and relationships defined in Neo4j. I've tried to create a program, based on an example, that runs a Cypher query from Spring and returns the output from Neo4j. However, I'm encountering this error:
Exception in thread "main" java.lang.NoSuchMethodError:org.neo4j.graphdb.factory.GraphDatabaseFactory.newEmbeddedDatabase(Ljava/io/File;)Lorg/neo4j/graphdb/GraphDatabaseService;
at org.neo4j.connection.Neo4j.run(Neo4j.java:43)
at org.neo4j.connection.Neo4j.main(Neo4j.java:37)
I'm wondering what could possibly be the error?
Here is my code:
public class Neo4j{
public enum NodeType implements Label{
Issues, Cost, Reliability, Timeliness;
}
public enum RelationType implements RelationshipType{
APPLIES_TO
}
String rows = "";
String nodeResult;
String resultString;
String columnString;
private static File DB_PATH = new File("/Users/phaml1/Documents/Neo4j/default.graphdb/import/");
public static void main(String[] args){
Neo4j test = new Neo4j();
test.run();
}
void run()
{
clear();
GraphDatabaseService db = new GraphDatabaseFactory().newEmbeddedDatabase(DB_PATH);
try(Transaction tx1 = db.beginTx();
Result result = db.execute("MATCH(b:Business)-[:APPLIES_TO]->(e:Time) RETURN b,e"))
{
while(result.hasNext())
{
while ( result.hasNext() )
{
Map<String,Object> row = result.next();
for ( Entry<String,Object> column : row.entrySet() )
{
rows += column.getKey() + ": " + column.getValue() + "; ";
}
rows += "\n";
}
}
try (Transaction something = db.beginTx();
Result result1 = db.execute("MATCH(b:Business)-[:APPLIES_TO]->(e:Time) RETURN b,e"))
{
Iterator<Node> n_column = result.columnAs("n");
for(Node node: Iterators.asIterable(n_column))
{
nodeResult = node + ": " + node.getProperties("Description");
}
List<String> columns = result.columns();
columnString = columns.toString();
resultString = db.execute("MATCH(b:Business)-[:APPLIES_TO]->(e:Time) RETURN b,e").resultAsString();
}
db.shutdown();
}
}
private void clear(){
try{
deleteRecursively(DB_PATH);
}
catch(IOException e){
throw new RuntimeException(e);
}
}
}
It looks like a Neo4j version conflict.
GraphDatabaseService db = new GraphDatabaseFactory().newEmbeddedDatabase(DB_PATH);
takes a String argument in Neo4j 2.x (https://neo4j.com/api_docs/2.0.3/org/neo4j/graphdb/factory/GraphDatabaseFactory.html#newEmbeddedDatabase(java.lang.String))
but a File in Neo4j 3.x (http://neo4j.com/docs/java-reference/current/javadocs/org/neo4j/graphdb/factory/GraphDatabaseFactory.html#newEmbeddedDatabase-java.io.File-).
SDN is probably pulling in Neo4j 2.3.6 as a dependency; please check your dependency tree and override the Neo4j version.
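As an illustration only (not part of the original answer), with Maven you could run mvn dependency:tree to see which Neo4j version ends up on the classpath, and then pin the 3.x version in the POM; the version number below is an assumption and should match whatever your code was compiled against:
<!-- in pom.xml: force the embedded Neo4j version that SDN resolves to -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.neo4j</groupId>
      <artifactId>neo4j</artifactId>
      <version>3.0.6</version> <!-- assumed 3.x version; adjust to your setup -->
    </dependency>
  </dependencies>
</dependencyManagement>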

org.apache.hadoop.io.Text cannot be cast to org.apache.hive.hcatalog.data.HCatRecord

I wrote a script which can take data from HBase, parse it and then save it into Hive. But I am getting this error:
org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hive.hcatalog.data.HCatRecord
at org.apache.hive.hcatalog.mapreduce.FileRecordWriterContainer.write(FileRecordWriterContainer.java:53)
at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:558)
at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:105)
at org.apache.hadoop.mapreduce.Reducer.reduce(Reducer.java:150)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
I know the problem is some silly mismatch between the reducer key/value types and job.setOutputKeyClass / job.setOutputValueClass, but I cannot find it. Please help me; here is my code:
public class DumpProductViewsAggHive extends Configured implements Tool {
public static enum LOCAL_COUNTER {
IGNORED, VALID, INVALID
}
private static final String NAME = "DumpProductViewsAggHive"; //Change the name of the job here
private static final String SEPARATOR = "/t"; //Change the separator here
private String dateFrom; //Start date - HBase MR applicable
private String dateTo; //Ending date - HBase MR applicable
private String fileOutput; //output file path
private String table = "we_json"; //default HBase table
private int caching = 500; //default HBase caching
/**
* Map phase HBase
*/
public static class MapHBase extends TableMapper<Text, Text> {
private Text key_out = new Text();
private Text value_out = new Text();
private JSONParser parser = new JSONParser();
private DateFormat formatter = new SimpleDateFormat("yyyyMMdd");
private String day;
private Date date = new Date();
private Double ts = new Double(0);
public void map(ImmutableBytesWritable row, Result value,
Context context) throws IOException, InterruptedException {
String b = new String(value.getValue(Bytes.toBytes("d"),
Bytes.toBytes("j")));
JSONObject obj;
try {
obj = (JSONObject) parser.parse(b);
if (obj.get("e").equals("pview_bcn")) {
ts = Double.parseDouble(obj.get("ts").toString());
ts = ts * 1000;
date.setTime(Math.round(ts));
day = formatter.format(date);
key_out.set(obj.get("sid").toString());
value_out.set(obj.get("variant_id") + SEPARATOR + obj.get("shop")
+ SEPARATOR + obj.get("status") + SEPARATOR + day
+ SEPARATOR + "D");
context.getCounter(LOCAL_COUNTER.VALID).increment(1);
context.write(key_out, value_out);
} else {
context.getCounter(LOCAL_COUNTER.IGNORED).increment(1);
}
} catch (Exception pe) {
// ignore value
context.getCounter(LOCAL_COUNTER.INVALID).increment(1);
return;
}
}
}
/**
* Reduce phase
*/
public static class Reduce extends Reducer<Text, Text, NullWritable, HCatRecord>{
public void reduce (Iterable<Text> key, Text value, Context context)
throws IOException, InterruptedException{
Set<Text> sidSet = new HashSet<Text>();
while (key.iterator().hasNext()) {
sidSet.add(key.iterator().next());
}
String[] tokens = value.toString().split( SEPARATOR );
HCatRecord record = new DefaultHCatRecord(6);
record.set(0, tokens[0].toString());
record.set(1, tokens[1].toString());
record.set(2, tokens[2].toString());
record.set(3, tokens[3].toString());
record.set(4, tokens[4].toString());
record.set(5, sidSet.size());
context.write(NullWritable.get(), record);
}
}
public void getParams(String[] otherArgs) throws ParseException {
DateFormat formatter = new SimpleDateFormat("yyyyMMdd");
Calendar cal = Calendar.getInstance();
int i = 0;
/*
* Loop parameters
*/
while (i<otherArgs.length) {
// get parameter -d query only one day. HBase applicable.
if (otherArgs[i].equals("-d")) {
cal.setTime(formatter.parse(otherArgs[++i]));
dateFrom = Long.toHexString(cal.getTimeInMillis()/1000);
cal.add(Calendar.DATE, 1);
dateTo = Long.toHexString(cal.getTimeInMillis()/1000);
System.out.println("Day translated to start: " + dateFrom + "; End: " + dateTo);
}
// get start date -f parameter. HBase applicable.
if (otherArgs[i].equals("-f")) {
cal.setTime(formatter.parse(otherArgs[++i]));
dateFrom = Long.toHexString(cal.getTimeInMillis() / 1000);
System.out.println("From: " + dateFrom);
}
// get end date -t parameter. HBase applicable.
if (otherArgs[i].equals("-t")) {
cal.setTime(formatter.parse(otherArgs[++i]));
dateTo = Long.toHexString(cal.getTimeInMillis() / 1000);
System.out.println("To: " + dateTo);
}
// get output folder -o parameter.
if (otherArgs[i].equals("-o")) {
fileOutput = otherArgs[++i];
System.out.println("Output: " + fileOutput);
}
// get caching -c parameter. HBase applicable.
if (otherArgs[i].equals("-c")) {
caching = Integer.parseInt(otherArgs[++i]);
System.out.println("Caching: " + caching);
}
// get table name -tab parameter. HBase applicable.
if (otherArgs[i].equals("-tab")) {
table = otherArgs[++i];
System.out.println("Table: " + table);
}
i++;
}
}
/**
 * @param fileInput
 * @param dateFrom
 * @param dateTo
 * @param job
 * @param caching
 * @param table
 * @throws IOException
 */
public void getInput(String fileInput, String dateFrom, String dateTo, Job job, int caching, String table) throws IOException {
// If the source is from Hbase
if (fileInput == null) {
/**
* HBase source
*/
// If date is not defined
if (dateFrom == null || dateTo == null) {
System.err.println("Start date or End Date is not defined.");
return;
}
System.out.println("HBase table used as a source.");
Scan scan = new Scan(Bytes.toBytes(dateFrom), Bytes.toBytes(dateTo));
scan.setCaching(caching); // set Caching, when the table is small it is better to use bigger number. Default scan is 1
scan.setCacheBlocks(false); // do not set true for MR jobs
scan.addColumn(Bytes.toBytes("d"), Bytes.toBytes("j"));
TableMapReduceUtil.initTableMapperJob(
table, //name of table
scan, //instance of scan
MapHBase.class, //mapper class
Text.class, //mapper output key
Text.class, //mapper output value
job);
}
}
/**
* Tool implementation
*/
@SuppressWarnings("deprecation")
@Override
public int run(String[] args) throws Exception {
// Create configuration
Configuration conf = this.getConf();
String databaseName = null;
String tableName = "test";
// Parse arguments
String[] otherArgs = new GenericOptionsParser(conf,args).getRemainingArgs();
getParams(otherArgs);
// It is better to specify zookeeper quorum in CLI parameter -D hbase.zookeeper.quorum=zookeeper servers
conf.set( "hbase.zookeeper.quorum",
"cz-dc1-s-132.mall.local,cz-dc1-s-133.mall.local,"
+ "cz-dc1-s-134.mall.local,cz-dc1-s-135.mall.local,"
+ "cz-dc1-s-136.mall.local");
// Create job
Job job = Job.getInstance(conf, NAME);
job.setJarByClass(DumpProductViewsAggHive.class);
// Setup MapReduce job
job.setReducerClass(Reducer.class);
//job.setNumReduceTasks(0); // If reducer is not needed
// Specify key / value
job.setOutputKeyClass(NullWritable.class);
job.setOutputValueClass(DefaultHCatRecord.class);
// Input
getInput(null, dateFrom, dateTo, job, caching, table);
// Output
// Ignore the key for the reducer output; emitting an HCatalog record as value
job.setOutputFormatClass(HCatOutputFormat.class);
HCatOutputFormat.setOutput(job, OutputJobInfo.create(databaseName, tableName, null));
HCatSchema s = HCatOutputFormat.getTableSchema(job);
System.err.println("INFO: output schema explicitly set for writing:" + s);
HCatOutputFormat.setSchema(job, s);
// Execute job and return status
return job.waitForCompletion(true) ? 0 : 1;
}
/**
 * Main
 * @param args
 * @throws Exception
 */
public static void main(String[] args) throws Exception {
int res = ToolRunner.run(new Configuration(), new DumpProductViewsAggHive(), args);
System.exit(res);
}
}
Similarly to the question I answered a few minutes ago, you are defining the reducer incorrectly. The signature should be:
@Override
public void reduce(Text key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException
Please use the @Override annotation so the compiler can spot this error for you.
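For reference, a hedged sketch of how the corrected reducer from the question might be shaped; the aggregation details are left as in the question, and only the parameter order and the loop over the values change:
public static class Reduce extends Reducer<Text, Text, NullWritable, HCatRecord> {
    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        for (Text value : values) {
            String[] tokens = value.toString().split(SEPARATOR);
            HCatRecord record = new DefaultHCatRecord(6);
            record.set(0, tokens[0]);
            // ... set the remaining fields as in the question ...
            context.write(NullWritable.get(), record);
        }
    }
}
Note also that run() currently registers the base class via job.setReducerClass(Reducer.class); it has to point at this Reduce class, otherwise the mapper's Text values are passed straight through to HCatOutputFormat, which is exactly the ClassCastException in the stack trace.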

Error when calculating average temperature on HBase

I want to calculate the average temperature from the table test (info:date, info:temp) in HBase and put the result into the table result (info:date, info:avg).
However, when I run the program it gives me an error.
The code is:
public static class mapper1 extends TableMapper<Text,FloatWritable>
{
public static final byte[] Info = "info".getBytes();
public static final byte[] Date = "date".getBytes();
public static final byte[] Temp = "temp".getBytes();
private static Text key=new Text();
public void map(ImmutableBytesWritable row,Result value,Context context)
throws IOException
{
String k1 = new String(value.getValue(Info, Date));
key.set(k1);
byte[] val=value.getValue(Info,Temp);
try
{
context.write(key,new
FloatWritable(Float.parseFloat(Bytes.toString(val))));
}
catch(InterruptedException e)
{
throw new IOException(e);
}
}}
//********************************************************************
public static class reducer1 extends TableReducer<Text,Result,Text>
{
public static final byte[] info = "info".getBytes();
public static final byte[] date = "date".getBytes();
byte[] avg ;
public void reduce(Text key,Iterable<FloatWritable>values, Context context)
throws IOException, InterruptedException
{
float sum=0;
int count=0;
float average=0;
for(FloatWritable val:values)
{
sum+=val.get();
count++;
}
average=(sum/count);
Put put = new Put(Bytes.toBytes(key.toString()));
put.add(info, date, Bytes.toBytes(average));
System.out.println("For\t"+count+"\t average is:"+average);
context.write(key,put);
}
}
//*********************************************************************
public static void main(String args[]) throws
IOException,ClassNotFoundException, InterruptedException, NullPointerException
{
Configuration config=HBaseConfiguration.create();
config.set("hbase.zookeeper.quorum", "localhost");
HTable table1 = new HTable(config, "test");
HTable table2 = new HTable(config, "result");
Job job=new Job(config,"AVG");
Scan scan=new Scan();
scan.addFamily("info".getBytes());
scan.setFilter(new FirstKeyOnlyFilter());
TableMapReduceUtil.initTableMapperJob(
"test",
scan,
mapper1.class,
Text.class,
FloatWritable.class,
job);
TableMapReduceUtil.initTableReducerJob(
"result",
reducer1.class,
job);
job.setNumReduceTasks(1);
boolean b = job.waitForCompletion(true);
if (!b) {
throw new IOException("error with job!");
}
}
}
The error message is:
Exception in thread "main" java.lang.NullPointerException
at org.apache.hadoop.net.DNS.reverseDns(DNS.java:92)
at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.reverseDNS(TableInputFormatBase.java:223)
at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:189)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:452)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:469)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:366)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1218)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1215)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1367)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1215)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1236)
at TempVar.AVG.main(AVG.java:126)
Can you help me?
It seems the host name returned by ZooKeeper (where HBase should be reachable) cannot be resolved.
Either configure DNS correctly or, if you don't use DNS, add the mapping from host names to IP addresses in the /etc/hosts file.
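Purely as an illustration (the address and host name below are placeholders, not values from the question), an /etc/hosts entry on the machine submitting the job could look like:
192.168.1.10   hbase-regionserver-1.example.com   hbase-regionserver-1
The name has to match what ZooKeeper reports for the region server, so that both forward and reverse lookups succeed.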

Hadoop - Writing to HBase directly from the Mapper

I have a Hadoop job whose output should be written to HBase. I don't really need a reducer; the kind of row I would like to insert is determined in the mapper.
How can I use TableOutputFormat to achieve this? In all the examples I have seen, the assumption is that the reducer is the one creating the Put, and that TableMapper is just for reading from an HBase table.
In my case the input is HDFS and the output is a Put to a specific table; I cannot find anything in TableMapReduceUtil that can help me with that either.
Is there any example out there that can help me with that?
BTW, I am using the new Hadoop API
This is an example of reading from a file and putting all the lines into HBase. It comes from "HBase: The Definitive Guide", and you can find it in the book's repository. To get it, just clone the repo:
git clone git://github.com/larsgeorge/hbase-book.git
The book also explains the code in detail, but if anything is unclear, feel free to ask.
public class ImportFromFile {
public static final String NAME = "ImportFromFile";
public enum Counters { LINES }
static class ImportMapper
extends Mapper<LongWritable, Text, ImmutableBytesWritable, Writable> {
private byte[] family = null;
private byte[] qualifier = null;
@Override
protected void setup(Context context)
throws IOException, InterruptedException {
String column = context.getConfiguration().get("conf.column");
byte[][] colkey = KeyValue.parseColumn(Bytes.toBytes(column));
family = colkey[0];
if (colkey.length > 1) {
qualifier = colkey[1];
}
}
@Override
public void map(LongWritable offset, Text line, Context context)
throws IOException {
try {
String lineString = line.toString();
byte[] rowkey = DigestUtils.md5(lineString);
Put put = new Put(rowkey);
put.add(family, qualifier, Bytes.toBytes(lineString));
context.write(new ImmutableBytesWritable(rowkey), put);
context.getCounter(Counters.LINES).increment(1);
} catch (Exception e) {
e.printStackTrace();
}
}
}
private static CommandLine parseArgs(String[] args) throws ParseException {
Options options = new Options();
Option o = new Option("t", "table", true,
"table to import into (must exist)");
o.setArgName("table-name");
o.setRequired(true);
options.addOption(o);
o = new Option("c", "column", true,
"column to store row data into (must exist)");
o.setArgName("family:qualifier");
o.setRequired(true);
options.addOption(o);
o = new Option("i", "input", true,
"the directory or file to read from");
o.setArgName("path-in-HDFS");
o.setRequired(true);
options.addOption(o);
options.addOption("d", "debug", false, "switch on DEBUG log level");
CommandLineParser parser = new PosixParser();
CommandLine cmd = null;
try {
cmd = parser.parse(options, args);
} catch (Exception e) {
System.err.println("ERROR: " + e.getMessage() + "\n");
HelpFormatter formatter = new HelpFormatter();
formatter.printHelp(NAME + " ", options, true);
System.exit(-1);
}
return cmd;
}
public static void main(String[] args) throws Exception {
Configuration conf = HBaseConfiguration.create();
String[] otherArgs =
new GenericOptionsParser(conf, args).getRemainingArgs();
CommandLine cmd = parseArgs(otherArgs);
String table = cmd.getOptionValue("t");
String input = cmd.getOptionValue("i");
String column = cmd.getOptionValue("c");
conf.set("conf.column", column);
Job job = new Job(conf, "Import from file " + input + " into table " + table);
job.setJarByClass(ImportFromFile.class);
job.setMapperClass(ImportMapper.class);
job.setOutputFormatClass(TableOutputFormat.class);
job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, table);
job.setOutputKeyClass(ImmutableBytesWritable.class);
job.setOutputValueClass(Writable.class);
job.setNumReduceTasks(0);
FileInputFormat.addInputPath(job, new Path(input));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
You just need to make the mapper output the (ImmutableBytesWritable, Put) pair. The OutputFormat only specifies how to persist the output key-values; it does not require that the key-values come from a reducer.
You would need to do something like this in the mapper:
... extends TableMapper<ImmutableBytesWritable, Put> {
...
...
context.write(<some key>, <some Put or Delete object>);
}
