I have set up a Storm spout consuming from Kafka and a bolt writing to HDFS; this all works fine. I now want to add a new bolt that writes to HBase. For some reason, my application is not picking up the HBase configuration and I get the following error:
java.lang.IllegalArgumentException: HBase configuration not found using key 'null'
at org.apache.storm.hbase.bolt.AbstractHBaseBolt.prepare(AbstractHBaseBolt.java:58) ~[storm-hbase-0.9.3.2.2.0.0-2041.jar:0.9.3.2.2.0.0-2041]
at backtype.storm.daemon.executor$fn__5697$fn__5710.invoke(executor.clj:732) ~[storm-core-0.9.3.2.2.0.0-2041.jar:0.9.3.2.2.0.0-2041]
at backtype.storm.util$async_loop$fn__452.invoke(util.clj:463) ~[storm-core-0.9.3.2.2.0.0-2041.jar:0.9.3.2.2.0.0-2041]
at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
2015-04-16 02:05:44 b.s.d.executor [ERROR]
java.lang.IllegalArgumentException: HBase configuration not found using key 'null'
at org.apache.storm.hbase.bolt.AbstractHBaseBolt.prepare(AbstractHBaseBolt.java:58) ~[storm-hbase-0.9.3.2.2.0.0-2041.jar:0.9.3.2.2.0.0-2041]
at backtype.storm.daemon.executor$fn__5697$fn__5710.invoke(executor.clj:732) ~[storm-core-0.9.3.2.2.0.0-2041.jar:0.9.3.2.2.0.0-2041]
at backtype.storm.util$async_loop$fn__452.invoke(util.clj:463) ~[storm-core-0.9.3.2.2.0.0-2041.jar:0.9.3.2.2.0.0-2041]
at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
2015-04-16 02:05:44 o.a.h.u.NativeCodeLoader [WARN] Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2015-04-16 02:05:44 b.s.util [ERROR] Halting process: ("Worker died")
java.lang.RuntimeException: ("Worker died")
at backtype.storm.util$exit_process_BANG_.doInvoke(util.clj:322) [storm-core-0.9.3.2.2.0.0-2041.jar:0.9.3.2.2.0.0-2041]
at clojure.lang.RestFn.invoke(RestFn.java:423) [clojure-1.5.1.jar:na]
at backtype.storm.daemon.worker$fn__6109$fn__6110.invoke(worker.clj:495) [storm-core-0.9.3.2.2.0.0-2041.jar:0.9.3.2.2.0.0-2041]
at backtype.storm.daemon.executor$mk_executor_data$fn__5530$fn__5531.invoke(executor.clj:245) [storm-core-0.9.3.2.2.0.0-2041.jar:0.9.3.2.2.0.0-2041]
at backtype.storm.util$async_loop$fn__452.invoke(util.clj:475) [storm-core-0.9.3.2.2.0.0-2041.jar:0.9.3.2.2.0.0-2041]
at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
When I look at the code, the error is occurring in the following block:
public void prepare(Map map, TopologyContext topologyContext, OutputCollector collector) {
    this.collector = collector;
    final Configuration hbConfig = HBaseConfiguration.create();

    Map<String, Object> conf = (Map<String, Object>) map.get(this.configKey);
    if (conf == null) {
        throw new IllegalArgumentException("HBase configuration not found using key '" + this.configKey + "'");
    }
It looks like configKey isn't getting set anywhere, so I tried to set it on the HBaseBolt as below:
SimpleHBaseMapper mapper = new SimpleHBaseMapper()
.withRowKeyField("CustomerId")
.withColumnFields(new Fields("CustomerId"))
.withCounterFields(new Fields("Count"))
.withColumnFamily("cf1");
HBaseBolt hbase = new HBaseBolt("Customer", mapper).withConfigKey("/etc/hbase/conf/hbase-site.xml");
builder.setBolt("HBASE_BOLT", hbase, 1)
.fieldsGrouping("stormspout", new Fields("CustomerId"));
That didn't seem to do anything though, as I am still getting the same error.
Anyone have any suggestions? It feels like it's just not picking up my hbase-site.xml file, but I'm not sure why not.
After lots of work, I finally got this to work!!
In your topology createConfig method, add
Map<String, String> HBConfig = Maps.newHashMap();
HBConfig.put("hbase.rootdir","hdfs://<IP Address>:8020/hbase");
conf.put("HBCONFIG",HBConfig);
When you instantiate HBaseBolt, do so using
.withConfigKey("HBCONFIG")
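Putting the two pieces together, a minimal sketch of the topology wiring might look like the following. The topology name is hypothetical, the ZooKeeper quorum and znode parent are placeholders mirroring settings used later in this thread, and the spout/HDFS bolt setup is elided as in the question.
import java.util.HashMap;
import java.util.Map;

import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.tuple.Fields;
import org.apache.storm.hbase.bolt.HBaseBolt;
import org.apache.storm.hbase.bolt.mapper.SimpleHBaseMapper;

TopologyBuilder builder = new TopologyBuilder();
// ... register the Kafka spout ("stormspout") and the HDFS bolt here, as before ...

SimpleHBaseMapper mapper = new SimpleHBaseMapper()
        .withRowKeyField("CustomerId")
        .withColumnFields(new Fields("CustomerId"))
        .withCounterFields(new Fields("Count"))
        .withColumnFamily("cf1");

// withConfigKey names a key in the *topology* config whose value is a Map, not a file path.
HBaseBolt hbase = new HBaseBolt("Customer", mapper).withConfigKey("HBCONFIG");
builder.setBolt("HBASE_BOLT", hbase, 1)
        .fieldsGrouping("stormspout", new Fields("CustomerId"));

Config conf = new Config();
Map<String, Object> hbConfig = new HashMap<String, Object>();
hbConfig.put("hbase.rootdir", "hdfs://<IP Address>:8020/hbase");   // placeholder
hbConfig.put("hbase.zookeeper.quorum", "sandbox.hortonworks.com"); // placeholder
hbConfig.put("zookeeper.znode.parent", "/hbase-unsecure");         // HDP default
conf.put("HBCONFIG", hbConfig);

StormSubmitter.submitTopology("customer-topology", conf, builder.createTopology());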
I actually ended up writing my own HBase bolt that implements IRichBolt. I then overrode the prepare method and built the following configuration, which solved my problem :-)
Configuration conf = HBaseConfiguration.create();
conf.set("hbase.zookeeper.quorum", "sandbox.hortonworks.com");
conf.set("hbase.zookeeper.property.clientPort", "2181");
conf.set("zookeeper.session.timeout", "1200000");
conf.set("hbase.zookeeper.property.tickTime", "6000");
conf.set("zookeeper.znode.parent", "/hbase-unsecure");
conf.set("hbase.coprocessor.region.classes", "com.xasecure.authorization.hbase.XaSecureAuthorizationCoprocessor");
conf.set("hbase.coprocessor.master.classes", "com.xasecure.authorization.hbase.XaSecureAuthorizationCoprocessor");
admin = new HBaseAdmin(conf);
table = new HTable(conf, _tableName);
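For reference, a skeleton of what such a bolt might look like is below. This is only a sketch: the class name, constructor, tuple-to-Put mapping, and error handling are hypothetical, and it extends BaseRichBolt (which implements IRichBolt); only the HBase connection settings come from the configuration above.
import java.io.IOException;
import java.util.Map;

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Tuple;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class CustomHBaseBolt extends BaseRichBolt {

    private final String tableName;              // hypothetical constructor argument
    private transient OutputCollector collector;
    private transient HTable table;

    public CustomHBaseBolt(String tableName) {
        this.tableName = tableName;
    }

    @Override
    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        try {
            // Build the HBase configuration inside the worker instead of relying on configKey.
            Configuration conf = HBaseConfiguration.create();
            conf.set("hbase.zookeeper.quorum", "sandbox.hortonworks.com");
            conf.set("hbase.zookeeper.property.clientPort", "2181");
            conf.set("zookeeper.znode.parent", "/hbase-unsecure");
            table = new HTable(conf, tableName);
        } catch (IOException e) {
            throw new RuntimeException("Could not connect to HBase", e);
        }
    }

    @Override
    public void execute(Tuple tuple) {
        try {
            // Hypothetical mapping: row key and one column taken from the tuple.
            String customerId = tuple.getStringByField("CustomerId");
            Put put = new Put(Bytes.toBytes(customerId));
            put.add(Bytes.toBytes("cf1"), Bytes.toBytes("CustomerId"), Bytes.toBytes(customerId));
            table.put(put);
            collector.ack(tuple);
        } catch (IOException e) {
            collector.fail(tuple);
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // Terminal bolt: nothing to declare.
    }
}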
You should set the name of the configuration section for your HBase settings in your topology configuration:
Config cfg = new Config();
Map<String, String> HBConfig = Maps.newHashMap();
HBConfig.put(somekey,somevalue);
cfg.put("HBCONFIG",HBConfig);
StormSubmitter.submitTopology(TOPOLOGY_NAME, cfg, builder.createTopology());
Then in your HBase bolt configuration set this key as a bolt configuration key:
HBaseBolt bolt = new HBaseBolt("table_name", mapper).withConfigKey("HBCONFIG");
Related
I'm trying to simplify my consumer as much as possible. The problem is, when looking at the records coming into my Kafka listener:
List<GenericRecord> incomingRecords
the values are just string values. I've tried setting the specific Avro reader to both true and false. I've set the value deserializer as well. Am I missing something? This worked fine when I used a Java configuration class, but I want to keep everything consolidated in this application.properties file.
application.properties
spring.kafka.properties.security.protocol=SASL_SSL
spring.kafka.properties.sasl.mechanism=SCRAM-SHA-256
spring.kafka.properties.sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="${SASL_ACCESS_KEY}" password="${SASL_SECRET}";
spring.kafka.consumer.auto-offset-reset=earliest
#### Consumer Properties Configuration
spring.kafka.properties.key.deserializer=org.apache.kafka.common.serialization.StringDeserializer
spring.kafka.properties.value.deserializer=io.confluent.kafka.serializers.KafkaAvroDeserializer
spring.kafka.properties.value.subject.name.strategy=io.confluent.kafka.serializers.subject.TopicRecordNameStrategy
spring.kafka.bootstrap-servers=
spring.kafka.properties.schema.registry.url=
spring.kafka.properties.specific.avro.reader=true
spring.kafka.consumer.properties.spring.json.trusted.packages=*
logging.level.org.apache.kafka=TRACE
logging.level.io.confluent.kafka.schemaregistry=TRACE
consumer
@KafkaListener(topics = "${topic}", groupId = "${group}")
public void processMessageBatch(List<GenericRecord> incomingRecords,
                                @Header(KafkaHeaders.RECEIVED_PARTITION_ID) List<Integer> partitions,
                                @Header(KafkaHeaders.RECEIVED_TOPIC) List<String> topics,
                                @Header(KafkaHeaders.OFFSET) List<Long> offsets) {
    currentMicroBatch = Stream.of(currentMicroBatch, incomingRecords).flatMap(List::stream).collect(Collectors.toList());
    if (currentMicroBatch.size() >= maxRecords || validatedElapsedDuration(durationMonitor)) {
        System.out.println("ETL processing logic will be done here");
    }
    clearBatch();
}
I notice when I use:
spring.kafka.consumer.value-deserializer=io.confluent.kafka.serializers.KafkaAvroDeserializer
spring.kafka.consumer.key-deserializer=org.apache.kafka.common.serialization.StringDeserializer
I get the following error:
2020-12-02 17:04:42.745 DEBUG 51910 --- [ntainer#0-0-C-1] i.c.k.s.client.rest.RestService : Sending GET with input null to https://myschemaregistry.com
2020-12-02 17:04:42.852 ERROR 51910 --- [ntainer#0-0-C-1] o.s.kafka.listener.LoggingErrorHandler : Error while processing: null
org.apache.kafka.common.errors.SerializationException: Error deserializing key/value for partition my-topic-avro-32 at offset 7836. If needed, please seek past the record to continue consumption.
java.lang.IllegalArgumentException: argument "src" is null
at com.fasterxml.jackson.databind.ObjectMapper._assertNotNull(ObjectMapper.java:4735)
at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3502)
at io.confluent.kafka.schemaregistry.client.rest.RestService.sendHttpRequest(RestService.java:270)
at io.confluent.kafka.schemaregistry.client.rest.RestService.httpRequest(RestService.java:334)
at io.confluent.kafka.schemaregistry.client.rest.RestService.getId(RestService.java:573)
at io.confluent.kafka.schemaregistry.client.rest.RestService.getId(RestService.java:557)
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getSchemaByIdFromRegistry(CachedSchemaRegistryClient.java:149)
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getBySubjectAndId(CachedSchemaRegistryClient.java:230)
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getById(CachedSchemaRegistryClient.java:209)
at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer$DeserializationContext.schemaFromRegistry(AbstractKafkaAvroDeserializer.java:241)
at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer.deserialize(AbstractKafkaAvroDeserializer.java:102)
at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer.deserialize(AbstractKafkaAvroDeserializer.java:81)
at io.confluent.kafka.serializers.KafkaAvroDeserializer.deserialize(KafkaAvroDeserializer.java:55)
at org.apache.kafka.common.serialization.Deserializer.deserialize(Deserializer.java:60)
at org.apache.kafka.clients.consumer.internals.Fetcher.parseRecord(Fetcher.java:1268)
at org.apache.kafka.clients.consumer.internals.Fetcher.access$3600(Fetcher.java:124)
at org.apache.kafka.clients.consumer.internals.Fetcher$PartitionRecords.fetchRecords(Fetcher.java:1492)
at org.apache.kafka.clients.consumer.internals.Fetcher$PartitionRecords.access$1600(Fetcher.java:1332)
at org.apache.kafka.clients.consumer.internals.Fetcher.fetchRecords(Fetcher.java:645)
at org.apache.kafka.clients.consumer.internals.Fetcher.fetchedRecords(Fetcher.java:606)
at org.apache.kafka.clients.consumer.KafkaConsumer.pollForFetches(KafkaConsumer.java:1263)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1225)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1201)
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.doPoll(KafkaMessageListenerContainer.java:1062)
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.pollAndInvoke(KafkaMessageListenerContainer.java:1018)
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.run(KafkaMessageListenerContainer.java:949)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.lang.Thread.run(Thread.java:834)
I found the issue. Debugging deep into the Confluent REST client, I was hit with a 401 (terrible logs, btw).
I needed to add this:
spring.kafka.properties.basic.auth.credentials.source=SASL_INHERIT
since I'm using SASL auth and needed the schema registry client to inherit the SASL config I added above. Fun stuff.
Had the same issue.
For me, I just needed to change the protocol of the schema registry URL from http:// to https:// and it worked.
@Ryan's answer gave me a clue.
I want to delete a configuration override (reset it to the default) for a topic where it was set before. This is possible with the provided script:
$> ./kafka-configs.sh --zookeeper localhost:2181 --alter --entity-type topics \
--entity-name test --delete-config my.overridden.config
Is there a way to do this with the KafkaAdminClient provided in kafka-clients-1.1.1.jar?
I just found the method org.apache.kafka.clients.admin.KafkaAdminClient.alterConfigs(Map<ConfigResource, Config>, AlterConfigsOptions), but when I call it with a configuration value set to null, I get a NullPointerException on the server:
[2018-07-31 11:24:01,658] ERROR [Admin Manager on Broker 0]: Error processing alter configs request for resource Resource(type=TOPIC, name='test'}, config org.apache.kafka.common.requests.AlterConfigsRequest$Config#5d4fef59 (kafka.server.AdminManager)
java.lang.NullPointerException
at java.util.Hashtable.put(Hashtable.java:459)
at java.util.Properties.setProperty(Properties.java:166)
at kafka.server.AdminManager$$anonfun$alterConfigs$1$$anonfun$apply$18.apply(AdminManager.scala:357)
at kafka.server.AdminManager$$anonfun$alterConfigs$1$$anonfun$apply$18.apply(AdminManager.scala:356)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at kafka.server.AdminManager$$anonfun$alterConfigs$1.apply(AdminManager.scala:356)
at kafka.server.AdminManager$$anonfun$alterConfigs$1.apply(AdminManager.scala:339)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at kafka.server.AdminManager.alterConfigs(AdminManager.scala:339)
at kafka.server.KafkaApis.handleAlterConfigsRequest(KafkaApis.scala:1987)
at kafka.server.KafkaApis.handle(KafkaApis.scala:136)
at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:69)
at java.lang.Thread.run(Thread.java:745)
An empty list won't work either.
I am using Kafka in version 2.11-1.1.0.
Much of the functionality provided in the Kafka admin jars is exposed as an API rather than as a directly scripted function. You can change a topic's configuration using the ZooKeeper-based AdminUtils in a Java program, as below. Send an empty Properties object to the function to clear out the existing overrides.
import java.util.Properties;

import org.I0Itec.zkclient.ZkClient;
import org.I0Itec.zkclient.ZkConnection;
import kafka.admin.AdminUtils;
import kafka.utils.ZKStringSerializer$;
import kafka.utils.ZkUtils;

public static void changeConfig(String topic) {
    ZkClient zkClient = new ZkClient("your_zkHost", 5000, 5000, ZKStringSerializer$.MODULE$);
    ZkUtils zkUtils = new ZkUtils(zkClient, new ZkConnection("your_zkHost"), false);
    try {
        // changeTopicConfig replaces the topic's override set; an empty Properties object clears all overrides.
        Properties prop = new Properties();
        prop.setProperty("retention.ms", "3600000");
        AdminUtils.changeTopicConfig(zkUtils, topic, prop);
    } finally {
        zkClient.close();
    }
}
If you need this function often, you can include a file reader to get new configurations and package into a jar for easy execution.
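Alternatively, staying with the KafkaAdminClient mentioned in the question, one possible workaround (a rough, untested sketch assuming kafka-clients 1.1) relies on alterConfigs replacing a topic's entire override set rather than patching it: describe the current configs, keep every non-default entry except the key you want to reset, and re-apply that set. Note that isDefault() can also flag broker-level settings; on newer clients, checking ConfigEntry.source() would be more precise.
import java.util.Collections;
import java.util.List;
import java.util.Properties;
import java.util.stream.Collectors;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public static void resetTopicConfig(String bootstrapServers, String topic, String keyToReset) throws Exception {
    Properties props = new Properties();
    props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);

    try (AdminClient admin = AdminClient.create(props)) {
        ConfigResource resource = new ConfigResource(ConfigResource.Type.TOPIC, topic);

        // Read the topic's current effective configuration.
        Config current = admin.describeConfigs(Collections.singleton(resource)).all().get().get(resource);

        // Keep only explicit (non-default) entries, minus the key we want to reset.
        List<ConfigEntry> remaining = current.entries().stream()
                .filter(e -> !e.isDefault())
                .filter(e -> !e.name().equals(keyToReset))
                .collect(Collectors.toList());

        // alterConfigs replaces the whole override set, so omitting the key resets it to the default.
        admin.alterConfigs(Collections.singletonMap(resource, new Config(remaining))).all().get();
    }
}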
I am trying to use the HBase Java APIs to write data into HBase. I installed Hadoop/HBase through Ambari.
Here is how the configuration is currently set up:
final Configuration CONFIGURATION = HBaseConfiguration.create();
final HBaseAdmin HBASE_ADMIN;
HBASE_ADMIN = new HBaseAdmin(CONFIGURATION);
When I try to write to HBase, I check to make sure that the table exists
!HBASE_ADMIN.tableExists(tableName)
If not, I create a new one. However, an exception is thrown when attempting to check whether the table exists.
I'm wondering if I'm not correctly connected to HBase... is there any good way to verify that the configuration is correct and that I am connecting to HBase? The exception I'm getting is below.
Thanks.
java.lang.RuntimeException: java.lang.NullPointerException
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:209)
at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:288)
at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:268)
at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:140)
at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:135)
at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:597)
at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:802)
at org.apache.hadoop.hbase.catalog.MetaReader.tableExists(MetaReader.java:359)
at org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:287)
at org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:301)
at com.business.project.hbase.HBaseMessageWriter.getTable(HBaseMessageWriter.java:40)
at com.business.project.hbase.HBaseMessageWriter.write(HBaseMessageWriter.java:59)
at com.business.project.hbase.HBaseMessageWriter.write(HBaseMessageWriter.java:54)
at com.business.project.storm.bolt.package.exampleBolt.execute(exampleBolt.java:19)
at backtype.storm.daemon.executor$fn__5697$tuple_action_fn__5699.invoke(executor.clj:659)
at backtype.storm.daemon.executor$mk_task_receiver$fn__5620.invoke(executor.clj:415)
at backtype.storm.disruptor$clojure_handler$reify__1741.onEvent(disruptor.clj:58)
at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125)
at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99)
at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80)
at backtype.storm.daemon.executor$fn__5697$fn__5710$fn__5761.invoke(executor.clj:794)
at backtype.storm.util$async_loop$fn__452.invoke(util.clj:465)
at clojure.lang.AFn.run(AFn.java:24)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.getMetaReplicaNodes(ZooKeeperWatcher.java:269)
at org.apache.hadoop.hbase.zookeeper.MetaRegionTracker.blockUntilAvailable(MetaRegionTracker.java:241)
at org.apache.hadoop.hbase.client.ZooKeeperRegistry.getMetaRegionLocation(ZooKeeperRegistry.java:62)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateMeta(ConnectionManager.java:1203)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1164)
at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:294)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:130)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:55)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:201)
In addition to the configuration parameters suggested by Yosr, specifying
conf.set("zookeeper.znode.parent", "VALUE")
would help resolve the issue.
The property below resolved my issue
For Hortonworks:
hconfig.set("zookeeper.znode.parent", "/hbase-unsecure")
For cloudera:
hconfig.set("zookeeper.znode.parent", "/hbase")
You can use HBaseAdmin.checkHBaseAvailable(conf);
Configuration conf = HBaseConfiguration.create();
conf.set("hbase.master", "ip_address:60000");
conf.set("hbase.zookeeper.quorum","ip_address");
conf.set("hbase.zookeeper.property.clientPort", "2181");
HBaseAdmin admin = new HBaseAdmin(conf);
boolean bool = admin.tableExists("table_name");
System.out.println(bool);
ip_address: this is the IP address of your HBase cluster; change the ZooKeeper port (2181) if it is not the same in your configuration files.
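If a client copy of the cluster's hbase-site.xml is available, another way to make sure the configuration is correct is to load that file directly instead of hard-coding each property. A small sketch (the file path is an assumption; point it at wherever the file lives on your client machine):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

Configuration conf = HBaseConfiguration.create();
// Assumed location of the client config; adjust as needed.
conf.addResource(new Path("/etc/hbase/conf/hbase-site.xml"));

// Throws (e.g. MasterNotRunningException) if the cluster cannot be reached.
HBaseAdmin.checkHBaseAvailable(conf);
System.out.println("HBase is reachable");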
I am trying to write a Yarn application master that submits itself into Yarn's registry (Hadoop 2.6)
In essence this is what the application master is trying to do:
ApplicationId applicationId = ...
String path = ...
YarnConfiguration conf = new YarnConfiguration();
RegistryOperations registryOperations = RegistryOperationsFactory.createInstance(conf);
ServiceRecord record = new ServiceRecord();
record.set(YarnRegistryAttributes.YARN_ID, applicationId);
record.set(YarnRegistryAttributes.YARN_PERSISTENCE, PersistencePolicies.APPLICATION_ATTEMPT);
registryOperations.bind(path, record, BindFlags.CREATE | BindFlags.OVERWRITE);
When submitting this code to Hadoop 2.6, I get the following exception:
org.apache.hadoop.service.ServiceStateException: Service RegistryOperations is in wrong state: INITED
at org.apache.hadoop.registry.client.impl.zk.CuratorService.checkServiceLive(CuratorService.java:184)
at org.apache.hadoop.registry.client.impl.zk.CuratorService.zkSet(CuratorService.java:633)
at org.apache.hadoop.registry.client.impl.zk.RegistryOperationsService.bind(RegistryOperationsService.java:114)
...
Googling the problem yields no usable results, so I tried inspecting the relevant YARN source code, so far without success.
Does anyone else have this problem? Any ideas what is causing it or how to solve it?
From reading the RegistryOperationsFactory javadoc, calling any of the create* methods will initialize the resulting RegistryOperations instance.
What I didn't know is that while RegistryOperationsFactory initializes it, it still needs to be started, so this code works:
ApplicationId applicationId = ...
String path = ...
YarnConfiguration conf = new YarnConfiguration();
RegistryOperations registryOperations = RegistryOperationsFactory.createInstance(conf);
registryOperations.start();
ServiceRecord record = new ServiceRecord();
record.set(YarnRegistryAttributes.YARN_ID, applicationId);
record.set(YarnRegistryAttributes.YARN_PERSISTENCE, PersistencePolicies.APPLICATION_ATTEMPT);
registryOperations.bind(path, record, BindFlags.CREATE | BindFlags.OVERWRITE);
I have installed Hadoop 2.3.0 on Windows and am able to execute MR jobs successfully. But when I try to execute MR jobs with normal privileges (without admin privileges), the job fails with the following exception. Here I tried with a sample Pig script.
2014-10-15 12:02:32,822 WARN [main] org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:kaveen (auth:SIMPLE) cause:java.io.IOException: Split class org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit not found
2014-10-15 12:02:32,823 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException: Split class org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit not found
at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:362)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:403)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.ClassNotFoundException: Class org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1794)
at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:360)
... 7 more
2014-10-15 12:02:32,827 INFO [main] org.apache.hadoop.mapred.Task: Runnning cleanup for the task
2014-10-15 12:02:32,827 WARN [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: Output Path is null in abortTask()
Update:
I was able to drill down into the problem and found that the exception is raised at the following point in the method "MapTask.getSplitDetails(MapTask.java:363)":
private <T> T getSplitDetails(Path file, long offset)
    throws IOException {
  FileSystem fs = file.getFileSystem(conf);
  FSDataInputStream inFile = fs.open(file);
  inFile.seek(offset);
  String className = StringInterner.weakIntern(Text.readString(inFile));
  Class<T> cls;
  try {
    cls = (Class<T>) conf.getClassByName(className);
  } catch (ClassNotFoundException ce) {
    IOException wrap = new IOException("Split class " + className +
        " not found");
    wrap.initCause(ce);
    throw wrap;
  }
But if I start the NodeManager with admin privileges, the above exception does not occur. I don't know why the MR job is not working when I start the NodeManager with normal privileges.
If anyone knows the reason for and solution to the above problem, please guide me as soon as possible.
You can change the location of Hadoop's tmp directory using the property below (set in core-site.xml):
<property>
<name>hadoop.tmp.dir</name>
<value>/other/tmp</value>
</property>
Your default tmp location is c:\tmp, which requires admin privileges to access. Change the location to a directory your user can access and try the MR job without admin privileges.
Hope it helps.