Using PigServer to run MapReduce jobs - Java

I want to use the PigServer Java class to run Pig scripts on a remote Hadoop cluster. At the moment I am using the Cloudera CDH4.4.0 distribution for testing purposes, and I'm running my Java program from within the Cloudera VM.
The directory with the Hadoop config files is included in the classpath.
When I run the same script in MapReduce mode from the shell, it works just fine. Thanks in advance for your answer.
The Java code:
import java.util.Properties;

import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

// Point the client at the remote cluster explicitly.
Properties props = new Properties();
props.setProperty("fs.default.name", "hdfs://localhost.localdomain:8020");
props.setProperty("mapred.job.tracker", "localhost.localdomain:8021");

PigServer pigServer = new PigServer(ExecType.MAPREDUCE, props);
pigServer.registerQuery("batting = load './Batting.csv' using PigStorage(',');");
pigServer.registerQuery("runs_raw = FOREACH batting GENERATE $0 as playerID, $1 as year, $8 as runs;");
pigServer.registerQuery("runs = FILTER runs_raw BY runs > 0;");
pigServer.registerQuery("grp_data = GROUP runs by (year);");
pigServer.registerQuery("max_runs = FOREACH grp_data GENERATE group as grp, MAX(runs.runs) as max_runs;");
pigServer.registerQuery("join_max_run = JOIN max_runs by ($0, max_runs), runs by (year,runs);");
pigServer.registerQuery("join_data = FOREACH join_max_run GENERATE $0 as year, $2 as playerID, $1 as runs;");
pigServer.store("join_data", "join_data");
I also added the following Maven dependency:
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.2.0</version>
</dependency>
Upon running my Java program I get the following stack trace:
log4j:WARN No appenders could be found for logger (org.apache.hadoop.conf.Configuration.deprecation).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. Unable to check name hdfs://localhost.localdomain:8020/user/cloudera
at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1608)
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1547)
at org.apache.pig.PigServer.registerQuery(PigServer.java:518)
at org.apache.pig.PigServer.registerQuery(PigServer.java:531)
at MainTestPigServer.main(MainTestPigServer.java:24)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)
Caused by: Failed to parse: Pig script failed to parse:
<line 1, column 10> pig script failed to validate: org.apache.pig.backend.datastorage.DataStorageException: ERROR 6007: Unable to check name hdfs://localhost.localdomain:8020/user/cloudera
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:191)
at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1600)
... 9 more
Caused by:
<line 1, column 10> pig script failed to validate: org.apache.pig.backend.datastorage.DataStorageException: ERROR 6007: Unable to check name hdfs://localhost.localdomain:8020/user/cloudera
at org.apache.pig.parser.LogicalPlanBuilder.buildLoadOp(LogicalPlanBuilder.java:835)
at org.apache.pig.parser.LogicalPlanGenerator.load_clause(LogicalPlanGenerator.java:3236)
at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1315)
at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:799)
at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:517)
at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:392)
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:184)
... 10 more
Caused by: org.apache.pig.backend.datastorage.DataStorageException: ERROR 6007: Unable to check name hdfs://localhost.localdomain:8020/user/cloudera
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.isContainer(HDataStorage.java:207)
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.asElement(HDataStorage.java:128)
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.asElement(HDataStorage.java:138)
at org.apache.pig.parser.QueryParserUtils.getCurrentDir(QueryParserUtils.java:91)
at org.apache.pig.parser.LogicalPlanBuilder.buildLoadOp(LogicalPlanBuilder.java:827)
... 16 more
Caused by: java.io.IOException: Failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message contained an invalid tag (zero).; Host Details : local host is: "localhost.localdomain/127.0.0.1"; destination host is: "localhost.localdomain":8020;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
at org.apache.hadoop.ipc.Client.call(Client.java:1351)
at org.apache.hadoop.ipc.Client.call(Client.java:1300)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at $Proxy9.getFileInfo(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at $Proxy9.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:651)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1679)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1106)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1397)
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.isContainer(HDataStorage.java:200)
... 20 more
Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol message contained an invalid tag (zero).
at com.google.protobuf.InvalidProtocolBufferException.invalidTag(InvalidProtocolBufferException.java:89)
at com.google.protobuf.CodedInputStream.readTag(CodedInputStream.java:108)
at org.apache.hadoop.ipc.protobuf.RpcHeaderProtos$RpcResponseHeaderProto.<init>(RpcHeaderProtos.java:1398)
at org.apache.hadoop.ipc.protobuf.RpcHeaderProtos$RpcResponseHeaderProto.<init>(RpcHeaderProtos.java:1362)
at org.apache.hadoop.ipc.protobuf.RpcHeaderProtos$RpcResponseHeaderProto$1.parsePartialFrom(RpcHeaderProtos.java:1492)
at org.apache.hadoop.ipc.protobuf.RpcHeaderProtos$RpcResponseHeaderProto$1.parsePartialFrom(RpcHeaderProtos.java:1487)
at com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:200)
at com.google.protobuf.AbstractParser.parsePartialDelimitedFrom(AbstractParser.java:241)
at com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:253)
at com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:259)
at com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:49)
at org.apache.hadoop.ipc.protobuf.RpcHeaderProtos$RpcResponseHeaderProto.parseDelimitedFrom(RpcHeaderProtos.java:2364)
at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:996)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:891)
Process finished with exit code 1
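An editorial note (not part of the original post): an InvalidProtocolBufferException with "invalid tag (zero)" on an RPC call usually means the client and the NameNode speak incompatible RPC versions, which would be consistent with a stock hadoop-client 2.2.0 artifact talking to a CDH4.4.0 cluster (which ships Hadoop 2.0.0-cdh4.4.0). One hedged way to rule out the hard-coded endpoints is to build the PigServer properties from the Hadoop configuration files already on the classpath; the sketch below is an assumption to verify against your Pig version, not the asker's code.

import java.util.Properties;

import org.apache.hadoop.conf.Configuration;
import org.apache.pig.ExecType;
import org.apache.pig.PigServer;
import org.apache.pig.backend.hadoop.datastorage.ConfigurationUtil;

public class PigServerFromClasspath {
    public static void main(String[] args) throws Exception {
        // new Configuration() picks up core-site.xml (and friends) from the
        // classpath, so the client uses the cluster's own endpoints and settings.
        Configuration conf = new Configuration();
        Properties props = ConfigurationUtil.toProperties(conf);
        PigServer pigServer = new PigServer(ExecType.MAPREDUCE, props);
        pigServer.registerQuery("batting = load './Batting.csv' using PigStorage(',');");
        pigServer.store("batting", "batting_out"); // placeholder store to trigger execution
    }
}

If the versions are indeed mismatched, aligning the Maven dependency with the cluster's Hadoop build (for CDH4, a 2.0.0-cdh4.x artifact from Cloudera's repository) is the usual fix.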

Related

pyspark container work locally but not when deployed to AWS lambda

I have created a PySpark batch pipeline and packed it into a Docker container. Everything works fine locally until I deploy it to run as an AWS Lambda function.
The error message is as follows:
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
and the traceback is:
Traceback (most recent call last):
File "/var/task/pipeline.py", line 71, in
df.write.mode("overwrite").parquet(parquet_output)
File "/var/lang/lib/python3.8/site-packages/pyspark/sql/readwriter.py", line 1140, in parquet
self._jwrite.parquet(path)
File "/var/lang/lib/python3.8/site-packages/py4j/java_gateway.py", line 1321, in call
return_value = get_return_value(
File "/var/lang/lib/python3.8/site-packages/pyspark/sql/utils.py", line 190, in deco
return f(*a, **kw)
File "/var/lang/lib/python3.8/site-packages/py4j/protocol.py", line 326, in get_return_value
raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o47.parquet.
: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2688)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3431)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3466)
at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:174)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3574)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3521)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
at org.apache.spark.sql.execution.datasources.DataSource.planForWritingFileFormat(DataSource.scala:461)
at org.apache.spark.sql.execution.datasources.DataSource.planForWriting(DataSource.scala:558)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:390)
at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:363)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:239)
at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:793)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2592)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2686)
... 25 more
END RequestId: fde8d18e-0a38-4242-a919-8f94530a87e4
REPORT RequestId: fde8d18e-0a38-4242-a919-8f94530a87e4 Duration: 32409.88 ms Billed Duration: 32410 ms Memory Size: 5120 MB Max Memory Used: 472 MB
RequestId: fde8d18e-0a38-4242-a919-8f94530a87e4 Error: Runtime exited with error: exit status 1
Runtime.ExitError
My question here is:
How come the same container image works locally but not on Lambda? My understanding is that if a Docker image runs locally, it should also work when deployed to any cloud environment, since it is self-contained.
Am I misunderstanding something?
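One hedged diagnostic (an editorial suggestion, not from the original post): the ClassNotFoundException means the hadoop-aws jar that provides S3AFileSystem is not visible to the JVM at runtime, and the effective classpath can differ between a local docker run and Lambda depending on how the entrypoint assembles it. A minimal probe, runnable in both environments; only the class name comes from the error above, everything else is illustrative:

public class S3AProbe {
    public static void main(String[] args) {
        try {
            // Succeeds only if hadoop-aws is on the JVM classpath.
            Class.forName("org.apache.hadoop.fs.s3a.S3AFileSystem");
            System.out.println("S3AFileSystem found");
        } catch (ClassNotFoundException e) {
            System.out.println("S3AFileSystem missing: add hadoop-aws (and a matching aws-java-sdk) to the image's classpath");
        }
    }
}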

Apache Storm: IllegalArgumentException: field supervisor.scheduler.meta must be a 'java.util.Map'

I am new to Apache Storm and am currently trying the Pluggable Scheduler to control task assignment: which task should run on which supervisor.
I tried setting the "supervisor.scheduler.meta" value in the storm.yaml file on the supervisor node as shown below, and when I tried to run the supervisor I ended up with the IllegalArgumentException. I am using Apache Storm 0.10.0. Could you please guide me in solving this issue? Please find the configuration files and error logs below.
storm.yaml
-----------
supervisor.scheduler.meta: "special-supervisor"
error-log
----
java.lang.IllegalArgumentException: field supervisor.scheduler.meta 'special-supervisor' must be a 'java.util.Map'
at backtype.storm.config$fn$reify__880.validateField(config.clj:58)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93)
at clojure.lang.Reflector.invokeInstanceMethod(Reflector.java:28)
at backtype.storm.config$validate_configs_with_schemas.invoke(config.clj:118)
at backtype.storm.config$read_storm_config.invoke(config.clj:123)
at backtype.storm.command.config_value$_main.invoke(config_value.clj:22)
at clojure.lang.AFn.applyToHelper(AFn.java:154)
at clojure.lang.AFn.applyTo(AFn.java:144)
Map entries need to have a key and a value. For example:
supervisor.scheduler.meta:
  name: "special-supervisor"
where "name" is the key and "special-supervisor" is the value.

java.lang.IllegalArgumentException: There is no queue named default

I'm trying to load data into Pig and dump the same data to the console. I did this without any errors in the Cloudera sandbox using the following commands.
raw_data = LOAD 'hdfs:/user/cloudera/sampledata' USING PigStorage(',') AS (
    custno:chararray,
    firstname:chararray,
    lastname:chararray,
    age:int,
    profession:chararray
);
dump raw_data;
It dumps all the data in the sampledata file.
I'm trying to do the same in a MapR cluster with the following commands.
raw_data = LOAD '/hdfspath/input' USING PigStorage(',') AS (
    custno:chararray,
    firstname:chararray,
    lastname:chararray,
    age:int,
    profession:chararray
);
dump raw_data;
I get the following error.
(RemoteException): org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.IllegalArgumentException: There is no queue named default
ERROR org.apache.hadoop.ipc.RPC - FailoverProxy: Failing this Call: getQueueAdmins for error(RemoteException): org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.IllegalArgumentException: There is no queue named default
at org.apache.hadoop.mapred.QueueManager.getQueueACL(QueueManager.java:413)
at org.apache.hadoop.mapred.JobTracker.getQueueAdmins(JobTracker.java:5346)
at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:993)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1326)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1322)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1320)
ERROR 2997: Unable to recreate exception from backend error: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.IllegalArgumentException: There is no queue named default
at org.apache.hadoop.mapred.QueueManager.getQueueACL(QueueManager.java:413)
at org.apache.hadoop.mapred.JobTracker.getQueueAdmins(JobTracker.java:5346)
at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:993)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1326)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1322)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1320)
at org.apache.hadoop.ipc.Client.call(Client.java:1095)
at org.apache.hadoop.ipc.Client.call(Client.java:1041)
at org.apache.hadoop.ipc.RPC$FailoverInvoker.invoke(RPC.java:540)
at org.apache.hadoop.mapred.$Proxy0.getQueueAdmins(Unknown Source)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:939)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:885)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:885)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:859)
at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.pig.backend.hadoop20.PigJobControl.mainLoopAction(PigJobControl.java:157)
at org.apache.pig.backend.hadoop20.PigJobControl.run(PigJobControl.java:134)
at java.lang.Thread.run(Thread.java:724)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:257)
Any help, please.
Thanks in advance.
Typically this happens if your scheduler has specific queues created without users assigned, and the user submitting the job doesn't specify any queue name.
If the job falls back to the default queue and the user has no permission to use it, you can end up with this error. You can avoid the issue with
export PIG_OPTS="$PIG_OPTS -Dmapred.job.queue.name=my-queue"
or
pig -Dmapreduce.job.queuename=$queue_name -f path/to/script.pig
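If the job is driven from the PigServer Java API instead (as in the first question above), the same setting can be passed programmatically. A hedged sketch; "my-queue" is a placeholder, and the property name depends on the Hadoop version:

import java.util.Properties;

import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

// mapred.job.queue.name on MR1; mapreduce.job.queuename on MR2/YARN.
Properties props = new Properties();
props.setProperty("mapred.job.queue.name", "my-queue");
PigServer pigServer = new PigServer(ExecType.MAPREDUCE, props);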

Apache Crunch Error

I have a command that reads *.avro files and loads them into a table:
loadavro *.avro tablename
When I try to run the code in a Vagrant cluster, it throws the following error:
[root@cdh4-cluster vagrant]# hadoop jar devcenter-store-load-0.2-SNAPSHOT.jar loadavro person.avro person_table
Unexpected exception: Cannot create job output directory /tmp/crunch-1880716375
java.lang.RuntimeException: Cannot create job output directory /tmp/crunch-1880716375
at org.apache.crunch.impl.dist.DistributedPipeline.createTempDirectory(DistributedPipeline.java:281)
at org.apache.crunch.impl.dist.DistributedPipeline.<init>(DistributedPipeline.java:88)
at org.apache.crunch.impl.mr.MRPipeline.<init>(MRPipeline.java:88)
at org.apache.crunch.impl.mr.MRPipeline.<init>(MRPipeline.java:76)
at com.cerner.devcenter.pipeline.crunch.PersonRecordPipeline.getPipelineInstance(PersonRecordPipeline.java:72)
at com.cerner.devcenter.pipeline.crunch.PersonRecordPipeline.writeIntoHBase(PersonRecordPipeline.java:53)
at com.cerner.console.commands.LoadAvroCommand.run(LoadAvroCommand.java:35)
at com.cerner.kepler.commands.Command.run(Command.java:112)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at com.cerner.kepler.commands.BasicCommandRunner.run(BasicCommandRunner.java:61)
at com.cerner.kepler.commands.CommandRunner.run(CommandRunner.java:19)
at com.cerner.console.main.ConsoleMain.main(ConsoleMain.java:26)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Cannot create directory /tmp/crunch-1880716375. Name node is in safe mode.
The reported blocks 104 needs additional 2 blocks to reach the threshold 0.9990 of total blocks 106. Safe mode will be turned off automatically.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:2982)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:2960)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:2938)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:648)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:417)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44096)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1695)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1691)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1689)
at org.apache.hadoop.ipc.Client.call(Client.java:1225)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
at $Proxy10.mkdirs(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:425)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
at $Proxy11.mkdirs(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2121)
at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2092)
at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:546)
at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1902)
at org.apache.crunch.impl.dist.DistributedPipeline.createTempDirectory(DistributedPipeline.java:279)
... 16 more
If you look into the exception you can see that the NameNode is in safe mode, so the Crunch job can't write any new files into HDFS. Take the NameNode out of safe mode and run the job again:
hadoop dfsadmin -safemode leave
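If the job is launched from Java and you would rather check programmatically, the HDFS client exposes the same switch; a hedged sketch against the Hadoop 2.x API (verify the class and enum names against your CDH build):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.HdfsConstants.SafeModeAction;

public class LeaveSafeMode {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        if (fs instanceof DistributedFileSystem) {
            DistributedFileSystem dfs = (DistributedFileSystem) fs;
            // SAFEMODE_GET returns true while the NameNode is in safe mode.
            if (dfs.setSafeMode(SafeModeAction.SAFEMODE_GET)) {
                dfs.setSafeMode(SafeModeAction.SAFEMODE_LEAVE); // same effect as dfsadmin -safemode leave
            }
        }
    }
}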

How to configure Jaql with an existing Hadoop cluster and use Jaql operators to filter the result?

When I read a file from HDFS by giving it the proper path, the file is read successfully, but when I try to use Jaql's transform operator it throws the exception given below. If I try to execute the code in the Jaql shell, an exception about job.jar is thrown, and even after adding the jar the exception persists. Does anybody know whether Jaql is simply not configured properly with the existing Hadoop cluster, or whether the exception is due to some other cause?
My code is:
jaql.setQueryString("read(lines('hdfs://hadoopserver:54310/dbreports/reports.json'," +
"{format: 'org.apache.hadoop.mapred.TextInputFormat',converter: 'com.ibm.jaql.io.hadoop.converter.FromJsonTextConverter'})) -> transform $.store_number;");
System.out.println("jaql running successfully....");
JsonValue jv = jaql.evaluate();
System.out.println("value is "+jv);
When run, it throws the following exception:
Exception in thread "Thread-38" java.lang.NoClassDefFoundError: org/apache/commons/httpclient/HttpMethod
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:295)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.httpclient.HttpMethod
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
at sbt.PlayCommands$$anonfun$61$$anonfun$63$$anon$2$$anonfun$loadClass$1.apply(PlayCommands.scala:563)
at sbt.PlayCommands$$anonfun$61$$anonfun$63$$anon$2$$anonfun$loadClass$1.apply(PlayCommands.scala:563)
at scala.Option.map(Option.scala:133)
at sbt.PlayCommands$$anonfun$61$$anonfun$63$$anon$2.loadClass(PlayCommands.scala:563)
... 1 more
java.io.IOException: Job failed!
..........
Does anybody know what I'm missing?
