I installed pig on centos 6. I am new to pig.
I opened pig in local mode using $pig -x local.
only getting error while doing DUMP.
error message is : ERROR 2998: Unhandled internal error. org.apache.hadoop.mapred.jobcontrol.JobControl.addJob(Lorg/apache/hadoop/mapred/jobcontrol/Job;)Ljava/lang/String;
I have set JAVA_HOME and java version is 1.7. But hadoop is not installed.
=================
grunt> A = load '/etc/passwd' using PigStorage(':');
grunt> B = foreach A generate $0 as id; grunt> dump B;
2014-06-13 16:24:33,039 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2014-06-13 16:24:33,040 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NewPartitionFilterOptimizer, PartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter], RULES_DISABLED=[FilterLogicExpressionSimplifier]}
2014-06-13 16:24:33,041 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2014-06-13 16:24:33,042 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2014-06-13 16:24:33,042 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2014-06-13 16:24:33,043 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2014-06-13 16:24:33,044 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2014-06-13 16:24:33,044 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2014-06-13 16:24:33,052 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2014-06-13 16:24:33,052 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2014-06-13 16:24:33,053 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cache
2014-06-13 16:24:33,053 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Distributed cache not supported or needed in local mode. Setting key [pig.schematuple.local.dir] with code temp directory: /tmp/1402656873052-0
2014-06-13 16:24:33,062 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. org.apache.hadoop.mapred.jobcontrol.JobControl.addJob(Lorg/apache/hadoop/mapred/jobcontrol/Job;)Ljava/lang/String;
Details at logfile: /home/ripsme/pig/pig_1402655166073.log
grunt>
Thank you very much.
for Redhat systems, pls do
$sudo yum install pig
and open pig in local mode. this worked for me.
Related
04:39:30.558 [main] INFO [Cluster.java:1614] - Cannot connect with protocol version V4, trying with V3
04:39:30.580 [main] INFO [Cluster.java:1614] - Cannot connect with protocol version V3, trying with V2
04:39:30.671 [main] WARN [Cluster.java:2202] - You listed localhost/0:0:0:0:0:0:0:1:9042 in your contact points, but it wasn't found in the control host's system.peers at startup
04:39:30.847 [main] INFO [KairosRetryPolicy.java:104] - Initializing KairosRetryPolicy: retry count set to 1
04:39:30.848 [main] INFO [Cluster.java:1568] - New Cassandra host localhost/127.0.0.1:9042 added
04:39:31.333 [main] ERROR [CassandraModule.java:136] - Unable to setup cassandra schema
com.datastax.driver.core.exceptions.SyntaxError: line 6:28 no viable alternative at input '>'
at com.datastax.driver.core.exceptions.SyntaxError.copy(SyntaxError.java:58)
at com.datastax.driver.core.exceptions.SyntaxError.copy(SyntaxError.java:24)
at com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:37)
at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:245)
at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:68)
at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:43)
Using this guide to get a Standalone. But whenever I perform the "sbin\start-master.sh" command, a new CMD window pops up and then disappears immediately. No URL is being printed.
I've done the initial setup and set all the environment variables properly but it still doesn't seem to work. How can I fix this?
Edit: Also tried this example using Intelli-J. Its showing the following without any other output:
[Thread-0] INFO org.eclipse.jetty.util.log - Logging initialized #568ms
[Thread-0] INFO spark.webserver.JettySparkServer - == Spark has ignited ...
[Thread-0] INFO spark.webserver.JettySparkServer - >> Listening on 0.0.0.0:4567
[Thread-0] INFO org.eclipse.jetty.server.Server - jetty-9.3.2.v20150730
[Thread-0] INFO org.eclipse.jetty.server.ServerConnector - Started ServerConnector#e92ccd{HTTP/1.1,[http/1.1]}{0.0.0.0:4567}
[Thread-0] INFO org.eclipse.jetty.server.Server - Started #942ms
I'm trying to use JSonLoader form elephant-bird-pig package.
My script is simple:
register elephant-bird-pig-4.5.jar
register elephant-bird-hadoop-compat-4.5.jar
A = load '1_record_2.json' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad');
DUMP A
And I get an error:
2014-09-30 16:15:32,439 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-09-30 16:15:32,447 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2014-09-30 16:15:32,448 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NewPartitionFilterOptimizer, PartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter], RULES_DISABLED=[FilterLogicExpressionSimplifier]}
2014-09-30 16:15:32,449 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2014-09-30 16:15:32,450 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2014-09-30 16:15:32,450 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2014-09-30 16:15:32,464 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at hadoop1/10.242.8.131:8050
2014-09-30 16:15:32,466 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2014-09-30 16:15:32,466 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2014-09-30 16:15:32,467 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. com/twitter/elephantbird/util/HadoopCompat
Details at logfile: pig_1412081068149.log
I don't know what is missing. Can you please suggest something?
File pig_1412081068149.log contains:
Pig Stack Trace
---------------
ERROR 2998: Unhandled internal error. com/twitter/elephantbird/util/HadoopCompat
java.lang.NoClassDefFoundError: com/twitter/elephantbird/util/HadoopCompat
at com.twitter.elephantbird.pig.load.LzoBaseLoadFunc.setLocation(LzoBaseLoadFunc.java:93)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:477)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:298)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:191)
at org.apache.pig.PigServer.launchPlan(PigServer.java:1324)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1309)
at org.apache.pig.PigServer.storeEx(PigServer.java:980)
at org.apache.pig.PigServer.store(PigServer.java:944)
at org.apache.pig.PigServer.openIterator(PigServer.java:857)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:774) etc...
What class is missing (java.lang.NoClassDefFoundError) ? What libraries should I add more?
Thanks
pawel
I've checked if registered libraries exist and classes are inside the libraries.
Everythig looked fine, but I was still getting this error.
So I left pig shell, and opened it once again - the same script works fine.
You missed two important jars to register.
register elephant-bird-core-4.5.jar
register json-simple-1.1.1.jar
Add these two to your pig script and everything should work fine.
I'm having problems with a particular UDF jar and have so far been unable to figure out where to begin. I have a way of testing the jar from the command line and it works. If I REGISTER the jar in a pig script, pig fails to create a jar for the job. I can register other jars without trouble and this jar was working until a few days ago. Here is the output when running the pig script:
[michael#hadoop01 logitech-correlation]$ pig -f MatchWithClassifier.pig -param date=20130301 -param siteId=0
2013-05-10 11:20:30,523 [main] INFO org.apache.pig.Main - Apache Pig version 0.10.0-cdh4.1.2 (rexported) compiled Nov 01 2012, 18:38:58
2013-05-10 11:20:30,524 [main] INFO org.apache.pig.Main - Logging error messages to: /home/michael/correlation/pig_1368210030521.log
2013-05-10 11:20:30,981 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://hadoop01/
2013-05-10 11:20:31,346 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: hadoop01.dev.terapeak.com:8021
2013-05-10 11:20:32,143 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: FILTER
2013-05-10 11:20:32,390 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2013-05-10 11:20:32,422 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2013-05-10 11:20:32,422 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2013-05-10 11:20:32,508 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2013-05-10 11:20:32,518 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2013-05-10 11:20:32,522 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job5623238576559565298.jar
2013-05-10 11:20:36,398 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2017: Internal error creating job configuration.
Details at logfile: /home/michael/correlation/pig_1368210030521.log
The stack trace is below:
Pig Stack Trace
---------------
ERROR 2017: Internal error creating job configuration.
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException: ERROR 2017: Internal error creating job configuration.
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:727)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:259)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:180)
at org.apache.pig.PigServer.launchPlan(PigServer.java:1275)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1260)
at org.apache.pig.PigServer.execute(PigServer.java:1250)
at org.apache.pig.PigServer.executeBatch(PigServer.java:362)
at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:132)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:193)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
at org.apache.pig.Main.run(Main.java:430)
at org.apache.pig.Main.main(Main.java:111)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: java.util.zip.ZipException: invalid distance too far back
at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:164)
at java.util.zip.ZipInputStream.read(ZipInputStream.java:163)
at java.util.jar.JarInputStream.read(JarInputStream.java:194)
at java.io.FilterInputStream.read(FilterInputStream.java:107)
at org.apache.pig.impl.util.JarManager.addStream(JarManager.java:242)
at org.apache.pig.impl.util.JarManager.mergeJar(JarManager.java:216)
at org.apache.pig.impl.util.JarManager.mergeJar(JarManager.java:206)
at org.apache.pig.impl.util.JarManager.createJar(JarManager.java:126)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:411)
... 17 more
================================================================================
Based on this I would think the problem is from the "java.util.zip.ZipException: invalid distance too far back" exception. Is pig having some issue reading the jar?
I have a cluster running Hadoop 0.20.2 and Pig 0.10.
I'm interested to add some logs to Pig's source code and to run my own Pig version on the cluster.
What I did:
built the project with 'ant' command
got pig.jar and pig-withouthadoop.jar
copied the jars to Pig home directory on the cluster's namenode
run a job
Then I've got following std output:
2013-03-25 06:35:05,226 [main] WARN org.apache.pig.backend.hadoop20.PigJobControl - falling back to default JobControl (not using hadoop 0.20 ?)
java.lang.NoSuchFieldException: runnerState
at java.lang.Class.getDeclaredField(Class.java:1882)
at org.apache.pig.backend.hadoop20.PigJobControl.<clinit>(PigJobControl.java:51)
at org.apache.pig.backend.hadoop.executionengine.shims.HadoopShims.newJobControl(HadoopShims.java:97)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:287)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:177)
at org.apache.pig.PigServer.launchPlan(PigServer.java:1320)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1305)
at org.apache.pig.PigServer.execute(PigServer.java:1295)
at org.apache.pig.PigServer.executeBatch(PigServer.java:375)
at org.apache.pig.PigServer.executeBatch(PigServer.java:353)
at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:137)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
at org.apache.pig.Main.run(Main.java:480)
at org.apache.pig.Main.main(Main.java:157)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
2013-03-25 06:35:05,229 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2013-03-25 06:35:05,260 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2013-03-25 06:35:05,272 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
2013-03-25 06:35:06,041 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job9091543475518322185.jar
2013-03-25 06:35:10,974 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job9091543475518322185.jar created
2013-03-25 06:35:10,995 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2013-03-25 06:35:11,006 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2013-03-25 06:35:11,006 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2013-03-25 06:35:11,006 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2013-03-25 06:35:11,181 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. org.apache.hadoop.mapred.jobcontrol.JobControl.addJob(Lorg/apache/hadoop/mapred/jobcontrol/Job;)Ljava/lang/String;
Pig Stack Trace:
ERROR 2998: Unhandled internal error. org.apache.hadoop.mapred.jobcontrol.JobControl.addJob(Lorg/apache/hadoop/mapred/jobcontrol/Job;)Ljava/lang/String;
java.lang.NoSuchMethodError: org.apache.hadoop.mapred.jobcontrol.JobControl.addJob(Lorg/apache/hadoop/mapred/jobcontrol/Job;)Ljava/lang/String;
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:298)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:177)
at org.apache.pig.PigServer.launchPlan(PigServer.java:1320)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1305)
at org.apache.pig.PigServer.execute(PigServer.java:1295)
at org.apache.pig.PigServer.executeBatch(PigServer.java:375)
at org.apache.pig.PigServer.executeBatch(PigServer.java:353)
at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:137)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
at org.apache.pig.Main.run(Main.java:480)
at org.apache.pig.Main.main(Main.java:157)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
what went wrong? Should I do anything else except replacing pig.jar and pig-withouthadoop.jar in installation directory of namenode?
help...
the point I missed was: pig-withouthadoop.jar should be compiled with specific Hadoop version.
I compiled the jar in following way and it worked:
% ant clean jar-withouthadoop -Dhadoopversion=23