I added KafkaLog4JAppender functionality to my MR job.
Locally, the job runs and sends the formatted logs to my Kafka cluster.
When I try to run it from the YARN server, using:
jar [jar-name].jar [DriverClass].class [job-params] -Dlog4j.configuration=log4j.xml -libjars
I get the following exception:
log4j:ERROR Could not create an Appender. Reported error follows.
java.lang.ClassNotFoundException: kafka.producer.KafkaLog4jAppender
The KafkaLog4JAppender class is in the path.
Running
jar tvf [my-jar].jar | grep KafkaLog4J
finds the class.
I'm kind of lost and would appreciate any helpful input.
Thanks in advance!
If it works in local mode but not in YARN/distributed mode, the jar is probably not being distributed properly. You might want to check "Using third-party jars and files in your MapReduce application (Distributed cache)" for details on how to distribute the jar containing KafkaLog4jAppender.class.
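Note that jar tvf only proves the class is inside the file on the submitting machine. To see what the remote JVM can actually load, a small JDK-only check like the one below can help. This is a rough sketch: ClassLocator and describe are made-up names, and where you call it (for example from a mapper's setup method, so the result lands in the task log) is an assumption about your job.

```java
import java.security.CodeSource;

/** Reports whether a class is visible to the current JVM and where it was loaded from. */
final class ClassLocator {

    /** Returns a one-line description; a missing class is reported, not thrown. */
    static String describe(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class<?> c = Class.forName(className, false, ClassLocator.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // Classes that ship with the JDK usually have no code source location
            String origin = (src == null || src.getLocation() == null)
                    ? "the JDK itself"
                    : src.getLocation().toString();
            return className + " loaded from " + origin;
        } catch (ClassNotFoundException | LinkageError e) {
            return className + " NOT visible: " + e;
        }
    }

    public static void main(String[] args) {
        // Default to the appender named in the exception above
        String name = args.length > 0 ? args[0] : "kafka.producer.KafkaLog4jAppender";
        System.out.println(describe(name));
    }
}
```

If the line printed on the cluster says "NOT visible" while the same call succeeds locally, the jar is not reaching the containers, which matches the distribution problem described above.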
We recently upgraded DataStage from 9.1 to 11.7 on an AIX 7.1 server,
and I'm trying to use the new "File Connector" to write a Parquet file. I created a simple job that reads from Teradata as the source and writes to a Parquet file as the target.
But I get the error below:
> File_Connector_20,0: java.lang.NoClassDefFoundError: org.apache.hadoop.fs.FileSystem
at java.lang.J9VMInternals.prepareClassImpl (J9VMInternals.java)
at java.lang.J9VMInternals.prepare (J9VMInternals.java: 304)
at java.lang.Class.getConstructor (Class.java: 594)
at com.ibm.iis.jis.utilities.dochandler.impl.OutputBuilder.<init> (OutputBuilder.java: 80)
at com.ibm.iis.jis.utilities.dochandler.impl.Registrar.getBuilder (Registrar.java: 340)
at com.ibm.iis.jis.utilities.dochandler.impl.Registrar.getBuilder (Registrar.java: 302)
at com.ibm.iis.cc.filesystem.FileSystem.getBuilder (FileSystem.java: 2586)
at com.ibm.iis.cc.filesystem.FileSystem.writeFile (FileSystem.java: 1063)
at com.ibm.iis.cc.filesystem.FileSystem.process (FileSystem.java: 935)
at com.ibm.is.cc.javastage.connector.CC_JavaAdapter.run (CC_JavaAdapter.java: 444)
I followed the steps in the link below:
https://www.ibm.com/support/knowledgecenter/SSZJPZ_11.7.0/com.ibm.swg.im.iis.conn.s3.usage.doc/topics/amaze_file_formats.html
1- I uploaded the jar files to "/ds9/IBM/InformationServer/Server/DSComponents/jars".
2- I added them to CLASSPATH in agent.sh, then restarted DataStage.
3- I set the environment variable CC_USE_LATEST_FILECC_JARS to the value parquet-1.9.0.jar:orc-2.1.jar.
I also tried adding CLASSPATH as an environment variable in the job, but that did not work.
Note that I'm using Local as the File System mode.
Any hint is appreciated, as I have been searching for a long time.
Thanks in advance,
Which File System mode are you using? If you are using Native HDFS as the File System mode, you need to configure CLASSPATH to include some third-party jars.
These links may provide some guidance:
https://www.ibm.com/support/pages/node/301847
https://www.ibm.com/support/pages/steps-required-configure-file-connector-use-parquet-or-orc-file-format
Note: Depending on the Hadoop distribution and version you are using, the jar versions could be different.
If the above information does not help in resolving the issue, then you may have to reach out to IBM Support to get this addressed.
To use the File Connector, there is no need to add CLASSPATH in agent.sh unless you want to import HDFS files from IMAM.
If your requirement is reading Parquet files, then set
$CC_USE_LATEST_FILECC_JARS=parquet-1.9.0.jar
$FILECC_PARQUET_AVRO_COMPAT_MODE=TRUE
If you are still seeing the issue, run the job with $CC_MSG_LEVEL=2 and open an IBM support case, attaching the job design, the FULL job log and the Version.xml file from the Engine tier.
I have a hadoop job which requires several 3rd party jars. I have put them on the classpath with conf/hadoop-env.sh
export HADOOP_CLASSPATH=hdfs://name.node.private.ip:9000/home/ec2-user/hadoop-gremlin-libs/
When I run $ bin/hadoop classpath this path is included, as you can see here. However, when I go to run a job, it throws an error in initialization:
Error: java.lang.ClassNotFoundException: com.google.common.collect.Lists
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.giraph.conf.AllOptions.<clinit>(AllOptions.java:37)
at org.apache.giraph.conf.ClassConfOption.<init>(ClassConfOption.java:47)
at org.apache.giraph.conf.ClassConfOption.create(ClassConfOption.java:60)
at org.apache.giraph.conf.GiraphConstants.<clinit>(GiraphConstants.java:62)
at org.apache.giraph.conf.GiraphClasses.readFromConf(GiraphClasses.java:152)
at org.apache.giraph.conf.GiraphClasses.<init>(GiraphClasses.java:142)
at org.apache.giraph.conf.ImmutableClassesGiraphConfiguration.<init>(ImmutableClassesGiraphConfiguration.java:93)
at org.apache.giraph.bsp.BspOutputFormat.getOutputCommitter(BspOutputFormat.java:56)
at org.apache.hadoop.mapred.Task.initialize(Task.java:515)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
This particular class should be packaged in guava, which is included on the classpath:
[ec2-user]$ bin/hadoop dfs -ls /home/ec2-user/hadoop-gremlin-libs | grep guava
-rw-r--r-- 3 ec2-user supergroup 0 2017-04-20 17:57 /home/ec2-user/hadoop-gremlin-libs/guava-18.0.jar
I am submitting the job from gremlin as follows:
graph = GraphFactory.open('conf/hadoop.properties')
result = graph.compute().program(MyVertexProgram.build().create()).submit().get()
I have also tried putting the jars on the local filesystem and receive the same error. Does anyone know how to solve this issue?
I can't tell exactly what kind of job you are running, but judging by those classes, it appears to be a MapReduce2 map task that is being set up when you hit that exception.
I think you are probably updating the wrong classpath value: you are updating the Hadoop classpath, not the MapReduce classpath.
More than likely you need to update the cluster's YARN/MapReduce2 application classpath values, either in the cluster manager application or in the site XML files the cluster uses. You should have a mapred-site.xml file with a property named mapreduce.application.classpath, which holds the classpath MapReduce uses to find the jars it needs to run its jobs. Add your path to the value of mapreduce.application.classpath instead.
The same goes for YARN: update the yarn.application.classpath property if YARN needs any other jars, since that classpath points to the jars YARN itself needs to run. You can update this easily in a cluster manager application if you have one, or edit yarn-site.xml manually to add the path.
The only other option applies if your client program has its own dedicated mapred-site.xml file from which it reads mapreduce.application.classpath. If so, you may be able to modify mapreduce.application.classpath on the client side, if your software supports it. Some client programs have their own classpaths; others read the Hadoop cluster's site XML files to connect to the cluster.
Judging by the exception, I am pretty sure you need this jar in mapreduce.application.classpath, not in the Hadoop classpath.
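Whichever property ends up holding the path, it may also be worth checking that every entry resolves to something on the node's local filesystem, because a JVM classpath is split on the platform path separator and each piece is treated as a local file or directory. The sketch below is only an illustration with hypothetical names (ClasspathCheck, missingEntries); on Linux the separator is ":", so a value such as the hdfs:// URL from the question is cut into several pieces, none of which exist locally.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Splits a classpath string the way the JVM does and reports entries that are not local paths. */
final class ClasspathCheck {

    static List<String> missingEntries(String classpath) {
        List<String> missing = new ArrayList<>();
        for (String entry : classpath.split(Pattern.quote(File.pathSeparator))) {
            if (entry.isEmpty()) {
                continue;
            }
            // A trailing "*" is the JVM wildcard for "all jars in this directory":
            // check the directory itself in that case
            String path = entry.endsWith("*") ? entry.substring(0, entry.length() - 1) : entry;
            if (path.isEmpty()) {
                path = ".";
            }
            if (!new File(path).exists()) {
                missing.add(entry);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // With no argument, inspect the classpath of the JVM running this check
        String cp = args.length > 0 ? args[0] : System.getProperty("java.class.path");
        for (String entry : missingEntries(cp)) {
            System.out.println("not found on local filesystem: " + entry);
        }
    }
}
```

Running it on a worker node with the value you plan to put into mapreduce.application.classpath shows quickly whether the guava jar is reachable from there.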
I've installed the Hadoop file system (HDFS) on my machine, and I'm using a Maven dependency (spark-mllib_2.10) to provide the Spark environment for my code.
My code uses Spark MLlib and reads data from the Hadoop file system with this code:
String finalData = ProjectProperties.hadoopBasePath + ProjectProperties.finalDataPath;
JavaRDD<LabeledPoint> data = MLUtils.loadLibSVMFile(jsc.sc(), finalData).toJavaRDD();
With the following properties set:
finalDataPath = /data/finalInput.txt
hadoopBasePath = hdfs://127.0.0.1:54310
I start the DFS nodes externally with the command
start-dfs.sh
Now, my code works perfectly fine when run from Eclipse. But if I export the whole project as an executable jar, it gives me the following exception:
Exception in thread "main" java.io.IOException: No FileSystem for scheme: hdfs
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2421)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2428)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:88)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2467)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2449)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:367)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:287)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:221)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:270)
I also checked the solutions suggested online for this issue, which say to add the following:
hadoopConfig.set("fs.hdfs.impl",
org.apache.hadoop.hdfs.DistributedFileSystem.class.getName()
);
hadoopConfig.set("fs.file.impl",
org.apache.hadoop.fs.LocalFileSystem.class.getName()
);
OR
<property>
<name>fs.file.impl</name>
<value>org.apache.hadoop.fs.LocalFileSystem</value>
<description>The FileSystem for file: uris.</description>
</property>
<property>
<name>fs.hdfs.impl</name>
<value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
<description>The FileSystem for hdfs: uris.</description>
</property>
But I don't use any Hadoop context or Hadoop configuration in my project. I simply load the data from Hadoop using the URL.
Can someone give an answer relevant to this issue?
Please note that this works fine from Eclipse. It only fails when I export the same project as an executable jar.
Update
As suggested in the comment and from the solutions found online, I tried two things.
Added dependencies to my pom.xml for the hadoop-core, hadoop-hdfs and hadoop-client libraries.
Added the above properties configuration to Hadoop's core-site.xml, as suggested here: http://grokbase.com/t/cloudera/scm-users/1288xszz7r/no-filesystem-for-scheme-hdfs
But still no luck in getting the error resolved. I get the same error locally on my machine as well as on one of the remote machines I tried.
On the remote machine I installed Hadoop the same way I did on my own machine, using the link mentioned above.
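One thing that may be worth ruling out, since the failure only shows up in the exported jar: Hadoop 2.x discovers FileSystem implementations through the JDK ServiceLoader, which reads META-INF/services/org.apache.hadoop.fs.FileSystem from every jar on the classpath. When several jars are merged into one executable jar, files with the same name can overwrite each other, and the hdfs entry can get lost. The sketch below lists what the running JVM actually sees. ServiceFileLister and registeredProviders are hypothetical names, and whether this is the cause here is an assumption, not something the stack trace proves.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

/** Lists the provider classes registered for a service interface on the current classpath. */
final class ServiceFileLister {

    static List<String> registeredProviders(String serviceInterface) {
        List<String> providers = new ArrayList<>();
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            cl = ClassLoader.getSystemClassLoader();
        }
        try {
            // One registration file may exist per jar; an executable jar keeps only one copy
            Enumeration<URL> files = cl.getResources("META-INF/services/" + serviceInterface);
            while (files.hasMoreElements()) {
                URL url = files.nextElement();
                try (BufferedReader in = new BufferedReader(
                        new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
                    String line;
                    while ((line = in.readLine()) != null) {
                        int hash = line.indexOf('#'); // strip comments
                        if (hash >= 0) {
                            line = line.substring(0, hash);
                        }
                        line = line.trim();
                        if (!line.isEmpty()) {
                            providers.add(line + "  <- " + url);
                        }
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return providers;
    }

    public static void main(String[] args) {
        String service = args.length > 0 ? args[0] : "org.apache.hadoop.fs.FileSystem";
        for (String provider : registeredProviders(service)) {
            System.out.println(provider);
        }
    }
}
```

Run it once from Eclipse and once with the exported jar on the classpath. If org.apache.hadoop.hdfs.DistributedFileSystem appears in the first listing but not in the second, the export step dropped the registration, and setting fs.hdfs.impl explicitly (as in the snippets above) or changing how the jar is assembled would be the places to look.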
I am setting up an Apache Storm system but am having problems getting the program to run consistently. I have set up storm on three servers but it only works consistently on one. I think the issue lies somewhere in the path of the command.
I have been using storm-starter to set up the program and have tested it locally with RollingTopWords. When I run the command $ storm jar storm-starter-*.jar storm.starter.RollingTopWords the computer stalls for a second, then I get the following error:
Could not find or load main class storm.starter.RollingTopWords
The jar is stored in the directory /apache/storm/examples/storm-starter/target . Let me know if there is any other information I can provide that would be of help because I'm feeling a little desperate at this point.
The following is the entire output for the program that doesn't work.
Running: /usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java -client -Dstorm.options= -Dstorm.home=/home/scix3/apache/storm -Dstorm.log.dir=/home/scix3/apache/storm/logs -Djava.library.path=/usr/local/lib:/opt/local/lib:/usr/lib -Dstorm.conf.file= -cp /home/scix3/apache/storm/lib/kryo-2.21.jar:/home/scix3/apache/storm/lib/core.incubator-0.1.0.jar:/home/scix3/apache/storm/lib/commons-fileupload-1.2.1.jar:/home/scix3/apache/storm/lib/ring-servlet-0.3.11.jar:/home/scix3/apache/storm/lib/clj-stacktrace-0.2.2.jar:/home/scix3/apache/storm/lib/jline-2.11.jar:/home/scix3/apache/storm/lib/servlet-api-2.5.jar:/home/scix3/apache/storm/lib/disruptor-2.10.1.jar:/home/scix3/apache/storm/lib/log4j-over-slf4j-1.6.6.jar:/home/scix3/apache/storm/lib/clojure-1.5.1.jar:/home/scix3/apache/storm/lib/commons-exec-1.1.jar:/home/scix3/apache/storm/lib/logback-core-1.0.13.jar:/home/scix3/apache/storm/lib/jetty-util-6.1.26.jar:/home/scix3/apache/storm/lib/slf4j-api-1.7.5.jar:/home/scix3/apache/storm/lib/carbonite-1.4.0.jar:/home/scix3/apache/storm/lib/compojure-1.1.3.jar:/home/scix3/apache/storm/lib/minlog-1.2.jar:/home/scix3/apache/storm/lib/commons-lang-2.5.jar:/home/scix3/apache/storm/lib/tools.macro-0.1.0.jar:/home/scix3/apache/storm/lib/reflectasm-1.07-shaded.jar:/home/scix3/apache/storm/lib/tools.cli-0.2.4.jar:/home/scix3/apache/storm/lib/math.numeric-tower-0.0.1.jar:/home/scix3/apache/storm/lib/logback-classic-1.0.13.jar:/home/scix3/apache/storm/lib/tools.logging-0.2.3.jar:/home/scix3/apache/storm/lib/asm-4.0.jar:/home/scix3/apache/storm/lib/jetty-6.1.26.jar:/home/scix3/apache/storm/lib/snakeyaml-1.11.jar:/home/scix3/apache/storm/lib/hiccup-0.3.6.jar:/home/scix3/apache/storm/lib/clj-time-0.4.1.jar:/home/scix3/apache/storm/lib/jgrapht-core-0.9.0.jar:/home/scix3/apache/storm/lib/clout-1.0.1.jar:/home/scix3/apache/storm/lib/chill-java-0.3.5.jar:/home/scix3/apache/storm/lib/commons-io-2.4.jar:/home/scix3/apache/storm/lib/joda-time-2.0.jar:/home/scix3/apache/storm/lib/storm-core-0.9.4.jar:/home/scix3/apache/storm/lib/objenesis-1.2.jar:/home/scix3/apache/storm/lib/commons-logging-1.1.3.jar:/home/scix3/apache/storm/lib/ring-core-1.1.5.jar:/home/scix3/apache/storm/lib/ring-jetty-adapter-0.3.11.jar:/home/scix3/apache/storm/lib/commons-codec-1.6.jar:/home/scix3/apache/storm/lib/json-simple-1.1.jar:/home/scix3/apache/storm/lib/ring-devel-0.3.11.jar:storm-starter-.jar:/home/scix3/apache/storm/conf:/home/scix3/apache/storm/bin -Dstorm.jar=storm-starter-.jar storm.starter.RollingTopWords
Error: Could not find or load main class storm.starter.RollingTopWords
The main causes of the error
Could not find or load main class storm.starter.RollingTopWords
could be:
Check the launch configuration used while building the jar. You must be very careful while building the jar: it asks you to choose a destination folder and a launch configuration (the launch configuration should belong to the same project).
You might have missed the main class in your project.
Before using StormSubmitter on a remote cluster, check whether the topology works properly in a LocalCluster.
To check whether the problem is that Storm cannot find the jar, try issuing
storm jar /fullpath/my-storm-jar.jar Classname
A few other things to verify:
The jar is compiled properly and contains the RollingTopWords class.
storm.yaml points to the correct nimbus. (This seems less probable, as the connection is being made and there is an attempt to load the topology.)
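The first of those checks can also be done from code. The sketch below uses only java.util.jar and separates "the jar could not be opened at that path" from "the jar opened but the class is not inside". JarClassCheck and its Result values are hypothetical names; the paths passed in are up to you.

```java
import java.io.IOException;
import java.util.jar.JarFile;

/** Checks whether a jar file can be opened and whether it contains a given class. */
final class JarClassCheck {

    enum Result { JAR_UNREADABLE, CLASS_MISSING, CLASS_PRESENT }

    static Result check(String jarPath, String className) {
        // storm.starter.RollingTopWords -> storm/starter/RollingTopWords.class
        String entry = className.replace('.', '/') + ".class";
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getJarEntry(entry) != null ? Result.CLASS_PRESENT : Result.CLASS_MISSING;
        } catch (IOException e) {
            // Covers a missing file, a directory, or a corrupt archive
            return Result.JAR_UNREADABLE;
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: JarClassCheck <path-to-jar> <fully.qualified.ClassName>");
            return;
        }
        System.out.println(check(args[0], args[1]));
    }
}
```

JAR_UNREADABLE points at the path given on the command line (for instance a relative name that does not exist in the current directory), while CLASS_MISSING points at how the jar was built.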
SOLVED (the solution is in the comments)
I'm using Hadoop 2.2.0 (in pseudo-distributed mode) on Ubuntu 13.10 and Eclipse Kepler v4.3 to develop my Hadoop program and a Dynamic Web Project (without Maven).
My Hadoop jar project, called "WorkTest.jar", works correctly when I run the job from the command line with "hadoop jar WorkTest.jar", and I can see the job progress on the terminal.
The Hadoop project contains four elements:
DriverJob.java (class that configures and starts the job)
Mapper.java
Combiner.java
Reducer.java
Now I have written a new Dynamic Web Project with a ServletTest.java into which I copied the DriverJob class code. The other classes (Mapper.java, Combiner.java, Reducer.java) are placed in the same package as the servlet (main package). The WebContent/lib folder contains all the necessary Hadoop jar dependencies.
I have successfully deployed my application on a WildFly 8 server with Eclipse, but when I try to run the MapReduce job (the job configuration runs successfully, and I managed to delete and write a folder on HDFS), it keeps failing with the following exception, visible in the Hadoop job log file:
FATAL [IPC Server handler 5 on 46834] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1396015900746_0023_m_000002_0 - exited : java.lang.RuntimeException: java.lang.ClassNotFoundException: Class Mapper not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1720)
at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:186)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:721)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
Caused by: java.lang.ClassNotFoundException: Class Mapper not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1626)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1718)
... 8 more
and from the WildFly log file:
WARN [org.apache.hadoop.mapreduce.JobSubmitter] Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
WARN [org.apache.hadoop.mapreduce.JobSubmitter] No job jar file set. User classes may not be found. See Job or Job#setJar(String).
But the WEB-INF/classes/ deploy folder on WildFly contains Mapper.class, Combiner.class and Reducer.class.
I also tried putting the code of Mapper, Combiner and Reducer inside the servlet, but it fails with the same error.
What am I doing wrong?
I believe you need to have your .class files in an archive (jar) that can be distributed to the nodes in the cluster.
WARN [org.apache.hadoop.mapreduce.JobSubmitter] No job jar file set. User classes may not be found. See Job or Job#setJar(String).
This warning is the key. Generally you would use job.setJarByClass(DriverJob.class) to tell the MapReduce client which jar file has the Mapper/Reducer classes. You don't have a jar, so that method of distributing the proper classes falls apart.
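To see why a class sitting in WEB-INF/classes does not count as "a jar", here is a rough JDK-only approximation of the lookup: find the resource URL of the class and accept it only if it points into a jar. ContainingJar and find are hypothetical names, and this is a simplified stand-in for illustration, not Hadoop's own code.

```java
import java.net.URL;

/** Finds the jar a class was loaded from, or returns null if it did not come from a jar. */
final class ContainingJar {

    static String find(Class<?> clazz) {
        String resource = clazz.getName().replace('.', '/') + ".class";
        ClassLoader loader = clazz.getClassLoader();
        URL url = (loader == null)
                ? ClassLoader.getSystemResource(resource)
                : loader.getResource(resource);
        // Classes in a plain directory resolve to a file: URL, and an application
        // server may use its own scheme; neither identifies a jar that can be shipped
        if (url == null || !"jar".equals(url.getProtocol())) {
            return null;
        }
        // A jar URL looks like jar:file:/path/app.jar!/pkg/Name.class
        String spec = url.toString();
        int bang = spec.indexOf("!/");
        return bang < 0 ? null : spec.substring("jar:".length(), bang);
    }

    public static void main(String[] args) {
        String jar = find(ContainingJar.class);
        System.out.println(jar == null ? "not loaded from a jar" : "loaded from " + jar);
    }
}
```

If this returns null for your mapper class inside the servlet, one option is to package Mapper, Combiner and Reducer into their own jar and point the job at it with the Job#setJar(String) method that the warning mentions.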