I verified that the input directory (under /tmp) exists and contains the newsgroup data, so I am not sure why I am getting a FileNotFoundException.
$ sh classify-20newsgroups.sh
Please select a number to choose the corresponding task to run
1. naivebayes
2. sgd
3. clean -- cleans up the work area in /tmp/mahout-work-rsrinivasan
Enter your choice : 1
ok. You chose 1 and we'll use naivebayes
creating work directory at /tmp/mahout-work-rsrinivasan
Preparing Training Data
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
no HADOOP_HOME set, running locally
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/C:/cygwin/usr/local/mahout/examples/target/mahout-examples-0.6-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/C:/cygwin/usr/local/mahout/examples/target/dependency/slf4j-jcl-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/C:/cygwin/usr/local/mahout/examples/target/dependency/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
12/05/14 09:13:44 WARN driver.MahoutDriver: No org.apache.mahout.classifier.bayes.PrepareTwentyNewsgroups.props found on classpath, will use command-line arguments only
Exception in thread "main" java.io.FileNotFoundException: Can't find input directory \tmp\mahout-work-rsrinivasan\20news-bydate\20news-bydate-train
at org.apache.mahout.classifier.bayes.PrepareTwentyNewsgroups.main(PrepareTwentyNewsgroups.java:92)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
You probably have to edit that script before it works on Windows. I imagine the paths are wrong for Cygwin/Windows.
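For example, here is a minimal sketch of the kind of change that might help, assuming the script builds its work directory in a WORK_DIR variable (the variable name and the uname test are my assumptions, not necessarily what classify-20newsgroups.sh actually contains):
# Convert the POSIX-style work dir to a Windows path before it reaches the
# JVM, since Java resolves /tmp against the current drive, not Cygwin's root.
WORK_DIR=/tmp/mahout-work-${USER}
if [ "$(uname -o 2>/dev/null)" = "Cygwin" ]; then
  WORK_DIR=$(cygpath -w "$WORK_DIR")
fi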
It is probably best to run the example under a Unix environment. When I was trying the oscon2011 Reuters example I ran into similar issues, although I was using the Git Bash console for the work. It seems that the classification and clustering examples need HDFS or a local Unix filesystem to run properly.
I managed to get a VirtualBox VM up and running using Vagrant, and the process was relatively straightforward. Yes, it does add to the learning curve, but after some initial investment I was able to complete the Reuters example in a couple of hours.
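For reference, the whole VM setup can be as small as this (the box name is illustrative; any Linux base box with Java installed will do):
vagrant init hashicorp/precise64   # illustrative Ubuntu base box
vagrant up                         # boots the VM in VirtualBox
vagrant ssh                        # shell in and run the Mahout example there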
thanks
anand
Related
I want to run Kafka from IDEA and I am getting the following error:
> Task :core:Kafka.main()
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
SLF4J: Failed to load class "org.slf4j.impl.StaticMDCBinder".
SLF4J: Defaulting to no-operation MDCAdapter implementation.
SLF4J: See http://www.slf4j.org/codes.html#no_static_mdc_binder for further details.
> Task :core:Kafka.main() FAILED
Execution failed for task ':core:Kafka.main()'.
> Process 'command '/Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home/bin/java'' finished with non-zero exit value 1
* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output. Run with --scan to get full insights.
I am able to run Kafka from the terminal.
I run ZooKeeper from the terminal and then run Kafka from IDEA.
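For completeness, the ZooKeeper half of that setup is just the stock script shipped in the Kafka checkout:
bin/zookeeper-server-start.sh config/zookeeper.properties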
Steps that I follow to run Kafka:
As usual, I run the command ./gradlew jar to build Kafka from the source, using the terminal.
I open the project in IDEA using idea . from the root directory of the cloned repository.
I open the file core/src/main/scala/kafka/Kafka.scala.
I then navigate to the main() function and click the green triangle.
This gives me a run configuration, which fails. I then add config/server.properties to the Program Arguments of the run configuration.
Upon running with the above configurations, I get the aforementioned error.
I searched a bit and found that the same issue was resolved by adding dependencies, as mentioned here and here, but I could not understand how to add the dependency, as I do not use Maven and cannot find a pom.xml file as described here.
Update 1
I tried to add the exact dependency, as reported in the terminal output, to the run configuration. I am unsure whether it was actually added, because the result is still the same.
This is what I find when I run Kafka from the terminal:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/Users/aviralsrivastava/dev/kafka/core/build/dependant-libs-2.13.5/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/aviralsrivastava/dev/kafka/tools/build/dependant-libs-2.13.5/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/aviralsrivastava/dev/kafka/connect/api/build/dependant-libs/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/aviralsrivastava/dev/kafka/connect/transforms/build/dependant-libs/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/aviralsrivastava/dev/kafka/connect/runtime/build/dependant-libs/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/aviralsrivastava/dev/kafka/connect/file/build/dependant-libs/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/aviralsrivastava/dev/kafka/connect/mirror/build/dependant-libs/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/aviralsrivastava/dev/kafka/connect/mirror-client/build/dependant-libs/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/aviralsrivastava/dev/kafka/connect/json/build/dependant-libs/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/aviralsrivastava/dev/kafka/connect/basic-auth-extension/build/dependant-libs/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
[2021-03-23 10:54:59,524] INFO Registered kafka:type=kafka.Log4jController MBean (kafka.utils.Log4jControllerRegistration$)
From the second-to-last line, I deduced the dependency name and added it to the run configuration as described above.
Generally speaking, the classpath needs to be configured correctly to isolate a single SLF4J implementation (specifically, the one in the core module), but you should avoid messing with the build scripts just to fix the runtime dependencies within a module.
To fix your logger, you need to pass -Dlog4j.configuration=config/log4j.properties as a VM option (I think you'll have to toggle the drop-down where it says -cp kafka.core.main to get that input to show).
If you want to emulate the actual runtime behavior of the server and attach a debugger, set up your breakpoints and open a terminal (assuming you are using ZooKeeper and it is already running somewhere else; otherwise you need a separate terminal for it):
export KAFKA_DEBUG=y
export DEBUG_SUSPEND_FLAG=y
bin/kafka-server-start.sh config/server.properties
Then add a run configuration for a remote JVM and attach it to port 5005.
Once it attaches, your breakpoint should take focus, and you can step through the code.
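For reference, this is roughly what KAFKA_DEBUG=y adds to the java command line (a sketch based on bin/kafka-run-class.sh; the port can be overridden with JAVA_DEBUG_PORT):
# Debug agent appended by kafka-run-class.sh when KAFKA_DEBUG is set;
# suspend=y comes from DEBUG_SUSPEND_FLAG=y, so the JVM waits for the debugger.
-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005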
I was also facing the same issue while trying to set up the Apache Kafka source code in IntelliJ IDEA, and here is what I did to solve the SLF4J issue.
Set the required JVM arguments in the IDEA run configuration.
Also, in the build.gradle file, add the following dependency under the project(':core') section, after the line implementation libs.commonsCli:
implementation libs.slf4jlog4j
Tested on Java 17, Apache Kafka 3.1.0, and IntelliJ IDEA 2021.3.1.
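If you want to confirm the binding actually landed on core's classpath after editing build.gradle, one quick check from the checkout root is (a sketch; the configuration name can differ between Kafka versions):
./gradlew :core:dependencies --configuration runtimeClasspath | grep slf4j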
I am new to Hadoop. I am trying to run a job via Hadoop's ToolRunner from Java code in the NetBeans environment, but I still cannot find a way to fix the following issue.
Exception in thread "main" java.lang.NoClassDefFoundError: javax/security/auth/kerberos/KeyTab
at org.apache.hadoop.security.UserGroupInformation.<init>(UserGroupInformation.java:609)
at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:799)
at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:760)
at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:633)
at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2812)
at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2802)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2668)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:170)
at org.enahang.mapreduce.utils.mrUtils.Test.run(Test.java:125)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.enahang.mapreduce.utils.mrUtils.Test.main(Test.java:62)
My platform is Windows 7.
I've added many libraries. First of all, I added
apacheds-kerberos-codec-2.0.0-M15.jar
Then I added many other similar libraries, such as:
Javaee-api-7.0.jar
java-rt-jar-stubs-1.5.0.jar
…
But I don't know where the error comes from.
This is the complete list of .jar files I added to make the KeyTab class available to the code:
apacheds-i18n-2.0.0-M15.jar
api-asn1-api-1.0.0-M20.jar
api-util-1.0.0-M20.jar
commons-cli-1.2.jar
commons-codec-1.4.jar
commons-collections-3.2.1.jar
commons-configuration-1.6.jar
jsp-api-2.1.jar
hadoop-auth-2.7.0.jar
Thanks in advance
OK, I think the research on this question is enough. The answer is that hadoop-conf-kerberos-6.0.0.jar contains some XML and properties files for configuring Hadoop with respect to Kerberos. My program's failure to find the KeyTab class was the result of a bad configuration.
hadoop-conf-kerberos-6.0.0.jar contains prepared configuration files such as core-site.xml, hdfs-site.xml, mapred-site.xml, etc. It complemented the apacheds-kerberos-codec-2.0.0-M15.jar I had already added to my program.
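As a side note, javax.security.auth.kerberos.KeyTab ships with the JDK itself (since Java 7), so a quick sanity check is to confirm the class resolves in a bare JVM before blaming application jars. A throwaway sketch:
cat > KeyTabCheck.java <<'EOF'
// Prints the class if the running JDK provides it; throws
// ClassNotFoundException otherwise, pointing at a broken JDK install.
public class KeyTabCheck {
  public static void main(String[] args) throws Exception {
    System.out.println(Class.forName("javax.security.auth.kerberos.KeyTab"));
  }
}
EOF
javac KeyTabCheck.java && java KeyTabCheck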
Logging initialized using configuration in jar:file:/usr/local/hive/lib/hive-common-0.12.0.jar!/hive-log4j.properties
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hive/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
hive>
You need to delete these jar files that cause the binding conflict between Hadoop and Hive:
rm lib/hive-jdbc-2.0.0-standalone.jar
rm lib/log4j-slf4j-impl-2.4.1.jar
You have to delete /usr/local/hive/lib/slf4j-log4j12-1.6.1.jar, because Hive will automatically use the slf4j-log4j jar file present in Hadoop.
You can also refer to https://issues.apache.org/jira/browse/HIVE-6162.
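Before deleting anything, it can help to list exactly which jars carry the competing StaticLoggerBinder copies (paths taken from the warning above):
find /usr/local/hive/lib /usr/local/hadoop/share/hadoop/common/lib \
     -name 'slf4j-log4j12-*.jar'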
Of the two SLF4J bindings listed in the warning, you'll need to exclude one from the classpath.
Even though this is only a warning, SLF4J will still pick one logging framework/implementation and bind to it; which binding wins is determined by JVM classpath ordering and is best treated as random.
You are getting this warning message because of conflicting slf4j jars being picked up from both the Hive and Hadoop paths.
To get rid of it, just delete hive-jdbc-1.1.0-standalone.jar from /usr/local/hive/lib.
Then you should be good to go :)
To resolve this, add the following line to the /usr/iop/4.1.0.0/hive/bin/hive.distro file on all Hive nodes:
CLASSPATH=`echo $CLASSPATH| sed 's/\/usr\/local\/hadoop\/lib\/slf4j\-log4j12\-1\.7\.10\.jar//g'`
The line should be inserted after this block:
if $cygwin; then
CLASSPATH=`cygpath -p -w "$CLASSPATH"`
CLASSPATH=${CLASSPATH};${AUX_CLASSPATH}
else
CLASSPATH=${CLASSPATH}:${AUX_CLASSPATH}
fi
The warnings will no longer appear.
http://www-01.ibm.com/support/docview.wss?uid=swg21971864
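If you want to verify the substitution before touching the script, the same sed line can be exercised standalone (a sketch with a made-up second jar):
CLASSPATH=/usr/local/hadoop/lib/slf4j-log4j12-1.7.10.jar:/usr/local/hive/lib/foo.jar
CLASSPATH=`echo $CLASSPATH| sed 's/\/usr\/local\/hadoop\/lib\/slf4j\-log4j12\-1\.7\.10\.jar//g'`
echo "$CLASSPATH"   # prints :/usr/local/hive/lib/foo.jar - the offending jar is gone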
I am running a job on an AWS EMR cluster and am having issues with a Jackson library conflict. Based on the article here, I tried to add a bootstrap step to set my classpath with the following script:
#!/bin/bash
export HADOOP_USER_CLASSPATH_FIRST=true;
echo "HADOOP_CLASSPATH=s3n://bucket/myjar.jar" > /home/hadoop/conf/hadoop-user-env.sh
I have built my jar so that all of its dependencies are included in it. The first problem when I do this is that my enable-debugging step dies with the following error:
Exception in thread "main" java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.amazon.ws.emr.hadoop.fs.EmrFileSystem not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1895)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2427)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2440)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:88)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2479)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2461)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:372)
at com.amazon.elasticmapreduce.scriptrunner.ScriptRunner.fetchFile(ScriptRunner.java:39)
at com.amazon.elasticmapreduce.scriptrunner.ScriptRunner.main(ScriptRunner.java:56)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: java.lang.ClassNotFoundException: Class com.amazon.ws.emr.hadoop.fs.EmrFileSystem not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1801)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1893)
... 13 more
So I have two questions: what is wrong here with regard to the enable-debugging step? And is it valid to give my classpath as an S3 location? If not, what should the value of
/path/to/my.jar
be in the example on the page indicated above?
Looking at your bootstrap action, it looks like there might be a mistake in your string. The line should look like the following:
#!/bin/bash
export HADOOP_USER_CLASSPATH_FIRST=true
echo "HADOOP_CLASSPATH=/path/to/my.jar" >> /home/hadoop/conf/hadoop-user-env.sh
Note the '>>' characters. A single '>' means that you're replacing the entire file with the output of the 'echo' command, whereas a double '>>' means you're appending the line at the end of the file. Additionally, the semicolon isn't needed in a Bash script.
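A two-line demonstration of the difference, using a hypothetical file:
echo "first"  >  demo.txt   # truncates: demo.txt now contains only "first"
echo "second" >> demo.txt   # appends: demo.txt now contains both lines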
Reference: http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-hadoop-config_hadoop-user-env.sh.html
PS: Amazon's awesome support found this question and replied to my email, although the question was not asked by me. So this is attribution to the author: an AWS Support Engineer named Rendy O.