I am trying to run the typical first Flume example: fetching tweets and storing them in HDFS using Apache Flume.
[Hadoop version 3.1.3; Apache Flume 1.9.0]
I have configured flume-env.sh:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre/
Configured the agent as shown in the TwitterStream.properties config file:
# Naming the components on the current agent.
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS
# Describing/Configuring the source
TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.consumerKey = **********************
TwitterAgent.sources.Twitter.consumerSecret = **********************
TwitterAgent.sources.Twitter.accessToken = **********************
TwitterAgent.sources.Twitter.accessTokenSecret = **********************
TwitterAgent.sources.Twitter.keywords = tutorials point, java, bigdata, mapreduce, mahout, hbase, nosql
# Describing/Configuring the sink
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:9000/user/twitter_data/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
# Describing/Configuring the channel
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100
# Binding the source and sink to the channel
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sinks.HDFS.channel = MemChannel
And then running the command:
bin/flume-ng agent -c /home/jiglesia/hadoop/flume/conf/ -f TwitterStream.properties -n TwitterAgent -Dflume.root.logger=INFO, console -n TwitterAgent
I get the following error during execution:
Info: Sourcing environment configuration script /home/jiglesia/hadoop/flume/conf/flume-env.sh
Info: Including Hadoop libraries found via (/home/jiglesia/hadoop/bin/hadoop) for HDFS access
/home/jiglesia/hadoop/libexec/hadoop-functions.sh: line 2360: HADOOP_ORG.APACHE.FLUME.TOOLS.GETJAVAPROPERTY_USER: bad substitution
/home/jiglesia/hadoop/libexec/hadoop-functions.sh: line 2455: HADOOP_ORG.APACHE.FLUME.TOOLS.GETJAVAPROPERTY_OPTS: bad substitution
Info: Including Hive libraries found via () for Hive access
I don't know why it reports "bad substitution".
Here is the entire log, in case it tells you anything:
+ exec /usr/lib/jvm/java-8-openjdk-amd64/jre//bin/java -Xmx20m -Dflume.root.logger=INFO, -cp '/home/jiglesia/hadoop/flume/conf:/home/jiglesia/hadoop/flume/lib/*:/home/jiglesia/hadoop/etc/hadoop:/home/jiglesia/hadoop/share/hadoop/common/lib/*:/home/jiglesia/hadoop/share/hadoop/common/*:/home/jiglesia/hadoop/share/hadoop/hdfs:/home/jiglesia/hadoop/share/hadoop/hdfs/lib/*:/home/jiglesia/hadoop/share/hadoop/hdfs/*:/home/jiglesia/hadoop/share/hadoop/mapreduce/lib/*:/home/jiglesia/hadoop/share/hadoop/mapreduce/*:/home/jiglesia/hadoop/share/hadoop/yarn:/home/jiglesia/hadoop/share/hadoop/yarn/lib/*:/home/jiglesia/hadoop/share/hadoop/yarn/*:/lib/*' -Djava.library.path=:/home/jiglesia/hadoop/lib/native org.apache.flume.node.Application -f TwitterStream.properties -n TwitterAgent console -n TwitterAgent
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/jiglesia/hadoop/flume/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/jiglesia/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
log4j:WARN No appenders could be found for logger (org.apache.flume.node.Application).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
The environment variables configured in the bashrc file:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre/
export HADOOP_HOME=/home/jiglesia/hadoop
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
Thanks for your help !
You've referenced bash variables incorrectly.
Try this
Note: I'd suggest not putting flume as a subdirectory of Hadoop.
I recommend using Apache Ambari to install and configure Hadoop and Flume processes
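Two further details in the question's output can be checked mechanically. The Python sketch below is only an illustration and is not part of Flume or Hadoop; the helper name is_shell_identifier is hypothetical. First, the name in the "bad substitution" message appears to be derived from the Flume class name org.apache.flume.tools.GetJavaProperty, and a name containing dots is not a valid shell variable name. Second, the space after "INFO," in the command makes the shell pass "console" as a separate argument, which matches the exec line in the log.

```python
import re
import shlex


def is_shell_identifier(name):
    """True if name is a legal shell variable name (letters, digits, underscore)."""
    return re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name) is not None


# Name copied verbatim from the error message in the question.
reported = "HADOOP_ORG.APACHE.FLUME.TOOLS.GETJAVAPROPERTY_USER"
print(is_shell_identifier(reported))       # the dots make this an invalid name
print(is_shell_identifier("HADOOP_OPTS"))  # an ordinary variable name

# Tail of the command as typed in the question (space after "INFO,").
typed = shlex.split("-n TwitterAgent -Dflume.root.logger=INFO, console")
# The same option written as a single shell word.
joined = shlex.split("-n TwitterAgent -Dflume.root.logger=INFO,console")
print(typed[-2:])   # the option and a stray "console" argument
print(joined[-1:])  # one argument carrying both logger settings
```

If that reading is right, writing the option as -Dflume.root.logger=INFO,console (no space) should remove the stray "console" argument. Whether the "bad substitution" lines actually affect the agent is not established here.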
Regarding the Log4j JNDI remote code execution vulnerability identified as CVE-2021-44228 (also see the references), I wondered whether Log4j v1.2 is also affected, but the closest thing I found in a source code review is the JMSAppender.
Posts on the Internet indicate that Log4j 1.2 is also vulnerable, but I am not able to find the relevant source code for it.
Am I missing something that others have identified?
Log4j 1.2 appears to have a vulnerability in the socket-server class, but my understanding is that it has to be enabled in the first place to be exploitable. It is therefore not a passive threat, unlike the JNDI-lookup vulnerability, which the newly identified one appears to be.
Is my understanding correct that Log4j v1.2 is not vulnerable to the JNDI remote code execution bug?
Apache Log4j Security Vulnerabilities
Zero-day in ubiquitous Log4j tool poses a grave threat to the Internet
Worst Apache Log4j RCE Zero day Dropped on Internet
‘Log4Shell’ vulnerability poses critical threat to applications using ‘ubiquitous’ Java logging package Apache Log4j
This blog post from Cloudflare makes the same point as AKX: the vulnerable feature was introduced in Log4j 2.
Update #1 - A fork of the (now-retired) apache-log4j-1.2.x, with fixes for a few vulnerabilities identified in the older library, is now available from the original log4j author. The site is https://reload4j.qos.ch/. As of 21-Jan-2022 a version has been released. Vulnerabilities addressed to date include those pertaining to JMSAppender, SocketServer and Chainsaw. Note that I am simply relaying this information; I have not verified the fixes myself. Please refer to the link for additional details.
The JNDI feature was added into Log4j 2.0-beta9.
Log4j 1.x thus does not have the vulnerable code.
While Log4j 1.x is not affected by the exact same Log4Shell issue, the Apache Log4j team recommends removing JMSAppender and SocketServer from your JAR files; SocketServer has a vulnerability tracked as CVE-2019-17571.
You can use the zip command to remove the affected classes. Replace the filename/version with yours:
zip -d log4j-1.2.16.jar org/apache/log4j/net/JMSAppender.class
zip -d log4j-1.2.16.jar org/apache/log4j/net/SocketServer.class
You can look through the files in your jar using less and grep, e.g. less log4j-1.2.16.jar | grep JMSAppender
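As an alternative to less and grep, the same check can be sketched with Python's standard zipfile module, since a jar is a zip archive. The function name find_suspect_classes and the SUSPECT_CLASSES tuple are hypothetical; the two entries are simply the classes named above.

```python
import zipfile

# Entries named in the answer above; extend the tuple if you track other classes.
SUSPECT_CLASSES = (
    "org/apache/log4j/net/JMSAppender.class",
    "org/apache/log4j/net/SocketServer.class",
)


def find_suspect_classes(jar_path, suspects=SUSPECT_CLASSES):
    """Return the suspect entries that are present in the given jar."""
    with zipfile.ZipFile(jar_path) as jar:
        names = set(jar.namelist())
    return [entry for entry in suspects if entry in names]
```

Calling find_suspect_classes("log4j-1.2.16.jar") returns an empty list once both classes have been removed. The sketch only reads the archive; it does not modify it.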
That being said, Apache recommends that you upgrade to the 2.x version if possible. According to their security page:
Please note that Log4j 1.x has reached end of life and is no longer supported. Vulnerabilities reported after August 2015 against Log4j 1.x were not checked and will not be fixed. Users should upgrade to Log4j 2 to obtain security fixes.
In addition to giraffesyo's answer, and in case it helps anyone, I wrote this Bash script. It removes the classes identified as vulnerable (link here to Log4j dev thread) and sets properties files to read-only, as suggested here on a Red Hat Bugzilla thread.
Note 1 - it does not check whether these classes are used in any properties files; it is purely a way to find and remove them. Use at your own risk!
Note 2 - it depends on zip and unzip being installed.
#!/bin/bash
# Classes to be searched for/removed.
# NOTE: the original list is not preserved in this post; the two entries below are
# an assumption taken from the classes named in the answer above.
CLASSES="org/apache/log4j/net/JMSAppender.class org/apache/log4j/net/SocketServer.class"
PROGNAME=$(basename "$0")
PROGPATH=$(echo "$0" | sed -e 's,[\\/][^\\/][^\\/]*$,,')
usage () {
    echo >&2 "Usage: ${PROGNAME} DIR [APPLY]"
    echo >&2 "Where DIR is the starting directory for find"
    echo >&2 "and APPLY = \"Y\" - to perform purification"
    exit 1
}
# Positional parameters, as implied by the usage text
DIR=$1
APPLY=$2
# Force upper case on Apply
APPLY=$(echo "${APPLY}" | tr '[:lower:]' '[:upper:]')
# Default Apply to N
if [ "$APPLY" = "" ] ; then
    APPLY="N"
fi
# Check parameters
if [ "$DIR" = "" ] ; then
    usage
fi
echo "$APPLY" | grep -q -i -e '^Y$' -e '^N$' || usage
# Search for log4j jar files - for class file removal
FILES=$(find "$DIR" -name '*log4j*jar')
for f in $FILES
do
    echo "Checking Jar [$f]"
    for jf in $CLASSES
    do
        unzip -v "$f" | grep -e "$jf"
        if [ "$APPLY" = "Y" ]
        then
            echo "Deleting $jf from $f"
            zip -d "$f" "$jf"
        fi
    done
done
# Search for Log4j properties files - for read-only setting
PFILES=$(find "$DIR" -name '*log4j*properties')
for f in $PFILES
do
    echo "Checking permissions [$f]"
    if [ "$APPLY" = "Y" ]
    then
        echo "Changing permissions on $f"
        chmod 444 "$f"
    fi
    ls -l "$f"
done
I read on Git that I should run this:
To run as a javaagent download the jar and run:
java -javaagent:./jmx_prometheus_javaagent-0.14.0.jar=8080:config.yaml -jar yourJar.jar
but I didn't understand it. Should I also download the file for the Kubernetes container?
Can someone explain how I can continue from here?
I have the following default configuration:
hostPort: localhost:5555
rules:
- pattern: ".*"
I am trying to run a spark job in EMR cluster.
In my spark-submit I have added configs to read from log4j.properties:
--files log4j.properties --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:/log4j.properties"
Also I have added
log4j.rootLogger=INFO, file
log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %5p %c{7} - %m%n
in my log4j configurations.
However, I see the logs in the console but no log file is generated. What am I doing wrong here?
Quoting spark-submit --help:
--files FILES Comma-separated list of files to be placed in the working directory of each executor. File paths of these files in executors can be accessed via SparkFiles.get(fileName).
That doesn't say much about what to do with the FILES if you cannot use SparkFiles.get(fileName) (which you cannot for log4j).
Quoting SparkFiles.get's scaladoc:
Get the absolute path of a file added through SparkContext.addFile().
That does not give you much either, but it suggests having a look at the source code of SparkFiles.get:
def get(filename: String): String =
new File(getRootDirectory(), filename).getAbsolutePath()
The nice thing about it is that getRootDirectory() uses an optional property or just the current working directory:
def getRootDirectory(): String =
SparkEnv.get.driverTmpDir.getOrElse(".")
That gives us something to work on, doesn't it?
On the driver, the so-called driverTmpDir directory should be easy to find in the Environment tab of the web UI (under Spark Properties for the spark.files property, or under Classpath Entries marked as "Added By User" in the Source column).
On executors, I'd assume a local directory so rather than using file:/log4j.properties I'd use
Note the dot to specify the local working directory (in the first option) or no leading / (in the latter).
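The difference between these URL forms can be illustrated with a small Python sketch. It assumes the two options were of the forms file:./log4j.properties and file:log4j.properties (the original snippet is not preserved here), and it uses Python's URL parser only as a stand-in; how log4j and the JVM resolve such URLs is not verified by this code. The helper name config_path is hypothetical.

```python
import posixpath
from urllib.parse import urlparse


def config_path(url):
    """Return the path part of a file: URL and whether that path is absolute."""
    path = urlparse(url).path
    return path, posixpath.isabs(path)


for url in ("file:/log4j.properties",    # form used in the question
            "file:./log4j.properties",   # assumed first option (leading dot)
            "file:log4j.properties"):    # assumed second option (no leading /)
    print(url, config_path(url))
```

Only the first form yields an absolute path at the filesystem root; the other two are relative, so they would be looked up from the process's working directory, which is where --files is documented to place the file.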
Don't forget about spark.driver.extraJavaOptions to set the Java options for the driver if that's something you haven't thought about yet. You've been focusing on executors only so far.
You may want to add -Dlog4j.debug=true to spark.executor.extraJavaOptions that is supposed to print what locations log4j uses to find log4j.properties.
I have not checked that answer on an EMR or YARN cluster myself, but I believe it may give you some hints about where to find the answer. Fingers crossed!
With a Spark 2.2.0 standalone cluster, the executor JVM is started first, and only then does Spark distribute the application jar and the --files.
Which means passing
does not make sense, as this file does not exist yet (it has not been downloaded) at the time the executor JVM is launched and log4j is initialized.
If you pass
spark.executor.extraJavaOptions=-Dlog4j.debug -Dlog4j.configuration=file:log4j-spark.xml
you will find, at the beginning of the executor's stderr, a failed attempt to load the log4j config file:
log4j:ERROR Could not parse url [file:log4j-spark.xml].
java.io.FileNotFoundException: log4j-spark.xml (No such file or directory)
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
A bit later, the download of the --files from the driver is logged:
18/07/18 17:24:12 INFO Utils: Fetching spark:// to /ca/tmp-spark/spark-49815375-3f02-456a-94cd-8099a0add073/executor-7df1c819-ffb7-4ef9-b473-4a2f7747237a/spark-0b50a7b9-ca68-4abc-a05f-59df471f2d16/fetchFileTemp5898748096473125894.tmp
18/07/18 17:24:12 INFO Utils: Copying /ca/tmp-spark/spark-49815375-3f02-456a-94cd-8099a0add073/executor-7df1c819-ffb7-4ef9-b473-4a2f7747237a/spark-0b50a7b9-ca68-4abc-a05f-59df471f2d16/-18631083971531927447443_cache to /opt/spark-2.2.0-bin-hadoop2.7/work/app-20180718172407-0225/2/./log4j-spark.xml
It may work differently with YARN or another cluster manager, but with a standalone cluster there seems to be no way to specify your own logging configuration for executors on spark-submit.
You can dynamically reconfigure log4j in your job code (override the log4j configuration programmatically, e.g. the file location for a FileAppender), but you would need to do it carefully in some mapPartitions lambda that is executed in the executor's JVM. Or maybe you can dedicate the first stage of your job to it. All that sucks though...
Here is the complete command I used to run my uber-jar in EMR and I see log files generated in driver and executor nodes.
spark-submit --class com.myapp.cloud.app.UPApp \
  --master yarn --deploy-mode client \
  --driver-memory 4g --executor-memory 2g --executor-cores 8 \
  --files log4j.properties \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties -Dlog4j.debug=true" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties" \
  --conf "spark.eventLog.dir=/mnt/var/log/" \
  uber-up-0.0.1.jar
where log4j.properties is in my local filesystem.
We're trying to debug the behaviour of svnkit during checkout on Linux (Debian) that is related to a JDK/JVM bug described here
We followed the steps described here
by renaming the file conf/logging.properties.disabled to logging.properties and by also replacing the /usr/lib/jvm/java-1.8.0-openjdk-amd64/jre/lib/logging.properties (which symlinks to /etc/java-8-openjdk/logging.properties)
This produces a log file under Windows (in the bin folder of the svnkit standalone), but has no effect under Linux. Running
./jsvn checkout --username XYZ --password ABC http://SVN_SERVER/svn/project/trunk/
makes ps aux report:
/usr/lib/jvm/java-1.8.0-openjdk-amd64/bin/java -Djava.util.logging.config.file=/tmp/svnkit-1.8.15/conf/logging.properties -Dsun.io.useCanonCaches=false -classpath /tmp/svnkit-1.8.15/lib/svnkit-1.8.15.jar:/tmp/svnkit-1.8.15/lib/sequence-library-1.0.3.jar:/tmp/svnkit-1.8.15/lib/sqljet-1.1.10.jar:/tmp/svnkit-1.8.15/lib/jna-4.1.0.jar:/tmp/svnkit-1.8.15/lib/jna-platform-4.1.0.jar:/tmp/svnkit-1.8.15/lib/trilead-ssh2-1.0.0-build221.jar:/tmp/svnkit-1.8.15/lib/jsch.agentproxy.connector-factory-0.0.7.jar:/tmp/svnkit-1.8.15/lib/jsch.agentproxy.svnkit-trilead-ssh2-0.0.7.jar:/tmp/svnkit-1.8.15/lib/antlr-runtime-3.4.jar:/tmp/svnkit-1.8.15/lib/jsch.agentproxy.core-0.0.7.jar:/tmp/svnkit-1.8.15/lib/jsch.agentproxy.usocket-jna-0.0.7.jar:/tmp/svnkit-1.8.15/lib/jsch.agentproxy.usocket-nc-0.0.7.jar:/tmp/svnkit-1.8.15/lib/jsch.agentproxy.sshagent-0.0.7.jar:/tmp/svnkit-1.8.15/lib/jsch.agentproxy.pageant-0.0.7.jar:/tmp/svnkit-1.8.15/lib/svnkit-cli-1.8.15.jar org.tmatesoft.svn.cli.SVN checkout --username XYZ --password ABC http://SVN_SERVER/svn/project/trunk/
The -Djava.util.logging.config.file=/tmp/svnkit-1.8.15/conf/logging.properties option is the part that tells us the renamed logging.properties file is being used to configure the logging.
The content of the logging.properties is
handlers = java.util.logging.FileHandler
java.util.logging.FileHandler.pattern = svnkit.%u.log
java.util.logging.FileHandler.limit = 0
java.util.logging.FileHandler.count = 1
java.util.logging.FileHandler.append = true
java.util.logging.FileHandler.formatter = java.util.logging.SimpleFormatter
Any ideas what we are doing wrong?
Can anyone help me integrate Storm with ZooKeeper on Windows?
I tried to find good installation steps for production on Windows, but I couldn't.
Right now I have installed a standalone ZooKeeper and I am trying to configure it in storm.yaml.
Sample configuration I tried:
storm.zookeeper.servers:
- ""
- "server2"
storm.zookeeper.port: 2180
nimbus.host: "localhost"
If anybody knows, please help me.
Please follow the steps below for a complete installation of Storm with ZooKeeper on Windows.
1. Create a zoo.cfg file under the /conf directory of ZooKeeper.
Here are the configuration entries:
# The number of milliseconds of each tick
# The number of ticks that the initial
# synchronization phase can take
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
# the directory where the snapshot is stored.
# the port at which the clients will connect
2. Run ZooKeeper by executing zkServer.bat in the /bin directory.
3. Create the storm.yaml file in the /conf directory of apache-storm.
Here are the configuration entries:
########### These MUST be filled in for a storm configuration
storm.zookeeper.servers:
- "localhost"
# - 6700
# - 6701
# - 6702
# - 6703
storm.local.dir: D:\\workspace\\storm-data
nimbus.host: "localhost"
4. Run the commands below from the command prompt:
a. bin\storm nimbus
b. bin\storm supervisor
c. bin\storm ui
5. Open localhost:8080 in a browser to see the Storm UI.
6. Make sure your JAVA_HOME path does not contain any spaces; otherwise the storm command will not run.
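The JAVA_HOME condition in step 6 can be checked before starting the daemons. The following Python sketch is one possible way to do it; the function name java_home_problem is hypothetical and is not part of Storm.

```python
import os


def java_home_problem(environ=None):
    """Return a short message if JAVA_HOME is unset or contains a space, else None."""
    env = os.environ if environ is None else environ
    java_home = env.get("JAVA_HOME", "")
    if not java_home:
        return "JAVA_HOME is not set"
    if " " in java_home:
        return "JAVA_HOME contains a space: " + java_home
    return None


# Check the environment of the current process.
print(java_home_problem())
```

A path under "C:\Program Files" would be flagged by this check, while a path without spaces would pass.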