Why do my application-level logs disappear when executed in Oozie? - java

I'm using Oozie in a CDH5 environment. I'm also using the Oozie web console. I'm not able to see any of the logs from my application. I can see Hadoop logs, Spark logs, etc., but I see no application-specific logs.
In my application I've included src/main/resources/log4j.properties:
# Root logger option
log4j.rootLogger=INFO, stdout
# Direct log messages to stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.Target=System.out
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n
In my oozie workflow I have java-actions and spark-actions.
It is also important to note that when I run my application from the command line I do see my application level logs.

Oozie runs each Action in a different "launcher" job -- actually a YARN job with a single mapper (see exceptions below).
Whenever you see an "external ID" of the form job_0000000000_0000, you can reach the YARN logs for application_0000000000_0000 (yeah, "job" is the legacy naming convention from Hadoop 1, still used by the JobHistory service, while YARN uses the application_ prefix).
Your application output is actually dumped into the YARN logs for that Oozie "launcher":
your StdErr is dumped as-is and can be retrieved in the "stderr" section
your StdOut is dumped with a prefix on each line (that prefix is used by Oozie to manage its <capture_output/> trick for Shell and Pig actions) at the end of the atrociously verbose "stdout" section
and nothing gets into the "syslog" section, AFAIK
Bottom line:
run oozie job -info ****** to get the list of Actions and the corresponding "external IDs" for your Oozie workflow execution
for each job_*****_** legacy ID, run yarn logs -applicationId application_*****_** | more to skim the global YARN logs, then zoom on your specific app logs
now you can try to automate that thing... have fun B-)
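For example, a minimal shell sketch of that automation, assuming the oozie and yarn CLIs are on the PATH and the workflow job ID is passed as the first argument (the grep pattern used to pull the launcher IDs out of the oozie job -info output is an assumption and may need tweaking for your CDH version):
#!/bin/bash
# Dump the YARN logs of every launcher spawned by one Oozie workflow run.
WF_ID="$1"   # e.g. 0000042-170101000000000-oozie-oozi-W (hypothetical)
oozie job -info "$WF_ID" \
  | grep -o 'job_[0-9]*_[0-9]*' \
  | sort -u \
  | while read -r JOB_ID; do
      # translate the legacy job_ prefix into the YARN application_ prefix
      APP_ID="${JOB_ID/job_/application_}"
      echo "===== $APP_ID ====="
      yarn logs -applicationId "$APP_ID"
    done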
Exceptions to the "launcher" Oozie job principle -- the E-mail Action / Filesystem Action are just API calls executed directly from the Oozie server process; and the MapReduce Action spawns a regular YARN job with multiple Mappers and Reducers.

Related

Can writing data to console in a docker container in linux occupy disk space?

I have the following log4j.properties file in a docker container:
log4j.rootLogger=WARN,CONSOLE
log4j.logger.com.xxx.mypackage=DEBUG
# Console appender
log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
log4j.appender.CONSOLE.layout.ConversionPattern=%d %-5p [%c] %m%n
So I'm logging to the console.
Can this configuration consume disk space on Linux?
I ask because I have no space left on my device, and the space is freed after running:
docker-compose down
Yes, stdout and stderr in containers are captured as logs by Docker. By default they are stored as JSON with no size limit, alongside the container's other filesystem changes and metadata. Those filesystem changes and logs are removed when the container is deleted, and as long as the container exists the captured output can be viewed with docker logs for a given container.
To limit the size of these logs, see this answer, which describes the engine defaults and the container-specific overrides for logging.
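For reference, a minimal sketch of capping those JSON logs per container with the json-file driver's max-size / max-file options (the image name and limits are placeholders; the same options can also be set in the daemon configuration or in a compose file's logging section):
# Keep at most 3 rotated log files of 10 MB each for this container
docker run -d \
  --log-driver json-file \
  --log-opt max-size=10m \
  --log-opt max-file=3 \
  my-log4j-app:latest

# The captured stdout/stderr is still available via:
docker logs <container>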

Log4j not writing logs to file on one Websphere server and writing to file on other

I have a Maven application with a log4j.properties in it, configured to write the logs to a specified file instead of the console. When I run the EAR on one of the WebSphere servers, it creates the file as expected and writes logs to it. But when I run the same EAR on the other WebSphere server, it writes to the console instead of writing the logs to the specified file. I have checked the permissions and everything seems to be fine. Please help me identify what the issue is. Thanks in advance.
# CONSOLE APPENDER (stdout)
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.Threshold=DEBUG
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=[%d] [%t] %-5p %20c - %m%n
# ROLLING FILE APPENDER (on the file system) for memberpolicyattributesservice code
log4j.appender.xxxxService=org.apache.log4j.RollingFileAppender
log4j.appender.xxxxService.Threshold=DEBUG
log4j.appender.xxxxService.File=/var/logs/xxxServer1/xxxServiceLog.log
log4j.appender.xxxxService.layout=org.apache.log4j.PatternLayout
log4j.appender.xxxxService.layout.ConversionPattern=%d{MM-dd#HH:mm:ss} %-5p (%13F:%L) %3x - %m%n
log4j.appender.xxxxService.MaxFileSize=10000KB
log4j.appender.xxxxService.MaxBackupIndex=30
log4j.appender.xxxxService.layout=org.apache.log4j.PatternLayout
log4j.appender.xxxxService.layout.ConversionPattern=[%d] [%t] %-5p %20c - %m%n
# ROLLING FILE APPENDER (on the file system) for hiberate, open source code log files
log4j.appender.open_source_code=org.apache.log4j.RollingFileAppender
log4j.appender.open_source_code.layout=org.apache.log4j.PatternLayout
log4j.appender.open_source_code.Threshold=DEBUG
#message format:YYYY-MM-DD HH:mm:ss,ms [ThreadId] <PRIORITY> classname.message
log4j.appender.open_source_code.layout.ConversionPattern=%d [%t]<%-5p> %c.%m \r\n
#file that will be logged to
log4j.appender.open_source_code.File=/var/logs/xxxServer1/open_source_code.log
log4j.appender.open_source_code.Append=true
log4j.appender.open_source_code.MaxFileSize=1024KB
log4j.appender.open_source_code.MaxBackupIndex=5
#turn on log4j verbose mode
log4j.debug = true
# Set root logger level to INFO and its appender to DSInstrumentor,stdout.
log4j.rootLogger=DEBUG,stdout,xxxxService
# YOUR CATEGORIES (to customize logging per class/pkg/project/etc)
log4j.category.fatal=FATAL,xxxxService
log4j.category.error=ERROR,xxxxService
#This will also enable the logging for all the children (packages and classes) of this package
log4j.logger.com.xxxxx=ALL,xxxxService
# Print only messages of level INFO in the open source code
log4j.logger.org=INFO,open_source_code
Your root logger is set to DEBUG with two appenders (stdout and xxxxService), and the file appenders are bound to a hard-coded path that may only exist on one of your servers:
log4j.appender.open_source_code.File=/var/logs/xxxServer1/open_source_code.log
Make sure that path is valid for each server in your WAS cluster.
As a side note, you should probably avoid using DEBUG and stdout on a remote server. That is fine for local development on your workstation, but not on a remote box. Instead, provide different log4j properties for your various deployment tiers. This lets you customize the log location or appender (say c:\temp or CONSOLE on your desktop, but /var/logs/... on all remote machines) as well as your log levels (DEBUG for desktop, maybe INFO for QA or Staging, and WARN or ERROR for Production).
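As a rough sketch of that per-tier setup with log4j 1.x, each JVM can be pointed at its own configuration file through the log4j.configuration system property (in WebSphere this would go into the server's generic JVM arguments; the file names and paths below are hypothetical):
# Desktop / Eclipse run configuration
-Dlog4j.configuration=file:///C:/temp/log4j-dev.properties

# Each remote WebSphere server (QA, Staging, Production) gets its own file
-Dlog4j.configuration=file:///opt/myapp/conf/log4j-prod.properties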

How to specify the location of custom log4j.configuration when spark-submit to Amazon EMR?

I am trying to run a Spark job on an EMR cluster.
In my spark-submit I have added configs to read from log4j.properties:
--files log4j.properties --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:/log4j.properties"
Also I have added
log4j.rootLogger=INFO, file
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=/log/test.log
log4j.appender.file.MaxFileSize=10MB
log4j.appender.file.MaxBackupIndex=10
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %5p %c{7} - %m%n
in my log4j configuration.
However, I see the logs in the console but I don't see the log file being generated. What am I doing wrong here?
Quoting spark-submit --help:
--files FILES Comma-separated list of files to be placed in the working directory of each executor. File paths of these files in executors can be accessed via SparkFiles.get(fileName).
That doesn't say much about what to do with the FILES when you cannot use SparkFiles.get(fileName) (which you cannot for log4j).
Quoting SparkFiles.get's scaladoc:
Get the absolute path of a file added through SparkContext.addFile().
That does not give you much either, but it suggests having a look at the source code of SparkFiles.get:
def get(filename: String): String =
  new File(getRootDirectory(), filename).getAbsolutePath()
The nice thing about it is that getRootDirectory() uses an optional property or just the current working directory:
def getRootDirectory(): String =
  SparkEnv.get.driverTmpDir.getOrElse(".")
That gives us something to work on, doesn't it?
On the driver, the so-called driverTmpDir directory should be easy to find in the Environment tab of the web UI (under Spark Properties for the spark.files property, or in Classpath Entries marked as "Added By User" in the Source column).
On executors, I'd assume a local directory so rather than using file:/log4j.properties I'd use
-Dlog4j.configuration=file://./log4j.properties
or
-Dlog4j.configuration=file:log4j.properties
Note the dot to specify the local working directory (in the first option) or no leading / (in the latter).
Don't forget about spark.driver.extraJavaOptions to set the Java options for the driver if that's something you haven't thought about yet. You've been focusing on executors only so far.
You may want to add -Dlog4j.debug=true to spark.executor.extraJavaOptions, which is supposed to print the locations log4j uses to find log4j.properties.
I have not checked that answer on an EMR or YARN cluster myself, but I believe it may have given you some hints on where to find the answer. Fingers crossed!
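Putting those hints together, a hedged spark-submit sketch (the class name and jar are placeholders; whether file:log4j.properties resolves against the container's working directory depends on the cluster manager, which is exactly what -Dlog4j.debug=true should reveal):
spark-submit \
  --class com.example.MyApp \
  --files log4j.properties \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.debug=true -Dlog4j.configuration=file:log4j.properties" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.debug=true -Dlog4j.configuration=file:log4j.properties" \
  my-app.jar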
With a Spark 2.2.0 standalone cluster, the executor JVM is started first, and only then does Spark distribute the application jar and --files.
Which means passing
spark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j-spark.xml
does not make sense, as this file does not exist yet (has not been downloaded) at executor JVM launch time, when log4j is initialized.
If you pass
spark.executor.extraJavaOptions=-Dlog4j.debug -Dlog4j.configuration=file:log4j-spark.xml
you will find, at the beginning of the executor's stderr, a failed attempt to load the log4j config file:
log4j:ERROR Could not parse url [file:log4j-spark.xml].
java.io.FileNotFoundException: log4j-spark.xml (No such file or directory)
...
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
And a bit later, the download of the --files from the driver is logged:
18/07/18 17:24:12 INFO Utils: Fetching spark://123.123.123.123:35171/files/log4j-spark.xml to /ca/tmp-spark/spark-49815375-3f02-456a-94cd-8099a0add073/executor-7df1c819-ffb7-4ef9-b473-4a2f7747237a/spark-0b50a7b9-ca68-4abc-a05f-59df471f2d16/fetchFileTemp5898748096473125894.tmp
18/07/18 17:24:12 INFO Utils: Copying /ca/tmp-spark/spark-49815375-3f02-456a-94cd-8099a0add073/executor-7df1c819-ffb7-4ef9-b473-4a2f7747237a/spark-0b50a7b9-ca68-4abc-a05f-59df471f2d16/-18631083971531927447443_cache to /opt/spark-2.2.0-bin-hadoop2.7/work/app-20180718172407-0225/2/./log4j-spark.xml
It may work differently with YARN or another cluster manager, but with a standalone cluster it seems there is no way to specify your own logging configuration for the executors via spark-submit.
You can dynamically reconfigure log4j in your job code (overriding the log4j configuration programmatically, e.g. the file location for a FileAppender), but you would need to do it carefully in some mapPartitions lambda that is executed in the executor's JVM. Or maybe you can dedicate the first stage of your job to it. All that sucks, though...
Here is the complete command I used to run my uber-jar on EMR, and I see log files generated on the driver and executor nodes.
spark-submit --class com.myapp.cloud.app.UPApp --master yarn --deploy-mode client --driver-memory 4g --executor-memory 2g --executor-cores 8 --files log4j.properties --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties" --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties -Dlog4j.debug=true" --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties" --conf "spark.eventLog.dir=/mnt/var/log/" uber-up-0.0.1.jar
where log4j.properties is in my local filesystem.

log4j: logging works when package inside app, but not when package contributed by a JAR

This isn't just a rehash of how to do package-specific logging. I think I've got that nailed. It's about logging not being done when the package is moved out to a JAR shared with others.
Originally, I wrote some special logging into my web application which worked, putting special statements into a file separate from catalina.out. Later, it seemed useful to share this facility with other, cooperating web applications, so we moved the package, consisting of one interface and one class, to a utilities JAR. While the information-gathering continued successfully, the output to the "special" log file, separate from catalina.out, ceased.
When I relocate a copy of the package back into my application, the logging begins working again as originally designed.
The package name remains the same no matter whether it's physically part of my application code or linked in from a consumed library JAR.
I'm struggling to understand why a class linked from a JAR is seemingly excluded from the logging scheme while it works flawlessly when it is simply part of the application code.
Thank you for taking the time to look at this. Any suggestions would be greatly appreciated.
Here's log4j.properties:
# --------------------------------------------------------------------------------------------
# 1) Standard Tomcat log (tomcat): /var/log/tomcat6/catalina.out
# 2) Splunkable output log (splunkable): /var/log/tomcat6/myapp.log.
# 3) Console used when debugging myapp from Eclipse.
log4j.rootLogger=TRACE,tomcat,console
# Rolling-file appender for Tomcat -----------------------------------------------------------
log4j.appender.tomcat.Threshold=INFO
log4j.appender.tomcat=org.apache.log4j.DailyRollingFileAppender
log4j.appender.tomcat.DatePattern='.'yyy-MM-dd
log4j.appender.tomcat.File=${catalina.home}/logs/catalina.out
log4j.appender.tomcat.file.MaxFileSize=500Mb
log4j.appender.tomcat.file.MaxBackupIndex=5
log4j.appender.tomcat.layout=org.apache.log4j.PatternLayout
log4j.appender.tomcat.layout.ConversionPattern=%d{ABSOLUTE} %5p %c{1}:%L - %m%n
# Splunkable log -----------------------------------------------------------------------------
log4j.appender.splunkable.Threshold=TRACE
log4j.appender.splunkable=org.apache.log4j.RollingFileAppender
log4j.appender.splunkable.File=${catalina.home}/logs/myapp.log
log4j.appender.splunkable.MaxFileSize=1Gb
log4j.appender.splunkable.MaxBackupIndex=7
log4j.appender.splunkable.layout=org.apache.log4j.PatternLayout
log4j.appender.splunkable.layout.ConversionPattern=%d{ABSOLUTE} %5p %c{1}: %m%n
# Set the additivity flag to false to avoid propagating the rather verbose splunkable
# output to the normal Tomcat and console logs. Is this mere superstition?
log4j.category.com.acme.web.myapp.logging=TRACE,splunkable
log4j.additivity.com.acme.web.myapp.logging=false
# This logs output to the Eclipse console when running in that mode --------------------------
log4j.appender.console.Threshold=TRACE
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.SimpleLayout
log4j.appender.console.Target=System.out
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{ABSOLUTE} %5p %c{1}:%L - %m%n

log4j: How to use SocketAppender?

I've got an answer about how to use SocketAppender (I need it to gather logs from a distributed system), but I am new to log4j and I have no idea how to use that sample code.
Probably I should have a log4j-server.properties like this:
log4j.appender.SERVER=org.apache.log4j.net.SocketAppender
log4j.appender.SA.Port=4712
log4j.appender.SA.RemoteHost=loghost
log4j.appender.SA.ReconnectionDelay=10000
But I still don't know how to start the server (i.e. how to use this line):
org.apache.log4j.net.SimpleSocketServer 4712 log4j-server.properties
And, most importantly:
Where/how can I see my logs?
You can run the server using
java -classpath log4j.jar org.apache.log4j.net.SimpleSocketServer 4712 log4j-server.properties
The SimpleSocketServer receives logging events sent to the specified port number by the remote SocketAppender, and logs them as if they were generated locally, according to the configuration you supply in log4j-server.properties. It's up to you to configure the relevant console/file/rolling file appenders and attach them to the relevant loggers just as you would if you were doing the logging directly in the original process rather than piping the log events over a network socket. I.e. if you're currently creating local log files with something like:
log4j.rootLogger=DEBUG, file
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=logfile.log
log4j.appender.file.MaxFileSize=1MB
log4j.appender.file.MaxBackupIndex=1
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=[%d] [%t] [%m]%n
then you would change it so that the sending side log4j.properties simply says
log4j.rootLogger=DEBUG, server
log4j.appender.server=org.apache.log4j.net.SocketAppender
log4j.appender.server.Port=4712
log4j.appender.server.RemoteHost=loghost
log4j.appender.server.ReconnectionDelay=10000
and the server-side log4j-server.properties contains the definitions that were previously on the sending side:
log4j.rootLogger=DEBUG, file
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=logfile.log
log4j.appender.file.MaxFileSize=1MB
log4j.appender.file.MaxBackupIndex=1
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=[%d] [%t] [%m]%n
In particular, note that there's no point specifying a layout on the SocketAppender on the sending side: what goes over the network is the whole logging event object, and it's the receiving side that is responsible for doing the layout.
To start the server, simply type the command below in a command prompt and the server will be up and running:
java -classpath C:\Users\<user>\.m2\repository\log4j\log4j\1.2.17\log4j-1.2.17.jar org.apache.log4j.net.SimpleSocketServer 4712 log4j-server.properties
Please do not forget to use the correct path to log4j.jar on your system.
