Why does a Java action's System.out disappear when executed in Oozie? - java

I'm using an Oozie environment. After the job completes successfully, I can't find the System.out.println output in the Oozie log. I googled for many hours and found this,
but without result. From the Oozie web console I got the job id "0000011-180801114827014-oozie-oozi-W", then I tried to get more information about the job using the following command:
oozie job -oozie http://localhost:11000/oozie/ -info 0000011-180801114827014-oozie-oozi-W
From that I got the externalId "16546" of the JobCompleted action, and I think the job id is 180801114827014. Finally I tried to get the log of the java action using the following command:
yarn logs -applicationId application_180801114827014_16546
Where am I going wrong? Any suggestions?
Edit
I checked whether log aggregation was enabled, and it seems that it is.
So, where am I going wrong?

I can say from experience that stdout is not removed from any YARN action; however, the encouraged way to log information in your applications is to use Log4j, which goes to syslog, not stdout (or stderr).
However, as your terminal says, YARN log aggregation needs to be enabled and completed for you to see the logs via the yarn logs command.
And if that command doesn't work otherwise, go to the Oozie UI, to the job action, or directly to the YARN UI and search for the action, then find the logs link from there.
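As a minimal sketch of the difference, assuming a hypothetical main class for the <java> action (class name and messages are made up):
import org.apache.log4j.Logger;
public class MyJavaAction {
    private static final Logger LOG = Logger.getLogger(MyJavaAction.class);
    public static void main(String[] args) {
        // Goes to the container's syslog, visible once log aggregation has run
        LOG.info("java action started");
        // Goes to the container's stdout, not the Oozie server log
        System.out.println("plain stdout line");
    }
}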

Related

Hadoop Single Node Cluster - Process Not Running

I have this project to do with Hadoop, and I have installed Hadoop just as described here: https://www.codeproject.com/Articles/757934/Apache-Hadoop-for-Windows-Platform. I'm trying to run the same map-reduce job, Recipe.java, on the dataset recipeitems-latest.json.
I have created a .jar file from this Recipe.java code, and I've started YARN and DFS. I have also created the directory /in and copied recipeitems-latest.json to it.
Now, I start the job by calling:
hadoop jar c:\Hwork\Recipe.jar Recipe /in /out
The job starts and says it is running, but no progress is made, as you can see here: https://i.stack.imgur.com/QSifC.png
I also tried tracking the job by clicking on the given link; its status is ACCEPTED but the progress bar shows nothing.
I started using Hadoop only a day ago and I really don't know what is going wrong. Why is there no progress in the job I started?
The problem is resolved. Apparently the EOL characters in \sbin\start-yarn (as well as in \bin\hadoop.cmd) must be changed from '\n' to '\r\n'. After that it worked like a charm!

Hadoop log4j cannot find KafkaLog4JAppender.class

I added KafkaLog4JAppender functionality to my MR job.
Locally the job runs and sends the formatted logs to my Kafka cluster.
When I try to run it from the YARN server, using:
jar [jar-name].jar [DriverClass].class [job-params] -Dlog4j.configuration=log4j.xml -libjars
I get the following exception:
log4j:ERROR Could not create an Appender. Reported error follows.
java.lang.ClassNotFoundException: kafka.producer.KafkaLog4jAppender
The KafkaLog4JAppender class is on the classpath.
Running
jar tvf [my-jar].jar | grep KafkaLog4J
finds the class.
I'm kind of lost and would appreciate any helpful input.
Thanks in advance!
If it works in local mode but not in YARN/distributed mode, it could be a problem of the jar not being distributed properly. You might want to check Using third-party jars and files in your MapReduce application (Distributed cache) for details on how to distribute the jar containing KafkaLog4jAppender.class.
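One related thing worth checking, as an assumption since the driver code isn't shown: -libjars is only honoured when the driver parses its arguments through GenericOptionsParser, for example via ToolRunner. A minimal sketch of such a driver (class name is hypothetical):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
public class KafkaLoggingDriver extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        // Build and submit the job using getConf(), which already contains
        // the -libjars and -D options parsed by ToolRunner.
        return 0;
    }
    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new KafkaLoggingDriver(), args));
    }
}
With a driver like that, the jar passed via -libjars is shipped through the distributed cache onto the task classpath.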

Could some one detail the Flume Command?

Could someone give me a detailed description of the Flume command below, which is used to run a conf file?
bin/flume-ng agent --conf-file netcat_flume.conf --name a1
-Dflume.root.logger=INFO,console
To my knowledge:
--conf-file -> specifies the configuration file name, i.e. tells Flume which file we need to run.
--name -> specifies the agent name.
But what does the option below do?
-Dflume.root.logger=INFO,console
Thanks in advance for your help.
It's a Log4j property, which is explained in detail below.
INFO means output only informational messages that highlight the progress of the application at a coarse-grained level.
console means write the Log4j output to the console. Other options are writing to a file or to a database.
-Dflume.root.logger=INFO,console
The statement above writes coarse-grained (INFO-level) logs of the Flume execution to the console.
The shell script flume-ng accepts the arguments and finally runs a command like:
java -Xmx20m -Dflume.root.logger=INFO,console -cp '=:/home/scy/apache-flume-1.4.0-bin/lib/*:/home/scy/apache-flume-1.4.0-bin/conf:/home/scy/jdk1.6.0_45/lib/tools.jar' -Djava.library.path= org.apache.flume.node.Application --conf-file conf/example.conf --name agent1 conf org.apache.flume.node
Let's look at the source code of org.apache.flume.node.Application.main(String[] args):
PropertiesFileConfigurationProvider configurationProvider =
    new PropertiesFileConfigurationProvider(agentName, configurationFile);
Here the class PropertiesFileConfigurationProvider accepts agentName and configurationFile, which are specified by --name and --conf-file.
Then application.start() runs all sources, channels and sinks.
As for -Dflume.root.logger=INFO,console, let's look at flume/log4j.properties:
flume.root.logger=INFO,LOGFILE
flume.root.logger will be overridden by -Dflume.root.logger=INFO,console, which means all INFO-level logs go to the console.
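As a small illustration of that substitution mechanism, here is a sketch that uses Log4j 1.x directly rather than Flume's actual bootstrap code; the class name and property values are chosen just for the demo:
import java.util.Properties;
import org.apache.log4j.Logger;
import org.apache.log4j.PropertyConfigurator;
public class RootLoggerDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Default value, analogous to flume.root.logger=INFO,LOGFILE in log4j.properties
        props.setProperty("flume.root.logger", "INFO,CONSOLE");
        // ${flume.root.logger} is resolved from JVM system properties first,
        // so -Dflume.root.logger=DEBUG,CONSOLE on the command line wins over the default above
        props.setProperty("log4j.rootLogger", "${flume.root.logger}");
        props.setProperty("log4j.appender.CONSOLE", "org.apache.log4j.ConsoleAppender");
        props.setProperty("log4j.appender.CONSOLE.layout", "org.apache.log4j.SimpleLayout");
        PropertyConfigurator.configure(props);
        Logger log = Logger.getLogger(RootLoggerDemo.class);
        log.debug("printed only when -Dflume.root.logger=DEBUG,CONSOLE is set");
        log.info("printed at the default INFO level");
    }
}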

Batch file run command after previous command has finished executing

I am trying to create a batch file which does some WebLogic Admin server data source updates, after which I need to restart the admin server. I am trying to automate this through a batch file, so I have:
call wlst UpdateDataSource.py
stopWebLogic.cmd weblogicUser weblogicPwd localhost:7001
startWebLogic.cmd
Now, how do I ensure that startWebLogic.cmd is executed only after the previous line has finished executing (i.e. after stopWebLogic.cmd finishes)?
I'm assuming Windows, since you have .cmd files.
You can use & between the scripts to run them in sequence. If you use && the ensuing scripts will only run if the previous ones completed successfully.
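For instance, something roughly along these lines (the call keyword makes sure control returns to your script before the && check):
call stopWebLogic.cmd weblogicUser weblogicPwd localhost:7001 && call startWebLogic.cmd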
You can read more here.
Cheers,

Hadoop profile output - where and what?

I'm trying to profile my application to see if I can reproduce this blog post. I added -D mapred.task.profile=true to the command line and checked in the job configuration that it took effect.
Hadoop: The Definitive Guide says the profile info will appear in the Unix dir I ran the job from. The dir I started from has a file attempt_201305011806_0042_m_000002_0.profile, which has the correct job ID, but there wasn't a mapper #2 (there was only one mapper and it didn't fail). The profile file only contains header info; there isn't any actual profiling data.
The Hadoop docs say the output will be in the user log directory, but I can't find anything there. If I go into the task logs for the mapper, there is profiling info under "profile.out logs" with legitimate data. My HDFS output dir doesn't have the profiling info at all. Shouldn't the profiling output be in HDFS somewhere?
Also, it only gives text-based output in the log but all of the tools I've found to visualize the profile assume binary hprof format. Any ideas for how I could get a binary profile or else load a text-based profile into an hprof tool?
I noticed there's a space at
-D mapred.task.profile=true
Is that a typo? If so, just remove it and see what happens. Also, you should be able to see the profiler files under the user log directory, which is usually where you ran the job from.
Also, hprof is the default profiler for Hadoop, so check that you are not overriding it with
-Dmapred.task.profile.params
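If you'd rather set this from the driver than from the command line, the equivalent configuration calls look roughly like this; the class name is hypothetical and the property names are the old mapred-era ones used in the question:
import org.apache.hadoop.conf.Configuration;
public class ProfilingConfigSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.setBoolean("mapred.task.profile", true);      // same as -D mapred.task.profile=true
        conf.set("mapred.task.profile.maps", "0-1");       // which map attempts to profile
        conf.set("mapred.task.profile.reduces", "0");      // which reduce attempts to profile
        // The default profiler params use hprof; overriding this changes the output produced
        conf.set("mapred.task.profile.params",
                 "-agentlib:hprof=cpu=samples,heap=sites,depth=6,force=n,thread=y,verbose=n,file=%s");
        // ...build and submit the job with this conf...
    }
}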
