I'm trying to profile my application to see if I can reproduce the results from this blog post. I added -D mapred.task.profile=true to the command line and verified in the job configuration that it took effect.
Hadoop: The Definitive Guide says the profile info will appear in the Unix directory I ran the job from. The directory I started from has a file attempt_201305011806_0042_m_000002_0.profile, which has the correct job ID, but there wasn't a mapper #2 (there was only one mapper, and it didn't fail). That profile file contains only the header info; there isn't any actual profiling data in it.
The Hadoop docs say the output will be in the user log directory, but I can't find anything there. If I go into the task logs for the mapper, there is legitimate profiling info under "profile.out logs". My HDFS output directory doesn't have the profiling info at all. Shouldn't the profiling output end up in HDFS somewhere?
Also, it only gives text-based output in the logs, but all of the tools I've found to visualize the profile assume binary hprof format. Any ideas for how I could get a binary profile, or else load a text-based profile into an hprof tool?
I noticed there's a space in
-D mapred.task.profile=true
Is that a typo? If so, just remove it and see what happens. Also, you should be able to see the profiler files under the user log directory, which is usually where you ran the job from.
Also, hprof is the default profiler for Hadoop, so check that you are not overriding it with
-Dmapred.task.profile.params
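If you specifically want a binary profile for the hprof visualization tools, one thing you could try (just a sketch, assuming the stock hprof agent that Hadoop uses by default) is to set the profiler arguments explicitly and add format=b; %s is the placeholder Hadoop replaces with the per-task profile output file:
-D mapred.task.profile=true
-D mapred.task.profile.params=-agentlib:hprof=heap=dump,depth=6,format=b,file=%s
Note that, as far as I know, hprof's cpu=samples output is text-only, so format=b mostly helps when you want heap dumps you can load into jhat or similar tools.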
I want JMeter to find a jar in lib/ext/custom.
In my jmeter.properties:
search_paths=lib/ext/custom
When I run the test, I get this output:
2019-06-25 10:21:54,792 INFO o.a.j.JMeter: search_paths=lib/ext/custom
2019-06-25 10:21:54,792 WARN o.a.j.JMeter: Can't read lib/ext/custom
Does anyone have an idea why it wouldn't be able to read that directory? It has the same owner as all the other directories/files and has the same permissions as lib/ext itself.
I turned the root logger up to DEBUG but got no more information than the log messages above.
The answer was simple: it couldn't read the directory because it couldn't find it; the value I provided wasn't correct.
search_paths=../lib/ext/custom
It needed that step back up (..) to find the directory correctly. Putting the full path in also worked.
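For what it's worth, an absolute path also avoids the question of what the relative path is resolved against (the install location below is just an example, adjust it to your own layout):
search_paths=/opt/jmeter/lib/ext/custom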
I have the following configuration for my service:
exec java -Djava.io.tmpdir=$tmpdir -Djava.library.path="Some_Path" \
    -Xmx"$heapsize"m -XX:+UseConcMarkSweepGC -XX:OnOutOfMemoryError="Do something, may be restart" \
    -XX:ErrorFile=/var/log/service/myService/"myServiceCrash".log -jar .jar
I am not able to append the crash logs to the same file; instead, a new file with a new PID is created every time.
Requirement: dump the crash logs into the same file.
This is expected behavior. The first time, the JVM writes to the file provided in -XX:ErrorFile=. Once that file exists it won't be overwritten, and you then get the default error file (hs_err_pid<pid>.log) instead.
Ideally there would be some way to report that the file creation failed, but that can't be done as part of the error-handling code.
Please check the evaluation here: https://bugs.openjdk.java.net/browse/JDK-8189672
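As a workaround (just a sketch; the path is taken from your example), you can at least keep every crash file in the service's log directory by putting the %p placeholder in the pattern, so the per-PID file the JVM creates still lands where you expect:
-XX:ErrorFile=/var/log/service/myService/myServiceCrash_%p.log
If you really need everything in a single file, your OnOutOfMemoryError/restart hook could append the newest crash file to one consolidated log.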
I'm dealing with a problem where Solr 5.1 is creating way too many log files. Every time Solr is restarted, and periodically throughout the week, Solr creates the following files and I need it to stop:
Files of the type solr_gc_xxxxxxxx_xxxx, where the x's stand for the date and some kind of identifying number, respectively. These contain garbage collection information.
Files of the type solr_log_xxxxxxxx_xxxx, where the x's stand for the date and some kind of identifying number, respectively. These contain the same kind of information you'd find in solr.log.
One file of the type solr-[port]-console.log. It always contains only the following text: WARNING: System properties and/or JVM args set. Consider using --dry-run or --exec
In one week I racked up nearly thirty files of types 1 and 2!
Even worse, file types 1 and 2 don't seem to respect my log4j.rootLogger setting and are instead filled with INFO-level material.
Here are the relevant parts of my log4j.properties file:
# Logging level
solr.log=logs
log4j.rootLogger=WARN, file
#- size rotation with log cleanup.
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.MaxFileSize=100MB
log4j.appender.file.File=${solr.log}/solr.log
log4j.appender.file.MaxBackupIndex=0
What I want to do is the following:
Create only solr.log + one backup file. solr.log should be periodically overwritten.
Not create any other log file.
What can I do to accomplish this?
So after some time, I figured out how to fix this.
To recap, Solr kept creating a whole bunch of files matching the solr_log* and gc_log* patterns on startup and periodically throughout the day. Eventually I ran into some pretty serious disk-space issues because of the endless number of logs Solr likes to create.
Navigate to /path/to/solr/bin and locate the solr script, which runs at startup. Open the file, look for the following, and comment out mv "$SOLR_LOGS_DIR/solr.log" "$SOLR_LOGS_DIR/solr_log_$(date +"%Y%m%d_%H%M")":
# backup the log files before starting
if [ -f "$SOLR_LOGS_DIR/solr.log" ]; then
  if $verbose ; then
    echo "Backing up $SOLR_LOGS_DIR/solr.log"
  fi
  mv "$SOLR_LOGS_DIR/solr.log" "$SOLR_LOGS_DIR/solr_log_$(date +"%Y%m%d_%H%M")"
fi
Or remove it, if you like. You could also try not using the -f flag but here at my shop we like it.
This will retain solr.log, but Solr won't make any more backups. If you want daily backups, I recommend configuring a TimeBasedRollingPolicy or, better yet, a DailyRollingFileAppender in the log4j.properties file, which can be found under /path/to/solr/server/resources.
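For example, a daily rotation could look something like this in log4j.properties (the date pattern and layout are just a sketch; keep whatever layout you already use):
# Example only: roll solr.log once per day instead of by size
log4j.rootLogger=WARN, file
log4j.appender.file=org.apache.log4j.DailyRollingFileAppender
log4j.appender.file.File=${solr.log}/solr.log
log4j.appender.file.DatePattern='.'yyyy-MM-dd
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{ISO8601} %-5p %c - %m%n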
If you want, you can also comment out the mv line for the Solr garbage collection logs, which will leave you with solr_gc.log only.
If, like me, you have other ways of monitoring GC for Solr, you may want to turn off GC logging completely.
In the same directory as the solr script, open solr.in.sh (Mac/Linux only; I believe solr.in.cmd is the Windows equivalent) and comment out the GC_LOG_OPTS setting, which sits under the "# Enable verbose GC logging" comment:
GC_LOG_OPTS="-verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDetails \
-XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime"
You will need to restart Solr.
Could someone give me a detailed description of the Flume command below, which runs a conf file?
bin/flume-ng agent --conf-file netcat_flume.conf --name a1 \
-Dflume.root.logger=INFO,console
As far as I know:
--conf-file -> specifies the configuration file name, i.e. tells Flume which file we need to run.
--name -> the agent name.
But what does the option below do?
-Dflume.root.logger=INFO,console
Thanks in advance for your help.
It's the log4j property, which is explained in detail below.
INFO means: output only informational messages that highlight the progress of the application at a coarse-grained level. For more details, check the log4j documentation on log levels.
console means: send the log4j output to the console. Other available options are writing to a file or to a database.
-Dflume.root.logger=INFO,console
The above statement writes coarse-grained logs of the Flume execution to the console.
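For example, if you wanted more detail while debugging, you could raise the level on the same property (just an illustration; DEBUG is very verbose):
-Dflume.root.logger=DEBUG,console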
The shell script flume-ng accepts the arguments and finally runs a command like:
java -Xmx20m -Dflume.root.logger=INFO,console -cp '=:/home/scy/apache-flume-1.4.0-bin/lib/*:/home/scy/apache-flume-1.4.0-bin/conf:/home/scy/jdk1.6.0_45/lib/tools.jar' -Djava.library.path= org.apache.flume.node.Application --conf-file conf/example.conf --name agent1 conf org.apache.flume.node
Let's look at the source code of org.apache.flume.node.Application.main(String[] args):
PropertiesFileConfigurationProvider configurationProvider =
    new PropertiesFileConfigurationProvider(agentName, configurationFile);
Here the class PropertiesFileConfigurationProvider accepts agentName and configurationFile, which are specified by --name and --conf-file respectively.
Then application.start() runs all the sources, channels and sinks.
As for -Dflume.root.logger=INFO,console, let's look at flume/log4j.properties:
flume.root.logger=INFO,LOGFILE
flume.root.logger gets overridden by -Dflume.root.logger=INFO,console, which means all INFO-level logs go to the console instead of the log file.
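So, for example, if you wanted debug-level output both on the console and in the log file (assuming the console and LOGFILE appenders defined in Flume's log4j.properties), you could start the agent with:
-Dflume.root.logger=DEBUG,console,LOGFILE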
I'm using the code from http://blog.cloudera.com/blog/2012/12/how-to-run-a-mapreduce-job-in-cdh4/.
However, when I run sudo -u hdfs hadoop jar target/gapdeduce-1.0-SNAPSHOT.jar GapDeduceRunner /gaps/gaplog.txt /gaps/output
it gives me an error like this:
WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
How can I solve this problem?
Double-check that you are calling conf.setJarByClass() or JobConf#setJar(String) in your main method or driver class, and make sure you aren't submitting the job before setting it.
It would be much better if you could post your code.
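For reference, here is a minimal driver sketch (the mapper/reducer wiring is left as comments, since it depends on your code from the blog post) showing where the jar gets set:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class GapDeduceRunner {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "gapdeduce");
        // This is the call that tells the JobClient which jar to ship to the
        // cluster; leaving it out is what produces "No job jar file set".
        job.setJarByClass(GapDeduceRunner.class);
        // job.setMapperClass(...);   // your mapper from the blog post
        // job.setReducerClass(...);  // your reducer from the blog post
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}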