I have a project to do with Hadoop, and I installed Hadoop exactly as described here: https://www.codeproject.com/Articles/757934/Apache-Hadoop-for-Windows-Platform. I'm trying to run the same map-reduce job, Recipe.java, on the dataset recipeitems-latest.json.
I have created a .jar file from this Recipe.java code, and I've started YARN and DFS. I have also created the directory /in and copied recipeitems-latest.json to it.
Now, I start the job by calling:
hadoop jar c:\Hwork\Recipe.jar Recipe /in /out
The job starts and says it is running, but no progress is made, as you can see here: https://i.stack.imgur.com/QSifC.png
I also tried tracking the job by clicking on the given link; its status is ACCEPTED, but the progress bar shows nothing.
I started using Hadoop only a day ago and I really don't know what is going wrong. Why is the job I started making no progress?
The problem is resolved. Apparently the EOL characters in \sbin\start-yarn (as well as in \bin\hadoop.cmd) must be changed from '\n' to '\r\n', and then it worked like a charm!
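For reference, a minimal sketch of one way to do that conversion, assuming a Unix-style shell such as Git Bash or Cygwin is available on the Windows box so that unix2dos is on the PATH, that the commands are run from the Hadoop install root, and that the scripts carry the .cmd extension:

# unix2dos rewrites LF ('\n') line endings as CRLF ('\r\n') in place
unix2dos sbin/start-yarn.cmd
unix2dos bin/hadoop.cmd

Any editor that can save files with Windows (CRLF) line endings works just as well.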
Related
I'm using an Oozie environment. After successful completion of the job, I can't find the System.out.println output in the Oozie log. I googled for many hours and found this,
but without result. From the Oozie web console I got the job ID "0000011-180801114827014-oozie-oozi-W", then I tried to get more information about the job using the following command:
oozie job -oozie http://localhost:11000/oozie/ -info 0000011-180801114827014-oozie-oozi-W
Then I got the externalId "16546" from the JobCompleted action, and I think the job ID is 180801114827014. Finally, I tried to get the log from the Java action using the following command:
yarn logs -applicationId application_180801114827014_16546
Where am I going wrong? Any suggestions?
Edit
I checked whether log aggregation was enabled, and it seems that it is.
Then where am I going wrong?
I can say from experience that stdout is not removed from any YARN action; however, the encouraged way to log information in your applications is to use Log4j, which goes to syslog, not stdout (or stderr).
However, as your terminal says, YARN log aggregation needs to be enabled (and completed) for you to see the logs from the yarn logs command.
If that command still doesn't work, go to the Oozie UI, to the job action, or directly to the YARN UI and search for the action, then find the logs link from there. An example of the yarn logs call is sketched below.
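For what it's worth, the ID passed to yarn logs has the form application_<cluster-timestamp>_<sequence-number> and should be copied from the YARN ResourceManager UI (or from the Oozie action's external ID once the launcher has actually run on YARN), not assembled from pieces of the Oozie workflow ID. A sketch with a purely hypothetical application ID:

yarn logs -applicationId application_1533123456789_0042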
I have two machines, named ubuntu1 and ubuntu2.
On ubuntu1 I started the master node of a Spark standalone cluster, and on ubuntu2 I started a worker (slave).
I am trying to execute the wordCount example available on GitHub.
When I submit the application, the worker sends an error message:
java.io.FileNotFoundException: File file:/home/ubuntu1/demo/test.txt does not exist.
My command line is:
./spark-submit --master spark://ubuntu1-VirtualBox:7077 --deploy-mode cluster --class br.com.wordCount.App -v --name "Word Count" /home/ubuntu1/demo/wordCount.jar /home/ubuntu1/demo/test.txt
Does the file test.txt only need to be on one machine?
Note: the master and the worker are on different machines.
Thank you
I got the same problem while loading a JSON file. I realized that by default Windows saves the file as a text file regardless of the name; identify the actual file format and then you can load it easily.
For example, say you saved the file as test.JSON, but by default Windows appends .txt to it.
Check that and try to run it again.
I hope your problem gets resolved with this idea.
Thank you.
You should put your file on HDFS by going to its folder and typing:
hdfs dfs -put <file>
Otherwise, each node has to have access to it, which means the same folder path must exist on each machine.
Don't forget to change file:/ to hdfs:/ in the submit command after you do that; a sketch of the full sequence follows below.
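A minimal sketch, reusing the file and jar names from the question; the /demo directory on HDFS is hypothetical:

hdfs dfs -mkdir -p /demo
hdfs dfs -put /home/ubuntu1/demo/test.txt /demo/test.txt
./spark-submit --master spark://ubuntu1-VirtualBox:7077 --deploy-mode cluster --class br.com.wordCount.App --name "Word Count" /home/ubuntu1/demo/wordCount.jar hdfs:///demo/test.txt

Also note that in --deploy-mode cluster the application jar itself must be reachable from whichever worker ends up running the driver, so putting wordCount.jar on HDFS as well (and passing an hdfs:// path for it) is another option.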
I added KafkaLog4JAppender functionality to my MR job.
Locally the job runs and sends the formatted logs to my Kafka cluster.
When I try to run it from the YARN server, using:
jar [jar-name].jar [DriverClass].class [job-params] -Dlog4j.configuration=log4j.xml -libjars
I get the following exception:
log4j:ERROR Could not create an Appender. Reported error follows.
java.lang.ClassNotFoundException: kafka.producer.KafkaLog4jAppender
The KafkaLog4JAppender class is on the path.
Running
jar tvf [my-jar].jar | grep KafkaLog4J
finds the class.
I'm kind of lost and would appreciate any helpful input.
Thanks in advance!
If it works in local mode but not in YARN/distributed mode, then it could be a problem of the jar not being distributed properly. You might want to check "Using third party jars and files in your MapReduce application (Distributed cache)" for details on how to distribute the jar containing KafkaLog4jAppender.class. One common way is sketched below.
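For example, a minimal sketch using the -libjars generic option; the jar name, driver class, and paths here are hypothetical, and the driver must be run through ToolRunner/GenericOptionsParser, with the generic options placed before the job's own arguments, for -libjars to be honored:

hadoop jar my-mr-job.jar com.example.MyDriver -libjars /path/to/kafka-log4j-appender.jar /in /out

Alternatively, bundling the Kafka appender classes into a fat jar, or adding the appender jar to the Hadoop classpath on every node, avoids the distribution step entirely.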
I have a single node where I run MR jobs frequently. The system had been up and running fine for two days. Suddenly all the Hadoop processes stopped. It is annoying that all my running jobs have failed since then. The logs say:
For the secondary namenode:
java.io.IOException: Cannot lock storage /hdfs/namesecondary. The directory is already locked
For the namenode:
java.io.IOException: Cannot lock storage /hdfs/name. The directory is already locked
I tried leaving safe mode and tried formatting the namenode, but that also throws the same exception.
How can I start the Hadoop processes? There was no disk space issue; it's a 900 GB disk and 300 GB was free at the time of this shutdown.
What should I verify now? I haven't found any thread on this anywhere.
Thanks
I solved it by removing the in_use.lock file in /hdfs/name/ and /hdfs/namesecondary/. After that, I formatted both the namenode and the secondary namenode. A rough sketch of the steps is below.
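Assuming the storage directories from the error messages above, and keeping in mind that formatting the namenode wipes the existing HDFS namespace (so it is only an option if losing that metadata is acceptable):

jps
rm /hdfs/name/in_use.lock
rm /hdfs/namesecondary/in_use.lock
start-dfs.sh

The jps call is just to confirm that no stale NameNode or SecondaryNameNode JVM is still running and holding the locks; if one is, kill it instead of deleting the lock files.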
I have set up a single-node Hadoop installation on Windows.
When I execute the command ./bin/hadoop jar Prefix.jar PrefixJob ip op
the job gets stuck. There is no exception or anything; it is just stuck.
How do I get it to run?
The correct command to run the WordCount example is as below, which I just tested yesterday (on HDInsight):
hadoop.cmd jar jar_file_name.jar class_name input_file_or_folder_name output_folder_name
c:\apps\dist\hadoop-1.1.0-SNAPSHOT\bin\hadoop.cmd jar c:\apps\Jobs\templates\635000448534317551.hadoop-examples.jar wordcount /user/admin/DaVinci.txt /user/admin/outcount
What you can do is look at the job log to understand what is happening. First open the Job Tracker on your cluster, then search for the job ID to check the information about the submitted job. The same status can also be checked from the command line, as sketched below.
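For a Hadoop 1.x cluster like the one in the command above, the job client can show the same information; the job ID here is hypothetical and would come from the output of the list command:

hadoop job -list
hadoop job -status job_201309101627_0001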