I have two machines named ubuntu1 and ubuntu2.
On ubuntu1 I started the master node of a Spark Standalone cluster, and on ubuntu2 I started a worker (slave).
I am trying to execute the wordCount example available on GitHub.
When I submit the application, the worker sends an error message:
java.io.FileNotFoundException: File file:/home/ubuntu1/demo/test.txt does not exist.
My command line is:
./spark-submit --master spark://ubuntu1-VirtualBox:7077 --deploy-mode cluster --class br.com.wordCount.App -v --name "Word Count" /home/ubuntu1/demo/wordCount.jar /home/ubuntu1/demo/test.txt
Does the file test.txt only have to be on one machine?
Note: the master and the worker are on different machines.
Thank you
I got the same problem while loading a JSON file. I realized that by default Windows stores the file as a text file regardless of the name. Identify the actual file format and then you can load it easily.
Example: suppose you saved the file as test.JSON, but by default Windows appended .txt to it.
Check that and try to run it again.
I hope your problem gets resolved with this idea.
Thank you.
You should put your file on HDFS by going to the folder and typing:
hdfs dfs -put <file>
Otherwise, every node needs access to the file, which means the same folder path has to exist on each machine.
Don't forget to change file:/ to hdfs:/ after you do that.
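For example, a sketch of uploading the file and then pointing spark-submit at it (the HDFS directory and the namenode host/port in the hdfs:// URI are assumptions; use your cluster's fs.defaultFS value):
hdfs dfs -mkdir -p /user/ubuntu1/demo
hdfs dfs -put /home/ubuntu1/demo/test.txt /user/ubuntu1/demo/test.txt
./spark-submit --master spark://ubuntu1-VirtualBox:7077 --deploy-mode cluster \
  --class br.com.wordCount.App --name "Word Count" \
  /home/ubuntu1/demo/wordCount.jar \
  hdfs://ubuntu1-VirtualBox:9000/user/ubuntu1/demo/test.txt
Keep in mind that with --deploy-mode cluster the driver runs on a worker, so the application jar itself must also be reachable from that machine (for example by copying it to the same path on every node, or by putting it on HDFS as well).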
We recently upgraded DataStage from 9.1 to 11.7 on an AIX 7.1 server,
and I'm trying to use the new "File Connector" stage to write a Parquet file. I created a simple job that reads from Teradata as a source and writes to a Parquet file as a target.
Image of the job
But I am facing the error below:
> File_Connector_20,0: java.lang.NoClassDefFoundError: org.apache.hadoop.fs.FileSystem
at java.lang.J9VMInternals.prepareClassImpl (J9VMInternals.java)
at java.lang.J9VMInternals.prepare (J9VMInternals.java: 304)
at java.lang.Class.getConstructor (Class.java: 594)
at com.ibm.iis.jis.utilities.dochandler.impl.OutputBuilder.<init> (OutputBuilder.java: 80)
at com.ibm.iis.jis.utilities.dochandler.impl.Registrar.getBuilder (Registrar.java: 340)
at com.ibm.iis.jis.utilities.dochandler.impl.Registrar.getBuilder (Registrar.java: 302)
at com.ibm.iis.cc.filesystem.FileSystem.getBuilder (FileSystem.java: 2586)
at com.ibm.iis.cc.filesystem.FileSystem.writeFile (FileSystem.java: 1063)
at com.ibm.iis.cc.filesystem.FileSystem.process (FileSystem.java: 935)
at com.ibm.is.cc.javastage.connector.CC_JavaAdapter.run (CC_JavaAdapter.java: 444)
I followed the steps in the link below:
https://www.ibm.com/support/knowledgecenter/SSZJPZ_11.7.0/com.ibm.swg.im.iis.conn.s3.usage.doc/topics/amaze_file_formats.html
1- I uploaded the jar files into "/ds9/IBM/InformationServer/Server/DSComponents/jars".
2- Added them to the CLASSPATH in agent.sh, then restarted DataStage.
3- Set the environment variable CC_USE_LATEST_FILECC_JARS to the value parquet-1.9.0.jar:orc-2.1.jar.
I also tried adding CLASSPATH as an environment variable in the job, but that did not work.
Note that I'm using Local as the File System mode.
Any hint is appreciated, as I have been searching for a long time.
Thanks in advance,
Which File System mode are you using? If you are using Native HDFS as the File System mode, then you would need to configure the CLASSPATH to include some third-party jars.
Perhaps these links should provide you with some guidance.
https://www.ibm.com/support/pages/node/301847
https://www.ibm.com/support/pages/steps-required-configure-file-connector-use-parquet-or-orc-file-format
Note: depending on the Hadoop distribution and version you are using, the versions of the jars could be different.
If the above information does not help in resolving the issue, then you may have to reach out to IBM Support to get this addressed.
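For what it's worth, the kind of CLASSPATH addition described above might look roughly like this sketch (the jars directory is taken from the question, and the jar names are only illustrative; the actual set depends on your Hadoop distribution and the IBM links above):
# appended in agent.sh before restarting the DataStage/ASB agent
CLASSPATH=$CLASSPATH:/ds9/IBM/InformationServer/Server/DSComponents/jars/hadoop-common.jar
CLASSPATH=$CLASSPATH:/ds9/IBM/InformationServer/Server/DSComponents/jars/hadoop-hdfs.jar
export CLASSPATH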
To use the File Connector, there is no need to add a CLASSPATH in agent.sh unless you want to import HDFS files from IMAM.
If your requirement is reading Parquet files, then set
$CC_USE_LATEST_FILECC_JARS=parquet-1.9.0.jar
$FILECC_PARQUET_AVRO_COMPAT_MODE=TRUE
If you are still seeing the issue, then run the job with $CC_MSG_LEVEL=2 and open an IBM support case, attaching the job design, the full job log, and the Version.xml file from the Engine tier.
I'm trying to insert records from a local directory path over SSH, but it shows "file not found" because it does not recognize the path. If I give an SFT directory instead, it recognizes the path.
This is the code:
Local directory path:
./putmsgE1E2.ksh LPDWA123 ABC.GD.SENDCORRESPONDENCE.BATCH.INPUT XPRESSION.TEST /Users/admin/Desktop/important.txt;
and I tried using c:/Users/admin/Desktop/important.txt
SFT Directory:
./putmsgE1E2.ksh LPDWA123 ABC.GD.SENDCORRESPONDENCE.BATCH.INPUT XPRESSION.TEST /abchome/abc123/1.txt;
I have inputs in the txt file in my local directory; I want to send those files to the server. Hope someone finds a solution. Thanks.
Use the command below on Linux systems:
scp /path/to/your/local/file remoteUser@some_address:/home/remoteUser/Documents
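For example, a sketch based on the paths in the question (the server hostname and the remote user are assumptions inferred from the SFT path):
# copy the local file up to the SFT directory that the script already recognizes
scp /Users/admin/Desktop/important.txt abc123@your-server:/abchome/abc123/important.txt
# then run the script against the remote copy
./putmsgE1E2.ksh LPDWA123 ABC.GD.SENDCORRESPONDENCE.BATCH.INPUT XPRESSION.TEST /abchome/abc123/important.txt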
I added KafkaLog4JAppender functionality to my MR job.
Locally, the job runs and sends the formatted logs to my Kafka cluster.
When I try to run it from the YARN server, using:
jar [jar-name].jar [DriverClass].class [job-params] -Dlog4j.configuration=log4j.xml -libjars
I get the following exception:
log4j:ERROR Could not create an Appender. Reported error follows.
java.lang.ClassNotFoundException: kafka.producer.KafkaLog4jAppender
The KafkaLog4JAppender class is in the path.
Running
jar tvf [my-jar].jar | grep KafkaLog4J
finds the class.
I'm kind of lost and would appreciate any helpful input.
Thanks in advance!
If it works in local mode but not in YARN/distributed mode, then it could be a problem of the jar not being distributed properly. You might want to check "Using third party jars and files in your MapReduce application (Distributed cache)" for details on how to distribute the jar containing KafkaLog4jAppender.class.
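For example, if your driver goes through ToolRunner/GenericOptionsParser, a sketch of shipping the appender jar with the job could look like this (the jar names and paths are illustrative):
# make the appender visible to the client JVM that configures log4j and submits the job
export HADOOP_CLASSPATH=/path/to/kafka-log4j-appender.jar:/path/to/kafka-clients.jar
# -libjars is a generic option, so it must come before the application's own arguments;
# it ships the listed jars to the cluster nodes via the distributed cache
hadoop jar my-mr-job.jar com.example.MyDriver \
  -libjars /path/to/kafka-log4j-appender.jar,/path/to/kafka-clients.jar \
  [job-params]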
I have a jar file that needs to run before our MapReduce process. It preprocesses the data that is later fed into the MapReduce process. The jar file works fine without Oozie, but I would like to automate the workflow.
When run, the jar accepts two inputs, <input_file> and <output_dir>,
and it is expected to output two files, <output_file_1> and <output_file_2>, under the specified <output_dir>.
This is the workflow:
<workflow-app name="RI" xmlns="uri:oozie:workflow:0.4">
<start to="RI"/>
<action name="RI">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<exec>java </exec>
<argument>-jar</argument>
<argument>RI-Sequencer.jar </argument>
<argument>log.csv</argument>
<argument>/tmp</argument>
<file>/user/root/algo/RI-Sequencer.jar#RI-Sequencer.jar</file>
<file>/user/root/algo/log.csv#log.csv</file>
<capture-output/>
</shell>
<ok to="end"/>
<error to="kill"/>
</action>
<kill name="kill">
<message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
I run the task using Hue, and currently I can't get the output of the process written to files. The job runs fine, but the expected files are nowhere to be found.
I have also changed the output directory to be in HDFS, but with the same result: no files are generated.
If it helps, this is a sample of the code from my jar file:
File fileErr = new File(targetPath + "\\input_RI_err.txt");
fileErr.createNewFile();
textFileErr = new BufferedWriter(new FileWriter(fileErr));
//
// fill in the buffer with the result
//
textFileErr.close();
UPDATE:
If it helps, I can upload the jar file for testing.
UPDATE 2:
I've changed it to write to HDFS. It still doesn't work when using Oozie to execute the job; running the job independently works.
It seems like you are creating a regular output file (on the local filesystem, not HDFS). Since the job runs on one of the nodes of the cluster, the output ends up in the local /tmp of whichever machine is picked.
I do not understand why you want to preprocess data before MapReduce; I don't think it is very effective. But as Roamin said, you are saving your output file to the local filesystem (the file should be in your user home folder, ~/). If you want to save your data into HDFS directly from Java (without using the MapReduce library), look here: How to write a file in HDFS using hadoop or Write a file in hdfs with java.
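For illustration, a minimal sketch of writing a file to HDFS with the Hadoop FileSystem API (the namenode URI and output path are assumptions; take the real values from your cluster's core-site.xml):
import java.io.BufferedWriter;
import java.io.OutputStreamWriter;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // hdfs://namenode:8020 is a placeholder for your fs.defaultFS
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);
        // hypothetical output path under the target directory
        Path out = new Path("/tmp/RI/input_RI_err.txt");
        try (FSDataOutputStream os = fs.create(out, true);
             BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(os, "UTF-8"))) {
            writer.write("error report goes here");
            writer.newLine();
        }
    }
}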
Alternatively, you can generate your file in a local directory and then load it into HDFS with this command:
hdfs dfs -put <localsrc> ... <dst>
I am very new to Hadoop and MapReduce. To start with the basics, I executed the Word Count program. It executed well, but when I tried loading a CSV file into an HTable, following [Csv File][1],
it threw me the following error, which I do not understand. Please, can anyone help me with it?
12/09/07 05:47:31 ERROR security.UserGroupInformation: PriviledgedActionException as:hduser cause:org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://HadoopMaster:54310/user/hduser/csvtable
[1]: http://salsahpc.indiana.edu/ScienceCloud/hbase_hands_on_1.htm#shell_exercises
This error is really killing my time. Please, can anyone help me with this exception?
The reason you are directed to the path hdfs://HadoopMaster:54310/user/hduser/csvtable instead of csvtable is:
1) Add your HBase jars to the Hadoop classpath, because MapReduce does not pick up the HBase jars by default.
2) Go to hadoop-env.sh, edit HADOOP_CLASSPATH, and add all your HBase jars to it. Hope it works now; see the sketch below.
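A minimal sketch of that change (assuming the hbase command is on the PATH; otherwise list the jar paths explicitly):
# in hadoop-env.sh, or exported in the shell before submitting the job
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$(hbase classpath)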
Your job is attempting to read an input file from:
hdfs://HadoopMaster:54310/user/hduser/csvtable
You should verify that this file exists on HDFS using the Hadoop shell tools:
hadoop fs -ls /user/hduser/csvtable
My guess is that your file hasn't been loaded onto HDFS.
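If it isn't there, a sketch of loading it (the local source path is illustrative; point it at wherever your CSV actually lives):
hadoop fs -put /home/hduser/csvtable /user/hduser/csvtable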