How to specify path of a file in java/terminal on Hadoop? - java

I am running a task on Hadoop2:
$hadoop jar hipi.jar "/5" "/processWOH" 1
hipi.jar: the jar file name
"/5": the input folder name
"/processWOH": the output folder name
I am getting an exception regarding the path /localhost:9000/5/LC814000.tif:
Error: java.io.FileNotFoundException: /localhost:9000/5/LC814000.tif (No such file or directory)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at java.io.FileInputStream.<init>(FileInputStream.java:93)
at ProcessWithoutHIPI.ProcessRecordReaderWOH.getCurrentKey(ProcessRecordReaderWOH.java:81)
at ProcessWithoutHIPI.ProcessRecordReaderWOH.getCurrentKey(ProcessRecordReaderWOH.java:1)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.getCurrentKey(MapTask.java:507)
at org.apache.hadoop.mapreduce.task.MapContextImpl.getCurrentKey(MapContextImpl.java:70)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.getCurrentKey(WrappedMapper.java:81)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
I think (but I am not sure) that the problem is the extra "/localhost:9000" added to the path, but I don't know where it is added (by Hadoop, by my Java code, ...).
Note: this jar file runs fine outside of Hadoop, but not in Hadoop (HDFS).
Any help is appreciated.
Update:
As I discovered later, the "/5" folder is searched for in the local file system, not in HDFS. If I create a folder named "localhost:9000" under the root of the local file system (i.e. /localhost:9000) and put "/5" inside it, the code runs, but then the data is read from outside Hadoop, as if I were not using Hadoop at all.
So is this a programming mistake, i.e. should I use the Hadoop IO packages instead of the Java IO packages to work with HDFS rather than the local file system, or is it a different problem?

The default directory of your HDFS is /localhost:9000/, and Hadoop cannot find your input file there; just put it in /localhost:9000/:
$hadoop fs -put $LOCAL_PATH_OF_INPUT_FILE:/5 /localhost:9000/
$hadoop jar hipi.jar "/5" "/processWOH" 1
Good luck!

The problem, as I said earlier, was that Java IO (i.e. the File class, the Path class, ...) treats paths as paths in the local file system, whereas Hadoop IO (the FileSystem class, the Path class, ...) treats them as paths in HDFS.
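To make the difference concrete, here is a minimal sketch that uses only the JDK. The location string "hdfs://localhost:9000/5/LC814000.tif" is an assumption about what the job hands to the record reader, reconstructed from the error message; the class and method names are made up for illustration. java.net.URI separates the file system part from the path part, while java.io.File simply treats the whole string as a local file name.

```java
import java.io.File;
import java.net.URI;

public class PathKinds {
    // Returns only the path component of a location, e.g. "/5/LC814000.tif".
    public static String pathPart(String location) {
        return URI.create(location).getPath();
    }

    // Returns the scheme and authority, e.g. "hdfs://localhost:9000".
    public static String fileSystemPart(String location) {
        URI uri = URI.create(location);
        return uri.getScheme() + "://" + uri.getAuthority();
    }

    public static void main(String[] args) {
        // Assumed form of the input path; not taken from the original code.
        String location = "hdfs://localhost:9000/5/LC814000.tif";
        System.out.println("file system: " + fileSystemPart(location));
        System.out.println("path:        " + pathPart(location));
        // java.io knows nothing about the hdfs scheme: this is a lookup on the local disk.
        System.out.println("local file exists? " + new File(location).exists());
    }
}
```

So anything that ends up in a FileInputStream is looked up on the local disk of the node running the task, whatever prefix the string carries.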
Please have a look here:
read/write from/in HDFS
Using FileSystem API to read and write data to HDFS
Reading data from and writing data to Hadoop Distributed File System (HDFS) can be done in a lot of ways. Now let us start by using the FileSystem API to create and write to a file in HDFS, followed by an application to read a file from HDFS and write it back to the local file system.
Step 1: Once you have downloaded a test dataset, we can write an application to read a file from the local file system and write the contents to Hadoop Distributed File System.
package com.hadoop.hdfs.writer;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.util.ToolRunner;
public class HdfsWriter extends Configured implements Tool {
public static final String FS_PARAM_NAME = "fs.defaultFS";
public int run(String[] args) throws Exception {
if (args.length < 2) {
System.err.println("HdfsWriter [local input path] [hdfs output path]");
return 1;
}
String localInputPath = args[0];
Path outputPath = new Path(args[1]);
Configuration conf = getConf();
System.out.println("configured filesystem = " + conf.get(FS_PARAM_NAME));
FileSystem fs = FileSystem.get(conf);
if (fs.exists(outputPath)) {
System.err.println("output path exists");
return 1;
}
OutputStream os = fs.create(outputPath);
InputStream is = new BufferedInputStream(new FileInputStream(localInputPath));
IOUtils.copyBytes(is, os, conf);
return 0;
}
public static void main( String[] args ) throws Exception {
int returnCode = ToolRunner.run(new HdfsWriter(), args);
System.exit(returnCode);
}
}
Step 2: Export the Jar file and run the code from terminal to write a sample file to HDFS:
[training@localhost ~]$ hadoop jar HdfsWriter.jar com.hadoop.hdfs.writer.HdfsWriter sample.txt /user/training/HdfsWriter_sample.txt
Step 3: Verify whether the file is written into HDFS and check the contents of the file:
[training@localhost ~]$ hadoop fs -cat /user/training/HdfsWriter_sample.txt
Step 4: Next, we write an application to read the file we just created in Hadoop Distributed File System and write its contents back to the local file system:
package com.hadoop.hdfs.reader;
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
public class HdfsReader extends Configured implements Tool {
public static final String FS_PARAM_NAME = "fs.defaultFS";
public int run(String[] args) throws Exception {
if (args.length < 2) {
System.err.println("HdfsReader [hdfs input path] [local output path]");
return 1;
}
Path inputPath = new Path(args[0]);
String localOutputPath = args[1];
Configuration conf = getConf();
System.out.println("configured filesystem = " + conf.get(FS_PARAM_NAME));
FileSystem fs = FileSystem.get(conf);
InputStream is = fs.open(inputPath);
OutputStream os = new BufferedOutputStream(new FileOutputStream(localOutputPath));
IOUtils.copyBytes(is, os, conf);
return 0;
}
public static void main( String[] args ) throws Exception {
int returnCode = ToolRunner.run(new HdfsReader(), args);
System.exit(returnCode);
}
}
Step 5: Export the Jar file and run the code from terminal to read the sample file from HDFS and write it to the local file system:
[training@localhost ~]$ hadoop jar HdfsReader.jar com.hadoop.hdfs.reader.HdfsReader /user/training/HdfsWriter_sample.txt /home/training/HdfsReader_sample.txt
Step 6: Verify whether the file is written back into local file system:
[training@localhost ~]$ cat /home/training/HdfsReader_sample.txt
FileSystem is an abstract class that represents a generic file system. Most Hadoop file system implementations can be accessed and updated through the FileSystem object. To obtain an instance for HDFS, you call the method FileSystem.get(). The FileSystem.get() method will look at the URI assigned to the fs.defaultFS parameter of the Hadoop configuration files on your classpath and choose the correct implementation of the FileSystem class to instantiate. For HDFS, the value of the fs.defaultFS parameter starts with hdfs://.
Once an instance of the FileSystem class has been created, the HdfsWriter class calls the create() method to create a file in HDFS. The create() method returns an OutputStream object, which can be manipulated using normal Java I/O methods. Similarly HdfsReader calls the method open() to open a file in HDFS, which returns an InputStream object that can be used to read the contents of the file.
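Because create() and open() hand back ordinary java.io streams, any stream-copying code works on them unchanged. The sketch below shows such a copy loop with in-memory streams standing in for the HDFS streams (an assumption made only so that it runs without a cluster); the class name StreamCopy is made up for illustration. The listings above delegate this step to IOUtils.copyBytes; unlike that call, this sketch leaves closing the streams to the caller.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class StreamCopy {
    // Copies everything from in to out and returns the number of bytes copied.
    public static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[4096];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-ins; with Hadoop these would come from fs.open(...) and fs.create(...).
        InputStream in = new ByteArrayInputStream("sample".getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        System.out.println(copy(in, out) + " bytes copied");
    }
}
```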
The FileSystem API is extensive. To demonstrate some of the other methods available in the API, we can add some error checking to the HdfsWriter and HdfsReader classes we created.
To check whether the file exists before we call create(), use:
boolean exists = fs.exists(inputPath);
To check whether the path is a file, use:
boolean isFile = fs.isFile(inputPath);
To rename a file that already exists, use:
boolean renamed = fs.rename(inputPath, new Path("old_file.txt"));

Related

Why can my library not access its resources?

I have a class that reads a file:
package classlibrary;
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URL;
public class ReadingResource {
public static String readResource() throws IOException {
URL resource = ClassLoader.getSystemClassLoader().getResource("classlibrary/test_file.txt");
BufferedReader br = new BufferedReader(new FileReader(resource.getPath()));
return br.readLine();
}
}
The resource file is in the same directory where this class is.
I made a library out of this class and the file.
Now I want to use it in the other class:
package uritesting;
import classlibrary.ReadingResource;
import java.io.IOException;
public class URITesting {
public static void main(String[] args) throws IOException {
System.out.println(ReadingResource.readResource());
}
}
When I make a .jar file out of this class, set the class as the main class, add the .jar from above and execute it as "java -jar URITesting.jar" I get a FileNotFoundException, saying the class ReadingResource can not find the specified file. It is funny because the path that is specified in the exception message is actually the correct path to the file.
You can find the files here.
EDIT:
I developed the project in NetBeans. When I run it there, it works fine. The classpath is different in that case. It contains both resources of the URITestingProject and ReadingResource.
However, when I run it as a standalone JAR, the classpath contains URITestingProject only. What is strange to me is that it doesn't complain about not finding the class ReadingResource. It means that it is loaded, although it is not in the classpath :/
The problem is resource.getPath(). A resource that lives inside a JAR file, on another server and so on has no path that is valid for a FileReader. However, you can get the data through a stream instead:
InputStream data = ClassLoader.getSystemClassLoader().getResourceAsStream("classlibrary/test_file.txt");
BufferedReader br = new BufferedReader(new InputStreamReader(data, "utf-8"));
As a side note: when reading with a Reader it's a good idea to specify the encoding, as done above with "utf-8".
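Put together, a stream-based version of the reader could look like the sketch below. The class and method names are made up for illustration, and it assumes the file is UTF-8 text. It also checks for null, because getResourceAsStream returns null rather than throwing when the resource is not on the classpath.

```java
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class ResourceLines {
    // Reads the first line from a stream, decoding it as UTF-8, and closes the stream.
    public static String firstLine(InputStream in) throws IOException {
        try (BufferedReader br = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            return br.readLine();
        }
    }

    // Looks the resource up on the classpath; this works for directories and JAR files alike.
    public static String firstLineOfResource(String name) throws IOException {
        InputStream in = ResourceLines.class.getClassLoader().getResourceAsStream(name);
        if (in == null) {
            throw new FileNotFoundException("resource not on classpath: " + name);
        }
        return firstLine(in);
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "classlibrary/test_file.txt";
        try {
            System.out.println(firstLineOfResource(name));
        } catch (IOException e) {
            System.err.println(e.getMessage());
        }
    }
}
```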

Convert OWL/XML in RDF/XML with a simple command line in shell

I'm asking for your help to create a converter that transforms OWL/XML into RDF/XML. My goal is to use OWLapi 2 through a simple shell command with bash.
My files are in OWL/XML but I have to transform them into RDF/XML to load them into my Fuseki database. I could convert each file with Protégé or an online converter, but I have more than one thousand files to convert.
Here is my current Java file (but I don't know how to use it):
package owl2rdf;
import java.io.File;
import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.io.RDFXMLOntologyFormat;
import org.semanticweb.owlapi.model.IRI;
import org.semanticweb.owlapi.model.OWLOntology;
import org.semanticweb.owlapi.model.OWLOntologyManager;
@SuppressWarnings("deprecation")
public class owl2rdf {
public static void main(String[] args) throws Exception {
// Get hold of an ontology manager
OWLOntologyManager manager = OWLManager.createOWLOntologyManager();
// Load the ontology from a local files
File file = new File(args[0]);
System.out.println("Loaded ontology: " + file);
OWLOntology ontology = manager.loadOntologyFromOntologyDocument(file);
// Get the ontology format ; in our case it's normally OWL/XML
OWLOntologyFormat format = manager.OWLOntologyFormat(file);
System.out.println(" format: " + format);
// save the file into RDF/XML format
RDFXMLOntologyFormat rdfxmlFormat = new RDFXMLOntologyFormat();
manager.saveOntology(ontology, rdfxmlFormat, IRI.create(file));
}
}
When I execute this code, I get many errors related to exceptions that I don't understand at all, although I have seen that this is a common error:
Exception in thread "main" java.lang.reflect.InvocationTargetException
Caused by: java.lang.NoClassDefFoundError: com/google/inject/Provider
Caused by: java.lang.ClassNotFoundException: com.google.inject.Provider
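com.google.inject.Provider is a Guice class, so this trace suggests that a library the program depends on was not on the runtime classpath. Keep in mind that with java -jar the classpath comes from the jar's manifest (Class-Path), not from -cp or the CLASSPATH variable. A small diagnostic sketch, with a made-up class name, to check from inside the JVM whether a class is visible:

```java
public class ClasspathCheck {
    // Returns true if the named class can be found by this class's class loader.
    public static boolean isOnClasspath(String className) {
        try {
            // 'false' means: locate the class but do not run its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String name = "com.google.inject.Provider";
        System.out.println(name + (isOnClasspath(name) ? " found" : " MISSING"));
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
    }
}
```

Run it the same way as the failing program (same jar, same launch command), otherwise the result says nothing about that setup.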
To convert an entire directory of OWL/XML files into RDF/XML files:
1- Create your class file owl2rdf.java
package owl2rdf;
//import all necessary classes
import java.io.File;
import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.io.RDFXMLOntologyFormat;
import org.semanticweb.owlapi.model.IRI;
import org.semanticweb.owlapi.model.OWLOntology;
import org.semanticweb.owlapi.model.OWLOntologyManager;
@SuppressWarnings("deprecation")
public class owl2rdf {
// create a main() function that takes an argument; in this example, one argument only
public static void main(String[] args) throws Exception {
// Get hold of an ontology manager
OWLOntologyManager manager = OWLManager.createOWLOntologyManager();
// Load the ontology from a local files
File file = new File(args[0]);
System.out.println("Loaded ontology: " + file);
OWLOntology ontology = manager.loadOntologyFromOntologyDocument(file);
// save the file into RDF/XML format
//in this case, my ontology file and format already manage prefix
RDFXMLOntologyFormat rdfxmlFormat = new RDFXMLOntologyFormat();
manager.saveOntology(ontology, rdfxmlFormat, IRI.create(file));
}
}
2- Using a Java IDE such as Eclipse, manage all the dependencies (Maven repository, downloaded jars, classpath, etc.)
3- Create your bash script my-script.sh; this one is absolutely not optimized
#!/bin/bash
cd your-dir
for i in *
do
#get the absolute path; be careful, realpath comes with the latest coreutils package
r=$(realpath "$i")
#to avoid problems with relative paths when running java -jar, I put the .jar in the parent directory
cd ..
java -jar owl2rdf.jar "$r"
cd your-dir
done
echo "Conversion finished, see below if there are errors."
4- Execute your script
$ chmod +x my-script.sh; ./my-script.sh
5- Aha moment: all your OWL/XML files are converted to RDF/XML. You can, for example, import them into a Fuseki or Sesame database.

Program that cannot access input from a file

I have a program that scans a file and then closes the file.
import java.util.*;
import java.io.*;
public class FileTester{
public static void main(String[] args) throws IOException {
File test = new File("MyDatta.in.txt");
Scanner sf = new Scanner(test);
sf.close();
}
}
When I run the program I get an error message like this:
Exception in thread "main" java.io.FileNotFoundException: MyDatta.in.txt (No such file or directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:120)
at java.util.Scanner.<init>(Scanner.java:636)
at FileTester.main(FileTester.java:6)
I have a Mac running Mac OS. I have reason to believe the problem is the path to my file, which is in Documents. I know that on Windows one would use C:\folder name\file to scan it, but I don't know the equivalent on a Mac and I cannot find it anywhere.
From Java documentation for FileInputStream: "If the named file does not exist, is a directory rather than a regular file, or for some other reason cannot be opened for reading then a FileNotFoundException is thrown."
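Maybe the file is being used by another program?
You should use the path of the file in your Documents folder:
~/Documents/MyDatta.in.txt
to tell Java that your file is in Documents. The ~ stands for your home folder; note that it is the shell that expands it, not Java, so in code spell out the home folder (e.g. via System.getProperty("user.home")).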
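One caveat: the JVM does not expand ~ by itself (that is done by the shell), so new File("~/Documents/...") would look for a folder literally named "~". A minimal sketch that builds the same location from the user.home system property; the class name HomePaths is made up, and it assumes the folder is called Documents:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class HomePaths {
    // Builds <user.home>/Documents/<fileName> without relying on shell expansion of "~".
    public static Path inDocuments(String fileName) {
        return Paths.get(System.getProperty("user.home"), "Documents", fileName);
    }

    public static void main(String[] args) {
        Path p = inDocuments("MyDatta.in.txt");
        System.out.println(p + " exists? " + Files.exists(p));
    }
}
```

The resulting path can be passed to the Scanner with new Scanner(p.toFile()).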

Java - FilenotfoundException for reading text file

When I run this...
File file = new File("Highscores.scr");
I keep getting this error, and I really don't know how to get around it.
The file is currently sitting in my source packages with my .java files.
I can quite easily read the file by specifying the full path, but I intend to run this on multiple computers, so I need the file to be portable with the program.
This question isn't about reading the text file but rather about specifying its location without using an absolute path.
I've searched for the answer, but the answers I get are just "specify the name" and "specify the absolute path".
I'd post an image to make it clearer, but I don't have the 10 rep to do so :/
How do I do this?
Cheers.
The best way to do this is to put the file on your classpath and then load it with getResource():
package com.sandbox;
import org.apache.commons.io.FileUtils;
import java.io.File;
import java.io.IOException;
import java.net.URISyntaxException;
import java.net.URL;
public class Sandbox {
public static void main(String[] args) throws URISyntaxException, IOException {
new Sandbox().run();
}
private void run() throws URISyntaxException, IOException {
URL resource = Sandbox.class.getResource("/my.txt");
File file = new File(resource.toURI());
String s = FileUtils.readFileToString(file);
System.out.println(s);
}
}
I'm doing this because I'm assuming you need a File. But if you have an api which takes an InputStream instead, it's probably better to use getResourceAsStream instead.
Notice the path, /my.txt. That means, "get a file named my.txt that is in the root directory of the classpath". I'm sure you can read more about getResource and getResourceAsStream to learn more about how to do this. But the key thing here is that the classpath for the file will be the same for any computer you give the executable to (as long as you don't move the file around in your classpath).
BTW, if you get a null pointer exception on the line that does new File, that means that you haven't specified the correct classpath for the file.
As far as I remember, the default directory will be your project folder. Put the file one level higher.
-Project/
----src/
----test/
-Highscores.scr
If you are building your code in Eclipse, you need to put Highscores.scr in your project folder. Try that and check.
You can try to run the following sample program to check which is the current directory your program is picking up.
File f = new File(".");
System.out.println("Current Directory is: " + f.getAbsolutePath());

Creating and using .exe in JAR

I have some code written in Java that executes some .exe file. It first creates a temporary file from where it executes it and then destroys that file after the execution is done.
The only problem with this is that it requires the executable file to be in the same package as the class that has the main function. I want to place and access my .exe file from other locations as well, because when I build the JAR file of my project it never executes that .exe file.
Is there some other way by which my .exe file can also be a part of my JAR file irrespective of its location in my project?
Here's the code :
package com.web.frame;
import java.io.File;
import java.io.FileOutputStream;
import java.io.InputStream;
//import java.net.URL;
public class Test{
public void Test1(String fileAddr,String filename, String destFilenam) throws Exception {
// Place .exe into the package folder.
InputStream src =this.getClass().getResourceAsStream("DECRYPT.exe");
if(src!=null) {
File exeTempFile = File.createTempFile("dspdf", ".exe");
byte []ba=new byte[src.available()];
src.read(ba,0,ba.length);
exeTempFile.deleteOnExit();
FileOutputStream os=new FileOutputStream(exeTempFile);
os.write(ba,0,ba.length);
os.close();
String hello=exeTempFile.getParent();
System.out.println("Current Directory Of file : "+hello);
String hello1=exeTempFile.getName();
System.out.println("Full Name Of File : "+hello1);
int l=hello1.length();
l=l-4;
char[] carray=hello1.toCharArray();
String s = new String(carray,0,l);
System.out.println(s);
String param="cmd /c cd "+hello+" && "+s+" d 23 11 23 "+fileAddr+"\\"+filename+" "+destFilenam;
Runtime.getRuntime().exec(param);
Runtime.getRuntime().exec("c:\\Program Files\\VideoLAN\\VLC\\vlc.exe "+hello+"\\"+destFilenam);
Runtime.getRuntime().exec("cmd /c del "+hello+"\\"+destFilenam);
}
else
System.out.println("Executable not found");
}
}
The most you can do is place your exe anywhere on your classpath. Ensure that the jar's manifest has a Class-Path entry. Then you should be able to access the exe by calling Test.class.getResourceAsStream("")
So create a folder, put the exe into that folder and include the folder in your classpath.
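A note on the lookup itself: with Class.getResourceAsStream, a name without a leading "/" is resolved relative to the package of the class, which is why the exe had to sit next to the class; a name starting with "/" is resolved from the root of the classpath. The sketch below combines that with a safer extraction step. The location "/tools/DECRYPT.exe" and the class name are hypothetical. Files.copy reads to the end of the stream, so it does not depend on available() reporting the full size.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class TempExtractor {
    // Copies the whole stream into a fresh temporary file and returns its path.
    public static Path extractToTemp(InputStream src, String prefix, String suffix) throws IOException {
        Path temp = Files.createTempFile(prefix, suffix);
        temp.toFile().deleteOnExit();
        try (InputStream in = src) {
            // createTempFile already created the file, hence REPLACE_EXISTING.
            Files.copy(in, temp, StandardCopyOption.REPLACE_EXISTING);
        }
        return temp;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical classpath location; the leading "/" means "from the classpath root".
        InputStream src = TempExtractor.class.getResourceAsStream("/tools/DECRYPT.exe");
        if (src == null) {
            System.out.println("Executable not found");
            return;
        }
        Path exe = extractToTemp(src, "dspdf", ".exe");
        System.out.println("Extracted to " + exe);
    }
}
```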
