Is it possible to create a file instance by putting the uri of my HDFS as File class's constructor? For example:
val conf = new Configuration()
conf.addResource(hdfsCoreSitePath)
conf.addResource(hdfsHDFSSitePath)
val uri = conf.get("fs.default.name")
val file = new File(uri + pathtothefile)
Then, with the file instance, I wish to access the file listing with the functions provided by the File class, such as file.list(), which returns an array of strings naming the files and directories in the directory denoted by this abstract pathname. I tried the code, but file.list() returns null.
The approach below is not what I want, as I am trying to write the same codebase for the normal file system and HDFS to keep the code reusable.
val fileSystem = FileSystem.get(conf)
val status = fileSystem.listStatus(new Path(filepath))
status.map(x => ...
The regular built-in Java/Scala File APIs will not work for HDFS files. The protocol and implementation are too different. You have to use the Hadoop API to access HDFS files as in your second example.
The good news, though, is that the Hadoop API will work for non-HDFS files (regular files). So that code is reusable. Just use a URI like: file:///foo/bar for a local file.
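For illustration, here is a minimal Java sketch of that idea (the class name and example paths are placeholders, and it assumes the Hadoop client libraries are on the classpath). The same code lists a local directory or an HDFS directory depending only on the URI scheme:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListFiles {
    public static void main(String[] args) throws Exception {
        // e.g. "file:///tmp/data" or "hdfs://namenode:8020/data"
        String uri = args[0];
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        for (FileStatus status : fs.listStatus(new Path(uri))) {
            System.out.println(status.getPath());
        }
    }
}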
fs.default.name is deprecated. Use fs.defaultFS instead, and make sure the property is available in the core-site.xml file that you are adding with the line below:
conf.addResource(hdfsCoreSitePath)
https://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-common/core-default.xml
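As a quick check, a small Java sketch for reading the non-deprecated key (the core-site.xml path below is only a placeholder for your own file):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

Configuration conf = new Configuration();
conf.addResource(new Path("/etc/hadoop/conf/core-site.xml")); // placeholder path to your core-site.xml
String defaultFs = conf.get("fs.defaultFS"); // e.g. hdfs://namenode:8020
System.out.println("Default file system: " + defaultFs);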
Related
I am working on a Java Maven project, and I have gotten to a point where I need to determine if my input from HDFS is either a directory of CSV files or a Parquet file. From my understanding, and I could be wrong, I believe HDFS stores Parquet files as directories.
My question is, what might be a good way of determining the difference between these two potential inputs so that I can handle each of them appropriately?
You can use Hadoop FileSystem API.
If you want to check whether an hdfsPath is a directory or a file use getFileStatus:
Path path = new Path(hdfsPath);
FileSystem fs = path.getFileSystem(conf);
FileStatus fileStatus = fs.getFileStatus(path);
if (fileStatus.isFile()) {
// .... logic for file
} else {
// ... logic for directory
}
To check whether the directory contains Parquet or CSV files, you can use the listStatus method to list the files under that directory, and for each file check its extension to determine its type (.csv or .parquet).
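A hedged sketch of that check, continuing from the fs and path variables above (the extension test is a heuristic based on file names, not a guarantee of the actual format):

FileStatus[] children = fs.listStatus(path);
boolean hasParquet = false;
boolean hasCsv = false;
for (FileStatus child : children) {
    String name = child.getPath().getName();
    if (name.endsWith(".parquet")) {
        hasParquet = true;
    } else if (name.endsWith(".csv")) {
        hasCsv = true;
    }
}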
I'm loading a network path in my Java code. It is not keeping the same format as in the configuration file; one slash is missing.
Example:
String path = "//abckatte.com/abc/test";
File fileobj = new File(path);
Whenever I look at fileobj in the log message, it is displayed as /abckatte.com/abc/test. One slash is missing.
I tried prepending two more slashes, like:
String path = "////abckatte.com/abc/test";
but that does not work either.
You could make use of Apache Commons VFS 2, as it provides access to several file systems. Check its documentation on local files; the URI form for a UNC path is file://///somehost/someshare/afile.txt.
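A minimal sketch, assuming the commons-vfs2 dependency is on the classpath (the host and share are taken from the question and are just examples):

import org.apache.commons.vfs2.FileObject;
import org.apache.commons.vfs2.FileSystemManager;
import org.apache.commons.vfs2.VFS;

public class ListShare {
    public static void main(String[] args) throws Exception {
        FileSystemManager manager = VFS.getManager();
        // UNC path from the question, written in VFS's five-slash local-file form
        FileObject share = manager.resolveFile("file://///abckatte.com/abc/test");
        for (FileObject child : share.getChildren()) {
            System.out.println(child.getName().getBaseName());
        }
    }
}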
How do I write to a properties file in a Java package from a Java class in another package?
Here is the code for writing the properties file:
String filePath1 = "com/...../application.properties";
File applicationProperties = new File(filePath1);
FileOutputStream fileOutputStream = new FileOutputStream(applicationProperties);
Date todayDate = new Date();
SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
properties.setProperty("application.database.backup.date", sdf.format(todayDate));
properties.store(fileOutputStream, "storing index values to properties file");
fileOutputStream.close();
I am getting a FileNotFoundException, but the file exists in this package. When reading the file as below, I get the expected output:
String filePath = "com/....../application.properties";
InputStream inputStream = getClass().getResourceAsStream(filePath);
Properties properties = new Properties();
properties.load(inputStream);
if (properties.getProperty("application.grouping.mode") != null && !properties.getProperty("application.grouping.mode").isEmpty()) {
String lastBackupDate = properties.getProperty("application.grouping.mode");
}
How do I solve this exception?
There are three problems here, which are related. Basically, you're assuming that because you can read from a resource, you can write to a file in the same folder structure, relative to the current directory. That's a flawed assumption because:
The resources may not be on the file system as separate files to start with. For example, Java applications are usually packaged up into jar files. The classloader knows how to read resources from a jar file, but the folder structure isn't present on disk.
Even if the files are on disk as separate files in the right folder structure, they may not be rooted in the process's working directory. For example:
# Running in /home/jon/Documents
$ java -cp /home/jon/java/bin com.foo.bar.SomeApplication
Here SomeApplication.class would be in /home/jon/java/bin/com/foo/bar, but new File("com/foo/bar/something.properties") would refer to /home/jon/Documents/com/foo/bar/something.properties.
Finally, even if you were trying to write to the right place, you may not have write access - very often the binaries for applications are stored in read-only directories, with the reasonable justification that the code and read-only application data should be separated from the application's changing state. Aside from anything else, this makes updates/repairs much easier - just blow away the old version's directory, knowing that you won't have lost any user data.
Your context isn't clear, but I would suggest that you find some appropriate way of passing a filename to the application, and write to that and read from it where you need to.
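If, for instance, the file location were passed in as a system property, a minimal sketch might look like this (the property name app.config and the default path are placeholders, not something your code already defines):

import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Properties;

public class ConfigStore {
    public static void main(String[] args) throws Exception {
        // e.g. java -Dapp.config=/var/myapp/application.properties ...
        Path configPath = Paths.get(System.getProperty("app.config", "application.properties"));
        Properties properties = new Properties();
        if (Files.exists(configPath)) {
            try (InputStream in = Files.newInputStream(configPath)) {
                properties.load(in);
            }
        }
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        properties.setProperty("application.database.backup.date", sdf.format(new Date()));
        try (OutputStream out = Files.newOutputStream(configPath)) {
            properties.store(out, "storing index values to properties file");
        }
    }
}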
Make sure your properties file is on the classpath. After that you should be able to load it as:
String filePath = "/application.properties";
InputStream inputStream = getClass().getResourceAsStream(filePath);
Your com folder is in the src directory, so your file path must be src/com/...... It must start with src/, not com/.
My app needs to get an existing file for processing. I have the path of the file in String format; how can I get the File from it? Is it correct to do this:
File fileToSave = new File(dirOfTheFile);
Here dirOfTheFile is the path of the file. If I implement it this way, will I get the existing file, or will the system create another file for me?
That's what you want to do. If the file exists you'll get it; otherwise you can create it yourself. You can check whether the file exists by calling fileToSave.exists() and act appropriately if it does not.
The new keyword is creating a File object in code, not necessarily a new file on the device.
I would caution you not to use a hardcoded path for dirOfTheFile. For example, if you're accessing external storage, call Environment.getExternalStorageDirectory() instead of hardcoding /sdcard.
The File object is just a reference to a file (a wrapper around the path of the file); creating a new File object does not actually create or read the file. To actually perform operations on the path in question, use FileInputStream to read, FileOutputStream to write, or the various File helper methods (like exists(), createNewFile(), etc.). Note that, as others have pointed out, you should use one of the utilities provided by the system to locate directories on the internal or external storage, depending on where you want your files.
Try this:
File fileToSave = new File(dirOfTheFile);
if(fileToSave.exists())
{
// the file exists. use it
} else {
// create file here
}
If the parent folder is not there, you may have to call fileToSave.getParentFile().mkdirs() to create the parent folders.
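A small sketch of that, under the assumption that you want to create the file (and any missing parent directories) when it does not exist yet:

File fileToSave = new File(dirOfTheFile);
if (!fileToSave.exists()) {
    File parent = fileToSave.getParentFile();
    if (parent != null && !parent.exists()) {
        parent.mkdirs();            // create any missing parent directories
    }
    fileToSave.createNewFile();     // creates an empty file on disk (throws IOException on failure)
}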
I am trying to load multiple files/directories in Spark using Java. I have found a few examples of how to do this in Scala; can someone give an example, with an explanation, of how to do this in Java?
In particular, I would like to use regex-like paths so that I do not have to specify a fully qualified name for each file. I can already give comma-separated values with fully qualified names.
I am loading from the local file system; I don't know if this makes a difference.
The following is the code I have used to load the files:
SparkConf sparkConf = new SparkConf().setAppName("TableAggregator");
JavaSparkContext ctx = new JavaSparkContext(sparkConf);
JavaRDD<String> lines = ctx.textFile(args[0], 1);
In Spark, the method textFile() takes a URI for the file (either a local path on the machine or an hdfs:// etc. URI).
You can run this method on directories, compressed files and wildcards:
ctx.textFile("data.txt");
ctx.textFile("/your/directory/");
ctx.textFile("/your/directory/*");
ctx.textFile("/your/directory/*.gz");
Be aware that when you use a path for your input, it has to be the same path for all the worker nodes. So you have to copy the file to all workers or use a shared network-mounted file system.
So you can simply use a wildcard pattern to load multiple files.
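Putting that together with the Java code from the question, a hedged sketch (the directory names below are placeholders, and ctx is the JavaSparkContext already created above):

// A wildcard covering many files in one directory...
JavaRDD<String> csvLines = ctx.textFile("/your/directory/*.csv");

// ...or several comma-separated paths mixing files, directories and patterns.
JavaRDD<String> mixed = ctx.textFile("/your/a.txt,/your/directory/,/your/logs/*.gz");

textFile() also accepts a comma-separated list of paths (as you are already doing), so both forms can be combined in a single call.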