File from a Hadoop distributed cache is presented as a directory - java

When using the DistributedCache in Hadoop, I push the files from HDFS in the driver class like this:
FileSystem fileSystem = FileSystem.get(getConf());
DistributedCache.createSymlink(conf);
DistributedCache.addCacheFile(fileSystem.getUri().resolve("/dumps" + "#" + "file.txt"), job.getConfiguration());
Then, to read the file, in the setup() of Mapper I do:
Path localPaths[] = context.getLocalCacheFiles();
The file is located in the cache, under the path /tmp/solr-map-reduce/yarn-local-dirs/usercache/user/appcache/application_1398146231614_0045/container_1398146231614_0045_01_000004/file.txt. But when I read it, I get an IOException: file is a directory.
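For reference, the whole setup() looks roughly like this (a minimal sketch; MyMapper and the line-reading loop are illustrative stand-ins for the real code):
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // Resolve the localized cache files for this task.
        Path[] localPaths = context.getLocalCacheFiles();
        // Opening the first entry is what fails with "file is a directory".
        try (BufferedReader reader = new BufferedReader(new FileReader(localPaths[0].toString()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // ... use each line of file.txt ...
            }
        }
    }
}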
How can one go about solving this?

Related

Can't write a file into a macOS folder, FileNotFoundException

I am trying to write a file into an existing local folder, but it seems there is a security or permission issue.
String path = con.getConfigValue("CustomPathImgUpload"); //return /Users/name/Documents/
File file = new File(path + date + "_" + fileName);
I get the error below:
java.io.FileNotFoundException: /Users/name/Documents/20221010_abc.xlsx (No such file or directory)
    at java.base/java.io.FileInputStream.open0(Native Method)
    at java.base/java.io.FileInputStream.open(FileInputStream.java:211)
I checked the Users folder: system has read/write.
I checked the name folder: name has read/write.
I checked the Documents folder: name has read/write.
Eclipse and Chrome already have Full Disk Access.
This problem started after updating from macOS Big Sur to Monterey.
How to solve this issue?
EDIT
I tried changing the path to /Users/ and /Applications/; the folder is still not found.
To make sure it is not a source-code mistake, I ran the same code in Eclipse on Windows; the file saved correctly.
I added /usr/bin/java, javaw, javap, etc. to Full Disk Access; it still does not work.
EDIT 2
I tried using System.getProperty("user.home") + File.separator + date + fileName,
which returns "/Users/name/20221010_abc.xlsx", but I still get "no such file or directory". :'(
Is it because of my Users folder permissions?
Or is there any way to check which parts of a path exist?
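As a side note to that last question, here is a minimal, hedged sketch (standard java.nio, nothing macOS-specific) that walks from the target path up to the root and reports which components exist and are writable, so you can see where resolution stops:
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class PathCheck {
    public static void main(String[] args) {
        Path target = Paths.get(System.getProperty("user.home"), "20221010_abc.xlsx");
        // Walk from the target up to the root, reporting each component.
        for (Path p = target; p != null; p = p.getParent()) {
            System.out.println(p + "  exists=" + Files.exists(p)
                    + "  writable=" + Files.isWritable(p));
        }
    }
}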

Transfer file from SFTP to ADLS

We are currently exploring the sshj library to download a file from an SFTP path into ADLS. We are using the example as a reference.
We have already configured the ADLS Gen2 storage in Databricks to be accessed as an abfss URL.
We are using Scala within Databricks.
How should we pass the abfss path as a FileSystemFile object in the get step?
sftp.get("test_file", new FileSystemFile("abfss://<container_name>@<storage_account>.dfs.core.windows.net/<path>"));
Is the destination supposed to be a file path only, or a file path with a file name?
Use streams. First obtain an InputStream for the source SFTP file:
RemoteFile f = sftp.open(sftpPath);
InputStream is = f.new RemoteFileInputStream(0);
(How to read from the remote file into a Stream?)
Then obtain an OutputStream for the destination file on ADLS:
OutputStream os = adlsStoreClient.createFile(adlsPath, IfExists.OVERWRITE);
(How to upload and download a file from my locale to azure adls using java sdk?)
And copy from the first to the other:
is.transferTo(os);
(Easy way to write contents of a Java InputStream to an OutputStream)
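Putting the three steps together, a minimal sketch (assuming sshj's SFTPClient and the ADLStoreClient from the Azure Data Lake Store SDK used in the linked answer; sftpPath and adlsPath are placeholders):
import java.io.InputStream;
import java.io.OutputStream;
import com.microsoft.azure.datalake.store.ADLStoreClient;
import com.microsoft.azure.datalake.store.IfExists;
import net.schmizz.sshj.sftp.RemoteFile;
import net.schmizz.sshj.sftp.SFTPClient;

public class SftpToAdls {
    // Streams one file from SFTP into ADLS without staging it on local disk.
    static void copy(SFTPClient sftp, ADLStoreClient adls,
                     String sftpPath, String adlsPath) throws Exception {
        try (RemoteFile remote = sftp.open(sftpPath);
             InputStream is = remote.new RemoteFileInputStream(0); // read from offset 0
             OutputStream os = adls.createFile(adlsPath, IfExists.OVERWRITE)) {
            is.transferTo(os); // Java 9+: copies all remaining bytes
        }
    }
}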

How to add jars on hive shell in a java application

I am trying to add a jar in the Hive shell. I am aware of the global option on the server, but my requirement is to add jars per session in the Hive shell.
I have used this class for the hdfs dfs commands, to add the jars to the HDFS filesystem.
This is what I have tried:
Created a folder /tmp on HDFS.
Added the file to the HDFS filesystem using the FileSystem.copyFromLocalFile method (equivalent to hdfs dfs -put myjar.jar /tmp).
Set permissions on the file on the fs filesystem.
Checked that the jar was loaded to HDFS using the getFileSystem method.
Listed files on the fs FileSystem using listFiles to confirm the jars are there.
This works and I have the jars loaded to HDFS, but I cannot add the jars to the Hive session.
When I try to add one in the Hive shell, I do the following:
statement = setStmt(createStatement(getConnection()));
query = "add jar " + path;
statement.execute(query);
I am getting this error (for example, with a path of /tmp/myjar.jar):
Error while processing statement: /tmp/myjar.jar does not exist
Other permutations of the path, such as
query = "add jar hdfs://<host>:<port>" + path;
query = "add jar <host>:<port>" + path;
also result in an error.
The command to list jars works (but returns no results):
query = "list jars";
ResultSet rs = statement.executeQuery(query);
I managed to solve this issue.
The process failed because of the configuration of the FileSystem.
This object is where we upload the jars to before adding them in the session.
This is how you initialize the FileSystem:
FileSystem fs = FileSystem.newInstance(conf);
The conf object should have the properties of the Hive server.
In order for the process to work, I needed to set the following property on the Configuration object:
conf.set("fs.defaultFS", hdfsDstStr);
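End to end, a minimal sketch of the working flow (hdfsDstStr, the JDBC connection and the local jar path are placeholders):
import java.sql.Connection;
import java.sql.Statement;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AddJarPerSession {
    static void addJar(Connection hiveConnection, String hdfsDstStr, String localJar)
            throws Exception {
        // Point the client at the same HDFS the Hive server uses; otherwise
        // "add jar /tmp/myjar.jar" resolves against the wrong filesystem.
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", hdfsDstStr);
        FileSystem fs = FileSystem.newInstance(conf);

        // Equivalent to: hdfs dfs -put myjar.jar /tmp
        Path dst = new Path("/tmp/myjar.jar");
        fs.copyFromLocalFile(new Path(localJar), dst);

        // Register the jar for this Hive session only.
        try (Statement statement = hiveConnection.createStatement()) {
            statement.execute("add jar " + dst);
        }
    }
}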

Hadoop Java - copy file from windows share folder server to HDFS

I want to upload multiple files from a Windows share folder (e.g. //server_name/folder/) to HDFS using Java.
Here is the list of methods I have tried:
org.apache.hadoop.fs.FileUtil with the input path set to //server_name/folder/: it says java.io.FileNotFoundException: File //server_name/folder/ does not exist
FileSystem.copyFromLocalFile (I think this copies from the local Hadoop server to the HDFS server)
IOUtils.copyBytes: same as FileUtil, file does not exist
a simple File.renameTo: same as FileUtil, file does not exist
String source_path = "\\\\server_name\\folder\\xxx.txt";
String hdfs_path = "hdfs://HADOOP_SERVER_NAME:Port/myfile/xxx.txt";
File srcFile = new File(source_path);
File dstFile = new File(hdfs_path);
srcFile.renameTo(dstFile);
Do I need to set up FTP, or how about using FTPFileSystem?
Or does anyone have a better solution or sample code?
Thank you.
FileSystem has a copyFromLocalFile method:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

Configuration configuration = new Configuration();
configuration.set("fs.defaultFS", "hdfs://abc:9000");
FileSystem fs = FileSystem.get(configuration);
fs.copyFromLocalFile(new Path("/source/directory/"),
        new Path("/user/hadoop/dir"));
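Since the question mentions multiple files, a hedged variation: if the JVM runs on a machine that can see the share as a local path (for example on Windows, or with the share mounted), each file can be copied in a loop. fs is the FileSystem from above; the paths are placeholders:
import java.io.File;
import java.io.FileNotFoundException;
import org.apache.hadoop.fs.Path;

File srcDir = new File("//server_name/folder/"); // must be reachable from this JVM
File[] files = srcDir.listFiles(File::isFile);
if (files == null) {
    throw new FileNotFoundException("Share not reachable: " + srcDir);
}
for (File f : files) {
    // Upload each regular file under its own name.
    fs.copyFromLocalFile(new Path(f.getAbsolutePath()),
            new Path("/user/hadoop/dir/" + f.getName()));
}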

Backup and restore of Hsqldb database in java code

I am new to the HSQLDB database. I want to know how to back up and restore an HSQLDB database from Java code.
Use the BACKUP DATABASE TO command.
Here is a link to the documentation:
HSQLDB System Management Documentation
I haven't tested this, but I imagine it's something along the lines of:
String backup = "BACKUP DATABASE TO " + "'" + filePath + "' BLOCKING";
PreparedStatement preparedStatement = connection.prepareStatement(backup);
preparedStatement.execute();
You'll want to wrap it in a try-catch block of course.
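For instance, a minimal sketch with try-with-resources (the connection and filePath are placeholders; per the documentation, filePath should name a directory and end with a slash):
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

class HsqldbBackup {
    static void backup(Connection connection, String filePath) {
        String backup = "BACKUP DATABASE TO '" + filePath + "' BLOCKING";
        try (PreparedStatement ps = connection.prepareStatement(backup)) {
            ps.execute();
        } catch (SQLException e) {
            e.printStackTrace(); // handle or rethrow as appropriate
        }
    }
}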
As far as restoring the db goes, I think you have to perform that while the database is offline, using the DbBackupMain application. So you would issue this command at the command line:
java -cp hsqldb.jar org.hsqldb.lib.tar.DbBackupMain --extract tardir/backup.tar dbdir
Each HyperSQL database is called a catalog. There are three types of catalog, depending on how the data is stored:
mem: stored entirely in RAM, without any persistence beyond the JVM process's life
file: stored in filesystem files
res: stored in a Java resource, such as a Jar, and always read-only
To back up a running catalog, obtain a JDBC connection and issue a BACKUP DATABASE command in SQL. In its simplest form, the command below will back up the database as a single .tar.gz file to the given directory.
BACKUP DATABASE TO <directory name> BLOCKING [ AS FILES ]
The directory name must end with a slash to distinguish it as a directory, and the whole string must be in single quotes like so: 'subdir/nesteddir/'.
To back up an offline catalog, the catalog must be in a shut-down state. You will run a Java command like
java -cp hsqldb.jar org.hsqldb.lib.tar.DbBackupMain --save tardir/backup.tar dbdir/dbname
where tardir/backup.tar is the path of the *.tar or *.tar.gz file to be created in your file system, and dbdir/dbname is the path of the catalog file base name. In this example, the database is named dbname and is in the dbdir directory; the backup is saved to a file named backup.tar in the tardir directory.
You use DbBackup on your operating system command line to restore a catalog from a backup.
java -cp hsqldb.jar org.hsqldb.lib.tar.DbBackupMain --extract tardir/backup.tar dbdir
where tardir/backup.tar is a file path to the *.tar or *.tar.gz file to be read, and dbdir is the target directory to extract the catalog files into. Note that dbdir specifies a directory path, without the catalog file base name. The files will be created with the names stored in the tar file.
For more details, refer to the HSQLDB System Management documentation.
So, in Java + Spring + JdbcTemplate:
Backup (On-line):
@Autowired
public JdbcTemplate jdbcTemplate;

public void mainBackupAndRestore() throws IOException {
    ...
    jdbcTemplate.execute("BACKUP DATABASE TO '" + sourceFile.getAbsolutePath() + "' BLOCKING");
}
This will save the .properties, .script and .lobs files to a tar in sourceFile.getAbsolutePath().
Restore:
DbBackupMain.main(new String[] { "--extract", baseDir.getAbsolutePath(),
System.getProperty("user.home") + "/restoreFolder" });
This will take the files from baseDir.getAbsolutePath() and put them in userHome/restoreFolder, where you can check that the restore is OK.
The .lobs file contains LOB/BLOB data; the .script file contains the SQL statements that recreate the database.
