Transfer file from SFTP to ADLS - java

We are exploring the sshj library to download a file from an SFTP path into ADLS, using the example as a reference.
We have already configured the ADLS Gen2 storage in Databricks to be accessed as an abfss URL.
We are using Scala within Databricks.
How should we pass the abfss path as a FileSystemFile object in the get step?
sftp.get("test_file", new FileSystemFile("abfss://<container_name>@<storage_account>.dfs.core.windows.net/<path>"));
Is the destination supposed to be a directory path only, or a path that includes the file name?

Use streams. First, obtain an InputStream for the source SFTP file:
RemoteFile f = sftp.open(sftpPath);
InputStream is = f.new RemoteFileInputStream(0);
(How to read from the remote file into a Stream?)
Then obtain an OutputStream for the destination file on ADLS:
OutputStream os = adlsStoreClient.createFile(adlsPath, IfExists.OVERWRITE);
(How to upload and download a file from my local to azure adls using java sdk?)
And copy from the first to the other:
is.transferTo(os);
(Easy way to write contents of a Java InputStream to an OutputStream)
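
Putting the pieces together: sshj's FileSystemFile wraps a local java.io.File, so it cannot point at an abfss URL directly, which is why streaming is needed. Note that the createFile call above is from the ADLS Gen1 SDK; for a Gen2 abfss destination, one option (an assumption, not the only way) is Hadoop's FileSystem API, which a Databricks cluster provides once ABFS is configured. A minimal sketch; host, credentials, and paths are placeholders:

import java.io.InputStream;
import java.io.OutputStream;
import net.schmizz.sshj.SSHClient;
import net.schmizz.sshj.sftp.RemoteFile;
import net.schmizz.sshj.sftp.SFTPClient;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SftpToAdls {
    public static void main(String[] args) throws Exception {
        SSHClient ssh = new SSHClient();
        ssh.loadKnownHosts();
        ssh.connect("sftp.example.com");           // placeholder host
        ssh.authPassword("user", "password");      // placeholder credentials
        try (SFTPClient sftp = ssh.newSFTPClient();
             RemoteFile remote = sftp.open("test_file")) {
            // Destination includes the file name; abfss placeholders as in the question.
            Path dest = new Path("abfss://<container_name>@<storage_account>"
                    + ".dfs.core.windows.net/<path>/test_file");
            // On Databricks, pass the cluster's Hadoop configuration instead
            // (spark.sparkContext.hadoopConfiguration in Scala) so the ABFS
            // credentials configured for the workspace are picked up.
            FileSystem fs = FileSystem.get(dest.toUri(), new Configuration());
            try (InputStream in = remote.new RemoteFileInputStream();
                 OutputStream out = fs.create(dest, true)) { // true = overwrite
                in.transferTo(out);                          // Java 9+
            }
        } finally {
            ssh.disconnect();
        }
    }
}

Since fs.create() creates a file, the destination path here names the target file itself, not just a directory, which also answers the second question above.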

Related

How to download file to local machine instead of cloud desktop in Java?

I have a file on S3 which I am downloading using a s3 handler. It gets downloaded to the cloud desktop (where my code is located). I have used the following to get the path:
String home = System.getProperty("user.home");
File file = new File(home+"/Downloads/" + fileName + ".txt");
I want the file to be downloaded to the local machine instead of the cloud desktop. Is there a way to route the file from cloud desktop to local? What can be done to get the file on the local machine instead? Any ideas?

Copy file from ftp server to another folder in the same server [duplicate]

I have a CSV file, and I need to copy it and rename it in the same path.
I tried this after the FTP login:
InputStream inputStream = ftpClient.retrieveFileStream(cvs_name +".csv");
ftpClient.storeFile(cvs_name2 + ".csv",inputStream);
But when I verify the file on the server, it's empty. How can I copy a file and rename it?
I believe your code cannot work. You cannot download and upload a file over a single FTP connection at the same time.
You have two options:
Download the file completely first (to a temporary file or into memory).
The accepted answer to How to copy a file on the ftp server to a directory on the same server in java? shows the "to memory" solution. Note the outputStream.toByteArray() call; a sketch of this variant follows the code below.
Open two connections (two instances of the FTPClient) and copy the file between the instances.
InputStream inputStream = ftpClient1.retrieveFileStream(cvs_name + ".csv");
ftpClient2.storeFile(cvs_name2 + ".csv", inputStream);
inputStream.close();
ftpClient1.completePendingCommand(); // finalize the download on the first connection
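
For reference, a minimal sketch of the first option's in-memory variant, assuming Apache Commons Net's FTPClient (as in the question); method and file names are illustrative:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.apache.commons.net.ftp.FTPClient;

// Copies a file to a new name on the same server over a single
// connection by downloading it completely into memory first.
static void copyOnServer(FTPClient ftp, String from, String to) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    if (!ftp.retrieveFile(from, buffer)) {     // full download into memory
        throw new IOException("Download failed: " + ftp.getReplyString());
    }
    ftp.storeFile(to, new ByteArrayInputStream(buffer.toByteArray()));
}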

SQLite + Virtual File System?

I'm using an SQLite database and I want to open a .db file from within a Jimfs virtual file system. Using the following code I can import a file into the virtual file system:
String databaseFilePath = "...";
Configuration configuration = Configuration.unix();
FileSystem fileSystem = Jimfs.newFileSystem(configuration);
Path targetDirectory = fileSystem.getPath("/");
Path source = Paths.get(databaseFilePath);
Files.copy(source, targetDirectory.resolve(source.getFileName().toString()));
Next, when I try to open the database file, I'm running into problems:
Connection connection = DriverManager.getConnection("jdbc:sqlite:" + databaseFileName);
I cannot use Strings since the virtual file can only be referenced using the Path object. How do I open a database connection using Paths?
SQLite works on 'real' files.
To be able to store data elsewhere, you have to implement your own SQLite VFS. (This is not supported by every JDBC driver.)
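
A hedged workaround sketch, assuming a driver such as Xerial's sqlite-jdbc that opens database files through the OS: copy the database out of Jimfs into a real temporary file before connecting. Names here are illustrative:

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.sql.Connection;
import java.sql.DriverManager;

// fileSystem is the Jimfs FileSystem from the question
Path virtualDb = fileSystem.getPath("/database.db");     // illustrative name
Path realDb = Files.createTempFile("database", ".db");
Files.copy(virtualDb, realDb, StandardCopyOption.REPLACE_EXISTING);
Connection connection =
        DriverManager.getConnection("jdbc:sqlite:" + realDb.toAbsolutePath());

Files.copy handles copying between different FileSystem providers, so the Jimfs path can be the source and the default file system the target.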

File from a hadoop distributed cache is presented as directory

When using the DistributedCache in Hadoop, I manage to push files from HDFS in the driver class like this:
FileSystem fileSystem = FileSystem.get(getConf());
DistributedCache.createSymlink(conf);
DistributedCache.addCacheFile(fileSystem.getUri().resolve("/dumps" + "#" + "file.txt"), job.getConfiguration());
Then, to read the file, in the setup() of Mapper I do:
Path localPaths[] = context.getLocalCacheFiles();
The file is located in the cache, under a path /tmp/solr-map-reduce/yarn-local-dirs/usercache/user/appcache/application_1398146231614_0045/container_1398146231614_0045_01_000004/file.txt. But when I read it, I get IOException: file is a directory.
How can one go about solving this?

how to get remote linux server file inputstream

I am trying to read a file that is on a remote Linux server, but I do not know how to get an InputStream for the file using Java.
How can this be done?
Assuming that by "remote linux server" you mean "remote linux shell", you should use an SSH library like JSch. You can find a file download example here.
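A minimal JSch sketch of obtaining an InputStream over SFTP; host, credentials, and path are placeholders:

import com.jcraft.jsch.ChannelSftp;
import com.jcraft.jsch.JSch;
import com.jcraft.jsch.Session;
import java.io.InputStream;

JSch jsch = new JSch();
Session session = jsch.getSession("user", "server.example.com", 22);
session.setPassword("password");
session.setConfig("StrictHostKeyChecking", "no"); // do proper host key checks in production
session.connect();
ChannelSftp sftp = (ChannelSftp) session.openChannel("sftp");
sftp.connect();
InputStream in = sftp.get("/path/to/file");       // stream the remote file
// ... read from in, then close it and call sftp.disconnect(), session.disconnect()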
Maybe SSHJ can help you? https://github.com/shikhar/sshj
Features of the library include:
reading known_hosts files for host key verification
publickey, password and keyboard-interactive authentication
command, subsystem and shell channels
local and remote port forwarding
scp + complete sftp version 0-3 implementation
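A minimal sshj sketch of the same task (getting an InputStream for a remote file); host, credentials, and path are placeholders:

import java.io.InputStream;
import net.schmizz.sshj.SSHClient;
import net.schmizz.sshj.sftp.RemoteFile;
import net.schmizz.sshj.sftp.SFTPClient;

SSHClient ssh = new SSHClient();
ssh.loadKnownHosts();
ssh.connect("server.example.com");     // placeholder host
ssh.authPassword("user", "password");  // placeholder credentials
SFTPClient sftp = ssh.newSFTPClient();
RemoteFile file = sftp.open("/path/to/file");
InputStream in = file.new RemoteFileInputStream(); // stream the remote file's content
// ... read from in, then close everything in reverse order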
If the remote directory is mounted locally (for example over NFS or SMB), you can create a File object from the file's URI:
File f = new File(uri);
FileInputStream fis = new FileInputStream(f);
The URI should point at the mounted file, for example "file:///mnt/server/path/to/file"; note that java.io.File does not accept file: URIs with a host component.
See also the Javadoc for File(URI).
It depends on how the file is available. Is it exposed via HTTP, FTP, SFTP, or through a server you wrote yourself?
If you want to get the file over HTTP, you can use this:
HttpURLConnection connec = (HttpURLConnection) new URL("http://host/file").openConnection();
if (connec.getResponseCode() != HttpURLConnection.HTTP_OK) {
    System.err.println("Not OK");
    return;
}
System.out.println("length = " + connec.getContentLength());
System.out.println("Type = " + connec.getContentType());
InputStream in = connec.getInputStream();
// Now you can read the file content from in
There is also the JSch library, which is very good for SFTP/SCP.
You can use any Java SSH library, as mentioned in the other answers, or mount the directory containing the file as an NFS share. After mounting, you can use the usual Java file API to access the file.
