I am trying to find a way to read just the end of a huge, constantly growing log file (say, the last 20-30 lines) via SFTP from a server, to save the point up to which I have read, and, if I need more lines, to read further upward from that point.
Everything I've tried takes too long. I tried copying the file to the local machine and then reading it from the end with ReversedLinesFileReader, because that class needs a File object while SFTP only gives you an InputStream; downloading the whole file takes far too long.
I also tried counting the lines and reading from line n, but that is slow as well and throws an exception, because sometimes the file is modified in the meantime. Another approach was to connect via SSH and run tail -100, which gives the desired result, but only once: the next call returns the new log lines again, while I need to move further upward. Is there a fast way to get the end of the file, save that position, and later read more above it? Any ideas?
You don't say what SFTP library you're using, but the most widely used Java SSH/SFTP library is JSch, so I'll assume you're using that.
The SFTP protocol has operations to perform random-access I/O on remote files. Unfortunately, the JSch SFTP client doesn't expose the full range of operations. However, it does have versions of the get operation (for getting a file from the remote server) which permit skipping over the first part of the remote file. You can use one of these operations to read for example the last 10 KB of a file.
Several of the JSch get operations return an InputStream. You can read the contents of the remote file from the input stream. If you want to access the remote file line by line, you can convert it to a Reader using an InputStreamReader.
So, a process might do the following (sketched in code after the steps):
Call stat() on the remote file to get its size.
Figure out where in the file you want to start reading from. You could keep track of where you stopped reading last time, or you could guess based on the amount of data you're willing to download and the expected size in bytes of these last 20-30 lines.
Call get() with that offset to start reading the file.
Process data read from the InputStream returned by the get() call.
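A sketch of those steps, assuming JSch and Java 7+; the host, credentials, remote path, and the 10 KB window are all placeholders:

import com.jcraft.jsch.ChannelSftp;
import com.jcraft.jsch.JSch;
import com.jcraft.jsch.Session;
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class RemoteLogTail {
    public static void main(String[] args) throws Exception {
        Session session = new JSch().getSession("user", "host", 22); // placeholder credentials
        session.setPassword("secret");
        session.setConfig("StrictHostKeyChecking", "no"); // for a quick test only
        session.connect();
        ChannelSftp sftp = (ChannelSftp) session.openChannel("sftp");
        sftp.connect();

        String path = "/var/log/app.log";            // placeholder remote path
        long size = sftp.stat(path).getSize();       // step 1: remote file size
        long offset = Math.max(0, size - 10 * 1024); // step 2: start ~10 KB from the end

        // steps 3-4: get() with a skip offset returns an InputStream positioned
        // at 'offset'; note the first line read will usually be partial
        try (InputStream in = sftp.get(path, null, offset);
                BufferedReader reader = new BufferedReader(
                        new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
        // remember 'offset' (and 'size') so the next call can fetch the
        // range just above it
        sftp.disconnect();
        session.disconnect();
    }
}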
The best solution would be some kind of rotating log files, possibly with compression.
However, rsync performs unidirectional synchronisation and can transmit only the changed parts of a file: for a log, just the new tail.
I am not sure whether it performs well enough in your case, and SSH is a prerequisite.
Related
I am writing a client-server application in Java for XML processing.
I have successfully implemented a multi-threaded server that can handle multiple clients. The XML file has a number of transaction tags that are like commands for the server, instructing it to do something (they can have a number of values, including binary data). So a single client can send multiple commands. These commands are also processed in separate threads that I create using Executors. I have successfully transmitted an XML file and executed its commands.
I am stuck while trying to incorporate additional functionality: I want the transfer to resume from the point where it stopped, say due to a network disconnection. For this I am splitting the file into bundles, transmitting them, and keeping track of the last bundle successfully transmitted (through acknowledgement), so that the transmission can resume from that bundle number. Obviously there are other fields, like the total number of bundles to be transmitted, and so on.
Now suppose some part of the XML file is transmitted and received at the server when the network is disconnected. This part may have some transaction tags or commands.
I want them to be processed by the server without waiting for the complete file to be received. In fact, the correct solution would be to read the XML as it arrives and call the command handler as soon as a complete transaction tag has been received.
One naive way of doing it would be to create my XML file line by line, that is, with line separators after each tag, and then read the socket line by line to determine the location of transaction tags. I would be really grateful if anybody could give me a better solution.
You don't need to write your XML line by line. You can use a SAX parser.
The parser will read the input and dispatch events for each tag it reads. The parser is responsible for buffering the input and notifying your code when a tag starts or ends.
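A minimal sketch with the JDK's built-in SAX parser; the transaction tag name and handleCommand() are assumptions based on your description, and a mid-document disconnect will surface as a SAXException you can catch:

import java.io.InputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class TransactionHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private boolean inTransaction;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if ("transaction".equals(qName)) {
            inTransaction = true;
            text.setLength(0); // start collecting a new transaction
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inTransaction) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("transaction".equals(qName)) {
            inTransaction = false;
            handleCommand(text.toString()); // a complete tag has arrived; act on it now
        }
    }

    private void handleCommand(String payload) {
        System.out.println("Executing: " + payload); // hand off to your Executor here
    }

    public static void consume(InputStream socketIn) throws Exception {
        // parse() blocks and reads from the socket as data arrives, so commands
        // are dispatched before the whole document has been received
        SAXParserFactory.newInstance().newSAXParser()
                .parse(socketIn, new TransactionHandler());
    }
}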
I am writing a simple command-line application that copies files from an FTP server to a local drive. Let's assume that I am using the following route definition:
File tmpFile = File.createTempFile("repo", "dat");
IdempotentRepository<String> repository = FileIdempotentRepository.fileIdempotentRepository(tmpFile);
from("{{ftp.server}}")
.idempotentConsumer(header("CamelFileName"), repository)
.to("file:target/download")
.log("Downloaded file ${file:name} complete.");
where ftp.server is something like:
ftp://ftp-server.com:21/mypath?username=foo&password=bar&delay=5
Let's assume that the files on the FTP server will not change over time. How do I check whether the copying has finished or there are still more files to copy? I need this because I want to exit my app once all files are copied.
Read about the batch consumer:
http://camel.apache.org/batch-consumer.html
The FTP consumer will set some exchange properties with the number of files in the batch, whether the current file is the last one, and so on.
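For example, a sketch reusing the route from the question (assuming Camel 2.x with Java 8 lambdas):

from("{{ftp.server}}")
    .idempotentConsumer(header("CamelFileName"), repository)
    .to("file:target/download")
    .process(exchange -> {
        int index = exchange.getProperty(Exchange.BATCH_INDEX, int.class);
        int size = exchange.getProperty(Exchange.BATCH_SIZE, int.class);
        System.out.println("File " + (index + 1) + " of " + size);
        if (exchange.getProperty(Exchange.BATCH_COMPLETE, boolean.class)) {
            // last file of this poll; stop Camel from a separate thread,
            // never from inside the route itself, or it blocks on itself
            CamelContext ctx = exchange.getContext();
            new Thread(() -> {
                try { ctx.stop(); } catch (Exception e) { e.printStackTrace(); }
            }).start();
        }
    });

Note that BATCH_COMPLETE marks the last file of a single poll, not of all polls ever; since your files never change, stopping after the first completed poll is usually what you want.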
Do you have any control over the end that publishes the FTP files? E.g. is it your server and your client or can you make a request as a customer?
If so, you could ask for a flag file to be added at the end of their batch process. This is a tiny (even single-byte) file with an agreed name that you watch for; when that file appears, you know the batch is complete.
This is a useful technique if you regularly pull down huge files and they take a long time for a batch process to copy to disk at the server end. E.g. a file is produced by some streaming process.
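The consuming side might poll for the flag before starting its download; a sketch assuming Commons Net, where the flag name batch.done and the interval are whatever you agree on:

import org.apache.commons.net.ftp.FTPClient;
import org.apache.commons.net.ftp.FTPFile;

public class BatchFlag {
    // blocks until the agreed flag file shows up in the remote directory
    static void waitForBatch(FTPClient ftp, String remoteDir) throws Exception {
        while (true) {
            FTPFile[] flags = ftp.listFiles(remoteDir, f -> "batch.done".equals(f.getName()));
            if (flags.length > 0) {
                return; // flag present: the publisher has finished writing
            }
            Thread.sleep(30_000); // check again in 30 seconds
        }
    }
}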
I am trying to upload a file. My front-end application is in PHP and the back-end engine is in Java; they communicate through the PHP/Java Bridge.
My first step was: when a file is posted to the PHP page, it retrieves the file's content.
$filedata= file_get_contents($tmpUploadedLocation);
and then passes this to a Java EJB façade, which accepts a byte array: saveFileContents(byte[] contents).
Here is how I converted $filedata into a byte array in PHP:
$bytearrayData = unpack("C*",$filedata);
and finally called the Java service (Java service object was retrieved using php-java-bridge)
$javaService->saveFileContents($bytearrayData);
This works fine if the file is small, but once the size exceeds 2.9 MB I receive an error, and hence the file contents are not saved to disk:
Fatal error: Allowed memory size of 134217728 bytes exhausted //This is PHP side error due to unpack
I am not sure how to do this; the above method does not work, and I have a few constraints:
The engine (Java) is responsible for saving and retrieving the contents.
PHP/HTML is the front-end application; it could be anything, but for now it's PHP.
PHP communicates with Java using the PHP/Java Bridge.
The EJB's methods are accessed by PHP for saving and retrieving information.
Everything was working fine with the above combination, but now it's about uploading and saving documents. It is the EJB (the application engine's access point) that will be used by any front-end application (PHP, or another Java application through a remote interface via lookups).
My question is: how can file contents be sent from PHP to Java without breaking anything (memory-wise)?
Instead of converting the file into an array, I'd try to pass it as a string: encode the contents as base64 in PHP and decode them into a byte array in Java.
Another option is to pass the file through the filesystem. Some Linux systems have /dev/shm or /run/shm mounted to a tmpfs, which is often a good way to pass temporary data between programs without incurring hard-drive overhead. A typical tmpfs algorithm is: 1) create a folder; 2) remove old files from it (e.g. files older than a minute); 3) save the new file; 4) pass the file path to Java; 5) remove the file. Step 2 is important in order not to waste RAM if steps 3-5 are not completed for some reason.
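On the Java side, either option might look roughly like this (a sketch assuming Java 8; the method names, target path, and tmpfs location are placeholders):

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Base64;

public class FileContentService {

    // option 1: the EJB accepts a base64 string instead of a PHP-unpacked array
    public void saveFileBase64(String base64Contents) throws Exception {
        byte[] bytes = Base64.getDecoder().decode(base64Contents);
        Files.write(Paths.get("/data/uploads/upload.dat"), bytes); // placeholder target
    }

    // option 2: the EJB receives only the path of a file PHP left in tmpfs
    public void saveFileFromPath(String tmpPath) throws Exception {
        Path source = Paths.get(tmpPath); // e.g. somewhere under /dev/shm
        Path target = Paths.get("/data/uploads/upload.dat");
        Files.copy(source, target);       // streamed copy, no full in-memory array
        Files.deleteIfExists(source);     // step 5: clean up the temp file
    }
}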
Users of my web application have an option to start a process that generates a CSV file (populated by some data from a database) and uploads it to an FTP server (another department will read the file from there). I'm trying to figure out how best to implement this. I use Commons Net's FTP functionality, which offers two ways to upload data to the FTP server:
storeFile(String remote, InputStream local)
storeFileStream(String remote)
It can take a while to generate all the CSV data so I think keeping a connection open the whole time (storeFileStream) would not be the best way. That's why I want to generate a temporary file, populate it and only then transfer it.
What is the best way to generate a temporary file in a webapp? Is it safe and recommended to use File.createTempFile?
As long as you don't create thousands of CSV files concurrently, the upload time doesn't matter from my point of view. Databases usually output the data row by row, and if this is already the format you need for the CSV file, I strongly recommend not using temporary files at all - just do the conversion on the fly:
Create an InputStream implementation that reads the database data row by row, converts it to CSV and publishes the data via its read() methods, for example:
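A minimal sketch of such a stream (assuming Java 8), driven by any Iterator<String[]> of rows, which you could back with a JDBC ResultSet; the naive CSV join is a placeholder for real escaping:

import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

// serves CSV bytes one row at a time; only the current row is held in memory
public class CsvRowInputStream extends InputStream {
    private final Iterator<String[]> rows;
    private byte[] current = new byte[0];
    private int pos;

    public CsvRowInputStream(Iterator<String[]> rows) {
        this.rows = rows;
    }

    @Override
    public int read() {
        while (pos >= current.length) {
            if (!rows.hasNext()) {
                return -1; // no more rows: end of stream
            }
            // real code should quote fields containing commas or quotes
            current = (String.join(",", rows.next()) + "\r\n")
                    .getBytes(StandardCharsets.UTF_8);
            pos = 0;
        }
        return current[pos++] & 0xFF;
    }
}

The upload is then just ftpClient.storeFile("report.csv", new CsvRowInputStream(rows)) and no temporary file ever exists.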
BTW: you mentioned that the conversion is done by a web application and that it can take a long time. This can be problematic, as the default web client has a timeout; a long-running process is therefore better done by a background thread that is merely triggered by the webapp interface.
It is OK to use createTempFile; new File(tmpDir, UUID.randomUUID().toString()) will do as well. Just do not use deleteOnExit(): it holds every registered path until the JVM exits, so in a long-running webapp it is a leak. Make sure you delete the file on your own.
Edit: since you WILL have the data in memory, do not store it anywhere; wrap it in a java.io.ByteArrayInputStream and use the storeFile method that takes an InputStream. A much neater and better solution.
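Something like this, where the connected FTPClient and the in-memory CSV bytes come from your existing code:

import java.io.ByteArrayInputStream;
import org.apache.commons.net.ftp.FTPClient;

public class CsvUpload {
    static void upload(FTPClient ftp, byte[] csvBytes) throws Exception {
        // wrap the in-memory data; storeFile reads the stream to completion
        ftp.storeFile("report.csv", new ByteArrayInputStream(csvBytes));
    }
}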
This problem pertains to Java.
By using RandomAccessFile, I intend to be able to modify the file without blanking it.
The FTP protocol only barely supports random-access reads and writes.
That is to say, an FTP client can use the REST command to start reading or writing from a particular offset, but when writing it will always truncate the file from that point.
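For reading, at least, resuming from an offset does work; a sketch with Commons Net, where the host, login, offset, and path are placeholders:

import java.io.InputStream;
import org.apache.commons.net.ftp.FTPClient;

public class ResumedRead {
    static void readFromOffset() throws Exception {
        FTPClient ftp = new FTPClient();
        ftp.connect("ftp.example.com"); // placeholder host
        ftp.login("user", "secret");
        ftp.setRestartOffset(1024L); // issues REST 1024 before the next transfer
        try (InputStream in = ftp.retrieveFileStream("/logs/app.log")) {
            int b;
            while ((b = in.read()) != -1) {
                System.out.write(b); // bytes from offset 1024 onward
            }
        }
        System.out.flush();
        ftp.completePendingCommand(); // finish the transfer cleanly
        ftp.logout();
        ftp.disconnect();
    }
}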