Apache Camel pull data from ftp incrementally & periodically - java

I am very new to Apache Camel and I am exploring how to create a route which pulls data from FTP periodically, for instance every 15 minutes, and pulls only new or updated files. If some files were downloaded earlier and are still the same (unchanged), the FTP loader should not load them to the destination folder again.
Any advice is warmly appreciated.
UPDATE #1
I've already noticed that I need to look at the FTP2 component, and I've actually already made progress. The last thing I want to clarify: does consumer.delay define the delay between each polling attempt? For instance, with consumer.delay=5s, at the first attempt the FTP server contains 5 files; the consumer pulls them somewhere and waits 5 s. At the second attempt the FTP server is unchanged and Camel simply does nothing. Then 5 more files arrive on the server, and after 5 seconds the consumer downloads just those newly arrived files. Or does consumer.delay make the consumer wait between each individual file download (file #1 -> 5s -> file #2 -> 5s -> etc.)?
I want to achieve the first scenario.
Also, I observed that once some files have been downloaded to the destination folder (from FTP to the local file system), these files are ignored in subsequent data loads, even if they are deleted from the local file system. How can I tell Camel to download deleted files again, and where does it store the information about already loaded files? On the other hand, it also seems to download all files on every poll, even those fetched during the first pull. Do I need to write a filter to exclude already downloaded files?

There is an FTP component for Apache Camel: http://camel.apache.org/ftp.html
Use the "consumer.delay" property to set the delay in milliseconds between each poll.
For implementation details look here: http://architects.dzone.com/articles/apache-camel-integration
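A minimal sketch of such a route, assuming the endpoint URI, credentials, and target folder (none of them are from the original post). The delay option controls the interval between whole-directory polls (the first scenario above), and noop/idempotent keep already-consumed file names from being fetched again:

import org.apache.camel.CamelContext;
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.impl.DefaultCamelContext;

public class FtpPollExample {
    public static void main(String[] args) throws Exception {
        CamelContext context = new DefaultCamelContext();
        context.addRoutes(new RouteBuilder() {
            @Override
            public void configure() {
                // delay=900000   -> poll the directory every 15 minutes (between polls, not between files)
                // noop=true      -> leave the files on the FTP server untouched
                // idempotent=true -> skip files whose names have already been consumed
                from("ftp://foo@ftp-server.example.com/inbox"
                        + "?password=secret&delay=900000&noop=true&idempotent=true")
                    .to("file:target/download")
                    .log("Downloaded ${file:name}");
            }
        });
        context.start();
        Thread.sleep(60_000);   // let the route poll for a while, then shut down
        context.stop();
    }
}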

Related

Multiple zip file download on single Spring Boot rest controller

I have a requirement where:
I have multiple files on the server location
My Response size limit is 100 MB, Max
So my use case here is to merge the files, produce them in zip format, and send the attachments to the client browser. But the client browser has only one button, "DOWNLOAD ALL". On the click of that button, all files located on the server should get downloaded to the client as multiple zip files.
For example, I have 5 files:
1.txt - 24 MB
2.txt - 30 MB
3.txt - 30 MB
4.txt - 30 MB
5.txt - 40 MB
So, by clicking the button, two zip files should get downloaded: 1.zip containing 1.txt, 2.txt, and 3.txt (because together they come to around 100 MB), and 2.zip containing 4.txt and 5.txt.
I came across several approaches on the web, such as zipping files and sending them as a response, but that sends only a single response, and the channel gets closed after the response is transferred.
http://techblog.games24x7.com/2017/07/18/streaming-large-data-as-zipped-file-from-rest-services/
https://javadigest.wordpress.com/2012/02/13/downloading-multiple-files-using-multipart-response/
Moreover, the UI can make multiple requests to the endpoint, and I have multiple users, so I may need to keep track of users and files. Any idea or implementation will be appreciated.
Thanks in advance.
"Moreover, UI can have multiple requests to the endpoint, but I have multiple users, so I may need to keep track of users and files. Any idea or implementation will be appreciated."
This can be solved by using Spring Boot Actuator and Log4j. Using Spring Actuator you can monitor where a request is coming from.
It can be done by adding the dependency:
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
and in the application.properties file:
management.endpoints.web.exposure.include=*
since by default only two endpoints are exposed over the web.
Regarding the multiple file download: is there a specific limit, e.g. each zip file must be at most 100 MB in size?
If yes, then you need to orchestrate your solution by calling multiple endpoints one by one and downloading each part.
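A minimal sketch of the server-side grouping and zipping, assuming the files are available as local java.io.File objects and that the 100 MB limit applies to the uncompressed sizes (all class and path names here are illustrative):

import java.io.File;
import java.io.FileInputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipBatcher {

    private static final long MAX_BATCH_BYTES = 100L * 1024 * 1024; // 100 MB response limit

    // Group files into batches whose combined size stays under the limit.
    public static List<List<File>> groupIntoBatches(List<File> files) {
        List<List<File>> batches = new ArrayList<>();
        List<File> current = new ArrayList<>();
        long currentSize = 0;
        for (File f : files) {
            if (!current.isEmpty() && currentSize + f.length() > MAX_BATCH_BYTES) {
                batches.add(current);
                current = new ArrayList<>();
                currentSize = 0;
            }
            current.add(f);
            currentSize += f.length();
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    // Stream one batch as a zip directly to an output stream (e.g. the servlet response).
    public static void writeBatchAsZip(List<File> batch, OutputStream out) throws Exception {
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            byte[] buffer = new byte[8192];
            for (File f : batch) {
                zip.putNextEntry(new ZipEntry(f.getName()));
                try (FileInputStream in = new FileInputStream(f)) {
                    int read;
                    while ((read = in.read(buffer)) != -1) {
                        zip.write(buffer, 0, read);
                    }
                }
                zip.closeEntry();
            }
        }
    }
}

The client could first call a "list batches" endpoint to learn how many parts exist, then download 1.zip, 2.zip, ... with one request per batch.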
Maybe I have a workaround for this: if you look at the video-streaming concept, it is essentially the same as your use case, since a huge file is served to users. So I would like to suggest it as a possible solution.
The concept of streaming is to load specific chunks of the required file based on the client's request until the stream ends. So, assuming the file size is 100 MB, the client can specify the chunk it wants by passing the Range HTTP header, and your backend should understand this header and respond with the requested bytes until the file ends.
Take a look at the link below; maybe it will give you more details:
https://saravanastar.medium.com/video-streaming-over-http-using-spring-boot-51e9830a3b8
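A rough sketch of a Range-aware Spring endpoint, assuming a local storage path and a simple "bytes=start-" header; a real implementation would also handle "start-end" ranges and validate inputs:

import java.io.RandomAccessFile;
import java.nio.file.Path;
import org.springframework.http.HttpHeaders;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestHeader;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class RangeDownloadController {

    private static final long CHUNK_SIZE = 10L * 1024 * 1024; // serve at most 10 MB per request

    @GetMapping("/files/{name}")
    public ResponseEntity<byte[]> download(
            @PathVariable String name,
            @RequestHeader(value = HttpHeaders.RANGE, required = false) String range) throws Exception {

        Path path = Path.of("/data/files", name);   // hypothetical storage location
        long fileSize = path.toFile().length();

        // Parse "bytes=start-"; default to the beginning of the file.
        long start = 0;
        if (range != null && range.startsWith("bytes=")) {
            start = Long.parseLong(range.substring(6).split("-")[0]);
        }
        long end = Math.min(start + CHUNK_SIZE - 1, fileSize - 1);

        byte[] body = new byte[(int) (end - start + 1)];
        try (RandomAccessFile raf = new RandomAccessFile(path.toFile(), "r")) {
            raf.seek(start);
            raf.readFully(body);
        }

        return ResponseEntity.status(HttpStatus.PARTIAL_CONTENT)
                .header(HttpHeaders.CONTENT_RANGE, "bytes " + start + "-" + end + "/" + fileSize)
                .header(HttpHeaders.ACCEPT_RANGES, "bytes")
                .body(body);
    }
}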

Java sftp file transfer for windows servers

My requirement is:
To push files to Windows servers from another Windows server. There will be on the order of 1000 files in a batch, and these batches will be pushed to SFTP locations.
So I need to configure multiple threads to process these files.
Issues:
First, I pick up the list of files from the source directory and iterate over that list; the iteration works fine, with each thread picking up its own file and nothing colliding.
Then I need to connect to SFTP...
Here is the question.
I need to create a separate session for each thread, right? And also separate channels too?
Does the JSch SFTP connection work for Windows servers too?
And third, when I am multithreading and call ChannelSftp.put(src, dest), it is not uploading: sometimes "pipe closed", sometimes no file, sometimes an index-out-of-bounds exception, sometimes "input stream is closed". How can I configure this, if possible? A connection is created for each thread, but when it comes to putting the file into the SFTP location it does not work.
Please, if you have any guidance on pushing files with multithreading, let me know; it will help.
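A minimal sketch of the per-thread pattern the question is describing, assuming JSch: each worker opens its own Session and ChannelSftp rather than sharing them across threads (sharing is a common cause of the "pipe closed" / "input stream is closed" errors). Host, credentials, and paths are placeholders:

import com.jcraft.jsch.ChannelSftp;
import com.jcraft.jsch.JSch;
import com.jcraft.jsch.Session;
import java.io.File;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ParallelSftpUpload {

    public static void main(String[] args) throws Exception {
        List<File> batch = List.of(new File("a.txt"), new File("b.txt")); // placeholder file list
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (File f : batch) {
            pool.submit(() -> upload(f));
        }
        pool.shutdown();
    }

    // Each call opens and closes its own Session and ChannelSftp.
    private static void upload(File file) {
        Session session = null;
        ChannelSftp sftp = null;
        try {
            JSch jsch = new JSch();
            session = jsch.getSession("user", "sftp.example.com", 22);
            session.setPassword("secret");
            session.setConfig("StrictHostKeyChecking", "no"); // for the sketch only
            session.connect();

            sftp = (ChannelSftp) session.openChannel("sftp");
            sftp.connect();
            sftp.put(file.getAbsolutePath(), "/upload/" + file.getName());
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            if (sftp != null) sftp.disconnect();
            if (session != null) session.disconnect();
        }
    }
}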

Spooldir source stop processing

I have a remote server which generates files. The server pushes files every 15 minutes to the Hadoop cluster, and these files are stored in a specific directory. We use Flume to read files from that local directory and send them to HDFS; the SpoolDir source is the suitable one for processing this data.
The problem is that Flume shuts down processing when a file is still being written into the directory.
I don't know how to make the Flume spooldir source wait for a file to be completely written before processing it.
Or how to block reading of the file until it is completely written, using a shell script or a processor.
Can someone help me?
Set the pollDelay property for the spooling directory source.
The spooldir source polls the given directory for new files at a specific interval.
The default value is 500 ms, which is too fast for many systems, so you should configure it accordingly.
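A small sketch of the relevant part of a Flume agent configuration; the agent and source names are placeholders, and the pollDelay value is just an example:

# agent1 is a placeholder agent name
agent1.sources = spool-src
agent1.sources.spool-src.type = spooldir
agent1.sources.spool-src.spoolDir = /data/incoming
# wait 60 s between directory polls instead of the default 500 ms
agent1.sources.spool-src.pollDelay = 60000

A common complementary trick is to have the producer write files under a temporary name and rename them into the spool directory only when the write is complete, since the spooldir source expects files to be immutable once they appear.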

NFS: Synchronized processing of files in cluster env

There is a process which dumps 10k files onto a shared NFS drive. I need to read and process the data from these files. I have written Java code which works great in a single-node environment. But when the code is deployed on a WAS cluster with 4 nodes, the nodes pick up and process the same files.
How can I avoid this? Is there some sort of file lock feature that I can use to fix this issue? Any help is highly appreciated.
More info:
I am using the org.apache.commons.io.monitor library to poll the NFS directory every 10 seconds. We then read and process the files and move each file to a post-process folder. As mentioned, this works great in a single-node environment. When deployed on the cluster, the nodes poll the same files and process them, which causes multiple calls with the same data to a backend service.
I am looking for an optimal solution.
PS: The application which processes the files doesn't have access to any kind of database.
Thanks in advance
"Is there some sort of file lock feature that I can use to fix this issue?" Not without doing some work on your end. You could create another file with the same name ending in .lock and have the application check to see if a lock file exists by creating the lock file and if it succeeds then it will process the file. If it fails it then knows one of the other cluster members already grabbed the lock file.

How do I know that Apache Camel route has no more files to copy

I am writing a simple command line application which copies files from an FTP server to a local drive. Let's assume that I am using the following route definition:
File tmpFile = File.createTempFile("repo", "dat");
IdempotentRepository<String> repository = FileIdempotentRepository.fileIdempotentRepository(tmpFile);

from("{{ftp.server}}")
    .idempotentConsumer(header("CamelFileName"), repository)
    .to("file:target/download")
    .log("Downloaded file ${file:name} complete.");
where ftp.server is something like:
ftp://ftp-server.com:21/mypath?username=foo&password=bar&delay=5
Let's assume that the files on the FTP server will not change over time. How do I check whether the copying has finished or whether there are still more files to copy? I need this because I want to exit my app once all files have been copied.
Read about the batch consumer:
http://camel.apache.org/batch-consumer.html
The FTP consumer will set some exchange properties with the number of files in the batch, whether the current file is the last one, etc.
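For example, a sketch that stops the application after the last exchange of a poll. It relies on the CamelBatchComplete exchange property set by batch-capable consumers, and stops the context from a separate thread so the current exchange can finish (the rest of the route is the one from the question):

from("{{ftp.server}}")
    .idempotentConsumer(header("CamelFileName"), repository)
    .to("file:target/download")
    .log("Downloaded file ${file:name} complete.")
    .process(exchange -> {
        // true on the last exchange of the current poll
        Boolean done = exchange.getProperty(Exchange.BATCH_COMPLETE, Boolean.class);
        if (Boolean.TRUE.equals(done)) {
            CamelContext ctx = exchange.getContext();
            new Thread(ctx::stop).start();   // shut down asynchronously
        }
    });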
Do you have any control over the end that publishes the FTP files? E.g. is it your server and your client or can you make a request as a customer?
If so, you could ask for a flag file to be added at the end of their batch process. This is a single-byte file with an agreed name that you watch for; when that file appears, you know the batch is complete.
This is a useful technique if you regularly pull down huge files that take a long time for a batch process to copy to disk at the server end, e.g. a file produced by some streaming process.
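A rough Camel sketch of that flag-file idea, assuming the flag file name and that the download route is declared elsewhere with routeId("ftpDownload") and autoStartup(false):

// poll only for the agreed flag file; when it appears, start the download route
from("{{ftp.server}}&fileName=batch-complete.flag")
    .log("Flag file found - server-side batch is complete")
    .to("controlbus:route?routeId=ftpDownload&action=start");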
