Camel route picking up file before ftp is complete - java

I have a customer who FTPs a file to our server. I have a route defined to select certain files from this directory and move them to a different directory to be processed. The problem is that the route picks up the file as soon as it sees it and doesn't wait until the FTP transfer is complete. The result is a 0-byte file in the path described in the to URI. I have tried each of the readLock options (markerFile, rename, changed, fileLock), but none have worked. I am using the Spring DSL to define my Camel routes. Here is an example of one that is not working. The Camel version is 2.10.0.
<route>
    <from uri="file:pathName?initialDelay=10s&amp;move=ARCHIVE&amp;sortBy=ignoreCase:file:name&amp;readLock=fileLock&amp;readLockCheckInterval=5000&amp;readLockTimeout=10m&amp;filter=#FileFilter" />
    <to uri="file:pathName/newDirectory/" />
</route>
Any help would be appreciated. Thanks!
Just to note: at one point this route was running on a different server and I had to FTP the file to another server that processed it. When I was using the ftp component in Camel, that route worked fine; that is, it did wait until the file was fully received before doing the FTP. I had the same options defined on my route. That's why I am thinking there should be a way to do it, since the ftp component uses the file component options in Camel.
I am taking @PeteH's suggestion #2 and did the following. I am still hoping there is another way, but this will work.
I added the following method, which returns a Date that is the current time minus x seconds (the caller passes a negative value):
// Despite the name, this adds `seconds` to the current time; pass a negative value to go back.
public static Date getDateMinusSeconds(Integer seconds) {
    Calendar cal = Calendar.getInstance();
    cal.add(Calendar.SECOND, seconds);
    return cal.getTime();
}
Then, within my filter, I check whether the initial filtering result is true. If it is, I compare the file's last-modified date with getDateMinusSeconds(-30). If the file was modified within the last 30 seconds, the filter returns false.
if (filter) {
    if (new Date(pathname.getLastModified()).after(DateUtil.getDateMinusSeconds(-30))) {
        return false;
    }
}

I have not done any of this in your environment, but have had this kind of problem before with FTP.
The better option of the two I can suggest is if you can get the customer to send two files. File1 is their data, File2 can be anything. They send them sequentially. You trap when File2 arrives, but all you're doing is using it as a "signal" that File1 has arrived safely.
The less good option (and this is the one we ended up implementing because we couldn't control the files being sent) is to write your code such that you refuse to process any file until its last modified timestamp is at least x minutes old. I think we settled on 5 minutes. This is pretty horrible since you're essentially firing, checking, sleeping, checking etc. etc.
But the problem you describe is quite well known with FTP. Like I say, I don't know whether either of these approaches will work in your environment, but certainly at a high level they're sound.
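The second option above (ignore a file until its last-modified timestamp is old enough) can be sketched in plain Java as follows. This is only an illustration, not code from either poster: the class name QuietPeriodCheck and the method isQuiet are made up for the example, and the quiet period is whatever you decide fits your uploads.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper: decides whether a file has been left untouched long enough.
final class QuietPeriodCheck {

    /** Pure check: true when lastModified is at least quietPeriod before now. */
    static boolean isQuiet(Instant lastModified, Duration quietPeriod, Instant now) {
        return !lastModified.plus(quietPeriod).isAfter(now);
    }

    /** Convenience overload that reads the timestamp from the file system. */
    static boolean isQuiet(Path file, Duration quietPeriod) {
        try {
            Instant lastModified = Files.getLastModifiedTime(file).toInstant();
            return isQuiet(lastModified, quietPeriod, Instant.now());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("upload", ".csv");
        // A file created just now has not been quiet for five minutes yet.
        System.out.println(isQuiet(file, Duration.ofMinutes(5)));
        Files.delete(file);
    }
}
```

Keeping the comparison separate from the file system call makes the rule easy to test with fixed timestamps. It still only guesses that the upload is finished; a slow or stalled transfer can outlast any quiet period you pick.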

Camel's FTP component inherits from the file component. The file component documentation has a note at the top describing this very thing:
Beware the JDK File IO API is a bit limited in detecting whether another application is currently writing/copying a file. And the implementation can be different depending on OS platform as well. This could lead to that Camel thinks the file is not locked by another process and start consuming it. Therefore you have to do you own investigation what suites your environment. To help with this Camel provides different readLock options and doneFileName option that you can use. See also the section Consuming files from folders where others drop files directly.
To get around this problem I had my publishers put out a "done" file. This solved the problem.
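In Camel itself this is what the doneFileName option mentioned in the note is for. To show the idea independently of Camel, here is a minimal plain-Java sketch that only reports a data file once a marker file sits next to it. The ".done" suffix and the names DoneFileScanner, isReady and readyFiles are assumptions for the example, not something from the posts above or from Camel's implementation.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical scanner: a data file counts as complete once "<name>.done" exists.
final class DoneFileScanner {

    static final String DONE_SUFFIX = ".done"; // assumed naming convention

    /** True when the data file exists and its marker file is present beside it. */
    static boolean isReady(Path dataFile) {
        Path marker = dataFile.resolveSibling(dataFile.getFileName() + DONE_SUFFIX);
        return Files.isRegularFile(dataFile) && Files.exists(marker);
    }

    /** Lists the data files in the directory whose marker has arrived, sorted by path. */
    static List<Path> readyFiles(Path directory) {
        List<Path> ready = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(directory)) {
            for (Path entry : entries) {
                boolean isMarker = entry.getFileName().toString().endsWith(DONE_SUFFIX);
                if (!isMarker && isReady(entry)) {
                    ready.add(entry);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(ready);
        return ready;
    }
}
```

This only works if the sender writes the marker after the data file has been fully transferred, which is the same requirement as the two-file approach suggested in the earlier answer.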

One way to do this is to use a watcher that triggers the job once a file is deposited, and to delay consuming the file long enough to be sure that its upload is finished.
from("file-watch://{{ftp.file_input}}?events=CREATE&recursive=false")
    .id("FILE_WATCHER")
    .log("File event: ${header.CamelFileEventType} occurred on file ${header.CamelFileName} at ${header.CamelFileLastModified}")
    .delay(20000) // wait 20 seconds so the upload has time to finish
    .to("direct:file_processor");

from("direct:file_processor")
    .id("FILE_DISPATCHER")
    .log("Sending To SFTP Uploader")
    .to("sftp://{{ftp.user}}@{{ftp.host}}:{{ftp.port}}//upload?password={{ftp.password}}&fileName={{file_pattern}}-${date:now:yyyyMMdd-HH:mm}.csv")
    .log("File sent to SFTP");
It's never late to respond.
Hope it can help someone struggling in the deepest creepy places of the SFTP world...

Related

JMeter tries to use EOF in request

I have a HTTP request in a thread group that reads from a single column csv file to get values to populate a parameter in the request URL.
Below is my configuration for these:
There are 30 values in the csv data file.
My goal is to have each thread start at the beginning of the file once it gets to the end, effectively infinitely looping through the data values until the scheduler duration expires.
However, what actually happens is that some requests try to use <EOF> as the value (see screenshot below) and therefore fail.
I have tried this but that just stops at the 30th iteration i.e. the end of the csv data file.
I assume I have some config option(s) wrong but I can't find anything online to suggest what they might be. Can anyone point me in the right direction (what i should be searching for?) or provide a solution?
Most probably it's a test data issue. Double-check your CSV file and make sure it doesn't contain empty lines; if there are any, remove them and your test should start working as expected.
For small files with only one column you can use __StringFromFile() function - it's much easier to set up and use.

j2ee download a file issues if same file used in backend?

The webapp in my project provides CSV download functionality based on a search by the end user, and does the following:
A file named "download.csv" is opened (not using File.createTempFile(String prefix, String suffix, File directory), but always just "download.csv"), rows of data from a SQL recordset are written to it, and then FileUtils is used to copy that file's content to the servlet's OutputStream.
The recordset is based on a search criteria, like 1st Jan to 30th March.
Can this lead to a case where the file contains the data of two users who submit different date ranges or other filters at the same time, so that the JVM processes the requests concurrently?
Right now we are in dev and there is very little data.
I know we can write automated tests to test this, but wanted to know the theory.
I suggested using the OutputStream of the HTTP response instead (pass it to the service layer as a plain OutputStream and write to it directly, or wrap it in a BufferedWriter and then write to it).
The only downside is that the data will be written more slowly than with the file copy,
because if there is more data in the recordset it will take time to iterate through it. But shouldn't the total request time be less? (The file approach takes the same time to write to the file's output stream, plus the time to copy from the file to the servlet output stream.)
Anyone done testing around this and have test cases or solutions to share?
Well that is a tricky question if you really would like to go into the depth of both parts.
Concurrency
As you wrote this "same name" thing could lead to a race condition if you are working on a multi thread system (almost all of the systems are like that nowadays). I have seen some coding done like this and it can cause a lot of trouble. The result file could have not only lines from both of the searches but merged characters as well.
Examples:
Thread 1 wants to write: 123456789\n
Thread 2 wants to write: abcdefghi\n
Outputs could vary in the mentioned ways:
1st case:
123456789
abcdefghi
2nd case:
1234abcd56789
efghi
I would definitely use at least unique (UUID.randomUUID()) names to "hot-fix" the problem.
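A minimal sketch of that hot-fix, assuming each request gets its own file named with a random UUID. The class name PerRequestCsvFile and the method writeRows are made up for the example, and the rows are passed in as ready-made CSV lines rather than read from a recordset.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.UUID;

// Hypothetical helper: every call writes to its own file, so requests never share one.
final class PerRequestCsvFile {

    /** Writes the given lines to a uniquely named file in the directory and returns its path. */
    static Path writeRows(Path directory, List<String> rows) {
        Path file = directory.resolve("download-" + UUID.randomUUID() + ".csv");
        try {
            Files.write(file, rows, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return file;
    }
}
```

The caller would copy the returned file to the response and delete it afterwards; that cleanup is left out here.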
Disk IO
Disk IO is a tricky thing if you go in depth. The speeds can vary over a wide range. In the JVM you can have blocking and non-blocking IO as well. The blocking one could wait until the data is really on the disk and the other will do some "magic" to flush the file later. There is a good read in here.
TL;DR: As a rule of thumb, it is better to keep things in memory (if they fit) and not bother with the disk. If you use memory that belongs to the request thread for that purpose, you avoid the concurrency problem as well. So in your case it could be better to rewrite that part to use memory only and write directly to the output.
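Writing straight to the output could look roughly like this. It is a sketch under assumptions: the rows arrive as lists of strings rather than a real recordset, CsvStreamer and writeCsv are made-up names, and there is no CSV quoting or escaping.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper: streams rows to any OutputStream without a shared intermediate file.
final class CsvStreamer {

    /** Writes one comma-separated line per row; values are not quoted or escaped. */
    static void writeCsv(Iterable<List<String>> rows, OutputStream out) throws IOException {
        Writer writer = new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
        for (List<String> row : rows) {
            writer.write(String.join(",", row));
            writer.write("\n");
        }
        // Flush but do not close: the caller (for example the servlet container) owns the stream.
        writer.flush();
    }
}
```

In a servlet you would pass response.getOutputStream() as the second argument. Since each request writes only to its own response stream, two concurrent searches cannot end up in each other's output.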

How do I configure Apache Camel to sequentially execute processes based on a trigger file?

I have a situation where an external system will send me 4 different files at the same time. Let's call them the following:
customers.xml (optional)
addresses.xml (optional)
references.xml (optional)
activity.xml (trigger file)
When the trigger file is sent and picked up by Camel, Camel should then look to see if file #1 exists, if it does then process it; if it doesn't then move on to file #2 and file #3 applying the same if/then logic. Once that logic has been performed, then it can proceed with file #4.
I found elements like OnCompletion and determining if body is null or not but if someone has a much better idea, I would greatly appreciate it.
As I thought about this further, it turned out this was more of a sequencing problem. The key here is that I would be receiving the files in batches at the same time. That being said, I created a pluggable CustomComparator.
Once I created my CustomComparator class to order my files in a given ArrayList index position, I was able to route the messages in the order I wanted them in.
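The actual CustomComparator is not shown, so the following is only a guess at the idea: order file names by their position in a fixed list, using the four names from the question. The class name BatchOrder and its members are made up, and the comparator works on plain strings rather than on Camel's file objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical ordering: optional files first, the trigger file last.
final class BatchOrder {

    static final List<String> ORDER = Arrays.asList(
            "customers.xml", "addresses.xml", "references.xml", "activity.xml");

    /** Compares file names by their index in ORDER. */
    static final Comparator<String> BY_BATCH_POSITION =
            Comparator.comparingInt(BatchOrder::position);

    /** Position in ORDER; names that are not listed sort after all known ones. */
    static int position(String fileName) {
        int index = ORDER.indexOf(fileName);
        return index >= 0 ? index : ORDER.size();
    }

    /** Returns a sorted copy and leaves the input list untouched. */
    static List<String> sorted(List<String> fileNames) {
        List<String> copy = new ArrayList<>(fileNames);
        copy.sort(BY_BATCH_POSITION);
        return copy;
    }
}
```

Because the optional files are simply absent from the batch when they were not sent, sorting whatever arrived gives the "process 1 to 3 if present, then 4" sequence without explicit if/then checks.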

Java batch processing

I need to read 2000 files and do some work on them with Java, so I think I should use batch processing. How could I do that? My system is Windows 7.
You can use Apache Camel / Servicemix ESB in combination with ActiveMQ.
Your first step would be to write the fileNames one by one in ActiveMQ Messages. This could be done in one so called route (a separate Thread automatically by the framework). Here you have several options which component to use. There is a file component which reads files and moves them to done afterwards or you can use a simple Java Bean.
In a second route you read the Active MQ messages (single consumer if it is important to process the files in a sequence or multiple consumers if you want more performance) process the File Content in a processor or Java Bean like you want.
You can stop the Camel context any time you want (during the processing) and restart it afterwards getting the process started at the next file not yet processed by loading / consuming it from the Active MQ message queue.
Java does not provide built in support for batch processing. You need to use something like Spring Batch.
Check this out:
http://jcp.org/en/jsr/detail?id=352
This is a new "Batch" on JSR - javax.batch
You can't read files as a batch. You have to read them one at a time. You can use more than one thread, but I would write it single-threaded first.
It doesn't matter what OS you are using.
Assuming you have the ability to work on one file, you have two options: use a file list, or recurse through a directory. It gets trickier if you need to roll back changes as a result of something that happens towards the end, though. You'd have to create a list of changes to make and then commit them all at the end of the batch operation.
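// First option: process an explicit list of files.
void batchProcess(Collection<File> filesToProcess) {
    for (File file : filesToProcess) {
        processSingle(file);
    }
}

// Second option: walk a directory tree recursively.
void batchProcess(File file) {
    if (file.isDirectory()) {
        File[] children = file.listFiles(); // null if the directory cannot be read
        if (children != null) {
            for (File child : children) {
                batchProcess(child); // recurse into the child, not the parent
            }
        }
    } else {
        processSingle(file);
    }
}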

Queueing Multiple Downloads, looking for a producer consumer API

I have an application (a servlet, but that's not very important) that downloads a set of files and parse them to extract information. Up to now, I did those operations in a loop :
- fetching new file on the Internet
- analyzing it.
A multi-threaded download-manager seems a better solution for this problem and I would like to implement it in the fastest way possible.
Some of the downloads are dependent on others (so this set is partially ordered).
Multi-threaded programming is hard, and if I could find an API to do that I would be quite happy. I need to put a group of files (ordered) in a queue and get the first group of files that is completely downloaded.
Do you know of any library I could use to achieve that ?
Regards,
Stéphane
You could do something like:
BlockingQueue<Download> queue = new LinkedBlockingQueue<Download>(); // BlockingQueue itself is an interface
ExecutorService pool = Executors.newFixedThreadPool(5);
Download obj = new Download(queue);
pool.execute(obj); // start the download; it places itself on the queue once completed
Download finished = queue.take(); // blocks until a completely downloaded item is available
You may have to use a different kind of queue if the speed of each download is not the same. The standard implementations such as LinkedBlockingQueue and ArrayBlockingQueue are first in, first out.
You may want to look into using a PriorityBlockingQueue, which will order the Download objects according to their compareTo method (they need to implement Comparable). See the API here for more details.
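If what you want is "give me each result as soon as it is finished", the JDK's ExecutorCompletionService already does the queueing for you. Here is a minimal sketch. The class name CompletionOrderDownloads and the method runAll are made up for the example, the tasks stand in for real downloads, and dependencies between downloads are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical runner: executes tasks in parallel and collects results in completion order.
final class CompletionOrderDownloads {

    static <T> List<T> runAll(List<Callable<T>> tasks, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            CompletionService<T> completed = new ExecutorCompletionService<>(pool);
            for (Callable<T> task : tasks) {
                completed.submit(task);
            }
            List<T> results = new ArrayList<>();
            for (int i = 0; i < tasks.size(); i++) {
                // take() blocks until the next task has finished, whichever one that is.
                results.add(completed.take().get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

In a real download manager you would process each result inside the loop instead of collecting them all first, so parsing can start while other downloads are still running.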
Hope this helps.
