The files downloaded by this code are nearly the same size, but differ in some lines. Every answer I have found points to the binary file type, but that doesn't help.
Does anybody have an idea what the problem is (transferring a PDF)?
FTPClient ftpClient = new FTPClient();
OutputStream outputStream = null;
boolean resultOk = true;
try {
    ftpClient.connect(host, port);
    ftpClient.enterLocalPassiveMode();
    ftpClient.setFileTransferMode(FTP.COMPRESSED_TRANSFER_MODE);
    ftpClient.setFileType(FTP.BINARY_FILE_TYPE);
    if (showMessages) {
        System.out.println(ftpClient.getReplyString());
    }
    resultOk &= ftpClient.login(usr, pwd);
    if (showMessages) {
        System.out.println(ftpClient.getReplyString());
    }
    outputStream = new FileOutputStream(localResultFile);
    resultOk &= ftpClient.retrieveFile(remoteSourceFile, outputStream);
    outputStream.flush();
    outputStream.close();
    if (showMessages) {
        System.out.println(ftpClient.getReplyString());
    }
    if (resultOk) {
        resultOk &= ftpClient.deleteFile(remoteSourceFile);
    }
    resultOk &= ftpClient.logout();
    if (showMessages) {
        System.out.println(ftpClient.getReplyString());
    }
} finally {
    ftpClient.disconnect();
}
It's clear from the files you have shared that the transfer indeed happened in text/ASCII mode.
While probably not required by the FTP specification, with some FTP servers (e.g. FileZilla Server or ProFTPD) you cannot change the transfer type before logging in, while servers like IIS or vsftpd have no problem with that. On the other hand, FileZilla Server defaults to binary mode anyway (which is another violation of the specification), so you are probably using yet another server.
In any case, move the .setFileType call after .login, and test its return value.
Also remove the .setFileTransferMode call. It does no harm with most servers, as hardly any server supports MODE C, so the call is simply ignored. But if you encounter a server that does support it, it would break the transfer, as FTPClient does not actually implement compressed mode.
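A minimal sketch of the corrected sequence, reusing the variable names from the question (error handling trimmed for brevity):

FTPClient ftpClient = new FTPClient();
ftpClient.connect(host, port);
ftpClient.enterLocalPassiveMode();

if (!ftpClient.login(usr, pwd)) {
    throw new IOException("Login failed: " + ftpClient.getReplyString());
}

// Set the type only after logging in, and check that the server accepted it.
if (!ftpClient.setFileType(FTP.BINARY_FILE_TYPE)) {
    throw new IOException("Could not switch to binary mode: " + ftpClient.getReplyString());
}

try (OutputStream out = new FileOutputStream(localResultFile)) {
    if (!ftpClient.retrieveFile(remoteSourceFile, out)) {
        throw new IOException("Download failed: " + ftpClient.getReplyString());
    }
} finally {
    ftpClient.logout();
    ftpClient.disconnect();
}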
While my problem was related to corruption of an upload, I resolved a similar issue by moving the setting of the file type after the FTP login (I don't use setFileTransferMode, leaving it at its default value):
resultOk &= ftpClient.login(usr, pwd);
ftpClient.setFileType(FTP.BINARY_FILE_TYPE);
I saw in some forums that setting the binary file type before invoking the login method could lead to transfer problems. Before this change, the PDF file got downloaded but showed corrupted fonts and elements. Now it works. Hope it helps someone.
It seems to happen when using unescaped paths that contain spaces, e.g. C:/Documents and Settings/test.
Got it solved now by using an escaped path for the spaces. Thanks for your help.
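For reference, a minimal sketch of escaping such a path, assuming it ends up inside a URL (the host and path here are made-up examples; java.net.URI does the percent-encoding):

import java.net.URI;
import java.net.URL;

// The multi-argument URI constructor percent-encodes illegal characters,
// turning "Documents and Settings" into "Documents%20and%20Settings".
URI uri = new URI("ftp", "ftp.example.com", "/Documents and Settings/test/file.pdf", null);
URL url = uri.toURL();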
Related
I have a program that reads a file from a web page and writes it to a file. Most of the time this works well, but occasionally the file gets corrupted. I guess this has something to do with network issues. What could I do to make my code more stable?
String filename = "myfile.txt";
File file = new File(PROFilePath + "/" + filename);

// Open the connection
URL myCon = new URL("url to a page");
URLConnection uc = myCon.openConnection();

// Fetch the stream once rather than calling getInputStream() on every iteration
InputStream inputStream = uc.getInputStream();
FileOutputStream outputStream = new FileOutputStream(file);
int read = 0;
byte[] bytes = new byte[1024];
while ((read = inputStream.read(bytes)) != -1) {
    outputStream.write(bytes, 0, read);
}
inputStream.close();
outputStream.close();
You are not using an explicit encoding for your copies; you are merely copying all bytes and writing them to a file, which might later be read with a different decoding. An easy way to find this out is to compare the bytes of the document at the remote address with those of the copied file after you discover a "broken" file. However, the information you provide is not detailed enough to offer more specific help. Is there an example document you are having trouble with? Check out this related question and answer, as well as this thread, for a deeper discussion of the issue.
As to your suspicion: the connection should not simply lose bytes while you are reading from the remote address. That would be a very serious bug in the implementation, since you connect via TCP (I assume the URL's protocol is HTTP), where lost packets are automatically retransmitted. And if the connection breaks, it should raise an exception instead of failing silently. I do not think that this is the source of your error.
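To do that byte comparison, a minimal sketch is to hash both copies and compare the digests (the URL and file name are placeholders):

import java.io.FileInputStream;
import java.io.InputStream;
import java.net.URL;
import java.security.MessageDigest;

static String md5Of(InputStream in) throws Exception {
    MessageDigest md = MessageDigest.getInstance("MD5");
    byte[] buf = new byte[8192];
    int n;
    while ((n = in.read(buf)) != -1) {
        md.update(buf, 0, n);
    }
    in.close();
    // Render the digest as hex
    StringBuilder sb = new StringBuilder();
    for (byte b : md.digest()) {
        sb.append(String.format("%02x", b));
    }
    return sb.toString();
}

// If the two digests differ, the copy really is corrupted.
String remote = md5Of(new URL("http://example.com/page").openStream());
String local = md5Of(new FileInputStream("myfile.txt"));
System.out.println(remote.equals(local) ? "identical" : "copy is corrupted");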
I am hoping someone can help me (once again).
I have a very large number of small files (over 4,000), each only a few KB.
I have written an FTP program in Java which transfers each file individually, but it is taking a very long time, and the handshaking overhead seems to make the problem worse.
What I would like to be able to do is open the FTP connection, send all the files, then close it again.
I know that this is possible in FTP, but quite how to achieve it in Java is beyond me.
I currently have the filenames in an array, so iterating through them is no problem. I have tried calling the following class and passing it each filename, but after several hours it was still moving about one file per second.
package website;

import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;

public class ftpUpload {

    public ftpUpload(String target, String savename, String localFilePath) {
        URL url;
        try {
            // ";type=i" requests a binary (image) transfer over the ftp: URL
            url = new URL(target + savename + ";type=i");
            URLConnection con = url.openConnection();
            BufferedOutputStream out =
                    new BufferedOutputStream(con.getOutputStream());
            FileInputStream in =
                    new FileInputStream(localFilePath + savename);
            int i = 0;
            byte[] bytesIn = new byte[1024];
            while ((i = in.read(bytesIn)) >= 0) {
                out.write(bytesIn, 0, i);
            }
            out.close();
            in.close();
        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Is there a way I can open the connection with the ftp site username and password,
then send it the files
and finally close the connection?
This would seem to me easier than creating multiple threads to send files concurrently.
Any advice gratefully received.
Paul
I don't think it's possible to send multiple files in one session using URLConnection, which means you get the overhead of opening and closing the session for every file.
FTPClient from Commons Net does support multiple operations in one session. For example (exception handling omitted):
FTPClient ftp = new FTPClient();
ftp.connect("ftp.example.com");
ftp.login("admin", "secret");
ftp.setFileType(FTPClient.BINARY_FILE_TYPE);
for (File file : files) {
    InputStream in = new FileInputStream(file);
    ftp.storeFile(file.getName(), in);
    in.close();
}
ftp.disconnect();
This should help.
If you still need better performance, I don't see any alternative other than using multiple threads.
After plenty of testing I have found reliability issues with multiple FTP threads to public servers, which is what I need in this case. Most (if not all) FTP servers limit the maximum number of connections, and also limit the maximum number of concurrent connections from the same IP address. Two concurrent connections from the same IP seems to be the only maximum you can rely on. The realistic option, as suggested above, is to zip the files and FTP a single file; you can unzip it when it gets there using a PHP script (as long as the server supports unzipping; you will need to check whether this is included in the PHP build). Finally, if like me you need to upload in excess of 10,000 files, be aware that many FTP servers will not list more than 9,998 files (10,000 including . and ..).
If anyone knows of a free or cheap FTP host that supports ZipArchive in its PHP build and will list more than 9,998 files when requesting a file listing over FTP, please let me know.
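For what it's worth, a minimal sketch of the zip-then-upload approach (host, credentials, and the files array are placeholders; error handling trimmed):

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;
import org.apache.commons.net.ftp.FTP;
import org.apache.commons.net.ftp.FTPClient;

// 1. Pack all the small files into one local archive.
File archive = new File("upload.zip");
try (ZipOutputStream zip = new ZipOutputStream(new FileOutputStream(archive))) {
    for (File f : files) { // 'files' stands in for your existing filename array
        zip.putNextEntry(new ZipEntry(f.getName()));
        try (InputStream in = new FileInputStream(f)) {
            in.transferTo(zip); // Java 9+; use a read/write loop on older JDKs
        }
        zip.closeEntry();
    }
}

// 2. Upload the single archive in one FTP session.
FTPClient ftp = new FTPClient();
ftp.connect("ftp.example.com");
ftp.login("user", "password");
ftp.setFileType(FTP.BINARY_FILE_TYPE);
try (InputStream in = new FileInputStream(archive)) {
    ftp.storeFile(archive.getName(), in);
}
ftp.logout();
ftp.disconnect();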
The piece of code below downloads a file from some URL and saves it to a local file. Piece of cake. What could possibly be wrong here?
protected long download(ProgressMonitor monitor) throws Exception {
    long size = 0;
    DataInputStream dis = new DataInputStream(is);
    int read = 0;
    byte[] chunk = new byte[chunkSize];
    while ((read = dis.read(chunk)) != -1) {
        os.write(chunk, 0, read);
        size += read;
        if (monitor != null)
            monitor.worked(read);
    }
    chunk = null;
    dis.close();
    os.flush();
    os.close();
    return size;
}
The reason I am posting the question here is that it works 99.999% of the time, but doesn't work as expected whenever antivirus or some other protection software is installed on the computer running this code. I am blindly pointing a finger that way because whenever I stop (or disable) the software, the code works perfectly again. The end result of such interference is that the MD5 of the downloaded file doesn't match the expected one, and a whole new saga begins.
So, the question is: is it really possible that some smart "protection" software would alter the actual stream coming from the URL without me knowing about it? And if yes, how do you deal with this? (Verified with Kaspersky and Norton products.)
EDIT-1:
Apparently I've got a hold on the problem, and it has nothing to do with antivirus software. The download takes place from an FTP server (FileZilla, in particular) and we use Apache Commons Net FTP on the client side. What I did was go to the FTP server and terminate the connection (kick it out) in the middle of a download. I expected that is.read(..) would throw an IOException on the client side, but this never happened. Instead, is.read(..) returned -1, meaning that there is no more data coming from the stream. This is definitely unexpected, and it explains why I sometimes get partial files. It doesn't explain, however, why the data sometimes gets altered as well.
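For anyone hitting the same thing: a minimal sketch of detecting such a truncated download with Commons Net, assuming an already connected and logged-in ftpClient. Streaming the file yourself and checking completePendingCommand() reveals whether the server ever sent its final "transfer complete" reply:

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

InputStream in = ftpClient.retrieveFileStream(remoteSourceFile);
if (in == null) {
    throw new IOException("Transfer refused: " + ftpClient.getReplyString());
}
try (OutputStream out = new FileOutputStream(localResultFile)) {
    byte[] buf = new byte[8192];
    int n;
    while ((n = in.read(buf)) != -1) {
        out.write(buf, 0, n);
    }
}
in.close();

// Returns false if the final "226 Transfer complete" reply never arrived,
// e.g. because the connection was killed mid-transfer.
if (!ftpClient.completePendingCommand()) {
    throw new IOException("Incomplete transfer: " + ftpClient.getReplyString());
}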
Yeah, this happens to me all the time. In my case it's caused by transparent HTTP proxying by Websense on my corporate network. The worst problems are caused by the block page being returned with 200 OK.
Do you get the same or similar corruption every time? E.g., do you get some HTML explaining why the request was blocked? The best you can probably do is compare the first few bytes of the downloaded data to some text in the block page, and throw an exception in that case.
Edit: based on your update, have you got the FTP client set to image/binary mode?
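A minimal sketch of that first-bytes check (the marker strings are placeholders; match against text from your proxy's actual block page):

import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

static InputStream rejectBlockPage(InputStream raw) throws IOException {
    BufferedInputStream in = new BufferedInputStream(raw);
    byte[] head = new byte[512];
    in.mark(head.length);
    int n = in.read(head);
    String start = new String(head, 0, Math.max(n, 0), StandardCharsets.ISO_8859_1);
    // Placeholder markers: substitute text seen in your proxy's real block page.
    if (start.contains("<html") && start.contains("blocked")) {
        throw new IOException("Download intercepted by a proxy block page");
    }
    in.reset(); // rewind so the caller still sees the full stream
    return in;
}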
I have an application that does a lot of work on S3, mostly downloading files from it. I am seeing a lot of these kinds of errors, and I'd like to know whether the problem is in my code or whether the service is really this unreliable.
The code I'm using to read from the S3 object stream is as follows:
public static final void write(InputStream stream, OutputStream output) {
    byte[] buffer = new byte[1024];
    int read = -1;
    try {
        while ((read = stream.read(buffer)) != -1) {
            output.write(buffer, 0, read);
        }
        stream.close();
        output.flush();
        output.close();
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}
The OutputStream is a new BufferedOutputStream(new FileOutputStream(file)). I am using the latest version of the Amazon S3 Java client, and this call is retried four times before giving up; even after those four attempts it still fails.
Any hints or tips on how I could possibly improve this are appreciated.
I just managed to overcome a very similar problem. In my case the exception I was getting was identical; it happened for larger files but not for small files, and it never happened at all while stepping through the debugger.
The root cause of the problem was that the AmazonS3Client object was getting garbage collected in the middle of the download, which caused the network connection to break. This happened because I was constructing a new AmazonS3Client object with every call to load a file, while the preferred use case is to create a long-lasting client object that survives across calls - or at least is guaranteed to be around during the entirety of the download. So, the simple remedy is to make sure a reference to the AmazonS3Client is kept around so that it doesn't get GC'd.
A link on the AWS forums that helped me is here: https://forums.aws.amazon.com/thread.jspa?threadID=83326
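A minimal sketch of that remedy, assuming the AWS SDK for Java v1 API the answer refers to:

import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.s3.model.S3Object;

public class S3Downloader {

    // One long-lived client for the whole application; keeping a live
    // reference here prevents it from being garbage collected mid-download.
    private final AmazonS3Client s3 = new AmazonS3Client();

    public S3Object fetch(String bucket, String key) {
        return s3.getObject(bucket, key);
    }
}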
The network is closing the connection before the client gets all the data, for one reason or another; that's what is going on.
Part of any HTTP response is the content length. Your code gets the header saying, hey buddy, here's data, and it's this much of it... and then the connection drops before the client has read all of that data, so it bombs out with the exception.
I'd look at your OS/network/JVM connection timeout settings (though the JVM generally inherits them from the OS in this situation). The key is to figure out what part of the network is causing the problem. Is it your machine-level settings saying, nope, not going to wait any longer for packets? Is it a non-blocking read with a timeout setting in your code, saying, hey, I haven't gotten any data from the server for longer than I'm supposed to wait, so I'm dropping the connection and throwing an exception? Etc., etc.
Your best bet is to snoop the packet traffic at a low level and trace backwards to see where the connection drop is happening, or to raise the timeouts in the things you can control, like your software and the OS/JVM.
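If client-side timeouts turn out to be the culprit, a minimal sketch of raising them in the AWS SDK for Java v1 (the numbers are placeholders):

import com.amazonaws.ClientConfiguration;
import com.amazonaws.services.s3.AmazonS3Client;

ClientConfiguration config = new ClientConfiguration();
config.setConnectionTimeout(30_000); // ms allowed to establish the connection
config.setSocketTimeout(120_000);    // ms of read silence before giving up

AmazonS3Client s3 = new AmazonS3Client(config);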
First of all, your code is operating entirely normally if (and only if) you suffer connectivity troubles between yourself and Amazon S3. As Michael Slade points out, standard connection-level debugging advice applies.
As to your actual source code, I note a few code smells you should be aware of. Annotating them directly in the source:
public static final void write(InputStream stream, OutputStream output) {
    byte[] buffer = new byte[1024]; // !! Abstract 1024 into a constant to make
                                    // this easier to configure and understand.
    int read = -1;
    try {
        while ((read = stream.read(buffer)) != -1) {
            output.write(buffer, 0, read);
        }
        stream.close();  // !! Unexpected side effect: closing the passed-in
                         // InputStream. This may have unexpected results if your
                         // stream type supports reset, and currently carries no
                         // visible documentation.
        output.flush();  // !! Violation of RAII. Refactor this into a finally block,
        output.close();  // a la Reference 1 (below).
    } catch (IOException e) {
        throw new RuntimeException(e); // !! Possibly indicative of an outer
                                       // try-catch block for RuntimeException.
                                       // Consider keeping this as IOException.
    }
}
(Reference 1)
Otherwise, the code itself seems fine. IO exceptions should be expected occurrences in situations where you're connecting to a fickle remote host, and your best course of action is to draft a sane policy to cache and reconnect in these scenarios.
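For illustration, a minimal sketch of the finally-based refactor the annotations suggest (the constant name is made up):

private static final int BUFFER_SIZE = 1024;

public static void write(InputStream stream, OutputStream output) throws IOException {
    byte[] buffer = new byte[BUFFER_SIZE];
    try {
        int read;
        while ((read = stream.read(buffer)) != -1) {
            output.write(buffer, 0, read);
        }
    } finally {
        // Flush and close even if the copy loop throws; the caller keeps
        // ownership of 'stream' and decides when to close it.
        output.flush();
        output.close();
    }
}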
Try using Wireshark to see what is happening on the wire when this happens.
Try temporarily replacing S3 with your own web server and see if the problem persists. If it does, it's your code and not S3.
The fact that it's random suggests network issues between your host and some of the S3 hosts.
Also, in my experience, S3 can close slow connections.
I would take a very close look at the network equipment nearest your client app. This problem smacks of some network device dropping packets between you and the service. Look to see if there was a starting point when the problem first occurred. Was there any change like a firmware update to a router or replacement of a switch around that time?
Verify your bandwidth usage against the amount purchased from your ISP. Are there times of the day where you're approaching that limit? Can you obtain graphs of your bandwidth usage? See if the premature terminations can be correlated with high-bandwidth usage, particularly if it approaches some known limit. Does the problem seem to pick on smaller files and on large files only when they're almost finished downloading? Purchasing more bandwidth from your ISP may fix the problem.
I am attempting to transfer files (MP3s about six megabytes in size) between two PCs using SPP over Bluetooth (in Java, with the BlueCove API). I can get the file transfer working fine in one direction (for instance, one file from the client to the server), but when I attempt to send any data in the opposite direction during the same session (i.e., send a file from the server to the client), the program freezes and will not advance.
For example, if I simply:
StreamConnection conn;
OutputStream outputStream;
outputStream = conn.openOutputStream();
....
outputStream.write(data); //Data here is an MP3 file converted to byte array
outputStream.flush();
The transfer works fine. But if I try:
StreamConnection conn;
OutputStream outputStream;
InputStream inputStream;
ByteArrayOutputStream out = new ByteArrayOutputStream();
outputStream = conn.openOutputStream();
inputStream = conn.openInputStream();
....
outputStream.write(data);
outputStream.flush();
int receiveData;
while ((receiveData = inputStream.read()) != -1) {
    out.write(receiveData);
}
Both the client and the server freeze, and will not advance. I can see that the file transfer is actually happening at some point, because if I kill the client, the server will still write the file to the hard drive, with no issues. I can try to respond with another file, or with just an integer, and it still will not work.
Anyone have any ideas what the problem is? I know OBEX is commonly used for file transfers over Bluetooth, but it seemed overkill for what I needed to do. Am I going to have to use OBEX for this functionality?
It could be as simple as both programs being stuck in blocking receive calls, each waiting for the other end to say something. Try adding a ton of log statements so you can see what "state" each program is in (i.e., so it gives you a running commentary such as "trying to receive", "got xxx data", "trying to reply", etc.), or set up debugging, wait until it gets stuck, then stop one of them and single-step it.
You can certainly use SPP to transfer files between your applications (assuming you are sending and receiving at both ends within your application). From the code snippet it is difficult to tell what is wrong with your program.
I am guessing that you will have to close the stream as an indication to the other side that you are done sending the data. Note that even though you write the whole file in one chunk, the SPP/Bluetooth protocol layers might fragment it and the other end could receive it in fragments, so you need some protocol to indicate transfer completion.
It is hard to say without looking at the client-side code, but my guess, if the two are running the same code (i.e., both writing first and then reading), is that the outputStream needs to be closed before the reading occurs (otherwise both will be waiting for the other to close their side in order to get out of the read loop, since read() only returns -1 when the other side closes).
If the stream should not be closed, then the condition to stop reading cannot be waiting for -1; so either transmit the file size first, or use some other mechanism.
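A minimal sketch of the transmit-the-size-first approach (method names made up; each side wraps its streams from the shared StreamConnection):

import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Sender: announce the length, then the payload, so the stream can stay open.
static void sendFile(DataOutputStream out, byte[] data) throws IOException {
    out.writeInt(data.length);
    out.write(data);
    out.flush();
}

// Receiver: read exactly the announced number of bytes instead of
// blocking until the other side closes and read() returns -1.
static byte[] receiveFile(DataInputStream in) throws IOException {
    int length = in.readInt();
    byte[] data = new byte[length];
    in.readFully(data);
    return data;
}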
Why did you decide to use a ByteArrayOutputStream? Try the following code:
try {
    try {
        byte[] buf = new byte[1024];
        int n;
        outputStream = conn.openOutputStream();
        inputStream = conn.openInputStream();
        while ((n = inputStream.read(buf, 0, 1024)) > -1) {
            outputStream.write(buf, 0, n);
        }
    } finally {
        outputStream.close();
        inputStream.close();
        log.debug("Closed input streams!");
    }
} catch (Exception e) {
    log.error(e);
    e.printStackTrace();
}
And to get the bytes back out of the ByteArrayOutputStream, use toByteArray() rather than toString().getBytes(), which would corrupt binary data; the result can then be wrapped for re-reading:
byte[] currentMP3Bytes = outputStream.toByteArray();
ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(currentMP3Bytes);