I'm retrieving a file from an FTP server. The file is encoded as UTF-8.
ftpClient.connect(props.getFtpHost(), props.getFtpPort());
ftpClient.login(props.getUsername(), props.getPassword());
ftpClient.setFileType(FTP.BINARY_FILE_TYPE);
inputStream = ftpClient.retrieveFileStream(fileNameBuilder.toString());
And then somewhere else I'm reading the input stream:
bufferedReader = new BufferedReader(new InputStreamReader(inputStream, "UTF-8"));
But the file is not being read as UTF-8!
I tried ftpClient.setAutodetectUTF8(true); but it still doesn't work.
Any ideas?
EDIT:
For example a row in the original file is
...00248090041KENAN SARÐIN 00000000015.993FAC...
After downloading it through FTPClient, I parse it and load it into a Java object. One of the fields of the Java object is name, which for this row is read as "KENAN SAR�IN".
I tried dumping to disk directly:
File file = new File("D:/testencoding/downloaded-file.txt");
if (!file.exists()) {
    file.createNewFile();
}
FileOutputStream fop = new FileOutputStream(file);
ftpClient.retrieveFile(fileName, fop);
fop.close();
I compared the MD5 checksums of the two files (the one on the FTP server and the one dumped to disk), and they're the same.
I would separate out the problems first: dump the file to disk, and compare it with the original. If it's the same as the original, the problem has nothing to do with UTF-8. The FTP code looks okay, though, and since you're retrieving raw binary data, I'd expect it not to mess with anything.
If the file is the same after transfer as before, then the problem has nothing to do with FTP. You say "the file is not getting read as UTF-8 Encoded" but it's not clear what you mean. How certain are you that it's UTF-8 text to start with? If you could edit your question with the binary data, how it's being read as text, and how you'd expect it to be read as text, that would really help.
Try downloading the file content as bytes rather than as characters, using an InputStream and OutputStream instead of an InputStreamReader. That way you can be sure the file is not changed during transfer.
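A minimal sketch of that byte-for-byte idea, with in-memory streams standing in for the FTP connection (the data and stream setup here are illustrative assumptions):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;

public class ByteCopy {
    // Copies raw bytes; no Reader/Writer is involved, so no charset
    // decoding can alter the data in transit.
    static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "KENAN SARÐIN".getBytes("UTF-8"); // sample multi-byte data
        ByteArrayOutputStream copied = new ByteArrayOutputStream();
        copy(new ByteArrayInputStream(original), copied);
        System.out.println(Arrays.equals(original, copied.toByteArray())); // true: unchanged
    }
}
```

If the bytes survive the copy intact (as the matching MD5 sums suggest they do here), any remaining mojibake must come from the decoding step, not the transfer.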
Related
I am facing a problem saving a text file in UTF-8 format using Java. When I click "Save As" for the generated text file, it shows ANSI as the text format, not UTF-8. Below is the code I am writing while creating the file:
String excelFile = getFullFilePath(fileName);
File file = new File(excelFile + fileName, "UTF-8");
File output = new File(excelFile, "UTF-8");
FileUtils.writeStringToFile(file, content, "UTF-8");
While creating the text file I am using UTF-8 encoding, but the file still shows ANSI as its encoding when I save it.
Kindly help.
Instead of using File, create a FileOutputStream. Note that File has no constructor taking an encoding: new File(parent, "UTF-8") just treats "UTF-8" as a child path name.
Try this.
Writer out = new BufferedWriter(new OutputStreamWriter(
new FileOutputStream("outfilename"), "UTF-8"));
try {
out.write(aString);
} finally {
out.close();
}
I had a very similar problem and solved it by saving the original file with UTF-8 encoding. In this case, go to the Excel file and save it again, or create a new file, copy the content over, and make sure its encoding is UTF-8. This link has a tutorial on how to save an Excel file with UTF-8 encoding: https://help.surveygizmo.com/help/encode-an-excel-file-to-utf-8-or-utf-16. Other than that, your code seems correct.
I need to write a file stream to a database. The file content must be readable only through the program; manually opening the file should not display readable content. I decided to use ObjectOutputStream as it is the binary writing mechanism in Java, but I can still see the string content when I open the file.
Writing to stream
ByteArrayOutputStream baos = new ByteArrayOutputStream();
ObjectOutputStream os = new ObjectOutputStream(baos);
os.writeObject("HIIIIIIIIIIIIIIIIIIIIII HOW ARE YOU");
The created content is look like
’ t #HIIIIIIIIIIIIIIIIIIIIII HOW ARE YOU
How to get complete binary stream output?
The file content must be readable only through the program. Manually opening the file should not display readable content.
So you need some security.
I decided to use ObjectOutput stream as it is the binary writing mechanism in java.
That's (a) a non sequitur, and (b) security by obscurity: i.e. it is no security at all.
You should use encryption.
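A minimal sketch of that approach, assuming AES via the standard javax.crypto API (the key handling here is deliberately simplified for illustration; a real application must store or derive the key securely, and a mode like GCM is preferable to ECB):

```java
import javax.crypto.Cipher;
import javax.crypto.CipherOutputStream;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import java.io.ByteArrayOutputStream;

public class EncryptDemo {
    public static void main(String[] args) throws Exception {
        // Generate a throwaway AES key (illustrative only).
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        SecretKey key = kg.generateKey();

        Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding"); // ECB for brevity only
        cipher.init(Cipher.ENCRYPT_MODE, key);

        // Stand-in for a FileOutputStream; the bytes written through the
        // CipherOutputStream are ciphertext, not readable text.
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (CipherOutputStream cos = new CipherOutputStream(baos, cipher)) {
            cos.write("HIIIIIIIIIIIIIIIIIIIIII HOW ARE YOU".getBytes("UTF-8"));
        }

        // The plaintext no longer appears anywhere in the output bytes.
        String dump = new String(baos.toByteArray(), "ISO-8859-1");
        System.out.println(dump.contains("HOW ARE YOU")); // false
    }
}
```

Unlike serialization, this actually prevents someone with a text or hex editor from reading the content without the key.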
Good evening. Today I faced a strange situation when I used PrintWriter while uploading files to a server: the transferred file came out corrupted. I tried using FileOutputStream instead and that solved the problem. My question is: why does PrintWriter show this strange behaviour? Here's the code I used to upload a file and save it on the server:
public void doPost(HttpServletRequest request, HttpServletResponse response)
        throws IOException, ServletException {
    int i;
    if (request instanceof MultipartWrapper) {
        String DestinationPath = "C:\\";
        MultipartWrapper request1 = (MultipartWrapper) request;
        File f = request1.getFile("photo");
        java.io.FileInputStream fis = new java.io.FileInputStream(f);
        // PrintWriter out = new PrintWriter(DestinationPath + f.getName()); // causes the problem mentioned above
        java.io.FileOutputStream out = new java.io.FileOutputStream(DestinationPath + f.getName());
        while ((i = fis.read()) != -1) {
            out.write(i);
        }
        fis.close();
        out.close();
    }
}
You need to understand the difference between Writers and OutputStreams. PrintWriter.write(int) writes a character, while FileOutputStream.write(int) writes a byte. You were accidentally converting bytes to characters, which was corrupting your file. In general, when just copying streams around, you want to stick to bytes.
PrintWriter will create a Writer using the default encoding, while FileOutputStream will simply write raw bytes out. Provided that your original content and the server side use the same encoding, you won't have problems writing bytes and reinterpreting them. However, when you use the PrintWriter, the default system encoding is used, potentially mucking up your data.
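The difference can be demonstrated with in-memory streams (the byte values and the US-ASCII charset here are chosen purely for illustration):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.util.Arrays;

public class WriterVsStream {
    public static void main(String[] args) throws IOException {
        byte[] binary = { (byte) 0x89, 'P', 'N', 'G' }; // e.g. the start of a PNG header

        // Byte-oriented copy: output is identical to input.
        ByteArrayOutputStream byteCopy = new ByteArrayOutputStream();
        for (byte b : binary) {
            byteCopy.write(b & 0xFF);
        }

        // Character-oriented copy: each int is treated as a char and re-encoded
        // through a charset, so bytes outside the charset's range get mangled.
        ByteArrayOutputStream charCopy = new ByteArrayOutputStream();
        PrintWriter pw = new PrintWriter(new OutputStreamWriter(charCopy, "US-ASCII"));
        for (byte b : binary) {
            pw.write(b & 0xFF); // char 0x89 is unmappable in US-ASCII -> replaced with '?'
        }
        pw.flush();

        System.out.println(Arrays.equals(binary, byteCopy.toByteArray())); // true
        System.out.println(Arrays.equals(binary, charCopy.toByteArray())); // false: corrupted
    }
}
```

The same mangling happens silently with the platform default encoding whenever the file contains byte values that don't round-trip through that charset.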
I tried adding UTF-8 for this but it didn't work. What should I do to read a Russian file in Java?
FileInputStream fstream1 = new FileInputStream("russian.txt");
DataInputStream in = new DataInputStream(fstream1);
BufferedReader br = new BufferedReader(new InputStreamReader(in,"UTF-8"));
If the file is from Windows PC, try either "windows-1251" or "Cp1251" for the charset name.
If the file is somehow in the MS-DOS encoding, try using "Cp866".
Both of these are single-byte encodings, so simply declaring the file as UTF-8 (which is a multi-byte encoding) does nothing.
If all else fails, open the file in a hex editor and paste a few lines of the hex dump into your question. Then we'll detect the encoding.
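One quick way to narrow it down is to decode the same bytes with each candidate charset and see which one yields readable Cyrillic. The sample bytes below are an assumption for illustration (they spell "При" in windows-1251):

```java
import java.nio.charset.Charset;

public class CharsetProbe {
    public static void main(String[] args) {
        // In practice, read these bytes from the start of russian.txt.
        byte[] raw = { (byte) 0xCF, (byte) 0xF0, (byte) 0xE8 }; // "При" in windows-1251

        // Decode with each candidate and eyeball the result.
        for (String cs : new String[] { "UTF-8", "windows-1251", "Cp866" }) {
            String decoded = new String(raw, Charset.forName(cs));
            System.out.println(cs + " -> " + decoded);
        }
    }
}
```

The charset whose output looks like sensible Russian text is the one to pass to InputStreamReader.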
As others mentioned you need to know how the file is encoded. A simple check is to (ab)use Firefox as an encoding detector: answer to similar question
If this is a display problem, it depends on what you mean by "reads": in the console, in some window? See also How can I make a String with cyrillic characters display correctly?
I'm reading a file line by line. The file was encrypted with CipherOutputStream and then later compressed with DeflaterOutputStream. The file can contain UTF-8 characters, like Russian letters, etc.
I want to obtain the offset into the actual file, i.e. the number of bytes consumed by the br.readLine() call. The problem is that the file is both encrypted and deflated, so the length of the String read is larger than the number of bytes read from the file.
InputStream fis = tempURL.openStream();                  // tempURL holds the URL to download
CipherInputStream cis = new CipherInputStream(fis, pbeCipher);
InflaterInputStream iis = new InflaterInputStream(cis);
BufferedReader br = new BufferedReader(new InputStreamReader(iis, "UTF8"));
br.readLine();
int fSize = tempURL.openConnection().getContentLength(); // fetch file size
Use a CountingInputStream from the Apache Commons IO project:
InputStream fis=tempURL.openStream();
CountingInputStream countStream = new CountingInputStream(fis);
CipherInputStream cis=new CipherInputStream(countStream,pbeCipher);
...
Later you can obtain the file position with countStream.getByteCount().
For compressed files, you can find that a String doesn't correspond to a whole number of bytes, so the question cannot be answered exactly: e.g. a character can take less than a byte when compressed (otherwise there would be no point in compressing it).
BTW: it's usually best to compress the data before encrypting it, as the result will usually be much more compact. Compressing the data after it has been encrypted only helps if the encrypted output is base64 or something similar. Compression works best when the contents are predictable (e.g. repeating sequences, common characters), whereas the purpose of encryption is to make the data appear unpredictable.
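A sketch of that recommended order (deflate first, then encrypt), with the same caveat that the key handling is simplified for illustration:

```java
import javax.crypto.Cipher;
import javax.crypto.CipherOutputStream;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import java.io.ByteArrayOutputStream;
import java.util.zip.DeflaterOutputStream;

public class CompressThenEncrypt {
    public static void main(String[] args) throws Exception {
        // Highly repetitive input: 1000 copies of 'a'.
        byte[] plaintext = new String(new char[1000]).replace('\0', 'a').getBytes("UTF-8");

        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        SecretKey key = kg.generateKey();
        Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding"); // ECB for brevity only
        cipher.init(Cipher.ENCRYPT_MODE, key);

        // Deflater sees the predictable plaintext, so it can actually compress;
        // only the already-compressed bytes reach the cipher.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DeflaterOutputStream dos =
                 new DeflaterOutputStream(new CipherOutputStream(out, cipher))) {
            dos.write(plaintext);
        }
        System.out.println(out.size() < plaintext.length); // true: repetitive input shrinks
    }
}
```

Wrapping the streams the other way round (encrypt first, deflate second) hands the Deflater pseudo-random ciphertext, which it cannot compress.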