I am facing a problem saving a text file in UTF-8 format using Java. When I click "Save As" on the generated text file, it offers ANSI as the text format, not UTF-8. Below is the code I am writing when creating the file:
String excelFile="";
excelFile = getFullFilePath(fileName);
File file = new File(excelFile + fileName,"UTF-8");
File output = new File(excelFile,"UTF-8");
FileUtils.writeStringToFile(file, content, "UTF-8");
While creating the text file, I am using UTF-8 encoding, but the file still shows the encoding as ANSI while saving.
Kindly help.
Instead of using File directly, create a FileOutputStream and wrap it in an OutputStreamWriter that specifies the encoding. Try this:
Writer out = new BufferedWriter(new OutputStreamWriter(
        new FileOutputStream("outfilename"), "UTF-8"));
try {
    out.write(aString);
} finally {
    out.close();
}
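On Java 7 or later the same thing can be written more compactly with Files.newBufferedWriter() and try-with-resources; a minimal sketch, assuming aString holds your content (Files, Paths and StandardCharsets live in java.nio.file and java.nio.charset):
try (Writer out = Files.newBufferedWriter(
        Paths.get("outfilename"), StandardCharsets.UTF_8)) {
    out.write(aString); // encoded as UTF-8; the writer is closed automatically
}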
I had a problem very similar to that, and I solved it by saving the original file with UTF-8 encoding. In this case, go to the Excel file and save it again, or create a new file and copy the content over, making sure its encoding is UTF-8. This link has a tutorial on how to save an Excel file with UTF-8 encoding: https://help.surveygizmo.com/help/encode-an-excel-file-to-utf-8-or-utf-16. Beyond that, your code seems to be correct.
On the web page, the text reads "Why don't we".
But when I parse the web page and save it to a text file, it shows up like this in Eclipse:
Why don鈥檛 we
More information about my implementation:
The web page is encoded as UTF-8.
I use jsoup to parse it, and the file is saved as a .txt.
I use FileWriter f = new FileWriter() to write to the file.
UPDATE:
I actually solved the display problem in Eclipse by changing Eclipse's encoding to UTF-8.
FileWriter is a convenience class that uses the platform's default encoding. That is non-portable, and probably incorrect here.
BufferedWriter f = new BufferedWriter(new OutputStreamWriter(
        new FileOutputStream(file), StandardCharsets.UTF_8));
f.write("\uFEFF"); // redundant BOM character, written so Windows tools
                   // recognize the text as UTF-8
...
I am having a problem exporting a CSV file using au.com.bytecode.opencsv.CSVWriter. I did something like:
File file = File.createTempFile("UserDetails_", ".csv");
CSVWriter writer = new CSVWriter(new OutputStreamWriter(
        new FileOutputStream(file), "UTF-8"), ',');
and then when I export the .csv file, it shows junk characters for the French letters. [The data to be saved in the .csv are French characters.]
But previously I was doing something like:
CSVWriter writer = new CSVWriter(new FileWriter(file));
and then it showed all the French characters perfectly in the Windows environment, but in the Prod environment [Linux] it showed junk. So I thought to use the UTF-8 character set for the exported file format.
How can I get rid of the problem?
Please suggest!
Thanks in advance!
Hypothesis: you use Excel to open your CSVs under Windows.
Unfortunately for you, Excel is crap at reading UTF-8. Even though it should not be required, Excel expects a byte order mark at the beginning of the CSV if it uses any UTF-* encoding; otherwise it tries to read it as windows-1252!
Solution? Errr... Don't use Excel?
Anyway, with your old way:
CSVWriter writer = new CSVWriter(new FileWriter(file));
this would use the JVM's default encoding; this is windows-1252 under Windows and UTF-8 under Linux.
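If you want to check what your JVM's default actually is, print it:
System.out.println(Charset.defaultCharset()); // e.g. windows-1252 or UTF-8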
Note that Apache's commons-io has a BOMInputStream class (and a ByteOrderMark constant class) which may help you here.
Another solution would be (ewwww) to always read/write using Windows-1252.
Other note: if you use Java 7, use the Files.newBuffered{Reader,Writer}() methods -- and the try-with-resources statement.
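For illustration, here is a minimal sketch of writing the BOM by hand before handing the stream to CSVWriter; the file name and sample row are made up:
OutputStream os = Files.newOutputStream(Paths.get("UserDetails.csv"));
os.write(new byte[] { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF }); // UTF-8 BOM, for Excel's benefit
CSVWriter writer = new CSVWriter(
        new OutputStreamWriter(os, StandardCharsets.UTF_8), ',');
writer.writeNext(new String[] { "prénom", "café" }); // sample French row
writer.close(); // flushes and closes the underlying stream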
I am using Super CSV to create a CSV file. When I write special characters like the accented letters ï or î, they break in the generated CSV when I open it using Excel. When I open the same file using Notepad++ it shows the characters perfectly.
Any idea what could be the cause? I have specified UTF-8 encoding. Is there something I am missing?
ICsvListWriter writer = new CsvListWriter(
        getWriter(getCUSTOMEREXPORT_FOLDERPATH() + filename),
        CsvPreference.STANDARD_PREFERENCE);

private static OutputStreamWriter getWriter(String fileName) {
    final File file = new File(fileName);
    return new OutputStreamWriter(new FileOutputStream(file),
            Charset.forName("UTF-8"));
}
Microsoft and its Excel are too stupid for UTF-8. Excel strictly requires a byte order mark to realize that the file actually is UTF-8. Add one and it will work.
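A minimal sketch of that fix, applied to the getWriter() method above: emit the BOM as the first character and hand the writer to Super CSV as before (the throws clause is added so the sketch compiles):
private static OutputStreamWriter getWriter(String fileName) throws IOException {
    final OutputStreamWriter writer = new OutputStreamWriter(
            new FileOutputStream(new File(fileName)), Charset.forName("UTF-8"));
    writer.write('\uFEFF'); // byte order mark; Excel needs it to detect UTF-8
    return writer;
}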
I'm retrieving a file from an FTP server. The file is encoded as UTF-8.
ftpClient.connect(props.getFtpHost(), props.getFtpPort());
ftpClient.login(props.getUsername(), props.getPassword());
ftpClient.setFileType(FTP.BINARY_FILE_TYPE);
inputStream = ftpClient.retrieveFileStream(fileNameBuilder.toString());
And then somewhere else I'm reading the input stream
bufferedReader = new BufferedReader(new InputStreamReader(
inputStream, "UTF-8"));
But the file is not getting read as UTF-8 encoded!
I tried ftpClient.setAutodetectUTF8(true); but it still doesn't work.
Any ideas?
EDIT:
For example a row in the original file is
...00248090041KENAN SARÐIN 00000000015.993FAC...
After downloading it through FTPClient, I parse it and load it into a Java object. One of the fields of the Java object is name, which for this row is read as "KENAN SAR�IN".
I tried dumping to disk directly:
File file = new File("D:/testencoding/downloaded-file.txt");
FileOutputStream fop = new FileOutputStream(file);
ftpClient.retrieveFile(fileName, fop);
if (!file.exists()) {
file.createNewFile();
}
I compared the MD5 checksums of the two files (the one on the FTP server and the one dumped to disk), and they're the same.
I would separate out the problems first: dump the file to disk, and compare it with the original. If it's the same as the original, the problem has nothing to do with FTP. The FTP code looks okay, and since you're asking for the raw binary data, I'd expect it not to mess with anything.
If the file is the same after transfer as before, then the problem has nothing to do with FTP. You say "the file is not getting read as UTF-8 Encoded" but it's not clear what you mean. How certain are you that it's UTF-8 text to start with? If you could edit your question with the binary data, how it's being read as text, and how you'd expect it to be read as text, that would really help.
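If you would rather do that comparison in code than with an external tool, a small sketch using java.security.MessageDigest and java.nio.file.Files:
static String md5Of(String path) throws Exception {
    byte[] digest = MessageDigest.getInstance("MD5")
            .digest(Files.readAllBytes(Paths.get(path)));
    StringBuilder hex = new StringBuilder();
    for (byte b : digest) {
        hex.append(String.format("%02x", b)); // two hex digits per byte
    }
    return hex.toString();
}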
Try to download the file content as bytes and not as characters using InputStream and OutputStream instead of InputStreamReader. This way you are sure that the file is not changed during transfer.
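A minimal sketch of that, reusing the FTPClient calls from the question; retrieveFile() copies the raw bytes straight to the stream, so no charset conversion can happen:
ftpClient.setFileType(FTP.BINARY_FILE_TYPE);
try (OutputStream out = new FileOutputStream("downloaded-file.txt")) {
    ftpClient.retrieveFile(fileNameBuilder.toString(), out);
}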
I tried adding UTF-8 for this but it didn't work out. What should I do to read a Russian file in Java?
FileInputStream fstream1 = new FileInputStream("russian.txt");
DataInputStream in = new DataInputStream(fstream1);
BufferedReader br = new BufferedReader(new InputStreamReader(in, "UTF-8"));
If the file is from Windows PC, try either "windows-1251" or "Cp1251" for the charset name.
If the file is somehow in the MS-DOS encoding, try using "Cp866".
Both of these are single-byte encodings, so just relabeling the file as UTF-8 (which is a multibyte encoding) does nothing.
If all else fails, open the file in a hex editor and paste a few hex lines into your question. Then we'll detect the encoding.
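For example, the reader from the question with windows-1251 swapped in (use "Cp866" instead if the file turns out to be in the DOS encoding):
BufferedReader br = new BufferedReader(new InputStreamReader(
        new FileInputStream("russian.txt"), "windows-1251"));
String line = br.readLine(); // Cyrillic should now decode correctly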
As others mentioned you need to know how the file is encoded. A simple check is to (ab)use Firefox as an encoding detector: answer to similar question
If this is a display problem, it depends on what you mean by "reads": in the console, in some window? See also How can I make a String with cyrillic characters display correctly?