Cannot save a UTF-8 file on Windows Server with Java

I have a simple Java application that saves some strings in UTF-8 encoding.
But when I open the file with Notepad and choose Save As, it shows the encoding as ANSI. I don't know where the problem is.
The code that saves the file is:
File fileDir = new File("c:\\Sample.txt");
Writer out = new BufferedWriter(new OutputStreamWriter(
new FileOutputStream(fileDir), "UTF8"));
out.append("kodehelp UTF-8").append("\r\n");
out.append("??? UTF-8").append("\r\n");
out.append("???? UTF-8").append("\r\n");
out.flush();
out.close();

The characters you are writing to the file, as they appear in the code snippet, are in the basic ASCII subset of UTF-8. Notepad is likely auto-detecting the format and, seeing nothing outside the ASCII range, decides the file is ANSI.
If you want to force a different decision, write characters such as 字 or õ, which are well outside the ASCII range.
It is possible that the ??? strings in your example were intended to be non-ASCII text. If so, make sure your IDE and/or build tool treats the source files as UTF-8, and that the files are indeed UTF-8 encoded. If you provide more information about your build system, then we can help further.
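The point about source-file encoding can be sidestepped while testing. The following is a minimal sketch, not the asker's actual program: the class name Utf8WriteDemo and the temp-file target are assumptions made for illustration. It writes the non-ASCII characters as Unicode escapes, so the result does not depend on how the IDE or compiler decodes the .java file, and then dumps the bytes that actually landed on disk.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class Utf8WriteDemo {

    // Writes one ASCII-only line and one line with non-ASCII characters as UTF-8.
    // The escapes stand for U+5B57 and U+00F5, the two characters suggested above.
    static void writeSample(Path target) throws IOException {
        try (Writer out = new BufferedWriter(new OutputStreamWriter(
                Files.newOutputStream(target), StandardCharsets.UTF_8))) {
            out.append("kodehelp UTF-8").append("\r\n");
            out.append("\u5b57 \u00f5 UTF-8").append("\r\n");
        } // try-with-resources flushes and closes the writer
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical target: a temp file instead of c:\Sample.txt
        Path target = Files.createTempFile("Sample", ".txt");
        writeSample(target);

        // In UTF-8, U+5B57 is three bytes (E5 AD 97) and U+00F5 is two (C3 B5).
        StringBuilder hex = new StringBuilder();
        for (byte b : Files.readAllBytes(target)) {
            hex.append(String.format("%02X ", b));
        }
        System.out.println(hex.toString().trim());
    }
}
```

If the multi-byte sequences show up in the dump, the file is UTF-8 regardless of what an editor guesses.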

Related

Java UTF-8 non ASCII chars not supported on Windows

I am working on a tool which produces a zip with some generated files. Some of my users on Windows 10 reported that when I add a string to a file within a zip, non-ASCII chars are replaced by "?".
It is really strange because it works perfectly on Linux (NixOS). Do you have any idea?
fis = new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8));
...
public static void addToZip(String zipFilePath, final InputStream fis, final ZipOutputStream zos)
throws IOException {
final ZipEntry zipEntry = new ZipEntry(zipFilePath);
zipEntry.setLastModifiedTime(FileTime.fromMillis(0L));
zos.putNextEntry(zipEntry);
final byte[] bytes = new byte[1024];
int length;
while ((length = fis.read(bytes)) >= 0)
zos.write(bytes, 0, length);
zos.closeEntry();
fis.close();
if (!(Settings.PROTECTION.toBool()))
return;
zipEntry.setCrc(bytes.length);
zipEntry.setSize(new BigInteger(bytes).mod(BigInteger.valueOf(Long.MAX_VALUE)).longValue());
}
...
final ZipOutputStream zos = new ZipOutputStream(fos, StandardCharsets.UTF_8);
Since your description and your code are incomplete, I'm making a few assumptions:
Some of the files in the zip archive are text files (or similar, such as CSV).
The zip archives are created on a Linux system (or a single system under your control).
The zip archives are then sent to your users and used on different operating systems.
If so, the problem is not related to the zip archive. Instead, it's a general problem of text files. It would also occur if you just sent a single text file.
The cause of the problem is that text files do not contain any reliable information about their encoding. On your side, the text file is created using UTF-8 encoding. On the users' side, different operating systems and different tools are used to view or process the text file. Some of these tools might make an effort to determine the encoding and guess it correctly. But if they just use the operating system's default encoding, users on Windows will get the wrong one, as Windows defaults to Windows-1252 and similar encodings.
The result of processing a UTF-8 encoded file as Windows-1252 is that non-ASCII characters are displayed incorrectly, for example as "?" or as garbled sequences such as "Ã©".
If your users view the text files with text editors, ask them to set the text editor to UTF-8. If the text files are processed with custom software, ask them to modify the software such that it explicitly uses UTF-8.
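The mismatch can be reproduced without a zip file. This is a minimal sketch, assuming the JVM provides the windows-1252 charset (common JDKs do, but it is not one of the guaranteed standard charsets); the class and method names are hypothetical. It shows two distinct failure modes: decoding UTF-8 bytes with the wrong charset, and encoding text into a charset that lacks the character, which is one way a literal "?" ends up in the data.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingMismatchDemo {

    // Assumption: windows-1252 is available in this JVM.
    static final Charset WINDOWS_1252 = Charset.forName("windows-1252");

    // Decoding UTF-8 bytes with a single-byte charset:
    // every byte of a multi-byte sequence becomes its own character.
    static String readUtf8AsWindows1252(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, WINDOWS_1252);
    }

    // Encoding into a charset that cannot represent the character:
    // String.getBytes substitutes the charset's replacement byte, '?'.
    static String roundTripThroughWindows1252(String text) {
        byte[] cp1252 = text.getBytes(WINDOWS_1252);
        return new String(cp1252, WINDOWS_1252);
    }

    public static void main(String[] args) {
        // "caf\u00e9" is "cafe" with an acute accent on the e.
        System.out.println(readUtf8AsWindows1252("caf\u00e9"));
        // U+5B57 is a CJK character with no windows-1252 equivalent.
        System.out.println(roundTripThroughWindows1252("\u5b57"));
    }
}
```

Which of the two the users are seeing hints at where the wrong charset is applied: garbled pairs point to a reader, a plain "?" points to a writer or converter.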

Different character after export application on Eclipse

When I run my application on Eclipse, I can see the correct latin character, like this:
But when I export to runnable jar file and execute it, the special character is wrong, like this:
I have no idea why this happens. On Mac it's OK both in Eclipse and from the .jar file, but on Windows it's not.
I get the data from webserver and I show in a JavaFX ListView.
The text arrives as UTF-8 bytes, and those bytes are being decoded as some Windows encoding.
My guess is that you did this:
URL url = ...
BufferedReader in = new BufferedReader(
new InputStreamReader(url.openStream()));
Whereas you should have done this:
URL url = ...
BufferedReader in = new BufferedReader(
new InputStreamReader(url.openStream(),
StandardCharsets.UTF_8));
The InputStreamReader constructor without a Charset uses the current default platform encoding, which is wrong here.
For any URL you could first call openConnection and try to work out the delivered encoding. The strategy is a bit roundabout:
connection.getContentType(), looking for a charset= parameter (getContentEncoding() reports the Content-Encoding header, such as gzip, not the charset)
the default is ISO-8859-1
when the result is ISO-8859-1, use Windows-1252 instead, as browsers do that too
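The strategy above can be sketched roughly as follows. This is an illustration under assumptions, not a complete HTTP charset resolver: ResponseCharset, fromContentType and openReader are hypothetical names, the header parsing is deliberately simple, and the windows-1252 charset is assumed to be available in the JVM.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class ResponseCharset {

    // Picks a charset from a Content-Type value such as "text/html; charset=utf-8".
    // Falls back to ISO-8859-1, then substitutes windows-1252 for it.
    static Charset fromContentType(String contentType) {
        Charset charset = StandardCharsets.ISO_8859_1;
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = p.substring("charset=".length()).replace("\"", "").trim();
                    try {
                        charset = Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        // Unknown or malformed charset name: keep the fallback.
                    }
                }
            }
        }
        if (charset.equals(StandardCharsets.ISO_8859_1)) {
            charset = Charset.forName("windows-1252");
        }
        return charset;
    }

    // Opens a reader on the connection using the charset the server declared.
    static BufferedReader openReader(URLConnection connection) throws IOException {
        Charset charset = fromContentType(connection.getContentType());
        return new BufferedReader(new InputStreamReader(connection.getInputStream(), charset));
    }

    public static void main(String[] args) {
        System.out.println(fromContentType("text/html; charset=utf-8"));
        System.out.println(fromContentType("text/html"));
    }
}
```

If the server is known to always send UTF-8, passing StandardCharsets.UTF_8 directly, as in the corrected snippet, is simpler.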
Java keeps text as Unicode in String and char, so all scripts can be handled simultaneously.
Binary data (byte[], InputStream, OutputStream) needs the charset/encoding specified whenever it is converted from or to text.

How to convert strange character from web page?

In the web page, it is "Why don't we" as follows:
But when I parse the webpage and save it to a text file, it becomes this under eclipse:
Why don鈥檛 we
More information about my implementation:
The webpage is: utf-8
I use jSoup to parse, the file is saved as a txt.
I use FileWriter f = new FileWriter() to write to file.
UPDATE:
I actually solved the display problem in Eclipse by changing Eclipse's encoding to UTF-8.
FileWriter is a utility class that uses the platform's current default encoding. That is non-portable, and probably incorrect.
BufferedWriter f = new BufferedWriter(new OutputStreamWriter(
new FileOutputStream(file), StandardCharsets.UTF_8));
f.write("\uFEFF"); // Redundant BOM character might be written to be sure
// the text is read as UTF-8
...

Exporting CSV in French shows junk characters

I am having a problem in exporting a csv file using au.com.bytecode.opencsv.CSVWriter. I did something like:
File file = File.createTempFile("UserDetails_", ".csv");
CSVWriter writer = new CSVWriter(new OutputStreamWriter(
new FileOutputStream(file), "UTF-8"),
',');
and then when I export the .csv file, it shows junk characters for French letters. [The data to be saved in the .csv contains French characters.]
But previously I was doing something like:
CSVWriter writer = new CSVWriter(new FileWriter(file));
Then it showed all French characters perfectly in the Windows environment, but in the Prod environment [Linux] it showed junk. So I decided to use the UTF-8 character set for the exported file.
How can I get rid of the problem?
Please Suggest!!
Thanks in advance!
Hypothesis: you use Excel to open your CSVs under Windows.
Unfortunately for you, Excel is crap at reading UTF-8. Even though it should not be required, Excel expects a byte order mark at the beginning of the CSV if it uses any UTF-* encoding; otherwise it will try to read it as Windows-1252!
Solution? Errr... Don't use Excel?
Anyway, with your old way:
CSVWriter writer = new CSVWriter(new FileWriter(file));
this would use the JVM's default encoding; this is typically windows-1252 under Windows and UTF-8 under Linux.
Note that Apache's commons-io has BOM{Input,Output}Stream classes which may help you here.
Another solution would be (ewwww) to always read/write using Windows-1252.
Other note: if you use Java 7, use the Files.newBuffered{Reader,Writer}() methods -- and the try-with-resources statement.
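As a rough sketch of how the last note and the byte order mark workaround fit together using only the JDK (no commons-io): the class name CsvWithBom and the sample rows are made up for illustration, and whether a given Excel version then picks UTF-8 is an assumption to verify on the target machines.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class CsvWithBom {

    // Writes UTF-8 text that starts with a byte order mark (EF BB BF on disk).
    static void writeCsv(Path target, String... lines) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            writer.write("\uFEFF"); // U+FEFF encodes to the three BOM bytes in UTF-8
            for (String line : lines) {
                writer.write(line);
                writer.write("\r\n");
            }
        } // try-with-resources closes the writer even if a write fails
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("UserDetails_", ".csv");
        // Escapes keep this source ASCII-only: "Prénom" and "Hélène".
        writeCsv(file, "Pr\u00e9nom,Ville", "H\u00e9l\u00e8ne,Paris");
        System.out.println("Wrote " + file);
    }
}
```

Since the question's code passes a Writer to CSVWriter, the same BufferedWriter could presumably be handed to CSVWriter right after the BOM is written, instead of writing the lines by hand.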

How can I read a Russian file in Java?

I tried specifying UTF-8 for this but it didn't work. What should I do to read a Russian file in Java?
FileInputStream fstream1 = new FileInputStream("russian.txt");
DataInputStream in = new DataInputStream(fstream1);
BufferedReader br = new BufferedReader(new InputStreamReader(in,"UTF-8"));
If the file is from a Windows PC, try either "windows-1251" or "Cp1251" for the charset name.
If the file is somehow in the MS-DOS encoding, try using "Cp866".
Both of these are single-byte encodings, and simply declaring the file to be UTF-8 (which is multibyte) does not change the bytes in it.
If all else fails, use a hex editor and paste a few lines of hex from the file into your question. Then we'll detect the encoding.
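Both suggestions, trying candidate charsets and dumping some hex, can be done from Java itself. The following is a minimal sketch with hypothetical names (EncodingProbe, hexDump, decodeWithCandidates); it assumes the JVM provides windows-1251 and IBM866 (the canonical name for Cp866), which common JDKs do.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

class EncodingProbe {

    // Formats the first 'limit' bytes as space-separated uppercase hex.
    static String hexDump(byte[] data, int limit) {
        StringBuilder sb = new StringBuilder();
        int n = Math.min(limit, data.length);
        for (int i = 0; i < n; i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("%02X", data[i]));
        }
        return sb.toString();
    }

    // Decodes the same bytes with each candidate charset so the results can be compared by eye.
    static Map<String, String> decodeWithCandidates(byte[] data) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String name : new String[] {"UTF-8", "windows-1251", "IBM866"}) {
            result.put(name, new String(data, Charset.forName(name)));
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Pass the file to inspect as the first argument, e.g. russian.txt
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(hexDump(data, 32));
        decodeWithCandidates(data).forEach((name, text) ->
                System.out.println(name + ": " + text.substring(0, Math.min(40, text.length()))));
    }
}
```

The candidate whose output reads as Russian is the charset name to pass to InputStreamReader. If the console itself cannot display Cyrillic, the hex line is still useful: in windows-1251, Cyrillic letters occupy the range C0 to FF.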
As others mentioned you need to know how the file is encoded. A simple check is to (ab)use Firefox as an encoding detector: answer to similar question
If this is a display problem, it depends what you mean by "reads": in the console, in some window? See also How can I make a String with cyrillic characters display correctly?
