Chinese Characters Are Garbled in IO Operations - java

I am trying to write Chinese characters to a file, but I get a wrong result.
For instance:
import java.io.*;
import java.nio.*;
class x {
public static void main(String... args) throws Exception {
OutputStreamWriter outputStreamWriter =
new OutputStreamWriter(new FileOutputStream(new File("practice.csv"), true), "GBK");
outputStreamWriter.write("常用场景");
outputStreamWriter.write("Helo World!");
outputStreamWriter.flush();
outputStreamWriter.close();
}
}
Response : ????¡±¡§??????Helo World!
I tried changing the charset to utf-8 and utf-16, but it made no difference. Lastly I tried adding a BufferedWriter, but unfortunately that made no difference either.
Then I considered changing csv to txt, but got the same result again. What am I doing wrong?

I finally found it. First, many thanks to @Kayaman and @user16320675 for helping.
In fact, everything was correct. The source of the problem is that the CSV file was opened by Excel. When you open a CSV file directly in Excel, it is decoded according to the encoding of the computer's current language. On Windows 10 EN the only other option is a manual Data Import. I was using Windows 10 EN, and Excel uses ANSI on Windows 10 EN.
My solution: I added the Chinese language pack to my Windows 10 computer and changed the Excel editing language (Chinese as default), and everything worked.
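The effect described above, correct bytes shown through the wrong decoder, can be reproduced without Excel. This is only a minimal sketch: it assumes UTF-8 and ISO-8859-1 as stand-ins for GBK and the Windows ANSI code page (both are guaranteed to exist on every JVM), and the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DecodeDemo {
    // Encodes text with one charset and decodes the bytes with another,
    // which is what a viewer does when it assumes the wrong encoding.
    static String reinterpret(String text, Charset written, Charset assumed) {
        byte[] bytes = text.getBytes(written);
        return new String(bytes, assumed);
    }

    public static void main(String[] args) {
        // Unicode escapes for the four Chinese characters used in the question
        String text = "\u5e38\u7528\u573a\u666f" + "Helo World!";

        // Same charset on both sides: the text survives.
        System.out.println(
            reinterpret(text, StandardCharsets.UTF_8, StandardCharsets.UTF_8).equals(text));

        // Different charset on the reading side: the Chinese part is garbled,
        // although the bytes themselves were written correctly.
        System.out.println(
            reinterpret(text, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1).equals(text));
    }
}
```

The ASCII part ("Helo World!") reads the same either way, which matches the output in the question where only the Chinese characters were damaged.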

Related

Java: Runtime.exec() and Unicode symbols on Windows: how to make it work with non-English letters?

Intro
I am using Runtime.exec() to execute some external command and I am using parameters that contain non-English characters. I simply want to run something like this:
python test.py шалом
It works correctly in cmd directly, but is handled incorrectly via Runtime.getRuntime().exec("python test.py шалом")
On Windows my external program fails because of the unknown symbols passed to it.
I remember a similar issue from the early 2010s (!), JDK-4947220, but I thought it had been fixed since Java core 1.6.
Environments:
OS: Name Microsoft Windows 10 Pro (Version 10.0.18362 Build 18362)
Java: jdk1.8.0_221
Code
The best way to understand the question is the code snippet below:
import java.io.BufferedReader;
import java.io.InputStreamReader;
public class MainClass {
private static void foo(String filename) {
try {
BufferedReader input = new BufferedReader(
new InputStreamReader(
Runtime.getRuntime().exec(filename).getInputStream()));
String line;
while ((line = input.readLine()) != null) {
System.out.println(line);
}
input.close();
} catch (Exception e) { /* ... */ }
}
public static void main(String[] args) {
foo("你好.bat 你好"); // ??
foo("привет.bat привет"); // ??????
foo("hi.bat hi"); // hi
}
}
Where the .bat file contains only a simple @echo %1
The output will be:
??
??????
hi
PS
System.out.println("привет") - works fine and prints everything correctly
My questions are the following:
1) Is this issue related to the UTF-8 and UTF-16 formats?
2) How can I fix this issue? I do not like this answer as it looks like a very dangerous and ugly workaround.
3) Does anyone know why the file name of the batch file is not broken and the file can be found, but the argument gets broken? Maybe it is a problem with @echo?
Yes, the issue is related to UTF. In theory, setting code page 65001 for the cmd that executes the .bat files should solve the issue (along with setting UTF-8 as the default charset on the Java side).
Unfortunately, there is a bug in Windows, mentioned in "Java, Unicode, UTF-8, and Windows Command Prompt".
So there is no simple and complete solution. What can be done is to set the same language-specific default encoding, such as cp1251 for Cyrillic, for both Java and cmd. Not all languages are well covered by the Windows encodings; Chinese is one example.
If some non-technical restriction on the Windows system prevents changing the default encoding to the language-specific one for all cmd processes, the Java code becomes more complicated. First, a new cmd process has to be created, and a reader with UTF-16LE (for a 'cmd /U' process) and a writer with CP1251 should be attached to its stdout and stdin streams from different threads. The first command sent to stdin from Java should be 'chcp 1251', and the second is the name of the .bat file with its parameters.
A complete solution may still use UTF-16LE for reading the cmd output, but to pass text in, another universal encoding should be used, for example base64, which again increases complexity.
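The base64 idea mentioned above could look roughly like the sketch below. It only covers the Java side and assumes that the receiving program (for example test.py or the .bat file) decodes the argument itself; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class ArgCodec {
    // Turns arbitrary Unicode text into pure ASCII, which no code page can damage.
    static String encode(String arg) {
        return Base64.getEncoder().encodeToString(arg.getBytes(StandardCharsets.UTF_8));
    }

    // What the receiving side would have to do to recover the original text.
    static String decode(String encoded) {
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String arg = "\u043f\u0440\u0438\u0432\u0435\u0442"; // the Cyrillic word from the question
        String safe = encode(arg);
        System.out.println(safe);                     // ASCII-only text
        System.out.println(decode(safe).equals(arg)); // the round trip is lossless
        // The command would then be built from ASCII-only parts, for example:
        // new ProcessBuilder("python", "test.py", safe)
    }
}
```

The cost is exactly the extra complexity the answer warns about: every program that receives such an argument needs a matching decode step.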

Cannot save a UTF-8 file on Windows Server with Java

I have a simple Java application that saves some Strings in UTF-8 encoding.
But when I open that file with Notepad and choose Save As, it shows the encoding as ANSI. I don't know where the problem is.
My code that saves the file is:
File fileDir = new File("c:\\Sample.txt");
Writer out = new BufferedWriter(new OutputStreamWriter(
new FileOutputStream(fileDir), "UTF8"));
out.append("kodehelp UTF-8").append("\r\n");
out.append("??? UTF-8").append("\r\n");
out.append("???? UTF-8").append("\r\n");
out.flush();
out.close();
The characters you are writing to the file, as they appear in the code snippet, are in the basic ASCII subset of UTF-8. Notepad is likely auto-detecting the format and, seeing nothing outside the ASCII range, decides the file is ANSI.
If you want to force a different decision, write characters such as 字 or õ, which are well outside the ASCII range.
It is possible that the ??? strings in your example were intended to be UTF-8. If so, make sure your IDE and/or build tool recognizes the source files as UTF-8, and that the files are indeed UTF-8 encoded. If you provide more information about your build system, we can help further.
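The point about the ASCII subset can be checked directly. A minimal sketch (class and method names are hypothetical) that compares the UTF-8 bytes of a string with its plain ASCII bytes:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class AsciiSubsetDemo {
    // True when the UTF-8 encoding of the text is byte-for-byte plain ASCII,
    // so nothing in the file distinguishes it from an ANSI file.
    static boolean indistinguishableFromAscii(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);
        return Arrays.equals(utf8, ascii);
    }

    public static void main(String[] args) {
        System.out.println(indistinguishableFromAscii("kodehelp UTF-8")); // ASCII-only text
        System.out.println(indistinguishableFromAscii("\u5b57 UTF-8"));   // contains a CJK character
        System.out.println(indistinguishableFromAscii("\u00f5 UTF-8"));   // contains o with tilde
    }
}
```

Only the first string produces identical bytes in both encodings, which is why an editor that guesses the encoding has no reason to report UTF-8 for it.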

How to convert strange character from web page?

In the web page, it is "Why don't we" as follows:
But when I parse the webpage and save it to a text file, it becomes this under eclipse:
Why don鈥檛 we
More information about my implementation:
The webpage is: utf-8
I use jSoup to parse, the file is saved as a txt.
I use FileWriter f = new FileWriter() to write to file.
UPDATE:
I actually solved the display problem in Eclipse by changing Eclipse's encoding to UTF-8.
FileWriter is a utility class that uses the current platform's default encoding. That is non-portable, and probably incorrect.
BufferedWriter f = new BufferedWriter(new OutputStreamWriter(
new FileOutputStream(file), StandardCharsets.UTF_8));
f.write("\uFEFF"); // Redundant BOM character might be written to be sure
// the text is read as UTF-8
...
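As an alternative to wrapping streams by hand, the same explicit-charset idea can be sketched with java.nio.file.Files. This is only an illustration: the temporary file, class name, and method name are assumptions, not part of the original answer.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class Utf8FileDemo {
    // Writes text to a temporary file as UTF-8, reads it back as UTF-8,
    // and returns what was read.
    static String roundTrip(String text) {
        try {
            Path file = Files.createTempFile("utf8-demo", ".txt");
            try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
                out.write(text);
            }
            String read = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            Files.delete(file);
            return read;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "Why don\u2019t we"; // contains a right single quotation mark
        System.out.println(roundTrip(text).equals(text));
    }
}
```

Because both the writer and the reader name the charset, the result does not depend on the platform default, unlike FileWriter.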

saving Java file in UTF-8

When I run this program it gives me a '?' for the unicode code-point \u0508. This is because the default windows character encoding CP-1252 is unable to map this code-point.
But when I save this file in Eclipse as 'Text file encoding' = UTF-8 and run this program it gives me the correct output AԈC.
Why does this work? I mean, the Java file is saved as UTF-8, but the underlying Windows OS encoding is still CP-1252. My question is similar to this: when I try to read as UTF-16 a text file that was originally written in UTF-8, the output is weird, with different box symbols.
public class e {
public static void main(String[] args) {
System.out.println(System.getProperty("file.encoding"));
String original = new String("A" + "\u0508" + "C");
try {
System.out.println("original = " + original);
} catch (Exception e) {
e.printStackTrace();
}
}
}
Saving the Java source file either as UTF-8 or Windows-1252 shouldn't make any difference, because both encodings encode all the ASCII code-points the same way. And your source file is only using ASCII characters.
So you should look for the bug somewhere else. I suggest carefully redoing the steps you took and repeating the tests.
The issue is the setting of file.encoding when you run the program, and the destination of System.out. If System.out is an eclipse console, it may well be set to be UTF-8 eclipse console. If it's just a Windows DOS box, it is a CP1252 code page, and will only display ? in this case.
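How the encoding of the output stream decides between a ? and the real character can be shown without any console. A minimal sketch, assuming an in-memory stream in place of System.out (class and method names are hypothetical):

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class ConsoleEncodingDemo {
    // Prints text through a PrintStream that uses the given encoding and
    // returns the raw bytes a console would receive.
    static byte[] printed(String text, String encoding) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            PrintStream out = new PrintStream(sink, true, encoding);
            out.print(text);
            out.flush();
            return sink.toByteArray();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String original = "A\u0508C";

        // US-ASCII cannot map U+0508, so the encoder substitutes '?'.
        byte[] ascii = printed(original, "US-ASCII");
        System.out.println((char) ascii[1]); // prints ?

        // UTF-8 can map it and uses two bytes for that character.
        byte[] utf8 = printed(original, "UTF-8");
        System.out.println(utf8.length); // prints 4
    }
}
```

A PrintStream created this way around System.out is one option for forcing a specific output encoding, provided the console displaying the bytes is set to the same encoding.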

Carriage Return\Line feed in Java

I have created a text file in Unix environment using Java code.
For writing the text file I am using java.io.FileWriter and BufferedWriter. And for newline after each row I am using bw.newLine() method (where bw is object of BufferedWriter).
And I'm sending that text file by attaching in mail from Unix environment itself (automated that using Unix commands).
My issue is that after I download the text file from mail on a Windows system and open it, the data is not properly aligned. I think the newLine() character is not working.
I want the text file to have the same alignment when opened in a Windows environment as it has in the Unix environment.
How do I resolve the problem?
Java code below for your reference (running in Unix environment):
File f = new File(strFileGenLoc);
BufferedWriter bw = new BufferedWriter(new FileWriter(f, false));
rs = stmt.executeQuery("select * from jpdata");
while ( rs.next() ) {
bw.write(rs.getString(1)==null? "":rs.getString(1));
bw.newLine();
}
Java only knows about the platform it is currently running on, so it can only give you a platform-dependent output on that platform (using bw.newLine()) . The fact that you open it on a windows system means that you either have to convert the file before using it (using something you have written, or using a program like unix2dos), or you have to output the file with windows format carriage returns in it originally in your Java program. So if you know the file will always be opened on a windows machine, you will have to output
bw.write(rs.getString(1)==null? "":rs.getString(1));
bw.write("\r\n");
It's worth noting that you aren't going to be able to output a file that will look correct on both platforms if you are using just plain text. You may want to consider using html if it is an email, or xml if it is data. Alternatively, you may need some kind of client that reads the data and then formats it for the platform that the viewer is using.
The method newLine() ensures a platform-compatible new line is added (0Dh 0Ah for DOS, 0Dh for older Macs, 0Ah for Unix/Linux). Java has no way of knowing on which platform you are going to send the text. This conversion should be taken care of by the mail sending entities.
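That newLine() simply follows the platform the JVM runs on can be seen with a small check. A minimal sketch, using a StringWriter instead of a file (class and method names are hypothetical):

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;

class NewLineDemo {
    // Returns exactly what BufferedWriter.newLine() writes on this JVM.
    static String newLineOutput() {
        StringWriter sink = new StringWriter();
        try (BufferedWriter bw = new BufferedWriter(sink)) {
            bw.newLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toString();
    }

    public static void main(String[] args) {
        // Show the characters in escaped form so they are visible.
        String escaped = newLineOutput().replace("\r", "\\r").replace("\n", "\\n");
        System.out.println("newLine() wrote: " + escaped);
        System.out.println(newLineOutput().equals(System.lineSeparator()));
    }
}
```

On a Unix machine this writes only a line feed, which is why the file generated there shows no line breaks in older versions of Notepad.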
Don't know who looks at your file, but if you open it in wordpad instead of notepad, the linebreaks will show correct. In case you're using a special file extension, associate it with wordpad and you're done with it. Or use any other more advanced text editor.
bw.newLine(); cannot ensure compatibility with all systems.
If you are sure it is going to be opened in windows, you can format it to windows newline.
If you are already using native Unix commands, try unix2dos to convert the already generated file to Windows format and then send the mail.
If you are not using Unix commands and prefer to do it in Java, use `bw.write("\r\n")` and, if it does not complicate your program, have a method that finds out the operating system and writes the appropriate newline.
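If the text is already assembled as a String, converting it to Windows line endings in Java might look like the sketch below. It is an assumption-laden illustration rather than a replacement for unix2dos; the class and method names are hypothetical.

```java
class LineEndings {
    // Converts every line ending (CRLF, lone CR, or lone LF) to CRLF.
    // CRLF is listed first in the pattern so an existing "\r\n" is matched
    // as one unit and is not turned into "\r\r\n".
    static String toCrlf(String text) {
        return text.replaceAll("\\r\\n|\\r|\\n", "\r\n");
    }

    public static void main(String[] args) {
        String mixed = "unix\nwindows\r\nold mac\rend";
        String converted = toCrlf(mixed);
        // Show the result in escaped form so the line endings are visible.
        System.out.println(converted.replace("\r", "\\r").replace("\n", "\\n"));
    }
}
```

Running the conversion twice gives the same result as running it once, so it is safe to apply to text whose origin is unknown.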
If I understand you right, we talk about a text file attachment.
That's unfortunate because if it was the email's message body, you could always use "\r\n", referring to http://www.faqs.org/rfcs/rfc822.html
But as it's an attachment, you must live with system differences. If I were in your shoes, I would choose one of these options:
a) only support windows clients by using "\r\n" as line end.
b) provide two attachment files, one with linux format and one with windows format.
c) I don't know if the attachment is to be read by people or machines, but if it is people I would consider attaching an HTML file instead of plain text. more portable and much prettier, too :)
Encapsulate your writer to provide char replacement, like this:
import java.io.*;

public class WindowsFileWriter extends Writer {
    private Writer writer;

    public WindowsFileWriter(File file) throws IOException {
        try {
            writer = new OutputStreamWriter(new FileOutputStream(file), "ISO-8859-15");
        } catch (UnsupportedEncodingException e) {
            // Fall back to the platform default encoding
            writer = new FileWriter(file);
        }
    }

    @Override
    public void write(char[] cbuf, int off, int len) throws IOException {
        // Note: input that already contains "\r\n" will end up with "\r\r\n"
        writer.write(new String(cbuf, off, len).replace("\n", "\r\n"));
    }

    @Override
    public void flush() throws IOException {
        writer.flush();
    }

    @Override
    public void close() throws IOException {
        writer.close();
    }
}
