OpenCSV writing unescaped escape char - java

I am using OpenCSV 2.3 to read and write data files. When I switch a Windows PC to Japanese, I notice that the OpenCSV write method internally uses a PrintWriter that converts the yen character to \
As a result, the CSV file ends up with an unescaped \, and reading that file with CSVReader fails.
How can I fix this problem?

After investigating further, I found that this is not a problem with the CSVWriter write method, which works fine.
So where is the problem?
Previously, I was using FileWriter, which uses the system default encoding. (In other words, with FileWriter the encoding of the files you write is at the mercy of the platform.)
So I switched to
csvReader = new CSVReader(new BufferedReader(new InputStreamReader(new FileInputStream(inputFile), "UTF-8")));
to tell the reader (and, in the same way, the writer) to read and write the file in a specified encoding rather than the system default.
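The fix described above can be sketched without OpenCSV, using only the JDK. The class and method names here (Utf8RoundTrip, writeLine, readLine) are made up for illustration. The sample line uses \u00A5, the yen sign, as an assumed stand-in for the data in the question.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

public class Utf8RoundTrip {

    // Writes one line as UTF-8, regardless of the platform default encoding.
    public static void writeLine(File file, String line) throws IOException {
        try (BufferedWriter out = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_8))) {
            out.write(line);
            out.newLine();
        }
    }

    // Reads the first line back, decoding the bytes as UTF-8.
    public static String readLine(File file) throws IOException {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8))) {
            return in.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("roundtrip_", ".csv"); // hypothetical temp file
        file.deleteOnExit();
        String line = "\"price\",\"\u00A5100\""; // \u00A5 is the yen sign
        writeLine(file, line);
        // The line survives the round trip because both sides name the same charset.
        System.out.println(line.equals(readLine(file)));
    }
}
```

Since CSVWriter and CSVReader are constructed from a Writer and a Reader, the same OutputStreamWriter and InputStreamReader can presumably be passed to them instead of a FileWriter or FileReader.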

Related

can not save utf8 file in windows server with java

I have a simple Java application that saves some strings in UTF-8 encoding.
But when I open the file with Notepad and choose Save As, it shows the encoding as ANSI. I don't know where the problem is.
The code that saves the file is:
File fileDir = new File("c:\\Sample.txt");
Writer out = new BufferedWriter(new OutputStreamWriter(
new FileOutputStream(fileDir), "UTF8"));
out.append("kodehelp UTF-8").append("\r\n");
out.append("??? UTF-8").append("\r\n");
out.append("???? UTF-8").append("\r\n");
out.flush();
out.close();
The characters you are writing to the file, as they appear in the code snippet, are in the basic ASCII subset of UTF-8. Notepad is likely auto-detecting the format and, seeing nothing outside the ASCII range, decides the file is ANSI.
If you want to force a different decision, write characters such as 字 or õ, which are well outside the ASCII range.
It is possible that the ??? strings in your example were intended to be non-ASCII text. If so, make sure your IDE and/or build tool recognizes the source files as UTF-8, and that the files are indeed UTF-8 encoded. If you provide more information about your build system, then we can help further.
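A small sketch of why an ASCII-only file gives a detector nothing to go on. It compares the bytes produced by UTF-8 with those produced by ISO-8859-1, which is used here as an assumed stand-in for the Windows "ANSI" code page. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class AsciiLooksTheSame {

    // True when the text encodes to identical bytes in UTF-8 and ISO-8859-1,
    // i.e. when nothing in the file distinguishes the two encodings.
    public static boolean sameBytesInBothEncodings(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        return Arrays.equals(utf8, latin1);
    }

    public static void main(String[] args) {
        System.out.println(sameBytesInBothEncodings("kodehelp UTF-8")); // true: pure ASCII
        System.out.println(sameBytesInBothEncodings("\u00F5 UTF-8"));   // false: õ is 2 bytes in UTF-8
    }
}
```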

How can I write UTF-8 chars on java application?

I want to write
ısı
to a CSV file from a Java application in NetBeans. It works fine when I debug the code. But when I clean and build the project, run the .jar application, and then look at the CSV, I see
?s?
How can I solve this?
Thanks in advance.
EDIT
I use this to write :
PrintWriter csvWriter = new PrintWriter(new File("myfile.csv")) ;
csvWriter.println("ısı") ;
With this code:
PrintWriter csvWriter = new PrintWriter(new File("myfile.csv")) ;
csvWriter.println("ısı") ;
you are using the default character encoding of your system, which may or may not be UTF-8. If you want to use UTF-8, you have to specify that:
PrintWriter csvWriter = new PrintWriter(new File("myfile.csv"), "UTF-8");
Note that even if you do this, you might still see unexpected output. If that's the case, then you will need to check if whatever program you use to display the output (the Windows command prompt, or a text editor, or ...) understands that the file is in UTF-8 and displays it correctly.
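One way to check the file itself, independent of any viewer, is to look at the raw bytes. The following is a minimal sketch with hypothetical names (PrintWriterUtf8, writeAndReadBytes). It writes the string with \u escapes so that the result does not depend on the encoding of the source file, which is an assumption added here and not something the question mentions.

```java
import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;

public class PrintWriterUtf8 {

    // Writes the text through a UTF-8 PrintWriter and returns the bytes on disk.
    public static byte[] writeAndReadBytes(String text) throws IOException {
        File file = File.createTempFile("myfile_", ".csv"); // hypothetical temp file
        file.deleteOnExit();
        try (PrintWriter csvWriter = new PrintWriter(file, "UTF-8")) {
            csvWriter.print(text);
        }
        return Files.readAllBytes(file.toPath());
    }

    public static void main(String[] args) throws IOException {
        // "\u0131s\u0131" is the dotless-i, s, dotless-i string from the question.
        byte[] bytes = writeAndReadBytes("\u0131s\u0131");
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02X ", b & 0xFF));
        }
        System.out.println(hex.toString().trim()); // C4 B1 73 C4 B1
    }
}
```

If the file contains 3F 73 3F instead, the characters were already replaced by ? when writing, so the problem is on the writing side and not in the viewer.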

How to convert strange character from web page?

On the web page, the text is "Why don't we".
But when I parse the web page and save it to a text file, it shows up like this in Eclipse:
Why don鈥檛 we
More information about my implementation:
The webpage is: utf-8
I use jSoup to parse it, and the file is saved as a .txt.
I use FileWriter f = new FileWriter() to write to the file.
UPDATE:
I actually solved the display problem in Eclipse by changing Eclipse's encoding to UTF-8.
FileWriter is a utility class that uses the current platform's default encoding. That is non-portable, and probably incorrect.
BufferedWriter f = new BufferedWriter(new OutputStreamWriter(
new FileOutputStream(file), StandardCharsets.UTF_8));
f.write("\uFEFF"); // A redundant BOM character may be written to make sure
// the text is read as UTF-8
...

Exporting CSV in french language shows junk characters

I am having a problem in exporting a csv file using au.com.bytecode.opencsv.CSVWriter. I did something like:
File file = File.createTempFile("UserDetails_", ".csv");
CSVWriter writer = new CSVWriter(new OutputStreamWriter(
new FileOutputStream(file), "UTF-8"),
',');
and then, when I export the .csv file, it shows junk characters for French letters. (The data to be saved in the .csv contains French characters.)
But previously I was doing something like:
CSVWriter writer = new CSVWriter(new FileWriter(file));
With that, all French characters displayed perfectly in the Windows environment, but in the Prod environment (Linux) it showed junk. So I thought I would use the UTF-8 character set for the exported file.
How can I get rid of the problem?
Please Suggest!!
Thanks in advance!
Hypothesis: you use Excel to open your CSVs under Windows.
Unfortunately for you, Excel is crap at reading UTF-8. Even though it should not be required, Excel expects to have a byte order mark at the beginning of the CSV if it uses any UTF-* encoding, otherwise it will try and read it using Windows 1252!
Solution? Errr... Don't use Excel?
Anyway, with your old way:
CSVWriter writer = new CSVWriter(new FileWriter(file));
this would use the JVM's default encoding; this is typically windows-1252 under a Western-locale Windows and UTF-8 under Linux.
Note that Apache's commons-io has BOM{Input,Output}Stream classes which may help you here.
Another solution would be (ewwww) to always read/write using Windows-1252.
Other note: if you use Java 7, use the Files.newBuffered{Reader,Writer}() methods -- and the try-with-resources statement.
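As a rough sketch of the last two suggestions combined, the following writes a UTF-8 file that starts with a byte order mark, using Files.newBufferedWriter and try-with-resources. The class name, method name, and sample rows are invented for illustration. Whether a given Excel version then opens the file correctly is not verified here.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class CsvWithBom {

    // Writes the lines as UTF-8, preceded by U+FEFF, which UTF-8 encodes as EF BB BF.
    public static void writeCsvWithBom(Path path, String... lines) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(path, StandardCharsets.UTF_8)) {
            out.write('\uFEFF'); // byte order mark
            for (String line : lines) {
                out.write(line);
                out.write("\r\n");
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("UserDetails_", ".csv");
        // Hypothetical sample rows with French accented letters.
        writeCsvWithBom(path, "nom,ville", "Ren\u00E9e,Orl\u00E9ans");
        byte[] bytes = Files.readAllBytes(path);
        System.out.printf("%02X %02X %02X%n",
                bytes[0] & 0xFF, bytes[1] & 0xFF, bytes[2] & 0xFF); // EF BB BF
        Files.delete(path);
    }
}
```

The resulting BufferedWriter could presumably be handed to CSVWriter in place of the OutputStreamWriter from the question, since CSVWriter is constructed from a Writer.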

Character encoding via JDBC/ODBC/Microsoft Access

I'm connecting to Microsoft Access via JDBC/ODBC successfully. After that, I run a query to select rows from Microsoft Access, and I write the results to a TXT file. Everything is OK, but some strings include accents, and these appear as '?' in the TXT file. I have already tried various ways of writing files in Java, such as PrintWriter, FileWriter, OutputStream, and others, including adding a character encoding parameter (UTF-8 or ISO-8859-1) to some of these methods. I need help finding a way to show these characters correctly. Thanks.
Try the below line,
String OUTPUTFILE = "PATH/TO/FILE/";
BufferedWriter bf = new BufferedWriter(new OutputStreamWriter(
new FileOutputStream(OUTPUTFILE),"UTF8"));
Once you add that to your code, you should be fine using bf.write("VALUE") to write UTF-8 characters to your file. Also make sure to set your text editor's encoding to Unicode or UTF-8; if you don't, it might seem like the whole process didn't work, which would lead to even more confusion.
Edited:
To read UTF-8 text files:
String INPUTFILE = "PATH/TO/File";
BufferedReader in = new BufferedReader(
new InputStreamReader(
new FileInputStream(INPUTFILE), "UTF8"));
Then, to read a line: String str = in.readLine();
