Write a file in UTF-8 using FileWriter (Java)?

I have the following code; however, I want it to write a UTF-8 file so that it handles foreign characters. Is there a way of doing this, for example a parameter I need to pass?
I would really appreciate your help with this. Thanks.
try {
    BufferedReader reader = new BufferedReader(new FileReader("C:/Users/Jess/My Documents/actresses.list"));
    writer = new BufferedWriter(new FileWriter("C:/Users/Jess/My Documents/actressesFormatted.csv"));
    while ((line = reader.readLine()) != null) {
        // If the line starts with a tab then we just want to add a movie
        // using the current actor's name.
        if (line.length() == 0)
            continue;
        else if (line.charAt(0) == '\t') {
            readMovieLine2(0, line, surname.toString(), forename.toString());
        } // Else we've reached a new actor
        else {
            readActorName(line);
        }
    }
} catch (IOException e) {
    e.printStackTrace();
}

Safe Encoding Constructors
Getting Java to properly notify you of encoding errors is tricky. You must use the most verbose and, alas, the least used of the four alternate constructors for each of InputStreamReader and OutputStreamWriter to receive a proper exception on an encoding glitch.
For file I/O, always pass the full encoder object as the second argument to OutputStreamWriter (and the matching newDecoder() result to InputStreamReader):
Charset.forName("UTF-8").newEncoder()
There are other even fancier possibilities, but none of the three simpler constructors work for exception handling. These do:
OutputStreamWriter char_output = new OutputStreamWriter(
    new FileOutputStream("some_output.utf8"),
    Charset.forName("UTF-8").newEncoder()
);
InputStreamReader char_input = new InputStreamReader(
    new FileInputStream("some_input.utf8"),
    Charset.forName("UTF-8").newDecoder()
);
As for running with
$ java -Dfile.encoding=utf8 SomeTrulyRemarkablyLongcLassNameGoeShere
The problem is that this will not use the full encoder argument form for the character streams, so you will again miss encoding problems.
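To make the difference concrete, here is a minimal sketch that writes the same malformed text through both constructor forms. The class name, the in-memory stream, and the sample string (an unpaired surrogate between two letters) are assumptions chosen for illustration; they are not part of the original question.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

final class StrictEncodingDemo {
    // Hypothetical bad input: 'a', an unpaired high surrogate, 'b'.
    static final String BAD = "a\uD800b";

    // Charset overload: the malformed char is silently replaced.
    static String writeLenient() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(bytes, StandardCharsets.UTF_8)) {
            w.write(BAD);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    // CharsetEncoder overload: the same input raises an exception.
    static boolean strictWriteFails() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(bytes, StandardCharsets.UTF_8.newEncoder())) {
            w.write(BAD);
            w.flush();
            return false;
        } catch (CharacterCodingException e) {
            return true; // the encoding problem was reported
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("lenient output: " + writeLenient());
        System.out.println("strict write failed: " + strictWriteFails());
    }
}
```

With the Charset overload the bad character ends up as a replacement byte in the output, while the encoder overload surfaces it as a CharacterCodingException that you can handle.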
Longer Example
Here's a longer example, this one managing a process instead of a file, where we promote two different input byte streams and one output byte stream all to UTF-8 character streams with full exception handling:
// this runs a perl script with UTF-8 STD{IN,OUT,ERR} streams
Process
    slave_process = Runtime.getRuntime().exec("perl -CS script args");
// fetch his stdin byte stream...
OutputStream
    __bytes_into_his_stdin = slave_process.getOutputStream();
// and make a character stream with exceptions on encoding errors
OutputStreamWriter
    chars_into_his_stdin = new OutputStreamWriter(
        __bytes_into_his_stdin,
        /* DO NOT OMIT! */ Charset.forName("UTF-8").newEncoder()
    );
// fetch his stdout byte stream...
InputStream
    __bytes_from_his_stdout = slave_process.getInputStream();
// and make a character stream with exceptions on encoding errors
InputStreamReader
    chars_from_his_stdout = new InputStreamReader(
        __bytes_from_his_stdout,
        /* DO NOT OMIT! */ Charset.forName("UTF-8").newDecoder()
    );
// fetch his stderr byte stream...
InputStream
    __bytes_from_his_stderr = slave_process.getErrorStream();
// and make a character stream with exceptions on encoding errors
InputStreamReader
    chars_from_his_stderr = new InputStreamReader(
        __bytes_from_his_stderr,
        /* DO NOT OMIT! */ Charset.forName("UTF-8").newDecoder()
    );
Now you have three character streams that all raise exceptions on encoding errors, respectively called chars_into_his_stdin, chars_from_his_stdout, and chars_from_his_stderr.
This is only slightly more complicated than what you need for your problem, whose solution I gave in the first half of this answer. The key point is that this is the only way to detect encoding errors.
Just don’t get me started about PrintStreams eating exceptions.

Ditch FileWriter and FileReader, which are useless exactly because they do not allow you to specify the encoding. Instead, use
new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_8)
and
new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8);
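Wrapped in buffered streams, as in the question, this could look like the sketch below. The class name, helper names, and the temporary file are assumptions made for the example.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class Utf8StreamDemo {
    // Replacement for new BufferedReader(new FileReader(file)).
    static BufferedReader openReader(File file) throws IOException {
        return new BufferedReader(
                new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8));
    }

    // Replacement for new BufferedWriter(new FileWriter(file)).
    static BufferedWriter openWriter(File file) throws IOException {
        return new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_8));
    }

    // Writes one line to a temporary file and reads it back, both as UTF-8.
    static String roundTrip(String line) {
        try {
            File file = File.createTempFile("utf8-demo", ".txt");
            file.deleteOnExit();
            try (BufferedWriter writer = openWriter(file)) {
                writer.write(line);
                writer.newLine();
            }
            try (BufferedReader reader = openReader(file)) {
                return reader.readLine();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("Zo\u00EB Salda\u00F1a"));
    }
}
```

Because both sides name the charset explicitly, the result no longer depends on the platform default.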

You need to use the OutputStreamWriter class as the writer parameter for your BufferedWriter. It does accept an encoding. Review javadocs for it.
Somewhat like this:
BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
new FileOutputStream("jedis.txt"), "UTF-8"
));
Or you can set the current system encoding with the system property file.encoding to UTF-8.
java -Dfile.encoding=UTF-8 com.jediacademy.Runner arg1 arg2 ...
You may also try setting it as a system property at runtime with System.setProperty(...) if you only need it for this specific file, but the JVM usually caches the default charset at startup, so this is unreliable; in a case like this I would prefer the OutputStreamWriter.
By setting the system property on the command line you can use FileWriter and expect it to use UTF-8 as the default encoding, in this case for all the files that you read and write.
EDIT
Since Java 7 (Android API level 19), you can replace the String "UTF-8" with StandardCharsets.UTF_8
As suggested in the comments below by tchrist, if you intend to detect encoding errors in your file you would be forced to use the OutputStreamWriter approach and use the constructor that receives a charset encoder.
Somewhat like
CharsetEncoder encoder = Charset.forName("UTF-8").newEncoder();
encoder.onMalformedInput(CodingErrorAction.REPORT);
encoder.onUnmappableCharacter(CodingErrorAction.REPORT);
BufferedWriter out = new BufferedWriter(new OutputStreamWriter(new FileOutputStream("jedis.txt"),encoder));
You may choose between actions IGNORE | REPLACE | REPORT
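As a rough illustration of the three actions, the sketch below encodes a string containing a character that the target charset cannot represent. It deliberately uses US-ASCII instead of UTF-8, since almost nothing is unmappable in UTF-8; the class and method names are made up for the example.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class CodingActionDemo {
    // Encodes text as US-ASCII with the given action and returns what
    // ended up in the bytes, or null if the encoder reported an error.
    static String encodeAscii(String text, CodingErrorAction action) {
        CharsetEncoder encoder = StandardCharsets.US_ASCII.newEncoder()
                .onMalformedInput(action)
                .onUnmappableCharacter(action);
        try {
            ByteBuffer encoded = encoder.encode(CharBuffer.wrap(text));
            byte[] bytes = new byte[encoded.remaining()];
            encoded.get(bytes);
            return new String(bytes, StandardCharsets.US_ASCII);
        } catch (CharacterCodingException e) {
            return null; // REPORT: the caller gets to see the problem
        }
    }

    public static void main(String[] args) {
        String text = "caf\u00E9"; // the last char is not ASCII
        System.out.println("IGNORE:  " + encodeAscii(text, CodingErrorAction.IGNORE));
        System.out.println("REPLACE: " + encodeAscii(text, CodingErrorAction.REPLACE));
        System.out.println("REPORT:  " + encodeAscii(text, CodingErrorAction.REPORT));
    }
}
```

IGNORE drops the offending character, REPLACE substitutes the encoder's replacement byte, and REPORT is the only one that tells you something went wrong.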
Also, this question was already answered here.

Since Java 11 you can do:
FileWriter fw = new FileWriter("filename.txt", Charset.forName("utf-8"));
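A small sketch of that constructor in use, assuming Java 11 or later. The class name and the temporary file are placeholders; the method returns the raw bytes so you can check that they really are UTF-8.

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class FileWriterCharsetDemo {
    // Writes text through FileWriter(File, Charset) and returns the bytes on disk.
    static byte[] writeWithFileWriter(String text) {
        try {
            Path path = Files.createTempFile("filewriter-demo", ".txt");
            try (Writer writer = new BufferedWriter(
                    new FileWriter(path.toFile(), StandardCharsets.UTF_8))) {
                writer.write(text);
            }
            byte[] bytes = Files.readAllBytes(path);
            Files.delete(path);
            return bytes;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // U+00E9 takes two bytes in UTF-8.
        System.out.println(writeWithFileWriter("\u00E9").length + " byte(s) written");
    }
}
```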

Since Java 7 there is an easy way to handle the character encoding of BufferedWriters and BufferedReaders. You can create a BufferedWriter directly with the Files class instead of chaining various Writer instances yourself. You can create a BufferedWriter that uses a specific character encoding by calling:
Files.newBufferedWriter(file.toPath(), StandardCharsets.UTF_8);
You can find more about it in JavaDoc:
Files class
Files#newBufferedWriter
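A possible round trip with the Files helpers is sketched below, using try-with-resources so both ends are closed. The class name, the temporary file, and the sample lines are assumptions for illustration only.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class NioUtf8Demo {
    // Writes the lines as UTF-8 and reads them back with the same charset.
    static List<String> roundTrip(List<String> lines) {
        try {
            Path path = Files.createTempFile("nio-utf8-demo", ".csv");
            try (BufferedWriter writer = Files.newBufferedWriter(path, StandardCharsets.UTF_8)) {
                for (String line : lines) {
                    writer.write(line);
                    writer.newLine();
                }
            }
            List<String> result = new ArrayList<>();
            try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
                String line;
                while ((line = reader.readLine()) != null) {
                    result.add(line);
                }
            }
            Files.delete(path);
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(Arrays.asList("Zo\u00EB,Salda\u00F1a", "\u5317\u4EAC,film")));
    }
}
```

As far as I know, these Files helpers create their encoder and decoder with the default REPORT behaviour, so malformed input tends to surface as an exception rather than being replaced silently.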

With Chinese text, I tried the UTF-16 charset and luckily it worked.
Hope this helps!
PrintWriter out = new PrintWriter( file, "UTF-16" );

OK, it's 2019 now, and since Java 11 FileWriter has a constructor that takes a Charset:
FileWriter(String fileName, Charset charset)
Unfortunately, we still cannot modify the byte buffer size, which is set to 8192. (https://www.baeldung.com/java-filewriter)

Use OutputStreamWriter instead of FileWriter to set the encoding:
// file is the File object you want to write your data to
OutputStream outputStream = new FileOutputStream(file);
OutputStreamWriter outputStreamWriter = new OutputStreamWriter(outputStream, "UTF-8");
outputStreamWriter.write(json); // json is your data
outputStreamWriter.flush();
outputStreamWriter.close();

In my opinion, if you want to write UTF-8 you can build the byte array yourself, naming the charset explicitly:
byte[] by = ("<?xml version=\"1.0\" encoding=\"utf-8\"?>" + "Your string").getBytes(StandardCharsets.UTF_8);
Then you can write the bytes into the file you created.
Example:
OutputStream f = new FileOutputStream(xmlfile);
byte[] by = ("<?xml version=\"1.0\" encoding=\"utf-8\"?>" + "Your string").getBytes(StandardCharsets.UTF_8);
// write the whole array in one call instead of byte by byte
f.write(by);
f.close();

Related

Java application does not works as jar [duplicate]

I have a program developed in Java with NetBeans. It has a text pane that takes text written in a non-English language and performs some operations on it, including save, open, new, and so on.
The program worked flawlessly when I ran it from NetBeans. But when I go to the dist folder and run the jar (which is supposed to be the executable), it runs fine until I open a previously saved file in the editor, which then shows garbled characters.
For example:
লিখ "The original inputs are" << নতুন_লাইন;
চলবে(সংখ্যা প=০;প<যতটা;প++)
becomes
লিখ "The original inputs are" << নত�ন_লাইন;
চলবে(সংখ�যা প=০;প<যতটা;প++)
One more interesting thing: if I type in the editor, it also works fine (no font problem).
I am using these two functions to read from and write to the file:
public void writeToFile(String data, String address)
{
    try {
        // Create file
        FileWriter fstream = new FileWriter(address);
        BufferedWriter out = new BufferedWriter(fstream);
        out.write(data);
        // Close the output stream
        out.close();
    } catch (Exception e) { // Catch exception if any
        System.err.println("Error: " + e.getMessage());
    }
}
public String readFromFile(String fileName) {
    String output = "";
    try {
        File file = new File(fileName);
        FileReader reader = new FileReader(file);
        BufferedReader in = new BufferedReader(reader);
        String string;
        while ((string = in.readLine()) != null) {
            output = output + string + "\n";
        }
        in.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
    return output;
}
I have set the font of the text pane to Vrinda, which works from within the IDE as I mentioned.
Please help me identify what is wrong.
Is there something I need to do when publishing the JAR if native language support is required?
Try changing your reading logic to use InputStreamReader which allows setting encoding:
InputStreamReader inputStreamReader =
new InputStreamReader(new FileInputStream (file), "UTF-8" );
Also change your writing logic to use OutputStreamWriter which allows setting encoding:
OutputStreamWriter outputStreamWriter =
new OutputStreamWriter(new FileOutputStream (file), "UTF-8" );
The root problem is that your current application is reading the file using the "platform default" character set / character encoding. This is obviously different when you are running from the command line and from NetBeans. In the former case, it depends on the locale settings of the host OS or the current shell ... depending on your platform. In NetBeans, it seems to default to UTF-8.
@Andrey Adamovich's answer explains how to specify a character encoding when opening a file using a file reader or adapting a byte stream using an input stream reader.
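The effect of a mismatched default can be reproduced without any files. The sketch below encodes a string with one charset and decodes the bytes with another; the class name and the choice of ISO-8859-1 as the "wrong" charset are assumptions, since the actual default on the asker's machine is unknown.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class DefaultCharsetDemo {
    // Simulates writing text with one charset and reading it back with another.
    static String writeThenRead(String text, Charset writeCharset, Charset readCharset) {
        byte[] onDisk = text.getBytes(writeCharset);
        return new String(onDisk, readCharset);
    }

    public static void main(String[] args) {
        // This is what FileReader and FileWriter fall back to when no charset is given.
        System.out.println("platform default: " + Charset.defaultCharset());

        String bengali = "\u09A8\u09A4\u09C1\u09A8"; // a short Bengali word
        String same = writeThenRead(bengali, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        String mixed = writeThenRead(bengali, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println("same charset survives: " + bengali.equals(same));
        System.out.println("mixed charsets survive: " + bengali.equals(mixed));
    }
}
```

Printing Charset.defaultCharset() from both the IDE and the jar is a quick way to confirm whether the two environments really differ.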

Not working I/O method

I'm pretty new to Java, so I do not really see what I am doing wrong in the following method:
public void writeWNDFile() {
    String strFilePath = "C:/Users/fperrone/Desktop/ddd.txt";
    try
    {
        // create FileOutputStream object
        FileOutputStream fos = new FileOutputStream(strFilePath);
        DataOutputStream dos = new DataOutputStream(fos);
        dos.writeDouble(12);
        dos.close();
    }
    catch (IOException e)
    {
        System.out.println("IOException : " + e);
    }
}
The file is actually generated, but instead of the expected 12 it contains @(, which is probably the ASCII representation of the bytes.
Could you shed some light on this?
EDIT
Is there a Java function that behaves like the MATLAB fwrite function? I actually want to write a binary file. In MATLAB I am simply calling:
fwrite(filename, A, precision)
How could I achieve the same in Java?
DataOutputStream.writeDouble and the other methods of DataOutputStream are designed to write numbers in binary format. If you want your data to be saved in text format, use FileWriter and its write(String) method. You can convert a double to a String with Double.toString(double).
From the writeDouble Javadoc:
Converts the double argument to a long using the doubleToLongBits
method in class Double, and then writes that long value to the
underlying output stream as an 8-byte quantity, high byte first.
Since DataOutputStream writes in binary format, that is what you are seeing. You need not worry if you are going to read the file again using DataInputStream and its readDouble method: it should give you the right values.
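To see what actually lands in the file, here is a small sketch that writes a double to an in-memory stream and reads it back. The class and method names are placeholders for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class BinaryDoubleDemo {
    // Returns the 8 bytes that writeDouble produces for the value.
    static byte[] toBytes(double value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeDouble(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the value back with the matching readDouble call.
    static double fromBytes(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return in.readDouble();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = toBytes(12);
        StringBuilder hex = new StringBuilder();
        for (byte b : data) {
            hex.append(String.format("%02X ", b));
        }
        System.out.println("bytes: " + hex.toString().trim());
        System.out.println("read back: " + fromBytes(data));
    }
}
```

The first two bytes of the encoded 12.0 are 0x40 and 0x28, which a text editor displays as "@(", followed by six zero bytes. That matches a binary file being opened as text rather than a failed write.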
//PrintWriter out = new PrintWriter(new BufferedWriter(
// new OutputStreamWriter(new FileOutputStream(strFilePath) /*, "UTF-8"*/)));
PrintWriter out = new PrintWriter(strFilePath /*, "UTF-8"*/);
out.println(12);
out.close();
Add the encoding, here UTF-8, when you want the application to write with the same encoding everywhere. Otherwise the default platform encoding is used.
PrintWriter's println adds a newline; print does not.

BufferedReader to BufferedWriter

How can I obtain a BufferedWriter from a BufferedReader?
I'd like to be able to do something like this:
BufferedReader read = new BufferedReader(new InputStreamReader(...));
BufferedWriter write = new BufferedWriter(read);
You can use the following from Apache commons io:
IOUtils.copy(reader, writer);
Java 10 update
Since Java 10, Reader provides a method called transferTo with the following signature:
public long transferTo(Writer out) throws IOException
As the documentation states, transferTo will:
Reads all characters from this reader and writes the characters to the given writer in the order that they are read. On return, this reader will be at end of the stream. This method does not close either reader or writer.
This method may block indefinitely reading from the reader, or writing to the writer. The behavior for the case where the reader and/or writer is asynchronously closed , or the thread interrupted during the transfer, is highly reader and writer specific, and therefore not specified.
If an I/O error occurs reading from the reader or writing to the writer, then it may do so after some characters have been read or written. Consequently the reader may not be at end of the stream and one, or both, streams may be in an inconsistent state. It is strongly recommended that both streams be promptly closed if an I/O error occurs.
So in order to write contents of a Java Reader to a Writer, you can write:
reader.transferTo(writer);
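A self-contained sketch of Reader.transferTo, using in-memory readers and writers so it can run anywhere. It assumes Java 10 or later; the class name and sample text are made up.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;

final class TransferToDemo {
    // Copies everything from a reader into a writer and returns the result.
    static String copy(String text) {
        StringWriter writer = new StringWriter();
        try (Reader reader = new StringReader(text)) {
            long copied = reader.transferTo(writer); // number of chars transferred
            if (copied != text.length()) {
                throw new IllegalStateException("unexpected count: " + copied);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return writer.toString();
    }

    public static void main(String[] args) {
        System.out.println(copy("first line\nsecond line"));
    }
}
```

The same call works between a BufferedReader and a BufferedWriter; since transferTo closes neither side, closing them is still up to you.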
If you want to know what happens:
All input from the reader is copied to the writer.
Something similar to:
private final void copyInputStream(InputStreamReader in, OutputStreamWriter out) throws IOException
{
    char[] buffer = new char[1024];
    int len;
    while ((len = in.read(buffer)) >= 0)
    {
        out.write(buffer, 0, len);
    }
}
More on input and output on The Really big Index
The BufferedWriter constructor is not overloaded to accept readers, right? What Buhb said is correct.
BufferedWriter writer = new BufferedWriter(
new FileWriter("filename_towrite"));
IOUtils.copy(new InputStreamReader(new FileInputStream("filename_toread")), writer);
writer.flush();
writer.close();
You could use Piped Read/Writers (link). This is exactly what they're designed for. Not sure you could retcon them onto an existing buffered reader you got passed tho'. You'd have to construct the buf reader yourself around it deliberately.
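A rough sketch of the piped approach: one thread writes into a PipedWriter while the caller reads from the connected PipedReader. The class name, the single-line payload, and the threading setup are assumptions for the example.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;
import java.io.UncheckedIOException;

final class PipedDemo {
    // Sends one line through a pipe from a producer thread to the caller.
    static String pipe(String text) {
        PipedWriter pipeIn = new PipedWriter();
        try (BufferedReader reader = new BufferedReader(new PipedReader(pipeIn))) {
            Thread producer = new Thread(() -> {
                // Closing the writer signals end of stream to the reader.
                try (BufferedWriter out = new BufferedWriter(pipeIn)) {
                    out.write(text);
                    out.newLine();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            producer.start();
            String line = reader.readLine();
            producer.join();
            return line;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(pipe("hello through the pipe"));
    }
}
```

Reader and writer should live on different threads; using both ends from a single thread risks blocking once the pipe's internal buffer fills up.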

DataInputStream and readLine() with UTF8

I've got some trouble sending a UTF8 string from a C socket to a Java socket.
The following method works fine:
BufferedReader in = new BufferedReader(new InputStreamReader(socket.getInputStream(), "UTF8"));
main.title = in.readLine();
but then I need an int java.io.InputStream.read(byte[] b, int offset, int length) method, which does not exist on BufferedReader. So I tried a DataInputStream instead:
DataInputStream in2 = new DataInputStream(socket.getInputStream());
but everything it reads is just rubbish.
Then I tried to use the readLine() method from DataInputStream but this doesn't give me the correct UTF8 string.
You see my dilemma. Can't I use two readers for one InputStream? Or can I take the DataInputStream.readLine() result and convert it to UTF8?
Thanks,
Martin
We know from the design of the UTF-8 encoding that the only usage of the value 0x0A is the LINE FEED ('\n'). Therefore, you can read until you hit it:
/** Reads UTF-8 character data; lines are terminated with '\n' */
public static String readLine(InputStream in) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    while (true) {
        int b = in.read();
        if (b < 0) {
            throw new IOException("Data truncated");
        }
        if (b == 0x0A) {
            break;
        }
        buffer.write(b);
    }
    return new String(buffer.toByteArray(), "UTF-8");
}
I am making the assumption that your protocol uses \n as a line terminator. If it doesn't - well, it is generally useful to point out the constraints you're writing to.
Do NOT use BufferedReader and DataInputStream on the same InputStream!! I did that and spent days trying to figure out why my code broke. BufferedReader can read more from the stream into its buffer than what you extract from it, so the data I was supposed to read with the DataInputStream was already "in the BufferedReader". This resulted in lost data, which caused my program to "hang" waiting for it to arrive.
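The over-reading is easy to observe with an in-memory stream. In this sketch (class name and payload invented for the example), a single readLine() call leaves nothing in the underlying stream for anyone else to read.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class OverReadDemo {
    // Reads one text line via BufferedReader, then reports how many bytes
    // are still available on the raw stream for a second consumer.
    static int bytesLeftAfterReadLine() {
        byte[] data = "title\nBINARY-PAYLOAD".getBytes(StandardCharsets.UTF_8);
        InputStream raw = new ByteArrayInputStream(data);
        try {
            BufferedReader reader =
                    new BufferedReader(new InputStreamReader(raw, StandardCharsets.UTF_8));
            reader.readLine(); // only "title" is handed to the caller
            return raw.available(); // the payload was pulled into the reader's buffer
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("bytes left on raw stream: " + bytesLeftAfterReadLine());
    }
}
```

On a socket the amount pulled in depends on timing, which is why this kind of bug tends to show up only intermittently.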
I believe that you should not mix BufferedReader and DataInputStream here. DataInputStream has readLine() too, so use it.
And yet another comment. I am not sure it is a problem but avoid multiple calls of socket.getInputStream(). Do it once and then wrap it as you want using other streams and readers.
Am I understanding it correctly that you are sending both text and binary data on the same socket, in the same "conversation"? There should be no problem creating two readers for the same input stream. The problem is knowing when (and how much) to read from which reader. They will both consume (and advance) the underlying stream when you read from them, since you have mixed types of data. You could just read the stream as bytes and then convert the bytes explicitly in your code (new String(bytes, "UTF-8") etc). Or you could split your communication onto two different sockets.
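One way to read the stream as bytes and convert explicitly is sketched below: a single DataInputStream handles both the text line and the binary part. The message layout (a UTF-8 line ending in '\n', a 4-byte length, then the payload) is a hypothetical protocol invented for the example, as are the class and method names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class MixedStreamDemo {
    // Reads raw bytes up to '\n' (or end of stream) and decodes them as UTF-8.
    static String readUtf8Line(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            buffer.write(b);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    // Builds a sample message and parses it back through one stream.
    static String demo() {
        try {
            ByteArrayOutputStream message = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(message);
            out.write("Caf\u00E9\n".getBytes(StandardCharsets.UTF_8)); // text part
            out.writeInt(3);                                           // payload length
            out.write(new byte[] {1, 2, 3});                           // binary part
            out.flush();

            DataInputStream in =
                    new DataInputStream(new ByteArrayInputStream(message.toByteArray()));
            String title = readUtf8Line(in);
            byte[] payload = new byte[in.readInt()];
            in.readFully(payload);
            return title + ":" + Arrays.toString(payload);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Since nothing here buffers ahead of what it returns, the binary payload is still on the stream when the text line has been consumed.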

