I have a problem writing Unicode (UTF-8) text to a file in Java.
I want to write some text in another language (Persian) to a file, but I get an unexpected result when I run my app.
File file = new File(outputFileName);
FileOutputStream f = new FileOutputStream(outputFileName);
String encoding = "UTF-8";
OutputStreamWriter osw = new OutputStreamWriter(f,encoding);
BufferedWriter bw = new BufferedWriter(osw);
StringBuilder row = new StringBuilder();
row.append("Some text in English language");
// there should be 4 spaces before علی in the line below
row.append("    علی");
// there should be 6 spaces before علی یاری in the line below
row.append("      علی یاری");
bw.write(row.toString());
bw.flush(); bw.close();
How can I solve this problem?
The output is what is expected according to the Unicode bidirectional algorithm. The entire Persian text is rendered right-to-left. If you want the individual words to be laid out left-to-right, then you need to insert a strongly left-to-right character between the two Persian words. There's a special character for this: LEFT-TO-RIGHT MARK (U+200E). This modification to your code should produce the correct output:
row.append("Some text in English language");
row.append(" علی");
row.append('\u200e');
row.append(" علی یاری");
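For completeness, here is a minimal, self-contained sketch of the whole write path with the mark inserted; the file name and class name are placeholders, not part of the original question:
import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

public class PersianWriteDemo {
    public static void main(String[] args) throws Exception {
        String outputFileName = "output.txt";  // hypothetical file name, used only for this sketch
        try (BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(
                new FileOutputStream(outputFileName), StandardCharsets.UTF_8))) {
            StringBuilder row = new StringBuilder();
            row.append("Some text in English language");
            row.append("    علی");
            row.append('\u200e');           // LEFT-TO-RIGHT MARK keeps the words laid out left to right
            row.append("      علی یاری");
            bw.write(row.toString());
        } // try-with-resources flushes and closes the writer
    }
}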
Related
I don't know why PrintWriter breaks the line when it sees this symbol '##'.
I want to write these lines to a .txt file.
But I got this instead; you can see it has some unexpected rows:
Part of my code:
try (OutputStreamWriter fw = new OutputStreamWriter(
        new FileOutputStream(filePath + "\\" + file_number + "_" + info.getFileName(), true),
        StandardCharsets.UTF_8);
     BufferedWriter bw = new BufferedWriter(fw);
     PrintWriter out = new PrintWriter(bw, false)) {
    JCoTable rows = function6.getTableParameterList().getTable("DATA");
    for (int i = 0; i < rows.getNumRows(); i++) {
        rows.setRow(i);
        out.write(Integer.toString(CURRENT_ROW) + ((char) Integer.parseInt(info.getFileDelimited())) + rows.getString("WA") + "\n");
    }
}
The ABAP table contains 6 rows and your output contains 6 rows, so I guess by "additional rows" you mean the 2 extra line breaks in rows no. 2 and 6.
I assume this is because these line breaks are part of the text in those rows. The SAP GUI doesn't display these control characters and shows them as '##' instead, but the Java code writes them out, of course. I guess the 2 characters shown as '##' are actually the control characters LINE FEED (U+000A) and CARRIAGE RETURN (U+000D).
You can check their real character codes in the ABAP debugger's hexadecimal view or in your Java development environment's debugger.
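If the goal is one output line per table row, one option is to strip or replace those control characters before writing; a minimal sketch, assuming the field comes back as a plain String (the sample value below is made up):
// Replace embedded CR/LF control characters with a space before writing the row
String wa = "first part\r\nsecond part";          // hypothetical field content with an embedded line break
String cleaned = wa.replaceAll("[\\r\\n]+", " "); // yields "first part second part"
out.write(cleaned + "\n");                        // 'out' is the PrintWriter from the snippet above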
I have a file with text in it, encoded in UTF-8. I want to convert the content of that file to UTF-16, but I want to keep the special characters. How can I do this?
My attempt:
reader = new BufferedReader(new InputStreamReader(
new FileInputStream(file), StandardCharsets.UTF_8));
// ...
// read content from file
// ...
writer = new PrintWriter(new OutputStreamWriter(
new FileOutputStream(file), StandardCharsets.UTF_16));
// ...
// write content to file
// ...
However, the special characters are lost with this approach: "ÜÖIJ³§`´" came out as "������`�". I also tried replacing the characters in the Java String, but the characters are already malformed by the time they have been read. How can this be done?
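For reference, a minimal sketch of such a conversion, assuming the input really is UTF-8 and that the result goes to a separate output file (opening a FileOutputStream on the same File truncates it before the content is read back). If characters like "ÜÖIJ³§" still come out as "�", the source is more likely in a legacy encoding such as ISO-8859-1, and the reader charset has to match that instead; the file names below are placeholders:
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Utf8ToUtf16 {
    public static void main(String[] args) throws IOException {
        try (BufferedReader in = Files.newBufferedReader(Paths.get("input.txt"), StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(Paths.get("output.txt"), StandardCharsets.UTF_16)) {
            // Copy character by character; the charsets handle the byte-level conversion
            int c;
            while ((c = in.read()) != -1) {
                out.write(c);
            }
        }
    }
}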
I am trying to read an xls file in Java and convert it to CSV. The problem is that it contains Greek characters. I have tried various methods with no success.
br = new BufferedReader(new InputStreamReader(
new FileInputStream(saveDir+"/"+fileName+".xls"), "UTF-8"));
FileWriter writer1 = new FileWriter(saveDir+"/A"+fileName+".csv");
byte[] bytes = thisLine.getBytes("UTF-8");
writer1.append(new String(bytes, "UTF-8"));
I used that with different encodings, like UTF-16 and windows-1253, and of course without using the byte array. None worked. Any ideas?
Use "ISO-8859-7" instead of "UTF-8". It is for latin and greek. See documentation
InputStream in = new BufferedInputStream(new FileInputStream(new File(myfile)));
result = new Scanner(in,"ISO-8859-7").useDelimiter("\\A").next();
A Byte Order Mark (BOM) should be written at the start of the CSV file.
Can you try this code?
PrintWriter writer1 = new PrintWriter(saveDir+"/A"+fileName+".csv");
writer1.print('\ufeff');
....
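Combining the two suggestions, a minimal sketch of the whole round trip, assuming the input is actually a plain-text export (a real binary .xls needs a library such as Apache POI) and using placeholder file names: the input is read as ISO-8859-7 and the CSV is written as UTF-8 with a leading BOM so spreadsheet programs detect the encoding:
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class GreekXlsToCsv {
    public static void main(String[] args) throws IOException {
        try (BufferedReader br = new BufferedReader(new InputStreamReader(
                     new FileInputStream("report.xls"), Charset.forName("ISO-8859-7")));
             PrintWriter writer1 = new PrintWriter(new OutputStreamWriter(
                     new FileOutputStream("report.csv"), StandardCharsets.UTF_8))) {
            writer1.print('\ufeff');               // BOM so Excel detects UTF-8
            String thisLine;
            while ((thisLine = br.readLine()) != null) {
                writer1.println(thisLine);         // real code would also convert cell separators
            }
        }
    }
}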
I am loading a file, getting a line from it, and putting it in a JTable.
The JTable shows some of my characters as square boxes.
answers I found online:
1.
it's how the file is opened, so I changed
BufferedReader bf = new BufferedReader(new FileReader(filename));
into
BufferedReader bf = new BufferedReader(new InputStreamReader(new FileInputStream(filename), "UTF8"));
I also tried Charset instead of the string
2.
it's the font, so I tried
jTable1.setFont(new Font("Times New Roman", Font.BOLD, 12));
I tried other fonts, like Arial, David, ...
Can you think of any other reason??
At nachokk's request, here's my code:
int linecount = 0;
String line = null;
BufferedReader bf = new BufferedReader(new InputStreamReader(new FileInputStream(filename), "UTF8"));
DefaultTableModel model = (DefaultTableModel) jTable1.getModel();
while ((line = bf.readLine()) != null) {
    linecount++;
    model.addRow(new Object[]{filename, linecount, line});
}
You get a box on the screen when you use a font which doesn't contain the necessary glyphs (i.e. it doesn't know how to render certain Unicode character codes).
If Java couldn't read the file (i.e. encoding issues), you should see ? instead.
If you need a quick test, open a text editor that lets you select the font and use your OS's Unicode input method to enter some Unicode values (or, if you have a file, just try to open it).
Note that some word processors notice when a glyph is missing and automatically fall back to a font that contains it.
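To check this from the Java side, Font.canDisplayUpTo() reports whether a font has glyphs for every character in a string; a short sketch (the sample line is just an illustration):
import java.awt.Font;

public class GlyphCheck {
    public static void main(String[] args) {
        Font font = new Font("Times New Roman", Font.BOLD, 12);
        String line = "some text read from the file";   // hypothetical sample line
        int firstMissing = font.canDisplayUpTo(line);    // -1 means every character has a glyph
        if (firstMissing == -1) {
            System.out.println("Font can display every character; the boxes come from somewhere else");
        } else {
            System.out.println("First character without a glyph: U+"
                    + Integer.toHexString(line.charAt(firstMissing)).toUpperCase());
        }
    }
}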
[EDIT]
it gives me a replacement character 65533 at every location
That means you are using the wrong encoding to read the file. Find out what the correct encoding is (ask the people who created the file, for example) and use that instead of UTF-8.
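One way to surface such mismatches instead of silently getting U+FFFD is to read with a CharsetDecoder that reports malformed input; a minimal sketch, using a placeholder file name:
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;

public class EncodingProbe {
    public static void main(String[] args) throws IOException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)        // throw instead of substituting U+FFFD
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try (BufferedReader bf = new BufferedReader(new InputStreamReader(
                new FileInputStream("input.txt"), decoder))) {     // hypothetical file name
            while (bf.readLine() != null) {
                // just read through the whole file
            }
            System.out.println("File decodes cleanly as UTF-8");
        } catch (MalformedInputException e) {
            System.out.println("File is not valid UTF-8: " + e.getMessage());
        }
    }
}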
I have a file (file.txt), and I need to empty its current content and then append some text to it multiple times.
Example: file.txt current content is:
aaa
bbb
ccc
I want to remove this content, and then to append the first time:
ddd
The second time:
eee
And so on...
I tried this:
// empty the current content
fileOut = new FileWriter("file.txt");
fileOut.write("");
fileOut.close();
// append
fileOut = new FileWriter("file.txt", true);
// when I want to write something I just do this multiple times:
fileOut.write("text");
fileOut.flush();
This works fine, but it seems inefficient because I open the file twice just to remove the current content.
When you open up the file to write it with your new text, it will overwrite whatever is in the file already.
A good way to do this is
// empty the current content
fileOut = new FileWriter("file.txt");
fileOut.write("");
fileOut.append("all your text");
fileOut.close();
The first answer is not correct. If you create a new FileWriter with the true flag as the second parameter, it will open in append mode. This causes any write(String) call to append text to the end of the file rather than wipe out whatever is already there.
I'm just stupid.
I only needed to do this:
// empty the current content
fileOut = new FileWriter("file.txt");
// when I want to write something I just do this multiple times:
fileOut.write("text");
fileOut.flush();
And AT THE END close the stream.
I see that this question was answered quite a few Java versions ago...
Starting from Java 1.7, and using FileWriter + BufferedWriter + PrintWriter for appending (as recommended in this SO answer), my suggestion for erasing the file and then appending is:
FileWriter fw = new FileWriter(myFilePath); //this erases previous content
fw = new FileWriter(myFilePath, true); //this reopens file for appending
BufferedWriter bw = new BufferedWriter(fw);
PrintWriter pw = new PrintWriter(bw);
pw.println("text");
//some code ...
pw.println("more text"); //appends more text
pw.flush();
pw.close();
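Note that the first FileWriter above is never closed. A variant of the same idea that closes the truncating writer and handles the rest with try-with-resources (the file path is a placeholder):
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

public class EraseThenAppend {
    public static void main(String[] args) throws IOException {
        String myFilePath = "file.txt";                       // hypothetical path
        new FileWriter(myFilePath).close();                   // opening without the append flag truncates the file
        try (PrintWriter pw = new PrintWriter(new BufferedWriter(
                new FileWriter(myFilePath, true)))) {         // reopen in append mode
            pw.println("text");
            pw.println("more text");                          // appends more text
        } // try-with-resources flushes and closes
    }
}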
The best I could think of is:
Files.newBufferedWriter(pathObject, StandardOpenOption.TRUNCATE_EXISTING);
and
Files.newOutputStream(pathObject, StandardOpenOption.TRUNCATE_EXISTING);
In both cases, if the file specified by pathObject is writable, that file will be truncated. There is no need to call write(); the code above is sufficient to empty/truncate a file. This is new in Java 8.
Hope it helps.
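Tying this back to the original question, a minimal sketch that truncates file.txt once and then keeps appending through a single open writer (the UTF-8 charset is an assumption):
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class TruncateThenAppend {
    public static void main(String[] args) throws IOException {
        try (BufferedWriter fileOut = Files.newBufferedWriter(Paths.get("file.txt"),
                StandardCharsets.UTF_8,
                StandardOpenOption.CREATE,
                StandardOpenOption.TRUNCATE_EXISTING,  // empties any existing content on open
                StandardOpenOption.WRITE)) {
            fileOut.write("ddd");   // first append
            fileOut.flush();
            fileOut.write("eee");   // second append, and so on
            fileOut.flush();
        }
    }
}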