jTable Encoding in Java - java

I am loading a file, reading it line by line, and putting each line in a JTable.
The JTable shows some of my characters as square boxes.
Answers I found online:
1.
It's how the file is opened, so I changed
BufferedReader bf = new BufferedReader(new FileReader(filename));
into
BufferedReader bf = new BufferedReader(new InputStreamReader(new FileInputStream(filename), "UTF8"));
I also tried Charset instead of the string
2.
it's the font, so I tried
jTable1.setFont(new Font("Times New Roman", Font.BOLD, 12));
I tried other fonts, like Arial, David, ...
Can you think of any other reason?
At nachokk's request, here's my code:
int linecount = 0;
String line = null;
BufferedReader bf = new BufferedReader(new InputStreamReader(new FileInputStream(filename), "UTF8"));
DefaultTableModel model = (DefaultTableModel) jTable1.getModel();
while ((line = bf.readLine()) != null) {
linecount++;
model.addRow(new Object[]{filename, linecount, line});
}

You get a box on the screen when you use a font which doesn't contain the necessary glyphs (i.e. it doesn't know how to render certain Unicode character codes).
If Java couldn't read the file (i.e. encoding issues), you should see ? instead.
If you need a quick test, open a text editor that lets you select the font, then use your OS's Unicode input method to enter some Unicode values (or, if you have a file, just try to open it).
Note that some word processors notice when a glyph is missing and they automatically fall back to a font which contains the glyph.
[EDIT]
it gives me a replacement character 65533 at every location
That means you are using the wrong encoding to read the file. Find out what the correct encoding is (ask the people who create the file, for example) and use that instead of UTF-8.
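To make the 65533 symptom concrete, here is a minimal sketch of how a wrong charset produces U+FFFD and how a strict decoder can be used to test candidate encodings. The class name EncodingCheck and the helper decodesCleanly are made up for this illustration; the single byte 0xE9 is just an example input, not data from the question.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class EncodingCheck {

    /** Hypothetical helper: true if the bytes decode without errors in the given charset. */
    public static boolean decodesCleanly(byte[] bytes, Charset charset) {
        try {
            charset.newDecoder()
                   .onMalformedInput(CodingErrorAction.REPORT)      // fail instead of substituting U+FFFD
                   .onUnmappableCharacter(CodingErrorAction.REPORT)
                   .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "é" stored as ISO-8859-1 is the single byte 0xE9, which is not valid UTF-8 on its own.
        byte[] latin1 = "\u00e9".getBytes(StandardCharsets.ISO_8859_1);

        // Lenient decoding (what InputStreamReader does by default) substitutes U+FFFD.
        String lenient = new String(latin1, StandardCharsets.UTF_8);
        System.out.println((int) lenient.charAt(0));                             // prints 65533

        System.out.println(decodesCleanly(latin1, StandardCharsets.UTF_8));      // prints false
        System.out.println(decodesCleanly(latin1, StandardCharsets.ISO_8859_1)); // prints true
    }
}
```

A clean strict decode does not prove the charset is the right one (ISO-8859-1 accepts any byte sequence), but a failure does rule a candidate out.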

Related

Java: UTF8 encoding is not displayed correctly in JTextArea

I am trying to display the content of a txt or docx file in a JTextArea, but the text area does not correctly display Armenian or Russian text. Specifying UTF-8 encoding in the InputStreamReader does not help:
public class TextReader {
public static String getText(File textFile) throws IOException {
FileInputStream fis = new FileInputStream(textFile);
InputStreamReader isr = new InputStreamReader(fis, "UTF8");
BufferedReader br = new BufferedReader(isr);
StringBuilder text = new StringBuilder();
String c;
while ((c = br.readLine()) != null)
text.append(c + "\n");
fis.close();
isr.close();
br.close();
return String.valueOf(text);
}
}
I am using this static method in another class in JTextArea:
String text = TextReader.getText(currentFile);
textArea.setText(text);
After running and choosing the file, I got random characters. What could be the solution in this case?
Your code seems to be fine. My guess is you are trying to read a docx file.
You can't directly read docx files this way. Use some library like Apache POI.
If you are indeed using a text file, it might be that the application you used to save the file used the wrong encoding. You could try saving some (hard-coded) sample Russian text to a text file using Java itself and reading it back into your JTextArea.
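The round-trip test suggested above could look roughly like this. It is only a sketch: the class name RoundTripCheck, the temporary file, and the sample word are assumptions for illustration, and the Swing part is left out so the check can run on its own.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class RoundTripCheck {

    /** Writes the sample to a temporary file as UTF-8 and reads it back as UTF-8. */
    public static String roundTrip(String sample) {
        try {
            Path tmp = Files.createTempFile("sample", ".txt");
            try {
                Files.write(tmp, sample.getBytes(StandardCharsets.UTF_8));
                return new String(Files.readAllBytes(tmp), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Russian "Привет", written with escapes so the source file's own encoding does not matter.
        String sample = "\u041F\u0440\u0438\u0432\u0435\u0442";
        System.out.println(sample.equals(roundTrip(sample)));
    }
}
```

If the round trip succeeds and the returned string displays correctly after textArea.setText(...), the reading code and the font are fine, and the original file was most likely saved in a different encoding.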

I want to cut and paste paragraphs of txt file to new number of files

consider the content of text file is like:
java is a programming language and a platform.Java is a high level,
robust, secured and object-oriented programming language.
Platform: Any hardware or software environment in which a program
runs, is known as a platform. Since Java has its own runtime
environment (JRE) and API, it is called platform.
Java history is interesting to know. The history of java starts from
Green Team. Java team members (also known as Green Team), initiated a
revolutionary task to develop a language for digital devices such as
set-top boxes, televisions etc.
Now, as you can see, there are three paragraphs. I want to store these three paragraphs in 3 different txt files.
You'll first need to read the file in using a FileReader and store it as a String. You can then use String.split("\n\n") to split it into paragraphs (this will give you an array with 3 elements).
You can then loop through each of those array elements, creating a PrintWriter for each (to write each array element to a separate file.)
Try the following code:
public void readFileParagraphs(String fileName) throws IOException {
BufferedReader br = new BufferedReader(new FileReader(fileName));
try {
StringBuilder sb = new StringBuilder();
String line = br.readLine();
while (line != null) {
sb.append(line);
sb.append("\n");
line = br.readLine();
}
// Split the content of the file into an array of paragraphs
String parags[]= sb.toString().split("\n\n");
//Write every paragraph to a new file
for (int i=0; i<parags.length; i++) {
File file = new File("Paragraph_"+i+".txt");
FileWriter writer = new FileWriter(file, true);
PrintWriter output = new PrintWriter(writer);
output.print(parags[i]);
output.close();
}
} finally {
br.close();
}
}
You have to:
Read the first file text and store it in a String.
Split it using a new line regex to get the paragraphs.
And finally save every result(paragraph) of the split in a new file.
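The three steps above can also be sketched with a slightly more tolerant split that accepts Windows (CRLF) line endings and blank lines containing stray spaces. The class and method names here are hypothetical, and the choice of UTF-8 for the output files is an assumption, not something the question specifies.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class ParagraphSplitter {

    /** Splits text into paragraphs separated by one or more blank lines. */
    public static List<String> splitParagraphs(String text) {
        List<String> paragraphs = new ArrayList<>();
        // Two or more consecutive line breaks (LF or CRLF), allowing spaces or tabs on the blank lines.
        for (String part : text.split("(?:\\r?\\n[ \\t]*){2,}")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                paragraphs.add(trimmed);
            }
        }
        return paragraphs;
    }

    /** Writes each paragraph to Paragraph_<index>.txt inside dir, overwriting existing files. */
    public static void writeParagraphs(List<String> paragraphs, Path dir) {
        try {
            for (int i = 0; i < paragraphs.size(); i++) {
                Path target = dir.resolve("Paragraph_" + i + ".txt");
                Files.write(target, paragraphs.get(i).getBytes(StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Unlike opening a FileWriter in append mode, Files.write replaces the target file by default, so running the program twice does not duplicate the paragraph text.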

Unexpected result when writing a Unicode(UTF-8) text to file

I have a problem when writing Unicode (UTF-8) text to a file with Java.
I want to write some text in another language (Persian) to a file in Java, but I get an unexpected result after running my app.
File file = new File(outputFileName);
FileOutputStream f = new FileOutputStream(outputFileName);
String encoding = "UTF-8";
OutputStreamWriter osw = new OutputStreamWriter(f,encoding);
BufferedWriter bw = new BufferedWriter(osw);
StringBuilder row = new StringBuilder();
row.append("Some text in English language");
// in below code it should be 4 space before علی
row.append(" علی");
// in below code it should be 6 space before علی یاری
row.append(" علی یاری");
bw.write(row.toString());
bw.flush(); bw.close();
How can I solve this problem?
The output is what is expected according to the Unicode bidirectional algorithm. The entire Persian text is rendered right-to-left. If you want the individual words to be laid out left-to-right, then you need to insert a strongly left-to-right character between the two Persian words. There's a special character for this: LEFT TO RIGHT MARK (U+200e). This modification to your code should produce the correct output:
row.append("Some text in English language");
row.append(" علی");
row.append('\u200e');
row.append(" علی یاری");

How to save an HTML page with special chars (UTF-8) to a txt file

I need to write Java code that saves an HTML page to a txt file.
The problem is that the special characters in UTF-8 are broken.
Words like "Hamamélis" are saved as "Hamam�lis".
The code I wrote is listed below:
URLConnection conn;
conn = site.openConnection();
conn.setReadTimeout(10000);
Charset charset = Charset.forName("UTF8");
BufferedReader in = new BufferedReader( new InputStreamReader( conn.getInputStream(), "UTF-8" ) );
buff = in.readLine();
And after:
out = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(Nome), "UTF-8"));
out.write(buff);
out.close();
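Can anyone suggest a solution?
One thing to check is the missing hyphen in "UTF8" on the 4th line of your first piece of code, although Java's Charset.forName accepts "UTF8" as an alias for "UTF-8", so this is unlikely to be the cause. See the Charset documentation.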
Otherwise, code seems correct. But of course we cannot test it directly as we do not have your data.
For comparison, here is a little class I wrote. In a manner similar to your code, this class correctly writes your "Hamamélis" example's accented 'e' as the two octets expected in UTF-8 for a single (non-normalized) character: in hex 'C3' & 'A9'.
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStreamWriter;
import java.io.BufferedWriter;
import java.io.IOException;
public class ReaderWriter {
public static void main(String[] args) {
try {
String content = "Hamamélis. Written: " + new java.util.Date();
File file = new File("some_text.txt");
// Create file if not already existent.
if (!file.exists()) {
file.createNewFile();
}
FileOutputStream fileOutputStream = new FileOutputStream( file );
OutputStreamWriter outputStreamWriter = new OutputStreamWriter( fileOutputStream, "UTF-8" );
BufferedWriter bufferedWriter = new BufferedWriter( outputStreamWriter );
bufferedWriter.write( content );
bufferedWriter.close();
System.out.println("ReaderWriter 'main' method is done. " + new java.util.Date() );
} catch (IOException e) {
e.printStackTrace();
}
}
}
As icktoofay commented, you should dig deeper to discover exactly what octets are involved. Use a hex editor like this "File Viewer" app I found today on the Mac App Store to see the exact octets in your saved file.
If the octets are C3 & A9, then the problem is simply that the text editor you used to look at the file as text used the wrong character encoding. For example, you can open that text file in a web browser, and use its menu commands to re-interpret the file as UTF-8.
If the octets are not C3 & A9, I would go further back to examine the input's octets.
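If no hex editor is at hand, the octets can also be inspected from Java itself. The following is a minimal sketch; the class name HexDump and the helper toHex are invented for this example.

```java
import java.nio.charset.StandardCharsets;

public class HexDump {

    /** Formats bytes as space-separated, two-digit uppercase hex values. */
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF)); // mask to treat the byte as unsigned
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+00E9 (é) encoded as UTF-8 should be the two octets C3 A9.
        System.out.println(toHex("\u00e9".getBytes(StandardCharsets.UTF_8)));      // prints C3 A9
        // The same character in ISO-8859-1 is the single octet E9.
        System.out.println(toHex("\u00e9".getBytes(StandardCharsets.ISO_8859_1))); // prints E9
    }
}
```

To check a saved file, pass the result of java.nio.file.Files.readAllBytes(...) for that file to toHex and look for C3 A9 where the accented letter should be.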
If you do not understand that text files in computers actually contain numbers (not text in the human sense), then take a break from coding to read this entertaining article:
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky

Why do I lose new line characters when I load text from a Java servlet into a JTextPane?

I am trying to load the content of a multi-line text file using a Java servlet.
When I test the servlet in a browser it works fine: the text is loaded with its newline characters.
But when I load it into a string in my Swing application and then call textpane.setText(text); the newlines are gone. I tried many solutions I found on the net, but still can't get it right.
Servlet Code:
Reading text from file (simplified):
File file = new File(path);
StringBuilder data = new StringBuilder();
BufferedReader in = new BufferedReader(new FileReader(file));
String line;
while ((line = in.readLine()) != null) {
data.append(line);
data.append("\n");
}
in.close();
Sending text:
PrintWriter out = response.getWriter();
out.write(text);
Is it some platform issue? The servlet was written and compiled on Linux, but I run it on Windows (on JBoss). The text files are also stored on my machine.
Instead of data.append("\n") use
data.append(System.getProperty("line.separator"));
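On Java 7 and later, System.lineSeparator() returns the same value as the line.separator property. A minimal sketch of the read loop using it follows; the class name LineJoiner and the in-memory list of lines are stand-ins for the servlet's file reading, not part of the original code.

```java
import java.util.Arrays;
import java.util.List;

public class LineJoiner {

    /** Appends the platform line separator after every line, mirroring the servlet's read loop. */
    public static String joinLines(List<String> lines) {
        StringBuilder data = new StringBuilder();
        for (String line : lines) {
            data.append(line);
            data.append(System.lineSeparator()); // "\r\n" on Windows, "\n" on Linux and macOS
        }
        return data.toString();
    }

    public static void main(String[] args) {
        String text = joinLines(Arrays.asList("first line", "second line"));
        System.out.print(text);
    }
}
```

Note that the separator is that of the machine running the code, so text built by a servlet on Linux still contains "\n" when a Windows client receives it.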
