remove Â from WordPad document after writing from Java - java

I am taking a string from Java and placing it into a text file. When the string is written it does not contain Â; however, when the file is opened in WordPad the character appears.
String without:
Notice of Appeal:
 
Hamilton City Board of Education
String with:
Notice of Appeal:
Â
Â
Hamilton City Board of Education
Below is the code that writes the string:
out = new BufferedWriter (new FileWriter(filePrefix + "-body" + ".txt"));
out.write("From: " + em.from);
out.newLine();
out.write("Sent Date: " + em.sentDate);
out.newLine();
out.write("Subject: " + em.subject);
out.newLine();
out.newLine();
out.newLine();
String temp = new String(emi.stringContent.getBytes("UTF-8"), "UTF-8");
out.write(temp);
What should I do so that they do not appear in WordPad?

This looks like a UTF-8 encoding problem to me. I believe you are getting the Â character because you are writing the content in UTF-8, and the content contains a high-ASCII value, but WordPad is expecting the data to be in the code page your local system is running in. Either write the content in the code page expected by WordPad, or make WordPad expect UTF-8.
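A minimal sketch of the first option, assuming WordPad reads the file in the windows-1252 code page (the usual ANSI code page on a Western-locale Windows install); the filePrefix, em and emi names are taken from the question:
import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;
// Encode the body in windows-1252 instead of UTF-8, so a non-breaking space is
// written as the single byte 0xA0 rather than the pair 0xC2 0xA0 that WordPad
// shows as "Â ".
Charset ansi = Charset.forName("windows-1252");
BufferedWriter out = new BufferedWriter(
new OutputStreamWriter(new FileOutputStream(filePrefix + "-body.txt"), ansi));
try {
out.write("From: " + em.from);
out.newLine();
out.write(emi.stringContent);
} finally {
out.close();
}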
As an aside:
String temp = new String(emi.stringContent.getBytes("UTF-8"), "UTF-8");
out.write(temp);
is a complete waste of time; use:
out.write(emi.stringContent);
instead.

I'm assuming this is a line separator issue.
use:
String line = System.getProperty("line.separator");
and just add it to your string wherever you want a new line
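For example (a tiny sketch reusing the out writer and the text from the question):
String line = System.getProperty("line.separator");
out.write("Notice of Appeal:" + line + line + "Hamilton City Board of Education");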

Related

Why does RandomAccessFile read ï»¿ as first character in my UTF-8 text file?

A question on reading text files in Java. I have a text file saved with UTF-8 encoding with only the content:
Hello. World.
Now I am using a RandomAccessFile to read this file. But for some reason, there seems to be an "invisible" character at the beginning of the file...?
I use this code:
File file = new File("resources/texts/books/testfile2.txt");
try(RandomAccessFile reader = new RandomAccessFile(file, "r")) {
String readLine = reader.readLine();
String utf8Line = new String(readLine.getBytes("ISO-8859-1"), "UTF-8" );
System.out.println("Read Line: " + readLine);
System.out.println("Real length: " + readLine.length());
System.out.println("UTF-8 Line: " + utf8Line);
System.out.println("UTF-8 length: " + utf8Line.length());
System.out.println("Current position: " + reader.getFilePointer());
} catch (Exception e) {
e.printStackTrace();
}
The output is this:
Read Line: ï»¿Hello. World.
Real length: 16
UTF-8 Line: ?Hello. World.
UTF-8 length: 14
Current position: 16
These (1 or 2) characters seem to appear only at the very beginning. If I add more lines to the file and read them, then all the further lines are being read normally.
Can someone explain this behavior? What is this character at the beginning?
Thanks!
The first 3 bytes in your file (0xEF, 0xBB, 0xBF) are the so-called UTF-8 BOM (Byte Order Mark). The BOM only matters for UTF-16 and UTF-32; for UTF-8 it has no meaning. Microsoft introduced it to make guessing a file's encoding easier.
That is, not all UTF-8 encoded text files have that mark, but some do.
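If you want to keep using RandomAccessFile, one possible workaround (a sketch, not the only option) is to peek at the first three bytes and skip them when they are the UTF-8 BOM. Note that RandomAccessFile.readLine() still does not decode UTF-8, so this only helps for ASCII content like the example:
import java.io.File;
import java.io.RandomAccessFile;
File file = new File("resources/texts/books/testfile2.txt");
try (RandomAccessFile reader = new RandomAccessFile(file, "r")) {
byte[] bom = new byte[3];
int read = reader.read(bom);
// Only rewind if the first bytes are not the UTF-8 BOM (EF BB BF).
if (read != 3 || (bom[0] & 0xFF) != 0xEF || (bom[1] & 0xFF) != 0xBB || (bom[2] & 0xFF) != 0xBF) {
reader.seek(0); // no BOM, start reading from the beginning
}
System.out.println("Read Line: " + reader.readLine());
} catch (Exception e) {
e.printStackTrace();
}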

Splitting CSV input

I am trying to get data out of a CSV file. However, if I try to read the data so I can use the individual values inside it, it prints extra stuff like:
x����sr��java.util.ArrayListx����a���I��sizexp������w������t��17 mei 2017t��Home - Gastt��4 - 1t��(4 - 0)t��
With this code:
FileInputStream in = openFileInput("savetest13.dat");
BufferedReader reader = new BufferedReader(new InputStreamReader(in));
List<String[]> resultList = new ArrayList();
String csvLine;
while ((csvLine = reader.readLine()) != null){
String[] row = csvLine.split(",");
out.println("while gepakt");
out.println(row);
date = row[0];
out.println("date: "+date);
resultList.add(row);
txtTest.setText(date);
}
But whenever I read the file back to check what data it contains, I get the exact same data I put in, yet I can't manage to split it; this is how I read it:
FileInputStream in = openFileInput("savetest13.dat");
ObjectInputStream ois = new ObjectInputStream(in);
List stuff = (List) ois.readObject();
txtTest.setText(String.valueOf(stuff));
[17 mei 2017, Home - Guest, 2 - 1, (2 - 0), ]
I am trying to get them separated into date, names, score1, score2.
Which of the two approaches would be better to use, and how can I get the correct output, which I am failing to obtain?
You are not writing CSV to your output file; rather, you are using standard Java serialization (ObjectOutputStream#writeObject(...)) to create that file. Try using a CSV library to write/read the data in CSV format (see here), and before that, start here to learn what CSV actually is, because
[17 mei 2017, Home - Guest, 2 - 1, (2 - 0), ]
is not CSV, but only the output of toString() of the list you are using.
Here is an easy way to write the CSV file formatted correctly. This is a simple example which does not take into account the need to escape any commas found in your data. You can open the file created in Excel, Google Sheets, OpenOffice etc. and see that it is formatted correctly.
final String COMMA = ",";
final String NEW_LINE = System.getProperty("line.separator");
ArrayList<String> myRows = new ArrayList<String>();
// add comma delimited rows to ArrayList
myRows.add("date" + COMMA +
"names" + COMMA +
"score1" + COMMA +
"score2"); // etc.
// optional - insert field names into the first row
myRows.add(0, "[Date]" + COMMA +
"[Names]" + COMMA +
"[Score1]" + COMMA +
"[Score2]");
// get a writer
final String fileName = "myFileName.csv";
final String dirPath = Environment.getExternalStoragePublicDirectory(Environment.DIRECTORY_DOCUMENTS).getAbsolutePath();
File myDirectory = new File(dirPath);
FileWriter writer = new FileWriter(new File(myDirectory, fileName));
// write the rows to the file
for (String myRow : myRows) {
writer.append(myRow + NEW_LINE);
}
writer.close();
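To read the rows back and split them into the date, names, score1 and score2 fields, here is a small sketch (reusing myDirectory and fileName from above, and assuming no field contains a comma):
List<String[]> resultList = new ArrayList<>();
try (BufferedReader reader = new BufferedReader(new FileReader(new File(myDirectory, fileName)))) {
String csvLine;
while ((csvLine = reader.readLine()) != null) {
// Naive split: only valid because none of the fields contain a comma.
String[] row = csvLine.split(",");
resultList.add(row); // row[0]=date, row[1]=names, row[2]=score1, row[3]=score2
}
} catch (IOException e) {
e.printStackTrace();
}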

not represented in specified output encoding of UTF-8

I have a Java program which reads docx files line by line with Apache POI. I have a word list, and if a word matches in a line, I print that line from the docx file. So far I had no problem. Today I got an output like this:
Attempt to output character of integral value 0 that is not represented in specified output encoding of UTF-8.
What does this mean? And please provide me with a solution.
Thank you.
Here is my code where I read the docx file and print the line:
URL url = new URL(URL.get(y));
File file = new File("E:\\demo\\myfile.docx");
org.apache.commons.io.FileUtils.copyURLToFile(url, file);
POITextExtractor extractor1 = ExtractorFactory.createExtractor(file);
String text = extractor1.getText();
StringReader sr = new StringReader(text);
BufferedReader readme = new BufferedReader(sr);
while ((sCurrentLine3 = readme.readLine()) != null) {
sCurrentLine3= sCurrentLine3.trim().replaceAll("\\s+","").replaceAll("\n", "").replaceAll("\r", "").replaceAll(" ", "");
sCurrentLine3 = "Z:" + sCurrentLine3;
sCurrentLine3 = sCurrentLine3.replace("/", "\\");
System.out.println(ObjectsLine.get(i) + " " + Change.get(y) + " " + sCurrentLine3);
}
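The warning complains about a character with integral value 0 (a NUL character) somewhere in the text being written out. As a purely diagnostic sketch (an assumption about the cause, not a confirmed fix), you can check right after extraction whether the text POI returned contains such characters:
// Check whether the extracted text contains NUL (value 0) characters,
// which is what the warning is about.
if (text.indexOf('\u0000') >= 0) {
System.out.println("Extracted text contains NUL (\\u0000) characters");
}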

Writing bi-directional text to a text file in Java

I'm trying to write the following strings to a text file:
str1 = "אבג IMM:";
str2 = "3492";
To make things clearer, let's say a = "אבג", b = "IMM:". What I'm trying to write to the text file is a + b + str2.
What I'm actually getting is a + str2 + b
I thought I'd find an easy answer on Google but couldn't, so I'm stuck with this silly little issue.
Any ideas?
Thanks
Edit:
Thanks for the quick responses. This is an example of my code:
try {
FileOutputStream out = new FileOutputStream("newtxt.txt");
BufferedWriter writer = new BufferedWriter(new OutputStreamWriter
(out,"UNICODE"));
String str1 = "אבג IMM:";
String str2 = "3492";
String newStr = str1 + str2;
writer.write(newStr);
writer.close();
} catch(IOException ex) {}
Things to keep in mind:
I'm writing this bit of text to an existing file that contains mostly right-to-left text, so while the text displays properly in a left-to-right context, the problem is there.
Manually typing this bit of text in Notepad also proves problematic. Typing it in a more advanced program such as Microsoft Word, and the problem is gone. However, with the code as written right now, saving the file as a doc/rtf type doesn't solve this problem.
There's no problem appending English to Hebrew and vice versa as long as no numbers are involved.
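One thing worth trying (only a sketch; whether it helps depends on the viewer honoring Unicode bidi control characters) is to wrap the left-to-right part in an explicit LTR embedding so it stays together as a single unit inside the right-to-left text, reusing writer and str2 from the snippet above:
// U+202A = LEFT-TO-RIGHT EMBEDDING, U+202C = POP DIRECTIONAL FORMATTING.
// Wrapping "IMM:" + str2 keeps them together as one left-to-right unit.
String newStr = "אבג " + "\u202A" + "IMM:" + str2 + "\u202C";
writer.write(newStr);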

Why is Java BufferedReader() not reading Arabic and Chinese characters correctly?

I'm trying to read a file which contains English & Arabic characters on each line, and another file which contains English & Chinese characters on each line. However, the Arabic and Chinese characters fail to show correctly; they just appear as question marks. Any idea how I can solve this problem?
Here is the code I use for reading:
try {
String sCurrentLine;
BufferedReader br = new BufferedReader(new FileReader(directionOfTargetFile));
int counter = 0;
while ((sCurrentLine = br.readLine()) != null) {
String lineFixedHolder = converter.fixParsedParagraph(sCurrentLine);
System.out.println("The line number "+ counter
+ " contain : " + sCurrentLine);
counter++;
}
} catch (IOException e) {
e.printStackTrace();
}
Edition 01
After reading the line and getting the Arabic or Chinese word, I use a function to translate it by simply searching for the given Arabic text in an ArrayList (which contains all expected words), using the indexOf() method. When the word's index is found, it is used to look up the English word at the same index in another ArrayList. However, this search always fails, because it is searching for the question marks instead of the Arabic or Chinese characters. So my System.out.println shows me nulls, one for each failed translation.
I'm using the NetBeans 6.8 IDE (Mac version).
Edition 02
Here is the code which search for translation:
int testColor = dbColorArb.indexOf(wordToTranslate);
int testBrand = -1;
if ( testColor != -1 ) {
String result = (String)dbColorEng.get(testColor);
return result;
} else {
testBrand = dbBrandArb.indexOf(wordToTranslate);
}
//System.out.println ("The testBrand is : " + testBrand);
if ( testBrand != -1 ) {
String result = (String)dbBrandEng.get(testBrand);
return result;
} else {
//System.out.println ("The first null");
return null;
}
I'm actually searching two ArrayLists which might contain the desired word to translate. If the word is not found in either ArrayList, then null is returned.
Edition 03
When I debug I found that lines being read are stored in my String variable as the following:
"3;0000000000;0000001001;1996-06-22;;2010-01-27;����;;01989;������;"
Edition 04
The file I'm reading was given to me after it had been modified by another program (which I know nothing about, besides that it's written in VB); that program made the Arabic letters that were not appearing correctly appear. When I checked the encoding of the file in Notepad++ it showed ANSI. However, when I convert it to UTF-8 (which replaced the Arabic letters with other English ones) and then convert it back to ANSI, the Arabic becomes question marks!
FileReader javadoc:
Convenience class for reading character files. The constructors of this class assume that the default character encoding and the default byte-buffer size are appropriate. To specify these values yourself, construct an InputStreamReader on a FileInputStream.
So:
Reader reader = new InputStreamReader(new FileInputStream(fileName), "utf-8");
BufferedReader br = new BufferedReader(reader);
If this still doesn't work, then perhaps your console is not set to properly display UTF-8 characters. Configuration depends on the IDE used and is rather simple.
Update : In the above code replace utf-8 with cp1256. This works fine for me (WinXP, JDK6)
But I'd recommend that you insist on the file being generated using UTF-8. Because cp1256 won't work for Chinese and you'll have similar problems again.
It is most likely reading the information in correctly; however, your output stream is probably not UTF-8, and so any character that cannot be shown in your output character set is being replaced with '?'.
You can confirm this by getting each character out and printing the character ordinal.
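A minimal sketch of that check, reusing the sCurrentLine variable from the question's loop:
// Print the numeric value of every character in the line; a run of 63s ('?')
// means the data was already mangled when it was decoded, not merely when printed.
for (int i = 0; i < sCurrentLine.length(); i++) {
char c = sCurrentLine.charAt(i);
System.out.println((int) c + " -> " + c);
}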
public void writeTiFile(String fileName,String str){
try {
FileOutputStream out = new FileOutputStream(fileName);
out.write(str.getBytes("windows-1256"));
out.close();
} catch (Exception ex) {
ex.printStackTrace();
}
}
