I use a FileWriter to save a CSV file (text file).
Everything looks fine when I open it in a text editor such as Sublime Text.
But when I read it with Java I get garbage characters, however I try to read it.
An example of the reading code:
StringBuilder sb=new StringBuilder();
try {
String ligne;
BufferedReader fichier1 = new BufferedReader(new FileReader(nom_office));
while ((ligne = fichier1.readLine()) != null) {
sb.append(ligne);
}
fichier1.close();
} catch (Exception e) {
e.printStackTrace();
}
//String totalité = new String(encoded, encoding);
String totalité = sb.toString();
The result of the following statements is:
System.out.println("##############");
System.out.println(totalité);
PK ! T��ep [Content_Types].xml �(�
�TKn�0�W�"o���EUU�,[$�L/�i"m�k�IO)�
...and so on.
Why isn't the result the same as in Sublime Text?
FileReader uses the platform default encoding, which probably isn't UTF-8, and UTF-8 is what you need here. Try this:
BufferedReader br = new BufferedReader(new InputStreamReader(
new FileInputStream(file), "UTF-8"));
Also, your IDE's console needs to be configured to use UTF-8; that's really important!
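The two points above (decode the file as UTF-8, and make sure the console receives UTF-8) can be sketched together as follows. This is a minimal sketch, not a definitive fix: the class name `Utf8ReadAndPrint`, the helper `readUtf8`, and the fallback file name `data.csv` are made up for illustration, and it assumes the file really is UTF-8 text.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class Utf8ReadAndPrint {

    // Reads the whole file as UTF-8 and re-adds a '\n' after every line,
    // because readLine() strips the line terminator.
    static String readUtf8(Path path) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default file name; pass the real path as the first argument.
        Path path = Paths.get(args.length > 0 ? args[0] : "data.csv");
        // Wrap System.out so that the bytes sent to the console are UTF-8 too.
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.print(readUtf8(path));
    }
}
```

The console (or IDE run configuration) still has to interpret those bytes as UTF-8; the `PrintStream` wrapper only controls what Java writes.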
Related
I am trying to copy some data from a PDF to a txt file. Here is the code:
public void readPDFFile() throws IOException {
InputStreamReader reader;
OutputStreamWriter writer;
FileInputStream inputstream;
FileOutputStream outputStream;
BufferedReader bufferedReader = null;
BufferedWriter bufferedWriter = null;
String str;
File rfile = new File(
"C://Documents and Settings/Administrator/My Documents/EGDownloads/source.pdf");
File wFile = new File("C://Documents and Settings/Administrator/My Documents/Folder/destination.txt");
try {
inputstream = new FileInputStream(rfile);
outputStream = new FileOutputStream(wFile);
reader = new InputStreamReader(inputstream, "UTF-8");
writer = new OutputStreamWriter(outputStream, "UTF-8");
bufferedReader = new BufferedReader(reader);
bufferedWriter = new BufferedWriter(writer);
while ((str = bufferedReader.readLine()) != null) {
writer.write(str);
}
} catch (IOException es) {
System.out.println(es.getMessage());
es.printStackTrace(System.out);
} finally {
if (bufferedReader != null) {
bufferedReader.close();
}
if (bufferedWriter != null)
bufferedWriter.close();
}
}
The output is supposed to be in another language, but all I get is random boxes; I tried both UTF-16 and UTF-8.
I also tried PDFBox, but it still doesn't work: all I get is the accents of the original language, with the text in English.
Note:
1. I'm not trying to print the data to the console, only to copy it from the PDF to a txt file.
2. The other file contains non-English words.
Can anyone help me solve this?
Or point me to a link that might help?
Thanks.
The PDF format is a binary format. You would need a very unusual PDF to read it as plain text, since all the ones I know of are compressed in some way. Use a proper library to read it, be it PDFBox, iText or another. Be aware that some PDFs make text extraction impossible; you can check with Acrobat, and if Acrobat can't do it, nobody can.
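To illustrate the "binary format" point without pulling in a PDF library, here is a small sketch that checks whether a file begins with the `%PDF-` signature that PDF files normally start with. If it does, the file should be handed to a PDF library rather than to a `BufferedReader`. The class and method names are hypothetical, and the check is only a heuristic (some tools tolerate junk before the header).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

class PdfSignatureCheck {

    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // Returns true if the given leading bytes start with the "%PDF-" signature.
    static boolean looksLikePdf(byte[] head) {
        return head.length >= PDF_MAGIC.length
                && Arrays.equals(Arrays.copyOf(head, PDF_MAGIC.length), PDF_MAGIC);
    }

    // Reads only the first few bytes of the file and checks the signature.
    static boolean looksLikePdf(Path path) throws IOException {
        byte[] head = new byte[PDF_MAGIC.length];
        int total = 0;
        try (InputStream in = Files.newInputStream(path)) {
            // read() may return fewer bytes than requested, so loop until full or EOF.
            while (total < head.length) {
                int n = in.read(head, total, head.length - total);
                if (n < 0) {
                    break;
                }
                total += n;
            }
        }
        return total == head.length && looksLikePdf(head);
    }
}
```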
I am trying to read a file using BufferedReader, but when I try to print it, I get some weird characters.
The code that reads the file is:
private static String readJsonFile(String fileName) throws IOException{
BufferedReader br = null;
try {
StringBuilder sb = new StringBuilder();
br = new BufferedReader(new FileReader(fileName));
String line = br.readLine();
while(line != null ){
sb.append(line);
System.out.println(line);
line=br.readLine();
}
return sb.toString();
} finally{
br.close();
}
}
This function is called as follows:
String jsonString = null;
try {
jsonString = readJsonFile(fileName);
} catch (IOException e) {
e.printStackTrace();
}
But when I try to print this to the console using System.out.println(jsonString);, it shows some strange symbols.
Note: it works fine when the file is small.
Is there any limit on the size of the file it can read?
You're using the platform default encoding to read the file, which is probably encoded in UTF-8. Check the actual encoding of the file, and specify the encoding:
BufferedReader r = new BufferedReader(new InputStreamReader(new FileInputStream("..."), StandardCharsets.UTF_8));
Note that since you simply want to read everything from the file, you could simply use
String json = new String(Files.readAllBytes(...), StandardCharsets.UTF_8);
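One way to "check the actual encoding" is to decode the bytes strictly, so that invalid input raises an error instead of being silently replaced with placeholder characters. The sketch below uses `CharsetDecoder` with `CodingErrorAction.REPORT`; the class and method names are invented for illustration. Passing the check only shows that the bytes are valid in that charset, not that it is the charset the author intended.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

class StrictDecode {

    // Decodes the bytes with the given charset and throws on invalid input
    // instead of substituting replacement characters.
    static String decodeStrict(byte[] bytes, Charset charset) throws CharacterCodingException {
        return charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    // True if the bytes are well-formed text in the given charset.
    static boolean isValid(byte[] bytes, Charset charset) {
        try {
            decodeStrict(bytes, charset);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

For example, `StrictDecode.isValid(Files.readAllBytes(path), StandardCharsets.UTF_8)` returning false suggests the file was saved in some other encoding.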
In my Java application, I have to read a file. The problem I am facing is that after reading the file, the result comes out in an unreadable form: some odd ASCII characters are displayed, and none of the letters are readable. How can I make it display the text correctly?
try {
// Open the file that is the first
// command line parameter
FileInputStream fstream = new FileInputStream("c:\\hello.txt");
// Get the object of DataInputStream
DataInputStream in = new DataInputStream(fstream);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
String strLine;
// Read File Line By Line
while ((strLine = br.readLine()) != null) {
// Print the content on the console
System.out.println(strLine);
}
// Close the input stream
in.close();
} catch (Exception e) {// Catch exception if any
System.err.println("Error: " + e.getMessage());
}
Perhaps you have an encoding error. The constructor you are using for an InputStreamReader uses the default character encoding; if your file contains UTF-8 text outside the ASCII range, you will get garbage. Also, you don't need a DataInputStream, since you aren't reading any data objects from the stream. Try this code:
FileInputStream fstream = null;
try {
fstream = new FileInputStream("c:\\hello.txt");
// Decode data using UTF-8
BufferedReader br = new BufferedReader(new InputStreamReader(fstream, "UTF-8"));
String strLine;
// Read File Line By Line
while ((strLine = br.readLine()) != null) {
// Print the content on the console
System.out.println(strLine);
}
} catch (Exception e) {// Catch exception if any
System.err.println("Error: " + e.getMessage());
} finally {
if (fstream != null) {
try { fstream.close(); }
catch (IOException e) {
// log failure to close file
}
}
}
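The "garbage" described in the answer above can be reproduced in a few lines. This sketch encodes a string as UTF-8 and then deliberately decodes the bytes with the wrong charset (ISO-8859-1 is chosen here only as an example of a mismatched default); the class and method names are made up.

```java
import java.nio.charset.StandardCharsets;

class MojibakeDemo {

    // Encodes the text as UTF-8, then (wrongly) decodes the bytes as ISO-8859-1.
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Pure ASCII text survives the mismatch unchanged.
        System.out.println(misdecode("abc"));
        // U+00E9 (e with acute accent) is two bytes in UTF-8, so it comes
        // back as two unrelated characters.
        System.out.println(misdecode("caf\u00e9"));
    }
}
```

This is why files that contain only ASCII appear to work with any of the common encodings, while accented or non-Latin text breaks.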
The output you are getting is an ASCII value, so you need to cast it to char or String before printing it. Hope this helps.
You have to handle it this way:
BufferedReader br = new BufferedReader(new InputStreamReader(in, encodingformat));
encodingformat - change it according to the encoding of the file you are reading.
Examples: UTF-8, UTF-16, and so on.
Refer to the Supported Encodings list for Java SE 6 for more info.
My problem got solved, though I don't know how. I copied the contents of hello.txt to another file and ran the Java program, and I could read all the letters. I don't know what the problem was with the original file.
Since you don't know the encoding the file is in, use jchardet to detect it and then read the file with that encoding, as others have already suggested. This is not 100% foolproof, but it should work for your scenario.
Also, the use of DataInputStream is unnecessary.
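As a much simpler complement to a detector library such as jchardet (whose API is not shown here), one can at least look for a byte order mark at the start of the file. This is a different and far weaker technique than statistical detection: many UTF-8 files have no BOM at all, in which case the sketch returns null. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class BomSniffer {

    // Returns the charset indicated by a leading byte order mark, or null
    // when the data does not start with one of the BOMs checked here.
    // Limitation: a UTF-32LE BOM (FF FE 00 00) would be reported as UTF-16LE.
    static Charset charsetFromBom(byte[] head) {
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) {
            return StandardCharsets.UTF_8;
        }
        if (startsWith(head, 0xFE, 0xFF)) {
            return StandardCharsets.UTF_16BE;
        }
        if (startsWith(head, 0xFF, 0xFE)) {
            return StandardCharsets.UTF_16LE;
        }
        return null;
    }

    // Compares the first bytes of data with the given unsigned byte values.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```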
In Java, I am trying to parse an HTML file that contains complex text such as greek symbols.
I encounter a known problem when text contains a left facing quotation mark. Text such as
mutations to particular “hotspot” regions
becomes
mutations to particular “hotspot�? regions
I have isolated the problem by writing a simple text copy method:
public static int CopyFile()
{
try
{
StringBuffer sb = null;
String NullSpace = System.getProperty("line.separator");
Writer output = new BufferedWriter(new FileWriter(outputFile));
String line;
BufferedReader input = new BufferedReader(new FileReader(myFile));
while((line = input.readLine())!=null)
{
sb = new StringBuffer();
//Parsing would happen
sb.append(line);
output.write(sb.toString()+NullSpace);
}
return 0;
}
catch (Exception e)
{
return 1;
}
}
Can anybody offer some advice on how to correct this problem?
My solution:
InputStream in = new FileInputStream(myFile);
Reader reader = new InputStreamReader(in,"utf-8");
Reader buffer = new BufferedReader(reader);
Writer output = new BufferedWriter(new FileWriter(outputFile));
int r;
while ((r = buffer.read()) != -1) // read through the BufferedReader created above
{
if (r<126)
{
output.write(r);
}
else
{
output.write("&#"+Integer.toString(r)+";");
}
}
output.flush();
The file read is not in the same encoding (probably UTF-8) as the file written (probably ISO-8859-1).
Try the following to generate a file with UTF-8 encoding:
BufferedWriter output = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(outputFile),"UTF8"));
Unfortunately, determining the encoding of a file is very difficult. See Java : How to determine the correct charset encoding of a stream
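Putting the two halves together, a copy that names the charset on both the reading and the writing side might look like the sketch below. The class name `CharsetCopy` and its `copy` method are illustrative only, and the caller is assumed to know (or to have detected) the source encoding.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

class CharsetCopy {

    // Copies a text file line by line, decoding with 'from' and encoding with 'to'.
    static void copy(Path source, Charset from, Path target, Charset to) throws IOException {
        try (BufferedReader in = Files.newBufferedReader(source, from);
             BufferedWriter out = Files.newBufferedWriter(target, to)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.write(line);
                // readLine() strips the terminator, so write one back.
                out.newLine();
            }
        }
    }
}
```

With UTF-8 on both sides, for instance `CharsetCopy.copy(in, StandardCharsets.UTF_8, out, StandardCharsets.UTF_8)`, curly quotes such as the ones around "hotspot" pass through unchanged.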
In addition to what Thierry-Dimitri Roy wrote, if you know the encoding you have to create your FileReader with a bit of extra work. From the docs:
Convenience class for reading character files. The constructors of this class assume that the default character encoding and the default byte-buffer size are appropriate. To specify these values yourself, construct an InputStreamReader on a FileInputStream.
The Javadoc for FileReader says:
The constructors of this class assume that the default character encoding and the default byte-buffer size are appropriate. To specify these values yourself, construct an InputStreamReader on a FileInputStream.
In your case the default character encoding is probably not appropriate. Find what encoding the input file uses, and specify it. For example:
FileInputStream fis = new FileInputStream(myFile);
InputStreamReader isr = new InputStreamReader(fis, "charset name goes here");
BufferedReader input = new BufferedReader(isr);
Currently I am trying something very simple. I am looking through an XML document for a certain phrase, which I then try to replace. The problem I am having is that when I read the lines, I store each line in a StringBuffer, and when I write it back to the document everything ends up on a single line.
Here is my code:
File xmlFile = new File("abc.xml");
BufferedReader br = new BufferedReader(new FileReader(xmlFile));
String line = null;
while((line = br.readLine())!= null)
{
if(line.indexOf("abc") != -1)
{
line = line.replaceAll("abc","xyz");
}
sb.append(line);
}
br.close();
BufferedWriter bw = new BufferedWriter(new FileWriter(xmlFile));
bw.write(sb.toString());
bw.close();
I am assuming I need a newline character when I perform sb.append, but unfortunately I don't know which character to use, as "\n" does not work.
Thanks in advance!
P.S. I figured there must be a way to use Xalan to format the XML file after I write to it or something. Not sure how to do that though.
readLine() returns the text between the newline characters without the newline characters themselves, so when you write the lines back out, the newlines are missing. These characters depend on the OS: Windows uses two characters for a newline, while Unix uses one, for example. To be OS-agnostic, retrieve the system property "line.separator":
String newline = System.getProperty("line.separator");
and append it to your stringbuffer:
sb.append(line).append(newline);
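As a self-contained sketch of the approach described above, the following reads lines from any `Reader`, performs the substitution, and re-appends a separator after each line. The class and method names are invented for the example, and the input in `main` is a hard-coded string rather than the asker's abc.xml.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class KeepLineBreaks {

    // Replaces 'from' with 'to' in every line and re-adds the separator
    // that readLine() removed.
    static String replaceKeepingLines(Reader source, String from, String to, String newline)
            throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line.replace(from, to)).append(newline);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String xml = "<root>\n<item>abc</item>\n</root>\n";
        System.out.print(replaceKeepingLines(
                new StringReader(xml), "abc", "xyz", System.lineSeparator()));
    }
}
```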
Modified as suggested by Brel, your text-substituting approach should work, and it will work well enough for simple applications.
If things start to get a little hairier, and you end up wanting to select elements based on their position in the XML structure, or you need to be sure to change element text but not tag names (think <abc>abc</abc>), then you'll want to call in the cavalry and process the XML with an XML parser.
Essentially, you read in a Document using a DocumentBuilder, hop around the document's nodes doing whatever you need to, and then write the Document back to a file; in the standard Java XML API (JAXP) that last step is done with a Transformer. Most XML tools have a handful of options that let you format the XML output: you can specify indentation (or not), and maybe newlines for every opening tag, that kind of thing, to make your XML look pretty.
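A rough sketch of that parser-based route, using only the JAXP classes that ship with the JDK, could look like this. The class `XmlTextReplace` and its methods are hypothetical names; the example works on strings rather than files to stay short, and it only touches plain text nodes (not attributes or CDATA sections).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XmlTextReplace {

    // Replaces 'from' with 'to' in text nodes only; element names are left alone.
    static void replaceInText(Node node, String from, String to) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            node.setNodeValue(node.getNodeValue().replace(from, to));
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            replaceInText(children.item(i), from, to);
        }
    }

    // Parses the XML, rewrites its text nodes, and serializes it with indentation enabled.
    static String transform(String xml, String from, String to) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        replaceInText(doc.getDocumentElement(), from, to);

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```

Calling `XmlTextReplace.transform("<abc>abc</abc>", "abc", "xyz")` changes the element's text but keeps the `<abc>` tag name, which a plain string substitution cannot guarantee. The exact whitespace of the indented output depends on the JDK's serializer.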
sb would be the StringBuffer object, which has not been instantiated in this example. It can be added before the while loop:
StringBuffer sb = new StringBuffer();
Scanner scan = new Scanner(System.in);
String filePath = scan.next();
String oldString = "old_string";
String newString = "new_string";
String oldContent = "";
BufferedReader br = null;
FileWriter writer = null;
File xmlFile = new File(filePath);
try {
br = new BufferedReader(new FileReader(xmlFile));
String line = br.readLine();
while (line != null) {
oldContent = oldContent + line + System.lineSeparator();
line = br.readLine();
}
String newContent = oldContent.replaceAll(oldString, newString);
writer = new FileWriter(xmlFile);
writer.write(newContent);
} catch (IOException e) {
e.printStackTrace();
} finally {
try {
scan.close();
// br or writer is still null if opening the file failed
if (br != null) br.close();
if (writer != null) writer.close();
} catch (IOException e) {
e.printStackTrace();
}
}
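One caveat with the code above: `String.replaceAll` treats its first argument as a regular expression and gives `$` and `\` special meaning in the replacement. If `oldString` and `newString` are meant literally, `String.replace` is the simpler choice. The sketch below contrasts the two; the class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LiteralReplace {

    // Literal replacement: no regex interpretation of either argument.
    static String replaceLiteral(String content, String oldString, String newString) {
        return content.replace(oldString, newString);
    }

    // Equivalent result with replaceAll, by quoting both pattern and replacement.
    static String replaceAllQuoted(String content, String oldString, String newString) {
        return content.replaceAll(Pattern.quote(oldString), Matcher.quoteReplacement(newString));
    }

    public static void main(String[] args) {
        String content = "a.b axb";
        // As a regex, "a.b" also matches "axb" because '.' matches any character.
        System.out.println(content.replaceAll("a.b", "X"));
        // The literal version only replaces the exact text "a.b".
        System.out.println(replaceLiteral(content, "a.b", "X"));
    }
}
```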