I have a CSV file which contains many records, some of which contain French characters. My script reads each record, processes it, and inserts the processed record into an XML file. When we view the .csv file in the VIM editor on a Fedora system, the French characters are displayed correctly. But after processing, these characters are no longer displayed properly, and when such a record is printed to the console it is also garbled.
For example:
String in .csv file : Crêpe Skirt
String in XML : Cr�pe Skirt
Code snippet for reading the file:
BufferedReader file = new BufferedReader(new FileReader(fileLocation));
String line = file.readLine();
Kindly suggest a way to handle this issue.
You need to know what encoding the file is in (probably UTF-8), and then specify the same encoding when you open the file in Java.
Try reading the file as UTF-8, and declare the encoding of your XML file as UTF-8 too:
BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(your-file-path), "UTF-8"));
String line = "";
while ((line = reader.readLine()) != null) {
    // Do your work here
}
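On Java 7 and later, the same reader can be obtained more concisely through java.nio.file (a sketch, not from the original answer; the file name is a placeholder). Unlike FileReader, which silently uses the platform default charset, this makes the encoding explicit:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ReadUtf8 {
    public static void main(String[] args) throws IOException {
        // Files.newBufferedReader decodes with the charset you pass in,
        // instead of the platform default that FileReader uses.
        try (BufferedReader reader = Files.newBufferedReader(
                Paths.get("your-file-path.csv"), StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Do your work here
            }
        }
    }
}
```

The try-with-resources block also closes the file automatically, which the original snippet omits.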
Related
First, I'd like to explain our use case.
Users can upload text files through our website. Each file is stored in a folder and read using Java.
Our problem: most users upload ANSI-encoded text files, but some use UTF-8.
If I read a file with the wrong encoding, the content comes out wrong. For example, the word "Äpfel" is read as "?pfel".
I know I can set the encoding in my reader:
reader = new BufferedReader(new InputStreamReader(new FileInputStream(csvFile), "UTF-8"));
But how can I determine the correct encoding?
My idea is to read the file once and check whether it contains any unknown characters like "?pfel", but how do I check that a character is not correct?
BufferedReader in = new BufferedReader(new FileReader(fi));
while (in.ready()) {
    String row = in.readLine();
    ...
    // How can I check whether row contains unknown chars?
}
Thanks for your help!
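One way to answer the question above (a sketch, not from the original answers, assuming the files are either UTF-8 or a single-byte ANSI encoding): instead of scanning the decoded text for "?" characters, decode the raw bytes with a strict UTF-8 decoder. If decoding fails, the file is not valid UTF-8 and you can fall back to an ANSI charset such as windows-1252:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class EncodingCheck {
    // Returns true if the given bytes form valid UTF-8.
    // By default a decoder replaces bad input with U+FFFD; REPORT makes it throw instead.
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = "Äpfel".getBytes(StandardCharsets.UTF_8);
        // In ISO-8859-1, "Ä" is the single byte 0xC4, which is malformed UTF-8.
        byte[] ansi = "Äpfel".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(isValidUtf8(utf8));  // true
        System.out.println(isValidUtf8(ansi));  // false
    }
}
```

Note the asymmetry: any byte sequence decodes "successfully" as a single-byte charset, so you can only reliably test for UTF-8 and fall back, not the other way round.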
Unable to read the euro symbol (€) and German characters like ö, ß, ü, ä, Ä from a CSV file in Java. The point here is that this behaviour is only seen when uploading the CSV file from Windows; when we upload the same file from a Linux machine, the characters are uploaded successfully. We are using Linux servers, which use the UTF-8 character set by default. We also tried changing the character set from UTF-8 to ISO_8859_1, but some character sets are not supported in our Linux environment.
Code Overview:
The code is basically a REST service which accepts a .csv file as multipart form data. Below is the sample code used to write the uploaded contents to the file system.
// Reading file data using multipart/form-data
FormDataBodyPart fileDataBodyPart = multiPart.getField("fileContent");
InputStream fileInputStream = fileDataBodyPart.getValueAs(InputStream.class);

// Writing to a TEMP location
String line = null;
BufferedReader skipLine = new BufferedReader(new InputStreamReader(fileInputStream, StandardCharsets.ISO_8859_1));
OutputStreamWriter writer = new OutputStreamWriter(outputStream, StandardCharsets.ISO_8859_1);
while ((line = skipLine.readLine()) != null) {
    line = line + "\n";
    writer.write(line);
}
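The decode/re-encode round trip above is where characters get corrupted: if the client sent UTF-8 but the server decodes as ISO_8859_1, multi-byte characters like € are mangled before they are ever written. When the service only needs to persist the upload, a common way to sidestep the problem entirely (a sketch, not part of the original post) is to copy the raw bytes without decoding them at all:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class RawCopy {
    // Copies the upload byte-for-byte. No charset is involved,
    // so € and ö/ß/ü/ä survive regardless of the client's encoding.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        return total;
    }
}
```

Decode only at the point where you actually parse the CSV, using the charset the client declared (or one detected from the bytes).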
I am facing a problem saving a text file in UTF-8 format using Java. When I open "Save As" on the generated text file, it shows ANSI as the text format, not UTF-8. Below is the code I am using to create the file:
String excelFile="";
excelFile = getFullFilePath(fileName);
File file = new File(excelFile + fileName,"UTF-8");
File output = new File(excelFile,"UTF-8");
FileUtils.writeStringToFile(file, content, "UTF-8");
While creating the text file, I am using UTF-8 encoding, but the file still shows the encoding as ANSI while saving.
Kindly help.
Instead of using File, create a FileOutputStream.
Try this.
Writer out = new BufferedWriter(new OutputStreamWriter(
        new FileOutputStream("outfilename"), "UTF-8"));
try {
    out.write(aString);
} finally {
    out.close();
}
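On Java 7 and later, java.nio.file offers a more compact equivalent of the answer above (a sketch; the file name is a placeholder), and it closes the file for you:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class WriteUtf8 {
    public static void main(String[] args) throws IOException {
        Path out = Paths.get("outfilename"); // placeholder path
        // Encode the string as UTF-8 explicitly and write the bytes.
        Files.write(out, "Crêpe Skirt".getBytes(StandardCharsets.UTF_8));
    }
}
```

Note that plain UTF-8 files carry no encoding marker, so some Windows editors will still report "ANSI" if the content happens to be pure ASCII; the encoding only becomes visible once non-ASCII characters are present.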
I had a very similar problem and solved it by saving the original file with UTF-8 encoding. In this case, go to the Excel file and save it again, or create a new file, copy the content over, and make sure its encoding is UTF-8. This link has a tutorial on how to save an Excel file with UTF-8 encoding: https://help.surveygizmo.com/help/encode-an-excel-file-to-utf-8-or-utf-16. Otherwise, your code seems to be correct.
I'm connecting to Microsoft Access via JDBC/ODBC successfully. After that, I run a query to select rows from Access and write the results to a TXT file. Everything works, but some strings include accents, and these appear as '?' in the TXT file. I have already tried various methods for writing files in Java, such as PrintWriter, FileWriter, and OutputStream, including adding a character-encoding parameter (UTF-8 or ISO-8859-1) to some of them. I need help finding a way to output these characters correctly. Thanks.
Try the lines below:
String OUTPUTFILE = "PATH/TO/FILE/";
BufferedWriter bf = new BufferedWriter(new OutputStreamWriter(
new FileOutputStream(OUTPUTFILE),"UTF8"));
Once you add that to your code, you can use bf.write("VALUE") to write UTF-8 characters to your file. Also make sure to set your text editor's encoding to Unicode or UTF-8; if you don't, it might seem like the whole process didn't work, which would lead to even more confusion.
Edited:
To read UTF-8 text files:
String INPUTFILE = "PATH/TO/File";
BufferedReader in = new BufferedReader(
        new InputStreamReader(
                new FileInputStream(INPUTFILE), "UTF8"));
Then, to read a line: String str = in.readLine();
I tried adding UTF-8 for this but it didn't work. What should I do to read a Russian file in Java?
FileInputStream fstream1 = new FileInputStream("russian.txt");
DataInputStream in = new DataInputStream(fstream1);
BufferedReader br = new BufferedReader(new InputStreamReader(in,"UTF-8"));
If the file is from Windows PC, try either "windows-1251" or "Cp1251" for the charset name.
If the file is somehow in the MS-DOS encoding, try using "Cp866".
Both of these are single-byte encodings and changing the file type to UTF-8 (which is multibyte) does nothing.
If all else fails, use a hex editor and dump a few hex lines of the file into your question. Then we'll detect the encoding.
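If you want to produce that hex dump from Java itself rather than a hex editor, a minimal sketch (the file name matches the question's russian.txt): a UTF-8 file with a BOM starts with EF BB BF, while windows-1251 Cyrillic text shows up as single bytes mostly in the C0–FF range.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class HexDump {
    // Renders the first 'max' bytes as space-separated hex pairs,
    // so the encoding can be identified by eye.
    static String hex(byte[] data, int max) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < Math.min(max, data.length); i++) {
            sb.append(String.format("%02X ", data[i] & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = Files.readAllBytes(Paths.get("russian.txt"));
        System.out.println(hex(data, 32));
    }
}
```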
As others mentioned, you need to know how the file is encoded. A simple check is to (ab)use Firefox as an encoding detector: answer to similar question
If this is a display problem, it depends on what you mean by "reads": in the console? In some window? See also: How can I make a String with cyrillic characters display correctly?