Java: convert to UTF-8 from Postgres

I'm generating HTML files with data stored in Postgres.
The HTML files are generated as UTF-8, but the strings look exactly as they appear in the DB.
How can I make the text appear correctly, like "Útiles de Escritorio"?
Note: I'm not able to change the Postgres configuration. I'm using Java 1.6, Postgres 8.4 and JDBC.
UPDATE:
I use this code to create the HTML files:
public static void stringToFile (String file_name, String file_content) throws IOException {
OutputStream out = new FileOutputStream(file_name);
OutputStreamWriter writer = new OutputStreamWriter(out, "UTF-8");
try {
try {
writer.write(file_content);
} finally {
writer.close();
}
} finally {
out.close();
}
}
And I use it like this:
StringBuilder html_content = new StringBuilder();
ResultSet result_set = statement.executeQuery(sql_query);
while (result_set.next()) {
    // Java string literals need double quotes; single quotes denote a char
    html_content.append(String.format("<li>%s</li>", result_set.getString(1)));
}
Utils.stringToFile("thehtmlfile.html", html_content.toString());
UPDATE: [SOLVED]
This works for me:
new String(str.getBytes("ISO-8859-1"), "UTF-8")
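The one-liner in the update works when the stored text is UTF-8 bytes that were decoded as ISO-8859-1 somewhere between Postgres and Java: ISO-8859-1 maps every char from 0 to 255 back to exactly one byte, so re-encoding with it recovers the original UTF-8 bytes, which can then be decoded correctly. A minimal sketch under that assumption (the class and method names are hypothetical), using Charset.forName so it also compiles on Java 1.6:

```java
import java.nio.charset.Charset;

public class MojibakeRepair {
    static final Charset UTF8 = Charset.forName("UTF-8");
    static final Charset LATIN1 = Charset.forName("ISO-8859-1");

    // Simulates the bug: UTF-8 bytes decoded as ISO-8859-1 somewhere along the way
    static String garble(String text) {
        return new String(text.getBytes(UTF8), LATIN1);
    }

    // The fix from the update: ISO-8859-1 turns each char back into one byte,
    // recovering the original UTF-8 bytes, which are then decoded properly
    static String repair(String garbled) {
        return new String(garbled.getBytes(LATIN1), UTF8);
    }

    public static void main(String[] args) {
        String original = "\u00DAtiles de Escritorio"; // "Útiles de Escritorio"
        String broken = garble(original);
        System.out.println(repair(broken).equals(original)); // true
    }
}
```

Applying repair() to text that was never garbled this way will corrupt its non-ASCII characters, so it is only safe when every value read from that column is known to be misdecoded.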

I hope you're sure that the error isn't happening before you add the string to the DB.
Because you can't change your DB settings, this isn't easy to handle. First of all, you have to know which encoding your DB uses to store the text, see here.
Then you should be familiar with UTF-16, see here and here.
Once you are familiar with the codecs you want to use, you have to create the correct UTF-16 value for each character you get from the DB. I will just point out how it could work; the correct implementation you will have to write on your own.
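// Combines the low 8 bits of two chars into one 16-bit char: (first << 8) | second
public char createUTF16char(char first, char second) {
    int high = first & 0xFF;  // low byte of first becomes the high byte
    int low = second & 0xFF;  // low byte of second becomes the low byte
    return (char) ((high << 8) | low);
}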
This code just combines the low 8 bits of each char (first and second) into a new char.
Maybe this is the operation you need, but it depends on the encoding used on the server.
Sincerely

Related

Why do I get an Excel warning about file format and extension mismatch when I try to download an Excel file? [duplicate]

I'm developing an application in JSP and I want to export some data from the database in XLS (MS Excel) format.
Is it possible under Tomcat to just write a file as if it were a normal Java application, and then generate a link to this file? Or do I need to use a specific API for it?
Will I have permission problems when doing this?
While you can use a full-fledged library like JExcelAPI, Excel will also read CSV and plain HTML tables, provided you set the response MIME type to something like "application/vnd.ms-excel".
Depending on how complex the spreadsheet needs to be, CSV or HTML can do the job for you without a third-party library.
Don't use plain HTML tables with an application/vnd.ms-excel content type. You're then basically fooling Excel with a wrong content type, which causes failures and/or warnings in the latest Excel versions. It will also mess up the original HTML source when you edit and save it in Excel. Just don't do that.
CSV, in turn, is a standard format which enjoys default support from Excel without any problems and is in fact easy and memory-efficient to generate. Although there are libraries out there, you can also easily write one in fewer than 20 lines (fun for those who can't resist). You just have to adhere to the RFC 4180 spec, which basically contains only 3 rules:
Fields are separated by a comma.
If a comma occurs within a field, then the field has to be surrounded by double quotes.
If a double quote occurs within a field, then the field has to be surrounded by double quotes and the double quote within the field has to be escaped by another double quote.
Here's a kickoff example:
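// Needs java.io.* (BufferedWriter, IOException, OutputStream, OutputStreamWriter) and java.util.* (Iterator, List)
public static <T> void writeCsv(List<List<T>> csv, char separator, OutputStream output) throws IOException {
    BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(output, "UTF-8"));
    for (List<T> row : csv) {
        for (Iterator<T> iter = row.iterator(); iter.hasNext();) {
            // Rule 3: escape embedded double quotes by doubling them
            String field = String.valueOf(iter.next()).replace("\"", "\"\"");
            // Rules 2 and 3, plus embedded line breaks, which RFC 4180 also requires to be quoted
            if (field.indexOf(separator) > -1 || field.indexOf('"') > -1
                    || field.indexOf('\n') > -1 || field.indexOf('\r') > -1) {
                field = '"' + field + '"';
            }
            writer.append(field);
            if (iter.hasNext()) {
                writer.append(separator);
            }
        }
        writer.append("\r\n"); // RFC 4180 uses CRLF as the record separator
    }
    writer.flush();
}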
Here's an example of how you could use it:
public static void main(String[] args) throws IOException {
List<List<String>> csv = new ArrayList<List<String>>();
csv.add(Arrays.asList("field1", "field2", "field3"));
csv.add(Arrays.asList("field1,", "field2", "fie\"ld3"));
csv.add(Arrays.asList("\"field1\"", ",field2,", ",\",\",\""));
writeCsv(csv, ',', System.out);
}
And inside a Servlet (yes, Servlet, don't use JSP for this!) you can basically do:
protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
String filename = request.getPathInfo().substring(1);
List<List<Object>> csv = someDAO().findCsvContentFor(filename);
response.setHeader("content-type", "text/csv");
response.setHeader("content-disposition", "attachment;filename=\"" + filename + "\"");
writeCsv(csv, ';', response.getOutputStream());
}
Map this servlet to something like /csv/* and invoke it as something like http://example.com/context/csv/filename.csv. That's all.
Note that I added the possibility to specify the separator character separately, because whether Excel accepts a comma , or a semicolon ; as the CSV field separator may depend on the locale. Note also that I added the filename to the URL path info, because a certain web browser developed by a team in Redmond otherwise wouldn't save the download with the proper filename.
You will probably need a library to manipulate Excel files, like JExcelAPI ("jxl") or POI. I'm more familiar with jxl, and it can certainly write files. You could generate the files, store them and serve a URL to them, but I wouldn't. Generated files are a pain: they add complication in the form of concurrency, clean-up processes, etc.
If you can, generate the file on the fly and stream it to the client through the standard servlet mechanisms.
If it's generated many, many times or the generation is expensive, you can cache the result somehow, but I'd be more inclined to keep it in memory than as a file. I'd certainly avoid, if you can, linking directly to the generated file by URL. If you go via a servlet, you can change your implementation later. It's the same encapsulation concept as in OO design.
POI and JExcel are both good APIs. I personally prefer POI, and POI is constantly updated. Furthermore, there are more resources online about POI than about JExcel in case you have any questions. However, either of the two does a great job.
Maybe you should consider using a reporting tool with an option to export files in XLS format. My suggestion is JasperReports.
// JSP scriptlet: needs java.io.* and javax.servlet.ServletOutputStream imported via a page directive
try {
    String absoluteDiskPath = "test.xls";
    File f = new File(absoluteDiskPath);
    response.setContentType("application/vnd.ms-excel"); // MIME type for .xls files
    // Send only the file name, never the full disk path
    response.setHeader("Content-Disposition", "attachment; filename=\"" + f.getName() + "\"");
    out.clear(); // clear the JSP output buffer to prevent an IllegalStateException when writing binary data
    ServletOutputStream outs = response.getOutputStream();
    InputStream in = new FileInputStream(f);
    try {
        byte[] buffer = new byte[8192];
        int n;
        // read() returns -1 at end of stream; copying stops there instead of writing it out
        while ((n = in.read(buffer)) != -1) {
            outs.write(buffer, 0, n);
        }
        outs.flush();
    } finally {
        in.close();
        outs.close();
    }
} catch (Exception ex) {
    ex.printStackTrace();
}
I tried the following in a JSP, and it works fine:
<% String filename = "xyz.xls";
response.setContentType("application/octet-stream");
response.setHeader("Content-Disposition","attachment; filename=\"" + filename + "\"");
java.io.File excelFile=new java.io.File("C:\\Users\\hello\\Desktop\\xyz.xls");
java.io.FileInputStream fileInputStream=new java.io.FileInputStream(excelFile);
byte[] bytes = new byte[(int) excelFile.length()];
int offset = 0;
while (offset < bytes.length)
{
int result = fileInputStream.read(bytes, offset, bytes.length - offset);
if (result == -1) {
break;
}
offset += result;
}
javax.servlet.ServletOutputStream outs = response.getOutputStream();
outs.write(bytes);
outs.flush();
outs.close();
fileInputStream.close();
%>

How do I get the data from a database and store it in a text file?

I am new to databases in Java and I am trying to export the data from one table and store it in a text file. At the moment the code below writes to the text file, but everything ends up on one line. Can anyone help?
My code:
private static String listHeader() {
String output = "Id Priority From Label Subject\n";
output += "== ======== ==== ===== =======\n";
return output;
}
public static String Export_Message_Emails() {
String output = listHeader();
output +="\n";
try {
ResultSet res = stmt.executeQuery("SELECT * from messages ORDER BY ID ASC");
while (res.next()) { // there is a result
output += formatListEntry(res);
output +="\n";
}
} catch (Exception e) {
System.out.println(e);
return null;
}
return output;
}
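public void exportCode(String File1) {
    // nameOfFile, f and fw are presumably fields of the class (not shown in the question)
    try {
        if ("Messages".equals(nameOfFile)) {
            fw = new FileWriter(f);
            //what needs to be written here
            //fw.write(MessageData.listAll());
            fw.write(MessageData.Export_Message_Emails());
            fw.close();
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}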
Don't use a hard-coded value of "\n". Instead, use System.getProperty("line.separator"), or, if you are using Java 7 or later, System.lineSeparator().
Try String.format("%n") instead of "\n".
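A small sketch of these suggestions applied to a hypothetical rewrite of the question's listHeader(): all three spellings produce the platform separator ("\r\n" on Windows, "\n" on Unix-like systems), which is the Windows convention Notepad has traditionally expected. The class name is hypothetical.

```java
public class LineSeparators {
    // Platform line separator: "\r\n" on Windows, "\n" on Unix-like systems
    static final String NL = System.getProperty("line.separator");

    // Hypothetical rewrite of the question's listHeader() using the platform separator
    static String listHeader() {
        return "Id Priority From Label Subject" + NL
             + "== ======== ==== ===== =======" + NL;
    }

    public static void main(String[] args) {
        // All three forms yield the same string on a given platform
        System.out.println(NL.equals(String.format("%n")));    // true
        System.out.println(NL.equals(System.lineSeparator())); // true (Java 7+)
        System.out.print(listHeader());
    }
}
```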
Unless you're trying to practice your Java programming (which is perfectly fine of course!), you can export all the data from one table and store it in a file by using the SYSCS_UTIL.SYSCS_EXPORT_TABLE system procedure: http://db.apache.org/derby/docs/10.11/ref/rrefexportproc.html
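A minimal sketch of calling that procedure over JDBC, assuming the database is Apache Derby (the linked documentation is Derby's) and the table lives in the connection's default schema; the class name, method name and file name are hypothetical. Passing NULL for the schema and the two delimiters selects Derby's defaults (current schema, comma, double quote). The procedure writes a delimited file, not the fixed-width layout built by listHeader().

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Types;
import java.util.Locale;

public class DerbyExport {
    // Derby system procedure; parameters: schema, table, file,
    // column delimiter, character delimiter, codeset
    static final String EXPORT_CALL =
            "CALL SYSCS_UTIL.SYSCS_EXPORT_TABLE(?, ?, ?, ?, ?, ?)";

    // Hypothetical helper: exports every row of `table` to `fileName` (Derby only)
    static void exportTable(Connection conn, String table, String fileName) throws SQLException {
        CallableStatement cs = conn.prepareCall(EXPORT_CALL);
        try {
            cs.setNull(1, Types.VARCHAR);                    // null schema: current schema
            cs.setString(2, table.toUpperCase(Locale.ROOT)); // unquoted identifiers are stored upper-case
            cs.setString(3, fileName);
            cs.setNull(4, Types.CHAR);                       // null: default column delimiter (comma)
            cs.setNull(5, Types.CHAR);                       // null: default character delimiter (double quote)
            cs.setString(6, "UTF-8");                        // encoding of the exported file
            cs.execute();
        } finally {
            cs.close();
        }
    }
}
```

With an open Derby connection, the question's table could then be exported with something like DerbyExport.exportTable(conn, "messages", "messages.del").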
I'm going to assume you are using Windows and opening your file with Notepad. If that is correct, then it is not really a problem with your output but with the editor you are viewing it with.
1. Try a nicer editor, e.g. Notepad++.
2. Do as the other answers suggest and use System.getProperty("line.separator") or similar.
3. Use a Writer implementation such as PrintWriter.
Personally I prefer "\n" over the system line separator, which on Windows is "\r\n".
EDIT: Added option 3
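Option 3 as a sketch: PrintWriter.println() appends the platform line separator after each line, so no "\n" has to be written by hand. The example writes to a StringWriter to stay self-contained; for the question's file you would wrap a FileWriter instead, e.g. new PrintWriter(new FileWriter(f)). The class and method names are hypothetical.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

public class PrintWriterLines {
    // Writes each line followed by the platform line separator
    static String writeLines(String... lines) {
        StringWriter buffer = new StringWriter();
        PrintWriter pw = new PrintWriter(buffer);
        for (String line : lines) {
            pw.println(line); // println supplies the separator
        }
        pw.close();
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.print(writeLines("Id Priority From Label Subject",
                                    "== ======== ==== ===== ======="));
    }
}
```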

UTF-8 Encoding in java, retrieving data from website

I'm trying to get data from a website which is encoded in UTF-8 and insert it into a MySQL database. The database is also encoded in UTF-8.
This is the method I use to download data from a specific site.
public String download(String url) throws java.io.IOException {
java.io.InputStream s = null;
java.io.InputStreamReader r = null;
StringBuilder content = new StringBuilder();
try {
s = (java.io.InputStream)new URL(url).getContent();
r = new java.io.InputStreamReader(s, "UTF-8");
char[] buffer = new char[4*1024];
int n = 0;
while (n >= 0) {
n = r.read(buffer, 0, buffer.length);
if (n > 0) {
content.append(buffer, 0, n);
}
}
}
finally {
if (r != null) r.close();
if (s != null) s.close();
}
return content.toString();
}
If encoding is set to 'UTF-8' (r = new java.io.InputStreamReader(s, "UTF-8"); ) data inserted into database seems to look OK, but when I try to display it, I am getting something like this: C�te d'Ivoire, instead of Côte d'Ivoire.
All my websites are encoded in UTF-8.
Please help.
If the encoding is set to 'windows-1252' (r = new java.io.InputStreamReader(s, "windows-1252");), everything works fine and I get Côte d'Ivoire on my website, but in Java this title looks like 'C?´te d'Ivoire', which breaks other things, such as links. What does this mean?
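Both symptoms are what happens when bytes are decoded with a different charset than the one they were encoded with. A minimal sketch (hypothetical class and method names) that reproduces them in Java: windows-1252 bytes read as UTF-8 yield the replacement character U+FFFD, and UTF-8 bytes read as windows-1252 turn 'ô' into the two characters 'Ã' and '´'.

```java
import java.nio.charset.Charset;

public class CharsetMismatch {
    static final Charset UTF8 = Charset.forName("UTF-8");
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Encodes text with one charset and decodes the bytes with another
    static String misread(String text, Charset writtenAs, Charset readAs) {
        return new String(text.getBytes(writtenAs), readAs);
    }

    public static void main(String[] args) {
        String title = "C\u00F4te d'Ivoire"; // "Côte d'Ivoire"

        // 'ô' is the single byte 0xF4 in windows-1252, which is not valid UTF-8 here,
        // so the decoder substitutes U+FFFD (shown as a replacement glyph)
        String a = misread(title, CP1252, UTF8);

        // 'ô' is 0xC3 0xB4 in UTF-8; windows-1252 maps those bytes to 'Ã' and '´'
        String b = misread(title, UTF8, CP1252);

        System.out.println(a.indexOf('\uFFFD') >= 0);             // true
        System.out.println(b.equals("C\u00C3\u00B4te d'Ivoire")); // true
    }
}
```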
I would consider using commons-io, which has a function that does what you want to do: link
That is, replace your code with this:
// Requires Apache Commons IO (org.apache.commons.io.IOUtils) and java.net.URL
public String download(String url) throws java.io.IOException {
    java.io.InputStream s = null;
    String content = null;
    try {
        s = (java.io.InputStream) new URL(url).getContent();
        content = IOUtils.toString(s, "UTF-8"); // decode the stream as UTF-8
    } finally {
        if (s != null) s.close();
    }
    return content;
}
If that doesn't do it, check whether you can store the text to a file correctly, to rule out the possibility that your DB isn't set up correctly.
Java
The problem seems to lie in the HttpServletResponse, if you have a servlet or JSP page. Make sure to set your HttpServletResponse encoding to UTF-8.
In a jsp page or in the doGet or doPost of a servlet, before any content is sent to the response, just do :
response.setCharacterEncoding("UTF-8");
PHP
In PHP, try using the utf8_encode function after retrieving from the database.
Is your database encoding set to UTF-8 for the server, client and connection, and have the tables been created with that encoding? Check 'show variables' and 'show create table <one-of-the-tables>'.
If encoding is set to 'UTF-8' (r = new java.io.InputStreamReader(s, "UTF-8"); ) data inserted into database seems to look OK, but when I try to display it, I am getting something like this: C�te d'Ivoire, instead of Côte d'Ivoire.
Thus, the encoding during the display is wrong. How are you displaying it? As per the comments, it's a PHP page? If so, then you need to take two things into account:
1. Write them to the HTTP response output using the same encoding, i.e. UTF-8.
2. Set the content type to UTF-8 so that the web browser knows which encoding to use to display the text.
As per the comments, you have apparently already done 2. That leaves 1: in PHP you need to install the mbstring extension and set mbstring.http_output to UTF-8 as well. I have found this cheatsheet very useful.

How to save Chinese Characters to file with java?

I use the following code to save Chinese characters into a .txt file, but when I opened it with Wordpad, I couldn't read it.
StringBuffer Shanghai_StrBuf = new StringBuffer("\u4E0A\u6D77");
boolean Append = true;
FileOutputStream fos;
fos = new FileOutputStream(FileName, Append);
for (int i = 0;i < Shanghai_StrBuf.length(); i++) {
fos.write(Shanghai_StrBuf.charAt(i));
}
fos.close();
What can I do ? I know if I cut and paste Chinese characters into Wordpad, I can save it into a .txt file. How do I do that in Java ?
There are several factors at work here:
Text files have no intrinsic metadata for describing their encoding (for all the talk of angle-bracket taxes, there are reasons XML is popular)
The default encoding for Windows is still an 8bit (or doublebyte) "ANSI" character set with a limited range of values - text files written in this format are not portable
To tell a Unicode file from an ANSI file, Windows apps rely on the presence of a byte order mark at the start of the file (not strictly true - Raymond Chen explains). In theory, the BOM is there to tell you the endianness (byte order) of the data. For UTF-8, even though there is only one byte order, Windows apps rely on the marker bytes to automatically figure out that it is Unicode (though you'll note that Notepad has an encoding option on its open/save dialogs).
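To make the marker concrete: writing the character U+FEFF through a UTF-8 writer produces the three bytes EF BB BF, which is what Windows apps look for at the start of the file. A sketch that encodes 上海 in memory with and without that prefix (class and method names are hypothetical):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

public class Utf8Bom {
    // Encodes text as UTF-8, optionally preceded by U+FEFF (the byte order mark)
    static byte[] encode(String text, boolean withBom) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        Writer writer = new OutputStreamWriter(bytes, "UTF-8");
        if (withBom) {
            writer.write('\uFEFF'); // becomes EF BB BF in UTF-8
        }
        writer.write(text);
        writer.close();
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        for (byte b : encode("\u4E0A\u6D77", true)) { // 上海
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println(); // prints EF BB BF E4 B8 8A E6 B5 B7
    }
}
```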
It is wrong to say that Java is broken because it does not write a UTF-8 BOM automatically. On Unix systems, it would be an error to write a BOM to a script file, for example, and many Unix systems use UTF-8 as their default encoding. There are times when you don't want it on Windows, either, like when you're appending data to an existing file: fos = new FileOutputStream(FileName,Append);
Here is a method of reliably appending UTF-8 data to a file:
private static void writeUtf8ToFile(File file, boolean append, String data)
throws IOException {
boolean skipBOM = append && file.isFile() && (file.length() > 0);
Closer res = new Closer();
try {
OutputStream out = res.using(new FileOutputStream(file, append));
Writer writer = res.using(new OutputStreamWriter(out, Charset
.forName("UTF-8")));
if (!skipBOM) {
writer.write('\uFEFF');
}
writer.write(data);
} finally {
res.close();
}
}
Usage:
public static void main(String[] args) throws IOException {
String chinese = "\u4E0A\u6D77";
boolean append = true;
writeUtf8ToFile(new File("chinese.txt"), append, chinese);
}
Note: if the file already existed and you chose to append and existing data wasn't UTF-8 encoded, the only thing that code will create is a mess.
Here is the Closer type used in this code:
public class Closer implements Closeable {
private Closeable closeable;
public <T extends Closeable> T using(T t) {
closeable = t;
return t;
}
@Override public void close() throws IOException {
if (closeable != null) {
closeable.close();
}
}
}
This code makes a Windows-style best guess about how to read the file based on byte order marks:
private static final Charset[] UTF_ENCODINGS = { Charset.forName("UTF-8"),
Charset.forName("UTF-16LE"), Charset.forName("UTF-16BE") };
private static Charset getEncoding(InputStream in) throws IOException {
charsetLoop: for (Charset encodings : UTF_ENCODINGS) {
byte[] bom = "\uFEFF".getBytes(encodings);
in.mark(bom.length);
for (byte b : bom) {
if ((0xFF & b) != in.read()) {
in.reset();
continue charsetLoop;
}
}
return encodings;
}
return Charset.defaultCharset();
}
private static String readText(File file) throws IOException {
Closer res = new Closer();
try {
InputStream in = res.using(new FileInputStream(file));
InputStream bin = res.using(new BufferedInputStream(in));
Reader reader = res.using(new InputStreamReader(bin, getEncoding(bin)));
StringBuilder out = new StringBuilder();
for (int ch = reader.read(); ch != -1; ch = reader.read())
out.append((char) ch);
return out.toString();
} finally {
res.close();
}
}
Usage:
public static void main(String[] args) throws IOException {
System.out.println(readText(new File("chinese.txt")));
}
(System.out uses the default encoding, so whether it prints anything sensible depends on your platform and configuration.)
If you can rely that the default character encoding is UTF-8 (or some other Unicode encoding), you may use the following:
Writer w = new FileWriter("test.txt");
w.append("上海");
w.close();
The safest way is to always explicitly specify the encoding:
Writer w = new OutputStreamWriter(new FileOutputStream("test.txt"), "UTF-8");
w.append("上海");
w.close();
P.S. You may use any Unicode characters in Java source code, even as method and variable names, if the -encoding parameter for javac is configured right. That makes the source code more readable than the escaped \uXXXX form.
Be very careful with the approaches proposed. Even specifying the encoding for the file as follows:
Writer w = new OutputStreamWriter(new FileOutputStream("test.txt"), "UTF-8");
will not work if you're running under an operating system like Windows. Even setting the system property for file.encoding to UTF-8 does not fix the issue. This is because Java fails to write a byte order mark (BOM) for the file. Even if you specify the encoding when writing out to a file, opening the same file in an application like Wordpad will display the text as garbage because it doesn't detect the BOM. I tried running the examples here in Windows (with a platform/container encoding of CP1252).
The following bug exists to describe the issue in Java:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4508058
The solution for the time being is to write the byte order mark yourself to ensure the file opens correctly in other applications. See this for more details on the BOM:
http://mindprod.com/jgloss/bom.html
and for a more correct solution see the following link:
http://tripoverit.blogspot.com/2007/04/javas-utf-8-and-unicode-writing-is.html
Here's one way among many. Basically, we're just specifying that the conversion be done to UTF-8 before outputting bytes to the FileOutputStream:
String FileName = "output.txt";
StringBuffer Shanghai_StrBuf=new StringBuffer("\u4E0A\u6D77");
boolean Append=true;
Writer writer = new OutputStreamWriter(new FileOutputStream(FileName,Append), "UTF-8");
writer.write(Shanghai_StrBuf.toString(), 0, Shanghai_StrBuf.length());
writer.close();
I manually verified this against the images at http://www.fileformat.info/info/unicode/char/ . In the future, please follow Java coding standards, including lower-case variable names. It improves readability.
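To see why the question's loop produced unreadable output while the OutputStreamWriter approach works: OutputStream.write(int) keeps only the low 8 bits of its argument, so U+4E0A is written as 0x0A (a line feed) and U+6D77 as 0x77 ('w'), whereas an OutputStreamWriter encodes each character into its full UTF-8 byte sequence. A sketch comparing both against an in-memory stream (class and method names are hypothetical):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

public class RawWriteVsWriter {
    // What the question's loop does: write(int) keeps only the low byte of each char
    static byte[] rawWrite(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            out.write(s.charAt(i)); // U+4E0A -> 0x0A, U+6D77 -> 0x77
        }
        return out.toByteArray();
    }

    // What the answers do: the writer encodes each char as UTF-8 bytes
    static byte[] encodedWrite(String s) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Writer writer = new OutputStreamWriter(out, "UTF-8");
        writer.write(s);
        writer.close();
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String shanghai = "\u4E0A\u6D77";
        System.out.println(rawWrite(shanghai).length);     // 2
        System.out.println(encodedWrite(shanghai).length); // 6
    }
}
```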
Try this,
StringBuffer Shanghai_StrBuf=new StringBuffer("\u4E0A\u6D77");
boolean Append=true;
Writer out = new BufferedWriter(new OutputStreamWriter(
new FileOutputStream(FileName,Append), "UTF8"));
for (int i=0;i<Shanghai_StrBuf.length();i++) out.write(Shanghai_StrBuf.charAt(i));
out.close();
