Encoding Problems - java

I'm having an issue with a Java string used for emails in a Java source file. The string contains "Protégé". As far as I have been able to determine, our server environment uses UTF-8.
So I converted it to "ProtÃ©gÃ©" for UTF-8. It works great on our server, but when I run it locally it isn't translated properly. I changed Eclipse to use UTF-8 under Preferences, but it still isn't translated locally and still shows "ProtÃ©gÃ©". Any ideas?
From the comments:
I ran this locally and on our server:
OutputStreamWriter out = new OutputStreamWriter(new ByteArrayOutputStream());
System.out.println(out.getEncoding());
And it displays Cp1252 locally and UTF-8 on our JBoss server. We originally had the string with "Protégé", but on JBoss it only shows "Prot".
When I use "Prot\u00e9g\u00e9" it works fine locally, but when run on our server it shows "Protg".

If the string contains "Prot\u00e9g\u00e9", that rules out a compiler encoding problem (as alluded to by SyntaxT3rr0r), since the Java String itself is now correct (unless there is a compiler bug, which I would not assume).
So the problem lies somewhere between output, transfer and display. How do you look at the output from your server? It could be that some recoding step along the way destroys your strings, or that some output is declared with the wrong encoding.
If you are using a terminal/command window to look at the output, consider setting it to UTF-8 before connecting to the server.
And yes, Java uses UTF-16 internally for strings, but it uses a system-dependent encoding both as the compiler default and as the default encoding of OutputStreamWriter/InputStreamReader and several other APIs that convert between strings and bytes. It looks like this is UTF-8 on the server and Windows-1252 on your client system. This should not really matter here.
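To illustrate the kind of recoding damage described above, here is a minimal sketch using only the JDK. The class and method names (CharsetMismatchDemo, recode) are made up for this example; it simply encodes a string with one charset and decodes the bytes with another, which is what happens when a writer and a reader disagree about the encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows what happens when the charset used to
// write bytes differs from the charset used to read them back.
final class CharsetMismatchDemo {

    // Encode 'text' with the writer's charset, then decode with the reader's.
    static String recode(String text, Charset writer, Charset reader) {
        byte[] bytes = text.getBytes(writer);
        return new String(bytes, reader);
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        Charset utf8 = StandardCharsets.UTF_8;
        // Escapes keep this source file independent of the editor's encoding.
        String original = "Prot\u00e9g\u00e9";

        // Matching charsets: the string survives unchanged.
        System.out.println(recode(original, utf8, utf8));
        // UTF-8 bytes read as Windows-1252: each accented letter becomes two characters.
        System.out.println(recode(original, utf8, cp1252));
        // Windows-1252 bytes read as UTF-8: the accented letters are no longer valid.
        System.out.println(recode(original, cp1252, utf8));
    }
}
```

Passing the charset explicitly to APIs such as OutputStreamWriter or String.getBytes, instead of relying on the platform default, avoids this class of mismatch between the Cp1252 client and the UTF-8 server.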

Try this:
MimeMessage msg = new MimeMessage(session);
MimeBodyPart mbp1 = new MimeBodyPart();
mbp1.setDataHandler(new DataHandler(new ByteArrayDataSource(message.toString(), "text/html")));
mbp1.setContent(new String(message.getBytes("UTF-8"),"ISO-8859-1"), "text/html");
Multipart mp = new MimeMultipart();
mp.addBodyPart(mbp1);
msg.setContent(mp, "text/html");
Put your language's character set in place of "ISO-8859-1".

Related

How to resolve UTF-8 encoding in JSP on tomcat server? [duplicate]

I am using the JavaMail API to send emails. I have a contact form whose input has to be sent to a specific email address.
The email is sent without problems, but I am Danish, so I need the three Danish characters 'æ', 'ø' and 'å' in the subject and the email text.
I have read that I can use the UTF-8 character encoding to support these characters, but when my mail is sent I only see some strange letters ('ã¦', 'ã¸' and 'ã¥') instead of the Danish letters 'æ', 'ø' and 'å'.
My method for sending the email looks like this:
public void sendEmail(String name, String fromEmail, String subject, String message) throws AddressException, MessagingException, UnsupportedEncodingException, SendFailedException
{
    //Set Mail properties
    Properties props = System.getProperties();
    props.setProperty("mail.smtp.starttls.enable", "true");
    props.setProperty("mail.smtp.host", "smtp.gmail.com");
    props.setProperty("mail.smtp.socketFactory.port", "465");
    props.setProperty("mail.smtp.socketFactory.class", "javax.net.ssl.SSLSocketFactory");
    props.setProperty("mail.smtp.auth", "true");
    props.setProperty("mail.smtp.port", "465");
    Session session = Session.getDefaultInstance(props, new javax.mail.Authenticator() {
        @Override
        protected PasswordAuthentication getPasswordAuthentication() {
            return new PasswordAuthentication("my_username", "my_password");
        }
    });
    //Create the email with variable input
    MimeMessage mimeMessage = new MimeMessage(session);
    mimeMessage.setHeader("Content-Type", "text/plain; charset=UTF-8");
    mimeMessage.setFrom(new InternetAddress(fromEmail, name));
    mimeMessage.setRecipient(Message.RecipientType.TO, new InternetAddress("my_email"));
    mimeMessage.setSubject(subject, "utf-8");
    mimeMessage.setContent(message, "text/plain");
    //Send the email
    Transport.send(mimeMessage);
}
Please help me find out how I can correct this 'error'.
For all e-mails
There are a couple of system properties related to mailing that can probably simplify your code. The one that matters here is "mail.mime.charset".
The mail.mime.charset System property can be used to specify the default MIME charset to use for encoded words and text parts that don't otherwise specify a charset. Normally, the default MIME charset is derived from the default Java charset, as specified in the file.encoding System property. Most applications will have no need to explicitly set the default MIME charset. In cases where the default MIME charset to be used for mail messages is different than the charset used for files stored on the system, this property should be set.
As you can read above, mail.mime.charset has no value by default, and the file encoding (the file.encoding property) is used instead.
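To make the term "encoded words" from the quote above a bit more concrete: a non-ASCII subject travels in the mail header as an RFC 2047 encoded word of the form =?charset?encoding?payload?=. The sketch below is only an illustration of that wire format with Base64 ("B") encoding, built on the JDK alone. It is not how JavaMail implements it (JavaMail may pick "Q" encoding and splits long headers), and the names EncodedWordSketch, encodeWord and decodeWord are made up for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical illustration of the RFC 2047 "encoded word" header format.
// It handles a single short UTF-8/Base64 word only; a real mail library
// also deals with line-length limits and the alternative "Q" encoding.
final class EncodedWordSketch {

    private static final String PREFIX = "=?UTF-8?B?";
    private static final String SUFFIX = "?=";

    // Wrap the UTF-8 bytes of 'text' as a Base64 encoded word.
    static String encodeWord(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return PREFIX + Base64.getEncoder().encodeToString(utf8) + SUFFIX;
    }

    // Reverse of encodeWord, for words produced by this sketch.
    static String decodeWord(String word) {
        String payload = word.substring(PREFIX.length(), word.length() - SUFFIX.length());
        return new String(Base64.getDecoder().decode(payload), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String subject = "\u00e6\u00f8\u00e5"; // the three Danish letters
        String header = encodeWord(subject);
        System.out.println(header);            // prints =?UTF-8?B?w6bDuMOl?=
        System.out.println(decodeWord(header).equals(subject)); // prints true
    }
}
```

The charset named inside the encoded word is what mail.mime.charset (or an explicit charset argument) controls. If it does not match the bytes that follow, the receiving mail client shows garbled characters.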
For a specific e-mail
However, if you want to specify an encoding for one particular e-mail, then you should probably use the two-parameter setSubject(subject, charset) and setText(text, charset) methods.
If that doesn't work, then your input was probably already corrupted before it reached this point. In other words, you probably used the wrong encoding to collect your data.
Mime types are complicated
setContent(content, "UTF-8") (as other sources claim) will simply not work. Just look at the signature of this method: setContent(Object content, String mimetype). MIME type and charset are two totally different things. IMHO, you should really be using one of the setText(...) methods with a charset parameter.
But if you insist on using a MIME type to set the charset via setContent(content, mimetype), then use the correct format (not just "UTF-8", but something like "text/plain; charset=UTF-8"). More importantly, be aware that every MIME type has its own way of handling charsets.
As specified in RFC 2046, the default charset for text/plain is US-ASCII, but it can be overridden with an additional charset parameter.
However, RFC 6657 makes clear that the text/xml type determines the charset from the content of the message. The charset parameter will just be ignored here.
And RFC 2854 states that text/html should really always specify a charset. But if you don't, then it will use ISO-8859-1 (= Latin-1).
Maybe you should also specify UTF-8 here:
mimeMessage.setContent(message, "text/plain; charset=UTF-8");
Have a look at http://www.coderanch.com/t/274480/java/java/JavaMail-set-content-utf
After spending a lot of time debugging and searching the internet for a clue, I have found a solution to my problem.
It seems that whenever I sent data through a web request, my application didn't decode the characters as UTF-8. This meant that the data sent from my contact form, which contained the characters æ, ø and å, wasn't handled correctly.
The solution was to set up a character encoding filter when registering my servlet configuration, which forces all incoming web requests to use the UTF-8 character encoding.
private void registerCharacterEncodingFilter(ServletContext servletContext) {
    CharacterEncodingFilter encodingFilter = new CharacterEncodingFilter();
    encodingFilter.setEncoding("UTF-8");
    encodingFilter.setForceEncoding(true);
    FilterRegistration.Dynamic characterEncodingFilter = servletContext.addFilter("characterEncodingFilter", encodingFilter);
    characterEncodingFilter.addMappingForUrlPatterns(null, false, "/*");
}
This filter sets the encoding to UTF-8 and forces that encoding on all requests matching the URL pattern '/*'.
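As a rough illustration of why the request encoding matters, the sketch below decodes the same percent-encoded form value twice with the JDK's URLDecoder: once as UTF-8 and once as ISO-8859-1, which servlet containers have traditionally used when no request encoding is set. The class and method names (FormDecodingDemo, decode) are invented for this example, and the percent-encoded value is just what a browser would typically send for "æøå" from a UTF-8 page.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo: the same form bytes decoded with two different charsets.
final class FormDecodingDemo {

    // Decode a percent-encoded form value using the named charset.
    static String decode(String raw, String charsetName) {
        try {
            return URLDecoder.decode(raw, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // UTF-8 bytes of the three Danish letters, percent-encoded.
        String raw = "%C3%A6%C3%B8%C3%A5";

        // Decoded as UTF-8: three characters, as the user typed them.
        System.out.println(decode(raw, "UTF-8"));
        // Decoded as ISO-8859-1: six characters, the typical garbled form.
        System.out.println(decode(raw, "ISO-8859-1"));
    }
}
```

Once the parameter has been decoded with the wrong charset, setting UTF-8 on the MimeMessage cannot help, because the String handed to JavaMail is already wrong.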
It's easy:
run your project with the parameter -Dfile.encoding=UTF-8
For example:
java -Dfile.encoding=UTF-8 -jar MyProject.jar
Before passing your String to the send method, you must make sure it has been decoded as UTF-8.
If you are receiving it as a "request" parameter, you can use "setCharacterEncoding":
request.setCharacterEncoding("utf-8");
String subject = request.getParameter("subject");
String content = request.getParameter("content");
...
MimeMessage mineMessage = new MimeMessage(session);
mineMessage.setFrom(new InternetAddress(myAccountEmail));
mineMessage.setRecipient(Message.RecipientType.TO, new InternetAddress(recepient));
mineMessage.setSubject(subject, "UTF-8");
mineMessage.setContent(content, "text/plain;charset=UTF-8");
Otherwise, convert your String into UTF-8 as follows:
subject = new String(subject.getBytes(Charset.forName("ISO-8859-1")), Charset.forName("UTF-8"));
content = new String(content.getBytes(Charset.forName("ISO-8859-1")), Charset.forName("UTF-8"));
...
MimeMessage mineMessage = new MimeMessage(session);
mineMessage.setFrom(new InternetAddress(myAccountEmail));
mineMessage.setRecipient(Message.RecipientType.TO, new InternetAddress(recepient));
mineMessage.setSubject(subject, "UTF-8");
mineMessage.setContent(content, "text/plain;charset=UTF-8");
This is the result in Spanish.
mimeMessage.setContent(mail.getBody(), "text/html; charset=UTF-8");
Maybe I am wrong, but this works for me. :) Any ööö, äää, üüü characters are shown correctly in my Outlook.
I know I'm late to this question, but I had a similar problem just now.
It may be worth checking your source encodings too! I was using a test class with a hardcoded subject/text containing some special characters, which kept coming out garbled when sending the email, even though I had set the charset to UTF-8 wherever applicable (mimeMessage.setSubject(subject, charset), mimeMessage.setContent(content, "text/plain; charset=UTF-8")).
Then I noticed that the source encoding of this class was windows-1252. From my understanding, when a Java file is compiled, any source texts are converted to UTF-8. But in this case, the project.build.sourceEncoding property was missing from the project's Maven pom.xml, so I'm actually not sure which encoding Maven was using by default during compilation.
Changing the source encoding was not possible here, but as soon as I changed the special characters to Unicode escapes (e.g. "ü" to "\u00fc"), the whole thing worked fine.
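For anyone who wants to apply the same workaround to more than a handful of characters, here is a small sketch of a helper that rewrites the non-ASCII characters of a string as Java Unicode escapes. The class and method names (UnicodeEscapes, escapeNonAscii) are invented for this example. The result can be pasted into a string literal and compiles the same way regardless of the source file encoding.

```java
// Hypothetical helper: turns non-ASCII characters into Java Unicode escapes
// so that string literals no longer depend on the source file encoding.
final class UnicodeEscapes {

    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.append(c); // plain ASCII is copied unchanged
            } else {
                // One escape per UTF-16 code unit, which is what Java source expects.
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The input is itself written with escapes, so this file is pure ASCII.
        String name = "Prot\u00e9g\u00e9";
        System.out.println(name.length());        // prints 7
        System.out.println(escapeNonAscii(name)); // prints the literal with each accented letter escaped
    }
}
```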
Maybe it is too late, but there is a very simple way to fix this problem.
Just call this constructor to create a MimeMessageHelper that encodes in UTF-8 as we expect:
MimeMessage mimeMessage = mailSender.createMimeMessage();
MimeMessageHelper helper = new MimeMessageHelper(mimeMessage, false, "UTF-8"); // pass true instead of false if you want multipart support
No further action is needed; continue the mail sending flow as you wish.

How to get UTF-8 conversion for a string

Frédéric in Java gets converted to FrÃ©dÃ©ric.
However, I need to pass the proper string to my client.
How can I achieve this in Java?
I tried:
String a = "FrÃ©dÃ©ric";
String b = new String(a.getBytes(), "UTF-8");
However, string b contains the same value as a.
I am expecting the string to store the value Frédéric.
How can I pass this value properly to the client?
If I understand the question correctly, you're looking for a function that will repair strings that have been damaged by others' encoding mistakes?
Here's one that seems to work on the example you gave:
static String fix(String badInput) {
    byte[] bytes = badInput.getBytes(Charset.forName("cp1252"));
    return new String(bytes, Charset.forName("UTF-8"));
}
// fix("FrÃ©dÃ©ric") returns "Frédéric"
The answer is quite complicated. See http://www.joelonsoftware.com/articles/Unicode.html for a basic understanding.
My first suggestion would be to save your Java file as UTF-8. The default for Eclipse on Windows is cp1252, which might be your problem. Hope this helps.
Find your language code here and use that.
String a = new String(yourString.getBytes(), YOUR_ENCODING);
You can also try:
String a = URLEncoder.encode(yourString, HTTP.YOUR_ENCODING);
If System.out.println("Frédéric") shows garbled output on the console, most likely the encoding used in your source code (it seems to be UTF-8) is not the same as the one used by the compiler, which by default is the platform encoding, so probably some flavor of ISO-8859. Try compiling your source with javac -encoding UTF-8 (or set the appropriate property of your build environment) and you should be OK.
If you are sending this to some other piece of client software, it's most likely an encoding issue on the client side.

Java mail PDF attachment not working

I am generating a PDF in Java and trying both to attach it to a mail and to download it from the browser. The browser download works fine, but attaching it to a mail is where I am facing an issue. The file is attached, and the attachment name and file size are intact. The problem is that when I open the PDF from the mail attachment, it shows nothing: the correct number of pages, but with no content. When I attach the file downloaded from the browser by hardcoding it, it works fine, so I suppose the problem is not with the PDF generation.
I tried comparing both files (the one downloaded from the browser and the one downloaded from the mail) with the comparison tool Beyond Compare. The one downloaded from the mail shows a conversion error. When I open them in Notepad++, they show different encodings. I am not very familiar with encodings, but I suppose it has something to do with that.
I also observed that the content of the mail download is the same as the content at PDF generation, but the content of the browser download is different.
An excerpt of what I get from the browser download is below (the content is too large to paste in full):
%PDF-1.4
%âãÏÓ
4 0 obj <</Type/XObject/ColorSpace/DeviceRGB/Subtype/Image/BitsPerComponent 8/Width 193/Length 11222/Height 58/Filter/DCTDecode>>stream
ÿØÿà
An excerpt of what I get on mail download is as below
%PDF-1.4
%????
4 0 obj <</Type/XObject/ColorSpace/DeviceRGB/Subtype/Image/BitsPerComponent 8/Width 193/Length 11222/Height 58/Filter/DCTDecode>>stream
????
I am using Spring's MimeMessageHelper to send the message, and the call below to add the attachment:
MimeMessageHelper.addAttachment(fileName, new ByteArrayResource(attachmentContent.getBytes()), "application/pdf");
I've also tried another way of attaching but in vain
DataSource dataSource = new ByteArrayDataSource(bytes, "application/pdf");
MimeBodyPart pdfBodyPart = new MimeBodyPart();
pdfBodyPart.addHeader("Content-Type", "application/pdf;charset=UTF-8");
pdfBodyPart.addHeader("Content-disposition", "attachment; filename="+fileName);
pdfBodyPart.setDataHandler(new DataHandler(dataSource));
pdfBodyPart.setFileName(fileName);
mimeMessageHelper.getMimeMultipart().addBodyPart(pdfBodyPart);
Any help would be greatly appreciated. Thanks in advance
I'm not sure if this has anything to do with it, but I noticed you're not setting the actual charset in pdfBodyPart.addHeader("Content-Type", "application/pdf;charset");, nor are you calling attachmentContent.getBytes() with a charset as parameter. How is it supposed to know which one you want to use?
What Content-Transfer-Encoding is being used for the attachment in the message you receive? Normally JavaMail will choose an appropriate value, but if the document contains an unusual mix of plain text and binary, as yours seems to, JavaMail may not choose the best encoding. You can try adding pdfBodyPart.setHeader("Content-Transfer-Encoding", "base64");
I found out why it wasn't working. It is an encoding issue, but it has nothing to do with MimeMessageHelper. The problem was that I generated the PDF to an OutputStream, converted it to a String, and then converted that into a byte array. Converting it to a String changed the encoding, which caused the issue. So I fixed it by getting the byte array directly from the OutputStream :)
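The effect described in that last answer can be reproduced with a few bytes. The sketch below is only an illustration: the class and method names are made up, the four sample bytes are the binary marker from the PDF excerpt in the question, and US-ASCII is chosen here just to make the result deterministic (the charset actually involved in the original code is not known). Any byte that is not valid in the charset is lost when going through a String, which matches the "%????" seen in the mail download.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo: binary data does not survive a detour through String.
final class BinaryThroughStringDemo {

    // The lossy route: bytes -> String -> bytes.
    static byte[] roundTripThroughString(byte[] data) {
        String asText = new String(data, StandardCharsets.US_ASCII);
        return asText.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // The four high bytes that follow "%PDF-1.4" in the excerpt above.
        byte[] binary = {(byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3};

        byte[] damaged = roundTripThroughString(binary);
        System.out.println(Arrays.equals(binary, damaged)); // prints false
        System.out.println(new String(damaged, StandardCharsets.US_ASCII)); // prints ????

        // The safe route: keep the data as bytes from start to finish.
        ByteArrayOutputStream pdfOut = new ByteArrayOutputStream();
        pdfOut.write(binary, 0, binary.length);
        byte[] intact = pdfOut.toByteArray();
        System.out.println(Arrays.equals(binary, intact)); // prints true
    }
}
```

With the bytes taken straight from the ByteArrayOutputStream, they can be handed to ByteArrayResource or ByteArrayDataSource without any charset being involved.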

How to print "rājshāhi" to the Eclipse output console?

I have tried the following:
System.out.println("rājshāhi");
new PrintWriter(new OutputStreamWriter(System.out), true).println("rājshāhi");
new PrintWriter(new OutputStreamWriter(System.out, "UTF-8"), true).println("rājshāhi");
new PrintWriter(new OutputStreamWriter(System.out, "ISO-8859-1"), true).println("rājshāhi");
Which yields the following output:
r?jsh?hi
r?jsh?hi
rÄ?jshÄ?hi
r?jsh?hi
So, what am I doing wrong?
Thanks.
P.S.
I am using Eclipse Indigo on Windows 7. The output goes to the Eclipse output console.
The Java file must be encoded correctly. Look in the properties for that file and set the encoding correctly.
What you did should work, even the simple System.out.println, if you have a recent version of Eclipse.
Look at the following:
The version of Eclipse you are using
Whether the file is encoded correctly. See @Matthew's answer. I assume this is the case, because otherwise Eclipse wouldn't allow you to save the file (it would warn about "unsupported characters")
The font for the console (Window -> Preferences -> Fonts -> Default Console Font)
Whether you get the characters correctly when you save the text to a file
Actually, copying your code and running it on my computer gave me the following output:
rājshāhi
rājshāhi
rājshāhi
r?jsh?hi
It looks like all lines work except the last one. Get your System default character set (see this answer). Mine is UTF-8. See if changing your default character set makes a difference.
Either of the following lines will get your default character set:
System.out.println(System.getProperty("file.encoding"));
System.out.println(Charset.defaultCharset());
To change the default encoding, see this answer.
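If changing the default is not an option, the stream can also be given an explicit charset. The sketch below is a hedged illustration with made-up names (ConsoleEncodingDemo, printed): it captures what a PrintStream writes for a given charset, which shows why the ISO-8859-1 variant can only ever print "r?jsh?hi", and it wraps System.out in a UTF-8 PrintStream. Whether the characters then display correctly still depends on the console itself being set to UTF-8, which this code cannot control.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

// Hypothetical demo: the bytes a PrintStream emits depend on its charset.
final class ConsoleEncodingDemo {

    // Returns the raw bytes written when 'text' is printed using the named charset.
    static byte[] printed(String text, String charsetName) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            PrintStream out = new PrintStream(sink, true, charsetName);
            out.print(text);
            out.flush();
            return sink.toByteArray();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Escapes are used so the result does not depend on this file's encoding.
        String word = "r\u0101jsh\u0101hi";

        System.out.println(printed(word, "UTF-8").length);      // prints 10
        System.out.println(printed(word, "ISO-8859-1").length); // prints 8

        // Explicit UTF-8 output; readable only if the console decodes UTF-8.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println(word);
    }
}
```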
Make sure that, when you create your class, you set its Text File Encoding to UTF-8.
Once a class has been created with any other Text File Encoding, you can't change the encoding later on; Eclipse will let you change it, but the change won't take effect.
So create a new class with Text File Encoding UTF-8. It will definitely work.
EDIT: In your case, although you are trying to set the text file encoding programmatically, it has no effect; the container-inherited encoding (Cp1252) is used.
Using the latest Eclipse version helped me get UTF-8 encoding on the console.
I used the Luna version of Eclipse and set Properties -> Info -> Others -> UTF-8.
