Java mail vs. MySQL: probably a character encoding issue? - java

I'm currently fetching data from a MySQL database using JDBC and executeQuery. One of the fields contains the email content, which I fetch via ResultSet.getString("emailBody").
The mail is sent using the following code (simplified):
Properties props = new Properties();
Session session;
Message message;
props.put("mail.smtp.host", "mysmtpserver");
session = Session.getInstance(props, null);
message = new MimeMessage(session);
message.setFrom(new InternetAddress("myaddress@example.com", "System"));
message.setSubject("Automatic notification");
message.setRecipient(RecipientType.BCC,
new InternetAddress("admin@example.com", "Admin Distribution List"));
// email contains the previously fetched value
message.setContent(email, "text/plain");
Transport.send(message);
This works fine for most characters, including German umlauts, brackets, etc. Unfortunately, the following characters fail:
– which is displayed as ? on the mail clients
" which becomes \"
' which is sent as \'
I couldn't find anything useful on the web, please advise. Many thanks!

Your mail is probably sent encoded as ISO-8859-1, which does not include a code point for the en dash. You could try specifying the charset as UTF-8 in the setContent call:
message.setContent(email, "text/plain; charset=utf-8");
This does not, however, explain the problem with quotes you are seeing; I suspect these are actually two different problems.
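To check the charset theory without sending any mail, a minimal sketch along these lines may help. It assumes only the JDK; the class and method names (CharsetCheck, fits, roundTrip) are made up for illustration. It asks the ISO-8859-1 encoder whether a string is representable and shows what an encode/decode round trip does to the en dash.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of JavaMail.
final class CharsetCheck {
    // True if every character of text can be represented in the given charset.
    static boolean fits(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    // Encode and decode again; unmappable characters come back as '?'.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String body = "M\u00FCnchen \u2013 Berlin"; // u-umlaut plus an en dash
        System.out.println(fits(body, StandardCharsets.ISO_8859_1));
        System.out.println(roundTrip(body, StandardCharsets.ISO_8859_1));
        System.out.println(roundTrip(body, StandardCharsets.UTF_8).equals(body));
    }
}
```

The umlaut survives the ISO-8859-1 round trip while the en dash is replaced, which matches the symptom described in the question.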

" turning into \" and ' turning into \' is an escaping issue. The values were escaped during insert so that they would not break the SQL INSERT query. During select you have to unescape them. (I don't know the specific Java functions...)

The quotes problem is happening because at some point the strings are being escaped one too many times.
If you select the strings from the database manually, does this return quotes visually escaped? If so, you're escaping too many times before inserting into the database.
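If the stored rows already contain the extra backslashes, a stopgap is to undo one level of escaping after reading them. The sketch below is only an illustration: QuoteRepair and unescapeOnce are hypothetical names, and it assumes the only escapes present are \" , \' and \\ . The longer-term fix is usually to stop escaping by hand and pass values as PreparedStatement parameters, so nothing is escaped twice.

```java
// Hypothetical helper for repairing strings that were escaped once too often.
final class QuoteRepair {
    // Undo one level of backslash escaping for quotes and backslashes.
    static String unescapeOnce(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                char next = s.charAt(i + 1);
                if (next == '"' || next == '\'' || next == '\\') {
                    out.append(next); // keep the escaped character, drop the backslash
                    i++;
                    continue;
                }
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescapeOnce("She said \\\"it\\'s fine\\\""));
    }
}
```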

Related

Filter Special Characters in Spring / Java

I'm using jsoup to get all text from websites.
Document doc = Jsoup.connect("URL").get();
String allText = doc.text().toLowerCase();
Then I'm using Hibernate to persist the object that holds all text to a MySQL DB:
...
@Column(name="all_text")
@Lob
private String allText = null;
...
Everything is good so far. Only that sometimes I get a MySQL error when I try to save the object with allText:
java.sql.SQLException: Incorrect string value: '\xF0\x9F\x98\x8A s...' for column 'all_text' at row 1
Already looked this up and it's an encoding error. Probably have some special characters on their websites. I found a way to fix this by changing the encoding in the DB.
But my actual question is: what's the best way to filter and remove the special characters from the allText string and not persist them at all?
EDIT: To clarify, by special characters I mean Emoticons and all that stuff. Definitely anything that doesn't fit into UTF-8 encoding. I'm not concerned about ~ ^ etc...
Thanks in advance!
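Just use a regex:
allText = allText.replaceAll("\\p{C}", "");
Strings are immutable, so assign the result. String.replaceAll needs no import; java.util.regex.Pattern is only required if you precompile the pattern.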
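The bytes in the error message (F0 9F 98 8A) are the UTF-8 encoding of U+1F60A, a code point outside the Basic Multilingual Plane. As far as I can tell, \p{C} targets control, format and similar "other" characters and does not match an assigned emoji on current Java versions, so a filter on the code point range may be closer to what the question asks for. The sketch below assumes the column uses MySQL's 3-byte utf8 charset (which stores only U+0000 to U+FFFF); BmpFilter and stripSupplementary are hypothetical names.

```java
// Hypothetical helper: drops everything MySQL's 3-byte utf8 cannot store.
final class BmpFilter {
    // Keep only BMP code points; drop supplementary code points (emoji etc.)
    // and any unpaired surrogate halves.
    static String stripSupplementary(String text) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints()
            .filter(cp -> Character.isBmpCodePoint(cp) && !Character.isSurrogate((char) cp))
            .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        // \uD83D\uDE0A is the surrogate pair for U+1F60A
        System.out.println(stripSupplementary("nice \uD83D\uDE0A site"));
    }
}
```

Umlauts and other BMP characters pass through unchanged. If the emoji should be kept instead, switching the column to utf8mb4 is the alternative the question already mentions.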

Java: Search in a wrong encoded String without modifying it

I have to find a user-defined String in a Document (using Java), which is stored in a database in a BLOB. When I search for a String with special characters (umlauts such as äöü), it fails: it does not return any positions at all. I am also not allowed to convert the document's content into UTF-8 (which would fix this problem but raise a new, even bigger one).
Some additional information:
The document's content is returned as String in "ISO-8859-1" (Latin1).
Here is an example, what a String could look like:
Die Erkenntnis, daÃ der KÃ¼nstler Schutz braucht, ...
This is how it should look:
Die Erkenntnis, daß der Künstler Schutz braucht, ...
If I search for Künstler, it fails to find it, because it looks for ü but only finds Ã¼.
Is it possible to convert Künstler into KÃ¼nstler so I can search for the wrongly encoded version instead?
Note:
We are using the Hibernate Framework for database access. The original getter for the Document's content returns a byte[]. The String is then returned by calling
new String(getContent(), "ISO-8859-1")
The problem here is, that I cannot change this to UTF-8, because it would then mess up the rest of our application which is based on a third party application that delivers data this way.
Okay, looks like I've found a way to mess up the encoding on purpose.
new String("Künstler".getBytes("UTF-8"), "ISO-8859-1")
By getting the bytes of the String Künstler in UTF-8 and then creating a new String while telling Java that those bytes are Latin1, it converts to KÃ¼nstler. It's a hell of a hack, but it seems to work well.
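Wrapped into a small helper, the hack could look like the sketch below. MojibakeSearch, asMisdecoded and indexOfTerm are hypothetical names; the only assumption is the one stated in the question, namely that the document is UTF-8 data decoded as ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for searching inside text that was decoded with the wrong charset.
final class MojibakeSearch {
    // Re-read the UTF-8 bytes of a term as Latin-1, reproducing how the stored text looks.
    static String asMisdecoded(String term) {
        return new String(term.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Position of the term inside the misdecoded document, or -1 if it is absent.
    static int indexOfTerm(String misdecodedDocument, String term) {
        return misdecodedDocument.indexOf(asMisdecoded(term));
    }

    public static void main(String[] args) {
        String document = asMisdecoded("der K\u00FCnstler"); // stands in for the BLOB content
        System.out.println(indexOfTerm(document, "K\u00FCnstler"));
    }
}
```

The returned position refers to the misdecoded String, in which every non-ASCII character occupies two or more chars, so it will not line up with positions in a correctly decoded copy.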
Already answered by yourself.
An altogether different approach:
If you can search the blob, you could search using
"SELECT .. FROM ... WHERE"
+ " ... LIKE '%" + key.replaceAll("\\P{ASCII}+", "%") + "%'"
This replaces non-ASCII sequences by the % wildcard: UTF-8 multibyte sequences are non-ASCII by design.
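As a rough sketch of the wildcard idea, the pattern can be built separately from the SQL text, which also makes it easy to bind it as a PreparedStatement parameter instead of concatenating it into the query. LikePatterns and likePattern are hypothetical names, and the sketch does not escape % or _ that may already occur in the key.

```java
// Hypothetical helper that turns a search key into a LIKE pattern.
final class LikePatterns {
    // Replace each run of non-ASCII characters by the % wildcard and wrap the key in wildcards.
    static String likePattern(String key) {
        return "%" + key.replaceAll("\\P{ASCII}+", "%") + "%";
    }

    public static void main(String[] args) {
        System.out.println(likePattern("K\u00FCnstler"));
    }
}
```

Because the wildcard matches anything, this can also return rows that merely resemble the key, so the hits may need a second, exact check.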

xmlhttp.setRequestHeader not working

this is my code
// assume var data has japanese characters
xmlhttp.open("POST","adminUpdate?&value="+data,true); // tried GET as well
xmlhttp.setRequestHeader("Content-Type", "text/plain;charset=UTF-8");
xmlhttp.send();
If I insert alert(data), I can see the Japanese characters perfectly fine.
But on the server side (servlet class) when I add this code :
String query = request.getParameter("value");
System.out.println(query);
Now I see garbage value ??????
Ok so I added this line server side :
System.out.println("content type : "+ request.getContentType());
and I got this : text/plain;charset=UTF-8
So my question is: if the encoding is set correctly, why can't I see the Japanese characters?
One option is to send the query parameters as part of the request body and have the content type set to application/x-www-form-urlencoded.
Then, before getting the parameter, set the request's content character encoding
request.setCharacterEncoding("UTF-8");
String query = request.getParameter("value");
Note that wherever you're printing the query value has to be able to display UTF-8 encoded characters.
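Why the decoding charset matters can be reproduced with the JDK alone. The sketch below is not servlet code; it only imitates what happens when percent-encoded UTF-8 bytes are decoded with ISO-8859-1, which is a common fallback when no request encoding has been set. FormEncodingDemo and its methods are hypothetical names, and the Charset overloads of URLEncoder and URLDecoder used here require Java 10 or later.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo of a parameter value being decoded with the right and the wrong charset.
final class FormEncodingDemo {
    // What a client should send: the value percent-encoded from its UTF-8 bytes.
    static String encodeUtf8(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // What the server side effectively does when it decodes the parameter.
    static String decode(String encoded, Charset charset) {
        return URLDecoder.decode(encoded, charset);
    }

    public static void main(String[] args) {
        String value = "\u65E5\u672C\u8A9E"; // three Japanese characters
        String encoded = encodeUtf8(value);
        System.out.println(encoded);
        System.out.println(decode(encoded, StandardCharsets.UTF_8).equals(value));
        System.out.println(decode(encoded, StandardCharsets.ISO_8859_1).equals(value));
    }
}
```

Decoding with UTF-8 restores the original text, while decoding with ISO-8859-1 yields nine unrelated characters, which a console that cannot display them may then print as question marks.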

How LDAP search filter string accepts space?

I feel a bit nervous because this is my first question here at Stack Overflow. Please let me know if I am not doing it in a good manner.
In LDAP, I think the following search filter string works.
( & (uid=tt4cs) (objectClass=inetOrgPerson) )
It means searching for entries, one of whose uid is tt4cs and one of whose objectClass is inetOrgPerson.
Please note that there are spaces between every parenthesis and ampersand, which will just be ignored. But, as far as I read RFC4515, I can find no implication that allows any space that way. Could anybody kindly tell me whether it is allowed by any other standards or it is just so by convention?
Update on Jan 13, 2014
I have tested it in three ways. (LDAP server in my environment is OpenLDAP 2.4.38)
(1) Do ldapsearch on command line. The above search filter works and gets a result.
(2) Search by using UnboundID LDAP SDK for Java. This API does not send the search request to the server, but throws an exception that says "Unexpected closing parenthesis found at position 15 of the filter string."
String filter = "( & (uid=tt4cs) (objectClass=inetOrgPerson) )";
SearchResult searchResult
= connection.search("dc=localdomain", SearchScope.SUB, filter);
(3) Search by using Apache Directory LDAP API. This API does not send the search request to the server, but throws an exception that says "The filter ( & (uid=tt4cs) (objectClass=inetOrgPerson) ) is invalid."
String filter = "( & (uid=tt4cs) (objectClass=inetOrgPerson) )";
EntryCursor cursor
= connection.search("dc=localdomain", filter, SearchScope.SUBTREE);
Now I have a feeling that acceptance of the extra spaces may probably be an implementation-dependent behavior, and that it is better to avoid it.
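One way to stay clear of the whitespace question is to assemble the filter string programmatically so that no optional spaces are ever emitted. The sketch below is plain string building with hypothetical names (LdapFilters, eq, and, escapeValue); it is not the UnboundID or Apache Directory API. The escaping covers the characters RFC 4515 requires to be escaped in an assertion value.

```java
// Hypothetical helper that builds compact LDAP filter strings.
final class LdapFilters {
    // Escape backslash, asterisk, parentheses and NUL inside an assertion value.
    static String escapeValue(String value) {
        StringBuilder out = new StringBuilder();
        for (char c : value.toCharArray()) {
            switch (c) {
                case '\\': out.append("\\5c"); break;
                case '*':  out.append("\\2a"); break;
                case '(':  out.append("\\28"); break;
                case ')':  out.append("\\29"); break;
                case '\0': out.append("\\00"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    // Equality component such as (uid=tt4cs).
    static String eq(String attribute, String value) {
        return "(" + attribute + "=" + escapeValue(value) + ")";
    }

    // AND of already built components, with no spaces in between.
    static String and(String... components) {
        return "(&" + String.join("", components) + ")";
    }

    public static void main(String[] args) {
        System.out.println(and(eq("uid", "tt4cs"), eq("objectClass", "inetOrgPerson")));
    }
}
```

The resulting string has the same meaning as the filter in the question but contains no spaces between components, which is the form the client libraries tested above would be expected to parse.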

String .replaceAll() unexpected behaviors while getting data from Sqlite database

I am puzzled by this behavior. My application gets its data either from the server or from a local database that I created as a clone of the server database.
.replaceAll ( "\r\n" , "<br/>" ) ;
When the data comes from the server, the replacement works. When the data comes from the SQLite database, nothing is replaced. A test with .replaceAll ( "a" , "??" ) ; does work on the same data.
The database data is
Bradley Ambrose is the freelance cameraman who recorded the John Key and John Banks tea meeting.\r\n\r\nHe intentionally placed a black bag with a recording device on the table where Key and Banks were sitting, although he claims it was a mistake, If that were true then how did so many people get a copy of it???\r\n\r\nAlso this guy bloody changed his name from Brad White what the hell is this guy an international man of mystery or something.
I have also stepped through this in the debugger: the line above executes without error, but the text is not replaced.
I have also tried
replaceAll ( "\n" , "<br/>" )
replaceAll ( "\r" , "<br/>" )
There is debugging picture.
Does the input string contain actual CR and LF characters or pairs of \ and r and \ and n?
The regex won't work in the latter case. It would require .replaceAll("\\\\r\\\\n" , "<br/>")
Can you try with Pattern#quote() ?
Something like:
System.out.println("hello\r\n\r\n something".replaceAll(Pattern.quote("\r\n"), ""));
The code is fine. The data you are seeing in the debug screen is misleading. Run the same debug session with a System.out.println inserted and compare its output with what the debug screen shows.
That is, unless you mean the database actually contains the literal string "\r\n". The above assumes that the database contains real carriage return and line feed characters. If it instead contains a backslash followed by 'r' and a backslash followed by 'n', your regex needs a simple tweak: s.replaceAll("\\\\r\\\\n", "")
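The difference between the two cases can be shown in a few lines. The sketch below uses String.replace, which treats its arguments literally and so avoids regex escaping altogether; LineBreaks and toHtmlBreaks are hypothetical names, and handling both variants in one method is only an assumption about what the application needs.

```java
// Hypothetical helper that converts both real and literal line breaks to <br/>.
final class LineBreaks {
    static String toHtmlBreaks(String text) {
        return text
            .replace("\\r\\n", "<br/>") // four literal characters: backslash, r, backslash, n
            .replace("\r\n", "<br/>");  // a real carriage return followed by a line feed
    }

    public static void main(String[] args) {
        String fromServer = "first\r\nsecond";   // contains real CR LF characters
        String fromSqlite = "first\\r\\nsecond"; // contains the escape sequence as text
        System.out.println(toHtmlBreaks(fromServer));
        System.out.println(toHtmlBreaks(fromSqlite));
        // The regex from the question matches only real CR LF, so this string is unchanged:
        System.out.println(fromSqlite.replaceAll("\r\n", "<br/>"));
    }
}
```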
