Java PreparedStatement setString changes characters - java

As the title says: while debugging my application, I found that on the line where I put strings into the PreparedStatement variable, special characters change to "?". I don't know where to look for a fix, so I don't know whether code is needed. Anyway, here is some:
PreparedStatement stm = null;
String sql = "";
try {
    sql = "INSERT INTO methods (name, description) VALUES (?, ?)";
    stm = connection.prepareStatement(sql);
    stm.setString(1, method.getName());
    stm.setString(2, method.getDescription());
    //...
} catch (Exception e) {}
While debugging, the 'name' field was correct in the method object, but after adding it to the stm variable, its characters changed to '?'.
I found one topic about a similar situation on SO, but none of the answers helped, since I know exactly that the problem is in adding the string to the statement, not in the database. But I don't know what it is.
Any suggestions?
PS. I'm using NetBeans version 6.7.1.
EDIT: I was debugging with the standard NetBeans debugger and checking the state of the variables before adding the strings to the 'stm' variable. I even replaced the getName() call with a static string containing special characters. So everything is definitely OK with the Method class.
EDIT2: I've made one more test. I checked the stm variable, and one of its properties is "charEncoding", which is set to "cp1252". So the main question is: how do I change that?

This normally happens when different charsets are used in different places. It sounds like you're getting your input as UTF-8 and converting it to another charset (maybe your database is set to something else), which breaks the special characters.
To fix this: use the same charset everywhere*. (I would recommend UTF-8.)
*Take a look at this or my answer to another thread (that's about a problem in PHP, but in Java it's almost the same).
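As a rough sketch of "same charset everywhere" on the JDBC side: assuming the driver is MySQL Connector/J (the question does not say which driver is in use), the connection encoding can be requested with the characterEncoding URL property. The class name, host and schema below are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

final class Utf8Connection {
    // Builds a Connector/J style URL that asks the driver to talk UTF-8.
    // Host and database are placeholders supplied by the caller.
    static String url(String host, String database) {
        return "jdbc:mysql://" + host + "/" + database + "?characterEncoding=UTF-8";
    }

    // Opens the connection; requires the MySQL driver on the classpath.
    static Connection open(String host, String database, String user, String password)
            throws SQLException {
        return DriverManager.getConnection(url(host, database), user, password);
    }
}
```

Other drivers use different properties, so this only applies if the assumption about Connector/J holds.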

Sounds like a character encoding issue to me. Perhaps the driver is transcoding your strings into the appropriate encoding for the field/table/schema/database rather than letting the server do it? If you are trying to store a character which has no representation in the encoding of the field/table/schema/database, that would explain the '?' characters.
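The effect described above can be reproduced without a database. This is a minimal sketch (class and method names are made up for illustration) that encodes a string into a charset and decodes it again; characters with no representation in that charset come back as '?', because String.getBytes(Charset) substitutes the charset's replacement byte.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingRoundTrip {
    // Encode to bytes in the given charset, then decode the bytes again.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        // Polish word written with Unicode escapes so the source file encoding does not matter.
        String name = "\u017c\u00f3\u0142w";
        // Two of the four letters do not exist in cp1252 and are replaced.
        System.out.println(roundTrip(name, cp1252));
        // UTF-8 can represent every character, so the round trip is lossless.
        System.out.println(roundTrip(name, StandardCharsets.UTF_8).equals(name));
    }
}
```

If the driver performs this kind of conversion with cp1252, as the "charEncoding" value in the question suggests, the loss happens before the data ever reaches the server.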

Are you using Oracle? I have had similar situations, if the environment variables regarding character sets weren't defined correctly.
By default, an Oracle connection is ASCII (7-bit characters, A-Z, a-z, numbers, punctuation, ...). If you use any character outside of that (e.g. European accents, Chinese characters, ..) then you need to use something other than ASCII. UTF-8 is best. If you don't, your characters will get replaced by "?".
You'd need to get your sysadmin to set this up for you. Alternatively take a look here:
http://arjudba.blogspot.com/2009/02/what-is-nlslang-environmental-variable.html

Related

PreparedStatement IN clause Regexp alternative?

I came across the same issue as the author of this question (PreparedStatement IN clause alternatives?), and wondered if using mysql's REGEXP would be an elegant way of getting the same functionality of IN while using only one PreparedStatement for varying number of values to match? Some example SQL here to show what I am talking about:
SELECT first_name, last_name
FROM people
WHERE first_name REGEXP ?
Multiple values could be supplied using a string like "Robert|Janice|Michael". I did not see REGEXP mentioned anywhere in that post.
Technically, yes, it is an alternative.
Note, however, that using a regex for matching is less efficient than the IN operator: it incurs more work for the database, which needs to initialize the regex engine and run it against each and every value (it cannot take advantage of an index). You might not notice it on small volumes, but as your data grows larger this might become an issue. So I would not recommend that as a general solution: instead, just write a few more lines of code in your application to properly use the IN operator, and use regexes only where they are truly needed.
Aside: if you want to match the entire string, as IN does, you need to surround the list of values with ^ and $, so the equivalent for:
first_name in ('Robert', 'Janice', 'Michael')
Would be:
first_name regexp '^(Robert|Janice|Michael)$'
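The "few more code lines" for the IN operator might look like the sketch below, which generates one placeholder per value and binds them in order. The class and method names are hypothetical; the table and column come from the question.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Collections;
import java.util.List;

final class InClause {
    // Builds the SQL text with as many "?" placeholders as there are values.
    static String sqlFor(int count) {
        if (count < 1) {
            throw new IllegalArgumentException("need at least one value");
        }
        String placeholders = String.join(", ", Collections.nCopies(count, "?"));
        return "SELECT first_name, last_name FROM people WHERE first_name IN ("
                + placeholders + ")";
    }

    // Prepares the statement and binds each name to its placeholder (1-based).
    static PreparedStatement prepare(Connection connection, List<String> names)
            throws SQLException {
        PreparedStatement statement = connection.prepareStatement(sqlFor(names.size()));
        for (int i = 0; i < names.size(); i++) {
            statement.setString(i + 1, names.get(i));
        }
        return statement;
    }
}
```

The trade-off is that each distinct number of values produces a different SQL string, so this is not a single reusable PreparedStatement, but the values are still passed as parameters and an index on first_name can be used.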
Another approach:
FIND_IN_SET(name, 'Robert,Janice,Michael')
Yes, that could be substituted in. But it must be a commalist of the desired values. This also works for FIND_IN_SET(foo, '1,123,45'). Note that 12 will not match.

Can't have the correct UTF-8 charset in php/java/mysql

I am developing an app with Java and PHP.
I call the PHP file from the Java class; the PHP executes queries against the database and returns the result. The problem is that it doesn't return the characters correctly. I want to use UTF-8, and I have this at the beginning of my PHP file:
header("Content-type: text/html; charset=utf-8");
but it doesn't work. If I put a word with special characters directly in the code and echo it, the word appears correctly, but if I read it from the database, it doesn't.
The function mb_internal_encoding returns ISO-8859-1, but setting it to UTF-8 doesn't work either.
I don't know if I explained the problem correctly, sorry for my English.
Thanks
You should prefer the following method if you have PHP 5.3.6+:
$pdo = new PDO("mysql:host=localhost;dbname=world;charset=utf8", 'my_user', 'my_pass');
The reason is that otherwise, string escaping won't be handled properly. This is important because PHP still defaults to emulating prepared statements for MySQL, so it relies heavily on the internal real_escape_string being able to identify the proper character set. Setting the character set via "SET NAMES ..." will not update the internal character set setting that is used by real_escape_string.
Summary: you may be vulnerable to SQL injection attacks if you don't use the above method. I say "may" because it depends on the initial and final charsets.
Alternatively, you can configure PDO to not emulate prepared statements.
Reference:
http://www.php.net/manual/en/mysqlinfo.concepts.charset.php
In your PDO try this:
$dbInfos = array(PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES 'utf8'");
$this->connection = new PDO($this->dsn, $this->username, $this->password, $dbInfos);

Escaping issue with MySQL JDBC connector

So I'm trying to input blog comments into a database for an NLP experiment, but I'm having some issues: I'm using prepared statements on the inserts, but all the single quotes are turning into question marks.
I'm testing on OS X and don't know the character encoding: I assume it's the default latin1_swedish, etc., but after a few hours of scattered Googling I haven't been able to figure out how to determine it. I'm submitting something like "I didn't say that" as a param to
PreparedStatement statement = connect.prepareStatement("INSERT IGNORE INTO bwog.article (article_id, date, title, content, url) VALUES (?, ?, ?, ?, ?)");
...
...
String s = "I didn't say that"; //not literal string, but printlns like this
statement.setString(4, s);
and it's turning into "I didn?t say that" in the database after execution and all that.
I assume it's some kind of assumption issue where I didn't know about or forgot to fulfill some precondition.
SOLUTION: It was character encoding. Database and tables were in UTF-8 but command line connection was in latin1 for all the "character_set%" variables, so even though the data was fine it appeared garbled.
In order to remove this from the "Unanswered" filter...
Prediction: Your problem is character encoding. I bet your database and tables are in UTF-8 but your command line connection is in latin1 for all the "character_set%" variables, so even though the data is fine it appears garbled.
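One way to check the "character_set%" variables mentioned above from Java is sketched below. It assumes a MySQL server reached over JDBC; the class name, the helper and the choice of which variables to inspect are illustrative, not something the original posts specify.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class CharsetVariables {
    static final String QUERY = "SHOW VARIABLES LIKE 'character_set%'";

    // The per-session variables that govern how text travels over the connection.
    static final String[] SESSION_VARIABLES = {
        "character_set_client", "character_set_connection", "character_set_results"
    };

    // Reads variable name -> value pairs for the current session.
    static Map<String, String> read(Connection connection) throws SQLException {
        Map<String, String> variables = new LinkedHashMap<>();
        try (Statement statement = connection.createStatement();
             ResultSet rows = statement.executeQuery(QUERY)) {
            while (rows.next()) {
                variables.put(rows.getString(1), rows.getString(2));
            }
        }
        return variables;
    }

    // Lists the session variables that are missing or not some flavour of utf8.
    static List<String> notUtf8(Map<String, String> variables) {
        List<String> suspects = new ArrayList<>();
        for (String name : SESSION_VARIABLES) {
            String value = variables.get(name);
            if (value == null || !value.startsWith("utf8")) {
                suspects.add(name);
            }
        }
        return suspects;
    }
}
```

These values are per session, so the JDBC connection and the command line client can report different results; running the same SHOW VARIABLES query in the client shows what that client is using.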

Java SQL Escape without using setString

Is there a built-in method to escape a string for SQL? I would use setString, but it happens I am using setString multiple times in the same combined SQL statement and it would be better performance (I think) if the escape happened only once instead of each time I say setString. If I had the escaped string in a variable, I could re-use it.
Is there no way to do this in Java?
Current method, multi-source search. In reality they are three entirely different where statements including joins, but for this example I will just show the same where for each table.
String q = '%' + request.getParameter("search") + '%';
PreparedStatement s = s("SELECT a,b,c FROM table1 where a = ? UNION select a,b,c from table2 where a = ? UNION select a,b,c FROM table3 where a = ?");
s.setString(1, q);
s.setString(2, q);
s.setString(3, q);
ResultSet r = s.executeQuery();
I know this is not a big deal, but I like to make things efficient and also there are situations where it is more readable to use " + quote(s) + " instead of ? and then somewhere down the line you find setString.
If you use setString for a parameter (e.g. PreparedStatement.setString), there may well be no actual escaping required - it's likely that the data will be passed separately from the SQL itself, in a way that doesn't require escaping.
Do you have any concrete indication that this really is a performance bottleneck? It seems very unlikely that within a database query, the expensive part is setting the parameters locally...
Short answer: I wouldn't bother. It's best to do escaping at the last possible moment. When you try to escape a string early and keep it around, it becomes much more difficult to verify that all strings have been escaped exactly once. (Escaping a string twice is almost as bad as not escaping it at all!) I've seen plenty of programs that try to escape strings early and then run into trouble because they need to update the string and then the programmer forgets to redo the escape, or they update the escaped version of the string, or they have four strings and they escape three of them, etc. (I was just working on a bug where a programmer did HTML escapes on a string early, then decided he had to truncate the string to fit on a form, and ended up trying to output a string that ended with "&am". That is, he truncated his escape sequence so it was no longer valid.)
The CPU time to escape a string should be trivial. Unless you have a very large number of records or very big strings that are re-used, I doubt the savings would be worth worrying about. You'd probably be better off spending your time optimizing queries: saving a read of one record would probably be worth far more than eliminating 1000 trips through the string escape logic.
Longer answer: There's no built-in function. You could write one easily enough: most flavors of SQL just need you to double any single quotes. You may also need to double backslashes or one or two other special characters. The fact that this can differ between SQL engines is one of the big arguments for using PreparedStatements and letting JDBC worry about it. (Personally I think there should be a JDBC function to do escaping that could then know any requirements specific to the DB engine. But there isn't, so that's how it is.)
In any case, it's not clear how it would work with a PreparedStatement. There'd have to be some way to tell the PreparedStatement not to escape this string because it's already been escaped. And who really knows what's happening under the table in the conversation between JDBC and the DB engine: Maybe it never really escapes it at all, but passes it separately from the query. I suppose there could be an extra parameter on the setString that says "this string was pre-escaped", but that would add complexity and potential errors for very little gain.
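For illustration only, the quote-doubling described in the longer answer could be sketched as follows. The class and method names are made up, and the sketch deliberately handles nothing but single quotes, which is exactly the limitation the answer warns about.

```java
final class NaiveSqlQuote {
    // Wraps the value in single quotes and doubles any single quote inside it.
    // Backslashes and other engine-specific special characters are NOT handled.
    static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    public static void main(String[] args) {
        System.out.println(quote("I didn't say that"));
    }
}
```

Because a trailing backslash passes through unchanged, this is not safe on engines that treat backslash as an escape character inside string literals, which is one more reason to prefer setString.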
Do not use org.apache.commons.lang.StringEscapeUtils.escapeSql(yourUnscapedSQL);
It does not escape characters like \
You can use StringEscapeUtils from Apache commons:
org.apache.commons.lang.StringEscapeUtils.escapeSql(yourUnscapedSQL);

How to escape special characters used in SQL query?

Is there a Java library for escaping special characters in a string that is going to be inserted into an SQL query?
I keep writing code to escape various things, but I keep finding some new issue trips me up. So a library that takes care of all or most of the possibilities would be very handy.
EDIT: I am using MySQL (if that makes any difference).
Well... jdbc. Pass the strings as parameters, and don't append them to the query string
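A small sketch of the difference between appending a string to the query and passing it as a parameter. The users table, the column names and the class name are hypothetical and only serve the example.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

final class ParameterVersusConcat {
    // The SQL text never changes; the value travels separately.
    static final String SQL = "SELECT id FROM users WHERE name = ?";

    // What NOT to do: the input becomes part of the SQL text itself.
    static String concatenated(String name) {
        return "SELECT id FROM users WHERE name = '" + name + "'";
    }

    // Parameter binding: the driver takes care of quoting or separate transmission.
    static PreparedStatement parameterized(Connection connection, String name)
            throws SQLException {
        PreparedStatement statement = connection.prepareStatement(SQL);
        statement.setString(1, name);
        return statement;
    }
}
```

With the input x' OR '1'='1, the concatenated version produces a query whose WHERE clause has changed meaning, while the parameterized version still compares name against that literal text.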
A little bit more research points me to this:
http://devwar.blogspot.com/2010/06/how-to-escape-special-characters-in.html
Which suggests to use apache.commons.lang.StringEscapeUtils, I will try this out
I know this is an old thread, but the Commons Lang library has a method called escapeSql(String). Also, using a prepared statement automatically escapes the offending SQL characters.
