Setting prepared statements aside, I want some alternative way to "stay safe" from SQL injection in Java...
I thought of doing this (an HTML entity conversion):
suspectedInputvariable.replace("'","&#39;")
.replace(";","&#359;")
.replace("\"","&quot;");
is suspectedInputvariable now safe to be embedded with a sql query?
First, why would you want to do such a thing? The driver knows how to safely treat strings. Just use a PreparedStatement.
Second, you have to escape \ and some other characters, too. If you handle all the characters listed here your code should be reasonably safe with MySQL: http://dev.mysql.com/doc/refman/4.1/en/mysql-real-escape-string.html The list of characters for other databases may differ.
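For illustration only, here is a minimal sketch of what handling the characters on that MySQL list could look like in Java. The class name MySqlEscaper is made up. It assumes a MySQL server with backslash escapes enabled (NO_BACKSLASH_ESCAPES not set) and a connection character set such as utf8mb4. It does not reproduce the character-set awareness of the real mysql_real_escape_string, so treat it as a study aid, not a replacement for a PreparedStatement.

```java
final class MySqlEscaper {
    private MySqlEscaper() {}

    // Escapes the characters documented for mysql_real_escape_string:
    // NUL, newline, carriage return, backslash, single quote, double quote, Ctrl-Z.
    // Assumption: backslash escapes are enabled on the server.
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\0': out.append("\\0");  break;
                case '\n': out.append("\\n");  break;
                case '\r': out.append("\\r");  break;
                case '\\': out.append("\\\\"); break;
                case '\'': out.append("\\'");  break;
                case '"':  out.append("\\\""); break;
                case 0x1A: out.append("\\Z");  break; // Ctrl-Z
                default:   out.append(c);
            }
        }
        return out.toString();
    }
}
```

The result still has to be wrapped in single quotes by the caller, and it only applies to string values, not to numbers or identifiers.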
is suspectedInputvariable now safe to be embedded with a sql query?
Probably not. There are all kinds of little-known features in various SQL dialects that could be used to circumvent this blacklist.
Just use prepared statements. Period.
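For reference, a minimal sketch of the prepared-statement route. The table users, its columns, and the class name UserLookup are hypothetical. The Connection is assumed to come from whatever driver or pool you already use.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

final class UserLookup {
    private UserLookup() {}

    // Hypothetical table and column names, for illustration only.
    static final String FIND_BY_NAME = "SELECT id FROM users WHERE name = ?";

    // The user-supplied value is bound as a parameter and never
    // concatenated into the SQL text.
    static List<Long> findIdsByName(Connection conn, String name) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(FIND_BY_NAME)) {
            ps.setString(1, name);
            try (ResultSet rs = ps.executeQuery()) {
                List<Long> ids = new ArrayList<>();
                while (rs.next()) {
                    ids.add(rs.getLong(1));
                }
                return ids;
            }
        }
    }
}
```

The SQL text is a constant here, so nothing the user types can change its structure.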
Related
How can I compose SQL (MySQL) the way a PreparedStatement does, with escaping, to avoid SQL injection and generate a safe SQL statement?
Is there any existing Java code to do this?
Real scenario:
The frontend supplies a column value that is used to compose a safe SQL statement (the "where" part); the table name and column name are specified in the backend.
There are some SQL builders, and in general they keep track of all parameters and build a PreparedStatement. It might even be an idea to not only provide parameter values, but parameter names, so one may use it as a real PreparedStatement.
You could use Spring's JdbcTemplate, or the Criteria API.
If you want to build your own, as research, you might explore escaping too.
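As a rough sketch of the "builder keeps track of all parameters" idea: the class below collects conditions and values separately and only produces a PreparedStatement at the end. WhereBuilder and its methods are invented for this example. The table name is assumed to come from backend code, and column names are checked against a whitelist.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class WhereBuilder {
    private final Set<String> allowedColumns;
    private final StringBuilder sql;
    private final List<Object> params = new ArrayList<>();
    private boolean hasCondition = false;

    // The table name is assumed to be a backend constant, never user input.
    WhereBuilder(String table, Set<String> allowedColumns) {
        this.allowedColumns = allowedColumns;
        this.sql = new StringBuilder("SELECT * FROM ").append(table);
    }

    // Adds "column = ?" and remembers the value for later binding.
    WhereBuilder eq(String column, Object value) {
        if (!allowedColumns.contains(column)) {
            throw new IllegalArgumentException("unknown column: " + column);
        }
        sql.append(hasCondition ? " AND " : " WHERE ").append(column).append(" = ?");
        params.add(value);
        hasCondition = true;
        return this;
    }

    String sql() { return sql.toString(); }

    List<Object> params() { return params; }

    // Creates the statement and binds the collected values; the caller closes it.
    PreparedStatement prepare(Connection conn) throws SQLException {
        PreparedStatement ps = conn.prepareStatement(sql.toString());
        try {
            for (int i = 0; i < params.size(); i++) {
                ps.setObject(i + 1, params.get(i));
            }
            return ps;
        } catch (SQLException e) {
            ps.close();
            throw e;
        }
    }
}
```

Values such as O'Reilly go into the parameter list untouched; only placeholders end up in the SQL text.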
Then (still as research) also consider guarding against Unicode bidi tricks with the LTR (\u200E) and RTL (\u200F) marks: by using a right-to-left control character, someone can obfuscate SQL so that it looks fine in the editor but does something maliciously different. You could require that these characters never appear literally in a string and must be escaped instead: \\u200F. (However, it is something for nerds or insiders, and your SQL must already be at a sensitive spot.)
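A small sketch of such a bidi check, in case it helps. BidiGuard is a made-up name. Besides the two marks mentioned above, it also treats the embedding, override, and isolate controls (U+202A to U+202E and U+2066 to U+2069) as suspicious; that wider range is my addition.

```java
final class BidiGuard {
    private BidiGuard() {}

    // True for the LTR/RTL marks and the bidi embedding, override
    // and isolate control characters.
    static boolean isBidiControl(char c) {
        return c == 0x200E || c == 0x200F
                || (c >= 0x202A && c <= 0x202E)
                || (c >= 0x2066 && c <= 0x2069);
    }

    static boolean containsBidiControl(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (isBidiControl(s.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    // Replaces each bidi control with a visible backslash-u escape
    // so it shows up in logs and editors.
    static String makeVisible(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (isBidiControl(c)) {
                out.append(String.format("\\u%04X", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Rejecting the input outright when containsBidiControl returns true is probably the simpler policy; makeVisible is mainly useful for diagnostics.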
The SQL dialect matters: backticks (MySQL) or double quotes (standard SQL) for names, and so on.
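To make the dialect point concrete, here is a hedged sketch of identifier quoting for the two styles just mentioned. IdentifierQuoter is an invented helper. It doubles the quote character inside the name, which is the usual convention in both dialects. For table and column names, checking against a fixed whitelist in the backend is still the safer first line of defence.

```java
final class IdentifierQuoter {
    private IdentifierQuoter() {}

    // MySQL style: wrap in backticks and double any embedded backtick.
    static String mysql(String name) {
        requireUsable(name);
        return "`" + name.replace("`", "``") + "`";
    }

    // Standard SQL style: wrap in double quotes and double any embedded double quote.
    static String standard(String name) {
        requireUsable(name);
        return "\"" + name.replace("\"", "\"\"") + "\"";
    }

    private static void requireUsable(String name) {
        if (name.isEmpty() || name.indexOf('\0') >= 0) {
            throw new IllegalArgumentException("invalid identifier");
        }
    }
}
```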
There is also StringEscapeUtils.escapeSql in Apache Commons Lang 2.x (it was removed in Commons Lang 3).
I have the unfortunate situation where I have to build up a SQL string by concatenating strings - the classic SQL injection scenario. I can't use prepared statements.
If I escape the ' character am I safe? Or are there other attack vectors?
I'm using MyBatis and its ${} notation (vs #{} that generates prepared statements). I have no choice with this - it has to be ${}. I can't use prepared statements.
EDIT:
To add a little more clarity: it's an AWS Redshift UNLOAD command. The first parameter for UNLOAD is a SQL string.
(Given that you cannot do it the correct way because of restrictions in Redshift):
On PostgreSQL with standard_conforming_strings set to on, all you need to do is double the single quotes, turning ' into ''. That's it.
Backslashes aren't significant unless standard_conforming_strings is off or you use an E'' string. If either of those is true, then you have to escape backslashes as well.
As Redshift is based on a fork of an ancient PostgreSQL version I don't know for sure how this applies to it. Reading the documentation on its lexical structure and syntax would be wise, to verify that it is consistent with how PostgreSQL works.
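A minimal sketch of the quote-doubling rule described above, for PostgreSQL with standard_conforming_strings on. PgLiteral is a hypothetical helper name. Whether Redshift treats backslashes the same way is not established here, so check its documentation before relying on this for UNLOAD.

```java
final class PgLiteral {
    private PgLiteral() {}

    // Wraps the value in single quotes and doubles embedded single quotes.
    // Assumption: standard_conforming_strings is on, so backslashes are literal.
    static String quote(String value) {
        if (value.indexOf('\0') >= 0) {
            // NUL cannot be part of a text literal; reject rather than guess.
            throw new IllegalArgumentException("NUL character in value");
        }
        return "'" + value.replace("'", "''") + "'";
    }
}
```

For a statement nested inside another string literal, as with UNLOAD, the inner literal would be quoted first and the whole inner query quoted again, so each level of nesting doubles the quotes once more.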
PreparedStatement (Wikipedia) really is the way to go. In one fell swoop you eliminate a big pile of work and risk regarding SQL Injection hackers.
If you absolutely can’t/won’t use PreparedStatement, then you need to read about various strategies. You'll have to write a bunch of checks to examine and modify your inputs and SQL. No silver bullet. (Well, actually, PreparedStatement is your silver bullet. But no other silver bullet.)
Google for items like "sanitize sql input". You will find resources such as:
Bobby-Tables.com (which tells you to use PreparedStatement).
Mitigation section of Wikipedia page on SQL Injection.
Article, Prevent Web Attacks Using Input Sanitization.
Article, How to prevent SQL injection attacks?, that explains with examples how sanitizing input is not enough, and recommends using … yes, you guessed it: PreparedStatement.
Is there a built-in method to escape a string for SQL? I would use setString, but it happens I am using setString multiple times in the same combined SQL statement and it would be better performance (I think) if the escape happened only once instead of each time I say setString. If I had the escaped string in a variable, I could re-use it.
Is there no way to do this in Java?
Current method, multi-source search. In reality they are three entirely different where statements including joins, but for this example I will just show the same where for each table.
String q = '%' + request.getParameter("search") + '%';
PreparedStatement s = s("SELECT a,b,c FROM table1 where a = ? UNION select a,b,c from table2 where a = ? UNION select a,b,c FROM table3 where a = ?");
s.setString(1, q);
s.setString(2, q);
s.setString(3, q);
ResultSet r = s.executeQuery();
I know this is not a big deal, but I like to make things efficient and also there are situations where it is more readable to use " + quote(s) + " instead of ? and then somewhere down the line you find setString.
If you use setString for a parameter (e.g. PreparedStatement.setString), there may well be no actual escaping required - it's likely that the data will be passed separately from the SQL itself, in a way that doesn't require escaping.
Do you have any concrete indication that this really is a performance bottleneck? It seems very unlikely that within a database query, the expensive part is setting the parameters locally...
Short answer: I wouldn't bother. It's best to do escaping at the last possible moment. When you try to escape a string early and keep it around, it becomes much more difficult to verify that all strings have been escaped exactly once. (Escaping a string twice is almost as bad as not escaping it at all!) I've seen plenty of programs that try to escape strings early and then run into trouble because they need to update the string and then the programmer forgets to re-do the escape, or they update the escaped version of the string, or they have four strings and they escape three of them, etc. (I was just working on a bug where a programmer did HTML escapes on a string early, then decided he had to truncate the string to fit on a form, and ended up trying to output a string that ended with "&am". That is, he truncated his escape sequence so it was no longer valid.)
The CPU time to escape a string should be trivial. Unless you have a very large number of records or very big strings that are re-used, I doubt the savings would be worth worrying about. You'd probably be better off spending your time optimizing queries: saving a read of one record would probably be worth far more than eliminating 1000 trips through the string escape logic.
Longer answer: There's no built-in function. You could write one easily enough: most flavors of SQL just need you to double any single quotes. You may also need to double backslashes or one or two other special characters. The fact that this can differ between SQL engines is one of the big arguments for using PreparedStatements and letting JDBC worry about it. (Personally I think there should be a JDBC function to do escaping that could then know any requirements specific to the DB engine. But there isn't, so that's how it is.)
In any case, it's not clear how it would work with a PreparedStatement. There'd have to be some way to tell the PreparedStatement not to escape this string because it's already been escaped. And who really knows what's happening under the table in the conversation between JDBC and the DB engine: Maybe it never really escapes it at all, but passes it separately from the query. I suppose there could be an extra parameter on the setString that says "this string was pre-escaped", but that would add complexity and potential errors for very little gain.
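If the real annoyance is repeating setString for the same value, a tiny helper can hide the loop without any manual escaping. This is only a sketch: RepeatedParam and its methods are invented names, and the placeholder count is naive (it counts every ? in the SQL text, so it assumes none appear inside literals or comments).

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;

final class RepeatedParam {
    private RepeatedParam() {}

    // Naive count of '?' placeholders; assumes no '?' inside literals or comments.
    static int countPlaceholders(String sql) {
        int count = 0;
        for (int i = 0; i < sql.length(); i++) {
            if (sql.charAt(i) == '?') {
                count++;
            }
        }
        return count;
    }

    // Binds the same string to every placeholder of the statement (1-based indexes).
    static void bindToAll(PreparedStatement ps, String sql, String value) throws SQLException {
        int n = countPlaceholders(sql);
        for (int i = 1; i <= n; i++) {
            ps.setString(i, value);
        }
    }
}
```

With the three-way UNION from the question, a single bindToAll(s, sql, q) call would replace the three setString lines.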
Do not use org.apache.commons.lang.StringEscapeUtils.escapeSql(yourUnscapedSQL);
It does not escape characters like \
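To show why the missing backslash handling matters, here is a sketch that mimics quote doubling only, which is my understanding of what escapeSql does. It does not call the library. The class name and the users table are made up. The scenario assumes MySQL with backslash escapes enabled.

```java
final class QuoteDoublingDemo {
    private QuoteDoublingDemo() {}

    // Mimics an escaper that only doubles single quotes.
    static String doubleQuotes(String s) {
        return s.replace("'", "''");
    }

    // Deliberately unsafe: concatenates the "escaped" value into the SQL text.
    static String buildQuery(String userInput) {
        return "SELECT * FROM users WHERE name = '" + doubleQuotes(userInput) + "'";
    }
}
```

With the input \' OR 1=1 -- the generated text is SELECT * FROM users WHERE name = '\'' OR 1=1 -- '. On MySQL the \' is read as an escaped quote, the next ' closes the string, and OR 1=1 becomes part of the query.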
You can use StringEscapeUtils from Apache commons:
org.apache.commons.lang.StringEscapeUtils.escapeSql(yourUnscapedSQL);
Is there a Java library for escaping special characters from a string that is going to be inserted into an SQL query.
I keep writing code to escape various things, but I keep finding some new issue trips me up. So a library that takes care of all or most of the possibilities would be very handy.
EDIT: I am using MySQL (if that makes any difference).
Well... jdbc. Pass the strings as parameters, and don't append them to the query string
A little bit more research points me to this:
http://devwar.blogspot.com/2010/06/how-to-escape-special-characters-in.html
It suggests using apache.commons.lang.StringEscapeUtils; I will try this out.
I know this is an old thread, but the Commons Lang library has a method called escapeSql(String). Also, using a prepared statement automatically takes care of the offending SQL characters.
I'm writing a java class which would be invoked by a servlet filter and which checks for injection attack attempts and XSS for a java web application based on Struts. The InjectionAttackChecker class uses regex & java.util.regex.Pattern class to validate the input against the patterns specified in regex.
With that said, I have following questions:
What all special characters and character patterns (for example <>, ., --, <=, ==,>=) should be blocked so that injection attack could be prevented.
Is there any existing regex pattern which I could use as is?
I have to allow some of the special character patterns in some specific cases, some example values (to be allowed) are (used 'pipe' | character as a separator of different values) *Atlanta | #654,BLDG 8 #501 | Herpes simplex: chronic ulcer(s) (>1 mo. duration) or bronchitis, pneumonitis, or esophagitis | FUNC & COMP(date_cmp), "NDI & MALKP & HARS_IN(icd10, yes)" . What strategy should I adopt so that injection attack and XSS could be prevented but still allowing these character patterns.
I hope I have stated the question clearly. But if I didn't, I apologize, as it's just my 2nd question. Please let me know if any clarification is needed.
Based on your questions I am assuming you are attempting to filter bad values. I personally feel that this method can get very complex very quickly and would recommend encoding values as an alternate method. Here is an IBM article on the subject that lays out the pros and cons of both methods, http://www.ibm.com/developerworks/tivoli/library/s-csscript/.
To avoid SQL injection attacks just use prepared statements instead of creating SQL strings.
If you attempt to sanitize all the data on input, you're going to have a very difficult time of it. There are tons of tricks involving character encoding and such that will allow people to circumvent your filters. This impressive list is only some of the myriad things that can be done as SQL injections. You've also got to prevent HTML injection, JS injection, and potentially others. The only sure way of doing this is to encode the data where it is used in your application. Encode all the output you write to your web site, encode all of your SQL parameters. Be especially careful with the latter, as normal encoding will not work for non-string SQL parameters, as explained in that link. Use parameterized queries to be completely safe. Also note that you could theoretically encode your data at the time the user enters it and store it encoded in the database, but that only works if you're always going to be using the data in ways that use that type of encoding (i.e. HTML encoding if it will only ever be used with HTML; if it's used in SQL, you're not going to be protected). This is partially why the rule of thumb is to never store encoded data in the database and always encode on use.
Validating and binding all data is a must. Perform both client-side and server-side validation, because 10% of people turn off JavaScript in their browsers.
Jeff Atwood has a nice blog about the topic that gives you a flavor for its complexity.
Here's a pretty extensive article on that very subject.
I don't think you'll have a holy grail here though. I would also suggest trying to encode/decode the received text in some standard ways (uuencode, base64)
don't filter or block values.
you should ensure that when combining bits of text you do the proper type conversions :) ie: if you have a piece a string which is type HTML and a string which is type TEXT you should convert TEXT to HTML instead of blindly concatenating them. in haskell you can conveniently enforce this with the type system.
good html templating languages will escape by default. if you are generating XML/HTML then sometimes it is better to use DOM tools than a templating language. if you use a DOM tool then it removes a lot of these issues. unfortunately, DOM tool is usually crap compared to templating :)
if you take strings of type HTML from users you should sanitize it with a library to remove all not-good tags/attributes. there are lots of good whitelist html filters out there.
you should always use parameterized queries. ALWAYS! if you have to build up queries dynamically then build them up dynamically with parameters. don't ever combine non-SQL typed strings with SQL typed strings.
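The point above about converting TEXT to HTML instead of blindly concatenating can be approximated in Java with a small wrapper type. This is only a sketch: the Html class and its methods are invented for this example, and a real project would more likely rely on a templating engine that escapes by default.

```java
final class Html {
    private final String markup;

    private Html(String markup) {
        this.markup = markup;
    }

    // For markup written by the developer, never for user input.
    static Html trusted(String markup) {
        return new Html(markup);
    }

    // Converts plain text to HTML by escaping the special characters.
    static Html fromText(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return new Html(out.toString());
    }

    // Only Html can be appended to Html, so raw text cannot slip in unescaped.
    Html append(Html other) {
        return new Html(markup + other.markup);
    }

    @Override
    public String toString() {
        return markup;
    }
}
```

Because append only accepts Html, the compiler forces every plain string through fromText (or an explicit trusted call) before it reaches the page.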
Take a look at the AntiSamy project [www.owasp.org]. I think it is exactly what you want; you can setup a filter to block certain tags. They also supply policy templates, the slashdot policy would be a good start, then add on the tags you require.
Also, there is a wealth of knowledge on the www.owasp.org website about securing your application.
What user 'nemo' says about using prepared statements and encoding should also be done.