Doing Java String replacement efficiently - java

We have the following code:
String templateQuery = "select * from my_table where col1=$1 or col2 like '%$2.$1'";
String tmp = templateQuery;
for(int i=1;i<=maxCols;i++) {
tmp = tmp.replaceAll("\\$"+i, data[i-1]);
}
This code works fine, as maxCols never exceeds 10. But my colleague disagrees with me, stating that this code consumes too much memory. Can you help us?
EDIT:
I have changed the initial templateQuery to a more realistic one. Secondly, templateQuery can potentially be a big string.
EDIT 2:
Thanks to those who pointed out the SQL injection problem.

Don't do this.
Not for performance reasons (which will be minuscule compared with the cost of the database query), but to avoid SQL injection attacks. What happens if data[0] is actually the string
' OR 'x' = 'x
?
Then you'll end up with a SQL statement of:
SELECT * FROM my_table WHERE col1='' OR 'x' = 'x'
which I think we can agree isn't what you wanted.
Use a parameterized SQL statement instead (PreparedStatement) and get the database driver to send the parameter values separately.
EDIT: In other comments, the OP has specified that the template string can be quite long, and some parameters may actually involve multiple initial values combined together. I still say that the cost of replacement is likely to be insignificant in the grand scheme of things, and I still say that PreparedStatement is the way to go. You should perform whatever combining operations you need to on the input before setting them as the values for the PreparedStatement - so the template may need the SQL with SQL placeholders, and then "subtemplates" to work out how to get from your input to the parameters for the PreparedStatement. Whatever you do, putting the values directly into the SQL is the wrong approach.
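A minimal sketch of that split, assuming the realistic template from the question (col1=$1 or col2 like '%$2.$1'): the SQL keeps plain ? placeholders, and a small helper computes the value bound to each one from the raw input. The class and method names (MyTableQuery, parameters) are hypothetical, and no database is needed to run it.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: the SQL text with JDBC placeholders, plus the logic
// that derives each parameter value from the raw input
final class MyTableQuery {
    // The template's '%$2.$1' pattern becomes one bound LIKE value
    static final String SQL = "SELECT * FROM my_table WHERE col1 = ? OR col2 LIKE ?";

    // Returns the values for parameters 1 and 2, in placeholder order
    static List<String> parameters(String[] data) {
        String col1 = data[0];
        String col2Pattern = "%" + data[1] + "." + data[0];
        return Arrays.asList(col1, col2Pattern);
    }

    public static void main(String[] args) {
        List<String> params = parameters(new String[] {"abc", "xyz"});
        System.out.println(SQL);
        System.out.println(params); // [abc, %xyz.abc]
        // With a real connection, each value would be bound with
        // ps.setString(i + 1, params.get(i)) rather than spliced into the SQL
    }
}
```

Values combined into a LIKE pattern this way still treat % and _ in the input as wildcards; escaping those is a separate concern from SQL injection.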

Why aren't you just using a PreparedStatement with replacement parameters?
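String templateQuery = "SELECT * FROM my_table WHERE col1 = ?";
// data.length must equal the number of ? placeholders in templateQuery
try (PreparedStatement ps = con.prepareStatement(templateQuery)) {
    for (int i = 0; i < data.length; i++) {
        ps.setString(i + 1, data[i]); // JDBC parameter indexes start at 1
    }
    try (ResultSet rs = ps.executeQuery()) {
        // process the rows here
    }
}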
You're otherwise vulnerable to SQL injection if you use string replacement like you have.

He is correct, because you create maxCols tmp Strings.
I realize this is for SQL commands; if so, why don't you use PreparedStatement (http://download.oracle.com/javase/1.4.2/docs/api/java/sql/PreparedStatement.html) for this task?
Also, for formatting strings, rather than substituting by hand, use Formatter; it is much more elegant: http://download.oracle.com/javase/1.5.0/docs/api/java/util/Formatter.html
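For ordinary, non-SQL text, a small sketch of the Formatter suggestion: positional specifiers such as %1$s let one argument appear several times, much like $1 in the question's template, and %% produces a literal percent sign. The class and the message format are hypothetical; for SQL built from user input, PreparedStatement remains the right tool.

```java
import java.util.Formatter;

final class FormatterDemo {
    // %1$s and %2$s refer to arguments by position, so "user" is used twice;
    // %% is written out as a single literal percent sign
    static String describe(String user, String pattern) {
        StringBuilder sb = new StringBuilder();
        try (Formatter formatter = new Formatter(sb)) {
            formatter.format("user=%1$s, pattern=%%%2$s.%1$s", user, pattern);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe("alice", "example")); // user=alice, pattern=%example.alice
    }
}
```

String.format accepts the same format string if a separate Formatter object is not needed.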

Whether this consumes too much memory is open to debate (what's "too much"?).
Nonetheless, for this kind of stuff you should use PreparedStatement. It allows you to do pretty much exactly what you're trying to achieve, but in a much cleaner fashion.

Your colleague is right in that every string replacement creates a new copy of the string. (However, the cost of these copies is probably negligible with fewer than 10 parameters.) Moreover, for every execution of this query the SQL engine needs to parse it afresh, which consumes far more resources each time.
The potentially bigger problem, though, is that the code is susceptible to SQL injection. If the input data comes from an external source, a hacker can pass in a parameter such as "col1; drop table my_table;", effectively deleting your whole table.
All of these can be solved by using a PreparedStatement instead.
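Where the same kind of $n templating is used for non-SQL text and the copy-per-replacement cost does matter, one alternative is a single pass over the template with java.util.regex. This is a sketch, not a fix for the SQL case (which should use PreparedStatement, as these answers say); it assumes placeholders of the form $n numbered from 1, the class name is hypothetical, and the StringBuilder overload of Matcher.appendReplacement requires Java 9 or later (use a StringBuffer on older JDKs).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SinglePassTemplate {
    // "$" followed by one or more digits; \d+ is greedy, so "$10" is read as
    // placeholder 10 rather than placeholder 1 followed by "0"
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$(\\d+)");

    // Builds the result in one StringBuilder instead of one new String per placeholder
    static String fill(String template, String[] data) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder(template.length());
        while (m.find()) {
            int index = Integer.parseInt(m.group(1));
            // quoteReplacement stops "$" and "\" inside the value from being
            // interpreted as group references by appendReplacement
            m.appendReplacement(out, Matcher.quoteReplacement(data[index - 1]));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(fill("Hello $1, meet $2 and $1.", new String[] {"Ann", "Bob"}));
        // Hello Ann, meet Bob and Ann.
    }
}
```

Two side effects are worth noting: the loop in the question rewrites the "$1" prefix of "$10" on its first iteration, and replaceAll treats "$" and "\" in the replacement value specially; this version avoids both.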

Related

Can this simple String escaping prevent any SQL Injections?

I'm working at a company where the person responsible for the database module is strictly against using prepared statements. I'm worrying that his implementation is not secure.
Here is the code we are currently using to make a SQL query (Java 8 Application with JDBC/MySQL 5.5):
String value = "Raw user input over HTTP-Form";
String sql = "SELECT * FROM db1.articles WHERE title like '" +
replaceSingleQuotes(value) + "'";
executeSQL(sql);
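public static String replaceSingleQuotes(String value) {
    // double every backslash: \ becomes \\
    value = value.replaceAll("\\\\", "\\\\\\\\");
    // prefix every single quote with a backslash: ' becomes \'
    return value.replaceAll("'", "\\\\'");
}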
I was not able to come up with any injections, but his solution seems very fishy to me. Can anyone point out a way this escaping could be circumvented? He will not replace his code unless I can come up with something, and we have very sensitive information on thousands of customers in our application (banking).
Edit:
Unfortunately I can't show executeSQL() because it is buried in a huge class hierarchy and everything is scattered, but it comes down to something like this:
String query = ... // query escaped with the above function
java.sql.Connection connection = ...
Statement stmt = connection.createStatement();
stmt.executeUpdate(query);
One method of attack would be to "load" the attack (this is usually called second-order SQL injection).
First, you inject the payload as a user name, bank transfer message, or whatever field is available:
transfer 0.01
to: 02020.020202.200202
name: johnny tables';drop table foobar --
will be escaped to
johnny tables\';drop table foobar --
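A small runnable check of this first step, reusing the question's replaceSingleQuotes() logic: the single quote in the payload is escaped, so it stays inside the string literal. The INSERT statement and the scheduled table are assumptions made for illustration.

```java
final class EscapeDemo {
    // Same logic as replaceSingleQuotes() in the question
    static String replaceSingleQuotes(String value) {
        value = value.replaceAll("\\\\", "\\\\\\\\"); // \ becomes \\
        return value.replaceAll("'", "\\\\'");          // ' becomes \'
    }

    public static void main(String[] args) {
        String name = "johnny tables';drop table foobar --";
        String sql = "INSERT INTO scheduled (name) VALUES ('" + replaceSingleQuotes(name) + "')";
        System.out.println(sql);
        // INSERT INTO scheduled (name) VALUES ('johnny tables\';drop table foobar --')
    }
}
```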
So far so good: the protection is in effect, and our attack failed. Now we try the loading attack.
Next, we make a scheduled payment order.
This assumes a common error is made: that once a value has been inserted into the database, it is "safe" because it was checked once.
transfer 0.01
to: 02020.020202.200202
name: johnny tables';drop table foobar--
schedule: 1 day from now
When the order is stored in the db, the escaped literal
'johnny tables\';drop table foobar--'
will be stored as
johnny tables';drop table foobar--
Now, at midnight, the scheduler kicks in and starts iterating over the scheduled payments:
select name, acct, amt from scheduled where time > x and time < y
So the bank code starts to crunch:
String name = result.getString("name");
String acct = result.getString("acct");
String amt = result.getString("amt");
// the stored value is concatenated into SQL without being escaped again
String query = "insert into payment_process (name,acct,amt) values('" + name + "','" + acct + "','" + amt + "')";
And boom, your table is dropped. *
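The same sequence can be simulated without a database. In this sketch, storedValue() is a hypothetical stand-in for the database parsing the escaped literal (it only undoes the two escapes that replaceSingleQuotes() adds), and laterQuery() mirrors the scheduler's concatenation. It shows the escaped input coming back unescaped and then breaking out of the literal; whether a real driver would actually execute the injected statement depends on the query and driver settings, as the footnote to this answer says.

```java
final class SecondOrderDemo {
    // Same logic as replaceSingleQuotes() in the question
    static String replaceSingleQuotes(String value) {
        value = value.replaceAll("\\\\", "\\\\\\\\");
        return value.replaceAll("'", "\\\\'");
    }

    // Hypothetical stand-in for the database parsing the escaped literal:
    // a backslash is dropped and the character after it is kept as-is
    static String storedValue(String escapedLiteralBody) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < escapedLiteralBody.length(); i++) {
            char c = escapedLiteralBody.charAt(i);
            if (c == '\\' && i + 1 < escapedLiteralBody.length()) {
                c = escapedLiteralBody.charAt(++i); // keep the escaped character
            }
            out.append(c);
        }
        return out.toString();
    }

    // The scheduler trusts the stored value and concatenates it unescaped
    static String laterQuery(String name, String acct, String amt) {
        return "insert into payment_process (name,acct,amt) values('"
                + name + "','" + acct + "','" + amt + "')";
    }

    public static void main(String[] args) {
        String input = "johnny tables';drop table foobar--";
        String name = storedValue(replaceSingleQuotes(input));
        System.out.println(name.equals(input)); // true: the escaping is gone once stored
        System.out.println(laterQuery(name, "02020.020202.200202", "0.01"));
    }
}
```

The fix is not a second round of escaping in the scheduler but binding the value as a parameter every time it is used in SQL.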
When you go the manual route, you have to ensure that each and every use of the variable is escaped, that all Unicode characters are accounted for, and that all idiosyncrasies of the database engine are accounted for.
Also, using prepared statements can give a significant speed boost, because you don't have to rebuild queries. You can just build them once, store them in a cache and just swap out the parameters.
They are a godsend, especially when iterating over large lists.
The root problem is probably that he doesn't understand how prepared statements work. Insecurity can make people aggressive and protective of a certain approach, even fanatical, just to avoid admitting that they don't know how something works.
Try to talk to him about it. If he doesn't wish to listen to reason, go to his manager and explain the issue: if the site/app gets hacked, it will be on the heads of your co-worker and your manager, and the risks are HUGE. Point to recent hacks in which a lot of money was stolen, such as the SWIFT hack.
* This may not actually work as written; it depends on the actual query, joins, unions, etc. It's a very simplified example.

Regex to Parse DB Columns like JDBC ResultSet

I am using JDBC to get columns from query results.
For example:
String query = "SELECT foo, bar AS barAlias FROM table";
PreparedStatement preparedStatement = connection.prepareStatement(query);
ResultSet resultSet = preparedStatement.executeQuery();
//Show my "foo" column results
while (resultSet.next()) {
log.debug(resultSet.getString("foo"));
}
I would like to parse my queries BEFORE running them. Essentially, I want to create an array of column labels that match the values expected by the resultSet.get** methods. For illustration purposes, I would like to replace the loop above with this and get the same results:
//Show my "foo" column results
while (resultSet.next()) {
log.debug(resultSet.getString(arrayOfColumns.get(0)));
}
This seems easy. I could parse my statement with a simple regex that takes the string between SELECT and FROM, splits it into groups on the column delimiter, and builds arrayOfColumns from the groups. But JDBC has specific ways of dealing with aliases, for example. For the second column, this naive parser would return the entire "bar AS barAlias", whereas I believe that resultSet.get** expects "barAlias". It tells me "The column name bar As barAlias is not valid."
Essentially, I would like to understand JDBC's column label parsing behavior in order to replicate it.
NOTE: I am aware that I can get columns by index from the resultSet object. That is not the objective here. Also, I do not want to use any JDBC-related method (I understand that the preparedStatement metadata is available). Think of this as a theoretical question where I do not have a JDBC library and MUST use regular expressions.
The best way of obtaining the columns of the result set is to ask JDBC to do it for you. You do not need a ResultSet object for it - ResultSetMetaData would be enough.
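PreparedStatement preparedStatement = connection.prepareStatement(query);
// getMetaData() may return null if the driver cannot describe the result before execution
ResultSetMetaData metaData = preparedStatement.getMetaData();
List<String> listOfColumns = new ArrayList<>();
for (int i = 1; i <= metaData.getColumnCount(); i++) {
    listOfColumns.add(metaData.getColumnLabel(i)); // JDBC column indexes start at 1
}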
To get the column names that are associated with your ResultSet, you should consult its ResultSetMetaData. That's going to be both easier and more reliable than parsing the query text.
So, after having split on the delimiter (comma) to get column names between SELECT and FROM, check each one to see if it contains the string " AS " (you may need to account for upper and lowercase), then if present, split on that, and use the rightmost value.
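A sketch of that naive approach for the theoretical regex-only case in the question. It assumes a simple query: it does not handle commas inside function calls, aliases written without AS, quoted identifiers or subqueries, which is why the other answers recommend ResultSetMetaData or a real SQL parser. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NaiveColumnLabels {
    // Captures everything between the first SELECT and the first FROM (case-insensitive)
    private static final Pattern SELECT_LIST =
            Pattern.compile("(?is)^\\s*select\\s+(.*?)\\s+from\\s");
    // The AS keyword surrounded by whitespace, in any letter case
    private static final Pattern AS_KEYWORD = Pattern.compile("(?i)\\s+as\\s+");

    static List<String> labels(String query) {
        Matcher m = SELECT_LIST.matcher(query);
        if (!m.find()) {
            throw new IllegalArgumentException("Not a simple SELECT ... FROM query: " + query);
        }
        List<String> labels = new ArrayList<>();
        for (String column : m.group(1).split(",")) {
            String[] parts = AS_KEYWORD.split(column.trim());
            // Rightmost piece: the alias if there is one, otherwise the column itself
            labels.add(parts[parts.length - 1].trim());
        }
        return labels;
    }

    public static void main(String[] args) {
        System.out.println(labels("SELECT foo, bar AS barAlias FROM table")); // [foo, barAlias]
    }
}
```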
While I agree with others you should probably use ResultSetMetaData in "ordinary" use-cases, you seem to insist that the query string itself should be parsed. Parsing SQL with regular expressions is brittle, and you should avoid doing that. Better use an off-the-shelf SQL parser, which can do the parsing for you. There are several, including:
jOOQ
JSqlParser
General SQL Parser
With jOOQ, you could just write this:
ctx.parser()
.parseSelect("SELECT a, b ...")
.getSelect()
.stream()
.forEach(f -> System.out.println(f.getName()));
Disclaimer: I work for the company behind jOOQ

JDBC - Best way for performance of a preparedstatement with optional criteria

I am trying to improve the performance of my application. To do this, I am replacing the current statements with prepared statements to use the Oracle cache.
But several of my queries are dynamic (conditional criteria and tables) and I would like to know the best solution to reduce the execution time.
On Stack Overflow, I found two solutions.
The first is to build the query with if statements:
if(a.equals("Test")) {
whereClause.append ("COLUMN1 = ? ");
}
if(b.equals("Test2")) {
whereClause.append ("COLUMN2 = ? ");
} etc...
The second is to create one query with all the criteria, but with an "OR var is null" check added to each:
where (? is null OR (COLUMN1 = ?)) AND (? is null OR (COLUMN2 = ?)) etc.
In your opinion, which is the best approach for performance, knowing that a request may contain 40 to 50 criteria and that some tables can be optional too? Or is there another solution?
The database I use is Oracle, accessed through a DataSource, with JDK 1.5.
Thank you in advance for your help.
First, JDBC and Oracle are capable of handling and caching multiple prepared statements.
First approach:
if(a.equals("Test")) {
whereClause.append ("COLUMN1 = ? ");
}
if(b.equals("Test2")) {
whereClause.append ("COLUMN2 = ? ");
}
will certainly lead to more distinct statements, and thus to more prepared statements, but each of these statements can be optimized at preparation time because no variable is optional.
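A sketch of the first approach in full, assuming criteria arrive as column/value pairs in which null means "not used". Unlike the fragment in the question, it inserts WHERE and AND between conditions and keeps the bound values in the same order as the ? markers. Column names must come from trusted code, never from user input, because they are spliced into the SQL text. The class is hypothetical and uses Java 7+ syntax (on the JDK 1.5 mentioned in the question, the generic types in the diamond <> would have to be spelled out).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class DynamicWhere {
    final String sql;           // SQL text with one ? per criterion actually used
    final List<Object> params;  // values in the same order as the ? markers

    private DynamicWhere(String sql, List<Object> params) {
        this.sql = sql;
        this.params = params;
    }

    // criteria maps a trusted column name to its value; null values are skipped
    static DynamicWhere build(String baseSql, Map<String, Object> criteria) {
        StringBuilder sql = new StringBuilder(baseSql);
        List<Object> params = new ArrayList<>();
        String joiner = " WHERE ";
        for (Map.Entry<String, Object> e : criteria.entrySet()) {
            if (e.getValue() == null) {
                continue; // optional criterion not used: leave it out of the SQL entirely
            }
            sql.append(joiner).append(e.getKey()).append(" = ?");
            params.add(e.getValue());
            joiner = " AND ";
        }
        return new DynamicWhere(sql.toString(), params);
    }

    public static void main(String[] args) {
        Map<String, Object> criteria = new LinkedHashMap<>();
        criteria.put("COLUMN1", "Test");
        criteria.put("COLUMN2", null);
        criteria.put("COLUMN3", 42);
        DynamicWhere q = build("SELECT * FROM my_table", criteria);
        System.out.println(q.sql);    // SELECT * FROM my_table WHERE COLUMN1 = ? AND COLUMN3 = ?
        System.out.println(q.params); // [Test, 42]
        // With a real connection: ps.setObject(i + 1, q.params.get(i)) for each i
    }
}
```

A LinkedHashMap keeps the criteria in a fixed order, so the same combination always produces the same SQL text, which is what gives a cached statement a chance to be reused.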
Second approach:
where (? is null OR (COLUMN1 = ?)) AND (? is null OR (COLUMN2 = ?)) etc.
will result in exactly one prepared statement, but during prepare the query-plan needs to include the possibility that all parameters are non-null.
So it really becomes a matter of benchmarking and of how many optional parameters there are. If you have too many prepared statements, some might not be cached (but those might also be used so seldom that this does not affect overall performance).
I also don't know how much overhead the optional parameters in the second approach create, since they may be evaluated for each row if the optimizer cannot remove them.
Finally, if you have indexes that cover some combinations of your optional parameters, the first approach should be faster, because the possible indexes are evaluated during preparation and the best one is chosen (query plan).
The biggest win of a prepared statement is that the query plan (optimization) is computed ahead of time. To be honest, I doubt the gain will be very big if you have 20 criteria like that and only a few are actually used. Why don't you test it and see which is quickest?
Also, I think making a stored procedure out of this could be very advantageous.
I would use a PreparedStatement with a dynamically built query and no additional logic. If a query repeats, JDBC will find the previously executed, cached statement.
I agree with Evgeniy: PreparedStatements are the way to go. If you have many parameters, upkeep is easier if you use named parameters instead of question marks. See: http://www.javaworld.com/article/2077706/core-java/named-parameters-for-preparedstatement.html. I have used this with about 20 criteria.
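The linked article has its own implementation; the sketch below is not that code, just a minimal, hypothetical illustration of the core idea: rewrite :name placeholders to ? and remember which name sits at each position, so one value can be bound at every index where its name appears. It ignores quoted strings and comments, and the StringBuilder overload of appendReplacement needs Java 9 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NamedParams {
    // :name where name starts with a letter; quoted strings, comments and
    // "::" casts are NOT handled by this sketch
    private static final Pattern NAMED = Pattern.compile(":([A-Za-z][A-Za-z0-9_]*)");

    final String jdbcSql;      // SQL with ? placeholders, ready for prepareStatement
    final List<String> order;  // parameter name for each ?, in position order

    NamedParams(String namedSql) {
        Matcher m = NAMED.matcher(namedSql);
        StringBuilder sb = new StringBuilder();
        List<String> names = new ArrayList<>();
        while (m.find()) {
            names.add(m.group(1));
            m.appendReplacement(sb, "?");
        }
        m.appendTail(sb);
        this.jdbcSql = sb.toString();
        this.order = names;
    }

    // 1-based JDBC indexes at which the value for this name must be bound
    List<Integer> indexesOf(String name) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < order.size(); i++) {
            if (order.get(i).equals(name)) {
                result.add(i + 1);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        NamedParams p = new NamedParams(
                "SELECT * FROM t WHERE (:c1 IS NULL OR COLUMN1 = :c1) AND COLUMN2 = :c2");
        System.out.println(p.jdbcSql); // SELECT * FROM t WHERE (? IS NULL OR COLUMN1 = ?) AND COLUMN2 = ?
        System.out.println(p.indexesOf("c1")); // [1, 2]
    }
}
```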

Numerical wildcard for jdbc prepareStatement?

I'm running a query with a PreparedStatement in JDBC. Sometimes I would like to insert a "numerical wildcard" instead of an actual number.
Consider a query like this:
Domains: a int, b text;
pStatement =
dbConnection.prepareStatement("SELECT * FROM R1 WHERE a LIKE ? AND b LIKE ?");
Sometimes I would like to:
pStatement.setInt(1, 10);
pStatement.setString(2,"%");
pStatement.executeQuery();
That is no problem, since the wildcard is a string.
Other times I would like to:
pStatement.setInt(1, ANY_INT_SHOULD_BE_VALID);
pStatement.setString(2, "Hello");
pStatement.executeQuery();
That does not work. I could change the query and use i.e. "a <> 0" but that requires extra code and makes the use of a prepareStatement somewhat unnecessary.
Is there a way to solve this without changing the actual query, only the inserted values?
I could change the query and use i.e. "a <> 0"
For your requirement, this should be IS NOT NULL.
but that requires extra code and makes the use of a prepareStatement somewhat unnecessary.
A PreparedStatement is needed most for the VARCHAR column, and that column seems to be present in all scenarios.
pStatement.setString(2,"%");
This wouldn't match all the strings, as the LIKE operator is needed for % to take on its special meaning.
Is there a way to solve this without changing the actual query, only the inserted values?
No. IMO, since the requirements are fundamentally different, you would need separate queries accordingly.
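A minimal sketch of separate queries for this case, following the IS NOT NULL suggestion above: a null Integer stands for "any value of a", and the code picks the SQL text accordingly. Using a = ? rather than a LIKE ? for the int column is an assumption here, and the class name is hypothetical.

```java
final class R1Queries {
    // Assumption: an equality test is used for the int column a instead of LIKE
    static final String BY_A_AND_B = "SELECT * FROM R1 WHERE a = ? AND b LIKE ?";
    // "Any int" for a, expressed as IS NOT NULL
    static final String ANY_A = "SELECT * FROM R1 WHERE a IS NOT NULL AND b LIKE ?";

    // a == null means "any value of a": pick the query without the a parameter
    static String sqlFor(Integer a) {
        return (a == null) ? ANY_A : BY_A_AND_B;
    }

    public static void main(String[] args) {
        System.out.println(sqlFor(10));   // SELECT * FROM R1 WHERE a = ? AND b LIKE ?
        System.out.println(sqlFor(null)); // SELECT * FROM R1 WHERE a IS NOT NULL AND b LIKE ?
    }
}
```

For the first query you would then call setInt(1, a) and setString(2, b); for the second, only setString(1, b).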

How to create a multiple search SQL statement where all the parameters are optional?

I would like to know if there is any smart way of making a SQL statement for a search engine with 5 optional parameters. All the parameters can be used, or only one of them, or any mix of them. This makes up to 32 (2^5) different combinations.
The statement needs to be prepared to avoid SQL injections.
I've looked at this post, but it doesn't quite cut it.
What I'm looking for is something like,
String sql = "SELECT * FROM table WHERE (optional1)=? AND (optional2)=? AND (optional3)=? AND (optional4)=? AND (optional5)=?";
prepared.setString(1, optional1);
and so on...
Use your java code to add the options to the where clause based on the presence of your arguments (their length or existence, whichever). That way if the optional parameter is not needed, it won't even be part of your SQL expression. Simple.
@a1ex07 has given the answer for doing this as a single query: use NULLs and check for them in each condition.
WHERE
table.x = CASE WHEN @x IS NULL THEN table.x ELSE @x END
or...
WHERE
(@x IS NULL OR table.x = @x)
or...
WHERE
table.x = COALESCE(@x, table.x)
etc, etc.
There is one warning, however: as convenient as it is to make one query do all of this, all of these answers are sub-optimal. Often they're horrendous.
When you write ONE query, only ONE execution plan is created. And that ONE execution plan must be suitable for ALL possible combinations of values. But that fixes which indexes are searched, what order they're searched, etc. It yields the least worst plan for a one-size-fits-all query.
Instead, you're better off adding the conditions as necessary. You still parameterise them, but you don't include a condition if you know the parameter is NULL.
This is a good link explaining it further. It's for MS SQL Server specifically, but it's generally applicable to any RDBMS that caches plans after compiling the SQL.
http://www.sommarskog.se/dyn-search.html
I believe this should work (I haven't tested it, though):
SELECT * FROM table
WHERE
field1 = CASE
WHEN ? IS NULL THEN field1
ELSE ?
END AND
field2 = CASE
WHEN ? IS NULL THEN field2
ELSE ?
END AND .... etc
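// Java code: each optional value is bound twice (for "? IS NULL" and for "ELSE ?")
if (optional1 != null) {
    prepared.setString(1, optional1);
    prepared.setString(2, optional1);
} else {
    prepared.setNull(1, java.sql.Types.VARCHAR);
    prepared.setNull(2, java.sql.Types.VARCHAR);
}
// ...and likewise for optional2 (indexes 3 and 4), etc.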
