Regex to Parse DB Columns like JDBC ResultSet - java

I am using JDBC to get columns from query results.
For example:
String query = "SELECT foo, bar AS barAlias FROM table";
PreparedStatement preparedStatement = connection.prepareStatement(query);
ResultSet resultSet = preparedStatement.executeQuery();
//Show my "foo" column results
while (resultSet.next()) {
log.debug(resultSet.getString("foo"));
}
I would like to parse my queries BEFORE running them. Essentially, I want to create an array of column labels that match the values expected by the resultSet.get** methods. For illustration purposes, I would like to replace the loop above with this and get the same results:
//Show my "foo" column results
while (resultSet.next()) {
log.debug(resultSet.getString(arrayOfColumns.get(0)));
}
This seems easy. I could parse my statement with a simple regex that takes the string between SELECT and FROM, creating groups using the column delimiter, and building arrayOfColumns from the groups. But JDBC has specific ways to deal with aliases for example. For the second column, this naive parser would return the entire "bar AS barAlias" where I believe that resultSet.get** expects "barAlias". It will tell me "The column name bar As barAlias is not valid."
Essentially, I would like to understand how JDBC resolves column labels so that I can replicate it.
NOTE: I am aware that I can get columns by index from the resultSet object. That is not the objective here. Also, I do not want to use any JDBC-related method (I understand that the preparedStatement metadata is available). Think of this as a theoretical question where I do not have a JDBC library and MUST use regular expressions.

The best way of obtaining the columns of the result set is to ask JDBC to do it for you. You do not need a ResultSet object for it - ResultSetMetaData would be enough.
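PreparedStatement preparedStatement = connection.prepareStatement(query);
// getMetaData() works without executing the query, but may return null if the driver cannot describe the result in advance
ResultSetMetaData metaData = preparedStatement.getMetaData();
List<String> listOfColumns = new ArrayList<>();
for (int i = 1; i <= metaData.getColumnCount(); i++) {
    listOfColumns.add(metaData.getColumnLabel(i)); // the label reflects an AS alias if there is one
}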

To get the column names that are associated with your ResultSet, you should consult its ResultSetMetaData. That's going to be both easier and more reliable than parsing the query text.

So, after having split on the delimiter (comma) to get column names between SELECT and FROM, check each one to see if it contains the string " AS " (you may need to account for upper and lowercase), then if present, split on that, and use the rightmost value.
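A minimal sketch of that approach, assuming simple single-table queries. It extracts the text between SELECT and the first FROM with a regex, splits it on top-level commas (so COALESCE(a, b) stays one item), takes the identifier after a trailing AS, and otherwise strips a table prefix such as t.foo. The class and method names are illustrative. It does not handle string literals containing commas, subqueries in the select list, DISTINCT, quoted identifiers, or aliases written without AS; for unaliased expressions like COUNT(*) the real label is driver-specific, so the sketch just returns the expression text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SelectLabelParser {
    // Everything between the leading SELECT and the first FROM that follows it.
    private static final Pattern SELECT_LIST = Pattern.compile(
            "^\\s*SELECT\\s+(.*?)\\s+FROM\\s", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    // An explicit alias at the end of a select item: "... AS barAlias".
    private static final Pattern AS_ALIAS = Pattern.compile(
            "\\s+AS\\s+(\\S+)$", Pattern.CASE_INSENSITIVE);
    // A plain or table-qualified column such as "foo" or "t.foo".
    private static final Pattern QUALIFIED = Pattern.compile("(?:\\w+\\.)*(\\w+)");

    static List<String> labels(String sql) {
        Matcher select = SELECT_LIST.matcher(sql);
        if (!select.find()) {
            throw new IllegalArgumentException("Not a simple SELECT ... FROM query: " + sql);
        }
        List<String> labels = new ArrayList<>();
        for (String item : splitTopLevel(select.group(1))) {
            Matcher alias = AS_ALIAS.matcher(item);
            Matcher column = QUALIFIED.matcher(item);
            if (alias.find()) {
                labels.add(alias.group(1));   // "bar AS barAlias" -> "barAlias"
            } else if (column.matches()) {
                labels.add(column.group(1));  // "t.foo" -> "foo"
            } else {
                labels.add(item);             // unaliased expression: real label is driver-specific
            }
        }
        return labels;
    }

    // Splits on commas that are not nested in parentheses, so COALESCE(a, b) stays one item.
    private static List<String> splitTopLevel(String selectList) {
        List<String> items = new ArrayList<>();
        int depth = 0;
        int start = 0;
        for (int i = 0; i < selectList.length(); i++) {
            char c = selectList.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
            } else if (c == ',' && depth == 0) {
                items.add(selectList.substring(start, i).trim());
                start = i + 1;
            }
        }
        items.add(selectList.substring(start).trim());
        return items;
    }

    public static void main(String[] args) {
        System.out.println(labels("SELECT foo, bar AS barAlias FROM table"));
    }
}
```

The depth counter is what keeps commas inside function calls from being treated as column separators; a plain String.split(",") would break COALESCE(a, b) into two bogus columns.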

While I agree with the others that you should probably use ResultSetMetaData in "ordinary" use cases, you seem to insist that the query string itself should be parsed. Parsing SQL with regular expressions is brittle, and you should avoid it. It is better to use an off-the-shelf SQL parser, which can do the parsing for you. There are several, including:
jOOQ
JSqlParser
General SQL Parser
With jOOQ, you could just write this:
ctx.parser()
.parseSelect("SELECT a, b ...")
.getSelect()
.stream()
.forEach(f -> System.out.println(f.getName()));
Disclaimer: I work for the company behind jOOQ

Related

Java SQL pattern to find some tables in my database

I'm using database metadata to find some tables in a given schema in database like that:
DatabaseMetaData dbmd = connections.getMetaData();
ResultSet rs = dbmd.getTables(null,"schema_name","table_name_pattern","type");
It works, but my problem is that I only want to find tables that begin with t and three others tables for which I have the exact names:
books_table, froots, and colors.
How can I make a pattern that gives me only these three tables and the tables that begin with t?
I had a similar problem some time ago, and the answer is no, you can't do it this way.
The JDBC metadata classes have only very limited functionality.
However, you can query the engine metadata directly. For example you can do:
select *
from pg_catalog.pg_tables
where schemaname = 'public'
  and (tablename like 't%' or tablename in ('books_table', 'froots', 'colors'))
The parameter is interpreted as a wildcard value for a LIKE condition, so to find all tables starting with t, use t%.
Quote from the Javadoc:
Some DatabaseMetaData methods take arguments that are String patterns. These arguments all have names such as fooPattern. Within a pattern String, "%" means match any substring of 0 or more characters, and "_" means match any one character.
Note that this is case-sensitive. If you created your tables without quotes (which is what you should be doing), they are stored in lowercase (at least in PostgreSQL; Oracle, for example, folds unquoted names to uppercase), so t% will work fine.
The last parameter should be a string array with all the "object types" you want. As you want tables, use {"TABLE"}. Alternatively, if you don't care about the type, you can pass null.
DatabaseMetaData dbmd = connections.getMetaData();
ResultSet rs = dbmd.getTables(null,"schema_name","t%",new String[]{"TABLE"});
However, you can't specify multiple OR conditions. If you want to check for other tables whose names can't be matched with a wildcard, you will have to call getTables() once for each table name.
Another option is to simply get all tables from that schema, then discard those that you don't want while processing the ResultSet. That is most probably faster than using multiple calls to getTables()
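A minimal sketch of that last option, assuming the table names from the question: one getTables() call with the pattern "%" for the whole schema, filtered in Java while reading the ResultSet. TABLE_NAME is one of the standard columns returned by getTables(); TableFinder, isWanted and findTables are illustrative names. The main method only exercises the filter, so it runs without a database.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class TableFinder {
    // Exact names from the question.
    static final Set<String> EXACT_NAMES = Set.of("books_table", "froots", "colors");

    // The "t% OR IN (...)" condition that a single getTables() pattern cannot express.
    static boolean isWanted(String tableName) {
        return tableName.startsWith("t") || EXACT_NAMES.contains(tableName);
    }

    // One metadata call for the whole schema, filtered while reading the ResultSet.
    static List<String> findTables(Connection con, String schema) throws SQLException {
        List<String> found = new ArrayList<>();
        DatabaseMetaData dbmd = con.getMetaData();
        try (ResultSet rs = dbmd.getTables(null, schema, "%", new String[] {"TABLE"})) {
            while (rs.next()) {
                String name = rs.getString("TABLE_NAME");
                if (isWanted(name)) {
                    found.add(name);
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Runs without a database: exercise the filter on sample names (case-sensitive).
        for (String name : List.of("tasks", "books_table", "users", "colors", "Tickets")) {
            System.out.println(name + " -> " + isWanted(name));
        }
    }
}
```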

How to get rid of many placeholders (question marks) in java sql statements

I want to insert a row into a table using JDBC, but the table has a large number of columns, so I don't want to spell all of them out in the statement. Is there another way of specifying column values?
INSERT INTO TABLE_NAME VALUES(?, ?, ?, ..., ?)
I hope there is a way to do it without a large number of placeholders.
EDITED:
I mean, can I just write the query with something like ... meaning that it expects many values, and then set the values dynamically with an incrementing index or something similar? And yes, as @f1sh mentioned, the problem is the number of placeholders, but not out of laziness: it is more about convenience and cleanliness.
P.S.: for some funny enterprise requirements reasons I can not use JPA frameworks :)
EDIT
Another option is to write your own query builder or use SqlBuilder:
SqlBuilder changes that whole scenario by wrapping the SQL syntax within very lightweight and easy to use Java objects which follow the "builder" paradigm (similar to StringBuilder). This changes many common SQL syntactical, runtime errors into Java compile-time errors! Let's dive right in to some quick examples to see how it all works.
You can use Spring's NamedParameterJdbcTemplate and use named parameters instead of question marks:
String sql = "INSERT INTO TABLE_NAME VALUES (:id, ...)";
MapSqlParameterSource parameters = new MapSqlParameterSource();
parameters.addValue("id", "idValue");
//...
boolean inserted = namedParameterJdbcTemplate.update(sql, parameters) > 0;
Template class with a basic set of JDBC operations, allowing the use of named parameters rather than traditional '?' placeholders.
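If neither Spring nor a query builder is an option, the placeholder list itself can be generated, which is close to what the question's edit asks for. A minimal sketch with hypothetical helper names: insertSql builds one "?" per column, and bindAll sets the values by incrementing index with setObject. The table name is concatenated into the SQL, so it must come from your own code, never from user input.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Collections;

class InsertSqlBuilder {
    // Builds "INSERT INTO <table> VALUES (?, ?, ..., ?)" with one "?" per column.
    static String insertSql(String table, int columnCount) {
        if (columnCount < 1) {
            throw new IllegalArgumentException("columnCount must be at least 1");
        }
        String placeholders = String.join(", ", Collections.nCopies(columnCount, "?"));
        return "INSERT INTO " + table + " VALUES (" + placeholders + ")";
    }

    // Binds values in order; JDBC parameter indexes start at 1.
    static void bindAll(PreparedStatement ps, Object... values) throws SQLException {
        for (int i = 0; i < values.length; i++) {
            ps.setObject(i + 1, values[i]);
        }
    }

    public static void main(String[] args) {
        System.out.println(insertSql("TABLE_NAME", 4));
    }
}
```

Typical use would be: try (PreparedStatement ps = con.prepareStatement(InsertSqlBuilder.insertSql("TABLE_NAME", values.length))) { InsertSqlBuilder.bindAll(ps, values); ps.executeUpdate(); }, where con and values stand for the connection and Object[] you already have.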
If you can (safely) build the query dynamically, use a plain Statement:
Statement stmt = con.createStatement();
ResultSet rs = stmt.executeQuery("SELECT username, password FROM users WHERE username='" + user + "' AND password='" + pass + "' LIMIT 0,1");
Only do this with values from a closed, trusted set (never raw user input); otherwise the query is open to SQL injection.

Using Variable as a column in sql query

Here is my code. I am trying to use a variable instead of a column name here,
but I get the exception below. How can I resolve this error?
You can't bind table/column names in a prepared statement, nor would you normally want to allow this. Here is a working version of your code:
String query = "UPDATE report SET itemno = ?";
pst = con.prepareStatement(query);
pst.setInt(1, dqty);
pst.executeUpdate();
Notes:
You almost certainly want to add a WHERE clause to your update, without which it would affect every record in the table. With prepared statements, you don't need to worry about escaping your literal data. Just let Java handle this for you.
If you really need the ability to update other table/column combinations, then just create other statements for that. One size fits all works at 7-Eleven, but not with JDBC, since you would open yourself up to SQL injection.
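If the column really has to vary at run time, one common way to keep it safe is to check the name against a fixed whitelist before concatenating it, while still binding the values with ?. A minimal sketch; the extra columns (qty, price) and the id column in the WHERE clause are hypothetical.

```java
import java.util.Set;

class ReportUpdates {
    // Hypothetical whitelist: the only columns this code is allowed to update.
    private static final Set<String> UPDATABLE_COLUMNS = Set.of("itemno", "qty", "price");

    static String updateSql(String column) {
        // Set.of(...).contains(null) would throw, so reject null explicitly.
        if (column == null || !UPDATABLE_COLUMNS.contains(column)) {
            throw new IllegalArgumentException("Column not allowed: " + column);
        }
        // Safe to concatenate: column is one of the fixed strings above. Values still use '?'.
        return "UPDATE report SET " + column + " = ? WHERE id = ?";
    }

    public static void main(String[] args) {
        System.out.println(updateSql("itemno"));
        try {
            updateSql("itemno = 0; DROP TABLE report; --");
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```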

How to make PreparedStatement.setXXX() dynamic based on type of column values

I have to update a table using values from a data file that contains all the rows. I am currently using JDBC batches. The data files contain hundreds of columns and millions of rows.
For example, to keep it simple, say the data file contains 3 columns and two rows:
1,ABC,DEF
2,GHI,JKL
PreparedStatement pstmt = connection.prepareStatement(insert);
//how to find type
pstmt.setInt(1, 2);
pstmt.setString(2, "GHI");
pstmt.setString(3, "JKL");
pstmt.addBatch();
pstmt.executeBatch();
Now my question is: at run time, based on the data coming from the data file, how do I find out whether I need to call setInt or setString, and, more importantly, how many times I need to call setXXX before addBatch()? It seems that I need a dedicated PreparedStatement for each table. More importantly, I need to find out at run time how many times I should call setObject, based on the number of columns in the data file. Is there any way I can make this generic?
I am new to JDBC please guide. Thanks in advance.
You can use setObject(int index, Object obj). JDBC then determines the correct type.
PreparedStatement has a method setObject(int, Object). The documentation states:
If arbitrary parameter type conversions are required, the method setObject should be used with a target SQL type.
If you have an SQL statement like
Select * From table Where value1 = ? and value2 = ?
you have to call the setXXX methods twice. They specify the values for the placeholders (marked by ?) in the SQL statement that the PreparedStatement instance represents. The number of calls therefore depends on the SQL statement referenced by your variable insert. The int argument of the setXXX methods is the position of the placeholder in the SQL statement, with setXXX(1, object) referring to the first placeholder, and so on.
Of course, you have to repeat the same number of setXXX calls for each row you add to the batch that you execute at the end.
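A sketch of a generic loader for the question's file format, assuming a simple comma-separated file with no quoted fields, and a target SQL type per column known up front (from your own configuration, or from ps.getParameterMetaData().getParameterType(i), which not every driver supports). The number of setObject calls is simply the number of fields in the line, and setObject(int, Object, int) asks the driver to convert each String to the target type, as the documentation quoted above suggests. So that it runs without a database, main uses a java.lang.reflect.Proxy stand-in that only records the calls; bindLine and recorder are illustrative names.

```java
import java.lang.reflect.Proxy;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Types;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CsvBatchBinder {
    // Binds one CSV line: one setObject call per field, then addBatch().
    // sqlTypes[i] is the java.sql.Types code for parameter i + 1.
    // Returns the number of setObject calls made, i.e. the number of fields.
    static int bindLine(PreparedStatement ps, String line, int[] sqlTypes) throws SQLException {
        String[] fields = line.split(",", -1); // -1 keeps trailing empty fields
        if (fields.length != sqlTypes.length) {
            throw new SQLException("Expected " + sqlTypes.length + " fields but got "
                    + fields.length + ": " + line);
        }
        for (int i = 0; i < fields.length; i++) {
            // The driver converts the String to the target SQL type (e.g. "1" -> INTEGER).
            ps.setObject(i + 1, fields[i], sqlTypes[i]);
        }
        ps.addBatch();
        return fields.length;
    }

    // A stand-in PreparedStatement that only records method calls (no database needed).
    static PreparedStatement recorder(List<String> calls) {
        return (PreparedStatement) Proxy.newProxyInstance(
                CsvBatchBinder.class.getClassLoader(),
                new Class<?>[] {PreparedStatement.class},
                (proxy, method, methodArgs) -> {
                    calls.add(method.getName()
                            + (methodArgs == null ? "" : Arrays.toString(methodArgs)));
                    return null; // fine here: only void methods are invoked
                });
    }

    public static void main(String[] args) throws SQLException {
        int[] types = {Types.INTEGER, Types.VARCHAR, Types.VARCHAR};
        List<String> calls = new ArrayList<>();
        PreparedStatement ps = recorder(calls);
        for (String line : List.of("1,ABC,DEF", "2,GHI,JKL")) {
            bindLine(ps, line, types);
        }
        calls.forEach(System.out::println);
        // With a real statement you would finish with ps.executeBatch().
    }
}
```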
You can use something like the snippet below; check the setObject documentation for further details. Here rs is a ResultSet obtained by executing a query against one table, and query is an INSERT or UPDATE statement for a different table. The example selects from one table and inserts into another while identifying the column types dynamically. Note: the column types of the two tables must match, otherwise an exception will be thrown.
PreparedStatement statement = connection.prepareStatement( query );
ResultSetMetaData rsmd = rs.getMetaData();
while( rs.next() )
{
    for( int i = 1 ; i <= rsmd.getColumnCount() ; i++ )
    {
        statement.setObject( i, rs.getObject(i), rsmd.getColumnType(i) );
    }
    statement.addBatch(); // queue this row
}
statement.executeBatch(); // send all queued rows at once

Doing Java String replacement efficiently

We have the following code :
String templateQuery = "select * from my_table where col1=$1 or col2 like '%$2.$1'";
String tmp = templateQuery;
for(int i=1;i<=maxCols;i++) {
tmp = tmp.replaceAll("\\$"+i, data[i-1]);
}
This code works fine, as maxCols never exceeds 10. But my colleague disagrees, saying that this code consumes too much memory. Can you help us?
EDIT:
I have replaced the initial templateQuery with a more realistic one. Secondly, templateQuery can potentially be a big string.
EDIT 2:
Thanks to those who have pointed out the SQL injection problem.
Don't do this.
Not for performance reasons (which will be minuscule compared with the cost of the database query), but to avoid SQL injection attacks. What happens if data[0] is actually the string
' OR 'x' = 'x
?
Then you'll end up with a SQL statement of:
SELECT * FROM my_table WHERE col1='' OR 'x' = 'x'
which I think we can agree isn't what you wanted.
Use a parameterized SQL statement instead (PreparedStatement) and get the database driver to send the parameter values separately.
EDIT: In other comments, the OP has specified that the template string can be quite long, and some parameters may actually involve multiple initial values combined together. I still say that the cost of replacement is likely to be insignificant in the grand scheme of things, and I still say that PreparedStatement is the way to go. You should perform whatever combining operations you need to on the input before setting them as the values for the PreparedStatement - so the template may need the SQL with SQL placeholders, and then "subtemplates" to work out how to get from your input to the parameters for the PreparedStatement. Whatever you do, putting the values directly into the SQL is the wrong approach.
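A sketch of that idea applied to the question's template, assuming data[0] and data[1] hold the $1 and $2 values: the SQL keeps only placeholders, and a small "subtemplate" builds the LIKE argument "%" + data[1] + "." + data[0] in Java. TemplateQuery, parameters and prepare are illustrative names. Note that % and _ inside the data still act as LIKE wildcards; escaping them is a separate concern.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Arrays;

class TemplateQuery {
    // $1 and $2 become '?'; the quotes and the '%' move out of the SQL text.
    static final String SQL = "SELECT * FROM my_table WHERE col1 = ? OR col2 LIKE ?";

    // The "subtemplate": computes the bound values from the raw input.
    // '%$2.$1' in the original template becomes "%" + data[1] + "." + data[0].
    static String[] parameters(String[] data) {
        return new String[] {data[0], "%" + data[1] + "." + data[0]};
    }

    // Prepares and binds the statement; the caller executes and closes it.
    static PreparedStatement prepare(Connection con, String[] data) throws SQLException {
        PreparedStatement ps = con.prepareStatement(SQL);
        try {
            String[] params = parameters(data);
            for (int i = 0; i < params.length; i++) {
                ps.setString(i + 1, params[i]); // sent separately, never spliced into the SQL
            }
            return ps;
        } catch (SQLException e) {
            ps.close();
            throw e;
        }
    }

    public static void main(String[] args) {
        // Runs without a database: show the SQL and the values that would be bound.
        System.out.println(SQL);
        System.out.println(Arrays.toString(parameters(new String[] {"42", "abc"})));
    }
}
```

With a connection con, the caller would write try (PreparedStatement ps = TemplateQuery.prepare(con, data); ResultSet rs = ps.executeQuery()) { ... } so both are closed. Since the SQL text never changes, the driver or database may also be able to reuse the prepared statement.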
Why aren't you just using a PreparedStatement with replacement parameters?
String templateQuery = "SELECT * FROM my_table WHERE col1 = ?";
PreparedStatement ps = con.prepareStatement(templateQuery);
for (int i = 0; i < data.length; i++) {
ps.setString(i + 1, data[i]);
}
ResultSet rs = ps.executeQuery();
You're otherwise vulnerable to SQL injection if you use string replacement like you have.
Your colleague is correct, because you create up to maxCols intermediate tmp Strings.
I see that this is for SQL commands; if so, why don't you use PreparedStatement (http://download.oracle.com/javase/1.4.2/docs/api/java/sql/PreparedStatement.html) for this task?
Also, for formatting strings, rather than substituting, use Formatter; it is much more elegant: http://download.oracle.com/javase/1.5.0/docs/api/java/util/Formatter.html
Whether this consumes too much memory is open to debate (what's "too much"?)
Nonetheless, for this kind of stuff you should use PreparedStatement. It allows you to do pretty much exactly what you're trying to achieve, but in a much cleaner fashion.
Your colleague is right in that every string replacement creates a new copy of the string. (However, the cost of these is probably negligible with less than 10 parameters.) Moreover, for every execution of this query the SQL engine needs to parse it afresh, which consumes far more additional resources each time.
The potentially bigger problem, though, is that the code is susceptible to SQL injection. If the input data comes from an external source, an attacker can pass in a parameter such as "col1; drop table my_table;", effectively deleting your whole table.
All of these can be solved by using a PreparedStatement instead.
