I'm trying to put some anti-SQL-injection measures in place in Java and am finding it very difficult to work with the "replaceAll" string function. Ultimately I need a function that will convert any existing \ to \\, any " to \", any ' to \', and any \n to \\n so that when the string is evaluated by MySQL, SQL injections will be blocked.
I've jacked up some code I was working with and all the \\\\\\\\\\\ in the function are making my eyes go nuts. If anyone happens to have an example of this I would greatly appreciate it.
PreparedStatements are the way to go, because they make SQL injection impossible. Here's a simple example taking the user's input as the parameters:
public void insertUser(String name, String email) throws SQLException {
Connection conn = null;
PreparedStatement stmt = null;
try {
conn = setupTheDatabaseConnectionSomehow();
stmt = conn.prepareStatement("INSERT INTO person (name, email) values (?, ?)");
stmt.setString(1, name);
stmt.setString(2, email);
stmt.executeUpdate();
}
finally {
try {
if (stmt != null) { stmt.close(); }
}
catch (Exception e) {
// log this error
}
try {
if (conn != null) { conn.close(); }
}
catch (Exception e) {
// log this error
}
}
}
No matter what characters are in name and email, those characters will be placed directly in the database. They won't affect the INSERT statement in any way.
There are different set methods for different data types -- which one you use depends on what your database fields are. For example, if you have an INTEGER column in the database, you should use a setInt method. The PreparedStatement documentation lists all the different methods available for setting and getting data.
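As a rough illustration of choosing a setter per column type, here is a minimal sketch. The age and balance columns, the class name TypedParams and the method bindPerson are hypothetical and not part of the original example; only the JDBC calls themselves (setString, setInt, setNull, setBigDecimal) are standard.

```java
import java.math.BigDecimal;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Types;

// Hypothetical helper: binds three parameters of different SQL types.
final class TypedParams {

    // Assumes a statement such as:
    //   INSERT INTO person (name, age, balance) VALUES (?, ?, ?)
    // where age is an INTEGER column and balance is a DECIMAL column.
    static void bindPerson(PreparedStatement stmt, String name, Integer age, BigDecimal balance)
            throws SQLException {
        stmt.setString(1, name);              // VARCHAR / TEXT column
        if (age == null) {
            stmt.setNull(2, Types.INTEGER);   // setInt cannot express SQL NULL
        } else {
            stmt.setInt(2, age);              // INTEGER column
        }
        stmt.setBigDecimal(3, balance);       // DECIMAL / NUMERIC column
    }
}
```

The values are sent as data in every case, so none of them can change the structure of the statement.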
The only way to prevent SQL injection is with parameterized SQL. It simply isn't possible to build a filter that's smarter than the people who hack SQL for a living.
So use parameters for all input, updates, and where clauses. Dynamic SQL is simply an open door for hackers, and that includes dynamic SQL in stored procedures. Parameterize, parameterize, parameterize.
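One case that often tempts people back into string concatenation is a WHERE ... IN (...) clause with a variable number of values. A possible sketch, reusing the person table from the earlier answer: only the placeholders are generated dynamically, while the values are still bound as parameters. The class and method names are made up for illustration.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical example: a parameterized IN clause of variable length.
final class InClauseExample {

    // Builds the SQL text. Only "?" placeholders are concatenated, never user data.
    static String buildQuery(int count) {
        if (count < 1) {
            throw new IllegalArgumentException("need at least one value");
        }
        String placeholders = String.join(", ", Collections.nCopies(count, "?"));
        return "SELECT email FROM person WHERE name IN (" + placeholders + ")";
    }

    // Binds every name as a parameter and collects the matching emails.
    static List<String> findEmails(Connection conn, List<String> names) throws SQLException {
        try (PreparedStatement stmt = conn.prepareStatement(buildQuery(names.size()))) {
            for (int i = 0; i < names.size(); i++) {
                stmt.setString(i + 1, names.get(i)); // JDBC parameters are 1-based
            }
            try (ResultSet rs = stmt.executeQuery()) {
                List<String> emails = new ArrayList<>();
                while (rs.next()) {
                    emails.add(rs.getString(1));
                }
                return emails;
            }
        }
    }
}
```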
If you really can't use Defense Option 1: Prepared Statements (Parameterized Queries) or Defense Option 2: Stored Procedures, don't build your own tool; use the OWASP Enterprise Security API. From the OWASP ESAPI hosted on Google Code:
Don’t write your own security controls! Reinventing the wheel when it comes to developing security controls for every web application or web service leads to wasted time and massive security holes. The OWASP Enterprise Security API (ESAPI) Toolkits help software developers guard against security‐related design and implementation flaws.
For more details, see Preventing SQL Injection in Java and SQL Injection Prevention Cheat Sheet.
Pay special attention to Defense Option 3: Escaping All User Supplied Input (which introduces the OWASP ESAPI project).
(This is in answer to the OP's comment under the original question; I agree completely that PreparedStatement is the tool for this job, not regexes.)
When you say \n, do you mean the sequence \+n or an actual linefeed character? If it's \+n, the task is pretty straightforward:
s = s.replaceAll("['\"\\\\]", "\\\\$0");
To match one backslash in the input, you put four of them in the regex string. To put one backslash in the output, you put four of them in the replacement string. This is assuming you're creating the regexes and replacements in the form of Java String literals. If you create them any other way (e.g., by reading them from a file), you don't have to do all that double-escaping.
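To make the double-escaping concrete, here is a small sketch that compares the regex-based call with String.replace and Matcher.quoteReplacement, both of which avoid most of the backslash counting. The class and method names are invented for this example, and this is only a demonstration of the string mechanics, not a recommended defence against SQL injection.

```java
import java.util.regex.Matcher;

// Hypothetical demo of regex escaping versus literal replacement.
final class BackslashDemo {

    // Regex version: puts one backslash in front of every ', " or \.
    static String escapeWithRegex(String s) {
        return s.replaceAll("['\"\\\\]", "\\\\$0");
    }

    // Literal version: String.replace does no regex processing, so each
    // backslash only needs the usual Java string-literal escaping.
    static String escapeLiteral(String s) {
        return s.replace("\\", "\\\\")   // must run first, or added backslashes get doubled
                .replace("'", "\\'")
                .replace("\"", "\\\"");
    }

    // Turns a real linefeed into the two characters backslash + n.
    // quoteReplacement escapes the replacement so it is taken literally.
    static String escapeLinefeed(String s) {
        return s.replaceAll("\n", Matcher.quoteReplacement("\\n"));
    }
}
```

For the input a'b"c\d, both escapeWithRegex and escapeLiteral produce a\'b\"c\\d.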
If you have a linefeed character in the input and you want to replace it with an escape sequence, you can make a second pass over the input with this:
s = s.replaceAll("\n", "\\\\n");
Or maybe you want two backslashes (I'm not too clear on that):
s = s.replaceAll("\n", "\\\\\\\\n");
PreparedStatements are the way to go in most, but not all cases. Sometimes you will find yourself in a situation where a query, or a part of it, has to be built and stored as a string for later use. Check out the SQL Injection Prevention Cheat Sheet on the OWASP Site for more details and APIs in different programming languages.
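Prepared Statements are the best solution, but if you really need to do it manually you could also use the StringEscapeUtils class from the Apache Commons-Lang library. It has an escapeSql(String) method, which you can use. Note that escapeSql only doubles single quotes (' becomes ''), so it does not handle backslashes, and that it exists only in Commons Lang 2.x; it was removed in Commons Lang 3.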
import org.apache.commons.lang.StringEscapeUtils;
…
String escapedSQL = StringEscapeUtils.escapeSql(unescapedSQL);
Using a regular expression to remove text which could cause a SQL injection sounds like the SQL statement is being sent to the database via a Statement rather than a PreparedStatement.
One of the easiest ways to prevent an SQL injection in the first place is to use a PreparedStatement, which accepts data to substitute into a SQL statement using placeholders, which does not rely on string concatenations to create an SQL statement to send to the database.
For more information, Using Prepared Statements from The Java Tutorials would be a good place to start.
You need the following code below. At a glance, this may look like any old code that I made up. However, what I did was look at the source code for http://grepcode.com/file/repo1.maven.org/maven2/mysql/mysql-connector-java/5.1.31/com/mysql/jdbc/PreparedStatement.java. Then after that, I carefully looked through the code of setString(int parameterIndex, String x) to find the characters which it escapes and customised this to my own class so that it can be used for the purposes that you need. After all, if this is the list of characters that Oracle escapes, then knowing this is really comforting security-wise. Maybe Oracle need a nudge to add a method similar to this one for the next major Java release.
public class SQLInjectionEscaper {
public static String escapeString(String x, boolean escapeDoubleQuotes) {
StringBuilder sBuilder = new StringBuilder(x.length() * 11/10);
int stringLength = x.length();
for (int i = 0; i < stringLength; ++i) {
char c = x.charAt(i);
switch (c) {
case 0: /* Must be escaped for 'mysql' */
sBuilder.append('\\');
sBuilder.append('0');
break;
case '\n': /* Must be escaped for logs */
sBuilder.append('\\');
sBuilder.append('n');
break;
case '\r':
sBuilder.append('\\');
sBuilder.append('r');
break;
case '\\':
sBuilder.append('\\');
sBuilder.append('\\');
break;
case '\'':
sBuilder.append('\\');
sBuilder.append('\'');
break;
case '"': /* Better safe than sorry */
if (escapeDoubleQuotes) {
sBuilder.append('\\');
}
sBuilder.append('"');
break;
case '\032': /* This gives problems on Win32 */
sBuilder.append('\\');
sBuilder.append('Z');
break;
case '\u00a5':
case '\u20a9':
// escape characters interpreted as backslash by mysql
// fall through
default:
sBuilder.append(c);
}
}
return sBuilder.toString();
}
}
In case you are dealing with a legacy system, or you have too many places to switch to PreparedStatements in too little time - i.e. if there is an obstacle to using the best practice suggested by other answers, you can try AntiSQLFilter
From:Source
public String MysqlRealScapeString(String str){
String data = null;
if (str != null && str.length() > 0) {
str = str.replace("\\", "\\\\");
str = str.replace("'", "\\'");
str = str.replace("\0", "\\0");
str = str.replace("\n", "\\n");
str = str.replace("\r", "\\r");
str = str.replace("\"", "\\\"");
str = str.replace("\032", "\\Z"); // Ctrl-Z (0x1A) as an octal escape; "\\x1a" would only match the literal text \x1a
data = str;
}
return data;
}
Most people are recommending PreparedStatements; however, that requires your Java application to have a direct connection to your database. But then you'll have everyone else saying that you shouldn't have a direct connection to your database due to security issues, and should use a RESTful API to deal with queries instead.
In my opinion, as long as you're aware that you have to be careful with what you escape and do it deliberately, there shouldn't be a problem.
My solution is to use contains() to check for SQL keywords such as UPDATE, or other dangerous characters like =, and to nullify the SQL injection by asking the user to enter different characters.
Edit:
You can use this source material from W3Schools about Java Regular Expressions to do this validation on Strings.
After searching for and testing a lot of solutions for stopping sqlmap from performing SQL injection on a legacy system where prepared statements can't be applied everywhere, the topic "java-security-cross-site-scripting-xss-and-sql-injection" was the solution.
I tried @Richard's solution, but it did not work in my case.
I used a filter:
The goal of this filter is to wrap the request in a custom wrapper, MyHttpRequestWrapper, which transforms:
the HTTP parameters with special characters (<, >, ‘, …) into HTML codes via the org.springframework.web.util.HtmlUtils.htmlEscape(…) method. Note: there is a similar class in Apache Commons: org.apache.commons.lang.StringEscapeUtils.escapeHtml(…)
the SQL injection characters (‘, “, …) via the Apache Commons class org.apache.commons.lang.StringEscapeUtils.escapeSql(…)
<filter>
<filter-name>RequestWrappingFilter</filter-name>
<filter-class>com.huo.filter.RequestWrappingFilter</filter-class>
</filter>
<filter-mapping>
<filter-name>RequestWrappingFilter</filter-name>
<url-pattern>/*</url-pattern>
</filter-mapping>
package com.huo.filter;
import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
public class RequestWrappingFilter implements Filter{
public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain) throws IOException, ServletException{
// The wrapper's constructor needs an HttpServletRequest, so cast the generic ServletRequest
chain.doFilter(new MyHttpRequestWrapper((HttpServletRequest) req), res);
}
public void init(FilterConfig config) throws ServletException{
}
public void destroy(){
}
}
package com.huo.filter;
import java.util.HashMap;
import java.util.Map;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletRequestWrapper;
import org.apache.commons.lang.StringEscapeUtils;
public class MyHttpRequestWrapper extends HttpServletRequestWrapper{
private Map<String, String[]> escapedParametersValuesMap = new HashMap<String, String[]>();
public MyHttpRequestWrapper(HttpServletRequest req){
super(req);
}
@Override
public String getParameter(String name){
String[] escapedParameterValues = escapedParametersValuesMap.get(name);
String escapedParameterValue = null;
if(escapedParameterValues!=null){
escapedParameterValue = escapedParameterValues[0];
}else{
String parameterValue = super.getParameter(name);
// HTML transformation characters
escapedParameterValue = org.springframework.web.util.HtmlUtils.htmlEscape(parameterValue);
// SQL injection characters
escapedParameterValue = StringEscapeUtils.escapeSql(escapedParameterValue);
escapedParametersValuesMap.put(name, new String[]{escapedParameterValue});
}//end-else
return escapedParameterValue;
}
@Override
public String[] getParameterValues(String name){
String[] escapedParameterValues = escapedParametersValuesMap.get(name);
if(escapedParameterValues==null){
String[] parametersValues = super.getParameterValues(name);
if(parametersValues==null){
// the parameter is not present in the request
return null;
}
escapedParameterValues = new String[parametersValues.length];
for(int i=0; i<parametersValues.length; i++){
String parameterValue = parametersValues[i];
// HTML transformation characters
String escapedParameterValue = org.springframework.web.util.HtmlUtils.htmlEscape(parameterValue);
// SQL injection characters
escapedParameterValue = StringEscapeUtils.escapeSql(escapedParameterValue);
escapedParameterValues[i] = escapedParameterValue;
}//end-for
escapedParametersValuesMap.put(name, escapedParameterValues);
}//end-if
return escapedParameterValues;
}
}
If you are using PL/SQL you can also use DBMS_ASSERT
it can sanitize your input so you can use it without worrying about SQL injections.
see this answer for instance:
https://stackoverflow.com/a/21406499/1726419
You can try sanitizing the parameters (though this should not be your first option):
Codec ORACLE_CODEC = new OracleCodec();
String user = req.getParameter("user");
String query = "SELECT user_id FROM user_data WHERE user_name = '" +
ESAPI.encoder().encodeForSQL( ORACLE_CODEC, user) + "' ...;
First, ask the question - are double or single quotes, or backslashes needed in user entry fields?
Backslashes - no. Double and single quotes are rarely used in English and they are used differently in Britain than the U.S.
I say remove or replace them and you simplify.
private String scrub(
String parameter,
int length
)
{
String parm = null;
if ( parameter != null && parameter.length() > 0 && parameter.length() < length )
{
parm = parameter
.replace( "\\", " " )
.replace( "\"", " " )
.replace( "\'", " " )
.replace( "\t", " " )
.replace( "\r", " " )
.replace( "\n", " " )
.trim();
}
return parm;
}
I am getting a high severity issue in this method:
public void recordBadLogin(final String uid, final String reason, final String ip) throws DataAccessException {
if (Utils.isEmpty(uid)) {
throw new DataAccessException("User information needed to update , Empty user information passed");
}
try {
String sql = (String) this.queries.get(IUtilDAO.queryKeyPrefix + UtilDAO.RECORD_FAILED_LOGIN);
Map<String, Object> paramMap = new HashMap<String, Object>();
paramMap.put("uid", uid.trim());
paramMap.put("reason", (reason != null ? reason.trim() : "Invalid userid/password"));
paramMap.put("ip", ip);
this.namedJdbcTemplate.update(sql, paramMap);
} catch (Exception e) {
throw new DataAccessException("Failed to record bad login for user " + uid, e);
}
}
This line of code is causing the issue:
String sql = (String) this.queries.get(IUtilDAO.queryKeyPrefix + UtilDAO.RECORD_FAILED_LOGIN);
queries is a properties object and the prepared statement is being retrieved given IUtilDAO.queryKeyPrefix + UtilDAO.RECORD_FAILED_LOGIN. And those 2 arguments are constants. Logically I don't see how this can cause an SQL injection issue as the prepared statement is being retrieved from a dictionary. Does anyone have an idea if this is a false positive or if there is an actual vulnerability present?
It's hard to tell from the example given, but I'd guess that the properties object was tainted by untrusted data. Most code flow analysis tools will taint the entire data structure if any untrusted data is placed in it.
Technically this is a "false positive". But architecturally it's something that should be fixed - it's generally a bad idea to mix trusted and untrusted data together in the same data structure. It makes it easy for future developers to misunderstand the status of a particular element, and makes it harder for both humans and tools to code review for security issues.
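One way to keep trusted query text apart from untrusted data is to copy the queries into a read-only structure at startup, so nothing else can be written into it later. This is only a sketch under assumptions: the class name QueryCatalog and the key and SQL used in the usage note are hypothetical, and whether a given analysis tool then stops reporting the finding is not something this guarantees.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical holder for trusted SQL text, populated once and then read-only.
final class QueryCatalog {

    private final Map<String, String> queries;

    // Takes a snapshot of the properties; later changes to the source are not seen.
    QueryCatalog(Properties trustedSource) {
        Map<String, String> copy = new HashMap<>();
        for (String key : trustedSource.stringPropertyNames()) {
            copy.put(key, trustedSource.getProperty(key));
        }
        this.queries = Collections.unmodifiableMap(copy);
    }

    // Looks up a query by a constant key and fails loudly on unknown keys.
    String sqlFor(String key) {
        String sql = queries.get(key);
        if (sql == null) {
            throw new IllegalArgumentException("Unknown query key: " + key);
        }
        return sql;
    }
}
```

A DAO would then call something like catalog.sqlFor("util.recordFailedLogin") and pass user-supplied values only through the parameter map, as recordBadLogin already does.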
After a recent FindBugs (FB) run, it complains about a "Security - HTTP Response splitting vulnerability". The following code triggers it:
String referrer = req.getParameter("referrer");
if (referrer != null) {
launchURL += "&referrer="+(referrer);
}
resp.sendRedirect(launchURL);
Basically, the 'referrer' HTTP parameter contains a URL to which the browser returns when the user clicks the back button in our application. It is appended to the URL as a parameter. After a bit of research I know that I need to sanitize the referrer URL. After a bit more research I found the ESAPI project, which seems to offer this kind of functionality:
//1st canonicalize
import org.owasp.esapi.Encoder;
import org.owasp.esapi.Validator;
import org.owasp.esapi.reference.DefaultEncoder;
import org.owasp.esapi.reference.DefaultValidator;
[...]
Encoder encoder = new DefaultEncoder(new ArrayList<String>());
String cReferrer = encoder.canonicalize(referrer);
However I didn't figure out how to detect e.g. jscript code or other stuff which doesn't belong to a referrer url. So how can I achieve that with esapi?
I tried:
Validator validator = new DefaultValidator(encoder);
validator.isValidInput("Redirect URL",referrer,"HTTPParameterValue",512,false);
however this doesn't work. What I need is a function which results in:
http://www.google.com (ok)
http://www.google.com/login?dest=http://google.com/%0D%0ALocation: javascript:%0D%0A%0D%0Aalert(document.cookie) (not ok)
Or is it enough to call the following statement?
encoder.encodeForHTMLAttribute(referrer);
Any help appreciated.
Here's my final solution if anyone is interested. First I canonicalize and then URL decode the string. If a CR or LF exists (\n \r) I just cut off the rest of that potential 'attack' string starting with \n or \r.
String sanitize(String url) throws EncodingException{
Encoder encoder = new DefaultEncoder(new ArrayList<String>());
//first canonicalize
String clean = encoder.canonicalize(url).trim();
//then url decode
clean = encoder.decodeFromURL(clean);
//detect and remove any existent \r\n == %0D%0A == CRLF to prevent HTTP Response Splitting
int idxR = clean.indexOf('\r');
int idxN = clean.indexOf('\n');
if(idxN >= 0 || idxR >= 0){
//cut at whichever of CR or LF comes first (an index of -1 means "not found")
int cut;
if(idxN < 0){
cut = idxR;
}
else if(idxR < 0){
cut = idxN;
}
else{
cut = Math.min(idxN, idxR);
}
clean = clean.substring(0, cut);
}
//re-encode again
return encoder.encodeForURL(clean);
}
Theoretically I could later have verified the value against the 'HTTPParameterValue' regex defined in ESAPI.properties; however, it didn't like the colon in http:// and I didn't investigate further.
And one more remark after testing it: Most modern browser nowadays (Firefox > 3.6, Chrome, IE10 etc.) detect this kind of vulnerability and do not execute the code...
I think you have the right idea, but are using an inappropriate encoder. The Referer [sic] header value is really a URL, not an HTML attribute, so you really want to use:
encoder.encodeForURL(referrer);
-kevin
I would suggest a white-listing approach in which you check that the referrer string contains only permissible characters. Regex would be a good option.
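A white-list check along those lines might look like the sketch below. The exact set of allowed characters is an assumption chosen for illustration (it is deliberately strict and rejects percent-encoding, colons and spaces after the host), the 512-character limit is borrowed from the isValidInput call in the question, and the class name is made up.

```java
import java.util.regex.Pattern;

// Hypothetical white-list validator for a referrer URL.
final class ReferrerValidator {

    // scheme://host[:port][/path][?query], with a conservative character set.
    // CR, LF, '%', ':' (after the port) and spaces are not in any allowed class.
    private static final Pattern ALLOWED = Pattern.compile(
            "https?://[A-Za-z0-9.-]+(:[0-9]{1,5})?"
            + "(/[A-Za-z0-9._~/-]*)?"
            + "(\\?[A-Za-z0-9._~=&-]*)?");

    // matches() requires the whole input to fit the pattern.
    static boolean isSafe(String referrer) {
        return referrer != null
                && referrer.length() <= 512
                && ALLOWED.matcher(referrer).matches();
    }
}
```

With this pattern, http://www.google.com is accepted, while the response-splitting example from the question is rejected because of the %0D%0A sequences and the embedded colons.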
EDIT:
The class org.owasp.esapi.reference.DefaultEncoder that you are using is not really encoding anything. Look at the source code of the method encodeForHTMLAttribute(referrer) here at grepcode. A typical URL encoding (encoding carriage return and line feed) won't help either.
So the way forward would be to devise some validation logic which checks for a valid set of characters. Here is another insightful article.
The accepted answer will not work if there is "\n\r" in the string.
Example:
If I have string: "This is str\n\rstr", it returns "This is str\nstr"
A rectified version of the accepted answer above is:
String sanitizeCarriageReturns(String value) {
int idxR = value.indexOf('\r');
int idxN = value.indexOf('\n');
if (idxN >= 0 || idxR >= 0) {
// Cut at the first CR or LF; an index of -1 means that character is absent.
int cut = (idxN < 0) ? idxR : (idxR < 0) ? idxN : Math.min(idxN, idxR);
value = value.substring(0, cut);
}
return value;
}
I am using a properties file to store my application's configuration values.
In one of the instances, I have to store a value as
xxx:yyy:zzz. When I do that, the colon is escaped with a back slash\ resulting in the value showing as xxx\:yyy\:zzz in the properties file.
I am aware that the colon : is a standard delimiter of the Properties Java class. However I still need to save the value without the back slash \.
Any suggestions on how to handle this?
Put the properties into the Properties object and save it using a store(...) method. The method will perform any escaping required. The Java documentation says:
"... For the key, all space characters are written with a preceding \ character. For the element, leading space characters, but not embedded or trailing space characters, are written with a preceding \ character. The key and element characters #, !, =, and : are written with a preceding backslash to ensure that they are properly loaded."
You only need to manually escape characters if you are creating / writing the file by hand.
Conversely, if you want the file to contain unescaped colon characters, you are out of luck. Such a file is malformed and probably won't load properly using the Properties.load(...) methods. If you go down this route, you'll need to implement your own custom load and/or store methods.
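The escaping only exists in the file: load(...) removes it again, so the application still sees the original value. A minimal round-trip sketch (the class name and the key my.key are invented for illustration):

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Properties;

// Hypothetical demo: store() escapes the colons, load() restores them.
final class PropertiesRoundTrip {

    // Returns the text that store() would write to the properties file.
    static String write(String key, String value) throws IOException {
        Properties props = new Properties();
        props.setProperty(key, value);
        StringWriter out = new StringWriter();
        props.store(out, null); // null means no comment line besides the timestamp
        return out.toString();
    }

    // Parses properties-file text and returns the value for the given key.
    static String read(String fileContents, String key) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(fileContents));
        return props.getProperty(key);
    }
}
```

Writing my.key with the value xxx:yyy:zzz yields a line my.key=xxx\:yyy\:zzz in the file, and reading that text back returns xxx:yyy:zzz.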
I came across the same issue. Forward slashes / also get escaped by the store() method in Properties.
I solved this issue by creating my own CustomProperties class (extending java.util.Properties) and commenting out the call to saveConvert() in the customStore0() method.
Here is my CustomProperties class:
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.util.Date;
import java.util.Enumeration;
import java.util.Properties;
public class CustomProperties extends Properties {
private static final long serialVersionUID = 1L;
@Override
public void store(OutputStream out, String comments) throws IOException {
customStore0(new BufferedWriter(new OutputStreamWriter(out, "8859_1")),
comments, true);
}
//Override to stop '/' or ':' chars from being replaced, by not calling
//saveConvert(key, true, escUnicode)
private void customStore0(BufferedWriter bw, String comments, boolean escUnicode)
throws IOException {
bw.write("#" + new Date().toString());
bw.newLine();
synchronized (this) {
for (Enumeration e = keys(); e.hasMoreElements();) {
String key = (String) e.nextElement();
String val = (String) get(key);
// Commented out to stop '/' or ':' chars being replaced
//key = saveConvert(key, true, escUnicode);
//val = saveConvert(val, false, escUnicode);
bw.write(key + "=" + val);
bw.newLine();
}
}
bw.flush();
}
}
We hit this question a couple of days ago. We were manipulating existing properties files with URLs as values.
It's risky but if your property values are less than 40 characters then you can use the "list" method instead of "store":
http://docs.oracle.com/javase/6/docs/api/java/util/Properties.html#list(java.io.PrintWriter)
We had a quick look at the JDK code and hacked out a custom implementation of store that works for our purposes:
public void store(Properties props, String propertyFilePath) throws FileNotFoundException {
PrintWriter pw = new PrintWriter(propertyFilePath);
for (Enumeration e = props.propertyNames(); e.hasMoreElements();) {
String key = (String) e.nextElement();
pw.println(key + "=" + props.getProperty(key));
}
pw.close();
}
If you use the xml variant of the properties file (using loadFromXML and storeToXML) this shouldn't be a problem.
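A short sketch of the XML variant, showing that the colons are written as-is; the class name and the key used in the usage note are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical demo of storeToXML / loadFromXML.
final class XmlPropertiesExample {

    // Serializes the properties to the XML format (UTF-8 by default).
    static String toXml(Properties props) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        props.storeToXML(out, null); // null means no comment element
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // Parses the XML format back into a Properties object.
    static Properties fromXml(String xml) throws IOException {
        Properties props = new Properties();
        props.loadFromXML(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return props;
    }
}
```

Storing my.key with the value xxx:yyy:zzz this way keeps the value free of backslashes in the file, at the cost of switching the file format.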
Try using unicode.
The unicode for a colon is: \u003A
Additionally the unicode for a space is: \u0020
For a list of basic Latin characters see: https://en.wikipedia.org/wiki/Basic_Latin_(Unicode_block)
For example:
ProperName\u003A\NameContinues=Some property value
Will expect a property with a key:
ProperName:NameContinues
And will have a value of:
Some property value
For me it worked by using \ before special character,
e.g,
Before: VCS\u003aIC\u0020Server\u003a=Migration
After: VCS\:IC\ Server\:=Migration
: is escaped with \: and (space) with \ (\ followed by <Space>).
For more info : https://en.wikipedia.org/wiki/.properties
For people like me that get here for this when using Spring Boot configuration properties files: You need to enclose in [..]:
E.g.:
my.test\:key=value
is not enough, you need this in your application.properties for example:
my.[test\:key]=value
See also SpringBoot2 ConfigurationProperties removes colon from yaml keys
It's simple:
just put apostrophes ' ' around the colon.
E.g.:
Instead of this(case 1)
File file= new File("f:\\properties\\gog\\esave\\apple");
prop.setProperty("basedir",file.toString());
Use this(case 2)
File file= new File("f':'\\properties\\gog\\esave\\apple");
prop.setProperty("basedir",file.toString());
Output will be
Case 1: basedir = f\:\\properties\\gog\\esave\\apple
Case 2: basedir = f:\\properties\\gog\\esave\\apple
I hope this will help you
Hope someone can help me out with this one !
I have a sql file that looks like this:
CREATE TABLE IF NOT EXISTS users(
id INT UNSIGNED NOT NULL AUTO_INCREMENT,
firstname VARCHAR(30) NOT NULL,
lastname VARCHAR(30) NOT NULL,
PRIMARY KEY (id),
CONSTRAINT UNIQUE (firstname,lastname)
)
ENGINE=InnoDB
;
INSERT IGNORE INTO users (firstname,lastname) VALUES ('x','y');
/*
INSERT IGNORE INTO users (firstname,lastname) VALUES ('a','b');
*/
I have built a web application that initializes a MySQL database at startup with this function:
public static void initDatabase(ConnectionPool pool, File sqlFile){
Connection con = null;
Statement st = null;
String mySb=null;
try{
con = pool.getConnection();
mySb=IOUtils.copyToString(sqlFile);
// We use ";" as a delimiter for each request then we are sure to have well formed statements
String[] inst = mySb.split(";");
st = con.createStatement();
for(int i = 0; i<inst.length; i++){
// we ensure that there is no spaces before or after the request string
// in order not to execute empty statements
if(!inst[i].trim().isEmpty()){
st.executeUpdate(inst[i]);
}
}
st.close();
}catch(IOException e){
throw new RuntimeException(e);
}catch(SQLException e){
throw new RuntimeException(e);
}finally{
SQLUtils.safeClose(st);
pool.close(con);
}
}
(This function was found on the web. Author, please forgive me for not citing your name, I lost it !!)
It works perfectly as long as there are no SQL comment blocks.
The copyToString() function basically does what it says.
What I would like now is build a regex that will remove block comments from the string. I only have block comments /* */ in the file, no --.
What I have tried so far:
mySb = mySb.replaceAll("/\\*.*\\*/", "");
Unfortunately, I'm not very good at regex...
I run into trouble when the matched string looks something like "/* comment */ real statement /* another comment*/" and so on...
Try
mySb = mySb.replaceAll("/\\*.*?\\*/", "");
(notice the ? which stands for "lazy").
EDIT: To cover multiline comments, use this approach:
Pattern commentPattern = Pattern.compile("/\\*.*?\\*/", Pattern.DOTALL);
mySb = commentPattern.matcher(mySb).replaceAll("");
Hope this works for you.
You need to use a reluctant quantifier like this:
public class Main {
public static void main(String[] args) {
String s = "The matched string look something like /* comment */ real statement /* another comment*/";
System.err.println(s.replaceAll("/\\*.*?\\*/", ""));
}
}
Try the following approach:
String s = "/* comment */ select * from XYZ; /* comment */";
System.out.println(s.replaceAll("/\\*.*?\\*/", ""));
Outputs:
select * from XYZ;
The ? in .*? makes the quantifier lazy instead of greedy. By default .* is greedy, meaning it matches the largest string possible, so it runs from the first /* to the last */; adding ? makes it match the shortest string possible, so each comment is matched separately.
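The difference between the greedy, lazy and DOTALL variants can be seen side by side in this small sketch (the class and method names are invented for the comparison):

```java
import java.util.regex.Pattern;

// Hypothetical demo comparing the three regex variants discussed in this thread.
final class CommentRegexDemo {

    // Greedy: matches from the first "/*" to the LAST "*/" on the line.
    static String greedy(String s) {
        return s.replaceAll("/\\*.*\\*/", "");
    }

    // Lazy: each match stops at the nearest "*/".
    static String lazy(String s) {
        return s.replaceAll("/\\*.*?\\*/", "");
    }

    // Lazy + DOTALL: '.' also matches line terminators, so comments
    // that span several lines are removed as well.
    private static final Pattern MULTILINE =
            Pattern.compile("/\\*.*?\\*/", Pattern.DOTALL);

    static String lazyDotAll(String s) {
        return MULTILINE.matcher(s).replaceAll("");
    }
}
```

For the input "/* a */ x /* b */", greedy removes everything, while lazy leaves " x ". A comment containing a line break is left untouched by lazy and removed only by lazyDotAll.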
it won't work 100%
the comments can be a part of a valid string specified in the SQL and in that case they need to be kept...
I am just researching a solution... seems to be complicated
so far I have:
\G(?:[^']*?|'(?:[^']|'')*?'(?!'))*?\/\*.*?\*\/
but it matches all while I need to match the comment only... and just found out it could fail when preceded by a single-line comment... damn
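An alternative to a single regex is a small hand-written scanner that copies single-quoted string literals verbatim and drops block comments everywhere else. The following is a sketch under stated assumptions, not a full SQL lexer: it treats '' as an escaped quote, and it does not handle backslash escapes, double-quoted or backtick-quoted identifiers, or single-line -- comments. The class name is made up.

```java
// Hypothetical scanner: removes /* ... */ comments that are outside string literals.
final class SqlCommentStripper {

    static String stripBlockComments(String sql) {
        StringBuilder out = new StringBuilder(sql.length());
        int i = 0;
        int n = sql.length();
        while (i < n) {
            char c = sql.charAt(i);
            if (c == '\'') {
                // Inside a string literal: find the closing quote, skipping '' pairs.
                int j = i + 1;
                while (j < n) {
                    if (sql.charAt(j) == '\'') {
                        if (j + 1 < n && sql.charAt(j + 1) == '\'') {
                            j += 2; // escaped quote, still inside the literal
                            continue;
                        }
                        break; // closing quote found
                    }
                    j++;
                }
                int end = Math.min(j + 1, n); // include the closing quote if present
                out.append(sql, i, end);
                i = end;
            } else if (c == '/' && i + 1 < n && sql.charAt(i + 1) == '*') {
                // Block comment: skip to just past the closing "*/" (or to the end).
                int close = sql.indexOf("*/", i + 2);
                i = (close < 0) ? n : close + 2;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }
}
```

Note that splitting the script on ";" has the same weakness as the regex: a semicolon inside a string literal would also be treated as a statement separator.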
I am looking for a SQL Library that will parse an SQL statement and return some sort of Object representation of the SQL statement. My main objective is actually to be able to parse the SQL statement and retrieve the list of table names present in the SQL statement (including subqueries, joins and unions).
I am looking for a free library with a license business friendly (e.g. Apache license). I am looking for a library and not for an SQL Grammar. I do not want to build my own parser.
The best I could find so far was JSQLParser, and the example they give is actually pretty close to what I am looking for. However it fails parsing too many good queries (DB2 Database) and I'm hoping to find a more reliable library.
I doubt you'll find anything prewritten that you can just use. The problem is that ISO/ANSI SQL is a very complicated grammar — something like more than 600 production rules IIRC.
Terence Parr's ANTLR parser generator (Java, but can generate parsers in any one of a number of target languages) has several SQL grammars available, including a couple for PL/SQL, one for a SQL Server SELECT statement, one for mySQL, and one for ISO SQL.
No idea how complete/correct/up-to-date they are.
http://www.antlr.org/grammar/list
You needn't reinvent the wheel: there is already a reliable SQL parser library (it's commercial, not free), and this article shows how to retrieve the list of table names present in the SQL statement (including subqueries, joins and unions), which is exactly what you are looking for.
http://www.dpriver.com/blog/list-of-demos-illustrate-how-to-use-general-sql-parser/get-columns-and-tables-in-sql-script/
This SQL parser library supports Oracle, SQL Server, DB2, MySQL, Teradata and ACCESS.
You need the ultra light, ultra fast library to extract table names from SQL (Disclaimer: I am the owner)
Just add the following in your pom
<dependency>
<groupId>com.github.mnadeem</groupId>
<artifactId>sql-table-name-parser</artifactId>
<version>0.0.1</version>
</dependency>
And do the following
new TableNameParser(sql).tables()
For more details, refer to the project.
Old question, but I think this project contains what you need:
Data Tools Project - SQL Development Tools
Here's the documentation for the SQL Query Parser.
Also, here's a small sample program. I'm no Java programmer so use with care.
package org.lala;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.util.Iterator;
import java.util.List;
import org.eclipse.datatools.modelbase.sql.query.QuerySelectStatement;
import org.eclipse.datatools.modelbase.sql.query.QueryStatement;
import org.eclipse.datatools.modelbase.sql.query.TableReference;
import org.eclipse.datatools.modelbase.sql.query.ValueExpressionColumn;
import org.eclipse.datatools.modelbase.sql.query.helper.StatementHelper;
import org.eclipse.datatools.sqltools.parsers.sql.SQLParseErrorInfo;
import org.eclipse.datatools.sqltools.parsers.sql.SQLParserException;
import org.eclipse.datatools.sqltools.parsers.sql.SQLParserInternalException;
import org.eclipse.datatools.sqltools.parsers.sql.query.SQLQueryParseResult;
import org.eclipse.datatools.sqltools.parsers.sql.query.SQLQueryParserManager;
import org.eclipse.datatools.sqltools.parsers.sql.query.SQLQueryParserManagerProvider;
public class SQLTest {
private static String readFile(String path) throws IOException {
FileInputStream stream = new FileInputStream(new File(path));
try {
FileChannel fc = stream.getChannel();
MappedByteBuffer bb = fc.map(FileChannel.MapMode.READ_ONLY, 0,
fc.size());
/* Instead of using default, pass in a decoder. */
return Charset.defaultCharset().decode(bb).toString();
} finally {
stream.close();
}
}
/**
* @param args
* @throws IOException
*/
public static void main(String[] args) throws IOException {
try {
// Create an instance of the Parser Manager.
// SQLQueryParserManagerProvider.getInstance().getParserManager
// returns the best compliant SQLQueryParserManager
// supporting the SQL dialect of the database described by the given
// database product information. The code below asks for DB2 UDB v9.1;
// if null is passed for both the database and version,
// a generic parser is returned.
SQLQueryParserManager parserManager = SQLQueryParserManagerProvider
.getInstance().getParserManager("DB2 UDB", "v9.1");
// Sample query
String sql = readFile("c:\\test.sql");
// Parse
SQLQueryParseResult parseResult = parserManager.parseQuery(sql);
// Get the Query Model object from the result
QueryStatement resultObject = parseResult.getQueryStatement();
// Get the SQL text
String parsedSQL = resultObject.getSQL();
System.out.println(parsedSQL);
// Here we have the SQL code parsed!
QuerySelectStatement querySelect = (QuerySelectStatement) parseResult
.getSQLStatement();
List columnExprList = StatementHelper
.getEffectiveResultColumns(querySelect);
Iterator columnIt = columnExprList.iterator();
while (columnIt.hasNext()) {
ValueExpressionColumn colExpr = (ValueExpressionColumn) columnIt
.next();
// DataType dataType = colExpr.getDataType();
System.out.println("effective result column: "
+ colExpr.getName());// + " with data type: " +
// dataType.getName());
}
List tableList = StatementHelper.getTablesForStatement(resultObject);
// List tableList = StatementHelper.getTablesForStatement(querySelect);
for (Object obj : tableList) {
TableReference t = (TableReference) obj;
System.out.println(t.getName());
}
} catch (SQLParserException spe) {
// handle the syntax error
System.out.println(spe.getMessage());
@SuppressWarnings("unchecked")
List<SQLParseErrorInfo> syntacticErrors = spe.getErrorInfoList();
Iterator<SQLParseErrorInfo> itr = syntacticErrors.iterator();
while (itr.hasNext()) {
SQLParseErrorInfo errorInfo = (SQLParseErrorInfo) itr.next();
// Example usage of the SQLParseErrorInfo object
// the error message
String errorMessage = errorInfo.getParserErrorMessage();
String expectedText = errorInfo.getExpectedText();
String errorSourceText = errorInfo.getErrorSourceText();
// the line numbers of error
int errorLine = errorInfo.getLineNumberStart();
int errorColumn = errorInfo.getColumnNumberStart();
System.err.println("Error in line " + errorLine + ", column "
+ errorColumn + ": " + expectedText + " "
+ errorMessage + " " + errorSourceText);
}
} catch (SQLParserInternalException spie) {
// handle the exception
System.out.println(spie.getMessage());
}
System.exit(0);
}
}