Java regular expression to prevent SQL injection [duplicate]

I'm trying to put some anti-SQL-injection measures in place in Java and am finding it very difficult to work with the "replaceAll" string function. Ultimately I need a function that will convert any existing \ to \\, any " to \", any ' to \', and any \n to \\n so that when the string is evaluated by MySQL, SQL injections will be blocked.
I've jacked up some code I was working with and all the \\\\\\\\\\\ in the function are making my eyes go nuts. If anyone happens to have an example of this I would greatly appreciate it.

PreparedStatements are the way to go, because they make SQL injection impossible. Here's a simple example taking the user's input as the parameters:
public void insertUser(String name, String email) throws SQLException {
Connection conn = null;
PreparedStatement stmt = null;
try {
conn = setupTheDatabaseConnectionSomehow();
stmt = conn.prepareStatement("INSERT INTO person (name, email) values (?, ?)");
stmt.setString(1, name);
stmt.setString(2, email);
stmt.executeUpdate();
}
finally {
try {
if (stmt != null) { stmt.close(); }
}
catch (Exception e) {
// log this error
}
try {
if (conn != null) { conn.close(); }
}
catch (Exception e) {
// log this error
}
}
}
No matter what characters are in name and email, those characters will be placed directly in the database. They won't affect the INSERT statement in any way.
There are different set methods for different data types -- which one you use depends on what your database fields are. For example, if you have an INTEGER column in the database, you should use a setInt method. The PreparedStatement documentation lists all the different methods available for setting and getting data.
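As a rough sketch of the same idea on Java 7 or later, try-with-resources can replace the nested finally blocks, and a setInt call shows a second parameter type. The age column, the PersonDao class name, and the caller-supplied Connection are assumptions made for illustration; they are not part of the example above.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class PersonDao {
    // Hypothetical table layout: person(name VARCHAR, email VARCHAR, age INTEGER)
    public static final String INSERT_SQL =
            "INSERT INTO person (name, email, age) VALUES (?, ?, ?)";

    // Inserts one row; the statement is closed automatically, even on error.
    // The caller owns the Connection and is responsible for closing it.
    public static int insertPerson(Connection conn, String name, String email, int age)
            throws SQLException {
        try (PreparedStatement stmt = conn.prepareStatement(INSERT_SQL)) {
            stmt.setString(1, name);   // VARCHAR column
            stmt.setString(2, email);  // VARCHAR column
            stmt.setInt(3, age);       // INTEGER column
            return stmt.executeUpdate();
        }
    }
}
```

The values are sent as parameters rather than spliced into the SQL text, so quotes or backslashes in name and email need no escaping.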

The only way to prevent SQL injection is with parameterized SQL. It simply isn't possible to build a filter that's smarter than the people who hack SQL for a living.
So use parameters for all input, updates, and where clauses. Dynamic SQL is simply an open door for hackers, and that includes dynamic SQL in stored procedures. Parameterize, parameterize, parameterize.

If you really can't use Defense Option 1: Prepared Statements (Parameterized Queries) or Defense Option 2: Stored Procedures, don't build your own tool; use the OWASP Enterprise Security API. From the OWASP ESAPI hosted on Google Code:
Don’t write your own security controls! Reinventing the wheel when it comes to developing security controls for every web application or web service leads to wasted time and massive security holes. The OWASP Enterprise Security API (ESAPI) Toolkits help software developers guard against security‐related design and implementation flaws.
For more details, see Preventing SQL Injection in Java and SQL Injection Prevention Cheat Sheet.
Pay special attention to Defense Option 3: Escaping All User Supplied Input, which introduces the OWASP ESAPI project.

(This is in answer to the OP's comment under the original question; I agree completely that PreparedStatement is the tool for this job, not regexes.)
When you say \n, do you mean the sequence \+n or an actual linefeed character? If it's \+n, the task is pretty straightforward:
s = s.replaceAll("['\"\\\\]", "\\\\$0");
To match one backslash in the input, you put four of them in the regex string. To put one backslash in the output, you put four of them in the replacement string. This is assuming you're creating the regexes and replacements in the form of Java String literals. If you create them any other way (e.g., by reading them from a file), you don't have to do all that double-escaping.
If you have a linefeed character in the input and you want to replace it with an escape sequence, you can make a second pass over the input with this:
s = s.replaceAll("\n", "\\\\n");
Or maybe you want two backslashes (I'm not too clear on that):
s = s.replaceAll("\n", "\\\\\\\\n");
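To make the backslash counting easier to check, here is a small self-contained sketch that wraps the first replaceAll call in a method and handles the linefeed with String.replace, which takes literal text and so avoids one level of escaping. The class and method names are made up for illustration, and this is only meant to show the escaping mechanics, not to replace PreparedStatement.

```java
public class BackslashEscapeDemo {
    // Prefix each single quote, double quote, or backslash with one backslash.
    // Regex ['"\\] matches one of the three characters; replacement \\$0 emits
    // a literal backslash followed by the matched character.
    public static String escapeQuotesAndBackslashes(String s) {
        return s.replaceAll("['\"\\\\]", "\\\\$0");
    }

    // Replace a real linefeed character with the two characters backslash + n.
    // String.replace is not regex-based, so only Java string escaping applies.
    public static String escapeLinefeeds(String s) {
        return s.replace("\n", "\\n");
    }

    public static void main(String[] args) {
        String input = "O'Reilly said \"hi\"\nbye";
        // Escape backslashes first so the backslash added for \n is not doubled.
        System.out.println(escapeLinefeeds(escapeQuotesAndBackslashes(input)));
    }
}
```

The order of the two passes matters: running the linefeed pass first would let the quote/backslash pass double the backslash it just inserted.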

PreparedStatements are the way to go in most, but not all cases. Sometimes you will find yourself in a situation where a query, or a part of it, has to be built and stored as a string for later use. Check out the SQL Injection Prevention Cheat Sheet on the OWASP Site for more details and APIs in different programming languages.

Prepared Statements are the best solution, but if you really need to do it manually you could also use the StringEscapeUtils class from the Apache Commons-Lang library. In Commons Lang 2.x it has an escapeSql(String) method (it was removed in Commons Lang 3), which you can use:
import org.apache.commons.lang.StringEscapeUtils;
…
String escapedSQL = StringEscapeUtils.escapeSql(unescapedSQL);

Using a regular expression to remove text which could cause a SQL injection sounds like the SQL statement is being sent to the database via a Statement rather than a PreparedStatement.
One of the easiest ways to prevent an SQL injection in the first place is to use a PreparedStatement, which accepts data to substitute into a SQL statement using placeholders, which does not rely on string concatenations to create an SQL statement to send to the database.
For more information, Using Prepared Statements from The Java Tutorials would be a good place to start.

You need the following code below. At a glance, this may look like any old code that I made up. However, what I did was look at the source code for http://grepcode.com/file/repo1.maven.org/maven2/mysql/mysql-connector-java/5.1.31/com/mysql/jdbc/PreparedStatement.java. Then after that, I carefully looked through the code of setString(int parameterIndex, String x) to find the characters which it escapes and customised this to my own class so that it can be used for the purposes that you need. After all, if this is the list of characters that Oracle escapes, then knowing this is really comforting security-wise. Maybe Oracle need a nudge to add a method similar to this one for the next major Java release.
public class SQLInjectionEscaper {
public static String escapeString(String x, boolean escapeDoubleQuotes) {
StringBuilder sBuilder = new StringBuilder(x.length() * 11/10);
int stringLength = x.length();
for (int i = 0; i < stringLength; ++i) {
char c = x.charAt(i);
switch (c) {
case 0: /* Must be escaped for 'mysql' */
sBuilder.append('\\');
sBuilder.append('0');
break;
case '\n': /* Must be escaped for logs */
sBuilder.append('\\');
sBuilder.append('n');
break;
case '\r':
sBuilder.append('\\');
sBuilder.append('r');
break;
case '\\':
sBuilder.append('\\');
sBuilder.append('\\');
break;
case '\'':
sBuilder.append('\\');
sBuilder.append('\'');
break;
case '"': /* Better safe than sorry */
if (escapeDoubleQuotes) {
sBuilder.append('\\');
}
sBuilder.append('"');
break;
case '\032': /* This gives problems on Win32 */
sBuilder.append('\\');
sBuilder.append('Z');
break;
case '\u00a5':
case '\u20a9':
// escape characters interpreted as backslash by mysql
// fall through
default:
sBuilder.append(c);
}
}
return sBuilder.toString();
}
}

In case you are dealing with a legacy system, or you have too many places to switch to PreparedStatements in too little time - i.e. if there is an obstacle to using the best practice suggested by other answers, you can try AntiSQLFilter

From:Source
public String MysqlRealScapeString(String str){
String data = null;
if (str != null && str.length() > 0) {
str = str.replace("\\", "\\\\");
str = str.replace("'", "\\'");
str = str.replace("\0", "\\0");
str = str.replace("\n", "\\n");
str = str.replace("\r", "\\r");
str = str.replace("\"", "\\\"");
str = str.replace("\032", "\\Z"); // 0x1A (Ctrl-Z); "\\x1a" would only match the literal text \x1a
data = str;
}
return data;
}

Most people are recommending PreparedStatements; however, that requires you to have a direct connection to your database from the Java application. But then you'll have everyone else saying that you shouldn't have a direct connection to your database due to security issues, and should use a RESTful API to deal with queries instead.
In my opinion, as long as you're aware that you have to be careful with what you escape and do it deliberately, there shouldn't be a problem.
My solution is to use contains() to check for SQL keywords such as UPDATE, or other dangerous characters like =, and to nullify the SQL injection by asking the user to enter different characters.
Edit:
You can use this source material from W3Schools about Java Regular Expressions to do this validation on Strings.

After searching for and testing a lot of solutions to stop sqlmap from finding SQL injections in a legacy system, where prepared statements can't be applied everywhere, the java-security-cross-site-scripting-xss-and-sql-injection topic was the solution.
I tried @Richard's solution, but it did not work in my case. I used a filter instead.
The goal of this filter is to wrap the request in a custom wrapper, MyHttpRequestWrapper, which transforms:
the HTTP parameters with special characters (<, >, ‘, …) into HTML codes via the org.springframework.web.util.HtmlUtils.htmlEscape(…) method. Note: there is a similar class in Apache Commons: org.apache.commons.lang.StringEscapeUtils.escapeHtml(…)
the SQL injection characters (‘, “, …) via the Apache Commons class org.apache.commons.lang.StringEscapeUtils.escapeSql(…)
<filter>
<filter-name>RequestWrappingFilter</filter-name>
<filter-class>com.huo.filter.RequestWrappingFilter</filter-class>
</filter>
<filter-mapping>
<filter-name>RequestWrappingFilter</filter-name>
<url-pattern>/*</url-pattern>
</filter-mapping>
package com.huo.filter;
import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
public class RequestWrappingFilter implements Filter{
public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain) throws IOException, ServletException{
// The wrapper needs an HttpServletRequest, so cast the generic ServletRequest
chain.doFilter(new MyHttpRequestWrapper((HttpServletRequest) req), res);
}
public void init(FilterConfig config) throws ServletException{
}
public void destroy(){
}
}
package com.huo.filter;
import java.util.HashMap;
import java.util.Map;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletRequestWrapper;
import org.apache.commons.lang.StringEscapeUtils;
public class MyHttpRequestWrapper extends HttpServletRequestWrapper{
private Map<String, String[]> escapedParametersValuesMap = new HashMap<String, String[]>();
public MyHttpRequestWrapper(HttpServletRequest req){
super(req);
}
@Override
public String getParameter(String name){
String[] escapedParameterValues = escapedParametersValuesMap.get(name);
String escapedParameterValue = null;
if(escapedParameterValues!=null){
escapedParameterValue = escapedParameterValues[0];
}else{
String parameterValue = super.getParameter(name);
// HTML transformation characters
escapedParameterValue = org.springframework.web.util.HtmlUtils.htmlEscape(parameterValue);
// SQL injection characters
escapedParameterValue = StringEscapeUtils.escapeSql(escapedParameterValue);
escapedParametersValuesMap.put(name, new String[]{escapedParameterValue});
}//end-else
return escapedParameterValue;
}
@Override
public String[] getParameterValues(String name){
String[] escapedParameterValues = escapedParametersValuesMap.get(name);
if(escapedParameterValues==null){
String[] parametersValues = super.getParameterValues(name);
escapedParameterValues = new String[parametersValues.length];
//
for(int i=0; i<parametersValues.length; i++){
String parameterValue = parametersValues[i];
String escapedParameterValue = parameterValue;
// HTML transformation characters
escapedParameterValue = org.springframework.web.util.HtmlUtils.htmlEscape(parameterValue);
// SQL injection characters
escapedParameterValue = StringEscapeUtils.escapeSql(escapedParameterValue);
escapedParameterValues[i] = escapedParameterValue;
}//end-for
escapedParametersValuesMap.put(name, escapedParameterValues);
}//end-else
return escapedParameterValues;
}
}

If you are using PL/SQL you can also use DBMS_ASSERT.
It can sanitize your input so you can use it without worrying about SQL injections.
See this answer for instance:
https://stackoverflow.com/a/21406499/1726419

You can try sanitizing the parameters (not the first-choice option):
Codec ORACLE_CODEC = new OracleCodec();
String user = req.getParameter("user");
String query = "SELECT user_id FROM user_data WHERE user_name = '" +
ESAPI.encoder().encodeForSQL( ORACLE_CODEC, user) + "' ...;

First, ask the question - are double or single quotes, or backslashes needed in user entry fields?
Backslashes - no. Double and single quotes are rarely used in English and they are used differently in Britain than the U.S.
I say remove or replace them and you simplify.
private String scrub(
String parameter,
int length
)
{
String parm = null;
if ( parameter != null && parameter.length() > 0 && parameter.length() < length )
{
parm = parameter
.replace( "\\", " " )
.replace( "\"", " " )
.replace( "\'", " " )
.replace( "\t", " " )
.replace( "\r", " " )
.replace( "\n", " " )
.trim();
}
return parm;
}

Related

Search true if the word is Singular or Plural Java

I am trying to achieve the result in which if the user enters the word, in plural or singular, the regex should return true
For example 'I want to buy drone' or 'I want to buy drones'.
@Test
public void testProductSearchRegexp() {
String regexp = "(?i).*?\\b%s\\b.*?";
String query = "I want the drone with FLIR Duo";
String data1 = "drone";
String data2 = "FLIR Duo";
String data3 = "FLIR";
String data4 = "drones";
boolean isData1 = query.matches(String.format(regexp, data1));
boolean isData2 = query.matches(String.format(regexp, data2));
boolean isData3 = query.matches(String.format(regexp, data3));
boolean isData4 = query.matches(String.format(regexp, data4));
assertTrue(isData1);
assertTrue(isData2);
assertTrue(isData3);
assertTrue(isData4);//Test fails here (obviously)
}
Your valuable time on this question is very appreciated.
English is a language with many exceptions. Checking whether a word ends in 's' is simply not sufficient to determine whether it's plural.
The best way to solve this problem is to not solve this problem. It's been done before. Take advantage of that. One solution would be to make use of a third party API. The OED have one, for example.
If you were to make a request to their API such as:
/entries/en/mice
You would get back a JSON response containing:
"crossReferenceMarkers": [
"plural form of mouse"
],
from there it should be easy to parse. Simply checking for the presence of the word 'plural' may be sufficient.
They even have working Java examples that you can copy and paste.
An advantage of this approach is there's no compile-time dependency. A disadvantage is that you're relying on being able to make HTTP requests. Another is that you're limited by any restrictions they impose. The OED allows up to 3k requests/month and 60 requests/minute on their free plan, which seems pretty reasonable to me.
Well something like this is very hard to achieve without external sources. Sure many words in plural end with 's' but there are also a lot of exceptions to this like "knife" and "knives" or "cactus" and "cacti". For that you could use a Map to sort these out.
public static String getPlural(String singular){
String plural;
HashMap<String,String> irregularPlurals = new HashMap<>();
irregularPlurals.put("cactus","cacti");
irregularPlurals.put("knife","knives");
irregularPlurals.put("man","men");
/*add all your irregular ones*/
plural = irregularPlurals.get(singular);
if (plural == null){
return singular + "s";
}else{
return plural;
}
}
Very simple and not very practical but gets the job done when you only have a few words.
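For the narrow case in the question, where only regular plurals ending in "s" matter, a minimal sketch is to strip one trailing "s" from the search term and then allow an optional "s" in the pattern. The PluralSearch class and mentions method are invented names, and irregular plurals such as "knives" or "cacti" are deliberately not handled here.

```java
import java.util.regex.Pattern;

public class PluralSearch {
    // True if the query contains the term as a whole word, in either its
    // singular form or its regular "s" plural, ignoring case.
    public static boolean mentions(String query, String term) {
        String stem = term;
        // Reduce a regular plural ("drones") to its singular ("drone").
        if (stem.length() > 1 && (stem.endsWith("s") || stem.endsWith("S"))) {
            stem = stem.substring(0, stem.length() - 1);
        }
        // Pattern.quote keeps characters in the term from acting as regex syntax.
        String regex = "\\b" + Pattern.quote(stem) + "s?\\b";
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(query).find();
    }

    public static void main(String[] args) {
        String query = "I want the drone with FLIR Duo";
        System.out.println(mentions(query, "drones"));   // plural term, singular in query
        System.out.println(mentions(query, "FLIR Duo")); // multi-word term
    }
}
```

Using find() instead of matches() removes the need for the leading and trailing .*? in the original pattern. Words that naturally end in "s" (for example "dress") are matched loosely by this approach, which is one reason a dictionary-backed solution is more robust.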

Is using org.postgresql.core.Utils.escapeLiteral enough to prevent SQL Injections?

I need to sanitize some user entered data before building sql queries and updates to submit to my DB.
I know that it is preferable to use prepared statements, but this is not an option. Unfortunately, I am stuck with escaping all user-supplied input.
It looks like the Postgres JDBC libs come with a tool to do String escaping. See org.postgresql.core.Utils.escapeLiteral(..) (attached below). I am hoping that since this comes with Postgres, that it is safe to use. After several hours of googling and looking at SQL cheatsheets I am unable to find an example that will break this.
Does the following look safe enough?
public class FruitDb {
private Connection connection;
public void findFruit ( String /* user enterable field */ fruitColor ) throws SQLException {
String query = "SELECT * FROM fruit WHERE fruit_color = " + quote( fruitColor );
Statement statement = connection.createStatement();
statement.executeQuery( query );
}
private String quote( String toQuote ) throws SQLException {
return "'" + Utils.escapeLiteral( null, toQuote, true ).toString() + "'";
}
}
For those interested here is the implementation of Utils.escapeLiteral. Looks reasonably safe to me...
package org.postgresql.core;
class Utils {
...
/**
* Escape the given literal <tt>value</tt> and append it to the string builder
* <tt>sbuf</tt>. If <tt>sbuf</tt> is <tt>null</tt>, a new StringBuilder will be
* returned. The argument <tt>standardConformingStrings</tt> defines whether the
* backend expects standard-conforming string literals or allows backslash
* escape sequences.
*
* @param sbuf the string builder to append to; or <tt>null</tt>
* @param value the string value
* @param standardConformingStrings if standard conforming strings should be used
* @return the sbuf argument; or a new string builder for sbuf == null
* @throws SQLException if the string contains a <tt>\0</tt> character
*/
public static StringBuilder escapeLiteral(StringBuilder sbuf, String value, boolean standardConformingStrings)
throws SQLException
{
if (sbuf == null)
{
sbuf = new StringBuilder(value.length() * 11 / 10); // Add 10% for escaping.
}
doAppendEscapedLiteral(sbuf, value, standardConformingStrings);
return sbuf;
}
private static void doAppendEscapedLiteral(Appendable sbuf, String value, boolean standardConformingStrings)
throws SQLException
{
try
{
if (standardConformingStrings)
{
// With standard_conforming_strings on, escape only single-quotes.
for (int i = 0; i < value.length(); ++i)
{
char ch = value.charAt(i);
if (ch == '\0')
throw new PSQLException(GT.tr("Zero bytes may not occur in string parameters."), PSQLState.INVALID_PARAMETER_VALUE);
if (ch == '\'')
sbuf.append('\'');
sbuf.append(ch);
}
}
else
{
// REMOVED. I am using standard encoding.
}
}
catch (IOException e)
{
throw new PSQLException(GT.tr("No IOException expected from StringBuffer or StringBuilder"), PSQLState.UNEXPECTED_ERROR, e);
}
}
}
Similar Questions:
How to safely escape arbitrary strings for SQL in PostgreSQL using Java - I actually answered this suggesting to use Utils.escapeLiteral(..) because I think that is a better solution than the accepted answer.
Can I protect against SQL Injection by escaping single-quote and surrounding user input with single-quotes?
Very good post: How can sanitation that escapes single quotes be defeated by SQL injection in SQL Server?
Yes, it is safe to use escapeLiteral() and escapeIdentifier().
In fact, escapeLiteral() and escapeIdentifier() are counterparts of PQescapeLiteral() and PQescapeIdentifier(), which are defined in libpq.
The main difference is that JDBC's escapeLiteral() doesn't use the database connection to validate the character encoding. However, Java strings are Unicode internally and are converted to the database character encoding, so it does not matter.
Another difference I noticed is how strings are escaped as SQL literals. PQescapeLiteral() adds quotes around the string, but escapeLiteral() doesn't. E.g. abc'xyz becomes 'abc''xyz' with PQescapeLiteral(), but it becomes abc''xyz with escapeLiteral() (notice no quotes around the result).
escapeIdentifier() adds quotes just like PQescapeIdentifier(). E.g. abc"xyz becomes "abc""xyz"
CERT TOP 10 Secure Coding Practices #7 (Sanitize Output) suggests sanitizing all variables regardless of previous checks/validations. One can avoid using escapeLiteral() by using a prepared query, but not escapeIdentifier(), since a prepared query cannot parameterize identifiers, e.g.:
SELECT user_specified_col FROM tbl;
or
SELECT * FROM tbl ORDER BY user_specified_col;
Developers must validate "user_specified_col" strictly, but they must also sanitize it in case the validation is flawed; i.e., output sanitization must be done independently.
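One way to combine both steps, sketched here under assumptions, is to check the column against an allow-list and still quote it before it goes into the SQL text. The quoteIdentifier helper below is a hand-written stand-in that only mimics the quote-doubling behaviour described above (abc"xyz becomes "abc""xyz"); it is not the driver's escapeIdentifier(), and the fruit table and column names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class OrderByBuilder {
    // Hypothetical allow-list of columns that users may sort by.
    private static final Set<String> SORTABLE_COLUMNS =
            new HashSet<>(Arrays.asList("fruit_name", "fruit_color"));

    // Wrap the identifier in double quotes and double any embedded quote.
    // Illustrative only: unlike the driver's method, it does not reject \0.
    public static String quoteIdentifier(String identifier) {
        return "\"" + identifier.replace("\"", "\"\"") + "\"";
    }

    // Strict validation first, then quoting as an independent second layer.
    public static String orderedQuery(String userSpecifiedCol) {
        if (!SORTABLE_COLUMNS.contains(userSpecifiedCol)) {
            throw new IllegalArgumentException("Unknown sort column");
        }
        return "SELECT * FROM fruit ORDER BY " + quoteIdentifier(userSpecifiedCol);
    }

    public static void main(String[] args) {
        System.out.println(orderedQuery("fruit_color"));
    }
}
```

Note that a quoted identifier is case-sensitive in PostgreSQL, so the allow-list entries need to match the column names exactly as they were created.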

Scanning a number and returning the lexeme in the input stream- Java?

I am trying to write a method that will scan the input and return a String representing the lexeme found in the input string.
This is what I have so far but I don't know if I'm going in the right direction-- all help would be appreciated :)
private String scanNumbers(char input)
{
String result= "";
int value = in.read();
if(value != -1)
{
if(isDigit(input))
{
result = Integer.toString(value);
}
}
return result;
}
public static boolean isDigit(char input)
{
return (input >= '0' && input <= '9');
}
Thank you I am new to parsing/lexemes/compilers.
Introduction
Questions that appear to be related to a homework exercise are often slow to be answered on SO. We often wait until the deadline has well passed!
You mention you are new to the topics of parsing/lexemes/compilers, and want some help in writing a Java method to scan the input and return a string representing the lexeme found in the input string. Later you clarify, indicating that you want a method that skips characters until it finds digits.
There is quite a bit of confusion in your question which produces conflicts in what you want to achieve.
It is not clear if you are wanting to learn about performing lexical analysis in Java as part of a larger compiler project, whether you only want to do it with numbers, whether you are looking for existing tools or methods that do this or are trying to learn how to program such methods yourself. If you are programming, whether you only need to know about reading a number, or if this is just an example of the kind of things you want to do.
Lexical Analysis
Lexical analysis, which is also known as scanning, is the process of reading a corpus of text which is composed of characters. This can be done for several purposes, such as data input, linguistic analysis of written material (such as word frequency counting) or as part of language compilation or interpretation. When done as part of compilation it is one (and usually the first) of a sequence of phases that include parsing, semantic analysis, code generation, optimisation and such. When writing compilers, code-generator tools are usually used, so if it was desired to write a compiler in Java, then a Java lexical generator and a Java parser generator would often be used to create the Java code for those compiler components. Sometimes the lexer and parser are hand written, but it is not a recommended task for a novice. It would require a compiler-writing specialist to build a compiler by hand better than a tool-set. Sometimes, as a class exercise, students are asked to write code to perform a piece of lexical analysis to help them understand the process, but this is often for only a few lexemes, like your digit exercise.
The term lexeme is used to describe a sequence of characters that compose an individual entity recognised by a lexical analyser. Once recognised it is usually represented by a token. The lexeme is therefore replaced by a token as part of the lexical analysis process. A lexical analyser will sometimes record the lexeme in a symbol table for later use before replacing it by the token. This is how identifiers in programs are often recorded in a compiler.
There are several tools for building lexers in Java. Two of the most common are JLex and JFlex. To illustrate how they work, to recognise an integer whilst skipping whitespace, we would use the following rules:
%%
WHITE_SPACE_CHAR=[\n\ \t\b\012]
DIGIT=[0-9]
%%
{WHITE_SPACE_CHAR}+ { }
{DIGIT}+ { return(new Yytoken(42,yytext(),yyline,yychar,yychar + yytext().length())); }
%%
which would be processed by the tool to produce Java methods to achieve that task.
The notations used to describe the lexemes are usually written as regular expressions. Computer Science theory can help us with the programming of a lexical analyser. Regular expressions can be represented by a form of finite state automata. There is a particular style of coding that can be used to match lexemes that experienced programmers would recognise and use in this situation, which involves a switch inside a loop:
while ( ! eof ) {
switch ( next_symbol() ) {
case symbol:
...
break;
default:
error(diagnostic); break;
}
}
It is often these concepts that a simple lexical programming exercise is intended to introduce to students.
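As a concrete, hedged illustration of that switch-inside-a-loop style, the sketch below scans a String rather than an input stream and collects every run of decimal digits as one lexeme. The DigitScanner class and the '\0' end-of-input sentinel are choices made for this example only.

```java
import java.util.ArrayList;
import java.util.List;

public class DigitScanner {
    // Returns each maximal run of decimal digits in the input as a lexeme.
    public static List<String> scanIntegers(String input) {
        List<String> lexemes = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int pos = 0;
        // Run one step past the end so a trailing number is flushed too.
        while (pos <= input.length()) {
            // '\0' stands in for end of input (assumed not to occur in the text).
            char symbol = pos < input.length() ? input.charAt(pos) : '\0';
            switch (symbol) {
                case '0': case '1': case '2': case '3': case '4':
                case '5': case '6': case '7': case '8': case '9':
                    current.append(symbol);   // stay in the "in number" state
                    break;
                default:
                    // Any other symbol ends the current lexeme, if there is one.
                    if (current.length() > 0) {
                        lexemes.add(current.toString());
                        current.setLength(0);
                    }
                    break;
            }
            pos++;
        }
        return lexemes;
    }

    public static void main(String[] args) {
        System.out.println(scanIntegers("  12 abc 345"));
    }
}
```

Whether the builder is empty acts as the automaton's state here: empty means "skipping", non-empty means "inside a number".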
Tokenizing in Java
With all those preliminary explanations out of the way, let's get down to your piece of Java code. As mentioned in the comments, there is a difference in Java between reading bytes from an input stream and reading characters, as characters are Unicode, which Java represents with 16-bit (two-byte) char values. You have used a byte read within a character processing method.
Recognising simple tokens in an input stream, particularly for data entry, is such a common activity that Java has a specific built-in class for it called the StreamTokenizer.
We could implement your task in the following way, for example:
// create a new tokenizer
Reader r = new BufferedReader(new InputStreamReader( System.in ));
StreamTokenizer st = new StreamTokenizer(r);
// print the stream tokens
boolean eof = false;
do {
int token = st.nextToken();
switch (token) {
case StreamTokenizer.TT_EOF:
System.out.println("End of File encountered.");
eof = true;
break;
case StreamTokenizer.TT_EOL:
System.out.println("End of Line encountered.");
break;
case StreamTokenizer.TT_NUMBER:
System.out.println("Number: " + st.nval);
break;
default:
System.out.println((char) token + " encountered.");
if (token == '!') {
eof = true;
}
}
} while (!eof);
However, this does not return the string of the lexeme for a number, only matches the number and gets the value.
I see you have noticed the Java class java.util.Scanner because your question had that as a tag. This is another class that can perform similar operations.
We could get an integer lexeme from the input like this:
Scanner s = new Scanner(System.in);
System.out.println(s.nextInt());
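nextInt() returns the parsed value, not the characters that were read, and it fails if the next token is not a number. If the lexeme itself is wanted, one possible sketch is Scanner.findWithinHorizon, which returns the matched text as a String. The example reads from a String so it can be run without console input; the class and method names are illustrative.

```java
import java.util.Scanner;

public class FirstIntegerLexeme {
    // Returns the first run of decimal digits in the source as text,
    // or null if the source contains no digits.
    public static String firstInteger(String source) {
        try (Scanner scanner = new Scanner(source)) {
            // A horizon of 0 means the whole input may be searched.
            return scanner.findWithinHorizon("\\d+", 0);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstInteger("abc 042 x"));
    }
}
```

Because the result is the matched text, leading zeros are preserved, which a numeric value would lose.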
Solution
Finally, let's rewrite your original code to find the lexeme for an integer, skipping over any unwanted characters, using Java regular expression matching.
import java.io.IOException; import java.io.InputStreamReader;
import java.util.regex.Pattern;
public class ReadNumbers {
static InputStreamReader in = null; // Have input source as a global
static int value = -1; // and the current input value
public static void main ( String [] args ) {
try {
in = new InputStreamReader(System.in); // Set up the input
value = in.read(); // pre-fill the input state
System.out.println(scanNumbers()) ;
}
catch (Exception e) {
e.printStackTrace(); // print error
}
}
private static String scanNumbers() {
String SkipCharacters = "\\s" ; // Characters that can be skipped
String result= ""; // empty string to store lexeme
int charcount=0;
try {
while ( (value != -1) && Pattern.matches(SkipCharacters,"" + (char)value) )
// Now skip optional characters before the number
value = in.read() ; // pre-load the next character
while ( (value != -1) && isDigit((char)value)) {
// Now find the number digits
result = result + (char)value; // append digit character to result
value = in.read() ; // pre-load the next character
}
} finally {
return result;
}
}
public static boolean isDigit(char input) {
return (input >= '0' && input <= '9');
}
}
Afterword
The comment from @markspace is interesting and useful, as it points out that not all numbers are solely decimal digits.
Consider numbers in other bases, like hexadecimal. Java allows integer constants to be specified in those number bases, which do not just use the digits 0..9.

sql command not properly ended (where is my mistakes)

I hope to receive an answer this time.
I wrote the code below but don't know where my mistake is;
it seems correct to me.
This code should insert more than a million records into Oracle XE.
I first wrote it with a single-row insert statement, executing the PreparedStatement one row at a time,
but the run took 6 hours,
because I was forced to use Thread.sleep().
package tokenizing;
import java.sql.*;
import java.util.StringTokenizer;
public class TokenExtraction2 {
public static void main(String[] args) throws Exception {
String myText[]=new String[2276];
Jdbc db=new Jdbc();
String st1=null;
int i=0;
int j=0;
String tokens[][]=new String [3000000][2];
st1="select ntext from NEWSTEXT ";
ResultSet result=db.select(st1);
while(result.next())
{
myText[i]=result.getString("ntext");
++i;
}
db.closedb();
i=0;
StringBuilder st= new StringBuilder("insert into tokens5(token,tokenlength) values");
while(i<2276)
{
StringTokenizer s=new StringTokenizer(myText[i]," 0123456789*./»«،~!##$%^&()_-\"+=:;|<>?“؟”’{}[]‘,\\\t\n\r\fabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ...—`—ـ؛–…_");
while(s.hasMoreTokens()){
String key=s.nextToken();
tokens[j][0]=key;
tokens[j][1]=(key.length())+"";
st.append("(?,?)");
if( i<2276 && s.hasMoreTokens())
st.append(", ");
else
st.append(";");
//db.insert(st, key, key.length());
//db.closedb();
System.out.println(key+"\t");
j++;
}
System.out.println("num of news is: "+i);
System.out.println("*****************************************************************************************");
System.out.println("num of tokens is: "+j);
System.out.println("next news"+"\t");
//j=0;
i++;
}
System.out.println(st);
int k=0;
Class.forName("oracle.jdbc.driver.OracleDriver") ;
Connection con = DriverManager.getConnection("jdbc:oracle:thin:@localhost:1521:xe","ALBALOO","myjava123");
PreparedStatement ps=con.prepareStatement(st.toString());
// con.setAutoCommit(false);
//j=1;
i=0;
//j=j-286;
while(k<j)
{
i=i+1;
ps.setString(i, tokens[k][0]);
System.out.println(i);
i=i+1;
ps.setInt(i,Integer.parseInt(tokens[k][1]));
System.out.println(k+2);
k++;
}
ps.executeUpdate();
//con.commit();
}
}
You seem to be trying to insert multiple rows with a single insert statement, by passing multiple sets of values; st appears to end up as:
insert into tokens5(token,tokenlength) values (?,?), (?,?);(?,?), ...;
with thousands of value-pair placeholders. You can't pass multiple sets of values like that. Oracle isn't expecting a comma after the first (?,?), hence the ORA-00933 error. You also have multiple semicolons in there, as you're appending one each time around the i while loop. As Mark Rotteveel pointed out, you should not have any, as Oracle JDBC doesn't allow multiple statements.
You might be better off implementing a string tokenizer as a function on the database and then doing a single insert ... select from newstext, rather than pulling all the data out, converting it and pushing it back. You should at least batch up your updates though. You could pass the tokens as an array argument to a stored procedure, for example.
I'm struggling to understand what you're really doing though, as it looks like you're splitting a string on pretty much any character, which doesn't leave much for the actual keys, does it? It's hard to follow though...
If you look at the Oracle INSERT description in the SQL Language Reference, then you can see that Oracle does not support inserting multiple rows using VALUES. Also as I commented above, using ; in a query doesn't always work as it is usually not part of the query itself, but a terminator for command line or script input.
In your specific case you are even trying to put multiple statements into one prepare. In JDBC a single statement prepare (or execute) should only be one actual statement, not multiple statements separated by ;. Drivers (or the database) usually don't allow it, although some provide options to execute multiple statements, but that is not compliant with JDBC.
Instead you can use JDBC batched updates:
con.setAutoCommit(false);
try (PreparedStatement pstmt = con.prepareStatement(
        "insert into tokens5(token,tokenlength) values (?, ?)")) {
    // I use tokens as an abstraction on how you get the token and its length
    while (tokens.next()) {
        pstmt.setString(1, tokens.token());
        pstmt.setInt(2, tokens.length());
        // Queue this row; nothing is sent until executeBatch()
        pstmt.addBatch();
    }
    pstmt.executeBatch();
    // Optionally do something with result of executeBatch()
    con.commit();
}
Depending on the database and driver, this will either perform about as well as a multi-values insert (I believe it does with Oracle), or simply behave as if you executed a single PreparedStatement multiple times with different values.
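With thousands of tokens it is common to flush the batch every so often rather than queueing everything before one executeBatch() call. The following is only a sketch of that idea, not part of the answer above: the class name ChunkedBatchSketch, the method insertTokens and the chunkSize parameter are made up for illustration, and it assumes the statement was prepared with the two-placeholder insert shown earlier.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Hypothetical helper: all names here are illustrative assumptions.
final class ChunkedBatchSketch {

    /**
     * Queues one row per token and calls executeBatch() every chunkSize rows,
     * plus once more at the end for any remainder.
     * Returns the number of executeBatch() calls that were made.
     */
    static int insertTokens(PreparedStatement pstmt, List<String> tokens, int chunkSize)
            throws SQLException {
        int flushes = 0;
        int pending = 0;
        for (String token : tokens) {
            pstmt.setString(1, token);          // first placeholder: the token
            pstmt.setInt(2, token.length());    // second placeholder: its length
            pstmt.addBatch();
            pending++;
            if (pending == chunkSize) {
                pstmt.executeBatch();
                flushes++;
                pending = 0;
            }
        }
        if (pending > 0) {
            pstmt.executeBatch();
            flushes++;
        }
        return flushes;
    }
}
```

The caller would still own the connection, the try-with-resources block and the final con.commit(), exactly as in the example above. A suitable chunkSize depends on the driver and the data, so treat any particular value as something to measure rather than assume.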

How to generate String "elegantly" in Java?

I want to generate a string such as an SQL command:
"INSERT INTO xxx VALUES(XXX, XXX, XXX)"
Currently I use StringBuilder and some String constants such as "INSERT INTO" to concatenate the input String parameters for the table name and the inserted values.
However, performance issues aside, this plain concatenation does not look elegant.
Is there any other way of doing this?
In my opinion, JDBC's prepared statement is a good example of such a "command template":
PreparedStatement pstmt = connection.prepareStatement("INSERT INTO ? VALUES(?,?,?)");
Then you can set the table name and the inserted values:
pstmt.setString(1,"tableA");
pstmt.setInt(2, 100);
...
However, I cannot use a prepared statement, since all I want is a String...
Someone also hinted that I could use java.util.regex or JavaCC to produce the String.
But as far as I can see, whichever approach is chosen for the sake of code elegance, the Java String must ultimately be generated by something like StringBuilder, right?
You could use String.format():
String.format("insert into %s values('%s', '%s', '%s')", "user", "user123", "pass123", "yellow");
It's worth noting though, that any of these "string building" techniques leave you vulnerable to SQL injection attacks. You should really use JDBC parameterised queries wherever possible.
Edited to add quotes around strings.
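To make the injection warning concrete, here is a small sketch that reuses the format string from this answer. The class and method names (FormatInjectionDemo, buildInsert) are made up for illustration; the point is only that a value containing a single quote can close the string literal and append its own SQL.

```java
// Illustrative only: shows why formatting values straight into SQL is risky.
final class FormatInjectionDemo {

    // Same template as the String.format() call above, wrapped in a method.
    static String buildInsert(String table, String name, String password, String colour) {
        return String.format("insert into %s values('%s', '%s', '%s')",
                table, name, password, colour);
    }

    public static void main(String[] args) {
        // Harmless input produces the statement you expect:
        // insert into user values('user123', 'pass123', 'yellow')
        System.out.println(buildInsert("user", "user123", "pass123", "yellow"));

        // A value containing a quote ends the literal early and smuggles in
        // a second statement:
        // insert into user values('user123', 'pass123', 'x'); drop table user; --')
        System.out.println(buildInsert("user", "user123", "pass123",
                "x'); drop table user; --"));
    }
}
```

Whether the second statement would actually run depends on the database and driver, but the generated text is no longer the single insert that was intended.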
Maybe you are looking for java.text.MessageFormat
int planet = 7;
String event = "a disturbance in the Force";
String result = MessageFormat.format(
"At {1,time} on {1,date}, there was {2} on planet {0,number,integer}.",
planet, new Date(), event);
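Applied to the SQL case from the question, a MessageFormat template might look like the sketch below. The class and method names are invented for illustration. Two details of MessageFormat matter here: a single quote is an escape character inside the pattern, so a literal quote has to be written as two quotes, and Number arguments are formatted for the default locale (1000 may come out as 1,000), which is why this sketch passes every value as a String.

```java
import java.text.MessageFormat;

// Illustrative sketch; the names are assumptions, not an established API.
final class MessageFormatSqlSketch {

    // Values are inserted verbatim, so the caller supplies ready-made SQL literals.
    static String insertSql(String table, String v1, String v2, String v3) {
        return MessageFormat.format("INSERT INTO {0} VALUES({1}, {2}, {3})",
                table, v1, v2, v3);
    }

    // Quotes written in the pattern itself must be doubled to appear in the output.
    static String quoted(String table, String text) {
        return MessageFormat.format("INSERT INTO {0} VALUES(''{1}'')", table, text);
    }

    public static void main(String[] args) {
        // INSERT INTO tableA VALUES(100, 'abc', NULL)
        System.out.println(insertSql("tableA", "100", "'abc'", "NULL"));
        // INSERT INTO tableA VALUES('abc')
        System.out.println(quoted("tableA", "abc"));
    }
}
```

Like String.format(), this only fills in text; it does no escaping, so the same SQL injection caveat applies.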
Have you tried just using '+'?
String sql = "INSERT INTO " + table
+ " VALUES(" + value1 + ", " + value2 + ", " + value3 + ")";
Given that none of the various other answers met your approval, perhaps you should accept that the actual String generation (sans JPA, PreparedStatement, etc.) is going to be fairly inelegant, and create a utility class with static SQL generators.
Edit: showing an example of how I'd go about this if a pre-existing class such as PreparedStatement weren't an option. It's not the most elegant, but it does what it's supposed to (assuming I typed it all in correctly).
import java.util.List;
import org.apache.commons.lang.StringEscapeUtils; // Commons Lang 2.x

// CustomParameter and the ParmTypes enum are assumed to be defined elsewhere.
public class SQLUtil {
    public static String generateInsertSQL(String tableName, List<CustomParameter> parmList) {
        StringBuilder sb = new StringBuilder();
        sb.append("insert into ");
        sb.append(tableName);
        sb.append(" values (");
        for (int i = 0; i < parmList.size(); i++) {
            CustomParameter parm = parmList.get(i);
            switch (parm.getType()) { // enum with your desired sql types
                case String: // ParmTypes.String; case labels use the unqualified constant
                    sb.append("'");
                    sb.append(StringEscapeUtils.escapeSql(String.valueOf(parm.getValue())));
                    sb.append("'");
                    break;
                case Integer: // ParmTypes.Integer
                    // parseInt rejects anything that is not a plain integer
                    sb.append(Integer.parseInt(String.valueOf(parm.getValue())));
                    break;
            }
            if (i < parmList.size() - 1) sb.append(",");
        }
        sb.append(")");
        return sb.toString();
    }
}
This way, your business code will remain relatively elegant and you can play around with the SQL String generation to your heart's content. You can also use this to "guarantee" all your inserts are protected against such attacks as SQL injection.
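One caveat: StringEscapeUtils.escapeSql comes from Commons Lang 2.x and was removed in Commons Lang 3, and all it did was double single quotes. If that library is not available, a dependency-free variant of the same idea could look like the sketch below. The names (InsertSqlSketch, quote, literal, insertSql) are made up for illustration, the table name is not validated, and quote doubling alone may not be enough for databases that also treat backslash as an escape character, so this is a sketch rather than a real defence.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

// Illustrative sketch only; not a substitute for parameterised queries.
final class InsertSqlSketch {

    // Wraps text in single quotes, doubling any embedded single quote.
    static String quote(String text) {
        return "'" + text.replace("'", "''") + "'";
    }

    // null becomes NULL, numbers are written as-is, everything else is quoted.
    static String literal(Object value) {
        if (value == null) {
            return "NULL";
        }
        if (value instanceof Number) {
            return value.toString();
        }
        return quote(value.toString());
    }

    // Joins the literals with ", " between the fixed prefix and suffix.
    static String insertSql(String table, List<?> values) {
        StringJoiner joiner =
                new StringJoiner(", ", "insert into " + table + " values (", ")");
        for (Object value : values) {
            joiner.add(literal(value));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // insert into person values ('O''Brien', 42, NULL)
        System.out.println(insertSql("person", Arrays.asList("O'Brien", 42, null)));
    }
}
```

StringJoiner takes care of the separators, which removes the "is this the last element" check from the loop.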
StringTemplate (http://www.stringtemplate.org/) may be a good choice.
This looks better, right?
StringTemplate insert = new StringTemplate("INSERT $table$ VALUES ($value; separator=\",\"$)");
insert.setAttribute("table", "aTable");
String[] values = {"1", "1", "'aaa'", "'bbb'"};
for(int i = 0;i < values.length;i++){
insert.setAttribute("value", values[i]);
}
System.out.println(insert.toString());
