Best Practices with PreparedStatements; when to and when not to

I recently began using prepared statements again in a web application, and I know that it is discouraged to use prepared statements for all transactions. What I do not know is when it is best to use prepared statements and when it is not.
I have read about when to use them and when not to, but none of the examples really describe best practice.
I am trying to figure out which database calls I should use them for and which ones I should not.
For example, the MySQL website covers this under "When to use prepared statements" on the following page: Prepared Statements-MySQL

The general rule of thumb in deciding whether to go for a PreparedStatement or not is:
Use Prepared Statements, unless you have sufficient reason not to. Prepared Statements are compiled before execution, which lends itself to better performance, and they increase security against SQL injection because the database server takes care of the encoding of special characters.
Going by the article that you have referenced, the cases where I believe Prepared Statements are less useful than normal queries or stored procedures are:
One-time queries. If your application makes a single query to the database, and this is done infrequently compared to the other queries, it might not make sense to use a Prepared Statement in this case. The rationale is that the Prepared Statement must first be compiled and the 'compiled' form of the statement is cached for later use. For queries that are run infrequently, the compilation is an overhead. But still, it is preferable to use prepared statements, to avoid any SQL injection issues.
Data-intensive operations. Sometimes Prepared Statements are not as effective as stored procedures, especially when a sequence of operations needs to be performed in the same transaction. When you have a business process that requires multiple selects, updates and deletes to be executed against a variety of tables, stored procedures are often better than a bunch of prepared statements executed one after the other. The performance penalty can become serious because several network round trips are made to execute multiple statements, a cost that is considerably reduced when invoking a stored procedure. The effect is more pronounced in query batching, where several objects are created and destroyed in a short duration of time. This tends to be a contentious issue between database administrators and application developers, as it is an edge case; DBAs will believe that the batching of operations is better performed via SPs, while application developers believe that PreparedStatements can handle it (it's usually better to have all logic in one tier). It eventually depends on the application whether using SPs is an advantage or not.
Support for native database operations and types. This might not hold for MySQL, but in general the JDBC standard does not support all the operations supported by a database, nor all the SQL/native/custom types supported by the database. This is more pronounced in the Oracle database (and possibly IBM DB2?), where programmers can create their own types, which require custom Java code to be written as the JDBC standard does not support User-Defined Types in the database. Similarly, some other database operations are not supported (as the MySQL document states): one cannot create users (execute CREATE USER), modify user privileges (perform GRANT operations) etc. using a Prepared Statement. Stored procedures are better suited to this task, as they have access to the native operation set of the database, either directly or indirectly.
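To make the compile-once, execute-many case concrete (the opposite of the one-time query in the first point), here is a minimal sketch of one PreparedStatement reused for a batch of inserts. The table person, its column name, and the class and method names are assumptions made up for illustration; they do not come from the question or the MySQL article.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

final class BatchInsertSketch {
    // Hypothetical table and column, used only for illustration.
    static final String INSERT_SQL = "INSERT INTO person (name) VALUES (?)";

    // Prepares the statement once, binds each name, and sends all rows as one batch.
    // Returns the number of rows that were queued.
    static int insertNames(Connection conn, List<String> names) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            int queued = 0;
            for (String name : names) {
                ps.setString(1, name); // bound as data, never spliced into the SQL text
                ps.addBatch();
                queued++;
            }
            if (queued > 0) {
                ps.executeBatch();
            }
            return queued;
        }
    }
}
```

Whether this beats a stored procedure for a given workload is exactly the judgment call described above; the sketch only shows that the statement is prepared once and the values are sent separately.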

To prevent SQL injection, it is better to use prepared statements in Java.
For more information: SQL injections with prepared statements?

PreparedStatements have two major uses:
Preventing SQL injection attacks. This basically means automated sanitizing of inputs from external sources (web browser is external!) which are going to be saved to the database.
Batch processing. If you have a lot of data to enter into/modify in/remove from database at once, PreparedStatement can be used for that. In this case, PreparedStatement optimizes away most of the overhead of such operations and allows you to write fast database batch code.
Both of these are very compelling reasons to use PreparedStatement almost always; however, depending on how you're using the database, you may hit a point where PreparedStatement won't allow you to do what you want.
As an example of such a case, I once wrote a tool which generated table names on the fly based on runtime properties of certain abstractions, which meant that I needed SQL queries with mutable table names. You can't get those with PreparedStatement, so I had to use raw Statements and some preprocessing trickery to get back to utilizing PreparedStatements for SQL injection protection.
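One way such preprocessing might look (not necessarily what was done in that tool) is to check the generated table name against an allow-list before splicing it into the SQL text, while still leaving the values as ? placeholders. The table names and the class and method names below are hypothetical.

```java
import java.util.Set;

final class TableNameSketch {
    // Hypothetical allow-list; a real tool might build it from configuration or metadata.
    static final Set<String> ALLOWED_TABLES = Set.of("orders_2023", "orders_2024");

    // Returns SQL with the table name spliced in and the value left as a ? placeholder.
    // Throws IllegalArgumentException for any name that is not on the allow-list.
    static String selectByIdSql(String tableName) {
        if (tableName == null || !ALLOWED_TABLES.contains(tableName)) {
            throw new IllegalArgumentException("Unknown table: " + tableName);
        }
        return "SELECT * FROM " + tableName + " WHERE id = ?";
    }
}
```

The returned string can then be passed to Connection.prepareStatement, so the id is still bound through setLong or setString rather than concatenated.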

Related

createQuery vs createNativeQuery, performance difference for update/delete statements

Is there a performance difference between:
entityManager.createQuery("UPDATE MyTable SET coll1 = :someValue").setParameter("someValue", someValue).executeUpdate();
and
entityManager.createNativeQuery("UPDATE MyTable SET coll1 = :someValue").setParameter("someValue", someValue).executeUpdate();
and if yes, is it large enough to prefer one approach over the other?
I am making a performance comparison between Hibernate and Entity Framework Core. In EF Core this kind of thing can only be done using native SQL (well, there are third-party libs), so I want to know if I should switch out all createQuery().executeUpdate() calls for createNativeQuery().executeUpdate() in my Hibernate project.

As with anything of this nature, you should test on your data and your system.
However, the createNativeQuery() interface is designed to let you invoke SQL directly, rather than going through the ORM mapping. You have a simple update statement here, so the generated SQL should be remarkably close to the native SQL.
You are not relying on any underlying features of the database. There might be a little additional overhead in the translation via the ORM -- but you have already accepted that overhead by choosing to use an ORM.
I would say to stick with the framework, unless testing shows that there is a noticeable loss of performance.

Statement.executeQuery() and SQL injection

We have an internal web-based tool that allows arbitrary SQL queries against a database. Access to the tool is limited. I am more worried about mistakes or accidents than about someone intentionally tampering with data or attacking.
The queries are ultimately executed by Statement.executeQuery and the results are returned. I tried a few test runs and it seems that executeQuery, as the documentation suggests, fails on any call other than a select.
Are there any other SQL statements or combinations that can trick the executeQuery call into causing changes in the database (insert/update/delete/drop etc.)? I tried a few SQL injection examples available on the web and they failed in every case.

SQL injection attacks are possible when the query arguments are concatenated into the query template, allowing an attacker to inject malicious code.
If your Statement queries don't take any parameters, the client has no way to inject a malicious SQL routine. Whenever you have parameterized queries, you should use PreparedStatement instead.
As for statement restriction, you should have the DBA provide you with a database user account that can only execute SELECT and DML statements, and only on the application schema. DROP and TRUNCATE privileges shouldn't be granted to the application user account.
If you use dynamic schema upgrades (e.g. Flyway), you can use a separate database account and a separate DataSource for that specific case.
This way, you will also protect yourself against data corruption due to application developers' mistakes.

Using Java locks for database concurrency

I have the following scenario.
I have two tables. One stores multiple values that are counters for transactions. Through a Java application, a value from the first table is read, incremented and written to the second table, and the new value is also written back to the first table. Obviously there is potential for this to go wrong, as it's a multi-user system.
My solution, in Java, is to provide Locks that have to (well, should) be acquired before any action can be taken on either table. These Locks, ReentrantLocks, are static, and there is one for each column in Table 1, as the values are completely independent of each other.
Is this a recommended approach?
Cheers.

No. Use implicit database locks [1] for database concurrency. Relational databases support transactions, which are a vital part of ACID: use them.
Java-centric locks will not work cross-VM and as such will not help in multi-user/multi-server environments.
[1] Databases are smart enough to acquire/release locks to ensure Consistency and Isolation and may even use "lock free" implementations such as MVCC. There are rare occasions when explicit database locks must be requested, but this is an advanced use-case.
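As a rough sketch of what relying on transactions might look like for the counter scenario in the question: the increment and the copy to the second table run as one database transaction, and the database's own row locking serializes concurrent increments. The tables counters(name, counter_value) and counter_log(name, counter_value), and all class and method names, are assumptions; the exact locking behaviour depends on the database and its isolation level.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

final class CounterTransactionSketch {
    // Hypothetical schema: counters(name, counter_value), counter_log(name, counter_value).
    static final String INCREMENT_SQL =
        "UPDATE counters SET counter_value = counter_value + 1 WHERE name = ?";
    static final String COPY_SQL =
        "INSERT INTO counter_log (name, counter_value) "
        + "SELECT name, counter_value FROM counters WHERE name = ?";

    // Increments one counter and records the new value, as a single transaction.
    static void incrementAndLog(Connection conn, String counterName) throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        try (PreparedStatement inc = conn.prepareStatement(INCREMENT_SQL);
             PreparedStatement copy = conn.prepareStatement(COPY_SQL)) {
            inc.setString(1, counterName);
            if (inc.executeUpdate() != 1) {
                throw new SQLException("No such counter: " + counterName);
            }
            copy.setString(1, counterName);
            copy.executeUpdate();
            conn.commit(); // both changes become visible together
        } catch (SQLException e) {
            conn.rollback(); // neither change is kept if anything failed
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
    }
}
```

Because the increment happens inside the UPDATE itself, there is no read-then-write gap in the Java code for another user or another VM to slip into.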

Whilst agreeing with some of the sentiments of @pst's answer, I would say this depends slightly.
If the sequence of events is, and probably always will be, essentially "SQL oriented", then you may as well do the locking at the database level (and indeed, probably implicitly via the use of transactions).
However if there is, or you are planning to build in, significant data manipulation logic within your app tier (either generally or in the case of this specific operation), then locking at the app level may be more appropriate. (In reality, you will probably still run your SQL in transactions so that you're actually locking at both levels.)
I don't think the issue of multiple VMs is necessarily a compelling issue on its own for relying on DB-level locking. If you have multiple server apps accessing the database, you will in any case want to establish a well-defined protocol for which data is accessed concurrently under what circumstances. And in a system of moderate complexity, you will in any case want to build in a system of running periodic sanity checks on the data. (Even if your server apps are perfectly behaved 100% of the time, will back end tech support never ever ever have to run some miscellaneous SQL on the database outside your app...?)

Disadvantages of Object Relational Mapping

I am a fan of ORM (Object Relational Mapping) and I have been using it with Rails for the past year and a half. Prior to that, I used to write raw queries using JDBC and make the database do the heavy lifting via stored procedures. With ORM, I was initially happy to do stuff like coach.manager and manager.coaches, which were very simple and easy to read.
But as time went by, innumerable associations crept in and I ended up doing a.b.c.d, which fired queries in all directions behind the scenes. With Rails and Ruby, the garbage collector went nuts and a very complex page involving relatively little data took an insane amount of time to load. I had to replace this ORM-style code with a simple stored procedure, and the improvement I saw was enormous. A page that took 50 seconds to load now takes only 2 seconds.
With this huge difference, should I continue using ORM? It is very clear that it has severe overheads compared to a raw query.
In general, what are the pitfalls of using an ORM framework like Hibernate or ActiveRecord?

An ORM is only a tool. If you don't use it correctly, you'll have bad results.
Nothing stops you from using dedicated HQL/criteria queries, with fetch joins or projections, to return the information that your page must display in as few queries as possible. This will take more or less the same time as dedicated SQL queries.
But of course, if you just get everything by ID and navigate through your objects without realizing how many queries it generates, it will lead to long loading times. The key is to know exactly what the ORM does behind the scene, and decide if it's appropriate or if another strategy must be adopted.

I think you've already identified the major tradeoff associated with ORM software. Every time you add a new layer of abstraction that tries to provide a generalized implementation of something that you used to do by hand there is going to be some loss of performance/efficiency.
As you noted, traversing multiple relationships such as a.b.c.d can be inefficient, because most ORM software will be doing an independent database query for each . along the way. But I'm not sure that means you should eliminate ORM altogether. Most ORM solutions (or at least, certainly Hibernate) allow you to specify custom queries where you can bring back exactly what you want in a single database operation. This should be about as fast as your dedicated SQL.
Really the issue is about understanding how the ORM layer is working behind the scenes, and realizing that while something like a.b.c.d is simple to write, what it causes the ORM layer to do as it is evaluated is not. As a general rule I always go with the simplest possible approach to begin, and then write optimized queries in areas where it makes sense/where it is obvious that the simple approach will not scale.

I'd say, one should use the appropriate tool for different tasks.
E.g., for CRUD operations, ORM frameworks like Hibernate can speed up development and will perform well enough. Sometimes you need to make some tweaks to achieve acceptable performance. I'm not convinced that your task (which took 50 sec with Hibernate) could not be done properly with Hibernate, but you did not provide us with the details.
On the other hand, bulk operations involving hundreds of thousands of records, for example, are not the type of task you'd expect Hibernate to do without a significant performance penalty.

As was mentioned already, an ORM is only a tool, and you can use it either well or badly.
One of the most typical performance problems in ORMs is the 1+N queries problem. It is caused by loading additional objects for each of the objects in a list, for example through an eager fetch of 1-to-n related entities for each element of the list. The remedy is to use HQL queries, to specify the needed fields in a projection, or to mark the fetching of 1-to-n relations as lazy.
At any time, you must know exactly what the ORM is doing in order to achieve good performance. Not understanding what operations are done in the background is a road to disaster (slow, buggy and hard-to-analyze code because of unnecessary and wrongly written workarounds).

I'm with Petar from your comments regarding the lazy fetching. Say you have an HTML table filled with fields from object a.b.c.d. You could find your framework round-tripping to the database thousands of times (possibly many more). The disadvantage of ORM in this case is that you have to read the documentation thoroughly. Most frameworks support disabling lazy fetching, and many even support adding your own processing logic to bind the data set.
The net result is that almost any ORM is almost undoubtedly better than anything you are going to write yourself. You will find yourself saddled with maintaining huge libraries of boilerplate or, worse, writing the same code over and over again.

We are currently investigating a switch from our own data store layer, with a clean separation of transfer objects and data access objects, to JPA. We used a generator to create the TOs, the DAOs and the SQL DDL from documentation in DocBook format. As a result, the documentation, the database structure and the generated Java classes were always in sync, and the database itself was well documented.
What we discovered so far by using JPA:
Foreign key references cannot be used for imports, some special queries and so on, because they must not be placed in a managed entity. JPA only allows the target class there.
Access to some user session scope is difficult, if not impossible. We still have no clue how to get the user's id into the column 'userWhoLastMadeAnUpdate' in some PrePersist method.
Something expected to be quite easy with an ORM, namely "class mapping", does not work at all. We are using HalDateTime (http://sourceforge.net/projects/haldatetime/) internally, especially in the client. Mapping it with JPA directly is not possible although HalDateTime supports it. Due to JPA restrictions we have to use two fields in the entity.
JPA can use an XML file to describe the mapping. So you have to look into at least two files to even understand the relationship between the Java class and the database. And the XML file becomes huge for large applications.
Alternatively, ORMs provide annotations in the Java class itself. So it's easier to learn and understand the relationship. But it forces you to see all that database stuff in the client layer (which completely breaks a proper layering).
You will have to restrict yourself to staying as close to a clean database structure as possible. Otherwise you will for sure end up with a mess of queries and statements generated by the ORM.
Use an ORM which provides a query language which is close to SQL itself (JPA seems quite acceptable here). An ORM-induced language makes supporting a large application really expensive.

Avoiding SQL Injection

I want to avoid SQL Injections in my Webapp.
It's Java based.
Are PreparedStatements enough?
Do i have to filter out the ' and "? Are there already solutions for this in Java?

My gut response to the question in your second paragraph is that it's usually a bad idea to consider a single aspect "enough" for this sort of issue - at least if you do this to the point that you stop thinking about the principles involved.
Using PreparedStatements does go a long way to stopping SQL injection, just like slapping down synchronized everywhere goes a long way to stopping data races. And in many individual situations they'll be entirely sufficient. But in both cases they're not magic bullets: you need to be aware of the reasons you're using them, and when and where they're insufficient. For example, if you think PreparedStatements are a magic wrapper that prevents SQL injection, you'll be very disappointed the first time you need to create a dynamic statement (as opposed to merely a parameterised one) based on user input.
Thus the thing that's "enough", is education. Understand how and why the threat works; once you grok that, you'll be able to take the appropriate actions to a given situation (which sometimes is just using a PreparedStatement, but not always). I'm not aware of any particularly good resources on SQL injection though (above and beyond what you can get from Google), so hopefully other answers can point you to the One True Tutorial!

Simply never craft your SQL manually by concatenating Strings; always use PreparedStatement and parameterize it with ? placeholders. The JDBC driver will take care of escaping, so you don't have to do it yourself.
On the other hand, escaping is hard. You would be surprised how many ways there are to work around your escaping algorithms. The JDBC driver will do the job properly.

Although Prepared Statements help in defending against SQL injection, SQL injection attacks are still possible through inappropriate usage of Prepared Statements. The example below shows such a scenario, where the input variable is concatenated directly into the Prepared Statement's SQL text, thereby paving the way for SQL injection attacks.
Example:
String strUserName = request.getParameter("Txt_UserName");
// Vulnerable: the input becomes part of the SQL text instead of being bound to a ? placeholder
PreparedStatement prepStmt = con.prepareStatement("SELECT * FROM user WHERE userId = '" + strUserName + "'");
More information on preventing SQL injections here.
OWASP is a great place to start for anything security related to software development.
They have java libraries which you can use to prevent XSS and SQL injections.
They also have a deliberately insecure webapp which you can try to hack, and thereby learn how not to do it.

Prepared statements can be enough. When using prepared statements, you still have to take care to build the statements with ? placeholders only. In other words, it's possible to use prepared statements the wrong way. You do not have to filter out any parameters to avoid SQL injection. Nevertheless, you may need to filter out certain values to avoid web-based attacks (like XSS), depending on your environment and scope.
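The difference between the right and the wrong way can be sketched as follows. The table and column names are borrowed from the example in the previous answer; the class and method names are made up for illustration, and unsafeSql exists only to show what concatenation produces.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

final class InjectionContrastSketch {
    // Right way: the SQL text is a constant and the value travels separately.
    static final String SAFE_SQL = "SELECT * FROM user WHERE userId = ?";

    // Wrong way, shown only to illustrate the problem: the input becomes part of the SQL text.
    static String unsafeSql(String userName) {
        return "SELECT * FROM user WHERE userId = '" + userName + "'";
    }

    // Binds the user-supplied value to the placeholder and reports whether a row matched.
    static boolean userExists(Connection conn, String userName) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(SAFE_SQL)) {
            ps.setString(1, userName);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next();
            }
        }
    }
}
```

With the input x' OR '1'='1, unsafeSql yields a WHERE clause that matches every row, whereas userExists sends the same text to the database as a plain value to compare against userId.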
