I have just discovered RowSets for database querying with JDBC. They are stateless and cacheable, and they look superior to ResultSets.
Can PreparedStatements be used with them, though? PreparedStatements are a performance booster for querying very large databases, and not something I would want to give up (before anyone says it: this is not premature optimization, we have a proven speed need!). What I need here is the fastest possible way to get query results into a set; caching is secondary.
The default implementation of RowSet uses prepared statements internally.
I would have been surprised if that was not the case.
See the JdbcRowSetImpl code yourself: http://www.google.com/codesearch/p?hl=en#TTY8xLpnKOE/src/share/classes/com/sun/rowset/JdbcRowSetImpl.java&q=JDBCRowSetImpl
You will want to look at the prepare() method.
Note: poking around the code is why I love Open Source :D
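A minimal sketch of a parameterized query through a `JdbcRowSet`, obtained from the standard `RowSetProvider` factory (Java 7+). The command uses `?` placeholders and `setString` just as a PreparedStatement would; per the answer above, the default implementation prepares that command when `execute()` runs. The class name `RowSetQuery`, the `customer` table and its columns, and any URL or credentials you pass in are hypothetical.

```java
import java.sql.SQLException;
import javax.sql.rowset.JdbcRowSet;
import javax.sql.rowset.RowSetProvider;

final class RowSetQuery {
    // Hypothetical table and columns; "?" is a bind parameter, as in a PreparedStatement.
    static final String BY_CITY_SQL = "SELECT id, name FROM customer WHERE city = ?";

    /** Configures, but does not execute, a JdbcRowSet with one bound parameter. */
    static JdbcRowSet customersByCity(String url, String user, String password, String city)
            throws SQLException {
        JdbcRowSet rowSet = RowSetProvider.newFactory().createJdbcRowSet();
        rowSet.setUrl(url);           // connection details only; nothing is opened yet
        rowSet.setUsername(user);
        rowSet.setPassword(password);
        rowSet.setCommand(BY_CITY_SQL);
        rowSet.setString(1, city);    // same call shape as PreparedStatement.setString
        return rowSet;
    }

    /** Runs the query and prints each row; execute() connects and runs the command. */
    static void printAll(JdbcRowSet rowSet) throws SQLException {
        try (JdbcRowSet rs = rowSet) {
            rs.execute();
            while (rs.next()) {
                System.out.println(rs.getLong("id") + " " + rs.getString("name"));
            }
        }
    }
}
```

If you want a disconnected result instead, the same `setCommand`/`setXxx` calls are available on a `CachedRowSet` from the same factory.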
I am a fan of ORM (Object Relational Mapping), and I have been using it with Rails for the past year and a half. Before that, I used to write raw queries using JDBC and make the database do the heavy lifting via stored procedures. With ORM, I was initially happy to do things like coach.manager and manager.coaches, which were very simple and easy to read.
But as time went by, innumerable associations crept in, and I ended up writing a.b.c.d, which fired queries in all directions behind the scenes. With Rails and Ruby, the garbage collector went nuts and took an insane amount of time to load a very complex page that involves relatively little data. I had to replace this ORM-style code with a simple stored procedure, and the improvement was enormous: a page that took 50 seconds to load now takes only 2 seconds.
With such a huge difference, should I continue using ORM? It clearly has severe overhead compared to a raw query.
In general, what are the pitfalls of using an ORM framework like Hibernate or ActiveRecord?
An ORM is only a tool. If you don't use it correctly, you'll have bad results.
Nothing stops you from using dedicated HQL/criteria queries, with fetch joins or projections, to return the information that your page must display in as few queries as possible. This will take more or less the same time as dedicated SQL queries.
But of course, if you just get everything by ID and navigate through your objects without realizing how many queries it generates, it will lead to long loading times. The key is to know exactly what the ORM does behind the scenes, and decide if it's appropriate or if another strategy must be adopted.
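Conceptually, a fetch join returns the parent and its children in one flattened result, and the framework then groups the rows back into objects. The sketch below does that grouping by hand, assuming a hypothetical `manager LEFT JOIN coach` query; the class `FetchJoinSketch`, the `JoinRow` type and the table names are illustrative, not Hibernate's API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class FetchJoinSketch {
    /** One flattened row from a hypothetical
     *  "SELECT m.id, c.name FROM manager m LEFT JOIN coach c ON c.manager_id = m.id" query. */
    static final class JoinRow {
        final long managerId;
        final String coachName; // null when a manager has no coaches (LEFT JOIN)

        JoinRow(long managerId, String coachName) {
            this.managerId = managerId;
            this.coachName = coachName;
        }
    }

    /** Rebuilds manager id -> coach names from a single result, instead of one query per manager. */
    static Map<Long, List<String>> coachesByManager(List<JoinRow> rows) {
        Map<Long, List<String>> result = new LinkedHashMap<>(); // keeps the query's row order
        for (JoinRow row : rows) {
            List<String> coaches = result.computeIfAbsent(row.managerId, id -> new ArrayList<>());
            if (row.coachName != null) {
                coaches.add(row.coachName);
            }
        }
        return result;
    }
}
```

One round trip replaces the per-object queries that naive navigation would issue; the trade-off is that parent columns are repeated on every child row.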
I think you've already identified the major tradeoff associated with ORM software. Every time you add a new layer of abstraction that tries to provide a generalized implementation of something that you used to do by hand there is going to be some loss of performance/efficiency.
As you noted, traversing multiple relationships such as a.b.c.d can be inefficient, because most ORM software will be doing an independent database query for each dot along the way. But I'm not sure that means you should eliminate ORM altogether. Most ORM solutions (or at least, certainly Hibernate) allow you to specify custom queries where you can bring back exactly what you want in a single database operation. This should be about as fast as your dedicated SQL.
Really the issue is about understanding how the ORM layer is working behind the scenes, and realizing that while something like a.b.c.d is simple to write, what it causes the ORM layer to do as it is evaluated is not. As a general rule I always go with the simplest possible approach to begin, and then write optimized queries in areas where it makes sense/where it is obvious that the simple approach will not scale.
I'd say, one should use the appropriate tool for different tasks.
E.g., for CRUD operations, ORM frameworks like Hibernate can speed up development and perform well enough. Sometimes you need some tweaks to achieve acceptable performance. Since you did not provide the details, I can't tell whether your task (the one that took 50 seconds with the ORM) really could not be done properly with Hibernate.
On the other hand, bulk operations involving hundreds of thousands of records are not the type of task you'd expect Hibernate to handle without a significant performance penalty.
As already mentioned, an ORM is only a tool, and you can use it well or badly.
One of the most typical performance problems in ORMs is the 1+N queries problem: one query loads a list, and then an additional query is issued for each object in that list to load its related objects. This is often caused by eager fetching of 1-to-n related entities for each element of the list. Remedies include using HQL queries, specifying the needed fields in a projection, or marking the 1-to-n relations as lazily fetched.
At all times, you must know exactly what the ORM is doing in order to achieve good performance. Not understanding which operations are performed in the background is a recipe for disaster (slow, buggy and hard-to-analyze code full of unnecessary and badly written workarounds).
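Besides the remedies listed above, a related technique (not named in the answer) is to load the children of many parents with one `IN (...)` query, which is roughly what batch fetching does. A minimal sketch that builds such a query, assuming hypothetical `coach`/`manager_id` names; `InListQuery` is illustrative. Some databases cap the list length (Oracle, for instance, allows at most 1000 expressions in an IN list), so large id sets need chunking.

```java
import java.util.Collections;

final class InListQuery {
    /** Returns "?, ?, ..., ?" with n placeholders for use inside IN (...). */
    static String placeholders(int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("an IN list needs at least one parameter");
        }
        return String.join(", ", Collections.nCopies(n, "?"));
    }

    /**
     * One query for the coaches of n managers, instead of n separate queries.
     * Table and column names are hypothetical; bind the ids with setLong(1..n).
     */
    static String coachesForManagers(int managerCount) {
        return "SELECT manager_id, name FROM coach WHERE manager_id IN ("
                + placeholders(managerCount) + ")";
    }
}
```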
I agree with Petar in your comments regarding lazy fetching. Say you have an HTML table filled with fields from object a.b.c.d. You could find your framework round-tripping to the database thousands of times (possibly many more). The disadvantage of ORM in this case is that you have to read the documentation thoroughly. Most frameworks support disabling lazy fetching, and many even support adding your own processing logic to bind the data set.
The bottom line is that almost any ORM is almost certainly better than anything you are going to write yourself. Otherwise you will find yourself saddled with maintaining huge libraries of boilerplate or, worse, writing the same code over and over again.
We are currently investigating a switch from our own data store layer, with its clean separation of transfer objects and data access objects, to JPA. We used a generator to create the TOs, the DAOs and the SQL DDL from documentation in DocBook format. As a result, our documentation, the database structure and the generated Java classes were always in sync, and the database itself was well documented.
What we discovered so far by using JPA:
Foreign key columns cannot be used directly for imports, some special queries and so on, because they must not be placed in a managed entity. JPA only allows the target class there.
Access to a user session scope is difficult, if not impossible. We still have no idea how to get the user's ID into the column 'userWhoLastMadeAnUpdate' in a PrePersist method.
Something expected to be quite easy with an ORM, namely "class mapping", does not work at all. We use HalDateTime (http://sourceforge.net/projects/haldatetime/) internally, especially in the client. Mapping it directly with JPA is not possible, although HalDateTime supports it. Due to JPA restrictions, we have to use two fields in the entity.
JPA can use an XML file to describe the mapping, so you have to look at at least two files to understand the relationship between the Java class and the database. And the XML file becomes huge for large applications.
Alternatively, ORMs provide annotations in the Java class itself. This makes it easier to learn and understand the relationship, but it forces you to see all that database stuff in the client layer (which completely breaks proper layering).
You will have to restrict yourself to a database structure that is as clean as possible. Otherwise you will certainly end up with a mess of queries and statements generated by the ORM.
Use an ORM that provides a query language close to SQL itself (JPA seems quite acceptable here). An ORM-specific query language makes supporting a large application really expensive.
Is there a Java library that can create SQL statements?
I'm not looking for anything fancy, just something at the "string manipulation" level: I use plain JDBC (with PreparedStatements and ResultSets), but I don't really like passing around huge strings containing SQL code...
What I need is a "simple" Select class (or something similar); in my mind, all I really want is to be able to do:
SQLStatement stat = Select("*").from("table").where("condition and condition").orderby("something");
ResultSet rs = connection.createStatement().executeQuery(stat.toString());
/* equals to "select * from table where condition and condition order by something" */
Maybe I'm blind, but I cannot find something like that...
Obviously, I also want methods/classes able to write inserts, updates and the other statements...
I excluded ORMs for two reasons:
the DB schema is "old" and I cannot change it, and I'm not sure how I could adapt an ORM to follow our DB
AFAIK ORMs require changes to the model (maybe adding a base class, maybe implementing an interface), and the model in my project is big, old and grumpy
Honestly, I don't really like ORMs: objects and set theory just aren't made to be mapped onto each other (IMHO)
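For a sense of how little code the "string manipulation level" builder in the question needs, here is a minimal sketch. `Select` and its methods are hypothetical (not an existing library), and the method is spelled `orderBy` rather than `orderby`. It only concatenates fragments, so conditions should carry `?` placeholders whose values you bind on a PreparedStatement, never raw user input.

```java
// Hypothetical fluent builder; not an existing library class.
final class Select {
    private final String columns;
    private String table;
    private String where;
    private String orderBy;

    private Select(String columns) {
        this.columns = columns;
    }

    /** Entry point, e.g. Select.select("*"). */
    static Select select(String columns) {
        return new Select(columns);
    }

    Select from(String table) {
        this.table = table;
        return this;
    }

    /** Use ? placeholders here and bind the values on a PreparedStatement. */
    Select where(String condition) {
        this.where = condition;
        return this;
    }

    Select orderBy(String columns) {
        this.orderBy = columns;
        return this;
    }

    @Override
    public String toString() {
        if (table == null) {
            throw new IllegalStateException("from(...) is required");
        }
        StringBuilder sql = new StringBuilder("select ").append(columns).append(" from ").append(table);
        if (where != null) {
            sql.append(" where ").append(where);
        }
        if (orderBy != null) {
            sql.append(" order by ").append(orderBy);
        }
        return sql.toString();
    }
}
```

The resulting string would typically go to `connection.prepareStatement(...)`, followed by the `setXxx` calls for each placeholder.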
An ORM (Object Relational Mapping) library is the key.
Hibernate is the most mature one.
Hibernate's Criteria API is an object-oriented way to create queries like the ones you want. See the Criteria API documentation.
Hibernate is most likely what you're looking for. It contains many advanced features, but SQL statements are more straightforward.
Take a look at their site: http://www.hibernate.org/
I'd also recommend skimming through this guide:
https://www.owasp.org/index.php/Preventing_SQL_Injection_in_Java
Try the SQLBuilder project. Honestly, I have not used it, but looking at its docs, I think it might suit your requirements.
You can also try to find similar APIs on SourceForge, Google Code, etc.
I am not sure if you use Java for a native application or for the web.
If you use Java for web you could consider using the Play framework.
It's easy and includes Hibernate with a really simple setup (easier than integrating Hibernate yourself).
I want to avoid SQL Injections in my Webapp.
It's Java based.
Are PreparedStatements enough?
Do I have to filter out ' and "? Are there existing solutions for this in Java?
My gut response to the question in your second paragraph is that it's usually a bad idea to consider a single aspect "enough" for this sort of issue - at least if you do this to the point that you stop thinking about the principles involved.
Using PreparedStatements does go a long way to stopping SQL injection, just like slapping synchronized everywhere goes a long way to stopping data races. And in many individual situations they'll be entirely sufficient. But in both cases they're not magic bullets; you need to be aware of the reasons you're using them, and when and where they're insufficient. For example, if you think PreparedStatements are a magic wrapper that prevents SQL injection, you'll be very disappointed the first time you need to create a dynamic statement (as opposed to merely a parameterised one) based on user input.
Thus the thing that's "enough", is education. Understand how and why the threat works; once you grok that, you'll be able to take the appropriate actions to a given situation (which sometimes is just using a PreparedStatement, but not always). I'm not aware of any particularly good resources on SQL injection though (above and beyond what you can get from Google), so hopefully other answers can point you to the One True Tutorial!
Simply never craft your SQL manually by concatenating Strings; always use PreparedStatement and parameterize it with ? placeholders. The JDBC driver will take care of escaping, so you don't have to do it yourself.
Escaping, on the other hand, is hard. You would be surprised how many ways there are to work around your escaping algorithms. The JDBC driver will do the job properly.
Although Prepared Statements help in defending against SQL injection, SQL injection attacks are still possible through inappropriate use of Prepared Statements. The example below shows such a scenario: the input variable is concatenated directly into the SQL text passed to the Prepared Statement, thereby paving the way for SQL injection attacks.
Example:
String strUserName = request.getParameter("Txt_UserName");
PreparedStatement prepStmt = con.prepareStatement("SELECT * FROM user WHERE userId = '" + strUserName + "'"); // vulnerable: the input becomes part of the SQL text
More information on preventing SQL injections here.
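For contrast, a sketch of the safe form of the vulnerable lookup above: the SQL text is a constant containing only a `?`, and the user name is bound with `setString`, so it is sent as a value and never parsed as SQL. The class and method names (`UserLookup`, `userExists`) are hypothetical; the table and column come from the example.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

final class UserLookup {
    // The SQL text is a constant; user input never becomes part of it.
    static final String FIND_USER_SQL = "SELECT * FROM user WHERE userId = ?";

    /** Returns true if a row exists for the given user name. */
    static boolean userExists(Connection con, String userName) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(FIND_USER_SQL)) {
            ps.setString(1, userName); // bound as a parameter value, not spliced into the SQL
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next();
            }
        }
    }
}
```

An input such as `x' OR '1'='1` is then just a strange user name to look up rather than extra SQL.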
OWASP is a great place to start for anything security-related in software development.
They have Java libraries that you can use to prevent XSS and SQL injection.
They also have a very insecure web app that you can try to hack, and thereby learn how not to do it.
Prepared statements can be enough. When using prepared statements, you still have to take care to build the statements with ? placeholders only. In other words, it's possible to use prepared statements the wrong way. You do not have to filter out any parameters to avoid SQL injection. Nevertheless, you may need to filter out certain values to avoid web-based attacks (like XSS), depending on your environment and scope.
I have recently begun using prepared statements again in a web application, and I know that using prepared statements for all transactions is discouraged. What I do not know is when it is best to use prepared statements and when not.
I have read about when to use them and when not to, but none of the examples really describe best practice.
I am trying to figure out which database calls I should be using them for and which ones I should not.
For example, the MySQL website discusses this under "When to use prepared statements" on the following page: Prepared Statements-MySQL
The general rule of thumb in deciding whether or not to use a PreparedStatement is:
Use Prepared Statements, unless you have sufficient reason not to.
Prepared Statements are compiled before execution, which leads to better performance, and they increase security against SQL injection because the database server takes care of encoding special characters.
Going by the article you referenced, these are the cases where I believe Prepared Statements are less useful than normal queries or stored procedures:
One-time queries. If your application makes a single query to the database, and this is done infrequently compared to the other queries, it might not make sense to use a Prepared Statement in this case. The rationale is that the Prepared Statement must first be compiled and the 'compiled' form of the statement is cached for later use. For queries that are run infrequently, the compilation is an overhead. But still, it is preferable to use prepared statements, to avoid any SQL injection issues.
Data-intensive operations. Sometimes Prepared Statements are not as effective as stored procedures, especially when a sequence of operations needs to be performed in the same transaction. When you have a business process that requires multiple selects, updates and deletes to be executed against a variety of tables, stored procedures are often better than a bunch of prepared statements executed one after the other. The performance penalty can become serious because a separate network round trip is made for each statement, a cost that is considerably reduced when invoking a stored procedure. This effect is more pronounced in query batching, where several objects are created and destroyed in a short period of time. This often tends to be a contentious issue between database administrators and application developers, as it is an edge case: DBAs believe that batched operations are better performed via SPs, while application developers believe that PreparedStatements can handle them (it's usually better to have all logic in one tier). In the end, whether using SPs is an advantage depends on the application.
Support for native database operations and types. This might not hold for MySQL, but in general the JDBC standard does not support all the operations supported by a database, nor all the SQL/native/custom types supported by the database. This is more pronounced in the Oracle database (and possibly IBM DB2?), where programmers can create their own types, which require custom Java code to be written as the JDBC standard does not support User-Defined Types in the database. Similarly, some other database operations are not supported (as the MySQL document states): one cannot create users (execute CREATE USER), modify user privileges (perform GRANT operations), etc. using a Prepared Statement. Stored procedures are better suited to this task, as they have access to the native operation set of the database, either directly or indirectly.
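When a multi-step process does live in a stored procedure, JDBC calls it through a `CallableStatement` using the standard `{call ...}` escape syntax, so the whole sequence costs one round trip. A minimal sketch, assuming a hypothetical procedure `close_order` with one IN parameter and one INTEGER OUT parameter; `ProcedureCall` and `closeOrder` are illustrative names.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Types;

final class ProcedureCall {
    // JDBC escape syntax; the driver translates it to the database's native call form.
    // "close_order" and its parameter list are hypothetical.
    static final String CLOSE_ORDER_CALL = "{call close_order(?, ?)}";

    /** Runs the multi-statement procedure in one round trip and returns its OUT value. */
    static int closeOrder(Connection con, long orderId) throws SQLException {
        try (CallableStatement cs = con.prepareCall(CLOSE_ORDER_CALL)) {
            cs.setLong(1, orderId);                    // IN parameter
            cs.registerOutParameter(2, Types.INTEGER); // OUT parameter declared before execute()
            cs.execute();
            return cs.getInt(2);
        }
    }
}
```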
In order to prevent SQL injection, it is better to use prepared statements in Java.
For more information: SQL injections with prepared statements?
PreparedStatements have two major uses:
Preventing SQL injection attacks. This basically means automated sanitizing of inputs from external sources (a web browser is external!) that are going to be saved to the database.
Batch processing. If you have a lot of data to insert into, modify in, or remove from the database at once, PreparedStatement can be used for that. In this case, PreparedStatement optimizes away most of the overhead of such operations and allows you to write fast database batch code.
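A minimal sketch of the batch-processing use: one PreparedStatement is reused for every row, rows are queued with `addBatch()`, and `executeBatch()` sends them in groups. The `event_log` table, the `BatchInsert` class and the batch size of 500 are assumptions; the right flush size depends on the driver and data.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Map;

final class BatchInsert {
    // Hypothetical table and columns.
    static final String INSERT_SQL = "INSERT INTO event_log (id, message) VALUES (?, ?)";
    // Flush size is a tuning choice, not a JDBC constant.
    static final int BATCH_SIZE = 500;

    /** Inserts every row through one statement; returns how many executeBatch() calls were made. */
    static int insertAll(Connection con, Map<Long, String> rows) throws SQLException {
        int flushes = 0;
        int pending = 0;
        try (PreparedStatement ps = con.prepareStatement(INSERT_SQL)) { // prepared once, reused per row
            for (Map.Entry<Long, String> row : rows.entrySet()) {
                ps.setLong(1, row.getKey());
                ps.setString(2, row.getValue());
                ps.addBatch(); // queue this row; nothing is sent yet
                if (++pending == BATCH_SIZE) {
                    ps.executeBatch(); // one round trip for BATCH_SIZE rows
                    flushes++;
                    pending = 0;
                }
            }
            if (pending > 0) {
                ps.executeBatch(); // send the remainder
                flushes++;
            }
        }
        return flushes;
    }
}
```

In practice this is often wrapped in a transaction (`setAutoCommit(false)` followed by `commit()`) so the driver does not commit after every batch.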
Both of these are very compelling reasons to use PreparedStatement almost always; however, depending on how you're using the database, you may hit a point where PreparedStatement won't allow you to do what you want.
As an example of such a case, I once wrote a tool that generated table names on the fly based on runtime properties of certain abstractions, which meant that I needed SQL queries with mutable table names. You can't get those with a PreparedStatement, so I had to use raw Statements and some preprocessing trickery to get back to using PreparedStatements for SQL injection protection.
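Because identifiers such as table names cannot be bound as `?` parameters, one common approach is to validate the dynamic part strictly before splicing it in, while keeping every value a bound parameter. A minimal sketch under that assumption; `DynamicTableQuery`, the `region` column and the identifier pattern are illustrative, and where the set of legal names is known in advance, an explicit allow-list is stricter still.

```java
import java.util.regex.Pattern;

final class DynamicTableQuery {
    // Conservative identifier shape: a letter or underscore, then letters, digits or
    // underscores, at most 64 characters. Anything else is rejected, not escaped.
    private static final Pattern SAFE_IDENTIFIER = Pattern.compile("[A-Za-z_][A-Za-z0-9_]{0,63}");

    /**
     * Builds SQL for a table whose name is only known at runtime. The name is
     * validated before being spliced in; the WHERE value stays a ? parameter to be
     * bound on a PreparedStatement. The "region" column is hypothetical.
     */
    static String countByRegionSql(String tableName) {
        if (tableName == null || !SAFE_IDENTIFIER.matcher(tableName).matches()) {
            throw new IllegalArgumentException("Unsafe table name: " + tableName);
        }
        return "SELECT COUNT(*) FROM " + tableName + " WHERE region = ?";
    }
}
```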
We are currently evaluating options for migrating from a hand-written persistence layer to an ORM.
We have a bunch of legacy persistent objects (~200) that implement a simple interface like this:
interface JDBC {
public long getId();
public void setId(long id);
public void retrieve();
public void setDataSource(DataSource ds);
}
When retrieve() is called, the object populates itself by issuing handwritten SQL queries to the connection provided, using the ID it received in the setter (this is usually the only parameter to the query). It manages its statements, result sets, etc. itself. Some of the objects have special flavors of the retrieve() method, like retrieveByName(); in this case a different SQL query is issued.
Queries could be quite complex, we often join several tables to populate the sets representing relations to other objects, sometimes join queries are issued on-demand in the specific getter (lazy loading). So basically, we have implemented most of the ORM's functionality manually.
The reason for that was performance. We have very strong requirements for speed, and back in 2005 (when this code was written) performance tests showed that none of the mainstream ORMs was as fast as hand-written SQL.
The problems we are facing now that make us think of ORM are:
Most of the paths in this code are well-tested and are stable. However, some rarely-used code is prone to result set and connection leaks that are very hard to detect
We are currently squeezing some additional performance by adding caching to our persistence layer and it's a huge pain to maintain the cached objects manually in this setup
Maintaining this code when the DB schema changes is a big problem.
I am looking for advice on what the best alternative for us could be. As far as I know, ORMs have advanced in the last 5 years, so there might now be one that offers acceptable performance. As I see it, we need to address these points:
Find some way to reuse at least some of the written SQL to express mappings
Have the possibility to issue native SQL queries without the necessity to manually decompose their results (i.e. avoid manual rs.getInt(42) as they are very sensitive to schema changes)
Add a non-intrusive caching layer
Keep the performance figures.
Is there any ORM framework you could recommend with regards to that?
UPDATE: To give a feel for the kind of performance figures we are talking about:
The backend database is TimesTen, an in-memory database that runs on the same machine as the JVM.
We found that changing rs.getInt("column1") to rs.getInt(42) brings a performance increase we consider significant.
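One standard JDBC middle ground between `rs.getInt("column1")` and `rs.getInt(42)` is `ResultSet.findColumn`, which resolves a column label to its index: call it once before the loop and read by index inside it. Code stays tied to names (so it survives column reordering) while the per-row lookup disappears. Whether this recovers all of the measured gain depends on the driver; the `amount` column and the `ColumnIndexCache` class are hypothetical.

```java
import java.sql.ResultSet;
import java.sql.SQLException;

final class ColumnIndexCache {
    /** Sums a hypothetical "amount" column, resolving its name to an index only once. */
    static long sumAmounts(ResultSet rs) throws SQLException {
        int amountIdx = rs.findColumn("amount"); // name lookup happens once per result set
        long total = 0;
        while (rs.next()) {
            total += rs.getInt(amountIdx); // positional access in the hot loop
        }
        return total;
    }
}
```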
If you want a standard persistence layer that lets you issue native SQL queries, consider using iBATIS. It's a fairly thin mapping between your objects and SQL. http://ibatis.apache.org/
For caching and lazy joins, Hibernate might be a better choice. I haven't used iBATIS for these purposes.
Hibernate provides a lot of flexibility in allowing you to specify certain defaults for lazy loading as you traverse your object graph, yet also pre-fetch data with SQL or HQL queries to your heart's content when you need more predictable load times. However, the conversion effort will be complicated for you, as it has a fairly high barrier to entry in terms of learning and configuration. Annotations made this easier for me.
Two benefits you didn't mention about switching to a standard framework:
(1) running down bugs becomes easier when you have a wealth of sites and forums out there to support you.
(2) new hires are cheaper, easier and faster.
Good luck in addressing your performance and usability issues. The tradeoffs you point out are very common. Sorry if I evangelized.
For the bulk of your queries, I'd go with Hibernate. It's widely used, well documented and generally performant. You can drop down to hand-written SQL if Hibernate isn't producing efficient enough queries. Hibernate gives you a lot of control in specifying the table names and columns that the domain objects map to, and in most cases you can retrofit it to an existing schema.
Find some way to reuse at least some of the written SQL to express mappings
The mappings are expressed in JPA using annotations. You can use the existing SQL as a guide when creating JPQL queries.
Add a non-intrusive caching layer
Caching in hibernate is automatic and transparent, unless you specifically choose to get involved. You can mark entities as read only, or evict from the cache, control when changes are flushed to the database (inside a transaction of course - automatic use of batching improves performance when network latency is a concern.)
Have the possibility to issue native SQL queries without the necessity to manually decompose their results (i.e. avoid manual rs.getInt(42) as they are very sensitive to schema changes)
Hibernate allows you to write SQL and have it mapped to your entities. You don't deal with the ResultSet directly; Hibernate takes care of decomposing it into your entity. See Chapter 16, "Native SQL", in the Hibernate manual.
Support of this code when DB schema changes is a big problem.
Managing schema changes can still be a pain, since you now effectively have two schemata: the database schema and the JPA mapping (an object schema). If you choose to let Hibernate generate the DB schema and move your data to that, you are no longer directly responsible for what goes into the database, and so you are then faced with managing automatic changes to a machine-generated schema. There are tools that can assist, such as dbmigrate and Liquibase, but it's no walk in the park. Conversely, if you are managing the DB schema by hand, then you will have to carefully recraft your entities, JPA annotations and queries to accommodate the schema changes. Adding columns and new entities is relatively trivial, but more complex changes, such as changing a single property to a collection of properties or restructuring an object hierarchy, will involve considerably more extensive changes. There is no easy way out of this: either the DB or Hibernate is the "master" that decides the schema, and when one changes, the other must follow. The code changes aren't so bad; in my experience, it's migrating the data that's difficult. But this is a basic issue with databases, and it will be present in any solution you choose.
So, to sum up, I'd go with hibernate, and use the JPA interface.
I've recently drilled through a bunch of Java ORMs and didn't come up with anything much better than Hibernate. Hibernate's performance may get you there and satisfy your performance goals.
Lots of people think that moving to Hibernate will make everything so awesome, but it's really just moving a set of problems from JDBC queries into Hibernate tuning. Read a bunch of books or (better) hire a "Hibernate guy" to come in and help.
During your refactor, I'd recommend using JPA so you can unplug and re-plug a new persistence provider when the Next Big Thing comes along (or you move to Oracle).
Do you really need to migrate? What's forcing you to move? Is there some REAL need here or someone just inventing work (an 'Astronaut architect')?
I agree with the above answers though - if you HAVE to move - Hibernate or iBatis are good choices. iBatis especially if you want to stay 'closer' to the SQL.
If you need more performance: drop the database (for online work) and handle the persistence directly. Adding caching is not going to help you with a TimesTen DB; it just adds an extra copy (slowing you down).
You might want to take a look at GemFire.
There is a lot of good advice already in here that I won't repeat. The only thing I didn't see suggested that might work for you is caching reference data in memory.
I have done quite a bit of this in the past and it does save a lot of time. If you have a large number of fairly static reference tables, load them all into memory at startup time and refresh them every couple of minutes. That way you're not hitting the DB over and over again for data that never changes.
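A minimal sketch of that idea: load a reference table into an immutable map at startup, serve lookups from memory, and periodically swap in a freshly loaded map. `ReferenceDataCache` is a hypothetical class, and the loader (for example, a query like `SELECT code, label FROM country`) is an assumption you would supply.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

final class ReferenceDataCache<K, V> {
    // Loads the whole reference table, e.g. by running a hypothetical
    // "SELECT code, label FROM country" query and collecting the rows into a map.
    private final Supplier<Map<K, V>> loader;
    // Replaced wholesale on refresh, so readers never see a half-loaded map.
    private volatile Map<K, V> snapshot;

    ReferenceDataCache(Supplier<Map<K, V>> loader) {
        this.loader = loader;
        refresh(); // load everything once at startup
    }

    /** Reloads all rows and swaps in the new map. */
    void refresh() {
        snapshot = Collections.unmodifiableMap(new HashMap<>(loader.get()));
    }

    /** Served from memory; no database round trip. */
    V get(K key) {
        return snapshot.get(key);
    }

    /** Refreshes every period; the caller shuts the returned executor down on exit. */
    ScheduledExecutorService startRefreshing(long period, TimeUnit unit) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                refresh();
            } catch (RuntimeException e) {
                // Keep serving the previous snapshot; an uncaught exception here
                // would cancel all future refreshes.
            }
        }, period, period, unit);
        return scheduler;
    }
}
```

Swapping a whole new map, rather than updating entries in place, keeps readers lock-free; the cost is briefly holding two copies of the table in memory during a refresh.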