I have a performance issue with ResultSet.next() while retrieving data from an SQLite database. I am using the org.sqlite.JDBC driver. I tried setting the fetch size via Statement/ResultSet setFetchSize(), but that did not help either.
The operation takes around 25 seconds when I use an SQL statement with a string parameter in the WHERE clause (with a non-string parameter in the WHERE clause it takes just 4 seconds).
I found that with some drivers it is possible to set a "send string parameters as Unicode = false" option in the connection URL, which should improve performance. However, I was not able to find a way to set this up for an SQLite connection.
Can someone help me with this issue?
In SQLite, result rows are computed on demand. (SQLite is not a client/server database, so there is no communication overhead.)
So when next() is slow, this just means that your query is slow, and that the database has to do a lot of I/O and/or computation before it can return the next row.
To speed it up, you have to improve the database design, or the query, or the hardware.
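For example, if the string column in your WHERE clause has no index, every ResultSet.next() may be paying for a full table scan. A minimal sketch of diagnosing and fixing that (the database file, table, and column names here are made up):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SqliteScanCheck {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:sqlite:data.db");
             Statement stmt = conn.createStatement()) {
            // Ask SQLite how it will run the slow query; "SCAN" in the output
            // (rather than "SEARCH ... USING INDEX") means a full table scan.
            try (ResultSet rs = stmt.executeQuery(
                    "EXPLAIN QUERY PLAN SELECT * FROM items WHERE name = 'foo'")) {
                while (rs.next()) {
                    System.out.println(rs.getString("detail"));
                }
            }
            // An index on the filtered column lets next() locate rows directly.
            stmt.executeUpdate(
                    "CREATE INDEX IF NOT EXISTS idx_items_name ON items(name)");
        }
    }
}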
Related
I am using Sybase ASE with a table in which I save results calculated in Java. The table has 10 columns: one column of type INT (a plain value, not an ID column) and nine columns of type VARCHAR(50).
There is no index or trigger on this table (in fact the table is completely independent). I need to insert around 160K rows into it. I tried to split the work into batches of 10,000 insertions each, using two different approaches: Spring's JdbcTemplate.batchUpdate and the native JDBC PreparedStatement.executeBatch API.
However, there was no clear winner in terms of performance: both take around 25 to 30 seconds per 10K insertions.
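For reference, the plain-JDBC variant was essentially the following (a simplified sketch; the table and column names are placeholders for the real ones):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Sketch of the PreparedStatement.executeBatch approach; "results" and its
// columns stand in for the real table, rows holds the calculated data.
static void insertInBatches(Connection conn, List<Object[]> rows) throws SQLException {
    String sql = "INSERT INTO results (num_col, c1, c2, c3, c4, c5, c6, c7, c8, c9) "
               + "VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)";
    conn.setAutoCommit(false);
    try (PreparedStatement ps = conn.prepareStatement(sql)) {
        int count = 0;
        for (Object[] row : rows) {
            ps.setInt(1, (Integer) row[0]);           // the INT column
            for (int i = 1; i <= 9; i++) {            // the nine VARCHAR(50) columns
                ps.setString(i + 1, (String) row[i]);
            }
            ps.addBatch();
            if (++count % 10_000 == 0) {              // one batch per 10K rows
                ps.executeBatch();
                conn.commit();
            }
        }
        ps.executeBatch();                            // flush the remainder
        conn.commit();
    }
}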
Then I thought it could be related to the JDBC driver, so I tried two different drivers, jConnect and jTDS. There was no real impact on insertion performance.
Finally I decided to compare Sybase with another database, PostgreSQL, in my test. I kept the same Java code, and surprisingly PostgreSQL takes only 0.3 seconds for every 10K insertions, while Sybase took 25 to 30 seconds (75 to 100 times longer).
The DBA support team explains the difference by PostgreSQL being installed on my local machine, while Sybase is installed on our enterprise server. However, I am not at all convinced by this explanation.
Does anyone know of a configuration setting in Sybase that could considerably impact insertion speed? Or are there other possible causes for the scenario above?
The delay you see on the Sybase end is caused by many factors that need to be checked, and comparing it against a different database, one running on a local machine at that, is not a fair comparison.
To start, check the network latency and the storage used by the Sybase database. Check the Sybase server configuration, the page size, and the locking scheme of the table you are inserting into. Also do a basic health check of the server while you are inserting the data. Since you used two different ways to insert the data, it is also worth verifying that both are up to date with respect to the Sybase client installed on your system.
To sum up, it may be something as simple as blocking on the Sybase instance, or it could be related to storage that cannot keep up with the writes. Given a properly configured Sybase server, the insert performance should be very good.
Whether the DB server is local or not may indeed make a significant difference. Until you rule out this factor, comparison with a local DB makes little sense.
But that aside, there are many aspects that affect insert performance in ASE. First off, make sure the overall memory configuration (e.g. data cache and procedure cache) is not too small -- leaving it at the installation defaults is a guarantee for disappointing results. Then there is the network packet size, which can play a role. And the batch size (number of rows before you commit). And the table's lock scheme.
Trying to use minimally logged inserts will help (this requires configuration changes), especially since the table has no indexes (and no UNIQUE or PK constraints either?).
The ASE server page size (which you choose when you create the server) also makes a difference: bigger is basically better for inserts.
Set the ENABLE_BULK_LOAD parameter to true. It will speed up the inserts.
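With jConnect this is a connection property, so it would be set when opening the connection, roughly like this (a sketch; host, port, database, and credentials are placeholders, and the exact property values accepted depend on the jConnect version):

import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

public class BulkLoadConnection {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("user", "myuser");           // placeholder credentials
        props.put("password", "mypassword");
        props.put("ENABLE_BULK_LOAD", "true"); // route batch inserts through the bulk API
        try (Connection conn = DriverManager.getConnection(
                "jdbc:sybase:Tds:dbserver:5000/mydb", props)) {
            // ... run the batched inserts over this connection ...
        }
    }
}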
Our team works quite closely with an Oracle DB server over JDBC. In one of our changes, I'm calling a stored procedure that returns two different ResultSets. At first my implementation assumed default scrollability.
After that failed, I looked it up on the Internet.
Everything I could read said basically the same thing: use the prepareStatement or prepareCall methods with the appropriate TYPE_SCROLL_INSENSITIVE and CONCUR_READ_ONLY. Neither of these worked.
The stored procedure I use, again, returns two different result sets, and they are extracted through a (ResultSet) rs.getObject("name") call. In the examples out there, the ResultSets generally come straight back from an executeQuery() call.
My question is: do the scrollability/updatability types passed to the prepareCall methods affect this sort of ResultSet? If so, how do I get them?
I know the JDBC driver can downgrade my request for a scrollable ResultSet. How can I tell if my ResultSet was downgraded?
On that note, why aren't ResultSets scrollable by default? What are the best practices, and what is "the cost" of their flexibility?
In Oracle, a cursor is a forward-only structure. All the database knows how to do is fetch the next row (well, technically the next n rows). In order to make a ResultSet seem scrollable, you rely on the JDBC driver.
The JDBC driver has two basic approaches to making a ResultSet seem scrollable. The first is to save the entire result set in memory as you fetch data, just in case you want to go backwards. Functionally that works, but it has potentially catastrophic results for performance and scalability when a query returns a fair amount of data. The first time some piece of code starts chewing up GB of RAM on app servers because a query returned thousands of rows that included a bunch of long comment fields, that JDBC driver will rightly get pilloried as a resource hog.
The more common approach is for the driver to add a key to the query and use that key to manage the data it caches. For example, the driver might keep the last 1000 rows in memory in their entirety but cache only the key for earlier rows, so it can go back and re-fetch that data later. That is more complicated to code, and it also requires that the ResultSet have a unique key. Normally that is done by trying to add a ROWID to the query. That is why, for example, the Oracle JDBC driver specifies that a scrollable or updatable ResultSet cannot use SELECT * but can use SELECT alias.* -- the latter makes it possible for the driver to blindly add a ROWID column to the query.
A ResultSet coming from a stored procedure, however, is completely opaque to the driver -- it has no way of getting the query that was used to open the ResultSet, so it has no way to add an additional key column or to go back and fetch the data again. If the driver wanted to make the ResultSet scrollable, it would have to fall back to caching the entire ResultSet in memory. Technically that is entirely possible, but very few drivers do it, since it tends to lead to performance problems. It is much safer to downgrade the ResultSet. Most of the time, the application is in a better position to decide whether caching the entire ResultSet is reasonable, because you know whether it only ever returns a small amount of data, or whether rows can be fetched again by their natural key.
You can use the getType() and getConcurrency() methods on your ResultSet to determine whether your ResultSet has been downgraded by the driver.
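For example (a sketch; cs stands for the CallableStatement from the question, and "name" for its output parameter):

import java.sql.CallableStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Sketch: detect whether the driver silently downgraded the ResultSet
// returned by the stored procedure.
static void reportDowngrade(CallableStatement cs) throws SQLException {
    ResultSet rs = (ResultSet) cs.getObject("name");
    if (rs.getType() == ResultSet.TYPE_FORWARD_ONLY) {
        System.out.println("Scrollability was downgraded to forward-only.");
    }
    if (rs.getConcurrency() == ResultSet.CONCUR_READ_ONLY) {
        System.out.println("Concurrency was downgraded to read-only.");
    }
    // Per the JDBC spec, a downgrade should also be reported as a SQLWarning:
    if (cs.getWarnings() != null) {
        System.out.println("Warning: " + cs.getWarnings().getMessage());
    }
}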
I'm trying to render a list of records in a JSF page with a query that returns many records. I'm using WebLogic 10.3.0.0 and Microsoft's SQL Server JDBC driver version 4 to connect to a SQL Server database. When I run this JSF page it consumes a lot of memory, because the query returns many records, and therefore an OutOfMemoryError occurs. I've seen that with setFetchSize you can limit the results, but here:
What does Statement.setFetchSize(nSize) method really do in SQL Server JDBC driver?
it says that Microsoft's SQL Server driver does not limit this. I used the jTDS driver as the above post suggested, but the same problem occurred. I've also tried to use this:
http://msdn.microsoft.com/en-us/library/bb879937.aspx
to enable adaptive buffering with the driver. My driver is version 4, so it should use adaptive buffering by default, but apparently it does not. I've tried this:
statement = connectionDB.createStatement();
// cast to the vendor class to reach the driver-specific buffering setting
SQLServerStatement SQLstmt = (SQLServerStatement) statement;
SQLstmt.setResponseBuffering("adaptive");
but this does not return results. I've also put this in the connection properties, but the problem still occurs. My understanding is that the query has a big result set and the driver does not fetch it in chunks, so memory keeps running out; I believe that is the problem. I don't know which workaround to use: manual pagination in the query, another driver, etc. Please help me find a workaround; any info is welcome. Sorry for my poor English.
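For reference, here is a sketch of how these settings are usually combined, forcing a server-side cursor so the driver fetches rows in chunks (URL, credentials, and query are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class StreamingQuery {
    public static void main(String[] args) throws Exception {
        // selectMethod=cursor asks the driver for a server-side cursor;
        // responseBuffering=adaptive avoids buffering the full response.
        String url = "jdbc:sqlserver://dbhost;databaseName=mydb;"
                   + "responseBuffering=adaptive;selectMethod=cursor";
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             Statement stmt = conn.createStatement(
                     ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            stmt.setFetchSize(1000); // rows per round trip when a cursor is used
            try (ResultSet rs = stmt.executeQuery("SELECT * FROM big_table")) {
                while (rs.next()) {
                    // render one row (or one page) at a time instead of
                    // materializing the whole list for the JSF page
                }
            }
        }
    }
}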
A few days ago I had to create some processing performance tests using an in-memory computing framework. To do this I needed a big data pool, which was grown incrementally across the various performance tests.
The DB was Oracle, with a table of 22 fields. This table needed to be populated gradually, from 1 million records up to 100 million records.
To populate the table with 1 million records, I generated random test data and used a plain java.sql.Statement to insert it into the DB, which took around 17 minutes and 16 seconds. I quickly realized that populating a 100 million record table this way would take forever, so I tried it with PreparedStatement, because I knew it is a bit faster... but the difference was immense: 1 minute and 24 seconds. I started searching the web for the reason behind this and found some explanations, but nothing that, in my opinion, should have this much impact.
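The two variants were essentially the following (a simplified sketch; t22 stands for the 22-column table, and toSqlLiterals is a hypothetical helper):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

// Variant 1: plain Statement, a new SQL string concatenated for every row,
// so the database hard-parses each insert.
static void insertWithStatement(Connection conn, List<Object[]> rows) throws SQLException {
    try (Statement stmt = conn.createStatement()) {
        for (Object[] row : rows) {
            stmt.executeUpdate("INSERT INTO t22 VALUES (" + toSqlLiterals(row) + ")");
        }
    }
}

// Variant 2: PreparedStatement, parsed once and re-executed with bind values.
static void insertWithPreparedStatement(Connection conn, List<Object[]> rows) throws SQLException {
    String sql = "INSERT INTO t22 VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, "
               + "?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)"; // 22 bind markers
    try (PreparedStatement ps = conn.prepareStatement(sql)) {
        for (Object[] row : rows) {
            for (int i = 0; i < 22; i++) {
                ps.setObject(i + 1, row[i]);
            }
            ps.executeUpdate();
        }
    }
}

// Hypothetical helper: renders the row as comma-separated SQL literals.
static String toSqlLiterals(Object[] row) { /* ... */ return ""; }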
This is what I found that might explain the difference:
LINK
A PreparedStatement gets precompiled
in the database, and its access plan is also cached there, which allows the database to execute a parametric query written with a prepared statement much faster than a normal query, because it has less work to do. You should always try to use PreparedStatement in production JDBC code to reduce load on the database. Note that to get the performance benefit you must use the parametrized version of the SQL query, not string concatenation.
BUT
all the data was generated randomly, so no major caching on Oracle's side should be involved.
Oracle is probably able to cache the query plan in the statement cache; per the Oracle® Database JDBC Developer's Guide section on implicit statement caching:
When you enable implicit Statement caching, JDBC automatically caches the prepared or callable statement when you call the close method of this statement object. The prepared and callable statements are cached and retrieved using standard connection object and statement object methods.
Plain statements are not implicitly cached, because implicit Statement caching uses a SQL string as a key and plain statements are created without a SQL string. Therefore, implicit Statement caching applies only to the OraclePreparedStatement and OracleCallableStatement objects, which are created with a SQL string.
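Enabling the implicit cache looks roughly like this (a sketch using the standard unwrap call and the Oracle extension methods; the cache size is arbitrary):

import java.sql.Connection;
import java.sql.SQLException;
import oracle.jdbc.OracleConnection;

// Sketch: turn on Oracle's implicit statement cache so a statement closed
// with close() is kept and reused on the next prepare of the same SQL.
static void enableImplicitCaching(Connection conn) throws SQLException {
    OracleConnection oconn = conn.unwrap(OracleConnection.class);
    oconn.setImplicitCachingEnabled(true);
    oconn.setStatementCacheSize(50); // how many statements to keep cached
}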
I am using Java to read from a SQL RDBMS and return the results to the user. The problem is that the database table has 155 million rows, which makes the wait time really long.
I wanted to know if it is possible to retrieve results as they come from the database and present them incrementally to the user (in batches).
My query is a simple SELECT * FROM Table_Name query.
Is there a mechanism or technology that can give me callbacks of DB records, in batches until the SELECT query finishes?
The RDBMS that is used is MS SQL Server 2008.
Thanks in advance.
Methods Statement#setFetchSize and Statement#getMoreResults are supposed to allow you to manage incremental fetches from the database. Unfortunately, this is the interface spec and vendors may or may not implement these. Memory management during a fetch is really down to the vendor (which is why I wouldn't strictly say that "JDBC just works like this").
From the JDBC documentation on Statement :
setFetchSize(int rows)
Gives the JDBC driver a hint as to the number of rows that should be
fetched from the database when more rows are needed for ResultSet
objects generated by this Statement.
getMoreResults()
Moves to this Statement object's next result, returns true if it is a
ResultSet object, and implicitly closes any current ResultSet object(s)
obtained with the method getResultSet.
getMoreResults(int current)
Moves to this Statement object's next result, deals with any current
ResultSet object(s) according to the instructions specified by the given
flag, and returns true if the next result is a ResultSet object.
The current parameter indicates whether to keep or close the current ResultSet(s).
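Put together, an incremental read might look like this sketch (the URL is a placeholder, the batch size is tunable, and presentBatch is a hypothetical callback to your UI layer):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

public class IncrementalReader {
    static final int BATCH_SIZE = 500;

    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:sqlserver://...");
             Statement stmt = conn.createStatement(
                     ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            stmt.setFetchSize(BATCH_SIZE); // hint: rows per round trip
            try (ResultSet rs = stmt.executeQuery("SELECT * FROM Table_Name")) {
                List<String> batch = new ArrayList<>();
                while (rs.next()) {
                    batch.add(rs.getString(1)); // map whatever columns you need
                    if (batch.size() == BATCH_SIZE) {
                        presentBatch(batch);    // hypothetical UI callback
                        batch.clear();
                    }
                }
                if (!batch.isEmpty()) {
                    presentBatch(batch);
                }
            }
        }
    }

    static void presentBatch(List<String> batch) { /* hand the rows to the user */ }
}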
Also, this SO response discusses the use of setFetchSize with regard to SQL Server 2005 and how it doesn't seem to manage batched fetches. The recommendation is to test this using the 2008 driver or, better, to use the jTDS driver (which gets a thumbs up in the comments).
This response to the same SO post may also be useful as it contains a link to SQLServer driver settings on MSDN.
There's also some good info on the MS TechNet website, though relating more to SQL Server 2005; I couldn't find the 2008-specific version in my cursory review. Anyway, it recommends creating the Statement with:
com.microsoft.sqlserver.jdbc.SQLServerResultSet.TYPE_SS_SERVER_CURSOR_FORWARD_ONLY (2004) scrollability for forward-only, read-only access, and then use the setFetchSize method to tune performance
Using pagination (LIMIT pageno, rows / TOP) might create holes and duplicates, but it can be combined with tracking the last row ID seen (WHERE id > ? ORDER BY id LIMIT 0, 100), as sketched below.
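A keyset loop along those lines (a sketch; my_table and id are placeholders, and on SQL Server the LIMIT clause becomes TOP 100 in the SELECT):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Sketch of keyset pagination: remember the last id seen and request the
// next 100 rows after it, until a page comes back empty.
static void readInPages(Connection conn) throws SQLException {
    String sql = "SELECT * FROM my_table WHERE id > ? ORDER BY id LIMIT 0, 100";
    long lastId = 0;
    boolean more = true;
    try (PreparedStatement ps = conn.prepareStatement(sql)) {
        while (more) {
            ps.setLong(1, lastId);
            more = false;
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    lastId = rs.getLong("id");
                    more = true;
                    // process the row
                }
            }
        }
    }
}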
You may use TYPE_FORWARD_ONLY or FETCH_FORWARD.
This is exactly how a JDBC driver is supposed to work (I remember a bug in an old PostgreSQL driver that caused all fetched records to be stored in memory).
However, it does let you read records as soon as the query starts fetching them. This is where I would start looking.
For example, Oracle optimizes SELECT * queries for fetching the whole set, which means it can take a long time before the first results appear. You can give hints (e.g. FIRST_ROWS) to optimize for fetching the first results instead, so you can show the first rows to your user quite fast, even though the whole query takes longer to execute.
You should test your query in a console first, to check when it starts returning results. Then try it with JDBC and monitor the memory usage while you iterate through the ResultSet. If memory usage grows fast, check whether you have opened the ResultSet in forward-only, read-only mode, and if necessary update the driver.
If such a solution is not feasible because of memory usage, you can still use cursors manually and fetch N rows (say, 100) in each query.
Cursor documentation for MSSQL, for example: http://msdn.microsoft.com/en-us/library/ms180152.aspx