I'm using SELECT GEN_ID(TABLE,1) FROM MON$DATABASE from a PreparedStatement to generate an ID that will be used in several tables.
I'm going to do a great number of INSERTs with PreparedStatements batches and I'm looking for a way to fetch a lot of new IDs at once from Firebird.
Doing a trigger seems to be out of the question, since I have to INSERT on other tables at another time with this ID in the Java code. Also, getGeneratedKeys() for batches seem to not have been implemented yet in (my?) Firebird JDBCdriver.
I'm answering from memory here, but I remember that I once had to load a bunch of transactions from a Quicken file into my Firebird database. I loaded an array with the transactions and set a variable named say iCount to the number. I then did SELECT GEN_ID(g_TABLE, iCount) from RDB$DATABASE. This gave me the next ID and incremented the generator by the number of records that I was going to insert. Then I started a transaction, stepped through the array and inserted the records one after the other and closed the transaction. I was surprised how fast this went. I think, at the time, I was working with about 28,000 transactions and the time was like a couple of seconds. Something like this might work for you.
As jrodenhi says, you can reserve a range of values using
SELECT GEN_ID(<generator>, <count>) FROM RDB$DATABASE
This will return a value of <count> higher than the previously generated key, so you can use all values from (value - count, value] (where ( signifies exclusive, ] inclusive). Say generator currently has value 10, calling GEN_ID(generator, 10) will return 20, you can then use 11...20 for ids.
This does assume that you normally use generators to generated ids for your table, and that no application makes up its own ids without using the generator.
As you noticed, getGeneratedKeys() has not been implemented for batches in Jaybird 2.2.x. Support for this option will be available in Jaybird 3.0.0, see JDBC-452.
Unless you are also targeting other databases, there is no real performance advantage to use batched updates (in Jaybird). Firebird does not support update batches, so the internal implementation in Jaybird does essentially the same as preparing a statement and executing it yourself repeatedly. This might change in the future as there are plans to add this to Firebird 4.
Disclosure: I am one of the Jaybird developers
Related
For my website, I'm creating a book database. I have a catalog, with a root node, each node have subnodes, each subnode has documents, each document has versions, and each version is made of several paragraphs.
In order to create this database the fastest possible, I'm first creating the entire tree model, in memory, and then I call session.save(rootNode)
This single save will populate my entire database (at the end when I'm doing a mysqldump on the database it weights 1Go)
The save coasts a lot (more than an hour), and since the database grows with new books and new versions of existing books, it coasts more and more. I would like to optimize this save.
I've tried to increase the batch_size. But it changes nothing since it's a unique save. When I mysqldump a script, and I insert it back into mysql, the operation coast 2 minutes or less.
And when I'm doing a "htop" on the ubuntu machine, I can see the mysql is only using 2 or 3 % CPU. Which means that it's hibernate who's slow.
If someone could give me possible techniques that I could try, or possible leads, it would be great... I already know some of the reasons, why it takes time. If someone wants to discuss it with me, thanks for his help.
Here are some of my problems (I think): For exemple, I have self assigned ids for most of my entities. Because of that, hibernate is checking each time if the line exists before it saves it. I don't need this because, the batch I'm executing, is executed only one, when I create the databse from scratch. The best would be to tell hibernate to ignore the primaryKey rules (like mysqldump does) and reenabeling the key checking once the database has been created. It's just a one shot batch, to initialize my database.
Second problem would be again about the foreign keys. Hibernate inserts lines with null values, then, makes an update in order to make foreign keys work.
About using another technology : I would like to make this batch work with hibernate because after, all my website is working very well with hibernate, and if it's hibernate who creates the databse, I'm sure the naming rules, and every foreign keys will be well created.
Finally, it's a readonly database. (I have a user database, which is using innodb, where I do updates, and insert while my website is running, but the document database is readonly and mYisam)
Here is a exemple of what I'm doing
TreeNode rootNode = new TreeNode();
recursiveLoadSubNodes(rootNode); // This method creates my big tree, in memory only.
hibernateSession.beginTrasaction();
hibernateSession.save(rootNode); // during more than an hour, it saves 1Go of datas : hundreads of sub treeNodes, thousands of documents, tens of thousands paragraphs.
hibernateSession.getTransaction().commit();
It's a little hard to guess what could be the problem here but I could think of 3 things:
Increasing batch_size only might not help because - depending on your model - inserts might be interleaved (i.e. A B A B ...). You can allow Hibernate to reorder inserts and updates so that they can be batched (i.e. A A ... B B ...).Depending on your model this might not work because the inserts might not be batchable. The necessary properties would be hibernate.order_inserts and hibernate.order_updates and a blog post that describes the situation can be found here: https://vladmihalcea.com/how-to-batch-insert-and-update-statements-with-hibernate/
If the entities don't already exist (which seems to be the case) then the problem might be the first level cache. This cache will cause Hibernate to get slower and slower because each time it wants to flush changes it will check all entries in the cache by iterating over them and calling equals() (or something similar). As you can see that will take longer with each new entity that's created.To Fix that you could either try to disable the first level cache (I'd have to look up whether that's possible for write operations and how this is done - or you do that :) ) or try to keep the cache small, e.g. by inserting the books yourself and evicting each book from the first level cache after the insert (you could also go deeper and do that on the document or paragraph level).
It might not actually be Hibernate (or at least not alone) but your DB as well. Note that restoring dumps often removes/disables constraint checks and indices along with other optimizations so comparing that with Hibernate isn't that useful. What you'd need to do is create a bunch of insert statements and then just execute those - ideally via a JDBC batch - on an empty database but with all constraints and indices enabled. That would provide a more accurate benchmark.
Assuming that comparison shows that the plain SQL insert isn't that much faster then you could decide to either keep what you have so far or refactor your batch insert to temporarily disable (or remove and re-create) constraints and indices.
Alternatively you could try not to use Hibernate at all or change your model - if that's possible given your requirements which I don't know. That means you could try to generate and execute the SQL queries yourself, use a NoSQL database or NoSQL storage in a SQL database that supports it - like Postgres.
We're doing something similar, i.e. we have Hibernate entities that contain some complex data which is stored in a JSONB column. Hibernate can read and write that column via a custom usertype but it can't filter (Postgres would support that but we didn't manage to enable the necessary syntax in Hibernate).
I have a MySQL database where I need to do a 1k or so updates, and I am contemplating whether it would be more appropriate to use executeBatch or executeUpdate. The preparedstatement is to be built on an ArrayList of 1k or more ids (which are PKs of the table to be updated). For each update to the table I need to check if it was updated or not (it's possible that the id is not in the table). In the case that the id doesn't exist, I need to add that id to a separate ArrayList which will be used to do batch inserts.
Given the above, is it more appropriate to do:
Various separate executeUpdate() and then store the id if it is not updated, or
Simply create a batch and use executeBatch(), which will return an array of either a 0 or 1 for each separate statement/id.
In case two, the overhead would be an additional array to hold all the 0 or 1 return values. In case one, the overhead would be due to executing each UPDATE separately.
Definitely executeBatch(), and make sure that you add "rewriteBatchedStatements=true" to your jdbc connection string.
The increase in throughput is hard to exaggerate. Your 1K updates will likely take barely longer than a single update, assuming that you have proper indexes and a WHERE clause that makes use of them.
Without the extra setting on the connection string, the time to do the batch update is going to be about the same as to do each update individually.
I'd go with batch since network latency is something to consider unless you are somehow running it on the same box.
I am executing a custom built DML statement using the
namedParameterJdbcTemplate.update(sql, valueMap);
call, where the sql is built based on the values in the map. Here my map could get very large and thus the sql might also get very lengthy. I understand that in Oracle, there is no fixed number for how long a query can be and there are many factors including the database configuration that may affect this value, but I would like to limit the query length to a fixed number.
What is the best way to limit the query length? Would the spring-batch API be any useful here?
Thanks in advance for any pointers.
I would choose one of these approaches:
Temporary table - Insert data in batch to the temporary table and then use MERGE INTO statement with that table.
Create SQL type for your rows and bind just that one parameter. (google for OraData - it is a bit tricky but it works)
Both will enable you to have a small static query and therefore avoid potential problems with too large query (and its parsing, polluting library cache etc.).
My question is very simple and in the title. Google and stack overflow are giving me nothing so I figured it was time to ask a question.
I am currently in the process of making an sql query for when users register to my site. I have ALWAYS only used prepared statements b/c the extra coding in callable statements, and the performance hit of regular statements are both turn offs. However this query is causing me to think of possible alternatives to my previous one size fits all (prepared statements) ways.
This query has a total of 4 round trips to the database. The steps are
Insert a user into the database, get back the generated key (their user id) within a result set.
Take the user id and insert a row into the album table. Get back a generated key (album id)
Take the album id and insert a row into the images table. Get back a generated key (image id)
Take the image id and update the user tables current default column with the image id
Aside: For anyone interested in the way I am getting the keys back after my inserts it is with Statement.RETURN_GENERATED_KEYS and you can read a great article about this here - IBM Article
So anyway I'd like to know if the use of 4 round trip (but cacheable) prepared statements is okay or if I should go with batched (but not cacheable) statements?
JDBC batch statements let you reduce the number of roundtrips under a condition that there is no data dependency among the rows that you are inserting or updating. Your scenario fails this condition, because the changes are dependent on each other's data: statements 2 through 4 must pick up an ID from the prior statement 1 through 3.
On the other hand, four round-trips is definitely suboptimal. That is why scenarios like yours call for stored procedures: you can put all this logic into a create_user_proc, and return the user ID back to the caller. All insertions from 1 to 4 would happen inside your SQL code, letting you manage ID dependencies in SQL. You would be able to call this stored procedure in a single roundtrip, which is definitely faster, especially if you process multiple user registrations per minute.
I would advice to write one Stored Proc doing all this four operation and passing the all the required params from application (to stored proc) at once and there in stored proc, you can get the generated keys for resultset
To increase performance and reduce database round trips, I agree with dasblinkenlight and ajduke - stored procedures will achieve this.
But, it this really a performance bottleneck in your application?
How often do users register on your site?
Compare this to how often information is read from these tables (once per page access?)
If information in these tables are being read thousands of times more than being written via new registrations, then it might not be worth going for the stored procedure approach.
Why you might not want to use stored procedures and stick to prepared statements:
not as portable as using prepared statements (a different syntax/language for each database, some simpler databases don't even support them)
will not work with ORM solutions such as JPA* - you mentioned using PreparedStatements directly so this probably does not apply to you, at least not now but it might limit you later on if you wanted to use ORM in the future
*JPA 2.1 might actually support stored procedures, but as of writing it has not yet been released.
I have a producer thread in Java pulling items from an Oracle table every n milliseconds.
The current implementation relies on a Java timestamp in order to retrieve data and never re-retrieve them again.
My objective is to get rid of the timestamp pattern and directly update the same items I'm pulling from the database.
Is there a way to SELECT a set of items and UPDATE them at the same time to mark them as "Being processed"?
If not, would a separate UPDATE query relying on the IN clause be a major performance hit?
I tried using a temporary table for that purpose, but I've seen that performance was severely affected.
Don't know if it helps, but the application is using iBatis.
If you are using oracle 10g or higher, you can use the RETURNING clause of the update statement. If you wish the retrieve more than one row you can use the BULK COLLECT statement.
Here is a link to some examples;
http://psoug.org/snippet/UPDATE-with-RETURNING-clause_604.htm