How to inspect data within a database transaction? - java

I'm running an integration test that executes some Hibernate code within a single transaction (managed by Spring). The test is failing with a duplicate key violation and I'd like to hit a breakpoint just before this and inspect the table contents. I can't just go into MySQL Workbench and run a SELECT query as it would be outside the transaction. Is there another way?

After reading your comments, my impression is that you are mainly interested in how to hit a breakpoint and, at the same time, examine the database contents. Under normal circumstances I would simply suggest logging the SQL. With the breakpoint in mind, my suggestion is:
Reduce the isolation level to READ_UNCOMMITTED for the integration test.
Reducing the isolation level will allow you to see uncommitted values in the database while debugging. As long as you don't have parallel activity within the integration test, it should be fine.
The isolation level can be set on a per-connection basis; there is no need to change anything on the server.
One side note: if you are using Hibernate, even parallel activity may work fine when you reduce the isolation level, because Hibernate largely behaves as if it were running under REPEATABLE_READ thanks to its transactional Level 1 cache.
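A minimal sketch of how that might look in a Spring-driven integration test (the test class is made up; the relevant part is the isolation attribute on @Transactional):

import org.springframework.transaction.annotation.Isolation;
import org.springframework.transaction.annotation.Transactional;

// Hypothetical integration test - only the isolation attribute matters here.
@Transactional(isolation = Isolation.READ_UNCOMMITTED)
public class DuplicateKeyIntegrationTest {
    // ... test methods; stop at the breakpoint and inspect the table from another client ...
}

While stopped at the breakpoint, the MySQL Workbench session you query from also needs to read uncommitted data; running SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED; before the SELECT takes care of that.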

The following can be run from Eclipse's "Display" view:
java.util.Arrays.deepToString(
em.createNativeQuery("SELECT mystuff FROM mytable").getResultList().toArray())
.replace("], ", "]\n");
This displays all the data, albeit not in a very user-friendly way - e.g. you will need to work out which columns the comma-separated fields correspond to.
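If the comma-separated output is too opaque, a small helper along these lines (a hypothetical method; it assumes Hibernate is the JPA provider, so the EntityManager can be unwrapped to a Session) can be called from the Display view to print the column names as well:

import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.Statement;
import javax.persistence.EntityManager;
import org.hibernate.Session;

public final class DebugDump {

    // Debugging helper: prints column names and rows of an arbitrary query using the
    // same connection (and therefore the same transaction) as the EntityManager.
    public static void dump(EntityManager em, String sql) {
        Session session = em.unwrap(Session.class);
        session.doWork(connection -> {
            try (Statement st = connection.createStatement();
                 ResultSet rs = st.executeQuery(sql)) {
                ResultSetMetaData meta = rs.getMetaData();
                StringBuilder header = new StringBuilder();
                for (int i = 1; i <= meta.getColumnCount(); i++) {
                    header.append(meta.getColumnLabel(i)).append(" | ");
                }
                System.out.println(header);
                while (rs.next()) {
                    StringBuilder row = new StringBuilder();
                    for (int i = 1; i <= meta.getColumnCount(); i++) {
                        row.append(rs.getObject(i)).append(" | ");
                    }
                    System.out.println(row);
                }
            }
        });
    }
}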

Related

Is it possible to know the progress of a transaction.commit operation?

I'm using JPA in my application to bundle a series of insert and updates into one commit() operation.
While that commit is running, is it possible to learn the progress of that operation (0-100%) so I can display that in a progress bar to the user?
I could split my updates into many commits, but that would make the entire job take longer.
Using EclipseLink as my JPA provider.
I think the only way to create something like that would be to use Hibernate's org.hibernate.stat.internal.StatisticsImpl class. You can programmatically read various metrics from an instance of this class. Hibernate statistics generation must be enabled for this to work; you can enable it by setting the property hibernate.generate_statistics to true.
The statistics instance has a method called getQueryExecutionCount() that you might be able to use to build a progress bar. It returns the number of queries executed by the current JPA EntityManagerFactory (or Hibernate SessionFactory). If you keep calling that method in a loop while the queries are still running, you could show the percentage of completed queries by dividing the return value of getQueryExecutionCount() by the total number of queries that need to be processed. Here's a good tutorial that explains all the different metrics that are available.
I must also point out that turning on Hibernate statistics can slow your application down, so if you want to use this feature in production you should test whether that slowdown is acceptable.
EDIT: You could also choose to turn Hibernate statistics on only right before the queries run and turn them off once they've completed.
The StatisticsImpl class has a method called setStatisticsEnabled(boolean b) that you can use to programmatically turn it on or off.
EDIT 2: I'm assuming here that you are using Hibernate as the JPA provider. If not, I'll remove this answer.
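For illustration, a rough sketch of the polling idea (this assumes Hibernate really is the provider; the expected query count and the progress callback are hypothetical, and the monitor has to run on a different thread than the one performing the commit):

import java.util.function.DoubleConsumer;
import org.hibernate.SessionFactory;
import org.hibernate.stat.Statistics;

public class CommitProgressMonitor {

    // Sketch only: poll Hibernate's statistics while the batch of statements runs.
    // 'expectedQueryCount' must be known up front for the percentage to mean anything.
    public static void monitor(SessionFactory sessionFactory,
                               long expectedQueryCount,
                               DoubleConsumer progressBar) throws InterruptedException {
        Statistics stats = sessionFactory.getStatistics();
        stats.setStatisticsEnabled(true);          // same effect as hibernate.generate_statistics=true
        long baseline = stats.getQueryExecutionCount();

        long done = 0;
        while (done < expectedQueryCount) {
            done = stats.getQueryExecutionCount() - baseline;
            progressBar.accept(Math.min(1.0, (double) done / expectedQueryCount));
            Thread.sleep(100);                     // poll a few times per second, don't spin
        }
        stats.setStatisticsEnabled(false);         // turn it off again to avoid the overhead
    }
}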

Parallel updates to different entity properties

I'm using JDO to access Datastore entities. I'm currently running into issues because different processes access the same entities in parallel and I'm unsure how to go around solving this.
I have entities containing values and calculated values: (key, value1, value2, value3, calculated)
The calculation happens in a separate task queue.
The user can edit the values at any time.
If the values are updated, a new task is pushed to the queue that overwrites the old calculated value.
The problem I currently have is in the following scenario:
User creates entity
Task is started
User notices an error in his initial entry and quickly updates the entity
Task finishes based on the old data (from step 1) and overwrites the entire entity, also removing the newly entered values (from step 3)
User is not happy
So my questions:
Can I make the task fail on update in step 4? Wrapping the task in a transaction does not seem to solve this issue for all cases due to eventual consistency (or, quite possibly, my understanding of datastore transactions is just wrong)
Is using the low-level setProperty method the only way to update a single field of an entity and will this solve my problem?
If none of the above, what's the best way to deal with a use case like this?
Background:
At the moment, I don't mind trading performance for consistency. I will care about performance later.
This was my first AppEngine application, and because it was a learning process, it does not use some of the best practices. I'm well aware that, in hindsight, I should have thought longer and harder about my data schema. For instance, none of my entities use ancestor relationships where they would be appropriate. I come from a relational background and it shows.
I am planning a major refactoring, probably moving to Objectify, but in the meantime I have a few urgent issues that need to be solved ASAP. And I'd like to first fully understand the Datastore.
Obviously JDO comes with optimistic concurrency checking (if you enable it) for transactions, which would prevent, or at least reduce the chance of, such things. Optimistic concurrency is equally applicable to relational datastores, so you likely know what it does.
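For reference, enabling it is usually just a matter of adding version metadata to the persistent class. A minimal sketch in generic JDO annotations (the entity is made up, and whether the GAE JDO plugin honours this exactly as shown is worth verifying):

import javax.jdo.annotations.PersistenceCapable;
import javax.jdo.annotations.Persistent;
import javax.jdo.annotations.PrimaryKey;
import javax.jdo.annotations.Version;
import javax.jdo.annotations.VersionStrategy;

// Hypothetical entity: @Version turns on optimistic concurrency checking,
// so a stale update fails instead of silently overwriting newer data.
@PersistenceCapable
@Version(strategy = VersionStrategy.VERSION_NUMBER)
public class CalculatedEntity {

    @PrimaryKey
    private Long key;

    @Persistent
    private String value1;

    @Persistent
    private String calculated;
}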
Google's JDO plugin uses the low-level setProperty() API under the covers anyway; the log even tells you which low-level calls are made (in terms of PUT and GET). Moving to some other API will not, on its own, solve such problems.
Whenever you need to handle write conflicts in GAE, you almost always need transactions. However, it's not just as simple as "use a transaction":
First of all, make sure each logical unit of work can be performed within a transaction. There are limits on transactions: no queries without ancestors, and only a limited number of entity groups can be accessed. You might find you need to do some extra work before the transaction starts (i.e. look up the keys of the entities that will participate in the transaction).
Make sure each unit of work is idempotent. This is critical. Some units of work are automatically idempotent, for example "set my email address to xyz". Some units of work are not automatically idempotent, for example "move $5 from account A to account B". You can make transactions idempotent by creating an entity before the transaction starts, then deleting the entity inside the transaction. Check for existence of the entity at the start of the transaction and simply return (completing the txn) if it's been deleted.
When you run a transaction, catch ConcurrentModificationException and retry the process in a loop. Now when any txn gets conflicted, it will simply retry until it succeeds.
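A rough sketch of that loop against the low-level datastore API (the property name and retry limit are made up; the essential parts are re-reading the entity inside the transaction and catching ConcurrentModificationException):

import java.util.ConcurrentModificationException;
import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.EntityNotFoundException;
import com.google.appengine.api.datastore.Key;
import com.google.appengine.api.datastore.Transaction;

public class CalculatedValueWriter {

    private static final int MAX_RETRIES = 5;

    // Only the calculated property is written; a conflicting commit is retried against fresh data.
    public void writeCalculatedValue(Key key, Object calculated) throws EntityNotFoundException {
        DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
        for (int attempt = 0; attempt < MAX_RETRIES; attempt++) {
            Transaction txn = datastore.beginTransaction();
            try {
                Entity entity = datastore.get(txn, key);   // re-read inside the transaction
                entity.setProperty("calculated", calculated);
                datastore.put(txn, entity);
                txn.commit();
                return;                                    // success
            } catch (ConcurrentModificationException e) {
                // someone else (e.g. the user) touched the entity group; loop and retry
            } finally {
                if (txn.isActive()) {
                    txn.rollback();
                }
            }
        }
        throw new ConcurrentModificationException("Gave up after " + MAX_RETRIES + " retries");
    }
}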
The only bad thing about collisions here is that they slow the system down and waste effort during retries. However, you will still get a throughput of at least one completed transaction per second (maybe a bit less if you are using XG transactions).
Objectify4 handles the retries for you; just define your unit of work as a run() method and run it with ofy().transact(). Just make sure your work is idempotent.
The way I see it, you can either prevent the first task from updating the object because certain values have changed since the task was first launched.
Or you can embed the object's values within the task request so that the second calculation task restores the object state with consistent value and calculated members.

Is there a good way to execute MySQL statements atomically via JDBC?

Suppose I have a table that contains valid data. I would like to modify this data in some way, but I'd like to make sure that if any errors occur with the modification, the table isn't changed and the method returns something to that effect.
For instance, (this is kind of a dumb example, but it illustrates the point so bear with me) suppose I want to edit all the entries in a "name" column so that they are properly capitalized. For some reason, I want either ALL of the names to have proper capitalization, or NONE of them to have proper capitalization (and the starting state of the table is that NONE of them do).
Is there an already-implemented way to run a batch update on the table and be assured that, if any one of the updates fails, all changes are rolled back and the table remains unchanged?
I can think of a few ways to do this by hand (though suggestions are welcome), but it'd be nice if there were some method I could use that would function this way. I looked at the java.sql.Statement.executeBatch() method, but I'm not convinced by the documentation that my table wouldn't be changed if it failed in some manner.
I hit this one too when starting with JDBC - it seemed to fly in the face of what I understood about databases and ACID guarantees.
Before you begin, be sure that your MySQL storage engine supports transactions. MyISAM doesn't support transactions, but InnoDB does.
Then be sure to disable JDBC auto-commit with Connection.setAutoCommit(false), otherwise JDBC will run each statement as a separate transaction. The commit is then an all-or-nothing affair: there will be no partial changes.
Then you run your various update statements, and finally call Connection.commit() to commit the transaction.
See the Sun Tutorial for more details about JDBC transactions.
Using a batch does not change the ACID guarantees (you're either in a transaction or you're not); batching is more about sending multiple statements together for improved performance.
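Putting those pieces together, a minimal sketch of an all-or-nothing update (the connection details, table and column names are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class AtomicUpdateExample {

    // Sketch only: either every row gets the capitalized name, or none does.
    public static void capitalizeNames(String url, String user, String password) throws SQLException {
        try (Connection conn = DriverManager.getConnection(url, user, password)) {
            conn.setAutoCommit(false);                    // one transaction for everything below
            try (PreparedStatement ps = conn.prepareStatement(
                    "UPDATE people SET name = CONCAT(UPPER(LEFT(name, 1)), LOWER(SUBSTRING(name, 2)))")) {
                ps.executeUpdate();
                // ... further updates in the same transaction ...
                conn.commit();                            // all changes become visible together
            } catch (SQLException e) {
                conn.rollback();                          // any failure undoes every change
                throw e;
            }
        }
    }
}

Again, this only rolls back if the table uses a transactional engine such as InnoDB.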

Why does query caching with Hibernate make the query ten times slower?

I'm currently experimenting with EJB3 as a prestudy for a major project at work. One of the things I'm looking into is query caching.
I've made a very simple domain model with JPA annotations, a @Local business interface and a @Stateless implementation in an EJB-JAR, deployed in an EAR together with a very simple webapp to do some basic testing. The EAR is deployed in JBoss 5.0.1 default config with no modifications. This was very straightforward, and worked as expected.
However, my latest test involved query caching, and I got some strange results:
I have a domain class that only maps an ID and a String value, and have created about 10000 rows in that particular table
In the business bean, there's a very simple query, SELECT m FROM MyClass m
With no cache, this executes in about 400ms on average
With query cache enabled (through hints on the query), the first execution of course takes a little longer, about 1200ms. The next executions take 3500ms on average!
This puzzled me, so I enabled Hibernate's show_sql to look at the log. Uncached, and on the first execution with cache enabled, there is one SELECT logged, as expected. When I should get cache hits, Hibernate logs one SELECT for each row in the database table.
That would certainly explain the slow execution time, but can anyone tell me why this happens?
The way the query cache works is that it only caches the IDs of the objects returned by the query. So your initial SELECT statement might return all the objects, and Hibernate will give them back to you and remember the IDs.
The next time you issue the query, however, Hibernate goes through the list of IDs and realizes it needs to materialize the actual data. So it goes back to the database to get the rest, and it does one SELECT per row, which is exactly what you are seeing.
Now, before you think, "this feature is obviously broken", the reason it works this way is that the Query Cache is designed to work in concert with the Second Level Cache. If the objects are stored in the L2 cache after the first query, then Hibernate will look there instead to satisfy the per-ID requests.
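In practice that usually means making the entity itself cacheable in the second-level cache, so the per-ID lookups are served from memory. A hedged sketch of what that can look like with Hibernate as the provider (the entity, cache settings and repository method are illustrative, not the poster's actual code):

import java.util.List;
import javax.persistence.Cacheable;
import javax.persistence.Entity;
import javax.persistence.EntityManager;
import javax.persistence.Id;
import org.hibernate.annotations.Cache;
import org.hibernate.annotations.CacheConcurrencyStrategy;

// Requires hibernate.cache.use_second_level_cache=true, hibernate.cache.use_query_cache=true
// and a cache provider (e.g. Ehcache) in the configuration.
@Entity
@Cacheable
@Cache(usage = CacheConcurrencyStrategy.READ_WRITE)   // entity instances live in the L2 cache
class MyClass {

    @Id
    private Long id;

    private String value;
}

class MyClassRepository {

    // The query cache stores the IDs; the L2 cache now serves the per-ID lookups,
    // so the second execution no longer issues one SELECT per row.
    static List<MyClass> findAll(EntityManager em) {
        return em.createQuery("SELECT m FROM MyClass m", MyClass.class)
                .setHint("org.hibernate.cacheable", Boolean.TRUE)
                .getResultList();
    }
}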
I highly recommend you pick up the book Java Persistence with Hibernate to learn more about this. Chapter 13 in particular covers optimizing queries, and how to use the cache effectively.

Unit testing DDL statements that need to be in a transaction

I am working on an application that uses Oracle's built in authentication mechanisms to manage user accounts and passwords. The application also uses row level security. Basically every user that registers through the application gets an Oracle username and password instead of the typical entry in a "USERS" table. The users also receive labels on certain tables. This type of functionality requires that the execution of DML and DDL statements be combined in many instances, but this poses a problem because the DDL statements perform implicit commits. If an error occurs after a DDL statement has executed, the transaction management will not roll everything back. For example, when a new user registers with the system the following might take place:
Start transaction
Insert person details into a table. (i.e. first name, last name, etc.) -DML
Create an oracle account (create user testuser identified by password;) -DDL implicit commit. Transaction ends.
New transaction begins.
Perform more DML statements (inserts, updates, etc.).
Error occurs, transaction only rolls back to step 4.
I understand that the above logic is working as designed, but I'm finding it difficult to unit test this type of functionality and manage it in data access layer. I have had the database go down or errors occur during the unit tests that caused the test schema to be contaminated with test data that should have been rolled back. It's easy enough to wipe the test schema when this happens, but I'm worried about database failures in a production environment. I'm looking for strategies to manage this.
This is a Java/Spring application. Spring is providing the transaction management.
First off I have to say: bad idea doing it this way. For two reasons:
Connections are based on user. That means you largely lose the benefits of connection pooling. It also doesn't scale terribly well. If you have 10,000 users on at once, you're going to be continually opening and closing hard connections (rather than soft connection pools); and
As you've discovered, creating and removing users is DDL not DML and thus you lose "transactionality".
Not sure why you've chosen to do it this way, but I would strongly recommend you implement users at the application layer and not the database layer.
As for how to solve your problem, basically you can't. Same as if you were creating a table or an index in the middle of your sequence.
You should use Oracle proxy authentication in combination with row level security.
Read this: http://www.oracle.com/technology/pub/articles/dikmans-toplink-security.html
I'll disagree with some of the previous comments and say that there are a lot of advantages to using the built-in Oracle account security. If you have to augment this with some sort of shadow table of users with additional information, how about wrapping the Oracle account creation in a separate package that is declared PRAGMA AUTONOMOUS_TRANSACTION and returns a success/failure status to the package that is doing the insert into the shadow table? I believe this would isolate the Oracle account creation from the transaction.
