I am not a DB expert, so maybe somebody can help me out with the following.
I use JPA icm MySQL on the Play! Framework.
As I in the past ran into problems with DB locks on a data collection routine, I made this routine read-only. Now when I need to make changes from this routine I do it concurrently.
Worked fine so far, but now I run into a problem. When a new user connects a record is created and the ID passed back. But I need to get the Model, not just the id.
When I search for the 'id' on my test environment this is fine, I just search for it (mem db) but with MYSQL the EM returns with nothing. It seems to me that with JPA-MySQL no changes made before the transaction start (even if bread-only) are included in future searches.
Is this correct, and is there a way around this? I can rewrite the procedure to make it readable again, and see if I can optimise it more so it would not result in other threads (or server) to run into a problem in obtaining a db lock. Possibly the best in the long term, but I am looking for a quicker solution for the moment.
The Transaction Isolation Level was set too high. Changing it solved the problem.
In the Play! Framework (1.2.5) it can be done like this:
%prod-test.db.isolation=READ_COMMITTED
Related
I'm investigating the possibility of using neo4j to handle some of the queries of our java web application that simply take too long to run on MSSQL as they require so many joins on large tables, even with indexes implemented.
I am however concerned about the time that it might take to complete the ETL ultimately impacting on how outdated the information may be when queries.
Can someone advise on either a production strategy or toolkit / library that can assist in reading a production sql-server database (using deltas if possible to optimise) and updating a running instance of a neo4j database? I imagine that there will have to be some kind of mapping configuration but the idea is to have this run in an automated manner, updating the neo4j database with one or more sql-server table or view contents.
The direct way to connect a MS SQL database to a Neo4j database would be using the apoc.load.jdbc procedure.
For an initial load you can use Neo4j ETL (https://neo4j.com/blog/rdbms-neo4j-etl-tool/).
There is however no way around the fact that some planning and work will be involved if you want to keep two databases in sync (and if the logic involved goes beyond a few simple queries) continiously. You might want to offload a delta every so often (monthly, daily, hourly, ...) into CSV files and load those (with CYPHER syntax determining what needs to be added, removed, changed or connected) with LOAD CSV.
Sadly enough there's no such thing as a free lunch.
Hope this helps,
Tom
Quick background story:
I work on a very old application that has recently been having issues with locks on the database. The app is written in Java and uses Hibernate. One of the issues we identified are transactions that are kept alive unnaturally long while also having isolation levels changed between READ_COMMITED and READ_UNCOMMITED frequently. While we acknowledge that the clear solution is refactoring the code so that transactions are smaller, this would be an enormous effort that we cannot afford entirely right now (most used parts of the app are being migrated to a new system but this procedure is relatively slow).
So - because we use READ_UNCOMMITED for all our Select operations and READ_COMMITED for everything else, a DBA that has been helping us, identified a possible solution in changing the isolation level to a global READ_COMMITED and changing all select queries to include the hint 'with (NOLOCK)'. He says functionally there should be no difference in the way data is retrieved (since we use dirty reads right now with no problem) while providing us with an advantage in not having to frequently change isolation level within the transaction. I believe his idea also comes in regards to recent reports we've been having about database locks being caused by isolation level changes.
So - Can we (and if so, how?) tell hibernate to add a 'with (nolock)' hint on all queries being automatically generated by the usage of mapped java objects and HQL (and maybe even existing SQL being passed to hibernate, though this seems like pushing it :) ) WITHOUT changing the isolation level?
Final side notes: we are using an older version of hibernate, v3.5 and right now an upgrade is unlikely, some incredibly 'smart' people decided to taint it at some point, inserting some of their own code that the application uses. Upgrading has been tried and failed multiple times.
Also: i have checked quite a few related threads, the general idea seems to be: don't use nolock, change isolation level, which - as stated - we're not looking to do.
Edit1: Since the app has been continuously developed in the past 12 years, there are loads of modules that haven't been even once glanced over by the current dev team, the ideal solution would be something that doesn't require the identification of every single bit of Java code that uses persisted objects.
Edit2: A possible way to go about this - should Hibernate allow it - would be to add a form of Interceptor that receives the formatted SQL query before being passed to the db driver. I would then take care of adding the hints myself, using some form of regex.
Thank you very much in advance.
You cannot use (NOLOCK) with HQL. You can however with native SQL if you decide to change your queries. Something like:
getCurrentSession().createSQLQuery("select * from table with(NOLOCK)").list();
I am using Spring 2.5 and the Hibernate that goes with it. I'm running against an Oracle 11g database.
I have created my DAOs which extend HibernateTemplate. Now I want to write a loader that inserts 5 million rows in my person table. I have written this in a simple minded fashion like read a row from a CSV file, turn it into a person, save into the table. Keep doing this until CSV file is empty.
The problem is that I run out of heap space at about 450000 rows. So I double the size of memory from 1024m to 2048m and now I run out of memory after about 900000 rows.
Hmmmmm....
So I've read some things about turning off the query cache for Hibernate, but I'm not using a L2 cache, so I don't think this is the issue.
I've read some things about JDBC2 batching, but I don't think that applies to hibernate.
So, I'm wondering if maybe there's a fundamental thing about Hibernate that I'm missing.
To be honest I wouldn't be using hibernate for that. ORMs are not designed to load million of rows into DBs. Not saying that you can't, but it's a bit like digging a swimming pool with a electric drill; you'd use an excavator for that, not a drill.
In your case, I'd load the CSV directly to the DB with a loader application that comes with databases. If you don't want to do that, yes, batch inserts will be way more efficient. I don't think Hibernate let's you do that easily though. If I were you I'd just use plain JDBC, or at most Spring JDBC.
If you have complicated businesslogic in the entities and absolutely have to use Hibernate, you could flush every N records as Richard suggests. However, I'd consider that a pretty bad hack.
In my experience with EclipseLink, holding a single transaction open while inserting/updating many records results in the symptoms you've experienced.
You are working with an EntityManager (of some sort, JPA or Hybernate specific - it's still managing Entitys). It's trying to keep the working set in memory, for the life of the transaction.
A general solution was to commit & the restart the transaction after every N inserts; a typical N for me was 1000.
As a footnote, with some version (undefined, it's been a few years) of EclipseLink, a session flush/clear didn't solve the problem.
It sounds like you are running out of space due to your first-level cache (the Hibernate session). You can flush the Hibernate session periodically to keep memory usage down, and break up the work into chunks by committing every few thousand rows, keeping the database's transaction log from getting too big.
But using Hibernate for a load task like that will be slow, because JDBC is slow. If you have a good idea what the environment will be like, you have a cap on the amount of data, and you have a big enough window for processing, then you can manage, but in a situation where you want it to work in multiple different client sites and you want to minimize the time spent on figuring out problems due to some client site's load job not working, then you should go with the database's bulk-copy tool.
The bulk-copy approach means the database suspends all constraint checking and index-building and transaction logging, instead it concentrates on slurping the data in as fast as possible. Because JDBC doesn't get anything like this level of cooperation from the database it can't compete. At a previous job we replaced a JDBC loader task that took over 8 hours to run with a SQLLoader task that took 20 minutes.
You do sacrifice database independence, but all databases have a bulk-copy tool (because DBAs rely on them) so you will have a very similar process for each database, only the exe you invoke and the way the file formatting is specified should change. And this way you make the best use of your processing window.
I've got an Oracle database that has two schemas in it which are identical. One is essentially the "on" schema, and the other is the "off" schema. We update data in the off schema and then switch the schemas behind an alias which our production servers use. Not a great solution, but it's what I've been given to work with.
My problem is that there is a separate application that will now be streaming data to the database (also handed to me) which is currently only updating the alias, which means it is only updating the "on" schema at any given time. That means that when the schemas get switched, all the data from this separate application vanishes from production (the schema it is in is now the "off" schema).
This application is using Hibernate 3.3.2 to update the database. There's Spring 3.0.6 in the mix as well, but not for the database updates. Finally, we're running on Java 1.6.
Can anyone point me in a direction to updating both "on" and "off" schemas simultaneously that does not involve rewriting the whole DAO layer using Spring JDBC to load two separate connection pools? I have not been able to find anything about getting hibernate to do this. Thanks in advance!
You shouldn't be updating two seperate databases this way, especially from the application's point of view. All it should know/care about is whether or not the data is there, not having to mess with two separate databases.
Frankly, this sounds like you may need to purchase an ETL tool. Even if you can't get it to update the 'on' schema from the 'off' one (fast enough to be practical), you will likely be able to use it to keep the two in sync (mirror changes from 'on' to 'off').
HA-JDBC is a replicating JDBC Driver we investigated for a short while. It will automatically replicate all inserts and updates, and distribute all selects. There are other database specific master-slave solutions as well.
On the other hand, I wouldn't recommend doing this for 4-8 hour procedures. Better lock the database before, update one database, and then backup-restore a copy, and then unlock again.
I have a java application that is using MSSQL server through the JDBC driver. Is there some kind of stub that I can use for testing? For example I want to test how my application handle cases of connection errors, SQL server out of disk, and other exceptions. It's pretty hard and complex to simulate this with real SQL server.
Thanks
You could write unit tests against your DAOs or repositories returning mock Connection objects using a mock library such as https://mocquer.dev.java.net/.
You'd need a really clean and decoupled application architecture though in order to make this work correctly and provide you with actual test coverage.
You could (assuming the system is architected in a way to make this easy) create your own versions of the DB Access classes (I assume you are using teh statement/preparedstatement interfaces), which would hold the real DB calls and that you can modify to do exactly what you want.
I've done this - it takes a day or so of really boring work.
I don't think there's something like that.
You'd be better off setting up your own database and testing on your machine/lan.
All I know there is out there, is:
freeSQL
db4free
Both support MySQL, but none MS-SQL. I do think that has to do with licensing issues and limitations. So I'm afraid you won't find a similar service for MS-SQL db.
Answering myself with an option I thought of, I'll be glad to hear your inputs on it.
After crawling around, I got to HyperSQLDB, a java-implemented database.
How feasible do you think is to take the source code of HSQLDB, and adding another layer to it, so I can control it and inject pre-defined behaviors to it.
For example, I'll make it run all queries slowly, I'll make it disconnect, etc.
Do you think this idea is worth pursuing? Is it doable in a reasonable amount of time?
If you use something other than MS-SQL, you may cause more testing problems due to incompatibilities and lack of functionality (e.g., transactions) than you solve. So I'm with Carl - use a shim.
If you were looking for unit-test coverage of ordinary behavior, I might think differently.
I haven't used them personally, but the stuff you're talking about sounds like a really good fit for a mocking framework, such as Mockito(docs) or PowerMock. They appear to provide good support for the kind of failure injection you're after. Can someone with experience with either of them (or similar) weigh in? See also How to stub/mock JDBC ResultSet to work both with Java 5 and 6?
execute procedure sp_who2 it will generate the all the current connections and process in your db you can see a column named spid corresponding to each db connection. just type: kill <<spid>> and execute it to terminate any users..etc. but if the spid is less than 50 it means it is a system process and dont kill it. This can help you replicate connection drops.
you can also say ALTER DATABASE dbname SET SINGLE_USER WITH ROLLBACK_IMMEDIATE this will drop all connections to the said db immediately.
Select ##MAX_Connections as Max_Connections would give you the max connections which can be made to a database (you can set it to a low number to test connection unavailability).
to replicate query timeout.. set the query timeout to a very low number & execute a fairly large query.
to create disk space error, simply redice the size of the db file & do not allow it to grow... then insert data to the database (you'll get an exception).
altert database xxx (file= maxsize= filegrowth=)