In my Java webapp, each instance checks on startup whether the database is up to date via a JDBC connection. If the DB is not up to date, the instance performs an update routine by executing SQL scripts.
I can't control when instances get started. Therefore, I need to ensure that only a single instance performs a database update at any one time. Ideally, I would need to lock the complete database, but according to
http://www.postgresql.org/docs/8.4/static/explicit-locking.html
and
http://wiki.postgresql.org/wiki/Lock_database
PostgreSQL doesn't support it (I'm still using version 8.4).
What other options do I have?
If you control the code for all the instances, you can create a table in the database, and each instance that starts looks in this table for a record with a timestamp. Let's call it your "lock" record.
If a process finds that the lock record does not exist, then it inserts the record and processes the data you require.
If a process finds that the lock record does exist, then it can assume that another process has created it and do nothing, busy wait, or whatever.
With this design you are effectively creating a "lock" in the database to synchronize your processes with. You code it, so all processes know they have to adhere to the logic of the lock record.
Once the process that holds the lock has completed processing, it should clear the lock record so that the next restart behaves correctly. You also need to think about the situation where the lock has not been cleared due to a server error or execution error. Typically, if the lock is older than n minutes you can consider it "stale", and therefore delete it and create it again (or just update it).
When dealing with the "lock" record be sure to utilise the Serializable isolation level on your DB connection in order to guarantee atomicity.
The Service layer of your Java code can enforce your locking strategy prior to calling your Data Access layer. It won't matter whether you use Hibernate or not, as it's just application logic.
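The lock-record idea above could be sketched with plain JDBC roughly as follows. This is only a sketch under assumptions: the table `update_lock`, its columns, the single row with id 1, and the 10 minute threshold are all made up for illustration, and the instances are assumed to share roughly the same clock and time zone.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.time.Duration;
import java.time.Instant;

final class UpdateLock {
    // Assumed table: CREATE TABLE update_lock (id INT PRIMARY KEY, locked_at TIMESTAMP NOT NULL)
    static final Duration MAX_AGE = Duration.ofMinutes(10); // the "n minutes" threshold; arbitrary value

    /** A lock record counts as stale once it is older than maxAge. */
    static boolean isStale(Instant lockedAt, Instant now, Duration maxAge) {
        return Duration.between(lockedAt, now).compareTo(maxAge) > 0;
    }

    /** Returns true if this instance now owns the lock record. */
    static boolean tryAcquire(Connection con) throws SQLException {
        con.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);
        con.setAutoCommit(false);
        try {
            Instant now = Instant.now();
            Instant lockedAt = null;
            try (PreparedStatement ps = con.prepareStatement(
                     "SELECT locked_at FROM update_lock WHERE id = 1");
                 ResultSet rs = ps.executeQuery()) {
                if (rs.next()) {
                    lockedAt = rs.getTimestamp(1).toInstant();
                }
            }
            if (lockedAt != null && !isStale(lockedAt, now, MAX_AGE)) {
                con.rollback();
                return false; // another instance holds a fresh lock
            }
            // No record yet: insert it. Stale record: take it over by updating the timestamp.
            String sql = (lockedAt == null)
                    ? "INSERT INTO update_lock (locked_at, id) VALUES (?, 1)"
                    : "UPDATE update_lock SET locked_at = ? WHERE id = 1";
            try (PreparedStatement ps = con.prepareStatement(sql)) {
                ps.setTimestamp(1, Timestamp.from(now));
                ps.executeUpdate();
            }
            con.commit();
            return true;
        } catch (SQLException e) {
            // Treated here as "another instance won the race" (primary-key or
            // serialization conflict); real code should inspect the SQLState.
            con.rollback();
            return false;
        }
    }

    /** Clears the lock record; expects the connection to still have auto-commit off. */
    static void release(Connection con) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement("DELETE FROM update_lock WHERE id = 1")) {
            ps.executeUpdate();
        }
        con.commit();
    }
}
```

The primary key on `id` is what makes the insert path safe: if two instances both see no record, only one INSERT can commit.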
Ideally, I would need to lock the complete database.
Does it really matter what your lock applies to, as long as you're effectively serializing access? Just acquire an exclusive lock on any table, or row for that matter.
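As a rough sketch of that suggestion in JDBC: take an exclusive lock on some agreed-upon table inside a transaction, run the update, and commit. The table name passed in (for example a hypothetical `schema_version` table) and the `SqlWork` callback are assumptions for illustration only.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

final class ExclusiveUpdate {
    /** Callback that performs the update on the locked connection (hypothetical helper type). */
    interface SqlWork {
        void run(Connection con) throws SQLException;
    }

    /** Builds the PostgreSQL LOCK statement; only plain identifiers are accepted. */
    static String lockSql(String table) {
        if (!table.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("unexpected table name: " + table);
        }
        return "LOCK TABLE " + table + " IN ACCESS EXCLUSIVE MODE";
    }

    /** Other instances calling this block at LOCK TABLE until the holder commits or rolls back. */
    static void runExclusively(Connection con, String table, SqlWork update) throws SQLException {
        boolean previous = con.getAutoCommit();
        con.setAutoCommit(false); // the lock is only held for the duration of a transaction
        try (Statement st = con.createStatement()) {
            st.execute(lockSql(table));
            update.run(con); // should re-check the DB version first, then run the scripts
            con.commit();    // releases the lock
        } catch (SQLException | RuntimeException e) {
            con.rollback();
            throw e;
        } finally {
            con.setAutoCommit(previous);
        }
    }
}
```

Because waiting instances only get the lock after the first one commits, the callback should check again whether the database is already up to date before running any scripts.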
Related
I am a little confused as to why Optimistic Locking is actually safe. If I am checking the version at the time of retrieval with the version at the time of update, it seems like I can still have two requests enter the update block if the OS issues an interrupt and swaps the processes before the commit actually occurs. For example:
latestVersion = vehicle.getVersion();
if (vehicle.getVersion() == latestVersion) {
// update record in database
} else {
// don't update record
}
In this example, I am trying to manually use Optimistic Locking in a Java application without using JPA / Hibernate. However, it seems like two requests can enter the if block at the same time. Can you please help me understand how to do this properly? For context, I am also using Java Design Patterns website as an example.
Well... that's the optimistic part. The optimism is that it is safe. If you have to be certain it's safe, then that's not optimistic.
The example you show definitely is susceptible to a race condition. Not only because of thread scheduling, but also due to transaction isolation level.
A simple read in MySQL, in the default transaction isolation level of REPEATABLE READ, will read the data that was committed at the time your transaction started.
Whereas updating data will act on the data that is committed at the time of the update. If some other concurrent session has updated the row in the database in the meantime, and committed it, then your update will "see" the latest committed row, not the row viewed by your get method.
The way to avoid the race condition is to not be optimistic. Instead, force exclusive access to the record. Doveryai, no proveryai.
If you only have one app instance, you might use a critical section for this.
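For that single-instance case, a minimal sketch of such a critical section might look like this. The `Vehicle` class here is a hypothetical in-memory stand-in, not the class from the question; the point is only that the version check and the write happen under the same lock.

```java
final class Vehicle {
    private int version;
    private String owner;

    synchronized int getVersion() {
        return version;
    }

    synchronized String getOwner() {
        return owner;
    }

    /**
     * The check and the write run inside one synchronized method, so no other
     * thread in this JVM can slip in between them.
     */
    synchronized boolean updateOwner(int expectedVersion, String newOwner) {
        if (version != expectedVersion) {
            return false; // someone else updated the vehicle since it was read
        }
        owner = newOwner;
        version++;
        return true;
    }
}
```

If several threads all read version 0 and then call `updateOwner(0, ...)`, only one of them gets `true`.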
If you have multiple app instances, critical sections cannot coordinate other instances, so you need to coordinate in the database. You can do this by using pessimistic locking. Either read the record using a locking read query, or else you can use MySQL's user-defined locks.
Right now, I am thinking of using multiple threads to pick up tasks that correspond to records in the DB tables. The tasks will be ordered by created date. I am stuck on how to handle the case where one task (record) has already been taken: other threads should skip it and move on to the next one.
Is there any way to do this? Many thanks in advance.
One solution is to write a synchronized pickATask() method and let free threads pick a task only through this method.
This will force the other free threads to wait their turn.
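// pendingTasks is assumed to be a Queue<NeedTask> field ordered by created date
public synchronized NeedTask pickATask() {
    // poll() removes and returns the oldest pending task, or null if none is left
    return pendingTasks.poll();
}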
Depending on how much data you insert, you can either use global variables (for example a synchronized Vector) or a table in the database itself to record values such as (string TASK, boolean Taken, boolean finished, int Owner_PID).
Using the database to check the status tends to scale better for large workloads, but if you do not have many threads, or the code will run just once, the synchronized global variable approach may be the better solution.
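To make the in-memory variant concrete, here is a small sketch of a picker that worker threads share. The class name `TaskPicker` and the generic task type are assumptions; the tasks are expected to be passed in already sorted by created date.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Queue;

final class TaskPicker<T> {
    private final Queue<T> pendingTasks;

    /** The collection is expected to be ordered oldest first. */
    TaskPicker(Collection<T> tasksOldestFirst) {
        this.pendingTasks = new ArrayDeque<>(tasksOldestFirst);
    }

    /**
     * Only one thread at a time can be inside this method, so each task is
     * handed out exactly once. Returns null when no task is left.
     */
    synchronized T pickATask() {
        return pendingTasks.poll();
    }
}
```

A worker thread would then simply loop: pick a task, process it, and stop once `pickATask()` returns null.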
In my opinion, if you create multiple threads that all read from the DB, every thread is involved in I/O and in some kind of serialization while reading rows from the same table. To my mind this is not scalable and also has a performance impact.
My solution would be to have one producer thread that reads the rows in batches, creates tasks, and submits them for execution to a thread pool of workers that do the actual work. Now we have two modules that can be scaled independently. On the producer side, if required, we can create multiple threads, each reading a partition of the data. For example, thread 1 reads rows 0-100 and thread 2 reads rows 101-200.
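A minimal sketch of that producer plus worker pool layout, with the database read replaced by a stand-in so it runs on its own. `readBatch` and `handle` are hypothetical placeholders for the paged query and the real per-row work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class BatchDispatcher {
    /** Stand-in for a paged query ordered by created date; returns row ids in [from, from + size). */
    static List<Integer> readBatch(int from, int size, int totalRows) {
        List<Integer> ids = new ArrayList<>();
        for (int id = from; id < Math.min(from + size, totalRows); id++) {
            ids.add(id);
        }
        return ids;
    }

    /** Stand-in for the real work done per row. */
    static String handle(int rowId) {
        return "done-" + rowId;
    }

    /** The calling thread is the producer; the pool threads are the workers. */
    static List<String> processAll(int totalRows, int batchSize, int workers)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int from = 0; from < totalRows; from += batchSize) {
                for (int rowId : readBatch(from, batchSize, totalRows)) {
                    // Each row is submitted exactly once, so no two workers get the same task
                    Callable<String> task = () -> handle(rowId);
                    futures.add(pool.submit(task));
                }
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get()); // waits for the worker to finish
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Because only the producer reads the table, the workers never compete for the same row and no row-level coordination is needed between them.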
It depends on how you manage communication between Java and the DB. Are you using direct JDBC calls, Hibernate, Spring Data, or some other ORM framework? If you use plain JDBC, you can handle the whole issue at the DB level. You will need to lock the record when you intend to write to it, i.e. once a record has been selected for update, no other transaction can lock or modify it until the update is finished (whether plain reads are also blocked depends on the database and isolation level).
If you use an ORM framework (such as Hibernate), the framework lets you manage concurrency issues. Read up on optimistic and pessimistic locking. Pessimistic locking does approximately what is described above: once a record is being updated, no one else can access it until the update is finished.
Optimistic locking uses a versioning mechanism: multiple threads can try to update the record, but only the first one succeeds, and the rest get an exception saying that they are working with stale data and should read the record again. The versioning mechanism adds a version column that is usually a number or sometimes a timestamp. Each thread reads the record, and upon update it checks whether the version in the DB is still the same. If so, no one else has updated the record, and upon update the version is changed (incremented, or set to the current timestamp). If the version has changed, then someone else has already updated the record since it was read, so this thread holds a stale record and should not be allowed to update it.
Optimistic locking performs better in environments where reads heavily outnumber writes.
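The version check described above can be modelled in memory like this. It is only a model, not Hibernate's implementation: a single "row" is held in an AtomicReference, and all class names are made up. Against a real database the same check would typically be a statement of the form `UPDATE ... SET version = version + 1 WHERE id = ? AND version = ?`, followed by a look at the affected row count.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Immutable snapshot of one row, including its version column. */
final class VersionedRow {
    final int version;
    final String value;

    VersionedRow(int version, String value) {
        this.version = version;
        this.value = value;
    }
}

/** Thrown when the row was changed by someone else after it was read. */
final class StaleDataException extends RuntimeException {
}

final class OptimisticStore {
    private final AtomicReference<VersionedRow> row =
            new AtomicReference<>(new VersionedRow(0, "initial"));

    VersionedRow read() {
        return row.get();
    }

    /**
     * Succeeds only if the stored row is still the snapshot that was read.
     * Every successful update installs a new snapshot with version + 1, so
     * comparing snapshots is equivalent to comparing versions here.
     */
    void update(VersionedRow readEarlier, String newValue) {
        VersionedRow next = new VersionedRow(readEarlier.version + 1, newValue);
        if (!row.compareAndSet(readEarlier, next)) {
            throw new StaleDataException();
        }
    }

    /** Typical caller-side handling: on stale data, read again and retry. */
    static void appendWithRetry(OptimisticStore store, String suffix) {
        while (true) {
            VersionedRow current = store.read();
            try {
                store.update(current, current.value + suffix);
                return;
            } catch (StaleDataException e) {
                // another thread updated the row first; loop and re-read
            }
        }
    }
}
```

The important difference from a separate "read version, compare, then write" sequence is that the comparison and the write are one atomic step.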
For example, we have a table (login, hash). We have no unique constraint on login column, but we should keep it unique (just for example).
When a new user registers, we check if the entered login is free.
If it's a Java web app deployed to Tomcat, which has a thread pool, then those checks might be processed in parallel, right? How can uniqueness be ensured then?
You can use a pessimistic lock on the table: lock the table, check whether the login exists, and save it if it does not, so that no other thread can change the table in the meantime. But I think that is a really bad way to do things. Why not use DB constraints?
In short, you can't have a good solution without database constraints here.
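With a constraint in place, the registration code can skip the separate check and just attempt the insert. A hedged sketch, assuming a hypothetical `users` table with a UNIQUE constraint on `login`:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

final class Registration {
    // Assumed table: CREATE TABLE users (login VARCHAR(64) NOT NULL UNIQUE, hash VARCHAR(128) NOT NULL)

    /**
     * SQLState class 23 is the SQL-standard class for integrity constraint
     * violations. It also covers other constraints (e.g. NOT NULL), so stricter
     * code may want to check the vendor-specific state for unique violations.
     */
    static boolean isConstraintViolation(SQLException e) {
        String state = e.getSQLState();
        return state != null && state.startsWith("23");
    }

    /** Returns true if the login was free and is now registered, false if it was taken. */
    static boolean register(Connection con, String login, String hash) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(
                "INSERT INTO users (login, hash) VALUES (?, ?)")) {
            ps.setString(1, login);
            ps.setString(2, hash);
            ps.executeUpdate();
            return true;
        } catch (SQLException e) {
            if (isConstraintViolation(e)) {
                return false; // the database rejected the duplicate, whoever inserted it
            }
            throw e;
        }
    }
}
```

The database enforces uniqueness no matter how many threads, machines, or manual sessions insert at the same time.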
Without a constraint in a multi-threaded environment you'll need some common resource to synchronize your threads on. A thread would acquire the mutex, check if login is free (using a SELECT) and then INSERT a new record if it was free. No other thread should be able to do this at the same time - this is why you need synchronization here.
This will work only if all your threads have access to this mutex and if it is guaranteed that no one else can access the database at the same time.
The first problem appears if you have, for instance, several machines that access the same database. Threads running on different machines will not have access to the same mutex, so they will happily insert into your table in parallel.
The other problem is that if someone logs in to the database and creates records in that table directly, such inserts may happen exactly between the SELECT and the INSERT executed from your code. So synchronization in code won't help here.
A further option is locking the whole table, but that's even worse. You'll need to very reliably release the lock otherwise you're risking stalling the whole system.
I'm developing an application with JPA 2.1 and have the following problem.
I'm trying to lock an entity in this way:
Book book = em.find(Book.class, 12);
em.lock(book, LockModeType.PESSIMISTIC_WRITE);
but if I try to access the entity with id=12 from another browser window or client, the system doesn't throw a PessimisticLockException.
Where am I going wrong?
The lock is effective for the lifetime of the transaction, but certainly not across multiple request-response cycles (unless you have configured your entity manager and transaction manager to manage long-running transactions).
A transaction MUST be short-lived (for performance reasons).
A pessimistic write lock means that the book will not be modified by any other transaction between the lock instruction and the end of the transaction. But the book object itself may of course live longer.
I suppose that in another window/browser you try the same thing: to acquire a PESSIMISTIC_WRITE lock.
The problem you have is that the lock is released when the method returns (as the transaction ends), meaning that when you open the second browser window, there is no lock anymore.
You should probably explain to us the problem or scenario that you are trying to solve or test.
For the general situation:
Another possible cause could be that your database table does not support row-level locking. For example, in MySQL only the InnoDB storage engine supports "SELECT ... FOR UPDATE" (which the PESSIMISTIC_WRITE lock is translated into).
I have come across this Oracle Java tutorial. As a beginner in the topic, I cannot grasp why it's necessary to call con.setAutoCommit(true); at the end of the transaction.
Here is the Oracle explanation:
The statement con.setAutoCommit(true); enables auto-commit mode, which
means that each statement is once again committed automatically when
it is completed. Then, you are back to the default state where you do
not have to call the method commit yourself. It is advisable to
disable the auto-commit mode only during the transaction mode. This
way, you avoid holding database locks for multiple statements, which
increases the likelihood of conflicts with other users.
Could you explain it in other words? Especially this bit:
This way, you avoid holding database locks for multiple statements,
which increases the likelihood of conflicts with other users.
What do they mean with "holding database locks for multiple statements"?
Thanks in advance.
The database has to perform row-level or table-level locking (in MySQL, depending on the storage engine) to handle transactions. If you keep auto-commit mode off and keep executing statements, these locks won't be released until you commit the transaction. Depending on the lock type, other transactions won't be able to update the row or table that is currently locked. setAutoCommit(true) basically commits the current transaction, releases the locks currently held, and enables auto-commit. That is, until you change it again, each individual statement is executed and committed on its own.
Row-level locks protect the individual rows that take part in the transaction (InnoDB). Table-level locks prevent concurrent access to the entire table (MyISAM).
When one transaction updates a row in the database, other transactions cannot alter this row until the first one finishes (commits or rolls back). Therefore, if you do not need transactions, it is advisable to call con.setAutoCommit(true).
With most modern database systems you can batch together a series of SQL statements. Typically the ones you care about are inserts, as these will block out a portion of the space on disk that is being written to. In JDBC this corresponds to Statement.addBatch(sql). Where this becomes problematic is when you try to implement pessimistic or optimistic locks on tuples in the database. If you have a series of long-running transactions that execute multiple batches, you can find yourself in a situation where all reads get rejected because of these exclusive locks. I believe Oracle has no such thing as a dirty read, so this can potentially be mitigated. But imagine a scenario where you are running a job that attempts to delete a record while I am updating it. This is the type of conflict that they are referring to.
With auto-commit on, each part of the batch is saved before moving on to the next unit of work. This is what you see when you try to persist millions of records and it slows down considerably, because the system is ensuring consistency with each insert statement. A quick way around this in Oracle (if you are using Oracle) is to use the oracle.sql package and look at the ARRAY class.
Most databases will autoCommit by default. That means that as soon as you execute a statement the results will immediately appear in the database and everyone else using the database will immediately see them.
There are times, however, when you need to perform a number of changes on the database which must all be done at once and if one fails you want to back out of all of them.
Say you have a cars database and you come across a new car from a new manufacturer. Here you may wish to create the manufacturer entry in your database and the new car record and make sure they both appear at once for other users. Otherwise there may be a confusing moment in your database where one exists without the other.
To achieve this you switch autoCommit off, execute the statements, commit them and then set autoCommit back on. This last switch on of autoCommit is probably what you are seeing.
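That sequence could be wrapped in a small helper along these lines. The helper and its `SqlWork` callback are made up for illustration, as are the `manufacturer` and `car` tables in the usage method; only the `java.sql.Connection` calls are standard JDBC.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

final class Transactions {
    /** The statements that must succeed or fail together (hypothetical helper type). */
    interface SqlWork {
        void run(Connection con) throws SQLException;
    }

    static void inTransaction(Connection con, SqlWork work) throws SQLException {
        boolean previous = con.getAutoCommit();
        con.setAutoCommit(false);        // start grouping statements
        try {
            work.run(con);
            con.commit();                // make all changes visible at once
        } catch (SQLException | RuntimeException e) {
            con.rollback();              // back out of everything on failure
            throw e;
        } finally {
            con.setAutoCommit(previous); // back to the earlier mode, usually auto-commit on
        }
    }

    /** Usage sketch for the cars example; table and column names are assumptions. */
    static void addNewCar(Connection con, String manufacturer, String model) throws SQLException {
        inTransaction(con, c -> {
            try (PreparedStatement m = c.prepareStatement(
                     "INSERT INTO manufacturer (name) VALUES (?)");
                 PreparedStatement car = c.prepareStatement(
                     "INSERT INTO car (manufacturer_name, model) VALUES (?, ?)")) {
                m.setString(1, manufacturer);
                m.executeUpdate();
                car.setString(1, manufacturer);
                car.setString(2, model);
                car.executeUpdate();
            }
        });
    }
}
```

Restoring the previous mode in `finally` means that code using the connection afterwards does not accidentally keep a transaction, and its locks, open.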