For example, say we have a table (login, hash). There is no unique constraint on the login column, but we need to keep it unique (just for the sake of the example).
When a new user registers, we check whether the entered login is free.
If it's a Java web app deployed to Tomcat, which has a thread pool, then those checks might be processed in parallel, right? How do we ensure uniqueness then?
You can use a pessimistic lock on the table: it locks the table so you can check whether the login exists and then save, and no other thread is able to change the table in the meantime. But I think that is a really bad way to do things; why not use DB constraints?
In short, you can't have a good solution without database constraints here.
Without a constraint, in a multi-threaded environment you'll need some common resource to synchronize your threads on. A thread would acquire the mutex, check whether the login is free (using a SELECT) and then INSERT a new record if it was. No other thread should be able to do this at the same time; this is why you need synchronization here.
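A minimal sketch of that check-then-insert, assuming a single JVM, an open java.sql.Connection, and a users table (the table name is made up) holding the (login, hash) pair from the question:

import java.sql.*;

public class Registrar {
    private final Object loginMutex = new Object();

    // JVM-local mutex around check-then-insert; as explained below,
    // this only protects threads inside this one process.
    public boolean register(Connection conn, String login, String hash) throws SQLException {
        synchronized (loginMutex) {
            try (PreparedStatement check = conn.prepareStatement(
                    "SELECT 1 FROM users WHERE login = ?")) {
                check.setString(1, login);
                try (ResultSet rs = check.executeQuery()) {
                    if (rs.next()) return false; // login already taken
                }
            }
            try (PreparedStatement ins = conn.prepareStatement(
                    "INSERT INTO users (login, hash) VALUES (?, ?)")) {
                ins.setString(1, login);
                ins.setString(2, hash);
                ins.executeUpdate();
                return true; // login was free and is now registered
            }
        }
    }
}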
This will work only if all your threads have access to this mutex and if it is guaranteed that no one else can access the database at the same time.
The first problem appears if you have, for instance, several machines accessing the same database. Threads running on different machines will not have access to the same mutex, so they will happily insert into your table in parallel.
The other problem is that if someone logs in to the database and creates records in that table directly, such inserts may happen exactly between the SELECT and the INSERT executed from your code. So synchronization in code won't help here.
A further option is locking the whole table, but that's even worse. You'll need to release the lock very reliably, otherwise you risk stalling the whole system.
Right now, I am thinking of implementing multi-threading to take tasks corresponding to records in the DB tables. The tasks will be ordered by created date. I am stuck on handling the case where, when one task (record) is being taken, the other threads should skip it and chase the next one.
Is there any way to do this? Many thanks in advance.
One solution is to make a synchronized pickATask() method, so that free threads can only pick a task through this method.
This forces the other free threads to wait for their turn:
// Only one thread at a time can execute this method, so no two
// threads can ever pick the same task (taskQueue is assumed here).
public synchronized NeedTask pickATask() {
    return taskQueue.poll(); // the next task, or null when none are left
}
Depending on how big your data insertion is, you can either use shared global variables (synchronized, as above) or use a table in the database itself to record values like (string TASK, boolean TAKEN, boolean FINISHED, int OWNER_PID).
Using the database to check the status tends to give you faster code at large scale, but if you do not have too many threads, or this code will run just once, the synchronized global variable approach may be the better solution.
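For the database-backed variant, a rough sketch of claiming a task atomically with plain JDBC (the tasks table and column names follow the suggestion above, but are otherwise made up):

import java.sql.*;

public class TaskTable {
    // The UPDATE succeeds for exactly one caller: the WHERE clause only
    // matches while TAKEN is still false, and the database applies the
    // row change atomically, so no extra locking is needed in Java.
    public static boolean claim(Connection conn, String task, int ownerPid) throws SQLException {
        String sql = "UPDATE tasks SET taken = TRUE, owner_pid = ? "
                   + "WHERE task = ? AND taken = FALSE";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setInt(1, ownerPid);
            ps.setString(2, task);
            return ps.executeUpdate() == 1; // one row changed => we own the task
        }
    }
}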
In my opinion, if you create multiple threads to read from the DB, every thread is doing I/O and some kind of serialization while reading rows from the same table. To my mind this is not scalable and also has a performance impact.
My solution would be one producer thread that reads rows in batches, creates tasks, and submits them for execution (to a thread pool of workers that do the actual work). Now we have two modules that can scale independently. On the producer side, if required, we can create multiple threads where every thread reads a partition of the data; for example, thread 1 reads rows 0-100 and thread 2 reads rows 101-200.
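A sketch of that producer/worker split, assuming a tasks table ordered by created date and a hypothetical processRow() standing in for the actual work:

import java.sql.*;
import java.util.concurrent.*;

public class Producer {
    public static void run(Connection conn, ExecutorService workers) throws SQLException {
        // Single producer: read the rows in order and hand them to the pool.
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(
                     "SELECT id, payload FROM tasks ORDER BY created_date")) {
            while (rs.next()) {
                long id = rs.getLong("id");
                String payload = rs.getString("payload");
                workers.submit(() -> processRow(id, payload)); // workers do the real work
            }
        }
        workers.shutdown();
    }

    private static void processRow(long id, String payload) {
        // hypothetical per-row business logic goes here
    }
}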
It depends on how you manage the communication between Java and the DB: direct JDBC calls, Hibernate, Spring Data or another ORM framework. If you use just JDBC, you can manage this whole issue at the DB level. You will need to configure your DB to lock records upon writing, i.e. once a record has been selected for update, no one can read it until the update is finished.
If you use an ORM framework (such as Hibernate, for example), the framework lets you manage concurrency issues; see optimistic and pessimistic locking. Pessimistic locking does approximately what is described above: once a record is being updated, no one can read it until the update is finished. Optimistic locking uses a versioning mechanism: multiple threads may try to update the record, but only the first one succeeds, and the rest get an exception saying that they are now working with stale data and should read the record again.

The versioning mechanism adds a version column, usually a number or sometimes a timestamp. Each thread reads the record, and upon update it checks whether the version in the DB is still the same. If it is, no one else has updated the record, and the update changes the version (increments it or sets the current timestamp). If the version has changed, someone else already updated the record since it was read, so this thread has a stale record and must not be allowed to update it. Optimistic locking shows better performance in environments where reads heavily outnumber writes.
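For illustration, the versioning mechanism in JPA/Hibernate boils down to one annotated field (the entity and its columns here are made up):

import javax.persistence.*;

@Entity
public class PaymentRecord {
    @Id @GeneratedValue
    private Long id;

    private long amount;

    // Hibernate compares this value in the UPDATE's WHERE clause and
    // increments it on success; if another thread changed the row in
    // between, a stale-state exception is thrown and the caller should
    // re-read the record and retry.
    @Version
    private int version;
}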
I had a problem in my software that sometimes caused a lock on the SQL server.
This was caused by a process that selects a group of records and starts processing them.
Based on some values and a calculation the records get updated.
When a record is being updated, the page that record is on is locked by the SQL server for selects, which results in a lock that never resolves itself.
To solve the problem we created a second table, from which we select; the main table is copied into it before the process starts. That way the table being updated is never selected from, and no lock can appear.
What I am looking for is a simpler and better solution to this problem, because to me this feels like a workaround for something I'm doing the wrong way, and I would really like to improve the processing.
Try changing the TRANSACTION ISOLATION LEVEL on the database; your DBMS documentation describes the available levels.
I guess your default isolation level is set to REPEATABLE READ, which causes the select to take a shared lock on the returned records; the deadlock happens when concurrent requests come in. To solve this you should use a locking select (to lock the records with an X lock rather than an S lock).
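With JDBC, a locking select can look like the sketch below, given an open Connection conn and a recordId to process (the records table is made up; FOR UPDATE works on most databases, while on SQL Server the equivalent is a WITH (UPDLOCK) table hint):

conn.setAutoCommit(false);
try (PreparedStatement ps = conn.prepareStatement(
        "SELECT id, value FROM records WHERE id = ? FOR UPDATE")) { // X lock, not S lock
    ps.setLong(1, recordId);
    try (ResultSet rs = ps.executeQuery()) {
        // ... do the calculation and the UPDATE here, holding the lock ...
    }
}
conn.commit(); // releases the lock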
I'm doing a SQL query in some Java code, but the execution blocks at a call to ResultSet.next(). At the same time, I was browsing the queried table using IBM DB2 Control Center. When I closed Control Center, the call to ResultSet.next() resumed. So I suspect my actions in Control Center locked the queried table. Is this possible? How can I confirm it (i.e. how can I know whether a certain table is locked in DB2)?
Yes, it can be blocked, depending on the queries. It does not matter that it is the Control Center; it could be any other kind of connection. The point is that one connection puts a lock on a table or a row, and when another connection tries to access that table or row, the second one sits in 'lock wait' because it has to wait for the first connection to release the lock.
Let's suppose you create a table from the Control Center but do not execute a commit. When you then issue a select from the Java application, it waits on that lock. You have to issue a rollback or commit in the Control Center in order to release it.
In order to see the locks there are many tools; db2top, option u, is a good one for me when analyzing real-time situations.
Finally, why are you using Control Center? It is outdated; try IBM Data Studio.
While AngocA is right in general, it must be said that what you experienced may hint at a configuration problem, as two read-only transactions shouldn't lock each other out under normal circumstances.
DB2 normally uses row locking. However, if the number of locked rows gets too big for the so-called lock list, the DB2 server may perform a so-called "lock escalation" and replace millions of row locks on some table with a single exclusive lock on that table.
The behaviour also depends on the so-called isolation mode used, but that is a topic not easily explained in an SO post; you need to read some documentation to understand it.
The size of the lock list can be configured with the DB parameter LOCKLIST. The diagnostic log (look for the file db2diag.log) tells you whether lock escalations have taken place.
In my Java webapp, each instance checks on startup, via a JDBC connection, whether the database is up to date. If the DB is not up to date, it performs an update routine by executing SQL scripts.
I can't control when instances get started. Therefore, I need to ensure that only a single instance performs a database update at a time. Ideally, I would need to lock the complete database, but according to
http://www.postgresql.org/docs/8.4/static/explicit-locking.html
and
http://wiki.postgresql.org/wiki/Lock_database
PostgreSQL doesn't support it (I'm still using version 8.4).
What other options do I have?
If you control the code for all the instances, then you can create a table in the database in which each instance that starts looks for a record with a timestamp. Let's call it your "lock" record.
If a process finds that the lock record does not exist, it inserts the record and processes the data you require.
If a process finds that the lock record does exist, you can assume that another process has created it, and do nothing, busy-wait, or whatever.
With this design you are effectively creating a "lock" in the database to synchronize your processes with. You code it so that all processes know they have to adhere to the logic of the lock record.
Once the first process that holds the lock has completed its processing, it should clear the lock record so the next restart behaves correctly. You also need to think about the situation where the lock has not been cleared due to a server or execution error. Typically, if the lock is older than n minutes, you can consider it "stale", delete it, and create it again (or just update it).
When dealing with the "lock" record be sure to utilise the Serializable isolation level on your DB connection in order to guarantee atomicity.
The service layer of your Java code can enforce your locking strategy before calling your data access layer. It won't matter whether you use Hibernate or not, as it's just application logic.
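Putting that together, a rough JDBC sketch of taking the lock record (the schema_lock table, its single-row convention, and the PostgreSQL now() call are assumptions):

conn.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);
conn.setAutoCommit(false);
try (Statement st = conn.createStatement()) {
    // A primary key on id makes this INSERT fail when another
    // instance already holds the lock.
    st.executeUpdate("INSERT INTO schema_lock (id, locked_at) VALUES (1, now())");
    conn.commit();   // lock acquired: run the update scripts, then delete the row
} catch (SQLException alreadyLocked) {
    conn.rollback(); // someone else is updating: skip, wait, or check for staleness
}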
Ideally, I would need to lock the complete database.
Does it really matter what your lock applies to, as long as you're effectively serializing access? Just acquire an exclusive lock on any table, or row for that matter.
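For example, in PostgreSQL, assuming a dedicated dummy table created just for this purpose:

conn.setAutoCommit(false);
try (Statement st = conn.createStatement()) {
    // Blocks until no one else holds the lock; it is released
    // automatically at commit or rollback.
    st.execute("LOCK TABLE migration_lock IN ACCESS EXCLUSIVE MODE");
    // ... check the schema version and run the update scripts here ...
}
conn.commit();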
I have a problem when I try to persist objects using multiple threads.
Details:
Suppose I have an object PaymentOrder which has a list of PaymentGroup (one-to-many relationship), and PaymentGroup contains a list of CreditTransfer (one-to-many relationship again).
Since the number of CreditTransfers is huge (in lakhs, i.e. hundreds of thousands), I have grouped them by PaymentGroup (based on some business logic)
and I am creating WORKER threads (one thread per PaymentGroup) to build the PaymentOrder objects and commit them to the database.
The problem is that each worker thread creates its own PaymentOrder (each containing a unique set of PaymentGroups).
The primary keys for all the entities are auto-generated.
So there are three tables: 1. PAYMENT_ORDER_MASTER, 2. PAYMENT_GROUPS, 3. CREDIT_TRANSFERS, all mapped by one-to-many relationships.
Because of that, when the second thread tries to persist its group in the database, the framework tries to persist the same PaymentOrder the previous thread already committed, and the transaction fails due to other unique field constraints (the checksum of the PaymentOrder).
Ideally it must be 1..n..m (PaymentOrder -> PaymentGroup -> CreditTransfer).
What I need to achieve is: if there is no entry for the PaymentOrder in the database, make one; if it is already there, don't make an entry in PAYMENT_ORDER_MASTER, only in PAYMENT_GROUPS and CREDIT_TRANSFERS.
How can I overcome this problem while maintaining the split-master-payment-order-using-groups logic and multiple threads?
You've got options.
1) Primitive but simple: catch the key-violation error at the end and retry your insert without the parents (see the sketch after this list). Assuming your parents are truly unique, you know that another thread just did the parents, so proceed with the children. This may perform poorly compared to the other options, but it may be all you need. If a high percentage of parents had only one child, it would work nicely.
2) Change your read-consistency level. It's vendor-specific, but you can sometimes read uncommitted transactions. This would help you see the other threads' work prior to commit. It isn't foolproof; you still have to do #1 as well, since another thread can sneak in after the read. But it might improve your throughput, at the cost of more complexity. It could also be impossible, depending on the RDBMS (or maybe it can be done, but only at the DB level, messing up other apps!).
3) Implement a work queue with a single-threaded consumer. If the main expensive work of the program happens before the persistence layer, you can have your threads "insert" their data into a work queue, where the keys aren't enforced, and have a single thread pull from the work queue and persist. The work queue can be in memory, in another table, or in a vendor-specific place (a WebLogic queue, Oracle AQ, etc.). So you parallelize the expensive work and go back to a single thread for the inserts. You can even have your consumer work in "batch insert" mode. Sweet.
4) Relax your constraints. Who really cares if there are two parents for the same child holding identical information? I'm just asking. If you don't later need super-fast updates on the parent info, and you can change your reading programs to understand it, it can work nicely. It won't get you an "A" in DB design class, but if it works...
5) Implement a goofy lock table. I hate this solution, but it does work: as its first transaction, a thread writes down (and commits) that it is working on parent "x", so nobody else can. This typically leads to the same problem (and others, such as cleaning up the records later), but it can work when child inserts are slow and a single-row insert is fast. You'll still have collisions, but fewer.
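As promised in option 1, a rough catch-and-retry sketch with plain JDBC; the helper methods are hypothetical placeholders for the actual inserts and lookup:

try {
    insertPaymentOrder(conn, order);   // hypothetical: INSERT the parent row
} catch (SQLIntegrityConstraintViolationException e) {
    // Another thread inserted this parent first: reuse its key instead.
    order.setId(findOrderIdByChecksum(conn, order.getChecksum())); // hypothetical lookup
}
insertGroupsAndTransfers(conn, order); // the children are inserted either way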
Hibernate sessions are not thread-safe, and the JDBC connections that underlie Hibernate are not thread-safe either. Consider multithreading your business logic instead, so that each thread uses its own Hibernate session and JDBC connection. By using a thread pool you can further improve your code by gaining the ability to throttle the number of simultaneous threads.
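A sketch of that layout: one short-lived Session per task, with a fixed-size pool doing the throttling (sessionFactory is your usual Hibernate SessionFactory, and buildOrderFor() stands in for the business logic):

import java.util.concurrent.*;
import org.hibernate.*;

ExecutorService pool = Executors.newFixedThreadPool(4); // throttles concurrency
for (PaymentGroup group : groups) {
    pool.submit(() -> {
        Session session = sessionFactory.openSession(); // never shared across threads
        try {
            Transaction tx = session.beginTransaction();
            session.persist(buildOrderFor(group));      // hypothetical business logic
            tx.commit();
        } finally {
            session.close();
        }
    });
}
pool.shutdown();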