Using Java locks for database concurrency

I have the following scenario.
I have two tables. The first stores values that act as counters for transactions. Through a Java application, a value from the first table is read, incremented and written to the second table, and the new value is also written back to the first table. Obviously there is potential for this to go wrong, as it's a multi-user system.
My solution, in Java, is to provide locks that have to (well, should) be acquired before any action can be taken on either table. These locks, ReentrantLocks, are static, and there is one for each column in Table 1, as the values are completely independent of each other.
Is this a recommended approach?
Cheers.

No. Use implicit database locks¹ for database concurrency. Relational databases support transactions, which are a vital part of ACID: use them.
Java-centric locks will not work across VMs and as such will not help in multi-user/multi-server environments.
¹ Databases are smart enough to acquire and release locks to ensure consistency and isolation, and may even use "lock-free" implementations such as MVCC. There are rare occasions when explicit database locks must be requested, but that is an advanced use case.
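For the concrete scenario in the question, a minimal JDBC sketch might look like the following (dataSource is an assumed javax.sql.DataSource; the table and column names are made up for illustration; java.sql.* imports assumed). The SELECT ... FOR UPDATE takes the row lock in the database itself, so it also protects against other JVMs and non-Java clients:

try (Connection con = dataSource.getConnection()) {
    con.setAutoCommit(false); // start a transaction
    try (PreparedStatement read = con.prepareStatement(
            "SELECT value FROM counters WHERE name = ? FOR UPDATE")) {
        read.setString(1, "tx_counter");
        try (ResultSet rs = read.executeQuery()) {
            if (!rs.next()) throw new IllegalStateException("counter row missing");
            long next = rs.getLong(1) + 1;
            try (PreparedStatement log = con.prepareStatement(
                    "INSERT INTO counter_log (name, value) VALUES (?, ?)")) {
                log.setString(1, "tx_counter");
                log.setLong(2, next);
                log.executeUpdate();
            }
            try (PreparedStatement write = con.prepareStatement(
                    "UPDATE counters SET value = ? WHERE name = ?")) {
                write.setLong(1, next);
                write.setString(2, "tx_counter");
                write.executeUpdate();
            }
        }
        con.commit(); // the row lock is released here
    } catch (SQLException e) {
        con.rollback();
        throw e;
    }
}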

Whilst agreeing with some of the sentiments of @pst's answer, I would say this depends slightly.
If the sequence of events is, and probably always will be, essentially "SQL-oriented", then you may as well do the locking at the database level (and indeed, probably implicitly via the use of transactions).
However, if there is, or you are planning to build in, significant data-manipulation logic in your app tier (either generally or for this specific operation), then locking at the app level may be more appropriate. (In reality, you will probably still run your SQL in transactions, so you are actually locking at both levels.)
I don't think the issue of multiple VMs is necessarily a compelling reason on its own to rely on DB-level locking. If you have multiple server apps accessing the database, you will in any case want to establish a well-defined protocol for which data is accessed concurrently under what circumstances. And in a system of moderate complexity, you will in any case want to run periodic sanity checks on the data. (Even if your server apps are perfectly behaved 100% of the time, will back-end tech support never, ever have to run some miscellaneous SQL on the database outside your app...?)

Related

Is there a way to monitor database performance from Java using Hibernate?

I use Hibernate, and my database may be any of several: MySQL, MSSQL, Oracle. I need to monitor database load and/or performance to get the state of the database in real time. Availability of the database is quite easy to check, but I doubt that is really useful if you want to know how loaded the database is at the current moment.
Is there a way to do it? Maybe there are some useful alternatives?
All the databases you mentioned provide some kind of statistics and performance counters via an SQL interface. Under the hood, however, they are completely different and use different, proprietary concepts. Each database provides different performance counters and also a different way of reading them.
Without knowing the internals, those numbers will look completely meaningless to you. To understand them you have to understand how that particular database works, and that can mean many years of study for each database separately. It's called a "DBA career". None of those databases has a concept of "how loaded" it is; you will have to draw that conclusion from the tens (possibly hundreds) of counters that you consider relevant in your case. Still, regardless of how you try and which counters you use in your calculation of "loadedness", a value of 10 for MSSQL will never be the same as a value of 10 for Oracle. To make things even worse, database internals (and thus also performance counters) change drastically between versions.
Bottom line: there is no common denominator, and there never can be. Trying to compare them is like comparing apples and oranges.
That said, each of them has its own tools for monitoring and optimization.
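If you only need a few raw numbers rather than a full monitoring tool, you can still read counters over plain JDBC. A sketch (MySQL-specific; java.sql.* imports assumed; other databases expose counters through entirely different interfaces, e.g. V$ views in Oracle or sys.dm_* DMVs in MSSQL, so this does not generalize):

try (Connection con = dataSource.getConnection();
     Statement st = con.createStatement();
     ResultSet rs = st.executeQuery("SHOW GLOBAL STATUS LIKE 'Threads_running'")) {
    // Each row is a (Variable_name, Value) pair of a raw MySQL counter.
    while (rs.next()) {
        System.out.println(rs.getString(1) + " = " + rs.getString(2));
    }
}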

Default @Transactional in Spring and the default lost update

Either there is one big phenomenon in the Spring environment, or I am terribly wrong.
The default Spring @Transactional annotation is not ACID but only ACD, lacking the isolation. That means that if you have the method:
@Transactional
public TheEntity updateEntity(TheEntity ent) {
    TheEntity storedEntity = loadEntity(ent.getId());
    storedEntity.setData(ent.getData());
    return saveEntity(storedEntity);
}
What would happen if two threads enter with different planned updates? They both load the entity from the DB, they both apply their own changes, then the first is saved and committed, and when the second is saved and committed, the first UPDATE IS LOST. Is that really the case? With the debugger it works like that.
Losing data?
You're not losing data. Think of it like changing a variable in code.
int i = 0;
i = 5;
i = 10;
Did you "lose" the 5? Well, no, you replaced it.
Now, the tricky part that you alluded to with multi-threading: what if these two SQL updates happen at the same time?
From a pure update standpoint (forgetting the read), it's no different. The database will use a lock to serialize the updates, so one will still go before the other. The second one wins, naturally.
But, there is one danger here...
Update based on the current state
What if the update is conditional based on the current state?
public void updateEntity(UUID entityId) {
    Entity blah = getCurrentState(entityId);
    blah.setNumberOfUpdates(blah.getNumberOfUpdates() + 1);
    blah.save();
}
Now you have a problem of data loss, because if two concurrent threads perform the read (getCurrentState), they will each add 1, arrive at the same number, and the second update will lose the increment of the first.
Solving it
There are two solutions.
Serializable isolation level - In most isolation levels, reads (selects) do not hold locks and therefore do not block, regardless of whether they are in a transaction or not. Serializable will actually acquire and hold locks on every row read (shared read locks, and in some databases range locks as well), and only release those locks when the transaction commits or rolls back.
Perform the update in a single statement. - A single UPDATE statement makes the read-modify-write atomic for us, i.e. UPDATE entity SET number_of_updates = number_of_updates + 1 WHERE entity_id = ?.
Generally speaking, the latter is much more scalable. The more locks you hold and the longer you hold them, the more blocking you get and therefore less throughput.
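As a rough JDBC sketch of that second option (con is an open java.sql.Connection; table and column names follow the example above):

// Atomic read-modify-write in one statement: the database serializes
// concurrent executions of this UPDATE, so no increment can be lost.
try (PreparedStatement ps = con.prepareStatement(
        "UPDATE entity SET number_of_updates = number_of_updates + 1"
        + " WHERE entity_id = ?")) {
    ps.setObject(1, entityId);
    ps.executeUpdate();
}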
To add to the comments above: this situation with @Transactional and "lost updates" is not wrong; however, it may seem confusing because it does not meet our expectation that @Transactional protects against lost updates.
The "lost update" problem can happen at the READ_COMMITTED isolation level, which is the default for most DBs and JPA providers as well.
To prevent it, one needs to use @Transactional(isolation = Isolation.REPEATABLE_READ). No need for SERIALIZABLE; that would be overkill.
A very good explanation is given by the well-known Java champion Vlad Mihalcea in his article: https://vladmihalcea.com/a-beginners-guide-to-database-locking-and-the-lost-update-phenomena/
It is also worth mentioning that a better solution is to use @Version, which can also prevent lost updates via an optimistic locking approach.
The confusion may come from the Wikipedia page https://en.wikipedia.org/wiki/Isolation_(database_systems), where the table suggests that a "lost update" is a "weaker" problem than a "dirty read" and never occurs at this level, while the text below the table contradicts that.
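A minimal sketch of that @Version approach, reusing TheEntity from the question (the field layout is illustrative; javax.persistence imports assumed):

@Entity
public class TheEntity {

    @Id
    private Long id;

    private String data;

    // JPA increments this column on every update and compares it on flush;
    // if another transaction changed the row in between, an
    // OptimisticLockException is thrown instead of silently losing the update.
    @Version
    private Long version;

    // getters and setters omitted
}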
You are not terribly wrong; your question is a very interesting observation. I believe (based on your comments) you are thinking about it in your very specific situation, whereas this subject is much broader. Let's take it step by step.
ACID
The I in ACID indeed stands for isolation, but it does not mean that two or more transactions need to be executed one after another; they just need to be isolated to some level. Most relational databases allow you to set an isolation level per transaction, even allowing you to read data from another, uncommitted transaction. Whether such a situation is fine or not is up to the specific application. See for example the MySQL documentation:
https://dev.mysql.com/doc/refman/5.7/en/innodb-transaction-isolation-levels.html
You can of course set the isolation level to serializable and achieve what you expect.
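In Spring terms, that is just a sketch like this (the method body is the one from the question; import org.springframework.transaction.annotation.Isolation and .Transactional assumed):

// Spring passes the explicit isolation level down to the connection
// used for this transaction.
@Transactional(isolation = Isolation.SERIALIZABLE)
public TheEntity updateEntity(TheEntity ent) {
    TheEntity storedEntity = loadEntity(ent.getId());
    storedEntity.setData(ent.getData());
    return saveEntity(storedEntity);
}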
Now, we also have NoSQL databases that don't support ACID. On top of that, if you start working with a cluster of databases, you may need to embrace eventual consistency, which might even mean that the same thread that just wrote some data may not see it when doing a read. Again, this is a question very specific to a particular app: can I afford having inconsistent data for a moment in exchange for a fast write?
You would probably lean towards consistent data handled in a serializable manner in banking or some financial system, and you would probably be fine with less consistent data, but higher performance, in a social app.
Update is lost - is that the case?
Yes, that will be the case.
Are we scared of serializable?
Yes, it might get nasty :-) But it is important to understand how it works and what the consequences are. I don't know if this is still the case, but I had a situation in a project about 10 years ago where DB2 was used. Due to a very specific scenario, DB2 performed a lock escalation to an exclusive lock on the whole table, effectively blocking any other connection from accessing the table even for reads. That meant only a single connection could be handled at a time.
So if you choose to go with the serializable level, you need to be sure that your transactions are in fact fast and that serializable is in fact needed. Maybe it is fine that some other thread is reading the data while you are writing? Just imagine a scenario where you have a commenting system for your articles. Suddenly a viral article gets published and everyone starts commenting. A single write transaction for a comment takes 100 ms. 100 new comment transactions get queued, which effectively blocks reading the comments for the next 10 s. I am sure that going with read committed here would be absolutely enough and would let you achieve two things: store the comments faster and read them while they are being written.
Long story short:
It all depends on your data access patterns, and there is no silver bullet. Sometimes serializable will be required, but it carries a performance penalty; sometimes read uncommitted will be fine, but it brings inconsistency penalties.

How to achieve synchronization of DB updates when the application is running on multiple nodes?

I am new to J2EE development and trying to get some basics right. My doubt is this: suppose I have an employee table somewhere in a database, and a function which increases the salary column in the employee table.
public synchronized IncreaseSalaryResponse increaseSalary(int empId, long raise);
In my case the service is running on two nodes (so two JVMs) for high availability. Now if two calls are made simultaneously to increase the salary of an employee, and they happen to hit different hosts, the call which finishes last will have its effect reflected in the final salary.
Does the DB make sure that such race conditions are avoided at its level?
If not, how do I handle this situation in my case?
Please provide ideas.
It depends on the RDBMS you are using, whether it is transactional or not, and the level of isolation.
According to Wikipedia, transactions in a database environment have two main purposes:
1. To provide reliable units of work that allow correct recovery from failures and keep a database consistent even in cases of system failure, when execution stops (completely or partially) and many operations upon a database remain uncompleted, with unclear status.
2. To provide isolation between programs accessing a database concurrently. If this isolation is not provided, the programs' outcomes are possibly erroneous.
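In practice, one simple way to make this particular raise safe across nodes (a sketch; the employee table and its columns are assumed from the question, java.sql.* imports assumed) is to push the arithmetic into the database, so the read and the write happen atomically in one statement regardless of which JVM issues it:

// The database serializes two concurrent executions of this UPDATE,
// so both raises are applied instead of the last one winning.
try (PreparedStatement ps = con.prepareStatement(
        "UPDATE employee SET salary = salary + ? WHERE emp_id = ?")) {
    ps.setLong(1, raise);
    ps.setInt(2, empId);
    ps.executeUpdate();
}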

How to lock database records in a Java EE application?

I want to write a Java EE web application where different users work with a database. A user can start editing a record, and then either save the changes or cancel editing. While the user is editing, the record should be locked for other users. It should be locked at the database level, because there are also non-Java users editing the same database, locking the records they work on.
I understand basic Java and databases, but I am not good at multi-user things like locking. Looking for examples on the internet, it seems to me like every "hello world" example for a Java EE technology introduces at least one other technology. To access objects in the database, I use JPA. To lock records, I probably need transactions, which brings in JTA. To work with JTA, I need JNDI. To work with all those objects, I probably also need EJB and injection... and at this point I wonder whether this is really the simplest way to solve the problem, or whether I have missed something important. I do not know whether all those technologies are necessary (if yes, I will use them; I just would like to be sure before I learn them all). I just see that the examples I found on the web introduce them very generously.
I would like a simple example of a Java EE code which:
uses JPA;
connects to a database described in the "persistence.xml" file;
has a MyObject class with properties id and name, stored in the MYOBJECT table;
has a method (e.g. called from a JSP page) that locks, at the database level, the object with id = 42 (so that non-Java users with access to the same database also cannot modify it), or displays an error if the record is already locked by another user (either another Java user or a non-Java user);
has another method (e.g. called from another JSP) that either updates the name to a specified value and releases the lock, or just releases the lock if an empty string is provided.
For each new technology you introduce in the solution, I would like to hear a very short explanation of why you used it, and also whether that technology requires me to install new libraries, create or modify configuration files, write additional code, etc. (The JSP files which call the methods are not necessary; I am interested in the database-related parts.)
(Another detail: here is described a difference between EntityTransaction and UserTransaction. If I understand it correctly, JTA is needed only if I use multiple databases. Is it also necessary if I use only one Oracle database with different schemas? If yes, then please write the example code using JTA.)
1) If you want to lock a record in a database, you need something called a pessimistic lock. Remember this keyword and use it for further googling. Simply put, a pessimistic lock really locks the record in the database: if your Java application takes a pessimistic lock, the record is really locked, so even if some other non-Java program accesses the same database, it cannot modify the record.
On the other hand, the so-called optimistic lock is mostly a pretend lock. It is, approximately, a "we most likely don't need to lock this record anyway, so we will not really lock it, and if something bad happens, we will try to fix the problem afterwards" approach. Which actually makes sense and increases performance, but only in situations where the assumptions behind the approach hold: where conflicts are really rare, and where you really can fix the problem afterwards. Unless you understand it well (which you don't seem to), just don't use it.
2) JPA is a unified approach for using a database with transactions and the like, and it also maps objects to tables for you. This is probably what you want.
JTA is the same, plus a unified approach to using transactions over many databases, so it is more powerful than JPA; but that means it has additional functionality that you don't really need. On the other hand, for these superpowers you pay a cost, such as losing the ability to start and stop transactions on a whim: the server manages the transactions for you, as the server sees fit. If you completely understand how exactly that works, then you know whether it fits your needs; but if you don't, you would rather avoid it. Your development environment may offer you JTA as a default option, but that is only because it thinks you are going to write Skynet. By not using JTA you also avoid JNDI, EJB, and many other Skynet-related technologies.
3) After hearing this, it is time for you to do your homework, because now you have an idea of what to look for. Read the "javax.persistence" API documentation.
You can use annotated Java classes to represent your database tables, or you can use old-fashioned SQL queries, or both, as you wish. You can use either of them to lock and release records. A lock must live inside a transaction, so if you want to keep the lock, you have to keep the transaction.
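A minimal sketch using the MyObject entity and the id = 42 from the question (javax.persistence and java.util.Collections imports assumed; transaction plumbing abbreviated):

EntityManager em = emf.createEntityManager();
em.getTransaction().begin();
try {
    // Issues SELECT ... FOR UPDATE (or the database's equivalent), so the
    // lock is real and blocks non-Java clients too. The lock.timeout hint of
    // 0 asks to fail immediately instead of waiting; note that it is only a
    // hint, and not every provider/database honors it.
    MyObject obj = em.find(MyObject.class, 42L, LockModeType.PESSIMISTIC_WRITE,
            Collections.<String, Object>singletonMap("javax.persistence.lock.timeout", 0));
    obj.setName("new name");
    em.getTransaction().commit(); // releases the database lock
} catch (PessimisticLockException | LockTimeoutException e) {
    em.getTransaction().rollback();
    // the record is locked by another user: report the error to the caller
}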
We will not solve this for you; you are asking for everything. You need to code it yourself, but here is a link for JPA locking.
Hint: use @Version.
Read here for information on locking in JPA.

Restrict Postgres access from Java clients by using a Java program on a server

Perhaps this question is not very clear, but I didn't find better words for the heading, which briefly describes the problem I'd like to deal with.
I want to restrict access from a Java desktop application to Postgres.
The background:
Suppose you have two apps running, and the first application has to do some complex calculations on the basis of data in the DB. To nail down the immutability of the data in the DB, I'd like to lock the DB against insert, update and delete operations. On the client side I think it's impossible to handle this behaviour satisfactorily. So I thought about using a little Java app on the server side which works like a proxy: it hands over CRUD (Create, Read, Update, Delete) operations until it gets a command to lock. After a lock, it rejects all CUD operations until it gets an unlock command from the locking client or a timeout is reached.
Questions:
What do you think about this approach?
Is it possible to lock a database while using such an approach?
Would you prefer Java SE or Java EE for the server-side Java app?
Thanks in advance.
Why not use transactions in your operations? The database has features to maintain data integrity itself, rather than resorting to a brute-force operation such as a total database lock.
The locking mechanism you describe sounds like it would be a pain for the users. Are the users initiating the lock, or is the software itself? If it's the users, you can expect some problems when Bob hits lock and then goes to lunch for two hours, forgetting to unlock the database first...
Indeed... there are a few proper ways to deal with this problem:
Just lock the tables in your code. PostgreSQL has commands (LOCK TABLE) for locking entire tables that you could run from your client application.
Pick a transaction isolation level that doesn't have the problem of reading data committed after your transaction started (BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ).
Of these, by far the most efficient is to use repeatable read as your isolation level. Postgres supports this quite efficiently, and it will give you a consistent view of the data without such heavy locking of the DB.
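A rough JDBC sketch of that second option (connection details are placeholders; java.sql.* imports assumed):

// In Postgres the snapshot is taken at the first query of the transaction,
// so the whole calculation sees one consistent view of the data while
// other clients continue writing.
try (Connection con = DriverManager.getConnection(url, user, password)) {
    con.setAutoCommit(false);
    con.setTransactionIsolation(Connection.TRANSACTION_REPEATABLE_READ);
    // ... run all the SELECTs for the calculation on this connection ...
    con.commit();
}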
Yeah, I thought about transactions, but in this case I can't use them. I'm sorry I didn't mention it exactly. So assume the following easy case:
A calculation closes one area of responsibility. After the calculation, a new one is opened and new inserts are dedicated to it. But during the calculation process, an insert, update or delete is not allowed on the data of the (currently calculated) area of responsibility. Moreover, a delete is strictly prohibited because the data has to be archived.
So IMO the use of transactions doesn't fit this requirement. Or did I miss something?
PS (off topic) @jsight: I recently read that internally Postgres maps "repeatable read" to "serializable", so using "repeatable read" gets you more restriction than you would perhaps expect.
