I'm trying to build an application in Java (JDK 1.8) with Connector/J and MySQL. I'm told that Serializable is the highest isolation level, but that it affects performance, so Serializable is not commonly adopted.
But consider this situation:
There are two commits which are going to update the fields of the same row (commit A and commit B). If A and B happen concurrently and the isolation level is not Serializable, there could be data races, which make the fields inconsistent. But at the Serializable level, the two updates won't happen at the same time, so either A happens before B or B happens before A, and the row will end up either in version A or in version B, but not in some mix of A and B.
I thought the Atomicity of ACID guarantees the synchronization of A and B. But it seems that the definition of Atomicity only guarantees that one transaction happens "all or nothing"; it says nothing about concurrent commits.
So should I use Serializable to prevent the data race? Which one of the ACID properties actually guarantees the atomicity of one update?
No. To avoid the problem you described you don't need Serializable; the other isolation levels are enough. What you are afraid of can only happen with Read Uncommitted. At any other isolation level, the fields within a single record will always be consistent.
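To make that concrete for the setup in the question (JDK 1.8, Connector/J, MySQL), here is a minimal plain-JDBC sketch of one such update running in its own transaction; the account table, its columns and the connection settings are made up for illustration. Each committed UPDATE replaces the row as a whole, so a reader never sees half of A and half of B even below Serializable.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class UpdateExample {

    // Hypothetical connection settings and table -- replace with your own.
    private static final String URL = "jdbc:mysql://localhost:3306/testdb";

    public static void updateRow(long id, String name, int amount) throws SQLException {
        try (Connection conn = DriverManager.getConnection(URL, "user", "password")) {
            // READ COMMITTED is enough here: the whole row is updated atomically.
            conn.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);
            conn.setAutoCommit(false);
            try (PreparedStatement ps = conn.prepareStatement(
                    "UPDATE account SET name = ?, amount = ? WHERE id = ?")) {
                ps.setString(1, name);
                ps.setInt(2, amount);
                ps.setLong(3, id);
                ps.executeUpdate();
                conn.commit();
            } catch (SQLException e) {
                conn.rollback();
                throw e;
            }
        }
    }
}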
Related
I kind of understand the purpose of entity locking and transaction isolation levels, but I can't get the difference between pessimistic locking and the Serializable level. As I understand it, in both cases the table gets locked and no other transaction can access it, so in both cases the DB takes actions to prevent concurrent modifications, which makes it look like there's no difference. Could someone please explain whether there actually is a difference here?
(I don't assume you're using ObjectDB. You'll probably get better answers if you edit your question, and include the specific database you're using with JPA.)
I don't like the terms optimistic locking and pessimistic locking. I think optimistic concurrency control and pessimistic concurrency control are more accurate. Locks are the most common way to deal with concurrency control problems, but they're not the only way. (Date's chapter on concurrency in An Introduction to Database Systems is about 25 pages long.)
The topics of transaction management and concurrency control aren't limited to the relational model of data or to SQL database management systems (dbms). Transaction isolation levels have to do with SQL.
Pessimistic concurrency control really means only that you expect the dbms to prevent other transactions from accessing something when the dbms starts processing your request. Behavior is up to the dbms vendor. Different vendors might prevent access by locking the entire database, locking some tables, locking some pages, or locking some rows. Or the dbms might prevent access in some other way that doesn't directly involve locks.
Transaction isolation levels are how SQL tries to solve concurrency control problems. Transaction isolation levels are defined in SQL standards.
The serializable transaction isolation level guarantees that the effect of concurrent, serializable transactions is the same as running them one at a time in some particular order. The guarantee describes the effect--not any particular kind of concurrency control or locking needed to achieve that effect.
Pessimistic locking normally involves taking write locks in the database in order to make changes in a safe, exclusive way. This is typically done with select ... for update, which prevents or delays other connections from running their own select ... for update, or from changing the locked records, until the first connection's transaction completes.
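A rough plain-JDBC sketch of that pattern, assuming a hypothetical account table with a balance column:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class PessimisticWithdrawal {

    // Table, columns and the withdrawal logic are made up for illustration.
    public static void withdraw(Connection conn, long accountId, long amount) throws SQLException {
        conn.setAutoCommit(false);
        try {
            long balance = 0;
            try (PreparedStatement lock = conn.prepareStatement(
                    "SELECT balance FROM account WHERE id = ? FOR UPDATE")) {
                lock.setLong(1, accountId);
                try (ResultSet rs = lock.executeQuery()) {
                    if (rs.next()) {
                        balance = rs.getLong("balance");
                    }
                }
            }
            try (PreparedStatement upd = conn.prepareStatement(
                    "UPDATE account SET balance = ? WHERE id = ?")) {
                upd.setLong(1, balance - amount);
                upd.setLong(2, accountId);
                upd.executeUpdate();
            }
            conn.commit(); // the row lock is released here; blocked connections proceed
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        }
    }
}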
The Serializable isolation level is not so much concerned with changes; it makes sure that after a transaction has started, the result of its reads stays the same (except for changes made by the transaction itself) until that transaction ends. To support this, a non-MVCC DBMS must take many locks (on each record read by the connection running serializable) and may therefore hinder concurrency considerably.
The same effect can also be achieved without locking when the database provides MVCC, as Oracle, MySQL/InnoDB, MariaDB and PostgreSQL do.
I have a bank project in which customer balances must be updated by parallel threads in parallel applications. I hold customer balances in an Oracle database. My Java applications will be implemented with Spring and Hibernate.
How can I handle the race condition between parallel applications? Should my solution be at the database level or at the application level?
I assume what you would like to know is how to handle concurrency, preventing race conditions which can occur when two parts of the application modify and accidentally overwrite the same data.
You basically have two strategies for this: pessimistic locking and optimistic locking:
Pessimistic Locking
Here you assume that the likelihood of two threads overwriting the same data is high, so you would like it to be handled in a transparent way. To handle this, increase the isolation level of your Spring transactions from its default value of READ_COMMITTED to, for example, REPEATABLE_READ, which should be sufficient in most cases:
@Transactional(isolation = Isolation.REPEATABLE_READ)
public void yourBusinessMethod() {
    ...
}
In this case, if you read some data at the beginning of the method, you are sure that no one can overwrite that data in the database while your method is ongoing. Note that it's still possible for another thread to insert extra records matching a query you made (a problem known as phantom reads), but not to change the records you already read.
If you want to protect against phantom reads as well, you need to upgrade the isolation level to SERIALIZABLE. The improved isolation comes at a performance cost: your program will run slower and will more frequently 'hang' waiting for other parts of the program to finish.
Optimistic Locking
Here you assume that data access collisions are rare, and that in the rare cases where they occur they are easily recoverable by the application. In this mode, you keep all your business methods at their default isolation level (READ_COMMITTED, as above).
Then each Hibernate entity is marked with a version column:
@Entity
public class SomeEntity {
    ...
    @Version
    private Long version;
}
With this, each entity read from the database is versioned using the version column. When Hibernate writes changes to an entity back to the database, it checks whether the version was incremented since that transaction last read the entity.
If so, it means someone else modified the data, and decisions were made based on stale data. In this case a StaleObjectStateException is thrown (wrapped by Spring as an OptimisticLockingFailureException), which needs to be caught by the application and handled, ideally in a central place.
In the case of a GUI, you usually catch the exception and show a message saying "user xyz changed this data while you were also editing it; your changes are lost. Press OK to reload the new data."
With optimistic locking your program will run faster, but the application needs to handle some concurrency aspects that would otherwise be transparent with pessimistic locking: versioning entities and catching exceptions.
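A rough sketch of handling it in a central place with a simple retry, assuming a Spring Data repository for the SomeEntity above (SomeEntityRepository, setName and the retry count are made-up illustration names):

import org.springframework.orm.ObjectOptimisticLockingFailureException;
import org.springframework.stereotype.Service;

@Service
public class SomeEntityService {

    private final SomeEntityRepository repository; // hypothetical Spring Data repository

    public SomeEntityService(SomeEntityRepository repository) {
        this.repository = repository;
    }

    public void updateWithRetry(Long id, String newName) {
        for (int attempt = 0; attempt < 3; attempt++) {
            try {
                // Each attempt re-reads the entity, so it carries the current version.
                SomeEntity entity = repository.findById(id)
                        .orElseThrow(() -> new IllegalArgumentException("No entity " + id));
                entity.setName(newName);
                repository.save(entity); // fails if the version changed in the meantime
                return;
            } catch (ObjectOptimisticLockingFailureException e) {
                // Someone else committed first; loop and retry on fresh data.
            }
        }
        throw new IllegalStateException("Could not update entity " + id + " after 3 attempts");
    }
}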
The most frequently used method is optimistic locking, as it seems to be acceptable in most applications. With pessimistic locking it's very easy to cause performance problems, especially when data access collisions are rare and can be solved in a simple way.
Nothing prevents you from mixing the two concurrency handling methods in the same application if needed.
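For example (a sketch under the assumption that you use Spring Data JPA; the repository and method names are made up): keep the @Version-based optimistic locking as the default and request a pessimistic row lock only for one hot-spot lookup. The locking query must be executed inside a transaction.

import java.util.Optional;
import javax.persistence.LockModeType;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Lock;

// Hypothetical repository for the SomeEntity shown above.
public interface SomeEntityRepository extends JpaRepository<SomeEntity, Long> {

    // Ordinary findById() keeps relying on the @Version column (optimistic locking).

    // This lookup takes a pessimistic write lock (SELECT ... FOR UPDATE on most databases).
    @Lock(LockModeType.PESSIMISTIC_WRITE)
    Optional<SomeEntity> findWithLockById(Long id);
}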
My understanding of an atomic operation is that it should not be possible for the steps of the operation to be interleaved with those of any other operation - that it should be executed as a single unit.
I have a method for creating a database record that will first of all check if a record with the same value, which also satisfies certain other parameters, already exists, and if so will not create the record.
In pseudocode:
public class FooDao implements IFooDao {

    @Transactional
    public void createFoo(String fooValue) {
        if (!fooExists(fooValue)) {
            // DB call to create foo
        }
    }

    @Transactional
    public boolean fooExists(String fooValue) {
        // DB call to check if foo exists
    }
}
However I have seen that it is possible for two records with the same value to be created, suggesting that these operations have interleaved in some way. I am aware that with Spring's transactional proxies, self-invocation of a method within an object will not use the transactional logic, but if createFoo() is called from outside the object then I would expect fooExists() to still be included in the same transaction.
Are my expectations as to what transactional atomicity should enforce wrong? Do I need to be using a synchronized block to enforce this?
What a transaction really means for the database depends on the isolation level. The Wikipedia article on Isolation (database systems) explains it well.
Normally one uses a not-so-high isolation level, for example Read Committed. This means that data written by another transaction cannot be read until that transaction is committed.
In your case this is not enough, because it is the opposite of what you want. So the obvious solution would be to use a more restrictive and slower isolation level: Repeatable Read.
But to be honest, I would take another route: make the relevant column unique (but do not remove your if (!fooExists(fooValue)) check). That way your check works 99% of the time. In the remaining 1% you will get an exception, because you are trying to violate the unique constraint.
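A minimal sketch of that approach with plain JDBC, assuming a hypothetical foo table with a unique foo_value column (the constraint itself would be added once, e.g. ALTER TABLE foo ADD CONSTRAINT uq_foo_value UNIQUE (foo_value)):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.SQLIntegrityConstraintViolationException;

public class FooCreator {

    // Keep the fooExists() pre-check for the common case; the unique constraint
    // catches the rare race where two transactions pass the check at the same time.
    public static void createFoo(Connection conn, String fooValue) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO foo (foo_value) VALUES (?)")) {
            ps.setString(1, fooValue);
            ps.executeUpdate();
        } catch (SQLIntegrityConstraintViolationException e) {
            // Another transaction created the same foo first; treat it as "already exists".
        }
    }
}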
Transactional means all updates occur within the same transaction, i.e. all updates/inserts/deletes succeed or all are rolled back (for example if you update multiple tables).
It doesn't guarantee anything about the behaviour of queries within the transaction, which depends on the RDBMS and its configuration (the isolation level configured on the database).
@Transactional does not by default make the code synchronized. Two separate threads can enter the same block at the same time and cause inserts to occur. Synchronizing the method isn't really a good answer either, since that can drastically affect application performance. If your issue is that two identical records are being created by two different threads, you may want to add an index with a unique constraint on the database so that duplicate inserts fail.
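If the table is mapped with JPA/Hibernate, one way to declare such a constraint is on the entity, so it ends up in the generated schema or your migration; the Foo entity and column names below are hypothetical:

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Table;
import javax.persistence.UniqueConstraint;

// Hypothetical entity: the unique constraint makes the database reject the second
// of two concurrent inserts with the same fooValue, whatever the two threads saw
// inside their own transactions.
@Entity
@Table(name = "foo", uniqueConstraints = @UniqueConstraint(columnNames = "foo_value"))
public class Foo {

    @Id
    @GeneratedValue
    private Long id;

    @Column(name = "foo_value", nullable = false)
    private String fooValue;

    // getters and setters omitted
}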
Forgive me if it's an obvious question, but I read the documentation from top to bottom, and I'm still not sure what the answer to this question is:
If I have a datastore transaction that reads the value of a counter and increments it, can I be certain that two parallel servlets won't interleave with each other? In other words, the docs only say that the transaction either fails or succeeds atomically, but they do not say whether the transaction locks the data against other servlets, so what happens when two servlets access the same entity at the same time?
If indeed the transactions are not thread-safe, should I just use synchronized when accessing shared datastore counters?
Transactions are "thread-safe" as long as you do writes on entities that have a common ancestor.
Take a look at the GAE low-level datastore API; once you understand how Entities work, it's pretty straightforward.
https://developers.google.com/appengine/docs/java/datastore/entities
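A minimal sketch of the counter case using the (legacy) low-level datastore API; the Counter kind, key name and count property are made up. The point is that commit() of the losing transaction fails with ConcurrentModificationException, which you retry, rather than relying on synchronized (which would only cover one JVM instance anyway):

import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.EntityNotFoundException;
import com.google.appengine.api.datastore.Key;
import com.google.appengine.api.datastore.KeyFactory;
import com.google.appengine.api.datastore.Transaction;
import java.util.ConcurrentModificationException;

public class CounterIncrement {

    public static void increment() throws EntityNotFoundException {
        DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
        Key key = KeyFactory.createKey("Counter", "myCounter");
        Transaction txn = ds.beginTransaction();
        try {
            Entity counter = ds.get(txn, key);
            long value = (Long) counter.getProperty("count");
            counter.setProperty("count", value + 1);
            ds.put(txn, counter);
            // If another servlet committed a change to this entity group first,
            // commit() throws ConcurrentModificationException.
            txn.commit();
        } catch (ConcurrentModificationException e) {
            // Retry (possibly with backoff) instead of synchronizing in the JVM.
            throw e;
        } finally {
            if (txn.isActive()) {
                txn.rollback();
            }
        }
    }
}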
I was going through the ACID properties of transactions and encountered the statement below across different sites:
ACID is the acronym for the four properties guaranteed by transactions: atomicity, consistency, isolation, and durability.
My question is specifically about the phrase **guaranteed by transactions**. In my experience these properties are not taken care of by the transaction automatically; as Java developers we need to ensure that these criteria are met.
Let's go through each property:
Atomicity: Assume that when we create a customer, an account must be created too, as it is compulsory. Now suppose that during the transaction the customer gets created, but some exception occurs during account creation. The developer can now go two ways: either he rolls back the complete transaction (atomicity is met in this case) or he commits the transaction, so the customer is created but not the account (which violates atomicity). So the responsibility lies with the developer?
Consistency: The same reasoning holds for consistency too.
Isolation: As per the definition, isolation makes a transaction execute without interference from other processes or transactions. But this is only achieved when we set the isolation level to Serializable. Otherwise, at levels like Read Committed or Read Uncommitted, changes are visible to other transactions. So the responsibility lies with the developer to make it really isolated with Serializable?
Durability: If we commit the transaction, then even if the application crashes, it should be committed on restart of the application. Not sure if this needs to be taken care of by the developer or by the database vendor/transaction?
So as per my understanding these ACID properties are not guaranteed automatically; rather, we as developers should achieve them. Please let me know whether my understanding of each point above is correct. I would appreciate it if you folks could reply for each point (yes/no will also do).
As per my understanding, Read Committed should be the most logical isolation level in most applications, though it depends on the requirements too.
Transactions guarantee ACID, more or less:
1) Atomicity. A transaction guarantees that all changes are made or none of them. But you need to mark the start and end of the transaction yourself and perform the commit or rollback (a plain-JDBC sketch follows this list). Depending on the technology you use (EJB...), transactions can be container-managed, with the start and end bound to the whole method you are writing. You can then control by configuration whether an invoked method requires a new transaction, an existing one, no transaction...
2) Consistency. Guaranteed by atomicity.
3) Isolation. You must define the isolation level your application needs. The default value depends on the database, the container... The most common one is READ COMMITTED. Be careful with locks, as they can cause deadlocks depending on your logic and isolation level.
4) Durability. Managed entirely by the database. If your commit executes without error, nearly all databases guarantee the durability of the changes, but some scenarios can break that guarantee (writes to disk being cached in memory and flushed later...).
In general, you should be aware of transactions and either configure them in the container or declare the start and end (commit, rollback) in code.
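As a plain-JDBC sketch of point 1 (and of the customer/account example from the question), with made-up table and column names:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class CustomerRegistration {

    // Hypothetical customer/account tables, matching the example in the question.
    public static void createCustomerWithAccount(Connection conn, String name) throws SQLException {
        conn.setAutoCommit(false); // start the unit of work explicitly
        try {
            try (PreparedStatement c = conn.prepareStatement(
                    "INSERT INTO customer (name) VALUES (?)")) {
                c.setString(1, name);
                c.executeUpdate();
            }
            try (PreparedStatement a = conn.prepareStatement(
                    "INSERT INTO account (customer_name, balance) VALUES (?, 0)")) {
                a.setString(1, name);
                a.executeUpdate();
            }
            conn.commit();   // both rows become visible together
        } catch (SQLException e) {
            conn.rollback(); // neither row is kept, so atomicity is preserved
            throw e;
        }
    }
}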
Database transactions are atomic: They either happen in their entirety or not at all. By itself, this says nothing about the atomicity of business transactions. There are various strategies to map business transactions to database transactions. In the simplest case, a business transaction is implemented by one database transaction (where a business transaction is aborted by rolling back the database one). Then, atomicity of database transactions implies atomicity of business transactions. However, things get tricky once business transactions span several database transactions ...
See above.
Your statement is correct. Often, the weaker guarantees are sufficient to prove correctness.
Database transactions are durable (unless there is a hardware failure): if the transaction has committed, its effect will persist until other transactions change the data. However, the calling code might not learn whether a transaction has committed if the database, or the network between the database and the calling code, fails. Therefore
If we commit the transaction, then even if the application crashes, it should be committed on restart of the application.
is wrong. If the transaction has committed, there is nothing left to do.
To summarize, the database does give strong guarantees - about the behaviour of the database. Obviously, it cannot give guarantees about the behaviour of the entire application.