Forgive me if it's an obvious question, but I read the documentation from top to bottom, and I'm still not sure what is the answer to this question:
If I have a datastore transaction that reads the number of a counter and increments it, can I be certain that 2 parallel servlets won't interleave each other? In other words, the docs only say that the transaction either fails or succeed atomically, but it does not say the transaction locks the data for other servlets, so what happens when 2 servlets access the same entity at the same time?
If indeed the transactions are not thread-safe, should I just use synchronized when accessing shared datastore counters?
Transactions are "thread-safe" as long as you do writes on for entities that have a common ancestor.
Take a look at the GAE low level datastore API and once you understand how Entities work, it's pretty straightforward.
https://developers.google.com/appengine/docs/java/datastore/entities
Related
I was asked a use case in one of the interviews where the interviewer asked:
Suppose you have a bank account and you are doing online shopping
Your brother has your ATM card and he is about to make a transaction
Your father has gone to give a withdrawal cheque in the bank. All
these 3 transactions happen at the same time.
How will these transactions be managed such that the balance is not overdrawn?
Me: I said I would use synchronisation on the account balance object
Interviewer : Not satisfied. Next question.
Can someone please explain what could have been the answer? Would a database lock or transaction isolation be a better approach?
I am a beginner in JAVA so forgive my naivety.
Concurrency control is a database management systems (DBMS) concept
that is used to address conflicts with the simultaneous accessing or
altering of data that can occur with a multi-user system. concurrency
control, when applied to a DBMS, is meant to coordinate simultaneous
transactions while preserving data integrity. 1 The Concurrency is
about to control the multi-user access of Database
Basic timestamping is a concurrency control mechanism that eliminates deadlock. This method doesn’t use locks to control concurrency, so it is impossible for deadlock to occur. According to this method a unique timestamp is assigned to each transaction, usually showing when it was started. This effectively allows an age to be assigned to transactions and an order to be assigned. Data items have both a read-timestamp and a write-timestamp. These timestamps are updated each time the data item is read or updated respectively.
Problems arise in this system when a transaction tries to read a data item which has been written by a younger transaction. This is called a late read. This means that the data item has changed since the initial transaction start time and the solution is to roll back the timestamp and acquire a new one. Another problem occurs when a transaction tries to write a data item which has been read by a younger transaction. This is called a late write. This means that the data item has been read by another transaction since the start time of the transaction that is altering it. The solution for this problem is the same as for the late read problem. The timestamp must be rolled back and a new one acquired.
Adhering to the rules of the basic timestamping process allows the transactions to be serialized and a chronological schedule of transactions can then be created. Timestamping may not be practical in the case of larger databases with high levels of transactions. A large amount of storage space would have to be dedicated to storing the timestamps in these cases.
Source:-http://databasemanagement.wikia.com/wiki/Concurrency_Control
Normally, when you start a project there are some kind of behaviour that must be handled. Transaction isolation is one of these, so I would evaluate the most appropriate transaction level in this case. To know more, I would recommend you to read about view serialization. This is one of the way of doing this.
As I understand it, all transactions are Thread-bound (i.e. with the context stored in ThreadLocal). For example if:
I start a transaction in a transactional parent method
Make database insert #1 in an asynchronous call
Make database insert #2 in another asynchronous call
Then that will yield two different transactions (one for each insert) even though they shared the same "transactional" parent.
For example, let's say I perform two inserts (and using a very simple sample, i.e. not using an executor or completable future for brevity, etc.):
#Transactional
public void addInTransactionWithAnnotation() {
addNewRow();
addNewRow();
}
Will perform both inserts, as desired, as part of the same transaction.
However, if I wanted to parallelize those inserts for performance:
#Transactional
public void addInTransactionWithAnnotation() {
new Thread(this::addNewRow).start();
new Thread(this::addNewRow).start();
}
Then each one of those spawned threads will not participate in the transaction at all because transactions are Thread-bound.
Key Question: Is there a way to safely propagate the transaction to the child threads?
The only solutions I've thought of to solve this problem:
Use JTA or some XA manager, which by definition should be able to do
this. However, I ideally don't want to use XA for my solution
because of it's overhead
Pipe all of the transactional work I want performed (in the above example, the addNewRow() function) to a single thread, and do all of the prior work in the multithreaded fashion.
Figuring out some way to leverage InheritableThreadLocal on the Transaction status and propagate it to the child threads. I'm not sure how to do this.
Are there any more solutions possible? Even if it's tastes a little bit of like a workaround (like my solutions above)?
The JTA API has several methods that operate implicitly on the current Thread's Transaction, but it doesn't prevent you moving or copying a Transaction between Threads, or performing certain operations on a Transaction that's not bound to the current (or any other) Thread. This causes no end of headaches, but it's not the worst part...
For raw JDBC, you don't have a JTA Transaction at all. You have a JDBC Connection, which has its own ideas about transaction context. In which case, the transaction is Connection bound, not thread bound. Pass the Connection around and the tx goes with it. But Connections aren't necessarily threadsafe and are probably a performance bottleneck anyhow, so sharing one between multiple concurrent threads doesn't really help you. You likely need multiple Connections that think they are in the same Transaction, which means you need XA, since that's how the db identifies such cases. At which point you're back to JTA, but now with a JCA in the picture to handle the Connection management properly. In short, you've reinvented the JavaEE application server.
For frameworks that layer on JDBC e.g. ORMs like Hibernate, you have an additional complication: their abstractions are not necessarily threadsafe. So you can't have a Session that is bound to multiple Threads concurrently. But you can have multiple concurrent Sessions that each participate in the same XA transaction.
As usual it boils down to Amdahl's law. If the speedup you get from using multiple Connections per tx to allow for multiple concurrent Threads to share the db I/O work is large relative to what you get from batching, then the overhead of XA is worthwhile. If the speedup is in local computation and the db I/O is a minor concern, then a single Thread that handles the JDBC Connection and offloads non-IO computation work to a Thread pool is the way to go.
First, a clarification: if you want to speed up several inserts of the same kind, as your example suggests, you will probably get the best performance by issuing the inserts in the same thread and using some type of batch inserting. Depending on your DBMS there are several techniques available, look at:
Efficient way to do batch INSERTS with JDBC
What's the fastest way to do a bulk insert into Postgres?
As for your actual question, I would personally try to pipe all the work to a worker thread. It is the simplest option as you don't need to mess with either ThreadLocals or transaction enlistment/delistment. Furthermore, once you have your units of work in the same thread, if you are smart you might be able to apply the batching techniques above for better performance.
Lastly, piping work to worker threads does not mean that you must have a single worker thread, you could have a pool of workers and achieve some parallelism if it is really beneficial to your application. Think in terms of producers/consumers.
I have a method in which some database insert operations are happening using hibernate and i want them to be thread safe. The method is getting some data in parametres and its a possiblity that sometimes two calls are made with same data at same point of time.
I can't lock those tables because of performance degradation. Can anyone suggest making the method as synchronized will solve issue?
Synchronizing a method will ensure that it can only be accessed by one thread at a time. If this method is your only means of writing to the database, then yes, this will stop two threads from writing at the same time. However, you still have to deal with the fact that you have multiple insert operations with the same data.
You should let Hibernate handle the concurrency, that's what it is meant to do. Don't assume Hibernate will lock anything: it supports optimistic transactions for exactly this purpose. Quote from the above link:
The only approach that is consistent with high concurrency and high scalability, is optimistic concurrency control with versioning. Version checking uses version numbers, or timestamps, to detect conflicting updates and to prevent lost updates. Hibernate provides three possible approaches to writing application code that uses optimistic concurrency.
Database Concurrency is handled by transactions. Transactions have the Atomic Consistent Isolated Durable (ACID) properties. They provide isolation between programs accessing a database concurrently. In the Hibernate DAO template of spring framework there are single line methods for CRUD operations on the database. When used individually these don't need to be synchronized by method. Spring provides declarative (XML), programmatic and annotation meta-data driven transaction management if you need to declare "your method" as transactional with specific propagation settings, rollbackFor settings, isolation settings. So in "your method" you can do multiple save,update,deletes etc and the ORM will ensure that it is executed with the transaction settings you have given in the meta-data.
Another issue is that the thread has to have the lock on all the objects that are taking part in the transaction.Otherwise the transaction might fail or the ORM will persist stale data. In another situation it can result in a deadlock because of lock-ordering. I think this is what really answers your question.
Both objects a and b have an instance variable of the type Lock. A boolean flag can be used to indicate the success of the transaction. The client code can retry the same transaction if it fails.
if (a.lock.tryLock()) {
try {
if (b.lock.tryLock()) {
try {
// persist or update object a and b
} finally {
b.lock.unlock();
}
}
} finally {
a.lock.unlock();
}
}
The problem with using synchronized methods is that it locks up the entire Service or DAO class making other service methods unavailable to other threads. By using individual locks on objects we can gain the advantage of fine grained concurrency.
No. This method probably uses another methods and objects, which may be not thread safe. synchronized makes threads to use that's method's object monitor only once at a time, so it makes thread-safe a method with respect to the object.
If you are sure that all other threads use shared functionality only with this method, then making it synchronized may be sufficient.
Choosing the best strategy depends on the architecture, sometimes to increase performance seems to be easier to use the trick like method synchronization, but this is bad approach.
There's no doubts, you should use transactions, and if with that strategy you're facing performance issues you should optimize your db queries or db structure.
Please remember that "Synchronization" should be as much as possible atomic.
I have a problem when I try to persist objects using multiple threads.
Details :
Suppose I have an object PaymentOrder which has a list of PaymentGroup (One to Many relationship) and PaymentGroup contains a list of CreditTransfer(One to Many Relationship again).
Since the number of CreditTransfer is huge (in lakhs), I have grouped it based on PaymentGroup(based on some business logic)
and creating WORKER threads(one thread for each PaymentGroup) to form the PaymentOrder objects and commit in database.
The problem is, each worker thread is creating one each of PaymentOrder(which contains a unique set of PaymentGroups).
The primary key for all the entitties are auto generated.
So there are three tables, 1. PAYMENT_ORDER_MASTER, 2. PAYMENT_GROUPS, 3. CREDIT_TRANSFERS, all are mapped by One to Many relationship.
Because of that when the second thread tries to persist its group in database, the framework tries to persist the same PaymentOrder, which previous thread committed,the transaction fails due to some other unique field constraints(the checksum of PaymentOrder).
Ideally it must be 1..n..m (PaymentOrder ->PaymentGroup-->CreditTransfer`)
What I need to achieve is if there is no entry of PaymentOrder in database make an entry, if its there, dont make entry in PAYMENT_ORDER_MASTER, but only in PAYMENT_GROUPS and CREDIT_TRANSFERS.
How can I ovecome this problem, maintaining the split-master-payment-order-using-groups logic and multiple threads?
You've got options.
1) Primitive but simple, catch the key violation error at the end and retry your insert without the parents. Assuming your parents are truly unique, you know that another thread just did the parents...proceed with children. This may perform poorly compared to other options, but maybe you get the pop you need. If you had a high % parents with one child, it would work nicely.
2) Change your read consistency level. It's vendor specific, but you can sometimes read uncommitted transactions. This would help you see the other threads' work prior to commit. It isn't foolproof, you still have to do #1 as well, since another thread can sneak in after the read. But it might improve your throughput, at a cost of more complexity. Could be impossible, based on RDBMS (or maybe it can happen but only at DB level, messing up other apps!)
3) Implement a work queue with single threaded consumer. If the main expensive work of the program is before the persistence level, you can have your threads "insert" their data into a work queue, where the keys aren't enforced. Then have a single thread pull from the work queue and persist. The work queue can be in memory, in another table, or in a vendor specific place (Weblogic Queue, Oracle AQ, etc). If the main work of the program is before the persistence, you parallelize THAT and go back to a single thread on the inserts. You can even have your consumer work in "batch insert" mode. Sweeeeeeeet.
4) Relax your constraints. Who cares really if there are two parents for the same child holding identical information? I'm just asking. If you don't later need super fast updates on the parent info, and you can change your reading programs to understand it, it can work nicely. It won't get you an "A" in DB design class, but if it works.....
5) Implement a goofy lock table. I hate this solution, but it does work---have your thread write down that it is working on parent "x" and nobody else can as it's first transaction (and commit). Typically leads to the same problem (and others--cleaning the records later, etc), but can work when child inserts are slow and single row insert is fast. You'll still have collisions, but fewer.
Hibernate sessions are not thread-safe. JDBC connections that underlay Hibernate are not thread safe. Consider multithreading your business logic instead so that each thread would use it's own Hibernate session and JDBC connection. By using a thread pool you can further improve your code by adding ability of throttling the number of the simultaneous threads.
Suppose that I have a method called doSomething() and I want to use this method in a multithreaded application (each servlet inherits from HttpServlet).I'm wondering if it is possible that a race condition will occur in the following cases:
doSomething() is not staic method and it writes values to a database.
doSomething() is static method but it does not write values to a database.
what I have noticed that many methods in my application may lead to a race condition or dirty read/write. for example , I have a Poll System , and for each voting operation, a certain method will change a single cell value for that poll as the following:
[poll_id | poll_data ]
[1 | {choice_1 : 10, choice_2 : 20}]
will the JSP/Servlets app solve these issues by itself, or I have to solve all that by myself?
Thanks..
It depends on how doSomething() is implemented and what it actually does. I assume writing to the database uses JDBC connections, which are not threadsafe. The preferred way of doing that would be to create ThreadLocal JDBC connections.
As for the second case, it depends on what is going on in the method. If it doesn't access any shared, mutable state then there isn't a problem. If it does, you probably will need to lock appropriately, which may involve adding locks to every other access to those variables.
(Be aware that just marking these methods as synchronized does not fix any concurrency bugs. If doSomething() incremented a value on a shared object, then all accesses to that variable need to be synchronized since i++ is not an atomic operation. If it is something as simple as incrementing a counter, you could use AtomicInteger.incrementAndGet().)
The Servlet API certainly does not magically make concurrency a non-issue for you.
When writing to a database, it depends on the concurrency strategy in your persistence layer. Pessimistic locking, optimistic locking, last-in-wins? There's way more going on when you 'write to a database' that you need to decide how you're going to handle. What is it you want to have happen when two people click the button at the same time?
Making doSomething static doesn't seem to have too much bearing on the issue. What's happening in there is the relevant part. Is it modifying static variables? Then yes, there could be race conditions.
The servlet api will not do anything for you to make your concurrency problems disappear. Things like using the synchronized keyword on your servlets are a bad idea because you are basically forcing your threads to be processed one at a time and it ruins your ability to respond quickly to multiple users.
If you use Spring or EJB3, either one will provide threadlocal database connections and the ability to specify transactions. You should definitely check out one of those.
Case 1, your servlet uses some code that accesses a database. Databases have locking mechanisms that you should exploit. Two important reasons for this: the database itself might be used from other applications that read and write that data, it's not enough for your app to deal with contending with itself. And: your own application may be deployed to a scaled, clustered web container, where multiple copies of your code are executing on separate machines.
So, there are many standard patterns for dealing with locks in databases, you may need to read up on Pessimistic and Optimistic Locking.
The servlet API and JBC connection pooling gives you some helpful guarantees so that you can write your servlet code without using Java synchronisation provided your variables are in method scope, in concept you have
Start transaction (perhaps implicit, perhaps on entry to an ejb)
Get connection to DB ( Gets you a connection from pool, associated with your tran)
read/write/update code
Close connection (actually keeps it for your thread until your transaction commits)
Commit (again maybe implictly)
So your only real issue is dealing with any contentions in the DB. All of the above tends to be done rather more nicely using things such as JPA these days, but under the covers thats more or less what's happening.
Case 2: static method, this presumably implies that you now keep everything in a memory structure. This (barring remote invocation of some sort) impies a single JVM and you managing your own locking. Should your JVM or machine crash I guess you lose your data. If you care about your data then using a DB is probably better.
OR, how about a completely other approach: servlet simply records the "vote" by writing a message to a persistent JMS queue. Have some other processes pick up the votes from the queue and adds them up. You won't give immediate feedback to the voter this way, but you decouple the user's experience from the actual (in similar scenarios) quite complex processing .
I thing that the best solution for your problem is to use something like "synchronized" keyword and wait/notify!