I am experiencing an issue while trying to use write-behind on caches connected to tables that have foreign key constraints between them. It seems the write-behind mechanism does not execute the updates/inserts in a deterministic order; instead it pushes all the collected changes for each cache consecutively, in some unknown order. But since the tables have foreign keys, the order of the operations matters: parent objects must be inserted/updated first, and children only after that (otherwise foreign key violations are thrown by the DB).
It seems that the current implementation tries to work around this problem on a trial-and-error basis (org.apache.ignite.cache.store.jdbc.CacheAbstractJdbcStore:888): it periodically retries to flush the changes for any cache whose flush hit a constraint violation. So the "child" cache keeps retrying its flush until the "parent" cache has been flushed first. Eventually the data does reach the DB, but with complex hierarchical tables it takes many unsuccessful attempts until the correct order is "found". This results in poor performance and unnecessary hammering of the DB.
Do you have any suggestions on how I could circumvent this issue?
(Initially I was trying write-through, but it resulted in VERY poor performance, because CacheAbstractJdbcStore seemingly opens a new prepared statement for each insert/update operation.)
With write-behind the order of store updates is undefined because each node writes independently and asynchronously. If you have foreign key constraints, you should use write-through.
As for write-through performance, CacheAbstractJdbcStore operates with a configurable DataSource, so it depends on its implementation whether a new connection is opened each time or not. If you use a pooled implementation, this will not happen.
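For illustration, here is a minimal sketch of wiring a pooled DataSource into the JDBC store so that write-through reuses connections. It assumes HikariCP and the POJO store factory; the Person value class, cache name and JDBC URL are made up, and the field/type mappings are omitted.

import javax.sql.DataSource;
import org.apache.ignite.cache.store.jdbc.CacheJdbcPojoStoreFactory;
import org.apache.ignite.configuration.CacheConfiguration;
import com.zaxxer.hikari.HikariDataSource;

public class PooledStoreConfig {
    // Hypothetical value class; JDBC type mappings would be configured elsewhere.
    public static class Person { }

    public static CacheConfiguration<Long, Person> personCacheConfig() {
        // Pooled DataSource (HikariCP assumed) so the store reuses connections.
        HikariDataSource ds = new HikariDataSource();
        ds.setJdbcUrl("jdbc:postgresql://db-host:5432/app"); // hypothetical URL
        ds.setMaximumPoolSize(16);

        CacheJdbcPojoStoreFactory<Long, Person> storeFactory = new CacheJdbcPojoStoreFactory<>();
        storeFactory.setDataSourceFactory(() -> ds); // hand the pooled DataSource to the store

        CacheConfiguration<Long, Person> cfg = new CacheConfiguration<>("PersonCache");
        cfg.setCacheStoreFactory(storeFactory);
        cfg.setWriteThrough(true); // synchronous store updates keep the per-operation FK ordering
        cfg.setReadThrough(true);
        return cfg;
    }
}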
Related
I have two tables with the model below:
CREATE TABLE IF NOT EXISTS INV (
CODE TEXT,
PRODUCT_CODE TEXT,
LOCATION_NUMBER TEXT,
QUANTITY DECIMAL,
CHECK_INDICATOR BOOLEAN,
VERSION BIGINT,
PRIMARY KEY ((LOCATION_NUMBER, PRODUCT_CODE)));
CREATE TABLE IF NOT EXISTS LOOK_INV (
LOCATION_NUMBER TEXT,
CHECK_INDICATOR BOOLEAN,
PRODUCT_CODE TEXT,
CHECK_INDICATOR_DDTM TIMESTAMP,
PRIMARY KEY ((LOCATION_NUMBER), CHECK_INDICATOR, PRODUCT_CODE))
WITH CLUSTERING ORDER BY (CHECK_INDICATOR ASC, PRODUCT_CODE ASC);
I have a business operation where I need to update CHECK_INDICATOR in both tables and QUANTITY in the INV table.
As CHECK_INDICATOR is part of the key in the LOOK_INV table, I need to delete the row first and insert a new row.
Below are the three operations I need to perform in a batch fashion (either all are executed successfully or none is executed):
Delete row from LOOK_INV table.
Insert row in LOOK_INV table.
Update QUANTITY and CHECK_INDICATOR in INV table.
As the INV table is accessed by multiple threads, I need to make sure, before updating an INV row, that it has not been changed since it was last read.
I am using an LWT to update the INV table (via the VERSION column) and a batch for the deletion and insertion in the LOOK_INV table. I want to put all three operations in one batch, but since the LWT cannot be combined with the other statements in a batch, I have to execute them in the fashion described above.
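Roughly, the flow looks like this (a simplified sketch assuming the DataStax Java driver; variable names are placeholders):

import java.util.Date;
import com.datastax.driver.core.BatchStatement;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

public class InventoryWriter {
    public void updateInventory(Session session, String loc, String product,
                                boolean oldFlag, boolean newFlag,
                                java.math.BigDecimal newQty, long oldVersion) {
        // 1) Conditional (LWT) update of INV, guarded by the VERSION column.
        ResultSet rs = session.execute(
            "UPDATE inv SET quantity = ?, check_indicator = ?, version = ? " +
            "WHERE location_number = ? AND product_code = ? IF version = ?",
            newQty, newFlag, oldVersion + 1, loc, product, oldVersion);
        if (!rs.wasApplied()) {
            return; // someone else changed the row since the last read; re-read and retry
        }

        // 2) Separate (non-LWT) batch for LOOK_INV: delete the old row, insert the new one.
        BatchStatement batch = new BatchStatement();
        batch.add(new SimpleStatement(
            "DELETE FROM look_inv WHERE location_number = ? AND check_indicator = ? AND product_code = ?",
            loc, oldFlag, product));
        batch.add(new SimpleStatement(
            "INSERT INTO look_inv (location_number, check_indicator, product_code, check_indicator_ddtm) " +
            "VALUES (?, ?, ?, ?)",
            loc, newFlag, product, new Date()));
        session.execute(batch);
    }
}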
The problem with this approach is that in some scenarios the batch executes successfully but updating the INV table results in a timeout exception, and the data becomes inconsistent across the two tables.
Is there any feature provided by Cassandra to handle this type of scenario elegantly?
Caution with Lightweight Transactions (LWT)
Lightweight Transactions are currently considered a Cassandra anti-pattern because of the kind of performance issues you are experiencing.
Here is a bit of context to explain.
Cassandra does not provide RDBMS-style ACID transactions with rollback or locking mechanisms. It does not provide locking because of a fundamental constraint on all kinds of distributed data stores called the CAP theorem. It states that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:
Consistency (all nodes see the same data at the same time)
Availability (a guarantee that every request receives a response about whether it was successful or failed)
Partition tolerance (the system continues to operate despite arbitrary message loss or failure of part of the system)
Because of this, Cassandra is not good for atomic operations and you should not use Cassandra for this purpose.
It does provide lightweight transactions, which can replace locking in some cases. But because the Paxos protocol (the basis for LWT) involves a series of actions between nodes, there are multiple round trips between the node that proposes an LWT and the other replicas that are part of the transaction.
This has an adverse impact on performance and is one reason for the WriteTimeoutException error. In this situation you can't know whether the LWT operation has been applied, so you need to retry it in order to fall back to a stable state. Because LWTs are so expensive, the driver will not automatically retry it for you.
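A hedged sketch of that manual handling with the DataStax Java driver (table and column names follow the INV example above and are assumptions): if the timeout occurred during the CAS phase, re-read at SERIAL consistency to learn whether the condition was actually applied before deciding to retry.

import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.WriteType;
import com.datastax.driver.core.exceptions.WriteTimeoutException;

public class InvUpdater {
    /** Returns true if the versioned update took effect, recovering from a CAS write timeout. */
    public boolean updateQuantity(Session session, String loc, String product,
                                  long expectedVersion, java.math.BigDecimal newQty) {
        try {
            ResultSet rs = session.execute(
                "UPDATE inv SET quantity = ?, version = ? " +
                "WHERE location_number = ? AND product_code = ? IF version = ?",
                newQty, expectedVersion + 1, loc, product, expectedVersion);
            return rs.wasApplied();
        } catch (WriteTimeoutException e) {
            if (e.getWriteType() == WriteType.CAS) {
                // Outcome unknown: re-read at SERIAL consistency to see the committed Paxos state.
                Row row = session.execute(new SimpleStatement(
                        "SELECT version FROM inv WHERE location_number = ? AND product_code = ?",
                        loc, product).setConsistencyLevel(ConsistencyLevel.SERIAL)).one();
                return row != null && row.getLong("version") == expectedVersion + 1;
            }
            throw e;
        }
    }
}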
LWT comes with big performance penalties if used frequently, and we see some clients with big timeout issues due to using LWTs.
Lightweight transactions are generally a bad idea and should be used infrequently.
If you do require ACID properties on part of your workload but still need it to scale, consider shifting that part of your load to CockroachDB.
In summary, if you do need ACID transactions it is generally a lot easier to bring a second technology in.
I am building a web application that inserts records into a database. I validate records before inserting them. If, between my validation check and the insertion, another application changes the database state such that a unique key constraint violation occurs when I attempt to insert the records I have just validated, how can I avoid this kind of problem? I am using an Oracle database and my development language is Java.
Basically, you can't unless you change your constraints. You have several options:
You keep the unique constraint and deal with the database exception in your Java code (see the sketch after this list). Race conditions can happen, and you have to deal with them.
You lock the entire table as soon as someone enters "insertion mode" in your app, effectively limiting inserts to one at a time. This would mean blocking other users in your application from entering edit mode until the first one is done. Probably not a good idea, but can work when you have very few users.
Remove the constraint. This might seem drastic, but think about it: do you really need globally unique entries in those fields? Or can you work around that by including an additional column in your key? This could be an artificial counter, effectively making each row unique again, or maybe just the user ID, so that the unique constraint is only checked within each user.
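A minimal sketch of option 1 (the table, columns and SQL are assumptions; Oracle raises ORA-00001, which JDBC surfaces as SQLIntegrityConstraintViolationException):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.SQLIntegrityConstraintViolationException;

public class UserDao {
    /** Returns true if the row was inserted, false if another session won the race. */
    public boolean insertUser(Connection con, String email, String name) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(
                "INSERT INTO users (email, name) VALUES (?, ?)")) {
            ps.setString(1, email);
            ps.setString(2, name);
            ps.executeUpdate();
            return true;
        } catch (SQLIntegrityConstraintViolationException e) {
            // Unique constraint violated: someone inserted the same key between
            // our validation and this insert. Treat it as a normal outcome.
            return false;
        }
    }
}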
I have a table like (id INTEGER, sometext VARCHAR(255), ....) with id as the primary key and a UNIQUE constraint on sometext. It gets used in a web server, where a request needs to find the id corresponding to a given sometext if it exists, otherwise a new row gets inserted.
This is the only operation on this table. There are no updates and no other operations on it. Its sole purpose is to persistently number the encountered values of sometext. This means that I can't drop the id and use sometext as the PK.
I do the following:
First, I consult my own cache in order to avoid any DB access. Nearly always, this works and I'm done.
Otherwise, I use Hibernate Criteria to find the row by sometext. Usually, this works and again, I'm done.
Otherwise, I need to insert a new row.
This works fine, except when there are two overlapping requests with the same sometext. Then a ConstraintViolationException results. I'd need something like INSERT IGNORE or INSERT ... ON DUPLICATE KEY UPDATE (MySQL syntax) or MERGE (Firebird syntax).
I wonder what are the options?
AFAIK Hibernate merge works on the PK only, so it's inappropriate. I guess a native query might or might not help, as it may or may not be committed when the second INSERT takes place.
Just let the database handle the concurrency. Start a secondary transaction purely for inserting the new row. If it fails with a ConstraintViolationException, just roll that transaction back and read the new row.
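A rough sketch of that pattern, assuming Hibernate 5.x with the native Session API; SomeText is a hypothetical mapped entity with an id and a unique text column:

import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;
import org.hibernate.exception.ConstraintViolationException;

public class SomeTextDao {
    private final SessionFactory sessionFactory;

    public SomeTextDao(SessionFactory sessionFactory) {
        this.sessionFactory = sessionFactory;
    }

    /** Returns the id for the given text, inserting it in a short secondary transaction if needed. */
    public Long idFor(String text) {
        Session session = sessionFactory.openSession();
        Transaction tx = session.beginTransaction();
        try {
            SomeText entity = new SomeText(text); // hypothetical mapped entity
            session.save(entity);
            tx.commit();
            return entity.getId();
        } catch (ConstraintViolationException e) {
            tx.rollback();
            // A concurrent request inserted the same text first: read their row instead.
        } finally {
            session.close();
        }
        try (Session fresh = sessionFactory.openSession()) {
            return fresh.createQuery(
                    "select t.id from SomeText t where t.text = :txt", Long.class)
                .setParameter("txt", text)
                .uniqueResult();
        }
    }
}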
Not sure this scales well if the likelihood of a duplicate is high; it is a lot of extra work if some percentage (depending on the database) of transactions have to fail the insert and then re-select.
A secondary transaction minimizes the length of time the transaction that adds the new text is held open. Assuming the database handles it correctly, the thread-1 transaction might still cause the thread-2 select/insert to block until the thread-1 transaction is committed or rolled back. Overall database design can also affect transaction throughput.
I don't question why sometext can't be the PK so much as wonder why you need to break it out at all. Of course, with large volumes you might save substantial space if the sometext values are large; it almost seems like you're trying to emulate a Lucene index to get a complete list of text values.
I'm no expert in databases, so what I know about queries is that they are the way to read from or write to a database.
With eventual consistency, a read may return stale data.
On a write, the first data node is updated, but the other nodes need some time to catch up.
With strong consistency, a read is blocked until the data has been brought to its latest version (I'm really not sure about this part, so please correct me if I got it wrong).
On a write, all read operations are blocked until the data node has been updated to its latest version.
So if I write data with eventual consistency and then use an ancestor query to read it, will I get the latest version?
If I use an ancestor query to update, will all eventually consistent read operations get the latest version?
Update
I think transactions exist so that if there are multiple modification requests for the same data, one will succeed and the others will fail. After that, the modified data takes some time to be replicated across all data centers, so a successful transaction does not mean all read queries will return the latest version (correct me if I'm wrong).
If you use what you call an "ancestor query", you're working in a transaction: either the transaction terminates successfully, in which case all subsequent reads will get the values as updated by the transaction, or else the transaction fails, in which case none of the changes made by the transaction will be seen (this all-or-nothing property is often referred to as a transaction being "atomic"). In particular, you do get strong consistency this way, not just eventual consistency.
The cost can be large, in terms of performance and scalability. In particular, an application should not update an entity group (any and all entities descending from a common ancestor) more than once a second, which can be a very constraining limit for a highly scalable application.
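For concreteness, a minimal sketch of a transactional read-modify-write on one entity group with the low-level Java Datastore API (the kind and property names are made up):

import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.EntityNotFoundException;
import com.google.appengine.api.datastore.Key;
import com.google.appengine.api.datastore.KeyFactory;
import com.google.appengine.api.datastore.Transaction;

public class CounterUpdater {
    public void increment(String accountId, String itemName) throws EntityNotFoundException {
        DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
        Key parent = KeyFactory.createKey("Account", accountId);      // entity-group root
        Key itemKey = KeyFactory.createKey(parent, "Item", itemName); // child in the same group

        Transaction txn = ds.beginTransaction();
        try {
            Entity item = ds.get(txn, itemKey);
            item.setProperty("count", (Long) item.getProperty("count") + 1);
            ds.put(txn, item);
            txn.commit(); // all-or-nothing; later reads in this group see the new value
        } finally {
            if (txn.isActive()) {
                txn.rollback();
            }
        }
    }
}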
The online docs include a large variety of tips, tricks and advice on how to deal with this -- you could start at https://cloud.google.com/datastore/docs/articles/balancing-strong-and-eventual-consistency-with-google-cloud-datastore/ and continue with the "additional resources" this article lists at the end.
One simple idea that often suffices is that (differently from queries) getting a specific entity from its key is strongly consistent without needing transactions, and memcache is also strongly consistent; writing a modified entity gives you its new key, so you can stash that key into memcache and have other parts of your code fetch the modified entity from that key, rather than relying on queries. This has limits, of course, because memcache doesn't give you unbounded space -- but it's a useful idea to keep in mind, nevertheless, in many practical cases.
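A small sketch of that key-stashing idea with the low-level Java Datastore and Memcache APIs (the cache key name and entity kind are arbitrary):

import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.EntityNotFoundException;
import com.google.appengine.api.datastore.Key;
import com.google.appengine.api.datastore.KeyFactory;
import com.google.appengine.api.memcache.MemcacheService;
import com.google.appengine.api.memcache.MemcacheServiceFactory;

public class LatestOrder {
    private static final String CACHE_KEY = "order:last-modified"; // arbitrary cache key

    public Key saveOrder(String status) {
        DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
        MemcacheService cache = MemcacheServiceFactory.getMemcacheService();
        Entity order = new Entity("Order");
        order.setProperty("status", status);
        Key key = ds.put(order);                            // write the entity
        cache.put(CACHE_KEY, KeyFactory.keyToString(key));  // stash its key for readers
        return key;
    }

    public Entity readLatest() throws EntityNotFoundException {
        DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
        MemcacheService cache = MemcacheServiceFactory.getMemcacheService();
        String encoded = (String) cache.get(CACHE_KEY);
        if (encoded == null) {
            return null; // fall back to a (possibly eventually consistent) query here
        }
        return ds.get(KeyFactory.stringToKey(encoded));      // get-by-key is strongly consistent
    }
}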
With GAE the only way to be consistent is to use a transaction; within a transaction you can update and then query the last update, but it's slower.
For me, using ancestors just composes the primary key, and that's all.
There doesn't seem to be any direct way to know the number of affected rows in Cassandra for update and delete statements.
For example if I have a query like this:
DELETE FROM xyztable WHERE PKEY IN (1,2,3,4,5,6);
Now, of course, since I've passed 6 keys, it is obvious that 6 rows will be affected.
But, like in the RDBMS world, is there any way to know the affected rows for update/delete statements in the DataStax driver?
I've read here that Cassandra gives no feedback on write operations.
Other than that, I could not find any other discussion on this topic through Google.
If that's not possible, can I be sure that with the type of query given above, it will either delete all or fail to delete all?
In the eventually consistent world, you can look at these operations as saving a delete request and, depending on the requested consistency level, waiting for confirmation from several nodes that the request has been accepted. The request is then delivered to the other nodes asynchronously.
Since there is no dependency on anything like foreign keys, then nothing should stop data from being deleted if the request was successfully accepted by the cluster.
However, there are a lot of ifs. For example, deleting data with a consistency level one, successfully accepted by one node, followed by an immediate node hard failure may result in the loss of that delete if it was not replicated before the failure.
Another example: during the deletion, one node was down and stayed down for a significant amount of time, more than gc_grace_seconds, i.e., longer than it takes for the tombstones to be removed along with the deleted data. If that node then recovers, all of the data that was deleted from the rest of the cluster, but not from this node, will suddenly be brought back into the cluster.
So in order to avoid these situations and consider operations successful and final, a Cassandra admin needs to implement some measures, including regular repair jobs (to make sure all nodes are up to date). Applications also need to decide what matters more: faster performance with consistency level ONE at the expense of possible data loss, versus lower performance with higher consistency levels but less chance of data loss.
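For example, with the DataStax Java driver the trade-off is made per statement (the table and key names are just placeholders borrowed from the question):

import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

public class DeleteWithConsistency {
    /** Fastest option: a single replica acknowledges, so the delete can be lost on failure. */
    public void deleteFast(Session session, int pkey) {
        session.execute(new SimpleStatement(
            "DELETE FROM xyztable WHERE pkey = ?", pkey)
            .setConsistencyLevel(ConsistencyLevel.ONE));
    }

    /** Slower but safer: a majority of replicas must acknowledge the delete. */
    public void deleteSafe(Session session, int pkey) {
        session.execute(new SimpleStatement(
            "DELETE FROM xyztable WHERE pkey = ?", pkey)
            .setConsistencyLevel(ConsistencyLevel.QUORUM));
    }
}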
There is no way to do this in Cassandra because the model for writes, deletes, and updates in Cassandra is basically the same. In all of those cases a cell is added to the table which has either the new information or information about the delete. This is done without any inspection of the current DB state.
Without checking the rest of the replicas and doing a full merge on the row, there is no way to tell whether any operation will actually affect the current read state of the database.
This leads to the oft-cited anti-pattern of "read before write". In Cassandra you are meant to write as fast as possible, and if you need history, use a data structure that preserves a log of modifications rather than just the current state.
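A sketch of that idea: instead of (or in addition to) mutating the current state, append each change to a history table keyed by a timeuuid, so the modification log itself is queryable (the table and column names are invented):

import com.datastax.driver.core.Session;

public class ChangeLog {
    public void createTable(Session session) {
        session.execute(
            "CREATE TABLE IF NOT EXISTS xyztable_history (" +
            "  pkey int, changed_at timeuuid, op text, payload text, " +
            "  PRIMARY KEY (pkey, changed_at)" +
            ") WITH CLUSTERING ORDER BY (changed_at DESC)");
    }

    public void record(Session session, int pkey, String op, String payload) {
        // now() generates a timeuuid server-side; each change becomes its own row.
        session.execute(
            "INSERT INTO xyztable_history (pkey, changed_at, op, payload) VALUES (?, now(), ?, ?)",
            pkey, op, payload);
    }
}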
There is one option for doing queries like this: the CAS syntax (conditional statements such as DELETE ... IF EXISTS or UPDATE ... IF column = value), but this is a very expensive operation compared to a normal write and should be used sparingly.
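A small sketch of what that looks like with the Java driver (the schema is the asker's hypothetical xyztable); the conditional form is the only one that reports whether anything was actually changed:

import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;

public class ConditionalDelete {
    /** Returns true only if a row with this key existed and was deleted. */
    public boolean deleteIfExists(Session session, int pkey) {
        ResultSet rs = session.execute(
            "DELETE FROM xyztable WHERE pkey = ? IF EXISTS", pkey);
        return rs.wasApplied(); // regular (non-IF) deletes give no such feedback
    }
}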