Missing column that was just inserted in cassandra column family - java

We are constantly hitting a problem on our test cluster.
Cassandra configuration:
cassandra version: 2.2.12
nodes count: 6 (3 seed nodes, 3 non-seed nodes)
replication factor 1 (of course for prod we will use 3)
Configuration of the table where we see the problem:
CREATE TABLE "STATISTICS" (
key timeuuid,
column1 blob,
column2 blob,
column3 blob,
column4 blob,
value blob,
PRIMARY KEY (key, column1, column2, column3, column4)
) WITH COMPACT STORAGE
AND CLUSTERING ORDER BY (column1 ASC, column2 ASC, column3 ASC, column4 ASC)
AND caching = {
'keys':'ALL', 'rows_per_partition':'100'
}
AND compaction = {
'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'
};
Our java code details
java 8
cassandra driver: astyanax
app-nodes count: 4
So, what's happening:
Under high load our application does many inserts into Cassandra tables from all nodes.
During this we have one workflow that does the following with a single row in the STATISTICS table:
1. insert 3 columns from app-node-1
2. insert 1 column from app-node-2
3. insert 1 column from app-node-3
4. read all columns of the row on app-node-4
At the last step (4), when we read all columns, we are sure that the inserts of all columns are done (this is guaranteed by other checks that we have).
The problem is that sometimes (2-5 times per 100,000) at step 4 we read 4 columns instead of 5, i.e. we are missing the column that was inserted at step 2 or 3.
We even started re-reading these columns every 100 ms in a loop and still did not get the expected result. During this time we also checked the columns using cqlsh, with the same result: 4 instead of 5.
BUT, if we add any new column to this row, we immediately get the expected result: 6 columns, i.e. 5 columns from the workflow and 1 dummy.
So after inserting a dummy column we get the missing column that was inserted at step 2 or 3.
Moreover, when we look at the timestamp of the missing (and then appeared) column, it is very close to the time when this column was actually added from our app node.
The insertions from app-node-2 and app-node-3 are done at nearly the same time, so these two columns always end up with nearly the same timestamp, even if we insert the dummy column 1 minute after the first read of all columns at step 4.
With replication factor 3 we cannot reproduce this problem.
So the open questions are:
Is this maybe expected behavior of Cassandra when the replication factor is 1?
If it is not expected, what could be the potential reason?
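The polling check described above (re-reading the row every 100 ms until all columns are visible) can be sketched roughly as follows. This is only an illustration: `ColumnPoller`, `waitForColumns` and the `readColumnCount` callback are hypothetical names, not Astyanax API. The callback stands in for whatever read returns the number of columns currently visible in the row.

```java
import java.util.function.IntSupplier;

/** Hypothetical helper, not part of Astyanax: polls a read until enough columns are visible. */
final class ColumnPoller {

    /**
     * Calls readColumnCount up to maxAttempts times, sleeping intervalMillis
     * after each unsuccessful attempt. Returns true as soon as at least
     * `expected` columns are reported, false if that never happens.
     */
    static boolean waitForColumns(IntSupplier readColumnCount, int expected,
                                  int maxAttempts, long intervalMillis) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (readColumnCount.getAsInt() >= expected) {
                return true;
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt status and give up polling.
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }
}
```

In the scenario above the call would look like `waitForColumns(reader, 5, maxAttempts, 100)`, and the reported symptom corresponds to it returning false no matter how large `maxAttempts` is.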
UPDATE 1:
The following code is used to insert the column:
UUID uuid = <some uuid>;
short shortV = <some short>;
int intVal = <some int>;
String strVal = <some string>;
ColumnFamily<UUID, Composite> statisticsCF = ColumnFamily.newColumnFamily(
"STATISTICS",
UUIDSerializer.get(),
CompositeSerializer.get()
);
MutationBatch mb = keyspace.prepareMutationBatch();
ColumnListMutation<Composite> clm = mb.withRow(statisticsCF, uuid);
clm.putColumn(new Composite(shortV, intVal, strVal, null), true);
mb.execute();
UPDATE 2:
Continued testing/investigating.
When we caught this situation again, we immediately stopped (killed) our Java apps. After that we could consistently see in cqlsh that the particular row did not contain the inserted column.
To make it appear, we first tried nodetool flush on every Cassandra node:
pssh -h cnodes.txt /path-to-cassandra/bin/nodetool flush
Result: the same, the column did not appear.
Then we just restarted the Cassandra cluster and the column appeared.
UPDATE 3:
Tried to disable the Cassandra row cache by setting the row_cache_size_in_mb property to 0 (before it was 2 GB):
row_cache_size_in_mb: 0
After that, the problem was gone.
So the problem is probably in OHCProvider, which is used as the default cache provider.

Related

Different response for same method of DatabaseMetaData pointing to different databases?

When I invoke the method getIndexInfo( catalog, schema, table, true, false ), I receive a ResultSet slightly different from what is described:
With MySQL (5.7), I receive a ResultSet containing:
one row corresponding to the primary key column description
n rows corresponding to the unique columns description
With SQL Server (14.00), I receive a ResultSet containing:
one row corresponding to the tableIndexStatistic of the primary key
one row corresponding to the primary key column description
n rows corresponding to the unique columns description
m rows corresponding to the index columns description
Due to a project choice all the primary keys are auto-increment, so there is no case where a primary key column is also a unique column.
I'm trying to write a database-independent solution, since it will be used for both MySQL and SQL Server databases;
MySQL uses the MySQL-AB JDBC Driver 5.1.20, SQL Server uses the Microsoft JDBC Driver 6.4.
Initially I "resolved" this problem by retrieving the driver name from the session, in order to apply a specific filter for each database;
for MySQL I found that the column INDEX_NAME for the primary key is always 'PRIMARY', while for SQL Server I found that the column TYPE is:
0 for the tableIndexStatistic
1 for our SQL Server Primary Keys (tableIndexClustered )
2 (not found in my case yet, but is for tableIndexHashed )
3 for the Unique keys ( tableIndexOther )
A difference between MySQL and SQL Server is that the Primary Keys are respectively of TYPE 3 and 1.
Filter example:
String driver = session.getConfiguration().getDatabaseId();
DatabaseMetaData metadata = session.getConnection().getMetaData();
ResultSet result = metadata.getIndexInfo(catalog, schema, table, true, false);
while (result.next()) {
    if ("mysql".equals(driver)) {
        // MySQL: skip the primary key, whose index is always named PRIMARY
        if (!"PRIMARY".equals(result.getString("INDEX_NAME"))) {
            // ... code to save the result ...
        }
    } else if ("sqlserver".equals(driver)) {
        // SQL Server: keep only rows of TYPE 3 (tableIndexOther)
        if (result.getShort("TYPE") == DatabaseMetaData.tableIndexOther) {
            // ... code to save the result ...
        }
    } else {
        throw new Exception();
    }
}
This code worked for a bit, until I discovered a table with an index on SQL Server; in this case, as per the documentation linked before, the indexes are part of tableIndexOther, so their TYPE column has the value 3.
At this point I noticed that the column NON_UNIQUE is true for the unique column descriptions and false for the index column descriptions.
So I was thinking of extending the SQL Server filter to include the NON_UNIQUE column but, contrary to what the documentation describes, when I retrieve a tableIndexStatistic row I get null instead of false.
I'm a bit confused about how I should approach all these inconsistencies with the documentation, since my main goal is to retrieve the same set of unique keys from both databases.
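One possible way to avoid the per-driver branches is to drop the statistics row by its TYPE and to drop the primary key index by comparing INDEX_NAME with the primary key name. The sketch below is not a confirmed solution: it assumes that the PK_NAME reported by `DatabaseMetaData.getPrimaryKeys` matches the INDEX_NAME of the primary key index on both databases, which should be checked against the actual drivers. `UniqueIndexFilter` and `IndexRow` are hypothetical names, and `IndexRow` holds only the columns of one `getIndexInfo` row that the filter needs. The sketch does not address the NON_UNIQUE inconsistency described above.

```java
import java.sql.DatabaseMetaData;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: filters getIndexInfo rows without checking the driver name. */
final class UniqueIndexFilter {

    /** One getIndexInfo row, reduced to the columns used by the filter. */
    static final class IndexRow {
        final String indexName;   // INDEX_NAME (null for statistics rows)
        final short type;         // TYPE
        final String columnName;  // COLUMN_NAME

        IndexRow(String indexName, short type, String columnName) {
            this.indexName = indexName;
            this.type = type;
            this.columnName = columnName;
        }
    }

    /**
     * Keeps every row that is neither a tableIndexStatistic row nor part of
     * the index named primaryKeyName (assumed to be the PK_NAME value).
     */
    static List<IndexRow> withoutPrimaryKeyAndStatistics(List<IndexRow> rows,
                                                         String primaryKeyName) {
        List<IndexRow> kept = new ArrayList<>();
        for (IndexRow row : rows) {
            if (row.type == DatabaseMetaData.tableIndexStatistic) {
                continue; // statistics row, as returned by SQL Server
            }
            if (row.indexName != null && row.indexName.equals(primaryKeyName)) {
                continue; // primary key index
            }
            kept.add(row);
        }
        return kept;
    }
}
```

Each `IndexRow` would be built from one row of the `getIndexInfo` result set (`getString("INDEX_NAME")`, `getShort("TYPE")`, `getString("COLUMN_NAME")`), and `primaryKeyName` from the PK_NAME column of `getPrimaryKeys(catalog, schema, table)`.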

Can you check if a column exists and perform different actions with oracle?

My table looks like the following:
id | value1 | count
I have a list of value1 in RAM and I want to do the following:
(if value1 exists in table){
count + 1}else{
insert new row into table}
Is this possible with Oracle or do I have to take it to the code, do a for loop and execute one element of the list at a time? The list contains 5 million values. I'd have to do something like this in the code:
for (int i = 0; i < list.size(); i++) {
    boolean exists = checkifexists(list.get(i));
    if (exists) {
        countPlusOne(list.get(i));
    } else {
        createNewRow(list.get(i));
    }
}
So I have to do at least two queries for each value, totalling 10m+ queries. This could take a long time and may not be the most efficient way to do this. I'm trying to think of another way.
"I load them into RAM from the database"
You already have the source data in the database so you should do the processing in the database. Instantiating a list of 5 million strings in local memory is not a cheap operation, especially when it's unnecessary.
Oracle supports a MERGE capability which we can use to test whether a record exists in the target table and populate a new row conditionally. Being a set operation, MERGE is far more performant than single-row inserts in a Java loop.
The tricky bit is uniqueness. You need to have a driving query from the source table which contains unique values (otherwise MERGE will hurl). In this example I aggregate a count of each occurrence of value1 in the source table. This gives us a set of value1 plus a figure we can use to maintain the count column on the target table.
merge into your_target_table tt
using ( select value1
, count(*) as dup_cnt
from your_source_table
group by value1
) st
on ( st.value1 = tt.value1 )
when not matched then
insert (id, value1, cnt)
values (someseq.nextval, st.value1, st.dup_cnt)
when matched then
update
set tt.cnt = tt.cnt + st.dup_cnt;
(I'm assuming the ID column of the target table is populated by a sequence; amend that as you require).
In Oracle, we could use a MERGE statement to check if a row exists and do insertion only if it doesn't.
First create a type that defines your list.
CREATE OR REPLACE TYPE value1_type as TABLE OF VARCHAR2(10); --use the datatype of value1
Merge statement.
MERGE INTO yourtable t
USING (
select distinct column_value as value1 FROM TABLE(value1_type(v1,v2,v3))
)s ON ( s.value1 = t.value1 )
WHEN NOT MATCHED THEN INSERT
(col1,col2,col3) VALUES ( s.col1,s.col2,s.col3);
You may also use NOT EXISTS.
INSERT INTO yourtable t
select * FROM
(
select distinct column_value as value1 from TABLE(value1_type(v1,v2,v3))
) s
WHERE NOT EXISTS
(
select 1 from
yourtable t where t.value1 = s.value1
);
You can do this by two approaches
Approach 1:
Create a temp table in the database and insert all the values you hold in RAM into it.
Write a query that updates the count by joining your main table with the temp table, and
set a flag in the temp table for the values that were updated; for the values that were
not updated, use an insert query.
Approach 2:
You can create your own data type, which accepts array of values as input:
CREATE OR REPLACE TYPE MyType AS VARRAY(200) OF VARCHAR2(50);
You can write a procedure with your logic; the procedure takes the array of values as input:
CREATE OR REPLACE PROCEDURE testing (t_in MyType)
First load your in-memory list into a temporary table TMP:
select * from tmp;
VALUE1
----------
V00000001
V00000002
V00000003
V00000004
V00000005
...
You may use a MERGE statement to handle your logic:
if the key exists, increase the count by 1
if the key doesn't exist, insert it with the initial count of 1
merge into val
using tmp
on (val.value1 = tmp.value1)
when matched then update
set val.count = val.count + 1
when not matched then
insert (val.value1, val.count)
values (tmp.value1, 1)
;
Note that I assume you have an IDENTITY key in the column ID, so no key assignment is required.
In case there are duplicate records in the TMP table (several records with the same VALUE1 key), you get an error, as MERGE cannot handle more than one action for one key.
ORA-30926: unable to get a stable set of rows in the source tables
If you want each duplicate occurrence to count as one,
you must pre-aggregate the temporary table using GROUP BY and add the counts.
Otherwise simply ignore the duplicates using DISTINCT.
merge /*+ PARALLEL(5) */ into val
using (select value1, count(*) count from tmp group by value1) tmp
on (val.value1 = tmp.value1)
when matched then update
set val.count = val.count + tmp.count
when not matched then
insert (val.value1, val.count)
values (tmp.value1, tmp.count)
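If the values really are held in application memory before being loaded into TMP, the same pre-aggregation could also be done on the Java side, so that TMP receives one row per distinct value and MERGE never sees a key twice. A minimal sketch, assuming the values are strings; `ValueCounts` and `countOccurrences` are hypothetical names, and loading the result into TMP (which would then need a count column) is left out.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: collapses a list with duplicates into value -> occurrence count. */
final class ValueCounts {

    /** Returns each distinct value with the number of times it occurs, in first-seen order. */
    static Map<String, Long> countOccurrences(List<String> values) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String value : values) {
            // Insert 1 for a new value, otherwise add 1 to the existing count.
            counts.merge(value, 1L, Long::sum);
        }
        return counts;
    }
}
```

This is the in-memory equivalent of `select value1, count(*) from tmp group by value1`; doing it in the database remains preferable when the source data already lives there.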

MySql count of rows and insert into the table that count

I have a scenario where I need to take the count of rows in a MySQL table for the current branch (the table stores the branch) and insert that count, together with other details, into the same table. The problem is that when two or more concurrent users from the same branch insert at the same time, they all get the same count. What I need is that no other user's insert happens until one request has read the count and inserted its row. Is there any locking that works for this? An example would be helpful (I need to do all of this in a MySQL stored procedure).
Edit: Sorry, I can't share the working code, but I can write an example here.
My table structure is:
id  name  branchid  count
1   abc   1         1
2   xyz   1         2
3   abcd  2         1
4   wxyz  2         2
Here I am taking the count of rows from the above table for a given branch (e.g. 1) and inserting a row with that calculated count.
Example:
set @count = (select count(id) from tbl where branchid = 1);
later
insert into tbl(id, name, branchid, count)
values(5, 'abcd', 1, @count)
This works great as long as only one user per branch accesses it, but if more than one user from the same branch accesses it at the exact same time, @count is duplicated for those users.
Why not just do it in one query:
insert into tbl(id, name, branchid, count)
select 5, 'abcd', 1, count(*)
from tbl
where branchid = 1;

Derby Database : delete records which their timestamp greater than 2 hours

I tried to write a named query in my entity bean class, and also tried the same query as a native query. Its job is to delete records whose timestamp column differs from the current timestamp by more than 2 hours.
My query:
DELETE FROM APP.WEATHER WHERE timestampdiff(SQL_TSI_HOUR,APP.WEATHER.SINCE,CURRENT_TIMESTAMP) > 2;
But it failed with this error message:
Error code -1, SQL state 42X04: Column 'SQL_TSI_HOUR' is either not in any table in the FROM list or appears within a join specification and is outside the scope of the join specification or appears in a HAVING clause and is not in the GROUP BY list. If this is a CREATE or ALTER TABLE statement then 'SQL_TSI_HOUR' is not a column in the target table.
Line 1, column 1
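One way to sidestep the function call that the parser rejects is to compute the cutoff in Java and bind it as a query parameter. This is a minimal sketch, not a confirmed fix for the named query: `WeatherCleanup` and `cutoff` are hypothetical names, and the table and column names are taken from the query above. Note that `SINCE < now - 2 hours` is not identical to comparing a whole-hour difference with `> 2`, so the boundary behaviour differs slightly.

```java
import java.sql.Timestamp;
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper: builds the parameter for a cutoff-based delete. */
final class WeatherCleanup {

    // Parameterized delete; the single parameter is the cutoff timestamp.
    static final String DELETE_SQL = "DELETE FROM APP.WEATHER WHERE SINCE < ?";

    /** Returns the timestamp that lies maxAge before now. */
    static Timestamp cutoff(Instant now, Duration maxAge) {
        return Timestamp.from(now.minus(maxAge));
    }
}
```

With plain JDBC the value would be bound via `PreparedStatement.setTimestamp(1, WeatherCleanup.cutoff(Instant.now(), Duration.ofHours(2)))`; in a JPA query it would be passed as a query parameter instead.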

Java update when data exists and insert if doesnt [duplicate]

In MySQL, if you specify ON DUPLICATE KEY UPDATE and a row is inserted that would cause a duplicate value in a UNIQUE index or PRIMARY KEY, an UPDATE of the old row is performed. For example, if column a is declared as UNIQUE and contains the value 1, the following two statements have identical effect:
INSERT INTO table (a,b,c) VALUES (1,2,3)
ON DUPLICATE KEY UPDATE c=c+1;
UPDATE table SET c=c+1 WHERE a=1;
I don't believe I've come across anything of the like in T-SQL. Does SQL Server offer anything comparable to MySQL's ON DUPLICATE KEY UPDATE?
I was surprised that none of the answers on this page contained an example of an actual query, so here you go:
A more complex example of inserting data and then handling duplicate
MERGE
INTO MyBigDB.dbo.METER_DATA WITH (HOLDLOCK) AS target
USING (SELECT
77748 AS rtu_id
,'12B096876' AS meter_id
,56112 AS meter_reading
,'20150602 00:20:11' AS time_local) AS source
(rtu_id, meter_id, meter_reading, time_local)
ON (target.rtu_id = source.rtu_id
AND target.time_local = source.time_local)
WHEN MATCHED
THEN UPDATE
SET meter_id = '12B096876'
,meter_reading = 56112
WHEN NOT MATCHED
THEN INSERT (rtu_id, meter_id, meter_reading, time_local)
VALUES (77748, '12B096876', 56112, '20150602 00:20:11');
There's no DUPLICATE KEY UPDATE equivalent, but MERGE and WHEN MATCHED might work for you
Inserting, Updating, and Deleting Data by Using MERGE
You can try the other way around. It does the same thing more or less.
UPDATE tablename
SET field1 = 'Test1',
field2 = 'Test2'
WHERE id = 1
IF @@ROWCOUNT = 0
INSERT INTO tablename
(id,
field1,
field2)
VALUES (1,
'Test1',
'Test2')
SQL Server 2008 has this feature, as part of TSQL.
See documentation on MERGE statement here - http://msdn.microsoft.com/en-us/library/bb510625.aspx
SQL server 2000 onwards has a concept of instead of triggers, which can accomplish the wanted functionality - although there will be a nasty trigger hiding behind the scenes.
Check the section "Insert or update?"
http://msdn.microsoft.com/en-us/library/aa224818(SQL.80).aspx
