How to retrieve only the information that got changed from Cassandra?

I am designing a Cassandra column family schema for the use case below, and I am not sure what the best design is. I will be using CQL with the DataStax Java driver.
Below is my use case and the sample schema I have designed so far:
SCHEMA_ID   RECORD_NAME   SCHEMA_VALUE        TIMESTAMP
1           ABC           some value          t1
2           ABC           some_other_value    t2
3           DEF           some value again    t3
4           DEF           some other value    t4
5           GHI           some new value      t5
6           IOP           some values again   t6
What I need from this table is the following:
The first time my application runs, it will ask for everything in the table.
After that, every 5 or 10 minutes, a background thread will check the table and ask only for what has changed (the full row, if anything in that row changed). That is why I have a timestamp as one of the columns.
But I am not sure how to design the query pattern so that both use cases are satisfied easily, or what the proper table design for this is. I am thinking of using SCHEMA_ID as the primary key.
Update:-
If I use something like this, is there any problem with the approach?
CREATE TABLE TEST (SCHEMA_ID TEXT, RECORD_NAME TEXT, SCHEMA_VALUE TEXT, LAST_MODIFIED_DATE TIMESTAMP, PRIMARY KEY (SCHEMA_ID));
INSERT INTO TEST (SCHEMA_ID, RECORD_NAME, SCHEMA_VALUE, LAST_MODIFIED_DATE) VALUES ('1', 't26', 'SOME_VALUE', 1382655211694);
In my use case, I don't want anybody to insert the same SCHEMA_ID more than once. SCHEMA_ID should be unique whenever we insert a new row into this table. So with your example (@omnibear), it might be possible for somebody to insert the same SCHEMA_ID twice? Am I correct?
Also, regarding the type you have added as an extra column: that type column can be record_name in my example.

Regarding 1)
Cassandra is designed for heavy write loads and large amounts of data spread across multiple nodes. Retrieving ALL data from this kind of setup is risky, since it may involve a huge amount of data that a single client has to handle. A better approach is to use pagination, which is natively supported as of Cassandra 2.0.
Regarding 2)
The point is that partition keys only support EQ or IN queries. For LT or GT (< / >) you use clustering columns (column keys). So if it makes sense to group your entries by some ID like "type", you can use this as your partition key and a timeuuid as a clustering column. This allows you to query for all entries newer than X, like so:
create table test
(type int, SCHEMA_ID int, RECORD_NAME text,
SCHEMA_VALUE text, TIMESTAMP timeuuid,
primary key (type, timestamp));
select * from test where type IN (0,1,2,3) and timestamp > 58e0a7d7-eebc-11d8-9669-0800200c9a66;
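The split between partition key and clustering column can be modeled in plain Java to see why the "newer than X" query is cheap. This is only a sketch of the access pattern, not driver code: the class `ChangeLog` and its methods are hypothetical, it does not talk to Cassandra, and plain `long` timestamps stand in for timeuuid values.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// In-memory stand-in for a table with PRIMARY KEY (type, timestamp).
// Hypothetical class for illustration only.
class ChangeLog {
    // Partition key -> rows kept sorted by the clustering column (timestamp).
    private final Map<Integer, NavigableMap<Long, String>> partitions = new HashMap<>();

    void write(int type, long timestamp, String value) {
        partitions.computeIfAbsent(type, k -> new TreeMap<>()).put(timestamp, value);
    }

    // Models: equality/IN on the partition key, range on the clustering column.
    List<String> changedSince(List<Integer> types, long since) {
        List<String> result = new ArrayList<>();
        for (int type : types) {
            NavigableMap<Long, String> rows = partitions.get(type);
            if (rows != null) {
                // tailMap(since, false) returns only rows strictly newer than 'since'.
                result.addAll(rows.tailMap(since, false).values());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ChangeLog log = new ChangeLog();
        log.write(0, 100L, "some value");
        log.write(0, 200L, "some_other_value");
        log.write(1, 150L, "some value again");

        // First run: ask for everything.
        System.out.println(log.changedSince(List.of(0, 1), Long.MIN_VALUE));
        // Later poll: only rows written after the last timestamp already seen.
        System.out.println(log.changedSince(List.of(0, 1), 150L));
    }
}
```

A background thread would remember the newest timestamp it has seen and pass it as `since` on the next poll.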
Update:
You asked:
somebody can insert same SCHEMA_ID twice? Am I correct?
Yes, you can always make an insert with an existing primary key. The values at that primary key will be updated. Therefore, to preserve uniqueness, a UUID is often used in the primary key, for instance, timeuuid. It is a unique value containing a timestamp and the MAC address of the client. There is excellent documentation on this topic.
General advice:
Write down your queries first, then design your model. (Use case!)
Your queries define your data model which in turn is primarily defined by your primary keys.
So, in your case, I'd just adapt my schema above, like so:
CREATE TABLE TEST (SCHEMA_ID TEXT, RECORD_NAME TEXT, SCHEMA_VALUE TEXT,
LAST_MODIFIED_DATE TIMEUUID, PRIMARY KEY (RECORD_NAME, LAST_MODIFIED_DATE));
Which allows this query:
select * from test where RECORD_NAME IN ('componentA','componentB')
and LAST_MODIFIED_DATE > 1688f180-4141-11e3-aa6e-0800200c9a66;
The UUID corresponds to Wednesday, October 30, 2013 8:55:55 AM GMT,
so you would fetch everything after that.
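To check which point in time a timeuuid encodes, the standard `java.util.UUID` class is enough. The sketch below is a hypothetical helper (`TimeUuidDecoder` is not part of any driver) that converts the 100-nanosecond count of a version 1 UUID into Unix epoch milliseconds.

```java
import java.time.Instant;
import java.util.UUID;

// Decodes the time embedded in a version 1 (time-based) UUID.
class TimeUuidDecoder {
    // Number of 100-ns intervals between the UUID epoch (1582-10-15)
    // and the Unix epoch (1970-01-01).
    private static final long UUID_EPOCH_OFFSET = 0x01b21dd213814000L;

    static long toEpochMillis(UUID timeUuid) {
        // UUID.timestamp() throws UnsupportedOperationException
        // if the UUID is not version 1.
        return (timeUuid.timestamp() - UUID_EPOCH_OFFSET) / 10_000;
    }

    public static void main(String[] args) {
        UUID uuid = UUID.fromString("1688f180-4141-11e3-aa6e-0800200c9a66");
        // Prints the instant encoded in the UUID used in the query above.
        System.out.println(Instant.ofEpochMilli(toEpochMillis(uuid)));
    }
}
```

For the UUID from the query this yields 1383123355800 ms since the epoch, which matches the date and time quoted above.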

Related

How to make a primary key start with a specific letter?

I am using MySQL and I want my primary key to start with a letter, like D000. Then every time I enter a new record, the primary key should auto-increment like so:
D001
D002
D003.
How can I do this?

You can't AUTO_INCREMENT a column whose type is VARCHAR.
What you could do is make it BIGINT and AUTO_INCREMENT, and whenever you need it as a String, you can prefix it with your letter 'D' like:
Long dbKey = ...;
String key = "D" + dbKey;
You could create a stored procedure for this to set an "auto-incremented" string as the default value for this column, but it just isn't worth the hassle. Plus, working with numbers is generally faster and more efficient than working with strings.
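If the zero padding from the question (D001, D002, ...) matters, the conversion can live in one small helper. This is a minimal sketch; the class name `DisplayKey` and its methods are made up for illustration, and it assumes the database column stays a plain numeric AUTO_INCREMENT key.

```java
// Hypothetical helper: the database stores a numeric key,
// the "D" prefix exists only in the application layer.
class DisplayKey {
    static String format(long dbKey) {
        // Zero-pad to at least 3 digits; larger keys simply get longer.
        return String.format("D%03d", dbKey);
    }

    static long parse(String displayKey) {
        // Drop the leading letter and read the rest as a number.
        return Long.parseLong(displayKey.substring(1));
    }

    public static void main(String[] args) {
        System.out.println(format(1));     // D001
        System.out.println(format(1234));  // D1234
        System.out.println(parse("D042")); // 42
    }
}
```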

I'm not sure whether I get your question right, but shouldn't the following work?
CREATE TRIGGER myTrigger
BEFORE INSERT
ON myTable
FOR EACH ROW
BEGIN
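SET NEW.myCustomId = CONCAT('D', LPAD(NEW.id, 3, '0'));
END
For this you NEED a "normal" primary key column (id) in addition to the custom one. Note that in a BEFORE INSERT trigger an AUTO_INCREMENT value has not been assigned yet, so NEW.id is only usable if the value is supplied in the INSERT.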

Two ideas.
1. (Useless IMHO) I think MariaDB has virtual columns, though I think MySQL does not. But you do have views. So you could use a normal INT AUTO_INCREMENT column and, in the view, have a calculated column that concatenates your key.
2. One can use different number ranges for different tables.
ALTER TABLE debtors AUTO_INCREMENT=10000;
ALTER TABLE creditors AUTO_INCREMENT=30000;
ALTER TABLE guests AUTO_INCREMENT=50000;
This admittedly is a lame solution, but might do. I think such a distinction might be what you are aiming at.

Not sure why you need it, but you can add the D AFTER you have fetched the data (String id = "D" + autoIncId;).
You can't insert a string into an auto-increment field, and I can't see any way this could be useful (all the records would have a D, so it distinguishes none of them).
If you want to mark a row as the default, you can add a boolean column named DEFAULT.
while(rs.next()){
String id = rs.getBoolean("DEFAULT")?"D":"ND";
id+=rs.getLong(1);
}
EDIT
As per your comment I understand that you want to select the max ID and add 1 to it. Then it's ok to use an autoincrement field in your DB and it must be a number type (INTEGER, BIGINT...).
Please FORGET about adding the "D" to your primary key; it is simply not going to work the way you want. The auto-increment takes the last inserted ID and adds 1 to it. If your last ID is "D3", adding 1 makes as much sense as adding 4 to "apple". You are mixing different types.
There is no way for SQL, or any other programming language, to understand that adding 1 to "D3" should give "D4". What you need to do is get rid of that D (whose purpose I still don't understand).

You may try this aberration at your own risk:
INSERT INTO table (id, a, b, c)
VALUES ( fn_get_key( LAST_INSERT_ID("table_name ") +1), "a", "b", "c");
Here fn_get_key is a function that converts the number into your desired string AND executes:
ALTER TABLE table_name AUTO_INCREMENT = start_value;
Anyway, I do not recommend your approach. Numeric keys are faster and easier to sort. You could always create a view that transforms the ID, or use logic to convert the "D001" key to "1". Enforcing foreign keys and uniqueness of IDs will be harder and more expensive.

Auto-generating a unique varchar field - mySQL, Java, Hibernate

Background: I have a database table called Contact. All users of my system have details of their contacts in this table, including a firstname, a lastname, and also a varchar field called 'UniqueId'. Users of the system may put anything in the UniqueId field, as long as it is unique among that user's other contacts' unique ids.
Aim: I now need to change my code so a unique id is automatically generated if the user does not provide one. This should be short and visually pleasing. Ideally it could just be an auto-incrementing number. However, AUTO_INCREMENT works for an integer field, not a varchar field.
Also note that each contact's UniqueId needs to be unique among the other contacts of that user, but not necessarily unique across the entire system. Therefore, the following UniqueIds are valid:
Contact
UserId   Firstname   Lastname   UniqueId
1        Bob         Jones      1
1        Harold      Smith      2
2        Joe         Bloggs     1
Question : So, how can I achieve this? Is there a reliable and clean way to get the database to generate a unique id for each contact in the existing UniqueId varchar field (Which is my preference if possible)? Or am I forced to make Java go and get the next available unique id, and if so, what is the most reliable way of doing this? Or any alternative solution?
Edit - 11th April AM: We use hibernate to map our fields. I'm just beginning to research if that may provide an alternative solution? Any opinions?
Edit - 11th April PM: 2 options are currently standing out, but neither seems as ideal as I would like.
1. As @eis suggests, I could have an auto-incrementing field in addition to my current varchar field. Then, either when a contact is saved the int can also be saved in the varchar field, or when a contact is retrieved the int can be used if the varchar is empty. But it feels messy and wrong to use two fields rather than one.
2. I am looking into using a Hibernate generator, as discussed here. But this involves holding a count of the next id elsewhere, plus Java code, and seems to massively overcomplicate the process.
If my existing uniqueId field had been an int field, AUTO_INCREMENT would simply work, and work nicely. Is there no way to make the database generate this but save it as a String?

I think what you really should do is ditch your current 'uniqueid' and generate new ones that are really unique across the system, being always autogenerated and never provided by the user. You would need to do separate work to migrate to the new system. That's the only way I see to keep it sane. User could provide something like an alias to be more visually pleasing, if needs be.
On the upside, you could use autoincrement then.
Ok, one additional option, if you really really want what you're asking. You could have a prefix like §§§§ that is never allowed for a user, and always autogenerate ids based on that, like §§§§1, §§§§2 etc. If you disallow anything starting with that prefix from the end user, you would know that there would be no collisions, and you could just generate them one-by-one whenever needed.
Sequences would be ideal for generating the numbers. MySQL doesn't have sequences, but you could emulate them, for example like this.
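The reserved-prefix idea might look roughly like this on the Java side. It is a sketch under loud assumptions: `ContactIdGenerator` is a hypothetical class, and the per-user counters live in memory only, whereas a real system would have to persist them (for example in an emulated sequence table) so that they survive restarts and are shared between application instances.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical generator for per-user contact ids with a reserved prefix.
class ContactIdGenerator {
    // The reserved prefix "§§§§", written with Unicode escapes
    // to keep the source file ASCII-only.
    static final String RESERVED_PREFIX = "\u00A7\u00A7\u00A7\u00A7";

    // One counter per user. Assumption: in-memory only, not persisted.
    private final Map<Long, AtomicLong> countersByUser = new ConcurrentHashMap<>();

    String nextId(long userId) {
        long n = countersByUser
                .computeIfAbsent(userId, k -> new AtomicLong())
                .incrementAndGet();
        return RESERVED_PREFIX + n;
    }

    // User-supplied ids must never start with the reserved prefix,
    // otherwise they could collide with generated ones.
    static boolean isAllowedUserInput(String uniqueId) {
        return !uniqueId.startsWith(RESERVED_PREFIX);
    }

    public static void main(String[] args) {
        ContactIdGenerator generator = new ContactIdGenerator();
        System.out.println(generator.nextId(1L));
        System.out.println(generator.nextId(1L));
        System.out.println(generator.nextId(2L));
        System.out.println(isAllowedUserInput("bob-jones"));
    }
}
```

Ids are unique per user but not across the system, which is the uniqueness the question asks for.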

I apologize, I really don't know MySQL syntax, but here's how I'd do it in SQL Server. Hopefully that will still have some value to you. Basically, I'm just counting the number of existing contacts and returning it as a varchar.
CREATE FUNCTION GetNewUniqueId
(@UserId int)
RETURNS varchar(3)
AS
BEGIN
    DECLARE @Count int;
    SELECT @Count = COUNT(*)
    FROM Contacts
    WHERE UserId = @UserId;
    SET @Count = @Count + 1;
    RETURN CAST(@Count AS varchar(3));
END
But if you really want something "visually pleasing," why not try returning something more like Firstname + Lastname?
CREATE FUNCTION GetNewUniqueId
(@UserId int, @FirstName varchar(255), @LastName varchar(255))
RETURNS varchar(515)
AS
BEGIN
    DECLARE @UniqueId varchar(515), @Count int;
    SET @UniqueId = @FirstName + @LastName;
    SELECT @Count = COUNT(*)
    FROM Contacts
    WHERE UserId = @UserId AND LEFT(UniqueId, LEN(@UniqueId)) = @UniqueId;
    IF @Count > 0
        SET @UniqueId = @UniqueId + '_' + CAST(@Count + 1 AS varchar(3));
    RETURN @UniqueId;
END

Create new table entry with new id

The problem
I have a table for some data that has an ID column of type integer (which is also the primary key).
When a new data entry is added to the table, it should get a new ID. The ID is not known to the application that inserts the object; it should be assigned by the database. For example, the IDs should be assigned like 0, 1, 2, ...
Assume that I have all other data for the new entry, how would I do the insert? Normally:
insert into T values(123, 'data');
But now I don't know what to put instead of 123
- would you create some kind of global variable NEXTID in the database that provides the IDs and query/update this value each time before inserting into T?
The questions
How should this kind of problem be handled? A solution that is concurrency-safe is preferable.
How can this be achieved with Java/myBatis? I have a Java class that corresponds to the table structure, and a new object should be added to the database, getting a new ID automatically.
Update
What I searched for was auto-increment.
Is there a standard SQL way (database independent) of declaring a column as auto-increment? I am using Apache Derby and GENERATED ALWAYS AS IDENTITY (START WITH 1, INCREMENT BY 1) is suggested here.
What does an insert into a table that contains auto-increment columns look like?
What is the best way to get the created auto-increment value after an insert when simultaneous access to the database is possible?
I'll accept an answer that includes explanation and SQL instructions for declaration and insertion :)

If you are using SQL Server, making the column an identity column will do the job, something like this:
ALTER TABLE [dbo].[T] ADD [Column1] INT identity (1, 1)
For others, such as Oracle, you can use a database sequence.

In MySQL you can use
ALTER TABLE table_name ADD id INT AUTO_INCREMENT PRIMARY KEY;
This auto-increments the id column, so you don't have to supply it in the INSERT. (MySQL requires an AUTO_INCREMENT column to be defined as a key.)

Hibernate and padding on CHAR primary key column in Oracle

I'm having a little trouble using Hibernate with a char(6) column in Oracle. Here's the structure of the table:
CREATE TABLE ACCEPTANCE
(
USER_ID char(6) PRIMARY KEY NOT NULL,
ACCEPT_DATE date
);
For records whose user id has fewer than 6 characters, I can select them without padding the user id when running queries in SQuirreL. That is, the following returns a record if there's a record with a user id of "abc":
select * from acceptance where user_id = 'abc'
Unfortunately, when doing the select via Hibernate (JPA), the following returns null:
em.find(Acceptance.class, "abc");
If I pad the value to 6 characters, though, it returns the correct record:
em.find(Acceptance.class, "abc   ");
The module that I'm working on gets the user id unpadded from other parts of the system. Is there a better way to get Hibernate working other than putting in code to adapt the user id to a certain length before giving it to Hibernate? (which could present maintenance issues down the road if the length ever changes)

That's God's way of telling you to never use CHAR() for primary key :-)
Seriously, however: since your user_id is mapped as String in your entity, Hibernate's Oracle dialect translates that into varchar. Since Hibernate uses prepared statements for all its queries, those semantics carry over (unlike SQuirreL, where the value is specified as a literal and is thus converted differently).
Based on Oracle's type conversion rules, the column value is then promoted to varchar2 and compared as such; thus you get back no records.
If you can't change the underlying column type, your best option is probably to use an HQL query with the rtrim() function, which is supported by the Oracle dialect.

How come that your module gets an unpadded value from other parts of the system?
As I understand it, if the other parts of the system don't alter the PK, they should read 6 chars from the db and pass 6 chars all along the way, and that would be fine. The only exception would be when a PK is generated, in which case it may need to be padded.
You can circumvent the problem (by trimming or padding the value each time it's necessary), but that won't solve the underlying problem that your PK is not handled consistently. To solve it at the source you must either
- always receive 6 chars from the other parts of the module, or
- use varchar2 to deal with dynamic size correctly
If you can't solve the problem at the source, then you will indeed need to either
- add trimming/padding all around the place when necessary
- add trimming/padding in the DAO if you have one
- add trimming/padding in the user type if this works (suggestion from N. Hughes)
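If padding in the DAO is the route taken, keeping the column length in a single constant limits the maintenance risk the question mentions. A minimal sketch, assuming a hypothetical helper class `UserIdPadding` and the char(6) length from the table definition:

```java
// Hypothetical helper: pads a user id to the fixed width of the CHAR column.
class UserIdPadding {
    // Assumption: mirrors USER_ID char(6); change it here if the column changes.
    static final int USER_ID_LENGTH = 6;

    static String pad(String userId) {
        if (userId.length() > USER_ID_LENGTH) {
            throw new IllegalArgumentException(
                    "user id longer than " + USER_ID_LENGTH + " characters: " + userId);
        }
        // "%-6s" left-justifies the value and fills the rest with spaces.
        return String.format("%-" + USER_ID_LENGTH + "s", userId);
    }

    public static void main(String[] args) {
        // Brackets make the trailing spaces visible.
        System.out.println("[" + pad("abc") + "]");
    }
}
```

The lookup would then read em.find(Acceptance.class, UserIdPadding.pad(userId)).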

dictionary data insert or select (oracle, java)

I have a table (in an Oracle DB) containing two columns: a VARCHAR unique key and a NUMBER unique key generated from a sequence.
I need my Java code to constantly (and in parallel) add records to this table whenever it gets a new VARCHAR key, returning the newly generated NUMBER key. When it gets an existing VARCHAR key, it should return the existing NUMBER key (without inserting it, which would of course throw an exception due to the unique key violation).
This procedure would be executed from many (Java) clients working in parallel.
Hope my English is understandable :)
What is the best way to do it (maybe using a PL/SQL block instead of Java code)?

I do not think you can do better than:
1. SELECT the_number FROM the_table WHERE the_key = :key
2. If found, return it.
3. If not found, INSERT INTO the_table VALUES (:key, the_seq.NEXTVAL) RETURNING the_number INTO :number and COMMIT. This could raise ORA-00001 (unique constraint violated) if the timing is unlucky. In this case, SELECT again.
Not sure if JDBC supports RETURNING, so you might need to wrap it into a stored procedure (which also saves database roundtrips).
You can use an index-organized table (with the_key as primary key), which makes the lookup faster.
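The select, insert, retry-on-duplicate flow can be illustrated without a database. The sketch below swaps in an in-memory `ConcurrentHashMap` for the_table and an `AtomicLong` for the_seq; the class `KeyDictionary` is hypothetical and only mirrors the control flow, it is not JDBC or PL/SQL code.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// In-memory stand-in for the dictionary table (assumption: illustration only).
class KeyDictionary {
    private final ConcurrentMap<String, Long> table = new ConcurrentHashMap<>(); // the_table
    private final AtomicLong sequence = new AtomicLong();                        // the_seq

    long numberFor(String key) {
        // SELECT the_number ... WHERE the_key = :key
        Long existing = table.get(key);
        if (existing != null) {
            return existing;
        }
        // Not found: draw a new number and try to insert it.
        long candidate = sequence.incrementAndGet();
        // putIfAbsent plays the role of the unique constraint: a non-null
        // result means another thread inserted the key first.
        Long winner = table.putIfAbsent(key, candidate);
        // Lost the race: return the row that is already there
        // (the equivalent of "SELECT again").
        return winner != null ? winner : candidate;
    }

    public static void main(String[] args) {
        KeyDictionary dictionary = new KeyDictionary();
        System.out.println(dictionary.numberFor("alpha"));
        System.out.println(dictionary.numberFor("beta"));
        System.out.println(dictionary.numberFor("alpha")); // same number as the first call
    }
}
```

As with a database sequence, a lost race leaves a gap in the numbers, which is harmless as long as they only need to be unique.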
