How to update data from one table to another using JPA efficiently - java

I have a Linux (CentOS) system and am using a PostgreSQL 9.4.11 database.
I have a messaging system in which messages get stored in the DB, and their receipts are inserted into the same table with the updated status (so for 1 message, 3 rows are added to the table). Every day around a million rows are inserted into it, due to which generating reports from that table takes a long time. I don't have control over that table to make it store the updated status in the same row.
I tried table partitioning to improve this, but it was not very useful.
So to overcome this, I started storing the sent messages in two tables; when I receive the receipt, I update the same row in the second table and generate reports from that. To do this I tried using a trigger, but the trigger was enormously slow: updating 100k rows took approximately 12 hours.
The third option I tried was creating a scheduler in Java which queries 100 records from the large table and updates the records in the new table by their correct IDs.
Doing this one by one is not efficient at all, as it increases the server load as well as the time taken to update the data.
I tried the following batch update method to update all receipts in one go:
I added all IDs to an array, set the required status and time, and called executeUpdate, but this didn't work: the arrids array holds 1000 records, yet the update count shows only 20 or 50 records updated (some seemingly random count).
// In a native query the IN list must be parenthesized, and the parameter
// name must match exactly (no trailing space).
queryStm = "UPDATE reciept SET smsc_sub = :dlr, time_sub = :time, serverTime = :servertime WHERE id IN (:arrids)";
Query query = em.createNativeQuery(queryStm);
query.setParameter("arrids", arrids); // was "arrids " with a trailing space
query.setParameter("dlr", dlr);
query.setParameter("time", time);
query.setParameter("servertime", new Timestamp(new Date().getTime()));
query.executeUpdate();
Is there any efficient way to update data in one table from another table, either in Java/JPA or in PostgreSQL?
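For what it's worth, one set-based alternative is to let PostgreSQL join the two tables and update everything in a single statement rather than batching IDs from Java. A minimal sketch, assuming the large table is called messages and links to reciept via a receipt_id column (the source table and column names are assumptions; adjust them to the real schema):

// Push the whole update into PostgreSQL with one UPDATE ... FROM,
// instead of updating row by row (or ID-list by ID-list) from Java.
// "messages", "dlr_status", "dlr_time" and "receipt_id" are assumed names.
String sql =
    "UPDATE reciept r "
  + "SET smsc_sub = m.dlr_status, time_sub = m.dlr_time, serverTime = now() "
  + "FROM messages m "
  + "WHERE r.id = m.receipt_id";
int updated = em.createNativeQuery(sql).executeUpdate();
System.out.println("Rows updated: " + updated);

A single set-based statement like this lets the database plan the join once, which is usually far cheaper than many small updates issued over JDBC.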

Related

When does Cassandra fetch full rows when a client sends an update query

Let's take an example table:
CREATE TABLE student (
    id int PRIMARY KEY,
    name text,
    phone text
);
And a client sends an update query like: update student set name='name_temp' where id in (1, 2);
My question is: what gets saved into the memtable? Does it save the whole row for ids 1 and 2 (which means it has to fetch the whole row first) with the updated value for the name column, or just the delta? And when does the whole row get fetched, as I assume that when it writes to the SSTable it has to write the whole row with the latest 'name' column value?
EDIT:
For complete understanding, please read the comments on the accepted answer.
In Cassandra, INSERT, UPDATE and DELETE statements are all inserts under the hood. Cassandra doesn't do a read-before-write (with the exception of lightweight transactions), so your query:
UPDATE student SET name='name_temp' WHERE id IN (1, 2);
does not "fetch the rows" before updating the 2 partitions.
All it does is insert 2 new records to the student table where only the name column is set -- for these 2 particular mutations, there is no value for the column phone.
Provided there are no new mutations (inserts/updates/deletes) to those 2 records, the following records get flushed from the memtable to disk:
{ id = 1, name = 'name_temp' }
{ id = 2, name = 'name_temp' }
Cassandra has sparse storage, meaning only the columns with values set are stored on disk. Since the mutation did not contain the phone column, it will not get included in the new SSTable that resulted from the memtable flush. Cheers!
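To see this behaviour from a client, here is a small sketch with the DataStax Java driver (4.x). It assumes a reachable local node and an existing keyspace named ks; both are assumptions for illustration:

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.Row;

public class SparseUpdateDemo {
    public static void main(String[] args) {
        // Assumes a local Cassandra node and a keyspace "ks" containing student.
        try (CqlSession session = CqlSession.builder().withKeyspace("ks").build()) {
            // This is an upsert: it creates the partition even if id=1 never existed.
            session.execute("UPDATE student SET name = 'name_temp' WHERE id IN (1, 2)");
            Row row = session.execute("SELECT name, phone FROM student WHERE id = 1").one();
            // phone was never set by this mutation, so it comes back null.
            System.out.println("name=" + row.getString("name")
                    + " phone is null? " + row.isNull("phone"));
        }
    }
}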
An update is also a write in Cassandra. So when you update those two partitions, they are written into the memtable first and then flushed into a new SSTable.
Only the updated value is written to the memtable; the complete row is not fetched. Only the cells which are updated get written.
Cassandra resolves different writes (old data and updated data) during the read path. With each cell, Cassandra stores write-time metadata which is used to determine the latest data (last write wins).
The different data for the same partition are merged by the process of compaction.
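You can inspect that per-cell write-time metadata directly with CQL's writetime() function. A small sketch, under the same assumptions as above (local node, keyspace ks):

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.Row;

public class WriteTimeDemo {
    public static void main(String[] args) {
        try (CqlSession session = CqlSession.builder().withKeyspace("ks").build()) {
            // writetime() exposes the timestamp Cassandra uses for
            // last-write-wins conflict resolution, per cell.
            Row row = session.execute(
                    "SELECT name, writetime(name) AS wt FROM student WHERE id = 1").one();
            System.out.println("name=" + row.getString("name")
                    + " written at (microseconds): " + row.getLong("wt"));
        }
    }
}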
#rafel, the short answer to your question is that only the columns which have been changed are updated.
Here's a good resource about the write path: https://docs.datastax.com/en/cassandra-oss/3.0/cassandra/dml/dmlWriteUpdate.html
This note is at the bottom of the page:
Note: Some database operations may only write partial updates of a row, so some versions of a row may include some columns, but not all. During a compaction or write, Cassandra assembles a complete version of each row from the partial updates, using the most recent version of each column.

Apache NiFi: How to fetch data from a Cassandra table whenever a record is modified

I have Cassandra as the database. We are using the "QueryCassandra" processor to fetch values from a Cassandra table to an output port; it uses a select query to fetch the records. I have the use case mentioned below.
1) The first time, all the records need to be fetched from Cassandra and transferred to the output port. That's happening now (i.e. all data is fetched from the table at the particular time interval we mention in Run Schedule).
2) Later, whenever the Cassandra table is modified (a new record inserted, a row updated, or a row deleted), only those records need to be sent to the output port.
Is there any way we can achieve this instead of fetching at every time interval?
Sample NiFi Template
This isn't currently possible with NiFi (1.11.4 at the time of this writing). We'd need either a Cassandra version of QueryDatabaseTable (where you provide a column that only increases, like a timestamp) or a CaptureChangeCassandra processor that uses a CommitLogReader to read the commit log rather than querying the table itself.
Please feel free to file a New Feature Jira case to add CDC capabilities for Cassandra.

Multiple database calls with multithreading vs a single database call using Java REST

We want to generate a report with 10,000 records by querying 12 million records from the database and show it in a user interface; it's a REST API call. We have two approaches right now to achieve this.
Here is the little background about our database tables design:
Design#1: We use a SQL database and have a big legacy (single) database table which has more than 12 million records at any given time; we always keep one year of data in this table. Every month a backup policy moves data to a history table, but even with this backup policy we end up with more than 12 million records.
Design#2: As part of redesigning the big table above, we created 12 tables based on certain criteria and are persisting the 12 million records into these 12 tables more or less equally, about a million records per table.
Approach#1: Query the 12 tables simultaneously using the Java executor API with callable tasks and send the result to the caller (see the sketch below).
Approach#2: Query the single big legacy table and send the result to the caller.
Please suggest which approach gives the better performance.
Let me know if anything is unclear.
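For concreteness, a minimal sketch of Approach#1, assuming twelve split tables named t01..t12 and rows represented as Object[] (as a JPA native query would return them); queryTable() is a hypothetical placeholder for your real DAO call:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelReportQuery {

    // Hypothetical per-table query; replace with your DAO / EntityManager call,
    // e.g. em.createNativeQuery("SELECT ... FROM " + table).getResultList()
    static List<Object[]> queryTable(String table) {
        return new ArrayList<>();
    }

    public static void main(String[] args) throws Exception {
        List<String> tables = Arrays.asList(
                "t01", "t02", "t03", "t04", "t05", "t06",
                "t07", "t08", "t09", "t10", "t11", "t12");
        ExecutorService pool = Executors.newFixedThreadPool(tables.size());
        try {
            List<Callable<List<Object[]>>> tasks = new ArrayList<>();
            for (String t : tables) {
                tasks.add(() -> queryTable(t)); // one callable per split table
            }
            List<Object[]> report = new ArrayList<>();
            for (Future<List<Object[]>> f : pool.invokeAll(tasks)) {
                report.addAll(f.get()); // blocks until each partial result is ready
            }
            System.out.println("Fetched " + report.size() + " rows");
        } finally {
            pool.shutdown();
        }
    }
}

Whether this beats a single scan of the big table depends on your indexes and connection pool size, so it's worth benchmarking both against realistic data.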

How can I get a table's primary key via Oracle Database Change Notification

I could get notifications from an Oracle database thanks to this code, omitting this line:
prop.setProperty(OracleConnection.DCN_QUERY_CHANGE_NOTIFICATION, "true");
I could also solve my ORA-29977 problem by changing select * from act_code_metadata where product_id=1159 to select column_with_Number_type from act_code_metadata where product_id=1159.
Everything works as expected :D
This is the code I use to print the row's information (Java 8):
// dcr is the DatabaseChangeRegistration obtained from
// OracleConnection.registerDatabaseChangeNotification(...)
dcr.addListener((DatabaseChangeEvent dce) ->
    System.out.println(
        "Changed row id : " +
        dce.getTableChangeDescription()[0].getRowChangeDescription()[0].getRowid().stringValue()
        + " " + dce.getTableChangeDescription()[0].getRowChangeDescription()[0].getRowOperation().toString()));
But all the information I get is the row's physical address (rowid) and the operation involved (insert, delete or update).
I need to identify the row being modified/inserted/deleted to refresh my cached data in several Swing controls in my GUI.
I've read that, despite the rowid being immutable, the same rowid can be re-assigned if the row is deleted and a new one is inserted, and that the rowid can change if the row is in a partitioned table. So the best that can be done is to use the rowid together with the row's primary key.
My table has an auto-incrementing primary key (implemented with a sequence and a trigger), created with this code.
I have no control over what happens in the database or whether somebody inserts and deletes rows several times, so I can get the wrong row when selecting it using the rowid given by the notification.
Is there a way that I can get my row's primary key via Oracle Database Change Notification so I can identify the inserted/deleted/modified row correctly?
I'm working with Oracle Database XE 11.2 Express and Java 8. The user for database connection already has the change notification privilege.
It seems that you have a lot of overhead trying to maintain a fresh snapshot of the data in your GUI. It may be simpler to look at client result caching and just re-run your query every X seconds, and let Oracle do the magic of seeing whether the data changed. You would be limited to a JDBC driver that supports OCI. See http://docs.oracle.com/cd/E11882_01/server.112/e41573/memory.htm#PFGRF985 for details.
With client result caching, the first time the SQL is executed it will take, say, 500 milliseconds. The next query using the same criteria will take 2 or 3 milliseconds. Assuming the result set is small (fewer than 100 rows is quite small), you can get much better results without all that framework you are trying to build.
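A minimal sketch of that polling approach; runQuery() and refreshSwingControls() are hypothetical placeholders for your query and GUI update, and the result caching itself is configured at the driver/database level, not in this code:

import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import javax.swing.SwingUtilities;

public class PollingRefresher {

    static List<Object[]> runQuery() { /* re-run the (cached) query */ return null; }
    static void refreshSwingControls(List<Object[]> rows) { /* update the GUI */ }

    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            List<Object[]> rows = runQuery(); // cheap once the client cache is warm
            // Swing components must only be touched on the Event Dispatch Thread.
            SwingUtilities.invokeLater(() -> refreshSwingControls(rows));
        }, 0, 5, TimeUnit.SECONDS);
    }
}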

Fetching records one by one from a PostgreSQL DB

There's a DB that contains approximately 300-400 records. I can make a simple query for fetching 30 records like:
SELECT * FROM table
WHERE isValidated = false
LIMIT 30
Some more words about the content of the DB table. There's a column named isValidated that can (as you correctly guessed) take one of two values: true or false. After a query, some of the records get validated (isValidated=true), approximately 5-6 records out of each batch of 30. Consequently, each subsequent query will fetch some of the still-unvalidated records (isValidated=false) from the previous query. In fact, I'll never get to the end of the table with such an approach.
The validation process is done with Java + Hibernate. I'm new to Hibernate, so I use the Criteria API for making this simple query.
Are there any best practices for such a task? The variant with adding a flag field (that marks records which have already been fetched) is inappropriate (over-engineering for this DB).
Maybe there's a way to create some virtual table where records that have already been processed are stored, or something like that. BTW, after all the records are processed, the plan is to start processing them again (it is possible that some of them need to be re-validated).
Thank you for your help in advance.
I can imagine several solutions:
1) Store everything in memory. You only have 400 records, and that could be a perfectly fine solution given this small number.
2) Use an order by clause (which you should do anyway) on a unique column (the PK, for example), store the ID of the last loaded record, and make sure the next query uses where ID > :lastId (see the sketch below).
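A minimal sketch of option 2 with JPA/Hibernate; em is the EntityManager, and the entity name ValidationRecord and the validate() step are illustrative assumptions:

// Keyset pagination: order by the PK and resume from the last seen ID.
Long lastId = 0L;
List<ValidationRecord> batch;
do {
    batch = em.createQuery(
            "SELECT r FROM ValidationRecord r"
          + " WHERE r.isValidated = false AND r.id > :lastId"
          + " ORDER BY r.id", ValidationRecord.class)
        .setParameter("lastId", lastId)
        .setMaxResults(30)
        .getResultList();
    for (ValidationRecord r : batch) {
        validate(r);          // hypothetical validation step
        lastId = r.getId();   // remember the last key seen in this batch
    }
} while (!batch.isEmpty());

When the loop drains the table, reset lastId to 0 to start the next full pass, which matches the plan of re-processing all records periodically.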
