Monitoring changes in SQL db using Java

I want to check for changes in a SQL database, but I'm not sure what the best approach would be.
Currently I create an object holding the column (key, value) information for each row and store those objects in an ArrayList. I then copy that ArrayList into a second one that serves as my "cache".
Every X ms I grab all the data from the SQL database and compare it to my cached local copy. If anything differs, e.g. a value has been updated, inserted or deleted, a listener notifies me.
Is this a bad approach? Should I be doing a "SELECT * FROM table" query, or is there another way to get the information I want? I also want to be able to see the specific data that was modified, e.g. the row, the column and the value.
Note: The database is generic.
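
For reference, a minimal sketch of the polling-and-diff approach described above, using plain JDBC; the table name, the "id" primary key and the ChangeListener interface are illustrative assumptions, not a definitive implementation:

import java.sql.*;
import java.util.*;

class TablePoller {
    interface ChangeListener {
        void onChange(long id, String column, Object oldValue, Object newValue);
    }

    private Map<Long, Map<String, Object>> cache = new HashMap<>();

    void poll(Connection conn, ChangeListener listener) throws SQLException {
        // take a fresh snapshot of the table, keyed by primary key
        Map<Long, Map<String, Object>> current = new HashMap<>();
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SELECT * FROM my_table")) {
            ResultSetMetaData md = rs.getMetaData();
            while (rs.next()) {
                Map<String, Object> row = new HashMap<>();
                for (int i = 1; i <= md.getColumnCount(); i++) {
                    row.put(md.getColumnLabel(i), rs.getObject(i));
                }
                current.put(rs.getLong("id"), row);  // assumes an "id" primary key
            }
        }
        // diff the snapshot against the cached copy, cell by cell
        for (Map.Entry<Long, Map<String, Object>> e : current.entrySet()) {
            Map<String, Object> old = cache.get(e.getKey());
            if (old == null) continue;  // an inserted row; report similarly if needed
            for (Map.Entry<String, Object> col : e.getValue().entrySet()) {
                Object oldVal = old.get(col.getKey());
                if (!Objects.equals(oldVal, col.getValue())) {
                    listener.onChange(e.getKey(), col.getKey(), oldVal, col.getValue());
                }
            }
        }
        // ids left in cache but absent from current would be deletions
        cache = current;
    }
}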

Related

Is there a way to uniquely identify a column via JDBC? I do not mean via schema.table.column

I am trying to track changes made to a database (schema) using a Java app. We are trying to track changes to each column/unique-constraint/index and table.
Functionally, I know table.column is unique. So if the datatype of a column changes, we know which column to look for and can record the change. But what if the name changes? If JDBC's result set is ordered (columns are accessed by index), then I can rely on the order to give me the same column every time, even if the name changes. Will there be any surprises here, since it is a result 'set'?
However, I have learnt that the order of the columns can be changed as well. Isn't there some unique ID associated with each column so that it can be picked up on that basis?
I would prefer not to go the information_schema route, but even though I checked there for MySQL, I found nothing useful.
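
For what it's worth, JDBC's DatabaseMetaData exposes column names, types and ordinal positions, but no database-independent ID that survives a rename or a reorder; a small sketch of reading that metadata:

import java.sql.*;

class ColumnInspector {
    static void printColumns(Connection conn, String table) throws SQLException {
        DatabaseMetaData meta = conn.getMetaData();
        try (ResultSet cols = meta.getColumns(null, null, table, "%")) {
            while (cols.next()) {
                System.out.printf("%d: %s (%s)%n",
                        cols.getInt("ORDINAL_POSITION"),  // changes if columns are reordered
                        cols.getString("COLUMN_NAME"),    // changes if the column is renamed
                        cols.getString("TYPE_NAME"));
            }
        }
    }
}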

Google BigQuery: Inserting data into a temporary table for joining it onto another table

I am currently trying to push some data from Java into BigQuery, in order to use it only in the next query and then get rid of it.
The data consists of 3 columns which exist in the table I want to query. Therefore, by creating a temporary table containing this data, I could do a left join and get the query results I need.
This process will happen on a scheduled basis, with different sets of data.
Can you please tell me if that can be done?
Many thanks!
Using the jobs.query API, you can specify a destinationTable as part of configuration.query. Does that help? You can control the table expiration time using the tables.update API and setting expirationTime.
Alternatively, you can "inline" the table as part of the query that you want to run using a WITH clause in standard SQL rather than writing to a temporary table.
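
If you go the inline route, the pushed rows can be spliced into the query text as a WITH clause; a sketch, with hypothetical dataset/table/column names (real code should use query parameters rather than string formatting):

import java.util.List;

class InlineQueryBuilder {
    /** Builds a standard-SQL query that inlines the pushed rows as a WITH
     *  clause. Column names a, b, c and the table path are illustrative. */
    static String build(List<String[]> data) {
        StringBuilder rows = new StringBuilder();
        for (String[] r : data) {
            if (rows.length() > 0) rows.append(" UNION ALL ");
            rows.append(String.format("SELECT '%s' AS a, '%s' AS b, '%s' AS c",
                    r[0], r[1], r[2]));  // escape or parameterize in real code
        }
        return "WITH temp AS (" + rows + ") "
             + "SELECT t.*, temp.c "
             + "FROM `project.dataset.big_table` t "
             + "LEFT JOIN temp ON t.a = temp.a AND t.b = temp.b";
    }
}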

Can we store records retrieved by a query in a list in a stored procedure?

I am new to stored procedures. I am using Hibernate to retrieve data from the database. The client-server traffic is high, so I decided to move to stored procedures, doing the simple logic on the server side and returning only the needed values to the front end. Now I want to know: is there any way to store records in a list, so that I can iterate over the records in a loop, take them one by one, get a single field from a record, process it, and then return a value to the front end, like we do in Java with List, getters/setters and a generic class to store the needed entities? I am confused by this. Please advise and guide me to understand stored procedures better.
It sounds like you want to use a cursor over your query results, generate a temporary table, and then select the contents of that temporary table to return from your stored procedure.
You should be able to find plenty of examples online for cursors and temporary tables.
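
On the Java side, whatever the procedure finally selects can be read into a List via CallableStatement; a sketch with hypothetical procedure and column names:

import java.sql.*;
import java.util.*;

class ProcedureCaller {
    static List<String> fetchValues(Connection conn, int param) throws SQLException {
        List<String> values = new ArrayList<>();
        try (CallableStatement cs = conn.prepareCall("{call my_procedure(?)}")) {
            cs.setInt(1, param);
            try (ResultSet rs = cs.executeQuery()) {
                while (rs.next()) {
                    values.add(rs.getString("some_field"));  // the field processed per record
                }
            }
        }
        return values;
    }
}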

Avoiding for loops and using collection APIs instead (performance)

I have a piece of code from an old project.
The logic (in a high level) is as follows:
The user sends a series of {id,Xi} where id is the primary key of the object in the database.
The aim is that the database is updated, but the set of Xi values always remains unique.
I.e. if the user sends {1,X1} and in the database we have {1,X2},{2,X1}, the input should be rejected; otherwise we end up with duplicates, i.e. {1,X1},{2,X1}, where X1 appears twice in different rows.
In lower-level terms, the user sends a series of custom objects that encapsulate this information.
Currently the implementation uses "brute force", i.e. continuous for-loops over the input and the JDBC ResultSet to ensure uniqueness.
I do not like this approach; moreover, the actual implementation has subtle bugs, but that is another story.
I am searching for a better approach, both in terms of coding and performance.
What I was thinking is the following:
Create a Set from the user's input list. If the Set has a different size than the list, the user's input contains duplicates. Stop there.
Load the data via JDBC.
Create a HashMap<Long,String> from the user's input. The key is the primary key.
Loop over the result set. If the HashMap does not contain a key equal to the ResultSet row's id, add that row to the HashMap.
In the end, get the HashMap's values as a List. If it contains duplicates, reject the input.
This is the algorithm I came up with.
Is there a better approach than this? (I assume the algorithm itself is not erroneous.)
Purely from a performance point of view, why not let the database figure out that there are duplicates (like {1,X1},{2,X1})? Put a unique constraint in place on the table, and when the update statement fails by throwing an exception, catch it and deal with it however you want under these input conditions. You may also want to run this as a single transaction in case you need to roll back any partial updates. Of course this assumes that you don't have any other business rules driving the updates that you haven't mentioned here.
With your algorithm, you are spending too much time iterating over HashMaps and Lists to remove duplicates, IMHO.
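
A sketch of that constraint-backed approach, assuming a UNIQUE constraint on the value column; table and column names are illustrative:

import java.sql.*;

class ConstraintBackedUpdate {
    /** Attempts the update and lets the database enforce uniqueness.
     *  Returns false when the UNIQUE constraint rejects the value. */
    static boolean tryUpdate(Connection conn, long id, String value) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "UPDATE tbl SET value = ? WHERE id = ?")) {
            ps.setString(1, value);
            ps.setLong(2, id);
            ps.executeUpdate();
            return true;
        } catch (SQLIntegrityConstraintViolationException e) {
            return false;  // the value already exists in another row; reject the input
        }
    }
}

Run the whole batch in one transaction so a rejected item can roll back any partial updates, as suggested above.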
Since you can't change the database, as stated in the comments, I would probably extend your Set idea. Create a HashMap<Long, String> and put all of the items from the database in it, then also create a HashSet<String> with all of the values from your database in it.
Then as you go through the user input, check each key against the HashMap and see if the values are the same; if they are, great, you don't have to do anything because that exact input is already in your database.
If they aren't the same, then check the value against the HashSet to see if it already exists. If it does, then you have a duplicate.
This should perform much better than nested loops.
Edit:
For multiple updates, apply all of the updates to the HashMap created from your database, then once again check whether the size of a Set built from the Map's values differs from the size of its key set.
There might be a better way to do this, but this is the best I got.
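
A sketch of the map-plus-set check described above, assuming the user input arrives as an id-to-value map:

import java.util.*;

class DuplicateValueChecker {
    /** Rejects input when a value would collide with a value already stored
     *  under a different id. dbRows is id -> value loaded from the database. */
    static void check(Map<Long, String> dbRows, Map<Long, String> userInput) {
        Set<String> dbValues = new HashSet<>(dbRows.values());
        for (Map.Entry<Long, String> e : userInput.entrySet()) {
            String existing = dbRows.get(e.getKey());
            if (e.getValue().equals(existing)) continue;  // exact row already present
            if (dbValues.contains(e.getValue())) {
                throw new IllegalArgumentException("Duplicate value: " + e.getValue());
            }
        }
    }
}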
I'd opt for a database-side solution. Assuming a table with the columns id and value, you should make a list with all the "values", and use the following SQL:
select count(*) from tbl where value in (:values);
binding the :values parameter to the list of values however is appropriate for your environment. (Trivial when using Spring JDBC and a database that supports the in operator, less so for lesser setups. As a last resort you can generate the SQL dynamically.) You will get a result set with one row and one column of a numeric type. If it's 0, you can then insert the new data; if it's 1, report a constraint violation. (If it's anything else you have a whole new problem.)
If you need to check for every item in the user input, change the query to:
select value from tbl where value in (:values)
store the result in a set (called e.g. duplicates), and then loop over the user input items and check whether the value of the current item is in duplicates.
This should perform better than snarfing the entire dataset into memory.
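
A plain-JDBC sketch of that second query, generating the IN-list placeholders dynamically as suggested above; table and column names follow the answer:

import java.sql.*;
import java.util.*;

class DuplicateQuery {
    /** Returns the subset of candidate values that already exist in tbl.value. */
    static Set<String> findDuplicates(Connection conn, List<String> values) throws SQLException {
        Set<String> duplicates = new HashSet<>();
        if (values.isEmpty()) return duplicates;
        String placeholders = String.join(",", Collections.nCopies(values.size(), "?"));
        String sql = "SELECT value FROM tbl WHERE value IN (" + placeholders + ")";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (int i = 0; i < values.size(); i++) {
                ps.setString(i + 1, values.get(i));
            }
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) duplicates.add(rs.getString(1));
            }
        }
        return duplicates;
    }
}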

Insert fail then update OR Load and then decide if insert or update

I have a web service in Java that receives a list of information to be inserted or updated in a database. I don't know which entries are inserts and which are updates.
Which approach would obtain better performance results:
Iterate over the list (an object list, with the table PK in it) and try to insert each entry into the database. If the insert fails, run an update.
Try to load the entry from the database. If a result is retrieved, update; if not, insert the entry.
Another option? Tell me about it :)
In the first calls, I believe most of the entries will be new DB entries, but there will be a saturation point after which most of the entries will be updates.
I'm talking about a DB table that could reach over 100 million entries in a mature form.
What will be your approach? Performance is my most important goal.
If your database supports MERGE, I would have thought that was most efficient (and treats all the data as a single set).
See:
http://www.oracle.com/technology/products/oracle9i/daily/Aug24.html
https://web.archive.org/web/1/http://blogs.techrepublic%2ecom%2ecom/datacenter/?p=194
If performance is your goal, then first get rid of the word iterate from your vocabulary! Learn to do things in sets.
If you need to update or insert, always do the update first. Otherwise it is easy to find yourself updating the record you just inserted by accident. If you are doing this it helps to have an identifier you can look at to see if the record exists. If the identifier exists, then do the update otherwise do the insert.
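
A sketch of that update-first pattern, using the affected-row count reported by executeUpdate; table and column names are illustrative:

import java.sql.*;

class UpdateFirst {
    /** Update-first upsert: zero affected rows means the row does not
     *  exist yet, so fall through to the insert. */
    static void upsert(Connection conn, long id, String value) throws SQLException {
        try (PreparedStatement upd = conn.prepareStatement(
                "UPDATE my_table SET value = ? WHERE id = ?")) {
            upd.setString(1, value);
            upd.setLong(2, id);
            if (upd.executeUpdate() > 0) return;  // row existed and was updated
        }
        try (PreparedStatement ins = conn.prepareStatement(
                "INSERT INTO my_table (id, value) VALUES (?, ?)")) {
            ins.setLong(1, id);
            ins.setString(2, value);
            ins.executeUpdate();
        }
    }
}

Note that two concurrent callers can still race between the UPDATE and the INSERT, so a unique constraint on the key should back this up.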
The important thing is to understand the balance, or ratio, between the number of inserts and the number of updates in the list you receive. IMHO you should implement an abstract strategy that says "persist this in the database". Then create concrete strategies that (for example):
check for the primary key; if zero records are found, do the insert, else update
do the update and, if it fails, do the insert
others
And then pull the strategy to use (the fully qualified class name, for example) from a configuration file. This way you can switch from one strategy to another easily. If it is feasible (it could be, depending on your domain), you can add a heuristic that selects the best strategy based on the input entities in the set; a sketch follows.
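
A minimal sketch of that configurable-strategy idea, with hypothetical names:

import java.util.Properties;

interface PersistStrategy<T> {
    void persist(T entity) throws Exception;
}

class StrategyLoader {
    /** Instantiates the strategy whose fully qualified class name is stored
     *  under "persist.strategy" in the configuration file. */
    @SuppressWarnings("unchecked")
    static <T> PersistStrategy<T> load(Properties config) throws Exception {
        String className = config.getProperty("persist.strategy");
        return (PersistStrategy<T>) Class.forName(className)
                .getDeclaredConstructor()
                .newInstance();
    }
}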
MySQL supports this:
INSERT INTO foo
SET bar='baz', howmanybars=1
ON DUPLICATE KEY UPDATE howmanybars=howmanybars+1
Option 2 is not going to be the most efficient. The database will already be making this check for you when you do the actual insert or update in order to enforce the primary key. By making this check yourself you are incurring the overhead of a table lookup twice as well as an extra round trip from your Java code. Choose which case is the most likely and code optimistically.
Expanding on option 1, you can use a stored procedure to handle the insert/update. This example with PostgreSQL syntax assumes the insert is the normal case.
CREATE FUNCTION insert_or_update(_id INTEGER, _col1 INTEGER) RETURNS void
AS $$
BEGIN
    INSERT INTO my_table (id, col1)
    SELECT _id, _col1;
EXCEPTION WHEN unique_violation THEN
    -- the row already exists, so update it instead
    UPDATE my_table
    SET col1 = _col1
    WHERE id = _id;
END;
$$
LANGUAGE plpgsql;
You could also make the update the normal case and then check the number of rows affected by the update statement to determine if the row is actually new and you need to do an insert.
As alluded to in some other answers, the most efficient way to handle this operation is in one batch:
Take all of the rows passed to the web service and bulk insert them into a temporary table
Update rows in the master table from the temp table
Insert new rows in the master table from the temp table
Dispose of the temp table
The type of temporary table to use and most efficient way to manage it will depend on the database you are using.
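
A sketch of that batch flow with plain JDBC; the temp-table DDL and the UPDATE ... FROM syntax are PostgreSQL-flavored assumptions and vary by database:

import java.sql.*;
import java.util.*;

class BatchUpsert {
    static void upsertAll(Connection conn, Map<Long, String> rows) throws SQLException {
        try (Statement ddl = conn.createStatement()) {
            ddl.execute("CREATE TEMPORARY TABLE staging (id BIGINT, value VARCHAR(255))");
        }
        // bulk insert all incoming rows into the temp table in one batch
        try (PreparedStatement ins = conn.prepareStatement(
                "INSERT INTO staging (id, value) VALUES (?, ?)")) {
            for (Map.Entry<Long, String> e : rows.entrySet()) {
                ins.setLong(1, e.getKey());
                ins.setString(2, e.getValue());
                ins.addBatch();
            }
            ins.executeBatch();
        }
        try (Statement dml = conn.createStatement()) {
            // update existing rows, then insert the new ones
            dml.executeUpdate("UPDATE master m SET value = s.value "
                    + "FROM staging s WHERE m.id = s.id");
            dml.executeUpdate("INSERT INTO master (id, value) "
                    + "SELECT s.id, s.value FROM staging s "
                    + "WHERE NOT EXISTS (SELECT 1 FROM master m WHERE m.id = s.id)");
            dml.execute("DROP TABLE staging");
        }
    }
}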
