How to track pagination information for multi database connection? - java

I have the following setup:
MySQL server instance1
MySQL server instance2
Each instance has a single table whose records are partitioned. I have to retrieve the data from each instance and show it in JQGrid.
Here are the considerations:
1) Only 1000 records should be fetched from each database instance.
2) Merge these records and sort them in ascending order by a default column.
3) From the merged records, show only 1000 in the grid.
4) The next 1000 records must not include any record that has already been shown.
The major problem I am having is how to uniquely identify the last row shown from the fetched records.
I thought about doing it this way:
1) Get the rowid for each record from every connection. But the rowid could be the same on both instances, so how would I identify which record came from which database?
2) Check the combination of rowid and primary key. But if the client sets the same auto-increment value for the primary key on all instances, the combination would not be unique.
Am I missing something, or is there another way to do it?
I am using JDBC to connect to the databases.
[SOLVED IT]
Solved the problem by writing a small function that calculates, for each iteration, a map of the number of records to fetch from each connection.
Sorry, I can't add the code here as it is the client's IP.
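Since the original code cannot be shared, here is a minimal sketch of one way such a per-connection map could work. It is not the poster's implementation. The names MergedPager, Row and shown are hypothetical, and the in-memory lists stand in for what each connection would return when queried in sorted order. The idea is to merge the sorted batches and advance each source's counter only by the rows that were actually displayed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: merges per-source sorted batches into one page and
// records how many rows of each source have been shown so far.
final class MergedPager {

    // A merged row: the source name keeps rows distinct even when two
    // instances hand out the same primary key or rowid.
    static final class Row {
        final String source;
        final long sortKey;

        Row(String source, long sortKey) {
            this.source = source;
            this.sortKey = sortKey;
        }
    }

    // batches: rows of each source, already sorted ascending by the sort column
    // shown:   number of rows of each source already displayed (updated in place)
    static List<Row> nextPage(Map<String, List<Long>> batches,
                              Map<String, Integer> shown,
                              int pageSize) {
        List<Row> page = new ArrayList<>();
        while (page.size() < pageSize) {
            String best = null;
            long bestKey = 0;
            for (Map.Entry<String, List<Long>> e : batches.entrySet()) {
                int pos = shown.getOrDefault(e.getKey(), 0);
                if (pos >= e.getValue().size()) {
                    continue;                       // this source is exhausted
                }
                long key = e.getValue().get(pos);
                if (best == null || key < bestKey) {
                    best = e.getKey();
                    bestKey = key;
                }
            }
            if (best == null) {
                break;                              // nothing left in any source
            }
            page.add(new Row(best, bestKey));
            shown.merge(best, 1, Integer::sum);     // advance only this source
        }
        return page;
    }
}
```

With real connections, the count stored in shown for a source could serve as the OFFSET of that connection's next query (an assumption, not something stated in the question), so rows that were fetched but not displayed are fetched again on the next iteration.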

You usually identify a row uniquely with a POJO class in which you override hashCode() and equals() based on the class fields. You can put the last row into a HashMap and check against it.
A row number can be generated in the query itself using RANK() and PARTITION BY. Create a temporary (tmp_) table with your merged results, then drop it.
Hope this helps a bit.
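As a rough sketch of the POJO idea, the key class below includes the instance name in equals() and hashCode(), so identical primary keys from different databases stay distinct. The class names RowKey and ShownRows are hypothetical, and a HashSet is used here instead of a HashMap because only membership is needed.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical key class: equality covers the source instance as well as the
// primary key, so the same id coming from two databases is not a duplicate.
final class RowKey {
    private final String instance;
    private final long primaryKey;

    RowKey(String instance, long primaryKey) {
        this.instance = instance;
        this.primaryKey = primaryKey;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof RowKey)) return false;
        RowKey other = (RowKey) o;
        return primaryKey == other.primaryKey && instance.equals(other.instance);
    }

    @Override
    public int hashCode() {
        return Objects.hash(instance, primaryKey);
    }
}

// Remembers which rows were already displayed in the grid.
final class ShownRows {
    private final Set<RowKey> shown = new HashSet<>();

    // Returns true the first time a row is seen, false for a repeat.
    boolean markShown(String instance, long primaryKey) {
        return shown.add(new RowKey(instance, primaryKey));
    }
}
```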

If the two MySQL instances can see each other, how about creating a view that UNIONs the two query results? Then, on the Java side, you could query and paginate on this view.

Related

Get records that don't have their ID in a separate String Set - DynamoDB

I'm using Amazon DynamoDB to download a number of records to Android.
I have 2 tables.
Table 1 contains a Set of Strings containing IDs.
Table 2 has records, each with an individual ID.
I want to download 10 records from Table 2 only if the record ID does not appear in the Set of Strings in Table 1.
I can do this by downloading all the records in Table 2 and then not saving or displaying the ones that appear in the String Set in Table 1. However, is there a way to download only the ones that don't appear in the String Set?
Any ideas would be appreciated.
Many thanks
In order to query DynamoDB, the attribute needs to be either a range key or a partition key, which in turn needs to be scalar, so you cannot directly query what you want. Your best chance of doing what you want, if I understand your requirement, is a scan operation. Scan the whole of Table 2 and then use queryExpression to filter the results you get, using a nested query on Table 1. This requires you to make the "Set of Strings" either the partition key or the range key in your first table.

Hibernate concurrency creating a duplicate record on saveOrUpdate

I'm trying to implement a counter with Java, Spring, Hibernate and Oracle SQL. Each record represents a count for a given timestamp. Let's say each record is uniquely identified by the minute, and each record holds a count column. The service should expect to receive a ton of concurrent requests and may update the counter column of the same record.
In my table, if the record does not exist, just insert the record and set its count to 1. Otherwise, find the record by timestamp and increase its existing counter column by 1.
In order to maintain data consistency and integrity, I'm using pessimistic locking. For example, if 20 counts come in at the same time, not necessarily from the same user, it's possible that we overwrite the record based on a stale read of that record before updating. With locking, I'm ensuring that if 20 counts come in, the net effect on the database represents all 20 counts.
So locking is fine, but the problem is that if the record never existed in the first place, and two or more concurrent requests come in trying to update the not-yet-existent record, I've observed that a duplicate record gets inserted because we cannot lock a record that doesn't exist yet. How can we ensure that no duplicates get created in the table? Should it be controlled via Oracle? Or can I manage this via my app and Hibernate?
Thank you.
One way to avoid this sort of problem altogether would be to generate the count at the time you actually query the data. Oracle has an analytic function, ROW_NUMBER(), which can assign a row number to each record in the result set of a query. As a rough example, consider the following query:
SELECT
ts,
ROW_NUMBER() OVER (ORDER BY ts) rn
FROM yourTable
The count you want would be in the rn column, representing the number of records appearing since the first entry in the table. Of course, you could further restrict the query.
This approach is robust to removing records, as the count would always start with 1. One drawback is that row number functionality is not supported by Hibernate. You would have to run this either as a native query or a stored proc.

Validate existing values in the Postgres table before inserting the row

I am using JPA and Hibernate to connect to a table and insert data into it. I have a table, say User, which has three columns: ID, Name and Address. I have an entity class for it, and to insert the data I simply use the EntityManager object and persist the data in the DB, which works like a charm for me.
Now I have a scenario where I want to check whether the values that I am persisting already exist; if so, I have to log an error. Currently I do that by manually loading the rows from the table and manually checking whether the same values exist, which is fairly simple for the example table (User) that has only three columns. But what if I have a table with 30 columns?
Do I manually load the data based on one condition and check the other columns, or is there a better and shorter way to do that?
30 columns: is that your primary key? If the data you are checking for duplication is the primary key or a unique constraint, then you can use Hibernate to fetch the object before saving and report back if it exists. If the 30 columns are not part of the key, then I would use the equals method and, as such, fetch all rows. However, if there are many rows and this would be slow, then I would probably write a dedicated SQL query to check whether an object exists, i.e.
UserDao public boolean rowExists(User user) { ... }
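A minimal sketch of what such a dedicated existence check could look like with plain JDBC is shown below. The table name users, the column names and the two-argument signature are assumptions made for illustration; the answer only gives the method name rowExists.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical DAO; table and column names are assumptions for illustration.
final class UserDao {
    static final String EXISTS_SQL =
        "SELECT 1 FROM users WHERE name = ? AND address = ? LIMIT 1";

    private final Connection connection;

    UserDao(Connection connection) {
        this.connection = connection;
    }

    // Returns true if a row with the same name and address is already stored.
    boolean rowExists(String name, String address) throws SQLException {
        try (PreparedStatement ps = connection.prepareStatement(EXISTS_SQL)) {
            ps.setString(1, name);
            ps.setString(2, address);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next();   // any row at all means a duplicate exists
            }
        }
    }
}
```

Note that = never matches NULL, so nullable columns would need a comparison such as IS NOT DISTINCT FROM in PostgreSQL. For a 30-column table, a unique constraint on the relevant columns lets the database enforce the rule instead.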

Fetching records one by one from PostgreSql DB

There's a DB that contains approximately 300-400 records. I can make a simple query for fetching 30 records like:
SELECT * FROM table
WHERE isValidated = false
LIMIT 30
A few more words about the content of the table. There's a column named isValidated that can (as you correctly guessed) take one of two values: true or false. After a query, some of the records should be marked as validated (isValidated=true), approximately 5-6 records from each batch of 30. Consequently, each new query fetches again the records from the previous query that are still isValidated=false, so with this approach I'll never get to the end of the table.
The validation process is made with Java + Hibernate. I'm new to Hibernate, so I use Criterion for making this simple query.
Are there any best practices for such a task? The variant of adding a flag field (that marks records which were already fetched) is inappropriate (over-engineering for this DB).
Maybe there's an opportunity to create some virtual table where records that were already processed will be stored or something like this. BTW, after all the records are processed, it is planned to start processing them again (it is possible, that some of them need to be validated).
Thank you for your help in advance.
I can imagine several solutions:
store everything in memory. You only have 400 records, and it could be a perfectly fine solution given this small number
use an order by clause (which you should do anyway) on a unique column (the PK, for example), store the ID of the last loaded record, and make sure the next query uses where ID > :lastId
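The second option can be sketched as follows. The in-memory list stands in for the table, and the class and method names are hypothetical; with Hibernate, fetchAfter would be a query along the lines of WHERE isValidated = false AND id > :lastId ORDER BY id with a limit of 30.

```java
import java.util.ArrayList;
import java.util.List;

// In-memory stand-in for the table; ids are assumed unique and sorted ascending.
final class KeysetPager {
    private final List<Long> idsInTable;

    KeysetPager(List<Long> idsInTable) {
        this.idsInTable = idsInTable;
    }

    // Mirrors: SELECT id FROM table WHERE id > :lastId ORDER BY id LIMIT :limit
    List<Long> fetchAfter(long lastId, int limit) {
        List<Long> page = new ArrayList<>();
        for (long id : idsInTable) {
            if (id > lastId && page.size() < limit) {
                page.add(id);
            }
        }
        return page;
    }

    // Walks the whole table page by page and returns the number of rows visited.
    int processAll(int pageSize) {
        long lastId = Long.MIN_VALUE;
        int visited = 0;
        List<Long> page;
        while (!(page = fetchAfter(lastId, pageSize)).isEmpty()) {
            visited += page.size();
            lastId = page.get(page.size() - 1);   // remember the cursor
        }
        return visited;
    }
}
```

Because the cursor depends only on the ID, records that stay unvalidated are not returned again in the same pass. Resetting lastId starts the next pass over the table.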

Insert fail then update OR Load and then decide if insert or update

I have a web service in Java that receives a list of items to be inserted or updated in a database. I don't know in advance which ones need an insert and which an update.
Which approach gives the best performance:
Iterate over the list (an object list with the table PK on it) and try to insert each entry into the database. If the insert fails, run an update.
Try to load the entry from the database. If a result is retrieved, update it; if not, insert the entry.
Another option? Tell me about it :)
In the first calls, I believe most of the entries will be new DB entries, but there will be a saturation point after which most of the entries will be updates.
I'm talking about a DB table that could reach over 100 million entries in its mature form.
What would your approach be? Performance is my most important goal.
If your database supports MERGE, I would have thought that was most efficient (and treats all the data as a single set).
See:
http://www.oracle.com/technology/products/oracle9i/daily/Aug24.html
https://web.archive.org/web/1/http://blogs.techrepublic%2ecom%2ecom/datacenter/?p=194
If performance is your goal, then first get rid of the word "iterate" from your vocabulary! Learn to do things in sets.
If you need to update or insert, always do the update first. Otherwise it is easy to find yourself updating the record you just inserted by accident. If you are doing this it helps to have an identifier you can look at to see if the record exists. If the identifier exists, then do the update otherwise do the insert.
The important thing is to understand the ratio between the number of inserts and the number of updates in the list you receive. IMHO you should implement an abstract strategy that says "persist this in the database". Then create concrete strategies that (for example):
check for the primary key; if zero records are found, do the insert, else do the update
do the update and, if it fails, do the insert
others
Then pull the strategy to use (the fully qualified class name, for example) from a configuration file. This way you can switch from one strategy to another easily. If it is feasible (this could depend on your domain), you can add a heuristic that selects the best strategy based on the input entities in the set.
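A minimal sketch of this strategy idea follows. All names are hypothetical, and a Map stands in for the database table so that the example stays self-contained; real strategies would issue SQL instead.

```java
import java.util.Map;

// Hypothetical strategy interface; the Map stands in for the database table.
interface PersistStrategy {
    void persist(Map<Long, String> table, long id, String value);
}

// Looks the key up first, then inserts or updates.
final class CheckThenWrite implements PersistStrategy {
    public void persist(Map<Long, String> table, long id, String value) {
        if (table.containsKey(id)) {
            table.replace(id, value);      // update
        } else {
            table.put(id, value);          // insert
        }
    }
}

// Tries the update first and inserts only if no row was touched.
final class UpdateThenInsert implements PersistStrategy {
    public void persist(Map<Long, String> table, long id, String value) {
        boolean updated = table.replace(id, value) != null;
        if (!updated) {
            table.put(id, value);
        }
    }
}

final class StrategyLoader {
    // className would come from a configuration file.
    static PersistStrategy load(String className) {
        try {
            return (PersistStrategy) Class.forName(className)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load strategy " + className, e);
        }
    }
}
```

Switching strategies then only requires changing the configured class name, for example from CheckThenWrite to UpdateThenInsert once most incoming entries are updates.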
MySQL supports this:
INSERT INTO foo
SET bar='baz', howmanybars=1
ON DUPLICATE KEY UPDATE howmanybars=howmanybars+1
Option 2 is not going to be the most efficient. The database will already be making this check for you when you do the actual insert or update in order to enforce the primary key. By making this check yourself you are incurring the overhead of a table lookup twice as well as an extra round trip from your Java code. Choose which case is the most likely and code optimistically.
Expanding on option 1, you can use a stored procedure to handle the insert/update. This example with PostgreSQL syntax assumes the insert is the normal case.
CREATE FUNCTION insert_or_update(_id INTEGER, _col1 INTEGER) RETURNS void
AS $$
BEGIN
    -- Optimistic path: assume the row is new.
    INSERT INTO
        my_table (id, col1)
    SELECT
        _id, _col1;
EXCEPTION WHEN unique_violation THEN
    -- The key already exists, so update the row instead.
    UPDATE
        my_table
    SET
        col1 = _col1
    WHERE
        id = _id;
END;
$$
LANGUAGE plpgsql;
You could also make the update the normal case and then check the number of rows affected by the update statement to determine if the row is actually new and you need to do an insert.
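From the Java side, the update-first variant could look roughly like the sketch below. It reuses the my_table(id, col1) layout from the function above; the class and method names are hypothetical.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical helper that treats the update as the normal case.
final class UpsertDao {
    static final String UPDATE_SQL = "UPDATE my_table SET col1 = ? WHERE id = ?";
    static final String INSERT_SQL = "INSERT INTO my_table (id, col1) VALUES (?, ?)";

    // Returns true if a new row was inserted, false if an existing row was updated.
    static boolean updateOrInsert(Connection connection, int id, int col1) throws SQLException {
        try (PreparedStatement update = connection.prepareStatement(UPDATE_SQL)) {
            update.setInt(1, col1);
            update.setInt(2, id);
            if (update.executeUpdate() > 0) {
                return false;               // the row existed and was updated
            }
        }
        // No row was affected, so the key is new.
        try (PreparedStatement insert = connection.prepareStatement(INSERT_SQL)) {
            insert.setInt(1, id);
            insert.setInt(2, col1);
            insert.executeUpdate();
            return true;
        }
    }
}
```

A concurrent writer can still insert the same key between the two statements, so the caller should be prepared for a unique-constraint violation on the insert.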
As alluded to in some other answers, the most efficient way to handle this operation is in one batch:
Take all of the rows passed to the web service and bulk insert them into a temporary table
Update rows in the master table from the temp table
Insert new rows in the master table from the temp table
Dispose of the temp table
The type of temporary table to use and most efficient way to manage it will depend on the database you are using.
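The four steps above can be sketched with JDBC batching as follows. This assumes PostgreSQL syntax and the my_table(id, col1) layout used earlier; the staging table name and the BatchUpsert class are hypothetical, and other databases will need different SQL.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Map;

// Hypothetical batch upsert; the SQL strings assume PostgreSQL.
final class BatchUpsert {
    static final String CREATE_TEMP =
        "CREATE TEMPORARY TABLE staging (id INTEGER PRIMARY KEY, col1 INTEGER) ON COMMIT DROP";
    static final String LOAD_TEMP = "INSERT INTO staging (id, col1) VALUES (?, ?)";
    static final String UPDATE_EXISTING =
        "UPDATE my_table m SET col1 = s.col1 FROM staging s WHERE m.id = s.id";
    static final String INSERT_NEW =
        "INSERT INTO my_table (id, col1) SELECT s.id, s.col1 FROM staging s "
        + "WHERE NOT EXISTS (SELECT 1 FROM my_table m WHERE m.id = s.id)";

    static void upsertAll(Connection connection, Map<Integer, Integer> rows) throws SQLException {
        boolean oldAutoCommit = connection.getAutoCommit();
        connection.setAutoCommit(false);            // one transaction for all four steps
        try (Statement st = connection.createStatement()) {
            st.executeUpdate(CREATE_TEMP);
            try (PreparedStatement load = connection.prepareStatement(LOAD_TEMP)) {
                for (Map.Entry<Integer, Integer> row : rows.entrySet()) {
                    load.setInt(1, row.getKey());
                    load.setInt(2, row.getValue());
                    load.addBatch();
                }
                load.executeBatch();                // bulk insert into the temp table
            }
            st.executeUpdate(UPDATE_EXISTING);      // update rows already in the master table
            st.executeUpdate(INSERT_NEW);           // insert the remaining rows
            connection.commit();                    // ON COMMIT DROP disposes of the temp table
        } catch (SQLException e) {
            connection.rollback();
            throw e;
        } finally {
            connection.setAutoCommit(oldAutoCommit);
        }
    }
}
```

The update runs before the insert so that freshly inserted rows are not updated a second time in the same batch.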
