I have a typical scenario and need to understand the best possible way to handle this, so here it goes:
I'm developing a solution that will retrieve data from a remote SOAP-based web service and then push that data to an Oracle database on the network.
Also, this will be a scheduled task that will execute every 15 minutes.
The remote service has event queues containing the INSERT/UPDATE/DELETE operations performed since the last retrieval; once I retrieve the events for the last 15 minutes, it starts accumulating events for the next retrieval.
Now, it's just pushing data to Oracle, so all my interactions are INSERT and UPDATE statements.
There are around 60 tables in Oracle, some of them with 100+ columns. Moreover, each 15-minute cycle involves around 60-70 inserts, 100+ updates and 10-20 deletes.
This will be an executable JAR file that terminates after each run and starts again on the next 15-minute cycle.
So, I need to understand how I should handle the WRITE operations (best practices) to improve the performance of this application as a whole.
Current Test Code (on every cycle) -
Connects to remote service to get events.
Creates a connection with DB (single connection object).
Identifies the type of operation (INSERT/UPDATE/DELETE) and the table on which it was done.
Then calls the respective method based on the type of operation and the table.
Uses a PreparedStatement with positional parameters, retrieves each column value from the remote service and assigns it to the statement parameters.
Commits the statement and returns to the get-event class to process the next event.
The above is repeated until all the retrieved events are processed, after which the program closes and then starts again on the next cycle.
Thanks for the help!
If you are inserting or updating one row at a time, consider executing a batch insert or a batch update instead. Once you are inserting or updating more than a handful of rows, batching gives much better performance.
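A minimal sketch of that with the JDBC batch API; the table and column names here (EVENTS, ID, PAYLOAD) and the events map are made-up placeholders for your own schema and data:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Map;

// Hypothetical: batch-insert a set of (id, payload) pairs in a single round trip.
public static void insertEvents(Connection conn, Map<Long, String> events) throws SQLException {
    conn.setAutoCommit(false);                       // one commit for the whole batch
    try (PreparedStatement ps = conn.prepareStatement(
            "INSERT INTO EVENTS (ID, PAYLOAD) VALUES (?, ?)")) {
        for (Map.Entry<Long, String> e : events.entrySet()) {
            ps.setLong(1, e.getKey());
            ps.setString(2, e.getValue());
            ps.addBatch();                           // queue the row instead of executing it now
        }
        ps.executeBatch();                           // send all queued rows at once
        conn.commit();
    } catch (SQLException ex) {
        conn.rollback();
        throw ex;
    }
}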
The number of DB operations you are talking about (200 every 15 minutes) is tiny and will be easy to finish in less than 15 minutes. Some concrete suggestions:
You should profile your application to understand where it is spending its time. If you don't do this, then you don't know what to optimize next and you don't know if something you did helped or hurt.
If possible, try to get all of the events in one round-trip to the remote server.
You should reuse the connection to the remote service (probably by using a library that supports connection persistence and reuse).
You should reuse the DB connections by using a connection pooling library rather than creating a new connection for each insert/update/delete. Believe it or not, creating the connection probably takes 100+ times as long as doing your DB operation once you have the connection in hand.
You should consider doing multiple (or all) of the database operations in the same transaction rather than creating a new transaction for each row that is changed. However, you should carefully consider your failure modes such that you don't lose any events (if that is an important consideration).
You should consider utilizing prepared statement caching. This may help, but maybe not if Oracle is configured properly.
You should consider trying to analyze your operations to find any that can be batched together. This can be a lot faster if you have some "hot" operations that get done often.
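Putting a few of those suggestions together (pooling, a single transaction per cycle, and batching), here is a rough sketch; HikariCP is just one example of a pooling library, and the URL, credentials, table and Event class are placeholders:

import com.zaxxer.hikari.HikariDataSource;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public final class CycleWriter {
    public record Event(long id, String col1) {}               // placeholder for your parsed remote event

    public static void runCycle(HikariDataSource ds, List<Event> insertEvents) throws SQLException {
        try (Connection conn = ds.getConnection()) {           // borrowed from the pool, not newly created
            conn.setAutoCommit(false);                         // one transaction for the whole cycle
            try (PreparedStatement ins = conn.prepareStatement(
                    "INSERT INTO SOME_TABLE (ID, COL1) VALUES (?, ?)")) {
                for (Event e : insertEvents) {
                    ins.setLong(1, e.id());
                    ins.setString(2, e.col1());
                    ins.addBatch();
                }
                ins.executeBatch();
            }
            conn.commit();                                     // or rollback() on failure so no events are lost
        }
    }
}

The pool itself would be created once per run (new HikariDataSource(), setJdbcUrl(...), setUsername(...), setPassword(...)) and closed at the end; within one 15-minute execution every table's method borrows from it instead of opening its own connection.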
"I've a typical scenario"
No you haven't. You have a bespoke architecture, with a unique data model, unique data and unique business requirements. That's not a bad thing, it's the state of pretty much every computer system that's not been bought off-the-shelf (and even some of them).
So, it's an experiment and you must approach it as such. There is no "best practice". Try various things and see what works best.
"need to understand best possible way to handle this"
You will improve your chances of success enormously by hiring somebody who understands Oracle databases.
I am working on a Java web application that uses Weblogic to connect to an Informix database. In the application we have multiple threads creating records in a table.
It happens pretty often that it fails and the following error is thrown:
java.sql.SQLException: Could not do a physical-order read to fetch next row....
Caused by: java.sql.SQLException: ISAM error: record is locked.
I am assuming that both threads are trying to insert or update when the record is locked.
I did some research and found that there is an option to configure the database so that, instead of throwing an error, it waits for the lock to be released.
SET LOCK MODE TO WAIT;
SET LOCK MODE TO WAIT 17;
I don't think there is an option in JDBC to use this setting. How do I go about using it in my Java web app?
You can always just send that SQL straight up: create a Statement with createStatement() and execute that exact SQL.
The more 'normal' / modern approach to this problem is a combination of MVCC, the transaction level 'SERIALIZABLE', retry, and random backoff.
I have no idea if Informix is anywhere near that advanced, though. Modern DBs such as Postgres are (mysql does not count as modern for the purposes of MVCC/serializable/retry/backoff, and transactional safety).
Doing MVCC/Serializable/Retry/Backoff in raw JDBC is very complicated; use a library such as JDBI or JOOQ.
MVCC: A mechanism whereby transactions are shallow clones of the underlying data. 2 separate transactions can both read and write to the same records in the same table without getting in each other's way. Things aren't 'saved' until you commit the transaction.
SERIALIZABLE: A transaction level (also called isolation level), settable with jdbcDbObj.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE); - the safest level. If you know how version control systems work: You're asking the database to aggressively rebase everything so that the entire chain of commits is ordered into a single long line of events: Each transaction acts as if it was done after the previous transaction was completed. The simplest way to implement this level is to globally lock all the things. This is, of course, very detrimental to multithread performance. In practice, good DB engines (such as Postgres) are smarter than that: Multiple threads can simultaneously run transactions without just being frozen and waiting for locks; the DB engine instead checks whether the things that the transaction did (not just writing, also reading) are conflict-free with simultaneous transactions. If yes, it's all allowed. If not, all but one simultaneous transaction throw a retry exception. This is the only level that lets you do this sequence of events safely:
1. Fetch the balance of isaace's bank account.
2. Fetch the balance of rzwitserloot's bank account.
3. Subtract €10 from isaace's number, failing if the balance is insufficient.
4. Add €10 to rzwitserloot's number.
5. Write isaace's new balance to the DB.
6. Write rzwitserloot's new balance to the DB.
7. Commit the transaction.
Any level less than SERIALIZABLE will silently fail the job; if multiple threads do the above simultaneously, no SQLExceptions occur, but the sum of the balances of isaace and rzwitserloot will change over time (money is lost or created: between steps 1/2 and steps 5/6/7, another thread sets new balances, but those new balances are then lost by the writes in steps 5/6). With SERIALIZABLE, that cannot happen.
RETRY: The way smart DBs solve the problem is by failing (with a 'retry' error) all but one transaction, by checking whether all SELECTs done by the entire transaction are unaffected by any transactions that have been committed to the db after this transaction was opened. If they are affected (some SELECTs would have gone differently), the transaction fails. The point of this error is to tell the code that ran the transaction to just start from the top and do it again. Most likely this time there won't be a conflict and it will work. The assumption is that conflicts CAN occur but usually do not occur, so it is better to assume 'fair weather' (no locks, just do your stuff), check afterwards, and try again in the exotic scenario that it conflicted, vs. trying to lock rows and tables. Note that, for example, ethernet works the same way (assume fair weather, recover from errors afterwards).
BACKOFF: One problem with retry is that computers are too consistent: If 2 threads get in the way of each other, they can both fail, both try again, just to fail again, forever. The solution is that the threads twiddle their thumbs for a random amount of time, to guarantee that at some point, one of the two conflicting retriers 'wins'.
In other words, if you want to do it 'right' (see the bank account example), but also relatively 'fast' (not globally locking), get a DB that can do this, and use JDBI or JOOQ; otherwise, you'd have to write code to run all DB stuff in a lambda block, catch the SQLException, check the SqlState to see if it is indicating that you should retry (sqlstate codes are DB-engine specific), and if yes, rerun that lambda, after waiting an exponentially increasing amount of time that also includes a random factor. That's fairly complicated, which is why I strongly advise you rely on JOOQ or JDBI to take care of this for you.
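For reference, a hand-rolled version of that retry loop might look roughly like this; treating SQLState class "40" as the "retry me" signal is an assumption that holds for engines like Postgres, so check your own engine's codes:

import java.sql.Connection;
import java.sql.SQLException;
import java.util.concurrent.ThreadLocalRandom;
import javax.sql.DataSource;

public final class SerializableRetry {
    @FunctionalInterface
    public interface TxWork { void run(Connection conn) throws SQLException; }

    // Runs 'work' in a SERIALIZABLE transaction, retrying with randomized exponential backoff.
    public static void run(DataSource ds, TxWork work) throws SQLException, InterruptedException {
        long backoffMillis = 50;
        for (int attempt = 0; attempt < 10; attempt++) {
            try (Connection conn = ds.getConnection()) {
                conn.setAutoCommit(false);
                conn.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);
                try {
                    work.run(conn);              // your reads and writes on this connection
                    conn.commit();
                    return;
                } catch (SQLException e) {
                    conn.rollback();
                    // SQLState class "40" usually means "serialization conflict, try again"; codes are engine-specific.
                    if (e.getSQLState() == null || !e.getSQLState().startsWith("40")) {
                        throw e;                 // a genuine error, not a retry hint
                    }
                }
            }
            // Twiddle thumbs for a random, growing amount of time so competing threads diverge.
            Thread.sleep(backoffMillis + ThreadLocalRandom.current().nextLong(backoffMillis));
            backoffMillis *= 2;
        }
        throw new SQLException("Gave up after repeated serialization conflicts");
    }
}

JDBI and jOOQ wrap essentially this loop for you, which is why I'd rather lean on them.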
If you aren't ready for that level of DB usage, just create a statement and send "SET LOCK MODE TO WAIT 17;" as an SQL statement straight up, at the start of every connection you open. If you're using a connection pool there is usually a place where you can configure SQL statements to be run on connection start.
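For example; the URL and credentials are placeholders, only the SET LOCK MODE statement is the point:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

// Open a connection and immediately set the lock-wait mode on it.
static Connection openWithLockWait() throws SQLException {
    Connection conn = DriverManager.getConnection(
            "jdbc:informix-sqli://dbhost:1526/mydb:INFORMIXSERVER=myserver", "user", "password");
    try (Statement st = conn.createStatement()) {
        st.execute("SET LOCK MODE TO WAIT 17");   // wait up to 17 seconds for locks instead of failing
    }
    return conn;
}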
The Informix JDBC driver does allow you to automatically set the lock wait mode when you connect to the server.
Simply pass the following parameter via the DataSource or the connection URL:
IFX_LOCK_MODE_WAIT=17
The values for JDBC are
(-1) Wait forever
(0) not wait (default)
(> 0) wait this many seconds
See https://www.ibm.com/support/knowledgecenter/SSGU8G_14.1.0/com.ibm.jdbc.doc/ids_jdbc_040.htm
Connection conn = DriverManager.getConnection(
    "jdbc:Informix-sqli://cleo:1550:IFXHOST=cleo;PORTNO=1550;user=rdtest;password=my_passwd;IFX_LOCK_MODE_WAIT=17");
I would like to ask for some advice concerning my problem.
I have a batch job that does some computation (in a multi-threading environment) and does some inserts into a table.
I would like to do something like a batch insert, meaning that once I get a query, I wait until I have 1000 queries, for instance, and then execute the batch insert (rather than doing them one by one).
I was wondering if there is any design pattern for this.
I have a solution in mind, but it's a bit complicated:
build a method that will receive the queries
add them to a list (the string and/or the statements)
do not execute until the list has 1000 items
The problem: how do I handle the end?
What I mean is: when do I execute the last (up to) 999 queries, since I'll never get to 1000?
What should I do?
I'm thinking of a thread that wakes up every 5 minutes and checks the number of items in the list. If it wakes up twice and the number is the same, execute the existing queries.
Does anyone have a better idea?
Your database driver needs to support batch inserting. See this.
Have you established your system is choking on network traffic because there is too much communication between the service and the database? If not, I wouldn't worry about batching, until you are sure you need it.
You mention that in your plan you want to check every 5 minutes. That's an eternity. If you are going to get 1000 items in 5 minutes, you shouldn't need batching. That's ~ 3 a second.
Assuming you do want to batch, have a process wake up every 2 seconds and commit whatever is queued up. Don't wait five minutes. It might commit 0 rows, it might commit 10; who cares? With this approach, you don't need to worry that your arbitrary threshold hasn't been met.
I'm assuming that the inserts come in one at a time. If your incoming data comes in n at once, I would just commit on every incoming request, no matter how many inserts it contains. If your messages are coming in through some sort of messaging system, it's asynchronous anyway, so you shouldn't need to worry about batching. Under high load, the incoming messages just wait till there is capacity to handle them.
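A sketch of that "wake up every couple of seconds and flush whatever is queued" idea; the queue payload, table and SQL are placeholders for whatever your computation produces:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import javax.sql.DataSource;

public final class PeriodicFlusher {
    private final ConcurrentLinkedQueue<String> pending = new ConcurrentLinkedQueue<>();
    private final DataSource dataSource;         // assumed to be a pooled DataSource

    public PeriodicFlusher(DataSource dataSource) {
        this.dataSource = dataSource;
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(this::flush, 2, 2, TimeUnit.SECONDS);
    }

    public void submit(String value) {            // called by the computing threads
        pending.add(value);
    }

    private void flush() {
        List<String> batch = new ArrayList<>();
        for (String v; (v = pending.poll()) != null; ) {
            batch.add(v);
        }
        if (batch.isEmpty()) return;              // committed 0 rows this time; who cares
        try (Connection conn = dataSource.getConnection();
             PreparedStatement ps = conn.prepareStatement("INSERT INTO RESULTS (VALUE) VALUES (?)")) {
            conn.setAutoCommit(false);
            for (String v : batch) {
                ps.setString(1, v);
                ps.addBatch();
            }
            ps.executeBatch();
            conn.commit();
        } catch (Exception e) {
            e.printStackTrace();                  // real code would retry or re-queue the batch
        }
    }
}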
Add a commit kind of method to that API that will be called to confirm all items have been added. Also, the optimum batch size is somewhere in the range 20-50. After that the potential gain is outweighed by the bookkeeping necessary for a growing number of statements. You don't mention it explicitly, but of course you must use the dedicated batch API in JDBC.
If you need to keep track of many writers, each in its own thread, then you'll also need a begin kind of method and you can count how many times it was called, compared to how many times commit was called. Something like reference-counting. When you reach zero, you know you can flush your statement buffer.
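A bare-bones version of that reference-counting idea; flushBatch() stands in for whatever JDBC batch execution you already have:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public final class CountingBatcher {
    private final AtomicInteger activeWriters = new AtomicInteger();
    private final List<String> buffer = new ArrayList<>();

    public void begin() { activeWriters.incrementAndGet(); }     // each writer thread calls this first

    public synchronized void add(String statement) {
        buffer.add(statement);
        if (buffer.size() >= 50) flushBatch();                    // 20-50 is the suggested sweet spot
    }

    public void commit() {
        if (activeWriters.decrementAndGet() == 0) {
            synchronized (this) { flushBatch(); }                 // last writer flushes the remainder
        }
    }

    private void flushBatch() {
        if (buffer.isEmpty()) return;
        // execute the buffered statements via the JDBC batch API here, then:
        buffer.clear();
    }
}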
This is a common situation; I have faced it many times. As I understand your problem, you are creating a batch that has 1000 or more insert queries, and you are repeatedly inserting into the same table.
To avoid this type of situation you can make the insert query like this:
INSERT INTO table1 VALUES('4','India'),('5','Odisha'),('6','Bhubaneswar')
It executes only once with multiple values. So you can keep all the values in a collection (an ArrayList, a List, etc.) and finally build a query like the one above and insert everything at once.
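A sketch of building such a multi-row insert from a collection, binding the values through placeholders instead of concatenating them into the SQL (table and columns follow the example above):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Builds INSERT INTO table1 VALUES (?, ?), (?, ?), ... and binds each row from the list.
public static void insertAll(Connection conn, List<String[]> rows) throws SQLException {
    StringBuilder sql = new StringBuilder("INSERT INTO table1 VALUES ");
    for (int i = 0; i < rows.size(); i++) {
        sql.append(i == 0 ? "(?, ?)" : ", (?, ?)");
    }
    try (PreparedStatement ps = conn.prepareStatement(sql.toString())) {
        int p = 1;
        for (String[] row : rows) {
            ps.setString(p++, row[0]);   // e.g. "4"
            ps.setString(p++, row[1]);   // e.g. "India"
        }
        ps.executeUpdate();              // one statement, many rows
    }
}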
You can also use the JDBC transaction API (commit(), rollback(), setAutoCommit(), etc.).
Hope it helps.
All the best.
Can you help me with these problems:
A. We have a table on which read and write operations happen simultaneously. Writes happen very frequently, so reads are very slow; sometimes my web application does not even come up because of the heavy write load on this table. How can I handle such a scenario? Writes happen through a different Java application while reads happen through our web application, so the web application becomes very slow. Any ideas?
B. Writes to this table happen through 200 threads; these threads take connections from a connection pool and write into the table, and this application runs 24/7. Could thread priority be the issue that is blocking the read operations from the web application?
C. Can we have master-master replication for that table only, so that writes go to one table and reads come from the other, with data migrating from one table to the other every two minutes?
Please advise.
Thanks in advance.
Check the connection pool size - maybe it's too small and your threads waste time waiting for a connection from the pool.
Check your database settings; if you are just running it with out-of-the-box parameters, there may be good room for improvement.
You probably need some kind of event-driven system - when the vehicle sends data, the DB is not updated directly; instead a message is added to some queue (e.g. JMS). Your app then caches the data on startup, and updates both the cache and the database upon receiving this message. The key thing is that the only component that interacts with the DB is your app, and data changes only when you receive an event - so you don't need to query the DB to read the data, plus you can do the updates in the background using only a few threads, etc. There are quite good open-source messaging systems (e.g. Apache ActiveMQ) and caching libraries (e.g. Ehcache), so you can build a reasonably performant and fault-tolerant system without too much effort.
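A rough sketch of the receiving side with plain JMS and ActiveMQ; the broker URL, queue name, message format and the cache/DB calls are placeholders:

import javax.jms.Connection;
import javax.jms.MessageConsumer;
import javax.jms.Session;
import javax.jms.TextMessage;
import org.apache.activemq.ActiveMQConnectionFactory;

public final class VehicleEventListener {
    public static void main(String[] args) throws Exception {
        ActiveMQConnectionFactory factory = new ActiveMQConnectionFactory("tcp://localhost:61616");
        Connection connection = factory.createConnection();
        connection.start();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        MessageConsumer consumer = session.createConsumer(session.createQueue("vehicle.updates"));

        consumer.setMessageListener(message -> {
            try {
                String payload = ((TextMessage) message).getText();
                // 1. update the in-memory cache the web app reads from
                // 2. persist the change to the DB in the background (batched if you like)
            } catch (Exception e) {
                e.printStackTrace();     // real code: retry / dead-letter handling
            }
        });
    }
}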
I guess introducing messaging would be a serious reengineering effort, so to solve your immediate problem replication might be the best solution - merge data from the updateable table into another one every 2 minutes, and have the tracker read that other table; obviously this works well if you only read the data in the web app and don't update it, otherwise you need to put a lot of effort into keeping the 2 tables in sync. A variation of that is batching - data from the vehicle are inserted into an intermediate table, and then every 2 minutes transferred into the main table from which the reader queries them; the intermediate table is cleaned after the transfer.
The one true way to solve this is to use a queue of write events and to stop the writing periodically so that the reader has a chance (sketched after the list below).
Create a queue for incoming write updates
Create an atomicXXX (see java.util.concurrent) to use as a lock
Create a thread pool to read from the queue and execute the updates when the lock is unset
Use javax.swing.Timer to periodically set the lock and read the table data.
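A rough sketch of that structure; I've used a ScheduledExecutorService in place of javax.swing.Timer (there is no UI thread here), and the event type, pool size and intervals are illustrative only:

import java.sql.Connection;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import javax.sql.DataSource;

public final class WriteThrottler {
    private final BlockingQueue<String> writeQueue = new LinkedBlockingQueue<>();   // incoming write events
    private final AtomicBoolean readersTurn = new AtomicBoolean(false);             // the "lock"
    private final DataSource dataSource;

    public WriteThrottler(DataSource dataSource) {
        this.dataSource = dataSource;

        // Writer pool: drains the queue whenever reads are not being given priority.
        ExecutorService writers = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 4; i++) {
            writers.submit(() -> {
                try {
                    while (true) {
                        String event = writeQueue.take();
                        while (readersTurn.get()) {
                            Thread.sleep(50);            // back off while a read pass is running
                        }
                        try (Connection conn = dataSource.getConnection()) {
                            // execute the update for 'event' here
                        }
                    }
                } catch (Exception e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        // Periodically set the lock and give the web app's reads a clear window.
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(() -> {
            readersTurn.set(true);
            try {
                // run the read / table query the web app needs here
            } finally {
                readersTurn.set(false);
            }
        }, 0, 10, TimeUnit.SECONDS);
    }

    public void submitWrite(String event) { writeQueue.add(event); }
}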
Before trying anything too complicated try this perhaps:
1) Don't use Thread priorities, they are rarely what you want.
2) Set up your own priority scheme, perhaps simply by having a (priority) queue for both reads and writes where reads are prioritized. That is: add read and write requests to a single queue and have them block or be notified of the result (see the sketch after this list).
3) Check your database's features for optimizing write-heavy tables.
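For point 2, a minimal sketch of a single prioritized queue in front of the database; the priority values and Runnable payloads are illustrative:

import java.util.Comparator;
import java.util.concurrent.PriorityBlockingQueue;

public final class PrioritizedDbWorker {
    private record DbRequest(int priority, Runnable work) {}    // lower number = served first

    private final PriorityBlockingQueue<DbRequest> queue =
            new PriorityBlockingQueue<>(11, Comparator.comparingInt(DbRequest::priority));

    public void submitRead(Runnable work)  { queue.add(new DbRequest(0, work)); }   // reads jump ahead
    public void submitWrite(Runnable work) { queue.add(new DbRequest(1, work)); }

    // A single worker (or a small pool) drains the queue, always taking reads first.
    public void startWorker() {
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    queue.take().work().run();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.setDaemon(true);
        worker.start();
    }
}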
I have a long-running job that updates thousands of entity groups. I want to kick off a second job afterwards that will have to assume all of those items have been updated. Since there are so many entity groups, I can't do it in a transaction, so I've just scheduled the second job to run 15 minutes after the first completes, using task queues.
Is there a better way?
Is it even safe to assume that 15 minutes gives a promise that the datastore is in sync with my previous calls?
I am using high replication.
In the Google I/O videos about the HRD, they give a list of ways to deal with eventual consistency. One of them was to "accept it". Some updates (like Twitter posts) don't need to be consistent with the next read. But they also said something like "hey, we're only talking milliseconds to a couple of seconds before they are consistent". Is that time frame documented anywhere else? Is it safe to assume that waiting 1 minute after a write before reading again will mean all my previous writes are there in the read?
The mention of that is at the 39:30 mark in this video http://www.youtube.com/watch?feature=player_embedded&v=xO015C3R6dw
I don't think there is any built-in way to determine if the updates are done. I would recommend adding a lastUpdated field to your entities and updating it with your first job, then checking the timestamp on the entity you're about to update with the second job before running it... kind of a hack, but it should work.
Interested to see if anybody has a better solution. Kinda hope they do ;-)
This is automatic as long as you are getting entities without changing the consistency to Eventual. The HRD writes data to a majority of the relevant datastore servers before returning. If you are calling the asynchronous version of put, you'll need to call get on all the Future objects before you can be sure it's completed.
If however you are querying for the items in the first job, there's no way to be sure that the index has been updated.
So for example...
Suppose you are updating a property on every entity (but not creating any entities), and then retrieving all entities of that kind. You can do a keys-only query followed by a batch get (which is approximately as fast/cheap as doing a normal query) and be sure that you have all updates applied.
On the other hand, if you're adding new entities or updating a property in the first process that the second process queries, there's no way to be sure.
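A sketch of that keys-only query followed by a batch get, using the low-level datastore API; "Item" is a placeholder kind name:

import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.Key;
import com.google.appengine.api.datastore.Query;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// The keys-only query may lag behind recent index updates, but the batch get by key
// returns the current, strongly consistent entities.
static Map<Key, Entity> fetchAllItems() {
    DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
    List<Key> keys = new ArrayList<>();
    for (Entity e : datastore.prepare(new Query("Item").setKeysOnly()).asIterable()) {
        keys.add(e.getKey());
    }
    return datastore.get(keys);
}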
I did find this statement:
With eventual consistency, more than 99.9% of your writes are available for queries within a few seconds.
at the bottom of this page:
http://code.google.com/appengine/docs/java/datastore/hr/overview.html
So, for my application, a 0.1% chance of it not being there on the next read is probably OK. However, I do plan to redesign my schema to make use of ancestor queries.
I am using Informix. This question may not be tied to one specific database, but I want to know how I can, in Java, continuously probe a database and check whether a certain row has been added to a table. Basically, the flow is:
My Java application should use JDBC to check if a certain table is populated.
If no, it should wait until a row has been inserted.
My question is how I can make Java aware of a row insertion. I am not expecting to add any triggers or anything; I want to be able to check in pure Java that the row has been added.
Some thoughts that come to mind are to continuously query the DB for the row, or to periodically (every half-hour or so) query the DB and check whether the row is available. But what I am looking for is something like a listener that can do this.
There is no facility in the Informix DBMS to signal when a particular row arrives in a table.
Well, I say that, but there is the DB-Cron facility which can periodically execute tasks (inside the server), and you could conceivably schedule a task to poll for the data to see if it has arrived and to send a message (somehow) to indicate that it has. It would be non-trivial, especially the part that indicates that it has arrived.
The JDBC protocol (and SQL protocols generally) is essentially synchronous; the client sends a request and waits for an answer from the DBMS.
So, pragmatically, if your delay period is half an hour, you can either create an admin task to handle the processing (you could write a Java UDR to be executed in the server by the server if that's crucial to you), or you can arrange for the Java (client-side) program to poll periodically to find out whether the information you need is there. A half-hour delay is not going to stress anything, even with a moderate number of processes polling for separate values (or even the same value). On the other hand, you normally try to avoid polling when you can. You'll need to strike a balance between responsiveness to the special data arriving and general system responsiveness. On the whole, general system responsiveness is more important, so keep the polling interval as large as you can.
If your polling interval needed to be sub-second, then the balance would be different - the job would be a lot harder.
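A client-side polling skeleton along those lines; the connection details, query and interval are placeholders, and the interval should be kept as large as you can tolerate:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public final class RowArrivalPoller {
    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(() -> {
            String url = "jdbc:informix-sqli://dbhost:1526/mydb:INFORMIXSERVER=myserver";   // placeholder
            try (Connection conn = DriverManager.getConnection(url, "user", "password");
                 PreparedStatement ps = conn.prepareStatement(
                         "SELECT COUNT(*) FROM orders WHERE order_id = ?")) {               // placeholder query
                ps.setInt(1, 12345);
                try (ResultSet rs = ps.executeQuery()) {
                    if (rs.next() && rs.getInt(1) > 0) {
                        System.out.println("Row has arrived; kick off the follow-up work here.");
                        scheduler.shutdown();        // stop polling once the row is found
                    }
                }
            } catch (Exception e) {
                e.printStackTrace();                 // a transient DB error: just try again next cycle
            }
        }, 0, 30, TimeUnit.MINUTES);
    }
}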