Here is my requirement:
a date is inserted in to a db table with each record. Two weeks
before that particulate date, a separate record should be entered to a
different table.
My initial solution was to put up a SQL schedule job, but my client insisted on it being handled through java.
What is the best approach for this?
What are the pros and cons of using SQL schedule job and Java scheduling for this task?
Ask yourself the question: to what domain does this piece of work belong? If it's required for data integrity, then it's obviously the DBMS' problem and would probably best be handled there. If it's part of the business domain rather than the data, or might require information or processing that's not available or natural to the DBMS, it's probably best made external.
I'd say, use the best tool for the job. Having stuff handled by the database using whatever features it offers is often nice. For example, a log table that keeps "snapshots" of status updates of records in another table is something I typically like to have a trigger for, taking that responsibility out of my app's hands.
But that's something that's available in practically any DBMS. There's the possibility that other databases won't offer the job scheduling capacities you require. If it's conceivable that some day you'll be switching to a different DBMS, you'll then be forced to do it in Java anyway. That's the advantage of the Java approach: you've got the functionality independently of the database. If you're using pure JDBC with standard SQL queries, you've got a fully portable solution.
Both approaches seem valid. Consider what induces the least work and worries. If it's done in Java you'll need to make sure that process is running or scheduled. That's some external dependency. If it's in the database, you'll be sure the job is done as long as the DB is up.
Well, first off, if you want to do it in Java, you can use the Timer for a simple basic repetitive job, or Quartz for more advanced stuff.
Personally I also think that it would be better to have the same entity (application) deal with all related database actions. In other words, if your Java app is reading/writing to/from the db, it should be consistent and also deal with scheduled reading/writings. And as a plus, this way you can synchronize your actions easier, like, if you want to make sure that a scheduled job is running, has started, has finished, you can do that a lot easier if all is done in Java as compared with having a different process (like the SQL Scheduler) doing it.
Related
Context of My question:
I use a proprietary Database (target database) and I can not reveal the name of the DB (you may not know even If I reveal the name).
Here, I usually need to update the records using java. (The number of records vary from 20000 to 40000)
Each update transaction is taking one or two seconds for this DB. So, you see that the execution time would be in hours. There are no Batch execution functions are available for this Database API. For this, I am thinking to use Java multi-threaded feature, instead of executing all the records in single process I want to create a thread for every 100 records. We know that Java can make these threads run parallelly.
But, I want to know how does the DB process these threads sharing the same connection? I can find this by running a trail program and compare time intervals. I feel that it may be deceiving to some extent. I know that you don't have much information about the database. You can just answer this question assuming the DB as MS SQL/MySQL.
Please suggest me if there is any other feature in java I can utilize to make this program execute faster if not multi-threading.
It is not recommended to use single connection with multiple threads, you can read the pitfalls of doing so here.
If you really need to use a single connection with multiple threads, then I would suggest making sure threads start and stop successfully within a transaction. If one of them fails you have to make sure to rollback the changes. So, first get the count, make cursor ranges and for each range start a thread that will execute that on that range. One thing to look for is to not close the connection after executing the partitions individually, but to close it when the transaction is complete and the db is committed.
If you have an option to use Spring Framework, check out Spring Batch.
Spring Batch provides reusable functions that are essential in processing large volumes of records, including logging/tracing, transaction management, job processing statistics, job restart, skip, and resource management. It also provides more advanced technical services and features that will enable extremely high-volume and high performance batch jobs through optimization and partitioning techniques. Simple as well as complex, high-volume batch jobs can leverage the framework in a highly scalable manner to process significant volumes of information.
Hope this helps.
I want to develop a program that reads data from the database and written into file.
For a better performance, I want to use multithreading.
The solution I plan to implement is based on these assumptions:
it is not necessary to put multiple threads to read from the database because there is a concurrency problem to be managed by the DBMS (similarly to the writing into the file). Given that each read element from the database will be deleted in the same transaction.
Using the model producer-consumer: a thread to read the data (main program). and another thread to write the data in the file.
For implementation I will use the executor framework: a thread pool (size=1) to represent the consumer thread.
Can these assumptions make a good solution ?
Is this problem requires a solution based on multithreading?
it is not necessary to put multiple threads to read from the database because there is a concurrency problem to be managed by the DBMS
Ok. So you want one thread that is reading from the database.
Can these assumptions make a good solution ? Is this problem requires a solution based on multithreading?
Your solution will work but as mentioned by others, there are questions about the performance improvements (if any). Threading programs work because you can make use of the multiple processor (or core) hardware on your computer. In your case, if the threads are blocked by the database or blocked by the file-system, the performance improvement may be minimal if at all. If you were doing a lot of processing of the data, then having multiple threads handle the task would work well.
This is more of a comment:
For your first assumption: You should post the db part on https://dba.stackexchange.com/ .
A simple search returned :
https://dba.stackexchange.com/questions/2918/about-single-threaded-versus-multithreaded-databases-performance - so you need to check if your read action is complex enough and if multithread even serves your need for db connection.
Also, your program seems to be sequential read and write. I dont think you even need multithreading unless you want multiple writes on the same file at the same time.
You should have a look at Spring Batch, http://projects.spring.io/spring-batch/, which relates to JSR 352 specs.
This framework comes with pretty good patterns to manage ETL related operations, including multi-threaded processing, data partitioning, etc.
In my database, i have many records of a certain table that need to be processed from time to time by my java spring app.
There is a boolean flag, on each row of that table saying whether a given record is currently being processed.
What I'm looking at is having my java spring app deployed multiple times on different servers, all accessing the same shared DB, the same app duplicated with some load balancer, etc.
But only one java app instance at a time can process a given DB record of that particular table.
What are the different approaches to enforce that constraint?
I can think of some unique queue that would dispatch those processing tasks to different java running instances making sure that the same DB record is not processed simultaneously by two different java instances. But that sounds quite complicated for what it is. Maybe there is something simpler? Anything else? Thanks in advance.
you can use the locking strategies to enforce the exclusiveness of access to the particular records in you table. there are 2 different approaches that can be applied to reach this requirement. optimistic locking or pessimistic locking, take a look at hibernate docs
additionally, there's another issue that you should think of. with current approach, if the server would crash during the time when it was processing a certain record and eventually would not succeed to complete, then this record would stay in "incomplete" state and would not be processed by others. one possible solution that come to my mind is to use the 'node id' of server that took responsibility for processing instead of state flag.
Basically what the title says. Going forward, we need to start supporting both database platforms (and will start writing migrations accordingly), but we need to do the first initial "port".
Our DBAs are confident they can convert the schema, tables, data types, etc. but our developers have less confidence that the DAOs will "just work". Can someone point us towards some resources we can review? Ideally common pitfalls to avoid, specific tests to run, etc. We will of course run the full suite of database tests at the application layer, but want to do as much preparation as possible before then.
Pay attention to and test performance under load. Oracle does some things fundamentally differently than other database vendors. Tom Kyte's excellent book Expert Oracle Database Architecture points out several differences. A couple of highlights:
Oracle never locks data just to read it. Many other databases do.
A writer of data in Oracle never blocks a reader. A reader of data never blocks a writer. Again, many other vendors do.
Not paying attention to things like this can cause big headaches after a conversion when locking issues surface. This is not to imply a superiority of one product over another, rather it just means that what works well with one vendor's product may fail miserably in another, and custom approaches depending on the database may be required.
Ditto (although on a quite simple schema, have to say). "Just worked". Hibernate magic.
I had my peace of mind because we had 100% test coverage for DAO layer. So when schema was recreated on MS SQL, and some table and column names were updated in the mapping (don't remember why, but DBAs asked to, may be naming convention), we just run our tests and found no failed ones.
P.S. Recalled one interesting detail: functional tests were all OK. But when PTE started on MS SQL database, we have found that a concurrent access to one particular table was times slower than on Oracle due to locks propagation. We had to redesign that functionality.
I think the first step would be to get an empty MS SQL schema, use hbm2ddl=true and let Hibernate create the tables there. Then show this to your DBAs and ask if this makes sense.
Populating data is less of a problem, I'd guess queries would be more slippery (especially if you use raw JDBC in some places). You might also want to check query plans for commonly used queries and see if these make sense, too.
I'm designing multiple process to query job from database.
Each job wake up once per minute to query task and send to workflow system.
I need advice about which best way to mask record and query it and not duplicate with other process.
In Oracle, depending on version:
10g and down -- use Advanced Queuing. Have your job dequeue the keys that you've had enqueued.
11g and up -- if you don't want the hassle of queuing, you can use the SKIP LOCKED clause and have your job SELECT FOR UPDATE the task it's to work on; think of it as queuing without having to make PL/SQL calls.
assuming you have control over the database design, wouldn't you just include flag of some kind? You could just have a yes/no kind of flag, which might be too basic (?). Alternatively a more detailed approach that recorded the identity of the process which "has" the Record / Job. This would give you a richer set of data and a more flexible application.