Due to the slowness of the application, we made some of our long-running queries asynchronous. The problem is that these queries are part of a single transaction, and if any of these queries/routines fails I need to roll back everything. How can I achieve this? This is a legacy application using JDBC with Oracle and Java 8. I'd also like to know whether there is any support for this in a Spring Boot/JPA application.
Thanks in advance.
Don't try to interact with the same DB connection from multiple threads at once. JDBC's connection system isn't specced to let you do this.
A transaction belongs to a single connection. You can't smear it out over multiple connections.
The obvious way to ensure that 'it is all rolled back' is to have a single long-lived transaction (but see later).
Combine these 3 facets and you end up with: Do all work in the async block. At least, all work that either needs to all happen, or none of it happens (i.e. the one transaction).
Any other basic approach wouldn't work or wouldn't be useful; there's no point freezing the main thread to wait for the async task (just do the async task on the spot; moving code to another thread doesn't magically make it go any faster. On the contrary, in fact).
However, a transaction that is not just long-lived but also makes a ton of changes to the DB is its own problem. At that point we're getting into the performance characteristics of your specific batch of queries and your particular DB engine, version, indices, and data. That's kinda hard to answer with specifics, given all those unknowns.
There are ways to design your DB to deal with this. Most involve a table representing a calculation, with a row indicating whether the calculation is complete or not. As long as you aren't done, don't set it to 'completed', and all your queries should ignore non-complete results. On boot-up, delete (and with it, let that cascade) any non-complete results: those must be half-baked work done right before your server crashed and was restarted. It's probably not the right answer here; I'm just making sure you're aware that such options also exist.
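A minimal sketch of that pattern with plain JDBC; the calculation table, its columns, and the helper flow here are made up for illustration, not taken from your application:

    import java.sql.*;
    import javax.sql.DataSource;

    class CompletedFlagPattern {
        // On boot-up: purge half-baked results left over from a crash.
        // Assumes ON DELETE CASCADE on child tables referencing calculation.id.
        static void purgeIncomplete(DataSource ds) throws SQLException {
            try (Connection con = ds.getConnection();
                 Statement st = con.createStatement()) {
                st.executeUpdate("DELETE FROM calculation WHERE completed = 0");
            }
        }

        static void runCalculation(DataSource ds) throws SQLException {
            try (Connection con = ds.getConnection()) { // auto-commit on; no long transaction
                long calcId;
                try (PreparedStatement ps = con.prepareStatement(
                        "INSERT INTO calculation (completed) VALUES (0)",
                        new String[] {"id"})) {
                    ps.executeUpdate();
                    try (ResultSet keys = ps.getGeneratedKeys()) {
                        keys.next();
                        calcId = keys.getLong(1);
                    }
                }
                // ... long-running work, committing as often as you like, always
                // tagging rows with calcId; queries elsewhere filter on
                // completed = 1, so none of this is visible yet ...

                // Final step: publish everything at once by flipping the flag.
                try (PreparedStatement ps = con.prepareStatement(
                        "UPDATE calculation SET completed = 1 WHERE id = ?")) {
                    ps.setLong(1, calcId);
                    ps.executeUpdate();
                }
            }
        }
    }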
As a general rule of thumb, countering a problem of "our code has been observed to run too slowly" with "let's make it all async" doesn't work. Async makes code harder to read, way harder to debug, and doesn't make stuff go faster. All you can really do with async is soothe the user by playing them some elevator music or, slightly more pragmatically, a progress bar or whatnot, while they wait. And that's actually generally easier done by spawning off the bits that tell the user what's happening into a separate thread, instead of asyncing the work itself. That, and make your algorithm better and/or fix your DB index definitions. You can search the web for that too; run EXPLAIN variants of your queries to make the DB tell you whether it is doing any table sweeps (that's where it goes through the entire dataset before it can answer a query; you want to avoid those).
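For example, on Oracle the EXPLAIN variant boils down to something like this (connection details, table, and query are placeholders):

    import java.sql.*;

    public class ExplainDemo {
        public static void main(String[] args) throws SQLException {
            try (Connection con = DriverManager.getConnection(
                    "jdbc:oracle:thin:@//dbhost:1521/ORCL", "user", "pass");
                 Statement st = con.createStatement()) {
                st.execute("EXPLAIN PLAN FOR SELECT * FROM orders WHERE customer_id = 42");
                try (ResultSet rs = st.executeQuery(
                        "SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY)")) {
                    while (rs.next()) {
                        // Rows containing TABLE ACCESS FULL are the sweeps you want gone.
                        System.out.println(rs.getString(1));
                    }
                }
            }
        }
    }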
If you need help with either of those parts (showing the user what is going on instead of freezing the webpage or the GUI, or optimizing a DB query), search the web for this information; there are tons of tutorials. Make sure to include the frontend tech; Java can be used for Swing apps, JavaFX, Android, and there are at last count something like 100 web frameworks.
Related
I've a typical scenario & need to understand best possible way to handle this, so here it goes -
I'm developing a solution that will retrieve data from a remote SOAP based web service & will then push this data to an Oracle database on network.
Also, this will be a scheduled task that will execute every 15 minutes.
I've event queues on the remote service that contain the INSERT/UPDATE/DELETE operations done since the last retrieval, & once I retrieve the events for the last 15 minutes, it again adds events for the next retrieval.
Now, it's just pushing data to Oracle, so all my interactions are INSERT & UPDATE statements.
There are around 60 tables on Oracle with some of them having 100+ columns. Moreover, for every 15 minutes cycle there would be around 60-70 Inserts, 100+ Updates & 10-20 Deletes.
This will be an executable jar file that will terminate after the operation & will start again on the next 15-minute cycle.
So, I need to understand how I should handle WRITE operations (best practices) to improve performance for this application as a whole?
Current Test Code (on every cycle) -
Connects to remote service to get events.
Creates a connection with DB (single connection object).
Identifies the type of operation (INSERT/UPDATE/DELETE) & table on which it is done.
After above, calls the respective method based on type of operation & table.
Uses PreparedStatement with positional parameters, & retrieves each column value from the remote service & assigns it to the statement parameters.
Commits the statement & returns to the get-event class to process the next event.
The above is repeated till all the retrieved events are processed, after which the program closes & then starts on the next cycle & everything repeats again.
Thanks for help !
If you are inserting or updating one row at a time, you can consider executing a batch insert or a batch update. Beyond a certain number of rows, batching the inserts or updates gives you much better performance.
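A sketch of what that looks like with standard JDBC batching; the table, columns, and Event holder are made up, and the real code would bind whatever the remote service returns:

    import java.sql.*;
    import java.util.List;

    class BatchInsertSketch {
        // Hypothetical event holder standing in for the SOAP client's data.
        static class Event { long id; String payload; }

        static void insertEvents(Connection con, List<Event> events) throws SQLException {
            final int BATCH_SIZE = 100;
            try (PreparedStatement ps = con.prepareStatement(
                    "INSERT INTO biz_event (id, payload) VALUES (?, ?)")) {
                int queued = 0;
                for (Event e : events) {
                    ps.setLong(1, e.id);
                    ps.setString(2, e.payload);
                    ps.addBatch();
                    if (++queued % BATCH_SIZE == 0) {
                        ps.executeBatch(); // one round-trip per chunk instead of per row
                    }
                }
                ps.executeBatch();         // flush any remainder
            }
        }
    }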
The number of DB operations you are talking about (200 every 15 minutes) is tiny and will be easy to finish in less than 15 minutes. Some concrete suggestions:
You should profile your application to understand where it is spending its time. If you don't do this, then you don't know what to optimize next and you don't know if something you did helped or hurt.
If possible, try to get all of the events in one round-trip to the remote server.
You should reuse the connection to the remote service (probably by using a library that supports connection persistence and reuse).
You should reuse the DB connections by using a connection pooling library rather than creating a new connection for each insert/update/delete. Believe it or not, creating the connection probably takes 100+ times as long as the DB operation itself once you have the connection in hand.
You should consider doing multiple (or all) of the database operations in the same transaction rather than creating a new transaction for each row that is changed. However, you should carefully consider your failure modes so that you don't lose any events (if that is an important consideration). See the sketch after this list.
You should consider utilizing prepared statement caching. This may help, but maybe not if Oracle is configured properly.
You should consider trying to analyze your operations to find any that can be batched together. This can be a lot faster if you have some "hot" operations that get done often.
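Putting the pooling and single-transaction suggestions together, a rough sketch; the DataSource is assumed to come from any pooling library, and processEvent() stands in for your existing per-table methods:

    import java.sql.Connection;
    import java.sql.SQLException;
    import java.util.List;
    import javax.sql.DataSource;

    class EventBatchWriter {
        // One transaction for the whole 15-minute cycle instead of a commit per row.
        void applyEvents(DataSource pool, List<Object> events) throws SQLException {
            try (Connection con = pool.getConnection()) { // borrowed from the pool, not created
                con.setAutoCommit(false);
                try {
                    for (Object e : events) {
                        processEvent(con, e); // dispatch to the right INSERT/UPDATE/DELETE
                    }
                    con.commit();             // all ~200 changes become visible at once
                } catch (SQLException ex) {
                    con.rollback();           // nothing half-applied; the events can be replayed
                    throw ex;
                }
            }
        }

        private void processEvent(Connection con, Object event) throws SQLException {
            /* existing per-table logic */
        }
    }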
"I've a typical scenario"
No you haven't. You have a bespoke architecture, with a unique data model, unique data and unique business requirements. That's not a bad thing, it's the state of pretty much every computer system that's not been bought off-the-shelf (and even some of them).
So, it's an experiment and you must approach it as such. There is no "best practice". Try various things and see what works best.
"need to understand best possible way to handle this"
You will improve your chances of success enormously by hiring somebody who understands Oracle databases.
I have a long-running job that updates thousands of entity groups. I want to kick off a 2nd job afterwards that will have to assume all of those items have been updated. Since there are so many entity groups, I can't do it in a transaction, so I've just scheduled the 2nd job to run 15 minutes after the 1st completes, using task queues.
Is there a better way?
Is it even safe to assume that 15 minutes gives a promise that the datastore is in sync with my previous calls?
I am using high replication.
In the Google I/O videos about HRD, they give a list of ways to deal with eventual consistency. One of them was to "accept it". Some updates (like Twitter posts) don't need to be consistent with the next read. But they also said something like "hey, we're only talking milliseconds to a couple of seconds before they are consistent". Is that time frame documented anywhere else? Is it safe to assume that waiting 1 minute after a write before reading again will mean all my previous writes are there in the read?
The mention of that is at the 39:30 mark in this video http://www.youtube.com/watch?feature=player_embedded&v=xO015C3R6dw
I don't think there is any built-in way to determine if the updates are done. I would recommend adding a lastUpdated field to your entities and updating it with your first job, then checking the timestamp on the entity you're updating with the 2nd before running... kind of a hack, but it should work.
Interested to see if anybody has a better solution. Kinda hope they do ;-)
This is automatic as long as you are getting entities without changing the consistency to Eventual. The HRD puts data to a majority of relevant datastore servers before returning. If you are calling the asynchronous version of put, you'll need to call get on all the Future objects before you can be sure it's completed.
If however you are querying for the items in the first job, there's no way to be sure that the index has been updated.
So for example...
If you are updating a property on every entity (but not creating any entities) and then retrieving all entities of that kind, you can do a keys-only query followed by a batch get (which is approximately as fast/cheap as doing a normal query) and be sure that you have all updates applied.
On the other hand, if you're adding new entities or updating a property in the first process that the second process queries, there's no way to be sure.
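A sketch of the first case with the low-level datastore API (the kind name is illustrative): the keys-only query may be served from a stale index, but the follow-up batch get reads the current entity contents by key.

    import com.google.appengine.api.datastore.*;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;

    class KeysOnlyThenGet {
        Map<Key, Entity> fetchCurrent() {
            DatastoreService ds = DatastoreServiceFactory.getDatastoreService();

            // Keys-only query: cheap, but the index it reads may lag behind writes.
            List<Key> keys = new ArrayList<Key>();
            for (Entity e : ds.prepare(new Query("Item").setKeysOnly()).asIterable()) {
                keys.add(e.getKey());
            }

            // Batch get by key is strongly consistent in the HRD, so the
            // property values reflect the updates made by the first job.
            return ds.get(keys);
        }
    }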
I did find this statement:
With eventual consistency, more than 99.9% of your writes are available for queries within a few seconds.
at the bottom of this page:
http://code.google.com/appengine/docs/java/datastore/hr/overview.html
So, for my application, a 0.1% chance of it not being there on the next read is probably OK. However, I do plan to redesign my schema to make use of ancestor queries.
Our company has a batch application which runs every day. It does mostly database-related jobs, importing data into database tables from files, for example.
There are 20+ tasks defined in that application; each one may depend on other ones or not.
The application executes the tasks one by one; the whole application runs in a single thread.
It takes 3~7 hours to finish all the tasks. I think that's too long, so maybe I can improve performance with multi-threading.
I think that as there are dependencies between tasks, it's not good (or not easy) to make the tasks run in parallel, but maybe I can use multi-threading to improve performance inside a task.
For example: we have a task defined as "ImportBizData", which copies data into a database table from a data file (usually containing 1,000,000+ rows). I wonder whether that is worth multi-threading?
As I know only a little about multi-threading, I hope someone can provide some tutorial links on this topic.
Multi-threading will improve your performance but there are a couple of things you need to know:
Each thread needs its own JDBC connection. Connections can't be shared between threads because each connection is also a transaction.
Upload the data in chunks and commit once in a while to avoid accumulating huge rollback/undo tables.
Cut tasks into several work units where each unit does one job.
To elaborate on the last point: currently, you have a task that reads a file, parses it, opens a JDBC connection, does some calculations, sends the data to the database, etc.
What you should do:
One (!) thread to read the file and create "jobs" out of it. Each job should contain a small, but not too small, "unit of work". Push those into a queue.
The next thread(s) wait for jobs in the queue and do the calculations. This can happen while the thread in step #1 waits for the slow hard disk to return new lines of data. The result of this conversion step goes into the next queue.
One or more threads to upload the data via JDBC.
The first and the last threads are pretty slow because they are I/O bound (hard disks are slow and network connections are even worse). Plus, inserting data into a database is a very complex task (allocating space, updating indexes, checking foreign keys).
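A rough sketch of that pipeline with a bounded queue between the reader and the uploader; the calculation stage would sit between them in the same way, and readChunks()/insertChunk() are hypothetical helpers standing in for the task's real logic:

    import java.sql.Connection;
    import java.util.Collections;
    import java.util.List;
    import java.util.concurrent.*;
    import javax.sql.DataSource;

    class ImportPipeline {
        private static final List<String> EOF = Collections.emptyList(); // poison pill

        void run(DataSource ds) throws Exception {
            final BlockingQueue<List<String>> queue = new ArrayBlockingQueue<>(100);
            ExecutorService pool = Executors.newFixedThreadPool(2);

            // Stage 1: a single reader turns the file into chunks of lines.
            pool.submit(() -> {
                for (List<String> chunk : readChunks("bizdata.csv", 1000)) {
                    queue.put(chunk); // blocks if the uploader falls behind
                }
                queue.put(EOF);
                return null;
            });

            // Stage 3: the uploader drains the queue over its own connection.
            pool.submit(() -> {
                try (Connection con = ds.getConnection()) {
                    con.setAutoCommit(false);
                    for (List<String> c = queue.take(); c != EOF; c = queue.take()) {
                        insertChunk(con, c); // batched INSERTs
                        con.commit();        // commit per chunk to keep undo small
                    }
                }
                return null;
            });

            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.HOURS);
        }

        private Iterable<List<String>> readChunks(String file, int lines) { /* ... */ return null; }
        private void insertChunk(Connection con, List<String> lines) { /* ... */ }
    }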
Using different worker threads gives you lots of advantages:
It's easy to test each thread separately. Since they don't share data, you need no synchronization; the queues will do that for you.
You can quickly change the number of threads for each step to tweak performance
Multi-threading may be of help. If the lines are uncorrelated, you could start off two processes, one reading the even lines and another the odd lines, get your DB connections from a connection pool (e.g. DBCP), and analyze performance. But first I would investigate whether JDBC is the best approach; databases normally have optimized solutions for imports like this. These solutions can also temporarily switch off constraint checking on your table and turn it back on later, which is also great for performance. As always, it depends on your requirements.
Also, you may want to check out Spring Batch, which is designed for batch processing.
As far as I know, the JDBC-ODBC Bridge uses synchronized methods to serialize all calls to ODBC, so using multiple threads won't give you any performance boost unless it boosts your application itself.
I am not all that familiar with JDBC, but regarding the multithreading bit of your question, what you should keep in mind is that parallel processing relies on effectively dividing your problem into bits that are independent of one another and in some way putting them (that is, their output) back together. If you don't know the underlying dependencies between tasks you might end up with really odd errors/exceptions in your code. Even worse, it might all execute without any problems, but the results might be off from the true values. Multi-threading is tricky business: in a way fun to learn (at least I think so), but a pain in the neck when things go south.
Here are a couple of links that might provide useful:
Oracle's java trail: best place to start
A good tutorial for java concurrency
an interesting article on concurrency
If you are serious about putting in the effort to get into multi-threading, I can recommend Brian Goetz's Java Concurrency in Practice, an amazing book really..
Good luck
I had a similar task. But in my case, all the tables were unrelated to each other.
STEP1:
Using SQL*Loader (Oracle) for uploading data into the database (very fast), OR any similar bulk-load tool for your database.
STEP2:
Running each upload process in a different thread (for unrelated tasks) and in a single thread for related tasks.
P.S. You could identify the different inter-related jobs in your application, categorize them into groups, and run each group in a different thread.
Links to get you started:
JAVA Threading
Follow the last example in the above link (Example: Partitioning a large task with multiple threads).
SQL Loader can dramatically improve performance
The fastest way I've found to insert large numbers of records into Oracle is with array operations. See the "setExecuteBatch" method, which is specific to OraclePreparedStatement. It's described in one of the examples here:
http://betteratoracle.com/posts/25-array-batch-inserts-with-jdbc
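A sketch assuming the proprietary Oracle-driver mechanism described there; newer drivers deprecate it in favour of standard addBatch()/executeBatch(), and the table and columns are made up:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import oracle.jdbc.OraclePreparedStatement;

    class OracleArrayInsert {
        // Assumes auto-commit is off on this connection.
        void insert(Connection con, long[] ids, String[] payloads) throws SQLException {
            try (PreparedStatement ps = con.prepareStatement(
                    "INSERT INTO biz_event (id, payload) VALUES (?, ?)")) {
                // Queue rows driver-side; they are sent to Oracle in arrays of 50.
                ((OraclePreparedStatement) ps).setExecuteBatch(50);
                for (int i = 0; i < ids.length; i++) {
                    ps.setLong(1, ids[i]);
                    ps.setString(2, payloads[i]);
                    ps.executeUpdate(); // queued, not sent, until 50 are pending
                }
                ((OraclePreparedStatement) ps).sendBatch(); // flush the final partial batch
                con.commit();
            }
        }
    }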
If multi-threading would complicate your work, you could go with async messaging. I'm not fully aware of what your needs are, so the following is based on what I'm seeing currently.
Create a file-reader Java program whose purpose is to read the biz file and put messages into a JMS queue on the server. This could be plain Java with static void main().
Consume the JMS messages in message-driven beans (you can set a limit on the number of beans to be created in the pool, 50 or 100 depending on the need). If you have multiple servers, well and good; your job is now split across multiple servers.
Each row of data is then asynchronously split between, say, 2 servers and 50 beans on each server.
You do not have to deal with threads anywhere in the process. JMS is ideal because your data is within a transaction: if something fails before you send an ack to the server, the message will be resent to the consumer, and the load will be split between the servers without you doing anything special like multi-threading.
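A sketch of the consuming side; the queue name and row format are made up, and the pool size is set in the container's own configuration:

    import javax.ejb.ActivationConfigProperty;
    import javax.ejb.MessageDriven;
    import javax.jms.Message;
    import javax.jms.MessageListener;
    import javax.jms.TextMessage;

    // The container pools instances of this bean and redelivers the message
    // if the transaction rolls back, so no row is lost on failure.
    @MessageDriven(activationConfig = {
        @ActivationConfigProperty(propertyName = "destinationLookup",
                                  propertyValue = "jms/BizDataQueue")
    })
    public class BizRowConsumer implements MessageListener {
        @Override
        public void onMessage(Message message) {
            try {
                String row = ((TextMessage) message).getText();
                insertRow(row); // JDBC insert, enlisted in the MDB's transaction
            } catch (Exception e) {
                // Rethrowing marks the transaction for rollback; the broker redelivers.
                throw new RuntimeException(e);
            }
        }

        private void insertRow(String row) { /* ... */ }
    }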
Also, spring is providing spring-batch which can help you. http://docs.spring.io/spring-batch/reference/html/spring-batch-intro.html#springBatchUsageScenarios
I have a Swing UI that contains a button that creates a new SwingWorker thread. That thread then queries the SQLite database for results to put in a JTable. In my SwingWorker constructor, the parameters are various fields taken from other Swing components such as a JSpinner, JComboBoxes, etc.
Since I'm new to all of this thread thing, I'd like some advice from more knowledgeable programmers on how I should go about doing what I want to do.
I'd like to know if threads automatically end when I close the program with System.exit(0), so I don't end up with memory leaks.
What is the best way to make sure I don't have two threads accessing my database at the same time? (Say the user clicks multiple times on the button; or, in another case, an administrator is updating the database with some files as input (within my program), and while the first thread is parsing the files and updating the database, he wants to query the database using the button, etc.)
Is it slower to use threads? At first I did all my calculations right in the EDT, and of course the UI locked up every time after pressing the button, but it only locked for about 5 seconds if I recall correctly. Now, when I press the button, it doesn't lock up, but it seems like the results take a little less than twice as long to show up in the JTable. Is it because I made a mistake in my code, or is this normal?
I thought about using a static field in the class the queries are in and setting it to true while the database is in use. Is that the correct way of doing it? That way, no matter which thread is using the database, the second thread won't launch.
If it's not absolutely necessary (and it shouldn't be), don't use System#exit in your code. Here are some explanations of why, and of what is better.
Your database is capable of handling two concurrent requests, so that's not a bad thing in itself. If you use JDBC and pooled connections via a DataSource, you should probably restrict the usage of one such connection to one thread at a time. To cure the problem of redundant database queries, e.g. when "clicking twice", there is probably more than one solution. I assume here the scenario where you have a Swing UI that is distributed to several people, and each of these instances talks to the same database: simply disable your button for as long as the execution of the database query takes.
It's slightly slower if you do not run your code directly in the Event Dispatch Thread, due to the scheduling of your workers' execution, but this should not be noticeable. To see what goes wrong I would have to see the relevant code.
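For the "disable your button" part, a minimal sketch; queryDatabase() and updateTableModel() are placeholders for your own query and JTable code:

    import java.util.List;
    import javax.swing.JButton;
    import javax.swing.SwingWorker;

    class QueryAction {
        // Called from the button's ActionListener, i.e. on the EDT.
        void runQuery(final JButton searchButton) {
            searchButton.setEnabled(false); // no second click while the query runs
            new SwingWorker<List<Object[]>, Void>() {
                @Override
                protected List<Object[]> doInBackground() throws Exception {
                    return queryDatabase();      // runs off the EDT
                }
                @Override
                protected void done() {          // back on the EDT
                    try {
                        updateTableModel(get()); // get() rethrows any query exception
                    } catch (Exception ex) {
                        ex.printStackTrace();
                    } finally {
                        searchButton.setEnabled(true);
                    }
                }
            }.execute();
        }

        private List<Object[]> queryDatabase() { /* JDBC work */ return null; }
        private void updateTableModel(List<Object[]> rows) { /* fill the JTable */ }
    }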
I'd like to know if threads automatically end when I close the program with System.exit(0);
Yes: the entire process will end, along with the threads that are part of it. However, if you don't call System.exit(), all non-daemon threads must finish before the process is gone.
What is the best way to make sure I don't have two threads accessing my database at the same time
Since it's a Swing application, I assume that you and the administrator can't both be using the same application instance at the same time. However, to guarantee that even within a single application you can't start more than one operation affecting the database, you have to block the UI: either disable buttons or put a glass pane on top of the UI. A modal progress dialog is also helpful.
Is it slower to use threads?
No, it is not slower if done right. A slow operation will take as long as it takes; you can't fix that with threads. But you can either keep the (perceived) speed the same while providing a nice, non-blocking UI, or do more than one slow operation at a time and thereby increase that perceived speed.
I'm currently working on a (rather large) pet project of mine, a Swing application that by its very nature needs to be multi-threaded. Almost all user interactions might fetch data from some remote servers over the internet; since I neither control these servers nor the internet itself, long response times are thus inevitable. A Swing UI obviously cannot repaint itself while the EDT is busy, so all remote server calls need to be executed by background thread(s).
My problem:
Data fetched by the background threads gets 'enriched' with data from a local (in-memory) database (the remote server returns IDs/references to data in the local database). This data eventually gets passed to the EDT, where it becomes part of the view model. Some entities are not completely initialized at this point (lazy fetching enabled), so the user might trigger lazy fetching by e.g. scrolling in a JTable. Since the Hibernate session is already closed, this triggers a LazyInitializationException. I can't know when lazy fetching might be triggered by the user, so creating a session on demand/attaching the detached object will not work here.
I 'solved' this problem by:
using a single Session for the whole application (synchronized, since Session instances are not thread-safe)
disabling lazy-fetching completely
While this works, the application's performance has suffered greatly (sometimes being close to unusable). The slowdown is mainly caused by the large number of objects now fetched by each query.
I'm currently thinking about changing the application's design to 'Session-per-thread' and migrating all entities fetched by non-EDT threads to the EDT thread's Session (similar to this posting on the Hibernate forums).
Side-note: Any problems related to database updates do not apply since all database entities are read-only (reference data).
Any other ideas on how to use Hibernate with lazy-loading in this scenario ?
Don't expose the Session itself in your data API. You can still do it lazily; just make sure that the hydration is done from the 'data' thread each time. You could use a block (a Runnable, or some kind of command class, is probably the best Java can do for you here, unfortunately) that's wrapped by code that performs the load asynchronously from the 'data' thread. When you're in UI code (on the UI thread, of course), field some kind of 'data is ready' event posted by the data service. You can then get the data from the event and use it in the UI.
You could have a look at Ebean ORM. It is session-less, and lazy loading just works. This doesn't answer your question but rather proposes an alternative.
I know Ebean has built-in support for asynchronous query execution, which may also be interesting for your scenario.
Maybe worth a look.
Rob.
There are two distinct problems that should be resolved separately:
Handling of Hibernate Sessions in Swing Applications. Let me recommend my own article, regarding this problem: http://blog.schauderhaft.de/2008/09/28/hibernate-sessions-in-two-tier-rich-client-applications/
The basic idea is to have a session for every frame, excluding modal frames, which use the session of the spawning frame. It is not easy, but it works, meaning you won't get any LazyInitializationExceptions anymore.
How to get your GUI thread separated from the back end.
I recommend keeping the Hibernate objects strictly on the back-end thread they originate from. Only give wrapper objects to the EDT. If these wrapper objects are asked for a value, they create a request which gets passed to the back-end thread, which eventually returns the value.
I'd envision three kinds of wrapper implementations:
Async: requests the value and gets notified when the value is available. It returns immediately with some dummy value; on notification it fires a PropertyChange event to inform the GUI about the 'changed' value (changed from unknown to a real value).
Sync: requests the value and waits for it to be available.
Timed: a mixture of the two, waiting a short time (say 0.01 seconds) before returning. This would avoid plenty of change events compared to the async version.
As a basis for these wrappers I recommend the ValueModel of the JGoodies Binding library: http://www.jgoodies.com/downloads/libraries.html
Obviously you need to take care that any action is only performed on actually loaded values, but since you don't plan on doing updates, this shouldn't be too much of an issue.
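A minimal sketch of the async variant, assuming a single-threaded executor that owns the Hibernate session; the names here are mine, not from the article:

    import java.beans.PropertyChangeListener;
    import java.beans.PropertyChangeSupport;
    import java.util.concurrent.ExecutorService;
    import java.util.function.Supplier;
    import javax.swing.SwingUtilities;

    // Hands out a dummy value immediately, loads the real value on the backend
    // thread (where the Hibernate session lives), and fires a PropertyChange
    // on the EDT once it arrives.
    class AsyncValue<T> {
        private final PropertyChangeSupport pcs = new PropertyChangeSupport(this);
        private volatile T value;

        AsyncValue(T dummy, ExecutorService backend, Supplier<T> loader) {
            this.value = dummy;
            backend.submit(() -> {
                final T loaded = loader.get(); // may touch lazy Hibernate state safely
                SwingUtilities.invokeLater(() -> {
                    T old = value;
                    value = loaded;
                    pcs.firePropertyChange("value", old, loaded); // GUI rebinds here
                });
            });
        }

        T getValue() { return value; }

        void addPropertyChangeListener(PropertyChangeListener l) {
            pcs.addPropertyChangeListener(l);
        }
    }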
Let me end with a warning: I have thought about it a lot, but never actually tried it, so move with care.