Multithreaded data processing hanging on PostgreSQL

Multithreaded data processing hanging on PostgreSQL - java

I'm trying to use the 8 threads from my new processor to handle transactions on the PostgreSQL database. It have to process geographic data in PostGIS, what I already do with just 1 processor core (one thread). I'm using Java (JDBC4) to create one Connection for each thread. Each connection receives the job to process groups of geometric entities, where one SELECT and one UPDATE statements are used for each entity. Each entity is processed by unique ID and no relation functions are used, so there is no dependencies between the transactions.
The application can be started to run with a variable number of threads. When I run it, all except one of the threads hang. Even if I try to run with just two threads, one hangs. With the "Server status" tool from pgAdmin3 I can see that all the hanging threads are "IDLE in transaction", some in "ExclusiveLock" mode, some in "RowExclusiveLock" mode and some in "AccessShareLock" mode.
I've adjusted my postgresql.conf as described in http://jayant7k.blogspot.com/2010/06/postgresql-tuning-quick-tips.html
I've tried to put the threads to sleep for a while right after the UPDATE statement with no success.
Why are the locks been created? Is there a way to avoid these locks, once that are no reasons to a query depend on other?
Thanks for any help

Did you set min-pool-size and max-pool-size for JDBC connection?
In your case, minimum should be 8.

Related

redis - acquiring connection time with PHP clients

My use case is a bunch of isolated calls that at some point interact with redis.
The problem is I'm seeing a super long wait time for acquiring a connection, having tried both predis and credis on my LAN environment. Over 1-3 client threads, the time it takes for my PHP scripts to connect to redis and select a database ranges from 18ms to 700ms!
Normally, I'd use a connection pool or cache a connection and use it across all my threads, but I don't think this can be done in PHP over different scripts.
Is there anything I can do to speed this up?

Apparently, Predis needs the persistent flag set: https://github.com/nrk/predis/wiki/Connection-Parameters) and also FPM, which was frustrating to set up on both Windows and Linux, not to mention testing before switching to FPM on our live setup.
I've switched to Phpredis (https://github.com/phpredis/phpredis), which is a PHP module/extension, and all is good now. The connection times have dropped dramatically using $redis->pconnect() and are consistent across multiple scripts/threads.
Caveat: it IS a little different from Predis in terms of error handling (it fails when instantiating the object, not when running the first call, it returns false instead of null for nonexistent values - ???), so watch out for that if switching from Predis.

how to avoid race condition when multiple jvm reading/writing same instances of NoSQL db like mongodb?

I am running on a scenario where a NoSQL DB(MongoDB) will hold a record of success or failure status of a certain task and if a task is in failed state, my java scheduler application will read the status,retry and will try to execute the task once again and if task completes successfully, will update the DB record as 'success' state or 'fail' state if failed again.
Now the problem is, my application is deployed in two different nodes(so basically 2 different jvm) and both the nodes share the common NoSQL DB.At a certain point, one scheduler thread wakes up,read the DB and retries for failed tasks and before it completes its execution, another application thread from different node read DB record and and re-executes the task once again which override the previous successful data record update causes unexpected results. So in short, i am running through a race condition and i do not have control over NoSQL DB lock but I want to control at application level.
How do I prevent my application from reading the record which is already been under process by another jvm thread ??
Your suggestions will be greatly appreciated.
I am using jongo library version 1.2 and jdk 1.8

Java many connections to API server with threads

I have created a custom web API for submitting data to a MYSQL server for my Java application. Within my Java application I need to add/update 200 rows, and I already have it making these requests one at a time.
This can be pretty time consuming, so can I create threads for all
of these different connections?
Should I limit the number of maximum connections made at a time? Maybe like 10 at a time?
Will this cause any issues with mysql
possibly adding rows at exactly the same time? (No 2 rows would ever
be needed to be changed at a single point in time)

Inserting records in multiple connections will potentially speed up the 200 inserts, but you can only know for sure by testing and measuring after introducing multiple threads. I will also suggest trying JDBC batch and sending all 200 inserts to the database in one go (if that is possible in your implementation), as that might provide a performance boost by saving round trips to the database.
To create a connection pool, look at HikariCP which is a JDBC connection pool implementation. It will allow you to specify the min/max concurrent connections along with other settings. Your worker threads then can request connections from the pool and perform the inserts.
Inserting multiple records concurrently could have issues at mySQL level if it acquires a table lock for each insert. In that case you would not get a speed improvement with multiple threads and might need some tuning at the database level to work around it. Here's a good article that should help: High rate insertion with mySQL

JDBC requests to Oracle 11g failing to be commited although apparently succeding

We have an older web-based application (Java with Spring 2.5.4 framework) running on a GlassFish 3.1 (build 43) server. This application was recently (a few weeks ago) re-directed to use an Oracle 11g (11.2.0.3.0) database and ojdbc6.jar/orai18n.jar (up from Oracle 10g 10.2.0.3.0 and ojdbc14.jar) -- using a JDBC Thin connection. The application is using org.apache.commons.dbcp.BasicDataSource version 1.2.2 for connections and the database requests are handled either through Spring jdbcTemplate (via the JdbcDaoSupport abstract class) or Spring's PlatformTransactionManager.
This morning we noticed that application users were able to enter information, modify it and later to retrieve and print that data through the application, but that there were no committed updates for the last 24 hours. This application currently has only a few users each day and they are apparently sharing the same connection which has been kept open by the connection pool during the last day and so their uncommitted updates were visible through the application, but not through other connections to the database. When the connection was closed, the uncommitted updates were lost.
Examining the server logs showed no errors from the time of the last committed changes to the database through the times of printed reports the next morning. In addition, even if some of the changes had been (somehow) made with the JDBC connection being set to Auto-Commit false, there were specific commits made for some of those updates that were part of a transaction which, as part of a try/catch block should have either executed one of the "transactionManager.commit(transactionStatus);" or "transactionManager.rollback(transactionStatus);" calls that must have been processed without error. It looks as though the commit was returning successfully, but no commit actually occurred.
Restarting the GlassFish domain and the application restored the normal operation with the various updates being committed as they are entered.
My question is has anyone seen or heard about anything like this occurring and, if so, what could have caused it?
Thank you for any ideas here -- we are at a loss.
Some new information:
Examination of our Oracle 11g Server showed that near the time that we believe that the commits appeared to stop, there were four operations blocked on some other operation that we were not able to fully resolve, but was probably an update.
Examination of the Glassfish Server logs showed that the appearance of the worker threads changed following this estimated start time and fewer threads were appearing in the log until only one thread continued to be used for several hours.
The problem occurred again about one week later and was caught after about 1/2 hour. At this time, there were two worker threads in operation.

The problem occurred due to a combination of two things. The first was a method that setup a Spring Transaction, but had an exit that bypassed both the TransactionManager.commit() and the TransactionManager.rollback() (as well as the several SQL requests making up the transaction). Although this was admittedly incorrect coding, in the past, this transaction was closed and therefore had no effect on subsequent usage.
The solution was to insure that the transaction was not started if there was nothing to be done; or, in general double check to make sure that all transactions, once started, are completed.
I am not certain of the exact how or why this problem began presenting itself, so the following is partly conjectured. Apparently, upgrading to Oracle 11g and/or switching to the ojdbc6.jar driver altered the earlier behavior of the incorrect code so that the transaction was not terminated and the connection auto-commit was left false. (It could also be due to some other change that we have not identified since the special case above happens rarely – but does happen.) The corresponding JDBC connection appears to be bound to a specific GlassFish worker thread (I will call this a 'bad' thread in the following as opposed to the normally acting 'good' threads). Whenever this 'bad' thread is used to handle an application request (for this particular application), changes are uncommitted and selects return dirty data. As time goes on, when a change is requested on a 'good' thread and JDBC connection that already has an uncommitted change made on the 'bad' thread, than the new request hangs and the worker thread also hangs. Eventually all but the 'bad' worker thread are hung and everything seems to work correctly from the application viewpoint, but nothing is ever committed.
Again, the solution was to correct the bad code.

Passing around DB connection objects between multiple map-reduce jobs

Fundamentally, this question is about: Can the same DB connection be used across multiple processes (as different map-reduce jobs are in real different independent processes).
I know that this is a little trivial question but it would be great if somebody can answer this as well: What happens in case if the maximum number of connections to the DB(which is preconfigured on the server hosting the DB) have exhausted and a new process tries to get a new connection? Does it wait for sometime, and if yes, is there a way to set a timeout for this wait period. I am talking in terms of a PostGres DB in this particular case and the language used for talking to the DB is java.
To give you a context of the problem, I have multiple map-reduce jobs (about 40 reducers) running in parallel, each wanting to update a PostGres DB. How do I efficiently manage these DB read/writes from these processes. Note: The DB is hosted on a separate machine independent of where the map reduce job is running.
Connection pooling is one option but it can be very inefficient at times especially for several reads/writes per second.

Can the same DB connection be used across multiple processes
No, not in any sane or reliable way. You could use a broker process, but then you'd be one step away from inventing a connection pool anyway.
What happens in case if the maximum number of connections to the
DB(which is preconfigured on the server hosting the DB) have exhausted
and a new process tries to get a new connection?
The connection attempt fails with SQLSTATE 53300 too_many_connections. If it waited, the server could exhaust other limits and begin to have issues servicing existing clients.
For a problem like this you'd usually use tools like C3P0 or DBCP that do in-JVM pooling, but this won't work when you have multiple JVMs.
What you need to do is to use an external connection pool like PgBouncer or PgPool-II to maintain a set of lightweight connections from your workers. The pooler then has a smaller number of real server connections and shares those between the lightweight connections from clients.
Connection pooling is typically more efficient than not pooling, because it allows you to optimise the number of active PostgreSQL worker processes to the hardware and workload, providing admission control for work.
An alternative is to have a writer process with one or more threads (one connection per thread) that takes finished work from the reduce workers and writes to the DB, so the reduce workers can get on to their next unit of work. You'd need to have a way to tell the reduce workers to wait if the writer got too far behind. There are several Java queueing system implementations that would be suitable for this, or you could use JMS.
See IPC Suggestion for lots of small data
It's also worth optimizing how you write to PostgreSQL as much as possible, using:
Prepared statements
A commit_delay
synchronous_commit = 'off' if you can afford to lose a few transactions if the server crashes
Batching work into bigger transactions
COPY or multi-valued INSERTs to insert blocks of data
Decent hardware with a useful disk subsystem, not some Amazon EC2 instance with awful I/O or a RAID 5 box with 5400rpm disks
A proper RAID controller with battery backed write-back cache to reduce the cost of fsync(). Most important if you can't do big batches of work or use a commit delay; has less impact if your fsync rate is lower because of batching and group commit.
See:
http://www.postgresql.org/docs/current/interactive/populate.html
http://www.depesz.com/index.php/2007/07/05/how-to-insert-data-to-database-as-fast-as-possible/

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.