How to limit DB connections in a massive-scale Java app

I have a Java app running on more than 100 servers. Currently each server opens connections to 7 database schemas in a relational database (logging, this, that, the other). All schemas live on the same DB cluster, which is effectively one database instance.
The server-managed connection pools open a handful of connections (1-5) per schema per instance, then double that for a redundant pool. So each server opens a minimum of 30 database connections and can grow to a maximum of several hundred, and again, there are more than 100 servers.
All in all, the minimum number of database connections used is 3000, and this can grow to ludicrous numbers.
Obviously this is just plain wrong. The database cluster can only efficiently handle X concurrent requests, and any number of requests > X introduces unnecessary contention and slows the whole lot down. (X is unknown, but it is way smaller than the 3000 minimum concurrent connections.)
I want to lower the total connections used by implementing the following strategy:
Connect to one schema only (Application-X), with a maximum of 6 connections per pool.
Write a layer above the pool that switches to the schema I want. A getConnection(forSchema) function will take a parameter for the target schema (e.g. logging), obtain a connection that could last have been pointing at any schema, and issue a schema-switch SQL statement (set search_path to 'target_schema').
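As a rough sketch, assuming a PostgreSQL-style search_path and an existing javax.sql.DataSource as the underlying pool (SchemaRoutingDataSource is a made-up name for illustration), the layer could look like this:

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import javax.sql.DataSource;

/**
 * Thin layer over an existing pool: hands out connections pointed at the
 * requested schema by issuing a schema switch before returning them.
 * Sketch only - schema names are assumed to come from a trusted, fixed set.
 */
public class SchemaRoutingDataSource {

    private final DataSource pool;

    public SchemaRoutingDataSource(DataSource pool) {
        this.pool = pool;
    }

    public Connection getConnection(String forSchema) throws SQLException {
        Connection conn = pool.getConnection();
        try (Statement stmt = conn.createStatement()) {
            // The connection could last have pointed at any schema, so switch it.
            stmt.execute("SET search_path TO " + forSchema);
        } catch (SQLException e) {
            conn.close(); // give the connection back to the pool on failure
            throw e;
        }
        return conn;
    }
}
```

Tracking which schema each pooled connection last pointed at (to skip the switch when it already matches) would need extra bookkeeping on top of this, since the wrapper cannot see inside the pool.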
Please do not comment on whether this approach is right or wrong. Because 'it depends' needs to be considered, such comments will not add value.
My question is whether there is a DB pool implementation out there that already does this - allows me to have one set of connections and automatically places me at the right schema, or better yet, tracks whether a pooled connection is already on the target schema before deciding to switch (saving a DB round trip).
I would also like to hear from anyone else who has a similar issue (real-world experience) if you solved it in a different way.

Having learned the hard way myself, the best way to stabilize the number of database connections between a web application and a database is to put a reverse proxy in front of the web application.
Here's why it works:
A slow part of a web request can be returning the data to the client. If there's a lot of data or the user is on a slow connection, the connection can remain open to the web server while the data dribbles out to the client slowly. Meanwhile, the application server continues to hold a database connection open to the backend server. While the database connection may only be needed for a fraction of the transaction, it's tied up until the client disconnects from the application server.
Now, consider what happens when a reverse proxy is added in front of the app server. When the app server has a response prepared, it can quickly reply back to the reverse proxy in front of it and free up the database connection behind it. The reverse proxy can then handle slowly dribbling out responses to users, without keeping a related database connection tied up.
Before I made this architectural change, there were a number of traffic spikes that resulted in death spirals: the database handle usage would spike to exhaustion, and things would go downhill from there.
After the change, the number of database handles required was both far lower and far more stable.
If a reverse proxy is not already part of your architecture, I recommend it as a first step to control the number of database connections you require.

Related

Direct DB connection per HTTP request vs connection pooling - what is the difference?

Let's say I am storing data for Person(id, country_id, name), and the user sends an id and country_id and we send back the name.
Now I have one DB and 2 web servers, and each web server keeps a connection pool (e.g. c3p0) of 20 connections.
That means the DB is maintaining 40 connections and each web server is maintaining 20 connections.
Analyzing the above system, we can see that we used a connection pool because people say "creating a DB connection is expensive".
This all makes sense.
Now let's say I shard the table data on country_id, so there may be 200 DBs, and assuming our app is popular now, we need 50 web servers.
The above strategy of connection pooling now fails, because each web server keeps 20 connections in its pool for each DB:
each web server will have 20 connections * 200 DBs = 4,000 connections,
and each DB will have 50 web servers * 20 connections = 1,000 connections.
This doesn't sound good, which raises the question: why use connection pooling at all, and what is the overhead of creating one connection per web request?
So I ran a test and saw that DriverManager.getConnection() takes an average of 20 ms on localhost.
20 ms extra per request is not a game killer.
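A minimal sketch of that kind of measurement, assuming a local MySQL instance (the URL and credentials are placeholders):

```java
import java.sql.Connection;
import java.sql.DriverManager;

public class ConnectionTimingTest {
    public static void main(String[] args) throws Exception {
        // Placeholder URL and credentials for a local test database.
        String url = "jdbc:mysql://localhost:3306/testdb";
        int runs = 50;
        long totalNanos = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            try (Connection conn = DriverManager.getConnection(url, "user", "password")) {
                totalNanos += System.nanoTime() - start;
            }
        }
        System.out.printf("Average time to open a connection: %.1f ms%n",
                totalNanos / (double) runs / 1_000_000);
    }
}
```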
Question 1: Is there any other downside to using one connection per web request?
Question 2: People all over the internet say "a DB connection is expensive". What are the different expenses?
PS: I also see Pinterest doing the same: https://medium.com/@Pinterest_Engineering/sharding-pinterest-how-we-scaled-our-mysql-fleet-3f341e96ca6f
Other than the connection creation and close cycle being time consuming (i.e. costly), pooling is also done to control the number of simultaneously open connections to your database, since there is a limit on the number of simultaneous connections a DB server can handle. When you open one connection per request, you lose that control and your application is always at risk of crashing at peak load.
Secondly, you would unnecessarily tie your web server capacity to your database capacity. The goal is also to treat DB connection management not as a developer concern but as an infrastructure concern. Would you want to give developers control over opening database connections in a production application from their code?
In traditional monolithic application servers like WebLogic, JBoss, WebSphere etc., it's the sysadmin who creates a connection pool sized to the DB server's capacity and passes the JNDI name on to developers. The developer's job is only to get a connection using that JNDI name.
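For illustration, a typical lookup on the developer's side might look like the following sketch (the JNDI name jdbc/AppDS is a placeholder configured by the admin):

```java
import java.sql.Connection;
import javax.naming.InitialContext;
import javax.sql.DataSource;

public class OrderDao {

    public void doWork() throws Exception {
        // The pool itself is configured by the sysadmin in the app server;
        // the developer only knows the JNDI name (a placeholder here).
        DataSource ds = (DataSource) new InitialContext().lookup("java:comp/env/jdbc/AppDS");
        try (Connection conn = ds.getConnection()) {
            // ... run queries; close() returns the connection to the pool
        }
    }
}
```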
Next, if the database is shared among various independent applications, pooling lets you control what you are giving out to which application. Some apps might be more data intensive and some might not.
The traditional problem of resource leaks, i.e. when developers forget to cleanly close their connections, is also taken care of by pooling.
All in all, the idea behind pooling is to let developers be concerned only with using a connection to do their job, and not worry about opening and closing it. If a connection is not used for X minutes, it is returned to the pool per the configuration.
If you have a busy web site and every request to the database opens and closes a connection, you are dead in the water.
The 20 ms you measured is for a localhost connection. I don't think that all your 50 web servers will be on localhost...
Apart from the time it takes to establish and close a database connection, it also uses resources on the database server. This is mostly the CPU, but there could also be contention on kernel data structures.
Also, if you allow several thousand connections, there is nothing that keeps them from all getting busy at the same time, in which case your database server will be overloaded and unresponsive unless it has several thousand cores (and even then you'd be limited by lock contention).
Your solution is an external connection pool like pgBouncer.

Java many connections to API server with threads

I have created a custom web API for submitting data to a MySQL server for my Java application. Within my Java application I need to add/update 200 rows, and I already have it making these requests one at a time.
This can be pretty time consuming, so can I create threads for all of these different connections?
Should I limit the maximum number of connections made at a time? Maybe 10 at a time?
Will this cause any issues with MySQL, possibly from adding rows at exactly the same time? (No two rows would ever need to be changed at a single point in time.)
Inserting records over multiple connections will potentially speed up the 200 inserts, but you can only know for sure by testing and measuring after introducing multiple threads. I would also suggest trying JDBC batching and sending all 200 inserts to the database in one go (if that is possible in your implementation), as that might provide a performance boost by saving round trips to the database.
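A sketch of the JDBC batch approach (the table, columns, and Row record here are made up for illustration):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.List;

public class BatchInserter {

    // Hypothetical record type holding the values to insert.
    public record Row(int id, String value) {}

    public void insertAll(Connection conn, List<Row> rows) throws Exception {
        String sql = "INSERT INTO my_table (id, value) VALUES (?, ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (Row row : rows) {
                ps.setInt(1, row.id());
                ps.setString(2, row.value());
                ps.addBatch();      // queue the insert locally
            }
            ps.executeBatch();      // execute all queued inserts as one batch
        }
    }
}
```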
To create a connection pool, look at HikariCP, which is a JDBC connection pool implementation. It allows you to specify the min/max number of concurrent connections along with other settings. Your worker threads can then request connections from the pool and perform the inserts.
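A minimal HikariCP setup along those lines might look like this sketch (the JDBC URL, credentials, and pool sizes are placeholders to tune for your environment):

```java
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class PoolSetup {

    public static HikariDataSource createPool() {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:mysql://localhost:3306/mydb"); // placeholder URL
        config.setUsername("user");                            // placeholder credentials
        config.setPassword("password");
        config.setMaximumPoolSize(10); // cap concurrent connections, e.g. 10 at a time
        config.setMinimumIdle(2);      // keep a couple of connections warm
        return new HikariDataSource(config);
    }
}
```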
Inserting multiple records concurrently could have issues at the MySQL level if it acquires a table lock for each insert. In that case you would not get a speed improvement from multiple threads and might need some tuning at the database level to work around it. Here's a good article that should help: High rate insertion with MySQL.

Optimal database access for continued and multiple CRUD operations

I am sure this has been asked before, but I had a hard time coming up with the proper terms to search for, so I was unable to find any information. I apologize if it is a duplicate.
Consider the following scenario:
A game server is backed by an SQL database for player storage and logging. Every time a player logs in, data is retrieved and written. Also, every few seconds (20 seconds or so) the logs are written to the database, including changed data about the players.
I am wondering how to handle these connections. Keeping the connection open forever is a bad idea because the MySQL server closes it after "inactivity".
Opening a connection each time works, but I am wondering whether it is the best approach or whether there is another possibility.
That is what connection pools are good for. Try HikariCP; it's extremely fast. You can use it on top of plain JDBC as well as JPA or O/R mappers. It will keep a set of connections open (pooling) and manage their reuse if you have a lot of concurrent connections.
If you have to store logs in the database, there are several logging frameworks that already have functions to do so. For example, Logback has a DBAppender that works on top of connection pools:
"..., sending 500 logging requests to the aforementioned MySQL database takes around 0.5 seconds, for an average of 1 millisecond per request, that is a tenfold improvement in performance." (source)

Opening a new database connection for every client that connects to the server application?

I am in the process of building a client-server application and I would really like some advice on how to design the server-database connection part.
Let's say the basic idea is the following:
Client authenticates himself on the server.
Client sends a request to server.
Server stores client's request to the local database.
In terms of Java Objects we have
Client Object
Server Object
Database Object
So when a client connects to the server, a session is created between them through which all the data is exchanged. Now what bothers me is whether I should create a database object/connection for each client session or create one database object that will handle all requests.
Thus the two concepts are
Create one database object that handles all client requests
For each client-server session create a database object that is used exclusively for the client.
Going with option 1, I guess that all methods would have to be synchronized in order to avoid one client thread overwriting the variables of another. However, synchronizing will be time consuming in the case of a lot of concurrent requests, as each request will be queued until the running one is completed.
Option 2 seems a more appropriate solution, but creating a database object for every client-server session consumes memory, and creating a database connection for each client could again lead to problems when the number of concurrently connected users is large.
These are just my thoughts, so please add any comments that it may help on the decision.
Thank you
Option 3: use a connection pool. Every time you want to connect to the database, you get a connection from the pool. When you're done with it, you close the connection to give it back to the pool.
That way, you can
have several clients accessing the database concurrently (your option 1 doesn't allow that)
have a reasonable number of connections open and avoid bringing the database to its knees or running out of available connections (your option 2 doesn't allow that)
avoid opening new database connections all the time (your option 2 doesn't allow that). Opening a connection is a costly operation.
Basically all server apps use this strategy. All Java EE servers come with a connection pool. You can also use it in Java SE applications, by using a pool as a library (HikariCP, Tomcat connection pool, etc.)
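In code, "getting a connection from the pool and closing it to give it back" typically looks like this sketch (the DataSource is whatever pool you configure; the query and names are illustrative):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import javax.sql.DataSource;

public class RequestHandler {

    private final DataSource pool; // e.g. a HikariCP or Tomcat pool, created at startup

    public RequestHandler(DataSource pool) {
        this.pool = pool;
    }

    public String findName(int id) throws Exception {
        // try-with-resources: close() returns the connection to the pool
        try (Connection conn = pool.getConnection();
             PreparedStatement ps = conn.prepareStatement("SELECT name FROM person WHERE id = ?")) {
            ps.setInt(1, id);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString("name") : null;
            }
        }
    }
}
```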
I would suggest a third option: database connection pooling. This way you create a specified number of connections and hand out the first free connection as soon as one becomes available. This gives you the best of both worlds: there will almost always be free connections available quickly, and you keep the number of connections to the database at a reasonable level. There are plenty of off-the-shelf Java connection pooling solutions, so have a look online.
Just use connection pooling and go with option 2. There are quite a few - C3P0, BoneCP, DBCP. I prefer BoneCP.
Neither of these is a good solution.
Problem with Option 1:
You already stated the problems with synchronizing when there are multiple threads. Apart from that, there are many other problems, like transaction management (when are you going to commit your connection?) and security (all clients can see each other's uncommitted values), just to state a few.
Problem with Option 2:
Two of the biggest problems with this are:
Creating a new connection each and every time takes a lot of time, so performance becomes an issue.
Database connections are extremely expensive resources that should be kept to limited numbers. If you start creating DB connections for every client, you will soon run out of them, even though most of the connections are not actively used. You will also see your application's performance drop.
The Connection Pooling Option
That is why almost all client-server applications go with the connection pooling solution. You have a set of connections in the pool, which are obtained and released appropriately. Almost all Java frameworks have sophisticated connection pooling solutions.
If you are not using any JDBC framework (most use Spring JDBC or Hibernate), read the following article:
http://docs.oracle.com/javase/jndi/tutorial/ldap/connect/pool.html
If you are using any of the popular Java Frameworks like Spring, I would suggest you use Connection Pooling provided by the framework.

Does H2 spawn a new thread for each remote connection? Then, is there a limit?

I am developing an online mobile game. I have several server machines running numerous instances of a Java socket server application.
Player data has to be stored somewhere (their profiles, items etc). I want to use the H2 database for this purpose.
Now, here's the tricky part: I want all the player data to be stored in the same H2 database. That is, all my server applications will access the data by remotely connecting to one particular machine over TCP, out of convenience.
The thing is, we are expecting a very large amount of clients on launch. For each client, a connection to the H2 database is created. The obvious concern here is whether one single H2 database process can handle so many connections concurrently.
From the website:
There is no limit on the number of database open concurrently per server, or on the number of open connections.
Given the above fact, in theory, if our server machine has enough resources (memory, space, CPUs, etc), then yes, the H2 database should be able to handle as many concurrent connections as our resources allow.
But there is something unclear to me:
Does the H2 process create a thread for each remote connection? I ask because I once read that in Windows (our VPS's OS), a thread is stored as a short type, and hence the maximum number of threads an application can spawn is roughly 32,000 (I don't know the math they used to get that number). In that case, the H2 process does have a limit on concurrent connections, which is troubling because I do indeed expect more than 32,000 connected clients.
Of course, it would seem wise to discard the idea of having one single H2 database for all my clients. But I'd like to know if the above statement is correct: can H2 handle more than 32,000 remote database connections?
Let's take this in parts:
"Does the H2 process create a thread for each remote connection?"
An application should normally use one connection per thread. An H2 database synchronizes access to the same connection, but other databases may not do this.
"can H2 handle more than 32,000 remote database connections?"
If you want to access the same database at the same time from different processes or computers, you need to use the client/server mode. The JdbcConnectionPool class has a default maximum of 10 connections, but it provides a setter to change that if you want. In theory you can set it to Integer.MAX_VALUE, but I don't think this is wise. Why? For starters, because of the synchronization point made in the previous section. Another point to consider: if your application opens and closes connections a lot (for example, for each request), you should use a connection pool, since opening a DB connection is very slow.
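For reference, a small sketch of using H2's JdbcConnectionPool in client/server mode (the TCP URL, credentials, and pool size are placeholders):

```java
import java.sql.Connection;
import org.h2.jdbcx.JdbcConnectionPool;

public class H2PoolExample {

    public static void main(String[] args) throws Exception {
        // Client/server mode over TCP; URL and credentials are placeholders.
        JdbcConnectionPool pool = JdbcConnectionPool.create(
                "jdbc:h2:tcp://dbhost:9092/~/gamedb", "sa", "sa");
        pool.setMaxConnections(50); // default is 10; raise deliberately, not to Integer.MAX_VALUE

        try (Connection conn = pool.getConnection()) {
            // ... read or write player data; close() returns the connection to the pool
        }

        pool.dispose(); // shut the pool down when the server stops
    }
}
```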
"Of course, it would seem wise to discard the idea of having one single H2 database for all my clients"
It might be, but you have to keep in mind that the number of open databases is limited by the available memory. If you are running on a powerful server, it might be a good option to consider. Then again, it might not.
