Redis - acquiring connection time with PHP clients

My use case is a bunch of isolated calls that at some point interact with redis.
The problem is that I'm seeing a super long wait time for acquiring a connection, having tried both Predis and Credis in my LAN environment. With 1-3 client threads, the time it takes for my PHP scripts to connect to Redis and select a database ranges from 18 ms to 700 ms!
Normally, I'd use a connection pool or cache a connection and use it across all my threads, but I don't think this can be done in PHP over different scripts.
Is there anything I can do to speed this up?

Apparently, Predis needs the persistent flag set (see https://github.com/nrk/predis/wiki/Connection-Parameters) and also needs PHP-FPM for persistent connections to work, which was frustrating to set up on both Windows and Linux, not to mention testing before switching to FPM on our live setup.
I've switched to Phpredis (https://github.com/phpredis/phpredis), which is a PHP module/extension, and all is good now. The connection times have dropped dramatically using $redis->pconnect() and are consistent across multiple scripts/threads.
Caveat: it IS a little different from Predis in terms of error handling (it fails when instantiating the object rather than when running the first call, and it returns false instead of null for nonexistent values), so watch out for that if you're switching from Predis.

Related

Microsoft SQL Server - query with huge results causes ASYNC_NETWORK_IO wait issue

I have a Java application requesting about 2.4 million records from a Microsoft SQL Server 2008 R2 (SP3) instance.
The application runs fine on all hosts except one. On that host, the application is able to retrieve data on some occasions, but on others it hangs.
Monitoring the SQL Server indicates that the SPID associated with the query is in an ASYNC_NETWORK_IO wait state.
There are a few links online that talk about it.
https://blogs.msdn.microsoft.com/joesack/2009/01/08/troubleshooting-async_network_io-networkio/
https://social.msdn.microsoft.com/Forums/sqlserver/en-US/6db233d5-8892-4f8a-88c7-b72d0fc59ca9/very-high-asyncnetworkio?forum=sqldatabaseengine
https://social.msdn.microsoft.com/Forums/sqlserver/en-US/1df2cab8-33ca-4870-9daf-ed333a64630c/network-packet-size-and-delay-by-sql-server-sending-data-to-client?forum=sqldatabaseengine
Based on the above, ASYNC_NETWORK_IO means one of two things:
1. The application is slow to process the results.
2. The network between the application and the DB has some issue.
For #1 above, we analyzed tcpdump captures and found that in the cases where the query goes into the ASYNC_NETWORK_IO state, the application server's TCP connection has a window size that oscillates between 0 and a small number, and eventually remains stuck at 0. Based on some more analysis, firewalls between the DB and the application have also been mostly ruled out.
So I am staring at #2, unable to understand what could possibly be going wrong. It is all the more baffling because the same code has been running under similar data loads for more than a year now, and it also runs fine on other hosts.
The JDBC driver being used is sqljdbc4-4.0.jar.
By default this driver has an adaptive buffering feature, which does things under the hood to reduce the application's resource usage.
We use the default fetch size of 128 (which I believe is not a good one).
So I am going to experiment with overriding the default adaptive buffering behavior, though the MS docs suggest that adaptive buffering is good for large result sets.
I will change the connection setting to use selectMethod=cursor, and also change the fetchSize to 1024.
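For reference, a minimal sketch of that experiment, under assumptions: the host, database, credentials and table name below are placeholders, not values from my actual setup. selectMethod=cursor goes on the connection URL, and the fetch size is set on the statement.

```java
import java.sql.*;

public class LargeResultTest {
    public static void main(String[] args) throws SQLException {
        // selectMethod=cursor asks the driver to use a server-side cursor
        // instead of its default adaptive buffering of the result set.
        String url = "jdbc:sqlserver://dbhost:1433;databaseName=mydb;selectMethod=cursor";
        try (Connection con = DriverManager.getConnection(url, "user", "password");
             Statement st = con.createStatement()) {
            st.setFetchSize(1024);                 // fetch 1024 rows per round trip instead of the default
            try (ResultSet rs = st.executeQuery("SELECT * FROM big_table")) {
                long count = 0;
                while (rs.next()) {
                    count++;                       // consume rows promptly so the TCP window stays open
                }
                System.out.println("Rows read: " + count);
            }
        }
    }
}
```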
Now, if it does not work:
1. What aspects of the problem are worth investigating?
2. Assuming it's still an issue on the client, what other connection or network settings should be inspected or changed to make progress?
And if it does work consistently, what is the impact of the selectMethod=cursor connection setting on the application side and on the database side?
Update: I tested the application after adding selectMethod=cursor to the connection string. However, it results in the same issue as above.
Based on discussions with other administrators on the team, at this point the issue may be in the JDBC driver or in the OS (when it tries to handle the data coming off the network).
After a good amount of discussion with the system admin, network admin and database admin, it was agreed that somewhere in the OS -> application stack the data from the network wasn't being handled. In the meantime, we tested a workaround in which we broke the query down into smaller result sets: 5 queries, each returning about 500k records.
When we ran these queries sequentially, we still ran into the same issue.
However, when we ran the queries in parallel, it was always successful.
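A rough sketch of that parallel workaround, assuming the rows can be split on an indexed id column; the ranges, table name and connection details are made up for illustration, since the original split isn't described here.

```java
import java.sql.*;
import java.util.*;
import java.util.concurrent.*;

public class ParallelChunks {
    private static final String URL = "jdbc:sqlserver://dbhost:1433;databaseName=mydb";

    public static void main(String[] args) throws Exception {
        // Five ranges of roughly 500k rows each (hypothetical boundaries).
        long[][] ranges = { {0, 500_000}, {500_000, 1_000_000}, {1_000_000, 1_500_000},
                            {1_500_000, 2_000_000}, {2_000_000, 2_500_000} };
        ExecutorService pool = Executors.newFixedThreadPool(ranges.length);
        List<Future<Long>> results = new ArrayList<>();
        for (long[] r : ranges) {
            results.add(pool.submit(() -> fetchRange(r[0], r[1])));   // one connection per chunk
        }
        long total = 0;
        for (Future<Long> f : results) {
            total += f.get();                                         // wait for every chunk
        }
        pool.shutdown();
        System.out.println("Total rows: " + total);
    }

    private static long fetchRange(long from, long to) throws SQLException {
        long count = 0;
        try (Connection con = DriverManager.getConnection(URL, "user", "password");
             PreparedStatement ps = con.prepareStatement(
                     "SELECT * FROM big_table WHERE id >= ? AND id < ?")) {
            ps.setLong(1, from);
            ps.setLong(2, to);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    count++;                                          // consume rows as they arrive
                }
            }
        }
        return count;
    }
}
```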
Given that this workaround always worked, we haven't bothered getting to the root cause of the problem anymore.
On another note, the hardware and software running the application were also outdated: the host was running Red Hat 5, so it could well have something to do with that.

Opening a new database connection for every client that connects to the server application?

I am in the process of building a client-server application and I would really like some advice on how to design the server-database connection part.
Let's say the basic idea is the following:
Client authenticates himself on the server.
Client sends a request to server.
Server stores client's request to the local database.
In terms of Java Objects we have
Client Object
Server Object
Database Object
So when a client connects to the server, a session is created between them through which all the data is exchanged. Now what bothers me is whether I should create a database object/connection for each client session, or whether I should create one database object that handles all requests.
Thus the two concepts are
Create one database object that handles all client requests
For each client-server session create a database object that is used exclusively for the client.
Going with option 1, I guess that all methods should become synchronized in order to avoid one client thread overwriting the variables of another. However, making them synchronized will be time consuming when there are a lot of concurrent requests, as each request will be queued until the running one completes.
Option 2 seems more appropriate, but creating a database object for every client-server session consumes memory, and opening a database connection for each client could again become a problem when the number of concurrently connected users is large.
These are just my thoughts, so please add any comments that may help with the decision.
Thank you
Option 3: use a connection pool. Every time you want to connect to the database, you get a connection from the pool. When you're done with it, you close the connection to give it back to the pool.
That way, you can
have several clients accessing the database concurrently (your option 1 doesn't allow that)
have a reasonable number of connections open, avoiding bringing the database to its knees or running out of available connections (your option 2 doesn't allow that)
avoid opening new database connections all the time (your option 2 doesn't allow that). Opening a connection is a costly operation.
Basically all server apps use this strategy. All Java EE servers come with a connection pool. You can also use one in Java SE applications by using a pool library (HikariCP, the Tomcat connection pool, etc.).
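As a minimal sketch of what that looks like with HikariCP; the JDBC URL, credentials, pool size and the requests table are placeholders for whatever your server actually uses.

```java
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;
import java.sql.Connection;
import java.sql.PreparedStatement;

public class RequestStore {
    private static final HikariDataSource DS;

    static {
        HikariConfig cfg = new HikariConfig();
        cfg.setJdbcUrl("jdbc:mysql://localhost:3306/appdb");   // placeholder URL
        cfg.setUsername("app");
        cfg.setPassword("secret");
        cfg.setMaximumPoolSize(20);    // caps the number of physical DB connections
        DS = new HikariDataSource(cfg);
    }

    public static void storeRequest(int clientId, String payload) throws Exception {
        // getConnection() borrows a connection from the pool; close() returns it
        // to the pool instead of tearing down the physical connection.
        try (Connection con = DS.getConnection();
             PreparedStatement ps = con.prepareStatement(
                     "INSERT INTO requests (client_id, payload) VALUES (?, ?)")) {
            ps.setInt(1, clientId);
            ps.setString(2, payload);
            ps.executeUpdate();
        }
    }
}
```

Every client session can call storeRequest() concurrently; the pool, not the number of connected clients, decides how many connections the database actually sees.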
I would suggest a third option: database connection pooling. This way you create a specified number of connections and hand out the first free connection as soon as it becomes available. This gives you the best of both worlds: there will almost always be a free connection available quickly, and you keep the number of connections to the database at a reasonable level. There are plenty of out-of-the-box Java connection pooling solutions, so have a look online.
Just use connection pooling and go with option 2. There are quite a few - C3P0, BoneCP, DBCP. I prefer BoneCP.
Neither option is a good solution.
Problem with Option 1:
You already stated the problems with synchronization when there are multiple threads. Apart from that, there are many other problems, like transaction management (when are you going to commit your connection?) and security (all clients can see each other's uncommitted values), just to state a few.
Problem with Option 2:
Two of the biggest problems with this are:
It takes a lot of time to create a new connection each and every time, so performance will become an issue.
Database connections are extremely expensive resources which should be used in limited numbers. If you start creating DB connections for every client, you will soon run out of them even though most of the connections are not actively used, and you will also see your application's performance drop.
The Connection Pooling Option
That is why almost all client-server applications go with the connection pooling solution. You have a set of connections in the pool which are obtained and released appropriately. Almost all Java frameworks have sophisticated connection pooling solutions.
If you are not using a JDBC framework (most use Spring JDBC or Hibernate), read the following article:
http://docs.oracle.com/javase/jndi/tutorial/ldap/connect/pool.html
If you are using any of the popular Java Frameworks like Spring, I would suggest you use Connection Pooling provided by the framework.

Does H2 spawn a new thread for each remote connection? Then, is there a limit?

I am developing an online mobile game. I have several server machines running numerous instances of a Java socket server application.
Player data has to be stored somewhere (their profiles, items etc). I want to use the H2 database for this purpose.
Now, here's the tricky part: I want all the player data to be stored in the same H2 database. That is, all my server applications will access the data by remotely connecting to one particular machine over TCP, out of convenience.
The thing is, we are expecting a very large amount of clients on launch. For each client, a connection to the H2 database is created. The obvious concern here is whether one single H2 database process can handle so many connections concurrently.
From the website:
There is no limit on the number of database open concurrently per server, or on the number of open connections.
Given the above fact, in theory, if our server machine has enough resources (memory, space, CPUs, etc), then yes, the H2 database should be able to handle as many concurrent connections as our resources allow.
But there is something unclear to me:
Does the H2 process create a thread for each remote connection? I ask this because I once read that in Windows (our VPS's OS) a thread is stored as a short type, and hence the maximum number of threads an application can spawn is roughly 32,000 (I don't know the math they used to get that number). In that case, the H2 process does have a limit on concurrent connections, which is troubling because I do indeed expect more than 32,000 connected clients.
Of course, it would seem wise to discard the idea of having one single H2 database for all my clients. But I'd like to know if the above statement is correct: can H2 handle more than 32,000 remote database connections?
Let's take this in parts:
"Does the H2 process create a thread for each remote connection?"
An application should normally use one connection per thread. An H2 database synchronizes access to the same connection, but other databases may not do this.
"can H2 handle more than 32,000 remote database connections?"
If you want to access the same database at the same time from different processes or computers, you need to use the client/server mode. The JdbcConnectionPool class has a default maximum of 10 connections, but it provides a setter to change that if you want. In theory you can set it to Integer.MAX_VALUE, but I don't think this is wise. Why? For starters, there is the synchronization point made in the previous section. Another point to consider: if your application opens and closes connections a lot (for example, for each request), you should use a connection pool, because opening a DB connection is very slow.
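For illustration, a small sketch using H2's built-in JdbcConnectionPool in client/server mode; the tcp URL, credentials and pool size are placeholders.

```java
import org.h2.jdbcx.JdbcConnectionPool;
import java.sql.Connection;

public class H2PoolExample {
    public static void main(String[] args) throws Exception {
        JdbcConnectionPool pool = JdbcConnectionPool.create(
                "jdbc:h2:tcp://dbhost:9092/~/game", "sa", "sa");
        pool.setMaxConnections(50);      // default is 10; raise it deliberately, not to MAX_VALUE

        try (Connection con = pool.getConnection()) {
            // use the connection; close() returns it to the pool
            System.out.println(con.getMetaData().getDatabaseProductName());
        }

        pool.dispose();                  // shut the pool down when the server stops
    }
}
```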
"Of course, it would seem wise to discard the idea of having one single H2 database for all my clients"
It might be, but you have to keep in mind that the number of open databases is limited by the available memory. If you are running on a powerful server, it might be a good option to consider. Then again, it might not.

PHP vs Java MySQL connection pooling

I have a problem that I thought I would break down to its simplest form. There are two applications on a LAMP stack, one PHP and one Java, that do only and exactly the same thing: run a simple query:
SELECT * FROM test
PHP execution takes 30 ms in total
Java execution takes 230 ms in total
Query run on a local MySQL client takes 10-15 ms in total
Java takes roughly ~200 ms every time just to establish a connection to the db. I understand that PHP uses some kind of built-in connection pooling, and therefore it doesn't need to establish a new connection every time and only takes 30 ms as a result.
Is the same thing possible in Java? So far I have failed to achieve it. I tried Apache Commons DBCP connection pooling: no change at all, it still takes the same time to connect to the database.
UPDATE: This is a separate question where I'm trying to make connection pooling work on Java, for those who are asking for a code example: Java MySQL connetion pool is not working
You are misunderstanding the concept and purpose of connection pooling.
Connection pooling is meant to maintain a (set of) connections on a single (Java Virtual) machine (typically, an application server). The goal is to allow multiple threads on the same machine to essentially share their connections to the database server without the need to open one every time they need to query the database.
Stand-alone applications cannot share connections as they run on different Java Virtual Machines, or perhaps even on different physical machines.
Connection pooling would help in your stand-alone application if you had several threads making concurrent access to the database.
However, you can still measure the benefit of connection pooling by wrapping your test (the code inside your try...catch) in a loop and iterating a few times. On the first iteration, the connection needs to be opened. Don't forget to release the connection (call Con.close()); it will then be reused on the next iteration.
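A hedged sketch of that measurement using Commons DBCP; the URL, credentials and the timing loop are illustrative, not the original test code.

```java
import org.apache.commons.dbcp2.BasicDataSource;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

public class PoolTiming {
    public static void main(String[] args) throws Exception {
        BasicDataSource ds = new BasicDataSource();
        ds.setUrl("jdbc:mysql://localhost:3306/test");   // placeholder URL
        ds.setUsername("user");
        ds.setPassword("password");

        for (int i = 0; i < 5; i++) {
            long start = System.currentTimeMillis();
            // The first iteration pays the ~200 ms to open a physical connection;
            // later iterations reuse it, because close() only returns it to the pool.
            try (Connection con = ds.getConnection();
                 Statement st = con.createStatement();
                 ResultSet rs = st.executeQuery("SELECT * FROM test")) {
                while (rs.next()) { /* consume rows */ }
            }
            System.out.println("Iteration " + i + ": "
                    + (System.currentTimeMillis() - start) + " ms");
        }
        ds.close();
    }
}
```

If the later iterations are still as slow as the first, the pool is not actually being reused, which points back at how the DataSource is being created for each request.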

Passing around DB connection objects between multiple map-reduce jobs

Fundamentally, this question is about: can the same DB connection be used across multiple processes (since different map-reduce jobs really are different, independent processes)?
I know this is a somewhat trivial question, but it would be great if somebody could answer it as well: what happens if the maximum number of connections to the DB (which is preconfigured on the server hosting the DB) has been exhausted and a new process tries to get a new connection? Does it wait for some time, and if so, is there a way to set a timeout for this wait period? In this particular case I am talking about a Postgres DB, and the language used for talking to the DB is Java.
To give you some context, I have multiple map-reduce jobs (about 40 reducers) running in parallel, each wanting to update a Postgres DB. How do I efficiently manage these DB reads/writes from these processes? Note: the DB is hosted on a separate machine, independent of where the map-reduce jobs are running.
Connection pooling is one option, but it can be very inefficient at times, especially with several reads/writes per second.
Can the same DB connection be used across multiple processes
No, not in any sane or reliable way. You could use a broker process, but then you'd be one step away from inventing a connection pool anyway.
What happens if the maximum number of connections to the DB (which is preconfigured on the server hosting the DB) has been exhausted and a new process tries to get a new connection?
The connection attempt fails with SQLSTATE 53300 too_many_connections. If it waited, the server could exhaust other limits and begin to have issues servicing existing clients.
For a problem like this you'd usually use tools like C3P0 or DBCP that do in-JVM pooling, but this won't work when you have multiple JVMs.
What you need to do is to use an external connection pool like PgBouncer or PgPool-II to maintain a set of lightweight connections from your workers. The pooler then has a smaller number of real server connections and shares those between the lightweight connections from clients.
Connection pooling is typically more efficient than not pooling, because it allows you to optimise the number of active PostgreSQL worker processes to the hardware and workload, providing admission control for work.
An alternative is to have a writer process with one or more threads (one connection per thread) that takes finished work from the reduce workers and writes it to the DB, so the reduce workers can get on with their next unit of work. You'd need a way to tell the reduce workers to wait if the writer gets too far behind. There are several Java queueing implementations that would be suitable for this, or you could use JMS.
See IPC Suggestion for lots of small data
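A sketch of that single-writer idea, under assumptions: a bounded in-memory queue provides the back-pressure (reducers block when the writer falls behind), and the table name, URL and batch size are made up.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ResultWriter implements Runnable {
    private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(10_000);

    // Called by reduce workers; blocks when the queue is full, so workers wait
    // if the writer gets too far behind.
    public void submit(String payload) throws InterruptedException {
        queue.put(payload);
    }

    @Override
    public void run() {
        try (Connection con = DriverManager.getConnection(
                     "jdbc:postgresql://dbhost:5432/results", "writer", "secret");
             PreparedStatement ps = con.prepareStatement(
                     "INSERT INTO results (payload) VALUES (?)")) {
            con.setAutoCommit(false);
            int pending = 0;
            while (!Thread.currentThread().isInterrupted()) {
                ps.setString(1, queue.take());
                ps.addBatch();
                if (++pending == 500) {      // batch work into bigger transactions
                    ps.executeBatch();
                    con.commit();
                    pending = 0;
                }
            }
        } catch (Exception e) {
            e.printStackTrace();             // a real writer would report and retry
        }
    }
}
```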
It's also worth optimizing how you write to PostgreSQL as much as possible, using:
Prepared statements
A commit_delay
synchronous_commit = 'off' if you can afford to lose a few transactions if the server crashes
Batching work into bigger transactions
COPY or multi-valued INSERTs to insert blocks of data (see the sketch after this list)
Decent hardware with a useful disk subsystem, not some Amazon EC2 instance with awful I/O or a RAID 5 box with 5400rpm disks
A proper RAID controller with battery backed write-back cache to reduce the cost of fsync(). Most important if you can't do big batches of work or use a commit delay; has less impact if your fsync rate is lower because of batching and group commit.
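As an illustration of a few of those items combined (a session-level synchronous_commit, one transaction per block of rows, and a multi-valued INSERT); the table and connection details are placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Statement;
import java.util.List;

public class FastInsert {
    public static void insertBlock(List<String> payloads) throws Exception {
        if (payloads.isEmpty()) {
            return;
        }
        try (Connection con = DriverManager.getConnection(
                "jdbc:postgresql://dbhost:5432/results", "writer", "secret")) {
            con.setAutoCommit(false);
            try (Statement s = con.createStatement()) {
                // Only if losing the last few transactions on a crash is acceptable.
                s.execute("SET synchronous_commit = off");
            }
            // One multi-valued INSERT for the whole block instead of one statement per row.
            StringBuilder sql = new StringBuilder("INSERT INTO results (payload) VALUES ");
            for (int i = 0; i < payloads.size(); i++) {
                sql.append(i == 0 ? "(?)" : ", (?)");
            }
            try (PreparedStatement ps = con.prepareStatement(sql.toString())) {
                for (int i = 0; i < payloads.size(); i++) {
                    ps.setString(i + 1, payloads.get(i));
                }
                ps.executeUpdate();
            }
            con.commit();        // one commit per block of rows
        }
    }
}
```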
See:
http://www.postgresql.org/docs/current/interactive/populate.html
http://www.depesz.com/index.php/2007/07/05/how-to-insert-data-to-database-as-fast-as-possible/
