I am using Apache Ignite with a Java application and observing that with increasing concurrency, response times also increases. I noticed that there is only one connection established between the java application and the Ignite server. How can I confirm if that is the bottleneck? Thread dumps reveal that some threads are waiting for the Socket.Read method. Is it relatable to number of connections?
As of Ignite 2.7.6, Thin Client establishes only one connection to the server node. Yes, it can become a bottleneck when used from multiple threads.
I can recommend either having one IgniteClient instance per thread, or using some kind of a connection pool.
Also, Ignite 2.8 introduces Partition Awareness (release is planned for today), where thin client connection is established to every specified server node, and key-based requests are dispatched to primary nodes. This may help in your case as well.
Did you tried the applications that comes with the java JDK (JVisualVM) or best yourkit to identify where you're loosing time ?
Related
My use case is a bunch of isolated calls that at some point interact with redis.
The problem is I'm seeing a super long wait time for acquiring a connection, having tried both predis and credis on my LAN environment. Over 1-3 client threads, the time it takes for my PHP scripts to connect to redis and select a database ranges from 18ms to 700ms!
Normally, I'd use a connection pool or cache a connection and use it across all my threads, but I don't think this can be done in PHP over different scripts.
Is there anything I can do to speed this up?
Apparently, Predis needs the persistent flag set: https://github.com/nrk/predis/wiki/Connection-Parameters) and also FPM, which was frustrating to set up on both Windows and Linux, not to mention testing before switching to FPM on our live setup.
I've switched to Phpredis (https://github.com/phpredis/phpredis), which is a PHP module/extension, and all is good now. The connection times have dropped dramatically using $redis->pconnect() and are consistent across multiple scripts/threads.
Caveat: it IS a little different from Predis in terms of error handling (it fails when instantiating the object, not when running the first call, it returns false instead of null for nonexistent values - ???), so watch out for that if switching from Predis.
I am in the process of building a client-server application and I would really like an advise on how to design the server-database connection part.
Let's say the basic idea is the following:
Client authenticates himself on the server.
Client sends a request to server.
Server stores client's request to the local database.
In terms of Java Objects we have
Client Object
Server Object
Database Object
So when a client connects to the server a session is created between them through which all the data is exchanged. Now what bothers me is whether i should create a database object/connection for each client session or whether I should create one database object that will handle all requests.
Thus the two concepts are
Create one database object that handles all client requests
For each client-server session create a database object that is used exclusively for the client.
Going with option 1, I guess that all methods should become synchronized in order to avoid one client thread not overwriting the variables of the other. However, making it synchronize it will be time consuming in the case of a lot of concurrent requests as each request will be placed in queue until the one running is completed.
Going with option 2, seems a more appropriate solution but creating a database object for every client-server session is a memory consuming task, plus creating a database connection for each client could lead to a problem again when the number of concurrent connected users is big.
These are just my thoughts, so please add any comments that it may help on the decision.
Thank you
Option 3: use a connection pool. Every time you want to connect to the database, you get a connection from the pool. When you're done with it, you close the connection to give it back to the pool.
That way, you can
have several clients accessing the database concurrently (your option 1 doesn't allow that)
have a reasonable number of connections opened and avoid bringing the database to its knees or run out of available connections (your option 2 doesn't allow that)
avoid opening new database connections all the time (your option 2 doesn't allow that). Opening a connection is a costly operation.
Basically all server apps use this strategy. All Java EE servers come with a connection pool. You can also use it in Java SE applications, by using a pool as a library (HikariCP, Tomcat connection pool, etc.)
I would suggested a third option, database connection pooling. This way you create a specified number of connections and give out the first available free connection as soon as it becomes available. This gives you the best of both worlds - there will almost always be free connections available quickly and you keep the number of connections the database at a reasonable level. There are plenty of the box java connection pooling solutions, so have a look online.
Just use connection pooling and go with option 2. There are quite a few - C3P0, BoneCP, DBCP. I prefer BoneCP.
Both are not good solutions.
Problem with Option 1:
You already stated the problems with synchronizing when there are multiple threads. But apart from that there are many other problems like transaction management (when are you going to commit your connection?), Security (all clients can see precommitted values).. just to state a few..
Problem with Option 2:
Two of the biggest problems with this are:
It takes a lot of time to create a new connection each and every time. So performance will become an issue.
Database connections are extremely expensive resources which should be used in limited numbers. If you start creating DB Connections for every client you will soon run out of them although most of the connections would not be actively used. You will also see your application performance drop.
The Connection Pooling Option
That is why almost all client-server applications go with the connection pooling solution. You have a set connections in the pool which are obtained and released appropriately. Almost all Java Frameworks have sophisticated connection pooling solutions.
If you are not using any JDBC framework (most use the Spring JDBC\Hibernate) read the following article:
http://docs.oracle.com/javase/jndi/tutorial/ldap/connect/pool.html
If you are using any of the popular Java Frameworks like Spring, I would suggest you use Connection Pooling provided by the framework.
I am using Datastax Java Driver.
There is a tutorial to use the same.
What I do not understand is how would one close the connection to cassandra?
There is no close method available and I assume we do not want to shutdown the Session as it it expected to be one per application.
Regards
Gaurav
tl;dr Calling shutdown on Session is the correct way to close connections.
You should be safe to keep a Session object at hand and shut it when you're finished with Cassandra - this can be long-lived. You can get individual connections in the form of Session objects as you need them and shut them down when done but ideally you should only create one Session object per application. A Session is a fairly heavyweight object that keeps pools of pools of connections to the node in the cluster, so creating multiple of those will be inefficient (and unnecessary) (taken verbatim from advice given by Sylvain Lebresne on the mailing list). If you forget to shut down the session(s) they will be all closed when you call shutdown on your Cluster instance... really simple example below:
Cluster cluster = Cluster.builder().addContactPoints(host).withPort(port).build();
Session session = cluster.connect(keyspace);
// Do something with session...
session.shutdown();
cluster.shutdown();
See here - http://www.datastax.com/drivers....
The driver uses connections in an asynchronous manner. Meaning that
multiple requests can be submitted on the same connection at the same
time. This means that the driver only needs to maintain a relatively
small number of connections to each Cassandra host. These options
allow the driver to control how many connections are kept exactly.
For each host, the driver keeps a core pool of connections open at all
times determined by calling . If the use of those connections reaches
a configurable threshold , more connections are created up to the
configurable maximum number of connections. When the pool exceeds the
maximum number of connections, connections in excess are reclaimed if
the use of opened connections drops below the configured threshold
Each of these parameters can be separately set for LOCAL and REMOTE
hosts (HostDistance). For IGNORED hosts, the default for all those
settings is 0 and cannot be changed.
Fundamentally, this question is about: Can the same DB connection be used across multiple processes (as different map-reduce jobs are in real different independent processes).
I know that this is a little trivial question but it would be great if somebody can answer this as well: What happens in case if the maximum number of connections to the DB(which is preconfigured on the server hosting the DB) have exhausted and a new process tries to get a new connection? Does it wait for sometime, and if yes, is there a way to set a timeout for this wait period. I am talking in terms of a PostGres DB in this particular case and the language used for talking to the DB is java.
To give you a context of the problem, I have multiple map-reduce jobs (about 40 reducers) running in parallel, each wanting to update a PostGres DB. How do I efficiently manage these DB read/writes from these processes. Note: The DB is hosted on a separate machine independent of where the map reduce job is running.
Connection pooling is one option but it can be very inefficient at times especially for several reads/writes per second.
Can the same DB connection be used across multiple processes
No, not in any sane or reliable way. You could use a broker process, but then you'd be one step away from inventing a connection pool anyway.
What happens in case if the maximum number of connections to the
DB(which is preconfigured on the server hosting the DB) have exhausted
and a new process tries to get a new connection?
The connection attempt fails with SQLSTATE 53300 too_many_connections. If it waited, the server could exhaust other limits and begin to have issues servicing existing clients.
For a problem like this you'd usually use tools like C3P0 or DBCP that do in-JVM pooling, but this won't work when you have multiple JVMs.
What you need to do is to use an external connection pool like PgBouncer or PgPool-II to maintain a set of lightweight connections from your workers. The pooler then has a smaller number of real server connections and shares those between the lightweight connections from clients.
Connection pooling is typically more efficient than not pooling, because it allows you to optimise the number of active PostgreSQL worker processes to the hardware and workload, providing admission control for work.
An alternative is to have a writer process with one or more threads (one connection per thread) that takes finished work from the reduce workers and writes to the DB, so the reduce workers can get on to their next unit of work. You'd need to have a way to tell the reduce workers to wait if the writer got too far behind. There are several Java queueing system implementations that would be suitable for this, or you could use JMS.
See IPC Suggestion for lots of small data
It's also worth optimizing how you write to PostgreSQL as much as possible, using:
Prepared statements
A commit_delay
synchronous_commit = 'off' if you can afford to lose a few transactions if the server crashes
Batching work into bigger transactions
COPY or multi-valued INSERTs to insert blocks of data
Decent hardware with a useful disk subsystem, not some Amazon EC2 instance with awful I/O or a RAID 5 box with 5400rpm disks
A proper RAID controller with battery backed write-back cache to reduce the cost of fsync(). Most important if you can't do big batches of work or use a commit delay; has less impact if your fsync rate is lower because of batching and group commit.
See:
http://www.postgresql.org/docs/current/interactive/populate.html
http://www.depesz.com/index.php/2007/07/05/how-to-insert-data-to-database-as-fast-as-possible/
Not being a database administrator (even less of a MS database admin :), I have received complaints that a piece of code I've written leaves "sleeping connections" behind in the database.
My code is Java, and uses Apache Commons DBCP for connection pooling. I also use Spring's JdbcTemplate to manage the connection's state, so not closing the connections is out of the question (since the library is doing that for me).
My main question is, from a DBA's point of view, can these connections cause outages or poor performance?
This question is related, currently the settings were left as they were there (infinite active/idle connections in the pool).
Really, to answer your question, an idea of the number of these "sleeping" connections would be good. It also matters whether this server's primary purpose is serving your application, or whether your application is one of many. Also relevant is whether there are multiple instances of your app (eg on multiple web servers), or whether it's just the one.
In my experience, there is little to no overhead associated with idle connections on modern hardware, as long as you don't reach into the hundreds. That said, looking at your previous question, allowing the pool to spawn an unbounded number of connections does not sound wise - I'd recommend setting a cap, even if you set it at a hundreds.
I can tell you from at least one painful situation with leaking connection pools, that having a thousand open connections to a single SQL server is expensive, even if they're idle. I seem to recall the server started losing it (failing to accept new connections, simple queries timing out, etc) when nearing the 2,000-connection range (this was SQL 2000 on mid-range hardware a few years ago).
Hope this helps!
Apache DBCP has maxIdle connections settings to 8 and maxActive settings to 8. This means that 8 number of active connections and 8 numbers of idle connections can exist in the pool. DBCP reuses the connections when the call for connection is made. You can set this according to your requirement. You can refer to the document below:
DBCP Configuration - Apache