vert.x replication over a cluster - java

I want to use Vert.x to implement a socket server.
I will be using a cluster, and there is one problem I cannot figure out. Say I create a ConcurrentMap to store the socket connections in one verticle, and this map is accessed by the other verticles running on other nodes. What happens if the node running the first verticle with the ConcurrentMap crashes? Obviously I would lose all the connections in the ConcurrentMap. How would I replicate this ConcurrentMap so that a copy is always ready in case of a crash? I have looked over the documentation and there does not seem to be a solution for replication. The only solution I can think of is, whenever there is a new socket connection, to insert it into the ConcurrentMap and also into an in-memory Redis database every time. This seems like overkill, though, and recovery could take a long time if there are a lot of connections (millions). Is there an easier way?
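For what it's worth, here is a minimal sketch of the dual-write idea described above, assuming the Vert.x 3 NetServer API and the Jedis client (the class name, Redis key, and ports are made up for illustration):

```java
import io.vertx.core.Vertx;
import io.vertx.core.net.NetSocket;
import redis.clients.jedis.Jedis;

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class SocketRegistry {
    // Local view of live connections on this node, keyed by the socket's write handler ID.
    private final ConcurrentMap<String, NetSocket> connections = new ConcurrentHashMap<>();
    private final Jedis redis = new Jedis("localhost", 6379); // hypothetical Redis endpoint

    public void start(Vertx vertx) {
        vertx.createNetServer()
             .connectHandler(socket -> {
                 String id = socket.writeHandlerID();
                 connections.put(id, socket);        // in-memory map on this node
                 redis.sadd("live-connections", id); // replicated copy for crash recovery
                 // (in production you would not block the event loop with a synchronous Redis call)
                 socket.closeHandler(v -> {
                     connections.remove(id);
                     redis.srem("live-connections", id);
                 });
             })
             .listen(1234);
    }
}
```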

Related

Opening a new database connection for every client that connects to the server application?

I am in the process of building a client-server application and I would really like some advice on how to design the server-database connection part.
Let's say the basic idea is the following:
Client authenticates himself on the server.
Client sends a request to server.
Server stores client's request to the local database.
In terms of Java Objects we have
Client Object
Server Object
Database Object
So when a client connects to the server, a session is created between them through which all the data is exchanged. Now what bothers me is whether I should create a database object/connection for each client session, or whether I should create one database object that will handle all requests.
Thus the two concepts are
Create one database object that handles all client requests
For each client-server session create a database object that is used exclusively for the client.
Going with option 1, I guess that all methods would have to become synchronized in order to prevent one client thread from overwriting the variables of another. However, synchronizing will be time-consuming when there are a lot of concurrent requests, as each request will be queued until the one currently running completes.
Option 2 seems more appropriate, but creating a database object for every client-server session consumes memory, and creating a database connection for each client could again become a problem when the number of concurrently connected users is large.
These are just my thoughts, so please add any comments that might help with the decision.
Thank you
Option 3: use a connection pool. Every time you want to connect to the database, you get a connection from the pool. When you're done with it, you close the connection to give it back to the pool.
That way, you can
have several clients accessing the database concurrently (your option 1 doesn't allow that)
have a reasonable number of connections open and avoid bringing the database to its knees or running out of available connections (your option 2 doesn't allow that)
avoid opening new database connections all the time (your option 2 doesn't allow that). Opening a connection is a costly operation.
Basically all server apps use this strategy. All Java EE servers come with a connection pool. You can also use one in Java SE applications by using a pooling library (HikariCP, Tomcat connection pool, etc.).
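As an illustration, a minimal HikariCP setup in plain Java SE might look like this (the JDBC URL, credentials, and pool size are placeholders):

```java
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class PoolExample {
    public static void main(String[] args) throws Exception {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:postgresql://localhost:5432/app"); // placeholder URL
        config.setUsername("app");
        config.setPassword("secret");
        config.setMaximumPoolSize(20); // cap on physical connections shared by all clients

        try (HikariDataSource pool = new HikariDataSource(config)) {
            // Each client request borrows a connection and returns it by closing it.
            try (Connection con = pool.getConnection();
                 PreparedStatement ps = con.prepareStatement("SELECT 1");
                 ResultSet rs = ps.executeQuery()) {
                rs.next();
            }
        }
    }
}
```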
I would suggest a third option: database connection pooling. This way you create a specified number of connections and hand out the first free connection as soon as one becomes available. This gives you the best of both worlds: there will almost always be a free connection available quickly, and you keep the number of connections to the database at a reasonable level. There are plenty of out-of-the-box Java connection pooling solutions, so have a look online.
Just use connection pooling and go with option 2. There are quite a few - C3P0, BoneCP, DBCP. I prefer BoneCP.
Neither option is a good solution.
Problem with Option 1:
You have already stated the problems with synchronization when there are multiple threads. Apart from that, there are many other problems, such as transaction management (when are you going to commit your connection?) and security (all clients can see not-yet-committed values), just to name a few.
Problem with Option 2:
Two of the biggest problems with this are:
Creating a new connection every single time takes a lot of time, so performance will become an issue.
Database connections are extremely expensive resources that should be kept to a limited number. If you start creating DB connections for every client, you will soon run out of them, even though most of the connections are not actively used, and you will also see your application's performance drop.
The Connection Pooling Option
That is why almost all client-server applications go with the connection pooling solution. You have a set of connections in the pool which are obtained and released as needed. Almost all Java frameworks have sophisticated connection pooling solutions.
If you are not using any JDBC framework (most use Spring JDBC/Hibernate), read the following article:
http://docs.oracle.com/javase/jndi/tutorial/ldap/connect/pool.html
If you are using any of the popular Java Frameworks like Spring, I would suggest you use Connection Pooling provided by the framework.

Does H2 spawn a new thread for each remote connection? Then, is there a limit?

I am developing an online mobile game. I have several server machines running numerous instances of a Java socket server application.
Player data has to be stored somewhere (their profiles, items etc). I want to use the H2 database for this purpose.
Now, here's the tricky part: I want all the player data to be stored in the same H2 database. That is, all my server applications will access the data by remotely connecting to one particular machine over TCP, out of convenience.
The thing is, we are expecting a very large amount of clients on launch. For each client, a connection to the H2 database is created. The obvious concern here is whether one single H2 database process can handle so many connections concurrently.
From the website:
There is no limit on the number of databases open concurrently per server, or on the number of open connections.
Given the above fact, in theory, if our server machine has enough resources (memory, space, CPUs, etc), then yes, the H2 database should be able to handle as many concurrent connections as our resources allow.
But there is something unclear to me:
Does the H2 process create a thread for each remote connection? I ask this because I once read that in Windows (our VPS's OS), a thread is stored as a short type, and hence the maximum number of threads an application can spawn is roughly 32,000 (I don't know the math they used to get that number). In that case, the H2 process does have a limit on concurrent connections - which is troubling because I do indeed expect more than 32,000 connected clients.
Of course, it would seem wise to discard the idea of having one single H2 database for all my clients. But I'd like to know if the above statement is correct: can H2 handle more than 32,000 remote database connections?
Let's take this in parts:
"Does the H2 process create a thread for each remote connection?"
An application should normally use one connection per thread. An H2 database synchronizes access to the same connection, but other databases may not do this.
"can H2 handle more than 32,000 remote database connections?"
If you want to access the same database at the same time from different processes or computers, you need to use the client/server mode. The JdbcConnectionPool class has a default maximum of 10 connections, but it provides a setter to change that if you want. In theory, you can set it to Integer.MAX_VALUE, but I don't think this is wise. Why? For starters, there is the synchronization point made in the previous section. Another point to consider: if your application opens and closes connections a lot (for example, for each request), you should consider using a connection pool. Opening a DB connection is very slow.
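For illustration, a minimal sketch using H2's built-in JdbcConnectionPool in client/server mode (the TCP URL, credentials, and pool size are placeholders):

```java
import org.h2.jdbcx.JdbcConnectionPool;

import java.sql.Connection;

public class H2PoolExample {
    public static void main(String[] args) throws Exception {
        // Client/server mode: connect over TCP to the machine hosting the database.
        JdbcConnectionPool pool = JdbcConnectionPool.create(
                "jdbc:h2:tcp://db-host/~/game", "sa", "sa"); // placeholder host and credentials
        pool.setMaxConnections(50); // default is 10

        try (Connection con = pool.getConnection()) {
            // read/write player data here
        }

        pool.dispose(); // closes all unused pooled connections
    }
}
```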
"Of course, it would seem wise to discard the idea of having one single H2 database for all my clients"
It might be, but you have to keep in mind that the number of open databases is limited by the available memory. If you are running on a powerful server, it might be a good option to consider. Then again, it might not.

Reliable/persistent outbound sockets needed, options?

I have a Scala application which maintains (or tries to) TCP connections to various servers for hours (possibly > 24) at a time. Each server sends a short, ~30 character message about twice a second. These messages are fed into an iteratee where they are parsed and eventually end up making state changes to a database.
If any of these connections fail for any reason, my app needs to continually try to reconnect until I specify otherwise. Any messages getting lost is Bad. I have no control over the servers I connect to, or the protocols used.
It is conceivable there could be as many as 300 of these connections at once. Not exactly a high-load scenario, so I don't think NIO is needed, though it might be nice to have? Other bits of the app are high-load.
I'm looking for some sort of socket controller / manager which can keep these connections up as reliably as possible. I am running my own blocking controller now, but as I'm inexperienced with socket coding (and all the various settings, options, timeouts, etc.) I doubt it will achieve the best possible uptime. Plus I may need SSL support at some point down the line.
Would NIO offer any real advantages?
Would Netty be the best choice here? I've seen the Uptime example here, and was thinking of simply duplicating it, but being new to lower-level networking I wasn't sure if there were better options.
However I'm uncertain of the best strategies for ensuring as few packets are lost as possible, and assumed this would be a "solved" problem in one library or another.
Yup. JMS is an example.
I suppose a lot of it would come down to a timeout guessing strategy? Close and re-open a socket too early and you've lost whatever packets were en-route.
That is correct. That approach is not going to be reliable, especially if connections go up and down regularly.
A real solution involves having the other end keep track of what it has received, and letting the sender know when the connection is re-established. If that can't be done, you have no real way of controlling how much gets lost. (This is what the reliable messaging services do ...)
I have no control over the servers I connect to. So unless there's another way to adapt JMS to a generic TCP stream I don't think it will work.
Yup. And the same applies if you try to implement this by hand. The other end has to cooperate.
I guess you could construct something where you run (say) a JMS end point on each of the remote servers, and have the endpoint use UNIX domain sockets or loopback (i.e. 127.0.0.1) to talk to the server. But you still have potential for message loss.
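For reference, here is a rough Java sketch of the reconnect pattern used by the Netty Uptime example mentioned above (host, port, and retry delay are placeholders; the real example also adds a read-timeout handler):

```java
import io.netty.bootstrap.Bootstrap;
import io.netty.channel.*;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.nio.NioSocketChannel;

import java.util.concurrent.TimeUnit;

public class ReconnectingClient {
    private final String host = "example.com"; // placeholder server
    private final int port = 9000;             // placeholder port
    private final EventLoopGroup group = new NioEventLoopGroup();
    private final Bootstrap bootstrap = new Bootstrap()
            .group(group)
            .channel(NioSocketChannel.class)
            .handler(new ChannelInitializer<Channel>() {
                @Override
                protected void initChannel(Channel ch) {
                    ch.pipeline().addLast(new SimpleChannelInboundHandler<Object>() {
                        @Override
                        protected void channelRead0(ChannelHandlerContext ctx, Object msg) {
                            // parse the ~30 character message and feed it onwards
                        }

                        @Override
                        public void channelInactive(ChannelHandlerContext ctx) {
                            // Connection dropped: schedule a reconnect on the event loop.
                            ctx.channel().eventLoop().schedule(
                                    ReconnectingClient.this::connect, 5, TimeUnit.SECONDS);
                        }
                    });
                }
            });

    public void connect() {
        bootstrap.connect(host, port).addListener((ChannelFutureListener) f -> {
            if (!f.isSuccess()) {
                // Connect attempt failed: try again later.
                f.channel().eventLoop().schedule(this::connect, 5, TimeUnit.SECONDS);
            }
        });
    }
}
```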

Passing around DB connection objects between multiple map-reduce jobs

Fundamentally, this question is about: can the same DB connection be used across multiple processes (as different map-reduce jobs are in reality different, independent processes)?
I know this is a somewhat trivial question, but it would be great if somebody could answer it as well: what happens if the maximum number of connections to the DB (which is preconfigured on the server hosting the DB) has been exhausted and a new process tries to get a new connection? Does it wait for some time, and if so, is there a way to set a timeout for this wait period? I am talking about a Postgres DB in this particular case, and the language used for talking to the DB is Java.
To give you some context for the problem, I have multiple map-reduce jobs (about 40 reducers) running in parallel, each wanting to update a Postgres DB. How do I efficiently manage these DB reads/writes from these processes? Note: the DB is hosted on a separate machine, independent of where the map-reduce jobs are running.
Connection pooling is one option but it can be very inefficient at times especially for several reads/writes per second.
Can the same DB connection be used across multiple processes
No, not in any sane or reliable way. You could use a broker process, but then you'd be one step away from inventing a connection pool anyway.
What happens if the maximum number of connections to the DB (which is preconfigured on the server hosting the DB) has been exhausted and a new process tries to get a new connection?
The connection attempt fails with SQLSTATE 53300 too_many_connections. If it waited, the server could exhaust other limits and begin to have issues servicing existing clients.
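If your workers do connect directly, you can at least detect that condition and back off; a small sketch (the retry policy here is purely illustrative):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public final class ConnectWithBackoff {
    public static Connection connect(String url, String user, String pass)
            throws SQLException, InterruptedException {
        for (int attempt = 1; ; attempt++) {
            try {
                return DriverManager.getConnection(url, user, pass);
            } catch (SQLException e) {
                // 53300 = too_many_connections; anything else is a real error.
                if (!"53300".equals(e.getSQLState()) || attempt >= 5) {
                    throw e;
                }
                Thread.sleep(1000L * attempt); // simple linear backoff before retrying
            }
        }
    }
}
```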
For a problem like this you'd usually use tools like C3P0 or DBCP that do in-JVM pooling, but this won't work when you have multiple JVMs.
What you need to do is to use an external connection pool like PgBouncer or PgPool-II to maintain a set of lightweight connections from your workers. The pooler then has a smaller number of real server connections and shares those between the lightweight connections from clients.
Connection pooling is typically more efficient than not pooling, because it allows you to optimise the number of active PostgreSQL worker processes to the hardware and workload, providing admission control for work.
An alternative is to have a writer process with one or more threads (one connection per thread) that takes finished work from the reduce workers and writes to the DB, so the reduce workers can get on to their next unit of work. You'd need to have a way to tell the reduce workers to wait if the writer got too far behind. There are several Java queueing system implementations that would be suitable for this, or you could use JMS.
See IPC Suggestion for lots of small data
It's also worth optimizing how you write to PostgreSQL as much as possible, using:
Prepared statements
A commit_delay
synchronous_commit = 'off' if you can afford to lose a few transactions if the server crashes
Batching work into bigger transactions
COPY or multi-valued INSERTs to insert blocks of data
Decent hardware with a useful disk subsystem, not some Amazon EC2 instance with awful I/O or a RAID 5 box with 5400rpm disks
A proper RAID controller with battery backed write-back cache to reduce the cost of fsync(). Most important if you can't do big batches of work or use a commit delay; has less impact if your fsync rate is lower because of batching and group commit.
See:
http://www.postgresql.org/docs/current/interactive/populate.html
http://www.depesz.com/index.php/2007/07/05/how-to-insert-data-to-database-as-fast-as-possible/
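Putting the single-writer idea and a few of the tips above together, here is a rough sketch (the table, queue bound, JDBC URL, and batch size are illustrative, not a recommendation):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class DbWriter implements Runnable {
    // Reduce workers offer finished rows here instead of talking to PostgreSQL directly.
    private final BlockingQueue<String[]> queue = new LinkedBlockingQueue<>(100_000);

    public void submit(String[] row) throws InterruptedException {
        queue.put(row); // blocks, making the workers wait if the writer falls too far behind
    }

    @Override
    public void run() {
        try (Connection con = DriverManager.getConnection(
                    "jdbc:postgresql://db-host/mydb", "mr", "secret");     // placeholder
             PreparedStatement ps = con.prepareStatement(
                    "INSERT INTO results (k, v) VALUES (?, ?)")) {         // hypothetical table
            con.setAutoCommit(false);
            List<String[]> batch = new ArrayList<>();
            while (!Thread.currentThread().isInterrupted()) {
                batch.clear();
                batch.add(queue.take());   // wait for at least one row
                queue.drainTo(batch, 999); // then grab up to a full batch
                for (String[] row : batch) {
                    ps.setString(1, row[0]);
                    ps.setString(2, row[1]);
                    ps.addBatch();
                }
                ps.executeBatch();
                con.commit();              // one transaction per batch, not per row
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```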

JMS Multiple threads processing

We have a web application that is generating some 3-5 parallel threads every five seconds to connect to a JMS/JNDI connection pool. We wait for the first batch of parallel threads to complete before creating the next batch of parallel threads. During this process we are generating a lot of network traffic, and connection threads are just hanging. Eventually we have to ask the operations team to manually kill the connection threads to free up connections.
The questions I wanted to ask are:
Obviously we are doing something wrong as we are holding up connection resources
When we wait for the parallel threads to respond before sending the second batch of requests, does this design not align with industry best practices?
Finally, what options and recommendations do you have for this scenario, i.e. multiple threads connecting to a JMS/JNDI connection?
Thanks for your inputs
You need to adjust your connection pool parameters. It sounds like you're using up only 3-5 connections for your service, which seems very reasonable to me. A JMS service should be able to handle thousands of connections. Either your pool's default limit is too low, or your JMS server is configured with too few allowed connections.
Are you sure that's what the other users are blocking on? It seems strange to me.
I'm almost sure that you would be all right with a single connection factory. Just make sure you clean up/close sessions properly. We use Spring's SingleConnectionFactory.
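A minimal sketch of that approach with Spring's SingleConnectionFactory (ActiveMQ is just an assumed broker here; the destination name is hypothetical, and CachingConnectionFactory, its subclass, additionally caches sessions):

```java
import org.apache.activemq.ActiveMQConnectionFactory;
import org.springframework.jms.connection.SingleConnectionFactory;
import org.springframework.jms.core.JmsTemplate;

public class JmsSetup {
    public static JmsTemplate jmsTemplate() {
        // One underlying JMS connection shared by all threads; sessions are
        // created per operation and closed by JmsTemplate automatically.
        ActiveMQConnectionFactory target =
                new ActiveMQConnectionFactory("tcp://broker-host:61616"); // placeholder broker URL
        SingleConnectionFactory factory = new SingleConnectionFactory(target);

        JmsTemplate template = new JmsTemplate(factory);
        template.setDefaultDestinationName("requests"); // hypothetical queue name
        return template;
    }
}
```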
