Managing multiple outgoing TCP connections - java

My program needs to send data to multiple (about 50) "client" stations. Important bits of data must be sent over TCP to ensure arrival. The connections are mostly fixed and are not expected to change during a single period of activity of the program.
What do you think would be the best architecture for this? I've heard that creating a new thread per connection is generally not recommended, but is this recommendation valid when connections are not expected to change? Scalability would be nice to have but is not much of a concern as the number of client stations is not expected to grow.
The program is written in Java if it matters.
Thanks,
Alex

If scalability, throughput and memory usage are not a concern, then using 50 threads is an adequate solution. It has the advantage of being simple, and simplicity is a good thing.
If you want to be able to scale, or you are concerned about memory usage (N threads implies N thread stacks) then you need to consider an architecture using NIO selectors. However, the best architecture probably depends on things like:
the amount of work that needs to be performed for each client station,
whether the work is evenly spread (on average),
whether the work involves other I/O, access to shared data structures, etc and
how close the aggregate work is to saturating a single processor.

50 threads is fine, go for it. It hardly matters. Anything over 200 threads, start to worry..

I'd use thread pool anyway. Depending on your thread pool configuration it will create as many threads as you need but this solution is more scalable. It will be ok not only for 50 but also for 5000 clients.

Why don't you limit the amount of threads by using someting like a connection Pool?

Related

Thread per connection vs one thread for all connections in java

I have two different types of server and clients working at the moment and i am trying to decide which one would be better for an MMO server or at least a small MMO like server with at least 100 players at a time.
my first server is using a thread per connection model and sending objects over the socket using ObjectOutputStream.
my second server is using java nio to use only one thread for all the connections and using select to loop through them. this server is also using ObjectOutputStream to send data
for my question, what would be a better approach to an MMO server and if the single thread model is better how would sending an object over the socket channel be affected, would it not be read all the way and not get the full object?
for each object being sent over it just contains for example an int and 2 floats for sending position and player id.
I will relate this question to why MMO use UDP over TCP. The reason being that UDP promises fast delivery whereas TCP promises guaranteed delivery.
A similar analogy can be applied to a single-threaded vs a multi-threaded model.Regardless of which you choose, your overall CPU cycles remain the same i.e. the server can process only so much information per second.
Lets see what happens in each of these scenarios
1.Single-Threaded Model :
In this, your own implementation or the underlying library will end up creating a pipeline where the requests start queuing. If you are at min load, the queue will remain virtually empty and execution will be real-time, however a lot of CPU may be wasted. At max load, there will be a long queue-up and execution will have latency with increasing load, however delivery will be guaranteed and CPU utilization will be optimum. Typically a slow client will slow everybody else down.
Multi-Threaded Model :
In this, depending on how your own implementation or the underlying library implements mutli-threading, parallel execution of requests will start happening. The catch to MT is that it's easy to get fooled. For example, java.util.concurrent.ThreadPoolExecutor doesnt actually do any parallel processing unless you set the queue size to a low value. Once parallel processing starts happening, at min load, your execution will be superfast and CPU utilization will be optimum and game performance will be great. However, at max load your RAM usage will be high and CPU utilization will still be optimum. Typically you'll need to put thread interrupts to avoid a slow client hogging all the threads, which will mean glitchy performance for the slow client. Additionally as you start exhausting your thread pool and resources, threads will either get queued or just get dropped leading to glitchy performance.
In gaming, performance matters more over stability, hence there is no question that you should use MT wherever you can, however tuning your thread parameters to compliment your server resources will decide whether its a boon or a complete bane

Akka actor message needs memory pool

I a new in java. I'm c++ programmer and nowadays study java for 2 months.
Sorry for my pool English.
I have a question that if it needs memory pool or object pool for Akka actor model. I think if i send some message from one actor to one of the other actors, i have to allocate some heap memory(just like new Some String, or new Some BigInteger and other more..) And times on, the garbage collector will be got started(I'm not sure if it would be started) and it makes my application calculate slowly.
So I search for the way to make the memory-pool and failed(Java not supported memory pool). And I Could Make the object pool but in others project i did not find anybody use the object-pool with actor(also in Akka Homepage).
Is there any documents bout this topic in the akka hompage? Plz tell me the link or tell me the solution of my question.
Thanks.
If, as it's likely you will, you are using Akka across multiple computers, messages are serialized on the wire and sent to the other instance. This means that simply a local memory pool won't suffice.
While it's technically possible that you write a custom JSerializer (see the doc here) implementation that stores local messages in a memory pool after deserializing them, I feel that's a bit of an overkill for most applications (and easy to cock-up and actually worsen performance with lookup times in the map)
Yes, when the GC kicks in, the app will lag a bit under heavy loads. But in 95% of the scenarios, especially under a performant framework like Akka, GC will not be your bottleneck: IO will.
I'm not saying you shouldn't do it. I'm saying that before you take on the task, given its non-triviality, you should measure the impact of GC on your app at runtime with things like Kamon or other Akka-specialized monitoring solutions, and only after you are sure it's worth it you can go for it.
Using an ArrayBlockingQueue to hold a pool of your objects should help,
Here is the example code.
TO create a pool and insert an instance of pooled object in it.
BlockingQueue<YOURCLASS> queue = new ArrayBlockingQueue<YOURCLASS>(256);//Adjust 256 to your desired count. ArrayBlockingQueues size cannot be adjusted once it is initialized.
queue.put(YOUROBJ); //This should be in your code that instanciates the pool
and later where you need it (in your actor that receives message)
YOURCLASS instanceName = queue.take();
You might have to write some code around this to create and manage the pool.
But this is the gist of it.
One can do object pooling to minimise long tail of latency (by sacrifice of median in multythreaded environment). consider using appropriate queues e.g. from JCTools, Distruptor, or Agrona. Don't forget about rules of engagement for state exhange via mutable state using multiple thereads in stored objects - https://youtu.be/nhYIEqt-jvY (the best content I was able to find).
Again, don't expect to improve throughout using such slightly dangerous techniques. You will loose L1-L3 cache efficiency and will polite PCI with barriers.
Bit of tangent (to get sense of low latency technology):
One may consider some GC implementation with lower latency if you want to stick with Akka, or use custom reactive model where object pool is used by single thread, or memory copied over e.g. Distrupptor's approach.
Another alternative is using memory regions (the way Erlang VM works). It creates garbage, but in form easy to handle by GC!
If you go to very low latency IO and are the biggest enemy of latency - forget legacy TCP (vs RDMA over Infininiband), switches (over swichless), OS accessing disk via OS calls and file system (use RDMA), forget interrupts shared by same core, not pinned cores (and without spinning for input) to real CPU core (vs virtual/hyperthreads) or inter NUMA communication or messages one by one instead of hardware multicast (or better optical switch) for multiple consumers and don't forget turning Epsilon GC for JVM ;)

Why increasing number of threads is useless?

I've been looking for the way to increase the speed of processing messages received from rabbitmq queue. The only way I've found is make more than one threads doing the same - receiving and processing. And this gave me some profit. After I created 4 threads the speed quadrupled. As I have 8-core processor I've decided to increase the number of threads to 8. But this gives no performance increasing. YourKit shows that only 50% of CPU is used. Somebody can say that my app is lightweight and so it is so, but I can say that it can't do more work than it does regardless I produce much more what to do. Why this doesn't work?
There are many different issues that can constrain the maximum speed of some application on a given system. For example, it can be limited by memory bandwidth, by Amdahl's Law effects (time needed for non-parallel code, including synchronized blocks), I/O bandwidth, and cache space.
If you want further improvement you need to do some measurements and profiling to find where the time is going, and then work on that.
The short (and not particularly helpful) answer is "overheads and bottlenecks".
For instance:
Creating threads in Java is relatively expensive. If the amount of work done by a thread isn't large, the overhead of creating the thread can out-weigh the benefits.
Context switching between threads is relatively expensive, especially when you take account of memory-related overheads such as cache misses, TLB misses. (These overheads actually hit when a native thread is assigned to a core. If the OS can somehow keep a native thread on a single core continuously (i.e. with no other threads on the same core), then it can use spinlocking ... and avoid the context switch. But the more Java threads you have, the less likely it is that the OS can do this.)
The threads may be spending a large proportion of their time waiting for I/O to complete. The I/O system's throughput or the speed / latency of some external service can be a bottleneck.
You may have contention over data structures; e.g. threads requiring exclusive access to safely read or update (say) a shared Map. If threads regularly need to wait for others to release locks, then you have a bottleneck.
Your computation may be dominated by the costs of "feeding" the threads. For example, if there is a single master thread that hands out "work" to worker threads, then the master thread's activities could be the bottleneck; i.e. it may not be able to provide enough work to keep the workers busy.
Since your tags imply that you are using a message queue, it is possible that that is the bottleneck, especially if the messages are big or the "work" done on each one is relatively small.
(Using a separate separate message queue service is liable to increase context switches, add I/O latency, add protocol overheads and so on. It's not an automatic route to performance improvement for small-scale systems.)
It is also possible that you have "hyperthreaded" cores not real cores, or that the operating system is stopping your JVM from using all cores.
If CPU or waiting for IO is your bottle neck, adding independent threads can make a big difference.
If you have a shared resource is a bottleneck, e.g. your L3 cache, your network adapter, your kernel, adding threads won't help because CPU is not the problem. In fact it can often make it worse by adding overhead.
my app is lightweight
In which case CPU is unlikely to be your issue and you are doing well to see a speed up with more than 1 CPU. Most likely you are speeding up CPU used by RabbitMQ. Ideally it should be more efficient and this shouldn't really help much. IMHO, more efficient messaging solutions don't gain much by multiple CPUs as they will not be bottlenecked on CPU.
One way or another, you're only using 4 cores. There's a lot that can stop stop you from doubling your performance by doubling your threads, but from your 4 thread success you've gotten past all that. I'm guessing there's a bug in your code to set off 8 threads and it's only firing up 4. (Even with hyperthreading, you're going to get some improvement. Even with every possible problem, you're going to get some improvement.) Otherwise, I'll go with T.J.Crowder and Stephen C: I don't think you really have 8 cores.
I'd try using different numbers of threads: 3, 5, 6. See what changes. I think you'll stumble on the problem soon enough.
To be fair to Java: if you write thread-safe code and avoid bottlenecks, it handles threads really well, as you've noticed going from one thread to 4. I've always found the overhead costs to be trivial.
Your application does not have linear speedup, and therefore, it does not have good scalability.
In order to keep increase the number of threads you need to ensure the data being handle is growing accordingly. For a fixed amount of data, increasing number of threads (and/or cores) will have a diminishing return at some point since the overhead of creating threads will outweight the thread's compute time.
Make sure to look up the following link:
Ahmdahl's law
Gustafson's law
Gustafson's law is a great counterpoint to Ahmdal's law so I highly recommend understanding that article.

Java threading synchronization for I/O bound and CPU-heavy operations

The task is - need to process multiple I/O streams (HTTP downloads) with some CPU-heavy operation. Ideally would like to have full bandwidth and CPU 100% used. Of course - heavy CPU processing is slower then internet download. Unprocessed data could be cached to disk. Are there any existing Executors in ASF or other components providing this functionality? If not - what's the best way to achieve this? Thinking of having 2 thread pools one for Internet-To-Disk and other for Disk-To-CPU-To-Disk operations.
EDITED:
I'll clarify my question:
2 thread pools: Internet-To-Disk and Disk-To-CPU-To-Disk is producer/consumer approach itself. The question was HOW to make sure I've selected right number of threads for producers and consumers? Same code will work simultenously on different boxes, arches with different number of cores and different bandwidth. How to make sure I've chosen right number of threads so 100% bandwidth and 100% CPU are consumed?
Assuming that CPU processing is going to be the main bottleneck of your system, the number of threads for CPU processing should be, at the least, set to the number of CPUs or cores available.
I/O part is probably not going to use much CPU at all, but you may want to allocate a fixed pool of few threads (equal to, or less than, the number of cores) to prevent excess thread context switching for simultaneous I/O streams.
You may also set the number of threads for CPU processing to a number slightly bigger than the number of cores, if your CPU processing threads do not always use 100% of CPU from start to finish. For example, if they may do some I/O or access some shared resource in the middle of processing.
But as with any system, the ideal number of threads will greatly depend on the nature of your program. You can use tools like JVisual VM (bundled with JDK) to analyse how threads are utilised in your program, and try different thread setting variations.
You can use producer-consumer for this purpose. Use as many producers and consumers as its needed to fulfill the needs.
If your CPU stage is more intensive than the download time, why not just download the data as you are able to process it. That way you can have multiple Internet-To-CPU-To-Disk processes. By skipping a stage it may be faster, and it will certainly be simpler.
I'd go for a producer-consumer architecture : one thread pool to process the data (managed by an ExecutorService), and one or more threads to download the data from the internet.
The data to be processed would be put into a bounded blocking queue (ex: LinkedBlockingQueue), so that the downloading threads would only fetch data when required (that is, when a computing thread is able to process new data). Plus, this structure guaranteed thread safety and memory publication.

Connection Pool Strategy: Good, Bad or Ugly?

I'm in charge of developing and maintaining a group of Web Applications that are centered around similar data. The architecture I decided on at the time was that each application would have their own database and web-root application. Each application maintains a connection pool to its own database and a central database for shared data (logins, etc.)
A co-worker has been positing that this strategy will not scale because having so many different connection pools will not be scalable and that we should refactor the database so that all of the different applications use a single central database and that any modifications that may be unique to a system will need to be reflected from that one database and then use a single pool powered by Tomcat. He has posited that there is a lot of "meta data" that goes back and forth across the network to maintain a connection pool.
My understanding is that with proper tuning to use only as many connections as necessary across the different pools (low volume apps getting less connections, high volume apps getting more, etc.) that the number of pools doesn't matter compared to the number of connections or more formally that the difference in overhead required to maintain 3 pools of 10 connections is negligible compared to 1 pool of 30 connections.
The reasoning behind initially breaking the systems into a one-app-one-database design was that there are likely going to be differences between the apps and that each system could make modifications on the schema as needed. Similarly, it eliminated the possibility of system data bleeding through to other apps.
Unfortunately there is not strong leadership in the company to make a hard decision. Although my co-worker is backing up his worries only with vagueness, I want to make sure I understand the ramifications of multiple small databases/connections versus one large database/connection pool.
Your original design is based on sound principles. If it helps your case, this strategy is known as horizontal partitioning or sharding. It provides:
1) Greater scalability - because each shard can live on separate hardware if need be.
2) Greater availability - because the failure of a single shard doesn't impact the other shards
3) Greater performance - because the tables being searched have fewer rows and therefore smaller indexes which yields faster searches.
Your colleague's suggestion moves you to a single point of failure setup.
As for your question about 3 connection pools of size 10 vs 1 connection pool of size 30, the best way to settle that debate is with a benchmark. Configure your app each way, then do some stress testing with ab (Apache Benchmark) and see which way performs better. I suspect there won't be a significant difference but do the benchmark to prove it.
If you have a single database, and two connection pools, with 5 connections each, you have 10 connections to the database. If you have 5 connection pools with 2 connections each, you have 10 connections to the database. In the end, you have 10 connections to the database. The database has no idea that your pool exists, no awareness.
Any meta data exchanged between the pool and the DB is going to happen on each connection. When the connection is started, when the connection is torn down, etc. So, if you have 10 connections, this traffic will happen 10 times (at a minimum, assuming they all stay healthy for the life of the pool). This will happen whether you have 1 pool or 10 pools.
As for "1 DB per app", if you're not talking to an separate instance of the database for each DB, then it basically doesn't matter.
If you have a DB server hosting 5 databases, and you have connections to each database (say, 2 connection per), this will consume more overhead and memory than the same DB hosting a single database. But that overhead is marginal at best, and utterly insignificant on modern machines with GB sized data buffers. Beyond a certain point, all the database cares about is mapping and copying pages of data from disk to RAM and back again.
If you had a large redundant table in duplicated across of the DBs, then that could be potentially wasteful.
Finally, when I use the word "database", I mean the logical entity the server uses to coalesce tables. For example, Oracle really likes to have one "database" per server, broken up in to "schemas". Postgres has several DBs, each of which can have schemas. But in any case, all of the modern servers have logical boundaries of data that they can use. I'm just using the word "database" here.
So, as long as you're hitting a single instance of the DB server for all of your apps, the connection pools et al don't really matter in the big picture as the server will share all of the memory and resources across the clients as necessary.
Excellent question. I don't know which way is better, but have you considered designing the code in such a way that you can switch from one strategy to the other with the least amount of pain possible? Maybe some lightweight database proxy objects could be used to mask this design decision from higher-level code. Just in case.
Database- and overhead-wise, 1 pool with 30 connections and 3 pools with 10 connections are largely the same assuming the load is the same in both cases.
Application-wise, the difference between having all data go through a single point (e.g. service layer) vs having per-application access point may be quite drastic; both in terms of performance and ease of implementation / maintenance (consider having to use distributed cache, for example).
Well, excellent question, but it's not easy to discuss using a several data bases (A) approach or the big one (B):
It depends on the database itself. Oracle, e.g. behaves differently from Sybase ASE regarding the LOG (and therefore the LOCK) strategy. It might be better to use several different & small data base to keep lock contention rate low, if there is a lot of parallel writes and the DB is using a pessimistic lock strategy (Sybase).
If the table space of the small data bases aren't spread over several disks, it might better be using one big data base for using the (buffer/cache) memory only for one. I think this is rarely the case.
Using (A) is scales better for a different reason than performance. You're able moving a hot spot data base on a different (newer/faster) hardware when needed without touching the other data bases. In my former company this approach was always cheaper than variant (B) (no new licenses).
I personally prefer (A) for reason 3.
Design, architecture, plans and great ideas falls short when there is no common sense or a simple math behind the. Some more practice and/or experience helps ... Here is a simple math of why 10 pools with 5 connections is not the same as 1 pool with 50 connection:
each pool is configured with min & max open connections, fact is that it will usually use (99% of the time) 50% of the min number (2-3 in case of 5 min) if it is using more that that this pool is mis-configured since it is opening and closing connections all the time (expensive) ... so we 10 pools with 5 min connections each = 50 open connections... means 50 TCP connections; 50 JDBC connections on top of them ... (have you debug a JDBC connection? you will be surprise how much meta data flows both ways ...)
If we have 1 pool (serving the same infrastructure above) we can set the min to 30 simple because it will be able to balance the extras more efficiently ... this means 20 less JDBS connections. I don't know about you but for me this is a lot ...
The devil s in the detail - the 2-3 connections that you leave in each pool to make sure it doesn't open/close all the time ...
Don't even want to go in the overhead of 10 pool management ... (I do not want to maintain 10 pools every one ever so different that the other, do you?)
Now that you get me started on this if it was me I would "wrap" the DB (the data source) with a single app (service layer anyone?) that would provide diff services (REST/SOAP/WS/JSON - pick your poison) and my applications won't even know about JDBC, TCP etc. etc. oh, wait google has it - GAE ...

Categories