In our company we are building a high-demand system for sending SMS to different clients and providers through SMPP and also directly using modems.
The system handles different requests, and connects to a database to select messages and update their status (sent, received, error etc). We receive demands for sending SMS that are queued according to priorities, and released by different channels according to what is requested. Right now, it is necessary to generate threads to handle the different channels concurrently, but this makes the system run slow as the transactions can be numerous.
We are interested in developing a new system that should not have too many problems with concurrency and that would make the most of our server processors.
To our understanding, our problems could be solved by remaking the system with a different handling of threads for the requests.
Which architecture, framework or library would you recommend for handling this problem and providing the best performance?
We are currently considering Java 7 Fork/Join, IBIS (MPJ, GMI, Satin) and Akka (actors library), but we are not limited to these. It is also desirable that the system is not tied to one architecture, and that it can be scaled and migrated to a cloud service.
PS: The current system generates one thread per message to send and makes some use of thread pools, but not in an optimized way. Apart from improving that poor implementation, we would like something to improve the overall performance by taking advantage of all our resources (cores, processors).
"Right now, it is necessary to generate threads to handle the different channels concurrently, but this makes the system run slow as the transactions can be numerous."
The implication of this statement is that the threads are making the system slow, and not the transaction bandwidth. What is your evidence for this?
The only way threads could create problems is if there were so many of them that you were running into memory issues and the system was slow because of GC overhead. Each thread reserves a large contiguous stack space (the default depends on the JVM and platform, commonly 512 KB to 1 MB), so 2000 threads at 512 KB each (for example) will consume about 1 GB of memory.
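To make that arithmetic concrete, here is a minimal sketch. The class and method names are made up for illustration, and the 512 KB figure is just the example value from the paragraph above, not a measured default for your JVM.

```java
/** Hypothetical helper: estimates the memory reserved for thread stacks. */
class StackMemoryEstimate {

    /** Returns the space reserved for thread stacks, in bytes. */
    static long reservedStackBytes(int threadCount, long stackSizeBytes) {
        return threadCount * stackSizeBytes;
    }

    public static void main(String[] args) {
        long stackSize = 512L * 1024; // 512 KB per thread (assumed, see the text)
        long bytes = reservedStackBytes(2000, stackSize);
        System.out.println(bytes / (1024 * 1024) + " MB"); // prints "1000 MB"
    }
}
```

Plug in your own thread count and the stack size your JVM actually uses (the -Xss setting) to see whether stacks alone could explain the memory pressure.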
One way to verify that the threads are the problem is to watch the memory usage of your application using jconsole or a similar tool. If all of your memory pools are full and the GC button does little to nothing, then you are correct. Another thing to try is to use fixed-size thread pools instead of forking a thread for each request you get. If this improves your system performance but decreases your transaction throughput, then you are correct.
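If attaching jconsole to the production box is awkward, the same numbers can be read from inside the process through the standard java.lang.management beans. This is only a sketch; the class name and the output format are my own invention.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.lang.management.ThreadMXBean;

/** Hypothetical helper: reports live thread count and heap usage of this JVM. */
class ThreadAndHeapReport {

    /** Builds a one-line summary that can be logged periodically. */
    static String snapshot() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long usedMb = heap.getUsed() / (1024 * 1024);
        long maxMb = heap.getMax() / (1024 * 1024); // getMax() may be -1 if undefined
        return "threads=" + threads.getThreadCount()
                + " peak=" + threads.getPeakThreadCount()
                + " heapUsedMB=" + usedMb
                + " heapMaxMB=" + maxMb;
    }

    public static void main(String[] args) {
        System.out.println(snapshot());
    }
}
```

Logging this once a minute while the system is under load should show whether the thread count and heap usage climb together.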
Since the SMPP protocol seems to be TCP/IP, you don't want all of your threads to be sitting in wait loops. Writing your own SMPP protocol using NIO is possible if you know your NIO fu.
I'd also do some searches for java NIO SMPP libraries. A quick search took me to JSMPP. I have no experience with it however.
JSMPP is a java implementation (SMPP API) of SMPP protocol (currently support SMPP v3.4). It provides interfaces to communicate with Message Center or ESME (External Short Message Entity) and able to handle traffic 3000-5000 messages per second.
https://github.com/twitter/cloudhopper-smpp
Twitter's NIO SMPP library built on Netty. Currently used to support hundreds of operator binds sending/receiving billions of messages per month. Solves the problem of needing a thread per bind/message. There are examples of how to use it in cloudhopper-smpp/src/test/java/com/cloudhopper/smpp/demo/
Related
I am developing a system based on server-client communication and I am trying to determine the best way to handle multiple clients. Importantly, I really don't want to use any third-party libraries.
In many places on the Internet I saw this solved by creating a separate thread for each connection, but I don't think that is the best way when I assume there will be a huge number of connections (maybe I'm wrong). So, the solution that I'm thinking of is:
Creating a queue of events and handling them with workers, i.e. a defined pool of threads (where there is a constant number n of workers). This solution seems to be pretty slow, but I cannot imagine how big the difference will be when handling a huge number of clients.
I've also been thinking about load balancing by running multiple instances of the server (on different physical machines), but that is only a nice add-on to any solution, not the solution itself.
I am aware that Java is not really async-friendly, but maybe I lack some knowledge and there is a nice solution. I'll be grateful for any suggestions.
Additional info:
I assume a really big number of connections
Every connection will last for a long time (days, maybe weeks)
The program will need to send some data to a specified client quite frequently
Each client will send data to the server about once every 3 seconds
To avoid discussion (as SO is not a place for them):
One client - one thread
Many clients - constant number of threads and events pool
Any async-like solution, that I'm not aware of
Anything else?
I'd suggest starting off with the simple architecture of one thread per connection. Modern JVMs on sufficiently sized systems can support thousands of threads. You might be pleasantly surprised at how well even this simple scheme works. If you need 300k connections, though, I doubt that one thread per connection will work. (But I've been wrong before.) You might have to fiddle with the thread stack size and OS resource limits.
A queueing system will help decouple the connections from the threads handling the work, but it will add to the amount of work done per message received from each client. This will also add to latency (but I'm not sure how important that is). If you have 300k connections, you'll probably want to have a pool of threads reading from the connections, and you'll also want to have more than one queue through which the work flows. If you have 300k clients sending data once every 3 seconds, that's 100k ops/sec, which is a lot of data to shove through a single queue. That might turn into a bottleneck.
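As a rough illustration of the numbers and of the "more than one queue" idea, here is a sketch that spreads messages over several queues by connection id. The class, the String message type and the modulo routing are all assumptions made for the example, not a recommendation of a specific design.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical sketch: several work queues instead of a single shared one. */
class ShardedQueues {
    private final List<BlockingQueue<String>> queues = new ArrayList<BlockingQueue<String>>();

    ShardedQueues(int queueCount) {
        for (int i = 0; i < queueCount; i++) {
            queues.add(new LinkedBlockingQueue<String>());
        }
    }

    /** Messages per second when each client sends once every periodSeconds. */
    static long messagesPerSecond(long clients, long periodSeconds) {
        return clients / periodSeconds;
    }

    /** A connection always maps to the same queue, so its messages stay ordered. */
    int queueIndexFor(int connectionId) {
        return Math.floorMod(connectionId, queues.size());
    }

    void publish(int connectionId, String message) {
        queues.get(queueIndexFor(connectionId)).add(message);
    }

    int depth(int queueIndex) {
        return queues.get(queueIndex).size();
    }

    public static void main(String[] args) {
        System.out.println(messagesPerSecond(300000, 3)); // prints 100000
        ShardedQueues sharded = new ShardedQueues(3);
        for (int id = 0; id < 6; id++) {
            sharded.publish(id, "hello from " + id);
        }
        System.out.println(sharded.depth(0)); // prints 2 (connections 0 and 3)
    }
}
```

Each queue would then get its own consumer thread (or small group of threads), so no single queue has to carry the full 100k messages per second.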
Another approach probably worth investigating is to have a pool of worker threads, but instead of each worker reading data from a queue written by connection reader threads, have each worker handle a bunch of sockets directly. Use a NIO Selector to have each thread wait on multiple sockets. Say, have 100 threads each handling 3,000 sockets. Or perhaps have 1,000 threads each handling 300 sockets. This depends on the amount of data and the amount of work necessary to process each incoming message. You'll have to experiment. This will probably be considerably simpler than using asynchronous I/O.
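For reference, a minimal sketch of what one such worker might look like: a single thread that waits on a java.nio Selector and echoes back whatever each socket sends. The class name, the echo behaviour and the roundTrip demo method are invented for the example; a real server would run several of these loops, add framing, and handle partial writes with OP_WRITE instead of spinning.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedSelectorException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

/** Hypothetical sketch: one thread serving all of its sockets through a Selector. */
class SelectorEchoServer implements Runnable, AutoCloseable {
    private final Selector selector = Selector.open();
    private final ServerSocketChannel server = ServerSocketChannel.open();

    SelectorEchoServer(int port) throws IOException {
        server.bind(new InetSocketAddress("127.0.0.1", port)); // port 0 = any free port
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);
    }

    int port() throws IOException {
        return ((InetSocketAddress) server.getLocalAddress()).getPort();
    }

    @Override
    public void run() {
        ByteBuffer buffer = ByteBuffer.allocate(4096);
        try {
            while (selector.isOpen()) {
                selector.select(); // blocks until at least one channel is ready
                if (!selector.isOpen()) break;
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove();
                    if (!key.isValid()) continue;
                    if (key.isAcceptable()) {
                        SocketChannel client = server.accept();
                        if (client != null) {
                            client.configureBlocking(false);
                            client.register(selector, SelectionKey.OP_READ);
                        }
                    } else if (key.isReadable()) {
                        echo((SocketChannel) key.channel(), buffer);
                    }
                }
            }
        } catch (IOException | ClosedSelectorException e) {
            // selector closed or listening socket failed: the loop ends
        }
    }

    private static void echo(SocketChannel client, ByteBuffer buffer) {
        try {
            buffer.clear();
            if (client.read(buffer) < 0) { // -1 means the peer closed the connection
                client.close();
                return;
            }
            buffer.flip();
            while (buffer.hasRemaining()) {
                client.write(buffer);
            }
        } catch (IOException e) {
            try { client.close(); } catch (IOException ignored) { }
        }
    }

    @Override
    public void close() throws IOException {
        selector.close(); // wakes up the loop thread
        server.close();
    }

    /** Demo: starts a server, sends text over a plain socket, returns the echo. */
    static String roundTrip(String text) {
        try (SelectorEchoServer echoServer = new SelectorEchoServer(0)) {
            new Thread(echoServer, "selector-loop").start();
            try (Socket socket = new Socket("127.0.0.1", echoServer.port())) {
                byte[] request = text.getBytes(StandardCharsets.UTF_8);
                socket.getOutputStream().write(request);
                byte[] reply = new byte[request.length];
                new DataInputStream(socket.getInputStream()).readFully(reply);
                return new String(reply, StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("ping"));
    }
}
```

To get the "100 threads each handling 3,000 sockets" layout, you would create one Selector per worker thread and hand each accepted SocketChannel to one of them.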
I've heard Java 7 has true asynchronous IO under the NIO package. I don't know much about it other than that it's difficult to work with.
Basic IO in Java is blocking. This means using a fixed number of threads to support many clients is likely not possible with basic IO, as you could have all threads tied up in blocking calls reading from clients who aren't sending data.
I suggest you look into asynchronous IO with Grizzly/Netty, if you change your mind on third-party libraries.
If you haven't changed your mind, look into NIO yourself.
I made an application in Java that runs between two machines, where each one in turn makes some computation over some data and sends it to the other to do its part. I managed to do it using sockets, so both machines play server and client depending on which part of the code they are running. However, it demands a lot of synchronization so that each side already has the data it needs to compute. So far I have managed this with Thread.sleep(), but since I put a big margin on the sleep time, it results in a lot of idle time.
I was wondering if there is any alternative to this so that I have automatic synchronization.
There is a Java framework called Apache MINA which abstracts away the complexity and limits of plain sockets. You can find more details here: http://mina.apache.org/
I need some advice in building a Java server that handles multiple clients at the same time. The clients need to remain connected for fairly long periods of time. I'm currently using blocking IO and spawning a thread to read from each client that connects to the server, but this is obviously not scalable.
I've found a few options, including using Selector or Executor with fixed size thread pools. I am not too familiar with either one, so which would be the best solution here? Thanks!
It depends on your definition of scalable. The system you have described, with a single thread per connection, is scalable up to hundreds, maybe even a couple of thousand, concurrent connections, but it will hit a wall at some point.
Your question says that your clients connect and stay connected for an extended period of time. It would be possible to have a single IO thread handle the reading and writing, and have the processing of each request dispatched to another thread using an Executor.
There are frameworks/servers that are already written to handle this sort of event driven design. Have a look at:
Netty, recently used by Twitter in their query server
Jetty (not to be confused with Netty), capable of NIO and very scalable, but might be too HTTP-focused
MINA
Grizzly
It's worth noting that the world is full of failed startups & software products that had really scalable architecture. Scaling is a nice problem to have, better to have the problem than not to have it and no customers.
Using multiple threads is scalable. Apache, for example, does this, and some sites using it get many visitors. However, another approach would indeed be using a selector, though I have no experience using it.
In the end, this is like asking which religion is the best.
There are a lot of frameworks for this kind of job, for example:
Netty
Apache MINA
Independently of scalability, every server application has its limits. By using blocking IO, one of your limits will be the number of threads that the VM can spawn, because the approach you take is "one-thread-per-client". With NIO (of which Selector is one of the classes), the approach is "one-thread-per-request", which will run out of threads much later.
Horizontal scalability ( http://en.wikipedia.org/wiki/Scalability#Scale_horizontally_vs._vertically ) of your app will not depend on either of these choices.
Is non-blocking Java NIO still slower than the standard thread-per-connection blocking socket?
In addition, if you were to use threads per connection, would you just create new threads or would you use a very large thread pool?
I'm writing an MMORPG server in Java that should be able to scale to 10000 clients easily given powerful enough hardware, although the maximum number of clients is 24000 (which I believe is impossible to reach with the thread-per-connection model because of a 15000 thread limit in Java).
From a three year old article, I've heard that blocking IO with a thread per connection model was still 25% faster than NIO (namely, this document http://www.mailinator.com/tymaPaulMultithreaded.pdf), but can the same still be achieved on this day? Java has changed a lot since then, and I've heard that the results were questionable when comparing real life scenarios because the VM used was not Sun Java.
Also, because it is an MMORPG server with many concurrent users interacting with each other, will the use of synchronization and thread safety practices decrease performance to the point where a single-threaded NIO selector serving 10000 clients will be faster? (Not all the work necessarily has to be processed on the thread with the selector; it can be processed on worker threads, like how MINA/Netty works.)
Thanks!
NIO benefits should be taken with a grain of salt.
In an HTTP server, most connections are keep-alive connections, and they are idle most of the time. It would be a waste of resources to pre-allocate a thread for each.
For an MMORPG things are very different. I guess connections are constantly busy receiving instructions from users and sending the latest system state to users. A thread is needed most of the time for a connection.
If you use NIO, you'll have to constantly re-allocate a thread for a connection. It may be an inferior solution to the simple fixed-thread-per-connection solution.
The default thread stack size is pretty large (1/4 MB?), and it's the major reason why there can only be a limited number of threads. Try reducing it and see if your system can support more.
However, if your game is indeed very "busy", it's your CPU that you need to worry about the most. NIO or not, it's really hard to handle thousands of hyperactive gamers on one machine.
There are actually 3 solutions:
1. Multiple threads
2. One thread and NIO
3. Both solutions 1 and 2 at the same time
The best thing to do for performance is to have a small, limited number of threads and multiplex network events onto these threads with NIO as new messages come in over the network.
Using NIO with one thread is a bad idea for a few reasons:
If you have multiple CPUs or cores, you will be idling resources since you can only use one core at a time if you only have one thread.
If you have to block for some reason (maybe to do a disk access), your CPU is idle when you could be handling another connection while you're waiting for the disk.
One thread per connection is a bad idea because it doesn't scale. Let's say you have:
10 000 connections
2 CPUs with 2 cores each
only 100 threads will be blocked at any given time
Then you can work out that you only need 104 threads. Any more and you're wasting resources managing extra threads that you don't need. There is a lot of bookkeeping under the hood needed to manage 10 000 threads. This will slow you down.
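Spelled out as code, the estimate is simply one runnable thread per core plus the threads you expect to be blocked. The class and method below are hypothetical names for this back-of-the-envelope calculation, nothing more.

```java
/** Hypothetical helper for the thread-count estimate described above. */
class ThreadBudget {

    /** One runnable thread per core, plus the threads expected to be blocked. */
    static int requiredThreads(int cpus, int coresPerCpu, int blockedThreads) {
        return cpus * coresPerCpu + blockedThreads;
    }

    public static void main(String[] args) {
        // 2 CPUs x 2 cores + 100 blocked threads
        System.out.println(requiredThreads(2, 2, 100)); // prints 104

        // Same estimate using the core count reported by the running JVM.
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println(requiredThreads(1, cores, 100));
    }
}
```

The hard part in practice is the "100 blocked threads" input, which you can only get by measuring your own workload.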
This is why you combine the two solutions. Also, make sure your VM is using the fastest system calls. Every OS has its own unique system calls for high performance network IO. Make sure your VM is using the latest and greatest. I believe this is epoll() in Linux.
"In addition, if you were to use threads per connection, would you just create new threads or would you use a very large thread pool?"
It depends on how much time you want to spend optimizing. The quickest solution is to create resources like threads and strings when needed, then let the garbage collector claim them when you're done with them. You can get a performance boost by having a pool of resources. Instead of creating a new object, you ask the pool for one, and return it to the pool when you're done. This adds the complexity of concurrency control. This can be further optimized with advanced concurrency algorithms like non-blocking algorithms. New versions of the Java API have a few of these for you. You can spend the rest of your life doing these optimizations on just one program. What the best solution is for your specific application is probably a question that deserves its own post.
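As a small illustration of the pooling idea, here is a sketch of a buffer pool built on ConcurrentLinkedQueue, one of the non-blocking collections in java.util.concurrent. The BufferPool class and its method names are invented for the example, and the pool is deliberately unbounded to keep it short.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical sketch: a thread-safe pool of reusable byte buffers. */
class BufferPool {
    // Lock-free queue, so acquire/release never block each other.
    private final ConcurrentLinkedQueue<ByteBuffer> free = new ConcurrentLinkedQueue<ByteBuffer>();
    private final int bufferSize;

    BufferPool(int bufferSize) {
        this.bufferSize = bufferSize;
    }

    /** Reuses a pooled buffer if one is available, otherwise allocates a new one. */
    ByteBuffer acquire() {
        ByteBuffer buffer = free.poll();
        return buffer != null ? buffer : ByteBuffer.allocate(bufferSize);
    }

    /** Resets the buffer and makes it available to the next caller. */
    void release(ByteBuffer buffer) {
        buffer.clear();
        free.offer(buffer);
    }

    public static void main(String[] args) {
        BufferPool pool = new BufferPool(4096);
        ByteBuffer first = pool.acquire();
        pool.release(first);
        ByteBuffer second = pool.acquire();
        System.out.println(first == second); // prints true: the buffer was reused
    }
}
```

Whether this beats plain allocation depends on the object; measure before committing to it, since the JVM allocates small short-lived objects very cheaply.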
If you are willing to spend any amount of money on powerful enough hardware, why limit yourself to one server? Google doesn't use one server; they don't even use one datacenter of servers.
A common misconception is that NIO allows non-blocking IO and is therefore the only model worth benchmarking. If you benchmark blocking NIO you can get it 30% faster than old IO, i.e. if you use the same threading model and compare just the IO models.
For a sophisticated game, you are far more likely to run out of CPU before you hit 10K connections. Again it is simpler to have a solution which scales horizontally. Then you don't need to worry about how many connections you can get.
How many users can reasonably interact? 24? In that case you have 1000 independent groups interacting. You won't have this many cores in one server.
How much money per user are you intending to spend on server(s)? You can buy a 12-core server with 64 GB of memory for less than £5000. If you place 2500 users on this server you have spent £2 per user.
EDIT: I have a reference http://vanillajava.blogspot.com/2010/07/java-nio-is-faster-than-java-io-for.html which is mine. ;) I had this reviewed by someone who is a GURU of Java Networking and it broadly agreed with what he had found.
If you have busy connections, which means they constantly send you data and you send them back, you may use non-Blocking IO in conjunction with Akka.
Akka is an open-source toolkit and runtime simplifying the construction of concurrent and distributed applications on the JVM. Akka supports multiple programming models for concurrency, but it emphasizes actor-based concurrency, with inspiration drawn from Erlang. Language bindings exist for both Java and Scala.
Akka's logic is non-blocking, so it's well suited to asynchronous programming. Using Akka actors you may remove thread overhead.
But if your socket streams block more often, I suggest using blocking IO in conjunction with Quasar.
Quasar is an open-source library for simple, lightweight JVM concurrency, which implements true lightweight threads (AKA fibers) on the JVM. Quasar fibers behave just like plain Java threads, except they have virtually no memory and task-switching overhead, so that you can easily spawn hundreds of thousands of fibers – or even millions – in a single JVM. Quasar also provides channels for inter-fiber communications modeled after those offered by the Go language, complete with channel selectors. It also contains a full implementation of the actor model, closely modeled after Erlang.
Quasar's logic is blocking, so you may spawn, say, 24000 fibers waiting on different connections. One of the positive points about Quasar is that fibers can interact with plain threads very easily. Quasar also has integrations with popular libraries such as Apache HTTP client, JDBC and Jersey, so you can get the benefits of fibers in many parts of your project.
You may see a good comparison between these two frameworks here.
As most of you guys are saying that the server is bound to be locked up in CPU usage before 10k concurrent users are reached, I suppose it is better for me to use a threaded blocking (N)IO approach considering the fact that for this particular MMORPG, getting several packets per second for each player is not uncommon and might bog down a selector if one were to be used.
Peter raised an interesting point that blocking NIO is faster than the old libraries while irreputable mentioned that for a busy MMORPG server, it would be better to use threads because of how many instructions are received per player. I wouldn't count on too many players going idle on this game, so it shouldn't be a problem for me to have a bunch of non-running threads. I've come to realize that synchronization is still required even when using a framework based on NIO because they use several worker threads running at the same time to process packets received from clients. Context switching may prove to be expensive, but I'll give this solution a try. It's relatively easy to refactor my code so that I could use a NIO framework if I find there is a bottleneck.
I believe my question has been answered. I'll just wait a little bit more in order to receive even more insight from more people. Thank you for all your answers!
EDIT: I've finally chosen my course of action. I actually was indecisive and decided to use JBoss Netty and allow the user to switch between either oio or nio using the classes
org.jboss.netty.channel.socket.nio.NioServerSocketChannelFactory;
org.jboss.netty.channel.socket.oio.OioServerSocketChannelFactory;
Quite nice that Netty supports both!
You might get some inspiration from the former Sun sponsored project, now named Red Dwarf.
The old website at http://www.reddwarfserver.org/ is down.
Github to the rescue: https://github.com/reddwarf-nextgen/reddwarf
If you do client side network calls, most likely you just need plain socket io.
If you are creating server side technologies, then NIO would help you in separating the network io part from fulfillment/processing work.
Configure 1 or 2 IO threads for the network IO. Worker threads do the actual processing (anywhere from 1 to N of them, based on machine capabilities).
I can create multiple threads for supporting multi-client feature in socket programming; that's working fine. But if 10,000 clients want to be connected, my server cannot create so many threads.
How can I manage the threads so that I can listen to all these clients simultaneously?
Also, if in this case the server wants to send something to a particular client, then how is it possible?
You should investigate Java's NIO ("New I/O") library for non-blocking network programming. NIO was designed to solve precisely the server scalability problem you are facing!
Introductory article about NIO: Building Highly Scalable Servers with Java NIO
Excerpts from O'Reilly's Java NIO book
Highly scalable socket programming in Java requires the selectable channels provided in the "New I/O", or NIO packages. By using non-blocking IO, a single thread can service many sockets, tending only to those sockets that are ready.
One of the more scalable open-source NIO applications is the Grizzly component of the Glassfish application server. Jean-Francois Arcand has written a number of informative, in-depth blog posts about his work on the project, and covers many subtle pitfalls in writing this kind of software with NIO.
If the concept of non-blocking IO is new to you, using existing software like Grizzly, or at least using it as a starting point for your adaptation, might be very helpful.
The benefits of NIO are debatable. See Paul Tyma's blog entries here and here.
A thread-per-connection threading model (Blocking Socket I/O) will not scale too well. Here's an introduction to Java NIO which will allow you to use non-blocking socket calls in java:
http://today.java.net/cs/user/print/a/350
As the article states, there are plenty of frameworks available so you don't have to roll your own.
As previously mentioned, 10.000 clients is not easy. For Java, NIO (possibly augmented with a separate thread pool to handle each request without blocking the NIO thread) is the usual way to handle a large number of clients.
As mentioned, depending on implementation, threads might actually scale, but it depends a lot on how much interaction there is between client connections. Massive threads are more likely to work if there is little synchronization between the threads.
That said, NIO is notoriously difficult to get 100% right the first time you implement it.
I'd recommend either trying out, or at least looking at the source for the Naga NIO lib at naga.googlecode.com. The codebase for the lib is small compared to most other NIO frameworks. You should be able to quickly implement a test to see if you can get 10.000 clients up and running.
(The Naga source also happens to be free to modify or copy without attributing the original author)
This is not a simple question, but for a very in depth (sorry, not in java though) answer see this: http://www.kegel.com/c10k.html
EDIT
Even with nio, this is still a difficult problem. 10000 connections is a tremendous resource burden on the machine, even if you are using non-blocking sockets. This is why large web sites have server farms and load balancers.
Why don't you process only a certain number of requests at a time?
Let's say you want to process a maximum of 50 requests at a time (to avoid creating too many threads).
You create a thread pool of 50 threads.
You put all the requests in a queue (accept connections, keep sockets open), and each thread, when it is done, gets the next request and processes it.
This should scale more easily.
Also, if the need arises, it will be easier to do load balancing, since you could share your queues across multiple servers.
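A minimal sketch of that idea using the standard ExecutorService, which already combines a request queue with a fixed set of worker threads. The class name is made up, and the "request" here is just a counter increment standing in for real work.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: many queued requests, a fixed number of worker threads. */
class BoundedWorkers {

    /** Runs the given number of requests on at most `workers` threads; returns how many completed. */
    static int process(int requests, int workers) {
        // newFixedThreadPool keeps pending tasks in an internal FIFO queue.
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        AtomicInteger handled = new AtomicInteger();
        for (int i = 0; i < requests; i++) {
            pool.execute(() -> handled.incrementAndGet()); // stand-in for real request handling
        }
        pool.shutdown(); // stop accepting new work, finish what is queued
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return handled.get();
    }

    public static void main(String[] args) {
        System.out.println(process(10000, 50)); // prints 10000
    }
}
```

Note that this only limits how many requests are processed at once. If each task blocks reading from a mostly idle socket, 50 threads can still all end up waiting, which is where the selector-based approaches come in.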
Personally I would rather create a custom non-blocking I/O setup, for example using one thread to accept clients and one other thread to process them (checking if any input is available and writing data to the output if necessary).
You'll have to figure out why your application is failing at 10,000 threads.
Is there a hard limit to the number of threads in the JVM or the OS? If so, can it be lifted?
Are you running out of memory? Try configuring a smaller stack size per thread, and/or add more memory to the server.
Something else? Fix it.
Only once you have determined the source of the problem will you be able to fix it. In theory 10,000 threads should be OK but at that level of concurrency it requires some extra tuning of the JVM and operating system if you want it to work out.
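One way to experiment with a smaller stack per thread without changing the JVM-wide -Xss flag is the Thread constructor that takes a stackSize argument. The sketch below is only an illustration: the class name and the 256 KB value are arbitrary, and the JVM is allowed to treat the requested size as a hint (round it up, or ignore it on some platforms).

```java
import java.util.concurrent.CountDownLatch;

/** Hypothetical sketch: starting many threads with a reduced stack size. */
class SmallStackThreads {

    /** Starts `count` threads that park until released; returns how many actually ran. */
    static int startAll(int count, long stackSizeBytes) {
        CountDownLatch started = new CountDownLatch(count);
        CountDownLatch release = new CountDownLatch(1);
        Thread[] threads = new Thread[count];
        for (int i = 0; i < count; i++) {
            // Thread(ThreadGroup, Runnable, String, long stackSize): stackSize is a hint.
            threads[i] = new Thread(null, () -> {
                started.countDown();
                try {
                    release.await(); // simulate a connection thread that is mostly waiting
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "conn-" + i, stackSizeBytes);
            threads[i].start();
        }
        try {
            started.await();     // all threads are alive at the same time here
            release.countDown(); // let them finish
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return count - (int) started.getCount();
    }

    public static void main(String[] args) {
        System.out.println(startAll(500, 256 * 1024));
    }
}
```

Raising the count step by step until thread creation fails (typically with an OutOfMemoryError saying it is unable to create a native thread) is a quick way to find out whether the limit is memory, an OS setting, or something else.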
You can also consider NIO but I think it can work fine with threads as well.