How do I figure out and prove the optimal port/socket/thread ratio for my application?
At the moment I am considering something like this:
Each thread handles all the traffic for a single port, each client gets its own socket, and the sockets are split between the available ports, and thus between the threads. This solution is based on the assumptions that I should create approximately one thread per CPU core and that sockets are fairly cheap to open. Is this a good solution, and more importantly, how do I mathematically prove that this, or any other solution, is a good one?
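For concreteness, here is a rough sketch of the layout I have in mind (the port range and the per-port handler are just placeholders, and I'm aware that serving many sockets from one thread would in practice need non-blocking IO or something similar):

    // Rough sketch of the proposed layout: one thread per core, one port per
    // thread, all clients connecting to that port served by that thread.
    public class PortPerThreadLayout {
        public static void main(String[] args) {
            int basePort = 9000;                                       // placeholder port range
            int threads = Runtime.getRuntime().availableProcessors();  // ~1 thread per CPU core
            for (int i = 0; i < threads; i++) {
                final int port = basePort + i;
                new Thread(() -> servePort(port), "port-" + port).start();
            }
        }

        static void servePort(int port) {
            // accept sockets on this port and handle all of their traffic on
            // this thread (in practice this needs non-blocking IO or polling)
        }
    }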
I know I can write a sample program for every solution and test the results, but I would much prefer a mathematical proof over an empirical one, especially where the test is done on a machine that does not reflect the server hardware and configuration.
I don't have much experience with ports and sockets, and I am having a tough time finding information to answer my question. The best resources I could find so far are these Stack Overflow questions:
Forcing multiple threads to use multiple CPUs when they are available
When Should I Use Threads?
What is the difference between a port and a socket?
If I have simply overlooked something, or am misunderstanding the way ports, sockets and threads are or should be used, I will be quite content with a simple "rtfm: [link]" answer to point me in the right direction. However, if you are feeling magnanimous and provide me with a good explanation, I will be much obliged.
If you use non-blocking NIO, the optimal configuration is 1 thread per core. The reason for this is very simple:
With non-blocking IO, whenever there is work available, it will be executed at full speed by an available thread (since nothing blocks).
You can never exceed 100% CPU usage.
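To make that concrete, here is a minimal sketch of what one such non-blocking thread looks like: a selector loop that never blocks on any individual connection, so it is only ever limited by the CPU. The port number and the echo behaviour are purely illustrative.

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.nio.ByteBuffer;
    import java.nio.channels.SelectionKey;
    import java.nio.channels.Selector;
    import java.nio.channels.ServerSocketChannel;
    import java.nio.channels.SocketChannel;
    import java.util.Iterator;

    public class SingleThreadNioEcho {
        public static void main(String[] args) throws IOException {
            Selector selector = Selector.open();
            ServerSocketChannel server = ServerSocketChannel.open();
            server.bind(new InetSocketAddress(9090));    // illustrative port
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);

            ByteBuffer buf = ByteBuffer.allocate(4096);
            while (true) {
                selector.select();                       // wait until some channel is ready
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (key.isAcceptable()) {
                        SocketChannel client = server.accept();
                        client.configureBlocking(false);
                        client.register(selector, SelectionKey.OP_READ);
                    } else if (key.isReadable()) {
                        SocketChannel client = (SocketChannel) key.channel();
                        buf.clear();
                        int n = client.read(buf);
                        if (n == -1) {                   // peer closed the connection
                            key.cancel();
                            client.close();
                        } else if (n > 0) {
                            buf.flip();
                            client.write(buf);           // echo back (may be a partial write)
                        }
                    }
                }
            }
        }
    }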
Related
I want to build a service that basically pulls data from various APIs.
On a typical server, is there a thread limit that one should adhere to?
Does anyone have any experience building something similar? How many threads were considered ideal, and what kind of requests per second can one expect?
Is 100 threads too much? 200?
I realize this is something I'm going to have to test, but I'm looking for someone who has built something similar in nature and can share some experience.
It depends on your bottlenecks and your requirements. How fast do you need to complete the operations? Do the threads do IO? I know from your explanation that they make a lot of network requests.
So the threads are going to wait on the network. Why do you need many threads, then? Maybe async operations will be faster.
And in general, as Robert Harvey commented: "It's going to take us longer to answer your question than it is for you to test it and tweak the number. The number of threads depends on all sorts of variables which you haven't specified, so any answer is going to be a guess."
For your particular case an asynchronous style of programming may be a better fit. In this case you could achieve a large throughput of API calls using a small number of threads - perhaps even comparable to the number of available cores.
There are several available libraries to achieve this (Twitter is the king here).
Finagle - General purpose, supports multiple transport protocols
Scrooge - For thrift only
Async Http Client - Java-oriented async http client
And there are many others.
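None of those libraries is shown here, but just to illustrate the style: with an asynchronous client, a core-sized pool of threads can have many API calls in flight at once. This sketch assumes Java 11+'s built-in java.net.http client and uses placeholder endpoints.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.util.List;
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.stream.Collectors;

    public class AsyncApiCalls {
        public static void main(String[] args) {
            // A small, core-sized executor; the async client multiplexes many
            // in-flight requests over it instead of using one thread per call.
            ExecutorService executor =
                    Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
            HttpClient client = HttpClient.newBuilder().executor(executor).build();

            List<URI> endpoints = List.of(
                    URI.create("https://api.example.com/a"),    // placeholder endpoints
                    URI.create("https://api.example.com/b"));

            List<CompletableFuture<String>> responses = endpoints.stream()
                    .map(uri -> client.sendAsync(
                                    HttpRequest.newBuilder(uri).build(),
                                    HttpResponse.BodyHandlers.ofString())
                            .thenApply(HttpResponse::body))
                    .collect(Collectors.toList());

            // Collect results without ever dedicating a thread to a single request.
            responses.forEach(body -> System.out.println(body.join().length() + " bytes"));
            executor.shutdown();
        }
    }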
I am writing an Android application to simultaneously upgrade the firmware of four or five medical devices. Should I go with the traditional blocking I/O, thread-per-connection approach, or a non-blocking NIO approach?
The program already works fine for upgrading one device at a time.
Which would have greater overhead here: the Java NIO overhead or the context-switching overhead?
Any help would be greatly appreciated.
If the number of devices is four or five, and isn't going to grow by at least two orders of magnitude any time soon, I'd suggest you stick with blocking I/O and the thread-per-connection model. It is simpler, you are unlikely to see any performance improvement from NIO in this case, and a performance drop is actually quite possible. Thread switching works quite well even for many more tasks than 4 or 5, and the programming model is so much simpler.
If the communication code uses sockets, there may be an advantage in using NIO, though, because Selectors allow you to detect some kinds of broken connections better than IO Streams. Even for that case, however, I would recommend using NIO in a thread-per-client model, as it will probably save you lots of work on the code.
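To show how little code the simple approach needs, here is a minimal thread-per-device sketch with plain blocking IO; the host addresses, port, and the upgrade routine are placeholders for the existing single-device logic.

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.Socket;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class FirmwareUpgrader {
        public static void main(String[] args) {
            List<String> devices = List.of("10.0.0.11", "10.0.0.12", "10.0.0.13"); // placeholders
            ExecutorService pool = Executors.newFixedThreadPool(devices.size());   // one thread per device

            for (String host : devices) {
                pool.submit(() -> {
                    // Plain blocking IO: simple to write, and with 4-5 devices
                    // the extra threads cost essentially nothing.
                    try (Socket socket = new Socket(host, 5000)) {                 // placeholder port
                        upgrade(socket.getInputStream(), socket.getOutputStream());
                    } catch (IOException e) {
                        System.err.println("Upgrade failed for " + host + ": " + e);
                    }
                });
            }
            pool.shutdown();
        }

        // Placeholder for the existing single-device upgrade logic.
        static void upgrade(InputStream in, OutputStream out) throws IOException {
            // stream the firmware image and check the device's responses here
        }
    }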
java NIO overhead or context switching overhead
That's the wrong question. NIO doesn't have much in the way of overhead, but you have to implement the 'context switching' yourself in your application; you can't just pretend it has gone away with NIO, and your code that schedules between channels may not be as efficient as the operating system's thread scheduler. The real question is whether the saving in threads, and more specifically thread stacks, has any significance versus the greatly increased complexity of NIO code. Unless you are planning to service tens of thousands of connections, it generally doesn't.
It's worth noting that the NIO model using selectors originated prior to threads, i.e. when the choice was between more complex code and more processes. These days there is a significant school of thought that holds that you should use blocking I/O and threads in almost all circumstances. There is a study and a paper out there somewhere, by (I think) Peter Lawrey, but I don't have the citation to hand.
Is the non-blocking Java NIO still slower than your standard thread per connection asynchronous socket?
In addition, if you were to use threads per connection, would you just create new threads or would you use a very large thread pool?
I'm writing an MMORPG server in Java that should be able to scale to 10,000 clients easily given powerful enough hardware, although the maximum number of clients is 24,000 (which I believe is impossible to reach with the thread-per-connection model because of a 15,000-thread limit in Java).
From a three-year-old article, I've heard that blocking IO with a thread-per-connection model was still 25% faster than NIO (namely, this document: http://www.mailinator.com/tymaPaulMultithreaded.pdf), but can the same still be achieved today? Java has changed a lot since then, and I've heard that the results were questionable when compared to real-life scenarios because the VM used was not Sun Java.
Also, because it is an MMORPG server with many concurrent users interacting with each other, will the use of synchronization and thread-safety practices decrease performance to the point where a single-threaded NIO selector serving 10,000 clients will be faster? (All the work doesn't necessarily have to be processed on the thread with the selector; it can be processed on worker threads, the way MINA/Netty work.)
Thanks!
NIO benefits should be taken with a grain of salt.
In an HTTP server, most connections are keep-alive connections; they are idle most of the time. It would be a waste of resources to pre-allocate a thread for each.
For an MMORPG, things are very different. I guess connections are constantly busy receiving instructions from users and sending the latest system state to users. A thread is needed for a connection most of the time.
If you use NIO, you'll have to constantly re-allocate a thread for a connection. It may be an inferior solution to the simple fixed-thread-per-connection solution.
The default thread stack size is pretty large (1/4 MB?); it's the major reason why there can only be a limited number of threads. Try reducing it and see if your system can support more.
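For example, the JVM-wide default can be lowered with the -Xss flag (e.g. java -Xss256k), or a smaller stack can be requested per thread. A minimal sketch, where the size and the handler are illustrative and the per-thread value is only a hint that some VMs ignore:

    public class SmallStackThreads {
        public static void main(String[] args) {
            Runnable connectionHandler = () -> {
                // per-connection work would go here
            };
            // Request a 256 KB stack instead of the platform default.
            // The stackSize argument is only a hint; some VMs ignore it.
            // The JVM-wide equivalent is -Xss, e.g. "java -Xss256k MyServer".
            Thread t = new Thread(null, connectionHandler, "conn-1", 256 * 1024);
            t.start();
        }
    }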
However, if your game is indeed very "busy", it's your CPU that you need to worry about the most. NIO or not, it's really hard to handle thousands of hyperactive gamers on one machine.
There are actually 3 solutions:
1. Multiple threads
2. One thread and NIO
3. Both solutions 1 and 2 at the same time
The best thing to do for performance is to have a small, limited number of threads and multiplex network events onto these threads with NIO as new messages come in over the network.
Using NIO with one thread is a bad idea for a few reasons:
If you have multiple CPUs or cores, you will be idling resources since you can only use one core at a time if you only have one thread.
If you have to block for some reason (maybe to do a disk access), your CPU is idle when it could be handling another connection while you're waiting for the disk.
One thread per connection is a bad idea because it doesn't scale. Let's say you have:
10 000 connections
2 CPUs with 2 cores each
only 100 threads blocked at any given time
Then you can work out that you only need 104 threads (4 to keep the cores busy plus 100 to cover the blocked ones). Any more and you're wasting resources managing extra threads that you don't need. There is a lot of bookkeeping under the hood needed to manage 10 000 threads. This will slow you down.
This is why you combine the two solutions. Also, make sure your VM is using the fastest system calls. Every OS has its own unique system calls for high performance network IO. Make sure your VM is using the latest and greatest. I believe this is epoll() in Linux.
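A minimal sketch of that combination, under the assumptions above: one selector thread does the non-blocking network IO and hands each message to a small worker pool. The port, pool size, and handler logic are illustrative; a production version would also queue outgoing writes and register OP_WRITE instead of writing directly from a worker.

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.nio.ByteBuffer;
    import java.nio.channels.SelectionKey;
    import java.nio.channels.Selector;
    import java.nio.channels.ServerSocketChannel;
    import java.nio.channels.SocketChannel;
    import java.nio.charset.StandardCharsets;
    import java.util.Iterator;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class SelectorPlusWorkers {
        public static void main(String[] args) throws IOException {
            // Worker pool sized roughly "cores + expected blocked threads",
            // as in the 104-thread calculation above (numbers are illustrative).
            ExecutorService workers = Executors.newFixedThreadPool(
                    Runtime.getRuntime().availableProcessors() + 100);

            Selector selector = Selector.open();
            ServerSocketChannel server = ServerSocketChannel.open();
            server.bind(new InetSocketAddress(9090));           // illustrative port
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);

            ByteBuffer buf = ByteBuffer.allocate(4096);
            while (true) {
                selector.select();
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (key.isAcceptable()) {
                        SocketChannel client = server.accept();
                        client.configureBlocking(false);
                        client.register(selector, SelectionKey.OP_READ);
                    } else if (key.isReadable()) {
                        SocketChannel client = (SocketChannel) key.channel();
                        buf.clear();
                        int n = client.read(buf);
                        if (n == -1) { key.cancel(); client.close(); continue; }
                        buf.flip();
                        byte[] message = new byte[buf.remaining()];
                        buf.get(message);
                        // Hand the (possibly slow) processing to the pool so
                        // the selector thread goes straight back to select().
                        workers.submit(() -> handle(client, message));
                    }
                }
            }
        }

        static void handle(SocketChannel client, byte[] message) {
            // Application logic goes here; this reply is only illustrative.
            String reply = "got " + message.length + " bytes\n";
            try {
                client.write(ByteBuffer.wrap(reply.getBytes(StandardCharsets.UTF_8)));
            } catch (IOException e) {
                // ignored in this sketch
            }
        }
    }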
In addition, if you were to use threads per connection, would you just create new threads or would you use a very large thread pool?
It depends how much time you want to spend optimizing. The quickest solution is to create resources like threads and strings when needed, then let the garbage collector reclaim them when you're done with them. You can get a performance boost by having a pool of resources: instead of creating a new object, you ask the pool for one and return it to the pool when you're done. This adds the complexity of concurrency control, and it can be further optimized with advanced concurrency techniques such as non-blocking algorithms; newer versions of the Java API have a few of these for you. You can spend the rest of your life doing these optimizations on just one program. What the best solution is for your specific application is probably a question that deserves its own post.
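As a small illustration of that trade-off (the pool size and the work items are made up):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class PoolVsNewThread {
        public static void main(String[] args) {
            // Quickest solution: create a thread when needed and let it be
            // reclaimed once it finishes.
            new Thread(() -> System.out.println("handled one request")).start();

            // Pooled solution: threads are created once and reused, trading a
            // bit of concurrency-control complexity for less allocation churn.
            ExecutorService pool = Executors.newFixedThreadPool(200); // size is illustrative
            for (int i = 0; i < 1000; i++) {
                final int request = i;
                pool.submit(() -> System.out.println("handled request " + request));
            }
            pool.shutdown();
        }
    }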
If you are willing to spend any amount of money on powerful enough hardware, why limit yourself to one server? Google doesn't use one server; they don't even use one datacenter of servers.
A common misconception is that NIO allows non-blocking IO, therefore it's the only model worth benchmarking. If you benchmark blocking NIO, you can get it 30% faster than old IO - i.e. if you use the same threading model and compare just the IO models.
For a sophisticated game, you are far more likely to run out of CPU before you hit 10K connections. Again it is simpler to have a solution which scales horizontally. Then you don't need to worry about how many connections you can get.
How many users can reasonably interact? 24? In which case you have 1000 independent groups interacting. You won't have this many cores in one server.
How much money per user are you intending to spend on server(s)? You can buy a 12-core server with 64 GB of memory for less than £5000. If you place 2500 users on this server, you have spent £2 per user.
EDIT: I have a reference http://vanillajava.blogspot.com/2010/07/java-nio-is-faster-than-java-io-for.html which is mine. ;) I had this reviewed by someone who is a GURU of Java Networking and it broadly agreed with what he had found.
If you have busy connections, which means they constantly send you data and you send data back, you can use non-blocking IO in conjunction with Akka.
Akka is an open-source toolkit and runtime simplifying the construction of concurrent and distributed applications on the JVM. Akka supports multiple programming models for concurrency, but it emphasizes actor-based concurrency, with inspiration drawn from Erlang. Language bindings exist for both Java and Scala.
Akka's logic is non-blocking, so it's perfect for asynchronous programming. Using Akka actors you can avoid the overhead of a dedicated thread per connection.
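A minimal sketch of that actor style, assuming the classic Akka 2.x Java API; the actor, the message type, and the reply are made up for illustration:

    import akka.actor.AbstractActor;
    import akka.actor.ActorRef;
    import akka.actor.ActorSystem;
    import akka.actor.Props;

    public class ConnectionActor extends AbstractActor {
        @Override
        public Receive createReceive() {
            return receiveBuilder()
                    // one actor per connection; messages are processed one at a
                    // time without parking a dedicated thread per connection
                    .match(String.class, msg -> getSender().tell("ack: " + msg, getSelf()))
                    .build();
        }

        public static void main(String[] args) {
            ActorSystem system = ActorSystem.create("game");
            ActorRef connection = system.actorOf(Props.create(ConnectionActor.class), "conn-1");
            connection.tell("player moved", ActorRef.noSender());
        }
    }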
But if your socket streams block more often, I suggest using blocking IO in conjunction with Quasar.
Quasar is an open-source library for simple, lightweight JVM concurrency, which implements true lightweight threads (AKA fibers) on the JVM. Quasar fibers behave just like plain Java threads, except they have virtually no memory and task-switching overhead, so that you can easily spawn hundreds of thousands of fibers – or even millions – in a single JVM. Quasar also provides channels for inter-fiber communications modeled after those offered by the Go language, complete with channel selectors. It also contains a full implementation of the actor model, closely modeled after Erlang.
Quasar's logic is blocking, so you can spawn, say, 24,000 fibers waiting on different connections. One of the positive points about Quasar is that fibers can interact with plain threads very easily. Quasar also has integrations with popular libraries such as the Apache HTTP client, JDBC, and Jersey, so you can use the benefits of fibers in many aspects of your project.
You may see a good comparison between these two frameworks here.
As most of you are saying that the server is bound to be locked up in CPU usage before 10k concurrent users are reached, I suppose it is better for me to use a threaded blocking (N)IO approach, considering the fact that for this particular MMORPG, getting several packets per second from each player is not uncommon and might bog down a selector if one were used.
Peter raised an interesting point that blocking NIO is faster than the old libraries, while irreputable mentioned that for a busy MMORPG server it would be better to use threads because of how many instructions are received per player. I wouldn't count on too many players going idle on this game, so it shouldn't be a problem for me to have a bunch of non-running threads. I've come to realize that synchronization is still required even when using a framework based on NIO, because such frameworks use several worker threads running at the same time to process packets received from clients. Context switching may prove to be expensive, but I'll give this solution a try. It's relatively easy to refactor my code so that I could use an NIO framework if I find there is a bottleneck.
I believe my question has been answered. I'll just wait a little bit more in order to receive even more insight from more people. Thank you for all your answers!
EDIT: I've finally chosen my course of action. I was actually indecisive, so I decided to use JBoss Netty and allow the user to switch between either OIO or NIO using the classes
org.jboss.netty.channel.socket.nio.NioServerSocketChannelFactory;
org.jboss.netty.channel.socket.oio.OioServerSocketChannelFactory;
Quite nice that Netty supports both!
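For reference, a minimal sketch of how that switch might look with Netty 3.x; the port, the executors, and the empty pipeline are placeholders, and both factories take a boss and a worker thread pool:

    import java.net.InetSocketAddress;
    import java.util.concurrent.Executors;

    import org.jboss.netty.bootstrap.ServerBootstrap;
    import org.jboss.netty.channel.ChannelFactory;
    import org.jboss.netty.channel.ChannelPipeline;
    import org.jboss.netty.channel.ChannelPipelineFactory;
    import org.jboss.netty.channel.Channels;
    import org.jboss.netty.channel.socket.nio.NioServerSocketChannelFactory;
    import org.jboss.netty.channel.socket.oio.OioServerSocketChannelFactory;

    public class SwitchableServer {
        public static void main(String[] args) {
            boolean useNio = args.length > 0 && args[0].equals("nio");

            // Pick the transport at startup; both factories implement ChannelFactory.
            ChannelFactory factory = useNio
                    ? new NioServerSocketChannelFactory(
                          Executors.newCachedThreadPool(),   // boss threads (accept)
                          Executors.newCachedThreadPool())   // worker threads (IO)
                    : new OioServerSocketChannelFactory(
                          Executors.newCachedThreadPool(),
                          Executors.newCachedThreadPool());

            ServerBootstrap bootstrap = new ServerBootstrap(factory);
            bootstrap.setPipelineFactory(new ChannelPipelineFactory() {
                public ChannelPipeline getPipeline() {
                    return Channels.pipeline(/* add your handlers here */);
                }
            });
            bootstrap.bind(new InetSocketAddress(8080));       // placeholder port
        }
    }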
You might get some inspiration from the former Sun sponsored project, now named Red Dwarf.
The old website at http://www.reddwarfserver.org/ is down.
Github to the rescue: https://github.com/reddwarf-nextgen/reddwarf
If you do client-side network calls, most likely you just need plain socket IO.
If you are creating server-side technologies, then NIO helps you separate the network IO part from the fulfillment/processing work.
IO threads are configured as 1 or 2 for the network IO; worker threads do the actual processing (ranging from 1 to N, based on machine capabilities).
I remember 2 or 3 years ago reading a couple articles where people claimed that modern threading libraries were getting so good that thread-per-request servers would not only be easier to write than non-blocking servers but that they'd be faster, too. I believe this was even demonstrated in Java with a JVM that mapped Java threads to pthreads (i.e. the Java nio overhead was more than the context-switching overhead).
But now I see all the "cutting edge" servers use asynchronous libraries (Java nio, epoll, even node.js). Does this mean that async won?
Not in my opinion. If both models are well implemented (this is a BIG requirement) I think that the concept of NIO should prevail.
At the heart of a computer are cores. No matter what you do, you cannot parallelize your application more than you have cores. i.e. If you have a 4 core machine, you can ONLY do 4 things at a time (I'm glossing over some details here, but that suffices for this argument).
Expanding on that thought, if you ever have more threads than cores, you have waste. That waste takes two forms. First is the overhead of the extra threads themselves. Second is the time spent switching between threads. Both are probably minor, but they are there.
Ideally, you have a single thread per core, and each of those threads is running at 100% processing speed on its core. Task switching wouldn't occur in the ideal case. Of course there is the OS, but if you take a 16-core machine and leave 2-3 threads for the OS, then the remaining 13-14 go towards your app. Those threads can switch what they are doing within your app, like when they are blocked by IO requirements, but they don't have to pay that cost at the OS level. Write it right into your app.
An excellent example of this scaling is seen in SEDA http://www.eecs.harvard.edu/~mdw/proj/seda/ . It showed much better scaling under load than a standard thread-per-request model.
My personal experience is with Netty. I had a simple app. I implemented it well in both Tomcat and Netty. Then I load tested it with hundreds of concurrent requests (upwards of 800, I believe). Eventually Tomcat slowed to a crawl and exhibited extremely bursty/laggy behavior, whereas the Netty implementation simply increased response time but continued with incredible overall throughput.
Please note, this hinges on a solid implementation. NIO is still getting better with time. We are learning how to tune our server OSes to work better with it, as well as how to implement the JVMs to better leverage the OS functionality. I don't think a winner can be declared yet, but I believe NIO will be the eventual winner, and it's doing quite well already.
It is faster as long as there is enough memory.
When there are too many connections, most of which are idle, NIO can save threads and therefore save memory, and the system can handle a lot more users than the thread-per-connection model.
CPU is not a direct factor here. With NIO, you effectively need to implement a threading model yourself, which is unlikely to be faster than the JVM's threads.
In either choice, memory is the ultimate bottleneck. When load increases and memory used approaches the max, GC will be very busy, and the system often demonstrates the symptom of 100% CPU.
Some time ago I found a rather interesting presentation providing some insight on "why the old thread-per-client model is better". There are even measurements. However, I'm still thinking it through. In my opinion the best answer to this question is "it depends", because most (if not all) engineering decisions are trade-offs.
As that presentation said - there's speed and there's scalability.
One scenario where thread-per-request will almost certainly be faster than any async solution is when you have a relatively small number of clients (e.g. <100), but each client is very high volume - e.g. a realtime app where no more than 100 clients are sending/generating 500 messages a second each. The thread-per-request model will certainly be more efficient than any async event-loop solution. Async scales better when there are many requests/clients, because it doesn't waste cycles waiting on many client connections, but when you have a few high-volume clients with little waiting, it's less efficient.
From what I've seen, the authors of Node and Netty both recognize that these frameworks are meant primarily to address high-volume/many-connection scalability situations, rather than being faster for a smaller number of high-volume clients.
I am developing an application for benchmarking purposes, for which I need to create a large number of HTTP connections in a short time. I wrote a program in Java to test how many threads Java is able to create; it turns out that on my 2 GB single-core machine, with 1 GB of memory given to the JVM, the limit varies between 5000 and 6000, after which it hits an OutOfMemoryError because the heap limit is reached.
It has been suggested that Erlang can create many more concurrent processes, and I am willing to learn Erlang if it can solve the problem. Can Erlang create somewhere around 100,000 processes, each essentially an HTTP request waiting for a response, in a matter of a few seconds, without hitting any limit such as a memory error?
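Roughly, my thread test looks like this (a simplified sketch; the actual program opens HTTP connections rather than just sleeping):

    public class ThreadLimitTest {
        public static void main(String[] args) {
            int count = 0;
            try {
                while (true) {
                    Thread t = new Thread(() -> {
                        try {
                            Thread.sleep(Long.MAX_VALUE);   // park forever, like a request waiting for a response
                        } catch (InterruptedException ignored) {
                        }
                    });
                    t.setDaemon(true);
                    t.start();
                    count++;
                }
            } catch (OutOfMemoryError e) {
                System.out.println("Gave up after " + count + " threads: " + e);
            }
        }
    }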
According to Richard Jones' famous blog, you can handle 100k connections almost out of the box. You have to increase the process limit (see the +P parameter), and it needs a little bit of memory-management trickery, e.g. gc or hibernate. To achieve significantly more, you have to do more hacking with C.
It can handle pretty much anything you throw at it. Erlang processes are extremely lightweight.
See http://www.sics.se/~joe/apachevsyaws.html for a benchmark in concurrency between Yaws and Apache. It gives you a good idea of what's possible.
I thought an interesting data point here is that someone was able to get a million comet connections with mochiweb: http://www.metabrew.com/article/a-million-user-comet-application-with-mochiweb-part-1
As previously stated, "a lot" is a good answer. And given that your requirement is to write a tsunami client, you can/should distribute the code over several Erlang nodes.
Even better, you might want to check out Tsung. It is a distributed load testing application written in erlang.
And if you don't want to use it, I am pretty sure there's code in there that you'll want to read.