I am writing a server side application using Java.
The server holds a number of users of the system. For each user, I want to synchronize their disk space with a remote network storage. Because the synchronizations are independent, I am thinking of doing them concurrently.
My first idea was to create one thread for each user and let all the synchronization tasks fire at the same time.
But the system can have tens of thousands of users. That means creating tens of thousands of threads and starting them all at once. I am not sure whether the JVM can handle this.
Even if it can, will that be memory efficient? Each thread has its own stack, so this could be a big memory hit!
Please let me know your opinion.
Many thanks.
You could look at a fixed-size thread pool, which gives you a bounded pool of threads to execute your tasks. This would give the benefit of multithreading with a sensible limit.
Check out Executors.newFixedThreadPool()
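For example, a minimal sketch (the pool size, the user list, and the syncUser method are illustrative placeholders):

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SyncScheduler {

    // For I/O-bound sync work the pool can be larger than the core count,
    // but nowhere near one thread per user.
    private static final int POOL_SIZE = 32;

    public static void syncAll(List<String> userIds) {
        ExecutorService pool = Executors.newFixedThreadPool(POOL_SIZE);
        for (String userId : userIds) {
            pool.submit(() -> syncUser(userId));  // queued until a pool thread is free
        }
        pool.shutdown();                          // accept no new tasks; queued ones still run
    }

    private static void syncUser(String userId) {
        // placeholder for the actual disk <-> network storage synchronization
    }
}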
You should look into Non-blocking IO.
Here is an article about it, courtesy of Google:
http://www.developer.com/java/article.php/3837316/Non-Blocking-IO-Made-Possible-in-Java.htm
Personally I wouldn't have tens of thousands of users on a single machine. You won't be able to do much per user with this many users active. You should be able to afford more than one machine.
You can have this many threads in Java, but as you say it is not efficient. You can use an NIO library to manage multiple connections with each thread.
Libraries like
http://mina.apache.org/
http://www.jboss.org/netty
are suitable.
Also interesting: http://code.google.com/p/nfs-rpc/
Related
I know there are many open source server programs that leverage java.nio's non-blocking I/O, such as Mina. Many implementations use multiple selectors and multi-threading to handle selected events. It seems like a perfect design.
Is it? What is the bottleneck for an NIO-based server? It seems like there wouldn't be any?
Is there any need to control the number of connections? How would one do so?
With traditional blocking I/O, each connection must be handled by one or more dedicated threads. As the number of connections grows so does the number of required threads. This model works reasonably well with connection numbers into the hundreds or low thousands, but it doesn't scale well past that.
Multiplexing and non-blocking I/O invert the model, allowing one thread to service many different connections. It does so by selecting the active connections and only performing I/O when it's guaranteed the sockets are ready.
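For concreteness, here is a stripped-down sketch of such a selector loop (the port number and buffer handling are illustrative, and error handling is omitted):

import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

public class SelectorLoop {
    public static void main(String[] args) throws Exception {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(8080));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);

        ByteBuffer buffer = ByteBuffer.allocate(4096);
        while (true) {
            selector.select();                        // block until at least one channel is ready
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove();
                if (key.isAcceptable()) {
                    SocketChannel client = server.accept();
                    if (client != null) {
                        client.configureBlocking(false);
                        client.register(selector, SelectionKey.OP_READ);
                    }
                } else if (key.isReadable()) {
                    SocketChannel client = (SocketChannel) key.channel();
                    buffer.clear();
                    if (client.read(buffer) == -1) {  // peer closed the connection
                        key.cancel();
                        client.close();
                    }
                    // otherwise: hand the buffered bytes off for processing
                }
            }
        }
    }
}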
This model is much more scalable because you no longer have hordes of mostly-inactive threads sitting around twiddling their thumbs. Instead you have one or a few very active threads shuttling between all of the sockets. So what's the bottleneck here?
An NIO-based server is still limited by its CPU. As the number of sockets and the amount of I/O each does grows the CPU will be more and more busy.
The multiplexing threads need to service the sockets as quickly as possible. They can only work with one at a time. If the server isn't careful, there might be too much work going on in these threads. When that happens it can take some careful, perhaps difficult programming to move the work off-thread.
If the incoming data can't be processed immediately, it may be prudent to copy it off to a separate memory buffer so it's not sitting in the operating system's queue. That copying takes both time and additional memory.
Programs can't have an infinite number of file descriptors / kernel handles open. Every socket has associated read and write buffers in the OS, so you'll eventually run into the operating system's limits.
Obviously, you're still limited by your hardware and network infrastructure. A server is limited by its NIC, by the bandwidth and latency of other network hops, etc.
This is a very generic answer. To really answer this question you need to examine the particular server program in question. Every program is different. There may be bottlenecks in any particular program that aren't Java NIO's "fault", so to speak.
What is the bottleneck for an NIO-based server?
The network, memory, CPU, all the usual things.
It seems like there wouldn't be any?
Why?
Is there any need to control the number of connections?
Not really.
How would one do so?
Count them in and out, and deregister OP_ACCEPT while you're at the maximum.
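For example, a small helper along these lines could sit inside the selector loop (MAX_CONNECTIONS and the class itself are illustrative):

import java.io.IOException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

public class ConnectionLimiter {

    private static final int MAX_CONNECTIONS = 1000; // illustrative limit

    private final Selector selector;
    private final SelectionKey serverKey;   // key of the listening ServerSocketChannel
    private int openConnections;

    public ConnectionLimiter(Selector selector, SelectionKey serverKey) {
        this.selector = selector;
        this.serverKey = serverKey;
    }

    // Called from the accept branch of the selector loop: count the connection in.
    public void accepted(SocketChannel client) throws IOException {
        client.configureBlocking(false);
        client.register(selector, SelectionKey.OP_READ);
        openConnections++;
        if (openConnections >= MAX_CONNECTIONS) {
            // deregister OP_ACCEPT by clearing it from the server key's interest set
            serverKey.interestOps(serverKey.interestOps() & ~SelectionKey.OP_ACCEPT);
        }
    }

    // Called whenever a client connection closes: count it out and resume accepting.
    public void closed() {
        openConnections--;
        serverKey.interestOps(serverKey.interestOps() | SelectionKey.OP_ACCEPT);
    }
}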
I am providing a RESTful service that is served by a servlet (running inside Tomcat 7.0.x on Ubuntu Linux). I'm already getting about 20 thousand queries per hour and it will grow much higher. The servlet receives the request, prepares the response, inserts a record in a MySQL database table, and delivers the response. The log in the database is absolutely mandatory. Until recently, all of this happened synchronously: before the Tomcat thread delivered the response, it had to create the record in the database table. The problem is that this log used to take more than 90% of the total time, and even worse: when the database got slower, the service took about 10-15 seconds instead of just 20 milliseconds.
I recently made an improvement: each Tomcat thread creates an extra thread via "(new Thread(...)).start();" that takes care of the SQL insertion asynchronously, so the response gets to the clients faster. But these threads take too much RAM when MySQL runs slower and the threads multiply, and with a few thousand of them the Tomcat JVM runs out of memory.
What I need is to accept as many HTTP requests as possible, to log every one of them as fast as possible (not synchronously), and to do all of this quickly and with very low RAM usage when MySQL gets slow and inserts need to queue. I think I need some kind of queue to buffer the entries when the rate of HTTP requests is higher than the rate of insertions into the database log.
I'm thinking about these ideas:
1- Creating some kind of FIFO queue myself, maybe using one of those Apache Commons collections, and some kind of thread that polls the collection and creates the database records. But what collection should I use? And how should I program the polling thread so it won't monopolize the CPU? I think a "while (true) ..." polling loop would eat CPU cycles. And what about making it thread-safe? How would I do that? I think doing it myself is too much effort, and most likely I would reinvent the wheel.
2- Log4j? I have never used it directly, but it seems that this framework is also designed to create "appenders" that talk to the database. Would that be the way to do it?
3- Using some other framework that specializes in this?
What would you suggest?
Thanks in advance!
What comes to mind right away is a queue like you said. You can use things like ActiveMQ http://activemq.apache.org/ or RabbitMQ http://www.rabbitmq.com/.
The idea is to just fire and forget. There should be almost no overhead to send the messages.
Then you can connect some "offline" consumer to pick up the messages off the queues and write them to the database at the speed you need.
I feel like I plug this all day on Stack Overflow, but we use Mule (http://www.mulesoft.org/) at work to do this. One of the great things about Mule is that you can explicitly set the number of threads that read from the queue and the number of threads that write to the database. It gives you fine-grained control over throttling messages.
Definitely take a look at using a ThreadPoolExecutor. You can provide the thread pool size, and it will handle all the concurrency and queuing for you. Only possible issue is that if your JVM crashes for any reason, you'll lose any queued items in your pool.
http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/ThreadPoolExecutor.html
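A rough sketch of what that could look like for the log inserts (the pool size, queue capacity, and the insert method are placeholders):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class LogWriter {

    // 4 writer threads, at most 10,000 queued inserts. When the queue is full,
    // CallerRunsPolicy makes the submitting Tomcat thread do the insert itself,
    // which naturally throttles intake instead of exhausting memory.
    private static final ThreadPoolExecutor EXECUTOR = new ThreadPoolExecutor(
            4, 4, 60L, TimeUnit.SECONDS,
            new ArrayBlockingQueue<>(10_000),
            new ThreadPoolExecutor.CallerRunsPolicy());

    public static void logAsync(String requestInfo) {
        EXECUTOR.execute(() -> insertIntoDatabase(requestInfo));
    }

    private static void insertIntoDatabase(String requestInfo) {
        // placeholder for the actual JDBC INSERT
    }
}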
I would also definitely look into optimizing the MySQL database as much as possible. 20k entries per hour can get hairy pretty quickly. The better optimized your hardware, OS, and indexes, the quicker your inserts and the smaller your queue will be.
First of all: Thanks a lot for your valuable suggestions!
So far I have found a partial solution to my need, and I have already implemented it successfully:
http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/LinkedBlockingQueue.html
Now I'm also thinking about using a queue provider as a failover solution in case that queue gets full. So far I have thought about Amazon's queue service, but it costs money. I will also check the queue solutions that Ryan suggested.
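For reference, the core of the LinkedBlockingQueue approach looks roughly like this (the class, methods, and capacity are illustrative):

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class LogQueue {

    // Bounded, so it cannot grow without limit if MySQL slows down.
    private static final BlockingQueue<String> QUEUE = new LinkedBlockingQueue<>(50_000);

    // Called from the Tomcat request threads: a non-blocking offer, so the
    // response is never delayed. A false return means the queue is full and
    // a failover (another queue provider, a file, ...) could take over.
    public static boolean enqueue(String logEntry) {
        return QUEUE.offer(logEntry);
    }

    // A single consumer thread; take() blocks without spinning the CPU.
    public static void startConsumer() {
        Thread consumer = new Thread(() -> {
            while (true) {
                try {
                    insertIntoDatabase(QUEUE.take());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        });
        consumer.setDaemon(true);
        consumer.start();
    }

    private static void insertIntoDatabase(String entry) {
        // placeholder for the actual JDBC INSERT
    }
}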
I'm working on a multi-user Java webapp, where it is possible for clients to use the webapp API to do potentially naughty things, by passing code which will execute on our server in a sandbox.
For example, it is possible for a client to write a tight while(true) loop that impacts the performance of other clients.
Can you guys think of ways to limit the damage caused by these sorts of behaviors to other clients' performance?
We are using Glassfish for our application server.
The halting problem shows that there is no way a computer can reliably identify code that will not terminate.
The only way to do this reliably is to execute the code in a separate JVM which you then ask the operating system to shut down when it times out. A JVM that doesn't time out can process more tasks, so you can just reuse it.
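A rough sketch of that idea using ProcessBuilder (the heap limit, timeout, sandboxed main class, and classpath handling are all placeholders):

import java.util.concurrent.TimeUnit;

public class SandboxRunner {

    // Runs the client code in its own JVM and kills the whole process
    // if it does not finish within the time limit.
    public static int runWithTimeout(String mainClass, long timeoutSeconds) throws Exception {
        Process process = new ProcessBuilder("java", "-Xmx64m", mainClass)
                .inheritIO()
                .start();
        if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            process.destroyForcibly();   // OS-level kill; no cooperation from the code needed
            return -1;
        }
        return process.exitValue();
    }
}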
One more idea would be byte-code instrumentation. Before you load the code sent by your client, manipulate it so it adds a short sleep in every loop and for every method call (or method entry).
This avoids clients clogging a whole CPU until they are done. Of course, they still block a Thread object (which takes some memory), and the slowdown applies to every client, not only the malicious ones. Maybe make the first few tries free, then scale the waiting time up with each try (and scale it back down if the thread has to wait for other reasons).
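Conceptually, the instrumented code would behave something like the sketch below; the actual transformation would be done at the bytecode level (for example with a library such as ASM), not in source:

public class ThrottledLoopExample {

    // What a client's tight loop effectively behaves like after instrumentation:
    // a short sleep at every loop back-edge, so one client cannot monopolize a core.
    static void instrumentedLoop() throws InterruptedException {
        while (true) {
            doWork();
            Thread.sleep(1);   // inserted by the bytecode transformer
        }
    }

    static void doWork() {
        // the client's original loop body
    }
}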
Modern app servers use thread pooling for better performance. The problem is that one bad apple can spoil the bunch. What you need is an app server with one thread, or maybe one process, per request. Of course there are going to be trade-offs, but the OS will handle making sure that processing time gets allocated evenly.
NOTE: After researching a little more, what you need is an engine that will create another process per request. If not, a user can cripple your servlet engine by writing servlets with infinite loops and then posting multiple requests, or he could simply do a System.exit() in his code and bring everybody down.
You could use a parent thread to launch each request in a separate thread as suggested already, but then monitor the CPU time used by the threads using the ThreadMXBean class. You could then have the parent thread kill any threads that are misbehaving. This is if, of course, you can establish some kind of reasonable criteria for how much CPU time a thread should or should not be using. Maybe the rule could be that a certain initial amount of time plus a certain additional amount per second of wall clock time is OK?
I would make these client request threads have lower priority than the thread responsible for monitoring them.
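The CPU-time check itself could look roughly like this (the 5-second budget is an arbitrary example):

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.TimeUnit;

public class CpuWatchdog {

    private static final ThreadMXBean THREADS = ManagementFactory.getThreadMXBean();
    private static final long MAX_CPU_NANOS = TimeUnit.SECONDS.toNanos(5);

    // Returns true if the given worker thread has used more CPU time than allowed.
    // getThreadCpuTime returns -1 if the thread is no longer alive or measurement is disabled.
    public static boolean overBudget(Thread worker) {
        long cpuNanos = THREADS.getThreadCpuTime(worker.getId());
        return cpuNanos > MAX_CPU_NANOS;
    }
}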
I have a program that starts up and creates an in-memory data model and then creates a (command-line-specified) number of threads to run several string checking algorithms against an input set and that data model. The work is divided amongst the threads by splitting up the input set of strings, and then each thread iterates over the same in-memory data model instance (which is never updated again, so there are no synchronization issues).
I'm running this on a Windows 2003 64-bit server with 2 quad-core processors, and from looking at Windows Task Manager they aren't being maxed out (nor do they look like they are being particularly taxed) when I run with 10 threads. Is this normal behaviour?
It appears that 7 threads all complete a similar amount of work in a similar amount of time, so would you recommend running with 7 threads instead?
Should I run it with more threads?...Although I assume this could be detrimental as the JVM will do more context switching between the threads.
Alternatively, should I run it with fewer threads?
Alternatively, what would be the best tool I could use to measure this?...Would a profiling tool help me out here - indeed, is one of the several profilers better at detecting bottlenecks (assuming I have one here) than the rest?
Note, the server is also running SQL Server 2005 (this may or may not be relevant), but nothing much is happening on that database when I am running my program.
Note also, the threads are only doing string matching, they aren't doing any I/O or database work or anything else they may need to wait on.
My guess would be that your app is bottlenecked on memory access, i.e. your CPU cores spend most of the time waiting for data to be read from main memory. I'm not sure how well profilers can diagnose this kind of problem (the profiler itself could influence the behaviour considerably). You could verify the guess by having your code repeat the operations it does many times on a very small data set.
If this guess is correct, the only thing you can do (other than getting a server with more memory bandwidth) is to try and increase the locality of your memory access to make better use of caches; but depending on the details of the application that may not be possible. Using more threads may in fact lead to worse performance because of cores sharing cache memory.
Without seeing the actual code, it's hard to give proper advice. But do make sure the threads aren't locking on shared resources, since that would naturally prevent them all from working as efficiently as possible. Also, when you say they aren't doing any I/O, are they not reading input or writing output either? That could also be a bottleneck.
With regard to CPU-intensive threads, it is normally not beneficial to run more threads than you have actual cores, but in an uncontrolled environment like this, with other big applications running at the same time, you are probably better off simply testing your way to the optimal number of threads.
I am working on a BitTorrent client. The easiest way for me to communicate with the peers is to spawn a new thread for each one of them. But if the user wants to keep connections with a large number of peers, that may cause me to spawn a lot of threads.
Another solution I thought of is to have one thread iterate through the peer objects and run each of them for a period.
I checked other libraries, mostly in Ruby (mine is in Java), and they spawn one thread for each new peer. Do you think spawning one thread per peer will degrade performance if the user sets the number of connections to a high number like 100 or 200?
It shouldn't be a problem unless you're running thousands of threads. I'd look into a compromise, using a threadpool. You can detect the number of CPUs at runtime and decide how many threads to spin up based on that, and then hand out work to the threadpool as it comes along.
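Something along these lines, for example (the Peer type, the talkTo method, and the sizing heuristic are placeholders):

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PeerWorkers {

    // Size the pool from the core count; peers mostly wait on the network,
    // so a multiple of the core count is a reasonable starting point.
    private static final int THREADS =
            Math.max(4, Runtime.getRuntime().availableProcessors() * 2);

    private static final ExecutorService POOL = Executors.newFixedThreadPool(THREADS);

    public static void handleAll(List<Peer> peers) {
        for (Peer peer : peers) {
            POOL.submit(() -> talkTo(peer));   // queued; runs when a worker frees up
        }
    }

    private static void talkTo(Peer peer) {
        // handshake, request pieces, etc. for this peer
    }

    interface Peer { }
}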
You can avoid the problem altogether by using Non-blocking IO (java.nio.*).
I'd recommend using an Executor to keep the number of threads pooled.
Executors.newFixedThreadPool(numberOfThreads);
With this, you can basically add "tasks" to the pool and they will complete as soon as threads become available. This way, you're not exhausting all of the enduser's computer's threads and still getting a lot done at the same time. If you set it to like 16, you'd be pretty safe, though you could always allow the user to change this number if they wanted to.
No.....
I once had this very same doubt and created a .NET app (4 years ago) with 400 threads...
Provided they don't do a lot of work, with a decent machine you should be fine...
A few hundred threads is not a problem for most workstation-class machines, and is simpler to code.
However, if you are interested in pursuing your idea, you can use the non-blocking IO features provided by Java's NIO packages. Jean-Francois Arcand's blog contains a lot of good tips learned from creating the Grizzly connector for Glassfish.
Well, in 32-bit Windows for example there is a practical maximum number of native threads you can create: roughly the 2 GB of user address space divided by the thread stack size, so with a default stack size of 1-2 MB that's on the order of one or two thousand threads. With too many connections you simply might run out of virtual memory address space.
I think a compromise might work: use a thread pool with e.g. 10 threads running (depending on the machine) and distribute the connections evenly among them. Inside each thread, loop through the peers assigned to it. And limit the maximum number of connections.
Use a thread pool and you should be safe with a fairly large pool size (100 or so). CPU will not be a problem, since you are I/O bound with this type of application.
You can easily make the pool size configurable and put in a reasonable maximum, just to prevent memory-related issues with all the threads. Of course, that should only occur if all the threads are actually being used.