I've been reading for the past few days about the difference between IO and NIO when building a Java socket server. For my use I need a server that can handle lots of connections, and NIO is supposed to do the trick.
My only fear is that it is a bit slower and a bit harder to implement than running a thread for each connection. So I thought: why not combine the ease of threads with the logic of java.nio, and build a server with one thread that checks all the open connections and, whenever there is a new event, opens a new thread to handle it? I think that way I'd get the best of both worlds. What do you suggest?
NIO almost entirely relies on JNI, so if you want to implement it again, you'll actually have to write loads of C/C++ and OS API interface code.
I think the existing Java implementations are already quite good. For example, the Selector class wraps the system call for waiting on multiple file descriptors. There's hardly anything you can do to improve the efficiency of that.
I suggest you don't understand the point of NIO, which is to only use one thread. It is certainly complex and it is arguable whether you have any need for it at all below 1000 clients, probably 10,000, possibly even 100,000. I would implement your server with java.net to get it running and save java.nio for phase 2 if you ever get there and prove to yourself that you really need it.
EDIT: I would certainly forget this concept of rolling your own. You're wildly underestimating the task (It took Sun 1.4.0, 1.4.1, 1.4.2 before it really worked properly), and you seem to be aiming to get the worst of both worlds. You won't be able to get any more out of it than Sun did with java.nio, as there isn't any more. Arguably a bit less ;-)
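The "get it running with java.net first" suggestion above can be sketched roughly as follows. This is a minimal illustration, not a production server: the class name BlockingEchoServer, the echo protocol and the use of a cached thread pool are all assumptions made for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical name: a blocking, thread-per-connection echo server built on java.net.
class BlockingEchoServer implements AutoCloseable {
    private final ServerSocket serverSocket;
    private final ExecutorService workers = Executors.newCachedThreadPool();

    BlockingEchoServer(int port) throws IOException {
        serverSocket = new ServerSocket(port); // port 0 lets the OS pick a free port
    }

    int port() {
        return serverSocket.getLocalPort();
    }

    // Starts the accept loop on a pool thread and returns immediately.
    void start() {
        workers.execute(() -> {
            while (!serverSocket.isClosed()) {
                try {
                    Socket client = serverSocket.accept(); // blocks until a client connects
                    workers.execute(() -> handle(client)); // one worker thread per connection
                } catch (IOException e) {
                    return; // the server socket was closed; stop accepting
                }
            }
        });
    }

    // Echoes each received line back until the client disconnects.
    private static void handle(Socket client) {
        try (Socket s = client;
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.println(line);
            }
        } catch (IOException e) {
            // The connection dropped; nothing more to do for this client.
        }
    }

    @Override
    public void close() throws IOException {
        serverSocket.close();  // makes the blocked accept() fail and end the accept loop
        workers.shutdownNow();
    }
}
```

Calling new BlockingEchoServer(9000).start() would begin accepting clients on port 9000 (an arbitrary port chosen for illustration). Each connection costs one thread, which is exactly the trade-off discussed above.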
Related
I have the following situation:
I have 2 JVM processes (really 2 Java processes running separately, not 2 threads) running on a local machine. Let's call them ProcessA and ProcessB.
I want them to communicate (exchange data) with one another (e.g. ProcessA sends a message to ProcessB to do something).
For now, I work around this issue by writing a temporary file, which these processes periodically scan to get messages. I think this solution is not so good.
What would be a better alternative to achieve what I want?
Multiple options for IPC:
Socket-Based (Bare-Bones) Networking
not necessarily hard, but:
might be verbose for not much,
might offer more surface for bugs, as you write more code.
you could rely on existing frameworks, like Netty
RMI
Technically, that's also network communication, but that's transparent for you.
Fully-fledged Message Passing Architectures
usually built on either RMI or network communications as well, but with support for complicated conversations and workflows
might be too heavy-weight for something simple
frameworks like ActiveMQ or JBoss Messaging
Java Management Extensions (JMX)
more meant for JVM management and monitoring, but could help to implement what you want if you mostly want to have one process query another for data, or send it some request for an action, if they aren't too complex
also works over RMI (amongst other possible protocols)
not so simple to wrap your head around at first, but actually rather simple to use
File-sharing / File-locking
that's what you're doing right now
it's doable, but comes with a lot of problems to handle
Signals
You can simply send signals to your other process
However, it's fairly limited and requires you to implement a translation layer (it is doable, though it's more of a crazy idea to toy with than anything serious).
Without more details, a bare-bone network-based IPC approach seems the best, as it's the:
most extensible (in terms of adding new features and workflows to your
most lightweight (in terms of memory footprint for your app)
most simple (in terms of design)
most educative (in terms of learning how to implement IPC). (as you mentioned "socket is hard" in a comment, and it really is not and should be something you work on)
That being said, based on your example (simply requesting the other process to do an action), JMX could also be good enough for you.
I've added a library on GitHub called Mappedbus (http://github.com/caplogic/mappedbus) which enables two (or many more) Java processes/JVMs to communicate by exchanging messages. The library uses a memory-mapped file and makes use of fetch-and-add and volatile reads/writes to synchronize the different readers and writers. I've measured the throughput between two processes using this library at 40 million messages/s, with an average latency of 25 ns for reading/writing a single message.
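To illustrate only the underlying mechanism (a memory-mapped file shared by two processes), here is a minimal sketch using the standard java.nio API. It does not use or reproduce Mappedbus itself; the class name SharedRegion is hypothetical, and the sketch deliberately leaves out the fetch-and-add and volatile synchronization that a real message bus needs.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: maps a shared file so that two processes can see the same bytes.
class SharedRegion {
    // Maps the first `size` bytes of `file` read-write, creating or growing the file as needed.
    static MappedByteBuffer map(Path file, int size) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            // The mapping stays valid after the channel that created it is closed.
            return channel.map(FileChannel.MapMode.READ_WRITE, 0, size);
        }
    }
}
```

ProcessA could call SharedRegion.map(path, 8).putLong(0, 42L) and ProcessB could call SharedRegion.map(path, 8).getLong(0) on the same path. Without extra coordination there is no guarantee about when the reader sees the write, which is the part a library such as Mappedbus takes care of.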
What you are looking for is inter-process communication. Java provides a simple IPC framework in the form of Java RMI API. There are several other mechanisms for inter-process communication such as pipes, sockets, message queues (these are all concepts, obviously, so there are frameworks that implement these).
I think in your case Java RMI or a simple custom socket implementation should suffice.
Sockets with DataInputStream/DataOutputStream to send data back and forth (or ObjectInputStream/ObjectOutputStream to send serializable Java objects). This is easier than using a disk file, and much easier than Netty.
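A minimal sketch of what such a stream-based exchange could look like. The SimpleMessage class and its two fields are made up for illustration. DataOutputStream writes primitives and strings; whole Serializable objects would need ObjectOutputStream instead.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical message type: an application-defined type code plus a text body.
class SimpleMessage {
    final int type;
    final String body;

    SimpleMessage(int type, String body) {
        this.type = type;
        this.body = body;
    }

    // Writes the message to any stream, e.g. new DataOutputStream(socket.getOutputStream()).
    void writeTo(DataOutputStream out) throws IOException {
        out.writeInt(type);
        out.writeUTF(body); // length-prefixed; limited to 65535 encoded bytes
        out.flush();
    }

    // Reads one message; blocks until a complete message has arrived.
    static SimpleMessage readFrom(DataInputStream in) throws IOException {
        int type = in.readInt();
        String body = in.readUTF();
        return new SimpleMessage(type, body);
    }
}
```

ProcessA would wrap its socket's output stream in a DataOutputStream and call writeTo, while ProcessB wraps the input stream of the accepted socket in a DataInputStream and calls readFrom in a loop.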
I tend to use JGroups to form local clusters between processes. It works for nodes (aka processes) on the same machine, within the same JVM or even across different servers.
Once you understand the basics it is easy to work with, and having the option to actually run two or more processes in the same JVM makes those processes easy to test.
The overhead and latency are minimal if both are on the same machine (usually only a TCP roundtrip of about >100ns per action).
socket may be a better choice, I think.
Back in 2004 I implemented code that does the job with sockets. Since then I have searched many times for a better solution, because the socket approach triggers firewalls and my clients worry. I have not found a better solution so far. The client must serialize its data and send it, and the server must receive and deserialize it.
It is easy.
We all have learned from trial and error that multiple blocking threads do not scale well, and that we should switch to using NIO where we see possible. Yet, there are not as many resources explaining why non-blocking is better by giving an under-the-hood example of how it actually works.
We all have learned from trial and error that multiple blocking threads do not scale well,
This was true 10 years ago; however, in general, both blocking IO and NIO work well enough. Unless you have a very large number of connections and a service which does very little, you can support up to 1000 connections on a modern server comfortably. Don't forget servers are faster now, have many more cores, and people expect servers to do more work, i.e. the bottleneck is in the application, not the IO.
we should switch to using NIO where we see possible.
The main benefit is reduced thread overhead. As I mentioned, this is not as high as it was when NIO was introduced more than ten years ago.
NIO is much harder to work with so I would suggest only using it if you really need to.
there are not as many resources explaining why non-blocking is better
The explanation is: you use fewer threads, thus you have a lower overhead. This only matters if the work each thread does is very little.
Note: It is often assumed that NIO means non-blocking, when actually the default behaviour of all the Channels in NIO is blocking. In fact, in NIO only selectable channels (such as TCP and UDP socket channels) can be configured to be non-blocking; file channels cannot. This is the exception rather than the rule.
Note2: the fastest way to handle a small number of connections is to use blocking NIO.
Finally, another benefit of using NIO is reduced copying of data if you use "direct" or native buffers. However, again you need to be doing bulk transfers of data. As soon as you start reading/writing the data in a byte-by-byte manner, e.g. as text, this overhead swamps the gains you might have made.
by giving an under-the-hood example of how it actually works.
Most of the under-the-hood differences are either not as great as you might imagine, or handled entirely by the operating system and thus obscured from Java or even the JVM.
I'm thinking of building cluster software for Linux in Java. I want to control the CPU load: for example, if the CPU load is higher than a threshold, I reduce the number of execution threads. I thought I could check the CPU load every second or every few seconds from a daemon thread. How do I implement that in Java? And how do I check in Java whether a particular process is dead, and whether the port it opened is still open?
There's no way (AFAIK) to do this in pure Java. You could potentially write some native C to do this and then interface using JNA / JNI (which would be the most robust solution.)
Alternatively, a quick hacky easy approach (if you're just using Linux) would be to use Runtime.exec() to call one of these native approaches which you could then parse from within Java.
In terms of checking whether a process is dead or not, you could use the above approach but with ps.
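A rough sketch of the "run a native command and parse its output" approach described above. It assumes a Linux machine where /proc/loadavg exists and where ps -p exits with a non-zero status when no such process exists. The class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that shells out to Linux tools and parses their output.
class LoadProbe {
    // Parses the 1-minute load average from a /proc/loadavg line,
    // e.g. "0.42 0.30 0.25 1/123 4567".
    static double parseOneMinuteLoad(String loadavgLine) {
        return Double.parseDouble(loadavgLine.trim().split("\\s+")[0]);
    }

    // Assumption: Linux, with `cat` on the PATH and /proc mounted.
    static double oneMinuteLoad() throws IOException, InterruptedException {
        Process p = new ProcessBuilder("cat", "/proc/loadavg")
                .redirectErrorStream(true)
                .start();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            String line = r.readLine();
            p.waitFor();
            if (line == null) {
                throw new IOException("no output from /proc/loadavg");
            }
            return parseOneMinuteLoad(line);
        }
    }

    // Assumption: `ps -p <pid>` exits with status 0 only if the process exists.
    static boolean isAlive(long pid) throws IOException, InterruptedException {
        Process p = new ProcessBuilder("ps", "-p", Long.toString(pid))
                .redirectErrorStream(true)
                .start();
        try (InputStream in = p.getInputStream()) {
            while (in.read() != -1) {
                // Discard the output so the child process cannot block on a full pipe.
            }
        }
        return p.waitFor() == 0;
    }
}
```

A daemon thread could call LoadProbe.oneMinuteLoad() every few seconds and shrink the worker pool when the value crosses a threshold.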
EDIT: This may be something that helps.
Here's the situation:
I currently have a web application that uses PHP to serve HTML/CSS/JS and that talks to a MySQL DB. Completely vanilla and common. The PHP is a mixture of presentation logic (HTML generation, etc) and business logic (the app uses Ajax extensively to make requests for data or to tell the server to make changes to something).
As part of a redesign of this system I am removing all of the presentation logic from the PHP. Instead, I will be using Ext JS 4 (a javascript-based windowing toolkit / app) connected to a web socket gateway (a COMET/AJAX replacement that allows bi-directional communication) on the server. Let's wave a magic wand for a minute and forget about how the Ext JS 4 gets delivered to the browser and how it talks to the web socket gateway.
What we are left with is a web socket gateway (written in Java and running persistently listening on a specific port for web socket connections) and some business logic / DB interaction currently written in PHP.
At this point, I see one of two options:
Keep the business logic / DB interaction in the PHP and execute it by calling either PHP from the command line or by having the PHP / Apache listen on a different port only for communications from the web socket gateway.
Write a new Java or C++ application that will be persistent and listen on a specific port for communications from the web socket gateway. The business logic / DB integration is re-written in Java or C++ code and is part of this application.
Would re-writing in Java or C++ give better performance than calling PHP over and over? (The PHP code is pretty cleanly written: object-oriented using packages like CodeIgniter and Doctrine).
Would the performance benefits outweigh the hassle of re-writing all the business logic? Obviously dependent on many factors such as quantity of code but what is your gut feeling?
In case it might influence your thinking / feedback, you should know that the web socket gateway (Kaazing) supports JMS, Stomp, AMQP, XMPP, or something custom you build yourself.
Let me know if there is any other info I can provide to help you with your answers.
Thanks!
I know a lot of the solutions I mention here are "ugly" but you sound like a person who's looking to get results and refactor, so I hope it's okay.
Do it the easy way (PHP if I understood correctly) first. Then run a realistic stress test. Since you're making PHP calls, just create a realistic sequence (log in, change this, do that, log out) and run as many as you think is realistic. 100? 10000? It depends on how stressed you expect this thing to be and still perform.
That step is easier than it sounds. Don't think "ultimate test framework", think 20-line Python script that runs as many threads as you want executing a few lines that will keep your application busy. If it takes you more than 40 minutes, stop and simplify. The hour you spend will be worth it.
If CPU hits 100 or you run out of some resource then perhaps it's time for a rewrite, or you can probably guess what's taking the longest and write it in C. If you do use C/C++ and you're not 100% comfortable with it, avoid a major rewrite, since it's a dangerous language with lots of opportunities for introducing bugs. Maybe even call compiled code from the PHP you have if that suits your application.
I've written server-side HTML-generating C code once. It's not exactly the right tool for the job. PHP may be hackish but it gets the job done fast. I would avoid optimization unless/until it is actually needed.
Good luck, don't forget to tell us how it goes!
Edit: If you do go for a mixed-language solution, don't forget to clean it up after! Standardize what you do fast and what you do in PHP, do it in a common format, maybe write up a short readme. Again, those fifteen minutes will save you, or the next person, a few days and many hairs.
Writing in a compiled language (Java or C++, in your examples) would almost certainly give better performance than an interpreted language like PHP. The performance benefits almost certainly would not outweigh the hassle of rewriting all of the code.
If your business logic has high processing costs, Java or C++ will give you a much better performance.
If you are simply fetching some results from a DB, do not expect any great performance gains.
I would do some prototyping/testing to identify the performance bottleneck.
My opinion is that PHP is too slow for processing HUGE datasets. If you have many 100,000s of objects to analyse, C++ rocks, and Java benefits from the HotSpot JIT performance optimizer.
The HotSpot effect is very specific to doing number crunching in Java. You really can see the JRE is pushing the accelerator, ironing out detected bottlenecks. In some rare cases HotSpot JIT optimised Java can be even faster than C.
In some also very rare cases HotSpot performance voodooism can make your code slower!
Have you ever thought of turning a PHP application into a faster Java or C++ app?
Maybe the HipHop php2cpp compiler is all you want:
https://github.com/facebook/hiphop-php/wiki/
Quercus is a php4java runtime which can help you migrate more cheaply to Java.
http://quercus.caucho.com/
Quite interesting was Joshua Bloch's talk about "Performance Anxiety" last year.
http://www.wiki.jvmlangsummit.com/images/1/1d/PerformanceAnxiety2010.pdf
http://parleys.com/#st=5&id=2103 (32min video)
I can create multiple threads for supporting multi-client feature in socket programming; that's working fine. But if 10,000 clients want to be connected, my server cannot create so many threads.
How can I manage the threads so that I can listen to all these clients simultaneously?
Also, if in this case the server wants to send something to a particular client, then how is it possible?
You should investigate Java's NIO ("New I/O") library for non-blocking network programming. NIO was designed to solve precisely the server scalability problem you are facing!
Introductory article about NIO: Building Highly Scalable Servers with Java NIO
Excerpts from O'Reilly's Java NIO book
Highly scalable socket programming in Java requires the selectable channels provided in the "New I/O", or NIO packages. By using non-blocking IO, a single thread can service many sockets, tending only to those sockets that are ready.
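As a minimal sketch of that idea, the following shows one thread servicing every connection through a Selector. The class name SelectorEchoServer and the echo behaviour are assumptions for illustration, and the write path is simplified: a real server would register OP_WRITE instead of looping on a partial write.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedSelectorException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

// Hypothetical name: a single-threaded, selector-based echo server.
class SelectorEchoServer implements Runnable, AutoCloseable {
    private final Selector selector;
    private final ServerSocketChannel server;

    SelectorEchoServer(int port) throws IOException {
        selector = Selector.open();
        server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(port)); // port 0 lets the OS pick a free port
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);
    }

    int port() {
        return server.socket().getLocalPort();
    }

    @Override
    public void run() {
        ByteBuffer buffer = ByteBuffer.allocate(4096);
        try {
            while (selector.isOpen()) {
                selector.select(); // blocks until at least one channel is ready
                if (!selector.isOpen()) {
                    break;
                }
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (!key.isValid()) {
                        continue;
                    }
                    if (key.isAcceptable()) {
                        SocketChannel client = server.accept();
                        if (client != null) {
                            client.configureBlocking(false);
                            client.register(selector, SelectionKey.OP_READ);
                        }
                    } else if (key.isReadable()) {
                        echo((SocketChannel) key.channel(), buffer);
                    }
                }
            }
        } catch (IOException | ClosedSelectorException e) {
            // The selector was closed or failed; leave the loop.
        }
    }

    // Reads whatever is available and writes it straight back.
    private static void echo(SocketChannel client, ByteBuffer buffer) {
        try {
            buffer.clear();
            int n = client.read(buffer);
            if (n < 0) {          // the peer closed the connection
                client.close();
                return;
            }
            buffer.flip();
            while (buffer.hasRemaining()) {
                client.write(buffer); // simplification: spins if the socket buffer is full
            }
        } catch (IOException e) {
            try {
                client.close();
            } catch (IOException ignored) {
                // Nothing else to clean up.
            }
        }
    }

    @Override
    public void close() throws IOException {
        selector.close(); // wakes up a blocked select()
        server.close();
    }
}
```

Running new Thread(new SelectorEchoServer(9000)).start() would serve all clients from that one thread (port 9000 is arbitrary). Long-running request handling should be handed to other threads so it does not stall the selector loop.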
One of the more scalable open-source NIO applications is the Grizzly component of the Glassfish application server. Jean-Francois Arcand has written a number of informative, in-depth blog posts about his work on the project, and covers many subtle pitfalls in writing this kind of software with NIO.
If the concept of non-blocking IO is new to you, using existing software like Grizzly, or at least using it as a starting point for your adaptation, might be very helpful.
The benefits of NIO are debatable. See Paul Tyma's blog entries here and here.
A thread-per-connection threading model (Blocking Socket I/O) will not scale too well. Here's an introduction to Java NIO which will allow you to use non-blocking socket calls in java:
http://today.java.net/cs/user/print/a/350
As the article states, there are plenty of frameworks available so you don't have to roll your own.
As previously mentioned, 10.000 clients is not easy. For Java, NIO (possibly augmented with a separate thread pool to handle each request without blocking the NIO thread) is the usual way to handle a large number of clients.
As mentioned, depending on implementation, threads might actually scale, but it depends a lot on how much interaction there is between client connections. Massive threads are more likely to work if there is little synchronization between the threads.
That said, NIO is notoriously difficult to get 100% right the first time you implement it.
I'd recommend either trying out, or at least looking at the source for the Naga NIO lib at naga.googlecode.com. The codebase for the lib is small compared to most other NIO frameworks. You should be able to quickly implement a test to see if you can get 10.000 clients up and running.
(The Naga source also happens to be free to modify or copy without attributing the original author)
This is not a simple question, but for a very in depth (sorry, not in java though) answer see this: http://www.kegel.com/c10k.html
EDIT
Even with nio, this is still a difficult problem. 10000 connections is a tremendous resource burden on the machine, even if you are using non-blocking sockets. This is why large web sites have server farms and load balancers.
Why don't you process only a certain number of requests at a time?
Let's say you want to process a maximum of 50 requests at a time (for not creating too many threads)
You create a threadpool of 50 threads.
You put all the requests in a Queue (accept connections, keep sockets open), and each thread, when it is done, gets the next request and then processes it.
This should scale more easily.
Also, if the need arises, it will be easier to do load balancing, since you could share your queues across multiple servers.
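The queue-plus-fixed-pool idea above maps fairly directly onto java.util.concurrent. A minimal sketch follows, in which requests are plain strings and handle is a placeholder; both are assumptions for illustration (in a real server the queued item would be an accepted socket).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper: processes queued requests with at most maxThreads worker threads.
class BoundedWorkerPool {
    static List<String> processAll(List<String> requests, int maxThreads)
            throws InterruptedException, ExecutionException {
        // A fixed pool keeps waiting tasks in an internal unbounded queue.
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String request : requests) {
                Callable<String> task = () -> handle(request);
                futures.add(pool.submit(task)); // queued until a worker is free
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get()); // waits for that request to finish
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    // Placeholder for the real per-request work.
    static String handle(String request) {
        return "handled:" + request;
    }
}
```

With maxThreads set to 50, no more than 50 requests are processed at once no matter how many are waiting in the queue.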
Personally I would rather create a custom non-blocking I/O setup, for example using one thread to accept clients and one other thread to process them (checking if any input is available and writing data to the output if necessary).
You'll have to figure out why your application is failing at 10,000 threads.
Is there a hard limit to the number of threads in the JVM or the OS? If so, can it be lifted?
Are you running out of memory? Try configuring a smaller stack size per thread, and/or add more memory to the server.
Something else? Fix it.
Only once you have determined the source of the problem will you be able to fix it. In theory 10,000 threads should be OK but at that level of concurrency it requires some extra tuning of the JVM and operating system if you want it to work out.
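For the stack-size suggestion above, one option is the Thread constructor that takes a stack size; the JVM-wide equivalent is the -Xss option. A minimal sketch, assuming the class name SmallStackThreads and a trivial task, and keeping in mind that the requested size is only a hint the JVM may round up or ignore:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: starts many threads with a reduced stack size.
class SmallStackThreads {
    // Starts `count` threads, waits for all of them and returns how many ran.
    static int runAll(int count, long stackSizeBytes) throws InterruptedException {
        AtomicInteger ran = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            // A null thread group means "use the current thread's group".
            Thread t = new Thread(null, ran::incrementAndGet, "client-" + i, stackSizeBytes);
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            t.join();
        }
        return ran.get();
    }
}
```

Measuring how many such threads can be started before the JVM or the OS refuses is one way to find out which limit is actually being hit.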
You can also consider NIO but I think it can work fine with threads as well.