I wrote a Java application that runs across two machines: each one in turn performs some computation on the data and sends the result to the other, which then does its part. I implemented this with sockets, so both machines act as server and client depending on which part of the code they are running. However, it requires a lot of synchronization so that each machine already has the data it needs before it starts computing. So far I have handled this with Thread.sleep(), but because I leave a large margin in the sleep time, it results in a lot of idle time.
I was wondering if there is any alternative to this so that I have automatic synchronization.
There is a Java framework called Apache MINA that abstracts away the complexity and limitations of raw sockets. You can find more details here: http://mina.apache.org/
Related
I have the following situation:
I have 2 JVM processes (really 2 Java processes running separately, not 2 threads) running on a local machine. Let's call them ProcessA and ProcessB.
I want them to communicate (exchange data) with one another (e.g. ProcessA sends a message to ProcessB to do something).
Right now, I work around this by writing to a temporary file that both processes periodically scan for messages. I don't think this solution is very good.
What would be a better alternative to achieve what I want?
Multiple options for IPC:
Socket-Based (Bare-Bones) Networking
not necessarily hard, but:
might be verbose for not much,
might offer more surface for bugs, as you write more code.
you could rely on existing frameworks, like Netty
RMI
Technically, that's also network communication, but that's transparent for you.
Fully-fledged Message Passing Architectures
usually built on either RMI or network communications as well, but with support for complicated conversations and workflows
might be too heavy-weight for something simple
frameworks like ActiveMQ or JBoss Messaging
Java Management Extensions (JMX)
more meant for JVM management and monitoring, but could help to implement what you want if you mostly want to have one process query another for data, or send it some request for an action, if they aren't too complex
also works over RMI (amongst other possible protocols)
not so simple to wrap your head around at first, but actually rather simple to use
File-sharing / File-locking
that's what you're doing right now
it's doable, but comes with a lot of problems to handle
Signals
You can simply send signals to your other process
However, it's fairly limited and requires you to implement a translation layer (it is doable, though it is more of a crazy idea to toy with than anything serious).
Without more details, a bare-bone network-based IPC approach seems the best, as it's the:
most extensible (in terms of adding new features and workflows to your app)
most lightweight (in terms of memory footprint for your app)
most simple (in terms of design)
most educational (in terms of learning how to implement IPC). You mentioned in a comment that "socket is hard"; it really is not, and it is something worth working on.
That being said, based on your example (simply requesting the other process to do an action), JMX could also be good enough for you.
I've added a library on GitHub called Mappedbus (http://github.com/caplogic/mappedbus) which enables two (or many more) Java processes/JVMs to communicate by exchanging messages. The library uses a memory-mapped file and makes use of fetch-and-add and volatile reads/writes to synchronize the different readers and writers. I've measured the throughput between two processes using this library at 40 million messages/s, with an average latency of 25 ns for reading/writing a single message.
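To illustrate only the underlying idea of exchanging data through a memory-mapped file, here is a minimal sketch using the standard java.nio API. It is not Mappedbus's API, and it leaves out the fetch-and-add and volatile synchronization the library relies on. The class name, method names and temp-file prefix are made up for the example, and both "sides" run in one JVM purely to keep it self-contained; two real processes would each map the same agreed-upon file path.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class; not part of Mappedbus.
class MappedFileExchange {

    /** Maps the first `size` bytes of the file into memory for reading and writing. */
    static MappedByteBuffer map(Path file, int size) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // The mapping remains valid after the channel is closed.
            return channel.map(FileChannel.MapMode.READ_WRITE, 0, size);
        }
    }

    /** Writes a value through one mapping and reads it back through a second one. */
    static long exchange(long value) throws IOException {
        Path file = Files.createTempFile("ipc-demo", ".bin");
        file.toFile().deleteOnExit();
        MappedByteBuffer writerView = map(file, 16); // what the writing process would hold
        MappedByteBuffer readerView = map(file, 16); // what the reading process would hold
        writerView.putLong(0, value);
        return readerView.getLong(0);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(exchange(123456789L));
    }
}
```

A real implementation also needs a way for the reader to know when a complete message has been written, which is where the library's synchronization comes in.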
What you are looking for is inter-process communication. Java provides a simple IPC framework in the form of Java RMI API. There are several other mechanisms for inter-process communication such as pipes, sockets, message queues (these are all concepts, obviously, so there are frameworks that implement these).
I think in your case Java RMI or a simple custom socket implementation should suffice.
Sockets with DataInputStream/DataOutputStream (or ObjectInputStream/ObjectOutputStream for whole serializable objects), to send Java data back and forth. This is easier than using a disk file, and much easier than Netty.
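A minimal sketch of that approach, assuming both ends run on the local machine. The class name, the method names and the doubling "computation" are invented for the example. The point is that readInt() blocks until the other side has written, so neither side needs to sleep or poll.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo class: one side sends a number, the other replies with its double.
class SocketPingPong {

    /** Receiving side: accepts one connection, reads a number, sends back the result. */
    static void serveOnce(ServerSocket server) throws IOException {
        try (Socket peer = server.accept();
             DataInputStream in = new DataInputStream(peer.getInputStream());
             DataOutputStream out = new DataOutputStream(peer.getOutputStream())) {
            int value = in.readInt();   // blocks until the sender has written the data
            out.writeInt(value * 2);    // stand-in for the real computation
            out.flush();
        }
    }

    /** Sending side: sends a number and blocks until the reply arrives. */
    static int request(int port, int value) throws IOException {
        try (Socket socket = new Socket(InetAddress.getLoopbackAddress(), port);
             DataOutputStream out = new DataOutputStream(socket.getOutputStream());
             DataInputStream in = new DataInputStream(socket.getInputStream())) {
            out.writeInt(value);
            out.flush();
            return in.readInt();        // returns as soon as the reply is available
        }
    }

    /** Runs both sides in one JVM, only to keep the sketch self-contained. */
    static int roundTrip(int value) throws Exception {
        try (ServerSocket server = new ServerSocket(0)) {   // port 0 = any free port
            Thread receiver = new Thread(() -> {
                try {
                    serveOnce(server);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            receiver.start();
            int reply = request(server.getLocalPort(), value);
            receiver.join();
            return reply;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip(21));
    }
}
```

In a real setup the two methods would live in separate processes and the port would be agreed on in advance.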
I tend to use JGroups to form local clusters between processes. It works for nodes (aka processes) on the same machine, within the same JVM or even across different servers.
Once you understand the basics it is easy to work with, and the option of running two or more processes in the same JVM makes those processes easy to test.
The overhead and latency are minimal if both are on the same machine (usually only a TCP roundtrip of about >100ns per action).
Sockets may be a better choice, I think.
Back in 2004 I implemented code that does the job with sockets. Since then, I have searched many times for a better solution, because the socket approach triggers firewalls and that worries my clients. I have found no better solution so far. The client must serialize the data and send it; the server must receive it and deserialize it.
It is easy.
In our company we are building a high demand system for sending SMS to different clients and providers through SMPP and also directly using modems.
The system handles different requests, and connects to a database to select messages and update their status (sent, received, error, etc.). We receive requests to send SMS messages, which are queued according to priority and released through different channels according to what is requested. Right now, it is necessary to create threads to handle the different channels concurrently, but this makes the system run slowly, as the transactions can be numerous.
We are interested in developing a new system that should not have too many concurrency problems and that would make the most of our server processors.
To our understanding, our problems could be solved by rebuilding the system with a different approach to handling threads for the requests.
Which architecture, framework or library would you recommend for handling this problem with the best performance?
We are currently considering Java 7 Fork/Join, IBIS (MPJ, GMI, Satin) and Akka (actors library), but we are not limited to these. It is also desirable that the system is not tied to a particular architecture, and that it can be scaled and migrated to a cloud service.
PS: The current system generates one thread per message to send and makes some use of thread pools, but not in an optimized way. Apart from improving that poor implementation, we would like something that improves overall performance by taking advantage of all our resources (cores, processors).
"Right now, it is necessary to create threads to handle the different channels concurrently, but this makes the system run slowly, as the transactions can be numerous."
The implication of this statement is that the threads are making the system slow, not the transaction bandwidth. What is your evidence for this?
The only way threads could create problems is if there were so many of them that you were running into memory issues and the system was slow because of GC overhead. Each thread allocates a large contiguous stack space (512k by default on some JVMs; the default varies by platform and JVM version), so 2000 threads (for example) at 512k each will consume about 1 GB of memory.
One way to verify that the threads are the problem is to watch the memory usage of your application using jconsole or something. If all of your memory buckets are full and the GC button does little to nothing then you are correct. Another thing to try is to use fixed sized thread-pools instead of forking a thread for each request you get. If this improves your system performance, but decreases your transaction throughput then you are correct.
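As a rough sketch of the fixed-size thread-pool experiment, the code below submits every message to a pool with a bounded number of worker threads instead of creating one thread per message. The class name, the send method and its return value are placeholders invented for the example; the real work would be the SMPP or modem call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; send() stands in for the real message delivery.
class FixedPoolSender {

    /** Placeholder for sending one message; returns a status string. */
    static String send(int messageId) {
        return "sent-" + messageId;
    }

    /** Sends all messages using at most poolSize threads, however many messages there are. */
    static List<String> sendAll(int messageCount, int poolSize) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (int i = 0; i < messageCount; i++) {
                final int id = i;
                Callable<String> task = () -> send(id);
                pending.add(pool.submit(task));   // queued until a worker thread is free
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : pending) {
                results.add(future.get());        // waits for that task to finish
            }
            return results;
        } finally {
            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sendAll(5, 2));
    }
}
```

Comparing throughput and memory use for a few pool sizes against the thread-per-message version is one way to gather the evidence asked for above.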
Since the SMPP protocol seems to run over TCP/IP, you don't want all of your threads to be sitting in wait loops. Writing your own SMPP protocol implementation using NIO is possible if you know your NIO fu.
I'd also do some searches for java NIO SMPP libraries. A quick search took me to JSMPP. I have no experience with it however.
JSMPP is a java implementation (SMPP API) of SMPP protocol (currently support SMPP v3.4). It provides interfaces to communicate with Message Center or ESME (External Short Message Entity) and able to handle traffic 3000-5000 messages per second.
https://github.com/twitter/cloudhopper-smpp
Twitter's NIO SMPP library built on Netty. Currently used to support hundreds of operator binds sending/receiving billions of messages per month. Solves the problem of needing a thread per bind/message. There are examples of how to use it in cloudhopper-smpp/src/test/java/com/cloudhopper/smpp/demo/
Here's the situation:
I currently have a web application that uses PHP to serve HTML/CSS/JS and that talks to a MySQL DB. Completely vanilla and common. The PHP is a mixture of presentation logic (HTML generation, etc) and business logic (the app uses Ajax extensively to make requests for data or to tell the server to make changes to something).
As part of a redesign of this system I am removing all of the presentation logic from the PHP. Instead, I will be using Ext JS 4 (a javascript-based windowing toolkit / app) connected to a web socket gateway (a COMET/AJAX replacement that allows bi-directional communication) on the server. Let's wave a magic wand for a minute and forget about how the Ext JS 4 gets delivered to the browser and how it talks to the web socket gateway.
What we are left with is a web socket gateway (written in Java and running persistently listening on a specific port for web socket connections) and some business logic / DB interaction currently written in PHP.
At this point, I see one of two options:
Keep the business logic / DB interaction in the PHP and execute it by calling either PHP from the command line or by having the PHP / Apache listen on a different port only for communications from the web socket gateway.
Write a new Java or C++ application that will be persistent and listen on a specific port for communications from the web socket gateway. The business logic / DB integration is re-written in Java or C++ code and is part of this application.
Would re-writing in Java or C++ give better performance than calling PHP over and over? (The PHP code is pretty cleanly written: object-oriented using packages like CodeIgniter and Doctrine).
Would the performance benefits outweigh the hassle of re-writing all the business logic? Obviously dependent on many factors such as quantity of code but what is your gut feeling?
In case it might influence your thinking / feedback, you should know that the web socket gateway (Kaazing) supports JMS, Stomp, AMQP, XMPP, or something custom you build yourself.
Let me know if there is any other info I can provide to help you with your answers.
Thanks!
I know a lot of the solutions I mention here are "ugly" but you sound like a person who's looking to get results and refactor, so I hope it's okay.
Do it the easy way (PHP if I understood correctly) first. Then run a realistic stress test. Since you're making PHP calls, just create a realistic sequence (log in, change this, do that, log out) and run as many as you think is realistic. 100? 10000? It depends on how stressed you expect this thing to be and still perform.
That step is easier than it sounds. Don't think "ultimate test framework", think 20 line python script that runs as many threads as you want executing a few lines that will keep your application busy. If it takes you more than 40 minutes, stop and simplify. The hour you spend will be worth it.
If CPU hits 100% or you run out of some resource then perhaps it's time for a rewrite, or you can probably guess what's taking the longest and write it in C. If you do use C/C++ and you're not 100% comfortable with it, avoid a major rewrite, since it's a dangerous language with lots of opportunities for introducing bugs. Maybe even call compiled code from the PHP you have if that suits your application.
I've written server-side HTML-generating C code once. It's not exactly the right tool for the job. PHP may be hackish but it gets the job done fast. I would avoid optimization unless/until it is actually needed.
Good luck, don't forget to tell us how it goes!
Edit: If you do go for a mixed-language solution, don't forget to clean it up after! Standardize what you do fast and what you do in PHP, do it in a common format, maybe write up a short readme. Again, those fifteen minutes will save you, or the next person, a few days and many hairs.
Writing in a compiled language (Java or C++, in your examples) would almost certainly give better performance than an interpreted language like PHP. The performance benefits almost certainly would not outweigh the hassle of rewriting all of the code.
If your business logic has high processing costs, Java or C++ will give you a much better performance.
If you are simply fetching some results from a DB, do not expect any great performance gains.
I would do some prototyping/testing to identify the performance bottleneck.
My opinion is that PHP is too slow for processing HUGE datasets. If you have many 100,000s of objects to analyse, C++ rocks, and Java benefits from the HotSpot JIT performance optimizer.
The HotSpot effect is very specific to doing number crunching in Java. You really can see the JRE is pushing the accelerator, ironing out detected bottlenecks. In some rare cases HotSpot JIT optimised Java can be even faster than C.
In some also very rare cases HotSpot performance voodooism can make your code slower!
Have you ever thought of turning a PHP application into a faster Java or C++ app?
Maybe the HipHop php2cpp compiler is all you want:
https://github.com/facebook/hiphop-php/wiki/
Quercus is a php4java runtime which can help you migrate more cheaply to Java.
http://quercus.caucho.com/
Quite interesting was Joshua Bloch's talk about "Performance Anxiety" last year.
http://www.wiki.jvmlangsummit.com/images/1/1d/PerformanceAnxiety2010.pdf
http://parleys.com/#st=5&id=2103 (32min video)
I need some advice in building a Java server that handles multiple clients at the same time. The clients need to remain connected for fairly long periods of time. I'm currently using blocking IO and spawning a thread to read from each client that connects to the server, but this is obviously not scalable.
I've found a few options, including using Selector or Executor with fixed size thread pools. I am not too familiar with either one, so which would be the best solution here? Thanks!
It depends on your definition of scalable. The system you have described, with a single thread per connection, is scalable up to hundreds, maybe even a couple of thousand, concurrent connections, but it will hit a wall at some point.
Your question says that your clients connect and stay connected for an extended period of time. It would be possible to have a single IO thread handle the reading and writing, with the processing of each request dispatched to another thread using an Executor.
There are frameworks/servers that are already written to handle this sort of event driven design. Have a look at:
Netty, recently used by Twitter in their query server
Jetty (not to be confused with Netty), capable of NIO and very scalable, might be too HTTP focused
MINA
Grizzly
It's worth noting that the world is full of failed startups & software products that had really scalable architecture. Scaling is a nice problem to have, better to have the problem than not to have it and no customers.
Using multiple threads is scalable. Apache, for example, does this, and some sites using it get many visitors. However, another approach would indeed be to use a Selector, though I have no experience with it.
After all, this seems like asking which religion is the best.
There are a lot of frameworks for this kind of job, for example:
Netty
Apache MINA
Independently of scalability, every server application has its limits. By using blocking IO, one of your limits will be the number of threads that the VM can spawn, because the approach you take is "one-thread-per-client". With NIO (of which Selector is one of the classes), the approach is "one-thread-per-request", which will run out of threads much later.
Horizontal scalability ( http://en.wikipedia.org/wiki/Scalability#Scale_horizontally_vs._vertically ) of your app will not depend on either of these choices.
I can create multiple threads for supporting multi-client feature in socket programming; that's working fine. But if 10,000 clients want to be connected, my server cannot create so many threads.
How can I manage the threads so that I can listen to all these clients simultaneously?
Also, if in this case the server wants to send something to a particular client, then how is it possible?
You should investigate Java's NIO ("New I/O") library for non-blocking network programming. NIO was designed to solve precisely the server scalability problem you are facing!
Introductory article about NIO: Building Highly Scalable Servers with Java NIO
Excerpts from O'Reilly's Java NIO book
Highly scalable socket programming in Java requires the selectable channels provided in the "New I/O", or NIO packages. By using non-blocking IO, a single thread can service many sockets, tending only to those sockets that are ready.
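The following is a minimal sketch of that single-threaded pattern: one Selector loop accepts connections and echoes back whatever each client sends. The class and method names are invented for the example. It skips things a production server needs, such as per-client error handling and waiting for OP_WRITE when a client is slow to read, which is part of why the existing frameworks are worth a look.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

// Hypothetical demo class: a single thread serves every connected client.
class SelectorEchoServer implements Runnable {
    private final Selector selector = Selector.open();
    private final ServerSocketChannel server = ServerSocketChannel.open();

    SelectorEchoServer() throws IOException {
        server.bind(new InetSocketAddress("127.0.0.1", 0));   // port 0 = any free port
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);
    }

    int port() throws IOException {
        return ((InetSocketAddress) server.getLocalAddress()).getPort();
    }

    @Override
    public void run() {
        ByteBuffer buffer = ByteBuffer.allocate(1024);
        try {
            while (selector.isOpen()) {
                selector.select();                            // blocks until some channel is ready
                if (!selector.isOpen()) break;
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove();
                    if (key.isAcceptable()) {
                        SocketChannel client = server.accept();
                        if (client != null) {
                            client.configureBlocking(false);
                            client.register(selector, SelectionKey.OP_READ);
                        }
                    } else if (key.isReadable()) {
                        SocketChannel client = (SocketChannel) key.channel();
                        buffer.clear();
                        if (client.read(buffer) < 0) {        // client closed the connection
                            client.close();
                            continue;
                        }
                        buffer.flip();
                        while (buffer.hasRemaining()) client.write(buffer);   // simplistic echo
                    }
                }
            }
        } catch (IOException | IllegalStateException e) {
            // Selector or channel was closed, or I/O failed: end the loop.
        }
    }

    void close() throws IOException {
        selector.close();
        server.close();
    }

    /** Starts the loop, sends one message through a plain blocking socket, returns the echo. */
    static String echoOnce(String text) throws Exception {
        SelectorEchoServer echo = new SelectorEchoServer();
        Thread loop = new Thread(echo, "selector-loop");
        loop.start();
        try (Socket socket = new Socket("127.0.0.1", echo.port())) {
            byte[] sent = text.getBytes(StandardCharsets.UTF_8);
            socket.getOutputStream().write(sent);
            socket.getOutputStream().flush();
            byte[] received = new byte[sent.length];
            new DataInputStream(socket.getInputStream()).readFully(received);
            return new String(received, StandardCharsets.UTF_8);
        } finally {
            echo.close();
            loop.join();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(echoOnce("hello"));
    }
}
```

To send something to one particular client, the server can keep a map from a client identifier to its SocketChannel when the connection is accepted and write to that channel later.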
One of the more scalable open-source NIO applications is the Grizzly component of the Glassfish application server. Jean-Francois Arcand has written a number of informative, in-depth blog posts about his work on the project, and covers many subtle pitfalls in writing this kind of software with NIO.
If the concept of non-blocking IO is new to you, using existing software like Grizzly, or at least using it as a starting point for your adaptation, might be very helpful.
The benefits of NIO are debatable. See Paul Tyma's blog entries here and here.
A thread-per-connection threading model (Blocking Socket I/O) will not scale too well. Here's an introduction to Java NIO which will allow you to use non-blocking socket calls in java:
http://today.java.net/cs/user/print/a/350
As the article states, there are plenty of frameworks available so you don't have to roll your own.
As previously mentioned, 10,000 clients is not easy. For Java, NIO (possibly augmented with a separate thread pool to handle each request without blocking the NIO thread) is the usual way to handle a large number of clients.
As mentioned, depending on implementation, threads might actually scale, but it depends a lot on how much interaction there is between client connections. Large numbers of threads are more likely to work if there is little synchronization between the threads.
That said, NIO is notoriously difficult to get 100% right the first time you implement it.
I'd recommend either trying out, or at least looking at the source for, the Naga NIO lib at naga.googlecode.com. The codebase for the lib is small compared to most other NIO frameworks. You should be able to quickly implement a test to see if you can get 10,000 clients up and running.
(The Naga source also happens to be free to modify or copy without attributing the original author)
This is not a simple question, but for a very in depth (sorry, not in java though) answer see this: http://www.kegel.com/c10k.html
EDIT
Even with nio, this is still a difficult problem. 10000 connections is a tremendous resource burden on the machine, even if you are using non-blocking sockets. This is why large web sites have server farms and load balancers.
Why don't you process only a certain number of requests at a time?
Let's say you want to process a maximum of 50 requests at a time (for not creating too many threads)
You create a threadpool of 50 threads.
You put all the requests in a Queue (accept connections, keep sockets open), and each thread, when it is done, gets the next request and then processes it.
This should scale more easily.
Also, if the need arises, it will be easier to do load balancing, since you could share your queues across multiple servers.
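The steps above can be sketched with a shared BlockingQueue and a fixed set of worker threads. The names below are made up, and the requests are plain Runnables rather than open sockets, to keep the example short. java.util.concurrent's ExecutorService packages the same pattern, so this is only meant to show the mechanism.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class: a fixed number of workers drain one shared request queue.
class QueueWorkers {
    /** Sentinel object that tells a worker to exit. */
    private static final Runnable STOP = () -> { };

    /** Handles requestCount dummy requests with workerCount threads; returns how many ran. */
    static int process(int requestCount, int workerCount) throws InterruptedException {
        BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
        AtomicInteger handled = new AtomicInteger();

        Thread[] workers = new Thread[workerCount];
        for (int i = 0; i < workerCount; i++) {
            workers[i] = new Thread(() -> {
                try {
                    while (true) {
                        Runnable request = queue.take();   // blocks while the queue is empty
                        if (request == STOP) return;
                        request.run();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[i].start();
        }

        for (int i = 0; i < requestCount; i++) {
            queue.put(() -> handled.incrementAndGet());    // stand-in for a real request
        }
        for (int i = 0; i < workerCount; i++) {
            queue.put(STOP);                               // one stop marker per worker
        }
        for (Thread worker : workers) {
            worker.join();
        }
        return handled.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(process(1000, 50));
    }
}
```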
Personally, I would rather create a custom non-blocking I/O setup, for example using one thread to accept clients and one other thread to process them (checking if any input is available and writing data to the output if necessary).
You'll have to figure out why your application is failing at 10,000 threads.
Is there a hard limit to the number of threads in the JVM or the OS? If so, can it be lifted?
Are you running out of memory? Try configuring a smaller stack size per thread, and/or add more memory to the server.
Something else? Fix it.
Only once you have determined the source of the problem will you be able to fix it. In theory 10,000 threads should be OK but at that level of concurrency it requires some extra tuning of the JVM and operating system if you want it to work out.
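As one example of the stack-size tuning mentioned above, a thread can be created with a requested stack size through the four-argument Thread constructor (the -Xss JVM option sets the default for all threads instead). The class name, method name and the 256 KB figure below are arbitrary choices for the sketch, and the JVM is allowed to treat the requested size as a hint only.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class: starts many threads, each requesting a small stack.
class SmallStackThreads {

    /** Starts threadCount threads with the requested stack size; returns how many ran. */
    static int runWithSmallStacks(int threadCount, long stackBytes) throws InterruptedException {
        AtomicInteger completed = new AtomicInteger();
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            Runnable work = () -> completed.incrementAndGet();
            // Arguments: thread group (null = default), task, name, requested stack size in bytes.
            threads[i] = new Thread(null, work, "worker-" + i, stackBytes);
            threads[i].start();
        }
        for (Thread thread : threads) {
            thread.join();
        }
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runWithSmallStacks(50, 256 * 1024));
    }
}
```

A smaller stack limits how deep each thread's call chain can go, so the value has to be checked against the actual workload.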
You can also consider NIO but I think it can work fine with threads as well.