I have the following situation:
I have 2 JVM processes (really 2 Java processes running separately, not 2 threads) running on a local machine. Let's call them ProcessA and ProcessB.
I want them to communicate (exchange data) with one another (e.g. ProcessA sends a message to ProcessB to do something).
For now, I work around this by writing a temporary file that both processes periodically scan for messages. I don't think this is a very good solution.
What would be a better alternative to achieve what I want?
Multiple options for IPC:
Socket-Based (Bare-Bones) Networking
not necessarily hard, but:
might be verbose for little gain,
might offer more surface for bugs, as you write more code.
you could rely on existing frameworks, like Netty
RMI
Technically, that's also network communication, but that's transparent for you.
Fully-fledged Message Passing Architectures
usually built on either RMI or network communications as well, but with support for complicated conversations and workflows
might be too heavy-weight for something simple
frameworks like ActiveMQ or JBoss Messaging
Java Management Extensions (JMX)
more meant for JVM management and monitoring, but could help to implement what you want if you mostly want to have one process query another for data, or send it some request for an action, if they aren't too complex
also works over RMI (amongst other possible protocols)
not so simple to wrap your head around at first, but actually rather simple to use
File-sharing / File-locking
that's what you're doing right now
it's doable, but comes with a lot of problems to handle
Signals
You can simply send signals to your other process
However, it's fairly limited and requires you to implement a translation layer (it is doable, though more a crazy idea to toy with than anything serious).
Without more details, a bare-bone network-based IPC approach seems the best, as it's the:
most extensible (in terms of adding new features and workflows)
most lightweight (in terms of memory footprint for your app)
most simple (in terms of design)
most educative (in terms of learning how to implement IPC). You mentioned in a comment that "socket is hard"; it really is not, and it is something worth working on.
That being said, based on your example (simply requesting the other process to do an action), JMX could also be good enough for you.
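To make the JMX option concrete, here is a minimal sketch of a standard MBean that ProcessB could register so that ProcessA can invoke an operation on it. The names `CommandHandler`, `execute`, and the domain `com.example.ipc` are hypothetical; the rest is the standard `javax.management` API. Standard MBeans need a public interface named after the implementing class plus `MBean`.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

final class JmxIpc {
    /** Standard MBean interface: getters become attributes, other methods become operations. */
    public interface CommandHandlerMBean {
        int getProcessedCount();

        String execute(String command);
    }

    /** Hypothetical handler living in ProcessB. */
    public static final class CommandHandler implements CommandHandlerMBean {
        private int processed;

        @Override
        public synchronized int getProcessedCount() {
            return processed;
        }

        @Override
        public synchronized String execute(String command) {
            processed++;
            return "executed " + command;
        }
    }

    // Hypothetical object name; any "domain:key=value" pattern works.
    static final String NAME = "com.example.ipc:type=CommandHandler";

    /** ProcessB: register the handler with this JVM's platform MBean server. */
    static ObjectName register(CommandHandler handler) throws Exception {
        ObjectName name = new ObjectName(NAME);
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        server.registerMBean(handler, name);
        return name;
    }

    /** ProcessA: connect to ProcessB's JMX agent and invoke the operation remotely. */
    static Object remoteExecute(String serviceUrl, String command) throws Exception {
        try (JMXConnector connector = JMXConnectorFactory.connect(new JMXServiceURL(serviceUrl))) {
            MBeanServerConnection connection = connector.getMBeanServerConnection();
            return connection.invoke(new ObjectName(NAME), "execute",
                    new Object[] {command}, new String[] {String.class.getName()});
        }
    }
}
```

For `remoteExecute` to reach ProcessB, ProcessB would be started with a remote agent, for example `-Dcom.sun.management.jmxremote.port=9999 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false` (unauthenticated, so only for local experiments), and ProcessA would pass `service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi`. The port 9999 is an arbitrary example.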
I've added a library on GitHub called Mappedbus (http://github.com/caplogic/mappedbus) which enables two (or many more) Java processes/JVMs to communicate by exchanging messages. The library uses a memory-mapped file and makes use of fetch-and-add and volatile reads/writes to synchronize the different readers and writers. I've measured the throughput between two processes using this library at 40 million messages/s, with an average latency of 25 ns for reading/writing a single message.
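The core idea behind memory-mapped-file IPC can be sketched with plain `java.nio`. This is not Mappedbus's API, only an illustration under simplifying assumptions: a single slot, one writer, one reader, and a hypothetical layout where the first 4 bytes hold the message length (0 means empty). The length field is accessed through a `VarHandle` with volatile semantics so the payload is written before the length becomes visible. Strictly speaking, the Java memory model does not cover memory shared between processes, so cross-process ordering here relies on how the JVM and hardware behave in practice. Requires Java 9+.

```java
import java.io.IOException;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Single-slot mailbox in a memory-mapped file (hypothetical layout: [int length][payload]). */
final class MappedMailbox {
    private static final int HEADER = 4;
    // Views the buffer as ints so the length field supports volatile get/set (offset 0 is aligned).
    private static final VarHandle INT =
            MethodHandles.byteBufferViewVarHandle(int[].class, ByteOrder.nativeOrder());

    private final MappedByteBuffer buffer;

    MappedMailbox(Path file, int capacity) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Each process maps the same file; the mapping stays valid after the channel is closed.
            buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, HEADER + capacity);
        }
    }

    /** Writer: wait for an empty slot, copy the payload, then publish its length. */
    void send(String message) {
        byte[] bytes = message.getBytes(StandardCharsets.UTF_8);
        if (bytes.length == 0 || bytes.length > buffer.capacity() - HEADER) {
            throw new IllegalArgumentException("message size must be 1.." + (buffer.capacity() - HEADER));
        }
        while ((int) INT.getVolatile(buffer, 0) != 0) {
            Thread.onSpinWait(); // previous message not consumed yet
        }
        ByteBuffer view = buffer.duplicate();
        view.position(HEADER);
        view.put(bytes);
        INT.setVolatile(buffer, 0, bytes.length); // publish only after the payload is in place
    }

    /** Reader: wait for a message, copy it out, then mark the slot empty again. */
    String receive() {
        int length;
        while ((length = (int) INT.getVolatile(buffer, 0)) == 0) {
            Thread.onSpinWait();
        }
        byte[] bytes = new byte[length];
        ByteBuffer view = buffer.duplicate();
        view.position(HEADER);
        view.get(bytes);
        INT.setVolatile(buffer, 0, 0); // free the slot for the writer
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

Both processes would construct `new MappedMailbox(samePath, 1024)`; ProcessA calls `send` and ProcessB calls `receive`. Busy-waiting burns a core while idle, and a real design would also need a ring of slots, support for multiple writers (which is where fetch-and-add comes in), and handling of stale data left in an existing file.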
What you are looking for is inter-process communication (IPC). Java provides a simple IPC framework in the form of the Java RMI API. There are several other mechanisms for inter-process communication, such as pipes, sockets and message queues (these are all concepts, so there are various frameworks that implement them).
I think in your case Java RMI or a simple custom socket implementation should suffice.
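A minimal Java RMI sketch of the ProcessA/ProcessB exchange from the question. `Worker`, `doSomething` and the registry name `"worker"` are hypothetical; the classes used are from `java.rmi`. ProcessB creates a registry and exports the object; ProcessA looks it up and calls it as if it were local.

```java
import java.rmi.NotBoundException;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

final class RmiIpc {
    /** Remote interface shared by both processes; every method must declare RemoteException. */
    public interface Worker extends Remote {
        String doSomething(String action) throws RemoteException;
    }

    /** Implementation that lives in ProcessB. */
    static final class WorkerImpl implements Worker {
        @Override
        public String doSomething(String action) {
            return "done: " + action;
        }
    }

    /** ProcessB: start a registry on the given port and bind the exported object in it. */
    static Registry serve(int port, Worker impl) throws RemoteException {
        Registry registry = LocateRegistry.createRegistry(port);
        Remote stub = UnicastRemoteObject.exportObject(impl, 0); // 0 = let RMI pick a port
        registry.rebind("worker", stub);
        return registry;
    }

    /** ProcessA: look up the stub; calls on it travel over a local TCP connection. */
    static Worker connect(String host, int port) throws RemoteException, NotBoundException {
        Registry registry = LocateRegistry.getRegistry(host, port);
        return (Worker) registry.lookup("worker");
    }
}
```

In two real processes, ProcessB would call `serve(1099, new WorkerImpl())` and keep running, and ProcessA would call `connect("localhost", 1099).doSomething(...)`; 1099 is just the conventional registry port. Depending on how the machine's host name resolves, you may need `-Djava.rmi.server.hostname=127.0.0.1` so that stubs advertise the loopback address.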
Use sockets with DataInputStream/DataOutputStream to send data back and forth (or ObjectInputStream/ObjectOutputStream if you want to send whole Java objects). This is easier than using a disk file, and much easier than Netty.
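A minimal sketch of that approach, assuming a hypothetical one-request/one-reply protocol in which each message is a string written with `writeUTF` (which length-prefixes it, so the reader knows where a message ends). In practice `serveOnce` would run in ProcessB and `send` in ProcessA, with the port agreed on in advance.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

final class SocketIpc {
    /** ProcessB side: accept one connection, read one command, write one reply. */
    static void serveOnce(ServerSocket server) throws IOException {
        try (Socket client = server.accept();
             DataInputStream in = new DataInputStream(client.getInputStream());
             DataOutputStream out = new DataOutputStream(client.getOutputStream())) {
            String command = in.readUTF();    // length-prefixed modified UTF-8 string
            out.writeUTF("done: " + command); // reply using the same framing
            out.flush();
        }
    }

    /** ProcessA side: connect over loopback, send a command, wait for the reply. */
    static String send(int port, String command) throws IOException {
        try (Socket socket = new Socket(InetAddress.getLoopbackAddress(), port);
             DataOutputStream out = new DataOutputStream(socket.getOutputStream());
             DataInputStream in = new DataInputStream(socket.getInputStream())) {
            out.writeUTF(command);
            out.flush();
            return in.readUTF();
        }
    }
}
```

A long-running ProcessB would call `serveOnce` in a loop (or hand each accepted socket to a thread) instead of serving a single request.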
I tend to use JGroups to form local clusters between processes. It works for nodes (aka processes) on the same machine, within the same JVM, or even across different servers.
Once you understand the basics, it is easy to work with, and having the option to run two or more processes in the same JVM makes those processes easy to test.
The overhead and latency are minimal if both are on the same machine (usually only a TCP round trip per action, which takes upwards of 100 ns).
A socket may be the better choice, I think.
Back in 2004 I implemented code that did the job with sockets. Since then I have searched many times for a better solution, because the socket approach triggers firewalls and worries my clients. There is still no better solution. The client must serialize the data and send it; the server must receive and deserialize it.
It is easy.
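A minimal sketch of that serialize/send/receive/deserialize cycle, using Java's built-in object serialization over a loopback socket. `Command` is a hypothetical message class that must be on both processes' classpaths. Only use this between trusted processes: deserializing data from an untrusted source is a well-known security risk (Java 9+ offers `ObjectInputFilter` to restrict what may be deserialized).

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

final class ObjectSocketIpc {
    /** Hypothetical message type; both sides need the same class (and serialVersionUID). */
    static final class Command implements Serializable {
        private static final long serialVersionUID = 1L;
        final String action;
        final int priority;

        Command(String action, int priority) {
            this.action = action;
            this.priority = priority;
        }
    }

    /** Server (ProcessB): deserialize one Command and send back a String reply. */
    static void serveOnce(ServerSocket server) throws IOException, ClassNotFoundException {
        try (Socket socket = server.accept()) {
            ObjectOutputStream out = new ObjectOutputStream(socket.getOutputStream());
            out.flush(); // push the stream header; the peer's ObjectInputStream constructor reads it
            ObjectInputStream in = new ObjectInputStream(socket.getInputStream());
            Command command = (Command) in.readObject();
            out.writeObject("ack " + command.action + " (priority " + command.priority + ")");
            out.flush();
        }
    }

    /** Client (ProcessA): serialize a Command, then wait for the reply. */
    static String send(int port, Command command) throws IOException, ClassNotFoundException {
        try (Socket socket = new Socket(InetAddress.getLoopbackAddress(), port)) {
            ObjectOutputStream out = new ObjectOutputStream(socket.getOutputStream());
            out.writeObject(command);
            out.flush();
            ObjectInputStream in = new ObjectInputStream(socket.getInputStream());
            return (String) in.readObject();
        }
    }
}
```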
From the Guava EventBus docs:
It is not a general-purpose publish-subscribe system, nor is it intended for interprocess communication
What do they mean by saying it's not a general-purpose publish-subscribe system?
As far as I understand, EventBus works within a single JVM only. Publish/subscribe usually happens between processes that may reside in different places: one JVM talking to another, for example one Spring application publishing events that are consumed by another Spring application running somewhere else entirely.
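To illustrate why an in-process event bus is not IPC, here is a hypothetical, minimal in-JVM publish/subscribe class (not Guava's API). Subscribers are plain object references on the publisher's heap, so there is nothing another JVM could register or receive.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

/** Hypothetical minimal in-JVM publish/subscribe (not Guava's EventBus API). */
final class TinyEventBus {
    // Handlers are object references in this JVM's heap; another process cannot hold one.
    private final Map<Class<?>, List<Consumer<Object>>> subscribers = new ConcurrentHashMap<>();

    <T> void subscribe(Class<T> type, Consumer<? super T> handler) {
        subscribers.computeIfAbsent(type, k -> new CopyOnWriteArrayList<>())
                   .add(event -> handler.accept(type.cast(event)));
    }

    /** Delivers the event synchronously, on the caller's thread, to handlers for its exact class. */
    void post(Object event) {
        for (Consumer<Object> handler : subscribers.getOrDefault(event.getClass(), List.of())) {
            handler.accept(event);
        }
    }
}
```

Crossing a process boundary would require serializing events and shipping them over some transport, which is exactly what broker-based systems such as Kafka add.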
It's better explained in the bottom of the EventBusExplained page:
Why can't I do <magic thing> with EventBus?
EventBus is designed to deal with a large class of use cases really, really well. We prefer hitting the nail on the head for most use cases to doing decently on all use cases.
Additionally, making EventBus extensible -- and making it useful and productive to extend, while still allowing ourselves to make additions to the core EventBus API that don't conflict with any of your extensions -- is an extremely difficult problem.
If you really, really need magic thing X, that EventBus can't currently provide, you should file an issue, and then design your own alternative.
(Emphasis mine.)
Compare EventBus to Kafka for example -- the latter is much more extensible but also more complex.
I want to build a service that basically pulls data from various APIs.
On a typical server, is there a thread limit that one should adhere to?
Does anyone have experience building something similar? How many threads were considered ideal, and what kind of requests per second can one expect?
Is 100 threads too much? 200?
I realize this is something I'm going to have to test, but looking for someone who has built something similar in nature that can shed some experience on it.
It depends on your bottlenecks and your requirements. How fast do you need to complete the operations? Do the threads perform I/O? From your explanation, I gather they make a lot of network requests.
So the threads are going to wait on the network. Why do you need many threads, then? Asynchronous operations may well be faster.
And in general, as Robert Harvey commented: "It's going to take us longer to answer your question than it is for you to test it and tweak the number. The number of threads depends on all sorts of variables which you haven't specified, so any answer is going to be a guess."
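One common starting point for the "how many threads" question is the pool-sizing heuristic from "Java Concurrency in Practice" (section 8.2): threads = CPUs × target CPU utilization × (1 + wait time / compute time). The sketch below only encodes that formula; the wait/compute ratio is something you would have to measure for your own API calls, and the example numbers are hypothetical.

```java
final class PoolSizing {
    /**
     * N_threads = N_cpu * U_cpu * (1 + W/C), rounded up.
     *
     * @param cpus               number of cores, e.g. Runtime.getRuntime().availableProcessors()
     * @param targetUtilization  desired CPU utilization, between 0 and 1
     * @param waitToComputeRatio measured time spent waiting (e.g. on the network) divided by time computing
     */
    static int threads(int cpus, double targetUtilization, double waitToComputeRatio) {
        return (int) Math.ceil(cpus * targetUtilization * (1 + waitToComputeRatio));
    }
}
```

For example, with 8 cores, full utilization, and requests that spend 20 times longer waiting than computing, `threads(8, 1.0, 20.0)` gives 168. Heavily I/O-bound work pushes the number up quickly, which is one reason to consider an asynchronous style instead.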
For your particular case it may be more suited to use an asynchronous style of programming. In this case you could achieve a large throughput of API calls using a small number of threads - it may be even comparable to the number of available cores.
There are several available libraries to achieve this (Twitter is the king here).
Finagle - General purpose, supports multiple transport protocols
Scrooge - For thrift only
Async Http Client - Java-oriented async http client
And there are many others.
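Since Java 11, the JDK's own `java.net.http.HttpClient` also supports this style: `sendAsync` returns a `CompletableFuture` immediately, so many requests can be in flight while only a small executor handles completions. A minimal sketch; the URIs stand for whatever APIs you pull from, and the executor size is up to you.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.stream.Collectors;

final class AsyncFetcher {
    /** A client whose asynchronous completions run on the given (small) executor. */
    static HttpClient client(Executor executor) {
        return HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_1_1) // keeps the example simple
                .executor(executor)
                .build();
    }

    /** Starts every request before waiting on any of them, then collects bodies in input order. */
    static List<String> fetchAll(HttpClient client, List<URI> uris) {
        List<CompletableFuture<String>> inFlight = uris.stream()
                .map(uri -> client.sendAsync(HttpRequest.newBuilder(uri).GET().build(),
                                HttpResponse.BodyHandlers.ofString())
                        .thenApply(HttpResponse::body))
                .collect(Collectors.toList());
        // Only the calling thread blocks here; no thread is parked per outstanding request.
        return inFlight.stream().map(CompletableFuture::join).collect(Collectors.toList());
    }
}
```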
As part of a study I am doing, I am exploring the supposed simplicity of using languages like Scala & Clojure to achieve concurrency on the JVM.
By simplicity, I am hoping to prove that these languages provide easier concurrency constructs than what Java 7 provides.
Therefore, I am hoping to find some good references that explain the complexities of Java's concurrency model.
Outside of pointing me in the direction of Google (which I have already searched with limited success), I would appreciate if those in-the-know could provide me with some good references to get me started off in this area.
Thanks
Java 7 does not support lambda expressions. Creating an inline callback (e.g., for the completion of an asynchronous call) requires 5 lines of boilerplate for an anonymous type.
This strongly discourages people from using callbacks. This is probably why Java 7 still does not have an interface for a callback that takes a value (as opposed to Runnable and Callable), whereas C# has had one since 2005.
Therefore, the JDK (as of Java 7) does not have any real support for asynchronous operations.
The key to an asynchronous operation is the ability to kick off a long-running request and have it run a callback when it finishes, without consuming a thread for the duration of the request. With the standard Java 7 API, the only way to be notified is to have a separate thread call get() on a Future<V>, which ties up that thread until the result arrives. This limits the concurrency of an application using the standard API to the number of threads you can sanely support.
To solve this problem, Google's Guava framework for better Java code introduces a ListenableFuture<V> interface which does have completion callbacks.
Languages like Scala fix this problem by supporting lambda expressions (which compile to anonymous classes) and adding their own Promise / Future types.
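For comparison, here is the Java 7-era pattern described above next to what Java 8 later added (lambdas and `CompletableFuture`, which does support completion callbacks). `Callback` and the method names are hypothetical, and computing a string's length stands in for a long-running request.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;

final class CallbackStyles {
    /** Hypothetical value-taking callback (Java 7's JDK did not ship such an interface). */
    interface Callback<V> {
        void onComplete(V value);
    }

    /** Java 7 style: the work must be wrapped in an anonymous class, and so must the callback. */
    static void lengthAsync(final String input, final Callback<Integer> callback, ExecutorService pool) {
        pool.execute(new Runnable() {
            @Override
            public void run() {
                callback.onComplete(input.length());
            }
        });
    }

    /** Java 8+ style: callers chain callbacks onto the future instead of blocking in get(). */
    static CompletableFuture<Integer> lengthFuture(String input, ExecutorService pool) {
        return CompletableFuture.supplyAsync(input::length, pool);
    }
}
```

With the Java 8 version, `lengthFuture("hello", pool).thenApply(n -> n * 2)` registers a follow-up step without any thread waiting on the result.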
While higher-level languages make it easier to use multiple cores, what is often forgotten is why you want to use multiple cores in the first place: to make the program faster, e.g. to increase its throughput.
When you consider options which increase concurrency, you need to test whether these options actually improve performance in some way. (Because very often they don't)
e.g. STM (Software Transactional Memory) makes it easier to write multi-threaded applications without having to worry about concurrency issues. The problem is that for trivial examples, it would be faster to not use STM and only use one thread.
Using multiple threads adds complexity and makes your application more fragile, so there has to be a good reason to do it; otherwise, you should stick to the simplest solution possible.
For more discussion
http://vanillajava.blogspot.co.uk/2011/11/why-concurency-examples-are-confusing.html
I have a quadcore processor and I would really like to take advantage of all those cores when I'm running quick simulations. The problem is I'm only familiar with the small Linux cluster we have in the lab and I'm using Vista at home.
What sort of things do I want to look into for multicore programming with C or Java? What is the lingo that I want to google?
Thanks for the help.
The key word is "threading" - wouldn't work in a cluster, but it will be just fine in a single multicore machine (actually, on any kind of Windows, much better in general than spawning multiple processes -- Windows' processes are quite heavy-weight compared to Linux ones). Not quite that easy in C, very easy in Java -- for example, start here!
You want Threads and Actors
Good point ... you can't google for it unless you know some keywords.
C: google pthread, short for POSIX Threads; note that the native Win32 interface is not POSIX, see Creating Threads on MSDN.
Java: See class Thread
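For the Java route, a minimal sketch using plain `java.lang.Thread`: split the work into one chunk per thread, give each thread its own output slot so no locking is needed, and `join()` them all. Summing an array stands in for a hypothetical simulation step; in real code an `ExecutorService` or parallel streams would usually be more convenient.

```java
final class ThreadSplit {
    /** Sums data using one plain Thread per chunk; each thread writes only its own slot. */
    static long parallelSum(long[] data, int threads) throws InterruptedException {
        long[] partial = new long[threads];
        Thread[] workers = new Thread[threads];
        int chunk = (data.length + threads - 1) / threads; // ceiling division
        for (int t = 0; t < threads; t++) {
            final int id = t;
            final int from = Math.min(data.length, t * chunk);
            final int to = Math.min(data.length, from + chunk);
            workers[t] = new Thread(() -> {
                long sum = 0;
                for (int i = from; i < to; i++) {
                    sum += data[i];
                }
                partial[id] = sum; // no other thread writes this slot
            });
            workers[t].start();
        }
        long total = 0;
        for (int t = 0; t < threads; t++) {
            workers[t].join(); // join() also makes the worker's write to partial[t] visible here
            total += partial[t];
        }
        return total;
    }
}
```

A natural thread count for CPU-bound work like this is `Runtime.getRuntime().availableProcessors()`.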
Finally, you should read up a bit on functional programming, actor concurrency, and immutable objects. It turns out that managing concurrency in plain old shared memory is quite difficult, but message passing and functional programming can allow you to use styles that are inherently much safer and avoid concurrency problems. Java does allow you to do everything the hard way, where data is mutable shared memory and you desperately try to manually interlock shared state structures. But you can also use an advanced style from within java. Perhaps start with this JavaWorld article: Actors on the JVM.
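A minimal illustration of the message-passing style in plain Java (a hand-rolled, hypothetical actor, not an actor library): the running total is confined to one thread, and other threads only ever put messages into its `BlockingQueue` mailbox, so the shared state needs no locks.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical actor-style counter: its state is touched only by the thread running run(). */
final class CounterActor implements Runnable {
    private static final Object STOP = new Object(); // poison-pill message
    private final BlockingQueue<Object> mailbox = new LinkedBlockingQueue<>();
    private final CompletableFuture<Long> result = new CompletableFuture<>();

    /** Any thread may call this; it only enqueues a message and never touches the total. */
    void send(int amount) {
        mailbox.add(amount);
    }

    /** Asks the actor to finish; the returned future completes with the final total. */
    CompletableFuture<Long> stop() {
        mailbox.add(STOP);
        return result;
    }

    @Override
    public void run() {
        long total = 0; // confined to this thread, so no synchronization is needed
        try {
            while (true) {
                Object message = mailbox.take();
                if (message == STOP) {
                    break;
                }
                total += (Integer) message;
            }
            result.complete(total);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            result.completeExceptionally(e);
        }
    }
}
```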
Check out this book: Java Concurrency in Practice
I think you should consider Clojure, too. It runs on the JVM and has good Java interoperability. As a Lisp, it's different from what you're used to with C and Java, so it might not be your cup of tea, but it's worth taking a look at the issues addressed by Clojure anyway, since the concepts are valuable regardless of what language you use. Check out this video, and then, if you're so inclined, the clojure site, which has links to some other good screencasts more specifically about Clojure in the upper right.
It depends on what your preferred language is to get the job done.
Besides the threading solutions, you can also consider
MPI as a possibility from Java and C --- as well as from Python or R or whatever you like.
DeinoMPI appears to be popular on Windows, and OpenMPI just started with support for Windows too in the current release 1.3.3.
A lot of people have talked about threading, which is one approach, but consider another way of doing it. What if you had several JVM's started up, connected to the network, and waiting for work to come their way? How would you program an application so that it could leverage all of those JVMs without knowing whether or not they are on the same CPU?
On a quadcore machine, you should be able to run 12 or more JVMs to process work. And if you approach the problem from this angle, scaling up to multiple computers is fairly simple, although you do have to consider higher network latencies when your communication is across a real network.
Here is a good source of info on threading in C#.
You need to create multithreaded programs. Java supports multithreading out of the box (though older JVMs ran all threads on one core). For C, you'll need platform-specific code to create and manipulate threads (pthread* for Linux, CreateThread and company for Windows). Alternatively, you might want to do your threading from C++, where there are a fair number of libraries (e.g. Boost.Thread) to make life a bit simpler and allow portable code.
If you want code that'll be portable across a single machine with multiple cores AND a cluster, you might look into MPI. It's really intended for the cluster situation, but has been ported to work on a single machine with multiple processors or multiple cores -- though it's not as efficient as code written specifically for a single machine.
So, that's a very broad question. You can experiment with multithreaded programming using many different programming languages including C or Java. If you wanted me to pick one for you, then I'd pick C. :)
You want to look into Windows threads and POSIX threads (or multithreading in Java, if that's your language). You might want to find some problems to experiment with. I'd suggest trying out matrix multiplication: start with a sequential version and then try to improve the time using threads.
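A sketch of that exercise in Java: the sequential version first, then a parallel one. Rows of the result are independent, so the parallel version simply distributes rows across cores; it uses a parallel `IntStream` (Java 8+) rather than managing threads by hand.

```java
import java.util.stream.IntStream;

final class MatrixMultiply {
    /** Plain version: c = a * b, with a of size n x k and b of size k x m. */
    static double[][] sequential(double[][] a, double[][] b) {
        double[][] c = new double[a.length][b[0].length];
        for (int i = 0; i < a.length; i++) {
            multiplyRow(a, b, c, i);
        }
        return c;
    }

    /** Same work, but the independent rows of c are spread across cores. */
    static double[][] parallel(double[][] a, double[][] b) {
        double[][] c = new double[a.length][b[0].length];
        IntStream.range(0, a.length).parallel().forEach(i -> multiplyRow(a, b, c, i));
        return c;
    }

    /** Computes row i of c; each row is written by exactly one task, so no locking is needed. */
    private static void multiplyRow(double[][] a, double[][] b, double[][] c, int i) {
        for (int p = 0; p < b.length; p++) {
            double aip = a[i][p];
            for (int j = 0; j < b[0].length; j++) {
                c[i][j] += aip * b[p][j]; // i-p-j order walks b row by row (cache-friendly)
            }
        }
    }
}
```

Timing both versions on larger matrices (a few hundred rows) shows whether the extra cores actually pay off; for tiny matrices the parallel overhead usually dominates.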
Also, OpenMP is available for Windows and offers a very different approach to multithreaded programming.
Even though you asked specifically for C or Java, Erlang isn't a bad choice of language if this is just a learning exercise.
It allows you to do multiprocess style programming very very easily and it has a large set of libraries that let you dive in at just about any level you like.
It was built for distributed programming in a very pragmatic way. If you are comfortable with java, the transition shouldn't be too difficult.
If you are interested, I would recommend the book, "Programming Erlang" by Joe Armstrong.
(As a note: there are other languages designed to run in highly parallel environments, like Haskell. Erlang just tends to be more pragmatic than languages like Haskell, which are rooted more in theory.)
If you want to do easy threading, such as parallel loops, I recommend checking out .NET 4.0 Beta (C# in VS2010 Beta).
The book chapter Joe linked to is a very good one I use myself and highly recommend, but doesn't cover the new parallel extensions to the .NET framework.
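Yes, many threads; but if the threads keep accessing the same memory location, they end up taking turns, so effectively only one thread makes progress at a time.
To benefit from multiple cores, the threads need to work on separate memory rather than all contending for the same location.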
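The effect described here can be seen with a shared `AtomicLong`: every increment from every thread targets the same memory location, so its cache line bounces between cores. `LongAdder` (Java 8+) avoids much of that contention by letting contending threads update separate cells and combining them only in `sum()`. Both methods below produce the same count; timing them on a multicore machine shows the difference, though the size of the gap depends on the hardware.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;
import java.util.stream.IntStream;

final class Contention {
    /** All tasks increment one shared location, so their updates serialize on it. */
    static long sharedCounter(int tasks, int perTask) {
        AtomicLong counter = new AtomicLong();
        IntStream.range(0, tasks).parallel()
                 .forEach(t -> { for (int i = 0; i < perTask; i++) counter.incrementAndGet(); });
        return counter.get();
    }

    /** LongAdder spreads contended updates over several cells and sums them at the end. */
    static long stripedCounter(int tasks, int perTask) {
        LongAdder counter = new LongAdder();
        IntStream.range(0, tasks).parallel()
                 .forEach(t -> { for (int i = 0; i < perTask; i++) counter.increment(); });
        return counter.sum();
    }
}
```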
By far the easiest way to do multicore programming on Windows is to use .NET 4 and C# or F#. Here is a simple example where a parallel program (from the shootout) is 7× shorter in F# than Java and just as fast.
.NET 4 provides a lot of new infrastructure for parallel programming and it is really easy to use.
Your phrase "take advantage" suggests to me something more than just any multithreading. Simulations, in my book, are computation-intensive, and in that respect the most efficient language is C. Some would say assembly, but there are very few x86 assembly programmers who can beat a modern C compiler.
For the Windows NT engine (NT4, 2000, XP, Vista and 7), the mechanisms you should look into are threads, critical sections and I/O completion ports (IOCP). Threads are nice, but you need to be able to synchronize them among themselves and with I/O, which is where critical sections and IOCPs come in. To make sure you're wringing every last bit of performance out of your code, you need to profile, analyze, and experiment/restructure. Lots of fun, but very time-consuming.
Multiple threads can exist in a single process. The threads that belong to the same process share the same memory area (can read from and write to the very same variables, and can interfere with one another). On the contrary, different processes live in different memory areas, and each of them has its own variables. In order to communicate, processes have to use other channels (files, pipes or sockets).
If you want to parallelize a computation, you're probably going to need multithreading, because you probably want the threads to cooperate on the same memory.
Speaking about performance, threads are faster to create and manage than processes (because the OS doesn't need to allocate a whole new virtual memory area), and inter-thread communication is usually faster than inter-process communication. But threads are harder to program. Threads can interfere with one another, and can write to each other's memory, but the way this happens is not always obvious (due to several factors, mainly instruction reordering and memory caching), and so you are going to need synchronization primitives to control access to your variables.
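A small example of the visibility problem mentioned above: one thread clears a flag that another thread polls. With `volatile`, the worker is guaranteed to see the update; without it, the JIT is allowed to hoist the read out of the loop, and the worker may spin forever. `StopFlag` is a hypothetical name for this minimal sketch.

```java
final class StopFlag {
    // volatile: the write in stop() happens-before the worker's subsequent reads of the flag.
    private volatile boolean running = true;

    /** Starts a thread that spins until stop() is called. */
    Thread startWorker() {
        Thread worker = new Thread(() -> {
            while (running) {
                Thread.onSpinWait(); // stand-in for a unit of real work
            }
        });
        worker.start();
        return worker;
    }

    void stop() {
        running = false;
    }
}
```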
Taken from this answer.