Software used in grid computing to discover clients

In grid computing, what is the de facto software practice used by a server to discover clients and get information about them? For example, the name of the client, how much memory is available, is the client currently performing a task (and how much has it completed), etc. Or is it the other way around? Do the clients occasionally report that information to the server?
Would this be done via RPC? Or a messaging protocol (AMQP, STOMP)?
I'm also wondering whether the same method is used to send clients various jobs/tasks to complete.
I'm looking to find a Java friendly solution, if possible.
Thanks!

There is no de facto standard for server/node/client discovery in grid computing, at least none that is universally used. Many implementations use ad hoc discovery based on UDP multicast; others use registry-based discovery, as in service-oriented architectures. There are plenty of solutions but no universal standard.
Some Java-friendly implementations you might want to look at: Unicore, JPPF, HTCondor, GridGain, Hadoop, Globus, Hazelcast.
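To make the "clients report to the server" variant concrete, here is a minimal sketch of a registry that keeps each node's latest heartbeat and treats nodes that have gone quiet as dead. It is not taken from any of the frameworks listed above; the class name, the fields of NodeStatus and the timeout are all assumptions for illustration, and the transport that delivers the heartbeats (UDP multicast, a message broker, RPC) is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class NodeRegistrySketch {
    /** What a node reports about itself; every field here is a made-up example. */
    static final class NodeStatus {
        final String name;
        final long freeMemoryBytes;
        final boolean busy;
        final int percentDone;
        final long reportedAtMillis;

        NodeStatus(String name, long freeMemoryBytes, boolean busy,
                   int percentDone, long reportedAtMillis) {
            this.name = name;
            this.freeMemoryBytes = freeMemoryBytes;
            this.busy = busy;
            this.percentDone = percentDone;
            this.reportedAtMillis = reportedAtMillis;
        }
    }

    private final Map<String, NodeStatus> latest = new ConcurrentHashMap<>();
    private final long timeoutMillis;

    NodeRegistrySketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Called whenever a heartbeat arrives; the newest report replaces the old one. */
    void heartbeat(NodeStatus status) {
        latest.put(status.name, status);
    }

    /** Nodes whose last heartbeat is recent enough to count as alive. */
    List<NodeStatus> liveNodes(long nowMillis) {
        List<NodeStatus> live = new ArrayList<>();
        for (NodeStatus s : latest.values()) {
            if (nowMillis - s.reportedAtMillis <= timeoutMillis) {
                live.add(s);
            }
        }
        return live;
    }

    /** Live nodes that are not busy, i.e. candidates for the next job. */
    List<NodeStatus> idleNodes(long nowMillis) {
        List<NodeStatus> idle = new ArrayList<>();
        for (NodeStatus s : liveNodes(nowMillis)) {
            if (!s.busy) {
                idle.add(s);
            }
        }
        return idle;
    }
}
```

The server never has to probe anything: a node that stops sending heartbeats simply ages out of liveNodes.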

ZooKeeper is something to consider, perhaps combined with JMS messaging if your resources are distributed far and wide. I use ZooKeeper with a SystemInfo service running on each node. The service registers the system's information (memory, number of CPUs, disk space and such) as a znode under /Resources in ZooKeeper.
Then whatever service needs a resource can query /Resources if looking for a resource to do something and check its specifications before allocating work.
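As a rough illustration of what such a SystemInfo service might collect, here is a JDK-only sketch that builds the payload a node could store in its znode. The key=value format, the class name and the helper names are assumptions, not the poster's actual service, and the ZooKeeper write itself is left out (the byte array would be passed as the data of a create call, typically for an ephemeral znode so the entry disappears when the node dies).

```java
import java.io.File;
import java.nio.charset.StandardCharsets;

class SystemInfoSketch {
    /** Collects a few facts about this machine as "key=value;key=value" text (format is an assumption). */
    static String describeThisNode() {
        Runtime rt = Runtime.getRuntime();
        long usableDiskBytes = new File(".").getUsableSpace();
        return "cpus=" + rt.availableProcessors()
                + ";maxMemoryBytes=" + rt.maxMemory()
                + ";usableDiskBytes=" + usableDiskBytes;
    }

    /** Encodes the description as the bytes that would be stored in the znode. */
    static byte[] toZnodeData(String description) {
        return description.getBytes(StandardCharsets.UTF_8);
    }

    /** Reads one numeric field back out of znode data; returns -1 if the key is absent. */
    static long readField(byte[] znodeData, String key) {
        String text = new String(znodeData, StandardCharsets.UTF_8);
        for (String pair : text.split(";")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals(key)) {
                return Long.parseLong(pair.substring(eq + 1));
            }
        }
        return -1L;
    }
}
```

A scheduler that lists the children of /Resources could then call readField on each child's data to check, say, the CPU count before allocating work.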
The Java API for ZooKeeper is pretty good. I find it easy to work with.

Related

Alternatives to RMI for IPC?

I have 2 processes that need to communicate both on the same PC and across different PCs. In the local case the communication is between different processes, e.g. Process A and Process B.
In the remote case it will be among 2 instances of Process A running in different PCs.
I will create them from scratch and I am wondering what is the best approach. I am aware of RMI and sockets but I was wondering for my case as described, and taking also into account that the messages exchanged are small and the number of APIs really small, if there is a standard approach/library for this.
Any suggestions are highly welcome.
Update after #EJP comments:
My interest is 1)to implement the requirement for communication in a light manner since the API exposed will be really small and the messages as well 2)use and learn a new popular framework if possible (I already know RMI and sockets)

If you are just looking for messaging frameworks, there are a number available, such as:
RabbitMQ - http://www.rabbitmq.com/
ZeroC Ice - http://www.zeroc.com/ice.html
AMQP - http://www.amqp.org
OpenSplice DDS - http://www.prismtech.com/opensplice
But when you use a third-party framework, you are adding an additional dependency to your application. If it is something very simple, as in your case, perhaps writing a TCP client/server would be sufficient for a client/server paradigm; if you are looking for a publisher/subscriber paradigm, you can look into using UDP multicast. You just need your data class to implement Serializable if you want to be able to marshal and unmarshal your data to a buffer and send it over the network using the standard Java socket API.
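A minimal sketch of the marshalling step described above, using only standard Java serialization. StatusMessage and its fields are invented for the example; the resulting byte array is what you would write to a TCP socket or put into a UDP datagram.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class MarshalSketch {
    /** Example message type (hypothetical); implementing Serializable is all it needs. */
    static class StatusMessage implements Serializable {
        private static final long serialVersionUID = 1L;
        final String sender;
        final int code;

        StatusMessage(String sender, int code) {
            this.sender = sender;
            this.code = code;
        }
    }

    /** Turns a serializable object into bytes that can be sent over a socket. */
    static byte[] marshal(Serializable message) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(message);
        } catch (IOException e) {
            throw new IllegalStateException("could not serialize message", e);
        }
        return buffer.toByteArray();
    }

    /** Rebuilds the object from bytes received from the network. */
    static Object unmarshal(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("could not deserialize message", e);
        }
    }
}
```

Both ends need the same class on their classpath, and deserializing bytes from an untrusted peer is risky, so this suits processes you control on both sides.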

I strongly suggest having a look at Thrift. Of all the technologies I've used (web services, RMI, XML-RPC and CORBA come to mind) it is currently my favourite. Essentially the steps involved are:
Download the Thrift compiler.
Add the Maven dependency (make sure it is the same version as the compiler!) I currently use 0.8.0.
Write your Thrift IDL (incredibly easy, google for it as there are plenty of examples).
Compile it for Java.
Write your server/client.
In general, you can whip together a server and a client in about 30 lines of code. In terms of speed and reliability it has never failed me before.

You might have a look at Versile Java (full disclosure: I am one of the developers), it satisfies at least your criteria #1. From the API documentation, here are some examples of writing remote-enabled objects, running a service, and connecting to a service.

If you want to learn something new then I'd look at OpenSplice. The reason is pretty simple: among the technologies suggested above, it is the only one that provides you with data-centric abstractions.
The cool thing about OpenSplice is that it gives you the abstraction of a Global Data Space, yet the implementation of this global data space is fully distributed and very high performance.
Take a look at some of the slides available at http://www.slideshare.net/angelo.corsaro and I am sure you'll fall in love with the technology.
Finally OpenSplice is Open Source.
Happy Hacking.
A+

JMX is a good alternative.
Example:
http://www.javalobby.org/java/forums/t49130.html
IMB JMX Example
http://alvinalexander.com/blog/post/java/source-code-java-jmx-hello-world-application

Are there any frameworks to synchronize data generated on one peer with all other peers in an unreliable network?

We are developing a system with the following requirements.
There are N systems that each generate data that is unique to themselves
Each system requires the data from every other system to perform its end goal
These systems are talking to each other on an unreliable network.
It is expected that some systems will be completely unavailable for extended periods of time (but they may be in contact with some of their peers who are in contact with the rest of the network)
To put it another way, each system needs to replicate its data to N peer systems. Ideally, this will be done in an intelligent manner.
I have considered looking into database synchronization frameworks, but I am concerned that it is overkill for this problem. I don't think there is any possibility for row conflicts because each system's data is entirely independent of other systems.
The question is, do you know of any frameworks that could help solve this problem? Or possibly a way to phrase this issue that might help me down a path to discover a solution.
Finally, ideally, this framework would be in C++ (and potentially, java).

SymmetricDS.org
The solution you are looking for sounds a lot like the open source software SymmetricDS.
"SymmetricDS is an asynchronous data replication software package that supports multiple subscribers and bi-directional synchronization. It uses web and database technologies to replicate tables between relational databases, in near real time if desired. The software was designed to scale for a large number of databases, work across low-bandwidth connections, and withstand periods of network outage."
-SymmetricDS.org
SymmetricDS was designed to be used as a Java library as well as a standalone application. Used with a lightweight database like H2, you could avoid your overkill scenario. H2 can optionally be run embedded within an application and can store data in memory or on disk.
Disclaimer: I recently started working for JumpMind, the company that develops this software.

0mq. It is a C framework with a C++ interface. It notably supports EPGM (reliable multicast over UDP) and N-to-N connections. Though, there will be work to do for your special use case.

Interesting problem. Many of the issues you've described lend themselves particularly well to the BitTorrent protocol.

It seems you want to implement a reliable broadcast for your peer communication. Check out the library J.N. provided, and if it is not sufficient (or you want to modify it) there are some algorithms in this book.
Check Causal Order Broadcast and Total Order Broadcast.
My teacher at university implemented such a library; I will update when I find it.

What you are looking for is called a "distributed database", and they are used extensively, even in production systems; http://www.project-voldemort.com/ for example, is used by LinkedIn.
P2P networks built on DHTs such as Kademlia are themselves key->value databases, and there are also P2P databases in which new nodes are added automatically and the failure of any single node is well tolerated, since the resilience and scalability of those networks is proven.
So just search your preferred search engine for "p2p database" and "distributed database", and you will find a lot of implementations.

When do enterprise grade queuing/messaging systems supersede simpler workflow management systems?

Hi guys: I've seen "simplistic" workflow management tricks (like rotating file queues, controller threads, etc.) work in a wide variety of producer/consumer contexts, where files are simply renamed, deleted, and created in a systematic manner, or where a "main" thread calls and coordinates workers.
In contrast, I've also "played" with JMS in some toy applications, and I can see how it might be used to coordinate a complex application workflow.
I was wondering: What do messaging services like JMS offer over standard producer/consumer workflows (of course, if I'm missing something here, or have the wrong idea of when/why JMS is used, feel free to correct me)?
In particular, what type of applications require enterprise-grade messaging frameworks?

What do messaging services like JMS offer over standard producer/consumer workflows?
Scalability, availability, transparency, manageability. In point-to-point communication sender is bound to the receiver and vice versa. You, as the application developer, are responsible for thinking what to do when traffic increases and implement the necessary changes. Your application must be aware of the environment in which it works and must be changed every time the environment changes. You are forced to reinvent the wheel while solving typical messaging problems, for example, temporary congestion (what to do when the consumer can't keep the pace with the producer for a while?). You have to provide your own means of monitoring the current situation, if something does not work as expected. The list goes on...
Now imagine you have to wire 10 different systems this way. Obviously, you'll need to come up with a fairly universal solution so that you don't implement each connection logic from scratch — that would be terribly expensive to produce, not to mention maintaining it. A JMS message broker is one of such possible general solutions.
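To illustrate the congestion point: in a hand-rolled producer/consumer setup, the decision about what happens when the consumer falls behind is yours to code. A minimal sketch with a bounded in-memory queue follows; the class name, capacity and timeout are assumptions, and a message broker would typically take over this buffering (and persistence and monitoring) for you.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

class BackpressureSketch {
    private final BlockingQueue<String> queue;

    BackpressureSketch(int capacity) {
        // Bounded on purpose: an unbounded queue only hides congestion until memory runs out.
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /**
     * Producer side. Waits up to waitMillis for room in the queue and returns false
     * if the consumer is still too far behind, so the caller must decide what to do
     * (drop, retry later, spill to disk, alert someone).
     */
    boolean publish(String message, long waitMillis) {
        try {
            return queue.offer(message, waitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Consumer side. Returns the next message, or null if none is waiting. */
    String poll() {
        return queue.poll();
    }

    /** A crude monitoring hook: how many messages are waiting right now. */
    int backlog() {
        return queue.size();
    }
}
```

Every line of policy here (capacity, how long to wait, what a false return means) is something you would otherwise have to reinvent for each connection between systems.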
In particular, what type of applications require enterprise-grade messaging frameworks?
Complicated ones, in short. I work for a company that has a network of about 70 systems, some of them 30 years old. New systems are added to the network as time passes; the old systems don't need to be changed, and the new systems don't need to be aware of ancient data exchange protocols. A centralized cluster of message brokers can translate a JMS message into some mainframe message format I have no idea about, and translate the answer back the same way.

Communication between local JVMs

My question: What approach could/should I take to communicate between two or more JVM instances that are running locally?
Some description of the problem:
I am developing a system for a project that requires separate JVM instances to isolate certain tasks from each other entirely.
While it runs, the 'parent' JVM will create 'child' JVMs that it will expect to execute and then return results to it (in the form of relatively simple POJO classes, or perhaps structured XML data). These results should not be transferred using the SysErr/SysOut/SysIn pipes, as the child may already use these as part of its running.
If a child JVM does not respond with results within a certain time, the parent JVM should be able to signal to the child to cease processing, or to kill the child process. Otherwise, the child JVM should exit normally at the end of completing its task.
Research so far:
I am aware there are a number of technologies that may be of use, e.g.:
Using Java's RMI library
Using sockets to transfer objects
Using distribution libraries such as Cajo, Hessian
...but am interested in hearing what approaches others may consider before pursuing one of these options, or any others.
Thanks for any help or advice on this!
Edits:
Quantity of data to transfer- relatively small, it will mostly be just a handful of POJOs containing strings that will represent the result of the child executing. If any solution would be inefficient on larger amounts of information, this is unlikely to be a problem in my system. The amount being transferred should be pretty static and so this does not have to be scalable.
Latency of transfer- not a critical concern in this case, although if any 'polling' of results is needed this should be able to be fairly frequent without significant overheads, so I can maintain a responsive GUI on top of this at a later time (e.g. progress bar)

Not directly an answer to your question, but a suggestion of an alternative.
Have you considered OSGI?
It lets you run java projects in complete isolation from each other, within the SAME jvm.
The beauty of it is that communication between projects is very easy with services (see Core Specifications PDF page 123). This way there is no "serialization" of any sort being done, as the data and calls are all in the same JVM.
Furthermore all your requirements of quality of service (response time etc...) go away - you only have to worry about whether the service is UP or DOWN at the time you want to use it. And for that you have a really nice specification that does that for you called Declarative Services (See Enterprise Spec PDF page 141)
Sorry for the off-topic answer, but I thought some other people might consider this as an alternative.
Update
To answer your question about security, I have never considered such a scenario. I don't believe there is a way to enforce "memory" usage within OSGI.
However there is a way of communicating outside of JVM between different OSGI runtimes. It is called Remote Services (see Enterprise Spec PDF, page 7). They also have nice discussion there of the factors to take into consideration when doing something like that (see 13.1 Fallacies).
Folks at Apache Felix (an implementation of OSGI) have, I think, an implementation of this with iPOJO (their wrapper to make using services easier), called Distributed Services with iPOJO. I've never used it, so ignore me if I am wrong.

I'd use KryoNet with local sockets since it specialises heavily in serialisation and is quite lightweight (you also get Remote Method Invocation! I'm using it right now), but disable the socket disconnection timeout.
RMI basically works on the principle that you have a remote type and that the remote type implements an interface. This interface is shared. On your local machine, you bind the interface via the RMI library to code 'injected' in-memory from the RMI library, the result being that you have something that satisfies the interface but is able to communicate with the remote object.

Akka is another option, as are other Java actor frameworks; it provides communication and other goodies derived from the actor model.

If you can't use stdin/stdout, then I'd go with sockets. You need some sort of serialization layer on top of the sockets (as you would with stdin/stdout), and RMI is a very easy to use and pretty effective such layer.
If you used RMI and found the performance wasn't good enough, I'd switch to some more efficient serializer; there are plenty of options.
I wouldn't go anywhere near web services or XML. That seems like a complete waste of time, likely to take more effort and deliver less performance than RMI.
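For comparison, here is roughly what the plain-sockets route looks like with Java serialization as the layer on top (not RMI). The parent listens on a loopback port and waits a bounded time for one result; in this sketch the "child" is just a thread standing in for a separate JVM, and TaskResult, the method names and the timeouts are all made up for illustration.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

class LoopbackResultSketch {
    /** Result POJO the child hands back (hypothetical). */
    static class TaskResult implements Serializable {
        private static final long serialVersionUID = 1L;
        final String summary;
        TaskResult(String summary) { this.summary = summary; }
    }

    /** Parent side: listen on an ephemeral port bound to loopback only. */
    static ServerSocket openLocalServer() {
        try {
            return new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Parent side: wait up to timeoutMillis for one result; null means the child was too slow. */
    static TaskResult awaitResult(ServerSocket server, int timeoutMillis) {
        try {
            server.setSoTimeout(timeoutMillis);          // bounds accept()
            try (Socket child = server.accept()) {
                child.setSoTimeout(timeoutMillis);       // bounds each read
                try (ObjectInputStream in = new ObjectInputStream(child.getInputStream())) {
                    return (TaskResult) in.readObject();
                }
            }
        } catch (SocketTimeoutException e) {
            return null;                                 // caller may now signal or kill the child
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Child side: connect to the parent's port and send the result. */
    static void sendResult(int port, TaskResult result) {
        try (Socket socket = new Socket(InetAddress.getLoopbackAddress(), port);
             ObjectOutputStream out = new ObjectOutputStream(socket.getOutputStream())) {
            out.writeObject(result);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Runs both sides in one JVM, with a thread playing the child. */
    static TaskResult demo() {
        try (ServerSocket server = openLocalServer()) {
            int port = server.getLocalPort();
            new Thread(() -> sendResult(port, new TaskResult("done"))).start();
            return awaitResult(server, 2000);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In a real setup the parent would pass the port to the child on its command line, and a null from awaitResult is the cue to stop the child, for instance via the Process handle the parent kept when it launched it.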

Not many people seem to like RMI any longer.
Options:
Web Services. e.g. http://cxf.apache.org
JMX. Now, this is really a means of using RMI under the table, but it would work.
Other IPC protocols; you cited Hessian
Roll-your-own using sockets, or even shared memory. (Open a mapped file in the parent, open it again in the child. You'd still need something for synchronization.)
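A small sketch of the mapped-file idea from the last option, assuming both sides agree on the file path and layout. Here both mappings are opened in one JVM to keep the example short; the class and method names are made up, and, as noted above, real use needs some synchronization on top. How promptly one mapping sees another's writes is platform dependent, although shared mappings of the same file normally behave coherently on mainstream operating systems.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MappedFileSketch {
    /** Maps the first `size` bytes of the file read-write, creating or growing the file if needed. */
    static MappedByteBuffer map(Path file, int size) {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // The mapping stays valid after the channel is closed.
            return channel.map(FileChannel.MapMode.READ_WRITE, 0, size);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes through one mapping and reads through a second mapping of the same file. */
    static int roundTrip(int value) {
        Path file;
        try {
            file = Files.createTempFile("shared-region", ".bin");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        MappedByteBuffer parentView = map(file, 8);   // what the parent would open
        MappedByteBuffer childView = map(file, 8);    // what the child would open
        parentView.putInt(0, value);                  // absolute write at offset 0
        return childView.getInt(0);                   // absolute read at offset 0
    }
}
```

With two real processes, each would call map on the same path, and something else (a socket message, a file lock, a polling flag in the buffer) would tell the reader when the data is ready.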
Examples of note are Apache Ant (which forks all sorts of JVMs for one purpose or another), Apache Maven, and the open source variant of the Tanukisoft daemonization kit.
Personally, I'm very facile with web services, so that's the hammer with which I tend to turn things into nails. A typical JAX-WS+JAX-B or JAX-RS+JAX-B service is very little code with CXF, and manages all the data serialization and deserialization for me.

It was mentioned above, but I wanted to expand a bit on the JMX suggestion. We are actually doing pretty much exactly what you are planning to do (from what I can glean from your various comments). We landed on JMX for a variety of reasons, a few of which I'll mention here.
For one thing, JMX is all about management, so in general it is a perfect fit for what you want to do (especially if you already plan on having JMX services for other management tasks). Any effort you put into JMX interfaces will do double duty as APIs you can call using Java management tools like jvisualvm.
This leads to my next point, which is the most relevant to what you want. The Attach API in JDK 6 and above is very sweet. It enables you to dynamically discover and communicate with running JVMs. This allows, for example, your "controller" process to crash, restart, and re-find all the existing worker processes. This is the makings of a very robust system.
It was mentioned above that JMX is basically RMI under the hood; however, unlike using RMI directly, you don't need to manage all the connection details (e.g. dealing with unique ports, discoverability, etc.).
The Attach API is a bit of a hidden gem in the JDK, as it isn't very well documented. When I was poking into this stuff initially, I didn't know the name of the API, so figuring out how the "magic" in jvisualvm and jconsole worked was very difficult. Finally, I came across an article like this one, which shows how to actually use the Attach API dynamically in your own program.
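To give a flavour of the JMX side (not the Attach API part), here is a minimal sketch of a worker exposing its progress and a cancel operation as a standard MBean on the platform MBean server. The interface, the attribute names and the example.workers domain are invented for the example; in the setup described above the controller would reach the same MBean from another JVM through a JMX connector rather than in-process.

```java
import java.lang.management.ManagementFactory;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

class JmxWorkerSketch {
    /** Management interface; for a standard MBean its name must be the class name plus "MBean". */
    public interface WorkerStatusMBean {
        int getProgressPercent();   // exposed as attribute "ProgressPercent"
        boolean isCancelled();      // exposed as attribute "Cancelled"
        void cancel();              // exposed as operation "cancel"
    }

    public static class WorkerStatus implements WorkerStatusMBean {
        private volatile int progressPercent;
        private volatile boolean cancelled;

        public void setProgress(int percent) { progressPercent = percent; }
        @Override public int getProgressPercent() { return progressPercent; }
        @Override public boolean isCancelled() { return cancelled; }
        @Override public void cancel() { cancelled = true; }
    }

    private static MBeanServer server() {
        return ManagementFactory.getPlatformMBeanServer();
    }

    /** Worker side: publish the status object under a name the controller can look up. */
    static ObjectName register(WorkerStatus status, String workerName) {
        try {
            ObjectName name = new ObjectName("example.workers:type=WorkerStatus,name=" + workerName);
            server().registerMBean(status, name);
            return name;
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Controller side: read an attribute by name, as jvisualvm or jconsole would. */
    static int readProgress(ObjectName name) {
        try {
            return (Integer) server().getAttribute(name, "ProgressPercent");
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Controller side: ask the worker to stop. */
    static void requestCancel(ObjectName name) {
        try {
            server().invoke(name, "cancel", new Object[0], new String[0]);
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The worker's own loop would check isCancelled() between steps, and the same attributes show up unchanged in any JMX console, which is the double duty mentioned above.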

Although it's designed for potentially remote communication between JVMs, I think you'll find that Netty works extremely well between local JVM instances as well.
It's probably the most performant / robust / widely supported library of its type for Java.

A lot is discussed above. But be it sockets, RMI, or JMS, there is a lot of dirty work involved.
I would rather advise Akka. It is an actor-based model in which actors communicate with each other using messages.
The beauty is, the actors can be in the same JVM or another (with very little config) and Akka takes care of the rest for you. I haven't seen a cleaner way of doing this :)

Try out jGroups if the data to be communicated is not huge.

How about http://code.google.com/p/protobuf/
It is lightweight.

As you mentioned, you can obviously send the objects over the network, but that is costly, not to mention the cost of starting up a separate JVM.
Another approach if you just want to separate your different worlds inside one JVM is to load the classes with different classloaders. ClassA#CL1!=ClassA#CL2 if they are loaded by CL1 and CL2 as sibling classloaders.
To enable communications between classA#CL1 and classA#CL2 you could have three classloaders.
CL1 that loads process1
CL2 that loads process2 (same classes as in CL1)
CL3 that loads communication classes (POJOs and Service).
Now you let CL3 be the parent classloader of CL1 and CL2.
In classes loaded by CL3 you can have lightweight send/receive functionality (send(Pojo)/receive(Pojo)) that passes the POJOs between classes in CL1 and classes in CL2.
In CL3 you expose a static service that lets implementations from CL1 and CL2 register to send and receive the POJOs.

Java HA framework

I am writing a small proxy application which should be redundant, e.g. primary proxy will be running on one server and the redundant one will run on a separate server. Is there a simple high-availability framework which I can use to implement this redundancy? For example, this HA framework would send pings between instances and raise some sort of exception or notification on the other instance when the first one goes down.

Building such systems has been my routine job in recent years. I have found JGroups a very usable tool for receiving and handling this kind of grouping event. This is the case if you want to build your own HA infrastructure. I don't know, but maybe in your case just a simple reverse proxy such as HAProxy can be enough.

If you want HA without hassle, just use some load balancer with HA capability e.g. Ultramonkey, LVS with keepalived etc.
In an HA configuration, you'd typically want to use a virtual IP, so even if you had this ping/notify functionality as a framework, you'd still have work to do (start responding to requests to the virtual IP once the other instance has failed). So unless you are looking for a learning opportunity, I'd advise using middleware instead of coding this yourself using frameworks.
There are a number of health checks that you can configure for this kind of middleware. A simple health check might, for example, periodically fire a GET request at your app and look for a specific string (e.g. "XXX running.") in the response to make sure your app is running fine.
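For a sense of what such a health check does, here is a rough Java equivalent of "GET the page and look for a marker string", using the JDK's built-in HTTP client (Java 11 or later). The URL, marker and timeout are whatever you choose; the class and method names are invented, and the load balancers mentioned above configure this declaratively rather than with code like this.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

class HealthCheckSketch {
    /** The actual rule: HTTP 200 and the expected marker text somewhere in the body. */
    static boolean looksHealthy(int statusCode, String body, String marker) {
        return statusCode == 200 && body != null && body.contains(marker);
    }

    /** Fires one GET at the app; any connection error or timeout counts as unhealthy. */
    static boolean probe(String url, String marker, Duration timeout) {
        HttpClient client = HttpClient.newBuilder().connectTimeout(timeout).build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .timeout(timeout)
                .GET()
                .build();
        try {
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            return looksHealthy(response.statusCode(), response.body(), marker);
        } catch (IOException e) {
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

A standby instance could call probe on a schedule and only take over the virtual IP after several consecutive failures, so a single slow response does not trigger a failover.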

You don't provide many details about the work your application does. Depending on how stateful it is, whether it can tolerate minor data loss, whether it is time-critical, and whether you value developer time over machine time, there is a wide spectrum of possible solutions.
There are some good suggestions above. I'd add: take a look at JMS and persistent messaging. Usually these make recovery quite trivial, but at the cost of a latency hit (unless you buy a commercial product and learn it well or pay the vendor to tune your application). With JMS queues you can implement active-active processing and save yourself the headache of failure detection.
Another direction to look at is distributed state management/clustering frameworks like GigaSpaces, Coherence, GemStone, Infinispan, GridGain and Terracotta. These can replicate your data and guarantee varying quality-of-service levels. Most of them come with some type of failure detection and distributed management mechanism.

Hadoop is a good place to start.
