So I have a server JVM and a client JVM. The client communicates with the server by sending serialized Java objects over TCP. Normally, the server would have the classes of the objects it was receiving on its classpath, in order to deserialize the objects properly.
But what I'm looking for is some way to avoid that; i.e., have the client "somehow" send the class bytecode over the wire, on demand. This would of course require recursing down the class tree (in case any members of the original class were themselves objects of other classes that the server didn't know about).
So I was wondering about any technologies out there that do this sort of thing.
Thanks.
RMI includes the notion of a "class server." Sounds like you're pretty much reinventing that, so consider looking into using all or part of RMI. Here's a tutorial.
RMI has the ability to dynamically download entire class file definitions over the wire on demand.
Even if you don't use (or want to use) RMI, the technologies underlying the classloading may be of interest, and they're standard Java.
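For illustration, here's a minimal, hedged sketch of that standard mechanism: java.rmi.server.RMIClassLoader can fetch a class definition from a remote codebase URL, which is what RMI's dynamic class loading relies on under the hood. The URL and class name below are hypothetical, and a security manager must be installed for remote loading to be permitted:

```java
import java.rmi.server.RMIClassLoader;

// Hedged sketch of RMI's dynamic loading mechanism: fetch a class
// definition from a remote codebase URL. The URL and class name are
// hypothetical; a security manager must be installed for remote
// loading to be permitted.
public class RemoteClassDemo {
    public static void main(String[] args) throws Exception {
        Class<?> cls = RMIClassLoader.loadClass(
                "http://client-host/classes/", "com.example.Payload");
        Object payload = cls.getDeclaredConstructor().newInstance();
        System.out.println("Loaded " + payload.getClass().getName());
    }
}
```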
You are asking about Code Mobility. Also the area of grid computing is somewhat relevant.
Take a look at Mobility-RPC, it's a library which does exactly what you ask at the same level of granularity (class-level).
Security is something to bear in mind. But I'd also remember that SQL sent to databases, bash commands executed over SSH, Business rules engines, Adobe Flash, Java Applets, RMI as described above, ActiveX, JavaScript, Hadoop/grid computing frameworks - all of these are examples of remote code execution in widespread use. Like everything, turning the security dial to the max is going to limit your options. But all of the above are used to good effect when properly firewalled or sandboxed.
In this instance, it sounds like you want something to eliminate a minor hassle, and you're not (say) designing a full-blown distributed application. So based on what you've said, despite myself being somewhat of a code mobility proponent, I'd say code mobility is probably overkill in this case. (But useful in others.)
Regarding grid computing, take a look at GridGain, and Hadoop. GridGain is a pure (CPU-centric) grid computing framework, whereas Hadoop is more a data mining/data warehouse platform with its own replicated distributed file system (HDFS).
Both GridGain and Hadoop transmit user-defined Java code implementing tasks/jobs to remote worker nodes. Last time I checked, they did this by transferring user-supplied jar files to the relevant nodes. I think the GridGain ClassLoader is more sophisticated than Hadoop's, however (but less sophisticated than Mobility-RPC's). Hadoop basically starts a new JVM for every job, which is not especially efficient (but not exactly the bottleneck given the IO load!).
Mobility-RPC is somewhat different because it doesn't expect the remote machine to be a worker node at all, it could be any application running the library. So it's more like RPC or ad-hoc task/object transfer.
This sounds like a very bad idea. Basically, it would mean you allow your client to send code to the server which is directly executed inside the current process. Something like this is generally considered a serious vulnerability, namely Arbitrary code execution which is one of the worst vulnerabilities you could ever have.
Building a system design based on that is, well, not so smart.
Create a classloader that loads classes from a stream. See the JarFileClassLoader example for details.
This, of course, will become hugely problematic, particularly if any of the classes use reflection and don't directly name an implementation in the bytecode, in addition to potential security issues; you'll need to look into secure classloaders.
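For reference, a minimal sketch of the stream-loading idea (the class and method names here are illustrative, not from any particular library):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Illustrative only: defines a class from bytecode read off a stream
// (e.g. a socket). A real implementation also needs caching, recursion
// into referenced classes, and a security policy.
public class StreamClassLoader extends ClassLoader {
    public StreamClassLoader(ClassLoader parent) {
        super(parent);
    }

    // Read the full class file from the stream, then define the class.
    public Class<?> loadFromStream(String className, InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int read;
        while ((read = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, read);
        }
        byte[] bytecode = buffer.toByteArray();
        return defineClass(className, bytecode, 0, bytecode.length);
    }
}
```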
If plain RMI does not meet your requirements, take a look at mobile agent frameworks in Java (e.g., Aglets).
I have the following situation:
I have 2 JVM processes (really 2 Java processes running separately, not 2 threads) running on a local machine. Let's call them ProcessA and ProcessB.
I want them to communicate (exchange data) with one another (e.g. ProcessA sends a message to ProcessB to do something).
Now, I work around this issue by writing a temporary file, and these processes periodically scan this file to get messages. I think this solution is not so good.
What would be a better alternative to achieve what I want?
Multiple options for IPC:
Socket-Based (Bare-Bones) Networking
not necessarily hard, but:
might be verbose for not much gain,
might offer more surface for bugs, as you write more code.
you could rely on existing frameworks, like Netty
RMI
Technically, that's also network communication, but that's transparent for you.
Fully-fledged Message Passing Architectures
usually built on either RMI or network communications as well, but with support for complicated conversations and workflows
might be too heavy-weight for something simple
frameworks like ActiveMQ or JBoss Messaging
Java Management Extensions (JMX)
more meant for JVM management and monitoring, but could help to implement what you want if you mostly want to have one process query another for data, or send it some request for an action, if they aren't too complex
also works over RMI (amongst other possible protocols)
not so simple to wrap your head around at first, but actually rather simple to use
File-sharing / File-locking
that's what you're doing right now
it's doable, but comes with a lot of problems to handle
Signals
You can simply send signals to your other process.
However, it's fairly limited and requires you to implement a translation layer (it is doable, though, but more of a crazy idea to toy with than anything serious).
Without more details, a bare-bones network-based IPC approach seems the best (a minimal sketch follows below), as it's the:
most extensible (in terms of adding new features and workflows to your programs)
most lightweight (in terms of memory footprint for your app)
most simple (in terms of design)
most educative (in terms of learning how to implement IPC). (As you mentioned "socket is hard" in a comment: it really is not, and it's something worth working on.)
That being said, based on your example (simply requesting the other process to do an action), JMX could also be good enough for you.
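To make the bare-bones option concrete, here's a minimal sketch of two processes exchanging one message over a local socket with Data streams (the port and class names are arbitrary choices for illustration):

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

// ProcessB: listens on a local port and reads one message.
public class ReceiverProcess {
    public static void main(String[] args) throws IOException {
        try (ServerSocket server = new ServerSocket(9000);   // port is arbitrary
             Socket client = server.accept();
             DataInputStream in = new DataInputStream(client.getInputStream())) {
            System.out.println("ProcessB received: " + in.readUTF());
        }
    }
}

// ProcessA: connects and asks ProcessB to do something.
class SenderProcess {
    public static void main(String[] args) throws IOException {
        try (Socket socket = new Socket("localhost", 9000);
             DataOutputStream out = new DataOutputStream(socket.getOutputStream())) {
            out.writeUTF("do something");
        }
    }
}
```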
I've added a library on github called Mappedbus (http://github.com/caplogic/mappedbus) which enables two (or many more) Java processes/JVMs to communicate by exchanging messages. The library uses a memory-mapped file and makes use of fetch-and-add and volatile reads/writes to synchronize the different readers and writers. I've measured the throughput between two processes using this library to 40 million messages/s with an average latency of 25 ns for reading/writing a single message.
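This is not the Mappedbus API itself, but a bare sketch of the underlying mechanism it builds on: two JVMs mapping the same file and passing data through shared memory (Mappedbus layers proper fetch-and-add/volatile synchronization on top):

```java
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Bare illustration of the mechanism, not the Mappedbus API: two JVMs map
// the same file; one writes a payload and raises a flag, the other polls it.
public class MappedIpcDemo {
    public static void main(String[] args) throws Exception {
        try (RandomAccessFile file = new RandomAccessFile("/tmp/ipc.dat", "rw");
             FileChannel channel = file.getChannel()) {
            MappedByteBuffer buf = channel.map(FileChannel.MapMode.READ_WRITE, 0, 1024);
            if (args.length > 0 && args[0].equals("writer")) {
                buf.putLong(8, 42L);              // payload at offset 8
                buf.putInt(0, 1);                 // flag at offset 0: message ready
            } else {
                while (buf.getInt(0) == 0) {      // naive poll; real code needs
                    Thread.sleep(1);              // proper memory-ordering guarantees
                }
                System.out.println("payload: " + buf.getLong(8));
            }
        }
    }
}
```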
What you are looking for is inter-process communication. Java provides a simple IPC framework in the form of Java RMI API. There are several other mechanisms for inter-process communication such as pipes, sockets, message queues (these are all concepts, obviously, so there are frameworks that implement these).
I think in your case Java RMI or a simple custom socket implementation should suffice.
Sockets with DataInput(Output)Stream, to send Java objects back and forth. This is easier than using a disk file, and much easier than Netty.
I tend to use JGroups to form local clusters between processes. It works for nodes (a.k.a. processes) on the same machine, within the same JVM, or even across different servers.
Once you understand the basics, it is easy to work with, and having the option to run two or more processes in the same JVM makes it easy to test those processes.
The overhead and latency are minimal if both are on the same machine (usually just a local TCP round trip of upwards of 100 ns per action).
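A hedged sketch in the classic JGroups (3.x-era) API; newer versions replace ReceiverAdapter with a Receiver interface, and the cluster name below is arbitrary:

```java
import org.jgroups.JChannel;
import org.jgroups.Message;
import org.jgroups.ReceiverAdapter;

// Both processes run this; they discover each other by joining the same
// cluster name (default UDP stack on one machine).
public class Node extends ReceiverAdapter {
    @Override
    public void receive(Message msg) {
        System.out.println("got: " + msg.getObject());
    }

    public static void main(String[] args) throws Exception {
        JChannel channel = new JChannel();          // default protocol stack
        channel.setReceiver(new Node());
        channel.connect("local-cluster");           // join the cluster
        channel.send(new Message(null, "hello"));   // null destination = broadcast
        // channel.close() when done
    }
}
```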
A socket may be a better choice, I think.
Back in 2004 I implemented code which does the job with sockets. Since then, I have searched many times for a better solution, because the socket approach triggers firewalls and my clients worry. There is no better solution so far. The client must serialize your data and send it, and the server must receive and deserialize it.
It is easy.
I would like to build a distributed NoSQL database or key-value store using Golang, to learn Golang and practice the distributed-systems knowledge I've learned at school. The target use case I can think of is running MapReduce on top of it, and implementing an HDFS-compatible "filesystem" to expose the data to Hadoop, similar to running Hadoop on Ceph and Amazon S3.
My question is: what difficulties should I expect when integrating such a NoSQL database with Hadoop? Or when integrating it with other languages (e.g., providing Ruby/Python/Node.js/C++ APIs), if I use Golang to build the system?
Ok, I'm not much of a Hadoop user so I'll give you some more general lessons learned about the issues you'll face:
Protocol. If you're going with REST, Go will be fine, but expect to find some gotchas in the default HTTP library's defaults (not expiring idle keepalive connections, not necessarily knowing when a reader has closed a stream). But if you want something more compact, know that: (a) the Thrift implementation for Go, last I checked, was lacking and relatively slow; (b) Go has great support for RPC, but it might not play well with other languages. So you might want to check out protobuf, or work on top of the Redis protocol or something like that.
GC. Go's GC is very simplistic (stop-the-world, not generational, etc.). If you plan on heavy in-memory caching on the order of multiple gigabytes, expect GC pauses all over the place. There are techniques to reduce GC pressure, but the straightforward Go idioms aren't usually optimized for that.
mmap'ing in Go is not straightforward, so it will be a bit of a struggle if you want to leverage it.
Besides slices, lists, and maps, you won't have a lot of built-in data structures to work with, like a Set type. There are tons of good implementations of them out there, but you'll have to do some digging.
Take the time to learn the concurrency patterns and interface patterns in Go. They're a bit different from other languages, and as a rule of thumb, if you find yourself struggling with a pattern from another language, you're probably doing it wrong. A good talk about Go concurrency, IMHO, is this one: http://www.youtube.com/watch?v=QDDwwePbDtw
A few projects you might want to have a look at:
Groupcache - a distributed key/value cache written in Go by Brad Fitzpatrick for Google's own use. It's a great implementation of a simple yet super robust distributed system in Go. https://github.com/golang/groupcache and check out Brad's presentation about it: http://talks.golang.org/2013/oscon-dl.slide
InfluxDB, which includes a Go-based version of the great Raft algorithm: https://github.com/influxdb/influxdb
My own humble (pretty dead) project, a redis compliant database that's based on a plugin architecture. My Go has improved since, but it has some nice parts, and it includes a pretty fast server for the redis protocol. https://bitbucket.org/dvirsky/boilerdb
I have an existing library written in C# which wraps a much lower-level TCP/IP API and exposes messages coming down the wire from a server (proprietary binary protocol) as .NET events. I also provide method calls on an object which handles the complexities of marshalling convenient .NET types (like System.DateTime) down to the binary encodings and fixed-length structures that the API requires (for outgoing messages to the server). There are a fair number of existing applications (both internally and used by third parties) built on top of this .NET library.
Recently, we've been approached by someone who doesn't want to do all the legwork of abstracting the TCP/IP themselves, but their environment is strictly non-Windows (I assume *nix, but I'm not 100% sure), and they've intimated that their ideal would be something callable from Java.
What's the best way to support their requirements, without me having to:
Port the code to Java now (including an unmanaged DLL that we currently P/Invoke into for decompression)
Have to maintain two separate code-bases going forwards (i.e. making the same bug-fixes and feature enhancements twice)
One thing I've considered is to re-write most of the core TCP/IP functionality once into something more cross-platform (C / C++) and then change my .NET library to be a thin layer on top of this (P/Invoke?), and then write a similarly thin Java layer on top of it too (JNI?).
Pros:
I mostly spend my time writing things only once.
Cons:
Most of the code would now be unmanaged - not the end of the world, but not ideal from a productivity point of view (for me).
Longer development time (can't port C# sockets code to C / C++ as quickly as just porting to Java) [How true is this?]
At this point, the underlying API is mostly wrapped and the library is very stable, so there's probably not a lot of new development - it might not be that bad to just port the current code to Java and then have to make occasional bug-fixes or expose new fields twice in the future.
Potential instability for my existing client applications while the version they're running on changes drastically underneath them. (Off the top of my head I can think of 32/64 bit issues, endianness issues, and general bugs that may crop up during the port, etc.)
Another option I've briefly considered is somehow rigging Mono up to Java, so that I can leverage all of the existing C# code I already have. I'm not too clued up, though, on how smooth the developer experience would be for the Java developers who have to consume it. I'm pretty sure that most of the code should run without trouble under Mono (bar the decompression P/Invoke, which should probably just be ported to C# anyway).
I'd ideally not like to add another layer of TCP/IP, pipes, etc. between my code and the client Java app if I can help it (so WCF to Java-side WS-DeathStar is probably out). I've never done any serious development with Java, but I take some pride in the fact that the library is currently a piece of cake for a third-party developer to integrate into his application (as long as he's running .NET of course :)), and I'd like to be able to keep that same ease-of-use for any Java developers who want the same experience.
So if anyone has opinions on the 3 options I've proposed (port to Java and maintain twice; port to C and write thin language bindings for .NET and Java; or try to integrate Java and Mono), or any other suggestions, I'd love to hear them.
Thanks
Edit: After speaking directly with the developer at the client (i.e. removal of broken telephone AKA Sales Department) the requirements have changed enough that this question no longer applies very well to my immediate situation. However, I'll leave the question open in the hopes that we can generate some more good suggestions.
In my particular case, the client actually runs Windows machines in addition to Solaris (who doesn't these days?) and is happy for us to write an application (Windows Service) on top of the library and provide a much more simplified and smaller TCP/IP API for them to code against. We will translate their simple messages into the format that the downstream system understands, and translate incoming responses back for them to consume, so that they can continue to interface with this downstream system via their Java application.
Getting back to the original scenario after thinking about this for a couple of weeks, I do have a few more comments:
A portable C-based library with different language bindings on top would probably be the way to go if you knew up front that you'd need to support multiple languages / platforms.
On *nix, can a single process host both a Java runtime and a Mono runtime simultaneously? I know in earlier versions of .NET you couldn't have two different .NET runtimes in the same process, but I believe they've fixed this with .NET 4? If this is possible, how would one communicate between the two? Ideally you'd want something as simple as a static method call and a delegate to raise responses with.
If there's no easy direct interface support between Java & Mono (methods & delegates, etc.), one might consider using something like ZeroMQ with Protocol Buffers or Apache Thrift as the message format. This would work in-process, inter-process and over the network because of ZeroMQ's support for different transports.
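As a hedged illustration of that last idea, here's a minimal reply endpoint using the JeroMQ Java binding (org.zeromq); the Mono side would connect a matching REQ socket, and the payload bytes could be a serialized Protocol Buffers or Thrift message:

```java
import org.zeromq.ZMQ;

// Hedged sketch with the JeroMQ binding (org.zeromq:jeromq): a REP socket
// that the Mono side can reach with a matching REQ socket. The endpoint
// and port are arbitrary; ipc:// transports also work on *nix.
public class ReplyEndpoint {
    public static void main(String[] args) {
        ZMQ.Context context = ZMQ.context(1);
        ZMQ.Socket socket = context.socket(ZMQ.REP);
        socket.bind("tcp://127.0.0.1:5555");
        while (!Thread.currentThread().isInterrupted()) {
            byte[] request = socket.recv(0);   // blocks until a message arrives
            socket.send(request, 0);           // echo; real code would decode/handle
        }
        socket.close();
        context.term();
    }
}
```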
Spend more time getting the requirements nailed down before deciding on an implementation. Until you know what is required, you don't have any criteria for choosing between designs.
If it's a non-windows environment, it doesn't make sense to have .NET anywhere in there, for example.
If you need something that runs on the Java Virtual Machine but looks a lot like C#, you should check out Stab. This will not help you with P/Invoke and the like but you may find it less work to port your C# code to Java and maintain it.
You should look into Mono though. I expect that all your C# code would run unmodified (except the parts that touch the unmanaged DLL).
I have not used it but jni4net is supposed to allow calling .NET code from Java. If your clients want a Java interface, this may be a solution.
I use Mono on Linux and the Mac all the time even when .NET compatibility is not a priority. I like C# and the .NET libraries and prefer the CLR to the JVM. Mono is MIT/X11 licensed which means that you can use it commercially if you like. Unlike some others, I see no reason to avoid technology championed by Microsoft while favouring technology championed by Oracle and IBM.
Using Mono will not help you with the unmanaged bits, although you can still P/Invoke into a native DLL. You will just have to port that DLL yourself or find some equivalent.
You may also want to look into Mono Ahead of Time compilation.
Have you considered Mono? It would most likely support your existing code in the non-Windows environment. The trick would be calling it from Java, but the Mono folks might have something to help you out there, too.
This probably isn't the right solution in your case, but for completeness:
There are a few languages that can target both the JVM and .NET, in particular Ruby (JRuby and IronRuby) and Python (Jython and IronPython). Scala might eventually get there too, although right now the .NET version is a long way behind the JVM version.
Anyway, you could potentially rewrite your library in Ruby or Python and target both runtimes.
If what you really, really want is to be able to code in .NET and have it run on the JVM, you could check out Grasshopper (2015-09: link possibly dead). That is what it is designed to do.
I know the Mainsoft guys have been contributors to Mono over the years. If I remember correctly, they wrote the Visual Basic compiler for Mono.
There is also the C# to Java converter from Tangible. I have heard good things but I have never used it myself.
Also, it does not help your situation much but I should point out Mono for Android.
Mono for Android runs the CLR and the Dalvik VM in parallel. In other words, the C# code you wrote for Android can be calling into Java libraries (like the Android UI for example) and executing as a single app. You had asked about the ability to run .NET and Java code in the same process. Clearly, it can be done.
One thing I've considered is to re-write most of the core TCP/IP functionality once into something more cross-platform (C / C++) and then change my .NET library to be a thin layer on top of this (P/Invoke?), and then write a similarly thin Java layer on top of it too (JNI?).
That's a possibility. On the Java side, you should consider using JNA rather than JNI. (If you use JNI, the C / C++ code needs to be written to use JNI-specific signatures.)
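To show the difference: with JNA you declare a plain Java interface that mirrors the native library's existing C signatures, with no JNI-specific stubs on the C side. A hedged sketch (the library and function names below are hypothetical stand-ins; Native.load is the JNA 5 name, older versions use Native.loadLibrary):

```java
import com.sun.jna.Library;
import com.sun.jna.Native;

// Sketch: the native library keeps plain C signatures; JNA maps them onto
// this Java interface at runtime. Library/function names are hypothetical.
public interface WireLibrary extends Library {
    WireLibrary INSTANCE = Native.load("wirelib", WireLibrary.class);

    int wire_connect(String host, int port);   // maps to: int wire_connect(const char*, int)
    void wire_disconnect(int handle);
}
```

A Java caller then just writes `int h = WireLibrary.INSTANCE.wire_connect("server", 4000);`, with no JNI glue code to maintain.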
Another possibility is to replace the proprietary binary protocol with something that "just works" with multiple programming languages. This is the kind of problem space where CORBA and similar technologies provide a good solution.
My question: What approach could/should I take to communicate between two or more JVM instances that are running locally?
Some description of the problem:
I am developing a system for a project that requires separate JVM instances to isolate certain tasks from each other entirely.
As it runs, the 'parent' JVM will create 'child' JVMs that it expects to execute tasks and then return results to it (in the form of relatively simple POJO classes, or perhaps structured XML data). These results should not be transferred over the SysErr/SysOut/SysIn pipes, as the child may already use those as part of its running.
If a child JVM does not respond with results within a certain time, the parent JVM should be able to signal to the child to cease processing, or to kill the child process. Otherwise, the child JVM should exit normally at the end of completing its task.
Research so far:
I am aware there are a number of technologies that may be of use e.g....
Using Java's RMI library
Using sockets to transfer objects
Using distribution libraries such as Cajo, Hessian
...but am interested in hearing what approaches others may consider before pursuing one of these options, or any others.
Thanks for any help or advice on this!
Edits:
Quantity of data to transfer: relatively small; it will mostly be just a handful of POJOs containing strings representing the result of the child's execution. If any solution would be inefficient for larger amounts of data, this is unlikely to be a problem in my system. The amount being transferred should be pretty static, so this does not have to be scalable.
Latency of transfer: not a critical concern in this case, although if any 'polling' of results is needed, it should be possible to poll fairly frequently without significant overhead, so I can maintain a responsive GUI on top of this at a later time (e.g., a progress bar).
Not directly an answer to your question, but a suggestion of an alternative.
Have you considered OSGi?
It lets you run Java projects in complete isolation from each other, within the SAME JVM.
The beauty of it is that communication between projects is very easy with services (see the Core Specification PDF, page 123). This way there is no "serialization" of any sort being done, as the data and calls all stay in the same JVM.
Furthermore, all your quality-of-service requirements (response time, etc.) go away; you only have to worry about whether the service is UP or DOWN at the time you want to use it. And for that you have a really nice specification that handles it for you, called Declarative Services (see the Enterprise Specification PDF, page 141).
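A hedged sketch of what a Declarative Services interaction looks like (all names are hypothetical; in practice the interface lives in a shared API bundle that both projects import):

```java
import org.osgi.service.component.annotations.Component;
import org.osgi.service.component.annotations.Reference;

// Hypothetical service interface, normally in a shared API bundle.
interface GreetingService {
    String greet(String name);
}

// Published automatically by the DS runtime when this bundle activates.
@Component(service = GreetingService.class)
class GreetingServiceImpl implements GreetingService {
    @Override
    public String greet(String name) {
        return "hello " + name;
    }
}

// A consumer in another bundle: DS injects the service while it is UP.
@Component
class GreetingClient {
    @Reference
    GreetingService greetings;
}
```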
Sorry for the off-topic answer, but I thought some other people might consider this as an alternative.
Update
To answer your question about security: I have never considered such a scenario. I don't believe there is a way to enforce memory usage limits within OSGi.
However, there is a way of communicating between different OSGi runtimes outside of the JVM. It is called Remote Services (see the Enterprise Specification PDF, page 7). There is also a nice discussion there of the factors to take into consideration when doing something like that (see 13.1, Fallacies).
The folks at Apache Felix (an implementation of OSGi) have, I think, an implementation of this with iPOJO (their wrapper to make using services easier), called Distributed Services with iPOJO. I've never used it, so ignore me if I am wrong.
I'd use KryoNet with local sockets, since it specialises heavily in serialisation and is quite lightweight (you also get Remote Method Invocation! I'm using it right now), but disable the socket disconnection timeout.
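A hedged sketch of basic KryoNet usage (2.x-era API; both sides must register the same classes with Kryo, and the port is arbitrary):

```java
import com.esotericsoftware.kryonet.Client;
import com.esotericsoftware.kryonet.Connection;
import com.esotericsoftware.kryonet.Listener;
import com.esotericsoftware.kryonet.Server;

// Sketch only: in real code the server and client live in separate JVMs.
public class KryoNetDemo {
    public static class Result { public String value; }   // must be registered

    public static void main(String[] args) throws Exception {
        Server server = new Server();
        server.getKryo().register(Result.class);
        server.start();
        server.bind(54555);   // TCP port
        server.addListener(new Listener() {
            @Override
            public void received(Connection connection, Object object) {
                if (object instanceof Result) {
                    System.out.println("server got: " + ((Result) object).value);
                }
            }
        });

        Client client = new Client();
        client.getKryo().register(Result.class);   // same registration order
        client.start();
        client.connect(5000, "localhost", 54555);
        Result result = new Result();
        result.value = "done";
        client.sendTCP(result);
    }
}
```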
RMI basically works on the principle that you have a remote type and that the remote type implements an interface. This interface is shared. On your local machine, you bind the interface via the RMI library to code 'injected' in-memory from the RMI library, the result being that you have something that satisfies the interface but is able to communicate with the remote object.
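A minimal sketch of that arrangement (interface, names, and port are illustrative):

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

// The shared interface: both JVMs have this on their classpath.
interface TaskService extends Remote {
    String runTask(String input) throws RemoteException;
}

// Server side: export an implementation and bind it in a registry.
public class TaskServer implements TaskService {
    @Override
    public String runTask(String input) {
        return "result for " + input;
    }

    public static void main(String[] args) throws Exception {
        TaskService stub = (TaskService) UnicastRemoteObject.exportObject(new TaskServer(), 0);
        Registry registry = LocateRegistry.createRegistry(1099);
        registry.rebind("tasks", stub);
    }
}

// Client side: look up the stub and call it like a local object.
class TaskClient {
    public static void main(String[] args) throws Exception {
        Registry registry = LocateRegistry.getRegistry("localhost", 1099);
        TaskService service = (TaskService) registry.lookup("tasks");
        System.out.println(service.runTask("job-1"));
    }
}
```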
Akka is another option, as are other Java actor frameworks; it provides communication and other goodies derived from the actor model.
If you can't use stdin/stdout, then I'd go with sockets. You need some sort of serialization layer on top of the sockets (as you would with stdin/stdout), and RMI is a very easy-to-use and pretty effective such layer.
If you used RMI and found the performance wasn't good enough, I'd switch to some more efficient serializer; there are plenty of options.
I wouldn't go anywhere near web services or XML. That seems like a complete waste of time, likely to take more effort and deliver less performance than RMI.
Not many people seem to like RMI any longer.
Options:
Web Services. e.g. http://cxf.apache.org
JMX. Now, this is really a means of using RMI under the table, but it would work.
Other IPC protocols; you cited Hessian
Roll-your-own using sockets, or even shared memory. (Open a mapped file in the parent, open it again in the child. You'd still need something for synchronization.)
Examples of note are Apache Ant (which forks all sorts of JVMs for one purpose or another), Apache Maven, and the open-source variant of the Tanukisoft daemonization kit.
Personally, I'm very facile with web services, so that's the hammer with which I tend to turn things into nails. A typical JAX-WS+JAXB or JAX-RS+JAXB service is very little code with CXF, and it manages all the data serialization and deserialization for me.
It was mentioned above, but I wanted to expand a bit on the JMX suggestion. We are actually doing pretty much exactly what you are planning to do (from what I can glean from your various comments). We landed on using JMX for a variety of reasons, a few of which I'll mention here. For one thing, JMX is all about management, so in general it is a perfect fit for what you want to do (especially if you already plan on having JMX services for other management tasks). Any effort you put into JMX interfaces will do double duty as APIs you can call using Java management tools like jvisualvm.

This leads to my next point, which is the most relevant to what you want: the new Attach API in JDK 6 and above is very sweet. It enables you to dynamically discover and communicate with running JVMs. This allows, for example, your "controller" process to crash, restart, and re-find all the existing worker processes. This is the makings of a very robust system. It was mentioned above that JMX is basically RMI under the hood; however, unlike using RMI directly, you don't need to manage all the connection details (e.g. dealing with unique ports, discoverability, etc.).

The Attach API is a bit of a hidden gem in the JDK, as it isn't very well documented. When I was poking into this stuff initially, I didn't know the name of the API, so figuring out how the "magic" in jvisualvm and jconsole worked was very difficult. Finally, I came across an article like this one, which shows how to actually use the Attach API dynamically in your own program.
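To ground the suggestion, here's a minimal sketch of a worker process exposing a standard MBean (the interface and object names are hypothetical); a controller can then connect over JMX, e.g. after discovering the JVM via the Attach API:

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// The management interface; the "MBean" suffix is required by JMX.
interface WorkerMBean {
    String getStatus();
    void cancel();
}

class Worker implements WorkerMBean {
    private volatile boolean cancelled;

    @Override public String getStatus() { return cancelled ? "cancelled" : "running"; }
    @Override public void cancel() { cancelled = true; }
}

public class WorkerProcess {
    public static void main(String[] args) throws Exception {
        // Register the MBean on the platform MBean server; a controller can
        // then attach (e.g. via the Attach API or jvisualvm) and call it.
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        server.registerMBean(new Worker(), new ObjectName("com.example:type=Worker"));
        Thread.sleep(Long.MAX_VALUE);   // keep the worker JVM alive
    }
}
```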
Although it's designed for potentially remote communication between JVMs, I think you'll find that Netty works extremely well between local JVM instances as well.
It's probably the most performant / robust / widely supported library of its type for Java.
A lot has been discussed above. But be it sockets, RMI, or JMS, there is a lot of dirty work involved.
I would rather advise Akka. It is an actor-based model in which actors communicate with each other using messages.
The beauty is that the actors can be on the same JVM or on another (very little config), and Akka takes care of the rest for you. I haven't seen a cleaner way of doing this :)
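A hedged sketch using Akka's classic Java API (actor and system names are arbitrary); moving the actors to separate JVMs is mostly remoting configuration, not code changes:

```java
import akka.actor.AbstractActor;
import akka.actor.ActorRef;
import akka.actor.ActorSystem;
import akka.actor.Props;

// A trivial actor that reacts to string messages.
public class PongActor extends AbstractActor {
    @Override
    public Receive createReceive() {
        return receiveBuilder()
                .match(String.class, msg -> {
                    System.out.println("received: " + msg);
                    getSender().tell("pong", getSelf());
                })
                .build();
    }

    public static void main(String[] args) {
        ActorSystem system = ActorSystem.create("demo");
        ActorRef pong = system.actorOf(Props.create(PongActor.class), "pong");
        pong.tell("ping", ActorRef.noSender());   // reply goes to dead letters here
    }
}
```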
Try out JGroups if the data to be communicated is not huge.
How about http://code.google.com/p/protobuf/
It is lightweight.
As you mentioned, you can obviously send the objects over the network, but that is costly, not to mention the cost of starting up a separate JVM.
Another approach, if you just want to separate your different worlds inside one JVM, is to load the classes with different classloaders: ClassA#CL1 != ClassA#CL2 if they are loaded by CL1 and CL2 as sibling classloaders.
To enable communication between ClassA#CL1 and ClassA#CL2 you could use three classloaders:
CL1 that loads process1
CL2 that loads process2 (same classes as in CL1)
CL3 that loads communication classes (POJOs and Service).
Now you let CL3 be the parent classloader of CL1 and CL2.
In classes loaded by CL3 you can have lightweight send/receive functionality (send(Pojo)/receive(Pojo)) for passing the POJOs between classes in CL1 and classes in CL2.
In CL3 you expose a static service that lets implementations from CL1 and CL2 register themselves to send and receive the POJOs.
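A minimal sketch of that three-classloader arrangement (the jar paths and class names are hypothetical):

```java
import java.net.URL;
import java.net.URLClassLoader;

// CL3 is the shared parent, so CL1 and CL2 see the exact same POJO and
// service classes while staying isolated from each other otherwise.
public class IsolationDemo {
    public static void main(String[] args) throws Exception {
        ClassLoader cl3 = new URLClassLoader(
                new URL[] { new URL("file:lib/shared-api.jar") }, null);
        ClassLoader cl1 = new URLClassLoader(
                new URL[] { new URL("file:lib/process1.jar") }, cl3);
        ClassLoader cl2 = new URLClassLoader(
                new URL[] { new URL("file:lib/process2.jar") }, cl3);

        Object p1 = cl1.loadClass("com.example.Process1")
                       .getDeclaredConstructor().newInstance();
        Object p2 = cl2.loadClass("com.example.Process2")
                       .getDeclaredConstructor().newInstance();
        // Both sides exchange POJOs through the static service defined in CL3:
        p1.getClass().getMethod("start").invoke(p1);
        p2.getClass().getMethod("start").invoke(p2);
    }
}
```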
In Delphi, I am trying to call a function from an external Java program. Is there any way to do it?
The standard process for calling native code is via JNI. A search on JNI and Delphi will reveal multiple pages that detail how this is done, like this and this.
Whether it is more desirable to set up an out-of-process server (as Peter already detailed, so I skipped that) or to use JNI to call a library depends on how often (and how real-time) you need this to be, and on the allowable installation/configuration complexity.
If it is a running Java application you will need to expose access to that function. There are a myriad of solutions possible.
If it is only one function or very limited functionality, then listening on the humble socket or a named pipe is a solution which is currently undervalued and kind of forgotten.
On the next step of integration I would look at asynchronous message passing. It is easy to embed an ActiveMQ server or similar, or to start one in a separate process. This has a number of advantages: the requests are easily synchronized in the Java process by simply using one listening thread; the behavior is well defined when the Java program, or the Delphi one, is not available; and it is very easy to manage, and you get the instrumentation for free.
An embedded Jetty web server is an easy, reliable solution: implement a servlet to do your bidding. Again, a lot of the complexity is now handled by using ubiquitous, standard protocols.
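A hedged sketch of that embedded-Jetty approach (Jetty 9-style API; package names changed in later versions, and the servlet path and port are arbitrary). The Delphi side then just issues plain HTTP requests:

```java
import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.servlet.ServletHandler;

// Embedded Jetty hosting one servlet that exposes the Java function.
public class EmbeddedServer {
    public static class CommandServlet extends HttpServlet {
        @Override
        protected void doGet(HttpServletRequest req, HttpServletResponse resp)
                throws IOException {
            // Dispatch to the Java function and write the result back.
            resp.getWriter().println("result of " + req.getParameter("cmd"));
        }
    }

    public static void main(String[] args) throws Exception {
        Server server = new Server(8080);
        ServletHandler handler = new ServletHandler();
        handler.addServletWithMapping(CommandServlet.class, "/invoke");
        server.setHandler(handler);
        server.start();
        server.join();
    }
}
```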
Then there are the synchronous RPC methods like COM, CORBA, and SOAP, which I personally find much too complex, error-prone, and maintenance-unfriendly to use for ad-hoc communication between processes. If you want to build a complete infrastructure of stuff talking to each other it might be worth it, but not just to get 2 programs talking.