I have a computation library implemented in Java/Scala, and a small amount of Node.js code serving my application. I need a way to connect these two worlds with maximum performance, but with simplicity in mind as well. I was thinking about inter-process communication via shared memory, but I can't find any mature way to do that in Node.js.
This should work mostly as a proxy mechanism for calling some Java (ideally any) code from Node.js. From Node.js to Java only request metadata will be passed; from Java to Node.js, however, the returned data can sometimes be significant (say 100-200 KB as an upper bound, and around 600-1000 bytes in 90% of cases). The number of such requests could also be significant.
I thought OpenMP could be an option, but I can't find any OpenMP protocol implementation for Node, and there is no clear project for Java either.
At the moment there seem to be several alternatives:
A native extension plus Java Unsafe (currently obtained via reflection; it should be opened up in JDK 9), using shared memory in a C/C++ environment (needs investigation and development; the losses on the Node -> C -> Java path could outweigh the shared-memory benefits)
Sockets (quite fast on Linux, not sure about Windows; cross-platform)
FastCGI (still uses sockets internally, so it will be slower than the first option)
ZeroMQ/nanomsg as the transport layer (again sockets underneath, but simpler development)
David's answer. I can't say anything specific about it yet; investigation needed.
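Whichever socket-based option wins, the payloads need framing so the reader knows where a 600-byte or a 200 KB response ends. ZeroMQ/nanomsg are message-oriented and do this for you; with plain sockets you define it yourself. Below is a minimal sketch of length-prefixed framing, assuming a 4-byte big-endian length header (an assumption for illustration, not a standard). It is written in Python with `socket.socketpair()` so it runs standalone; on the real endpoints the header would be read with Java's `DataInputStream.readInt()` and Node's `Buffer.readUInt32BE()`.

```python
import socket
import struct
import threading

# Assumed wire format: 4-byte big-endian unsigned length, then the payload bytes.
HEADER = struct.Struct(">I")


def send_frame(sock: socket.socket, payload: bytes) -> None:
    """Write one message as <length><payload>."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes; a single recv() may return fewer."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-frame")
        buf += chunk
    return bytes(buf)


def recv_frame(sock: socket.socket) -> bytes:
    """Read one length-prefixed message."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)


if __name__ == "__main__":
    # A connected socket pair stands in for the Node.js and Java processes.
    node_side, java_side = socket.socketpair()
    send_frame(node_side, b'{"method": "compute", "id": 1}')  # small request
    print(recv_frame(java_side))
    reply = b"r" * 200_000  # roughly the upper bound from the question
    # Send the large reply from another thread so a full socket buffer can't deadlock us.
    sender = threading.Thread(target=send_frame, args=(java_side, reply))
    sender.start()
    print(len(recv_frame(node_side)))  # 200000
    sender.join()
    node_side.close()
    java_side.close()
```

The receiver never has to guess message boundaries, and the 4-byte header costs almost nothing even for the common 600-1000 byte replies.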
Well, if sockets are too slow for you, why not keep it in-process?
You could try:
running your unmodified Node.js scripts on Nodyn which in turn runs on DynJS on the JVM; -or-
if you're not particular about the Node.js stack, but like the idea of extreme wait-free throughput on the JVM, code it all up in Vert.x.
Note: An alternative to Nodyn/DynJS would have been the Avatar.js project, which uses Nashorn, which in turn ships with recent JVMs and uses the latest and greatest bytecode operators. However, as of late 2015 the Avatar.js project feels abandoned. :-\
I come from a C/Linux background and don't have much background in Java. I generally develop system administrator utilities like:
disk cleanup
retrieve lost data / files
repairing file systems
disk de-fragmentation
I also develop network-monitoring security applications that help admins:
- monitor their networks
- scan incoming & outgoing data packets
- remotely block ports / USB devices
- monitor emails with attachments, etc.
Right now we write code in C for Linux, which has to be ported to Windows, but such a problem will not exist in Java.
My questions are:
Is Java the right language for writing these applications & utilities (as mentioned above)?
I understand Java provides libraries and classes to access system resources / network / sockets, but will Java's abstraction be a hindrance at some point (restricting the flexibility that C/C++ provide)?
If, for example, I want to write a utility to repair a file system or retrieve data on both Windows and Unix, will I be using the same API for both OSes, or are there different APIs for different OSes?
I am not concerned about the speed / execution trade-off, since none of my applications have to make real-time decisions as in the gaming industry.
Java is the right language if you want portability. You can do almost everything you can do with C/C++, and you can use patterns and libraries that help you create great, maintainable designs. In case there is something very low-level you cannot do with Java, you can always write your own native code and load it through the Java Native Interface (JNI). Thus the only non-portable code you will have will be these native-code libraries.
Right now we write code in C for Linux, which has to be ported to Windows, but such a problem will not exist in Java.
Java can abstract away only so much, since in the end low-level stuff always boils down to making system calls, which differ between OSes.
As long as you're working with pure java logic, or simple operating system utilities, you'll be golden. You want to open a TCP socket and connect to google.com? No problem. You want to open a file in a known location, read some lines, process them, and write the results to a different file? No problem, Java has you covered.
But if you want to do more low-level stuff with Java, you'll run into trouble pretty soon. You want to open a raw socket and send a TCP packet? You can't; Windows doesn't allow that. You want to get a file's creation time on Linux? Traditionally you can't, because the classic stat() interface doesn't report it, so Java's BasicFileAttributes.creationTime() returns the file's modification time on Linux instead. You want a nanosecond-resolution timestamp? Well, you can, but only on some OSes. Or say you want to get the computer's hostname without resorting to a network lookup (which depends on a network actually being available): well, get ready for some hacking (this is my own answer, by the way).
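The same portability gap shows up outside Java. As an illustration in Python (not Java), this sketch asks `os.stat()` for a file's creation time and returns `None` where the platform doesn't report one, instead of quietly substituting another timestamp. The availability notes in the docstring reflect CPython's documented behavior as we understand it; treat them as an assumption to check against your Python version.

```python
import os
import tempfile
from typing import Optional


def creation_time(path: str) -> Optional[float]:
    """Return the file's birth time if os.stat() exposes it on this platform, else None.

    st_birthtime is available on macOS and FreeBSD (and on Windows from Python 3.12),
    but os.stat() on Linux does not report it. Returning None makes the gap explicit
    instead of silently falling back to the modification time.
    """
    st = os.stat(path)
    return getattr(st, "st_birthtime", None)


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as f:
        born = creation_time(f.name)
        print("creation time:", born if born is not None else "not available on this OS")
```

The portable code still has to handle both outcomes; the abstraction can only report what the OS provides.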
Now, to your more specific questions:
Is Java the right language for writing these applications & utilities (as mentioned above)?
I frankly don't know. I never tried defragmenting or restoring a file programmatically from Java. But since it involves some very low level filesystem operations, I suggest you do some serious reading before moving to Java. Check whether the APIs you need exist in the language itself or in some good libraries.
I understand Java provides libraries and classes to access system resources / network / sockets, but will Java's abstraction be a hindrance at some point (restricting the flexibility that C/C++ provide)?
Yes. For instance, it's impossible to open a raw socket using pure Java. And if I recall correctly, it's also impossible to set some socket options.
If, for example, I want to write a utility to repair a file system or retrieve data on both Windows and Unix, will I be using the same API for both OSes, or are there different APIs for different OSes?
I never tried repairing a file system in Java, so I can't tell you about the APIs involved. But I find it hard to believe you'll find a pure Java API for doing low-level stuff with the file system. You'll probably have to write your own native code (and call it through JNI) or use someone else's library (which probably uses JNI, like the raw socket library I mentioned earlier).
After searching for a way to run Java code from a Django application (Python), I found that Py4J is the best option for me. I tried Jython, JPype and Python's subprocess, and each of them has certain limitations:
Jython: my app runs on regular (C)Python, not on the JVM.
JPype is buggy: you can start the JVM just once; after that it fails to start again.
Python subprocess: you cannot pass Java objects between Python and Java, because it's just a regular console call.
The Py4J website says:
In terms of performance, Py4J has a bigger overhead than both of the previous solutions (Jython and JPype) because it relies on sockets, but if performance is critical to your application, accessing Java objects from Python programs might not be the best idea.
In my application performance is critical, because I'm working with the machine learning framework Mahout. My question is: will Mahout itself also run slower because of the Py4J gateway server, or does this overhead just mean that invoking Java methods from Python functions is slower? (In the latter case the performance of Mahout will not be a problem and I can use Py4J.)
The JPype issue that HIP_HOP mentioned, with the JVM not being attached to new threads, can be overcome with the following hack (add it before the first call to Java objects in any new thread that is not yet attached to the JVM):
```python
import jpype

# Ensure that the current thread is attached to the JVM
# (essential to prevent JVM / entire container crashes
# due to "JPJavaEnv::FindClass" errors).
if not jpype.isThreadAttachedToJVM():
    jpype.attachThreadToJVM()
```
PySpark uses Py4J quite successfully. If all the heavy lifting is done in Spark (or Mahout in your case) itself, and you just want to return results back to the "driver"/Python code, then Py4J might work very well for you too.
Py4J has a somewhat bigger overhead for huge results (that's not necessarily the case for Spark workloads, as you only return summaries/aggregates of the dataframes). There is also an improvement discussion about switching Py4J to binary serialization to remove that overhead for higher bandwidth requirements: https://github.com/bartdag/py4j/issues/159
My solutions
java thread/process <-> Pipes <-> py subprocess
Use pipes: launch Python from Java's ProcessBuilder with the "-u" argument (unbuffered stdout/stderr) and transfer data through the pipes.
Here is a good example of this approach:
https://github.com/JULIELab/java-stdio-ipc
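The stdio-pipe pattern can be sketched from the Python side as well. Assumptions: one JSON object per line in each direction, and, so that the sketch runs on its own, the child is another Python interpreter (`sys.executable`) running a tiny echo loop. In the setups described here the peer would be the Java process, and java-stdio-ipc's actual message format may differ.

```python
import json
import subprocess
import sys

# Hypothetical child program: reads one JSON request per line, writes one JSON reply per line.
CHILD_SOURCE = r"""
import json, sys
for line in sys.stdin:
    request = json.loads(line)
    print(json.dumps({"id": request["id"], "result": request["x"] * 2}))
"""


def start_child() -> subprocess.Popen:
    # "-u" makes the child's stdout unbuffered, so each reply is flushed as soon as it is printed.
    return subprocess.Popen(
        [sys.executable, "-u", "-c", CHILD_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )


def call(child: subprocess.Popen, request: dict) -> dict:
    """Send one request line and block until the matching reply line arrives."""
    child.stdin.write(json.dumps(request) + "\n")
    child.stdin.flush()  # our side of the pipe is buffered, so flush explicitly
    return json.loads(child.stdout.readline())


if __name__ == "__main__":
    child = start_child()
    print(call(child, {"id": 1, "x": 21}))  # {'id': 1, 'result': 42}
    child.stdin.close()  # EOF ends the child's read loop
    child.wait()
```

Forgetting either the `-u` flag on the child or the explicit `flush()` on the parent is the usual cause of a pipe-based bridge that "hangs": the data sits in a buffer that nobody flushes.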
Here are my rough research results on "Java <-> Python":
[Jython] A Java implementation of Python.
[Jpype] JPype is designed to allow the user to exercise Java as fluidly as possible from within Python. We can break this down into a few specific design goals.
Unlike Jython, JPype does not achieve this by re-implementing Python, but instead by interfacing both virtual machines at the native level. This shared memory based approach achieves good computing performance while providing the access to the entirety of CPython and Java libraries.
[Runtime] The Runtime class in Java (Runtime.exec(), the older method).
[Process] Java ProcessBuilder class gives more structure to the arguments.
[Pipes] Named pipes could be the answer for you. Use subprocess.Popen to start the Java process and establish pipes to communicate with it.
Try the mkfifo() implementation in Python (os.mkfifo).
https://jj09.net/interprocess-communication-python-java/
Java <-> pipes <-> Python: https://github.com/JULIELab/java-stdio-ipc
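A named pipe (FIFO) differs from stdio pipes in that it lives in the filesystem, so an unrelated process can open it by path. Below is a minimal POSIX-only sketch with `os.mkfifo()`; both ends are in one Python process (the writer on a thread) only to keep it self-contained, and the file name `requests.fifo` is made up. In real use the Java side would open the same path with an ordinary `FileInputStream` or `FileOutputStream`.

```python
import os
import tempfile
import threading


def fifo_roundtrip(message: str) -> str:
    """Send one line through a named pipe and read it back (POSIX only)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "requests.fifo")  # hypothetical name
        os.mkfifo(path)

        def writer() -> None:
            # Opening a FIFO for writing blocks until a reader opens it, hence the thread.
            with open(path, "w") as fifo:
                fifo.write(message + "\n")

        t = threading.Thread(target=writer)
        t.start()
        # In practice the peer process (e.g. the JVM) would open this path instead.
        with open(path, "r") as fifo:
            line = fifo.readline().rstrip("\n")
        t.join()
        return line


if __name__ == "__main__":
    print(fifo_roundtrip("hello from the other process"))
```

Note that opening either end blocks until the other end is opened, so the two processes must agree on who opens the FIFO first or use separate threads, as here.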
[Protobuf] Protocol Buffers is Google's open-source solution for serializing and deserializing structured data efficiently in a language-neutral, platform-neutral, extensible way; it can serve as the message format for IPC between Java and Python.
[Socket] Client-server architecture through sockets
Server(Python) - Client(Java) communication using sockets
https://jj09.net/interprocess-communication-python-java/
Send File From Python Server to Java Client
[procbridge]
A super-lightweight IPC (Inter-Process Communication) protocol over TCP socket. https://github.com/gongzhang/procbridge
https://github.com/gongzhang/procbridge-python
https://github.com/gongzhang/procbridge-java
[Hessian binary web service protocol] Using a Python client and a Java server.
[Jython]
Jython is a reimplementation of Python in Java. As a result it has much lower costs to share data structures between Java and Python and a potentially much higher level of integration. Noted downsides of Jython are that it has lagged well behind the state of the art in Python; it has a limited selection of modules that can be used; and Python's object thrashing is not a particularly good fit for the Java virtual machine, leading to some known performance issues.
[Py4J]
Py4J uses a remote tunnel to operate the JVM. This has the advantage that the remote JVM does not share the same memory space and multiple JVMs can be controlled. It provides a fairly general API, but the overall integration to Python is as one would expect when operating a remote channel operating more like an RPC front-end. It seems well documented and capable. Although I haven’t done benchmarking, a remote access JVM will have a transfer penalty when moving data.
[Jep]
Jep stands for Java embedded Python. It is a mirror image of JPype. Rather than focusing on accessing Java from within Python, this project is geared towards allowing Java to access Python as a sub-interpreter. The syntax for accessing Java resources from within the embedded Python is quite similar to support for imports. Notable downsides are that although Python supports multiple interpreters, many Python modules do not, so some of the advantages of using Python may be hard to realize. In addition, the documentation is a bit underwhelming, so it is difficult to see how capable it is from the limited examples.
[PyJnius]
PyJnius is another Python-to-Java-only bridge. Its syntax is somewhat similar to JPype's in that classes can be loaded and then used with mostly Java-native syntax. Like JPype, it provides the ability to customize Java classes so that they appear more like native classes. PyJnius seems to be focused on Android. It is written using Cython .pxi files for speed. It does not include a method to represent primitive arrays, so Python lists must be converted whenever an array needs to be passed as an argument or returned. This seems pretty prohibitive for scientific code. PyJnius appears to still be in active development.
[Javabridge]
Javabridge is direct low-level JNI control from Python. The integration level is quite low, but it does serve the purpose of providing the JNI API to Python rather than attempting to wrap Java in a Python skin. The downside, of course, is that you would really have to know a lot of JNI to use it effectively.
[jpy]
This is the most similar package to JPype in terms of project goals. It has achieved more than JPype in the reverse direction (calling Python from Java), which JPype does not support at all. It is currently unclear whether this project is still active, as the most recent release is dated 2014. The integration level with Python is currently fairly low, though what they do provide is an API similar to JPype's.
[JCC]
JCC is a C++ code generator that produces a C++ object interface wrapping a Java library via Java’s Native Interface (JNI). JCC also generates C++ wrappers that conform to Python’s C type system making the instances of Java classes directly available to a Python interpreter. This may be handy if your goal is not to make use of all of Java but rather have a specific library exposed to Python.
[VOC] https://beeware.org/project/projects/bridges/voc/
A transpiler that converts Python bytecode into Java bytecode, part of the BeeWare project. This may be useful for hooking a smallish piece of Python code into Java. It currently lists itself as being in early development. It works more in the reverse direction, as its goal is making Python code available in Java rather than providing interaction between the two.
[p2j]
This lists itself as "A (restricted) python to java source translator". It appears to try to convert Python code into Java. It has not been actively maintained since 2013. Like VOC, it is primarily for code translation rather than bridging.
[GraalVM]
Source: https://github.com/oracle/graal
I don't know Mahout. But consider this: at least with JPype and Py4J you will have a performance impact when converting types from Java to Python and vice versa. Try to minimize calls between the languages. Maybe an alternative for you is to code a thin wrapper in Java that condenses many Java calls into a single Python-to-Java call.
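To make the "condense many calls into one" advice concrete, here is a sketch that simply counts bridge crossings. `FakeGateway` is a hypothetical stand-in for a Py4J/JPype-style bridge in which every method call costs one Python-to-Java crossing; `score_batch` plays the role of the thin Java wrapper that accepts the whole batch. Neither is a real Py4J or JPype API.

```python
from typing import List


class FakeGateway:
    """Hypothetical bridge: every method call counts as one Python<->Java crossing."""

    def __init__(self) -> None:
        self.crossings = 0

    def score(self, x: float) -> float:
        self.crossings += 1
        return x * 2.0  # stand-in for the real Java computation

    def score_batch(self, xs: List[float]) -> List[float]:
        # The "thin wrapper" idea: one crossing carries the whole batch.
        self.crossings += 1
        return [x * 2.0 for x in xs]


def chatty(gw: FakeGateway, xs: List[float]) -> List[float]:
    return [gw.score(x) for x in xs]  # one crossing per element


def batched(gw: FakeGateway, xs: List[float]) -> List[float]:
    return gw.score_batch(xs)  # one crossing in total


if __name__ == "__main__":
    data = [float(i) for i in range(1000)]
    g1, g2 = FakeGateway(), FakeGateway()
    assert chatty(g1, data) == batched(g2, data)
    print(g1.crossings, g2.crossings)  # 1000 1
```

The results are identical; only the number of crossings changes. With a real bridge each crossing also pays for type conversion, so the per-call overhead scales with the crossing count, not with the work Mahout does on the Java side.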
Because performance also depends on your usage scenario (how often you call the script and how much data is moved), and because the different solutions have their own specific benefits and drawbacks, I have created an API to switch between different implementations without having to change your Python script: https://github.com/subes/invesdwin-context-python
Thus testing what works best or just being flexible about what to deploy to is really easy.
I have an existing library written in C# which wraps a much lower-level TCP/IP API and exposes messages coming down the wire from a server (proprietary binary protocol) as .NET events. I also provide method calls on an object which handles the complexities of marshalling convenient .NET types (like System.DateTime) down to the binary encodings and fixed-length structures that the API requires (for outgoing messages to the server). There are a fair number of existing applications (both internally and used by third parties) built on top of this .NET library.
Recently, we've been approached by someone who doesn't want to do all the legwork of abstracting the TCP/IP themselves, but their environment is strictly non-Windows (I assume *nix, but I'm not 100% sure), and they've intimated that their ideal would be something callable from Java.
What's the best way to support their requirements, without me having to:
Port the code to Java now (including an unmanaged DLL that we currently P/Invoke into for decompression)
Have to maintain two separate code-bases going forwards (i.e. making the same bug-fixes and feature enhancements twice)
One thing I've considered is to re-write most of the core TCP/IP functionality once into something more cross-platform (C / C++) and then change my .NET library to be a thin layer on top of this (P/Invoke?), and then write a similarly thin Java layer on top of it too (JNI?).
Pros:
I mostly spend my time writing things only once.
Cons:
Most of the code would now be unmanaged - not the end of the world, but not ideal from a productivity point of view (for me).
Longer development time (can't port C# sockets code to C / C++ as quickly as just porting to Java) [How true is this?]
At this point, the underlying API is mostly wrapped and the library is very stable, so there's probably not a lot of new development - it might not be that bad to just port the current code to Java and then have to make occasional bug-fixes or expose new fields twice in the future.
Potential instability for my existing client applications while the version they're running on changes drastically underneath them. (Off the top of my head I can think of 32/64 bit issues, endianness issues, and general bugs that may crop up during the port, etc.)
Another option I've briefly considered is somehow rigging Mono up to Java, so that I can leverage all of the existing C# code I already have. I'm not too clued up, though, on how smooth the developer experience would be for the Java developers who have to consume it. I'm pretty sure that most of the code should run without trouble under Mono (bar the decompression P/Invoke, which should probably just be ported to C# anyway).
Ideally I'd rather not add another layer of TCP/IP, pipes, etc. between my code and the client Java app if I can help it (so WCF to Java-side WS-DeathStar is probably out). I've never done any serious development with Java, but I take some pride in the fact that the library is currently a piece of cake for a third-party developer to integrate into his application (as long as he's running .NET, of course :)), and I'd like to keep that same ease of use for any Java developers who want the same experience.
So if anyone has opinions on the 3 options I've proposed (port to Java & maintain twice, port to C and write thin language bindings for .NET and Java or, try and integrate Java and Mono), or any other suggestions I'd love to hear them.
Thanks
Edit: After speaking directly with the developer at the client (i.e. removal of broken telephone AKA Sales Department) the requirements have changed enough that this question no longer applies very well to my immediate situation. However, I'll leave the question open in the hopes that we can generate some more good suggestions.
In my particular case, the client actually runs Windows machines in addition to Solaris (who doesn't these days?) and is happy for us to write an application (Windows Service) on top of the library and provide a much more simplified and smaller TCP/IP API for them to code against. We will translate their simple messages into the format that the downstream system understands, and translate incoming responses back for them to consume, so that they can continue to interface with this downstream system via their Java application.
Getting back to the original scenario after thinking about this for a couple of weeks, I do have a few more comments:
A portable C-based library with different language bindings on top would probably be the way to go if you knew up front that you'd need to support multiple languages / platforms.
On *nix, can a single process host both a Java runtime and a Mono runtime simultaneously? I know in earlier versions of .NET you couldn't have two different .NET runtimes in the same process, but I believe they've fixed this with .NET 4? If this is possible, how would one communicate between the two? Ideally you'd want something as simple as a static method call and a delegate to raise responses with.
If there's no easy direct interface support between Java & Mono (methods & delegates, etc.), one might consider using something like ZeroMQ with Protocol Buffers or Apache Thrift as the message format. This would work in-process, inter-process and over the network because of ZeroMQ's support for different transports.
Spend more time getting the requirements nailed down before deciding on an implementation. Until you know what is required, you don't have any criteria for choosing between designs.
If it's a non-windows environment, it doesn't make sense to have .NET anywhere in there, for example.
If you need something that runs on the Java Virtual Machine but looks a lot like C#, you should check out Stab. This will not help you with P/Invoke and the like but you may find it less work to port your C# code to Java and maintain it.
You should look into Mono though. I expect that all your C# code would run unmodified (except the parts that touch the unmanaged DLL).
I have not used it but jni4net is supposed to allow calling .NET code from Java. If your clients want a Java interface, this may be a solution.
I use Mono on Linux and the Mac all the time even when .NET compatibility is not a priority. I like C# and the .NET libraries and prefer the CLR to the JVM. Mono is MIT/X11 licensed which means that you can use it commercially if you like. Unlike some others, I see no reason to avoid technology championed by Microsoft while favouring technology championed by Oracle and IBM.
Using Mono will not help you with the unmanaged bits, although you can still P/Invoke into a native DLL. You will just have to port that DLL yourself or find some equivalent.
You may also want to look into Mono's ahead-of-time (AOT) compilation.
Have you considered mono? It would most likely support your existing code in the non-windows environment. The trick would be calling it from java, but the mono folks might have something to help you out there, too.
This probably isn't the right solution in your case, but for completeness:
There are a few languages that can target both the JVM and .NET, in particular Ruby (JRuby and IronRuby) and Python (Jython and IronPython). Scala might eventually get there too, although right now the .NET version is a long way behind the JVM version.
Anyway, you could potentially rewrite your library in Ruby or Python and target both runtimes.
If what you really, really want is to be able to code in .NET and have it run on the JVM, you could check out Grasshopper (2015-09: link possibly dead). That is what it is designed to do.
I know the Mainsoft guys have been contributors to Mono over the years. If I remember correctly, they wrote the Visual Basic compiler for Mono.
There is also the C# to Java converter from Tangible. I have heard good things but I have never used it myself.
Also, it does not help your situation much but I should point out Mono for Android.
Mono for Android runs the CLR and the Dalvik VM in parallel. In other words, the C# code you wrote for Android can be calling into Java libraries (like the Android UI for example) and executing as a single app. You had asked about the ability to run .NET and Java code in the same process. Clearly, it can be done.
One thing I've considered is to re-write most of the core TCP/IP functionality once into something more cross-platform (C / C++) and then change my .NET library to be a thin layer on top of this (P/Invoke?), and then write a similarly thin Java layer on top of it too (JNI?).
That's a possibility. On the Java side, you should consider using JNA rather than JNI. (If you use JNI, the C / C++ code needs to be written to use JNI-specific signatures.)
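To make the JNA point concrete: with a runtime FFI, a plain C function can be bound from its ordinary signature, with no JNI-specific glue written in C. Python's `ctypes` works on the same principle as JNA, so this sketch (Unix-like system assumed) shows what such a thin binding looks like. libc's `strlen` stands in for a function exported by the hypothetical shared C core; in JNA the equivalent would be a Java interface declaring the same function.

```python
import ctypes
import ctypes.util

# Load the C runtime. If find_library() returns None, CDLL(None) falls back to the
# symbols already linked into the process, which on Linux/macOS includes libc.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature once: size_t strlen(const char *s).
# A hypothetical shared core library would be bound exactly the same way.
strlen = libc.strlen
strlen.argtypes = [ctypes.c_char_p]
strlen.restype = ctypes.c_size_t


def c_string_length(data: bytes) -> int:
    """Call into native code without any wrapper written in C."""
    return strlen(data)


if __name__ == "__main__":
    print(c_string_length(b"hello"))  # 5
```

The C side stays an ordinary C API, so the same shared library can also be consumed from .NET via P/Invoke, which is the "write it once" property the question is after.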
Another possibility is to replace the proprietary binary protocol with something that "just works" with multiple programming languages. This is the kind of problem space where CORBA and similar technologies provide a good solution.
Recently I heard the statement below. Can someone please elaborate on it?
With client side applications, Java has better performance than .Net. The reason is that .Net environment on the server-side (iis?) is different than its client side. While Java uses the same environment at both ends. Since frameworks performance is optimized mainly on the service side, .Net client side is not as good as .Net server side or Java.
Update: I believe he also mentioned the difference between client (XP, Vista) and server (Windows Server 2008) operating systems with respect to .NET.
In client operating systems you get a concurrent garbage collector. It is slower in absolute time, but it appears to the user to be faster because they get shorter pauses.
In server operating systems you get a non-concurrent (blocking) garbage collector. It is faster overall, but has to pause applications for longer.
This is old information, I don't know if it is still true.
EDIT: Java also has client and server modes. Unlike .NET, this isn't tied to the OS; instead you pass it as a command-line parameter (-client or -server).
Edit 2: From MSDN Magazine, Dec. 2000:
On a multiprocessor system running the server version of the execution engine (MSCorSvr.dll), the managed heap is split into several sections, one per CPU. When a collection is initiated, the collector has one thread per CPU; all threads collect their own sections simultaneously. The workstation version of the execution engine (MSCorWks.dll) doesn't support this feature.
http://msdn.microsoft.com/en-us/magazine/bb985011.aspx
Again, this is old information and may have changed.
This makes absolutely no sense.
.NET is not a server side or a client side framework. There are pieces that you use on the server side or on the client side but it's all part of the same beast.
Aside from whether it's correct or not, most (99.9999%) of people who make an unqualified statement like Y performs better than X for some ambiguous and unmeasurable task are, as Carlin would say, embarrassingly full of s***.
The .NET CLR (Common Language Runtime) is the same on the server and on the client side. The .NET CLR works conceptually like the Java VM.
There are Client Profiles for .NET 3.5 and later that only provide a subset of the .NET API suitable for client apps, but this is just offered as a convenience to reduce the .NET footprint. Any supported OS can install the full .NET version.
I can only guess that the statement is a result of misunderstanding what a Client Profile is.
I've never been able to prove to myself that in general terms, Java is faster than .NET. I've run a few of my own benchmarks that indicate quite the opposite, but even then, I'm not willing to make such a blanket statement.
I can say that in pure code execution, .NET executed faster than Java on the same machine, at least the last time I bothered to test, about two years ago. Code written in C# incidentally executes a little faster than VB.NET, because C# doesn't do all the type checking that VB.NET does.
The algorithm I used to test was basically a string parser that took an arithmetic expression as a string, transformed it into reverse Polish notation, then computed the answer (stuff taught in many schools). Even doing my best to optimize the code in Java, I could never get it as fast as even the VB.NET code. Differences were around 10%, as I recall.
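For reference, the workload described above (infix expression -> reverse Polish notation -> value) is the classic shunting-yard algorithm followed by a stack evaluation. Below is a compact sketch, limited by assumption to non-negative numbers, `+ - * /`, parentheses and left-associative operators (no unary minus). It is not the answerer's benchmark code, just an illustration of the technique.

```python
import operator
import re
from typing import Callable, Dict, List

# Assumed grammar: non-negative decimal numbers, + - * /, parentheses; no unary minus.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}
APPLY: Dict[str, Callable[[float, float], float]] = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}
TOKEN = re.compile(r"\s*(\d+(?:\.\d+)?|[-+*/()])")


def tokenize(expr: str) -> List[str]:
    expr = expr.strip()
    tokens: List[str] = []
    pos = 0
    while pos < len(expr):
        m = TOKEN.match(expr, pos)
        if m is None:
            raise ValueError(f"unexpected input at position {pos}: {expr[pos:]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def to_rpn(tokens: List[str]) -> List[str]:
    """Shunting-yard: convert infix tokens to reverse Polish notation."""
    output: List[str] = []
    ops: List[str] = []
    for tok in tokens:
        if tok in PRECEDENCE:
            # Pop operators of higher or equal precedence (left associativity).
            while ops and ops[-1] in PRECEDENCE and PRECEDENCE[ops[-1]] >= PRECEDENCE[tok]:
                output.append(ops.pop())
            ops.append(tok)
        elif tok == "(":
            ops.append(tok)
        elif tok == ")":
            while ops and ops[-1] != "(":
                output.append(ops.pop())
            if not ops:
                raise ValueError("mismatched parentheses")
            ops.pop()  # discard the "("
        else:
            output.append(tok)  # a number goes straight to the output
    while ops:
        op = ops.pop()
        if op == "(":
            raise ValueError("mismatched parentheses")
        output.append(op)
    return output


def eval_rpn(rpn: List[str]) -> float:
    """Evaluate RPN with a value stack."""
    stack: List[float] = []
    for tok in rpn:
        if tok in APPLY:
            right = stack.pop()
            left = stack.pop()
            stack.append(APPLY[tok](left, right))
        else:
            stack.append(float(tok))
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]


def evaluate(expr: str) -> float:
    return eval_rpn(to_rpn(tokenize(expr)))


if __name__ == "__main__":
    print(to_rpn(tokenize("3 + 4 * 2 / (1 - 5)")))  # ['3', '4', '2', '*', '1', '5', '-', '/', '+']
    print(evaluate("3 + 4 * 2 / (1 - 5)"))  # 1.0
```

A benchmark built on this mostly exercises string scanning, small allocations and stack operations, which is worth keeping in mind before generalizing the result to "Java vs. .NET" as a whole.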
That said, I've not benchmarked GC or other aspects, and I've never been able to dig up good unbiased benchmarks that actually test either in a real-ish system. Usually you get someone trying to prove why their religion is better while ignoring any other viewpoint. I'm sure there are some aspects of Java where they have better algorithms that will nullify the raw code execution speed.
In short, when people make statements like that, ask them to back it up. If they can't or rely on 'everyone knows it', don't bet the farm on their statements.
We have an application written in C which interacts with an Oracle database. This application is an executable and runs on a Unix platform. We need to expose this application over HTTP as a web service for others to consume.
I thought of using JNI and CXF for the web service and running the application in Tomcat.
Is this the right solution, or are there other possibilities?
I found that Axis2 supports writing web services in C. I have no experience with C. Is Axis2/C any good? What HTTP server can I use to deploy the app? Would the Apache web server suffice in this case?
EDIT: The command line is not an option: although I mentioned it's an executable, the part I have to expose doesn't have any command-line interface, and adding one is hard because it needs a complicated data structure as input.
It depends on a few factors. Vinko's method requires the app has a good clean command-line interface. Further, creating a new process for every webservice request will limit the number of requests that can be serviced. This may or may not be okay depending on how big the audience is expected to be.
If there's not that great a command-line interface and you want to maximize the number of requests you can serve, that then leaves you two main choices. Write the web service in Java, and call out to C with JNI or JNA. Or, write it in pure C or C++. The last is probably not advisable if the responsible developers don't know any C.
EDIT: Given that command-line is not an option, I recommend Java with JNI or JNA.
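The in-process route comes down to one long-running server process that keeps the native library loaded and calls it on every request, instead of spawning a process per request. The answers here use Java (CXF/Jetty with JNI or JNA); purely to show the shape in a few lines, here is a standard-library Python sketch. `native_compute` is a hypothetical placeholder for the call into the C code (for example through ctypes), and the JSON request/response format is an assumption.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def native_compute(payload: dict) -> dict:
    """Hypothetical placeholder for the call into the existing C library."""
    return {"sum": sum(payload.get("values", []))}


class ComputeHandler(BaseHTTPRequestHandler):
    # HTTP/1.1 lets clients reuse one connection for many requests (keep-alive);
    # that requires an accurate Content-Length on every response.
    protocol_version = "HTTP/1.1"

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        request = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(native_compute(request)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # silence per-request logging for the demo


def serve_in_background(port: int = 0) -> ThreadingHTTPServer:
    """Start the server on a daemon thread; port 0 lets the OS pick a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), ComputeHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def post_json(url: str, payload: dict) -> dict:
    # An empty ProxyHandler ignores any *_proxy environment variables for this local call.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    req = urllib.request.Request(
        url, data=json.dumps(payload).encode(), headers={"Content-Type": "application/json"}
    )
    with opener.open(req) as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    server = serve_in_background()
    url = f"http://127.0.0.1:{server.server_address[1]}/compute"
    print(post_json(url, {"values": [1, 2, 3]}))  # {'sum': 6}
    server.shutdown()
    server.server_close()
```

The native library is loaded once per server process, so per-request cost is just the HTTP handling plus the native call, rather than a full process start.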
Consider using the Apache Foundation package Axis2/C. It is a pretty solid interface, though it still has slightly limited portability (works out of the box on Linux, but not on Solaris, for example - needs some tweaks).
However, since you say you don't have the experience in C, that may make it too daunting for you. On the other hand, you say the code you're trying to convert to a web service is in C (plus perhaps Oracle OCI); that means that you are going to find it hard to avoid learning some C to get things working.
After using Axis2/C on the server side for more than two years, I strongly recommend NOT using Axis2/C for any server-side code, for the following reasons:
It is full of memory leaks. Namely, service code generated from WSDL leaks, simple HTTP server leaks, CGI module leaks (which is not a problem if you use it as a basic CGI, but a major problem if you use it from FastCGI or similar, or reuse the code). The only part of the HTTP-server code in Axis2/C I didn't check so far is mod_axis2 module for Apache2. Maybe it's better.
Axis2/C doesn't have any HTTP server implementation that you could easily embed in your C app: the "simple HTTP server" leaks and it doesn't support HTTP keep-alives (it closes the connection after every request). I had to implement a server myself based on the boost::asio HTTP server examples and the Axis2/C CGI module. I spent 1 day on the implementation and 4 days removing all the memory leaks. This proportion seems standard for any Axis2/C-related work. Do you want to spend days and nights with valgrind, debugging memory leaks and double frees?
Most important, the project is NOT actively maintained: there are a lot of issues with patches in their JIRA, but it takes months and years to review and apply the patches. I doubt that any serious project uses it on the server side. My long-term plan is to clone it into Git and maintain the patched version on GitHub (I have to support the code already implemented with Axis2/C for years).
P.S. In my next web-services-related subproject I will use JNI to embed Jetty + CXF.