How to use DMA or RDMA in java? - java

"DMA" here means: Direct memory access, and "RDMA" is: remote direct memory access.
I used Java to created an application to transfer stock data, but I found the latency is bigger than I expected. I heard someone developed same type application used "DMA/RDMA", which has good performance, so I wonder if I can use "DMA/RDMA" in Java?
If not, what language should I use, and if there are any good libraries to use?

This article from IBM developers work give a great overview of how DMA access can be achieved using java

You have two options (that I know of):
Java SDP -- has been deprecated
JXIO [1], a high-performance messaging library built on RDMA and supported by Mellanox
[1] https://github.com/accelio/JXIO

There is DiSNI, a new library to program RDMA in Java. DiSNI is the open source variant of IBM's jVerbs. There are some performance numbers here that illustrate how RDMA with Java compares to sockets in C and Java, and also to RDMA in C.

RDMA as I know it is a property of the networking infrastructure (along with the relevant kernel modules etc), not of the application-level programming language.
In other words, you'd have to get specialized kit to make use of RDMA to reduce network latency. Here is an example of a 10GbE network card that supports RDMA: link.

Java applications are capable of mmaping files via nio packages. Those mmaped files can be accesses by multiple programs - I affraid this is closes thing to DMA available in java

Java 7 supports SDP protocol on Solaris and Linux (with OpenFabrics.org drivers). Unfortunately SDP protocol has been deprecated in the 2.x version of OFED. People resort to JNI wrappers like jVerbs.

Related

Developing Networking Security Applications & System utilities in Java

I come from a C/Linux background and don't have much background in Java. I generally develop system administrator utilities like :
disk cleanup
retrieve lost data / files
repairing file systems
disk de-fragmentation
I also develop Network monitoring security applications which help admins monitor :
- their networks,
- scan incoming & outgoing data packets,
- remotely block ports / USBs
- monitor emails with attachments etc
Right now we write code in C for Linux which has to be ported to windows but such a problem will not exist in Java.
My questions are :
Is Java the right language for writing these applications & utilities (as mentioned above)?
I understand Java will provide Libraries and classes to access system resources / network / sockets but will Java abstraction be a hindrance at some point (which would restrict the flexibility which C/C++ provide )?
If for example I want to write a utility to repair a file system / or retrieve data for Windows & Unix ...will I be using same API for both OS or there are different API for different OS?
I am not concerned about the speed / execution trade off since none of my applications have to make real time decisions as in the gaming industry.
Java is the right language if you want portability. You can do almost everything you can do with C/C++ and you can utilize patterns and libraries that help you create great maintainable designs. In case there is something very low level you cannot do with Java, you always can create your own native code that is loaded with Java Native Interface. Thus the only non-portable code you will have will be these native-code libraries.
Right now we write code in C for Linux which has to be ported to
windows but such a problem will not exist in Java.
Java can abstract away only so much since in the end, low level stuff always boils down to making system calls, which are different between OSes.
As long as you're working with pure java logic, or simple operating system utilities, you'll be golden. You want to open a TCP socket and connect to google.com? No problem. You want to open a file in a known location, read some lines, process them, and write the results to a different file? No problem, Java has you covered.
But, if you want to do more low-level stuff with Java, you'll run into trouble pretty soon. You want to open a raw socket and send a TCP packet? You can't, windows doesn't allow that. You want to get a file's creation time on Linux? You can't, Linux doesn't keep that information. Java's BasicFileAttributes.creationTime() will return a file's modification time on Linux. You want to get a nanosecond resolution timestamp? Well, you can, but only on some OSes. Or say you want to get the computer's hostname without resorting to a network lookup (which depends on a network being actually available), well, get ready for some hacking (this is my own answer by the way).
Now, to your more specific questions:
Is Java the right language for writing these applications & utilities (as mentioned above)?
I frankly don't know. I never tried defragmenting or restoring a file programmatically from Java. But since it involves some very low level filesystem operations, I suggest you do some serious reading before moving to Java. Check whether the APIs you need exist in the language itself or in some good libraries.
I understand Java will provide Libraries and classes to access system
resources / network / sockets but will Java abstraction be a hindrance
at some point (which would restrict the flexibility which C/C++
provide )?
Yes. For instance, it's impossible to open a raw socket using pure Java. And if I recall correctly, it's also impossible to set some socket options.
If for example I want to write a utility to repair a file system / or
retrieve data for Windows & Unix ...will I be using same API for both
OS or there are different API for different OS?
I never tried repairing a file system in Java, so I can't tell you about the APIs involved. But I find it hard to believe you'll find a pure Java api for doing low level stuff with the file system. You'll probably have to write your own native code (and run it through JNI) or use someone else's library (which probably uses JNI, like the raw socket library I mentioned earlier).

Infiniband in Java

As you all know, OFED's Socket Direct protocol is deprecated and OFED's 3.x releases do not come with SDP at all. Hence, Java's SDP also fails to work. I was wondering what is the proper method to program infiniband in Java? Is there any portable solution other than just writing JNI code?
My requirement is achieve RDMA among collection of infiniband powered machines.
jVerbs might be what you're looking for. Here's a little bit of documentation.
jVerbs looks interesting otherwise you might like to try rsockets with LD_PRELOAD.
Use Fast-MPJ or any other mpi in java which gives infinband device layer support. open-mpi was expected to release openMPI for java recently.
If you are looking for SDP replacement try IBM's JSOR API - it uses the same idea of providing RDMA behind good old Java sockets. It is faster than SDP and still supported. Works fine with OFED 3.1.

Py4J has bigger overhead than Jython and JPype

After searching for an option to run Java code from Django application(python), I found out that Py4J is the best option for me. I tried Jython, JPype and Python subprocess and each of them have certain limitations:
Jython. My app runs in python.
JPype is buggy. You can start JVM just once after that it fails to start again.
Python subprocess. Cannot pass Java object between Python and Java, because of regular console call.
On Py4J web site is written:
In terms of performance, Py4J has a bigger overhead than both of the previous solutions (Jython and JPype) because it relies on sockets, but if performance is critical to your application, accessing Java objects from Python programs might not be the best idea.
In my application performance is critical, because I'm working with Machine learning framework Mahout. My question is: Will Mahout also run slower because of Py4J gateway server or this overhead just mean that invoking Java methods from Python functions is slower (in latter case performance of Mahout will not be a problem and I can use Py4J).
JPype issue that #HIP_HOP mentioned with JVM getting detached from new threads can be overcome with the following hack (add it before the first call to Java objects in the new thread which does not have JVM yet):
# ensure that current thread is attached to JVM
# (essential to prevent JVM / entire container crashes
# due to "JPJavaEnv::FindClass" errors)
if not jpype.isThreadAttachedToJVM():
jpype.attachThreadToJVM()
PySpark uses Py4J quite successfully. If all the heavylifting is done on Spark (or Mahout in your case) itself, and you just want to return result back to "driver"/Python code, then Py4J might work for you very well as well.
Py4j has slightly bigger overhead for huge results (that's not necessarily the case for Spark workloads, as you only return summaries /aggregates for the dataframes). There is an improvement discussion for py4j to switch to binary serialization to remove that overhead for higher badnwidth requirements too: https://github.com/bartdag/py4j/issues/159
My solutions
java thread/process <-> Pipes <-> py subprocess
Use pipes by java's ProcessBuilder to call py with args "-u" to transfer data via pipes.
Here is a good practice.
https://github.com/JULIELab/java-stdio-ipc
Here is my stupid research result about "java <-> py"
[Jython] Java implement of python.
[Jpype] JPype is designed to allow the user to exercise Java as fluidly as possible from within Python. We can break this down into a few specific design goals.
Unlike Jython, JPype does not achieve this by re-implementing Python, but instead by interfacing both virtual machines at the native level. This shared memory based approach achieves good computing performance while providing the access to the entirety of CPython and Java libraries.
[Runtime] The Runtime class in java (old method).
[Process] Java ProcessBuilder class gives more structure to the arguments.
[Pipes] Named pipes could be the answer for you. Use subprocess. Popen to start the Java process and establish pipes to communicate with it.
Try mkfifo() implementation in python.
https://jj09.net/interprocess-communication-python-java/
-> java<-> Pipes <-> py https://github.com/JULIELab/java-stdio-ipc
[Protobuf] This is the opensource solution Google uses to do IPC between Java and Python. For serializing and deserializing data efficiently in a language-neutral, platform-neutral, extensible way, take a look at Protocol Buffers.
[Socket] CS-arch throgh socket
Server(Python) - Client(Java) communication using sockets
https://jj09.net/interprocess-communication-python-java/
Send File From Python Server to Java Client
[procbridge]
A super-lightweight IPC (Inter-Process Communication) protocol over TCP socket. https://github.com/gongzhang/procbridge
https://github.com/gongzhang/procbridge-python
https://github.com/gongzhang/procbridge-java
[hessian binary web service protocol] using python client and java server.
[Jython]
Jython is a reimplementation of Python in Java. As a result it has much lower costs to share data structures between Java and Python and potentially much higher level of integration. Noted downsides of Jython are that it has lagged well behind the state of the art in Python; it has a limited selection of modules that can be used; and the Python object thrashing is not particularly well fit in Java virtual machine leading to some known performance issues.
[Py4J]
Py4J uses a remote tunnel to operate the JVM. This has the advantage that the remote JVM does not share the same memory space and multiple JVMs can be controlled. It provides a fairly general API, but the overall integration to Python is as one would expect when operating a remote channel operating more like an RPC front-end. It seems well documented and capable. Although I haven’t done benchmarking, a remote access JVM will have a transfer penalty when moving data.
[Jep]
Jep stands for Java embedded Python. It is a mirror image of JPype. Rather that focusing on accessing Java from within Python, this project is geared towards allowing Java to access Python as a sub-interpreter. The syntax for accessing Java resources from within the embedded Python is quite similar to support for imports. Notable downsides are that although Python supports multiple interpreters many Python modules do not, thus some of the advantages of the use of Python may be hard to realize. In addition, the documentation is a bit underwhelming thus it is difficult to see how capable it is from the limited examples.
[PyJnius]
PyJnius is another Python to Java only bridge. Syntax is somewhat similar to JPype in that classes can be loaded in and then have mostly Java native syntax. Like JPype, it provides an ability to customize Java classes so that they appear more like native classes. PyJnius seems to be focused on Android. It is written using Cython .pxi files for speed. It does not include a method to represent primitive arrays, thus Python list must be converted whenever an array needs to be passed as an argument or a return. This seems pretty prohibitive for scientific code. PyJnius appears is still in active development.
[Javabridge]
Javabridge is direct low level JNI control from Python. The integration level is quite low on this, but it does serve the purpose of providing the JNI API to Python rather than attempting to wrap Java in a Python skin. The downside being of course you would really have to know a lot of JNI to make effective use of it.
[jpy]
This is the most similar package to JPype in terms of project goals. They have achieved more capabilities in terms of a Java from Python than JPype which does not support any reverse capabilities. It is currently unclear if this project is still active as the most recent release is dated 2014. The integration level with Python is fairly low currently though what they do provide is a similar API to JPype.
[JCC]
JCC is a C++ code generator that produces a C++ object interface wrapping a Java library via Java’s Native Interface (JNI). JCC also generates C++ wrappers that conform to Python’s C type system making the instances of Java classes directly available to a Python interpreter. This may be handy if your goal is not to make use of all of Java but rather have a specific library exposed to Python.
[VOC] https://beeware.org/project/projects/bridges/voc/_
A transpiler that converts Python bytecode into Java bytecode part of the BeeWare project. This may be useful if getting a smallish piece of Python code hooked into Java. It currently list itself as early development. This is more in the reverse direction as its goals are making Python code available in Java rather providing interaction between the two.
[p2j]
This lists itself as “A (restricted) python to java source translator”. Appears to try to convert Python code into Java. Has not been actively maintained since 2013. Like VOC this is primilarly for code translation rather that bridging.
[GraalVM]
Source: https://github.com/oracle/graal
I don't know Mahout. But think about that: At least with JPype and Py4J you will have performance impact when converting types from Java to Python and vice versa. Try to minimize calls between the languages. Maybe it's an alternative for you to code a thin wrapper in Java that condenses many Javacalls to one python2java call.
Because the performance is also a question about your usage screnario (how often you call the script and how large is the data that is moved) and because the different solutions have their own specific benefits/drawbacks, I have created an API to switch between different implementations without you having to change your python script: https://github.com/subes/invesdwin-context-python
Thus testing what works best or just being flexible about what to deploy to is really easy.

Can Java apps integrate with VB apps?

I am not sure exactly what I am asking....The guys that do the software development for the company I work for write everything in VB. I am currently the Web developer for this company and I specialize in Flex apps. I am thinking about expanding into their area. But I do not want to do VB, I don't mean to bash on VB but the coding syntax is not for me. So I am wondering if Java can integrate with VB? Also not sure if it matters but I think everything they do is procedural, and I will be doing OOP.
Thanks.
There are lots of integration opportunities, but before examining them, if I were you I would re-examine the question itself.
It should be exceptional to introduce a new language into an established project. The desires or aesthetic preference or skillset of a single developer is not a good enough justification to do so. Introducing a new language into a project should be a strategic decision for the project, not a backhanded one.
If you do choose to expand the core languages used to develop the system,
COM interop
is possible with JACOB. I believe IBM has a bridge as well.(Check alphaworks)
Java-.NET bridging
is possible via JNBridge and other bridges. This makes sense only if VB.NET is in use.
SOAP, XML document exchange, REST
suitable over a services boundary. It requires TCP or HTTP or some network protocol.
common data stores
can serve as a rendezvous point. Both Java and VB can read and update data in SQL Server, Oracle, MSMQ, MQSeries, and so on. Even a filesystem can be an integration point.
Think of data format as related to, but ideally independent of, the integration mechanism. What I mean is: You can use an XML document for integration, whether it is stored in a database, or sent over a REST interface, or stored in a filesystem, or put/get on a queue. You can use a comma-separated file over any of those mechanisms as well.
Potentially they could expose a service layer via soap or something simpler? Also you could always work against the same database with different languages however unless most of the logic is in stored procedures (not necessarily recommending this approach) then you end up with repeated code.
Not really. Java uses CORBA for interop, and VB uses COM for interop. You may be able to make a bridge using JNI, but I understand that can be quite the pain.
I haven't done this by I believe you have the following options:
Use the Java-COM bridge, as VB uses COM. This library was already mentioned here several times
If you are using VB.net, you probably use hessian, As it has both Java and C# implementations.
You could bridge the two using a C/C++ adapter to map JNI calls with COM. But that would be horrible. I hope there is a better solution, but my understanding is that it is pretty hard to integrate .NET code and Java as both vendors (Sun and Microsoft) don't have any incentive to streamline that kind of development.

Why .NET and Java on server side?

Java and .NET are two languages targeted at removing platform dependence. This is achieved by adding a virtual machine/framework between the code and the OS.
So, what is the point in using it on the server side, as all websites are accessible via browser, and that is platform independent? Is there any special reason for using them?
.NET is meant for the Windows platform only. Java is the only one of the two that is meant to be platform independent.
These languages have a strong presence on the server end for many reasons:
Lots of libraries that handle the subtasks of the problem
Both frameworks are built with security in mind
They are managed languages, it is much harder to pull off the typical attacks on software.
They are considered to be mature technologies, they have been put to quite a bit of abuse and have stood the test.
They have industry support.
Both are object oriented
This means that there is the ability to either develop a web site through the use of reusable components or third party components.
Both language allow "sandboxing" of non-managed components (Java: JNI .NET: Boxing) [allows inclusion of legacy components]
They're actually chosen for almost completely opposite reasons:
Java's platform independance means you're not tied to one platform and are thus more flexible in choosing the most cost-effective platform, or the most reliable one. And you can keep your apps running even when you have to change server platforms because the old one isn't supported anymore.
.NET is chosen because if you're going to tie yourself to an OS, Microsoft is the biggest player and thus the least risky option - or simply because companies fell into the "All-Microsoft shop" trap through gateway drugs like Exchange. And once you're there, .NET is what Microsoft wants you to use and supports and integrates with all their current tools.
Both Java and .NET have their own benefits for the server-side.
For example, with .NET you are free to pick the best language for the part of the application you are working on, and all of these .NET languages work together.
So, you may want to use F# for the data mining functions, C# to work with the database, unmanaged C++ (going through a thin managed C++ layer) for fast network connections, or system calls, and there are a host of other languages. .NET is less platform independent currently, but language independent.
Java can be used on several different OSes, which is advantageous if you are selling a solution, since you don't care what OS the customer is using.
Now the JVM is becoming less language dependent, with Clojure and Scala running off of it, so Java has become more interesting now, for designing applications.
They virtualize the underlying system, so they can be run on different kinds of server operating systems.
And, they are designed to be general purpose application development systems, so they are intended to be run on anything with a processor.
If you are asking because you do not understand why one would accept the overhead of an abstraction layer, keep in mind that both Java and .NET JIT down to native code.
So, what is the point in using it on the server side, as all websites are accessible via browser, and that is platform independent?
Well, web applications are not just rendering HTML for fun, they are doing things on the server-side that may involve talking to database(s), sending messages to a MOM, etc.
Is there any special reason for using them?
This is a partial answer but I wanted at least to cover the case of Java here. I could start by arguing that Java is a safe, robust, garbage-collected, object-oriented, high-performance, multi-threaded, interpreted, architecture-neutral, cross-platform, buzzword-compliant programming language... but this wouldn't really answer your question. Actually, the big deal with Java on the server side is IMO that you benefit from standardized Enterprise APIs (aka J2EE) that allow you to do "enterprise things" (JDBC, JTA, JMS, etc) in a standard way with hardware, operating system and software vendor independence (which is a big plus for contract negotiation). In other words, Java is perfect for heterogeneous environments which are almost always the case with big organizations and doesn't lock you in.
While platform independence are great to strive for I would say Java
and .net are commonly used as there are a large number frameworks
available which make it so much easier to develop enterprise level
applications. This is especially true with java, where you have an
incredible choice of high quality technologies most of which are
flexible enough to meet the needs of most projects, allowing you to
focus on your application's functionality.
Also, without no intent to start a flame war, Java and .net have
better development tool support and are easier/quicker to develop
with for your average programmer.
In early web days, it was mostly Perl and occasional brave souls who didn't like Perl or wanted more performance used C++. Then Sun developed JDBC and Servlets for Java, and then other J2EE pieces, and Java became a higher performance alternative to Perl, and easier than C++. With J2EE came a lot of application server products from big companies, and now you have a big Java web/app server community.
Then Microsoft came along, after losing the J++/Java war with Sun, and created a similar web app infrastructure with .NET. With .NET you have fewer choices, with all of the advantages and disadvantages that brings.
So, I think the answer is a mix of decent performance, safety and enterprise features, and major corporate backing. C++ is too hard and dangerous for most people. Perl, PHP, Python, and Ruby have their fans, but not the corporate backing. I don't think the fact that Java/.NET are on Virtual Machines is important for the server side. Java used a VM originally for the client. Sun had to work hard to make a fast sever VM. I think Microsoft used a VM to compete with Sun, and to make it easier to support multiple languages. It will be interesting to see if Google's Go language takes off, which may surpass Java and C# for safety and power but no VM.
the browser doesn't have access to the server resources (database, files etc.) that those frameworks have access to. You couldn't have an application that is only javascript (and does meaningful things)
I understand your question this way: why choose Java or .NET if there are more other comfortable ways to setup a server because the clients use HTTP to access the server?
You are right that the server OS or framework does generally not matter to the client.
However the client side also can contain applets or code that then needs to communicate with its counterpart on the server. Then JSP or .NET becomes more interesting as you do not have to manage different client OS's. Then website become easily extensible.
If you want to integrate some nice graphing and charting solutions (Telerik, Dundas, ... or whatever - not meant to be advertisement - ) from 3rd parties you would also have to select a compatible server infrastructure to run them.

Categories