Java memory-mapped files?

Are memory-mapped files in Java like memory-mapped files on Windows? Or are they only an emulation built on ordinary memory and file operations in Java?

Java uses the operating system's support for memory-mapped files.
I'm trying to find documentation to back that up, but I haven't found anything conclusive yet. However, various bits of the docs do say things like this:
Many of the details of memory-mapped files are inherently dependent upon the underlying operating system and are therefore unspecified. The behavior of this method when the requested region is not completely contained within this channel's file is unspecified. Whether changes made to the content or size of the underlying file, by this program or another, are propagated to the buffer is unspecified. The rate at which changes to the buffer are propagated to the file is unspecified.
If everything were just emulated, there'd be no need for such unspecified behaviour.
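For reference, this is exposed through java.nio: FileChannel.map returns a MappedByteBuffer backed by an OS-level mapping (mmap on Unix-like systems, CreateFileMapping/MapViewOfFile on Windows). A minimal sketch -- the file name and mapping size are just made up for illustration:

    import java.io.IOException;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    public class MapExample {
        public static void main(String[] args) throws IOException {
            try (FileChannel ch = FileChannel.open(Paths.get("data.bin"),
                    StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                // Map the first 4 KiB of the file; the OS backs this region with the file itself.
                MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
                buf.putInt(0, 42);   // writes go straight to the mapped region
                buf.force();         // ask the OS to flush the changes to the file
                System.out.println(buf.getInt(0));
            }
        }
    }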

Reading between the lines of your question and comment, what you are trying to do is use a memory-mapped file to share memory/objects between a Java application and a C++ application.
My advice is don't try to do this.
There are difficult issues that would need to be addressed to make this reliable:
synchronizing the two applications' use of the shared data structure,
ensuring that changes made by one application get reliably written to main memory and read by the other one,
ensuring that changes get flushed to disc in the expected order.
A Java-specific problem is that you cannot put your Java objects directly into the memory-mapped area. Instead, you have to serialize and deserialize them in some form that is compatible with the representation the C++ side expects (a rough sketch of what that means in practice follows this answer).
Finally, even if you do succeed in addressing all of those issues, your solution is likely to be fragile because it depends on unspecified behaviour of the OS, C++ and Java implementations, which may change if you change versions of any of the above.
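If you decide to try it anyway, the serialization point above in practice means agreeing with the C++ side on a fixed binary layout and byte order, and reading/writing primitives at known offsets rather than storing Java objects. A rough sketch of that idea only -- the layout, offsets and file name are invented, and none of the synchronization or flushing issues listed above are handled:

    import java.io.IOException;
    import java.nio.ByteOrder;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    public class SharedRecordWriter {
        // Hypothetical layout agreed with the C++ side:
        //   offset 0:  int32  id
        //   offset 4:  double price
        //   offset 12: int32  name length
        //   offset 16: UTF-8 name bytes
        public static void main(String[] args) throws IOException {
            try (FileChannel ch = FileChannel.open(Paths.get("shared.bin"),
                    StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, 1024);
                buf.order(ByteOrder.LITTLE_ENDIAN);   // must match what the C++ side expects

                byte[] name = "widget".getBytes(StandardCharsets.UTF_8);
                buf.putInt(0, 17);
                buf.putDouble(4, 9.99);
                buf.putInt(12, name.length);
                buf.position(16);
                buf.put(name);
            }
        }
    }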

Related

Creating a custom JVM with larger object header

For various reasons I came to the conclusion that creating a custom JVM build might be the easiest option for what I am trying to achieve, since there are simply too many things that hurt performance badly if it is done any other way.
So I have the environment up and running, modified some simple things to generate some callbacks for what I need, played with some intrinsics, so far so good.
What I would like to know, though, is: what do the JVM experts here think about the feasibility of creating a custom VM that has a larger object header (e.g. 8 bytes more)? markOop.hpp explains the content of the mark word in a pretty good way for the different flavours that exist (32-bit, 64-bit, 64-bit with compressed oops), and I was wondering how hard it would be to extend the header so I can put some extra info on the objects (no, tagging is not an option; see my post here).
So before digging deeper into this, I was hoping that someone with experience in this area could give some early feedback. Is this a "suicide mission" because there are too many places with hard-coded assumptions about the header size and the offsets? Or is all of this fairly centralized, so it could be accomplished with reasonable effort without risking breaking everything? Any pointers on what might need special care, and on what consequences this might have (besides the obvious one: more memory consumption)?
It is definitely possible to enlarge the object header (I've seen such experiments before), though this won't be as easy as just adding a new field to class oopDesc. I believe there are multiple places in the JVM code that rely on the size of the object header, but there should not be too many. The size of the object header already differs depending on the platform and the UseCompressedOops option, so most places in the code already use relative offsets and won't suffer from a new field.
The other option is not to expand the header, but rather to add a new fake field to the java.lang.Object class. HotSpot already has the machinery for adding such fields; look for InjectedField in the sources. However, this won't be trivial either. There are some hardcoded offsets for system classes; see JavaClasses::check_offsets. These need to be fixed, too.
Both approaches are roughly equal in terms of implementation effort. In both cases I suggest starting with debug (not fastdebug) builds of the JVM, as they include many helpful assertions that will catch possible offset problems early.
Having heard of your project, I think you also have a third option: give up the "JVMTI only" requirement and rewrite some parts of the agent in Java, leveraging the power of bytecode instrumentation and JIT compilation. Yes, this may slightly change the Java code being executed and probably result in more classes loaded, but does this really matter if, from the user's perspective, the impact is even smaller than with a JVMTI-only agent? I mean, the performance impact could be significantly lower when there are no Java<->native switches, JVMTI overhead and so on. If the agent has low overhead and works with a stock JVM, I guess it's not a big problem to turn it on in production in order to get its cool features, is it?
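To make that third option concrete: a plain java.lang.instrument agent attached with -javaagent can register a ClassFileTransformer and rewrite bytecode as classes are loaded, with no native code at all. A bare-bones sketch -- the class name is made up, and a real agent would rewrite the bytes with a library such as ASM instead of leaving them untouched:

    import java.lang.instrument.ClassFileTransformer;
    import java.lang.instrument.Instrumentation;
    import java.security.ProtectionDomain;

    public class ProfilingAgent {
        // Called by the JVM when started with -javaagent:agent.jar
        // (the jar needs a Premain-Class entry in its manifest).
        public static void premain(String agentArgs, Instrumentation inst) {
            inst.addTransformer(new ClassFileTransformer() {
                @Override
                public byte[] transform(ClassLoader loader, String className,
                                        Class<?> classBeingRedefined,
                                        ProtectionDomain protectionDomain,
                                        byte[] classfileBuffer) {
                    // A real agent would rewrite the bytecode here (e.g. with ASM)
                    // to insert its callbacks; returning null leaves the class unchanged.
                    return null;
                }
            });
        }
    }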

Memory-mapped files: pros and cons?

I need to share data between two Java applications running on the same machine (two different JVMs). To be precise, the data to be shared is large (about 7 GB). The applications must access the data very fast because they have to answer incoming queries at a very high rate. I don't want each application to hold its own copy of the data.
I've seen that one option is to use memory-mapped files. Application A gets the data from somewhere (let's say a database) and stores it in files. Then application B may access those files using java.nio. I don't know exactly how memory-mapped files work; I only know that the data is stored in a file and that this file (or a part of it) is mapped to a region of memory (virtual memory?). So the two applications can read and write the data in memory, and the changes are automatically (I guess?) committed to the file. I also don't know whether there is a maximum size for a file to be entirely mapped into memory.
My first question is: what are the different possibilities for two applications to share data in this scenario (taking into account that the amount of data is very large and that access to it must be very fast)? To be clear, this question is not restricted to memory-mapped I/O; it is just to find out what other ways there are to solve the same problem.
My second question is what are the pros and cons of using memory-mapped files?
Thanks
My first question is what are the different possibilities for two applications to share data?
As S.Lott points out, there are a lot of mechanisms:
OS-level message queues
OS-level POSIX shared memory segments (persist after process death)
OS-level memory mappings (could be anonymous or file-backed)
OS-level anonymous pipes (unidirectional)
OS-level named pipes (unidirectional)
OS-level sockets (bidirectional) -- whether AF_UNIX or AF_INET or AF_INET6
OS-level shared global memory -- suitable for multi-threaded programs
Storing data in files
Application-level message queues
Application-level blackboard-style tuplespaces
Application-level key/value stores
Application-level remote procedure call frameworks -- many are available
Application-level web-based frameworks
My second question is what are the pros and cons of using memory-mapped files?
Pros:
very fast -- depending upon how you access the data, potentially zero-copy mechanisms can be used to operate directly on the data with no speed penalties. Care must be taken to update objects in a consistent manner.
should be very portable -- available on Unix systems for probably 25 years (give or take), and apparently Windows has mechanisms too.
Cons:
Single-system sharing. If you want to distribute your application over multiple machines, shared memory isn't a great option. Distributed shared memory systems are available, but they feel very much like the wrong interface to my way of thinking.
Even on a single system, if the memory is located on a single NUMA node but needs to be accessed by processors from multiple nodes, the inter-node requests may significantly slow processing compared to giving each node its own segment of the memory.
You can't just store pointers -- everything must be stored as offsets to base addresses, because the memory may be mapped at different locations in different processes. I have no idea what this means for Java objects, though presumably someone smart did their best to make it transparent to Java programmers. If you're not using their provided mechanisms, then you probably must do the work yourself. (Without actual pointers in Java, perhaps this is not very onerous; see the sketch after this list.)
Updating objects consistently has proven to be very difficult. Passing immutable objects in message-passing systems instead generally results in programs with fewer concurrency bugs. (Concurrent programming in Erlang feels very natural and straightforward. Concurrent programming in more imperative languages tends to introduce a huge pile of new concurrency controls: semaphores, mutexes, spinlocks, monitors.)
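To illustrate the "offsets instead of pointers" point: any links between records stored in the mapped region have to be expressed as positions within the buffer, because each process may map the file at a different virtual address. A toy sketch with an invented two-field record layout:

    import java.io.IOException;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    public class OffsetLinks {
        // Toy record layout: int32 value at +0, int32 offset of the next record at +4 (-1 = end).
        public static void main(String[] args) throws IOException {
            try (FileChannel ch = FileChannel.open(Paths.get("records.bin"),
                    StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, 256);

                // Two records linked by buffer offsets rather than object references.
                buf.putInt(0, 100);  buf.putInt(4, 8);    // record at offset 0 -> next at offset 8
                buf.putInt(8, 200);  buf.putInt(12, -1);  // record at offset 8 -> end of list

                // Any process that maps the same file can walk the same offsets.
                for (int off = 0; off != -1; off = buf.getInt(off + 4)) {
                    System.out.println(buf.getInt(off));
                }
            }
        }
    }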
Memory-mapped files sound like a headache. A simpler and less error-prone option would be to use a shared database with a cluster-aware cache. That way only writes go down to the database, and reads can be served from the cache.
As an example of how to do this in Hibernate, see http://docs.jboss.org/hibernate/core/3.3/reference/en/html/performance.html#performance-cache

Direct memory access to the network card in Java

Some modern network cards support Direct Memory Access for improved performance. How can I utilize this feature from Java?
Does the JVM provide this automatically, or do I need to do an allocateDirect on the ByteBuffers that I am using to talk to that NIC?
Does anyone have documentation that discusses this?
It is the operating system's task to use the DMA feature of the network card. The JVM does not really care how the OS does it; it simply uses the operating system's functions for talking to "network interfaces".
You cannot do this from inside Java in the typical desktop/server JVMs, as this is operating-system territory that requires you to reach out into C code. Have a look at JNI or JNA to see how to do this. Please note that this may make your application brittle if you do not get it exactly right.
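Regarding the allocateDirect part of the question: a direct buffer only guarantees that NIO can hand the OS a region outside the Java heap, which avoids one copy on the Java side; whether the NIC then transfers data via DMA is entirely up to the OS and the driver. A minimal sketch (the host, port and request are made up):

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.nio.ByteBuffer;
    import java.nio.channels.SocketChannel;
    import java.nio.charset.StandardCharsets;

    public class DirectBufferRead {
        public static void main(String[] args) throws IOException {
            // Direct buffer: allocated outside the Java heap, so the OS can fill it
            // without an extra copy into a Java byte[].
            ByteBuffer buf = ByteBuffer.allocateDirect(64 * 1024);
            try (SocketChannel ch = SocketChannel.open(new InetSocketAddress("example.com", 80))) {
                ch.write(ByteBuffer.wrap("GET / HTTP/1.0\r\n\r\n".getBytes(StandardCharsets.US_ASCII)));
                int n = ch.read(buf);   // the kernel and the NIC driver decide whether DMA is used
                System.out.println("read " + n + " bytes");
            }
        }
    }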
Yeah - ankon's answer is right. Java operates in a sandbox - a virtual machine (hence the "VM" in JVM; Sun actually built ONE physical version -- it's on display somewhere).
Java was never designed (intentionally) to reach outside the sandbox, unlike ActiveX, which can go just about anywhere on a PC.
Just think of all the bad things ActiveX has done over the years via a browser. You wouldn't want that to happen with Java, would you?
Although...
you might be able to instantiate an object in Java that does have access to the hardware (like one of those ActiveX controls, or some DLL, for example - which you'd have to write, too).
The problem I see is the throughput. With 100 Mbit or 1000 Mbit cards, would a JVM (remember, this is a VM running on an OS, so you're a couple of layers removed from the hardware) have the speed to handle what's coming in under load? Would you want a Java program holding up data in your NIC while it tinkered with it (think of the impact on the rest of the system)?
At this point, you're probably better off writing the hard-working guts of your solution in C. And, if you still need Java to play with that data, put it in a place where Java can get to it.
If you're not getting the network throughput you need in Java, then you're going to need to write a C wrapper in order to access it.
Have you benchmarked your code to find where your performance issues really are? If you let us know, we can likely help you out without resorting to JNI.

Why doesn't the JVM cache JIT compiled code?

The canonical JVM implementation from Sun applies some pretty sophisticated optimizations to bytecode to obtain near-native execution speeds after the code has been run a few times.
The question is, why isn't this compiled code cached to disk for use during subsequent uses of the same function/class?
As it stands, every time a program is executed, the JIT compiler kicks in afresh, rather than using a pre-compiled version of the code. Wouldn't adding this feature add a significant boost to the initial run time of the program, when the bytecode is essentially being interpreted?
Without resorting to cut'n'paste of the link that #MYYN posted, I suspect this is because the optimisations that the JVM performs are not static, but rather dynamic, based on the data patterns as well as code patterns. It's likely that these data patterns will change during the application's lifetime, rendering the cached optimisations less than optimal.
So you'd need a mechanism to establish whether the saved optimisations were still optimal, at which point you might as well just re-optimise on the fly.
Oracle's JVM is indeed documented to do so -- quoting Oracle,
the compiler can take advantage of Oracle JVM's class resolution model to optionally persist compiled Java methods across database calls, sessions, or instances. Such persistence avoids the overhead of unnecessary recompilations across sessions or instances, when it is known that semantically the Java code has not changed.
I don't know why all sophisticated VM implementations don't offer similar options.
An update to the existing answers - Java 8 has a JEP dedicated to solving this:
=> JEP 145: Cache Compiled Code.
At a very high level, its stated goal is:
Save and reuse compiled native code from previous runs in order to improve the startup time of large Java applications.
Hope this helps.
Excelsior JET has had a caching JIT compiler since version 2.0, released back in 2001. Moreover, its AOT compiler can recompile the cache into a single DLL/shared object using all optimizations.
I do not know the actual reasons, not being in any way involved in the JVM implementation, but I can think of some plausible ones:
The idea of Java is to be a write-once-run-anywhere language, and putting precompiled stuff into the class file is kind of violating that (only "kind of" because of course the actual byte code would still be there)
It would increase the class file sizes because you would have the same code there multiple times, especially if you happen to run the same program under multiple different JVMs (which is not really uncommon, when you consider different versions to be different JVMs, which you really have to do)
The class files themselves might not be writable (though it would be pretty easy to check for that)
The JVM optimizations are partially based on run-time information and on other runs they might not be as applicable (though they should still provide some benefit)
But I really am guessing, and as you can see, I don't really think any of my reasons are actual show-stoppers. I figure Sun just doesn't consider this support a priority, and maybe my first reason is close to the truth, as doing this habitually might also lead people into thinking that Java class files really need a separate version for each VM instead of being cross-platform.
My preferred way would actually be to have a separate bytecode-to-native translator that you could use to do something like this explicitly beforehand, creating class files that are explicitly built for a specific VM, with possibly the original bytecode in them so that you can run with different VMs too. But that probably comes from my experience: I've been mostly doing Java ME, where it really hurts that the Java compiler isn't smarter about compilation.

How to lock compiled Java classes to prevent decompilation?

How do I lock compiled Java classes to prevent decompilation?
I know this must be very well discussed topic on the Internet, but I could not come to any conclusion after referring them.
Many people do suggest obfuscators, but they mostly just rename classes, methods, and fields with tough-to-remember character sequences -- and what about sensitive constant values?
For example, say you have developed an encryption and decryption component based on a password-based encryption technique. Now any average Java person can use JAD to decompile the class file and easily retrieve the password value (defined as a constant) as well as the salt, and in turn can decrypt the data by writing a small independent program!
Or should such sensitive components be built in native code (for example, VC++) and called via JNI?
Some of the more advanced Java bytecode obfuscators do much more than just class name mangling. Zelix KlassMaster, for example, can also scramble your code flow in a way that makes it really hard to follow and works as an excellent code optimizer...
Also, many of the obfuscators are able to scramble your string constants and remove unused code.
Another possible solution (not necessarily excluding obfuscation) is to use encrypted JAR files and a custom classloader that does the decryption (preferably using a native runtime library).
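For what it's worth, that custom-classloader idea usually boils down to overriding findClass to decrypt the bytes and hand them to defineClass. A skeletal sketch only -- loadEncryptedBytes and decrypt are placeholders for whatever scheme you choose, and protecting the key itself remains the weak point:

    // Skeleton only: the loading and decryption steps are placeholders.
    public class DecryptingClassLoader extends ClassLoader {
        public DecryptingClassLoader(ClassLoader parent) {
            super(parent);
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            byte[] encrypted = loadEncryptedBytes(name);  // e.g. read name.replace('.', '/') + ".class.enc"
            byte[] classBytes = decrypt(encrypted);       // whatever cipher you chose -- and its key
            return defineClass(name, classBytes, 0, classBytes.length);
        }

        private byte[] loadEncryptedBytes(String name) throws ClassNotFoundException {
            throw new ClassNotFoundException(name);       // placeholder
        }

        private byte[] decrypt(byte[] encrypted) {
            return encrypted;                             // placeholder
        }
    }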
The third option (and possibly the one offering the strongest protection) is to use native ahead-of-time compilers such as GCC or Excelsior JET, which compile your Java code directly to a platform-specific native binary.
In any case, you've got to remember that, as the saying goes in Estonian, "Locks are for animals". Meaning that every bit of code is available (loaded into memory) at runtime and, given enough skill, determination and motivation, people can and will decompile, unscramble and hack your code... Your job is simply to make the process as uncomfortable as you can while still keeping the thing working...
As long as they have access to both the encrypted data and the software that decrypts it, there is basically no way you can make this completely secure. The way this has been solved before is to use some form of external black box to handle encryption/decryption, like dongles, remote authentication servers, etc. But even then, given that the user has full access to their own system, this only makes things difficult, not impossible -- unless you can tie your product directly to the functionality stored in the "black box", as, say, online gaming servers do.
Disclaimer: I am not a security expert.
This sounds like a bad idea: You are letting someone encrypt stuff with a 'hidden' key that you give him. I don't think this can be made secure.
Maybe asymmetrical keys could work:
deploy an encrypted license with a public key to decrypt
let the customer create a new license and send it to you for encryption
send a new license back to the client.
I'm not sure, but I believe the client can actually encrypt the license key with the public key you gave him. You can then decrypt it with your private key and re-encrypt as well.
You could keep a separate public/private key pair per customer to make sure you actually are getting stuff from the right customer - now you are responsible for the keys...
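In case it helps, the mechanics described above map onto the standard java.security/javax.crypto APIs roughly like this. A sketch only -- real licensing schemes usually sign data rather than encrypt it, and the payload and key size here are arbitrary:

    import java.nio.charset.StandardCharsets;
    import java.security.KeyPair;
    import java.security.KeyPairGenerator;
    import javax.crypto.Cipher;

    public class AsymmetricSketch {
        public static void main(String[] args) throws Exception {
            // You generate the key pair and keep the private key to yourself.
            KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
            gen.initialize(2048);
            KeyPair pair = gen.generateKeyPair();

            // The customer encrypts the license data with the public key you shipped...
            Cipher enc = Cipher.getInstance("RSA");
            enc.init(Cipher.ENCRYPT_MODE, pair.getPublic());
            byte[] blob = enc.doFinal("license-request".getBytes(StandardCharsets.UTF_8));

            // ...and only you can read it with the private key.
            Cipher dec = Cipher.getInstance("RSA");
            dec.init(Cipher.DECRYPT_MODE, pair.getPrivate());
            System.out.println(new String(dec.doFinal(blob), StandardCharsets.UTF_8));
        }
    }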
No matter what you do, it can be 'decompiled'. Heck, you can just disassemble it, or look at a memory dump to find your constants. You see, the computer needs to know them, so your code will need to as well.
What to do about this?
Try not to ship the key as a hardcoded constant in your code: keep it as a per-user setting and make the user responsible for looking after that key (a minimal sketch follows).
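One concrete way to keep the key as a per-user setting instead of a constant is the standard java.util.prefs.Preferences store -- the node and key names below are made up:

    import java.util.prefs.Preferences;

    public class KeyStorage {
        public static void main(String[] args) {
            // Hypothetical preference node; the user (or an installer) puts the key here,
            // so it never ships inside the .class files.
            Preferences prefs = Preferences.userRoot().node("com/example/myapp");
            String key = prefs.get("encryption.key", null);
            if (key == null) {
                System.err.println("No key configured for this user.");
            } else {
                System.out.println("Key loaded from per-user settings.");
            }
        }
    }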
#jatanp: or better yet, they can decompile, remove the licensing code, and recompile. With Java, I don't really think there is a proper, hack-proof solution to this problem. Not even an evil little dongle could prevent this with Java.
My own biz managers worry about this, too much in my opinion. But then again, we sell our application to large corporates, which tend to abide by licensing conditions -- generally a safe environment thanks to the bean counters and lawyers. The act of decompiling itself can be illegal if your license is written correctly.
So, I have to ask: do you really need hardened protection like you are seeking for your application? What does your customer base look like? (Corporates? Or the teenage gamer masses, where this would be more of an issue?)
If you're looking for a licensing solution, you can check out the TrueLicense API. It's based on the use of asymmetrical keys. However, that doesn't mean your application cannot be cracked. Every application can be cracked with enough effort. What is really important is, as Stu answered, figuring out how strong a protection you need.
You can use byte-code encryption with no fear.
The fact is that the paper cited above, "Cracking Java byte-code encryption", contains a logical fallacy. The main claim of the paper is that before running, all classes must be decrypted and passed to the ClassLoader.defineClass(...) method. But this is not true.
The assumption missed here is that the classes are running in an authentic, or standard, Java runtime environment. Nothing can oblige the protected Java app to launch these classes, or even to decrypt them and pass them to the ClassLoader. In other words, if you are in a standard JRE you can't intercept the defineClass(...) method, because the standard Java has no API for this purpose; and if you use a modified JRE with a patched ClassLoader or any other "hacker trick", you can't do it because the protected Java app will not work at all, and therefore you will have nothing to intercept. And it absolutely doesn't matter which "patch finder" is used or which trick is used by hackers. These technical details are quite a different story.
I don't think there exists any effective offline anti-piracy method. The video game industry has tried to find one many times, and their programs have always been cracked. The only solution is that the program must be run online, connected to your servers, so that you can verify the license key and ensure that there is only one active connection per licensee at a time. This is how World of Warcraft or Diablo works. Even though there are private servers developed for them to bypass the security.
Having said that, I don't believe that mid/large corporations use illegally copied software, because the cost of the license for them is minimal (perhaps; I don't know how much you are going to charge for your program) compared to the cost of a trial version.
Q: If I encrypt my .class files and use a custom classloader to load and decrypt them on the fly, will this prevent decompilation?
A: The problem of preventing Java byte-code decompilation is almost as old the language itself. Despite a range of obfuscation tools available on the market, novice Java programmers continue to think of new and clever ways to protect their intellectual property. In this Java Q&A installment, I dispel some myths around an idea frequently rehashed in discussion forums.
The extreme ease with which Java .class files can be reconstructed into Java sources that closely resemble the originals has a lot to do with Java byte-code design goals and trade-offs. Among other things, Java byte code was designed for compactness, platform independence, network mobility, and ease of analysis by byte-code interpreters and JIT (just-in-time)/HotSpot dynamic compilers. Arguably, the compiled .class files express the programmer's intent so clearly they could be easier to analyze than the original source code.
Several things can be done, if not to prevent decompilation completely, then at least to make it more difficult. For example, as a post-compilation step you could massage the .class data to make the byte code either harder to read when decompiled or harder to decompile into valid Java code (or both). Techniques like performing extreme method name overloading work well for the former, and manipulating control flow to create control structures that cannot be represented through Java syntax works well for the latter. The more successful commercial obfuscators use a mix of these and other techniques.
Unfortunately, both approaches must actually change the code the JVM will run, and many users are afraid (rightfully so) that this transformation may add new bugs to their applications. Furthermore, method and field renaming can cause reflection calls to stop working. Changing actual class and package names can break several other Java APIs (JNDI (Java Naming and Directory Interface), URL providers, etc.). In addition to altered names, if the association between class byte-code offsets and source line numbers is altered, recovering the original exception stack traces could become difficult.
Then there is the option of obfuscating the original Java source code. But fundamentally this causes a similar set of problems.
Encrypt, not obfuscate?
Perhaps the above has made you think, "Well, what if instead of manipulating byte code I encrypt all my classes after compilation and decrypt them on the fly inside the JVM (which can be done with a custom classloader)? Then the JVM executes my original byte code and yet there is nothing to decompile or reverse engineer, right?"
Unfortunately, you would be wrong, both in thinking that you were the first to come up with this idea and in thinking that it actually works. And the reason has nothing to do with the strength of your encryption scheme.
