Creating a custom JVM with larger object header - java

For various reasons I came to the conclusion that creating a custom JVM build might be the easiest option for what I am trying to achieve as there are simply too many things that are affecting performance really badly if done otherwise.
So I have the environment up and running, modified some simple things to generate some callbacks for what I need, played with some intrinsics, so far so good.
What I would like to know though is: What do the JVM experts here think about the feasibility of creating a custom VM that has a larger object header (e.g. 8 bytes more). markOop.hpp explains the content of the mark word in a pretty good way for the different flavours that exist (32bits, 64bits, 64bits with compressed oops) and I was wondering how hard it would be to extend the header so I can put some extra info on the objects (no tagging is not on option, see my post here).
So before digging deeper in this I was hoping that someone with experience in this could give some early feedback. Like is that a "suicide mission" because there are too many places all over where there are hard coded assumptions regarding the header size and the offsets? Or is all this fairly centralized and could be accomplished with reasonable effort without risking to break everything? Any pointer for what might need special care and what consequences this might have (besides the very obvious; more memory consumption)?

It is definitely possible to enlarge the object header (I've seen such experiments before), though this won't be as easy as just adding a new field into class oopDesc. I believe there are multiple places in JVM code that rely on the size of object header, but there are should not be too much. The size of object header already differs depending on the platform and the UseCompressedOops option, so the most places in the code already use relative offsets and won't suffer from a new field.
The other option is not to expand the header, but rather add a new fake field to java.lang.Object class. HotSpot already has the machinery for adding such fields, look for InjectedField in the sources. However, this won't be trivial either. There are some hardcoded offsets for system classes, see JavaClasses::check_offsets. These need to be fixed, too.
The both approaches are roughly equal in terms of implementation efforts. In both cases I suggest to start with debug (not fastdebug) builds of JVM as they include many helpful assertions that will catch the possible offset problems early.
Having heard of your project, I think you also have the third option: give up "JVMTI only" requirement and rewrite some parts of the agent in Java leveraging the power of bytecode instrumentation and JIT compilation. Yes, this may slightly change Java code being executed and probably result in more classes loaded, but does this really matter, if from the user's perspective the impact will be even less than with JVMTI-only agent? I mean, the performance impact could be significantly less when there are no Java<->native switches, JVMTI overhead and so on. If the agent has low overhead and works with stock JVM, I guess it's not a big problem to make it ON in production in order to get its cool features, is it?

Related

Decompiler Bytecode and Obfuscators

Can we completely reverse-engineer the source code from java bytecode ? Why this feature is allowed in Java and How successful are java decompilers against obfuscators.?
I know this question is old but I kept looking for a reliable answer until I found nothing.
So in this post I summarize some of my effort to obfuscate a J2EE JAR.
It seems , that by year 2014 (time of writing) there are not many options out there.
If you read this review later then things may have changed or fixed.
When I think why , I start to sense that the whole obfuscation effort gives a false sense of security. Don't get me wrong. It does add a level of security, but not as much as I would hope.
I will try to give a preview of what I found to explain myself. My recommendation are personal , others may disagree with it.
So to begin with: obfuscation in Java is the process of taking bytecode and making it less readable (using a decompiler of course) while maintaining its original functionality.What can we do, Java ,working as an interperter, must keep its bytecode exposed. You run the obfuscator as a measure of security in case the class file falls into the wrong hands. The result of the obfuscation is a reverse-mapping files and a JAR with the obfuscated classes. The reverse mapping file is used of-course to perform stack trace reading (a.k.a re-trace) or to revert the bytecode to its original shape. The runtime performance hit of an obfuscated class should not pass the 10% (but this really depends on what you do in your code).
But there is a big “but” . Obfuscation will scramble your code but it won’t make it hacker-proof. Bare in mind you only buy time and a determined hacker will find a way to reverse engineer your bytecode into its pure algorithm.
IMHO: the best way to hide a sensitive piece of code is to drown it in some huge pile of meaningless code.
Some of the hackers will try to modify your bytecode (by code injection) to help them achieve their goals. Some obfuscators offer additional level of JAR hardening , making it harder to modify.
De-obfuscators and de-compilers: my favourite Java decompiler is JD-GUI . However, when it comes to de-obfuscators I found the market pretty empty. Most of the tools ask you for a hint (what obfuscation tool was used to encrypt the source JAR) , yet none of them really deliver results (some of them even crash when trying to de-cipher the JAR). They are open source projects with low maintenance. I couldn’t even find a paid application to do a decent de-obfuscation. so enlighten me if you know something.
Free solutions
There are open source , free obfuscators which usually simply rename the classes/methods names, making it one letter method (i.e. from printUsage(String params) to a(String p) ).
They might ,as hinted here , even strip debugging information to make it a bit more difficult. (debugging information is kept at the end of every Java method bytecode and contains: line numbers, variables names ,etc.).
Its a nice effort , but an experience Java developer with a debugger can very easily deduce the purpose of each parameter while doing few live runs.
One of the nice open source obfuscators is ProGuard but there are several more tools.
Nevertheless , if you truly security fanatic you will probably want something stronger. Stronger demands more features (and more money) which leads us to the next bullet:
Paid solutions
While free products may only change classes method names , paid product will usually offer more features:
code/flow obfuscation: this will change the method code and inject empty loops/dead code/confusing switch tables and alike. Some of them may even scramble the exception table content. the obfuscation strength usually determine the output size.
Note: regarding code obfuscation: I deliberately avoided the details in my review. Some of the bytecode I saw and analyzed expose their obfuscation methods, and I wish to protect their IP. I do have an opinion about who uses better algorithms. contact me if you wish to know.
classes/method renaming : well this is the obvious , we discussed it in the free obfuscation. Some of the product will rename the class name and then recursively search for reflection usage of that class and fix those too. Paid products may even rename Spring /Wink configuration files for the same purpose (renaming in reflection).
String encryption: for every string “like this” in the code, it will encrypt it to some level and keep the key somewhere (in the class constant table/static blocks/a new method or any other mean).
debug information : stripping parts or scrambling.many of them will remove the line numbers info.
class
hardening: all kinds of methods like injecting some signing scheme into the beginning of the class/method, making sure an outsider won’t be able to easily modify the JAR and run it. Less important for Android or applets as most of them are digitally signed anyhow. some will combine hardening with water-marking to track pirated copies. But we all know anti-pirating methods by software are doomed to be hacked. Game industry suffered from it for decades until network based subscriptions arrived.
Since most products here deal with Java , some of them provides Android integration. It means it will not only obfuscate the Java (dalvik) code , but also manipulates the Android's manifest file and resources. Some offer anti debugging: remove the debug flag in android apps.
Nice GUI app to configure the various options and maybe do a re-trance on a given log file. The UI is usually used to generate a config file. with such file you can later re-play the obfuscation many times, even from command line.
Incremental build support - this is useful for large groups who release product updates/fixes frequently. You can tell the obfuscator to preserve old “obfuscation” result and randomly obfuscate only “new” code flows. this way you can be sure minimal impact on your methods signature. Without this flag , each obfuscation cycle on a JAR would yield a different output as most good tools use some level of randomness in their algorithms.
CLI and distributed builds. When you work alone then running an obfuscator is not a big issue. you need to configure the obfuscator to your relevant options and run it.However, in enterprise , when integrating obfuscator into the the build script things are a bit different. There is another level of complexity: build engine tasks (like ant/maven) and license management. The good news that all obfuscator I tested have command line API. In distributed build environment there are cluster/pool of build machines to support concurrent demand of builds. The cluster is dynamic and virtual, machines are going up or down, depending on various conditions. Some obfuscation products are based on cpuID license file or hostname. This can create quite a challenge for the build teams to integrate. Some prefer a local floating license server. Some may require public license server (but then: not all build farms have access to the public internet). Some offer multi-site license (which in my opinion is the best).
Some offer code optimizations - algebric equivalence and dropping of dead code. Its nice, but I believe that today's JDK do good job in optimizing bytecode. Its true that dead code makes you downloadable bigger, but with today's bandwidth its less than a problem. I also want to believe that in software today 20:80 thumb rule still applies. in any application 20% is probably a dead code anyway.
So who are the players I tried ?
KlassMaster by Zelix.com - one of the oldest in the industry. Yet they deliver a solid product with 3-4 releases per year. This been going for decades (since 1997). Zelix provides good email support and answered all my emails in a timely manner. They have a nice GUI client to either obfuscate a JAR or create a config file for future obfuscation. It simple and slick. nothing special here. They provided simple to read on-line documentation for all their flags. they support both “exclude” and “include” regular expressions for what the engine should obfuscate. The thing I liked about their process most is that it also adds “noise” to the exception table. It makes it a bit more confusing regarding the method exception handling. Their flow obfuscator strength is quite good and can be configured between 3 possible levels (light,medium and aggressive). Another feature I liked is the fine tuning they provide for debug info stripping (online line numbers, or online local variables or both). Klass Master doesn’t provide any
dedicated Android flags or anti-tamper methods. Their licensing model is quite simple: a text file to be placed near the KlassMaster main JAR. They also support incremental obfuscation.
JFuscator from secureTeam.net : While secureTeam also has a .Net tool , I focus on their Java tool capabilities. Their (Swing based) GUI tool seems nice but it crash when trying the simplest obfuscation task. the error was always the same: Error reading '/opt/sun-jdk1.7.0_55/jre\lib\rt.jar'. Reason: ''/opt/sun-jdk1.7.0_55/jre\lib\rt.jar': no such file or directory' . Now of course I have my Java installed in /opt/sun-jdk1.7.0_55/jre. You can image that they simply didn’t expect linux back slash structure. I contacted secureTeam.net support by email with the minor “path” problem. They asked if I am a linux user and after I replied I am , they never answered my email. I also tried their web site on-line chat : no response. So there I stopped testing. Without further results, I couldn’t examine the obfuscated bytecode quality. From their web site it seems they have anti-tamper method , String manipulation, method renaming and few other features.
GuartIt4J (by Arxan.com) : Arxan is fairly solid player in the mobile environment and as such they offer Android obfuscator which of course works well for Java. They have one of the most flexible engines.They provide code obfuscation,string encryption and alike You can define the complexity of code obfuscation. it is simply an integer. the higher - the longer your method turns out. ofcourse, you must be carefull not to exceed the JVM 64KB limit per class… As I said before one of the best strategies to hide a sensitive code is not to encrypt it , but to inject it into huge pile of garbage. This is exactly what GuardIt does. It can also explode in the same way the methods exception table. I managed to create a method with 100 exceptions in its exception table (pre-obfuscator it was 5). what they miss: their re-trace program is not part of the supplied main JAR. Nevertheless, they were kind enough to send me a sample Java program that performs re-trace given the reverse mapping file and the log. They don’t support incremental obfuscation and no flexibility regarding debug information. Debug information stripping is either all or nothing. watching the output JAR you will tons of conditions and jumps that were injected. Bare in mind , exploding the class size has its performance hit. In some methods I measured almost 50% performance hit when applying long obfuscation (no I/O in those methods). so extrapolating the code comes with a price.(from a 400 opcodes - I went up to 2200 opcodes after obfuscation). JD-GUI , my de-compiler failed to open such classes and crashed (IndexOutOfBoundException). They also supply complete class encryption . Meaning the class is encrypted with some symetrical key which demands a special (or custom written) class loader to open it in memory. This is an anti-tamper mechanism as well as hiding code. Just remember that a JVM can’t run that class without the class loader help. Its a nice feature, but the secret key and the bootstrap loader JAR are probably there. If he got the encrypted JAR the hacker will eventually get his hands and decrypt the classes. Yet this another level of obstacle the common hacker will need to pass. What I didn’t like here is the license file policy: is bounded to CPUid or need to install a floating license server.
SecureIt (by Allatori.com) : SecureIt offers all the general code obfuscation, string encryption ,renaming and such. On top of the standard obfuscation methods they also offer some kind of water-marking which is an anti-tamper/pirating method. They support Android and JavaME (who uses ME these days?!). They support incremental obfuscation. The one thing to note about configuring SecureIt: it is all command line. No GUI tool this time. Personally , I don’t mind command line tools as long as they come with good documentation. Luckily they have a very good documentation and a rich API with many flags to tune if you wish. you can re-trace with they tool (also a command line ) . They can’t obfuscate the exception table. I didn’t check their licensing mechanism.
DashO (by Preemptive.com) : DashO obfuscator will be remembered probably as the best UI tool you can get (to create your configuration). Like SecureIt they lake the exception table obfuscation but they have all the rest of the required features (as well as CLI, Spring framework and gradle/ant integration, and even an eclipse plugin) . Well, they do document a try-catch obfuscator (which is same as exception table obfuscator) , but it is only a recommendation to the engine. When I tried it , it had nil effect on the exception table. As I said , the GUI tool is superb and has a re-trace embedded into it. they also offer some kind of application signing and water-marking as an anti-tamper/pirating mechanism. DashO provides superb Android integration and also combine in their product a door for analytics uploads. You can actually track your application. Injecting crash log uploaders and reporting code to your JAR. Nevertheless that’s not the scope of obfuscation - that’s a whole different code injection product. They have a very good support. both online and by phone. Their licensing scheme is based on monthly subscription or one time purchase payment. A bit different than others. They are using a floating license server to support large environments.
I hope this helps a bit..
Can we completely reverse-engineer the source code from java bytecode ?
Not completely, because some aspects of source code, such as whitespace, local variable names, and comments, are not preserved in bytecode. Otherwise, yes -- while you can't get the exact same source code out, you can almost always get something that can at least be compiled back to the same bytecode.
Why this feature is allowed in Java
It's not so much "allowed" as it is "not prevented". And it's not prevented because doing so is impossible -- the code must be runnable to be useful; if the code is runnable, then it is analyzable; if it is analyzable, then with sufficient analysis it can be converted back to source.
How successful are java decompilers against obfuscators?
Not very. Most obfuscators I've seen (esp. ProGuard) are primarily effective in removing meaningful function and class names; obfuscating the logic itself is not typically attempted.
you can get source code from binary these days. Although the source code obtained by Java's bytecode is more readable, obfuscating will make it slightly unreadable. Its not that only Java can be reverse engineered to code. Even C/C++ these days (with Hexrays plugin for IDA Pro) can be decompiled to source. Obfuscaters will make it hard to read but not impossible. There is nothing that can save your program from an intelligent and capable reverse engineer. :).
Good luck.
Can we completely reverse-engineer the source code from java bytecode
?
The java class file is based on a spec so anyone can read into it. A tool like JD-GUI will tear into your source code easily. It is not a 'feature' per se. While 100% reverse-engineering is not possible, most of your code can be reverse engineered.
How successful are java decompilers against obfuscators?
Depends. The point of the obfuscator is to remove any meaningful names and try to introduce confusion in the code without impacting performance. Most developers are great at obfuscating code themselves :) Pro-guard is pretty good at obfuscation.

Compiling Scheme using Java

I was writing a Scheme interpreter (trying to be fully R5RS compatible) and it just struck me that compiling into VM opcodes would make it faster. (Correct me if I am wrong.) I can interpret the Scheme source code in the memory, but I am stuck at understanding code generation.
My question is: What patterns will be required to generate opcodes from a parse tree, for, say, the JVM or any other VM (or even a real machine)? And what, if any, will be the complications, advantages, or disadvantage of doing so?
For Scheme there will be two major complications related to JVM.
First, JVM does not support explicit tail calls annotations, therefore you won't be able to guarantee a proper tail recursion as required by R5RS (3.5) without resorting to an expensive mini-interpreter trick.
The second issue is with continuations support. JVM does not provide anything useful for implementing continuations, so again you're bound to use a mini-interpreter. I.e., each CPS trivial function should return a next closure, which will be then called by an infinite mini-interpreter loop.
But still there are many interesting optimisation possibilities. I'd recommend to take a look at Bigloo (there is a relatively fast JVM backend) and Kawa. For the general compilation techniques take a look at Scheme in 90 minutes.
And still, interpretation is a viable alternative to compilation (at least on JVM, due to its severe limitations and general inefficiency). See how SISC is implemented, it is quite an interesting and innovative approach.

Java, JIT and Garbage Collector efficiency

I want to know about the efficiency of Java and the advantages and disadvantages of Java Virtual Machine and Android.
Efficiency is the low use of memory, low use of the processor and fast execution.
Mobile devices are simpler than PC, then the apps need to be more efficient. Servers receive many connections and they need to be very efficient. Many mobile devices use Android and Java apps, and many servers use PHP.
Can Java and interpreted languages, such as Java Script, Python and PHP, be more efficient than C and C++?
JIT (just in time) advantages:
It can optimize better, because it knows the value of some variables and where it is used or changed.
It knows the processor and can optimize with processor specific instructions.
It is easier to transform functions into inline function.
It can remove known conditional tests and remove blocks that will not be run.
Java disadvantages:
When the app run for the first time, the app will be very slow, because the bytecodes will be interpreted and JIT compiler will do many analysis to find good optimizations. The apps cannot use the maximum of the hardware power. If an app is a game or a real-time app, if it be run for the first successfully and with no delay, but it uses the maximum of the hardware power, then the next time the app be run, it will not use the maximum of the hardware power due to optimizations. The problem is the app cannot be designed to use the maximum of the hardware power after the optimization, because it will be too slow on the first run, and will not continue to run.
Java checks if the array index is not out of bounds, and it checks if the pointers are not null. It will add several internal "if"s to generated code.
All objects use garbage collector, including objects that are very easy to manually delete.
All instances of objects are created with dynamic memory allocation, including objects that can easily use the stack. If a loop iteration begins creating an instance of a class and ends deleting the created object, dynamic memory allocation will be inefficient.
The garbage collector needs to stop the app while it cleans the memory and it is very undesired for games, GUI apps and real-time apps. Reference counting is slow and it cannot handle circular references. Multi-threaded garbage collector is slower and it needs more use of the CPU.
Can Java and interpreted languages, such as Java Script, Python and PHP, be more efficient than C and C++?
It's very difficult to get more efficient than the best C and C++ programs. There's a lot of C and C++ programs that are nowhere near as efficient as that though, and beating them with (modern) Java code is quite practical if you're any good.
I've also heard good things about the current best-of-breed Javascript engines, but I've never studied them in detail.
With Python and PHP (and many other languages besides) it's a bit different. Those languages are written in C, so it's obvious they cannot be more efficient than C (follows by construction). Yet it's much easier to write efficient code in them (i.e., that uses what is in-effect a very well-written C library) than it is to start from scratch. In particular, it reduces the number of defects per program. That's a very important metric in practice; anyone can produce fast code if it's allowed to be wrong.
In general, I advise not worrying about getting maximal efficiency. You run up against the law of diminishing returns. Instead, use sensible overall algorithms (or, as a friend of mine once said to me, “look after the big O()s and let the constant factors look after themselves”) and focus on the question of whether the program is good enough in practice. Once it is, stop fiddling around and ship it!
Let's pick apart your claimed disadvantages:
When the app run for the first time, the app will be very slow, because the bytecodes will be interpreted and JIT compiler will do many analysis to find good optimizations. The apps cannot use the maximum of the hardware power.
JIT compilation is an implementation issue. Not all platforms do it. Indeed, the Android platform could be modified to 1) do ahead of time compilation, or 2) cache the native code produced by the JIT to give faster startup next time you run the app.
It is interesting that various Java vendors have tried these strategies at various times, and yet the empirical evidence is that plain JIT is the best strategy.
Java checks if the array index is not out of bounds, and it checks if the pointers are not null. It will add several internal "if"s to generated code.
The JIT compiler can optimize away many of these tests. For the rest, the overheads tend to be relatively small; e.g. a few percent difference ... not a factor of 2.
Note that the alternative to checking is the risk that typical application bugs will crash the android platform. Certainly, garbage collection becomes problematic if applications can trash memory.
All objects use garbage collector, including objects that are very easy to manually delete.
The flip-side is that it is easy to forget to delete objects, delete objects twice, use them after they have been deleted and so on. These mistakes all lead to bugs that tend to be hard to track down.
All instances of objects are created with dynamic memory allocation, including objects that can easily use the stack. If a loop iteration begins creating an instance of a class and ends deleting the created object, dynamic memory allocation will be inefficient.
Java dynamic memory allocation and object creation is FAST. Faster than in C++ for example.
The garbage collector needs to stop the app while it cleans the memory and it is very undesired for games, GUI apps and real-time apps.
Use a concurrent / low-pause garbage collector then. Another approach is to implement your app to not generate lots of garbage ... and seldom trigger garbage collection.
Reference counting is slow and it cannot handle circular references.
No decent Java GC uses reference counting. (On the other hand, a lot of C / C++ manual memory management schemes do. For instance, so-called smart pointer schemes in C++.)
Multi-threaded garbage collector is slower and it needs more use of the CPU.
You actually mean concurrent collection I think. Yes it does, but that's the penalty you pay for the extra responsiveness that you demand for interactive games / realtime apps.
What you describe as 'efficient' I would describe as 'ideal'. An application that requires little memory, little CPU time and runs quickly, put another way, is one that is good, fast, and cheap all at the same time. Never mind if it does anything useful or interesting.
The only comparison I'd view as reasonable, if all three goals are required, is among applications that produce a common result. In that case, it is unlikely, given a competing group of evenly-capable programmers, that any one implementation would excel on all three counts over the others.
That said, your question leaves out a key criterion to the mobile market: rate of application development. Mobile applications also profit far more from positive user experience than back-end optimization. Without that constraint, the question of efficiency as you put it, seems to me more of an ponderous consideration than a practical one.
But to the actual question: can a language like Java produce more efficient code than one that compiles statically to the instruction set of the target machine? Probably not. Can it be as efficient, or efficient enough? Absolutely. If we considered an execution platform with fixed, severely constrained resources that changes infrequently, it would be a different matter.
In any language, the way to get fast execution is to do the job with as little execution as possible, and as little garbage collection as possible.
That sounds like a vacuous generality, but what it means in practice, regardless of language, is
For the data structure design, keep it as simple as possible. Stay away from the fancy collection classes full of bells and whistles. Especially stay away from notifications as a way of keeping data consistent. If your data is normalized, it can never be inconsistent. If you can't normalize it, it's better to tolerate temporary inconsistency, than to try to keep it tight with notifications.
Performance problems creep in, even into the best code. You should try not to make them, but you will still make them. Most important is knowing how to find them, once made, and remove them. Here's a blow-by-blow example. If in doing this, you find you need a better big-O algorithm, then put it in. Putting one in without being sure it's needed is a recipe for slowness.
No language can rescue a program from non-removed performance problems. The language and its compiler, JITter, etc. are like a race horse. It's fine to want a good horse, but it's a waste if the jockey isn't as slim as possible.
Your program is the jockey, and it's your job to take it on a weight-loss program.
I will paste an interesting answer given by the James Gosling himself in the Book Masterminds of Programming.
Well, I’ve heard it said that
effectively you have two compilers in
the Java world. You have the compiler
to Java bytecode, and then you have
your JIT, which basically recompiles
everything specifically again. All of
your scary optimizations are in the
JIT.
James: Exactly. These days we’re
beating the really good C and C++
compilers pretty much always. When you
go to the dynamic compiler, you get
two advantages when the compiler’s
running right at the last moment. One
is you know exactly what chipset
you’re running on. So many times when
people are compiling a piece of C
code, they have to compile it to run
on kind of the generic x86
architecture. Almost none of the
binaries you get are particularly well
tuned for any of them. You download
the latest copy of Mozilla,and it’ll
run on pretty much any Intel
architecture CPU. There’s pretty much
one Linux binary. It’s pretty generic,
and it’s compiled with GCC, which is
not a very good C compiler.
When HotSpot runs, it knows exactly
what chipset you’re running on. It
knows exactly how the cache works. It
knows exactly how the memory hierarchy
works. It knows exactly how all the
pipeline interlocks work in the CPU.
It knows what instruction set
extensions this chip has got. It
optimizes for precisely what machine
you’re on. Then the other half of it
is that it actually sees the
application as it’s running. It’s able
to have statistics that know which
things are important. It’s able to
inline things that a C compiler could
never do. The kind of stuff that gets
inlined in the Java world is pretty
amazing. Then you tack onto that the
way the storage management works with
the modern garbage collectors. With a
modern garbage collector, storage
allocation is extremely fast.

Why doesn't the JVM cache JIT compiled code?

The canonical JVM implementation from Sun applies some pretty sophisticated optimization to bytecode to obtain near-native execution speeds after the code has been run a few times.
The question is, why isn't this compiled code cached to disk for use during subsequent uses of the same function/class?
As it stands, every time a program is executed, the JIT compiler kicks in afresh, rather than using a pre-compiled version of the code. Wouldn't adding this feature add a significant boost to the initial run time of the program, when the bytecode is essentially being interpreted?
Without resorting to cut'n'paste of the link that #MYYN posted, I suspect this is because the optimisations that the JVM performs are not static, but rather dynamic, based on the data patterns as well as code patterns. It's likely that these data patterns will change during the application's lifetime, rendering the cached optimisations less than optimal.
So you'd need a mechanism to establish whether than saved optimisations were still optimal, at which point you might as well just re-optimise on the fly.
Oracle's JVM is indeed documented to do so -- quoting Oracle,
the compiler can take advantage of
Oracle JVM's class resolution model to
optionally persist compiled Java
methods across database calls,
sessions, or instances. Such
persistence avoids the overhead of
unnecessary recompilations across
sessions or instances, when it is
known that semantically the Java code
has not changed.
I don't know why all sophisticated VM implementations don't offer similar options.
An updated to the existing answers - Java 8 has a JEP dedicated to solving this:
=> JEP 145: Cache Compiled Code. New link.
At a very high level, its stated goal is:
Save and reuse compiled native code from previous runs in order to
improve the startup time of large Java applications.
Hope this helps.
Excelsior JET has a caching JIT compiler since version 2.0, released back in 2001. Moreover, its AOT compiler may recompile the cache into a single DLL/shared object using all optimizations.
I do not know the actual reasons, not being in any way involved in the JVM implementation, but I can think of some plausible ones:
The idea of Java is to be a write-once-run-anywhere language, and putting precompiled stuff into the class file is kind of violating that (only "kind of" because of course the actual byte code would still be there)
It would increase the class file sizes because you would have the same code there multiple times, especially if you happen to run the same program under multiple different JVMs (which is not really uncommon, when you consider different versions to be different JVMs, which you really have to do)
The class files themselves might not be writable (though it would be pretty easy to check for that)
The JVM optimizations are partially based on run-time information and on other runs they might not be as applicable (though they should still provide some benefit)
But I really am guessing, and as you can see, I don't really think any of my reasons are actual show-stoppers. I figure Sun just don't consider this support as a priority, and maybe my first reason is close to the truth, as doing this habitually might also lead people into thinking that Java class files really need a separate version for each VM instead of being cross-platform.
My preferred way would actually be to have a separate bytecode-to-native translator that you could use to do something like this explicitly beforehand, creating class files that are explicitly built for a specific VM, with possibly the original bytecode in them so that you can run with different VMs too. But that probably comes from my experience: I've been mostly doing Java ME, where it really hurts that the Java compiler isn't smarter about compilation.

How to lock compiled Java classes to prevent decompilation?

How do I lock compiled Java classes to prevent decompilation?
I know this must be very well discussed topic on the Internet, but I could not come to any conclusion after referring them.
Many people do suggest obfuscator, but they just do renaming of classes, methods, and fields with tough-to-remember character sequences but what about sensitive constant values?
For example, you have developed the encryption and decryption component based on a password based encryption technique. Now in this case, any average Java person can use JAD to decompile the class file and easily retrieve the password value (defined as constant) as well as salt and in turn can decrypt the data by writing small independent program!
Or should such sensitive components be built in native code (for example, VC++) and call them via JNI?
Some of the more advanced Java bytecode obfuscators do much more than just class name mangling. Zelix KlassMaster, for example, can also scramble your code flow in a way that makes it really hard to follow and works as an excellent code optimizer...
Also many of the obfuscators are also able to scramble your string constants and remove unused code.
Another possible solution (not necessarily excluding the obfuscation) is to use encrypted JAR files and a custom classloader that does the decryption (preferably using native runtime library).
Third (and possibly offering the strongest protection) is to use native ahead of time compilers like GCC or Excelsior JET, for example, that compile your Java code directly to a platform specific native binary.
In any case You've got to remember that as the saying goes in Estonian "Locks are for animals". Meaning that every bit of code is available (loaded into memory) during the runtime and given enough skill, determination and motivation, people can and will decompile, unscramble and hack your code... Your job is simply to make the process as uncomfortable as you can and still keep the thing working...
As long as they have access to both the encrypted data and the software that decrypts it, there is basically no way you can make this completely secure. Ways this has been solved before is to use some form of external black box to handle encryption/decryption, like dongles, remote authentication servers, etc. But even then, given that the user has full access to their own system, this only makes things difficult, not impossible -unless you can tie your product directly to the functionality stored in the "black box", as, say, online gaming servers.
Disclaimer: I am not a security expert.
This sounds like a bad idea: You are letting someone encrypt stuff with a 'hidden' key that you give him. I don't think this can be made secure.
Maybe asymmetrical keys could work:
deploy an encrypted license with a public key to decrypt
let the customer create a new license and send it to you for encryption
send a new license back to the client.
I'm not sure, but I believe the client can actually encrypt the license key with the public key you gave him. You can then decrypt it with your private key and re-encrypt as well.
You could keep a separate public/private key pair per customer to make sure you actually are getting stuff from the right customer - now you are responsible for the keys...
No matter what you do, it can be 'decompiled'. Heck, you can just disassemble it. Or look at a memory dump to find your constants. You see, the computer needs to know them, so your code will need to too.
What to do about this?
Try not to ship the key as a hardcoded constant in your code: Keep it as a per-user setting. Make the user responsible for looking after that key.
#jatanp: or better yet, they can decompile, remove the licensing code, and recompile. With Java, I don't really think there is a proper, hack-proof solution to this problem. Not even an evil little dongle could prevent this with Java.
My own biz managers worry about this, and I think too much. But then again, we sell our application into large corporates who tend to abide by licensing conditions--generally a safe environment thanks to the bean counters and lawyers. The act of decompiling itself can be illegal if your license is written correctly.
So, I have to ask, do you really need hardened protection like you are seeking for your application? What does your customer base look like? (Corporates? Or the teenage gamer masses, where this would be more of an issue?)
If you're looking for a licensing solution, you can check out the TrueLicense API. It's based on the use of asymmetrical keys. However, it doesn't mean your application cannot be cracked. Every application can be cracked with enough effort. What really important is, as Stu answered, figuring out how strong protection you need.
You can use byte-code encryption with no fear.
The fact is that the cited above paper “Cracking Java byte-code encryption” contains a logic fallacy. The main claim of the paper is before running all classes must be decrypted and passed to the ClassLoader.defineClass(...) method. But this is not true.
The assumption missed here is provided that they are running in authentic, or standard, java run-time environment. Nothing can oblige the protected java app not only to launch these classes but even decrypt and pass them to ClassLoader. In other words, if you are in standard JRE you can't intercept defineClass(...) method because the standard java has no API for this purpose, and if you use modified JRE with patched ClassLoader or any other “hacker trick” you can't do it because protected java app will not work at all, and therefore you will have nothing to intercept. And absolutely doesn't matter which “patch finder” is used or which trick is used by hackers. These technical details are a quite different story.
I don't think there exists any effective offline antipiracy method. The videogame industry has tried to find that many times and their programs has always been cracked. The only solution is that the program must be run online connected with your servers, so that you can verify the lincense key, and that there is only one active connecion by the licensee at a time. This is how World of Warcraft or Diablo works. Even tough there are private servers developed for them to bypass the security.
Having said that, I don't believe that mid/large corporations use illegal copied software, because the cost of the license for them is minimal (perhaps, I don't know how much you are goig to charge for your program) compared to the cost of a trial version.
Q: If I encrypt my .class files and use a custom classloader to load and decrypt them on the fly, will this prevent decompilation?
A: The problem of preventing Java byte-code decompilation is almost as old the language itself. Despite a range of obfuscation tools available on the market, novice Java programmers continue to think of new and clever ways to protect their intellectual property. In this Java Q&A installment, I dispel some myths around an idea frequently rehashed in discussion forums.
The extreme ease with which Java .class files can be reconstructed into Java sources that closely resemble the originals has a lot to do with Java byte-code design goals and trade-offs. Among other things, Java byte code was designed for compactness, platform independence, network mobility, and ease of analysis by byte-code interpreters and JIT (just-in-time)/HotSpot dynamic compilers. Arguably, the compiled .class files express the programmer's intent so clearly they could be easier to analyze than the original source code.
Several things can be done, if not to prevent decompilation completely, at least to make it more difficult. For example, as a post-compilation step you could massage the .class data to make the byte code either harder to read when decompiled or harder to decompile into valid Java code (or both). Techniques like performing extreme method name overloading work well for the former, and manipulating control flow to create control structures not possible to represent through Java syntax work well for the latter. The more successful commercial obfuscators use a mix of these and other techniques.
Unfortunately, both approaches must actually change the code the JVM will run, and many users are afraid (rightfully so) that this transformation may add new bugs to their applications. Furthermore, method and field renaming can cause reflection calls to stop working. Changing actual class and package names can break several other Java APIs (JNDI (Java Naming and Directory Interface), URL providers, etc.). In addition to altered names, if the association between class byte-code offsets and source line numbers is altered, recovering the original exception stack traces could become difficult.
Then there is the option of obfuscating the original Java source code. But fundamentally this causes a similar set of problems.
Encrypt, not obfuscate?
Perhaps the above has made you think, "Well, what if instead of manipulating byte code I encrypt all my classes after compilation and decrypt them on the fly inside the JVM (which can be done with a custom classloader)? Then the JVM executes my original byte code and yet there is nothing to decompile or reverse engineer, right?"
Unfortunately, you would be wrong, both in thinking that you were the first to come up with this idea and in thinking that it actually works. And the reason has nothing to do with the strength of your encryption scheme.

Categories