I want to protect some algorithms from being reverse engineered. I know there is always a risk, but I want to make the work as complicated as possible. I know that Java has ProGuard and other obfuscators. But most of the valuable knowledge isn't in the structure of the application; it is in the numerical details of the algorithm. Reading about obfuscation made me doubt how well it can protect the algorithm.
Simply renaming some variables won't make it hard enough to reverse engineer the algorithms. Perhaps you can tell me which methods are more appropriate for algorithms, and which obfuscator does the best job on them.
At the moment I'm thinking about doing a bit of work by hand and combining it with a tool.
Assume that your algorithm is to be executed as Java bytecode on arbitrary JVMs. Then people can hack their JVM to dump the bytecode somewhere, no matter how much you obfuscate the class loading process. Once you have the bytecode, you can do control flow analysis, i.e. decide what information gets passed from where to where.
You can confuse the order of the individual instructions, but that won't change the computation. For someone who simply wants to run your algorithm unmodified, this doesn't change anything. How much a reordering will prevent people from modifying your algorithm very much depends on the algorithm and the complexity of the control flow.
You might be able to confuse the control flow using reflection in some obscure way, or by implementing your own interpreter and using that to run the algorithm. But both these approaches will likely come at a severe penalty to the performance of the algorithm.
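To make the "own interpreter" idea concrete, here is a minimal sketch. The opcode set, the class name TinyVm and the example formula are all invented for illustration; they are not taken from any real obfuscator. The point is only that the formula ends up in a data array instead of in bytecode, and that every operation now goes through a dispatch loop, which is where the performance penalty comes from.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy stack machine: the "algorithm" lives in a data array, not in bytecode. */
final class TinyVm {
    // Hypothetical opcodes, invented for this sketch.
    static final int PUSH_CONST = 0; // followed by an index into the constant table
    static final int PUSH_ARG = 1;   // pushes the input value x
    static final int ADD = 2;        // pops two values, pushes their sum
    static final int MUL = 3;        // pops two values, pushes their product

    static double run(int[] program, double[] constants, double x) {
        Deque<Double> stack = new ArrayDeque<>();
        for (int pc = 0; pc < program.length; pc++) {
            switch (program[pc]) {
                case PUSH_CONST: stack.push(constants[program[++pc]]); break;
                case PUSH_ARG:   stack.push(x); break;
                case ADD:        stack.push(stack.pop() + stack.pop()); break;
                case MUL:        stack.push(stack.pop() * stack.pop()); break;
                default: throw new IllegalStateException("bad opcode at " + pc);
            }
        }
        return stack.pop();
    }

    /** Evaluates 3*x*x + 2, encoded in postfix form. */
    static double example(double x) {
        int[] program = {PUSH_CONST, 0, PUSH_ARG, MUL, PUSH_ARG, MUL, PUSH_CONST, 1, ADD};
        double[] constants = {3.0, 2.0};
        return run(program, constants, x);
    }
}
```

Anyone who dumps the program and constants arrays and reads the dispatch loop can still recover the formula, so this raises the effort rather than removing the risk.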
In other languages (like native x86 code) you might be able to confuse the disassembler by introducing ambiguity about how the bytes should be split into instructions, using some bytes as tail part of an instruction in one case, but as a distinct instruction in other cases. But in Java there is no such option, the meaning of bytecode is too well defined.
One way you might be able to obfuscate things somewhat is by closely intermixing the algorithm with other steps of the program. For a straight-line program, this might make things a wee bit harder to track, in particular if you pass numbers through invisible GUI objects or similar bizarre stuff. But once you require loops or similar, getting the loop bounds lined up seems very hard, so I doubt that this approach has much potential either. And I doubt there is a ready-to-use obfuscator for this, so you'd have to do things by hand.
In my experience you can move the algorithm into a .so file, i.e. a native implementation called from the Java implementation. Combined with obfuscated code it is really hard to trace. The only disadvantage is that you will have to use JNI for that.
From what I understand, obfuscating a java web application will just make it a little harder to read your application, but reverse engineering is still possible.
My goal is just to make it very difficult to read, and not be able to decompile and run (not sure if that's possible, I guess it will still run just with ugly variable names??)
So variable names like:
String username = "asdfsadf";
will become
String aw34Asdf234jkasdjl_asdf2343 = "asdfsdaf";
Is this correct:
public classes and variables will remain unchanged
ONLY private strings/classes/methods can be renamed
string encryption can be used for some sensitive string data like encryption keys etc.
Really my goal is so that someone can't just decompile and release the code.
Web applications run server side. Clients will not see the code unless you mess things up.
There are plenty of good Java obfuscators which will do what you say, and much more. Here are some from google:
ProGuard
yGuard
JODE
Although these will make it much more difficult to read the decompiled code (and some decompilers will refuse to even try), keep in mind that it is always possible for someone to reverse-engineer the code if they have the binary, and are knowledgeable and patient enough.
The problem here is that the code needs to be in proper Java syntax when you compile it. So no matter what obfuscation you applied, if I have access to even just the bytecode I can figure out a way to reconstruct the source.
(http://www.program-transformation.org/Transform/JavaDecompilers#Java_Bytecode_Decompilers)
What you would need to do is keep the proprietary part of the software in such a place that your pirates would not be able to see it. As far as I am aware, that is the ONLY way to avoid hijacking your software.
You cannot prevent java code from being decompiled and run. Even if it is obfuscated, there may be people out there that are still able to figure out what your code is doing, despite the obfuscation. Everything you publish can be reverse engineered.
There exist even much stronger efforts in other languages to prevent decompiling and debugging, disk copy protection solutions for example, and even they get reverse engineered and hacked frequently.
If you don't want people to reverse engineer your code, let it run server side only, don't publish it and try to harden the server as much as possible.
http://www.excelsior-usa.com/protect-java-web-applications.html
Disclaimer: I work for Excelsior.
http://www.arxan.com/products/server/guardit-for-java/
Disclaimer: I don't work for Arxan.
No amount of obfuscation can protect you against "decompile & compile again" (without trying to understand what the code does). Decompilers don't care for unreadable variable names, nor do compilers.
Incidentally, if someone has access to your code, they don't need to decompile it to use it.
So the question is really: What do you want to achieve? When you know that then you can go to the next question: How much does it cost and how much money can I earn?
Usually, that equation is: You can't save/earn any money from obfuscation but doing it costs you time and money (good obfuscators aren't free). So it's a negative ROI.
Instead, try this approach: Create a great product (so people will feel it's justified to pay for it), fix bugs quickly (-> the thieves have to steal your work again and again just to keep up), add new features. That way, honest consumers have reason to buy from you.
If you plan to get money from thieves and criminals, well, forget it. They don't want to pay you, no matter what. You can make their lives a little bit harder but at a cost.
I was writing a Scheme interpreter (trying to be fully R5RS compatible) and it just struck me that compiling into VM opcodes would make it faster. (Correct me if I am wrong.) I can interpret the Scheme source code in the memory, but I am stuck at understanding code generation.
My question is: What patterns will be required to generate opcodes from a parse tree, for, say, the JVM or any other VM (or even a real machine)? And what, if any, will be the complications, advantages, or disadvantage of doing so?
For Scheme there will be two major complications related to JVM.
First, the JVM does not support explicit tail call annotations, therefore you won't be able to guarantee proper tail recursion as required by R5RS (3.5) without resorting to an expensive mini-interpreter trick.
The second issue is with continuations support. JVM does not provide anything useful for implementing continuations, so again you're bound to use a mini-interpreter. I.e., each CPS trivial function should return a next closure, which will be then called by an infinite mini-interpreter loop.
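The mini-interpreter loop described above is often called a trampoline. A minimal sketch in Java follows; the names Bounce and TailCallDemo are made up for this example, and a real Scheme backend would generate such code rather than have it written by hand. Each tail call returns a closure instead of calling directly, and a plain loop keeps invoking the returned closure, so the Java stack does not grow.

```java
import java.util.function.Supplier;

/** A step is either a finished result or a closure producing the next step. */
final class Bounce<T> {
    final T value;                  // set when the computation is done
    final Supplier<Bounce<T>> next; // set when there is more work to do

    private Bounce(T value, Supplier<Bounce<T>> next) {
        this.value = value;
        this.next = next;
    }

    static <T> Bounce<T> done(T value) { return new Bounce<>(value, null); }

    static <T> Bounce<T> more(Supplier<Bounce<T>> next) { return new Bounce<>(null, next); }

    /** The mini-interpreter: keep calling the returned closure until done. */
    static <T> T run(Bounce<T> step) {
        while (step.next != null) {
            step = step.next.get();
        }
        return step.value;
    }
}

final class TailCallDemo {
    /** Tail-recursive sum of 1..n; the tail call is returned as a closure. */
    static Bounce<Long> sum(long n, long acc) {
        if (n == 0) return Bounce.done(acc);
        return Bounce.more(() -> sum(n - 1, acc + n));
    }
}
```

Calling Bounce.run(TailCallDemo.sum(1_000_000L, 0L)) runs a million "tail calls" at constant stack depth. The price is one closure allocation and one indirect call per step, which is the expense mentioned above.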
But still there are many interesting optimisation possibilities. I'd recommend to take a look at Bigloo (there is a relatively fast JVM backend) and Kawa. For the general compilation techniques take a look at Scheme in 90 minutes.
And still, interpretation is a viable alternative to compilation (at least on JVM, due to its severe limitations and general inefficiency). See how SISC is implemented, it is quite an interesting and innovative approach.
I've looked at the related threads on StackOverflow and Googled with not much luck. I'm also very new to Java (I'm coming from a C# and .NET background) so please bear with me. There is so much available in the Java world it's pretty overwhelming.
I'm starting on a new Java-on-Linux project that requires some heavy and highly repetitious numerical calculations (i.e. statistics, FFT, Linear Algebra, Matrices, etc.). So maximizing the performance of the mathematical operations is a requirement, as is ensuring the math is correct. So hence I have an interest in finding a Java library that perhaps leverages native acceleration such as MKL, and is proven (so commercial options are definitely a possibility here).
In the .NET space there are highly optimized and MKL accelerated commercial Mathematical libraries such as Centerspace NMath and Extreme Optimization. Is there anything comparable in Java?
Most of the math libraries I have found for Java either do not seem to be actively maintained (such as Colt) or do not appear to leverage MKL or other native acceleration (such as Apache Commons Math).
I have considered trying to leverage MKL directly from Java myself (e.g. JNI), but me being new to Java (let alone interoperating between Java and native libraries) it seemed smarter finding a Java library that has already done this correctly, efficiently, and is proven.
Again I apologize if I am mistaken or misguided (even in regarding any libraries I've mentioned) and my ignorance of the Java offerings. It's a whole new world for me coming from the heavily commercialized Microsoft stack so I could easily be mistaken on where to look and regarding the Java libraries I've mentioned. I would greatly appreciate any help or advice.
For things like FFT (bulk operations on arrays), the range check in java might kill your performance (at least recently it did). You probably want to look for libraries which optimize the provability of their index bounds.
According to the HotSpot spec:
"The Java programming language specification requires array bounds checking to be performed with each array access. An index bounds check can be eliminated when the compiler can prove that an index used for an array access is within bounds."
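As a rough illustration of what "provable" can mean, compare the two loops below. This is only a sketch: whether a particular JVM version actually removes the check in the first loop is an assumption you would have to confirm by measuring, and the class and method names are invented.

```java
final class BoundsCheckDemo {
    // The index runs from 0 to a.length - 1, so a compiler can reason
    // that every access a[i] is within bounds.
    static double sum(double[] a) {
        double total = 0.0;
        for (int i = 0; i < a.length; i++) {
            total += a[i];
        }
        return total;
    }

    // The index comes from another array, so its range is not known in
    // advance and the check on a[idx[i]] generally has to stay.
    static double gather(double[] a, int[] idx) {
        double total = 0.0;
        for (int i = 0; i < idx.length; i++) {
            total += a[idx[i]];
        }
        return total;
    }
}
```

FFT code tends to look more like the second loop (strided or bit-reversed indices), which is one reason range checks can hurt there.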
I would actually look at JNI, and do your bulk operations there if they are individually very large. The longer the operation takes (i.e. solving a large linear system, or a large FFT), the more it's worth it to use JNI (even if you have to memcpy there and back).
Personally, I agree with your general approach, offloading the heavyweight maths from Java to a commercial-grade library.
Googling around for Java / MKL integration I found this so what you propose is technically possible. Another option to consider would be the NAG libraries. I use the MKL all the time, though I program in Fortran so there are no integration issues. I can certainly recommend their quality and performance. We tested, for instance, the MKL version of FFTW against a version we built from sources ourselves. The MKL implementation was faster by a small integer multiple.
If you have concerns about the performance of calling a library through JNI, then you should plan to structure your application to make fewer larger calls in preference to more smaller ones. As to the difficulties of using JNI, my view (I've done some JNI programming) is that the initial effort you have to make in learning how to use the interface will be well rewarded.
I note that you don't seem to be overwhelmed yet with suggestions of what Java maths libraries you could use. Like you I would be suspicious of research-quality, low-usage Java libraries trawled from the net.
You'd probably be better off avoiding them, I think. I could be wrong, as this isn't an area I'm too familiar with, so don't take too much from this unless a few others agree with me. But calling through JNI has quite a large overhead, since each call has to leave the JRE. Unless you group a lot of work together into a single function call, the slight benefit of the external library will be hugely outweighed by the cost of calling it. I'd give up looking for an MKL library and find an optimized pure Java library. I can't say I know of any better than the standard one to recommend though, sorry.
I'm about to port a smallish library from Java to Python and wanted some advice (smallish ~ a few thousand lines of code). I've studied the Java code a little, and noticed some design patterns that are common in both languages. However, there were definitely some Java-only idioms (singletons, etc) present that are generally not-well-received in Python-world.
I know at least one tool (j2py) exists that will turn a .java file into a .py file by walking the AST. Some initial experimentation yielded less than favorable results.
Should I even be considering using an automated tool to generate some code, or are the languages different enough that any tool would create enough re-work to have justified writing from scratch?
If tools aren't the devil, are there any besides j2py that can at least handle same-project import management? I don't expect any tool to match 3rd party libraries from one language to a substitute in another.
If it were me, I'd consider doing the work by hand. A couple thousand lines of code isn't a lot of code, and by rewriting it yourself (rather than translating it automatically), you'll be in a position to decide how to take advantage of Python idioms appropriately. (FWIW, I worked Java almost exclusively for 9 years, and I'm now working in Python, so I know the kind of translation you'd have to do.)
Code is always better the second time you write it anyway....
Plus a few thousand lines of Java can probably be translated into a few hundred of Python.
Have a look at Jython. It can fairly seamlessly integrate Python on top of Java, and provide access to Java libraries but still let you act on them dynamically.
Automatic translators (f2c, j2py, whatever) normally emit code you wouldn't want to touch by hand. This is fine when all you need to do is use the output (for example, if you have a C compiler and no Fortran compiler, f2c allows you to compile Fortran programs), but terrible when you need to do anything to the code afterwards. If you intend to use this as anything other than a black box, translate it by hand. At that size, it won't be too hard.
I would write it again by hand. I don't know of any automated tools that would generate non-disgusting looking Python, and having ported Java code to Python myself, I found the result was both higher quality than the original and considerably shorter.
You gain quality because Python is more expressive (for example, anonymous inner class MouseAdapters and the like go away in favor of simple first class functions), and you also gain the benefit of writing it a second time.
It also is considerably shorter: for example, 99% of getters/setters can just be left out in favor of directly accessing the fields. For the other 1% which actually do something you can use property().
However as David mentioned, if you don't ever need to read or maintain the code, an automatic translator would be fine.
Jython's not what you're looking for in the final solution, but it will make the porting go much smoother.
My approach would be:
If there are existing tests (unit or otherwise), rewrite them in Jython (using Python's unittest)
Write some characterization tests in Jython (tests that record the current behavior)
Start porting class by class:
For each class, subclass it in Jython and port the methods one by one, making the method in the superclass abstract
After each change, run the tests!
You'll now have working Jython code that hopefully has minimal dependencies on Java.
Run the tests in CPython and fix whatever's left.
Refactor - you'll want to Pythonify the code, probably simplifying it a lot with Python idioms. This is safe and easy because of the tests.
I've done this in the past with great success.
I've used Java2Python. It's not too bad, you still need to understand the code as it doesn't do everything correctly, but it does help.
How do I lock compiled Java classes to prevent decompilation?
I know this must be very well discussed topic on the Internet, but I could not come to any conclusion after referring them.
Many people suggest obfuscators, but those just rename classes, methods, and fields to tough-to-remember character sequences. What about sensitive constant values?
For example, you have developed the encryption and decryption component based on a password based encryption technique. Now in this case, any average Java person can use JAD to decompile the class file and easily retrieve the password value (defined as constant) as well as salt and in turn can decrypt the data by writing small independent program!
Or should such sensitive components be built in native code (for example, VC++) and call them via JNI?
Some of the more advanced Java bytecode obfuscators do much more than just class name mangling. Zelix KlassMaster, for example, can also scramble your code flow in a way that makes it really hard to follow and works as an excellent code optimizer...
Also many of the obfuscators are also able to scramble your string constants and remove unused code.
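To show roughly what string scrambling amounts to, here is a hand-written sketch. The XOR scheme, the key and the class name are assumptions made up for this example; real obfuscators use their own (usually more elaborate) encodings and inject the decoding calls automatically.

```java
import java.nio.charset.StandardCharsets;

final class ScrambledStrings {
    private static final byte KEY = 0x5A; // hypothetical key, stored in the class itself

    /** Done at build time: turns a readable constant into opaque bytes. */
    static byte[] scramble(String plain) {
        byte[] bytes = plain.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] ^= KEY;
        }
        return bytes;
    }

    /** Done at run time, just before the string is needed. */
    static String unscramble(byte[] scrambled) {
        byte[] bytes = scrambled.clone();
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] ^= KEY;
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

The constant no longer shows up as readable text in the decompiled class, but the decoding routine and its key ship with the program, so anyone who finds unscramble can call it themselves.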
Another possible solution (not necessarily excluding the obfuscation) is to use encrypted JAR files and a custom classloader that does the decryption (preferably using native runtime library).
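A custom classloader of this kind could look roughly like the sketch below. The single-byte XOR is only a stand-in for a real cipher, and the in-memory map stands in for reading entries from an encrypted JAR; both are assumptions made to keep the example short.

```java
import java.util.Map;

final class DecryptingClassLoader extends ClassLoader {
    private final Map<String, byte[]> encryptedClasses; // class name -> encrypted bytes
    private final byte key;

    DecryptingClassLoader(ClassLoader parent, Map<String, byte[]> encryptedClasses, byte key) {
        super(parent);
        this.encryptedClasses = encryptedClasses;
        this.key = key;
    }

    /** Stand-in for real decryption; XOR is its own inverse. */
    static byte[] xor(byte[] data, byte key) {
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (byte) (data[i] ^ key);
        }
        return out;
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        byte[] encrypted = encryptedClasses.get(name);
        if (encrypted == null) {
            throw new ClassNotFoundException(name);
        }
        byte[] plain = xor(encrypted, key);
        // At this point the original bytecode sits in memory in the clear.
        return defineClass(name, plain, 0, plain.length);
    }
}
```

Note that the decrypted bytes have to be handed to defineClass in plain form, which is why doing the decryption in a native library only moves the place an attacker has to look.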
Third (and possibly offering the strongest protection) is to use native ahead of time compilers like GCC or Excelsior JET, for example, that compile your Java code directly to a platform specific native binary.
In any case, you've got to remember that, as the saying goes in Estonian, "Locks are for animals". Meaning that every bit of code is available (loaded into memory) during the runtime and given enough skill, determination and motivation, people can and will decompile, unscramble and hack your code... Your job is simply to make the process as uncomfortable as you can and still keep the thing working...
As long as they have access to both the encrypted data and the software that decrypts it, there is basically no way you can make this completely secure. Ways this has been solved before is to use some form of external black box to handle encryption/decryption, like dongles, remote authentication servers, etc. But even then, given that the user has full access to their own system, this only makes things difficult, not impossible -unless you can tie your product directly to the functionality stored in the "black box", as, say, online gaming servers.
Disclaimer: I am not a security expert.
This sounds like a bad idea: You are letting someone encrypt stuff with a 'hidden' key that you give him. I don't think this can be made secure.
Maybe asymmetrical keys could work:
deploy an encrypted license with a public key to decrypt
let the customer create a new license and send it to you for encryption
send a new license back to the client.
I'm not sure, but I believe the client can actually encrypt the license key with the public key you gave him. You can then decrypt it with your private key and re-encrypt as well.
You could keep a separate public/private key pair per customer to make sure you actually are getting stuff from the right customer - now you are responsible for the keys...
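One common way to use asymmetric keys for licenses is a digital signature rather than encryption: you sign the license text with your private key, and the application only carries the public key to verify it. Note that this is a swapped-in variant of the scheme listed above, not the same thing, and the class name and license format below are made up. A minimal sketch with the standard java.security API:

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

final class LicenseSigner {
    /** Vendor side, done once: create the key pair and keep the private key secret. */
    static KeyPair newKeyPair() throws GeneralSecurityException {
        KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
        generator.initialize(2048);
        return generator.generateKeyPair();
    }

    /** Vendor side: sign the license text with the private key. */
    static byte[] sign(String license, PrivateKey privateKey) throws GeneralSecurityException {
        Signature signer = Signature.getInstance("SHA256withRSA");
        signer.initSign(privateKey);
        signer.update(license.getBytes(StandardCharsets.UTF_8));
        return signer.sign();
    }

    /** Customer side: check the signature using only the public key. */
    static boolean verify(String license, byte[] signature, PublicKey publicKey)
            throws GeneralSecurityException {
        Signature verifier = Signature.getInstance("SHA256withRSA");
        verifier.initVerify(publicKey);
        verifier.update(license.getBytes(StandardCharsets.UTF_8));
        return verifier.verify(signature);
    }
}
```

The customer cannot forge a new license without the private key, but nothing stops someone from patching out the verify call, so this protects the license data and not the program.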
No matter what you do, it can be 'decompiled'. Heck, you can just disassemble it. Or look at a memory dump to find your constants. You see, the computer needs to know them, so your code will need to too.
What to do about this?
Try not to ship the key as a hardcoded constant in your code: Keep it as a per-user setting. Make the user responsible for looking after that key.
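One possible way to do that is to derive the key at run time from a password the user supplies, so nothing secret is compiled into the class file. The sketch below uses PBKDF2 from the standard javax.crypto API; the class name, the iteration count and the key length are assumptions chosen for illustration, not recommendations from the original answer.

```java
import java.security.GeneralSecurityException;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

final class UserKey {
    /**
     * Derives a 256-bit key from a user-supplied password and a salt.
     * The salt is not secret and can be stored next to the encrypted data.
     */
    static byte[] derive(char[] password, byte[] salt) throws GeneralSecurityException {
        PBEKeySpec spec = new PBEKeySpec(password, salt, 100_000, 256);
        try {
            SecretKeyFactory factory = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256");
            return factory.generateSecret(spec).getEncoded();
        } finally {
            spec.clearPassword(); // wipe the spec's internal copy of the password
        }
    }
}
```

Decompiling the class now reveals the algorithm and the iteration count, but not the key, because the password never ships with the code.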
@jatanp: or better yet, they can decompile, remove the licensing code, and recompile. With Java, I don't really think there is a proper, hack-proof solution to this problem. Not even an evil little dongle could prevent this with Java.
My own biz managers worry about this, and I think too much. But then again, we sell our application into large corporates who tend to abide by licensing conditions--generally a safe environment thanks to the bean counters and lawyers. The act of decompiling itself can be illegal if your license is written correctly.
So, I have to ask, do you really need hardened protection like you are seeking for your application? What does your customer base look like? (Corporates? Or the teenage gamer masses, where this would be more of an issue?)
If you're looking for a licensing solution, you can check out the TrueLicense API. It's based on the use of asymmetrical keys. However, that doesn't mean your application cannot be cracked. Every application can be cracked with enough effort. What's really important is, as Stu answered, figuring out how strong a protection you need.
You can use byte-code encryption with no fear.
The fact is that the paper cited above, “Cracking Java byte-code encryption”, contains a logical fallacy. Its main claim is that, before running, all classes must be decrypted and passed to the ClassLoader.defineClass(...) method. But this is not true.
The missing assumption is that the classes are running in an authentic, or standard, Java run-time environment. Nothing obliges the protected Java app to launch these classes, or even to decrypt them and pass them to the ClassLoader, in any other environment. In other words, if you are in a standard JRE you can't intercept the defineClass(...) method because standard Java has no API for this purpose, and if you use a modified JRE with a patched ClassLoader or any other “hacker trick” you can't do it because the protected Java app will not work at all, and therefore you will have nothing to intercept. It doesn't matter at all which “patch finder” is used or which trick the hackers use. Those technical details are quite a different story.
I don't think there exists any effective offline antipiracy method. The videogame industry has tried to find one many times and their programs have always been cracked. The only solution is that the program must run online, connected to your servers, so that you can verify the license key and check that there is only one active connection per licensee at a time. This is how World of Warcraft or Diablo works. Even so, private servers have been developed for them to bypass the security.
Having said that, I don't believe that mid/large corporations use illegally copied software, because the cost of the license for them is minimal (perhaps, I don't know how much you are going to charge for your program) compared to the cost of a trial version.
Q: If I encrypt my .class files and use a custom classloader to load and decrypt them on the fly, will this prevent decompilation?
A: The problem of preventing Java byte-code decompilation is almost as old as the language itself. Despite a range of obfuscation tools available on the market, novice Java programmers continue to think of new and clever ways to protect their intellectual property. In this Java Q&A installment, I dispel some myths around an idea frequently rehashed in discussion forums.
The extreme ease with which Java .class files can be reconstructed into Java sources that closely resemble the originals has a lot to do with Java byte-code design goals and trade-offs. Among other things, Java byte code was designed for compactness, platform independence, network mobility, and ease of analysis by byte-code interpreters and JIT (just-in-time)/HotSpot dynamic compilers. Arguably, the compiled .class files express the programmer's intent so clearly they could be easier to analyze than the original source code.
Several things can be done, if not to prevent decompilation completely, at least to make it more difficult. For example, as a post-compilation step you could massage the .class data to make the byte code either harder to read when decompiled or harder to decompile into valid Java code (or both). Techniques like performing extreme method name overloading work well for the former, and manipulating control flow to create control structures not possible to represent through Java syntax work well for the latter. The more successful commercial obfuscators use a mix of these and other techniques.
Unfortunately, both approaches must actually change the code the JVM will run, and many users are afraid (rightfully so) that this transformation may add new bugs to their applications. Furthermore, method and field renaming can cause reflection calls to stop working. Changing actual class and package names can break several other Java APIs (JNDI (Java Naming and Directory Interface), URL providers, etc.). In addition to altered names, if the association between class byte-code offsets and source line numbers is altered, recovering the original exception stack traces could become difficult.
Then there is the option of obfuscating the original Java source code. But fundamentally this causes a similar set of problems.
Encrypt, not obfuscate?
Perhaps the above has made you think, "Well, what if instead of manipulating byte code I encrypt all my classes after compilation and decrypt them on the fly inside the JVM (which can be done with a custom classloader)? Then the JVM executes my original byte code and yet there is nothing to decompile or reverse engineer, right?"
Unfortunately, you would be wrong, both in thinking that you were the first to come up with this idea and in thinking that it actually works. And the reason has nothing to do with the strength of your encryption scheme.