Patching Java software - java

I'm trying to create a process to patch our current Java application so users only need to download the diffs rather than the entire application. I don't think I need to go as low-level as a binary diff, since most of the jar files are small, so replacing an entire jar file wouldn't be a big deal (maybe 5MB at most).
Are there standard tools for determining which files changed and generating a patch for them? I've seen tools like xdelta and vpatch, but I think they work at a binary level.
I basically want to figure out which files need to be added, replaced, or removed. When I run the patch, it will check the current version of the software (from a registry setting) and ensure the patch is for the correct version. If it is, it will then make the necessary changes. It doesn't sound like this would be too difficult to implement on my own, but I was wondering if other people had already done this. I'm using NSIS as my installer, if that makes any difference.
Thanks,
Jeff

Be careful when doing this--I recommend not doing it at all.
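The biggest problem is public static final constants. When such a field is a compile-time constant (a primitive or String initialized with a constant expression), its value is compiled into every class that uses it, not referenced. This means that even if a Java file doesn't change, its class must be recompiled when the constant changes, or it will still refer to the old value.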
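To make the inlining problem concrete, here is a minimal sketch. The class and field names (Limits, INLINED_MAX, RUNTIME_MAX, Caller) are made up for illustration. The second field shows one way to avoid inlining: if the initializer is not a constant expression (here, a method call), the field is not a compile-time constant and callers read it at run time.

```java
// Hypothetical names, for illustration only.
final class Limits {
    // Compile-time constant: javac copies the value 100 into every class
    // that reads Limits.INLINED_MAX, so those classes keep the old value
    // until they are recompiled.
    static final int INLINED_MAX = 100;

    // Not a compile-time constant (the initializer is a method call), so
    // callers load the field from Limits at run time and would see a
    // patched value without being recompiled themselves.
    static final int RUNTIME_MAX = Integer.parseInt("100");

    private Limits() {}
}

final class Caller {
    // Compiled as the literal 100; no reference to Limits remains here.
    static int inlined() { return Limits.INLINED_MAX; }

    // Compiled as a field read against the Limits class.
    static int runtime() { return Limits.RUNTIME_MAX; }

    public static void main(String[] args) {
        System.out.println(inlined() + " " + runtime());
    }
}
```

If you ship a patched Limits.class with both values changed but leave Caller.class alone, inlined() keeps returning the old number while runtime() picks up the new one.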
You also want to be very careful of changing method signatures--you will get some very subtle bugs if you change a method signature and do not recompile all files that call that method--even if the calling java files don't actually need to change (for instance, change a parameter from an int to a long).
If you decide to go down this path, be ready for some really hard-to-debug errors (generally no traces or significant indications, just strange behavior like the number received not matching the one sent) at customer sites that you cannot duplicate, and a lot of pissed off customers.
Edit (too long for comment):
A binary diff of the class files might work, but I'd assume that some kind of version number or date gets compiled in, so they'd change a little on every compile for no reason. That could be easily tested.
You could adopt some strict development practices, such as not using public final statics (make them private) and never changing method signatures (deprecate instead), but I'm not convinced that I know all the possible problems. I just know the ones we encountered.
Also, binary diffs of the jar files would be useless. You'd have to diff the classes and re-integrate them into the jars (which doesn't sound easy to track).
Can you package your resources separately then minimize your code a bit? Pull out strings (Good for i18n)--I guess I'm just wondering if you could trim the class files enough to always do a full build/ship.
On the other hand, Sun seems to do an okay job of making class files that are completely compatible with the previous JRE release, so they must have guidelines somewhere.

You may want to see if Java WebStart can help you as it is designed to do exactly those things you want to do.
I know that the documentation describes how to create and do incremental updates, but we deploy the whole application as it changes very rarely. It is then an issue of updating the JNLP when ready.

How is it deployed?
On a local network I just leave everything as .class files in a folder. The startup script uses robocopy or rsync to copy from network share to local. If any .class file is different it is synced down. If not, it doesn't sync.
For non-local networks I created my own updater. It downloads a text file of MD5 sums and compares them to the local files. If a file differs, it pulls that file down over HTTP.
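A rough sketch of the comparison step, assuming the manifest uses the plain "checksum, whitespace, relative path" layout that md5sum prints in text mode. ManifestDiff and its method names are my own invention, not part of any existing tool. It also reports removed files, which the question asked about.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: compares two "relative path -> checksum" manifests.
final class ManifestDiff {
    final Set<String> added = new TreeSet<>();
    final Set<String> replaced = new TreeSet<>();
    final Set<String> removed = new TreeSet<>();

    // local = what is installed now, remote = what the new version should contain.
    static ManifestDiff between(Map<String, String> local, Map<String, String> remote) {
        ManifestDiff diff = new ManifestDiff();
        for (Map.Entry<String, String> e : remote.entrySet()) {
            String localSum = local.get(e.getKey());
            if (localSum == null) {
                diff.added.add(e.getKey());            // only in the new version
            } else if (!localSum.equalsIgnoreCase(e.getValue())) {
                diff.replaced.add(e.getKey());         // present in both, content differs
            }
        }
        for (String path : local.keySet()) {
            if (!remote.containsKey(path)) {
                diff.removed.add(path);                // no longer shipped
            }
        }
        return diff;
    }

    // Parses lines of the form "<checksum>  <path>"; blank lines are skipped.
    static Map<String, String> parse(String manifestText) {
        Map<String, String> sums = new HashMap<>();
        for (String line : manifestText.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String[] parts = trimmed.split("\\s+", 2);
            if (parts.length == 2) {
                sums.put(parts[1], parts[0]);
            }
        }
        return sums;
    }
}
```

The updater would then download everything in added and replaced, and delete everything in removed. Computing the local checksums and the HTTP download are left out here.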

A long time ago, the way we solved this was to use the classpath and jar files. Our application was built into a jar file, and it had a launcher jar file. The launcher's classpath had a patch.jar that was read into the classpath before the main application.jar. This meant that we could update the patch.jar to supersede any classes in the main application.
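A minimal sketch of the same ordering trick. We listed the jars on the launcher's classpath; this sketch swaps in a URLClassLoader, which searches its URLs in the order given, so a class found in patch.jar wins over the copy in application.jar. The class name PatchFirstLauncher and the main class com.example.Main are placeholders. It assumes the launcher jar itself does not contain the application classes, because the parent class loader is consulted first.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical launcher: patch.jar is searched before application.jar.
final class PatchFirstLauncher {

    // The order of this array is the class lookup order.
    static URL[] searchOrder(Path installDir) {
        try {
            return new URL[] {
                installDir.resolve("patch.jar").toUri().toURL(),
                installDir.resolve("application.jar").toUri().toURL()
            };
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Bad install directory: " + installDir, e);
        }
    }

    public static void main(String[] args) throws Exception {
        Path installDir = Paths.get(args.length > 0 ? args[0] : ".");
        try (URLClassLoader loader = new URLClassLoader(
                searchOrder(installDir), PatchFirstLauncher.class.getClassLoader())) {
            // "com.example.Main" is a placeholder for the real entry point.
            Class<?> mainClass = loader.loadClass("com.example.Main");
            mainClass.getMethod("main", String[].class)
                     .invoke(null, (Object) new String[0]);
        }
    }
}
```

The warnings in the other answer about inlined constants and changed method signatures still apply to whatever you put in patch.jar.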
However, this was a long time ago. You may be better off using something like the Java Web Start approach, which offers more seamless application updating.

Related

Is there any functional reason for including the version number in the name of a JAR file?

In Java, I often see JAR files named with the version number of the software (jsoup-1.11.2.jar), while others are not (freemarker.jar).
Is this just a best practice/convention, or is there some functional reason for it?
Simple answer: no, this is purely a convention.
Obviously, tooling that checks versions can do so easily when version numbers are part of the file name like this. But there is no generic (for example, JVM-based) tool relying on it.
And beyond that, sometimes this scheme is even counterproductive. In our home-grown build setup we always have to remember to update the build scripts after replacing JAR files, because a new version changes the file name (the version is part of the file name).
Having the version in the name of the file allows you to quickly determine which of the n files you have is the latest. Also if you have no way of determining what the version is from within the program it can be helpful.

Hide a class in a .jar

Whenever I build my app, all classes are (logically) visible in the .jar that comes out of it.
That includes a class that holds the connection information for my MySQL server (for the app to connect to). But I don't want this information to be publicly visible!
How can I "hide" this code or "hide" the class?
Thanks!!
I think you mean you don't want someone to reverse engineer the .class files inside your jar file. There are many decompilers that can do that.
So you would need to obfuscate your code with an obfuscator utility.
The process of obfuscation will convert bytecode into a logically equivalent version that is extremely difficult for decompilers to pick apart. Keep in mind that the decompilation process is extremely complicated and cannot be easily 'tweaked' to bypass obfuscated code. Essentially the process is as follows:
Compile Java source code using a regular compiler (i.e. the JDK)
Run the obfuscator, passing in the compiled class file as a parameter. The result will be a different output file (perhaps with a different extension).
This file, when renamed as a .class file, will be functionally equivalent to the original bytecode. It will not affect performance because a virtual machine will still be able to interpret it.
Here is an article describing this process in more detail and introducing an early obfuscator, Crema:
http://www.javaworld.com/javaworld/javatips/jw-javatip22.html

How to modify the class file?

I was working on the project in eclipse in which I have added this maven dependency for PDFBOX
Maven dependency
<dependency>
<groupId>org.apache.pdfbox</groupId>
<artifactId>pdfbox</artifactId>
<version>1.6.0</version>
</dependency>
And I was getting the error on some pdf file as:
Parsing Error, Skipping Object
java.io.IOException: expected='endstream' actual='' org.apache.pdfbox.io.PushBackInputStream#1b8d77fe
at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:439)
at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:552)
at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:184)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1088)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1053)
at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:74)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)
at org.apache.tika.Tika.parseToString(Tika.java:357)
at edu.uci.ics.crawler4j.crawler.BinaryParser.parse(BinaryParser.java:37)
at edu.uci.ics.crawler4j.crawler.WebCrawler.handleBinary(WebCrawler.java:223)
at edu.uci.ics.crawler4j.crawler.WebCrawler.processPage(WebCrawler.java:460)
at edu.uci.ics.crawler4j.crawler.WebCrawler.run(WebCrawler.java:129)
at java.lang.Thread.run(Thread.java:662)
So when I googled it, I found there was a bug in the BaseParser.java file, and they have provided a patch (https://issues.apache.org/jira/browse/PDFBOX-195) for this Java file only. So my question is: how can I modify just this Java file? I can see the BaseParser.class file in Eclipse, as I have attached the source for that PDFBox issue. Any suggestions will be appreciated.
Given that BaseParser.java is an Apache file, there is absolutely no reason why you cannot download the source, make your changes, and recompile it. I have done this with Apache code in the past. It was pretty straightforward and took me only a few minutes. Remember to submit your fix back to Apache so that it will be included in a future release.
You can:
create a subclass manually (and use it, if that is possible)
download the source, fix it, recompile, and finally overwrite the class in the jar
create a subclass programmatically (using cglib or ASM)
download only BaseParser, stub out all its dependencies (just create empty classes with the needed methods), recompile it, and put it in the jar (or in the JVM's ./ext or ./endorsed directory, if you want)
Generally, one doesn't modify a class file directly, they download the source code and then rebuild the class file with javac. Yes, it is possible to modify class files without doing such a thing; but, patch files are not generally binary patch files, they are generally source code patch files.
Stefanglase has mentioned that the release you are working with should have the patch applied, but there is a small chance that a recent change reintroduced the issue. You might want to verify that you're not solving the wrong problem before you get too deep into it.
In the rare case that you really want to modify a binary, you open it with a hexadecimal editor (hex editor for short). Basically this allows you to set any byte in the file to any value, which means you must have a strong knowledge of the file's internal format, what is allowed and disallowed, and how to make allowable changes that actually implement your expected behavior. In short, you'll be doing a compiler's work by hand.
It can be done, but it is the sort of task that generally requires a lot of knowledge that few people already have, so the cost of learning it and successfully implementing the change is likely much higher than rebuilding from the available patched source. Even for someone who already knows the general principles and techniques, one can't say with certainty that making the change by hand costs less than rebuilding the entire library from patched source.
Good Luck.

how to bundle javascript files into Java application?

JavaScript is executed by my Java application. However, something like the jQuery library is really too long to fit into a String variable. I am able to read jquery.js from a file, but I'm not sure how to package it inside the .jar file.
Loading the .js files is the same as loading any other resource from a jar file. Generally, this is what I do:
For files stored in the root of the jar file:
SomeClass.class.getClassLoader().getResourceAsStream("myFile.js");
For files stored alongside a .class file in the jar:
SomeClass.class.getResourceAsStream("myFile.js");
Both techniques give you an InputStream. This can be turned into a String with a little bit more work. See Read/convert an InputStream to a String.
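For completeness, a small sketch of the "little bit more work". It assumes Java 9 or later (for InputStream.readAllBytes) and that the script is UTF-8 encoded. ScriptResources is just a name I picked for the helper.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for loading bundled .js files as text.
final class ScriptResources {

    // Reads the whole stream as UTF-8 text and closes it afterwards.
    static String toText(InputStream in) {
        try (InputStream stream = in) {
            return new String(stream.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Loads a resource stored at the root of the jar, e.g. "jquery.js".
    static String load(String name) {
        InputStream in = ScriptResources.class.getClassLoader().getResourceAsStream(name);
        if (in == null) {
            // getResourceAsStream returns null rather than throwing when nothing is found.
            throw new IllegalArgumentException("Resource not found on classpath: " + name);
        }
        return toText(in);
    }
}
```

The null check matters: a wrong path or a file that never made it into the jar shows up as a null stream, not as an exception.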
This technique is for when your resource files are in the same jar as your java class files.
There are all sorts of places you can keep your JavaScript sources:
In the CLASSPATH. You fetch them with getResourceAsStream()
In the database. Yes, the database. You fetch them like you'd fetch any other CLOB.
Personally I've used both approaches for different purposes. You can keep your JavaScript files around in your build tree in a way that exactly parallels the way you keep .properties files. Personally I just keep them in with the .java files and then have a build rule to make sure they end up in the .war, but they can really live anywhere your build engine can find them.
The database is a nice place to keep scripts because it makes it much easier for your web application to support a "script portal" that allows dynamic updates. That's an extremely powerful facility to have, especially if you craft the web application so that Javascript modules control some of the more important business logic, because you can deploy updates more-or-less "live" without anything like a deployment operation.
One thing that helps a lot is to create some utility code to "wrap" whatever access path you're using to Javascript (that is, either the Sun "javax.script" stuff, or else the Rhino bindings; at this point in time, personally I'd go with straight Rhino because it really doesn't make much difference one way or the other anyway, and the Sun stuff is stuck with a fairly old and buggy Rhino version that in the current climate will probably not see an update for a while). With a utility wrapper, one of the most important things to do is make it possible for your JavaScript code (wherever it comes from) to import other JavaScript files from your server infrastructure. That way you can develop JavaScript tool libraries (or, of course, adapt open-source libraries) and have your business logic scripts import and use them.

Hacker proofing a jar file

What techniques could I use to make my "jar" file Reverse Engineer proof?
You can't make it reverse engineer proof. If the java runtime can read the instructions, so can the user.
There are obfuscators which make the disassembled code less readable/understandable to make reverse engineering it harder, but you can't make it impossible.
Don't release it.
There is no such thing as hacker proof. Sorry.
EDIT FOR COMMENT:
The unfortunate truth is that no matter what barricade you put in the way, if they honestly want in, they'll get in. Simply because if they're persistent enough they'll be looking at your code at the assembly level. Not a thing on earth you can do about it.
What you can look at doing is obfuscating the code, packing the jar, and merging all external packages into a single jar to make life harder. However, no matter how high the hurdle, my comment in the previous paragraph still applies.
I think this is more about hardening the access path to the jar, more than anything else.
Try to determine what user context will actually be executing the code that will access the .jar. Lock down access to the jar to read-only access from only that user. How you do this will depend on if you're using the jar from a web app or a desktop .exe, and it will also depend on the operating system you're running under.
If possible, sign the jar and validate the signature from the executable code. This will at least tell you if the .jar has been tampered with. You can then have some logic to stop the executing application from using the .jar (and log and display an error). See the jarsigner docs for more information.
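As a rough sketch of validating the signature from code, using only java.util.jar: open the jar with verification on, read every entry to the end (a modified signed entry makes the read throw SecurityException), then check that the entry has code signers. JarSignatureCheck is a made-up name. This only tells you that every entry is signed by someone and unmodified. It does not check that the signer is you, so real code would also compare the signer certificate against the one it expects.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Path;
import java.security.CodeSigner;
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper: true only if every non-META-INF entry is signed and intact.
final class JarSignatureCheck {

    static boolean isFullySigned(Path jarPath) {
        // The second constructor argument turns signature verification on.
        try (JarFile jar = new JarFile(jarPath.toFile(), true)) {
            byte[] buffer = new byte[8192];
            boolean sawContent = false;
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                // Simplification: the manifest and signature files live in META-INF.
                if (entry.isDirectory() || entry.getName().startsWith("META-INF/")) {
                    continue;
                }
                // Signers are only known after the entry has been read completely.
                try (InputStream in = jar.getInputStream(entry)) {
                    while (in.read(buffer) != -1) {
                        // drain the entry
                    }
                }
                CodeSigner[] signers = entry.getCodeSigners();
                if (signers == null || signers.length == 0) {
                    return false;                      // unsigned entry
                }
                sawContent = true;
            }
            return sawContent;
        } catch (IOException | SecurityException e) {
            return false;                              // unreadable or tampered with
        }
    }
}
```

Keep in mind that a determined attacker can also patch out the check itself, so this detects tampering more than it prevents it.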
I have seen one case where a company wrote a custom classloader that could decrypt an encrypted jar file. The classloader itself used compiled JNI code, so that the decryption key and algorithm were fairly deeply obfuscated in the binary library.
You are looking for an "obfuscator" (if you want to ship jars). Many exist:
http://java-source.net/open-source/obfuscators
You should be aware that many obfuscation techniques remove information you may want to keep for troubleshooting purposes (think of the value of a stack trace from an irreproducible situation) or for actual debugging sessions. Regardless of what you do, your quality testing should be done on the jars-to-be-shipped, since the obfuscator may introduce subtle bugs.
If you really want to hide things, consider compiling to a native binary with gcj.
Definitely avoid placing any sensitive data in the code. For example:
passwords
database connection strings
One option would be to encrypt these (using industry-standard encryption routines; avoid rolling your own) and place them in an external configuration file or database.
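A minimal sketch of the external configuration file part, assuming a plain .properties file with made-up keys db.url, db.user and db.password. ExternalDbConfig is a hypothetical name. The sketch deliberately leaves out the encryption step, so on its own it only gets the values out of the jar. Anyone who can read the file can still read the password.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical holder for connection settings kept outside the jar.
final class ExternalDbConfig {
    final String url;
    final String user;
    final String password;

    private ExternalDbConfig(String url, String user, String password) {
        this.url = url;
        this.user = user;
        this.password = password;
    }

    // Reads the settings from a properties file that is not packaged with the code.
    static ExternalDbConfig load(Path file) {
        Properties props = new Properties();
        try (Reader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new ExternalDbConfig(
                required(props, "db.url"),
                required(props, "db.user"),
                required(props, "db.password"));
    }

    // Fails early with a clear message instead of passing null to the driver.
    private static String required(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null || value.isEmpty()) {
            throw new IllegalStateException("Missing setting: " + key);
        }
        return value;
    }
}
```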
As others have stated, any algorithms in deployed code can be reverse-engineered.
Sensitive algorithms could be placed in a web service or other server-side code if desired.
