Reduce external jar file size - java

I've developed a module for a Java project. The module depends on an external library (fastutil). The problem is that the fastutil.jar file is several times heavier than the whole project itself (14 MB). I only use a tiny subset of the classes from the library. The module is now finished, and no one is likely to extend it in future. Is there a way I could extract only the relevant classes into some fastutil_small.jar so that others don't have to download all this extra weight?

Obfuscation tools such as ProGuard usually provide a feature to remove unused classes (and even fields and methods) from the jar file. You have to be careful to verify that everything still works, though, because you might be using reflection to access classes or methods, and ProGuard can't analyze that.
You can use only that feature and already save quite a lot of space.
Or you could combine it with other space-saving obfuscation techniques (such as class and method renaming) to save even more space at the cost of harder debugging (your stack traces will become harder to parse).

From the installation instructions of fastutil:
Note that the jar file is huge, due to the large number of classes: if you plan to ship your own jar with some fastutil classes included, you should look at AutoJar (also available at JPackage) to extract automatically the necessary classes.

As fastutil is LGPL open-source software, you could just copy the relevant source files to your project and drop that jar file. The compiler will then tell you if you have all the files you need. Just keep the packages as they are and put a copy of the fastutil license file on top.

One crude approach is to keep a backup of your original jar and then remove all unused class files from the jar. There may be some internal references to other classes, which you can add back as and when required: if running the program throws a ClassNotFoundException, add that class from the original jar to the trimmed jar.
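To make that trial-and-error loop less painful, one option is a small smoke test that tries to load the classes your module relies on while only the trimmed jar is on the classpath. The following is only a sketch: the class name JarSmokeTest and the idea of passing class names as arguments are made up for illustration. Because the JVM resolves many references lazily (for example those inside method bodies), a clean result does not prove that nothing is missing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of fastutil or ProGuard.
class JarSmokeTest {

    // Returns the names of the classes that cannot be loaded from the classpath.
    static List<String> missingClasses(List<String> classNames) {
        List<String> missing = new ArrayList<>();
        ClassLoader loader = JarSmokeTest.class.getClassLoader();
        for (String name : classNames) {
            try {
                // initialize=false: load the class without running its static initializers.
                // A missing superclass or interface shows up here as a LinkageError.
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Pass the fully qualified names of the library classes your module uses.
        System.out.println("Missing: " + missingClasses(Arrays.asList(args)));
    }
}
```

It could be run with something like `java -cp fastutil_small.jar:. JarSmokeTest it.unimi.dsi.fastutil.ints.IntArrayList` (use `;` as the classpath separator on Windows).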

For a project of any complexity, I would personally avoid Proguard for this very specific purpose of shrinking an external library like fastutil because doing so currently requires considerable configuration to specify all the jars that will be modified and those that are to be left intact. You will also have to specify the filters to get the source contents from your input jars into the correct output jars.
On top of that, the tool does not like to modify external jar files without having access to modify the application jar file. It will generate an error as a 'warning' even when only using the shrink option that indicates a library that is to be updated is referenced by a fixed jar file. If you are only shrinking the code and doing no optimizations or obfuscation, this requirement is an unnecessary limitation. In my case, this was forcing me to include a whole set of library references in my Proguard configuration as inputs when my only goal was to eliminate classes from the fastutil jar that I do not use.
I think Proguard could solve this issue with minor changes but for right now, I found its usage for the purpose to be frustrating and time consuming. Instead I offer up this solution for anyone who has this specific problem.
In this example, I use Ant to remove the packages for the primitive types that I do not use in my application, and then the specific map implementations that I do not use. With this configuration, I reduced the jar file from 23 MB to 5 MB, which was sufficient for my case.
<jar destfile="F:/Programs/Java/JARS/fastutil-8.5.6-partial.jar">
    <zipfileset src="F:/Programs/Java/JARS/fastutil-8.5.6.jar">
        <!-- eliminate keys of specific primitive types -->
        <exclude name="it/unimi/dsi/fastutil/booleans/**"/>
        <exclude name="it/unimi/dsi/fastutil/chars/**"/>
        <exclude name="it/unimi/dsi/fastutil/doubles/**"/>
        <exclude name="it/unimi/dsi/fastutil/io/**"/>
        <exclude name="it/unimi/dsi/fastutil/longs/**"/>
        <!-- eliminate maps of specific implementations -->
        <exclude name="it/unimi/dsi/fastutil/**/*ArrayMap*"/>
        <exclude name="it/unimi/dsi/fastutil/**/*AVLTree*"/>
        <exclude name="it/unimi/dsi/fastutil/**/*CustomHash*"/>
        <exclude name="it/unimi/dsi/fastutil/**/*Linked*"/>
        <exclude name="it/unimi/dsi/fastutil/**/*RBTree*"/>
        <exclude name="it/unimi/dsi/fastutil/**/*Reference*"/>
    </zipfileset>
    <zipfileset src="F:/Programs/Java/JARS/fastutil-8.5.6.jar">
        <include name="**/*2ObjectFunction.class"/>
    </zipfileset>
</jar>
While this is not the optimal solution, it is easier to set up and troubleshoot than using ProGuard.

Related

Specify a jar matching a pattern for Ant's java task?

I've got a jar file that will match a certain pattern, but contains a version number that will change over time (it's placed there using a dependency manager). It is the only jar file that will ever be in that directory. Is there a way I can invoke Ant's java task using the directory and (if necessary) a pattern to find the jar file rather than statically specifying the filename? I'd rather not have to update build.xml every time the version changes.
So given something like this:
/path/to/some/jarfile-3.24.8.jar
... that might later look like this:
/path/to/some/jarfile-4.3.2.jar
Can I achieve the equivalent of this?
<java jar="path/to/some/jarfile-*.jar" fork="true" spawn="true" />
Thoughts
I was thinking maybe I could rig something up using the fileset task, but I couldn't find a way to reference it from the java task (and the java task doesn't support the <include> nested tag like some other tasks do).
... after further research, pathconvert looks like it might help. I'm not familiar enough with the mechanics to see how to piece things together though.
After some trial and error, I found a solution using fileset and pathconvert - here's my simplified solution:
<fileset dir="/path/to/some" id="jarfile-path-contents" includes="jarfile-*.jar" />
<pathconvert property="file.jarfile" refid="jarfile-path-contents" />
<java jar="${file.jarfile}" fork="true" spawn="false" />
I used fileset in order to specify the pattern and location of the jar.
I used pathconvert to place the found jar file reference into a property.
Then I was able to simply reference that property in my java task.

Best practice for code modification during ant build

Admittedly, this doesn't sound like a best practice at all, but let me explain. During the build, we need to paste the build number and the system version into a class whose sole purpose is to contain these values and make them accessible.
Our first idea was to use system properties, but due to the volatility of the deployment environment (another way of saying "the sysadmins are doing weird unholy creepy things") we would like to have them hard-coded.
Essentially I see 4 possibilities to achieve it in Ant:
1. Use <replace> on a token in the class.
   The problem with this approach is that the file is changed, so you have to replace the token back after compilation with a <replaceregexp>... sooo ugly, I don't want to touch source code with regex. Plus temporal dependencies.
2. Copy the file, run the replace on the copy, compile the copy, delete the copy.
   One has to mind the sequence: the original class has to be compiled first in order to be overwritten by the copy. Temporal dependencies are ugly too.
3. Copy the file, replace the token in the original, compile, replace the stained original with the copy.
   Same temporal dependency issue unless embedded in the compile target. Which is ugly too, because all our build files use the same imported compile target.
4. Create the file from scratch in the build script / store the file outside the source path.
   This is an improvement over the first three as there are no temporal dependencies, but the compiler/IDE is very unhappy as it is oblivious of the class. The red markers are disturbingly ugly.
What are your thoughts on the alternatives?
Are there any best practices for this?
I sure hope I have missed a perfectly sane approach.
Thank you
EDIT
We ended up using the manifest to store the build number and system version in the Implementation-Version attribute, reading it with MyClass.class.getPackage().getImplementationVersion(). This solution was one of the answers in the thread that andersoj linked in a comment.
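For reference, the manifest lookup can be wrapped in a small helper. This is a sketch under assumptions: the class name ManifestVersion and the "unknown" fallback are illustrative only. Note that getImplementationVersion() returns null when the class was not loaded from a jar whose manifest sets Implementation-Version, for example when running from an IDE's class directory.

```java
// Hypothetical helper class; only getPackage() and
// getImplementationVersion() come from the JDK.
class ManifestVersion {

    // Returns the Implementation-Version of the jar that contains the given
    // class, or "unknown" when no such manifest attribute is available.
    static String versionOf(Class<?> type) {
        Package pkg = type.getPackage();
        String version = (pkg == null) ? null : pkg.getImplementationVersion();
        return (version != null) ? version : "unknown";
    }

    public static void main(String[] args) {
        System.out.println("Version: " + versionOf(ManifestVersion.class));
    }
}
```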
I think a simpler approach would be to have your Version.java class read from a simple .properties file included in the JAR, and just generate this .properties file at build-time in the Ant build. For example just generate:
build.number = 142
build.timestamp = 5/12/2011 12:31
The built-in <buildnumber> task in Ant does half of this already (see the second example).
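A minimal sketch of such a Version class might look like the following. The resource name /build.properties and the "unknown" fallback are assumptions made for illustration; the two keys are the ones from the example above. The file is read from the classpath, so it travels inside the JAR rather than sitting next to it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

class Version {

    // Assumed resource name; the Ant build would generate this file
    // and package it at the root of the JAR.
    static final String RESOURCE = "/build.properties";

    // Loads the generated properties from the classpath.
    // Returns empty properties if the file is missing or unreadable.
    static Properties load() {
        Properties props = new Properties();
        try (InputStream in = Version.class.getResourceAsStream(RESOURCE)) {
            if (in != null) {
                props.load(in);
            }
        } catch (IOException e) {
            // Unreadable file: describe() falls back to "unknown".
        }
        return props;
    }

    // Formats the build information for logs or an "about" screen.
    static String describe(Properties props) {
        return "build " + props.getProperty("build.number", "unknown")
                + " (" + props.getProperty("build.timestamp", "unknown") + ")";
    }

    public static void main(String[] args) {
        System.out.println(describe(load()));
    }
}
```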
#2 is generally the way I've seen it done, except that your not-ready-to-compile sources should be in a separate place from your ready-to-compile sources. This avoids the temporal issues you talk about, as each source should only be compiled once.
This is a common pattern that shows up all the time in software build processes.
The pattern being:
Generate source from some resource and then compile it.
This applies to many things from filtering sources before compilation to generating interface stubs for RMI, CORBA, Web Services, etc...
Copy the source to a designated 'generated sources' location and do the token replacement on the copied files to generate sources, then compile the generated sources to your compiled classes destination.
The order of compilation will depend on whether or not your other sources depend on the generated sources.
My solution would be to:
use <replace> on a token in the class:
<replace dir="${source.dir}" includes="**/BuildInfo.*" summary="yes">
    <replacefilter token="{{BUILD}}" value="${build}" />
    <replacefilter token="{{BUILDDATE}}" value="${builddate}" />
</replace>
This replacement should only take place in the build steps performed by your build system, never within a compile/debug session inside an IDE.
The build system setup should not submit changed source code back to the source repository anyway, so the problem of changed code does not exist with this approach.
In my experience it does not help when you place the build information in a property file, as administrators tend to keep property files while upgrading - replacing the property file that came out of the install. (Build information in a property file is informational to us. It gives an opportunity to check during startup if the property file is in synch with the code version.)
I remember we used the 4th approach in a slightly different way. You can pass the release number to the Ant script while creating a release. The Ant script should include it in the release (in a config or properties file), and your class should read it from there.
I always recommend to create some sort of directory and put all built code there. Don't touch the directories you checked out. I usually create a target directory and place all files modified and built there.
If there aren't too many *.java files (or *.cpp files), copy them to target/source and compile there. You can use the <copy> task with a <filterset> to modify the one file with the build number as you copy it.
<javac srcdir="${target.dir}/source"
       destdir="${target.dir}/classes"
       [yadda, yadda, yadda]>
</javac>
This way, you're making no modification in the checked out source directory, so no one will accidentally check in the changes. Plus, you can do a clean by simply deleting the target directory.
If there are thousands, if not millions of *.java files, then you can copy the templates to target/source and then compile the source in both ${basedir}/source and target/source. That way, you're still not mucking up the checked out code and leaving a chance that someone will accidentally check in a modified version. And, you can still do a clean by simply removing target.
I was looking for a solution to the same problem, and after reading http://ant.apache.org/manual/Tasks/propertyfile.html I was able to find one.
I work with NetBeans, so I just need to add this piece of code to my build.xml:
<target name="-post-init">
    <property name="header" value="##Generated file - do not modify!"/>
    <propertyfile file="${src.dir}/version.prop" comment="${header}">
        <entry key="product.build.major" type="int" value="1" />
        <entry key="product.build.minor" type="int" default="0" operation="+" />
        <entry key="product.build.date" type="date" value="now" />
    </propertyfile>
</target>
This will increment the minor version each time you compile the project with clean and build, so you can safely run the project at any time and the minor version will stay unchanged.
Then I just need to read the file at runtime. I hope this helps.

How to obfuscate part of the code?

I am trying to obfuscate my project, but not all of the code: I want to obfuscate only the code from one package.
How can I do it in yGuard (or with something else, such as ProGuard)?
Thanks!
From the documentation:
There are three possible ways of specifying which classes will be excluded from the shrinking and obfuscation process:
It looks like the second way will be most useful for you:
One can specify multiple java classes using a modified version of a patternset. The patternset's includes and excludes element should use java syntax, but the usual wildcards are allowed. Some examples:
<class>
    <patternset>
        <include name="com.mycompany.**.*Bean"/>
        <exclude name="com.mycompany.secretpackage.*"/>
        <exclude name="com.mycompany.myapp.SecretBean"/>
    </patternset>
</class>

Can standard Sun javac do incremental compiling?

Recently I started to use Eclipse's Java compiler, because it is significantly faster than standard javac. I was told that it's faster because it performs incremental compiling. But I'm still a bit unsure about this, since I can't find any authoritative documentation about the "incremental feature" of either compiler (Eclipse's or Sun's). Is it true that Sun's compiler always compiles every source file and Eclipse's compiler compiles only changed files and those that are affected by such a change?
Edit: I'm not using the Eclipse autobuild feature but instead I'm setting
-Dbuild.compiler=org.eclipse.jdt.core.JDTCompilerAdapter
for my ant builds.
Is it true that Sun's compiler always compiles every source file and Eclipse's compiler compile only changed files and those that are affected by such a change?
I believe that you are correct on both counts.
You can of course force Eclipse to recompile everything.
But the other part of the equation is that Java build tools like Ant and Maven are capable of only compiling classes that have changed, and their tree of dependent classes.
EDIT
In Ant, incremental compilation can be done in two ways:
By default the <javac> task compares the timestamps of .java and corresponding .class files, and only tells the Java compiler to recompile source (.java) files that are newer than their corresponding target (.class) files, or that don't have a target file at all.
The <depend> task also takes into account dependencies between classes, which it determines by reading and analysing the dependency information embedded in the .class files. Having determined which .class files are out of date, the <depend> task deletes them so that a following <javac> task will recompile them. However, this is not entirely fool-proof. For example, extensive changes to the source code can lead to the <depend> task analysing stale dependencies. Also, certain kinds of dependency (e.g. on static constants) are not apparent in the .class file format.
To understand why Ant <depend> is not fool-proof, read the "Limitations" section of the documentation.
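The timestamp rule described for <javac> can be illustrated with a few lines of Java. This is a rough sketch of the idea only, not Ant's actual implementation, and the class name StaleCheck is made up. It relies on File.lastModified() returning 0 for a file that does not exist.

```java
import java.io.File;

// Hypothetical illustration of a timestamp-based staleness check.
class StaleCheck {

    // A source needs recompiling when its class file is missing
    // (timestamp 0) or older than the source file.
    static boolean isStale(long sourceModified, long classModified) {
        return classModified == 0L || sourceModified > classModified;
    }

    // Convenience overload that reads the timestamps from the file system.
    static boolean isStale(File source, File classFile) {
        return isStale(source.lastModified(), classFile.lastModified());
    }

    public static void main(String[] args) {
        // Usage: java StaleCheck path/to/A.java path/to/A.class
        System.out.println(isStale(new File(args[0]), new File(args[1])));
    }
}
```

A check like this looks at one source/class pair in isolation, which is why it cannot notice that B needs recompiling after a change to A. That gap is what the <depend> task tries to close.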
Javac only compiles source files that are either named on the command line or are dependencies and are out of date. Eclipse may have a finer-grained way of deciding what that means.
Eclipse certainly does this. It also does it at save time if you have that option turned on (and it is by default). It looks like Sun's compiler doesn't do this (it is very easy to test: just make a small project where A is the main class that uses class B, but B doesn't use class A. Then change A, compile the project again, and see if the timestamp for B.class has changed).
This is the way many compilers work (gcc too, for instance). You can use tools like Ant and make to compile only the part of the project that has changed. Also note that these tools aren't perfect: sometimes Eclipse just loses track of the changes and you'll need to do a full rebuild.
Restating what I've heard here and phrasing it for lazy folks like me:
You can achieve incremental builds with the javac task in Ant, but you should use the depend task to clear out .class files for your modified .java files AND you must not leave the includes statement unspecified in the javac task. (Specifying just the src path in the javac task and leaving includes unspecified causes javac to recompile all the sources it finds.)
Here are my depend and javac tasks. With the standard Oracle Java compiler, only the .java files I modify are compiled. Hope this helps!
<depend srcdir="JavaSource" destdir="${target.classes}" cache="${dependencies.dir}" closure="yes">
    <classpath refid="compiler.classpath" />
    <include name="**/*.java"/>
</depend>
<javac destdir="${target.classes}" debug="true" debuglevel="${debug.features}" optimize="${optimize.flag}" fork="yes" deprecation="no" source="1.6" target="1.6" encoding="UTF-8" includeantruntime="no">
    <classpath refid="compiler.classpath"/>
    <src path="JavaSource"/>
    <include name="**/*.java" /> <!-- This enables the incremental build -->
</javac>

Best Apache Ant Template [closed]

Every time I create a new project I copy the last project's Ant file to the new one and make the appropriate changes (trying at the same time to make it more flexible for the next project). But since I didn't really think about it at the beginning, the file started to look really ugly.
Do you have an Ant template that can be easily ported in a new project? Any tips/sites for making one?
Thank you.
An alternative to making a template is to evolve one by gradually generalising your current project's Ant script so that there are fewer changes to make the next time you copy it for use on a new project. There are several things you can do.
Use ${ant.project.name} in file names, so you only have to mention your application name in the project element. For example, if you generate myapp.jar:
<project name="myapp">
    ...
    <target name="jar">
        ...
        <jar jarfile="${ant.project.name}.jar" ...
Organize your source directory structure so that you can package your build by copying whole directories, rather than naming individual files. For example, if you are copying JAR files to a web application archive, do something like:
<copy todir="${war}/WEB-INF/lib" flatten="true">
    <fileset dir="lib" includes="**/*.jar"/>
</copy>
Use properties files for machine-specific and project-specific build file properties.
<!-- Machine-specific property over-rides -->
<property file="/etc/ant/build.properties" />
<!-- Project-specific property over-rides -->
<property file="build.properties" />
<!-- Default property values, used if not specified in properties files -->
<property name="jboss.home" value="/usr/share/jboss" />
...
Note that Ant properties cannot be changed once set, so you override a value by defining a new value before the default value.
You can give http://import-ant.sourceforge.net/ a try.
It is a set of build file snippets that can be used to create simple custom build files.
I had the same problem, so I generalized my templates and grew them into their own project: Antiplate. Maybe it's useful for you too.
If you are working on several projects with similar directory structures and want to stick with Ant instead of going to Maven use the Import task. It allows you to have the project build files just import the template and define any variables (classpath, dependencies, ...) and have all the real build script off in the imported template. It even allows overriding of the tasks in the template which allows you to put in project specific pre or post target hooks.
I used to do exactly the same thing.... then I switched to maven. Maven relies on a simple xml file to configure your build and a simple repository to manage your build's dependencies (rather than checking these dependencies into your source control system with your code).
One feature I really like is how easy it is to version your jars - easily keeping previous versions available for legacy users of your library. This also works to your benefit when you want to upgrade a library you use - like junit. These dependencies are stored as separate files (with their version info) in your maven repository so old versions of your code always have their specific dependencies available.
It's a better Ant.
I used to do exactly the same thing.... then I switched to maven.
Oh, it's Maven 2. I was afraid that someone was still seriously using Maven nowadays. Leaving the jokes aside: if you decide to switch to Maven 2, you have to take care while looking for information, because Maven 2 is a complete reimplementation of Maven, with some fundamental design decisions changed. Unfortunately, they didn't change the name, which has been a great source of confusion in the past (and still sometimes is, given the "memory" nature of the web).
Another thing you can do if you want to stay in the Ant spirit, is to use Ivy to manage your dependencies.
One thing to look at -- if you're using Eclipse, check out the ant4eclipse tasks. I use a single build script that asks for the details set up in eclipse (source dirs, build path including dependency projects, build order, etc).
This allows you to manage dependencies in one place (Eclipse) and still be able to use a command-line build for automation.
