Should I bundle source and class files in the same JAR?

Separate Jars
When creating JAR files, I've always kept the source separate and offered it as an optional extra.
eg:
Foo.jar
Foo-source.jar
It seems to be the obvious way to do things and is very common. Advantages being:
Keeps binary jar small
Source may not be open / public
Faster for classloader? (I've no idea, just guessing)
Single Jar
I've started to doubt whether these advantages are always worth it. I'm working on a tiny component that is open-source. None of the advantages I've listed above were problems in this project anyway:
Classes + source still trivially small (and will remain that way)
Source is open
Class loading speed of this jar is irrelevant
Keeping the source with the classes does however bring new advantages:
Single dependency
No issues of version mismatch between source and classes
Developers using this jar will always have the source to hand (to debug or inspect)
Those new advantages are really attractive to me. Yes, I could just zip source, classes and even javadoc into a zip file and let clients of my component decide which they want to use (like Google do with the guava libraries) but is it really worth it?
I know it goes against conventional software engineering logic a little, but I think the advantages of a single jar file outweigh those of the alternatives.
Am I wrong? Is there a better way?

Yes, I could just zip source, classes and even javadoc into a zip file and let clients of my component decide which they want to use (like Google do with the guava libraries) but is it really worth it?
Of course it is worth it! It takes about 2 seconds to do it, or just a few minutes to change your build scripts.
This is the way that most people who distribute sources and binaries handle this problem.
EDIT
It is not your perspective you need to consider. You have to think of this from the perspective of the people deploying / using your software.
They aren't going to use the source code on the deployment platform.
Therefore putting the source code in the binary JAR is a waste of disc space, slows down deployment and slows down application startup.
If they want to do something about it, they've got a problem. How do they rebuild the JAR file to get rid of the source code? How do they know what is safe to leave out?
From the deployer / user's perspectives, there are no positives, only negatives.
Finally, your point about people not being able to track source versus binary versions doesn't really hold water. Most people who would be interested in the source code are perfectly capable of doing this. Besides, there are some simple things you can do to address the issue, like using JAR filenames that include your software's version number, or putting the version number into the manifest.
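As a rough illustration of the manifest suggestion, here is a minimal sketch that writes an Implementation-Version attribute into a jar's manifest and reads it back using the standard java.util.jar classes. The class name ManifestVersion, the jar name foo-1.2.3.jar and the version string are made up for the example.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

// Hypothetical helper for illustration; not part of any library.
final class ManifestVersion {

    // Writes a jar that contains only a manifest carrying the given version.
    static void writeJar(Path jar, String version) throws IOException {
        Manifest manifest = new Manifest();
        Attributes main = manifest.getMainAttributes();
        // Manifest-Version has to be present, otherwise the main attributes are not written.
        main.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        main.put(Attributes.Name.IMPLEMENTATION_VERSION, version);
        try (OutputStream out = Files.newOutputStream(jar);
             JarOutputStream jarOut = new JarOutputStream(out, manifest)) {
            // The manifest is the only entry needed for this demonstration.
        }
    }

    // Returns the Implementation-Version of the jar, or null if it has none.
    static String readVersion(Path jar) throws IOException {
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            Manifest manifest = jarFile.getManifest();
            if (manifest == null) {
                return null;
            }
            return manifest.getMainAttributes().getValue(Attributes.Name.IMPLEMENTATION_VERSION);
        }
    }

    public static void main(String[] args) throws IOException {
        Path jar = Files.createTempDirectory("demo").resolve("foo-1.2.3.jar");
        writeJar(jar, "1.2.3");
        System.out.println(readVersion(jar));
    }
}
```

Someone holding both Foo.jar and Foo-source.jar could compare the two values to confirm they come from the same release, assuming the build stamps both jars.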

I have just come across a potential pitfall for the java+classes in a single jar.
If you have java files in a jar and that jar is included in the classpath of a subsequent javac execution, you MUST make sure that the timestamp of each java file is earlier than the timestamp of its class file.
This scenario can happen when you copy/move the java or class files prior to packaging as a jar.
If the java file is newer than the class, then even though the java file is found on the classpath (rather than passed as an argument to javac), javac will attempt to compile that java file and may then fail with duplicate class errors during the compilation stage.
For this reason I would recommend keeping the source in a separate jar to the class files.
Note that the relevant javac flag will not let you prefer class over source; -Xprefer only accepts newer or source: http://docs.oracle.com/javase/7/docs/technotes/tools/windows/javac.html#searching
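If you do ship a combined jar anyway, a sanity check along the following lines could be run after packaging. This is only a sketch: it assumes each X.java sits at the same path as its X.class inside the jar, it ignores nested classes, and the class name StaleSourceCheck is invented for the example. Keep in mind that zip entry timestamps are typically stored with a two second resolution.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper for illustration; javac itself offers nothing like this.
final class StaleSourceCheck {

    // Lists .java entries that are newer than the .class entry with the same path and base name.
    static List<String> newerSources(Path jar) throws IOException {
        List<String> result = new ArrayList<>();
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            Enumeration<JarEntry> entries = jarFile.entries();
            while (entries.hasMoreElements()) {
                JarEntry source = entries.nextElement();
                String name = source.getName();
                if (!name.endsWith(".java")) {
                    continue;
                }
                String className = name.substring(0, name.length() - ".java".length()) + ".class";
                JarEntry compiled = jarFile.getJarEntry(className);
                // Sources without a matching class entry are skipped.
                if (compiled != null && source.getTime() > compiled.getTime()) {
                    result.add(name);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: StaleSourceCheck <path-to-jar>");
            return;
        }
        for (String name : newerSources(Paths.get(args[0]))) {
            System.out.println("source newer than class: " + name);
        }
    }
}
```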

I prefer 'Separate Jars'.
The binary class jar is for running on the JVM; the source is not. Source should be carefully maintained by your source control system (e.g. SVN). If the source needs to be released, zip it into a separate jar. Many open source projects ship the class jar and the source jar separately.

If you want others to test and inspect/improve your code then you can have your source with the binaries. If not, keep the source away from the jar.

How small is small and why should your jar act differently from others?
Unless you have a very good reason why your jar should have the sources (not simply debugging, but something specific to this one jar), then I'd say no; choice is best.
I say this because if your jar should not be different from others, then you have to work on the assumption that others should do the same as you. If so, the size of the jar does matter, because the overhead is duplicated over all "small" jars. Then my WAR is much bigger than needed which, admittedly, is not a massive issue, but is not something I would choose for production when I can download sources in DEV so easily.

Related

Altering open source code

Assume that I am using an open source jar file in my project which is 11 MB in size, but I am not using this jar fully (and never will). I know that I need just a couple of classes from this jar to do my job. In that case, can I just delete the other classes from the jar file and use what remains?
I'll make sure that the classes that remain in the jar are complete by themselves, meaning they do not depend on any other classes in the jar. So can I just remove the unwanted classes so that the jar file gets smaller? If I do this, is it legal? Am I allowed to do this and use the result in my project?
SmartGWT appears to use the LGPL licence. This means you can link to it even in a proprietary closed-source application without the need to release your source code if you distribute it.
However, this freedom may not apply if you modify the library.
A program that contains no derivative of any portion of the Library, but is designed to work with the Library by being compiled or linked with it, is called a "work that uses the Library". Such a work, in isolation, is not a derivative work of the Library, and therefore falls outside the scope of this License.
It could be argued that chopping out bits of the library creates a derivative work even though you've not altered the source code itself, but IANAL.
Of course, if you are not distributing your project (for example, it's an internal business application for your company) then I don't believe the requirement to release your source code applies even with a derivative work.

What is the best practice for including third party jar files in a Java program?

I have a program that needs several third-party libraries, and at the moment it is packaged like so:
zerobot.jar (my file)
libs/pircbot.jar
libs/mysql-connector-java-5.1.10-bin.jar
libs/c3p0-0.9.1.2.jar
As far as I know the "best" way to handle third-party libs is to put them on the classpath in the manifest of my jar file, which will work cross-platform, won't slow down launch (which bundling them might) and doesn't run into legal issues (which repackaging might).
The problem is for users who supply the third party libraries themselves (example use case, upgrading them to fix a bug). Two of the libraries have the version number in the file, which adds hassle.
My current solution is that my program has a bootstrapping process which makes a new classloader and instantiates the program proper using it. This custom classloader adds all .jar files in lib/ to its classpath.
My current way works fine, but I now have two custom classloaders in my application and a recent change to the code has caused issues that are difficult to debug, so if there is a better way I'd like to remove this complexity. It also seems like over-engineering for what I'm sure is a very common situation.
So my question is, how should I be doing this?
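For reference, the kind of bootstrapping loader described above is commonly written along these lines. This is a generic sketch, not the asker's actual code: the class name LibDirLoader is invented, and it only picks up *.jar files directly inside the given directory.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical bootstrap helper for illustration.
final class LibDirLoader {

    // Builds a class loader that searches every *.jar found directly inside libDir.
    static URLClassLoader forDirectory(Path libDir, ClassLoader parent) throws IOException {
        List<URL> urls = new ArrayList<>();
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(libDir, "*.jar")) {
            for (Path jar : jars) {
                urls.add(jar.toUri().toURL());
            }
        }
        return new URLClassLoader(urls.toArray(new URL[0]), parent);
    }

    public static void main(String[] args) throws IOException {
        // "libs" matches the directory layout shown in the question.
        try (URLClassLoader loader = forDirectory(Paths.get("libs"), LibDirLoader.class.getClassLoader())) {
            for (URL url : loader.getURLs()) {
                System.out.println(url);
            }
        }
    }
}
```

One catch with this pattern: any class that is already visible to the parent loader gets loaded by the parent, so the program proper has to be reachable only through the new loader if it is to see the libraries.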
We provide script files with the jar. E.g. some.bat, some.sh etc.
And as of Java 6, you can use wildcards to specify the classpath.
Here is a good article that explains this approach : https://blogs.oracle.com/mr/entry/class_path_wildcards_in_mustang
If your audience is technical (and it sounds like they are if they're willing to drop in new jar files) then perhaps you could supply .sh and .bat files that they can edit to modify the classpath? That will be more transparent to them than a custom classloader.
You can try the Fat-Jar solution; it works perfectly with the 'Fat Jar Eclipse Plug-In'. I have used it for a few projects with no problems at all. The principle seems to be the same as that of your current solution.
I think I'm going to go the manifest route. Declare the classpath in the manifest for the entry jar, and have entries for:
libs/pircbot.jar
libs/mysql-connector-java-5-bin.jar
libs/c3p0.jar
Then if users want to upgrade to the latest library versions, they will have to rename them to match what is declared in the manifest. I don't think that's too big a hassle, and it makes things a lot simpler internally.
I also just generally don't like the idea of loading everything in ./lib/ automatically, that seems potentially dangerous.
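The manifest route can be sketched with the java.util.jar API, as below. Most builds would set these attributes through the build tool rather than by hand; the main class name zerobot.Main is a placeholder, and ClassPathManifest is an invented helper. Class-Path entries are separated by spaces and resolved relative to the location of the jar that declares them.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// Hypothetical helper for illustration.
final class ClassPathManifest {

    // Creates a manifest naming the entry point and the jars it depends on.
    static Manifest withClassPath(String mainClass, List<String> relativeJars) {
        Manifest manifest = new Manifest();
        Attributes main = manifest.getMainAttributes();
        // Manifest-Version has to be present, otherwise the main attributes are not written.
        main.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        main.put(Attributes.Name.MAIN_CLASS, mainClass);
        // Entries are space separated, each one relative to the declaring jar.
        main.put(Attributes.Name.CLASS_PATH, String.join(" ", relativeJars));
        return manifest;
    }

    public static void main(String[] args) throws IOException {
        Manifest manifest = withClassPath(
                "zerobot.Main", // placeholder entry point
                Arrays.asList(
                        "libs/pircbot.jar",
                        "libs/mysql-connector-java-5-bin.jar",
                        "libs/c3p0.jar"));
        // Prints the manifest text; long lines are wrapped as the jar format requires.
        manifest.write(System.out);
    }
}
```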

CPD / PMD between projects?

I am rephrasing this question to make it a little more straightforward and easy to understand, hopefully.
I have roughly 30 components (internal) that go into a single web application. That means 30 different projects with their own separate POM. I use inheritance quite a bit in my POMs so one of the things they inherit is a PMD/CPD configuration to prevent code duplication.
Even though I have CPD/PMD running, it only detects duplicate code within the same project. I would like it to detect in any of my projects if there is code shared among the projects that can be refactored out. Moreover, I was looking for something that could (using the same concept/pattern) verify that no code is shared between other open source dependencies.
It would be CPD/PMD, except it would operate on the source jars. This task would consume a large amount of memory if you scan all projects and their dependencies for duplication. Right now, I would just like to apply that to internal projects. If it works, then it would be relatively easy/straightforward to scale that out.
Walter
I'm not sure I got everything but...
I'd create an aggregating module with all projects as dependencies, use the maven-dependency-plugin and its unpack-dependencies mojo to get the sources jar of every dependency (the mojo can take a classifier as a parameter) and unpack them (maybe into target/generated-sources/java; the maven build helper plugin may help here), and finally run pmd:cpd on the whole source base.
This may need some tweaking, I didn't test this at all.
It sounds like you want to find duplicate code anywhere in your 30 projects. I can't speak for PMD; I assume you tell it to make one giant project containing all the source files from the union of the projects. But yes, this would take a lot of RAM and CPU.
Another tool that does this is the Java CloneDR. The CloneDR finds duplicate code whether it is exactly the same or close (e.g., a few edits), regardless of source code layout or intervening comments. It is pretty easy to set it up to process all the files in your set of projects.
Just run PMD:CPD as a stand-alone program. All it needs is a directory, and it will recurse. At least, it did for me. I moved all my source to one directory and ran the CPD gui from the batch file distributed with PMD-4.2.5 .
You can perhaps take a look at Sonar:
The Sonar-CPD engine is much more scalable and can detect cross-project duplications.
You can try Lizard for Python.
It doesn't work on source jars, though.
"Code Duplicate Detector
lizard -Eduplicate {path to your code}"
https://pypi.org/project/lizard/
PMD/CPD provides more granularity since it allows the user to specify the number of tokens before a block of code is flagged as duplicate.
https://pmd.github.io/latest/pmd_userdocs_cpd.html#cli-options-reference

How to automate a build of a Java class and all the classes it depends on?

I guess this is kind of a follow-on to question 1522329.
That question talked about getting a list of all classes used at runtime via the java -verbose:class option.
What I'm interested in is automating the build of a JAR file which contains my class(es), and all other classes they rely on. Typically, this would be where I am using code from some third party open source product's "client logic" but they haven't provided a clean set of client API objects. Their complete set of code goes server-side, but I only need the necessary client bits.
This would seem a common issue but I haven't seen anything (e.g. in Eclipse) which helps with this. Am I missing something?
Of course I can still do it manually by: biting the bullet and including all the third-party code in a massive JAR (offending my purist sensibilities) / source walkthrough / trial and error / -verbose:class type stuff (but the latter wouldn't work where, say, my code runs as part of a J2EE servlet, and thus I only want to see this for a given Tomcat webapp and, ideally, only for classes related to my classes therein).
I would recommend using a build system such as Ant or Maven. Maven is designed with Java in mind, and is what I use pretty much exclusively. You can even have Maven assemble (using the assembly plugin) all of the dependent classes into one large jar file, so you don't have to worry about dependencies.
http://maven.apache.org/
Edit:
Regarding the servlet, you can also define which dependencies you want packaged up with your jar, and if you are making a stand alone application you can have the jar tool make an executable jar.
note: yes, I am a bit of a Maven advocate, as it has made the project I work on much easier. No I do not work on the project personally. :)
Take a look at ProGuard.
ProGuard is a free Java class file shrinker, optimizer, obfuscator, and preverifier. It detects and removes unused classes, fields, methods, and attributes. It optimizes bytecode and removes unused instructions. It renames the remaining classes, fields, and methods using short meaningless names. Finally, it preverifies the processed code for Java 6 or for Java Micro Edition.
What you want is to include not only the classes you rely on, but also the classes that those classes rely on, and so on, and so forth.
So that's not really a build problem, but more a dependency one. To answer your question, you can either solve this with Maven (apparently) or Ant + Ivy.
I work with Ivy and I sometimes build "ueber-jar" using the zipgroupfileset functionality of the Ant Jar task. Not very elegant would say some, but it's done in 10 seconds :-)

non-java files in package structure

We have a developer who is in the habit of committing non-java files (xsd, dtd etc) in the java packages under the src/java folder in our repository. Admittedly, these are relevant files to that package, but I just hate to see non-java files in the src folder.
Is this a common practice that I should get used to, or are we doing something strange by maintaining these files like this?
The problem with putting non-Java (or other language) files that are closely tied to the code in a different place than the code is knowing where to find them. It is possible to standardize the locations, and then theoretically everyone will know where to go and what to do. But I find in practice that does not happen.
Imagine your app still being maintained 5 or 10 years down the road by a team of junior - intermediate developers that do not work at the company now and will never talk to anyone who works on your project now. Putting files closely linked to the source in the source package structure could make their lives easier.
I am a big proponent of eliminating as many ambiguities as possible within reason.
It's very common and even recommended as long as it's justifiable. Generally it's justifiable when it's a static resource (DTD+XSLT for proprietary formats, premade scripts etc.) but it's not when the file is something that's likely to be updated by a third party, like an IP/geographic location database dump.
I think it gets easier if you think of 'src' as not specifically meaning 'source code'. Think of it as the source of resources that are things needed by your program at compile time and/or runtime.
Things that are a product of compile or build activities should not go here.
Admittedly, like most things, exceptions may apply :)
Update:
Personally, I like to break down src further with subdirectories for each resource type underneath it. Others may like that division at a higher level.
There are a lot of jar libraries that use the same practice.
I think it is acceptable and comfortable.
In Eclipse it works well for us to have a src folder containing java classes, and a configuration folder (which is blessed as a source folder) containing property files etc. Then they all go in the output folder together and can be found in the classpath while still being in separate folders inside Eclipse.
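Whichever folder the files are kept in, once they land in the output folder the code finds them through the class loader by their package path. A small sketch of that lookup follows; it needs Java 9 or later for readAllBytes, and the temporary directory, the resource name com/example/schema.xsd and the class name ResourceLookup are all stand-ins invented for the example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration.
final class ResourceLookup {

    // Reads a class path resource as UTF-8 text, or returns null if it cannot be found.
    static String read(ClassLoader loader, String resourceName) throws IOException {
        try (InputStream in = loader.getResourceAsStream(resourceName)) {
            if (in == null) {
                return null;
            }
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // A temporary directory stands in for the build's output folder.
        Path output = Files.createTempDirectory("classes");
        Path xsd = output.resolve("com/example/schema.xsd");
        Files.createDirectories(xsd.getParent());
        Files.write(xsd, "<schema/>".getBytes(StandardCharsets.UTF_8));

        try (URLClassLoader loader = new URLClassLoader(new URL[] { output.toUri().toURL() })) {
            System.out.println(read(loader, "com/example/schema.xsd"));
        }
    }
}
```

In application code the usual form is SomeClass.class.getResourceAsStream("schema.xsd"), which resolves the name relative to the package of SomeClass.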
One of the advantages of keeping all the auxiliary files next to the source is that version consistency is maintained between these 3rd party libraries and your source code. If you ever need to go back and debug a specific version, you can pull the entire set of source+config and have it all be the same version.
That being said I'd put them in a $project/config/ directory, or some such, rather than in $project/src/java itself. They're not source, nor java, really, so it's misleading having them in that directory.
When you really get down to it, though, this is an issue of personal style. There's no "Right" answer and you should be talking with those team members and understanding why they made this decision. Using this thread as evidence to support a unilateral decision probably won't go over well. ;)
It's pretty common; you can find it in really popular frameworks, e.g. the xsd files for Spring's various schemas. Also, people usually place Hibernate mapping files in the same package as the model classes.
I think this is common as long as the files are necessary. The problems arise when people start committing files that are not needed with the source, such as design specs or random text files.
It is surely common, but incredibly lazy and sloppy. My skin crawls when I see it.
Using a tool such as Maven to build your products enables you to easily and clearly separate code from resources.
Eclipse bundles can be similarly separated.
