I'm a little shocked that (if?) this precise question hasn't been asked, but 15 minutes of scanning SO didn't turn up an exact match. (If I'm wrong, point me the right way and vote to close...)
Question 1:
What are the best practices for laying out Java projects under an Ant build system?
For our purposes, we have the following context (perhaps most of which is irrelevant):
Most developers are using Eclipse (not all)
Project is maintained in Subversion
Project builds have recently migrated to Hudson, in which we want to use the release plugin to manage releases, and some custom scripts to handle automated deployment
This project is a "conventional" application, a sort of "production prototype" with a very limited pool of users, but they are at remote sites with airgap separation, so delivering versioned, traceable artifacts for easy installation, manual data collection/recovery, and remote diagnosis is important.
Some dependencies (JARs) are included in the SVN repo; others may be fetched via the Ant script at build time, if necessary. Nothing fancy like Ivy yet (and please don't tell me to switch to Maven 3... I know, and we'll do so if/when the appropriate time comes.)
Build includes JUnit, FindBugs, CheckStyle, PMD, JavaDoc, a bit of custom documentation generation
Two or three primary JAR artifacts (a main application artifact plus a couple of minimal API JARs for inclusion in a few coupled applications)
Desire to distribute the following distribution artifacts:
a "Full" source+bin tarball with all dependencies resolved, jars and JavaDoc prebuilt
a bin tarball, with just the docs and JavaDoc, jars, and ancillary wrapper scripts etc
a "Partner" source+bin, which has the "shared" source that partner developers are likely to look at, and associated testcases
Current structure looks like this
project-root/
project-root/source
project-root/source/java // main application (depends on -icd)
project-root/source/java-icd // distributable interface code
project-root/source/test // JUnit test sources
project-root/etc // config/data stuff
project-root/doc // pre-formatted docs (release notes, howtos, etc)
project-root/lib // where SVN-managed or Ant-retrieved Jars go
project-root/bin // scripts, etc...
At build time, it expands to include:
build/classes // Compiled classes
build/classes-icd
build/classes-test
build/javadoc
build/javadoc-icd
build/lib // Compiled JAR artifacts
build/reports // PMD, FindBugs, JUnit, etc... output goes here
build/dist // tarballs, zipfiles, doc.jar/src.jar type things, etc..
build/java // Autogenerated .java products
build/build.properties // build and release numbering, etc...
Question 2:
How can I maintain strict separation in the development tree between revision-controlled items and build-time artifacts, WHILE producing a coherent distribution as above, AND allowing me to treat a development tree as an operational/distribution tree during development and testing? In particular, I'm loath to have my <jar> task drop .jar files in the top-level lib directory -- that directory in the developers' trees is inviolable SVN territory. But distributing something for public use with build/lib/*.jar is a confusing annoyance. The same is true of documentation and other built artifacts that we want to appear in a consistent place in the distribution, without forcing developers and users to work with completely different directory structures.
Having all the generated products in a separate build/ directory is very nice at development time, but it's an annoying artifact to distribute. For distribution purposes I'd rather have all the JARs sitting in a single lib location; in fact, a structure like the one below makes the most sense. Currently, we build this structure on the fly with ant dist by doing some intricate path manipulations as the .tar.gz and .zip artifacts are built.
What I think the dist should look like:
project-root/
project-root/source // present in only some dists
project-root/etc // same as in development tree
project-root/doc // same as in development tree
project-root/doc/javadoc // from build
project-root/lib // all dependency and built JAR files
project-root/bin // same as in development tree
build.properties // build and release numbering, etc...
This question is narrowly about the "how do I maintain clean development and distribution project layouts?" as I asked above; but also to collect info about Java/Ant project layouts in general, and critiques of our particular approach. (Yes, if you think it should be a Community Wiki I'll make it so...)
My one suggestion would be that the directory tree you distribute should not be the one in version control. Have a script which puts together a dist directory under build, then zips that up. That script can combine source-controlled and derived files to its heart's content. It can also do things like scrub out .svn directories, which you don't want to distribute. If you want to be able to treat development and distributed trees in the same way, simply ensure that the layout of dist is the same as the layout of the development project - the easiest way to do that would be to copy everything except the build subdirectory (and version-control metadata, and perhaps things like the Eclipse .project and .classpath files).
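A minimal Ant sketch of that approach might look like the following (the ${build.dir} property, the target names, and the project name are assumptions for illustration, not taken from the question's build file):

```xml
<!-- Sketch only: assumes 'jar' and 'javadoc' targets exist and that
     ${build.dir} points at the build/ tree described in the question. -->
<target name="dist" depends="jar,javadoc">
    <property name="dist.stage" value="${build.dir}/dist/stage"/>

    <!-- Static, version-controlled parts of the tree; Ant's default
         excludes already skip .svn directories -->
    <copy todir="${dist.stage}">
        <fileset dir="." includes="etc/**,doc/**,bin/**"/>
    </copy>

    <!-- Merge checked-in and freshly built jars into a single lib/ -->
    <copy todir="${dist.stage}/lib">
        <fileset dir="lib"/>
        <fileset dir="${build.dir}/lib"/>
    </copy>

    <!-- Generated docs land where the distribution layout expects them -->
    <copy todir="${dist.stage}/doc/javadoc">
        <fileset dir="${build.dir}/javadoc"/>
    </copy>
    <copy file="${build.dir}/build.properties" todir="${dist.stage}"/>

    <tar destfile="${build.dir}/dist/myproject.tar.gz" compression="gzip">
        <tarfileset dir="${dist.stage}" prefix="myproject"/>
    </tar>
</target>
```

Because the staging directory mirrors the development layout, the same wrapper scripts in bin/ work in both trees.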
I suspect you won't like this suggestion. It may be that you are attached to the idea that the distributed file is simply a portable version of your development environment - but I think it isn't, it can never be, and it doesn't need to be. If you can accept that idea, you might find my suggestion agreeable.
EDIT: I thought about this a bit more, and looked at some of the scripts I use. I think what I'd do in this situation is to build a separate tree even in development; point the execution environment at project-root/build/app (or perhaps project-root/build if you can) rather than project-root, and then symlink (or copy, if you don't have symlinks) all the necessaries - whether static, from the project root, or derived, from build - into that. Building a distribution may then be as simple as zipping up that tree (with a tool that resolves symlinks, of course). The nice thing about this is that it keeps the structure of the executed tree very clean: it won't contain source directories, IDE files, build scripts, or other supporting files from inside the project. If you're using Subversion, it will still contain .svn directories inside anything symlinked from the static areas; if you were using Mercurial, it wouldn't contain any .hg stuff.
Layout-wise, we use something which has evolved to be very close to the Maven layout (see here). It's a very functional layout that has been used by a lot of people, and if you want to switch to Maven later, you're all set. We have a couple of variations, the most important being that we separate automated unit and integration tests.
In terms of mingling sources and build artefacts - I would certainly recommend against it. As you've seen, it messes with IDE indexing and version control and generally makes life difficult.
As far as I can tell, you either have to accept this mingling, or copy your dependencies as part of the build and treat the output as a separate project - perhaps kept open in another IDE window if you need it. The idea of mixing your work 'as a user' versus 'as a producer' of your release package sounds confusing anyway.
http://ant.apache.org/ant_in_anger.html
The project contains subdirectories:
bin    // common binaries, scripts - put this on the path
build  // the tree for building; Ant creates it and can empty it in the 'clean' target
dist   // distribution outputs go in here; the directory is created by Ant and 'clean' empties it out
doc    // hand-crafted documentation
lib    // imported Java libraries go into this directory
src    // source goes in under this tree, in a hierarchy which matches the package names
There are also some (possibly a bit outdated) general recommendations from Sun/Oracle for project layout that you may want to take a look at:
Guidelines, Patterns, and Code for End-to-End Java Applications
Related
Task: what I have is a large non-Gradle (make :-)) project, which contains many subprojects, each in its own subdirectory. I have to write functional tests for some of these subprojects. The subprojects produce independent results, but with the same structure, so there is a lot of common code for testing them, which I want to share from some special location.
Restrictions:
as the developers requested, the tests for a subproject should live in that subproject's directory (to be precise, in a subdirectory, for example func_tests).
I have some shared dependencies that my test projects usually use, for example Google Guava, TestNG and so on, and also some settings for the test run (excludeGroups 'slow'...); I'd prefer these settings to be common too, though that matters less.
symbolic links are an acceptable way, if that's good design :)
If it's possible, I want to have IntelliJ IDEA correctly handle this dependency.
My ideas:
symlink src/main of every test subproject to some common directory (src/test stays "individual"). This would work well with the IDE, but it would mean duplicating all the dependencies and preferences. Also, I'm very unsure whether that's the preferred way in Gradle.
create a common project which is imported by every subproject; this would save on dependencies (would it?), but I'm not sure IDEA will handle it correctly.
What is the idiomatic way to do this with Gradle?
Look at samples/java/withIntegrationTests in your Gradle installation. This will give you an idea of how to add your tests (there are other ways, too). You'll want to tweak that setup to make sure IDEA handles your tests; this is done by customizing idea.module.scopes.
Shared code and shared libraries: you can create a map like https://github.com/gradle/gradle/blob/master/gradle/dependencies.gradle and use it in different subprojects. BTW, the Gradle codebase itself has a lot of integration tests, and you can check how its build is configured to see if you want to borrow some ideas.
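As a rough sketch of how the shared pieces could be declared once in the root build script (older Gradle DSL; the source-set name, dependency versions, and layout are assumptions based on the question, not a verified setup):

```groovy
// Sketch only: 'funcTest' and the func_tests/java path are assumptions.
subprojects {
    apply plugin: 'java'
    apply plugin: 'idea'

    // Functional tests live in <subproject>/func_tests, as the developers requested
    sourceSets {
        funcTest {
            java.srcDir 'func_tests/java'
        }
    }

    // Let the functional tests reuse the shared test dependencies
    configurations {
        funcTestCompile.extendsFrom testCompile
    }

    // Shared test dependencies, declared once for all subprojects
    dependencies {
        testCompile 'com.google.guava:guava:18.0'
        testCompile 'org.testng:testng:6.9.10'
    }

    // Shared test-run settings
    test {
        useTestNG() {
            excludeGroups 'slow'
        }
    }

    // Teach IDEA that func_tests contains test sources
    idea.module {
        testSourceDirs += file('func_tests/java')
        scopes.TEST.plus += [ configurations.funcTestCompile ]
    }
}
```

This keeps each subproject's tests in its own directory while the dependency list and TestNG settings live in one place.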
I am new to using GitHub and have been trying to figure out this question by looking at other people's repositories, but I cannot figure it out. When people fork/clone repositories on GitHub to their local computers to develop on the project, is it expected that the cloned project is complete (i.e. it has all of the files that it needs to run properly)? For example, if I were to use a third-party library in the form of a .jar file, should I include that .jar file in the repository so that my code is ready to run when someone clones it, or is it better to just make a note that you are using such-and-such third-party libraries, so that the user needs to download those libraries themselves before they begin work? I am just trying to figure out the best practices for my code commits.
Thanks!
Basically it is as Chris said.
You should use a build system that has a package manager. That way you specify which dependencies you need and it downloads them automatically. Personally, I have worked with Maven and Ant, so here is my experience:
Apache Maven:
A first word about Maven: it is not a package manager; it is a build system. It merely includes a package manager, because for Java folks downloading the dependencies is part of the build process.
Maven comes with a nice set of defaults. This means you just use the archetype plugin to create a project ("mvn archetype:create" on the CLI). Think of an archetype as a template for your project; you can choose whatever archetype suits your needs best. If you use some framework, there is probably an archetype for it; otherwise the simple-project archetype will be your choice. Afterwards, your code goes in src/main/java, your test cases go in src/test/java, and "mvn install" builds everything. Dependencies can be added to the pom in Maven's dependency format. http://search.maven.org/ is the place to look for dependencies; if you find one there, you can simply copy the XML snippet into your pom.xml (which Maven's archetype system created for you).
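For example, a couple of snippets copied from search.maven.org drop into the pom's <dependencies> section like this (the coordinates and versions here are just examples):

```xml
<dependencies>
  <!-- compile-scope dependency: visible to src/main/java and src/test/java -->
  <dependency>
    <groupId>log4j</groupId>
    <artifactId>log4j</artifactId>
    <version>1.2.17</version>
  </dependency>
  <!-- test-scope dependency: visible only to src/test/java -->
  <dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.12</version>
    <scope>test</scope>
  </dependency>
</dependencies>
```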
In my experience, Maven is the fastest way to get a project with dependencies and test execution set up. Also, I have never seen a Maven build that worked on my machine fail somewhere else (except on computers with year-old Java versions). The charm is that Maven's default lifecycle (or build cycle) covers all the usual needs, and there are plugins for almost everything. However, you have a big problem the moment you want to do something that is not covered by Maven's lifecycle. I have only ever run into that in mixed-language projects: as soon as you need anything but Java, you're screwed.
Apache Ivy:
I've only ever used it together with Apache Ant. Ivy is a package manager; Ant provides the build system, and Ivy is integrated into Ant as a plugin. While Maven usually works out of the box, Ant requires you to write your build file manually. This allows for greater flexibility than Maven, but comes at the price of yet another file to write and maintain. Basically, Ant files are as complicated as any source code, which means you should comment and document them; otherwise you will not be able to maintain your build process later on.
Ivy itself is as easy as Maven's dependency system: you have an XML file which defines your dependencies. As with Maven, you can find the appropriate XML snippets on Maven Central (http://search.maven.org/).
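A minimal ivy.xml along those lines might look like this (the organisation/module values and the revisions are placeholders, not from any real project):

```xml
<!-- Sketch only: coordinates here are examples copied in Ivy's dependency format -->
<ivy-module version="2.0">
    <info organisation="com.example" module="my-app"/>
    <dependencies>
        <dependency org="log4j" name="log4j" rev="1.2.17"/>
        <dependency org="junit" name="junit" rev="4.12"/>
    </dependencies>
</ivy-module>
```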
In summary, I recommend Maven when you have a plain Java project; Ant is for cases where you need to do something special in your build.
I'm starting a new project in Scala that depends heavily on source files in another Java project. At first I thought about creating a new package/module in that project, but that seemed messy. However, I don't want to duplicate the Java code in my new project. The best solution I've come up with is to reference the Java project through an SVN external dependency. Another option is to build a jar file from the original project and attach it as a library to the new Scala project; however, that makes staying up to date less convenient. Which route is better?
SVN external:
Easy to update
Easy to commit
Inconvenient setup
Jar file as library:
Easy to set up
Old code isn't in active development (stable, bug fixes only)
Multi-step update
Need to open old project to make changes
You have your Scala project, and it depends on parts of your Java project. To me, that sounds like the Java project should be a library, with a version number, which is simply referenced by your Scala project. Or perhaps those parts that are shared across the two projects should be separated into a library. Using build tools like Maven will keep it clear which version is being used. The Java project can then evolve separately, or if it needs to change for the sake of the Scala project, you can bring out a new version and keep using an older one in other contexts if you're afraid of breakage.
The only exception where you go beyond binary dependencies that I can think of is if the Java code itself is actually being processed in some way at compile-time that is specific to the Scala project. Like, say, annotation processing.
Using SVN externals could be a solution. Just make sure you work with branches and snapshots to make sure some update to your Java code on the trunk doesn't suddenly make your Scala project inoperable.
Whether you have mixed Scala/Java or not is irrelevant.
You probably want to use external dependencies (jars) if both projects are very distinct and have different release cycles.
Otherwise, you can use sbt and make your two projects subprojects of the same build (your sbt build file would have the Scala project depend on the Java project). Or even, if they are really so intertwined, have a single project with both Java and Scala source files; sbt handles that just fine.
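A multi-project sbt build along those lines might look roughly like this (recent sbt syntax; the project names and directories are placeholders):

```scala
// build.sbt - a sketch; "java-core" and "scala-app" are assumed names
lazy val javaCore = (project in file("java-core"))
  .settings(
    crossPaths := false,       // pure-Java subproject: no Scala version suffix
    autoScalaLibrary := false  // don't add a scala-library dependency
  )

lazy val scalaApp = (project in file("scala-app"))
  .dependsOn(javaCore)  // the Scala project compiles against the Java project
```

With this setup, changes in the Java sources are picked up on the next compile of the Scala project, with no jar publishing or svn:externals step in between.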
Maven is an option too.
You can very easily set up a mixed Scala/Java project using Maven. Have a look at the scala-maven-plugin.
https://github.com/davidB/scala-maven-plugin
Eclipse users may not be too pleased, due to the poor Maven integration.
While downloading Google Guice I noticed two main "types" of artifacts available on their downloads page:
guice-3.0.zip; and
guice-3.0-src.zip
Upon downloading them both and inspecting their contents, they seem to be two totally different "perspectives" of the Guice 3.0 release.
The guice-3.0.zip just contains the Guice jar and its dependencies. The guice-3.0-src.zip, however, did not contain the actual Guice jar, but it did contain all sorts of other goodness: javadocs, examples, etc.
So it got me thinking: there must be different "configurations" of jars that get released inside Java projects. Crossing this idea with what little I know from build tools like Ivy (which has the concept of artifact configurations) and Maven (which has the concept of artifact scopes), I am wondering what the relation is between artifact configuration/scope and the end deliverable (the jar).
Let's say I was making a utility jar called my-utils.jar. In its Ivy descriptor, I could cite log4j as a compile-time dependency, and junit as a test dependency. I could then specify which of these two "configurations" to resolve against at buildtime.
What I want to know is: what is the "mapping" between these configurations and the content of the jars that are produced in the end result?
For instance, all of my compile-configuration dependencies might wind up in the main my-utils.jar, but would there ever be a reason to package my test dependencies into a my-utils-test.jar? And what kind of dependencies would go into my-utils-src.jar?
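To make that concrete, here is a sketch of an Ivy descriptor in which configurations map onto separately published artifacts (all names and revisions are illustrative, not from a real project):

```xml
<!-- Sketch only: my-utils and its artifacts are the hypothetical example above -->
<ivy-module version="2.0">
    <info organisation="com.example" module="my-utils"/>
    <configurations>
        <conf name="compile"/>
        <conf name="test" extends="compile"/>
        <conf name="sources"/>
    </configurations>
    <publications>
        <artifact name="my-utils"      type="jar"    conf="compile"/>
        <artifact name="my-utils-test" type="jar"    conf="test"/>
        <artifact name="my-utils-src"  type="source" ext="jar" conf="sources"/>
    </publications>
    <dependencies>
        <dependency org="log4j" name="log4j" rev="1.2.17" conf="compile->default"/>
        <dependency org="junit" name="junit" rev="4.12"   conf="test->default"/>
    </dependencies>
</ivy-module>
```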
I know these are a lot of tiny questions, so I guess you can sum everything up as follows:
For a major project, what are the typical varieties of jars that get released (such as guice-3.0.zip vs guice-3.0-src.zip, etc.), what are the typical contents of each, and how do they map back to the concept of Ivy configurations or Maven scopes?
The one you need in order to run Guice is guice-3.0.zip: it has the .class files in the correct package structure.
The other archive, guice-3.0-src.zip, has the .java source files and other things you might find useful. A smart IDE, like IntelliJ, can use the source archive to let you step into the Guice code with a debugger and see what's going on.
You can also learn a lot by reading the Guice source code. It helps to see how developers who are smarter than you and me write code.
I'd say that the best example I've found is the Efficient Java Matrix Library at Google Code. That has an extensive JUnit test suite that's available along with the source, the docs, and everything else that you need. I think it's most impressive. I'd like to emulate it myself.
In Java, if you package the source code (.java files) into the jar along with the classes (.class files), most IDEs, like Eclipse, will show the Javadoc comments during code completion.
IIRC there are a few open-source projects that do this, like JMock.
Let's say I have cleanly separated my API code from implementation code, so that I have something like myproject-api.jar and myproject-impl.jar. Is there any reason why I should not put the source code in my myproject-api.jar?
Because of performance? Size?
Why don't other projects do this?
EDIT: Other than the Maven download problem, will it hurt anything to put my sources into the classes jar, to support as many developers as possible (Maven or not)?
Generally because of distribution reasons: if you keep binaries and sources separate, you can download only what you need.
For instance:
myproject-api.jar and myproject-impl.jar
myproject-api-src.jar and myproject-impl-src.jar
myproject-api-docs.zip and myproject-impl-docs.zip
Now, m2eclipse (Maven for Eclipse) can download sources automatically as well:
mvn eclipse:eclipse -DdownloadSources=true -DdownloadJavadocs=true
It can also generate the right pom to prevent distribution of the source or javadoc jar when anyone declares a dependency on your jar.
The OP comments:
I also can't imagine download size being an issue (I mean, it is 2010 - a couple hundred KB should not be a problem).
Well, actually it (i.e. the size) is a problem.
Maven already suffers from the "downloading half the internet on first build" syndrome.
If it also downloaded sources and/or javadocs, that would get really tiresome.
Plus, the "distribution" aspect includes deployment: on a webapp server, there is no real advantage to deploying a jar with sources in it.
Finally, if you really need to associate sources with binaries, this SO question on Maven could help.
Using Maven, attach the sources automatically like this:
http://maven.apache.org/plugins/maven-source-plugin/usage.html
and the javadocs like this:
http://maven.apache.org/plugins/maven-javadoc-plugin/jar-mojo.html
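A pom.xml fragment wiring up both plugins might look like this (plugin versions are omitted for brevity; check the linked pages for current ones):

```xml
<build>
  <plugins>
    <!-- attach a -sources.jar during the package phase -->
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-source-plugin</artifactId>
      <executions>
        <execution>
          <id>attach-sources</id>
          <goals><goal>jar-no-fork</goal></goals>
        </execution>
      </executions>
    </plugin>
    <!-- attach a -javadoc.jar during the package phase -->
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-javadoc-plugin</artifactId>
      <executions>
        <execution>
          <id>attach-javadocs</id>
          <goals><goal>jar</goal></goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```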
That way they will automatically be picked up by
mvn eclipse:eclipse -DdownloadSources=true -DdownloadJavadocs=true
or by m2eclipse