Multi-component versioning/building best practices - java

I have a Java project, built with Maven, that aggregates several components, each one in its own Maven project. Any one of these components may evolve separately.
The structure of my project can be described as follows:
my-main-project that depends on:
my-component-1
my-component-2
etc.
Currently, all the pom.xml files use SNAPSHOT versions, so they all resolve to the latest version available in my repository.
But once I ship a release version to my customer, I'm supposed to freeze the versions and create a tag (or equivalent) in my source control, so I can restore a previous state in case of maintenance.
So, my question is: should I change all pom.xml files before each release, give version numbers to the components, and tie everything together with these dependency versions? If I have many components (my project currently has 30+ small subcomponents), would I have to renumber/reversion each one before each release? And when a single component evolves (due to a bug fix or enhancement), must I increase its version so that the changes do not affect pre-existing releases?
How do people using Maven generally handle this many-component versioning case?
Of course, I could just rely on my version-control tags to restore a previous point in time and simply tag every component on each release, but I don't like this approach: dependency versioning (with Maven) gives me much more control and visibility over what is packaged, over (broken) compatibility relations, and more.

General Considerations
You may consider the relations between your components.
Are they really independent (each one vs. each other)? Or is there some kind of relationship, some common lifecycles?
If you find some relationship between them, consider using Maven multi-module builds: http://www.sonatype.com/books/mvnex-book/reference/multimodule.html. In a few words, you will have a parent with one version and several modules (several jars, much like Spring and its submodules). This will help you reduce version management.
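For illustration, a minimal sketch of such a parent POM, reusing the component names from the question (the groupId is a placeholder):

<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.example</groupId>             <!-- placeholder groupId -->
  <artifactId>my-main-project</artifactId>
  <version>1.0.0-SNAPSHOT</version>          <!-- one version shared by all modules -->
  <packaging>pom</packaging>
  <modules>
    <module>my-component-1</module>
    <module>my-component-2</module>
  </modules>
</project>

Each module then only declares this parent and inherits its version, so there is a single number to manage per release.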
You may also consider using the maven-release-plugin. It will help you tag, build and deploy your modules automatically, dealing more easily with versioning and with the links to your SCM and repository.
Moreover, combined with a multi-module build, it will help you drastically!
There are many topics dealing with this on Stack Overflow.
I don't know whether you already knew all that. I could explain it further if you want, but you may already have enough elements to search by yourself if not.
Straight Answers
So, my question is: should I change all pom.xml files before each release, give version numbers to the components, and tie everything with this dependency versions?
Yes, you should. In Application Lifecycle Management, tracking changes is REALLY important. So, as you can imagine, and as you point out, you really should build and tag each of your components. It can be painful, but with the maven-release-plugin and a multi-module build (even better on a Continuous Integration platform) it becomes much easier.
would I have to renumber/reversion each one before each release?
For exactly the same reasons: yes!
must I increase its version so that the changes do not affect pre-existing releases, right?
Yes, you should, too. Assuming you choose a common versioning scheme like MAJOR.minor.correction, the first number indicates compatibility breaks, minor versions may introduce breaks but should not, and corrections should NEVER affect compatibility.
How people using maven generally handle this many-component versioning case?
I cannot reply for everyone, but my previous comments on the release plugin and multi-module builds are considered best practices. If you want to go a little further, you could imagine using a more powerful SCM (ClearCase, Perforce, ...), but their Maven integration is weaker, less well documented, and the community provides fewer examples than for SVN or Git.

Maven Release Plugin
If you are using a multi-module pom.xml you should be able to run mvn release:prepare -DautoVersionSubmodules=true followed by mvn release:perform, and have it do a "release" build of all your modules, remove the -SNAPSHOT versions and upload them to your repository. That is what the release plugin and its workflow exist solely to do.
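In practice that is two goals; a minimal sketch (autoVersionSubmodules makes every module take the same version as the parent, so you are prompted only once):

mvn release:prepare -DautoVersionSubmodules=true   # strips -SNAPSHOT, commits and tags in the SCM
mvn release:perform                                # checks out the tag, builds and deploys it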

Related

Gitflow Workflow with Maven - when to build what?

Gitflow introduces several branches like develop, release, hotfix, and also encourages feature branches.
In a Maven project, you usually build SNAPSHOT and release versions, and often number them with semantic, three-digit versions.
It would be sensible to automate the build process as much as possible, but the question is: when should we build a SNAPSHOT version, when should we build a release version, and when should we build nothing at all?
I imagine the following could be sensible:
Whenever a feature branch is merged back into develop, a SNAPSHOT build is triggered and deployed to the Maven repository.
When a release branch is created, a release build is started.
But there are many more situations:
When I fix bugs on the release (or hotfix) branch, do I always want a new release build?
During developing a feature, should I build on the feature branch? If so, what should this version be called (1.2.3-FEATURE1-SNAPSHOT?)?
Let's start with releases. Whether a version is going to be released or not is decided later, when an already-built binary has been deployed to TST environments and checked. At commit or build time you can't predict whether the version will become "the release".
Once you abandon the idea of deciding this at build time, things become much simpler. And since you can't use branch-based versions for releases, what's the point of making things different for feature branches? You might as well stop mixing the concepts of branching and versioning altogether.
With Continuous Delivery (you can borrow its ideas even if you don't use it to the fullest) any build may potentially go to PRD, thus:
Build a binary with any type of versioning that you like. With Maven the easiest is to stick with SNAPSHOT versions* and never use "release" ones. A SNAPSHOT version is unique, it's standard, and it has some advantages with Nexus (retention policies).
When you're ready to go to PRD and the release version is chosen - tag it somehow. It can be a CI Job that keeps track of all PRD deployments; or you may have a page with all the release versions; or you may transfer the binary to another Maven repo (still can be a SNAPSHOT type). The latter is convenient if you go with retention policies for the snapshots.
Also, we usually want to record which commit the binary was built from. You can put this into the binary (some kind of version.properties) at build time. You may even create an endpoint in your app that serves this version, for convenience.
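A hedged sketch of one way to do this with standard Maven resource filtering (the commit property assumes something like the third-party git-commit-id plugin is configured; the file name is arbitrary):

In the pom.xml:
<build>
  <resources>
    <resource>
      <directory>src/main/resources</directory>
      <filtering>true</filtering>   <!-- expand ${...} placeholders at build time -->
    </resource>
  </resources>
</build>

In src/main/resources/version.properties:
# 'commit' assumes the git-commit-id plugin provides this property
version=${project.version}
commit=${git.commit.id.abbrev}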
PS: if you simply want to follow GitFlow advice - there is an example of how you could version. But you'll have all the problems (and more) that you already mentioned in the question.
* Maven automatically resolves SNAPSHOT versions into timestamped ones. But you can't really rely on that functionality here, because the timestamp will differ between artifacts within the same build. If you want to keep the version identical across all the binaries in the build, you need to generate and assign a timestamped version manually using versions:set. It's not complicated, but it is worth mentioning.
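A minimal sketch of that manual step (the version format is just one possible convention):

VERSION="1.2.3-$(date -u +%Y%m%d.%H%M%S)-1"    # one timestamped version for the whole build
mvn versions:set -DnewVersion="${VERSION}"
mvn deploy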

How do big companies tackle the package dependency conflict problem?

One app (Java) references two third-party jars (packageA and packageB), which in turn reference packageC-0.1 and packageC-0.2 respectively. Everything works as long as packageC-0.2 is compatible with packageC-0.1. But sometimes packageA uses something that is no longer supported in packageC-0.2, and Maven will put only one version of the jar on the classpath. This issue is also known as "JAR hell".
In practice it would be difficult to rewrite packageA or to force its developers to update to packageC-0.2.
How do you tackle these problems? They often occur in large companies.
I should point out that this problem mostly occurs in BIG companies, because a big company has many departments and it would be very expensive to make the whole company update a dependency every time some developers want to use new features from a newer version of a jar. In small companies this is not a big deal.
Any response will be highly appreciated.
Let me "throw out a brick to attract jade" and offer a first answer in the hope of drawing out better ones.
Alibaba is one of the largest e-commerce companies in the world, and we tackle these problems with an isolation container named Pandora. Its principle is simple: package the middleware components together and load them with different ClassLoaders, so that they can work together even when they reference the same packages in different versions. This requires a runtime environment provided by Pandora, which runs as a Tomcat process; I have to admit it is a heavyweight solution. Pandora builds on the fact that the JVM identifies a class by its classloader plus its class name.
If you know someone who may know the answer, share the link with them.
We are a large company and we have this problem a lot. We have large dependency trees that span several developer groups. What we do:
We manage versions through BOMs (Maven dependencyManagement lists) of "recommended versions", published by the maintainers of the jars; see the sketch after this list. This way we make sure that recent versions of the artifacts are used.
We try to reduce the large dependency trees by separating the functionality that is used inside a developer group from the functionality it offers to other groups.
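For illustration, a hedged sketch of how such a "recommended versions" BOM is consumed (all coordinates are placeholders):

<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.example.platform</groupId>      <!-- placeholder -->
      <artifactId>recommended-versions-bom</artifactId>
      <version>42</version>
      <type>pom</type>
      <scope>import</scope>   <!-- pulls the BOM's dependencyManagement into this POM -->
    </dependency>
  </dependencies>
</dependencyManagement>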
But I admit that we are still trying to find better strategies. Let me also mention that "microservices" are a strategy against this problem, but in many cases they are not a valid strategy for us (mainly because we would lose global transactions on our databases).
This is a common problem in the java world.
Your best option is to regularly maintain and update the dependencies of both packageA and packageB.
If you have control over those applications - make time to do it. If you don't have control, demand that the vendor or author make regular updates.
If both packageA and packageB are used internally, you can use the following practice: have all internal projects in your company refer to a parent pom.xml that defines "up to date" versions of commonly used third-party libraries.
For example:
<framework.jersey>2.27</framework.jersey>
<framework.spring>4.3.18.RELEASE</framework.spring>
<framework.spring.security>4.2.7.RELEASE</framework.spring.security>
Therefore, if your projects "A" and "B" both use Spring and both inherit from the latest version of your company's parent POM, they will both use 4.3.18.RELEASE.
When a new version of Spring is released and desirable, you update your company's parent POM and make all other projects pick up that latest version.
This will solve many of these dependency mismatch issues.
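For illustration, this is how a project inheriting the parent would then reference such a property instead of a hard-coded version (spring-context is just an example artifact):

<dependency>
  <groupId>org.springframework</groupId>
  <artifactId>spring-context</artifactId>
  <version>${framework.spring}</version>   <!-- resolved from the parent POM -->
</dependency>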
Don't worry, it's common in the Java world; you're not alone. Just google "JAR hell" to understand the issue in its broader context.
By the way, mvn dependency:tree is your friend for isolating these dependency problems.
I agree with the answer of @JF Meier. In a Maven multi-module project, a dependencyManagement section is usually defined in the parent POM for unified version management: the versions are fixed once for the dependencies declared there, so modules that then declare those dependencies directly do not need to specify a version.
In the parent POM:
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.devzuz.mvnbook.proficio</groupId>
      <artifactId>proficio-model</artifactId>
      <version>${project.version}</version>
    </dependency>
  </dependencies>
</dependencyManagement>
In your module, you do not need to set the version:
<dependencies>
  <dependency>
    <groupId>com.devzuz.mvnbook.proficio</groupId>
    <artifactId>proficio-model</artifactId>
  </dependency>
</dependencies>
This avoids version inconsistencies.
This question can't be answered in general.
In the past we usually just didn't use different versions of the same dependency. If the version changed, team- or company-wide refactoring was necessary. I doubt running two versions side by side is even possible with most build tools.
But to answer your question:
Simple answer: Don't use two versions of one dependency within one compilation unit (usually a module)
But if you really have to do this, you could write a wrapper module that references the legacy version of the library.
But my personal opinion is that within one module there should be no need for such constructs, because "one module" should be small enough to stay manageable. Otherwise it might be a strong indicator that the project needs some modularization refactoring. However, I know very well that some projects in "large-scale companies" can be a huge mess with no 'good' option available. I guess you are talking about a situation where packageA is owned by a different team than packageB, and that is generally a very bad design decision, due to the lack of separation and the inherent dependency problems.
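One way to build such a wrapper module (a hedged sketch, not necessarily what this answer had in mind) is the maven-shade-plugin's class relocation, which rewrites the legacy library's packages into a private namespace so both versions can coexist on the classpath:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <relocation>
            <!-- package names are placeholders for packageC 0.1 -->
            <pattern>com.example.packagec</pattern>
            <shadedPattern>legacy.com.example.packagec</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>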
First of all, try to avoid the problem. As mentioned in @Henry's comment, don't use third-party libraries for trivial tasks.
However, we all use libraries. And sometimes we end up with the problem you describe, where we need two different versions of the same library. If library 'C' has removed and added some APIs between the two versions, and the removed APIs are needed by 'A', while 'B' needs the new ones, you have an issue.
In my company, we run our Java code inside an OSGi container. Using OSGi, you can modularize your code into "bundles": jar files with some special directives in their manifest file. Each bundle has its own classloader, so two bundles can use different versions of the same library. In your example, you could split the application code that uses packageA into one bundle, and the code that uses packageB into another. The two bundles can call each other's APIs, and it will all work fine as long as neither bundle uses packageC classes in the signatures of the methods called by the other bundle (known as API leakage).
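For illustration, a hedged sketch of the manifest headers involved (bundle and package names are placeholders); the OSGi framework wires each bundle to the packageC version matching its import range:

In the bundle wrapping the packageA code:
Bundle-SymbolicName: com.example.app.a
Import-Package: com.example.packagec;version="[0.1,0.2)"

In the bundle wrapping the packageB code:
Bundle-SymbolicName: com.example.app.b
Import-Package: com.example.packagec;version="[0.2,0.3)"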
To get started with OSGi, you can e.g. take a look at OSGi enRoute.

Who is using my maven artifact?

I have a system consisting of multiple web applications (WARs) and libraries (JARs). All of them use Maven and are under my control (source code, built artifacts in Nexus, ...). Let's say application A uses library L1 directly and L2 indirectly (it is used from L1). I can easily check the dependency tree top-down from the application, using Maven's dependency:tree or graph:project plugins. But how can I check who is using my library? From my example, I want to know whether A is the only application (or library) using L1, and whether L2 is used by L1 and also by some other application, say B. Is there a plugin for Maven or Nexus, or should I write a script for that? What are your suggestions?
If you wish to achieve this at the repository level, Apache Archiva has a "used by" feature listed under project information.
This is similar to what mvnrepository.com lists under its "used by" section of an artifact description.
Unfortunately, Nexus does not seem to provide an equivalent feature.
Now I suppose it would be a hassle to maintain yet another repository just for this, but it would probably be easier than what some other answers suggest, such as writing a plugin for Nexus. I believe Archiva can be configured to proxy other repositories.
Update
In fact, there's also a plugin for Nexus to achieve the "used by" feature.
As far as I know, nothing along these lines exists as an open source tool. You could write a Nexus plugin that traverses a repo and checks for usages of your component in all other components, by iterating through all the POMs and analyzing them. This would be a rather heavy task to run, though, since it would have to look at all components and parse all the POMs.
In a similar fashion you could do it on a local repository with some other tool. However it probably makes more sense to parse the contents of a repo manager rather than a local repository.
I don't think there's a Maven way to do this. That being said, there are ways of doing this or similar things. Here are a handful of examples:
Open up your projects in your favorite IDE. For instance, Eclipse will help you with impact analysis at the class level, which most of the time might be good enough
Use a simple "grep" on your source directory. This sounds a bit crude (as well as stating the obvious), perhaps, but we've used it a lot
Use dependency analysis tools such as Sonargraph or Lattix
I am not aware of any public libraries for this job, so I wrote a customized app which does it for me.
I work with a distribution that involves more than 70 artifacts bundled together. Often, after modifying an artifact, I want to ensure the changes are backward compatible (i.e. that no compilation errors are introduced in dependent artifacts). To achieve this, it is crucial to know all dependents of the modified artifact.
Hence, I wrote an app that scans all artifacts under a directory (and its subdirectories), extracts their pom.xml and searches the dependency section of each POM for occurrences of the modified artifact.
(I did this in Java, although a shell/Windows script could do it even more compactly.)
I'll be happy to share code on github, if that could be of any help.
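For a quick approximation of such a scan, even a one-liner goes a long way (the artifact name is a placeholder; usages via transitive dependencies or version properties will not show up):

grep -rl --include=pom.xml "<artifactId>my-library</artifactId>" .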
One way that might suit your needs is to create a master POM that lists all your Maven projects as modules. Then you run the following command on the master POM:
mvn dependency:tree -DoutputType=graphml -DoutputFile=dependency.graphml
Open the generated file in yEd.
I used the instructions found here:
http://www.summa-tech.com/blog/2011/04/12/a-visual-maven-dependency-tree-view/
More interesting is probably: what would you do with this information? Inform the developers of A not to use library L1 or L2 anymore, because it has a critical bug?
In my opinion, you should be able to create a blacklist of dependencies/parents/plugins on your repository manager: once a project tries to deploy/upload itself with a blacklisted artifact, it should fail. I'm saying uploading and not downloading, because blocking downloads might break a lot of projects. As far as I know, this is not yet available in any repository manager.
One way to approach this problem is outside Java itself: write an OS-level monitoring script that tracks each fopen() on the jar file in question. Assuming this is a corporate environment, you might have to wait a few weeks (!) for all using processes to access the library at least once.
On Windows, you might use Sysinternals Process Monitor to do this:
http://technet.microsoft.com/en-us/sysinternals/bb896645
On Unix variants, you would use DTrace or strace.
IMHO, and also from my experience, looking for a technical solution to such a problem is often overkill. If you want to know who is using your artifact (library) because you want to ensure backward compatibility when you change it, this is best done by communicating your changes through traditional channels and encouraging the other teams that might be using your library to talk about it (project blogs, wikis, email, a well-known location for documentation, a Jour fixe, etc.).
In theory, you could write a script that crawls through each project in your repository, parses the Maven pom.xml (assuming they all use Maven) and checks whether it declares a dependency on your artifact. If all the projects in your organization follow the standard Maven structure, such a script should be easy to write (though if any of those projects pull in your artifact via a transitive dependency, things get a bit trickier).

Maven best practices: including timestamps for snapshot releases or not?

I recently added Maven snapshot build capability to a project, configured to use unique timestamped versions on the deployed artifacts. But there is some confusion about whether this is the right thing to do (the snapshots in question are deployed to one of the public repos, not just within an entity such as a company): some say it causes problems when trying to use the snapshots.
So: given how much of Maven is convention based, and following perceived best practices, I am hoping there are some guidelines as to which option to choose.
(NOTE: I slightly edited the title. I am specifically interested in the benefits (or lack thereof) of including a unique timestamp, via the deploy option, for public snapshot versions; not so much in whether to make use of timestamps if they are included, although that is obviously a somewhat related question.)
As a rule you should always build against the -SNAPSHOT dependency during development. However, you should avoid releasing your product if it includes -SNAPSHOT dependencies. If you use the Maven Release Plugin to automate your release, it will check to make sure you are not using -SNAPSHOT plug-ins or dependencies.
But that is not always possible. In the cases where I need to release something based on a snapshot build, I use an explicit timestamp/build number rather than the -SNAPSHOT version naming scheme.
You can automate this using the Versions Maven Plugin. It provides goals to lock and unlock snapshot versions in your POM.
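A minimal sketch of those goals (both are standard goals of the versions-maven-plugin):

mvn versions:lock-snapshots      # pins each -SNAPSHOT to its current timestamped build
mvn versions:unlock-snapshots    # reverts the pinned versions back to plain -SNAPSHOT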
The whole point of a snapshot is to let someone use the latest version of the code. Why would you want to use a snapshot five versions back?
With that in mind, what do timestamps in the artifact name buy you?
I've been using Maven for about 5 years now. I've never seen a snapshot jar with a timestamp in the name.

How to modularize a (large) Java App?

I have a rather large (several MLOC) application at hand that I'd like to split up into more maintainable separate parts. Currently the product comprises about 40 Eclipse projects, many of them with inter-dependencies. This alone makes a continuous build system infeasible, because it would have to rebuild very much with each check-in.
Is there a "best practice" way of how to
identify parts that can immediately be separated
document inter-dependencies visually
untangle the existing code
handle "patches" we need to apply to libraries (currently handled by putting them in the classpath before the actual library)
If there are (free/open) tools to support this, I'd appreciate pointers.
Even though I do not have any experience with Maven, it seems to force a very modular design. I now wonder whether this is something that can be retrofitted iteratively, or whether a project adopting it would have to be laid out with modularity in mind right from the start.
Edit 2009-07-10
We are in the process of splitting out some core modules using Apache Ant/Ivy. A really helpful and well-designed tool, which does not impose as much on you as Maven does.
I wrote down some more general details and personal opinion about why we are doing that on my blog - too long to post here and maybe not interesting to everyone, so follow at your own discretion: www.danielschneller.com
Using OSGi could be a good fit for you. It would allow you to create modules out of the application. You can also organize the dependencies in a better way. If you define the interfaces between the different modules correctly, you can use continuous integration, since you only have to rebuild the module affected by a check-in.
The mechanisms provided by OSGi will help you untangle the existing code. Because of the way the classloading works, it also helps you handle the patches in an easier way.
Some concepts of OSGi that seem to be a good match for you, as shown from wikipedia:
The framework is conceptually divided into the following areas:
Bundles - Bundles are normal jar components with extra manifest headers.
Services - The services layer connects bundles in a dynamic way by offering a publish-find-bind model for plain old Java objects (POJOs).
Services Registry - The API for management services (ServiceRegistration, ServiceTracker and ServiceReference).
Life-Cycle - The API for life cycle management (install, start, stop, update, and uninstall bundles).
Modules - The layer that defines encapsulation and declaration of dependencies (how a bundle can import and export code).
Security - The layer that handles the security aspects by limiting bundle functionality to pre-defined capabilities.
First: good luck & good coffee. You'll need both.
I once had a similar problem: legacy code with awful circular dependencies, even between classes from different packages, like org.example.pkg1.A depending on org.example.pkg2.B and vice versa.
I started with maven2 and fresh eclipse projects. First I tried to identify the most common functionalities (logging layer, common interfaces, common services) and created maven projects. Each time I was happy with a part, I deployed the library to the central nexus repository so that it was almost immediately available for other projects.
So I slowly worked up through the layers. Maven handled the dependencies, and the m2eclipse plugin provided a helpful dependency view. BTW, it's usually not too difficult to convert an Eclipse project into a Maven project: m2eclipse can do it for you, and you just have to create a few new folders (like src/main/java) and adjust the build path for the source folders. That takes just a minute or two. But expect more difficulties if your project is an Eclipse plugin or RCP application and you want Maven not only to manage artifacts but also to build and deploy the application.
In my opinion, Eclipse, Maven and Nexus (or any other Maven repository manager) are a good basis to start from. You're lucky if you have good documentation of the system architecture and that architecture is really implemented ;)
I had a similar experience in a small code base (40 kLOC). There are no "rules", but here is what I did:
I compiled with and without a "module" in order to see its usage.
I started from "leaf modules", modules without dependencies on other modules.
I handled the cyclic dependencies (this is a very error-prone task).
With Maven there is a great deal of documentation (reports) that can be deployed in your CI process.
With Maven you can always see what uses what, both in the generated site and in NetBeans (with a very nice directed graph).
With Maven you can import library code into your codebase, apply source patches and compile it with your products (sometimes this is very easy, sometimes it is very difficult).
Check also the Dependency Analyzer and the NetBeans dependency viewer (screenshots omitted; sources: javalobby.org and zimmer428.net).
Maven is painful to migrate to for an existing system. However, it can cope with 100+ module projects without much difficulty.
The first thing you need to decide is what infrastructure you will move to. Should it be a lot of independently maintained modules (which translates into individual Eclipse projects), or will you treat it as a single chunk of code that is versioned and deployed as a whole? The first is well suited to migrating to a Maven-like build environment, the latter to having all the source code checked out at once.
In any case you WILL need a continuous integration system running. Your first task is to make the code base build automatically, so your CI system can watch your source repository and rebuild it when you change things. I decided on a non-Maven approach here: we focus on having an easy Eclipse environment, so I created a build environment using ant4eclipse and Team Project Set files (which we use anyway).
The next step is getting rid of the circular dependencies: this will make your build simpler, get rid of Eclipse warnings, and eventually let you reach the "checkout, compile once, run" stage. This might take a while :-( When you migrate methods and classes, do not MOVE them; extract or delegate them, leave their old names in place, and mark those deprecated (see the sketch below). This separates the untangling from the refactoring, and allows code "outside" your project to keep working with the code inside it.
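A minimal Java sketch of that delegate-and-deprecate step (all names are hypothetical; in practice the two classes live in separate files):

// New home of the logic, extracted during the untangling.
final class NewStringUtils {
    static String trimToEmpty(String s) {
        return s == null ? "" : s.trim();
    }
}

// The old class stays behind as a delegating shim, so callers outside the project keep compiling.
public final class StringUtils {
    /** @deprecated use NewStringUtils#trimToEmpty instead */
    @Deprecated
    public static String trimToEmpty(String s) {
        return NewStringUtils.trimToEmpty(s);
    }
}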
You WILL benefit from a source repository that allows moving files while keeping their history. CVS is very weak in this regard.
I wouldn't recommend Maven for a legacy source code base. It could give you many headaches just trying to adapt everything to work with it.
I suppose what you need is an architectural layout of your project. A tool might help, but the most important part is to organize a logical view of the modules.
It's not free, but Structure101 will give you as good tool support as you will get for hitting all your bullet points. For the record, I'm biased, so you might want to check out SonarJ and Lattix too. ;-)
