How do big companies tackle the package dependency conflict problem?


As shown in the picture, one Java app references two third-party jars (packageA and packageB), which depend on packageC-0.1 and packageC-0.2 respectively. This works well if packageC-0.2 is backward compatible with packageC-0.1. Sometimes, however, packageA uses something that packageC-0.2 no longer supports, and Maven puts only one version of packageC on the classpath (the "nearest" declaration, and for equally deep declarations the first one declared, which is not necessarily the latest). This issue is also known as "Jar Hell".
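Maven does not simply pick the latest version here. Its documented mediation rule is "nearest definition wins", and when two declarations are equally deep, the first one declared wins. The sketch below models that rule under simplifying assumptions (it ignores scopes, exclusions, optional dependencies and dependencyManagement); the Dep and NearestWins classes are hypothetical, and packageA/packageB/packageC are the placeholders from the question. A breadth-first walk in declaration order visits shallower nodes first and siblings in order, so the first version seen for an artifact is the one Maven would keep.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical node in a dependency tree: artifact id, version, and its own dependencies.
final class Dep {
    final String artifact;
    final String version;
    final List<Dep> children;

    Dep(String artifact, String version, Dep... children) {
        this.artifact = artifact;
        this.version = version;
        this.children = Arrays.asList(children);
    }
}

final class NearestWins {
    // Breadth-first walk: shallower declarations are visited first, and siblings are
    // visited in declaration order, so the first version seen for an artifact wins.
    static Map<String, String> resolve(List<Dep> directDependencies) {
        Map<String, String> chosen = new LinkedHashMap<>();
        Deque<Dep> queue = new ArrayDeque<>(directDependencies);
        while (!queue.isEmpty()) {
            Dep dep = queue.poll();
            if (chosen.containsKey(dep.artifact)) {
                continue; // a nearer (or earlier) declaration already won; skip this subtree
            }
            chosen.put(dep.artifact, dep.version);
            queue.addAll(dep.children);
        }
        return chosen;
    }
}
```

With packageA declared before packageB, packageC resolves to 0.1; swapping the two declarations makes it 0.2. Either way only one version survives, which is exactly the conflict described above.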
In practice it would be difficult to rewrite packageA or to force its developers to upgrade to packageC 0.2.
How do you tackle this problem? It happens often in large companies.
I should point out that this problem mostly occurs in BIG companies: a big company has a lot of departments, and it would be very expensive to make the whole company update a dependency every time some developers want to use features from a new version of a jar. In small companies it is not a big deal.
Any response will be highly appreciated.
Let me offer my own modest answer first, in the hope of attracting better ones.
Alibaba is one of the largest e-commerce companies in the world, and we tackle this problem with an isolation container named Pandora. Its principle is simple: package the middleware together and load each piece with a different ClassLoader, so that they work together even when they reference different versions of the same package. The downside is that this needs a runtime environment provided by Pandora, which runs as a Tomcat process; I have to admit that it is a heavyweight solution. Pandora is based on the fact that the JVM identifies a class by its class loader plus its class name.
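To see the principle Pandora relies on, here is a small self-contained sketch (not Pandora's code; IsolationDemo and packagec.Greeter are made-up names). It compiles two versions of the same class at runtime and loads each through its own URLClassLoader whose parent is the platform loader. Both Class objects are named packagec.Greeter, yet the JVM treats them as different classes, and each answers with its own version. It assumes a JDK rather than a JRE, because it uses the javax.tools compiler.

```java
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

final class IsolationDemo {
    // Writes and compiles a tiny packagec.Greeter whose version() returns the given string,
    // standing in for one version of the packageC jar. Requires a JDK (javax.tools compiler).
    static Path buildPackageC(String version) {
        try {
            Path root = Files.createTempDirectory("packagec-" + version + "-");
            Path source = root.resolve("packagec").resolve("Greeter.java");
            Files.createDirectories(source.getParent());
            Files.writeString(source, "package packagec; public class Greeter {"
                    + " public String version() { return \"" + version + "\"; } }");
            JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
            if (javac == null || javac.run(null, null, null, source.toString()) != 0) {
                throw new IllegalStateException("could not compile packagec.Greeter " + version);
            }
            return root; // without -d, javac writes Greeter.class next to Greeter.java
        } catch (java.io.IOException e) {
            throw new java.io.UncheckedIOException(e);
        }
    }

    // Loads packagec.Greeter through a fresh loader whose parent is the platform loader,
    // so each loader sees only its own copy and never the application classpath.
    // The loader is left open because the returned class still needs it.
    static Class<?> loadIsolated(Path root) {
        try {
            URLClassLoader loader = new URLClassLoader(
                    new URL[] { root.toUri().toURL() }, ClassLoader.getPlatformClassLoader());
            return loader.loadClass("packagec.Greeter");
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Instantiates the class reflectively and calls version().
    static String versionOf(Class<?> greeter) {
        try {
            Object instance = greeter.getDeclaredConstructor().newInstance();
            Method version = greeter.getMethod("version");
            return (String) version.invoke(instance);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Containers such as Pandora or OSGi add what this sketch leaves out, most importantly deciding which packages are shared through a common parent loader and which stay private.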
If you know someone who might know the answer, please share the link with them.

We are a large company and we have this problem a lot. We have large dependency trees that span several developer groups. What we do:
We manage versions with BOMs (lists of Maven dependencyManagement entries) of "recommended versions" that are published by the maintainers of the jars. This way, we make sure that recent versions of the artifacts are used.
We try to reduce the large dependency trees by separating the functionality that is used inside a developer group from the one that they offer to other groups.
But I admit that we are still trying to find better strategies. Let me also mention that using "microservices" is a strategy against this problem, but in many cases it is not a valid strategy for us (mainly because we could not have global transactions on databases any more).

This is a common problem in the java world.
Your best options are to regularly maintain and update dependencies of both packageA and packageB.
If you have control over those applications - make time to do it. If you don't have control, demand that the vendor or author make regular updates.
If both packageA and packageB are used internally, you can use the following practice: have all internal projects in your company refer to a parent pom.xml that defines "up to date" versions of commonly used third-party libraries.
For example:
<properties>
    <framework.jersey>2.27</framework.jersey>
    <framework.spring>4.3.18.RELEASE</framework.spring>
    <framework.spring.security>4.2.7.RELEASE</framework.spring.security>
</properties>
Therefore, if your projects "A" and "B" both use Spring and both use the latest version of your company's "parent" pom (referencing the version as ${framework.spring}), they will both use 4.3.18.RELEASE.
When a new version of spring is released and desirable, you update your company's parent pom, and force all other projects to use that latest version.
This will solve many of these dependency mismatch issues.
Don't worry, it's common in the java world, you're not alone. Just google "jar hell" and you can understand the issue in the broader context.
By the way, mvn dependency:tree is your friend for isolating these dependency problems.

I agree with JF Meier's answer. In a Maven multi-module project, unified version management is usually done by declaring a dependencyManagement section in the parent POM. The dependencies listed there define the version to use for each artifact, so modules that declare one of those dependencies in their own dependencies section do not need to specify a version. For example:
In the parent POM:
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>com.devzuz.mvnbook.proficio</groupId>
            <artifactId>proficio-model</artifactId>
            <version>${project.version}</version>
        </dependency>
    </dependencies>
</dependencyManagement>
In your module, you do not need to set the version:
<dependencies>
    <dependency>
        <groupId>com.devzuz.mvnbook.proficio</groupId>
        <artifactId>proficio-model</artifactId>
    </dependency>
</dependencies>
This avoids inconsistent versions across modules.
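The rule that makes this work can be written as a small function. This is a simplified, hypothetical model (ManagedVersions is a made-up name, and scopes and BOM imports are ignored), not Maven's own code: a direct dependency with an explicit version keeps it, a direct dependency without a version takes the managed one, and a transitive dependency is forced to the managed version when there is one.

```java
import java.util.Map;

final class ManagedVersions {
    // Effective version of one dependency once dependencyManagement is in play.
    //  - direct dependency with its own version: that version is used;
    //  - direct dependency without a version: the managed version is used;
    //  - transitive dependency: the managed version overrides whatever was declared upstream.
    static String effectiveVersion(String artifact, String declaredVersion, boolean direct,
                                   Map<String, String> managed) {
        String managedVersion = managed.get(artifact);
        if (direct) {
            if (declaredVersion != null) {
                return declaredVersion;
            }
            if (managedVersion == null) {
                throw new IllegalStateException("no version declared or managed for " + artifact);
            }
            return managedVersion;
        }
        return managedVersion != null ? managedVersion : declaredVersion;
    }
}
```

The transitive case is the one that helps with the original packageC conflict: pinning packageC in the parent's dependencyManagement makes every module resolve it to that one version, whatever packageA or packageB declare.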

This question can't be answered in general.
In the past we usually just didn't use different versions of the same dependency. If the version changed, team-/company-wide refactoring was necessary. I doubt that using two versions side by side is possible with most build tools.
But to answer your question:
Simple answer: Don't use two versions of one dependency within one compilation unit (usually a module).
But if you really have to do this, you could write a wrapper module that references the legacy version of the library.
But my personal opinion is that within one module there should not be the need for these constructs because "one module" should be relatively small to be manageable. Otherwise it might be a strong indicator that the project could use some modularization refactoring. However, I know very well that some projects of "large-scale companies" can be a huge mess where no 'good' option is available. I guess you are talking about a situation where packageA is owned by a different team than packageB... and this is generally a very bad design decision due to the lack of separation and inherent dependency problems.

First of all, try to avoid the problem. As mentioned in Henry's comment, don't use 3rd party libraries for trivial tasks.
However, we all use libraries. And sometimes we end up with the problem you describe, where we need two different versions of the same library. If library 'C' has removed and added some APIs between the two versions, and the removed APIs are needed by 'A', while 'B' needs the new ones, you have an issue.
In my company, we run our Java code inside an OSGi container. Using OSGi, you can modularize your code into "bundles", which are jar files with some special directives in their manifest file. Each bundle jar has its own classloader, so two bundles can use different versions of the same library. In your example, you could split the application code that uses 'packageA' into one bundle and the code that uses 'packageB' into another. The two bundles can call each other's APIs, and it will all work fine as long as your bundles do not use 'packageC' classes in the signatures of the methods used by the other bundle (known as API leakage).
To get started with OSGi, you can e.g. take a look at OSGi enRoute.
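In OSGi each bundle states which package versions it accepts in its manifest, for example with a header like Import-Package: packagec;version="[0.1,0.2)", and the framework wires it to a bundle that exports a matching version. The sketch below only models that range check, with a hypothetical class name and a simplified version format (major.minor.micro, no qualifier). It shows how the bundle using packageA could accept [0.1,0.2) while the bundle using packageB accepts [0.2,1.0), so each is wired to its own packageC.

```java
import java.util.Arrays;

final class OsgiRange {
    // Parses "major[.minor[.micro]]" into three ints; missing parts default to 0.
    // Real OSGi versions may also carry a qualifier, which this sketch ignores.
    static int[] parse(String version) {
        String[] parts = version.trim().split("\\.");
        int[] v = new int[3];
        for (int i = 0; i < Math.min(3, parts.length); i++) {
            v[i] = Integer.parseInt(parts[i]);
        }
        return v;
    }

    static int compare(String a, String b) {
        return Arrays.compare(parse(a), parse(b)); // lexicographic: major, then minor, then micro
    }

    // Checks a version against an OSGi-style range such as "[0.1,0.2)":
    // '[' and ']' are inclusive, '(' and ')' exclusive; a bare "0.1" means "0.1 or later".
    static boolean includes(String range, String version) {
        range = range.trim();
        char open = range.charAt(0);
        if (open != '[' && open != '(') {
            return compare(version, range) >= 0;
        }
        char close = range.charAt(range.length() - 1);
        String[] bounds = range.substring(1, range.length() - 1).split(",");
        int low = compare(version, bounds[0]);
        int high = compare(version, bounds[1]);
        boolean aboveLow = open == '[' ? low >= 0 : low > 0;
        boolean belowHigh = close == ']' ? high <= 0 : high < 0;
        return aboveLow && belowHigh;
    }
}
```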


Related

How should I structure my maven projects of a grouped collection of modules?

The general architecture
We have an internal Java application (let's call it com.example.framework) acting as a kind of framework in the sense of being extensible through plugins. These plugins can serve various purposes. For example, there will be plugins supporting the different database providers, e.g., MysqlPlugin, OraclePlugin and MssqlPlugin. On the other hand, there might be support for exchange formats such as JSON or XML, etc.
Code splitting
The framework application is developed as a separate multi-module Java project with the parent group id com.example.framework, with the API/SPI as a distinct child module. The plugins therefore have this API module, called com.example.framework.api, as a dependency, which works perfectly fine. Ideally, each plugin will have its own artifact under a group called com.example.framework.plugins, so that only the plugins that are really needed get installed.
The problem to solve
To ease the developer experience, I would like to group plugins of similar functionality, which might even want to share a bit of code, together in one git project, while keeping some special ones alone. Now I wonder what the best way to structure this in Maven is.
Current idea
The best solution I could find is to also use the multi-module approach for grouped plugin projects, to get split artifacts while still being able to share code between two plugins. However, I am still confused about the groupId and version of the parent:
The naming convention of Maven suggests using a unique groupId for each project. This would mean introducing another level of naming, e.g. com.example.framework.plugins.sql.mysql, which would be inconvenient, since the name of a plugin would no longer be sufficient to derive the full module name (given the a-priori knowledge of the common package name com.example.framework.plugins). So I wonder whether the purpose of the convention is solely to avoid possible duplicates by design? Since I control the namespace and all plugins, I would make sure that there are no conflicts.
The actual question
If I were to remove the intermediate name layer and thus have multiple parent poms with the same groupId, what problems could arise? Since plugins would not even share versions, the parent has no real purpose and also no artifact on its own that could collide, or am I missing anything?
Or is my entire structure not ideal and I should adopt some other form? During my research I could not find any similar use-case.

Usually, different related projects share the same groupId. There is no problem in that. The linked Maven page is misleading.

Should you shade your dependencies?

For my job I use Spark every day. One of the problems comes from dependency conflicts. I can't help but think that they would all go away if people released their jars already shaded to their own namespace.
For internal jars, I'm considering doing this for all our dependencies. Other than a small bit of work, I'm seeing this as a good idea. Is there any drawbacks/risks I'm missing?

Some problems go away with shading, but new problems arise. One problem is that you take away the chance for your users to use a different (patched) version of a dependency than the version used in shading.
But the main risk of shading is that shaded classes end up exposed to clients.
So imagine you have two dependencies, a and b, each shading log4j. When you include a and b, you get the classes a.shaded.log4j.Logger (v1.3) and b.shaded.log4j.Logger (v1.4) on your compile/runtime classpath. And you may have your own log4j.Logger (v1.5).
Then you want to do something with all Loggers in your system at runtime, but suddenly you get many different logger classes and class versions at runtime.
So shading is only without risk when you can make sure that the clients will not ever see any instances of shaded classes via the API of your library. But this is very difficult to guarantee. Maybe with modules in Java9 this will be a little less problematic, but even then having just one known version of any class on the classpath is much easier to debug/manage than a wild mix of shaded classes with same names but different versions.
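Why the Logger example multiplies types is visible in the relocation step itself. Below is a hypothetical sketch of the class-name mapping a shading tool applies (the Maven Shade Plugin, for instance, is configured with a pattern and a shadedPattern per relocation). The real plugin also rewrites every bytecode reference inside the shaded jar, which is not modeled here, and org.apache.log4j is used only as an example package.

```java
final class Relocation {
    // Maps a class name the way a relocation rule does: if the name lies in the
    // `pattern` package (or a subpackage), that prefix is replaced by `shadedPattern`.
    static String relocate(String className, String pattern, String shadedPattern) {
        if (className.equals(pattern) || className.startsWith(pattern + ".")) {
            return shadedPattern + className.substring(pattern.length());
        }
        return className; // classes outside the relocated package keep their names
    }
}
```

Relocating org.apache.log4j.Logger with the shaded prefixes a.shaded.log4j and b.shaded.log4j yields two unrelated class names, so code written against org.apache.log4j.Logger can neither see nor configure the copies bundled inside a and b.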

Multi-component versioning/building best practices

I have a Java project, built with Maven, that aggregates several components, each one in its own Maven project. Any one of these components may evolve separately.
The structure of my project can be described as follows:
my-main-project that depends on:
my-component-1
my-component-2
etc.
Nowadays, all pom.xml are using "snapshot" versions, so, they are all using the "latest" version available in my repository.
But once I send a release version to my customer, I'm supposed to freeze the versions and make a TAG (or equivalent) in my source-control, so I can restore a previous state in case of maintenance.
So, my question is: should I change all pom.xml files before each release, give version numbers to the components, and tie everything together with these dependency versions? Also, if I have many components (my project currently has 30+ small subcomponents), would I have to renumber/reversion each one before each release? And when a single component evolves (due to a bug fix or enhancement), I must increase its version so that the changes do not affect pre-existing releases, right?
How do people using Maven generally handle this many-component versioning case?
Of course, I could just rely on my version-control tags to restore to a previous point-in-time and just tag every component on each release, but I don't like this approach, since the dependency versioning (with maven) gives me much more control and visibility about what is packaged, and relations of (broken-)compatibility and many more.

General Considerations
You may want to consider the relationships between your components.
Are they really independent of each other? Or is there some kind of relationship between them, such as common lifecycles?
If you find some relationship between them, consider using Maven multi-modules: http://www.sonatype.com/books/mvnex-book/reference/multimodule.html. In a few words, you will have a parent with one version, and several modules (several jars, somewhat like Spring and its submodules). This will reduce the amount of version management you have to do.
You may consider using maven-release-plugin. It will help you tag, build and deploy your modules automatically, making versioning and the links with your SCM and repository easier to deal with.
Combined with multi-modules, it would help you drastically!
There are a lot of topics dealing with this on Stack Overflow.
I don't know if you already know that. I could explain it a lot further if you want, but you may have enough elements to search by yourself if you don't.
Straight Answers
So, my question is: should I change all pom.xml files before each release, give version numbers to the components, and tie everything with this dependency versions?
Yes, you should. In Application Lifecycle Management, tracking changes is REALLY important. So, as you can imagine, and as you point out, you really should build and tag each of your components. It could be painful, but maven-release-plugin and a multi-module setup (especially together with a Continuous Integration platform) make it easier.
would I have to renumber/reversion each one before each release?
For exactly the same reasons : yes !
must I increase its version so that the changes do not affect pre-existing releases, right?
Yes, you should too. Assuming you choose a common versioning scheme like MAJOR.minor.correction, the first number indicates compatibility breaks. A minor version should not bring breaks (although in practice some do). Corrections should NEVER affect compatibility.
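That convention is easy to encode, for example in a build check that flags risky upgrades. A minimal sketch, assuming strictly three numeric parts (VersionCompat and its methods are hypothetical names): it classifies a change as MAJOR, MINOR or CORRECTION, and treats only a MAJOR change as expected to break compatibility.

```java
final class VersionCompat {
    enum Change { NONE, CORRECTION, MINOR, MAJOR }

    // Parses "MAJOR.minor.correction" (e.g. "2.4.1") into three ints.
    static int[] parse(String version) {
        String[] parts = version.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected MAJOR.minor.correction: " + version);
        }
        return new int[] { Integer.parseInt(parts[0]), Integer.parseInt(parts[1]),
                Integer.parseInt(parts[2]) };
    }

    // Reports the most significant part that differs between the two versions.
    static Change classify(String from, String to) {
        int[] a = parse(from);
        int[] b = parse(to);
        if (a[0] != b[0]) return Change.MAJOR;
        if (a[1] != b[1]) return Change.MINOR;
        if (a[2] != b[2]) return Change.CORRECTION;
        return Change.NONE;
    }

    // Under the convention above, only a MAJOR change is expected to break compatibility.
    static boolean expectedCompatible(String from, String to) {
        return classify(from, to) != Change.MAJOR;
    }
}
```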
How people using maven generally handle this many-component versioning case?
I cannot answer for everyone, but my previous comments on the release plugin and multi-module builds are considered best practices. If you want to go a little further, you could consider a more powerful SCM (ClearCase, Perforce, ...), but Maven integration is weaker and less well documented, and the community provides fewer examples than for SVN or Git.

Maven Release Plugin
If you are using a multi-module pom.xml, you should be able to run mvn release:prepare -DautoVersionSubmodules=true followed by mvn release:perform, and have it do a "release" build of all your modules: it removes the -SNAPSHOT versions, tags the release and uploads the artifacts to your repository. That workflow is exactly what the release plugin exists for.
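What the prepare step does to the version numbers can be approximated as follows. This is a simplified, hypothetical model with made-up names: the real plugin also commits, tags, updates references between modules, and lets you override every proposed value. Its default proposals strip -SNAPSHOT for the release and bump the last number for the next development version.

```java
final class ReleaseVersions {
    private static final String SNAPSHOT = "-SNAPSHOT";

    // Release version: the development version with "-SNAPSHOT" stripped.
    static String releaseVersion(String devVersion) {
        if (!devVersion.endsWith(SNAPSHOT)) {
            throw new IllegalArgumentException("not a snapshot: " + devVersion);
        }
        return devVersion.substring(0, devVersion.length() - SNAPSHOT.length());
    }

    // Next development version: increment the last numeric segment and re-add "-SNAPSHOT".
    static String nextDevelopmentVersion(String release) {
        int dot = release.lastIndexOf('.'); // -1 for single-number versions such as "3"
        String head = release.substring(0, dot + 1);
        int last = Integer.parseInt(release.substring(dot + 1));
        return head + (last + 1) + SNAPSHOT;
    }
}
```

With autoVersionSubmodules set, the versions chosen for the parent are applied to every module instead of prompting for each one, which is what keeps 30+ components from being renumbered one by one.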

Package vs project separation in java

First off, I'm coming (back) to Java from C#, so apologies if my terminology or philosophy doesn't quite line up.
Here's the background: we've got a growing collection of internal support tools written for the web. They use HTML5/AJAX/other buzzwords for the frontend and Java for the backend. These tools utilize a lightweight in-house framework so they can share an administrative interface for security and other configuration. Each tool has been written by a separate author and I expect that trend to continue, so I'd like to make it easy for future authors to stay "standardized" on the third-party libraries that we've already decided to use for things like DI, unit testing, ORM, etc.
Our package naming currently looks like this:
com.ourcompany.tools.framework
com.ourcompany.tools.apps.app1name
com.ourcompany.tools.apps.app2name
...and so on.
So here's my question: should each of these apps (and the framework) be treated as a separate project for purposes of Maven setup, Eclipse, etc?
We could have lots of apps appear here over time, so it seems like separation would keep dependencies cleaner and let someone jump in on a single tool more easily. On the other hand, (1) maybe "splitting" deeper portions of a package structure over multiple projects is a code smell and (2) keeping them combined would make tool writers more inclined to use third-party libraries already in place for the other tools.
FWIW, my initial instinct is to separate them.
What say you, Java gurus?

I would absolutely separate them. For the purposes of Maven, make sure each app/project has the appropriate dependencies to the framework/apps so you don't have to build everything when you just want to build a single app.

I keep my projects separated out, but use a parent pom for including all of the dependencies and other common properties. Individual tools / projects have a name and a reference to the parent project, and any project-specific dependencies, if any. This works for helping to keep to common libraries and dependencies, since the common ones are already all configured, but allows me to focus on the specific portion of the codebase that I need to work with.

I'd definitely separate these kind of things out into separate projects.
You should use Maven to handle the dependencies / build process automatically (both for your own internal shared libraries and third party dependencies). There won't be any issue having multiple applications reference the same shared libraries - you can even keep multiple versions around if you need to.
Couple of bonuses from this approach:
This forces you to think carefully about your API design for the shared projects which will be a good thing in the long run.
It will probably also give you about the right granularity for source code control - i.e. your developers can check out and work on specific applications or backend modules individually

If there is a section of a project that is likely to be used on more than one project it makes sense to pull that out. It will make it a little cleaner as well if you need to update the code in one of the commonly used projects.

If you keep them together you will have fewer obstacles developing, building and deploying your tools.
We had the opposite situation, having many separate projects. After merging them into one project tree we are much more productive and this is more important to us than whatever conventions happen to be trending.

How to modularize a (large) Java App?

I have a rather large (several MLOC) application at hand that I'd like to split up into more maintainable separate parts. Currently the product is comprised of about 40 Eclipse projects, many of them having inter-dependencies. This alone makes a continuous build system unfeasible, because it would have to rebuild very much with each checkin.
Is there a "best practice" way of how to
identify parts that can immediately be separated
document inter-dependencies visually
untangle the existing code
handle "patches" we need to apply to libraries (currently handled by putting them in the classpath before the actual library)
If there are (free/open) tools to support this, I'd appreciate pointers.
Even though I do not have any experience with Maven, it seems like it forces a very modular design. I wonder now whether this is something that can be retrofitted iteratively, or whether a project that is to use it would have to be laid out with modularity in mind right from the start.
Edit 2009-07-10
We are in the process of splitting out some core modules using Apache Ant/Ivy. Really helpful and well designed tool, not imposing as much on you as maven does.
I wrote down some more general details and personal opinion about why we are doing that on my blog - too long to post here and maybe not interesting to everyone, so follow at your own discretion: www.danielschneller.com

Using OSGi could be a good fit for you. It would allow you to create modules out of the application, and you can also organize dependencies in a better way. If you define the interfaces between the different modules correctly, then you can use continuous integration, as you only have to rebuild the module that a check-in affected.
The mechanisms provided by OSGi will help you untangle the existing code. Because of the way the classloading works, it also helps you handle the patches in an easier way.
Some concepts of OSGi that seem to be a good match for you, as described on Wikipedia:
The framework is conceptually divided into the following areas:
Bundles - Bundles are normal jar components with extra manifest headers.
Services - The services layer connects bundles in a dynamic way by offering a publish-find-bind model for plain old Java objects (POJO).
Services Registry - The API for management services (ServiceRegistration, ServiceTracker and ServiceReference).
Life-Cycle - The API for life cycle management (install, start, stop, update, and uninstall bundles).
Modules - The layer that defines encapsulation and declaration of dependencies (how a bundle can import and export code).
Security - The layer that handles the security aspects by limiting bundle functionality to pre-defined capabilities.
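The publish-find-bind idea from the Services layer is what lets bundles depend on an interface rather than on each other's classes. The following is a toy, in-process illustration of that idea only; it is NOT the OSGi API (real bundles use BundleContext.registerService and ServiceTracker), and ToyServiceRegistry is a made-up name.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Toy publish-find-bind registry keyed by interface type; not the OSGi API.
final class ToyServiceRegistry {
    private final Map<Class<?>, List<Object>> services = new ConcurrentHashMap<>();

    // Publish: a provider registers an implementation under the interface it offers.
    <T> void register(Class<T> type, T implementation) {
        services.computeIfAbsent(type, k -> new CopyOnWriteArrayList<>()).add(implementation);
    }

    // Find and bind: a consumer asks for the interface and never names the implementing class.
    <T> Optional<T> find(Class<T> type) {
        return services.getOrDefault(type, List.of()).stream().findFirst().map(type::cast);
    }

    // Services can come and go at runtime, so consumers must cope with a missing service.
    <T> void unregister(Class<T> type, T implementation) {
        List<Object> candidates = services.get(type);
        if (candidates != null) {
            candidates.remove(implementation);
        }
    }
}
```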

First: good luck & good coffee. You'll need both.
I once had a similar problem. Legacy code with awful circular dependencies, even between classes from different packages, like org.example.pkg1.A depending on org.example.pkg2.B and vice versa.
I started with maven2 and fresh eclipse projects. First I tried to identify the most common functionalities (logging layer, common interfaces, common services) and created maven projects. Each time I was happy with a part, I deployed the library to the central nexus repository so that it was almost immediately available for other projects.
So I slowly worked up through the layers. maven2 handled the dependencies and the m2eclipse plugin provided a helpful dependency view. BTW - it's usually not too difficult to convert an eclipse project into a maven project. m2eclipse can do it for you and you just have to create a few new folders (like src/main/java) and adjust the build path for source folders. Takes just a minute or two. But expect more difficulties, if your project is an eclipse plugin or rcp application and you want maven not only to manage artifacts but also to build and deploy the application.
In my opinion, Eclipse, Maven and Nexus (or any other Maven repository manager) are a good basis to start with. You're lucky if you have good documentation of the system architecture and this architecture is really implemented ;)

I had a similar experience in a small code base (40 kloc). There are no "rules":
compiled with and without a "module" in order to see its usage
I started from "leaf modules", modules without other dependencies
I handled cyclic dependencies (this is a very error-prone task)
with Maven there is a great deal of documentation (reports) that can be deployed in your CI process
with Maven you can always see what uses what, both in the site and in NetBeans (with a very nice directed graph)
with Maven you can import library code into your codebase, apply source patches and compile it with your products (sometimes this is very easy, sometimes it is very difficult)
Check also Dependency Analyzer: [screenshot; source: javalobby.org]
NetBeans: [screenshot of the dependency graph; source: zimmer428.net]
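Starting from leaf modules and then dealing with cycles is, in effect, a topological sort of the module graph. A minimal sketch using Kahn's algorithm, assuming the module-to-module dependencies have already been extracted into a map (BuildOrder and the module names in the example are made up). It returns an order in which every module comes after its dependencies, and reports the modules that can never be built because they sit in, or depend on, a cycle.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class BuildOrder {
    // dependsOn maps each module to the modules it needs. Returns a leaf-first order
    // (Kahn's algorithm) or throws, naming the modules that are stuck because of a cycle.
    static List<String> leafFirst(Map<String, Set<String>> dependsOn) {
        Map<String, Integer> pending = new HashMap<>();     // unbuilt dependencies per module
        Map<String, List<String>> usedBy = new HashMap<>(); // reverse edges
        for (Map.Entry<String, Set<String>> entry : dependsOn.entrySet()) {
            String module = entry.getKey();
            pending.merge(module, entry.getValue().size(), Integer::sum);
            for (String dependency : entry.getValue()) {
                pending.putIfAbsent(dependency, 0); // pure leaves may appear only as values
                usedBy.computeIfAbsent(dependency, k -> new ArrayList<>()).add(module);
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        for (String module : new TreeSet<>(pending.keySet())) { // sorted for a stable start
            if (pending.get(module) == 0) {
                ready.add(module);
            }
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String module = ready.poll();
            order.add(module);
            for (String user : usedBy.getOrDefault(module, List.of())) {
                if (pending.merge(user, -1, Integer::sum) == 0) {
                    ready.add(user); // all of its dependencies are now built
                }
            }
        }
        if (order.size() < pending.size()) {
            Set<String> stuck = new TreeSet<>(pending.keySet());
            stuck.removeAll(order);
            throw new IllegalStateException("modules in or behind a cycle: " + stuck);
        }
        return order;
    }
}
```

The modules listed in the exception are the ones to untangle first; everything returned before that point can already be split out, leaves first.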

Maven is painful to migrate to for an existing system. However it can cope with 100+ module projects without much difficulty.

The first thing you need to decide is what infrastructure you will move to. Should it be a lot of independently maintained modules (which translates to individual Eclipse projects), or will you consider it a single chunk of code which is versioned and deployed as a whole? The first is well suited for migrating to a Maven-like build environment, the latter for having all the source code in at once.
In any case you WILL need a continuous integration system running. Your first task is to make the code base build automatically, so you can let your CI system watch over your source repository and rebuild it when you change things. I decided on a non-Maven approach here, and we focus on having an easy Eclipse environment, so I created a build environment using ant4eclipse and Team ProjectSet files (which we use anyway).
The next step would be getting rid of the circular dependencies. This will make your build simpler, get rid of Eclipse warnings, and eventually allow you to get to the "checkout, compile once, run" stage. This might take a while :-( When you migrate methods and classes, do not MOVE them; instead extract or delegate them, leave their old names in place, and mark them deprecated. This separates your untangling from your refactoring, and allows code "outside" your project to keep working with the code inside your project.
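The "extract or delegate, don't move" advice looks like this in code. A minimal sketch with made-up class names: the logic moves to a new class (for example in a freshly split-out module), while the old method keeps its signature, is marked deprecated, and simply forwards the call, so existing callers keep compiling while they migrate.

```java
import java.util.Locale;

// New home of the logic, e.g. in a newly extracted "common" module (hypothetical name).
final class Skus {
    static String normalize(String raw) {
        return raw.trim().toUpperCase(Locale.ROOT);
    }
}

// Old location: the method keeps its signature so existing callers still compile,
// but it only forwards to the new code and is marked deprecated.
final class LegacyOrderUtils {
    /** @deprecated Use {@link Skus#normalize(String)} instead. */
    @Deprecated
    static String normalizeSku(String raw) {
        return Skus.normalize(raw);
    }
}
```

Once the compiler's deprecation warnings show no remaining callers, the old method can be deleted as a separate, trivial change.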
You WILL benefit from a source repository which allows for moving files, and keeping history. CVS is very weak in this regard.

I wouldn't recommend Maven for a legacy source code base. It could give you many headaches just trying to adapt everything to work with it.
I suppose what you need is to do an architectural layout of your project. A tool might help, but the most important part is to organize a logical view of the modules.

It's not free but Structure101 will give you as good as you will get in terms of tool support for hitting all your bullet points. But for the record I'm biased, so you might want to check out SonarJ and Lattix too. ;-)

