Should you shade your dependencies? - java

For my job I use Spark every day. One of the problems I hit comes from dependency conflicts, and I can't help but think that they would all go away if people released their jars already shaded into their own namespace.
For internal jars, I'm considering doing this for all our dependencies. Apart from a small amount of work, I see this as a good idea. Are there any drawbacks or risks I'm missing?

Some problems go away with shading, but new problems arise. One problem is that you take away the chance for your users to use a different (patched) version of a dependency than the version used in shading.
But the main risk of shading is that shaded classes end up exposed to clients.
So imagine you have two dependencies a and b, each shading log4j. When you include a and b, you get the classes a.shaded.log4j.Logger (v1.3) and b.shaded.log4j.Logger (v1.4) on your compile/runtime classpath. And you may have your own log4j.Logger (v1.5).
Then you want to do something with all Loggers in your system at runtime, but suddenly you are faced with many different logger classes and class versions.
So shading is only without risk when you can make sure that clients will never see any instances of shaded classes via the API of your library. But this is very difficult to guarantee. Maybe with modules in Java 9 this will be a little less problematic, but even then, having just one known version of any class on the classpath is much easier to debug and manage than a wild mix of shaded classes with the same names but different versions.
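A minimal sketch of why this hurts (the two shaded Logger classes are hypothetical stand-ins, defined locally so the snippet is self-contained): to the JVM, two relocated copies of the same class are completely unrelated types, so code written against one copy cannot handle instances of the other.

```java
public class ShadingDemo {
    // Stand-ins for a.shaded.log4j.Logger and b.shaded.log4j.Logger:
    // same original source, but after relocation they are distinct classes.
    static class AShadedLogger { }
    static class BShadedLogger { }

    static boolean isInterchangeable() {
        Object fromA = new AShadedLogger();
        // false: the JVM sees two unrelated types, so code expecting
        // "the" Logger class cannot accept instances of the other copy
        return fromA instanceof BShadedLogger;
    }

    public static void main(String[] args) {
        System.out.println(isInterchangeable()); // prints false
    }
}
```

The same applies between a shaded copy and your own unshaded log4j: any API of a library that exposes its shaded classes forces this mismatch onto its users.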

Related

Lightweight way to use a Maven dependency?

What do you normally do when you want to make use of some utility that is part of some Java library? I mean, if you add it as a Maven dependency, does the whole library get included in the assembled JAR?
I don't have storage problems of any sort and I don't mind the JAR getting bloated, but I'm just curious whether there's some strategy to reduce the JAR size in this case.
Thanks
It is even worse: not only do you add the "whole library" to your project, but also its dependencies, whether you need them or not.
Joachim Sauer is right in saying that you do not bundle dependencies into your artifact unless you want it to be runnable. But IMHO this just moves the problem to a different point. Eventually, you want to run the stuff. At some point, a runnable JAR, a WAR or an EAR is built and it will incorporate the whole dependency tree (minus the fact that you get only one version per artifact).
The Maven shade plugin can help you to minimise your jar (see Minimize an Uber Jar correctly, Using Shade-Plugin) by only adding the "necessary" classes. But this is of course tricky in general:
Classes referenced by non-Java elements are not found.
Classes used by reflection are not found.
If you have some dependency injection going on, you only use the interfaces in your code. So the implementations might get kicked out.
So your minimized JAR might in general just be too small.
As you see I don't know any general solution for this.
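In practice the usual workaround is to combine minimizeJar with explicit filters that keep back what the dependency analysis misses (reflection, DI implementations). A sketch of such a shade-plugin configuration, with hypothetical artifact coordinates:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <minimizeJar>true</minimizeJar>
    <filters>
      <!-- keep a whole artifact that is only reached via reflection/DI -->
      <filter>
        <artifact>com.example:di-implementations</artifact>
        <includes>
          <include>**</include>
        </includes>
      </filter>
    </filters>
  </configuration>
</plugin>
```

You still have to discover the missing classes by testing, which is why this remains a per-project effort rather than a general solution.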
What are sensible approaches?
Don't build JARs that have five purposes, but build JARs that are small. To avoid a huge number of different build processes, use multi-module projects.
Don't add libraries as dependencies unless you really need them. It is better to duplicate a method that converts Strings than to add a whole String library just for that purpose.

How do big companies tackle the package dependency conflict problem?

Just as shown in the picture, one app (Java) references two third-party jars (packageA and packageB), which depend on packageC-0.1 and packageC-0.2 respectively. Everything would work well if packageC-0.2 were compatible with packageC-0.1. However, sometimes packageA uses something that is not supported in packageC-0.2, and Maven will resolve only one version of the jar. This issue is also known as "Jar Hell".
It would be difficult in practice to rewrite package A or force its developers to update packageC to 0.2.
How do you tackle these problems? They often happen in large-scale companies.
I should point out that this problem mostly occurs in BIG companies, because a big company has many departments and it would be very expensive to have the whole company update a dependency every time certain developers need new features from a new version of some dependency jar. It is not a big deal in small companies.
Any response will be highly appreciated.
Let me throw out a brick to attract jade, i.e. offer a rough answer first in the hope of better ones.
Alibaba is one of the largest e-commerce companies in the world, and we tackle these problems by creating an isolation container named Pandora. Its principle is simple: package those middlewares together and load them with different ClassLoaders, so that they can work together even when they reference the same packages in different versions. But this needs a runtime environment provided by Pandora, which runs as a Tomcat process. I have to admit that this is a heavyweight approach. Pandora is built on the fact that the JVM identifies a class by its class loader plus its class name.
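The class-loader isolation that such containers rely on can be sketched in plain Java (this is not Pandora's actual API, just an illustration of the underlying JVM behavior): a loader created with a null parent cannot see the application classpath, so each container gets its own class namespace, and the same class name can be loaded once per namespace.

```java
import java.net.URL;
import java.net.URLClassLoader;

public class IsolationDemo {

    // A URLClassLoader with a null parent sees only its own URLs (plus the
    // bootstrap classes), not the application classpath. Containers exploit
    // this to load the same class name in several isolated namespaces.
    static boolean isIsolated() {
        try (URLClassLoader isolated = new URLClassLoader(new URL[0], null)) {
            // Our own class is visible to the application loader only:
            isolated.loadClass("IsolationDemo");
            return false;
        } catch (ClassNotFoundException e) {
            return true; // the isolated loader cannot see application classes
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isIsolated()); // prints true
    }
}
```

In a real container, each middleware's jars would be the URLs of its own loader, so two versions of packageC can coexist, one per loader.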
If you know someone who may know the answer, please share this link with them.
We are a large company and we have this problem a lot. We have large dependency trees that span several developer groups. What we do:
We manage versions via BOMs (Maven dependencyManagement lists) of "recommended versions" that are published by the maintainers of the jars. This way, we make sure that recent versions of the artifacts are used.
We try to reduce the large dependency trees by separating the functionality that is used inside a developer group from the one that they offer to other groups.
But I admit that we are still trying to find better strategies. Let me also mention that using "microservices" is a strategy against this problem, but in many cases it is not a valid strategy for us (mainly because we could not have global transactions on databases any more).
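For illustration, consuming such a "recommended versions" BOM looks like this in a project's POM (the coordinates are hypothetical):

```xml
<dependencyManagement>
  <dependencies>
    <!-- import the maintainers' "recommended versions" BOM -->
    <dependency>
      <groupId>com.example</groupId>
      <artifactId>recommended-versions-bom</artifactId>
      <version>2.4.0</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>
```

Dependencies that projects then declare without a version resolve to whatever the BOM recommends, which keeps the versions aligned across groups.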
This is a common problem in the java world.
Your best options are to regularly maintain and update dependencies of both packageA and packageB.
If you have control over those applications - make time to do it. If you don't have control, demand that the vendor or author make regular updates.
If both packageA and packageB are used internally, you can use the following practice: have all internal projects in your company refer to a parent in the Maven pom.xml that defines "up to date" versions of commonly used third-party libraries.
For example:
<framework.jersey>2.27</framework.jersey>
<framework.spring>4.3.18.RELEASE</framework.spring>
<framework.spring.security>4.2.7.RELEASE</framework.spring.security>
Therefore, if your projects "A" and "B" both use Spring and both use the latest version of your company's "parent" pom, they should both get 4.3.18.RELEASE.
When a new version of spring is released and desirable, you update your company's parent pom, and force all other projects to use that latest version.
This will solve many of these dependency mismatch issues.
Don't worry, it's common in the Java world; you're not alone. Just google "jar hell" to understand the issue in its broader context.
By the way mvn dependency:tree is your friend for isolating these dependency problems.
I agree with the answer of @JF Meier. In a Maven multi-module project, a dependencyManagement node is usually defined in the parent POM to manage versions in one place. The dependencies declared inside it define the unified versions; modules that then declare one of those dependencies directly do not need to specify a version. It looks like this:
In the parent POM:
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.devzuz.mvnbook.proficio</groupId>
      <artifactId>proficio-model</artifactId>
      <version>${project.version}</version>
    </dependency>
  </dependencies>
</dependencyManagement>
In your module, you do not need to set the version:
<dependencies>
  <dependency>
    <groupId>com.devzuz.mvnbook.proficio</groupId>
    <artifactId>proficio-model</artifactId>
  </dependency>
</dependencies>
This avoids inconsistent versions.
This question can't be answered in general.
In the past we usually just didn't use dependencies in different versions. If a version was changed, team- or company-wide refactoring was necessary. I doubt two versions at once are possible with most build tools.
But to answer your question..
Simple answer: Don't use two versions of one dependency within one compilation unit (usually a module)
But if you really have to do this, you could write a wrapper module that references the legacy version of the library.
But my personal opinion is that within one module there should be no need for such constructs, because "one module" should be small enough to be manageable. Otherwise it might be a strong indicator that the project could use some modularization refactoring. However, I know very well that some projects in "large-scale companies" can be a huge mess where no good option is available. I guess you are talking about a situation where packageA is owned by a different team than packageB... and this is generally a very bad design decision, due to the lack of separation and the inherent dependency problems.
First of all, try to avoid the problem. As mentioned in @Henry's comment, don't use third-party libraries for trivial tasks.
However, we all use libraries. And sometimes we end up with the problem you describe, where we need two different versions of the same library. If library 'C' has removed and added some APIs between the two versions, and the removed APIs are needed by 'A', while 'B' needs the new ones, you have an issue.
In my company, we run our Java code inside an OSGi container. Using OSGi, you can modularize your code into "bundles", which are jar files with some special directives in their manifest file. Each bundle jar has its own classloader, so two bundles can use different versions of the same library. In your example, you could split the application code that uses packageA into one bundle, and the code that uses packageB into another. The two bundles can call each other's APIs, and it will all work fine as long as your bundles do not use packageC classes in the signatures of the methods used by the other bundle (known as API leakage).
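For illustration, the manifest of such a bundle might look like this (the headers are standard OSGi; the names and version range are hypothetical). The Import-Package version range is what lets each bundle bind to its own version of packageC:

```
Bundle-ManifestVersion: 2
Bundle-SymbolicName: com.example.uses-package-a
Bundle-Version: 1.0.0
Export-Package: com.example.a.api
Import-Package: com.example.packagec;version="[0.1,0.2)"
```

The other bundle would import com.example.packagec with a range such as "[0.2,0.3)", and the OSGi framework wires each one to a matching provider.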
To get started with OSGi, you can e.g. take a look at OSGi enRoute.

How to avoid heavily changing Maven projects when new versions come out

Specific Background
I have just switched from spring data neo4j 4.1.3 to 5.0.0
And this issue has arisen since I changed my pom file.
Maven install fails because "cannot find symbol ... class GraphRepository"
I am newer to Java Maven projects as a whole
Broad Question:
If I update the Maven dependencies in a given project from one version of something to another, and a class that I have been using heavily now produces around 100 errors saying the whole class is missing... how do I keep this from happening?
Specific Where I think I'm at
I am gonna have to remove every reference to the "GraphRepository" and change it to Neo4jRepository since "Also note that GraphRepository is deprecated and replaced by Neo4jRepository" - Neo4j 4.2 graph repo save method is now ambiguous
But, this just doesn't seem right. Do I really have to go through an entire project and change all that code just to update?
One full line of error:
[ERROR] /.../service/SupportModelServiceImpl.java:[10,49] cannot find symbol
symbol: class GraphRepository
location: package org.springframework.data.neo4j.repository
You cannot prevent external dependencies from introducing breaking changes. However you could write your code so that it takes minimal effort to update external dependencies.
I have observed that in practice not much care is given to dependencies, as if they were free. Initially they are as good as free, but once you start stacking dependencies and have transitive dependencies that conflict, or you upgrade to a new version with breaking changes, a maintenance cost appears. I have seen projects where the web of dependencies is so complex that they should be rewritten completely from scratch; this does not happen only because management does not understand the concept of technical debt, living in the illusion that maintaining an existing (bad) version of the software is cheaper than writing a new one.
The only option you have to guard against external dependencies is to encapsulate them in one way or another. This may involve some boilerplate code, though if that boilerplate is minimal it may well be worth the effort.
Because I have seen projects with horrible dependencies, I have given it some thought how I could prevent such a dependency mess and made the following image:
External code, over which you have no control, is in red. If you do not think about structuring your code, your code (in orange) will depend directly on the external code and is at risk from external changes. You can instead try to write code (in green) that has no dependencies on external code. The way you achieve this is to define the external functionality that you need in your own interfaces. You then have some code (in orange) that implements these interfaces and has external dependencies. You inject the code with external dependencies through a dependency injection framework.
This approach limits the impact of external changes to only the code in orange. However it requires more planning than directly using dependencies everywhere in your code. And because more planning means more effort, it is often not put in practice.
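The layering described above can be sketched as follows (all names are hypothetical): application code depends only on an interface you own, and a thin adapter is the single place that touches the external library, so a breaking change in the library is confined to the adapter.

```java
public class EncapsulationDemo {

    // Your own interface (green): the only thing application code sees.
    interface JsonWriter {
        String write(Object value);
    }

    // Stand-in for an external library class you do not control (red).
    static class ExternalJsonLib {
        String serialize(Object value) {
            return "{\"v\":\"" + value + "\"}";
        }
    }

    // Adapter (orange): the single place that touches the external API.
    static class ExternalJsonWriterAdapter implements JsonWriter {
        private final ExternalJsonLib lib = new ExternalJsonLib();
        @Override public String write(Object value) {
            return lib.serialize(value);
        }
    }

    // Application code: unaware of the external library; the adapter is
    // injected, e.g. by a dependency injection framework.
    static String useLibrary(JsonWriter writer) {
        return writer.write("hello");
    }

    public static void main(String[] args) {
        System.out.println(useLibrary(new ExternalJsonWriterAdapter()));
    }
}
```

If the external library's API changes, only ExternalJsonWriterAdapter needs to be rewritten; everything coded against JsonWriter stays untouched.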
This is not specific to maven, you will have this issue regardless of whatever build system you use.
But I do not understand why you would want this: a major version change (e.g. 4.x to 5.x) means something is going to break, and you will have to make changes to your code.

Multi-component versioning/building best practices

I have a Java project, built with Maven, that aggregates several components, each one in its own Maven project. Any one of these components may evolve separately.
The structure of my project can be described as follows:
my-main-project that depends on:
my-component-1
my-component-2
etc.
Nowadays, all pom.xml are using "snapshot" versions, so, they are all using the "latest" version available in my repository.
But once I send a release version to my customer, I'm supposed to freeze the versions and make a TAG (or equivalent) in my source-control, so I can restore a previous state in case of maintenance.
So, my question is: should I change all pom.xml files before each release, give version numbers to the components, and tie everything together with these dependency versions? Also, if I have many components (my project currently has 30+ small subcomponents), would I have to renumber/re-version each one before each release? When a single component evolves (due to a bug fix or enhancement), I must increase its version so that the changes do not affect pre-existing releases, right?
How people using maven generally handle this many-component versioning case?
Of course, I could just rely on my version-control tags to restore to a previous point in time and tag every component on each release, but I don't like this approach, since dependency versioning (with Maven) gives me much more control and visibility over what is packaged, over (broken) compatibility relations, and more.
General Considerations
You may consider the relations between your components.
Are they really independent (each one vs. each other)? Or are there some kinds of relations... some common lifecycles?
If you find some relationship between them, consider using Maven multi-modules: http://www.sonatype.com/books/mvnex-book/reference/multimodule.html. In a few words, you will have a parent with one version and some modules (some jars, much like Spring and its submodules). This will help you reduce version management.
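A minimal multi-module parent for the project structure described in the question might be sketched like this (coordinates are hypothetical); a single version at the parent drives all modules:

```xml
<project>
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.example</groupId>
  <artifactId>my-main-project</artifactId>
  <version>1.2.0</version>
  <packaging>pom</packaging>
  <!-- each module is built and released with the parent's version -->
  <modules>
    <module>my-component-1</module>
    <module>my-component-2</module>
  </modules>
</project>
```

Releasing then means bumping one version instead of 30+, at the cost of versioning the modules together rather than individually.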
You may consider using the maven-release-plugin. It will help you tag, build and deploy your modules automatically, dealing more easily with versioning and the links to your SCM and repository.
Moreover, combined with multi-modules, it will drastically help you!
There is a lot of topic dealing with this on Stack Overflow.
I don't know if you already know that. I could explain it a lot further if you want, but you may have enough elements to search by yourself if you don't.
Straight Answers
So, my question is: should I change all pom.xml files before each release, give version numbers to the components, and tie everything with this dependency versions?
Yes, you should. In Application Lifecycle Management, following the changes is REALLY important. So, as you can imagine, and as you point out, you really should build and tag each of your components. It can be painful, but with the maven-release-plugin and multi-modules (even with a Continuous Integration platform) it becomes easier.
would I have to renumber/reversion each one before each release?
For exactly the same reasons : yes !
must I increase its version so that the changes do not affect pre-existing releases, right?
Yes, you should, too. Assuming you choose a common versioning scheme like MAJOR.minor.correction, the first number indicates compatibility breaks. Minor versions may bring some breaks, but should not. Corrections should NEVER affect compatibility.
How people using maven generally handle this many-component versioning case?
I cannot reply for everyone, but my previous comments on the release plugin and multi-modules are considered best practices. If you want to go a little further, you could imagine using more powerful SCMs (ClearCase, Perforce, ...), but their Maven integration is weaker, less well documented, and the community provides fewer examples than for SVN or Git.
Maven Release Plugin
If you are using a multi-module pom.xml you should be able to run mvn release:prepare release:perform -DautoVersionSubmodules=true and have it do a "release" build of all your modules, remove the -SNAPSHOT versions, and upload them to your repository. That is what the release plugin and its workflow exist to do.

Dynamic Dependencies

We have a situation where some of our dependencies have conflicting dependencies.
We depend on A & B.
A depends on version a of X.
B depends on version b of X.
Are there any dependency management tools that handle this type of situation? I feel as if I have heard about some dependency management tool that dynamically loaded dependencies, and it seemed to avoid ever running into a situation like the above. I think you could somehow specify which version of X to load at a given instant.
Is it possible to do something like that? Is there any way in the code you can load and unload the dependency on a need basis?
I have forgotten most of compiler theory. And I haven't dealt much with dependency management. So excuse any ignorance showing through. It's probably genuine!
You can use OSGi or some other framework that manages multiple class-loaders so that the conflicting versions don't end up in the same class loader.
You can try to do the same thing yourself on a small scale by creating class loaders.
You can use the maven-shade-plugin to rename the packages in one or more copies to avoid the conflict.
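A minimal relocation sketch with the maven-shade-plugin (the package names are hypothetical); the relocated copy of X then lives under its own name and no longer conflicts with the other version:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <!-- move version a of X into a private namespace -->
          <relocation>
            <pattern>com.example.x</pattern>
            <shadedPattern>myapp.shaded.com.example.x</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

As discussed above, this is only safe if the relocated classes never leak through your public API.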
