How to clean old dependencies from maven repositories?


I have too many files in the .m2 folder where Maven stores downloaded dependencies. Is there a way to clean out all old dependencies? For example, if there is a dependency with 3 different versions (1, 2 and 3), after cleaning only version 3 should remain. How can I do this for all dependencies in the .m2 folder?

If you are on Unix, you could use the access time of the files in there. Just enable access time for your filesystem, then run a clean build of all your projects you would like to keep dependencies for and then do something like this (UNTESTED!):
find ~/.m2 -amin +5 -iname '*.pom' | while read pom; do parent=`dirname "$pom"`; rm -Rf "$parent"; done
This will find all *.pom files which have last been accessed more than 5 minutes ago (assuming you started your builds max 5 minutes ago) and delete their directories.
Add "echo " before the rm to do a 'dry-run'.
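The same access-time idea can also be sketched in Python, which makes the dry run explicit. This is a minimal sketch, not a tested tool: it assumes the repository is at ~/.m2/repository and that your filesystem really records access times. The function names and the 5-minute default are illustrative choices that mirror the find command above.

```python
import shutil
import time
from pathlib import Path
from typing import List


def stale_artifact_dirs(repo: Path, max_age_minutes: float = 5.0) -> List[Path]:
    """Return the directories of *.pom files not accessed within max_age_minutes.

    Mirrors `find ~/.m2 -amin +5 -iname '*.pom'`: each POM's parent directory
    holds one version of one artifact.
    """
    cutoff = time.time() - max_age_minutes * 60
    stale = set()
    for pom in repo.rglob("*"):
        if pom.is_file() and pom.suffix.lower() == ".pom" and pom.stat().st_atime < cutoff:
            stale.add(pom.parent)
    return sorted(stale)


def clean(repo: Path, max_age_minutes: float = 5.0, dry_run: bool = True) -> List[Path]:
    """Delete the stale version directories, or only print them when dry_run is True."""
    dirs = stale_artifact_dirs(repo, max_age_minutes)
    for directory in dirs:
        if dry_run:
            print("would remove", directory)
        else:
            shutil.rmtree(directory, ignore_errors=True)
    return dirs


if __name__ == "__main__":
    clean(Path.home() / ".m2" / "repository", dry_run=True)
```

With dry_run=True it only prints the directories it would delete, which serves the same purpose as putting echo before rm.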

Short answer -
Delete the .m2 folder in {user.home}. For example, on Windows 10 the user home is C:\Users\user1. Rebuild your project using mvn clean package. Only the dependencies required by the projects you rebuild will be downloaded again.
Long answer -
The .m2 folder is just a normal folder, and its content is built up from different projects. I don't think there is a way to figure out automatically which library is "old". In fact, "old" is a vague word: there can be many reasons why a project uses a previous version of a library, so it is not possible to determine which one is unused.
All you can do is delete the .m2 folder and rebuild all of your projects; the folder will then be repopulated automatically with all the required libraries.
If you are concerned that only one particular version of a library should be used across all projects, it is important to update each project's POM to that version: if different POMs refer to different versions of the library, all of them will be downloaded into .m2.

Given a POM file for a Maven project, you can remove all of its dependencies from the local repository (by default ~/.m2/repository) using the Apache Maven Dependency Plugin.
It includes the dependency:purge-local-repository goal, which removes the project's dependencies from the local repository and optionally re-resolves them.
To only clean the local dependencies, you just have to use the optional parameter reResolve and set it to false, since it is true by default.
This command line call should work:
mvn dependency:purge-local-repository -DreResolve=false
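For reference, the local repository is laid out by coordinates: the groupId's dots become path segments, followed by the artifactId and the version. The hypothetical helper below (the name local_repo_dir is ours, not the plugin's) only illustrates that layout, which is handy for checking by hand which directories a purge has removed.

```python
from pathlib import Path


def local_repo_dir(repo: Path, coordinate: str) -> Path:
    """Map 'groupId:artifactId:version' to its directory in a Maven local repository.

    Dots in the groupId become directory separators, e.g.
    org.apache.commons:commons-lang3:3.12.0 -> org/apache/commons/commons-lang3/3.12.0
    """
    group_id, artifact_id, version = coordinate.split(":")
    return repo.joinpath(*group_id.split("."), artifact_id, version)


if __name__ == "__main__":
    repo = Path.home() / ".m2" / "repository"
    path = local_repo_dir(repo, "org.projectlombok:lombok:1.18.6")
    print(path, "exists" if path.is_dir() else "missing")
```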

Download all actual dependencies of your projects
find your-projects-dir -name pom.xml -exec mvn -f '{}' dependency:resolve \;
Move your local maven repository to temporary location
mv ~/.m2 ~/saved-m2
From inside ~/saved-m2, rename all maven-metadata-central.xml* files in the saved repository to maven-metadata.xml* (this uses the Perl-based rename utility):
find . -type f -name "maven-metadata-central.xml*" -exec rename -v -- 's/-central//' '{}' \;
To set up the modified copy of the local repository as a mirror, create the directory ~/.m2 and the file ~/.m2/settings.xml with the following content (replacing user with your username; the artifacts now live in the repository subdirectory of ~/saved-m2):
<settings>
  <mirrors>
    <mirror>
      <id>mycentral</id>
      <name>My Central</name>
      <url>file:/home/user/saved-m2/repository/</url>
      <mirrorOf>central</mirrorOf>
    </mirror>
  </mirrors>
</settings>
Resolve your projects dependencies again:
find your-projects-dir -name pom.xml -exec mvn -f '{}' dependency:resolve \;
Now your local Maven repository contains only the minimal set of necessary artifacts. Remove the local mirror from the config file and from the file system.
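The rename command used above is the Perl-based rename, which is not available on every system. A portable Python sketch of the same renaming step follows. It assumes it is pointed at the saved copy (~/saved-m2 here), and unlike the Perl expression it only rewrites the file name, never a directory name. The function name is illustrative.

```python
from pathlib import Path
from typing import List


def strip_central_suffix(root: Path) -> List[Path]:
    """Rename every maven-metadata-central.xml* file under root to maven-metadata.xml*.

    Returns the new paths. Only the file name is changed, so a directory that
    happens to contain "-central" in its name is left untouched.
    """
    renamed = []
    # Materialise the list first so renaming does not disturb the directory walk.
    for f in sorted(root.rglob("maven-metadata-central.xml*")):
        if f.is_file():
            target = f.with_name(f.name.replace("-central", "", 1))
            f.rename(target)
            renamed.append(target)
    return renamed


if __name__ == "__main__":
    for p in strip_central_suffix(Path.home() / "saved-m2"):
        print("renamed to", p)
```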

It's been more than 6 years since this question was asked, but I still didn't find any tool to satisfactorily clean up my repository. So I wrote one myself in Python to get rid of old local artefacts. Maybe it will be useful for someone else also:
repo-cleaner.py:
from os.path import isdir
from os import listdir
import shutil
import semver
import Constants
# Change to True to get a log of what will be removed
dry_run = False
def check_and_clean(path):
    files = listdir(path)
    only_files = True
    for index, file in enumerate(files):
        if isdir('/'.join([path, file])):
            only_files = False
        else:
            files[index] = None
    if only_files:
        return
    directories = [d for d in files if d is not None]
    latest_version = check_if_versions(directories)
    if latest_version is None:
        for directory in directories:
            check_and_clean('/'.join([path, directory]))
    elif len(directories) == 1:
        return
    else:
        print('Update ' + path.split(Constants.m2_path)[1])
        for directory in directories:
            if directory == latest_version:
                continue
            print(directory + ' (Has newer version: ' + latest_version + ')')
            if not dry_run:
                shutil.rmtree('/'.join([path, directory]))
def check_if_versions(directories):
    if len(directories) == 0:
        return None
    latest_version = ''
    for directory in directories:
        try:
            current_version = semver.VersionInfo.parse(directory)
        except ValueError:
            return None
        if latest_version == '':
            latest_version = directory
        if current_version.compare(latest_version) > 0:
            latest_version = directory
    return latest_version
if __name__ == '__main__':
    check_and_clean(Constants.m2_path)
Constants.py (edit to point to your own local Maven repo):
# Paths
m2_path = '/home/jb/.m2/repository/'
Make sure that you have Python 3.6+ installed and that the semver package has been installed into your global environment or venv (use pip install semver if missing).
Run the script with python repo-cleaner.py.
It recursively searches the local Maven repository you configured (normally ~/.m2/repository), and if it finds a directory containing several versions, it removes all of them except the newest.
Say you have the following tree somewhere in your local Maven repo:
.
└── antlr
    ├── 2.7.2
    │   ├── antlr-2.7.2.jar
    │   ├── antlr-2.7.2.jar.sha1
    │   ├── antlr-2.7.2.pom
    │   ├── antlr-2.7.2.pom.sha1
    │   └── _remote.repositories
    └── 2.7.7
        ├── antlr-2.7.7.jar
        ├── antlr-2.7.7.jar.sha1
        ├── antlr-2.7.7.pom
        ├── antlr-2.7.7.pom.sha1
        └── _remote.repositories
Then the script removes version 2.7.2 of antlr and what is left is:
.
└── antlr
    └── 2.7.7
        ├── antlr-2.7.7.jar
        ├── antlr-2.7.7.jar.sha1
        ├── antlr-2.7.7.pom
        ├── antlr-2.7.7.pom.sha1
        └── _remote.repositories
Any old versions, even ones that you actively use, will be removed. They can easily be restored with Maven (or other tools that manage dependencies).
You can get a log of what is going to be removed without actually removing it by setting dry_run = True. The output will look like this:
Update /org/projectlombok/lombok
1.18.2 (Has newer version: 1.18.6)
1.16.20 (Has newer version: 1.18.6)
This means that versions 1.16.20 and 1.18.2 of lombok will be removed and 1.18.6 will be left untouched.
The latest version of the above files can be found on my github.

I came up with a utility, hosted on GitHub, to clean old versions of libraries from the local Maven repository. By default, the utility removes all older versions of artifacts, leaving only the latest ones. Optionally, it can remove all snapshots, sources and javadocs, and specific groups or artifacts can be forced into or excluded from this process. The utility is cross-platform and also supports date-based removal using last access/download dates.
https://github.com/techpavan/mvn-repo-cleaner

I wanted to remove old dependencies from my Maven repository as well. I thought about just running Florian's answer, but I wanted something that I could run over and over without remembering a long linux snippet, and I wanted something with a little bit of configurability -- more of a program, less of a chain of unix commands, so I took the base idea and made it into a (relatively small) Ruby program, which removes old dependencies based on their last access time.
It doesn't remove "old versions" but since you might actually have two different active projects with two different versions of a dependency, that wouldn't have done what I wanted anyway. Instead, like Florian's answer, it removes dependencies that haven't been accessed recently.
If you want to try it out, you can:
Visit the GitHub repository
Clone the repository, or download the source
Optionally inspect the code to make sure it's not malicious
Run bin/mvnclean
There are options to override the default Maven repository, ignore files, set the threshold date, but you can read those in the README on GitHub.
I'll probably package it as a Ruby gem at some point after I've done a little more work on it, which will simplify matters (gem install mvnclean; mvnclean) if you already have Ruby installed and operational.

Just delete everything under the .m2/repository folder. When you build a project, all of its dependencies are downloaded there.
In your case, your project may previously have used an old version of some dependency that has since been upgraded. So it is better to clean the .m2 folder and build your project with mvn clean install.
Only the dependency versions your project currently uses will then be downloaded into this folder.

I spent some hours looking at this problem and at the answers. Many of them rely on the atime (the last access time on UNIX systems), which is an unreliable solution for two reasons:
Most UNIX systems (including Linux and macOS) update the atime irregularly at best, and for a reason: a complete implementation of atime would slow down the whole file system by having to update (i.e., write to disk) the atime every time a file is read; moreover, such an extreme number of updates would very rapidly wear out modern, high-performance SSDs
In a CI/CD environment, the VM that's used to build your Maven project will have its Maven repository restored from shared storage, which in turn sets the atime to a "recent" value
I hence created a Maven repository cleaner and made it available on https://github.com/alitokmen/maven-repository-cleaner/. The bash maven-repository-cleaner.sh script has one function, cleanDirectory, which is a recursive function looping through the ~/.m2/repository/ and does the following:
When the subdirectory is not a version number, it digs into that subdirectory for analysis
When a directory has subdirectories which appear to be version numbers, it deletes all of them except the highest version
In practice, if you have a hierarchy such as:
artifact-group
    artifact-name
        1.8
        1.10
        1.2
... maven-repository-cleaner.sh script will:
Navigate to artifact-group
In artifact-group, navigate to artifact-name
In artifact-name, delete the subfolders 1.8 and 1.2, as 1.10 is higher than both 1.2 and 1.8
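The key detail in this example is that version directories must be compared numerically, segment by segment, rather than as strings (as strings, "1.8" sorts after "1.10"). A minimal Python sketch of that rule follows. It is not the shell script's code, and it assumes purely numeric, dot-separated versions: if any subdirectory name is something else (for example 1.0-SNAPSHOT), it treats the directory as "not versions" and deletes nothing.

```python
import re
from typing import List, Optional, Tuple

# Assumption of this sketch: only names like "1", "1.8" or "2.7.7" count as versions.
_NUMERIC_VERSION = re.compile(r"\d+(\.\d+)*")


def version_key(name: str) -> Tuple[int, ...]:
    """Turn '1.10' into (1, 10) so that tuples compare numerically, segment by segment."""
    return tuple(int(part) for part in name.split("."))


def versions_to_delete(subdirs: List[str]) -> Optional[List[str]]:
    """Return every version except the highest, or None if some name is not a plain numeric version."""
    if not subdirs or not all(_NUMERIC_VERSION.fullmatch(d) for d in subdirs):
        return None
    highest = max(subdirs, key=version_key)
    return sorted((d for d in subdirs if d != highest), key=version_key)


if __name__ == "__main__":
    print(versions_to_delete(["1.8", "1.10", "1.2"]))  # ['1.2', '1.8']
```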
This is hence very similar to the solutions Andronicus and Pavan Kumar have provided, the difference is that this one is written as a Shell script. To run the tool on your CI/CD platform (or any other form of UNIX system), simply use the below three lines, either at the beginning or at the end of the build:
wget https://raw.githubusercontent.com/alitokmen/maven-repository-cleaner/main/maven-repository-cleaner.sh
chmod +x maven-repository-cleaner.sh
./maven-repository-cleaner.sh

Copy the dependencies your project needs.
With these in hand, remove all <dependency> tags inside the <dependencies> tag
of your project's pom.xml file.
After saving the file, Maven Dependencies will no longer appear in your Libraries.
Then paste back the <dependency> entries you copied earlier.
Maven will automatically download the required jars; you can see them in
the regenerated Maven Dependencies library after saving the file.
Thanks.

Related

Installing parquet-tools

I am trying to install parquet tools on a FreeBSD machine.
I cloned this repo: git clone https://github.com/apache/parquet-mr
Then I did cd parquet-mr/parquet-tools
Then I ran mvn clean package -Plocal
As specified here: https://github.com/apache/parquet-mr/tree/master/parquet-tools
This is what I got:
Why is this dependency error here? How do I get around it?

On Ubuntu 20, I install via pip:
python3 -m pip install parquet-tools
Haven't tried on FreeBSD but I'd imagine it would also work. See related answer for a caveat on using pip on FreeBSD.
And you can view a file with:
parquet-tools show filename.parquet

I know the question specifies FreeBSD, but if you're on mac, you can do
brew install parquet-tools

parquet-tools is just one module of parquet-mr. It depends on some of the other modules.
When you build from a source version that corresponds to a release, those other modules will be available to Maven, because release artifacts are published as a part of the release process.
However, when building from a snapshot version, you have to make those dependencies available yourself. There are two ways to do so:
Option 1: Build and install all modules of the parent directory:
git clone https://github.com/apache/parquet-mr
cd parquet-mr
mvn install -Plocal
This will put the snapshot artifacts in your local ~/.m2 directory. Subsequently, you can (re)build just parquet-tools like you initially tried, because now the snapshot artifacts will already be available from ~/.m2.
Option 2: Build the parquet-mr modules from the parent directory, while asking Maven to build needed modules as well along the way:
git clone https://github.com/apache/parquet-mr
cd parquet-mr
mvn package -pl parquet-tools -am -Plocal
Option 1 will build more projects than option 2, so if you only need parquet-tools, you are better off with the latter. Note, though, that both will probably require a Thrift compiler to be installed.
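What -pl parquet-tools -am asks of the reactor is, roughly: take the selected module plus every module it transitively depends on within the build. A small Python sketch of that closure follows, over a hypothetical module graph (the module names and edges are made up for illustration and are not parquet-mr's real graph). Maven also orders the modules so that dependencies are built first; the sketch only computes the set.

```python
from collections import deque
from typing import Dict, List, Set


def also_make(deps: Dict[str, List[str]], selected: List[str]) -> Set[str]:
    """Selected modules plus every module they transitively depend on (like `-am`).

    deps maps each module to the modules it depends on directly.
    """
    result = set(selected)
    queue = deque(selected)
    while queue:
        module = queue.popleft()
        for dep in deps.get(module, []):
            if dep not in result:
                result.add(dep)
                queue.append(dep)
    return result


if __name__ == "__main__":
    # Hypothetical graph, for illustration only.
    graph = {"tools": ["hadoop", "column"], "hadoop": ["column"], "column": ["common"], "common": []}
    print(sorted(also_make(graph, ["tools"])))  # ['column', 'common', 'hadoop', 'tools']
```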

Parquet tools is a utility that can be used to read Parquet files. You can clone it from GitHub and build it with Maven:
1. git clone https://github.com/Parquet/parquet-mr.git
2. cd parquet-mr/parquet-tools/
3. mvn clean package -Plocal
Or you can download a stable release and build it locally:
1. Download the stable Parquet release:
https://github.com/apache/parquet-mr/archive/apache-parquet-1.8.2.tar.gz
2. Build it locally with Maven:
D:\parquet>cd parquet-tools && mvn clean package -Plocal
3. Test it (put a Parquet file in the target directory):
D:\parquet\parquet-tools\target>java -jar parquet-tools-1.8.2.jar schema out.parquet
(where out.parquet is my Parquet file in the target directory)
// Read the Parquet file
D:\parquet\parquet-tools\target>java -jar parquet-tools-1.8.2.jar cat out.parquet
// Read a few lines of the Parquet file
D:\parquet\parquet-tools\target>java -jar parquet-tools-1.8.2.jar head -n5 out.parquet

Some answers have broken links for the jar download, but you can get it from
Maven Central.
However, this jar and others like it are built so that the Hadoop dependencies are "provided", and if you build from source, you'll get that default too. So you need to set -Dhadoop.scope=compile when you build, or the result will only work when run on a Hadoop node using the "hadoop ..." command.
To make matters worse, this tool apparently disables System.out and System.err, so exceptions that cause main() to fail are never printed and you'll be left wondering what happened.
I also found that the default settings for the maven-license-plugin caused it to fail the build when files showed up that it didn't expect (e.g. nbactions.xml if you use NetBeans).

How do I know which project is requesting a specific jar from Maven

I'm using Eclipse and recently upgraded all my projects to use the latest version of a library.
However in the Maven repository I can still see the old version of the library.
I've manually deleted the old library from the Maven repository, but it keeps coming back.
I am sure all the projects in Eclipse point to the new version: I've checked all my pom.xml, I've used the "Dependency Hierarchy" tool, etc.
Is there a way to know which project is telling Maven to download the old version of the library?
Many thanks!

You can use the Maven dependency plugin's tree goal:
mvn dependency:tree
and filter it with the includes option, which uses the pattern [groupId]:[artifactId]:[type]:[version], for example mvn dependency:tree -Dincludes=com.example:old-lib (hypothetical coordinates).

Re: "and I have many". Perform the following in the topmost directory:
find . -name "pom.xml" -type f -exec mvn dependency:tree -f {} ';' | grep '^\[.*\] [-+\\\|].*'
Syntax details may vary from shell to shell.
Hint: try it in a bottommost project directory first to make sure it runs as intended. Since you have many projects, it may take a while to finish, and you might only notice errors at the end.

You can use the command below to get a tree of all dependencies and then find out where the specific artifact is coming from.
You can pipe it through grep to show only the relevant lines if you are on a Linux/Unix-based OS.
mvn dependency:tree
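Without grep (for example on Windows), the same filtering can be done in Python, and it can also report which module each match belongs to. This sketch assumes you saved the output with mvn dependency:tree > tree.txt and that it contains the usual "--- ...:tree (default-cli) @ module ---" header before each module's tree. The function name and the com.example:old-lib coordinates are hypothetical.

```python
import re
from typing import List, Tuple

# Header printed before each module's tree, e.g.
# "[INFO] --- maven-dependency-plugin:3.1.2:tree (default-cli) @ my-app ---"
_MODULE_HEADER = re.compile(r"@ (\S+) ---")


def who_uses(tree_output: str, group_id: str, artifact_id: str) -> List[Tuple[str, str]]:
    """Return (module, tree line) pairs for every tree line mentioning groupId:artifactId."""
    needle = f"{group_id}:{artifact_id}:"  # trailing ':' avoids matching old-lib-extra
    module = "?"
    hits = []
    for line in tree_output.splitlines():
        header = _MODULE_HEADER.search(line)
        if header:
            module = header.group(1)
        elif needle in line:
            hits.append((module, line.strip()))
    return hits


if __name__ == "__main__":
    with open("tree.txt", encoding="utf-8") as f:
        for module, line in who_uses(f.read(), "com.example", "old-lib"):
            print(module, "->", line)
```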

Thanks guys, I appreciate it, but it certainly is not an easy way. It looks like you have to go project by project (and I have many). Plus, most of my POMs reference POMs in other folders, and it's not able to process that either.

get maven clean install to work like maven clean + maven install

I have the following project hierarchy:
app
|-module1
| |-pom.xml
|-module2
| |-pom.xml
|-pom.xml
Module1 and module2 both copy files to the same target directory, so I'm using the app's pom.xml to clear that directory. My problem is that the execution order right now is module1[clean], module1[install], module2[clean], module2[install], app[clean], app[install], so everything module1 and module2 put into that directory will be deleted.
I would like to get it to execute all clean phases first, then all install phases, even when I run mvn clean install. Or, if there is another way to execute app[clean] before module1[install] and module2[install], that would work too.
EDIT
I ended up making a separate module (NetBeans POM project) just for cleaning. Not the solution I was hoping for, but it works for now.

The root of the problem here is that you're trying to make Maven do something that sort-of contradicts Maven's multi-module "conventions", as well as conflicting with Maven's "understanding" of a "target directory". There is a reason why Maven's reactor is operating the way that it does, and it is to preserve the Maven "spirit" (or "convention") of how modules are structured in a multi-module build.
In Maven, the target directory is supposed to belong only to one project: each project has its own target directory. In your scenario, there should really be a different target directory for app, module1 and module2.
I suppose your best bet, in order to both achieve your objective and keep your build process flexible, is to:
Have module1 output its own JAR into its own target directory (module1/target).
Have module2 output its own JAR into its own target directory (module2/target).
Add a plugin to app (the parent module) that will collect whatever it needs from module1/target and module2/target into app/target, and do whatever processing on those artifacts.

Maven building only changed files

Let's say I have a module structure like the one below
Modules
->utils
->domain
->client
->services
->deploy (this is at the module level)
Now, to launch the client I need to build all the modules, i.e. utils, domain, client and services, because I load the jars of all these modules to finally launch the client.
And all the jars get assembled in the deploy module.
My question is: if I change anything in services, for example, is there a way, when running a build from deploy, for Maven to recognise that it only has to build services, and then build it and deploy it to the deploy folder?

If you only call "mvn install" without "clean", the compiler plugin will compile only modified classes.

For Git (note that -pl expects a comma-separated list of modules):
mvn install -amd -pl $(git status --porcelain | awk '{print $2}' | cut -d/ -f1 | sort -u | paste -sd, -)
OR
Or add the following function to your .bashrc file (found in your home directory as ~/.bashrc; create it if it doesn't exist):
mvn_changed_modules(){
  [ -z "$1" ] && echo "Expected command : mvn_changed_modules (install, package, clean or any other Maven goal)" && return 1
  modules=$(git status --porcelain | awk '{print $2}' | cut -d/ -f1 | sort -u | paste -sd, -)
  if [ -z "$modules" ];
  then
    echo "No changes (modified / deleted / added) found"
  else
    echo "Changed modules are : $modules"
    mvn "$1" -amd -pl "$modules"
  fi
}
Then, after restarting your shell (or running source ~/.bashrc), you can just use the following command from the root directory:
ProjectRootDir]$ mvn_changed_modules install
How it works
As per the question, mvn install -amd -pl services is the command to run when "some changes were made in the services module". So the function first derives the module names from the changed files and passes them as input to the mvn install command.
Say for example, below is a list of modified files (output of git status) -
services/pom.xml
services/ReadMe.txt
web/src/java/com/some/Name.java
Then services and web are the names of the modules that need to be built/compiled/installed
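The "How it works" steps (take each changed path, keep its first directory, deduplicate, and join the names with commas for -pl) can be sketched in Python as follows. It assumes every module is a direct subdirectory of the root and that paths are relative to the root, as in the example list. Skipping files that sit directly in the root (such as the parent pom.xml) is a choice of this sketch, not of the shell function.

```python
from typing import Iterable


def changed_modules(changed_paths: Iterable[str]) -> str:
    """Return a comma-separated module list for `mvn -pl`, from paths relative to the root."""
    modules = []
    for path in changed_paths:
        first, sep, _ = path.strip().partition("/")
        # A path without "/" is a file in the root directory, not inside a module.
        if sep and first not in modules:
            modules.append(first)
    return ",".join(modules)


if __name__ == "__main__":
    files = ["services/pom.xml", "services/ReadMe.txt", "web/src/java/com/some/Name.java"]
    print(changed_modules(files))  # services,web
```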

Within a multi-module build you can use:
mvn -pl ChangedModule compile
from the root module to compile only the given ChangedModule. The compiler plugin will only compile the files that have changed. However, a change in ChangedModule may also require recompiling other modules that depend on it. This can be achieved by using the following:
mvn -amd -pl ChangedModule compile
where -amd means "also make dependents". This works without installing all the modules into the local repository via mvn install.
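Conceptually, -amd walks the module graph in the opposite direction from -am: starting from the changed module, it adds every module that depends on it, directly or transitively. A Python sketch using the module names from the question follows. The dependency edges are hypothetical, since the question does not state the real ones.

```python
from collections import deque
from typing import Dict, List, Set


def also_make_dependents(deps: Dict[str, List[str]], changed: List[str]) -> Set[str]:
    """Changed modules plus every module that (transitively) depends on them, like `-amd`.

    deps maps each module to the modules it depends on directly.
    """
    # Invert the graph: module -> modules that depend on it.
    dependents: Dict[str, List[str]] = {}
    for module, direct in deps.items():
        for dep in direct:
            dependents.setdefault(dep, []).append(module)
    result = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for user in dependents.get(current, []):
            if user not in result:
                result.add(user)
                queue.append(user)
    return result


if __name__ == "__main__":
    # Hypothetical edges for the modules in the question.
    graph = {"utils": [], "domain": ["utils"], "services": ["domain"],
             "client": ["domain"], "deploy": ["client", "services"]}
    print(sorted(also_make_dependents(graph, ["services"])))  # ['deploy', 'services']
    print(sorted(also_make_dependents(graph, ["utils"])))
```

With these edges, a change in utils pulls in every module, which is the dependency-tree concern raised in a later answer.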

After trying the aforementioned advice, I ran into the following problems:
Maven install (without clean) still takes a lot of time, which for several projects can mean 10-20 s of extra time.
Sebasjm's solution is fast and useful (I used it for a couple of months), but if you have several changed projects, rebuilding them all every time (even if you haven't changed anything since) is a huge waste of time.
What really worked for me is comparing source modification dates against the modification date of the .jar in the local repository. And if you check only VCS-changed files (see sebasjm's answer), the date comparison won't take noticeable time (for me it was less than 1 s for 100 changed files).
The main benefit of this approach is a very accurate rebuild of only the projects that really changed.
The main drawback is that the modification date comparison takes a bit more than a one-liner script.
For those who want to try it but are too lazy to write such a script themselves, here is my version: https://github.com/bugy/rebuilder (Linux/Windows).
It can do some additional useful things, but the main idea and central algorithm is as explained above.
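The core check described in this answer (is any source file newer than the jar already installed in the local repository?) might look like this in Python. This is a sketch, not the rebuilder project's code: the function name and the idea of passing the module's source directory and installed jar path explicitly are assumptions of this example, and it scans the whole source directory rather than only VCS-changed files.

```python
from pathlib import Path


def needs_rebuild(source_dir: Path, installed_jar: Path) -> bool:
    """True if the jar is missing or any file under source_dir was modified after it."""
    if not installed_jar.is_file():
        return True
    jar_mtime = installed_jar.stat().st_mtime
    return any(
        f.is_file() and f.stat().st_mtime > jar_mtime
        for f in source_dir.rglob("*")
    )


if __name__ == "__main__":
    # Hypothetical paths for illustration.
    src = Path("services/src")
    jar = Path.home() / ".m2/repository/com/example/services/1.0-SNAPSHOT/services-1.0-SNAPSHOT.jar"
    print("rebuild services:", needs_rebuild(src, jar))
```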

If you are using SVN and *nix, from the root module
mvn install -amd -pl $(svn st | colrm 1 8 | sed 's /.* ' | xargs echo | sed 's- -,:-g' | sed 's ^ : ')

I had the same frustration and I also wrote a project at the time - alas it is not available but I found people who implemented something similar:
for example - https://github.com/erickzanardo/maven-watcher
It uses Node.js and assumes a Maven project, but should work on Windows and Unix alike.
The idea of my implementation is to watch for changes and then compile what changed. - kind of like nodemon.
So for example
When a java file changes - I compile the module
When a class file or jar changes - I do something else (for example copy the jar under tomcat and restart tomcat)
And the two are unrelated, so if the Java compilation failed, there is no reason for the jar file to update, and it's quite stable.
I have used it on a project with 23K .java files and it worked smoothly.
It took the watch process a couple of seconds to start - but then it would only run if change was detected so the overall experience was nice.
The next step I intended to add is similar to your SVN support - list the modified files and use them as initialization.
Important to note: if compilation fails, it will retry on the next modification. So if you are modifying multiple jars and the compilation keeps failing while you are writing code, it will retry compiling everything on each code change until it compiles successfully.
If you'd like, I can try to find my old project, fix it up a bit and publish it.

mvn clean install to run full build
mvn install to compile only changed and prepare war/jars other binaries
mvn compile to compile only changed files...
So mvn compile is the fastest. but if run/debug your project with war/jars it might not show those changes.

The question and the answers posted so far do not take the dependency tree into account. What if the utils module is changed? We need to rebuild (retest at least) it and all the modules depending on it.
Ways to do so:
https://github.com/avodonosov/hashver-maven-plugin/
https://github.com/vackosar/gitflow-incremental-builder/
Gradle Enterprise is a commercial service which provides a build cache, in particular for Maven
Migrate to newer build tools like Gradle or Bazel which support build caches out of box.

could the first ever maven build be made offline?

The problem: you have a zipped Java project distribution which depends on several libraries like spring-core, spring-context, jackson, testng and slf4j. The task is to make the thing buildable offline. It's okay to create a project-scope local repo with all the required library jars.
I've tried to do that. It looks like even though the project contains the jars it requires for javac and runtime, the build still requires internet access: Maven still reaches out to the network to fetch most of the plugins it needs for the build. I assume that Maven is run with an empty .m2 directory (as this may be the first launch of the build, which may be an offline build). No, I am not okay with distributing a full Maven repo snapshot along with the project itself, as this looks like an utter mess to me.
A bit of background: the broader task is to create a Windows portable-style JDK/IntelliJ IDEA distribution which goes along with the project and allows for some minimal Java coding/running inside the IDE with minimal configuration and minimal internet access. The project is targeted at students in a computer class, with little or no control over the system configuration. It is desirable to keep the console build system intact for offline mode, but I guess that Maven is overly dependent on the network, so I may have to ditch it in favor of good old Ant.
So, what's your opinion: could the first Maven build be done completely offline? My gut feeling is that the initial Maven distribution just contains the bare minimum required to pull essential plugins from the main repo and is not fully functional without seeing the main repo at least once.

Maven has a '-o' switch which allows you to build offline:
-o,--offline Work offline
Of course, you will need to have your dependencies already cached into your $HOME/.m2/repository for this to build without errors. You can load the dependencies with:
mvn dependency:go-offline
I tried this process and it doesn't seem to fully work. I did a:
rm -rf $HOME/.m2/repository
mvn dependency:go-offline # lot of stuff downloaded
# unplugged my network
# develop stuff
mvn install # errors from missing plugins
What did work however is:
rm -rf $HOME/.m2/repository
mvn install # while still online
# unplugged my network
# develop stuff
mvn install

You could run mvn dependency:go-offline on a brand-new .m2 repo for the project concerned. This should download everything Maven needs to be able to run offline. If these artifacts are then put into a project-scope local repo, you should be able to achieve what you want. I haven't tried this, though.

Specify a local repository location, either within settings.xml file with <localRepository>...</localRepository> or by running mvn with -Dmaven.repo.local=... parameter.
After initial project build, all necessary artifacts should be cached locally, and you can reference this repository location the same ways, while running other Maven builds in offline mode (mvn -o ...).
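For the classroom scenario in the question, the settings.xml route can be scripted so that every copy of the project points at its bundled repository. The sketch below writes a minimal settings.xml with the localRepository and offline elements (offline set to true has the same effect as passing -o). The file and folder names are placeholders, and this does not populate the repository: as another answer notes, a full online build is still needed once to fill it, including Maven's own plugins.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def write_offline_settings(local_repo: Path, target: Path) -> None:
    """Write a minimal Maven settings.xml that points at local_repo and forces offline mode."""
    root = ET.Element("settings")
    # <localRepository> replaces the default ~/.m2/repository location.
    ET.SubElement(root, "localRepository").text = str(local_repo.resolve())
    # <offline>true</offline> makes every build behave as if -o were passed.
    ET.SubElement(root, "offline").text = "true"
    ET.ElementTree(root).write(target, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    # Hypothetical layout: a "repository" folder shipped next to the project.
    write_offline_settings(Path("repository"), Path("offline-settings.xml"))
```

The generated file can then be passed explicitly, e.g. mvn -s offline-settings.xml install, so the students' own ~/.m2 is never consulted.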
