Who is using my maven artifact?

Who is using my maven artifact? - java

I have a system consisting of multiple web applications (war) and libraries (jar). All of them are using maven and are under my control (source code, built artifacts in Nexus,...). Let say that application A is using library L1 directly and L2 indirectly (it is used from L1). I can easily check the dependency tree top-down from the application, using maven's dependency:tree or graph:project plugins. But how can I check, who's using my library? From my example, I want to know, whether A is the only application (or library) using L1 and that L2 is used from L1 and from some other application, let say B. Is there any plugin for maven or nexus or should I try to write some script for that? What are your suggestions?

If you wish to achieve this on a repository level, Apache Archiva has a "used by" feature listed under project information
.
This is similar to what mvnrepository.com lists under its "used by" section of an artifact description.
Unfortunately, Nexus does not seem to provide an equivalent feature.
Now I suppose it would be a hassle to maintain yet another repository just for that, but then it would probably easier than what some other answers suggestions, such as writing a plugin to Nexus. I believe Archiva can be configured to proxy other repositories.
Update
In fact, there's also a plugin for Nexus to achieve the "used by" feature.

As far as I know nothing along these lines exists as an open source tool. You could write a Nexus plugin that traverses a repo and checks for usages of your component in all other components by iterating through all the pom's and analyzing them. This would be a rather heavy task to run though since it would have to look at all components and parse all the poms.
In a similar fashion you could do it on a local repository with some other tool. However it probably makes more sense to parse the contents of a repo manager rather than a local repository.

I don't think there's a Maven way to do this. That being said, there are ways of doing this or similar things. Here's a handful examples:
Open up your projects in your favorite IDE. For instance Eclipse will help you with impact analysis on a class level, which most of the time might be good enough
Use a simple "grep" on your source directory. This sounds a bit brusk (as well as stating the obvious), perhaps, but we've used this a lot
Use dependency analysis tools such as Sonargraph or Lattix

I am not aware of any public libraries for this job, so I wrote a customized app which does it for me.
I work with a distribution which involves more than 70 artifacts bundled together. Many times after modifying an artifact, I want to ensure changes are backward compatible (i.e. no compilation errors are introduced in dependent artifacts). To achieve this, it was crucial to know all dependents of modified artifact.
Hence, I wrote an app which scans through all artifacts under a directory(/subdirectories), extracts their pom.xml and searches (in dependency section of pom) for occurrence of modified artifact.
(I did this in java although shell/windows script can do this even more compactly.)
I'll be happy to share code on github, if that could be of any help.

One way that might suit your needs are to create a master-pom with all your maven projects. Then you run the following command on the master-pom:
mvn dependency:tree -DoutputType=graphml -DoutputFile=dependency.graphml
Open the generated file in yEd.
Used the instructions found here:
http://www.summa-tech.com/blog/2011/04/12/a-visual-maven-dependency-tree-view/

More interesting is probably: what would you do with this information? Inform the developers of A not to use library L1 or L2 anymore, because it has a critical bug?
In my opinion you should be able to create a blacklist of dependencies/parents/plugins on your repository manager. Once a project tries to deploy/upload itself with a blacklisted artifact, it should fail. I'm saying uploading and not downloading, because that might break a lot of projects. As far as I know, this is not yet available for any repository-manager.

One of the ways to approach this problem is outside Java itself : write an OS-level monitoring script that tracks each case of fopen() on the jar file under question! Assuming this is in a corporate environemnt, you might have to wait for a few weeks (!) to allow all using processes to access the library at least once!
On Windows, you might use Sysinternals Process Monitor to do this:
http://technet.microsoft.com/en-us/sysinternals/bb896645
On Unix variants, you would use DTrace or strace.

IMHO and also from my experience, looking for a technical solution for such a problem is often an overkill. If the reason why you want to know who is using your artifact(library) is because you want to ensure backward compatibility when you change an artifact or something similar, I think it is best done by communicating your changes using traditional channels and also encourage other teams who might be using your library to talk about it (project blogs, wiki, email, a well known location where documentations are put, Jour fixe etc.).
In theory, you could write a script that crawls though each project in your repository and then parses the maven build.xml (assuming they all use maven) and see whether they have defined a dependency to your artifact. If all the projects in your organization follows the standard maven structure, it should be easy to write one such script (though if any of those projects have a dependency to your artifact via a transitive dependency, things can get a bit more tricky).

Related

Is there an Ant task which can fetch an artifact from Hudson/Jenkins?

I've hand-rolled my project's build-system (mostly in Python + Hudson). One of the things I need to do quite often is fetch artifacts from upstream Hudson / Jenkins.
These artifacts could be almost anything - for example a zip-file full of business data to process or even an egg containing a load of python code which must be tested. Almost every important job in our system has upstream dependancies on artifacts produced by other Hudson jobs.
My manager has suggested that the next iteration of the build-system should replace some of my hand-rolled components with Ant. The purpose of this next iteration will be to reduce the complexity of our systems and bring them into line with the work of other teams who mainly use Java and Ant (and very little Python).
Also I'm personally keen to have an excuse to learn Ant. It seems like a really useful tool.
So in order not to re-invent the wheel one component I'm definitely going to need is an Ant task which can fetch an artifact from a particular Hudson build. Does such a thing exist. If it does not exist, is there something close to my requirement that I could customize? I'd rather not re-invent the wheel.
UPDATE1: We have a strong preference for 100% free, open-source tools. Everybody in the team is very happy with Ant, however Maven is something the team is trying to get away from.

The proper solution is to publish the artifacts from Hudson/Jenkins to an artifact manager, such as Nexus or Artifactory, and then pull the artifact versions with something like Ivy or Gradle.

If you must get the dependencies yourself, you could use the get task. Example:
<get src="http://jenkins/job/project-name/lastSuccessfulBuild/artifact/foo.jar"
dest="/path/to/local/file"/>
I do, however, agree with Stefan - dependency management is better accomplished by tools mentioned in his answer instead of manually pulling them down yourself using Ant.

If you are just starting to learn ANT then I would suggest you learn gradle instead, it has the dependency part already integrated and is far easier to work with. Furthermore if you need a specific ANT task you can simply call it from gradle or even import whole ant scripts.
Otherwise I agree with the answer from Stefan Kendall.

Build system that allows sharing modules amongst different binaries

I'm trying to choose the most appropriate build system to work in enterprise with a common source repository, emphasizing sharing of common code. I'd like the source hierarchy to look something like this:
- src
- java
- common
- net
- database
- team1
- team2
- team3
- lib
- tests
- java
- common
- net
- database
- team1
- team2
- team3
- lib
The goal is to have a build system where team[1-3] can have independent builds that explicitly specify their dependencies. Dependencies might look like:
- team1
- common/net
- team3/lib
- team2
- common/database
- team3
So, for example, the build for team1 would include everything within the team1, common/net, and team3/lib; but nothing else. Ideally, tests would be integrated in the same fashion (testing team1 would run tests for team1, common/net, and team3/lib).
I'm currently using Ant, but haven't found a sane way to manage a hierarchy like this. I started to look at Maven 2 for its ability to manage a dependency hierarchy, but it seems to want full-fledged projects for each module. That wouldn't be a problem, but it seems to force me into a directory structure that does not map well to the traditional java package hierarchy. It seems like I might be able to do what I want with buildr using an alternative layout, but I'm worried that might prove to be brittle.
Can someone recommend something that might work for me?

I think you actually have three issues here.
How to layout your project so that the artifacts make sense.
How to best handle the sharing of these artifacts for each project.
How to handle the loss in productivity while converting the development team to use the new project structure.
For the first issue, try to use Maven conventions wherever possible and organize the project into multiple artifacts. If the artifacts should be nested under a parent, do so. Start off with the simplest artifact which has no dependencies and work your way through the code.
I'm not sure why you believe the layout won't support the traditional Java hierarchy? It should work, especially if you use parent poms.
Obviously the second issue can become quite a handful depending upon how you handle the first one. I would err on the side of creating more artifacts instead of fewer and using a repository manager like Nexus or Artifactory to manage them. At least that way, your team's builds can rely on pre-built and tested jars by hitting your repository to pull down the latest SNAPSHOT or RELEASE of the jar they are working with.
For the third, make sure you're using IDEs that have Maven support. If you're stuck using something like Rational Application Developer 7.0.x or an IDE based on something less than Eclipse 3.4, then you won't be able to use the M2Eclipse plugin. Without M2Eclipse, the developers will have to jump through some manual hoops which are not ideal. Netbeans 6.7 and 6.8 have very good Maven support.

As you say, Maven 2 is the preferred option for your case.
Maven folder structure is not madnatory - it is configurable, if you consider it unsiutable. However, I think it is a good structure that you can follow without remorse.
You can use a repository manager so that people who use some dependencies don't necessarily need to checkout the projects they depend on.

I started to look at Maven 2 for its ability to manage a dependency hierarchy, but it seems to want full-fledged projects for each module.
That's one way to do it. Alternatively, a multi-module Maven project can be organized like this:
project
module-1
src
main
....
test
....
pom.xml
module-2
src
main
....
test
....
pom.xml
...
pom.xml
where each pom.xml could also refer to modules defined by other trees. BTW, the Eclipse maven plugin supports this approach as well as the more common one-module-per-project approach.

I'm currently using Ant, but haven't found a sane way to manage a hierarchy like this.
This is surprising as Ant (+Ivy?) gives you all the flexibility you want.
I started to look at Maven 2 for its ability to manage a dependency hierarchy, but it seems to want full-fledged projects for each module.
If by this you mean one pom.xml per module, then that's correct.
That wouldn't be a problem, but it seems to force me into a directory structure that does not map well to the traditional java package hierarchy.
Yes, Maven comes with some conventions, the project directory structure being one of them. This is (a bit) configurable though but I don't think you'll be able to match the wanted layout (with tests and sources into separated hierarchies). And actually, I would strongly advice to use defaults if you go for Maven, you should adopt its philosophy, it will save you a lot, really a lot, of pain (not even mentioning that some plugins might use these default in an hard coded way).
To be honest, I don't really understand what you mean by a directory structure that does not map well to the traditional java package hierarchy. First, Maven is perfect for Java, so this doesn't make any sense to me. Second, and this might be more subjective, your layout (with separated tests and sources trees) doesn't look traditional at all to me. Maybe you should clarify what you mean exactly by traditional...
It seems like I might be able to do what I want with buildr using an alternative layout, but I'm worried that might prove to be brittle
I don't know buildr really well so I can't say much about it but I know it is indeed more flexible. That said, if Ant doesn't give you satisfaction in terms of flexibility, then I don't see why buildr would be better.
And don't forget that buildr and Ant+Ivy have much smaller communities compared to Maven. Don't underestimate this, this might become a real concern.
Personally, I would go for Maven and reconsider your layout. But let's say I'm biased.

What you are going for is going to give you lots of trouble in the long-term... each standalone component should really be made into its own project with its own repository, otherwise, you can get into lots of issues with changes in one component breaking the other components and updating taking excessively long. I strongly recommend that you make each component into its own project and using Maven2 to build.

You can do it with Buildr. You could live for some time with it.
Of course, like most people on the thread, I would rather not recommend this approach.
You can also use base_dir to change the base directory of the projects.

Best ways to manage generated artifacts for web service/xml bindings in a java webapp/client?

I'm working on a couple of web services that use JAXB bindings for the messages (in JAX-WS or spring-ws). When using these bindings there's always some code that is automatically generated from the WSDL to bind the message objects. I'm struggling to figure out the best way I can make this work so that it's easy to work with, hard to break and integrates nicely with IDEs (mostly using eclipse).
I think there are a couple of ways to go about this. The three main options I see right now are:
Generate code, keep the source artifacts and check them into the repository. Pros: integrates easily with IDEs (source highlighting etc), works within the build system. Cons: generated code changes each time you regenerate it, possibly creating noisy commits. It's also redundant since the WSDL file is already checked in, usually.
Generate code as part of the build process. Don't keep source artifacts or only keep them in output directories. Pros: fixes all the cons from the previous one. Cons: harder to integrate with IDE, though maybe this build step can be run automatically? I currently use this on one of my projects but the first time I checkout the project it appears broken, which is a minor nuisance.
Keep generated bindings in separate libraries (jars) included with maven or manually updated jars, depending on your build process. I got the idea from a thread on java.net. This seems more stable and uses explicit versioning but seems a bit heavyweight.
Which one of these options would you implement and how? We're currently using maven and eclipse, so any ideas in that regard would be great. I think this problem generalises to most other build systems and IDE combinations though, even other languages perhaps.

I went for option 3. If you already host your own repository (and optionally CI), it's not that heavyweight. All it takes is a simple POM. It's even possible to include some utility/wrapper/builder classes (that often make life easier with generated classes) and use them in several projects.

I'd go for option 2 and generate code in the "standard" ${project.build.directory}/generated-sources/<toolname> location as part of the build process. Using generated sources is well supported by m2eclipse (use Maven > Update Project Configuration once sources have been generated) and, if I remember well, by the maven eclipse plugin as well (i.e. the folder will be added to the Java Build Path). Actually, I think NetBeans also handle this fine. Not sure for Idea.
For the generation itself, you may need the maven-jaxb2-plugin if I understood correctly.

How to modularize a (large) Java App?

I have a rather large (several MLOC) application at hand that I'd like to split up into more maintainable separate parts. Currently the product is comprised of about 40 Eclipse projects, many of them having inter-dependencies. This alone makes a continuous build system unfeasible, because it would have to rebuild very much with each checkin.
Is there a "best practice" way of how to
identify parts that can immediately be separated
document inter-dependencies visually
untangle the existing code
handle "patches" we need to apply to libraries (currently handled by putting them in the classpath before the actual library)
If there are (free/open) tools to support this, I'd appreciate pointers.
Even though I do not have any experience with Maven it seems like it forces a very modular design. I wonder now whether this is something that can be retrofitted iteratively or if a project that was to use it would have to be layouted with modularity in mind right from the start.
Edit 2009-07-10
We are in the process of splitting out some core modules using Apache Ant/Ivy. Really helpful and well designed tool, not imposing as much on you as maven does.
I wrote down some more general details and personal opinion about why we are doing that on my blog - too long to post here and maybe not interesting to everyone, so follow at your own discretion: www.danielschneller.com

Using OSGi could be a good fit for you. It would allow to create modules out of the application. You can also organize dependencies in a better way. If you define your interfaces between the different modules correctly, then you can use continuous integration as you only have to rebuild the module that you affected on check-in.
The mechanisms provided by OSGi will help you untangle the existing code. Because of the way the classloading works, it also helps you handle the patches in an easier way.
Some concepts of OSGi that seem to be a good match for you, as shown from wikipedia:
The framework is conceptually divided into the following areas:
Bundles - Bundles are normal jar components with extra manifest headers.
Services - The services layer connects bundles in a dynamic way by offering a publish-find-bind model for plain old Java objects(POJO).
Services Registry - The API for management services (ServiceRegistration, ServiceTracker and ServiceReference).
Life-Cycle - The API for life cycle management (install, start, stop, update, and uninstall bundles).
Modules - The layer that defines encapsulation and declaration of dependencies (how a bundle can import and export code).
Security - The layer that handles the security aspects by limiting bundle functionality to pre-defined capabilities.

First: good luck & good coffee. You'll need both.
I once had a similiar problem. Legacy code with awful circular dependencies, even between classes from different packages like org.example.pkg1.A depends on org.example.pk2.B and vice versa.
I started with maven2 and fresh eclipse projects. First I tried to identify the most common functionalities (logging layer, common interfaces, common services) and created maven projects. Each time I was happy with a part, I deployed the library to the central nexus repository so that it was almost immediately available for other projects.
So I slowly worked up through the layers. maven2 handled the dependencies and the m2eclipse plugin provided a helpful dependency view. BTW - it's usually not too difficult to convert an eclipse project into a maven project. m2eclipse can do it for you and you just have to create a few new folders (like src/main/java) and adjust the build path for source folders. Takes just a minute or two. But expect more difficulties, if your project is an eclipse plugin or rcp application and you want maven not only to manage artifacts but also to build and deploy the application.
To opinion, eclipse, maven and nexus (or any other maven repository manager) are a good basis to start. You're lucky, if you have a good documentation of the system architecture and this architecture is really implemented ;)

I had a similar experience in a small code base (40 kloc). There are no °rules":
compiled with and without a "module" in order to see it's usage
I started from "leaf modules", modules without other dependencies
I handled cyclic dependencies (this is a very error-prone task)
with maven there is a great deal with documentation (reports) that can be deployed
in your CI process
with maven you can always see what uses what both in the site both in netbeans (with a
very nice directed graph)
with maven you can import library code in your codebase, apply source patches and
compile with your products (sometimes this is very easy sometimes it is very
difficult)
Check also Dependency Analyzer:
(source: javalobby.org)
Netbeans:
(source: zimmer428.net)

Maven is painful to migrate to for an existing system. However it can cope with 100+ module projects without much difficulty.

The first thing you need to decide is what infra-structure you will move to. Should it be a lot of independently maintained modules (which translates to individual Eclipse projects) or will you consider it a single chunk of code which is versioned and deployed as a whole. The first is well suited for migrating to a Maven like build environment - the latter for having all the source code in at once.
In any case you WILL need a continuous integration system running. Your first task is to make the code base build automatically, so you can let your CI system watch over your source repository and rebuild it whenyou change things. I decided for a non-Maven approach here, and we focus on having an easy Eclipse environment so I created a build enviornment using ant4eclipse and Team ProjectSet files (which we use anyway).
The next step would be getting rid of the circular dependencies - this will make your build simpler, get rid of Eclipse warnings, and eventually allow you to get to the "checkout, compile once, run" stage. This might take a while :-( When you migrate methods and classes, do not MOVE them, but extract or delegate them and leave their old name lying around and mark them deprecated. This will separate your untangeling with your refactoring, and allow code "outside" your project to still work with the code inside your project.
You WILL benefit from a source repository which allows for moving files, and keeping history. CVS is very weak in this regard.

I wouldn't recommend Maven for a legacy source code base. It could give you many headaches just trying to adapt everything to work with it.
I suppose what you need is to do an architectural layout of your project. A tool might help, but the most important part is to organize a logical view of the modules.

It's not free but Structure101 will give you as good as you will get in terms of tool support for hitting all your bullet points. But for the record I'm biased, so you might want to check out SonarJ and Lattix too. ;-)

How many multiple "Eclipse Projects" is considered too excessive for one actual development project?

I'm currently working on a project that contains many different Eclipse projects referencing each other to make up one large project. Is there a point where a developer should ask themselves if they should rethink the way their development project is structured?
NOTE: My project currently contains 25+ different Eclipse projects.

My general rule of thumb is I would create a new project for every reusable component. So for example if I have some isolated functionality that can be packaged say as a jar, I would create a new project so I can build,package and distribute the component independently.
Also, if there are certain projects that you do not need to make frequent changes to, you can build them only when required and keep them "closed" in eclipse to save time on indexing, etc. Even if you think that a certain component is not reusable, as long as it is separated from the rest of the code base in terms of logic/concerns you may be well served by just separating it out. Sometimes seemingly specific code might be reusable in another project or in a future version of the same project.

When compiled, a project would typically result in a jar. So if your application consists of potentially reusable components, it is ok to use a project for each.
I'm a big fan of using a lot of projects, I feel that this "breaks down" large things beyond what I can do with packages, and helps me orient and navigate.
Of course, if you're developing Eclipse plug-ins, everything would be a project anyway.
The only thing I would watch out for has to do with your source-control and it's ability to handle moves of files between projects. Subclipse had been giving me trouble with it, or maybe it's my SVN server that did.

If your project has that many sub-projects, or modules, needed to actually compose your final artifact then it is time to look at having something like Maven and setting up a multi-module project. It will a) allow you to work on each module independently without ide worries and allow easy setup in your ide (and others' IDEs) through the mvn eclipse:eclipse goal. In addition, when building your entire top level project, maven will be able to derive from list of dependencies you have described what modules need to be built in what order.
Here's a quick link via google and a link to the book Maven: The Definitive Guide, which will explain things in much better detail in chapter 6 (once you have the basics).
This will also force your project to not be explicitly tied to Eclipse. Being able to build independent from an ide means that any Joe Schmoe can come along and easily work with your code base using whatever tools he/she needs.

Create jars for the projects you don't work in often. That should greatly reduce the clutter. If you work on all the projects often, then you can add targets to your build that will jar up the respective projects for you, which condenses everything down to one file that you can then include on the class path.

An additional method is to create many different workspaces. The benefit of separate workspaces is that you can remove some of the visual clutter/ performance overhead of having lots of projects. You can use targets to jar up all of you projects and put them in a repository so you can reference them in each workspace.

At a former job the entire application was more then +170 projects. While it was rarely necessary to have all projects checked out locally, even the 30-40 projects constantly in our scope made reindexing, etc. very slow.

Yeesh. One Project for each Project. If you are using reusable projects, make them into a library for heavens sake. Break the none re-usable projects into packages, that's what they are there for.

That's a hard question and answers span from having one eclipse project at all to having one eclipse project for every single class.
My bottomline:
You can have too few projects,
and never too many (of course use
automation e.g. mvn eclipse:eclipse)
Use
-Declipse.useProjectReferences=true/false
when using maven to switch workspace
mode btw jar and project
dependencies
Use mvn release plugin to generate
consecutive releases (automatic
version increase)
Multiple projects gives you
independent versioning which is
extremely important. E.g. one dev may work on a new version of a
module while you still depends on
the previous one and you at some
point decide to upgrade to the newer
version(possibly by increasing its version in pom.xml dependency section). Or in other scenario if one
project contains a bug you downgrade
to its previous version.
Multiple projects makes you think
about the architecture more than if
you have just packages.
Multiple projects generally make
architectural problems evident more
than if you have just one project.
Anyone would like to comment on
this?
You never know if you project
evolves into OSGI/SOA/EDA where you
need separation.
Even if you're 100% sure that you
projects will be deployed as one jar
in an old way in a single jvm, it
still does not hurt(mvn assembly
plugin) to have multiple eclipse
projects for logically independent
pieces of code
BTW, the project I work on is divided into 24 eclipse projects.

Hell, we have more than 100. Projects don't cost anything.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.