How to modularize a (large) Java App?

I have a rather large (several MLOC) application at hand that I'd like to split up into more maintainable separate parts. Currently the product consists of about 40 Eclipse projects, many of them with inter-dependencies. This alone makes a continuous build system infeasible, because it would have to rebuild a great deal with each check-in.
Is there a "best practice" way to:
identify parts that can immediately be separated
document inter-dependencies visually
untangle the existing code
handle "patches" we need to apply to libraries (currently handled by putting them in the classpath before the actual library)
If there are (free/open) tools to support this, I'd appreciate pointers.
Even though I do not have any experience with Maven, it seems like it forces a very modular design. I wonder now whether this is something that can be retrofitted iteratively, or whether a project that was to use it would have to be laid out with modularity in mind right from the start.
Edit 2009-07-10
We are in the process of splitting out some core modules using Apache Ant/Ivy. A really helpful and well-designed tool, not imposing as much on you as Maven does.
I wrote down some more general details and personal opinion about why we are doing that on my blog - too long to post here and maybe not interesting to everyone, so follow at your own discretion: www.danielschneller.com

Using OSGi could be a good fit for you. It would allow you to create modules out of the application. You can also organize dependencies in a better way. If you define the interfaces between the different modules correctly, then you can use continuous integration, as you only have to rebuild the module that you affected on check-in.
The mechanisms provided by OSGi will help you untangle the existing code. Because of the way the classloading works, it also helps you handle the patches in an easier way.
Some concepts of OSGi that seem to be a good match for you, as described on Wikipedia:
The framework is conceptually divided into the following areas:
Bundles - Bundles are normal jar components with extra manifest headers.
Services - The services layer connects bundles in a dynamic way by offering a publish-find-bind model for plain old Java objects (POJOs).
Services Registry - The API for management services (ServiceRegistration, ServiceTracker and ServiceReference).
Life-Cycle - The API for life cycle management (install, start, stop, update, and uninstall bundles).
Modules - The layer that defines encapsulation and declaration of dependencies (how a bundle can import and export code).
Security - The layer that handles the security aspects by limiting bundle functionality to pre-defined capabilities.
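To make the publish-find-bind idea from the list above a bit more concrete, here is a toy sketch in plain Java. It is deliberately NOT the OSGi API (no BundleContext, no ServiceTracker); SimpleServiceRegistry, Greeter and RegistryDemo are made-up names that only illustrate how a consumer can depend on an interface and look up whatever implementation happens to be published.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical service contract shared between modules.
interface Greeter {
    String greet(String name);
}

// Toy registry: NOT the OSGi API, just the publish-find-bind idea.
final class SimpleServiceRegistry {
    private final Map<Class<?>, Object> services = new ConcurrentHashMap<>();

    /** Publish: a provider module registers an implementation under its interface. */
    <T> void publish(Class<T> type, T implementation) {
        services.put(type, implementation);
    }

    /** Find: a consumer looks the service up by interface only. */
    <T> Optional<T> find(Class<T> type) {
        return Optional.ofNullable(type.cast(services.get(type)));
    }

    /** Services can disappear at runtime, so consumers must handle absence. */
    void unpublish(Class<?> type) {
        services.remove(type);
    }
}

class RegistryDemo {
    public static void main(String[] args) {
        SimpleServiceRegistry registry = new SimpleServiceRegistry();
        registry.publish(Greeter.class, name -> "Hello, " + name);

        // Bind: use the service if it is currently available.
        String message = registry.find(Greeter.class)
                .map(g -> g.greet("world"))
                .orElse("no greeter available");
        System.out.println(message);
    }
}
```

The point of the design is that the consumer never names the implementing class, which is what lets a module be rebuilt or replaced on its own.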

First: good luck & good coffee. You'll need both.
I once had a similar problem. Legacy code with awful circular dependencies, even between classes from different packages, like org.example.pkg1.A depends on org.example.pkg2.B and vice versa.
I started with Maven 2 and fresh Eclipse projects. First I tried to identify the most common functionalities (logging layer, common interfaces, common services) and created Maven projects for them. Each time I was happy with a part, I deployed the library to the central Nexus repository so that it was almost immediately available for other projects.
So I slowly worked up through the layers. Maven 2 handled the dependencies and the m2eclipse plugin provided a helpful dependency view. BTW - it's usually not too difficult to convert an Eclipse project into a Maven project. m2eclipse can do it for you and you just have to create a few new folders (like src/main/java) and adjust the build path for source folders. Takes just a minute or two. But expect more difficulties if your project is an Eclipse plugin or RCP application and you want Maven not only to manage artifacts but also to build and deploy the application.
In my opinion, Eclipse, Maven and Nexus (or any other Maven repository manager) are a good basis to start. You're lucky if you have good documentation of the system architecture and this architecture is really implemented ;)

I had a similar experience in a small code base (40 kloc). There are no "rules":
I compiled with and without a "module" in order to see its usage
I started from "leaf modules", modules without other dependencies
I handled cyclic dependencies (this is a very error-prone task)
with Maven there is a great deal of documentation (reports) that can be deployed in your CI process
with Maven you can always see what uses what, both in the site and in NetBeans (with a very nice directed graph)
with Maven you can import library code in your codebase, apply source patches and compile with your products (sometimes this is very easy, sometimes it is very difficult)
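For the cyclic-dependency step in the list above, even a tiny hand-rolled check can help before a proper tool is in place. The following is only a sketch: it assumes you maintain (or extract) a map from module name to the modules it depends on, and the module names ui, services and common are invented for illustration. It does a depth-first search and reports one cycle if there is any.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class CycleFinder {
    private CycleFinder() {}

    /** Returns one dependency cycle as a list of module names, or an empty list if none exists. */
    static List<String> findCycle(Map<String, List<String>> deps) {
        Set<String> done = new HashSet<>();
        for (String start : deps.keySet()) {
            List<String> cycle = visit(start, deps, done, new ArrayList<>());
            if (!cycle.isEmpty()) {
                return cycle;
            }
        }
        return Collections.emptyList();
    }

    private static List<String> visit(String node, Map<String, List<String>> deps,
                                      Set<String> done, List<String> path) {
        if (done.contains(node)) {
            return Collections.emptyList(); // already fully explored, no cycle through here
        }
        int idx = path.indexOf(node);
        if (idx >= 0) {
            // Node is already on the current path: everything from idx onwards is a cycle.
            List<String> cycle = new ArrayList<>(path.subList(idx, path.size()));
            cycle.add(node);
            return cycle;
        }
        path.add(node);
        for (String dep : deps.getOrDefault(node, Collections.emptyList())) {
            List<String> cycle = visit(dep, deps, done, path);
            if (!cycle.isEmpty()) {
                return cycle;
            }
        }
        path.remove(path.size() - 1);
        done.add(node);
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        // Hypothetical modules: ui -> services -> common -> ui is a cycle.
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("ui", Arrays.asList("services"));
        deps.put("services", Arrays.asList("common"));
        deps.put("common", Arrays.asList("ui"));
        System.out.println(findCycle(deps));
    }
}
```

Modules that never show up in a reported cycle and have an empty dependency list are the "leaf modules" mentioned above, and are usually the safest ones to split out first.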
Check also Dependency Analyzer (screenshot source: javalobby.org) and the dependency graph in NetBeans (screenshot source: zimmer428.net).

Maven is painful to migrate to for an existing system. However it can cope with 100+ module projects without much difficulty.

The first thing you need to decide is what infrastructure you will move to. Should it be a lot of independently maintained modules (which translates to individual Eclipse projects), or will you consider it a single chunk of code which is versioned and deployed as a whole? The first is well suited for migrating to a Maven-like build environment, the latter for having all the source code checked out at once.
In any case you WILL need a continuous integration system running. Your first task is to make the code base build automatically, so you can let your CI system watch over your source repository and rebuild it when you change things. I decided on a non-Maven approach here, and we focus on having an easy Eclipse environment, so I created a build environment using ant4eclipse and Team ProjectSet files (which we use anyway).
The next step would be getting rid of the circular dependencies - this will make your build simpler, get rid of Eclipse warnings, and eventually allow you to get to the "checkout, compile once, run" stage. This might take a while :-( When you migrate methods and classes, do not MOVE them, but extract or delegate them, leave their old name lying around and mark it deprecated. This will separate your untangling from your refactoring, and allow code "outside" your project to still work with the code inside your project.
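A minimal sketch of what "delegate and deprecate instead of move" can look like. All class and method names here are hypothetical: LegacyStringUtil stands for the old location that outside code still calls, TextUtil for the new home in a cleaner module.

```java
import java.util.Locale;

// New home of the logic (hypothetical name), e.g. in a freshly extracted module.
final class TextUtil {
    private TextUtil() {}

    static String normalize(String input) {
        return input == null ? "" : input.trim().toLowerCase(Locale.ROOT);
    }
}

// Old location (hypothetical name). It stays in place so existing callers keep compiling.
final class LegacyStringUtil {
    private LegacyStringUtil() {}

    /**
     * Old entry point, now only a thin delegate.
     *
     * @deprecated use {@link TextUtil#normalize(String)} instead.
     */
    @Deprecated
    static String normalize(String input) {
        return TextUtil.normalize(input);
    }
}

class DelegationDemo {
    public static void main(String[] args) {
        // Old callers still work; the compiler flags them as using a deprecated API.
        System.out.println(LegacyStringUtil.normalize("  Hello "));
    }
}
```

Once the deprecation warnings for the old name have disappeared from all callers, the delegate can be deleted in a separate, low-risk step.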
You WILL benefit from a source repository which allows for moving files, and keeping history. CVS is very weak in this regard.

I wouldn't recommend Maven for a legacy source code base. It could give you many headaches just trying to adapt everything to work with it.
I suppose what you need is to do an architectural layout of your project. A tool might help, but the most important part is to organize a logical view of the modules.

It's not free but Structure101 will give you as good as you will get in terms of tool support for hitting all your bullet points. But for the record I'm biased, so you might want to check out SonarJ and Lattix too. ;-)

Related

Package vs project separation in java

First off, I'm coming (back) to Java from C#, so apologies if my terminology or philosophy doesn't quite line up.
Here's the background: we've got a growing collection of internal support tools written for the web. They use HTML5/AJAX/other buzzwords for the frontend and Java for the backend. These tools utilize a lightweight in-house framework so they can share an administrative interface for security and other configuration. Each tool has been written by a separate author and I expect that trend to continue, so I'd like to make it easy for future authors to stay "standardized" on the third-party libraries that we've already decided to use for things like DI, unit testing, ORM, etc.
Our package naming currently looks like this:
com.ourcompany.tools.framework
com.ourcompany.tools.apps.app1name
com.ourcompany.tools.apps.app2name
...and so on.
So here's my question: should each of these apps (and the framework) be treated as a separate project for purposes of Maven setup, Eclipse, etc?
We could have lots of apps appear here over time, so it seems like separation would keep dependencies cleaner and let someone jump in on a single tool more easily. On the other hand, (1) maybe "splitting" deeper portions of a package structure over multiple projects is a code smell and (2) keeping them combined would make tool writers more inclined to use third-party libraries already in place for the other tools.
FWIW, my initial instinct is to separate them.
What say you, Java gurus?
I would absolutely separate them. For the purposes of Maven, make sure each app/project has the appropriate dependencies to the framework/apps so you don't have to build everything when you just want to build a single app.
I keep my projects separated out, but use a parent pom for including all of the dependencies and other common properties. Individual tools / projects have a name and a reference to the parent project, and any project-specific dependencies, if any. This works for helping to keep to common libraries and dependencies, since the common ones are already all configured, but allows me to focus on the specific portion of the codebase that I need to work with.
I'd definitely separate these kinds of things out into separate projects.
You should use Maven to handle the dependencies / build process automatically (both for your own internal shared libraries and third party dependencies). There won't be any issue having multiple applications reference the same shared libraries - you can even keep multiple versions around if you need to.
Couple of bonuses from this approach:
This forces you to think carefully about your API design for the shared projects which will be a good thing in the long run.
It will probably also give you about the right granularity for source code control, i.e. your developers can check out and work on specific applications or backend modules individually.
If there is a section of a project that is likely to be used on more than one project it makes sense to pull that out. It will make it a little cleaner as well if you need to update the code in one of the commonly used projects.
If you keep them together you will have fewer obstacles developing, building and deploying your tools.
We had the opposite situation, having many separate projects. After merging them into one project tree we are much more productive and this is more important to us than whatever conventions happen to be trending.

Extending/Inheriting Tomcat Projects

We are developing webapps with Eclipse + Tomcat plugin. We recently started a new app which will run on Facebook and StudiVZ (FB competitor in Germany). Since the functionality of the app will be 95% the same we split the code into separate Eclipse projects (app-core, app-facebook, app-vz). The -core project is source-linked into the -facebook and -vz projects in Eclipse. We are also using Hudson for CI and made ant scripts that import the code from the -core project before building. So basically we tried to inherit on a project level.
The described method has some flaws:
Versioning is complicated
The -core project does not run standalone, which makes automatic testing partly impossible
We need to modify some models that the -core project's classes depend on
Other problems that make me think this is not the best solution
Does anyone have suggestions for a better solution?
There is a wealth of build tools available for Java that address dependency management and versioning specifically. Many of these integrate with Hudson and Eclipse.
I'd suggest looking at Maven and how it does dependency management as a good starting point. Even if you don't use Maven itself, many of the solutions out there build on Maven's dependency management mechanism. Something like Apache Ivy allows you to use Maven-style dependency management but still use your own custom Ant scripts, whereas something like Gradle is a wholesale replacement.
You should be able to split your project into 3 or more parts and then establish dependencies via Java Build Path. You need to clean up the dependencies between the projects. If you need to configure your core components depending on whether it is a -facebook or a -vz project, you might need to separate configuration, maybe even use Spring or a similar dependency injection framework.
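As a rough illustration of separating the platform-specific configuration from the core, here is a sketch using plain constructor injection (no Spring needed to get the idea). Everything in it is an assumption for the example: SocialPlatform, InviteService and the two implementations are invented names, and the URLs are placeholders, not real endpoints.

```java
// Lives in app-core: the core only knows this interface (hypothetical name).
interface SocialPlatform {
    String name();
    String profileUrl(String userId);
}

// Lives in app-core: shared logic, written once against the interface.
final class InviteService {
    private final SocialPlatform platform;

    InviteService(SocialPlatform platform) {
        this.platform = platform;
    }

    String inviteMessage(String userId) {
        return "Join us on " + platform.name() + ": " + platform.profileUrl(userId);
    }
}

// Lives in app-facebook. The URL is a placeholder.
final class FacebookPlatform implements SocialPlatform {
    @Override public String name() { return "Facebook"; }
    @Override public String profileUrl(String userId) { return "https://fb.example/u/" + userId; }
}

// Lives in app-vz. The URL is a placeholder.
final class VzPlatform implements SocialPlatform {
    @Override public String name() { return "StudiVZ"; }
    @Override public String profileUrl(String userId) { return "https://vz.example/u/" + userId; }
}

class PlatformDemo {
    public static void main(String[] args) {
        // Each webapp wires in its own platform; the core code is identical.
        System.out.println(new InviteService(new FacebookPlatform()).inviteMessage("42"));
        System.out.println(new InviteService(new VzPlatform()).inviteMessage("42"));
    }
}
```

With this shape the -core project can be built and tested on its own (for example against a stub SocialPlatform), and the -facebook and -vz projects depend on the core jar instead of source-linking it.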
When trying to introduce reuse into web-based Java projects, usually the problems arise in the UI code. Not many frameworks were built with this approach in mind.
I don't use/hate Eclipse[1], but can point to how we deal with a similar problem.
We use Maven with IntelliJ. In particular, both of these support modules which have defined internal dependencies. In your case it could be -fb and -vz modules depending on core, or you can split core into smaller parts (such as DAO, business logic, etc.).
When compiling, deliverables of "upper" modules would be used to build "lower" modules.
Let's go over points/flaws you have raised:
versioning is no longer a problem as everything sits under the same root of Subversion/GIT/VCS of your choice
Why is that a problem? Certainly this shouldn't be an issue for unit tests since, as I understand TDD, these should not require complex environments. For automated tests, you would have to test the core API (as this is the interface between core and everything else, right?), hence this shouldn't require any frontend stuff?
you need to explain your other points to tell why you don't like it
[1] It is against the Geneva Convention to ask a developer to use anything other than the IDE of his/her choice.

Best way to manage multi-module projects?

I have a medium-size project split into 3 modules: Core, plugins (in short, it's an interpretation layer), and implementation. There are a few global dependencies and module-specific dependencies. There is a custom Ant target for generating Javadoc excluding the implementation (for obvious reasons). This is stored in a public online SVN repository and therefore needs to be independent of any machine, sans the JRE.
Right now I'm using the built-in NetBeans project management, and it sucks, probably mainly due to the fact that the project management system was not designed for modules. Lack of a global library set (you can import a library specific to your NetBeans installation, but then it doesn't get updated), lack of automatic resolution of library dependencies (a dependency on a project should mean the project and its dependencies), lack of an independent multi-project formatting style (either tied to profile-specific "Global options" or individually set up and synced module-specific options), and other things make managing my project a pain.
When I was experimenting with IDEA, one of the things I loved was its project management. It was close to what I wanted, but like most things in IDEA it could have been simpler. However, the IDE itself was bad (not up for debate), so I switched back to NetBeans. And Maven looks bad, both from having to traverse its file structure manually and from general opinion.
Are there better options out there that can be stored in a standard SVN repository with limited tools to use, and are pretty easy to use for 1-3 developers and 2-5 modules? It must be able to handle Java, and (in a perfect world) integrate with NetBeans.
Honestly, Maven is your best bet. I wouldn't knock it if you haven't actually tried it yet. It tends to be a very divisive technology, but those who love it love it for a damn good reason. If you are someone who prefers to keep your hands off the build script/files after you initially set it up, and it looks like you are, given you were using NetBeans' built-in projects which generate an Ant build.xml behind the scenes, then you should just try Maven and see what happens.
I'm not sure why you think you need to "traverse the directory structure" with maven if you are in netbeans. See this screenshot for an example of what it looks like. You don't ever see src/main/java or target/ or anything on the file system (unless you need to).
(source: netbeans.org)
If you use a Maven multi-module project, you'll get the modularity you are looking for within NetBeans as well. If you want a sample, go check out an open source project that has tons of modules, load it in NetBeans and play around with it: http://camel.apache.org/source.html

Why maven? What are the benefits? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
What are the main benefits of using Maven compared to, let's say, Ant?
It seems to be more of an annoyance than a helpful tool.
I use maven 2, with plain Eclipse Java EE (no m2eclipse), and tomcat.
Supporters of maven believe that
Maven lets you get your package dependencies easily
Maven forces you to have a standard directory structure
In my experience
Figuring out package dependencies is really not that hard. You rarely do it anyway. Probably once during project setup and few more during upgrades. With maven you'll end up fixing mismatched dependencies, badly written poms, and doing package exclusions anyway.
Slow FIX-COMPILE-DEPLOY-DEBUG cycle, which kills productivity. This is my main gripe. You make a change, then you have to wait for the Maven build to kick in and wait for it to deploy. No hot deployment whatsoever.
Or am I just doing it wrong? Please point me in the right direction, I'm all ears.
Figuring out package dependencies is really not that hard. You rarely do it anyway. Probably once during project setup and few more during upgrades. With maven you'll end up fixing mismatched dependencies, badly written poms, and doing package exclusions anyway.
Not that hard... for toy projects. But the projects I work on have many, really many, of them, and I'm very glad to get them transitively and to have a standardized naming scheme for them. Managing all this by hand would be a nightmare.
And yes, sometimes you have to work on the convergence of dependencies. But think about it twice: this is not inherent to Maven, this is inherent to any system using dependencies (and I am talking about Java dependencies in general here).
So with Ant, you have to do the same work except that you have to do everything manually: grabbing some version of project A and its dependencies, grabbing some version of project B and its dependencies, figuring out yourself what exact versions they use, checking that they don't overlap, checking that they are not incompatible, etc. Welcome to hell.
On the other hand, Maven supports dependency management, will retrieve dependencies transitively for me, and gives me the tooling I need to manage the complexity inherent to dependency management: I can analyze a dependency tree, control the versions used in transitive dependencies, exclude some of them if required, control the convergence across modules, etc. There is no magic. But at least you have support.
And don't forget that dependency management is only a small part of what Maven offers, there is much more (not even mentioning the other tools that integrate nicely with Maven, e.g. Sonar).
Slow FIX-COMPILE-DEPLOY-DEBUG cycle, which kills productivity. This is my main gripe. You make a change, then you have to wait for the Maven build to kick in and wait for it to deploy. No hot deployment whatsoever.
First, why do you use Maven like this? I don't. I use my IDE to write tests, code until they pass, refactor, deploy, hot deploy, and run a local Maven build when I'm done, before committing, to make sure I will not break the continuous build.
Second, I'm not sure using Ant would make things much better. And in my experience, modular Maven builds using binary dependencies give me faster build times than typical monolithic Ant builds. Anyway, have a look at Maven Shell for a ready to (re)use Maven environment (which is awesome by the way).
So in the end, and I'm sorry to say so, it's not really Maven that is killing your productivity, it's you misusing your tools. And if you're not happy with it, well, what can I say, don't use it. Personally, I've been using Maven since 2003 and I have never looked back.
Maven can be considered a complete project development tool, not just a build tool like Ant.
You should use the Eclipse IDE with the Maven plugin to fix all your problems.
Here are a few advantages of Maven, quoted from the Benefits of using Maven page:
Henning
quick project setup, no complicated build.xml files, just a POM and go
all developers in a project use the same jar dependencies due to centralized POM.
getting a number of reports and metrics for a project "for free"
reduce the size of source distributions, because jars can be pulled from a central location
Emmanuel Venisse
a lot of goals are available so it isn't necessary to develop some specific build process part, contrary to ANT
we can reuse existing ANT tasks in build process with antrun plugin
Jesse Mcconnell
Promotes modular design of code. By making it simple to manage multiple projects it allows the design to be laid out into multiple logical parts, weaving these parts together through the use of dependency tracking in pom files.
Enforces modular design of code. It is easy to pay lip service to modular code, but when the code is in separate compiling projects it is impossible to cross-pollinate references between modules of code unless you specifically allow for it in your dependency management... there is no 'I'll just do this now and fix it later' implementations.
Dependency Management is clearly declared. With the dependency management mechanism you have to try to screw up your jar versioning... there is none of the classic problem of 'which version of this vendor jar is this?' And setting it up on an existing project rips the top off of the existing mess if it exists when you are forced to make 'unknown' versions in your repository to get things up and running... that or lie to yourself that you know the actual version of ABC.jar.
Strong typed life cycle. There is a strong defined lifecycle that a software system goes through from the initiation of a build to the end... and the users are allowed to mix and match their system to the lifecycle instead of cobbling together their own lifecycle. This has the additional benefit of allowing people to move from one project to another and speak using the same vocabulary in terms of software building.
Vincent Massol
Greater momentum: Ant is now legacy and not moving fast ahead. Maven is forging ahead fast and there's a potential of having lots of high-value tools around Maven (CI, Dashboard project, IDE integration, etc).
Figuring out dependencies for small projects is not hard. But once you start dealing with a dependency tree with hundreds of dependencies, things can easily get out of hand. (I'm speaking from experience here ...)
The other point is that if you use an IDE with incremental compilation and Maven support (like Eclipse + m2eclipse), then you should be able to set up edit/compile/hot deploy and test.
I personally don't do this because I've come to distrust this mode of development due to bad experiences in the past (pre Maven). Perhaps someone can comment on whether this actually works with Eclipse + m2eclipse.
Maven is one of the tools where you need to actually decide up front that you like it and want to use it, since you will spend quite some time learning it, and having made said decision once and for all will allow you to skip all kinds of doubt while learning (because you like it and want to use it)!
The strong conventions help in many places - like Hudson that can do wonders with Maven projects - but it may be hard to see initially.
edit: As of 2016 Maven is the only Java build tool where all three major IDEs can use the sources out of the box. In other words, using Maven makes your build IDE-agnostic. This allows for e.g. using NetBeans profiling even if you normally work in Eclipse.
Maven's advantages over Ant are quite a few. I'll try to summarize them here.
Convention over Configuration
Maven uses a distinctive approach for the project layout and startup, which makes it easy to just jump into a project. Usually it only takes the checkout and the Maven command to get the artifacts of the project.
Project Modularization
Project conventions suggest (or better, force) the developer to modularize the project. Instead of a monolithic project you are often forced to divide your project into smaller sub-components, which makes it easier to debug and manage the overall project structure.
Dependency Management and Project Lifecycle
Overall, with a good SCM configuration and an internal repository, the dependency management is quite easy, and you are again forced to think in terms of Project Lifecycle - component versions, release management and so on. A little more complex than with Ant, but again, an improvement in the quality of the project.
What is wrong with maven?
Maven is not easy. The build cycle (what gets done and when) is not so clear within the POM. Also, some issues arise with the quality of components and missing dependencies in public repositories.
The best approach (to me) is to have an internal repository for caching (and keeping) dependencies around, and to apply release management to components. For projects bigger than the sample projects in a book, you will thank Maven sooner or later.
Maven can provide benefits for your build process by employing standard conventions and practices to accelerate your development cycle while at the same time helping you achieve a higher rate of success. For a more detailed look at how Maven can help you with your development process please refer to The Benefits of Using Maven.
Maven is a powerful project management tool that is based on the POM (project object model). It is used for project builds, dependency management and documentation.
Like Ant, it simplifies the build process, but it is much more advanced than Ant.
Maven helps to manage builds, documentation, reporting, SCMs, releases and distribution.
A Maven repository is a directory of packaged JAR files, each with a pom.xml file. Maven searches for dependencies in the repositories.
I've never come across point 2. Can you explain why you think this affects deployment in any way? If anything, Maven allows you to structure your projects in a modularised way that actually allows hot fixes for bugs in a particular tier, and allows independent development of an API from the remainder of the project, for example.
It is possible that you are trying to cram everything into a single module, in which case the problem isn't really maven at all, but the way you are using it.
This should have been a comment, but it wasn't fitting in a comment length, so I posted it as an answer.
All the benefits mentioned in other answers are achievable by simpler means than using Maven. If, for example, you are new to a project, you'll spend more time creating the project architecture, joining components and coding anyway than downloading jars and copying them to the lib folder. If you are experienced in your domain, then you already know which libraries to start the project with. I don't see any benefit in using Maven, especially when it poses a lot of problems while automatically doing the "dependency management".
I only have intermediate-level knowledge of Maven, but I tell you, I have done large projects (like ERPs) without using Maven.

How many multiple "Eclipse Projects" is considered too excessive for one actual development project?

I'm currently working on a project that contains many different Eclipse projects referencing each other to make up one large project. Is there a point where a developer should ask themselves if they should rethink the way their development project is structured?
NOTE: My project currently contains 25+ different Eclipse projects.
My general rule of thumb is that I would create a new project for every reusable component. So for example, if I have some isolated functionality that can be packaged, say, as a jar, I would create a new project so I can build, package and distribute the component independently.
Also, if there are certain projects that you do not need to make frequent changes to, you can build them only when required and keep them "closed" in eclipse to save time on indexing, etc. Even if you think that a certain component is not reusable, as long as it is separated from the rest of the code base in terms of logic/concerns you may be well served by just separating it out. Sometimes seemingly specific code might be reusable in another project or in a future version of the same project.
When compiled, a project would typically result in a jar. So if your application consists of potentially reusable components, it is ok to use a project for each.
I'm a big fan of using a lot of projects, I feel that this "breaks down" large things beyond what I can do with packages, and helps me orient and navigate.
Of course, if you're developing Eclipse plug-ins, everything would be a project anyway.
The only thing I would watch out for has to do with your source control and its ability to handle moves of files between projects. Subclipse had been giving me trouble with it, or maybe it's my SVN server that did.
If your project has that many sub-projects, or modules, needed to actually compose your final artifact, then it is time to look at something like Maven and setting up a multi-module project. It will allow you to work on each module independently without IDE worries, and allow easy setup in your IDE (and others' IDEs) through the mvn eclipse:eclipse goal. In addition, when building your entire top-level project, Maven will be able to derive from the list of dependencies you have described which modules need to be built in what order.
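To show the idea behind deriving a build order from declared dependencies, here is a small sketch of a depth-first topological sort. This is not Maven's actual reactor code, just the general technique, and the module names (app, plugins, core) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class BuildOrder {
    private BuildOrder() {}

    /** Returns the modules ordered so that every module comes after all of its dependencies. */
    static List<String> order(Map<String, List<String>> deps) {
        List<String> result = new ArrayList<>();
        Set<String> built = new HashSet<>();
        Set<String> inProgress = new HashSet<>();
        for (String module : deps.keySet()) {
            visit(module, deps, built, inProgress, result);
        }
        return result;
    }

    private static void visit(String module, Map<String, List<String>> deps,
                              Set<String> built, Set<String> inProgress, List<String> result) {
        if (built.contains(module)) {
            return;
        }
        if (!inProgress.add(module)) {
            // We came back to a module we are still resolving: no valid order exists.
            throw new IllegalStateException("Circular dependency involving " + module);
        }
        for (String dep : deps.getOrDefault(module, Collections.emptyList())) {
            visit(dep, deps, built, inProgress, result);
        }
        inProgress.remove(module);
        built.add(module);
        result.add(module); // all dependencies are already in the list at this point
    }

    public static void main(String[] args) {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("app", Arrays.asList("plugins", "core"));
        deps.put("plugins", Arrays.asList("core"));
        deps.put("core", Collections.emptyList());
        System.out.println(order(deps)); // prints [core, plugins, app]
    }
}
```

Note that a circular dependency makes such an ordering impossible, which is one reason untangling cycles has to come before (or together with) a multi-module setup.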
Here's a quick link via google and a link to the book Maven: The Definitive Guide, which will explain things in much better detail in chapter 6 (once you have the basics).
This will also force your project to not be explicitly tied to Eclipse. Being able to build independently of an IDE means that any Joe Schmoe can come along and easily work with your code base using whatever tools he/she needs.
Create jars for the projects you don't work in often. That should greatly reduce the clutter. If you work on all the projects often, then you can add targets to your build that will jar up the respective projects for you, which condenses everything down to one file that you can then include on the class path.
An additional method is to create many different workspaces. The benefit of separate workspaces is that you can remove some of the visual clutter and performance overhead of having lots of projects. You can use targets to jar up all of your projects and put them in a repository so you can reference them in each workspace.
At a former job the entire application was more than 170 projects. While it was rarely necessary to have all projects checked out locally, even the 30-40 projects constantly in our scope made reindexing, etc. very slow.
Yeesh. One Project for each Project. If you are using reusable projects, make them into a library, for heaven's sake. Break the non-reusable projects into packages, that's what they are there for.
That's a hard question, and answers range from having a single Eclipse project to having one Eclipse project for every single class.
My bottom line:
You can have too few projects, and never too many (of course use automation, e.g. mvn eclipse:eclipse)
Use -Declipse.useProjectReferences=true/false when using Maven to switch workspace mode between jar and project dependencies
Use the mvn release plugin to generate consecutive releases (automatic version increase)
Multiple projects give you independent versioning, which is extremely important. E.g. one dev may work on a new version of a module while you still depend on the previous one, and at some point you decide to upgrade to the newer version (possibly by increasing its version in the pom.xml dependency section). Or, in another scenario, if one project contains a bug you downgrade to its previous version.
Multiple projects make you think about the architecture more than if you have just packages.
Multiple projects generally make architectural problems more evident than if you have just one project. Would anyone like to comment on this?
You never know if your project will evolve into OSGi/SOA/EDA, where you need separation.
Even if you're 100% sure that your projects will be deployed as one jar, the old way, in a single JVM, it still does not hurt (mvn assembly plugin) to have multiple Eclipse projects for logically independent pieces of code
BTW, the project I work on is divided into 24 eclipse projects.
Hell, we have more than 100. Projects don't cost anything.
