Maintaining Clean Architecture in Spring MVC with a data-centric approach
I'm trying to map out the architecture for the front-end of a new Java-based web app (portal type application) we are making at work. I want to get this right from day one, and I would like to kick off a discussion here to help me implement Uncle Bob's Clean Architecture in my architectural design.
Here's a quick run-down of our tech stack, top to bottom (the technology isn't important, the structure is):
Oracle Database
Oracle Service Bus exposing services using WSDLs
JAX-WS generated Java-classes from the WSDLs (let's call this the "generated service layer")
A Domain module consisting of POJOs mapped to the generated data objects
A Consumer-module exposing the "generated service layer" to the front-end application
A Spring MVC based front-end module using FreeMarker to render the views
A key point:
In particular, the name of something declared in an outer circle must not be mentioned by the code in an inner circle. That includes functions, classes, variables, or any other named software entity.
Attempting to adhere to Bob's Clean Architecture, I've gone back and forth a bit with myself regarding where to place the application logic, namely the "Use Case"-layer in his architecture.
Here is the approach I've come up with:
Layer 1 - Entities
Entities encapsulate Enterprise wide business rules.
This is where our Domain module containing the domain objects lives; these are self-contained objects with minimal dependencies on each other. Only logic pertaining to the objects themselves may live on these domain objects, and no use-case-specific logic.
Access to our database is exposed via WSDLs using a service bus that transforms the data, as opposed to an ORM like JPA or Hibernate. Because of this, we do not have "entities" in the traditional sense (with Ids), but a data-centric approach making this layer a data access layer, presented to the rest of the application by the Consumer-module.
Layer 2 - Use Cases
The software in this layer contains application specific business rules.
This is where logic specific to our application's use cases lives. Changes to this layer should not affect the data access layer (layer 1). Changes to the GUI or framework implementation (Spring MVC) should not affect this layer.
This is where it gets a little tricky:
Since our data access layer (in layer 1) must be kept clean of application logic, we need a layer that facilitates use of that layer in a fashion that suits the use cases. One solution I've found to this problem is to use a variant of the "MVVM-pattern" that I choose to call MVC-VM. See below for an explanation. The "VM"-part of this lives in this Use Case-layer, represented by *ViewModel-classes that encapsulate the Use Case-specific logic.
Layer 3 - Interface Adapters
The software in this layer is a set of adapters that convert data from the format most convenient for the use cases and entities, to the format most convenient for some external agency such as the Database or the Web.
This is where the MVC architecture of our GUI lives (the "MVC" in our "MVC-VM"). Essentially this is where the Controller classes get data from the *ViewModel classes and put it in Spring MVC's ModelMap objects that are used directly by the FreeMarker templates in the View.
The way I see it, the service bus would in our case also fall under this layer.
Layer 4 - Frameworks and Drivers
Generally you don’t write much code in this layer other than glue code that communicates to the next circle inwards.
This layer is really just a configuration-layer in our application, namely the Spring configuration. This would for example be where we specify that FreeMarker is used to render the view.
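For illustration, here is a minimal sketch of the kind of glue that might live in this layer, using Spring's Java-based configuration (XML configuration works equally well; the template path and suffix are assumptions):

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.servlet.config.annotation.EnableWebMvc;
import org.springframework.web.servlet.view.freemarker.FreeMarkerConfigurer;
import org.springframework.web.servlet.view.freemarker.FreeMarkerViewResolver;

// Frameworks and Drivers layer: pure wiring, no business logic.
@Configuration
@EnableWebMvc
public class WebConfig {

    @Bean
    public FreeMarkerConfigurer freeMarkerConfigurer() {
        FreeMarkerConfigurer configurer = new FreeMarkerConfigurer();
        configurer.setTemplateLoaderPath("/WEB-INF/views/");
        return configurer;
    }

    @Bean
    public FreeMarkerViewResolver viewResolver() {
        FreeMarkerViewResolver resolver = new FreeMarkerViewResolver();
        resolver.setSuffix(".ftl");
        return resolver;
    }
}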
Model View ViewModel Pattern
MVVM facilitates a clear separation of the development of the graphical user interface (either as markup language or GUI code) from the development of the business logic or back end logic known as the model (also known as the data model to distinguish it from the view model). The view model of MVVM is a value converter meaning that the view model is responsible for exposing the data objects from the model in such a way that those objects are easily managed and consumed.
More on the MVVM-pattern at Wikipedia.
The MVC-VM roles would be fulfilled in our application like so:
Model - represented simply by the ModelMap datastructure in Spring MVC that is used by the view templates.
View - FreeMarker templates
Controller - Spring's Controller classes that direct HTTP URL requests to specific handlers (and as such function as a FrontController). The handlers in these classes are responsible for fetching data from the use case-layer and pushing it out to the view templates when showing data (HTTP GET), as well as sending data down for storing (HTTP POST). This way it essentially functions as a binder between the ViewModel and the View, using the Model.
ViewModel - These classes are responsible for 1) structuring data from the data access layer in a fashion that is usable by the View, and 2) treating data input from the View. "Treating" means validating and breaking the data down so that it can be sent down the stack for storing. This layer takes the form of <UseCase>VM classes in a viewmodel package in our Spring MVC front-end module.
A key component here is the implicit binding that happens in Spring MVC between ModelMap and the FreeMarker-templates. The templates only use the model (ModelMaps), where the controller has put the data in a format it can use. That way we can make templates like so:
<body>
<h1>Welcome ${user}!</h1>
<p>Our latest product:
${latestProduct.name}!
</body>
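For context, here is a sketch of the controller side of that binding. WelcomeVM and its methods are hypothetical stand-ins for the Use Case-layer classes described above:

import org.springframework.stereotype.Controller;
import org.springframework.ui.ModelMap;
import org.springframework.web.bind.annotation.RequestMapping;

@Controller
public class WelcomeController {

    private final WelcomeVM welcomeVM; // injected Use Case-layer ViewModel (hypothetical)

    public WelcomeController(WelcomeVM welcomeVM) {
        this.welcomeVM = welcomeVM;
    }

    @RequestMapping("/welcome")
    public String welcome(ModelMap model) {
        // The controller only moves data from the ViewModel into the Model.
        model.addAttribute("user", welcomeVM.currentUserName());
        model.addAttribute("latestProduct", welcomeVM.latestProduct());
        return "welcome"; // resolved to the welcome.ftl template
    }
}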
I apologize for the verbose explanation, but I could not explain this (relatively simple) architecture in fewer words.
I would greatly appreciate some input on my approach here - am I on the right track? Does the MVC-VM thing make sense? Am I violating any Clean Architecture Principles?
There are of course many solutions to this, but I am trying to find a solution that is 1) not over-engineered and 2) adheres to the principles of Bob's Clean Architecture.
Update:
I think the key issue that puts me off here is what form the "Use case" layer takes in this application. Remember we have an MVC front-end that gets data from a data access layer. If the MVC part fits in Bob's "Interfaces adapters", and the domain models of the data layer fit in Bob's "Entities" layer, then what do I call the use case classes that implement application logic? I am tempted to just call them <UseCase>Models and put them in the MVC project, but according to Bob
The models are likely just data structures that are passed from the controllers to the use cases, and then back from the use cases to the presenters and views.
so that means my model objects should be "dumb" (like a simple Map - ModelMap in Spring) and it is then the responsibility of the controller to put data from the Use Case class into this Map structure.
So again, what form does my Use Case-classes take? How about <UseCase>Interactor?
But in conclusion I realize that the MVC-VM thing is over-engineering (or simply incorrect) - as "mikalai" indicates below, this is essentially just a two-layer application in its current form: a data access layer and a front-end MVC layer. Simple as that.
Whoa that was a lot. And I think you have mostly translated Uncle Bob's jargon over to your Spring Java app.
Since architecture is mostly opinion and since your question is sort of asking for one...
There are many different styles of architecture and ... most are overrated. Because most are the same thing: higher cohesion and looser coupling through indirection and abstraction.
What matters MOST (IMHO) are the dependencies. Making lots of small projects as opposed to one giant monolithic project is the best way to get "clean" architecture.
Your most important technology for clean architecture will not be "Spring MVC" technology or "Freemarker" templating language, or another Dr. Dobb's article with diagrams of boxes, hexagons and various other abstract polygons.
Focus on your build and dependency management technology. It is because this technology will enforce your architecture rules.
Also if your code is hard to test.. you probably have bad architecture.
Focus on making your code easy to test and write lots of tests.
If you do that it will be easy to change your code with out worry ... so much you could even change your architecture :)
Beware of focusing too much on bull#%$## architecture rules. Seriously: if your code is easy to test, easy to change, easy to understand, and performs well, then you have a good architecture. There is no "6 weeks to 6-pack abs" article for this (sorry, Uncle Bob). It takes experience and time... there is no magic bullet plan.
So here are my own "clean" architecture... I mean guidelines:
Make many small projects
Use dependency management (e.g. Maven, Gradle)
Refactor constantly
Understand and use some sort of dependency injection (Spring)
Write unit tests
Understand cross-cutting concerns (e.g. when you need AspectJ, metaprogramming, etc.)
My solution
So it turns out that implementing Bob's "Clean Architecture" in Java/Spring MVC is borderline non-trivial and requires more architectural facets than I originally had included.
And I could not actually find any example implementations online.
Evidently my architecture was missing a separate module for the "Use Case"-layer as this logic should not live in the Spring MVC Web module (and not be called "*ViewModel"). The Web/MVC module is simply a detail of the application, and the application logic should be completely separated from it, and separately testable.
This new "Use Case"-module now contains *Interactor-classes which get data from the domain module (entities). Moreover, "Request/Response Objects" are needed to facilitate the communication between the MVC/Web-module and the Use Case module.
My dependency chain now looks like this:
Spring MVC module -> Use Case module -> Domain module
where every arrow (dependency) takes form as a Boundary, meaning an interface is defined in the module to the right of the arrow that is implemented where required and injected where needed (Inversion of Control).
Here are the interfaces I ended up with (per Use Case):
I<UseCase>Request - implemented in the MVC module, instantiated in the Controller
I<UseCase>Response - implemented in the Use Case module, instantiated in the Interactor
I<UseCase>Interactor - implemented in the UseCase module, injected in the Controller
I<UseCase>Consumer - implemented in the Domain module, injected in the Interactor
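As a sketch, for a hypothetical "show orders" use case these boundaries might look as follows (all names are illustrative; each interface goes in its own file, Order is the domain object and OrderLine the GUI-friendly data structure):

import java.util.List;

// Defined in the Use Case module:
public interface IShowOrdersRequest {
    String getCustomerId();
}

public interface IShowOrdersResponse {
    List<OrderLine> getOrderLines(); // simple, GUI-friendly data
}

public interface IShowOrdersInteractor {
    IShowOrdersResponse execute(IShowOrdersRequest request);
}

// Defined and implemented in the Domain module, injected into the Interactor:
public interface IShowOrdersConsumer {
    List<Order> fetchOrders(String customerId);
}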
How it works?
The Controller takes parameters from the HTTP request and packs them into a RequestModel which it passes down to the Interactor. The Interactor fetches the data it needs from the domain module *Consumer and imposes its application-specific logic on it, then puts it in a ResponseModel and sends it back up to the Controller. The Controller then finally simply puts this (now GUI-friendly) simple data in a Map object and forwards it to the FreeMarker template, which uses this data directly and renders the HTML.
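In code, that round trip might look roughly like this, continuing the hypothetical names above (ShowOrdersRequest and ShowOrdersResponse are the plain implementations of the corresponding interfaces; imports as in the earlier sketches, plus java.util.List):

// MVC module - the Controller packs HTTP parameters into a RequestModel:
@Controller
public class OrderController {

    private final IShowOrdersInteractor showOrdersInteractor; // injected, implemented in the Use Case module

    public OrderController(IShowOrdersInteractor showOrdersInteractor) {
        this.showOrdersInteractor = showOrdersInteractor;
    }

    @RequestMapping("/orders")
    public String showOrders(@RequestParam("customerId") String customerId, ModelMap model) {
        IShowOrdersResponse response =
                showOrdersInteractor.execute(new ShowOrdersRequest(customerId));
        model.addAttribute("orderLines", response.getOrderLines());
        return "orders"; // rendered by the orders.ftl template
    }
}

// Use Case module - the Interactor applies the application-specific logic:
public class ShowOrdersInteractor implements IShowOrdersInteractor {

    private final IShowOrdersConsumer consumer; // injected, implemented in the Domain module

    public ShowOrdersInteractor(IShowOrdersConsumer consumer) {
        this.consumer = consumer;
    }

    @Override
    public IShowOrdersResponse execute(IShowOrdersRequest request) {
        List<Order> orders = consumer.fetchOrders(request.getCustomerId());
        // ...apply use-case-specific rules here, then map to a ResponseModel...
        return new ShowOrdersResponse(orders);
    }
}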
A Presenter could get involved in that last part there, making this an implementation of the Model-View-Presenter pattern, but I'm leaving that for now.
My conclusion
I ended up with more files than what is necessary this early in development, strictly speaking. However, as the complexity and size of the application grow, I am confident that this structure will make it easy for us to maintain low coupling and high cohesion. Also, the Web-module is now easily replaceable - it just delivers requests to the use-case module and receives response-objects. Moreover, each layer of the application (domain logic, application logic, and GUI logic) is separately testable, with only the View-part requiring a webserver in order to be tested.
Thanks for all advice and pointers I received here. And please comment on my solution - I don't claim that it is perfect.
Because of this, we do not have "entities" in the traditional sense (with Ids), but a data-centric approach making this layer a data access layer, presented to the rest of the application by the Consumer-module.
Something seems odd to me in that part. Why couldn't your entities have IDs even if you get them from web services?
In the Clean Architecture approach, the Entities layer is precisely not a data access layer. Data access should be a detail in your architecture, not a central concern. As you said yourself, Entities contain domain-specific business rules. Business rules, or behavior, are very different from the way you fetch your data.
Entities is where all the domain logic happens, not where you get your data from. According to Clean Architecture, you get your persisted or external data from Gateways.
One solution I've found to this problem is to use a variant of the "MVVM-pattern" that I choose to call MVC-VM. See below for an explanation. The "VM"-part of this lives in this Use Case-layer, represented by *ViewModel-classes that encapsulate the Use Case-specific logic.
ViewModel clearly refers to a View, which is a presentation artifact - another detail. Use cases/Interactors should be devoid of such details. Instead, Interactors should send and receive delivery-mechanism-agnostic data structures (RequestModels and ResponseModels) through Boundaries.
I understand that this is a custom pattern of yours and doesn't involve a reference to a presentation framework, but the word "View" is just misleading.
I've been asked to make an ETL-style application that transfers information from one data source to another. At the moment, I've decided to use a three-layer architecture but I would like to find out more about the best practices as well as the life cycle described on this wikipedia page:
http://en.wikipedia.org/wiki/Extract,_transform,_load
Four-layered approach for ETL architecture design
Functional layer: Core functional ETL processing (extract, transform, and load).
Operational management layer: Job-stream definition and management, parameters, scheduling, monitoring, communication and alerting.
Audit, balance and control (ABC) layer: Job-execution statistics, balancing and controls, rejects- and error-handling, codes management.
Utility layer: Common components supporting all other layers.
Real-life ETL cycle
The typical real-life ETL cycle consists of the following execution steps:
Cycle initiation
Build reference data
Extract (from sources)
Validate
Transform (clean, apply business rules, check for data integrity, create aggregates or disaggregates)
Stage (load into staging tables, if used)
Audit reports (for example, on compliance with business rules. Also, in case of failure, helps to diagnose/repair)
Publish (to target tables)
Archive
Clean up
I don't know what your situation is or what your requirements are, but you're likely overthinking the problem.
The name alone is "the" architecture:
Extract
Transform
Load
Exporting a DB table to a CSV can be considered "ET" while loading the CSV is the "L". Most ETL problems are simply not complicated.
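To make that concrete, here is a minimal sketch of the "ET" half with plain JDBC (the connection string, query, and file name are placeholders; the "L" half would be the mirror image, reading the file and batch-inserting into the target):

import java.io.PrintWriter;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class TableToCsv {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                     "jdbc:oracle:thin:@//host:1521/service", "user", "password");
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("SELECT id, name, price FROM products");
             PrintWriter out = new PrintWriter("products.csv")) {
            while (rs.next()) {
                // The "transform" here is mere formatting; real business rules
                // (and proper CSV escaping) would go inside this loop.
                out.printf("%d,%s,%.2f%n",
                        rs.getLong("id"), rs.getString("name"), rs.getDouble("price"));
            }
        }
    }
}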
Beyond that, you should grab any of the 1 or 2 million ETL and ESB packages already available in Java, free and commercial, libraries and full boat processing systems, and simply adopt one of them that you like best.
Get a white board, string some bubbles together with lines and turn that in to code.
To answer the question, "What's the best practice?" the answer depends on what you are trying to accomplish.
To simplify let's assume you are doing one of the following:
You are building a data warehouse that will restructure the data in some way
You are moving data from point A to point B, but you are not restructuring the data
When I use the word "restructuring", I mean changing the grain or lowest level of detail of a table.
For 1, the ten steps outlined in your question are generally followed. General best practices:
As much transformation logic as possible is pushed onto database resources, not ETL software (ETL software is generally slower)
Validate, Transform, and Audit steps are used to employ whatever Master Data Management (MDM) standards your organization uses
For 2, this is much more straightforward, so either method outlined in your question can be used.
I'm wondering if anyone has any experience in "isolating" framework objects from each other (Spring, Hibernate, Struts). I'm beginning to see design "problems" where an object from one framework gets used in another object from a different framework. My fear is we're creating tightly coupled objects.
For instance, I have an application where we have a DynaActionForm with several attributes...one of which is a POJO generated by the Hibernate Tools. This POJO gets used everywhere...the JSP populates data to it, the Struts Action sends it down to a Service Layer, the DAO will persist it...ack!
Now, imagine that someone decides to do a little refactoring on that POJO...so that means the JSP, Action, Service, DAO all need to be updated...which is kind of painful...There has got to be a better way?!
There's a book called Core J2EE Patterns: Best Practices and Design Strategies (2nd Edition)...is this worth a look? I don't believe it touches on any specific frameworks, but it looks like it might give some insight on how to properly layer the application...
Thanks!
For instance, I have an application where we have a DynaActionForm with several attributes...one of which is a POJO generated by the Hibernate Tools. This POJO gets used everywhere...the JSP populates data to it, the Struts Action sends it down to a Service Layer, the DAO will persist it...ack!
To me, there is nothing wrong with having Domain Objects as a "transversal" layer in a web application (after all, you want their state to go from the database to the UI, and I don't see the need to map them into intermediate structures).
Now, imagine that someone decides to do a little refactoring on that POJO...so that means the JSP, Action, Service, DAO all need to be updated...which is kind of painful...There has got to be a better way?!
Sure, you could read "Beans" from the database at the DAO layer level, map them into "Domain Objects" at the service layer and map the Domain Objects into "Value Objects" for the presentation layer and you would have very low coupling. But then you'll realize that:
Adding a column in a database usually means adding some information on the view and vice-versa.
Duplication of objects and mappings are extremely painful to do and to maintain.
And you'll forget this idea.
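For the record, the fully decoupled version described above would look something like this (names invented); note that every new database column touches all three classes and both mappings:

// DAO layer: persistence "Bean"
class UserBean {
    String email;
}

// Service layer: "Domain Object", duplicating the field
class User {
    final String email;
    User(UserBean bean) { this.email = bean.email; } // mapping #1
}

// Presentation layer: "Value Object", duplicating it again
class UserVO {
    final String email;
    UserVO(User user) { this.email = user.email; } // mapping #2
}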
There's a book called Core J2EE Patterns: Best Practices and Design Strategies (2nd Edition)...is this worth a look? I don't believe it touches on any specific frameworks, but it looks like it might give some insight on how to properly layer the application...
This book was a "showcase" of how to implement (over-engineered) applications using the whole J2EE stack (with EJB 2.x) and has somehow always been considered too complicated (too many patterns). On top of that, it is today clearly outdated. So it is interesting, but it must be taken with a giant grain of salt.
In other words, I wouldn't recommend that book (at least certainly not as state of the art). Instead, have a look at Real World Java EE Patterns - Rethinking Best Practices (see Chapter 3 - Mapping of the Core J2EE patterns into Java EE) and/or the Spring literature if you are not using Java EE.
First, avoid Struts 1. Having to extend a framework class (like DynaActionForm) is one of the reasons this framework is no longer a good choice.
You don't use Spring classes in the usual scenarios. Spring is non-invasive - it just wires up your objects. You depend on it only if you use some of its interfaces like ApplicationContextAware, or if you are using the Hibernate or JDBC extensions. Using these extensions together with Hibernate/JDBC is completely fine and is not an undesired coupling.
Update: If you are forced to work with Struts 1 (honestly, try negotiating for Struts 2 - Struts 1 is obsolete!), the usual way to go was to create a copy of the Form class that contained the exact same fields but did not extend the framework class. There would be a factory method that takes the form class and returns the simple POJO. This is duplication of code, but I've seen it in practice and it is not that bad (compared to the use of Struts 1 :) )
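A sketch of that workaround, with assumed class and field names (each class in its own file):

import org.apache.struts.action.ActionForm;

// Framework-bound form - the only class that knows about Struts:
public class CustomerForm extends ActionForm {
    private String name;
    private String email;
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public String getEmail() { return email; }
    public void setEmail(String email) { this.email = email; }
}

// Plain POJO with the exact same fields, free of any framework dependency:
public class Customer {
    private final String name;
    private final String email;

    public Customer(String name, String email) {
        this.name = name;
        this.email = email;
    }

    // The factory method that strips the framework coupling at the web boundary:
    public static Customer fromForm(CustomerForm form) {
        return new Customer(form.getName(), form.getEmail());
    }

    public String getName() { return name; }
    public String getEmail() { return email; }
}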
I think your problem is not so big as it seems.
Let's imagine, what can you really change in your POJO:
1) name of its class: any IDE with refactoring support will automatically make all necessary changes for you
2) add some field/method: this almost always means adding new functionality, which should always be done manually and carefully. It usually causes some changes in your service layer, very seldom in the DAO, and usually in your view (JSP).
3) change a method's implementation: with good design this should not cause any changes in other classes.
That's all, imho.
Decide on a technology for implementing the business logic (EJB or Spring) and use its dependency injection facilities. Using DI will make the different parts of your program communicate with each other through interfaces. That should be enough to reach the necessary (low enough) level of coupling.
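A minimal sketch of what that looks like with Spring (names are illustrative; each public type goes in its own file):

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Controller;
import org.springframework.stereotype.Service;

// The rest of the program depends only on this interface:
public interface OrderService {
    void placeOrder(String productId);
}

@Service
class DefaultOrderService implements OrderService {
    public void placeOrder(String productId) {
        // business logic lives here
    }
}

@Controller
class CheckoutController {
    private final OrderService orders; // coupled to the interface, not the implementation

    @Autowired
    CheckoutController(OrderService orders) {
        this.orders = orders;
    }
}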
It's always nice to keep things clear if you can and separate the layers etc. But don't go overboard. I've seen systems where the developers were so intent on strictly adhering to their adopted patterns and practices that they ended up with a system worse than the imaginary one they were trying to avoid.
The art of good design is understanding the good practices and patterns, knowing when and how to apply them, but also knowing when it's appropriate to break or ignore them.
So take a good look at how you can achieve what you are after, read up on the patterns. Then do a trial on a separate proof of concept or a small part of your system to see your ideas in practice. My experience is that only once you actually put some code in place, do you really see the pros and cons of the idea. Once you have done that, you will be able to make an informed decision about what you will or will not introduce.
Finally, it's possible to build a system which does handle all the issues you are concerned about, but be pragmatic - is each goal you are attempting to reach worth the extra code and APIs you will have to introduce to reach it?
I'd say that Core J2EE Patterns: Best Practices and Design Strategies (2nd Edition) addresses EJB 2.0 concerns, some of which would be considered anti-patterns today. Knowledge is never wasted, but I wouldn't make this my first choice.
The problem is that it's impossible to decouple all the layers. Refactoring the POJO means modifying the problem you're solving, so all the layers DO have to be modified. There's no way around that.
Pure decoupling of layers that have no knowledge of each other requires a lot of duplication, translation, and mapping to occur. Don't fall for the idea that loose coupling means this work goes away.
One thing you can do is have a service layer that's expressed in terms of XML requests and responses. It forces you to map the XML to objects on the service side, but it does decouple the UI from the rest.
I thought about this a while ago and it recently resurfaced as my shop is doing its first real Java web app.
As an intro, I see two main package naming strategies. (To be clear, I'm not referring to the whole 'domain.company.project' part of this, I'm talking about the package convention beneath that.) Anyway, the package naming conventions that I see are as follows:
Functional: Naming your packages according to their function architecturally rather than their identity according to the business domain. Another term for this might be naming according to 'layer'. So, you'd have a *.ui package and a *.domain package and a *.orm package. Your packages are horizontal slices rather than vertical.
This is much more common than logical naming. In fact, I don't believe I've ever seen or heard of a project that names its packages logically. This of course makes me leery (sort of like thinking that you've come up with a solution to an NP problem) as I'm not terribly smart and I assume everyone must have great reasons for doing it the way they do. On the other hand, I'm not opposed to people just missing the elephant in the room, and I've never heard an actual argument for naming packages functionally. It just seems to be the de facto standard.
Logical: Naming your packages according to their business domain identity and putting every class that has to do with that vertical slice of functionality into that package.
I have never seen or heard of this, as I mentioned before, but it makes a ton of sense to me.
I tend to approach systems vertically rather than horizontally. I want to go in and develop the Order Processing system, not the data access layer. Obviously, there's a good chance that I'll touch the data access layer in the development of that system, but the point is that I don't think of it that way. What this means, of course, is that when I receive a change order or want to implement some new feature, it'd be nice to not have to go fishing around in a bunch of packages in order to find all the related classes. Instead, I just look in the X package because what I'm doing has to do with X.
From a development standpoint, I see it as a major win to have your packages document your business domain rather than your architecture. I feel like the domain is almost always the part of the system that's harder to grok, whereas the system's architecture, especially at this point, is almost becoming mundane in its implementation. The fact that I can come to a system with this type of naming convention and instantly know from the naming of the packages that it deals with orders, customers, enterprises, products, etc. seems pretty darn handy.
It seems like this would allow you to take much better advantage of Java's access modifiers. This allows you to much more cleanly define interfaces into subsystems rather than into layers of the system. So if you have an orders subsystem that you want to be transparently persistent, you could in theory just never let anything else know that it's persistent by not having to create public interfaces to its persistence classes in the dao layer and instead packaging the dao class in with only the classes it deals with. Obviously, if you wanted to expose this functionality, you could provide an interface for it or make it public. It just seems like you lose a lot of this by having a vertical slice of your system's features split across multiple packages.
I suppose one disadvantage that I can see is that it does make ripping out layers a little bit more difficult. Instead of just deleting or renaming a package and then dropping a new one in place with an alternate technology, you have to go in and change all of the classes in all of the packages. However, I don't see this as a big deal. It may be from a lack of experience, but I have to imagine that the number of times you swap out technologies pales in comparison to the number of times you go in and edit vertical feature slices within your system.
So I guess the question then would go out to you, how do you name your packages and why? Please understand that I don't necessarily think that I've stumbled onto the golden goose or something here. I'm pretty new to all this with mostly academic experience. However, I can't spot the holes in my reasoning so I'm hoping you all can so that I can move on.
For package design, I first divide by layer, then by some other functionality.
There are some additional rules:
layers are stacked from most general (bottom) to most specific (top)
each layer has a public interface (abstraction)
a layer can only depend on the public interface of another layer (encapsulation)
a layer can only depend on more general layers (dependencies from top to bottom)
a layer preferably depends on the layer directly below it
So, for a web application for example, you could have the following layers in your application tier (from top to bottom):
presentation layer: generates the UI that will be shown in the client tier
application layer: contains logic that is specific to an application, stateful
service layer: groups functionality by domain, stateless
integration layer: provides access to the backend tier (db, jms, email, ...)
For the resulting package layout, these are some additional rules:
the root of every package name is <prefix.company>.<appname>.<layer>
the interface of a layer is further split up by functionality: <root>.<logic>
the private implementation of a layer is placed in an internal package: <root>.internal
Here is an example layout.
The presentation layer is divided by view technology, and optionally by (groups of) applications.
com.company.appname.presentation.internal
com.company.appname.presentation.springmvc.product
com.company.appname.presentation.servlet
...
The application layer is divided into use cases.
com.company.appname.application.lookupproduct
com.company.appname.application.internal.lookupproduct
com.company.appname.application.editclient
com.company.appname.application.internal.editclient
...
The service layer is divided into business domains, influenced by the domain logic in a backend tier.
com.company.appname.service.clientservice
com.company.appname.service.internal.jmsclientservice
com.company.appname.service.internal.xmlclientservice
com.company.appname.service.productservice
...
The integration layer is divided into 'technologies' and access objects.
com.company.appname.integration.jmsgateway
com.company.appname.integration.internal.mqjmsgateway
com.company.appname.integration.productdao
com.company.appname.integration.internal.dbproductdao
com.company.appname.integration.internal.mockproductdao
...
Advantages of separating packages like this are that it is easier to manage complexity, and that it increases testability and reusability. While it seems like a lot of overhead, in my experience it actually comes very naturally, and everyone working on this structure (or something similar) picks it up in a matter of days.
Why do I think the vertical approach is not so good?
In the layered model, several different high-level modules can use the same lower-level module. For example: you can build multiple views for the same application, multiple applications can use the same service, multiple services can use the same gateway. The trick here is that when moving through the layers, the level of functionality changes. Modules in more specific layers don't map 1-1 on modules from the more general layer, because the levels of functionality they express don't map 1-1.
When you use the vertical approach for package design, i.e. you divide by functionality first, you force all building blocks with different levels of functionality into the same 'functionality jacket'. You might end up designing your general modules to fit the more specific ones. But this violates the important principle that the more general layer should not know about more specific layers. The service layer, for example, shouldn't be modeled after concepts from the application layer.
I find myself sticking with Uncle Bob's package design principles. In short, classes which are to be reused together and changed together (for the same reason, e.g. a dependency change or a framework change) should be put in the same package. IMO, the functional breakdown would have better chance of achieving these goals than the vertical/business-specific break-down in most applications.
For example, a horizontal slice of domain objects can be reused by different kinds of front-ends or even applications and a horizontal slice of the web front-end is likely to change together when the underlying web framework needs to be changed. On the other hand, it's easy to imagine the ripple effect of these changes across many packages if classes across different functional areas are grouped in those packages.
Obviously, not all kinds of software are the same and the vertical breakdown may make sense (in terms of achieving the goals of reusability and closeability-to-change) in certain projects.
There are usually both levels of division present. From the top, there are deployment units. These are named 'logically' (in your terms, think Eclipse features). Inside deployment unit, you have functional division of packages (think Eclipse plugins).
For example, the feature is com.feature, and it consists of the com.feature.client, com.feature.core, and com.feature.ui plugins. Inside plugins, I have very little division into further packages, although that's not unusual either.
Update: By the way, there is a great talk by Juergen Hoeller about code organization at InfoQ: http://www.infoq.com/presentations/code-organization-large-projects. Juergen is one of the architects of Spring, and knows a lot about this stuff.
Most java projects I've worked on slice the java packages functionally first, then logically.
Usually parts are sufficiently large that they're broken up into separate build artifacts, where you might put core functionality into one jar, apis into another, web frontend stuff into a warfile, etc.
Both functional (architectural) and logical (feature) approaches to packaging have a place. Many example applications (those found in text books etc.) follow the functional approach of placing presentation, business services, data mapping, and other architectural layers into separate packages. In example applications, each package often has only a few or just one class.
This initial approach is fine, since a contrived example often serves to 1) conceptually map out the architecture of the framework being presented, and 2) do so with a single logical purpose (e.g. add/remove/update/delete pets from a clinic). The problem is that many readers take this as a standard that has no bounds.
As a "business" application expands to include more and more features, following the functional approach becomes a burden. Although I know where to look for types based on architecture layer (e.g. web controllers under a "web" or "ui" package, etc.), developing a single logical feature begins to require jumping back and forth between many packages. This is cumbersome, at the very least, but its worse than that.
Since logically related types are not packaged together, the API is overly publicized; the interaction between logically related types is forced to be 'public' so that types can import and interact with each other (the ability to minimize to default/package visibility is lost).
If I am building a framework library, by all means my packages will follow a functional/architectural packaging approach. My API consumers might even appreciate that their import statements contain intuitive package names reflecting the architecture.
Conversely, when building a business application I will package by feature. I have no problem placing Widget, WidgetService, and WidgetController all in the same "com.myorg.widget." package and then taking advantage of default visibility (and having fewer import statements as well as inter-package dependencies).
There are, however, cross-over cases. If my WidgetService is used by many logical domains (features), I might create a "com.myorg.common.service." package. There is also a good chance that I create classes with intention to be re-usable across features and end up with packages such as "com.myorg.common.ui.helpers." and "com.myorg.common.util.". I may even end up moving all these later "common" classes in a separate project and include them in my business application as a myorg-commons.jar dependency.
Packages are to be compiled and distributed as a unit. When considering what classes belong in a package, one of the key criteria is its dependencies. What other packages (including third-party libraries) does this class depend on. A well-organized system will cluster classes with similar dependencies in a package. This limits the impact of a change in one library, since only a few well-defined packages will depend on it.
It sounds like your logical, vertical system might tend to "smear" dependencies across most packages. That is, if every feature is packaged as a vertical slice, every package will depend on every third party library that you use. Any change to a library is likely to ripple through your whole system.
I personally prefer grouping classes logically, and then within that, including a subpackage for each functional area.
Goals of packaging
Packages are, after all, about grouping things together - the idea being that related classes live close to each other. If they live in the same package, they can take advantage of package-private visibility to limit exposure. The problem is that lumping all your view and persistence stuff into one package can lead to a lot of classes being mixed up in a single package. The next sensible thing to do is thus to create view, persistence, and util subpackages and refactor classes accordingly. Unfortunately, protected and package-private scoping does not support the concept of the current package plus its subpackages, which would aid in enforcing such visibility rules.
I see no value in separation by functionality, because what value is there in grouping all the view-related stuff? Things in this naming strategy become disconnected, with some classes in the view package whilst others are in persistence, and so on.
An example of my logical packaging structure
For purposes of illustration, let's name two modules - I'll use the term "module" as a concept that groups classes under a particular branch of a package tree.
apple.model
apple.store
banana.model
banana.store
Advantages
A client using banana.store.BananaStore is only exposed to the functionality we wish to make available. The Hibernate version is an implementation detail which they do not need to be aware of; nor should they see those classes, as they add clutter to storage operations.
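A sketch of how default (package-private) visibility achieves that hiding; the Hibernate class name is an assumed example:

// In banana/store/BananaStore.java - the public face of the module:
package banana.store;

import banana.model.Banana;

public interface BananaStore {
    void save(Banana banana);
}

// In banana/store/HibernateBananaStore.java - package-private, so clients
// outside banana.store cannot see or instantiate it:
package banana.store;

import banana.model.Banana;

class HibernateBananaStore implements BananaStore {
    public void save(Banana banana) {
        // Hibernate session work happens here
    }
}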
Other Logical v Functional advantages
The further up towards the root you go, the broader the scope becomes, and things belonging to one package start to exhibit more and more dependencies on things belonging to other modules. If one were to examine, for example, the "banana" module, most of the dependencies would be limited to within that module. In fact, most helpers under "banana" would not be referenced at all outside this package scope.
Why functionality?
What value does one achieve by lumping things together based on functionality? Most classes in such a case are independent of each other, with little or no need to take advantage of package-private methods or classes. Refactoring them into their own subpackages gains little, but does help reduce the clutter.
Developer changes to the system
When developers are tasked to make changes that are a bit more than trivial, it seems silly that their changes potentially include files from all areas of the package tree. With the logically structured approach, their changes are more local, within the same part of the package tree, which just seems right.
It depends on the granularity of your logical processes.
If they're standalone, you often have a new project for them in source control, rather than a new package.
The project I'm on at the moment is erring towards logical splitting, there's a package for the jython aspect, a package for a rule engine, packages for foo, bar, binglewozzle, etc. I'm looking at having the XML specific parsers/writers for each module within that package, rather than having an XML package (which I have done previously), although there will still be a core XML package where shared logic goes. One reason for this however is that it may be extensible (plugins) and thus each plugin will need to also define its XML (or database, etc) code, so centralising this could introduce problems later on.
In the end it seems to be how it seems most sensible for the particular project. I think it's easy to package along the lines of the typical project layered diagram however. You'll end up with a mix of logical and functional packaging.
What's needed is tagged namespaces. An XML parser for some Jython functionality could be tagged both Jython and XML, rather than having to choose one or the other.
Or maybe I'm wibbling.
I try to design package structures in such a way that if I were to draw a dependency graph, it would be easy to follow and use a consistent pattern, with as few circular references as possible.
For me, this is much easier to maintain and visualize with a vertical naming system rather than a horizontal one. If component1.display has a reference to component2.dataaccess, that throws off more warning bells than if display.component1 has a reference to dataaccess.component2.
Of course, components shared by both go in their own package.
I totally follow and propose the logical ("by-feature") organization! A package should follow the concept of a "module" as closely as possible. The functional organization may spread a module across a project, resulting in less encapsulation and more exposure to changes in implementation details.
Let's take an Eclipse plugin for example: putting all the views or actions in one package would be a mess. Instead, each component of a feature should go to the feature's package, or if there are many, into subpackages (featureA.handlers, featureA.preferences etc.)
Of course, the problem lies in the hierarchical package system (which among others Java has), which makes the handling of orthogonal concerns impossible or at least very difficult - although they occur everywhere!
I would personally go for functional naming. The short reason: it avoids code duplication or dependency nightmare.
Let me elaborate a bit. What happens when you are using an external jar file, with its own package tree? You are effectively importing the (compiled) code into your project, and with it a (functionally separated) package tree. Would it make sense to use the two naming conventions at the same time? No, unless that was hidden from you. And it is, if your project is small enough and has a single component. But if you have several logical units, you probably don't want to re-implement, let's say, the data file loading module. You want to share it between logical units, not have artificial dependencies between logically unrelated units, and not have to choose which unit you are going to put that particular shared tool into.
I guess this is why functional naming is the most used in projects that reach, or are meant to reach, a certain size, while logical naming is used in class naming conventions to keep track of the specific role, if any, of each class within a package.
I will try to respond more precisely to each of your points on logical naming.
If you have to go fishing in old classes to modify functionalities when you have a change of plans, it's a sign of bad abstraction: you should build classes that provide a well defined functionality, definable in one short sentence. Only a few, top-level classes should assemble all these to reflect your business intelligence. This way, you will be able to reuse more code, have easier maintenance, clearer documentation and less dependency issues.
That mainly depends on the way you grok your project. Definitely, the logical and functional views are orthogonal. So if you use one naming convention, you need to apply the other one to class names in order to keep some order, or fork from one naming convention to another at some depth.
Access modifiers are a good way to allow other classes that understand your processing to access the innards of your class. A logical relationship does not imply an understanding of algorithmic or concurrency constraints; a functional one may, though it need not. I am very wary of access modifiers other than public and private, because they often hide a lack of proper architecture and class abstraction.
In big, commercial projects, changing technologies happens more often than you would believe. For instance, I have already had to change XML parsers three times, caching technology twice, and geolocation software twice. Good thing I had hidden all the gritty details in a dedicated package...
It is an interesting experiment not to use packages at all (except for the root package.)
The question that arises then, is, when and why it makes sense to introduce packages. Presumably, the answer will be different from what you would have answered at the beginning of the project.
I presume that your question arises at all because packages are like categories, and it's sometimes hard to decide on one or the other. Sometimes tags would be more appropriate to communicate that a class is usable in many contexts.
From a purely practical standpoint, java's visibility constructs allow classes in the same package to access methods and properties with protected and default visibility, as well as the public ones. Using non-public methods from a completely different layer of the code would definitely be a big code smell. So I tend to put classes from the same layer into the same package.
I don't often use these protected or default methods elsewhere - except possibly in the unit tests for the class - but when I do, it is always from a class at the same layer.
It depends. In my line of work, we sometimes split packages by functions (data access, analytics) or by asset class (credit, equities, interest rates). Just select the structure which is most convenient for your team.
In my experience, reusability creates more problems than it solves. With today's fast, cheap processors and memory, I would prefer duplication of code to tight integration for the sake of reuse.