Ways to work around the lack of package access specifiers?

Ways to work around the lack of package access specifiers? - java

I'm new to Java. I've discovered, while trying to structure my code, that Java intimately ties source file organisation (directory structure) to package structure and package structure to external visibility of classes (a class is either visible to all other packages, or none).
This makes it quite difficult to organise the internal implementation details of my public library into logical units of related functionality while maintaining good encapsulation. JSR 294 explains it best:
Today, an implementation can be partitioned into multiple packages.
Subparts of such an implementation need to be more tightly coupled to
each other than to the surrounding software environment. Today
designers are forced to declare elements of the program that are
needed by other subparts of the implementation as public - thereby
making them globally accessible, which is clearly suboptimal.
Alternately, the entire implementation can be placed in a single
package. This resolves the issue above, but is unwieldy, and exposes
all internals of all subparts to each other.
So my question is, what workarounds exist for this limitation, and what are the pros & cons? Two are mentioned in the JSR - use packages for logical grouping (violating encapsulation); place everything in a single package (unwieldy). Are there other pros/cons to these workarounds? Are there other solutions? (I've become vaguely aware of OSGi bundles, but I've found it hard to understand how they work and what the the pros/cons might be (perhaps that's a con). It appears to be very intrusive compared to vanilla packages, to development & deployment.
Note: I'll upvote any good answers, but the the best answer will be one that comprehensively folds in the pros & cons of others (plagiarise!).
Related (but not duplicate!) questions
Anticipating cries of 'Possible duplicate', here are similar questions that I've found on SO; I present them here for reference and also to explain why they don't answer my question.
Java : Expose only a single package in a jar file: asks how to do it, but given that it's not possible in current releases of Java, doesn't discuss workarounds. Has interesting pointers to forthcoming Modularization (Project Jigsaw) in Java 8.
Package and visibility - duplicate question of the above, basically.
Best practice for controlling access to a ".internal" package - question and answers seem to be specific to OSGi or Eclipse plug-ins.

Tools like ProGuard can be used to repackage a JAR, exposing only those classes you specify in the configuration file. (It does this in addition to optimizing, inlining, and obfuscating.) You might be able to set up ProGuard in e.g. a Maven or Ant build, so you write your library exposing methods as public, and then use ProGuard to eliminate them from the generated JAR.

I'll get the ball rolling. Steal this answer and add to it/correct it/elaborate please!
Use multiple packages for multiple logical groupings
Pros: effective logical grouping of related code.
Cons: when internal implementation detail classes in different packages need to use one another, they must be made public - even to the end user - violating encapsulation. (Work around this by using a standard naming convention for packages containing internal implementation details such as .internal or .impl).
Put everything in one package
Pros: effective encapsulation
Cons: unwieldy for development/maintenance of the library if it contains many classes
Use OSGi bundles
Pros: ? (do they fix the problem?)
Cons: appears to be very intrusive at development (for both library user and author) and deployment, compared to just deploying .jar files.
Wait for Jigsaw in Java 8
http://openjdk.java.net/projects/jigsaw/
Pros: fixes the problem for good?
Cons: doesn't exist yet, not specific release date known.

I've never found this to be a problem. The workaround (if you want to call it that) is called good API design.
If you design your library well, then you can almost always do the following:
Put the main public API in one package e.g. "my.package.core" or just "my.package"
Put helper modules in other packages (according to logical groupings), but give each one it's own public API subset (e.g. a factory class like "my.package.foobarimpl.FoobarFactory")
The main public API package uses only the public API of helper modules
Your tests should also run primarily against the public APIs (since this is what you care about in terms of regressions or functionality)
To me the "right level of encapsulation" for a package is therefore to expose enough public API that your package can be used effectively as a dependency. No more and no less. It shouldn't matter whether it is being used by another package in the same library or by an external user. If you design your packages around this principle, you increase the chance of effective re-use.
Making parts of a package "globally accessible" really doesn't do any harm as long as your API is reasonably well designed. Remember that packages aren't object instances and as a result encapsulation doesn't matter nearly as much: making elements of a package public is usually much less harmful than exposing internal implementation details of a class (which I agree should almost always be private/protected).
Consider java.lang.String for example. It has a big public API, but whatever you do with the public API can't interfere with other users of java.lang.String. It's perfectly safe to use as a dependency from multiple places at the same time. On the other hand, all hell would break loose if you allowed users of java.lang.String to directly access the internal character array (which would allow in-place mutation of immutable Strings.... nasty!!).
P.S. Honourable mention goes to OSGi because it is a pretty awesome technology and very useful in many circumstances. However its sweet spot is really around deployment and lifecycle management of modules (stopping / starting / loading etc.). You don't really need it for code organisation IMHO.

Related

Sanity check: how are packages and modules are meant to be used together?

Prior to Java 9, I had assumed that packages were a way to promote/enforce modularization of code and to solve the namespacing problem. Packages actually do a really poor problem of solving the latter (com.popkernel.myproject.Employee myEmployee = new com.popkernel.myproject.Employee();, ugh) so I primarily focused on the former benefit.
And while packages do a very poor job at enforcing modularity, I've found them quite effective at promoting modularity. "What package does this class belong in?" has always been an illuminating question, and I appreciate that Java forces me to ask myself it.
But now Java 9 modules are here and they enforce the modularity problem better than packages. In fact, you could compose a new project entirely out of modules where each module wraps a single package.
Of course, modules are a LOT of extra boilerplate compared to packages, not to mention the quirk that unlike packages, it is impossible to tell what module a class belongs to just by examining its .java file.
So going forward my plan for using packages and modules together is: as I'm writing code, group related concepts into packages. Then, if I decide I want to distribute the code I'm working on, formalize it into a module.
Is that a sane way to utilize those concepts in tandem?

Is that a sane way to utilize those concepts in tandem?
Yes, kind of. Though there are other benefits that you can gain from modularising your code such as Reliable configuration, Strong encapsulation, Increased Readability etc.
So going forward my plan for using packages and modules together is:
as I'm writing code, group related concepts into packages. Then, if I
decide I want to distribute the code I'm working on, formalize it into
a module.
A quick question you can ask yourself is that while you would be using modules, and as you would group related concepts into packages, eventually how do you group related packages then?
The answer that could match up to the reason is the introduction of a new kind of Java program component - Modules
A module is a named, self-describing collection of code and data. Its
code is organized as a set of packages containing types, i.e.,
Java classes and interfaces; its data includes resources and other
kinds of static information.
Until now such collections were primarily targeted as JAR files, that were distributed to be consumed as dependencies/libraries within other, but they are becoming legacy formats and differ quite a bit with modules. In one of my recent answers to Java 9 - What is the difference between "Modules" and "JAR" files? I have tried to detail the differences.

Keeping track of utility classes

I've recently been more and more frustrated with a problem I see emerging in my projects code-base.
I'm working on a large scale java project that has >1M lines of code. The interfaces and class structure are designed very well and the engineers writing the code are very proficient. The problem is that in an attempt to make the code cleaner people write Utility classes whenever they need to reuse some functionality, as a result over time and as the project grows more and more utility methods crop up. However, when the next engineer comes across the need for the same functionality he has no way of knowing that someone had already implemented a utility class (or method) somewhere in the code and implements another copy of the functionality in a different class. The result is a lot of code duplication and too many utility classes with overlapping functionality.
Are there any tools or any design principles which we as a team can implement in order to prevent the duplication and low visibility of the utility classes?
Example: engineer A has 3 places he needs to transform XML to String so he writes a utility class called XMLUtil and places a static toString(Document) method in it. Engineer B has several places where he serializes Documents into various formats including String, so he writes a utility class called SerializationUtil and has a static method called serialize(Document) which returns a String.
Note that this is more than just code-duplication as it is quite possible that the 2 implementations of the above example are different (say one uses transformer API and the other uses Xerces2-J) so this can be seen as a "best-practices" problem as well...
Update: I guess I better describe the current environment we develop in.
We use Hudson for CI, Clover for code coverage and Checkstyle for static code analysis.
We use agile development including daily talks and (perhaps insufficient) code reviews.
We define all our utility classes in a .util which due to it's size now has 13 sub-packages and about 60 classes under the root (.util) class. We also use 3rd party libraries such as most of the apache commons jars and some of the jars that make up Guava.
I'm positive that we can reduce the amount of utilities by half if we put someone on the task of refactoring that entire package, I was wondering if there are any tools which can make that operation less costly, and if there are any methodologies which can delay as much as possible the problem from recurring.

A good solution to this problem is to start adding more object-orientation. To use your example:
Example: engineer A has 3 places he needs to transform XML to String so he writes a utility class called XMLUtil and places a static toString(Document) method in it
The solution is to stop using primitive types or types provided by the JVM (String, Integer, java.util.Date, java.w3c.Document) and wrap them in your own project-specific classes. Then your XmlDocument class can provide a convenient toString method and other utility methods. Your own ProjectFooDate can contain the parsing and formatting methods that would otherwise end up in various DateUtils classes, etc.
This way, the IDE will prompt you with your utility methods whenever you try to do something with an object.

Your problem is a very common one. And a real problem too, because there is no good solution.
We are in the same situation here, well I'd say worse, with 13 millions line of code, turnover and more than 800 developers working on the code. We often discuss about the very same problem that you describe.
The first idea - that your developers have already used - is to refactor common code in some utility classes. Our problem with that solution, even with pair programming, mentoring and discussion, is that we are simply too many for this to be effective. In fact we grow in subteams, with people sharing knowledge in their subteam, but the knowledge doesn't transit between subteams. Maybe we are wrong but I think that even pair programming and talks can't help in this case.
We also have an architecture team. This team is responsible to deal with design and architecture concerns and to make common utilities that we might need. This team in fact produces something we could call a corporate framework. Yes, it is a framework, and sometimes it works well. This team is also responsible to push best practices and to raise awareness of what should be done or not, what is available or what is not.
Good core Java API design is one of the reason for Java success. Good third party open sources libraries count a lot too. Even a small well crafted API allows to offer a really useful abstraction and can help reduce code size a lot. But you know, making framework and public API is not the same thing at all as just coding an utility class in 2 hours. It has a really high cost. An utility class costs 2 hours for the initial coding, maybe 2 days with debugging and unit tests. When you start sharing common code on big projects/teams, you really make an API. You must ensure perfect documentation then, really readable and maintainable code. When you release new version of this code, you must stay backward compatible. You have to promote it company wide (or at least team wide). From 2 days for your small utility class you grow to 10 days, 20 days or even 50 days for a full-fledged API.
And your API design may not be so great. Well, it is not that your engineers are not bright - indeed they are. But are you willing to let them work 50 days on a small utility class that just help parsing number in a consistent way for the UI? Are you willing to let them redesign the whole thing when you start using a mobile UI with totally different needs? Also have you noticed how the brightest engineers in the word make APIs that will never be popular or will fade slowly? You see, the first web project we made used only internal frameworks or no framework at all. We then added PHP/JSP/ASP. Then in Java we added Struts. Now JSF is the standard. And we are thinking about using Spring Web Flow, Vaadin or Lift...
All I want to say is that there is no good solution, the overhead grows exponentially with code size and team size. Sharing a big codebase restricts your agility and responsiveness. Any change must be done carefully, you must think of all potential integration problems and everybody must be trained of the new specificities and features.
But the main productivity point in a software company is not to gain 10 or even 50 lines of code when parsing XML. A generic code to do this will grow to a thousand lines of code anyway and recreates a complex API that will be layered by utility classes. When the guy make an utility class for parsing XML, it is good abstraction. He give a name to one dozen or even one hundred lines of specialized code. This code is useful because it is specialized. The common API allows to work on streams, URL, strings, whatever. It has a factory so you can choose you parser implementation. The utility class is good because it work only with this parser and with strings. And because you need one line of code to call it. But of course, this utility code is of limited use. It works well for this mobile application, or for loading XML configuration. And that's why the developer added the utility class for it in the first place.
In conclusion, what I would consider instead of trying to consolidate the code for the whole codebase is to split code responsibility as the teams grow:
transform your big team that work on one big project into small teams that work on several subprojects;
ensure that interfacing is good to minimize integration problems, but let team have their own code;
inside theses teams and corresponding codebases, ensure you have the best practices. No duplicate code, good abstractions. Use existing proven APIs from the community. Use pair programming, strong API documentation, wikis... But you should really let different teams make their choices, build their own code, even if this means duplicate code across teams or different design decisions. You know, if the design decisions are different this may be because the needs are different.
What you are really managing is complexity. In the end if you make one monolithic codebase, a very generic and advanced one, you increase the time for newcomers to ramp up, you increase the risk that developers will not use your common code at all, and you slow down everybody because any change has far greater chances to break existing functionality.

There are several agile/ XP practices you can use to address this, e.g.:
talk with each other (e.g. during daily stand-up meeting)
pair programming/ code review
Then create, document & test one or several utility library projects which can be referenced. I recommend to use Maven to manage dependecies/ versions.

You might consider suggesting that all utility classes be placed in a well organized package structure like com.yourcompany.util.. If people are willing to name sub packages and classes well, then at least if they need to find a utility, they know where to look. I don't think there is any silver bullet answer here though. Communication is important. Maybe if a developer sends a simple email to the rest of the development staff when they write a new utility, that will be enough to get it on people's radar. Or a shared wiki page where people can list/document them.

Team communication (shout out "hey does someone have a Document toString?")
Keep utility classes to an absolute minimum and restrict them to a single namespace
Always think: how can I do this with an object. In your example, I would extend the Document class and add those toString and serialize methods to it.

This problem is helped when combining IDE "code-completion" features with languages which support type extensions (e.g. C# and F#). So that, imagining Java had a such a feature, a programmer could explore all the extension methods on a class easily within the IDE like:
Document doc = ...
doc.to //list pops up with toXmlString, toJsonString, all the "to" series extension methods
Of course, Java doesn't have type extensions. But you could use grep to search your project for "all static public methods which take SomeClass as the first argument" to gain similar insight into what utility methods have already been written for a given class.

Its pretty hard to build a tool that recognizes "same functionality". (In theory this is in fact impossible, and where you can do it in practice you likely need a theorem prover).
But what often happens is people clone clode that is close to what they want, and then customize it. That kind of code you can find, using a clone detector.
Our CloneDR is a tool for detecting exact and near-miss cloned code based on using parameterized syntax trees. It matches parsed versions of the code, so it isn't confused by layout, changed comments, revised variable names, or in many cases, inserted or deleted statements. There are versions for many languages (C++, COBOL, C#, Java, JavaScript, PHP, ...) and you can see examples of clone detection runs at the provided
link. It typically finds 10-20% duplicated code, and if you abstract that code into library methods on a religious base, your code base can actually shrink (that has occurred with one organization using CloneDR).

You are looking for a solution that can you help you manage this inevitable problem, then I can suggest a tool:
TeamCity: an amazing easy to use product that manages all your automated code building from your repository and runs unit tests etc.
It's even a free product for most people.
The even better part: it has built in code duplicate detection across all your code.
More stuff to read up:
Tools to detect duplicated code (Java)

a standard application utility project. build a jar with the restricted extensibility scope and package based on functionality.
use common utilities like apache-commons or google collections and provide an abstraction
maintain knowledge-base and documentation and JIRA tracking for bugs and enhancements
evolutionary refactoring
findbugs and pmd for finding code duplication or bugs
review and test utility tools for performance
util karma! ask team members to contribute to the code base, whenever they find one in the existing jungle code or requiring new ones.

How to make code modular?

I have some Java programs, now I want to find out whether it is modular or not, if it is modular then up to what extent, because modularity can never be binary term i.e. 0 or 1.
How do I decide that particular code is modular upto this much extent. I want to know how to make code much more modular?

Some Benchmarks for modularity:
How many times are you rewriting similar code for doing a particular task?
How much do you have to refactor your code when you change some part of your program?
Are the files small and easy to navigate through?
Are the application modules performing adequately and independently as and when required?
Is your code minimally disastrous? Does all hell break lose when you delete just one function or variable? Do you get 20-odd errors upon re-naming a class? (To examine this, you can implement a stacking mechanism to keep trace of all the hops in your application)
How near is the code to natural language usage? (i.e. modules and their subcomponents represent more real world objects without giving much concern to net source file size).
For more ideas check out this blurb about modularity and this one on software quality
As for your concern on making your code more modular first you should ask yourself the above questions, obtain specific answers for them and then have a look at this.
The basic philosophy is to break down your application into as small of code fragments as possible, arranged neatly across a multitude of easily understandable and accessible directory layouts.
Each method in your application must do no more than the minimum quanta of processing needed. Combining these methods into more and more macro level methods should lead you back to your application.

Key points are
Separation of concerns
Cohesion
Encapsulation (communicates via interface)
Substitutability
Reusability
A good example of such module system is standard car parts like disk brakes and car stereo.
You don't want to build car stereo from scratch when you are building cars. You'd rather buy it and plug it in. You also don't want the braking system affecting the car stereo — or worse car stereo affecting the brake system.
To answer your question, "How do I decide that particular code is modular up to this much extent," we can form questions to test the modularity. Can you easily substitute your modules with something else without affecting other parts of your application?
XML parsers could be another example. Once you obtain the DOM interface, you really don't care which implementation of XML parser is used underneath (e.g. Apache Xerces or JAXP).
In Java, another question may be: Are all functionality accessible via interfaces? Interface pretty much takes care of the low coupling.
Also, can you describe each module in your system with one sentence? For example, a car stereo plays music and radio. Disk brakes decelerate the vehicle safely.
(Here's what I wrote to What is component driven development?)
According to Wikipedia, Component-Based Development is an alias for Component-based software engineering (CBSE).
[It] is a branch of software
engineering, the priority of which is
the separation of concerns in respect
of the wide-ranging functionality
available throughout a given software
system.
This is somewhat vague, so let's look at more details.
An individual component is a software
package, or a module, that
encapsulates a set of related
functions (or data).
All system processes are placed into
separate components so that all of the
data and functions inside each
component are semantically related
(just as with the contents of
classes). Because of this principle,
it is often said that components are
modular and cohesive.
So, according to this definition, a component can be anything as long as it does one thing really well and only one thing.
With regards to system-wide
co-ordination, components communicate
with each other via interfaces. [...]
This principle results in components referred to as encapsulated.
So this is sounding more and more like what we think of good API or SOA should look like.
The provided interfaces are represented by a lollipop and required interfaces are represented by an open socket symbol attached to the outer edge of the component in UML.
Another important attribute of
components is that they are
substitutable, so that a component
could be replaced by another (at
design time or run-time), if the
requirements of the initial component
(expressed via the interfaces) are met
by the successor component.
Reusability is an important
characteristic of a high quality
software component. A software
component should be designed and
implemented so that it can be reused
in many different programs.
Substitutability and reusability is what makes a component a component.
So what's the difference between this and Object-Oriented Programming?
The idea in object-oriented
programming (OOP) is that software
should be written according to a
mental model of the actual or imagined
objects it represents. [...]
Component-based software engineering,
by contrast, makes no such
assumptions, and instead states that
software should be developed by gluing
prefabricated components together much
like in the field of electronics or
mechanics.

To answer your specific question of how to make the code more modular, a couple of approaches are:
One of best tool for modularization is spotting code re-use. If you find that your code does the same exact (or very similar) thing in more than once place, it's a good candidate for modularizing away.
Determine which pieces of logic can be made independent, in a sense that other logic would use them without needing to know how they are built. This is somewhat similar to what you to in OO design, although module/component does not necessarily need to correspond to a modeled object as in OO.

Hej,
See, "How to encapsulate software (Part 1)," here:
http://www.edmundkirwan.com/encap/overview/paper7.html
Regards,
Ed.

Since this has been tagged with 'osgi', I can throw in an OSGi-related perspective.
The short answer is that it is possible to go from completely spaghetti code to modular in small steps; it doesn't have to be a big bang. For example, even spaghetti code depends on some kind of bolognaise logging library, so in some sense, it's already modular, just with One Very Big Metball (sorry, module) in it.
The trick is to break the big meatball into one smaller chunk and then a slightly less big meatball and then recurse. It doesn't all have to be done in one go either; simply chip off a bit more each time until there is nothing left to remove.
As for OSGi, it's still possible to put an uber-jar into a bundle. In fact, you can do this without changing the bits; either by modifying the Manifest.MF in place, or by wrapping that in another JAR and specify Bundle-ClassPath: metaball.jar in the manifest.
Failing that, tools like BND can help generate the right data you'd need, and then it can be dropped in an OSGi runtime easily enough. But beware of overly coupled code, and stuff that mucks around with classloaders - those will trip you up.

Assuming I understand your question, that you want to know what it is that makes code modular, since code modules will obviously need some dependency between each other to work at all. This is my answer:
If you can break your system down into modules, and you can test those modules in isolation, that is a good indication that a system is modular.

As you say modularity is not a binary thing so it depends on your relative definition.
I would say: Can you use a given method in any program where you need to perform that function? Is it the "black box" where you wouldn't need to know what it were doing under the hood? If the answer is no, i.e. the method would only work properly in that program then it is not truely modular.

Modularity is relative to who ever is developing the code. But I think the general consensus is that modular code is code that has portions that can easily be swapped out without changing most of the original code.
IMHO, If you have 3 modules A B and C and you want to change or replace module C completely, if it is a SIMPLE task to do so then you have modular code.

You can use a code analysis tool such as CAP to analyse the dependencies between types and packages. They'll help you find and remove any cyclic dependencies, which are often a problem when trying to develop modular code.
If there are no cyclic dependencies, you can start separating your code into discrete jars.
In general it is good practice to code to interfaces if you can, this generally means your code can more easily be refactored and/or used in different contexts.
Dependency injection frameworks such as Spring can also help with the modularity of your design. As types are injected with their dependencies by some external configuration process they don't need a direct dependency on an implementation.

The package-by-feature idea helps to make code more modular.
Many examples seen on the web divide applications first into layers, not features
models
data access
user interface
It seems better, however, to divide applications up using top-level packages that align with features, not layers.
Here is an example of a web app that uses package-by-feature. Note the names of the top-level packages, which read as a list of actual features in the application. Note as well how each package contains all items related to a feature - the items aren't spread out all over the place; most of the time, they are all in a single package/directory.
Usually, deletion of a feature in such an app can be implemented in a single operation - deletion of a single directory.

Is having a fine-grained package structure a good or a bad thing?

I recently looked at a Java application which had a very fine-grained package structure. Many packages only contained one or two classes and many sub packages. Also many packages contained more sub packages than actual classes.
Is this a good or a bad thing?

IMO, it is a bad thing, though not a real show-stopper in terms of maintainability.
The disadvantages are that it makes classes harder to find, and that it makes the package names more verbose. The former applies more when you are not using an IDE.
It could be argued that it helps modularization in conjunction with "package private" scoping. But conversely, you could also argue that over-packagization actually does the opposite; i.e. forcing you to use public where you wouldn't have had to if you'd been less fine-grained / pedantic.

The actual number of types that end up in one particular package is not that important. It is how you arrive at your package structure.
Things like abstraction ("what" instead of "how", essentially "public" API), coupling (the degree of how dependent a package is on other packages) and cohesion (how interrelated is the functionality in one package) are more important.
Some guidelines for package design (mostly from Uncle Bob) are for example:
Packages should form reusable and releasable modules
Packages should be focussed for good reusability
Avoid cyclic dependencies between packages
Packages should depend only on packages that change less often
The abstraction of a package should be in proportion to how often it changes
Do not try to flesh out an entire package structure from scratch (You Are Not Going To Need It). Instead let it evolve and refactor often. Look at the import section of your Java sources for inspiration on moving types around. Don't be distracted by packages that contain only one or a few types.

I think finer grained package structure is a good thing. The main reason is it can help minimize your dependencies. But be careful... if you break things up too much that actually belong together, you will end up with circular dependencies!
As a rule of thumb, I usually have an interface (or group of related interfaces) in a package. And in subpackages I will have implementations of those interfaces (instead of having all the implementations in the same package). That way a client can just depend on the interface and implementation of interest... and not all the other stuff they don't need.
Hope this helps.

It is subjective, of course, but I generally prefer to decide my packages in a way they would contain at least 3-4 classes, but not more than about 13-15. It makes understanding better, while not cluttering the project. For others it might be different, though.
In case a package grows more than 13-15 classes, a subpackage is asking to emerge.

A package is a unit of encapsulation.
Ideally, it exposes a public API via interface(s) and hides implementation detail in package private classes.
The size of the package is therefore the number of classes needed to implement the public API.

There is also the magical number 7 +/- 2. Within physiological circles it has been shown that this is a basic limit on our ability to comprehend a given set of things. How much we can keep in short term memory and process at the same time before our brains need to start swapping. It is not a bad rule of thumb I've found when coding, ie once a package starts getting bigger than 10 classes it is time to think seriously about splitting it up, but below 5 packages it is really not worth it. Also applies well to things like menu organisation. See Millers paper, or google.

When writing from scratch my way is to begin all classes in the root package (com.example.app1). When there is a certain amount of interrelated classes that "do a whole thing" is time to create a package. After some time developing helper classes may go to a generic package (ex. com.example.app1.misc, com.example.app1.utils) to unload the root package.
So i try to keep clean the root package.
Fine-grain is not bad but as other said often refactoring produce a compact package structure.

What strategy do you use for package naming in Java projects and why? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 6 years ago.
Improve this question
I thought about this awhile ago and it recently resurfaced as my shop is doing its first real Java web app.
As an intro, I see two main package naming strategies. (To be clear, I'm not referring to the whole 'domain.company.project' part of this, I'm talking about the package convention beneath that.) Anyway, the package naming conventions that I see are as follows:
Functional: Naming your packages according to their function architecturally rather than their identity according to the business domain. Another term for this might be naming according to 'layer'. So, you'd have a *.ui package and a *.domain package and a *.orm package. Your packages are horizontal slices rather than vertical.
This is much more common than logical naming. In fact, I don't believe I've ever seen or heard of a project that does this. This of course makes me leery (sort of like thinking that you've come up with a solution to an NP problem) as I'm not terribly smart and I assume everyone must have great reasons for doing it the way they do. On the other hand, I'm not opposed to people just missing the elephant in the room and I've never heard a an actual argument for doing package naming this way. It just seems to be the de facto standard.
Logical: Naming your packages according to their business domain identity and putting every class that has to do with that vertical slice of functionality into that package.
I have never seen or heard of this, as I mentioned before, but it makes a ton of sense to me.
I tend to approach systems vertically rather than horizontally. I want to go in and develop the Order Processing system, not the data access layer. Obviously, there's a good chance that I'll touch the data access layer in the development of that system, but the point is that I don't think of it that way. What this means, of course, is that when I receive a change order or want to implement some new feature, it'd be nice to not have to go fishing around in a bunch of packages in order to find all the related classes. Instead, I just look in the X package because what I'm doing has to do with X.
From a development standpoint, I see it as a major win to have your packages document your business domain rather than your architecture. I feel like the domain is almost always the part of the system that's harder to grok where as the system's architecture, especially at this point, is almost becoming mundane in its implementation. The fact that I can come to a system with this type of naming convention and instantly from the naming of the packages know that it deals with orders, customers, enterprises, products, etc. seems pretty darn handy.
It seems like this would allow you to take much better advantage of Java's access modifiers. This allows you to much more cleanly define interfaces into subsystems rather than into layers of the system. So if you have an orders subsystem that you want to be transparently persistent, you could in theory just never let anything else know that it's persistent by not having to create public interfaces to its persistence classes in the dao layer and instead packaging the dao class in with only the classes it deals with. Obviously, if you wanted to expose this functionality, you could provide an interface for it or make it public. It just seems like you lose a lot of this by having a vertical slice of your system's features split across multiple packages.
I suppose one disadvantage that I can see is that it does make ripping out layers a little bit more difficult. Instead of just deleting or renaming a package and then dropping a new one in place with an alternate technology, you have to go in and change all of the classes in all of the packages. However, I don't see this is a big deal. It may be from a lack of experience, but I have to imagine that the amount of times you swap out technologies pales in comparison to the amount of times you go in and edit vertical feature slices within your system.
So I guess the question then would go out to you, how do you name your packages and why? Please understand that I don't necessarily think that I've stumbled onto the golden goose or something here. I'm pretty new to all this with mostly academic experience. However, I can't spot the holes in my reasoning so I'm hoping you all can so that I can move on.

For package design, I first divide by layer, then by some other functionality.
There are some additional rules:
layers are stacked from most general (bottom) to most specific (top)
each layer has a public interface (abstraction)
a layer can only depend on the public interface of another layer (encapsulation)
a layer can only depend on more general layers (dependencies from top to bottom)
a layer preferably depends on the layer directly below it
So, for a web application for example, you could have the following layers in your application tier (from top to bottom):
presentation layer: generates the UI that will be shown in the client tier
application layer: contains logic that is specific to an application, stateful
service layer: groups functionality by domain, stateless
integration layer: provides access to the backend tier (db, jms, email, ...)
For the resulting package layout, these are some additional rules:
the root of every package name is <prefix.company>.<appname>.<layer>
the interface of a layer is further split up by functionality: <root>.<logic>
the private implementation of a layer is prefixed with private: <root>.private
Here is an example layout.
The presentation layer is divided by view technology, and optionally by (groups of) applications.
com.company.appname.presentation.internal
com.company.appname.presentation.springmvc.product
com.company.appname.presentation.servlet
...
The application layer is divided into use cases.
com.company.appname.application.lookupproduct
com.company.appname.application.internal.lookupproduct
com.company.appname.application.editclient
com.company.appname.application.internal.editclient
...
The service layer is divided into business domains, influenced by the domain logic in a backend tier.
com.company.appname.service.clientservice
com.company.appname.service.internal.jmsclientservice
com.company.appname.service.internal.xmlclientservice
com.company.appname.service.productservice
...
The integration layer is divided into 'technologies' and access objects.
com.company.appname.integration.jmsgateway
com.company.appname.integration.internal.mqjmsgateway
com.company.appname.integration.productdao
com.company.appname.integration.internal.dbproductdao
com.company.appname.integration.internal.mockproductdao
...
Advantages of separating packages like this is that it is easier to manage complexity, and it increases testability and reusability. While it seems like a lot of overhead, in my experience it actually comes very natural and everyone working on this structure (or similar) picks it up in a matter of days.
Why do I think the vertical approach is not so good?
In the layered model, several different high-level modules can use the same lower-level module. For example: you can build multiple views for the same application, multiple applications can use the same service, multiple services can use the same gateway. The trick here is that when moving through the layers, the level of functionality changes. Modules in more specific layers don't map 1-1 on modules from the more general layer, because the levels of functionality they express don't map 1-1.
When you use the vertical approach for package design, i.e. you divide by functionality first, then you force all building blocks with different levels of functionality into the same 'functionality jacket'. You might design your general modules for the more specific one. But this violates the important principle that the more general layer should not know about more specific layers. The service layer for example shouldn't be modeled after concepts from the application layer.

I find myself sticking with Uncle Bob's package design principles. In short, classes which are to be reused together and changed together (for the same reason, e.g. a dependency change or a framework change) should be put in the same package. IMO, the functional breakdown would have better chance of achieving these goals than the vertical/business-specific break-down in most applications.
For example, a horizontal slice of domain objects can be reused by different kinds of front-ends or even applications and a horizontal slice of the web front-end is likely to change together when the underlying web framework needs to be changed. On the other hand, it's easy to imagine the ripple effect of these changes across many packages if classes across different functional areas are grouped in those packages.
Obviously, not all kinds of software are the same and the vertical breakdown may make sense (in terms of achieving the goals of reusability and closeability-to-change) in certain projects.

There are usually both levels of division present. From the top, there are deployment units. These are named 'logically' (in your terms, think Eclipse features). Inside deployment unit, you have functional division of packages (think Eclipse plugins).
For example, feature is com.feature, and it consists of com.feature.client, com.feature.core and com.feature.ui plugins. Inside plugins, I have very little division to other packages, although that's not unusual too.
Update: Btw, there is great talk by Juergen Hoeller about code organization at InfoQ: http://www.infoq.com/presentations/code-organization-large-projects. Juergen is one of architects of Spring, and knows a lot about this stuff.

Most java projects I've worked on slice the java packages functionally first, then logically.
Usually parts are sufficiently large that they're broken up into separate build artifacts, where you might put core functionality into one jar, apis into another, web frontend stuff into a warfile, etc.

Both functional (architectural) and logical (feature) approaches to packaging have a place. Many example applications (those found in text books etc.) follow the functional approach of placing presentation, business services, data mapping, and other architectural layers into separate packages. In example applications, each package often has only a few or just one class.
This initial approach is fine since a contrived example often serves to: 1) conceptually map out the architecture of the framework being presented, 2) is done so with a single logical purpose (e.g. add/remove/update/delete pets from a clinic). The problem is that many readers take this as a standard that has no bounds.
As a "business" application expands to include more and more features, following the functional approach becomes a burden. Although I know where to look for types based on architecture layer (e.g. web controllers under a "web" or "ui" package, etc.), developing a single logical feature begins to require jumping back and forth between many packages. This is cumbersome, at the very least, but its worse than that.
Since logically related types are not packaged together, the API is overly publicized; the interaction between logically related types is forced to be 'public' so that types can import and interact with each other (the ability to minimize to default/package visibility is lost).
If I am building a framework library, by all means my packages will follow a functional/architectural packaging approach. My API consumers might even appreciate that their import statements contain intuitive package named after the architecture.
Conversely, when building a business application I will package by feature. I have no problem placing Widget, WidgetService, and WidgetController all in the same "com.myorg.widget." package and then taking advantage of default visibility (and having fewer import statements as well as inter-package dependencies).
There are, however, cross-over cases. If my WidgetService is used by many logical domains (features), I might create a "com.myorg.common.service." package. There is also a good chance that I create classes with intention to be re-usable across features and end up with packages such as "com.myorg.common.ui.helpers." and "com.myorg.common.util.". I may even end up moving all these later "common" classes in a separate project and include them in my business application as a myorg-commons.jar dependency.

Packages are to be compiled and distributed as a unit. When considering what classes belong in a package, one of the key criteria is its dependencies. What other packages (including third-party libraries) does this class depend on. A well-organized system will cluster classes with similar dependencies in a package. This limits the impact of a change in one library, since only a few well-defined packages will depend on it.
It sounds like your logical, vertical system might tend to "smear" dependencies across most packages. That is, if every feature is packaged as a vertical slice, every package will depend on every third party library that you use. Any change to a library is likely to ripple through your whole system.

I personally prefer grouping classes logically then within that include a subpackage for each functional participation.
Goals of packaging
Packages are after all about grouping things together - the idea being related classes live close to each other. If they live in the same package they can take advantage of package private to limit visibility. The problem is lumping all your view and persitance stuff into one package can lead to a lot of classes being mixed up into a single package. The next sensible thing to do is thus create view, persistance, util sub packages and refactor classes accordingly. Underfortunately protected and package private scoping does not support the concept of the current package and sub package as this would aide in enforcing such visibility rules.
I see now value in separation via functionality becase what value is there to group all the view related stuff. Things in this naming strategy become disconnected with some classes in the view whilst others are in persistance and so on.
An example of my logical packaging structure
For purposes of illustration lets name two modules - ill use the name module as a concept that groups classes under a particular branch of a pacckage tree.
apple.model
apple.store
banana.model
banana.store
Advantages
A client using the Banana.store.BananaStore is only exposed to the functionality we wish to make available. The hibernate version is an implementation detail which they do not need to be aware nor should they see these classes as they add clutter to storage operations.
Other Logical v Functional advantages
The further up towards the root the broader the scope becomes and things belonging to one package start to exhibit more and more dependencies on things belonging to toher modules. If one were to examine for example the "banana" module most of the dependencies would be limited to within that module. In fact most helpers under "banana" would not be referenced at all outside this package scope.
Why functionality ?
What value does one achieve by lumping things based on functionality. Most classes in such a case are independent of each other with little or no need to take advantage of package private methods or classes. Refactoring them so into their own subpackages gains little but does help reduce the clutter.
Developer changes to the system
When developers are tasked to make changes that are a bit more than trivial it seems silly that potentially they have changes that include files from all areas of the package tree. With the logical structured approach their changes are more local within the same part of the package tree which just seems right.

It depends on the granularity of your logical processes?
If they're standalone, you often have a new project for them in source control, rather than a new package.
The project I'm on at the moment is erring towards logical splitting, there's a package for the jython aspect, a package for a rule engine, packages for foo, bar, binglewozzle, etc. I'm looking at having the XML specific parsers/writers for each module within that package, rather than having an XML package (which I have done previously), although there will still be a core XML package where shared logic goes. One reason for this however is that it may be extensible (plugins) and thus each plugin will need to also define its XML (or database, etc) code, so centralising this could introduce problems later on.
In the end it seems to be how it seems most sensible for the particular project. I think it's easy to package along the lines of the typical project layered diagram however. You'll end up with a mix of logical and functional packaging.
What's needed is tagged namespaces. An XML parser for some Jython functionality could be tagged both Jython and XML, rather than having to choose one or the other.
Or maybe I'm wibbling.

I try to design package structures in such a way that if I were to draw a dependency graph, it would be easy to follow and use a consistent pattern, with as few circular references as possible.
For me, this is much easier to maintain and visualize in a vertical naming system rather than horizontal. if component1.display has a reference to component2.dataaccess, that throws off more warning bells than if display.component1 has a reference to dataaccess. component2.
Of course, components shared by both go in their own package.

I totally follow and propose the logical ("by-feature") organization! A package should follow the concept of a "module" as closely as possible. The functional organization may spread a module over a project, resulting in less encapsulation, and prone to changes in implementation details.
Let's take an Eclipse plugin for example: putting all the views or actions in one package would be a mess. Instead, each component of a feature should go to the feature's package, or if there are many, into subpackages (featureA.handlers, featureA.preferences etc.)
Of course, the problem lies in the hierarchical package system (which among others Java has), which makes the handling of orthogonal concerns impossible or at least very difficult - although they occur everywhere!

I would personally go for functional naming. The short reason: it avoids code duplication or dependency nightmare.
Let me elaborate a bit. What happens when you are using an external jar file, with its own package tree? You are effectively importing the (compiled) code into your project, and with it a (functionally separated) package tree. Would it make sense to use the two naming conventions at the same time? No, unless that was hidden from you. And it is, if your project is small enough and has a single component. But if you have several logical units, you probably don't want to re-implement, let's say, the data file loading module. You want to share it between logical units, not have artificial dependencies between logically unrelated units, and not have to choose which unit you are going to put that particular shared tool into.
I guess this is why functional naming is the most used in projects that reach, or are meant to reach, a certain size, and logical naming is used in class naming conventions to keep track of the specific role, if any of each class in a package.
I will try to respond more precisely to each of your points on logical naming.
If you have to go fishing in old classes to modify functionalities when you have a change of plans, it's a sign of bad abstraction: you should build classes that provide a well defined functionality, definable in one short sentence. Only a few, top-level classes should assemble all these to reflect your business intelligence. This way, you will be able to reuse more code, have easier maintenance, clearer documentation and less dependency issues.
That mainly depends on the way you grok your project. Definitely, logical and functional view are orthogonal. So if you use one naming convention, you need to apply the other one to class names in order to keep some order, or fork from one naming convention to an other at some depth.
Access modifiers are a good way to allow other classes that understand your processing to access the innards of your class. Logical relationship does not mean an understanding of algorithmic or concurrency constraints. Functional may, although it does not. I am very weary of access modifiers other than public and private, because they often hide a lack of proper architecturing and class abstraction.
In big, commercial projects, changing technologies happens more often than you would believe. For instance, I have had to change 3 times already of XML parser, 2 times of caching technology, and 2 times of geolocalisation software. Good thing I had hid all the gritty details in a dedicated package...

It is an interesting experiment not to use packages at all (except for the root package.)
The question that arises then, is, when and why it makes sense to introduce packages. Presumably, the answer will be different from what you would have answered at the beginning of the project.
I presume that your question arises at all, because packages are like categories and it's sometimes hard to decide for one or the other. Sometimes tags would be more appreciate to communicate that a class is usable in many contexts.

From a purely practical standpoint, java's visibility constructs allow classes in the same package to access methods and properties with protected and default visibility, as well as the public ones. Using non-public methods from a completely different layer of the code would definitely be a big code smell. So I tend to put classes from the same layer into the same package.
I don't often use these protected or default methods elsewhere - except possibly in the unit tests for the class - but when I do, it is always from a class at the same layer

It depends. In my line of work, we sometimes split packages by functions (data access, analytics) or by asset class (credit, equities, interest rates). Just select the structure which is most convenient for your team.

From my experience, re-usability creates more problems than solving. With the latest & cheap processors and memory, I would prefer duplication of code rather than tightly integrating in order to reuse.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.