Split packages in plain Java

OSGi has a problem with split packages, i.e. same package but hosted in multiple bundles.
Are there any edge cases where split packages might cause problems in plain Java (without OSGi)?
Just curious.

Where split packages come from
Split packages (in OSGi) occur when the manifest header Require-Bundle is used (as it is, I believe, in Eclipse's manifests). Require-Bundle names other bundles which are used to search for classes (if the package isn't Imported). The search happens before the bundle's own classpath is searched. This allows the classes for a single package to be loaded from the exports of multiple bundles (probably distinct jars).
The OSGi spec (4.1) section 3.13 describes Require-Bundle and has a long list of (unexpected) consequences of using this header (ought this header be deprecated?), one section of which is devoted to split packages. Some of these consequences are bizarre (and rather OSGi-specific) but most are avoided if you understand one thing:
if a class (in a package) is provided by more than one bundle then you are in trouble.
If the package pieces are disjoint, then all should be well, except that you might not have the classes visible everywhere and package visibility members might appear to be private if viewed from a "wrong" part of a split package.
[Of course that's too simple—multiple versions of packages can be installed—but from the application's point of view at any one time all classes from a package should be sourced from a single module.]
What happens in 'standard Java'
In standard Java, without fancy class-loaders, you have a classpath, and the order of searching of jars (and directories) for classes to load is fixed and well-defined: what you get is what you get. (But then, we give up manageable modularity.)
Sure, you can have split packages—it's quite common in fact—and it is an indication of poor modularity. The symptoms can be obscure compile/build-time errors, but in the case of multiple class implementations (one over-rides the rest in a single class-path) it most often produces obscure run-time behaviour, owing to subtly-different semantics.
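One way to see the "one over-rides the rest" effect for yourself is to ask the class loader for every location that offers a given class. This is only a sketch: the class name ClasspathDuplicates and the placeholder com.example.Foo are made up for illustration, and it only inspects the system class loader.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: lists every classpath entry that contains a given class file.
final class ClasspathDuplicates {

    /** Returns the locations offering the named class, in the order the class loader reports them. */
    static List<URL> locationsOf(String className) {
        String resource = className.replace('.', '/') + ".class";
        try {
            return Collections.list(ClassLoader.getSystemClassLoader().getResources(resource));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "com.example.Foo" is a placeholder; pass a real class name as the first argument.
        String className = args.length > 0 ? args[0] : "com.example.Foo";
        List<URL> hits = locationsOf(className);
        hits.forEach(System.out::println);
        if (hits.size() > 1) {
            System.out.println("More than one copy found; the first one listed is normally the one that gets loaded.");
        }
    }
}
```

If this prints more than one location for a class, the jars involved are worth a closer look.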
If you are lucky you end up looking at the wrong code, without realising it, and asking yourself "but how can that possibly be doing that?" If you are unlucky you are looking at the right code and asking exactly the same thing, because something else was producing unexpected answers.
This is not entirely unlike the old database adage: "if you record the same piece of information in two places, pretty soon it won't be the same anymore". Our problem is that 'pretty soon' isn't normally soon enough.
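When you suspect you are reading the wrong copy of a class, it can help to ask the running JVM where the class actually came from. A minimal sketch, assuming no security manager gets in the way; WhereFrom is a made-up name:

```java
import java.security.CodeSource;

// Hypothetical helper: reports the jar or directory a loaded class came from.
final class WhereFrom {

    static String describe(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            // Classes loaded by the bootstrap loader usually have no code source.
            return type.getName() + " <- (no code source; typically a JDK class)";
        }
        return type.getName() + " <- " + source.getLocation();
    }

    public static void main(String[] args) {
        System.out.println(describe(WhereFrom.class));
        System.out.println(describe(String.class));
    }
}
```

Calling describe on the suspicious class from inside the misbehaving application tells you which jar to open.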

For OSGi packages in different bundles are different, regardless of their name, because each bundle uses its own class loader. It is not a problem but a feature, to ensure encapsulation of bundles.
So in plain Java this is normally not a problem, until you start using some framework that uses class loaders. That is typically the case when components are loaded.

Splitting packages across jars probably isn't a great idea. I suggest making all packages within jars sealed (put "Sealed: true" in the main section of the manifest). Sealed packages can't be split between jars.
In the case of OSGi, classes with the same package name but a different class loader are treated as if they are in different packages.
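For the "Sealed: true" suggestion, the manifest can be written by hand, but as a rough sketch it can also be produced with the java.util.jar API. The class name SealedManifest is made up; the two headers are the only things that matter here.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// Hypothetical helper: builds a manifest whose main section seals every package in the jar.
final class SealedManifest {

    static String sealedManifestText() {
        Manifest manifest = new Manifest();
        Attributes main = manifest.getMainAttributes();
        // Manifest-Version must be present, otherwise the main section is not written out.
        main.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        main.put(Attributes.Name.SEALED, "true");

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            manifest.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.print(sealedManifestText());
    }
}
```

The resulting text is what would go into META-INF/MANIFEST.MF of the jar you want sealed.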

You'll get a nasty runtime error (a SecurityException complaining that signer information does not match) if you have classes in the same package and some are in a signed JAR while others are not.

Are you asking because the package in question is yours, not third party code?
An easy example would be a web app with service and persistence layers as separate OSGi bundles. The persistence interfaces would have to be shared by both bundles.
If I've interpreted your question correctly, would the solution be to create a sealed JAR containing the shared interfaces and make it part of both bundles?
I don't mean to try and hijack the thread. I'm asking for clarification and some better insight from those who might have done more with OSGi to date than I have.

Related

Sanity check: how are packages and modules meant to be used together?

Prior to Java 9, I had assumed that packages were a way to promote/enforce modularization of code and to solve the namespacing problem. Packages actually do a really poor job of solving the latter (com.popkernel.myproject.Employee myEmployee = new com.popkernel.myproject.Employee();, ugh) so I primarily focused on the former benefit.
And while packages do a very poor job at enforcing modularity, I've found them quite effective at promoting modularity. "What package does this class belong in?" has always been an illuminating question, and I appreciate that Java forces me to ask myself it.
But now Java 9 modules are here and they enforce modularity better than packages do. In fact, you could compose a new project entirely out of modules where each module wraps a single package.
Of course, modules are a LOT of extra boilerplate compared to packages, not to mention the quirk that unlike packages, it is impossible to tell what module a class belongs to just by examining its .java file.
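While the .java file does not say which module a class belongs to, the running program can tell you (Java 9 or later). A small sketch; ModuleOf is a made-up name:

```java
// Hypothetical helper: reports the module a class was loaded into (requires Java 9+).
final class ModuleOf {

    static String moduleName(Class<?> type) {
        Module module = type.getModule();
        // Classes loaded from the plain classpath live in an unnamed module.
        return module.isNamed() ? module.getName() : "(unnamed module)";
    }

    public static void main(String[] args) {
        System.out.println("String   -> " + moduleName(String.class));
        System.out.println("ModuleOf -> " + moduleName(ModuleOf.class));
    }
}
```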
So going forward my plan for using packages and modules together is: as I'm writing code, group related concepts into packages. Then, if I decide I want to distribute the code I'm working on, formalize it into a module.
Is that a sane way to utilize those concepts in tandem?
Is that a sane way to utilize those concepts in tandem?
Yes, kind of. Though there are other benefits that you can gain from modularising your code such as Reliable configuration, Strong encapsulation, Increased Readability etc.
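To make "strong encapsulation" a bit more concrete: a module lists the packages it exports, and everything else it contains stays concealed from other modules. Below is a rough sketch using the java.lang.module API to build such a description in memory. The module and package names (com.example.library and so on) are invented for illustration; a real project would declare the same thing in a module-info.java file.

```java
import java.lang.module.ModuleDescriptor;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical sketch: which packages of a module are visible to the outside world?
final class EncapsulationSketch {

    // Roughly what this module-info.java would declare:
    //   module com.example.library { exports com.example.library.api; }
    // with com.example.library.internal also present in the module but not exported.
    static ModuleDescriptor describe() {
        return ModuleDescriptor.newModule("com.example.library")
                .exports("com.example.library.api")
                .packages(Set.of("com.example.library.internal"))
                .build();
    }

    static Set<String> exported(ModuleDescriptor descriptor) {
        return descriptor.exports().stream()
                .map(ModuleDescriptor.Exports::source)
                .collect(Collectors.toCollection(TreeSet::new));
    }

    static Set<String> concealed(ModuleDescriptor descriptor) {
        Set<String> result = new TreeSet<>(descriptor.packages());
        result.removeAll(exported(descriptor));
        return result;
    }

    public static void main(String[] args) {
        ModuleDescriptor descriptor = describe();
        System.out.println("exported:  " + exported(descriptor));
        System.out.println("concealed: " + concealed(descriptor));
    }
}
```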
So going forward my plan for using packages and modules together is:
as I'm writing code, group related concepts into packages. Then, if I
decide I want to distribute the code I'm working on, formalize it into
a module.
A quick question you can ask yourself is that while you would be using modules, and as you would group related concepts into packages, eventually how do you group related packages then?
One answer is the new kind of Java program component introduced for exactly this purpose: modules.
A module is a named, self-describing collection of code and data. Its
code is organized as a set of packages containing types, i.e.,
Java classes and interfaces; its data includes resources and other
kinds of static information.
Until now such collections were primarily packaged as JAR files that were distributed to be consumed as dependencies/libraries within other projects, but they are becoming a legacy format and differ quite a bit from modules. In one of my recent answers to Java 9 - What is the difference between "Modules" and "JAR" files? I have tried to detail the differences.

Ways to work around the lack of package access specifiers?

I'm new to Java. I've discovered, while trying to structure my code, that Java intimately ties source file organisation (directory structure) to package structure and package structure to external visibility of classes (a class is either visible to all other packages, or none).
This makes it quite difficult to organise the internal implementation details of my public library into logical units of related functionality while maintaining good encapsulation. JSR 294 explains it best:
Today, an implementation can be partitioned into multiple packages.
Subparts of such an implementation need to be more tightly coupled to
each other than to the surrounding software environment. Today
designers are forced to declare elements of the program that are
needed by other subparts of the implementation as public - thereby
making them globally accessible, which is clearly suboptimal.
Alternately, the entire implementation can be placed in a single
package. This resolves the issue above, but is unwieldy, and exposes
all internals of all subparts to each other.
So my question is, what workarounds exist for this limitation, and what are the pros & cons? Two are mentioned in the JSR: use packages for logical grouping (violating encapsulation), or place everything in a single package (unwieldy). Are there other pros/cons to these workarounds? Are there other solutions? (I've become vaguely aware of OSGi bundles, but I've found it hard to understand how they work and what the pros/cons might be; perhaps that's a con. It appears to be very intrusive to development & deployment compared to vanilla packages.)
Note: I'll upvote any good answers, but the best answer will be one that comprehensively folds in the pros & cons of others (plagiarise!).
Related (but not duplicate!) questions
Anticipating cries of 'Possible duplicate', here are similar questions that I've found on SO; I present them here for reference and also to explain why they don't answer my question.
Java : Expose only a single package in a jar file: asks how to do it, but given that it's not possible in current releases of Java, doesn't discuss workarounds. Has interesting pointers to forthcoming Modularization (Project Jigsaw) in Java 8.
Package and visibility - duplicate question of the above, basically.
Best practice for controlling access to a ".internal" package - question and answers seem to be specific to OSGi or Eclipse plug-ins.
Tools like ProGuard can be used to repackage a JAR, exposing only those classes you specify in the configuration file. (It does this in addition to optimizing, inlining, and obfuscating.) You might be able to set up ProGuard in e.g. a Maven or Ant build, so you write your library exposing methods as public, and then use ProGuard to eliminate them from the generated JAR.
I'll get the ball rolling. Steal this answer and add to it/correct it/elaborate please!
Use multiple packages for multiple logical groupings
Pros: effective logical grouping of related code.
Cons: when internal implementation detail classes in different packages need to use one another, they must be made public - even to the end user - violating encapsulation. (Work around this by using a standard naming convention for packages containing internal implementation details such as .internal or .impl).
Put everything in one package
Pros: effective encapsulation
Cons: unwieldy for development/maintenance of the library if it contains many classes
Use OSGi bundles
Pros: ? (do they fix the problem?)
Cons: appears to be very intrusive at development (for both library user and author) and deployment, compared to just deploying .jar files.
Wait for Jigsaw in Java 8
http://openjdk.java.net/projects/jigsaw/
Pros: fixes the problem for good?
Cons: doesn't exist yet, no specific release date known.
I've never found this to be a problem. The workaround (if you want to call it that) is called good API design.
If you design your library well, then you can almost always do the following:
Put the main public API in one package e.g. "my.package.core" or just "my.package"
Put helper modules in other packages (according to logical groupings), but give each one its own public API subset (e.g. a factory class like "my.package.foobarimpl.FoobarFactory")
The main public API package uses only the public API of helper modules
Your tests should also run primarily against the public APIs (since this is what you care about in terms of regressions or functionality)
To me the "right level of encapsulation" for a package is therefore to expose enough public API that your package can be used effectively as a dependency. No more and no less. It shouldn't matter whether it is being used by another package in the same library or by an external user. If you design your packages around this principle, you increase the chance of effective re-use.
Making parts of a package "globally accessible" really doesn't do any harm as long as your API is reasonably well designed. Remember that packages aren't object instances and as a result encapsulation doesn't matter nearly as much: making elements of a package public is usually much less harmful than exposing internal implementation details of a class (which I agree should almost always be private/protected).
Consider java.lang.String for example. It has a big public API, but whatever you do with the public API can't interfere with other users of java.lang.String. It's perfectly safe to use as a dependency from multiple places at the same time. On the other hand, all hell would break loose if you allowed users of java.lang.String to directly access the internal character array (which would allow in-place mutation of immutable Strings.... nasty!!).
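The String point is easy to check: the public API only ever hands out a copy of the characters, so nothing a caller does to that copy can reach the original. A tiny sketch (StringCopyDemo is a made-up name):

```java
// Hypothetical demo: String.toCharArray() returns a copy, so callers cannot mutate the String itself.
final class StringCopyDemo {

    /** Changes the first character of a copy of the string's characters and returns the result. */
    static String mutateCopy(String s) {
        char[] copy = s.toCharArray(); // a fresh array, not the String's internal storage
        if (copy.length > 0) {
            copy[0] = 'J';
        }
        return new String(copy);
    }

    public static void main(String[] args) {
        String original = "hello";
        String changed = mutateCopy(original);
        System.out.println(original + " / " + changed);
    }
}
```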
P.S. Honourable mention goes to OSGi because it is a pretty awesome technology and very useful in many circumstances. However its sweet spot is really around deployment and lifecycle management of modules (stopping / starting / loading etc.). You don't really need it for code organisation IMHO.

Why shouldn't we use the (default)src package?

I recently started using Eclipse IDE and have read at a number of places that one shouldn't use the default(src) package and create new packages.
I just wanted to know the reason behind this.
Using the default package may create namespace collisions. Imagine you're creating a library which contains a MyClass class. Someone uses your library in his project and also has a MyClass class in his default package. What should the compiler do? Package in Java is actually a namespace which fully identifies your project. So it's important to not use the default package in the real world projects.
Originally, it was intended as a means to ensure there were no clashes between different pieces of Java code.
Because Java was meant to be run anywhere, and over the net (meaning it might pick up bits from Sun, IBM or even Joe Bloggs and the Dodgy Software Company Pty Ltd), the fact that I owned paxdiablo.com (I don't actually but let's pretend I do for the sake of this answer) meant that it would be safe to call all my code com.paxdiablo.blah.blah.blah and that wouldn't interfere with anyone else, unless they were mentally deficient in some way and used my namespace :-)
From chapter 7, "Packages", of the Java Language Spec:
Programs are organized as sets of packages. Each package has its own set of names for types, which helps to prevent name conflicts.
I actually usually start by using the default package and only move it into a real package (something fairly easy to do with the Eclipse IDE) if it survives long enough to be released to the wild.
Java uses the package as a way to differentiate between classes. By using packages, you can have an org.example.Something class and an org.example.extended.Something class and be able to differentiate between them even though they are both named Something. Since their packages are different, you can use them both in the same project.
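The JDK itself has examples of this: java.util.Date and java.sql.Date share a simple name and can be used side by side as long as you spell out the package. A small sketch (SameSimpleName is a made-up name):

```java
// Hypothetical demo: two classes named Date, told apart only by their packages.
final class SameSimpleName {

    static boolean sameSimpleNameDifferentType(Class<?> a, Class<?> b) {
        return a.getSimpleName().equals(b.getSimpleName())
                && !a.getName().equals(b.getName());
    }

    public static void main(String[] args) {
        // Fully qualified names are needed because both simple names are "Date".
        java.util.Date utilDate = new java.util.Date(0L);
        java.sql.Date sqlDate = new java.sql.Date(0L);

        System.out.println(utilDate.getClass().getName());
        System.out.println(sqlDate.getClass().getName());
        System.out.println(sameSimpleNameDifferentType(utilDate.getClass(), sqlDate.getClass()));
    }
}
```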
By declaring a package you define your own namespace (for classes). This way, if you have two classes with the same name, the different package names (namespaces) differentiate which one you want to use.
The main reasons I can think of are:
It keeps things organised, which will help you (and others!) know where to look for classes/functionality.
You can define classes with the same name if they are in different packages.
Classes/etc in the default package cannot be imported into named packages. This means that in order to use your classes, other people will have to put all their classes in the default package too. This exacerbates the problems which reasons 1 & 2 solve.
From a Java point of view, there are two general dev/deploy lifecycles you can follow: either using Ant to build and deploy, or the Maven lifecycle. Both of these lifecycles look for source code and resources in local directories and, in the case of Maven, in defined repositories, either locally or on the net.
The point is, when you set up a project, for development and eventually deployment, you want to build a project structure that is portable and not dependent on the IDE, i.e. your project can be built and deployed using either of your build environments. If you depend heavily on the Eclipse framework for providing class variables, compile paths, etc., you may run into the problem that your project will only build and deploy using that configuration, and it may not be portable to another developer's environment, so to speak.

package structure & directory structure

In Java web application, what is the exact meaning of the term "package structure" and "directory structure" ? Aren't they the same? I saw some articles have these two terms, but I am not sure about the exact meaning and difference.
Package is a collection of code that changes together, is used together and is shipped together. So a jar/war is a package.
Package Design Principles
I understand that you meant source package, which is more like directory structure. But I believe, a directory is a physical representation on hard drive.
EDIT: I wrote the original answer more than 3 years ago and did not change it because it was accepted. I am changing it now so that any new visitor may benefit, and also to avoid link rot. Some additional meaning of package may be extracted based on the discussion below. For example, is a jar a package?
Classes that get reused together should be packaged together so that the package can be treated as a sort of complete product. Those that are reused together should be separated from the ones they are not reused with. For example, your logging utility classes are not necessarily used together with your file I/O classes, so package the logging classes separately. But logging classes could be related to one another, so create a sort of complete product for logging (for want of a better name, commons-logging), package it in a (re)usable jar, and create another separate complete product for I/O utilities (again for want of a better name, commons-io.jar). If you update the commons-io library to, say, support Java NIO, you may not necessarily want to make any changes to the logging library. So separating them is better.
Now, let's say you wanted your logging utility classes to support structured logging for some sort of log analysis by tools like Splunk. Some clients of your logging utility may want to update to your newer version; others may not. So when you release a new version, package all classes which are needed and reused together for migration. Then some clients can safely delete your old commons-logging jar and move to the commons-logging-new jar, while other clients stay on the older jar. However, no client should need to have both jars (new and old) just because you forced them to use some classes from the older packaged jar.
Avoid cyclic dependencies: a depends on b, b on c, c on d, but d depends on a. Such a scenario is obviously a deterrent, as it makes it very difficult to define layers or modules, and you cannot vary them independently of each other.
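If you keep a list of which package (or jar) depends on which, a cycle like a, b, c, d, a can be found mechanically with a depth-first search. The following is only a sketch with made-up names; real projects would more likely rely on a build or analysis tool for this.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: detects a dependency cycle with a depth-first search.
final class DependencyCycles {

    /** Returns true if following "depends on" edges can lead back to a node already being visited. */
    static boolean hasCycle(Map<String, List<String>> dependsOn) {
        Set<String> inProgress = new HashSet<>();
        Set<String> done = new HashSet<>();
        for (String node : dependsOn.keySet()) {
            if (visit(node, dependsOn, inProgress, done)) {
                return true;
            }
        }
        return false;
    }

    private static boolean visit(String node, Map<String, List<String>> dependsOn,
                                 Set<String> inProgress, Set<String> done) {
        if (done.contains(node)) {
            return false; // already fully explored, no cycle through here
        }
        if (!inProgress.add(node)) {
            return true; // reached a node that is still on the current path
        }
        for (String next : dependsOn.getOrDefault(node, List.of())) {
            if (visit(next, dependsOn, inProgress, done)) {
                return true;
            }
        }
        inProgress.remove(node);
        done.add(node);
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<String>> cyclic = Map.of(
                "a", List.of("b"), "b", List.of("c"), "c", List.of("d"), "d", List.of("a"));
        System.out.println(hasCycle(cyclic));
    }
}
```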
Also, you could package your classes such that if a layer or module changes, other module or layers do not have to change necessarily. So, for example, if you decide to go from old MVC framework to a rest APIs upgrade, then only view and controller may need changes; your model does not.
In most Java applications, the package structure should be matched by the directory structure for the .java and .class files. However these directories are part of a larger directory structure, including other data than the source and/or the bytecode.
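The mapping between the two is mechanical: each dot in the package name becomes a directory separator under the source (or class) root. A small sketch, with PackagePaths and the example names invented for illustration:

```java
// Hypothetical helper: derives the conventional source file location from a package and class name.
final class PackagePaths {

    /** Returns the path relative to the source root, using '/' as the separator. */
    static String sourcePath(String packageName, String simpleClassName) {
        String directory = packageName.isEmpty() ? "" : packageName.replace('.', '/') + "/";
        return directory + simpleClassName + ".java";
    }

    public static void main(String[] args) {
        System.out.println(sourcePath("com.example.app", "Foo"));
        System.out.println(sourcePath("", "Foo")); // default package: directly under the source root
    }
}
```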
Depending on the context, the "package structure" might also refer to delivery packages, each containing an application or a library.

Resolving java package dependencies

It is time to sub-divide a platform I'm developing and I'm looking for advice on how to handle cross-component dependencies. I suppose there are many cases, so I'll give an example.
I have an Address class that I want to make visible to developers. It is also referenced by classes in my.Contacts, my.Appointments, and my.Location packages - each of which I want to be separately compiled, jar-d, and delivered. Of course I want Address to be a single class - an Address works across these platform components transparently.
How should Address be packaged, built, and delivered?
Thanks!
Two thoughts:
Address sounds like a common component that can be used in different deliverables and so should be available in some common or core library
It may make sense for your components to talk to an Address interface, and the implementation can be provided separately (e.g. provide an Address interface and an AddressImpl implementation). This will reduce the amount of binding between the core library and the library your developers will develop.
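The second thought might look something like the sketch below. The method names (street, city) and the ContactCard class are invented for illustration; in a real layout Address would be a public interface in the common/core jar, and AddressImpl could live elsewhere.

```java
// Would be public and live in the shared core jar; kept in one file here for brevity.
interface Address {
    String street();
    String city();
}

// One possible implementation; components never need to name this class directly.
final class AddressImpl implements Address {
    private final String street;
    private final String city;

    AddressImpl(String street, String city) {
        this.street = street;
        this.city = city;
    }

    @Override
    public String street() {
        return street;
    }

    @Override
    public String city() {
        return city;
    }
}

// Hypothetical class from a component such as my.Contacts: it depends only on the interface.
final class ContactCard {
    private final Address address;

    ContactCard(Address address) {
        this.address = address;
    }

    String label() {
        return address.street() + ", " + address.city();
    }

    public static void main(String[] args) {
        System.out.println(new ContactCard(new AddressImpl("1 Main St", "Springfield")).label());
    }
}
```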
In this case Address is a part of a library which deserves its own jar. If you create a class named Address in each of my.Contacts, my.Appointments, and my.Location and you want to use all these jars in the same application, you'll have a conflict for your Address class.
I suggest you don't "Deliver" these jars separately. Java has very subtle versioning issues that you don't want to run into. Build everything together and package it into one or two jars and always deliver both jars, or build them together and deliver a subset of jars (but never combine new and old jars--don't just try to send a single jar as an update).
If you must build them separately be very aware that final constants are compiled in and not referenced--so if you change one and deliver a new jar, any references from an older jar will not be updated.
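To illustrate the constants point: a static final field initialised with a constant expression is copied into the classes that use it at compile time, whereas a field whose initialiser is not a constant expression (or a value exposed through a method) is looked up at run time. A sketch with made-up names; the difference only shows up when the jars are compiled separately, so this single file just shows the two forms side by side.

```java
// Hypothetical example class that would live in the jar being updated.
final class Limits {

    // Compile-time constant: javac copies the value 10 into every class that reads this field,
    // so callers compiled earlier keep the old value until they are recompiled.
    static final int MAX_INLINED = 10;

    // The initialiser is a method call, not a constant expression,
    // so callers read the field from this class at run time.
    static final int MAX_REFERENCED = Integer.valueOf(10);

    // An accessor method is another way to avoid inlining.
    static int max() {
        return MAX_INLINED;
    }

    public static void main(String[] args) {
        System.out.println(MAX_INLINED + " " + MAX_REFERENCED + " " + max());
    }
}
```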
Also method signatures that change will have strange, unpredictable results.
It sounds like you want a developer interface as well--that may be a set of interfaces and classes that reside in a separate jar. If you make that one jar well enough that you never have to rev it (and, of course, with no references to external constants) you can probably get away with not updating it which will keep your customer's extensions from getting crusty.
