I'm wondering about best practices when creating Javadocs. I have a project with many files, and the code has been written by many developers. Each file has an @author tag, so it is obvious who created a particular class.
But when some other developer adds new code to a file, modifies it, etc., how should he inform the rest of the team that he has created some new function or has modified existing code? In other words, how should we "keep the Javadocs compatible with reality"? ;)
Add his name to the existing @author tag? Then it is easier to identify whom to ask in case of any doubts.
Add an @author tag to each new method, inner class, etc.?
Of course, since we use SVN, it is easy to investigate who has made what, but for keeping things clear this Javadoc stuff should be taken into consideration as well.
What's the best way to use these @author tags?
I would say that for most purposes @author is unwanted noise. The user of your API shouldn't - and probably doesn't - care, or want to know, who wrote which parts.
And, as you have already stated, SVN already holds this information in a much more authoritative way than the code can. So if I were on the team, I would always prefer SVN's log and ignore the @author tags. I'd bet that the code will get out of sync with reality whatever policy you adopt. Following the Don't Repeat Yourself principle, why hold this information in two places?
If, however, there is some bureaucratic or policy reason that this information MUST be included in the code, have you considered automatically updating the @author tag on check-in? You could probably achieve this with an SVN hook. You could, for example, list all the developers who changed a given file in the order they changed it; or who changed it most; or whatever. Or, if the @author tag is mandated in (source) code you release to the outside world, you could consider adding it automatically as part of the release build (I suspect you could get this information out of SVN somehow).
As for adding more than a single class-level @author tag (or other comment), I'd say you'd be accumulating a lot of unhelpful noise. (Again, you have SVN.)
In my experience it is much more useful to identify a historical change (say, a change to a line of code, or a method) and then work out which change set it relates to (and which tracker ticket). Then you have the full context for the change: you have the ticket, the change set, you can find other change sets on the same ticket or around the same time, you can find related tickets, and you can see ALL the changes that formed that unit of work. You are never going to get this from annotations or comments in code.
You may want to consider why you want author tags in the source at all. The Apache Software Foundation does not use them, and I agree.
https://web.archive.org/web/20150226035235/www.theinquirer.net/inquirer/news/1037207/apache-enforces-the-removal-of-author-tags
To the best of my understanding, this is a cargo-cult way of working from the days when sources were printed on paper. With modern version control systems, this information and more can be found in the history anyway.
You can have more than one @author tag. If you make some big changes to a class, just add a new @author tag with your own name. There's no need to mark the changes you've made or to put your name around the changes, as the revision history should be able to display that clearly.
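For instance, a class-level Javadoc comment with more than one @author tag might look like this (the names and the parenthetical notes are of course placeholders):

```java
/**
 * Parses configuration keys for the reporting module.
 *
 * @author Jane Smith (original implementation)
 * @author John Doe (rewrote the validation logic)
 */
public class ConfigParser {

    /** Returns the given key with surrounding whitespace removed, in lower case. */
    public static String normalizeKey(String key) {
        return key.trim().toLowerCase();
    }
}
```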
In really big and long-running projects with lots of developers, it is useful to know who is responsible for a given piece of code, who can provide you with extra information, and so on. In that case it would be handy to have such information in the file, using the @author tag. Not to mark who created the file or who made some major contributions, but who is the contact person for that code. Those might be very different people, as the original author may already be on a different project or may have left the company years ago.
I think that on a huge project this approach may be handy, but there is a caveat. Keeping author information on every single file is very difficult, as there is a huge number of files, and sooner or later it will fail. More and more files will have outdated information, and developers will no longer trust the @author tag as a source of information and will just ignore it.
A solution which may work is not to keep @author on every single file, but only per module (high-level packages). Javadoc has a feature where you can document not only files but whole packages (see this question for more detail).
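As a sketch, package-level documentation lives in a package-info.java file at the root of the package, which makes a natural home for a module-level contact person (the package and name below are hypothetical):

```java
/**
 * Utility classes for XML handling.
 *
 * <p>Contact person for this package: Jane Smith (placeholder name).
 *
 * @author Jane Smith
 */
package com.example.util.xml;
```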
This is, however, a special case, and as long as your project is not that big or old, I recommend omitting the author information.
I completely agree that it is unnecessary and you probably shouldn't add it. However, I still add it. I see it like adding a signature to a painting, or stamping your name on a piece of metal inside a machine you helped design. You wrote that piece of code, and adding your name shows that you are proud of it and confident of its quality, even if it does nothing else. Even if it's changed in the future, you laid the foundations for everything built on top of it, and should it really be rewritten completely, the tag can be changed, removed or expanded. I agree that it is redundant thanks to version control, but having your name in version control isn't nearly as satisfying. If someone just adds a "final" or formats the code, their name will be in version control, even if they hardly contributed at all. I also agree that it is noise, in that it doesn't add anything to the code, but is it really even slightly noticeably annoying? I don't think so. If you are reasonable about it, I think it can make a project "more friendly", if that makes sense.
It is the year 2021, and I am replying to this question nearly 8 years after it was first published. The world is a slightly different place and it is using microservices at full throttle. Therefore I would sum up the overall mood around authorship like this: it is pointless. Let me explain in a few points:
Most widespread software and organisational projects are developed by multiple authors, not by individuals, anymore.
The majority of software is versioned in Git, and CI/CD and the cloud are everyday reality.
Advanced IDEs visualise code changes very well. Code changes are more important than the overall class source code.
Handling bugs caused by code changes is far more important than handling bugs caused by use of the wrong class version.
Unless the software is a library, IP/commercial software, or a framework with regular releases, authorship has no meaning.
It is highly probable that the authors of the source code you use or work on do not work, and never will work, in your organisation.
Maintaining an appropriate ratio of author contribution in the author declaration leads to additional effort with zero gain. Nobody wants that.
Therefore, knowledge of simple code edits and the corresponding line changes is more important than knowledge of the whole class.
That is my opinion on class authorship, digested into a short list.
It's very handy to have the @author tag, and to have multiple authors. Even Oracle's documentation outlines that it's good practice to have @author at the top of a class, to give credit to the particular author who did the work and to track down who to speak to during the development process. If there are multiple authors, they can be listed in the order they contributed to a particular Java file/class. Yes, with Git there are plugins that show an author's name next to each line of code, but this idea is controversial: sometimes multiple authors edit the same line of code, and the annotation does not show both of them. I have such a plugin enabled, and it never shows two authors' names for the same line. In big companies, it's handy to have this practice set up.
If it is company code, I would not do it: we have VCS. But if it is a blog post or a code snippet in my personal repo, I would proudly add it, hoping some copy-paste guy will find my code useful and accidentally copy my name as well :)
Just my type of humour, I guess.
Building off of this answer and digging through links from posts from 2004, I found this (pasted below). It's from Dirk-Willem van Gulik, who was one of the founders of the Apache Software Foundation, so I assume he had some input on the decision to stop using @author tags.
List: xml-cocoon-dev
Subject: Re: @author tags (WAS: RE: ASF Board Summary for February 18, 2004)
From: Dirk-Willem van Gulik <dirkx () asemantics ! com>
Date: 2004-02-27 10:33:32
Message-ID: 63E38432-6910-11D8-BA7E-000A95CDA38A () asemantics ! com
On Feb 27, 2004, at 12:45 AM, Conal Tuohy wrote:
I don't think the ASF should discourage developers from keeping useful
metadata about the code inside the source files. What better place to
put the metadata than in the code? This makes it more likely to be
used and kept up to date than if it was stored somewhere else, IMHO.
One way to look at this is that @author tags are in a way factually
'wrong'; in most cases it just signals which person wrote the first
skeleton of that code; but subsequently it was fixes, peer-reviewed
and looked at by a whole community. Also do not forget the many
people in your community which help with QA, Documentation,
user-feedback and so on. To put one person in the (hot) seat for
what is essentially a group effort is not quite right.
Looking through the CVS logs of a few tomcat files: each block of 30
lines seems to have had commits of at least 5 persons; with a median
of 6 and an average of 9. The average number of @author tags on
those arbitrary blocks is about 0.5. And that is not counting QA,
docs, suggestions of mailing lists, bug resolutions, user support.
I.e. those things which make tomcat such a great supported product.
Secondly what we 'sell' as the ASF brand is a code base which is peer
reviewed, quality controlled and created by a sustainable group which
will survive the coming and going of volunteers. One where knowledge
is generally shared and not just depended on one single individual.
This is one of the key reasons why large companies, governments, etc
have a lot less qualms about using apache than using most other open
source; we mitigate the worry that it depends on a single person,
and can implode or fork without warning, right from the get-go.
Finally - a lot of developers do live in countries where you can get
sued. The ASF can provide a certain level of protection; but this is
based on the KEY premisse that there is oversight and peer review.
That what we ship is a community product; and that everything is
backed by the community and cannot be attributed to a single person.
Every commit gets peer review; ever release requires +1s' and are
backed by the community as a whole. @author tags are by necessity
incomplete and thus portrait the situation inaccurately. Any hint or
suggestion that parts of the code are not a community product makes
defence more complex and expensive. We do not want to temp anyone -
but rather present a clean picture with no blemishes or easy go's.
And to give this a positive slant; be -proud- of this culture; be
proud of being part of something larger of incredible quality. Each
of you did not just write a few pesky lines of code surrounded by an
@author tag; but where instrumental in getting the -whole- thing work
! And if you are ever trying to understand why cocoon made it this
far, and other commercial/open-source projects did not, then do look
there; quality and a sense of long term stability.
Take Care, Have fun,
Dw
While tracking this down I came across a few blog posts against this and some in favor. That's a very small sample size but I think it is fair to say this was at least a controversial change -- some folks wanted them to stay. That said, in 2022 I don't really ever see them used. (Remember, this mail is from 2004.) You even mentioned it yourself about SVN history (but now maybe Git is the more common tool) and even in this mail they mention the CVS logs (another source control tool). Maybe the source control tools of today are easier to use, who knows.
I feel like there still might be some oddball use cases that make sense, but I don't think the standard idea of "I wrote (or modified) this so I'm putting myself as the @author" is necessary. I currently have a question on Code Review about using it for Creative Commons attribution, and (in my opinion) I think it's a good use, but I would hardly call that a well-accepted good practice (though I don't think it hurts).
My real question is about how to look up the expectations on methods (the 'contract' for a method) in Spring. I keep hitting questions where, unless I find some blogger or a Stack Overflow post that addresses that specific issue, there seems to be no informative documentation. Am I looking in the wrong places? Do I need to buy some book?
In the current specific case: I have lookup of a user/password working by making my SQL table map to Spring's defaults, but when a user is absent it hits a NullPointerException. I see JdbcUserDetailsManager's void setUserExistsSql(anSQLString), and I want to know whether that SQL string should return a boolean? a null? and what it should be 'named'. Googling is not turning up any usage examples, nor any documentation. The Javadocs I'm finding are uncommented. I can guess-and-test, but it seems there should be a better way to look it up?
OK, I've been working with Spring since version 1, and many other open-source projects follow the same pattern. Documentation is hard and expensive to produce, and programmers donating their time for free often don't want to write it. Spring, though, is one of the better projects as far as documentation is concerned.
However, I've always found it necessary to link Spring's source code into my project. If you're using Maven you can download the sources along with the jars, and tools like IntelliJ (and probably Eclipse) will let you drill down into the source and trace its execution with their debuggers.
With these types of projects it is almost always necessary at some point to drill down and read the source, and that's a good thing, because the source is always up to date and always describes exactly the behaviour you're trying to use. Documentation, on the other hand, is often badly written in an informal language (e.g. English) and can never describe anything precisely, especially if it's being written or read by someone who isn't a native speaker, which is often the case.
So, to answer your question -- look to the source.
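To illustrate what reading the source turns up here: as far as I recall, the userExistsSql query is expected to select the username column for a given username, and "exists" simply means the query returned at least one row (the default is along the lines of select username from users where username = ?, though you should verify that against your Spring version's source). A rough stdlib-only sketch of that contract, with an in-memory map standing in for the users table:

```java
import java.util.List;
import java.util.Map;

public class UserExistsSketch {
    // Stand-in for the users table; in Spring the query runs against your DB.
    private final Map<String, String> users;

    public UserExistsSketch(Map<String, String> users) {
        this.users = users;
    }

    // The userExistsSql contract (as I read it): return the matching
    // usernames; the user "exists" iff the result set is non-empty.
    public List<String> queryUsernames(String username) {
        return users.containsKey(username) ? List.of(username) : List.of();
    }

    public boolean userExists(String username) {
        return !queryUsernames(username).isEmpty();
    }
}
```

Note the query does not return a boolean or a null; emptiness of the result set is the signal, which is why an absent user should yield zero rows rather than a null value.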
I am starting a new project which might be open-sourced later on and/or at least get some external contributors during its life-time.
I am now thinking about what the best approach to code-style / auto-formatting would be. I am a strong supporter of only having auto-formatted code committed to a project, as this eliminates the differences between individual developers and helps keeping individual commits clutter-free of reformatting issues.
My first approach was to use Eclipse's built-in style for the project, but I really don't like the default style, because I think breaking lines at 80 characters is outdated for today's screen resolutions. Also, as the name suggests, it's available only to people using Eclipse as their IDE.
So I was also thinking about using my own formatter settings and checking the exported settings into the project's repository so that any contributor can pick them up. Again, this would force most people to use Eclipse, as I am not aware of any formatting definition that can be read by multiple IDEs.
Any hints on how this is handled in other projects? I searched some GitHub repositories, but it seems to me that this issue is more or less ignored by a lot of projects.
I do understand that this question may be border-line for Stack Overflow, as I don't know if a definite answer is possible and if this triggers a discussion, but it is something I often struggle with when starting a new project.
While screens grow wider, they don't seem to grow taller.
Whatever your other drivers are, preserve vertical space. Put { and } on lines containing other language keywords, if you can.
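For example, the usual Java convention already does this: the opening brace shares a line with the keyword, saving one line per block compared with putting each brace on its own line.

```java
public class BraceStyle {
    // Braces on the same lines as the keywords (standard Java convention):
    public static int sign(int x) {
        if (x > 0) {
            return 1;
        } else if (x < 0) {
            return -1;
        } else {
            return 0;
        }
    }
}
```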
In any case, use a Maven plugin or another automated tool in your build chain to enforce the rules you care about. That way they are unambiguous.
Also don't create too many rules that don't matter. Each rule costs time to make the code comply.
I understand your concern, and in my opinion the best approach is to create a code-formatting preference file which can be shared along with the project.
For example, in Eclipse: using a file explorer, navigate to //.settings and copy org.eclipse.jdt.core.prefs to a new location. This file contains all your formatting settings, so it can be shared to keep code formatting consistent.
Failing that, you might have to rely on editor-specific code formatting.
I definitely look forward to other expert opinions if what I have shared is not optimal for the requirement.
I've recently been more and more frustrated by a problem I see emerging in my project's code-base.
I'm working on a large-scale Java project that has >1M lines of code. The interfaces and class structure are designed very well, and the engineers writing the code are very proficient. The problem is that, in an attempt to make the code cleaner, people write utility classes whenever they need to reuse some functionality; as a result, over time and as the project grows, more and more utility methods crop up. However, when the next engineer comes across the need for the same functionality, he has no way of knowing that someone has already implemented a utility class (or method) somewhere in the code, and implements another copy of the functionality in a different class. The result is a lot of code duplication and too many utility classes with overlapping functionality.
Are there any tools or any design principles which we as a team can implement in order to prevent the duplication and low visibility of the utility classes?
Example: engineer A has 3 places he needs to transform XML to String so he writes a utility class called XMLUtil and places a static toString(Document) method in it. Engineer B has several places where he serializes Documents into various formats including String, so he writes a utility class called SerializationUtil and has a static method called serialize(Document) which returns a String.
Note that this is more than just code duplication, as it is quite possible that the two implementations of the above example are different (say, one uses the Transformer API and the other uses Xerces2-J), so this can be seen as a "best practices" problem as well...
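For reference, a version of engineer A's toString(Document) built only on the JDK's Transformer API could look like this (class and method names follow the example above; this is an illustrative sketch, not an endorsement of either engineer's design):

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

public final class XMLUtil {
    private XMLUtil() {}

    // Serialize a DOM Document to a String via an identity transform.
    public static String toString(Document doc) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        doc.appendChild(doc.createElement("root"));
        System.out.println(toString(doc));
    }
}
```

Engineer B's SerializationUtil.serialize(Document) could be byte-for-byte different while doing the same job, which is exactly the duplication problem described above.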
Update: I guess I better describe the current environment we develop in.
We use Hudson for CI, Clover for code coverage and Checkstyle for static code analysis.
We use agile development including daily talks and (perhaps insufficient) code reviews.
We define all our utility classes in a .util package which, due to its size, now has 13 sub-packages and about 60 classes under the root (.util) package. We also use 3rd-party libraries such as most of the Apache Commons jars and some of the jars that make up Guava.
I'm positive that we could halve the number of utilities if we put someone on the task of refactoring that entire package. I was wondering if there are any tools which can make that operation less costly, and if there are any methodologies which can keep the problem from recurring for as long as possible.
A good solution to this problem is to start adding more object-orientation. To use your example:
Example: engineer A has 3 places he needs to transform XML to String so he writes a utility class called XMLUtil and places a static toString(Document) method in it
The solution is to stop using primitive types or types provided by the JVM (String, Integer, java.util.Date, org.w3c.dom.Document) and wrap them in your own project-specific classes. Then your XmlDocument class can provide a convenient toString method and other utility methods. Your own ProjectFooDate can contain the parsing and formatting methods that would otherwise end up in various DateUtils classes, etc.
This way, the IDE will prompt you with your utility methods whenever you try to do something with an object.
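A minimal sketch of that idea, wrapping the JDK's DOM Document in a project-specific class (the class and factory-method names are hypothetical):

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

// Project-specific wrapper: the serialization logic lives on the type
// itself, so the IDE surfaces it via code completion instead of it
// hiding in some static XMLUtil class.
public final class XmlDocument {
    private final Document doc;

    public XmlDocument(Document doc) {
        this.doc = doc;
    }

    public static XmlDocument newWithRoot(String rootName) throws Exception {
        Document d = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        d.appendChild(d.createElement(rootName));
        return new XmlDocument(d);
    }

    @Override
    public String toString() {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```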
Your problem is a very common one. And a real problem too, because there is no good solution.
We are in the same situation here, well, I'd say worse, with 13 million lines of code, turnover, and more than 800 developers working on the code. We often discuss the very same problem that you describe.
The first idea, which your developers have already used, is to refactor common code into utility classes. Our problem with that solution, even with pair programming, mentoring and discussion, is that we are simply too many for it to be effective. In fact we grow in subteams, with people sharing knowledge within their subteam, but the knowledge doesn't travel between subteams. Maybe we are wrong, but I think that even pair programming and talks can't help in this case.
We also have an architecture team. This team is responsible for dealing with design and architecture concerns and for making the common utilities we might need. In fact this team produces something we could call a corporate framework. Yes, it is a framework, and sometimes it works well. The team is also responsible for pushing best practices and raising awareness of what should or shouldn't be done, and of what is or isn't available.
Good core Java API design is one of the reasons for Java's success. Good third-party open-source libraries count for a lot too. Even a small, well-crafted API offers a really useful abstraction and can help reduce code size a lot. But making a framework or public API is not at all the same thing as coding a utility class in 2 hours. It has a really high cost. A utility class costs 2 hours for the initial coding, maybe 2 days with debugging and unit tests. When you start sharing common code on big projects/teams, you are really making an API. You must then ensure perfect documentation and really readable and maintainable code. When you release a new version of this code, you must stay backward compatible. You have to promote it company-wide (or at least team-wide). From 2 days for your small utility class, you grow to 10, 20 or even 50 days for a full-fledged API.
And your API design may not be so great. Well, it is not that your engineers are not bright; indeed they are. But are you willing to let them work 50 days on a small utility class that just helps parse numbers in a consistent way for the UI? Are you willing to let them redesign the whole thing when you start using a mobile UI with totally different needs? Also, have you noticed how even the brightest engineers in the world make APIs that will never be popular or will fade slowly? You see, the first web project we made used only internal frameworks, or no framework at all. We then added PHP/JSP/ASP. Then in Java we added Struts. Now JSF is the standard. And we are thinking about using Spring Web Flow, Vaadin or Lift...
All I want to say is that there is no good solution; the overhead grows exponentially with code size and team size. Sharing a big codebase restricts your agility and responsiveness. Any change must be made carefully, you must think of all potential integration problems, and everybody must be trained on the new specifics and features.
But the main productivity point in a software company is not saving 10 or even 50 lines of code when parsing XML. Generic code to do this will grow to a thousand lines anyway and recreate a complex API that will be layered over by utility classes. When a developer makes a utility class for parsing XML, it is a good abstraction: he gives a name to a dozen or even a hundred lines of specialized code. This code is useful because it is specialized. The common API works on streams, URLs, strings, whatever; it has a factory so you can choose your parser implementation. The utility class is good because it works only with this parser and with strings, and because you need one line of code to call it. But of course, this utility code is of limited use: it works well for this mobile application, or for loading XML configuration. And that's why the developer added the utility class in the first place.
In conclusion, what I would consider, instead of trying to consolidate the code across the whole codebase, is to split code responsibility as the teams grow:
transform your big team working on one big project into small teams working on several subprojects;
ensure that the interfacing is good to minimize integration problems, but let each team have its own code;
inside these teams and their corresponding codebases, ensure you have the best practices. No duplicate code, good abstractions. Use existing proven APIs from the community. Use pair programming, strong API documentation, wikis... But you should really let different teams make their own choices and build their own code, even if this means duplicated code across teams or different design decisions. You know, if the design decisions are different, it may be because the needs are different.
What you are really managing is complexity. In the end, if you make one monolithic, very generic and advanced codebase, you increase the time newcomers need to ramp up, you increase the risk that developers will not use your common code at all, and you slow everybody down, because any change has a far greater chance of breaking existing functionality.
There are several agile/XP practices you can use to address this, e.g.:
talk with each other (e.g. during daily stand-up meeting)
pair programming/ code review
Then create, document & test one or several utility library projects which can be referenced. I recommend using Maven to manage dependencies/versions.
You might consider suggesting that all utility classes be placed in a well organized package structure like com.yourcompany.util.. If people are willing to name sub packages and classes well, then at least if they need to find a utility, they know where to look. I don't think there is any silver bullet answer here though. Communication is important. Maybe if a developer sends a simple email to the rest of the development staff when they write a new utility, that will be enough to get it on people's radar. Or a shared wiki page where people can list/document them.
Team communication (shout out "hey does someone have a Document toString?")
Keep utility classes to an absolute minimum and restrict them to a single namespace
Always think: how can I do this with an object. In your example, I would extend the Document class and add those toString and serialize methods to it.
This problem is helped by combining IDE code-completion features with languages which support type extensions (e.g. C# and F#). So, imagining Java had such a feature, a programmer could explore all the extension methods on a class easily within the IDE, like:
Document doc = ...
doc.to //list pops up with toXmlString, toJsonString, all the "to" series extension methods
Of course, Java doesn't have type extensions. But you could use grep to search your project for "all static public methods which take SomeClass as the first argument" to gain similar insight into what utility methods have already been written for a given class.
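In the same spirit, reflection can perform that "grep" within the JVM. A small sketch (the nested DocUtils class is a hypothetical stand-in for one of your utility classes, with StringBuilder playing the role of Document to keep the example self-contained):

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

public class UtilityFinder {

    // Hypothetical utility class standing in for a real project class.
    static class DocUtils {
        public static String toPrettyString(StringBuilder doc) { return doc.toString(); }
        public static int length(StringBuilder doc) { return doc.length(); }
        public static String unrelated(int x) { return "" + x; }
    }

    // List public static methods whose first parameter is the given type,
    // i.e. the "extension methods" already written for that type.
    public static List<String> findHelpersFor(Class<?> target, Class<?> holder) {
        List<String> hits = new ArrayList<>();
        for (Method m : holder.getDeclaredMethods()) {
            if (Modifier.isStatic(m.getModifiers())
                    && Modifier.isPublic(m.getModifiers())
                    && m.getParameterCount() > 0
                    && m.getParameterTypes()[0].equals(target)) {
                hits.add(m.getName());
            }
        }
        return hits;
    }
}
```

Pointed at your real .util package (via a classpath scanner), this would surface every existing helper for a class before someone writes a duplicate.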
It's pretty hard to build a tool that recognizes "same functionality". (In theory this is in fact impossible, and where you can do it in practice you likely need a theorem prover.)
But what often happens is that people clone code that is close to what they want, and then customize it. That kind of code you can find using a clone detector.
Our CloneDR is a tool for detecting exact and near-miss cloned code based on parameterized syntax trees. It matches parsed versions of the code, so it isn't confused by layout, changed comments, revised variable names, or in many cases inserted or deleted statements. There are versions for many languages (C++, COBOL, C#, Java, JavaScript, PHP, ...) and you can see examples of clone-detection runs at the provided link. It typically finds 10-20% duplicated code, and if you abstract that code into library methods on a religious basis, your code base can actually shrink (that has occurred with one organization using CloneDR).
You are looking for a solution that can help you manage this inevitable problem, so I can suggest a tool:
TeamCity: an amazingly easy-to-use product that manages all your automated code building from your repository, runs unit tests, etc.
It's even a free product for most people.
The even better part: it has built in code duplicate detection across all your code.
More stuff to read up:
Tools to detect duplicated code (Java)
A standard application utility project: build a jar with a restricted extensibility scope, packaged based on functionality.
Use common utilities like Apache Commons or Google Collections and provide an abstraction.
Maintain a knowledge base and documentation, plus JIRA tracking for bugs and enhancements.
Evolutionary refactoring.
FindBugs and PMD for finding code duplication or bugs.
Review and test utility tools for performance.
Util karma! Ask team members to contribute to the code base whenever they find a utility in the existing jungle of code, or need a new one.
I have some Java programs, and now I want to find out whether the code is modular or not, and if it is modular, to what extent, because modularity can never be a binary term, i.e. 0 or 1.
How do I decide that a particular piece of code is modular, and to what extent? I also want to know how to make code more modular.
Some Benchmarks for modularity:
How many times are you rewriting similar code for doing a particular task?
How much do you have to refactor your code when you change some part of your program?
Are the files small and easy to navigate through?
Are the application modules performing adequately and independently as and when required?
Is your code minimally disastrous? Does all hell break loose when you delete just one function or variable? Do you get 20-odd errors upon renaming a class? (To examine this, you can implement a stacking mechanism to keep track of all the hops in your application.)
How close is the code to natural-language usage? (i.e. modules and their subcomponents represent real-world objects, without much concern for net source file size)
For more ideas check out this blurb about modularity and this one on software quality
As for your concern on making your code more modular first you should ask yourself the above questions, obtain specific answers for them and then have a look at this.
The basic philosophy is to break down your application into code fragments that are as small as possible, arranged neatly across a multitude of easily understandable and accessible directory layouts.
Each method in your application must do no more than the minimum quanta of processing needed. Combining these methods into more and more macro level methods should lead you back to your application.
Key points are
Separation of concerns
Cohesion
Encapsulation (communicates via interface)
Substitutability
Reusability
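The "minimum quanta" idea above can be sketched with small single-purpose methods composed into a macro-level one (all names hypothetical):

```java
import java.util.Map;

public class ReportBuilder {
    // Each method does one minimal unit of work...
    static String header(String title) { return "# " + title + "\n"; }
    static String row(String key, String value) { return key + ": " + value + "\n"; }
    static String footer() { return "-- end --"; }

    // ...and the macro-level method is built purely from the smaller ones.
    static String build(String title, Map<String, String> data) {
        StringBuilder sb = new StringBuilder(header(title));
        data.forEach((k, v) -> sb.append(row(k, v)));
        return sb.append(footer()).toString();
    }
}
```

Each small method is independently testable and replaceable, which is exactly the separation-of-concerns and substitutability points listed above.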
A good example of such a module system is standard car parts, like disc brakes and the car stereo.
You don't want to build the car stereo from scratch when you are building cars. You'd rather buy it and plug it in. You also don't want the braking system affecting the car stereo — or worse, the car stereo affecting the brake system.
To answer your question, "How do I decide that particular code is modular up to this much extent," we can form questions to test the modularity. Can you easily substitute your modules with something else without affecting other parts of your application?
XML parsers could be another example. Once you obtain the DOM interface, you really don't care which implementation of XML parser is used underneath (e.g. Apache Xerces or JAXP).
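That substitutability is visible in the JAXP API itself: code written against the factory and the DOM interfaces never names a concrete parser, so the implementation can be swapped without touching this code.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ParserAgnostic {
    // Parse XML via the factory; the concrete parser (Xerces, the JDK's
    // built-in one, ...) is chosen at run time and never referenced here.
    public static String rootName(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTagName();
    }
}
```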
In Java, another question may be: is all functionality accessible via interfaces? Interfaces pretty much take care of low coupling.
Also, can you describe each module in your system with one sentence? For example, a car stereo plays music and radio. Disk brakes decelerate the vehicle safely.
(Here's what I wrote to What is component driven development?)
According to Wikipedia, Component-Based Development is an alias for Component-based software engineering (CBSE).
[It] is a branch of software engineering, the priority of which is the separation of concerns in respect of the wide-ranging functionality available throughout a given software system.
This is somewhat vague, so let's look at more details.
An individual component is a software package, or a module, that encapsulates a set of related functions (or data).

All system processes are placed into separate components so that all of the data and functions inside each component are semantically related (just as with the contents of classes). Because of this principle, it is often said that components are modular and cohesive.
So, according to this definition, a component can be anything as long as it does one thing really well and only one thing.
With regards to system-wide co-ordination, components communicate with each other via interfaces. [...]
This principle results in components referred to as encapsulated.
So this is sounding more and more like what we think a good API or SOA should look like.
The provided interfaces are represented by a lollipop and required interfaces are represented by an open socket symbol attached to the outer edge of the component in UML.
Another important attribute of components is that they are substitutable, so that a component could be replaced by another (at design time or run-time), if the requirements of the initial component (expressed via the interfaces) are met by the successor component.

Reusability is an important characteristic of a high quality software component. A software component should be designed and implemented so that it can be reused in many different programs.
Substitutability and reusability are what make a component a component.
So what's the difference between this and Object-Oriented Programming?
The idea in object-oriented programming (OOP) is that software should be written according to a mental model of the actual or imagined objects it represents. [...]

Component-based software engineering, by contrast, makes no such assumptions, and instead states that software should be developed by gluing prefabricated components together much like in the field of electronics or mechanics.
To answer your specific question of how to make the code more modular, a couple of approaches are:
One of the best tools for modularization is spotting code reuse. If you find that your code does the exact same (or a very similar) thing in more than one place, it's a good candidate for modularizing away.
Determine which pieces of logic can be made independent, in the sense that other logic could use them without needing to know how they are built. This is somewhat similar to what you do in OO design, although a module/component does not necessarily need to correspond to a modeled object as in OO.
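As a tiny illustration of spotting reuse (the names here are invented for the example): the same trim-and-lowercase logic appeared in two places, so it gets pulled into a single helper that both call sites use.

```java
public class Normalizer {

    // The extracted module: one definition instead of two copies
    // scattered through the codebase.
    static String normalize(String raw) {
        return raw.trim().toLowerCase();
    }

    // A former duplicate call site, now delegating to the helper.
    static boolean sameUser(String a, String b) {
        return normalize(a).equals(normalize(b));
    }

    public static void main(String[] args) {
        System.out.println(normalize("  Alice "));       // alice
        System.out.println(sameUser("ALICE", " alice")); // true
    }
}
```

Once the logic lives in one place, callers no longer need to know how normalization works — only that the helper does it.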
Hej,
See, "How to encapsulate software (Part 1)," here:
http://www.edmundkirwan.com/encap/overview/paper7.html
Regards,
Ed.
Since this has been tagged with 'osgi', I can throw in an OSGi-related perspective.
The short answer is that it is possible to go from completely spaghetti code to modular in small steps; it doesn't have to be a big bang. For example, even spaghetti code depends on some kind of bolognaise logging library, so in some sense, it's already modular, just with One Very Big Meatball (sorry, module) in it.
The trick is to break the big meatball into one smaller chunk and then a slightly less big meatball and then recurse. It doesn't all have to be done in one go either; simply chip off a bit more each time until there is nothing left to remove.
As for OSGi, it's still possible to put an uber-jar into a bundle. In fact, you can do this without changing the bits: either by modifying the MANIFEST.MF in place, or by wrapping that JAR in another JAR and specifying Bundle-ClassPath: meatball.jar in the manifest.
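A rough sketch of what such a wrapping manifest might look like (the symbolic name and package are placeholders; `meatball.jar` is the embedded uber-jar from the example above):

```
Manifest-Version: 1.0
Bundle-ManifestVersion: 2
Bundle-SymbolicName: com.example.meatball
Bundle-Version: 1.0.0
Bundle-ClassPath: meatball.jar
Export-Package: com.example.api
```

The Bundle-ClassPath header tells the OSGi runtime to load classes from the embedded JAR, so the original bits ship unchanged inside the bundle.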
Failing that, tools like BND can help generate the manifest metadata you'd need, and then the result can be dropped into an OSGi runtime easily enough. But beware of overly coupled code, and stuff that mucks around with classloaders - those will trip you up.
Assuming I understand your question correctly: you want to know what it is that makes code modular, since modules will obviously need some dependencies on each other to work at all. This is my answer:
If you can break your system down into modules, and you can test those modules in isolation, that is a good indication that a system is modular.
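To make the isolation test concrete, here is a hypothetical example (plain assertions stand in for a JUnit test so the sketch stays self-contained): the module can be exercised without booting a database, a UI, or the rest of the system.

```java
public class TaxCalculatorTest {

    // The module under test: no dependencies on the rest of the system.
    static class TaxCalculator {
        int taxCents(int amountCents, int percent) {
            return amountCents * percent / 100;
        }
    }

    public static void main(String[] args) {
        // Instantiating the module needs nothing else from the system -
        // that is the sign it is genuinely modular.
        TaxCalculator calc = new TaxCalculator();
        if (calc.taxCents(1000, 10) != 100) throw new AssertionError();
        if (calc.taxCents(0, 25) != 0) throw new AssertionError();
        System.out.println("ok");
    }
}
```

If setting up such a test drags in half the application, that is your indication the boundaries aren't modular yet.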
As you say modularity is not a binary thing so it depends on your relative definition.
I would say: can you use a given method in any program where you need to perform that function? Is it a "black box" where you wouldn't need to know what it is doing under the hood? If the answer is no, i.e., the method would only work properly in that one program, then it is not truly modular.
Modularity is relative to who ever is developing the code. But I think the general consensus is that modular code is code that has portions that can easily be swapped out without changing most of the original code.
IMHO, if you have three modules A, B, and C and you want to change or replace module C completely, and it is a SIMPLE task to do so, then you have modular code.
You can use a code analysis tool such as CAP to analyse the dependencies between types and packages. They'll help you find and remove any cyclic dependencies, which are often a problem when trying to develop modular code.
If there are no cyclic dependencies, you can start separating your code into discrete jars.
In general it is good practice to code to interfaces if you can; this generally means your code can be more easily refactored and/or used in different contexts.
Dependency injection frameworks such as Spring can also help with the modularity of your design. As types are injected with their dependencies by some external configuration process they don't need a direct dependency on an implementation.
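Here is a hand-wired sketch of what a DI container like Spring automates (the Repository/Service names are invented for the example): the service is handed its dependency through the constructor, so it never names a concrete implementation.

```java
// The abstraction the service depends on.
interface Repository {
    String find(int id);
}

// One possible implementation; others could be swapped in.
class InMemoryRepository implements Repository {
    public String find(int id) { return "record-" + id; }
}

// The service has no direct dependency on any implementation -
// it only knows the Repository interface.
class Service {
    private final Repository repo;

    Service(Repository repo) { this.repo = repo; } // dependency injected

    String describe(int id) { return "found " + repo.find(id); }
}

public class Wiring {
    public static void main(String[] args) {
        // The wiring step a DI container would do from external
        // configuration, done here by hand:
        Service service = new Service(new InMemoryRepository());
        System.out.println(service.describe(7)); // found record-7
    }
}
```

Because the choice of implementation lives only in the wiring step, replacing the repository (say, with a database-backed one) touches no service code — which is the modularity benefit the paragraph above describes.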
The package-by-feature idea helps to make code more modular.
Many examples seen on the web divide applications first into layers, not features
models
data access
user interface
It seems better, however, to divide applications up using top-level packages that align with features, not layers.
Here is an example of a web app that uses package-by-feature. Note the names of the top-level packages, which read as a list of actual features in the application. Note as well how each package contains all items related to a feature - the items aren't spread out all over the place; most of the time, they are all in a single package/directory.
Usually, deletion of a feature in such an app can be implemented in a single operation - deletion of a single directory.
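A hypothetical top-level layout in that style might look like this (package names invented for illustration):

```
com.example.app.useraccounts   // everything for the user-accounts feature
com.example.app.reports        // everything for the reports feature
com.example.app.search         // everything for the search feature
```

Deleting the search feature then really is a single operation: remove the one `search` directory, rather than hunting through `models`, `data access`, and `user interface` layers for its scattered pieces.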
I'm working on a Java library and would like to remove some functions from it. My reasons for this is public API and design cleanup. Some objects have setters, but should be immutable, some functionality has been implemented better/cleaner in different methods, etc.
I have marked these methods 'deprecated', and would like to remove them eventually. At the moment I'm thinking about removing them after a few sprints (two-week development cycles).
Are there any 'best practices' about removing redundant public code?
/JaanusSiim
Set a date and publicize it in the @deprecated tag. The amount of time given to the removal depends on the number of users your code has, how well connected you are with them, and the reason for the change.
If you have thousands of users and you barely talk to them, the time frame should probably be in the decades range :-)
If your users are your 10 coworkers and you see them daily, the time frame can easily be in the weeks range.
/**
 * @deprecated
 * This method will be removed after Halloween!
 * @see #newLocationForFunctionality
 */
Consider it this way: customer A downloads the latest version of your library or framework. He hits compile on his machine and suddenly sees thousands of errors, because the member or function no longer exists. From that point on, you've given the customer a reason not to upgrade to your new version and to stay with the old one.
Raymond Chen answers this best in his blog about the Win32 API.
Our experience in our software house has been that once the API has been written, we have to carry it to the end of the product life cycle. To help users move to new versions, we provide backwards compatibility with the old commands in the new framework.
It depends on how often the code is rebuilt. For example, if there are 4 applications using the library, and they are rebuilt daily, a month is long enough to fix the deprecated calls.
Also, if you use the deprecated tag, provide some comment on which code replaces the deprecated call.
Use the @deprecated tag. Read the Deprecation of APIs document for more info.
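In practice the Javadoc @deprecated tag is usually paired with the @Deprecated annotation, which makes the compiler warn at every call site. A minimal sketch (the class and method names are made up; the important part is the tag-plus-annotation pattern):

```java
public class Api {

    /**
     * @deprecated As of 2.0, replaced by {@link #newTotal(int[])}.
     */
    @Deprecated // compiler now warns anyone still calling this
    public static int oldTotal(int[] values) {
        // Delegate to the replacement so behavior stays identical
        // until the method is finally removed.
        return newTotal(values);
    }

    public static int newTotal(int[] values) {
        int sum = 0;
        for (int v : values) sum += v;
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(newTotal(new int[] {1, 2, 3})); // 6
    }
}
```

The Javadoc tag documents the replacement for readers; the annotation gets the message to users who never read the docs, via their build output.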
After everyone using the code tells you they have cleaned up on their side, start removing the deprecated code and wait and see if someone complains - then tell them to fix their own code...
Given that this is a library, consider archiving a version with the deprecated functions. Make this version available in both source code and compiled form, as a backup solution for those who haven't modernized their code to your new API. (The binary form is needed because even you may have trouble compiling the old version in a few years.) Make it clear that this version will not be supported and enhanced. Tag this version with a symbolic label in your version control system. Then move forward.
It certainly depends at which scale your API is used and what you promised upfront to your customers.
As described by Vinko Vrsalovic, you should set a date by which users must expect the function to be removed.
In production, if it's "just" a matter of getting cleaner code, I tend to leave things in place even past the deprecation date, as long as it doesn't break anything.
On the other hand in development I do it immediately, in order to get things sorted out quickly.
You may be interested in examples of how deprecation works in some other projects. For example, here follows what the policy in the Django project for function deprecation is:
A minor release may deprecate certain features from previous releases. If a feature in version A.B is deprecated, it will continue to work in version A.B+1. In version A.B+2, use of the feature will raise a PendingDeprecationWarning but will continue to work. Version A.B+3 will remove the feature entirely.
Too bad you are not using .NET :(
The built-in Obsolete attribute generates compiler warnings.