how many classes per package? methods per class? lines per method? - java

I have to give a general note to some huge Java project for which I have but little visibility and I was wondering if there were any guidelines for determining:
what number of classes per package can be considered right, too low, or too high (this project has 3.89 classes per package, which seems a bit too small for me),
number of methods per class? (this project has 6.54 methods per class...)
number of lines per method? (this project has about 7 lines per method (seems pretty good to me, maybe a bit low))
I should note that this question is only dealing with volumetry. I have a bunch of reports from quality tools (checkstyle, jdepend, cpd, pmd, ncss) that give me more vision about code redundancy, classes usage, bugs, etc.

Steve McConnell in his book Code Complete recommends about 7 methods per class and no more lines in a method than can be viewed in a single screen without scrolling.
I'm not sure about classes per package.
I would highly recommend reading Code Complete for more information on such topics.

I think stats like that are pretty useless. How does knowing the lines per method show whether it's any use to the project or not? I think you should be looking more along the lines of:
Do your packages encompass like classes?
Do your classes work as an entity on their own?
Do the methods within the classes function correctly and efficiently?
Surely, other than memory usage, it doesn't matter whether the method is large or not? The other thing to look for in very protracted methods is whether the stack trace is going to be bigger than adding that functionality to a parent method. I'd be wary of measuring a project's success based on the lines of code.

Robert C. Martin, who recently released the book "Clean Code", states that the number of lines per method should be the absolute smallest possible. Between 1 and 7 lines is a good rule of thumb.
There's also a good point being made in the book The ThoughtWorks Anthology, in the essay "Object Calisthenics" by Jeff Bay. He suggests 9 pretty hardcore constraints that will make you a better OO developer in the long run. Read more about them here.
To answer your specific questions, these are the constraints that apply specifically to you:
- No more than 10 classes per package
- A maximum of 50 lines per class
These constraints might not be ideal for all of your real projects, but using them in a small (hobby?) project will force you into a better practice.

Unfortunately, there's no absolute (objective) notion of quality in software. Thus, there's no "right" value for these. However, here are two (personal) observations:
3.89 classes/package is very low. It means you'll be battling through a complicated tree of packages.
7 lines per method: Indeed sounds good. However, if this number was arrived at as a result of an intentional effort to reduce the line count of methods, then you might have ended up with a single logical task being spread around several private methods, which will make it more difficult to understand the class (in certain cases). Actually, in Code Complete 2, the author cites research which found that a method's length is much less important than its cyclomatic complexity and its nesting level.

A useful design guideline says that each class should only do one thing and do it well. This will not give you a fixed number of methods per class, but it will limit the number and make the class easier to comprehend and maintain.
For methods you can adopt a similar view and aim for methods that are as small as possible, but no smaller. Think of it this way: if you can split the method into two or more distinct parts it is obviously not as small as it could be. Small methods are easy to understand and by splitting the code like this you will get a better overview in high level methods and push details to low level methods.
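To make that a bit more concrete, here is a minimal sketch. The `InvoiceFormatter` class and all of its method names are made up for illustration and are not taken from any project discussed here. The high-level method reads like an outline, and the details sit in small helpers:

```java
import java.util.List;

// Hypothetical example: all names are illustrative only.
class InvoiceFormatter {

    // High-level method: reads like an outline of the task.
    String format(String customer, List<Integer> pricesInCents) {
        return header(customer) + body(pricesInCents) + footer(total(pricesInCents));
    }

    // Low-level helpers: each one handles a single detail.
    private String header(String customer) {
        return "Invoice for " + customer + "\n";
    }

    private String body(List<Integer> pricesInCents) {
        StringBuilder sb = new StringBuilder();
        for (int price : pricesInCents) {
            sb.append("  item: ").append(price).append("\n");
        }
        return sb.toString();
    }

    private int total(List<Integer> pricesInCents) {
        int sum = 0;
        for (int price : pricesInCents) {
            sum += price;
        }
        return sum;
    }

    private String footer(int totalInCents) {
        return "Total: " + totalInCents + "\n";
    }
}
```

Whether four helpers is the right split here is a judgment call; the point is only that `format` can be understood without reading the loops.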

(note: tl;dr available at the very bottom for my real opinion)
I'm not going to quote any big name and say that's the right answer, because how you do all this stuff is always very case-dependent. For example the number of methods: If you're making control software for a modern HD LCD TV's remote controller which has about 40-50 buttons, how can you break that down into classes coherently so that you only have, say, 7 methods per class?
Personally I like to keep all the methods of one accessor level in one class, which means some utility classes may end up having hundreds of methods, but in my opinion it's easier to do something like StringUtil.escapeXMLspecialCharacters(someString) than StringUtil.XML.escapeSpecialCharacters(someString) or XMLUtil.escapeSpecialCharacters(someString). While these all are seemingly OK solutions, the first one thrives (at least in my mind, that is!) because it's a simple and very easy way to access that method: You don't have to think if the string you're handling contains XML or XHTML or JSON or whatever, you'll just pick one method from the general group of methods and that's it.
Keeping on the previous TV remote analogy, let's assume you do split them to various classes anyway. If we allow ourselves to have 7 such methods per class on average and manage to group the buttons on the remote into sensible groups like MenuButtons, AdjustmentButtons and NumberSelectorButtons, we end up with 8 or so classes. That's not a bad thing actually, but it easily gets slightly confusing, especially if they're not divided into sensible groups with great care. Just imagine the rants around your TVRemotes'R'Us Inc. office: "Who says the power on/off button is a control button?" "Who's the joker who put volume +/- to menu buttons? PRE/CH (the button which switches between current and previous channel and/or image source) button isn't a number button!" "The guide button opens both tv guide AND navigational menu depending on context, what are we going to do with it!?"
So as you can hopefully see from this example, using some arbitrary number to limit yourself could introduce some unneeded complexity and break the logical flow of the application.
Before I throw in my last two cents, one thing about the number of lines per method: Think of code as blocks. Each loop is a block, each conditional is a block and so on and so forth. What is the minimum number of these blocks needed for a unit of code which has a single responsibility? That should be your limiter, not the desire to have "seven everywhere" for the number of classes in a package, methods in a class and lines of code in a method.
And here's the TL;DR:
So, my real opinion is actually this: The number of classes in package should be fairly low. I've been lately starting to do the following but I'm not sure if I'll keep up to it:
Package foo contains interfaces and other common classes for implementations.
Package foo.bar contains implementation of said interfaces for function bar
Package foo.baz contains implementation of said interfaces for function baz
This usually means my whole structure has a coherent (and most likely low) number of classes and by reading the top level class interfaces (and their comments) I should be able to understand the other packages too.
Methods per class: All which are needed as I explained above. If your class can't live without 170 methods, then let it have them. Refactoring is a virtue, not something that can be applied all the time.
Lines per method: As low as possible, I usually end up with 10 to 25 lines per method and 25 is a bit high for me so I'd say 10 is a good balance point for that.

Related

How to apply an arbitrary number of rules to objects (that may have interactions between rules that cannot be determined ahead of time)?

Suppose that I have a game that has fairly simple rules, but may have any number of rules modifiers. (For boardgames, the classical example would be Cosmic Encounter). The base rules are easy to program, but you cannot write all the conditional logic at the time. For example, if your game has 100 possible modifiers (but say only 5 are active at a time) that's 100 choose 5 possibilities and you can't enumerate them manually. So how is this done?
(This could also be for any business logic rules engine, but games are where I seem to encounter this more often).
I would think you might be able to do this by having "state" objects and then passing them through a series of modifier objects, but I'm not sure this is the right way of doing it. What is the technique / style for this? What are some good primers/examples?
(I can program well enough, but with my EE degree have only whatever theory I've picked up on the job....).
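For what it's worth, the "state object passed through a series of modifier objects" idea from the question could be sketched roughly like this. `GameState`, `RuleModifier` and `RuleEngine` are hypothetical names, and this is only one possible shape, not a full rules engine:

```java
import java.util.List;

// Hypothetical sketch: an immutable state is passed through the active modifiers in order.
final class GameState {
    final int attack;
    final int defense;

    GameState(int attack, int defense) {
        this.attack = attack;
        this.defense = defense;
    }
}

// Each modifier only knows how to transform a state; it knows nothing about the other modifiers.
interface RuleModifier {
    GameState apply(GameState state);
}

final class RuleEngine {
    // Applies the active modifiers one after another; the order of the list matters.
    static GameState resolve(GameState base, List<RuleModifier> active) {
        GameState current = base;
        for (RuleModifier modifier : active) {
            current = modifier.apply(current);
        }
        return current;
    }
}
```

With this shape only the modifiers that are actually active get chained, so nobody has to enumerate the combinations by hand. Interactions between modifiers then show up as ordering questions (does "double attack" run before or after "add shield"?), which this sketch leaves to whoever builds the list.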

Why do we need getters?

I have read the Stack Overflow page which discusses "Why use getters and setters?", and I have been convinced by some of the reasons for using a setter, for example: later validation, data encapsulation, etc. But what is the reason for using getters anyway? I don't see any harm in getting the value of a private field, or any reason to validate before you get a field's value. Is it OK to never use a getter and always get a field's value using dot notation?
If a given field in a Java class is visible for reading (on the RHS of an expression), then it must also be possible to assign to that field (on the LHS of an expression), unless the field is declared final. For example:
class A {
    int someValue;
}
A a = new A();
int value = a.someValue; // if you can do this (potentially harmless)
a.someValue = 10; // then you can also do this (bad)
Besides the above problem, a major reason for having a getter in a class is to shield the consumer of that class from implementation details. A getter does not necessarily have to simply return a value. It could return a value distilled from a Collection or something else entirely. By using a getter (and a setter), we free the consumer of the class from having to worry about the implementation changing over time.
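As a small illustration of a getter that does not simply return a field, here is a sketch in which the value is distilled from a collection. The `Order` class is hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class: callers only see getTotal(), not how the total is stored or derived.
class Order {
    private final List<Integer> linePrices = new ArrayList<>();

    void addLine(int price) {
        linePrices.add(price);
    }

    // Looks like a plain getter from the outside, but the value is computed from the collection.
    int getTotal() {
        int total = 0;
        for (int price : linePrices) {
            total += price;
        }
        return total;
    }
}
```

If the class later caches the total in a field, or stores the lines differently, code calling `getTotal()` does not need to change.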
I want to focus on practicalities, since I think you're at a point where you haven't seen the conceptual benefits line up just yet with the actual practice.
The obvious conceptual benefit is that setters and getters can be changed without impacting the outside world using those functions. Another Java-specific benefit is that all methods not marked as final are capable of being overridden, so you get the ability for subclasses to override the behavior as a bonus.
Overkill?
Yet you're probably at a point where you've heard these conceptual benefits before and it still sounds like overkill for your more daily scenarios. A difficult part of understanding software engineering practices is that they are generally designed to deal with very real world, large-scale codebases being managed by teams of developers. A lot of things are going to seem like overkill initially when you're just working on a small project of your own.
So let's get into some practical, real-world scenarios. I formerly worked in a very large-scale codebase. It was a low-level C codebase with a long legacy and sometimes barely a step above assembly, but many of the lessons I learned there translate to all kinds of languages.
Real-World Grief
In this codebase, we had a lot of bugs, and the majority of them related to state management and side effects. For example, we had cases where two fields of a structure were supposed to stay in sync with each other. The range of valid values for one field depended on the value of the other. Yet we ran into bugs where those two fields were out of sync. Unfortunately since they were just public variables with a very global scope ('global' should really be considered a degree with respect to the amount of code that can access a variable rather than an absolute), there were potentially tens of thousands of lines of code that could be the culprit.
As a simpler example, we had cases where the value of a field was never supposed to be negative, yet in our debugging sessions, we found negative values. Let's call this value that's never supposed to be negative, x. When we discovered the bugs resulting from x being negative, it was long after x was touched by anything. So we spent hours placing memory breakpoints and trying to find needles in a haystack by looking at all possible places that modified x in some way. Eventually we found and fixed the bug, but it was a bug that should have been discovered years earlier and should have been much less painful to fix.
Such would have been the case if large portions of the codebase weren't just directly accessing x and used functions like set_x instead. If that were the case, we could have done something as simple as this:
void set_x(int new_value)
{
    assert(new_value >= 0);
    x = new_value;
}
... and we would have discovered the culprit immediately and fixed it in a matter of minutes. Instead, we discovered it years after the bug was introduced and it took us meticulous hours of headaches to trace it down and fix.
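Since the question is about Java, here is roughly what the same idea as `set_x` might look like there. The `Measurement` class is hypothetical, and an exception is used instead of `assert` because Java assertions are disabled unless the JVM is started with `-ea`:

```java
// Hypothetical Java counterpart of set_x: every write goes through one checked method.
class Measurement {
    private int x;

    void setX(int newValue) {
        // Fail at the faulty call site instead of long after the bad write happened.
        if (newValue < 0) {
            throw new IllegalArgumentException("x must not be negative: " + newValue);
        }
        x = newValue;
    }

    int getX() {
        return x;
    }
}
```

The stack trace of the exception then points straight at the caller that tried to store the negative value.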
Such is the price we can pay for ignoring engineering wisdom, and after dealing with the 10,000th issue which could have been avoided with a practice as simple as depending on functions rather than raw data throughout a codebase, if your hairs haven't all turned grey at that point, you're still generally not going to have a cheerful disposition.
The biggest value of getters and setters comes from the setters. It's the state manipulation that you generally want to control the most to prevent/detect bugs. The getter becomes a necessity simply as a result of requiring a setter to modify the data. Yet getters can also be useful sometimes when you want to exchange a raw state for a computation non-intrusively (by just changing one function's implementation).
Interface Stability
One of the most difficult things to appreciate earlier in your career is going to be interface stability (to prevent public interfaces from changing constantly). This is something that can only be appreciated with projects of scale and possibly compatibility issues with third parties.
When you're working on a small project on your own, you might be able to change the public definition of a class to your heart's content and rewrite all the code using it to update it with your changes. It won't seem like a big deal to constantly rewrite the code this way, as the amount of code using an interface might be quite small (ex: a few hundred lines of code using your class, and all code that you personally wrote).
When you work on a large-scale project and look down at millions of lines of code, changing the public definition of a widely-used class might mean that 100,000 lines of code need to be rewritten using that class in response. And a lot of that code won't even be your own code, so you have to intrusively analyze and fix other people's code and possibly collaborate with them closely to coordinate these changes. Some of these people may not even be on your team: they may be third parties writing plugins for your software or former developers who have moved on to other projects.
You really don't want to run into this scenario repeatedly, so designing public interfaces well enough to keep them stable (unchanging) becomes a key skill for your most central interfaces. If those interfaces are leaking implementation details like raw data, then the temptation to change them over and over is going to be a scenario you can face all the time.
So you generally want to design interfaces to focus on "what" they should do, not "how" they should do it, since the "how" might change a lot more often than the "what". For example, perhaps a function should append a new element to a list. However, you may want to swap out the list data structure it's using for another, or introduce a lock to make that function thread safe ("how" concerns). If these "how" concerns are not leaked to the public interface, then you can change the implementation of that class (how it's doing things) locally without affecting any of the existing code that is requesting it to do things.
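A minimal sketch of the list example, with a hypothetical `EventLog` class: the public methods say "what" (append an entry, report the size), while the choice of list and the locking are "how" details that stay private.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class: the method signatures are the "what", everything else is the "how".
class EventLog {
    // Could be swapped for a LinkedList or another structure without touching any caller.
    private final List<String> entries = new ArrayList<>();

    // Locking is an implementation detail; adding or removing it does not change the signature.
    synchronized void append(String entry) {
        entries.add(entry);
    }

    synchronized int size() {
        return entries.size();
    }
}
```

Had `entries` been a public field, every caller would be tied to it being an `ArrayList`, and none of them would go through the lock.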
You also don't want classes to do too much and become monolithic, since then your class variables will become "more global" (become visible to a lot more code even within the class's implementation) and it'll also be hard to settle on a stable design when it's already doing so much (the more classes do, the more they'll want to do).
Getters and setters aren't the best examples of such interface design, but they do avoid exposing those "how" details at least slightly better than a publicly exposed variable, and thus have fewer reasons to change (break).
Practical Avoidance of Getters/Setters
Is it OK to never use a getter and always get a field's value using dot notation?
This could sometimes be okay. For example, if you are implementing a tree structure and it utilizes a node class as a private implementation detail that clients never use directly, then trying too hard to focus on the engineering of this node class is probably going to start becoming counter-productive.
There your node class isn't a public interface. It's a private implementation detail for your tree. You can guarantee that it won't be used by anything more than the tree implementation, so there it might be overkill to apply these kinds of practices.
Where you don't want to ignore such practices is in the real public interface, the tree interface. You don't want to allow the tree to be misused and left in an invalid state, and you don't want an unstable interface which you're constantly tempted to change long after the tree is being widely used.
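Here is a rough sketch of that split, using a hypothetical `IntTree` (an unbalanced binary search tree, chosen only to keep the example short). The private `Node` uses plain fields, while the public methods guard the tree's invariants:

```java
// Hypothetical sketch: Node is a private detail, so its fields are accessed directly.
class IntTree {
    private static final class Node {
        int value;
        Node left;
        Node right;

        Node(int value) {
            this.value = value;
        }
    }

    private Node root;
    private int size;

    // The public interface is where the invariants (ordering, no duplicates, size) are protected.
    boolean insert(int value) {
        if (root == null) {
            root = new Node(value);
            size++;
            return true;
        }
        Node current = root;
        while (true) {
            if (value == current.value) {
                return false; // duplicates are ignored
            }
            if (value < current.value) {
                if (current.left == null) {
                    current.left = new Node(value);
                    size++;
                    return true;
                }
                current = current.left;
            } else {
                if (current.right == null) {
                    current.right = new Node(value);
                    size++;
                    return true;
                }
                current = current.right;
            }
        }
    }

    boolean contains(int value) {
        Node current = root;
        while (current != null) {
            if (value == current.value) {
                return true;
            }
            current = value < current.value ? current.left : current.right;
        }
        return false;
    }

    int size() {
        return size;
    }
}
```

Nothing outside `IntTree` can even name `Node`, so there is no caller to protect its fields from.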
Another case where it might be okay is if you're just working on a scrap project/experiment as a kind of learning exercise, and you know for sure that the code you write is rather disposable and is never going to be used in any project of scale or grow into anything of scale.
Nevertheless, if you're very new to these concepts, I think it's a useful exercise even for your small scale projects to err on the side of using getters/setters. It's similar to how Mr. Miyagi got Daniel-San to paint the fence, wash the car, etc. Daniel-San finds it all pointless with his arms exhausted on top of that. Then Mr. Miyagi goes "hyah hyah hyoh hyah" throwing big punches and kicks, and using that indirect training, Daniel-San blocks all of them without realizing how he's even doing it.
In Java you can't tell the compiler to allow read-only access to a public, non-final field from outside.
So exposing public fields opens the door to uncontrolled modifications.
Fields are not polymorphic.
The alternative to a getter would be a public field; however, fields are not polymorphic.
This means that you cannot extend the class and "override" the field without introducing weird behaviour. Basically, the value you get will depend on how you refer to the field.
Furthermore, you can't include the field in an interface and you can't perform validation (that applies more to a setter).
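The "value depends on how you refer to the field" behaviour can be seen in a small sketch. `Base` and `Derived` are hypothetical classes:

```java
// Hypothetical classes showing that fields are hidden, not overridden.
class Base {
    String name = "base";

    String getName() {
        return name;
    }
}

class Derived extends Base {
    String name = "derived"; // hides Base.name instead of overriding it

    @Override
    String getName() {
        return name; // refers to Derived.name
    }
}
```

Given `Base ref = new Derived();`, the expression `ref.name` is resolved against the declared type and yields "base", while `ref.getName()` is dispatched on the runtime type and yields "derived".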

How to format methods in a class

I have about 12 methods in my class. My question is: is there any formatting style, such as "the called method has to be written next to the caller method"? Is there any standard for the maximum number of methods per class?
I would suggest reading Robert C. Martin's thoughts on this in his book Clean Code. He writes that a class should be readable like an article or a page of a book, so you should preferably keep methods close to the methods they call. Of course it is impossible to keep everything this way, but you can head towards it. This eliminates the need to browse big sources frequently. For maximum methods Fowler has some rules also, but it really depends on the class. In short: keep methods and classes as small as possible.
It is impossible to keep calling and called methods next to each other, most obviously due to the fact that they can be in different classes.
There are no standards that would say "you can't have over 20 methods in a class", since it's not something that you can standardize (or rather it wouldn't make sense). With experience you'll learn to see if a class has too many methods (one indication would be that a class seems to be responsible for 2 different things, in which case you'd refactor the class into 2 different classes).
Generally speaking, I think 12 methods are too many for a class. You should think it over: if there are too many methods, maybe the class goes against object-oriented thinking. If you are comfortable with a specific language, you can get to know some design patterns such as MVC; maybe that will give you some ideas.

Huge java class for android project, how to split it up?

In C++ I'm used to being able to split classes up into multiple files using the scope resolution operator (::), but in java it seems impossible to split a class between multiple files.
I've read that classes shouldn't be more than a few hundred lines, but that sounds like ideological nonsense from people who don't write significant applications.
I am writing an industrial Android application (not for consumers, for technicians using professional test and measurement equipment in conjunction with the app linked via bluetooth) and several of my android activities are more than 1000 lines long and I'm not even close to being finished. The primary activity is over 6000 lines long and I expect it to become much longer still... It's becoming very unwieldy and, like I said, in C++ I would just logically split the class among multiple source files, but I guess that's not an option here.
Is there any alternative that I am overlooking to reduce the length of my source files without actually cutting out code (which is not an option...)?
It's acceptable to have a class with more than a hundred lines (most of my classes I write are more than 100 lines long). The thing is, dealing with an Object Orientated language, one should categorize as much as possible, to make the code maintainable, intuitive and readable. Good luck with your app!
You can subclass your class. Make a base class with core functionality and subclass it. So you can have several files and at the end you have just one class you can use.
You can still use composition and implementation techniques. A Java package is roughly analogous to a C++ namespace.
Why don't you try to separate common things in different classes, and then use instances of those classes on your Activities? Something like a command pattern, but a bit simpler.
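A rough sketch of that suggestion in plain Java follows. All class names are hypothetical, and the Android parts are left out: in a real app `MainScreen` would extend an Activity and `DeviceConnection` would wrap the actual Bluetooth code.

```java
// Hypothetical sketch in plain Java; Android and Bluetooth specifics are deliberately omitted.
class DeviceConnection {
    private boolean connected;

    void connect() {
        connected = true; // real connection handling would live here
    }

    boolean isConnected() {
        return connected;
    }
}

class MeasurementFormatter {
    String format(double volts) {
        return volts + " V";
    }
}

// The screen class only coordinates its collaborators instead of containing all the logic itself.
class MainScreen {
    private final DeviceConnection connection = new DeviceConnection();
    private final MeasurementFormatter formatter = new MeasurementFormatter();

    String showReading(double volts) {
        if (!connection.isConnected()) {
            connection.connect();
        }
        return formatter.format(volts);
    }
}
```

Each collaborator lives in its own source file, so the screen class shrinks without any code being cut.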

Coupling/Cohesion

Whilst there are many good examples on this forum that contain examples of coupling and cohesion, I am struggling to apply it to my code fully. I can identify parts in my code that may need changing. Would any Java experts be able to take a look at my code and explain to me what aspects are good and bad. I don't mind changing it myself at all. It's just that many people seem to disagree with each other and I'm finding it hard to actually understand what principles to follow...
First, I'd like to say that the primary reason you get such varying answers is that this really does become an art over time. Many of the opinions you get don't boil down to a hard fast rule or fact, more it comes down to general experience. After 10-20 years doing this, you start to remember what things you did that caused pain, and how you avoided doing them again. Many answers work for some problems, but it's the individual's experience that determines their opinion.
There is really only 1 really big thing I would change in your code. I would consider looking into what's called the Command Pattern. Information on this shouldn't be difficult to find either on the web or in the GoF book.
The primary idea is that each of your commands ("add child", "add parent") becomes a separate class. The logic for a single command is enclosed in a single small class that is easy to test and modify. That class should then be "executed" to do the work from your main class. In this way, your main class only has to deal with command line parsing, and can lose most of its knowledge of a FamilyTree. It just has to know which command line maps to which Command class and kick them off.
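A minimal sketch of that structure is below. The asker's real `FamilyTree` is not shown here, so the `FamilyTree` in this example is a hypothetical stand-in with two lists, and `Command`, `AddChildCommand`, `AddParentCommand` and `CommandLineDispatcher` are illustrative names:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the asker's FamilyTree class.
class FamilyTree {
    final List<String> parents = new ArrayList<>();
    final List<String> children = new ArrayList<>();
}

interface Command {
    void execute(FamilyTree tree, String argument);
}

// One small class per command, each easy to test on its own.
class AddChildCommand implements Command {
    @Override
    public void execute(FamilyTree tree, String argument) {
        tree.children.add(argument);
    }
}

class AddParentCommand implements Command {
    @Override
    public void execute(FamilyTree tree, String argument) {
        tree.parents.add(argument);
    }
}

// The main class only maps command names to Command objects and kicks them off.
class CommandLineDispatcher {
    private final Map<String, Command> commands = new HashMap<>();

    CommandLineDispatcher() {
        commands.put("add child", new AddChildCommand());
        commands.put("add parent", new AddParentCommand());
    }

    boolean dispatch(String name, String argument, FamilyTree tree) {
        Command command = commands.get(name);
        if (command == null) {
            return false; // unknown command
        }
        command.execute(tree, argument);
        return true;
    }
}
```

Adding a new command then means adding one class and one `put` line, without touching the existing commands.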
That's my 2 cents.
I can recommend Alan Shalloway and James Trott's book Design Patterns Explained: A New Perspective on Object-Oriented Design (ISBN-13: 978-0321247148):
It's a great book about has-a and is-a decisions, including cohesion and coupling in object-oriented design.
In short:
Cohesion in software engineering, as in real life, is how much the elements constituting a whole (in our case, let's say a class) can be said to actually belong together. Thus, it is a measure of how strongly related each piece of functionality expressed by the source code of a software module is.
One way of looking at cohesion in terms of OO is if the methods in the class are using any of the private attributes.
Now the discussion is bigger than this but High Cohesion (or the cohesion's best type - the functional cohesion) is when parts of a module are grouped because they all contribute to a single well-defined task of the module.
Coupling in simple words, is how much one component (again, imagine a class, although not necessarily) knows about the inner workings or inner elements of another one, i.e. how much knowledge it has of the other component.
Loose coupling is a method of interconnecting the components in a system or network so that those components depend on each other to the least extent practically possible…
In long:
I wrote a blog post about this. It discusses all this in much detail, with examples etc. It also explains the benefits of why you should follow these principles. I think it could help...
Coupling defines the degree to which each component depends on other components in the system. Given two components A and B, how much code in B must change if A changes?
Cohesion defines the measure of how coherent or strongly related the various functions of a single software component are. It refers to what the class does.
Low cohesion would mean that the class does a great variety of actions and is not focused on what it should do. High cohesion would then mean that the class is focused on what it should be doing, i.e. only methods relating to the intention of the class.
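As a small illustration of the high-cohesion end, here is a sketch with a hypothetical `PriceCalculator`. A low-cohesion version of the same code would sit in a catch-all utility class next to unrelated things like file parsing or e-mail sending:

```java
// Hypothetical example of a cohesive class: one focus, and every method uses the class's own state.
class PriceCalculator {
    private final int taxPercent;

    PriceCalculator(int taxPercent) {
        this.taxPercent = taxPercent;
    }

    // Tax on a net amount, in cents (integer division rounds down).
    int taxOn(int netCents) {
        return netCents * taxPercent / 100;
    }

    // Gross amount builds on taxOn, so both methods serve the same task.
    int grossOf(int netCents) {
        return netCents + taxOn(netCents);
    }
}
```

Both methods read the same private field and serve one task, which matches the "methods using the private attributes" heuristic mentioned above.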
Note: Good APIs exhibit loose coupling and high cohesion.
One particularly abhorrent form of tight coupling that should always be avoided is having two components that depend on each other directly or indirectly, that is, a dependency cycle or circular dependency.
Detailed info in below link
http://softwarematerial.blogspot.sg/2015/12/coupling-and-cohesion.html
