I am a newbie to Java and have some design questions.
Say I have a crawler application that does the following:
1. Crawls a URL and gets its content
2. Parses the contents
3. Displays the contents
How do you decide between implementing a function or a class?
-- Should the parser be a function of the crawler class, or should it be a class in itself, so it can be used by other applications as well?
-- If it should be a class, should it be protected or public class?
How do you decide between implementing a public or protected class?
-- If I had to create a class to generate stats from the parsed contents, for example, should that class be protected (so only the crawler class can access it) or should it be public?
Thanks
Ron
I think Andy's answer is very good. I have a few additions:
If you believe that a class will be extended in the future, you can make your private methods (if any) protected. That way, any future subclasses can access them too.
I like the rule that a method shouldn't be so long that you can't see its opening and closing braces ({ }) without scrolling. If a method is longer than that, try to split it up into several methods (private, protected or public, as you prefer). This makes the code more readable and can also reduce the number of lines of code.
So let's say a method is getting big and you split it up into several private methods. If these new methods are only used within the first "mother"-method, it makes sense to move all of that into a class of its own. In this way you will make the original class smaller and more readable. In addition, you will make the functionality of the new class easier to understand, as it is not mixed up with that of the original class.
The best guidance I've seen for these types of questions is the "SOLID Principles of OO Design."
http://butunclebob.com/ArticleS.UncleBob.PrinciplesOfOod
The most basic of these principles, and the one that sort of answers your first question is the "Single Responsibility Principle." This states that, "a class should have one, and only one, reason to change." In other words, your classes should each do exactly one thing. If you end up needing to change how that one thing works, you only have one class to change, and hopefully just one place to make the change within that class. In your case, you would probably want a class to retrieve the content from the URL, another class to parse it into some sort of in-memory data structure, another class to process the data (if needed), and yet another class (or classes) to display the content in whatever format you need. Obviously, you can get carried away with classes, but it's typically easier to test a lot of small, single-operation classes, as opposed to one or two large, all-encompassing classes.
The question of public vs. protected depends on how you plan to use this code. (Strictly speaking, a top-level Java class cannot be declared protected; the restricted option is package-private, which is what you get with no access modifier.) If your class could be used independently outside your library, you could think about making it public, but if it accomplishes some task that is specific or tied to your other classes, it could probably be package-private. For example, a class to retrieve content from a URL is a good general-purpose class, so you could make it public, but a class that does some specific type of manipulation of data might not be useful outside your library, so it can stay package-private. Overall, it's not always black and white, but ultimately, it's usually not a huge deal either way.
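To make the split concrete, here is a minimal sketch of the crawler broken into single-purpose classes. Every name in it (ContentFetcher, InMemoryFetcher, WordParser, WordDisplay, Crawler) is made up for illustration, and the fetcher serves canned pages instead of doing real network I/O. The types carry no access modifier, which makes them package-private, so the whole sketch fits in one file; a general-purpose type such as ContentFetcher could instead be declared public in its own file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Retrieves raw content for a URL. Changes only if fetching changes. */
interface ContentFetcher {
    String fetch(String url);
}

/** Hypothetical stand-in that serves canned pages instead of using the network. */
class InMemoryFetcher implements ContentFetcher {
    private final Map<String, String> pages;

    InMemoryFetcher(Map<String, String> pages) {
        this.pages = pages;
    }

    @Override
    public String fetch(String url) {
        // Unknown URLs yield empty content in this sketch.
        return pages.getOrDefault(url, "");
    }
}

/** Turns raw content into a list of words. Changes only if parsing rules change. */
class WordParser {
    List<String> parse(String content) {
        List<String> words = new ArrayList<>();
        for (String token : content.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return words;
    }
}

/** Formats parsed words for display. Changes only if the output format changes. */
class WordDisplay {
    String render(List<String> words) {
        return String.join(", ", words);
    }
}

/** Coordinates the other classes; it contains no fetching, parsing or formatting logic. */
class Crawler {
    private final ContentFetcher fetcher;
    private final WordParser parser = new WordParser();
    private final WordDisplay display = new WordDisplay();

    Crawler(ContentFetcher fetcher) {
        this.fetcher = fetcher;
    }

    String crawl(String url) {
        return display.render(parser.parse(fetcher.fetch(url)));
    }
}
```

Because each piece is small, each one can be tested on its own, and the fetcher can be swapped for a real HTTP implementation without touching the parser or the display.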
I like to think of classes as "guys" who can do specific stuff "methods".
In your case, there's a guy who can fetch the content of a URL if you tell him which URL that is.
Then there is another guy who is really good at parsing content. I think he does that with a tool called Rome, but I'm not sure. He keeps that private (hint ;) )
Then we have that third guy, who displays stuff. He's a bit limited and only understands stuff that "another guy" produces, but hey, that's fine.
Finally the project needs a boss guy, who gives orders to the other 3 guys and passes messages between them.
PS: I never really thought about making classes protected or not. Usually they are simply public without any specific reason. As long as it doesn't hurt, why bother?
In many examples regarding Java extends found online, classes were used that have a certain "logical connection". For example, a banana extends a fruit, a Student extends a Person etc.
Is it good practice to extend a class with another class just to inherit the methods and attributes, even though both classes don't show a "connection" like in the example above?
For example, a class UserManagementService extends a class DatabaseConnectionService so that UserManagementService can simply connect to the database by calling the method connect() instead of instantiating DatabaseConnectionService and calling databaseConnectionService.connect().
No, this is bad practice, because your type hierarchy is necessarily part of your public API. If your API exposes a UserManagementService, I can use it as a DatabaseConnectionService object. That means your choice to have UMS extend DCS is locked in: if you ever change that, any code that uses your UMS may then fail. You can try to solve this in documentation:
/**
 * Lets you query information about, and perform operations on,
 * the storage of users allowed on this service.
 *
 * IMPORTANT NOTE: Even though this class extends DatabaseConnectionService,
 * this is <em>not guaranteed</em> by this implementation,
 * so do not rely on this!
 */
public class UserManagementService extends DatabaseConnectionService {
    ...
}
But surely you can see that this is pretty suboptimal and 'ugly' (hard to maintain - you can't test that other code you have no control over actually heeds your warning here).
It also applies in reverse: If ever DCS adds something that just makes no sense for a UMS to have, then your UMS is, all of a sudden, broken and exposes crazy stuff that makes no sense and causes many questions or worse.
Contrast this to declaring that a Student is some kind of Person: That's inherently true; that is not merely a convenient implementation detail. If the country at large gains some sort of servicenumber feature and you extend the Person class to support this, then Students all of a sudden also gain this servicenumber thing. But that's good: Students ARE persons, after all.
So, how do you fix it?
Easy. Don't extend DCS. Create a field of type DCS, and if there are a bunch of methods that DCS has that you want UMS to also have, write them out. Their implementations can be very simple one-liners:
public int count() {
    return dataConService.count();
}
// and a lot more of this, if really needed.
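Put together, the composition approach might look like the sketch below. The two classes are simplified stand-ins for the ones in the question: only connect() and count() come from the discussion, while the constructor, the isConnected() helper and the fixed row count are assumptions made so that the example runs on its own.

```java
/** Simplified stand-in for the real connection class (assumed behavior). */
class DatabaseConnectionService {
    private boolean connected;
    private final int userRows = 3; // pretend the users table holds three rows

    void connect() {
        connected = true;
    }

    boolean isConnected() {
        return connected;
    }

    int count() {
        if (!connected) {
            throw new IllegalStateException("not connected");
        }
        return userRows;
    }
}

/** Has a DatabaseConnectionService instead of being one. */
class UserManagementService {
    // The dependency is a private field, so it is not part of the public type hierarchy.
    private final DatabaseConnectionService dataConService;

    UserManagementService(DatabaseConnectionService dataConService) {
        this.dataConService = dataConService;
        this.dataConService.connect();
    }

    /** Delegates to the connection service; callers never see that detail. */
    public int count() {
        return dataConService.count();
    }
}
```

Callers can no longer treat a UserManagementService as a DatabaseConnectionService, so the connection class can later be replaced or changed without breaking them.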
Common retort to this logic: But I control it all and I don't foresee new features ever being added!
Well, okay, but understand that single-person throwaway projects are a very bad basis to talk 'style guides and code cleanliness' - it's just you, hacking away for a weekend, write it however you like, you'll be fine.
Style guides and approaches to coding become useful when a team of 50 programmers program an application that is to survive and be in the business for over a decade, with programmers leaving and new programmers joining the team, and 5 years after the project is started, features you didn't even think of yet need to be added because of customer demand.
With that in mind, understand that code bases become gigantic and it'll be very hard to safely change things and train new programmers to become familiar with it. One extremely useful way to make that a little easier is to aggressively modularize things: Anytime you can draw the entire sourcecode base on a whiteboard (which will be huge), but then draw a smallish circle around a tiny part of it and go: This stuff can be understood all by itself, tested by itself, and developed on without completely understanding all the other source code - that's good. That's what you want.
"UMS currently extends DCS but don't rely on that" is exactly the kind of thing that makes drawing that tiny circle more complicated, which is why it's not a good idea to do it.
My question is how to describe a class. When you start a class in BlueJ, there's always a documentation comment for the description of the class. What should be written in that description?
For example I have a class called Economy that extends an abstract class Structure and the abstract class Structure implements an interface Basic. So what should I write in the description of the class Economy?
The very first thing to understand is that one should actually write as few comments as possible. Instead, write code that can be read, understood and used without needing a lot of additional comments around it.
Example: the names you choose "Structure" and "Basic" are very much ... meaningless. Those names do not tell anything about the intended behavior that one can expect from the corresponding class and the interface.
Thing is: comments lie. They add an extra quality to your source code; but a quality that can't be checked automatically. Thus it is very easy for that information to get out of sync with the things the code really does.
In other words: it can be perfectly OK to put an empty or very short description on a class. Besides: there is SRP that gives you guidance on "putting only a single responsibility" into each class. So, the core point of a "class description" would be to name/describe that one responsibility of the corresponding class.
Think about what someone would need to know if they wanted to use your class, or a basic description of the class you would give to someone if they didn't know what it did. Why would someone use your class? What does it do?
@Jägermeister mentioned how fallible code comments can be, so make sure that whatever you write, you keep it updated with what your code does. If you change the class, make sure you change your description of the class as well. And keep your description fairly short; you should most likely need only a few lines or fewer.
If you find yourself writing several lines of description, it might be a good idea to look at your class, and ask yourself if it's trying to do too much. In this case, it might be a good idea to make another class to accept some of its responsibilities.
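As an illustration of a short class description that names one responsibility, here is a sketch using a made-up WordCounter class (not the Economy class from the question, whose purpose isn't stated). The comment says what the class does and what a caller needs to know, and nothing more.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Counts how often each word occurs in the text passed to {@link #add(String)}.
 *
 * <p>Words are separated by whitespace and compared case-insensitively.
 */
class WordCounter {
    private final Map<String, Integer> counts = new HashMap<>();

    /** Adds every word in the given text to the running counts. */
    void add(String text) {
        for (String token : text.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
    }

    /** Returns how many times the word has been seen so far, or 0 if never. */
    int countOf(String word) {
        return counts.getOrDefault(word.toLowerCase(), 0);
    }
}
```

If the description needed a second or third "and it also ..." sentence, that would be a hint that the class has picked up more than one responsibility.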
I have a public method that calls a group of private methods.
I would like to test each of the private methods with a unit test, as it is too complicated to test everything through the public method.
I think it would be bad practice to change method accessibility only for testing purposes.
But I don't see any other way to test it (maybe reflection, but that is ugly).
Private methods should only exist as a consequence of refactoring a public method, that you've developed using TDD.
If you create a class with public methods and plan to add private methods to it, then your architecture will fail.
I know it's harsh, but what you're asking for is really, really bad software design.
I suggest you buy Uncle Bob's book "Clean Code"
http://www.amazon.co.uk/Clean-Code-Handbook-Software-Craftsmanship/dp/0132350882
Which basically gives you a great foundation for getting it right and saving you a lot of grief in your future as a developer.
There is IMO only one correct answer to this question: if the class is too complex, it means it's doing too much and has too many responsibilities. You need to extract those responsibilities into other classes that can be tested separately.
So the answer to your question is NO!
What you have is a code smell. You're seeing the symptoms of a problem, but you're not curing it. What you need to do is use refactoring techniques like extract class or extract subclass. Try to see if you can extract one of those private methods (or parts of it) into a class of its own. Then you can add unit tests to that new class. Divide and conquer until you have the code under control.
You could, as has been mentioned, change the visibility from private to package, and then ensure that the unit-tests are in the same package (which should normally be the case anyway).
This can be an acceptable solution to your testing problem, given that the interfaces of the (now) private functions are sufficiently stable and that you also do some integration testing (that is, checking that the public methods call the private ones in the correct way).
There are, however, some other options you might want to consider:
If the private functions are interface-stable but sufficiently complex, you might consider creating separate classes for them - it is likely that some of them might benefit from being split into several smaller functions themselves.
If testing the private functions via the public interface is inconvenient (maybe because of the need for a complex setup), this can sometimes be solved by the use of helper functions that simplify the setup and allow different tests to share common setup code.
You are right, changing the visibility of methods just so you are able to test them is a bad thing to do. Here are the options you have:
1. Test it through existing public methods. You really shouldn't test methods but behavior, which normally needs multiple methods anyway. So stop thinking about testing that method, and figure out the behavior that is not tested. If your class is well designed, it should be easily testable.
2. Move the method into a new class. This is probably the best solution to your problem from a design perspective. If your code is so complex that you can't reach all the paths in that private method, parts of it should probably live in their own class. In that class they will have at least package scope and can easily be tested. Again: you should still test behavior, not methods.
3. Use reflection. You can access private fields and methods using reflection. While this is technically possible, it just adds more legacy code to the existing legacy code in order to hide the legacy code. In the general case, a rather stupid thing to do. There are exceptions to this, for example if for some reason you are not allowed to make even the smallest change to the production source code. If you really need this, google it.
4. Just change the visibility. Yes, it is bad practice. But sometimes the alternatives are to make large changes without tests or not to test at all. So sometimes it is OK to just bite the bullet and change the visibility, especially when it is the first step toward writing some tests and then extracting the behavior into its own class.
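The "move the method into a new class" option could look roughly like the sketch below. The class names and the discount rule are invented for illustration: what used to be a private helper of OrderService becomes a small class with a package-visible method, so a test in the same package can call it directly while OrderService keeps its public behavior.

```java
/** Hypothetical class extracted from what used to be a private helper method. */
class DiscountCalculator {
    /**
     * Returns the discount in cents: 10% for totals of 10,000 cents or more,
     * otherwise 0. The rule itself is made up for this example.
     */
    long discountCents(long totalCents) {
        if (totalCents < 0) {
            throw new IllegalArgumentException("negative total");
        }
        return totalCents >= 10_000 ? totalCents / 10 : 0;
    }
}

/** The original class keeps its public method and delegates the tricky part. */
class OrderService {
    private final DiscountCalculator discounts = new DiscountCalculator();

    /** Public behavior: the amount to pay after any discount is applied. */
    public long amountDueCents(long totalCents) {
        return totalCents - discounts.discountCents(totalCents);
    }
}
```

The edge cases of the discount rule can now be tested against DiscountCalculator alone, and one or two tests through amountDueCents are enough to check that the two classes are wired together correctly.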
I know it's not efficient, but I don't really know why.
Most of the time, when you implement your game, you have a main class which has a loop, updates every frame and creates certain objects.
My question is why it's not considered efficient to pass the main class to every object in its constructor?
In my case, I developed my game in Java for Android, using LibGDX.
Thank you!
It increases coupling (how much objects depend on each other) and therefore reduces reusability and has the tendency to produce 'spaghetti code'. I don't really understand what you mean by not being 'efficient', but this is why you shouldn't do it.
You should also consider why you need that main class in every single object. If you really think you do, you might need to reconsider your system design. Would you mind elaborating on why you think you need it?
Mostly, it is a matter of coupling the code and making proper design decisions.
You should avoid dependencies between classes whenever possible. It makes the code easily maintainable and the whole design clearer.
Consider the case: you are creating a simulation racing game. You have a few classes for such entities: wheel, engine, gearshift knob, etc... and non-entities: level, player...
Let's say you have some main point (e.g. a GameEngine class where you create instances).
With your approach, you want to pass the GameEngine instance to the entities' constructors (or related mutator methods). That's not the best idea.
Do you really want to let wheels or brakes know about the rest of the world (such as the player's information, scores, level etc.) and give them access to its public interface methods?
Every class should have as little responsibility (and as little knowledge about other items) as possible.
If you really need a reference to some kind of main point object in your classes, consider using dependency injection tools, such as Dagger.
It won't make your game design better, but at least it forces you to favor composition over inheritance, which leads to better code.
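One way to keep a wheel from knowing about the whole game is to hand it only the narrow piece of information it needs. The sketch below assumes a hypothetical SurfaceInfo interface and a made-up traction formula; neither comes from LibGDX or from the question.

```java
/** The only thing a wheel needs from the outside world (hypothetical interface). */
interface SurfaceInfo {
    /** Grip at a track position, from 0.0 (ice) to 1.0 (dry asphalt). */
    double gripAt(double x, double y);
}

/** Knows about surfaces, but nothing about players, scores or levels. */
class Wheel {
    private final SurfaceInfo surface;

    Wheel(SurfaceInfo surface) {
        this.surface = surface;
    }

    /** Simplified, made-up model: traction is grip times the load on the wheel. */
    double maxTraction(double x, double y, double loadNewtons) {
        return surface.gripAt(x, y) * loadNewtons;
    }
}

/** The main class owns the wider game state and passes on only what is needed. */
class GameEngine {
    private int score; // game-wide state that the wheel never sees
    private final SurfaceInfo surface = (x, y) -> 0.8; // constant grip for this sketch

    Wheel createWheel() {
        return new Wheel(surface);
    }
}
```

A Wheel built this way can be tested with a one-line lambda as its surface, and it keeps working if GameEngine is reorganized, because it never referred to GameEngine in the first place.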
It's not entirely inefficient, since (AFAIK, in the general case) passing a reference to a method is quite cheap when you consider the number of JVM opcodes required. However, a possibly more efficient way of doing this would be to make a static instance of the game class and access that static field from the other classes. You would have to test these two options yourself.
In addition, passing a reference to the methods could make maintaining the code harder, as you have ultimately added a dependency.
Lately I met a situation where I needed to create a custom VideoView to my android application. I needed an access to the MediaPlayer object and to add some listeners.
Unfortunately (for me), all members of the VideoView class are private, so even extending the class wouldn't help me to gain access to its MediaPlayer object (or anything else), I had to make a complete duplicate of the class with my modifications.
Well, although it sounds like I'm complaining about the "hard work", duplicating is easier than extending the class in this case (since all the source is available...), but it made me really doubt this method of information hiding. Is this a better practice than leaving main components available for modification / access (protected, not public)? I mean, I understand that if I extend the VideoView class, someday they may change something in the VideoView class and I might have trouble. But if they change the class, my own (duplicate) version will drift even further from the VideoView class, and my goal is not to create my own video view, but to extend the available VideoView.
When a programmer makes something private, they're making a bet that nobody else will ever need to use or override it, and so there will be a payoff from the information hiding. Sometimes that bet doesn't come off. Them's the breaks.
I usually prefer composition rather than inheritance in such situations.
EDIT:
It's safe to use inheritance when both the subclass and the superclass are under the control of the same programmer, but implementation inheritance can lead to a fragile API. As you mentioned, if the superclass implementation changes, the subclass can break or, worse, do unintended things silently.
The other approach, known as composition, is to have a private field that references an instance of the existing class (VideoView). Each instance method in the new class invokes the corresponding method on the contained instance of the existing class and returns the results. This wrapper approach can also be referred to as the 'Decorator' pattern.
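A rough sketch of the wrapper idea follows. Since the real VideoView needs the Android framework, a hypothetical LibraryPlayer class stands in for it here; ObservablePlayer and its Listener interface are likewise invented names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a closed library class such as VideoView. */
final class LibraryPlayer {
    private boolean playing;

    void start() {
        playing = true;
    }

    void pause() {
        playing = false;
    }

    boolean isPlaying() {
        return playing;
    }
}

/** Wraps a LibraryPlayer instead of extending it and adds state-change listeners. */
class ObservablePlayer {
    interface Listener {
        void onStateChanged(boolean playing);
    }

    private final LibraryPlayer delegate; // the contained instance
    private final List<Listener> listeners = new ArrayList<>();

    ObservablePlayer(LibraryPlayer delegate) {
        this.delegate = delegate;
    }

    void addListener(Listener listener) {
        listeners.add(listener);
    }

    // Each method forwards to the wrapped instance, then adds the new behavior.
    void start() {
        delegate.start();
        notifyListeners();
    }

    void pause() {
        delegate.pause();
        notifyListeners();
    }

    boolean isPlaying() {
        return delegate.isPlaying();
    }

    private void notifyListeners() {
        for (Listener listener : listeners) {
            listener.onStateChanged(delegate.isPlaying());
        }
    }
}
```

Note the limitation: a wrapper can only add behavior around the wrapped class's accessible methods. It cannot reach private state, so it would not by itself give access to something like the internal MediaPlayer object.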
I can't speak for the reasoning of the particular VideoView developers, but if you're developing an API, and determine that the state represented by certain data needs to always follow certain rules in order to maintain the integrity and intended purpose of the object, then it makes sense to make the member vars private so you can control their modification.
It does limit what other devs can do, but I assume that's the point. There are some things that, if they were to be changed, you would want to go through discussion and verification among the group that has governance over the API. In that case it makes sense to make them private so that modifications can't get out of hand outside of the group's oversight.
I don't know that there's a static rule of thumb that determines when something needs to fall into this category, but I can definitely see the use in certain cases.
After reading all the enlightening answers (and comments) and commenting on them, I realized that I expected something from some classes that isn't relevant to them. In the case of VideoView, for example, this class is already the last in the inheritance chain. It should not be extended, as it is one logical unit, very specific and very tight, for a very specific purpose. My needs, to get special states from the view and the MediaPlayer, were needs for QC purposes, and such needs really shouldn't be considered when providing a product which is a closed unit (although the source is open). This is a reasonable argument and I find it satisfying. Sometimes not every concept of OOP should be implemented. Thank you all for the responses.