This question already has answers here:
Closed 12 years ago.
Possible Duplicates:
What should I keep in mind in order to refactor huge code base?
When is it good (if ever) to scrap production code and start over?
I am currently working with some legacy source code files. They have quite a few problems because they were written by a database expert who does not know much about Java. For instance,
Fields in classes are public. No getters and setters.
Use raw types, not parameterized types.
Use static unnecessarily.
Super long method names.
Methods need too many parameters.
Repeat Yourself frequently.
I want to modify them so that they are more object-oriented. What are some best practices and effective/efficient approaches?
Read "Working Effectively with Legacy Code" by Michael Feathers. Great book - and obviously it'll be a lot more detailed than answers here. It's got lots of techniques for handling things sensibly.
It looks like you've identified a number of issues, which is a large part of the problem. Many of those sound like they can be fixed relatively easily - it's overall design and architecture which is harder to do, of course.
Are there already unit tests, or will you be adding those too?
Before you start, create a system-level regression test suite for the application. You need this so that you can verify that your changes don't break things.
To do the refactoring, you want a use a combination of a good IDE, and text search tool (e.g. grep). Use the text search tool to find occurrences of the "syndromes" that you want to fix, then use the IDE (and its builtin refactoring capabilities) to fix the instances ... one at a time.
For example, Eclipse allows you to rename a method or class, or generate getters and setters. So you'd cure a 'public' attribute by:
Change the attribute to private.
Generate the getter and setter methods.
Save the file.
Go through all of the Java compilation errors resulting from the fact that the attribute is now private, and change to use the getter or setter as appropriate.
This approach will give you the low-hanging fruit. More fundamental design issues are more difficult, and may be impossible to fix without fundamental restructuring of the application. The refactoring capabilities will help you execute such changes, but deciding what to do is ultimately up to you.
Finally, my advice is to not be too ambitious. Go for incremental improvement, and be prepared to draw the line when the code is "good enough". You won't achieve perfection ... not even if you start from a clean slate ... so don't set your expectations high.
Is it just the code that is bad, or does it also hurt the user experience? Refactoring continuously is a good idea, but it should not be a goal unto itself. It should improve the application in terms of user interaction, maintainability, stability, performance, etc.
That is why I am not extremely fond of huge refactoring just to improve the code quality. Instead, refactor the code that you work with.
While working with a legacy system for several years, I have personally found that:
Create for yourself a vision of how you want the code after you're done. It should be attainable, contain a list of technology changes, general architecture changes. It may also be a good idea to make a rough priority of what classes are most critical to change. We lacked such a vision a few years ago, and while we refactored a lot, the code quality barely improved.
Now, you should restrict your refactoring to those that make you reach your vision. Don't fall into the trap of doing what appears good at the moment.
Focus on a particular component, and make it better. Then move on to the next. It's tempting to make huge changes that affect the entire system, but in truth you will introduce more problems than you solve.
Write integration regression tests. I.e., a few big tests that test a lot of functionality. It's not optimal, but it's the best you can do. Writing unit tests for every single class in your old system may end up a waste of time because it's not designed to be tested anyway and you want to redesign half of the classes.
Accept that it will take time.
Eclipse should be able to take care of #1 and help you work your way through many of the others.
As for converting poor OO code to good OO code it is amazingly difficult. Often it seems easier to rewrite it from scratch.
I tend to go from the bottom up. As I'm working on some small section I'll recognize a bunch of data that belongs together as a group and I'll make a good object that replaces that code without changing anything else--Very Small Changes with constant tests between each change.
This makes for a mediocre design at best, but I honestly don't know if you can go from not OO to good OO on a large project without dissecting the original program, understanding it and using it as a template for the rewrite and few projects allow this (even though it might be faster, you'll rarely if ever be able to convince management of that fact)
The point is risk I think.
The ugly code is just ugly, but it could work, it has been tested and bugfixed. If runnable code is changed, risk will follow. so test is critical.
You could refactor related code when
you have to bugfix, as a conservatism solution.
Maybe the first challenge is to persuade your manager:)
What's the problem with it not having getters and setters? I'd suggest refactoring those only when you need to add non-trivial getters or setters (e.g., with validation).
The rest sounds like you need to identify groups of values and create new types holding them, so instead of passing a String name, String address, int yearOfBirth, String[] accountNames, int[] balances you would pass a Customer around, which would in turn have an Account[].
IDEA Ultimate Edition has a code duplication detector that's very good (it's only missing a 'suggested solution' button!), and there are CPD etc.
I'd suggest that in a large legacy codebase you might waste time refactoring code only to find out it wasn't used anyway. I outlined some steps for removing unused code: http://rickyclarkson.blogspot.com/2009/12/deleting-code-what-first.html
How many of those "issues" are real problems and not just matters of style? Of this list, the only 'real' issue I can see is "Repeat Yourself frequently", and that's more of an ongoing maintenance problem that should be resolved during normal code maintenance when someone's going to be changing the code anyway.
I want to modify them so that they are more object-oriented.
Object-Orientation should not be your only goal when refactoring. The question you should ask yourself is what is the expected ROI (better quality ? easier enhancements ? better sharing of this code across a team ?) A ROI is not just words, you should be prepared to measure with numbers the return on investment (even the quality enhancements for example). You should take into accounts the life duration of your products in estimating the ROI.
You should also ask yourself what is the size of the code which is dependent on the code you want to refactor. Refactoring a library could be easy but could lead to a lot of changes in source codes dependent on this library, a work well larger than just refactoring the library.
Before touching any code, you should estimate the total work that needs to be done to finish refactoring the code and dependent code. You should estimate a total rewrite of the code, a partial rewrite, or just an internal rewrite without touching APIs.
With the costs and returns, you could decide if it's worth the effort to refactor your code.
Related
The IDE has suggested to add a getter/setter to a private field.
It is never used, and reaching the field is only from within the class.
what is the preferred coding style? keeping the never used methods?
Im asking specifically about java/kotlin but this is a general question
There are a few distinctions that you need to know about to answer this question yourself - as it depends on a ton of things; far too much to ask for and for you to write down:
For this entire answer it's important to think about the distinction between layers of code. These layers can be a bit hard to think about if the project you're imagining when thinking about layers is something small and written just by yourself. So don't do that - think about, say, Microsoft Word as a product. It's written by many people, over many years - entire departments and dev teams. It's somewhat modular (there's the "Mail Merge" system that doesn't interact, at all, with the 'show available fonts' dropdown).
What's the whole private fields, public getters/setters all about in the first place?
Fields are highly inflexible constructs. If you 'expose' them (make it public), then there is no granularity available to you. The only knobs you can twiddle with is:
You can make a field unchangable for everybody - you can't change it, nor can anybody else. (To do this, mark it final).
That's it. You can't do anything else 'to' it. You can't have more fine grained control about access (such as allowing code 'nearby' to change it, but not code further out), you can't run some code as field writes/read happen either. Perhaps you need more granularity. Keep in mind that we're trying to wrote code that will survive 10 years in an environment with 100 programmers, most of whom won't last the entire 10 years, in many different teams. So, imagine you wanted to:
Make it a field that everybody gets to read, but only 'your' code (that is, the programmers working on this particular corner of the codebase who are aware of this particular corner's rules and needs) should get to change.
Make it a field that everybody gets to read, and write, but, if its not 'your' code doing the writing, a log line should be emitted.
Make it a field that nobody gets to write (not even you - it is initialized at object creation, that's it, makes it easier to reason abou, that's why we 'handcuff' ourselves: When you need to maintain code for 10 years, limiting certain things off and having a compiler that enforces these is quite useful), and 'outsiders' can read, but you want to tweak the read a bit, for example, substitute 'the current date' when the value is blank.
And so on.
Even more importantly, is time: Sometimes you start out just wanting to expose a field to everybody right now, but later on you realize: Oh, wait, we need to emit a log line. Or: Oops, we need to return the current date if the value is blank.
If you just make a field public, you:
Do not have any of that granularity.
Even if you're okay with that now, you can't later on update your code and add stuff that needs this granularity; not without turning the field into a getter/setter pair, and that is not backwards compatible: You need to send a mail to those 100 developers or start refactoring their code which is a huge undertaking.
Hence, even if you don't see any point or purpose in giving you the granularity powers right now, it's still advised to just make that field private and add getters and setters: That way if later on some currently not forseeable request comes in (such as: Log the writes to this field, please!), you can add that feature without having to ask all other 100 developers to pull the change and edit all their branches, which is a huge undertaking.
YAGNI
A maxim in the programming world is YAGNI - You aren't gonna need it.
YAGNI is a dangerous beast - it applies -solely- to semi-local endeavours.
The basic principle of YAGNI is: Code is a flowing concept, and you should never hesitate to make improvements, especially if you can't think of a way this would break any existing usage. Hence, given that your development processes should be set up such that adding stuff is easy, don't add stuff until you need it - after all, if you add stuff even if you don't currently need it, maybe you never need it and you're now just clogging up the code for no good reason. IF somebody needs it, they can trivially add it then.
The problem with YAGNI is that predicate: YAGNI is based on the notion that making a change is quick and painless.
Imagine this scenario: The Microsoft Office development crew decides to write their own font rendering system, because what windows delivers just looks bad on HiDPI screens. So, they spend a ton of time and research on this and with much fanfare release a new version. Everybody loves it.
The OS team comes aknocking and the MS Office team decides to 'hand over' the new font rendering engine to the OS team. In order to avoid having 2 teams spend the resources on maintaining it, the next version of MS Office is pegged to only run on a new version of the OS that includes the new pipeline, and thus, the MS Office team removes the font rendering engine from it - it's now the OS's job.
Whoops, any YAGNI is now quite a big problem: If there's something foreseeable and obvious the MSOffice team needed that they didn't add (or if the Windows OS team applied YAGNI to the API they expose to apps to do font rendering stuff), then the MS Office team needs to give a call to the Windows OS team that's in another country, working on other source control, and having entirely different versioning pipelines, and ask them for a change. It'll take 2 years before it's all said and done.
Linters/stylcheckers are tools, and fairly stupid ones at that
Any warning about style or suggestion about changes are just that - suggestions. These tools aren't perfect, and will absolutely suggest very silly things from time to time. You should never apply style advice until you understand why it is given and under what circumstances it should be followed, and you should feel free to tell linters/stylecheckers to buzz off if they are wrong.
Some dev shops put out absolutist rules ('you can NEVER check in code that fails our linter tool - we have git commit hooks that enfore this!'), but those shops are misguided: They seem to think that if only you rigorously apply enough style rules, that code will therefore be well written, bug free, and performant. This is obviously entirely false. You should absolutely help programmers (and might lightly enforce this even) to help themselves and avail themselves of the tools available to write better code, but you can't beat the bird to make it sing, so to speak.
Thus, be aware that sometimes the best thing to do about a style suggestion - is to ignore it.
Back to your question
So, now you know what I'm driving at when I ask these questions, which naturally lead you to answering your own question:
Is the field even meant to be exposed in the first place? Anything you 'expose' is likely to be used by code that's relatively far removed from you (different team, different time, different context), and once you expose it, you have to continue to support it - any changes you make can't fundamentally change/remove what you exposed. So, perhaps just having a private field with no getters and setters is the best place to start:
If you're sure it makes no sense to expose it, then don't. Just leave them as private fields, the code in this source file can edit them, and other code cannot even assume this field exists - they should know nothing about it.
If you're sure it makes perfect sense to expose it; it is the very point of the class, then make a private field with public getter (and if you want, setter - do you intend for it to be mutable or not?) - even if you don't see any need to do special stuff in that getter. Java programmers expect to access properties from other source files via getters and setters and you keep the flexibility to change things later without breaking compatibility.
If you're not sure, then think about YAGNI: Is this an API that is going to be exposed so far and wide it'll affect people who cannot easily modify the codebase? Then, sorry, you're going to have to think some more and make a decision. But most likely you're not writing that kind of code, and anybody who might want to access this thing could change the code fairly easily: It'll be you, or a colleague working in the same source tree. In which case, don't think about it too long - err on the side of caution and don't make getters and setters. If someone needs em later, well, let them make the call - with the benefit of that use case they now have, they'll be more likely to make a well informed decision than you can, without that benefit.
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 6 years ago.
Improve this question
I have read through a bunch of best practices online for JUnit and Java in general, and a big one that people like to point out is that fields and methods should be private unless you really need to let users access them. Class variables should be private with getters and setters, and the only methods you should expose should be ones that users will call directly.
My question: how strictly necessary are these rules when you have things like standalone apps that don't have any users? I'm currently working on something that will get run on a server maybe once a month. There are config files that the app uses that can be modified, but otherwise there is no real user interaction once it runs. I have mostly been following best practices but have run into issues with unit testing. A lot of the time it feels like I am just jumping through hoops with my unit testing getting things just right, and it would be much easier if the method or whatever was public or even protected instead.
I understand that encapsulation will make it easier to make changes behind the scenes without needing to change code all over, but without users to impact that seems a bit more flimsy. I am just making my current job harder on the off-chance it will save me time later. I've also seen all of the answers on this site saying that if you need to unit test a private method you are doing something wrong. But that is predicated on the idea that those methods should always be private, which is what I am questioning.
If I know that no one will be using the application (calling its methods from a jar or API or whatever) is there anything wrong with making everything protected? Or even public? What about keeping private fields but making every method public? Where is the balance between "correct" accessibility on pieces of code, and ease of use?
It is not "necessary", but applying standards of good design and coding principles even in the "small" projects will help you in the long run.
Yes, it takes discipline to write good software. Languages are tools that help you accomplish a goal. Like any tool, they can be misused, and when misused can be dangerous. Power tools, like a table saw, can be very dangerous if misused, so if you care about your own safety you always follow proper procedure, even if it might feel a little inconvenient (or you end up nicknamed "stubby").
I'd argue that it's on the small projects, where you want to cut corners and "just write the code", that adhering to the best practices is most important. You are training yourself in the proper use of your tools, so when it really matters you do the right thing automatically.
Also consider that projects that start out "small" can evolve over time to become quite large as you keep adding enhancements and new functionality. This is the nature of agile software development. If you followed best practices from the start you'll find it much easier to adapt as the project grows.
Another factor is that using OOP principles is a way of taming complexity. If you have a well-defined API and, for example, use only getters and setters, you can partition off the API from the implementation in your own mind. After writing class A, when writing a client of A, say B, you can think only about the API. Later when you need to enhance A you can easily tell what parts of A affect the API vs what parts are purely internal. If you didn't use encapsulation you'd have to scan your entire codebase to see if a change to A would break something else.
Do I apply this to EVERYTHING I write? No, of course not. I don't do this with short single-use scripts in dynamic languages (Perl, AWK, etc) but when writing Java I make it a point to always write "good" code, if only to keep my skills sharp.
There is generally no necessity to follow any rules as long as your code compiles and runs correctly.
However code style "best practices" have proven to enhance code quality, especially over time when a project develops/matures. Making fields private makes your code more resilient to later changes; if you ommit the getters/setters and access fields directly, any changes to a field impact related code much more directly.
While there is seemingly no advantange in a getter/setter at first, the advantage lies in the future: A getter forces any code working with the attribute through a single point of control which in case of any changes related to that field helps either mask the concrete representation/location of the field and/or allows for polymorphism when required whithout changes/checking all the existing callers.
Finally the less surface (accessible methods/fields) a class exposes to other classes (users) the less you have to maintain. Reducing the exposed API to the absolute minimum reduces coupling between classes, which again is an advantage when something needs to be changed. Striving to hide the inner workings of every object as good as possible is not a goal by itself, its the advantages that result from it that are the goal.
As always, good balancing is always required. But when in doubt, it is better to error/lean on the side of "source code quality" practices; instead of taking too many shortcuts; as there are many different aspects in your "simple" question one should consider:
It is hard to anticipate what will happen to some piece of software over time. Yes, you don't have any users today. But you know what: one major property of great tools is ... as soon as other people see them, they want to use them, too. And all of a sudden, you have users. And feature requests, bug reports, ... and make no mistake: first people will love you for the productivity gain from your tool; and then they will start to put pressure on you because all of a sudden your tool is essential for other people to make their goals.
Many things are fine to be addressed via convention. Example: sometimes, if I would only be using public methods of my "class under test", unit tests become more complicated than necessary. In such a case, I absolutely have no problem about putting a getter here or there that allows me to inspect the internal state of my "class under test"; so that my test can trigger some activity; and then call the getter. I make those methods "package protected"; and I put a // used for unit testing above them. I have not seen problems coming out of that informal practice. Please note: those methods should only be used in test cases. No other production class is supposed to call them.
Regarding the core of your question on private stuff: I think, one should always hide implementation details from the outside. Whenever you write a piece of code that is supposed to live longer than the next hour, you should do the right thing - and try to write code with very high quality. And making the internals of your objects visible on the outside
comes only with drawbacks; there is nothing positive in doing so.
Good OO is about using models that come with certain behavior.
Internal state should stay internal; there is no benefit in
exposing. For the record: sometimes, you have simple data
containers. Classes that only have some fields, but no methods on
them. In that case, yeah, make the fields public; there is (not
much) advantage in providing getters/setters. ( See "Clean Code" by
Robert Martin, chapter 6 on "Objects and Data structures")
I would like to transition our codebase from poorly written PHP code to poorly written Java, since I believe Java code is easier to tidy up. What are the pros and cons, and for those who have done it yourselves, would you recommend PtoJ for a project of about 300k ugly lines of code? Tips and tricks are most welcome; thanks!
Poorly written PHP is likely to be very hard to convert because a lot of the bad stuff in PHP just doesn't exist in Java (the same is true vice versa though, so don't take that as me saying Java is better - I'm going to keep well clear of that flame-war).
If you're talking about a legacy PHP app, then its highly likely that your code contains a lot of procedural code and inline HTML, neither of which are going to convert well to Java.
If you're really unlucky, you'll have things like eval() statements, dynamic variable names (using $$ syntax), looped include() statements, reliance on the 'register_globals' flag, and worse. That kind of stuff will completely thwart any conversion attempt.
Your other major problem is that debugging the result after the conversion is going to be hell, even if you have beautiful code to start with. If you want to avoid regressions, you will basically need to go through the entire code base on both sides with a fine comb.
The only time you're going to get a satisfactory result from an automated conversion of this type is if you start with a reasonably tide code base, written at least mainly in up-to-date OOP code.
In my opinion, you'd be better off doing the refacting excersise before the conversion. But of course, given your question, that would rather defeat the point. Therefore my recommendation is to stick it in PHP. PHP code can be very good, and even bad PHP can be polished up with a bit of refactoring.
[EDIT]
In answer to #Jonas's question in the comments, 'what is the best way to refactor horrible PHP code?'
It really depends on the nature of the code. A large monolithic block of code (which describes a lot of the bad PHP I've seen) can be very hard (if not imposible) to implementunit tests for. You may find that functional tests are the only kind of tests you can write on the old code base. These would use Selenium or similar tools to run the code through the browser as if it were a user. If you can get a set of reliable functional tests written, it is good for helping you remain confident that you aren't introducing regressions.
The good news is that it can be very easy - and satisfying - to rip apart bad code and rebuild it.
The way I've approached it in the past is to take a two-stage approach.
Stage one rewrites the monolithic code into decent quality procedural code. This is relatively easy, and the new code can be dropped into place as you go. This is where the bulk of the work happens, but you'll still end up with procedural code. Just better procedural code.
Stage two: once you've got a critical mass of reasonable quality procedural code, you can then refactor it again into an OOP model. This has to wait until later, because it is typically quite hard to convert old bad quality PHP straight into a set of objects. It also has to be done in fairly large chunks because you'll be moving large amounts of code into objects all at once. But if you did a good job in stage one, then stage two should be fairly straightforward.
When you've got it into objects, then you can start seriously thinking about unit tests.
I would say that automatic conversion from PHP to Java have the following:
pros:
quick and dirty, possibly making happy some project manager concerned with short-time delivery (assuming that you're lucky and the automatically generated code works without too much debugging, which I doubt)
cons:
ugly code: I doubt that automatic conversion from ugly PHP will generate anything but ugly Java
unmaintainable code: the automatically generate code is likely to be unmaintainable, or, at least, very difficult to maintain
bad approach: I assume you have a PHP Web application; in this case, I think that the automatic translation is unlikely to use Java best practices for Web application, or available frameworks
In summary
I would avoid automatic translation from PHP to Java, and I woudl at least consider rewriting the application from the ground up using Java. Especially if you have a Web application, choose a good Java framework for webapps, do a careful design, and proceed with an incremental implementation (one feature of your original PHP webapp at a time). With this approach, you'll end up with cleaner code that is easier to maintain and evolve ... and you may find out that the required time is not that bigger that what you'd need to clean/debug automatically generated code :)
P2J appears to be offline now, but I've written a proof-of-concept that converts a subset of PHP into Java. It uses the transpiler library for SWI-Prolog:
:- use_module(library(transpiler)).
:- set_prolog_flag(double_quotes,chars).
:- initialization(main).
main :-
Input = "function add($a,$b){ print $a.$b; return $a.$b;} function squared($a){ return $a*$a; } function add_exclamation_point($parameter){return $parameter.\"!\";}",
translate(Input,'php','java',X),
atom_chars(Y,X),
writeln(Y).
This is the program's output:
public static String add(String a,String b){
System.out.println(a+b);
return a+b;
}
public static int squared(int a){
return a*a;
}
public static String add_exclamation_point(String parameter){
return parameter+"!";
}
In contrast to other answers here, I would agree with your strategy to convert "PHP code to poorly written Java, since I believe Java code is easier to tidy up", but you need to make sure the tool that you are using doesn't introduce more bugs than you can handle.
An optimum stategy would be:
1) Do automated conversion
2) Get an MVP running with some basic tests
3) Start using the amazing Eclipse/IntelliJ refractoring tool to make the code more readable.
A modern Java IDE can refactor code with zero bugs when done properly. It can also tell you which functions are never called and a lot of other inspections.
I don't know how "PtoJ" was, since their website has vanished, but you ideally want something that doesn't just translate the syntax, but the logic. I used php2java.com recently and it worked very well. I've also used various "syntax" converters (not just for PHP to Java, but also ObjC -> Swift, Java -> Swift), and even they work just fine if you put in the time to make things work after.
Also, found this interesting blog entry about what might have happened to numiton PtoJ (http://www.runtimeconverter.com/single-post/2017/11/14/What-happened-to-numition).
http://www.numiton.com/products/ntile-ptoj/translation-samples/web-and-db-access/mysql.html
Would you rather not use Hibernate ?
When I receive code I have not seen before to refactor it into some sane state, I normally fix "cosmetic" things (like converting StringTokenizers to String#split(), replacing pre-1.2 collections by newer collections, making fields final, converting C-style arrays to Java-style arrays, ...) while reading the source code I have to get familiar with.
Are there many people using this strategy (maybe it is some kind of "best practice" I don't know?) or is this considered too dangerous, and not touching old code if it is not absolutely necessary is generally prefered? Or is it more common to combine the "cosmetic cleanup" step with the more invasive "general refactoring" step?
What are the common "low-hanging fruits" when doing "cosmetic clean-up" (vs. refactoring with more invasive changes)?
In my opinion, "cosmetic cleanup" is "general refactoring." You're just changing the code to make it more understandable without changing its behavior.
I always refactor by attacking the minor changes first. The more readable you can make the code quickly, the easier it will be to do the structural changes later - especially since it helps you look for repeated code, etc.
I typically start by looking at code that is used frequently and will need to be changed often, first. (This has the biggest impact in the least time...) Variable naming is probably the easiest and safest "low hanging fruit" to attack first, followed by framework updates (collection changes, updated methods, etc). Once those are done, breaking up large methods is usually my next step, followed by other typical refactorings.
There is no right or wrong answer here, as this depends largely on circumstances.
If the code is live, working, undocumented, and contains no testing infrastructure, then I wouldn't touch it. If someone comes back in the future and wants new features, I will try to work them into the existing code while changing as little as possible.
If the code is buggy, problematic, missing features, and was written by a programmer that no longer works with the company, then I would probably redesign and rewrite the whole thing. I could always still reference that programmer's code for a specific solution to a specific problem, but it would help me reorganize everything in my mind and in source. In this situation, the whole thing is probably poorly designed and it could use a complete re-think.
For everything in between, I would take the approach you outlined. I would start by cleaning up everything cosmetically so that I can see what's going on. Then I'd start working on whatever code stood out as needing the most work. I would add documentation as I understand how it works so that I will help remember what's going on.
Ultimately, remember that if you're going to be maintaining the code now, it should be up to your standards. Where it's not, you should take the time to bring it up to your standards - whatever that takes. This will save you a lot of time, effort, and frustration down the road.
The lowest-hanging cosmetic fruit is (in Eclipse, anyway) shift-control-F. Automatic formatting is your friend.
First thing I do is trying to hide most of the things to the outside world. If the code is crappy most of the time the guy that implemented it did not know much about data hiding and alike.
So my advice, first thing to do:
Turn as many members and methods as
private as you can without breaking the
compilation.
As a second step I try to identify the interfaces. I replace the concrete classes through the interfaces in all methods of related classes. This way you decouple the classes a bit.
Further refactoring can then be done more safely and locally.
You can buy a copy of Refactoring: Improving the Design of Existing Code from Martin Fowler, you'll find a lot of things you can do during your refactoring operation.
Plus you can use tools provided by your IDE and others code analyzers such as Findbugs or PMD to detect problems in your code.
Resources :
www.refactoring.com
wikipedia - List of tools for static code analysis in java
On the same topic :
How do you refactor a large messy codebase?
Code analyzers: PMD & FindBugs
By starting with "cosmetic cleanup" you get a good overview of how messy the code is and this combined with better readability is a good beginning.
I always (yeah, right... sometimes there's something called a deadline that mess with me) start with this approach and it has served me very well so far.
You're on the right track. By doing the small fixes you'll be more familiar with the code and the bigger fixes will be easier to do with all the detritus out of the way.
Run a tool like JDepend, CheckStyle or PMD on the source. They can automatically do loads of changes that are cosemetic but based on general refactoring rules.
I do not change old code except to reformat it using the IDE. There is too much risk of introducing a bug - or removing a bug that other code now depends upon! Or introducing a dependency that didn't exist such as using the heap instead of the stack.
Beyond the IDE reformat, I don't change code that the boss hasn't asked me to change. If something is egregious, I ask the boss if I can make changes and state a case of why this is good for the company.
If the boss asks me to fix a bug in the code, I make as few changes as possible. Say the bug is in a simple for loop. I'd refactor the loop into a new method. Then I'd write a test case for that method to demonstrate I have located the bug. Then I'd fix the new method. Then I'd make sure the test cases pass.
Yeah, I'm a contractor. Contracting gives you a different point of view. I recommend it.
There is one thing you should be aware of. The code you are starting with has been TESTED and approved, and your changes automatically means that that retesting must happen as you may have inadvertently broken some behaviour elsewhere.
Besides, everybody makes errors. Every non-trivial change you make (changing StringTokenizer to split is not an automatic feature in e.g. Eclipse, so you write it yourself) is an opportunity for errors to creep in. Do you get the exact behaviour right of a conditional, or did you by mere mistake forget a !?
Hence, your changes implies retesting. That work may be quite substantial and severely overwhelm the small changes you have done.
I don't normally bother going through old code looking for problems. However, if I'm reading it, as you appear to be doing, and it makes my brain glitch, I fix it.
Common low-hanging fruits for me tend to be more about renaming classes, methods, fields etc., and writing examples of behaviour (a.k.a. unit tests) when I can't be sure of what a class is doing by inspection - generally making the code more readable as I read it. None of these are what I'd call "invasive" but they're more than just cosmetic.
From experience it depends on two things: time and risk.
If you have plenty of time then you can do a lot more, if not then the scope of whatever changes you make is reduced accordingly. As much as I hate doing it I have had to create some horrible shameful hacks because I simply didn't have enough time to do it right...
If the code you are working on has lots of dependencies or is critical to the application then make as few changes as possible - you never know what your fix might break... :)
It sounds like you have a solid idea of what things should look like so I am not going to say what specific changes to make in what order 'cause that will vary from person to person. Just make small localized changes first, test, expand the scope of your changes, test. Expand. Test. Expand. Test. Until you either run out of time or there is no more room for improvement!
BTW When testing you are likely to see where things break most often - create test cases for them (JUnit or whatever).
EXCEPTION:
Two things that I always find myself doing are reformatting (CTRL+SHFT+F in Eclipse) and commenting code that is not obvious. After that I just hammer the most obvious nail first...
I am building a spring mvc web application.
I plan on using hibernate.
I don't have much experience with obfuscating etc.
What are the potential downsides to obfuscating an application?
I understand that there might be issues with debugging the app, and recovering lost source code is also an issue.
Are there any known issues with the actually running of the application? Can bugs be introduced?
Since this is an area I am looking for general guidance, please feel free to open up any issues that I should be aware of.
There are certainly some potential performance/maintenance issues, but a good obfuscator will let you get round at least some of them. Things to look out for:
an obvious one: if your code calls methods by reflection or dynamically loads classes, then this is liable to fail if the class/method names are obfuscated; a good obfuscator will let you select class/method names not to obfuscate to get round this problem;
a similar issue can occur if not all of your application is compiled at the same time;
if it deals directly at the bytecode level, an obfuscator can create code that in principle a Java compiler cannot create (e.g. it can insert arbitrary GOTO instructions, whereas from Java these can only be created as part of a loop)-- this may be a bit theoretical, but if I were writing a JVM, I'd optimise performance for sequences of bytecodes that a Java compiler can create, not ones that it can't...
the obfuscator is liable to make other subtle changes to performance if it significantly alters the number of bytecodes in a method, or in some way changes whether a given method/piece of code hits thresholds for certain JVM optimisations (e.g. "inline methods with fewer than X bytecodes").
But as you can see, some of these effects are a little subtle and theoretical-- so to some extent what you need to do is soak-test your application after obfuscation, just as you would with any other major change.
You should also be careful not to assume that obfuscation hides your code/algorithm (if that is your intention) as much as you want it to-- use a decompiler to have a look at the contents of the resulting obfuscated classes.
Surprised no one has mentioned speed - in general, more obfuscated = slower-running code
[Edit] I can't believe this has -2. It is a correct answer.
Shortening identifiers and removing unused methods will decrease the file-size, but have 0 impact on the running speed (other than the few nanoseconds shaved off the loading time). In the meanwhile, most of the obfuscation of the program comes from added code:
Breaking 1 method into 5; interleaving methods; merging classes [aggregation transformations]
Splitting 1 arithmetic expression into 10; jumbling the control-flow [computation transformations]
And adding chunks of code that do nothing [opaque predicates]
are all common obfuscation techniques that cause a program to run slower.
You may want to look at some of the comments here, to decide if obfuscating makes sense:
https://stackoverflow.com/questions/1988451/net-obfuscation
You may want to express why you want to obfuscate. IMO the best reasons are mainly to have a smaller application, as you can get rid of classes that aren't being used in your project, while obfuscating.
I have never seen bugs introduced, as long as you aren't using reflection, assuming you can find something, as private methods for example will have their names changed.
The biggest problem centers around that fact that obfuscating programs generally make a guarantee of not changing the behavior of their target program. In some cases it proves to be very hard to do this -- for example, imagine a program which checks the value of certain private fields via reflection from a string array. An obfuscator may not be able to tell that this string also needs to be updated correspondingly, and the result will be unexpected access errors that pop up at runtime.
Worse still, it may not be obvious that the behavior of a program has changed subtly -- then you may not know that there's a problem at all, until your customer finds it first and gets upset.
Generally, professional-grade obfuscation products are sophisticated enough to catch some kinds of problems and prevent them, but ultimately it can be challenging to cover all the bases. The best defense is to run unit tests against the obfuscated result and make sure that all your expected behavior continues to hold true.
1 free one you might want to check out is Babel. It is designed to be used on the command line (like many other obfuscators), there is a Reflector addin that will provide a UI for you.
When it comes to obfuscation, you really need to analyze what your goal is. In your case - if you have a web application (mvc) are you planning on selling it as a canned downloadable application? (if not and you keep the source on your web servers then you don't need it).
You might look at the components and pick only certain parts to obfuscate ... not the whole thing. In general ASP.Net apps break pretty easy when you try to add obfuscation after you developed them due to all the reflection used.
Pretty much everything mentioned above is true ... it all depends on how many features you turn on to make it hard to reverse your code:
Renaming of members (fields/methods/events/properties) is most common (comes in different flavors: simple renaming of methods from something like GetId() to a() all the way to unreadable characters and removal of namespaces). BTW: This is where reflection usually breaks. Your assembly file may end up being smaller due to smaller strings being used too.
String encryption: this makes it harder to reverse your static strings used in your code. BTW: this paired with renaming makes it difficult for you to debug your renaming problems ... so you might turn it on after you have that working. This also will have to add code to decrypt the string right before it is used in IL
Code mangling ... this is what BlueRaja was refering to. It makes your code look like spagetti code - to make it harder for someone to figure out. The CLR does not like this ... it can't optimize things as easy and your final code will mostlikely proccess slower due to the additional branching and something not being inlined due to the IL rewriting used for this option. BTW: this option really does raise the bar on what it takes to reverse you source code, but may come with a performance hit.
Removal of unused code. Some obfuscators offer you the option to trim any code that it finds not being used. This may make your assembly a little smaller if you have alot of dead code hanging around ... but it is just a free benefit obfuscators throw in.
My advice is to only use it if you know why you are using it and design with that end in mind ... don't try to add it after you've finished your code (I've done that and it's not fun)