Sanity check - significant increase in the number of objects when using JUnit - Java

I am using JUnit for the first time in a project and I'm fascinated by the way it is forcing me to restructure my code. One thing I've noticed is that the number of objects I've created in order to be able to test chunks of code is increasing significantly. Is this typical?
Thanks,
Elliott

Yes, this is normal.
In general, the smaller and more focused your classes and methods are, the easier they are to understand and test. This might produce more files and more actual lines of code, but that is because you are adding abstractions that give your code a better, cleaner design.
You may want to read about the Single Responsibility Principle. Uncle Bob also has some refactoring examples in his book Clean Code where he touches on exactly these points.
One more thing for when you are unit testing: Dependency Injection is one of the single most important things that will save you a lot of headaches when it comes to structuring your code. (And just for clarification, DI will not necessarily cause you to have more classes, but it will help decouple your classes from each other.)
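For example, here is a minimal sketch of constructor injection (the class and method names are made up purely for illustration):

import java.util.List;

// Hypothetical collaborator interface; in production it might be backed by a database.
interface ReportRepository {
    List<String> findByAuthor(String author);
}

// The service receives its dependency instead of constructing it itself,
// so a test can hand it a stub, a fake, or a Mockito mock.
class ReportService {
    private final ReportRepository repository;

    ReportService(ReportRepository repository) { // constructor injection
        this.repository = repository;
    }

    int countReportsFor(String author) {
        return repository.findByAuthor(author).size();
    }
}

In a test you can then pass in an in-memory fake or a mock of ReportRepository instead of a database-backed implementation, without any extra classes in the production code.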

Yes, I think this is fairly typical. When I start introducing testing code into a legacy codebase, I find myself creating smaller utility classes and POJOs and testing those. The original class just becomes a wrapper that calls these smaller classes.
One example would be when you have a method which does a calculation, updates an object and then saves to a database.
public void calculateAndUpdate(Thing t) {
    calculate(t); // quite a complex calculation with multiple results & updates t
    dao.save(t);
}
Instead, you could introduce a Calculation object that is returned by the calculate method; the original method then uses it to update the Thing object and saves it.
public void calculateAndUpdate(Thing t) {
    Calculation calculation = new Calculator().calculate(t); // does not update t at all
    update(t, calculation); // updates t with the result of the calculation
    dao.save(t); // saves t to the database
}
So I've introduced two new objects, a Calculator & Calculation. This allows me to test the result of the calculation without having to have a database available. I can also unit test the update method as well. It's also more functional, which I like :-)
If I continued to test the original method, then I would have to unit test the calculation, update and save as one item, which isn't nice.
For me, the second version is a better design: better separation of concerns, smaller classes, more easily tested. The number of small classes goes up, but the overall complexity goes down.
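A unit test for the extracted calculation might then look roughly like this (JUnit 4 style; the Thing setup, the getResult() accessor and the expected value are placeholders, since the real fields depend on your domain):

import static org.junit.Assert.assertEquals;

import org.junit.Test;

public class CalculatorTest {

    @Test
    public void calculationCanBeVerifiedWithoutADatabase() {
        // Placeholder setup: give the Thing whatever state the calculation needs.
        Thing thing = new Thing();

        Calculation calculation = new Calculator().calculate(thing);

        // Assert on the calculation result only; no DAO or database is involved.
        assertEquals(42, calculation.getResult());
    }
}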

It depends on what kind of objects you are referring to. Typically, you should be fine using a mocking framework like EasyMock or Mockito, in which case the number of additional classes required solely for testing purposes should be fairly small. If you are referring to additional objects in your main source code, maybe unit testing is helping you refactor your code to make it more readable and reusable, which is a good idea anyway IMHO :-)

Why do we not mock domain objects in unit tests?

Below are two tests whose sole purpose is to confirm that when service.doSomething is called, emailService.sendEmail is called with the person's email as a parameter.
@Mock
private EmailService emailService;

@InjectMocks
private Service service;

@Captor
private ArgumentCaptor<String> stringCaptor;

@Test
public void test_that_when_doSomething_is_called_sendEmail_is_called_NO_MOCKING() {
    final String email = "billy.tyne@myspace.com";
    // There is only one way of building an Address and it requires all these fields
    final Address crowsNest = new Address("334", "Main Street", "Gloucester", "MA", "01930", "USA");
    // There is only one way of building a Phone and it requires all these fields
    final Phone phone = new Phone("1", "978-281-2965");
    // There is only one way of building a Vessel and it requires all these fields
    final Vessel andreaGail = new Vessel("Andrea Gail", "Fishing", 92000);
    // There is only one way of building a Person and it requires all these fields
    final Person captain = new Person("Billy", "Tyne", email, crowsNest, phone, andreaGail);

    service.doSomething(captain); // <-- This requires only the person's email to be initialised, it doesn't care about anything else

    verify(emailService, times(1)).sendEmail(stringCaptor.capture());
    assertThat(stringCaptor.getValue(), equalTo(email));
}

@Test
public void test_that_when_doSomething_is_called_sendEmail_is_called_WITH_MOCKING() {
    final String email = "billy.tyne@myspace.com";
    final Person captain = mock(Person.class);
    when(captain.getEmail()).thenReturn(email);

    service.doSomething(captain); // <-- This requires the person's email to be initialised, it doesn't care about anything else

    verify(emailService, times(1)).sendEmail(stringCaptor.capture());
    assertThat(stringCaptor.getValue(), equalTo(email));
}
Why is my team telling me not to mock the domain objects that are required to run my tests but are not part of the actual test? I am told mocks are for the dependencies of the tested service only. In my opinion, the resulting test code is leaner, cleaner and easier to understand. There is nothing to distract from the purpose of the test, which is to verify that the call to emailService.sendEmail occurs. This rule is something I have heard and accepted as gospel for a long time, over many jobs, but I still cannot agree with it.
I think I understand your team's position.
They are probably saying that you should reserve mocks for things that have hard-to-instantiate dependencies. That includes repositories that make calls to a database, and other services that can potentially have their own rat's nest of dependencies. It doesn't include domain objects that can be instantiated (even if filling out all the constructor arguments is a pain).
If you mock the domain objects then the test doesn't give you any code coverage of them. I know I'd rather get these domain objects covered by tests of services, controllers, repositories, etc. as much as possible and minimize tests written just to exercise their getters and setters directly. That lets tests of domain objects focus on any actual business logic.
That does mean that if the domain object has an error, then tests of multiple components can fail. I think that's OK. I would still have tests of the domain objects (because it's easier to test those in isolation than to make sure all paths are covered in a test of a service), but I don't want to depend entirely on the domain object tests to accurately reflect how those objects are used in the service; that seems like too much to ask.
You have a point that the mocks allow you to make the objects without filling in all their data (and I'm sure the real code can get a lot worse than what is posted). It's a trade-off, but having code coverage that includes the actual domain objects as well as the service under test seems like a bigger win to me.
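As a middle ground, a test data builder can also take the sting out of those constructors without mocking. A rough sketch using the classes from the question (the builder itself and its default values are invented for illustration):

// Hypothetical builder that supplies sensible defaults so each test only
// states the data it actually cares about.
class PersonBuilder {
    private String firstName = "Billy";
    private String lastName = "Tyne";
    private String email = "billy.tyne@myspace.com";
    private Address address = new Address("334", "Main Street", "Gloucester", "MA", "01930", "USA");
    private Phone phone = new Phone("1", "978-281-2965");
    private Vessel vessel = new Vessel("Andrea Gail", "Fishing", 92000);

    PersonBuilder withEmail(String email) {
        this.email = email;
        return this;
    }

    Person build() {
        return new Person(firstName, lastName, email, address, phone, vessel);
    }
}

// In the test, only the detail that matters stands out:
// final Person captain = new PersonBuilder().withEmail(email).build();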
It seems to me like your team has chosen to err on the side of pragmatism vs purity. If everybody else has arrived at this consensus you need to respect that. Some things are worth making waves over. This isn't one of them.
It is a tradeoff, and you have designed your example nicely to be 'on the edge'. Generally, mocking should be done for a reason. Good reasons are:
You cannot easily make the depended-on component (DOC) behave as intended for your tests.
Calling the DOC causes non-deterministic behaviour (date/time, randomness, network connections).
The test setup is overly complex and/or maintenance-intensive (like the need for external files) (* see below).
The original DOC brings portability problems for your test code.
Using the original DOC causes unacceptably long build/execution times.
The DOC has stability (maturity) issues that make the tests unreliable, or, worse, the DOC is not even available yet.
For example, you (typically) don't mock standard library math functions like sin or cos, because they don't have any of the abovementioned problems.
Why is it advisable to avoid mocking where it is unnecessary?
For one thing, mocking increases test complexity.
Secondly, mocking makes your tests dependent on the inner workings of your code, namely on how the code interacts with the DOCs (for example, in your case, that the captain's email is obtained using getEmail, although possibly another way might exist to get that information).
And, as Nathan mentioned, it may be seen as a plus that - without mocking - DOCs are tested for free - although I would be careful here: There is a risk that your tests lose focus if you get tempted to also test the DOCs. The DOCs should have tests of their own.
Why is your scenario 'on the edge'?
One of the abovementioned good reasons for mocking is marked with (*): "The test setup is overly complex ...", and your example is constructed to have a test setup that is a bit complex. Complexity of the test setup is obviously not a hard criterion and developers will simply have to make a choice. If you want to look at it this way, you could say that either way has some risks when it comes to future maintenance scenarios.
Summarized, I would say that neither position (generally to mock or generally not to mock) is right. Instead, developers should understand the decision criteria and then apply them to the specific situation. And, when the scenario is in the grey zone such that the criteria don't lead to a clear decision, don't fight over it.
There are two mistakes here.
First, testing that when a service method is called it delegates to another method is a bad specification. A service method should be specified in terms of the values it returns (for getters) or the values that can subsequently be obtained (for mutators) through that service interface. The service layer should be treated as a Facade. In general, few methods should be specified in terms of which methods they delegate to and when they delegate. The delegations are implementation details and so should not be tested.
Unfortunately, the popular mocking frameworks encourage this erroneous approach. So does overzealous use of Behaviour-Driven Development.
The second mistake is centered on the very concept of unit testing. We would like each of our unit tests to test one thing, so that when there is a fault in one thing, we have one test failure and locating the fault is easy. And we tend to think of "unit" as meaning the same as "method" or "class". This leads people to think that a unit test should involve only one real class, and that all other classes should be mocked.

That is impossible for all but the simplest of classes. Almost all Java code uses classes from the standard library, such as String or HashSet. Most professional Java code uses classes from various frameworks, such as Spring. Nobody seriously suggests mocking those. We accept that those classes are trustworthy, and so do not need mocking. We accept that it is OK not to mock "trustworthy" classes that the code of our unit uses. But, you say, our classes are not trustworthy, so we must mock them. Not so. You can trust those other classes, by having good unit tests for them.

But how do you avoid a tangle of interdependent classes that causes a confusing mass of test failures when there is only one fault present? That would be a nightmare to debug! Use a concept from 1970s programming (called a virtual machine hierarchy, which is now a rather confusing term given the additional meanings of virtual machine): arrange your software in layers from low level to high level, with higher layers performing operations using lower layers. Each layer provides a more expressive or advanced means of abstractly describing operations and objects. So domain objects are at a low level, and the service layer is at a higher level. When several tests fail, start debugging the lowest-level test failure(s): the fault will probably be in that layer, possibly (but probably not) in a lower layer, and not in a higher layer.
Reserve mocks only for input and output interfaces that would make the tests very expensive to run (typically, this means mocking the repository layer and the logging interface).
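A sketch of what such a state-based specification can look like (all the names here are invented for illustration; a small hand-rolled fake stands in at the repository boundary, which is the kind of interface worth faking or mocking):

import static org.junit.Assert.assertEquals;

import java.util.HashMap;
import java.util.Map;

import org.junit.Test;

public class CustomerServiceTest {

    // Hypothetical repository boundary: the expensive input/output interface.
    interface CustomerRepository {
        void save(long id, String name);
        String findName(long id);
    }

    // Hypothetical service under test, specified through what can be observed via its interface.
    static class CustomerService {
        private final CustomerRepository repository;
        CustomerService(CustomerRepository repository) { this.repository = repository; }
        void register(long id, String name) { repository.save(id, name); }
        void rename(long id, String newName) { repository.save(id, newName); }
        String nameOf(long id) { return repository.findName(id); }
    }

    // Simple in-memory fake; the test never verifies which methods were delegated to.
    static class InMemoryCustomerRepository implements CustomerRepository {
        private final Map<Long, String> names = new HashMap<>();
        public void save(long id, String name) { names.put(id, name); }
        public String findName(long id) { return names.get(id); }
    }

    @Test
    public void renamingACustomerIsObservableThroughTheService() {
        CustomerService service = new CustomerService(new InMemoryCustomerRepository());

        service.register(1L, "Billy");
        service.rename(1L, "William");

        // Assert on the value the service reports, not on which internal calls it made.
        assertEquals("William", service.nameOf(1L));
    }
}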
The intention of an automated test is to reveal that the intended behavior of some unit of software is no longer performing as expected (aka reveal bugs.)
The granularity/size/bounds of units under test in a given test suite is to be decided by you and your team.
Once that is decided, if something outside of that scope can be mocked without sacrificing the behavior being tested, then that means it is clearly irrelevant to the test, and it should be mocked. This will help with making your tests more:
Isolated
Fast
Readable (as you mentioned)
...and most importantly, when the test fails, it will reveal that the intended behavior of some unit of software is no longer performing as expected. Given a sufficiently small unit under test, it will be obvious where the bug has occurred and why.
If your test-without-mocks example were to fail, it could indicate an issue with Address, Phone, Vessel, or Person. This will cause wasted time tracking down exactly where the bug has occurred.
One thing I will mention is that your example with mocks is actually a bit unreadable IMO, because you are asserting that a String will have the value "billy.tyne@myspace.com" but it is unclear why.

How to write tests first in Java?

I have read a lot about test-driven development (TDD). My project uses tests, but currently they are written after the code has been written, and I am not clear how to do it in the other direction.
Simple example: I have a class Rectangle. It has private fields width and height with corresponding getters and setters. Common Java. Now, I want to add a function getArea() which returns the product of both, but I want to write the test first.
Of course, I can write a unit test. But it is not just that it fails; it does not even compile, because there is no getArea() function yet. Does that mean that writing the test first always involves changing the production code to introduce dummies without functionality? Or do I have to write the test in a way that uses introspection? I don't like the latter approach, because it makes the code less readable, and later refactoring with tools will not discover it and will break the test, and I know that we refactor a lot. Also, adding 'dummies' may involve lots of changes, e.g. if I need additional fields, the database must be changed for Hibernate to continue to work, … that seems like far too many production code changes when I am only "writing tests". What I would like to have is a situation where I can actually only write code inside src/test/, not touching src/main at all, but without introspection.
Is there a way to do that?
Well, TDD does not mean that you cannot have anything in the production code before writing the test.
For example:
You put your method, e.g. getArea(param1, param2) in your production code with an empty body.
Then you write the test with valid input and your expected result.
You run the test and it will fail.
Then you change the production code and run the test again.
If it still fails: back to the previous step.
If it passes, you write the next test.
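Applied to the Rectangle example from the question, the first two steps might look roughly like this (JUnit 4 style; the field and method names follow the question, the rest is illustrative):

// Production code (src/main) - step 1: just enough to make the test compile.
public class Rectangle {
    private double width;
    private double height;

    public double getWidth() { return width; }
    public void setWidth(double width) { this.width = width; }
    public double getHeight() { return height; }
    public void setHeight(double height) { this.height = height; }

    public double getArea() {
        return 0; // deliberately naive; the failing test drives the real implementation
    }
}

// Test code (src/test) - step 2: state the expected behaviour.
import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class RectangleTest {

    @Test
    public void areaIsWidthTimesHeight() {
        Rectangle rectangle = new Rectangle();
        rectangle.setWidth(3.0);
        rectangle.setHeight(4.0);

        assertEquals(12.0, rectangle.getArea(), 1e-9); // fails until getArea() is implemented
    }
}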
A quick introduction can be found for example here: codeutopia -> 5-step-method-to-make-test-driven-development-and-unit-testing-easy
What I would like to have is a situation where I can actually only write code inside src/test/, not touching src/main at all, but without introspection.
There isn't, that I have ever seen, a way to write a test with a dependency on a new part of the API, and have that test immediately compile without first extending the API of the test subject.
It's introspection or nothing.
But it is not just that it fails; it does not even compile, because there is no getArea() function yet
Historically, writing code that couldn't compile was part of the rhythm of TDD. Write a little bit of test code, write a little bit of production code, write a little bit of test code, write a little bit of production code, and so on.
Robert Martin describes this as the nano-cycle of TDD
... the goal is always to promote the line by line granularity that I experienced while working with Kent so long ago.
I've abandoned the nano-cycle constraint in my own work. Perhaps I fail to appreciate it because I've never paired with Kent.
But I'm perfectly happy to write tests that don't compile, and then back fill the production code I need when the test is in a satisfactory state. That works well for me because I normally work in a development environment that can generate production implementations at just a few key strokes.
Another possibility is to consider a discipline like TDD as if you meant it, which does a lot more of the real work in the test source hierarchy before moving code into the production hierarchy.
I've been working on Android development for quite some time, but I never fully adopted TDD in Android. However, I recently tried to develop my new app with complete TDD. So here is my opinion.
Does that mean that writing the test first always involves changing the production code to introduce dummies without functionality?
I think the answer is yes. As I understand it, every test corresponds to a spec or use case I have for the software. So writing a failing test first is an attempt to capture the requirement specs as test code. Then, when I wrote the production code to pass the just-written test case, I really tried to make it work. After doing this for a while, I was pretty surprised at how small my production code was while still covering so much of the requirements.
For me personally, all the failing test cases I wrote before the production code actually came from a list of questions I had brainstormed about the requirements, and I sometimes used them to explore edge cases of the requirements.
So the basic workflow is Red - Green - Refactor, which I got from the presentation by Bryan Breecham - https://www.infoq.com/presentations/tdd-lego/
About,
What I would like to have is a situation where I can actually only write code inside src/test/, not touching src/main at all, but without introspection.
For me, I think that is possible only if you write all your production logic first, and the unit tests then play the role of verifying that the requirements are fulfilled. That is just the other way around. So overall I think TDD is the approach, but people may use unit tests for different purposes, e.g. reducing testing time, etc.

Unit testing methods which do not produce a distinct output

In my Vaadin GUI application, there are many methods which look like the ones below.
@Override
protected void loadLayout() {
    CssLayout statusLayout = new CssLayout();
    statusLayout.addComponent(connectedTextLabel);
    statusLayout.addComponent(connectedCountLabel);
    statusLayout.addComponent(notConnectedTextLabel);
    statusLayout.addComponent(notConnectedCountLabel);

    connectionsTable.getCustomHeaderLayout().addComponent(statusLayout);
    connectionsTable.getCustomHeaderLayout().addComponent(commandLayout);
    connectionsTable.getCustomHeaderLayout().addComponent(historyViewCheckbox);

    bodySplitter.addComponent(connectionsTable);
    bodySplitter.addComponent(connectionHistoryTable);
    bodySplitter.setSplitPosition(75, Sizeable.Unit.PERCENTAGE);
    bodySplitter.setSizeFull();

    bodyLayout.addComponent(bodySplitter);

    if (connectionDef.getConnectionHistoryDef() == null) {
        historyViewCheckbox.setVisible(false);
    }

    if (connectionDef.getConnectionStatusField() == null || connectionDef.getConnectedStatusValue() == null || connectionDef.getConnectedStatusValue().isEmpty()) {
        connectedTextLabel.setVisible(false);
        connectedCountLabel.setVisible(false);
        notConnectedTextLabel.setVisible(false);
        notConnectedCountLabel.setVisible(false);
    }
}

protected void setStyleNamesAndControlIds() {
    mainLayout.setId("mainLayout");
    header.setId("header");
    footer.setId("footer");
    propertyEditorLayout.setId("propertyEditorLayout");
    propertyEditor.setId("propertyEditor");

    mainLayout.setStyleName("mainLayout");
    propertyEditorLayout.setStyleName("ui_action_edit");
    header.setStyleName("TopPane");
    footer.setStyleName("footer");
}
These methods are used for setting up the layout of GUIs. They do not produce a single distinct output. Almost every line in these methods does a separate job that is largely unrelated to the other lines.
Usually, when unit testing a method, I check the return value of the method, or validate calls on a limited number of external objects such as database connections.
But for methods like the ones above, there is no such single output. If I wrote unit tests for such methods, my test code would check that each method call in every line of the method happens, and in the end it would look almost like the method itself.
If someone altered the code in any way, the test would break and they would have to update the test to match the change. But there is no assurance that the change didn't actually break anything, since the test doesn't check the actual UI drawn in the browser.
For example, if someone changed the style name of a control, he would have to update the test code with the new style name and the test would pass. But for things to actually work without any issue, he would have to change the relevant SCSS style files too, and the test makes no contribution to detecting that issue. The same applies to the layout setup code as well.
Is there any advantage of writing unit tests like above, other than keeping the code coverage rating at a higher level? To me it feels useless; writing a test that compares the decompiled bytecode of the method to the original decompiled bytecode kept as a string in the test looks much better than these kinds of tests.
Is there any advantage of writing unit tests like above, other than keeping the code coverage rating at a higher level?
Yes, if you take a sensible approach. It might not make sense, as you say, to test that a control has a particular style. So focus your tests on the parts of your code that are likely to break. If there is any conditional logic that goes into producing your UI, test that logic. The test will then protect your code from future changes that could break your logic.
As for your comment about testing methods that don't return a value, you can address that in several ways.
It's your code, so you can restructure it to be more testable. Think about breaking it down into smaller methods. Isolate your logic into individual methods that can be called in a test (see the sketch at the end of this answer).
Indirect verification - Rather than focusing on return values, focus on the effect your method has on other objects in the system.
Finally consider if unit testing of the UI is right for you and your organization. UIs are often difficult to unit test (as you have pointed out). Many companies write functional tests for their UIs. These are tests that drive the UI of the actual product. This is very different from unit tests which do not require the full product and are targeted at very small units of functionality.
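For example, the visibility decisions in loadLayout() above could be pulled out into small methods that depend only on the connection definition and return a boolean. The method names below are invented, and ConnectionDef stands for whatever type the connectionDef field has in the real code; this is only a sketch of the idea:

// Extracted decision logic: plain methods that need no Vaadin components or browser.
static boolean isHistoryViewAvailable(ConnectionDef connectionDef) {
    return connectionDef.getConnectionHistoryDef() != null;
}

static boolean areConnectionCountersAvailable(ConnectionDef connectionDef) {
    return connectionDef.getConnectionStatusField() != null
            && connectionDef.getConnectedStatusValue() != null
            && !connectionDef.getConnectedStatusValue().isEmpty();
}

// loadLayout() then just applies the decisions, e.g.:
// historyViewCheckbox.setVisible(isHistoryViewAvailable(connectionDef));
// ...and an ordinary unit test can cover the conditional logic directly, e.g.:
// assertFalse(isHistoryViewAvailable(connectionDefWithoutHistory));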
Here's one simple example you could look at to see how to drive your application and try what is needed. This is a Vaadin 8, CDI & WildFly Swarm example, and it is in no way the only way to test the UIs of a Vaadin application.
https://github.com/wildfly-swarm/wildfly-swarm-examples/blob/master/vaadin/src/it/java/org/wildfly/swarm/it/vaadin/VaadinApplicationIT.java

Change method accessibility in order to test it

I have a public method that calls a group of private methods.
I would like to test each of the private methods with a unit test, as it is too complicated to test everything through the public method.
I think it is bad practice to change method accessibility only for testing purposes.
But I don't see any other way to test it (maybe reflection, but that is ugly).
Private methods should only exist as a consequence of refactoring a public method, that you've developed using TDD.
If you create a class with public methods and plan to add private methods to it, then your architecture will fail.
I know it's harsh, but what you're asking for is really, really bad software design.
I suggest you buy Uncle Bob's book "Clean Code"
http://www.amazon.co.uk/Clean-Code-Handbook-Software-Craftsmanship/dp/0132350882
Which basically gives you a great foundation for getting it right and saving you a lot of grief in your future as a developer.
There is IMO only one correct answer to this question: if the class is too complex, it means it's doing too much and has too many responsibilities. You need to extract those responsibilities into other classes that can be tested separately.
So the answer to your question is NO!
What you have is a code smell. You're seeing the symptoms of a problem, but you're not curing it. What you need to do is use refactoring techniques like extract class or extract subclass. Try to see if you can extract one of those private methods (or parts of it) into a class of its own. Then you can add unit tests to that new class. Divide and conquer until you have the code under control.
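A rough sketch of what "extract class" can look like (all the names are invented): logic that used to hide in a private method becomes a small class with a public API of its own, which can then be tested directly.

// Before (simplified): the interesting logic hides in a private method.
// public class OrderProcessor {
//     public Receipt process(Order order) { ... applyDiscount(order) ... }
//     private BigDecimal applyDiscount(Order order) { ... }
// }

// After: the logic lives in its own small class with its own tests.
import java.math.BigDecimal;

class DiscountCalculator {
    // Hypothetical rule: 10% off order totals of 100.00 or more.
    BigDecimal discountFor(BigDecimal orderTotal) {
        if (orderTotal.compareTo(new BigDecimal("100.00")) >= 0) {
            return orderTotal.multiply(new BigDecimal("0.10"));
        }
        return BigDecimal.ZERO;
    }
}

The original class then takes a DiscountCalculator as a collaborator (ideally injected), and the tricky paths are covered by plain tests of DiscountCalculator rather than by prying open a private method.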
You could, as has been mentioned, change the visibility from private to package-private, and then ensure that the unit tests are in the same package (which should normally be the case anyway).
This can be an acceptable solution to your testing problem, given that the interfaces of the (now package-private) functions are sufficiently stable and that you also do some integration testing (that is, checking that the public methods call the package-private ones in the correct way).
There are, however, some other options you might want to consider:
If the private functions are interface-stable but sufficiently complex, you might consider creating separate classes for them - it is likely that some of them might benefit from being split into several smaller functions themselves.
If testing the private functions via the public interface is inconvenient (maybe because of the need for a complex setup), this can sometimes be solved by the use of helper functions that simplify the setup and allow different tests to share common setup code.
You are right, changing the visibility of methods just so you are able to test them is a bad thing to do. Here are the options you have:
Test it through existing public methods. You really shouldn't test methods but behavior, which normally needs multiple methods anyway. So stop thinking about testing that method, but figure out the behavior that is not tested. If your class is well designed it should be easily testable.
Move the method into a new class. This is probably the best solution to your problem from a design perspective. If your code is so complex that you can't reach all the paths in that private method, parts of it should probably live in their own class. In that class they will have at least package scope and can easily be tested. Again: you should still test behavior not methods.
Use reflection. You can access private fields and methods using reflection. While this is technically possible, it just adds more legacy code on top of the existing legacy code in order to hide it, which is in the general case a rather stupid thing to do. There are exceptions, for example if for some reason you are not allowed to make even the smallest change to the production source code. If you really need this, google it.
Just change the visibility. Yes, it is bad practice. But sometimes the alternatives are making large changes without tests or not testing it at all. So sometimes it is OK to just bite the bullet and change the visibility, especially when it is the first step towards writing some tests and then extracting the behavior into its own class.

How to refactor legacy code effectively and efficiently? [duplicate]

Possible Duplicates:
What should I keep in mind in order to refactor huge code base?
When is it good (if ever) to scrap production code and start over?
I am currently working with some legacy source code files. They have quite a few problems because they were written by a database expert who does not know much about Java. For instance,
Fields in classes are public. No getters and setters.
Use raw types, not parameterized types.
Use static unnecessarily.
Super long method names.
Methods need too many parameters.
Repeat Yourself frequently.
I want to modify them so that they are more object-oriented. What are some best practices and effective/efficient approaches?
Read "Working Effectively with Legacy Code" by Michael Feathers. Great book - and obviously it'll be a lot more detailed than answers here. It's got lots of techniques for handling things sensibly.
It looks like you've identified a number of issues, which is a large part of the problem. Many of those sound like they can be fixed relatively easily - it's overall design and architecture which is harder to do, of course.
Are there already unit tests, or will you be adding those too?
Before you start, create a system-level regression test suite for the application. You need this so that you can verify that your changes don't break things.
To do the refactoring, you want to use a combination of a good IDE and a text search tool (e.g. grep). Use the text search tool to find occurrences of the "syndromes" that you want to fix, then use the IDE (and its built-in refactoring capabilities) to fix the instances ... one at a time.
For example, Eclipse allows you to rename a method or class, or generate getters and setters. So you'd cure a 'public' attribute (a small before/after sketch follows these steps) by:
Change the attribute to private.
Generate the getter and setter methods.
Save the file.
Go through all of the Java compilation errors resulting from the fact that the attribute is now private, and change to use the getter or setter as appropriate.
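A minimal before/after sketch of that particular step (the class and field names are just examples):

// Before: callers reach into the object's state directly.
public class Account {
    public double balance;
}

// After: the field is private and access goes through generated accessors,
// giving you a single place to add validation or change the representation later.
public class Account {
    private double balance;

    public double getBalance() {
        return balance;
    }

    public void setBalance(double balance) {
        this.balance = balance;
    }
}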
This approach will give you the low-hanging fruit. More fundamental design issues are more difficult, and may be impossible to fix without fundamental restructuring of the application. The refactoring capabilities will help you execute such changes, but deciding what to do is ultimately up to you.
Finally, my advice is to not be too ambitious. Go for incremental improvement, and be prepared to draw the line when the code is "good enough". You won't achieve perfection ... not even if you start from a clean slate ... so don't set your expectations high.
Is it just the code that is bad, or does it also hurt the user experience? Refactoring continuously is a good idea, but it should not be a goal unto itself. It should improve the application in terms of user interaction, maintainability, stability, performance, etc.
That is why I am not extremely fond of huge refactoring just to improve the code quality. Instead, refactor the code that you work with.
While working with a legacy system for several years, I have personally found that:
Create for yourself a vision of how you want the code to look after you're done. It should be attainable and contain a list of technology changes and general architecture changes. It may also be a good idea to make a rough prioritization of which classes are most critical to change. We lacked such a vision a few years ago, and while we refactored a lot, the code quality barely improved.
Now, restrict your refactorings to those that move you toward that vision. Don't fall into the trap of doing whatever appears good at the moment.
Focus on a particular component, and make it better. Then move on to the next. It's tempting to make huge changes that affect the entire system, but in truth you will introduce more problems than you solve.
Write integration regression tests. I.e., a few big tests that test a lot of functionality. It's not optimal, but it's the best you can do. Writing unit tests for every single class in your old system may end up a waste of time because it's not designed to be tested anyway and you want to redesign half of the classes.
Accept that it will take time.
Eclipse should be able to take care of #1 and help you work your way through many of the others.
As for converting poor OO code to good OO code, it is amazingly difficult. Often it seems easier to rewrite it from scratch.
I tend to go from the bottom up. As I'm working on some small section I'll recognize a bunch of data that belongs together as a group and I'll make a good object that replaces that code without changing anything else--Very Small Changes with constant tests between each change.
This makes for a mediocre design at best, but I honestly don't know if you can go from non-OO to good OO on a large project without dissecting the original program, understanding it and using it as a template for the rewrite, and few projects allow this (even though it might be faster, you'll rarely if ever be able to convince management of that fact).
The point is risk, I think.
The ugly code is just ugly, but it works: it has been tested and bugfixed. If running code is changed, risk follows, so testing is critical.
As a conservative approach, you could refactor the related code whenever you have to fix a bug.
Maybe the first challenge is to persuade your manager :)
What's the problem with it not having getters and setters? I'd suggest refactoring those only when you need to add non-trivial getters or setters (e.g., with validation).
The rest sounds like you need to identify groups of values and create new types holding them, so instead of passing a String name, String address, int yearOfBirth, String[] accountNames, int[] balances you would pass a Customer around, which would in turn have an Account[].
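A rough sketch of that grouping (restricted to the fields mentioned above; everything else is illustrative):

// Groups values that always travel together, replacing long parameter lists.
class Account {
    private final String name;
    private final int balance;

    Account(String name, int balance) {
        this.name = name;
        this.balance = balance;
    }

    String getName() { return name; }
    int getBalance() { return balance; }
}

class Customer {
    private final String name;
    private final String address;
    private final int yearOfBirth;
    private final Account[] accounts;

    Customer(String name, String address, int yearOfBirth, Account[] accounts) {
        this.name = name;
        this.address = address;
        this.yearOfBirth = yearOfBirth;
        this.accounts = accounts;
    }

    Account[] getAccounts() { return accounts; }
}

// A method that used to take five loosely related parameters now takes one Customer.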
IDEA Ultimate Edition has a code duplication detector that's very good (it's only missing a 'suggested solution' button!), and there are tools such as CPD as well.
I'd suggest that in a large legacy codebase you might waste time refactoring code only to find out it wasn't used anyway. I outlined some steps for removing unused code: http://rickyclarkson.blogspot.com/2009/12/deleting-code-what-first.html
How many of those "issues" are real problems and not just matters of style? Of this list, the only 'real' issue I can see is "Repeat Yourself frequently", and that's more of an ongoing maintenance problem that should be resolved during normal code maintenance when someone's going to be changing the code anyway.
I want to modify them so that they are more object-oriented.
Object-orientation should not be your only goal when refactoring. The question you should ask yourself is what the expected ROI is (better quality? easier enhancements? better sharing of this code across a team?). ROI is not just words; you should be prepared to measure the return on investment with numbers (even for the quality improvements, for example). You should take into account the expected lifetime of your product when estimating the ROI.
You should also ask yourself how much code depends on the code you want to refactor. Refactoring a library may be easy, but it can lead to a lot of changes in the source code that depends on that library, making the work far larger than just refactoring the library itself.
Before touching any code, you should estimate the total work needed to finish refactoring the code and the dependent code. You should consider whether to do a total rewrite of the code, a partial rewrite, or just an internal rewrite without touching the APIs.
With the costs and returns in hand, you can decide whether it's worth the effort to refactor your code.
