I am building a Spring MVC web application.
I plan on using Hibernate.
I don't have much experience with obfuscation.
What are the potential downsides to obfuscating an application?
I understand that there might be issues with debugging the app, and recovering lost source code is also an issue.
Are there any known issues with actually running the application? Can bugs be introduced?
Since I am looking for general guidance in this area, please feel free to raise any issues that I should be aware of.
There are certainly some potential performance/maintenance issues, but a good obfuscator will let you get round at least some of them. Things to look out for:
an obvious one: if your code calls methods by reflection or dynamically loads classes, then this is liable to fail if the class/method names are obfuscated; a good obfuscator will let you select class/method names not to obfuscate to get round this problem;
a similar issue can occur if not all of your application is compiled at the same time;
if it works directly at the bytecode level, an obfuscator can create code that a Java compiler could never produce (e.g. it can insert arbitrary GOTO instructions, whereas a Java compiler only emits them for structured control flow such as loops and conditionals). This may be a bit theoretical, but if I were writing a JVM, I'd optimise performance for sequences of bytecodes that a Java compiler can create, not ones that it can't...
the obfuscator is liable to make other subtle changes to performance if it significantly alters the number of bytecodes in a method, or in some way changes whether a given method/piece of code hits thresholds for certain JVM optimisations (e.g. "inline methods with fewer than X bytecodes").
But as you can see, some of these effects are a little subtle and theoretical-- so to some extent what you need to do is soak-test your application after obfuscation, just as you would with any other major change.
You should also be careful not to assume that obfuscation hides your code/algorithm (if that is your intention) as much as you want it to-- use a decompiler to have a look at the contents of the resulting obfuscated classes.
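To illustrate the reflection point, here is a minimal sketch. The class and method names (ReportService, buildReport, ReflectionLookupDemo) are made up for the example, and the stale name "a" merely simulates what a renaming obfuscator might leave behind; it is not the output of any particular tool.

```java
import java.lang.reflect.Method;

// Hypothetical service class; a renaming obfuscator could rename it and its method.
class ReportService {
    public String buildReport() {
        return "report";
    }
}

class ReflectionLookupDemo {
    // Loads a class by name, then looks up and invokes a no-argument method by name.
    // The names are plain strings, so an obfuscator cannot easily know that they
    // must change together with the class and method they refer to.
    static Object callByName(String className, String methodName) {
        try {
            Class<?> type = Class.forName(className);
            Method method = type.getMethod(methodName);
            Object instance = type.getDeclaredConstructor().newInstance();
            return method.invoke(instance);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("reflective lookup failed: " + e, e);
        }
    }

    public static void main(String[] args) {
        String className = ReportService.class.getName();
        System.out.println(callByName(className, "buildReport"));
        try {
            // Simulates the situation after renaming: the string no longer matches.
            callByName(className, "a");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

This kind of failure only shows up at runtime, which is why names used this way need to be excluded from renaming and why testing the obfuscated build matters.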
Surprised no one has mentioned speed - in general, more obfuscated = slower-running code
[Edit] I can't believe this has -2. It is a correct answer.
Shortening identifiers and removing unused methods will decrease the file size, but have zero impact on the running speed (other than the few nanoseconds shaved off the loading time). Meanwhile, most of the obfuscation of the program comes from added code:
Breaking 1 method into 5; interleaving methods; merging classes [aggregation transformations]
Splitting 1 arithmetic expression into 10; jumbling the control-flow [computation transformations]
And adding chunks of code that do nothing [opaque predicates]
are all common obfuscation techniques that cause a program to run slower.
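As a rough source-level picture of an opaque predicate, the sketch below is a hand-written imitation, not the output of a real obfuscator (which would work on bytecode). The class and method names are invented for the example.

```java
class OpaquePredicateDemo {
    // Straightforward version.
    static int sum(int[] values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // Imitation of an obfuscated version: same result, more work per iteration.
    static int sumObfuscated(int[] values) {
        int total = 0;
        for (int i = 0; i < values.length; i++) {
            // Opaque predicate: the product of two consecutive ints is always even
            // (this also holds after int overflow), so this branch is always taken.
            if ((i * (i + 1)) % 2 == 0) {
                total += values[i];
            } else {
                total -= values[i] * 31; // dead code, never executed
            }
        }
        return total;
    }

    public static void main(String[] args) {
        int[] data = {3, 1, 4, 1, 5};
        System.out.println(sum(data) == sumObfuscated(data));
    }
}
```

The extra multiplication, remainder and branch are executed on every iteration unless the JIT manages to prove the condition constant, which is where the slowdown comes from.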
You may want to look at some of the comments here, to decide if obfuscating makes sense:
https://stackoverflow.com/questions/1988451/net-obfuscation
You may want to express why you want to obfuscate. IMO the best reasons are mainly to have a smaller application, as you can get rid of classes that aren't being used in your project, while obfuscating.
I have never seen bugs introduced, as long as you aren't using reflection that assumes it can find something by name; private methods, for example, will have their names changed.
The biggest problem centers around the fact that obfuscating programs generally make a guarantee of not changing the behavior of their target program. In some cases this proves very hard to do. For example, imagine a program which checks the value of certain private fields via reflection, taking the field names from a string array. An obfuscator may not be able to tell that these strings also need to be updated correspondingly, and the result will be unexpected access errors that pop up at runtime.
Worse still, it may not be obvious that the behavior of a program has changed subtly -- then you may not know that there's a problem at all, until your customer finds it first and gets upset.
Generally, professional-grade obfuscation products are sophisticated enough to catch some kinds of problems and prevent them, but ultimately it can be challenging to cover all the bases. The best defense is to run unit tests against the obfuscated result and make sure that all your expected behavior continues to hold true.
One free obfuscator you might want to check out is Babel. It is designed to be used on the command line (like many other obfuscators), but there is a Reflector add-in that provides a UI for it.
When it comes to obfuscation, you really need to analyze what your goal is. In your case - if you have a web application (mvc) are you planning on selling it as a canned downloadable application? (if not and you keep the source on your web servers then you don't need it).
You might look at the components and pick only certain parts to obfuscate ... not the whole thing. In general, ASP.NET apps break pretty easily when you try to add obfuscation after you have developed them, due to all the reflection used.
Pretty much everything mentioned above is true ... it all depends on how many features you turn on to make it hard to reverse your code:
Renaming of members (fields/methods/events/properties) is most common (comes in different flavors: simple renaming of methods from something like GetId() to a() all the way to unreadable characters and removal of namespaces). BTW: This is where reflection usually breaks. Your assembly file may end up being smaller due to smaller strings being used too.
String encryption: this makes it harder to reverse the static strings used in your code. BTW: this paired with renaming makes it difficult for you to debug your renaming problems ... so you might turn it on after you have renaming working. It also has to add IL code to decrypt each string right before it is used.
Code mangling ... this is what BlueRaja was referring to. It makes your code look like spaghetti code, to make it harder for someone to figure out. The CLR does not like this ... it can't optimize things as easily, and your final code will most likely run slower due to the additional branching and to methods no longer being inlined because of the IL rewriting used for this option. BTW: this option really does raise the bar on what it takes to reverse your source code, but may come with a performance hit.
Removal of unused code. Some obfuscators offer you the option to trim any code that they find is not being used. This may make your assembly a little smaller if you have a lot of dead code hanging around ... but it is just a free benefit obfuscators throw in.
My advice is to only use it if you know why you are using it and design with that end in mind ... don't try to add it after you've finished your code (I've done that and it's not fun)
I wrote a JMH test to measure the cost of the new instruction and checked the class files it generates. Besides the usual classes, there are tons of derived classes in the generated folder:
This really shocked me: just a few annotations lead to so many classes bound together through inheritance. I was curious about what is in those classes, so I used a decompiler called Procyon (BTW, I learned about this tool from a talk at KotlinConf 2019) to decompile the generated classes. Most of them are control related, such as measuring time (those methods are explicitly marked as not inlinable) and collecting metrics. But there are tons of weird booleans in those classes:
There are many booleans in the other generated class files as well. I googled this, and it seems they are somehow derived from the JMH source code. So I want to ask: what are these booleans used for? I assume they are closely related to the working principle underlying JMH, but there seem to be no comments about the booleans in the JMH source code.
Also, any suggestions for improving the JMH test I mentioned at the very beginning? I know testing such things can be very tricky and fragile, so I don't know whether my results are accurate or reliable enough.
Many thanks.
Just guessing.
As you can see, the booleans are private and unused in the source file. They might be used somewhere via reflection, but I'd bet they aren't. So the only thing left is ensuring that markerBegin and the other field end up on different cache lines in order to prevent false sharing.
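If the guess about false sharing is right, the idea can be sketched roughly as below. This is not JMH's generated code: the class names are invented, the sketch uses long fields instead of booleans to stay short, and it assumes 64-byte cache lines and a JVM that lays out superclass fields before subclass fields (which is why the padding is spread over a small class hierarchy).

```java
// Padding placed before the hot field (8 longs = 64 bytes, assuming 64-byte cache lines).
class PadBefore {
    long b0, b1, b2, b3, b4, b5, b6, b7;
}

// The field that is written frequently and should not share a cache line
// with unrelated data.
class HotValue extends PadBefore {
    volatile long value;
}

// Padding placed after the hot field.
class PaddedCounter extends HotValue {
    long a0, a1, a2, a3, a4, a5, a6, a7;

    // Intended for a single writer thread; the volatile increment is not atomic.
    void increment() {
        value = value + 1;
    }

    long get() {
        return value;
    }

    public static void main(String[] args) {
        PaddedCounter counter = new PaddedCounter();
        counter.increment();
        counter.increment();
        System.out.println(counter.get());
    }
}
```

The padding fields are never read or written, which would match the observation that the booleans in the generated classes look unused.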
I am a senior developer, so this seems like a stupid question to me. My answer should be NO, or WHAT? NO!!!
But I was in a meeting yesterday, explaining some PMD results. When we got to the "too long method name" issue, I started to explain and the customer said: well, and remember that a long method name has an impact on performance; the program runs slower.
I said: no, you are wrong, it is only a clean-code rule. It is important for getting good code, but it has nothing to do with performance; the bytecode is similar with different names.
But the client, along with some other people in the meeting, was sure about this. They had had some projects in which long method names were the cause of poor performance.
The only idea I have is that some introspection or reflection is related to this, but apart from that I am sure, or I thought I was sure, that method name length has no performance impact.
Any idea or suggestion about this?
Arguably it will take more space in memory and storage, so a jar file containing classes with enormous method names will be larger than one with short method names, for example.
However, any difference in performance is incredibly unlikely to be noticeable. I think it's almost certain that the projects where they were blaming long method names for poor performance were actually misdiagnosed. It's not like it would be the first time that's happened.
Of course, the best way to take the heat out of this situation is to provide evidence - if performance is important, you should have tests for performance. Run those tests with long method names, then refactor them to short method names and rerun the tests. I'd be incredibly surprised if there were a significant difference.
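As a very crude sketch of such a comparison, the code below times the same computation through a short and a long method name. All names are invented for the example, and a hand-rolled System.nanoTime loop like this is noisy; a proper harness such as JMH would give more trustworthy numbers.

```java
import java.util.function.LongBinaryOperator;

class MethodNameTiming {
    // Written to a field so the timed work is not trivially discarded.
    static volatile long sink;

    static long add(long a, long b) {
        return a + b;
    }

    // Same body as add, with a deliberately long name.
    static long addTheseTwoNumbersTogetherAndReturnTheResultingSumToTheCaller(long a, long b) {
        return a + b;
    }

    // Applies op repeatedly and returns the accumulated result.
    static long run(LongBinaryOperator op, int iterations) {
        long acc = 0;
        for (int i = 0; i < iterations; i++) {
            acc = op.applyAsLong(acc, i);
        }
        return acc;
    }

    static long nanosFor(LongBinaryOperator op, int iterations) {
        long start = System.nanoTime();
        sink = run(op, iterations);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int n = 50_000_000;
        LongBinaryOperator shortName = MethodNameTiming::add;
        LongBinaryOperator longName =
                MethodNameTiming::addTheseTwoNumbersTogetherAndReturnTheResultingSumToTheCaller;
        // Warm-up rounds so both paths are JIT-compiled before timing.
        nanosFor(shortName, n);
        nanosFor(longName, n);
        System.out.println("short name: " + nanosFor(shortName, n) + " ns");
        System.out.println("long name:  " + nanosFor(longName, n) + " ns");
    }
}
```

Whatever the numbers turn out to be on a given machine, having them on the table is more persuasive than either side's opinion.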
Method names are relevant not just for reflection but also during class loading, and of course in both cases a long method name means that at some level there is more for the CPU to do. However, with method name lengths that are even remotely practical (i.e. not thousands of characters long), I am absolutely certain that it's impossible for this to be significant compared to other things that have to be done during reflection or class loading.
But the client, and there were some people in the meeting arguing in
this, was sure about this. They had some projects in that long method
names were the cause of poor performance.
It sounds like a total guess being treated as fact.
This is just a case of some people's general nuttiness about performance.
Even if they happen to be right, it's a total guess.
Every program has room for performance improvement by changing certain things.
Guessing does not inform you what those things are.
If two programs that do the same thing have different performance, it only means they've been optimized to different degrees.
Your challenge is to explain this.
Startup times will be affected positively if class names and member names are shortened. To that end one can use a bytecode shrinker.
For example, yguard (LGPL) can shrink code. It also allows you to deobfuscate stack traces for debugging purposes.
Manually assigning short class and member names for performance reasons is of course a horrible idea.
I can't see why it could possibly impact performance significantly unless you are pulling method names out yourself through reflection and then rendering them in a UI. That is obviously not the case, so I'm just confused. Are you sure your client isn't confusing method names with file names, or thinking of really old programming languages that did not support very long method names? Depending on how old that person is, their judgement is definitely absurd to a computer scientist. If they can prove their point with facts, they may as well submit it to ACM, Oracle/Sun or MIT to verify their findings.
I think the length of a function name can affect performance in the following cases:
when bytecode is compiled to binary code (with Java, .NET, ...): the bytecode still contains the file name, class name and package name;
when we use *.lib, *.dll or *.so files (on Android, for example, when you use native code);
when we use native code to call a Java function (in Java, on Android);
when a black box (lib file, app) connects to other black boxes (lib file, app), it uses the function name from the header file as the identifier. So I think the length of the name will affect performance.
When I receive code I have not seen before to refactor it into some sane state, I normally fix "cosmetic" things (like converting StringTokenizers to String#split(), replacing pre-1.2 collections by newer collections, making fields final, converting C-style arrays to Java-style arrays, ...) while reading the source code I have to get familiar with.
Are there many people using this strategy (maybe it is some kind of "best practice" I don't know?) or is this considered too dangerous, and not touching old code if it is not absolutely necessary is generally preferred? Or is it more common to combine the "cosmetic cleanup" step with the more invasive "general refactoring" step?
What are the common "low-hanging fruits" when doing "cosmetic clean-up" (vs. refactoring with more invasive changes)?
In my opinion, "cosmetic cleanup" is "general refactoring." You're just changing the code to make it more understandable without changing its behavior.
I always refactor by attacking the minor changes first. The more readable you can make the code quickly, the easier it will be to do the structural changes later - especially since it helps you look for repeated code, etc.
I typically start by looking at code that is used frequently and will need to be changed often, first. (This has the biggest impact in the least time...) Variable naming is probably the easiest and safest "low hanging fruit" to attack first, followed by framework updates (collection changes, updated methods, etc). Once those are done, breaking up large methods is usually my next step, followed by other typical refactorings.
There is no right or wrong answer here, as this depends largely on circumstances.
If the code is live, working, undocumented, and contains no testing infrastructure, then I wouldn't touch it. If someone comes back in the future and wants new features, I will try to work them into the existing code while changing as little as possible.
If the code is buggy, problematic, missing features, and was written by a programmer that no longer works with the company, then I would probably redesign and rewrite the whole thing. I could always still reference that programmer's code for a specific solution to a specific problem, but it would help me reorganize everything in my mind and in source. In this situation, the whole thing is probably poorly designed and it could use a complete re-think.
For everything in between, I would take the approach you outlined. I would start by cleaning up everything cosmetically so that I can see what's going on. Then I'd start working on whatever code stood out as needing the most work. I would add documentation as I understand how it works so that I will help remember what's going on.
Ultimately, remember that if you're going to be maintaining the code now, it should be up to your standards. Where it's not, you should take the time to bring it up to your standards - whatever that takes. This will save you a lot of time, effort, and frustration down the road.
The lowest-hanging cosmetic fruit is (in Eclipse, anyway) shift-control-F. Automatic formatting is your friend.
The first thing I do is try to hide as much as possible from the outside world. If the code is crappy, most of the time the guy who implemented it did not know much about data hiding and the like.
So my advice, first thing to do:
Turn as many members and methods as private as you can without breaking the compilation.
As a second step I try to identify the interfaces. I replace the concrete classes through the interfaces in all methods of related classes. This way you decouple the classes a bit.
Further refactoring can then be done more safely and locally.
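A small sketch of the second step, with an invented NameFormatter class: the method used to demand a concrete ArrayList and now accepts any List, so callers are no longer tied to one implementation.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class NameFormatter {
    // Before the change the signature was: static String join(ArrayList<String> names)
    // Depending on the List interface decouples callers from the concrete class.
    static String join(List<String> names) {
        return String.join(", ", names);
    }

    public static void main(String[] args) {
        List<String> fromArrayList = new ArrayList<>();
        fromArrayList.add("Ann");
        fromArrayList.add("Bob");

        List<String> fromLinkedList = new LinkedList<>(fromArrayList);

        // Both implementations are accepted now.
        System.out.println(join(fromArrayList));
        System.out.println(join(fromLinkedList));
    }
}
```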
You can buy a copy of Refactoring: Improving the Design of Existing Code from Martin Fowler, you'll find a lot of things you can do during your refactoring operation.
Plus you can use the tools provided by your IDE and other code analyzers such as FindBugs or PMD to detect problems in your code.
Resources :
www.refactoring.com
wikipedia - List of tools for static code analysis in java
On the same topic :
How do you refactor a large messy codebase?
Code analyzers: PMD & FindBugs
By starting with "cosmetic cleanup" you get a good overview of how messy the code is and this combined with better readability is a good beginning.
I always (yeah, right... sometimes there's something called a deadline that messes with me) start with this approach and it has served me very well so far.
You're on the right track. By doing the small fixes you'll be more familiar with the code and the bigger fixes will be easier to do with all the detritus out of the way.
Run a tool like JDepend, CheckStyle or PMD on the source. They can automatically flag loads of changes that are cosmetic but based on general refactoring rules.
I do not change old code except to reformat it using the IDE. There is too much risk of introducing a bug - or removing a bug that other code now depends upon! Or introducing a dependency that didn't exist such as using the heap instead of the stack.
Beyond the IDE reformat, I don't change code that the boss hasn't asked me to change. If something is egregious, I ask the boss if I can make changes and state a case of why this is good for the company.
If the boss asks me to fix a bug in the code, I make as few changes as possible. Say the bug is in a simple for loop. I'd refactor the loop into a new method. Then I'd write a test case for that method to demonstrate I have located the bug. Then I'd fix the new method. Then I'd make sure the test cases pass.
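The bug-fix routine described above might look like this in miniature. The InvoiceTotals class and the off-by-one bug are invented for illustration: the loop has been pulled out of a longer method into its own method, so a test can pin down the bug before and after the fix.

```java
class InvoiceTotals {
    // Extracted from a longer method so it can be tested in isolation.
    // The original loop condition was "i < pricesInCents.length - 1",
    // which silently skipped the last item; that is the bug being fixed.
    static int totalCents(int[] pricesInCents) {
        int total = 0;
        for (int i = 0; i < pricesInCents.length; i++) {
            total += pricesInCents[i];
        }
        return total;
    }

    public static void main(String[] args) {
        // The test case that failed against the buggy loop and passes after the fix.
        int[] prices = {100, 250, 50};
        if (totalCents(prices) != 400) {
            throw new AssertionError("expected 400 but got " + totalCents(prices));
        }
        System.out.println("total is correct");
    }
}
```

The rest of the original method stays untouched, which keeps the change as small as possible.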
Yeah, I'm a contractor. Contracting gives you a different point of view. I recommend it.
There is one thing you should be aware of. The code you are starting with has been TESTED and approved, and your changes automatically mean that retesting must happen, as you may have inadvertently broken some behaviour elsewhere.
Besides, everybody makes errors. Every non-trivial change you make (changing StringTokenizer to split is not an automatic feature in e.g. Eclipse, so you write it yourself) is an opportunity for errors to creep in. Do you get the exact behaviour right of a conditional, or did you by mere mistake forget a !?
Hence, your changes imply retesting. That work may be quite substantial and far outweigh the small changes you have made.
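The StringTokenizer example is a good one, because the two are not drop-in replacements. A short sketch (class and method names invented) shows one difference: StringTokenizer skips empty tokens between adjacent delimiters, while String#split keeps them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerVsSplit {
    // Old style: StringTokenizer treats consecutive delimiters as one separator.
    static List<String> withTokenizer(String line) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line, ",");
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    // "Cosmetic" replacement: split keeps the empty field between the two commas.
    static List<String> withSplit(String line) {
        return Arrays.asList(line.split(","));
    }

    public static void main(String[] args) {
        String line = "a,,b";
        System.out.println(withTokenizer(line).size()); // 2 tokens: a, b
        System.out.println(withSplit(line).size());     // 3 tokens: a, (empty), b
    }
}
```

Any caller that indexes into the result by position behaves differently after such a "cosmetic" change, which is exactly the kind of thing retesting has to catch.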
I don't normally bother going through old code looking for problems. However, if I'm reading it, as you appear to be doing, and it makes my brain glitch, I fix it.
Common low-hanging fruits for me tend to be more about renaming classes, methods, fields etc., and writing examples of behaviour (a.k.a. unit tests) when I can't be sure of what a class is doing by inspection - generally making the code more readable as I read it. None of these are what I'd call "invasive" but they're more than just cosmetic.
From experience it depends on two things: time and risk.
If you have plenty of time then you can do a lot more, if not then the scope of whatever changes you make is reduced accordingly. As much as I hate doing it I have had to create some horrible shameful hacks because I simply didn't have enough time to do it right...
If the code you are working on has lots of dependencies or is critical to the application then make as few changes as possible - you never know what your fix might break... :)
It sounds like you have a solid idea of what things should look like so I am not going to say what specific changes to make in what order 'cause that will vary from person to person. Just make small localized changes first, test, expand the scope of your changes, test. Expand. Test. Expand. Test. Until you either run out of time or there is no more room for improvement!
BTW When testing you are likely to see where things break most often - create test cases for them (JUnit or whatever).
EXCEPTION:
Two things that I always find myself doing are reformatting (Ctrl+Shift+F in Eclipse) and commenting code that is not obvious. After that I just hammer the most obvious nail first...
I help maintain and build on a fairly large Swing GUI, with a lot of complex interaction. Often I find myself fixing bugs that are the result of things getting into odd states due to some race condition somewhere else in the code.
As the code base gets large, I've found it's gotten less consistent about specifying via documentation which methods have threading restrictions: most commonly, methods that must be run on the Swing EDT. Similarly, it would be useful to know and provide static awareness into which (of our custom) listeners are notified on the EDT by specification.
So it came to me that this should be something that could be easily enforced using annotations. Lo and behold, there exists at least one static analysis tool, CheckThread, that uses annotations to accomplish this. It seems to allow you to declare a method to be confined to a specific thread (most commonly the EDT), and will flag methods that try to call that method without also declaring themselves as confined to that thread.
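To make the idea concrete, here is a rough sketch of annotation-based thread confinement. It is not CheckThread's API: the @ConfinedTo annotation, StatusPanel and ConfinementChecker are hypothetical, and the check is done at runtime by thread name rather than statically. For the real EDT one would presumably test SwingUtilities.isEventDispatchThread() instead of comparing names.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Hypothetical annotation declaring which thread a method must run on.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface ConfinedTo {
    String threadName();
}

// Hypothetical UI component with one thread-confined method.
class StatusPanel {
    @ConfinedTo(threadName = "ui-thread")
    public void refresh() {
        // would touch UI state here
    }
}

class ConfinementChecker {
    // Returns true if the named method is unannotated, or if the current thread's
    // name matches the thread declared in its @ConfinedTo annotation.
    static boolean mayCall(Object target, String methodName) {
        for (Method method : target.getClass().getDeclaredMethods()) {
            if (method.getName().equals(methodName)) {
                ConfinedTo confinement = method.getAnnotation(ConfinedTo.class);
                return confinement == null
                        || confinement.threadName().equals(Thread.currentThread().getName());
            }
        }
        throw new IllegalArgumentException("no such method: " + methodName);
    }

    public static void main(String[] args) {
        // The main thread is normally named "main", so this reports a violation.
        System.out.println(mayCall(new StatusPanel(), "refresh"));
    }
}
```

A static tool would aim to report the same violation at build time by following the call graph instead of waiting for the call to happen.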
So on the surface this just seems like a low-pain, huge-gain addition to the source and build cycle. My questions are:
Are there any success stories for people using CheckThread or similar libraries to enforce threading constraints? Any stories of failure? Why did it succeed/fail?
Is this good in theory? Are there theoretical downsides?
Is this good in practice? Is it worth it? What kind of value has it delivered?
If it works in practice, what are good tools to support this? I've just found CheckThread but admit I'm not entirely sure what I'm searching for to find other tools that do the same thing.
I know whether it's right for us depends on our scenario. But I've never heard of people using something like this in practice, and to be honest it doesn't seem to have taken hold much from some general browsing. So I'm wondering why.
This answer is more focused on the theory aspect of your question.
Fundamentally you are making an assertion: "This method runs only under certain threads". This assertion isn't really different from any other assertion you might make ("The method accepts only integers less than 17 for parameter X"). The issues are:
Where do such assertions come from?
Can static analyzers check them?
Where do you get such a static analyzer?
Mostly such assertions have to come from the software designers, as they are the only people who know the intentions. The traditional term for this is "Design by Contract", although most DBC schemes are only over the current program state (C's assert macro) and they should really be over the program's past and future states ("temporal assertions"), e.g., "This routine will allocate a block of storage, and eventually some piece of code will deallocate it". One can build tools that try to determine heuristically what the assertions are (e.g., Engler's assertion induction work; others have done work in this area). That's useful, but the false positives are an issue.
As a practical matter, asking the designers to code such assertions doesn't seem particularly onerous, and is really good long-term documentation. Whether you code such assertions with a specific "Contract" language construct, or with an if statement ("if Debug && Not(assertion) Then Fail();") or hide them in an annotation is really just a matter of convenience. It's nice when the language allows you to code such assertions directly.
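For the "integers less than 17" example, the if-statement flavour of such an assertion could be sketched like this. The Contracts class and the scale method are invented; the point is only that the contract is an ordinary runtime test on the current state.

```java
class Contracts {
    // Contract: the method accepts only integers less than 17 for parameter x.
    static int scale(int x) {
        if (x >= 17) {
            throw new IllegalArgumentException("x must be less than 17, got " + x);
        }
        return x * 2;
    }

    public static void main(String[] args) {
        System.out.println(scale(3));
        try {
            scale(17);
        } catch (IllegalArgumentException e) {
            System.out.println("contract violated: " + e.getMessage());
        }
    }
}
```

The check fires only if a violating call is actually executed, which is the limitation of dynamic checks discussed further down.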
Checking such assertions statically is difficult. If you stick with current-state only, the static analyzer pretty much has to do full data flow analysis of your entire application, because the information needed to satisfy the assertion likely comes from data created by another part of the application. (In your case, the "inside EDT" signal has to come from analyzing the whole call graph of the application to see if there is any call path that leads to the method from a thread which is NOT the EDT thread.) If you use temporal properties, the static check pretty much needs some kind of state-space verification logic in addition; these are presently still pretty much research tools. Even with all this machinery, static analyzers generally have to be "conservative" in their analyses; if they can't demonstrate that something is false, they pretty much have to assume it is true, because of the halting problem.
Where do you get such analyzers? Given all the machinery needed, they're hard to build and so you should expect them to be rare. If somebody has built one, great. If not... as a general rule, you don't want to do this yourself from scratch. The best long-term hope is to have generic program analysis machinery available on which to build such analyzers, to amortize the cost of building all the infrastructure. (I build program analyzer tool foundations; see our DMS Software Reengineering Toolkit).
One way to make it "easier" to build such static analyzers is to restrict the cases they handle to narrow scope, e.g., CheckThread. I'd expect CheckThread to do exactly what it presently does, and it would be unlikely to get a lot stronger.
The reason that "assert" macros and other such dynamic "current state" checks are popular is that they can actually be implemented by a simple runtime test. That's pretty practical. The problem here is that you may never exercise a path that leads to a failed condition. So, for dynamic analysis, absence of detected failure is not really evidence of correctness. Still feels good.
Bottom line: static analyzers and dynamic analyzers each have their strength.
We haven't tried any static analysis tools, but we've used AspectJ to write a simple aspect that detects at runtime when any code in java.awt or javax.swing is invoked outside the EDT. It has found several places in our code that were missing a SwingUtilities.invokeLater(). We run with this aspect enabled throughout our QA cycle, then turn it off shortly before release.
As requested, this doesn’t pertain specifically to Java or the EDT, but I’ve seen good results with Coverity’s concurrency static analysis checkers for C/C++. They did have a higher false positive rate than less complicated checkers, but the code owners seemed willing to put up with that, given how hard threading bugs can be to find via testing. The details are confidential, I’m afraid, but Dawson Engler’s public papers (e.g., “Bugs as Deviant Behavior”) are very good on the general approach of “The following «N» instances of your code do «X» before doing «Y»; this instance doesn’t.”
I am currently working with some legacy source code files. They have quite a few problems because they were written by a database expert who does not know much about Java. For instance,
Fields in classes are public. No getters and setters.
Use raw types, not parameterized types.
Use static unnecessarily.
Super long method names.
Methods need too many parameters.
Repeat Yourself frequently.
I want to modify them so that they are more object-oriented. What are some best practices and effective/efficient approaches?
Read "Working Effectively with Legacy Code" by Michael Feathers. Great book - and obviously it'll be a lot more detailed than answers here. It's got lots of techniques for handling things sensibly.
It looks like you've identified a number of issues, which is a large part of the problem. Many of those sound like they can be fixed relatively easily - it's overall design and architecture which is harder to do, of course.
Are there already unit tests, or will you be adding those too?
Before you start, create a system-level regression test suite for the application. You need this so that you can verify that your changes don't break things.
To do the refactoring, you want to use a combination of a good IDE and a text search tool (e.g. grep). Use the text search tool to find occurrences of the "syndromes" that you want to fix, then use the IDE (and its built-in refactoring capabilities) to fix the instances ... one at a time.
For example, Eclipse allows you to rename a method or class, or generate getters and setters. So you'd cure a 'public' attribute by:
Change the attribute to private.
Generate the getter and setter methods.
Save the file.
Go through all of the Java compilation errors resulting from the fact that the attribute is now private, and change to use the getter or setter as appropriate.
This approach will give you the low-hanging fruit. More fundamental design issues are more difficult, and may be impossible to fix without fundamental restructuring of the application. The refactoring capabilities will help you execute such changes, but deciding what to do is ultimately up to you.
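The end result of the "cure a public attribute" steps might look like the sketch below. The Employee class is invented for illustration; the comments show what the field and a typical caller looked like before the change.

```java
class Employee {
    // Before: public String name;
    private String name;

    // Generated by the IDE.
    public String getName() {
        return name;
    }

    // Generated by the IDE.
    public void setName(String name) {
        this.name = name;
    }
}

class EmployeeClient {
    static String greet(Employee employee) {
        // Before: return "Hello, " + employee.name;
        // This line was one of the compilation errors fixed by switching to the getter.
        return "Hello, " + employee.getName();
    }

    public static void main(String[] args) {
        Employee employee = new Employee();
        // Before: employee.name = "Ann";
        employee.setName("Ann");
        System.out.println(greet(employee));
    }
}
```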
Finally, my advice is to not be too ambitious. Go for incremental improvement, and be prepared to draw the line when the code is "good enough". You won't achieve perfection ... not even if you start from a clean slate ... so don't set your expectations high.
Is it just the code that is bad, or does it also hurt the user experience? Refactoring continuously is a good idea, but it should not be a goal unto itself. It should improve the application in terms of user interaction, maintainability, stability, performance, etc.
That is why I am not extremely fond of huge refactoring just to improve the code quality. Instead, refactor the code that you work with.
While working with a legacy system for several years, I have personally found that:
Create for yourself a vision of how you want the code after you're done. It should be attainable, contain a list of technology changes, general architecture changes. It may also be a good idea to make a rough priority of what classes are most critical to change. We lacked such a vision a few years ago, and while we refactored a lot, the code quality barely improved.
Now, you should restrict your refactoring to those that make you reach your vision. Don't fall into the trap of doing what appears good at the moment.
Focus on a particular component, and make it better. Then move on to the next. It's tempting to make huge changes that affect the entire system, but in truth you will introduce more problems than you solve.
Write integration regression tests. I.e., a few big tests that test a lot of functionality. It's not optimal, but it's the best you can do. Writing unit tests for every single class in your old system may end up a waste of time because it's not designed to be tested anyway and you want to redesign half of the classes.
Accept that it will take time.
Eclipse should be able to take care of #1 and help you work your way through many of the others.
As for converting poor OO code to good OO code it is amazingly difficult. Often it seems easier to rewrite it from scratch.
I tend to go from the bottom up. As I'm working on some small section I'll recognize a bunch of data that belongs together as a group and I'll make a good object that replaces that code without changing anything else--Very Small Changes with constant tests between each change.
This makes for a mediocre design at best, but I honestly don't know if you can go from not OO to good OO on a large project without dissecting the original program, understanding it and using it as a template for the rewrite and few projects allow this (even though it might be faster, you'll rarely if ever be able to convince management of that fact)
The point is risk I think.
The ugly code is just ugly, but it may well work: it has been tested and bugfixed. If running code is changed, risk follows, so testing is critical.
You could refactor related code when you have to fix a bug, as a conservative approach.
Maybe the first challenge is to persuade your manager:)
What's the problem with it not having getters and setters? I'd suggest refactoring those only when you need to add non-trivial getters or setters (e.g., with validation).
The rest sounds like you need to identify groups of values and create new types holding them, so instead of passing a String name, String address, int yearOfBirth, String[] accountNames, int[] balances you would pass a Customer around, which would in turn have an Account[].
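A sketch of that grouping, using the parameter names from the sentence above. The describe method and the totalBalance helper are invented to show a caller before and after; nothing here is taken from the asker's code.

```java
// Groups the values that used to travel as parallel arrays.
final class Account {
    final String name;
    final int balance;

    Account(String name, int balance) {
        this.name = name;
        this.balance = balance;
    }
}

// Groups the values that used to be passed as separate parameters.
final class Customer {
    final String name;
    final String address;
    final int yearOfBirth;
    final Account[] accounts;

    Customer(String name, String address, int yearOfBirth, Account[] accounts) {
        this.name = name;
        this.address = address;
        this.yearOfBirth = yearOfBirth;
        this.accounts = accounts;
    }

    int totalBalance() {
        int total = 0;
        for (Account account : accounts) {
            total += account.balance;
        }
        return total;
    }
}

class Statements {
    // Before: describe(String name, String address, int yearOfBirth,
    //                  String[] accountNames, int[] balances)
    static String describe(Customer customer) {
        return customer.name + " has " + customer.accounts.length
                + " account(s), total balance " + customer.totalBalance();
    }

    public static void main(String[] args) {
        Customer ann = new Customer("Ann", "1 Main St", 1980,
                new Account[] {new Account("savings", 100), new Account("checking", 50)});
        System.out.println(describe(ann));
    }
}
```

Besides shortening parameter lists, this removes the risk of the two parallel arrays getting out of step.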
IDEA Ultimate Edition has a code duplication detector that's very good (it's only missing a 'suggested solution' button!), and there are CPD etc.
I'd suggest that in a large legacy codebase you might waste time refactoring code only to find out it wasn't used anyway. I outlined some steps for removing unused code: http://rickyclarkson.blogspot.com/2009/12/deleting-code-what-first.html
How many of those "issues" are real problems and not just matters of style? Of this list, the only 'real' issue I can see is "Repeat Yourself frequently", and that's more of an ongoing maintenance problem that should be resolved during normal code maintenance when someone's going to be changing the code anyway.
I want to modify them so that they are more object-oriented.
Object-orientation should not be your only goal when refactoring. The question you should ask yourself is what the expected ROI is (better quality? easier enhancements? better sharing of this code across a team?). ROI is not just a word: you should be prepared to measure the return on investment with numbers (even for quality enhancements, for example). You should take the lifetime of your products into account when estimating the ROI.
You should also ask yourself how much code depends on the code you want to refactor. Refactoring a library could be easy but could lead to a lot of changes in the source code that depends on this library, which is far more work than just refactoring the library.
Before touching any code, you should estimate the total work needed to finish refactoring the code and the code that depends on it. Estimate each option: a total rewrite of the code, a partial rewrite, or just an internal rewrite without touching APIs.
With the costs and returns, you could decide if it's worth the effort to refactor your code.