I work on a large legacy Java 8 (Android) application. We recently found a bug that was caused by an ignored method result: a caller of a send() method didn't take the right actions when the sending failed. It's been fixed, but now I want to add some static analysis to help find any other existing bugs of the same nature in our code, and additionally to prevent new bugs of the same nature from being added in the future.
We already use FindBugs, PMD, Checkstyle, Lint, and SonarQube, so I figured that one of these probably already has the check I'm looking for and it just needs to be enabled. But after a few hours of searching and testing, I don't think that's the case.
For reference, this is the code I was testing with:
public class Application {
    public static void main(String[] args) {
        foo(); // I want this to be caught
        Bar aBar = new Bar();
        aBar.baz(); // I want this to be caught
    }

    static boolean foo() {
        return System.currentTimeMillis() % 2 == 0;
    }
}
public class Bar {
    boolean baz() {
        return System.currentTimeMillis() % 2 == 0;
    }
}
I want to catch this on the caller side since some callers may use the value while others do not (the send() method described above was such a case).
I found the following existing static analysis rules, but they only seem to apply in very specific circumstances (to avoid false positives) and don't work on my example:
Return values from functions without side effects should not be ignored (only for immutable classes in the Java API)
Method ignores exceptional return value (only for known methods like File.delete())
Method ignores return value (only for methods annotated with javax.annotation.CheckReturnValue I think...)
Method ignores return value, is this OK? (only when the return value is the same type as the type the method is invoked on)
Return value of method without side effect is ignored (only when the method does not produce any effect other than return value)
So far the best option seems to be #3, but it requires me to annotate EVERY method or class in my HUGE project. Java 9+ seems to allow annotating at the package level, but that's not an option for me. Even if it were, the project has A LOT of packages. I would really like a way to configure this for my whole project from one or a few locations, instead of needing to modify every file.
Lastly, I came across a Stack Overflow answer that showed me that IntelliJ has this check with a "Report all ignored non-library calls" option. Enabling it works as far as highlighting in the IDE, but I want this to cause a CI failure. I found there's a way to trigger this via the command line using IntelliJ's tools, but it still outputs an XML/JSON file and I'd need to write custom code to parse that output. I'd also need to install IDE tools onto the CI machine, which seems like overkill.
Does anyone know of a better way to achieve what I want? I can't be the first person to only care about false negatives and not about false positives. It should be manageable to require every currently unused return value either to be logged, or to have the intentional ignoring stated explicitly via an annotation or an assign-to-an-unused-variable convention like they do in Error Prone.
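For illustration, here is a rough sketch of the kind of convention I have in mind, assuming Error Prone's com.google.errorprone.annotations dependency were available; Mailer, send(), and Caller are made-up names:

import com.google.errorprone.annotations.CheckReturnValue;

// Annotating the class applies the check to every method in it,
// so callers must either use the result or acknowledge ignoring it.
@CheckReturnValue
class Mailer {
    boolean send(String message) {
        return message != null && !message.isEmpty();
    }
}

class Caller {
    void run(Mailer mailer) {
        // mailer.send("hello");               // would be flagged: result ignored
        boolean unused = mailer.send("hello"); // the explicit "ignored on purpose" convention mentioned above
    }
}

Something analogous at the whole-project level, without touching every file, is what I'm after.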
Scenarios like the one you describe invariably give rise to a substantial software defect (a true bug in every respect), made all the more frustrating and knotty because the code fails silently, which allowed the problem to remain hidden. Your desire to identify and correct any similar hidden defects is easy to understand; however, I humbly suggest that static code analysis may not be the best strategy:
Working from the concerns you express in your question: a CheckReturnValue rule runs a high risk of producing a cascade of // ignore comments, rule-suppression clauses, and/or suppression annotations that far outnumber the rule's true defect detections.
The Java programming language further increases the likelihood of a high suppression count once you consider how garbage collection affects everyday coding habits. Because objects that are no longer reachable are automatically reclaimed, it makes perfect sense for Java developers to avoid keeping unnecessary references, and to naturally adopt the practice of ignoring unimportant method call return values. The ignored instances simply fall off the local call stack, become unreachable, and quickly undergo garbage collection.
Shifting now from a negative perspective to a positive one, I offer alternatives for your consideration that I believe will improve your results, as well as your probability of reaching a successful outcome.
Based on your description of the scenario and the resulting defect, it feels like the proximate root cause of the problem is a unit-testing or integration-testing failure. For the implementation of a send operation that may (and almost certainly will at some point) fail, both unit testing and integration testing absolutely should have incorporated multiple possible failure scenarios and verified the handling of each. I obviously don't know, but I'm willing to bet that if you focus on creating and running unit tests and integration tests, the quality of the system will improve at every step, the improvements will be clearly evident, and you may very well uncover some or all of the hidden bugs that are the cause of your current concern and worry.
Consider keeping the gist of your current static-analysis research alive, but shift your approach in a new direction. The first time I read your question, I was struck by the realization that the checks you would like to perform exist in multiple unrelated locations across the code base, the specific details of the checks differ in many sections of code, and each special case makes the overall effort unrealistic. Basically, what you would like to implement is a cross-cutting goal that spans a sizable portion of the code base, and the implementation details have made a fairly simple good idea ridiculously complex. Your question is almost a textbook example of a problem that is best addressed with a cross-cutting, aspect-oriented approach.
If you have the time and interest, please take a look at the AspectJ framework, maybe code a few exploratory aspects, and let me know what you think. I'd like to hear your thoughts, if you feel like having a geeky dev conversation at some point. I really hope this is helpful.
You may use IntelliJ IDEA's inspection: Java | Probable bugs | Result of method call ignored, with the "Report all ignored non-library calls" option enabled. It catches both cases provided in your code sample.
In Java streams is peek really only for debugging?
So I have a list of objects, some or all of which I want to be processed, and I want to log the objects that were processed.
Consider a fictional example:
List<ClassInSchool> classes;
classes.stream()
       .filter(verifyClassInSixthGrade())
       .filter(classHasNoClassRoom())
       .peek(classInSchool -> log.debug("Processing classroom {} in sixth grade without classroom.", classInSchool))
       .forEach(findMatchingClassRoomIfAvailable());
Would using .peek() in this instance be regarded as unintended use of the API?
To further explain: in this question, the key takeaway is "Don't use the API in an unintended way, even if it accomplishes your immediate goal." My question is whether every use of peek, short of debugging your stream until you have verified the whole chain works as designed and then removing the .peek() again, is unintended use; specifically, whether using it as a means to log every object actually processed by the stream is considered unintended use.
The documentation of peek describes the intent as
This method exists mainly to support debugging, where you want to see the elements as they flow past a certain point in a pipeline.
An expression of the form .peek(classInSchool -> log.debug("Processing classroom {} in sixth grade without classroom.", classInSchool)) fulfills this intent, as it is about reporting the processing of an element. It doesn't matter whether you use a logging framework or just print statements, as in the documentation's example, .peek(e -> System.out.println("Filtered value: " + e)). In either case, the intent matters, not the technical approach. If someone used peek with the intent to print all elements, it would be wrong, even if it used the same technical approach as the documentation's example (System.out.println).
The documentation doesn’t mandate that you have to distinguish between production environment or debugging environment, to remove the peek usage for the former. Actually, your use would even fulfill that, as the logging framework allows you to mute that action via the configurable logging level.
I would still suggest keeping in mind that for some pipelines, inserting a peek operation may add more overhead than the actual operation (or hinder the JVM's loop optimizations to such a degree). But if you do not experience performance problems, you may follow the old advice not to try to optimize unless you have a real reason.
peek should be avoided because, for certain terminal operations, it may not be called; see this answer. In that example it would probably be better to do the logging inside the action of forEach rather than using peek, as shown below. Debugging in this situation means temporary code used for fixing a bug or diagnosing an issue.
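For instance, a minimal sketch of that alternative, reusing the question's fictional names and assuming findMatchingClassRoomIfAvailable() returns a Consumer<ClassInSchool> (as its use in forEach implies):

classes.stream()
       .filter(verifyClassInSixthGrade())
       .filter(classHasNoClassRoom())
       .forEach(classInSchool -> {
           // The log statement now runs only for elements the terminal
           // operation actually consumes.
           log.debug("Processing classroom {} in sixth grade without classroom.", classInSchool);
           findMatchingClassRoomIfAvailable().accept(classInSchool);
       });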
In Java streams, using .peek() is regarded as being for debugging purposes only; would logging be considered debugging?
It depends on whether your logging code is going to be a permanent fixture of your code, or not.
Only you can really know the real purpose of your logging ...
Also note that the javadoc says:
In cases where the stream implementation is able to optimize away the production of some or all the elements (such as with short-circuiting operations like findFirst, or in the example described in count()), the action will not be invoked for those elements.
So, you are liable to find that in some circumstances peek won't be a reliable way to log (or debug) your pipeline.
In general, adding peek is liable to change the behavior of the pipeline and / or the JVM's ability to optimize it ... in a current or future generation JVM.
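To make that concrete, here is a small self-contained sketch (my own example, not from the javadoc) showing a short-circuiting terminal operation skipping peek for most elements:

import java.util.stream.Stream;

public class PeekShortCircuit {
    public static void main(String[] args) {
        // findFirst() is short-circuiting, so peek only runs for elements
        // that are actually pulled through the pipeline.
        Stream.of("a", "b", "c")
              .peek(e -> System.out.println("peek: " + e))
              .findFirst();
        // Prints only "peek: a"; "b" and "c" are never processed.
    }
}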
Eh, it's somewhat open to interpretation. Intent is something that's not always easy to determine.
I think the API note was mostly added to discourage an overzealous usage of peek when almost all desirable behaviour can be accomplished without it. It was just too useful for the developers to exclude it completely but they wanted to be clear that its inclusion was not to be taken as an unqualified endorsement; they saw the potential for misuse and they tried to address it.
I suspect - though I'm only speculating - that there were mixed opinions on whether to include it at all, and that including a version with a caveat in the JavaDoc was the compromise.
With that in mind, I think my suggestion for deciding when to use peek would simply be: don't use it unless you have a very good reason to.
In your case, you definitely don't have a good reason to. You're iterating over everything and passing the result to the method findMatchingClassRoomIfAvailable (well, presumably; your example wasn't very good). If you want to log something for each item in the stream, then just log it at the top of that method.
Is it misuse? I don't think so. Would I write it this way? No.
I have read the Stack Overflow page which discusses "Why use getters and setters?", and I have been convinced by some of the reasons for using a setter (for example: later validation, data encapsulation, etc.). But what is the reason for using getters? I don't see any harm in getting the value of a private field, or any reason to validate before you get a field's value. Is it OK to never use a getter and always get a field's value using dot notation?
If a given field in a Java class is visible for reading (on the RHS of an expression), then, unless it is declared final, it is also possible to assign that field (on the LHS of an expression). For example:
class A {
    int someValue;
}

A a = new A();
int value = a.someValue; // if you can do this (potentially harmless)
a.someValue = 10;        // then you can also do this (bad)
Besides the above problem, a major reason for having a getter in a class is to shield the consumer of that class from implementation details. A getter does not necessarily have to simply return a value. It could return a value distilled from a Collection or something else entirely. By using a getter (and a setter), we free the consumer of the class from having to worry about the implementation changing over time.
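For instance, a minimal sketch of a getter that is not a plain field read (the Order/price names are purely illustrative):

import java.util.ArrayList;
import java.util.List;

class Order {
    private final List<Integer> itemPrices = new ArrayList<>();

    void addItem(int price) {
        itemPrices.add(price);
    }

    // The consumer just asks for a total; whether this is a stored field
    // or a value computed from a collection is an implementation detail
    // that can change without affecting callers.
    int getTotal() {
        return itemPrices.stream().mapToInt(Integer::intValue).sum();
    }
}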
I want to focus on practicalities, since I think you're at a point where you haven't seen the conceptual benefits line up just yet with the actual practice.
The obvious conceptual benefit is that setters and getters can be changed without impacting the outside world using those functions. Another Java-specific benefit is that all methods not marked as final are capable of being overridden, so you get the ability for subclasses to override the behavior as a bonus.
Overkill?
Yet you're probably at a point where you've heard these conceptual benefits before and it still sounds like overkill for your more daily scenarios. A difficult part of understanding software engineering practices is that they are generally designed to deal with very real world, large-scale codebases being managed by teams of developers. A lot of things are going to seem like overkill initially when you're just working on a small project of your own.
So let's get into some practical, real-world scenarios. I formerly worked in a very large-scale codebase. It was a low-level C codebase with a long legacy, sometimes barely a step above assembly, but many of the lessons I learned there translate to all kinds of languages.
Real-World Grief
In this codebase, we had a lot of bugs, and the majority of them related to state management and side effects. For example, we had cases where two fields of a structure were supposed to stay in sync with each other. The range of valid values for one field depended on the value of the other. Yet we ran into bugs where those two fields were out of sync. Unfortunately since they were just public variables with a very global scope ('global' should really be considered a degree with respect to the amount of code that can access a variable rather than an absolute), there were potentially tens of thousands of lines of code that could be the culprit.
As a simpler example, we had cases where the value of a field was never supposed to be negative, yet in our debugging sessions, we found negative values. Let's call this value that's never supposed to be negative, x. When we discovered the bugs resulting from x being negative, it was long after x was touched by anything. So we spent hours placing memory breakpoints and trying to find needles in a haystack by looking at all possible places that modified x in some way. Eventually we found and fixed the bug, but it was a bug that should have been discovered years earlier and should have been much less painful to fix.
Such would have been the case if large portions of the codebase weren't just directly accessing x and used functions like set_x instead. If that were the case, we could have done something as simple as this:
void set_x(int new_value)
{
assert(new_value >= 0);
x = new_value;
}
... and we would have discovered the culprit immediately and fixed it in a matter of minutes. Instead, we discovered it years after the bug was introduced and it took us meticulous hours of headaches to trace it down and fix.
Such is the price we can pay for ignoring engineering wisdom, and after dealing with the 10,000th issue which could have been avoided with a practice as simple as depending on functions rather than raw data throughout a codebase, if your hairs haven't all turned grey at that point, you're still generally not going to have a cheerful disposition.
The biggest value of getters and setters comes from the setters. It's the state manipulation that you generally want to control the most to prevent/detect bugs. The getter becomes a necessity simply as a result of requiring a setter to modify the data. Yet getters can also be useful when you want to exchange a raw piece of state for a computation non-intrusively, by just changing one function's implementation.
Interface Stability
One of the most difficult things to appreciate earlier in your career is going to be interface stability (to prevent public interfaces from changing constantly). This is something that can only be appreciated with projects of scale and possibly compatibility issues with third parties.
When you're working on a small project on your own, you might be able to change the public definition of a class to your heart's content and rewrite all the code using it to update it with your changes. It won't seem like a big deal to constantly rewrite the code this way, as the amount of code using an interface might be quite small (ex: a few hundred lines of code using your class, and all code that you personally wrote).
When you work on a large-scale project and look down at millions of lines of code, changing the public definition of a widely-used class might mean that 100,000 lines of code need to be rewritten using that class in response. And a lot of that code won't even be your own code, so you have to intrusively analyze and fix other people's code and possibly collaborate with them closely to coordinate these changes. Some of these people may not even be on your team: they may be third parties writing plugins for your software or former developers who have moved on to other projects.
You really don't want to run into this scenario repeatedly, so designing public interfaces well enough to keep them stable (unchanging) becomes a key skill for your most central interfaces. If those interfaces are leaking implementation details like raw data, then the temptation to change them over and over is going to be a scenario you can face all the time.
So you generally want to design interfaces to focus on "what" they should do, not "how" they should do it, since the "how" might change a lot more often than the "what". For example, perhaps a function should append a new element to a list. However, you may want to swap out the list data structure it's using for another, or introduce a lock to make that function thread safe ("how" concerns). If these "how" concerns are not leaked to the public interface, then you can change the implementation of that class (how it's doing things) locally without affecting any of the existing code that is requesting it to do things.
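As an illustrative sketch (my own made-up example): the public method below promises only the "what"; the backing structure and the lock are "how" details that can be swapped without touching callers:

import java.util.ArrayDeque;
import java.util.Deque;

class TaskQueue {
    // "How": the data structure and the locking strategy may change at any time.
    private final Deque<Runnable> tasks = new ArrayDeque<>();
    private final Object lock = new Object();

    // "What": append a task to the queue; callers depend only on this behavior.
    void append(Runnable task) {
        synchronized (lock) {
            tasks.addLast(task);
        }
    }
}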
You also don't want classes to do too much and become monolithic, since then your class variables will become "more global" (become visible to a lot more code even within the class's implementation) and it'll also be hard to settle on a stable design when it's already doing so much (the more classes do, the more they'll want to do).
Getters and setters aren't the best examples of such interface design, but they do avoid exposing those "how" details at least slightly better than a publicly exposed variable, and thus have fewer reasons to change (break).
Practical Avoidance of Getters/Setters
Is it OK to never use a getter and always get a field's value using dot notation?
This could sometimes be okay. For example, if you are implementing a tree structure and it utilizes a node class as a private implementation detail that clients never use directly, then trying too hard to focus on the engineering of this node class is probably going to start becoming counter-productive.
There your node class isn't a public interface. It's a private implementation detail for your tree. You can guarantee that it won't be used by anything more than the tree implementation, so there it might be overkill to apply these kinds of practices.
Where you don't want to ignore such practices is in the real public interface, the tree interface. You don't want to allow the tree to be misused and left in an invalid state, and you don't want an unstable interface which you're constantly tempted to change long after the tree is being widely used.
Another case where it might be okay is if you're just working on a scrap project/experiment as a kind of learning exercise, and you know for sure that the code you write is rather disposable and is never going to be used in any project of scale or grow into anything of scale.
Nevertheless, if you're very new to these concepts, I think it's a useful exercise even for your small scale projects to err on the side of using getters/setters. It's similar to how Mr. Miyagi got Daniel-San to paint the fence, wash the car, etc. Daniel-San finds it all pointless with his arms exhausted on top of that. Then Mr. Miyagi goes "hyah hyah hyoh hyah" throwing big punches and kicks, and using that indirect training, Daniel-San blocks all of them without realizing how he's even doing it.
In Java you can't tell the compiler to allow read-only access to a public field from outside the class.
So exposing public fields opens the door to uncontrolled modifications.
Fields are not polymorphic.
The alternative to a getter would be a public field; however, fields are not polymorphic.
This means that you cannot extend the class and "override" the field without introducing weird behaviour. Basically, the value you get will depend on how you refer to the field.
Furthermore, you can't include the field in an interface and you can't perform validation (that applies more to a setter).
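A short sketch (illustrative names) of what "the value you get will depend on how you refer to the field" means in practice:

class Animal {
    String name = "animal";
    String getName() { return name; }
}

class Dog extends Animal {
    String name = "dog";              // hides Animal.name rather than overriding it
    @Override
    String getName() { return name; } // methods, by contrast, do override
}

class FieldHidingDemo {
    public static void main(String[] args) {
        Animal a = new Dog();
        System.out.println(a.name);      // "animal" - field access is resolved statically
        System.out.println(a.getName()); // "dog"    - method call is polymorphic
    }
}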
I help maintain and build on a fairly large Swing GUI, with a lot of complex interaction. Often I find myself fixing bugs that are the result of things getting into odd states due to some race condition somewhere else in the code.
As the code base has grown, I've found it's gotten less consistent about specifying via documentation which methods have threading restrictions: most commonly, methods that must be run on the Swing EDT. Similarly, it would be useful to know, and to have static awareness of, which of our custom listeners are specified to be notified on the EDT.
So it came to me that this should be something that could be easily enforced using annotations. Lo and behold, there exists at least one static analysis tool, CheckThread, that uses annotations to accomplish this. It seems to allow you to declare a method to be confined to a specific thread (most commonly the EDT), and will flag methods that try to call that method without also declaring themselves as confined to that thread.
So on the surface this just seems like a low-pain, huge-gain addition to the source and build cycle. My questions are:
Are there any success stories for people using CheckThread or similar libraries to enforce threading constraints? Any stories of failure? Why did it succeed/fail?
Is this good in theory? Are there theoretical downsides?
Is this good in practice? Is it worth it? What kind of value has it delivered?
If it works in practice, what are good tools to support this? I've just found CheckThread but admit I'm not entirely sure what I'm searching for to find other tools that do the same thing.
I know whether it's right for us depends on our scenario. But I've never heard of people using something like this in practice, and to be honest it doesn't seem to have taken hold much from some general browsing. So I'm wondering why.
This answer is more focused on the theory aspect of your question.
Fundamentally you are making an assertion: "This method runs only under certain threads." This assertion isn't really different from any other assertion you might make ("This method accepts only integers less than 17 for parameter X"). The issues are:
Where do such assertions come from?
Can static analyzers check them?
Where do you get such a static analyzer?
Mostly such assertions have to come from the software designers, as they are the only people that know the intentions. The traditional term for this is "Design by Contract",
although most DbC schemes are only over the current program state (C's assert macro), and they should really be over the program's past and future states ("temporal assertions"), e.g., "This routine will allocate a block of storage, and eventually some piece of code will deallocate it." One can build tools that try to determine heuristically what the assertions are (e.g., Engler's assertion-induction work; others have done work in this area). That's useful, but the false positives are an issue. As a practical matter, asking the designers to code such assertions doesn't seem particularly onerous, and it is really good long-term documentation. Whether you code such assertions with a specific "contract" language construct, with an if statement ("if Debug && Not(assertion) Then Fail();"), or hidden in an annotation is really just a matter of convenience. It's nice when the language lets you code such assertions directly.
Checking such assertions statically is difficult. If you stick with current-state-only assertions, the static analyzer pretty much has to do full data-flow analysis of your entire application, because the information needed to satisfy the assertion likely comes from data created by another part of the application. (In your case, the "inside EDT" signal has to come from analyzing the whole call graph of the application to see if there is any call path that leads to the method from a thread which is NOT the EDT.) If you use temporal properties, the static check pretty much needs some kind of state-space verification logic in addition; these are presently still pretty much research tools. Even with all this machinery, static analyzers generally have to be "conservative" in their analyses; if they can't demonstrate that something is false, they pretty much have to assume it is true, because of the halting problem.
Where do you get such analyzers? Given all the machinery needed, they're hard to build and so you should expect them to be rare. If somebody has built one, great. If not... as a general rule, you don't want do this yourself from scratch. The best long-term hope is to have generic program analysis machinery available on which to build such analyzers, to amortize the cost of building all the infrastructure. (I build program analyzer tool foundations; see our DMS Software Reengineering Toolkit).
One way to make it "easier" to build such static analyzers is to restrict the cases they handle to narrow scope, e.g., CheckThread. I'd expect CheckThread to do exactly what it presently does, and it would be unlikely to get a lot stronger.
The reason that "assert" macros and other such dynamic "current state" checks are popular is that they can actually be implemented by a simple runtime test. That's pretty practical. The problem here is that you may never exercise a path that leads to a failed condition. So, for dynamic analysis, the absence of a detected failure is not really evidence of correctness. Still, it feels good.
Bottom line: static analyzers and dynamic analyzers each have their strength.
We haven't tried any static analysis tools, but we've used AspectJ to write a simple aspect that detects at runtime when any code in java.awt or javax.swing is invoked outside the EDT. It has found several places in our code that were missing a SwingUtilities.invokeLater(). We run with this aspect enabled throughout our QA cycle, then turn it off shortly before release.
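For reference, a minimal sketch of the kind of aspect described above (my reconstruction, assuming @AspectJ-style weaving of your own application code; the class name is made up):

import javax.swing.SwingUtilities;

import org.aspectj.lang.JoinPoint;
import org.aspectj.lang.annotation.Aspect;
import org.aspectj.lang.annotation.Before;

@Aspect
public class EdtViolationDetector {
    // Flag any call from woven application code into Swing/AWT methods
    // that is not made on the Event Dispatch Thread. The !within() clause
    // keeps the aspect from advising its own SwingUtilities call.
    @Before("(call(* javax.swing..*.*(..)) || call(* java.awt..*.*(..))) && !within(EdtViolationDetector)")
    public void checkEdt(JoinPoint joinPoint) {
        if (!SwingUtilities.isEventDispatchThread()) {
            System.err.println("Swing/AWT call off the EDT at " + joinPoint.getSourceLocation());
        }
    }
}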
As requested, this doesn't pertain specifically to Java or the EDT, but I've seen good results with Coverity's concurrency static analysis checkers for C/C++. They did have a higher false-positive rate than less complicated checkers, but the code owners seemed willing to put up with that, given how hard threading bugs can be to find via testing. The details are confidential, I'm afraid, but Dawson Engler's public papers (e.g., "Bugs as Deviant Behavior") are very good on the general approach of "the following «N» instances of your code do «X» before doing «Y»; this instance doesn't."
While poking around the questions, I recently discovered the assert keyword in Java. At first, I was excited. Something useful I didn't already know! A more efficient way for me to check the validity of input parameters! Yay learning!
But then I took a closer look, and my enthusiasm was not so much "tempered" as "snuffed-out completely" by one simple fact: you can turn assertions off.*
This sounds like a nightmare. If I'm asserting that I don't want the code to keep going if the input listOfStuff is null, why on earth would I want that assertion ignored? It sounds like if I'm debugging a piece of production code and suspect that listOfStuff may have been erroneously passed a null but don't see any logfile evidence of that assertion being triggered, I can't trust that listOfStuff actually got sent a valid value; I also have to account for the possibility that assertions may have been turned off entirely.
And this assumes that I'm the one debugging the code. Somebody unfamiliar with assertions might see that and assume (quite reasonably) that if the assertion message doesn't appear in the log, listOfStuff couldn't be the problem. If your first encounter with assert was in the wild, would it even occur to you that it could be turned-off entirely? It's not like there's a command-line option that lets you disable try/catch blocks, after all.
All of which brings me to my question (and this is a question, not an excuse for a rant! I promise!):
What am I missing?
Is there some nuance that renders Java's implementation of assert far more useful than I'm giving it credit for? Is the ability to enable/disable it from the command line actually incredibly valuable in some contexts? Am I misconceptualizing it somehow when I envision using it in production code in lieu of statements like if (listOfStuff == null) barf();?
I just feel like there's something important here that I'm not getting.
*Okay, technically speaking, they're actually off by default; you have to go out of your way to turn them on. But still, you can knock them out entirely.
Edit: Enlightenment requested, enlightenment received.
The notion that assert is first and foremost a debugging tool goes a long, long way towards making it make sense to me.
I still take issue with the notion that input checks for non-trivial private methods should be disabled in a production environment because the developer thinks the bad inputs are impossible. In my experience, mature production code is a mad, sprawling thing, developed over the course of years by people with varying degrees of skill targeted to rapidly changing requirements of varying degrees of sanity. And even if the bad input really is impossible, a piece of sloppy maintenance coding six months from now can change that. The link gustafc provided (thanks!) includes this as an example:
assert interval > 0 && interval <= 1000/MAX_REFRESH_RATE : interval;
Disabling such a simple check in production strikes me as foolishly optimistic. However, this is a difference in coding philosophy, not a broken feature.
In addition, I can definitely see the value of something like this:
assert reallyExpensiveSanityCheck(someObject) : someObject;
My thanks to everybody who took the time to help me understand this feature; it is very much appreciated.
assert is a useful piece of Design by Contract. In that context, assertions can be used in:
Precondition checks.
Postcondition checks.
Intermediate result checks.
Class invariant checks.
Assertions can be expensive to evaluate (take, for example, the class invariant, which must hold before and after calling any public method of your class). Assertions are typically wanted only in debug builds and for testing purposes; you assert things that can't happen - things whose occurrence is synonymous with having a bug. Assertions verify your code against its own semantics.
Assertions are not an input validation mechanism. When input could really be correct or wrong in the production environment, i.e. for input-output layers, use other methods, such as exceptions or good old conditional checks.
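A compact sketch of that distinction (illustrative code, not from the answer): an assertion guards an internal assumption, while input arriving from outside gets a real check:

import java.util.List;

class Scheduler {
    // Public boundary: validate input with an exception, not an assert.
    void schedule(List<String> jobs) {
        if (jobs == null) {
            throw new IllegalArgumentException("jobs must not be null");
        }
        runBatch(jobs);
    }

    // Internal helper: the non-null precondition is this class's own
    // responsibility, so an assert documents it and (when run with -ea)
    // verifies it.
    private void runBatch(List<String> jobs) {
        assert jobs != null : "schedule() must never pass null";
        // ... actual work ...
    }
}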
Java's assertions aren't really made for argument validation - it's specifically stated that assertions are not to be used instead of dear old IllegalArgumentException (and neither is that how they are used in C-ish languages). They are more there for internal validation, to let you make an assumption about the code which isn't obvious from looking at it.
As for turning them off, you do that in C(++) too, except that if someone has an assert-less build, they have no way to turn assertions back on. In Java, you just restart the app with the appropriate VM parameters (-ea / -enableassertions).
Every language I've ever seen with assertions comes with the capability of shutting them off. When you write an assertion you should be thinking "this is silly, there's no way in the universe this could ever be false"; if you think it could be false, it should be an error check. The assertion is just to help you during development if something goes horribly wrong; when you build the code for production, you disable assertions to save time and avoid (hopefully) superfluous checks.
Assertions are meant to ensure things you are sure that your code fulfills really are fulfilled. It's an aid in debugging, in the development phase of the product, and is usually omitted when the code is released.
What am I missing?
You're not using assertions the way they were meant to be used. You said "check the validity of input parameters" - that's precisely the sort of things you do not want to verify with assertions.
The idea is that if an assertion fails, you 100% have a bug in your code. Assertions are often used for identifying the bug earlier than it would have surfaced otherwise.
I think it's about the way assert usage is interpreted and envisioned.
If you really want to add the check to your actual production code, why not use an if or any other conditional statement directly?
Those being already present in the language, the idea of assert is for developers to add assertions only where they don't really expect the condition to ever happen.
For example, checking an object for null: say a developer wrote a private method and calls it from two places in the class where he knows he passes a non-null object (this is not an ideal example, but it works for private methods). Instead of adding an unnecessary if check, since as of today there is no way the object could be null, he adds an assertion.
But if someone tomorrow calls this method with a null argument, it can be caught in the developer's unit testing due to the presence of the assertion, and in the final code you still don't need an if check.
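A rough sketch of the scenario described above (the names are mine):

class Report { }

class ReportService {
    void printSummary(Report report) {
        format(report);  // today, both callers always pass a non-null report
    }

    void emailSummary(Report report) {
        format(report);
    }

    // No if check is needed in the final code; the assert catches a future
    // caller that passes null during development or unit testing with -ea.
    private void format(Report report) {
        assert report != null : "callers must not pass a null report";
        // ... formatting logic ...
    }
}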
Assertions are really a great and concise documentation tool for a code maintainer.
For example I can write:
foo should be non-null and greater than 0
or put this into the body of the program:
assert foo != null;
assert foo.value > 0;
They are extremely valuable for documenting private/package private methods to express original programmer invariants.
As an added bonus, when a subsystem starts to behave flakily, you can turn asserts on and get extra validation instantly.
This sounds about right. Assertions are just a tool that is useful for debugging code - they should not be turned on all the time, especially in production code.
For example, in C or C++, assertions are disabled in release builds.
If asserts could not be turned off, then why should they even exist?
If you want to perform a validity check on an input, you can easily write
if (foobar<=0) throw new BadFoobarException();
or pop up a message box or whatever is useful in context.
The whole point of asserts is that they are something that can be turned on for debugging and turned off for production.
Assertions aren't for the end user to see. They're for the programmer, so you can make sure the code is doing the right thing while it's being developed. Once the testing's done, assertions are usually turned off for performance reasons.
If you're anticipating that something bad is going to happen in production, like listOfStuff being null, then either your code isn't tested enough, or you're not sanitizing your input before you let your code have at it. Either way, an "if (bad stuff) { throw an exception }" would be better. Assertions are for test/development time, not for production.
Use an assert if you're willing to pay $1 to your end-user whenever the assertion fails.
An assertion failure should be an indication of a design error in the program.
An assertion states that I have engineered the program in such a way that I know and guarantee that the specified predicate always holds.
An assertion is useful to readers of my code, since they see that (1) I'm willing to stake some money on that property; and (2) in previous executions and test cases the property did indeed hold.
My bet assumes that the client of my code sticks to the rules, and adheres to the contract he and I agreed upon. This contract can be tolerant (all input values allowed and checked for validity) or demanding (client and I agreed that he'll never supply certain input values [described as preconditions], and that he doesn't want me to check for these values over and over again).
If the client sticks to the rules, and my assertions nevertheless fail, the client is entitled to some compensation.
Assertions are for indicating a problem in the code that may be recoverable, or for serving as an aid in debugging. You should use a more destructive mechanism, such as stopping the program, for more serious errors.
In debugging and testing scenarios, they can also be used to catch an unrecoverable error early, before the application fails later, to help you narrow down a problem. Part of the reason for this is so that integrity checking does not reduce the performance of well-tested code in production.
Also, in certain cases, such as a resource leak, the situation may not be desirable, but the consequences of stopping the program are worse than the consequences of continuing on.
This doesn't directly answer your question about assert, but I'd recommend checking out the Preconditions class in guava/google-collections. It allows you to write nice stuff like this (using static imports):
// import static com.google.common.base.Preconditions.checkNotNull;

// throw NPE if listOfStuff is null
this.listOfStuff = checkNotNull(listOfStuff);

// same, but the NPE will have "listOfStuff" as its message
this.listOfStuff = checkNotNull(listOfStuff, "listOfStuff");
It seems like something like this might be what you want (and it can't be turned off).
I am building a spring mvc web application.
I plan on using hibernate.
I don't have much experience with obfuscating etc.
What are the potential downsides to obfuscating an application?
I understand that there might be issues with debugging the app, and recovering lost source code is also an issue.
Are there any known issues with the actually running of the application? Can bugs be introduced?
Since this is an area I am looking for general guidance, please feel free to open up any issues that I should be aware of.
There are certainly some potential performance/maintenance issues, but a good obfuscator will let you get round at least some of them. Things to look out for:
an obvious one: if your code calls methods by reflection or dynamically loads classes, then this is liable to fail if the class/method names are obfuscated (see the sketch after this list); a good obfuscator will let you select class/method names not to obfuscate to get round this problem;
a similar issue can occur if not all of your application is compiled at the same time;
if it deals directly at the bytecode level, an obfuscator can create code that in principle a Java compiler cannot create (e.g. it can insert arbitrary GOTO instructions, whereas from Java these can only be created as part of a loop)-- this may be a bit theoretical, but if I were writing a JVM, I'd optimise performance for sequences of bytecodes that a Java compiler can create, not ones that it can't...
the obfuscator is liable to make other subtle changes to performance if it significantly alters the number of bytecodes in a method, or in some way changes whether a given method/piece of code hits thresholds for certain JVM optimisations (e.g. "inline methods with fewer than X bytecodes").
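As a concrete (made-up) illustration of the reflection point in the first item: the string survives obfuscation, but the class name it refers to does not:

public class ReflectionAfterObfuscation {
    public static void main(String[] args) throws Exception {
        // Works before obfuscation; after the class is renamed, it no longer
        // exists under this string and ClassNotFoundException is thrown at
        // runtime - the compiler cannot warn about it.
        Class<?> handlerClass = Class.forName("com.example.PaymentHandler");
        Object handler = handlerClass.getDeclaredConstructor().newInstance();
        System.out.println("Loaded " + handler.getClass().getName());
    }
}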
But as you can see, some of these effects are a little subtle and theoretical-- so to some extent what you need to do is soak-test your application after obfuscation, just as you would with any other major change.
You should also be careful not to assume that obfuscation hides your code/algorithm (if that is your intention) as much as you want it to-- use a decompiler to have a look at the contents of the resulting obfuscated classes.
Surprised no one has mentioned speed - in general, more obfuscated = slower-running code
Shortening identifiers and removing unused methods will decrease the file size, but have essentially zero impact on running speed (other than the few nanoseconds shaved off loading time). Meanwhile, most of the actual obfuscation of the program comes from added code:
Breaking 1 method into 5; interleaving methods; merging classes [aggregation transformations]
Splitting 1 arithmetic expression into 10; jumbling the control-flow [computation transformations]
And adding chunks of code that do nothing [opaque predicates]
are all common obfuscation techniques that cause a program to run slower.
You may want to look at some of the comments here, to decide if obfuscating makes sense:
https://stackoverflow.com/questions/1988451/net-obfuscation
You may want to express why you want to obfuscate. IMO the best reasons are mainly to have a smaller application, as you can get rid of classes that aren't being used in your project, while obfuscating.
I have never seen bugs introduced, as long as you aren't using reflection to look things up by name, since private methods, for example, will have their names changed.
The biggest problem centers around the fact that obfuscators generally make a guarantee not to change the behavior of their target program. In some cases this proves very hard to do: for example, imagine a program which checks the values of certain private fields via reflection, using names held in a string array. An obfuscator may not be able to tell that those strings also need to be updated correspondingly, and the result will be unexpected access errors that pop up at runtime.
Worse still, it may not be obvious that the behavior of a program has changed subtly -- then you may not know that there's a problem at all, until your customer finds it first and gets upset.
Generally, professional-grade obfuscation products are sophisticated enough to catch some kinds of problems and prevent them, but ultimately it can be challenging to cover all the bases. The best defense is to run unit tests against the obfuscated result and make sure that all your expected behavior continues to hold true.
One free obfuscator you might want to check out is Babel. It is designed to be used on the command line (like many other obfuscators), but there is a Reflector add-in that will provide a UI for you.
When it comes to obfuscation, you really need to analyze what your goal is. In your case: if you have a web application (MVC), are you planning on selling it as a canned, downloadable application? (If not, and you keep the source on your web servers, then you don't need it.)
You might look at the components and pick only certain parts to obfuscate ... not the whole thing. In general, ASP.NET apps break pretty easily when you try to add obfuscation after you've developed them, due to all the reflection used.
Pretty much everything mentioned above is true ... it all depends on how many features you turn on to make it hard to reverse your code:
Renaming of members (fields/methods/events/properties) is most common (comes in different flavors: simple renaming of methods from something like GetId() to a() all the way to unreadable characters and removal of namespaces). BTW: This is where reflection usually breaks. Your assembly file may end up being smaller due to smaller strings being used too.
String encryption: this makes it harder to recover the static strings used in your code. BTW: paired with renaming, this makes it difficult for you to debug your renaming problems ... so you might turn it on only after you have renaming working. This option also has to add code to decrypt each string right before it is used in the IL.
Code mangling ... this is what BlueRaja was referring to. It makes your code look like spaghetti code, to make it harder for someone to figure out. The CLR does not like this: it can't optimize things as easily, and your final code will most likely run more slowly due to the additional branching and to code not being inlined because of the IL rewriting used for this option. BTW: this option really does raise the bar on what it takes to reverse your source code, but may come with a performance hit.
Removal of unused code. Some obfuscators offer you the option to trim any code that they find not being used. This may make your assembly a little smaller if you have a lot of dead code hanging around ... but it is just a free benefit obfuscators throw in.
My advice is to only use it if you know why you are using it, and design with that end in mind ... don't try to add it after you've finished your code (I've done that, and it's not fun).