As part of our instrumentation tool suite, we have a static prepass that modifies some methods of a class and then marks those methods with a user-defined attribute. When the application is run, if the class file is presented directly to the transform() method, i.e. it is the first load of the class, I can see these attributes. But if I use retransformClasses(), then when I get control in the transform() method my attributes have been deleted. I can see why the JVM might discard unknown attributes when recreating the class bytes to pass to transform(), but I cannot find any documentation verifying and/or describing this behavior.
How can I accomplish this goal? I can see no guarantee that the same does not happen for RuntimeVisible annotations. And even if they are preserved, they are so much more difficult to work with than attributes that I would like to avoid that approach.
Any ideas on how to add 'notes' to a method that are preserved through retransformClasses()?
Thanks for any suggestions.
Once a class file is loaded, the HotSpot JVM does not preserve the original bytecode. Instead, it reconstitutes the bytecode from the internal VM representation when needed. Attributes that the VM does not understand are not restored.
The documentation for retransformClasses explicitly mentions this possibility:
The initial class file bytes represent the bytes passed to
ClassLoader.defineClass or redefineClasses (before any transformations
were applied), however they might not exactly match them. The constant
pool might not have the same layout or contents. The constant pool may
have more or fewer entries. Constant pool entries may be in a
different order; however, constant pool indices in the bytecodes of
methods will correspond. Some attributes may not be present. Where
order is not meaningful, for example the order of methods, order might
not be preserved.
The RuntimeVisibleAnnotations attribute, on the other hand, is understood by the JVM. Moreover, there is a Java API to access these annotations, so the JVM cannot throw them away during transformation. HotSpot indeed writes RuntimeVisibleAnnotations back when reconstituting the bytecode.
So, your best bet is to use annotations - after all, they are designed exactly for marking members with user-defined metadata.
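As a rough sketch (the annotation and class names here are made up for illustration), the prepass could mark methods with a runtime-retained annotation; since it lives in RuntimeVisibleAnnotations, it survives the JVM's bytecode reconstitution, and the agent can later find it either with plain reflection (for already-loaded classes) or by parsing the attribute from the class bytes with a bytecode library such as ASM:

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Runtime retention means the annotation is stored in the
// RuntimeVisibleAnnotations attribute of the method.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Instrumented {
    String note() default "";
}

class TargetClass {
    @Instrumented(note = "rewritten by prepass")
    void work() { /* ... */ }
}

class NoteReader {
    public static void main(String[] args) throws Exception {
        Method m = TargetClass.class.getDeclaredMethod("work");
        Instrumented tag = m.getAnnotation(Instrumented.class);
        if (tag != null) {
            System.out.println("note = " + tag.note());
        }
    }
}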
Part of the project I'm working on got updated, and at some point somebody started sending empty collections instead of null as arguments to a method.
This led to a single bug first, which then led me to replace if (null == myCollection) with if (CollectionUtils.isEmpty(myCollection)), which in the end led to a cascade of several bugs. This way I discovered that a lot of the code treats these collections differently:
when a collection is empty (i.e. the user specifically wanted to have nothing here)
when a collection is null (i.e. the user did not mention anything here)
Hence, my question: is this good or bad design practice?
In his (very good) book Effective Java, Joshua Bloch treats this question for the return values of methods (not "in general" like your question):
(About the use of null) It is error-prone, because the programmer writing
the client might forget to write the special case code to handle a null
return.
(...)
It is sometimes argued that a null return value is preferable to an
empty array because it avoids the expense of allocating the array.
This argument fails on two counts. First, it is inadvisable to worry
about performance at this level unless profiling has shown that the
method in question is a real contributor to performance problems (Item
55). Second, it is possible to return the same zero-length array from every invocation that returns no items because
zero-length arrays are immutable and immutable objects may be shared
freely (Item 15).
(...)
In summary, there is no reason ever to return null from an
array- or collection-valued method instead of returning an empty array
or collection. (...)
Personally I use this reasoning as a rule of thumb with any use of Collections in my code. Of course, there are some cases where a distinction between null and empty makes sense, but in my experience they are quite rare.
Nevertheless, as stated by BionicCode in the comments section, in the case of a method that returns null instead of empty to specify that something went wrong, you always have the possibility of throwing an exception instead.
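As a small illustration of that advice (the class and method names are just examples), a collection-valued method can always return a shared empty collection instead of null, and callers no longer need a special case:

import java.util.Collections;
import java.util.List;

class OrderService {
    private List<String> pendingOrders; // may never have been initialized

    // Returns an empty list rather than null when there is nothing to report.
    public List<String> getPendingOrders() {
        if (pendingOrders == null) {
            return Collections.emptyList(); // shared immutable instance, no allocation cost
        }
        return pendingOrders;
    }
}

class Caller {
    void process(OrderService service) {
        // No null check needed; an empty list simply skips the loop.
        for (String order : service.getPendingOrders()) {
            System.out.println(order);
        }
    }
}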
I know this question already has an accepted answer. But because of the discussions I had, I decided to write my own to address this problem.
The inventor of NULL himself, Tony Hoare, says:
I call it my billion-dollar mistake. It was the invention of the null reference in 1965. At that time, I was designing the first comprehensive type system for references in an object oriented language (ALGOL W). My goal was to ensure that all use of references should be absolutely safe, with checking performed automatically by the compiler. But I couldn’t resist the temptation to put in a null reference, simply because it was so easy to implement. This has led to innumerable errors, vulnerabilities, and system crashes, which have probably caused a billion dollars of pain and damage in the last forty years.
First of all, it is very bad practice to introduce breaking changes when updating old code that was already released and tested. Refactoring in this situation is always dangerous and should be avoided. I know that it can hurt to release ugly or stupid code, but if this happened, it is a clear sign of a lack of quality management tools (e.g. code reviews or conventions). Obviously there were also no unit tests, otherwise the changes would have been reverted immediately by their author, due to the failing tests.

The main problem is that your team has no convention for how to handle NULL as an argument or as a result. The intention of the author of the update to switch from NULL arguments to empty collections is absolutely right and should be supported, but he shouldn't have done it to working code, and he should have discussed it with the team so that the rest of the team can follow and make the change effective. Your team definitely must come together and agree on abandoning NULL as an argument or result value, or at least find a standard. Better send NULL to hell.

Your only solution is to revert the changes done by the author of the update (I assume you are using version control). Then redo the update, but do it the old-fashioned and nasty way, using NULL like you were doing before. Don't use NULL in new code, for a brighter future. Trying to fix the updated version will escalate the situation for sure and therefore waste time (I assume that we are talking about a bigger project). Roll back to the previous version if possible.
To make it short, in case you don't want to continue reading: yes, it's very bad practice. You can draw this conclusion yourself from the situation you are in now. You are witnessing very unstable and unpredictable code; the irrational bugs are the proof that your code has become unpredictable. If you don't have unit tests for at least the business logic, then the code is hot iron. The practice that has led to this code can never be good.

NULL has no intuitive meaning to the user of an API when a neutral result or parameter is possible (as is the case with a collection). An empty collection is neutral: you can expect a collection to be empty or to contain at least one element, and there is nothing else you can expect from a collection. Do you want to indicate an error, i.e. that an operation couldn't terminate? Then prefer to throw an exception with a good name that communicates the root of the error clearly to the caller.
NULL is a historical relic. Stumbling over a NULL value meant that the programmer wrote sloppy code: he forgot to initialize a pointer by assigning a memory address to it. The compiler needs a memory address as a reference in order to create a valid pointer. If you wanted to declare a pointer but did not yet know the address it should point to, NULL was the convention for a pointer that points to nowhere, which means no memory has to be allocated for the pointer's referent (beyond the pointer itself). Today, in modern OO languages with garbage collection and plenty of available memory, NULL has become largely irrelevant in programming. There are situations where it is used to express the absence of data, as in SQL, but in OO programming you can avoid it entirely and thereby make your application more robust.
You have the choice to apply the Null Object pattern. It is also best practice to either use default parameters (if your language, like C#, supports them) or overloads. For example, if a method parameter is optional, then use an overload (or a default parameter). If the parameter is mandatory but you can't provide the value, then simply don't call the method. If the parameter is a collection, then always pass an empty collection whenever you have no values. The author of the method must handle this case, as he must handle all possible cases or parameter states; this includes NULL, so it's the duty of the method's author to check for NULL and decide how to handle it. This is where the convention kicks in: if your team agrees never to use NULL, these annoying and ugly NULL checks are no longer required. Some frameworks offer a #NotNull attribute. The author can use it to decorate method parameters to indicate that NULL is not a valid value; the compiler then does the NULL checks, shows an error to the programmer who (mis)uses the method, and simply won't compile. Alongside code reviews, this can help to prevent or identify violations and lead to more robust code.
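To illustrate those two conventions in Java (the class and method names here are only examples), an overload can cover the "nothing specified" case while the collection-taking method fails fast if the no-NULL rule is violated:

import java.util.Collection;
import java.util.Collections;
import java.util.Objects;

class Report {
    // Overload for callers that have nothing to specify.
    void print() {
        print(Collections.emptyList());
    }

    // Mandatory, never-null parameter; fail fast if the team convention is violated.
    void print(Collection<String> lines) {
        Objects.requireNonNull(lines, "lines must not be null - pass an empty collection instead");
        for (String line : lines) {
            System.out.println(line);
        }
    }
}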
Most libraries provide helper classes, e.g. Array.Empty() or Enumerable.Empty() in C#, to create empty collections, along with methods like IsEmpty(). This makes intentions semantically clear and the code therefore nice to read. It's worth writing your own helper if none exists in your standard library.
Try to integrate quality management into your team's routine. Do code reviews to make sure the code to be released conforms to your quality standards (unit tests, no NULL values, always use curly braces for statement bodies, naming, etc.).
I hope you can fix your problem. I know this is a stressful situation to clean up the mess of somebody else. This is why communication in teams is so important.
It depends on your needs. Null is definitely not an empty collection. I'd say that it is bad practice to treat an empty collection and a non-empty collection separately.
Null, however, indicates a lack of data. I'd say that if such a situation is legal and you are using Java 8 or higher, you should probably use Optional. In this case Optional.empty() means that there is no collection at all, while Optional.of(collection) means that a collection is there, even if it is itself empty.
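A small sketch of that idea (the class and method names are just examples):

import java.util.Collections;
import java.util.List;
import java.util.Optional;

class SearchService {
    // Optional.empty() = "the user did not specify anything",
    // Optional.of(empty list) = "the user explicitly wants nothing here".
    Optional<List<String>> findFilters(boolean specified) {
        if (!specified) {
            return Optional.empty();
        }
        return Optional.of(Collections.emptyList());
    }
}

class OptionalDemo {
    public static void main(String[] args) {
        Optional<List<String>> filters = new SearchService().findFilters(false);
        // Distinguish "absent" from "present but empty" without ever touching null.
        if (filters.isPresent()) {
            System.out.println("filters: " + filters.get());
        } else {
            System.out.println("no filters specified");
        }
    }
}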
It is recommended to use a check like this in the case of a collection:
if (myCollection != null && !myCollection.isEmpty()) {
    // Process the logic
}
However, as part of the design, Joshua Bloch recommends using empty collections in Effective Java.
To quote his statement,
there is no reason ever to return null from an array-valued method
instead of returning a zero-length array.
You can find the link to Effective Java here:
https://www.amazon.com/dp/0321356683
Well, by accessing a null value you can get a NullPointerException directly, and most of the time (in my experience) it is bad practice to make a difference between a null value and an empty collection deeper in the logic. It should just be empty instead of null.
Let’s put some downvote bait here, shall we? As I understand it, you have got a method like
public void foo(Collection myCollection) {
    if (CollectionUtils.isEmpty(myCollection)) {
        // ...
    }
    // ...
}
I also understand that this method is called from many places, so it would be most practical if you don’t need to change all the calls. And I understand that there is a semantic difference between passing null and passing an empty collection to it. An empty collection means that we know for a fact that there are no elements. Null means that it is not specified whether there are any (and I am assuming that your method is able to do useful work in this case too).
The nice design would have two methods:
/** @param myCollection a possibly empty collection; not null */
public void foo(Collection myCollection);
/** Call this method if you don’t want to specify elements */
public void foo();
However, considering your existing code base, you don’t want to introduce the mentioned non-null requirement all of a sudden and break code that has been working until now. One way forward would be to introduce a comment on the 1-arg method effectively saying:
Passing null to this method is deprecated. It works but may be
prohibited in a future version.
This will buy you time to change the code base over the coming months or even years and still allow you to arrive at the better design at some point.
At the same time you may change your 1-arg method to simply delegate to the no-arg method if it receives a null.
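A minimal sketch of that delegation (the class name and signature details are only illustrative):

import java.util.Collection;

class Service {
    /**
     * @param myCollection a possibly empty collection; not null.
     *     Passing null is deprecated and is currently treated as "no elements specified".
     */
    public void foo(Collection<?> myCollection) {
        if (myCollection == null) {
            foo();          // delegate the legacy null case to the no-arg overload
            return;
        }
        // ... handle the (possibly empty) collection
    }

    /** Call this method if you don’t want to specify elements. */
    public void foo() {
        // ... do whatever useful work is possible without elements
    }
}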
Caveat: You need to be very clear in your documentation (Javadoc) about the semantics of not specifying elements (ideally calling the no-arg method) and the semantic difference from passing an empty collection.
It's well known that GCs will sometimes move objects around in memory, and it's my understanding that as long as all references are updated when the object is moved (before any user code is called), this should be perfectly safe.
However, I saw someone mention that reference comparison could be unsafe because the object might be moved by the GC in the middle of a reference comparison, such that the comparison could fail even when both references should refer to the same object.
ie, is there any situation under which the following code would not print "true"?
Foo foo = new Foo();
Foo bar = foo;
if (foo == bar) {
    System.out.println("true");
}
I tried googling this and the lack of reliable results leads me to believe that the person who stated this was wrong, but I did find an assortment of forum posts (like this one) that seemed to indicate that he was correct. But that thread also has people saying that it shouldn't be the case.
Java Bytecode instructions are always atomic in relation to the GC (i.e. no cycle can happen while a single instruction is being executed).
The only time the GC will run is between two Bytecode instructions.
Looking at the bytecode that javac generates for the if statement in your code, we can simply check whether a GC would have any effect:
// a GC here wouldn't change anything
ALOAD 1
// a GC cycle here would update all references accordingly, even the one on the stack
ALOAD 2
// same here. A GC cycle will update all references to the object on the stack
IF_ACMPNE L3
// this is the comparison of the two references. no cycle can happen while this comparison
// "is running" so there won't be any problems with this either
Additionally, even if the GC were able to run during the execution of a bytecode instruction, the identity of the object would not change: it is still the same object before and after the cycle.
So, in short the answer to your question is no, it will always output true.
Source:
https://docs.oracle.com/javase/specs/jls/se8/html/jls-15.html#jls-15.21.3
The short answer, looking at the Java 8 specification, is: no.
The == operator will always perform an object identity check (given that neither reference is null). Even if the object is moved, it is still the same object.
If you see such an effect, you have just found a JVM bug. Go submit it.
It could, of course, be that some obscure implementation of the JVM does not enforce this for whatever strange performance reason. If that is the case, it would be wise to simply move on from that JVM...
TL;DR
You should not think about that kind of stuff whatsoever; it's a dark place.
Java has clearly stated its specification, and you should not doubt it, ever.
2.7. Representation of Objects
The Java Virtual Machine does not mandate any particular internal structure for objects.
Source: JVMS SE8.
I doubt it! If you doubt this very basic operator, you may find yourself doubting everything else; being frustrated and paranoid with trust issues is not the place you want to be.
What if it happens to me? Such a bug should not exist. The Oracle discussion you supplied reports a bug that happened years ago, and the discussion OP somehow decided to bring it up again for no reason, without reliable documentation that such a bug still exists nowadays. However, if such a bug, or any other, has occurred to you, please submit it here.
To let your worries go away: Java has adjusted the pointer-to-pointer approach into the JVM pointer table; you can read more about its efficiency here.
GCs only happen at points in the program where the state is well-defined and the JVM has exact knowledge where everything is in registers/the stack/on the heap so all references can be fixed up when an object gets moved.
I.e. they cannot occur between execution of arbitrary assembly instructions. Conceptually you can think of them as occurring between bytecode instructions of the JVM, with the GC adjusting all references that have been generated by previous instructions.
You are asking a question with a wrong premise. Since the == operator does not compare memory locations, it isn’t sensitive to changes of memory location per se. The == operator, applied to references, compares the identity of the referred objects, regardless of how the JVM implements it.
To name an example that counteracts the usual understanding, a distributed JVM may have objects held in the RAM of different computers, including the possibility of local copies. So simply comparing addresses won’t work. Of course, it’s up to the JVM implementation to ensure that the semantics, as defined in the Java Language Specification, do not change.
If a particular JVM implementation implements a reference comparison by directly comparing memory locations of objects and has a garbage collector that can change memory locations, of course, it’s up to the JVM to ensure that these two features can’t interfere with each other in an incompatible way.
If you are curious about how this can work, e.g. inside optimized, JIT-compiled code, the granularity isn’t as fine as you might think. Any sequential code, including forward branches, can be considered to run fast enough to allow delaying garbage collection until its completion. So garbage collection can’t happen at arbitrary points inside optimized code, but must be allowed at certain points, e.g.:
backward branches (note that due to loop unrolling, not every loop iteration implies a backward branch)
memory allocations
thread synchronization actions
invoking a method that hasn’t been inlined/analyzed
maybe something special, I forgot
So the JVM emits code containing certain “safepoints” at which it is known which references are currently held and how to update them if necessary, so that, of course, changing locations has no impact on correctness. Between these points, the code can run without having to care about the possibility of changing memory locations, whereas the garbage collector will wait for the code to reach a safepoint when necessary, which is guaranteed to happen in finite, rather short time.
But, as said, these are implementation details. On the formal level, things like changing memory locations do not exist, so there is no need to explicitly specify that they are not allowed to change the semantics of Java code. No implementation detail is allowed to do that.
I understand you are asking this question after someone said it behaves that way, but really, asking whether it does behave that way isn't the right approach to evaluating what they said.
What you should really be asking (primarily yourself, others only if you can't decide on an answer) is whether it makes sense for the GC to be allowed to cause a comparison to fail that logically should succeed (basically any comparison that doesn't include a weak reference).
The answer to that is obviously "no", as it would break pretty much anything beyond "hello, world" and probably even that.
So, if allowed, it is a bug -- either in the spec or the implementation. Now since both the spec and the implementation were written by humans, it is possible such a bug exists. If so, it will be reported and almost certainly fixed.
No, because that would be flagrantly ridiculous and a patent bug.
The GC takes a great deal of care behind the scenes to avoid catastrophically breaking everything. In particular, it will only move objects when threads are paused at safepoints, which are specific places in the running code generated by the JVM for threads to be paused at. A thread at a safepoint is in a known state, where the positions of all the possible object references in registers and memory are known, so the GC can update them to point to the object's new address. Garbage collection won't break your comparison operations.
In Java, a variable holds a reference to the "object", not to the memory location where the object is stored.
Java does this because it allows the JVM to manage memory usage on its own (e.g. via the garbage collector) and to make global improvements without impacting the client program directly.
As an instance of such an improvement, the first X ints (I don't remember how many) are always kept allocated in memory to execute for loops faster (e.g. for (int i = 0; i < 10; i++)).
And as an example of object references, just try to create an int array and print it:
int[] i = {1,2,3};
System.out.println(i);
You will see that Java prints something starting with [I@. It is saying that this points to an "array of int at" followed by the reference to the object, not the memory zone!
Are there any reasons/arguments not to implement a Java collection that restricts its members based on a predicate/constraint?
Given that such functionality would presumably be needed often, I expected it to be implemented already in collections frameworks like apache-commons or Guava. But while Apache indeed has it, Guava deprecated its version of it and recommends against similar approaches.
The Collection interface contract states that a collection may place any restrictions on its elements as long as it is properly documented, so I'm unable to see why a guarded collection would be discouraged. What other option is there to, say, ensure an Integer collection never contains negative values without hiding the whole collection?
It is just a matter of preference - look at the thread about checking before vs. checking after - I think that is what it boils down to. Also, checking only in add() is good enough only for immutable objects.
There can hardly be one ("acceptable") answer, so I'll just add some thoughts:
As mentioned in the comments, Collection#add(E) is already allowed to throw an IllegalArgumentException, with the reason
if some property of the element prevents it from being added to this collection
So one could say that this case was explicitly considered in the design of the collection interface, and there is no obvious, profound, purely technical (interface-contract related) reason to not allow creating such a collection.
However, when thinking about possible application patterns, one quickly finds cases where the observed behavior of such a collection could be ... counterintuitive, to say the least.
One was already mentioned by dcsohl in the comments, and referred to cases where such a collection would only be a view on another collection:
List<Integer> listWithIntegers = new ArrayList<Integer>();
List<Integer> listWithPositiveIntegers =
createView(listWithIntegers, e -> e > 0);
//listWithPositiveIntegers.add(-1); // Would throw IllegalArgumentException
listWithIntegers.add(-1); // Fine
// This would be true:
assert(listWithPositiveIntegers.contains(-1));
However, one could argue that
Such a collection would not necessarily have to be only a view. Instead, one could enforce that only new collections with such constraints may be created
The behavior is similar to that of Collections.unmodifiableCollection(Collection), which is widely accepted as it is. (Although it serves a far broader and omnipresent use case, namely preventing the internal state of a class from being exposed by returning a modifiable version of a collection via an accessor method.)
But in this case, the potential for "inconsistencies" is much higher.
For example, consider a call to Collection#addAll(Collection). It also allows throwing an IllegalArgumentException "if some property of an element of the specified collection prevents it from being added to this collection". But there are no guarantees about things like atomicity. To put it differently: it is not specified what the state of the collection will be when such an exception is thrown. Imagine a case like this:
List<Integer> listWithPositiveIntegers = createList(e -> e > 0);
listWithPositiveIntegers.add(1); // Fine
listWithPositiveIntegers.add(2); // Fine
listWithPositiveIntegers.addAll(Arrays.asList(3, -4, 5)); // Throws
assert(listWithPositiveIntegers.contains(3)); // True or false?
assert(listWithPositiveIntegers.contains(5)); // True or false?
(It may be subtle, but it may be an issue).
All this might become even trickier when the condition changes after the collection has been created (regardless of whether it is only a view or not). For example, one could imagine a sequence of calls like this:
List<Integer> listWithPredicate = create(predicate);
listWithPredicate.add(-1); // Fine
someMethod();
listWithPredicate.add(-1); // Throws
Where in someMethod(), there is an innocent line like
predicate.setForbiddingNegatives(true);
One of the comments already mentioned possible performance issues. This is certainly true, but I think that this is not really a strong technical argument: there are no formal complexity guarantees for the runtime of any method of the Collection interface anyhow. You don't know how long a collection.add(e) call takes. For a LinkedList it is O(1), but for a TreeSet it may be O(log n) (and who knows what n is at this point in time).
Maybe the performance issue and the possible inconsistencies can be considered as special cases of a more general statement:
Such a collection would basically allow arbitrary code to be executed during many operations, depending on the implementation of the predicate.
This may literally have arbitrary implications, and makes reasoning about algorithms, performance and the exact behavior (in terms of consistency) impossible.
The bottom line is: There are many possible reasons to not use such a collection. But I can't think of a strong and general technical reason. So there may be application cases for such a collection, but the caveats should be kept in mind, considering how exactly such a collection is intended to be used.
I would say that such a collection would have too many responsibilities and violate SRP.
The main issue I see here is the readability and maintainability of the code that uses the collection. Suppose you have a collection to which you allow adding only positive integers (Collection<Integer>), and you use it throughout the code. Then the requirements change and you are only allowed to add odd positive integers to it. Because there are no compile-time checks, it would be much harder to find all the places in the code where you add elements to that collection than it would be if you had a separate wrapper class which encapsulates the collection.
Although of course not even close to such an extreme, it bears some resemblance to using Object reference for all objects in the application.
The better approach is to utilize compile time checks and follow the well-established OOP principles like type safety and encapsulation. That means creating a separate wrapper class or creating a separate type for collection elements.
For example, if you really want to make sure that you only work with positive integers in a given context, you could create a separate type PositiveInteger extends Number and then store them in a Collection<PositiveInteger>. This way you get compile-time safety, and converting PositiveInteger to OddPositiveInteger later requires much less effort.
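A minimal sketch of such a value type (just one possible shape, not the only one):

import java.util.ArrayList;
import java.util.Collection;

// A dedicated element type: invalid values cannot even be constructed,
// so a Collection<PositiveInteger> needs no runtime guard at all.
final class PositiveInteger extends Number {
    private final int value;

    PositiveInteger(int value) {
        if (value <= 0) {
            throw new IllegalArgumentException("not positive: " + value);
        }
        this.value = value;
    }

    @Override public int intValue() { return value; }
    @Override public long longValue() { return value; }
    @Override public float floatValue() { return value; }
    @Override public double doubleValue() { return value; }
}

class TypedCollectionDemo {
    public static void main(String[] args) {
        Collection<PositiveInteger> values = new ArrayList<>();
        values.add(new PositiveInteger(3));      // fine
        // values.add(new PositiveInteger(-4));  // would fail at construction time
        System.out.println(values.size());
    }
}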
Enums are an excellent example of preferring dedicated types vs runtime-constrained values (constant strings or integers).
I've recently been looking at the Java Virtual Machine Specification (JVMS) to try to better understand what makes my programs work, but I've found a section that I'm not quite getting...
Section 4.7.4 describes the StackMapTable Attribute, and in that section the document goes into details about stack map frames. The issue is that it's a little wordy and I learn best by example; not by reading.
I understand that the first stack map frame is derived from the method descriptor, but I don't understand how (which is supposedly explained here.) Also, I don't entirely understand what the stack map frames do. I would assume they're similar to blocks in Java, but it appears as though you can't have stack map frames inside each other.
Anyway, I have two specific questions:
What do the stack map frames do?
How is the first stack map frame created?
and one general question:
Can someone provide an explanation less wordy and easier to understand than the one given in the JVMS?
Java requires all classes that are loaded to be verified, in order to maintain the security of the sandbox and ensure that the code is safe to optimize. Note that this is done on the bytecode level, so the verification does not verify invariants of the Java language, it merely verifies that the bytecode makes sense according to the rules for bytecode.
Among other things, bytecode verification makes sure that instructions are well formed, that all the jumps are to valid instructions within the method, and that all instructions operate on values of the correct type. The last one is where the stack map comes in.
The thing is that bytecode by itself contains no explicit type information. Types are determined implicitly through dataflow analysis. For example, an iconst instruction creates an integer value. If you store it in slot 1, that slot now has an int. If control flow merges from code which stores a float there instead, the slot is now considered to have invalid type, meaning that you can't do anything more with that value until overwriting it.
Historically, the bytecode verifier inferred all the types using these dataflow rules. Unfortunately, it is impossible to infer all the types in a single linear pass through the bytecode because a backwards jump might invalidate already inferred types. The classic verifier solved this by iterating through the code until everything stopped changing, potentially requiring multiple passes.
However, verification makes class loading slow in Java. Oracle decided to solve this issue by adding a new, faster verifier that can verify bytecode in a single pass. To do this, they required all new classes starting in Java 7 (with Java 6 in a transitional state) to carry metadata about their types, so that the bytecode can be verified in a single pass. Since the bytecode format itself can't be changed, this type information is stored separately in an attribute called StackMapTable.
Simply storing the type for every single value at every single point in the code would obviously take up a lot of space and be very wasteful. In order to make the metadata smaller and more efficient, they decided to have it only list the types at positions which are targets of jumps. If you think about it, this is the only time you need the extra information to do a single pass verification. In between jump targets, all control flow is linear, so you can infer the types at in between positions using the old inference rules.
Each position where types are explicitly listed is known as a stack map frame. The StackMapTable attribute contains a list of frames in order, though they are usually expressed as a difference from the previous frame in order to reduce data size. If there are no frames in the method, which occurs when control flow never joins (i.e. the CFG is a tree), then the StackMapTable attribute can be omitted entirely.
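For intuition, here is a small (made-up) example method; assuming it is compiled normally, the only place the verifier needs pre-computed type information is where the two branches merge, so that jump target is where a stack map frame entry would appear:

class FrameExample {
    static int pick(boolean flag) {
        int result;
        if (flag) {
            result = 1;
        } else {
            result = 2;
        }
        // Control flow from the two branches joins here. The corresponding jump target
        // in the compiled bytecode carries a stack map frame stating, roughly,
        // "local 'result' is int, operand stack is empty", so the verifier can check
        // the method in a single linear pass.
        return result;
    }
}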
So this is the basic idea of how StackMapTable works and why it was added. The last question is how the implicit initial frame is created. The answer, of course, is that at the beginning of the method, the operand stack is empty and the local variable slots have the types given by the method parameters, which are determined from the method descriptor.
If you're used to Java, there are a few minor differences in how method parameter types work at the bytecode level. First off, virtual methods have an implicit this as the first parameter. Second, boolean, byte, char, and short do not exist at the bytecode level; they are all implemented as ints behind the scenes.
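As a concrete (illustrative) example, for an instance method declared as int f(boolean flag, long count), whose descriptor is (ZJ)I, the implicit initial frame would look roughly like this:

class Example {
    int f(boolean flag, long count) {
        // Implicit initial stack map frame, derived from the descriptor (ZJ)I
        // plus the fact that this is an instance method:
        //   locals: [ Example (this), int (flag - boolean becomes int), long (count, occupies two slots) ]
        //   operand stack: [] (always empty at method entry)
        return flag ? (int) count : 0;
    }
}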