What does Joshua Bloch mean by extra-linguistic?

From this Artima article on clone vs copy constructor:
Object's clone method is very tricky. It's based on field copies, and
it's "extra-linguistic." It creates an object without calling a
constructor. There are no guarantees that it preserves the invariants
established by the constructors. There have been lots of bugs over the
years, both in and outside Sun, stemming from the fact that if you
just call super.clone repeatedly up the chain until you have cloned an
object, you have a shallow copy of the object.
What does Joshua Bloch mean by extra-linguistic?

He means something like "outside the scope of the Java language".
Specifically, in Java the "correct" way to create a new object is through its class's constructor. Many class writers rely on this assumption and put logic into their constructors, such as input validation or anything else they want to guarantee at construction time; this is what he calls "invariants established by the constructors". Cloning bypasses this basic constraint and creates a field-by-field copy without invoking any constructor, hence it is "extra-linguistic".
Technically, so does serialization.
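To make the constructor bypass concrete, here is a minimal sketch. The Account class, its counter, and its validation rule are hypothetical and exist only for illustration; they are not from the article.

```java
// Hypothetical class for illustration only.
class Account implements Cloneable {
    static int constructorCalls = 0;      // counts how often the constructor runs
    private final int[] balanceHistory;

    Account(int initialBalance) {
        if (initialBalance < 0) {         // an invariant established by the constructor
            throw new IllegalArgumentException("negative balance");
        }
        constructorCalls++;
        this.balanceHistory = new int[] { initialBalance };
    }

    int[] history() { return balanceHistory; }

    @Override
    public Account clone() {
        try {
            // Object.clone() copies the fields directly; no constructor runs.
            return (Account) super.clone();
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e);  // cannot happen: Account implements Cloneable
        }
    }
}

class CloneDemo {
    public static void main(String[] args) {
        Account original = new Account(100);
        Account copy = original.clone();
        System.out.println(Account.constructorCalls);              // 1: clone() did not call the constructor
        System.out.println(copy != original);                      // true: a distinct object
        System.out.println(copy.history() == original.history());  // true: the shallow copy shares the array
    }
}
```

The shared array in the last line is the shallow-copy problem the quote refers to: any validation or defensive copying done in the constructor is simply skipped for the clone.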

Probably the fact that it isn't implemented in Java: Object.clone() is a native method.

The extra-linguistic object creation mechanisms (meaning other than calling or chaining constructors) are:
cloning
serialization
reflection
bytecode generation

Related

Is it correct to call java.lang.String immutable?

This Java tutorial
says that an immutable object cannot change its state after creation.
java.lang.String has a field
/** Cache the hash code for the string */
private int hash; // Default to 0
which is initialized on the first call of the hashCode() method, so it changes after creation:
String s = new String(new char[] {' '});
Field hash = s.getClass().getDeclaredField("hash");
hash.setAccessible(true);
System.out.println(hash.get(s));
s.hashCode();
System.out.println(hash.get(s));
output
0
32
Is it correct to call String immutable?
A better definition would be not that the object does not change, but that it cannot be observed to have changed. Its behavior will never change: .substring(x,y) will always return the same thing for that string, and the same goes for equals and all the other methods.
That variable is calculated the first time you call hashCode() and is cached for further calls. This is basically what is called "memoization" in functional programming languages.
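The same caching idiom can be sketched on a small class of your own. CachedPoint below is a hypothetical example, not the actual String source; it only mirrors the idea of a non-final hash field that is filled in lazily.

```java
// Hypothetical value class: observably immutable, but with one lazily set field.
final class CachedPoint {
    private final int x;
    private final int y;
    private int hash; // 0 means "not computed yet", the same convention String uses

    CachedPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0) {          // first call (or the value really is 0): compute it
            h = 31 * x + y;
            hash = h;          // internal state changes, observable behaviour does not
        }
        return h;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof CachedPoint)) {
            return false;
        }
        CachedPoint other = (CachedPoint) o;
        return x == other.x && y == other.y;
    }
}
```

Every caller sees the same hashCode() before and after the cache is filled, which is why the mutation does not count against immutability in the observational sense.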
Reflection isn't really a tool for "programming" but rather for meta-programming (i.e. writing programs that generate programs), so it doesn't really count. It's the equivalent of changing a constant's value using a memory debugger.
The term "Immutable" is vague enough to not allow for a precise definition.
I suggest reading Kinds of Immutability from Eric Lippert's blog. Although it's technically a C# article, it's quite relevant to the question posed. In particular:
Observational immutability:
Suppose you’ve got an object which has the property that every time
you call a method on it, look at a field, etc, you get the same
result. From the point of view of the caller such an object would be
immutable. However you could imagine that behind the scenes the object
was doing lazy initialization, memoizing results of function calls in
a hash table, etc. The “guts” of the object might be entirely mutable.
What does it matter? Truly deeply immutable objects never change their
internal state at all, and are therefore inherently threadsafe. An
object which is mutable behind the scenes might still need to have
complicated threading code in order to protect its internal mutable
state from corruption should the object be called on two threads “at
the same time”.
Once created, all the methods on a String instance (called with the same parameters) will always provide the same result. You cannot change its behaviour (with any public method), so it will always represent the same entity. Also, it is final and cannot be subclassed, so it is guaranteed that all instances will behave like this.
Therefore from public view the object is considered immutable. The internal state does not really matter in this case.
Yes it is correct to call them immutable.
While it is true that you can reach in and modify private ... and final ... variables of a class, it is an unnecessary and incredibly unwise thing to do on a String object. It is generally assumed that nobody is going to be crazy enough to do it.
From a security standpoint, the reflection calls needed to modify the state of a String all perform security checks. Unless you've mis-implemented your sandbox, the calls will be blocked for untrusted code. So you shouldn't have to worry about this as a way for untrusted code to break sandbox security.
It is also worth noting that the JLS states that using reflection to change final fields may break things (e.g. in multi-threaded code) or may not have any effect.
From the viewpoint of a developer who is using reflection, it is not correct to call String immutable. There are actual Java developers using reflection to write real software every day. Dismissing reflection as a "hack" is preposterous. However, from the viewpoint of a developer who is not using reflection, it is correct to call String immutable. Whether or not it is valid to assume that String is immutable depends on context.
Immutability is an abstract concept and therefore cannot apply in an absolute sense to anything with a physical form (see the ship of Theseus). Programming language constructs like objects, variables, and methods exist physically as bits in a storage medium. Data degradation is a physical process which happens to all storage media, so no data can ever be said to be truly immutable. In addition, it is almost always possible in practice to subvert the programming language features intended to prevent the mutation of a particular datum. In contrast, the number 3 is 3, has always been 3, and will always be 3.
As applied to program data, immutability should be considered a useful assumption rather than a fundamental property. For example, if one assumes that a String is immutable, one may cache its hash code for reuse and avoid the cost of ever recomputing its hash code again later. Virtually all non-trivial software relies on assumptions that certain data will not mutate for certain durations of time. Software developers generally assume that the code segment of a program will not change while it is executing, unless they are writing self-modifying code. Understanding what assumptions are valid in a particular context is an important aspect of software development.
It cannot be modified from outside, and it is a final class, so it cannot be subclassed and made mutable. These are two requirements for immutability. Reflection is considered a hack; it's not a normal way of developing.
A class can be immutable while still having mutable fields, as long as it doesn't provide access to its mutable fields.
It's immutable by design. If you use Reflection (getting the declared Field and resetting its accessibility), you are circumventing its design.
Reflection will allow you to change the contents of any private field. Is it therefore correct to call any object in Java immutable?
Immutability refers to changes that are either initiated by or perceivable by the application.
In the case of string, the fact that a particular implementation chooses to lazily calculate the hashcode is not perceptible to the application. I would go a step further, and say that an internal variable that is incremented by the object -- but never exposed and never used in any other way -- would also be acceptable in an "immutable" object.
Yes, it is correct. When you modify a String the way you do in your example, a new String is created, but the old one keeps its value.

Deep Copy an Object

Is it possible to deep copy an Object out of the box? i.e. any other way than coding a clone function manually.
Cloning does not necessarily perform a deep copy. In fact, the default implementation of Object.clone() creates a shallow copy.
If the object's closure consists of objects that implement Serializable or Externalizable, you can use ObjectOutputStream and ObjectInputStream to create a deep copy ... but it is expensive.
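A minimal sketch of the serialization round trip, assuming every object reachable from the root is Serializable. The DeepCopy class and its copy method are hypothetical helper names, not a library API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;

// Hypothetical helper: deep copy by writing the graph to memory and reading it back.
class DeepCopy {
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T copy(T original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);   // serializes the whole reachable graph
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();  // rebuilds an independent graph
            }
        } catch (IOException | ClassNotFoundException e) {
            // Thrown, for instance, when some object in the graph is not Serializable.
            throw new IllegalStateException("could not copy object graph", e);
        }
    }

    public static void main(String[] args) {
        ArrayList<StringBuilder> original = new ArrayList<>();
        original.add(new StringBuilder("a"));

        ArrayList<StringBuilder> duplicate = copy(original);
        duplicate.get(0).append("b");

        System.out.println(original.get(0));  // a  (the original element is untouched)
        System.out.println(duplicate.get(0)); // ab
    }
}
```

The cost mentioned above comes from the full encode and decode pass; transient fields are also dropped, so this is not a drop-in replacement for a hand-written copy.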
The cloning library is another option, but my initial reading of the code is that it relies on the class of every object in the graph providing a no-argument constructor. It then patches the resulting object to hold a copy of the original object's state. This process might have undesirable side-effects, depending on what the no-args constructor actually does.
In short, I don't think there is a universal solution.
I suggest using java.lang.reflect.
java.lang.Class exposes all fields and allows reading public fields and calling public methods.
Only private fields without accessors can't be cloned this way.
I briefly looked at the cloning library code. It does what serialization does, that is, it walks the object's internal graph, but instead of writing to a file it writes to memory, producing the clone of the object. So although it's faster than serialization, it's essentially doing the same thing.

Java final modifier

I was told that I misunderstand the effects of final. What are the effects of the final keyword?
Here is a short overview of what I think I know:
Java final modifier (aka aggregation relation)
primitive variables: can be set only once. (memory and performance gain)
objects variables: may be modified, final applies to object reference.
fields: can be set only once.
methods: can't be overridden, hidden.
classes: can't be extended.
garbage collection: will force Java generational garbage collection mark-sweep to double sweep.
Cans and Can'ts
Can make clone fail (this is both good and bad)
Can make immutable primitives aka const
Can make blank immutable - initialized at creation aka readonly
Can make objects shallowly immutable
Can make scope / visibility immutable
Can make method invocation overhead smaller (because it does not need virtual table)
Can make method arguments used as final (even if they are not)
Can make objects threadsafe (if object is defined as final, it won't make method arguments final)
Can make mock tests (not that you could do anything about it - you can say bugs are intended)
Can't make friends (mutable with other friends and immutable for rest)
Can't make mutable that is changed to be immutable later (but can with factory pattern like fix)
Can't make array elements immutable aka deeply immutable
Can't make new instances of object (this is both good and bad)
Can't make serialization work
There are no alternatives to final, but there is wrapper + private and enums.
Answering each of your points in turn:
primitive variables: can be set only once. (memory and performance gain)
Yes, but no memory gain, and no performance gain. (Your supposed performance gain comes from setting only once ... not from final.)
objects variables: may be modified, final applies to object reference.
Yes. (However, this description misses the point that this is entirely consistent with the way that the rest of the Java language deals with the object / reference duality. For instance, when objects are passed as parameters and returned as results.)
fields: can be set only once.
The real answer is: same as for variables.
methods: can't be overridden, hidden.
Yes. But also note that what is going on here is that the final keyword is being used in a different syntactic context to mean something different from final for a field / variable.
classes: can't be extended.
Yes. But also see note above.
garbage collection: will force Java generational garbage collection mark-sweep to double sweep.
This is nonsense. The final keyword has no relevance whatsoever to garbage collection. You might be confusing final with finalization ... they are unrelated.
But even finalizers don't force an extra sweep. What happens is that an object that needs finalization is set to one side until the main GC finishes. The GC then runs the finalize method on the object and sets its flag ... and continues. The next time the GC runs, the object is treated as a normal object:
if it is reachable it is marked and copied
if it is not reachable it is not marked.
(Your characterization - "Java generational garbage collection mark-sweep" is garbled. A garbage collector can be either "mark-sweep" OR "generational" (a subclass of "copying"). It can't be both. Java normally uses generational collection, and only falls back to mark-sweep in emergencies; i.e. when running out of space or when a low pause collector cannot keep up.)
Can make clone fail (this is both good and bad)
I don't think so.
Can make immutable primitives aka const
Yes.
Can make blank immutable - initialized at creation aka readonly
Yes ... though I've never heard the term "blank immutable" used before.
Can make objects shallowly immutable
Object mutability is about whether observable state may change. As such, declaring attributes final may or may not make the object behave as immutable. Besides, the notion of "shallowly immutable" is not well defined, not least because the notion of what "shallow" is cannot be mapped without deep knowledge of the class semantics.
(To be clear, the mutability of variables / fields is a well defined concept in the context of the JLS. It is just the concept of mutability of objects that is undefined from the perspective of the JLS.)
Can make scope / visibility immutable
Terminology error. Mutability is about object state. Visibility and scope are not.
Can make method invocation overhead smaller (because it does not need virtual table)
In practice, this is irrelevant. A modern JIT compiler does this optimization for non-final methods too, if they are not overridden by any class that the application actually uses. (Clever stuff happens ...)
Can make method arguments used as final (even if they are not)
Huh? I cannot parse this sentence.
Can make objects threadsafe
In certain situations yes.
(if object is defined as final, it won't make method arguments final)
Yes, if you mean if class is final. Objects are not final.
Can make mock tests (not that you could do anything about it - you can say bugs are intended)
Doesn't parse.
Can't make friends (mutable with other friends and immutable for rest)
Java doesn't have "friends".
Can't make mutable that is changed to be immutable later (but can with factory pattern like fix)
Yes to the first, a final field can't be switched from mutable to immutable.
It is unclear what you mean by the second part. It is true that you can use a factory (or builder) pattern to construct immutable objects. However, if you use final for the object fields at no point will the object be mutable.
Alternatively, you can implement immutable objects that use non-final fields to represent immutable state, and you can design the API so that you can "flip a switch" to make a previously mutable object immutable from now onwards. But if you take this approach, you need to be a lot more careful with synchronization ... if your objects need to be thread-safe.
Can't make array elements immutable aka deeply immutable
Yes, but your terminology is broken; see comment above about "shallow mutability".
Can't make new instances of object (this is both good and bad)
No. There's nothing stopping you making a new instance of an object with final fields or a final class or final methods.
Can't make serialization work
No. Serialization works. (Granted, deserialization of final fields using a custom readObject method presents problems ... though you can work around them using reflection hacks.)
There are no alternatives to final,
Correct.
but there is wrapper + private
Yes, modulo that (strictly speaking) an unsynchronized getter for a non-final field may be non-thread-safe ... even if it is initialized during object construction and then never changed!
and enums.
Solves a different problem. And enums can be mutable.
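Two of the points above (final applies to the reference rather than the object, and array elements cannot be made final) can be sketched as follows. FinalReferenceDemo is a hypothetical name used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

class FinalReferenceDemo {
    static List<String> demo() {
        final List<String> names = new ArrayList<>();
        names.add("Alice");              // allowed: mutates the object, not the reference
        // names = new ArrayList<>();    // would not compile: the reference is final

        final int[] counts = { 1, 2, 3 };
        counts[0] = 42;                  // allowed: final does not extend to array elements
        // counts = new int[3];          // would not compile: the reference is final

        names.add(String.valueOf(counts[0]));
        return names;
    }

    public static void main(String[] args) {
        System.out.println(demo());      // [Alice, 42]
    }
}
```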
The final keyword is usually used to preserve immutability. Using final for classes or methods prevents linkages between methods from being broken. For example, suppose the implementation of some method of class X assumes that method M will behave in a certain way. Declaring X or M as final will prevent derived classes from redefining M in such a way as to cause X to behave incorrectly.
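A small sketch of that linkage, using hypothetical classes: total() plays the role of the method in X that depends on M, and ratePercent() plays the role of M.

```java
// Hypothetical classes; amounts are in cents to keep the arithmetic exact.
class Discount {
    // final: subclasses cannot redefine the rate that total() relies on.
    final int ratePercent() {
        return 10;
    }

    int total(int priceCents) {
        return priceCents * (100 - ratePercent()) / 100;
    }
}

class SeasonalDiscount extends Discount {
    // int ratePercent() { return 200; } // would not compile: ratePercent() is final in Discount

    @Override
    int total(int priceCents) {
        // Still free to extend behaviour, but only on top of the protected rate.
        return Math.max(0, super.total(priceCents) - 100);
    }
}

class FinalMethodDemo {
    public static void main(String[] args) {
        System.out.println(new Discount().total(1000));         // 900
        System.out.println(new SeasonalDiscount().total(1000)); // 800
    }
}
```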

Why are people so afraid of using clone() (on collection and JDK classes)?

A number of times I've argued that using clone() isn't such a bad practice. Yes, I know the arguments. Bloch said it's bad. He indeed did, but he said that implementing clone() is bad. Using clone on the other hand, especially if it is implemented correctly by a trusted library, such as the JDK, is OK.
Just yesterday I had a discussion about an answer of mine that merely suggests that using clone() for ArrayList is OK (and got no upvotes for that reason, I guess).
If we look at the @author of ArrayList, we can see a familiar name - Josh Bloch. So clone() on ArrayList (and other collections) is perfectly fine (just look at their implementations).
Same goes for Calendar and perhaps most of the java.lang and java.util classes.
So, give me a reason why not to use clone() with JDK classes?
Given an ArrayList reference, you would need a getClass check to check that it is not a subclass of the JDK class. And then what? Potential subclasses cannot be trusted. Presumably a subclass would have different behaviour in some way.
It requires that the reference is more specific than List. Some people don't mind that, but the majority opinion is that that is a bad idea.
You'll have to deal with a cast, an unsafe cast at that.
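The last two points can be seen side by side in a short sketch. CopyVersusClone and its method names are hypothetical; the calls to ArrayList.clone() and the ArrayList copy constructor are the standard JDK ones.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CopyVersusClone {
    // clone(): the parameter must be typed as ArrayList, because List declares no clone(),
    // and the result needs an unchecked cast because ArrayList.clone() returns Object.
    @SuppressWarnings("unchecked")
    static List<String> viaClone(ArrayList<String> source) {
        return (ArrayList<String>) source.clone();
    }

    // Copy constructor: works for any List (indeed any Collection) and needs no cast.
    static List<String> viaCopyConstructor(List<String> source) {
        return new ArrayList<>(source);
    }

    public static void main(String[] args) {
        ArrayList<String> source = new ArrayList<>(Arrays.asList("a", "b"));
        System.out.println(viaClone(source));           // [a, b]
        System.out.println(viaCopyConstructor(source)); // [a, b]
    }
}
```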
From my experience, the problem of clone() arises on derived classes.
Say, ArrayList implements clone(), which returns an object of ArrayList.
Assume ArrayList has a derived class, namely MyArrayList. It will be a disaster if MyArrayList does not override the clone() method. (By default it inherits the code from ArrayList.)
The user of MyArrayList may expect clone() to return an object of MyArrayList; however, this is not true.
This is annoying: if a base class implements clone(), every derived class has to override clone() all the way down the hierarchy.
I will answer with a quote from the man himself (Josh Bloch on Design - Copy Constructor versus Cloning):
There are very few things for which I use Cloneable anymore. I often provide a public clone method on concrete classes because people expect it.
It can't be any more explicit than this: clone() on his Collection Framework classes is provided because people expect it. If people stopped expecting it, he would've gladly thrown it away. One way to get people to stop expecting it is to educate people to stop using it, and not to advocate its use.
Of course, Bloch himself also said (not exact quote but close) "API is like sex: make one mistake and you support it for life". Any public clone() can probably never be taken back. Nevertheless, that's not a good enough reason to use it.
In order not to encourage other, less experienced, developers to implement clone() themselves. I've worked with many developers whose coding styles are largely copied from (sometimes awful) code that they've worked with.
It does not enforce whether the implementer will do a deep or shallow copy.
Using clone can be very risky if you don't check the implementation of the clone method. A clone implementation can behave in different ways (shallow or deep), and some developers may not always check what kind of clone they will get.
In a big application with a big team it's also risky, because when cloning is used, modifying the clone implementation may modify the application's behaviour and create new bugs, and you have to check everywhere clone is called (but the same can be said of other object methods like equals, toString...).
When you change the clone of a small class A (from deep to shallow, for example), and an instance of B holds a reference to A and has a deep clone implementation, then since the objects referenced by A are no longer deep cloned, B's clone won't be a deep clone anymore (and the same goes for any class referencing a B instance...).
It's not easy to deal with the depth of your clone methods.
Also, when you have an interface (extending Cloneable) and many (many!) implementations, you may check some implementations and see that their clones are deep, then call clone on the interface; but you can't be sure at runtime that every implementation really has a deep clone, and this can introduce bugs...
I think it could be better to implement a shallowClone and a deepClone method for each class, and for very specific needs implement customised methods (for example, if you want a clone limited to a depth of 2, make a custom implementation for that in all the classes concerned).
I don't think it's a big problem to use a clone method on a JDK class, since it's not going to be changed by another developer, or at least not often. But you'd better not call clone on JDK classes if you don't know the concrete class at compile time. I mean, calling clone on an ArrayList is not a problem, but calling it on a Collection could be dangerous, since another Collection implementation could be introduced by another developer (who could even extend an existing Collection implementation), and nothing tells you that the clone of that implementation will work the way you expect...
If the types in your collection are mutable, you have to worry about whether just the collection, itself, will be cloned, or whether the elements will also be cloned... hence implementing your own function, where you know whether the elements or just the container will be cloned will make things much clearer.
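A short sketch of that distinction, using ArrayList.clone() with mutable StringBuilder elements. ShallowCloneDemo is a hypothetical name used only for illustration.

```java
import java.util.ArrayList;

class ShallowCloneDemo {
    @SuppressWarnings("unchecked")
    static String demo() {
        ArrayList<StringBuilder> original = new ArrayList<>();
        original.add(new StringBuilder("x"));

        ArrayList<StringBuilder> copy = (ArrayList<StringBuilder>) original.clone();

        copy.add(new StringBuilder("y")); // container was copied: original is unaffected
        copy.get(0).append("!");          // elements were not copied: original sees this

        return original.size() + ":" + original.get(0);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 1:x!
    }
}
```

The original list still has one element, but that element has been changed through the clone, which is exactly the ambiguity a purpose-built copy method would make explicit.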

Considering object encapsulation, should getters return an immutable property?

When a getter returns a property, such as a List of other related objects, should that list and its objects be immutable, to prevent code outside of the class from changing the state of those objects without the main parent object knowing?
For example if a Contact object, has a getDetails getter, which returns a List of ContactDetails objects, then any code calling that getter:
can remove ContactDetail objects from that list without the Contact object knowing of it.
can change each ContactDetail object without the Contact object knowing of it.
So what should we do here? Should we just trust the calling code and return easily mutable objects, or go the hard way and make an immutable class for each mutable class?
It's a matter of whether you should be "defensive" in your code. If you're the (sole) user of your class and you trust yourself then by all means no need for immutability. However, if this code needs to work no matter what, or you don't trust your user, then make everything that is externalized immutable.
That said, most properties I create are mutable. An occasional user botches this up, but then again it's his/her fault, since it is clearly documented that mutation should not occur via mutable objects received via getters.
It depends on the context. If the list is intended to be mutable, there is no point in cluttering up the API of the main class with methods to mutate it when List has a perfectly good API of its own.
However, if the main class can't cope with mutations, then you'll need to return an immutable list - and the entries in the list may also need to be immutable themselves.
Don't forget, though, that you can return a custom List implementation that knows how to respond safely to mutation requests, whether by firing events or by performing any required actions directly. In fact, this is a classic example of a good time to use an inner class.
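One way such a list could look is sketched below, built on java.util.AbstractList. TrackedContact, DetailsView, and the change counter are hypothetical; the counter merely stands in for firing an event or performing whatever action the owner needs.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class; details are plain strings to keep the sketch short.
class TrackedContact {
    private final List<String> details = new ArrayList<>();
    private int changeCount = 0; // stands in for "firing an event"

    int changeCount() {
        return changeCount;
    }

    // Returns a live view whose mutators notify the owning contact.
    List<String> getDetails() {
        return new DetailsView();
    }

    // Inner class: it can reach the private state of the enclosing contact.
    private class DetailsView extends AbstractList<String> {
        @Override
        public String get(int index) {
            return details.get(index);
        }

        @Override
        public int size() {
            return details.size();
        }

        @Override
        public void add(int index, String element) {
            details.add(index, element);
            changeCount++;
        }

        @Override
        public String set(int index, String element) {
            String old = details.set(index, element);
            changeCount++;
            return old;
        }

        @Override
        public String remove(int index) {
            String old = details.remove(index);
            changeCount++;
            return old;
        }
    }
}
```

AbstractList routes its other mutating methods, such as add(E) and remove(Object), through these three overrides, so the contact hears about every change made through the returned list.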
If you have control of the calling code then what matters most is that the choice you make is documented well in all the right places.
Joshua Bloch in his excellent "Effective Java" book says that you should ALWAYS make defensive copies when returning something like this. That may be a little extreme, especially if the ContactDetails objects are not Cloneable, but it's always the safe way. If in doubt, always favour code safety over performance, unless profiling has shown that the cloning is a real performance bottleneck.
There are actually several levels of protection you can add. First, you can simply return the member, which essentially gives any other class access to the internals of your class. Very unsafe, but in fairness widely done. It will also cause you trouble later if you want to change the internals so that the ContactDetails are stored in a Set. Second, you can return a newly created list with references to the same objects as the internal list. This is safer: another class can't remove from or add to the internal list, but it can modify the existing objects. Third, you can return a newly created list with copies of the ContactDetails objects. That's the safe way, but it can be expensive.
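The three levels can be sketched on the Contact example from the question. The method names and the ContactDetail copy constructor below are hypothetical choices made for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mutable detail class with a copy constructor.
class ContactDetail {
    private String value;

    ContactDetail(String value) { this.value = value; }
    ContactDetail(ContactDetail other) { this.value = other.value; }

    String getValue() { return value; }
    void setValue(String value) { this.value = value; }
}

class Contact {
    private final List<ContactDetail> details = new ArrayList<>();

    void addDetail(ContactDetail detail) { details.add(detail); }

    // Level 1: hands out the internal list itself; callers can do anything to it.
    List<ContactDetail> getDetailsDirect() {
        return details;
    }

    // Level 2: new list, same elements; the internal list is safe, the elements are not.
    List<ContactDetail> getDetailsShallowCopy() {
        return new ArrayList<>(details);
    }

    // Level 3: new list with copied elements; nothing the caller does reaches this object.
    List<ContactDetail> getDetailsDeepCopy() {
        List<ContactDetail> copy = new ArrayList<>(details.size());
        for (ContactDetail detail : details) {
            copy.add(new ContactDetail(detail));
        }
        return copy;
    }
}
```

The third variant allocates one new object per element on every call, which is where the expense comes from.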
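I would do this a better way. Don't return a list at all; instead return an iterator over the list. That way you don't have to create a new list (List has a method to get an iterator), and the external class can't add to the list. Be aware, though, that the iterators of the standard java.util lists support remove(), so obtain the iterator from an unmodifiable view (for example Collections.unmodifiableList(list).iterator()) if removal must also be prevented. The caller can still modify the items, unless you write your own iterator that clones the elements as needed. If you later switch to using another collection internally it can still return an iterator, so no external changes are needed.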
In the particular case of a Collection, List, Set, or Map in Java, it is easy to return an immutable view to the class using return Collections.unmodifiableList(list);
Of course, if it is possible that the backing-data will still be modified then you need to make a full copy of the list.
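The difference between a view and a full copy can be sketched as follows. UnmodifiableViewDemo is a hypothetical name; Collections.unmodifiableList is the standard JDK method.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class UnmodifiableViewDemo {
    static String demo() {
        List<String> backing = new ArrayList<>();
        backing.add("a");

        // A read-only view: rejects writes, but still reflects the backing list.
        List<String> view = Collections.unmodifiableList(backing);
        // A read-only snapshot: copy first, then wrap.
        List<String> snapshot = Collections.unmodifiableList(new ArrayList<>(backing));

        boolean rejected = false;
        try {
            view.add("x");
        } catch (UnsupportedOperationException e) {
            rejected = true;              // callers cannot modify through the view
        }

        backing.add("b");                 // the owner modifies its own list later

        return rejected + "," + view.size() + "," + snapshot.size();
    }

    public static void main(String[] args) {
        System.out.println(demo());       // true,2,1
    }
}
```

The view grows along with the backing list while the snapshot stays at one element, so which one to return depends on whether callers should see later changes.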
Depends on the context, really. But generally, yes, one should write as defensive code as possible (returning array copies, returning readonly wrappers around collections etc.). In any case, it should be clearly documented.
I used to return a read-only version of the list, or at least, a copy. But each object contained in the list must be editable, unless they are immutable by design.
I think you'll find that it's very rare for every gettable to be immutable.
What you could do is to fire events when a property is changed within such objects. Not a perfect solution either.
Documentation is probably the most pragmatic solution ;)
Your first imperative should be to follow the Law of Demeter or ‘Tell don't ask’; tell the object instance what to do e.g.
contact.print( printer ) ; // or
contact.show( new Dialog() ) ; // or
contactList.findByName( searchName ).print( printer ) ;
Object-oriented code tells objects to do things. Procedural code gets information then acts on that information. Asking an object to reveal the details of its internals breaks encapsulation; it is procedural code, not sound OO programming, and as Will has already said, it is a flawed design.
If you follow the Law of Demeter approach any change in the state of an object occurs through its defined interface, therefore side-effects are known and controlled. Your problem goes away.
When I was starting out I was still heavily under the influence of HIDE YOUR DATA OO PRINCIPLES LOL. I would sit and ponder what would happen if somebody changed the state of one of the objects exposed by a property. Should I make them read only for external callers? Should I not expose them at all?
Collections brought out these anxieties to the extreme. I mean, somebody could remove all the objects in the collection while I'm not looking!
I eventually realized that if your objects hold such tight dependencies on their externally visible properties and their types that, if somebody touches them in a bad place, you go boom, then your architecture is flawed.
There are valid reasons to make your external properties readonly and their types immutable. But that is the corner case, not the typical one, imho.
First of all, setters and getters are an indication of bad OO. Generally the idea of OO is you ask the object to do something for you. Setting and getting is the opposite. Sun should have figured out some other way to implement Java beans so that people wouldn't pick up this pattern and think it's "Correct".
Secondly, each object you have should be a world in itself. Generally, if you are going to use setters and getters, they should return fairly safe, independent objects. Those objects may or may not be immutable, because they are just first-class objects. The other possibility is that they return primitive types, which are always immutable. So asking "Should setters and getters return something immutable" doesn't make too much sense.
As for making immutable objects themselves, you should virtually always make the members inside your object final unless you have a strong reason not to (Final should have been the default, "mutable" should be a keyword that overrides that default). This implies that wherever possible, objects will be immutable.
As for predefined quasi-object things you might pass around, I recommend you wrap stuff like collections and groups of values that go together into their own classes with their own methods. I virtually never pass around an unprotected collection simply because you aren't giving any guidance/help on how it's used where the use of a well-designed object should be obvious. Safety is also a factor since allowing someone access to a collection inside your class makes it virtually impossible to ensure that the class will always be valid.
