Documentation of immutability of Java wrapper classes

It is well known that the primitive wrapper classes such as Integer, Double, and Boolean are immutable. However, I was unable to find this documented in the official API documentation, e.g., https://docs.oracle.com/javase/8/docs/api/java/lang/Boolean.html. I also looked in the source files and did not find it stated in the comments. (The comments in the source code for String, on the other hand, do mention its immutability.)
Is this because:
- it's documented elsewhere (if so, where?),
- this fact is too "well-known", or
- the developer is expected to read the implementation of the wrappers and figure out whether the wrapper is immutable or not?

It is worth considering that "immutable" can mean two things:
a) if you pass the value somewhere, it can't be mutated;
b) "a", and in addition the value can be used safely in a multithreaded environment.
Ad a) Some classes are immutable but not thread-safe: they have no mutators and all their fields are private, but not all fields are final or volatile. They are fine to use with setters/getters and as keys in a HashMap.
Ad b) Other classes are immutable and thread-safe: they have no mutators, and all their fields are private and either final or volatile.
Classes that are thread-safe are often described as such in the documentation, or even by name. Of course, some classes can be immutable and/or thread-safe without being strictly documented as such. For example, the String class is documented to be "constant", but there is no information about thread safety. There is only one enigmatic statement, "Because String objects are immutable they can be shared", but I think it means something different from "shared with other threads". We just know the properties of popular classes, but I agree that these properties should be clearly documented. Unfortunately, in real life they aren't. So the only way to know whether a class is immutable is to check the documentation and, if there is not enough information, to check the implementation and ask the author whether they plan to make the class mutable in the future. This topic is covered in the book Java Concurrency in Practice, which suggests using two annotations, @ThreadSafe and @Immutable, to denote these properties, but unfortunately this isn't common practice yet.
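The annotation idea mentioned above can be sketched without any library. In this sketch, Immutable and ThreadSafe are hypothetical marker annotations declared locally for illustration (they are not part of the JDK), and Point is an invented example class.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical marker annotation, declared locally; not part of the JDK.
@Documented
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface Immutable {}

// Hypothetical marker annotation, declared locally; not part of the JDK.
@Documented
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface ThreadSafe {}

// No mutators, all fields private and final: immutable and safe to share.
@Immutable
@ThreadSafe
final class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    int x() { return x; }
    int y() { return y; }
}
```

The annotations enforce nothing by themselves. They only record the author's intent where readers and tools can find it.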

The boxed wrappers are "immutable" because they are meant to be virtually interchangeable, syntactically, with the primitive types they wrap. For example, a boolean value has no operation that changes it in place:
boolean x = false;
x.flip(); // does not compile: primitives have no methods
Primitive values in most programming languages are immutable. Therefore, by the wrapper contract,
Boolean x = false;
x.mutate(/* ??? */); // does not compile: Boolean declares no such method
is not defined either.
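A small sketch of what this means in practice: an operation that looks like mutation of a wrapper only rebinds the variable to a different object. The class and method names below are invented for illustration.

```java
class WrapperRebindDemo {
    // Returns the value seen through an alias after "incrementing" the original variable.
    static int aliasAfterIncrement() {
        Integer a = 1000;
        Integer alias = a; // both variables refer to the same Integer object
        a++;               // unboxes, adds 1, boxes the result, and rebinds a
        return alias;      // the original object was never changed
    }

    public static void main(String[] args) {
        System.out.println(aliasAfterIncrement()); // prints 1000
    }
}
```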

Related

Are AtomicInteger synchronization primitives?

Are AtomicIntegers considered synchronization primitives, or does the term cover only the methods provided by Java (wait(), notify(), etc.)?
I am confused about the definition of primitives, as AtomicIntegers can operate on an int and provide lock-free, thread-safe programming without the use of synchronized.

AtomicInteger is a class. Its methods are... well, methods. Neither of those would be considered a synchronization primitive.
The compareAndSet method, which is also used by incrementAndGet and other such methods, uses Unsafe.compareAndSwapInt (on OpenJDK 7, which is what I have handy). That's a native method — so it could well be considered a primitive. And in fact, on modern CPUs, it translates to a CAS instruction, so it's a primitive all the way down to the hardware level.
The class also relies on volatile's memory visibility, which is also a synchronization primitive.
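A minimal sketch of how compareAndSet is typically used, assuming an invented CasCounter class with a capped increment. The loop re-reads the value and retries whenever another thread got in first.

```java
import java.util.concurrent.atomic.AtomicInteger;

class CasCounter {
    private final AtomicInteger value = new AtomicInteger();

    // Increments the counter unless it has already reached max.
    // Returns true if this call performed the increment.
    boolean incrementUpTo(int max) {
        while (true) {
            int current = value.get();
            if (current >= max) {
                return false;
            }
            // Succeeds only if no other thread changed the value since get().
            if (value.compareAndSet(current, current + 1)) {
                return true;
            }
            // Lost the race: loop and retry with the fresh value.
        }
    }

    int get() {
        return value.get();
    }
}
```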

I think this question is a bit vague, but "language primitive" typically refers to language elements that are part of the core of the language.
In other words: the keywords and their associated semantics. In that sense, I would see the synchronized (in its two meanings) and volatile keywords as the only "primitives" regarding multithreading.
Of course, classes such as Object, and therefore its methods like wait() and notify(), are also an essential part of Java (one which you can't avoid in the first place). The same can be said about the Thread class.
Long story short: you can differentiate between concepts that exist as language keywords (and are thus handled by the compiler) and "on-top" concepts that come as "normal" classes. And as the answer from yshavit nicely describes, certain aspects of AtomicInteger can be mapped directly onto the "native" side of things. So the real answer is maybe that the term "primitive" doesn't provide much aid in describing or differentiating concepts in Java multithreading.

Regarding your first query:
Are AtomicIntegers considered synchronization primitives, or does the term cover only the methods provided by Java (wait(), notify(), etc.)?
No. AtomicInteger is neither a method nor a synchronization primitive.
AtomicInteger is a class with methods. Have a look at the Oracle documentation page for the java.util.concurrent.atomic package:
A small toolkit of classes that support lock-free thread-safe programming on single variables. In essence, the classes in this package extend the notion of volatile values, fields, and array elements to those that also provide an atomic conditional update operation of the form:
boolean compareAndSet(expectedValue, updateValue);
The classes in this package also contain methods to get and unconditionally set values, as well as a weaker conditional atomic update operation weakCompareAndSet
Regarding your second query:
I am confused about the definition of primitives, as AtomicIntegers can operate on an int and provide lock-free, thread-safe programming without the use of synchronized.
One key note:
The scope of synchronized is broader than that of AtomicInteger or the other AtomicXXX variables. With synchronized methods or blocks, you can protect a critical section of code that contains many statements.
The compareAndSet method is not a general replacement for locking. It applies only when critical updates for an object are confined to a single variable.
Atomic classes are not general purpose replacements for java.lang.Integer and related classes. However, AtomicInteger extends Number to allow uniform access by tools and utilities that deal with numerically-based classes.
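To illustrate the point about scope, here is a sketch of an invariant that spans two variables (lower <= upper). Two independent AtomicIntegers could not keep it, because the check and the update involve both fields. The Range class is invented for illustration.

```java
class Range {
    private int lower = 0;
    private int upper = 10;

    // Check and update run under one lock, so lower <= upper always holds.
    synchronized boolean setLower(int newLower) {
        if (newLower > upper) {
            return false;
        }
        lower = newLower;
        return true;
    }

    synchronized boolean setUpper(int newUpper) {
        if (newUpper < lower) {
            return false;
        }
        upper = newUpper;
        return true;
    }

    // Reads both fields under the same lock to get a consistent pair.
    synchronized int width() {
        return upper - lower;
    }
}
```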

Why not lock on a value-based class

The docs say that you shouldn't lock on an instance of a value-based Java class such as Optional because code
may produce unpredictable results if it attempts to distinguish two references to equal values of a value-based class ... indirectly via an appeal to synchronization...
The question "Why should Java's value-based classes not be serialized?" asserts:
Because future JVM implementations might not use object headers and reference pointers for value-based classes, some of the limitations are clear. (E.g. not locking on an identity which the JVM must not uphold. A reference on which is locked could be removed and replaced by another later, which makes releasing the lock pointless and will cause deadlocks).
I.e., the prohibition is future-proofing. But there's no reference for that assertion.
If future-proofing is the basis, I'd like a reference for it. If not, I'd like to understand what the basis is, since value-based objects are Objects.
EDIT
BTW, I understand the reasons not to lock on Integers and other primitive-wrapper classes; they may be cached. But I can find no documentation saying the same is true of value-based classes, and while Integer etc. are based on values, they are not value-based classes. I.e., the JavaDocs of Optional etc. explicitly say
This is a value-based class
The same is not true for Integer etc.

Here's what a Blog post by Nicolai Parlog says about value-based classes:
In Java 8 value types are preceded by value-based classes. Their precise relation in the future is unclear but it could be similar to that of boxed and unboxed primitives (e.g. Integer and int). Additionally, the compiler will likely be free to silently switch between the two to improve performance. Exactly that switching back and forth, i.e. removing and later recreating a reference, also forbids identity-based mechanisms to be applied to value-based classes.
So what Nicolai is saying is this:
In the future, compilers may do things that transparently translate between values and value-based classes in ways that do not preserve object identity.
Certain things ("identity-based mechanisms") depend on object identity. Examples include the semantics of == for references, identity hashcode, primitive locking, and object serialization.
For those things, there is the potential that the transparent translation won't be transparent.
In the case of primitive locking, the concern is that something like the following sequence may occur:
- An instance of a value-based class is created.
- The instance is converted to a value behind the scenes.
- The value is then converted back, giving a different object.
If two threads then use "the instance" as a primitive lock, they could be unaware that there are in fact two objects (now). If they then attempted to synchronize, they would (or could) be locking different objects. That would mean there was no mutual exclusion on whatever state the locking was intended to protect.
If you don't lock on a value-based class, you won't have to worry about that potential hazard ... in the future.
But note, that Nicolai's blog posting is one person's speculation on what might happen in Java 10 or later.
BTW, I understand the reasons not to lock on Integers and other primitive-wrapper classes; they may be cached.
Caching is not the problem per se, but a mechanism that gives rise to the problem. The real problem is that it is difficult to reason about the object identity of the lock object, and hence whether the locking regime is sound.
With the primitive wrappers, it is the semantics of boxing and unboxing that give rise to uncertainty about object identity. Going forward, the mooted value type <-> object conversion would be another source of this uncertainty.
The above blog post is based on "State of the Values" (April 2014) by John Rose, Brian Goetz, and Guy Steele, which talks about adding value types to a future version of Java. This note is a position statement rather than a fully spec'd (and adopted) proposal. However, the note does give us this hint:
"Many of the above restrictions correspond to the restrictions on so-called value-based classes. In fact, it seems likely that the boxed form of every value type will be a value-based class."
which could be read as implying that there will be a relationship between value types and existing value-based classes. (Especially if you read between the lines of the Java 8 description of value-based classes.)
UPDATE - 2019/05/18
Value types didn't make it into Java 12, and they are not (yet) on the list for Java 13.
However, it is already possible to demonstrate a problem that is related to the problem that the blog post talks about:
public class BrokenSync {
    private final Integer lock = 1;

    public void someMethod() {
        synchronized (lock) {
            // do something
        }
    }
}
The problem is that each instance of BrokenSync will create an Integer instance by auto-boxing 1. But the JLS says that Integer objects produced by auto-boxing are not necessarily distinct objects. So, you can end up with all instances of BrokenSync using the same Integer object as a lock.
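The sharing can be observed directly. In this sketch (all names invented for illustration), the boxed lock is the same object across instances, because the JLS requires boxing of small int constants such as 1 to yield identical references, while a plain new Object() gives each instance its own lock.

```java
class SharedLockDemo {
    static class BrokenSync {
        final Integer lock = 1; // auto-boxed; small values come from a shared cache
    }

    static class FixedSync {
        final Object lock = new Object(); // a fresh object per instance
    }

    // True when two separate BrokenSync instances hold the very same lock object.
    static boolean brokenLocksShared() {
        return new BrokenSync().lock == new BrokenSync().lock;
    }

    // True only if two FixedSync instances shared a lock, which they do not.
    static boolean fixedLocksShared() {
        return new FixedSync().lock == new FixedSync().lock;
    }

    public static void main(String[] args) {
        System.out.println(brokenLocksShared()); // prints true
        System.out.println(fixedLocksShared());  // prints false
    }
}
```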

A lock is associated with an object. If an object is shared, its lock can be shared. Immutable value classes can be shared. In theory, all references to a value object with a particular semantic value could refer to one shared object. It is common for the creation code for value objects to reuse them, for example by caching previously created values. So, in general, when you have a reference to a value object, your code should work correctly even if that object is also used elsewhere. So don't use it as a lock.

All the needed information is right on the page you cite titled "Value-based Classes", although it's not written as clearly as it might be, and it doesn't use some magic phrases that would have clarified these issues.
The magic phrase I would use to describe this situation is that value-based classes can have implementation-defined behaviors. What the page says is that these classes "make no use of" reference equality ==. It doesn't say that the implementation may not define such an operator, but it does say, in essence, "we are remaining silent on this point". One of the phrases used is "make no commitment", which is a bit clearer.
For example, one kind of implementation might, say out of convenience, make a value-based class V behave like most other objects and not bother suppressing the members that V makes no use of. Another might implement value-based classes differently, using the same internal mechanism as it uses for primitive-wrapper classes. (If you're building the VM, you don't have to implement every class in Java.) As long as the VM satisfies all the requirements (or, if you like, contracts) specified on this page, it has met its obligations to the user.
Now suppose you write code that locks such an object. The locking wasn't written to cope with this situation, so it will do something, but it need not be consistent from VM to VM.
To be specific as to your question, future-proofing is just a special case of implementation-defined behavior. How it's written today may not be how it's written tomorrow, even when both are legal.
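One way to stay clear of implementation-defined behavior, sketched under the assumption of an invented ValueHolder class: compare value-based instances with equals() rather than ==, and lock on a private object whose identity your own code controls.

```java
import java.util.Optional;

class ValueHolder {
    // A plain Object has a well-defined identity, so it is safe to lock on.
    private final Object lock = new Object();
    private Optional<String> current = Optional.empty();

    void set(String s) {
        synchronized (lock) {
            current = Optional.ofNullable(s);
        }
    }

    // Compares by value with equals(); never relies on the identity of the Optional.
    boolean holds(String s) {
        synchronized (lock) {
            return current.equals(Optional.ofNullable(s));
        }
    }
}
```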

Final fields and Immutable Classes

According to this: A Strategy for Defining Immutable Objects
One of the conditions for a class to be immutable, is making all its fields final and private.
Why final? Aren't the other conditions sufficient?

We can make a class immutable without making its fields final, provided the other conditions are met.
But I think final is useful when dealing with concurrency and synchronization.

Per the definition for an immutable object (courtesy of Wikipedia) "In object-oriented and functional programming, an immutable object is an object whose state cannot be modified after it is created."
Once a final field has been assigned, it cannot be re-assigned. Without the final keyword, you could still change the field after the object has been created.
See also
final object in java

Counter question "Why not final?".
final means that, for primitive types, you cannot change the value once assigned, which is enough to make them immutable,
while for non-primitive types the reference can't be changed once assigned (the first step towards immutability), and you need to do some more, as mentioned in the link you shared.

The key to the linked document is this quote
Not all classes documented as "immutable" follow these rules....However, such strategies require sophisticated analysis and are not for beginners.
This is a tutorial for beginners. It's easier to tell them "make everything private and final" than to explain all the edge cases of handling mutable references properly and making sure your references don't escape.
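One of those edge cases, sketched with an invented Scores class: a final field that refers to a mutable array is not enough by itself. The class also has to copy the array on the way in and on the way out so that no reference escapes.

```java
import java.util.Arrays;

final class Scores {
    private final int[] values; // final reference, but arrays are mutable

    Scores(int[] values) {
        this.values = values.clone(); // defensive copy: the caller keeps no alias
    }

    int[] values() {
        return values.clone(); // defensive copy: the internal array never escapes
    }

    int sum() {
        return Arrays.stream(values).sum();
    }
}
```

Without the two clone() calls, a caller could change the array after construction and the object would visibly change state despite the final field.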

Is it correct to call java.lang.String immutable?

This Java tutorial
says that an immutable object cannot change its state after creation.
java.lang.String has a field
/** Cache the hash code for the string */
private int hash; // Default to 0
which is initialized on the first call of the hashCode() method, so it changes after creation:
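String s = new String(new char[] {' '});
Field hash = s.getClass().getDeclaredField("hash"); // java.lang.reflect.Field
hash.setAccessible(true); // on JDK 16+ this needs --add-opens java.base/java.lang=ALL-UNNAMED
System.out.println(hash.get(s)); // before hashCode() is called
s.hashCode();
System.out.println(hash.get(s)); // after hashCode() is called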
output
0
32
Is it correct to call String immutable?

A better definition would be not that the object does not change, but that it cannot be observed to have changed. Its behavior will never change: .substring(x,y) will always return the same thing for that string, and the same goes for equals and all the other methods.
That variable is calculated the first time you call hashCode() and is cached for further calls. This is basically what is called "memoization" in functional programming languages.
Reflection isn't really a tool for "programming" but rather for meta-programming (i.e., programming programs for generating programs), so it doesn't really count. It's the equivalent of changing a constant's value using a memory debugger.
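The same memoization pattern can be sketched in a user-defined class. FullName is an invented example, and the "0 means not computed yet" convention mirrors the String field quoted in the question. The cached field is not final, yet no caller can observe a change, because the hash is derived only from final fields and every computation yields the same result.

```java
final class FullName {
    private final String first;
    private final String last;
    private int hash; // 0 means "not computed yet"

    FullName(String first, String last) {
        this.first = first;
        this.last = last;
    }

    @Override
    public int hashCode() {
        int h = hash; // read the field once into a local
        if (h == 0) {
            h = 31 * first.hashCode() + last.hashCode();
            hash = h; // cache; a racing thread would store the same value
        }
        return h;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof FullName)) {
            return false;
        }
        FullName other = (FullName) o;
        return first.equals(other.first) && last.equals(other.last);
    }
}
```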

The term "Immutable" is vague enough to not allow for a precise definition.
I suggest reading Kinds of Immutability from Eric Lippert's blog. Although it's technically a C# article, it's quite relevant to the question posed. In particular:
Observational immutability:
Suppose you’ve got an object which has the property that every time
you call a method on it, look at a field, etc, you get the same
result. From the point of view of the caller such an object would be
immutable. However you could imagine that behind the scenes the object
was doing lazy initialization, memoizing results of function calls in
a hash table, etc. The “guts” of the object might be entirely mutable.
What does it matter? Truly deeply immutable objects never change their
internal state at all, and are therefore inherently threadsafe. An
object which is mutable behind the scenes might still need to have
complicated threading code in order to protect its internal mutable
state from corruption should the object be called on two threads “at
the same time”.

Once created, all the methods on a String instance (called with the same parameters) will always provide the same result. You cannot change its behaviour (with any public method), so it will always represent the same entity. Also, it is final and cannot be subclassed, so it is guaranteed that all instances will behave like this.
Therefore from public view the object is considered immutable. The internal state does not really matter in this case.

Yes, it is correct to call them immutable.
While it is true that you can reach in and modify private ... and final ... variables of a class, it is an unnecessary and incredibly unwise thing to do to a String object. It is generally assumed that nobody is going to be crazy enough to do it.
From a security standpoint, the reflection calls needed to modify the state of a String all perform security checks. Unless you've mis-implemented your sandbox, the calls will be blocked for non-trusted code. So you should not have to worry about this as a way for untrusted code to break sandbox security.
It is also worth noting that the JLS states that using reflection to change final fields may break things (e.g. in multi-threading) or may not have any effect.

From the viewpoint of a developer who is using reflection, it is not correct to call String immutable. There are actual Java developers using reflection to write real software every day. Dismissing reflection as a "hack" is preposterous. However, from the viewpoint of a developer who is not using reflection, it is correct to call String immutable. Whether or not it is valid to assume that String is immutable depends on context.
Immutability is an abstract concept and therefore cannot apply in an absolute sense to anything with a physical form (see the ship of Theseus). Programming language constructs like objects, variables, and methods exist physically as bits in a storage medium. Data degradation is a physical process which happens to all storage media, so no data can ever be said to be truly immutable. In addition, it is almost always possible in practice to subvert the programming language features intended to prevent the mutation of a particular datum. In contrast, the number 3 is 3, has always been 3, and will always be 3.
As applied to program data, immutability should be considered a useful assumption rather than a fundamental property. For example, if one assumes that a String is immutable, one may cache its hash code for reuse and avoid the cost of ever recomputing its hash code again later. Virtually all non-trivial software relies on assumptions that certain data will not mutate for certain durations of time. Software developers generally assume that the code segment of a program will not change while it is executing, unless they are writing self-modifying code. Understanding what assumptions are valid in a particular context is an important aspect of software development.

It cannot be modified from outside, and it is a final class, so it cannot be subclassed and made mutable. These are two requirements for immutability. Reflection is considered a hack; it's not a normal way of developing.

A class can be immutable while still having mutable fields, as long as it doesn't provide access to its mutable fields.
It's immutable by design. If you use Reflection (getting the declared Field and resetting its accessibility), you are circumventing its design.

Reflection will allow you to change the contents of any private field. Is it therefore correct to call any object in Java immutable?
Immutability refers to changes that are either initiated by or perceivable by the application.
In the case of string, the fact that a particular implementation chooses to lazily calculate the hashcode is not perceptible to the application. I would go a step further, and say that an internal variable that is incremented by the object -- but never exposed and never used in any other way -- would also be acceptable in an "immutable" object.

Yes, it is correct. When you modify a String as you do in your example, a new String is created, while the old one keeps its value.
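A short sketch of that behavior with the public API (the class and method names are invented): methods that appear to change a String return a new object and leave the original untouched.

```java
class StringCopyDemo {
    // Applies two "modifying" operations and reports the original alongside the results.
    static String describe() {
        String s = "abc";
        String upper = s.toUpperCase(); // returns a new String
        String joined = s.concat("def"); // returns a new String
        return s + " " + upper + " " + joined;
    }

    public static void main(String[] args) {
        System.out.println(describe()); // prints abc ABC abcdef
    }
}
```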

Why are Wrapper Classes, String,... final ?

Many classes in the Core Java API are final (Wrapper classes, String, Math).
Why is it so ?

They are final for security reasons. There may be other reasons, but security is the most important.
Imagine being able to inherit from java.lang.String and supply your own, mutable implementation to a security-sensitive API. The API would have no choice but to take your string (remember the substitution principle), but you would then be able to change the string from under them (on a concurrent thread or after the API has returned), even after they have checked it to be valid.
Same goes for wrappers of primitives: you do not want to see them mutable under any circumstance, because it would violate important assumptions about their behavior encoded in the APIs using these classes.
Making String final addresses this issue by not letting others supply their own, potentially hostile, implementations of classes as fundamental as String.
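The attack can be sketched with a stand-in type, since String itself cannot be subclassed. Everything here is hypothetical: FileName plays the role of a non-final string-like class, HostileFileName is the attacker's subclass, and Checker is the security-sensitive API that validates and then uses the value.

```java
// Hypothetical non-final "string-like" class.
class FileName {
    private final String value;

    FileName(String value) {
        this.value = value;
    }

    String get() {
        return value;
    }
}

// Attacker-supplied subclass: answers differently after the first call.
class HostileFileName extends FileName {
    private int calls = 0;

    HostileFileName() {
        super("/tmp/safe");
    }

    @Override
    String get() {
        return calls++ == 0 ? "/tmp/safe" : "/etc/passwd";
    }
}

class Checker {
    // Validates the name, then reads it again to use it: two separate reads.
    static String open(FileName name) {
        if (!name.get().startsWith("/tmp/")) {
            throw new SecurityException("rejected");
        }
        return name.get();
    }
}
```

Declaring FileName final would make HostileFileName fail to compile, which is the protection String gets from being final.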

You might want to prevent other programmers from creating subclasses or from overriding certain methods. For these situations, you use the final keyword.
The String class is meant to be immutable: String objects can't be modified by any of their methods. Since Java does not enforce this, the class designers did. Nobody can create subclasses of String.
Hope this answers your question.
