How can I be sure that my class is immutable - java

I am required to create a class very similar to String however instead of storing an array of characters, the object must store an array of bytes because I will be dealing with binary data, not strings.
I am using HashMaps within my application. I am therefore keen to make my custom byteArray class immutable since immutable objects perform faster searches in hashmaps. (I would like a source for this fact please)
I'm pretty sure my class is immutable, but it's still performing poorly vs. String in HashMap searches. How can I be sure it is immutable?

The most important thing is to copy the bytes into your array. If you have
this.bytes = passedInArray;
The caller can modify passedInArray and hence modify this.bytes. You must do
this.bytes = Arrays.copyOf(passedInArray, passedInArray.length);
(Or similar; clone() is OK too.) If this class will mainly be used as a key in Maps, I'd calculate the hash code immediately (in the constructor), which is simpler than doing it lazily.
Implement the obvious equals() and I think you are done.
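Putting those steps together, a minimal sketch could look like the following. The class name ByteString and the accessor toByteArray() are made up for illustration; the question never names its class.

```java
import java.util.Arrays;

// Hypothetical name; the question only calls it a "custom byteArray class".
final class ByteString {
    private final byte[] bytes; // never exposed directly
    private final int hash;     // computed once, in the constructor

    ByteString(byte[] source) {
        // Defensive copy: later writes to 'source' cannot reach this object.
        this.bytes = Arrays.copyOf(source, source.length);
        this.hash = Arrays.hashCode(this.bytes);
    }

    // Hand out a copy so callers cannot modify the internal array.
    byte[] toByteArray() {
        return bytes.clone();
    }

    @Override
    public int hashCode() {
        return hash;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof ByteString)) return false;
        ByteString that = (ByteString) other;
        // Cheap hash comparison first, then the content comparison.
        return hash == that.hash && Arrays.equals(bytes, that.bytes);
    }
}
```

Copying on the way out (toByteArray) matters as much as copying on the way in; returning the internal array would let callers change it.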

Your question is "How can I be sure that my class is immutable?" I'm not sure that's what you mean to ask, but the way to make your class immutable is listed by Josh Bloch in Effective Java, 2nd Ed., Item 15, which I'll summarize in this answer:
Don't provide any mutator methods (methods that change the object's state, usually called "setters").
Ensure the class can't be extended. Generally, make the class final. This keeps others from subclassing it and modifying protected fields.
Make all fields final, so you can't change them.
Make all fields private, so others can't change them.
"Ensure exclusive access to mutable components." That is, if something else points to the data and therefore can alter it, make a defensive copy (as @user949300 pointed out).
Note that immutable objects don't automatically yield a big performance boost. The boost from immutable objects comes from not having to lock or copy the object, and from reusing it instead of creating a new one. I believe the searches in HashMap use the class's hashCode() method, and the lookup should be O(1), that is, constant-time and fast. If you are having performance issues, you may need to check whether there's slowness in your hashCode() method (unlikely), or issues elsewhere.
One possibility is that you have implemented hashCode() poorly (or not at all), causing a large number of collisions in your HashMap; that is, calling the method on different instances of your class returns mostly similar or identical values. Colliding instances are then stored in a linked list at the location specified by hashCode(). Traversing this list turns lookups from constant-time into linear-time, making performance much worse.

since immutable objects perform faster searches in hashmaps. (I would like a source for this fact please)
No, this isn't true. Performance as a hashmap key will be determined by the runtime, and collision avoidance, of hashCode.
I'm pretty sure my class is immutable, but it's still performing poorly vs. String in HashMap searches. How can I be sure it is immutable?
Your problem is more likely to be a poor choice of hashCode implementation. Consider basing your implementation around Arrays.hashCode.
(Your question ArrayList<Byte> vs String in Java suggests you're trying to tune a specific implementation; the advice there to use byte[] is good.)
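As a rough illustration of why Arrays.hashCode is the natural basis here: a raw byte[] inherits identity-based equals() and hashCode() from Object, so two arrays with the same contents are different keys. The helper class below is hypothetical and only wraps the two java.util.Arrays calls a wrapper class would delegate to.

```java
import java.util.Arrays;

// Hypothetical helper, only for illustration.
final class ArrayHashDemo {
    // Content-based hash: equal contents always give equal hash codes.
    static int contentHash(byte[] data) {
        return Arrays.hashCode(data);
    }

    // Content-based equality to pair with contentHash.
    static boolean contentEquals(byte[] a, byte[] b) {
        return Arrays.equals(a, b);
    }
}
```

By contrast, new byte[] {1, 2, 3}.equals(new byte[] {1, 2, 3}) is false, which is why a bare array makes a poor map key.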

Related

Advantages and/or disadvantages of mutable and immutable classes [duplicate]

I'm trying to get my head around mutable vs immutable objects. Using mutable objects gets a lot of bad press (e.g. returning an array of strings from a method) but I'm having trouble understanding what the negative impacts are of this. What are the best practices around using mutable objects? Should you avoid them whenever possible?
Well, there are a few aspects to this.
Mutable objects without reference-identity can cause bugs at odd times. For example, consider a Person bean with a value-based equals method:
Map<Person, String> map = ...
Person p = new Person();
map.put(p, "Hey, there!");
p.setName("Daniel");
map.get(p); // => null
The Person instance gets "lost" in the map when used as a key because its hashCode and equality were based upon mutable values. Those values changed outside the map and all of the hashing became obsolete. Theorists like to harp on this point, but in practice I haven't found it to be too much of an issue.
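For anyone who wants to run the snippet above, here is a self-contained version. The Person class is a guess at what such a bean would look like (a single name field with value-based equals and hashCode); the answer does not show it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical bean with value-based equals/hashCode, as the answer describes.
final class Person {
    private String name = "";

    void setName(String name) { this.name = name; }

    @Override
    public boolean equals(Object o) {
        return o instanceof Person && Objects.equals(name, ((Person) o).name);
    }

    @Override
    public int hashCode() { return Objects.hashCode(name); }
}

final class LostKeyDemo {
    // Returns what the map reports for the key after the key was mutated.
    static String lookupAfterMutation() {
        Map<Person, String> map = new HashMap<>();
        Person p = new Person();
        map.put(p, "Hey, there!"); // stored under the hash of ""
        p.setName("Daniel");       // the key's hash changes; the entry keeps its old hash
        return map.get(p);         // searched under the new hash, so the entry is not found
    }
}
```

The entry is still in the map (size() stays 1); it just can no longer be reached through its own key.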
Another aspect is the logical "reasonability" of your code. This is a hard term to define, encompassing everything from readability to flow. Generically, you should be able to look at a piece of code and easily understand what it does. But more important than that, you should be able to convince yourself that it does what it does correctly. When objects can change independently across different code "domains", it sometimes becomes difficult to keep track of what is where and why ("spooky action at a distance"). This is a more difficult concept to exemplify, but it's something that is often faced in larger, more complex architectures.
Finally, mutable objects are killer in concurrent situations. Whenever you access a mutable object from separate threads, you have to deal with locking. This reduces throughput and makes your code dramatically more difficult to maintain. A sufficiently complicated system blows this problem so far out of proportion that it becomes nearly impossible to maintain (even for concurrency experts).
Immutable objects (and more particularly, immutable collections) avoid all of these problems. Once you get your mind around how they work, your code will develop into something which is easier to read, easier to maintain and less likely to fail in odd and unpredictable ways. Immutable objects are even easier to test, due not only to their easy mockability, but also the code patterns they tend to enforce. In short, they're good practice all around!
With that said, I'm hardly a zealot in this matter. Some problems just don't model nicely when everything is immutable. But I do think that you should try to push as much of your code in that direction as possible, assuming of course that you're using a language which makes this a tenable opinion (C/C++ makes this very difficult, as does Java). In short: the advantages depend somewhat on your problem, but I would tend to prefer immutability.
Immutable Objects vs. Immutable Collections
One of the finer points in the debate over mutable vs. immutable objects is the possibility of extending the concept of immutability to collections. An immutable object is an object that often represents a single logical structure of data (for example an immutable string). When you have a reference to an immutable object, the contents of the object will not change.
An immutable collection is a collection that never changes.
When I perform an operation on a mutable collection, then I change the collection in place, and all entities that have references to the collection will see the change.
When I perform an operation on an immutable collection, a reference is returned to a new collection reflecting the change. All entities that have references to previous versions of the collection will not see the change.
Clever implementations do not necessarily need to copy (clone) the entire collection in order to provide that immutability. The simplest example is the stack implemented as a singly linked list and the push/pop operations. You can reuse all of the nodes from the previous collection in the new collection, adding only a single node for the push, and cloning no nodes for the pop. The push_tail operation on a singly linked list, on the other hand, is not so simple or efficient.
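A small sketch of that stack in Java, to make the node sharing concrete. ImmutableStack and its method names are invented for this example and do not come from any particular library.

```java
import java.util.NoSuchElementException;

// Hypothetical persistent stack; names are illustrative only.
final class ImmutableStack<T> {
    private static final ImmutableStack<Object> EMPTY = new ImmutableStack<>(null, null, 0);

    private final T head;
    private final ImmutableStack<T> tail; // shared with older versions, never copied
    private final int size;

    private ImmutableStack(T head, ImmutableStack<T> tail, int size) {
        this.head = head;
        this.tail = tail;
        this.size = size;
    }

    @SuppressWarnings("unchecked")
    static <T> ImmutableStack<T> empty() {
        return (ImmutableStack<T>) EMPTY; // safe: the empty stack holds no elements
    }

    // Allocates exactly one node; the receiver becomes the new stack's tail.
    ImmutableStack<T> push(T value) {
        return new ImmutableStack<>(value, this, size + 1);
    }

    // Allocates nothing; returns the already-existing tail.
    ImmutableStack<T> pop() {
        if (size == 0) throw new NoSuchElementException("empty stack");
        return tail;
    }

    T peek() {
        if (size == 0) throw new NoSuchElementException("empty stack");
        return head;
    }

    int size() { return size; }
}
```

After s2 = s1.push("b"), the old reference s1 still sees one element, and s2.pop() hands back the very same s1 object rather than a copy.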
Immutable vs. Mutable variables/references
Some functional languages take the concept of immutability to object references themselves, allowing only a single reference assignment.
In Erlang this is true for all "variables". I can only assign objects to a reference once. If I were to operate on a collection, I would not be able to reassign the new collection to the old reference (variable name).
Scala also builds this into the language with all references being declared with var or val, vals only being single assignment and promoting a functional style, but vars allowing a more C-like or Java-like program structure.
The var/val declaration is required, while many traditional languages use optional modifiers such as final in Java and const in C.
Ease of Development vs. Performance
Almost always the reason to use an immutable object is to promote side effect free programming and simple reasoning about the code (especially in a highly concurrent/parallel environment). You don't have to worry about the underlying data being changed by another entity if the object is immutable.
The main drawback is performance. Here is a write-up on a simple test I did in Java comparing some immutable vs. mutable objects in a toy problem.
The performance issues are moot in many applications, but not all, which is why many large numerical packages, such as the NumPy array class in Python, allow in-place updates of large arrays. This is important for application areas that make use of large matrix and vector operations. These large, data-parallel, computationally intensive problems achieve a great speed-up by operating in place.
Immutable objects are a very powerful concept. They take away a lot of the burden of trying to keep objects/variables consistent for all clients.
You can use them for low level, non-polymorphic objects - like a CPoint class - that are used mostly with value semantics.
Or you can use them for high level, polymorphic interfaces - like an IFunction representing a mathematical function - that is used exclusively with object semantics.
Greatest advantage: immutability + object semantics + smart pointers make object ownership a non-issue, all clients of the object have their own private copy by default. Implicitly this also means deterministic behavior in the presence of concurrency.
Disadvantage: when used with objects containing lots of data, memory consumption can become an issue. A solution to this could be to keep operations on an object symbolic and do a lazy evaluation. However, this can then lead to chains of symbolic calculations, that may negatively influence performance if the interface is not designed to accommodate symbolic operations. Something to definitely avoid in this case is returning huge chunks of memory from a method. In combination with chained symbolic operations, this could lead to massive memory consumption and performance degradation.
So immutable objects are definitely my primary way of thinking about object-oriented design, but they are not a dogma.
They solve a lot of problems for clients of objects, but also create many, especially for the implementers.
Check this blog post: http://www.yegor256.com/2014/06/09/objects-should-be-immutable.html. It explains why immutable objects are better than mutable. In short:
immutable objects are simpler to construct, test, and use
truly immutable objects are always thread-safe
they help to avoid temporal coupling
their usage is side-effect free (no defensive copies)
identity mutability problem is avoided
they always have failure atomicity
they are much easier to cache
You should specify what language you're talking about. For low-level languages like C or C++, I prefer to use mutable objects to conserve space and reduce memory churn. In higher-level languages, immutable objects make it easier to reason about the behavior of the code (especially multi-threaded code) because there's no "spooky action at a distance".
A mutable object is simply an object that can be modified after it's created/instantiated, vs. an immutable object that cannot be modified (see the Wikipedia page on the subject). An example of this in a programming language is Python's lists and tuples. Lists can be modified (e.g., new items can be added after the list is created) whereas tuples cannot.
I don't really think there's a clearcut answer as to which one is better for all situations. They both have their places.
In short:
Mutable instance is passed by reference.
Immutable instance is passed by value.
Abstract example. Let's suppose that there exists a file named txtfile on my HDD. Now, when you ask me to give you the txtfile file, I can do it in one of the following two modes:
I can create a shortcut to the txtfile and pass shortcut to you, or
I can do a full copy of the txtfile file and pass copied file to you.
In the first mode, the returned file represents a mutable file, because any change into the shortcut file will be reflected into the original one as well, and vice versa.
In the second mode, the returned file represents an immutable file, because any change into the copied file will not be reflected into the original one, and vice versa.
If a class type is mutable, a variable of that class type can have a number of different meanings. For example, suppose an object foo has a field int[] arr, and it holds a reference to a int[3] holding the numbers {5, 7, 9}. Even though the type of the field is known, there are at least four different things it can represent:
A potentially-shared reference, all of whose holders care only that it encapsulates the values 5, 7, and 9. If foo wants arr to encapsulate different values, it must replace it with a different array that contains the desired values. If one wants to make a copy of foo, one may give the copy either a reference to arr or a new array holding the values {5, 7, 9}, whichever is more convenient.
The only reference, anywhere in the universe, to an array which encapsulates a set of three storage locations which at the moment hold the values 5, 7, and 9; if foo wants it to encapsulate the values 5, 8, and 9, it may either change the second item in that array or create a new array holding the values 5, 8, and 9 and abandon the old one. Note that if one wanted to make a copy of foo, one must in the copy replace arr with a reference to a new array in order for foo.arr to remain as the only reference to that array anywhere in the universe.
A reference to an array which is owned by some other object that has exposed it to foo for some reason (e.g. perhaps it wants foo to store some data there). In this scenario, arr doesn't encapsulate the contents of the array, but rather its identity. Because replacing arr with a reference to a new array would totally change its meaning, a copy of foo should hold a reference to the same array.
A reference to an array of which foo is the sole owner, but to which references are held by another object for some reason (e.g. foo wants the other object to store data there, the flip side of the previous case). In this scenario, arr encapsulates both the identity of the array and its contents. Replacing arr with a reference to a new array would totally change its meaning, but having a clone's arr refer to foo.arr would violate the assumption that foo is the sole owner. There is thus no way to copy foo.
In theory, int[] should be a nice simple well-defined type, but it has four very different meanings. By contrast, a reference to an immutable object (e.g. String) generally only has one meaning. Much of the "power" of immutable objects stems from that fact.
Mutable collections are in general faster than their immutable counterparts when used for in-place operations.
However, mutability comes at a cost: you need to be much more careful sharing them between different parts of your program. It is easy to create bugs where a shared mutable collection is updated unexpectedly, forcing you to hunt down which line in a large codebase is performing the unwanted update.
A common approach is to use mutable collections locally within a function or private to a class where there is a performance bottleneck, but to use immutable collections elsewhere where speed is less of a concern. That gives you the high performance of mutable collections where it matters most, while not sacrificing the safety that immutable collections give you throughout the bulk of your application logic.
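In Java, that approach might look roughly like this. Squares and firstSquares are placeholder names; the point is only that the mutable list never leaves the method.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical example class.
final class Squares {
    // Builds with a mutable list internally, then publishes a read-only result.
    static List<Integer> firstSquares(int n) {
        List<Integer> scratch = new ArrayList<>(n); // mutable, never escapes this method
        for (int i = 1; i <= n; i++) {
            scratch.add(i * i);                     // cheap in-place append
        }
        // No other reference to 'scratch' survives, so callers can never see it change.
        return Collections.unmodifiableList(scratch);
    }
}
```

Callers that try to add to the returned list get an UnsupportedOperationException.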
If you return a reference to an array or string, then the outside world can modify the content of that object, which makes it a mutable (modifiable) object.
Immutable means it can't be changed, and mutable means it can.
Objects are different from primitives in Java. Primitives are built-in types (boolean, int, etc.) and objects (classes) are user-created types.
Primitives and objects can be mutable or immutable when defined as member variables within the implementation of a class.
A lot of people think primitive and object variables with a final modifier in front of them are immutable; however, this isn't exactly true. So final doesn't quite mean immutable for variables. See the example here:
http://www.siteconsortium.com/h/D0000F.php.
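A short sketch of the reference-type case, with made-up names: final pins the reference, not the object it points to.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class.
final class FinalIsNotImmutable {
    // 'final' fixes which list this field points to, not what the list contains.
    private final List<String> names = new ArrayList<>();

    void add(String name) {
        names.add(name);               // legal: mutates the object behind the final reference
        // names = new ArrayList<>();  // would not compile: the reference itself is final
    }

    int count() {
        return names.size();
    }
}
```

For a final primitive field the value really cannot change after construction; the surprise only arises with references to mutable objects.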
General Mutable vs Immutable
Unmodifiable - a wrapper around a modifiable object. It guarantees that the object cannot be changed through the wrapper (but it can still be changed through the backing object).
Immutable - an object whose state cannot be changed after creation. An object is immutable when all its fields are immutable. It is the next step after an unmodifiable object.
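The difference can be seen with java.util.Collections. The class and method names below are invented for the example; only Collections.unmodifiableList is the real API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical example class.
final class ViewVersusCopy {
    // Read-only view: rejects writes, but still reflects changes made to 'backing'.
    static List<String> unmodifiableView(List<String> backing) {
        return Collections.unmodifiableList(backing);
    }

    // Detached snapshot: copies first, so later changes to 'backing' are invisible.
    static List<String> immutableSnapshot(List<String> backing) {
        return Collections.unmodifiableList(new ArrayList<>(backing));
    }
}
```

If an element is added to the backing list afterwards, the view grows with it while the snapshot keeps its original size.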
Thread safety
The main advantage of an immutable object is that it is naturally suited to a concurrent environment. The biggest problem in concurrency is a shared resource that can be changed by any thread. An immutable object is read-only, and reading is a thread-safe operation. Any modification of an immutable object returns a copy.
Source of truth, free of side effects
As a developer you can be completely sure that an immutable object's state cannot be changed from anywhere (on purpose or not). For example, a consumer can safely use the original immutable object rather than a copy.
Compiler optimisation
Improved performance
Disadvantage:
Copying an object is a heavier operation than changing a mutable object, which is why immutability has some performance cost.
To create an immutable object you should use:
1. Language level
Each language contains tools to help you with it. For example:
Java has final and primitives,
Swift has let and struct.
The language also defines the kinds of variable. For example:
Java has primitive and reference types,
Swift has value and reference types.
For immutable objects, primitives and value types are more convenient because they are copied by default. With reference types it is more difficult (because the object's state can be changed from outside) but still possible. For example, you can use the clone pattern at the developer level to make a deep (instead of shallow) copy.
2. Developer level
As a developer you should not provide an interface for changing state.

Is it correct to call java.lang.String immutable?

This Java tutorial says that an immutable object cannot change its state after creation.
java.lang.String has a field
/** Cache the hash code for the string */
private int hash; // Default to 0
which is initialized on the first call of the hashCode() method, so it changes after creation:
String s = new String(new char[] {' '});
Field hash = s.getClass().getDeclaredField("hash");
hash.setAccessible(true);
System.out.println(hash.get(s));
s.hashCode();
System.out.println(hash.get(s));
output
0
32
Is it correct to call String immutable?
A better definition would be not that the object does not change, but that it cannot be observed to have changed. Its behavior will never change: .substring(x,y) will always return the same thing for that string, and ditto for equals and all the other methods.
That variable is calculated the first time you call hashCode() and is cached for further calls. This is basically what they call "memoization" in functional programming languages.
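A sketch of the same memoization idiom in a user-defined class. LazyHashBytes is a hypothetical name, and this is modeled on the pattern described here rather than copied from the JDK source.

```java
import java.util.Arrays;

// Hypothetical class that mirrors the lazy hash caching described for String.
final class LazyHashBytes {
    private final byte[] bytes;
    private int hash; // 0 means "not computed yet"; the only field that ever changes

    LazyHashBytes(byte[] source) {
        this.bytes = source.clone();
    }

    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0) {
            // A racing thread would compute the same value, so no lock is needed.
            h = Arrays.hashCode(bytes);
            hash = h;
        }
        return h;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof LazyHashBytes && Arrays.equals(bytes, ((LazyHashBytes) o).bytes);
    }
}
```

No caller can tell whether the hash was cached or recomputed, which is the sense in which the object stays immutable. One wrinkle of this simple version: content whose real hash is 0 is recomputed on every call.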
Reflection isn't really a tool for "programming" but rather for meta-programming (ie programming programs for generating programs) so it doesn't really count. It's the equivalent of changing a constant's value using a memory debugger.
The term "Immutable" is vague enough to not allow for a precise definition.
I suggest reading Kinds of Immutability from Eric Lippert's blog. Although it's technically a C# article, it's quite relevant to the question posed. In particular:
Observational immutability:
Suppose you’ve got an object which has the property that every time
you call a method on it, look at a field, etc, you get the same
result. From the point of view of the caller such an object would be
immutable. However you could imagine that behind the scenes the object
was doing lazy initialization, memoizing results of function calls in
a hash table, etc. The “guts” of the object might be entirely mutable.
What does it matter? Truly deeply immutable objects never change their
internal state at all, and are therefore inherently threadsafe. An
object which is mutable behind the scenes might still need to have
complicated threading code in order to protect its internal mutable
state from corruption should the object be called on two threads “at
the same time”.
Once created, all the methods on a String instance (called with the same parameters) will always provide the same result. You cannot change its behaviour (with any public method), so it will always represent the same entity. Also it is final and cannot be subclassed, so it is guaranteed that all instances will behave like this.
Therefore from public view the object is considered immutable. The internal state does not really matter in this case.
Yes it is correct to call them immutable.
While it is true that you can reach in and modify private ... and final ... variables of a class, it is an unnecessary and incredibly unwise thing to do on a String object. It is generally assumed that nobody is going to be crazy enough to do it.
From a security standpoint, the reflection calls needed to modify the state of a String all perform security checks. Unless you've mis-implemented your sandbox, the calls will be blocked for untrusted code. So you shouldn't have to worry about this as a way that untrusted code can break sandbox security.
It is also worth noting that the JLS states that using reflection to change final fields may break things (e.g. in multi-threading) or may not have any effect.
From the viewpoint of a developer who is using reflection, it is not correct to call String immutable. There are actual Java developers using reflection to write real software every day. Dismissing reflection as a "hack" is preposterous. However, from the viewpoint of a developer who is not using reflection, it is correct to call String immutable. Whether or not it is valid to assume that String is immutable depends on context.
Immutability is an abstract concept and therefore cannot apply in an absolute sense to anything with a physical form (see the ship of Theseus). Programming language constructs like objects, variables, and methods exist physically as bits in a storage medium. Data degradation is a physical process which happens to all storage media, so no data can ever be said to be truly immutable. In addition, it is almost always possible in practice to subvert the programming language features intended to prevent the mutation of a particular datum. In contrast, the number 3 is 3, has always been 3, and will always be 3.
As applied to program data, immutability should be considered a useful assumption rather than a fundamental property. For example, if one assumes that a String is immutable, one may cache its hash code for reuse and avoid the cost of ever recomputing its hash code again later. Virtually all non-trivial software relies on assumptions that certain data will not mutate for certain durations of time. Software developers generally assume that the code segment of a program will not change while it is executing, unless they are writing self-modifying code. Understanding what assumptions are valid in a particular context is an important aspect of software development.
It cannot be modified from outside and it is a final class, so it cannot be subclassed and made mutable. These are two requirements for immutability. Reflection is considered a hack; it's not a normal way of development.
A class can be immutable while still having mutable fields, as long as it doesn't provide access to its mutable fields.
It's immutable by design. If you use Reflection (getting the declared Field and resetting its accessibility), you are circumventing its design.
Reflection will allow you to change the contents of any private field. Is it therefore correct to call any object in Java immutable?
Immutability refers to changes that are either initiated by or perceivable by the application.
In the case of string, the fact that a particular implementation chooses to lazily calculate the hashcode is not perceptible to the application. I would go a step further, and say that an internal variable that is incremented by the object -- but never exposed and never used in any other way -- would also be acceptable in an "immutable" object.
Yes, it is correct. When you modify a String like you do in your example, a new String is created but the older one maintains its value.

Caching hashes in Java collections?

When I implement a collection that uses hashes for optimizing access, should I cache the hash values or assume an efficient implementation of hashCode()?
On the other hand, when I implement a class that overrides hashCode(), should I assume that the collection (i.e. HashSet) caches the hash?
This question is only about performance vs. memory overhead. I know that the hash value of an object should not change.
Clarification:
A mutable object would of course have to clear the cached value when it is changed, whereas the collection relies on objects not changing. But this is not relevant for my question.
When designing Guava's ImmutableSet and ImmutableMap classes, we opted not to cache hash codes. This way, you'll get better performance from hash code caching when and only when you care enough to do the caching yourself. If we cached them ourselves, we'd be costing you extra time and memory even in the case that you care deeply about speed and space!
It's true that HashMap does this caching, but it was HashMap's author (Josh Bloch) who strongly suggested we not follow that precedent!
Edit: oh, also, if your hashCode() is slow, the caching by the collection only addresses half of the problem anyway, as hashCode() still must be invoked on the object passed in to get() no matter what.
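That last point is easy to observe with a key that counts its own hashCode() calls. The classes below are purely illustrative, and how often the stored key gets rehashed is an implementation detail of whichever HashMap you run against.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical key that counts how often its hashCode() is invoked.
final class CountingKey {
    private final String id;
    int hashCalls = 0;

    CountingKey(String id) { this.id = id; }

    @Override
    public int hashCode() {
        hashCalls++;
        return id.hashCode();
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof CountingKey && id.equals(((CountingKey) o).id);
    }
}

final class LookupCost {
    // Performs 'lookups' get() calls and reports how many times the probe key was hashed.
    static int probeHashCalls(int lookups) {
        Map<CountingKey, String> map = new HashMap<>();
        map.put(new CountingKey("k"), "value");
        CountingKey probe = new CountingKey("k"); // equal to the stored key, but a different object
        for (int i = 0; i < lookups; i++) {
            map.get(probe); // the map has to hash the probe to find its bucket
        }
        return probe.hashCalls;
    }
}
```

Whatever the map caches for the keys it already holds, the probe is hashed on every lookup, so a slow hashCode() still has to be fixed or cached in the key class itself.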
Considering that java.lang.String caches its hash, I guess that hashCode() is supposed to be fast.
So as a first approach, I would not cache hashes in my collection.
In the objects that I use, I would not cache the hash code unless it is obviously slow, and only do it if profiling tells me so.
If my objects will be used by others, I would probably consider caching hash codes sooner (but that needs measurements anyway).
On the other hand, when I implement a class that overrides hashCode(),
should I assume that the collection (i.e. HashSet) caches the hash?
No, you should not make any assumptions beyond the scope of the class you are writing.
Of course you should try to make your hashCode cheap. If it isn't, and your class is immutable, create the hashCode on initialization or lazily upon the first request (see java.lang.String). If your class is not immutable, I don't see any other option than to re-calculate the hashCode every time.
I'd say in most cases you can rely on efficient implementations of hashCode(). AFAIK, that method is only invoked on lookup methods (like contains, get etc.) or methods that change the collection (add/put, remove etc.).
Thus, in most cases there shouldn't be any need to cache hashes yourself.
Why do you want to cache it? You need to ask objects what their hashcode is while you're working with it to allocate it to a hash bucket (and any objects that are in the same bucket that may have the same hashcode), but then you can forget it.
You could store objects in a wrapper HashNode or something, but I would try implementing it first without caching (just like HashSet et al. do) and see if you need the added performance and complexity before going there.

What is the reason behind Enum.hashCode()?

The method hashCode() in class Enum is final and defined as super.hashCode(), which means it returns a number based on the address of the instance, which is a random number from the programmer's POV.
Defining it e.g. as ordinal() ^ getClass().getName().hashCode() would be deterministic across different JVMs. It would even work a bit better, since the least significant bits would "change as much as possible", e.g., for an enum containing up to 16 elements and a HashMap of size 16, there'd be for sure no collisions (sure, using an EnumMap is better, but sometimes not possible, e.g. there's no ConcurrentEnumMap). With the current definition you have no such guarantee, have you?
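Since Enum.hashCode() is final, such a definition can only live in a helper outside the enum. The sketch below is one hypothetical way to write it; it swaps getClass() for getDeclaringClass() so that constants with their own class body hash under the enum's name rather than an anonymous subclass name.

```java
// Hypothetical helper; not part of java.lang.Enum.
final class StableEnumHash {
    enum Color { RED, GREEN, BLUE }

    // Depends only on the enum's class name and the constant's position,
    // so the value is the same on every run and every JVM.
    static int of(Enum<?> constant) {
        // getDeclaringClass() rather than getClass(): constants with their own
        // body are instances of an anonymous subclass.
        return constant.ordinal() ^ constant.getDeclaringClass().getName().hashCode();
    }
}
```

String.hashCode() is fully specified, so the result is reproducible, and constants of the same enum differ only in the low bits contributed by ordinal().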
Summary of the answers
Using Object.hashCode() compares to a nicer hashCode like the one above as follows:
PROS
simplicity
CONS
speed
more collisions (for any size of a HashMap)
non-determinism, which propagates to other objects making them unusable for
deterministic simulations
ETag computation
hunting down bugs depending e.g. on a HashSet iteration order
I'd personally prefer the nicer hashCode, but IMHO no reason weighs much, maybe except for the speed.
UPDATE
I was curious about the speed and wrote a benchmark with surprising results. For a price of a single field per class you can get a deterministic hash code which is nearly four times faster. Storing the hash code in each field would be even faster, although negligibly.
The explanation why the standard hash code is not much faster is that it can't be the object's address as objects get moved by the GC.
UPDATE 2
There are some strange things going on with the hashCode performance in general. When I understand them, there's still the open question, why System.identityHashCode (reading from the object header) is way slower than accessing a normal object field.
The only reason for using Object's hashCode() and for making it final I can imagine, is to make me ask this question.
First of all, you should not rely on such mechanisms for sharing objects between JVMs. That's simply not a supported use case. When you serialize / deserialize you should rely on your own comparison mechanisms or only "compare" the results against objects within your own JVM.
The reason for letting an enum's hashCode be implemented as Object's hash code (based on identity) is that within one JVM there will only be one instance of each enum object. This is enough to ensure that such an implementation makes sense and is correct.
You could argue like "Hey, String and the wrappers for the primitives (Long, Integer, ...) all have well-defined, deterministic specifications of hashCode! Why don't enums have it?" Well, to begin with, you can have several distinct string references representing the same string, which means that using super.hashCode would be an error, so these classes necessarily need their own hashCode implementations. For these core classes it made sense to let them have well-defined deterministic hashCodes.
Why did they choose to solve it like this?
Well, look at the requirements of the hashCode implementation. The main concern is to make sure that each object should return a distinct hash code (unless it is equal to another object). The identity-based approach is super efficient and guarantees this, while your suggestion does not. This requirement is apparently stronger than any "convenience bonus" about easing up on serialization etc.
I think that the reason they made it final is to avoid developers shooting themselves in the foot by rewriting a suboptimal (or even incorrect) hashCode.
Regarding the chosen implementation: it's not stable across JVMs, but it's very fast, avoids collisions, and doesn't need an additional field in the enum. Given the normally small number of instances of an enum class, and the speed of the equals method, I wouldn't be surprised if the HashMap lookup time was bigger with your algorithm than with the current one, due to its additional complexity.
I've asked the same question, because did not saw this one. Why in Enum hashCode() refers to the Object hashCode() implementaion, instead of ordinal() function?
I encountered it as a sort of a problem, when defining my own hash function, for an Object relying on enum hashCode as one of the composites. When checking a value in a Set of Objects, returned by the function, I checked them in an order, which I would expect it to be the same, since the hashCode I define myself, and so I expect elements to fall at the same nodes on the tree, but since hashCode returned by enum changes from start to start, this assumption was wrong, and test could fail once in a while.
So, when I figured out the problem, I started using ordinal instead. I am not sure everyone writing hashCode for their Object realize this.
So basically, you can't define your own deterministic hashCode, while relying on enum hashCode, and you need to use ordinal instead
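A minimal sketch of the workaround described above. The Pixel class and Color enum are hypothetical names invented for illustration; the only point is that hashCode() combines color.ordinal() rather than color.hashCode(), so the result does not depend on the JVM run. Note that ordinal() changes if the constants are reordered, so color.name().hashCode() may be a safer choice if the enum is likely to be edited.

```java
// Hypothetical value class that uses an enum as part of its state.
final class Pixel {
    private final Color color; // assumed non-null in this sketch
    private final int x;

    Pixel(Color color, int x) {
        this.color = color;
        this.x = x;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Pixel)) return false;
        Pixel other = (Pixel) o;
        // Enum constants are singletons, so == is a valid comparison.
        return color == other.color && x == other.x;
    }

    // Uses ordinal() instead of color.hashCode(): the enum's own hashCode is
    // identity-based and can differ between runs, while ordinal() is fixed by
    // the declaration order of the constants.
    @Override
    public int hashCode() {
        return 31 * color.ordinal() + x;
    }

    public static void main(String[] args) {
        System.out.println(new Pixel(Color.GREEN, 5).hashCode());
    }
}

// Hypothetical enum used only for this example.
enum Color { RED, GREEN, BLUE }
```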
P.S. This was too big for a comment :)
The JVM enforces that for an enum constant, only one object will exist in memory. There is no way that you could end up with two different instance objects of the same enum constant within a single VM, not with reflection, not across the network via serialization/deserialization.
That being said, since it is the only object to represent this constant, it doesn't matter that its hash code is its address, since no other object can occupy the same address space at the same time. It is guaranteed to be unique and "deterministic" (in the sense that in the same VM, in memory, all references to the constant point to the same object, no matter what it is).
There is no requirement for hash codes to be deterministic between JVMs and no advantage gained if they were. If you are relying on this fact you are using them wrong.
As only one instance of each enum value exists, Object.hashCode() is guaranteed never to collide, is good code reuse and is very fast.
If equality is defined by identity, then Object.hashCode() will always give the best performance.
The determinism of other hash codes is just a side effect of their implementation. As their equality is usually defined by field values, mixing in non-deterministic values would be a waste of time.
As long as we can't send an enum object [1] to a different JVM, I see no reason for putting such a requirement on enums (and objects in general).
[1] I thought it was clear enough: an object is an instance of a class. A serialized object is a sequence of bytes, usually stored in a byte array. I was talking about an object.
One more reason I could imagine for implementing it like this is the requirement for hashCode() and equals() to be consistent, together with the design goal that enums should be simple to use and be compile-time constants (so that they can be used as "case" constants). This also makes it legal to compare enum instances with "==", and you simply wouldn't want "equals" to behave differently from "==" for enums. This again ties hashCode to the default reference-based Object.hashCode() behavior.
As said before, I also don't expect equals() and hashCode() to consider two enum constants from different JVMs as being equal. When talking about serialization: for instance fields typed as enums, the default binary serializer in Java has a special behaviour that serializes only the name of the constant, and on deserialization the reference to the corresponding enum value in the deserializing JVM is re-created. JAXB and other XML-based serialization mechanisms work in a similar way. So: just don't worry.
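The serialization behaviour described above can be checked with a small round trip through java.io. This is only a sketch: the Status enum and the EnumRoundTrip helper are hypothetical names, and the round trip happens inside a single JVM, standing in for a second one.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

final class EnumRoundTrip {
    // Serializes a value to bytes and reads it back with default Java serialization.
    static Object roundTrip(Object value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Object copy = roundTrip(Status.ACTIVE);
        // The deserialized value is resolved by name to the existing constant,
        // so identity comparison and hashCode both agree with the original.
        System.out.println(copy == Status.ACTIVE);
        System.out.println(copy.hashCode() == Status.ACTIVE.hashCode());
    }
}

// Hypothetical enum used only for this example.
enum Status { ACTIVE, SUSPENDED }
```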

Considering object encapsulation, should getters return an immutable property?

When a getter returns a property, such as a List of other related objects, should that list and its objects be immutable, to prevent code outside of the class from changing the state of those objects without the main parent object knowing?
For example if a Contact object, has a getDetails getter, which returns a List of ContactDetails objects, then any code calling that getter:
can remove ContactDetail objects from that list without the Contact object knowing of it.
can change each ContactDetail object without the Contact object knowing of it.
So what should we do here? Should we just trust the calling code and return easily mutable objects, or go the hard way and make an immutable class for each mutable class?
It's a matter of whether you should be "defensive" in your code. If you're the (sole) user of your class and you trust yourself then by all means no need for immutability. However, if this code needs to work no matter what, or you don't trust your user, then make everything that is externalized immutable.
That said, most properties I create are mutable. An occasional user botches this up, but then again it's his/her fault, since it is clearly documented that mutation should not occur via mutable objects received via getters.
It depends on the context. If the list is intended to be mutable, there is no point in cluttering up the API of the main class with methods to mutate it when List has a perfectly good API of its own.
However, if the main class can't cope with mutations, then you'll need to return an immutable list - and the entries in the list may also need to be immutable themselves.
Don't forget, though, that you can return a custom List implementation that knows how to respond safely to mutation requests, whether by firing events or by performing any required actions directly. In fact, this is a classic example of a good time to use an inner class.
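One possible shape for such a custom List, sketched as an inner class built on java.util.AbstractList. The names ObservedContact, DetailList and onChange are assumptions made for illustration, and a simple counter stands in for firing an event or running whatever action the owning class needs.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.List;

// Hypothetical owner class; details are plain strings to keep the sketch short.
final class ObservedContact {
    private final List<String> details = new ArrayList<>();
    private int changeCount = 0; // stands in for "fire an event"

    // Inner class: reads go straight to the private list, and every
    // mutation also notifies the owning ObservedContact.
    private final class DetailList extends AbstractList<String> {
        @Override public String get(int index) { return details.get(index); }
        @Override public int size() { return details.size(); }

        @Override public void add(int index, String element) {
            details.add(index, element);
            onChange();
        }

        @Override public String set(int index, String element) {
            String previous = details.set(index, element);
            onChange();
            return previous;
        }

        @Override public String remove(int index) {
            String removed = details.remove(index);
            onChange();
            return removed;
        }
    }

    private void onChange() { changeCount++; }

    List<String> getDetails() { return new DetailList(); }

    int getChangeCount() { return changeCount; }
}
```

Callers use the normal List API (add, remove, set), while the parent object still learns about every change.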
If you have control of the calling code then what matters most is that the choice you make is documented well in all the right places.
Joshua Bloch in his excellent "Effective Java" book says that you should ALWAYS make defensive copies when returning something like this. That may be a little extreme, especially if the ContactDetails objects are not Cloneable, but it's always the safe way. If in doubt always favour code safety over performance, unless profiling has shown that the cloning is a real performance bottleneck.
There are actually several levels of protection you can add. You can simply return the member, which is essentially giving any other class access to the internals of your class. Very unsafe, but in fairness widely done. It will also cause you trouble later if you want to change the internals so that the ContactDetails are stored in a Set. You can return a newly-created list with references to the same objects in the internal list. This is safer - another class can't remove or add to the list, but it can modify the existing objects. Thirdly return a newly created list with copies of the ContactDetails objects. That's the safe way, but can be expensive.
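The three levels of protection might look like the following sketch. The getter names and the copy constructor on ContactDetail are assumptions for illustration (the question only mentions a getDetails getter), not part of any existing API.

```java
import java.util.ArrayList;
import java.util.List;

final class Contact {
    private final List<ContactDetail> details = new ArrayList<>();

    void addDetail(ContactDetail detail) { details.add(detail); }

    // Level 1: hands out the internal list itself; callers can do anything to it.
    List<ContactDetail> getDetailsDirect() { return details; }

    // Level 2: a new list holding the same element objects; callers cannot
    // add to or remove from the internal list, but can still mutate the elements.
    List<ContactDetail> getDetailsShallowCopy() { return new ArrayList<>(details); }

    // Level 3: a new list holding copies of the elements; nothing the caller
    // does reaches the internal state, at the cost of copying everything.
    List<ContactDetail> getDetailsDeepCopy() {
        List<ContactDetail> copy = new ArrayList<>(details.size());
        for (ContactDetail detail : details) {
            copy.add(new ContactDetail(detail));
        }
        return copy;
    }
}

// Hypothetical mutable element type with a copy constructor.
final class ContactDetail {
    private String value;

    ContactDetail(String value) { this.value = value; }
    ContactDetail(ContactDetail other) { this.value = other.value; }

    String getValue() { return value; }
    void setValue(String value) { this.value = value; }
}
```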
I would do this a better way. Don't return a list at all; instead return an iterator over the list. That way you don't have to create a new list (List has a method to get an iterator) and the external class can't add to the list. Note that the standard list iterators do support remove(), so take the iterator from Collections.unmodifiableList(list) if removal must be blocked as well. The caller can still modify the items, unless you write your own iterator that clones the elements as needed. If you later switch to using another collection internally it can still return an iterator, so no external changes are needed.
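A short sketch of the iterator approach, assuming a hypothetical IterableContact class with string details. The iterator is taken from an unmodifiable view, because the iterator of a plain ArrayList supports remove() and would otherwise let callers delete elements.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical class exposing its details only through an iterator.
final class IterableContact implements Iterable<String> {
    private final List<String> details = new ArrayList<>();

    void addDetail(String detail) { details.add(detail); }

    int size() { return details.size(); }

    // The iterator comes from an unmodifiable view, so Iterator.remove()
    // throws UnsupportedOperationException instead of changing the list.
    @Override
    public Iterator<String> iterator() {
        return Collections.unmodifiableList(details).iterator();
    }
}
```

Because the class implements Iterable, callers can still write `for (String detail : contact)`, and the internal collection can later be swapped for a Set without changing that code.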
In the particular case of a Collection, List, Set, or Map in Java, it is easy to return an immutable view to the class using return Collections.unmodifiableList(list);
Of course, if it is possible that the backing-data will still be modified then you need to make a full copy of the list.
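The difference between an unmodifiable view and a full copy can be seen in a small sketch. The class and method names here are made up for illustration; Collections.unmodifiableList and the ArrayList copy constructor are the standard library calls.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ViewVersusCopy {
    // Read-only view: rejects mutation, but still reflects later changes
    // made to the backing list.
    static List<String> readOnlyView(List<String> backing) {
        return Collections.unmodifiableList(backing);
    }

    // Read-only snapshot: copies first, so later changes to the backing
    // list are not visible.
    static List<String> snapshot(List<String> backing) {
        return Collections.unmodifiableList(new ArrayList<>(backing));
    }

    public static void main(String[] args) {
        List<String> backing = new ArrayList<>();
        backing.add("home");

        List<String> view = readOnlyView(backing);
        List<String> copy = snapshot(backing);

        backing.add("work");

        System.out.println(view.size()); // 2: the view follows the backing list
        System.out.println(copy.size()); // 1: the snapshot does not
    }
}
```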
Depends on the context, really. But generally, yes, one should write as defensive code as possible (returning array copies, returning readonly wrappers around collections etc.). In any case, it should be clearly documented.
I used to return a read-only version of the list, or at least, a copy. But each object contained in the list must be editable, unless they are immutable by design.
I think you'll find that it's very rare for every gettable to be immutable.
What you could do is to fire events when a property is changed within such objects. Not a perfect solution either.
Documentation is probably the most pragmatic solution ;)
Your first imperative should be to follow the Law of Demeter or ‘Tell don't ask’; tell the object instance what to do e.g.
contact.print( printer ) ; // or
contact.show( new Dialog() ) ; // or
contactList.findByName( searchName ).print( printer ) ;
Object-oriented code tells objects to do things. Procedural code gets information then acts on that information. Asking an object to reveal the details of its internals breaks encapsulation, it is procedural code, not sound OO programming and as Will has already said it is a flawed design.
If you follow the Law of Demeter approach any change in the state of an object occurs through its defined interface, therefore side-effects are known and controlled. Your problem goes away.
When I was starting out I was still heavily under the influence of HIDE YOUR DATA OO PRINCIPLES LOL. I would sit and ponder what would happen if somebody changed the state of one of the objects exposed by a property. Should I make them read only for external callers? Should I not expose them at all?
Collections brought out these anxieties to the extreme. I mean, somebody could remove all the objects in the collection while I'm not looking!
I eventually realized that if your objects hold such tight dependencies on their externally visible properties and their types that you go boom when somebody touches them in a bad place, your architecture is flawed.
There are valid reasons to make your external properties readonly and their types immutable. But that is the corner case, not the typical one, imho.
First of all, setters and getters are an indication of bad OO. Generally the idea of OO is you ask the object to do something for you. Setting and getting is the opposite. Sun should have figured out some other way to implement Java beans so that people wouldn't pick up this pattern and think it's "Correct".
Secondly, each object you have should be a world in itself. Generally, if you are going to use setters and getters, they should return fairly safe independent objects. Those objects may or may not be immutable because they are just first-class objects. The other possibility is that they return primitive types, which are always copied by value and so cannot be changed through the getter. So asking "Should setters and getters return something immutable" doesn't make too much sense.
As for making immutable objects themselves, you should virtually always make the members inside your object final unless you have a strong reason not to (Final should have been the default, "mutable" should be a keyword that overrides that default). This implies that wherever possible, objects will be immutable.
As for predefined quasi-object things you might pass around, I recommend you wrap stuff like collections and groups of values that go together into their own classes with their own methods. I virtually never pass around an unprotected collection simply because you aren't giving any guidance/help on how it's used where the use of a well-designed object should be obvious. Safety is also a factor since allowing someone access to a collection inside your class makes it virtually impossible to ensure that the class will always be valid.
