In class today, my professor was discussing how to structure a class. The course primarily uses Java and I have more Java experience than the teacher (he comes from a C++ background), so I mentioned that in Java one should favor immutability. My professor asked me to justify my answer, and I gave the reasons that I've heard from the Java community:
Safety (especially with threading)
Reduced object count
Allows certain optimizations (especially for garbage collector)
The professor challenged my statement by saying that he'd like to see some statistical measurement of these benefits. I cited a wealth of anecdotal evidence, but even as I did so, I realized he was right: as far as I know, there hasn't been an empirical study of whether immutability actually provides the benefits it promises in real-world code. I know it does from experience, but others' experiences may differ.
So, my question is, have there been any statistical studies done on the effects of immutability in real-world code?
I would point to Item 15 in Effective Java. The value of immutability is in the design (and it isn't always appropriate - it is just a good first approximation) and design preferences are rarely argued from a statistical point of view, but we have seen mutable objects (Calendar, Date) that have gone really bad, and serious replacements (JodaTime, JSR-310) have opted for immutability.
The biggest advantage of immutability in Java, in my opinion, is simplicity. It becomes much simpler to reason about the state of an object, if that state cannot change. This is of course even more important in a multi-threaded environment, but even in simple, linear single-threaded programs it can make things far easier to understand.
See this page for more examples.
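To make the "simpler to reason about" point concrete, here is a minimal sketch of an immutable class in Java. The Money class and its members are made up for illustration and do not come from any library; the only point is that every field is final and "modification" hands back a new object.

```java
// Hypothetical value class, used only for illustration.
final class Money {
    private final long cents;      // assigned once in the constructor, never reassigned
    private final String currency;

    Money(long cents, String currency) {
        this.cents = cents;
        this.currency = currency;
    }

    long cents() { return cents; }
    String currency() { return currency; }

    // "Modification" returns a new object; the receiver is left untouched.
    Money plus(Money other) {
        if (!currency.equals(other.currency)) {
            throw new IllegalArgumentException("currency mismatch");
        }
        return new Money(cents + other.cents, currency);
    }
}

class MoneyDemo {
    public static void main(String[] args) {
        Money price = new Money(1000, "EUR");
        Money total = price.plus(new Money(250, "EUR"));
        System.out.println(price.cents()); // 1000, price was not changed
        System.out.println(total.cents()); // 1250
    }
}
```

Whoever holds a reference to price can rely on it staying at 1000 cents, no matter what other code does with it.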
So, my question is, have there been any statistical studies done on the effects of immutability in real-world code?
I'd argue that your professor is just being obtuse -- not necessarily intentionally, and not necessarily a bad thing. It's just that the question is too vague. Two real problems with the question:
"Statistical studies on the effect of [x]" doesn't really mean anything if you don't specify what kind of measurements you're looking for.
"Real-world code" doesn't really mean anything unless you state a specific domain. Real-world code includes scientific computing, game development, blog engines, automated proof generators, stored procedures, operating system kernels, etc.
For what it's worth, the ability of the compiler to optimize immutable objects is well-documented. Off the top of my head:
The Haskell compiler performs deforestation (also called short-cut fusion), where Haskell will transform the expression map f . map g to map (f . g). Since Haskell functions are pure, these expressions are guaranteed to produce equivalent output, but the second one can run considerably faster since it doesn't need to create an intermediate list.
Common subexpression elimination where we could convert x = foo(12); y = foo(12) to temp = foo(12); x = temp; y = temp; is only possible if the compiler can guarantee foo is a pure function. To my knowledge, the D compiler can perform substitutions like this using the pure and immutable keywords. If I remember correctly, some C and C++ compilers will aggressively optimize calls to these functions marked "pure" (or whatever the equivalent keyword is).
So long as we don't have mutable state, a sufficiently smart compiler can execute linear blocks of code on multiple threads with a guarantee that we won't corrupt the state of variables in another thread.
Regarding concurrency, the pitfalls of concurrency using mutable state are well-documented and don't need to be restated.
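The map-fusion rewrite can be imitated by hand in Java to see why purity matters. This is only a sketch: the class and method names are invented, javac does not perform this rewrite for you, and the intermediate list is collected explicitly to mimic the unfused version (Java streams are lazy and would not build it otherwise). The two versions are interchangeable only because f and g have no side effects.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

class MapFusionDemo {
    // Unfused: apply g to every element, materialize a list, then apply f.
    static List<Integer> twoPasses(List<Integer> xs,
                                   Function<Integer, Integer> f,
                                   Function<Integer, Integer> g) {
        List<Integer> intermediate = xs.stream().map(g).collect(Collectors.toList());
        return intermediate.stream().map(f).collect(Collectors.toList());
    }

    // Fused: apply x -> f(g(x)) in a single pass, no intermediate list.
    static List<Integer> onePass(List<Integer> xs,
                                 Function<Integer, Integer> f,
                                 Function<Integer, Integer> g) {
        return xs.stream().map(f.compose(g)).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Function<Integer, Integer> f = x -> x + 1;
        Function<Integer, Integer> g = x -> x * 2;
        List<Integer> xs = Arrays.asList(1, 2, 3);
        System.out.println(twoPasses(xs, f, g)); // [3, 5, 7]
        System.out.println(onePass(xs, f, g));   // [3, 5, 7]
    }
}
```

If g incremented a shared counter or wrote to a file, the order of effects would differ between the two versions and the rewrite would no longer be safe.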
Sure, this is all anecdotal evidence, but that's pretty much the best you'll get. The immutable vs mutable debate is largely a pissing match, and you are not going to find a paper making a sweeping generalization like "functional programming is superior to imperative programming".
At most, you'll probably find that you can summarize the benefits of immutable vs mutable in a set of best practices rather than as codified studies and statistics. For example, mutable state is the enemy of multithreaded programming; on the other hand, mutable queues and arrays are often easier to write and more efficient in practice than their immutable variants.
It takes practice, but eventually you learn to use the right tool for the job, rather than shoehorning your favorite pet paradigm into a project.
I think your professor's being overly stubborn (probably deliberately, to push you to a fuller understanding). Really the benefits of immutability are not so much what the compiler can do with optimisations, but rather that it's much easier for us humans to read and understand. A variable that is guaranteed to be set when the object is created and is guaranteed not to change afterwards is much easier to grok and reason about than one which is this value now but might be set to some other value later.
This is especially true with threading, in that you don't need to worry about processor caches and monitors and all that boilerplate that comes with avoiding concurrent modifications, when the language guarantees that no such modification can possibly occur.
And once you express the benefits of immutability as "the code is easier to follow", it feels a bit sillier to ask for empirical measurements of productivity increases vis-a-vis "easier-to-followness".
On the other hand, the compiler and HotSpot can probably perform certain optimisations based on knowing that a value can never change - like you, I have a feeling that this would take place and is a good thing, but I'm not sure of the details. It's a lot more likely that there will be empirical data for the types of optimisation that can occur, and how much faster the resulting code is.
Don't argue with the prof. You have nothing to gain.
These are open questions, like dynamic vs static typing. We sometimes think functional techniques involving immutable data are better for various reasons, but it's mostly a matter of style so far.
What would you objectively measure? GC and object count could be measured with mutable/immutable versions of the same program (although how typical that would be would be subjective, so this is a pretty weak argument). I can't imagine how you could measure the removal of threading bugs, except maybe anecdotally by comparison with a real world example of a production application plagued by intermittent issues fixed by adding immutability.
Immutability is a good thing for value objects. But how about other things? Imagine an object that creates a statistic:
Stats s = new Stats ();
... some loop ...
s.count ();
s.end ();
s.print ();
which should print "Processed 536.21 rows/s". How do you plan to implement count() with an immutable? Even if you use an immutable value object for the counter itself, s can't be immutable since it would have to replace the counter object inside of itself. The only way out would be:
s = s.count ();
which means to copy the state of s for every round in the loop. While this can be done, it surely isn't as efficient as incrementing the internal counter.
Moreover, most people would fail to use this API right because they would expect count() to modify the state of the object instead of returning a new one. So in this case, it would create more bugs.
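A rough sketch of what the immutable variant would look like, and of the misuse described above. ImmutableStats is a made-up stand-in for the Stats class in the question; only the counter is modelled, not the timing or printing.

```java
// Hypothetical immutable counterpart of the Stats class discussed above.
final class ImmutableStats {
    private final long rows;

    ImmutableStats() { this(0); }
    private ImmutableStats(long rows) { this.rows = rows; }

    // Returns a NEW object with the counter incremented; this one stays as it was.
    ImmutableStats count() { return new ImmutableStats(rows + 1); }

    long rows() { return rows; }
}

class StatsDemo {
    public static void main(String[] args) {
        // Easy mistake: calling count() like a mutator and dropping the result.
        ImmutableStats wrong = new ImmutableStats();
        for (int i = 0; i < 3; i++) {
            wrong.count();
        }

        // Correct use: rebind the variable on every round, allocating each time.
        ImmutableStats right = new ImmutableStats();
        for (int i = 0; i < 3; i++) {
            right = right.count();
        }

        System.out.println(wrong.rows()); // 0
        System.out.println(right.rows()); // 3
    }
}
```

The first loop compiles without complaint and silently counts nothing, which is exactly the kind of bug the answer is warning about.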
As other comments have claimed, it would be very, very hard to collect statistics on the merits of immutable objects, because it would be virtually impossible to find control cases - pairs of software applications which are alike in every way, except that one uses immutable objects and the other does not. (In nearly every case, I would claim that one version of that software was written some time after the other, and learned numerous lessons from the first, and so improvements in performance will have many causes.) Any experienced programmer who thinks about this for a moment ought to realize this. I think your professor is trying to deflect your suggestion.
Meanwhile, it is very easy to make cogent arguments in favor of immutability, at least in Java, and probably in C# and other OO languages. As Yishai states, Effective Java makes this argument well. So does the copy of Java Concurrency in Practice sitting on my bookshelf.
Immutable objects allow code which wants to share an object's value to do so by sharing a reference. Mutable objects, however, have identity, which allows code which wants to share an object's identity to do so by sharing a reference. Both kinds of sharing are essential in most applications. If one doesn't have immutable objects available, it's possible to share values by copying them into either new objects or objects supplied by the intended recipient of those values. Getting by without mutable objects is much harder. One could somewhat "fake" mutable objects by saying stateOfUniverse = stateOfUniverse.withSomeChange(...), but that would require that nothing else modify stateOfUniverse while its withSomeChange method is running [precluding any sort of multi-threading]. Further, if one were e.g. trying to track a fleet of trucks, and part of the code was interested in one particular truck, it would be necessary for that code to always look up that truck in a table of trucks any time it might have changed.
A better approach is to subdivide the universe into entities and values. Entities would have changeable characteristics, but an immutable identity, so a storage location of e.g. type Truck could continue to identify the same truck even as the truck itself changes position, loads and unloads cargo, etc. Values would not generally have a particular identity, but would have immutable characteristics. A Truck might store its location as type WorldCoordinate. A WorldCoordinate that represents 45.6789012N 98.7654321W would continue to do so as long as any reference to it exists; if a truck that was at that location moved north slightly, it would create a new WorldCoordinate to represent 45.6789013N 98.7654321W, abandon the old one, and store a reference to that new one.
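The entity/value split might look roughly like this in Java. The fields, constructors and the movedNorth helper are assumptions made for the sketch; only the names Truck, WorldCoordinate and setLocation come from the text.

```java
// Value: no identity of its own, characteristics never change.
final class WorldCoordinate {
    private final double latitude;
    private final double longitude;

    WorldCoordinate(double latitude, double longitude) {
        this.latitude = latitude;
        this.longitude = longitude;
    }

    double latitude() { return latitude; }
    double longitude() { return longitude; }

    // Hypothetical helper: a slightly different position is a different value.
    WorldCoordinate movedNorth(double degrees) {
        return new WorldCoordinate(latitude + degrees, longitude);
    }
}

// Entity: immutable identity (id), changeable characteristics (location).
final class Truck {
    private final int id;
    private WorldCoordinate location;

    Truck(int id, WorldCoordinate location) {
        this.id = id;
        this.location = location;
    }

    int id() { return id; }
    WorldCoordinate getLocation() { return location; }
    void setLocation(WorldCoordinate newLocation) { this.location = newLocation; }
}

class TruckDemo {
    public static void main(String[] args) {
        Truck myTruck = new Truck(7, new WorldCoordinate(45.6789012, -98.7654321));
        WorldCoordinate before = myTruck.getLocation();

        // The truck swaps in a new value; the old one is simply abandoned.
        myTruck.setLocation(before.movedNorth(0.0000001));

        System.out.println(myTruck.id());                     // still truck 7
        System.out.println(before.latitude());                // old value unchanged
        System.out.println(myTruck.getLocation() != before);  // true
    }
}
```

Code that kept a reference to myTruck sees the new position without any lookup, and code that kept a reference to the old WorldCoordinate still sees exactly the position it was given.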
It is generally easiest to reason about code when everything encapsulates either an immutable value or an immutable identity, and when the things which are supposed to have an immutable identity are mutable. If one didn't want to use any mutable objects outside a variable stateOfUniverse, updating a truck's position would require something like:
ImmutableMapping<Integer,Truck> trucks = stateOfUniverse.getTrucks();
Truck myTruck = trucks.get(myTruckId);
myTruck = myTruck.withLocation(newLocation);
trucks = trucks.withItem(myTruckId,myTruck);
stateOfUniverse = stateOfUniverse.withTrucks(trucks);
but reasoning about that code would be more difficult than would be:
myTruck.setLocation(newLocation);
What was the driving factor or design plan in making the methods of HashTable synchronized?
This link says that HashTable is synchronized because its methods are synchronized. But, I want to know the reason "why" the methods were synchronized?
Was it just to provide some synchronization feature? A developer could explicitly handle a race condition through synchronization techniques. Why provide HashTable with this feature?
Keep in mind: these classes were created "ages" ago - when you check the javadoc for Hashtable, you find it says "since Java 1.0"; whereas HashMap says "1.2"!
Back then, Java was trying to compete with languages like C and C++; by providing unique selling points such as "built-in concurrency".
But people quickly figured out that it is better to synchronize containers yourself, where needed, when using them in multi-threaded environments!
So my (more of an opinion-based) answer is: at the time when this class was first designed, people assumed that the requirement "can be used by multiple threads" was more important than "gives optimal performance".
Because Java was "advertised" like: "use it to write multi-threaded write once run everywhere code". That approach fails quickly when the default container classes given to people need additional outside wrapping to actually make them "multi-threaded" ready.
Over the years, the people behind Java started to understand that "more granular" solutions are required. Therefore the core collection classes are not synchronized, to avoid the corresponding performance hits. Meaning: the default with collections is to go "unprotected", so you have to put in some thought when your requirement is "multi-threaded" correctness.
Same for "lists" btw: Vector is synchronized; ArrayList is not.
We cannot tell you why. Those who designed Java over two decades ago maybe can. It's not a useful question. Assuming you actually wanted to ask about java.util.Hashtable and not the fictional HashTable type, bear in mind that it's been obsolescent for nineteen years. Nineteen years! Don't use it. It (and Vector) have cruft that the replacement types, both synchronized and unsynchronized, do not carry. Use the modern (as of nineteen years ago) types.
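For reference, a small sketch of the replacement types mentioned above: a plain HashMap for single-threaded use, and Collections.synchronizedMap or ConcurrentHashMap when several threads share the map. The class name, the countConcurrently helper and the "hits" key are invented for the example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class MapChoicesDemo {
    // Increments one counter from several threads and returns the final value.
    // Only meant to be called with a thread-safe map.
    static int countConcurrently(Map<String, Integer> counts, int threads, int incrementsPerThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    counts.merge("hits", 1, Integer::sum); // atomic on both map types used below
                }
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counts.get("hits");
    }

    public static void main(String[] args) {
        // Unsynchronized default: fine as long as one thread owns the map.
        Map<String, Integer> local = new HashMap<>();
        local.merge("hits", 1, Integer::sum);

        // Opt-in thread safety, chosen by the caller where it is needed.
        Map<String, Integer> wrapped = Collections.synchronizedMap(new HashMap<>());
        Map<String, Integer> concurrent = new ConcurrentHashMap<>();

        System.out.println(countConcurrently(wrapped, 4, 1000));    // 4000
        System.out.println(countConcurrently(concurrent, 4, 1000)); // 4000
    }
}
```

The design shift is that synchronization became something you ask for, instead of a cost every Hashtable user paid whether or not a second thread existed.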
What does it mean to say "with inheritance you're locked into compile-time decisions about code behavior".
I suggest this post from Donal Fellows on Programmers,
Some languages are pretty strongly static, and only allow the
specification of the inheritance relationship between two classes at
the time of definition of those classes. For C++, definition time is
practically the same as compilation time. (It's slightly different in
Java and C#, but not very much.) Other languages allow much more
dynamic reconfiguration of the relationship of classes (and class-like
objects in Javascript) to each other; some go as far as allowing the
class of an existing object to be modified, or the superclass of a
class to be changed. (This can cause total logical chaos, but can also
model real world nasties quite well.)
But it is important to contrast this to composition, where the
relationship between one object and another is not defined by their
class relationship (i.e., their type) but rather by the references
that each has in relation to the other. General composition is a very
powerful and ubiquitous method of arranging objects: when one object
needs to know something about another, it has a reference to that
other object and invokes methods upon it as necessary. As soon as you
start looking for this super-fundamental pattern, you'll find it
absolutely everywhere; the only way to avoid it is to put everything
in one object, which would be massively dumb! (There's also stricter
UML composition/aggregation, but that's not what the GoF book is
talking about there.)
One of the things about the composition relationship is that
particular objects do not need to be hard-bound to each other. The
pattern of concrete objects is very flexible, even in very static
languages like C++. (There is an upside to having things very static:
it is possible to analyse the code more closely and — at least
potentially — issue better code with less overhead.) To recap,
Javascript, as with many other dynamic languages, can pretend it
doesn't use compilation at all; just pretence, of course, but the
fundamental language model doesn't require transformation to a fixed
intermediate format (e.g., a “binary executable on disk”). That
compilation which is done is done at runtime, and can be easily redone
if things vary too much. (The fascinating thing is that such a good
job of compilation can be done, even starting from a very dynamic
basis…)
Some GoF patterns only really make sense in the context of a language
where things are fairly static. That's OK; it just means that not all
forces affecting the pattern are necessarily listed. One of the key
points about studying patterns is that it helps us be aware of these
important differences and caveats. (Other patterns are more universal.
Keep your eyes open for those.)
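A small Java sketch of the contrast described in the quote. All names here (Greeting, Greeter, FormalGreeter, ComposedGreeter) are invented for illustration. With inheritance, which greet runs is fixed by the class you instantiate; with composition, the collaborating object is just a reference that can be replaced while the program runs.

```java
// Behavior expressed as an object that can be referenced and swapped.
interface Greeting {
    String greet(String name);
}

// Inheritance: the variation is baked into the class hierarchy at compile time.
class Greeter {
    String greet(String name) { return "Hello, " + name; }
}

class FormalGreeter extends Greeter {
    @Override
    String greet(String name) { return "Good day, " + name; }
}

// Composition: the variation lives in a referenced object.
class ComposedGreeter {
    private Greeting strategy;

    ComposedGreeter(Greeting strategy) { this.strategy = strategy; }

    void setStrategy(Greeting strategy) { this.strategy = strategy; }

    String greet(String name) { return strategy.greet(name); }
}

class CompositionDemo {
    public static void main(String[] args) {
        Greeter fixed = new FormalGreeter();       // stays formal for its whole life
        System.out.println(fixed.greet("Ada"));    // Good day, Ada

        ComposedGreeter flexible = new ComposedGreeter(name -> "Hello, " + name);
        System.out.println(flexible.greet("Ada")); // Hello, Ada
        flexible.setStrategy(name -> "Good day, " + name);
        System.out.println(flexible.greet("Ada")); // Good day, Ada
    }
}
```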
While developing a two-dimensional vector class as part of a math library, I'm considering having static and instance method pairs for stylistic and usability reasons. That is, two equivalent functions but one is static & non-mutating, and the other is instanced & mutating. I know I'm not the first person to consider this problem (See here, for example) but I haven't found any information that directly addresses it.
Pros of having static and instance method pairs:
Some people prefer to use one or the other and in some cases being able to choose makes code easier to read.
It is implied that static methods are not mutating when both static and instanced methods are provided. This can make the calling code much clearer, e.g.:
someVector = Vector2d.add(vec1, vec2);
someVector = (new Vector2d(vec1)).add(vec2); // does the same thing although more convoluted.
// similarly adding directly to a vector is simpler with a mutator method.
someVector.add(vec2);
someVector = Vector2d.add(someVector, vec2);
This is especially important when long chains of function calls are used, which is common with vectors.
In-place operations can be faster computationally than creating a new instance for every operation. The user decides when performance is important. For users of a Vector class, performance may be important as vectors are frequently used in computationally expensive code.
Pros of having only static or instance methods, but not both:
No significant code redundancy. Easier to maintain.
Less bloat. The javadocs will be almost half the size.
Not necessary to inform users that static methods never mutate and non-getter instanced methods always mutate.
How frowned upon is having static/instance method pairs? Is it used in any major libraries?
Is the pattern "static methods don't mutate, instance methods do" widely known?
I think your concept of providing both static/immutable and instance/mutable methods is a good one. I think the distinction is easy to explain and will be easy for the API users to understand and remember.
I think your API implementation code will not have redundant business logic. You will find that you repeat a pattern where the static implementation creates a new instance and calls the instance method on that new instance.
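The delegation pattern could look something like this. It is a sketch of one possible Vector2d, not the asker's actual class: the fields and the copy constructor are assumptions, and only add is shown.

```java
// Minimal sketch of a vector class with a static/instance method pair.
class Vector2d {
    double x;
    double y;

    Vector2d(double x, double y) {
        this.x = x;
        this.y = y;
    }

    // Copy constructor, used by the static (non-mutating) variants.
    Vector2d(Vector2d other) {
        this(other.x, other.y);
    }

    // Instance method: mutates this vector in place and returns it for chaining.
    Vector2d add(Vector2d other) {
        x += other.x;
        y += other.y;
        return this;
    }

    // Static method: copies a, then reuses the instance logic on the copy.
    static Vector2d add(Vector2d a, Vector2d b) {
        return new Vector2d(a).add(b);
    }
}

class Vector2dDemo {
    public static void main(String[] args) {
        Vector2d a = new Vector2d(1, 2);
        Vector2d b = new Vector2d(3, 4);

        Vector2d sum = Vector2d.add(a, b); // a and b are untouched
        System.out.println(sum.x + ", " + sum.y); // 4.0, 6.0
        System.out.println(a.x + ", " + a.y);     // 1.0, 2.0

        a.add(b);                          // a itself is now changed
        System.out.println(a.x + ", " + a.y);     // 4.0, 6.0
    }
}
```

The arithmetic lives only in the instance method, so the static half of each pair is a one-liner that is easy to generate mechanically.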
Given that I am lazy, I would look at building a bit of infrastructure that would auto-generate the static methods, their javadoc and their unit tests at compile-time. This would be overkill if you have 10 methods, but becomes a big win if you have 1,000 methods.
On the first part, "static methods don't mutate", that's widely used in OOP. I haven't heard of it being expressed explicitly. But it is common sense: "If you change an object, why would the method be static if it could be an instance method?" So I completely agree with the "static methods don't mutate".
On the second part, "instance methods do [mutate]", that's actually not as widely used. It rather depends on whether you decide your design to apply immutability or mutability. Examples from the Java API: java.lang.String is immutable, java.util.Date is mutable (most likely by accident / bad design), java.lang.StringBuilder is mutable intentionally (that's its purpose). Mutability can lead to defensive cloning in order to protect the code from mutation bugs. Whether this really is a problem depends on a few things:
Is it an API others will use? You never know how they will use your code... IMO it's more important to protect API code from mutation bugs than normal code.
How good is the unit test coverage? Would your unit tests find all the mutation bugs that might sneak in? If you follow TDD properly (Uncle Bob's 3 Laws of TDD), and it's non-API code, mutation bugs are very unlikely to sneak in without being instantly discovered.
If you have code that has to protect itself against mutation bugs using defensive cloning, how often is that code called? If defensive clones are created frequently, it might be better to use immutable objects than mutable objects. Basically this comes down to the number of calls to read-only methods of associated classes (which would eventually defensively clone) vs. the number of calls to mutator methods on the class itself.
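To illustrate what defensive cloning means in practice with a mutable type such as java.util.Date, here is a minimal sketch. The Appointment class is hypothetical; the Date constructor and getTime/setTime calls are standard.

```java
import java.util.Date;

// Hypothetical class that stores a mutable Date but wants a stable internal state.
final class Appointment {
    private final Date start;

    Appointment(Date start) {
        // Copy on the way in: later changes to the caller's Date do not leak in.
        this.start = new Date(start.getTime());
    }

    Date getStart() {
        // Copy on the way out: callers cannot reach the internal Date.
        return new Date(start.getTime());
    }
}

class DefensiveCopyDemo {
    public static void main(String[] args) {
        Date when = new Date(0L);
        Appointment appointment = new Appointment(when);

        when.setTime(1000L);                     // caller mutates its own object
        appointment.getStart().setTime(2000L);   // caller mutates a returned copy

        System.out.println(appointment.getStart().getTime()); // 0
    }
}
```

Every call to getStart allocates a new Date. With an immutable date type neither copy would be needed, which is the trade-off the last bullet is weighing.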
Personally, I prefer immutable objects, I'm a fan of final (if I could change Java, I would make final the default for all fields and variables, and introduce a keyword var to make them non-final), and I try to do functional programming in Java, although it is not a functional programming language, as much as possible. From my experience I know that I spend significantly less time debugging my code than others (actually I run the Java debugger maybe twice a year or so). I do not have enough empirical data and proper analysis for creating any kind of "causal relationship" between experience, immutability, functional programming and correctness, therefore I will only say I believe that immutability and functional programming help for correctness, and you will have to come up with your own judgement on this.
Concluding on the second part, "instance methods do [mutate]" is the widely used assumption in case the object is mutable anyway, otherwise instance methods would clone.
I am wondering about the benefits of having the string type immutable from the programmer's point of view.
Technical benefits (on the compiler/language side) can be summarized mostly that it is easier to do optimisations if the type is immutable. Read here for a related question.
Also, in a mutable string type, either you have thread-safety already built-in (then again, optimisations are harder to do) or you have to do it yourself. You will have in any case the choice to use a mutable string type with built-in thread safety, so that is not really an advantage of immutable string-types. (Again, it will be easier to do the handling and optimisations to ensure thread-safety on the immutable type but that is not the point here.)
But what are the benefits of immutable string-types in the usage? What is the point of having some types immutable and others not? That seems very inconsistent to me.
In C++, if I want to have some string to be immutable, I am passing it as const reference to a function (const std::string&). If I want to have a changeable copy of the original string, I am passing it as std::string. Only if I want to have it mutable, I am passing it as reference (std::string&). So I just have the choice about what I want to do. I can just do this with every possible type.
In Python or in Java, some types are immutable (mostly all primitive types and strings), others are not.
In pure functional languages like Haskell, everything is immutable.
Is there a good reason why it makes sense to have this inconsistency? Or is it just purely for technical lower-level reasons?
What is the point of having some types immutable and others not?
Without some mutable types, you'd have to go the whole hog to pure functional programming -- a completely different paradigm than the OOP and procedural approaches which are currently most popular, and, while extremely powerful, apparently very challenging to a lot of programmers (what happens when you do need side effects in a language where nothing is mutable, and in real-world programming of course you inevitably do, is part of the challenge -- Haskell's Monads are a very elegant approach, for example, but how many programmers do you know that fully and confidently understand them and can use them as well as typical OOP constructs?-).
If you don't understand the enormous value of having multiple paradigms available (both FP one and ones crucially relying on mutable data), I recommend studying Haridi's and Van Roy's masterpiece, Concepts, Techniques, and Models of Computer Programming -- "a SICP for the 21st Century", as I once described it;-).
Most programmers, whether familiar with Haridi and Van Roy or not, will readily admit that having at least some mutable data types is important to them. Despite the sentence I've quoted above from your Q, which takes a completely different viewpoint, I believe that may also be the root of your perplexity: not "why some of each", but rather "why some immutables at all".
The "thoroughly mutable" approach was once (accidentally) obtained in a Fortran implementation. If you had, say,
      SUBROUTINE ZAP(I)
      I = 0
      RETURN
      END
then a program snippet doing, e.g.,
      PRINT *, 23
      CALL ZAP(23)
      PRINT *, 23
would print 23, then 0 -- the number 23 had been mutated, so all references to 23 in the rest of the program would in fact refer to 0. Not a bug in the compiler, technically: Fortran had subtle rules about what your program is and is not allowed to do in passing constants vs variables to procedures that assign to their arguments, and this snippet violates those little-known, non-compiler-enforceable rules, so it's a bug in the program, not in the compiler. In practice, of course, the number of bugs caused this way was unacceptably high, so typical compilers soon switched to less destructive behavior in such situations (putting constants in read-only segments to get a runtime error, if the OS supported that; or, passing a fresh copy of the constant rather than the constant itself, despite the overhead; and so forth) even though technically they were program bugs allowing the compiler to display undefined behavior quite "correctly";-).
The alternative enforced in some other languages is to add the complication of multiple ways of parameter passing -- most notably perhaps in C++, what with by-value, by-reference, by constant reference, by pointer, by constant pointer, ... and then of course you see programmers baffled by declarations such as const foo* const bar (where the rightmost const is basically irrelevant if bar is an argument to some function... but crucial instead if bar is a local variable...!-).
Actually Algol-68 probably went farther along this direction (if you can have a value and a reference, why not a reference to a reference? or reference to reference to reference? &c -- Algol 68 put no limitations on this, and the rules to define what was going on are perhaps the subtlest, hardest mix ever found in an "intended for real use" programming language). Early C (which only had by-value and by-explicit-pointer -- no const, no references, no complications) was no doubt in part a reaction to it, as was the original Pascal. But const soon crept in, and complications started mounting again.
Java and Python (among other languages) cut through this thicket with a powerful machete of simplicity: all argument passing, and all assignment, is "by object reference" (never reference to a variable or other reference, never semantically implicit copies, &c). Defining (at least) numbers as semantically immutable preserves programmers' sanity (as well as this precious aspect of language simplicity) by avoiding "oopses" such as that exhibited by the Fortran code above.
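A Java counterpart of the ZAP experiment shows what "by object reference" plus immutable numbers buys you. The class and method names are invented for the sketch.

```java
class ZapDemo {
    // Assigning to the parameter only rebinds the local name i;
    // the caller's variable and the Integer object are unaffected.
    static void zap(Integer i) {
        i = 0;
    }

    // A mutable container passed the same way CAN be changed through the reference.
    static void zapFirst(int[] cell) {
        cell[0] = 0;
    }

    public static void main(String[] args) {
        Integer n = 23;
        zap(n);
        System.out.println(n);       // 23

        int[] cell = {23};
        zapFirst(cell);
        System.out.println(cell[0]); // 0
    }
}
```

There is no way to write zap so that the caller's 23 becomes 0, because Integer offers no operation that changes its value. Mutation is only possible where the type, like the array here, explicitly allows it.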
Treating strings as primitives just like numbers is quite consistent with the languages' intended high semantic level, because in real life we do need strings that are just as simple to use as numbers; alternatives such as defining strings as lists of characters (Haskell) or as arrays of characters (C) pose challenges to both the compiler (keeping efficient performance under such semantics) and the programmer (effectively ignoring this arbitrary structuring to enable use of strings as simple primitives, as real life programming often requires).
Python went a bit further by adding a simple immutable container (tuple) and tying hashing to "effective immutability" (which avoids certain surprises to the programmer that are found, e.g., in Perl, with its hashes allowing mutable strings as keys) -- and why not? Once you have immutability (a precious concept that saves the programmer from having to learn about N different semantics for assignment and argument passing, with N tending to increase with time;-), you might as well get full mileage out of it;-).
I am not sure if this qualifies as non-technical, nevertheless: if strings are mutable, then most(*) collections need to make private copies of their string keys.
Otherwise a "foo" key changed externally to "bar" would result in "bar" sitting in the internal structures of the collection where "foo" is expected. This way "foo" lookup would find "bar", which is less of a problem (return nothing, reindex the offending key) but "bar" lookup would find nothing, which is a bigger problem.
(*) A dumb collection that does a linear scan of all keys on each lookup would not have to do that, since it would naturally accommodate key changes.
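Java strings are immutable, so to see the problem you need a key type that is not. The MutableKey class below is a made-up stand-in for a mutable string. With java.util.HashMap specifically, the entry becomes unreachable under both the old and the new text, because the map remembers the hash computed at insertion time; other hash table implementations may fail in slightly different ways.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical mutable "string" whose equals/hashCode depend on its current text.
final class MutableKey {
    private String text;

    MutableKey(String text) { this.text = text; }

    void setText(String text) { this.text = text; }

    @Override
    public boolean equals(Object o) {
        return o instanceof MutableKey && Objects.equals(text, ((MutableKey) o).text);
    }

    @Override
    public int hashCode() { return Objects.hashCode(text); }
}

class MutableKeyDemo {
    public static void main(String[] args) {
        Map<MutableKey, Integer> map = new HashMap<>();
        MutableKey key = new MutableKey("foo");
        map.put(key, 1);

        key.setText("bar"); // changed from outside, behind the map's back

        System.out.println(map.get(new MutableKey("foo"))); // null
        System.out.println(map.get(new MutableKey("bar"))); // null
        System.out.println(map.size());                     // 1, the entry is still in there
    }
}
```

To be safe against this, the map would have to copy every key on insertion, which is the cost immutable strings avoid.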
There is no overarching, fundamental reason not to have strings mutable. The best explanation I have found for their immutability is that it promotes a more functional, less side-effectsy way of programming. This ends up being cleaner, more elegant, and more Pythonic.
Semantically, they should be immutable, no? The string "hello" should always represent "hello". You can't change it any more than you can change the number three!
Not sure if you would count this as a 'technical low level' benefit, but the fact that immutable string is implicitly threadsafe saves you a lot of effort of coding for thread safety.
Slightly toy example...
Thread A - Check user with login name FOO has permission to do something, return true
Thread B - Modify user string to login name BAR
Thread A - Perform some operation with login name BAR due to previous permission check passing against FOO.
The fact that the String can't change saves you the effort of guarding against this.
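The interleaving can be replayed in Java without real threads, by running the three steps in order. This is a sketch under stated assumptions: the class, the hasPermission rule ("only FOO is allowed") and the method names are invented, and StringBuilder stands in for a mutable string type.

```java
class LoginCheckDemo {
    // Hypothetical permission check: only the login "FOO" is allowed.
    static boolean hasPermission(CharSequence login) {
        return "FOO".contentEquals(login);
    }

    // Mutable login: the object that was checked is edited before it is used.
    static String actWithMutableLogin() {
        StringBuilder login = new StringBuilder("FOO");
        boolean allowed = hasPermission(login);     // "thread A" checks FOO
        login.replace(0, login.length(), "BAR");    // "thread B" edits the same object
        return allowed ? login.toString() : null;   // "thread A" acts on BAR
    }

    // Immutable login: the other party can only rebind its own variable.
    static String actWithImmutableLogin() {
        String shared = "FOO";
        String seenByA = shared;                    // "thread A" keeps its reference
        boolean allowed = hasPermission(seenByA);   // "thread A" checks FOO
        shared = "BAR";                             // "thread B" points elsewhere
        return allowed ? seenByA : null;            // "thread A" still acts on FOO
    }

    public static void main(String[] args) {
        System.out.println(actWithMutableLogin());   // BAR
        System.out.println(actWithImmutableLogin()); // FOO
    }
}
```

In the mutable case the check passed for one name and the operation ran for another. In the immutable case the value that was checked is necessarily the value that is used.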
If you want full consistency you can only make everything immutable, because mutable Bools or Ints would simply make no sense at all. Some functional languages do that in fact.
Python's philosophy is "Simple is better than complex." In C you need to be aware that strings can change and think about how that can affect you. Python assumes that the default use case for strings is "put text together" - there is absolutely nothing you need to know about strings to do that. But if you want your strings to change, you just have to use a more appropriate type (ie lists, StringIO, templates, etc).
In a language with reference semantics for user-defined types, having mutable strings would be a disaster, because every time you assign a string variable, you would alias a mutable string object, and you would have to do defensive copies all over the place. That's why strings are immutable in Java and C# -- if the string object is immutable, it does not matter how many variables point to it.
Note that in C++, two string variables never share state (at least conceptually -- technically, there might be copy-on-write going on, but that is getting out of fashion due to inefficiencies in multi-threading scenarios).
If strings are mutable, then many consumers of a string will have to make copies of it. If strings are immutable, this is far less important (unless immutability is enforced by hardware interlocks, it might not be a bad idea for some security-conscious consumers of a string to make their own copies in case the strings they're given aren't as immutable as they should be).
The StringBuilder class is pretty good, though I think it would be nicer if it had a "Value" property (read would be equivalent to ToString, but it would show up in object inspectors; write would allow direct setting of the whole content) and a default widening conversion to a string. It would have been nice in theory to have MutableString type descended from a common ancestor with String, so a mutable string could be passed to a function which didn't care whether a string was mutable, though I suspect that optimizations which rely on the fact that Strings have a certain fixed implementation would have been less effective.
The main advantage for the programmer is that with immutable strings, you never need to worry about who might alter your string. Therefore, you never have to consciously decide "Should I copy this string here?".
The advantages of immutable objects in Java seem clear:
consistent state
automatic thread safety
simplicity
You can favour immutability by using private final fields and constructor injection.
But, what are the downsides to favouring immutable objects in Java?
i.e.
incompatibility with ORM or web presentation tools?
Inflexible design?
Implementation complexities?
Is it possible to design a large-scale system (deep object graph) that predominately uses immutable objects?
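As a minimal sketch of the "private final fields and constructor injection" style mentioned above (the Money class and its members are made up for illustration):

```java
// Hypothetical value class: every field is private and final, and all state
// arrives through the constructor.
final class Money {
    private final String currency;
    private final long cents;

    Money(String currency, long cents) {
        this.currency = currency;
        this.cents = cents;
    }

    String currency() {
        return currency;
    }

    long cents() {
        return cents;
    }

    // "Modification" returns a new instance and leaves this one untouched.
    Money plusCents(long extra) {
        return new Money(currency, cents + extra);
    }
}
```

There are no setters, so an instance that was valid when constructed stays valid for its whole lifetime.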
But, what are the downsides to favouring immutable objects in Java?
incompatibility with ORM or web presentation tools?
Reflection-based frameworks are complicated by immutable objects, since they require constructor injection:
there are no default arguments in Java, which forces us to ALWAYS provide all of the necessary dependencies
constructor overloading can be messy
constructor argument names are not usually available through reflection, which forces us to depend on argument order for dependency resolution
Implementation complexities?
Creating immutable objects is still a boring task; the compiler should take care of the implementation details, as in Groovy
Is it possible to design a large-scale system (deep object graph) that predominately uses immutable objects?
Definitely yes; immutable objects make great building blocks for other objects (they favor composition), since it's much easier to maintain the invariant of a complex object when you can rely on its immutable components. The only true downside to me is about creating many temporary objects (e.g. String concat was a problem in the past).
With immutability, any time you need to modify data, you need to create a new object. This can be expensive.
Imagine needing to modify one bit in an object that consumes several megabytes of memory: you would need to instantiate a whole new object, allocate memory, etc. If you need to do this many times, mutability becomes very attractive.
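To make the cost concrete, here is a rough sketch of a naive immutable bitmap. ImmutableBitmap and withBitSet are hypothetical names, and this is the simplest possible implementation, not an optimized one.

```java
import java.util.Arrays;

// Hypothetical immutable bitmap backed by a byte array.
final class ImmutableBitmap {
    private final byte[] bits;

    ImmutableBitmap(int sizeInBytes) {
        this.bits = new byte[sizeInBytes];
    }

    private ImmutableBitmap(byte[] bits) {
        this.bits = bits;
    }

    boolean get(int index) {
        return (bits[index / 8] & (1 << (index % 8))) != 0;
    }

    // Setting one bit copies the entire array: O(n) time and memory per change.
    ImmutableBitmap withBitSet(int index) {
        byte[] copy = Arrays.copyOf(bits, bits.length);
        copy[index / 8] |= (byte) (1 << (index % 8));
        return new ImmutableBitmap(copy);
    }
}
```

With a backing array of several megabytes, every withBitSet call allocates and copies several megabytes, which is the situation where a mutable structure (or a smarter persistent one) starts to look attractive.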
If you go for mutability, you will find that you need to make a defensive copy whenever you pass an object to a method that you don't want to change it, or whenever you return an object that is part of your internal state.
If you really look at programs that make use of mutable objects, you will find that they are open to "attack" through modification of:
objects passed to constructors
objects passed to methods
objects returned from methods.
The issue doesn't show up very often because most programs don't change the data (they are in reality immutable by virtue of them never changing).
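A minimal sketch of the defensive copies this implies, using a hypothetical Appointment class around the mutable java.util.Date. It covers two of the cases listed above: objects passed to constructors and objects returned from methods.

```java
import java.util.Date;

// Hypothetical class holding a mutable java.util.Date.
final class Appointment {
    private final Date start;

    Appointment(Date start) {
        // Copy the argument so later changes by the caller cannot reach us.
        this.start = new Date(start.getTime());
    }

    Date start() {
        // Copy again so the caller cannot modify our internal state.
        return new Date(start.getTime());
    }
}
```

If the field were an immutable type instead (for example java.time.Instant), both copies could be dropped.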
I personally make everything I possibly can final. I probably have 90%-95% of all variables (parameters, local, instance, static, exceptions, etc...) marked as final. There are some cases where it has to be mutable, but in the vast majority of cases it does not.
I think it might depend on your focus. If you are writing libraries for 3rd parties to use you think about this much more than if you are writing an application that only you (or your team) will maintain.
I find that you can write large scale applications using immutable objects for the majority of the system without too much pain.
Fundamentally, in the real world, the state associated with many particular identities will change. If I ask what is "the present position of Joe's Buick", today it might be a location in Seattle, and tomorrow it might be a location in Los Alamos. It would be possible to define and create a GeographicLocation object whose value will always represent the location where Joe's Buick was at some particular moment in time and would never change--if today it represents a spot in Seattle, then it will always do so. Such an object, however, would have no continuing identity as "the present location of Joe's Buick".
It may also be possible to define things so that there is a VehicleLocation object which is connected to Joe's Buick such that the object always represents "the present location of Joe's Buick". Such an object could retain its identity as "the present location of Joe's Buick", even as the car moves around, but would not represent a constant geographical location. Defining "identity" may be tricky if one considers the scenario where Joe sells his Buick to Bob and buys a Ford--should the object track "the present location of Joe's Ford" or "the present location of Bob's Buick"--but in many cases such issues may be avoided by using a data model that guarantees that some aspects of object identity will never change.
It isn't possible for everything about an object to be immutable. If an object is immutable, then it cannot have an immutable identity that encapsulates anything beyond its current state. If an object is mutable, however, it can have an immutable identity whose meaning transcends its present state. In many situations, having an immutable identity is more useful than having an immutable state, and in such situations mutable objects are nearly essential. While it is possible in some cases to "simulate" mutable objects by having an immutable object which would search through the most recent version of an immutable object to find information that may "change" between one version and the next, such approaches are often extremely inefficient. Even if one could magically receive once per minute a bound book that gave the location of every vehicle everywhere, looking up "Joe's Buick" in the book would take a lot longer than merely asking a "present location of Joe's Buick" object which would always know where the car was.
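The two kinds of object can sit side by side. In the sketch below, GeographicLocation and VehicleLocation are the names used above, while the fields and methods (vehicleId, moveTo and so on) are assumptions added for illustration. The class is not thread-safe as written.

```java
// Immutable state: a snapshot that never changes once created.
final class GeographicLocation {
    private final double latitude;
    private final double longitude;

    GeographicLocation(double latitude, double longitude) {
        this.latitude = latitude;
        this.longitude = longitude;
    }

    double latitude() {
        return latitude;
    }

    double longitude() {
        return longitude;
    }
}

// Immutable identity, mutable state: always "the present location of this vehicle".
final class VehicleLocation {
    private final String vehicleId;      // identity, fixed for the object's lifetime
    private GeographicLocation current;  // state, replaced as the vehicle moves

    VehicleLocation(String vehicleId, GeographicLocation initial) {
        this.vehicleId = vehicleId;
        this.current = initial;
    }

    String vehicleId() {
        return vehicleId;
    }

    GeographicLocation current() {
        return current;
    }

    void moveTo(GeographicLocation next) {
        this.current = next;
    }
}
```

A caller that grabs current() holds a snapshot that stays valid forever, while the VehicleLocation itself keeps tracking the car.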
You pretty much answered your own question. The JavaBean specification, I don't believe, mentions anything about immutability, yet JavaBeans are the bread and butter of many Java frameworks.
The concept of immutable types is somewhat uncommon for people used to imperative programming styles. However, for many situations immutability has serious advantages, you named the most important ones already.
There are good ways to implement immutable balanced trees, queues, stacks, deques and other data structures. In fact, many modern programming languages and frameworks support only immutable strings, and sometimes other immutable objects as well, because of their advantages.
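The simplest of these is probably the stack. Below is a minimal sketch of a persistent stack built as a singly linked list; PersistentStack and its method names are made up for illustration and do not come from any particular library.

```java
import java.util.NoSuchElementException;

// Hypothetical persistent (immutable) stack built as a singly linked list.
final class PersistentStack<T> {
    private static final PersistentStack<Object> EMPTY =
            new PersistentStack<>(null, null, 0);

    private final T head;
    private final PersistentStack<T> tail;
    private final int size;

    private PersistentStack(T head, PersistentStack<T> tail, int size) {
        this.head = head;
        this.tail = tail;
        this.size = size;
    }

    @SuppressWarnings("unchecked")
    static <T> PersistentStack<T> empty() {
        return (PersistentStack<T>) EMPTY; // safe: the empty stack holds no elements
    }

    boolean isEmpty() {
        return size == 0;
    }

    int size() {
        return size;
    }

    // O(1): the new stack shares every existing node with the old one.
    PersistentStack<T> push(T value) {
        return new PersistentStack<>(value, this, size + 1);
    }

    T peek() {
        if (isEmpty()) throw new NoSuchElementException("stack is empty");
        return head;
    }

    // O(1): popping just returns the shared tail.
    PersistentStack<T> pop() {
        if (isEmpty()) throw new NoSuchElementException("stack is empty");
        return tail;
    }
}
```

Both push and pop are constant time and never copy existing nodes, so immutability costs very little here.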
With an immutable object, if the value needs to be changed, then it must be replaced with a new instance. Depending on the lifecycle of the object, replacing it with a different instance can potentially increase the tenured (long) garbage collection time. This becomes more critical if the object is kept around in memory long enough to be placed in the tenured generation.
The problem in Java is that one has to live with all those objects, where the class looks like:
class Mutable {
    State1 f1;
    MoreState f2;
    void doSomething() { /* mutates the state, but doesn't document it */ }
    void doSomethingElse() { /* mutates the state heavily, not mentioned in the docs */ }
}
(Note the missing Cloneable interface).
The problem with the garbage collector is not such a big one nowadays. The VMs are happy with short-lived objects.
Advances in Compiler/JIT technology will make it possible, sooner or later, to optimize intermediate temporary object creation away. For example:
BigInteger three = ..., two = ..., i1 = ...;
BigInteger i2 = i1.multiply(three).divide(two);
The JIT could notice that the intermediate object i1.multiply(three) can be reused for the end result and call a variant of the divide method that works on a mutable accumulator.
See Functional Java to attain a comprehensive answer to your question.
Immutability, like every other design pattern, should only be used when you need it. You give the example of thread safety: in a highly threaded application, you could favor immutability over the added expense of making it thread safe yourself.
However, if your design requires objects to be mutable, don't go out of your way to make them immutable, just because "it's a design pattern".
As for your graph, you could choose to make your nodes immutable and let another class take care of the connections between them, or you could make a mutable node that takes care of its own children and has an immutable value class.
Probably the biggest cost of using immutable objects in Java is that future developers won't be expecting it or used to that style. Expect to either document heavily or watch a lot of your objects spawn mutable peers over time.
That being said, the only real technical reason I can think of to avoid immutable objects is GC churn. For most applications, I don't think this is a compelling reason to avoid them.
The biggest thing I've ever done with ~90% immutable objects was a toy Scheme-esque interpreter, so it's certainly possible to do complex Java projects.
With immutable data you don't set things twice; see Haskell and Scala vals (and Clojure, of course).
For example, take a data structure like a tree. When you perform a write operation on the tree, you are in fact adding elements outside of the immutable tree. Once you are done, the tree and the new branch are recombined into a new tree, so you can perform concurrent reads and writes very safely.
In the traditional model, you must lock a value because it could be reset at any time, so you end up with a very hot zone for threads, since they act sequentially there anyway.
With immutable data, you don't set things more than once. It's a whole new way of programming. You may end up using a little more memory, but parallelizing is natural and painless.
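Here is a rough sketch of that "build the new branch outside the tree, then recombine" idea, often called path copying. ImmutableTree is a hypothetical, unbalanced binary search tree of ints, kept deliberately small.

```java
// Hypothetical immutable binary search tree of ints (unbalanced, for brevity).
final class ImmutableTree {
    final int value;
    final ImmutableTree left;
    final ImmutableTree right;

    ImmutableTree(int value, ImmutableTree left, ImmutableTree right) {
        this.value = value;
        this.left = left;
        this.right = right;
    }

    // Returns a new tree containing key. Only the nodes on the search path are
    // copied; every other subtree is shared with the old tree. A null node
    // stands for the empty tree.
    static ImmutableTree insert(ImmutableTree node, int key) {
        if (node == null) {
            return new ImmutableTree(key, null, null);
        }
        if (key < node.value) {
            return new ImmutableTree(node.value, insert(node.left, key), node.right);
        }
        if (key > node.value) {
            return new ImmutableTree(node.value, node.left, insert(node.right, key));
        }
        return node; // already present: reuse the existing tree
    }

    static boolean contains(ImmutableTree node, int key) {
        while (node != null) {
            if (key == node.value) {
                return true;
            }
            node = key < node.value ? node.left : node.right;
        }
        return false;
    }
}
```

Readers holding the old root keep seeing a consistent tree while a writer builds the new one. Publishing the new root to other threads still needs some safe hand-off, for example a volatile field or an AtomicReference.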
As with any tool, you have to know when to use it and when not to.
As Tehblanx points out, if you want to change the state of a variable that holds an immutable object, you have to create a new object, which can be expensive, especially if the object is big and complex. Absolutely true, but that simply means that you have to intelligently decide which objects should be mutable and which should be immutable. If someone is saying that ALL objects should be immutable, well, that's just crazy talk.
I'd tend to say that objects that represent a single logical "fact" should be immutable, while objects that represent multiple facts should be mutable. Like, an Integer or a String should be immutable. A "Customer" object that contains name, address, current amount, date of last purchase, etc should be mutable. Of course I can immediately think of a hundred exceptions to such a general rule. An exception I make all the time is when I have a class that just exists as a wrapper to hold a primitive in some case where a primitive is not legal, like in a collection, but I need to update it constantly.
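The wrapper exception might look something like this sketch. Counter and WordCount are hypothetical names; the point is only that the map stores one mutable holder per key instead of replacing a boxed Integer on every update.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical mutable wrapper around an int, for use where a primitive is not allowed.
final class Counter {
    private int value;

    void increment() {
        value++;
    }

    int get() {
        return value;
    }
}

final class WordCount {
    static Map<String, Counter> count(String[] words) {
        Map<String, Counter> counts = new HashMap<>();
        for (String word : words) {
            // Create the holder on first sight, then update it in place.
            counts.computeIfAbsent(word, k -> new Counter()).increment();
        }
        return counts;
    }
}
```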
In Java, a method can't return multiple objects, like return a, b, c. Returning an array of objects makes the code look ugly. In this situation, I have to pass mutable objects to the method and let it change the states of these objects. However, I don't know whether returning multiple objects is a code smell or not.
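One alternative to passing in mutable objects, offered here only as a sketch, is to return a small immutable result object that bundles the values. MinMax is a made-up example class.

```java
// Hypothetical result type bundling two values that are computed together.
final class MinMax {
    private final int min;
    private final int max;

    MinMax(int min, int max) {
        this.min = min;
        this.max = max;
    }

    int min() {
        return min;
    }

    int max() {
        return max;
    }

    // Computes both values in one pass and returns them as one object.
    static MinMax of(int[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("values must not be empty");
        }
        int min = values[0];
        int max = values[0];
        for (int v : values) {
            if (v < min) min = v;
            if (v > max) max = v;
        }
        return new MinMax(min, max);
    }
}
```

The cost is one extra class per combination of return values, which is a fair amount of boilerplate in Java.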
The answer is none. There are not any good reasons to be mutable.
You do run into problems with lots of frameworks (or framework versions) that require mutable objects in order to work with them (Spring, I am glaring in your direction). As you work with them and fish through the code, you will shake your fist in anger that you need to introduce dirty mutability into an otherwise glorious block of code when it could have been easily avoided.
I'm sure there are limited corner cases (probably more hypothetical than anything) where the overhead of object creation and collection is unacceptable. But I urge the people that would make this argument to look at languages like Scala, where the included collections are immutable by default, and then look at the bevy of performance-critical apps built on top of that concept.
This is of course hyperbole. In reality, you should go with immutability first, see if it causes you any measurable problems, if it does then introduce mutability, but make sure you can prove it solves your problem. Otherwise you've just created liability for no benefit. In doing this I think you'll find objective cases for "Implementation Complexity" and "Inflexibility" very hard to make.
Some implementations of immutable objects have transactional means to update an immutable object, similar to how databases provide safe commits and rollbacks. But, in apparent contrast with many of the answers here, immutable objects are never changed. A typical operation would be:
B = append(A,C)
B is a new object, just like A and C. No modification was made to A or C. Internally, a red-black tree implementation makes such semantics fast enough to be usable.
The downside is that it is not as fast as making the operations in place. But that only compares a single part of the system. When evaluating possible downsides we need to look at the system as a whole. And I personally don't have a clear picture of the entire impact. Although I suspect immutability wins out at the end.
I know some experts contend there is contention at the top level of the red-black tree, and that this has a negative effect on throughput.
My biggest worry with immutable data structures is how to save/reconstitute them. That is, if a class has final fields, I can't instantiate it and then set its fields.
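One way around this, sketched here under the assumption of a simple text format, is to read all saved values first and only then call the constructor. Point, toText and fromText are hypothetical names.

```java
// Hypothetical immutable class that can be saved to and rebuilt from text.
final class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    int x() {
        return x;
    }

    int y() {
        return y;
    }

    // Saving needs only read access to the fields.
    String toText() {
        return x + "," + y;
    }

    // Loading parses every value first, then calls the constructor once.
    static Point fromText(String text) {
        String[] parts = text.split(",");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected \"x,y\" but got: " + text);
        }
        return new Point(Integer.parseInt(parts[0].trim()),
                         Integer.parseInt(parts[1].trim()));
    }
}
```

The final fields are never set after construction; the factory method simply delays construction until all the data is available.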