Related
Java programs can be very memory hungry. For example, a Double object has 24 bytes: 8 bytes of data and 16 bytes of JVM-imposed overhead. In general, the objects that represent the primitive types are very expensive.
The same happens for any collection in the Java Standard Library. There are even some counterintuitive facts such as a HashSet being more memory hungry than a HashMap, since a HashSet contains a HashMap inside (http://docs.oracle.com/javase/7/docs/api/java/util/HashSet.html).
Could you come up with some advice when modeling data and delegation of objects in high performance settings so that these "weaknesses" of Java are mitigated?
Some techniques I use to reduce memory:
Make your own IntArrayList (etc) class that prevents boxing
Make your own IntHashMap (etc) class where keys are primitives
Use nio's ByteBuffer to store large arrays of data efficiently (and in native memory, outside heap). It's like a byte array but contains methods to store/retrieve all primitive types from the buffer at any arbitrary offset (trade memory for speed)
Don't use pooling because pools keep unused instances explicitly alive.
Use threads scarcely, they're super memory hungry (in native memory, outside heap)
When making substrings of big strings, and discarding the original, the substrings still refer to the original. So use new String to dispose of the old big string.
A linear array is smaller than a multidimensional array, and if the size of all but the last dimension is a power of two, calculating indices is fastest: array[x|y<<4] for a 16xN array.
Initialize collections and StringBuilder with an initial capacity chosen such that it prevents internal reallocation in a typical circumstance.
Use StringBuilder instead of string concatenation, because the compiled class files use new StringBuilder() without initial capacity to concatenate strings.
Depends on the application, but generally speaking
Layout data structures in (parallel) arrays of primitives
Try to make big "flat" objects, inlining otherwise sensible sub-structures
Specialize collections of primitives
Reuse objects, use object pools, ThreadLocals
Go off-heap
I cannot say these practices are "best", because they, unfortunately, make you suffer, losing the point why you are using Java, reduce flexibility, supportability, reliability, testability and other "good" properties of the codebase.
But, they certainly allow to lower memory footprint and GC pressure.
One of the memory problems that are easy to overlook in Java is memory leakage. Nicholas Greene already pointed you to memory profiling.
Many people assume that Java's garbage collection prevents memory leaks, but that is not actually true - all it takes is one forgotten reference somewhere to keep an object around in perpetuity. Paradoxically, trying to optimize your program may introduce more opportunities for memory leaks because you end up with more complex data structures.
One example for a memory leak if you are implementing, for instance, a stack:
Integer stack[];
stack = new Integer[10];
int stackPtr = 0;
// a few push operation on our stack.
stack[stackPtr++] = new Integer(5);
stack[stackPtr++] = new Integer(3);
// and pop from the stack again
--stackPtr;
--stackPtr;
// at this point, the stack is logically empty, but
// the Integer objects are still referenced by the array,
// and are basically leaked.
The correct solution would have been:
stack[--stackPtr] = null;
If you have high performance constraints and need to use collections for simple types, you might take a look on some implementations of Primitive Collections for Java.
Some are:
HPPC
GNU Trove
Apache Commons Primitives
Also, as a reference take a look at this question: Why can Java Collections not directly store Primitives types?
Luís Bianchin already gave you a few libraries which implement optimal collections in Java.
Nevertheless, it seems that you are specially concerned about Java collections' memory allocation. In that case, there are a few alternatives which are quite straight forward.
Cache
You could use a cache to limit the memory the collection (the cache) can allocate. By doing that, you only load in main memory the most frequently used entries and you don't need to load the whole data set form disk/network/whatever. I highly recommend Guava Cache as it's very well documented and pretty mature.
Persistent Collections
Sometimes a cache is not a solution for your problem. For example, in an ETL solution, you might know you will only load each entry once. For this scenario I recommend to go for persistent collections. These are disk stored collections that are way faster than traditional databases but have nice Java APIs.
MapDB and PCollections are for me the best libraries.
Profile memory usage
On top of that, if you really want to know the actual state of your program's memory allocation I highly recommend you to use a profiler. This way you will not only know how much memory you collections occupy, but also how the GC behaves over time.
In fact, you should only try an alternative to Java's collections and data structures if there is an actual memory problem, and that is something a profiler can tell you.
The JDK has a profiler called VisualVM which does a great job. Nevertheless, I recommend you to use a commercial profiler if you can afford it. The commercial profilers usually have a low impact in the application's performance when compared to VisualVM.
Memory optimal data is nice with the network.
Finally, that it's not strictly related to your question, but it's closely connected. In case you want to serialize your Java objects into an optimal binary representation I recommend you Google Protocol Buffers in Java. Protocol buffers are ideal to transfer data structures thought the network using the least bandwidth possible and having a really fast coding/decoding.
Well there is a lot of things you can do.
Here are a few problems and solutions:
When you change the value of a string in java, the string is not actually overwritten. Instead, a new string is created to replace the old one. However, the old string still exists. This can be a problem when using RAM efficiently is a concern. Here are some solutions to this problem:
When using a string to specify something like the "state" of an object or anything else that can only have a specific set of possible values, don't use a string. Instead use an enum. If you don't know what an enum is or how to use one yet, here's a link to a tutorial on what enums are and how to use them!
If you are using a string as a variable who's value will change at some point in the program, don't define a string how you usually would. Instead, use the StringBuilder class from the java.lang package. StringBuilder is a class which is used to create strings and change their values. This class handles strings differently than usual. When it is used to change the value of a string, StringBuilder doesn't create a duplicate string with a different value to replace the old string, it actually changes the value of the original string. Therefore, since you aren't creating duplicate strings, this saves RAM. Here is a link to to the StringBuilder class in the java api.
Writer and reader objects such as fileWriters and fileReaders also take up RAM. If you have a lot of them, this can also cause problems. Here are some solutions:
All reader and writer objects have a method called close(). As you can probably guess, it closes the writer or reader object. All it does is get rid of the reader or writer object. Whenever you have a reader or writer object and you reach the point in your code when you know you will never use the reader or writer object anymore, use this method. It will get rid of the reader or writer object and will free some RAM.
Every object in java takes up memory. When you have an object that you won't use anymore, it's not very convenient to keep it around.
The Object class has a method called finalize(). This method has the same effect as the close() method in reader and writer objects. When you aren't going to use an object anymore, use the finalize() method to get rid of it and free some RAM.
Beware of early optimisation.
See When is optimisation premature?
While not knowing the exact requirements of your application or runtime environment, in my experience java was able to handle anything I threw it at. Doing some profiling on your demo /proof of concept app might be time well spent if performance or garbage collection (you tagged memory leaks) is an issue.
I'm trying to get my head around mutable vs immutable objects. Using mutable objects gets a lot of bad press (e.g. returning an array of strings from a method) but I'm having trouble understanding what the negative impacts are of this. What are the best practices around using mutable objects? Should you avoid them whenever possible?
Well, there are a few aspects to this.
Mutable objects without reference-identity can cause bugs at odd times. For example, consider a Person bean with a value-based equals method:
Map<Person, String> map = ...
Person p = new Person();
map.put(p, "Hey, there!");
p.setName("Daniel");
map.get(p); // => null
The Person instance gets "lost" in the map when used as a key because its hashCode and equality were based upon mutable values. Those values changed outside the map and all of the hashing became obsolete. Theorists like to harp on this point, but in practice I haven't found it to be too much of an issue.
Another aspect is the logical "reasonability" of your code. This is a hard term to define, encompassing everything from readability to flow. Generically, you should be able to look at a piece of code and easily understand what it does. But more important than that, you should be able to convince yourself that it does what it does correctly. When objects can change independently across different code "domains", it sometimes becomes difficult to keep track of what is where and why ("spooky action at a distance"). This is a more difficult concept to exemplify, but it's something that is often faced in larger, more complex architectures.
Finally, mutable objects are killer in concurrent situations. Whenever you access a mutable object from separate threads, you have to deal with locking. This reduces throughput and makes your code dramatically more difficult to maintain. A sufficiently complicated system blows this problem so far out of proportion that it becomes nearly impossible to maintain (even for concurrency experts).
Immutable objects (and more particularly, immutable collections) avoid all of these problems. Once you get your mind around how they work, your code will develop into something which is easier to read, easier to maintain and less likely to fail in odd and unpredictable ways. Immutable objects are even easier to test, due not only to their easy mockability, but also the code patterns they tend to enforce. In short, they're good practice all around!
With that said, I'm hardly a zealot in this matter. Some problems just don't model nicely when everything is immutable. But I do think that you should try to push as much of your code in that direction as possible, assuming of course that you're using a language which makes this a tenable opinion (C/C++ makes this very difficult, as does Java). In short: the advantages depend somewhat on your problem, but I would tend to prefer immutability.
Immutable Objects vs. Immutable Collections
One of the finer points in the debate over mutable vs. immutable objects is the possibility of extending the concept of immutability to collections. An immutable object is an object that often represents a single logical structure of data (for example an immutable string). When you have a reference to an immutable object, the contents of the object will not change.
An immutable collection is a collection that never changes.
When I perform an operation on a mutable collection, then I change the collection in place, and all entities that have references to the collection will see the change.
When I perform an operation on an immutable collection, a reference is returned to a new collection reflecting the change. All entities that have references to previous versions of the collection will not see the change.
Clever implementations do not necessarily need to copy (clone) the entire collection in order to provide that immutability. The simplest example is the stack implemented as a singly linked list and the push/pop operations. You can reuse all of the nodes from the previous collection in the new collection, adding only a single node for the push, and cloning no nodes for the pop. The push_tail operation on a singly linked list, on the other hand, is not so simple or efficient.
Immutable vs. Mutable variables/references
Some functional languages take the concept of immutability to object references themselves, allowing only a single reference assignment.
In Erlang this is true for all "variables". I can only assign objects to a reference once. If I were to operate on a collection, I would not be able to reassign the new collection to the old reference (variable name).
Scala also builds this into the language with all references being declared with var or val, vals only being single assignment and promoting a functional style, but vars allowing a more C-like or Java-like program structure.
The var/val declaration is required, while many traditional languages use optional modifiers such as final in java and const in C.
Ease of Development vs. Performance
Almost always the reason to use an immutable object is to promote side effect free programming and simple reasoning about the code (especially in a highly concurrent/parallel environment). You don't have to worry about the underlying data being changed by another entity if the object is immutable.
The main drawback is performance. Here is a write-up on a simple test I did in Java comparing some immutable vs. mutable objects in a toy problem.
The performance issues are moot in many applications, but not all, which is why many large numerical packages, such as the Numpy Array class in Python, allow for In-Place updates of large arrays. This would be important for application areas that make use of large matrix and vector operations. This large data-parallel and computationally intensive problems achieve a great speed-up by operating in place.
Immutable objects are a very powerful concept. They take away a lot of the burden of trying to keep objects/variables consistent for all clients.
You can use them for low level, non-polymorphic objects - like a CPoint class - that are used mostly with value semantics.
Or you can use them for high level, polymorphic interfaces - like an IFunction representing a mathematical function - that is used exclusively with object semantics.
Greatest advantage: immutability + object semantics + smart pointers make object ownership a non-issue, all clients of the object have their own private copy by default. Implicitly this also means deterministic behavior in the presence of concurrency.
Disadvantage: when used with objects containing lots of data, memory consumption can become an issue. A solution to this could be to keep operations on an object symbolic and do a lazy evaluation. However, this can then lead to chains of symbolic calculations, that may negatively influence performance if the interface is not designed to accommodate symbolic operations. Something to definitely avoid in this case is returning huge chunks of memory from a method. In combination with chained symbolic operations, this could lead to massive memory consumption and performance degradation.
So immutable objects are definitely my primary way of thinking about object-oriented design, but they are not a dogma.
They solve a lot of problems for clients of objects, but also create many, especially for the implementers.
Check this blog post: http://www.yegor256.com/2014/06/09/objects-should-be-immutable.html. It explains why immutable objects are better than mutable. In short:
immutable objects are simpler to construct, test, and use
truly immutable objects are always thread-safe
they help to avoid temporal coupling
their usage is side-effect free (no defensive copies)
identity mutability problem is avoided
they always have failure atomicity
they are much easier to cache
You should specify what language you're talking about. For low-level languages like C or C++, I prefer to use mutable objects to conserve space and reduce memory churn. In higher-level languages, immutable objects make it easier to reason about the behavior of the code (especially multi-threaded code) because there's no "spooky action at a distance".
A mutable object is simply an object that can be modified after it's created/instantiated, vs an immutable object that cannot be modified (see the Wikipedia page on the subject). An example of this in a programming language is Pythons lists and tuples. Lists can be modified (e.g., new items can be added after it's created) whereas tuples cannot.
I don't really think there's a clearcut answer as to which one is better for all situations. They both have their places.
Shortly:
Mutable instance is passed by reference.
Immutable instance is passed by value.
Abstract example. Lets suppose that there exists a file named txtfile on my HDD. Now, when you are asking me to give you the txtfile file, I can do it in the following two modes:
I can create a shortcut to the txtfile and pass shortcut to you, or
I can do a full copy of the txtfile file and pass copied file to you.
In the first mode, the returned file represents a mutable file, because any change into the shortcut file will be reflected into the original one as well, and vice versa.
In the second mode, the returned file represents an immutable file, because any change into the copied file will not be reflected into the original one, and vice versa.
If a class type is mutable, a variable of that class type can have a number of different meanings. For example, suppose an object foo has a field int[] arr, and it holds a reference to a int[3] holding the numbers {5, 7, 9}. Even though the type of the field is known, there are at least four different things it can represent:
A potentially-shared reference, all of whose holders care only that it encapsulates the values 5, 7, and 9. If foo wants arr to encapsulate different values, it must replace it with a different array that contains the desired values. If one wants to make a copy of foo, one may give the copy either a reference to arr or a new array holding the values {1,2,3}, whichever is more convenient.
The only reference, anywhere in the universe, to an array which encapsulates the values 5, 7, and 9. set of three storage locations which at the moment hold the values 5, 7, and 9; if foo wants it to encapsulate the values 5, 8, and 9, it may either change the second item in that array or create a new array holding the values 5, 8, and 9 and abandon the old one. Note that if one wanted to make a copy of foo, one must in the copy replace arr with a reference to a new array in order for foo.arr to remain as the only reference to that array anywhere in the universe.
A reference to an array which is owned by some other object that has exposed it to foo for some reason (e.g. perhaps it wants foo to store some data there). In this scenario, arr doesn't encapsulate the contents of the array, but rather its identity. Because replacing arr with a reference to a new array would totally change its meaning, a copy of foo should hold a reference to the same array.
A reference to an array of which foo is the sole owner, but to which references are held by other object for some reason (e.g. it wants to have the other object to store data there--the flipside of the previous case). In this scenario, arr encapsulates both the identity of the array and its contents. Replacing arr with a reference to a new array would totally change its meaning, but having a clone's arr refer to foo.arr would violate the assumption that foo is the sole owner. There is thus no way to copy foo.
In theory, int[] should be a nice simple well-defined type, but it has four very different meanings. By contrast, a reference to an immutable object (e.g. String) generally only has one meaning. Much of the "power" of immutable objects stems from that fact.
Mutable collections are in general faster than their immutable counterparts when used for in-place
operations.
However, mutability comes at a cost: you need to be much more careful sharing them between
different parts of your program.
It is easy to create bugs where a shared mutable collection is updated
unexpectedly, forcing you to hunt down which line in a large codebase is performing the unwanted update.
A common approach is to use mutable collections locally within a function or private to a class where there
is a performance bottleneck, but to use immutable collections elsewhere where speed is less of a concern.
That gives you the high performance of mutable collections where it matters most, while not sacrificing
the safety that immutable collections give you throughout the bulk of your application logic.
If you return references of an array or string, then outside world can modify the content in that object, and hence make it as mutable (modifiable) object.
Immutable means can't be changed, and mutable means you can change.
Objects are different than primitives in Java. Primitives are built in types (boolean, int, etc) and objects (classes) are user created types.
Primitives and objects can be mutable or immutable when defined as member variables within the implementation of a class.
A lot of people people think primitives and object variables having a final modifier infront of them are immutable, however, this isn't exactly true. So final almost doesn't mean immutable for variables. See example here
http://www.siteconsortium.com/h/D0000F.php.
General Mutable vs Immutable
Unmodifiable - is a wrapper around modifiable. It guarantees that it can not be changed directly(but it is possibly using backing object)
Immutable - state of which can not be changed after creation. Object is immutable when all its fields are immutable. It is a next step of Unmodifiable object
Thread safe
The main advantage of Immutable object is that it is a naturally for concurrent environment. The biggest problem in concurrency is shared resource which can be changed any of thread. But if an object is immutable it is read-only which is thread safe operation. Any modification of an original immutable object return a copy
source of truth, side-effects free
As a developer you are completely sure that immutable object's state can not be changed from any place(on purpose or not). For example if a consumer uses immutable object he is able to use an original immutable object
compile optimisation
Improve performance
Disadvantage:
Copying of object is more heavy operation than changing a mutable object, that is why it has some performance footprint
To create an immutable object you should use:
1. Language level
Each language contains tools to help you with it. For example:
Java has final and primitives
Swift has let and struct[About].
Language defines a type of variable. For example:
Java has primitive and reference type,
Swift has value and reference type[About].
For immutable object more convenient is primitives and value type which make a copy by default. As for reference type it is more difficult(because you are able to change object's state out of it) but possible. For example you can use clone pattern on a developer level to make a deep(instead of shallow) copy.
2. Developer level
As a developer you should not provide an interface for changing state
[Swift] and [Java] immutable collection
I am wondering about the benefits of having the string-type immutable from the programmers point-of-view.
Technical benefits (on the compiler/language side) can be summarized mostly that it is easier to do optimisations if the type is immutable. Read here for a related question.
Also, in a mutable string type, either you have thread-safety already built-in (then again, optimisations are harder to do) or you have to do it yourself. You will have in any case the choice to use a mutable string type with built-in thread safety, so that is not really an advantage of immutable string-types. (Again, it will be easier to do the handling and optimisations to ensure thread-safety on the immutable type but that is not the point here.)
But what are the benefits of immutable string-types in the usage? What is the point of having some types immutable and others not? That seems very inconsistent to me.
In C++, if I want to have some string to be immutable, I am passing it as const reference to a function (const std::string&). If I want to have a changeable copy of the original string, I am passing it as std::string. Only if I want to have it mutable, I am passing it as reference (std::string&). So I just have the choice about what I want to do. I can just do this with every possible type.
In Python or in Java, some types are immutable (mostly all primitive types and strings), others are not.
In pure functional languages like Haskell, everything is immutable.
Is there a good reason why it make sense to have this inconsistency? Or is it just purely for technical lower level reasons?
What is the point of having some
types immutable and others not?
Without some mutable types, you'd have to go the whole hog to pure functional programming -- a completely different paradigm than the OOP and procedural approaches which are currently most popular, and, while extremely powerful, apparently very challenging to a lot of programmers (what happens when you do need side effects in a language where nothing is mutable, and in real-world programming of course you inevitably do, is part of the challenge -- Haskell's Monads are a very elegant approach, for example, but how many programmers do you know that fully and confidently understand them and can use them as well as typical OOP constructs?-).
If you don't understand the enormous value of having multiple paradigms available (both FP one and ones crucially relying on mutable data), I recommend studying Haridi's and Van Roy's masterpiece, Concepts, Techniques, and Models of Computer Programming -- "a SICP for the 21st Century", as I once described it;-).
Most programmers, whether familiar with Haridi and Van Roy or not, will readily admit that having at least some mutable data types is important to them. Despite the sentence I've quoted above from your Q, which takes a completely different viewpoint, I believe that may also be the root of your perplexity: not "why some of each", but rather "why some immutables at all".
The "thoroughly mutable" approach was once (accidentally) obtained in a Fortran implementation. If you had, say,
SUBROUTINE ZAP(I)
I = 0
RETURN
then a program snippet doing, e.g.,
PRINT 23
ZAP(23)
PRINT 23
would print 23, then 0 -- the number 23 had been mutated, so all references to 23 in the rest of the program would in fact refer to 0. Not a bug in the compiler, technically: Fortran had subtle rules about what your program is and is not allowed to do in passing constants vs variables to procedures that assign to their arguments, and this snippet violates those little-known, non-compiler-enforceable rules, so it's a but in the program, not in the compiler. In practice, of course, the number of bugs caused this way was unacceptably high, so typical compilers soon switched to less destructive behavior in such situations (putting constants in read-only segments to get a runtime error, if the OS supported that; or, passing a fresh copy of the constant rather than the constant itself, despite the overhead; and so forth) even though technically they were program bugs allowing the compiler to display undefined behavior quite "correctly";-).
The alternative enforced in some other languages is to add the complication of multiple ways of parameter passing -- most notably perhaps in C++, what with by-value, by-reference, by constant reference, by pointer, by constant pointer, ... and then of course you see programmers baffled by declarations such as const foo* const bar (where the rightmost const is basically irrelevant if bar is an argument to some function... but crucial instead if bar is a local variable...!-).
Actually Algol-68 probably went farther along this direction (if you can have a value and a reference, why not a reference to a reference? or reference to reference to reference? &c -- Algol 68 put no limitations on this, and the rules to define what was going on are perhaps the subtlest, hardest mix ever found in an "intended for real use" programming language). Early C (which only had by-value and by-explicit-pointer -- no const, no references, no complications) was no doubt in part a reaction to it, as was the original Pascal. But const soon crept in, and complications started mounting again.
Java and Python (among other languages) cut through this thicket with a powerful machete of simplicity: all argument passing, and all assignment, is "by object reference" (never reference to a variable or other reference, never semantically implicit copies, &c). Defining (at least) numbers as semantically immutable preserves programmers' sanity (as well as this precious aspect of language simplicity) by avoiding "oopses" such as that exhibited by the Fortran code above.
Treating strings as primitives just like numbers is quite consistent with the languages' intended high semantic level, because in real life we do need strings that are just as simple to use as numbers; alternatives such as defining strings as lists of characters (Haskell) or as arrays of characters (C) poses challenges to both the compiler (keeping efficient performance under such semantics) and the programmer (effectively ignoring this arbitrary structuring to enable use of strings as simple primitives, as real life programming often requires).
Python went a bit further by adding a simple immutable container (tuple) and tying hashing to "effective immutability" (which avoids certain surprises to the programmer that are found, e.g., in Perl, with its hashes allowing mutable strings as keys) -- and why not? Once you have immutability (a precious concept that saves the programmer from having to learn about N different semantics for assignment and argument passing, with N tending to increase with time;-), you might as well get full mileage out of it;-).
I am not sure if this qualifies as non-technical, nevertheless: if strings are mutable, then most(*) collections need to make private copies of their string keys.
Otherwise a "foo" key changed externally to "bar" would result in "bar" sitting in the internal structures of the collection where "foo" is expected. This way "foo" lookup would find "bar", which is less of a problem (return nothing, reindex the offending key) but "bar" lookup would find nothing, which is a bigger problem.
(*) A dumb collection that does a linear scan of all keys on each lookup would not have to do that, since it would naturally accomodate key changes.
There is no overarching, fundamental reason not to have strings mutable. The best explanation I have found for their immutability is that it promotes a more functional, less side-effectsy way of programming. This ends up being cleaner, more elegant, and more Pythonic.
Semantically, they should be immutable, no? The string "hello" should always represent "hello". You can't change it any more than you can change the number three!
Not sure if you would count this as a 'technical low level' benefit, but the fact that immutable string is implicitly threadsafe saves you a lot of effort of coding for thread safety.
Slightly toy example...
Thread A - Check user with login name FOO has permission to do something, return true
Thread B - Modify user string to login name BAR
Thread A - Perform some operation with login name BAR due to previous permission check passing against FOO.
The fact that the String can't change saves you the effort of guarding against this.
If you want full consistency you can only make everything immutable, because mutable Bools or Ints would simply make no sense at all. Some functional languages do that in fact.
Python's philosophy is "Simple is better than complex." In C you need to be aware that strings can change and think about how that can affect you. Python assumes that the default use case for strings is "put text together" - there is absolutely nothing you need to know about strings to do that. But if you want your strings to change, you just have to use a more appropriate type (ie lists, StringIO, templates, etc).
In a language with reference semantics for user-defined types, having mutable strings would be a desaster, because every time you assign a string variable, you would alias a mutable string object, and you would have to do defensive copies all over the place. That's why strings are immutable in Java and C# -- if the string object is immutable, it does not matter how many variables point to it.
Note that in C++, two string variables never share state (at least conceptionally -- technically, there might be copy-on-write going on, but that is getting out of fashion due to inefficiencies in multi-threading scenarios).
If strings are mutable, then many consumers of a string will have to to make copies of it. If strings are immutable, this is far less important (unless immutability is enforced by hardware interlocks, it might not be a bad idea for some security-conscious consumers of a string to make their own copies in case the strings they're given aren't as immutable as they should be).
The StringBuilder class is pretty good, though I think it would be nicer if it had a "Value" property (read would be equivalent to ToString, but it would show up in object inspectors; write would allow direct setting of the whole content) and a default widening conversion to a string. It would have been nice in theory to have MutableString type descended from a common ancestor with String, so a mutable string could be passed to a function which didn't care whether a string was mutable, though I suspect that optimizations which rely on the fact that Strings have a certain fixed implementation would have been less effective.
The main advantage for the programmer is that with mutable strings, you never need to worry about who might alter your string. Therefore, you never have to consciously decide "Should I copy this string here?".
In class today, my professor was discussing how to structure a class. The course primarily uses Java and I have more Java experience than the teacher (he comes from a C++ background), so I mentioned that in Java one should favor immutability. My professor asked me to justify my answer, and I gave the reasons that I've heard from the Java community:
Safety (especially with threading)
Reduced object count
Allows certain optimizations (especially for garbage collector)
The professor challenged my statement by saying that he'd like to see some statistical measurement of these benefits. I cited a wealth of anecdotal evidence, but even as I did so, I realized he was right: as far as I know, there hasn't been an empirical study of whether immutability actually provides the benefits it promises in real-world code. I know it does from experience, but others' experiences may differ.
So, my question is, have there been any statistical studies done on the effects of immutability in real-world code?
I would point to Item 15 in Effective Java. The value of immutability is in the design (and it isn't always appropriate - it is just a good first approximation) and design preferences are rarely argued from a statistical point of view, but we have seen mutable objects (Calendar, Date) that have gone really bad, and serious replacements (JodaTime, JSR-310) have opted for immutability.
The biggest advantage of immutability in Java, in my opinion, is simplicity. It becomes much simpler to reason about the state of an object, if that state cannot change. This is of course even more important in a multi-threaded environment, but even in simple, linear single-threaded programs it can make things far easier to understand.
See this page for more examples.
So, my question is, have there been
any statistical studies done on the
effects of immutability in real-world
code?
I'd argue that your professor is just being obtuse -- not necessarily intentionally or even a bad thing. Its just that the question is too vague. Two real problems with the question:
"Statistical studies on the effect of [x]" doesn't really mean anything if you don't specify what kind of measurements you're looking for.
"Real-world code" doesn't really mean anything unless you state a specific domain. Real world code includes scientific computing, game development, blog engines, automated proof generators, stored procedures, operating system kernals, etc
For what its worth, the ability for the compiler to optimize immutable objects is well-documented. Off the top of my head:
The Haskell compiler performs deforestation (also called short-cut fusion), where Haskell will transform the expression map f . map g to map f . g. Since Haskell functions are immutable, these expressions are guaranteed to produce equivalent output, but the second function runs twice as fast since we don't need to create an intermediate list.
Common subexpression elimination where we could convert x = foo(12); y = foo(12) to temp = foo(12); x = temp; y = temp; is only possible if the compiler can guarantee foo is a pure function. To my knowledge, the D compiler can perform substitutions like this using the pure and immutable keywords. If I remember correctly, some C and C++ compilers will aggressively optimize calls to these functions marked "pure" (or whatever the equivalent keyword is).
So long as we don't have mutable state, a sufficiently smart compiler can execute linear blocks of code multiple threads with a guarantee that we won't corrupt the state of variables in another thread.
Regarding concurrency, the pitfalls of concurrency using mutable state are well-documented and don't need to be restated.
Sure, this is all anecdotal evidence, but that's pretty much the best you'll get. The immutable vs mutable debate is largely a pissing match, and you are not going to find a paper making a sweeping generalization like "functional programming is superior to imperative programming".
At most, you'll probably find that you can summarize the benefits of immutable vs mutable in a set of best practices rather than as codified studies and statistics. For example, mutable state is the enemy of multithreaded programming; on the other hand, mutable queues and arrays are often easier to write and more efficient in practice than their immutable variants.
It takes practice, but eventually you learn to use the right tool for the job, rather than shoehorning your favorite pet paradigm into project.
I think your professor's being overly stubborn (probably deliberately, to push you to a fuller understanding). Really the benefits of immutability are not so much what the complier can do with optimisations, but really that it's much easier for us humans to read and understand. A variable that is guaranteed to be set when the object is created and is guaranteed not to change afterwards, is much easier to grok and reason with than one which is this value now but might be set to some other value later.
This is especially true with threading, in that you don't need to worry about processor caches and monitors and all that boilerplate that comes with avoiding concurrent modifications, when the language guarantees that no such modification can possibly occur.
And once you express the benefits of immutability as "the code is easier to follow", it feels a bit sillier to ask for empirical measurements of productivity increases vis-a-vis "easier-to-followness".
On the other hand, the compiler and Hotspot can probably perform certain optimisations based on knowing that a value can never change - like you I have a feeling that this would take place and is a good things but I'm not sure of the details. It's a lot more likely that there will be empirical data for the types of optimisation that can occur, and how much faster the resulting code is.
Don't argue with the prof. You have nothing to gain.
These are open questions, like dynamic vs static typing. We sometimes think functional techniques involving immutable data are better for various reasons, but it's mostly a matter of style so far.
What would you objectively measure? GC and object count could be measured with mutable/immutable versions of the same program (although how typical that would be would be subjective, so this is a pretty weak argument). I can't imagine how you could measure the removal of threading bugs, except maybe anecdotally by comparison with a real world example of a production application plagued by intermittent issues fixed by adding immutability.
Immutability is a good thing for value objects. But how about other things? Imagine an object that creates a statistic:
Stats s = new Stats ();
... some loop ...
s.count ();
s.end ();
s.print ();
which should print "Processed 536.21 rows/s". How do you plan to implement count() with an immutable? Even if you use an immutable value object for the counter itself, s can't be immutable since it would have to replace the counter object inside of itself. The only way out would be:
s = s.count ();
which means to copy the state of s for every round in the loop. While this can be done, it surely isn't as efficient as incrementing the internal counter.
Moreover, most people would fail to use this API right because they would expect count() to modify the state of the object instead of returning a new one. So in this case, it would create more bugs.
As other comments have claimed, it would be very, very hard to collect statistics on the merits of immutable objects, because it would be virtually impossible to find control cases - pairs of software applications which are alike in every way, except that one uses immutable objects and the other does not. (In nearly every case, I would claim that one version of that software was written some time after the other, and learned numerous lessons from the first, and so improvements in performance will have many causes.) Any experienced programmer who thinks about this for a moment ought to realize this. I think your professor is trying to deflect your suggestion.
Meanwhile, it is very easy to make cogent arguments in favor of immutability, at least in Java, and probably in C# and other OO languages. As Yishai states, Effective Java makes this argument well. So does the copy of Java Concurrency in Practice sitting on my bookshelf.
Immutable objects allow code which to share an object's value by sharing a reference. Mutable objects, however, have the identity that code which wants to share an object's identity to do so by sharing a reference. Both kinds of sharing are essential in most applications. If one doesn't have immutable objects available, it's possible to share values by copying them into either new objects or objects supplied by the intended recipient of those values. Getting my without mutable objects is much harder. One could somewhat "fake" mutable objects by saying stateOfUniverse = stateOfUniverse.withSomeChange(...), but would requires that nothing else modify stateOfUniverse while its withSomeChange method is running [precluding any sort of multi-threading]. Further, if one were e.g. trying to track a fleet of trucks, and part of the code was interested in one particular truck, it would be necessary for that code to always look up that truck in a table of trucks any time it might have changed.
A better approach is to subdivide the universe into entities and values. Entities would have changeable characteristics, but an immutable identity, so a storage location of e.g. type Truck could continue to identify the same truck even as the truck itself changes position, loads and unloads cargo, etc. Values would not have generally have a particular identity, but would have immutable characteristics. A Truck might store its location as type WorldCoordinate. A WorldCoordinate that represents 45.6789012N 98.7654321W would continue to so as long as any reference to it exists; if a truck that was at that location moved north slightly, it would create a new WorldCoordinate to represent 45.6789013N 98.7654321W, abandon the old one, and store a reference to that new one.
It is generally easiest to reason about code when everything encapsulates either an immutable value or an immutable identity, and when the things which are supposed to have an immutable identity are mutable. If one didn't want to use any mutable objects outside a variable stateOfUniverse, updating a truck's position would require something like:
ImmutableMapping<int,Truck> trucks = stateOfUniverse.getTrucks();
Truck myTruck = trucks.get(myTruckId);
myTruck = myTruck.withLocation(newLocation);
trucks = trucks.withItem(myTruckId,myTruck);
stateOfUniverse = stateOfUniverse.withTrucks(trucks);
but reasoning about that code would be more difficult than would be:
myTruck.setLocation(newLocation);
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 4 years ago.
Improve this question
The advantages of immutable objects in Java seem clear:
consistent state
automatic thread safety
simplicity
You can favour immutability by using private final fields and constructor injection.
But, what are the downsides to favouring immutable objects in Java?
i.e.
incompatibility with ORM or web presentation tools?
Inflexible design?
Implementation complexities?
Is it possible to design a large-scale system (deep object graph) that predominately uses immutable objects?
But, what are the downsides to
favouring immutable objects in Java?
incompatibility with ORM or web
presentation tools?
Reflection based frameworks are complicated by immutable objects since they requires constructor injection:
there are no default arguments in Java, which forces us to ALWAYS provide all of the necessary dependencies
constructor overriding can be messy
constructor argument names are not usually available through reflection, which forces us to depend on argument order for dependency resolution
Implementation complexities?
Creating immutable objects is still a boring task; the compiler should take care of the implementation details, as in groovy
Is it possible to design a large-scale system (deep object graph) that predominately uses immutable objects?
definitely yes; immutable objects makes great building blocks for other objects (they favor composition) since it's much easier to maintain the invariant of a complex object when you can rely on its immutable components. The only true downside to me is about creating many temporary objects (e.g. String concat was a problem in the past).
With immutability, any time you need to modify data, you need to create a new object. This can be expensive.
Imagine needing to modify one bit in an object that consumes several megabytes of memory: you would need to instantiate a whole new object, allocate memory, etc. If you need to do this many times, mutability becomes very attractive.
If you go for mutability then you will find that whenever you need to call a method that you don't want to have the object change, or you need to return an object that is part of the internal state, you need to make a defensive copy.
If you really look at programs that make use of mutible objects you will find that they are prone to "attack" by modifying:
objects passed to constructors
objects passed to methods
objects returned from methods.
The issue doesn't show up very often because most programs don't change the data (they are in reality immutable by virtue of them never changing).
I personally make every thing I possibly can final. I probably have 90%-95% of all variables (parameters, local, instance, static, exceptions, etc...) marked as final. There are some cases where it has to be mutable, but the vast majority of cases it does not.
I think it might depend on your focus. If you are writing libraries for 3rd parties to use you think about this much more than if you are writing an application that only you (or your team) will maintain.
I find that you can write large scale applications using immutable objects for the majority of the system without too much pain.
Fundamentally, in the real world, the state associated with many particular identities will change. If I ask what is "the present position of Joe's Buick", today it might be a location in Seattle, and tomorrow it might be a location in Los Alamos. It would be possible to define and create a GeographicLocation object whose value will always represent the location where Joe's Buick was at some particular moment in time and would never changes--if today it represents a spot in Seattle, then it will always do so. Such an object, however, would have no continuing identity as "the present location of Joe's Buick".
It may also be possible to define things so that there is a VehicleLocation object which is connected to Joe's Buick such that the object always represents "the present location of Joe's Buick". Such an object could retains its identity as "the present location of Joe's Buick", even as the car moves around, but would not represent a constant geographical location. Defining "identity" may be tricky if one considers the scenario where Joe sells his Buick to Bob and buys a Ford--should the object track "the present location of Joe's Ford" or "the present location of Bob's Buick"--but in many cases such issues may be avoided by using a data model that guarantees that some aspects of object identity will never change.
It isn't possible for everything about an object to be immutable. If an object is immutable, then it cannot have an immutable identity that encapsulates anything beyond its current state. If an object is mutable, however, it can have an immutable identity whose meaning transcends its present state. In many situations, having an immutable identity is more useful than having an immutable state, and in such situations mutable objects are nearly essential. While it is possible in some cases to "simulate" mutable objects by having an immutable object which would search through the most recent version of an immutable objects to find information that may "change" between one version and the next, such an approaches are often extremely inefficient. Even if one could magically receive once per minute a bound book that gave the location of every vehicle everywhere, looking up "Joe's Buick" in the book would take a lot longer than merely asking a "present location of Joe's Buick" object which would always know where the car was.
You pretty much answered your own question. The JavaBean specification, I don't believe, mentions anything about immutability, yet JavaBeans are the bread and butter of many Java frameworks.
The concept of immutable types is somewhat uncommon for people used to imperative programming styles. However, for many situations immutability has serious advantages, you named the most important ones already.
There are good ways to implement immutable balanced trees, queues, stacks, dequeues and other data structures. And in fact many modern programming languages / frameworks only support immutable strings because of their advantages and sometimes also other objects.
With an immutable object, if the value needs to be changed, then it must be replaced with a new instance. Depending on the lifecycle of the object, replacing it with a different instance can potentially increase the tenured (long) garbage collection time. This becomes more critical if the object is kept around in memory long enough to be placed in the tenured generation.
The problem in java is that one has to live with all those objects, where the class looks like:
class Mutable {
State1 f1;
MoreState f2;
void doSomething() { // mutate the state, but don't document it }
void doSomethingElse() /// mutate the state heavily, do not mention in doc
}
(Note the missing Cloneable interface).
The problem with the garbage collector is not such a big one nowadays. The VM's are happy with short living objects.
Advances in Compiler/JIT technology will make it possible, sooner or later, to optimize intermediate temporary object creation away. For example:
BigInteger three =, two =, i1 = ...;
BigInteger i2 = i1.mul(three).div(two);
The JIT could notice that the intermediate object i1.mul(three) can be used for the end result and call a variant of the div method that works on a mutable accumulator.
See Functional Java to attain a comprehensive answer to your question.
Immutability, as every other design pattern, should only be used when you need it. You give the example of thread safety: In a highly threaded application, you could favor immutability over the added expense of making it thread safe yourself.
However, if your design requires objects to be mutable, don't go out of your way to make them immutable, just because "it's a design pattern".
As for your graph, you could choose to make your nodes immutable and let another class take care of the connections between them, or you could make a mutable node that takes care of its own children and has an immutable value class.
Probably the biggest cost of using immutabile objects in Java is that future developers won't be expecting it or used to that style. Expect to either document heavily or watch alot of your objects spawn mutable peers over time.
That being said, the only real technical reason I can think of to avoid immutable objects is GC churn. For most applications, I don't think this is a compelling reason to avoid them.
The biggest thing I've ever done with a ~90% immutable objects was a toy scheme-esque interpreter, so its certainly possible to do complex Java projects.
in immutable data you dont set things twice... see haskell and scala vals (and clojure of cource)...
for example.. for a data structure.. like a tree, when you perform write operation to the tree, in fact you are adding elements outside of the immutable tree.. after you done.. the tree and the branch are recombined in a new tree.. so like this you could perform concurrent reads and writes very safelly..
in tradicional model, you must lock a value cause it could be reseted any time.. so.. you end up with a very heat zone for threads..since they act sequentially there anyway..
with imuttable data, you dont set things more than once.. its a whole new way of programming.. you may end up using a little bit more memory.. but parallelizing is natural and painless..
As with any tool, you have to know when to use it and when not to.
Like Tehblanx points out that if you want to change the state of a variable that holds an immutable object, you have to create a new object, which can be expensive, especially if the object is big and complex. Absolutely true, but that simply means that you have to intelligently decide which objects should be mutable and which should be immutable. If someone is saying that ALL objects should be immutable, well, that's just crazy talk.
I'd tend to say that objects that represent a single logical "fact" should be immutable, while objects that represent multiple facts should be mutable. Like, an Integer or a String should be immutable. A "Customer" object that contains name, address, current amount, date of last purchase, etc should be mutable. Of course I can immediately think of a hundred exceptions to such a general rule. An exception I make all the time is when I have a class that just exists as a wrapper to hold a primitive in some case where a primitive is not legal, like in a collection, but I need to update it constantly.
In Java, a method can't return multiple objects, like return a, b, c. Returning an array of objects makes the code look ugly. In this situation, I have to pass mutable objects to the method and let it change the states of these objects. However, I don't know whether returning multiple objects is a code smell or not.
The answer is none. There are not any good reasons to be mutable.
You do run in to problems with lots of frameworks(or framework versions) that require mutable objects in order to work with them(Spring I am glaring in your direction). As you work with them and fish through the code you will shake your fist in anger that you need to introduce dirty mutability into an otherwise glorious block of code when it could have been easily avoided.
I'm sure there are limited corner cases(probably more hypothetical that anything) where the overhead of object creation and collection is uncceptable. But I urge the people that would make this argument to look at languages like scala where included collections are immutable by default and then look at the bevy of performance critical apps built on top of that concept.
This is of course hyperbole. In reality, you should go with immutability first, see if it causes you any measurable problems, if it does then introduce mutability, but make sure you can prove it solves your problem. Otherwise you've just created liability for no benefit. In doing this I think you'll find objective cases for "Implementation Complexity" and "Inflexibility" very hard to make.
Some implementations of immutable objects have transactional means to update an immutable object. Similar to how databases provide safe commits and rollbacks. But in apparent contrast with many of the answers here. Immutable objects are never changed. A typical operation would be.
B = append(A,C)
B is a new object. Just like A and C. No modification was made to A or C. Internally a red black tree implementation makes such semantics fast enough to be usable.
The downside is that it is not as fast as making the operations in place. But that only compares a single part of the system. When evaluating possible downsides we need to look at the system as a whole. And I personally don't have a clear picture of the entire impact. Although I suspect immutability wins out at the end.
I know some experts contend there is contention at the top level of the red black tree. And that has a negative effect in throught-put.
My biggest worry with immutable data structures is how to save/reconstitute them. That is, if a class has final fields, I can't instantiate it and then set its fields.